5 Exercise on data preparation

The example presented in this section uses data from three soil profiles from undisturbed forests in Suriname (Poels 1987) collected with a soil auger: one Orthic Ferralsol (Profile 35, 8 horizons) and two Xanthic Ferralsols (Profiles 36 and 37, 9 horizons each). This file is available in the test-data folder in your installation folder. It is also available at the GloSIS repository from GitHub.

This data requires harmonization in coordinate systems, dates, soil properties, laboratory methods and units of measurement before being included in the GloSIS database. For simplification, only analytical data for selected soil parameters have been included in this exercise.

5.1 Filling out the plot data sheet

As shown in Table 5.1, the harmonized plot data for this example include soil samples with standardized codes, coordinates and positional metadata.

  • All soil profiles have been collected within the same project. The project_name column for all profiles is set to LH/UvS_01.

  • The profile_code attribute follows the names of the soil profiles in the original document.

  • The site_code and plot_code inform about the location of the sample. In this case site_code has been set to the district Para and site_code to the protected area of Kabo, where all samples were collected.

  • Since the samples were collected with an auger, the plot_type attribute must be set to borehole The available options are borehole for auger sampling and pit if an open pit was excavated.

  • For each soil sampling point, the n_layers attribute correspond to the number of horizons/layers collected at that point. In this case, n_layers is equal to the number of horizons sampled by profile.

  • The date format in the original document follows the *DD Month YYYY* format (e.g., 27 May 1983). This format must be converted to *YYYY-MM-DD* (1983-05-27), which is the standard format accepted by the GloSIS database.

  • The original coordinates of the points were expressed in degrees and minutes, using a central point as a reference (5°15’N, 55°43’W) and including displacement information in meters. Additionally, the coordinate system (EPSG) was not defined. We estimated the coordinates for each soil sampling point, using the information provided, in relation to the EPSG::4326 coordinate system (see Table 5.1).

  • No corrections were necessary for altitude. The positional accuracy was estimated at 1850 m.

  • The extent column refers to a surface plot. In this example, the samples are points, so no information on extent is applicable.

  • The columns altitude, positional accuracy and map_sheet_code are optional.

Table 5.1: Harmonized data for the plot data sheet for the exercise data.
project_name site_code plot_code profile_code plot_type n_layers date longitude latitude altitude positional_accuracy extent map_sheet_code
LH/UvS_01 Para Kabo Profile35 Borehole 8 1983-05-27 -55.7148 5.25 20 1850
LH/UvS_01 Para Kabo Profile36 Borehole 9 1978-08-10 -55.7166 5.25 27 1850
LH/UvS_01 Para Kabo Profile37 Borehole 9 1983-06-02 -55.7131 5.25 39 1850


5.2 Filling out the specimen data sheet

The specimen data sheet contains information for each soil sample collected at each soil sampling location. The number of rows must therefore be equal to the total number of horizons/layers at each sampling location (n_layers). In this example, the total number of rows in this sheet is 26.

Two different types of data are required in this sheet. First, the columns profile_code, element_code, type, order_element, upper_depth, lower_depth, and specimen_code serve as identifiers for samples, sample types, and soil depths (Table 5.2). The remaining columns correspond to soil properties measured through laboratory analyses.

  • The profile_code column references the same attribute from the previous sheet and must contain repetitions corresponding to the number of horizons collected at each sampling location.

  • The order_element column is a sequential number that increases from top to bottom horizons, starting from 1.

  • The element_code is generated by combining the profile_code and order_element to create a unique identifier for each horizon.

  • The type attribute is set to Horizon for genetic soil horizons. If the sample is not collected based on pedogenetic criteria but is defined by specific boundaries, the option Layer is used.

  • The upper_depth and lower_depth columns specify the depth range (in cm) of each horizon or layer within the soil profile. Except for the top horizon, the upper_depth value of each horizon must match the lower_depth value of the preceding horizon within the same profile.

  • The specimen_code is an alphanumeric string that uniquely identifies each soil sample. In this case, the same value as element_code has been used for consistency.

Table 5.2: Sample descriptors in the specimen data sheet for the exercise data.
profile_code element_code type order_element upper_depth lower_depth specimen_code
Profile35 Profile35_1 Horizon 1 0 4 Profile35_1
Profile35 Profile35_2 Horizon 2 4 9 Profile35_2
Profile35 Profile35_3 Horizon 3 9 24 Profile35_3
Profile35 Profile35_4 Horizon 4 24 50 Profile35_4
Profile35 Profile35_5 Horizon 5 50 66 Profile35_5
Profile35 Profile35_6 Horizon 6 66 104 Profile35_6
Profile35 Profile35_7 Horizon 7 104 134 Profile35_7
Profile35 Profile35_8 Horizon 8 134 170 Profile35_8
Profile36 Profile36_1 Horizon 1 0 3 Profile36_1
Profile36 Profile36_2 Horizon 2 3 10 Profile36_2
Profile36 Profile36_3 Horizon 3 10 27 Profile36_3
Profile36 Profile36_4 Horizon 4 27 48 Profile36_4
Profile36 Profile36_5 Horizon 5 48 63 Profile36_5
Profile36 Profile36_6 Horizon 6 63 87 Profile36_6
Profile36 Profile36_7 Horizon 7 87 102 Profile36_7
Profile36 Profile36_8 Horizon 8 102 130 Profile36_8
Profile36 Profile36_9 Horizon 9 130 180 Profile36_9
Profile37 Profile37_1 Horizon 1 0 4 Profile37_1
Profile37 Profile37_2 Horizon 2 4 15 Profile37_2
Profile37 Profile37_3 Horizon 3 15 24 Profile37_3
Profile37 Profile37_4 Horizon 4 24 41 Profile37_4
Profile37 Profile37_5 Horizon 5 41 52 Profile37_5
Profile37 Profile37_6 Horizon 6 52 87 Profile37_6
Profile37 Profile37_7 Horizon 7 87 112 Profile37_7
Profile37 Profile37_8 Horizon 8 112 143 Profile37_8
Profile37 Profile37_9 Horizon 9 143 180 Profile37_9


The parameters for analytical soil properties in Table 5.3 must include, for each property, its measured value in the laboratory using the harmonized units of measurement defined in GloSIS, as specified in the observation_phys_chem_id column of the template. Any value reported in different units MUST be converted to the harmonized units before being entered into this sheet. Values that fall outside the predefined lower and upper thresholds in the database will be excluded, preventing full population of the database.

Table 5.3: Analytical properties in the specimen data sheet for the exercise data.
Sand (%) Silt (%) Clay (%) C (g/kg) N (g/kg) pH-H₂O pH-KCl CEC (cmol/kg) Ca (cmol/kg) Mg (cmol/kg) K (cmol/kg) Na (cmol/kg) Al (cmol/kg) ECEC (cmol/kg) K total (cmol/kg) P total (%) P-Bray I (%) Bulk Density (kg/dm3) pF 2 (m3/100 m3)
83.5 13.6 2.9 11.7 0.8 4.0 3.3 26.0 1.25 1.15 0.4 0.2 1.2654 9.2 0.0512821 0.0032 0.00022
76.2 17.6 6.3 12.0 0.9 3.7 3.4 28.4 0.40 0.50 0.3 0.5 4.1292 15.0 0.0615385 0.0049 0.00022 1.41 23.2
68.5 17.9 13.6 8.2 0.6 4.4 3.8 23.2 0.15 0.25 0.1 0.3 3.9294 13.0 0.0615385 0.0054 0.00011 1.28 22.4
63.8 18.8 17.6 5.6 0.4 4.8 4.0 22.8 0.15 0.15 0.1 0.2 4.0959 13.2 0.0769231 0.0047 0.00006 1.56 25.2
62.8 14.2 23.1 2.5 0.2 4.8 3.8 14.4 0.15 0.10 0.0 0.2 2.9970 9.7 0.0871795 0.0052 0.00002 1.60 26.0
64.2 12.4 23.6 1.4 0.3 4.8 4.0 12.2 0.15 0.05 0.0 0.1 2.5641 8.2 0.0769231 0.0049 0.00005 1.65 25.9
62.3 12.5 25.4 0.7 0.1 5.0 4.0 10.8 0.00 0.00 0.1 0.0 2.2977 7.0 0.1025641 0.0037 0.00011 1.69 27.1
58.1 14.3 27.8 0.6 0.1 5.0 4.1 9.8 0.15 0.00 0.0 0.1 2.2311 7.1 0.0871795 0.0037 0.00008 1.69 29.7
75.2 21.7 3.2 15.2 1.1 4.0 3.8 34.8 2.30 0.50 0.6 0.7 2.3643 14.0 0.1051282 0.0054 0.00021 1.38
75.6 16.4 8.0 8.6 0.8 4.0 3.6 26.6 0.30 0.25 0.3 0.3 3.6630 12.7 0.1512821 0.0060 0.00014 1.47 20.4
61.6 18.8 19.7 4.1 0.4 4.4 3.8 20.0 0.00 0.10 0.2 0.3 2.9970 9.7 0.1948718 0.0066 0.00000 1.51 27.6
57.8 17.1 25.2 4.1 0.5 4.6 3.9 19.6 0.00 0.10 0.1 0.1 3.5631 11.1 0.1846154 0.0082 0.00000 29.6
56.9 14.6 28.6 2.8 0.3 4.8 4.0 17.2 0.00 0.15 0.2 0.0 3.0636 9.7 0.1051282 0.0071 0.00007 1.49
54.8 15.0 30.3 1.8 0.3 4.8 4.0 14.4 0.00 0.20 0.1 0.0 2.3643 7.6 0.2641026 0.0076 0.00000 1.53 30.2
57.2 12.7 30.1 1.5 0.2 4.9 4.0 11.6 0.15 0.20 0.0 0.1 2.3976 8.0 0.1230769 0.0047 0.00005 1.57 29.2
58.9 13.3 27.8 1.6 0.1 5.0 4.1 10.0 0.15 0.15 0.1 0.0 2.0313 6.8 0.1128205 0.0054 0.00005 1.58 28.8
59.3 13.4 27.4 0.7 0.1 5.2 4.2 9.2 0.00 0.00 0.1 0.1 1.0989 3.5 0.1025641 0.0056 0.00018 28.7
80.1 18.8 1.1 15.1 0.9 4.0 3.8 31.0 0.55 1.05 0.3 0.3 0.9324 6.6 0.0512821 0.0024 0.00021
90.3 6.9 2.8 8.1 0.5 4.4 3.8 13.0 0.15 0.10 0.1 0.1 1.8648 6.3 0.0769231 0.0042 0.00011 1.32 11.4
87.1 8.1 4.8 4.5 0.5 4.5 3.8 13.0 0.15 0.05 0.1 0.1 2.3310 7.6 0.0769231 0.0042 0.00008 1.38 11.5
83.4 6.9 9.8 6.3 0.4 5.0 4.0 16.4 0.00 0.00 0.1 0.0 3.1635 9.6 0.1025641 0.0064 0.00007 1.45 16.6
79.8 7.7 12.6 3.2 0.1 4.9 4.1 13.2 0.15 0.05 0.1 0.0 3.0636 9.7 0.1384615 0.0066 0.00015 1.50 15.9
81.5 6.1 12.4 1.3 0.1 4.9 4.1 0.3 0.15 0.15 0.1 0.2 1.0989 4.2 0.1230769 0.0059 0.00008 1.45 14.9
77.9 7.5 19.6 0.9 0.2 5.0 4.0 7.8 0.00 0.00 0.1 0.0 1.0989 3.4 0.1641026 0.0068 0.00008 1.49 20.4
70.7 8.4 21.0 0.9 0.2 4.8 4.0 9.2 0.15 0.05 0.1 0.1 2.4975 8.1 0.2358974 0.0096 0.00002 1.49 16.2
72.7 8.8 18.6 0.6 0.2 5.0 4.2 7.2 0.00 0.00 0.1 0.0 1.5651 4.8 0.2256410 0.0103 0.00003 1.53 18.3

5.3 Filling out the procedures data sheet

The Procedures sheet establishes the links between the original soil analytical property names and their corresponding standardized names, analytical methods, and units of measurement in the GloSIS database. Any analytical properties measured in units different from those specified in this sheet MUST be converted to the standard units before being entered into the specimen data sheet and uploaded to the glosis-shiny application.

The first column in the Procedures sheet MUST contain all soil analytical properties included in the specimen data and in the same order of entrance. The proper GloSIS names for soil properties, analysis methods and units MUST be defined for each soil property using the data provided in the observation_phys_chem_id sheet. Properties that are not listed in the observation_phys_chem_id sheet are not allowed.

Table 5.4: Soil Property Mapping Table for the exercise data
soil_property property_phys_chem_id procedure_phys_chem_id unit_of_measure_id
Sand (%) Sand texture fraction SaSiCl_2-20-2000u-disp-pipette %
Silt (%) Silt texture fraction SaSiCl_2-20-2000u-disp-pipette %
Clay (%) Clay texture fraction SaSiCl_2-20-2000u-disp-pipette %
C (%) Carbon (C) - organic OrgC_wc-cro3-walkleyblack g/kg
N (%) Nitrogen (N) - total TotalN_kjeldahl g/kg
pH-H₂O pH - Hydrogen potential pHH2O pH
pH-KCl pH - Hydrogen potential pHKCl pH
CEC (me/100g) cationExchangeCapacitySoil CEC_ph7-nh4oac cmol/kg
Ca (me/100g) Calcium (Ca++) - exchangeable ExchBases_ph7-nh4oac cmol/kg
Mg (me/100g) Magnesium (Mg++) - exchangeable ExchBases_ph7-nh4oac cmol/kg
K (me/100g) Potassium (K+) - exchangeable ExchBases_ph7-nh4oac cmol/kg
Na (me/100g) Sodium (Na+) - exchangeable ExchBases_ph7-nh4oac cmol/kg
Al (me/100g) Aluminium (Al+++) - exchangeable ExchBases_ph7-nh4oac cmol/kg
ECEC (me/100g) effectiveCecProperty EffCEC_calcul-ba cmol/kg
K total (ppm) Potassium (K) - total Total_tp04 cmol/kg
P total (ppm) Phosphorus (P) - total Total_tp04 %
P-Bray I (ppm) Phosphorus (P) - extractable Extr_hcl-nh4f-bray1 %
Bulk Density (kg/l) bulkDensityFineEarthProperty BlkDensF_fe-unkn kg/dm3
pF 2 (%) Available water capacity - volumetric (FC to WP) PAWHC_calcul-fc100wp m3/100 m3

Note that the soil properties marked in red in Table 5.4 have different units of measurement in the original data compared to those in the GloSIS database (column 4). For these properties, transformation of laboratory measurements using the appropriate correction factors must be performed in the specimen data data sheet. However, for Bulk Density and Available Water Capacity (pF 2), although the units are written in a different format, they represent the same values.

This sheet ensures that all soil properties are named according to the GloSIS standard and that the corresponding methods and units are correctly defined.

5.4 Special Considerations:

  • Any soil property value that is missing MUST be left empty in the .xlsx template. Entries such as NA, nodata, -, etc., are not permitted.
  • Ensure that all numeric values are correctly formatted as numbers in the template. Any alphanumeric entry in a numeric column will cause the entire column to be treated as alphanumeric.
  • Special characters such as *, /, ?, etc., are not allowed. This includes the < symbol, commonly used for values below the detection limit of analytical equipment. In such cases, either leave the value empty or replace it by dividing the detection limit by a factor of two.
  • All soil properties that are not listed in the observation_phys_chem_id MUST be removed from the analysis. They cannot be included in either the specimen data or the Procedures sheets. If they are left in these sheets, the database will not be correctly populated.

References

Poels, R. L. H. 1987. “Soils, Water and Nutrients in a Forest Ecosystem in Suriname.” PhD thesis, Wageningen, Netherlands: Agricultural University, Wageningen.