5 Exercise on data preparation
The example presented in this section uses data from three soil profiles from undisturbed forests in Suriname (Poels 1987) collected with a soil auger: one Orthic Ferralsol (Profile 35, 8 horizons) and two Xanthic Ferralsols (Profiles 36 and 37, 9 horizons each). This file is available in the test-data
folder in your installation folder. It is also available at the GloSIS repository from GitHub.
This data requires harmonization in coordinate systems, dates, soil properties, laboratory methods and units of measurement before being included in the GloSIS database. For simplification, only analytical data for selected soil parameters have been included in this exercise.
5.1 Filling out the plot data sheet
As shown in Table 5.1, the harmonized plot data for this example include soil samples with standardized codes, coordinates and positional metadata.
All soil profiles have been collected within the same project. The
project_name
column for all profiles is set toLH/UvS_01
.The
profile_code
attribute follows the names of the soil profiles in the original document.The
site_code
andplot_code
inform about the location of the sample. In this casesite_code
has been set to the districtPara
andsite_code
to the protected area ofKabo
, where all samples were collected.Since the samples were collected with an auger, the
plot_type
attribute must be set toborehole
The available options areborehole
for auger sampling andpit
if an open pit was excavated.For each soil sampling point, the
n_layers
attribute correspond to the number of horizons/layers collected at that point. In this case,n_layers
is equal to the number of horizons sampled by profile.The
date
format in the original document follows the*DD Month YYYY*
format (e.g., 27 May 1983). This format must be converted to*YYYY-MM-DD*
(1983-05-27), which is the standard format accepted by the GloSIS database.The original coordinates of the points were expressed in degrees and minutes, using a central point as a reference (5°15’N, 55°43’W) and including displacement information in meters. Additionally, the coordinate system (EPSG) was not defined. We estimated the coordinates for each soil sampling point, using the information provided, in relation to the EPSG::4326 coordinate system (see Table 5.1).
No corrections were necessary for
altitude
. Thepositional accuracy
was estimated at 1850 m.The
extent
column refers to a surface plot. In this example, the samples are points, so no information on extent is applicable.The columns
altitude
,positional accuracy
andmap_sheet_code
are optional.
project_name | site_code | plot_code | profile_code | plot_type | n_layers | date | longitude | latitude | altitude | positional_accuracy | extent | map_sheet_code |
---|---|---|---|---|---|---|---|---|---|---|---|---|
LH/UvS_01 | Para | Kabo | Profile35 | Borehole | 8 | 1983-05-27 | -55.7148 | 5.25 | 20 | 1850 | ||
LH/UvS_01 | Para | Kabo | Profile36 | Borehole | 9 | 1978-08-10 | -55.7166 | 5.25 | 27 | 1850 | ||
LH/UvS_01 | Para | Kabo | Profile37 | Borehole | 9 | 1983-06-02 | -55.7131 | 5.25 | 39 | 1850 |
5.2 Filling out the specimen data sheet
The specimen data
sheet contains information for each soil sample collected at each soil sampling location. The number of rows must therefore be equal to the total number of horizons/layers at each sampling location (n_layers
). In this example, the total number of rows in this sheet is 26.
Two different types of data are required in this sheet. First, the columns profile_code
, element_code
, type
, order_element
, upper_depth
, lower_depth
, and specimen_code
serve as identifiers for samples, sample types, and soil depths (Table 5.2). The remaining columns correspond to soil properties measured through laboratory analyses.
The
profile_code
column references the same attribute from the previous sheet and must contain repetitions corresponding to the number of horizons collected at each sampling location.The
order_element
column is a sequential number that increases from top to bottom horizons, starting from 1.The
element_code
is generated by combining theprofile_code
andorder_element
to create a unique identifier for each horizon.The
type
attribute is set toHorizon
for genetic soil horizons. If the sample is not collected based on pedogenetic criteria but is defined by specific boundaries, the optionLayer
is used.The
upper_depth
andlower_depth
columns specify the depth range (in cm) of each horizon or layer within the soil profile. Except for the top horizon, theupper_depth
value of each horizon must match thelower_depth
value of the preceding horizon within the same profile.The
specimen_code
is an alphanumeric string that uniquely identifies each soil sample. In this case, the same value aselement_code
has been used for consistency.
profile_code | element_code | type | order_element | upper_depth | lower_depth | specimen_code |
---|---|---|---|---|---|---|
Profile35 | Profile35_1 | Horizon | 1 | 0 | 4 | Profile35_1 |
Profile35 | Profile35_2 | Horizon | 2 | 4 | 9 | Profile35_2 |
Profile35 | Profile35_3 | Horizon | 3 | 9 | 24 | Profile35_3 |
Profile35 | Profile35_4 | Horizon | 4 | 24 | 50 | Profile35_4 |
Profile35 | Profile35_5 | Horizon | 5 | 50 | 66 | Profile35_5 |
Profile35 | Profile35_6 | Horizon | 6 | 66 | 104 | Profile35_6 |
Profile35 | Profile35_7 | Horizon | 7 | 104 | 134 | Profile35_7 |
Profile35 | Profile35_8 | Horizon | 8 | 134 | 170 | Profile35_8 |
Profile36 | Profile36_1 | Horizon | 1 | 0 | 3 | Profile36_1 |
Profile36 | Profile36_2 | Horizon | 2 | 3 | 10 | Profile36_2 |
Profile36 | Profile36_3 | Horizon | 3 | 10 | 27 | Profile36_3 |
Profile36 | Profile36_4 | Horizon | 4 | 27 | 48 | Profile36_4 |
Profile36 | Profile36_5 | Horizon | 5 | 48 | 63 | Profile36_5 |
Profile36 | Profile36_6 | Horizon | 6 | 63 | 87 | Profile36_6 |
Profile36 | Profile36_7 | Horizon | 7 | 87 | 102 | Profile36_7 |
Profile36 | Profile36_8 | Horizon | 8 | 102 | 130 | Profile36_8 |
Profile36 | Profile36_9 | Horizon | 9 | 130 | 180 | Profile36_9 |
Profile37 | Profile37_1 | Horizon | 1 | 0 | 4 | Profile37_1 |
Profile37 | Profile37_2 | Horizon | 2 | 4 | 15 | Profile37_2 |
Profile37 | Profile37_3 | Horizon | 3 | 15 | 24 | Profile37_3 |
Profile37 | Profile37_4 | Horizon | 4 | 24 | 41 | Profile37_4 |
Profile37 | Profile37_5 | Horizon | 5 | 41 | 52 | Profile37_5 |
Profile37 | Profile37_6 | Horizon | 6 | 52 | 87 | Profile37_6 |
Profile37 | Profile37_7 | Horizon | 7 | 87 | 112 | Profile37_7 |
Profile37 | Profile37_8 | Horizon | 8 | 112 | 143 | Profile37_8 |
Profile37 | Profile37_9 | Horizon | 9 | 143 | 180 | Profile37_9 |
The parameters for analytical soil properties in Table 5.3 must include, for each property, its measured value in the laboratory using the harmonized units of measurement defined in GloSIS, as specified in the observation_phys_chem_id
column of the template. Any value reported in different units MUST be converted to the harmonized units before being entered into this sheet. Values that fall outside the predefined lower and upper thresholds in the database will be excluded, preventing full population of the database.
Sand (%) | Silt (%) | Clay (%) | C (g/kg) | N (g/kg) | pH-H₂O | pH-KCl | CEC (cmol/kg) | Ca (cmol/kg) | Mg (cmol/kg) | K (cmol/kg) | Na (cmol/kg) | Al (cmol/kg) | ECEC (cmol/kg) | K total (cmol/kg) | P total (%) | P-Bray I (%) | Bulk Density (kg/dm3) | pF 2 (m3/100 m3) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
83.5 | 13.6 | 2.9 | 11.7 | 0.8 | 4.0 | 3.3 | 26.0 | 1.25 | 1.15 | 0.4 | 0.2 | 1.2654 | 9.2 | 0.0512821 | 0.0032 | 0.00022 | ||
76.2 | 17.6 | 6.3 | 12.0 | 0.9 | 3.7 | 3.4 | 28.4 | 0.40 | 0.50 | 0.3 | 0.5 | 4.1292 | 15.0 | 0.0615385 | 0.0049 | 0.00022 | 1.41 | 23.2 |
68.5 | 17.9 | 13.6 | 8.2 | 0.6 | 4.4 | 3.8 | 23.2 | 0.15 | 0.25 | 0.1 | 0.3 | 3.9294 | 13.0 | 0.0615385 | 0.0054 | 0.00011 | 1.28 | 22.4 |
63.8 | 18.8 | 17.6 | 5.6 | 0.4 | 4.8 | 4.0 | 22.8 | 0.15 | 0.15 | 0.1 | 0.2 | 4.0959 | 13.2 | 0.0769231 | 0.0047 | 0.00006 | 1.56 | 25.2 |
62.8 | 14.2 | 23.1 | 2.5 | 0.2 | 4.8 | 3.8 | 14.4 | 0.15 | 0.10 | 0.0 | 0.2 | 2.9970 | 9.7 | 0.0871795 | 0.0052 | 0.00002 | 1.60 | 26.0 |
64.2 | 12.4 | 23.6 | 1.4 | 0.3 | 4.8 | 4.0 | 12.2 | 0.15 | 0.05 | 0.0 | 0.1 | 2.5641 | 8.2 | 0.0769231 | 0.0049 | 0.00005 | 1.65 | 25.9 |
62.3 | 12.5 | 25.4 | 0.7 | 0.1 | 5.0 | 4.0 | 10.8 | 0.00 | 0.00 | 0.1 | 0.0 | 2.2977 | 7.0 | 0.1025641 | 0.0037 | 0.00011 | 1.69 | 27.1 |
58.1 | 14.3 | 27.8 | 0.6 | 0.1 | 5.0 | 4.1 | 9.8 | 0.15 | 0.00 | 0.0 | 0.1 | 2.2311 | 7.1 | 0.0871795 | 0.0037 | 0.00008 | 1.69 | 29.7 |
75.2 | 21.7 | 3.2 | 15.2 | 1.1 | 4.0 | 3.8 | 34.8 | 2.30 | 0.50 | 0.6 | 0.7 | 2.3643 | 14.0 | 0.1051282 | 0.0054 | 0.00021 | 1.38 | |
75.6 | 16.4 | 8.0 | 8.6 | 0.8 | 4.0 | 3.6 | 26.6 | 0.30 | 0.25 | 0.3 | 0.3 | 3.6630 | 12.7 | 0.1512821 | 0.0060 | 0.00014 | 1.47 | 20.4 |
61.6 | 18.8 | 19.7 | 4.1 | 0.4 | 4.4 | 3.8 | 20.0 | 0.00 | 0.10 | 0.2 | 0.3 | 2.9970 | 9.7 | 0.1948718 | 0.0066 | 0.00000 | 1.51 | 27.6 |
57.8 | 17.1 | 25.2 | 4.1 | 0.5 | 4.6 | 3.9 | 19.6 | 0.00 | 0.10 | 0.1 | 0.1 | 3.5631 | 11.1 | 0.1846154 | 0.0082 | 0.00000 | 29.6 | |
56.9 | 14.6 | 28.6 | 2.8 | 0.3 | 4.8 | 4.0 | 17.2 | 0.00 | 0.15 | 0.2 | 0.0 | 3.0636 | 9.7 | 0.1051282 | 0.0071 | 0.00007 | 1.49 | |
54.8 | 15.0 | 30.3 | 1.8 | 0.3 | 4.8 | 4.0 | 14.4 | 0.00 | 0.20 | 0.1 | 0.0 | 2.3643 | 7.6 | 0.2641026 | 0.0076 | 0.00000 | 1.53 | 30.2 |
57.2 | 12.7 | 30.1 | 1.5 | 0.2 | 4.9 | 4.0 | 11.6 | 0.15 | 0.20 | 0.0 | 0.1 | 2.3976 | 8.0 | 0.1230769 | 0.0047 | 0.00005 | 1.57 | 29.2 |
58.9 | 13.3 | 27.8 | 1.6 | 0.1 | 5.0 | 4.1 | 10.0 | 0.15 | 0.15 | 0.1 | 0.0 | 2.0313 | 6.8 | 0.1128205 | 0.0054 | 0.00005 | 1.58 | 28.8 |
59.3 | 13.4 | 27.4 | 0.7 | 0.1 | 5.2 | 4.2 | 9.2 | 0.00 | 0.00 | 0.1 | 0.1 | 1.0989 | 3.5 | 0.1025641 | 0.0056 | 0.00018 | 28.7 | |
80.1 | 18.8 | 1.1 | 15.1 | 0.9 | 4.0 | 3.8 | 31.0 | 0.55 | 1.05 | 0.3 | 0.3 | 0.9324 | 6.6 | 0.0512821 | 0.0024 | 0.00021 | ||
90.3 | 6.9 | 2.8 | 8.1 | 0.5 | 4.4 | 3.8 | 13.0 | 0.15 | 0.10 | 0.1 | 0.1 | 1.8648 | 6.3 | 0.0769231 | 0.0042 | 0.00011 | 1.32 | 11.4 |
87.1 | 8.1 | 4.8 | 4.5 | 0.5 | 4.5 | 3.8 | 13.0 | 0.15 | 0.05 | 0.1 | 0.1 | 2.3310 | 7.6 | 0.0769231 | 0.0042 | 0.00008 | 1.38 | 11.5 |
83.4 | 6.9 | 9.8 | 6.3 | 0.4 | 5.0 | 4.0 | 16.4 | 0.00 | 0.00 | 0.1 | 0.0 | 3.1635 | 9.6 | 0.1025641 | 0.0064 | 0.00007 | 1.45 | 16.6 |
79.8 | 7.7 | 12.6 | 3.2 | 0.1 | 4.9 | 4.1 | 13.2 | 0.15 | 0.05 | 0.1 | 0.0 | 3.0636 | 9.7 | 0.1384615 | 0.0066 | 0.00015 | 1.50 | 15.9 |
81.5 | 6.1 | 12.4 | 1.3 | 0.1 | 4.9 | 4.1 | 0.3 | 0.15 | 0.15 | 0.1 | 0.2 | 1.0989 | 4.2 | 0.1230769 | 0.0059 | 0.00008 | 1.45 | 14.9 |
77.9 | 7.5 | 19.6 | 0.9 | 0.2 | 5.0 | 4.0 | 7.8 | 0.00 | 0.00 | 0.1 | 0.0 | 1.0989 | 3.4 | 0.1641026 | 0.0068 | 0.00008 | 1.49 | 20.4 |
70.7 | 8.4 | 21.0 | 0.9 | 0.2 | 4.8 | 4.0 | 9.2 | 0.15 | 0.05 | 0.1 | 0.1 | 2.4975 | 8.1 | 0.2358974 | 0.0096 | 0.00002 | 1.49 | 16.2 |
72.7 | 8.8 | 18.6 | 0.6 | 0.2 | 5.0 | 4.2 | 7.2 | 0.00 | 0.00 | 0.1 | 0.0 | 1.5651 | 4.8 | 0.2256410 | 0.0103 | 0.00003 | 1.53 | 18.3 |
5.3 Filling out the procedures data sheet
The Procedures
sheet establishes the links between the original soil analytical property names and their corresponding standardized names, analytical methods, and units of measurement in the GloSIS database. Any analytical properties measured in units different from those specified in this sheet MUST be converted to the standard units before being entered into the specimen data sheet and uploaded to the glosis-shiny
application.
The first column in the Procedures
sheet MUST contain all soil analytical properties included in the specimen data
and in the same order of entrance. The proper GloSIS names for soil properties, analysis methods and units MUST be defined for each soil property using the data provided in the observation_phys_chem_id
sheet. Properties that are not listed in the observation_phys_chem_id
sheet are not allowed.
soil_property | property_phys_chem_id | procedure_phys_chem_id | unit_of_measure_id |
---|---|---|---|
Sand (%) | Sand texture fraction | SaSiCl_2-20-2000u-disp-pipette | % |
Silt (%) | Silt texture fraction | SaSiCl_2-20-2000u-disp-pipette | % |
Clay (%) | Clay texture fraction | SaSiCl_2-20-2000u-disp-pipette | % |
C (%) | Carbon (C) - organic | OrgC_wc-cro3-walkleyblack | g/kg |
N (%) | Nitrogen (N) - total | TotalN_kjeldahl | g/kg |
pH-H₂O | pH - Hydrogen potential | pHH2O | pH |
pH-KCl | pH - Hydrogen potential | pHKCl | pH |
CEC (me/100g) | cationExchangeCapacitySoil | CEC_ph7-nh4oac | cmol/kg |
Ca (me/100g) | Calcium (Ca++) - exchangeable | ExchBases_ph7-nh4oac | cmol/kg |
Mg (me/100g) | Magnesium (Mg++) - exchangeable | ExchBases_ph7-nh4oac | cmol/kg |
K (me/100g) | Potassium (K+) - exchangeable | ExchBases_ph7-nh4oac | cmol/kg |
Na (me/100g) | Sodium (Na+) - exchangeable | ExchBases_ph7-nh4oac | cmol/kg |
Al (me/100g) | Aluminium (Al+++) - exchangeable | ExchBases_ph7-nh4oac | cmol/kg |
ECEC (me/100g) | effectiveCecProperty | EffCEC_calcul-ba | cmol/kg |
K total (ppm) | Potassium (K) - total | Total_tp04 | cmol/kg |
P total (ppm) | Phosphorus (P) - total | Total_tp04 | % |
P-Bray I (ppm) | Phosphorus (P) - extractable | Extr_hcl-nh4f-bray1 | % |
Bulk Density (kg/l) | bulkDensityFineEarthProperty | BlkDensF_fe-unkn | kg/dm3 |
pF 2 (%) | Available water capacity - volumetric (FC to WP) | PAWHC_calcul-fc100wp | m3/100 m3 |
Note that the soil properties marked in red in Table 5.4 have different units of measurement in the original data compared to those in the GloSIS database (column 4). For these properties, transformation of laboratory measurements using the appropriate correction factors must be performed in the specimen data
data sheet. However, for Bulk Density and Available Water Capacity (pF 2), although the units are written in a different format, they represent the same values.
This sheet ensures that all soil properties are named according to the GloSIS standard and that the corresponding methods and units are correctly defined.
5.4 Special Considerations:
- Any soil property value that is missing MUST be left empty in the
.xlsx
template. Entries such asNA
,nodata
,-
, etc., are not permitted.
- Ensure that all numeric values are correctly formatted as numbers in the template. Any alphanumeric entry in a numeric column will cause the entire column to be treated as alphanumeric.
- Special characters such as
*
,/
,?
, etc., are not allowed. This includes the<
symbol, commonly used for values below the detection limit of analytical equipment. In such cases, either leave the value empty or replace it by dividing the detection limit by a factor of two.
- All soil properties that are not listed in the
observation_phys_chem_id
MUST be removed from the analysis. They cannot be included in either thespecimen data
or theProcedures
sheets. If they are left in these sheets, the database will not be correctly populated.