Application of Vis–NIR Spectroscopy and Machine Learning for Assessing Soil Organic Carbon in the Sierra Nevada de Santa Marta, Colombia
Soil organic carbon (SOC) is a key indicator of soil fertility, health, and carbon sequestration capacity. Its proper management improves soil structure, productivity, and resilience to climate change, making rapid and reliable SOC assessment essential for sustainable agriculture. Visible and near-infrared (Vis–NIR) spectroscopy offers a non-destructive and cost-effective alternative to conventional laboratory analyses, allowing multiple soil properties to be estimated simultaneously from a single spectrum. This study aimed to predict SOC content by applying machine learning techniques to Vis–NIR spectra of 860 soil samples collected in the Sierra Nevada de Santa Marta, Colombia. The spectra (400–2500 nm) were acquired using a NIR spectrophotometer, and SOC content was quantified by wet oxidation with dichromate in an acidic medium. Hybrid models pairing Random Forest (RF) with support vector regression (SVR) or XGBoost were implemented. Three spectral pretreatments were compared: the Savitzky–Golay first derivative, multiplicative scatter correction (MSC), and standard normal variate (SNV). Spectral bands were sampled at 10 nm intervals, and the 30 most relevant wavelengths were identified using RF importance analysis. Data were divided into training (80%) and test (20%) subsets using stratified random sampling, and five-fold cross-validation was applied for parameter optimization and overfitting control. The RF–XGBoost (R² = 0.86) and RF–SVR (R² = 0.85) models outperformed the individual RF and SVR models (R² < 0.7). The proposed hybrid approach, optimized through feature selection and advanced spectral preprocessing, provides a robust and scalable framework for rapid SOC prediction and sustainable soil monitoring.
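As an illustration of two of the pretreatments named above (SNV and MSC) and of sampling bands every 10 nm, the following is a minimal sketch using only the Python standard library. It is not the study's implementation: the abstract does not state which software was used, so the function names (`snv`, `msc`, `subsample_bands`), the use of the sample standard deviation in SNV, the choice of the mean spectrum as the MSC reference, and the synthetic toy spectra are all assumptions made for illustration. The Savitzky–Golay derivative, RF importance ranking, and model fitting are omitted.

```python
import math
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own
    mean and (sample) standard deviation. The n - 1 convention is an assumption."""
    m = mean(spectrum)
    s = stdev(spectrum)
    return [(x - m) / s for x in spectrum]


def msc(spectra):
    """Multiplicative scatter correction. Each spectrum is regressed on a
    reference (here the mean spectrum of the set, an assumption) and then
    corrected with the fitted intercept and slope."""
    n_bands = len(spectra[0])
    ref = [mean(s[j] for s in spectra) for j in range(n_bands)]
    ref_mean = mean(ref)
    ref_ss = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = mean(s)
        # Ordinary least-squares fit: s ~ intercept + slope * ref
        slope = sum((r - ref_mean) * (x - s_mean) for r, x in zip(ref, s)) / ref_ss
        intercept = s_mean - slope * ref_mean
        corrected.append([(x - intercept) / slope for x in s])
    return corrected


def subsample_bands(wavelengths, spectrum, step_nm=10):
    """Keep only the bands that fall on a regular grid of step_nm,
    counted from the first wavelength. Assumes integer wavelengths in nm."""
    start = wavelengths[0]
    keep = [i for i, w in enumerate(wavelengths) if (w - start) % step_nm == 0]
    return [wavelengths[i] for i in keep], [spectrum[i] for i in keep]


# Synthetic toy data (hypothetical, not from the study): one smooth
# "spectrum" at 1 nm resolution and a copy distorted by an additive
# offset and a multiplicative factor, mimicking scatter effects.
wavelengths = list(range(400, 2501))
base = [0.3 + 0.1 * math.sin(w / 300) for w in wavelengths]
scattered = [1.0 + 2.0 * x for x in base]

corrected = msc([base, scattered])                       # both map onto the reference
bands, reduced = subsample_bands(wavelengths, snv(base)) # 400, 410, ..., 2500 nm
```

In this toy case the two MSC-corrected spectra coincide, because the second differs from the first only by an offset and a scale factor, which is exactly the distortion MSC is designed to remove. Real workflows would typically use array libraries for these steps; the standard library is used here only to keep the sketch self-contained.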
