preprint · Demonstration bundle · 2026 · nirs4all 0.10.0

Reproducible PLS calibration of protein content from near-infrared spectra

Gregory Beurier, nirs4all ecosystem
CIRAD
Demonstration bundle

A worked, end-to-end example of the nirs4all reproduction-document publisher. A partial least-squares model is calibrated on near-infrared spectra to predict protein content, using a min–max scaling and standard-normal-variate preprocessing chain. The deposited .n4a bundle carries the exact pipeline and fitted artifacts; this page re-derives the cross-validated scores live in the browser from an included synthetic dataset, and lists the literature for every method used.

Keywords: NIRS, PLS, preprocessing, reproducibility, chemometrics
nirs4all
RMSE (5-fold CV): 1.76
R² (5-fold CV): 0.69
RPD (5-fold CV): 1.81
Components: 10
Pipeline steps: 4
Samples: 120

Pipeline (4 steps, plus an internal splitter)

1. MinMaxScaler [1] (preprocessing)
   sklearn.preprocessing._data.MinMaxScaler
   Affine rescaling of each feature to a fixed range (default [0, 1]); a standard feature-conditioning step.
2. FullTrainFoldSplitter (split / cv)
   <nirs4all.pipeline.execution.refit.executor._FullTrainFoldSplitter>
   Internal splitter: re-derived at fit time, not a stored transform.
3. StandardNormalVariate [2] (preprocessing)
   nirs4all.operators.transforms.scalers.StandardNormalVariate
   Per-spectrum centring and scaling that removes multiplicative scatter and baseline offset row by row.
4. StandardScaler [1] (target transform)
   sklearn.preprocessing._data.StandardScaler
   Centres to zero mean and scales to unit variance; applied to features or to the regression target.
5. PLSRegression [3] (model)
   sklearn.cross_decomposition._pls.PLSRegression
   Projects spectra onto a small number of latent variables that maximise covariance with the target, then regresses.
   Parameters: n_components=10, max_iter=100
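
To make the difference between steps 1 and 3 concrete, here is a minimal plain-Python sketch: min–max scaling works column by column (per wavelength), whereas SNV works row by row (per spectrum). The function names `min_max_scale` and `snv` are illustrative and are not nirs4all or scikit-learn APIs. Using the sample standard deviation (n − 1) in SNV is an assumption; the deposited artifacts remain the reference.

```python
from statistics import mean, stdev

def min_max_scale(X):
    """Rescale each column (wavelength) of X to the range [0, 1]."""
    cols = list(zip(*X))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    # Constant columns are mapped to 0.0 to avoid a division by zero.
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in X]

def snv(X):
    """Centre and scale each row (spectrum) by its own mean and standard deviation."""
    out = []
    for row in X:
        m, s = mean(row), stdev(row)  # assumption: sample SD (n - 1)
        out.append([(v - m) / s for v in row])
    return out

# Made-up toy spectra: the second row is the first one at twice the intensity.
X = [[1.0, 2.0, 4.0],
     [2.0, 4.0, 8.0],
     [3.0, 3.5, 5.0]]
X_mm = min_max_scale(X)   # every column now spans [0, 1]
X_snv = snv(X)            # rows 0 and 1 become (numerically) identical
```

The first two toy spectra differ only by a multiplicative factor, so SNV maps them to the same row. This is the scatter-removal property described for step 3.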

Protocol

Split: 5-fold cross-validation (out-of-fold predictions)
Cross-validation: 5-fold
Scoring metric: RMSE / R² / RPD
Fold strategy: weighted_average
Fold weights: fold 0: 1
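
One plausible reading of the `weighted_average` fold strategy is that per-fold model predictions are combined with normalised weights. The sketch below illustrates that reading only: the helper `weighted_average` and the dictionary layout are hypothetical, and the actual structure of `fold_weights.json` is not documented on this page.

```python
def weighted_average(fold_predictions, fold_weights):
    """Combine per-fold predictions into one prediction per sample.

    fold_predictions: {fold_id: [prediction for each sample]}
    fold_weights:     {fold_id: non-negative weight}
    """
    total = sum(fold_weights[f] for f in fold_predictions)
    if total <= 0:
        raise ValueError("fold weights must sum to a positive value")
    n = len(next(iter(fold_predictions.values())))
    combined = [0.0] * n
    for fold, preds in fold_predictions.items():
        w = fold_weights[fold] / total  # normalise so the weights sum to 1
        for i, p in enumerate(preds):
            combined[i] += w * p
    return combined

# A single fold of weight 1 (as listed above) returns that fold's predictions.
single = weighted_average({0: [11.2, 12.9]}, {0: 1})
# Two hypothetical folds weighted 3:1 (made-up numbers).
mixed = weighted_average({0: [10.0, 12.0], 1: [14.0, 16.0]}, {0: 3, 1: 1})
```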

Results (as published)

RMSE (5-fold CV): 1.76
R² (5-fold CV): 0.69
RPD (5-fold CV): 1.81
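
For readers unfamiliar with the three scores, the following sketch spells out conventional definitions. It assumes RMSE uses a divisor of n and RPD uses the sample standard deviation (n − 1) of the reference values; the page does not state which conventions the publisher applies. Under these assumptions R² = 1 − n / ((n − 1) · RPD²), which can be used as a sanity check on the published numbers.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error (divisor n)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    m = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - m) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rpd(y_true, y_pred):
    """Ratio of performance to deviation: SD of the reference values over RMSE."""
    n = len(y_true)
    m = sum(y_true) / n
    sd = math.sqrt(sum((t - m) ** 2 for t in y_true) / (n - 1))  # assumption: sample SD
    return sd / rmse(y_true, y_pred)

# Consistency check on the published values (n = 120 samples, RPD = 1.81).
n, published_rpd = 120, 1.81
implied_r2 = 1.0 - n / ((n - 1) * published_rpd ** 2)
```

With the published RPD the implied R² rounds to 0.69, which agrees with the reported value.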

Dataset

Dataset: Synthetic NIRS protein calibration set
Target: protein (% w/w)
Samples: 120
Wavelengths: 100
Note: Synthetic, deterministic (seed = 42). Included only to demonstrate the live replay; a real paper would reference a DOI-pinned nirs4all-datasets entry.
[Figure: included spectra. x-axis: wavelength in nm (ticks at 1000, 1750, 2500); y-axis ticks from 0.0948 to 0.779.]

Live replay: re-run this pipeline in your browser

ready pure-JS reference engine · NIPALS PLS · 5-fold OOF
📂 Run on your own dataset

Re-run this exact pipeline on your data, entirely in your browser. Upload a CSV (rows = samples; one column is the target, the rest are the spectrum) — for vendor spectra files, open the full nirs4all-web app. Nothing is uploaded to a server.
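
The CSV layout described above (one row per sample, one target column, remaining columns forming the spectrum) can be parsed as in the sketch below. It assumes a header row and a target column named `protein`; both are illustrative choices, and the page's own parsing rules are not specified here. The helper `load_spectra_csv` is hypothetical.

```python
import csv
import io

def load_spectra_csv(handle, target_column):
    """Split a CSV with a header row into column names, spectra X and target y."""
    reader = csv.DictReader(handle)
    if target_column not in (reader.fieldnames or []):
        raise KeyError(f"no column named {target_column!r}")
    feature_names = [c for c in reader.fieldnames if c != target_column]
    X, y = [], []
    for row in reader:
        y.append(float(row[target_column]))
        X.append([float(row[c]) for c in feature_names])
    return feature_names, X, y

# Made-up two-sample file; the header holds the target name and wavelengths.
demo = io.StringIO(
    "protein,1000,1015,1030\n"
    "11.2,0.41,0.43,0.47\n"
    "13.5,0.39,0.44,0.52\n"
)
wavelengths, X, y = load_spectra_csv(demo, target_column="protein")
```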

Full in-browser data app: nirs4all-web →

Re-runs the published preprocessing + model on the dataset under leakage-safe 5-fold cross-validation, recomputing out-of-fold predictions and scores entirely in your browser. Synthetic demonstration dataset (deterministic, seed=42, 120 samples) included so the deposited pipeline can be re-run live; no redistribution constraints.
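
"Leakage-safe" here means that everything learned from data (scaler parameters as well as the model) is fitted on the training folds only, and each sample is predicted by a model that never saw it. A minimal sketch of that loop follows. The contiguous fold assignment and the helper names are assumptions, since the page does not specify how its deterministic split is built. A mean predictor stands in for the real preprocessing and PLS model.

```python
def kfold_indices(n_samples, n_folds=5):
    """Deterministic, contiguous folds; returns a list of (train_idx, test_idx)."""
    sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
             for i in range(n_folds)]
    folds, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds

def out_of_fold_predict(X, y, fit, predict, n_folds=5):
    """Fit on each training fold only, then predict the held-out fold."""
    oof = [None] * len(y)
    for train, test in kfold_indices(len(y), n_folds):
        state = fit([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(test, predict(state, [X[i] for i in test])):
            oof[i] = p
    return oof

def fit_mean(X_train, y_train):
    """Stand-in model: remembers the mean of the training target."""
    return sum(y_train) / len(y_train)

def predict_mean(state, X_test):
    return [state] * len(X_test)

y = [float(v) for v in range(10)]
X = [[v] for v in y]
oof = out_of_fold_predict(X, y, fit_mean, predict_mean, n_folds=5)
```

In a real replay, `fit` would also estimate the min–max and target-scaling parameters from the training fold, so that no statistic of the held-out samples reaches the model.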

Approximate. This is an independent pure-JS reference engine (NIPALS PLS) with a deterministic 5-fold split — it demonstrates the pipeline, but does not reproduce the deposited run's exact PLS implementation or fold strategy, so these scores are close to, not identical to, the published values above. The exact pipeline is reproducible from the .n4a with the commands below.
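
For orientation, a single-target NIPALS PLS (PLS1) can be sketched in plain Python as below. This is a textbook formulation with deflation of both X and y. It is not the code of the page's JavaScript engine, nor of scikit-learn's `PLSRegression`, and it omits the per-feature scaling that scikit-learn applies by default. All names are illustrative.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pls1_nipals_fit(X, y, n_components):
    """PLS1 via NIPALS with deflation; returns the fitted model as a dict."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centred X residual
    f = [v - y_mean for v in y]                                # centred y residual
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = [dot([E[i][j] for i in range(n)], f) for j in range(m)]  # weights ~ E'f
        norm = dot(w, w) ** 0.5
        if norm < 1e-12:  # no covariance left to explain
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]                               # scores
        tt = dot(t, t)
        p = [dot([E[i][j] for i in range(n)], t) / tt for j in range(m)]  # X loadings
        q = dot(f, t) / tt                                           # y loading
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        W.append(w)
        P.append(p)
        Q.append(q)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "P": P, "Q": Q}

def pls1_predict(model, X):
    """Predict by replaying the deflation steps on each new spectrum."""
    preds = []
    for row in X:
        e = [v - mu for v, mu in zip(row, model["x_mean"])]
        y_hat = model["y_mean"]
        for w, p, q in zip(model["W"], model["P"], model["Q"]):
            t = dot(e, w)
            y_hat += q * t
            e = [v - t * pj for v, pj in zip(e, p)]
        preds.append(y_hat)
    return preds

# Made-up data: y is an exact linear function of three features.
X = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [2.0, 1.0, 0.0],
     [1.0, 3.0, 1.0], [3.0, 2.0, 2.0], [0.0, 2.0, 3.0]]
y = [2.0 * a - b + 0.5 * c + 1.0 for a, b, c in X]
model = pls1_nipals_fit(X, y, n_components=3)
fitted = pls1_predict(model, X)
```

With as many components as X has independent columns, PLS coincides with ordinary least squares, so the toy target is recovered up to rounding. Implementations can still differ in centring, scaling and convergence details, which is one reason two engines may report close but not identical scores.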

Methods

Spectra were preprocessed by min–max scaling[1], followed by standard normal variate (SNV)[2].

The regression target was standardised to zero mean and unit variance (StandardScaler)[1] prior to modelling.

A Partial Least Squares regression (PLS)[3] model with 10 latent variables was then calibrated and evaluated by 5-fold cross-validation. The fitted pipeline and per-fold artifacts are bundled in the deposited .n4a.

Bibliography (3 references)

  1. [1] Pedregosa, F. et al. (2011) Scikit-learn: Machine Learning in Python (implementation reference). Journal of Machine Learning Research 12, 2825–2830. https://jmlr.org/papers/v12/pedregosa11a.html
     Affine rescaling of each feature to a fixed range (default [0, 1]); a standard feature-conditioning step.
  2. [2] Barnes, R. J.; Dhanoa, M. S.; Lister, S. J. (1989) Standard Normal Variate Transformation and De-trending of Near-Infrared Diffuse Reflectance Spectra. Applied Spectroscopy 43(5), 772–777. doi:10.1366/0003702894202201
     Per-spectrum centring and scaling that removes multiplicative scatter and baseline offset row by row.
  3. [3] Wold, S.; Sjöström, M.; Eriksson, L. (2001) PLS-regression: a basic tool of chemometrics. Chemometrics and Intelligent Laboratory Systems 58(2), 109–130. doi:10.1016/S0169-7439(01)00155-1
     Projects spectra onto a small number of latent variables that maximise covariance with the target, then regresses.

Licensing

Manuscript: CC-BY-4.0
Reproduction bundle: CeCILL-2.1 OR AGPL-3.0-or-later
Dataset: synthetic, no constraints

Deposited manuscripts keep their publisher's copyright; datasets keep their own license / DOI terms. The reproduction code and page are dual-licensed open-source.

Provenance & reproduction (fingerprinted)

Pipeline UID: d0446b12-df2a-4b80-9d3d-4a95c9d4d700
Bundle fingerprint: 164b03cb496f078d06274494…
nirs4all version: 0.10.0
Created: 2026-06-14T09:06:00.936069+00:00
Source type: prediction
Source: nirs4all (examples/exports) @ D02_base_model
File                                            Bytes     SHA-256
artifacts/step_1_MinMaxScaler.joblib            43,699    295ae60c6ba6…
artifacts/step_3_StandardNormalVariate.joblib   137       15e0da1296ba…
artifacts/step_4_StandardScaler.joblib          623       c041e1476e15…
artifacts/step_5_fold0_PLSRegression.joblib     611,271   3f34996d180c…
fold_weights.json                               14        38220153bd4a…
manifest.json                                   302       1d6729649d53…
pipeline.json                                   561       17fb9f32e626…
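
The per-file digests listed above can be checked with the standard library once the bundle members are available as ordinary files. The sketch assumes they have been extracted to disk; the .n4a container format and the way the bundle fingerprint is derived from the members are not described on this page. Both helper names are hypothetical.

```python
import hashlib
import os
import tempfile

def sha256_of(path, chunk_size=1 << 16):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_listed_prefix(path, listed_prefix):
    """Compare a file against a truncated digest such as those listed above."""
    return sha256_of(path).startswith(listed_prefix.rstrip("…").lower())

# Self-contained demonstration on a temporary file (not a bundle member).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
demo_digest = sha256_of(tmp.name)
demo_ok = matches_listed_prefix(tmp.name, "ba7816bf8f01…")
os.unlink(tmp.name)
```

A matching truncated prefix is only a quick plausibility check; comparing against the full digest is the stronger test.
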
# 1. install the exact library version this bundle was produced with
pip install "nirs4all==0.10.0"

# 2. re-run the published pipeline on your own spectra X (n_samples x n_wavelengths)
from nirs4all.pipeline.bundle import BundleLoader
bundle = BundleLoader("model.n4a")
y_pred = bundle.predict(X)   # full preprocessing + CV ensemble + inverse target transform

Cite this

Cite the paper (below) for the science; cite this reproduction page / the deposited .n4a by its bundle fingerprint when referencing the exact pipeline.

@article{beurier2026,
  author = {Beurier, Gregory and {nirs4all ecosystem}},
  title = {Reproducible PLS calibration of protein content from near-infrared spectra},
  journal = {Demonstration bundle},
  year = {2026}
}
CITATION.cff
cff-version: 1.2.0
message: "If you use this reproduction bundle, please cite the associated paper."
title: "Reproducible PLS calibration of protein content from near-infrared spectra"
abstract: "A worked, end-to-end example of the nirs4all reproduction-document publisher. A partial least-squares model is calibrated on near-infrared spectra to predict protein content, using a min–max scaling and standard-normal-variate preprocessing chain. The deposited .n4a bundle carries the exact pipeline and fitted artifacts; this page re-derives the cross-validated scores live in the browser from an included synthetic dataset, and lists the literature for every method used."
authors:
  - family-names: "Beurier"
    given-names: "Gregory"
    affiliation: "CIRAD"
  - name: "nirs4all ecosystem"
identifiers:
  - type: other
    value: "nirs4all-bundle:164b03cb496f078d0627449461122de5b110ee0290159eec06f4527ff9850c7a"
    description: "Reproducibility bundle fingerprint (SHA-256), produced with nirs4all 0.10.0"
keywords:
  - "NIRS"
  - "PLS"
  - "preprocessing"
  - "reproducibility"
  - "chemometrics"
references:
  - type: article
    title: "Scikit-learn: Machine Learning in Python (implementation reference)"
    authors:
      - family-names: "Pedregosa"
        given-names: "F. et al."
    journal: "Journal of Machine Learning Research"
    volume: 12
    start: 2825
    end: 2830
    year: 2011
  - type: article
    title: "Standard Normal Variate Transformation and De-trending of Near-Infrared Diffuse Reflectance Spectra"
    authors:
      - family-names: "Barnes"
        given-names: "R. J."
      - family-names: "Dhanoa"
        given-names: "M. S."
      - family-names: "Lister"
        given-names: "S. J."
    journal: "Applied Spectroscopy"
    volume: 43
    issue: 5
    start: 772
    end: 777
    year: 1989
    doi: "10.1366/0003702894202201"
  - type: article
    title: "PLS-regression: a basic tool of chemometrics"
    authors:
      - family-names: "Wold"
        given-names: "S."
      - family-names: "Sjöström"
        given-names: "M."
      - family-names: "Eriksson"
        given-names: "L."
    journal: "Chemometrics and Intelligent Laboratory Systems"
    volume: 58
    issue: 2
    start: 109
    end: 130
    year: 2001
    doi: "10.1016/S0169-7439(01)00155-1"
Generated by n4a-papers 0.2.0 · 2026-06-16