A worked, end-to-end example of the nirs4all reproduction-document publisher. A partial least-squares model is calibrated on near-infrared spectra to predict protein content, using a min–max scaling and standard-normal-variate preprocessing chain. The deposited .n4a bundle carries the exact pipeline and fitted artifacts; this page re-derives the cross-validated scores live in the browser from an included synthetic dataset, and lists the literature for every method used.
n_components=10, max_iter=100

| Split | 5-fold cross-validation (out-of-fold predictions) |
|---|---|
| Cross-validation | 5-fold |
| Scoring metric | RMSE / R² / RPD |
| Fold strategy | weighted_average |
| Fold weights | fold 0: 1 |
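
The three scoring metrics listed above can be sketched as follows. This is a minimal illustration, not the publisher's own scoring code: the function name `regression_scores` is hypothetical, and RPD is taken here as the sample standard deviation of the reference values (n − 1 denominator) divided by RMSE, which is one common convention and may differ from the one used for the published values.

```python
import numpy as np


def regression_scores(y_true, y_pred):
    """Return RMSE, R² and RPD for a set of (out-of-fold) predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    # Assumption: RPD = sample SD of the reference values / RMSE.
    rpd = float(np.std(y_true, ddof=1)) / rmse
    return {"rmse": rmse, "r2": r2, "rpd": rpd}


# Tiny made-up example, not data from the bundle.
scores = regression_scores([10.0, 12.0, 14.0, 16.0], [10.5, 11.5, 14.5, 15.5])
print(scores)
```

Computing the scores on pooled out-of-fold predictions, rather than averaging per-fold scores, keeps a single value per metric for the whole calibration set.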
| Dataset | Synthetic NIRS protein calibration set |
|---|---|
| Target | protein (% w/w) |
| Samples | 120 |
| Wavelengths | 100 |
| Note | Synthetic, deterministic (seed = 42). Included only to demonstrate the live replay; a real paper would reference a DOI-pinned nirs4all-datasets entry. |
Re-run this exact pipeline on your data, entirely in your browser. Select a CSV (rows = samples; one column is the target, the rest are the spectrum). For vendor spectra files, open the full nirs4all-web app. The file is processed locally; nothing is sent to a server.
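The CSV layout described above (rows = samples, one target column, remaining columns = spectrum) can be split into spectra and target along these lines. This is a sketch under assumptions: the page's own parser is not shown, the helper `load_spectra_csv` is hypothetical, and a header row with a target column named `protein` is assumed purely for illustration.

```python
import csv
import io


def load_spectra_csv(handle, target_column):
    """Split a CSV into spectra X (one row per sample) and target values y."""
    reader = csv.DictReader(handle)
    # Every column that is not the target is treated as one wavelength.
    spectral_columns = [c for c in reader.fieldnames if c != target_column]
    X, y = [], []
    for row in reader:
        y.append(float(row[target_column]))
        X.append([float(row[c]) for c in spectral_columns])
    return X, y, spectral_columns


# Made-up two-sample file; the column names are assumptions.
demo = io.StringIO(
    "protein,1100,1102,1104\n"
    "12.5,0.41,0.42,0.44\n"
    "9.8,0.38,0.39,0.40\n"
)
X, y, wavelengths = load_spectra_csv(demo, target_column="protein")
print(len(X), "samples x", len(wavelengths), "wavelengths")
```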
Re-runs the published preprocessing + model on the dataset under leakage-safe 5-fold cross-validation, recomputing out-of-fold predictions and scores entirely in your browser. Synthetic demonstration dataset (deterministic, seed=42, 120 samples) included so the deposited pipeline can be re-run live; no redistribution constraints.
Approximate. This is an independent pure-JS reference engine
(NIPALS PLS) with a deterministic 5-fold split — it demonstrates the pipeline, but does not reproduce
the deposited run's exact PLS implementation or fold strategy, so these scores are close to, not
identical to, the published values above. The exact pipeline is reproducible from the
.n4a with the commands below.
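To make the idea of a NIPALS-style PLS engine with a deterministic split concrete, here is a minimal single-target (PLS1) sketch in NumPy. It is not the page's JavaScript engine, whose source is not shown here. The function names are hypothetical, and the interleaved fold assignment (`index % 5`) is an assumption chosen only because it is deterministic.

```python
import numpy as np


def pls1_nipals_fit(X, y, n_components):
    """Fit single-target PLS by NIPALS deflation; return (coef, intercept)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean

    n_features = X.shape[1]
    W = np.zeros((n_features, n_components))  # weight vectors
    P = np.zeros((n_features, n_components))  # X loadings
    q = np.zeros(n_components)                # y loadings
    for a in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)
        t = Xk @ w                            # scores
        tt = t @ t
        p = Xk.T @ t / tt
        q[a] = yk @ t / tt
        Xk = Xk - np.outer(t, p)              # deflate X
        yk = yk - q[a] * t                    # deflate y
        W[:, a], P[:, a] = w, p

    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, y_mean - x_mean @ coef


def out_of_fold_predictions(X, y, n_components, n_folds=5):
    """Deterministic interleaved folds (assumed scheme); refit per fold."""
    fold_ids = np.arange(len(y)) % n_folds
    oof = np.empty(len(y))
    for k in range(n_folds):
        test = fold_ids == k
        coef, intercept = pls1_nipals_fit(X[~test], y[~test], n_components)
        oof[test] = X[test] @ coef + intercept
    return oof


# Made-up data: a nearly linear target on 8 random features.
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 8))
y = X @ rng.normal(size=8) + 0.01 * rng.normal(size=60)

coef, intercept = pls1_nipals_fit(X, y, n_components=8)
# Sanity check: with as many components as features, PLS1 matches least squares.
ols_coef = np.linalg.lstsq(X - X.mean(axis=0), y - y.mean(), rcond=None)[0]
oof = out_of_fold_predictions(X, y, n_components=8)
```

Two PLS implementations can agree on the model family and still differ in component sign conventions, convergence criteria, and fold assignment, which is one reason a reference engine gives close but not identical scores.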
Spectra were preprocessed by Min–max scaling[1], then Standard Normal Variate (SNV)[2].
The regression target was also rescaled by min–max scaling[1] prior to modelling.
A Partial Least Squares (PLS) regression[3] model with 10 latent variables was then calibrated and evaluated by 5-fold cross-validation. The fitted pipeline and per-fold artifacts are bundled in the deposited .n4a.
Deposited manuscripts keep their publisher's copyright; datasets keep their own license / DOI terms. The reproduction code and page are dual-licensed open-source.
| Pipeline UID | d0446b12-df2a-4b80-9d3d-4a95c9d4d700 |
|---|---|
| Bundle fingerprint | 164b03cb496f078d06274494… |
| nirs4all version | 0.10.0 |
| Created | 2026-06-14T09:06:00.936069+00:00 |
| Source type | prediction |
| Source | nirs4all (examples/exports) @ D02_base_model |
| File | Bytes | SHA-256 |
|---|---|---|
| artifacts/step_1_MinMaxScaler.joblib | 43,699 | 295ae60c6ba6… |
| artifacts/step_3_StandardNormalVariate.joblib | 137 | 15e0da1296ba… |
| artifacts/step_4_StandardScaler.joblib | 623 | c041e1476e15… |
| artifacts/step_5_fold0_PLSRegression.joblib | 611,271 | 3f34996d180c… |
| fold_weights.json | 14 | 38220153bd4a… |
| manifest.json | 302 | 1d6729649d53… |
| pipeline.json | 561 | 17fb9f32e626… |
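
A file from the bundle can be checked against the table above by recomputing its SHA-256. The sketch below assumes the bundle's files are already available on disk (how a .n4a is unpacked is not described here). Both helper names are hypothetical. Because the table shows truncated digests, the comparison is on the listed prefix only.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_digest(path, listed_prefix):
    """Compare against a (possibly truncated) digest such as '295ae60c6ba6…'."""
    prefix = listed_prefix.rstrip("…").lower()
    return sha256_of_file(path).startswith(prefix)


# Self-contained demonstration on a throwaway file, not a bundle artifact.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"abc")
    demo_digest = sha256_of_file(sample)
    demo_match = matches_listed_digest(sample, "ba7816bf8f01…")
```

A prefix match is weaker evidence than a full-digest match, so compare against the complete digest whenever it is available.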
    # 1. install the exact library version this bundle was produced with
    pip install "nirs4all==0.10.0"

    # 2. re-run the published pipeline on your own spectra X (n_samples x n_wavelengths)
    from nirs4all.pipeline.bundle import BundleLoader
    bundle = BundleLoader("model.n4a")
    y_pred = bundle.predict(X)  # full preprocessing + CV ensemble + inverse target transform
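The "CV ensemble" step combines per-fold predictions under the `weighted_average` fold strategy. How nirs4all implements this internally is not shown here, so the following is only the generic definition of a weighted average over fold models, with a hypothetical function name and made-up numbers.

```python
import numpy as np


def weighted_average_ensemble(fold_predictions, fold_weights):
    """Combine per-fold predictions (n_folds x n_samples) into one vector."""
    preds = np.asarray(fold_predictions, dtype=float)
    weights = np.asarray(fold_weights, dtype=float)
    return np.average(preds, axis=0, weights=weights)


# With a single fold of weight 1, the ensemble is that fold's prediction.
single = weighted_average_ensemble([[11.2, 13.4]], [1.0])

# Two made-up folds weighted 1 : 3.
combined = weighted_average_ensemble([[10.0, 20.0], [14.0, 24.0]], [1.0, 3.0])
```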
Cite the paper (below) for the science; cite this reproduction page / the deposited .n4a by its bundle fingerprint when referencing the exact pipeline.
@article{beurier2026,
author = {Gregory Beurier and nirs4all ecosystem},
title = {Reproducible PLS calibration of protein content from near-infrared spectra},
journal = {Demonstration bundle},
year = {2026}
}
cff-version: 1.2.0
message: "If you use this reproduction bundle, please cite the associated paper."
title: "Reproducible PLS calibration of protein content from near-infrared spectra"
abstract: "A worked, end-to-end example of the nirs4all reproduction-document publisher. A partial least-squares model is calibrated on near-infrared spectra to predict protein content, using a min–max scaling and standard-normal-variate preprocessing chain. The deposited .n4a bundle carries the exact pipeline and fitted artifacts; this page re-derives the cross-validated scores live in the browser from an included synthetic dataset, and lists the literature for every method used. "
authors:
  - family-names: "Beurier"
    given-names: "Gregory"
    affiliation: "CIRAD"
  - name: "nirs4all ecosystem"
identifiers:
  - type: other
    value: "nirs4all-bundle:164b03cb496f078d0627449461122de5b110ee0290159eec06f4527ff9850c7a"
    description: "Reproducibility bundle fingerprint (SHA-256), produced with nirs4all 0.10.0"
keywords:
  - "NIRS"
  - "PLS"
  - "preprocessing"
  - "reproducibility"
  - "chemometrics"
references:
  - type: article
    title: "Scikit-learn: Machine Learning in Python (implementation reference)"
    authors:
      - family-names: "Pedregosa"
        given-names: "F. et al."
    journal: "Journal of Machine Learning Research 12, 2825–2830"
    year: 2011
  - type: article
    title: "Standard Normal Variate Transformation and De-trending of Near-Infrared Diffuse Reflectance Spectra"
    authors:
      - family-names: "Barnes"
        given-names: "R. J."
      - family-names: "Dhanoa"
        given-names: "M. S."
      - family-names: "Lister"
        given-names: "S. J."
    journal: "Applied Spectroscopy 43(5), 772–777"
    year: 1989
    doi: "10.1366/0003702894202201"
  - type: article
    title: "PLS-regression: a basic tool of chemometrics"
    authors:
      - family-names: "Wold"
        given-names: "S."
      - family-names: "Sjöström"
        given-names: "M."
      - family-names: "Eriksson"
        given-names: "L."
    journal: "Chemometrics and Intelligent Laboratory Systems 58(2), 109–130"
    year: 2001
    doi: "10.1016/S0169-7439(01)00155-1"