Calibrated from a GEE execution on 2026-03-29 (actual: 87.18%). A Random Forest with 50 trees is trained on ESA WorldCover pseudo-labels (4 classes: forest, crop, urban, water). Some cross-platform variation is expected from differences in RF implementations, training-sample randomness, and cloud masking: GEE uses SmileRandomForest, while folia uses scikit-learn or a Rust RF implementation.
| Workflow | Model | Backend | Status | Answer (% overall accuracy) | Rel. error (vs. 87%) | Cost | Latency |
|---|---|---|---|---|---|---|---|
| exec | gold | folia-rust | PASS | 73.475 | 15.5% | --- | 212ms |
| exec | gold | gee | PASS | 87.175 | 0.2% | $0.001967 | 35.4s |
| exec | gold | qgis | PASS | 85.0 | 2.3% | --- | 1.0s |
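The error values in the table are consistent with relative error against the ~87% expected answer, not against the 87.18% calibration run. The following quick check assumes that definition; `relative_error` and `EXPECTED_OA` are illustrative names, not part of folia.

```python
# Assumption: Error = |answer - expected| / expected, with expected = 87.0 (% overall accuracy).
EXPECTED_OA = 87.0

answers = {"folia-rust": 73.475, "gee": 87.175, "qgis": 85.0}


def relative_error(answer: float, expected: float = EXPECTED_OA) -> float:
    """Relative error of a reported accuracy, expressed in percent."""
    return abs(answer - expected) / expected * 100.0


for backend, answer in answers.items():
    print(f"{backend}: {relative_error(answer):.1f}%")
```

Using 87.18 as the reference instead would give 15.7% for folia-rust, which does not match the table.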
Known-correct folia spec for this problem. This is the reference implementation used for backend quality testing.
# Platform Comparison: Supervised Classification -- Gold Spec
#
# Train a Random Forest classifier on Sentinel-2 imagery using
# ESA WorldCover as pseudo-labels. Sacramento Valley, CA.
# Ground truth: ~87% overall accuracy (calibrated from GEE 2026-03-29).
#
# Note: Training is a Python-only step (scikit-learn or GEE's RF).
# The Rust/WASM compute layer handles inference only.
name: supervised-classification
version: "1.0"
description: >
  Supervised land cover classification using Random Forest trained on
  Sentinel-2 bands with ESA WorldCover pseudo-labels. 4 classes:
  forest, crop, urban, water. Sacramento Valley, CA.
  Ground truth: ~87% overall accuracy (calibrated from GEE 2026-03-29).
settings:
  default_bbox: [-121.8, 38.4, -121.3, 38.8]
  default_crs: EPSG:4326
layers:
  # ============================================================
  # SOURCE LAYERS
  # ============================================================
  source/sentinel2:
    uri: stac://earth-search/sentinel-2-l2a
    type: raster
    description: >
      Sentinel-2 L2A scenes from summer 2023, bands B2, B3, B4, B8, B11, B12.
      Sacramento Valley, CA.
    params:
      bbox: [-121.8, 38.4, -121.3, 38.8]
      datetime: "2023-06-01/2023-09-01"
      bands: [B2, B3, B4, B8, B11, B12]
  source/worldcover:
    uri: stac://earth-search/esa-worldcover
    type: raster
    description: >
      ESA WorldCover v200 (10 m) as training labels.
    params:
      bbox: [-121.8, 38.4, -121.3, 38.8]
  # ============================================================
  # COMPUTE: PREPROCESSING
  # ============================================================
  compute/s2-masked:
    type: raster
    description: >
      Cloud-masked Sentinel-2 imagery.
    compute:
      op: cloud_mask_sentinel2
      inputs:
        data: { layer: source/sentinel2 }
  compute/s2-composite:
    type: raster
    description: >
      Median composite of cloud-masked S2 imagery.
    compute:
      op: temporal_reduce
      inputs:
        data: { layer: compute/s2-masked }
      params:
        reducer: median
  compute/labels:
    type: raster
    description: >
      WorldCover remapped to 4 classes: 1=forest, 2=crop, 3=urban, 4=water.
    compute:
      op: raster_calc
      inputs:
        lc: { layer: source/worldcover }
      params:
        expression: >
          where(lc == 10, 1,
          where(lc == 40, 2,
          where(lc == 50, 3,
          where(lc == 80, 4, 0))))
  # ============================================================
  # COMPUTE: CLASSIFICATION
  # ============================================================
  compute/classified:
    type: raster
    description: >
      Random Forest classification result (4 classes).
      Training uses a stratified sample from WorldCover labels.
    compute:
      op: classify_rf
      inputs:
        features: { layer: compute/s2-composite }
        labels: { layer: compute/labels }
      params:
        n_trees: 50
        training_points: 500
        bands: [B2, B3, B4, B8, B11, B12]
        class_band: class
        stratified: true
  # ============================================================
  # RESULT: CLASSIFICATION ACCURACY
  # ============================================================
  result/accuracy:
    type: table
    description: >
      Overall classification accuracy vs WorldCover labels, computed over
      pixels that have a class label (labels > 0).
      Ground truth: ~87%.
    compute:
      op: raster_calc
      inputs:
        classified: { layer: compute/classified }
        labels: { layer: compute/labels }
      params:
        expression: "mean(where(labels > 0, classified == labels, nan))"
The prompt given to LLMs in single-shot workflow benchmarks.
Problem: Train a Random Forest classifier on Sentinel-2 summer 2023
imagery using ESA WorldCover as pseudo-labels, then report overall
classification accuracy.
Methodology:
- Sentinel-2 summer 2023 median composite (bands B2,B3,B4,B8,B11,B12)
- ESA WorldCover v200 remapped to 4 classes:
  1=forest(10), 2=crop(40), 3=urban(50), 4=water(80)
- 500 stratified training points per class
- Random Forest with 50 trees
- 1000 stratified validation points
- Report overall accuracy (% correct vs WorldCover labels)
Study area: -121.8, 38.4, -121.3, 38.8 (Sacramento Valley, CA).
Data: Sentinel-2 L2A Harmonized + ESA WorldCover v200.
Expected answer: approximately 87% overall accuracy.
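The methodology in the prompt maps onto a short scikit-learn pipeline. This sketch assumes the composite has already been flattened to an `(n_pixels, n_bands)` feature matrix with a matching label vector (0 = unlabeled). `stratified_sample` and `train_and_validate` are hypothetical helpers. The sketch draws training and validation points from disjoint pixels, which the prompt does not state explicitly. It reads "500 stratified training points per class" literally and splits the 1000 validation points evenly across the 4 classes (250 each). GEE's sampling and folia's `classify_rf` may allocate points differently, which is one source of the cross-platform variation noted at the top of this page.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def stratified_sample(labels, per_class, rng, exclude=None):
    """Hypothetical helper: up to `per_class` random pixel indices for every class > 0."""
    available = np.ones(labels.size, dtype=bool)
    if exclude is not None:
        available[exclude] = False  # keep these pixels out of the sample (e.g. training points)
    picks = []
    for cls in np.unique(labels[labels > 0]):
        idx = np.flatnonzero((labels == cls) & available)
        picks.append(rng.choice(idx, size=min(per_class, idx.size), replace=False))
    return np.concatenate(picks)


def train_and_validate(features, labels, n_trees=50, n_train=500, n_val=1000, seed=0):
    """Train an RF (50 trees by default) on stratified points; return held-out OA in %.

    features: (n_pixels, n_bands) array, e.g. B2, B3, B4, B8, B11, B12 from the composite
    labels:   (n_pixels,) array of classes 1-4, with 0 for unlabeled pixels
    """
    rng = np.random.default_rng(seed)
    train_idx = stratified_sample(labels, n_train, rng)  # n_train points per class
    n_classes = np.unique(labels[labels > 0]).size
    val_idx = stratified_sample(labels, n_val // n_classes, rng, exclude=train_idx)
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    rf.fit(features[train_idx], labels[train_idx])
    predicted = rf.predict(features[val_idx])
    return 100.0 * float(np.mean(predicted == labels[val_idx]))


# Synthetic demo: 6 "bands", 4 well-separated classes plus unlabeled (0) pixels.
rng = np.random.default_rng(42)
labels = rng.integers(0, 5, size=20_000)
features = rng.normal(size=(labels.size, 6)) + 2.0 * labels[:, None]
print(f"OA: {train_and_validate(features, labels):.1f}%")
```

Changing `seed` alters both the sampled points and the forest, so repeated runs on real imagery will scatter around the calibrated value rather than reproduce 87.18% exactly.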