PLATFORM-COMPARISON Benchmark Results

Comparing geospatial analysis workflows across models, backends, and cost dimensions.

Problems: 8
Gold Specs: 4
Total Runs: 22
Pareto-Optimal: 3

Workflow Comparison

Workflow  Model  Backend     Correct  Avg Cost   Avg Latency
exec      gold   folia-rust  7/7      ---        85ms
exec      gold   gee         8/8      $0.000757  13.6s
exec      gold   qgis        7/7      ---        321ms

Pareto Frontier: Cost vs Effectiveness

Points on the frontier (highlighted) represent configurations where no other option achieves more correct answers at equal or lower cost.
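
The frontier rule stated above can be sketched in a few lines. This is a minimal illustration, not the benchmark's actual implementation: the `Config` type, the `pareto_frontier` function, and the four sample configurations ("a" through "d") are hypothetical and do not come from the results on this page.

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    """Hypothetical record for one benchmark configuration."""
    name: str
    cost: float    # average cost per run, in dollars
    correct: int   # number of correct answers


def pareto_frontier(configs: List[Config]) -> List[Config]:
    """Keep each configuration for which no other configuration
    achieves more correct answers at equal or lower cost."""
    frontier = []
    for c in configs:
        dominated = any(
            o.correct > c.correct and o.cost <= c.cost
            for o in configs
            if o is not c
        )
        if not dominated:
            frontier.append(c)
    return frontier


# Illustrative values only (assumed, not measured).
configs = [
    Config("a", 0.0, 5),     # cheapest option, fewest correct
    Config("b", 0.001, 8),   # most correct answers
    Config("c", 0.002, 6),   # costs more than "b" with fewer correct
    Config("d", 0.001, 7),   # same cost as "b" with fewer correct
]

frontier_names = [c.name for c in pareto_frontier(configs)]
print(frontier_names)
```

In this sample, "c" and "d" drop off the frontier because "b" is more accurate at equal or lower cost, while "a" stays because nothing beats it at its cost. How a missing cost (shown as --- in the tables) should be treated is not specified here, so the sketch assumes every configuration has a numeric cost.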

Pass Rate by Category

Category  Passed  Total  Rate
unknown   22      22     100%

By Model

Model  Attempted  Valid Specs  Correct  Avg Token Cost  Avg Gen Latency
gold   22         0            22       ---             ---

By Backend

Backend     Runnable  Correct  Avg Exec Cost  Avg Latency
folia-rust  7         7        ---            85ms
gee         8         8        $0.000757      13.6s
qgis        7         7        ---            321ms

Problems

ID                   Title                                                     Difficulty    Category
harmonic-phenology   Harmonic NDVI Phenology (Kansas Cropland, 2020-2023)      intermediate  phenology
landsat-composite    Landsat 8 Annual Median Composite (Yellowstone, 2022)     intermediate  temporal-analysis
landtrendr-change    LandTrendr Forest Disturbance Detection (PNW, 1985-2023)  difficult     change-detection
morphological-urban  Morphological Urban Footprint Extraction (Phoenix)        intermediate  morphological
sentinel2-ndvi       Sentinel-2 NDVI (Iowa Farmland, Summer 2023)              easy          index-calculation
terrain-derivatives  Terrain Derivatives from SRTM (Slope/Aspect/Hillshade)    easy          terrain-analysis
weighted-overlay     Solar Siting Weighted Overlay (Utah)                      intermediate  siting-analysis
zonal-stats          Mean Elevation by County (Colorado)                       easy          zonal-analysis