AgriTwin-GH

Greenhouse Weather Forecast Model (Chronos + XGBoost + LSTM Ensemble)

Who is this for?
Anyone — farmer, student, developer, or complete beginner — who wants to understand what the weather_forecast.ipynb notebook does, why it matters, and how to use the final model in practice. No maths or machine-learning background is required.


Table of Contents

  1. Big Picture: What Problem Are We Solving?
  2. What Exactly Does the Model Predict?
  3. Where Does the Data Come From?
  4. Step-by-Step Pipeline Overview
  5. Feature Engineering (Turning Raw Weather Into Inputs)
  6. The Three Model Families
  7. Ensemble: Optuna-Optimised Weight Blending
  8. Conditions Classifier (Sky Condition Labels)
  9. What Gets Saved After Training?
  10. How to Use the Final Model for Inference
  11. Key Metrics and Performance
  12. Architecture Summary & Design Decisions
  13. Standalone Test Suite: test_weather_forecast.py
  14. Glossary

1. Big Picture: What Problem Are We Solving?

Inside a controlled greenhouse, weather is not just outside — it is also inside:

If we can predict the next 24–48 hours of indoor conditions, we can:

The weather_forecast.ipynb notebook builds a multi-model ensemble forecasting system for the greenhouse environment. It does not just guess tomorrow’s value from thin air — it learns from historical sensor data (2024–2025 observations for Dindigul, Tamil Nadu) using three complementary machine-learning approaches: Chronos (pretrained foundation model), XGBoost (tree-based), and LSTM (recurrent neural network).


2. What Exactly Does the Model Predict?

The model predicts future values of several indoor climate variables at two time horizons:

Target variables:

| Variable | Unit | Type |
|---|---|---|
| temp | °C | Continuous |
| humidity | % | Continuous |
| windspeed | km/h | Continuous |
| solarradiation | W/m² | Continuous |
| conditions | Label | Categorical (e.g. “Sunny”, “Cloudy”) |

For each numeric target variable and horizon, the final system outputs a point forecast (single predicted value). Optionally, the conditions variable is predicted as a discrete label via a separate classifier.

Example output:

temp:           24h = 28.3°C,  48h = 29.1°C
humidity:       24h = 67.8%,   48h = 65.2%
windspeed:      24h = 4.2 km/h, 48h = 3.8 km/h
solarradiation: 24h = 450 W/m², 48h = 480 W/m²
conditions_24h: "Partly Cloudy"
conditions_48h: "Sunny"

3. Where Does the Data Come From?

The notebook assumes that you have a historical time series of greenhouse indoor conditions, for example:

For this project:

This historical dataset is split into three parts:

  1. Training set (70%) – earlier part of the history the models learn from.
  2. Validation set (15%) – a slice used to tune hyperparameters, optimise ensemble weights, and prevent overfitting.
  3. Test set (15%) – the last portion of history used only to check final performance (held out until the very end).

The notebook builds features from these time series and feeds them into the models described below.
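The chronological split can be sketched in a few lines (a minimal illustration; the function name and exact fractions are ours, not lifted from the notebook):

```python
# Sketch: chronological 70/15/15 split. Time series must NOT be shuffled,
# otherwise the models would "see the future" during training.
def chrono_split(n_rows, train_frac=0.70, val_frac=0.15):
    """Return slices for a chronological train/val/test split."""
    train_end = int(n_rows * train_frac)
    val_end = train_end + int(n_rows * val_frac)
    return slice(0, train_end), slice(train_end, val_end), slice(val_end, n_rows)

train_idx, val_idx, test_idx = chrono_split(400)
# earliest 280 rows -> train, next 60 -> validation, last 60 -> test
```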


4. Step-by-Step Pipeline Overview

At a high level, the notebook does the following:

  1. Load and clean raw weather data
    • Read indoor greenhouse measurements (2024–2025 Dindigul data)
    • Handle missing values and ensure a continuous timeline
    • Extract temporal features (month, day-of-year, etc.)
  2. Engineer features that help models understand seasonality, trends, and interactions:
    • Cyclical time encodings (sin/cos)
    • Lag features (past 1, 2, 3, 7, 14, 30 days)
    • Rolling statistics (7/14/30-day mean, std, min, max)
    • Dindigul seasonal flags
    • Climate normals and anomalies
    • Solar geometry features
    • Interaction terms (e.g. temperature × humidity)
  3. Prepare three types of models in parallel:
    • Chronos-T5-small – pretrained time-series transformer
    • XGBoost – gradient-boosted tree ensemble (8 models for 4 targets × 2 horizons)
    • LSTM – stacked recurrent network (1 model per target)
    • Random Forest classifier – maps numeric forecasts → sky condition labels
  4. Train each model family on the training set:
    • Chronos: warm-up (frozen encoder) → full fine-tune
    • XGBoost: warm-up (shallow) → fine-tune (deep) with early stopping
    • LSTM: sliding-window dataset, Huber loss, gradient clipping, early stopping
    • RF classifier: balanced class weighting
  5. Optimise ensemble weights via Bayesian search (Optuna):
    • For each (target, horizon) pair, find optimal blend of Chronos + XGBoost + LSTM
    • Minimises validation MAPE
    • 300 trials per combination
  6. Evaluate performance on the test set and generate plots

  7. Save all necessary artefacts for realtime use:
    • Scalers, encoders, feature configurations
    • All model weights (Chronos, XGBoost, LSTM)
    • Ensemble weights and evaluation metrics
    • A ready-to-use Python loader for inference
  8. Clean up intermediate files so only the final realtime bundle and required artefacts remain

5. Feature Engineering (Turning Raw Weather Into Inputs)

Raw numbers alone (“temperature = 26.3 °C”) do not directly capture:

To help the models, the notebook creates several feature types:

5.1 Cyclical Time Features

The sine/cosine encoding captures the circular nature of time (month 12 is next to month 1).
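A sine/cosine pair for the month can be computed like this (a minimal sketch; the notebook's exact column names may differ):

```python
import math

def encode_month(month):
    """Place month 1-12 on the unit circle so December (12) sits next to January (1)."""
    angle = 2 * math.pi * (month - 1) / 12
    return math.sin(angle), math.cos(angle)

# In this encoding the Dec->Jan distance equals the Jan->Feb distance,
# which a plain "month number" feature (12 vs 1) gets badly wrong.
```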

5.2 Dindigul Seasonal Flags

For the Dindigul region, the year is split into four distinct seasons:

| Season | Months | Character |
|---|---|---|
| Hemant (Winter) | Jan–Feb | Cool, dry, post-NE monsoon tail |
| Grishma (Summer) | Mar–May | Hot, low humidity, pre-monsoon |
| Varsha (SW Monsoon) | Jun–Sep | High humidity, moderate rain |
| Sharad (NE Monsoon) | Oct–Dec | Rain peaks, humid |

One-hot encoded flags tell the model which season each day belongs to.
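A hypothetical month-to-season mapping matching the table above (the dict and flag names are illustrative, not necessarily the notebook's):

```python
# Hypothetical month -> season mapping, mirroring the seasonal table above.
SEASON_BY_MONTH = {1: "hemant", 2: "hemant",
                   3: "grishma", 4: "grishma", 5: "grishma",
                   6: "varsha", 7: "varsha", 8: "varsha", 9: "varsha",
                   10: "sharad", 11: "sharad", 12: "sharad"}

def season_one_hot(month):
    """One-hot flags telling the model which Dindigul season a date falls in."""
    return {f"is_{s}": int(SEASON_BY_MONTH[month] == s)
            for s in ("hemant", "grishma", "varsha", "sharad")}
```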

5.3 Lag Features

The model includes lagged versions of target variables:

These help capture autocorrelation — the fact that today’s temperature is usually similar to yesterday’s or last week’s.
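In pandas, lag features are simply shifted copies of a column (a sketch; the generated column names are illustrative):

```python
import pandas as pd

def add_lag_features(df, col, lags=(1, 2, 3, 7, 14, 30)):
    """Append `col` as it was 1, 2, ... days ago; the earliest rows become NaN."""
    out = df.copy()
    for lag in lags:
        out[f"{col}_lag_{lag}"] = out[col].shift(lag)
    return out

df = add_lag_features(pd.DataFrame({"temp": [20.0, 21.0, 22.0, 23.0]}),
                      "temp", lags=(1, 2))
# row 3: temp=23.0, temp_lag_1=22.0 (yesterday's value), temp_lag_2=21.0
```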

5.4 Rolling Statistics

To capture local trends and volatility, the notebook computes:

These tell the model if the climate has been gradually warming, cooling, or becoming more variable.
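Rolling statistics follow the same pattern (a sketch using pandas `rolling`; column names are illustrative):

```python
import pandas as pd

def add_rolling_stats(df, col, windows=(7, 14, 30)):
    """Rolling mean/std/min/max over each window: local trend and volatility signals."""
    out = df.copy()
    for w in windows:
        roll = out[col].rolling(window=w, min_periods=w)
        out[f"{col}_roll{w}_mean"] = roll.mean()
        out[f"{col}_roll{w}_std"] = roll.std()
        out[f"{col}_roll{w}_min"] = roll.min()
        out[f"{col}_roll{w}_max"] = roll.max()
    return out

df = add_rolling_stats(
    pd.DataFrame({"temp": [20.0, 22.0, 24.0, 26.0, 28.0, 30.0, 32.0]}),
    "temp", windows=(7,))
# the 7th row sees a full window: mean 26.0, min 20.0, max 32.0
```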

5.5 Climate Normals and Anomalies

The notebook builds climatological baselines:

For each day, the model can compute an anomaly:

anomaly = actual_value - typical_value_for_this_time_of_year

Because plants and disease risk often depend on deviations from normal, not just absolute values.
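In code, the anomaly is just a subtraction against a stored normal (the numbers below are illustrative, not the actual Dindigul normals):

```python
# Illustrative monthly normals (degrees C); the real values live in climate_normals.json.
MONTHLY_NORMAL_TEMP = {5: 33.1, 6: 30.4, 12: 24.8}

def temp_anomaly(actual, month):
    """Positive = warmer than is typical for this time of year."""
    return actual - MONTHLY_NORMAL_TEMP[month]

# A 32.4 C reading in June is a +2.0 C anomaly against a 30.4 C normal.
```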

5.6 Solar Geometry & Chronos Meta-Features

Additional derived features:

5.7 Volatility Features (NEW - For Sparse/Erratic Variables)

For variables like windspeed and humidity that exhibit high volatility, additional derived features capture momentum and regime changes:

These features help models distinguish between genuine predictable patterns and random noise, improving R² for difficult variables.

All engineered features are stored in a configuration file (feature_config.json) so inference code can reproduce them.


6. The Three Model Families

The notebook uses three different forecasting approaches and later blends them. Each has complementary strengths:

6.1 Chronos-T5-Small: Pretrained Time-Series Foundation Model

What is Chronos?

How it works conceptually:

Two-Phase Fine-Tuning Strategy

  1. Warm-up phase (5 epochs, frozen encoder):
    • Freeze the encoder (pretrained knowledge is fixed)
    • Train only the decoder and output projection head
    • Uses learning rate 1e-3 (moderate)
    • Stabilises training, prevents catastrophic forgetting
  2. Full fine-tune phase (10 epochs, all layers):
    • Unfreeze all parameters
    • Lower learning rate to 1e-4 (more careful updates)
    • Monitor validation MAPE to detect overfitting
    • Restore best checkpoint at the end

Predictions

For each target variable, Chronos sees a 30-day context window and forecasts the next 2 steps (24h, 48h). These predictions are:


6.2 XGBoost: Gradient-Boosted Direct Multi-Step Forecaster (Per-Column Regularization)

What is XGBoost?

Direct Multi-Step Strategy

Instead of predicting one step at a time (which accumulates errors), we train one model per (target, horizon) pair:
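Building labels for this strategy is one shifted column per horizon: each row gets paired with the value h days after it (a sketch; the column names are ours):

```python
import pandas as pd

def make_direct_targets(df, col, horizons=(1, 2)):
    """One label column per horizon (1 day = 24h, 2 days = 48h), so each
    (target, horizon) model trains independently -- no forecast is fed back."""
    out = df.copy()
    for h in horizons:
        out[f"{col}_t_plus_{h}d"] = out[col].shift(-h)
    return out

df = make_direct_targets(pd.DataFrame({"temp": [25.0, 26.0, 27.0, 28.0]}), "temp")
# row 0 labels: temp_t_plus_1d=26.0, temp_t_plus_2d=27.0; tail rows become NaN
```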

Per-Column Variable-Specific Hyperparameters

Key innovation: different variables get different regularization, because not all variables are equally prone to overfitting:

| Variable | max_depth | λ (reg_lambda) | α (reg_alpha) | min_child_weight | Rationale |
|---|---|---|---|---|---|
| Temperature | 4 | 2.0 | 0.1 | 5 | Stable; standard regularisation |
| Humidity | 2 | 6.0 | 0.5 | 15 | Very volatile; strong regularisation |
| Windspeed | 1 | 20.0 | 2.0 | 15 | Extremely sparse; ultra-aggressive regularisation |
| Solar Radiation | 4 | 2.0 | 0.1 | 5 | Stable; standard regularisation |
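Expressed as a configuration dict, the per-column settings might look like this (the dict name is illustrative; the values are the ones from the table):

```python
# Per-column XGBoost regularisation, as in the table above (dict name illustrative).
XGB_COLUMN_PARAMS = {
    "temp":           {"max_depth": 4, "reg_lambda": 2.0,  "reg_alpha": 0.1, "min_child_weight": 5},
    "humidity":       {"max_depth": 2, "reg_lambda": 6.0,  "reg_alpha": 0.5, "min_child_weight": 15},
    "windspeed":      {"max_depth": 1, "reg_lambda": 20.0, "reg_alpha": 2.0, "min_child_weight": 15},
    "solarradiation": {"max_depth": 4, "reg_lambda": 2.0,  "reg_alpha": 0.1, "min_child_weight": 5},
}
```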

Rationale for Aggressive Windspeed Regularisation:

Two-Phase Training for Small Data

The dataset is limited (~300 useful samples after feature engineering), so XGBoost uses:

  1. Warm-up phase:
    • Shallow trees (max_depth=3, uniform for all)
    • Few boosting rounds (200 estimators)
    • Higher learning rate (0.10)
    • Quick convergence to a warm baseline
  2. Fine-tune phase (inherits warm-up booster):
    • Per-column max_depth (see table above)
    • More boosting rounds (600 estimators)
    • Lower learning rate (0.04)
    • Per-column regularisation (λ, α, min_child_weight from table)
    • Early stopping: stops if validation MAE doesn’t improve for 30 rounds

What XGBoost sees

Each model receives the full feature vector for the last available day:

XGBoost excels at finding the best combination of features for each prediction, complementing the sequence-focused Chronos and LSTM.


6.3 LSTM: Recurrent Sequential Regressor (Per-Horizon Separate Models)

What is LSTM?

Architecture

Each model has:

Per-Horizon Separate Models — Key Architectural Change

OLD approach (single model per target):

NEW approach (separate model per target × horizon):

Per-Horizon Dropout Strategy

| Horizon | Base dropout increase | Effective (humidity) | Effective (windspeed) | Effective (temp) | Effective (solar) |
|---|---|---|---|---|---|
| 24h | 0.00 | 0.10 | 0.05 | 0.25 | 0.25 |
| 48h | +0.15 | 0.25 | 0.20 | 0.40 | 0.40 |

Rationale: Predicting 48 hours ahead is fundamentally harder (exponentially more uncertainty). Adding 0.15 extra dropout for 48h forces the model to rely on only the most robust learned patterns, preventing overfitting on noise.

Variable-Specific Learning Rates & Base Dropout

| Variable | Learning Rate | Base Dropout | Rationale |
|---|---|---|---|
| Temperature | 1e-3 | 0.25 | Stable temporal patterns; standard LR |
| Humidity | 2e-4 | 0.10 | Volatile swings; smaller LR for finer search |
| Windspeed | 1e-4 | 0.05 | Ultra-sparse data; conservative LR and light dropout |
| Solar Radiation | 1e-3 | 0.25 | Complex multi-modal; standard LR |

Sliding-Window Dataset

Unlike direct multi-step, LSTM trains on sliding windows of features:
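A sliding-window dataset can be built like this (a simplified single-feature sketch; the notebook windows the full feature matrix):

```python
import numpy as np

def make_windows(series, window=30, horizon=2):
    """Pair each `window`-step slice with the value `horizon` steps past its end."""
    X, y = [], []
    for start in range(len(series) - window - horizon + 1):
        X.append(series[start:start + window])
        y.append(series[start + window + horizon - 1])
    return np.array(X), np.array(y)

X, y = make_windows(np.arange(40, dtype=float), window=30, horizon=2)
# X.shape == (9, 30); y[0] is the value 2 steps after the first window
```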

Loss Function: Huber Loss

Instead of simple mean-squared-error, we use Huber loss ($\delta=1.0$), which is robust to outliers:

This matters for weather data because occasional extreme events (dust storms, unusual wind gusts) shouldn’t dominate the loss.
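The scalar form of Huber loss with delta = 1.0 is short enough to write out (a sketch, equivalent to what PyTorch provides as a built-in loss):

```python
def huber(error, delta=1.0):
    """Quadratic near zero, linear in the tails: outliers get a linear,
    not squared, penalty."""
    if abs(error) <= delta:
        return 0.5 * error ** 2
    return delta * (abs(error) - 0.5 * delta)

# A 10-unit outlier contributes 9.5 here, versus 50.0 under 0.5 * squared error.
```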

Training Strategy

  1. Optimiser: Adam with weight decay (L2 regularisation: 1e-5)
  2. Learning rate schedule: Cosine annealing (starts at per-column LR in table, gradually drops to 1e-6 minimum over 100 epochs)
  3. Gradient clipping: Prevents exploding gradients (||g|| ≤ 1.0)
  4. Early stopping: Halts if validation loss doesn’t improve for 20 epochs
  5. Per-target & per-horizon scalers: Each (variable, horizon) pair gets its own StandardScaler:
    • Fit on training data only (prevents data leakage)
    • Predictions are inverse-transformed back to original units (°C, %, km/h, W/m²)
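The per-target scaler round-trip looks like this (a sketch with scikit-learn's StandardScaler; the numbers are illustrative):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

train_temps = np.array([[24.0], [26.0], [28.0], [30.0]])   # training rows only
scaler = StandardScaler().fit(train_temps)                 # fit on train -> no leakage

scaled_pred = np.array([[0.5]])                     # model output, scaled space
celsius = scaler.inverse_transform(scaled_pred)     # back to degrees C (~28.1)
```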

Why Per-Horizon Separate Models?

Why LSTM Over Temporal Fusion Transformer?

The notebook originally tried Temporal Fusion Transformer (TFT) — a powerful multi-entity forecasting architecture. However:


6.4 How We Train All Models Together

The full training pipeline follows this sequence:

Step 1: Data Loading & Preprocessing + Configuration

Step 2: Feature Engineering (Done Once on All Data)

Step 3: Chronological Train/Val/Test Split

Step 4: Feature Scaling

Step 5: Chronos Training

Step 6: XGBoost Training (Per-Column Hyperparameters)

Step 7: LSTM Training (Per-Horizon Separate Models)

Step 8: Ensemble Weight Optimisation (1200 Trials)

Step 9: Conditions Classifier Training

Step 10: Final Evaluation on Test Set

Step 11: Save Artefacts & Cleanup

This ensures:


7. Ensemble: Optuna-Optimised Weight Blending (1200 Trials)

No single model is perfect. Instead of choosing just one, the notebook uses an ensemble:

For each target variable and horizon (24h, 48h), it learns a set of weights that blend the three predictions:

\[\text{final\_prediction} = w_\text{Chronos} \cdot \hat{y}_\text{Chronos} + w_\text{XGB} \cdot \hat{y}_\text{XGB} + w_\text{LSTM} \cdot \hat{y}_\text{LSTM}\]

Constraints:

Optimisation Method: Optuna (Bayesian TPE Sampler)

Final Ensemble Weight Patterns

The optimised weights show distinct patterns per variable and horizon:

| Target | Horizon | Chronos | XGBoost | LSTM | Pattern |
|---|---|---|---|---|---|
| Temperature | 24h | 0.178 | 0.714 | 0.108 | XGBoost dominant (tabular features work well) |
| Temperature | 48h | ~0.000 | 0.485 | 0.515 | LSTM dominant (temporal structure matters for distant forecast) |
| Humidity | 24h | 0.380 | 0.620 | ~0.000 | XGBoost dominant (tabular features capture volatile swings) |
| Humidity | 48h | 0.620 | ~0.000 | 0.380 | Chronos dominant (pretrained model best for uncertain 48h) |
| Windspeed | 24h | 0.449 | 0.501 | 0.050 | Balanced Chronos/XGBoost (sparse data) |
| Windspeed | 48h | 0.420 | 0.580 | ~0.000 | XGBoost dominant (regularised depth=1 robustness) |
| Solar Radiation | 24h | ~0.000 | ~0.000 | 1.000 | LSTM only (complex temporal patterns) |
| Solar Radiation | 48h | 0.097 | ~0.000 | 0.903 | LSTM dominant (sequence model best for distant solar) |

Key Observations:

This data-driven, per-horizon-per-variable approach often produces more robust predictions than any single component, and adapts the blend to variable difficulty.
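The blend itself is just a weighted sum. Using the temperature/24h weights from the table (the per-model predictions below are made-up numbers):

```python
import numpy as np

def blend(preds, weights):
    """Convex combination of per-model forecasts (weights >= 0, summing to 1)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                       # enforce the sum-to-one constraint
    return float(np.dot(w, preds))

# Temperature, 24h: (Chronos, XGBoost, LSTM) weights from the table above.
pred = blend(preds=[28.0, 28.5, 27.6], weights=[0.178, 0.714, 0.108])
```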


8. Conditions Classifier (Sky Condition Labels)

Numbers like “26.7 °C” and “65% humidity” are informative, but sometimes we want a human-friendly label such as:

How it works:

The notebook trains a separate Random Forest classifier (per horizon) that:

  1. Takes the numeric forecast values (temperature, humidity, windspeed, solar radiation) and temporal features
  2. Maps them to a discrete sky-condition label
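A minimal stand-in for that classifier (toy data; the real model is trained on the engineered features and saved per horizon):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy rows: [temp, humidity, windspeed, solarradiation, month]
X = np.array([[34.0, 30.0, 5.0, 900.0, 4],    # hot, dry, bright
              [27.0, 88.0, 3.0, 120.0, 7],    # humid, dim
              [33.0, 35.0, 6.0, 850.0, 5],
              [26.0, 90.0, 2.0, 100.0, 8]])
y = np.array(["Sunny", "Cloudy", "Sunny", "Cloudy"])

clf = RandomForestClassifier(n_estimators=50, class_weight="balanced",
                             random_state=0).fit(X, y)
label = clf.predict(np.array([[33.5, 32.0, 5.5, 880.0, 4]]))[0]
```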

Saved artifacts:


9. What Gets Saved After Training?

After the notebook finishes, you will have a directory structure like:

src/agritwin_gh/models/
├── environment_forecast_<run_id>.pt
│   └── Primary LSTM bundle — all per-target state dicts + target scalers, bundled
│       Keys: run_id, target_cols, lstm_config, lstm_states, target_scalers
│
└── artifacts/environment_forecast_<run_id>/
    ├── scalers.pkl
    │   └── RobustScaler for the full engineered feature matrix (fit on train only)
    │
    ├── label_encoder.pkl
    │   └── LabelEncoder for sky condition labels (e.g. "Sunny" → 0, "Cloudy" → 1)
    │
    ├── feature_config.json
    │   └── All feature names, target cols, context length, season map, condition classes
    │
    ├── climate_normals.json
    │   └── Monthly and weekly climatological means for each target variable
    │
    ├── ensemble_weights.json
    │   └── Optimal blend weights for each (target, horizon) combination
    │
    ├── evaluation_metrics.json
    │   └── Test set metrics: MAPE, R², RMSE, MAE, accuracy per target/horizon
    │
    ├── xgb_temp_24h.pkl
    ├── xgb_temp_48h.pkl
    ├── xgb_humidity_24h.pkl
    ├── xgb_humidity_48h.pkl
    ├── xgb_windspeed_24h.pkl
    ├── xgb_windspeed_48h.pkl
    ├── xgb_solarradiation_24h.pkl
    ├── xgb_solarradiation_48h.pkl
    │   └── 8 XGBoost models (one per target × horizon)
    │
    ├── conditions_classifier_24h.pkl
    ├── conditions_classifier_48h.pkl
    │   └── Random Forest classifiers for sky condition prediction
    │
    ├── chronos_finetuned/
    │   ├── t5_finetuned_state_dict.pt
    │   │   └── Fine-tuned Chronos T5 model weights
    │   └── chronos_finetune_config.json
    │       └── Training hyperparameters and loss history
    │
    ├── environment_forecast_loader.py
    │   └── Reusable Python inference helper (standalone, no notebook state)
    │
    └── plots/
        ├── eda_timeseries.png
        ├── eda_seasonal_boxplot.png
        ├── eda_correlation.png
        ├── chronos_training_curve.png
        ├── xgb_shap_importance.png
        ├── ensemble_predictions_test.png
        ├── metrics_summary.png
        └── (other visualisations)

Cleanup Policy: After training, per-target individual LSTM .pt state dict files and individual scaler .pkl files are removed from the artifact directory — they are redundant because the primary bundle (environment_forecast_<run_id>.pt) already contains all LSTM states and target scalers. All other inference-required artifacts (scalers, XGBoost, conditions classifiers, Chronos fine-tuned weights, feature config) are retained.


10. How to Use the Final Model for Inference

After training, the notebook generates a reusable Python helper: environment_forecast_loader.py

This module contains a class EnvironmentForecastModel that:

10.1 Minimal Usage Example

# NOTE: <run_id> is a placeholder -- substitute the run id of your trained bundle
from src.agritwin_gh.models.artifacts.environment_forecast_<run_id>.environment_forecast_loader import (
    EnvironmentForecastModel,
)

# Paths to artefacts and main model bundle
artifacts_dir = "src/agritwin_gh/models/artifacts/environment_forecast_<run_id>"
model_path   = "src/agritwin_gh/models/environment_forecast_<run_id>.pt"

# Instantiate model (CPU by default, or "cuda" for GPU)
model = EnvironmentForecastModel(
    artifacts_dir=artifacts_dir,
    main_model_path=model_path,
    device="cpu",
)

# df_context must have:
# - At least `context_length` rows (typically 30)
# - All feature columns (temperature, humidity, lags, rolling stats, etc.)
#   Names are defined in feature_config.json
preds = model.predict(df_context)

print(preds)
# Example output:
# {
#   "temp": {"24h": 28.3, "48h": 29.1},
#   "humidity": {"24h": 67.8, "48h": 65.2},
#   "windspeed": {"24h": 4.2, "48h": 3.8},
#   "solarradiation": {"24h": 450, "48h": 480}
# }

10.2 What Features Must df_context Have?

Look inside feature_config.json in the artefacts directory:

Your df_context should contain all of these columns, with exactly these names and ordering, and at least context_length rows.


11. Key Metrics and Performance

The notebook evaluates the ensemble on the test set (held out from training). Key metrics include:

11.1 Error Metrics

11.2 Correlation/Explanation Metrics

11.3 Accuracy Proxy

11.4 Conditions Classifier

All metrics are saved in evaluation_metrics.json.
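The headline numbers are easy to reproduce by hand; the "Accuracy (%)" values reported for the numeric targets are consistent with 100 - MAPE (a sketch):

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    terms = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(terms) / len(terms)

m = mape([28.0, 30.0], [27.3, 30.6])   # 2.25 (%)
accuracy_proxy = 100.0 - m             # matches the "Accuracy (%)" convention
```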


11.5 Final Test Performance (Post-Optimization)

Results after per-column XGBoost regularization, separate per-horizon LSTM models, and 1200-trial ensemble optimization:

| Target | Horizon | MAPE (%) | Accuracy (%) | RMSE | MAE | R² | Status |
|---|---|---|---|---|---|---|---|
| Temperature | 24h | 3.17 | 96.83 | 1.100 | 0.841 | 0.6903 | ✅ Excellent |
| Temperature | 48h | 3.69 | 96.31 | 1.295 | 0.975 | 0.5732 | ✅ Good |
| Humidity | 24h | 8.56 | 91.44 | 7.219 | 6.087 | 0.3415 | ✅ Good |
| Humidity | 48h | 11.40 | 88.60 | 9.919 | 7.985 | -0.2430 | ⚠️ Challenging |
| Windspeed | 24h | 26.65 | 73.35 | 5.750 | 4.688 | 0.0156 | ⚠️ Volatile |
| Windspeed | 48h | 26.85 | 73.15 | 5.732 | 4.675 | 0.0131 | ⚠️ Volatile |
| Solar Radiation | 24h | 31.72 | 68.28 | 42.615 | 35.136 | 0.5354 | ✅ Good |
| Solar Radiation | 48h | 35.24 | 64.76 | 47.435 | 38.115 | 0.4244 | ✅ Good |
| Conditions | 24h | — | 55.77 | — | — | — | ⚠️ Fair |
| Conditions | 48h | — | 50.00 | — | — | — | ⚠️ Fair |

Results Summary:

Key Drivers:

  1. Per-column XGBoost regularization: Windspeed uses depth=1, λ=20.0 to prevent overfitting on sparse data
  2. Per-horizon LSTM models: Each horizon tuned independently; solar radiation 48h benefits from LSTM’s temporal memory
  3. 1200-trial Optuna: Discovery of variable-specific blends (e.g., humidity 48h → Chronos-dominant, solar → LSTM-only)
  4. Volatility-aware features: Momentum and regime indicators help distinguish predictable patterns from noise

12. Architecture Summary & Design Decisions

Why Per-Column Hyperparameters for XGBoost?

Different variables have different predictability:

This variable-aware tuning prevents overfitting on small data while allowing stronger models on easier targets.

Why Per-Horizon LSTM Models?

Single model for both horizons creates a compromise:

Separate per-horizon models allow:

Result: solar radiation 48h R² improved from 0.30 → 0.42, windspeed 48h recovered to positive R² (+0.013). Humidity 48h remains challenging (R² = -0.24) due to inherent 2-day volatility in daily aggregated data.

Why 1200 Trials for Ensemble Weights?

After model improvements, ensemble weight optimization became crucial:

Why LSTM Over Temporal Fusion Transformer?

TFT (Temporal Fusion Transformer) is state-of-the-art for large multi-entity datasets. On a single daily series of ~300 samples, it tends to:

LSTM is more data-efficient:

Why Chronos Meta-Features?

Chronos is a pretrained “time-series language model.” Its predictions contain valuable generalised knowledge:

Realtime Inference Footprint


13. Standalone Test Suite: test_weather_forecast.py

13.1 Overview

File location: scripts/test_weather_forecast.py

Purpose:
Standalone test script to validate the trained Environment Forecast ensemble model (Chronos + XGBoost + LSTM) across 10 realistic weather scenarios covering summer heat, monsoon onset, winter cold, dry spells, overcast periods, clear skies, and edge cases.

Why it exists:
The model predicts 24h and 48h-ahead values for temperature, humidity, windspeed, and solar radiation. This script exercises the ensemble without requiring the training notebook or live sensor integration — enabling rapid validation and confidence checks before deployment.

13.2 Usage

# Run all 10 scenarios
python scripts/test_weather_forecast.py

# Run a specific scenario (1–10)
python scripts/test_weather_forecast.py --scenario 4

# NOTE: First run downloads ~600 MB Chronos checkpoint to HuggingFace cache.
#       Subsequent runs use the cached model.

13.3 What the Script Tests

| # | Scenario | What it validates |
|---|---|---|
| 1 | Summer baseline – warm, moderate humidity (June) | Normal summer conditions; model should forecast stable warm/dry |
| 2 | Monsoon onset – humidity rising 60→90%, solar dropping | Major season transition; 48h forecast should show humidity climb |
| 3 | Winter cold – 10–18°C, low solar (December) | Cold season; model should forecast low temperatures, low solar |
| 4 | Dry hot spell – 35–40°C, low humidity (25–35%) | Extreme heat; model should forecast sustained high temp/low humidity |
| 5 | Overcast rainy – low solar (<50 W/m²), humidity 80–95% | Rainy period; model should forecast persistently low light |
| 6 | Clear sky peak – 800–1050 W/m², low humidity | Optimal sunny day; model should forecast high solar, moderate temp |
| 7 | Post-monsoon transition – humidity dropping 85→55%, recovery | Season change; 48h forecast should show humidity decline |
| 8 | 24h vs 48h divergence check – validate both horizons are finite | Tests model stability; ensures 48h ≠ 24h and both are realistic |
| 9 | Minimum climate extreme – 2–8°C, 10–20% humidity, low light | Cold dry minimum; stress-tests model on edge-case values |
| 10 | Sine oscillation – intra-period variance, smooth cycles | Tests rolling feature stability under periodic patterns |

13.4 Expected Output Structure

For each scenario, the script prints a table:

──────────────────────────────────────────────────────────────────────
Scenario  4: Dry hot spell — 35–40°C, humidity 25–35%
  Variable          24h Forecast    48h Forecast
  ──────────────────────────────────────────────────
  temp                 37.50°C        38.20°C
  humidity              28.10%         25.40%
  windspeed            18.50 km/h     17.80 km/h
  solarradiation       820.00 W/m²    840.00 W/m²

Interpretation:

13.5 Data Generation Strategy

Each scenario generates a synthetic 30-day DataFrame with:

Method 1: Linear trends (_make_weather_df):

Method 2: Sine oscillations (_make_weather_df_sine):

DataFrame columns (required by model):

{
    "datetime": pd.DatetimeIndex,       # 30 daily dates
    "temp": float (°C),                 # 30 values
    "humidity": float (%),              # 30 values
    "windspeed": float (km/h),          # 30 values
    "solarradiation": float (W/m²)      # 30 values
}
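A hypothetical mirror of the linear-trend generator, producing exactly those columns (the function name, signature, and defaults are ours, not necessarily the script's):

```python
import numpy as np
import pandas as pd

def make_weather_df(start="2025-06-01", days=30, temp0=32.0, temp_slope=0.1,
                    humidity0=55.0, humidity_slope=0.0):
    """Synthetic 30-day context frame with the columns the model expects."""
    t = np.arange(days, dtype=float)
    return pd.DataFrame({
        "datetime": pd.date_range(start, periods=days, freq="D"),
        "temp": temp0 + temp_slope * t,            # gentle linear warming
        "humidity": humidity0 + humidity_slope * t,
        "windspeed": np.full(days, 5.0),           # constant filler values
        "solarradiation": np.full(days, 850.0),
    })

df = make_weather_df()   # 30 rows, ready to pass as df_context
```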

13.6 Key Validation Points

13.7 Troubleshooting Failed Scenarios

“NaN” or infinite forecast values:

“Assertion failed: forecast is not finite”:

Unexpected forecast values (e.g., 200°C in scenario 4):

Import errors (torch, diffusers, etc.):

13.8 Integration with AgriTwin-GH

This script is a diagnostic tool for the Environment Forecast model:

  1. Model validation – Confirm predictions are sensible after retraining
  2. Scenario exploration – Test model response to seasonal extremes (worst-case planning)
  3. Feature debugging – Verify rolling/lag feature logic produces expected outputs
  4. Documentation – Provides working examples of DataFrame format for inference

For live greenhouse deployment, real sensor data flows through src/agritwin_gh/models/environment_forecast_inference.py → REST API → control system.

14. Glossary


This document is a comprehensive reference for the weather_forecast.ipynb notebook.
Treat the notebook as the implementation and this markdown as the guided tour and reference manual.

For detailed code, cell-by-cell execution, and interactive plots, refer to the notebook itself.