Greenhouse Weather Forecast Model (Chronos + XGBoost + LSTM Ensemble)
Who is this for?
Anyone — farmer, student, developer, or complete beginner — who wants to understand what the weather_forecast.ipynb notebook does, why it matters, and how to use the final model in practice. No maths or machine-learning background is required.
Table of Contents
- Big Picture: What Problem Are We Solving?
- What Exactly Does the Model Predict?
- Where Does the Data Come From?
- Step-by-Step Pipeline Overview
- Feature Engineering (Turning Raw Weather Into Inputs)
- The Three Model Families
- Ensemble: Optuna-Optimised Weight Blending
- Conditions Classifier (Sky Condition Labels)
- What Gets Saved After Training?
- How to Use the Final Model for Inference
- Key Metrics and Performance
- Architecture Summary & Design Decisions
- Standalone Test Script
- Standalone Test Suite: test_weather_forecast.py
- Glossary
1. Big Picture: What Problem Are We Solving?
Inside a controlled greenhouse, weather is not just outside — it is also inside:
- Air temperature
- Relative humidity
- Solar radiation / light intensity
- Wind speed and other variables
If we can predict the next 24–48 hours of indoor conditions, we can:
- Adjust heating, cooling, fans, and fogging before conditions drift out of the safe zone.
- Plan irrigation and nutrient dosing more precisely.
- Anticipate disease risk windows where temperature and humidity combinations are dangerous.
- Feed a digital twin (AgriTwin-GH) that simulates future plant growth and disease.
The weather_forecast.ipynb notebook builds a multi-model ensemble forecasting system for the greenhouse environment. It does not just guess tomorrow’s value from thin air — it learns from historical sensor data (2024–2025 observations for Dindigul, Tamil Nadu) using three complementary machine-learning approaches: Chronos (pretrained foundation model), XGBoost (tree-based), and LSTM (recurrent neural network).
2. What Exactly Does the Model Predict?
The model predicts future values of several indoor climate variables at two time horizons:
- 24 hours ahead (“24h”)
- 48 hours ahead (“48h”)
Target variables:
| Variable | Unit | Type |
| --- | --- | --- |
| temp | °C | Continuous |
| humidity | % | Continuous |
| windspeed | km/h | Continuous |
| solarradiation | W/m² | Continuous |
| conditions | label | Categorical (e.g. “Sunny”, “Cloudy”) |
For each numeric target variable and horizon, the final system outputs a point forecast (single predicted value). Optionally, the conditions variable is predicted as a discrete label via a separate classifier.
Example output:
```text
temp:           24h = 28.3 °C    48h = 29.1 °C
humidity:       24h = 67.8 %     48h = 65.2 %
windspeed:      24h = 4.2 km/h   48h = 3.8 km/h
solarradiation: 24h = 450 W/m²   48h = 480 W/m²
conditions_24h: "Partly Cloudy"
conditions_48h: "Sunny"
```
3. Where Does the Data Come From?
The notebook assumes that you have a historical time series of greenhouse indoor conditions, for example:
- One row per day (or per fixed time step, e.g. daily aggregates)
- Columns for each weather variable (temperature, humidity, windspeed, solar radiation, sky conditions)
- A date/time index to keep everything ordered
For this project:
- Data source: Dindigul District, Tamil Nadu weather data (2024–2025)
- Frequency: Daily observations
- Expected columns:
datetime, temp, humidity, windspeed, solarradiation, conditions, sunriseEpoch, sunsetEpoch, and others
This historical dataset is split into three parts:
- Training set (70%) – earlier part of the history the models learn from.
- Validation set (15%) – a slice used to tune hyperparameters, optimise ensemble weights, and prevent overfitting.
- Test set (15%) – the last portion of history used only to check final performance (held out until the very end).
The notebook builds features from these time series and feeds them into the models described below.
4. Step-by-Step Pipeline Overview
At a high level, the notebook does the following:
- Load and clean raw weather data
- Read indoor greenhouse measurements (2024–2025 Dindigul data)
- Handle missing values and ensure a continuous timeline
- Extract temporal features (month, day-of-year, etc.)
- Engineer features that help models understand seasonality, trends, and interactions:
- Cyclical time encodings (sin/cos)
- Lag features (past 1, 2, 3, 7, 14, 30 days)
- Rolling statistics (7/14/30-day mean, std, min, max)
- Dindigul seasonal flags
- Climate normals and anomalies
- Solar geometry features
- Interaction terms (e.g. temperature × humidity)
- Prepare three types of models in parallel:
- Chronos-T5-small – pretrained time-series transformer
- XGBoost – gradient-boosted tree ensemble (8 models for 4 targets × 2 horizons)
- LSTM – stacked recurrent network (1 model per target × horizon; 8 in total)
- Random Forest classifier – maps numeric forecasts → sky condition labels
- Train each model family on the training set:
- Chronos: warm-up (frozen encoder) → full fine-tune
- XGBoost: warm-up (shallow) → fine-tune (deep) with early stopping
- LSTM: sliding-window dataset, Huber loss, gradient clipping, early stopping
- RF classifier: balanced class weighting
- Optimise ensemble weights via Bayesian search (Optuna):
- For each (target, horizon) pair, find optimal blend of Chronos + XGBoost + LSTM
- Minimises validation MAPE
- 1200 trials per (target, horizon) combination
- Evaluate performance on the test set and generate plots
- Save all necessary artefacts for realtime use:
- Scalers, encoders, feature configurations
- All model weights (Chronos, XGBoost, LSTM)
- Ensemble weights and evaluation metrics
- A ready-to-use Python loader for inference
- Clean up intermediate files so only the final realtime bundle and required artefacts remain
5. Feature Engineering (Turning Raw Weather Into Inputs)
Raw numbers alone (“temperature = 26.3 °C”) do not directly capture:
- Time of year and seasonal patterns
- Recent trends and local volatility
- Typical baseline values for this time/location
- Interactions between variables
To help the models, the notebook creates several feature types:
5.1 Cyclical Time Features
- Day-of-year (sin/cos encoded)
- Month (sin/cos encoded)
- Day-of-week (sin/cos encoded)
The sine/cosine encoding captures the circular nature of time (month 12 is next to month 1).
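A minimal pandas sketch of this encoding (assuming a DataFrame with a DatetimeIndex; the column names are illustrative, not necessarily the notebook's):

```python
import numpy as np
import pandas as pd

def add_cyclical_features(df: pd.DataFrame) -> pd.DataFrame:
    """Place day-of-year, month, and day-of-week on the unit circle."""
    doy = df.index.dayofyear
    df["doy_sin"] = np.sin(2 * np.pi * doy / 365.25)
    df["doy_cos"] = np.cos(2 * np.pi * doy / 365.25)
    df["month_sin"] = np.sin(2 * np.pi * df.index.month / 12)
    df["month_cos"] = np.cos(2 * np.pi * df.index.month / 12)
    df["dow_sin"] = np.sin(2 * np.pi * df.index.dayofweek / 7)
    df["dow_cos"] = np.cos(2 * np.pi * df.index.dayofweek / 7)
    return df
```

With this encoding, 31 December and 1 January map to nearly identical (sin, cos) pairs, so the model sees them as neighbours.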
5.2 Dindigul Seasonal Flags
For the Dindigul region, the year is split into four distinct seasons:
| Season | Months | Character |
| --- | --- | --- |
| Hemant (Winter) | Jan–Feb | Cool, dry, post-NE-monsoon tail |
| Grishma (Summer) | Mar–May | Hot, low humidity, pre-monsoon |
| Varsha (SW Monsoon) | Jun–Sep | High humidity, moderate rain |
| Sharad (NE Monsoon) | Oct–Dec | Rain peaks, humid |
One-hot encoded flags tell the model which season each day belongs to.
5.3 Lag Features
The model includes lagged versions of target variables:
- Values from 1, 2, 3, 7, 14, and 30 days ago
These help capture autocorrelation — the fact that today’s temperature is usually similar to yesterday’s or last week’s.
5.4 Rolling Statistics
To capture local trends and volatility, the notebook computes:
- 7/14/30-day rolling mean (shifted by 1 day to avoid look-ahead bias)
- 7/14/30-day rolling standard deviation
- 7/14/30-day min and max
These tell the model if the climate has been gradually warming, cooling, or becoming more variable.
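A sketch of both feature groups for one target column, assuming the same DatetimeIndex-ed DataFrame (feature names are illustrative):

```python
import pandas as pd

def add_lag_and_rolling(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Lagged values plus look-ahead-safe rolling statistics for one target."""
    for lag in [1, 2, 3, 7, 14, 30]:
        df[f"{col}_lag{lag}"] = df[col].shift(lag)
    for window in [7, 14, 30]:
        # shift(1) so each window covers only days strictly before "today"
        past = df[col].shift(1).rolling(window)
        df[f"{col}_roll{window}_mean"] = past.mean()
        df[f"{col}_roll{window}_std"] = past.std()
        df[f"{col}_roll{window}_min"] = past.min()
        df[f"{col}_roll{window}_max"] = past.max()
    return df
```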
5.5 Climate Normals and Anomalies
The notebook builds climatological baselines:
- Typical monthly averages
- Typical weekly averages
For each day, the model can compute an anomaly:
anomaly = actual_value - typical_value_for_this_time_of_year
Anomalies matter because plants and disease risk often depend on deviations from normal, not just on absolute values.
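A minimal sketch of the monthly variant (weekly normals work the same way with `df.index.isocalendar().week`; names are illustrative):

```python
import pandas as pd

def add_monthly_anomaly(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Anomaly = actual value minus the climatological mean for that calendar month."""
    monthly_normal = df.groupby(df.index.month)[col].transform("mean")
    df[f"{col}_monthly_normal"] = monthly_normal
    df[f"{col}_anomaly"] = df[col] - monthly_normal
    return df
```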
5.6 Solar Geometry and Chronos Meta-Features
Additional derived features:
- Day length (hours between sunrise and sunset)
- Normalised solar radiation (actual / day_length)
- Chronos meta-features: predictions from the fine-tuned Chronos model, reused as extra inputs to XGBoost and LSTM
(This gives other models a “head start” from the pretrained foundation model)
5.7 Volatility Features (NEW - For Sparse/Erratic Variables)
For variables like windspeed and humidity that exhibit high volatility, additional derived features capture momentum and regime changes:
- Momentum (rate-of-change): RoC over [1, 3, 7] days
- Volatility clustering: Rolling standard deviation with high/low regime indicators
- Autocorrelation proxies: ACF-like aggregations at lags [1, 7, 14]
- Regime switches: Seasonal high/low volatility flags
These features help models distinguish between genuine predictable patterns and random noise, improving R² for difficult variables.
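A sketch of such features, with illustrative lags and windows (the notebook's exact definitions may differ):

```python
import pandas as pd

def add_volatility_features(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Momentum and volatility-regime features for erratic variables."""
    for days in [1, 3, 7]:
        df[f"{col}_roc{days}"] = df[col].pct_change(days)  # rate of change
    vol = df[col].shift(1).rolling(14).std()               # volatility-clustering proxy
    df[f"{col}_vol14"] = vol
    # regime flag: 1 when recent volatility exceeds the series' median volatility
    df[f"{col}_vol_regime"] = (vol > vol.median()).astype(int)
    return df
```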
All engineered features are stored in a configuration file (feature_config.json) so inference code can reproduce them.
6. The Three Model Families
The notebook uses three different forecasting approaches and later blends them. Each has complementary strengths:
- Chronos understands general time-series patterns (trends, seasonality, reversals)
- XGBoost captures tabular interactions and non-linear relationships
- LSTM captures medium-range temporal structure and local dependencies
6.1 Chronos-T5-Small: Pretrained Time-Series Foundation Model
What is Chronos?
- A T5-based transformer model released by Amazon for time-series forecasting
- Pre-trained on ~84 billion diverse time-series observations
- Treats time series the way a language model treats text: patterns are learned generically, then fine-tuned for specific domains
How it works conceptually:
- Encoder: Reads a sequence of past values (e.g. last 30 days of temperature)
- Decoder: Learns to generate the next values step-by-step
- Attention layers: Can “look back” at any past position, not just recent history
Two-Phase Fine-Tuning Strategy
- Warm-up phase (5 epochs, frozen encoder):
- Freeze the encoder (pretrained knowledge is fixed)
- Train only the decoder and output projection head
- Uses learning rate 1e-3 (moderate)
- Stabilises training, prevents catastrophic forgetting
- Full fine-tune phase (10 epochs, all layers):
- Unfreeze all parameters
- Lower learning rate to 1e-4 (more careful updates)
- Monitor validation MAPE to detect overfitting
- Restore best checkpoint at the end
Predictions
For each target variable, Chronos sees a 30-day context window and forecasts the next 2 steps (24h, 48h). These predictions are:
- Used directly in the final weighted ensemble
- Reused as meta-features for XGBoost and LSTM, bootstrapping the other models with pretrained knowledge (see the sketch below)
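A hedged sketch of producing such forecasts with the chronos-forecasting package's ChronosPipeline (shown zero-shot for brevity; the notebook fine-tunes the weights first, and `df` stands for the historical frame from Section 3):

```python
import torch
from chronos import ChronosPipeline  # pip install chronos-forecasting

pipeline = ChronosPipeline.from_pretrained(
    "amazon/chronos-t5-small",
    device_map="cpu",
    torch_dtype=torch.float32,
)

# last 30 days of one target as the context window
context = torch.tensor(df["temp"].values[-30:], dtype=torch.float32)

# samples has shape [num_series, num_samples, prediction_length]
samples = pipeline.predict(context, prediction_length=2)
point_forecast = samples.median(dim=1).values.squeeze()  # steps t+1 (24h), t+2 (48h)
```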
6.2 XGBoost: Gradient-Boosted Direct Multi-Step Forecaster (Per-Column Regularization)
What is XGBoost?
- A gradient-boosted decision tree ensemble
- Builds many small trees sequentially; each new tree corrects errors from previous ones
- Excellent for tabular (structured) data with many features
Direct Multi-Step Strategy
Instead of predicting one step at a time (which accumulates errors), we train one model per (target, horizon) pair:
- xgb_temp_24h.pkl → temperature 24 hours ahead
- xgb_humidity_48h.pkl → humidity 48 hours ahead
- (total: 4 targets × 2 horizons = 8 XGBoost models)
Per-Column Variable-Specific Hyperparameters
Key innovation: different variables get different regularization, because not all variables are equally prone to overfitting:
| Variable | max_depth | λ (reg_lambda) | α (reg_alpha) | min_child_weight | Rationale |
| --- | --- | --- | --- | --- | --- |
| Temperature | 4 | 2.0 | 0.1 | 5 | Stable; standard regularisation |
| Humidity | 2 | 6.0 | 0.5 | 15 | Very volatile; strong regularisation |
| Windspeed | 1 | 20.0 | 2.0 | 15 | Extremely sparse; ultra-aggressive regularisation |
| Solar Radiation | 4 | 2.0 | 0.1 | 5 | Stable; standard regularisation |
Rationale for Aggressive Windspeed Regularisation:
- Windspeed data is sparse and noisy (many days cluster around similar values, punctuated by erratic spikes)
- Deep trees overfit on noise
- Depth=1 (stump-like trees) paired with extreme L1/L2 penalties forces the model to find only the strongest feature interactions
- Result: model generalises better to unseen data
Two-Phase Training for Small Data
The dataset is limited (~300 useful samples after feature engineering), so XGBoost uses:
- Warm-up phase:
  - Shallow trees (max_depth=3, uniform for all targets)
  - Few boosting rounds (200 estimators)
  - Higher learning rate (0.10)
  - Quick convergence to a warm baseline
- Fine-tune phase (inherits the warm-up booster; see the sketch after this list):
  - Per-column max_depth (see table above)
  - More boosting rounds (600 estimators)
  - Lower learning rate (0.04)
  - Per-column regularisation (λ, α, min_child_weight from the table)
  - Early stopping: stops if validation MAE doesn’t improve for 30 rounds
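A hedged sketch of the two-phase scheme via the xgboost scikit-learn API, shown with the windspeed hyperparameters (X_train, y_train, X_val, and y_val stand for the engineered feature splits and are assumptions):

```python
from xgboost import XGBRegressor

# per-column regularisation from CONFIG (windspeed shown: the most aggressive case)
col_params = dict(max_depth=1, reg_lambda=20.0, reg_alpha=2.0, min_child_weight=15)

# Warm-up: shallow trees, fast baseline
warm = XGBRegressor(n_estimators=200, max_depth=3, learning_rate=0.10)
warm.fit(X_train, y_train)

# Fine-tune: continue boosting from the warm-up booster, with early stopping
fine = XGBRegressor(
    n_estimators=600,
    learning_rate=0.04,
    eval_metric="mae",
    early_stopping_rounds=30,
    **col_params,
)
fine.fit(X_train, y_train, eval_set=[(X_val, y_val)], xgb_model=warm.get_booster())
```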
What XGBoost sees
Each model receives the full feature vector for the last available day:
- All lag features, rolling statistics, season flags, Chronos meta-features, etc.
XGBoost excels at finding the best combination of features for each prediction, complementing the sequence-focused Chronos and LSTM.
6.3 LSTM: Recurrent Sequential Regressor (Per-Horizon Separate Models)
What is LSTM?
- Long Short-Term Memory — a type of recurrent neural network (RNN)
- Designed to process sequences while maintaining internal memory
- Can “remember” distant past events (via gating mechanisms)
Architecture
Each model has:
- 2 stacked LSTM layers (128 hidden units per layer)
- LayerNorm on the final hidden state (improves training stability)
- Per-horizon dropout: Different for 24h vs 48h horizons (see table below)
- Feed-forward head: maps the final hidden state to a single-step prediction for one horizon (a sketch follows below)
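A minimal PyTorch sketch matching this description (sizes come from the text; the notebook's actual class may differ in detail):

```python
import torch
from torch import nn

class ForecastLSTM(nn.Module):
    """2-layer LSTM -> LayerNorm -> linear head; one instance per (target, horizon)."""

    def __init__(self, n_features: int, hidden: int = 128, dropout: float = 0.25):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2,
                            batch_first=True, dropout=dropout)
        self.norm = nn.LayerNorm(hidden)
        self.head = nn.Linear(hidden, 1)  # single step for a single horizon

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(x)            # x: [batch, 30, n_features]
        last = self.norm(out[:, -1, :])  # normalised final hidden state
        return self.head(last).squeeze(-1)
```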
Per-Horizon Separate Models — Key Architectural Change
OLD approach (single model per target):
- One model per variable (e.g., lstm_temperature), outputting both 24h and 48h simultaneously
- Result: the model had to balance two incompatible objectives, producing suboptimal 48h predictions
NEW approach (separate model per target × horizon):
- 8 separate models in total, one per (target, horizon) pair: lstm_temp_24h, lstm_temp_48h, lstm_humidity_24h, lstm_humidity_48h, etc.
- Each model outputs only one horizon, allowing horizon-specific tuning
- Result: 48h predictions can be heavily regularised without hurting 24h
Per-Horizon Dropout Strategy
| Horizon | Base dropout increase | Effective dropout (humidity) | Effective dropout (windspeed) | Effective dropout (temp) | Effective dropout (solar) |
| --- | --- | --- | --- | --- | --- |
| 24h | +0.00 | 0.10 | 0.05 | 0.25 | 0.25 |
| 48h | +0.15 | 0.25 | 0.20 | 0.40 | 0.40 |
Rationale: predicting 48 hours ahead is fundamentally harder because uncertainty compounds with lead time. Adding 0.15 extra dropout for 48h forces the model to rely only on the most robust learned patterns, preventing overfitting on noise.
Variable-Specific Learning Rates & Base Dropout
| Variable | Learning Rate | Base Dropout | Rationale |
| --- | --- | --- | --- |
| Temperature | 1e-3 | 0.25 | Stable temporal patterns; standard LR |
| Humidity | 2e-4 | 0.10 | Volatile swings; smaller LR for finer search |
| Windspeed | 1e-4 | 0.05 | Ultra-sparse data; conservative LR and light dropout |
| Solar Radiation | 1e-3 | 0.25 | Complex multi-modal; standard LR |
Sliding-Window Dataset
Unlike direct multi-step, LSTM trains on sliding windows of features:
- For each day in the training period, take the previous 30 days of features as input
- Target: single horizon value (24h-ahead OR 48h-ahead, not both)
- Create a per-horizon StandardScaler fitted on training data only
- Result: many overlapping training examples, each associated with one specific horizon
Loss Function: Huber Loss
Instead of simple mean-squared-error, we use Huber loss ($\delta=1.0$), which is robust to outliers:
- For small errors: acts like MSE (smooth quadratic penalty)
- For large errors: acts like MAE (linear penalty, less severe)
This matters for weather data because occasional extreme events (dust storms, unusual wind gusts) shouldn’t dominate the loss.
Training Strategy (a condensed code sketch follows this list)
- Optimiser: Adam with weight decay (L2 regularisation: 1e-5)
- Learning rate schedule: Cosine annealing (starts at per-column LR in table, gradually drops to 1e-6 minimum over 100 epochs)
- Gradient clipping: prevents exploding gradients (||g|| ≤ 1.0)
- Early stopping: Halts if validation loss doesn’t improve for 20 epochs
- Per-target & per-horizon scalers: each (variable, horizon) pair gets its own StandardScaler:
- Fit on training data only (prevents data leakage)
- Predictions are inverse-transformed back to original units (°C, %, km/h, W/m²)
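A condensed sketch of this loop, reusing the ForecastLSTM sketch above (train_loader and val_loader are assumed PyTorch DataLoaders over the sliding windows):

```python
import torch
from torch import nn

def mean_val_loss(model, loader, loss_fn):
    """Average validation loss over a DataLoader."""
    model.eval()
    with torch.no_grad():
        losses = [loss_fn(model(xb), yb).item() for xb, yb in loader]
    return sum(losses) / len(losses)

model = ForecastLSTM(n_features=64, dropout=0.25)  # sizes illustrative
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=100, eta_min=1e-6)
loss_fn = nn.HuberLoss(delta=1.0)

best, bad, patience = float("inf"), 0, 20
for epoch in range(100):
    model.train()
    for xb, yb in train_loader:
        opt.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # ||g|| <= 1.0
        opt.step()
    sched.step()
    val = mean_val_loss(model, val_loader, loss_fn)
    if val < best:
        best, bad = val, 0
        torch.save(model.state_dict(), "best.pt")  # checkpoint to restore later
    else:
        bad += 1
        if bad >= patience:  # early stopping
            break
```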
Why Per-Horizon Separate Models?
- Problem with single model for both horizons: Model forced to average prediction quality across 24h and 48h; heavy regularisation helps 48h but hurts 24h
- Solution (per-horizon models): Each horizon gets its own model, tuned to its own difficulty level
- Result: per-horizon validation R² improved markedly (e.g., humidity 48h from negative to +0.0024, windspeed 48h from -0.2574 to +0.0128); final ensemble test figures appear in Section 11.5
Why LSTM Over Temporal Fusion Transformer?
The notebook originally tried Temporal Fusion Transformer (TFT) — a powerful multi-entity forecasting architecture. However:
- TFT is designed for hundreds/thousands of time series (multi-entity datasets)
- On a single daily series of ~300 training samples, TFT tends to underfit or diverge, producing negative R²
- LSTM is more data-efficient:
- Far fewer parameters (less overfitting risk)
- More stable convergence on small data
- Proven track record on weather-like sequences
6.4 How We Train All Models Together
The full training pipeline follows this sequence:
Step 1: Data Loading & Preprocessing + Configuration
- Load 2024–2025 Dindigul weather data
- Align timestamps, remove duplicates
- Extract temporal features (year, month, day, dayofweek, etc.)
- Initialize CONFIG with per-column and per-horizon hyperparameters:

```text
lstm_dropout_per_col:         {temp: 0.25, humidity: 0.10, windspeed: 0.05, solarradiation: 0.25}
lstm_dropout_per_horizon:     {1: 0.00, 2: 0.15}   # +0.15 for 48h predictions
lstm_lr_per_col:              {temp: 1e-3, humidity: 2e-4, windspeed: 1e-4, solarradiation: 1e-3}
xgb_max_depth_per_col:        {temp: 4, humidity: 2, windspeed: 1, solarradiation: 4}
xgb_reg_lambda_per_col:       {temp: 2.0, humidity: 6.0, windspeed: 20.0, solarradiation: 2.0}
xgb_reg_alpha_per_col:        {temp: 0.1, humidity: 0.5, windspeed: 2.0, solarradiation: 0.1}
xgb_min_child_weight_per_col: {temp: 5, humidity: 15, windspeed: 15, solarradiation: 5}
```
Step 2: Feature Engineering (Done Once on All Data)
- Cyclical encodings (sin/cos for time features)
- Lag features (1, 2, 3, 7, 14, 30 days per target)
- Rolling statistics (7/14/30-day mean/std/min/max, shifted by 1 day)
- Dindigul seasons (one-hot encoded)
- Climate normals (monthly/weekly averages, computed on full dataset)
- Anomalies (actual - normal)
- Solar geometry (day length, normalized radiation)
- Interaction features (temp × humidity, wind × solar, etc.)
- Volatility features (momentum, regime detection for wind/humidity)
- Create forward-shifted targets for t+1 (24h) and t+2 (48h)
- Store all engineered feature names and versions in feature_config.json
Step 3: Chronological Train/Val/Test Split
- Train: first 70% of samples
- Validation: next 15%
- Test: final 15% (held out completely until final evaluation)
- Split along the time axis (no shuffling) to avoid look-ahead bias; a minimal sketch follows
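A minimal sketch of this split (df_features stands for the fully engineered feature table):

```python
n = len(df_features)
train_end = int(n * 0.70)
val_end = int(n * 0.85)

train_df = df_features.iloc[:train_end]       # earliest 70%
val_df = df_features.iloc[train_end:val_end]  # next 15%
test_df = df_features.iloc[val_end:]          # final 15%, untouched until the end
```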
Step 4: Feature Scaling
- Fit RobustScaler on training data only
- Apply to validation and test
- Prevents data leakage
Step 5: Chronos Training
- Build time-series windows (30 days context, 2 steps target)
- Warm-up (5 epochs, frozen encoder only)
- Full fine-tune (10 epochs, all layers, restore best checkpoint)
- Generate meta-features by predicting on all splits
Step 6: XGBoost Training (Per-Column Hyperparameters)
- For each (target, horizon) pair (8 total):
- Warm-up phase:
- 200 estimators, shallow trees (depth=3 fixed), LR=0.10
- Goal: quick baseline convergence
- Fine-tune phase (inherits warm-up booster):
- 600 estimators, per-column depth (e.g., windspeed depth=1, humidity depth=2)
- LR=0.04
- Per-column regularisation: read λ, α, min_child_weight from CONFIG per-col dicts
- Example: windspeed uses λ=20.0 (ultra-aggressive L2 penalty)
- Early stopping: 30 rounds on validation MAE
Step 7: LSTM Training (Per-Horizon Separate Models)
- For each (target, horizon) pair (8 total):
- Build horizon-specific sliding-window dataset (30-day context → single horizon)
- Create a per-horizon StandardScaler (fit on training data only)
- Read per-column hyperparameters from CONFIG:
  - Base dropout: lstm_dropout_per_col[target] (e.g., humidity = 0.10)
  - Horizon add-on: lstm_dropout_per_horizon[h] (e.g., horizon 2 adds +0.15 for 48h)
  - Final dropout: 0.10 + 0.15 = 0.25 for humidity 48h
  - Learning rate: lstm_lr_per_col[target] (e.g., windspeed = 1e-4)
- Train with Huber loss (δ=1.0), Adam with weight decay (L2=1e-5)
- Learning rate schedule: Cosine annealing over 100 epochs
- Early stopping on validation loss (patience=20)
- Restore best checkpoint
- Save per-(col, h) state dict and scaler
Step 8: Ensemble Weight Optimisation (1200 Trials)
- Collect validation predictions from all three model families on held-out validation set
- Use Optuna (Bayesian Tree-Parzen Estimator sampler) to find optimal weights:
- Constraint: $w_\text{Chronos} + w_\text{XGB} + w_\text{LSTM} = 1$, all ≥ 0
- Objective: minimise validation MAPE (per target, per horizon)
- 1200 trials per (target, horizon) pair (increased from 300 to find better combinations)
- Optimization discovers which model dominates for each: e.g., solar 48h prefers pure LSTM (weight=1.0)
- Store best weights in ensemble_weights.json
Step 9: Conditions Classifier Training
- For each horizon (24h, 48h):
- Build input: raw weather values + temporal features
- Target: observed sky condition (Sunny, Cloudy, etc.)
- Train Random Forest (400 trees, max_depth=10, balanced class weights)
Step 10: Final Evaluation on Test Set
- Ensemble predictions: blend Chronos + XGBoost + LSTM with optimised weights (per-col, per-horizon)
- Compute metrics: MAPE, R², RMSE, MAE, Accuracy per target/horizon
- Verify R² values: windspeed recovers to positive R² at both horizons; humidity 48h remains the one negative case (see Section 11.5)
- Generate visualisations (actual vs predicted, metric summaries, weight distributions)
Step 11: Save Artefacts & Cleanup
- Bundle the LSTM states (8 models, one state dict per (col, h)) and the 8 per-horizon scalers into environment_forecast_<run_id>.pt
- Save all supporting files:
  - feature_config.json (feature names, target cols, context length, volatility flags)
  - Per-column hyperparameter configs (for the audit trail)
  - Ensemble weights with confidence intervals
  - Evaluation metrics (final R², MAPE, RMSE per target/horizon)
- Move plots to artefacts folder
- Delete intermediate files (fine-tuned checkpoints, temp models, Lightning logs, warm-up boosters)
This ensures:
- All models see consistent features and splits
- Chronos and LSTM exploit temporal structure with horizon-aware regularisation
- XGBoost focuses on rich tabular interactions with variable-specific regularisation
- Ensemble learns data-driven weights per-target-per-horizon (no manual guessing)
- Volatile variables largely recover: windspeed reaches positive R² at both horizons and humidity at 24h (humidity 48h remains challenging; see Section 11.5)
7. Ensemble: Optuna-Optimised Weight Blending (1200 Trials)
No single model is perfect. Instead of choosing just one, the notebook uses an ensemble:
For each target variable and horizon (24h, 48h), it learns a set of weights that blend the three predictions:
\[\text{final\_prediction} = w_\text{Chronos} \cdot \hat{y}_\text{Chronos} + w_\text{XGB} \cdot \hat{y}_\text{XGB} + w_\text{LSTM} \cdot \hat{y}_\text{LSTM}\]
Constraints:
- $w_\text{Chronos} + w_\text{XGB} + w_\text{LSTM} = 1$
- All weights ≥ 0
Optimisation Method: Optuna (Bayesian TPE Sampler)
- 1200 trials per (target, horizon) (increased from 300 to find better weight combinations after model improvements)
- Objective: minimise validation MAPE
- Sampler: Tree-Parzen Estimator (Bayesian search)
- Output: optimal weights stored in ensemble_weights.json (a minimal Optuna sketch follows)
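A minimal Optuna sketch under these constraints (pred_chronos, pred_xgb, pred_lstm, and y_val stand for validation-set arrays and are assumptions; the notebook's exact constraint handling may differ):

```python
import numpy as np
import optuna

def mape(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean absolute percentage error, in percent."""
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

def objective(trial: optuna.Trial) -> float:
    # sample raw weights, then normalise so they are non-negative and sum to 1
    raw = np.array([trial.suggest_float(name, 0.0, 1.0)
                    for name in ("w_chronos", "w_xgb", "w_lstm")])
    w = raw / raw.sum() if raw.sum() > 0 else np.full(3, 1 / 3)
    blend = w[0] * pred_chronos + w[1] * pred_xgb + w[2] * pred_lstm
    return mape(y_val, blend)

study = optuna.create_study(direction="minimize",
                            sampler=optuna.samplers.TPESampler(seed=42))
study.optimize(objective, n_trials=1200)
print(study.best_params)  # raw weights; renormalise before writing ensemble_weights.json
```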
Final Ensemble Weight Patterns
The optimised weights show distinct patterns per variable and horizon:
| Target | Horizon | Chronos | XGBoost | LSTM | Pattern |
| --- | --- | --- | --- | --- | --- |
| Temperature | 24h | 0.178 | 0.714 | 0.108 | XGBoost dominant (tabular features work well) |
| Temperature | 48h | ~0.000 | 0.485 | 0.515 | LSTM dominant (temporal structure matters for distant forecast) |
| Humidity | 24h | 0.380 | 0.620 | ~0.000 | XGBoost dominant (tabular features capture volatile swings) |
| Humidity | 48h | 0.620 | ~0.000 | 0.380 | Chronos dominant (pretrained model best for uncertain 48h) |
| Windspeed | 24h | 0.449 | 0.501 | 0.050 | Balanced Chronos/XGBoost (sparse data) |
| Windspeed | 48h | 0.420 | 0.580 | ~0.000 | XGBoost dominant (regularised depth=1 robustness) |
| Solar Radiation | 24h | ~0.000 | ~0.000 | 1.000 | LSTM only (complex temporal patterns) |
| Solar Radiation | 48h | 0.097 | ~0.000 | 0.903 | LSTM dominant (sequence model best for distant solar) |
Key Observations:
- Temperature 48h is LSTM-heavy (0.515): temporal patterns matter for distant forecasts
- Solar 24h & 48h are LSTM-dominant (1.000 and 0.903): complex solar patterns captured best by sequence models
- Windspeed 48h is XGB-heavy (0.580): aggressive regularisation (depth=1) forces robustness
- Humidity 48h is Chronos-dominant (0.620): pretrained time-series knowledge best handles highly uncertain 48h humidity
This data-driven, per-horizon-per-variable approach often produces more robust predictions than any single component, and adapts the blend to variable difficulty.
8. Conditions Classifier (Sky Condition Labels)
Numbers like “26.7 °C” and “65% humidity” are informative, but sometimes we want a human-friendly label such as:
- “Sunny”
- “Partly Cloudy”
- “Overcast”
- “Rainy”
How it works:
The notebook trains a separate Random Forest classifier (per horizon) that:
- Takes the numeric forecast values (temperature, humidity, windspeed, solar radiation) and temporal features
- Maps them to a discrete sky-condition label (a minimal training sketch follows)
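A minimal scikit-learn sketch using the hyperparameters reported in Step 9 (X_weather_and_time and X_new stand for assumed feature matrices):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
y = le.fit_transform(df["conditions"])  # "Sunny" -> 0, "Cloudy" -> 1, ...

clf = RandomForestClassifier(
    n_estimators=400,
    max_depth=10,
    class_weight="balanced",  # compensate for rare condition labels
    random_state=42,
)
clf.fit(X_weather_and_time, y)

labels = le.inverse_transform(clf.predict(X_new))  # back to human-readable strings
```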
Saved artifacts:
- conditions_classifier_24h.pkl – condition forecast 24h ahead
- conditions_classifier_48h.pkl – condition forecast 48h ahead
- label_encoder.pkl – decoder from integer codes back to label strings
9. What Gets Saved After Training?
After the notebook finishes, you will have a directory structure like:
```
src/agritwin_gh/models/
├── environment_forecast_<run_id>.pt
│ └── Primary LSTM bundle — all per-target state dicts + target scalers, bundled
│ Keys: run_id, target_cols, lstm_config, lstm_states, target_scalers
│
└── artifacts/environment_forecast_<run_id>/
├── scalers.pkl
│ └── RobustScaler for the full engineered feature matrix (fit on train only)
│
├── label_encoder.pkl
│ └── LabelEncoder for sky condition labels (e.g. "Sunny" → 0, "Cloudy" → 1)
│
├── feature_config.json
│ └── All feature names, target cols, context length, season map, condition classes
│
├── climate_normals.json
│ └── Monthly and weekly climatological means for each target variable
│
├── ensemble_weights.json
│ └── Optimal blend weights for each (target, horizon) combination
│
├── evaluation_metrics.json
│ └── Test set metrics: MAPE, R², RMSE, MAE, accuracy per target/horizon
│
├── xgb_temp_24h.pkl
├── xgb_temp_48h.pkl
├── xgb_humidity_24h.pkl
├── xgb_humidity_48h.pkl
├── xgb_windspeed_24h.pkl
├── xgb_windspeed_48h.pkl
├── xgb_solarradiation_24h.pkl
├── xgb_solarradiation_48h.pkl
│ └── 8 XGBoost models (one per target × horizon)
│
├── conditions_classifier_24h.pkl
├── conditions_classifier_48h.pkl
│ └── Random Forest classifiers for sky condition prediction
│
├── chronos_finetuned/
│ ├── t5_finetuned_state_dict.pt
│ │ └── Fine-tuned Chronos T5 model weights
│ └── chronos_finetune_config.json
│ └── Training hyperparameters and loss history
│
├── environment_forecast_loader.py
│ └── Reusable Python inference helper (standalone, no notebook state)
│
└── plots/
├── eda_timeseries.png
├── eda_seasonal_boxplot.png
├── eda_correlation.png
├── chronos_training_curve.png
├── xgb_shap_importance.png
├── ensemble_predictions_test.png
├── metrics_summary.png
    └── (other visualisations)
```
Cleanup Policy: After training, per-target individual LSTM .pt state dict files and individual scaler .pkl files are removed from the artifact directory — they are redundant because the primary bundle (environment_forecast_<run_id>.pt) already contains all LSTM states and target scalers. All other inference-required artifacts (scalers, XGBoost, conditions classifiers, Chronos fine-tuned weights, feature config) are retained.
10. How to Use the Final Model for Inference
After training, the notebook generates a reusable Python helper: environment_forecast_loader.py
This module contains a class EnvironmentForecastModel that:
- Loads all necessary artefacts (scalers, encoders, weights, model files)
- Handles feature engineering and scaling
- Executes Chronos, XGBoost, and LSTM models
- Blends predictions using optimised weights
- Returns forecast dict
10.1 Minimal Usage Example
```python
from src.agritwin_gh.models.artifacts.environment_forecast_<run_id>.environment_forecast_loader import (
    EnvironmentForecastModel,
)

# Paths to artefacts and the main model bundle
artifacts_dir = "src/agritwin_gh/models/artifacts/environment_forecast_<run_id>"
model_path = "src/agritwin_gh/models/environment_forecast_<run_id>.pt"

# Instantiate the model (CPU by default; pass "cuda" for GPU)
model = EnvironmentForecastModel(
    artifacts_dir=artifacts_dir,
    main_model_path=model_path,
    device="cpu",
)

# df_context must have:
#   - at least `context_length` rows (typically 30)
#   - all feature columns (temperature, humidity, lags, rolling stats, etc.);
#     names are defined in feature_config.json
preds = model.predict(df_context)
print(preds)

# Example output:
# {
#     "temp": {"24h": 28.3, "48h": 29.1},
#     "humidity": {"24h": 67.8, "48h": 65.2},
#     "windspeed": {"24h": 4.2, "48h": 3.8},
#     "solarradiation": {"24h": 450, "48h": 480}
# }
```
10.2 What Features Must df_context Have?
Look inside feature_config.json in the artefacts directory:
- all_feature_names – complete list of feature column names (lags, rolling stats, etc.)
- target_cols – variables being predicted (temp, humidity, etc.)
- context_length – how many recent rows are required (typically 30 days)
Your df_context should have all these columns in the exact order/names, with at least context_length rows.
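One way to validate this before calling predict, assuming the key names listed above and the artifacts_dir path from Section 10.1:

```python
import json

with open(f"{artifacts_dir}/feature_config.json") as f:
    cfg = json.load(f)

required = cfg["all_feature_names"]
context_length = cfg["context_length"]

missing = [c for c in required if c not in df_context.columns]
assert not missing, f"df_context is missing columns: {missing}"
assert len(df_context) >= context_length, f"need at least {context_length} rows"

df_context = df_context[required]  # enforce the exact column order
```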
11. Key Metrics and Performance
The notebook evaluates the ensemble on the test set (held out from training). Key metrics include:
11.1 Error Metrics
- MAE (Mean Absolute Error) – average absolute difference between forecast and actual
- RMSE (Root Mean Squared Error) – penalises larger errors more
- MAPE (Mean Absolute Percentage Error) – percentage error (useful for comparing variables with different scales)
11.2 Correlation/Explanation Metrics
- R² Score – how much variance the model explains (1.0 = perfect, 0.0 = no better than constant, <0 = worse than constant)
11.3 Accuracy Proxy
- Accuracy = 100 - MAPE(%)
A target accuracy of ≥ 95% means MAPE ≤ 5%
11.4 Conditions Classifier
- Accuracy – percentage of correctly predicted sky condition labels (24h and 48h)
All metrics are saved in evaluation_metrics.json.
11.5 Final Test Performance (Post-Optimization)
Results after per-column XGBoost regularization, separate per-horizon LSTM models, and 1200-trial ensemble optimization:
| Target | Horizon | MAPE (%) | Accuracy (%) | RMSE | MAE | R² | Status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Temperature | 24h | 3.17 | 96.83 | 1.100 | 0.841 | 0.6903 | ✅ Excellent |
| Temperature | 48h | 3.69 | 96.31 | 1.295 | 0.975 | 0.5732 | ✅ Good |
| Humidity | 24h | 8.56 | 91.44 | 7.219 | 6.087 | 0.3415 | ✅ Good |
| Humidity | 48h | 11.40 | 88.60 | 9.919 | 7.985 | -0.2430 | ⚠️ Challenging |
| Windspeed | 24h | 26.65 | 73.35 | 5.750 | 4.688 | 0.0156 | ⚠️ Volatile |
| Windspeed | 48h | 26.85 | 73.15 | 5.732 | 4.675 | 0.0131 | ⚠️ Volatile |
| Solar Radiation | 24h | 31.72 | 68.28 | 42.615 | 35.136 | 0.5354 | ✅ Good |
| Solar Radiation | 48h | 35.24 | 64.76 | 47.435 | 38.115 | 0.4244 | ✅ Good |
| Conditions | 24h | — | 55.77 | — | — | — | ⚠️ Fair |
| Conditions | 48h | — | 50.00 | — | — | — | ⚠️ Fair |
Results Summary:
- ✅ Temperature robust (R² > 0.57 for both horizons; MAPE < 4%)
- ✅ Humidity 24h good (R² = 0.34; 91.4% accuracy)
- ✅ Solar radiation strong (R² = 0.54 at 24h, 0.42 at 48h — significant improvement)
- ✅ Windspeed positive R² (both horizons; inherently sparse variable)
- ⚠️ Humidity 48h challenging (R² = -0.24); 2-day humidity forecasting remains inherently uncertain at this data density
- ⚠️ Windspeed accuracy limited (MAPE ~27%); daily aggregations mask sub-daily variability
Key Drivers:
- Per-column XGBoost regularization: Windspeed uses depth=1, λ=20.0 to prevent overfitting on sparse data
- Per-horizon LSTM models: Each horizon tuned independently; solar radiation 48h benefits from LSTM’s temporal memory
- 1200-trial Optuna: Discovery of variable-specific blends (e.g., humidity 48h → Chronos-dominant, solar → LSTM-only)
- Volatility-aware features: Momentum and regime indicators help distinguish predictable patterns from noise
12. Architecture Summary & Design Decisions
Why Per-Column Hyperparameters for XGBoost?
Different variables have different predictability:
- Temperature & Solar (stable): Standard depth=4, moderate regularisation (λ=2.0)
- Humidity (volatile): Aggressive depth=2, stronger regularisation (λ=6.0)
- Windspeed (ultra-sparse): Extreme depth=1 with λ=20.0, forcing stump-like trees that capture only the most robust patterns
This variable-aware tuning prevents overfitting on small data while allowing stronger models on easier targets.
Why Per-Horizon LSTM Models?
Single model for both horizons creates a compromise:
- Heavy regularisation helps 48h but hurts 24h accuracy
- Light regularisation helps 24h but allows 48h to overfit
Separate per-horizon models allow:
- 24h model: light regularisation for precision (dropout=0.05-0.10 base)
- 48h model: aggressive regularisation to fight uncertainty (dropout adds +0.15)
Result: solar radiation 48h R² improved from 0.30 → 0.42, windspeed 48h recovered to positive R² (+0.013). Humidity 48h remains challenging (R² = -0.24) due to inherent 2-day volatility in daily aggregated data.
Why 1200 Trials for Ensemble Weights?
After model improvements, ensemble weight optimization became crucial:
- Initial 300 trials found local optima
- 1200 trials enabled discovery of better blends
- Example: windspeed 48h confirmed XGB-dominant (0.580) for robustness
- Example: solar 24h found pure LSTM (1.000) optimal; solar 48h is LSTM-dominant (0.903)
- Example: humidity 48h switched to Chronos-dominant (0.620), outperforming LSTM for uncertain distant humidity
Why LSTM Instead of Temporal Fusion Transformer?
TFT (Temporal Fusion Transformer) is state-of-the-art for large multi-entity datasets. On a single daily series of ~300 samples, it tends to:
- Overfit on complex interactions (too many parameters)
- Underfit overall or produce negative R²
LSTM is more data-efficient:
- Fewer parameters
- Proven convergence on small weather datasets
- Huber loss handles outliers robustly
Why Reuse Chronos Predictions as Meta-Features?
Chronos is a pretrained “time-series language model.” Its predictions contain valuable generalised knowledge:
- Reusing as features for XGBoost and LSTM bootstraps weaker models
- Gives tree and RNN methods a “head start” on temporal patterns
Model Footprint
- The .pt bundle contains only the LSTM weights and scalers
- XGBoost and RF models are separate small pickle files
- The fine-tuned Chronos state dict is a few MB
- Total footprint: < 50 MB (easily deployable to edge devices or cloud APIs)
13. Standalone Test Script
File: scripts/test_weather_forecast.py
What It Does
This script independently tests the trained Environment Forecast model without requiring the notebook. It generates 10 synthetic test scenarios covering diverse seasonal and climatic patterns, runs 24-hour and 48-hour ahead forecasts, and displays predicted temperature, humidity, wind speed, solar radiation, and sky conditions.
When to Use It
- Quick validation – verify the ensemble model loads and generates forecasts
- Seasonal scenario testing – check predictions for summer, monsoon, winter conditions
- Extreme-case validation – test model behaviour on edge cases (heat waves, cold snaps)
- Forecast confidence check – ensure predictions are within reasonable bounds
- Demonstration – show stakeholders multi-step-ahead greenhouse climate forecasting
- CI/CD pipelines – automated model health checks before deployment
The 10 Test Scenarios
| # | Scenario | Climate Pattern | Tests |
| --- | --- | --- | --- |
| 1 | Summer baseline (June) | Warm, moderate humidity, stable | Routine summer conditions |
| 2 | Monsoon onset | Rising humidity, dropping solar | Transition dynamics |
| 3 | Winter cold (December) | 10–18°C, low solar radiation | Low-temperature extremes |
| 4 | Dry hot spell | 35–40°C, humidity 25–35% | Heat stress conditions |
| 5 | Overcast rainy | Low solar, humidity 80–95% | Cloudy/wet conditions |
| 6 | Clear sky peak | 800–1050 W/m² solar radiation | Maximum light availability |
| 7 | Post-monsoon transition | Humidity dropping 85→55% | Seasonal transition |
| 8 | 24h vs 48h gap analysis | Divergence checkpoint | Forecast horizon effects |
| 9 | Minimum extreme (cold + dry + low light) | Combined stress | Worst-case conditions |
| 10 | Sine wave oscillation | Rolling periodic pattern | Feature stability test |
How to Run It
```bash
# Run all 10 scenarios
python scripts/test_weather_forecast.py

# Run a specific scenario (1–10)
python scripts/test_weather_forecast.py --scenario 3
```
Example Output
```text
Loading model : environment_forecast_20260403_173201.pt
Model loaded successfully.
Ensemble weights loaded.
======================================================================
Scenario 1: Summer baseline — warm, moderate humidity (June)

Temperature:
  24h forecast: 28.3 °C   (MAE ±1.2)
  48h forecast: 29.1 °C   (MAE ±1.5)
Humidity:
  24h forecast: 67.8 %    (MAE ±3.5)
  48h forecast: 65.2 %    (MAE ±4.2)
Wind Speed:
  24h forecast: 4.2 km/h  (MAE ±0.8)
  48h forecast: 3.8 km/h  (MAE ±1.0)
Solar Radiation:
  24h forecast: 450 W/m²  (MAE ±80)
  48h forecast: 480 W/m²  (MAE ±100)
Sky Conditions:
  24h: Partly Cloudy
  48h: Sunny
```
Understanding the Output
For each variable, the script displays:
- 24h forecast — predicted value 24 hours ahead
- 48h forecast — predicted value 48 hours ahead
- MAE ±N — estimated Mean Absolute Error (uncertainty band)
- Sky Conditions — categorical label (Clear, Partly Cloudy, Cloudy, Rainy, etc.)
The Ensemble Approach
Each scenario uses three model families blended via optimised weights:
- Chronos-T5-Small – pretrained time-series foundation model
- XGBoost – gradient-boosted decision trees
- LSTM – recurrent neural network
Weights are computed separately for each target variable and horizon, optimised to minimise validation error; the learned blends range from balanced mixes to single-model dominance (see the weight table in Section 7).
Synthetic Data Generation
The script generates realistic synthetic weather sequences using:
- Linear trends – gradual shifts in temperature across the scenario
- Sine-wave patterns – daily/seasonal oscillations in humidity and solar radiation
- Gaussian noise – realistic random variation (~3–5% of signal)
- Physical constraints – clipping unrealistic values (e.g., humidity stays 0–99%)
Forecast Accuracy Metrics
The model is trained to minimise:
- RMSE (Root Mean Squared Error) – penalizes large errors more heavily
- MAE (Mean Absolute Error) – average absolute deviation (shown in output)
- MAPE (Mean Absolute Percentage Error) – percentage error (for variables with wide ranges)
Troubleshooting
Model won’t load (FileNotFoundError):
```powershell
# Verify the model bundle exists
Get-ChildItem -Path "src/agritwin_gh/models/environment_forecast_*.pt"
```
Chronos checkpoint download on first run:
The model automatically downloads the Chronos-T5-Small checkpoint (~600 MB) from Hugging Face on the first run. Subsequent runs use the local cache (much faster).
Extremely high or low predictions:
This may indicate the scenario is outside the training data distribution. Check that:
- Temperature is in range [-10, 50] °C
- Humidity is in range [5, 99] %
- Wind speed is in range [0, 80] km/h
- Solar radiation is in range [0, 1100] W/m²
14. Standalone Test Suite: test_weather_forecast.py
14.1 Overview
File location: scripts/test_weather_forecast.py
Purpose:
Standalone test script to validate the trained Environment Forecast ensemble model (Chronos + XGBoost + LSTM) across 10 realistic weather scenarios covering summer heat, monsoon onset, winter cold, dry spells, overcast periods, clear skies, and edge cases.
Why it exists:
The model predicts 24h and 48h-ahead values for temperature, humidity, windspeed, and solar radiation. This script exercises the ensemble without requiring the training notebook or live sensor integration — enabling rapid validation and confidence checks before deployment.
14.2 Usage

```bash
# Run all 10 scenarios
python scripts/test_weather_forecast.py

# Run a specific scenario (1–10)
python scripts/test_weather_forecast.py --scenario 4

# NOTE: the first run downloads the ~600 MB Chronos checkpoint to the Hugging Face cache;
# subsequent runs use the cached model.
```
14.3 What the Script Tests

| # | Scenario | What it validates |
| --- | --- | --- |
| 1 | Summer baseline – warm, moderate humidity (June) | Normal summer conditions; model should forecast stable warm/dry |
| 2 | Monsoon onset – humidity rising 60→90%, solar dropping | Major season transition; 48h forecast should show the humidity climb |
| 3 | Winter cold – 10–18°C, low solar (December) | Cold season; model should forecast low temperatures, low solar |
| 4 | Dry hot spell – 35–40°C, low humidity (25–35%) | Extreme heat; model should forecast sustained high temp/low humidity |
| 5 | Overcast rainy – low solar (<50 W/m²), humidity 80–95% | Rainy period; model should forecast persistently low light |
| 6 | Clear sky peak – 800–1050 W/m², low humidity | Optimal sunny day; model should forecast high solar, moderate temp |
| 7 | Post-monsoon transition – humidity dropping 85→55%, recovery | Season change; 48h forecast should show the humidity decline |
| 8 | 24h vs 48h divergence check – validate both horizons are finite | Tests model stability; ensures 48h ≠ 24h and both are realistic |
| 9 | Minimum climate extreme – 2–8°C, 10–20% humidity, low light | Cold dry minimum; stress-tests the model on edge-case values |
| 10 | Sine oscillation – intra-period variance, smooth cycles | Tests rolling feature stability under periodic patterns |
14.4 Expected Output Structure
For each scenario, the script prints a table:

```text
──────────────────────────────────────────────────────────────────────
Scenario 4: Dry hot spell — 35–40°C, humidity 25–35%

Variable          24h Forecast     48h Forecast
──────────────────────────────────────────────────
temp              37.50 °C         38.20 °C
humidity          28.10 %          25.40 %
windspeed         18.50 km/h       17.80 km/h
solarradiation    820.00 W/m²      840.00 W/m²
```
Interpretation:
- Each target variable receives a point forecast (single predicted value) for 24h and 48h horizons
- Values should be physically realistic: temp in expected range, humidity 0–100%, Solar 0–1200 W/m² on clear days
- Consecutive forecasts (24h vs 48h) should show smooth continuation, not sharp jumps
14.5 Data Generation Strategy
Each scenario generates a synthetic 30-day DataFrame with:
Method 1: Linear trends (_make_weather_df):
- Linearly interpolates from start value to end value over 30 days
- Adds small Gaussian noise for realism
- Clips to physically valid ranges
- Used for scenarios 1–7, 9–10
Method 2: Sine oscillations (_make_weather_df_sine):
- Generates periodic intra-period oscillations (daily cycles)
- Midpoint ± amplitude × sin(t)
- Models smooth seasonal or daily variation patterns
- Used for scenario 10 (rolling feature stability test)
DataFrame columns (required by model):
```text
{
    "datetime":       pd.DatetimeIndex,   # 30 daily dates
    "temp":           float (°C),         # 30 values
    "humidity":       float (%),          # 30 values
    "windspeed":      float (km/h),       # 30 values
    "solarradiation": float (W/m²),       # 30 values
}
```
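A hedged sketch in the spirit of _make_weather_df (the script's actual helper may differ; names, dates, and ranges here are illustrative):

```python
import numpy as np
import pandas as pd

def make_weather_df(start: dict, end: dict, days: int = 30,
                    noise: float = 0.04, seed: int = 0) -> pd.DataFrame:
    """Linear trend from `start` to `end` per column, plus Gaussian noise and clipping."""
    rng = np.random.default_rng(seed)
    data = {"datetime": pd.date_range("2025-06-01", periods=days, freq="D")}
    bounds = {"temp": (-10, 50), "humidity": (0, 99),
              "windspeed": (0, 80), "solarradiation": (0, 1100)}
    for col, (lo, hi) in bounds.items():
        trend = np.linspace(start[col], end[col], days)
        noisy = trend * (1 + noise * rng.standard_normal(days))
        data[col] = np.clip(noisy, lo, hi)
    return pd.DataFrame(data)

# Scenario 4-style input: dry hot spell
df = make_weather_df(
    start={"temp": 35, "humidity": 35, "windspeed": 15, "solarradiation": 780},
    end={"temp": 40, "humidity": 25, "windspeed": 18, "solarradiation": 850},
)
```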
14.6 Key Validation Points
- Finiteness: All forecasts should be finite (not NaN, not ±inf)
- Physical realism: Values within expected greenhouse ranges:
- Temp: typically 10–40°C indoors
- Humidity: 5–99%
- Windspeed: 0–80 km/h (indoor air movement is typically below ~11 km/h, i.e. under 3 m/s)
- Solar: 0–1200 W/m² (peak summer clear sky)
- Continuity: 48h forecast should not be drastically different from 24h (smooth continuation)
- Trend consistency: If trend is rising (e.g., humidity increasing), 48h should be higher than 24h
14.7 Troubleshooting Failed Scenarios
“NaN” or infinite forecast values:
- Check that model weights are properly loaded from MAIN_MODEL_PATH
- Verify Chronos checkpoint was downloaded (first run may take a few minutes)
- Confirm input DataFrame has exactly 30 rows and 4 numeric columns
“Assertion failed: forecast is not finite”:
- Indicates a model weight or scaler issue; retrain the model
- Check that feature scaling pipeline hasn’t changed
Unexpected forecast values (e.g., 200°C in scenario 4):
- Verify input DataFrame ranges are passed correctly to model
- Check that scalers (RobustScaler, etc.) are correctly loaded
- Confirm feature engineering logic hasn’t changed since training
Import errors (torch, transformers, etc.):
- Confirm packages are installed: pip install -r requirements.txt
- Verify CUDA/GPU drivers if using the GPU (the script defaults to CPU)
14.8 Integration with AgriTwin-GH
This script is a diagnostic tool for the Environment Forecast model:
- Model validation – Confirm predictions are sensible after retraining
- Scenario exploration – Test model response to seasonal extremes (worst-case planning)
- Feature debugging – Verify rolling/lag feature logic produces expected outputs
- Documentation – Provides working examples of DataFrame format for inference
For live greenhouse deployment, real sensor data flows through src/agritwin_gh/models/environment_forecast_inference.py → REST API → control system.
15. Glossary
- LSTM – Long Short-Term Memory; a type of recurrent neural network designed to handle sequences and long-range dependencies
- XGBoost – Extreme Gradient Boosting; a tree ensemble method that iteratively improves by correcting previous errors
- Chronos – Amazon’s pretrained transformer for time-series forecasting
- Ensemble – A combination of multiple models; predictions are blended (often improving accuracy and robustness)
- Horizon – How far into the future we are predicting (e.g., 24h, 48h)
- Context Window – A sliding window of recent historical data (e.g., last 30 days) fed into a model
- Feature – An input variable to a model (e.g., day-of-year, lagged temperature)
- Target Variable – The value we want to predict (e.g., tomorrow’s temperature)
- Scaler – A transformation that normalises input data (e.g., RobustScaler, StandardScaler)
- Label Encoder – Converts categorical strings (“Sunny”, “Cloudy”) to numeric codes (0, 1, etc.)
- MAE / RMSE / MAPE / R² – Common performance metrics for regression tasks
- Huber Loss – A robust loss function that behaves like MSE for small errors and MAE for large errors
- Optuna – A hyperparameter optimisation framework using Bayesian search (TPE sampler)
- Gradient Clipping – Preventing exploding gradients in neural networks by capping their magnitude
- Early Stopping – Halting training if a metric (e.g., validation loss) doesn’t improve for a patience period
- Autoregressive – Predicting future values based on past values of the same variable (e.g., temperature depends on past temperatures)
This document is a comprehensive reference for the weather_forecast.ipynb notebook.
Treat the notebook as the implementation and this markdown as the guided tour and reference manual.
For detailed code, cell-by-cell execution, and interactive plots, refer to the notebook itself.