Generated by: tomato_disease_progression_synthetic_generator.ipynb
Input backbone: tomato_growth_progression_synthetic_hourly.csv
Output CSV: tomato_disease_progression_synthetic_hourly.csv
Random seed: 42
Total rows: 54,240 (10,848 hourly timesteps × 5 diseases)
Timeline: 2024-07-01 to 2025-10-21
This synthetic dataset provides a realistic, sequential record of tomato disease progression in a controlled greenhouse environment. It uses hourly environmental and crop-stage backbone data as the spine and overlays multi-cycle, multi-severity disease dynamics for five diseases.
Intended use: Sequence model training for disease severity nowcasting and progression modelling. Future prediction targets (24h, 48h look-ahead) are intentionally excluded — they will be generated separately by a downstream process, allowing full flexibility in horizon and label design.
| Disease | Pathogen / Organism | Primary Trigger Conditions |
|---|---|---|
| early_blight | Alternaria solani | Warm (25–32 °C), moderate-high humidity (55–80%) |
| late_blight | Phytophthora infestans | Cool (10–22 °C), very high humidity (>85%) |
| leaf_mold | Passalora fulva | Moderate temp (16–24 °C), very high humidity (>82%) |
| powdery_mildew | Leveillula taurica | Moderate temp (20–28 °C), moderate humidity (40–70%) |
| spider_mites | Tetranychus urticae | Hot (28–38 °C), dry conditions (<52% RH) |
Each disease is scored using a piecewise-linear favorability function per environmental variable:
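\[
\text{score}(x) =
\begin{cases}
1.0, & \text{opt\_low} \le x \le \text{opt\_high} \\
\dfrac{x - \text{stress\_low}}{\text{opt\_low} - \text{stress\_low}}, & \text{stress\_low} < x < \text{opt\_low} \\
\dfrac{\text{stress\_high} - x}{\text{stress\_high} - \text{opt\_high}}, & \text{opt\_high} < x < \text{stress\_high} \\
0.0, & x \le \text{stress\_low} \ \text{or} \ x \ge \text{stress\_high}
\end{cases}
\]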
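The piecewise-linear score can be sketched in vectorised form as follows. This is a minimal illustration, not the generator's actual code: the function name `favorability_score` is hypothetical, and the stress limits in the usage note (15 °C and 40 °C) are invented for the example because the stress bounds are not listed here.

```python
import numpy as np

def favorability_score(x, stress_low, opt_low, opt_high, stress_high):
    """Trapezoidal score: 1 inside the optimum band, linear ramps down to 0
    at the stress limits, 0 outside them.

    Assumes stress_low < opt_low <= opt_high < stress_high.
    """
    x = np.asarray(x, dtype=float)
    rising = (x - stress_low) / (opt_low - stress_low)        # ramp below the optimum
    falling = (stress_high - x) / (stress_high - opt_high)    # ramp above the optimum
    # The smaller ramp is the binding one; clipping yields the flat 0 and 1 regions.
    return np.clip(np.minimum(rising, falling), 0.0, 1.0)

# early_blight temperature optimum (25, 32) with hypothetical stress limits 15 and 40
scores = favorability_score([15.0, 20.0, 28.0, 36.0, 45.0], 15.0, 25.0, 32.0, 40.0)
```

Taking the minimum of the two ramps and clipping reproduces all four branches of the definition without explicit conditionals, which keeps the function usable on whole columns.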
The overall favorability is a weighted sum:
```python
fav = w_temp * temp_score + w_hum * hum_score + w_vpd * vpd_score
fav += lw_bonus * leaf_wetness_proxy   # bonus or penalty for leaf wetness
fav = max(fav, 0)
if air_velocity > airflow_threshold:
    fav *= airflow_penalty_factor
fav = clip(fav, 0, 1)
```
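A runnable version of the same steps might look like the sketch below. The weights, `airflow_threshold`, and `airflow_penalty_factor` defaults are placeholders chosen only for illustration (their real values are not given here), and the function name `overall_favorability` is hypothetical. The `lw_bonus` values in the usage lines are the ones listed for late_blight and spider_mites.

```python
import numpy as np

def overall_favorability(temp_score, hum_score, vpd_score, leaf_wetness_proxy,
                         air_velocity, *, lw_bonus,
                         w_temp=0.4, w_hum=0.4, w_vpd=0.2,      # placeholder weights
                         airflow_threshold=0.5,                  # placeholder threshold
                         airflow_penalty_factor=0.85):           # placeholder penalty
    """Weighted sum of per-variable scores with leaf-wetness and airflow adjustments."""
    fav = w_temp * temp_score + w_hum * hum_score + w_vpd * vpd_score
    fav = fav + lw_bonus * leaf_wetness_proxy        # positive bonus or negative penalty
    fav = np.maximum(fav, 0.0)                       # a penalty cannot push fav below 0
    fav = np.where(air_velocity > airflow_threshold, # damp favorability under strong airflow
                   fav * airflow_penalty_factor, fav)
    return np.clip(fav, 0.0, 1.0)

# late_blight (lw_bonus = 0.30): ideal conditions with wet leaves saturate at 1.0
fav_lb = overall_favorability(1.0, 1.0, 1.0, 1.0, 0.2, lw_bonus=0.30)
# spider_mites (lw_bonus = -0.18): wet leaves and strong airflow both reduce favorability
fav_sm = overall_favorability(0.5, 0.5, 0.5, 1.0, 1.0, lw_bonus=-0.18)
```

Because every operation is a NumPy ufunc, the same function accepts scalars or full hourly columns.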
| Disease | temp_opt | hum_opt | vpd_opt | r (growth) | d (decline) | lw_bonus |
|---|---|---|---|---|---|---|
| early_blight | (25.0, 32.0) | (55.0, 80.0) | (0.7, 2.0) | 0.180 | 0.042 | 0.22 |
| late_blight | (10.0, 22.0) | (85.0, 100.0) | (0.0, 0.6) | 0.220 | 0.048 | 0.30 |
| leaf_mold | (16.0, 24.0) | (82.0, 100.0) | (0.0, 0.5) | 0.160 | 0.038 | 0.28 |
| powdery_mildew | (20.0, 28.0) | (40.0, 70.0) | (0.6, 1.8) | 0.140 | 0.032 | 0.00 |
| spider_mites | (28.0, 38.0) | (28.0, 52.0) | (2.0, 4.5) | 0.200 | 0.044 | -0.18 |
Disease progression speed is modulated by the crop growth stage. A per-stage susceptibility score (0 = fully resistant, 1 = fully susceptible) multiplies the instantaneous growth rate:
| Stage | early_blight | late_blight | leaf_mold | powdery_mildew | spider_mites |
|---|---|---|---|---|---|
| seedling | 0.35 | 0.45 | 0.30 | 0.50 | 0.40 |
| early_vegetative | 0.55 | 0.60 | 0.50 | 0.70 | 0.62 |
| flowering_initiation | 0.72 | 0.78 | 0.68 | 0.78 | 0.75 |
| flowering | 0.85 | 0.90 | 0.88 | 0.82 | 0.85 |
| unripe | 0.88 | 0.92 | 0.80 | 0.65 | 0.80 |
| ripe | 0.60 | 0.55 | 0.45 | 0.40 | 0.50 |
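As a small illustration of how the table is applied, the sketch below stores it as a nested lookup and uses it in two ways: scaling a growth rate `r`, and forming the risk score described later as fav × susc. The helper names are hypothetical, and how the generator combines susceptibility with other factors beyond this multiplication is not shown here.

```python
DISEASES = ("early_blight", "late_blight", "leaf_mold", "powdery_mildew", "spider_mites")

# Rows copied from the susceptibility table; column order follows DISEASES.
_ROWS = {
    "seedling":             (0.35, 0.45, 0.30, 0.50, 0.40),
    "early_vegetative":     (0.55, 0.60, 0.50, 0.70, 0.62),
    "flowering_initiation": (0.72, 0.78, 0.68, 0.78, 0.75),
    "flowering":            (0.85, 0.90, 0.88, 0.82, 0.85),
    "unripe":               (0.88, 0.92, 0.80, 0.65, 0.80),
    "ripe":                 (0.60, 0.55, 0.45, 0.40, 0.50),
}
SUSCEPTIBILITY = {stage: dict(zip(DISEASES, vals)) for stage, vals in _ROWS.items()}

def stage_scaled_growth_rate(r, stage, disease):
    """Growth rate multiplied by the stage susceptibility score."""
    return r * SUSCEPTIBILITY[stage][disease]

def disease_risk_score(fav, stage, disease):
    """Combined hourly risk: environmental favorability times susceptibility."""
    return fav * SUSCEPTIBILITY[stage][disease]

# late_blight (r = 0.220) during flowering (susceptibility 0.90)
rate = stage_scaled_growth_rate(0.220, "flowering", "late_blight")
risk = disease_risk_score(0.5, "flowering", "late_blight")
```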
Outbreak seed positions are pre-planned per disease. Each seed defines the exact row index where the initial infection is injected. Each disease receives 6 seeds across the full timeline: two mild, two moderate, and two severe.
This mix ensures training diversity across the full infection severity spectrum.
| Disease | Start Row | Initial Seed % | Peak Severity % | Category | Max Duration (h) |
|---|---|---|---|---|---|
| early_blight | 300 | 0.8 | 16.0 | mild | 420 |
| early_blight | 1400 | 2.0 | 82.0 | severe | 1080 |
| early_blight | 3100 | 1.0 | 38.0 | moderate | 580 |
| early_blight | 5600 | 2.5 | 91.0 | severe | 1150 |
| early_blight | 7600 | 0.6 | 20.0 | mild | 440 |
| early_blight | 9200 | 1.4 | 46.0 | moderate | 620 |
| late_blight | 420 | 1.5 | 78.0 | severe | 1040 |
| late_blight | 3000 | 0.5 | 14.0 | mild | 360 |
| late_blight | 4200 | 1.2 | 42.0 | moderate | 600 |
| late_blight | 6400 | 0.6 | 18.0 | mild | 400 |
| late_blight | 8100 | 2.2 | 88.0 | severe | 1120 |
| late_blight | 9700 | 1.0 | 36.0 | moderate | 540 |
| leaf_mold | 200 | 1.8 | 74.0 | severe | 1000 |
| leaf_mold | 2200 | 0.7 | 19.0 | mild | 420 |
| leaf_mold | 3800 | 1.2 | 44.0 | moderate | 620 |
| leaf_mold | 5800 | 2.0 | 86.0 | severe | 1100 |
| leaf_mold | 8300 | 0.5 | 15.0 | mild | 380 |
| leaf_mold | 9800 | 1.3 | 40.0 | moderate | 560 |
| powdery_mildew | 650 | 0.7 | 21.0 | mild | 450 |
| powdery_mildew | 1900 | 2.2 | 85.0 | severe | 1100 |
| powdery_mildew | 3600 | 1.0 | 40.0 | moderate | 580 |
| powdery_mildew | 6000 | 2.0 | 92.0 | severe | 1150 |
| powdery_mildew | 7900 | 0.8 | 18.0 | mild | 430 |
| powdery_mildew | 9400 | 1.5 | 44.0 | moderate | 600 |
| spider_mites | 350 | 2.0 | 80.0 | severe | 1060 |
| spider_mites | 2000 | 0.6 | 17.0 | mild | 400 |
| spider_mites | 3700 | 1.2 | 42.0 | moderate | 600 |
| spider_mites | 6100 | 2.5 | 93.0 | severe | 1200 |
| spider_mites | 8000 | 1.3 | 46.0 | moderate | 640 |
| spider_mites | 9900 | 0.5 | 16.0 | mild | 380 |
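One way to work with the seed table is to hold each row in a small record and check that consecutive outbreak windows for a disease do not collide. The sketch below does this for the early_blight rows. It assumes one row per hour within a single disease's series, so that `start_row + max_duration_h` is the latest row a cycle can occupy; the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OutbreakSeed:
    disease: str
    start_row: int
    initial_pct: float
    peak_pct: float
    category: str
    max_duration_h: int

    @property
    def end_row(self) -> int:
        # Assumes one row per hour, so duration in hours equals duration in rows.
        return self.start_row + self.max_duration_h

# early_blight rows copied from the seed table
EARLY_BLIGHT_SEEDS = [
    OutbreakSeed("early_blight", 300, 0.8, 16.0, "mild", 420),
    OutbreakSeed("early_blight", 1400, 2.0, 82.0, "severe", 1080),
    OutbreakSeed("early_blight", 3100, 1.0, 38.0, "moderate", 580),
    OutbreakSeed("early_blight", 5600, 2.5, 91.0, "severe", 1150),
    OutbreakSeed("early_blight", 7600, 0.6, 20.0, "mild", 440),
    OutbreakSeed("early_blight", 9200, 1.4, 46.0, "moderate", 620),
]

def windows_overlap(seeds):
    """True if any cycle could still be running when the next seed is injected."""
    ordered = sorted(seeds, key=lambda s: s.start_row)
    return any(a.end_row > b.start_row for a, b in zip(ordered, ordered[1:]))

overlap = windows_overlap(EARLY_BLIGHT_SEEDS)
last_end = max(s.end_row for s in EARLY_BLIGHT_SEEDS)
```

For the early_blight seeds the windows are disjoint and the last one ends at row 9,820, inside the 10,848 hourly timesteps.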
Each outbreak cycle advances through up to six sequential phases:
| Phase | Entry Condition | Growth Behaviour |
|---|---|---|
| latent | Cycle seed injected | Very slow fixed growth; pathogen establishing |
| onset | elapsed >= 24 h OR infection >= 2.5% | Moderate logistic growth at 50% of full rate |
| spread | infection >= max(12% of K, 5%) | Full aggressive logistic growth (env-responsive) |
| severe | infection >= 70% of K | Fluctuation near K; control events cause decline |
| stabilization | elapsed >= 76% of max_duration OR 24 h of low env. fav. | Gentle decline from peak |
| decline | elapsed >= 88% of max_duration | Exponential decay until infection < 0.5% |
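The entry conditions in the table can be read as a small forward-only state machine. The sketch below is one possible interpretation, not the generator's code. It makes three assumptions that the table does not state: later-phase rules take priority over earlier ones, a cycle never moves back to an earlier phase, and the 24-hour low-favorability rule only applies once the cycle has reached spread or severe. The function name `next_phase` and the `low_fav_hours` counter are hypothetical.

```python
PHASE_ORDER = ["latent", "onset", "spread", "severe", "stabilization", "decline"]

def next_phase(phase, elapsed_h, infection_pct, k_pct, max_duration_h, low_fav_hours=0):
    """Return the phase for the next hour of an active cycle.

    phase must be one of PHASE_ORDER; k_pct is the cycle's peak severity K (%).
    low_fav_hours counts consecutive hours of low environmental favorability.
    """
    if phase == "decline" and infection_pct < 0.5:
        return "none"                                   # cycle has ended
    if elapsed_h >= 0.88 * max_duration_h:
        candidate = "decline"
    elif elapsed_h >= 0.76 * max_duration_h or (
        phase in ("spread", "severe") and low_fav_hours >= 24   # assumed restriction
    ):
        candidate = "stabilization"
    elif infection_pct >= 0.70 * k_pct:
        candidate = "severe"
    elif infection_pct >= max(0.12 * k_pct, 5.0):
        candidate = "spread"
    elif elapsed_h >= 24 or infection_pct >= 2.5:
        candidate = "onset"
    else:
        candidate = "latent"
    # Forward-only: keep whichever of the current and candidate phases is later.
    return max(phase, candidate, key=PHASE_ORDER.index)
```

For a cycle with K = 80% and a maximum duration of 1,000 hours, `next_phase("onset", 100, 10.0, 80.0, 1000)` returns `"spread"` because 10% exceeds max(12% of 80, 5) = 9.6%.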
When a control action is active during spread/onset: \(\Delta I_{\text{ctrl}} = \Delta I \cdot (1 - \text{intervention\_reduction})\)
Gaussian noise is added for realism at every step: \(I_{t+1} = \text{clip}\left( I_t + \Delta I + \mathcal{N}(0,\, \sigma), \; 0, 100 \right)\)
where $\sigma = \max(0.03 \cdot I_t,\; 0.08)$ during spread.
In the decline phase, $d_r$ denotes the decline rate, which is multiplied by 2.5 while a control action is active.
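The control reduction and the noisy, clipped update can be sketched as a single hourly step. The update and the noise scale follow the formulas above. The `logistic_increment` helper uses the textbook logistic form $r \cdot I \cdot (1 - I/K)$, which is an assumption here since the exact growth equation is not reproduced in this section; the function names and the default `intervention_reduction` of 0.6 (a value inside the stated 55–68% range) are likewise illustrative.

```python
import numpy as np

def logistic_increment(i_t, r, k):
    """Textbook logistic growth increment (assumed form, not confirmed by the source)."""
    return r * i_t * (1.0 - i_t / k)

def step_infection(i_t, delta_i, rng, *, control_active=False, intervention_reduction=0.6):
    """Advance infection by one hour with control damping, Gaussian noise, and clipping."""
    if control_active:
        delta_i = delta_i * (1.0 - intervention_reduction)   # Delta I_ctrl
    sigma = max(0.03 * i_t, 0.08)                             # noise scale stated for spread
    noisy = i_t + delta_i + rng.normal(0.0, sigma)
    return float(np.clip(noisy, 0.0, 100.0))

rng = np.random.default_rng(42)   # same seed value as the dataset header
delta = logistic_increment(40.0, 0.18, 80.0)   # early_blight r with K = 80
i_next = step_infection(40.0, delta, rng, control_active=True)
```

Passing the generator explicitly keeps the step reproducible and makes it easy to compare runs with and without an active control event.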
| Parameter | Value |
|---|---|
| Trigger level | infection >= 15% |
| Trigger probability | 35% per eligible hour |
| Active duration | 48 hours per event |
| Cooldown period | 72 hours between events |
| Events per cycle | Multiple allowed (cooldown-gated) |
During a control event, the growth increment is reduced by intervention_reduction (55–68%) and the decline rate is multiplied by 2.5.
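The trigger, duration, and cooldown rules can be simulated with two counters, as in the sketch below. It assumes the 72-hour cooldown starts when an event ends, which is one reading of "between events", and that eligibility is re-checked every hour. The function name `schedule_control_events` is hypothetical.

```python
import numpy as np

TRIGGER_LEVEL_PCT = 15.0      # infection >= 15%
TRIGGER_PROBABILITY = 0.35    # per eligible hour
ACTIVE_HOURS = 48             # duration of one event
COOLDOWN_HOURS = 72           # assumed to start when the event ends

def schedule_control_events(infection_pct, rng):
    """Return a 0/1 array marking hours inside an active control window."""
    flags = np.zeros(len(infection_pct), dtype=int)
    active_left = 0
    cooldown_left = 0
    for t, level in enumerate(infection_pct):
        if active_left > 0:                    # inside a running event
            flags[t] = 1
            active_left -= 1
            if active_left == 0:
                cooldown_left = COOLDOWN_HOURS
        elif cooldown_left > 0:                # waiting out the cooldown
            cooldown_left -= 1
        elif level >= TRIGGER_LEVEL_PCT and rng.random() < TRIGGER_PROBABILITY:
            flags[t] = 1                       # event starts this hour
            active_left = ACTIVE_HOURS - 1
    return flags

flags = schedule_control_events(np.full(600, 20.0), np.random.default_rng(42))
```

With infection held at 20% for 600 hours, the output contains repeated 48-hour windows separated by at least 72 inactive hours, matching the "multiple allowed, cooldown-gated" row.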
| Disease | Control Action Options |
|---|---|
| early_blight | fungicide_spray, copper_treatment, reduced_irrigation |
| late_blight | fungicide_spray, ventilation_increase, humidity_reduction |
| leaf_mold | ventilation_increase, humidity_reduction, leaf_pruning |
| powdery_mildew | sulfur_treatment, potassium_bicarbonate, improved_airflow |
| spider_mites | acaricide_spray, humidity_increase, biological_release |
| Column | Type | Description |
|---|---|---|
| disease_name | str | One of the five simulated diseases |
| disease_present_flag | int (0/1) | 1 if infection > 0% at this hour |
| disease_cycle_id | int | Sequential cycle number (1–6 per disease); 0 = no active cycle |
| disease_cycle_stage | str | Phase: latent / onset / spread / severe / stabilization / decline / none |
| outbreak_trigger_flag | int (0/1) | 1 at the exact hour an outbreak seed was injected |
| control_action_flag | int (0/1) | 1 during an active control intervention window |
| control_action_type | str | Action applied (e.g., fungicide_spray); ‘none’ otherwise |
| stage_susceptibility_score | float | Susceptibility of current growth stage $\in [0, 1]$ |
| disease_risk_score | float | fav × susc; combined hourly disease risk $\in [0, 1]$ |
| hours_since_disease_onset | float | Hours elapsed since current cycle onset; NaN if no active cycle |
| current_infection_pct | float | Simulated crop infection percentage $\in [0, 100]$ |
| infection_growth_rate_hourly | float | Delta infection per hour (positive = spreading, negative = declining) |
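A loaded file can be sanity-checked against the column descriptions above with a few vectorised tests. The sketch below is a hypothetical helper, not part of the generator; it only encodes constraints stated in the table (value ranges, the presence flag rule, the allowed phase names, and NaN onset hours when no cycle is active).

```python
import pandas as pd

VALID_STAGES = {"latent", "onset", "spread", "severe", "stabilization", "decline", "none"}

def check_disease_columns(df: pd.DataFrame) -> list:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if not df["current_infection_pct"].between(0, 100).all():
        problems.append("current_infection_pct outside [0, 100]")
    if not df["disease_risk_score"].between(0, 1).all():
        problems.append("disease_risk_score outside [0, 1]")
    if not df["stage_susceptibility_score"].between(0, 1).all():
        problems.append("stage_susceptibility_score outside [0, 1]")
    expected_flag = (df["current_infection_pct"] > 0).astype(int)
    if not (df["disease_present_flag"] == expected_flag).all():
        problems.append("disease_present_flag inconsistent with infection > 0")
    if not df["disease_cycle_stage"].isin(VALID_STAGES).all():
        problems.append("unknown disease_cycle_stage value")
    idle = df["disease_cycle_id"] == 0
    if not df.loc[idle, "hours_since_disease_onset"].isna().all():
        problems.append("hours_since_disease_onset set while no cycle is active")
    return problems
```

Returning a list rather than raising on the first failure makes it easier to see every violated rule in one pass.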
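This dataset does not contain direct prediction target columns such as predicted_infection_pct_24h or predicted_infection_pct_48h.
Rationale: targets are generated separately by a downstream process, which keeps the choice of horizon and label design fully flexible.
How to derive targets from this dataset: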
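```python
import pandas as pd

# Example: create 24h and 48h look-ahead targets for one disease
df_final = pd.read_csv("tomato_disease_progression_synthetic_hourly.csv", parse_dates=["timestamp"])
df_disease = df_final[df_final["disease_name"] == "early_blight"].copy()
df_disease = df_disease.sort_values("timestamp").reset_index(drop=True)

# shift(-h) pulls the value from h rows (hours) ahead; the last h rows become NaN
df_disease["target_24h"] = df_disease["current_infection_pct"].shift(-24)
df_disease["target_48h"] = df_disease["current_infection_pct"].shift(-48)
```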
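To build targets for all five diseases in one pass, the shift can be applied per group so that values never leak from one disease's series into another. This is a sketch under the assumption that each disease has one row per hour with no gaps, so that shifting by `h` rows equals looking `h` hours ahead; the function name `add_lookahead_targets` is hypothetical.

```python
import pandas as pd

def add_lookahead_targets(df: pd.DataFrame, horizons=(24, 48)) -> pd.DataFrame:
    """Add target_<h>h columns holding current_infection_pct h rows ahead, per disease."""
    out = df.sort_values(["disease_name", "timestamp"]).reset_index(drop=True)
    grouped = out.groupby("disease_name")["current_infection_pct"]
    for h in horizons:
        # Group-wise shift: the last h rows of every disease become NaN
        # instead of borrowing values from the next disease's series.
        out[f"target_{h}h"] = grouped.shift(-h)
    return out
```

Rows with NaN targets at the end of each disease's series would typically be dropped before training.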
| Missing Column | Fallback Action |
|---|---|
| leaf_wetness_proxy | Derived as (indoor_humidity > 95).astype(float) |
| vpd | Uses vpd_proxy if available; otherwise 1.2 kPa |
| Any environmental column | Default neutral value; printed as warning |
| stage_name | Defaults to early_vegetative (susceptibility 0.5) |
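The fallback rules can be sketched as a single preprocessing function. The column names `indoor_humidity`, `vpd_proxy`, `leaf_wetness_proxy`, `vpd`, and `stage_name` come from the table; the function name `apply_fallbacks` and the `neutral_defaults` argument are hypothetical, and the neutral values themselves are left to the caller because they are not listed here.

```python
import pandas as pd

def apply_fallbacks(df: pd.DataFrame, neutral_defaults=None) -> pd.DataFrame:
    """Fill in missing backbone columns following the fallback table."""
    out = df.copy()
    # Environmental columns first, so later rules can rely on them being present.
    for col, value in (neutral_defaults or {}).items():
        if col not in out.columns:
            print(f"Warning: '{col}' missing; using neutral default {value}")
            out[col] = value
    if "vpd" not in out.columns:
        out["vpd"] = out["vpd_proxy"] if "vpd_proxy" in out.columns else 1.2  # kPa
    if "leaf_wetness_proxy" not in out.columns:
        # Requires indoor_humidity, either in the input or via neutral_defaults.
        out["leaf_wetness_proxy"] = (out["indoor_humidity"] > 95).astype(float)
    if "stage_name" not in out.columns:
        out["stage_name"] = "early_vegetative"
    return out
```

Working on a copy leaves the caller's frame untouched, which makes it simple to compare the data before and after the fallbacks are applied.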
K per outbreak cycle.
Generated using Python · pandas · numpy · AgriTwin-GH project