This guide explains how to set up and use the image storage system for AgriTwin-GH, which stores image files in MinIO object storage and image metadata in a PostgreSQL database.
The system consists of:
Image Files (Disk)
↓
Upload Script (Python)
↓
┌───────────────────┐
│ MinIO │ ← Image blobs (JPG, PNG, etc.)
│ (Object Store) │
└───────────────────┘
↓ (Metadata)
┌───────────────────┐
│ PostgreSQL │ ← Image metadata, categories, paths
│ (Database) │
└───────────────────┘
# Activate your virtual environment
.\.venv\Scripts\activate
# Install required packages
uv add -r requirements.txt
Option A: Using Docker (Recommended)
docker run -d `
-p 9000:9000 `
-p 9001:9001 `
--name minio `
-v E:\minio-data:/data `
-e "MINIO_ROOT_USER=minioadmin" `
-e "MINIO_ROOT_PASSWORD=minioadmin123" `
quay.io/minio/minio server /data --console-address ":9001"
Option B: Using Binary
./minio.exe server E:\minio-data --console-address ":9001"
Open your browser to: http://localhost:9001
Username: minioadmin
Password: minioadmin123 (or whatever you set)
Copy the example environment file:
Copy-Item .env.example .env
Edit the .env file and update it with your actual credentials:
# Database (TimescaleDB Container - used for both timeseries and image metadata)
DB_HOST=localhost
DB_PORT=5432
DB_NAME=agritwin_db
DB_USER=postgres
DB_PASSWORD=agritwin-gh
# MinIO Configuration
MINIO_ENDPOINT=localhost:9000
MINIO_ACCESS_KEY=minioadmin
MINIO_SECRET_KEY=minioadmin123
MINIO_SECURE=false
Note: Image metadata tables will be added to your existing agritwin_db database.
🐳 Docker Users: See POSTGRES_DOCKER_SETUP.md for complete Docker-specific commands.
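One way to read these settings from Python is sketched below. It is a minimal illustration rather than the project's actual configuration loader: the `Settings` dataclass and the `load_settings` helper are hypothetical names, and only the variable names are taken from the .env file above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the values defined in .env."""
    db_host: str
    db_port: int
    db_name: str
    db_user: str
    db_password: str
    minio_endpoint: str
    minio_access_key: str
    minio_secret_key: str
    minio_secure: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from a mapping, failing early if a variable is missing."""
    required = [
        "DB_HOST", "DB_PORT", "DB_NAME", "DB_USER", "DB_PASSWORD",
        "MINIO_ENDPOINT", "MINIO_ACCESS_KEY", "MINIO_SECRET_KEY",
    ]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"Missing settings: {', '.join(missing)}")
    return Settings(
        db_host=env["DB_HOST"],
        db_port=int(env["DB_PORT"]),
        db_name=env["DB_NAME"],
        db_user=env["DB_USER"],
        db_password=env["DB_PASSWORD"],
        minio_endpoint=env["MINIO_ENDPOINT"],
        minio_access_key=env["MINIO_ACCESS_KEY"],
        minio_secret_key=env["MINIO_SECRET_KEY"],
        # Assumption: MINIO_SECURE is optional; anything other than "true" means plain HTTP.
        minio_secure=env.get("MINIO_SECURE", "false").strip().lower() == "true",
    )
```

If you use python-dotenv, call `load_dotenv()` before `load_settings()` so that the values in .env are present in `os.environ`.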
# List running containers
docker ps
# Or list all containers
docker ps -a
The image metadata tables will be created in your existing agritwin_db database:
Via Docker (PowerShell):
# Run schema using Get-Content (PowerShell doesn't support < redirection)
Get-Content database/schema/image_metadata.sql | docker exec -i agritwin-timescaledb psql -U postgres -d agritwin_db
If you have psql installed locally:
psql -h localhost -U postgres -d agritwin_db -f database/schema/image_metadata.sql
# Via Docker
docker exec -i agritwin-timescaledb psql -U postgres -d agritwin_db -c "\dt"
# You should see your existing tables PLUS:
# image_metadata
# image_annotations
# image_access_log
If using Docker:
docker start minio
If using binary:
./minio.exe server E:\minio-data --console-address ":9001"
# Test connection (from Python)
python -c "from minio import Minio; client = Minio('localhost:9000', access_key='minioadmin', secret_key='minioadmin123', secure=False); print('Connected:', client.bucket_exists('test') or True)"
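Before testing credentials, it can help to confirm that the service ports are reachable at all. The sketch below uses only the Python standard library; `port_open` is a hypothetical helper, not part of the project, and it only checks that something is listening on the port, not that the login works.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts and timeouts.
        return False
```

For the setup in this guide you might call `port_open("localhost", 9000)` for the MinIO API, `port_open("localhost", 9001)` for the console, and `port_open("localhost", 5432)` for PostgreSQL.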
Upload all three image datasets:
python scripts/upload_images_to_minio.py
# Dry run (test the run without uploading)
python scripts/upload_images_to_minio.py --dry-run
# Process 50 images at a time
python scripts/upload_images_to_minio.py --batch-size 50
# Upload only disease images
python scripts/upload_images_to_minio.py --directories "data/external/Tomato Diseases"
# Upload multiple specific directories
python scripts/upload_images_to_minio.py --directories "data/external/Tomato Diseases" "data/external/Tomato Growth Stages"
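The options used above could be declared with `argparse` roughly as follows. This is a sketch of how such a command line might be parsed, not the actual code of upload_images_to_minio.py: the `build_parser` name, the help texts, and the default batch size of 100 are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser for the three options shown in this guide."""
    parser = argparse.ArgumentParser(
        description="Upload image datasets to MinIO (illustrative sketch)"
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Go through the files without uploading anything",
    )
    parser.add_argument(
        "--batch-size",
        type=int,
        default=100,  # assumed default; check the real script
        help="Number of images to process per batch",
    )
    parser.add_argument(
        "--directories",
        nargs="+",  # accepts one or more quoted directory paths
        default=None,
        help="Directories to upload; all datasets when omitted",
    )
    return parser
```

Because `--directories` takes one or more values, paths that contain spaces must be quoted, as in the commands above.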
The script shows a progress bar for each directory while it runs and prints a summary when it finishes.
Example output:
Processing directory: Tomato Diseases
Found 6000 image files
Tomato Diseases: 100%|████████████| 6000/6000 [05:30<00:00, 18.12 images/s]
================================================================================
Upload Complete!
================================================================================
Total files found: 14087
Successfully uploaded: 14087
Failed: 0
Skipped: 0
Total size: 2500.45 MB
Duration: 1234.56 seconds
Average: 11.41 images/second
================================================================================
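The "Average" line in the summary is the number of uploaded images divided by the total duration. A small sketch of that arithmetic, with hypothetical helper names, is:

```python
def average_rate(count: int, seconds: float) -> float:
    """Items processed per second; 0.0 when no time has elapsed."""
    return count / seconds if seconds > 0 else 0.0


def bytes_to_mb(size_bytes: int) -> float:
    """Convert bytes to megabytes using 1 MB = 1024 * 1024 bytes."""
    return size_bytes / 1024.0 / 1024.0


# Values from the example summary above: 14087 images in 1234.56 seconds.
rate = round(average_rate(14087, 1234.56), 2)
```

With the figures from the example output, `rate` is 11.41, which matches the reported average.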
Connect to PostgreSQL and run queries:
-- Get image counts by category
SELECT * FROM image_summary;
-- Find all early blight disease images
SELECT image_key, file_name, upload_date
FROM image_metadata
WHERE category = 'disease'
AND subcategory = 'tomato_early_blight'
LIMIT 10;
-- Summarize images uploaded in the last day by category, subcategory and label
SELECT
category,
subcategory,
label,
COUNT(*) as count,
SUM(file_size) / 1024.0 / 1024.0 as size_mb
FROM image_metadata
WHERE upload_date > NOW() - INTERVAL '1 day'
GROUP BY category, subcategory, label;
-- Search by tags
SELECT image_key, label, tags
FROM image_metadata
WHERE 'early_blight' = ANY(tags);
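When these filters come from user input or script arguments, it is safer to pass them as query parameters than to format them into the SQL string. The sketch below builds a parameterized query for psycopg2; `build_image_query` is a hypothetical helper, and the column names are the ones used in the queries above.

```python
from typing import Any, List, Optional, Tuple


def build_image_query(
    category: Optional[str] = None,
    subcategory: Optional[str] = None,
    tag: Optional[str] = None,
    limit: int = 10,
) -> Tuple[str, List[Any]]:
    """Return (sql, params) using %s placeholders, as psycopg2 expects."""
    clauses: List[str] = []
    params: List[Any] = []
    if category is not None:
        clauses.append("category = %s")
        params.append(category)
    if subcategory is not None:
        clauses.append("subcategory = %s")
        params.append(subcategory)
    if tag is not None:
        # tags is an array column, so match against any element.
        clauses.append("%s = ANY(tags)")
        params.append(tag)
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    sql = f"SELECT image_key, file_name, upload_date FROM image_metadata{where} LIMIT %s"
    params.append(limit)
    return sql, params
```

The result can be passed straight to a cursor, for example `cursor.execute(*build_image_query(category="disease"))`.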
Create a script to query and download images:
import os

import psycopg2
from dotenv import load_dotenv
from minio import Minio

load_dotenv()

# Connect to PostgreSQL
conn = psycopg2.connect(
    host=os.getenv("DB_HOST"),
    port=os.getenv("DB_PORT"),
    database=os.getenv("DB_NAME"),
    user=os.getenv("DB_USER"),
    password=os.getenv("DB_PASSWORD"),
)

# Connect to MinIO (HTTPS is used only when MINIO_SECURE=true)
minio_client = Minio(
    os.getenv("MINIO_ENDPOINT"),
    access_key=os.getenv("MINIO_ACCESS_KEY"),
    secret_key=os.getenv("MINIO_SECRET_KEY"),
    secure=os.getenv("MINIO_SECURE", "false").lower() == "true",
)

# Query images
cursor = conn.cursor()
cursor.execute("""
    SELECT image_key, bucket_name, file_name, label
    FROM image_metadata
    WHERE category = 'disease'
    LIMIT 5
""")

# Download each image into the local downloads/ folder
for image_key, bucket_name, file_name, label in cursor.fetchall():
    print(f"Downloading: {label} - {file_name}")
    minio_client.fget_object(
        bucket_name,
        image_key,
        f"downloads/{file_name}",
    )

cursor.close()
conn.close()
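Saving every file directly under downloads/ can overwrite files that share a name across subcategories. One possible alternative, sketched here with a hypothetical `local_path_for` helper, is to mirror the object key under the download folder and drop any path components that would escape it.

```python
from pathlib import Path, PurePosixPath


def local_path_for(image_key: str, root: str = "downloads") -> Path:
    """Map an object key to a path under root, keeping its folder structure."""
    # Object keys use "/" separators; discard root and parent references.
    parts = [p for p in PurePosixPath(image_key).parts if p not in ("/", "..", ".")]
    if not parts:
        raise ValueError(f"Empty or invalid object key: {image_key!r}")
    return Path(root).joinpath(*parts)
```

In the download loop this would be used as `minio_client.fget_object(bucket_name, image_key, str(local_path_for(image_key)))`.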
After upload, images are organized in MinIO as:
Buckets:
├── agritwin-diseases/
│   ├── disease/
│   │   ├── tomato_early_blight/
│   │   │   ├── 2024/
│   │   │   │   ├── 12/
│   │   │   │   │   ├── 003f4fcb-18d6-4853-89a4-51002e3164a9___RS_Erly.B 1683.JPG
│   │   │   │   │   ├── 003f4fcb-18d6-4853-89a4-51002e3164a9___RS_Erly.B 1684.JPG
│   │   ├── tomato_late_blight/
│   │   ├── tomato_leaf_mold/
│   │   └── ...
│
├── agritwin-growth-stages/
│   ├── growth_stage/
│   │   ├── stage1_seedling/
│   │   ├── stage2_early_vegetative/
│   │   └── ...
│
└── agritwin-healthy/
    ├── healthy/
    │   ├── healthy_leaf/
    │   │   ├── 2024/
    │   │   │   └── 12/
    │   │   │       ├── image001.jpg
    │   │   │       └── ...
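The layout above follows the pattern category/subcategory/year/month/file name. A key of that shape could be built as in the sketch below. This assumes the year and month folders come from the upload date, which the layout suggests but does not state; `build_object_key` is a hypothetical name.

```python
from datetime import date
from typing import Optional


def build_object_key(
    category: str,
    subcategory: str,
    file_name: str,
    upload_date: Optional[date] = None,
) -> str:
    """Compose an object key like disease/tomato_early_blight/2024/12/<file>."""
    day = upload_date or date.today()  # assumption: folders reflect the upload date
    return f"{category}/{subcategory}/{day.year:04d}/{day.month:02d}/{file_name}"
```

Grouping by year and month keeps any single prefix from growing without bound and makes it easy to list what was uploaded in a given period.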
Problem: Connection refused or Network error
Solution:
Verify that MinIO is running: docker ps or check the MinIO process
Problem: psycopg2.OperationalError
Solution:
Verify that the database container is running: docker ps | Select-String timescale
Check that the credentials in .env match your container password
Test the connection: docker exec -i agritwin-timescaledb psql -U postgres -d agritwin_db -c "SELECT 1;"
Problem: relation "image_metadata" does not exist
Solution:
# Run the schema file using Get-Content (PowerShell syntax)
Get-Content database/schema/image_metadata.sql | docker exec -i agritwin-timescaledb psql -U postgres -d agritwin_db
Problem: ModuleNotFoundError: No module named 'minio'
Solution:
# Ensure virtual environment is activated
.\.venv\Scripts\activate
# Reinstall dependencies
pip install -r requirements.txt
Problem: Timeout or memory errors
Solution:
Reduce the batch size: --batch-size 10
Review the settings in config/minio_config.py
Problem: Cannot create buckets or upload files
Solution:
To use custom bucket names, edit config/minio_config.py:
self.buckets = {
"disease": "my-custom-disease-bucket",
"growth_stage": "my-custom-stage-bucket",
"healthy": "my-custom-healthy-bucket",
"default": "my-default-bucket"
}
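Custom names must still be valid bucket names, and categories without their own entry need a fallback. The sketch below shows one way to check both. The helper names are hypothetical, the lookup behaviour is an assumption about how the "default" entry is meant to be used, and the validation is a simplified version of the S3-style naming rules (3 to 63 characters, lowercase letters, digits, dots and hyphens, starting and ending with a letter or digit).

```python
import re
from typing import Dict

# Simplified S3-style bucket name rule; it does not cover every edge case.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Check length, allowed characters and first/last character."""
    return bool(_BUCKET_RE.fullmatch(name)) and ".." not in name


def bucket_for(category: str, buckets: Dict[str, str]) -> str:
    """Return the bucket for a category, falling back to the default entry."""
    return buckets.get(category, buckets["default"])
```

Running `is_valid_bucket_name` over the values of `self.buckets` before the first upload can catch names with uppercase letters or underscores, which MinIO rejects.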
Modify the upload script to extract additional metadata from image EXIF or filename patterns.
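As an illustration of the file name approach, the sketch below parses names shaped like the disease images in the storage layout, for example "003f4fcb-18d6-4853-89a4-51002e3164a9___RS_Erly.B 1683.JPG". The pattern is inferred from that single example, and the function name and the returned field names are hypothetical, so adapt both to your data. EXIF extraction is not shown because it needs an imaging library.

```python
import re
from typing import Dict, Optional, Union

# Assumed pattern: <uuid>___<source tag> <sequence number>.<extension>
_NAME_RE = re.compile(
    r"(?P<uuid>[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})"
    r"___(?P<tag>.+?) (?P<seq>\d+)\.(?P<ext>[A-Za-z0-9]+)"
)


def parse_file_name(file_name: str) -> Optional[Dict[str, Union[str, int]]]:
    """Extract metadata from a file name, or return None if it does not match."""
    match = _NAME_RE.fullmatch(file_name)
    if match is None:
        return None
    return {
        "source_id": match.group("uuid"),
        "source_tag": match.group("tag"),
        "sequence": int(match.group("seq")),
        "extension": match.group("ext").lower(),
    }
```

Names that do not follow the pattern, such as image001.jpg, return None, so the upload script can fall back to its existing metadata for those files.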
For very large datasets:
# Process one category at a time
python scripts/upload_images_to_minio.py --directories "data/external/Tomato Diseases"
python scripts/upload_images_to_minio.py --directories "data/external/Tomato Growth Stages"
python scripts/upload_images_to_minio.py --directories "data/external/Tomato Healthy Leaves"
The script uses ON CONFLICT ... DO UPDATE so you can safely re-run it to resume failed uploads.
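The effect of such an upsert can be demonstrated without a running PostgreSQL instance. The sketch below uses an in-memory SQLite database as a stand-in, since SQLite 3.24 and later accept the same ON CONFLICT clause. The reduced table definition, the choice of image_key as the conflict target, and the helper names are assumptions for illustration; with psycopg2 the placeholders would be %s instead of ?.

```python
import sqlite3

UPSERT_SQL = """
INSERT INTO image_metadata (image_key, bucket_name, file_name, file_size)
VALUES (?, ?, ?, ?)
ON CONFLICT (image_key) DO UPDATE SET
    bucket_name = excluded.bucket_name,
    file_name = excluded.file_name,
    file_size = excluded.file_size
"""


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with a reduced, assumed set of columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE image_metadata ("
        "image_key TEXT PRIMARY KEY, "
        "bucket_name TEXT NOT NULL, "
        "file_name TEXT NOT NULL, "
        "file_size INTEGER NOT NULL)"
    )
    return conn


def record_upload(conn: sqlite3.Connection, image_key: str, bucket_name: str,
                  file_name: str, file_size: int) -> None:
    """Insert a row, or update the existing row with the same image_key."""
    with conn:  # commits on success, rolls back on error
        conn.execute(UPSERT_SQL, (image_key, bucket_name, file_name, file_size))
```

Calling `record_upload` twice with the same key leaves a single row holding the latest values, which is why re-running the upload does not create duplicate metadata.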
analysis_results column
For issues or questions:
Check logs/image_upload.log
Review the logs/ directory
Last Updated: 2024