ZOOM PREP — Dr. Quazi K. Hassan
Professor, Geomatics Engineering · Schulich School of Engineering, UCalgary
FEB 24 · 10:30 AM
Who He Is
  • Lab: EOE (Earth Observation for Environment)
  • Credentials: PhD, P.Eng. (APEGA)
  • Output: 142+ papers · 5,500+ citations
  • Awards: APEGA Env. Award '23 · ASTech '22
  • Editorial: Remote Sensing · Earth · Geomatica
  • NSERC: Holds active Discovery Grant
His Research — What He Actually Does
Sentinel-2 · Landsat-8 · MODIS · Same satellites as you · Alberta-focused
Q&A Prep
"Tell me about your platform"
It's a citizen science app for beach and lake water quality. Users collect GPS-tagged, timestamped assessments — photos, field observations, water quality indicators. I've got 356+ assessments across Alberta. On the backend, I've built satellite analysis pipelines using Sentinel-2 and Landsat for algal bloom prediction and temporal trend analysis.
"What are your accuracy results?"
Cyanobacteria prediction R² = 0.69 for rivers, chlorophyll-a forecasting R² = 0.48–0.61 for lakes. I'm also doing multi-year temporal heatmaps showing algae progression, and I can show you a 5-year animation for Eagle Lake.
If he pushes on low R²: That's exactly why I need a geomatics expert. The retrieval algorithms need proper atmospheric correction, better band ratio optimization, and validation against more ground truth data. That's the core of what I want the NSERC Alliance to fund.
"What would the NSERC Alliance look like?"
You as PI or co-PI — bringing remote sensing and geomatics methodology. I'm the industry partner — platform, ground truth data, municipal relationships, in-kind contribution around $50–75K. Alberta Innovates Water Innovation program has confirmed it's a strong fit for co-funding the cash match. I'm also in touch with Dr. Baulch at USask on eutrophication and Dr. Vinebrooke at UAlberta on limnology — there's potential for a multi-institutional alliance. Core focus: validating satellite-derived water quality against citizen science ground truth across Alberta lakes.
"What's the cash match situation?"
Alberta Innovates confirmed the project fits their Water Innovation program. I've been in touch with Shane Patterson there. That could cover the cash match alongside my in-kind contribution from the platform.
"What do you need from me specifically?"
Your remote sensing expertise — proper atmospheric correction pipelines, improved retrieval algorithms, geospatial validation methodology. My platform generates the ground truth, but I need rigorous analysis to get the satellite-derived models to publication quality. Your work on Alberta lake clustering and surface water quality models is exactly the methodological approach I'm missing.
"Are there other researchers involved?"
Dr. Helen Baulch at USask is interested — expert on eutrophication and harmful algal blooms in prairie waters. Dr. Vinebrooke at UAlberta has been independently recommended by two colleagues — lake ecology and paleolimnology. There's potential for a multi-institutional alliance.
"Student involvement?"
Absolutely — the platform is a natural fit for grad students. Field data collection, algorithm development, validation campaigns. I'd be happy to support an HQP component.
"Commercialization plan?"
The app is already built and field-tested. Research improves the algorithms, which get deployed directly into the platform. Municipal adoption is the commercial pathway — I'm in conversation with several Alberta towns. Clean path from research to impact.
Know Your Own Work — Technical Deep Dive
Platform Overview — My BeachBook
  • Flutter cross-platform app (Android, iOS, Web)
  • Firebase backend: Firestore, Storage, Auth, App Check
  • Google Maps integration with marker clustering
  • Google ML Kit for on-device image labeling (50% confidence threshold)
  • iNaturalist API for species identification
  • Google Gemini AI for description generation
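  • Geohash-based proximity validation (note: a precision-9 geohash cell is roughly 5m across; a ~100m tolerance corresponds to about precision 7, so confirm which one the app actually uses before quoting it)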
  • 355 beaches in database — 225 tidal, 82 lake, 48 river
  • 357 total contributions across 3 countries, 5 provinces/states
  • All submissions moderated by admin before inclusion
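A minimal sketch of how a geohash proximity check can work, in case he asks. The encoder is the standard public geohash algorithm; the prefix-matching rule and the names is_near_beach and min_shared are hypothetical, not taken from the app. Prefix matching can reject two nearby points that straddle a cell boundary, which is a known limitation of this simple approach.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 9) -> str:
    """Encode a latitude/longitude pair with the standard geohash algorithm."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bit_count = 0
    index = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                index = (index << 1) | 1
                lon_lo = mid
            else:
                index <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                index = (index << 1) | 1
                lat_lo = mid
            else:
                index <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[index])
            bit_count = 0
            index = 0
    return "".join(chars)


def shared_prefix_length(a: str, b: str) -> int:
    """Count how many leading characters two geohashes have in common."""
    count = 0
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            break
        count += 1
    return count


def is_near_beach(user, beach, min_shared: int = 7) -> bool:
    """Hypothetical rule: accept when the first min_shared characters match."""
    user_hash = geohash_encode(user[0], user[1])
    beach_hash = geohash_encode(beach[0], beach[1])
    return shared_prefix_length(user_hash, beach_hash) >= min_shared
```

Each extra character narrows the cell by a factor of 32 in area, which is why the choice of precision decides the effective tolerance.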
Satellite Data Pipeline — What You Actually Use
DATA SOURCES
  • Sentinel-2 L2A (ESA) — primary. Via Element84 STAC API, Planetary Computer + Copernicus fallbacks
  • Landsat 8/9 Collection 2 L2 (USGS) — thermal band (lwir11) for water surface temp
  • NASA SRTM — elevation/terrain (±6m vertical)
  • VIIRS DNB (NASA Black Marble) — night lights
  • WorldPop — population density at 100m
  • JRC Global Surface Water — lake extent detection
BANDS USED (Sentinel-2)
  • B02 Blue (490nm) — water color, turbidity
  • B03 Green (560nm) — NDWI, chlorophyll, turbidity
  • B04 Red (665nm) — NDVI, chlorophyll, sediment
  • B08 NIR (842nm) — NDWI, NDVI
  • B11 SWIR1 (1610nm) — urban development (NDBI)
  • SCL — Scene Classification Layer for cloud masking
PROCESSING
  • Resolution: downloaded at 120m (from native 10-20m) for computational efficiency
  • Cloud masking via SCL — keeps vegetation, bare soil, water, snow; masks clouds + shadows
  • Progressive cloud thresholds: 20% → 40% → 60% → 80% → 95%
  • xarray + stackstac for lazy loading and chunked processing
  • Multi-level caching: in-memory → disk (NetCDF .nc) → STAC download
  • Not using GEE for reports — migrated to STAC APIs for independence
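A minimal sketch of the cloud handling described above, assuming the SCL codes from the Sentinel-2 L2A legend (4 vegetation, 5 bare soil, 6 water, 11 snow/ice) and a hypothetical search callable that wraps the STAC query and returns the scenes under a given cloud-cover limit. It uses plain lists to stay dependency-free; the real pipeline works on xarray arrays.

```python
# SCL classes to keep, per the notes: vegetation, bare soil, water, snow.
KEEP_SCL = {4, 5, 6, 11}

# Cloud-cover limits (percent) tried in order, loosest last.
CLOUD_THRESHOLDS = (20, 40, 60, 80, 95)


def mask_with_scl(values, scl):
    """Replace pixels whose SCL class is not in KEEP_SCL with None."""
    return [v if c in KEEP_SCL else None for v, c in zip(values, scl)]


def find_scenes(search, thresholds=CLOUD_THRESHOLDS, min_scenes=1):
    """Relax the cloud limit step by step until enough scenes are found.

    `search` is a hypothetical callable: search(max_cloud_pct) -> list of scenes.
    Returns (limit_used, scenes), or (None, []) if every limit fails.
    """
    for limit in thresholds:
        scenes = search(limit)
        if len(scenes) >= min_scenes:
            return limit, scenes
    return None, []
```

Starting strict and relaxing only when needed means clear scenes are always preferred, and the limit that was actually used can be reported alongside the result.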
Spectral Indices — What You Calculate
  • NDWI = (Green - NIR) / (Green + NIR) — water detection, shoreline dynamics, clarity proxy
  • NDVI = (NIR - Red) / (NIR + Red) — vegetation health, 5-year trend analysis
  • NDBI = (SWIR1 - NIR) / (SWIR1 + NIR) — urban development pressure (2015-2018 vs 2021-2024)
  • Chlorophyll proxy = (Green/Red - 0.9) × 27.3, clamped [0, 30] mg/m³
  • Chlorophyll Index = (Green - Red) / (Green + Red)
  • Turbidity = (Red × 0.3 + Green × 0.7) / 1000
  • Sediment ratio = Red / Green
  • Water Color Index = Blue/Green/Red band ratios (G/B and R/G)
  • Floating Debris Index = (NIR - Red) / (NIR + Red) × (Red / Green)
  • Shoreline Risk Proxy = std(NDWI over time) — temporal standard deviation of NDWI across 3 years of Sentinel-2 imagery (up to 20 scenes, 100m buffer, ≤20% cloud). Higher variance = more dynamic/unstable shoreline. Paired with Water Index = mean(NDWI) for average water presence.
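The formulas above can be sketched per pixel as follows. This is an illustration on plain floats, not the pipeline code: returning 0.0 when a denominator is zero and using the population standard deviation for the shoreline proxy are both assumptions.

```python
from statistics import mean, pstdev


def normalized_difference(a: float, b: float) -> float:
    """(a - b) / (a + b), with 0.0 when the denominator is zero (assumption)."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def ndwi(green: float, nir: float) -> float:
    return normalized_difference(green, nir)


def ndvi(nir: float, red: float) -> float:
    return normalized_difference(nir, red)


def ndbi(swir1: float, nir: float) -> float:
    return normalized_difference(swir1, nir)


def chlorophyll_proxy(green: float, red: float) -> float:
    """(Green/Red - 0.9) x 27.3, clamped to [0, 30] mg/m3."""
    if red == 0:
        return 0.0  # assumption: no estimate without a red signal
    value = (green / red - 0.9) * 27.3
    return max(0.0, min(30.0, value))


def shoreline_risk(ndwi_series) -> float:
    """Temporal standard deviation of NDWI at one location."""
    return pstdev(ndwi_series)


def water_index(ndwi_series) -> float:
    """Temporal mean of NDWI at one location."""
    return mean(ndwi_series)
```

The clamp matters for the heatmaps: any pixel with Green/Red above about 2.0 saturates at 30, so the proxy cannot distinguish between very dense blooms.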
AI Algae Prediction Model — Full Code Walkthrough
THE BIG PICTURE

You have two scripts that work together: algae_predictor_by_water_type.py (v1 training + runtime predictor) and train_algae_models_v2.py (v2 training with substrate features). Both output pickle files that the report generator loads at runtime.

STEP 1: DATA LOADING
  • Loads from beaches_full_export.json (v2) or beaches_cleaned_20260107.json (v1 — temperature-validated)
  • 355 beaches total, split by waterBodyType: 225 tidal, 82 lake, 48 river
  • Each beach must have algaeData with avgChlorophyll and/or avgCI
  • Beaches without algae data are skipped
STEP 2: FEATURE EXTRACTION (33 features in v2)
  • Geographic (3): latitude, longitude, lat_abs
  • Satellite temperature (4): avg, max, min, range — from Landsat 8/9 thermal band. v1 validates: rejects if min < -5°C or max < 5°C (ice/snow contamination)
  • Satellite clarity (3): turbidity, secchi depth, clarity score — from Sentinel-2
  • Other indices (5): water_index, shoreline_risk, garbage, population_pressure, sediment_index
  • NEW in v2 — Substrate (10): sand, pebbles, rocks, boulders, stone, mud + 4 engineered:
nutrient_retention = weighted score (mud×5 + stone×3 + rocks×2 + pebbles×1 + sand×0.5) / total
substrate_diversity = count of non-zero substrate types (0-5)
fine_coarse_ratio = (mud + sand) / (rocks + boulders + pebbles)
dark_substrate = mud + stone   // absorbs heat, warms water
  • NEW in v2 — Dimensions (3): beach_width, beach_length, beach_area
  • NEW in v2 — Biological (2): seaweed_total, kelp
  • Engineered interactions (3): temp_clarity, warm_turbid, temp_nutrient
STEP 3: WATER-TYPE-SPECIFIC ENGINEERING

Each water type gets 3 additional custom features that capture its unique physics:

LAKE:
  thermal_stratification_proxy = temp_range × secchi
  lake_bloom_risk = temp_avg × turbidity × nutrient_retention
  warm_shallow = temp_avg / secchi
TIDAL:
  tidal_exchange_proxy = turbidity / temp_range
  coastal_upwelling = secchi × lat_abs / 100
  mixing_index = temp_range × turbidity
RIVER:
  flow_proxy = turbidity × temp_range
  upstream_pollution = garbage × turbidity
  nutrient_loading = population_pressure × turbidity × nutrient_retention
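The nine formulas above can be sketched as one dispatch function. The input keys mirror the feature names in the notes; the zero-denominator fallback in _safe_div is an assumption, since the notes do not say how the scripts handle a zero secchi depth or temperature range.

```python
def _safe_div(numerator: float, denominator: float) -> float:
    """Divide, returning 0.0 when the denominator is zero (an assumption)."""
    return numerator / denominator if denominator else 0.0


def water_type_features(water_type: str, f: dict) -> dict:
    """Return the three custom features for the given water type."""
    if water_type == "lake":
        return {
            "thermal_stratification_proxy": f["temp_range"] * f["secchi"],
            "lake_bloom_risk": f["temp_avg"] * f["turbidity"] * f["nutrient_retention"],
            "warm_shallow": _safe_div(f["temp_avg"], f["secchi"]),
        }
    if water_type == "tidal":
        return {
            "tidal_exchange_proxy": _safe_div(f["turbidity"], f["temp_range"]),
            "coastal_upwelling": f["secchi"] * f["lat_abs"] / 100,
            "mixing_index": f["temp_range"] * f["turbidity"],
        }
    if water_type == "river":
        return {
            "flow_proxy": f["turbidity"] * f["temp_range"],
            "upstream_pollution": f["garbage"] * f["turbidity"],
            "nutrient_loading": f["population_pressure"] * f["turbidity"] * f["nutrient_retention"],
        }
    raise ValueError(f"unknown water type: {water_type!r}")
```

Note that flow_proxy and mixing_index are the same product of turbidity and temp_range under different names, which is worth being ready to explain.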
STEP 4: PREPROCESSING
  • Imputation: SimpleImputer(strategy='median') — fills missing values with column median
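  • Scaling: StandardScaler(), zero mean and unit variance. Tree-based models (gradient boosting, random forest, XGBoost) are insensitive to monotonic feature scaling, so don't claim scaling is required for convergence; it is harmless and keeps the preprocessing uniform.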
  • Both fitted on training data, applied to new predictions at runtime
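To make the fit-on-training, apply-at-runtime idea concrete, here is a dependency-free stand-in for the two scikit-learn steps. It is not the scripts' code: the class name is hypothetical, missing values are represented as None, and every column is assumed to have at least one observed training value. Like StandardScaler, it uses the population standard deviation and leaves constant columns unscaled.

```python
from statistics import mean, median, pstdev


class MedianImputerScaler:
    """Median imputation followed by standardization, learned from training rows."""

    def fit(self, rows):
        columns = list(zip(*rows))
        # Column medians are computed from observed (non-None) values only.
        self.medians = [median([v for v in col if v is not None]) for col in columns]
        filled = [
            [m if v is None else v for v in col]
            for col, m in zip(columns, self.medians)
        ]
        self.means = [mean(col) for col in filled]
        # A zero standard deviation would divide by zero, so fall back to 1.0.
        self.scales = [pstdev(col) or 1.0 for col in filled]
        return self

    def transform(self, rows):
        out = []
        for row in rows:
            filled = [m if v is None else v for v, m in zip(row, self.medians)]
            out.append([
                (v - mu) / scale
                for v, mu, scale in zip(filled, self.means, self.scales)
            ])
        return out
```

The point to carry into the conversation: the medians, means, and scales come from the training set only, so a new beach is transformed with statistics it never influenced.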
STEP 5: MODEL TRAINING & SELECTION
  • v2 compares 4 algorithms per water type:
1. GradientBoosting (default): n_est=200, depth=5, lr=0.05
2. GradientBoosting (tuned): n_est=300, depth=4, lr=0.03, subsample=0.8
3. RandomForest: n_est=200, depth=10
4. XGBoost: n_est=200, depth=5, lr=0.05, subsample=0.8, colsample=0.8
  • Best model selected by cross-validated R² (5-fold CV when n ≥ 30)
  • XGBoost wins in most cases — hence "v2 = XGBoost + substrate features"
  • Trains 2 targets per water type: chlorophyll-a concentration + cyanobacteria index
  • v1 also trains a risk level model (RandomForest, max_depth=10) mapping to low/medium/high
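A short sketch of what "best model by cross-validated R²" means, assuming the selection uses the mean of the per-fold scores (the notes do not say whether it is the mean or another summary). The function names are illustrative, and any scores passed in are whatever the training run produced.

```python
from statistics import mean


def r2_score(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes y_true is not constant (otherwise SS_tot is zero).
    """
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def select_best_model(cv_scores: dict) -> str:
    """Pick the model name with the highest mean per-fold R2."""
    return max(cv_scores, key=lambda name: mean(cv_scores[name]))
```

R² is 1.0 for perfect predictions, 0.0 for a model no better than predicting the mean, and negative for a model that does worse than the mean, which can happen on held-out folds.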
STEP 6: SERIALIZATION & DEPLOYMENT
  • Models saved as .pkl files via pickle (~3MB each)
  • algae_models_by_water_type.pkl (v1) and algae_models_by_water_type_v2.pkl (v2)
  • Report generator loads v2 first, falls back to v1 if missing
  • Each pickle contains: model objects + scaler + imputer + feature list per water type
  • At prediction time: extract features → impute → scale → predict → classify risk
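A minimal sketch of the v2-first, v1-fallback loading described above. The two file names come from the notes; the function name and the model_dir argument are hypothetical. Pickle executes code on load, so this pattern is only appropriate for files you produced yourself.

```python
import pickle
from pathlib import Path

# Tried in order: v2 first, then v1 as the fallback.
MODEL_FILES = (
    "algae_models_by_water_type_v2.pkl",
    "algae_models_by_water_type.pkl",
)


def load_model_bundle(model_dir):
    """Return (file_name, bundle) for the first model file that exists."""
    for name in MODEL_FILES:
        path = Path(model_dir) / name
        if path.exists():
            with path.open("rb") as handle:
                return name, pickle.load(handle)
    raise FileNotFoundError(f"no model file found in {model_dir}")
```

Returning the file name along with the bundle lets the report record which model version produced a prediction.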
KEY INSIGHT FOR HASSAN
The water-type-specific feature engineering is the clever part. A lake's bloom risk depends on thermal stratification (temp_range × secchi), while a river's depends on upstream nutrient loading (population × turbidity × substrate retention). Generic models miss these physics. That's also where his geomatics expertise would improve things — better atmospheric correction = better input features = better predictions.
ML Models — Accuracy Numbers (Know These Cold)
V2 MODELS (XGBoost + substrate features, 33 input features)
  • Lakes — Chlorophyll-a: R² = 0.62 (+12.9% over v1)
  • Rivers — Chlorophyll-a: R² = 0.59 (+237% over v1)
  • Tidal — Chlorophyll-a: R² = 0.59
  • Rivers — Cyanobacteria Index: R² = 0.69
MODEL DETAILS
  • XGBoost primary, GradientBoosting fallback
  • n_estimators=200, max_depth=5, learning_rate=0.05
  • 5-fold cross-validation when sample ≥ 30
  • 33 features: 15 base (geographic, satellite, community) + 10 substrate + 3 dimension + 2 biological + 3 engineered
  • v2 added: substrate nutrient retention, warmth index, stability, porosity, seaweed/kelp
SATELLITE ACCURACY
  • Temperature: ±0.5-2°C
  • Clarity indices: ±20-40%
  • Satellite vs in-situ chlorophyll: ±30-40%
  • GPS positional accuracy: <10m
  • SRTM terrain: ±6m vertical
Algae Heatmap Pipeline — The Demo Piece
  • 5-year temporal analysis (current year minus 4)
  • Summer/growing season only (May-Sept for northern hemisphere)
  • For each year: query Sentinel-2, load Green/Red/NIR, apply water mask (NDWI > 0.1)
  • Calculate chlorophyll proxy: (Green/Red - 0.9) × 27.3, clamped [0, 30] mg/m³
  • Lake detection via JRC Global Surface Water dataset
  • Sampling grid: 100m spacing across entire lake surface (e.g. 845 points for Eagle Lake)
  • Output: per-year PNG heatmaps (blue→green→yellow→red colorscale)
  • Animated GIF: 220 frames, 800ms per year, smooth fades
  • Temporal video (MP4): week-by-week progression through bloom season, 30 FPS
  • Eagle Lake files: 17MB MP4, 29MB GIF
  • Trend: simple linear comparison first vs last year (>20% = Increasing, <-20% = Decreasing)
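The year window and the trend rule can be sketched as below. The "Stable" label for changes within ±20% and the handling of a zero first-year value are assumptions; the notes only name the Increasing and Decreasing cases.

```python
def analysis_years(current_year: int, span: int = 5) -> list:
    """Years covered by the analysis: current year minus 4 through current year."""
    return list(range(current_year - (span - 1), current_year + 1))


def classify_trend(first_year_mean: float, last_year_mean: float,
                   threshold_pct: float = 20.0) -> str:
    """Compare the last year against the first as a percentage change."""
    if first_year_mean == 0:
        return "Stable"  # assumption: no percentage change is defined from zero
    change_pct = (last_year_mean - first_year_mean) / first_year_mean * 100.0
    if change_pct > threshold_pct:
        return "Increasing"
    if change_pct < -threshold_pct:
        return "Decreasing"
    return "Stable"
```

Because only the endpoints are compared, one unusually clear or bloom-heavy year at either end decides the label. If he asks, a regression slope across all five years is the obvious upgrade.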
Report System — What Gets Generated
  • 24+ page premium report as Word (.docx) + PDF
  • 10 major sections: satellite metrics, imagery analysis, environmental maps, lake analysis, advanced analytics, algae monitoring, community observations, shoreline risk, recommendations, appendices
  • Seasonal water temperature (4 seasons from Landsat thermal)
  • Temporal satellite images spanning 1984-2025 (Landsat + Sentinel-2)
  • Environmental maps: terrain, land cover, night lights, population density
  • 5-year algae heatmaps + animated GIF + temporal video
  • Nearest-beach comparison (5 closest in database)
  • Regional benchmarking by climate zone (355 beach database)
  • GIS export package: shapefiles with FGDC-compliant metadata
  • Government contacts document
COMPOSITE SCORING
  • Overall Beach Quality (0-100) = Natural Features (30%) + Env. Stability (25%) + Water Quality (20%) + Vegetation Health (15%) + Low Development (10%)
  • Water Quality (0-100) = Turbidity (30%) + Sediment (30%) + NDWI (20%) + Color (20%)
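Both composite scores are weighted sums, sketched below. The weights are the ones listed above; the component key names are illustrative, and the sketch assumes each component has already been normalized to 0-100 with higher meaning better (so turbidity and sediment would be inverted before they reach this function).

```python
OVERALL_WEIGHTS = {
    "natural_features": 0.30,
    "env_stability": 0.25,
    "water_quality": 0.20,
    "vegetation_health": 0.15,
    "low_development": 0.10,
}

WATER_QUALITY_WEIGHTS = {
    "turbidity": 0.30,
    "sediment": 0.30,
    "ndwi": 0.20,
    "color": 0.20,
}


def weighted_score(components: dict, weights: dict) -> float:
    """Weighted sum of 0-100 component scores; every weighted key must be present."""
    missing = [key for key in weights if key not in components]
    if missing:
        raise KeyError(f"missing components: {missing}")
    return sum(weight * components[key] for key, weight in weights.items())
```

The water quality score feeds the overall score as one of its five components, so it contributes 20% of the final number.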
Ground Truth Collection — What Makes Your Data Unique
  • 40+ data categories per assessment: flora (kelp, seaweed, eelgrass), fauna (25+ species), substrate (sand, pebbles, rocks, boulders, mud), driftwood, dimensions, garbage, wind, people
  • All GPS-tagged, timestamped, photo-documented
  • Geohash-validated (must be physically at beach)
  • Multiple contributions aggregated per beach (averaged numerics, deduplicated species)
  • ML-assisted identification (Google ML Kit + iNaturalist)
  • Admin-moderated quality control pipeline
  • Key selling point to Hassan: this is concurrent ground truth + satellite data. Most RS studies lack this.
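A minimal sketch of the per-beach aggregation (averaged numerics, deduplicated species). The record layout with "measurements" and "species" keys is hypothetical, and normalizing species names by trimming and lowercasing is an assumption about how duplicates are detected.

```python
from statistics import mean


def aggregate_contributions(contributions: list) -> dict:
    """Combine several contributions for one beach into a single record."""
    numeric = {}
    species = set()
    for contribution in contributions:
        for key, value in contribution.get("measurements", {}).items():
            numeric.setdefault(key, []).append(value)
        for name in contribution.get("species", []):
            species.add(name.strip().lower())  # assumption: case-insensitive match
    return {
        "measurements": {key: mean(values) for key, values in numeric.items()},
        "species": sorted(species),
        "contribution_count": len(contributions),
    }
```

Keeping the contribution count next to the averages is useful for validation work, since a mean of one observation deserves less weight than a mean of ten.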
Architecture — Technical Decisions He Might Ask About
  • STAC APIs over GEE — eliminates GEE dependency. Element84 primary, Planetary Computer + Copernicus fallbacks
  • xarray + stackstac for lazy loading — only downloads what's needed
  • Water-type-specific models — separate for tidal, lake, river (different physics)
  • Water sampling point validation — 5-year NDWI timeseries confirms pin is in permanent water. Auto-adjusts toward deeper water if needed (8-directional search)
  • Checkpoint/resume system — JSON checkpoints after each phase (coordinates, crop, metrics) for crash recovery
  • Flask + WebSocket for real-time progress to Flutter app during report generation
  • Latitude-based climate zones for regional benchmarking: Arctic (>60), Temperate Cold (45-60), Temperate Warm (30-45), Subtropical (15-30), Tropical (<15)
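The climate zone lookup can be sketched as below, assuming it uses absolute latitude so that both hemispheres map to the same zones, and assuming a value exactly on a boundary (for example 45) belongs to the colder zone. Neither detail is stated in the notes.

```python
def climate_zone(latitude: float) -> str:
    """Map a latitude in degrees to the benchmarking zone names from the notes."""
    lat = abs(latitude)  # assumption: zones are symmetric about the equator
    if lat > 60:
        return "Arctic"
    if lat >= 45:
        return "Temperate Cold"
    if lat >= 30:
        return "Temperate Warm"
    if lat >= 15:
        return "Subtropical"
    return "Tropical"
```

Calgary, at roughly 51° N, falls in Temperate Cold, as does most of Alberta south of 60° N.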
Lines That Show You Did Your Homework
I saw your work on clustering Alberta lakes by water quality parameters — that's directly relevant to what I'm doing at scale with satellite data
Your Lesser Slave Lake monitoring using Landsat + Sentinel SAR was interesting — I'm doing similar temporal analysis with Sentinel-2 optical
Congrats on the APEGA and ASTech awards — that applied impact approach is exactly what I'm going for
Value Exchange

Your Value to Him

  • Ground truth data — 356+ GPS-tagged, timestamped, photo-documented field points. Gold for RS researchers.
  • Deployed platform — real application for his satellite work, not just papers
  • AB Innovates cash match — solves the #1 NSERC Alliance pain point
  • Municipal connections — real-world policy impact for his portfolio
  • Fresh outsider perspective — you built this entire platform as a passion project without formal RS training. You approach problems differently than someone who came up through academia — that's how you ended up with water-type-specific models and citizen science ground truth instead of another GEE script. Academics value unconventional thinking.

His Value to You

  • Geomatics methodology — atmospheric correction, geometric validation, spatial statistics
  • NSERC track record — holds Discovery Grant, knows the system
  • P.Eng. — professional engineering credibility
  • Alberta focus — most of his study areas are in-province
  • Publication speed — 7 papers in 2025, he ships fast
Watch Out — Don't Say