Zoom Prep — Dr. Quazi Hassan

Who He Is

Lab: EOE (Earth Observation for Environment)
Credentials: PhD, P.Eng. (APEGA)
Output: 142+ papers · 5,500+ citations
Awards: APEGA Env. Award '23 · ASTech '22
Editorial: Remote Sensing · Earth · Geomatica
NSERC: Holds active Discovery Grant

His Research — What He Actually Does

#1 focus: wildfire forecasting using NASA MODIS (this is what he is best known for)
Alberta lake clustering by water quality params (Akbar, Hassan & Achari, 2011)
Remote sensing models for surface water quality (Akbar et al., 2014)
River water quality vs. flow regime at 3 Alberta rivers (Rostami et al., 2020)
Lesser Slave Lake monthly water/ice monitoring — Landsat-8 + Sentinel-1 SAR
Athabasca River Basin — flow forecasting, drought indices, climate-hydrology

Sentinel-2 · Landsat-8 · MODIS · Same satellites as you · Alberta-focused

Q&A Prep

"Tell me about your platform"

It's a citizen science app for beach and lake water quality. Users collect GPS-tagged, timestamped assessments — photos, field observations, water quality indicators. I've got 356+ assessments across Alberta. On the backend, I've built satellite analysis pipelines using Sentinel-2 and Landsat for algal bloom prediction and temporal trend analysis.

"What are your accuracy results?"

Cyanobacteria prediction R² = 0.69 for rivers, chlorophyll-a forecasting R² = 0.48–0.61 for lakes. I'm also doing multi-year temporal heatmaps showing algae progression, and I can show you a 5-year animation for Eagle Lake.

If he pushes on low R²: That's exactly why I need a geomatics expert. The retrieval algorithms need proper atmospheric correction, better band ratio optimization, and validation against more ground truth data. That's the core of what I want the NSERC Alliance to fund.

"What would the NSERC Alliance look like?"

You as PI or co-PI — bringing remote sensing and geomatics methodology. I'm the industry partner — platform, ground truth data, municipal relationships, in-kind contribution around $50–75K. Alberta Innovates Water Innovation program has confirmed it's a strong fit for co-funding the cash match. I'm also in touch with Dr. Baulch at USask on eutrophication and Dr. Vinebrooke at UAlberta on limnology — there's potential for a multi-institutional alliance. Core focus: validating satellite-derived water quality against citizen science ground truth across Alberta lakes.

"What's the cash match situation?"

Alberta Innovates confirmed the project fits their Water Innovation program. I've been in touch with Shane Patterson there. That could cover the cash match alongside my in-kind contribution from the platform.

"What do you need from me specifically?"

Your remote sensing expertise — proper atmospheric correction pipelines, improved retrieval algorithms, geospatial validation methodology. My platform generates the ground truth, but I need rigorous analysis to get the satellite-derived models to publication quality. Your work on Alberta lake clustering and surface water quality models is exactly the methodological approach I'm missing.

"Are there other researchers involved?"

Dr. Helen Baulch at USask is interested — expert on eutrophication and harmful algal blooms in prairie waters. Dr. Vinebrooke at UAlberta has been independently recommended by two colleagues — lake ecology and paleolimnology. There's potential for a multi-institutional alliance.

"Student involvement?"

Absolutely — the platform is a natural fit for grad students. Field data collection, algorithm development, validation campaigns. I'd be happy to support an HQP component.

"Commercialization plan?"

The app is already built and field-tested. Research improves the algorithms, which get deployed directly into the platform. Municipal adoption is the commercial pathway — I'm in conversation with several Alberta towns. Clean path from research to impact.

Know Your Own Work — Technical Deep Dive

Platform Overview — My BeachBook

Flutter cross-platform app (Android, iOS, Web)
Firebase backend: Firestore, Storage, Auth, App Check
Google Maps integration with marker clustering
Google ML Kit for on-device image labeling (50% confidence threshold)
iNaturalist API for species identification
Google Gemini AI for description generation
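Geohash-based proximity validation (~100m tolerance). Careful: a precision-9 geohash cell is only about 5m × 5m, and a ~150m cell corresponds to precision 7, so check which precision the code uses before quoting "precision 9 = ~100m".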
355 beaches in database — 225 tidal, 82 lake, 48 river
357 total contributions across 3 countries, 5 provinces/states
All submissions moderated by admin before inclusion

Satellite Data Pipeline — What You Actually Use

DATA SOURCES

Sentinel-2 L2A (ESA) — primary. Via Element84 STAC API, Planetary Computer + Copernicus fallbacks
Landsat 8/9 Collection 2 L2 (USGS) — thermal band (lwir11) for water surface temp
NASA SRTM — elevation/terrain (±6m vertical)
VIIRS DNB (NASA Black Marble) — night lights
WorldPop — population density at 100m
JRC Global Surface Water — lake extent detection

BANDS USED (Sentinel-2)

B02 Blue (490nm) — water color, turbidity
B03 Green (560nm) — NDWI, chlorophyll, turbidity
B04 Red (665nm) — NDVI, chlorophyll, sediment
B08 NIR (842nm) — NDWI, NDVI
B11 SWIR1 (1610nm) — urban development (NDBI)
SCL — Scene Classification Layer for cloud masking

PROCESSING

Resolution: downloaded at 120m (from native 10-20m) for computational efficiency
Cloud masking via SCL — keeps vegetation, bare soil, water, snow; masks clouds + shadows
Progressive cloud thresholds: 20% → 40% → 60% → 80% → 95%
xarray + stackstac for lazy loading and chunked processing
Multi-level caching: in-memory → disk (NetCDF .nc) → STAC download
Not using GEE for reports — migrated to STAC APIs for independence
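
If he asks how the cloud handling works, the logic can be sketched as below. This is a minimal, dependency-free illustration, not the actual pipeline code: the class codes are the standard Sentinel-2 L2A SCL values, while `mask_band`, `find_scenes`, and the `search_scenes` callback (a stand-in for the real STAC query) are hypothetical names. It also assumes that every class not on the keep list is masked, including no-data and unclassified pixels.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Standard Sentinel-2 L2A Scene Classification Layer (SCL) codes that are kept:
# 4 = vegetation, 5 = not vegetated (bare soil), 6 = water, 11 = snow/ice.
# Everything else (cloud shadow 3, clouds 8/9, cirrus 10, no data, ...) is masked.
KEEP_CLASSES = {4, 5, 6, 11}

# Maximum scene cloud cover in percent, relaxed step by step.
CLOUD_THRESHOLDS = [20, 40, 60, 80, 95]


def mask_band(band: Sequence[float], scl: Sequence[int]) -> List[Optional[float]]:
    """Return band values with every pixel outside KEEP_CLASSES set to None."""
    return [value if cls in KEEP_CLASSES else None for value, cls in zip(band, scl)]


def find_scenes(
    search_scenes: Callable[[int], list], min_scenes: int = 1
) -> Tuple[Optional[int], list]:
    """Try each cloud threshold in order until enough scenes come back.

    `search_scenes(limit)` is a hypothetical callback that returns the scenes
    whose cloud cover is at or below `limit` percent.
    """
    for limit in CLOUD_THRESHOLDS:
        scenes = search_scenes(limit)
        if len(scenes) >= min_scenes:
            return limit, scenes
    return None, []
```

Starting strict and relaxing only when needed means clear scenes are always preferred, and the threshold that finally succeeded can be reported alongside the result.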

Spectral Indices — What You Calculate

NDWI = (Green - NIR) / (Green + NIR) — water detection, shoreline dynamics, clarity proxy
NDVI = (NIR - Red) / (NIR + Red) — vegetation health, 5-year trend analysis
NDBI = (SWIR1 - NIR) / (SWIR1 + NIR) — urban development pressure (2015-2018 vs 2021-2024)
Chlorophyll proxy = (Green/Red - 0.9) × 27.3, clamped [0, 30] mg/m³
Chlorophyll Index = (Green - Red) / (Green + Red)
Turbidity = (Red × 0.3 + Green × 0.7) / 1000
Sediment ratio = Red / Green
Water Color Index = Blue/Green/Red band ratios (G/B and R/G)
Floating Debris Index = (NIR - Red) / (NIR + Red) × (Red / Green)
Shoreline Risk Proxy = std(NDWI over time) — temporal standard deviation of NDWI across 3 years of Sentinel-2 imagery (up to 20 scenes, 100m buffer, ≤20% cloud). Higher variance = more dynamic/unstable shoreline. Paired with Water Index = mean(NDWI) for average water presence.
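
For quick recall, the index formulas above can be written as plain per-pixel functions. This is a sketch under assumptions: the function names are illustrative, zero denominators return 0.0 (the notes do not say how the real code handles them), and the shoreline proxy uses the population standard deviation.

```python
import statistics
from typing import Sequence


def normalized_difference(a: float, b: float) -> float:
    """(a - b) / (a + b); returns 0.0 when the denominator is zero (assumption)."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def ndwi(green: float, nir: float) -> float:
    return normalized_difference(green, nir)


def ndvi(nir: float, red: float) -> float:
    return normalized_difference(nir, red)


def ndbi(swir1: float, nir: float) -> float:
    return normalized_difference(swir1, nir)


def chlorophyll_index(green: float, red: float) -> float:
    return normalized_difference(green, red)


def chlorophyll_proxy(green: float, red: float) -> float:
    """(Green/Red - 0.9) * 27.3, clamped to [0, 30] mg/m^3."""
    if red == 0:
        return 0.0  # assumption: undefined ratio treated as no signal
    return min(30.0, max(0.0, (green / red - 0.9) * 27.3))


def turbidity(red: float, green: float) -> float:
    return (red * 0.3 + green * 0.7) / 1000


def sediment_ratio(red: float, green: float) -> float:
    return red / green if green != 0 else 0.0


def floating_debris_index(nir: float, red: float, green: float) -> float:
    """NDVI multiplied by the Red/Green ratio."""
    return ndvi(nir, red) * sediment_ratio(red, green)


def shoreline_risk(ndwi_series: Sequence[float]) -> float:
    """Temporal standard deviation of NDWI (population form, an assumption)."""
    return statistics.pstdev(ndwi_series)


def water_index(ndwi_series: Sequence[float]) -> float:
    """Mean NDWI over the same time series."""
    return statistics.fmean(ndwi_series)
```

The division by 1000 in the turbidity formula suggests the inputs are scaled reflectance values rather than 0 to 1 reflectance, which is worth stating if he asks about units.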

AI Algae Prediction Model — Full Code Walkthrough

THE BIG PICTURE

You have two scripts that work together: algae_predictor_by_water_type.py (v1 training + runtime predictor) and train_algae_models_v2.py (v2 training with substrate features). Both output pickle files that the report generator loads at runtime.

STEP 1: DATA LOADING

Loads from beaches_full_export.json (v2) or beaches_cleaned_20260107.json (v1 — temperature-validated)
355 beaches total, split by waterBodyType: 225 tidal, 82 lake, 48 river
Each beach must have algaeData with avgChlorophyll and/or avgCI
Beaches without algae data are skipped

STEP 2: FEATURE EXTRACTION (33 features in v2)

Geographic (3): latitude, longitude, lat_abs
Satellite temperature (4): avg, max, min, range — from Landsat 8/9 thermal band. v1 validates: rejects if min < -5°C or max < 5°C (ice/snow contamination)
Satellite clarity (3): turbidity, secchi depth, clarity score — from Sentinel-2
Other indices (5): water_index, shoreline_risk, garbage, population_pressure, sediment_index
NEW in v2 — Substrate (10): sand, pebbles, rocks, boulders, stone, mud + 4 engineered:

        nutrient_retention = weighted score (mud×5 + stone×3 + rocks×2 + pebbles×1 + sand×0.5) / total

        substrate_diversity = count of non-zero substrate types (0-5)

        fine_coarse_ratio = (mud + sand) / (rocks + boulders + pebbles)

        dark_substrate = mud + stone   // absorbs heat, warms water

NEW in v2 — Dimensions (3): beach_width, beach_length, beach_area
NEW in v2 — Biological (2): seaweed_total, kelp
Engineered interactions (3): temp_clarity, warm_turbid, temp_nutrient
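
The substrate engineering and the v1 temperature check can be sketched as follows. Several details are assumptions, not confirmed by the notes: "total" is taken as the sum of all six substrate values, the diversity count runs over all six listed types (the notes give a 0-5 range without saying which types are counted), and zero denominators return 0.0. The function names are illustrative.

```python
from typing import Dict

SUBSTRATE_TYPES = ("sand", "pebbles", "rocks", "boulders", "stone", "mud")

# Weights from the nutrient_retention formula in the notes.
RETENTION_WEIGHTS = {"mud": 5.0, "stone": 3.0, "rocks": 2.0, "pebbles": 1.0, "sand": 0.5}


def substrate_features(substrate: Dict[str, float]) -> Dict[str, float]:
    """Compute the four engineered substrate features for one beach."""
    s = {name: float(substrate.get(name, 0.0)) for name in SUBSTRATE_TYPES}
    total = sum(s.values())  # assumption: total of all six substrate values
    weighted = sum(s[name] * weight for name, weight in RETENTION_WEIGHTS.items())
    coarse = s["rocks"] + s["boulders"] + s["pebbles"]
    return {
        "nutrient_retention": weighted / total if total else 0.0,
        # Assumption: counts all six listed types; the notes state a 0-5 range.
        "substrate_diversity": sum(1 for value in s.values() if value > 0),
        "fine_coarse_ratio": (s["mud"] + s["sand"]) / coarse if coarse else 0.0,
        "dark_substrate": s["mud"] + s["stone"],  # absorbs heat, warms water
    }


def temperature_is_valid(temp_min: float, temp_max: float) -> bool:
    """v1 rule: reject if min < -5 C or max < 5 C (ice/snow contamination)."""
    return not (temp_min < -5.0 or temp_max < 5.0)
```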

STEP 3: WATER-TYPE-SPECIFIC ENGINEERING

Each water type gets 3 additional custom features that capture its unique physics:

        LAKE:

          thermal_stratification_proxy = temp_range × secchi

          lake_bloom_risk = temp_avg × turbidity × nutrient_retention

          warm_shallow = temp_avg / secchi

        TIDAL:

          tidal_exchange_proxy = turbidity / temp_range

          coastal_upwelling = secchi × lat_abs / 100

          mixing_index = temp_range × turbidity

        RIVER:

          flow_proxy = turbidity × temp_range

          upstream_pollution = garbage × turbidity

          nutrient_loading = population_pressure × turbidity × nutrient_retention
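
A compact way to hold the nine formulas above in one place. This is an illustrative sketch: the dictionary keys mirror the feature names in these notes, `water_type_features` is a hypothetical function name, and returning `None` for a zero denominator (so the median imputer in Step 4 can fill it) is an assumption.

```python
from typing import Dict, Optional


def safe_div(numerator: float, denominator: float) -> Optional[float]:
    """Divide, returning None when the denominator is zero (assumption)."""
    return numerator / denominator if denominator else None


def water_type_features(water_type: str, f: Dict[str, float]) -> Dict[str, Optional[float]]:
    """Return the three custom features for 'lake', 'tidal', or 'river'."""
    if water_type == "lake":
        return {
            "thermal_stratification_proxy": f["temp_range"] * f["secchi"],
            "lake_bloom_risk": f["temp_avg"] * f["turbidity"] * f["nutrient_retention"],
            "warm_shallow": safe_div(f["temp_avg"], f["secchi"]),
        }
    if water_type == "tidal":
        return {
            "tidal_exchange_proxy": safe_div(f["turbidity"], f["temp_range"]),
            "coastal_upwelling": f["secchi"] * f["lat_abs"] / 100,
            "mixing_index": f["temp_range"] * f["turbidity"],
        }
    if water_type == "river":
        return {
            "flow_proxy": f["turbidity"] * f["temp_range"],
            "upstream_pollution": f["garbage"] * f["turbidity"],
            "nutrient_loading": f["population_pressure"] * f["turbidity"] * f["nutrient_retention"],
        }
    raise ValueError(f"unknown water type: {water_type!r}")
```

Note that `flow_proxy` and `mixing_index` are the same product of turbidity and temperature range; only the interpretation differs by water type, which is worth being ready to explain.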

STEP 4: PREPROCESSING

Imputation: SimpleImputer(strategy='median') — fills missing values with column median
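Scaling: StandardScaler(), which gives each feature zero mean and unit variance. Tree-based models (GradientBoosting, RandomForest, XGBoost) are insensitive to feature scale, so this step is harmless but not required for convergence. Don't claim it is if he asks.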
Both fitted on training data, applied to new predictions at runtime
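
To be able to explain exactly what the two preprocessing steps compute, here is a dependency-free sketch of the same arithmetic. It is not the scikit-learn implementation itself: `fit_preprocessor` and `transform` are hypothetical names, missing values are represented as `None`, and a column with no observed values at all is not handled.

```python
import statistics
from typing import Dict, List, Optional

Row = List[Optional[float]]


def fit_preprocessor(rows: List[Row]) -> Dict[str, List[float]]:
    """Learn per-column medians, means, and standard deviations from training rows."""
    columns = list(zip(*rows))
    medians = [statistics.median([v for v in col if v is not None]) for col in columns]
    filled = [[m if v is None else v for v, m in zip(row, medians)] for row in rows]
    filled_columns = list(zip(*filled))
    means = [statistics.fmean(col) for col in filled_columns]
    # Population standard deviation; a constant column gets scale 1.0 so the
    # division below is safe (StandardScaler behaves the same way).
    stds = [statistics.pstdev(col) or 1.0 for col in filled_columns]
    return {"medians": medians, "means": means, "stds": stds}


def transform(rows: List[Row], params: Dict[str, List[float]]) -> List[List[float]]:
    """Impute with training medians, then standardize with training mean/std."""
    out = []
    for row in rows:
        filled = [m if v is None else v for v, m in zip(row, params["medians"])]
        out.append([
            (v - mean) / std
            for v, mean, std in zip(filled, params["means"], params["stds"])
        ])
    return out
```

The key point is that the statistics come only from the training data; new beaches at runtime are transformed with those stored values, never refitted.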

STEP 5: MODEL TRAINING & SELECTION

v2 compares 4 algorithms per water type:

        1. GradientBoosting (default): n_est=200, depth=5, lr=0.05

        2. GradientBoosting (tuned): n_est=300, depth=4, lr=0.03, subsample=0.8

        3. RandomForest: n_est=200, depth=10

        4. XGBoost: n_est=200, depth=5, lr=0.05, subsample=0.8, colsample=0.8

Best model selected by cross-validated R² (5-fold CV when n ≥ 30)
XGBoost wins in most cases — hence "v2 = XGBoost + substrate features"
Trains 2 targets per water type: chlorophyll-a concentration + cyanobacteria index
v1 also trains a risk level model (RandomForest, max_depth=10) mapping to low/medium/high
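
The selection rule (pick the candidate with the best cross-validated R²) can be illustrated without any ML library. In this sketch the two toy "models" (`fit_mean`, `fit_linear`) are stand-ins for the four real regressors, the folds are interleaved rather than shuffled, and all function names are hypothetical. Only the selection logic is meant to carry over.

```python
import statistics
from typing import Callable, Dict, List, Tuple

Predictor = Callable[[float], float]
Fitter = Callable[[List[float], List[float]], Predictor]


def r2_score(y_true: List[float], y_pred: List[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot if ss_tot else 0.0


def kfold_indices(n: int, k: int = 5):
    """Yield (train, test) index lists; fold i holds indices i, i+k, i+2k, ..."""
    for i in range(k):
        test = list(range(i, n, k))
        held_out = set(test)
        yield [j for j in range(n) if j not in held_out], test


def cv_r2(fit: Fitter, X: List[float], y: List[float], k: int = 5) -> float:
    """Mean R² over k folds, refitting the model on each training split."""
    scores = []
    for train, test in kfold_indices(len(y), k):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(r2_score([y[i] for i in test], [predict(X[i]) for i in test]))
    return statistics.fmean(scores)


def select_best(candidates: Dict[str, Fitter], X, y, k: int = 5) -> Tuple[str, Dict[str, float]]:
    """Return the name of the candidate with the highest cross-validated R²."""
    scores = {name: cv_r2(fit, X, y, k) for name, fit in candidates.items()}
    return max(scores, key=scores.get), scores


def fit_mean(X: List[float], y: List[float]) -> Predictor:
    """Toy baseline: always predict the training mean."""
    mean_y = statistics.fmean(y)
    return lambda x: mean_y


def fit_linear(X: List[float], y: List[float]) -> Predictor:
    """Toy model: one-variable least-squares line."""
    mx, my = statistics.fmean(X), statistics.fmean(y)
    sxx = sum((x - mx) ** 2 for x in X)
    slope = sum((x - mx) * (yi - my) for x, yi in zip(X, y)) / sxx
    return lambda x: my + slope * (x - mx)
```

A point worth having ready: the mean baseline scores an R² at or below zero under cross-validation, so an R² of 0.59 to 0.69 means the model explains a real share of the variance, not that it is "60% accurate".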

STEP 6: SERIALIZATION & DEPLOYMENT

Models saved as .pkl files via pickle (~3MB each)
algae_models_by_water_type.pkl (v1) and algae_models_by_water_type_v2.pkl (v2)
Report generator loads v2 first, falls back to v1 if missing
Each pickle contains: model objects + scaler + imputer + feature list per water type
At prediction time: extract features → impute → scale → predict → classify risk
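
The save and "v2 first, then v1" loading behaviour might look like the sketch below. The two file names come from these notes; the bundle layout (one dictionary per water type with `model`, `scaler`, `imputer`, and `features` keys) and the function names are assumptions for illustration.

```python
import pickle
from pathlib import Path
from typing import Any, Dict, Tuple

V2_NAME = "algae_models_by_water_type_v2.pkl"
V1_NAME = "algae_models_by_water_type.pkl"


def save_bundle(path, bundle: Dict[str, Any]) -> None:
    """Write a model bundle (hypothetical layout) to disk with pickle."""
    with open(path, "wb") as handle:
        pickle.dump(bundle, handle)


def load_models(directory) -> Tuple[str, Dict[str, Any]]:
    """Load the v2 bundle if present, otherwise fall back to v1."""
    for name in (V2_NAME, V1_NAME):
        path = Path(directory) / name
        if path.exists():
            with open(path, "rb") as handle:
                return name, pickle.load(handle)
    raise FileNotFoundError(f"no model bundle found in {directory}")
```

Keeping the scaler, imputer, and feature list in the same file as the model guarantees that runtime predictions use exactly the preprocessing the model was trained with. Since unpickling can execute arbitrary code, these files should only ever be loaded from a trusted location.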

KEY INSIGHT FOR HASSAN

The water-type-specific feature engineering is the clever part. A lake's bloom risk depends on thermal stratification (temp_range × secchi), while a river's depends on upstream nutrient loading (population × turbidity × substrate retention). Generic models miss these physics. That's also where his geomatics expertise would improve things — better atmospheric correction = better input features = better predictions.

ML Models — Accuracy Numbers (Know These Cold)

V2 MODELS (XGBoost + substrate features, 33 input features)

Lakes — Chlorophyll-a: R² = 0.62 (+12.9% over v1)
Rivers — Chlorophyll-a: R² = 0.59 (+237% over v1)
Tidal — Chlorophyll-a: R² = 0.59
Rivers — Cyanobacteria Index: R² = 0.69

MODEL DETAILS

XGBoost primary, GradientBoosting fallback
n_estimators=200, max_depth=5, learning_rate=0.05
5-fold cross-validation when sample ≥ 30
33 features: 15 base (geographic, satellite, community) + 10 substrate + 3 dimension + 2 biological + 3 engineered
v2 added: substrate nutrient retention, warmth index, stability, porosity, seaweed/kelp

SATELLITE ACCURACY

Temperature: ±0.5-2°C
Clarity indices: ±20-40%
Satellite vs in-situ chlorophyll: ±30-40%
GPS positional accuracy: <10m
SRTM terrain: ±6m vertical

Algae Heatmap Pipeline — The Demo Piece

5-year temporal analysis (current year minus 4)
Summer/growing season only (May-Sept for northern hemisphere)
For each year: query Sentinel-2, load Green/Red/NIR, apply water mask (NDWI > 0.1)
Calculate chlorophyll proxy: (Green/Red - 0.9) × 27.3, clamped [0, 30] mg/m³
Lake detection via JRC Global Surface Water dataset
Sampling grid: 100m spacing across entire lake surface (e.g. 845 points for Eagle Lake)
Output: per-year PNG heatmaps (blue→green→yellow→red colorscale)
Animated GIF: 220 frames, 800ms per year, smooth fades
Temporal video (MP4): week-by-week progression through bloom season, 30 FPS
Eagle Lake files: 17MB MP4, 29MB GIF
Trend: simple linear comparison first vs last year (>20% = Increasing, <-20% = Decreasing)
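
The per-year calculation and the trend rule reduce to a few lines. This sketch makes some assumptions: pixels are given as (green, red, nir) tuples, the middle category is labelled "Stable" (the notes name only Increasing and Decreasing), and a zero first-year value is treated as Stable. Function names are illustrative.

```python
import statistics
from typing import List, Optional, Sequence, Tuple

Pixel = Tuple[float, float, float]  # (green, red, nir)


def analysis_years(current_year: int) -> List[int]:
    """Five years ending at the current year (current year minus 4 onward)."""
    return list(range(current_year - 4, current_year + 1))


def is_water(green: float, nir: float, threshold: float = 0.1) -> bool:
    """Water mask: NDWI = (Green - NIR) / (Green + NIR) must exceed 0.1."""
    total = green + nir
    return total != 0 and (green - nir) / total > threshold


def chlorophyll_proxy(green: float, red: float) -> float:
    """(Green/Red - 0.9) * 27.3, clamped to [0, 30] mg/m^3."""
    if red == 0:
        return 0.0
    return min(30.0, max(0.0, (green / red - 0.9) * 27.3))


def mean_chlorophyll(pixels: Sequence[Pixel]) -> Optional[float]:
    """Average the chlorophyll proxy over water pixels only."""
    values = [chlorophyll_proxy(g, r) for g, r, nir in pixels if is_water(g, nir)]
    return statistics.fmean(values) if values else None


def classify_trend(first_year: float, last_year: float, threshold_pct: float = 20.0) -> str:
    """Compare first and last year: >20% is Increasing, <-20% is Decreasing."""
    if first_year == 0:
        return "Stable"  # assumption: percentage change is undefined here
    change_pct = (last_year - first_year) / first_year * 100
    if change_pct > threshold_pct:
        return "Increasing"
    if change_pct < -threshold_pct:
        return "Decreasing"
    return "Stable"
```

Because the trend uses only the first and last year, one unusual season at either end can flip the label. That limitation is better raised by you than discovered by him.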

Report System — What Gets Generated

24+ page premium report as Word (.docx) + PDF
10 major sections: satellite metrics, imagery analysis, environmental maps, lake analysis, advanced analytics, algae monitoring, community observations, shoreline risk, recommendations, appendices
Seasonal water temperature (4 seasons from Landsat thermal)
Temporal satellite images spanning 1984-2025 (Landsat + Sentinel-2)
Environmental maps: terrain, land cover, night lights, population density
5-year algae heatmaps + animated GIF + temporal video
Nearest-beach comparison (5 closest in database)
Regional benchmarking by climate zone (355 beach database)
GIS export package with FGDC-compliant shapefiles
Government contacts document

COMPOSITE SCORING

Overall Beach Quality (0-100) = Natural Features (30%) + Env. Stability (25%) + Water Quality (20%) + Vegetation Health (15%) + Low Development (10%)
Water Quality (0-100) = Turbidity (30%) + Sediment (30%) + NDWI (20%) + Color (20%)
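
Both composite scores are plain weighted sums, sketched here under the assumption that every component has already been normalized to a 0-100 sub-score (how that normalization is done is not covered in these notes). The dictionary keys are illustrative names.

```python
from typing import Dict

OVERALL_WEIGHTS = {
    "natural_features": 0.30,
    "env_stability": 0.25,
    "water_quality": 0.20,
    "vegetation_health": 0.15,
    "low_development": 0.10,
}

WATER_QUALITY_WEIGHTS = {
    "turbidity": 0.30,
    "sediment": 0.30,
    "ndwi": 0.20,
    "color": 0.20,
}


def weighted_score(components: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of 0-100 sub-scores; raises if a component is missing."""
    missing = set(weights) - set(components)
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    return sum(components[name] * weight for name, weight in weights.items())
```

For example, a beach scoring 80 on every component gets an overall score of 80, since each set of weights sums to 1.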

Ground Truth Collection — What Makes Your Data Unique

40+ data categories per assessment: flora (kelp, seaweed, eelgrass), fauna (25+ species), substrate (sand, pebbles, rocks, boulders, mud), driftwood, dimensions, garbage, wind, people
All GPS-tagged, timestamped, photo-documented
Geohash-validated (must be physically at beach)
Multiple contributions aggregated per beach (averaged numerics, deduplicated species)
ML-assisted identification (Google ML Kit + iNaturalist)
Admin-moderated quality control pipeline
Key selling point to Hassan: this is concurrent ground truth + satellite data. Most RS studies lack this.
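
The per-beach aggregation (averaged numerics, deduplicated species) could be sketched like this. The record layout with `measurements` and `species` keys is hypothetical, and case-insensitive matching of species names is an assumption.

```python
import statistics
from typing import Any, Dict, List


def aggregate_contributions(contributions: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge several assessments of one beach into a single record."""
    numeric: Dict[str, List[float]] = {}
    species: List[str] = []
    seen = set()
    for contribution in contributions:
        for key, value in contribution.get("measurements", {}).items():
            numeric.setdefault(key, []).append(float(value))
        for name in contribution.get("species", []):
            normalized = name.strip().lower()  # assumption: case-insensitive match
            if normalized not in seen:
                seen.add(normalized)
                species.append(name.strip())
    return {
        "measurements": {key: statistics.fmean(values) for key, values in numeric.items()},
        "species": species,
        "count": len(contributions),
    }
```

Each measurement is averaged over only the contributions that reported it, so a field missing from one assessment does not drag the mean toward zero.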

Architecture — Technical Decisions He Might Ask About

STAC APIs over GEE — eliminates GEE dependency. Element84 primary, Planetary Computer + Copernicus fallbacks
xarray + stackstac for lazy loading — only downloads what's needed
Water-type-specific models — separate for tidal, lake, river (different physics)
Water sampling point validation — 5-year NDWI timeseries confirms pin is in permanent water. Auto-adjusts toward deeper water if needed (8-directional search)
Checkpoint/resume system — JSON checkpoints after each phase (coordinates, crop, metrics) for crash recovery
Flask + WebSocket for real-time progress to Flutter app during report generation
Latitude-based climate zones for regional benchmarking: Arctic (>60), Temperate Cold (45-60), Temperate Warm (30-45), Subtropical (15-30), Tropical (<15)
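
The climate-zone lookup is a simple threshold chain. Two assumptions in this sketch: absolute latitude is used so the southern hemisphere maps to the same zones, and where the listed ranges share a boundary value (45, 30), the value goes to the band that starts there. Exactly 60 falls in Temperate Cold and exactly 15 in Subtropical, following the ">60" and "<15" wording.

```python
def climate_zone(latitude: float) -> str:
    """Map a latitude in degrees to one of the five benchmarking zones."""
    lat = abs(latitude)  # assumption: zones are symmetric about the equator
    if lat > 60:
        return "Arctic"
    if lat >= 45:
        return "Temperate Cold"
    if lat >= 30:
        return "Temperate Warm"
    if lat >= 15:
        return "Subtropical"
    return "Tropical"
```

Under this scheme nearly all Alberta sites (roughly 49 to 60 degrees north) land in Temperate Cold, so regional benchmarking within the province compares like with like.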

Lines That Show You Did Your Homework

I saw your work on clustering Alberta lakes by water quality parameters — that's directly relevant to what I'm doing at scale with satellite data

Your Lesser Slave Lake monitoring using Landsat + Sentinel SAR was interesting — I'm doing similar temporal analysis with Sentinel-2 optical

Congrats on the APEGA and ASTech awards — that applied impact approach is exactly what I'm going for

Value Exchange

Your Value to Him

Ground truth data — 356+ GPS-tagged, timestamped, photo-documented field points. Gold for RS researchers.
Deployed platform — real application for his satellite work, not just papers
AB Innovates cash match — solves the #1 NSERC Alliance pain point
Municipal connections — real-world policy impact for his portfolio
Fresh outsider perspective — you built this entire platform as a passion project without formal RS training. You approach problems differently than someone who came up through academia — that's how you ended up with water-type-specific models and citizen science ground truth instead of another GEE script. Academics value unconventional thinking.

His Value to You

Geomatics methodology — atmospheric correction, geometric validation, spatial statistics
NSERC track record — holds Discovery Grant, knows the system
P.Eng. — professional engineering credibility
Alberta focus — most of his study areas are in-province
Publication speed — 7 papers in 2025, he ships fast

Watch Out — Don't Say

Don't oversell accuracy — be honest about R² limitations. He's an engineer, he'll spot bullshit instantly.
Don't say "AI" or "machine learning" loosely. Where you do use ML, name the model (XGBoost, RandomForest) and the features. He wants specifics: band ratios, NDCI, atmospheric correction methods.
Don't over-name-drop Baulch/Vinebrooke — position Hassan as bringing unique geomatics/RS methodology the others don't have. He's not a secondary player.