Module 4: Algae Prediction & Bloom Risk

Machine Learning Models for Cyanobacteria and Chlorophyll-a

The Algae Problem: Why It Matters

Harmful algal blooms (HABs) are one of the most serious water quality threats worldwide, causing beach closures, fish kills, and human health risks. Nimpact uses water-body-specific machine learning models trained on thousands of beaches to predict algae concentrations from satellite-derived environmental parameters.

Two Key Algae Metrics

Cyanobacteria Index (CI)

Measures the concentration of blue-green algae (cyanobacteria), the most harmful type. Some species produce toxins (microcystins, anatoxins) that can cause:

  • Liver damage in humans and animals
  • Neurological effects
  • Skin irritation and respiratory problems
  • Pet deaths from drinking contaminated water

Units: cells/mL (typical range 0-100,000)

Chlorophyll-a Concentration

Measures all algae types (green, brown, blue-green) via their primary photosynthetic pigment. High chlorophyll indicates:

  • High biological productivity (eutrophication)
  • Nutrient enrichment (phosphorus, nitrogen)
  • Potential for oxygen depletion and fish kills
  • Reduced water clarity

Units: μg/L (typical range 1-100 μg/L)

Why Prediction Is Challenging

Unlike temperature and clarity (which satellites measure directly), algae concentrations cannot be reliably measured from space for inland waters.

Nimpact's Solution: Rather than trying to measure algae directly, we predict concentrations from environmental drivers (temperature, clarity, nutrient proxies) that can be measured accurately. For inland waters, this "indirect measurement" approach achieves better accuracy than direct spectral analysis.

Water-Body-Specific Models

Nimpact uses three separate machine learning models because algae dynamics differ fundamentally across water body types:

# Algae Prediction Models (Random Forest)

RIVER Model (n=1,247 training sites):
  - Cyanobacteria: R² = 0.69 (geography-based)
  - Chlorophyll-a: R² = 0.49 (temperature, clarity)
  - Rivers show elevated levels due to upstream nutrient loading

LAKE Model (n=2,183 training sites):
  - Cyanobacteria: Not predictable (nutrient-limited)
  - Chlorophyll-a: R² = 0.55 (temperature, clarity)
  - Lakes show summer bloom patterns when warm + nutrients

TIDAL Model (n=891 training sites):
  - Cyanobacteria: Not predictable (salinity inhibits)
  - Chlorophyll-a: R² = 0.61 (temperature, clarity, tidal mixing)
  - Coastal blooms driven by upwelling and tidal cycles
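One way to picture the three-model split is as a lookup keyed by water body type. The sketch below is illustrative only: the names ModelSpec, MODEL_SPECS and available_predictions are hypothetical and are not part of Nimpact's actual code. The site counts and R² values are copied from the model summary above, and a value of None stands for "Not predictable".

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ModelSpec:
    """Summary of one water-body-specific model (hypothetical structure)."""
    training_sites: int
    cyano_r2: Optional[float]  # None means the module lists it as "Not predictable"
    chl_a_r2: float


# Figures taken from the module's model summary; the layout is an assumption.
MODEL_SPECS: Dict[str, ModelSpec] = {
    "river": ModelSpec(training_sites=1247, cyano_r2=0.69, chl_a_r2=0.49),
    "lake": ModelSpec(training_sites=2183, cyano_r2=None, chl_a_r2=0.55),
    "tidal": ModelSpec(training_sites=891, cyano_r2=None, chl_a_r2=0.61),
}


def available_predictions(water_type: str) -> List[str]:
    """Return the algae metrics that the model for this water type can predict."""
    spec = MODEL_SPECS.get(water_type.strip().lower())
    if spec is None:
        raise ValueError(f"unknown water body type: {water_type!r}")
    metrics = ["chlorophyll_a"]
    if spec.cyano_r2 is not None:
        # Only the river model reports a usable cyanobacteria fit.
        metrics.insert(0, "cyanobacteria")
    return metrics


if __name__ == "__main__":
    for name in MODEL_SPECS:
        print(name, available_predictions(name))
```

Routing every site through a lookup like this makes it explicit that a cyanobacteria prediction is only offered where the corresponding model supports it.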

This water-type stratification is critical. A river with high cyanobacteria is normal (upstream sources), while a lake with the same concentration indicates a serious local problem.
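The R² figures quoted for each model are coefficients of determination: the share of variance in the observed concentrations that the model's predictions explain. A value of 1.0 is a perfect fit, and 0.0 is no better than always predicting the mean. The minimal sketch below shows the standard formula, R² = 1 - SS_res / SS_tot. The function name r_squared and the sample values are made up for illustration and are not Nimpact data.

```python
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R² is undefined")
    return 1.0 - ss_res / ss_tot


# Invented chlorophyll-a values in μg/L, for illustration only.
observed = [4.0, 10.0, 22.0, 35.0, 60.0]
predicted = [6.0, 12.0, 18.0, 40.0, 52.0]
score = r_squared(observed, predicted)

if __name__ == "__main__":
    print(f"R² = {score:.3f}")
```

Read this way, the river chlorophyll-a model (R² = 0.49) explains roughly half of the observed variance, while the tidal model (R² = 0.61) explains somewhat more.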
