satellIte phytoplaNkton Drivers In the Global Ocean over 1998-2015 (INDIGO Benchmark dataset)
This benchmark dataset contains the physical data used as predictors to reconstruct global chlorophyll-a concentrations (Chl, a proxy of phytoplankton biomass) in Roussillon et al., as well as the reference satellite Chl target fields. The nine physical predictors (short-wave radiation, sea surface temperature, sea level anomaly, zonal and meridional surface currents, zonal and meridional surface wind stress, bathymetry, and a binary continental mask) were extracted from publicly available datasets over [1998-2015] and resampled to the same spatio-temporal resolution as Chl, i.e. monthly on a 1°x1° grid between 50°N and 50°S. Missing values were gap-filled using the heat diffusion equation. Each variable was normalized by subtracting its mean and dividing by its standard deviation, both computed over [1998-2015].
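The gap-filling and normalization steps described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `diffuse_fill` and `standardize`, the iteration count, the edge padding at the grid boundaries, and the use of a single scalar mean and standard deviation per variable are all assumptions made for illustration. The record does not specify the exact diffusion scheme, nor whether the statistics were computed globally or per grid cell.

```python
import numpy as np


def diffuse_fill(field, n_iter=200):
    """Fill NaNs in a 2-D (lat, lon) field with an explicit diffusion scheme.

    Assumption: known cells are held fixed and missing cells are repeatedly
    replaced by the mean of their four neighbours (Jacobi relaxation of the
    steady-state heat equation). Boundaries use edge padding, which ignores
    the periodicity of longitude.
    """
    missing = np.isnan(field)
    # Start missing cells from the mean of the observed values.
    filled = np.where(missing, np.nanmean(field), field)
    for _ in range(n_iter):
        padded = np.pad(filled, 1, mode="edge")
        neighbour_mean = (
            padded[:-2, 1:-1] + padded[2:, 1:-1]
            + padded[1:-1, :-2] + padded[1:-1, 2:]
        ) / 4.0
        # Only missing cells are updated; observations stay untouched.
        filled = np.where(missing, neighbour_mean, filled)
    return filled


def standardize(series):
    """Z-score a (time, lat, lon) array over the full period.

    Assumption: one scalar mean and standard deviation per variable.
    Per-grid-cell statistics would use axis=0 instead.
    """
    return (series - series.mean()) / series.std()


# Toy example on a 5x5 grid with one missing cell.
grid = np.arange(25, dtype=float).reshape(5, 5)
grid[2, 2] = np.nan
filled = diffuse_fill(grid)
normalized = standardize(np.stack([filled, filled + 1.0]))
```

On the real 1°x1° monthly fields, the same two calls would be applied to each predictor in turn, with the normalization statistics taken over the full 1998-2015 record.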
This dataset was used to train and validate the Multi-Mode Convolutional Neural Network (CNNMM8) introduced in Roussillon et al.; reconstructed monthly Chl fields over the [2012-2015] test period are also provided here.
We hope this benchmark dataset will help improve existing methods and encourage new ideas, since building datasets is sometimes more time-consuming than implementing the machine learning tools themselves. It should also facilitate the quantitative comparison of model performance on exactly the same data.
Biological oceanography, Physical oceanography
phytoplankton physical drivers, satellite ocean color, time-series regression, global scale, deep learning, benchmark
50N, -50S, -180E, 180W