UVP5 data sorted with EcoTaxa and MorphoCluster

Here, we provide plankton image data that was sorted with the web applications EcoTaxa and MorphoCluster. The data set was used for image classification tasks as described in Schröder et al. (in preparation) and does not include any geospatial or temporal metadata.

Plankton was imaged using the Underwater Vision Profiler 5 (Picheral et al. 2010) in various regions of the world's oceans between 2012-10-24 and 2017-08-08.

This data publication consists of an archive containing "training.csv" (a list of 392k training images for classification, validated using EcoTaxa), "validation.csv" (a list of 196k validation images for classification, validated using EcoTaxa), "unlabeled.csv" (a list of 1M unlabeled images), "morphocluster.csv" (1.2M objects validated using MorphoCluster, a subset of "unlabeled.csv" and "validation.csv") and the image files themselves. Each CSV file contains the columns "object_id" (a unique ID), "image_fn" (the relative filename of the image), and "label" (the assigned class name).
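The file layout described above can be read with a few lines of standard-library Python. This is a minimal sketch, not part of the data publication: it assumes comma-separated files with a header row in UTF-8 encoding, and that "image_fn" is relative to the directory into which the archive was extracted. The function names `read_listing`, `image_path` and `is_subset` are hypothetical helpers introduced here for illustration.

```python
import csv
from pathlib import Path

# Columns named in the data set description.
EXPECTED_COLUMNS = {"object_id", "image_fn", "label"}


def read_listing(csv_path):
    """Read one listing file (e.g. training.csv) into a list of row dicts.

    Assumption: comma-separated, UTF-8, first line is the header.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"{csv_path}: missing columns {sorted(missing)}")
        return list(reader)


def image_path(archive_root, row):
    """Resolve a row's relative filename against the extraction directory.

    Assumption: image_fn is relative to the archive root.
    """
    return Path(archive_root) / row["image_fn"]


def is_subset(rows, *other_listings):
    """Return True if every object_id in rows occurs in some other listing."""
    known = {r["object_id"] for listing in other_listings for r in listing}
    return all(r["object_id"] in known for r in rows)
```

With the archive extracted, `is_subset(read_listing("morphocluster.csv"), read_listing("unlabeled.csv"), read_listing("validation.csv"))` would be one way to check the stated subset relationship, provided the object IDs are shared across the files.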

The training and validation sets were sorted into 65 classes using the web application EcoTaxa (http://ecotaxa.obs-vlfr.fr). The data show a severe class imbalance: the 10% most populated classes contain more than 80% of the objects, and the class sizes span four orders of magnitude. The validation set and an additional set of 1M unlabeled images were sorted during the first trial of MorphoCluster (https://github.com/morphocluster).
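The two imbalance figures quoted above (share of objects in the most populated classes, and the span of class sizes) can be recomputed from the "label" column of a listing file. The following is a sketch under stated assumptions, not the authors' procedure: the helper name `imbalance_summary` is hypothetical, and rounding the number of "top" classes up (for example 7 of 65 classes at 10%) is a choice made here that the description does not specify.

```python
import math
from collections import Counter


def imbalance_summary(labels, top_fraction=0.10):
    """Summarise class imbalance for an iterable of class labels.

    Returns the number of classes, the share of objects held by the
    top_fraction most populated classes, and log10(largest / smallest).
    """
    counts = Counter(labels)
    if not counts:
        raise ValueError("no labels given")
    sizes = sorted(counts.values(), reverse=True)
    # Assumption: round the number of top classes up, with at least one class.
    # The inner round() guards against floating-point noise such as 0.3000000004.
    n_top = max(1, math.ceil(round(top_fraction * len(sizes), 9)))
    return {
        "n_classes": len(sizes),
        "top_share": sum(sizes[:n_top]) / sum(sizes),
        "orders_of_magnitude": math.log10(sizes[0] / sizes[-1]),
    }
```

Applied to the labels of "training.csv", a `top_share` above 0.8 and an `orders_of_magnitude` value near 4 would be consistent with the description; the exact values depend on the rounding choice noted above.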

The images in this data set were sampled during RV Meteor cruises M92, M93, M96, M97, M98, M105, M106, M107, M108, M116, M119, M121, M130, M131, M135, M136, M137 and M138, during RV Maria S. Merian cruises MSM22, MSM23, MSM40 and MSM49, during the RV Polarstern cruise PS88b and during the FLUXES1 experiment with RV Sarmiento de Gamboa.

The following people have contributed to the sorting of the image data on EcoTaxa:

Rainer Kiko, Tristan Biard, Benjamin Blanc, Svenja Christiansen, Justine Courboules, Charlotte Eich, Jannik Faustmann, Christine Gawinski, Augustin Lafond, Aakash Panchal, Marc Picheral, Akanksha Singh and Helena Hauss

In Schröder et al. (in preparation), the training set serves as a source for knowledge transfer in the training of the feature extractor. The classification using MorphoCluster was conducted by Rainer Kiko. The MorphoCluster labels are operational and have not yet been matched to the corresponding EcoTaxa classes.


Biological oceanography, Cross-discipline


machine learning, plankton, image, underwater vision profiler, class-imbalance


90N, -90S, 180E, -180W


Picheral, M., Guidi, L., Stemmann, L., Karl, D. M., Iddaoud, G., & Gorsky, G. (2010). The Underwater Vision Profiler 5: An advanced instrument for high spatial resolution studies of particle size spectra and zooplankton. Limnology and Oceanography: Methods, 8(1), 462–473. https://doi.org/10.4319/lom.2010.8.462


Image files and filename lists: 5 GB, image files, processed data
How to cite
Kiko Rainer, Schröder Simon-Martin (2020). UVP5 data sorted with EcoTaxa and MorphoCluster. SEANOE. https://doi.org/10.17882/73002
In addition to citing this dataset properly, please also cite the following work when using this dataset in a publication:
Schröder Simon-Martin, Kiko Rainer, Koch Reinhard (2020). MorphoCluster: Efficient Annotation of Plankton Images by Clustering. Sensors, 20 (11), 3060. https://doi.org/10.3390/s20113060
