UVP5 data sorted with EcoTaxa and MorphoCluster

Date 2020
Temporal extent 2012-10-24 - 2017-08-08
Author(s) Kiko Rainer (1), Schröder Simon-Martin (2)
Affiliation(s) 1 : Laboratoire d'Océanographie de Villefranche-sur-mer & GEOMAR Helmholtz Centre for Ocean Research Kiel
2 : Kiel University
DOI 10.17882/73002
Publisher SEANOE
Keyword(s) machine learning, plankton, image, underwater vision profiler, class-imbalance

Here, we provide plankton image data that were sorted with the web applications EcoTaxa and MorphoCluster. The data set was used for image classification tasks as described in Schröder et al. (in preparation) and does not include any geospatial or temporal metadata.

Plankton was imaged using the Underwater Vision Profiler 5 (Picheral et al. 2010) in various regions of the world's oceans between 2012-10-24 and 2017-08-08.

This data publication consists of an archive containing "training.csv" (a list of 392k training images for classification, validated using EcoTaxa), "validation.csv" (a list of 196k validation images for classification, validated using EcoTaxa), "unlabeled.csv" (a list of 1M unlabeled images), "morphocluster.csv" (1.2M objects validated using MorphoCluster, a subset of "unlabeled.csv" and "validation.csv") and the image files themselves. Each CSV file contains the columns "object_id" (a unique ID), "image_fn" (the relative filename of the image), and "label" (the assigned name).
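As an illustration of how the filename lists might be read and cross-checked, the following minimal sketch loads a list into a dictionary keyed by "object_id" and reports any MorphoCluster object that appears in neither source list. It assumes that each CSV file starts with a header row naming the three columns and is comma-separated, which is not stated above. The helper names read_index and unexplained_ids, as well as the IDs, filenames and labels in the small example, are hypothetical.

```python
import csv
import io

# Column names as given in the data set description.
EXPECTED_COLUMNS = ("object_id", "image_fn", "label")


def read_index(fileobj):
    """Read one filename list into a dict keyed by object_id.

    Assumes a comma-separated file with a header row (an assumption,
    not confirmed by the data set description).
    """
    reader = csv.DictReader(fileobj)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return {row["object_id"]: row for row in reader}


def unexplained_ids(morpho, *sources):
    """Return the object IDs in `morpho` that occur in none of `sources`."""
    known = set()
    for source in sources:
        known.update(source)
    return set(morpho) - known


# Tiny made-up example; all IDs, filenames and labels are hypothetical.
validation = read_index(io.StringIO(
    "object_id,image_fn,label\n1,a/1.jpg,Copepoda\n"))
unlabeled = read_index(io.StringIO(
    "object_id,image_fn,label\n2,b/2.jpg,\n3,b/3.jpg,\n"))
morpho = read_index(io.StringIO(
    "object_id,image_fn,label\n1,a/1.jpg,cluster_7\n2,b/2.jpg,cluster_7\n"))

print(unexplained_ids(morpho, validation, unlabeled))  # set()
```

With the extracted archive, the same functions could be applied to opened files, for example read_index(open(path, newline="", encoding="utf-8")), where the path depends on where the archive was unpacked.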

The training and validation sets were sorted into 65 classes using the web application EcoTaxa (http://ecotaxa.obs-vlfr.fr). These data show a severe class imbalance: the 10% most populated classes contain more than 80% of the objects, and the class sizes span four orders of magnitude. The validation set and an additional set of 1M unlabeled images were sorted during the first trial of MorphoCluster (https://github.com/morphocluster).
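The imbalance figures quoted above can be recomputed from the "label" column of a filename list. The sketch below is one possible way to do so, not the procedure used by the authors: it counts objects per class, takes the most populated 10% of classes (rounded up, which is an assumption, since 10% of 65 classes is not a whole number), and expresses the size range as log10 of the largest over the smallest class. The function name imbalance_summary and the example labels are hypothetical.

```python
import math
from collections import Counter


def imbalance_summary(labels, top_fraction=0.10):
    """Summarise class imbalance for an iterable of class labels."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("no labels given")
    # Class sizes, largest first.
    sizes = sorted(counts.values(), reverse=True)
    # Number of classes in the "most populated" group; rounding up is an assumption.
    n_top = max(1, math.ceil(top_fraction * len(sizes)))
    return {
        "n_classes": len(sizes),
        "n_top": n_top,
        # Fraction of all objects that fall into the n_top largest classes.
        "top_share": sum(sizes[:n_top]) / sum(sizes),
        # Range of class sizes in orders of magnitude.
        "orders_of_magnitude": math.log10(sizes[0] / sizes[-1]),
    }


# Made-up labels for illustration only.
labels = ["a"] * 1000 + ["b"] * 100 + ["c"] * 10 + ["d"] * 1
summary = imbalance_summary(labels)
print(summary)
```

Applied to the labels of the training or validation list, "top_share" above 0.8 and "orders_of_magnitude" close to 4 would correspond to the values stated in the description.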

The images in this data set were sampled during RV Meteor cruises M92, M93, M96, M97, M98, M105, M106, M107, M108, M116, M119, M121, M130, M131, M135, M136, M137 and M138, during RV Maria S Merian cruises MSM22, MSM23, MSM40 and MSM49, during the RV Polarstern cruise PS88b and during the FLUXES1 experiment with RV Sarmiento de Gamboa.

The following people have contributed to the sorting of the image data on EcoTaxa:

Rainer Kiko, Tristan Biard, Benjamin Blanc, Svenja Christiansen, Justine Courboules, Charlotte Eich, Jannik Faustmann, Christine Gawinski, Augustin Lafond, Aakash Panchal, Marc Picheral, Akanksha Singh and Helena Hauss

In Schröder et al. (in preparation), the training set serves as a source for knowledge transfer in the training of the feature extractor. The classification using MorphoCluster was conducted by Rainer Kiko. The labels used are operational and have not yet been matched to the respective EcoTaxa classes.

Licence CC-BY-NC
Acknowledgements We thank the chief scientists, captains and crews of the above-mentioned cruises for their support. Furthermore, we would like to thank Marc Picheral and the PIQv team at the Laboratoire d'Océanographie de Villefranche-sur-mer for support with image data upload to EcoTaxa. Data acquisition was supported by the SFB 754 "Climate - Biogeochemistry Interactions in the Tropical Ocean" (www.sfb754.de, grant/award no. 27542298 of the German Science Foundation).
Sensor metadata

Picheral, M., Guidi, L., Stemmann, L., Karl, D. M., Iddaoud, G., & Gorsky, G. (2010). The Underwater Vision Profiler 5: An advanced instrument for high spatial resolution studies of particle size spectra and zooplankton. Limnology and Oceanography: Methods, 8(1), 462–473. https://doi.org/10.4319/lom.2010.8.462

File | Size | Format | Processing | Access
Image files and filename lists | 5 GB | IMAGE | Processed data | Open access

How to cite 

Kiko Rainer, Schröder Simon-Martin (2020). UVP5 data sorted with EcoTaxa and MorphoCluster. SEANOE. https://doi.org/10.17882/73002

In addition to properly citing this dataset, please also cite the following work(s) when using this dataset in a publication:

Schröder Simon-Martin, Kiko Rainer, Koch Reinhard (2020). MorphoCluster: Efficient Annotation of Plankton Images by Clustering. Sensors, 20(11), 3060. https://doi.org/10.3390/s20113060