Copy this text
UVP5 data sorted with EcoTaxa and MorphoCluster
Here, we provide plankton image data that was sorted with the web applications EcoTaxa and MorphoCluster. The data set was used for image classification tasks as described in Schröder et. al (in preparation) and does not include any geospatial or temporal meta-data.
Plankton was imaged using the Underwater Vision Profiler 5 (Picheral et al. 2010) in various regions of the world's oceans between 2012-10-24 and 2017-08-08.
This data publication consists of an archive containing "training.csv" (list of 392k training images for classification, validated using EcoTaxa), "validation.csv" (list of 196k validation images for classification, validated using EcoTaxa), "unlabeld.csv" (list of 1M unlabeled images), "morphocluster.csv" (1.2M objects validated using MorphoCluster, a subset of "unlabeled.csv" and "validation.csv") and the image files themselves. The CSV files each contain the columns "object_id" (a unique ID), "image_fn" (the relative filename), and "label" (the assigned name).
The training and validation sets were sorted into 65 classes using the web application EcoTaxa (http://ecotaxa.obs-vlfr.fr). This data shows a severe class imbalance; the 10% most populated classes contain more than 80% of the objects and the class sizes span four orders of magnitude. The validation set and a set of additional 1M unlabeled images were sorted during the first trial of MorphoCluster (https://github.com/morphocluster).
The images in this data set were sampled during RV Meteor cruises M92, M93, M96, M97, M98, M105, M106, M107, M108, M116, M119, M121, M130, M131, M135, M136, M137 and M138, during RV Maria S Merian cruises MSM22, MSM23, MSM40 and MSM49, during the RV Polarstern cruise PS88b and during the FLUXES1 experiment with RV Sarmiento de Gamboa.
The following people have contributed to the sorting of the image data on EcoTaxa:
Rainer Kiko, Tristan Biard, Benjamin Blanc, Svenja Christiansen, Justine Courboules, Charlotte Eich, Jannik Faustmann, Christine Gawinski, Augustin Lafond, Aakash Panchal, Marc Picheral, Akanksha Singh and Helena Hauss
In Schröder et al. (in preparation), the training set serves as a source for knowledge transfer in the training of the feature extractor. The classification using MorphoCluster was conducted by Rainer Kiko. Used labels are operational and not yet matched to respective EcoTaxa classes.
Biological oceanography, Cross-discipline
machine learning, plankton, image, underwater vision profiler, class-imbalance
90N, -90S, 180E, -180W
Picheral, M., Guidi, L., Stemmann, L., Karl, D. M., Iddaoud, G., & Gorsky, G. (2010). The Underwater Vision Profiler 5: An advanced instrument for high spatial resolution studies of particle size spectra and zooplankton. Limnology and Oceanography: Methods, 8(1), 462–473. https://doi.org/10.4319/lom.2010.8.462
Image files and filename lists
|5 Go||IMAGE||Processed data|