UVP6Net : plankton images captured with the UVP6
Plankton was imaged with the UVP6 in contrasting oceanic regions. The full images were processed by the UVP6 firmware and the regions of interest (ROIs) around each individual object were recorded. A set of associated features was measured on the objects (see Picheral et al. 2021, doi:10.1002/lom3.10475, for more information). All objects were classified by a limited number of operators into 110 different classes using the web application EcoTaxa (http://ecotaxa.obs-vlfr.fr). This dataset comprises the 634 459 objects with an area greater than 73 square pixels (equivalent spherical diameter of 9.8 pixels, corresponding to the default size limit of 620 µm in the UVP6 configuration). The files provide the features of the objects, their taxonomic identification, and the raw images. For the purpose of training machine learning classifiers, the images in each class were split into training, validation, and test sets, with proportions of 70%, 15%, and 15%.
An additional folder is provided, which contains the subset of images used to train the single embedded classification model of the UVP6 that is actually deployed on the NKE CTS5 floats (doi:10.5281/zenodo.10694203). These images correspond to UVP6Net objects filtered to retain only those with an area of at least 79 square pixels, to match the 645 µm size class from EcoPart, resulting in a total of 595 595 objects. The taxonomic identification was also made coarser (from 110 classes to 20) to ensure adequate performance of the classification model on power-constrained hardware. Images in this subset display objects as shades of grey/white on a black background.
First folder (UVP6Net.tar) contains:
taxa.csv.gz
Table of the classification of each object in the dataset, with columns:
- objid: unique object identifier in EcoTaxa (integer number).
- taxon_level1: taxonomic name corresponding to the level 1 classification
- lineage_level1: taxonomic lineage corresponding to the level 1 classification
- taxon_level2: name of the taxon corresponding to the level 2 classification
- plankton: whether the object is plankton or not (boolean)
- set: subset to which the image was assigned (train: training, val: validation, or test)
- img_path: local path of the image corresponding to the taxon (of level 1), named according to the object id
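As an illustration of how this table might be used, the sketch below reads taxa.csv.gz with the Python standard library and computes the fraction of objects in each value of the set column. It assumes the file is comma-delimited with a header row carrying the column names listed above; the helper names (read_taxa, set_proportions) and the small demonstration table are hypothetical and are not part of the dataset.

```python
import csv
import gzip
import io
from collections import Counter


def read_taxa(path):
    """Read taxa.csv.gz into a list of dicts, one per object.

    Assumes a comma-delimited file with a header row.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))


def set_proportions(rows):
    """Return the fraction of objects in each value of the `set` column."""
    counts = Counter(row["set"] for row in rows)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Tiny synthetic table standing in for the real file (all values are made up).
demo = io.StringIO(
    "objid,taxon_level1,set\n"
    "1,Copepoda,train\n"
    "2,Copepoda,train\n"
    "3,Copepoda,val\n"
    "4,Copepoda,test\n"
)
demo_rows = list(csv.DictReader(demo))
demo_props = set_proportions(demo_rows)
print(demo_props)
```

On the real file, set_proportions(read_taxa("taxa.csv.gz")) should return values close to the 70%, 15% and 15% split, provided the path points to the extracted archive.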
features_native.csv.gz
Table of morphological features computed by ZooCAM. All features are computed on the object only, not the background. All area/length measures are in pixels. All grey levels are encoded in 8 bits (0=black, 255=white). With columns:
- objid: unique object identifier in EcoTaxa (integer number).
- area: surface area of the object (integer number)
- mean: average grey value within the object; sum of the grey values of all pixels in the object divided by the number of pixels
- stddev: standard deviation of the grey value used to generate the mean grey value
- mode: modal grey value within the object
- min: minimum grey value within the object (0 = black)
- max: maximum grey value within the object (255 = white)
- perim: the length of the outside boundary of the object
- width: width of the smallest rectangle enclosing the object
- height: height of the smallest rectangle enclosing the object
- major: primary axis of the best fitting ellipse for the object
- minor: secondary axis of the best fitting ellipse for the object
- angle: angle between the primary axis and a line parallel to the x-axis of the image
- circ: circularity = (4 * Pi * Area) / Perim^2; a value of 1 indicates a perfect circle, a value approaching 0 indicates an increasingly elongated polygon
- feret: maximum feret diameter, i.e., the longest distance between any two points along the object boundary
- intden: integrated density. This is the sum of the grey values of the pixels in the object (i.e. = Area*Mean)
- median: median grey value within the object
- skew: skewness of the histogram of grey level values
- kurt: kurtosis of the histogram of grey level values
- %area: percentage of the object’s surface area that is comprised of holes, defined as the background grey level
- area_exc: surface area of the object excluding holes, in square pixels (= Area*(1-(%area/100)))
- fractal: fractal dimension of object boundary (Berube and Jebrak 1999)
- skelarea: surface area of skeleton in pixels. In a binary image, skeleton is obtained by repeatedly removing pixels from the edges of objects until they are reduced to the width of a single pixel
- slope: slope of the grey level normalized cumulative histogram
- histcum1, histcum2, histcum3: grey level value at the first, second and third quartile of the normalized cumulative histogram of grey levels
- nb1, nb2, nb3: number of remaining objects in the image after thresholding at the grey levels histcum1, histcum2 and histcum3
- symetrieh: bilateral horizontal symmetry index
- symetriev: bilateral vertical symmetry index
- symetriehc: symmetry of the largest remaining object in relation to the horizontal axis after thresholding at the grey level Histcum1 value
- symetrievc: symmetry of the largest remaining object in relation to the vertical axis after thresholding at the grey level Histcum1 value
- convperim: the perimeter of the smallest polygon within which all points in the object fit
- convarea: the area of the smallest polygon within which all points in the object fit
- fcons: measure of contrast based on the texture feature descriptor (Amadasun and King, 1989)
- thickr: thickness ratio: relation between the maximum thickness of an object and the average thickness of the object excluding the maximum
- elongation: major / minor
- range: grey max - grey min
- meanpos: (mean-max) / (mean-min)
- cv: 100*(stddev/mean)
- sr: 100*(stddev/(max-min))
- perimareaexc: perim/(sqrt(area_exc))
- feretareaexc: feret/(sqrt(area_exc))
- perimferet: perim/feret
- perimmajor: perim/major
- circex: (4*PI*area_exc)/(pow(perim,2))
- kurt_mean: mean kurtosis of the histogram of grey level values
- skew_mean: mean skewness of the histogram of grey level values
- convperim_perim: convperim/perim
- convarea_area: convarea/area
- symetrieh_area: symetrieh/area
- symetriev_area: symetriev/area
- nb1_area, nb2_area, nb3_area: nb1/area, nb2/area, nb3/area
- nb1_range, nb2_range, nb3_range: nb1/range, nb2/range, nb3/range
- median_mean: median/grey mean
- median_mean_range: (median-grey mean)/range
- skeleton_area: skelarea/area
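Several of the columns above are simple ratios of the base measurements. The sketch below recomputes a few of them (circ, elongation, range, cv, sr, perimferet, perimmajor) from a dictionary of base features, following the formulas given in the list, with circ taken as 4*Pi*Area divided by the squared perimeter. The function name derived_features and the input values are hypothetical; this is not the code that produced the table, and recomputed values may differ slightly from the stored ones because of rounding.

```python
import math


def derived_features(f):
    """Compute a few derived columns from the base measurements in dict `f`."""
    grey_range = f["max"] - f["min"]
    return {
        "circ": 4 * math.pi * f["area"] / f["perim"] ** 2,
        "elongation": f["major"] / f["minor"],
        "range": grey_range,
        "cv": 100 * f["stddev"] / f["mean"],
        "sr": 100 * f["stddev"] / grey_range,
        "perimferet": f["perim"] / f["feret"],
        "perimmajor": f["perim"] / f["major"],
    }


# Made-up measurements for an ideal disc of radius 10 pixels.
disc = {
    "area": math.pi * 10 ** 2,
    "perim": 2 * math.pi * 10,
    "major": 20.0,
    "minor": 20.0,
    "feret": 20.0,
    "mean": 100.0,
    "stddev": 20.0,
    "min": 50,
    "max": 150,
}
disc_features = derived_features(disc)
print(disc_features)
```

For an ideal disc the circularity is 1 and the elongation is 1, which gives a quick sanity check of the formulas. Objects with a uniform grey level (max equal to min) would need a guard against division by zero in sr.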
features_skimage.csv.gz
Table of morphological features recomputed with skimage.measure.regionprops on the ROIs produced by UVP6 firmware. See http://scikit-image.org/docs/dev/api/skimage.measure.html#skimage.measure.regionprops for documentation.
inventory.tsv
Tree view of the taxonomy and number of images in each taxon, displayed as text. With columns:
- lineage_level1: taxonomic lineage corresponding to the level 1 classification
- taxon_level1: name of the taxon corresponding to the level 1 classification
- n: number of objects in each class
map.png
Map of the sampling locations, to give an idea of the diversity sampled in this dataset.
imgs
Directory containing the image of each object, named according to the object identifier (objid) and sorted in subdirectories according to their taxon.
Second folder (UVPEC_data) contains:
imgs
Directory containing the image of each object on a black background, stored in the format required by the embedded classification algorithm of the UVPEC package (https://github.com/ecotaxa/uvpec); i.e. each image is stored as “objid.jpg” in a folder corresponding to its taxon (20 different classes), named “taxon_name__taxon_id”.
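The folder naming convention can be decoded with a few lines of Python. The sketch below splits a relative image path of the form taxon_name__taxon_id/objid.jpg into its three parts. The function name parse_image_path and the example path (taxon name and identifiers) are made up for illustration, and the identifiers are kept as strings because their exact type is not specified here.

```python
from pathlib import PurePosixPath


def parse_image_path(path):
    """Split '<taxon_name>__<taxon_id>/<objid>.jpg' into its three parts."""
    p = PurePosixPath(path)
    # Split on the last double underscore, in case the taxon name contains one.
    taxon_name, sep, taxon_id = p.parent.name.rpartition("__")
    if not sep:
        raise ValueError(f"no '__' separator in folder name: {p.parent.name!r}")
    return taxon_name, taxon_id, p.stem


# Hypothetical path, for illustration only.
parsed = parse_image_path("imgs/Copepoda__12345/678.jpg")
print(parsed)
```

Splitting on the last double underscore is a design choice that tolerates taxon names containing underscores; a folder name without the separator raises an error rather than being silently misread.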
Disciplines
Biological oceanography
Keywords
plankton, image, UVP6
Location
89.92N, -57.99S, 147.98E, -149.96W
Devices
Picheral M, Catalano C, Brousseau D, Claustre H, Coppola L, Leymarie E, Coindat J, Dias F, Fevre S, Guidi L, Irisson J-O, Legendre L, Lombard F, Mortier L, Penkerch C, Rogge A, Schmechtig C, Thibault S, Tixier T, Waite A, Stemmann L (2022) The Underwater Vision Profiler 6: an imaging sensor of particle size spectra and plankton, for autonomous and cabled platforms. Limnology and Oceanography: Methods 20:115–129.