Name: Segmentation masks of ZooScan images focusing on images with several objects separated by a human operator
Published: 2024-03-29
License: https://creativecommons.org/licenses/by/4.0/
Keywords: image, zooscan, segmentation, semantic, instance, panoptic, vignette

The first step in many image analysis tasks is the segmentation of objects of interest from a full image. This is the case for ZooScan images. The ZooScan is a waterproof flatbed scanner dedicated to the digitization of zooplankton samples, for objects of 300µm and larger. The jar of plankton is poured onto the scanning window, objects are physically separated as well as possible, and the image is acquired. After background subtraction, the full grayscale image is segmented using a simple grey-intensity threshold and each segmented object is measured (area, transparency, etc.). These segments, usually called "vignettes", are then classified taxonomically, often with the help of machine learning based on the measurements. The measurements also allow estimating the size and volume of each object.
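As an illustration of the thresholding step described above, here is a minimal sketch in pure Python. It assumes that objects appear darker than the background; the threshold value and the toy pixel values are hypothetical and are not taken from ZooProcess.

```python
def threshold_mask(image, threshold):
    """Return a binary mask: 1 where a pixel is darker than `threshold`, else 0.

    `image` is a list of rows of grey levels (0 = black, 255 = white).
    Assumption: objects are darker than the background.
    """
    return [[1 if value < threshold else 0 for value in row] for row in image]


def object_area(mask):
    """Area of the segmented object(s), as a number of pixels."""
    return sum(sum(row) for row in mask)


# Toy 3x4 scan: a small dark object on a light background.
toy_scan = [
    [255, 255, 250, 255],
    [255, 120,  90, 255],
    [255, 200, 255, 255],
]
toy_binary = threshold_mask(toy_scan, threshold=240)  # hypothetical threshold
toy_area = object_area(toy_binary)
```

Here the three pixels with grey levels 120, 90 and 200 fall below the threshold, so the object area is 3 pixels.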

Despite the care taken by operators, some of the 1000 to 2000 vignettes typically detected on a single scan frequently contain more than one object, which biases the measurements and the subsequent quantification of plankton concentration and biovolume. To avoid this, operators go back to the initial full frame and digitally separate touching objects by drawing white lines between them. This dataset contains ~14k vignettes with objects separated by white lines and ~5k vignettes of single, correctly detected objects, as well as the binary masks of all of them. It can be used to train deep learning segmentation models, such as semantic, instance or panoptic segmenters. All images were acquired with a ZooScan, from samples taken with a WP2 net in various places around the world during the Tara Oceans cruise.

## Data preprocessing

The full ZooScan image gets its background subtracted by ZooProcess. Contiguous regions are then detected using a connected-component algorithm with 8-connectivity (pixels that touch only along a diagonal are also considered connected). The pre-processed (background-subtracted) scan and the mask resulting from manual separation with white lines are then cropped to the detected regions of interest.
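The region detection and cropping described above can be sketched as follows. This is an illustrative pure-Python flood fill, not the code actually used to build the dataset; the function names and the toy mask are hypothetical.

```python
from collections import deque

# The 8 neighbour offsets: horizontal, vertical and diagonal.
NEIGHBOURS_8 = [(-1, -1), (-1, 0), (-1, 1),
                (0, -1),           (0, 1),
                (1, -1),  (1, 0),  (1, 1)]


def find_region_boxes(mask):
    """Return the bounding box of each 8-connected foreground region.

    `mask` is a list of rows of 0/1 values (1 = object pixel).
    Each box is (top, left, bottom, right), with bottom and right inclusive.
    """
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r0 in range(height):
        for c0 in range(width):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Breadth-first flood fill from this unvisited object pixel.
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            top, left, bottom, right = r0, c0, r0, c0
            while queue:
                r, c = queue.popleft()
                top, bottom = min(top, r), max(bottom, r)
                left, right = min(left, c), max(right, c)
                for dr, dc in NEIGHBOURS_8:
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < height and 0 <= nc < width
                            and mask[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            boxes.append((top, left, bottom, right))
    return boxes


def crop(image, box):
    """Crop a list-of-rows image to an inclusive bounding box."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


# Toy mask: the two top-left pixels touch only along a diagonal,
# so they form a single region under 8-connectivity.
toy_mask = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]
toy_boxes = find_region_boxes(toy_mask)
toy_crop = crop(toy_mask, toy_boxes[0])
```

With 4-connectivity the toy mask would yield three regions; with 8-connectivity it yields two, because the diagonal pair is merged. The same box can be applied to both the scan and the mask so that the two crops stay aligned.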

## Data splitting

The dataset is split into ~70% training, 15% validation and 15% test sets.

## Classes, labels and annotations

All splits are organised the same way: an "images" directory with grayscale PNG images of objects, and a "masks" directory with binary PNG masks of the objects to be detected.

When the binary mask contains only one region, the object is a single plankter.
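A minimal sketch of how the images and masks of one split could be paired is given below. It assumes that the two directories are named `images` and `masks`, and that a mask has the same file name as its image; the file naming is an assumption that should be checked against the downloaded archive. The function name is hypothetical.

```python
from pathlib import Path


def pair_images_and_masks(split_dir):
    """List (image_path, mask_path) pairs for one split directory.

    Assumption: a mask has the same file name as its image.
    Images without a matching mask are skipped.
    """
    split_dir = Path(split_dir)
    pairs = []
    for image_path in sorted((split_dir / "images").glob("*.png")):
        mask_path = split_dir / "masks" / image_path.name
        if mask_path.exists():
            pairs.append((image_path, mask_path))
    return pairs
```

Each pair can then be loaded with any image library to feed a segmentation model.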

## Parameters

The dataset does not contain or allow the computation of any standard variable. It is related to the computation of concentrations (http://vocab.nerc.ac.uk/collection/P01/current/SDBIOL01/) and biovolume (http://vocab.nerc.ac.uk/collection/P01/current/CVOLUKNB/) of plankton.

## Data sources

All images are taken with a ZooScan (http://vocab.nerc.ac.uk/collection/L22/current/TOOL1581/).

## Data quality

The images encompass a range of sizes of the organisms. The minimum area (number of pixels) of an organism is 358; the maximum is 1,1206,650. The pixel size is 0.0106mm. The smaller images can therefore be blurry and pixelated.
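To give a sense of scale, the pixel size above can be used to convert an area in pixels into physical units. The sketch below is a worked example for the smallest object (358 pixels); the equivalent circular diameter is used here only as an illustrative size measure and is not stated in this dataset description.

```python
import math

PIXEL_SIZE_MM = 0.0106  # side length of one pixel, from the text above


def area_px_to_mm2(area_px, pixel_size_mm=PIXEL_SIZE_MM):
    """Convert an area in pixels to square millimetres."""
    return area_px * pixel_size_mm ** 2


def equivalent_diameter_mm(area_px, pixel_size_mm=PIXEL_SIZE_MM):
    """Diameter of the disc that has the same area as the object."""
    return 2.0 * math.sqrt(area_px_to_mm2(area_px, pixel_size_mm) / math.pi)


smallest_area_mm2 = area_px_to_mm2(358)
smallest_diameter_mm = equivalent_diameter_mm(358)
```

For the smallest object this gives an area of about 0.04 mm² and an equivalent diameter of roughly 0.23 mm, that is, only about 20 pixels across.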

## Image resolution

Images range in size from 12 to 6020 px in width and from 6 to 13625 px in height.

## Contact information

For more information on this dataset, please contact Jean-Olivier IRISSON (irisson@normalesup.org)

Disciplines

Biological oceanography

Location

44N, -64S, 159E, 74W

Utilisation

The data is provided as is, with no warranty that all human operations were performed perfectly.

Acknowledgements

We acknowledge the Tara Oceans consortium for the collection of this data during the Tara Oceans cruise. We acknowledge the iMagine project (Horizon Europe 101058625), which funded the final curation and reformatting of this dataset.

Note

The author contributions are as follows: L Jalabert and A Elineau supervised the annotation work and quality procedures; C Desnos, N Llopis, B Serranito acquired the images and performed the initial separation of multiple objects; L Jalabert, H Berrenger, A Bourhis, E Martins, C Merland performed further separation of multiple objects; E Amblard reformatted the dataset for machine learning; J-O Irisson secured funding for the project, supervised its operation and formatted the dataset for publication. Authors are listed in alphabetical order except for the first and last two to represent their supervision role.

Devices

The WP2 net is a 200µm meshed net targeting mesozooplankton.

Fraser JH (1966) Zooplankton sampling. Nature 211(5052):915-6.

The ZooScan is a flatbed scanner used to digitise plankton samples such as those coming from the WP2.

Gorsky G, Ohman MD, Picheral M, Gasparini S, Stemmann L, Romagnan J-B, Cawood A, Pesant S, Garcia-Comas C, Prejger F (2010) Digital zooplankton image analysis using the ZooScan integrated system. J Plankton Res 32:285–303.

Data

File	Size	Format	Processing
Original images and binary masks of individual objects, sometimes separated by human operators.	448 MB	IMAGE	Processed data
