# Gulf of Naples - 40 Years of Zooplankton Biodiversity Assessment
ZooGoN standardizes taxonomic names in Mediterranean zooplankton datasets spanning four decades (1984-2024) from the LTER-MareChiara station in the Gulf of Naples. This R package is part of the DTO-BioFlow project (Digital Twin Ocean - Biodiversity Flow Integration) under the EU Horizon Mission “Restore our Ocean & Waters by 2030”.
## Project Context
This package processes the most comprehensive long-term zooplankton dataset from the Western Mediterranean Sea, including:
- 📊 1,506 zooplankton samples (1984-2024)
- 🦐 148 copepod species + 61 other taxa
- 🌍 Integration with European Digital Twin of the Ocean
## Key Features
- Cloud Storage Integration: Microsoft SharePoint connectivity for collaborative data management
- Survey Data Ingestion: KoboToolbox integration for automated field survey collection
- Darwin Core Conversion: Transform legacy zooplankton data to OBIS-compliant format
- Automated Workflows: Infrastructure for continuous biodiversity monitoring
- FAIR Data Principles: Ensures Findable, Accessible, Interoperable, Reusable data
- EMODnet Biology Ready: Standards-compliant data for European marine biodiversity infrastructure
## Installation

You can install the development version of ZooGoN from GitHub with:

```r
# install.packages("pak")
pak::pak("ioledc/ZOOGoN-40Y")
```

## Usage
### Automated Pipeline
ZooGoN provides a fully automated data pipeline that processes zooplankton data from field collection to Darwin Core Archive. Each step reads its input from SharePoint and uploads results back automatically:
```r
library(ZooGoN)

# 1. Ingest field surveys from KoboToolbox
ingest_surveys()

# 2. Preprocess and standardize survey data
preprocess_surveys()

# 3. Merge legacy + ongoing data into an analysis-ready dataset
format_to_tidy()

# 4. Convert tidy data to Darwin Core format (Event, Occurrence, eMoF)
format_to_dc()

# 5. Build Darwin Core Archive with EML metadata and upload
format_to_DC_archive()
```

The pipeline produces OBIS-compliant Darwin Core tables:

- **Event Extension**: Sampling event metadata with geographic coordinates
- **Occurrence Extension**: Species occurrences with WoRMS LSIDs
- **eMoF Extension**: Measurements with BODC NERC Vocabulary standards
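For orientation, a minimal record in the Occurrence table might look like the sketch below. The column names are standard Darwin Core terms; the values (and the exact column set emitted by the package) are illustrative assumptions, not the package's documented output schema:

```r
# Illustrative Darwin Core Occurrence record. Real records are produced
# by format_to_dc(); IDs, basisOfRecord, and values here are made up.
occurrence <- data.frame(
  occurrenceID     = "MC_1984-01-10_occ_001",          # hypothetical ID scheme
  eventID          = "MC_1984-01-10",                   # links to the Event table
  scientificName   = "Temora stylifera",
  scientificNameID = "urn:lsid:marinespecies.org:taxname:<AphiaID>",  # WoRMS LSID
  occurrenceStatus = "present",
  basisOfRecord    = "HumanObservation"
)
```

The `scientificNameID` is where the WoRMS LSID mentioned above goes; replace `<AphiaID>` with the taxon's numeric WoRMS identifier.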
All steps are also run automatically via GitHub Actions on a scheduled basis.
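The scheduled automation could be wired up roughly as below. This is a sketch only: the workflow file name, cron expression, secret names, and step layout are assumptions, not copied from the repository:

```yaml
# .github/workflows/pipeline.yml (illustrative sketch, not the actual workflow)
name: zooplankton-pipeline
on:
  schedule:
    - cron: "0 3 * * 1"   # hypothetical schedule: Mondays 03:00 UTC
  workflow_dispatch: {}    # allow manual runs

jobs:
  run-pipeline:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: r-lib/actions/setup-r@v2
      - name: Install package
        run: Rscript -e 'install.packages("pak"); pak::pak("ioledc/ZOOGoN-40Y")'
      - name: Run pipeline steps
        env:
          # hypothetical secret names for SharePoint/KoboToolbox credentials
          SHAREPOINT_TOKEN: ${{ secrets.SHAREPOINT_TOKEN }}
          KOBO_TOKEN: ${{ secrets.KOBO_TOKEN }}
        run: |
          Rscript -e 'library(ZooGoN); ingest_surveys(); preprocess_surveys(); format_to_tidy(); format_to_dc(); format_to_DC_archive()'
```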
### Cloud Storage Operations
Upload and download data to/from Microsoft SharePoint:
```r
# Upload data with automatic versioning
upload_sharepoint_df(
  data = my_data,
  prefix = "processed",
  options = config$storage$sharepoint,
  format = "parquet"
)

# Download latest version
data <- download_sharepoint_file(
  prefix = "processed",
  options = config$storage$sharepoint,
  format = "parquet"
)
```

### Survey Data Ingestion
Retrieve field survey data from KoboToolbox:
```r
# Ingest surveys from KoboToolbox and upload to SharePoint
ingest_surveys()

# Or retrieve data directly
survey_data <- get_kobo_data(
  assetid = "your_asset_id",
  uname = "username",
  pwd = "password"
)
```

## Dataset Overview
The LTER-MareChiara zooplankton dataset represents one of the longest continuous time series in the Mediterranean Sea:
| Period | Frequency | Samples | Net Type | Fixation |
|---|---|---|---|---|
| 1984-1990 | Biweekly | 156 | Indian Ocean (200μm, 113cm) | Formaldehyde 2-4% |
| 1991-1994 | Interruption | - | - | - |
| 1995-2015 | Weekly | 1,092 | Indian Ocean (200μm, 113cm) | Formaldehyde 2-4% |
| 2016-2024 | Weekly | 258 | WP2 (200μm, 70cm) | Ethanol 96% |
**Total:** 1,506 samples • 148 copepod species • 61 other taxa
## Data Standards & Compliance
ZooGoN ensures compatibility with international biodiversity data standards:
- 🗂️ Darwin Core Archive: International standard for biodiversity data
- 🌐 WoRMS Integration: World Register of Marine Species taxonomic validation
- 📊 BODC NERC Vocabulary: Standardized measurement terminology
- ⚡ EMODnet Biology: European marine biodiversity data infrastructure
- 🏷️ ISO19115 Metadata: International metadata standards
- 📄 FAIR Principles: Findable, Accessible, Interoperable, Reusable data
## Publishing to GBIF (optional)
- **Build a Darwin Core Archive and EML** with `format_to_DC_archive()` (writes the zip and uploads it to SharePoint). This reads the Darwin Core tables produced by `format_to_dc()`.
- **Production registration**: use `register_gbif_dataset()` with your real GBIF organization key, installation key, credentials, and a public DwC-A URL.
- **Test registration**: use `register_gbif_dataset_test()` with the GBIF-Test demo credentials (ws_client_demo/Demo123) and your public DwC-A URL to exercise the flow safely.
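The test-registration call might look like the sketch below. Only the demo credentials and the need for a public DwC-A URL come from the bullets above; the argument names are assumptions, so check the function's help page for the real signature:

```r
# Hypothetical call; see ?register_gbif_dataset_test for the actual
# argument names in your installed version of ZooGoN
register_gbif_dataset_test(
  username    = "ws_client_demo",                       # GBIF-Test demo user
  password    = "Demo123",                              # GBIF-Test demo password
  archive_url = "https://example.org/path/to/dwca.zip"  # your public DwC-A URL
)
```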
## Contributing
This package is part of the DTO-BioFlow project timeline (2025-2026):
- May 2025: Project initiation, Paris workshop
- August 2025: First interim report, EMODnet training completion
- December 2025: Second interim report
- April 2026: Final deliverables and EMODnet Biology publication
## Citation

```r
citation("ZooGoN")
```

## Funding
This work is supported by the DTO-BioFlow project (HORIZON-MISS-2022-OCEAN-01-07) under the EU Mission “Restore our Ocean & Waters by 2030”.
## Contact
- **Principal Investigator**: Iole Di Capua (iole.dicapua@szn.it)
  - Institution: Stazione Zoologica Anton Dohrn, Naples, Italy
  - ORCID: 0000-0003-2959-8977
- **Principal Analyst**: Lorenzo Longobardi (lorenzo.longobardi@gmail.com)
  - Institution: WorldFish
  - ORCID: 0000-0003-3126-7341