Downloads, validates, and harmonizes the 1984-2015 legacy zooplankton dataset from SharePoint. Taxa are matched against the WoRMS database and the output is written in both Parquet and CSV formats to the hot storage bucket.
Value
Invisible NULL. Outputs are uploaded to the hot storage bucket as
McZoo_84-15.parquet and McZoo_84-15.csv.
Details
The function performs the following steps:
1. Downloads sample ID metadata (ids_84_15.csv) and biological data (zoo_84_15.csv) from the legacy_data SharePoint bucket.
2. Integrates manually curated unmatched taxa from unmatched_worms_84_15.xlsx to correct known synonyms before WoRMS validation.
3. Queries WoRMS via worrms::wm_records_taxamatch() for every unique taxon; unmatched taxa are flagged with match_type = "no_match".
4. Selects a single AphiaID per taxon (lowest, i.e. oldest classification).
5. Joins biological observations with WoRMS-validated taxonomy and sample metadata, pivoting date columns to long format.
6. Standardises life-stage codes.
7. Aggregates duplicate records by summing Abundance.
8. Uploads tidy data to the hot storage bucket in Parquet and CSV formats.
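
The AphiaID selection and duplicate aggregation steps can be illustrated with a minimal base R sketch. It is not the function's actual implementation: the toy data, the AphiaID values, and the column names sample_id, taxon and life_stage are assumptions made for illustration. Only AphiaID and Abundance are names taken from the description above.

```r
# Toy WoRMS-style matches: "Taxon A" has two candidate AphiaIDs.
# All values here are invented for illustration.
matches <- data.frame(
  taxon   = c("Taxon A", "Taxon A", "Taxon B"),
  AphiaID = c(200L, 100L, 300L),
  stringsAsFactors = FALSE
)

# Keep one row per taxon: sort so the lowest AphiaID comes first,
# then drop every later row for the same taxon.
matches <- matches[order(matches$taxon, matches$AphiaID), ]
one_id  <- matches[!duplicated(matches$taxon), ]
rownames(one_id) <- NULL

# Toy observations; the first two rows duplicate each other.
# sample_id, taxon and life_stage are hypothetical column names.
obs <- data.frame(
  sample_id  = c("S1", "S1", "S1", "S2"),
  taxon      = c("Taxon A", "Taxon A", "Taxon B", "Taxon A"),
  life_stage = c("adult", "adult", "adult", "adult"),
  Abundance  = c(2, 3, 1, 4),
  stringsAsFactors = FALSE
)

# Attach the selected AphiaID to each observation.
joined <- merge(obs, one_id, by = "taxon")

# Collapse duplicate records by summing Abundance within each
# sample / taxon / life-stage combination.
tidy <- aggregate(
  Abundance ~ sample_id + taxon + AphiaID + life_stage,
  data = joined,
  FUN  = sum
)
```

Sorting before calling duplicated() keeps the selection deterministic, and aggregate() returns one row per unique combination of the grouping columns.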