Downloads, validates, and harmonizes the 2016-2020 legacy zooplankton dataset from SharePoint. Taxa are matched against the WoRMS database and the output is written in both Parquet and CSV formats to the hot storage bucket.
Value
Invisible NULL. Outputs are uploaded to SharePoint as McZoo_16-20.parquet and McZoo_16-20.csv.
Details
The function performs the following steps:
1. Downloads sample ID metadata (ids_16_20.csv) and biological data (zoo_16_20.csv) from the legacy_data SharePoint bucket.
2. Integrates manually curated unmatched taxa from unmatched_worms_16_20.xlsx to correct known synonyms before WoRMS validation.
3. Queries WoRMS via worrms::wm_records_taxamatch() for every unique taxon; unmatched taxa are flagged with match_type = "no_match".
4. Selects a single AphiaID per taxon (lowest, i.e. oldest classification).
5. Joins biological observations with WoRMS-validated taxonomy and sample metadata.
6. Standardises life-stage codes and converts abundance to numeric.
7. Aggregates duplicate records by summing Abundance.
8. Uploads tidy data to the hot storage bucket in Parquet and CSV formats.
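
The AphiaID selection, join, numeric conversion, and aggregation steps can be illustrated with a minimal base R sketch on toy data. This is not the function's actual implementation: the data frames, the column names other than AphiaID, match_type, and Abundance, and all values are assumptions made for illustration. The WoRMS query is replaced by a hand-written table so the example runs offline, and unmatched taxa are flagged during the join here rather than during the query.

```r
# Toy stand-in for WoRMS results (assumed layout): "Taxon A" has two
# candidate AphiaIDs, "Taxon B" has one, "Taxon C" was not matched at all.
worms <- data.frame(
  taxon      = c("Taxon A", "Taxon A", "Taxon B"),
  AphiaID    = c(200L, 100L, 300L),
  match_type = c("exact", "exact", "exact"),
  stringsAsFactors = FALSE
)

# Keep a single AphiaID per taxon: sort so the lowest ID comes first,
# then drop every later row for the same taxon.
worms <- worms[order(worms$taxon, worms$AphiaID), ]
worms_one <- worms[!duplicated(worms$taxon), ]

# Toy biological observations (assumed layout); Abundance arrives as text
# and sample S1 contains a duplicated "Taxon A" adult record.
obs <- data.frame(
  sample_id = c("S1", "S1", "S1", "S2"),
  taxon     = c("Taxon A", "Taxon A", "Taxon B", "Taxon C"),
  stage     = c("adult", "adult", "juvenile", "adult"),
  Abundance = c("10", "5", "2.5", "7"),
  stringsAsFactors = FALSE
)

# Left join observations to the validated taxonomy; taxa without a
# WoRMS row get match_type = "no_match" instead of NA.
joined <- merge(obs, worms_one, by = "taxon", all.x = TRUE)
joined$match_type[is.na(joined$match_type)] <- "no_match"

# Convert abundance to numeric before summing.
joined$Abundance <- as.numeric(joined$Abundance)

# Aggregate duplicate records by summing Abundance within each group.
# AphiaID is left out of the grouping because the formula interface of
# aggregate() drops rows with NA in any grouping variable.
tidy <- aggregate(
  Abundance ~ sample_id + taxon + stage + match_type,
  data = joined,
  FUN  = sum
)

print(tidy)
```

Sorting before calling duplicated() is what makes the "lowest AphiaID wins" rule explicit; without the sort, the first row returned for each taxon would be kept regardless of its ID.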