bdf.ingest

bdf.ingest(source: str | Path | list[str | Path], *, out_dir: str | Path | None = None, format: str = 'parquet', layout: str = 'flat', battery_metadata: str = 'embedded', recursive: bool = True, validate_existing: bool = True, validate_converted: bool = True, include_optional: bool = True, plugin: str | None = None, incremental: bool = True, force: bool = False, raise_on_error: bool = False, discover_collections: bool = False, refresh: bool = False, cache_dir: str | Path | None = None, data_dir: str | Path | None = 'timeseries', raw_dir: str | Path | None = 'timeseries/raw', cell_metadata_dir: str | Path | None = 'batteries', doi_enrich: bool = True, doi_timeout: int = 15, human: bool = False)

Convert raw vendor files to BDF and validate existing BDF artifacts.

  • source: file, directory, URL, or list of sources

  • format: “parquet” (default) or “csv”

  • layout: “flat” (default) or “nested”
    • flat: convert into out_dir (or the source directory when out_dir is not given) and emit a single collection metadata file

    • nested: convert into data/ under out_dir (or the source directory), emit root dataset metadata, and emit per-cell folders whose metadata.jsonld describes only the battery

  • battery_metadata: “embedded” (default) or “separate”; applies to the flat layout

  • out_dir: optional output root for converted files (defaults to the source directory)

  • data_dir: output subdirectory for converted files (relative to out_dir)

  • raw_dir: input subdirectory for raw files (relative to the source directory)

  • cell_metadata_dir: base directory for per-cell metadata folders (relative to out_dir)

  • validate_existing: validate files that already look like BDF

  • validate_converted: validate after conversion

  • plugin: force a specific plugin id for raw files

  • incremental: skip files that were processed before and have not changed since

  • force: reprocess even if a file looks unchanged

  • discover_collections: if True, ingest each folder containing contribution.json (or collection.json)

  • refresh, cache_dir: if refresh is True, re-fetch cached remote sources; cache_dir sets where remote sources are cached

  • doi_enrich: if True, enrich missing dataset metadata from DOI (DataCite, then Crossref)

  • doi_timeout: per-request timeout (seconds) for DOI lookups

  • human: if True, serialize with human-readable prefLabels; by default, skos:notation labels are written
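The directory options combine as listed above: out_dir falls back to the source directory, data_dir and cell_metadata_dir are resolved against the output root, and raw_dir is resolved against the source directory. The sketch below spells out that path arithmetic with pathlib. `resolve_dirs` is a hypothetical helper, not part of bdf, and treating a None subdirectory as "use the root itself" is an assumption; how bdf actually handles None for these options is not documented here.

```python
from __future__ import annotations

from pathlib import Path


def resolve_dirs(
    source_dir: str | Path,
    out_dir: str | Path | None = None,
    data_dir: str | Path | None = "timeseries",
    raw_dir: str | Path | None = "timeseries/raw",
    cell_metadata_dir: str | Path | None = "batteries",
) -> dict[str, Path]:
    """Hypothetical helper: resolve bdf-style relative directory options.

    Assumptions: out_dir defaults to source_dir; data_dir and
    cell_metadata_dir are joined onto the output root; raw_dir is joined
    onto source_dir; a None subdirectory means "use the root itself".
    """
    source_root = Path(source_dir)
    out_root = Path(out_dir) if out_dir is not None else source_root

    def join(base: Path, sub: str | Path | None) -> Path:
        # None keeps the base directory unchanged (assumed behavior).
        return base if sub is None else base / sub

    return {
        "out_root": out_root,
        "data": join(out_root, data_dir),
        "raw": join(source_root, raw_dir),
        "cell_metadata": join(out_root, cell_metadata_dir),
    }
```

For example, `resolve_dirs("/data/run1", out_dir="/data/bdf")` places converted files under `/data/bdf/timeseries` while still reading raw files from `/data/run1/timeseries/raw`.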
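Together, incremental and force decide whether a file is processed again. This page does not say how bdf detects that a file is unchanged; the sketch below assumes a SHA-256 content hash compared against a record of previously processed files. `fingerprint`, `should_process`, and the `seen` mapping are all hypothetical names used only for illustration.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def fingerprint(path: Path) -> str:
    """SHA-256 of the file contents, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def should_process(
    path: Path,
    seen: dict[str, str],
    *,
    incremental: bool = True,
    force: bool = False,
) -> bool:
    """Return True if the file should be (re)processed.

    `seen` maps a path to the fingerprint recorded when that file was last
    processed. This is an assumed state store; bdf's own bookkeeping is
    not described here.
    """
    if force or not incremental:
        # force always reprocesses; without incremental mode nothing is skipped.
        return True
    # Process new files and files whose contents changed since the last run.
    return seen.get(str(path)) != fingerprint(path)
```

After a file is processed successfully, the caller would store `fingerprint(path)` in `seen` so the next incremental run can skip it.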
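With doi_enrich enabled, missing dataset metadata is looked up from the DOI, trying DataCite before Crossref, and doi_timeout applies to each request. A minimal standard-library sketch of that fallback order, assuming the public DataCite (`https://api.datacite.org/dois/{doi}`) and Crossref (`https://api.crossref.org/works/{doi}`) REST endpoints; which endpoints bdf actually queries, and how it maps the responses into dataset metadata, is not specified here. `lookup_doi` and `fetch_json` are hypothetical, and the `fetch` parameter exists only so the lookup can be exercised without network access.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request
from collections.abc import Callable

DATACITE_URL = "https://api.datacite.org/dois/{doi}"
CROSSREF_URL = "https://api.crossref.org/works/{doi}"


def fetch_json(url: str, timeout: int) -> dict | None:
    """GET a JSON document; return None on network, HTTP, or decode errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError):
        # URLError/HTTPError and socket timeouts are OSError subclasses;
        # malformed JSON raises JSONDecodeError, a ValueError subclass.
        return None


def lookup_doi(
    doi: str,
    timeout: int = 15,
    fetch: Callable[[str, int], dict | None] = fetch_json,
) -> tuple[str, dict] | None:
    """Try DataCite first, then Crossref; return (provider, record) or None."""
    quoted = urllib.parse.quote(doi, safe="/")
    for provider, template in (("datacite", DATACITE_URL), ("crossref", CROSSREF_URL)):
        # Each request gets its own timeout, mirroring doi_timeout.
        record = fetch(template.format(doi=quoted), timeout)
        if record is not None:
            return provider, record
    return None
```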

Returns a summary dict with converted, validated, and failed entries. When source is a list, the summary also includes “sources”; when discover_collections is True, it also includes “roots”.

Metadata generation uses contribution.json and person.json; the nested layout additionally requires battery.json.
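With discover_collections enabled, each folder containing contribution.json (or collection.json) is ingested as its own root. A minimal sketch of that discovery walk, assuming a root is any folder at any depth, including the starting folder, that directly contains one of those two files. `discover_roots` is hypothetical, and whether bdf also descends into a root that sits inside another root is not stated here.

```python
from __future__ import annotations

from pathlib import Path

# Marker files named in the discover_collections description.
MARKERS = ("contribution.json", "collection.json")


def discover_roots(base: str | Path) -> list[Path]:
    """Return sorted folders under `base` (inclusive) holding a marker file."""
    base = Path(base)
    roots = {
        marker.parent
        for name in MARKERS
        for marker in base.rglob(name)  # rglob also matches files directly in base
        if marker.is_file()
    }
    return sorted(roots)
```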
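Because metadata generation draws on contribution.json and person.json, and the nested layout requires battery.json, it can help to check for these files before ingesting. The sketch below is a hypothetical pre-flight check, not part of bdf. It assumes all three files sit at the top of the source folder, which this page does not confirm.

```python
from __future__ import annotations

from pathlib import Path


def check_metadata_files(root: str | Path, layout: str = "flat") -> list[str]:
    """Hypothetical pre-flight check; an empty list means nothing obviously missing.

    Assumes contribution.json, person.json, and battery.json live directly
    in `root`.
    """
    root = Path(root)
    problems: list[str] = []
    for name in ("contribution.json", "person.json"):
        if not (root / name).is_file():
            problems.append(f"{name} not found; metadata generation uses it")
    # battery.json is only required for the nested layout.
    if layout == "nested" and not (root / "battery.json").is_file():
        problems.append("battery.json not found; nested layout requires it")
    return problems
```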