bdf.ingest
- bdf.ingest(source: str | Path | list[str | Path], *, out_dir: str | Path | None = None, format: str = 'parquet', layout: str = 'flat', battery_metadata: str = 'embedded', recursive: bool = True, validate_existing: bool = True, validate_converted: bool = True, include_optional: bool = True, plugin: str | None = None, incremental: bool = True, force: bool = False, raise_on_error: bool = False, discover_collections: bool = False, refresh: bool = False, cache_dir: str | Path | None = None, data_dir: str | Path | None = 'timeseries', raw_dir: str | Path | None = 'timeseries/raw', cell_metadata_dir: str | Path | None = 'batteries', doi_enrich: bool = True, doi_timeout: int = 15, human: bool = False)
Convert raw vendor files to BDF and validate existing BDF artifacts.
source: file, directory, URL, or list of sources
format: “parquet” (default) or “csv”
layout: “flat” (default) or “nested”
  flat: convert into out_dir/source and emit one collection metadata file
  nested: convert into data/ under out_dir/source, emit root dataset metadata, and emit per-cell folders, each holding a metadata.jsonld that describes only the battery
battery_metadata: “embedded” (default) or “separate” for flat layout
out_dir: optional output root for converted files (defaults to source_dir)
data_dir: output subdir for converted files (relative to out_dir)
raw_dir: input subdir for raw files (relative to source_dir)
cell_metadata_dir: base dir for per-cell metadata folders (relative to out_dir)
validate_existing: validate files that already look like BDF
validate_converted: validate after conversion
plugin: force a specific plugin id for raw files
incremental: skip previously processed files when unchanged
force: reprocess even if a file looks unchanged
discover_collections: if True, ingest each folder containing contribution.json (or collection.json)
refresh: if True, refresh cached remote sources
cache_dir: directory used for cached remote sources
doi_enrich: if True, enrich missing dataset metadata from DOI (DataCite, then Crossref)
doi_timeout: per-request timeout (seconds) for DOI lookups
human: if True, serialize with human prefLabels; default writes skos:notation labels
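The parameters above can be combined as in the following minimal sketch. The paths `raw_cycler_exports` and `bdf_out` and the helper `run_ingest` are hypothetical and exist only for illustration; the keyword names come from the signature documented here. The import is deferred so the snippet loads even where the package is not installed.

```python
from pathlib import Path

# Hypothetical paths for illustration; replace with real locations.
SOURCE = Path("raw_cycler_exports")
OUT_DIR = Path("bdf_out")

# Keyword arguments use only names from the documented signature.
INGEST_KWARGS = {
    "out_dir": OUT_DIR,
    "format": "parquet",             # or "csv"
    "layout": "flat",                # or "nested"
    "battery_metadata": "embedded",  # or "separate"
    "incremental": True,             # skip unchanged, previously processed files
    "raise_on_error": False,
    "doi_enrich": False,             # do not enrich metadata from DOI
}


def run_ingest(source=SOURCE, **overrides):
    """Hypothetical wrapper: call bdf.ingest and return its summary dict."""
    import bdf  # deferred so this module loads without bdf installed

    return bdf.ingest(source, **{**INGEST_KWARGS, **overrides})
```

Calling `run_ingest(force=True)` would override a single option while keeping the rest of the configuration in one place.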
Returns a summary dict with converted/validated/failed entries. When source is a list, the summary includes “sources”; when discover_collections is True, the summary includes “roots”. Metadata generation uses contribution.json/person.json, and nested layout requires battery.json.
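A returned summary could be inspected along these lines. This is a sketch under a stated assumption: the converted, validated, and failed keys are taken to map to sized collections (lists here), which this page does not specify. The helper `count_outcomes` and the sample dict are hypothetical stand-ins, not real bdf.ingest output.

```python
OUTCOME_KEYS = ("converted", "validated", "failed")


def count_outcomes(summary):
    """Count entries under each outcome key; missing keys count as zero.

    Assumption: each key maps to a sized collection such as a list.
    """
    return {key: len(summary.get(key) or ()) for key in OUTCOME_KEYS}


# Stand-in summary for illustration only.
example_summary = {
    "converted": ["cell_001.parquet", "cell_002.parquet"],
    "validated": ["cell_001.parquet"],
    "failed": [],
}

counts = count_outcomes(example_summary)
print(counts)
```

With raise_on_error left at its default of False, checking the failed count after each run is one way to notice problems that did not raise.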