bdf.clean
- bdf.clean(df: DataFrame, *, time_fix: str = 'segment', outlier: str = 'none', z_thresh: float = 8.0, columns: List[str] | None = None, time_eps: float | str = 'auto', outlier_detect: str = 'hybrid', local_seconds: float | None = 600.0, local_z: float = 6.0, z_huber: float = 6.0, hampel_seconds: float | None = 300.0, hampel_k: float = 6.0, slope_gate: bool = True, slope_z: float = 8.0) -> Tuple[DataFrame, CleanReport]
Clean a BDF-normalized DataFrame.
- time_fix:
'segment' -> place non-monotonic blocks between neighbors (keeps rows; default)
'sort' -> stable sort by time; drop duplicate timestamps
'drop' -> drop rows where time decreases beyond 'time_eps'
'none' -> leave time as-is
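As a rough illustration of what the 'sort' and 'drop' options describe, the sketch below reproduces them with plain pandas. It is not the library's implementation: the column name `t`, the helper names `fix_sort` and `fix_drop`, and the use of a running maximum to decide when time "decreases" are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical frame; the column names "t" and "v" are assumptions.
df = pd.DataFrame({
    "t": [0.0, 1.0, 2.0, 1.5, 3.0, 3.0, 4.0],
    "v": [3.70, 3.71, 3.72, 3.90, 3.73, 3.74, 3.75],
})

def fix_sort(frame, time_col="t"):
    """Stable sort by time, then keep the first row of each duplicate timestamp."""
    out = frame.sort_values(time_col, kind="stable")
    return out.drop_duplicates(subset=time_col, keep="first").reset_index(drop=True)

def fix_drop(frame, time_col="t", eps=0.0):
    """Drop rows whose time falls more than eps below the running maximum."""
    running_max = frame[time_col].cummax()
    keep = frame[time_col] >= running_max - eps
    return frame[keep].reset_index(drop=True)

sorted_df = fix_sort(df)    # out-of-order row moved into place, duplicate t=3.0 removed
dropped_df = fix_drop(df)   # out-of-order row (t=1.5) removed, duplicates kept
```

Note the trade-off the two options imply: sorting keeps the out-of-order sample but loses one of the duplicated timestamps, while dropping does the opposite.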
- outlier (action on flagged rows/values):
'drop' -> drop any row where selected columns are flagged as outliers
'clip' -> winsorize flagged values back to robust bounds
'interp' -> replace flagged values with NaN and linearly interpolate
'none' -> no outlier cleaning (default)
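The three actions can be pictured on a single column once a boolean outlier mask exists. The following is a minimal sketch in plain pandas, not the library's code: the mask `flagged` and the bounds `lo` and `hi` are made-up inputs standing in for whatever the detection step produces.

```python
import pandas as pd

s = pd.Series([3.70, 3.71, 9.99, 3.73, 3.74])
flagged = pd.Series([False, False, True, False, False])  # hypothetical detector output
lo, hi = 3.60, 3.80                                      # hypothetical robust bounds

# 'drop': remove flagged rows entirely.
dropped = s[~flagged]

# 'clip': pull flagged values back to the bounds, leave the rest untouched.
clipped = s.where(~flagged, s.clip(lower=lo, upper=hi))

# 'interp': blank flagged values, then fill them by linear interpolation.
interped = s.mask(flagged).interpolate(method="linear")
```

Here `dropped` has four rows, `clipped` replaces 9.99 with the upper bound, and `interped` replaces it with the midpoint of its two neighbors.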
- outlier_detect (how to flag):
'mad' -> global MAD z-score only
'huber' -> global Huber z-score only (SciPy; falls back to MAD if unavailable)
'hybrid' -> BOTH global MAD and Huber must flag (reduces false positives; default)
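For readers unfamiliar with the MAD z-score, here is a minimal NumPy sketch of the general technique. It assumes the conventional 1.4826 scaling factor (which makes the MAD comparable to a standard deviation for normal data) and reuses the default `z_thresh` of 8.0 from the signature; whether the library scales or guards against a zero MAD in exactly this way is not stated in the source. The helper name `mad_zscore` is hypothetical.

```python
import numpy as np

def mad_zscore(x):
    """Robust z-score: distance from the median in units of the scaled MAD."""
    x = np.asarray(x, dtype=float)
    med = np.nanmedian(x)
    mad = np.nanmedian(np.abs(x - med))
    if mad == 0:
        # Degenerate case (over half the samples identical); assumption: flag nothing.
        return np.zeros_like(x)
    return (x - med) / (1.4826 * mad)

values = np.array([3.70, 3.71, 3.72, 3.73, 3.74, 9.99])
z = mad_zscore(values)
flags = np.abs(z) > 8.0   # same threshold as the z_thresh default
```

Under 'hybrid', a point would be kept as an outlier only if a second mask from the Huber-based score also flags it, that is, the element-wise AND of the two masks.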
- local_seconds / hampel_seconds / slope_gate:
Neighborhood & derivative gates to catch single-sample spikes and suppress false positives on slow drifts. Combined as: (GLOBAL AND (LOCAL OR HAMPEL)) OR SLOPE.
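To make the gate combination concrete, the sketch below implements a simple time-windowed Hampel check and then applies the boolean rule quoted above. This is an illustration under assumptions, not the library's code: the centered window, the 1.4826 scaling, the helper name `hampel_flags`, and the hand-written `global_flag`, `local_flag`, and `slope_flag` masks are all made up for the example.

```python
import numpy as np

def hampel_flags(t, x, window_seconds=300.0, k=6.0):
    """Flag samples that sit more than k scaled MADs from the median of
    their neighbors within a centered time window (O(n^2) for clarity)."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    flags = np.zeros(x.size, dtype=bool)
    half = window_seconds / 2.0
    for i in range(x.size):
        near = np.abs(t - t[i]) <= half
        med = np.median(x[near])
        mad = np.median(np.abs(x[near] - med))
        # With mad == 0 any nonzero deviation is flagged; a real
        # implementation would likely add a floor here.
        flags[i] = np.abs(x[i] - med) > k * 1.4826 * mad
    return flags

# One sample per minute with a single-sample spike at index 5.
t = np.arange(10) * 60.0
x = np.array([3.70, 3.71, 3.70, 3.72, 3.71, 5.00, 3.71, 3.70, 3.72, 3.71])

hampel_flag = hampel_flags(t, x, window_seconds=300.0, k=6.0)

# Hypothetical masks standing in for the other detectors.
global_flag = np.array([False] * 5 + [True] + [False] * 4)
local_flag = np.zeros(10, dtype=bool)
slope_flag = np.zeros(10, dtype=bool)

# (GLOBAL AND (LOCAL OR HAMPEL)) OR SLOPE
final = (global_flag & (local_flag | hampel_flag)) | slope_flag
```

In this example only the spike at index 5 survives: the global mask flags it and the Hampel gate confirms it. A point flagged globally but by neither neighborhood gate would be released, which is how the combination suppresses false positives on slow drifts.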