otld.utils.pandas_utils.get_header

otld.utils.pandas_utils.get_header(df: DataFrame, column: str | int = None, find: str = None, reset: bool = False, sanitize: bool = False, idx: bool = False, concatenate: bool = False) → int | DataFrame | Series

Find and extract the header row from a data frame.

If only a data frame is provided, finds the first row in which every column has a non-missing value and uses that row as the header. Otherwise, searches the column given by column for the first occurrence of the value given by find and uses that row as the header.
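
The default behavior described above can be illustrated with plain pandas. This is a minimal sketch, not the otld implementation: the helper name first_complete_row and the sample data are hypothetical, and it assumes that "non-missing" means not NaN/None as reported by DataFrame.notna.

```python
import pandas as pd


def first_complete_row(df: pd.DataFrame) -> int:
    """Return the position of the first row with no missing values.

    Hypothetical helper for illustration; not part of otld.
    """
    # True for each row in which every column holds a non-missing value
    complete = df.notna().all(axis=1)
    if not complete.any():
        raise ValueError("no row has a value in every column")
    # argmax on a boolean array gives the position of the first True
    return int(complete.to_numpy().argmax())


# Sample sheet with two junk rows above the real header (made-up data)
raw = pd.DataFrame(
    [
        ["Report 2020", None, None],
        [None, None, None],
        ["id", "name", "score"],
        [1, "a", 0.5],
        [2, "b", 0.7],
    ]
)

pos = first_complete_row(raw)
header = raw.iloc[pos]

# Drop the leading rows and promote the header row to column names
body = raw.iloc[pos + 1:].reset_index(drop=True)
body.columns = header.tolist()
```

Here the first two rows are skipped because each has at least one missing value, and the third row becomes the column names of body.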

Args:

df (pd.DataFrame): A data frame to search within.
column (str | int, optional): A column to search within for value find. Defaults to None.
find (str, optional): A value to search for within column. Defaults to None.
reset (bool, optional): Whether the index should be reset before searching for the header. Defaults to False.
sanitize (bool, optional): Whether to manipulate column headers before searching them. Defaults to False.
idx (bool, optional): Whether to return only the row index of the header rather than the series containing the header. Defaults to False.

Returns:

int | pd.DataFrame | pd.Series: If only a data frame is provided, returns the data frame with the leading rows removed and the header updated. Otherwise, returns either the integer row index of the header (when idx is True) or a series containing the potential column names.
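
The column/find search and its two return shapes (an integer index or a series of candidate names) might look roughly like the following in plain pandas. This is a sketch under assumptions, not the otld implementation: find_header_position is a hypothetical helper, the match is assumed to be exact equality, and the data is made up.

```python
import pandas as pd


def find_header_position(df: pd.DataFrame, column, find: str) -> int:
    """Return the position of the first row where df[column] equals find.

    Hypothetical helper for illustration; assumes exact equality matching.
    """
    matches = df[column].eq(find)
    if not matches.any():
        raise ValueError(f"{find!r} not found in column {column!r}")
    # argmax on a boolean array gives the position of the first True
    return int(matches.to_numpy().argmax())


# Sample sheet whose header row starts with "id" in column 0 (made-up data)
raw = pd.DataFrame(
    [
        ["Report 2020", None, None],
        [None, None, None],
        ["id", "name", "score"],
        [1, "a", 0.5],
    ]
)

# Integer result, comparable to what the docstring describes for idx=True
header_pos = find_header_position(raw, column=0, find="id")

# Series result: the row holding the potential column names
header_row = raw.iloc[header_pos]
```

The integer form is convenient when the caller wants to slice the frame itself, while the series form exposes the candidate names for inspection before they are applied.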