niimpy.preprocessing.util module
- niimpy.preprocessing.util.aggregate(df, freq, method_numerical='mean', method_categorical='first', groups=['user'], **resample_kwargs)
Group and resample the data. Numerical and categorical columns are resampled separately, each with its own method.
- Parameters
- df : pandas DataFrame
Dataframe to resample
- freq : str
Frequency at which to resample the data. Requires the dataframe to have a datetime-like index.
- method_numerical : str
Resampling method for numerical columns. Possible values: ‘sum’, ‘mean’, ‘median’. Default value is ‘mean’.
- method_categorical : str
Resampling method for categorical columns. Possible values: ‘first’, ‘mode’, ‘last’. Default value is ‘first’.
- groups : list
Columns used for the groupby operation. Default value is ['user'].
- resample_kwargs : dict
Keyword arguments passed on to the pandas resampling function.
- Returns
- An aggregated and resampled multi-index dataframe.
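The separate handling of numerical and categorical columns can be sketched in plain pandas. This is a minimal illustration under stated assumptions, not niimpy's implementation: `aggregate_sketch`, the sample columns `steps` and `activity`, and the fixed choice of `mean` and `first` are all hypothetical.

```python
import pandas as pd

# Hypothetical sample data: datetime index, a grouping column ("user"),
# one numerical column and one categorical column.
index = pd.to_datetime([
    "2024-01-01 00:10", "2024-01-01 00:40",
    "2024-01-01 01:20", "2024-01-01 00:30",
])
df = pd.DataFrame(
    {
        "user": ["a", "a", "a", "b"],
        "steps": [10.0, 20.0, 30.0, 5.0],
        "activity": ["walk", "run", "sit", "walk"],
    },
    index=index,
)


def aggregate_sketch(df, freq, groups=("user",)):
    """Illustrative only: resample numerical columns with the mean and
    categorical columns with the first value, per group."""
    groups = list(groups)
    value_cols = df.columns.difference(groups)
    numerical = df[value_cols].select_dtypes(include="number").columns
    categorical = value_cols.difference(numerical)
    # Resample each column type with its own method, then join the results.
    num = df.groupby(groups)[list(numerical)].resample(freq).mean()
    cat = df.groupby(groups)[list(categorical)].resample(freq).first()
    return pd.concat([num, cat], axis=1)


# One row per (user, 60-minute bin); the index has two levels.
result = aggregate_sketch(df, "60min")
print(result)
```

The result is indexed by the grouping column and the resampled timestamps, which matches the multi-index return value described above.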
- niimpy.preprocessing.util.date_range(df, start, end)
Extract a certain date range from a DataFrame.
The index must hold the dates, and it must be sorted.
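The requirement for a sorted date index can be illustrated with plain pandas label slicing. This is a sketch of the general technique only; the body of `date_range` is not shown here, so whether it includes the end point is an assumption that should be checked against the source.

```python
import pandas as pd

# Hypothetical data with a sorted DatetimeIndex, one row per day.
index = pd.date_range("2024-01-01", periods=5, freq="D")
df = pd.DataFrame({"value": range(5)}, index=index)

# Label-based slicing on a sorted DatetimeIndex; in plain pandas both
# end points are included.
subset = df.loc["2024-01-02":"2024-01-04"]
print(subset)
```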
- niimpy.preprocessing.util.df_normalize(df, tz=None, old_tz=None)
Normalize a df (from sql) before presenting it to the user.
This sets the dataframe index to the time values and converts the times to pandas.Timestamp objects. Modifies the dataframe in place.
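The kind of normalization described above can be sketched in plain pandas. The column name `time`, the Unix-seconds unit, and the helper `normalize_sketch` are assumptions made for illustration; they are not taken from the function's source.

```python
import pandas as pd

# Hypothetical raw rows as they might come out of a database:
# "time" is assumed to hold Unix timestamps in seconds.
raw = pd.DataFrame({"time": [1704067200, 1704070800], "value": [1, 2]})


def normalize_sketch(df, tz="Europe/Berlin"):
    """Illustrative only: convert the assumed 'time' column to
    timezone-aware timestamps and use them as the index, in place."""
    times = pd.to_datetime(df["time"], unit="s", utc=True).dt.tz_convert(tz)
    df.index = pd.DatetimeIndex(times)
    return df


normalize_sketch(raw)
print(raw.index)
```

Because the index is assigned on the object that was passed in, the caller's dataframe changes, which mirrors the in-place behaviour noted above.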
- niimpy.preprocessing.util.install_extensions()
Automatically install sqlite extension functions.
Only works on Linux for now, improvements welcome.
- niimpy.preprocessing.util.occurrence(series, bin_width=720, grouping_width=3600)
Number of 12-minute
This reproduces the logic of the “occurrence” database function, without needing the database.
Input: pandas.Series of pandas.Timestamps
Output: pandas.DataFrame with timestamp index and ‘occurance’ column.
TODO: use the grouping_width option.
- niimpy.preprocessing.util.tmp_timezone(new_tz)
Temporarily override the global timezone for a block.
This is used as a context manager:
with tmp_timezone('Europe/Berlin'): ....
Note: this overrides the global timezone. In the future, there will be a way to handle timezones as non-global variables, which should be preferred.
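The pattern of overriding a global setting for the duration of a `with` block can be sketched with the standard library. The module-level variable `TZ` and the name `tmp_timezone_sketch` are hypothetical stand-ins; this is not niimpy's code.

```python
import contextlib

# Hypothetical module-level default, standing in for the global
# timezone setting that the context manager overrides.
TZ = "Europe/Helsinki"


@contextlib.contextmanager
def tmp_timezone_sketch(new_tz):
    """Illustrative only: swap the global for the duration of the block."""
    global TZ
    old_tz = TZ
    TZ = new_tz
    try:
        yield
    finally:
        # Restore the previous value even if the block raises.
        TZ = old_tz


seen = []
with tmp_timezone_sketch("Europe/Berlin"):
    seen.append(TZ)  # value inside the block
seen.append(TZ)      # value after the block has exited
print(seen)
```

Because the override is global, any code called inside the block sees the temporary value, which is the side effect the note above warns about.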