niimpy.preprocessing.util module

niimpy.preprocessing.util.aggregate(df, freq, method_numerical='mean', method_categorical='first', groups=['user'], **resample_kwargs)[source]

Group and resample the data. Numerical and categorical columns are resampled separately, each with its own method.

Parameters
df : pandas DataFrame

Dataframe to resample

freq : string

Frequency at which to resample the data. The dataframe must have a datetime-like index.

method_numerical : str

Resampling method for numerical columns. Possible values: ‘sum’, ‘mean’, ‘median’. Default value is ‘mean’.

method_categorical : str

Resampling method for categorical columns. Possible values: ‘first’, ‘mode’, ‘last’. Default value is ‘first’.

groups : list

Columns used for the groupby operation. Default value is [‘user’].

resample_kwargs : dict

Keyword arguments passed on to the pandas resampling function.

Returns
An aggregated and resampled multi-index dataframe.
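As a rough illustration of the behaviour described above, the following sketch reproduces the idea in plain pandas: rows are grouped by the groups columns and by time bin, numerical columns are aggregated with one method and categorical columns with another. It is not niimpy's implementation. The helper name aggregate_sketch and the sample columns steps and activity are made up for the example, and only aggregation names that pandas accepts as strings (so not ‘mode’) are handled.

```python
import pandas as pd


def aggregate_sketch(df, freq, method_numerical="mean",
                     method_categorical="first", groups=("user",)):
    """Hypothetical stand-in: aggregate numerical and categorical columns separately."""
    groups = list(groups)
    value_cols = [c for c in df.columns if c not in groups]
    num_cols = [c for c in value_cols if pd.api.types.is_numeric_dtype(df[c])]
    cat_cols = [c for c in value_cols if c not in num_cols]

    parts = []
    for cols, method in ((num_cols, method_numerical), (cat_cols, method_categorical)):
        if cols:
            # Group by the given columns and by time bins of the datetime index.
            keys = groups + [pd.Grouper(freq=freq)]
            parts.append(df.groupby(keys)[cols].agg(method))
    # Both parts share the same (group, time) multi-index, so they align column-wise.
    return pd.concat(parts, axis=1)


# Made-up sample data with a datetime index.
idx = pd.to_datetime([
    "2024-01-01 08:00", "2024-01-01 10:00",
    "2024-01-01 20:00", "2024-01-02 09:00",
])
df = pd.DataFrame(
    {
        "user": ["a", "b", "a", "a"],
        "steps": [10.0, 7.0, 30.0, 5.0],
        "activity": ["walk", "sit", "run", "rest"],
    },
    index=idx,
)

out = aggregate_sketch(df, "D")
print(out)
```

The result is indexed by user and by day. Unlike a true resample, this sketch does not create rows for time bins in which a user has no data.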

niimpy.preprocessing.util.date_range(df, start, end)[source]

Extract a certain date range from a DataFrame.

The index must contain the dates, and it must be sorted.
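The requirement of a sorted date index matches how label-based slicing works in pandas. The sketch below shows that kind of slicing on made-up data. It is only an assumption that date_range behaves like this; in particular, the source does not say whether the end point is included, whereas the pandas slice shown here includes both end points.

```python
import pandas as pd

# Made-up data: ten daily rows with a sorted DatetimeIndex.
idx = pd.date_range("2024-01-01", periods=10, freq="D")
df = pd.DataFrame({"value": range(10)}, index=idx)

# Label-based slicing is only reliable on a sorted (monotonic) index.
is_sorted = df.index.is_monotonic_increasing

start = pd.Timestamp("2024-01-03")
end = pd.Timestamp("2024-01-05")

# .loc slicing on a DatetimeIndex keeps both end points.
subset = df.loc[start:end]
print(subset)
```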

niimpy.preprocessing.util.df_normalize(df, tz=None, old_tz=None)[source]

Normalize a dataframe (loaded from SQL) before presenting it to the user.

This sets the dataframe index to the time values and converts the times to pandas.Timestamp objects. Modifies the dataframe in place.
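A minimal sketch of this kind of normalization, written in plain pandas, might look as follows. The function name normalize_sketch, the column name time, and the assumption that the column holds Unix timestamps in seconds are all illustrative guesses, not details confirmed by the source.

```python
import pandas as pd


def normalize_sketch(df, tz="Europe/Helsinki", time_column="time"):
    """Hypothetical stand-in: index a dataframe by timezone-aware timestamps, in place."""
    # Assumption: the column stores Unix timestamps in seconds.
    times = pd.to_datetime(df[time_column], unit="s", utc=True)
    # Assigning to df.index changes the caller's dataframe rather than a copy.
    df.index = pd.DatetimeIndex(times).tz_convert(tz)
    df.index.name = None
    return df


raw = pd.DataFrame({"time": [1704067200, 1704070800], "value": [1, 2]})
normalize_sketch(raw)
print(raw)
```

Because the index is assigned on the object that was passed in, the caller's dataframe changes without needing the return value, which mirrors the in-place behaviour described above.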

niimpy.preprocessing.util.install_extensions()[source]

Automatically install sqlite extension functions.

Only works on Linux for now; improvements are welcome.

niimpy.preprocessing.util.occurrence(series, bin_width=720, grouping_width=3600)[source]

Number of 12-minute

This reproduces the logic of the “occurrence” database function, without needing the database.

Input: pandas.Series of pandas.Timestamps

Output: pandas.DataFrame with timestamp index and ‘occurance’ column.

TODO: use the grouping_width option.
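The summary line above is truncated, so the exact definition is not stated here. One plausible reading of the defaults (bin_width=720 seconds, grouping_width=3600 seconds) is "the number of distinct 12-minute bins per hour that contain at least one timestamp". The sketch below implements that reading only; it is an assumption, not the library's code, and the name occurrence_sketch is made up. The column name is spelled as in the documented output.

```python
import pandas as pd


def occurrence_sketch(series, bin_width=720, grouping_width=3600):
    """Hypothetical stand-in: count distinct bins with data in each grouping window."""
    if grouping_width % bin_width != 0:
        raise ValueError("bin_width must evenly divide grouping_width")
    times = pd.Series(pd.DatetimeIndex(series))
    # Floor every timestamp to the start of its bin and of its grouping window.
    bins = times.dt.floor(f"{bin_width}s")
    windows = bins.dt.floor(f"{grouping_width}s")
    # Count how many different bins occur inside each window.
    counts = bins.groupby(windows).nunique()
    return pd.DataFrame({"occurance": counts})


stamps = pd.Series(pd.to_datetime([
    "2024-01-01 10:01", "2024-01-01 10:05",  # same 12-minute bin
    "2024-01-01 10:13", "2024-01-01 10:59",  # two further bins
    "2024-01-01 11:30",
]))
result = occurrence_sketch(stamps)
print(result)
```

Windows without any data do not appear in the output of this sketch.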

niimpy.preprocessing.util.set_tz(tz)[source]

Globally set the preferred local timezone.

niimpy.preprocessing.util.tmp_timezone(new_tz)[source]

Temporarily override the global timezone for a block.

This is used as a context manager:

with tmp_timezone('Europe/Berlin'):
    ...

Note: this overrides the global timezone. In the future, there will be a way to handle timezones as non-global variables, which should be preferred.
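For readers unfamiliar with the pattern, a context manager that temporarily overrides a module-level setting and restores it afterwards can be sketched as below. This is a generic illustration using contextlib; the names TZ, set_tz_sketch, current_tz and tmp_timezone_sketch are hypothetical and are not taken from niimpy's source.

```python
from contextlib import contextmanager

TZ = None  # hypothetical module-level setting


def set_tz_sketch(tz):
    """Set the global timezone."""
    global TZ
    TZ = tz


def current_tz():
    """Return the timezone currently in effect."""
    return TZ


@contextmanager
def tmp_timezone_sketch(new_tz):
    """Override the global timezone inside a with-block, then restore it."""
    global TZ
    old_tz = TZ
    TZ = new_tz
    try:
        yield
    finally:
        # Restore the previous value even if the block raised an exception.
        TZ = old_tz


set_tz_sketch("Europe/Helsinki")
with tmp_timezone_sketch("Europe/Berlin"):
    inside = current_tz()
after = current_tz()
print(inside, after)
```

Since the override is still a global, code running in other threads during the block would see the temporary value as well, which is one reason non-global handling is preferable.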

niimpy.preprocessing.util.to_datetime(value)[source]

niimpy.preprocessing.util.uninstall_extensions()[source]

Uninstall any installed extensions.