niimpy.preprocessing.survey module

niimpy.preprocessing.survey.clean_survey_column_names(df)[source]

This function takes a pandas DataFrame as input and cleans the column names by removing or replacing specified characters. It helps to ensure standardized and clean column names for further analysis or processing.

Parameters
df: pandas DataFrame

The input DataFrame with column names to be cleaned.

Returns
df: pandas.DataFrame

The DataFrame with cleaned column names.
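The exact set of characters that clean_survey_column_names removes or replaces is not listed here. As a rough illustration of the idea only, the sketch below uses a hypothetical helper (clean_column_names_sketch) with assumed rules: punctuation is dropped, and runs of whitespace, hyphens, or slashes become a single underscore. It is not niimpy's implementation.

```python
import re

import pandas as pd


def clean_column_names_sketch(df, to_remove=r"[()\[\]?,.:;]", to_underscore=r"[\s\-/]+"):
    """Hypothetical stand-in: the character rules here are assumptions."""
    def clean(name):
        # Drop punctuation first, then collapse separators into one underscore.
        name = re.sub(to_remove, "", str(name)).strip()
        return re.sub(to_underscore, "_", name)

    # rename() returns a new DataFrame, so the input is left untouched.
    return df.rename(columns=clean)


raw = pd.DataFrame({
    "Little interest (PHQ-2)?": [1, 0],
    "Feeling down / depressed": [2, 1],
})
cleaned = clean_column_names_sketch(raw)
print(list(cleaned.columns))
```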

niimpy.preprocessing.survey.convert_survey_to_numerical_answer(df, id_map, use_prefix=False)[source]

Convert text answers into numerical values (assuming a long dataframe). Uses the answer-mapping dictionaries provided by the user to convert the answers. Can convert multiple questions that share a prefix (e.g., PSS10_1, PSS10_2, …, PSS10_9) if a prefix mapping is provided. Answers that have not been specified for conversion are returned with their original values.

Parameters
df: pandas DataFrame

Dataframe containing the questions

answer_col: str

Name of the column containing the answers

question_id: str

Name of the column containing the question id.

id_map: dictionary

Dictionary containing answer mappings (value) for each question_id (key), or a dictionary containing a map for each question id prefix if use_prefix option is used.

use_prefix: boolean

If False, uses the given map (id_map) to convert questions. If True, uses a question id prefix map, so that multiple question ids sharing the same prefix can be converted at the same time. The default is False.

Returns
result: pandas Series

Series containing the converted values, plus the original values for answers that are not supposed to be converted.
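The conversion rule described above can be sketched in plain pandas. This is an illustration under stated assumptions, not the library's code: convert_answers_sketch is a hypothetical helper, the column names "id" and "answer" are assumed, the prefix is taken to be everything before the last underscore, and the answer labels in the map are made up for the example.

```python
import pandas as pd


def convert_answers_sketch(df, id_map, answer_col="answer", question_id="id",
                           use_prefix=False):
    """Hypothetical stand-in that mirrors the documented behaviour."""
    def convert(row):
        qid = row[question_id]
        # With use_prefix, "PSS10_1" is looked up under "PSS10" (assumed rule).
        key = qid.rsplit("_", 1)[0] if use_prefix else qid
        mapping = id_map.get(key, {})
        # Answers without a mapping keep their original value.
        return mapping.get(row[answer_col], row[answer_col])

    return df.apply(convert, axis=1)


long_df = pd.DataFrame({
    "id": ["PSS10_1", "PSS10_2", "GAD2_1"],
    "answer": ["never", "very often", "several days"],
})
pss_map = {"PSS10": {"never": 0, "very often": 4}}
converted = convert_answers_sketch(long_df, pss_map, use_prefix=True)
print(converted.tolist())
```

The GAD2_1 row has no entry in the map, so its text answer passes through unchanged, which matches the documented fallback.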

niimpy.preprocessing.survey.extract_features_survey(df, features=None)[source]

Calculate survey features.

Parameters
df: pd.DataFrame

Dataframe of survey data. Must follow the Niimpy format. In addition, each survey question must be in a single column, and the column name must be formatted as survey-id_question-number (for example PHQ9_3).

features: map (dictionary) of functions that compute features

A map of maps: the keys of the outer map are the names of the functions that compute features, and each nested map contains the keyword arguments for that function. If a function takes no arguments, use an empty map. Default is None, in which case all available functions are used; those functions are listed in the dict survey.ALL_FEATURES. You can implement your own function and use it instead, or add it to that map.

Returns
features: pd.DataFrame

Dataframe of computed features, where the index is the users and the columns are the features.
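To make the "map of maps" shape concrete, here is a minimal sketch of how such a features argument could be built and dispatched. Everything in it is an assumption for illustration: mean_item_score and max_item_score are hypothetical feature functions (they are not claimed to be in survey.ALL_FEATURES), the outer keys are taken to be the callables themselves, and extract_features_sketch is a stand-in rather than the library's function.

```python
import pandas as pd


def mean_item_score(df, prefix=""):
    """Hypothetical feature: per-user mean over all items with the prefix."""
    cols = [c for c in df.columns if c.startswith(prefix)]
    return df.groupby("user")[cols].mean().mean(axis=1).rename("mean_item_score")


def max_item_score(df, prefix=""):
    """Hypothetical feature: per-user maximum over all items with the prefix."""
    cols = [c for c in df.columns if c.startswith(prefix)]
    return df.groupby("user")[cols].max().max(axis=1).rename("max_item_score")


def extract_features_sketch(df, features):
    # Call each function with its own keyword arguments, then join the
    # per-user results side by side.
    computed = [func(df, **kwargs) for func, kwargs in features.items()]
    return pd.concat(computed, axis=1)


wide = pd.DataFrame({
    "user": ["a", "a", "b"],
    "PHQ9_1": [1, 3, 2],
    "PHQ9_2": [2, 0, 2],
})
features = {
    mean_item_score: {"prefix": "PHQ9"},
    max_item_score: {"prefix": "PHQ9"},
}
result = extract_features_sketch(wide, features)
print(result)
```

The result is indexed by user with one column per feature, which is the output layout the Returns entry describes.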

niimpy.preprocessing.survey.group_data(df)[source]

Group the dataframe by a standard set of columns listed in group_by_columns.

niimpy.preprocessing.survey.reset_groups(df)[source]

Group the dataframe by a standard set of columns listed in group_by_columns.

niimpy.preprocessing.survey.sum_survey_scores(df, survey_prefix=None)[source]

Sum all columns (like PHQ9_*) to get a survey score.

Parameters

df: pandas DataFrame

DataFrame should have a DateTime index, an answer_column with numeric scores, and an id_column with question IDs like “PHQ9_1”, “PHQ9_2”, etc. The given survey_prefix is the “PHQ9” (no underscore) part, which selects the right questions (rows not matching this prefix won’t be included).

survey_prefix: string

The survey prefix in the ‘id’ column, e.g. ‘PHQ9’. An ‘_’ is appended.
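The selection rule (prefix plus an appended underscore) can be sketched as follows. The helper sum_scores_sketch, the column names "id" and "answer", and the choice to sum per timestamp are all assumptions made for this illustration; the return value of the actual function is not documented here.

```python
import pandas as pd


def sum_scores_sketch(df, survey_prefix, answer_col="answer", id_col="id"):
    """Hypothetical stand-in: sum matching answers per timestamp."""
    # An underscore is appended so that "PHQ9" does not match e.g. "PHQ90_1".
    mask = df[id_col].str.startswith(survey_prefix + "_")
    selected = df.loc[mask]
    # Assumed grouping: one score per distinct index timestamp.
    return selected.groupby(level=0)[answer_col].sum()


idx = pd.to_datetime(["2020-01-01 09:00"] * 3 + ["2020-01-08 09:00"] * 2)
answers = pd.DataFrame(
    {
        "id": ["PHQ9_1", "PHQ9_2", "GAD7_1", "PHQ9_1", "PHQ9_2"],
        "answer": [1, 2, 3, 0, 1],
    },
    index=idx,
)
scores = sum_scores_sketch(answers, "PHQ9")
print(scores)
```

The GAD7_1 row is excluded because its id does not start with "PHQ9_".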

niimpy.preprocessing.survey.survey_statistic(df, config)[source]

Return statistics for a single survey question or a list of questions. Assuming that each of the columns contains numerical values representing answers, this function returns the mean, maximum, minimum and standard deviation for each question in separate columns.

Parameters
df: pandas.DataFrame

Input data frame

config: dict

Dictionary containing optional arguments for the computation of survey statistics

Configuration options include:
columns: string or list(string), optional

A list of columns to process. If empty, the prefix will be used to identify columns

prefix: string or list(string)

Required unless columns is given. The function will process columns whose names start with the prefix (QID_0, QID_1, …)

Returns
dict: pandas.DataFrame

A dataframe containing summaries of each questionnaire.
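As a sketch of how the columns and prefix options interact, the hypothetical helper below (survey_statistic_sketch) resolves the columns from the config and computes the four statistics with plain pandas. The output layout, with statistics as rows and questions as columns, is an assumption and may differ from what the library returns.

```python
import pandas as pd


def survey_statistic_sketch(df, config):
    """Hypothetical stand-in: mean, max, min and std per selected column."""
    columns = config.get("columns")
    if not columns:
        # Fall back to the prefix, which may be one string or a list.
        prefixes = config["prefix"]
        if isinstance(prefixes, str):
            prefixes = [prefixes]
        columns = [c for c in df.columns
                   if any(c.startswith(p) for p in prefixes)]
    elif isinstance(columns, str):
        columns = [columns]
    return df[columns].agg(["mean", "max", "min", "std"])


survey = pd.DataFrame({
    "QID_0": [1, 2, 3],
    "QID_1": [2, 2, 2],
    "other": [9, 9, 9],
})
stats = survey_statistic_sketch(survey, {"prefix": "QID"})
print(stats)
```

Passing {"columns": "QID_0"} instead would restrict the summary to that single question and ignore the prefix.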