niimpy.preprocessing.survey module

niimpy.preprocessing.survey.clean_survey_column_names(df)

This function takes a pandas DataFrame as input and cleans the column names by removing or replacing specified characters. It helps to ensure standardized and clean column names for further analysis or processing.

Parameters

df: pandas.DataFrame: The input DataFrame whose column names are to be cleaned.

Returns

df: pandas.DataFrame: The DataFrame with cleaned column names.
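As a rough illustration of the kind of cleaning described above, the sketch below replaces characters in column names using plain pandas. The replacement table `REPLACEMENTS` and the helper name `clean_column_names_sketch` are assumptions made for this example; the characters that niimpy actually removes or replaces are not listed in this section.

```python
import pandas as pd

# Hypothetical replacement table; the real function's character list may differ.
REPLACEMENTS = {" ": "_", "?": "", ",": "", ".": ""}


def clean_column_names_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the listed characters replaced in column names."""

    def clean(name: str) -> str:
        for old, new in REPLACEMENTS.items():
            name = name.replace(old, new)
        return name

    # rename() returns a new DataFrame and leaves the input untouched.
    return df.rename(columns=clean)


raw = pd.DataFrame({"Little interest?": [1], "Feeling down, depressed": [2]})
cleaned = clean_column_names_sketch(raw)
print(list(cleaned.columns))
```

Returning a renamed copy keeps the original DataFrame available for comparison.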

niimpy.preprocessing.survey.convert_survey_to_numerical_answer(df, id_map, use_prefix=False)

Convert text answers into numerical values (assuming a long dataframe). The conversion uses answer mapping dictionaries provided by the user. Multiple questions sharing the same prefix (e.g., PSS10_1, PSS10_2, …, PSS10_9) can be converted at once if a prefix mapping is provided. Answers that have not been specified for conversion are returned with their original values.

Parameters

df: pandas.DataFrame: Dataframe containing the questions.
answer_col: str: Name of the column containing the answers.
question_id: str: Name of the column containing the question id.
id_map: dict: Dictionary containing answer mappings (value) for each question_id (key), or a dictionary containing a map for each question id prefix if the use_prefix option is used.
use_prefix: bool: If False, uses the given map (id_map) to convert questions. If True, uses the question id prefix map, so that multiple question_ids with the same prefix can be converted at the same time. The default is False.

Returns

result: pandas.Series: Series containing the converted values, and the original values for answers that are not supposed to be converted.
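The lookup rule described above can be sketched for a single row in plain Python. This is an illustration only, not niimpy's implementation: the helper `convert_answer`, the example mapping `PSS_MAP`, and the rule that the prefix is everything before the last underscore are all assumptions.

```python
def convert_answer(question_id, answer, id_map, use_prefix=False):
    """Look up the numeric value for one answer; fall back to the original."""
    # Assumption: the prefix is everything before the last underscore.
    key = question_id.rsplit("_", 1)[0] if use_prefix else question_id
    mapping = id_map.get(key, {})
    # Answers without a mapping are returned unchanged.
    return mapping.get(answer, answer)


# Hypothetical answer mapping shared by all PSS10_* questions.
PSS_MAP = {"never": 0, "sometimes": 2, "very often": 4}

rows = [("PSS10_1", "never"), ("PSS10_2", "very often"), ("PHQ9_1", "several days")]

# One prefix entry covers every PSS10_* question.
by_prefix = [convert_answer(q, a, {"PSS10": PSS_MAP}, use_prefix=True) for q, a in rows]
# Without prefixes, only the exact question id listed in the map is converted.
by_id = [convert_answer(q, a, {"PSS10_1": PSS_MAP}) for q, a in rows]

print(by_prefix)
print(by_id)
```

With a prefix map, a single entry converts both PSS10 questions, while the PHQ9 answer passes through unchanged in both cases.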

niimpy.preprocessing.survey.extract_features_survey(df, features=None)

Calculates survey features

Parameters

df: pd.DataFrame: Dataframe of survey data. Must follow the Niimpy format. In addition, each survey question must be in a single column, and the column name must be formatted as survey-id_question-number (for example PHQ9_3).
features: dict: A map of functions that compute features. It is a map of maps: the keys of the outer map are the names of the functions that compute features, and each nested map contains the keyword arguments for that function. If a function takes no arguments, use an empty map. The default is None, in which case all available functions are used; these are listed in the dict survey.ALL_FEATURES. You can implement your own function and use it instead, or add it to that map.

Returns

features: pd.DataFrame: Dataframe of computed features, where the index is users and the columns are the features.
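To make the "map of maps" shape concrete, here is a small plain-Python sketch of such a mapping and a loop that applies it. Everything in it is hypothetical: the feature functions `answer_mean` and `answer_range`, the dispatcher `run_features`, the use of function objects as keys, and the choice to pass the nested map as a single `config` argument (mirroring the `survey_statistic(df, config)` signature) are assumptions, not niimpy code.

```python
# Hypothetical feature functions taking (data, config), like survey_statistic.
def answer_mean(answers, config):
    values = answers[config["column"]]
    return {config["column"] + "_mean": sum(values) / len(values)}


def answer_range(answers, config):
    values = answers[config["column"]]
    return {config["column"] + "_range": max(values) - min(values)}


def run_features(answers, features):
    """Call every function in the outer map with its nested argument map."""
    results = {}
    for func, config in features.items():
        results.update(func(answers, config))
    return results


# Toy data: one list of numeric answers per question column.
answers = {"PHQ9_1": [0, 2, 1], "PHQ9_2": [3, 3, 0]}

# Outer map: feature function -> nested map of arguments for that function.
features = {
    answer_mean: {"column": "PHQ9_1"},
    answer_range: {"column": "PHQ9_2"},
}

computed = run_features(answers, features)
print(computed)
```

A mapping of this general shape is what the features argument of extract_features_survey expects, with None selecting everything in survey.ALL_FEATURES.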

niimpy.preprocessing.survey.group_data(df)

Group the dataframe by a standard set of columns listed in group_by_columns.

niimpy.preprocessing.survey.reset_groups(df)

Reset the index levels that match the standard set of columns listed in group_by_columns, turning them back into regular columns.

niimpy.preprocessing.survey.sum_survey_scores(df, survey_prefix=None)

Sum all columns (like PHQ9_*) to get a survey score.

Parameters

df: pandas DataFrame: The DataFrame should have a DatetimeIndex, an answer_column with numeric scores, and an id_column with question IDs like “PHQ9_1”, “PHQ9_2”, etc. The given survey_prefix is the “PHQ9” part (no underscore), which selects the right questions; rows not matching this prefix are not included.
survey_prefix: string: The survey prefix in the ‘id’ column, e.g. ‘PHQ9’. An ‘_’ is appended.
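The selection and summing described above can be approximated with plain pandas as follows. This is a sketch under stated assumptions rather than niimpy's implementation: the column names `id` and `answer`, the helper name `sum_scores_sketch`, and the choice to produce one total per timestamp are all assumptions.

```python
import pandas as pd

# Long-format survey data; the column names "id" and "answer" are assumptions.
df = pd.DataFrame(
    {
        "id": ["PHQ9_1", "PHQ9_2", "PSS10_1", "PHQ9_1", "PHQ9_2"],
        "answer": [1, 2, 4, 0, 3],
    },
    index=pd.to_datetime(
        ["2024-01-01", "2024-01-01", "2024-01-01", "2024-01-08", "2024-01-08"]
    ),
)


def sum_scores_sketch(df, survey_prefix):
    """Sum the answers of all questions whose id starts with '<prefix>_'."""
    # Appending "_" keeps e.g. "PHQ9" from also matching a "PHQ90_1" question.
    selected = df[df["id"].str.startswith(survey_prefix + "_")]
    # Assumption: answers sharing a timestamp belong to the same submission.
    return selected.groupby(level=0)["answer"].sum()


scores = sum_scores_sketch(df, "PHQ9")
print(scores)
```

The PSS10 row is ignored because its id does not start with "PHQ9_".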

niimpy.preprocessing.survey.survey_statistic(df, config)

Return statistics for a single survey question or a list of questions. Assuming that each of the columns contains numerical values representing answers, this function returns the mean, maximum, minimum and standard deviation for each question in separate columns.

Parameters

df: pandas.DataFrame: Input data frame.

config: dict: Dictionary containing optional arguments for the computation of survey statistics.

configuration options include:

columns: string or list(string), optional: A list of columns to process. If empty, the prefix will be used to identify columns
prefix: string or list(string): required unless columns is given. The function will process columns whose name starts with the prefix (QID_0, QID_1, …)

Returns

pandas.DataFrame: A dataframe containing summaries of each questionnaire.
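The column selection and the four statistics described above can be sketched with plain pandas. This is an illustration under assumptions, not the library's code: the helper name `statistic_sketch`, the output column names (such as `PHQ9_1_mean`), and the single-row result layout are all hypothetical.

```python
import pandas as pd

# Wide-format data: one column per question, plus an unrelated column.
df = pd.DataFrame(
    {"PHQ9_1": [0, 1, 3], "PHQ9_2": [2, 2, 2], "steps": [100, 200, 300]}
)


def statistic_sketch(df, config):
    """Compute mean, min, max and std for the configured question columns."""
    columns = config.get("columns")
    if isinstance(columns, str):
        columns = [columns]
    if not columns:
        # Fall back to the prefix: keep columns whose name starts with it.
        prefixes = config["prefix"]
        if isinstance(prefixes, str):
            prefixes = [prefixes]
        columns = [c for c in df.columns if c.startswith(tuple(prefixes))]

    summary = {}
    for col in columns:
        summary[f"{col}_mean"] = df[col].mean()
        summary[f"{col}_min"] = df[col].min()
        summary[f"{col}_max"] = df[col].max()
        summary[f"{col}_std"] = df[col].std()
    # One row holding every statistic in its own column.
    return pd.DataFrame([summary])


stats = statistic_sketch(df, {"prefix": "PHQ9"})
print(stats.columns.tolist())
```

Passing `{"columns": ["PHQ9_1"]}` instead would restrict the summary to that single question.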