niimpy.preprocessing.survey module

niimpy.preprocessing.survey.daily_affect_variability(questions, subject=None)

Returns two DataFrames corresponding to the mean daily affect and the daily affect variability, both measures defined in the OLO paper (doi:10.1371/journal.pone.0110907). In brief, the mean daily affect is the mean of each of the 7 questions (e.g. sad, cheerful, tired), answered on a Likert scale from 0 to 7. Conversely, the daily affect variability is the standard deviation of each of the 7 questions.

NOTE: This function aggregates data by day.

Parameters
questions: DataFrame with subject data (or database for backwards compatibility)
subject: string, optional (backwards compatibility only; in the future, filter the data before calling this function).
Returns
DLA_mean: mean of the daily affect
DLA_std: standard deviation of the daily affect
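The per-day aggregation described above can be sketched with a pandas groupby. This is a minimal illustration, not niimpy's implementation: the function name `daily_affect_sketch` is hypothetical, and it assumes a long-format DataFrame with a DatetimeIndex, a `user` column, an `id` column naming the affect question and a numeric `answer` column. Note that pandas' `std` is the sample standard deviation (ddof=1); whether niimpy uses the same convention is not stated here.

```python
import pandas as pd


def daily_affect_sketch(df: pd.DataFrame):
    """Hypothetical sketch: per-day mean and std of each affect question.

    Assumes a long-format DataFrame with a DatetimeIndex, a 'user' column,
    an 'id' column (question name) and a numeric 'answer' column.
    """
    day = df.index.floor("D")  # map every timestamp to its calendar day
    grouped = df.groupby(["user", day, "id"])["answer"]
    # One row per (user, day), one column per question
    mean = grouped.mean().unstack("id")
    std = grouped.std().unstack("id")  # sample std (ddof=1)
    return mean, std
```

Unstacking on the question level gives one column per question, which mirrors the "one value per question" description of DLA_mean and DLA_std.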
niimpy.preprocessing.survey.survey_convert_to_numerical_answer(df, answer_col, question_id, id_map, use_prefix=False)

Convert text answers into numerical values (assuming a long-format DataFrame). The answers are converted using answer-mapping dictionaries provided by the user. Multiple questions sharing the same prefix (e.g., PSS10_1, PSS10_2, …, PSS10_9) can be converted at once if a prefix mapping is provided. Answers for which no conversion has been specified keep their original values.

Parameters
df: pandas DataFrame

Dataframe containing the questions

answer_col: str

Name of the column containing the answers

question_id: str

Name of the column containing the question id.

id_map: dictionary

Dictionary containing answer mappings (value) for each question_id (key), or a dictionary containing a map for each question ID prefix if the use_prefix option is used.

use_prefix: boolean

If False, use the given map (id_map) to convert each question. If True, use the question ID prefix map, so that multiple question IDs sharing the same prefix can be converted at the same time. The default is False.

Returns
result: pandas Series

Series containing the converted values, and the original values for answers that are not supposed to be converted.
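The conversion logic can be sketched as follows. This is a hypothetical illustration under stated assumptions, not the library's code: `convert_answers_sketch` is an invented name, question IDs are assumed to be strings, and prefix matching is assumed to be a plain `str.startswith` test. Unmapped answers fall back to their original value via `mapping.get(a, a)`.

```python
import pandas as pd


def convert_answers_sketch(df, answer_col, question_id, id_map, use_prefix=False):
    """Hypothetical sketch of mapping text answers to numbers.

    id_map maps a question ID (or, with use_prefix=True, an ID prefix)
    to a dict {text answer: numeric value}.
    """
    result = df[answer_col].copy()  # leave the input DataFrame untouched
    for key, mapping in id_map.items():
        if use_prefix:
            rows = df[question_id].str.startswith(key)  # e.g. all PSS10_* rows
        else:
            rows = df[question_id] == key  # exact question ID only
        # Answers missing from the mapping keep their original value
        result.loc[rows] = df.loc[rows, answer_col].map(lambda a: mapping.get(a, a))
    return result


df = pd.DataFrame({"id": ["PSS10_1", "PSS10_2", "PHQ9_1"],
                   "answer": ["never", "often", "not at all"]})
converted = convert_answers_sketch(df, "answer", "id",
                                   {"PSS10": {"never": 0, "often": 3}},
                                   use_prefix=True)
```

Here both PSS10 rows are converted through the single prefix map, while the PHQ9 answer has no mapping and is returned unchanged.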

niimpy.preprocessing.survey.survey_print_statistic(df, question_id_col='id', answer_col='answer', prefix=None, group=None)

Return survey statistics. Assuming that the question IDs are stored in question_id_col and the survey answers in answer_col, this function returns the relevant statistics for each question: the minimum, maximum, average and standard deviation (s.d.) of the scores.

Parameters
df: pandas.DataFrame

Input data frame

question_id_col: string

Column containing the question IDs.

answer_col: string

Column containing the answers as numerical values.

prefix: list, optional

List of survey prefixes. If None, question_id_col is searched for all possible categories.

group: string, optional

Column containing the group factor. If given, survey statistics are returned for each group.

Returns
dict: dictionary

A dictionary containing a summary of each questionnaire category. Example: {'PHQ9': {'min': 3, 'max': 8, 'avg': 4.5, 'std': 2}}
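A dictionary of this shape can be sketched as below. It is a rough illustration, not niimpy's implementation: `survey_statistic_sketch` is a hypothetical name, the `group` option is omitted, IDs are assumed to look like "PHQ9_1" (so the prefix is the part before the first underscore), and the statistics are computed over all answers under each prefix. The real function may aggregate differently, for example over per-subject sum scores.

```python
import pandas as pd


def survey_statistic_sketch(df, question_id_col="id", answer_col="answer", prefix=None):
    """Hypothetical sketch: min/max/avg/std of the answers for each survey prefix."""
    ids = df[question_id_col].astype(str)
    if prefix is None:
        # Assumption: IDs look like "PHQ9_1", so the prefix precedes the first "_"
        prefix = sorted(ids.str.split("_").str[0].unique())
    stats = {}
    for p in prefix:
        scores = df.loc[ids.str.startswith(p + "_"), answer_col]
        stats[p] = {
            "min": scores.min(),
            "max": scores.max(),
            "avg": scores.mean(),
            "std": scores.std(),  # sample std (ddof=1)
        }
    return stats
```

Matching on `p + "_"` rather than the bare prefix avoids, for example, "PHQ9" also matching a hypothetical "PHQ90" questionnaire.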

niimpy.preprocessing.survey.survey_sum_scores(df, survey_prefix=None, answer_col='answer', id_column='id')

Sum the answers to all questions of a survey (like PHQ9_*) to get a survey score.

Parameters

df: pandas DataFrame

The DataFrame should have a DatetimeIndex, an answer_col column with numeric scores, and an id_column column with question IDs like “PHQ9_1”, “PHQ9_2”, etc. The given survey_prefix is the “PHQ9” part (without the underscore), which selects the relevant questions; rows not matching this prefix are not included.

survey_prefix: string

The survey prefix in the id_column (default ‘id’), e.g. ‘PHQ9’. An ‘_’ is appended to it, so that only question IDs starting with ‘PHQ9_’ are selected.
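The selection-and-sum step can be sketched as below. It is a minimal illustration, not the library's implementation: `sum_scores_sketch` is a hypothetical name, and it assumes that all answers belonging to one survey submission share the same timestamp in the DatetimeIndex, so the score is summed per timestamp.

```python
import pandas as pd


def sum_scores_sketch(df, survey_prefix, answer_col="answer", id_column="id"):
    """Hypothetical sketch: sum the answers of one survey per timestamp."""
    # Append "_" so that "PHQ9" selects "PHQ9_1", "PHQ9_2", ... only
    rows = df[id_column].str.startswith(survey_prefix + "_")
    selected = df.loc[rows, answer_col]
    # Assumption: one survey submission = one timestamp in the index
    return selected.groupby(level=0).sum()


idx = pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-01",
                      "2020-01-08", "2020-01-08"])
answers = pd.DataFrame({"id": ["PHQ9_1", "PHQ9_2", "GAD7_1", "PHQ9_1", "PHQ9_2"],
                        "answer": [1, 2, 3, 0, 1]}, index=idx)
phq9 = sum_scores_sketch(answers, "PHQ9")
```

The GAD7 row is excluded by the prefix filter, so the two PHQ9 submissions sum to 3 and 1 respectively.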