niimpy.preprocessing.survey module

niimpy.preprocessing.survey.daily_affect_variability(questions, subject=None)

Returns two DataFrames corresponding to the daily affect variability and the mean daily affect, both measures defined in the OLO paper (doi:10.1371/journal.pone.0110907). In brief, the mean daily affect is the mean of each of the 7 questions (e.g. sad, cheerful, tired), answered on a Likert scale from 0 to 7. Conversely, the daily affect variability is the standard deviation of each of the 7 questions.

NOTE: This function aggregates data by day.

Parameters

questions: DataFrame with subject data (or a database, for backwards compatibility)
subject: string, optional (for backwards compatibility only; in the future, filter the data by subject before calling this function).

Returns

DLA_mean: mean of the daily affect
DLA_std: standard deviation of the daily affect
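The two measures can be illustrated with a minimal pandas sketch. It is not niimpy's implementation: it assumes a long-format DataFrame with a DatetimeIndex and hypothetical columns user, id (question name) and answer (numeric Likert score), and computes, per user and calendar day, the mean and standard deviation of each question's answers. The function name daily_affect_sketch is hypothetical.

```python
import pandas as pd


def daily_affect_sketch(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Per-user, per-day mean and standard deviation of each question.

    Assumes a DatetimeIndex and hypothetical columns 'user', 'id'
    (question name, e.g. 'sad') and 'answer' (numeric Likert score).
    """
    # Calendar day of each response, used as the aggregation unit.
    day = df.index.floor("D").rename("day")
    grouped = df.groupby(["user", day, "id"])["answer"]
    # Unstack so that each question becomes its own column.
    dla_mean = grouped.mean().unstack("id")
    dla_std = grouped.std().unstack("id")  # sample s.d. (ddof=1)
    return dla_mean, dla_std
```

Because the sample standard deviation is used, a question answered only once on a given day yields NaN variability for that day, and a question not answered at all on a day appears as NaN in both outputs.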

niimpy.preprocessing.survey.survey_convert_to_numerical_answer(df, answer_col, question_id, id_map, use_prefix=False)

Convert text answers into numerical values (assuming a long-format DataFrame), using answer-mapping dictionaries provided by the user. If a prefix mapping is provided, multiple questions sharing the same prefix (e.g., PSS10_1, PSS10_2, …, PSS10_9) can be converted at once. Answers that have not been specified for conversion keep their original values.

Parameters

df: pandas DataFrame: DataFrame containing the questions.
answer_col: str: Name of the column containing the answers.
question_id: str: Name of the column containing the question IDs.
id_map: dictionary: Dictionary mapping each question_id (key) to its answer mapping (value), or, if the use_prefix option is used, a dictionary containing a map for each question ID prefix.
use_prefix: boolean: If False (the default), the given map (id_map) is used to convert each question. If True, id_map is treated as a question ID prefix map, so that multiple question IDs sharing the same prefix can be converted at the same time.

Returns

result: pandas Series: Series containing the converted values, and the original values for answers that are not supposed to be converted.
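The conversion can be sketched in pandas as below. This is an illustration under assumptions, not niimpy's code: the function name convert_answers_sketch is hypothetical, the index of df is assumed to be unique, and prefix matching is assumed to be a plain str.startswith test on the question ID.

```python
import pandas as pd


def convert_answers_sketch(df, answer_col, question_id, id_map, use_prefix=False):
    """Map text answers to numbers; answers absent from the map are kept.

    id_map maps a question ID (or, with use_prefix=True, an ID prefix such
    as 'PSS10') to a {text answer: numeric value} dictionary.
    """
    result = df[answer_col].copy()
    for key, answer_map in id_map.items():
        if use_prefix:
            # Every question whose ID starts with the prefix, e.g. PSS10_1, PSS10_2, ...
            rows = df[question_id].str.startswith(key)
        else:
            rows = df[question_id] == key
        # dict.get(a, a) falls back to the original answer when it has no mapping.
        result.loc[rows] = df.loc[rows, answer_col].map(
            lambda a: answer_map.get(a, a)
        )
    return result
```

Working on a copy of the answer column leaves the input DataFrame unchanged, and the dict.get fallback is what keeps unmapped answers at their original values.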

niimpy.preprocessing.survey.survey_print_statistic(df, question_id_col='id', answer_col='answer', prefix=None, group=None)

Return survey statistics. Assuming that the question IDs are stored in question_id_col and the survey answers in answer_col, this function returns the relevant statistics for each question: the minimum, maximum, average and standard deviation of the scores.

Parameters

df: pandas.DataFrame: Input DataFrame.
question_id_col: string: Column containing the question IDs.
answer_col: string: Column containing the answers as numerical values.
prefix: list, optional: List of survey prefixes. If None is given, search question_id_col for all possible categories.
group: string, optional: Column containing a grouping factor. If this is given, survey statistics for each group will be returned.

Returns

dict: dictionary: A dictionary containing a summary of each questionnaire category. Example: {'PHQ9': {'min': 3, 'max': 8, 'avg': 4.5, 'std': 2}}
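A compact pandas sketch of these statistics follows. It is one reading of the description rather than niimpy's implementation: it assumes the survey prefix is the part of the question ID before the first underscore, computes the statistics over individual answer values (the description does not say whether answers are first summed per respondent), and omits the group option. The name survey_statistic_sketch is hypothetical.

```python
import pandas as pd


def survey_statistic_sketch(df, question_id_col="id", answer_col="answer", prefix=None):
    """Return {prefix: {'min', 'max', 'avg', 'std'}} for each survey prefix."""
    # Assumption: the prefix is the text before the first '_' (PHQ9_1 -> PHQ9).
    prefixes = df[question_id_col].str.split("_").str[0]
    if prefix is None:
        prefix = sorted(prefixes.unique())
    summary = {}
    for p in prefix:
        scores = df.loc[prefixes == p, answer_col]
        summary[p] = {
            "min": scores.min(),
            "max": scores.max(),
            "avg": scores.mean(),
            "std": scores.std(),  # sample standard deviation (ddof=1)
        }
    return summary
```

Passing an explicit prefix list restricts the summary to those surveys; when it is None, every prefix found in question_id_col gets an entry.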

niimpy.preprocessing.survey.survey_sum_scores(df, survey_prefix=None, answer_col='answer', id_column='id')

Sum the answers to all questions sharing a prefix (like PHQ9_*) to get a survey score.

Parameters

df: pandas DataFrame: The DataFrame should have a DateTime index, an answer_column with numeric scores, and an id_column with question IDs like “PHQ9_1”, “PHQ9_2”, etc. The given survey_prefix is the “PHQ9” part (without the underscore), which selects the right questions (rows not matching this prefix won’t be included).
survey_prefix: string: The survey prefix in the ‘id’ column, e.g. ‘PHQ9’. An underscore (‘_’) is appended to it before matching.
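As a rough illustration, the summing step might look like the pandas sketch below. It is not niimpy's implementation; in particular, it assumes that all answers from one survey submission share the same timestamp in the DateTime index, so scores are summed per timestamp. The function name sum_scores_sketch is hypothetical.

```python
import pandas as pd


def sum_scores_sketch(df, survey_prefix, answer_col="answer", id_column="id"):
    """Total score per timestamp for the questions named '<survey_prefix>_*'."""
    # Appending '_' keeps 'PHQ9' from also matching IDs such as 'PHQ90_1'.
    rows = df[id_column].str.startswith(survey_prefix + "_")
    selected = df.loc[rows, answer_col]
    # Assumption: one survey submission corresponds to one timestamp in the index.
    return selected.groupby(level=0).sum()
```

If several respondents can submit at the same instant, a user column would also have to be part of the grouping, otherwise their answers would be added together.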