niimpy.preprocessing.survey module
- niimpy.preprocessing.survey.daily_affect_variability(questions, subject=None)
Returns two DataFrames corresponding to the daily affect variability and the mean daily affect, both measures defined in the OLO paper (doi:10.1371/journal.pone.0110907). In brief, the mean daily affect is the mean of each of the 7 questions (e.g. sad, cheerful, tired), each answered on a Likert scale from 0 to 7, while the daily affect variability is the standard deviation of each of the 7 questions.
NOTE: This function aggregates data by day.
- Parameters
- questions: DataFrame with subject data (or database for backwards compatibility)
- subject: string, optional (for backwards compatibility only; in future, filter the data by subject before calling this function).
- Returns
- DLA_mean: mean of the daily affect
- DLA_std: standard deviation of the daily affect
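To illustrate the two measures, here is a minimal pandas sketch. It is not niimpy's implementation. It assumes a long DataFrame for a single subject with a DatetimeIndex, an "id" column naming the question and a numeric "answer" column. The function name daily_affect_sketch and these column names are hypothetical. Answers are grouped by calendar day and question, and the mean and sample standard deviation are taken within each group.

```python
import pandas as pd


def daily_affect_sketch(df, id_col="id", answer_col="answer"):
    """Hypothetical sketch of the two daily measures (not niimpy's code).

    Assumes a long DataFrame for one subject with a DatetimeIndex,
    a question-id column and a numeric answer column.
    """
    # Attach the calendar day as a column so answers can be aggregated by day
    tmp = df.assign(day=df.index.floor("D"))
    grouped = tmp.groupby(["day", id_col])[answer_col]
    # unstack moves the question ids into columns: one row per day
    dla_mean = grouped.mean().unstack(id_col)
    # Sample standard deviation; NaN when a question was answered once that day
    dla_std = grouped.std().unstack(id_col)
    return dla_mean, dla_std


# Usage with made-up answers to two of the affect questions
idx = pd.to_datetime([
    "2020-01-01 08:00", "2020-01-01 20:00", "2020-01-01 08:00",
    "2020-01-01 20:00", "2020-01-02 09:00", "2020-01-02 09:00",
])
answers = pd.DataFrame(
    {"id": ["sad", "sad", "tired", "tired", "sad", "tired"],
     "answer": [1, 3, 4, 6, 2, 5]},
    index=idx,
)
dla_mean, dla_std = daily_affect_sketch(answers)
```

Each output has one row per day and one column per question, which matches the "aggregates data by day" note above.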
- niimpy.preprocessing.survey.survey_convert_to_numerical_answer(df, answer_col, question_id, id_map, use_prefix=False)
Convert text answers into numerical values (assuming a long DataFrame), using answer mapping dictionaries provided by the user. If a prefix mapping is provided, multiple questions sharing the same prefix (e.g., PSS10_1, PSS10_2, …, PSS10_9) can be converted at once. Answers not covered by the mapping are returned with their original values.
- Parameters
- df: pandas DataFrame
DataFrame containing the questions (long format).
- answer_col: str
Name of the column containing the answers.
- question_id: str
Name of the column containing the question ids.
- id_map: dict
Dictionary containing answer mappings (value) for each question_id (key), or a dictionary containing a map for each question id prefix if the use_prefix option is used.
- use_prefix: bool, optional
If False (the default), use the given map (id_map), keyed by full question id. If True, use the question id prefix map, so that multiple question ids sharing the same prefix can be converted at the same time.
- Returns
- result: pandas Series
Series containing the converted values, and the original values for answers that are not supposed to be converted.
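The conversion logic can be sketched as follows. This is a hypothetical illustration, not niimpy's code. The name convert_answers_sketch is invented, and with use_prefix=True the sketch assumes the prefix is the part of the question id before the first "_" (as in PSS10_1). The answer labels in pss_map are only illustrative.

```python
import pandas as pd


def convert_answers_sketch(df, answer_col, question_id, id_map, use_prefix=False):
    """Hypothetical illustration of the documented behaviour (not niimpy's code).

    id_map maps a question id (or, with use_prefix=True, a question-id prefix)
    to a dict {text answer: numeric value}. Unmapped answers are kept as-is.
    """
    if use_prefix:
        # Assumption: the prefix is the part of the id before the first "_"
        keys = df[question_id].str.split("_").str[0]
    else:
        keys = df[question_id]

    def convert(key, answer):
        mapping = id_map.get(key, {})
        return mapping.get(answer, answer)  # fall back to the original value

    return pd.Series(
        [convert(k, a) for k, a in zip(keys, df[answer_col])],
        index=df.index,
        name=answer_col,
    )


# Usage: one prefix map converts every PSS10_* question; PHQ9_1 is untouched
answers = pd.DataFrame({
    "id": ["PSS10_1", "PSS10_2", "PHQ9_1"],
    "answer": ["Never", "Sometimes", "Not at all"],
})
pss_map = {"PSS10": {"Never": 0, "Almost never": 1, "Sometimes": 2,
                     "Fairly often": 3, "Very often": 4}}
result = convert_answers_sketch(answers, "answer", "id", pss_map, use_prefix=True)
```

Because the PHQ9 answer has no mapping, it comes back unchanged, so the resulting Series can mix numbers and text.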
- niimpy.preprocessing.survey.survey_print_statistic(df, question_id_col='id', answer_col='answer', prefix=None, group=None)
Return survey statistics. Assuming that the question ids are stored in question_id_col and the numerical answers are stored in answer_col, this function returns the minimum, maximum, mean and standard deviation of the scores for each questionnaire category.
- Parameters
- df: pandas.DataFrame
Input data frame
- question_id_col: string
Column containing the question ids.
- answer_col: string
Column containing the answers as numerical values.
- prefix: list, optional
List of survey prefixes. If None, all categories found in question_id_col are used.
- group: string, optional
Column containing a grouping factor. If given, survey statistics are returned for each group.
- Returns
- dict: dictionary
A dictionary containing a summary of each questionnaire category, for example: {'PHQ9': {'min': 3, 'max': 8, 'avg': 4.5, 'std': 2}}
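A minimal sketch of such a summary follows. It rests on assumptions that the documentation does not confirm. The name survey_statistic_sketch is hypothetical, and the prefix is taken to be the part of the question id before the first "_". The statistics pool all individual answers whose id starts with the prefix, and the group option is omitted. niimpy may aggregate differently, for instance over summed survey scores.

```python
import pandas as pd


def survey_statistic_sketch(df, question_id_col="id", answer_col="answer", prefix=None):
    """Hypothetical illustration (not niimpy's implementation).

    Returns {prefix: {'min', 'max', 'avg', 'std'}} computed over all answers
    whose question id starts with prefix + "_".
    """
    if prefix is None:
        # Assumption: the prefix is the part of the question id before the first "_"
        prefix = sorted(df[question_id_col].str.split("_").str[0].unique())
    stats = {}
    for p in prefix:
        scores = df.loc[df[question_id_col].str.startswith(p + "_"), answer_col]
        stats[p] = {
            "min": scores.min(),
            "max": scores.max(),
            "avg": scores.mean(),
            "std": scores.std(),  # sample standard deviation
        }
    return stats


# Usage with made-up numerical answers from two questionnaires
answers = pd.DataFrame({
    "id": ["PHQ9_1", "PHQ9_2", "PHQ9_1", "GAD7_1"],
    "answer": [1, 3, 2, 0],
})
stats = survey_statistic_sketch(answers)
```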
- niimpy.preprocessing.survey.survey_sum_scores(df, survey_prefix=None, answer_col='answer', id_column='id')
Sum the answers to all questions matching a prefix (like PHQ9_*) to get a survey score.
- Parameters
- df: pandas DataFrame
The DataFrame should have a DatetimeIndex, an answer_col column with numeric scores, and an id_column column with question ids like "PHQ9_1", "PHQ9_2", etc. The survey_prefix is the "PHQ9" part (without the underscore) and selects the questions to include; rows not matching this prefix are excluded.
- survey_prefix: string
The survey prefix in the id column, e.g. 'PHQ9'. An '_' is appended to it before matching.
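The summing step might look like the following hypothetical sketch, which is not niimpy's implementation. The name sum_scores_sketch is invented. The sketch assumes that all answers from one survey administration share the same timestamp, so grouping by the DatetimeIndex yields one score per administration.

```python
import pandas as pd


def sum_scores_sketch(df, survey_prefix, answer_col="answer", id_column="id"):
    """Hypothetical illustration (not niimpy's implementation).

    Keeps rows whose question id starts with survey_prefix + "_" and sums
    their answers per timestamp (assumed: one timestamp per administration).
    """
    mask = df[id_column].str.startswith(survey_prefix + "_")
    selected = df.loc[mask, answer_col]
    return selected.groupby(level=0).sum()


# Usage: two weekly PHQ9 administrations; the GAD7 row is excluded
idx = pd.to_datetime(["2020-01-01"] * 3 + ["2020-01-08"] * 2)
answers = pd.DataFrame(
    {"id": ["PHQ9_1", "PHQ9_2", "GAD7_1", "PHQ9_1", "PHQ9_2"],
     "answer": [1, 2, 3, 2, 3]},
    index=idx,
)
scores = sum_scores_sketch(answers, "PHQ9")
```

Appending "_" to the prefix means that a prefix such as "PHQ" does not accidentally match ids like "PHQ9_1".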