Surveys

Surveys consist of the following columns:

* id: the question identifier
* answer: the answer to the question
* q: the text of the question presented to the user (optional)

As usual, the DataFrame index is the timestamp of the answer. By convention, all responses in a single survey instance share the same timestamp, which is how the answers of one survey are linked together.

The raw on-disk format is “long”, that is, one row per answer, which is “tidy data”. This is the most flexible format, but you often need to transform it further before analysis.
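As a minimal sketch of what the long format looks like and how it can be reshaped, the following builds a small made-up survey table with pandas and pivots it to one row per survey instance. The user name, timestamps, and answer values are invented for illustration and are not taken from the niimpy sample data.

```python
import pandas as pd

# Hypothetical long-format survey data: one row per answer.
# Answers from the same survey instance share the same timestamp.
long_df = pd.DataFrame(
    {
        "user": ["u1", "u1", "u1", "u1"],
        "id": ["PHQ2_1", "PHQ2_2", "PHQ2_1", "PHQ2_2"],
        "answer": [1, 2, 0, 3],
    },
    index=pd.to_datetime(
        ["2020-01-01 09:00", "2020-01-01 09:00",
         "2020-01-08 09:00", "2020-01-08 09:00"]
    ).tz_localize("Europe/Helsinki"),
)
long_df.index.name = "timestamp"

# Long -> wide: one row per (timestamp, user), one column per question id.
wide_df = (
    long_df.set_index(["user", "id"], append=True)["answer"]
    .unstack("id")
)
print(wide_df)
```

Because the shared timestamp identifies the survey instance, it can serve as part of the row key when reshaping.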

Load data

[1]:
# Artificial example survey data
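import warnings

import pandas as pd  # pd.melt is used below; import pandas explicitly

import niimpy
from niimpy import config
import niimpy.preprocessing.survey as survey
from niimpy.preprocessing.survey import *

# Silence warnings to keep the notebook output readable
warnings.filterwarnings("ignore")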
[2]:
df = niimpy.read_csv(config.SURVEY_PATH, tz='Europe/Helsinki')
df.head()
[2]:
user age gender Little interest or pleasure in doing things. Feeling down; depressed or hopeless. Feeling nervous; anxious or on edge. Not being able to stop or control worrying. In the last month; how often have you felt that you were unable to control the important things in your life? In the last month; how often have you felt confident about your ability to handle your personal problems? In the last month; how often have you felt that things were going your way? In the last month; how often have you been able to control irritations in your life? In the last month; how often have you felt that you were on top of things? In the last month; how often have you been angered because of things that were outside of your control? In the last month; how often have you felt difficulties were piling up so high that you could not overcome them?
0 1 20 Male several-days more-than-half-the-days not-at-all nearly-every-day almost-never sometimes fairly-often never sometimes very-often fairly-often
1 2 32 Male more-than-half-the-days more-than-half-the-days not-at-all several-days never never very-often sometimes never fairly-often never
2 3 15 Male more-than-half-the-days not-at-all several-days not-at-all never very-often very-often fairly-often never never almost-never
3 4 35 Female not-at-all nearly-every-day not-at-all several-days very-often fairly-often very-often never sometimes never fairly-often
4 5 23 Male more-than-half-the-days not-at-all more-than-half-the-days several-days almost-never very-often almost-never sometimes sometimes very-often never

Preprocessing

The dataframe’s columns are the raw questions from a survey. Some questions belong to a specific category, so we will annotate them with ids. Each id is constructed from a prefix (the questionnaire category: GAD, PHQ, PSQI, etc.) followed by the question number (1, 2, 3). Similarly, we will also convert the answers to meaningful numerical values.
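The id scheme described above can be sketched in plain Python. The questionnaire dictionary and the helper `build_id_map` below are hypothetical and only illustrate the prefix-plus-number convention; niimpy ships its own mappers (such as PHQ2_MAP and GAD2_MAP), whose exact contents may differ.

```python
# Hypothetical questionnaire definition, for illustration only.
# The grouping of questions under each prefix is an assumption.
questionnaires = {
    "PHQ2": [
        "Little interest or pleasure in doing things.",
        "Feeling down; depressed or hopeless.",
    ],
    "GAD2": [
        "Feeling nervous; anxious or on edge.",
        "Not being able to stop or control worrying.",
    ],
}


def build_id_map(questionnaires):
    """Map each question text to an id of the form '<PREFIX>_<number>'."""
    return {
        text: f"{prefix}_{number}"
        for prefix, questions in questionnaires.items()
        for number, text in enumerate(questions, start=1)
    }


col_id_sketch = build_id_map(questionnaires)
for text, question_id in col_id_sketch.items():
    print(question_id, "<-", text)
```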

Note: It’s important that the dataframe follows the schema shown below before it is passed to niimpy.

[3]:
# Convert column name to id, based on provided mappers from niimpy
col_id = {**PHQ2_MAP, **PSQI_MAP, **PSS10_MAP, **PANAS_MAP, **GAD2_MAP}
selected_cols = [col for col in df.columns if col in col_id]

# Convert from wide to long format
transformed_df = pd.melt(
    df,
    id_vars=['user', 'age', 'gender'],
    value_vars=selected_cols,
    var_name='question',
    value_name='raw_answer',
)

# Assign an id to each question
transformed_df['id'] = transformed_df['question'].replace(col_id)
transformed_df.head()
[3]:
   user  age  gender  question                                      raw_answer               id
0     1   20  Male    Little interest or pleasure in doing things.  several-days             PHQ2_1
1     2   32  Male    Little interest or pleasure in doing things.  more-than-half-the-days  PHQ2_1
2     3   15  Male    Little interest or pleasure in doing things.  more-than-half-the-days  PHQ2_1
3     4   35  Female  Little interest or pleasure in doing things.  not-at-all               PHQ2_1
4     5   23  Male    Little interest or pleasure in doing things.  more-than-half-the-days  PHQ2_1

Moreover, niimpy can convert the raw answers to numerical values for further analysis. For this, we need a mapping {raw_answer: numerical_answer}. The survey module provides such mappings, and you can adjust them to your own needs.

Based on each question’s id, niimpy maps the raw answer to its numerical representation.

[4]:
# Transform raw answers to numerical values
transformed_df['answer'] = survey.survey_convert_to_numerical_answer(
    transformed_df,
    answer_col='raw_answer',
    question_id='id',
    id_map=ID_MAP_PREFIX,
    use_prefix=True,
)
transformed_df.head()
[4]:
   user  age  gender  question                                      raw_answer               id      answer
0     1   20  Male    Little interest or pleasure in doing things.  several-days             PHQ2_1       1
1     2   32  Male    Little interest or pleasure in doing things.  more-than-half-the-days  PHQ2_1       2
2     3   15  Male    Little interest or pleasure in doing things.  more-than-half-the-days  PHQ2_1       2
3     4   35  Female  Little interest or pleasure in doing things.  not-at-all               PHQ2_1       0
4     5   23  Male    Little interest or pleasure in doing things.  more-than-half-the-days  PHQ2_1       2