Surveys

Surveys consist of the following columns:

* id: the question identifier
* answer: the answer to the question
* q: the text of the question presented to the user (optional)

As usual, the DataFrame index is the timestamp of the answer. By convention, all responses in a single survey instance have the same timestamp, and this is used to link the responses together.

The raw on-disk format is “long” (one row per answer), which is also known as “tidy data”. This is the most flexible format, but you often need to transform it before analysis.
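One such transformation is pivoting from long to wide format. The following is a minimal sketch with plain pandas, not a niimpy function. The data, question ids, and timestamps are made up for illustration; only the id and answer column names follow the convention described above.

```python
import pandas as pd

# Hypothetical long-format survey data: one row per answer.
# Answers from the same survey instance share a timestamp.
long_df = pd.DataFrame(
    {
        "id": ["PHQ2_1", "PHQ2_2", "PHQ2_1", "PHQ2_2"],
        "answer": [1, 2, 0, 3],
    },
    index=pd.to_datetime(
        [
            "2020-01-01 09:00",
            "2020-01-01 09:00",
            "2020-01-02 09:00",
            "2020-01-02 09:00",
        ]
    ),
)
long_df.index.name = "timestamp"

# Pivot to wide format: one row per survey instance, one column per question.
wide_df = long_df.reset_index().pivot(
    index="timestamp", columns="id", values="answer"
)
print(wide_df)
```

Because the shared timestamp identifies a survey instance, it serves as the row key of the wide table.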

Load data

[1]:

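# Artificial example survey data
import warnings

import pandas as pd  # used below as pd.melt and pd.DataFrame

import niimpy
from niimpy import config
import niimpy.preprocessing.survey as survey
# The wildcard import provides the mappers used below (PHQ2_MAP, ID_MAP_PREFIX, ...)
from niimpy.preprocessing.survey import *

warnings.filterwarnings("ignore")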

[2]:

df = niimpy.read_csv(config.SURVEY_PATH, tz='Europe/Helsinki')
df.head()

[2]:

	user	age	gender	Little interest or pleasure in doing things.	Feeling down; depressed or hopeless.	Feeling nervous; anxious or on edge.	Not being able to stop or control worrying.	In the last month; how often have you felt that you were unable to control the important things in your life?	In the last month; how often have you felt confident about your ability to handle your personal problems?	In the last month; how often have you felt that things were going your way?	In the last month; how often have you been able to control irritations in your life?	In the last month; how often have you felt that you were on top of things?	In the last month; how often have you been angered because of things that were outside of your control?	In the last month; how often have you felt difficulties were piling up so high that you could not overcome them?
0	1	20	Male	several-days	more-than-half-the-days	not-at-all	nearly-every-day	almost-never	sometimes	fairly-often	never	sometimes	very-often	fairly-often
1	2	32	Male	more-than-half-the-days	more-than-half-the-days	not-at-all	several-days	never	never	very-often	sometimes	never	fairly-often	never
2	3	15	Male	more-than-half-the-days	not-at-all	several-days	not-at-all	never	very-often	very-often	fairly-often	never	never	almost-never
3	4	35	Female	not-at-all	nearly-every-day	not-at-all	several-days	very-often	fairly-often	very-often	never	sometimes	never	fairly-often
4	5	23	Male	more-than-half-the-days	not-at-all	more-than-half-the-days	several-days	almost-never	very-often	almost-never	sometimes	sometimes	very-often	never

Preprocessing

The dataframe’s columns are raw questions from a survey. Some questions belong to a specific category, so we will annotate them with ids. The id is constructed from a prefix (the questionnaire category: GAD, PHQ, PSQI, etc.), followed by the question number (1, 2, 3). Similarly, we will also convert the answers to meaningful numerical values.

Note: It’s important that the dataframe follows the schema below before it is passed to niimpy.

[3]:

# Convert column name to id, based on provided mappers from niimpy
col_id = {**PHQ2_MAP, **PSQI_MAP, **PSS10_MAP, **PANAS_MAP, **GAD2_MAP}
selected_cols = [col for col in df.columns if col in col_id.keys()]

# Convert from wide to long format
transformed_df = pd.melt(
    df,
    id_vars=['user', 'age', 'gender'],
    value_vars=selected_cols,
    var_name='question',
    value_name='raw_answer',
)

# Assign questions to codes
transformed_df['id'] = transformed_df['question'].replace(col_id)
transformed_df.head()

[3]:

	user	age	gender	question	raw_answer	id
0	1	20	Male	Little interest or pleasure in doing things.	several-days	PHQ2_1
1	2	32	Male	Little interest or pleasure in doing things.	more-than-half-the-days	PHQ2_1
2	3	15	Male	Little interest or pleasure in doing things.	more-than-half-the-days	PHQ2_1
3	4	35	Female	Little interest or pleasure in doing things.	not-at-all	PHQ2_1
4	5	23	Male	Little interest or pleasure in doing things.	more-than-half-the-days	PHQ2_1
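The wide-to-long step above can be tried on a toy table without niimpy. This is a minimal sketch: the two-user dataframe is invented, and question_to_id is a hypothetical stand-in for niimpy's PHQ2_MAP, containing only the two question texts shown here.

```python
import pandas as pd

# Hypothetical wide-format data: one row per user, one column per question.
wide_df = pd.DataFrame(
    {
        "user": [1, 2],
        "Little interest or pleasure in doing things.": [
            "several-days",
            "not-at-all",
        ],
        "Feeling down; depressed or hopeless.": [
            "not-at-all",
            "nearly-every-day",
        ],
    }
)

# Hypothetical question-text-to-id mapping (stand-in for niimpy's PHQ2_MAP).
question_to_id = {
    "Little interest or pleasure in doing things.": "PHQ2_1",
    "Feeling down; depressed or hopeless.": "PHQ2_2",
}

# Keep only the columns that have an id, then convert from wide to long.
question_cols = [col for col in wide_df.columns if col in question_to_id]
long_df = pd.melt(
    wide_df,
    id_vars=["user"],
    value_vars=question_cols,
    var_name="question",
    value_name="raw_answer",
)

# map() does an exact dictionary lookup for each question text.
long_df["id"] = long_df["question"].map(question_to_id)
print(long_df)
```

The sketch uses Series.map instead of replace because map looks up each whole value in the dictionary and yields NaN for question texts that are missing from it, which makes unmapped questions easy to spot.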

niimpy can also convert the raw answers to numerical values for further analysis. This requires a mapping {raw_answer: numerical_answer}; niimpy provides one in the survey module, and you can adjust it to your own needs.

Based on the question’s id, niimpy maps the raw answers to their numerical representation.

[4]:

# Transform raw answers to numerical values
transformed_df['answer'] = survey.survey_convert_to_numerical_answer(transformed_df, answer_col = 'raw_answer',
                                                                     question_id = 'id', id_map=ID_MAP_PREFIX, use_prefix=True)
transformed_df.head()

[4]:

	user	age	gender	question	raw_answer	id	answer
0	1	20	Male	Little interest or pleasure in doing things.	several-days	PHQ2_1	1
1	2	32	Male	Little interest or pleasure in doing things.	more-than-half-the-days	PHQ2_1	2
2	3	15	Male	Little interest or pleasure in doing things.	more-than-half-the-days	PHQ2_1	2
3	4	35	Female	Little interest or pleasure in doing things.	not-at-all	PHQ2_1	0
4	5	23	Male	Little interest or pleasure in doing things.	more-than-half-the-days	PHQ2_1	2
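To illustrate the idea of a prefix-based lookup, here is a minimal sketch in plain pandas. It is not niimpy's implementation: answer_maps is a hypothetical mapping, not ID_MAP_PREFIX. Its first three values mirror the output above, and the value for nearly-every-day is an assumption based on the usual PHQ scoring.

```python
import pandas as pd

# Hypothetical mapping {questionnaire prefix: {raw answer: numerical answer}}.
# This is an illustration only, not niimpy's ID_MAP_PREFIX.
answer_maps = {
    "PHQ2": {
        "not-at-all": 0,
        "several-days": 1,
        "more-than-half-the-days": 2,
        "nearly-every-day": 3,  # assumed value, not shown in the output above
    },
}

long_df = pd.DataFrame(
    {
        "id": ["PHQ2_1", "PHQ2_1", "PHQ2_2"],
        "raw_answer": [
            "several-days",
            "more-than-half-the-days",
            "nearly-every-day",
        ],
    }
)


def to_numerical(row):
    """Look up the answer map by the id's prefix, then map the raw answer."""
    prefix = row["id"].split("_")[0]  # "PHQ2_1" -> "PHQ2"
    return answer_maps[prefix][row["raw_answer"]]


long_df["answer"] = long_df.apply(to_numerical, axis=1)
print(long_df)
```

Keying the mapping by prefix means one answer scale is defined once per questionnaire and reused for all of its questions.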

Print survey statistics

Now that the survey is preprocessed, we can extract some meaningful statistics from it.

First, we can compute the mean, standard deviation, min, and max values for each questionnaire.

[5]:

d = survey.survey_print_statistic(transformed_df, question_id_col = 'id', answer_col = 'answer')
pd.DataFrame(d)

[5]:

	PHQ2	PSS10	GAD2
min	0.0000	4.000000	0.000000
max	6.0000	27.000000	6.000000
avg	3.0520	14.006000	3.042000
std	1.5855	3.687759	1.536423
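The ranges in this output (for example, 0 to 6 for the two-item PHQ2) suggest that the statistics describe each user's summed questionnaire score. Under that assumption, a comparable table can be sketched with a pandas groupby. This is an illustration on made-up data, not niimpy's implementation, and the questionnaire column is a helper introduced only for this example.

```python
import pandas as pd

# Hypothetical preprocessed data: numerical answers in long format.
long_df = pd.DataFrame(
    {
        "user": [1, 1, 2, 2, 3, 3],
        "id": ["PHQ2_1", "PHQ2_2"] * 3,
        "answer": [1, 2, 0, 0, 3, 3],
    }
)

# Helper column: the questionnaire prefix of each id ("PHQ2_1" -> "PHQ2").
long_df["questionnaire"] = long_df["id"].str.split("_").str[0]

# Sum the answers of each user within each questionnaire (assumed scoring).
scores = long_df.groupby(["questionnaire", "user"])["answer"].sum()

# Summarise the per-user scores for each questionnaire.
stats = scores.groupby(level="questionnaire").agg(["min", "max", "mean", "std"])
print(stats)
```

Note that pandas computes the sample standard deviation (ddof=1) by default, so the value may differ from a population standard deviation on small samples.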

To get statistics for a single questionnaire, pass its prefix to the prefix parameter.

[6]:

d = survey.survey_print_statistic(transformed_df, question_id_col = 'id', answer_col = 'answer', prefix='PHQ')
pd.DataFrame(d)

[6]:

	PHQ
avg	3.0520
max	6.0000
min	0.0000
std	1.5855