File formats

In principle, Niimpy can work with data in any file format, as long as you can convert it to a DataFrame. Still, it is useful to have some common formats, so we present two standard formats with default readers:

  • CSV files, which are widely supported and easy to create and understand, but must be loaded into memory in full before they can be processed.

  • sqlite3 databases, which require sqlite3 to read, but provide more power for filtering and automatic processing without reading everything into memory.

DataFrame format (in-memory)

In memory, data is stored in an ordinary pandas DataFrame. There are some standardized columns (see the schema), and the index is a DatetimeIndex.
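As a rough illustration of this layout, the sketch below builds a small DataFrame by hand and gives it a timezone-aware DatetimeIndex. The column names (time, user, battery_level) and the values are made up for the example; they are assumptions, not the official schema.

```python
import pandas as pd

# Hypothetical rows; the column names are illustrative only.
rows = [
    {"time": 1547709614, "user": "u1", "battery_level": 74},
    {"time": 1547709674, "user": "u1", "battery_level": 73},
]
df = pd.DataFrame(rows)

# Convert unix seconds to a timezone-aware DatetimeIndex.
df.index = pd.to_datetime(
    df["time"].to_numpy(), unit="s", utc=True
).tz_convert("Europe/Helsinki")

print(type(df.index).__name__)
print(df.index.tz)
```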

CSV files

CSV files should have a header that lists the column names and generally be readable by pandas.read_csv.

Reading these can be done with niimpy.read_csv:

[1]:
import os
import niimpy
import niimpy.config as config

# Read the battery data
df = niimpy.read_csv(config.MULTIUSER_AWARE_BATTERY_PATH, tz='Europe/Helsinki')
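To show what "a header that lists the column names" means in practice, here is a minimal sketch that parses a tiny CSV with plain pandas.read_csv. The file contents and column names are invented for the example, and this is not what niimpy.read_csv does internally.

```python
import io
import pandas as pd

# A tiny CSV held in memory; the first line is the header.
# Column names and values are hypothetical.
csv_text = (
    "time,user,battery_level\n"
    "1547709614,u1,74\n"
    "1547709674,u1,73\n"
)

df = pd.read_csv(io.StringIO(csv_text))
print(df.columns.tolist())
```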

sqlite3 databases

For the purposes of niimpy, sqlite3 databases can generally be seen as supercharged CSV files.

A single database file can contain multiple datasets, so a table name must be specified when reading from it.

You can read an entire table into memory using niimpy.read_sqlite:

[2]:
# Read the sqlite3 data
df = niimpy.read_sqlite(config.SQLITE_SINGLEUSER_PATH, table="AwareScreen", tz='Europe/Helsinki')

You can list the tables within a database using niimpy.reading.read.read_sqlite_tables:

[3]:
niimpy.reading.read.read_sqlite_tables(config.SQLITE_SINGLEUSER_PATH)
[3]:
{'AwareScreen'}
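For comparison, the same kind of listing can be sketched with only the standard-library sqlite3 module by querying the sqlite_master catalog. This is an illustration of the idea, not necessarily how read_sqlite_tables is implemented. The in-memory database, the AwareBattery table, and all column names are assumptions made for the example.

```python
import sqlite3

# Build a throwaway in-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AwareScreen (time REAL, user TEXT, screen_status INTEGER)")
conn.execute("CREATE TABLE AwareBattery (time REAL, user TEXT, battery_level INTEGER)")

# sqlite_master is SQLite's built-in catalog of schema objects.
cursor = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
tables = {name for (name,) in cursor}
conn.close()

print(sorted(tables))
```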

sqlite3 files are highly recommended as a data storage format, since many common exploration operations can be done within the database itself, without reading all of the data into memory or writing an iterator. However, the interface is more difficult to use. Niimpy used this as its primary interface before 2021-07, but it has been de-emphasized since then. You can read more in the database section, but this is only recommended if you need efficiency when working with massive amounts of data.
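A minimal sketch of what "done within the database itself" can look like, using only the standard-library sqlite3 module: the filter and the aggregation run inside SQLite, and only the small result comes back to Python. The table layout, column names, and rows are hypothetical, and this is not Niimpy's database interface.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AwareScreen (time REAL, user TEXT, screen_status INTEGER)")
conn.executemany(
    "INSERT INTO AwareScreen VALUES (?, ?, ?)",
    [
        (1547709614, "u1", 1),
        (1547709674, "u1", 0),
        (1547709700, "u2", 1),
        (1547709800, "u2", 1),
    ],
)

# Filter and count inside the database; only one row per user is returned.
query = (
    "SELECT user, COUNT(*) FROM AwareScreen "
    "WHERE screen_status = 1 GROUP BY user"
)
counts = dict(conn.execute(query))
conn.close()

print(counts)
```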

Other formats

You can add a reader for any format that you can convert into a pandas DataFrame (so basically anything). For examples of readers, see niimpy/reading/read.py. Afterwards, call niimpy.preprocessing.util.df_normalize to apply the standardizations that produce the standard Niimpy format.
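As a sketch of what such a reader might look like, the function below parses a JSON Lines stream into a DataFrame with a timezone-aware DatetimeIndex. The function name read_jsonl, the input format, and the field names are all hypothetical. The df_normalize step is left out because its exact signature is not shown here.

```python
import io
import json
import pandas as pd

def read_jsonl(fileobj, tz="Europe/Helsinki"):
    """Hypothetical reader: one JSON object per line, with a unix 'time' field."""
    records = [json.loads(line) for line in fileobj if line.strip()]
    df = pd.DataFrame.from_records(records)
    # Use the timestamps as a timezone-aware DatetimeIndex.
    df.index = pd.to_datetime(
        df["time"].to_numpy(), unit="s", utc=True
    ).tz_convert(tz)
    return df

# Illustrative input; the field names are assumptions.
sample = io.StringIO(
    '{"time": 1547709614, "user": "u1", "screen_status": 1}\n'
    '{"time": 1547709674, "user": "u1", "screen_status": 0}\n'
)
df = read_jsonl(sample)
print(df.shape)
```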