Process data for the Pregnancy 24/7 study • pregnancy247

Goal

The goal of the process_data() function is to process the Actiwatch, sleep diary, and activPAL data for the Pregnancy 24/7 study. This document will provide an overview of how to use process_data() appropriately based on the Pregnancy 24/7 study and data structures.

Load the package

Assuming the pregnancy247 package is installed, use the library() command to load the pregnancy247 package, as shown below, to process the Actiwatch, sleep diary, and activPAL data to the specifications of the study.

library(pregnancy247)

This allows the user to access all of the available functions present in the package, such as process_data(), without a direct reference to the functions, pregnancy247::process_data().

Set the working directory

The process_data() function uses the read_sleep(), read_events(), save_plot_1s_plot(), and all write() functions, which all require the working directory is set to the directory containing the

DATA_Iowa_activPAL,
DATA_Pitt_activPAL, and
DATA_WVU_activPAL

sub-directories with subject-specific directories within each sub-directory. Additionally, it also assumes that each visit during the study period is the concatenation of the subject ID and trimester (e.g. ./DATA_Iowa_activPAL/1000-AB/1000-AB1 for the first trimester). The requirement of the working directory and file system is based on the needs of Pregnancy 24/7. Thus, to alter this requirement adjust the dependencies of the file system in the read and write functions. If RStudio is being used, the working directory may be set by the following.

Select the Session tab in the toolbar
Navigate to Choose Directory … listed within Set Working Directory
Navigate the file explorer to the main data processing folder, which for the Pregnancy 24/7 is rdss_kawhitaker/Pregnancy247/ActivPAL Processing

Process data

The Actiwatch, sleep diary, and activPAL data are processed for a single subject at a time with the process_data() function. This function takes a few arguments that are listed below. Once the arguments are specified, all one needs to do is execute the function call. This will depend on your operating system.

process_data(subject = "1000-AB", trimester = 1, sleep_source = "./sleep.csv")

The subject and trimester parameters allow the activPAL data to imported by read_events().

subject denotes the study subject whose data is to be processed and should be listed as ####-ab.
trimester denotes the pregnancy trimester of the study subject during the data collection period.
sleep_source is the file name that contains the current version of the sleep diary and Actiwatch data.
- The variable names for this data are based on the Pregnancy 24/7 study.
- Additionally, the read_sleep() function is used to import the sleep_source data.
day1 through day9 denotes whether each day during the wear period is to be consider a valid day in terms of wearing the activPAL device. By default, each day is a valid day. Whether or not a day is valid is denoted by a TRUE or FALSE logical value with the default being TRUE.
- Once the data processing beings, there is the potential for the processing to stop if a missing sleep, nap, work, or monitor time appears. A dialog box will present itself when this occurs. The box will read as Day j has a missing (sleep/nap/work/monitor) time. Would you like to stop processing? for $j = 1, 2, \ldots, 9$ . Usually, processing is stopped if an investigation of why a time is missing is needed. If the data processor is aware of this missing time and would like to continue processing the data, then answer No.
- The processing may also stop if the subject is a shift-worker and does not sleep during one of the wear days. If the subject does not sleep for a day, stop the data processing and add in a sleep interval of 1 minute on the day of no sleep in the sleep diary (e.g. 23:58 to 23:59). Then, process the data again and do NOT stop the data processing due to the subject being a shift-worker.

The data dictionary for the sleep_source data set may be downloaded from inst/extdata of the Github site for the package.

Outputted data sets

The entire creation of process_data() depends on the

functions in the pregnancy247/R package. There is no R object returned by this function, but files are written to their appropriate subject-specific directories based on the subject and trimester parameters. A total of six data objects will be exported and the description of these data objects are presented below.

Events data is the collection of all the activity events that occurred during the wear period. Additionally, indicator variables are created to denote sleep, naps, and work times. These variables are named sleep_loop, nap_loop, and work_loop, respectively. The wear_day variable denotes the day during the wear period, where each wear day begins with a sleep window. Thus, the second wear day begins when the subject goes to sleep after day 1. The outputted data set is named ####-ab(t)_eventfile.csv, with (t) denoting the trimester of pregnancy.
One second EPOCH data is an expanded version of the events data to a second by second accounting of activity during the wear period. The outputted data set is named ####-ab(t)_1secepoch.csv, with (t) denoting the trimester of pregnancy.
Daily values data is the collection of daily movement and sedentary metrics. These values are calculated using the archived activpalProcessing package. By default, days 1 and 9 should not be used for further activity analysis. However, if one of the other wear days has less than 10 hours of activPAL wear, day 1 may be considered for analysis if it has at least 10 hours of wear. The outputted data set is named ####-ab(t)_daily_values.csv, with (t) denoting the trimester of pregnancy.
Weekly averages data is the weekly averages of the movement and sedentary metrics during the wear period. Only the days that resulted in 10 hours or more of wear time are included in these averages. The outputted data set is named ####-ab(t)_weekly_avgs.csv, with (t) denoting the trimester of pregnancy.
- Additionally, the weekly averages are included in a site-specific data set of weekly movement and sedentary metrics. These data sets are weekly_avgs_IOWA.csv for the University of Iowa, weekly_avgs_PITT.csv for the University of Pittsburgh, and weekly_avgs_WVU.csv for West Virginia University.
Graph of wear day activity is saved as a PDF. The outputted graph is named ####-ab(t)_graph.pdf, with (t) denoting the trimester of pregnancy.

The data dictionaries for these outputted data sets may be downloaded from inst/extdata/outdata/ of the Github site for the package. The file names of the dictionaries are related to the outputted data sets.