Process data for the Pregnancy 24/7 study
process-data.Rmd
Goal
The goal of the process_data()
function is to process
the Actiwatch, sleep diary, and activPAL data for the Pregnancy 24/7
study. This document will provide an overview of how to use
process_data()
appropriately based on the Pregnancy 24/7
study and data structures.
Load the package
Assuming the pregnancy247
package is installed,
use the library()
command to load the pregnancy247
package, as shown below, to process the Actiwatch, sleep diary, and
activPAL data to the specifications of the study.
This allows the user to access all of the available functions present
in the package, such as process_data()
, without a direct
reference to the functions,
pregnancy247::process_data()
.
Set the working directory
The process_data()
function uses the
read_sleep()
, read_events()
,
save_plot_1s_plot()
, and all write()
functions, which all require the working directory is set to the
directory containing the
-
DATA_Iowa_activPAL
, -
DATA_Pitt_activPAL
, and DATA_WVU_activPAL
sub-directories with subject-specific directories within each
sub-directory. Additionally, it also assumes that each visit during the
study period is the concatenation of the subject ID and trimester
(e.g. ./DATA_Iowa_activPAL/1000-AB/1000-AB1
for the first
trimester). The requirement of the working directory and file system is
based on the needs of Pregnancy 24/7. Thus, to alter this requirement
adjust the dependencies of the file system in the read and write
functions. If RStudio is
being used, the working directory may be set by the following.
- Select the Session tab in the toolbar
- Navigate to Choose Directory … listed within Set Working Directory
- Navigate the file explorer to the main data processing folder, which
for the Pregnancy 24/7 is
rdss_kawhitaker/Pregnancy247/ActivPAL Processing
Process data
The Actiwatch, sleep diary, and activPAL data are processed for a
single subject at a time with the process_data()
function. This function takes a few arguments that are listed below.
Once the arguments are specified, all one needs to do is execute the
function call. This will depend on your operating system.
process_data(subject = "1000-AB", trimester = 1, sleep_source = "./sleep.csv")
The subject
and trimester
parameters allow
the activPAL data to imported by read_events()
.
-
subject
denotes the study subject whose data is to be processed and should be listed as####-ab
. -
trimester
denotes the pregnancy trimester of the study subject during the data collection period. -
sleep_source
is the file name that contains the current version of the sleep diary and Actiwatch data.- The variable names for this data are based on the Pregnancy 24/7 study.
- Additionally, the
read_sleep()
function is used to import thesleep_source
data.
-
day1
throughday9
denotes whether each day during the wear period is to be consider a valid day in terms of wearing the activPAL device. By default, each day is a valid day. Whether or not a day is valid is denoted by aTRUE
orFALSE
logical value with the default beingTRUE
.- Once the data processing beings, there is the potential for the processing to stop if a missing sleep, nap, work, or monitor time appears. A dialog box will present itself when this occurs. The box will read as Day j has a missing (sleep/nap/work/monitor) time. Would you like to stop processing? for . Usually, processing is stopped if an investigation of why a time is missing is needed. If the data processor is aware of this missing time and would like to continue processing the data, then answer No.
- The processing may also stop if the subject is a shift-worker and does not sleep during one of the wear days. If the subject does not sleep for a day, stop the data processing and add in a sleep interval of 1 minute on the day of no sleep in the sleep diary (e.g. 23:58 to 23:59). Then, process the data again and do NOT stop the data processing due to the subject being a shift-worker.
The data dictionary for the sleep_source
data set may be
downloaded from inst/extdata
of the Github
site for the package.
Outputted data sets
The entire creation of process_data()
depends on the
-
check_dir()
, -
read_sleep()
, -
read_events()
, -
windows()
, -
process_events()
, -
merge_events()
, -
create_1s_epoch()
, -
plot_1s_epoch()
, -
summarize_daily()
, and write()
functions in the pregnancy247/R
package. There is no R
object returned by this function,
but files are written to their appropriate subject-specific directories
based on the subject
and trimester
parameters.
A total of six data objects will be exported and the description of
these data objects are presented below.
-
Events data is the collection of all the activity
events that occurred during the wear period. Additionally, indicator
variables are created to denote sleep, naps, and work times. These
variables are named
sleep_loop
,nap_loop
, andwork_loop
, respectively. Thewear_day
variable denotes the day during the wear period, where each wear day begins with a sleep window. Thus, the second wear day begins when the subject goes to sleep after day 1. The outputted data set is named####-ab(t)_eventfile.csv
, with(t)
denoting the trimester of pregnancy. -
One second EPOCH data is an expanded version of the
events data to a second by second accounting of activity during the wear
period. The outputted data set is named
####-ab(t)_1secepoch.csv
, with(t)
denoting the trimester of pregnancy. -
Daily values data is the collection of daily
movement and sedentary metrics. These values are calculated using the
archived
activpalProcessing
package. By default, days 1 and 9 should not be used for further activity analysis. However, if one of the other wear days has less than 10 hours of activPAL wear, day 1 may be considered for analysis if it has at least 10 hours of wear. The outputted data set is named####-ab(t)_daily_values.csv
, with(t)
denoting the trimester of pregnancy. -
Weekly averages data is the weekly averages of the
movement and sedentary metrics during the wear period. Only the days
that resulted in 10 hours or more of wear time are included in these
averages. The outputted data set is named
####-ab(t)_weekly_avgs.csv
, with(t)
denoting the trimester of pregnancy.- Additionally, the weekly averages are included in a site-specific
data set of weekly movement and sedentary metrics. These data sets are
weekly_avgs_IOWA.csv
for the University of Iowa,weekly_avgs_PITT.csv
for the University of Pittsburgh, andweekly_avgs_WVU.csv
for West Virginia University.
- Additionally, the weekly averages are included in a site-specific
data set of weekly movement and sedentary metrics. These data sets are
-
Graph of wear day activity is saved as a PDF. The
outputted graph is named
####-ab(t)_graph.pdf
, with(t)
denoting the trimester of pregnancy.
The data dictionaries for these outputted data sets may be downloaded
from inst/extdata/outdata/
of the Github
site for the package. The file names of the dictionaries are related to
the outputted data sets.