Tuesday, December 17, 2013

R - POSIXct vs. POSIXlt



As I continue to learn about managing date and time data, I'm working to understand the difference between POSIXct and POSIXlt.

POSIXct is a date-time data object that stores a single number: the number of seconds since the start of January 1, 1970 (UTC).

POSIXlt stores a named list of components: seconds, minutes, hours, day of the month, month, year, etc.

I'm still learning when best to use each type in different contexts. More to come on that.
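
A minimal sketch of the difference, using base R only. The timestamp is just an illustrative value, and UTC is set explicitly so the result does not depend on the local time zone:

```r
# The same instant stored both ways
ct <- as.POSIXct("2013-12-17 08:30:00", tz = "UTC")
lt <- as.POSIXlt(ct, tz = "UTC")

unclass(ct)        # one number: seconds since 1970-01-01 00:00:00 UTC
names(unclass(lt)) # the list components (sec, min, hour, mday, mon, year, ...)

lt$hour   # 8
lt$min    # 30
lt$mon    # 11, because months are counted from 0
lt$year   # 113, because years are counted from 1900
```

Because POSIXct is a single number per value, it tends to be the more convenient type for data frame columns, while POSIXlt is handy when individual components such as the hour are needed.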

R - strftime() vs. strptime()

I'm learning how to handle date and time data in R.

There are two functions that are similar when converting date/time related data.

strptime() takes a character vector (string) and converts it to a POSIXlt data object (which can then be converted to POSIXct with as.POSIXct()).

strftime() takes a POSIXlt or POSIXct data object and converts it to a character vector (string).

Here is documentation on use including formatting characters: http://www.inside-r.org/r-doc/base/strftime
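
A short round trip through both functions, as a sketch. The input string and its format are made-up values for illustration:

```r
# String -> date-time: strptime() parses text using a format
x <- "12-17-2013 08.30"
parsed <- strptime(x, format = "%m-%d-%Y %H.%M", tz = "UTC")
class(parsed)   # "POSIXlt" "POSIXt"

# Date-time -> string: strftime() formats it back into text
as_text <- strftime(parsed, format = "%Y-%m-%d %H:%M", tz = "UTC")
as_text         # "2013-12-17 08:30"
```

One way to keep them apart: the "p" in strptime() stands for parse and the "f" in strftime() stands for format.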

Sunday, December 15, 2013

R - Comment Lines in Data File


Today I learned that you can have comment lines in data files. read.csv() does not treat any character as a comment marker by default, so you need to include comment.char="#" as an argument. You can use whatever single character you want, though a # (octothorpe) would be consistent with commenting in R scripts.

Example:
data <- read.csv("../datasets/heatmaps_in_r.csv", comment.char="#")
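
To try this without a file on disk, here is a small self-contained sketch. The CSV contents are invented for illustration and are passed in through the text argument of read.csv():

```r
# A tiny CSV with comment lines, held in a string instead of a file
csv_text <- "# sleep log exported from phone
Date,Duration
12-01-2013,6.5
# the next night was estimated
12-02-2013,7"

# Lines starting with # are skipped; only the header and two data rows remain
sleep_log <- read.csv(text = csv_text, comment.char = "#")
nrow(sleep_log)    # 2
names(sleep_log)   # "Date" "Duration"
```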

Saturday, December 14, 2013

R - Lubridate - Reading Date/Time Data

I'm reading in date and time data from my sleep project using the parse_date_time() function from the lubridate package.

sleep$StartDateTime <- parse_date_time(paste(sleep$Date, sleep$StartTime), "%m-%d-%Y %H.%M", truncated = 2)

The date and time data were in two separate columns, so I needed to use the paste() function. The time data is in the format HH.MM.

Also, I was getting NA values at first when the minutes were 00 (at the top of the hour). I added the truncated argument (truncated = 2 lets up to two trailing parts of the format be missing) so that entries with incomplete time data would still parse.

Here is a good summary of the parse_date_time() function: http://www.inside-r.org/packages/cran/lubridate/docs/parse_date_time
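
A small sketch of the effect of truncated, assuming the lubridate package is installed. The two input strings are hypothetical stand-ins for the sleep data, and the order string is written in lubridate's short form ("mdY HM") rather than with % signs:

```r
library(lubridate)

# The second value has no minutes part, like a time recorded at the top of the hour
x <- c("12-14-2013 22.30", "12-14-2013 23")

# truncated = 2 allows up to two trailing parts of the order (here M, then H) to be missing
parsed <- parse_date_time(x, orders = "mdY HM", truncated = 2)
parsed
```

Without truncated, the second value would be expected to come back as NA because it does not match the full order.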

Monday, December 9, 2013

Sleep Segments

Here is my first graph of the number of sleep segments per night. The number of segments and the duration of sleep are probably the two most significant measurements of sleep quality. So far, it appears my sleep is improving: I am waking less during the night.

[Graph: number of sleep segments per night]

R - Converting Column to Date

In my sleep data, I have a date column that was originally a character type. When plotted with ggplot(), every date was listed on the x axis, making the labels very crowded.

Then I added the following code:

dailytotals$EndDate <- as.Date(dailytotals$EndDate, "%m-%d-%Y")

which converted it to a date type. Then ggplot automatically created tick marks and labels every 15 days (for data that spans two months).

Note that %Y (capitalized) matches a four-digit year, whereas %y (lowercase) matches a two-digit year.
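
A quick sketch of the two year codes in base R. The date strings are illustrative values, not taken from the sleep data:

```r
# %Y expects a four-digit year, %y a two-digit year
four_digit <- as.Date("12-09-2013", "%m-%d-%Y")
two_digit  <- as.Date("12-09-13", "%m-%d-%y")

class(four_digit)                 # "Date"
identical(four_digit, two_digit)  # TRUE, both are December 9, 2013
```

Using the wrong code for the data (for example %Y on a two-digit year) will not raise an error, so it is worth checking a few converted values by eye.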

Saturday, December 7, 2013

Initial Sleep Analysis

This is my first post sharing my initial analysis of my sleep data. I've collected data for two months now using Tasker on my Android phone. It collects the start and end date/time of each sleep period as well as why I woke up.

I've just started using R and love it. Here is the first graph showing the total number of hours slept each night. Weekends are colored red because I wanted to know if I slept better on the weekends (it doesn't seem to matter).

[Graph: total number of hours slept each night, weekends in red]

I had a few days in November where my Tasker action didn't save all the sleep intervals. On those days, I added estimated data. I may remove those dates altogether. There may be other dates that are incomplete. I have a couple of days with less than four hours and I can't remember if that is correct or not. That probably has an impact on the dip in sleep duration in November.

Generally, I am getting more sleep than I expected, averaging around 6.5 to 7 hours per night.

Next I need to factor in the number of sleep segments to come up with a measurement of sleep quality (greater duration and fewer segments represent better sleep). I also need to consider why I woke up. If I was interrupted by an alarm, for example, and that cut my sleep shorter than it would have been naturally, it's not necessarily fair to treat that as a dependent variable when considering factors that affect sleep quality.

Here is the R Script I used (via RStudio):
library(methods)
library(lubridate)
library(ggplot2)  #needed for the ggplot() call below

sleep <- read.csv("C:/Users/xxxxxxx/Documents/R/Sleep/sleepdata.csv", header=TRUE)

sleep["EndDate"] <- NA

sleep$EndDate <- ifelse(sleep$EndTime > 12,
  format(mdy(as.character(sleep$Date)) + days(1), format="%m-%d-%Y"),
  format(mdy(as.character(sleep$Date)), format="%m-%d-%Y"))  #fill column with date of the morning of each sleep period

sleep["Weekday"] <- NA

sleep$Weekday <- weekdays(mdy(sleep$EndDate))  #fill column with weekday of the sleep period

sleep  #display data in case I want to review

dailytotals <- unique(within(sleep, {
  Duration <- ave(Duration, EndDate, FUN=sum)  #total hours per EndDate
  rm(Date,StartTime,EndTime,Status)  #drop per-segment columns so rows for the same day become identical
}))  #create frame, one record per day with total number of hours slept

dailytotals  #display in case I want to review

ggplot(dailytotals, aes(x=EndDate, y=Duration, group=1, colour=Weekday)) +
  geom_point() +  #plot points on graph
  stat_smooth(level=.99) + #smoothed trend line with 99% confidence band
  theme(axis.text.x = element_text(angle = 90), axis.title.x = element_text(angle = 0), axis.title.y = element_text(angle = 0)) +  #labels, turn x axis vertically
  scale_color_manual(values=c(Saturday="red", Sunday="red", Monday="blue", Tuesday="blue", Wednesday="blue", Thursday="blue", Friday="blue"))  #color code weekend days
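
The within()/ave()/unique() step is the least obvious part of the script, so here is a minimal sketch of it on a made-up data frame (the toy values and the reduced set of columns are assumptions for illustration only):

```r
# Three sleep segments: two on the same morning, one on the next
toy <- data.frame(
  EndDate  = c("12-01-2013", "12-01-2013", "12-02-2013"),
  Duration = c(2, 4.5, 7),
  Status   = c("alarm", "natural", "natural"),
  stringsAsFactors = FALSE
)

daily <- unique(within(toy, {
  Duration <- ave(Duration, EndDate, FUN = sum)  # replace each row's value with its day's total
  rm(Status)                                     # drop the per-segment column
}))

daily$EndDate    # "12-01-2013" "12-02-2013"
daily$Duration   # 6.5 7.0
```

ave() keeps one value per original row, so both December 1 rows end up identical once Status is removed, and unique() then collapses them into a single record per day.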