4 Formatting data
For all of the functions to run, the data must be in the long format. This means that each event must be on its own row. An event is a single point, a line segment, or an arrow. If a study unit has multiple events, they must be recorded over multiple rows. Often, however, data is supplied in the wide format (e.g., one row per patient).
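As a minimal sketch of the difference between the two layouts, the base R snippet below stacks the event columns of a small wide data.frame so that each event ends up on its own row. The data and the names `wide_toy`, `long_toy`, `event`, and `time` are hypothetical and used only for illustration; they are not part of the package.

```r
# Hypothetical wide data: one row per patient, one column per event type
wide_toy <- data.frame(
  ID  = c("A", "B"),
  AE  = c(1.0, NA),   # time of adverse event (NA = did not occur)
  SAE = c(2.5, 0.7)   # time of serious adverse event
)

# Stack the event columns so that each event sits on its own row
long_toy <- data.frame(
  ID    = rep(wide_toy$ID, times = 2),
  event = rep(c("AE", "SAE"), each = nrow(wide_toy)),
  time  = c(wide_toy$AE, wide_toy$SAE)
)

# Drop events that did not occur
long_toy <- long_toy[!is.na(long_toy$time), ]
long_toy
```

Patient A now occupies two rows (one per event), while patient B, who had no AE, occupies one.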
4.1 Wide data
Here is an example data.frame in the wide format.
wide_example <- structure(list(ID = c("ID:001", "ID:002", "ID:003"), Date.begin.Treatment = structure(c(14307,
14126, 15312), class = "Date"), AE = structure(c(16133, 14491,
NA), class = "Date"), SAE = structure(c(16316, NA, 16042), class = "Date"),
Death.date = structure(c(16499, NA, 17869), class = "Date"),
Response1 = c("SD", "SD", NA), Response1.Start = structure(c(14745,
14345, NA), class = "Date"), Response1.End = structure(c(15111,
14418, NA), class = "Date"), Response2 = c("CR", "PR", NA
), Response2.Start = structure(c(15768, 14674, NA), class = "Date"),
Response2.End = structure(c(16133, 14856, NA), class = "Date"),
Response3 = c(NA, "CR", NA), Response3.Start = structure(c(NA,
14856, NA), class = "Date"), Response3.End = structure(c(NA,
15587, NA), class = "Date"), Last.follow.up = structure(c(16499,
17048, 17869), class = "Date")), class = "data.frame", row.names = c(NA,
-3L))
ID | Date.begin.Treatment | AE | SAE | Death.date | Response1 | Response1.Start | Response1.End | Response2 | Response2.Start | Response2.End | Response3 | Response3.Start | Response3.End | Last.follow.up |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ID:001 | 2009-03-04 | 2014-03-04 | 2014-09-03 | 2015-03-05 | SD | 2010-05-16 | 2011-05-17 | CR | 2013-03-04 | 2014-03-04 | NA | NA | NA | 2015-03-05 |
ID:002 | 2008-09-04 | 2009-09-04 | NA | NA | SD | 2009-04-11 | 2009-06-23 | PR | 2010-03-06 | 2010-09-04 | CR | 2010-09-04 | 2012-09-04 | 2016-09-04 |
ID:003 | 2011-12-04 | NA | 2013-12-03 | 2018-12-04 | NA | NA | NA | NA | NA | NA | NA | NA | NA | 2018-12-04 |
All of the dates need to be converted to time since the start of treatment. For each patient, Date.begin.Treatment is the starting point (time 0).
date_cols <- c("Date.begin.Treatment","AE","SAE",'Death.date','Response1.Start', 'Response1.End','Response2.Start', 'Response2.End',
'Response3.Start' ,'Response3.End' ,'Last.follow.up') # Getting the columns with dates
wide_example[date_cols] <- lapply(wide_example[date_cols], as.numeric) # Converting to numbers
wide_example[date_cols] <- round((wide_example[date_cols]-wide_example$Date.begin.Treatment)/365.25,1) # Calculating the time in years since the start of treatment
knitr::kable(wide_example)
ID | Date.begin.Treatment | AE | SAE | Death.date | Response1 | Response1.Start | Response1.End | Response2 | Response2.Start | Response2.End | Response3 | Response3.Start | Response3.End | Last.follow.up |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ID:001 | 0 | 5 | 5.5 | 6 | SD | 1.2 | 2.2 | CR | 4.0 | 5 | NA | NA | NA | 6 |
ID:002 | 0 | 1 | NA | NA | SD | 0.6 | 0.8 | PR | 1.5 | 2 | CR | 2 | 4 | 8 |
ID:003 | 0 | NA | 2.0 | 7 | NA | NA | NA | NA | NA | NA | NA | NA | NA | 7 |
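To make the conversion concrete, the following sketch repeats the arithmetic for a single value using base R date subtraction: the SAE of patient ID:001, with both dates taken from the first table. The variable names `start`, `sae`, `days_elapsed`, and `years_elapsed` are illustrative only.

```r
# Patient ID:001: treatment began 2009-03-04, SAE recorded on 2014-09-03
start <- as.Date("2009-03-04")
sae   <- as.Date("2014-09-03")

# Subtracting two Dates gives a difference in days
days_elapsed <- as.numeric(sae - start)

# Same scaling and rounding as the conversion applied to the whole table
years_elapsed <- round(days_elapsed / 365.25, 1)
years_elapsed
```

The difference is 2009 days, which rounds to 5.5 years and agrees with the SAE entry for ID:001 in the converted table.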
The wide data can be used to create the bars of the swimmer plot, with one bar per patient running from time 0 to Last.follow.up.
plot <- swimmer_plot(df=wide_example,id='ID',end='Last.follow.up',col='black',fill='grey')
plot