3.2 importExcelData Sub- Functions
3.2.1 Import the Data Dictionary
The readDataDict
function will import the dictionary file. This is slightly more useful than just using readxl::read_excel
because checks will be performed to ensure the dictionary is complete and dates will be converted to easy to read text strings.
The file and sheet name must be specified.
Optional Arguments:
colnames
Column names of the DataDictionary, defaults to those used in the Excel templaterange
allows a specified range of the sheet to be imported (ie range = “A1:F10”)origin
enables the origin of Excel dates to be set (defaults to 30 December 1899)
!readDataDict
must be run before readExcelData
3.2.2 Import the Data
The readExcelData
function will import the data, using the specifications provided in the data dictionary. Again, this should be slightly more useful than a readxl::read_excel because:
- dates will be imported nicely
- variables not in the data dictionary will be removed
- there is an option to store any import warnings, which may be useful to send to the user
- the data can be read in ‘strictly’, setting all values that fall outside the allowed range to
NA
(This can be useful if users insist on using missing value codes)
The file and sheetname, as well as the object returned by the readDataDict
function must be specified.
Optional Arguments:
range
allows a specified range of the sheet to be imported (ie range = “A1:AB102”)origin
enables the origin of Excel dates to be set (defaults to 30 December 1899)saveWarnings
will return a list with the data frame and the import warnings (if there are any)setErrorsMissing
will set all values out of range toNA
this should only be done if no warnings are given
3.2.3 Perform Checks
The checkData
function will simply check the data against the data dictionary. This requires the outputs of readDataDict
and readExcelData
.
The checkData
function will check for duplicate rows and check that all numeric, integer, date, code and category variables meet the requirements of the data dictionary. It does not check the data types, because these are enforced by the readExcelData
function.
The outputs from readDataDict
and readExcelData
are required.
Optional Argument:
- id can be used to indicate the patient identifier (ie id =‘SubjectID’) which will provide errors for each id, instead of by row number
checkData
returns a list with three elements:
a data frame containing all the errors (rows and columns without errors are removed)
a two-column data frame with Variable Names in one column and the location of errors in the second (either row numbers or IDs if provided)
a two-column data frame with Row Numbers (or IDs) in one columns and the variable containing errors in the second column.
3.2.4 Create Factor Variables
If keepOriginal = TRUE then addFactorVariables
function will rename all the code and category variables suffixed with _orig
(so ecog becomes ecog_orig) and create factor variables using the codes provided in the data dictionary as the factor levels and labels. Otherwise, if keepOriginal = FALSE then the existing variables are over-written with factor variables.
3.2.5 Create Calculated Variables
The ‘createCalculated’ function calculated variables according the specifications given in the data dictionary. For recoded variables a contingency table is exported to the log file to ensure the categories have been created as expected. For categorised variables a table with the minimum and maximum values on the continuous variable is given for each category and also exported to the log file. For combined variables an attribute is tagged onto the variable so that it is printed as a Pareto graph by the PlotVariables
function.