3.5 Calculations

All calculations can be performed by the Biostatistics department. These will be done in code, so they can be reproduced and re-calculated easily if the data change. This saves time and improves data quality.

Examples of some variables that can be easily calculated:

  • Age (from Date of Birth and Assessment Date)
  • Overall Survival (from Date of Diagnosis and Date of Death)
  • Survival Status (from Date of Death and Last Follow-up)
  • Age or BMI categories from raw data
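As an illustration of why these variables are better derived in code than typed by hand, here is a minimal Python sketch of the first three calculations. It is not the Biostatistics department's actual code. The function names, the "Dead"/"Alive" coding, and the choice to measure survival up to last follow-up when no death is recorded are all assumptions made for the example.

```python
from datetime import date


def age_in_years(date_of_birth, assessment_date):
    """Completed years between date of birth and assessment date."""
    years = assessment_date.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred in the assessment year.
    if (assessment_date.month, assessment_date.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


def overall_survival_days(date_of_diagnosis, date_of_death, last_follow_up):
    """Days from diagnosis to death, or to last follow-up if no death is recorded.

    Using last follow-up as the end point for people without a death date
    is an assumption of this sketch.
    """
    end = date_of_death if date_of_death is not None else last_follow_up
    return (end - date_of_diagnosis).days


def survival_status(date_of_death):
    """Hypothetical coding: "Dead" if a date of death is recorded, else "Alive"."""
    return "Dead" if date_of_death is not None else "Alive"


# Example with made-up dates
print(age_in_years(date(1950, 6, 15), date(2020, 6, 14)))
print(overall_survival_days(date(2020, 1, 1), None, date(2020, 3, 1)))
print(survival_status(None))
```

Because every derived value comes from the raw dates, correcting a single date in the data and re-running the code updates all dependent variables at once.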

Do not do calculations in Excel. Instead, specify re-codings and calculations in the data dictionary.

You can specify syntax to automatically create re-coded variables (these will not appear in your data entry sheet).

3.5.1 Recoded Variables

Recoded variables are calculated from categorical or coded variables (as opposed to categorising a numerical variable).

Syntax: OriginalVar,newCode1=oldCode1,oldCode2,newCode2=oldCode3,oldCode4

where OriginalVar is the variable in the data to be recoded and newCodes=oldCodes gives the new category followed by comma-separated original categories.

Example of recoding a variable in the Data Dictionary:

[Figure: Example of a recoded variable]

This will create the T0_Stg variable from T_Stage. T0_Stg will be T0 if T_Stage is T0 and T1up for all other values of T_Stage.
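To make the recoding syntax concrete, the following Python sketch shows one way such a rule could be read and applied. It is an illustration only, not the tool the Biostatistics department uses. The rule string is hypothetical: the original values T1 to T4 are assumed for the example, and the function names are invented here. Returning None for values not listed in the rule is also an assumption.

```python
def parse_recode_rule(rule):
    """Split 'OriginalVar,new1=old1,old2,new2=old3' into (variable, {old: new})."""
    tokens = [t.strip() for t in rule.split(",")]
    variable, mapping, current_new = tokens[0], {}, None
    for token in tokens[1:]:
        if "=" in token:
            # A token containing '=' starts a new category: newCode=firstOldCode
            current_new, old = (part.strip() for part in token.split("=", 1))
        else:
            # A bare token is another old code for the current new category
            old = token
        if current_new is None:
            raise ValueError(f"old code {old!r} appears before any newCode=")
        mapping[old] = current_new
    return variable, mapping


def recode(values, mapping):
    """Apply the mapping; values not listed in the rule become None (assumption)."""
    return [mapping.get(v) for v in values]


# Hypothetical rule for the T0_Stg example; T1..T4 are assumed original codes
variable, mapping = parse_recode_rule("T_Stage,T0=T0,T1up=T1,T2,T3,T4")
print(variable, mapping)
print(recode(["T0", "T3", "T1"], mapping))
```

Each old code appears exactly once on the right-hand side of a rule, so the mapping from old to new codes is unambiguous.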

3.5.2 Categorised Variables

Categorised variables are created from continuous variables.

Syntax: OriginalVar,category1=<cutoff1,category2=<cutoff2,category3

where OriginalVar is the variable in the data to be categorised, category1, category2, etc. are the names of the new categories (these will become factor levels), and cutoff1, cutoff2 are the upper cut-offs for each category. The final category (here category3) has a name but no cut-off, because everyone who does not meet an earlier criterion falls into this level. Note that at present categories can only be created by specifying their upper bounds.

Example of categorising a continuous variable in the Data Dictionary:

[Figure: Example of a recoded variable]

This will create the AgeGroup variable from Age with four levels: "under50", "50-60", "60-69", "70plus".
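The upper-bound logic of the categorisation syntax can be sketched in Python as follows. This is an illustration under stated assumptions, not the department's implementation. The level names come from the AgeGroup example above, but the cut-offs 50, 60 and 70 in the rule string are assumed for the example, as are the function names. The sketch also assumes the cut-offs are listed in ascending order.

```python
def parse_category_rule(rule):
    """Split 'Var,cat1=<c1,cat2=<c2,catLast' into (variable, [(name, cutoff)], last)."""
    tokens = [t.strip() for t in rule.split(",")]
    variable = tokens[0]
    bounded = []
    for token in tokens[1:-1]:
        # Each bounded category is written as name=<upperCutoff
        name, cutoff = token.split("=<", 1)
        bounded.append((name.strip(), float(cutoff)))
    # The final token is a name only: it catches everything not matched earlier
    final_category = tokens[-1]
    return variable, bounded, final_category


def categorise(value, bounded, final_category):
    """Return the first category whose upper cut-off is strictly above the value."""
    if value is None:
        return None  # missing values stay missing (assumption)
    for name, cutoff in bounded:
        if value < cutoff:
            return name
    return final_category


# Hypothetical rule; the cut-offs 50, 60 and 70 are assumed
variable, bounded, last = parse_category_rule("Age,under50=<50,50-60=<60,60-69=<70,70plus")
for age in (49, 50, 69.5, 70):
    print(age, categorise(age, bounded, last))
```

Because each cut-off is an exclusive upper bound, a value exactly equal to a cut-off falls into the next category.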