Background:
Several International Classification of Diseases, 10th Revision (ICD-10) codes for rheumatic heart disease (RHD) are non-specific and default to RHD when the rheumatic origin of valvular disease is unspecified. This results in substantial bias in RHD counts.
Aim:
To develop a comprehensive model for predicting RHD status from ICD codes in linked hospital records from five Australian jurisdictions.
Methods:
RHD-coded (ICD-10 I05-I09) patient episodes (n=4087) in Queensland were validated through chart reviews and linked with hospital records from 2000 to 2017. Demographic and diagnosis variables available from the linked data include age, sex, Aboriginal status, diagnosis codes from multiple admissions (including previous acute rheumatic fever, group A streptococcal infection and non-rheumatic valvular and congenital heart disease), hospital and admission type, relevant procedure codes, pregnancy and socioeconomic variables.
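As an illustration of the case definition, the following minimal sketch flags episodes carrying a diagnosis code in the I05-I09 block. The function names and the assumption that codes may arrive with or without a decimal point are ours for illustration; they are not taken from the study's data dictionary.

```python
import re

# Matches any code whose first three characters are I05, I06, I07, I08 or I09.
RHD_PATTERN = re.compile(r"^I0[5-9]")


def is_rhd_code(code: str) -> bool:
    """Return True if an ICD-10 code falls in the I05-I09 block."""
    return bool(RHD_PATTERN.match(code.strip().upper()))


def episode_is_rhd_coded(diagnosis_codes) -> bool:
    """Return True if any diagnosis code on the episode is in I05-I09."""
    return any(is_rhd_code(code) for code in diagnosis_codes)


# Hypothetical episode: one unrelated code plus one code from the I05-I09 block.
print(episode_is_rhd_coded(["J02.0", "I08.0"]))
```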
Results:
We developed a prediction algorithm based on a generalised linear mixed model. Variables are categorised (“binned”) by subject, and each bin is introduced sequentially into the model. Each candidate model is k-fold cross-validated, and the optimal variable set is chosen based on a combination of loss functions. We suggest appropriate methods for predicting out-of-sample random effects. Results will be validated using other machine learning methods such as classification and regression trees and random forests. The model is illustrated with empirical data from the Australian RHD Burden study.
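The sequential introduction of bins with k-fold cross-validation can be sketched as follows. This is a simplified illustration under several assumptions, not the study's implementation: it substitutes a plain logistic regression (no random effects) for the generalised linear mixed model, uses a single log loss rather than a combination of loss functions, and runs on synthetic data. The bin names, variable names and tuning values are hypothetical.

```python
import math
import random


def predict_proba(w, xi):
    """Logistic prediction; w[0] is the intercept."""
    z = w[0] + sum(wj * xij for wj, xij in zip(w[1:], xi))
    z = max(-30.0, min(30.0, z))  # guard against overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, epochs=100, lr=0.5):
    """Fit a logistic regression by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xij in enumerate(xi):
                grad[j + 1] += err * xij
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def log_loss(w, X, y):
    """Mean negative log-likelihood on (X, y)."""
    eps = 1e-12
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(predict_proba(w, xi), eps), 1 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(y)


def cv_loss(rows, y, columns, k=5):
    """Mean held-out log loss over k folds, using only the given columns."""
    n = len(rows)
    losses = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        X_train = [[rows[i][c] for c in columns] for i in range(n) if i not in test_idx]
        y_train = [y[i] for i in range(n) if i not in test_idx]
        X_test = [[rows[i][c] for c in columns] for i in sorted(test_idx)]
        y_test = [y[i] for i in sorted(test_idx)]
        w = fit_logistic(X_train, y_train)
        losses.append(log_loss(w, X_test, y_test))
    return sum(losses) / k


def select_bins(rows, y, bins, k=5):
    """Introduce bins one at a time; keep a bin only if the CV loss improves."""
    selected, cols = [], []
    best = cv_loss(rows, y, cols, k)  # intercept-only starting point
    for name, bin_cols in bins.items():
        trial = cv_loss(rows, y, cols + bin_cols, k)
        if trial < best:
            selected.append(name)
            cols = cols + bin_cols
            best = trial
    return selected, best


# Synthetic toy data (not study data): three binary predictors, one of them pure noise.
random.seed(0)
rows, y = [], []
for _ in range(200):
    prior_arf, young, noise = (random.randint(0, 1) for _ in range(3))
    p = 0.1 + 0.6 * prior_arf + 0.2 * young
    rows.append([prior_arf, young, noise])
    y.append(1 if random.random() < p else 0)

bins = {"history": [0], "demographics": [1], "noise": [2]}  # hypothetical bins
baseline = cv_loss(rows, y, [], k=5)
selected, best = select_bins(rows, y, bins, k=5)
print(selected, round(best, 3), round(baseline, 3))
```

In this sketch a bin is retained only when it lowers the cross-validated loss relative to the bins already kept, so the order in which bins are offered matters. A mixed-model version would replace `fit_logistic` with a fitter that also estimates group-level random effects.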
Conclusion:
This prediction algorithm will allow us to credibly quantify the burden of RHD at a national level, which is a key prerequisite for guiding national efforts to end RHD within a generation.