
Chair: Dr Menelaos Pavlou
Mission and objectives
Prediction models can support clinical decision-making by providing individual predictions for patients. For example, a model predicting the risk of sudden death in patients with hypertrophic cardiomyopathy uses predictors such as age and medical history to guide implantation of cardioverter defibrillators.
The predictive performance of a model depends on how informative the predictors are, and how well their relationship with the outcome is estimated. Recent guidance recommends calculating development sample size to ensure that expected performance in terms of calibration and other measures meets prespecified targets.
Simple formulae have been proposed for linear, logistic and Cox regression.
In this session we will discuss a framework for calculating the sample size required to develop reliable prediction models and demonstrate how the calculations can be performed with standard software. We will briefly discuss related topics such as the use of Machine Learning techniques for model fitting, the use of data-driven predictor selection for the final model and the handling of missing data when developing a model.
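As an illustration of the kind of closed-form calculation mentioned above, the sketch below implements one widely cited published criterion (Riley et al., 2019), which chooses the sample size so that the expected uniform shrinkage factor meets a target such as 0.9. It is not necessarily the formula or software that will be presented in the session. The function name, argument names and example inputs are hypothetical.

```python
import math


def min_sample_size_shrinkage(n_params, r2_cs, target_shrinkage=0.9):
    """Minimum development sample size under a shrinkage-based criterion.

    Assumed formula (Riley et al., 2019, criterion 1):
        n = p / ((S - 1) * ln(1 - R2_CS / S))
    where p is the number of candidate predictor parameters, R2_CS is the
    anticipated Cox-Snell R-squared of the model and S is the target
    expected shrinkage factor.
    """
    if n_params <= 0:
        raise ValueError("n_params must be positive")
    if not 0 < target_shrinkage < 1:
        raise ValueError("target_shrinkage must lie strictly between 0 and 1")
    if not 0 < r2_cs < target_shrinkage:
        raise ValueError("r2_cs must lie strictly between 0 and target_shrinkage")

    # Both factors in the denominator are negative, so n is positive.
    n = n_params / (
        (target_shrinkage - 1) * math.log(1 - r2_cs / target_shrinkage)
    )
    # Round up so the target is met rather than narrowly missed.
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical inputs: 20 predictor parameters, anticipated R2_CS of 0.1.
    print(min_sample_size_shrinkage(n_params=20, r2_cs=0.1))
```

The required sample size grows with the number of predictor parameters and falls as the anticipated model strength increases, which is why obtaining plausible values for these inputs is a central part of the calculation.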
Aims of the session:
- Introduce prediction modelling and measures for assessing predictive performance (model validation)
- Introduce a general framework for calculating the sample size to develop reliable prediction models
- Demonstrate the use of standard software to calculate sample size for traditional regression methods (fitted with maximum likelihood)
- Present a case study
- Discuss the following additional topics (depending on duration)
  - Challenges and extensions to more complex modelling techniques
  - Data-driven selection of predictors
  - Handling of missing predictor values when developing, validating or implementing models
Intended Audience
The session will be relevant to statisticians, epidemiologists, public health researchers, clinicians, and policymakers. It will include an introduction to predictive modelling and, beyond a general knowledge of regression modelling (e.g. logistic regression), will not require advanced statistical skills.
A previous workshop on similar topics was presented to an audience with diverse backgrounds (see below). We acknowledge that participants with different backgrounds may be interested in different aspects of the presented methods. The use of breakout rooms will enable the organisers to engage with different subgroups of participants.
Expected outcomes
By the end of the session participants will be able to:
- Understand what factors (inputs) affect the sample size requirements for a given model and how to obtain information about these inputs
- Use standard software to calculate the minimum sample size to develop a reliable prediction model
- Present these calculations when writing a paper or applying for a grant
- Recognise advantages and limitations of complex modelling approaches for model development
Agenda (provisional)
- Introduction to prediction modelling
- A framework for sample size calculations for the development of risk models
- Demonstration of software for sample size calculations
- Case study from cardiology: a clinician’s perspective
- Additional topics: Machine Learning, predictor selection, missing data
Previous iterations
A previous version of the proposed event was presented as part of a Global Engagement Grant from UCL (MP and GA were co-applicants), in collaboration with Bolu Abant Izzet Baysal University, Turkey.
Date and location: 16th-17th April 2024, Bolu Abant Izzet Baysal University, Turkey
Number of participants: 40
Enhancements for the current session: In the current iteration we plan to present recent developments and software updates that were not available at the previous iteration.
Speakers

Department of Statistical Science, University College London

Institute of Cardiovascular Health, University College London and Barts Heart Centre, St Bartholomew’s Hospital

Prof Gareth Ambler
Department of Statistical Science, University College London