Sample size requirements for developing accurate prediction models in health

Chair: Dr Menelaos Pavlou

Mission and objectives

Prediction models can support clinical decision-making by providing individual predictions for patients. For example, a model that predicts the risk of sudden death in patients with hypertrophic cardiomyopathy uses predictors such as age and medical history to guide decisions about implanting cardioverter defibrillators.

The predictive performance of a model depends on how informative its predictors are and on how well their relationship with the outcome is estimated. Recent guidance recommends calculating the sample size required for model development, so that the model's expected performance (in terms of calibration and other measures) meets prespecified targets.

Simple formulae have been proposed for models fitted with linear, logistic and Cox regression.

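For a binary outcome, these formulae can be sketched in Python. This is a minimal sketch, assuming the three criteria used in recent guidance for logistic regression (and implemented, for example, in the pmsampsize package for R and Stata): an expected shrinkage of predictor effects of at least 0.9, an optimism in the apparent Nagelkerke R² of at most 0.05, and the overall risk estimated to within ±0.05. The function names are hypothetical, and the inputs (number of candidate predictor parameters p, anticipated Cox-Snell R², outcome prevalence) must be obtained from previous studies or expert opinion. It is not the software demonstrated in the session.

```python
import math


def max_r2_cs(prevalence: float) -> float:
    """Largest Cox-Snell R^2 attainable for a binary outcome with this prevalence."""
    phi = prevalence
    # Log-likelihood per individual of the intercept-only (null) logistic model
    ln_l_null = phi * math.log(phi) + (1 - phi) * math.log(1 - phi)
    return 1 - math.exp(2 * ln_l_null)


def n_for_shrinkage(p: int, r2_cs: float, shrinkage: float) -> float:
    """Sample size at which the expected uniform shrinkage factor equals `shrinkage`."""
    return p / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage))


def min_sample_size_binary(p: int, r2_cs: float, prevalence: float,
                           target_shrinkage: float = 0.9,
                           r2_optimism: float = 0.05,
                           risk_margin: float = 0.05) -> dict:
    """Hypothetical helper: minimum development sample size for a logistic model."""
    if not 0 < r2_cs < max_r2_cs(prevalence):
        raise ValueError("anticipated R2_CS must lie between 0 and its maximum")
    # (i) expected shrinkage of predictor effects >= target_shrinkage
    n1 = n_for_shrinkage(p, r2_cs, target_shrinkage)
    # (ii) optimism in apparent Nagelkerke R^2 <= r2_optimism; this implies a
    #      required shrinkage s2, which is fed back into the same formula
    s2 = r2_cs / (r2_cs + r2_optimism * max_r2_cs(prevalence))
    n2 = n_for_shrinkage(p, r2_cs, s2)
    # (iii) overall risk (model intercept) estimated within +/- risk_margin (95% CI)
    n3 = (1.96 / risk_margin) ** 2 * prevalence * (1 - prevalence)
    sizes = {"shrinkage": math.ceil(n1),
             "optimism": math.ceil(n2),
             "overall_risk": math.ceil(n3)}
    # The required sample size is the largest of the three criteria
    sizes["required"] = max(sizes.values())
    sizes["events"] = math.ceil(sizes["required"] * prevalence)
    return sizes
```

For example, with 10 candidate parameters, an anticipated Cox-Snell R² of 0.2 and a prevalence of 0.5, `min_sample_size_binary(10, 0.2, 0.5)` gives 398, 234 and 385 participants for the three criteria, so the shrinkage criterion drives the requirement of 398 participants (199 events).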
In this session we will discuss a framework for calculating the sample size required to develop reliable prediction models and demonstrate how the calculations can be performed with standard software. We will briefly discuss related topics such as the use of Machine Learning techniques for model fitting, the use of data-driven predictor selection for the final model and the handling of missing data when developing a model.

Aims of the session:

Introduce prediction modelling and measures for assessing predictive performance (model validation)
Introduce a general framework for calculating the sample size to develop reliable prediction models
Demonstrate the use of standard software to calculate the sample size for models fitted with traditional regression methods (maximum likelihood)
Present a case study
Discuss the following additional topics (depending on the length of the session)
- Challenges and extensions to more complex modelling techniques
- Data-driven selection of predictors
- Handling of missing predictor values when developing, validating or
  implementing models
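Model validation typically reports discrimination and calibration. As an illustration only, two simple measures can be computed from predicted risks and observed binary outcomes: the C-statistic (discrimination) and the observed/expected (O/E) ratio (calibration-in-the-large). The function names are hypothetical; other measures, such as the calibration slope, which requires fitting a logistic regression of the outcome on the model's linear predictor, are not shown.

```python
from itertools import product


def c_statistic(risks, outcomes):
    """Proportion of event/non-event pairs in which the event has the higher
    predicted risk (tied risks count as 1/2)."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = sum(1.0 if e > n else (0.5 if e == n else 0.0)
                for e, n in product(events, non_events))
    return score / (len(events) * len(non_events))


def observed_expected_ratio(risks, outcomes):
    """Calibration-in-the-large: observed events divided by the sum of predicted risks."""
    return sum(outcomes) / sum(risks)
```

With risks `[0.1, 0.2, 0.4, 0.8]` and outcomes `[0, 1, 0, 1]`, three of the four event/non-event pairs are correctly ordered, giving a C-statistic of 0.75; the O/E ratio is 2/1.5 ≈ 1.33, i.e. the model under-predicts the overall risk in this toy sample.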

Intended Audience

The session will be relevant to statisticians, epidemiologists, public health researchers, clinicians and policymakers. It will include an introduction to prediction modelling and, beyond a general knowledge of regression modelling (e.g. logistic regression), will not require advanced statistical skills.

A previous workshop on similar topics was presented to an audience with diverse backgrounds (see below). We acknowledge that participants with different backgrounds may be interested in different aspects of the methods presented. The use of breakout rooms will enable the organisers to engage with different subgroups of participants.

Expected outcomes

By the end of the session participants will be able to:

Understand what factors (inputs) affect the sample size requirements for a given model and how to obtain information about these inputs
Use standard software to calculate the minimum sample size to develop a reliable prediction model
Present these calculations when writing a paper or applying for a grant
Recognise advantages and limitations of complex modelling approaches for model development

Agenda (provisional)

Introduction to prediction modelling
A framework for sample size calculations for the development of risk models
Demonstration of software for sample size calculations
Case study from cardiology: a clinician’s perspective
Additional topics: Machine Learning, predictor selection, missing data

Previous iterations

A previous version of the proposed event was presented as part of a Global Engagement Grant from UCL (MP and GA were co-applicants), in collaboration with Bolu Abant Izzet Baysal University, Turkey.

Date and location: 16-17 April 2024, Bolu Abant Izzet Baysal University, Turkey.

Number of participants: 40

Enhancements for the current session: in this iteration we plan to present recent developments and software updates that were not available at the time of the previous workshop.

Speakers

Dr Menelaos Pavlou (Chair)

Department of Statistical Science, University College London

Dr Athanasios Bakalakos

Institute of Cardiovascular Health, University College London and Barts Heart Centre, St Bartholomew’s Hospital

Prof Gareth Ambler

Department of Statistical Science, University College London

~~17 Dec~~ 21 Dec 2025	Workshop/ Tutorial/ Panel proposal: Submission EXTENDED
13 Jan 2026	Workshop/ Tutorial/ Panel proposal: Notification
~~21 Jan~~ 28 Jan 2026	Main track: Submission EXTENDED
~~11 Feb~~ 18 Feb 2026	PhD/MSc Students Track: Submission EXTENDED
~~11 Feb~~ 18 Feb 2026	Posters & Demos Track: Submission EXTENDED
~~11 Feb~~ 4 Mar 2026	Workshop CFP: Submission EXTENDED
11 Mar 2026	Innovation Award: Submission
18 Mar 2026	Persona-Based Workshop CFP: Submission
25 Mar 2026	Main Track: Notification
25 Mar 2026	Posters & Demos Track: Notification
25 Mar 2026	PhD/MSc Student Track: Notification
25 Mar 2026	Workshop CFP: Notification
27 Mar 2026	Innovation Award: Notification
30 April 2026	Early bird registration
7 May 2026	All Tracks: Camera-ready
24 May 2026	Cut-off date for room allotment (in hotel venue)
~~25 Mar~~ 2 June 2026	Innovation Award: Submission EXTENDED
23 June 2026	Pre-Conference
24 – 26 June 2026	Main Conference