Interesting piece from Paul Aylin in this week's BMJ showing that simple may be as good as complex.

The issue is predicting mortality after an operation. Clinicians understandably take the view that vast numbers of factors influence the patient's chances of surviving an operation. Hence they feel the need for bespoke clinical databases, with loads of data fields. Aylin shows that the simple data available in the national Hospital Episode Statistics can be used to do as good a job, at least for the four operations he studied: coronary artery bypass graft, repair of abdominal aortic aneurysm (elective and emergency), and colorectal excision for cancer.

Reminds me of an irresistible paper called 'The robust beauty of improper linear models' by Robyn Dawes. Originally published in American Psychologist 1979; 34: 571-582, and reprinted in an excellent book called Judgment under Uncertainty (Kahneman D, Slovic P, Tversky A, eds. Cambridge University Press 1982; costs about £35 in paperback).

Dawes makes the point (proved beyond doubt by other papers in the book) that human beings are particularly useless at combining information. So we may know the important factors in predicting an outcome, but we can't put them together. Given that reality, any numerical rule of thumb (e.g. just add them up) is better than 'expert judgement'. His original research asked tutors to rate their students. Tutors were pretty hopeless; they couldn't predict which students would be successful. But they knew the important factors were ability and achievement. Simply adding up the student's Graduate Record Examination score (ability) and his or her grade point average (achievement) gave a far better prediction than the tutor's 'expert' judgment.

Incidentally, this sort of thinking underpinned the recent MTAS system, which was doomed by crass implementation rather than by the theory behind it. Interviews are the worst possible way to select people.

Limitation - this argument really only applies to numerical outcomes. But that includes, for example, mortality and survival rates.

(Just so you know - a 'proper' linear model is one which uses linear regression or similar to get the best possible weights (regression coefficients) for each predictor to predict the overall outcome. An improper model uses any old weights, e.g. weighting the predictors equally.)
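The contrast is easy to see in a few lines of code. Below is a minimal sketch, on invented data in the spirit of Dawes's tutor example (the variable names, weights and noise level are my assumptions, not from the paper): a 'proper' model fits least-squares weights, an 'improper' one just adds the standardised predictors up, and the two usually predict the outcome about equally well.

```python
# Sketch: proper (least-squares) vs improper (equal-weight) linear models
# on simulated data. All numbers here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 200
ability = rng.normal(size=n)      # stand-in for a standardised GRE score
achievement = rng.normal(size=n)  # stand-in for a standardised grade point average
# Assumed true relationship plus noise:
outcome = 0.6 * ability + 0.4 * achievement + rng.normal(scale=1.0, size=n)

X = np.column_stack([ability, achievement])

# Proper model: best-possible weights from least-squares regression
beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
proper = X @ beta

# Improper model: 'any old weights' - here, just add the predictors up
improper = ability + achievement

def corr(a, b):
    """Pearson correlation between prediction and outcome."""
    return np.corrcoef(a, b)[0, 1]

print(f"proper   r = {corr(proper, outcome):.2f}")
print(f"improper r = {corr(improper, outcome):.2f}")
# The two correlations come out very close: equal weights lose little.
```

The point of the sketch is that once the predictors are standardised and point the right way, the exact weights barely matter; almost all the predictive power comes from combining the factors consistently at all.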