The discoveries of epidemiology and the new frontiers

Author:

Paolo Vineis


Date of publication: 02 January 2025
Last update: 02 January 2025

Abstract

Epidemiology has led to the identification of the main risk factors for chronic diseases, thus becoming the science at the root of prevention and public health. Among its most important contributions are the discovery of the risks associated with tobacco smoking, occupational carcinogens (such as asbestos, aromatic amines, some heavy metals, and ionizing radiation), and other environmental and occupational exposures; the health consequences of a diet poor in fruits, vegetables, and dietary fiber, or too rich in salt, red meat, and saturated fats; and the many health consequences of increased body weight. It is widely held among scientists that a substantial reduction in these exposures would reduce deaths from the main degenerative diseases, including cancer, by at least 30–40%. Contemporary cohort studies have incorporated new technologies such as genome-wide association studies, epigenetics, metabolomics, proteomics, and transcriptomics. This has been made possible by technological advances, falling costs, and applicability to large populations. In addition, major progress has been made in the management and statistical analysis of big data, including methods that merge biostatistics with artificial intelligence.

 

The origins of modern epidemiology

Epidemiological methods have greatly contributed to the identification of risk factors for diseases, including cancer. From its beginnings in the first half of the 19th century, epidemiology was closely tied to society, both because statistical data and concepts were introduced to improve public health, and through epidemiology’s connection with the birth of modern social science (e.g. with Adolphe Quetelet, 1796–1874). Statistical tables of mortality and morbidity showed how poverty, crowding, and working conditions were related to health. Along with John Snow (1813–1858), William Farr (1807–1883) and William Budd (1811–1880) were the most important founders of the epidemiological school, and their work contributed to the creation of the London Epidemiological Society in 1850. Farr, in particular, had studied medicine and statistics and had been a student of Malthus. He developed a taxonomy of diseases, hypothesized a link between disease and urban poverty, and worked on the concept of “natural law” applied to medicine. According to Farr, the expected course of an epidemic could be predicted and modified by planned interventions such as vaccination. In the same way, the general health of the population could be improved by fighting poverty-related diseases through better food and decent housing conditions (Farr 1834).

In Farr’s day, the two main causal hypotheses about infectious diseases involved air and water as vehicles of spread. John Snow designed the famous study from which the role of contaminated Thames water emerged. He compared the mortality rates in several areas of London, each receiving water from a different water company, and concluded that in one particular area (Broad Street) the outbreak was more severe because the water company that supplied it pumped water from a polluted section of the Thames (Snow, 1855; Vandenbroucke 2001).

Bacteriology set the stage for causal attribution to agents of disease, based upon the following theoretical and practical premises: (1) the persistence of the microscopic agent in the passage from one host to another, as shown by Louis Pasteur (1822–1895) for anthrax and rabies (Pasteur 1880, 1885); (2) the demonstration by Robert Koch that microorganisms could be isolated from patients and maintained in culture; and (3) the demonstration that the agent was able (as a “sufficient cause”) to induce the disease if inoculated into a new organism. These observations, later known as Koch’s postulates for causal attribution, are at the basis of modern conceptions of causality.

By the second half of the 19th century, medicine already incorporated many components of the statistical, ecological, and social approaches that led to modern epidemiology. A figure who represents the multifaceted nature of medicine in that era, and its interdisciplinarity, is the German Rudolph Virchow (1821–1902). Virchow combined the roles of pathologist (he founded cellular pathology), epidemiologist, and supporter of community medicine. It was Virchow, an innovator of microscopic medicine, who also believed that medicine was a social science. In this sense his work encompassed both the microscopic and the macroscopic levels. Virchow’s main work (1858), largely based on the concept of inflammation, represented one of the pillars of modern cancer research. Both the link between the micro level (mechanisms) and the macro level (populations) and the emphasis on inflammation are again at centre stage in today’s cancer research. However, his immediate followers abandoned the original specific identification of inflammation as the main manifestation of cancer and developed a different classification of tumors.

Epidemiology in the 20th Century

After the efforts of Pierre-Charles-Alexandre Louis (1787–1872), Quetelet, and others, a quantitative method was eventually adopted in 19th-century medicine. The models used by Farr and then by Hamer and Brownlee (the latter describing the spread of mumps in London) were deterministic: they were based on mathematical laws that were supposed to explain health events the way Kepler’s laws explained the movements of planets. It soon became clear, however, that mortality rates did not behave like planets and that epidemic curves did not have the desired predictive power. Some scholars recognized the need for probabilistic models, but these were not developed until the following generation, by statisticians addressing chronic disease.

The methodology of epidemiological studies in the 20th century was influenced both by the development of mathematical methods used to describe large epidemics and by the “epidemiological transition,” in which infectious diseases were replaced as the leading causes of death by chronic, degenerative diseases (cardiovascular, cerebrovascular, neurological diseases, diabetes, cancer).

Epidemiology has allowed us to identify several risk factors for chronic diseases, thus becoming the science at the root of prevention and public health. Among the most important contributions of epidemiology are the discovery of the risks associated with tobacco smoking, occupational carcinogens (such as asbestos, aromatic amines, some heavy metals, and ionizing radiation), and other environmental and occupational exposures; the health consequences of dietary habits such as a diet poor in fruits, vegetables, and dietary fiber, or too rich in salt, red meat, and saturated fats; and the health impact of excess weight (to mention just a few). It is widely held among scientists that a substantial reduction in these exposures would reduce deaths from the main degenerative diseases, including cancer, by at least 30–40% (Vineis and Wild, 2014). This result has been made possible in particular by improvements in the conceptual basis of study designs (case-control and cohort studies), in exposure measurement, and in the statistical treatment of data.

Important work in evaluating and integrating the evidence on carcinogenic hazards to humans is done by the International Agency for Research on Cancer (IARC). IARC Group 1 carcinogens are substances or exposure circumstances that have been classified as carcinogenic to humans because Working Groups of experts found “sufficient evidence” of carcinogenicity. This category includes several viruses and parasites; behaviors such as tobacco smoking and the consumption of alcoholic beverages; chemical exposures such as benzene, asbestos, and aflatoxins; particulate matter from air pollution; and others. The IARC Monographs have become an essential tool in the primary prevention of cancer in many countries. Complementary to this programme is another initiative led by IARC, the European Code Against Cancer (ECAC), promoted by the European Commission (see History of cancer prevention in Oncopedia).

All these recommendations derive from strong and well-designed epidemiological studies, thanks to developments that began in the 1950s. A pivotal study was the Framingham Heart Study, which from 1948 collected information from a population of 5,200 residents of a town close to Boston. The Framingham study continues to provide important information on chronic diseases today, but by current standards it is a very small study (population cohorts such as EPIC and UK Biobank have enrolled half a million volunteers each, for example). The great season of population-based longitudinal studies was essentially based on information collected through interviews or questionnaires; a typical example is the British Doctors Study, initiated in 1951 by Sir Richard Doll (1912–2005) and Austin Bradford Hill (1897–1991), which was the world’s first large prospective study of the effects of smoking. Exceptions were occupational cohorts in which measurements of exposure were available, and the recent wave of studies on the effects of air pollution, which included refined methods of exposure assessment based on routine monitoring data and georeferencing technologies.

Causality assessment is not straightforward in epidemiology. There has been much discussion of the “pyramid of evidence”, i.e. how to weigh different types of studies and quality of evidence in assessing causality. Though the prevailing view has been that the randomized controlled trial is the gold standard (i.e. experiments similar to those conducted with drugs), the reality is more nuanced, since for ethical reasons such trials usually cannot be conducted with suspected hazardous exposures. Moreover, there is no tenable objection to the carcinogenicity of tobacco smoke in spite of the lack of experimental trials, since the observational evidence is consistent and overwhelming.

Recent decades have seen an acceleration in methodological development aimed at overcoming the limitations of previous studies. Some exposures, such as diet, are inherently difficult to investigate, and this has led not only to improved dietary interviews but also to the large-scale introduction of advanced laboratory methods. Contemporary cohort studies have included new technologies such as genome-wide association studies, epigenetics, metabolomics (the study of the totality of metabolites in a biological sample), proteomics, and transcriptomics. This has been made possible by technological advances, falling costs, and applicability to large populations (Vineis et al, 2020). In addition, major progress has been made in the management and statistical analysis of big data, including methods that merge biostatistics with artificial intelligence (e.g. machine learning, neural networks) (Chadeau-Hyam et al, 2013).

Such improvements have yielded, for example, predictive models of how global warming will influence the incidence of malaria and other infectious diseases; they have highlighted the influence of socioeconomic processes on the course of human diseases; and they have made it possible to explore gene-environment interactions, epigenetics-based age acceleration, and more sophisticated approaches to molecular cancer epidemiology.

It is the analysis of the connections between the two levels, biological mechanisms and the study of populations, that will allow us to identify the critical entry points for effective social and clinical interventions in disease. The new wave of epidemiology includes the approach known as the “exposome”, which investigates the multiple, life-course influences of the environment on the onset of diseases.

References

Chadeau-Hyam M, Campanella G, Jombart T, Bottolo L, Portengen L, Vineis P, Liquet B, Vermeulen RC. Deciphering the complex: methodological overview of statistical models to derive OMICS-based biomarkers. Environ Mol Mutagen. 2013 Aug;54(7):542-57. doi: 10.1002/em.21797.

Farr, W. 1834. Vital statistics, or the statistics of health, sickness, disease and deaths. In A statistical account of the British Empire, ed. J. R. McCulloch. London: C. Knight.

Snow, J. 1855. On the mode of communication of cholera, 2nd ed. London: John Churchill.

Vandenbroucke, J. P. 2001. Changing images of John Snow in the history of epidemiology. Soz Praventivmed 46(5):288–93.

Vineis P, Wild CP. Global cancer patterns: causes and prevention. Lancet. 2014 Feb 8;383(9916):549-57. doi: 10.1016/S0140-6736(13)62224-2.

Vineis P, Robinson O, Chadeau-Hyam M, Dehghan A, Mudway I, Dagnino S. What is new in the exposome? Environ Int. 2020 Oct;143:105887. doi: 10.1016/j.envint.2020.105887.

Virchow, R. 1858. Die Cellularpathologie in ihrer Begründung auf physiologische und pathologische Gewebelehre. Berlin: Hirschwald.

 

1807-1883

(William Farr) - The introduction of regular medical statistics

1813-1858

(John Snow) - The discovery of the origins of cholera in London through geographic mapping

1821-1902

(Rudolph Virchow) - From the cellular basis of disease to medicine as a social science

1912-2005

(Richard Doll) - The British Doctors Study: the first large cohort study providing evidence on the effects of tobacco

1897-1991

(Austin Bradford Hill) - Laid the foundations for causal reasoning in the medical sciences

1948-today

(Framingham Heart Study) - The first population study on the epidemiology of risk factors for heart disease