A better understanding of the LongITools data sets

The LongITools consortium has access to a large resource of life-course data, including prospective birth cohort studies and longitudinal studies in adults, register-based cohorts, randomised controlled trials, patient databases and maternity and hospital biobanks. These LongITools data sets include variables on over 11 million EU citizens from 24 different studies. They will be used to help understand how the environment and lifestyle that people are exposed to, combined with their biology and genetics, contribute to the risk of certain diseases.

Map of studies participating in LongITools

For full names and a comprehensive list of all studies being used in the LongITools project, please visit the About page and see LongITools Data.

Types of data sets explained


A cohort of people is a group who have something in common. Cohort studies are a type of longitudinal study – an approach that follows research participants over a period of time (often many years). The ten birth cohorts within the project will provide substantial longitudinal data from pregnancy to adolescence and early adulthood, complemented by six prospective adult cohorts, with multiple follow-ups during adulthood and in older age.

One such study being used in LongITools is the Northern Finland Birth Cohort (NFBC) 1966 study, which recruited all mothers in Oulu and Lapland, the two northernmost provinces of Finland, with expected dates of delivery in the year 1966. In total, 12,068 mothers were included, giving birth to 12,058 live-born offspring (5,889 girls and 6,169 boys, of whom 314 were twins). Offspring were followed up prospectively at ages 1, 14, 31, 46 and 54 years. The latest follow-up was completed in June 2021 to assess the effect of the COVID-19 pandemic. The offspring have thus been followed from early pregnancy into adulthood.

Another example, CONSTANCES, is a general-purpose population-based adult cohort. It includes a nationally representative, randomly selected sample of 200,000 French adults aged between 18 and 69.


A biobank is a collection of biological samples linked to health information. Samples of bodily fluid or tissue, e.g. blood, are collected for research use to improve understanding of health and disease. The biobanks involved in LongITools will be essential for generalising the analyses to large populations. We are using data from two biobanks, one being the UK Biobank, a large-scale biomedical database and research resource containing in-depth genetic and health information. Since 2006, UK Biobank has collected biological and medical data on half a million people, aged between 40 and 69 years old and living in the UK, as part of a large-scale prospective study. Participants regularly provide blood, urine and saliva samples, as well as detailed information about their lifestyle, which is then linked to their health-related records to provide a deeper understanding of how individuals experience diseases.

Randomised controlled trials (RCTs)

RCTs are prospective studies following participants forward in time. RCTs measure the effectiveness of a new intervention or treatment. In RCTs, participants are recruited and randomly assigned to two or more groups. One group (the experimental group) has the intervention being tested, the other (the comparison or control group) has an alternative intervention, a dummy intervention (placebo) or no intervention at all. The RCTs involved in the LongITools project are focused on the role of nutrition and physical activity in general health and metabolism. These will not only provide comprehensive biological and exposome profiles of study participants but will also allow an in-depth analysis in more controlled settings.
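The random assignment step described above can be illustrated with a short sketch. This is a minimal example of simple randomisation into groups of near-equal size, not the allocation procedure used by any LongITools study; the participant identifiers, group names and the `randomise` function are hypothetical.

```python
import random


def randomise(participant_ids, groups=("intervention", "control"), seed=None):
    """Randomly allocate participants to groups of near-equal size."""
    rng = random.Random(seed)          # a fixed seed makes the allocation reproducible
    shuffled = list(participant_ids)   # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    allocation = {group: [] for group in groups}
    # Deal the shuffled participants out to the groups in turn.
    for index, participant in enumerate(shuffled):
        allocation[groups[index % len(groups)]].append(participant)
    return allocation


# Hypothetical participant identifiers, for illustration only.
participants = [f"P{n:03d}" for n in range(1, 11)]
allocation = randomise(participants, seed=42)
for group, members in allocation.items():
    print(group, sorted(members))
```

Because the order is shuffled before participants are dealt out, neither the researchers nor the participants choose who ends up in which group, which is what allows the groups to be compared fairly.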

The Elipa study used in LongITools recruited men and women, aged 30-65 years, living with obesity. The study, which started in May 2008 and ended in February 2010, consisted of weight-loss and weight-maintenance periods. During the weight-loss period, all 99 participants received a very low-calorie diet for 7 weeks. During the 24-week weight-maintenance period, participants were randomly allocated to two groups to consume, as part of their weight-management diet, foods with either higher or lower satiety values. The study aimed to identify factors associated with weight management, especially whether the satiety value of food as part of a weight-maintenance diet would affect self-regulation of food intake and weight management.

Case-control study (or retrospective study)

These studies compare a group of patients with a disease or outcome of interest (cases) with a group of people who do not have the disease or outcome (controls). In these studies, researchers compare how frequently the exposure to a risk factor is present in each group and from there determine the relationship between the risk factor and the disease. In LongITools, we are using data from The Finnish Gestational Diabetes (FinnGeDi) Study, a multicentre study of 2,200 Finnish women who gave birth in 2009–12, as well as their children and the children’s fathers. The study, focused on gestational diabetes, consisted of two arms: a prospective clinical, genetic case-control arm and a national register-based arm which also includes data on children’s siblings and grandparents.
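The comparison of exposure frequency between cases and controls described above is commonly summarised as an odds ratio. The sketch below shows that calculation under stated assumptions: the `odds_ratio` function is hypothetical and the counts are made up for illustration, not taken from FinnGeDi or any other study.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    if unexposed_cases == 0 or exposed_controls == 0:
        raise ValueError("odds ratio is undefined when a denominator count is zero")
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


# Made-up counts: 40 of 100 cases and 20 of 100 controls were exposed.
result = odds_ratio(
    exposed_cases=40,
    unexposed_cases=60,
    exposed_controls=20,
    unexposed_controls=80,
)
print(f"Odds ratio: {result:.2f}")
```

An odds ratio above 1 suggests the exposure is more common among cases than controls, a value near 1 suggests no association, and a value below 1 suggests the exposure is less common among cases.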

Health or occupational registries

A registry is a collection of information about individuals, usually focused on a specific diagnosis or condition. National registries can help government officials, health practitioners, and clinical researchers answer a variety of critical questions and are useful for better planning and regulation of healthcare delivery at a national level. They can help to track trends and are used for health analysis, health statistics, improving the quality of healthcare, research, administration and emergency preparedness.

The register data used in LongITools include data from the Nordic Children and Adults Born Preterm (NORDCAP) study. The NORDCAP study focuses on how preterm birth and maternal pregnancy disorders predict the health, wellbeing and societal achievements of the offspring, and includes all children born in 1987-2016 in Finland (approximately 1.7 million individuals). The NORDCAP data are based on the Finnish Medical Birth Register and are also linked with several other registers, such as the Population Information System and the Finnish Care Register for Health Care.

Environmental data

Altogether, the studies involved in LongITools cover the whole life course and include longitudinal birth cohorts and ageing cohorts from the same geographical location, which enables us to study the changing environment and its association with cardiovascular and metabolic health. In addition to the studies, the LongITools project also accesses environmental and exposure data, for example fine-scale traffic and land-use data, pan-European noise exposure models, satellite-based indices of greenness, and geographical information system data to model the built environment.

The data collected in the studies at different time points are summarised below and are available in more detail in the project’s profile paper: Dynamic longitudinal exposome trajectories in cardiovascular and metabolic noncommunicable disease.


Find out about the metadata catalogue being developed in the LongITools project.