120 Years of Olympic Summer Games History
Introduction
This analysis was completed as the final project for Fundamentals of Data Analytics (using R), a course in my Master of Science in Applied Data Analytics at Boston University's Metropolitan College.
The dataset is available on Kaggle but was originally created by scraping data from www.sports-reference.com. It lists all Winter and Summer Games from 1896 to 2016 and includes athlete names, physical characteristics (sex, age, height, weight), country/team, host city of each Games, sport/event, and medal won (if any).
A few notes about data processing: rows including NA values have been removed. This analysis focuses on the Summer Games and considers sports as a whole (e.g., “swimming” rather than the 100m freestyle or 800m relay). An athlete who competed in four Games is listed in four separate rows in the dataset. Similarly, an athlete is listed in multiple rows for competing in multiple events or sports in the same Olympics. For example, Michael Phelps is listed 30 times: he competed in multiple events at each of 5 Summer Games.
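The processing steps described above can be sketched in base R. This is a minimal illustration rather than the code used for the analysis: the data frame below is a made-up miniature of the Kaggle file, the column names (`Name`, `Sex`, `Age`, `Year`, `Season`, `Sport`, `Event`) are assumptions, and collapsing events into one row per athlete, Games, and sport is only one possible reading of "considers sports as a whole".

```r
# Hypothetical miniature of the dataset; names and values are invented
athletes <- data.frame(
  Name   = c("A", "A", "A", "B", "C", "D"),
  Sex    = c("M", "M", "M", "F", "F", "M"),
  Age    = c(23, 23, 27, 19, NA, 30),
  Year   = c(2008, 2008, 2012, 2012, 2012, 2014),
  Season = c("Summer", "Summer", "Summer", "Summer", "Summer", "Winter"),
  Sport  = c("Swimming", "Swimming", "Swimming", "Gymnastics", "Fencing", "Biathlon"),
  Event  = c("100m Freestyle", "200m Butterfly", "100m Freestyle",
             "Floor", "Foil", "Sprint"),
  stringsAsFactors = FALSE
)

# Keep Summer Games rows that have no missing values in the columns of interest
keep   <- athletes$Season == "Summer" & complete.cases(athletes[, c("Sex", "Age")])
summer <- athletes[keep, ]

# Assumption: drop the Event column and de-duplicate, so that an athlete
# appears once per Games and sport instead of once per event
by_sport <- unique(summer[, c("Name", "Sex", "Age", "Year", "Sport")])

nrow(summer)    # rows at the event level
nrow(by_sport)  # rows at the sport level
```

Checking only selected columns with `complete.cases()` matters in practice: a column that is empty for most athletes (for example, medal won) would otherwise remove most of the data.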
To see the R code I used to generate these visualizations, click here.
Basic info about the athletes and their sports
Athletes have competed in 52 different sports over the history of the Summer Games. Some sports appeared only briefly, while others have withstood the test of time. The 5 sports that have appeared in all 29 Summer Games are Athletics, Cycling, Fencing, Gymnastics, and Swimming (hereafter called the "Top Sports").
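One way to find the sports that appear at every Games is to count the distinct Games per sport and compare that count with the total number of Games. The sketch below uses invented data and assumed column names (`Year`, `Sport`); it shows the technique, not the actual result.

```r
# Hypothetical data: one row per (Games year, sport) pairing
games <- data.frame(
  Year  = c(1896, 1896, 1900, 1900, 1900, 1904, 1904),
  Sport = c("Athletics", "Tennis", "Athletics", "Golf", "Tennis",
            "Athletics", "Golf"),
  stringsAsFactors = FALSE
)

# Total number of distinct Games and distinct sports in the data
n_games  <- length(unique(games$Year))
n_sports <- length(unique(games$Sport))

# Number of distinct Games in which each sport appears
games_per_sport <- tapply(games$Year, games$Sport,
                          function(y) length(unique(y)))

# Sports whose count equals the total number of Games
sport_names  <- names(games_per_sport)
ever_present <- sport_names[as.vector(games_per_sport) == n_games]
ever_present
```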
The average Summer Games athlete is 26 years old. This has varied over the years. There are now minimum ages, so the youngest athlete in the dataset (age 10) would no longer be eligible to compete. The types of sports have also changed: past Olympics included “art competitions”, which allowed for much older competitors (the maximum age in the dataset is 97).
Athletes do trend young, but older competitors pull the average age higher.
AGE AND GENDER
Splitting out athletes by gender, we see that there’s a considerably different age distribution for women as compared to men.
It's not quite as simple as saying that women compete at a younger average age (24) than men (26). In fact, far fewer women than men have competed over the history of the Summer Olympics. The number of athletes has grown over the years, but the biggest increases have been for women. The average ages are becoming more similar as the numbers of men and women equalize (since 2000, the mean age of women has increased to 25, while the mean age of men has stayed at 26). The number of American women competing in the Olympics also appears to be closely tied to the passage of Title IX in 1972.
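The comparison above combines three summaries: mean age by sex over the whole dataset, mean age by sex since 2000, and the number of athletes by sex per Games. A minimal base R sketch follows; the values are invented and the column names (`Sex`, `Age`, `Year`) are assumptions.

```r
# Hypothetical athlete rows; values are invented for illustration
athletes <- data.frame(
  Sex  = c("F", "F", "M", "M", "F", "M"),
  Age  = c(20, 22, 26, 28, 26, 24),
  Year = c(1976, 1984, 1976, 1984, 2004, 2004),
  stringsAsFactors = FALSE
)

# Mean age by sex across all Games
overall <- tapply(athletes$Age, athletes$Sex, mean)

# Mean age by sex, restricted to Games held in 2000 or later
recent_rows <- athletes[athletes$Year >= 2000, ]
recent      <- tapply(recent_rows$Age, recent_rows$Sex, mean)

# Number of athletes by Games year and sex, to see participation grow
counts <- table(athletes$Year, athletes$Sex)

overall
recent
counts
```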
The type of sport also influences whether an athlete can compete at such a high level at a young age. Focusing specifically on the Top Sports, we do in fact see that different sports have different age distributions.
Fencing has the highest average age (29) and also the largest range of ages. This is likely because it is somewhat less physically demanding and therefore allows a broader range of participation.
Swimming has the lowest average age (21) as well as the smallest range of ages. This is likely because swimming is extremely physically demanding. Unlike Gymnastics (which is the next youngest of the Top Sports), swimming requires extra breathing and lung capacity in addition to strength and endurance.
The youngest of the Top Sports are Gymnastics (23) and Swimming (21). These are also sports in which high school and college students commonly compete, likely because schools often have well-developed support systems for them. By contrast, Fencing and Cycling generally do not have as robust a support network in schools, so athletes in those sports may need more time to find the resources to keep competing. In addition, men's Gymnastics tends to be based more on feats of strength (such as the rings event) than women's Gymnastics. The latter, while also requiring exceptional strength and endurance, has historically focused more on grace and form.
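The per-sport figures discussed in this section (average age and spread of ages) can be computed with one grouped summary per statistic. The sketch below uses invented ages and assumed column names (`Sport`, `Age`); "range" here means maximum age minus minimum age, which is an assumption about how the spread was measured.

```r
# Hypothetical ages for two of the Top Sports; values are invented
athletes <- data.frame(
  Sport = c("Fencing", "Fencing", "Fencing",
            "Swimming", "Swimming", "Swimming"),
  Age   = c(22, 29, 36, 19, 21, 23),
  stringsAsFactors = FALSE
)

# Mean age and age range (max minus min) for each sport
mean_age  <- tapply(athletes$Age, athletes$Sport, mean)
age_range <- tapply(athletes$Age, athletes$Sport,
                    function(a) max(a) - min(a))

# Combine into one table, youngest sport first
age_summary <- data.frame(
  Sport     = names(mean_age),
  mean_age  = as.vector(mean_age),
  age_range = as.vector(age_range),
  stringsAsFactors = FALSE
)
age_summary <- age_summary[order(age_summary$mean_age), ]
age_summary
```

Splitting the same summary by sex as well (for example, grouping on `list(athletes$Sport, athletes$Sex)`) would be a natural extension for the men's and women's Gymnastics comparison.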
AGE AND CAREER LENGTH
The next question we might ask is what are the downstream effects of these age distributions? Does starting the Olympics at a younger age mean a longer career? Or faster burnout?
We can see that there is no direct relationship between length of career and starting age. The athletes who have competed in the most Olympics did not start younger than those who competed in fewer Olympics. But is this because the rules of the Olympics used to be different? The sports have changed considerably since the days when "Art competitions" was an event. Does this trend hold true for the modern era of the Olympics? To answer that, I looked specifically at athletes who began their Olympic career in 1970 or later (athletes who began in the 1960s and continued into the 1970s are not included).
It turns out that the trends are very similar to the overall dataset. We can conclude that starting age does not have a strong effect on the career length of an Olympic athlete.
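The career-length comparison can be sketched by reducing the data to one row per athlete (first Games year, age at first Games, number of Games entered) and then filtering on the first Games year. The data and column names (`Name`, `Age`, `Year`) below are invented, and using a correlation coefficient as the measure of the relationship is an assumption for illustration; the original analysis relied on visualizations.

```r
# Hypothetical appearances: one row per athlete per Games
appearances <- data.frame(
  Name = c("A", "A", "A", "B", "C", "C", "D", "D"),
  Age  = c(20, 24, 28, 25, 18, 22, 30, 34),
  Year = c(1972, 1976, 1980, 1984, 1968, 1972, 1988, 1992),
  stringsAsFactors = FALSE
)

# Per-athlete summaries; tapply orders all three results by athlete name
first_year <- tapply(appearances$Year, appearances$Name, min)
start_age  <- tapply(appearances$Age, appearances$Name, min)
n_games    <- tapply(appearances$Year, appearances$Name,
                     function(y) length(unique(y)))

careers <- data.frame(
  Name       = names(first_year),
  first_year = as.vector(first_year),
  start_age  = as.vector(start_age),
  n_games    = as.vector(n_games),
  stringsAsFactors = FALSE
)

# Modern era: keep only athletes whose first Games was in 1970 or later,
# so an athlete who started in 1968 is excluded even if they competed in 1972
modern <- careers[careers$first_year >= 1970, ]

# A value near zero would suggest little linear relationship
age_career_cor <- cor(modern$start_age, modern$n_games)
age_career_cor
```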
Conclusion
Through our exploration of this Olympic data, we see that the average male athlete is older than the average female athlete. Many factors play into this, particularly the athlete's sport. Since pure strength may be a larger success factor in men's events than in women's, men may qualify for the Olympics at a slightly older age; it takes longer to build strength. Sports focusing more on endurance (such as Athletics) have a more similar age distribution by gender. In addition, the creation of Title IX gave American women more opportunity to compete in sports, which appears in turn to have increased both the number of women in the Olympics and the age at which they compete.