Sex ratio is the clearest indicator of bias in the baby names dataset


I've written before about how the U.S. Social Security baby names dataset, despite being trotted out by plenty of commercial websites aimed at parents, needs to be taken with a grain of salt, and with a whole shaker of salt for years before the 1930s. This is just about the clearest graphical demonstration I've come up with.

It's impossible to quantify the racial makeup of the dataset, but since only certain occupations were covered by Social Security at first, and those excluded most of the occupations open to black men and women (for example, day labor and domestic work), it's safe to say the dataset is severely unbalanced in that regard as well.

Despite having an extensive work history in biology, I never knew that more male babies are born than female babies, a phenomenon seen worldwide (exacerbated, unfortunately, by sex-selective abortions in some regions).
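For anyone who wants to check the ratio themselves, here is a minimal Python sketch of the calculation. It is not necessarily how the chart was made. It assumes rows in the `name,sex,count` layout used by the national `yobYYYY.txt` files; the function name `sex_ratio` is hypothetical and the sample rows are made up for illustration, not real counts. Since roughly 1.05 boys are born for every girl, a yearly ratio far from that suggests the records for that year are skewed.

```python
import csv
from collections import defaultdict


def sex_ratio(rows):
    """Return recorded males per recorded female from (name, sex, count) rows."""
    totals = defaultdict(int)
    for row in rows:
        if not row:  # skip blank lines
            continue
        _name, sex, count = row
        totals[sex] += int(count)
    if totals["F"] == 0:
        raise ValueError("no female records to compare against")
    return totals["M"] / totals["F"]


# Made-up rows in the assumed name,sex,count layout (NOT real data).
sample = """Mary,F,700
Anna,F,300
John,M,400
William,M,100
"""

ratio = sex_ratio(csv.reader(sample.splitlines()))
print(f"{ratio:.2f} males per female")  # prints "0.50 males per female"
```

To run it on a real file, open it with `open(path, newline="")` and pass `csv.reader(f)` to the same function, once per year.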

I've updated my previous Tableau Public storyboard on the limitations of the Social Security dataset to include this tidbit.
