Via Sullivan, people are for some reason talking about the book Who’s Bigger? Where Historical Figures Really Rank in ways that don’t involve eye-rolling and laughter. The authors, Steven Skiena and Charles Ward, are trying to quantify historical significance by looking at how frequently a particular historical figure or concept is mentioned. This is apparently not a measure of “importance” or “impact,” but rather something like “fame” or at least the degree to which people are still talking about that figure or concept, which they call “significance” because calling it “fame” doesn’t quite capture what they’re measuring (and makes the whole exercise sound kind of dumb). This definition of “significance” could approximate “impact,” but if you can imagine an obscure but important historical figure then you’ve already spotted the flaw there. I’m not sure how many Americans would rank John Marshall, for example, among the 10 or 20 most important figures in US history, so his “significance” is questionable, but without the concept of judicial review I suspect our history would look considerably different. To measure “significance” they go, obviously, to Wikipedia. I mean, duh. Cass Sunstein explains:
Skiena and Ward compile this list by reference to what they see as five objective indicators, every one involving the English-language version of Wikipedia. (That is a big problem, and we will get to it in due course.) Their first two indicators draw on Google’s famous algorithm, called Page-Rank. Skiena and Ward contend that the pages of significant people end up getting a lot of links. If numerous Wikipedia pages end up linking to Abraham Lincoln, we have a clue that Lincoln was a major figure. With this point in mind, Skiena and Ward ask: what is the probability that a random Wikipedia page will link to a particular person’s page? The higher the probability, the more significant that person’s page.
Skiena and Ward are aware that you might come to Jesus (so to speak) not through surfing pages that involve people, but because Jesus’ page gets a lot of links from pages that involve institutions, animals, and inanimate objects. By the Page-Rank method, for example, Carl Linnaeus, the great scientist of classification, ends up third on their all-time list, which seems pretty absurd. Owing to this problem, they add a second measure, which limits the PageRank analysis to links among people. With this measure, Carl Linnaeus’s ranking plummets. (Jesus does great.)
For their third measure, Skiena and Ward focus on the number of “hits” that Wikipedia pages receive. They note that this measure can produce dramatically different rankings from those that emerge from PageRank. Many entertainers, such as Justin Bieber and Taylor Swift, get a phenomenal number of hits, even though they do not do especially well on PageRank. Their fourth measure involves the length of Wikipedia articles. In their view, more significant people will tend to end up with longer articles, reflecting the magnitude of their contribution. Fifth, and finally, Skiena and Ward explore the sheer number of times that a page is edited. They think that if a lot of people are contributing to a page, there is a great deal of interest in it, and that interest tells us something about significance.
(Pay attention to that bit about every indicator involving the English-language version of Wikipedia.)
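For what it's worth, the two link-based measures Sunstein describes are easy enough to sketch. What follows is a toy power-iteration PageRank in Python, run once over every page and once over links among people only. It is not Skiena and Ward's code: the seven-page graph, the 0.85 damping factor, and the names `pagerank`, `links`, and `people` are all my own assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a baseline share from the "random jump".
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical toy graph, invented for this example (not real Wikipedia data).
links = {
    "Oak": ["Carl Linnaeus"],
    "Wolf": ["Carl Linnaeus"],
    "Honey bee": ["Carl Linnaeus"],
    "Carl Linnaeus": ["Charles Darwin", "Oak", "Wolf", "Honey bee"],
    "Charles Darwin": ["Carl Linnaeus", "Abraham Lincoln"],
    "Abraham Lincoln": ["George Washington", "Charles Darwin"],
    "George Washington": ["Abraham Lincoln"],
}
people = {"Carl Linnaeus", "Charles Darwin", "Abraham Lincoln", "George Washington"}

# Measure 1: PageRank over every page.
full_rank = pagerank(links)

# Measure 2: PageRank restricted to links from people to people.
people_links = {p: [t for t in links[p] if t in people] for p in people}
people_rank = pagerank(people_links)

for name in sorted(people, key=full_rank.get, reverse=True):
    print(f"{name:18s} all pages: {full_rank[name]:.3f}   people only: {people_rank[name]:.3f}")
```

On this made-up graph, Linnaeus tops the all-pages ranking because every species page points at him, and he falls behind Darwin and Lincoln once only person-to-person links count. That is the effect the second indicator is supposed to produce.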
These already seem silly, don’t they? Longer Wikipedia articles suggest a greater contribution? Eliminating the 491 accompanying footnotes, Taylor Swift’s Wikipedia entry clocks in at about twice as long as Susan B. Anthony’s entry. Anybody want to correlate length with “magnitude of contribution” there? Hell, the fact that Taylor Swift’s Wikipedia entry has four hundred ninety-one freaking footnotes ought to immediately expose the absurdity of this exercise. Skiena and Ward have tried to ameliorate this by employing a second, non-Wikipedia-based metric to incorporate expected longevity and avoid having their top 10 be Justin Bieber, LeBron James, Kanye, Ellen, Beyonce, Peyton Manning, and the four guys in One Direction. Sunstein:
To see how current fame will “decay,” Skiena and Ward consult Google Ngram, a fascinating source that shows how many times various words appear in millions of books. They contend that with the use of Ngram, it is possible to create a fairly reliable model of how significance falls over time, and they adjust their findings accordingly. With respect to particular people, Ngram displays a wide range of patterns. Paul Revere, John Lennon, Malcolm X, Karl Marx, and Vincent Van Gogh became famous, or at least far more famous, posthumously. By contrast, the references to some once-celebrated historical figures have fallen precipitously; these include Arthur Wellesley (the Duke of Wellington), the explorer John Franklin, and Napoleon II. Albert Einstein shows a pretty steady increase from 1915. Babe Ruth jumps from 1915 to 1949, then starts falling until 1968, only to enjoy steady increases since that time. Woodrow Wilson shows a high point in 1942, falls until 1982, and stays level from there.
Skiena and Ward are not interested only in what happens to specific people but also in the possibility of making general predictions about likely changes over time, and thus translating their “fame” measure into one of “significance.” It turns out that famous people tend to be most discussed about sixty or seventy years after they are born, and that there is a decline from that point—but that with the most famous people, discussion is reduced later in life, and also more slowly. (Jesus is the extreme case here.) The resulting statistical model allows them to make adjustments from current fame and thus to compute not only total significance rankings (producing the top-twenty list replicated above) but also rankings within fields.
OK, that’s at least better than relying just on Wikipedia, but what about that line I bolded above? It turns out that when you try to run a quantitative study of historical significance based on Wikipedia stature, which itself introduces all sorts of biases based on what populations are more likely to use and edit Wikipedia, and then consciously limit your study to the English-language version of Wikipedia, you might just possibly be doing terrible things to your results. Let’s hear from the authors themselves:
We don’t expect you will agree with everyone chosen for the top 100, or exactly where they are placed. But we trust you will agree that most selections are reasonable: a quarter of them are philosophers or major religious figures, plus eight scientists/inventors, thirteen giants in literature and music, and three of the greatest artists of all time. We have validated our results by comparing them against several standards: published rankings by historians, public polls, even in predicting the prices of autographs, paintings, and baseball cards. Since we analyzed the English Wikipedia, we admittedly measured the interests and judgments of primarily the Western, English-speaking community. Our algorithms also don’t include many women at the very top: Queen Elizabeth I (1533-1603) [at number 13] is the top ranked woman in history according to our analysis. This is at least partially due to women being underrepresented in Wikipedia.
So they get that their process is horribly biased, but went ahead and wrote a book about it anyway because, you know, they figured people would pay them money for it. Because there’s no way they looked at this top 25 and really thought this was a representative list of the most significant figures in human history:
- William Shakespeare
- Abraham Lincoln
- George Washington
- Adolf Hitler
- Alexander the Great
- Thomas Jefferson
- Henry VIII of England
- Charles Darwin
- Elizabeth I of England
- Karl Marx
- Julius Caesar
- Queen Victoria
- Martin Luther
- Joseph Stalin
- Albert Einstein
- Christopher Columbus
- Isaac Newton
- Theodore Roosevelt
- Wolfgang Amadeus Mozart
There’s a lot to nitpick about (I’d struggle to put a single American president in the top 50 most significant historical figures, yet here are four of them in the top 25 and three in the top ten, and how does Henry VIII rank higher than Martin Luther when Luther’s the one who started the Protestant Reformation that Henry then joined?), but what isn’t nitpicky is this: if you’re ranking the 25 most significant figures in world history, regardless of criteria or your own filter, and 24 of them are European, American, or the central religious figure for most Europeans and Americans, then you’re doing it wrong. The one guy on this list whose influence lies outside the Christian world is Muhammad, the founder of another religion that has been intimately linked with Christianity from its founding through the present day. So you’re 25 for 25 in listing figures who either are European in origin (white Americans included) or have been of paramount interest to Europeans. There’s no Buddha (he generously comes in at 52), no Confucius (outside the top 100), no Cai Lun (he only invented paper, no biggie), no Mao (but Stalin makes the top 25?). Genghis Khan is the next non-Euro on the list, at 38; I’m a big fan of Genghis Khan, but weren’t his accomplishments slightly more ephemeral than those of Qin Shi Huang, who founded the Chinese Empire? Just to show I’m not biased against Europeans, how does Teddy Roosevelt crack your top 25 while Johannes Gutenberg isn’t in the top 100, and you don’t immediately discard your list and rethink your research project?
But, really, Grover Cleveland is at 98. Skiena and Ward’s research methods determined that Grover Cleveland is the ninety-eighth most significant figure in world history. If that’s not your first clue that your study isn’t really doing what you wanted it to do, then I don’t know what to tell you. Hope the book sells well.