Who is Woombie? Woombie is my baby-to-be. He's going to be born soon. (Edit: he has been born!) Why Woombie? Because unborn children need provisional names, and he's the brother of our pet robot, Roombie.
We have a list of names we want to try out. Vote on which you like the best.
Aha! You're right! This project is also an exploration and tutorial in the world of ranking and statistics.
Real-world ranking systems face messy data: power users, trolls, incomplete comparisons. Real-world data analysis also faces the fundamental problem that the experimental design you're using might not be ideal. For instance, you might wish you had a metadata-aware matchup ranker, but instead you have an almost fully stochastic one.
(Note: by ranking system I mean "system that ranks people for leaderboards", not "content ranking in feeds". Confusing, I know.)
This app demonstrates how, and more importantly why, to build robust scoring algorithms that handle these challenges.
Specifically --
- With messy or imperfect data, different algorithms give different results -- even algorithms meant to account for that messiness
- Behold the power of regularization. Throw out a few rogue actors / outlier data points and things become a lot clearer
- Rather than put data into a black box algorithm and call it a day: interrogate the results!
Yes. Look at that leaderboard!
If we had millions of votes from many people, then we could use simple win % and be done with it. But we do not live in that world. We have an odd distribution of voters on a small sample of all overall possible matchups. We aren't in the realm of arithmetic. We're in the realm of ... statistics.
First off, how do we deal with power users? If Mary votes for Imri over Alon 10 times, then we probably don't want to overweight that. It's the same as voting for Imri over Alon ... once? 1.5 times?
But if Mary has voted 50 times for different matchups, and Lisa voted just once, does Lisa's opinion count just as much?
The first move I made was to merge, per user per matchup, all votes. So if Mary voted for Imri over Alon 3 times, flatten it to once.
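The flattening step can be sketched in a few lines. This is a toy version with invented voters and votes -- the real app's data model surely differs -- but it shows the idea: deduplicate on (voter, winner, loser) so repeat votes collapse to one.

```python
# Hypothetical raw vote log as (voter, winner, loser) tuples.
raw_votes = [
    ("Mary", "Imri", "Alon"),
    ("Mary", "Imri", "Alon"),
    ("Mary", "Imri", "Alon"),
    ("Lisa", "Alon", "Imri"),
]

def flatten_votes(votes):
    """Collapse each voter's repeated votes on the same matchup into one."""
    return sorted(set(votes))

flat = flatten_votes(raw_votes)
# Mary's three identical votes now count once; Lisa's single vote is unchanged.
```

Mary still gets more total influence than Lisa (she voted on more matchups), but she can no longer stuff the ballot box on any single matchup.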
What if a name was lucky, and matched up against awful names very often? We want to give a boost to names that unexpectedly beat strong contenders.
Imagine a person submitted a name, e.g. "X Æ A-Xii", and voted for it heavily (beating any other name). And imagine that few other people even got the chance to vote on it (since a new name, by design, takes a while to propagate to other voters). "X Æ A-Xii" would have a very high win percentage -- but almost all the votes would come from one person.
You'd think! But Elo assumes that order matters. It assumes that the names are players who can increase or decrease in skill over time. Nope!
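To see the order-dependence concretely, here's a toy Elo sketch (standard logistic expected score, K-factor 32; the names and match list are made up). Replaying the exact same results in reverse order produces different final ratings -- fine for chess players whose skill drifts, wrong for static names.

```python
def elo_update(r_winner, r_loser, k=32):
    """One standard Elo update: logistic expected score, K-factor k."""
    expected_win = 1 / (1 + 10 ** ((r_loser - r_winner) / 400))
    delta = k * (1 - expected_win)
    return r_winner + delta, r_loser - delta

def replay(matches, ratings):
    """Run a sequence of (winner, loser) matches through Elo, in order."""
    ratings = dict(ratings)
    for winner, loser in matches:
        ratings[winner], ratings[loser] = elo_update(ratings[winner], ratings[loser])
    return ratings

start = {"Imri": 1500, "Alon": 1500, "Omri": 1500}
matches = [("Imri", "Alon"), ("Alon", "Omri"), ("Omri", "Imri")]

forward = replay(matches, start)
backward = replay(list(reversed(matches)), start)
# Same match results, different order -> different final ratings.
```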
This isn't an exhaustive explanation of Elo, but it's enough to see why it's the wrong tool here.
Turns out, there are algorithms out there designed to find truth in a sea of untrusted actors. The most famous one is PageRank.
For instance: "if a strong node (name) votes for another node (loses to another name), take that seriously"
- A name is a domain.
- A link is an "I lost to them".
- Links from domains with low PageRank don't matter much.
- Links from domains with high PageRank matter a lot.
PageRank -- more useful than just for web search!
Plus -- eigenvector centrality: a more general version of PageRank.
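The mapping above can be sketched as a small power-iteration PageRank over a "lost to" graph. The names, results, and damping factor here are invented for illustration -- the app's real graph is bigger and messier -- but the mechanics are the standard ones.

```python
# Hypothetical "lost to" graph: each name maps to the names it lost to.
# A loss is a link, so ranking strength flows from losers to winners.
lost_to = {
    "Imri": ["Omri"],
    "Alon": ["Imri", "Omri"],
    "Omri": [],
    "Lior": [],
}

def pagerank(links, damping=0.85, iters=100):
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - damping) / n for v in nodes}
        for loser, winners in links.items():
            if winners:
                # Split the loser's rank evenly among everyone who beat it.
                share = damping * rank[loser] / len(winners)
                for winner in winners:
                    new[winner] += share
            else:
                # Dangling node (never lost): spread its rank evenly.
                for v in nodes:
                    new[v] += damping * rank[loser] / n
        rank = new
    return rank

ranks = pagerank(lost_to)
# Omri, who beat both Imri and Alon, should collect the most rank.
```

Note the nice property: beating Alon (who himself beat someone) is worth more than beating a name nobody has voted on.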
What we thought Elo did, Bradley-Terry rankings actually do: deduce underlying strength from pairwise comparisons, with no dependence on match order. "If you win unexpectedly against a strong opponent, that matters more."
Yes! Between algorithms, and between checkboxes (include names with fewer than two voters, show user-added names), we get different results. (And you should have seen the results from e.g. Elo before I realized it was unsuited.)
Data is important, analyzing it is helpful, and the data sense to interrogate the problem is necessary -- but at the end of the day, decisions should be informed by data, not mandated by it.
In the end, we chose Omri. :-)