Who is Woombie? Woombie is my baby-to-be. He's going to be born soon. (Edit: he has been born!) Why Woombie? Because unborn children need provisional names, and he's the brother of our pet robot, Roombie.
We have a list of names we want to try out. Vote on which you like the best.
Aha! You're right! This project is also an exploration and tutorial in the world of ranking and statistics.
Real-world ranking systems face messy data: power users, trolls, incomplete comparisons. Real-world data analysis also faces the fundamental problem that the experimental design you're using might not be ideal. For instance, you might wish you had a metadata-aware matchup ranker, but instead you have an almost fully stochastic one.
(Note: by ranking system I mean "system that ranks people for leaderboards", not "content ranking in feeds". Confusing, I know.)
This app demonstrates how, and more importantly why, to build robust scoring algorithms that handle these challenges.
Specifically --
- With messy or imperfect data, different algorithms give different results -- even algorithms meant to account for that messiness
- Behold the power of regularization. Throw out a few rogue actors / outlier data points and things become a lot clearer
- Rather than put data into a black box algorithm and call it a day: interrogate the results!
Yes. Look at that leaderboard!
If we had millions of votes from many people, then we could use simple win % and be done with it. But we do not live in that world. We have an odd distribution of voters on a small sample of all overall possible matchups. We aren't in the realm of arithmetic. We're in the realm of ... statistics.
First off, how do we deal with power users? If Mary votes for Imri over Alon 10 times, then we probably don't want to overweight that. It's the same as voting for Imri over Alon ... once? 1.5 times?
But if Mary has voted 50 times for different matchups, and Lisa voted just once, does Lisa's opinion count just as much?
The first move I made was to merge, per user per matchup, all votes. So if Mary voted for Imri over Alon 3 times, flatten it to once.
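The flattening step can be sketched in a few lines. This is a toy version with invented voters and votes -- the real app's data model surely differs -- but it shows the idea: deduplicate on (voter, winner, loser) so repeat votes collapse to one.

```python
# Hypothetical raw vote log as (voter, winner, loser) tuples.
raw_votes = [
    ("Mary", "Imri", "Alon"),
    ("Mary", "Imri", "Alon"),
    ("Mary", "Imri", "Alon"),
    ("Lisa", "Alon", "Imri"),
]

def flatten_votes(votes):
    """Collapse each voter's repeated votes on the same matchup into one."""
    return sorted(set(votes))

flat = flatten_votes(raw_votes)
# Mary's three identical votes now count once; Lisa's single vote is unchanged.
```

Mary still gets more total influence than Lisa (she voted on more matchups), but she can no longer stuff the ballot box on any single matchup.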
What if a name was lucky, and matched up against awful names very often? We want to give a boost to names that unexpectedly beat strong contenders.
Imagine a person submitted a name, e.g. "X Æ A-Xii", and voted for it heavily (beating any other name). And imagine that few other people even got the chance to vote on it (since a new name, by design, takes a while to propagate to other voters). "X Æ A-Xii" would have a very high win percentage -- but almost all the votes would come from one person.
You'd think! But Elo assumes that order matters. It assumes that the names are players who can increase or decrease in skill over time. Nope!
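To see the order-dependence concretely, here's a toy Elo sketch (standard logistic expected score, K-factor 32; the names and match list are made up). Replaying the exact same results in reverse order produces different final ratings -- fine for chess players whose skill drifts, wrong for static names.

```python
def elo_update(r_winner, r_loser, k=32):
    """One standard Elo update: logistic expected score, K-factor k."""
    expected_win = 1 / (1 + 10 ** ((r_loser - r_winner) / 400))
    delta = k * (1 - expected_win)
    return r_winner + delta, r_loser - delta

def replay(matches, ratings):
    """Run a sequence of (winner, loser) matches through Elo, in order."""
    ratings = dict(ratings)
    for winner, loser in matches:
        ratings[winner], ratings[loser] = elo_update(ratings[winner], ratings[loser])
    return ratings

start = {"Imri": 1500, "Alon": 1500, "Omri": 1500}
matches = [("Imri", "Alon"), ("Alon", "Omri"), ("Omri", "Imri")]

forward = replay(matches, start)
backward = replay(list(reversed(matches)), start)
# Same match results, different order -> different final ratings.
```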
This isn't an exhaustive explanation of Elo, but it's enough to see why it's the wrong tool here.
Turns out, there are algorithms out there designed to find truth in a sea of untrusted actors. The most famous one is PageRank.
For instance: "if a strong node (name) votes for another node (loses to another name), take that seriously"
- A name is a domain.
- A link is an "I lost to them".
- Links from domains with low PageRank don't matter much.
- Links from domains with high PageRank matter a lot.
PageRank -- more useful than just for web search!
Plus -- eigenvector centrality: a more general version of PageRank.
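The mapping above can be sketched as a small power-iteration PageRank over a "lost to" graph. The names, results, and damping factor here are invented for illustration -- the app's real graph is bigger and messier -- but the mechanics are the standard ones.

```python
# Hypothetical "lost to" graph: each name maps to the names it lost to.
# A loss is a link, so ranking strength flows from losers to winners.
lost_to = {
    "Imri": ["Omri"],
    "Alon": ["Imri", "Omri"],
    "Omri": [],
    "Lior": [],
}

def pagerank(links, damping=0.85, iters=100):
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - damping) / n for v in nodes}
        for loser, winners in links.items():
            if winners:
                # Split the loser's rank evenly among everyone who beat it.
                share = damping * rank[loser] / len(winners)
                for winner in winners:
                    new[winner] += share
            else:
                # Dangling node (never lost): spread its rank evenly.
                for v in nodes:
                    new[v] += damping * rank[loser] / n
        rank = new
    return rank

ranks = pagerank(lost_to)
# Omri, who beat both Imri and Alon, should collect the most rank.
```

Note the nice property: beating Alon (who himself beat someone) is worth more than beating a name nobody has voted on.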
What we thought Elo did, Bradley-Terry rankings actually do: deduce underlying strength from pairwise comparisons, with no dependence on match order. "If you win unexpectedly against a strong opponent, that matters more."
Yes! Between algorithms, and between checkboxes (include names with fewer than two voters, show user-added names), we get different results. (And you should have seen the results from e.g. Elo before I realized it was unsuited.)
Data is important, analyzing it is helpful, and the data sense to interrogate the problem is necessary -- but at the end of the day, decisions should be informed by data, not mandated by it.
In the end, we chose Omri. :-)