Sunday 21 September 2008

Democracy of Online Communities

I wonder why it is that men are always considered good with tools and machines, and women are dubbed less competent in that area. It surprises me, because some of the most interesting questions I ever get about technology come from women...

At least when it's about online communities.

Carla was wondering how search engines work, so I explained to her that search engines rank results according to clicks, or to be exact, according to how many times the link related to the search keyword is viewed. She said that that was "not right!". Amused, and foreseeing that I would have to explain the Google PageRank algorithm, I asked why. Well, we were eating pistachios at the time, and she said, "Well, what if someone makes a website on how to plant pistachios, and another person writes a novel about pistachios? The novel about pistachios will probably be viewed more, but that would be unfair to people who are searching for how to plant pistachios."

True, but.

The web works much like a democracy: give people what they want. Is Obama the most qualified man to be US president? That's not the point! Whether or not he is the best man for the job is irrelevant in a democracy. The person the people choose will become president, because he got the most votes. No one is asking questions, right? It's the same thing with search engines and online communities. It's a democracy, not a meritocracy, and that is the only way it can actually work. Why? Because if, for example, Tim Berners-Lee, the dude who invented the World Wide Web, were to decide whether "How to plant pistachios" should be the first hit on a search engine, then he would just be a dictator. The internet, because it's a "user-driven" medium, lets its users decide on the search engine's number one hit. So, if the website about the pistachio novel is viewed more than the one about planting pistachios, then it will rank higher. If it gets viewed more, then it's probably more important. More precisely, the search algorithm tries to "approximate" the best possible match for what you are looking for.

Here is another example. Say I search for the word "human", which is a very common English word. Probably one out of every three websites has the word "human" in it, which means millions of pages. So how can the search engine differentiate? Well, if out of these millions of pages the website about the "Human Genome Project" is the one that gets the most hits, then if you are searching for the word "human", there is a high probability that you are looking for the HGP. Machines cannot, and will never be able to, know exactly what you want (Turing impossible), because they can only work with whatever you type into the search bar. That is their boundary. So the machine has to make an educated guess, and the best guess it can make is based on its previous "learning", i.e. which websites get viewed in relation to this string of text? It's just a mathematical equation: give the person searching the results that most users usually want to see. Whether it's right or wrong is completely irrelevant.

Just like Obama. He is popular, he is charismatic, people like him, he will (probably) win.
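For the curious, the "most votes wins" idea can be sketched in a few lines of Python. This is only a toy illustration of the click-counting picture described above, not how any real search engine is implemented. The click log and the function name `rank_by_clicks` are made up for the example.

```python
from collections import defaultdict

# Hypothetical click log: (search keyword, page the user ended up viewing).
# The entries are invented purely for illustration.
click_log = [
    ("human", "Human Genome Project"),
    ("human", "Human Genome Project"),
    ("human", "Some other page about humans"),
    ("pistachio", "A novel about pistachios"),
    ("pistachio", "A novel about pistachios"),
    ("pistachio", "A novel about pistachios"),
    ("pistachio", "How to plant pistachios"),
]


def rank_by_clicks(log, keyword):
    """Return the pages viewed for `keyword`, most-viewed first."""
    counts = defaultdict(int)
    for kw, page in log:
        if kw == keyword:
            counts[page] += 1  # every view is one "vote"
    # Sort by vote count (descending); break ties alphabetically.
    return sorted(counts, key=lambda page: (-counts[page], page))


print(rank_by_clicks(click_log, "pistachio"))
print(rank_by_clicks(click_log, "human"))
```

With this made-up log, the novel outranks the planting guide simply because it collected more views, which is exactly Carla's complaint.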

...Of course, technically speaking, this is not quite how the PageRank algorithm works, because PageRank relies on something called "inbound links": a page is ranked based on how many other pages link to it. Both this and the number of viewers I explained above are why Wikipedia articles are usually among the top hits: they have a lot of inbound links AND they are viewed a lot. It kinda boils down to the same "democracy" principle (although, again, not quite, because since it's all machines and computers, this system can be "hacked" to get your website as the top hit, but Google's "Don't Be Evil" philosophy comes into play to protect this democracy - and their business model :P).
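Here is a rough sketch of the inbound-link idea, using the textbook simplified form of PageRank: each page repeatedly shares its score among the pages it links to. The tiny link graph, the page names, the damping value of 0.85 and the iteration count are all assumptions chosen for the example, and the real thing is certainly far more involved.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated score-passing (power iteration).

    `links` maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal scores

    for _ in range(iterations):
        # Every page keeps a small baseline score regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical mini-web: three pages link to "wikipedia".
links = {
    "planting-guide": ["wikipedia"],
    "novel": ["wikipedia"],
    "blog": ["wikipedia", "planting-guide"],
    "wikipedia": ["planting-guide"],
}

ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, "wikipedia" comes out on top because the most pages link to it, and "planting-guide" follows closely because the top page links back to it. Links work like votes, and a vote from a popular page counts for more.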

I was just explaining how the internet in the third millennium works, and saving my friend the trouble of reading von Hippel's book Democratizing Innovation.
