Monday, March 5, 2012

Website Ideas

This post is a catalog of some ideas for useful websites that I'd like to see created. (Edited from an email I sent on the singularity list.)

  1. Knowledge Wiki: A repository of data and hypotheses, which keeps track of the amount of support for various scientific hypotheses based on trusted knowledge. This would ideally grow out of the existing Linked Data network. We could see in real time the current amount of support for controversial hypotheses such as global warming. The Bayesian logic would ideally be subtle enough to take into account relationships between different hypotheses, uncertainty about measurements, models of the observation bias (that is, Bayesian models of the processes by which data gets published to the repository), and many other difficulties which may be hard to foresee. Thus, it could start small (perhaps a "knowledge wiki" which connected linked data to a statistical analysis tool such as R to create public analysis capability) but would ultimately be a big research project requiring us to overcome dozens of problems in data analysis. (Note, however, that it does not require extremely strong artificial intelligence: the user is required to input the supporting arguments and many aspects of the statistical analysis, so the reasoning is relatively non-automatic.) Each user can customize the set of trusted assumptions to personalize the resulting degree of belief.
  2. Open-Source Education: This is an idea my brother was pursuing some time ago. The idea is to create high-quality drilling software with community-contributed questions. This basically means expanding Khan Academy to all areas of education via crowdsourcing. However, the software should automatically select a mixture of problems based on its knowledge of your knowledge, according to a spaced repetition equation (google it). This functionality is embodied in systems like Anki. A combination of the right features could (I think) help to induce a flow-like state, making learning more addictive and fun. If this became popular, hiring decisions could rely on these scores more than grades, since the system naturally accumulates extremely accurate representations of a person's ability (though like grades, this would exclude people skills, creativity, and other important factors). If it worked, this would revolutionize the way we learn.
  3. Online Job Markets: The creation of robust online job markets similar to Mechanical Turk, but capable of supporting any kind of intellectual labor. This is happening slowly, but so far it is concentrated in low-skill, low-pay areas. Encompassing high-skill and high-pay areas has the potential to create a much more efficient economy for intellectual labor, since it would reduce downtime spent searching for jobs, increase the number of options easily visible to both sides, et cetera.
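The core bookkeeping behind the knowledge wiki idea can be illustrated with a toy model. This is a minimal sketch, not a design: it assumes each piece of evidence carries a single likelihood ratio and that pieces of evidence are conditionally independent given the hypothesis (a naive-Bayes simplification that ignores the relationships between hypotheses and the observation-bias modeling described above). The `Evidence` class, its `source` tags, and `degree_of_belief` are hypothetical names introduced for illustration. The user's set of trusted sources decides which evidence counts toward the resulting degree of belief.

```python
import math
from dataclasses import dataclass


@dataclass
class Evidence:
    # Hypothetical record: where the evidence came from, and how much more
    # likely the observation is if the hypothesis is true than if it is false.
    source: str
    likelihood_ratio: float  # P(observation | H) / P(observation | not H)


def degree_of_belief(prior, evidence, trusted_sources):
    """Posterior probability of H, counting only evidence from trusted sources.

    Assumes the pieces of evidence are conditionally independent given H,
    so their log likelihood ratios simply add in log-odds space.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for item in evidence:
        if item.source in trusted_sources:
            log_odds += math.log(item.likelihood_ratio)
    return 1.0 / (1.0 + math.exp(-log_odds))


evidence = [
    Evidence("peer-reviewed", 4.0),
    Evidence("industry-funded", 0.5),
]
# Two users with different trust settings reach different conclusions
# from the same data, and the assumptions behind each answer are explicit.
strict = degree_of_belief(0.5, evidence, {"peer-reviewed"})
permissive = degree_of_belief(0.5, evidence, {"peer-reviewed", "industry-funded"})
print(f"{strict:.3f} {permissive:.3f}")
```

Working in log-odds keeps the update a simple sum, which also makes it easy to show a user exactly which trusted items contributed how much to the final number.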
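The post doesn't name a particular spaced repetition equation. One widely used example is the SuperMemo SM-2 algorithm, which Anki's scheduler was originally derived from (Anki uses a modified version). The sketch below follows the commonly cited SM-2 update rule: answers are self-graded from 0 to 5, intervals are in days, and an "easiness factor" stretches the interval after each successful review. The `Card` class and `review` function are illustrative names, not any real library's API.

```python
from dataclasses import dataclass


@dataclass
class Card:
    repetitions: int = 0     # consecutive successful reviews
    easiness: float = 2.5    # SM-2 easiness factor (never below 1.3)
    interval_days: int = 0   # days until the next review


def review(card: Card, quality: int) -> Card:
    """Apply one SM-2 update given a self-graded answer quality from 0 to 5."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")
    if quality >= 3:  # correct answer: lengthen the interval
        if card.repetitions == 0:
            interval = 1
        elif card.repetitions == 1:
            interval = 6
        else:
            interval = round(card.interval_days * card.easiness)
        repetitions = card.repetitions + 1
    else:  # failed recall: start the item over with short intervals
        repetitions = 0
        interval = 1
    # Easy answers raise the easiness factor slightly; hard ones lower it.
    penalty = 5 - quality
    easiness = card.easiness + (0.1 - penalty * (0.08 + penalty * 0.02))
    return Card(repetitions, max(1.3, easiness), interval)


card = Card()
schedule = []
for grade in [5, 5, 5, 2, 4]:
    card = review(card, grade)
    schedule.append(card.interval_days)
print(schedule)
```

Returning a new `Card` rather than mutating the old one keeps each review step easy to test; a real drilling system would also store review dates and choose which due cards to mix into a session.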


  1. The knowledge wiki sounds a bit like a Wikipedia that allows original research (with supportive evidence) and popularity voting. The main problem I see is that I can hire people to vote for what I believe and push my agenda to the top.

    I've spent some time thinking about open source education as well. This version sounds a bit like gamification, which has gained recent popularity. Khan Academy has done a bit of this with points and achievements. Someone still has to take a LOT of time to set this up. Also, not all knowledge can be taught by this method.

    With your online job market idea, there are freelancing sites out there that help out freelance web developers, writers, and such. I don't know how they regulate the reputation system to keep it honest.

    I like the ideas overall. The problem is keeping people honest as well as building up the sites in the first place. The ideas all require a fairly large infrastructure, even if you start small. That can get a bit costly.

  2. Matt,

    Good to hear from you.

    For the knowledge wiki, I was not thinking of anything like popularity voting. Conclusions should be based on evidence, not popular opinion. The problem would not be hiring people to vote, but hiring scientists to publish (as we see in some cases in the medical industry). Or, similarly, a company/group could perform a lot of experiments and then make information public in a biased way, publishing only the most successful trials. That is why I mention Bayesian models of observation bias-- I wasn't able to find a good online reference, but the techniques are well-known and can be found in textbooks (specifically Koller & Friedman). Unfortunately, to work well it needs a decent model of the publication process. (It is possible to learn the bias in the published data so that we can correct for it, but only if we start with a good guess.)

    So, I imagine that the website would have some default levels of trust (a core of strongly trusted stuff vetted by scientists, and then other categories based on different models of possible problems with the data) which could be altered by users based on personal beliefs. The defaults would be important, but ultimately how to interpret the available data is up to the user. Alternative interpretations which were significant in some way could be discussed publicly, perhaps leading to changes in the default. (Some of this kind of thing could be automated using Bayesian selection between alternative models.)

    In fact, in my mental image of this thing, classical logic would not even be a necessary assumption-- a person could set their account to prefer a nonclassical logic, and some arguments would be rejected accordingly. So, ideally, this could be a tool for philosophy as well (keeping track of possible ontologies). However, this would be far from the focus (so key features for this use-case might not get implemented quickly or might be implemented in a somewhat clumsy way).

  3. Keep in mind that a user could weigh the evidence in such a way that it always agrees with what they want it to prove. "Each user can customize the set of trusted assumptions to personalize the resulting degree of belief." That's all I meant by popularity. I suppose I misunderstood and thought that others could see what a user 'proved'.

  4. Fair enough. My point is that there would be no voting. Science is not a democracy, it's a dictatorship where everyone gets to be the dictator. :)

    It's true that you could always manipulate your prior to get whatever answer you wanted, but that's the Bayesian Way. Scientific conclusions are subjective, because we bring our prior knowledge into the judgement. However, if we take a look at someone's analysis of global warming and see that they set the prior probability to .000001 in their analysis, we can tell where their conclusion is coming from...

  5. (Of course, a prior of .000001 would still be washed out by even a little evidence... say, 20 bits. To ignore thousands of bits of evidence we would need to assign a prior probability with hundreds or thousands of zeroes before the 1...)

  6. What I mean to say is, yes, users could see each other's analyses. However, they would also be clearly shown what assumptions those conclusions relied on. (That would be a big part of the advantage.)
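The observation-bias correction and the Bayesian selection between alternative models mentioned in the second comment can be illustrated with a toy selection model. This is a minimal sketch under stated assumptions, not the method from any particular textbook: a trial succeeds with unknown probability θ; successful trials are published with probability `p_pub_success` and failed ones with probability `p_pub_failure` (both assumed known, which plays the role of the "good guess" about the publication process); and we only see published trials. Given that a trial was published, it is a success with probability θa / (θa + (1 − θ)b), where a and b are the two publication rates; that is the likelihood used below, with a uniform prior on θ evaluated on a grid. Comparing how well two publication models explain the same data gives a Bayes factor. All function names are hypothetical.

```python
def _published_success_prob(theta, p_pub_success, p_pub_failure):
    """P(success | published) under the assumed publication model."""
    published = theta * p_pub_success + (1.0 - theta) * p_pub_failure
    if published == 0.0:
        return None  # this theta could never produce a published trial
    return theta * p_pub_success / published


def _grid_weights(successes, failures, p_pub_success, p_pub_failure, grid_size):
    """Unnormalized posterior weights over a grid of theta values (uniform prior)."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    weights = []
    for theta in grid:
        q = _published_success_prob(theta, p_pub_success, p_pub_failure)
        weights.append(0.0 if q is None else q**successes * (1.0 - q) ** failures)
    return grid, weights


def posterior_mean_success_rate(successes, failures, p_pub_success, p_pub_failure,
                                grid_size=1001):
    """Posterior mean of the true success rate, corrected for selective publication."""
    grid, weights = _grid_weights(successes, failures, p_pub_success, p_pub_failure,
                                  grid_size)
    return sum(t * w for t, w in zip(grid, weights)) / sum(weights)


def bayes_factor(successes, failures, model_a, model_b, grid_size=1001):
    """Ratio of marginal likelihoods of the published data under two publication models.

    Each model is a (p_pub_success, p_pub_failure) pair. Both marginals are grid
    averages over the same uniform theta grid, so the ratio of sums is the ratio
    of the averages.
    """
    _, w_a = _grid_weights(successes, failures, model_a[0], model_a[1], grid_size)
    _, w_b = _grid_weights(successes, failures, model_b[0], model_b[1], grid_size)
    return sum(w_a) / sum(w_b)


unbiased = (1.0, 1.0)
selective = (1.0, 0.25)  # assumption: failures published a quarter as often
naive = posterior_mean_success_rate(8, 2, *unbiased)
corrected = posterior_mean_success_rate(8, 2, *selective)
print(f"naive {naive:.3f}, corrected {corrected:.3f}")
print(f"Bayes factor, selective vs unbiased: {bayes_factor(8, 2, selective, unbiased):.3f}")
```

With 8 published successes and 2 published failures, the unbiased model gives the familiar uniform-prior estimate of about 0.75, while assuming failures are under-published pulls the estimate well below that. The correction is only as good as the assumed publication rates, which is the caveat in the original comment.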
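The arithmetic behind the comments about priors is easiest in log-odds: one bit of evidence is a likelihood ratio of 2, so it doubles the odds, and the number of bits needed to move a prior p to even odds is log2((1 − p)/p). A small sketch of the calculation (the function names are illustrative):

```python
import math


def bits_to_even_odds(prior):
    """Bits of evidence (log2 likelihood ratio) needed to move `prior` to 50%."""
    return math.log2((1.0 - prior) / prior)


def posterior_after_bits(prior, bits):
    """Posterior probability after `bits` bits of evidence in favor of the hypothesis."""
    odds = prior / (1.0 - prior) * 2.0**bits
    return odds / (1.0 + odds)


def zeros_needed(bits):
    """Zeros after the decimal point in a prior of 2**-bits, roughly the prior
    that `bits` bits of evidence would only just bring to even odds."""
    return math.floor(bits * math.log10(2))


for bits in (15, 20):
    print(bits, round(posterior_after_bits(1e-6, bits), 3))
print(round(bits_to_even_odds(1e-6), 2))
print(zeros_needed(1000), zeros_needed(10000))
```

A prior of .000001 needs just under 20 bits to reach even odds: 15 bits leaves it at only about 3%, while 20 bits carries it past 50%. Resisting 1,000 bits would take a prior with about 300 zeros after the decimal point, and 10,000 bits about 3,000.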