First off, beware the pathological case. Figures don't lie, but liars figure. While it may seem reasonable to say "If a user has 6 writeups, 4 of which are at 0 rep and 2 of which are at a gazillion rep, what then?", it really isn't. You should be aware of that.

This isn't going to be a proposal for a whole new system. It's just a couple of architectural inputs, which I am sure have already been considered, but I haven't fed Klaproth in a long while.

Systematic downvoting
Beware of how attack voting will interact with the new system. A system based on MNFP could be painful (5 well-placed votes would cost me 300 MNFP; 46 would cost me 600 MNFP). One way to deal with this? Obscure the rankings of a user's writeups. If anyone but the user chooses "sort by highest/lowest rep first", randomly modify the reps by +/-2, then sort. This will still give a good indicator of the order of the writeups, without painting a large target on any one writeup.
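The jitter idea could look something like the sketch below. Everything in it is made up for illustration: the function name obscured_order, the viewer_is_owner flag and the plain list of reps say nothing about how the site actually stores or sorts writeups.

```python
import random

def obscured_order(reps, viewer_is_owner, jitter=2, rng=None):
    """Return writeup indices sorted highest-rep first.

    For anyone but the owner, each rep is nudged by a random integer
    in [-jitter, +jitter] before sorting (hypothetical helper).
    """
    rng = rng or random.Random()
    if viewer_is_owner:
        keys = list(reps)  # the owner sees the true order
    else:
        keys = [r + rng.randint(-jitter, jitter) for r in reps]
    return sorted(range(len(reps)), key=lambda i: keys[i], reverse=True)

# Writeups more than 2*jitter apart always keep their relative order;
# only near-neighbours (such as the ones around the median) get shuffled.
print(obscured_order([0, 10, 20, 30], viewer_is_owner=False))
```

The point of the design is that the coarse ordering survives while the exact position of the median writeup does not.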
Incentivize short, factual writeups. Incentivize making them longer.
Most factual noders have a string of one or two line writeups they'd love to be rid of, because they only have a rep of, say, 0. We don't feel like putting in the effort to make a writeup like simulive into a work of art, but deleting it would hurt the database. One idea: allow a noder to "cut loose" a writeup. Writeups which have been cut loose are transferred in ownership to an account like everything, where anyone can search for things to update (Everything Quest: Replace 10 cut-loose writeups with something significantly better). Okay, maybe that idea was better in my head.

Professor Pi's Comments:
  1. "Figures don't lie..." - It would indeed be unreasonable to calculate an "average" node reputation based on 6 nodes with reputations (0, 0, 0, 0, 500, 500). But the statistical representation becomes better for an increasing number of nodes: see the (unfortunately very short) writeup on the Law of Large Numbers. For a larger number of nodes, the distribution profile more closely resembles a curve centered around the average (i.e. it is highly unlikely that the noder will eventually end up with 40 writeups at rep=0 and 20 writeups at rep=500, or 400 and 200 writeups, respectively). The entry requirement of 25 writeups or more for the Honor Roll ensures that we are applying statistics to a sufficiently large sample.
  2. Systematic Downvoting - As I described in (2), I stepped away from calculating an average node reputation (solely) based on the median. The most promising alternative so far is the Interquartile Mean, as described by yerricde (1). Targeted downvoting (i.e. mass downvoting of writeups at or around the median) should only have a small effect, on the order of 1-2%. Mass downvoting of all writeups could drop the "average" reputation by approximately 20-30%. This is of course a significant amount, but note that (1) this should not affect too many noders, and mass downvoting is a problem under any system, including the current one, (2) noders are not penalized when their average reputation drops below the average, and (3) you can only vote on each writeup once. A mass downvoter can only do so much damage. Good noders will still see benefits of the Honor Roll, even if they are mass downvoted. Still, it is very unfortunate that we don't have better means to fight mass downvoting.
  3. Incentivize short, factual writeups. Incentivize making them longer. - Maybe there should be a place where one can submit improved nodes. I'm afraid there would be a high risk of abuse, though: "Node X went through the new nodes too fast. Let's just add some hardlinks and submit it to the Improvement List".

cordelia's later additions, not necessarily in reply: The current system can be described as:
To achieve level n you must have w>=W(n) writeups (see Voting/Experience System for W(n) calculation), and a node-fu*w of at least 3*W(n). Node-fu is your reputation divided by your number of writeups, and we all know how rep is calculated. Each of the proposed changes creates a new value that would effectively replace experience as a requirement for levelling; for all systems I will call this value NFW. As an example noder for each of these, I will use myself (my stats are viewable at the bottom of my homenode). (Node-fu=15.2, Wus=301, XP=4603, mean rep=5.76, median rep=4, interquartile mean rep=3.80, Total rep=1734)

To modify this system, there are two key properties to look at when addressing deficiencies (all I plan on addressing): Gaming the system, and attacking a noder.

First, looking at the standard system:

A noder's NFW is trivial to boost, merely by casting all their votes in a day. To level up, they merely have to generate enough nodes; hence noding for numbers. For me to reach Level 6, I merely have to sneak in another 79 nodes; their reputation does not matter.

Attacking me is hard. Likely, each downvote costs me -.2 NFW. A single noder can cost me 60 XP, which I'd notice, but I make that after two days of voting. If you are Pseudo_Intellectual, you could do this to me in a day; most noders would take a week or more.

Now a look at the (unlikely to occur) Median Node-Fu Product (MNFP) system:

Gaming the system: My NFW (not counting the +1 bonus in that node) is 1204. Since my median node is only 5 into my set of rep 4 nodes, I could delete 56 writeups below it - leaving me with an NFW of 1225. Probably not worth it. But as I manage to add more high rep nodes, the step effect to shift my median node becomes more tempting.
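For anyone who wants to poke at the step effect, here is a minimal sketch. It assumes MNFP is simply the median rep times the number of writeups (which is what the 1204 and 1225 figures above work out to: 4*301 and 5*245). The 21-writeup list is an invented toy noder, not real data.

```python
from statistics import median

def mnfp(reps):
    """Median Node-Fu Product: median reputation times number of writeups."""
    return median(reps) * len(reps)

# Hypothetical noder whose median sits near the top of the run of rep-4 writeups.
reps = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 7, 8, 9, 12, 40]

before = mnfp(reps)             # median 4 over 21 writeups -> 84
after = mnfp(sorted(reps)[4:])  # delete the four lowest; median steps up to 5 over 17 -> 85
print(before, after)
```

Deleting more than a handful flips the sign: the count shrinks faster than the median climbs, which is why the gain is so marginal.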

Attacking the system: With 5 strategic downvotes (easy to figure out where), an attacker could shift my median node down by one, costing me 301 NFW (ouch!). However, they can't shift me down again, although each attacker could, in fact, drop me by another 301 NFW. At most, they'd have to spend 151 votes (The first spends 5, the second 47, the third 105, the fourth 146, and the fifth and beyond 151). So here, one vote can cost at least -2 NFW, but only en masse, and it is hard to defend against.

Interquartile system:

This is a harder one to game. Interquartile only counts my nodes between rep 2 and 7, giving me an NFW of 1143. If I delete 4 low-ranking nodes, I can add 3 points to my NFW, to a whopping 1146. If I removed the 56 writeups from the MNFP proposal, my NFW would drop to 1104 - I'd lose too many writeups for the jump in average writeup value to make up for it. I could probably figure out the breakeven point, but I'll leave that for someone with even more time on their hands.

Attacking isn't as easy as under MNFP, but it is easier than under XP. The only interesting places to attack are right in the middle; each vote in the middle half of my writeups costs me exactly 2 NFW.
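A rough sketch of the interquartile version, for checking the "2 NFW per vote" figure. It assumes NFW is the interquartile mean times the writeup count, and it cheats on the quartiles by dropping len(reps) // 4 writeups from each end instead of using the fractional weighting a textbook IQM applies. The reps are an invented toy noder.

```python
def iqm_nfw(reps):
    """Interquartile mean times writeup count (simplified quartile cut).

    Assumes a non-empty list; drops len(reps) // 4 writeups from each end.
    """
    ordered = sorted(reps)
    k = len(ordered) // 4
    middle = ordered[k:len(ordered) - k]
    return sum(middle) / len(middle) * len(ordered)

reps = [0, 1, 2, 3, 4, 5, 6, 50]  # hypothetical noder with one runaway writeup
hit = [0, 1, 2, 3, 3, 5, 6, 50]   # the rep-4 writeup downvoted once

print(iqm_nfw(reps))  # 28.0
print(iqm_nfw(hit))   # 26.0, one vote in the middle half cost 2 NFW
```

Changing the 50 to 500 leaves the result untouched, which is the whole appeal of trimming the ends.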

Mean Rep*nodes system (sum of reputations)

This system is gameable by catering to the soy-eating lesbian monkeys. Every upvote increases my NFW by one, every downvote hurts it by one. Getting a writeup voted on a lot helps ... a lot. Since most noders have a right-tailed distribution (I can never remember if that is left or right skew), NFWs will be higher (mine is 1734); if, however, they are consistently higher, a shift in the metric would reduce the effect. A cap on outliers (no node can contribute greater than 50 rep or less than -5 rep to your NFW) might also reduce the impact on the system (I like those numbers, since they don't affect me).

Attackability: Here, an attacker can only cost you one NFW per vote cast, no matter where.


Proposal

Any system which uses sampling and node reputation to calculate NFW empowers an attacker in inverse proportion to the sampling rate; sampling one-half of a user's nodes doubles the power of attack votes. A system which uses all of a user's nodes is going to be more resistant to attack.
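A back-of-the-envelope derivation of that claim. The symbols are shorthand introduced here, not taken from any of the proposals: n is the number of writeups, f the fraction sampled, S the sampled set, and r_i the rep of writeup i. It assumes NFW is the sampled mean times n, and that a single vote does not push a writeup out of the sampled set.

```latex
\[
\mathrm{NFW} \;=\; n \cdot \frac{1}{f n} \sum_{i \in S} r_i \;=\; \frac{1}{f} \sum_{i \in S} r_i ,
\qquad
\frac{\partial\, \mathrm{NFW}}{\partial r_i} \;=\; \frac{1}{f} \quad (i \in S).
\]
```

With f = 1/2 (the interquartile case) each vote inside the sample moves NFW by 2; with f = 1 (sum of reputations) it moves NFW by 1.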

Damp outliers. In each system, the effects of outliers - such as a toilet seat write-up - have been brought up as a justification for central area sampling. Once a writeup's reputation has passed beyond some threshold, its effect is measured only by weighting the direction of the sampling area. Instead of doing this, the reputation used to compute NFW can be damped. One simple and easy-to-plot method is to place caps on the upper and lower ends of the distribution. No node can contribute more than 25 reputation points to NFW; higher rep nodes contribute exactly 25. No node can deduct more than 2 reputation points from NFW; lower rep nodes deduct exactly 2. (Disclaimer: under this system, my NFW is 1631)
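The capped sum is about as simple as these things get. A minimal sketch, with the function name and the toy reps invented for illustration; only the 25 and -2 limits come from the text above.

```python
def capped_nfw(reps, floor=-2, cap=25):
    """Sum of reputations, with each writeup clamped to [floor, cap]."""
    return sum(max(floor, min(cap, r)) for r in reps)

reps = [-10, 0, 3, 30, 500]  # hypothetical noder
print(capped_nfw(reps))      # -2 + 0 + 3 + 25 + 25 = 51
```

Every writeup still counts, so a vote anywhere between the caps moves NFW by exactly one, and votes on a writeup already past a cap move it not at all.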

Alternate. In an alternate system, one could use logarithms to compute NFW - for instance, each node could contribute log(rep) if rep > 0, or deduct log(1-rep) if rep < 1. This could be renormalized (multiply by 10), leading our example noder to have an NFW of 1733. However, this system is attackable. The transitions from 0 to -1 and from 1 to 2 are the most significant (changing one's NFW by 0.3), although the transition from 0 to 1 is useless (no change). On the other hand, the higher a writeup's reputation goes, the less each successive vote contributes.
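A sketch of the logarithmic variant, assuming base-10 logs (which is what the 0.3 step between rep 1 and rep 2 implies) and integer reps. The function names are invented for illustration.

```python
import math

def log_contribution(rep):
    """log10(rep) for positive reps, -log10(1 - rep) for zero or negative reps."""
    if rep > 0:
        return math.log10(rep)
    return -math.log10(1 - rep)

def log_nfw(reps, scale=10):
    """Scaled sum of per-writeup log contributions (scale=10 is the renormalization)."""
    return scale * sum(log_contribution(r) for r in reps)

# The steps described above, before rescaling:
for rep in (-1, 0, 1, 2, 100, 101):
    print(rep, round(log_contribution(rep), 3))
```

Reps 0 and 1 both contribute nothing, the steps to -1 and to 2 are each worth about 0.3, and by rep 100 a further vote is worth well under 0.01.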


Professor Pi's comments:

Thank you, cordelia, for running the sensitivity analysis for the various approaches on your node-reputation data. I don't have much to add to your comparison of the various systems: the conclusions you draw in this section are correct.

Your proposal to determine a cutoff at rep>=25 (that is, only for purposes of the Honor Roll System of course) is an interesting alternative approach to using the Interquartile Mean (IQM). However, I would like to make the following comments on your Cutoff-System (CS):
  • The IQM by its very nature results in a more symmetric distribution of the reputations around the average compared to the method of the mean, but also compared to the CS. As a result, I tend to put more confidence in the IQM as a measure of central tendency for the widely varying types of reputation-distributions we encounter on E2.
  • The upper-cutoff is a rather arbitrary value; there are actually noders with (for instance) a 3rd quartile reputation greater than this value. Now we are going to fix the relative contribution of these writeups to an arbitrary upper mark. Using the IQM, the high rep writeups can still contribute to increasing the average by raising the 3rd quartile value.
  • The CS is indeed more stable with respect to mass downvoting, but with this property also comes a weakness: downvoting of "overrated" high-reputation writeups will not show much influence on a noder's average. While it is true that in the IQM the average can be reduced by downvoting 50% of the writeups, I have no reason to believe that someone who is vicious enough to undertake this will stop at downvoting 50% instead of going for the full 100% of the writeups. In that case, the CS is not much better than the IQM. But then again, this is a reward system, and not a punishment system; mass downvoting will only hurt in the sense that noders will be benefiting less.
The logarithmic method (or perhaps the logarithmic mean) is not favored because:
  1. It is too complex for people who aren't familiar with mathematics. It is not easy to understand how the average was obtained. This would turn the averaging procedure into a black box. This brings me back a couple of years, to when I was a TA for a Chemical Engineering course. During my office hour, I would sometimes ask the juniors to sketch the graph of the base-10 logarithm. You should have seen the bizarre graphs that some of the students came up with... And these students are now engineers.
  2. The log operation is rather slow on a computer. This would certainly make the average-recalculation during database backup a very lengthy procedure.

Adding information on the effects of an interdecile mean computation: Our sample noder's NFW is 1305.
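For reference, a sketch of what an interdecile computation might look like, assuming it means "drop the bottom and top tenth, average the rest, multiply by the writeup count". The cut is simplified to len(reps) // 10 writeups from each end, and the reps are an invented toy noder, not the data behind the 1305 figure.

```python
def interdecile_nfw(reps):
    """Mean of the middle 80% of reps, times the writeup count (simplified cut)."""
    ordered = sorted(reps)
    k = len(ordered) // 10
    middle = ordered[k:len(ordered) - k]
    return sum(middle) / len(middle) * len(ordered)

reps = [-5, 0, 1, 2, 3, 4, 5, 6, 7, 100]  # hypothetical noder
print(interdecile_nfw(reps))              # middle eight average 3.5, times 10 writeups = 35.0
```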