Everything Statistics - September 29, 2001 (3)

by 18thCandidate

Fri Oct 05 2001 at 22:30:50

Saving The XP/WU System and Fixing Its Problems

The crux of this entire discussion seems to be that the current XP requirements, compared to the WUs required, are too low and encourage people to write bad writeups, because WUs are usually the limiting factor. I propose that this problem can be fixed using the current leveling system of requiring a certain number of XP and a certain number of WUs.

I decided to take a mathematical approach for mending the current XP system instead, because as other E2 users have noticed, Professor Pi's scheme has flaws.

Let's start off by looking at the XP needed and WUs needed for each level, plus the number of votes acquiring that level will get you.

Level           XP     WUs    Votes
2   Novice      50     25     10
3   Acolyte     200    70     20
4   Scribe      400    150    30
5   Monk        800    250    45
6   Crafter     1350   380    60
7   Artisan     2100   515    75
8   Seer        2900   700    90
9   Archivist   4000   900    105
10  Avatar      7500   1215   125
11  Godhead     13000  1800   150
12  Pseudo_God  23000  2700   200
13  Pedant      38000  4500   300

Given that voting is impossible prior to Level 2, it is clear that a user needs to acquire two XP per writeup to reach that level. In fact, let's use it as a benchmark. Let's call it XP1.

Let's also assume that a writer can turn out one writeup at that level of quality (earning two XPs per writeup) every two days. Some writers can do better than this; others may not be able to. This is simply to establish some sort of baseline. So, then let's add another column called D2NL, days to next level.

Let's also assume that every other day you use all of your votes. This means that, on average, you earn 20% of your votes in XP when you vote and 50% of your votes in XP after dumping all your votes. This totals 70% of votes converted to XP every other day, or 35% every day. Let's call this DXPA, or Daily XP Added.

On average, then, using the WU requirements, the XP numbers are far off kilter. Let's recalculate, using the formula (XP needed for Ln = XP needed for L(n-1) + XP1(of L(n-1)) + (D2NL*DXPA of L(n-1))).

Level           WUs    Votes  XP1    D2NL   DXPA   D2NL*DXPA  XP needed
2   Novice      25     10     50     90     3.5    315        50
3   Acolyte     70     20     140    160    7      1120       415
4   Scribe      150    30     300    200    10.5   2100       1675
5   Monk        250    45     500    260    15.75  4095       4075
6   Crafter     380    60     760    270    21     5670       8670
7   Artisan     515    75     1030   370    26.25  9712       15100
8   Seer        700    90     1400   400    31.5   12600      25842
9   Archivist   900    105    1800   730    36.75  26827      39842
10  Avatar      1215   125    2430   1170   43.75  51319      68468
11  Godhead     1800   150    3600   1800   52.5   94500      122218
12  Pseudo_God  2700   200    5400   3600   70     252000     220318
13  Pedant      4500   300    9000   n/a    n/a    n/a        477718
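For anyone who wants to check the arithmetic, here is a minimal Python sketch of the recurrence. The function and constant names are mine, and it assumes D2NL is simply twice the writeup gap to the next level (one writeup every two days), which is my reading of the table. Under that assumption it reproduces the rows through Artisan exactly; the Seer value comes out as 25842.5, which the table shows as 25842.

```python
# Level data from the post: (level, title, writeups required, votes).
LEVELS = [
    (2, "Novice", 25, 10),
    (3, "Acolyte", 70, 20),
    (4, "Scribe", 150, 30),
    (5, "Monk", 250, 45),
    (6, "Crafter", 380, 60),
    (7, "Artisan", 515, 75),
    (8, "Seer", 700, 90),
]

XP_PER_WRITEUP = 2     # benchmark quality: two XP per writeup
DAYS_PER_WRITEUP = 2   # one writeup every two days
VOTE_XP_PERCENT = 35   # 70% of votes become XP every other day


def proposed_xp(levels):
    """Return {level: XP needed} following the post's recurrence."""
    needed = {}
    # Level 2 keeps its original requirement (25 writeups * 2 XP = 50).
    xp = levels[0][2] * XP_PER_WRITEUP
    for i, (level, _title, wus, votes) in enumerate(levels):
        needed[level] = xp
        if i + 1 < len(levels):
            xp1 = wus * XP_PER_WRITEUP                           # XP1 column
            d2nl = (levels[i + 1][2] - wus) * DAYS_PER_WRITEUP   # assumed D2NL
            dxpa = votes * VOTE_XP_PERCENT / 100                 # DXPA column
            xp = xp + xp1 + d2nl * dxpa
    return needed


if __name__ == "__main__":
    for level, xp in proposed_xp(LEVELS).items():
        print(level, xp)
```

Extending `LEVELS` with the remaining rows works the same way, though the results will only match the table where its D2NL entries equal twice the writeup gap.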

Compared to the old style XP requirements:

Level           Old XP  New XP    
2   Novice      50      50
3   Acolyte     200     415
4   Scribe      400     1675
5   Monk        800     4075
6   Crafter     1350    8670
7   Artisan     2100    15100
8   Seer        2900    25842
9   Archivist   4000    39842
10  Avatar      7500    68468
11  Godhead     13000   122218
12  Pseudo_God  23000   220318
13  Pedant      38000   477718

It should be pointed out that a good noder who nodes more frequently than once every two days can easily level up much faster than I predict here, because of the merit of the "cooling" system, blessings, and additional XP that good writeups will incur. This is just intended to be a rough model of an average E2 user's behavior.

In this new scheme, levels are much trickier to come by. In fact, I believe that noders will often meet the writeup requirement for a level before they meet the XP requirement. This means that good writeups, which continue to earn XP as people discover them and vote them up (as voters would on a good WU), become much more valuable, and bad writeups written simply to fill a WU count will decrease in number.

The net result of this scheme is then the same result that Professor Pi wants: good writers are rewarded for good writing. Good writers, quite simply, will be able to progress through the level system faster than those who just churn stuff out to meet the WU requirement for leveling, because this system will place the emphasis on quality over quantity.

Of course, leveling up becomes much more difficult. But, on the other hand, people of a high level would deservedly have much more prestige in the system and many bad WUs would be avoided.

Some people would most definitely drop in level if this system were implemented today. Thus, if this were to be adopted, I would be in favor of "grandfathering" those people into their current level if they so wished, with the grandfather clause disappearing at their next level up.

I strongly encourage any comments you might have via /msg. Especially you, Professor Pi.

Professor Pi's comments:

You are proposing to change the rules of the game. I am very aware that my initial proposal did that as well, and based on the many comments I received I have decided that it would be unwise to alter the existing writeups/XP requirements. My second proposal does not change the existing Level Advancement system; it only adds to the existing rules. Increasing the XP reqs is not fair to those who have been declining vote XP. Now they are faced with a big XP deficit for leveling up, and it will be much harder for them to gain levels. Among those people that decline voting XP are some awesome writers that we would penalize.
Voter participation is important; you assume using ALL votes every other day for your XP gain. At the higher levels it is nearly impossible to use all your votes unless you start dumping votes carelessly; that is not good for the DB. Ask P_I how hard it is for him to use ALL his votes... Granted, there is no incentive for him to accumulate XP in order to rise to a non-existing level, but using all votes can be hard for lower levels as well.
I believe your system would put more emphasis on those "subjective/joke/rant/sex" nodes as compared to a system based on writeup reputation. Your system will accumulate XP for ALL those nodes, as they get upvoted into the sky. A system based on a fair average node reputation is less sensitive to this. An example:
Prof Pi's Stupid Example Joke: 100 upvotes, 30 downvotes, 4C!s: 36 XP gain, and it gets nuked.
Solid State Chemistry: 30 upvotes, 3 downvotes, 2C!: 16 XP. That is less XP gain, but I now created a useful and good contribution to the DB. And that is exactly what my proposed system intends to reward.

(idea)

by cordelia

Mon Oct 08 2001 at 10:40:59

First off, beware the pathological case. Figures don't lie, but liars figure. While it may seem reasonable to say "If a user has 6 writeups, 4 of which are at 0 rep and 2 of which are at a gazillion rep, what then?", it really isn't. You should be aware of that.

This isn't going to be a proposal for a whole new system. It's just a couple of architectural inputs, which I am sure have already been considered, but I haven't fed Klaproth in a long while.

Systematic downvoting: Beware of how attack voting will interact with the new system. A system based on MNFP could be painful (5 well placed votes would cost me 300 MNFP; 46 would cost me 600 MNFP). One way to deal with this? Obscure the rankings of a user's writeups. If anyone but the user chooses "sort by highest/lowest rep first", randomly modify the reps by +/-2, then sort. This will still give a good indicator of the order of the writeups, without painting a large target on the writeup.
Incentivize short, factual writeups. Incentivize making them longer: Most factual noders have a string of one or two line writeups they'd love to be rid of, because they only have a rep of, say, 0. We don't feel like putting in the effort to make a writeup like simulive into a work of art, but deleting it would hurt the database. One idea: Allow a noder to "cut loose" a writeup. Writeups which have been cut loose are transferred in ownership to an account like everything, where anyone can search for things to update (Everything Quest: Replace 10 cutloose writeups with something significantly better). Okay, maybe that idea was better in my head.
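The rank-obscuring idea from the systematic downvoting point could look something like the Python sketch below. It assumes "+/-2" means a random whole number between -2 and +2 added to each reputation before sorting; the function name and the sample data are made up for illustration.

```python
import random


def obscured_order(reps, rng=None, jitter=2, descending=True):
    """Order writeup titles by reputation plus integer noise in [-jitter, +jitter].

    `reps` maps writeup title -> true reputation. Only the ordering is
    returned, so the viewer never sees the exact reputations.
    """
    rng = rng or random.Random()
    noisy = [(rep + rng.randint(-jitter, jitter), title)
             for title, rep in reps.items()]
    noisy.sort(key=lambda pair: pair[0], reverse=descending)
    return [title for _noisy_rep, title in noisy]


if __name__ == "__main__":
    # Hypothetical writeups, not real data.
    sample = {"toilet seat": 50, "simulive": 0, "a rant": 3, "a factual": 4}
    print(obscured_order(sample))
```

Writeups whose reputations are more than 4 apart always keep their relative order, so the sort stays a good indicator overall, while writeups with nearly equal reputations get shuffled among themselves.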

Professor Pi's Comments:

"Figures don't lie..." - It would indeed be unreasonable to calculate an "average" node reputation based on 6 nodes with reputations (0, 0, 0, 0, 500, 500). But the statistical representation becomes better for an increasing number of nodes: see the (unfortunately very short) writeup on the Law of Large Numbers. For a larger number of nodes, the distribution profile more closely resembles a curve centered around the average (i.e. it is highly unlikely that the noder will eventually end up with 40 writeups at rep=0 and 20 writeups at rep=500 (or 400 and 200 writeup, respectively). The entry requirement of 25 writeups or more for the Honor Roll insures that we are applying statistics on a sufficiently large sample group.
Systematic Downvoting - As I described in (2), I stepped away from calculating an average node reputation (solely) based on the median. The most promising alternative so far is the Interquartile Mean, as described by yerricde (1). Targeted downvoting (i.e. mass downvoting of writeups at or around the median) should only have a small effect; on the order of 1--2%. Mass downvoting of all writeups could drop the "average" reputation by approximately 20--30%. This is of course a significant amount, but note that (1) this should not affect too many noders; the problem of mass downvoting is one under any system, including the current, (2) noders are not penalized when their average reputation drops below the average, and (3) you can only vote on each writeup once. A mass downvoter can only do so much damage. Good noders will still see benefits of the Honor Roll, even if they are mass downvoted. Still, it is very unfortunate that we don't have better means to fight mass-downvoting.
Incentivize short, factual writeups. Incentivize making them longer. - Maybe there should be a place where one can submit improved nodes. I'm afraid there would be a high risk of abuse, though: "Node X went through the new nodes too fast. Let's just add some hardlinks and submit it to the Improvement List".

cordelia's later additions, not necessarily in reply: The current system can be described as:
To achieve level n you must have w>=W(n) writeups (see Voting/Experience System for W(n) calculation), and a node-fu*w of at least 3*W(n). Node-fu is your reputation divided by your number of writeups, and we all know how rep is calculated. Each of the proposed changes creates a new value that would effectively replace experience as a requirement for levelling; for all systems I will call this value NFW. As an example noder for each of these, I will use myself (my stats are viewable at the bottom of my homenode). (Node-fu=15.2, WUs=301, XP=4603, mean rep=5.76, median rep=4, interquartile mean rep=3.80, Total rep=1734)

To modify this system, there are two key properties to look at when addressing deficiencies (all I plan on addressing): Gaming the system, and attacking a noder.

First, looking at the standard system:

A noder's NFW is trivial to boost, merely by casting all their votes in a day. To level up, they merely have to generate enough nodes; hence noding for numbers. For me to reach Level 6, I merely have to sneak in another 79 nodes; their reputation does not matter.

Attacking me is hard. Likely, each downvote costs me -.2 NFW. A single noder can cost me 60 XP, which I'd notice, but I make that after two days of voting. If you are Pseudo_Intellectual, you could do this to me in a day; most noders would take a week or more.

Now a look at the (unlikely to occur) Median Node-Fu Product (MNFP) system:

Gaming the system: My NFW (not counting the +1 bonus in that node) is 1204. Since my median node is only 5 into my set of rep 4 nodes, I could delete 56 writeups below it - leaving me with an NFW of 1225. Probably not worth it. But as I manage to add more high rep nodes, the step effect to shift my median node becomes more tempting.

Attacking the system: With 5 strategic downvotes (easy to figure out where), an attacker could shift my median node down by one, costing me 301 NFW (ouch!). However, they can't shift me down again, although each attacker could, in fact, drop me by another 301 NFW. At most, they'd have to spend 151 votes (The first spends 5, the second 47, the third 105, the fourth 146, and the fifth and beyond 151). So here, one vote can cost at least -2 NFW, but only en masse, and it is hard to defend against.
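To make the step effect concrete, here is a small Python sketch of a median-based NFW. The reputation list is hypothetical (not my real data), the helper names are invented, and using the lower median for even counts is an assumption, since the MNFP writeup may define it differently.

```python
from statistics import median_low


def mnfp(reps):
    """Median Node-Fu Product: median reputation times number of writeups."""
    return median_low(reps) * len(reps)


def downvote(reps, value, count):
    """Return a sorted copy with `count` writeups of reputation `value` lowered by one."""
    out = sorted(reps)
    for _ in range(count):
        out[out.index(value)] -= 1
    return out


if __name__ == "__main__":
    # Hypothetical noder with 11 writeups; the median (4) sits in a run of rep-4 nodes.
    reps = [0, 1, 2, 3, 4, 4, 4, 6, 9, 15, 40]
    print(mnfp(reps))                   # median 4 * 11 writeups
    print(mnfp(downvote(reps, 4, 1)))   # one downvote: median unchanged
    print(mnfp(downvote(reps, 4, 2)))   # two downvotes: median drops to 3
```

In this toy case one downvote does nothing, but the second shifts the median by a full point and costs as many NFW as the noder has writeups, which is the same cliff described above.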

Interquartile system:

This is a harder one to game. Interquartile only counts my nodes between rep 2 and 7, giving me an NFW of 1143. If I delete 4 low ranking nodes, I can add 3 points to my NFW, to a whopping 1146. If I removed the 56 writeups from the MNFP proposal, my NFW drops to 1104 - I lost too many writeups to be countered by the average jump in value of the writeups. I could probably figure out the breakeven point, but I'll leave that for someone with even more time on their hands.

Attacking isn't as easy as under MNFP, but easier than under XP. The only interesting places to attack are right in the middle; each vote in the middle half of my writeups costs me exactly 2 NFW.
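A rough Python sketch of the interquartile version follows. It uses the simplest possible trimming (drop a quarter of the writeups, rounded down, from each end), which is an assumption on my part; a proper interquartile mean weights the boundary nodes fractionally. The data and function names are hypothetical.

```python
def interquartile_mean(reps):
    """Mean of the middle half of the sorted reputations (simple trimmed version)."""
    ordered = sorted(reps)
    cut = len(ordered) // 4                      # writeups dropped from each end
    middle = ordered[cut:len(ordered) - cut]
    return sum(middle) / len(middle)


def iqm_nfw(reps):
    """NFW under this scheme: interquartile mean times number of writeups."""
    return interquartile_mean(reps) * len(reps)


if __name__ == "__main__":
    # Hypothetical noder with 8 writeups, one of them a big outlier.
    reps = [-3, 0, 1, 2, 3, 4, 5, 60]
    print(iqm_nfw(reps))                              # middle half is [1, 2, 3, 4]
    print(iqm_nfw([-3, 0, 1, 2, 2, 4, 5, 60]))        # downvote a middle node
    print(iqm_nfw([-3, 0, 1, 2, 3, 4, 5, 59]))        # downvote the outlier
```

Since only half of the writeups are counted but the mean is multiplied by all of them, a vote landing in the middle half moves the NFW by 2, while a vote on the outlier moves it by nothing.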

Mean Rep*nodes system (sum of reputations)

This system is gameable by catering to the soy-eating lesbian monkeys. Every upvote increases my NFW by one, every downvote hurts it by one. Getting a writeup voted on a lot helps ... a lot. Since most noders have a right-tailed distribution (I can never remember if that is left or right skew), NFWs will be higher (mine is 1734); if, however, they are consistently higher, a shift in the metric would reduce the effect. A cap on outliers (no node can contribute greater than 50 rep or less than -5 rep to your NFW) might also reduce the impact on the system (I like those numbers, since they don't affect me).

Attackability: Here, an attacker can only cost you one NFW per vote cast, no matter where.

Proposal

Any system which uses sampling and node reputation to calculate NFW empowers an attacker inversely proportional to the sampling rate; sampling one-half of a user's nodes doubles the power of attack votes. A system which uses all of a user's nodes is going to be more resistant to attack.

Damp outliers. In each system, the effects of outliers - such as a toilet seat write-up - have been brought up as a justification for central area sampling. Once a writeup's reputation has passed beyond some threshold, its effect is measured only by weighting the direction of the sampling area. Instead of doing this, the reputation used to compute NFW can be damped. One simple, and easy to plot method, is to place caps on the upper and lower ends of the distribution. No node can contribute more than 25 reputation points to NFW. Higher rep nodes contribute 25 exactly. No node can deduct more than 2 reputation points from NFW. Lower rep nodes deduct 2 exactly. (Disclaimer: under this system, my NFW is 1631)
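The capping rule is short enough to write out. This is only a sketch with an invented function name and made-up reputations; the caps of 25 and -2 are the ones proposed above.

```python
def capped_nfw(reps, upper=25, lower=-2):
    """Sum of reputations with each writeup clamped to the range [lower, upper]."""
    return sum(min(max(rep, lower), upper) for rep in reps)


if __name__ == "__main__":
    # Hypothetical reputations, including a runaway joke node and a heavily downvoted one.
    reps = [100, 30, 10, 0, -1, -7]
    print(sum(reps))          # plain sum of reputations
    print(capped_nfw(reps))   # same nodes with the caps applied
```

Every writeup still counts, so an attacker gets at most one NFW per vote, and votes on nodes already outside the caps change nothing.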

Alternate. In an alternate system, one could use logarithms to compute NFW - for instance, each node could contribute log(rep) if rep > 0, or deduct log(1-rep) if rep < 1. This could be renormalized (multiply by 10), leading our example noder to have an NFW of 1733. However, this system is attackable. The transitions from 0 to -1 and from 1 to 2 are the most significant (changing one's NFW by 0.3), although the transition from 0 to 1 is useless (no change). However, the higher a writeup's reputation goes, the less each successive vote yields to rep.
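Here is a sketch of the logarithmic damping in Python. It assumes base-10 logarithms, which is what the 0.3 step between rep 1 and rep 2 implies, and integer reputations; the function names are invented for the example.

```python
import math


def log_contribution(rep):
    """Damped contribution of one writeup: log10(rep) above zero, -log10(1 - rep) otherwise."""
    if rep > 0:
        return math.log10(rep)
    return -math.log10(1 - rep)


def log_nfw(reps, scale=10):
    """Sum of damped contributions, renormalized by `scale`."""
    return scale * sum(log_contribution(rep) for rep in reps)


if __name__ == "__main__":
    for rep in (-1, 0, 1, 2, 10, 100):
        print(rep, round(log_contribution(rep), 3))
```

Printing a few values shows the flat spot between rep 0 and rep 1 and how little an extra vote matters once a writeup is already at high reputation.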

Professor Pi's comments:

Thank you, cordelia for running the sensitivity analysis for the various approaches on your node-reputation data. I don't have much to add to your comparison of the various systems: the conclusions you draw in this section are correct.

Your proposal to determine a cutoff at rep>=25 (that is, only for purposes of the Honor Roll System of course) is an interesting alternative approach to using the Interquartile Mean (IQM). However, I would like to make the following comments on your Cutoff-System (CS):

The IQM by its very nature results in a more symmetric distribution of the reputations around the average compared to the method of the mean, but also compared to the CS. As a result, I tend to put more confidence in the IQM as a measure of central tendency for the widely varying types of reputation-distributions we encounter on E2.
The upper-cutoff is a rather arbitrary value; there are actually noders with (for instance) a 3rd quartile reputation greater than this value. Now we are going to fix the relative contribution of these writeups to an arbitrary upper mark. Using the IQM, the high rep writeups can still contribute to increasing the average by raising the 3rd quartile value.
The CS is indeed more stable with respect to mass-downvoting, but with this property comes also a weakness: downvoting of "overrated" high-reputation writeups will not show much influence on a noder's average. While it is true that in the IQM, the average can be reduced by downvoting 50% of the writeups, I have no reason to believe that someone -- who is vicious enough to undertake this -- will stop at downvoting 50% instead of going for the full 100% of the writeups. In that case, the CS is not much better than the IQM. But then again, this is a reward system, and not a punishment system; mass downvoting will only hurt in the sense that noders will be benefiting less.

The logarithmic method (or perhaps the logarithmic mean) is not favored because:

It is too complex for people who aren't familiar with mathematics. It is not easy to understand how the average was obtained. This would turn the averaging procedure into a black box. This brings me back a couple of years, when I was a TA for a Chemical Engineering course. During my office hour, I would sometimes ask the juniors to sketch the graph of the base-10 logarithm. You should have seen the bizarre graphs that some of the students came up with... And these students are now engineers.
The log operation is rather slow on a computer. This would certainly make the average-recalculation during database backup a very lengthy procedure.

Adding information on the effects of an interdecile mean computation: Our sample noder's NFW is 1305.

1 C!

(thing)

by Qeyser

Mon Oct 08 2001 at 20:54:44

Just some mathy questions directed at the Prof, but I'd like to hear answers from anyone who can answer this:

So I really like the idea of the proposed MNFP system of level advancement, however I am moved by some of the comments that the median node rep might not be the best measure of central tendency. While it seems that the distribution of node reputation is roughly gaussian, it does seem that some users have a larger right side tail; that is, they have many more high rep nodes than low rep nodes.

Although a noder may not have enough high rep nodes to make his distribution very skewed, the traditional measures of central tendency for gaussian distributions may not be able to tell the whole story.

What I'm getting at is this: is there any way to meaningfully quantify the right-side skewness of a noder's rep distribution and thus be able to reward noders that have many more high rep writeups than low rep writeups?

Thanks for reading, Yours Truly Qeyser

Professor Pi's Comments:

Actually, the distribution of node reputation only resembles a normal (Gaussian) distribution. The real distribution is most likely closer to a Binomial Distribution. But other factors such as writeup nuking and C!ing (more "air-time") have influence on the shape of the distribution as well.

It would be very impractical to use models such as the Binomial Distribution or the Poisson Distribution (which is the limiting case of the former and would work out for larger rep-sums) because (1) there is far too much computation work required, with lots of slow factorial calculations, and (2) the whole procedure of calculating an "average" node reputation would become far too complex for the average noder to make sense of.

There is actually a parameter that can be used to evaluate the degree of asymmetry of a distribution; it's called skewness. It is a 3rd order function of the node-reputations. I doubt this parameter is practical in the evaluation of the "average" node distribution, and it would again make the entire procedure too complex.
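For reference, one common form of that parameter is the third central moment divided by the 1.5th power of the second. The Python sketch below uses that definition, which is an assumption (other variants apply small-sample corrections), and the reputation lists are invented.

```python
def skewness(reps):
    """Moment coefficient of skewness: m3 / m2**1.5, using population moments."""
    n = len(reps)
    mean = sum(reps) / n
    m2 = sum((rep - mean) ** 2 for rep in reps) / n   # second central moment
    m3 = sum((rep - mean) ** 3 for rep in reps) / n   # third central moment
    if m2 == 0:
        return 0.0  # all reputations equal: no asymmetry to measure
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    print(skewness([1, 2, 3, 4, 5]))             # symmetric
    print(skewness([0, 1, 1, 2, 2, 3, 40]))      # long right tail
    print(skewness([-40, 1, 2, 2, 3, 3, 4]))     # long left tail
```

A positive value means a longer right tail (more high rep outliers), a negative value a longer left tail, and a symmetric list gives zero.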

The median actually would be a fair measure of central tendency for the reputation-distributions we encounter, but as was already mentioned: it's easily broken by targeted downvoting on writeups with a reputation at, or slightly above the median. It is not robust enough, since it is only a single-parameter description of the distribution. In order to make it more robust, it would be better to incorporate more factors, such as the 1st and 3rd quartile values, or the reputations of all the writeups between the 1st and 3rd quartile. Again, we don't calculate the mean of all the nodes, since that would favor the outlier points too much.
