Everything Statistics - September 29, 2001 (2)

by Professor Pi

Thu Oct 04 2001 at 19:56:35

If you reply to my first writeup on this topic, please read the following one first: at the bottom are several important changes to the initial proposal.

My apologies for posting this rebuttal in a separate node; I cannot merge it with my writeup here, since that would take over the entire front page. I will merge the two writeups as soon as my first writeup is no longer on the front page.

I would like to address the issues that everyone has brought up in reply to my writeup: many of you have raised valid points that definitely need to be worked out if we are to change the Level Advancement system.

Robustness

I agree that my proposal to use the median reputation is quite sensitive to mass-downvoting. GangstaFeelsGood is right that a small number of carefully chosen downvotes could drop someone's median quite easily. A mass-downvoter could simply downvote all the nodes that are exactly at the median to achieve this, and it wouldn't take many votes. On the other hand, one single downvoter would never be able to drop the median by more than 1, even if he downvotes ALL the nodes a user makes. But someone who is attacked by 2 mass-downvoters is indeed in a lot of trouble.
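A quick numerical check of the two claims above, as a sketch in Python. The reputation list is invented for illustration, and each downvote is assumed to lower a writeup's reputation by exactly 1.

```python
from statistics import median

# Hypothetical reputations for one noder (illustrative data only).
reps = [0, 1, 1, 2, 3, 3, 3, 4, 5, 6, 8]

def downvote(reps, indices):
    """Return a copy of reps where each listed writeup loses 1 reputation."""
    out = list(reps)
    for i in indices:
        out[i] -= 1
    return out

before = median(reps)

# One voter downvoting every writeup shifts the whole distribution by 1,
# so the median drops by exactly 1 and no further.
after_all = median(downvote(reps, range(len(reps))))

# A targeted voter only needs to hit the writeups sitting at the median.
at_median = [i for i, r in enumerate(reps) if r == before]
after_targeted = median(downvote(reps, at_median))

print(before, after_all, after_targeted, len(at_median))
```

In this made-up case, three well-placed downvotes move the median as far as eleven indiscriminate ones.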

Unfortunately the problem of mass-downvoting by itself is not easily solved. Therefore, the method to calculate an "average" node reputation will have to be more robust with respect to mass-downvoting. Please note that in this writeup I use "average" to refer to a proper measure of central tendency, and not the mean reputation.

You can incorporate more robustness into the calculation of an average reputation by including a larger part of the node distribution. But there is a trade-off between robustness and accuracy:

Incorporating more datapoints of the node-distribution (or in the extreme, all nodes) shifts the average to unreasonably high values. It would "value" the Toilet seat writeup about 200 times higher than a decent factual writeup with a reputation=2.
Incorporating fewer datapoints of the node-distribution will make the system more prone to mass-downvoting. Or mass upvoting...

yerricde's writeup is very interesting: in his calculation we ignore the bottom 25%, ignore the top 25% and calculate the mean reputation of the middle half. This is indeed a more robust calculation. It ignores the writeups that for some reason plummeted to low reputations, and the ones that soared to very high reputations. It calculates the mean on 50% of the node base, and is thus quite stable with respect to mass-voting.
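The calculation described above can be sketched as follows. This is a simplified version that drops a whole number of writeups from each end (floor of n/4) instead of weighting partial observations at the quartile boundaries; the function name and the sample data are assumptions for illustration.

```python
def interquartile_mean(reps):
    """Mean of the middle half of the sorted reputations.

    Simplification: drops floor(n/4) writeups from each end rather than
    weighting partial observations at the quartile boundaries.
    """
    if not reps:
        raise ValueError("need at least one writeup")
    ordered = sorted(reps)
    k = len(ordered) // 4
    middle = ordered[k:len(ordered) - k]
    return sum(middle) / len(middle)

# Hypothetical noder: one mass-downvoted writeup and one runaway hit.
reps = [-5, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 200]

print(interquartile_mean(reps))   # mean of the middle six writeups
print(sum(reps) / len(reps))      # plain mean, dragged up by the 200
```

The runaway writeup and the downvoted one both fall outside the middle half, so neither moves the result.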

I do not agree with 1010011010's method of calculating an average reputation; node distributions are NOT smooth continuous curves. Let me give you an example. Say my lowest reputations are at 0 and my highest are at 10:

#Rep          #Writeups
0             30 *
1             24
2             31  
3             27
4             19
5             20 * 
6             13
7             17
8             11
9              8
10             6 *

If we use a 3-point approximation, the numbers marked with asterisks are used to calculate an "average" (30, 20, and 6). Now one of the rep-10 nodes gets C!ed, and two people upvote it, which moves it to a reputation of 12. This is what my distribution will look like:

#Rep          #Writeups
0             30 *
1             24
2             31
3             27
4             19
5             20
6             13 *
7             17
8             11
9              8
10             5
11             0
12             1 *

Now the three midpoints to calculate the "average" are 30, 13 and 1... So I gained two upvotes, but my average dropped significantly. Thus, the method will fail, primarily because the node histograms are not "smooth continuous curves". And what if we increase N, the number of midpoints? You then end up with a mean, shifted too much towards the higher end.
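The arithmetic behind this example can be reproduced with a short sketch. It follows the reading of the 3-point approximation used in this writeup (taking the writeup counts at the lowest, middle and highest reputation), and it assumes that the C!ed writeup moves from reputation 10 to reputation 12. The function name is hypothetical.

```python
# Histogram from the first table above: reputation -> number of writeups.
before = {0: 30, 1: 24, 2: 31, 3: 27, 4: 19, 5: 20,
          6: 13, 7: 17, 8: 11, 9: 8, 10: 6}

def three_point_average(hist):
    """Average the writeup counts at the lowest, middle and highest reputation."""
    lo, hi = min(hist), max(hist)
    mid = (lo + hi) // 2
    points = [hist.get(lo, 0), hist.get(mid, 0), hist.get(hi, 0)]
    return points, sum(points) / 3

# Assumption: one rep-10 writeup gains two upvotes and lands at rep 12.
after = dict(before)
after[10] -= 1
after[11] = 0
after[12] = 1

print(three_point_average(before))
print(three_point_average(after))
```

The sampled counts change from 30, 20, 6 to 30, 13, 1, so the computed "average" falls even though the noder only gained votes.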

"Using median size as a reference it's perfectly possible to fit four ping-pong balls and two blue whales in a rowboat."

That is a very bad analogy that 1010011010 makes. We are trying to determine a value that statistically is the best representation of a population. In other words:

If I randomly pick an object from a collection of four ping-pong balls and two blue whales, what object is the most likely selection?

Of course, there's a 66% chance of picking a ping-pong ball. Thus the ping-pong ball is the best representation of the population. Your example is extreme in every way; no noder would ever have a node distribution with only 4 writeups at a reputation=2, and 2 writeups at a reputation=500 (if we assume that a whale is 250x bigger than a ping-pong ball). Or at higher node counts: 100 writeups at rep=2 and 50 writeups at reputation=500...

XP

Tes and Shanoyu make some good points on the legacy of the XP system. E2 is not just about writing nodes, but active involvement in every way. XP is a convenient way to express this involvement, even though some people attach far too much value to this statistic: or as the Voting Experience System says: XP is an imaginary number granted to you by an anonymous stranger. Treat it as such.

It was not, and is not, my intention to get rid of the XP system. The major objective of my proposal was to reward good writers by moving them through the levels faster. Good writing needs to be rewarded.

Shanoyu claims that this reward system would lead to more "sex with horses" nodes. I do not share this opinion. There are indeed some noders who rely solely on noding "entertaining crap" to pep up their stats, even in the current system, but their numbers are small. Editors and gods notice these people, and eventually their "contributions" get nuked... I do not think that we will get more of these people, but if we do, we will spot them more easily as they shoot up through the levels, and we will deal with their "contributions".

Modified Advancement System - Reward System

I have given the system some more thought, and I am currently thinking of something like this:

The "regular" Level Advancement System remains in place, with XP and #Node Requirements. Perhaps the XP requirements could be adjusted to follow more closely what the average noder is already accumulating.
For noders who node above the average reputation, there is an "Honor Roll"; the #Writeups to Level-up decreases for increasing average reputations. Drop back to the average reputation or below, and you end up with the regular advancement system.

This system has the advantage that no one will lose his/her current level: noders are not punished for having low average reputations, but they are rewarded for writing high quality nodes. Noders are still encouraged to participate in voting, to meet the "regular" XP requirements for Level Advancement.

This system still needs a lot of detailing, especially in establishing a fair and robust method of measuring the "average" node-reputation. I will most certainly take another look at the node statistics, to see what impact a modified system would have.

Many thanks for the comments, questions and suggestions that I have received.

I realized that my explanation on the Honor Roll system was a bit too short. I hope that this explanation clears things up. The following table shows the proposed Level Advancement System.

Note that the numbers given for the Honor Roll and the required "average" are preliminary. This data is just to show the concept. Don't calculate any "potential" level gain based on this data, as the final Honor Roll requirements will be stricter!!!

---------------------------------------------------------------------
       | REGULAR REQUIREMENTS  | HONOR ROLL -- "average" >= 3
Level  | XP Req.    WU Req.    | "average" x #writeups
---------------------------------------------------------------------
1      |      0         0      |     N/A
2      |     50        25      |     N/A
3      |    200        70      |     210
4      |    400       150      |     450
5      |    800       250      |     750
6      |   1350       380      |    1140
7      |   2100       515      |    1545
8      |   2900       700      |    2100
9      |   4000       900      |    2700
10     |   7500      1215      |    3645
11     |  13000      1800      |    5400
12     |  23000      2700      |    8100
13     |  38000      4500      |   13500
---------------------------------------------------------------------

Every noder advances after meeting the XP and WU requirements (second and third columns).
The Honor Roll - Noders can advance levels with fewer writeups if they meet the following requirements:
1. Meet the regular XP requirements.
2. Have obtained Level 2 through the Regular Requirements.
3. Maintain an "average" writeup reputation of 3 or more.
Level advancement in the Honor Roll goes according to the product of the "average" node reputation and the number of writeups.

The following table shows how the Honor Roll works, giving the required number of writeups as a function of the "average" writeup reputation.

------------------------------------------------------------
Level  |  "average" Writeup Reputation
       |     3      4      5      6    ...   >=10
------------------------------------------------------------
1      |   N/A    N/A    N/A    N/A    ...    N/A
2      |   N/A    N/A    N/A    N/A    ...    N/A
3      |    70     53     42     35    ...     21
4      |   150    113     90     75    ...     45
5      |   250    188    150    125    ...     75
6      |   380    285    228    190    ...    114
7      |   515    386    309    258    ...    155
8      |   700    525    420    350    ...    210
9      |   900    675    540    450    ...    270
10     |  1215    911    729    608    ...    365
11     |  1800   1350   1080    900    ...    540
12     |  2700   2025   1620   1350    ...    810
13     |  4500   3375   2700   2250    ...   1350
------------------------------------------------------------

For an "average" writeup reputation = 3, the Writeup-requirements are identical to those of the Regular Requirements (e.g. 3 x 70 writeups = 210 points). Any higher "average" writeup reputation will reduce the required number of writeups for leveling up (e.g. only 113 writeups at an "average" reputation=4 are required to obtain level 4). There is a cap for "average" reputations greater than 10. This cap ensures that writeup requirements do not fall below acceptable limits.
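The writeup counts in the second table follow from dividing the Honor Roll points by the "average" reputation. A minimal sketch, assuming round-half-up rounding (inferred from table entries such as 52.5 becoming 53 and 386.25 becoming 386) and a hypothetical function name. It covers only the writeup requirement and does not model the XP requirement or the Level 2 condition.

```python
import math

# Honor Roll points per level, copied from the first table (levels 3-13).
HONOR_POINTS = {3: 210, 4: 450, 5: 750, 6: 1140, 7: 1545, 8: 2100,
                9: 2700, 10: 3645, 11: 5400, 12: 8100, 13: 13500}

MIN_AVERAGE = 3   # below this, the Regular Requirements apply
MAX_AVERAGE = 10  # cap on the "average" used in the calculation

def honor_roll_writeups(level, average):
    """Writeups needed for `level` on the Honor Roll, or None if not eligible."""
    if level not in HONOR_POINTS or average < MIN_AVERAGE:
        return None
    capped = min(average, MAX_AVERAGE)
    # Round half up; this rounding rule is inferred from the table values.
    return math.floor(HONOR_POINTS[level] / capped + 0.5)

print(honor_roll_writeups(4, 4))    # level 4 at an "average" of 4
print(honor_roll_writeups(13, 25))  # capped at an "average" of 10
print(honor_roll_writeups(5, 2.9))  # not eligible for the Honor Roll
```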

The XP requirements remain as they are. In order to level up according to the Honor Roll system, a noder still needs to meet the XP requirements. This rule ensures participation through voting.

When a noder's "average" reputation falls below 3, the Regular Requirements for leveling up apply.

notes:

The best method of calculating a robust, accurate "average" writeup reputation is still in the air. I am currently leaning towards yerricde's method of calculating the interquartile mean. I am evaluating the sensitivity of this method towards mass downvoting.

The required "average" reputation for entering the Honor Roll and the level points need to be verified. This also depends on the method of calculating the "average".

(idea)

by AwkwardSaw

Fri Oct 05 2001 at 6:28:28

If we're going to look for a mathematical solution for this new leveling problem, we need to figure out how we're going to take the data (a user's writeups with their reputations) and draw the most useful conclusions out of them. Based on the data we have, what's the best way to determine what a user has contributed to Everything? We have to consider what makes someone a good noder. I haven't been around for too long, but I'm learning. I look up to many noders for the following reasons:

I learn something when I read something they wrote. We have a ton of excellent factual writers.
They make me laugh. All too many noders try to write funny nodes and fail miserably. Those who succeed should be rewarded.
Their writing is straight-up quality. When I see a writeup by certain noders in the "New Writeups" nodelet, I know I'm in for some good reading.

What's the correlation between average reputation and noder quality? Well, because of our current voting system, a general answer is obvious. The better you are, the more upvotes you'll get. It's been said that factual nodes will get you nowhere. Perhaps, before I got here, factuals weren't valued as highly. But in my experience I've found the exact opposite to be true. I have found, however, that factuals get less attention (meaning total votes) than other nodes. I try to maintain a high standard of quality for my writeups. If an excellent writeup of mine ends up with five upvotes and no downvotes, while a node about sex ends up with a reputation of 100, that does not change the quality of what I've written. This is to be expected -- if a random node about engineering and a node about a more curious topic are written at the same time, human nature dictates that more people will read the latter one. There's nothing wrong with that, because you read what you are interested in. I'm here to read about what I'm interested in, and to share my knowledge and experiences. I think that most noders are here for the same reasons.

A node on any subject can differ in quality. A funny writeup can make me fall out of my chair laughing. A half-assed attempt at a funny writeup can be so bad that it will have a halo over its head before I can even read it. Factual nodes, as well, can be thorough and informative or short and worthless (or, even worse, incorrect).

As I've voted on writeups by good noders, I've noticed one consistent thing among all types of writeups. They have their upvotes and downvotes, but a reflection of the quality of a writeup is generally found in the ratio of upvotes to downvotes. Good factuals are rarely downvoted massively, but receive less attention, while humorous or opinionated nodes receive more attention but also more downvotes. Thus, I propose that the ratio of upvotes to downvotes that a user has received be used as a factor in the new leveling scheme.
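As a rough sketch of the proposed ratio: the function name is hypothetical, and the handling of writeups with no downvotes is an assumption, since the proposal does not say what to do in that case. The vote counts for the humorous node are invented so that its reputation comes out at 100.

```python
def vote_ratio(upvotes, downvotes):
    """Ratio of upvotes to downvotes.

    Assumption: a writeup with upvotes and no downvotes gets an infinite
    ratio, and a writeup with no votes at all gets 0.
    """
    if downvotes == 0:
        return float("inf") if upvotes > 0 else 0.0
    return upvotes / downvotes

# Factual writeup: five upvotes, no downvotes (reputation 5).
print(vote_ratio(5, 0))

# Hypothetical humorous node: 130 up, 30 down (reputation 100).
print(vote_ratio(130, 30))
```

The unbounded ratio for unanimously upvoted writeups is one reason the ratio would probably work better as one factor among several than as a criterion on its own.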

By no means do I propose this ratio for use as the only criterion for the new level system. The same should hold true for the other proposed plans. No matter what sort of criteria we use, no number or status can precisely rank which users are better than others. To do so concretely would not only be a Herculean task, it would be unfair. There's no way to say that noder X is "better" than noder Y. Call me crazy, but when I click on Everything's Best Users I don't see it as a ranking. I see a list of people who are committed to this site, whether they are regular users, editors, or gods. For the new leveling system, plenty of good ideas have been thrown around already. The administration knows we have an intelligent user base. Putting all of our ideas together, instead of relying on one statistic, will undoubtedly produce the best solution.

That being said, I would also hope that we don't complicate the leveling system too much. There is a lot that new users have to learn about E2 already. The current scheme is easy to understand, and if we make the new scheme too convoluted, many new users with tremendous potential may be turned off. Current users won't like the idea of having to figure out what they need to do (and not do) to get to the next level. It's all about striking a healthy balance.

(idea)

by Psychonaut

Fri Oct 05 2001 at 6:47:53

I don't have any college-level statistics or math knowledge. However, I think it might be important to put my layperson's observations into text.

What Professor Pi suggests is a more complicated formula for determining someone's level, to address certain perceived inequities in the current XP/Writeup level system. Another user points out a potential inequity in that system, and proposes a more complicated formula for determining a user's level. Another user points out a potential inequity in that system.

Logically, every system is going to have some kind of inequity. The more complicated the system we use, the more complicated the inequities, and the harder they will be to spot.

Eventually, we have a system that the average noder can't figure out. Maybe a program like the E2 Node Tracker will be able to tell them their Wasdf GTKY ^2 WhateverNode-Foo is too low, but they'll have to find a math or statistics geek to tell them how much they have to improve their nodes to level up.

Now, admittedly, I kinda follow Professor Pi's formula. The other suggestion I'd probably have to go over a few times, and may have to use some references to figure out. But the more complicated we make the system, the harder it's going to be to quantify how "good" or "bad" we're doing. Obscurity in advancement (or lack thereof) can be very discouraging, and cause us to lose what might otherwise be quality noders, or people who could become quality noders if they could understand the advancement system. Admittedly, gaining levels (and the associated perks) is a HUGE incentive to write better nodes. I'd like more of my nodes to have a higher rep than they do now.

There is also the problem of subjectivity, not of the content of a writeup, but of the perception of those who upvote or downvote a writeup. I have a lyric node that is at -4 rep right now, simply because of the title and lyrics. Why Won't Jesse Helms Just Hurry Up And Die? is a factual representation of the lyrics of a song that has widely circulated the net. As of this writing, 8 upvotes and 12 downvotes. I never understood how a lyrics node could get downvoted, aside from a total lack of html tags to format the text (which my writeup doesn't suffer from).

I also see the possibility, with some of these formulas, of a flood of E2 Nuke Requests when people decide they want to get rid of their low reputation writeups to advance a level.

Just my 2.32 Yen (approx 2 cents) worth.

(idea)

by 1010011010

Fri Oct 05 2001 at 8:02:06

It's flattering that Professor Pi would spend so much time on my method. What's disconcerting is how little his presentation represents what I was describing, though his mistakes are understandable. It's obvious that the /msg function of the catbox is not the best place to discuss statistical analysis and calculus.

The one that jumps out at me first is that he is not using the midpoints of the section, but rather endpoints. This, as he notes, incorporates the extreme reputations and, as expected, can skew the results. This is exactly the reason why midpoints are used.

The second, and most egregious, error is that Professor Pi has swapped the domain and range in his dataset. The reputation of the write-up represents the Y-values. The position of a write-up in the ranked list gives you an X-value.

                    The Second Dataset

12 |                                         -
11 |                                         
10 |                                       --
 9 |                                      -  
 8 |                                    --
 7 |                                 ---     
 6 |                              ---         
 5 |                          ----           
 4 |                      ----               
 3 |                 -----                   
 2 |           ------                        
 1 |      -----                              
 0 |------____________________________________
    0 1 2 3 4 5 6 7 8 9 1 1 1 1 1 1 1 1 1 1 2
      0 0 0 0 0 0 0 0 0 0 1 2 3 4 5 6 7 8 9 0
                        0 0 0 0 0 0 0 0 0 0 0

If we were going to use three points we would poke the write ups at 25%, 50% and 75% down the list.
In this case #52, #103, and #155, as there are 206 write-ups. Reputations as follows: #52=1 #130=4 #155=6.
These data apply to both datasets.
Node-Fu (MpNF)? 3.7
User-X's contribution to E2 (MpNFP)? 755.3

Whoops! It's been pointed out to me that the 25th and 75th percentiles do not represent the midpoints of the first and last third. It should be 16.5% and 82.5%, or the #34 and #170 write-ups.
The reputation of #34 is still 1, but the reputation of #170 is 7.
The result? MpNF=4 and MpNFP=824.
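The reputations quoted for individual positions can be checked by expanding Professor Pi's first histogram into a ranked list, with position as X and reputation as Y. This is only a lookup sketch with a hypothetical helper name; it does not compute MpNF or MpNFP.

```python
# Histogram from Professor Pi's first table: reputation -> number of writeups.
hist = {0: 30, 1: 24, 2: 31, 3: 27, 4: 19, 5: 20,
        6: 13, 7: 17, 8: 11, 9: 8, 10: 6}

# Expand into a ranked list: index = position (X), value = reputation (Y).
ranked = [rep for rep in sorted(hist) for _ in range(hist[rep])]

def reputation_at(position):
    """Reputation of the write-up at a 1-based position in the ranked list."""
    return ranked[position - 1]

print(len(ranked))
for pos in (34, 52, 155, 170):
    print(pos, reputation_at(pos))
```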

What was originally done was basically taking data points evenly spaced throughout the list and then discarding the highest and lowest... which may actually be a superior measure and is definitely easier to understand.

My main complaint with MNF and MNFP is not that it's easy to manipulate because it's only based on one data point. It's that it's not very representative.

User A has 66 nodes of reputation 1 and 33 nodes of reputation 40
User B has 33 nodes of reputation 1, 33 nodes of reputation -20, and 33 nodes at 40
User C has 99 nodes at 1
User D has 33 at 1, 33 at -40, and 33 at 20
User F has 66@1 and 33@-40

All of these noders have the same number of write-ups and the same median value. MNF and MNFP treat them as if they all have identical noding habits... which is clearly not the case. Whatever method is finally chosen, it should be able to effectively distinguish between the above 5 noders and order them correctly, yet not be adversely affected if each of them adds a write-up to Uses of Soy in Lesbian Monkey Foreplay which is immediately C!'d and voted to 200, while still protecting users from malicious downvoting.
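The five users above can be checked directly. In this sketch the plain mean is shown only as a contrast to the median; it is not being proposed as the measure, since a single write-up voted to 200 would pull it upwards.

```python
from statistics import mean, median

# Reputation lists for the five example users described above.
users = {
    "A": [1] * 66 + [40] * 33,
    "B": [1] * 33 + [-20] * 33 + [40] * 33,
    "C": [1] * 99,
    "D": [1] * 33 + [-40] * 33 + [20] * 33,
    "F": [1] * 66 + [-40] * 33,
}

for name, reps in users.items():
    # Same count and same median for everyone; only the mean differs.
    print(name, len(reps), median(reps), round(mean(reps), 2))
```

Every user comes out with 99 write-ups and a median of 1, while the means run from highest to lowest in the order A, B, C, D, F.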
