Hopefully this is the daylog to which I was referred for the purposes of evaluating my sense of humor.

On the oft-heard topic of 'my martial arts school/style is better than yours':

Our white belts can beat up your black belts, yellow belts will take on your whole school and mop up your extended families for dessert, orange belts tie one hand behind their back and challenge bulls and bears to unarmed combat. Green belts chop down giant sequoias with their bare hands, scoffing at the EPA all the while, purple belts put their underwear on their head at night to fight super villains, blue belts achieve enlightenment every Tuesday, Thursday and Saturday. Brown belts don't bother with their tax returns, red belts break bricks with their mind and can start a fire by vigorously rubbing their legs together, red-black stripes crack pavement and terrorize livestock with their mere presence, black belts cause quantum singularities at the impact of every punch, and the Master creates sentient life from common household cleaning products.

In this world, there are many ways in which a person can be considered alone. By the dictionary's definition, you're alone when you're without company. Fair enough, but it's hard to go anywhere and not be surrounded by people (even if you don't know most of them), so for the definition to make any sense, we have to take 'company' to mean friends. Yet another meaning of alone is that someone isn't in a romantic relationship - in fact, many languages use nearly the same word for single (as in Marital status: Single) as for alone. This is all very interesting, but much like steveometry, it's something most people couldn't care less about.

Why this rant? Well, I've noticed a difference lately. I don't know what happened, but before, I'd always felt a bit alone, though not in the usual sense; I certainly have friends and a (somewhat) caring family, and I'm surrounded by one or the other virtually all the time, except when sleeping in the privacy of my own room. Not that I object to this. No, this loneliness was better described as a feeling that something important was missing.

Not knowing when it started or what caused it, I moved on, as people do. I didn't search for the cause; at the time, it seemed a fruitless venture that would have done me more harm than good. Instead, I got used to the feeling, and with time I stopped noticing it. I thought it had gone away for good, appearing only now and then as shadows in my dreams, sometimes taking human shape. Those were enjoyable dreams.

I had decided to be satisfied in knowing that the thing I was missing, I would never find.

Then one day, it just appeared. She came to me, proving that I had been wise not to search for her - if I had, I would never have found her, nor would she have come to me. Besides, searching for something that will come to you by itself is a foolish errand. Patience is a virtue, as the wise constantly say. My patience paid off.

The feeling disappeared. There's no void in my life anymore.

I'm not alone.

Being the statistics nut that I am, I was curious about the relative contributions of e2 users. I mean, on one hand you have a ton of users who contributed one daylog and then bolted for the door. On the other, you have pingouin's 1883 writeups, Segnbora-t's 1715, and the Jargon File's 2254 contributions. Not to mention that one guy with all the words.

So, being somewhat learned in the ways of web programming, I conjured up a PHP script to surf e2 for users and record the number of writeups each one has. I limited my search to users who have contributed at least one writeup.

So far, I've pulled 5,348 users and over 350,000 writeups (including Webby and user aliases) into my data - roughly 77% of all writeups. Throwing out the esteemed Mr. Webster as a major outlier, here are some general statistics for the data:

Users Found: 5348
Total Writeups: 252295
Mean: 47.184402468674
Standard Deviation: 19.657106235794

That is to say, the average user contributes about 47 writeups here - not too shabby. But with a standard deviation of about 19.7, that average doesn't tell the whole story. If the counts followed a normal (bell-curve) distribution, about 68% of our users would have somewhere between 28 and 67 writeups, and about 95% would have between 8 and 86. They don't, though: nobody in the data can have fewer than 1 writeup, while a handful of users have thousands, so the distribution is heavily skewed to the right, and the 68/95/99.7% rule of thumb for one, two, and three deviations doesn't describe it well.
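The "X% of users within one deviation" reasoning assumes a bell curve (the 68-95-99.7 rule for normal distributions). Here's a minimal Python sketch, an illustration rather than the PHP script used to gather this data, that computes the mean, median, and population standard deviation of a list of writeup counts and checks what fraction of users actually lands within k deviations. The sample list is made up and is not the e2 data.

```python
from statistics import mean, median, pstdev

# Made-up, heavily skewed sample of writeup counts for 100 users (NOT real e2 data):
# half the users have a single writeup, and a handful have hundreds or thousands.
SAMPLE = [1] * 50 + [2] * 15 + [5] * 10 + [20] * 10 + [60] * 10 + [500] * 4 + [1800]


def share_within(counts, k):
    """Fraction of users whose count lies within k population standard deviations of the mean."""
    mu = mean(counts)
    sigma = pstdev(counts)
    inside = [c for c in counts if mu - k * sigma <= c <= mu + k * sigma]
    return len(inside) / len(counts)


if __name__ == "__main__":
    print(f"mean={mean(SAMPLE):.1f}  median={median(SAMPLE)}  sd={pstdev(SAMPLE):.1f}")
    for k in (1, 2, 3):
        # A normal distribution would give roughly 68%, 95% and 99.7% here.
        print(f"within {k} SD: {share_within(SAMPLE, k):.0%}")
```

For this made-up sample the mean is 47.3 but the median is only 1.5, and 95% of users fall within one standard deviation instead of the 68% a normal curve would predict. With data this lopsided, medians and percentile breakdowns say more than the mean and standard deviation do.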

Of course, all of this makes sense. We have a lot of users, past and present, who contributed 10-50 writeups, then cooled off and now post maybe one writeup a month - if they're still around. Meanwhile, users like Pseudo_intellectual, who posted 1600+ writeups, account for a large share of the total. Disproportionately so, it would seem.

And now it's time for a breakdown.

So, bearing in mind my original goal of seeing who contributes what to e2, here is a breakdown of the writeups contributed, by percentile of users. (I threw out 8 users with only 1 writeup each to get a number of users divisible by 20. Their 8 writeups would be negligible.)

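Total and cumulative writeups by 5% band of users (n = 267 users per row). TOTAL and the first PCT give each band's writeups and its share of all writeups; CUM. and the second PCT give the running totals.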

 TOTAL    PCT     CUM.    PCT
135572  53.74   135572  53.74 (Top 5%)
 43733  17.33   179305  71.07
 25458  10.09   204763  81.16
 16330   6.47   221093  87.63 
 10480   4.15   231573  91.79 (Top 25%)
  6493   2.57   238066  94.36
  3933   1.56   241999  95.92
  2517   1.00   244516  96.92
  1781   0.71   246297  97.62
  1284   0.51   247581  98.13 (Top 50%)
   997   0.40   248578  98.53
   804   0.32   249382  98.85
   577   0.23   249959  99.07
   527   0.21   250486  99.28
   472   0.19   250958  99.47 (Top 75%)
   267   0.11   251225  99.58 (1 writeup per user)
   267   0.11   251492  99.68         |
   267   0.11   251760  99.79         |
   267   0.11   252027  99.89         |
   260   0.10   252287    100         v
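The breakdown above boils down to: sort users by writeup count, most prolific first, cut them into 20 equal bands, and total each band. Here's a minimal Python sketch of that procedure, assuming the counts are already in a plain list of integers; the function name `band_table` and the tiny sample are hypothetical, and the sample uses 4 bands instead of 20 to keep it short.

```python
def band_table(counts, bands=20):
    """Split users, most prolific first, into `bands` equal groups and return
    one (total, pct, cumulative, cum_pct) tuple per group."""
    if not counts or bands <= 0 or len(counts) % bands:
        # Mirrors dropping a few 1-writeup users so the user count divides evenly.
        raise ValueError("number of users must be a positive multiple of the number of bands")
    ordered = sorted(counts, reverse=True)
    size = len(ordered) // bands
    grand_total = sum(ordered)
    rows, running = [], 0
    for i in range(bands):
        total = sum(ordered[i * size:(i + 1) * size])
        running += total
        rows.append((total, 100 * total / grand_total, running, 100 * running / grand_total))
    return rows


# Hypothetical writeup counts for 20 users (NOT real e2 data), split into quartiles.
SAMPLE = [600, 150, 50, 40, 30, 25, 20, 15, 10, 10, 8, 7, 6, 5, 4, 4, 4, 4, 4, 4]

if __name__ == "__main__":
    print(" TOTAL     PCT     CUM.     PCT")
    for total, pct, cum, cum_pct in band_table(SAMPLE, bands=4):
        print(f"{total:6d}  {pct:6.2f}  {cum:7d}  {cum_pct:6.2f}")
```

Raising an error when the count doesn't divide evenly is one design choice; spreading the remainder across bands would be another.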

Looking at this data (drawn from thousands of users, well above any minimum sampling requirement), we see that the top 5% of users contribute over 50% of the total nodegel. We also see that Pareto's Law (the 80/20 rule) more than holds, with the top 20% of users contributing 87.63% of the writeups.
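These shares can be checked straight from the TOTAL column. This short Python sketch copies the 20 band totals from the table and computes the share of all writeups held by the top few bands (each band is 5% of users, so 4 bands = the top 20%). The function name is hypothetical; to one decimal place the results match the 53.74%, 87.63%, and 98.13% figures in the table.

```python
# Writeups per 5% band of users, copied from the TOTAL column of the table (top band first).
BAND_TOTALS = [135572, 43733, 25458, 16330, 10480, 6493, 3933, 2517, 1781, 1284,
               997, 804, 577, 527, 472, 267, 267, 267, 267, 260]
BAND_WIDTH_PCT = 5  # each row covers 5% of the 5,340 users


def top_share(band_totals, top_pct):
    """Percentage of all writeups contributed by the top `top_pct` percent of users."""
    if top_pct % BAND_WIDTH_PCT:
        raise ValueError("top_pct must be a multiple of the band width")
    bands = top_pct // BAND_WIDTH_PCT
    return 100 * sum(band_totals[:bands]) / sum(band_totals)


if __name__ == "__main__":
    for pct in (5, 20, 50):
        print(f"top {pct:2d}% of users: {top_share(BAND_TOTALS, pct):.1f}% of writeups")
```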

It's important to note here that we're talking about 5,340 users, not the 78,000 listed in Everything Statistics. 20% of 5,340 is 1,068 users, and the lowest-ranked person among those 1,068 has 49 writeups. So if you've hit 50 or more, congratulations - you're a major E2 contributor! (If you've reached 123 writeups, you've broken into the top 10%.)
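Finding a cutoff like this is a matter of sorting and counting: rank everyone by writeup count, highest first, and look at the user at rank ceil(20% × 5,340) = 1,068. A minimal Python sketch, assuming the counts are in a list of integers and the percentage is a whole number; the function name and the sample are hypothetical.

```python
def threshold_for_top(counts, percent):
    """Writeup count of the lowest-ranked user still inside the top `percent`% of users.
    `percent` is an integer between 1 and 100."""
    if not counts or not 0 < percent <= 100:
        raise ValueError("need at least one user and 0 < percent <= 100")
    ordered = sorted(counts, reverse=True)
    # Integer ceiling of percent * n / 100, e.g. 20% of 5,340 users -> rank 1,068.
    cutoff_rank = (percent * len(ordered) + 99) // 100
    return ordered[cutoff_rank - 1]


# Hypothetical data: 100 users with 1, 2, ..., 100 writeups each (NOT real e2 data).
SAMPLE = list(range(1, 101))

if __name__ == "__main__":
    for pct in (10, 20, 50):
        print(f"top {pct}% starts at {threshold_for_top(SAMPLE, pct)} writeups")
```

Ties make the cutoff a little fuzzy: anyone with the same count as the user at the cutoff rank is arguably in the group too.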

As I accumulate more data, I notice two trends. One is that both the mean and the standard deviation are moving downwards - it seems I've already collected most of the users with more than 50 writeups, though there are still a few floating around. The second is that the number of users with exactly 1 writeup is astounding, almost to the point of ludicrosity. I wonder why so many people stuck around for just a cup of coffee and then left forever ...

To emphasize this, I remembered Benford's Law: in many large data sets spanning several orders of magnitude, the first digits are not spread evenly but follow a logarithmic distribution, so far more values start with a 1 (about 30%) than with a 9 (under 5%). The probability that the first digit is d is

P(d) = log_10(1 + 1/d)

where d is the digit in question (1 through 9). Anyway, here is a comparison of expected to actual outcomes:

DIGIT EXPECTED ACTUAL PROPORTION
    1   0.3010 0.4042      1.343
    2   0.1761 0.1923      1.092
    3   0.1249 0.1139      0.912
    4   0.0969 0.0814      0.840
    5   0.0792 0.0561      0.708
    6   0.0669 0.0514      0.768
    7   0.0580 0.0368      0.634
    8   0.0512 0.0335      0.654
    9   0.0458 0.0305      0.666
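The EXPECTED column is the Benford formula evaluated for d = 1 to 9, and the right-hand column is ACTUAL divided by EXPECTED. A short Python sketch recomputing both, with the ACTUAL values copied from the table; it matches the EXPECTED column to four decimal places and the ratio column to within rounding (about 0.001). The function name `benford` is just a label for this sketch.

```python
import math

# Observed leading-digit frequencies, copied from the ACTUAL column of the table.
ACTUAL = {1: 0.4042, 2: 0.1923, 3: 0.1139, 4: 0.0814, 5: 0.0561,
          6: 0.0514, 7: 0.0368, 8: 0.0335, 9: 0.0305}


def benford(d):
    """Benford's Law: probability that a value's leading digit is d (1-9)."""
    return math.log10(1 + 1 / d)


if __name__ == "__main__":
    print("DIGIT EXPECTED ACTUAL      RATIO")
    for d in range(1, 10):
        expected = benford(d)
        print(f"{d:5d} {expected:8.4f} {ACTUAL[d]:6.4f} {ACTUAL[d] / expected:10.3f}")
    # The nine Benford probabilities sum to log10(10) = 1.
    print(f"sum of expected = {sum(benford(d) for d in range(1, 10)):.4f}")
```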

Notice that the numbers in the right-hand column (actual divided by expected) should in theory all be close to 1 - that is, if our data followed Benford's Law. But we have so many users with just 1 writeup that they skew the whole distribution. Over half (56%) of the users whose writeup count begins with a 1 have, in fact, just 1 writeup. Taking them out brings the numbers much closer to Benford's ideal.
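The leading-digit tally, and the "take out the 1-writeup users" experiment, can be sketched in Python like this. It assumes the raw per-user writeup counts are available as a list of positive integers; the function names and the tiny sample are hypothetical, so the printed numbers say nothing about e2 itself.

```python
import math
from collections import Counter


def leading_digit_shares(counts, drop_singletons=False):
    """Share of users whose writeup count starts with each digit 1-9.
    With drop_singletons=True, users with exactly 1 writeup are removed first."""
    if drop_singletons:
        counts = [c for c in counts if c != 1]
    digits = Counter(int(str(c)[0]) for c in counts if c > 0)
    n = sum(digits.values())
    if n == 0:
        raise ValueError("no positive counts to tally")
    return {d: digits[d] / n for d in range(1, 10)}


def singleton_share_among_ones(counts):
    """Fraction of users whose count starts with a 1 that have exactly 1 writeup."""
    ones = [c for c in counts if c > 0 and str(c)[0] == "1"]
    return sum(1 for c in ones if c == 1) / len(ones)


# Hypothetical writeup counts for 10 users (NOT real e2 data).
SAMPLE = [1, 1, 1, 1, 10, 19, 2, 25, 300, 4]

if __name__ == "__main__":
    print(f"singletons among leading-1 users: {singleton_share_among_ones(SAMPLE):.0%}")
    for drop in (False, True):
        shares = leading_digit_shares(SAMPLE, drop_singletons=drop)
        # Ratio of observed share to the Benford expectation log10(1 + 1/d).
        ratios = {d: shares[d] / math.log10(1 + 1 / d) for d in range(1, 10)}
        print(f"drop_singletons={drop}: " + " ".join(f"{d}:{r:.2f}" for d, r in ratios.items()))
```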

Most of this is just data for the curious. It doesn't address issues of quality, reputation, how long a user has been with e2, or any of the other data that might prove more compelling or interesting. Maybe some day I'll go get some of that data, but for now, you can check these numbers out and maybe come up with an interesting conclusion of your own. Ciao!
