pre-dawn musings on the creation and destruction of data
One of the problems a systems developer encounters when designing a new system is the finite nature of storage space. Admittedly the amount of data it is possible to store is increasing every day, but there is still an upper limit, most especially when data access speeds enter the equation.
Let's postulate a theoretical system which stores data for any person in the world. *sheepish grin*. The first step is to decide what data to destroy. Two obvious criteria spring to mind: relevance and age.
Let's examine 'relevance' first of all. Obviously if the data is not relevant to the task at hand it need not be stored. This has been the saving grace for the majority of current systems. Problems do occur if you store whatever is handed to the database. In that situation one starts to wonder if it is really necessary to store whatever a user enters. What if it is a seemingly random set of characters? Those characters might appear random to the administrator but they might be vitally important to the user. Take for example the word "barracuda". To the majority of people "barracuda" is a fish which is easily provoked. However, to me it also represents a World War II plan for an assault on Naples which was cancelled. Most Sys Admins at this point group the data by 'The Person Entering The Data' and then stick an upper limit on each user. Problem solved, that is until you understand the amount of data one user can create in a year. Whatever upper limit you set, by the time it's realistic for the user it's unrealistic for the system.
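For what it's worth, here is roughly what that per-user upper limit looks like in code. This is only a minimal sketch in Python: `QuotaStore`, `limit_bytes` and `store` are names I made up for illustration, and a real system would sit on a real database rather than a dictionary.

```python
from collections import defaultdict


class QuotaStore:
    """Toy in-memory store that caps the bytes each user may keep (hypothetical)."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used = defaultdict(int)    # user -> bytes stored so far
        self.items = defaultdict(list)  # user -> entries accepted so far

    def store(self, user, text):
        """Keep the entry if it fits under the user's cap; report whether it was kept."""
        size = len(text.encode("utf-8"))
        if self.used[user] + size > self.limit_bytes:
            # Over the cap: rejected on size alone, however much it means to the user.
            return False
        self.used[user] += size
        self.items[user].append(text)
        return True


quota = QuotaStore(limit_bytes=12)
first = quota.store("me", "barracuda")   # 9 bytes, fits under the cap
second = quota.store("me", "Naples")     # 6 more bytes would exceed 12
other = quota.store("you", "Naples")     # a different user has a separate cap
```

The code is the easy part. The one number it takes as input, `limit_bytes`, is exactly the thing nobody knows how to choose.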
Trying to accommodate the end-users ('cause that's the reason we're designing the system in the first place, right?) we take a look at 'age'. Immediately we hit a brick wall. Just because data is old doesn't mean that it will not be needed or wanted sometime in the future!
Let's examine some more possibilities:
1. Usage
2. User moderation
Usage is a distinct possibility. If we monitor data access we can pinpoint data in the system which is 'dead'. But still we need to set a limit on the system. Who says when dead is dead? There have been many occurrences of famous works and information being rediscovered years, even centuries, after they were thought lost. Famous example - AYBABTU.
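Monitoring access could be as simple as remembering when each item was last touched. The sketch below is an assumption on my part rather than any real system's API: `UsageTracker`, `touch`, `dead_keys` and `max_idle` are invented names, and time is just a plain number of days to keep the example deterministic.

```python
class UsageTracker:
    """Remembers the last access time of each item (hypothetical sketch)."""

    def __init__(self):
        self.last_access = {}  # item key -> day of most recent access

    def touch(self, key, now):
        """Record that `key` was read or written on day `now`."""
        self.last_access[key] = now

    def dead_keys(self, now, max_idle):
        """Return keys nobody has touched for more than `max_idle` days."""
        return sorted(
            key for key, seen in self.last_access.items()
            if now - seen > max_idle
        )


tracker = UsageTracker()
tracker.touch("old-meme", now=0)
tracker.touch("fresh-note", now=90)
dead_before = tracker.dead_keys(now=100, max_idle=30)  # only "old-meme" is idle

tracker.touch("old-meme", now=100)                     # someone rediscovers it
dead_after = tracker.dead_keys(now=100, max_idle=30)   # nothing is dead any more
```

Note that `max_idle` is the "when is dead really dead?" question turned into a parameter. The code can find idle data, but it cannot tell you which value is safe.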
User moderation is also a distinct possibility. If the users decide for themselves which information to keep and which to destroy then there can be no problems or complaints when data is destroyed... the users will have voted it destroyed. There's a lot more to examine on this train of thought... issues such as how to call a vote, and should all the users have the right to vote? Another method of user moderation is to have super-users who will edit the content of the database. These super-users have dictatorial powers, so most current systems select super-users from dedicated users who have a demonstrated commitment to the system.
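To make the voting questions a little more concrete, here is one possible rule, sketched in Python. Everything in it is an assumption for the sake of argument: the function name `vote_to_destroy`, the idea of an `eligible` set of users, and the `quorum` and `threshold` fractions are mine, not taken from any existing system.

```python
def vote_to_destroy(votes, eligible, quorum=0.5, threshold=0.5):
    """Decide whether an item is voted destroyed (hypothetical rule).

    votes:     dict of user -> True (destroy) or False (keep)
    eligible:  set of users who have the right to vote
    quorum:    minimum fraction of eligible users who must take part
    threshold: fraction of counted votes that must be exceeded to destroy
    """
    # Ignore ballots from users who have no right to vote.
    counted = {user: choice for user, choice in votes.items() if user in eligible}
    if not eligible or not counted:
        return False  # nobody entitled to vote, or nobody voted: keep the data
    if len(counted) / len(eligible) < quorum:
        return False  # too few voters turned up: keep the data
    destroy = sum(1 for choice in counted.values() if choice)
    return destroy / len(counted) > threshold


eligible = {"ann", "bob", "cat", "dan"}
clear_majority = vote_to_destroy({"ann": True, "bob": True, "cat": False}, eligible)
no_quorum = vote_to_destroy({"ann": True}, eligible)
tie = vote_to_destroy({"ann": True, "bob": False, "eve": True}, eligible)
```

I've made it deliberately biased towards keeping data: a tie, a missed quorum or an empty ballot all leave the item alone. The super-user approach would amount to shrinking `eligible` down to a trusted few.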
To sum up, there are two extremes: make all data transient or make all data permanent. Having all the data transient can be represented by a chat room or a conversation. Perhaps the other extreme can be represented by the internet? Somewhere in between is a system I want to design for users, hopefully closer to the internet than to a chat room. Until then this is my bit of permanence.