This writeup details BAsCET's application to bibliographic reference recognition. See BAsCET for an overview.
The instance seeker is an agent whose task is to detect, among the Blackboard objects, a specific term belonging to a leaf of the field hierarchy, and to build the corresponding object. The term is given by the name of the Father node, which is passed as a parameter (the Father node is the node that launched the agent into the CodeRack). This node is necessarily a specific node of the Concept Network. The agent searches only among objects that are instances of fields of the Concept Network (not among other specific terms already found) and that sit above its own category in the hierarchy.
Algorithm 1 details the instance seeker agent's behavior.
Algorithm 1: Instance seeker.
Required: Father, the Concept Network node whose content is to be found.
Find the instance of the nearest node containing the Father node
While there is an object to check Do
    For each appearance of the string to find inside this object Do
        If it is found in an instance of the specific node Then
            The object's happiness is incremented
        Else
            Try to create the corresponding object
            If the object is successfully created Then
                The object is linked to the one that describes it
            End If
        End If
    End For
    The next object to analyze becomes the instance of the node containing the father node of the current object
End While
If the string was found in the Blackboard Then
    The agent is inhibited for three cycles
    If the containing field is not yet found Then
        Re-activate its agents ZS and FS
    End If
Else
    Deactivate the Father node
    Inhibit this agent for two cycles
End If
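The loop of Algorithm 1 can be sketched in Python as follows. This is a minimal, hypothetical sketch, not BAsCET's implementation: the names `Obj`, `Outcome`, `instance_seeker`, `parent_of` and `board` are illustrative; object locations are assumed to be absolute offsets into the reference string; `similarity` is a stand-in that only reproduces the 98% "initial uppercase" figure from the text; object creation always succeeds (conflict resolution is left out); and an occurrence already handled at a lower level is assumed to be skipped when the search climbs to an enclosing field.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

FIRST_RECOGNITION_CAP = 75  # first recognition limited to 75 % (from the text)


@dataclass
class Obj:
    concept: str                      # Concept Network node, e.g. "field:title"
    content: str                      # matched text
    start: int                        # absolute offset (assumption)
    happiness: int                    # percent
    container: Optional[Obj] = None   # object that describes this one


@dataclass
class Outcome:
    found: bool
    inhibit_cycles: int
    deactivate_father: bool
    reactivate_zs_fs: bool


def similarity(found: str, term: str) -> int:
    # Hypothetical matcher: 100 for an exact match, 98 when only case differs.
    return 100 if found == term else 98


def occurrences(text: str, term: str) -> list[int]:
    # Case-insensitive start offsets of term inside text.
    low, needle = text.lower(), term.lower()
    hits, i = [], low.find(needle)
    while i != -1:
        hits.append(i)
        i = low.find(needle, i + 1)
    return hits


def nearest_instance(node: Optional[str], parent_of: dict[str, str],
                     board: list[Obj]) -> tuple[Optional[str], Optional[Obj]]:
    # Climb the Concept Network until a node with a Blackboard instance is met.
    while node is not None:
        inst = next((o for o in board if o.concept == node), None)
        if inst is not None:
            return node, inst
        node = parent_of.get(node)
    return None, None


def instance_seeker(father: str, term: str, parent_of: dict[str, str],
                    board: list[Obj]) -> Outcome:
    node, obj = nearest_instance(parent_of.get(father), parent_of, board)
    containing_field_found = node is not None and node == parent_of.get(father)
    handled: set[int] = set()  # assumption: each occurrence is handled once
    while obj is not None and node is not None:
        for offset in occurrences(obj.content, term):
            pos = obj.start + offset
            if pos in handled:
                continue
            handled.add(pos)
            existing = next((o for o in board
                             if o.concept == father and o.start == pos), None)
            if existing is not None:
                existing.happiness += 1  # already an instance: increment it
            else:
                text = obj.content[offset:offset + len(term)]
                hap = round(similarity(text, term) * FIRST_RECOGNITION_CAP / 100)
                # Creation always succeeds in this sketch; link to describer.
                board.append(Obj(father, text, pos, hap, container=obj))
        # Move up to the instance of the node containing the current one.
        node, obj = nearest_instance(parent_of.get(node), parent_of, board)
    if handled:
        return Outcome(True, 3, False, not containing_field_found)
    return Outcome(False, 2, True, False)
```

Run on the situation of Figure 2 with `father="word:reseaux"`, the search starts in the field:word object "Reseau", finds nothing there, climbs to the title "Reseaux de neurones" and creates a word:reseaux object with happiness 74.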
Figure 1: Part of the Concept Network matching the Blackboard in Figure 2.
+------------+
| field:doc | +-------+: generic fields
+------------+
/ \ *********: instances (specific)
/ \
| |
v v
+------------+ +-----------+
|field:author| |field:title|
+------------+ +-----------+
| |
v v
+------------+ +-----------+
| field:a | | field:word|
+------------+ +-----------+
/ \
/ \
************** *************
* word:reseaux * * word:reseau *
************** *************
Figure 2: State of the Blackboard during an example treatment.
+-+--------------------------------------------+---------+
| |+---------+ +----------------------+ |field:doc|
|H||Y. Cochet|.| Reseaux de neurones |... 1988+---------+
|?|+---------+ +----------------------+ |
+-+----|------------------------|--------------+
v v
+-+---------+----------+ +--+-----------------------+-----------+
| | | field: a | | | +------+ |field:title|
|H|Y. Cochet+----------+ |H | |Reseau|x de neurones +-----------+
|?| | 13-21 | |23| +------+ | 50-68 |
+-+---------+----------+ +--+-------|---------------+-----------+
v
############################## +--+------+----------+
#+---------+-------+--------+# | | |field:word|
#| | | father |# |H |Reseau+----------+
#|Happiness|content+--------+# |74| | 0-5 |
#| | |location|# +--+------+----------+
#+---------+-------+--------+# |
############################## v
+--+------+-----------+
| | |word:reseau|
|H |Reseau+-----------+
|74| | 0-5 |
+--+------+-----------+
Figure 2 shows what the Blackboard might contain while a problem is being processed, just before the instance seeker for the word reseaux runs. This word is represented in the Concept Network (cf. Figure 1) by the word:reseaux node. (The word, French for "networks", is normally written with an accented letter, é, which the system represents using XML.) Figure 2 shows that the system has already found an instance of the specific node word:reseau, the singular form of word:reseaux. Because no linguistic pre-processing is run on the base, both forms are represented in the Concept Network; running such pre-processing would be better.
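The linguistic pre-processing suggested above could start from something as small as the following Python sketch. It is an assumption, not part of BAsCET: it supposes the accented letter is stored as an XML character reference such as `&#233;`, the function name `normalize` is hypothetical, and the plural rule (dropping a final s or x) is deliberately crude; it would, for example, wrongly shorten a word like prix.

```python
import html
import unicodedata


def normalize(word: str) -> str:
    """Crude pre-processing sketch: decode references, lowercase, strip accents, drop a plural -s/-x."""
    decoded = html.unescape(word)  # "R&#233;seaux" -> "Réseaux"
    decomposed = unicodedata.normalize("NFD", decoded.lower())
    # Remove combining marks such as the acute accent left by NFD decomposition.
    bare = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    if len(bare) > 3 and bare[-1] in "sx":
        return bare[:-1]
    return bare
```

With such a step, "R&#233;seaux" and "reseau" both map to "reseau", so a single Concept Network node could cover the singular and plural forms.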
The Reseau substring occurs twice as an object: once as an instance of the specific node word:reseau and once as an instance of the field word. This object has a happiness of only 74%, because running an instance seeker alone is not considered sufficient to be sure that what has been found really is a word of the title matching the term reseau. Had the recognition been accepted in full, the node's happiness would have been 98%, but it was decided to limit the happiness of a first recognition to 75%. The algorithm includes a mechanism that raises this happiness the second time an instance seeker agent for the same field finds the same thing. This accounts for the fact that the object has survived and withstood every "fight" against other candidates throughout the interval between two runs of the agent, which lasts at least two cycles (when the agent comes back empty-handed, it is inhibited for two cycles, and its Father node is deactivated as well).
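The capped first recognition can be illustrated with the sketch below. The 75% cap comes from the text; the class name `Recognition` and the reinforcement rule for the second finding are hypothetical placeholders, since the text does not say by how much happiness is raised. With a 98% similarity, a first recognition gives 98 × 75 / 100 = 73.5, which rounds to the 74% shown in Figure 2.

```python
from dataclasses import dataclass

FIRST_RECOGNITION_CAP = 75  # percent, from the text


@dataclass
class Recognition:
    """Hypothetical record of one object found by instance seekers."""
    similarity: int       # matching rate, in percent
    times_found: int = 1  # how many seeker runs found this same object

    @property
    def happiness(self) -> int:
        if self.times_found == 1:
            # First recognition: the happiness is capped.
            return round(self.similarity * FIRST_RECOGNITION_CAP / 100)
        # Hypothetical reinforcement: once found again, trust the raw similarity.
        return self.similarity

    def found_again(self) -> None:
        self.times_found += 1
```

For example, `Recognition(98)` reports a happiness of 74 until `found_again()` is called, after which it reports 98.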
The happiness of the title node is proportional both to the relative size of the Reseau string within this field and to the happiness of the word field "Reseau". Reseau covers 6 of the field's 19 characters (spaces included), so its contribution to the title is 6 / 19 × 74 ≈ 23.4, that is, 23%.
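The proportional share can be written as a one-line Python function. The name `share_in_field` is illustrative, and rounding to the nearest integer is an assumption that reproduces the 23% of Figure 2.

```python
def share_in_field(word: str, field_text: str, word_happiness: int) -> int:
    """Happiness a word contributes to its field, proportional to its relative length."""
    return round(len(word) / len(field_text) * word_happiness)
```

`share_in_field("Reseau", "Reseaux de neurones", 74)` evaluates 6 / 19 × 74 ≈ 23.37 and returns 23.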
Following Algorithm 1, the string reseaux is looked for in the title field, that is, in "Reseaux de neurones". It is found at location 0-6 with a similarity rate of 98% (not 100%, because of the initial uppercase letter). The agent then tries to create the Reseaux object (word:Reseaux) at location 0-6 of the title field, with a happiness of 98 × 76 / 100 = 74%. However, the Reseau object (word:Reseau) already lies at this place (location 0-5). The conflict is resolved by comparing the scores of the two descriptions. Because these scores depend on both the length and the happiness of each object, and the two objects have the same happiness (because of a fortuitous rounding), the longer object, Reseaux, wins.
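The conflict resolution can be sketched in Python as below. The text only says that the score depends on length and happiness; the product happiness × length used here, and the names `Candidate`, `score` and `resolve`, are hypothetical choices that yield the described behavior: at equal happiness the longer object wins, while a clearly happier short object can still beat a longer one.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    start: int       # first character, inclusive
    end: int         # last character, inclusive
    happiness: int   # percent

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def overlaps(a: Candidate, b: Candidate) -> bool:
    return a.start <= b.end and b.start <= a.end


def score(c: Candidate) -> int:
    # Hypothetical score: grows with both happiness and length.
    return c.happiness * c.length


def resolve(incumbent: Candidate, challenger: Candidate) -> list[Candidate]:
    """Return the objects that stay on the Blackboard."""
    if not overlaps(incumbent, challenger):
        return [incumbent, challenger]
    # Highest score wins; equal scores go to the longer object, then to the incumbent.
    winner = max((incumbent, challenger), key=lambda c: (score(c), c.length))
    return [winner]
```

With Reseau at 0-5 and Reseaux at 0-6, both at 74%, the scores are 444 and 518, so only Reseaux remains.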
See also: Building Hierarchical Structures in the Blackboard, Building the logical part of a Concept Network representing bibliographic references, Building a Concept Network to represent bibliographic references, BAsCET's application on bibliographic references recognition, BAsCET.