Case-based reasoning (CBR) is an instance-based learning technique used in machine learning. Unlike many other instance-based learning methods, CBR is used to make inferences about real knowledge, rather than real-valued functions or classification.
"A case-based reasoner solves new problems by using or adapting solutions that were used to solve old problems." (Riesbeck & Schank, 1989)
Rather than using general knowledge of the problem domain, CBR uses a set of past experiences. To solve a new problem, we look for similar problems that have already been solved and attempt to apply the old solution to the new problem. In general, it is easier to learn from a concrete past experience than to generalize from it.
Humans and other intelligent agents do this routinely in the real world. A doctor might cure a patient with certain symptoms. Given a new patient with similar symptoms, the doctor can apply the previous treatment to this patient. If we wish to model this process in order to apply it to machine learning situations, we need a formalisation.
A case is a problem situation. Previously experienced cases are called past cases, and are stored in a case base. All stored cases are independent of each other. In a diagnosis application, a case may consist of a description of the problem, in the form of a set of attribute/value pairs describing the symptoms, and a solution including the diagnosis and the action to take to resolve the problem.
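One way to picture this structure is the minimal sketch below. The class name `Case`, its field names, and the car-fault examples are illustrative assumptions, not part of any standard CBR library.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Case:
    """One problem situation: attribute/value pairs plus an optional solution."""
    problem: Dict[str, Any]            # attribute/value pairs describing the symptoms
    diagnosis: Optional[str] = None    # solution, part 1: what is wrong
    action: Optional[str] = None       # solution, part 2: how to resolve it

    def is_solved(self) -> bool:
        """A past case carries both a diagnosis and an action."""
        return self.diagnosis is not None and self.action is not None


# The case base is a plain list here; stored cases do not reference each other.
case_base: List[Case] = [
    Case({"engine_turns_over": False, "lights_work": False},
         diagnosis="flat battery", action="recharge or replace the battery"),
    Case({"engine_turns_over": True, "fuel_gauge": "empty"},
         diagnosis="no fuel", action="refuel"),
]

# A new problem is a case whose solution part is still empty.
new_problem = Case({"engine_turns_over": False, "lights_work": False})
```

Keeping the problem description and the solution in separate fields makes it easy to compare only the problem parts when looking for similar cases.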
The CBR algorithm can be abstracted to 4 steps:
- RETRIEVE similar case(s)
- REUSE the knowledge stored in the case(s)
- REVISE the solution
- RETAIN any newly acquired knowledge
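The four steps can be sketched as a single function. This is only an illustration under simplifying assumptions: the similarity measure `overlap` is a placeholder, the REVISE step merely accepts or rejects the proposed solution through a caller-supplied `works` check instead of repairing it, and all names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Problem = Dict[str, object]
StoredCase = Tuple[Problem, str]  # (problem description, solution)


def overlap(a: Problem, b: Problem) -> float:
    """Placeholder similarity: fraction of all attributes on which a and b agree."""
    keys = set(a) | set(b)
    if not keys:
        return 1.0
    agree = sum(1 for k in keys if k in a and k in b and a[k] == b[k])
    return agree / len(keys)


def cbr_cycle(problem: Problem,
              case_base: List[StoredCase],
              works: Callable[[Problem, str], bool]) -> Optional[str]:
    """Run one simplified pass of the cycle; return the solution or None."""
    if not case_base:
        return None
    # RETRIEVE: pick the stored case most similar to the new problem.
    _old_problem, old_solution = max(case_base, key=lambda c: overlap(problem, c[0]))
    # REUSE: propose the old solution for the new problem (no adaptation here).
    proposed = old_solution
    # REVISE: evaluate the proposal; this sketch only accepts or rejects it.
    if not works(problem, proposed):
        return None
    # RETAIN: store the new case alongside the old ones.
    case_base.append((dict(problem), proposed))
    return proposed


base: List[StoredCase] = [
    ({"starts": False, "lights": False}, "charge battery"),
    ({"starts": False, "lights": True}, "check starter motor"),
]
solution = cbr_cycle({"starts": False, "lights": False}, base,
                     works=lambda p, s: True)
```

In this example the first stored case matches the new problem on every attribute, so its solution is proposed, accepted, and the new case is appended to `base`.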
When presented with a new problem, we will have a set of attribute/value pairs describing the current situation. Note that this is a case without the diagnosis and action parts.
In order to find a solution, we examine the case base we already have. If this case base is very large, the storage and retrieval methods are important for efficiency. These methods depend on the size of the case base and the structure of the individual cases. Small case bases can be stored in linear lists and can be searched sequentially, but larger bases will require more efficient storage and retrieval methods, for instance, k-d trees.
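To make the contrast concrete, the sketch below compares a sequential search with a small hand-written k-d tree. It assumes that every case has been reduced to a tuple of numeric attribute values and that closeness is measured by squared Euclidean distance; neither assumption comes from the text above, and the names `Node`, `build`, `nearest` and `linear_nearest` are illustrative.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


class Node:
    def __init__(self, point: Point, axis: int,
                 left: "Optional[Node]", right: "Optional[Node]") -> None:
        self.point, self.axis, self.left, self.right = point, axis, left, right


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def linear_nearest(points: List[Point], query: Point) -> Point:
    """Sequential search: adequate for a small case base."""
    return min(points, key=lambda p: sq_dist(query, p))


def build(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Build a k-d tree by splitting on the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(ordered[mid], axis,
                build(ordered[:mid], depth + 1),
                build(ordered[mid + 1:], depth + 1))


def nearest(node: Optional[Node], query: Point,
            best: Optional[Point] = None) -> Optional[Point]:
    """Nearest-neighbour search that skips subtrees which cannot hold a closer point."""
    if node is None:
        return best
    if best is None or sq_dist(query, node.point) < sq_dist(query, best):
        best = node.point
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if diff ** 2 < sq_dist(query, best):
        best = nearest(far, query, best)
    return best


points = [(2.0, 3.0), (5.0, 4.0), (9.0, 6.0), (4.0, 7.0), (8.0, 1.0), (7.0, 2.0)]
tree = build(points)
match = nearest(tree, (9.0, 2.0))
```

Both searches return the same stored point; the tree simply avoids examining every case. In a real case base each point would be linked back to its full case, for example through a dictionary keyed by the tuple.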
We need to find 'similar' cases to our new problem, but what is similar? This is the most important aspect of CBR. We need to find problems that have a similar solution to the one we are looking for, and that can be easily adapted to our new problem. Similarity is a function from a pair of cases to the range [0, 1], with 0 being completely dissimilar and 1 being identical.
In the case of a problem described as a set of attribute/value pairs, similarity could mean the sum of the similarities of corresponding attribute values. However, some attributes might be more important than others and, hence, should be weighted accordingly. For example, when diagnosing a fault that is preventing a car from starting, the make of the car is not as important as the charge in the battery. Also, some attributes might be missing, or there may be some attributes in the new problem that are not in the existing case.
Local similarity is the degree of similarity of individual features of cases and global similarity is the degree of similarity of the whole case. Global similarity is usually a weighted sum of the local similarities.
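A weighted global similarity can be sketched as follows. Several choices here are assumptions made for illustration: numeric attributes are compared by their difference relative to a known value range, all other attributes by exact match, an attribute missing from either case contributes a local similarity of 0, and the sum is divided by the total weight so that the result stays in [0, 1]. The attribute names and weights are invented.

```python
from typing import Any, Dict


def numeric_similarity(a: float, b: float, value_range: float) -> float:
    """Local similarity for numbers: 1 when equal, falling to 0 at value_range apart."""
    return max(0.0, 1.0 - abs(a - b) / value_range)


def global_similarity(new: Dict[str, Any], old: Dict[str, Any],
                      weights: Dict[str, float],
                      ranges: Dict[str, float]) -> float:
    """Weighted sum of local similarities, normalised by the total weight."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        return 0.0
    score = 0.0
    for attr, weight in weights.items():
        if attr not in new or attr not in old:
            continue  # assumption: a missing attribute counts as local similarity 0
        if attr in ranges:
            local = numeric_similarity(new[attr], old[attr], ranges[attr])
        else:
            local = 1.0 if new[attr] == old[attr] else 0.0
        score += weight * local
    return score / total_weight


# Battery charge matters far more than the make of the car.
weights = {"battery_volts": 0.8, "make": 0.2}
ranges = {"battery_volts": 4.0}

new_case = {"battery_volts": 11.0, "make": "Ford"}
old_case = {"battery_volts": 12.0, "make": "Fiat"}
score = global_similarity(new_case, old_case, weights, ranges)
```

Here the battery readings differ by a quarter of the assumed range, giving a local similarity of 0.75, while the makes do not match at all, so the global similarity comes out at roughly 0.6. Treating missing attributes as fully dissimilar is only one option; another is to leave them out of the total weight as well.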
Once a similar case has been chosen from the case base, it is adapted to fit the current problem and the solution to the old problem is applied to the new problem. If the action is successful, the new case is stored in the case base. We do not overwrite the old case but extend our knowledge by keeping the new case.
An advantage of CBR is that it can explain how it came up with the solution. Many machine learning techniques are black boxes: they cannot show how they arrived at a solution, so you must accept it at face value. In medical situations, some people are wary of taking advice from a machine that cannot show its reasoning.