At the request of one of the UK's most successful providers of fraud detection software, AIAI investigated ways of applying new AI technologies to increase the accuracy of the already highly advanced systems in use. While the firm's software already reduces the number of necessary fraud investigations by several orders of magnitude, our investigation showed that adaptive algorithms and fuzzy logic give a significant diagnostic improvement on the most difficult sub-section of cases.
The focus of our investigation was to reduce the number of applications referred for expert investigation after the existing detection systems had been applied. These well-proven systems draw on many person-years of expert knowledge elicitation and encoding, and reduce the initial volume of applications by a factor of roughly 2,500 (400 referred from every million analysed). It was on a database consisting solely of these hardest cases that the case-based reasoning (CBR) system attempted to make diagnostically significant decisions.
Investigations into the financial data provided showed that, though highly chaotic, it has properties that allow multi-algorithmic and adaptive CBR techniques to be used for fraud classification and filtering. The data set could not be cleanly partitioned into fraud and non-fraud regions. Instead, the occurrence and distribution of fraud cases in the neighbourhood of an unknown application was observed to be diagnostically significant, and these relationships were effectively exploited by a multi-algorithmic proof-of-concept system. Neighbourhood-based and probabilistic algorithms were shown to be appropriate techniques for classification, and may be further enhanced using additional diagnostic algorithms for decision making in borderline cases, and for calculating confidence and relative risk measures.
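The neighbourhood idea described above can be illustrated with a minimal sketch. It is not the proof-of-concept system itself: the Euclidean distance measure, the value of k, the referral threshold and all function names are assumptions made for illustration, and the features are assumed to be numeric and already scaled.

```python
import math

def neighbourhood_fraud_score(query, cases, k=5):
    """Return the fraction of known fraud cases among the k nearest stored cases.

    `cases` is a list of (feature_vector, is_fraud) pairs. Euclidean distance
    on pre-scaled numeric features is an assumption, not the source's metric.
    """
    if not cases:
        raise ValueError("case base is empty")
    ranked = sorted(cases, key=lambda case: math.dist(query, case[0]))
    neighbours = ranked[:k]
    return sum(1 for _, is_fraud in neighbours if is_fraud) / len(neighbours)

def classify(query, cases, k=5, refer_threshold=0.4):
    """Refer an application when fraud is dense enough in its neighbourhood.

    The threshold is a hypothetical tuning parameter.
    """
    score = neighbourhood_fraud_score(query, cases, k)
    decision = "refer" if score >= refer_threshold else "pass"
    return decision, score
```

The score doubles as a crude relative risk measure: an application whose neighbourhood score sits close to the threshold is a borderline case that could be handed to further diagnostic algorithms.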
While more accurate performance metrics and more thorough testing are required to quantify peak precision properly, the initial testing results of 80% non-fraud and 52% fraud recognition strongly suggest that a multi-algorithmic CBR system will be capable of high accuracy rates. A comparison with related work shows that CBR techniques can achieve similar performance in comparable problem areas.
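Recognition figures of this kind are per-class rates: the share of true fraud cases labelled as fraud, and the share of true non-fraud cases labelled as non-fraud. A minimal sketch of that calculation follows; the function name and label values are hypothetical, and no data from the study is reproduced.

```python
def recognition_rates(actual, predicted):
    """Per-class recognition: share of each true class that was labelled correctly."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    rates = {}
    for label in set(actual):
        # Positions of the cases whose true class is `label`.
        positions = [i for i, a in enumerate(actual) if a == label]
        correct = sum(1 for i in positions if predicted[i] == label)
        rates[label] = correct / len(positions)
    return rates
```

Reporting the two rates separately matters here because the case base holds only the hardest referred applications, so a single overall accuracy figure could hide weak performance on one class.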
The initial work was carried out using the AIAI CBR Shell.
A significant extension to the initial study has been commissioned and completed. The issues of performance, system integration, robustness and stability over time were addressed. Extensive testing showed the validity of the CBR techniques.
Information Retrieval and CBR
Extracting knowledge from corporate archives is an important part of utilising information assets. Often, archived data is not stored in an immediately useful form, but requires retrieval, filtering and abstraction. The user's task also has an impact on information needs. CBR offers a useful approach to the retrieval and analysis of ill-structured data.
A web-based system for accessing and analysing incident reports was developed for a test equipment manufacturer. The reports record repair and/or replacement actions taken by engineers when testing high integrity electronic equipment. They are composed of textual descriptions as well as dates and codes. The CBR system functions both as a case retrieval and analysis tool, and as a diagnostic aid. Both of these functions depend on the explicit extraction and representation of concepts from text.
The data can be viewed as a case base of experiences of testing and fault diagnosis. Where the records contain information about symptoms or faults found in a particular unit, a symptom-to-outcome association can be formed. This knowledge can be used to propose the outcome of a new case, given a sufficiently descriptive documentation of the observed fault. Alternatively, given the device type and the outcome, the faults that have led to that outcome in the past can be determined, and such information could inform an analysis of fault occurrence or suggest probable faults.
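A symptom-to-outcome association of this kind can be sketched as a simple co-occurrence table. The record layout, the voting scheme and all names below are assumptions for illustration; the source does not describe how the associations were actually stored or weighted.

```python
from collections import Counter, defaultdict

def build_associations(records):
    """Count how often each symptom has been recorded with each outcome.

    `records` is an iterable of (symptoms, outcome) pairs (hypothetical layout).
    """
    table = defaultdict(Counter)
    for symptoms, outcome in records:
        for symptom in symptoms:
            table[symptom][outcome] += 1
    return table

def propose_outcome(table, symptoms):
    """Propose the outcome most often seen with the observed symptoms, or None."""
    votes = Counter()
    for symptom in symptoms:
        votes.update(table.get(symptom, {}))
    return votes.most_common(1)[0][0] if votes else None

def faults_for_outcome(table, outcome):
    """Reverse lookup: symptoms that have led to `outcome`, most frequent first."""
    counts = Counter({s: c[outcome] for s, c in table.items() if c[outcome]})
    return [symptom for symptom, _ in counts.most_common()]
```

The same table serves both directions described above: forwards to propose an outcome for a new case, and backwards to list the faults that have preceded a given outcome.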
The main function of the application is the recall of closely matching cases from the case base, given a query in natural language. Where the query is the description of a new case, the recalled cases will be those that match most closely, and whose outcomes can be expected to be most indicative of the outcome in the new case. The query may simply be a set of keywords or terms, and in this case the closely matching cases are those in which the terms occur - as is usual in information retrieval applications.
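As a rough illustration of recall by term matching, the sketch below ranks stored reports by the overlap between their terms and the terms of the query. The tokenizer, the Jaccard overlap score and the function names are assumptions; the source does not specify how the application extracts concepts from text or scores matches.

```python
import re

def tokenize(text):
    """Lower-case the text and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, cases, top_n=3):
    """Return the ids of the cases that best match the query, best first.

    `cases` maps a case id to its free-text description. Cases sharing no
    term with the query are not returned.
    """
    query_terms = tokenize(query)
    scored = []
    for case_id, text in cases.items():
        case_terms = tokenize(text)
        shared = query_terms & case_terms
        if shared:
            # Jaccard overlap: shared terms relative to all terms seen.
            score = len(shared) / len(query_terms | case_terms)
            scored.append((score, case_id))
    scored.sort(reverse=True)
    return [case_id for _, case_id in scored[:top_n]]
```

The same function handles both kinds of query: a full description of a new case, or a short list of keywords.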
Aircraft Engine Fault Diagnosis
A CBR-based "next generation" testing and diagnostics package was developed for aircraft engine trouble-shooting. The client company was interested in investigating automated, and possibly intelligent, trouble-shooting systems for their new GE-90 test cell/test bed, with a view to cutting down on test time, learning from previous experience, and transferring expert knowledge to test-cell technicians. They were also interested in test-bed continuity, and in standardising and reducing the time necessary to overhaul, strip, test, and refit engines.
The CBR prototype analysed CENCO data files, which record the thousands of measurements made during engine testing. A single case might be up to 200 KB of data. This legacy data was not recorded in a uniform format, so sophisticated adaptive parsing algorithms were developed to extract information from it. Advanced matching algorithms, multiple goal optimisation, and smooth-curve fuzzy functions were implemented in the CBR system.
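To show what a smooth-curve fuzzy function can look like in case matching, the sketch below scores how closely a measured value matches a stored one, falling smoothly from 1 rather than switching abruptly at a tolerance limit. The bell-shaped curve, the tolerance and weight parameters, and the measurement names are all assumptions; the source does not give the functions used in the prototype.

```python
import math

def smooth_match(measured, target, tolerance):
    """Fuzzy closeness in (0, 1]: 1.0 at the target, 0.5 at one tolerance away."""
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    deviation = (measured - target) / tolerance
    return math.exp(-math.log(2) * deviation ** 2)

def case_similarity(query, case, tolerances, weights):
    """Weighted mean of per-measurement fuzzy matches between two cases.

    All arguments are dicts keyed by measurement name (hypothetical layout).
    """
    total_weight = sum(weights[name] for name in query)
    weighted = sum(
        weights[name] * smooth_match(query[name], case[name], tolerances[name])
        for name in query
    )
    return weighted / total_weight
```

A smooth curve means that a reading just outside a tolerance band still contributes to the match, which suits noisy test measurements better than a hard cut-off.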
Artificial Intelligence Applications Institute
University of Edinburgh
Edinburgh EH8 9AB, UK