CONTENTS
1. TOTAL SYSTEM, SUBSYSTEMS, CATEGORIES, COMPLEX FEATURES, FEATURES, PRIMITIVES, AND OTHER ASPECTS OF KNOWLEDGE BASE STRUCTURE = 1
1-1 Introduction = 1
1-2 Preview of Knowledge Base structure = 10
1-3 The Classification Problem = 16
1-4 The Knowledge Base Structure = 16
1-5 Total System Model = 17
1-6 Total System Processing = 20
2. DEFINITIONS AND HISTORICAL REVIEW = 25
2-1 Definitions Used In Integrating Artificial Intelligence with Statistical Pattern Recognition = 25
2-2 Types of Learning = 32
2-3 Example of Concept Formation in the Theorem on A Posteriori Probability (Patrick) = 36
3. MATHEMATICS AND PROBABILITY THEORY FOR STATISTICAL PATTERN RECOGNITION = 43
3-1 Introduction = 43
3-2 Sets, Set Operations, Mutually Exclusive Sets, Exhaustive Sets, Events = 44
3-3 Multiple Spaces (Category Space, Feature Space, Decision Space, Class Space) = 48
3-4 Feature Space = 53
3-5 Probability Theory, Axioms of Probability, Conditional Probability, Bayes Theorem, and Total Probability = 56
3-6 Relative Frequency Interpretation or Analysis = 59
3-7 Relative Frequency Interpretation of Bayes Theorem = 60
3-8 Statistically Independent Events = 61
3-9 Random Variable, Probability Distribution Function = 61
3-10 Vector Space and Linear Independence = 62
3-11 Information Theory: Definition of Information = 66
3-12 Fuzzy Set Theory = 67
3-13 A Posteriori Category Probability = 68
3-14 Decision Making or Concluding Rules = 69
3-15 Incorporating Loss = 71
3-16 Subcategories, Complex Categories, and Structured versus Ill-Structured Problems = 72
4. METHODS STUDIED IN ARTIFICIAL INTELLIGENCE APPLIED TO CLASSIFICATION SYSTEMS = 79
4-1 Introduction = 79
4-2 Logic = 80
4-3 Predicate Calculus = 81
4-4 First-Order Logic = 82
4-5 Axiomatic Formulation = 82
4-6 Semantic Networks and Causal Networks = 83
4-7 Search Procedures for Problem Solving = 84
4-8 Blind AND/OR Graph Search = 85
4-9 Breadth-First Search of an AND/OR Tree = 86
4-10 Depth-First Search of an AND/OR Tree = 86
4-11 Heuristic State-Space Search and Ordered State-Space Search = 87
4-12 Production Systems = 87
4-13 Introduction to Knowledge and Knowledge Representation = 88
4-14 Reasoning = 89
4-15 Hierarchical Representation of Knowledge = 90
4-16 Heuristics or Rules from Artificial Intelligence = 92
4-17 Class Conditional Function Using AI Concepts = 97
4-18 The Likelihood Function as Inductive Inference with Local Concept Formation = 100
5. THEOREM ON A POSTERIORI PROBABILITY (PATRICK) = 101
5-1 Introduction = 101
5-2 Preliminaries = 102
5-3 Theorem on A Posteriori Probability (Patrick) and Likelihood (Patrick) = 104
5-4 Extension = 106
5-5 Aspects of the Theorem on A Posteriori Probability (Patrick) = 108
5-6 Complex Class Model Based on Insignificance and Deduction = 111
5-7 Example of Learning Versus Recognition and Feedback in the Theorem by Patrick = 119
5-8 Rules Defining Complex Classes from Only Classes Based on Likeness and Unlikeness = 122
5-9 Generalized Complex Class Model = 125
5-10 Suspecting a Complex Class from Only Class Knowledge = 126
5-11 Suspecting an Only Class from Complex Class Knowledge = 127
5-12 Binary Choice Formulation = 128
5-13 Structured Model for Complex Classes = 129
5-14 Relative Frequency Interpretation of Theorem on A Posteriori Probability (Patrick) = 132
5-15 Learning A Posteriori Probability in the Theorem (Patrick) = 137
5-16 CONSULT-I® Likelihood Function Feedback System = 139
5-17 First-Order Product Model for Complex Category = 144
5-18 Theorem on A Posteriori Probability (Patrick) with Concept Formation Through Statistically Dependent Categories = 146
6. ENGINEERING HIERARCHICAL KNOWLEDGE STRUCTURES = 151
6-1 Introduction = 151
6-2 Literature Review = 152
6-3 CASNET and EXPERT = 153
6-4 Hierarchical Structure Proposed for INTERNIST-1 = 154
6-5 Hierarchical Knowledge with A Posteriori Probability = 155
6-6 Viewpoint of Hierarchy of CONSULT-I® Subsystem = 157
6-7 Mutually Exclusive Categories that are Statistically Dependent = 159
7. DISCUSSION OF CLASSES, ONLY CLASSES, CATEGORIES, AND STATISTICALLY DEPENDENT CATEGORIES = 163
7-1 Definitions = 163
7-2 Event-Conditional Probability Density Function = 165
7-3 A Priori Category Probabilities = 166
7-4 Mixture Probability Density = 166
7-5 Classes and Complex Classes = 167
7-6 Models for Complex Classes = 169
7-7 How Only Classes and Complex Classes Can Help Each Other During Training = 177
8. PRIMITIVES, FEATURES, COMPLEX FEATURES, AND INSIGNIFICANT FEATURES = 181
8-1 Introduction = 181
8-2 Linear Independence = 182
8-3 Features and Feature Values = 183
8-4 Measurements = 184
8-5 Interviewer = 185
8-6 Missing Feature Values (of the Findings) = 185
8-7 Type 0 Measurement = 186
8-8 Type 1 Measurement = 186
8-9 Other Types of Measurements = 187
8-10 Statistically Independent and Dependent Features = 190
8-11 Critical Feature, Can't = 190
8-12 Rule-In Feature Value = 191
8-13 Comparing Type 0 Features with Type 1 Features = 191
8-14 Interaction Between Features = 192
8-15 Suboptimum Dependence Among Type 1 Features = 192
9. ENGINEERING CONSTRUCTION OF THE CATEGORY CONDITIONAL PROBABILITY DENSITY FUNCTION = 199
9-1 Introduction = 199
9-2 Columns or Subcategories = 200
9-3 A Priori Subcategory Probabilities = 201
9-4 Statistically Independent and Dependent Features = 201
9-5 Maximum Likelihood = 202
9-6 Independent Sets of Dependent Features = 203
9-7 Independence and Dependence-Column Specific = 203
9-8 Structure of Independent Probability Density Functions = 205
9-9 Insignificant Feature for Presentation(Column) of a Category = 206
9-10 Special Case: Continuous Probability Density for Independent Feature = 207
9-11 Creating Minicolumns in the Knowledge Base Structure = 208
9-12 Deductive Operation of Allowing Missing Feature Values = 208
9-13 Learning the Complex Class Conditional Probability Density Function by Deduction = 212
9-14 Likelihood Function for Category Used in Subsystem or Total System = 215
9-15 How Knowledge of Statistical Dependencies Imposed in Columns or Minicolumns Improves Performance = 217
9-16 Modifications of the Inference Functions = 220
9-17 Mixture Probability Density in Terms of Subcategories = 223
9-18 Discussion = 225
10. SELECTING THE NEXT FEATURE AUTOMATICALLY = 227
10-1 Introduction to Optimum Feature Selection = 227
10-2 Optimum Single Feature Selection at Stage ℓ = 229
10-3 Optimum Selection of Remaining Features at Stage ℓ = 229
10-4 Suboptimum Methods of Feature Selection = 230
11. TOTAL SYSTEM: INTEGRATING CONSULT-I® SUBSYSTEMS = 237
11-1 Introduction to the Total CONSULT-I® System = 237
11-2 Parameters of the Total System = 238
11-3 Primitives in a Total System = 239
11-4 Category-Primitive Relationship = 240
11-5 Total System Likelihood Functions = 241
11-6 Grouping Categories into a Differential Diagnosis (Subsystem) = 242
11-7 Differential Diagnosis with Missing Features = 243
11-8 Activation Rules = 244
11-9 Optimum Total System Processing = 245
11-10 Developing a Total System = 247
12. INFERENCE FUNCTIONS AS CLOSENESS MEASURES AND THE COMPARISON OF CLASSIFICATION SYSTEMS = 255
12-1 Introduction = 255
12-2 Consideration in a Closeness Measure = 256
12-3 Category Conditional Probability Density Functions = 256
12-4 Generalized CONSULT-I® Closeness Measure = 257
12-5 Aspects of the CONSULT-I® Closeness Measure = 259
12-6 Statistical Dependence When Knowledge Is Incomplete = 262
12-7 Other Closeness Measures for Comparison with the CONSULT-I® Inference Function = 265
13. VISUALIZING CONSULT-I® USING THREE-DIMENSIONAL CONSTRUCTIONS = 281
13-1 Introduction = 281
13-2 Nodes in Hyperspace = 281
13-3 Marginal Probability Density of a Category (Missing Feature Values) = 283
13-4 Equivalent Feature Values = 284
13-5 Insignificant Feature = 285
13-6 Classical Nodes = 286
13-7 The Category Conditional Probability Density Function = 287
13-8 A Posteriori Category Probability = 287
13-9 Training Columns = 291
13-10 Category Conditional Probability Density Functions for Uncertain Observation = 293
14. CONSIDERATIONS WHEN TRAINING EXPERT SYSTEMS = 299
14-1 Introduction = 299
14-2 Expert Training Constraints = 300
14-3 Effect of Observation Error on Category Conditional Probability Density Function-Type 1 Features = 304
14-4 Effect of Observation Error on Category Conditional Probability Density Function-Type 0 Features = 312
15. DISCOVERING KNOWLEDGE USING THE OUTCOME ADVISOR® AND CONSULT LEARNING SYSTEM® = 315
15-1 Introduction = 315
15-2 Defining Events in the Feature Space = 316
15-3 Adaptive Set Construction = 317
15-4 CONSULT-I® Columns with Type 0 and Type 1 Features = 317
15-5 Average Conditional Probability Density of Degree = 319
15-6 Updating Hypercubes = 322
15-7 Conditional Differential Diagnosis = 323
15-8 Discovering Knowledge for Type 0 Features Using CONSULT LEARNING SYSTEM® = 324
15-9 Discovering Dependence Knowledge Among Type 1 Features = 326
16. CONCLUSIONS = 329
16-1 Where We Have Been = 329
16-2 The OUTCOME ADVISOR®, CONSULT LEARNING SYSTEM®, and CONSULT-I® = 330
16-3 Other Approaches = 335
GLOSSARY = 339
INDEX = 365