CONTENTS
1. Introduction to Biostatistics = 1
1.1 Introduction = 1
1.2 What is the Field of Statistics? = 1
1.3 Why Biostatistics? = 2
1.4 Goals of this Book = 3
1.5 Statistical Problems in Biomedical Research = 3
1.5.1 Example 1: King Charles II = 3
1.5.2 Example 2: Relationship Between the Use of Oral Contraceptives and Thromboembolic Disease = 4
1.5.3 Example 3: Estrogen Therapy and Endometrial Carcinoma = 5
1.5.4 Example 4: Use of Laboratory Tests and the Relation to Quality of Care = 6
1.5.5 Example 5: Salk Poliomyelitis Vaccine Field Trial = 8
1.5.6 Example 6: Internal Mammary Artery Ligation = 11
Notes = 12
Problems = 13
References = 15
2. Biostatistical Design of Medical Studies = 17
2.1 Introduction = 17
2.2 Problems to be Investigated = 17
2.3 Some Different Types of Studies = 18
2.4 Steps Necessary to Perform a Study = 22
2.5 Ethics = 23
2.6 Data Collection: Design of Forms = 25
2.6.1 What Data are to be Collected? = 25
2.6.2 Clarity of Questions = 26
2.6.3 Pretesting of Forms and Pilot Studies = 27
2.6.4 Layout and Appearance = 27
2.7 Data Editing and Verification = 27
2.8 Data Handling = 27
2.9 Amount of Data Collected: Sample Size = 28
2.10 Inferences from a Study = 29
2.10.1 Bias = 29
2.10.2 Similarity in a Comparative Study = 29
2.10.3 Inference to a Larger Population = 31
2.10.4 Precision and Validity of Measurements = 32
2.10.5 Quantification and Reduction of Uncertainty = 32
Problems = 33
References = 34
3. Descriptive Statistics = 35
3.1 Introduction = 35
3.2 Types of Variables = 36
3.2.1 Qualitative (Categorical) Variables = 36
3.2.2 Quantitative Variables = 37
3.3 Descriptive Statistics = 38
3.3.1 Tabulations and Frequency Distributions = 38
3.3.2 Graphs = 44
3.3.3 Stem and Leaf Diagrams = 50
3.4 Descriptive Statistics = 51
3.4.1 Introduction = 51
3.4.2 Statistics Derived from Percentiles = 51
3.4.3 Statistics Derived from Moments = 54
3.4.4 Other Measures of Location and Spread = 58
3.4.5 Which Statistics? = 59
Notes = 60
Problems = 65
References = 73
4. Statistical Inference: Populations and Samples = 75
4.1 Introduction = 75
4.2 Population and Sample = 75
4.2.1 Definition and Examples = 75
4.2.2 Estimation and Hypothesis Testing = 77
4.3 Valid Inference Through Probability Theory = 77
4.3.1 The Precise Specification of Our Ignorance = 77
4.3.2 Working with Probabilities = 78
4.3.3 Random Variables and Distributions = 83
4.4 Normal Distributions = 89
4.4.1 Introduction and Motivation = 89
4.4.2 Examples of Data that Might be Modeled by Normal Distribution = 89
4.4.3 Calculating Areas under the Normal Curve = 91
4.4.4 Probability Paper = 97
4.5 Sampling Distributions = 100
4.5.1 Statistics Are Random Variables = 100
4.5.2 Properties of Sampling Distribution = 100
4.5.3 The Central Limit Theorem = 101
4.6 Inference About the Mean of a Population = 103
4.6.1 Point and Interval Estimates = 103
4.6.2 Hypothesis Testing = 106
4.7 Confidence Intervals vs. Tests of Hypotheses = 113
4.8 Inference About the Variance of a Population = 115
4.8.1 Distribution of the Sample Variance = 115
4.8.2 Inference About a Population Variance = 117
Notes = 119
Problems = 128
References = 137
5. One- and Two-Sample Inference = 138
5.1 Introduction = 138
5.2 Pivotal Variables = 138
5.2.1 Definitions = 138
5.3 Working with Pivotal Variables = 140
5.4 The t-distribution = 143
5.5 One-Sample Inference: Location = 145
5.5.1 Estimation and Testing = 145
5.5.2 t-Tests for Paired Data = 146
5.6 Two-Sample Statistical Inference: Location = 149
5.6.1 Independent Random Variables = 149
5.6.2 Estimation and Testing = 151
5.7 Two-Sample Inference: Scale = 155
5.7.1 The F-distribution = 155
5.7.2 Testing and Estimation = 157
5.8 Sample Size Calculations = 158
Notes = 161
Problems = 165
References = 174
6. Counting Data = 176
6.1 Introduction = 176
6.2 Counting Data = 177
6.3 Binomial Random Variables = 178
6.3.1 Recognizing Binomial Random Variables = 178
6.3.2 The Binomial Model = 180
6.3.3 Hypothesis Testing for Binomial Variables = 182
6.3.4 Confidence Intervals = 183
6.3.5 Large Sample Hypothesis Testing = 183
6.3.6 Large Sample Confidence Intervals = 184
6.4 Comparing Two Proportions = 185
6.4.1 Fisher's Exact Test = 185
6.4.2 Large Sample Tests and Confidence Intervals = 187
6.4.3 Finding Sample Sizes Needed for Testing the Difference Between Proportions = 189
6.4.4 Relative Risk and the Odds Ratio = 191
6.4.5 Combination of 2 x 2 Tables = 200
6.4.6 Screening and Diagnosis: Sensitivity, Specificity, Bayes' Theorem = 206
6.5 Matched or Paired Observations = 209
6.5.1 Motivation = 209
6.5.2 Matched-Pair Data: McNemar's Test and Estimation of the Odds Ratio = 210
6.6 Poisson Random Variables = 211
6.6.1 Examples of Poisson Data = 212
6.6.2 The Poisson Model = 214
6.6.3 Large Sample Statistical Inference for the Poisson Distribution = 216
6.7 Goodness-of-Fit Tests = 218
6.7.1 Multinomial Random Variables = 218
6.7.2 Known Cell Probabilities = 219
6.7.3 Addition of Independent Chi-Square Variables: Mean and Variance of the Chi-Square Distribution = 221
6.7.4 Chi-Square Tests for Unknown Cell Probabilities = 222
Notes = 225
Problems = 231
References = 242
7. Categorical Data: Contingency Tables = 246
7.1 Introduction = 246
7.2 Two-Way Contingency Tables = 246
7.3 The Chi-Square Test for Trend in 2 x k Tables = 253
7.4 The Measurement of Agreement: Kappa (κ) = 256
7.5 Partition of Chi-Square = 259
7.6 Log-Linear Models = 263
Notes = 274
Problems = 280
References = 301
8. Nonparametric, Distribution-Free and Permutation Models: Robust Procedures = 304
8.1 Robustness: Nonparametric and Distribution-Free Procedures = 304
8.2 Review of the χ²-test for Contingency Tables: Fisher's Exact Test and McNemar's Test = 307
8.3 The Sign Test = 308
8.4 Ranks = 309
8.5 The Wilcoxon Signed Rank Test = 310
8.5.1 Assumptions and Null Hypotheses = 311
8.5.2 Alternative Hypotheses Tested with Power = 311
8.5.3 Computation of the Test Statistic = 311
8.5.4 Large Samples = 313
8.6 The Wilcoxon (Mann-Whitney) Two-Sample Test = 315
8.6.1 Null Hypothesis, Alternatives, and Power = 315
8.6.2 The Test Statistic = 315
8.6.3 Large Sample Approximation = 317
8.6.4 The Mann-Whitney Statistic = 319
8.7 The Kolmogorov-Smirnov Two-Sample Test = 319
8.8 Nonparametric Estimation and Confidence Intervals = 321
8.9 Permutation and Randomization Tests = 323
8.10 Monte Carlo Techniques = 327
8.10.1 Evaluation of Statistical Significance = 327
8.10.2 Empirical Evaluation of the Behavior of Statistics: Modeling and Evaluation = 328
8.11 Robust Techniques = 329
8.12 Further Reading and Directions = 331
Notes = 331
Problems = 334
References = 343
9. Association and Prediction: Linear Models with One Predictor Variable = 345
9.1 Introduction = 345
9.2 A Simple Linear Regression Model = 352
9.2.1 Summarizing the Data by a Linear Relationship = 352
9.2.2 Linear Regression Models = 355
9.2.3 Inference = 357
9.2.4 Analysis of Variance = 360
9.2.5 Appropriateness of the Model = 363
9.2.6 The Two Sample t-Test as a Regression Problem = 366
9.3 Correlation and Covariance = 369
9.3.1 Introduction = 369
9.3.2 Correlation and Covariance = 370
9.3.3 Relationship between Correlation and Regression = 375
9.3.4 Bivariate Normal Distribution = 377
9.3.5 Critical Values and Sample Size = 381
9.3.6 Using the Correlation Coefficient as a Measure of Agreement for Two Methods of Measuring the Same Quantity = 382
9.3.7 Errors in Both Variables = 384
9.3.8 Nonparametric Estimates of Correlation = 385
9.3.9 Change and Association = 389
9.4 Common Misapplication of Regression and Correlation Methods = 389
9.4.1 Regression to the Mean = 389
9.4.2 Spurious Correlation = 390
9.4.3 Extrapolation Beyond the Range of the Data = 391
9.4.4 Inferring Causality from Correlation = 391
9.4.5 Interpretation of the Slope of the Regression Line = 392
9.4.6 Outlying Observations = 392
Notes = 393
Problems = 397
References = 417
10. Analysis of Variance = 418
10.1 Introduction = 418
10.2 One-Way Analysis of Variance = 420
10.2.1 Motivating Example = 420
10.2.2 Using the Normal Distribution Model = 421
10.2.3 One-Way ANOVA From Group Means and Standard Deviation = 429
10.2.4 One-Way Analysis of Variance Using Ranks = 430
10.3 Two-Way Analysis of Variance = 432
10.3.1 Using the Normal Distribution Model = 432
10.3.2 Randomized Block Design (RBD) = 444
10.3.3 Analyses of Randomized Block Designs Using Ranks = 445
10.3.4 Types of ANOVA Models = 448
10.4 Repeated Measures Designs and Other Designs = 451
10.4.1 Repeated Measures Designs = 451
10.4.2 Factorial Designs = 456
10.4.3 Hierarchical or Nested Designs = 456
10.4.4 Split Plot Designs = 457
10.5 Unbalanced or Nonorthogonal Designs = 458
10.5.1 Causes of Imbalance = 458
10.5.2 Restoring Balance = 458
10.5.3 Unweighted Means Analysis = 461
10.6 Validity of ANOVA Models = 462
10.6.1 Assumptions in ANOVA Models = 462
10.6.2 Transformations = 463
10.6.3 Testing of Homogeneity of Variance = 466
10.6.4 Testing of Normality in ANOVA = 469
10.6.5 Independence = 472
10.6.6 Linearity in ANOVA = 472
10.6.7 Additivity = 473
10.6.8 Strategy for Analysis of Variance = 477
Notes = 478
Problems = 481
References = 493
11. Association and Prediction: Multiple Regression Analysis, Linear Models With Multiple Predictor Variables = 496
11.1 Introduction = 496
11.2 The Multiple Regression Model = 496
11.2.1 The Linear Model = 497
11.2.2 The Least Squares Fit = 497
11.2.3 Assumptions for Statistical Inference = 500
11.2.4 Examples of Multiple Regression = 503
11.3 Linear Association: Multiple, Partial, and Canonical Correlation = 506
11.3.1 The Multiple Correlation Coefficient = 506
11.3.2 The Partial Correlation Coefficient = 510
11.3.3 The Partial Multiple Correlation Coefficient = 512
11.3.4 Canonical Correlation = 512
11.4 Nested Hypotheses = 513
11.5 Selecting a "Best" Subset of Explanatory Variables = 518
11.5.1 The Problem = 518
11.5.2 Approaches to the Problem Which Consider All Possible Subsets of Explanatory Variables = 519
11.5.3 Stepwise Procedures = 524
11.6 Polynomial Regression = 531
11.7 Goodness-of-Fit Considerations = 534
11.7.1 Residual Plots and Normal Probability Plots = 534
11.7.2 Nesting in More Global Hypothesis = 539
11.7.3 Splitting the Samples: Jack-Knife Procedures = 539
11.8 Analysis of Covariance = 540
11.8.1 The Need for the Analysis of Covariance = 540
11.8.2 The Analysis of Covariance Model = 542
11.9 Additional References and Directions for Further Study = 549
11.9.1 There Are Now Many References on Multiple Regression Methods = 549
11.9.2 Time Series Data = 549
11.9.3 Causal Models: Structural Models and Path Analysis = 549
11.9.4 Multivariate Multiple Regression Models = 549
11.9.5 Nonlinear Regression Models = 550
Notes = 550
Problems = 553
References = 593
12. Multiple Comparisons = 596
12.1 Introduction = 596
12.2 The Multiple Comparison Problem = 596
12.3 Simultaneous Confidence Intervals and Tests for Linear Models = 600
12.3.1 Linear Combinations and Contrasts = 600
12.3.2 The Scheffé Method (S-Method) = 601
12.3.3 The Tukey Method (T-Method) = 608
12.3.4 The Bonferroni Method (B-Method) = 611
12.4 Comparison of the Three Procedures = 613
12.5 Optional Stopping of Experiments = 614
12.6 Post Hoc Analysis = 617
12.6.1 The Setting = 617
12.6.2 Statistical Approaches and Principles = 618
12.6.3 Summary = 619
Notes = 620
Problems = 623
References = 628
13. Discrimination and Classification = 630
13.1 Introduction = 630
13.2 Logistic Regression = 631
13.2.1 A Motivating Example = 631
13.2.2 Logistic Regression for k Predictor Variables = 634
13.2.3 Selecting and Testing an Appropriate Model = 638
13.3 Discriminant Analysis = 647
13.3.1 The Framework = 647
13.3.2 An Example = 650
13.3.3 Selecting and Testing an Appropriate Discriminant Model = 654
13.3.4 Quadratic Discriminant Analysis = 655
13.4 Considerations in Discrimination and Classification = 657
13.4.1 Selection and Effect of Prior Probabilities = 657
13.4.2 Criteria for Estimating Coefficients = 659
13.4.3 Bias in Estimation of Misclassification Error = 660
13.4.4 Comparison of Logistic Regression and Discriminant Analyses = 661
13.5 Mathematical Models in Computer-Aided Diagnosis = 665
13.5.1 Introduction = 665
13.5.2 Issues in the Development of Computer-Aided Diagnosis = 666
13.5.3 Evaluation of Computer Diagnosis Programs = 667
13.6 Decision Theory = 669
13.6.1 The Field = 669
13.6.2 An Example = 669
Notes = 673
Problems = 677
References = 690
14. Principal Component Analysis and Factor Analysis = 692
14.1 Variability in a Given Direction = 692
14.2 The Principal Components = 697
14.3 The Amount of Variability Explained by the Principal Components = 698
14.4 Use of the Covariance, or Correlation, Values, and Principal Component Analysis = 704
14.5 Statistical Results for Principal Component Analysis = 705
14.6 Presenting the Results of a Principal Component Analysis = 705
14.7 Uses and Interpretation of Principal Component Analysis = 706
14.8 Principal Component Analysis Examples = 707
14.9 Factor Analysis = 711
14.10 Estimation = 719
14.11 Indeterminacy of the Factor Space = 720
14.12 Determining the Number of Factors = 726
14.13 Interpretation of Factors = 729
Notes = 730
Problems = 731
Tables and Figures for Problems = 735
References = 762
15. Rates and Proportions = 763
15.1 Introduction = 763
15.2 Rates, Incidence, and Prevalence = 763
15.3 Direct and Indirect Standardization = 765
15.3.1 Problems With the Use of Crude Rates = 765
15.3.2 Direct Standardization = 766
15.3.3 Indirect Standardization = 769
15.3.4 Drawbacks to Using Standardized Rates = 772
15.4 Hazard Rates: When Subjects Differ in Exposure Time = 773
15.5 The Multiple Logistic Model for Estimated Risk and Adjusted Rates = 776
Notes = 777
Problems = 778
References = 784
16. Analysis of the Time to an Event: Survival Analysis = 786
16.1 Introduction = 786
16.2 The Survivorship Function or Survival Curve = 786
16.3 Estimation of the Survival Curve: The Actuarial or Life Table Method = 789
16.4 The Hazard Function or Force of Mortality = 798
16.5 Presentation of Vital Statistics Data = 799
16.6 The Product Limit or Kaplan-Meier Estimate of the Survival Curve = 801
16.7 The Comparison of Different Survival Curves: The Log-Rank Test = 803
16.8 Adjustment for Confounding Factors by Stratification = 807
16.8.1 Stratification of Life Table Analyses: The Log-Rank Test = 807
16.9 The Cox Proportional Hazard Regression Model = 811
16.9.1 The Cox Proportional Hazard Model = 811
16.9.2 An Example of the Cox Proportional Hazard Regression Model = 814
16.9.3 The Stepwise Cox Regression Model = 817
16.9.4 Interpretation of the βi Coefficients = 821
16.9.5 Use of the Cox Model as a Method of Adjustment = 822
16.10 Parametric Models = 822
16.10.1 The Exponential Model: Rates = 822
16.10.2 Two Other Parametric Models for Survival Analysis = 823
16.11 Extensions = 824
16.11.1 The Cox Model with Time Dependent Covariates = 824
16.11.2 Stratification in the Cox Model = 824
Notes = 825
Problems = 829
References = 842
17. Sample Sizes for Observational Studies = 844
17.1 Introduction = 844
17.2 Screening Studies = 844
17.3 Sample Size As a Function of Cost and Availability = 847
17.3.1 Equal Variance Case = 847
17.3.2 Unequal Variance Case = 849
17.3.3 Rule of Diminishing Precision Gain = 850
17.4 Sample Size Calculations in Selecting Continuous Variables to Discriminate Between Populations = 851
17.4.1 Univariate Screening of Continuous Variables = 852
17.4.2 Sample Size to Determine that a Set of Variables Has Discriminating Power = 855
17.4.3 Quantifying the Precision of a Discrimination Method = 857
17.4.4 Total Sample Size for an Observational Study to Select Classification Variables = 858
Notes = 858
Problems = 862
References = 864
18. A Personal Postscript = 866
18.1 Introduction = 866
18.2 Is There Too Much Coronary Artery Surgery? = 866
18.3 Science, Regulation, and the Stock Market = 872
18.4 If We Stop The Thing that Appears to Cause the Deaths, We Must Be Prolonging Life (Or Are We?) = 880
18.5 Oh my Aching Back! = 884
18.6 If You Don't See Them Does it Mean They Aren't There? Distribution of Giardia Cysts in Drinking Water = 888
18.6.1 Background = 888
18.6.2 Poisson Model = 889
18.6.3 Rule of Threes = 890
18.6.4 Effect of Recovery Efficiency = 890
18.7 Synthesizing Information from Many Studies = 893
18.8 Are Technicians as Good as Physicians? = 897
18.9 Risky Business = 902
References = 906
Appendix = 911
Example Index = 955
Symbol Index = 959
Author Index = 963
Subject Index = 973