| 000 | 00000cam u2200205 a 4500 | |
| 001 | 000046188504 | |
| 005 | 20241122111631 | |
| 008 | 241121s2020 cc ad e b 001 0 eng | |
| 010 | ▼a 2018420845 | |
| 015 | ▼a GBC061788 ▼2 bnb | |
| 020 | ▼a 9781492072942 ▼q (paperback) | |
| 020 | ▼a 149207294X ▼q (paperback) | |
| 035 | ▼a (KERIS)REF000019698509 | |
| 040 | ▼a UTV ▼b eng ▼c UTV ▼e rda ▼d UTV ▼d AHH ▼d OCLCF ▼d YDXIT ▼d UKMGB ▼d IBI ▼d OCLCO ▼d YDX ▼d JAS ▼d OCL ▼d DLC ▼d 211009 | |
| 042 | ▼a lccopycat | |
| 050 | 0 0 | ▼a QA276.4 ▼b .B78 2020 |
| 082 | 0 4 | ▼a 001.4/22 ▼2 23 |
| 084 | ▼a 001.422 ▼2 DDCK | |
| 090 | ▼a 001.422 ▼b B887p2 | |
| 100 | 1 | ▼a Bruce, Peter C., ▼d 1953- ▼0 AUTH(211009)50742. |
| 245 | 1 0 | ▼a Practical statistics for data scientists : ▼b 50+ essential concepts using R and Python / ▼c Peter Bruce, Andrew Bruce, and Peter Gedeck. |
| 250 | ▼a 2nd ed. | |
| 260 | ▼a Beijing : ▼b O'Reilly, ▼c 2020. | |
| 264 | 1 | ▼a Beijing : ▼b O'Reilly, ▼c 2020. |
| 264 | 4 | ▼c ©2020 |
| 300 | ▼a xvi, 342 p. : ▼b ill., charts ; ▼c 24 cm. | |
| 336 | ▼a text ▼b txt ▼2 rdacontent | |
| 337 | ▼a unmediated ▼b n ▼2 rdamedia | |
| 338 | ▼a volume ▼b nc ▼2 rdacarrier | |
| 504 | ▼a Includes bibliographical references (p. 327-328) and index. | |
| 505 | 0 | ▼a Exploratory Data Analysis -- Data and Sampling Distributions -- Statistical Experiments and Significance Testing -- Regression and Prediction -- Classification -- Statistical Machine Learning -- Unsupervised Learning. |
| 520 | ▼a Statistical methods are a key part of data science, yet few data scientists have formal statistical training. Courses and books on basic statistics rarely cover the topic from a data science perspective. The second edition of this practical guide (now including examples in Python as well as R) explains how to apply various statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what's important and what's not. Many data scientists use statistical methods but lack a deeper statistical perspective. If you're familiar with the R or Python programming languages, and have had some exposure to statistics but want to learn more, this quick reference bridges the gap in an accessible, readable format. With this updated edition, you'll dive into: Exploratory data analysis Data and sampling distributions Statistical experiments and significance testing Regression and prediction Classification Statistical machine learning Unsupervised learning.--Source other than the Library of Congress. | |
| 650 | 0 | ▼a Mathematical analysis ▼x Statistical methods. |
| 650 | 0 | ▼a Quantitative research ▼x Statistical methods. |
| 650 | 0 | ▼a R (Computer program language). |
| 650 | 0 | ▼a Python (Computer program language). |
| 650 | 0 | ▼a Statistics ▼x Data processing. |
| 700 | 1 | ▼a Bruce, Andrew, ▼d 1958-, ▼e author. |
| 700 | 1 | ▼a Gedeck, Peter, ▼e author. |
| 945 | ▼a ITMT |
Holdings Information

| No. | Location | Call Number | Registration No. | Status | Due Date | Reservation | Service |
|---|---|---|---|---|---|---|---|
| 1 | Main Library / Stacks, 6th floor | 001.422 B887p2 | 111904064 (1 checkout) | Available for loan | | | |
Contents Information

Book Introduction
Statistical methods are a key part of data science, yet few data scientists have formal statistical training. Courses and books on basic statistics rarely cover the topic from a data science perspective. The second edition of this popular guide adds comprehensive examples in Python, provides practical guidance on applying statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what's important and what's not.
Many data science resources incorporate statistical methods but lack a deeper statistical perspective. If you're familiar with the R or Python programming languages and have some exposure to statistics, this quick reference bridges the gap in an accessible, readable format.
With this book, you'll learn:
- Why exploratory data analysis is a key preliminary step in data science
- How random sampling can reduce bias and yield a higher-quality dataset, even with big data
- How the principles of experimental design yield definitive answers to questions
- How to use regression to estimate outcomes and detect anomalies
- Key classification techniques for predicting which categories a record belongs to
- Statistical machine learning methods that "learn" from data
- Unsupervised learning methods for extracting meaning from unlabeled data
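The resampling ideas listed above (random sampling, the bootstrap) can be sketched in a few lines of Python. This is a minimal illustration using only the standard library and an invented toy sample, not an example taken from the book itself:

```python
import random
import statistics

random.seed(0)  # make the resampling reproducible

# Hypothetical sample of measurements (illustrative data only)
sample = [23, 45, 12, 67, 34, 29, 51, 40, 38, 22, 60, 18]

# Bootstrap: resample with replacement many times, recording the mean each time
boot_means = sorted(
    statistics.mean(random.choices(sample, k=len(sample)))
    for _ in range(5000)
)

# A 90% percentile confidence interval for the mean comes straight
# from the 5th and 95th percentiles of the bootstrap distribution
lo = boot_means[int(0.05 * len(boot_means))]
hi = boot_means[int(0.95 * len(boot_means))]
print(f"sample mean = {statistics.mean(sample):.1f}, 90% CI = ({lo:.1f}, {hi:.1f})")
```

The same pattern, resample, recompute, summarize, underlies the permutation tests and bootstrap confidence intervals covered in Chapters 2 and 3.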
About the Authors
Peter Bruce (Author)
Founder of the statistics education institute Statistics.com, which offers more than 100 statistics courses, roughly a third of them aimed at data scientists. Through carefully planned marketing he has recruited top professional data scientists as instructors, and in the process has built a broad perspective and expert insight into the subject of statistics for data scientists.
Andrew Bruce (Author)
A practitioner of data science with more than 30 years of experience in statistics and data science across academia, government, and industry. He earned a Ph.D. in statistics from the University of Washington and has published numerous papers in academic journals. He has developed statistics-based solutions to a wide range of industry problems, from prominent financial firms to internet startups, and is recognized as an expert in the practical application of data science.
Table of Contents
Preface xiii

1. Exploratory Data Analysis 1
Elements of Structured Data 2 -- Further Reading 4 -- Rectangular Data 4 -- Data Frames and Indexes 6 -- Nonrectangular Data Structures 6 -- Further Reading 7 -- Estimates of Location 7 -- Mean 9 -- Median and Robust Estimates 10 -- Example: Location Estimates of Population and Murder Rates 12 -- Further Reading 13 -- Estimates of Variability 13 -- Standard Deviation and Related Estimates 14 -- Estimates Based on Percentiles 16 -- Example: Variability Estimates of State Population 18 -- Further Reading 19 -- Exploring the Data Distribution 19 -- Percentiles and Boxplots 20 -- Frequency Tables and Histograms 22 -- Density Plots and Estimates 24 -- Further Reading 26 -- Exploring Binary and Categorical Data 27 -- Mode 29 -- Expected Value 29 -- Probability 30 -- Further Reading 30 -- Correlation 30 -- Scatterplots 34 -- Further Reading 36 -- Exploring Two or More Variables 36 -- Hexagonal Binning and Contours (Plotting Numeric Versus Numeric Data) 36 -- Two Categorical Variables 39 -- Categorical and Numeric Data 41 -- Visualizing Multiple Variables 43 -- Further Reading 46 -- Summary 46

2. Data and Sampling Distributions 47
Random Sampling and Sample Bias 48 -- Bias 50 -- Random Selection 51 -- Size Versus Quality: When Does Size Matter? 52 -- Sample Mean Versus Population Mean 53 -- Further Reading 53 -- Selection Bias 54 -- Regression to the Mean 55 -- Further Reading 57 -- Sampling Distribution of a Statistic 57 -- Central Limit Theorem 60 -- Standard Error 60 -- Further Reading 61 -- The Bootstrap 61 -- Resampling Versus Bootstrapping 65 -- Further Reading 65 -- Confidence Intervals 65 -- Further Reading 68 -- Normal Distribution 69 -- Standard Normal and QQ-Plots 71 -- Long-Tailed Distributions 73 -- Further Reading 75 -- Student's t-Distribution 75 -- Further Reading 78 -- Binomial Distribution 78 -- Further Reading 80 -- Chi-Square Distribution 80 -- Further Reading 81 -- F-Distribution 82 -- Further Reading 82 -- Poisson and Related Distributions 82 -- Poisson Distributions 83 -- Exponential Distribution 84 -- Estimating the Failure Rate 84 -- Weibull Distribution 85 -- Further Reading 86 -- Summary 86

3. Statistical Experiments and Significance Testing 87
A/B Testing 88 -- Why Have a Control Group? 90 -- Why Just A/B? Why Not C, D, ...? 91 -- Further Reading 92 -- Hypothesis Tests 93 -- The Null Hypothesis 94 -- Alternative Hypothesis 95 -- One-Way Versus Two-Way Hypothesis Tests 95 -- Further Reading 96 -- Resampling 96 -- Permutation Test 97 -- Example: Web Stickiness 98 -- Exhaustive and Bootstrap Permutation Tests 102 -- Permutation Tests: The Bottom Line for Data Science 102 -- Further Reading 103 -- Statistical Significance and p-Values 103 -- p-Value 106 -- Alpha 107 -- Type 1 and Type 2 Errors 109 -- Data Science and p-Values 109 -- Further Reading 110 -- t-Tests 110 -- Further Reading 112 -- Multiple Testing 112 -- Further Reading 116 -- Degrees of Freedom 116 -- Further Reading 118 -- ANOVA 118 -- F-Statistic 121 -- Two-Way ANOVA 123 -- Further Reading 124 -- Chi-Square Test 124 -- Chi-Square Test: A Resampling Approach 124 -- Chi-Square Test: Statistical Theory 127 -- Fisher's Exact Test 128 -- Relevance for Data Science 130 -- Further Reading 131 -- Multi-Arm Bandit Algorithm 131 -- Further Reading 134 -- Power and Sample Size 135 -- Sample Size 135 -- Further Reading 138 -- Summary 139

4. Regression and Prediction 141
Simple Linear Regression 141 -- The Regression Equation 143 -- Fitted Values and Residuals 145 -- Least Squares 148 -- Prediction Versus Explanation (Profiling) 149 -- Further Reading 150 -- Multiple Linear Regression 150 -- Example: King County Housing Data 151 -- Assessing the Model 153 -- Cross-Validation 155 -- Model Selection and Stepwise Regression 156 -- Weighted Regression 159 -- Further Reading 161 -- Prediction Using Regression 161 -- The Dangers of Extrapolation 161 -- Confidence and Prediction Intervals 161 -- Factor Variables in Regression 163 -- Dummy Variables Representation 164 -- Factor Variables with Many Levels 167 -- Ordered Factor Variables 169 -- Interpreting the Regression Equation 169 -- Correlated Predictors 170 -- Multicollinearity 172 -- Confounding Variables 172 -- Interactions and Main Effects 174 -- Regression Diagnostics 176 -- Outliers 177 -- Influential Values 179 -- Heteroskedasticity, Non-Normality, and Correlated Errors 182 -- Partial Residual Plots and Nonlinearity 185 -- Polynomial and Spline Regression 187 -- Polynomial 188 -- Splines 189 -- Generalized Additive Models 192 -- Further Reading 193 -- Summary 194

5. Classification 195
Naive Bayes 196 -- Why Exact Bayesian Classification Is Impractical 197 -- The Naive Solution 198 -- Numeric Predictor Variables 200 -- Further Reading 201 -- Discriminant Analysis 201 -- Covariance Matrix 202 -- Fisher's Linear Discriminant 203 -- A Simple Example 204 -- Further Reading 207 -- Logistic Regression 208 -- Logistic Response Function and Logit 208 -- Logistic Regression and the GLM 210 -- Generalized Linear Models 212 -- Predicted Values from Logistic Regression 212 -- Interpreting the Coefficients and Odds Ratios 213 -- Linear and Logistic Regression: Similarities and Differences 214 -- Assessing the Model 216 -- Further Reading 219 -- Evaluating Classification Models 219 -- Confusion Matrix 221 -- The Rare Class Problem 223 -- Precision, Recall, and Specificity 223 -- ROC Curve 224 -- AUC 226 -- Lift 228 -- Further Reading 229 -- Strategies for Imbalanced Data 230 -- Undersampling 231 -- Oversampling and Up/Down Weighting 232 -- Data Generation 233 -- Cost-Based Classification 234 -- Exploring the Predictions 234 -- Further Reading 236 -- Summary 236

6. Statistical Machine Learning 237
K-Nearest Neighbors 238 -- A Small Example: Predicting Loan Default 239 -- Distance Metrics 241 -- One Hot Encoder 242 -- Standardization (Normalization, z-Scores) 243 -- Choosing K 246 -- KNN as a Feature Engine 247 -- Tree Models 249 -- A Simple Example 250 -- The Recursive Partitioning Algorithm 252 -- Measuring Homogeneity or Impurity 254 -- Stopping the Tree from Growing 256 -- Predicting a Continuous Value 257 -- How Trees Are Used 258 -- Further Reading 259 -- Bagging and the Random Forest 259 -- Bagging 260 -- Random Forest 261 -- Variable Importance 265 -- Hyperparameters 269 -- Boosting 270 -- The Boosting Algorithm 271 -- XGBoost 272 -- Regularization: Avoiding Overfitting 274 -- Hyperparameters and Cross-Validation 279 -- Summary 282

7. Unsupervised Learning 283
Principal Components Analysis 284 -- A Simple Example 285 -- Computing the Principal Components 288 -- Interpreting Principal Components 289 -- Correspondence Analysis 292 -- Further Reading 294 -- K-Means Clustering 294 -- A Simple Example 295 -- K-Means Algorithm 298 -- Interpreting the Clusters 299 -- Selecting the Number of Clusters 302 -- Hierarchical Clustering 304 -- A Simple Example 305 -- The Dendrogram 306 -- The Agglomerative Algorithm 308 -- Measures of Dissimilarity 309 -- Model-Based Clustering 311 -- Multivariate Normal Distribution 311 -- Mixtures of Normals 312 -- Selecting the Number of Clusters 315 -- Further Reading 318 -- Scaling and Categorical Variables 318 -- Scaling the Variables 319 -- Dominant Variables 321 -- Categorical Data and Gower's Distance 322 -- Problems with Clustering Mixed Data 325 -- Summary 326

Bibliography 327
Index 329
