
Detailed Information

Practical statistics for data scientists : 50+ essential concepts using R and Python / 2nd ed (borrowed 1 time)

Material type
Monograph
Personal author
Bruce, Peter C., 1953- ; Bruce, Andrew, 1958-, author ; Gedeck, Peter, author.
Title / Statement of responsibility
Practical statistics for data scientists : 50+ essential concepts using R and Python / Peter Bruce, Andrew Bruce, and Peter Gedeck.
Edition
2nd ed.
Publication
Beijing : O'Reilly, 2020.
Physical description
xvi, 342 p. : ill., charts ; 24 cm.
ISBN
9781492072942 149207294X
Summary
Statistical methods are a key part of data science, yet few data scientists have formal statistical training. Courses and books on basic statistics rarely cover the topic from a data science perspective. The second edition of this practical guide, now including examples in Python as well as R, explains how to apply various statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what's important and what's not. Many data scientists use statistical methods but lack a deeper statistical perspective. If you're familiar with the R or Python programming languages, and have had some exposure to statistics but want to learn more, this quick reference bridges the gap in an accessible, readable format. With this updated edition, you'll dive into: exploratory data analysis; data and sampling distributions; statistical experiments and significance testing; regression and prediction; classification; statistical machine learning; and unsupervised learning. --Source other than the Library of Congress.
Contents note
Exploratory Data Analysis -- Data and Sampling Distributions -- Statistical Experiments and Significance Testing -- Regression and Prediction -- Classification -- Statistical Machine Learning -- Unsupervised Learning.
Bibliography note
Includes bibliographical references (p. 327-328) and index.
Subject headings
Mathematical analysis --Statistical methods. Quantitative research --Statistical methods. R (Computer program language). Python (Computer program language). Statistics --Data processing.
000 00000cam u2200205 a 4500
001 000046188504
005 20241122111631
008 241121s2020 cc ad e b 001 0 eng
010 ▼a 2018420845
015 ▼a GBC061788 ▼2 bnb
020 ▼a 9781492072942 ▼q (paperback)
020 ▼a 149207294X ▼q (paperback)
035 ▼a (KERIS)REF000019698509
040 ▼a UTV ▼b eng ▼c UTV ▼e rda ▼d UTV ▼d AHH ▼d OCLCF ▼d YDXIT ▼d UKMGB ▼d IBI ▼d OCLCO ▼d YDX ▼d JAS ▼d OCL ▼d DLC ▼d 211009
042 ▼a lccopycat
050 0 0 ▼a QA276.4 ▼b .B78 2020
082 0 4 ▼a 001.4/22 ▼2 23
084 ▼a 001.422 ▼2 DDCK
090 ▼a 001.422 ▼b B887p2
100 1 ▼a Bruce, Peter C., ▼d 1953- ▼0 AUTH(211009)50742.
245 1 0 ▼a Practical statistics for data scientists : ▼b 50+ essential concepts using R and Python / ▼c Peter Bruce, Andrew Bruce, and Peter Gedeck.
250 ▼a 2nd ed.
260 ▼a Beijing : ▼b O'Reilly, ▼c 2020.
264 1 ▼a Beijing : ▼b O'Reilly, ▼c 2020.
264 4 ▼c ©2020
300 ▼a xvi, 342 p. : ▼b ill., charts ; ▼c 24 cm.
336 ▼a text ▼b txt ▼2 rdacontent
337 ▼a unmediated ▼b n ▼2 rdamedia
338 ▼a volume ▼b nc ▼2 rdacarrier
504 ▼a Includes bibliographical references (p. 327-328) and index.
505 0 ▼a Exploratory Data Analysis -- Data and Sampling Distributions -- Statistical Experiments and Significance Testing -- Regression and Prediction -- Classification -- Statistical Machine Learning -- Unsupervised Learning.
520 ▼a Statistical methods are a key part of data science, yet few data scientists have formal statistical training. Courses and books on basic statistics rarely cover the topic from a data science perspective. The second edition of this practical guide-now including examples in Python as well as R-explains how to apply various statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what's important and what's not. Many data scientists use statistical methods but lack a deeper statistical perspective. If you're familiar with the R or Python programming languages, and have had some exposure to statistics but want to learn more, this quick reference bridges the gap in an accessible, readable format. With this updated edition, you'll dive into: Exploratory data analysis Data and sampling distributions Statistical experiments and significance testing Regression and prediction Classification Statistical machine learning Unsupervised learning.--Source other than the Library of Congress.
650 0 ▼a Mathematical analysis ▼x Statistical methods.
650 0 ▼a Quantitative research ▼x Statistical methods.
650 0 ▼a R (Computer program language).
650 0 ▼a Python (Computer program language).
650 0 ▼a Statistics ▼x Data processing.
700 1 ▼a Bruce, Andrew, ▼d 1958-, ▼e author.
700 1 ▼a Gedeck, Peter, ▼e author.
945 ▼a ITMT

Holdings Information

No. 1
Location: Main Library / Stacks, 6th floor
Call number: 001.422 B887p2
Accession number: 111904064 (borrowed 1 time)
Status: Available for loan

Content Information

Book Description

Statistical methods are a key part of data science, yet few data scientists have formal statistical training. Courses and books on basic statistics rarely cover the topic from a data science perspective. The second edition of this popular guide adds comprehensive examples in Python, provides practical guidance on applying statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what's important and what's not.

Many data science resources incorporate statistical methods but lack a deeper statistical perspective. If you're familiar with the R or Python programming languages and have some exposure to statistics, this quick reference bridges the gap in an accessible, readable format.

With this book, you'll learn:

  • Why exploratory data analysis is a key preliminary step in data science
  • How random sampling can reduce bias and yield a higher-quality dataset, even with big data
  • How the principles of experimental design yield definitive answers to questions
  • How to use regression to estimate outcomes and detect anomalies
  • Key classification techniques for predicting which categories a record belongs to
  • Statistical machine learning methods that "learn" from data
  • Unsupervised learning methods for extracting meaning from unlabeled data



Information provided by: Aladin

About the Authors

Peter Bruce (author)

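Founder of Statistics.com, a statistics education institute. Statistics.com offers about 100 statistics courses, roughly 30 percent of which are aimed at data scientists. He has recruited top-tier professional data scientists as instructors through a carefully developed marketing strategy. In the process, he has built a broad perspective and expert insight on the subject of statistics for data scientists.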

Andrew Bruce (author)

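A practicing data science expert. He has worked in statistics and data science for more than 30 years across academia, government, and industry. He earned a Ph.D. in statistics from the University of Washington and has published several papers in academic journals. He has developed statistics-based solutions to a wide range of industry problems, for organizations ranging from established financial firms to internet startups, and is recognized as an expert in the practical application of data science.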

Information provided by: Aladin

Table of Contents

Preface xiii

1 Exploratory Data Analysis 1

Elements of Structured Data 2

Further Reading 4

Rectangular Data 4

Data Frames and Indexes 6

Nonrectangular Data Structures 6

Further Reading 7

Estimates of Location 7

Mean 9

Median and Robust Estimates 10

Example: Location Estimates of Population and Murder Rates 12

Further Reading 13

Estimates of Variability 13

Standard Deviation and Related Estimates 14

Estimates Based on Percentiles 16

Example: Variability Estimates of State Population 18

Further Reading 19

Exploring the Data Distribution 19

Percentiles and Boxplots 20

Frequency Tables and Histograms 22

Density Plots and Estimates 24

Further Reading 26

Exploring Binary and Categorical Data 27

Mode 29

Expected Value 29

Probability 30

Further Reading 30

Correlation 30

Scatterplots 34

Further Reading 36

Exploring Two or More Variables 36

Hexagonal Binning and Contours (Plotting Numeric Versus Numeric Data) 36

Two Categorical Variables 39

Categorical and Numeric Data 41

Visualizing Multiple Variables 43

Further Reading 46

Summary 46

2 Data and Sampling Distributions 47

Random Sampling and Sample Bias 48

Bias 50

Random Selection 51

Size Versus Quality: When Does Size Matter? 52

Sample Mean Versus Population Mean 53

Further Reading 53

Selection Bias 54

Regression to the Mean 55

Further Reading 57

Sampling Distribution of a Statistic 57

Central Limit Theorem 60

Standard Error 60

Further Reading 61

The Bootstrap 61

Resampling Versus Bootstrapping 65

Further Reading 65

Confidence Intervals 65

Further Reading 68

Normal Distribution 69

Standard Normal and QQ-Plots 71

Long-Tailed Distributions 73

Further Reading 75

Student's t-Distribution 75

Further Reading 78

Binomial Distribution 78

Further Reading 80

Chi-Square Distribution 80

Further Reading 81

F-Distribution 82

Further Reading 82

Poisson and Related Distributions 82

Poisson Distributions 83

Exponential Distribution 84

Estimating the Failure Rate 84

Weibull Distribution 85

Further Reading 86

Summary 86

3 Statistical Experiments and Significance Testing 87

A/B Testing 88

Why Have a Control Group? 90

Why Just A/B? Why Not C, D,…? 91

Further Reading 92

Hypothesis Tests 93

The Null Hypothesis 94

Alternative Hypothesis 95

One-Way Versus Two-Way Hypothesis Tests 95

Further Reading 96

Resampling 96

Permutation Test 97

Example: Web Stickiness 98

Exhaustive and Bootstrap Permutation Tests 102

Permutation Tests: The Bottom Line for Data Science 102

Further Reading 103

Statistical Significance and p-Values 103

p-Value 106

Alpha 107

Type 1 and Type 2 Errors 109

Data Science and p-Values 109

Further Reading 110

t-Tests 110

Further Reading 112

Multiple Testing 112

Further Reading 116

Degrees of Freedom 116

Further Reading 118

ANOVA 118

F-Statistic 121

Two-Way ANOVA 123

Further Reading 124

Chi-Square Test 124

Chi-Square Test: A Resampling Approach 124

Chi-Square Test: Statistical Theory 127

Fisher's Exact Test 128

Relevance for Data Science 130

Further Reading 131

Multi-Arm Bandit Algorithm 131

Further Reading 134

Power and Sample Size 135

Sample Size 135

Further Reading 138

Summary 139

4 Regression and Prediction 141

Simple Linear Regression 141

The Regression Equation 143

Fitted Values and Residuals 145

Least Squares 148

Prediction Versus Explanation (Profiling) 149

Further Reading 150

Multiple Linear Regression 150

Example: King County Housing Data 151

Assessing the Model 153

Cross-Validation 155

Model Selection and Stepwise Regression 156

Weighted Regression 159

Further Reading 161

Prediction Using Regression 161

The Dangers of Extrapolation 161

Confidence and Prediction Intervals 161

Factor Variables in Regression 163

Dummy Variables Representation 164

Factor Variables with Many Levels 167

Ordered Factor Variables 169

Interpreting the Regression Equation 169

Correlated Predictors 170

Multicollinearity 172

Confounding Variables 172

Interactions and Main Effects 174

Regression Diagnostics 176

Outliers 177

Influential Values 179

Heteroskedasticity, Non-Normality, and Correlated Errors 182

Partial Residual Plots and Nonlinearity 185

Polynomial and Spline Regression 187

Polynomial 188

Splines 189

Generalized Additive Models 192

Further Reading 193

Summary 194

5 Classification 195

Naive Bayes 196

Why Exact Bayesian Classification Is Impractical 197

The Naive Solution 198

Numeric Predictor Variables 200

Further Reading 201

Discriminant Analysis 201

Covariance Matrix 202

Fisher's Linear Discriminant 203

A Simple Example 204

Further Reading 207

Logistic Regression 208

Logistic Response Function and Logit 208

Logistic Regression and the GLM 210

Generalized Linear Models 212

Predicted Values from Logistic Regression 212

Interpreting the Coefficients and Odds Ratios 213

Linear and Logistic Regression: Similarities and Differences 214

Assessing the Model 216

Further Reading 219

Evaluating Classification Models 219

Confusion Matrix 221

The Rare Class Problem 223

Precision, Recall, and Specificity 223

ROC Curve 224

AUC 226

Lift 228

Further Reading 229

Strategies for Imbalanced Data 230

Undersampling 231

Oversampling and Up/Down Weighting 232

Data Generation 233

Cost-Based Classification 234

Exploring the Predictions 234

Further Reading 236

Summary 236

6 Statistical Machine Learning 237

K-Nearest Neighbors 238

A Small Example: Predicting Loan Default 239

Distance Metrics 241

One Hot Encoder 242

Standardization (Normalization, z-Scores) 243

Choosing K 246

KNN as a Feature Engine 247

Tree Models 249

A Simple Example 250

The Recursive Partitioning Algorithm 252

Measuring Homogeneity or Impurity 254

Stopping the Tree from Growing 256

Predicting a Continuous Value 257

How Trees Are Used 258

Further Reading 259

Bagging and the Random Forest 259

Bagging 260

Random Forest 261

Variable Importance 265

Hyperparameters 269

Boosting 270

The Boosting Algorithm 271

XGBoost 272

Regularization: Avoiding Overfitting 274

Hyperparameters and Cross-Validation 279

Summary 282

7 Unsupervised Learning 283

Principal Components Analysis 284

A Simple Example 285

Computing the Principal Components 288

Interpreting Principal Components 289

Correspondence Analysis 292

Further Reading 294

K-Means Clustering 294

A Simple Example 295

K-Means Algorithm 298

Interpreting the Clusters 299

Selecting the Number of Clusters 302

Hierarchical Clustering 304

A Simple Example 305

The Dendrogram 306

The Agglomerative Algorithm 308

Measures of Dissimilarity 309

Model-Based Clustering 311

Multivariate Normal Distribution 311

Mixtures of Normals 312

Selecting the Number of Clusters 315

Further Reading 318

Scaling and Categorical Variables 318

Scaling the Variables 319

Dominant Variables 321

Categorical Data and Gower's Distance 322

Problems with Clustering Mixed Data 325

Summary 326

Bibliography 327

Index 329
