Introduction to data science [electronic resource] : a Python approach to concepts, techniques and applications
| 000 | 00000cam u2200205 a 4500 | |
| 001 | 000045992210 | |
| 005 | 20190805142708 | |
| 006 | m d | |
| 007 | cr | |
| 008 | 190726s2017 sz a ob 001 0 eng d | |
| 020 | ▼a 9783319500164 | |
| 020 | ▼a 9783319500171 (e-book) | |
| 040 | ▼a 211009 ▼c 211009 ▼d 211009 | |
| 050 | 4 | ▼a QA76.9.D343 |
| 082 | 0 4 | ▼a 001.42 ▼2 23 |
| 084 | ▼a 001.42 ▼2 DDCK | |
| 090 | ▼a 001.42 | |
| 100 | 1 | ▼a Igual, Laura. |
| 245 | 1 0 | ▼a Introduction to data science ▼h [electronic resource] : ▼b a Python approach to concepts, techniques and applications / ▼c Laura Igual, Santi Seguí. |
| 260 | ▼a Cham : ▼b Springer, ▼c c2017. | |
| 300 | ▼a 1 online resource (xiv, 218 p.) : ▼b ill. | |
| 490 | 1 | ▼a Undergraduate Topics in Computer Science, ▼x 1863-7310 |
| 500 | ▼a Title from e-Book title page. | |
| 504 | ▼a Includes bibliographical references and index. | |
| 505 | 0 | ▼a Introduction to Data Science -- Toolboxes for Data Scientists -- Descriptive statistics -- Statistical Inference -- Supervised Learning -- Regression Analysis -- Unsupervised Learning -- Network Analysis -- Recommender Systems -- Statistical Natural Language Processing for Sentiment Analysis -- Parallel Computing. |
| 520 | ▼a This accessible and classroom-tested textbook/reference presents an introduction to the fundamentals of the emerging and interdisciplinary field of data science. The coverage spans key concepts adopted from statistics and machine learning, useful techniques for graph analysis and parallel programming, and the practical application of data science for such tasks as building recommender systems or performing sentiment analysis. Topics and features: provides numerous practical case studies using real-world data throughout the book; supports understanding through hands-on experience of solving data science problems using Python; describes techniques and tools for statistical analysis, machine learning, graph analysis, and parallel programming; reviews a range of applications of data science, including recommender systems and sentiment analysis of text data; provides supplementary code resources and data at an associated website. This practically focused textbook provides an ideal introduction to the field for upper-tier undergraduate and beginning graduate students from computer science, mathematics, statistics, and other technical disciplines. The work is also eminently suitable for professionals on continuing education short courses and for researchers following self-study courses. Dr. Laura Igual is an Associate Professor at the Departament de Matemàtiques i Informàtica, Universitat de Barcelona, Spain. Dr. Santi Seguí is an Assistant Professor at the same institution. | |
| 530 | ▼a Issued also as a book. | |
| 538 | ▼a Mode of access: World Wide Web. | |
| 650 | 0 | ▼a Quantitative research. |
| 650 | 0 | ▼a Python (Computer program language). |
| 700 | 1 | ▼a Seguí, Santi. |
| 830 | 0 | ▼a Undergraduate Topics in Computer Science. |
| 856 | 4 0 | ▼u https://oca.korea.ac.kr/link.n2s?url=https://doi.org/10.1007/978-3-319-50017-1 |
| 945 | ▼a KLPA | |
| 991 | ▼a E-Book (holdings) |
Holdings information
| No. | Location | Call number | Accession number | Status | Due date | Reservation | Service |
|---|---|---|---|---|---|---|---|
| 1 | Main Library / e-Book Collection | CR 001.42 | E14016062 | Not for loan (viewing available) | | | |
Content information
Book description
New feature
This accessible and classroom-tested textbook/reference presents an introduction to the fundamentals of the emerging and interdisciplinary field of data science. The coverage spans key concepts adopted from statistics and machine learning, useful techniques for graph analysis and parallel programming, and the practical application of data science for such tasks as building recommender systems or performing sentiment analysis.
Topics and features:
- Provides numerous practical case studies using real-world data throughout the book
- Supports understanding through hands-on experience of solving data science problems using Python
- Describes techniques and tools for statistical analysis, machine learning, graph analysis, and parallel programming
- Reviews a range of applications of data science, including recommender systems and sentiment analysis of text data
- Provides supplementary code resources and data at an associated website
This practically focused textbook provides an ideal introduction to the field for upper-tier undergraduate and beginning graduate students from computer science, mathematics, statistics, and other technical disciplines. The work is also eminently suitable for professionals on continuing education short courses and for researchers following self-study courses.
Dr. Laura Igual is an Associate Professor at the Departament de Matemàtiques i Informàtica, Universitat de Barcelona, Spain. Dr. Santi Seguí is an Assistant Professor at the same institution.
Table of contents
- 1 Introduction to Data Science = 1
  - 1.1 What is Data Science? = 1
  - 1.2 About This Book = 3
- 2 Toolboxes for Data Scientists = 5
  - 2.1 Introduction = 5
  - 2.2 Why Python? = 6
  - 2.3 Fundamental Python Libraries for Data Scientists = 6
    - 2.3.1 Numeric and Scientific Computation: NumPy and SciPy = 7
    - 2.3.2 SCIKIT-Learn: Machine Learning in Python = 7
    - 2.3.3 PANDAS: Python Data Analysis Library = 7
  - 2.4 Data Science Ecosystem Installation = 7
  - 2.5 Integrated Development Environments (IDE) = 8
    - 2.5.1 Web Integrated Development Environment (WIDE): Jupyter = 9
  - 2.6 Get Started with Python for Data Scientists = 10
    - 2.6.1 Reading = 14
    - 2.6.2 Selecting Data = 16
    - 2.6.3 Filtering Data = 17
    - 2.6.4 Filtering Missing Values = 17
    - 2.6.5 Manipulating Data = 18
    - 2.6.6 Sorting = 22
    - 2.6.7 Grouping Data = 23
    - 2.6.8 Rearranging Data = 24
    - 2.6.9 Ranking Data = 25
    - 2.6.10 Plotting = 26
  - 2.7 Conclusions = 28
- 3 Descriptive Statistics = 29
  - 3.1 Introduction = 29
  - 3.2 Data Preparation = 30
    - 3.2.1 The Adult Example = 30
  - 3.3 Exploratory Data Analysis = 32
    - 3.3.1 Summarizing the Data = 32
    - 3.3.2 Data Distributions = 36
    - 3.3.3 Outlier Treatment = 38
    - 3.3.4 Measuring Asymmetry: Skewness and Pearson's Median Skewness Coefficient = 41
    - 3.3.5 Continuous Distribution = 42
    - 3.3.6 Kernel Density = 44
  - 3.4 Estimation = 46
    - 3.4.1 Sample and Estimated Mean, Variance and Standard Scores = 46
    - 3.4.2 Covariance, and Pearson's and Spearman's Rank Correlation = 47
  - 3.5 Conclusions = 50
  - References = 50
- 4 Statistical Inference = 51
  - 4.1 Introduction = 51
  - 4.2 Statistical Inference: The Frequentist Approach = 52
  - 4.3 Measuring the Variability in Estimates = 52
    - 4.3.1 Point Estimates = 53
    - 4.3.2 Confidence Intervals = 56
  - 4.4 Hypothesis Testing = 59
    - 4.4.1 Testing Hypotheses Using Confidence Intervals = 60
    - 4.4.2 Testing Hypotheses Using p-Values = 61
  - 4.5 But Is the Effect E Real? = 64
  - 4.6 Conclusions = 64
  - References = 65
- 5 Supervised Learning = 67
  - 5.1 Introduction = 67
  - 5.2 The Problem = 68
  - 5.3 First Steps = 69
  - 5.4 What Is Learning? = 78
  - 5.5 Learning Curves = 79
  - 5.6 Training, Validation and Test = 82
  - 5.7 Two Learning Models = 86
    - 5.7.1 Generalities Concerning Learning Models = 86
    - 5.7.2 Support Vector Machines = 87
    - 5.7.3 Random Forest = 90
  - 5.8 Ending the Learning Process = 91
  - 5.9 A Toy Business Case = 92
  - 5.10 Conclusion = 95
  - Reference = 96
- 6 Regression Analysis = 97
  - 6.1 Introduction = 97
  - 6.2 Linear Regression = 98
    - 6.2.1 Simple Linear Regression = 98
    - 6.2.2 Multiple Linear Regression and Polynomial Regression = 103
    - 6.2.3 Sparse Model = 104
  - 6.3 Logistic Regression = 110
  - 6.4 Conclusions = 113
  - References = 114
- 7 Unsupervised Learning = 115
  - 7.1 Introduction = 115
  - 7.2 Clustering = 116
    - 7.2.1 Similarity and Distances = 117
    - 7.2.2 What Constitutes a Good Clustering? Defining Metrics to Measure Clustering Quality = 117
    - 7.2.3 Taxonomies of Clustering Techniques = 120
  - 7.3 Case Study = 132
  - 7.4 Conclusions = 138
  - References = 139
- 8 Network Analysis = 141
  - 8.1 Introduction = 141
  - 8.2 Basic Definitions in Graphs = 142
  - 8.3 Social Network Analysis = 144
    - 8.3.1 Basics in NetworkX = 144
    - 8.3.2 Practical Case: Facebook Dataset = 145
  - 8.4 Centrality = 147
    - 8.4.1 Drawing Centrality in Graphs = 152
    - 8.4.2 PageRank = 154
  - 8.5 Ego-Networks = 157
  - 8.6 Community Detection = 162
  - 8.7 Conclusions = 163
  - References = 164
- 9 Recommender Systems = 165
  - 9.1 Introduction = 165
  - 9.2 How Do Recommender Systems Work? = 166
    - 9.2.1 Content-Based Filtering = 166
    - 9.2.2 Collaborative Filtering = 167
    - 9.2.3 Hybrid Recommenders = 167
  - 9.3 Modeling User Preferences = 167
  - 9.4 Evaluating Recommenders = 168
  - 9.5 Practical Case = 169
    - 9.5.1 MovieLens Dataset = 169
    - 9.5.2 User-Based Collaborative Filtering = 171
  - 9.6 Conclusions = 179
  - References = 179
- 10 Statistical Natural Language Processing for Sentiment Analysis = 181
  - 10.1 Introduction = 181
  - 10.2 Data Cleaning = 182
  - 10.3 Text Representation = 185
    - 10.3.1 Bi-Grams and n-Grams = 190
  - 10.4 Practical Cases = 191
  - 10.5 Conclusions = 196
  - References = 196
- 11 Parallel Computing = 199
  - 11.1 Introduction = 199
  - 11.2 Architecture = 200
    - 11.2.1 Getting Started = 201
    - 11.2.2 Connecting to the Cluster (The Engines) = 202
  - 11.3 Multicore Programming = 203
    - 11.3.1 Direct View of Engines = 203
    - 11.3.2 Load-Balanced View of Engines = 206
  - 11.4 Distributed Computing = 207
  - 11.5 A Real Application: New York Taxi Trips = 208
    - 11.5.1 A Direct View Non-Blocking Proposal = 209
    - 11.5.2 Results = 212
  - 11.6 Conclusions = 214
  - References = 215
- Index = 217
