Details

Text mining with R : a tidy approach

Material type
Monograph
Personal author
Silge, Julia. Robinson, David.
Title / Statement of responsibility
Text mining with R : a tidy approach / Julia Silge and David Robinson.
Publication
Sebastopol, CA : O'Reilly Media, 2017.
Physical description
xii, 178 p. : ill. ; 24 cm.
ISBN
9781491981658
Bibliography note
Includes bibliographical references and index.
Subject headings
R (Computer program language). Data mining.
000 00000nam u2200205 a 4500
001 000045912516
005 20170821150715
008 170817s2017 caua b 001 0 eng d
020 ▼a 9781491981658
040 ▼a 211009 ▼c 211009 ▼d 211009
082 0 4 ▼a 006.312 ▼2 23
084 ▼a 006.312 ▼2 DDCK
090 ▼a 006.312 ▼b S582t
100 1 ▼a Silge, Julia.
245 1 0 ▼a Text mining with R : ▼b a tidy approach / ▼c Julia Silge and David Robinson.
260 ▼a Sebastopol, CA : ▼b O'Reilly Media, ▼c 2017.
300 ▼a xii, 178 p. : ▼b ill. ; ▼c 24 cm.
504 ▼a Includes bibliographical references and index.
650 0 ▼a R (Computer program language).
650 0 ▼a Data mining.
700 1 ▼a Robinson, David.
945 ▼a KLPA

Holdings

No. 1
Location: Main Library / Stacks, 6th floor
Call number: 006.312 S582t
Accession number: 111777424 (checked out 16 times)
Status: Available

Content information

Book description

Much of the data available today is unstructured and text-heavy, making it challenging for analysts to apply their usual data wrangling and visualization tools. With this practical book, you'll explore text-mining techniques with tidytext, a package that authors Julia Silge and David Robinson developed using the tidy principles behind R packages like ggraph and dplyr. You'll learn how tidytext and other tidy tools in R can make text analysis easier and more effective.

The authors demonstrate how treating text as data frames enables you to manipulate, summarize, and visualize characteristics of text. You'll also learn how to integrate natural language processing (NLP) into effective workflows. Practical code examples and data explorations will help you generate real insights from literature, news, and social media.

  • Learn how to apply the tidy text format to NLP
  • Use sentiment analysis to mine the emotional content of text
  • Identify a document's most important terms with frequency measurements
  • Explore relationships and connections between words with the ggraph and widyr packages
  • Convert back and forth between R's tidy and non-tidy text formats
  • Use topic modeling to classify document collections into natural groups
  • Examine case studies that compare Twitter archives, dig into NASA metadata, and analyze thousands of Usenet messages


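Information provided by: Aladin

About the authors

Julia Silge (author)

Julia is a data scientist at Stack Overflow. She analyzes complex datasets and communicates about technical topics with diverse audiences. She holds a PhD in astrophysics, loves Jane Austen, and enjoys making beautiful charts.

David Robinson (author)

David is a data scientist at Stack Overflow and holds a PhD in computational biology from Princeton University. He develops R packages such as broom, gganimate, fuzzyjoin, and widyr, mostly as open source.

Information provided by: Aladin

Table of contents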

CONTENTS
Preface = vii
1. The Tidy Text Format = 1
 Contrasting Tidy Text with Other Data Structures = 2
 The unnest_tokens Function = 2
 Tidying the Works of Jane Austen = 4
 The gutenbergr Package = 7
 Word Frequencies = 8
 Summary = 12
2. Sentiment Analysis with Tidy Data = 13
 The sentiments Dataset = 14
 Sentiment Analysis with Inner Join = 16
 Comparing the Three Sentiment Dictionaries = 19
 Most Common Positive and Negative Words = 22
 Wordclouds = 25
 Looking at Units Beyond Just Words = 27
 Summary = 29
3. Analyzing Word and Document Frequency : tf-idf = 31
 Term Frequency in Jane Austen's Novels = 32
 Zipf's Law = 34
 The bind_tf_idf Function = 37
 A Corpus of Physics Texts = 40
 Summary = 44
4. Relationships Between Words : N-grams and Correlations = 45
 Tokenizing by N-gram = 45
  Counting and Filtering N-grams = 46
  Analyzing Bigrams = 48
  Using Bigrams to Provide Context in Sentiment Analysis = 51
  Visualizing a Network of Bigrams with ggraph = 54
  Visualizing Bigrams in Other Texts = 59
 Counting and Correlating Pairs of Words with the widyr Package = 61
  Counting and Correlating Among Sections = 62
  Examining Pairwise Correlation = 63
 Summary = 67
5. Converting to and from Nontidy Formats = 69
 Tidying a Document-Term Matrix = 70
  Tidying Document Term Matrix Objects = 71
  Tidying dfm Objects = 74
 Casting Tidy Text Data into a Matrix = 77
 Tidying Corpus Objects with Metadata = 79
  Example : Mining Financial Articles = 81
 Summary = 87
6. Topic Modeling = 89
 Latent Dirichlet Allocation = 90
  Word-Topic Probabilities = 91
  Document-Topic Probabilities = 95
 Example : The Great Library Heist = 96
  LDA on Chapters = 97
  Per-Document Classification = 100
  By-Word Assignments : augment = 103
 Alternative LDA Implementations = 107
 Summary = 108
7. Case Study : Comparing Twitter Archives = 109
 Getting the Data and Distribution of Tweets = 109
 Word Frequencies = 110
 Comparing Word Usage = 114
 Changes in Word Use = 116
 Favorites and Retweets = 120
 Summary = 124
8. Case Study : Mining NASA Metadata = 125
 How Data Is Organized at NASA = 126
  Wrangling and Tidying the Data = 126
  Some Initial Simple Exploration = 129
 Word Co-occurrences and Correlations = 130
  Networks of Description and Title Words = 131
  Networks of Keywords = 134
 Calculating tf-idf for the Description Fields = 137
  What Is tf-idf for the Description Field Words? = 137
  Connecting Description Fields to Keywords = 138
 Topic Modeling = 140
  Casting to a Document-Term Matrix = 140
  Ready for Topic Modeling = 141
  Interpreting the Topic Model = 142
  Connecting Topic Modeling with Keywords = 149
 Summary = 152
9. Case Study : Analyzing Usenet Text = 153
 Preprocessing = 153
  Preprocessing Text = 155
 Words in Newsgroups = 156
  Finding tf-idf Within Newsgroups = 157
  Topic Modeling = 160
 Sentiment Analysis = 163
  Sentiment Analysis by Word = 164
  Sentiment Analysis by Message = 167
  N-gram Analysis = 169
 Summary = 171
Bibliography = 173
Index = 175
