
Detailed Information

Embedded deep learning : algorithms, architectures and circuits for always-on neural network processing (Borrowed 6 times)

Material Type
Monograph
Personal Authors
Moons, Bert. Bankman, Daniel. Verhelst, Marian.
Title / Statement of Responsibility
Embedded deep learning : algorithms, architectures and circuits for always-on neural network processing / Bert Moons, Daniel Bankman, Marian Verhelst.
Publication
Cham : Springer, 2018.
Physical Description
xvi, 206 p. : ill. (some col.) ; 25 cm.
ISBN
9783319992228
Bibliography Note
Includes bibliographical references and index.
General Subjects
Machine learning. Algorithms.
000 00000nam u2200205 a 4500
001 000045967061
005 20190110104822
008 190109s2018 sz a b 001 0 eng d
020 ▼a 9783319992228
040 ▼a 211009 ▼c 211009 ▼d 211009
082 0 4 ▼a 006.31 ▼2 23
084 ▼a 006.31 ▼2 DDCK
090 ▼a 006.31 ▼b M818e
100 1 ▼a Moons, Bert.
245 1 0 ▼a Embedded deep learning : ▼b algorithms, architectures and circuits for always-on neural network processing / ▼c Bert Moons, Daniel Bankman, Marian Verhelst.
260 ▼a Cham : ▼b Springer, ▼c 2018.
300 ▼a xvi, 206 p. : ▼b ill. (some col.) ; ▼c 25 cm.
504 ▼a Includes bibliographical references and index.
650 0 ▼a Machine learning.
650 0 ▼a Algorithms.
700 1 ▼a Bankman, Daniel.
700 1 ▼a Verhelst, Marian.
945 ▼a KLPA

Holdings Information

No. | Location | Call Number | Registration No. | Status | Due Date | Reservation | Service
1 | Science Library / Sci-Info (2nd-floor stacks) | 006.31 M818e | 121247463 (Borrowed 6 times) | Available for loan | - | - | -

Contents Information

Book Description

This book covers algorithmic and hardware implementation techniques that enable embedded deep learning. The authors describe synergetic design approaches at the application, algorithm, computer-architecture, and circuit levels that help reduce the computational cost of deep learning algorithms. The impact of these techniques is demonstrated in four silicon prototypes for embedded deep learning.

  • Gives a broad overview of effective solutions for energy-efficient neural networks on battery-constrained wearable devices;
  • Discusses the optimization of neural networks for embedded deployment at all levels of the design hierarchy (applications, algorithms, hardware architectures, and circuits), supported by real silicon prototypes;
  • Elaborates on how to design efficient Convolutional Neural Network processors, exploiting parallelism and data-reuse, sparse operations, and low-precision computations (see the illustrative sketch after this list);
  • Supports the introduced theory and design concepts with four real silicon prototypes, whose physical implementation and measured performance are discussed in detail to illustrate and highlight the introduced cross-layer design concepts.
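
As a concrete illustration of the low-precision point above, the following is a minimal, generic sketch of uniform fixed-point weight quantization. It is not the authors' code or the book's exact method; the 4-bit width, the per-tensor scale, and the helper name quantize_uniform are assumptions chosen only to show the basic idea of trading numerical precision for memory and compute energy.

    # Minimal sketch (assumptions: NumPy, symmetric per-tensor quantization, non-zero weights).
    import numpy as np

    def quantize_uniform(w, num_bits=4):
        """Map a float weight tensor to signed fixed-point codes with num_bits bits."""
        qmax = 2 ** (num_bits - 1) - 1              # e.g. 7 for 4-bit signed codes
        scale = np.max(np.abs(w)) / qmax            # hypothetical per-tensor scale
        q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
        return q, scale                             # integer codes plus scale for dequantization

    # Usage: 4-bit weights keep only 15 signed levels, shrinking storage and MAC cost.
    w = np.random.randn(64, 64).astype(np.float32)
    q, s = quantize_uniform(w, num_bits=4)
    w_hat = q.astype(np.float32) * s                # dequantized approximation of w
    print("max abs error:", np.abs(w - w_hat).max())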



Information provided by: Aladin

Table of Contents

Intro -- Preface -- Acknowledgments -- Contents -- Acronyms
1 Embedded Deep Neural Networks -- 1.1 Introduction -- 1.2 Machine Learning -- 1.2.1 Tasks, T -- 1.2.2 Performance Measures, P -- 1.2.3 Experience, E -- 1.2.3.1 Supervised Learning -- 1.2.3.2 Unsupervised Learning -- 1.3 Deep Learning -- 1.3.1 Deep Feed-Forward Neural Networks -- 1.3.2 Convolutional Neural Networks -- 1.3.3 Recurrent Neural Networks -- 1.3.4 Training Deep Neural Networks -- 1.3.4.1 Loss Functions -- 1.3.4.2 Backpropagation -- 1.3.4.3 Optimization -- 1.3.4.4 Data Sets -- 1.3.4.5 Regularization -- 1.3.4.6 Training Frameworks -- 1.4 Challenges for Embedded Deep Neural Networks -- 1.5 Book Contributions -- References
2 Optimized Hierarchical Cascaded Processing -- 2.1 Introduction -- 2.2 Hierarchical Cascaded Systems -- 2.2.1 Generalizing Two-Stage Wake-Up Systems -- 2.2.2 Hierarchical Cost, Precision, and Recall -- 2.2.3 A Roofline Model for Hierarchical Classifiers -- 2.2.4 Optimized Hierarchical Cascaded Sensing -- 2.3 General Proof of Concept -- 2.3.1 System Description -- 2.3.2 Input Statistics -- 2.3.3 Experiments -- 2.3.3.1 Optimal Number of Stages -- 2.3.3.2 Optimal Stage Metrics in a Hierarchy -- 2.3.4 Conclusion -- 2.4 Case Study: Hierarchical, CNN-Based Face Recognition -- 2.4.1 A Face Recognition Hierarchy -- 2.4.2 Hierarchical Cost, Precision, and Recall -- 2.4.3 An Optimized Face Recognition Hierarchy -- 2.5 Conclusion -- References
3 Hardware-Algorithm Co-optimizations -- 3.1 An Introduction to Hardware-Algorithm Co-optimization -- 3.1.1 Exploiting Network Structure -- 3.1.2 Enhancing and Exploiting Sparsity -- 3.1.3 Enhancing and Exploiting Fault-Tolerance -- 3.2 Energy Gains in Low-Precision Neural Networks -- 3.2.1 Energy Consumption of Off-Chip Memory-Access -- 3.2.2 Generic Hardware Platform Modeling -- 3.3 Test-Time Fixed-Point Neural Networks -- 3.3.1 Analysis and Experiments -- 3.3.2 Influence of Quantization on Classification Accuracy -- 3.3.2.1 Uniform Quantization and Per-Layer Rescaling -- 3.3.2.2 Per-Layer Quantization -- 3.3.3 Energy in Sparse FPNNs -- 3.3.4 Results -- 3.3.5 Discussion -- 3.4 Train-Time Quantized Neural Networks -- 3.4.1 Training QNNs -- 3.4.1.1 Train-Time Quantized Weights -- 3.4.1.2 Train-Time Quantized Activations -- 3.4.1.3 QNN Input Layers -- 3.4.1.4 Quantized Training -- 3.4.2 Energy in QNNs -- 3.4.3 Experiments -- 3.4.3.1 Benchmarks -- 3.4.3.2 QNN Topologies -- 3.4.4 Results -- 3.4.5 Discussion -- 3.5 Clustered Neural Networks -- 3.6 Conclusion -- References
4 Circuit Techniques for Approximate Computing -- 4.1 Introducing the Approximate Computing Paradigm -- 4.2 Approximate Computing Techniques -- 4.2.1 Resilience Identification and Quality Management -- 4.2.2 Approximate Circuits -- 4.2.3 Approximate Architectures -- 4.2.4 Approximate Software -- 4.2.5 Discussion -- 4.3 DVAFS: Dynamic-Voltage-Accuracy-Frequency-Scaling -- 4.3.1 DVAFS Basics -- 4.3.1.1 Introducing the DVAFS Energy-Accuracy Trade-Off -- 4.3.1.2 Precision Scaling in DVAFS -- 4.3.2 Resilience Identification for DVAFS -- 4.3.3 Energy Gains in DVAFS -- 4.3.3.1 DAS: Dynamic-Accuracy-Scaling -- 4.3.3.2 DVAS: Dynamic-Voltage-Accuracy-Scaling -- 4.3.3.3 DVAFS: Dynamic-Voltage-Accuracy-Frequency-Scaling -- 4.4 Performance Analysis of DVAFS -- 4.4.1 Block-Level DVAFS -- 4.4.2 System-Level DVAFS -- 4.5 Implementation Challenges of DVAFS -- 4.5.1 Functional Implementation of Basic DVA(F)S Building Blocks -- 4.5.1.1 DAS- and DVAS-Compatible Building Blocks -- 4.5.1.2 DVAFS-Compatible Building Blocks -- 4.5.2 Physical Implementation of DVA(F)S Building Blocks -- 4.5.2.1 Granular Supply Scaling in DVAFS -- 4.5.2.2 Enforcing Critical Path Scaling in DVAFS -- 4.6 Overview and Discussion -- References
5 ENVISION: Energy-Scalable Sparse Convolutional Neural Network Processing -- 5.1 Neural Network Acceleration -- 5.2 The Envision Processor Architecture -- 5.2.1 Processor Datapath -- 5.2.1.1 2D-MAC Array -- 5.2.1.2 Other Compute Units -- 5.2.2 On-Chip Memory Architecture -- 5.2.2.1 On-Chip Main Memory -- 5.2.2.2 Direct Memory Access Controller -- 5.2.3 Hardware Support for Exploiting Network Sparsity -- 5.2.3.1 Guarding Operations -- 5.2.3.2 Compressing IO Streams for Off-Chip Communication -- 5.2.4 Energy-Efficient Flexibility Through a Custom Instruction Set -- 5.2.5 Conclusion and Overview -- 5.3 DVAS-Compatible Envision V1 -- 5.3.1 RTL-Level Hardware Support -- 5.3.2 Physical Implementation -- 5.3.3 Measurement Results -- 5.3.3.1 Performance of the Full-Precision Baseline -- 5.3.3.2 Performance Under Dynamic Precision DVAS -- 5.3.3.3 Performance on Sparse Datastreams -- 5.3.3.4 Performance on Benchmarks -- 5.3.3.5 Comparison with the State of the Art -- 5.3.4 Envision V1 Overview -- 5.4 DVAFS-Compatible Envision V2 -- 5.4.1 RTL-Level Hardware Support -- 5.4.2 Physical Implementation -- 5.4.3 Measurement Results -- 5.4.3.1 Performance Under DVA(F)S -- 5.4.3.2 Influence of Optimal Body-Biasing -- 5.4.3.3 Performance on Sparse Datastreams -- 5.4.3.4 Performance on Benchmarks -- 5.4.3.5 Comparison with the State of the Art -- 5.4.4 Envision V2 Overview -- 5.5 Conclusion -- References
6 BINAREYE: Digital and Mixed-Signal Always-On Binary Neural Network Processing -- 6.1 Binary Neural Networks -- 6.1.1 Introduction -- 6.1.2 Binary Neural Network Layers -- 6.2 Binary Neural Network Applications -- 6.3 A Programmable Input-to-Label Accelerator Architecture -- 6.3.1 256X: A Baseline BinaryNet Architecture -- 6.3.1.1 Neuron Array, Weight Updates, and Input and Output Demuxing -- 6.3.1.2 Input Decoding -- 6.3.1.3 Dense Layers -- 6.3.1.4 System Control -- 6.3.2 SX: A Flexible DVAFS BinaryNet Architecture -- 6.4 MSBNN: A Mixed-Signal 256X Implementation -- 6.4.1 Switched-Capacitor Neuron Array -- 6.4.2 Measurement Results -- 6.4.3 Analog Signal Path Overhead -- 6.5 BinarEye: A Digital SX Implementation -- 6.5.1 An All-Digital Binary Neuron -- 6.5.2 Physical Implementation -- 6.5.3 Measurement Results -- 6.5.3.1 Benchmark Network Performance -- 6.5.3.2 Application-Level Performance -- 6.5.4 DVAFS in BinarEye -- 6.5.5 Comparison with the State of the Art -- 6.6 Comparing Digital and Analog Binary Neural Network Implementations -- 6.7 Outlook and Future Work -- 6.8 Conclusion -- References
7 Conclusions, Contributions, and Future Work -- 7.1 Conclusions -- 7.2 Suggestions for Future Work -- References
Index

New Arrivals in Related Fields

Negro, Alessandro (2026)
Dyer-Witheford, Nick (2026)
양성봉 (2025)