Summer '16 Reading List

A list of papers closely related to my field of study.

Convolutional Neural Networks and Related Topics Notes

DSE (Design Space Exploration)

SqueezeNet

Bypass connections, sensitivity analysis (explore DSE for pruning). Adding parameters to most sensitive units.

CNN DSE: microarchitectural exploration (inside layers), macroarchitectural exploration (combination of modules), model compression.

Related methods:

Bayesian optimization: Practical Bayesian Optimization of Machine Learning Algorithms
Simulated Annealing: An Optimization Methodology for Neural Network Weights and Architectures
Randomized Search
Genetic Algorithms: Evolving Neural Networks through Augmenting Topologies
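
As a rough illustration of the simplest method in the list above, randomized search over a design space can be sketched as follows. The design space, the parameter names (`squeeze_ratio`, `filters`) and the toy scoring function are hypothetical placeholders, not taken from any of the papers; in practice `score` would train and evaluate a network.

```python
import random

def random_search(space, score, trials, seed=0):
    """Sample `trials` random configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(trials):
        # Pick one option per design parameter, independently and uniformly.
        cfg = {name: rng.choice(options) for name, options in space.items()}
        s = score(cfg)
        if s > best_score:
            best_cfg, best_score = cfg, s
    return best_cfg, best_score

# Hypothetical design space and toy objective (stand-in for validation accuracy).
space = {"squeeze_ratio": [0.125, 0.25, 0.5], "filters": [16, 32, 64]}

def toy_score(cfg):
    return -abs(cfg["squeeze_ratio"] - 0.25) - abs(cfg["filters"] - 32) / 100

best_cfg, best_score = random_search(space, toy_score, trials=50)
```

Bayesian optimization, simulated annealing and genetic algorithms replace the uniform sampling step with a smarter proposal, but the evaluate-and-keep-best loop is the same.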

Inspirations: DSD (dense-sparse-dense) training; re-densifying and retraining from a sparse model can improve accuracy.
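
A minimal sketch of the sparse step, assuming magnitude-based pruning (the helper names and the `sparsity` parameter are illustrative, and the retraining phases are omitted): the smallest-magnitude weights are masked out, the network is retrained under the mask, and the mask is then dropped so the pruned weights can be retrained from zero.

```python
def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that zeroes the `sparsity` fraction of smallest |w|."""
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:k])
    return [0 if i in pruned else 1 for i in range(len(weights))]

def apply_mask(weights, mask):
    """Sparse phase: masked weights are held at zero."""
    return [w * m for w, m in zip(weights, mask)]

weights = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2]
mask = magnitude_mask(weights, sparsity=0.5)   # [1, 0, 1, 1, 0, 0]
sparse_weights = apply_mask(weights, mask)
# Re-densify: drop the mask and continue training all weights,
# with the previously pruned ones starting from zero.
```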

Speed

Fast Algorithms

Fast Algorithms for CNN

Winograd's minimal filtering algorithm.
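
For intuition, here is a small sketch of the 1-D case F(2,3): two outputs of a 3-tap filter computed with 4 multiplications instead of 6. The function names are illustrative; in a real implementation the filter-side sums would be precomputed once per filter.

```python
def direct_f23(d, g):
    """Reference: two outputs of a 3-tap filter, 6 multiplications."""
    return [d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
            d[1] * g[0] + d[2] * g[1] + d[3] * g[2]]

def winograd_f23(d, g):
    """Winograd F(2,3): the same two outputs with 4 multiplications."""
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * ((g[0] + g[1] + g[2]) / 2)
    m3 = (d[2] - d[1]) * ((g[0] - g[1] + g[2]) / 2)
    m4 = (d[1] - d[3]) * g[2]
    return [m1 + m2 + m3, m2 - m3 - m4]

d = [1.0, 2.0, 3.0, 4.0]   # input tile of 4 values
g = [1.0, 0.0, -1.0]       # 3-tap filter
print(direct_f23(d, g), winograd_f23(d, g))  # both give [-2.0, -2.0]
```

The 2-D variant used for 3x3 convolutions nests this transform along rows and columns.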

Fast ConvNets Using Groupwise Brain Damage

1. Train with a group sparsity regularizer --> 2. Sparsify with groupwise brain damage, followed by fine-tuning. Alternative: gradual groupwise sparsification.

Result: Shrink receptive fields towards center and make them circular.
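
A minimal sketch of the two ingredients, assuming a group-lasso style penalty (sum of per-group L2 norms) and a simple norm threshold for removing groups. How the groups are formed, and the values of `lam` and `threshold`, are illustrative assumptions here rather than the paper's settings.

```python
import math

def group_l2_norms(groups):
    """L2 norm of each weight group."""
    return [math.sqrt(sum(w * w for w in g)) for g in groups]

def group_sparsity_penalty(groups, lam):
    """Group-lasso penalty: lam * sum of group norms, added to the training loss."""
    return lam * sum(group_l2_norms(groups))

def prune_groups(groups, threshold):
    """Zero out whole groups whose norm fell below `threshold`."""
    norms = group_l2_norms(groups)
    return [[0.0] * len(g) if n < threshold else list(g)
            for g, n in zip(groups, norms)]

groups = [[3.0, 4.0], [0.01, 0.02]]
penalty = group_sparsity_penalty(groups, lam=0.1)
pruned = prune_groups(groups, threshold=0.1)   # [[3.0, 4.0], [0.0, 0.0]]
```

Because the penalty acts on whole groups, entire groups are driven to zero together, which is what lets the remaining computation stay dense and fast.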

Fast Training of CNN through FFTs

Minimizing Computation in CNN

Highway Networks

A new architecture to ease gradient-based training of very deep networks.

Nonlinear transform H (an affine transform followed by a nonlinearity). Transform gate T. Carry gate C, set to C = 1 - T.

y_i = H_i(x)T_i(x) + x_i(1-T_i(x))
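
The formula can be sketched as a single layer in plain Python. The choice of tanh for H and a sigmoid for T, and the parameter names, are assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, x):
    """Compute W x + b for a list-of-rows matrix W."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def highway_layer(x, W_h, b_h, W_t, b_t):
    """y_i = H_i(x) * T_i(x) + x_i * (1 - T_i(x)); input and output sizes match."""
    h = [math.tanh(z) for z in affine(W_h, b_h, x)]   # transform H
    t = [sigmoid(z) for z in affine(W_t, b_t, x)]     # transform gate T
    return [hi * ti + xi * (1.0 - ti) for hi, ti, xi in zip(h, t, x)]

x = [1.0, -2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]
# With a strongly negative gate bias, T is close to 0 and the layer carries x through.
y = highway_layer(x, zeros, [0.0, 0.0], zeros, [-50.0, -50.0])
```

Initializing the gate bias to a negative value biases the layer toward carrying its input, which is what makes very deep stacks trainable at the start.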

Distributed/Parallel Computing

8-Bit Approximation for Parallelism in Deep Learning

8-bit data type: use bits of mantissa to represent a binary tree with interval (0.1, 1) which is bisected according to the route taken through the tree.

Hardware Accelerator

Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing

Hardware acceleration; eliminates ineffectual computation.

Propose data structure format to enable seamless elimination of most zero-operand multiplications.

Zero skipping: 1. Lane decoupling: CNV is a dynamic hardware approach where zero neurons are eliminated at the output; only nonzero neurons appear in NB_in.

2. Storing input on-the-fly in appropriate format (ZFNAf).
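
A software sketch of the idea, assuming neurons are grouped into fixed-size bricks and each nonzero value is stored together with its offset inside the brick. The brick size and function names are illustrative, and the actual hardware layout of ZFNAf is not modeled.

```python
def encode_bricks(neurons, brick_size=4):
    """Keep only nonzero neurons, each tagged with its offset inside its brick."""
    bricks = []
    for start in range(0, len(neurons), brick_size):
        brick = neurons[start:start + brick_size]
        bricks.append([(v, off) for off, v in enumerate(brick) if v != 0])
    return bricks

def sparse_dot(bricks, weights, brick_size=4):
    """Dot product that multiplies only nonzero neurons; returns (sum, multiplications)."""
    total, mults = 0, 0
    for b, brick in enumerate(bricks):
        for value, offset in brick:
            # The offset recovers which weight pairs with this neuron.
            total += value * weights[b * brick_size + offset]
            mults += 1
    return total, mults

neurons = [0, 3, 0, 0, 5, 0, 0, 2]
weights = [1, 2, 3, 4, 5, 6, 7, 8]
bricks = encode_bricks(neurons)           # [[(3, 1)], [(5, 0), (2, 3)]]
print(sparse_dot(bricks, weights))        # (47, 3): 3 multiplications instead of 8
```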

General-Purpose Code Acceleration with Limited-Precision Analog Computation

Accelerates code that can tolerate imprecise execution.

Utilizes an algorithmic transformation that automatically converts approximable regions of code from a von Neumann model to an "analog" neural model.

Result: 3.7x speedup, 6.3x energy savings, <10% quality loss.

Origami: A 803 GOp/s/W Convolutional Network Accelerator

Size

Network/Software Level

XNOR-Net: ImageNet Classification Using Binary CNNs

Binary-Weight-Networks: filters are binary, giving 32x memory saving. XNOR-Networks: both filters and inputs to conv layers are binary, giving 58x faster conv ops and 32x memory saving.

For efficient training and inference: designing compact layers, quantizing params, network binarization.

Result: accuracy is slightly worse than that of full-precision networks.
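
The two ideas can be sketched in a few lines: approximating a real-valued filter W by alpha * B with B = sign(W) and alpha = mean(|W|), and computing a dot product of two +1/-1 vectors with XNOR and a bit count. The bit packing convention (bit set means +1) and the helper names are assumptions for illustration.

```python
def binarize(weights):
    """Approximate W by alpha * B, with B = sign(W) and alpha = mean(|W|)."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs

def pack(signs):
    """Pack a +1/-1 vector into an integer: bit i is 1 when signs[i] == +1."""
    bits = 0
    for i, s in enumerate(signs):
        if s > 0:
            bits |= 1 << i
    return bits

def xnor_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n via XNOR + popcount."""
    mask = (1 << n) - 1
    agree = bin(~(a_bits ^ b_bits) & mask).count("1")  # positions with equal signs
    return 2 * agree - n

a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
print(xnor_dot(pack(a), pack(b), len(a)))  # 0, same as sum(x * y for x, y in zip(a, b))
```

The speedup comes from replacing n multiply-accumulates with one XNOR and one popcount per machine word.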

Binarized Neural Networks: Training Neural Networks with Weights and Activations Constrained to +1 or -1

Reduces memory consumption and time complexity. Uses a shift-based batch normalizing transform.

Model Compression

SqueezeNet
Network Pruning
Quantization
Deep Compression

Memory/Hardware Level

DianNao series

Size, speed, power. Targets end users with off-line learning, so only the feed-forward path is accelerated.

Components: NBin (neuron buffers), NBout, SB (synaptic weights buffer), NFU (Neural Functional Unit), Control Logic (CP). Use scratchpads for storage.

Minerva DNN Accelerator

Co-design flow across the algorithm, architecture, and circuit layers. Low-power accelerator with high accuracy.

Heterogeneous data type quantization; dynamic operation pruning; algorithm-aware fault mitigation.

Keras: software simulation. Aladdin: accelerator DSE. Fault mitigation: lowering the SRAM supply voltage --> higher bit fault rate.

Detection: Use Razor double-sampling method.

Correction: Bitmasking, replace faulty bits with the sign bit.
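
A small sketch of the correction step on a two's-complement word, assuming the detector supplies a mask of the faulty bit positions and that the sign bit itself is fault-free. The 8-bit width and the function names are illustrative.

```python
def bitmask_correct(word, fault_mask, width=8):
    """Overwrite every flagged bit with the word's sign bit."""
    full = (1 << width) - 1
    word &= full
    sign = (word >> (width - 1)) & 1
    if sign:
        return word | (fault_mask & full)   # negative: flagged bits become 1
    return word & ~fault_mask & full        # non-negative: flagged bits become 0

def to_signed(word, width=8):
    """Interpret an unsigned bit pattern as a two's-complement integer."""
    return word - (1 << width) if word & (1 << (width - 1)) else word

# True value 6 (0b00000110); bit 4 read back flipped, giving 22.
print(to_signed(bitmask_correct(0b00010110, 0b00010000)))  # 6
# True value -6 (0b11111010); bit 4 read back flipped, giving -22.
print(to_signed(bitmask_correct(0b11101010, 0b00010000)))  # -6
```

When the true bit differs from the sign bit the result is not exact, but the value is pulled toward zero, which bounds the error a single fault can introduce.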

EIE

Eyeriss

Accuracy

Adaptive Dropout

Standout networks, support "regularization by noise".
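
A single-unit sketch of the idea, assuming the keep probability is a sigmoid of a scaled and shifted copy of the unit's own pre-activation (sharing the weights `w`), and assuming a ReLU activation. `alpha`, `beta` and the function name are illustrative.

```python
import math
import random

def standout_unit(x, w, alpha=1.0, beta=0.0, train=True, rng=None):
    """Return (output, keep_prob) for one unit with input-dependent dropout."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    keep_prob = 1.0 / (1.0 + math.exp(-(alpha * z + beta)))
    act = max(0.0, z)  # ReLU, assumed for this sketch
    if train:
        rng = rng or random
        # Sample a binary mask: the unit is kept with probability keep_prob.
        mask = 1.0 if rng.random() < keep_prob else 0.0
        return mask * act, keep_prob
    # At test time, use the expected value of the mask instead of sampling.
    return keep_prob * act, keep_prob

out, p = standout_unit([1.0, 2.0], [0.5, 0.25], train=False)
```

Unlike standard dropout with a fixed rate, units with strong pre-activations are kept more often, so the noise adapts to the input.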

ResNet

Others

Visualizing and Understanding Recurrent Networks

Analyzes the performance and shortcomings of LSTMs. Sequence learning.

GRU: Gated Recurrent Unit. LSTM: use memory cells to remember long-range information and keep track of various attributes of text it is currently processing.

Deep Visual-Semantic Alignments for Generating Image Descriptions

Introduce a multimodal RNN architecture.

Bidirectional recurrent neural network. Related to another paper (Expressing an Image Stream with a Sequence of Natural Sentences) about blog image description generation.

Takeaway: use multiple saccades around the image to identify all entities, their mutual interactions, and the wider context before generating a description.

Deep Fragment Embeddings for Bidirectional Image Sentence Mapping

2 criteria: global ranking objective -> image-sentence pairs; fragment alignment objective -> learn appearance of sentence fragments.

Emergence of Object-Selective Features in Unsupervised Feature Learning

Unsupervised feature learning methods can learn high-level features from completely unlabeled data.

Points: extremely large datasets; very large number of features.

Discovery: able to discover object-selective features with no labeled data, potentially perform better than basic supervised detectors.

1. Learn selective features with vector quantization (k-means); see "The Importance of Encoding Versus Training with Sparse Coding and Vector Quantization".

2. Combine selective features into invariant features (max-pooling).

Grounded Compositional Semantics for Finding and Describing Images with Sentences

Introduce DT-RNN (dependency-tree RNN) models --> abstract better from the details of word order and syntactic expression.

Use multimodal embeddings, semantic DT-RNNs.

1. Image learning: first trained using unsupervised objective (train on randomly sampled images from web).

2. Multimodal mapping: inner products. Use ImageNet to adjust features.