
    arXiv Paper Daily: Tue, 8 Nov 2016

    Published by 我爱机器学习 (52ml.net) on 2016-11-08 00:00:00

    Neural and Evolutionary Computing

    Neural Networks Designing Neural Networks: Multi-Objective Hyper-Parameter Optimization

    Sean C. Smithson, Guang Yang, Warren J. Gross, Brett H. Meyer
    Comments: To appear in ICCAD’16. The authoritative version will appear in the ACM Digital Library
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    Artificial neural networks have gone through a recent rise in popularity,
    achieving state-of-the-art results in various fields, including image
    classification, speech recognition, and automated control. Both the performance
    and computational complexity of such models are heavily dependent on the design
    of characteristic hyper-parameters (e.g., number of hidden layers, nodes per
    layer, or choice of activation functions), which have traditionally been
    optimized manually. With machine learning penetrating low-power mobile and
    embedded areas, the need to optimize not only for performance (accuracy), but
    also for implementation complexity, becomes paramount. In this work, we present
    a multi-objective design space exploration method that reduces the number of
    solution networks trained and evaluated through response surface modelling.
    Given design spaces which can easily exceed 10^20 solutions, manually designing
    a near-optimal architecture is unlikely, as opportunities to reduce network
    complexity while maintaining performance may be overlooked. This problem is
    exacerbated by the fact that hyper-parameters which perform well on specific
    datasets may yield sub-par results on others, and must therefore be designed on
    a per-application basis. In our work, machine learning is leveraged by training
    an artificial neural network to predict the performance of future candidate
    networks. The method is evaluated on the MNIST and CIFAR-10 image datasets,
    optimizing for both recognition accuracy and computational complexity.
    Experimental results demonstrate that the proposed method can closely
    approximate the Pareto-optimal front, while only exploring a small fraction of
    the design space.
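
    As a sketch of this general recipe (not the authors' code), the following
    fits a cheap least-squares response surface to a handful of expensively
    evaluated configurations, then screens a large candidate pool for the
    predicted Pareto front. Here expensive_eval, the two hyper-parameters, and
    all constants are hypothetical stand-ins for real training runs.

    import numpy as np

    rng = np.random.default_rng(0)

    def pareto_mask(objs):
        # objs: (n, k) objectives to minimize; True marks non-dominated rows.
        n = objs.shape[0]
        keep = np.ones(n, dtype=bool)
        for i in range(n):
            dominates_i = np.all(objs <= objs[i], axis=1) & np.any(objs < objs[i], axis=1)
            if dominates_i.any():
                keep[i] = False
        return keep

    # Hypothetical "expensive" evaluation: train a network with these
    # hyper-parameters and report (error, compute cost). Faked here.
    def expensive_eval(h):
        layers, width = h
        error = 1.0 / (1.0 + 0.3 * layers * np.log(width)) + 0.02 * rng.standard_normal()
        cost = layers * width**2 * 1e-4
        return np.array([error, cost])

    def features(H):
        l, w = H[:, 0], H[:, 1]
        return np.stack([np.ones_like(l), l, w, l * w, l**2, w**2], axis=1)

    # 1) Evaluate a small seed set expensively.
    seed = np.stack([rng.integers(1, 8, 16), rng.integers(16, 256, 16)], axis=1).astype(float)
    seed_objs = np.stack([expensive_eval(h) for h in seed])

    # 2) Fit a cheap response-surface model (least squares per objective).
    W, *_ = np.linalg.lstsq(features(seed), seed_objs, rcond=None)

    # 3) Screen a large candidate pool; only the predicted front gets trained.
    pool = np.stack([rng.integers(1, 8, 5000), rng.integers(16, 256, 5000)], axis=1).astype(float)
    pred = features(pool) @ W
    front = pool[pareto_mask(pred)]
    print(f"{len(front)} candidates selected for real training out of {len(pool)}")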

    Sigma Delta Quantized Networks

    Peter O'Connor, Max Welling
    Comments: 9 Pages + 1 Reference + 3 Appendix, 5 figures
    Subjects: Neural and Evolutionary Computing (cs.NE)

    Deep neural networks can be obscenely wasteful. When processing video, a
    convolutional network expends a fixed amount of computation for each frame with
    no regard to the similarity between neighbouring frames. As a result, it ends
    up repeatedly doing very similar computations. To put an end to such waste, we
    introduce Sigma-Delta networks. With each new input, each layer in this network
    sends a discretized form of its change in activation to the next layer. Thus
    the amount of computation that the network does scales with the amount of
    change in the input and layer activations, rather than the size of the network.
    We introduce an optimization method for converting any pre-trained deep network
    into an optimally efficient Sigma-Delta network, and show that our algorithm,
    if run on the appropriate hardware, could cut at least an order of magnitude
    from the computational cost of processing video data.
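
    A minimal numpy sketch of the core idea, assuming (as the abstract
    describes) that each layer transmits a discretized form of the change in
    its activation and the receiver integrates those messages; the quantization
    scheme and class layout here are illustrative, not the paper's.

    import numpy as np

    class SigmaDeltaLayer:
        # Toy sigma-delta wrapper: instead of resending its full activation
        # every frame, the layer transmits the quantized *change* since the
        # last transmission, and the receiving side integrates the messages.
        def __init__(self, weight, step=0.05):
            self.weight = weight
            self.step = step    # quantization step: smaller = more messages, less error
            self.accum = None   # receiver-side running reconstruction

        def __call__(self, x):
            a = np.maximum(x @ self.weight, 0.0)          # ReLU activation
            if self.accum is None:
                self.accum = np.zeros_like(a)
            delta = a - self.accum                        # change since last frame
            q = self.step * np.round(delta / self.step)   # discretized change message
            self.accum += q                               # integrate at the receiver
            return self.accum, int(np.count_nonzero(q))   # output + messages sent

    rng = np.random.default_rng(1)
    layer = SigmaDeltaLayer(rng.standard_normal((64, 32)) / 8)
    frame = rng.standard_normal(64)
    _, n1 = layer(frame)                                   # first frame: many messages
    _, n2 = layer(frame + 0.01 * rng.standard_normal(64))  # nearly identical frame
    print(n1, n2)   # computation tracks the amount of change, not network size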

    A Differentiable Physics Engine for Deep Learning in Robotics

    Jonas Degrave, Michiel Hermans, Joni Dambre, Francis wyffels
    Comments: International Conference on Learning Representations 2017
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Robotics (cs.RO)

    One of the most important fields in robotics is the optimization of
    controllers. Currently, robots are treated as a black box in this optimization
    process, which is the reason why derivative-free optimization methods such as
    evolutionary algorithms or reinforcement learning are omnipresent. We propose
    an implementation of a modern physics engine, which has the ability to
    differentiate control parameters. This has been implemented on both CPU and
    GPU. We show how this speeds up the optimization process, even for small
    problems, and why it will scale to bigger problems. We explain why this is an
    alternative approach to deep Q-learning for applying deep learning in robotics.
    Lastly, we argue that this is a big step for deep learning in robotics, as it
    opens up new possibilities to optimize robots, both in hardware and software.
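
    The authors built their own differentiable engine; as a toy stand-in for
    the underlying idea, the sketch below uses JAX autodiff through a
    hand-written explicit-Euler rollout of a 1-D point mass, and descends the
    gradient of the final-position error with respect to a controller gain.
    The dynamics, controller, and constants are all illustrative.

    import jax
    import jax.numpy as jnp

    def rollout_loss(gain, target=1.0, dt=0.01, steps=200):
        # Simulate a 1-D point mass driven by a proportional controller and
        # return the squared distance to the target at the end of the rollout.
        # Every step is a differentiable jnp operation, so gradients flow
        # through the whole simulation.
        def step(state, _):
            pos, vel = state
            force = gain * (target - pos)     # toy controller, one parameter
            vel = vel + dt * force            # explicit Euler integration
            pos = pos + dt * vel
            return (pos, vel), None
        (pos, _), _ = jax.lax.scan(step, (jnp.zeros(()), jnp.zeros(())), None, length=steps)
        return (pos - target) ** 2

    grad_fn = jax.grad(rollout_loss)
    gain = 1.0
    for _ in range(100):                      # plain gradient descent on the gain
        gain = gain - 0.5 * grad_fn(gain)
    print(gain, rollout_loss(gain))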

    Loss-aware Binarization of Deep Networks

    Lu Hou, Quanming Yao, James T. Kwok
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    Deep neural network models, though very powerful and highly successful, are
    computationally expensive in terms of space and time. Recently, there have been
    a number of attempts on binarizing the network weights and activations. This
    greatly reduces the network size, and replaces the underlying multiplications
    with additions or even XNOR bit operations. However, existing binarization
    schemes are based on simple matrix approximation and ignore the effect of
    binarization on the loss. In this paper, we propose a proximal Newton algorithm
    with diagonal Hessian approximation that directly minimizes the loss w.r.t. the
    binarized weights. The underlying proximal step has an efficient closed-form
    solution, and the second-order information can be efficiently obtained from the
    second moments already computed by the Adam optimizer. Experiments on both
    feedforward and recurrent networks show that the proposed loss-aware
    binarization algorithm outperforms existing binarization schemes, and is also
    more robust for wide and deep networks.
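
    A numpy sketch of a proximal step with the flavor described, under the
    assumption that it minimizes the curvature-weighted quadratic distance
    (w - alpha*b)' diag(d) (w - alpha*b) over b in {-1, +1}^n and alpha > 0,
    with d taken from Adam's second-moment accumulator; names are illustrative.

    import numpy as np

    def loss_aware_binarize(w, d, eps=1e-8):
        # w: real-valued weights of one layer (any shape)
        # d: diagonal Hessian estimate, e.g. Adam's second moments (same shape)
        # Returns alpha * sign(w), where alpha is the curvature-weighted
        # mean of |w|; this is the closed-form minimizer of the weighted
        # quadratic above.
        d = np.maximum(d, eps)
        alpha = np.sum(d * np.abs(w)) / np.sum(d)
        return alpha * np.sign(w)

    rng = np.random.default_rng(0)
    w = rng.standard_normal((256, 128)) * 0.05
    v = rng.uniform(0.0, 1.0, w.shape)       # stand-in for Adam's 2nd moments
    wb = loss_aware_binarize(w, v)
    print(np.unique(np.round(wb, 6)).size)   # two values: +/- alpha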

    Alternating Direction Method of Multipliers for Sparse Convolutional Neural Networks

    Farkhondeh Kiaee, Christian Gagné, Mahdieh Abbasi
    Comments: Under review as a conference paper at ICLR 2017
    Subjects: Neural and Evolutionary Computing (cs.NE)

    The storage and computation requirements of Convolutional Neural Networks
    (CNNs) can be prohibitive for exploiting these models over low-power or
    embedded devices. This paper reduces the computational complexity of the CNNs
    by minimizing an objective function, including the recognition loss that is
    augmented with a sparsity-promoting penalty term. The sparsity structure of the
    network is identified using the Alternating Direction Method of Multipliers
    (ADMM), which is widely used in large optimization problems. This method
    alternates between promoting the sparsity of the network and optimizing the
    recognition performance, which allows us to exploit the two-part structure of
    the corresponding objective functions. In particular, we take advantage of the
    separability of the sparsity-inducing penalty functions to decompose the
    minimization problem into sub-problems that can be solved sequentially.
    Applied to a variety of state-of-the-art CNN models, our proposed method
    simplifies the original model, generating models with less computation and
    fewer parameters, while maintaining and often improving generalization
    performance. Results on a variety of models strongly suggest that our
    proposed ADMM-based method can be a very useful tool for simplifying and
    improving deep CNNs.
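
    Not the authors' formulation, but the classic ADMM split on a toy sparse
    least-squares problem shows the alternation the abstract describes: a
    smooth-loss update, a separable soft-thresholding update for the l1 term,
    and a dual update tying the two together.

    import numpy as np

    def soft_threshold(x, t):
        return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

    def admm_sparse_least_squares(A, y, lam=0.1, rho=1.0, iters=200):
        # ADMM for min_w 0.5*||Aw - y||^2 + lam*||w||_1 via the split w = z.
        n = A.shape[1]
        w = np.zeros(n); z = np.zeros(n); u = np.zeros(n)   # u: scaled dual
        AtA = A.T @ A; Aty = A.T @ y
        L = np.linalg.cholesky(AtA + rho * np.eye(n))       # factor once
        for _ in range(iters):
            # w-update: smooth loss plus the quadratic coupling term
            rhs = Aty + rho * (z - u)
            w = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
            # z-update: proximal step on the separable l1 penalty
            z = soft_threshold(w + u, lam / rho)
            # dual update
            u += w - z
        return z

    rng = np.random.default_rng(0)
    A = rng.standard_normal((100, 50))
    true_w = np.zeros(50); true_w[:5] = rng.standard_normal(5)
    y = A @ true_w + 0.01 * rng.standard_normal(100)
    w_hat = admm_sparse_least_squares(A, y, lam=0.5)
    print(np.count_nonzero(np.abs(w_hat) > 1e-6))   # recovers a sparse solution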

    Quasi-Recurrent Neural Networks

    James Bradbury, Stephen Merity, Caiming Xiong, Richard Socher
    Comments: Submitted to conference track at ICLR 2017
    Subjects: Neural and Evolutionary Computing (cs.NE); Computation and Language (cs.CL); Learning (cs.LG)

    Recurrent neural networks are a powerful tool for modeling sequential data,
    but the dependence of each timestep’s computation on the previous timestep’s
    output limits parallelism and makes RNNs unwieldy for very long sequences. We
    introduce quasi-recurrent neural networks (QRNNs), an approach to neural
    sequence modeling that alternates convolutional layers, which apply in parallel
    across timesteps, and a minimalist recurrent pooling function that applies in
    parallel across channels. Despite lacking trainable recurrent layers, stacked
    QRNNs have better predictive accuracy than stacked LSTMs of the same hidden
    size. Due to their increased parallelism, they are up to 16 times faster at
    train and test time. Experiments on language modeling, sentiment
    classification, and character-level neural machine translation demonstrate
    these advantages and underline the viability of QRNNs as a basic building block
    for a variety of sequence tasks.
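
    A minimal numpy sketch of the QRNN recipe as described: candidate, forget
    and output gates are computed by a width-k causal convolution in parallel
    across timesteps, leaving only a cheap elementwise recurrence (the paper's
    "fo-pooling") across time. Shapes and initialization are illustrative.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def qrnn_layer(x, Wz, Wf, Wo, k=2):
        # x: (T, d_in). The convolutional part is fully parallel over T;
        # only the elementwise fo-pooling loop is sequential.
        T, d_in = x.shape
        pad = np.vstack([np.zeros((k - 1, d_in)), x])                 # causal padding
        windows = np.stack([pad[t:t + k].ravel() for t in range(T)])  # (T, k*d_in)
        z = np.tanh(windows @ Wz)       # candidates, computed in parallel
        f = sigmoid(windows @ Wf)       # forget gates, computed in parallel
        o = sigmoid(windows @ Wo)       # output gates, computed in parallel
        c = np.zeros(z.shape[1]); h = np.empty_like(z)
        for t in range(T):              # cheap elementwise recurrence
            c = f[t] * c + (1.0 - f[t]) * z[t]   # fo-pooling
            h[t] = o[t] * c
        return h

    rng = np.random.default_rng(0)
    T, d_in, d_h, k = 10, 8, 16, 2
    W = lambda: rng.standard_normal((k * d_in, d_h)) * 0.1
    print(qrnn_layer(rng.standard_normal((T, d_in)), W(), W(), W()).shape)  # (10, 16)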

    Memory-augmented Attention Modelling for Videos

    Rasool Fakoor, Abdel-rahman Mohamed, Margaret Mitchell, Sing Bing Kang, Pushmeet Kohli
    Comments: ICLR 2017 submission
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Recent works on neural architectures have demonstrated the utility of
    attention mechanisms for a wide variety of tasks. Attention models used for
    problems such as image captioning typically depend on the image under
    consideration, as well as the previous sequence of words that come before the
    word currently being generated. While these types of models have produced
    impressive results, they are not able to model the higher-order interactions
    involved in problems such as video description/captioning, where the
    relationship between parts of the video and the concepts being depicted is
    complex. Motivated by these observations, we propose a novel memory-based
    attention model for video description. Our model utilizes memories of past
    attention when reasoning about where to attend to in the current time step,
    similar to the central executive system proposed in human cognition. This
    allows the model not only to reason about local attention more effectively,
    but also to consider the entire sequence of video frames while generating
    each word. Evaluations on the challenging and popular MSVD and Charades
    datasets show that the proposed architecture outperforms all previously
    proposed methods and achieves new state-of-the-art results in video
    description.

    Low-effort place recognition with WiFi fingerprints using deep learning

    Michał Nowicki, Jan Wietrzykowski
    Subjects: Robotics (cs.RO); Neural and Evolutionary Computing (cs.NE)

    Using WiFi signals for indoor localization is the main localization modality
    of the existing personal indoor localization systems operating on mobile
    devices. WiFi fingerprinting is also used for mobile robots, as WiFi signals
    are usually available indoors and can provide a rough initial position estimate
    or can be used together with other positioning systems. Currently, the best
    solutions rely on filtering, manual data analysis, and time-consuming parameter
    tuning to achieve reliable and accurate localization. In this work, we propose
    to use deep neural networks to significantly lower the work-force burden of the
    localization system design, while still achieving satisfactory results.
    Assuming the state-of-the-art hierarchical approach, we employ the DNN system
    for building/floor classification. We show that stacked autoencoders allow us
    to efficiently reduce the feature space in order to achieve robust and precise
    classification. The proposed architecture is verified on the publicly available
    UJIIndoorLoc dataset and the results are compared with other solutions.

    Regularizing CNNs with Locally Constrained Decorrelations

    Pau Rodríguez, Jordi Gonzàlez, Guillem Cucurull, Josep M. Gonfaus, Xavier Roca
    Comments: Submitted to ICLR2017 conference
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Regularization is key for deep learning since it allows training more complex
    models while keeping lower levels of overfitting. However, the most prevalent
    regularizations do not leverage all the capacity of the models since they rely
    on reducing the effective number of parameters. Feature decorrelation is an
    alternative for using the full capacity of the models but the overfitting
    reduction margins are too narrow given the overhead it introduces. In this
    paper, we show that regularizing negatively correlated features is an obstacle
    for effective decorrelation and present OrthoReg, a novel regularization
    technique that locally enforces feature orthogonality. As a result, imposing
    locality constraints in feature decorrelation removes interferences between
    negatively correlated features, allowing the regularizer to reach higher
    decorrelation bounds, and reducing the overfitting more effectively. In
    particular, we show that the models regularized with OrthoReg have higher
    accuracy bounds even when batch normalization and dropout are present. Moreover,
    since our regularization is directly performed on the weights, it is especially
    suitable for fully convolutional neural networks, where the weight space is
    constant compared to the feature map space. As a result, we are able to reduce
    the overfitting of state-of-the-art CNNs on CIFAR-10, CIFAR-100, and SVHN.
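
    A sketch in the spirit of the description above (the exact OrthoReg
    squashing differs): a weight-level decorrelation penalty that, as a
    locality constraint, pushes apart only positively correlated filter pairs
    and leaves negatively correlated ones alone.

    import numpy as np

    def local_decorrelation_penalty(W, eps=1e-8):
        # W: (n_filters, fan_in) weight matrix of one layer. Computed on the
        # weights directly, as the abstract notes, so it is independent of the
        # feature-map size.
        Wn = W / (np.linalg.norm(W, axis=1, keepdims=True) + eps)
        C = Wn @ Wn.T                   # pairwise cosine similarities
        np.fill_diagonal(C, 0.0)
        C = np.maximum(C, 0.0)          # locality: ignore negative correlations
        return 0.5 * np.sum(C ** 2)     # add lambda * penalty to the loss

    rng = np.random.default_rng(0)
    W = rng.standard_normal((32, 64))
    Q, _ = np.linalg.qr(W.T)                 # orthonormal columns
    print(local_decorrelation_penalty(W))    # > 0 for random filters
    print(local_decorrelation_penalty(Q.T))  # ~ 0 for orthogonal filters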

    DeepSense: A Unified Deep Learning Framework for Time-Series Mobile Sensing Data Processing

    Shuochao Yao, Shaohan Hu, Yiran Zhao, Aston Zhang, Tarek Abdelzaher
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Networking and Internet Architecture (cs.NI)

    Mobile sensing applications usually require time-series inputs from sensors.
    Some applications, such as tracking, can use sensed acceleration and rate of
    rotation to calculate displacement based on physical system models. Other
    applications, such as activity recognition, extract manually designed features
    from sensor inputs for classification. Such applications face two challenges.
    On one hand, on-device sensor measurements are noisy. For many mobile
    applications, it is hard to find a distribution that exactly describes the
    noise in practice. Unfortunately, calculating target quantities based on
    physical system and noise models is only as accurate as the noise assumptions.
    Similarly, in classification applications, although manually designed features
    have proven to be effective, it is not always straightforward to find the most
    robust features to accommodate diverse sensor noise patterns and user
    behaviors. To this end, we propose DeepSense, a deep learning framework that
    directly addresses the aforementioned noise and feature customization
    challenges in a unified manner. DeepSense integrates convolutional and
    recurrent neural networks to exploit local interactions among similar mobile
    sensors, merge local interactions of different sensory modalities into global
    interactions, and extract temporal relationships to model signal dynamics.
    DeepSense thus provides a general signal estimation and classification
    framework that accommodates a wide range of applications. We demonstrate the
    effectiveness of DeepSense using three representative and challenging tasks:
    car tracking with motion sensors, heterogeneous human activity recognition, and
    user identification with biometric motion analysis. DeepSense significantly
    outperforms the state-of-the-art methods for all three tasks. In addition,
    DeepSense is feasible to implement on smartphones due to its moderate energy
    consumption and low latency.

    Learning to Perform Physics Experiments via Deep Reinforcement Learning

    Misha Denil, Pulkit Agrawal, Tejas D Kulkarni, Tom Erez, Peter Battaglia, Nando de Freitas
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Physics and Society (physics.soc-ph)

    When encountering novel objects, humans are able to infer a wide range of
    physical properties such as mass, friction and deformability by interacting
    with them in a goal-driven way. This process of active interaction is in the
    same spirit as a scientist performing an experiment to discover hidden facts.
    Recent advances in artificial intelligence have yielded machines that can
    achieve superhuman performance in Go, Atari, natural language processing, and
    complex control problems, but it is not clear that these systems can rival the
    scientific intuition of even a young child. In this work we introduce a basic
    set of tasks that require agents to estimate hidden properties such as mass and
    cohesion of objects in an interactive simulated environment where they can
    manipulate the objects and observe the consequences. We found that state-of-the-art
    deep reinforcement learning methods can learn to perform the experiments
    necessary to discover such hidden properties. By systematically manipulating
    the problem difficulty and the cost incurred by the agent for performing
    experiments, we found that agents learn different strategies that balance the
    cost of gathering information against the cost of making mistakes in different
    situations.

    Modular Multitask Reinforcement Learning with Policy Sketches

    Jacob Andreas, Dan Klein, Sergey Levine
    Comments: Submitted to ICLR
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We describe a framework for multitask deep reinforcement learning guided by
    policy sketches. Sketches annotate each task with a sequence of named subtasks,
    providing high-level structural relationships among tasks, but not providing
    the detailed guidance required by previous work on learning policy abstractions
    for RL (e.g. intermediate rewards, subtask completion signals, or intrinsic
    motivations). Our approach associates every subtask with its own modular
    subpolicy, and jointly optimizes over full task-specific policies by tying
    parameters across shared subpolicies. This optimization is accomplished via a
    simple decoupled actor-critic training objective that facilitates learning
    common behaviors from dissimilar reward functions. We evaluate the
    effectiveness of our approach on a maze navigation game and a 2-D
    Minecraft-inspired crafting game. Both games feature extremely sparse rewards
    that can be obtained only after completing a number of high-level subgoals
    (e.g. escaping from a sequence of locked rooms or collecting and combining
    various ingredients in the proper order). Experiments illustrate two main
    advantages of our approach. First, we outperform standard baselines that learn
    task-specific or shared monolithic policies. Second, our method naturally
    induces a library of primitive behaviors that can be recombined to rapidly
    acquire policies for new tasks.

    Deep Biaffine Attention for Neural Dependency Parsing

    Timothy Dozat, Christopher D. Manning
    Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

    While deep learning parsing approaches have proven very successful at finding
    the structure of sentences, most neural dependency parsers use neural networks
    only for feature extraction, and then use those features in traditional parsing
    algorithms. In contrast, this paper builds off recent work using
    general-purpose neural network components, training an attention mechanism over
    an LSTM to attend to the head of the phrase. We get state-of-the-art results
    for standard dependency parsing benchmarks, achieving 95.44% UAS and 93.76% LAS
    on the PTB dataset, 0.8% and 1.0% improvement, respectively, over Andor et al.
    (2016). In addition to proposing a new parsing architecture using
    dimensionality reduction and biaffine interactions, we examine simple
    hyperparameter choices that had a profound influence on the model’s
    performance, such as reducing the value of beta2 in the Adam optimization
    algorithm.
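
    A numpy sketch of biaffine arc scoring under the shapes described: a
    bilinear term between reduced dependent/head representations plus a
    head-prior bias. The MLP reductions and the parser proper are omitted.

    import numpy as np

    def biaffine_arc_scores(H_dep, H_head, U, u):
        # H_dep, H_head: (n, d) token vectors after the two small networks
        # that reduce the LSTM states (the paper's dimensionality reduction).
        # S[i, j] = score of token j being the head of token i.
        S = H_dep @ U @ H_head.T          # bilinear (biaffine) term, (n, n)
        S = S + (H_head @ u)[None, :]     # bias term: a prior over heads j
        return S

    rng = np.random.default_rng(0)
    n, d = 7, 32
    S = biaffine_arc_scores(rng.standard_normal((n, d)), rng.standard_normal((n, d)),
                            rng.standard_normal((d, d)), rng.standard_normal(d))
    print(S.argmax(axis=1))               # greedy head prediction per token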

    Comparing learning algorithms in neural network for diagnosing cardiovascular disease

    Mirmorsal Madani
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Today, data mining techniques are exploited in medical science for diagnosing,
    overcoming and treating diseases. The neural network is one of the techniques
    widely used for diagnosis in the medical field. In this article, the
    efficiency of nine algorithms that form the basis of neural network learning
    in diagnosing cardiovascular disease is assessed. The algorithms are assessed
    in terms of accuracy, sensitivity, transparency, AROC and convergence rate by
    means of 10-fold cross-validation. The results suggest that in the training
    phase the Levenberg-Marquardt algorithm has the best efficiency in terms of
    all metrics, the OSS algorithm has maximum accuracy in the testing phase, the
    SCG algorithm has the maximum transparency, and the CGB algorithm has the
    maximum sensitivity.

    Generative Multi-Adversarial Networks

    Ishan Durugkar, Ian Gemp, Sridhar Mahadevan
    Comments: Submitted as a conference paper at ICLR 2017
    Subjects: Learning (cs.LG); Multiagent Systems (cs.MA); Neural and Evolutionary Computing (cs.NE)

    Generative adversarial networks (GANs) are a framework for producing a
    generative model by way of a two-player minimax game. In this paper, we propose
    the Generative Multi-Adversarial Network (GMAN), a framework that
    extends GANs to multiple discriminators. In previous work, the successful
    training of GANs requires modifying the minimax objective to accelerate
    training early on. In contrast, GMAN can be reliably trained with the original,
    untampered objective. We explore a number of design perspectives with the
    discriminator role ranging from formidable adversary to forgiving teacher.
    Image generation tasks comparing the proposed framework to standard GANs
    demonstrate GMAN produces higher quality samples in a fraction of the
    iterations when measured by a pairwise GAM-type metric.

    Representation of uncertainty in deep neural networks through sampling

    Patrick McClure, Nikolaus Kriegeskorte
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)

    As deep neural networks (DNNs) are applied to increasingly challenging
    problems, they will need to be able to represent their own uncertainty.
    Modeling uncertainty is one of the key features of Bayesian methods. Scalable
    Bayesian DNNs that use dropout-based variational distributions have recently
    been proposed. Here we evaluate the ability of Bayesian DNNs trained with
    Bernoulli or Gaussian distributions over units (dropout) or weights
    (dropconnect) to represent their own uncertainty at the time of inference
    through sampling. We tested how well Bayesian fully connected and convolutional
    DNNs represented their own uncertainty in classifying the MNIST handwritten
    digits. By adding different levels of Gaussian noise to the test images, we
    assessed how DNNs represented their uncertainty about regions of input space
    not covered by the training set. Bayesian DNNs estimated their own uncertainty
    more accurately than traditional DNNs with a softmax output. These results are
    important for building better deep learning systems and for investigating the
    hypothesis that biological neural networks use sampling to represent
    uncertainty.
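
    A minimal numpy sketch of the sampling procedure described for the
    Bernoulli/dropout case: keep dropout stochastic at inference, average the
    sampled softmax outputs, and read uncertainty off the predictive entropy.
    The weights and architecture here are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)

    def mc_dropout_predict(x, W1, W2, p=0.5, n_samples=100):
        # Monte-Carlo dropout at *test* time: sample a fresh Bernoulli mask
        # per forward pass and aggregate the resulting class probabilities.
        probs = []
        for _ in range(n_samples):
            mask = rng.random(W1.shape[1]) > p             # Bernoulli mask
            h = np.maximum(x @ W1, 0.0) * mask / (1 - p)   # scaled hidden units
            logits = h @ W2
            e = np.exp(logits - logits.max())
            probs.append(e / e.sum())
        probs = np.array(probs)
        mean = probs.mean(axis=0)
        entropy = -np.sum(mean * np.log(mean + 1e-12))     # predictive uncertainty
        return mean, entropy

    W1 = rng.standard_normal((20, 64)) * 0.3
    W2 = rng.standard_normal((64, 10)) * 0.3
    x_clean = rng.standard_normal(20)
    x_noisy = x_clean + 3.0 * rng.standard_normal(20)      # far from "training" data
    for x in (x_clean, x_noisy):
        _, H = mc_dropout_predict(x, W1, W2)
        print(H)   # compare predictive entropies across inputs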

    Neural Architecture Search with Reinforcement Learning

    Barret Zoph, Quoc V. Le
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    Neural networks are powerful and flexible models that work well for many
    difficult learning tasks in image, speech and natural language understanding.
    Despite their success, neural networks are still hard to design. In this paper,
    we use a recurrent network to generate the model descriptions of neural
    networks and train this RNN with reinforcement learning to maximize the
    expected accuracy of the generated architectures on a validation set. On the
    CIFAR-10 dataset, our method, starting from scratch, can design a novel network
    architecture that rivals the best human-invented architecture in terms of test
    set accuracy. Our CIFAR-10 model achieves a test error rate of 3.84%, only 0.1
    percent worse than the current state-of-the-art model while being 1.2x faster.
    On the Penn Treebank dataset, our model can compose a novel recurrent cell that
    outperforms the widely-used LSTM cell, and other state-of-the-art baselines.
    Our cell achieves a test set perplexity of 62.4 on the Penn Treebank, which is
    3.6 perplexity better than the previous state-of-the-art.
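
    A toy REINFORCE sketch of the controller idea, with a single-decision
    "architecture" (a layer width) and a faked validation accuracy standing in
    for the expensive train-and-evaluate loop; the real controller is an RNN
    emitting many such decisions per architecture.

    import numpy as np

    rng = np.random.default_rng(0)

    # The "controller" is a categorical distribution over a few layer widths;
    # the fake reward peaks around width 64 (log2(64) = 6).
    widths = np.array([16, 32, 64, 128, 256])
    logits = np.zeros(len(widths))

    def fake_validation_accuracy(width):
        return np.exp(-((np.log2(width) - 6.0) ** 2)) + 0.02 * rng.standard_normal()

    baseline, lr = 0.0, 0.5
    for step in range(300):
        p = np.exp(logits - logits.max()); p /= p.sum()   # softmax policy
        a = rng.choice(len(widths), p=p)                  # sample an architecture
        r = fake_validation_accuracy(widths[a])           # train + validate (faked)
        baseline = 0.9 * baseline + 0.1 * r               # variance-reduction baseline
        grad = -p; grad[a] += 1.0                         # grad of log pi(a)
        logits += lr * (r - baseline) * grad              # REINFORCE update
    print(widths[np.argmax(logits)])                      # expect 64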


    Computer Vision and Pattern Recognition

    Memory-augmented Attention Modelling for Videos

    Rasool Fakoor, Abdel-rahman Mohamed, Margaret Mitchell, Sing Bing Kang, Pushmeet Kohli
    Comments: ICLR 2017 submission
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Recent works on neural architectures have demonstrated the utility of
    attention mechanisms for a wide variety of tasks. Attention models used for
    problems such as image captioning typically depend on the image under
    consideration, as well as the previous sequence of words that come before the
    word currently being generated. While these types of models have produced
    impressive results, they are not able to model the higher-order interactions
    involved in problems such as video description/captioning, where the
    relationship between parts of the video and the concepts being depicted is
    complex. Motivated by these observations, we propose a novel memory-based
    attention model for video description. Our model utilizes memories of past
    attention when reasoning about where to attend to in the current time step,
    similar to the central executive system proposed in human cognition. This
    allows the model not only to reason about local attention more effectively,
    but also to consider the entire sequence of video frames while generating
    each word. Evaluations on the challenging and popular MSVD and Charades
    datasets show that the proposed architecture outperforms all previously
    proposed methods and achieves new state-of-the-art results in video
    description.

    Meat adulteration detection through digital image analysis of histological cuts using LBP

    João J. de Macedo Neto, Jefersson A. dos Santos, William Robson Schwartz
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Food fraud has been an area of great concern due to its risk to public
    health, reduction of food quality or nutritional value and for its economic
    consequences. For this reason, it has been the object of regulation in many
    countries (e.g. [1], [2]). One type of food that has frequently been the
    object of fraud through the addition of water or an aqueous solution is
    bovine meat. The
    traditional methods used to detect this kind of fraud are expensive,
    time-consuming and depend on physicochemical analysis that require complex
    laboratory techniques, specific for each added substance. In this paper, based
    on digital images of histological cuts of adulterated and non-adulterated
    (normal) bovine meat, we evaluate the ability of digital image analysis
    methods to identify the aforementioned kind of fraud, with a focus on the
    Local Binary Pattern (LBP) algorithm.

    Unsupervised Cross-Domain Image Generation

    Yaniv Taigman, Adam Polyak, Lior Wolf
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We study the problem of transferring a sample in one domain to an analog
    sample in another domain. Given two related domains, S and T, we would like to
    learn a generative function G that maps an input sample from S to the domain T,
    such that the output of a given function f, which accepts inputs in either
    domain, would remain unchanged. Other than the function f, the training data
    is unsupervised and consists of a set of samples from each domain. The Domain
    Transfer Network (DTN) we present employs a compound loss function that
    includes a multiclass GAN loss, an f-constancy component, and a regularizing
    component that encourages G to map samples from T to themselves. We apply our
    method to visual domains including digits and face images and demonstrate its
    ability to generate convincing novel images of previously unseen entities,
    while preserving their identity.

    Parse Geometry from a Line: Monocular Depth Estimation with Partial Laser Observation

    Yiyi Liao, Lichao Huang, Yue Wang, Sarath Kodagoda, Yinan Yu, Yong Liu
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

    Many standard robotic platforms are equipped with at least a fixed 2D laser
    range finder and a monocular camera. Although those platforms do not have
    sensors with 3D depth sensing capability, knowledge of depth is an essential
    part of many robotics activities. Therefore, there has recently been
    increasing interest in depth estimation using monocular images. As this task
    is inherently
    ambiguous, the data-driven estimated depth might be unreliable in robotics
    applications. In this paper, we have attempted to improve the precision of
    monocular depth estimation by introducing 2D planar observation from the
    remaining laser range finder without extra cost. Specifically, we construct a
    dense reference map from the sparse laser range data, redefining the depth
    estimation task as estimating the distance between the real and the reference
    depth. To solve the problem, we construct a novel residual of residual neural
    network, and tightly combine the classification and regression losses for
    continuous depth estimation. Experimental results suggest that our method
    achieves considerable improvement over the state-of-the-art methods on
    both NYUD2 and KITTI, validating the effectiveness of our method in leveraging
    the additional sensory information. We further demonstrate the potential usage
    of our method in obstacle avoidance where our methodology provides
    comprehensive depth information compared to the solution using monocular camera
    or 2D laser range finder alone.

    Spatiotemporal Residual Networks for Video Action Recognition

    Christoph Feichtenhofer, Axel Pinz, Richard P. Wildes
    Comments: NIPS 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Two-stream Convolutional Networks (ConvNets) have shown strong performance
    for human action recognition in videos. Recently, Residual Networks (ResNets)
    have arisen as a new technique to train extremely deep architectures. In this
    paper, we introduce spatiotemporal ResNets as a combination of these two
    approaches. Our novel architecture generalizes ResNets for the spatiotemporal
    domain by introducing residual connections in two ways. First, we inject
    residual connections between the appearance and motion pathways of a two-stream
    architecture to allow spatiotemporal interaction between the two streams.
    Second, we transform pretrained image ConvNets into spatiotemporal networks by
    equipping these with learnable convolutional filters that are initialized as
    temporal residual connections and operate on adjacent feature maps in time.
    This approach slowly increases the spatiotemporal receptive field as the depth
    of the model increases and naturally integrates image ConvNet design
    principles. The whole model is trained end-to-end to allow hierarchical
    learning of complex spatiotemporal features. We evaluate our novel
    spatiotemporal ResNet using two widely used action recognition benchmarks where
    it exceeds the previous state-of-the-art.

    Crowdsourcing in Computer Vision

    Adriana Kovashka, Olga Russakovsky, Li Fei-Fei, Kristen Grauman
    Comments: A 69-page meta review of the field, Foundations and Trends in Computer Graphics and Vision, 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)

    Computer vision systems require large amounts of manually annotated data to
    properly learn challenging visual concepts. Crowdsourcing platforms offer an
    inexpensive method to capture human knowledge and understanding, for a vast
    number of visual perception tasks. In this survey, we describe the types of
    annotations computer vision researchers have collected using crowdsourcing, and
    how they have ensured that this data is of high quality while annotation effort
    is minimized. We begin by discussing data collection on both classic (e.g.,
    object recognition) and recent (e.g., visual story-telling) vision tasks. We
    then summarize key design decisions for creating effective data collection
    interfaces and workflows, and present strategies for intelligently selecting
    the most important data instances to annotate. Finally, we conclude with some
    thoughts on the future of crowdsourcing in computer vision.

    Texture and Color-based Image Retrieval Using the Local Extrema Features and Riemannian Distance

    Minh-Tan Pham, Grégoire Mercier, Lionel Bombrun, Julien Michel
    Comments: 11 pages, in IEEE-TIP peer review
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    A novel efficient method for content-based image retrieval (CBIR) is
    developed in this paper using both texture and color features. Our motivation
    is to represent and characterize an input image by a set of local descriptors
    extracted at characteristic points (i.e. keypoints) within the image. Then,
    the dissimilarity measure between images is calculated based on the geometric
    distance between the topological feature spaces (i.e. manifolds) formed by the
    sets of local descriptors generated from these images. In this work, we propose
    to extract and use the local extrema pixels as our feature points. Then, the
    so-called local extrema-based descriptor (LED) is generated for each keypoint
    by integrating all color, spatial as well as gradient information captured by a
    set of its nearest local extrema. Hence, each image is encoded by a LED feature
    point cloud, and Riemannian distances between these point clouds enable us to
    tackle CBIR. Experiments performed on Vistex, Stex and colored Brodatz texture
    databases using the proposed approach provide very efficient and competitive
    results compared to the state-of-the-art methods.
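
    The abstract leaves the manifold distance abstract; one common concrete
    instance, sketched here under the assumption that each descriptor cloud is
    summarized by its covariance matrix, is the log-Euclidean Riemannian
    distance between those SPD matrices. The paper's exact distance may differ.

    import numpy as np

    def log_euclidean_distance(X, Y, eps=1e-6):
        # X, Y: (n_points, d) clouds of local descriptors (e.g. LED features),
        # each summarized by its covariance; the distance is the Frobenius
        # norm between matrix logarithms of the two SPD matrices.
        def spd_log(Z):
            C = np.cov(Z, rowvar=False) + eps * np.eye(Z.shape[1])
            w, V = np.linalg.eigh(C)          # symmetric eigendecomposition
            return (V * np.log(w)) @ V.T      # matrix log of an SPD matrix
        return np.linalg.norm(spd_log(X) - spd_log(Y), ord="fro")

    rng = np.random.default_rng(0)
    img1 = rng.standard_normal((500, 8))
    img2 = rng.standard_normal((500, 8)) @ np.diag(np.linspace(0.5, 2.0, 8))
    print(log_euclidean_distance(img1, img1 + 1e-3 * rng.standard_normal((500, 8))))
    print(log_euclidean_distance(img1, img2))   # differently "textured": larger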

    A Fully Convolutional Neural Network based Structured Prediction Approach Towards the Retinal Vessel Segmentation

    Avijit Dasgupta, Sonam Singh
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Automatic segmentation of retinal blood vessels from fundus images plays an
    important role in the computer aided diagnosis of retinal diseases. The task of
    blood vessel segmentation is challenging due to the extreme variations in
    morphology of the vessels against noisy background. In this paper, we formulate
    the segmentation task as a multi-label inference task and utilize the implicit
    advantages of the combination of convolutional neural networks and structured
    prediction. Our proposed convolutional neural network based model achieves
    strong performance and significantly outperforms the state-of-the-art for
    automatic retinal blood vessel segmentation on the DRIVE dataset with 95.33%
    accuracy and 0.974 AUC score.

    Chinese/English mixed Character Segmentation as Semantic Segmentation

    Huabin Zheng, Jingyu Wang, Zhengjie Huang, Rong Pan
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    OCR character segmentation for multilingual printed documents is difficult
    due to the diversity of different linguistic characters. Previous approaches
    mainly focus on monolingual text and are not suitable for multi-lingual cases.
    In this work, we particularly tackle the Chinese/English mixed case by
    reframing it as a semantic segmentation problem. We take advantage of the
    successful architecture called fully convolutional networks (FCN) in the field
    of semantic segmentation. As a deep architecture, FCN can automatically learn
    useful features without traditional feature engineering. Given a wide enough
    receptive field, it can utilize the necessary context around a position to
    better determine whether that position is a splitting point. Trained on
    synthesized samples with simulated random disturbances, FCN can effectively
    split characters without any hand-crafted features. The experimental results
    show that our model significantly outperforms the previous methods. It is able
    to generalize from simulated disturbances to real-world disturbances,
    generalize from one text content style to another, generalize from seen font
    styles to unseen ones, and correctly handle disconnected structures and
    touching characters.

    Fixed-point Factorized Networks

    Peisong Wang, Jian Cheng
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    In recent years, Deep Neural Networks (DNNs) based methods have achieved
    remarkable performance in a wide range of tasks and have been among the most
    powerful and widely used techniques in computer vision, speech recognition and
    Natural Language Processing. However, DNN-based methods are both
    computationally intensive and resource-consuming, which hinders the application
    of these methods on embedded systems like smart phones. To alleviate this
    problem, we introduce novel Fixed-point Factorized Networks (FFN) applied to
    pre-trained models to reduce the computational complexity as well as the
    storage requirement of networks. Extensive experiments on large-scale ImageNet
    classification task show the effectiveness of our proposed method.

    High-Resolution Semantic Labeling with Convolutional Neural Networks

    Emmanuel Maggiori, Yuliya Tarabalka, Guillaume Charpiat, Pierre Alliez
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Convolutional neural networks (CNNs) have received increasing attention over
    the last few years. They were initially conceived for image categorization,
    i.e., the problem of assigning a semantic label to an entire input image.

    In this paper we address the problem of dense semantic labeling, which
    consists in assigning a semantic label to every pixel in an image. Since this
    requires a high spatial accuracy to determine where labels are assigned,
    categorization CNNs, intended to be highly robust to local deformations, are
    not directly applicable.

    By adapting categorization networks, many semantic labeling CNNs have been
    recently proposed. Our first contribution is an in-depth analysis of these
    architectures. We establish the desired properties of an ideal semantic
    labeling CNN, and assess how those methods stand with regard to these
    properties. We observe that even though they provide competitive results, these
    CNNs often underexploit properties of semantic labeling that could lead to more
    effective and efficient architectures.

    Out of these observations, we then derive a CNN framework specifically
    adapted to the semantic labeling problem. In addition to learning features at
    different resolutions, it learns how to combine these features. By integrating
    local and global information in an efficient and flexible manner, it
    outperforms previous techniques. We evaluate the proposed framework and compare
    it with state-of-the-art architectures on public benchmarks of high-resolution
    aerial image labeling.

    Action2Activity: Recognizing Complex Activities from Sensor Data

    Ye Liu, Liqiang Nie, Lei Han, Luming Zhang, David S Rosenblum
    Comments: IJCAI 2015
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    As compared to simple actions, activities are much more complex, but
    semantically consistent with a human’s real life. Techniques for action
    recognition from sensor generated data are mature. However, there has been
    relatively little work on bridging the gap between actions and activities. To
    this end, this paper presents a novel approach for complex activity recognition
    comprising two components. The first component is temporal pattern mining,
    which provides a mid-level feature representation for activities, encodes
    temporal relatedness among actions, and captures the intrinsic properties of
    activities. The second component is adaptive Multi-Task Learning, which
    captures relatedness among activities and selects discriminant features.
    Extensive experiments on a real-world dataset demonstrate the effectiveness of
    our work.

    The Shallow End: Empowering Shallower Deep-Convolutional Networks through Auxiliary Outputs

    Yong Guo, Mingkui Tan, Qingyao Wu, Jian Chen, Anton Van Den Hengel, Qinfeng Shi
    Comments: 12 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Convolutional neural networks (CNNs) with very deep architectures, such as
    the residual network (ResNet) [6], have shown encouraging results in various
    tasks in computer vision and machine learning. Their depth has been one of the
    key factors behind the great success of CNNs, with the associated gradient
    vanishing issue having been largely addressed by ResNet. There are other issues
    associated with increased depth, however. First, when networks get very deep,
    the supervision information may vanish due to the associated long
    backpropagation path. This means that intermediate layers receive less training
    information, which results in redundancy in models. Second, when the model
    becomes more complex and redundant, inference becomes more expensive. Third,
    very deep models require larger volumes of training data. We propose here
    instead an AuxNet and a new training method to propagate not only gradients but
    also supervision information from multiple auxiliary outputs at intermediate
    layers. The proposed AuxNet gives rise to a more compact network which
    outperforms its very deep equivalent (i.e. ResNet). For example, AuxNet with 44
    layers performs better than the original ResNet with 110 layers on several
    benchmark data sets, i.e. CIFAR-10, CIFAR-100 and SVHN.

    Deep Convolutional Neural Network Features and the Original Image

    Connor J. Parde, Carlos Castillo, Matthew Q. Hill, Y. Ivette Colon, Swami Sankaranarayanan, Jun-Cheng Chen, Alice J. O'Toole
    Comments: Submitted to Face and Gesture Conference, 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Face recognition algorithms based on deep convolutional neural networks
    (DCNNs) have made progress on the task of recognizing faces in unconstrained
    viewing conditions. These networks operate with compact feature-based face
    representations derived from learning a very large number of face images. While
    the learned features produced by DCNNs can be highly robust to changes in
    viewpoint, illumination, and appearance, little is known about the nature of
    the face code that emerges at the top level of such networks. We analyzed the
    DCNN features produced by two face recognition algorithms. In the first set of
    experiments we used the top-level features from the DCNNs as input into linear
    classifiers aimed at predicting metadata about the images. The results show
    that the DCNN features contain surprisingly accurate information about the yaw
    and pitch of a face, and about whether the face came from a still image or a
    video frame. In the second set of experiments, we measured the extent to which
    individual DCNN features operated in a view-dependent or view-invariant manner.
    We found that view-dependent coding was a characteristic of the identities
    rather than the DCNN features – with some identities coded consistently in a
    view-dependent way and others in a view-independent way. In our third analysis,
    we visualized the DCNN feature space for over 24,000 images of 500 identities.
    Images in the center of the space were uniformly of low quality (e.g., extreme
    views, face occlusion, low resolution). Image quality increased monotonically
    as a function of distance from the origin. This result suggests that image
    quality information is available in the DCNN features, such that consistently
    average feature values reflect coding failures that reliably indicate poor or
    unusable images. Combined, the results offer insight into the coding mechanisms
    that support robust representation of faces in DCNNs.

    Deep Label Distribution Learning with Label Ambiguity

    Bin-Bin Gao, Chao Xing, Chen-Wei Xie, Jianxin Wu, Xin Geng
    Comments: Submitted to IEEE Trans. Image Processing
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Convolutional Neural Networks (ConvNets) have achieved excellent recognition
    performance in various visual recognition tasks. A large labeled training set
    is one of the most important factors for its success. However, it is difficult
    to collect sufficient training images with precise labels in some domains such
    as apparent age estimation, head pose estimation, multi-label classification
    and semantic segmentation. Fortunately, there is ambiguous information among
    labels, which makes these tasks different from traditional classification.
    Based on this observation, we convert the label of each image into a discrete
    label distribution, and learn the label distribution by minimizing a
    Kullback-Leibler divergence between the predicted and ground-truth label
    distributions using deep ConvNets. The proposed DLDL (Deep Label Distribution
    Learning) method effectively utilizes the label ambiguity in both feature
    learning and classifier learning, which prevents the network from over-fitting
    even when the training set is small. Experimental results show that the
    proposed approach produces significantly better results than state-of-the-art
    methods for age estimation and head pose estimation. At the same time, it also
    improves recognition performance for multi-label classification and semantic
    segmentation tasks.
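
    A numpy sketch of the label-distribution construction and loss as
    described, assuming a Gaussian spread over a discrete label space (here,
    ages) to encode the ambiguity; the exact spread the paper uses may differ.

    import numpy as np

    def label_to_distribution(y, bins, sigma=2.0):
        # Convert a scalar label (e.g. apparent age y) into a discrete label
        # distribution over `bins`, encoding "y, give or take".
        p = np.exp(-0.5 * ((bins - y) / sigma) ** 2)
        return p / p.sum()

    def kl_loss(target, pred_logits):
        # KL(target || softmax(pred_logits)): the training loss for
        # DLDL-style label distribution learning.
        z = pred_logits - pred_logits.max()
        log_q = z - np.log(np.sum(np.exp(z)))
        return np.sum(target * (np.log(target + 1e-12) - log_q))

    ages = np.arange(0, 101)                # discrete label space: ages 0..100
    t = label_to_distribution(31.0, ages)   # "31 years old, give or take"
    logits = np.random.default_rng(0).standard_normal(101)
    print(kl_loss(t, logits))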

    Validation of Tsallis Entropy In Inter-Modality Neuroimage Registration

    Henrique Tomaz Amaral-Silva, Luiz Otavio Murta-Jr, Paulo Mazzoncini de Azevedo-Marques, Lauro Wichert-Ana, V. B. Surya Prasath, Colin Studholme
    Comments: 15 pages, 11 figures, 2 tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Medical image registration plays an important role in determining topographic
    and morphological changes for functional diagnostic and therapeutic purposes.
    Manual alignment and semi-automated software are still in use; however, they
    are subjective and cost specialists precious time. Fully automated methods
    are faster and user-independent, but the critical point is registration
    reliability. Similarity measurement using Mutual Information (MI) with
    Shannon entropy (MIS) is the most common automated method currently applied
    to medical images, although more reliable algorithms have been proposed over
    the last decade, suggesting improvements and different entropies; for
    example, Studholme et al. (1999) demonstrated that Normalized Mutual
    Information (NMI) provides an invariant entropy measure for 3D medical image
    registration. In this paper, we describe a set of experiments to evaluate
    the applicability of Tsallis entropy in Mutual Information (MIT) and in
    Normalized Mutual Information (NMIT) as cost functions for registering
    Magnetic Resonance Imaging (MRI), Positron Emission Tomography (PET) and
    Computed Tomography (CT) exams. The effect of changing overlap in a simple
    image model and clinical experiments on current entropies (Entropy
    Correlation Coefficient - ECC, MIS and NMI) and the proposed ones (MIT and
    NMIT) showed NMI and NMIT with the Tsallis parameter close to 1 to be the
    best options (in confidence and accuracy) for CT-to-MRI and PET-to-MRI
    automatic neuroimaging registration.
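
    For reference, a numpy sketch of the quantities involved, assuming the
    common additive combination MIT = S_q(A) + S_q(B) - S_q(A,B) of Tsallis
    entropies computed from a joint intensity histogram; the paper's exact
    combination rule and normalization may differ.

    import numpy as np

    def tsallis_entropy(p, q):
        # Tsallis entropy S_q(p) = (1 - sum p^q) / (q - 1); -> Shannon as q -> 1.
        p = p[p > 0]
        if abs(q - 1.0) < 1e-12:
            return -np.sum(p * np.log(p))   # Shannon limit
        return (1.0 - np.sum(p ** q)) / (q - 1.0)

    def tsallis_mutual_information(img_a, img_b, q=1.1, bins=32):
        # MI-style similarity from the joint intensity histogram of the
        # two images being registered.
        joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
        pab = joint / joint.sum()
        pa, pb = pab.sum(axis=1), pab.sum(axis=0)
        return (tsallis_entropy(pa, q) + tsallis_entropy(pb, q)
                - tsallis_entropy(pab.ravel(), q))

    rng = np.random.default_rng(0)
    a = rng.random((64, 64))
    print(tsallis_mutual_information(a, a))                     # aligned: high
    print(tsallis_mutual_information(a, rng.random((64, 64))))  # unrelated: near zero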

    End-to-end Optimized Image Compression

    Johannes Ballé, Valero Laparra, Eero P. Simoncelli
    Comments: Under review as a conference paper at ICLR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT)

    We describe an image compression system, consisting of a nonlinear encoding
    transformation, a uniform quantizer, and a nonlinear decoding transformation.
    Like many deep neural network architectures, the transforms consist of layers
    of convolutional linear filters and nonlinear activation functions, but we use
    a joint nonlinearity that implements a form of local gain control, inspired by
    those used to model biological neurons. Using a variant of stochastic gradient
    descent, we jointly optimize the system for rate-distortion performance over a
    database of training images, introducing a continuous proxy for the
    discontinuous loss function arising from the quantizer. The relaxed
    optimization problem resembles that of variational autoencoders, except that it
    must operate at any point along the rate-distortion curve, whereas the
    optimization of generative models aims only to minimize entropy of the data
    under the model. Across an independent database of test images, we find that
    the optimized coder exhibits significantly better rate-distortion performance
    than the standard JPEG and JPEG 2000 compression systems, as well as a dramatic
    improvement in visual quality of compressed images.
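
    A tiny sketch of the continuous proxy idea mentioned above, under the
    standard assumption that hard rounding is replaced during training by
    additive uniform noise of the same width, which keeps the rate-distortion
    loss differentiable.

    import numpy as np

    rng = np.random.default_rng(0)

    def quantize(y, train=False):
        # Uniform scalar quantizer with a continuous training-time proxy.
        # Test time: hard rounding (discontinuous, zero gradient a.e.).
        # Train time: additive U(-0.5, 0.5) noise, matching the marginal
        # effect of rounding while keeping the loss differentiable.
        if train:
            return y + rng.uniform(-0.5, 0.5, size=y.shape)
        return np.round(y)

    y = rng.standard_normal(8) * 3.0
    print(quantize(y))              # discrete codes for actual compression
    print(quantize(y, train=True))  # smooth surrogate used while optimizing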

    Boosting Image Captioning with Attributes

    Ting Yao, Yingwei Pan, Yehao Li, Zhaofan Qiu, Tao Mei
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Automatically describing an image with a natural language has been an
    emerging challenge in both fields of computer vision and natural language
    processing. In this paper, we present Long Short-Term Memory with Attributes
    (LSTM-A) – a novel architecture that integrates attributes into the successful
    Convolutional Neural Networks (CNNs) plus Recurrent Neural Networks (RNNs)
    image captioning framework, by training them in an end-to-end manner. To
    incorporate attributes, we construct variants of architectures by feeding image
    representations and attributes into RNNs in different ways to explore the
    mutual but also fuzzy relationship between them. Extensive experiments are
    conducted on COCO image captioning dataset and our framework achieves superior
    results when compared to state-of-the-art deep models. Most remarkably, we
    obtain METEOR/CIDEr-D scores of 25.2%/98.6% on the test data of the widely
    used and publicly available splits of (Karpathy & Fei-Fei, 2015) when
    extracting image representations with GoogleNet, and achieve the top-1
    performance to date on the COCO captioning leaderboard.

    GPU-based Pedestrian Detection for Autonomous Driving

    Victor Campmany, Sergio Silva, Antonio Espinosa, Juan Carlos Moure, David Vázquez, Antonio M. López
    Comments: 10 pages
    Journal-ref: International Conference on Computational Science 2016 Volume 80
    Pages 2377 to 2381
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose a real-time pedestrian detection system for the embedded Nvidia
    Tegra X1 GPU-CPU hybrid platform. The pipeline is composed of the following
    state-of-the-art algorithms: Histogram of Local Binary Patterns (LBP) and
    Histograms of Oriented Gradients (HOG) features extracted from the input image;
    Pyramidal Sliding Window technique for candidate generation; and Support Vector
    Machine (SVM) for classification. Results show an 8x speedup on the target
    Tegra X1 platform and a better performance/watt ratio than the desktop CUDA
    platforms under study.

    What Is the Best Practice for CNNs Applied to Visual Instance Retrieval?

    Jiedong Hao, Jing Dong, Wei Wang, Tieniu Tan
    Comments: The version submitted to ICLR
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Previous work has shown that feature maps of deep convolutional neural
    networks (CNNs) can be interpreted as feature representation of a particular
    image region. Features aggregated from these feature maps have been exploited
    for image retrieval tasks and achieved state-of-the-art performances in recent
    years. The key to the success of such methods is the feature representation.
    However, the different factors that impact the effectiveness of these
    features have still not been explored thoroughly, and there has been much
    less discussion about their best combination.

    The main contribution of our paper is the thorough evaluations of the various
    factors that affect the discriminative ability of the features extracted from
    CNNs. Based on the evaluation results, we also identify the best choices for
    different factors and propose a new multi-scale image feature representation
    method to encode the image effectively. Finally, we show that the proposed
    method generalises well and outperforms the state-of-the-art methods on four
    typical datasets used for visual instance retrieval.

    Efficient Branching Cascaded Regression for Face Alignment under Significant Head Rotation

    Brandon M. Smith, Charles R. Dyer
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Despite much interest in face alignment in recent years, the large majority
    of work has focused on near-frontal faces. Algorithms typically break down on
    profile faces, or are too slow for real-time applications. In this work we
    propose an efficient approach to face alignment that can handle 180 degrees of
    head rotation in a unified way (e.g., without resorting to view-based models)
    using 2D training data. The foundation of our approach is cascaded shape
    regression (CSR), which has emerged recently as the leading strategy. We
    propose a generalization of conventional CSRs that we call branching cascaded
    regression (BCR). Conventional CSRs are single-track; that is, they progress
    from one cascade level to the next in a straight line, with each regressor
    attempting to fit the entire dataset. We instead split the regression problem
    into two or more simpler ones after each cascade level. Intuitively, each
    regressor can then operate on a simpler objective function (i.e., with fewer
    conflicting gradient directions). Within the BCR framework, we model and infer
    pose-related landmark visibility and face shape simultaneously using Structured
    Point Distribution Models (SPDMs). We propose to learn task-specific feature
    mapping functions that are adaptive to landmark visibility, and that use SPDM
    parameters as regression targets instead of 2D landmark coordinates.
    Additionally, we introduce a new in-the-wild dataset of profile faces to
    validate our approach.

    Efficient Vision Data Processing on a Mobile Device for Indoor Localization

    Michał Nowicki, Jan Wietrzykowski, Piotr Skrzypczyński
    Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

    The paper presents an approach to indoor personal localization using a
    sequence of captured images. The solution is tailored specifically for
    processing vision data on mobile devices with limited computing power. The
    presented FastABLE system is based on the ABLE-M place recognition algorithm,
    but introduces major modifications to the concept of image matching, in order
    to radically cut down the processing time, and to improve scalability. FastABLE
    is compared against the original OpenABLE and the popular OpenFABMAP place
    recognition systems.

    Elliptic operator for shape analysis

    Yoni Choukroun, Alon Shtern, Ron Kimmel
    Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)

    Many shape analysis methods treat the geometry of an object as a metric space
    that can be captured by the Laplace-Beltrami operator. In this paper, we
    propose to adapt a classical operator from quantum mechanics to the field of
    shape analysis, where we suggest integrating a scalar function through a
    unified elliptic Hamiltonian operator. We study the addition of a potential
    function to the Laplacian as a generator for dual spaces in which shape
    processing is performed. Then, we evaluate the resulting spectral basis for
    different applications such as mesh compression and shape matching. The
    suggested operator is shown to produce better functional spaces to operate
    with, as demonstrated by the proposed framework that outperforms existing
    spectral methods, for example, when applied to shape matching benchmarks.
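
    As a rough illustration (not the authors' code): the construction amounts to
    replacing the Laplacian L with a Hamiltonian H = L + diag(V) for a scalar
    potential V, and using the low-frequency eigenvectors of H as the processing
    basis. A minimal sketch with SciPy, on a toy path-graph Laplacian standing in
    for the cotangent Laplace-Beltrami operator the paper works with:

        # Sketch: spectral basis of an elliptic Hamiltonian H = L + diag(V).
        # L is assumed symmetric positive semi-definite; V is a non-negative
        # scalar potential per vertex.
        import numpy as np
        import scipy.sparse as sp
        import scipy.sparse.linalg as spla

        def hamiltonian_basis(L, V, k=50):
            """Return the k eigenpairs of H = L + diag(V) with smallest eigenvalues."""
            H = L + sp.diags(V)
            # Shift-invert makes eigsh robust when H is only semi-definite.
            vals, vecs = spla.eigsh(H, k=k, sigma=-1e-8, which="LM")
            return vals, vecs

        # Toy usage: a 1D chain Laplacian with a quadratic potential well.
        n = 200
        L = sp.diags([-np.ones(n - 1), 2 * np.ones(n), -np.ones(n - 1)], [-1, 0, 1])
        V = 0.01 * (np.arange(n) - n / 2) ** 2
        vals, basis = hamiltonian_basis(L.tocsc(), V, k=10)

    The potential localizes the basis functions, which is what gives the dual
    spaces described in the abstract their different behavior from the plain
    Laplacian eigenbasis.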

    Learning to Perform Physics Experiments via Deep Reinforcement Learning

    Misha Denil, Pulkit Agrawal, Tejas D Kulkarni, Tom Erez, Peter Battaglia, Nando de Freitas
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Physics and Society (physics.soc-ph)

    When encountering novel objects, humans are able to infer a wide range of
    physical properties such as mass, friction and deformability by interacting
    with them in a goal-driven way. This process of active interaction is in the
    same spirit as a scientist performing an experiment to discover hidden facts.
    Recent advances in artificial intelligence have yielded machines that can
    achieve superhuman performance in Go, Atari, natural language processing, and
    complex control problems, but it is not clear that these systems can rival the
    scientific intuition of even a young child. In this work we introduce a basic
    set of tasks that require agents to estimate hidden properties such as mass and
    cohesion of objects in an interactive simulated environment where they can
    manipulate the objects and observe the consequences. We found that state-of-the-art
    deep reinforcement learning methods can learn to perform the experiments
    necessary to discover such hidden properties. By systematically manipulating
    the problem difficulty and the cost incurred by the agent for performing
    experiments, we found that agents learn different strategies that balance the
    cost of gathering information against the cost of making mistakes in different
    situations.

    Learning to Act by Predicting the Future

    Alexey Dosovitskiy, Vladlen Koltun
    Comments: ICLR-2017 submission
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

    We present an approach to sensorimotor control in immersive environments. Our
    approach utilizes a high-dimensional sensory stream and a lower-dimensional
    measurement stream. The cotemporal structure of these streams provides a rich
    supervisory signal, which enables training a sensorimotor control model by
    interacting with the environment. The model is trained using supervised
    learning techniques, but without extraneous supervision. It learns to act based
    on raw sensory input from a complex three-dimensional environment. The
    presented formulation allows changing the agent’s goal at test time. We conduct
    extensive experiments in three-dimensional simulations based on the classical
    first-person game Doom. The results demonstrate that the presented approach
    outperforms sophisticated prior formulations, particularly on challenging
    tasks. The results also show that trained models successfully generalize across
    environments and goals. A model trained using the presented approach won the
    Full Deathmatch track of the Visual Doom AI Competition, which was held in
    previously unseen environments.
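
    A minimal sketch of the measurement-prediction idea (layer sizes, names, and
    the flat observation vector are our assumptions; the paper uses convolutional
    perception and a richer architecture): the model predicts future measurement
    changes for every action, and the action is chosen by scoring those
    predictions against a goal vector that can be swapped at test time.

        # Sketch of goal-conditioned action selection via future-measurement
        # prediction (sizes and names are illustrative assumptions).
        import torch
        import torch.nn as nn

        class FuturePredictor(nn.Module):
            def __init__(self, obs_dim, meas_dim, n_actions, n_offsets):
                super().__init__()
                self.out_shape = (n_actions, n_offsets * meas_dim)
                self.net = nn.Sequential(
                    nn.Linear(obs_dim + meas_dim, 256), nn.ReLU(),
                    nn.Linear(256, n_actions * n_offsets * meas_dim),
                )

            def forward(self, obs, meas):
                # Predict future measurement changes for every action at once.
                pred = self.net(torch.cat([obs, meas], dim=-1))
                return pred.view(-1, *self.out_shape)

        def act(model, obs, meas, goal):
            """Pick the action whose predicted future best matches the goal."""
            with torch.no_grad():
                pred = model(obs, meas)             # (batch, actions, offsets*meas)
                scores = (pred * goal).sum(dim=-1)  # goal re-weighting at test time
                return scores.argmax(dim=-1)

    Because training is plain regression on observed future measurements, the
    goal vector never enters the loss, which is why it can change at test time.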

    LipNet: Sentence-level Lipreading

    Yannis M. Assael, Brendan Shillingford, Shimon Whiteson, Nando de Freitas
    Subjects: Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

    Lipreading is the task of decoding text from the movement of a speaker’s
    mouth. Traditional approaches separated the problem into two stages: designing
    or learning visual features, and prediction. More recent deep lipreading
    approaches are end-to-end trainable (Wand et al., 2016; Chung & Zisserman,
    2016a). All existing works, however, perform only word classification, not
    sentence-level sequence prediction. Studies have shown that human lipreading
    performance increases for longer words (Easton & Basala, 1982), indicating the
    importance of features capturing temporal context in an ambiguous communication
    channel. Motivated by this observation, we present LipNet, a model that maps a
    variable-length sequence of video frames to text, making use of spatiotemporal
    convolutions, an LSTM recurrent network, and the connectionist temporal
    classification loss, trained entirely end-to-end. To the best of our knowledge,
    LipNet is the first lipreading model to operate at the sentence level, using a
    single end-to-end speaker-independent deep model to simultaneously learn
    spatiotemporal visual features and a sequence model. On the GRID corpus, LipNet
    achieves 93.4% accuracy, outperforming experienced human lipreaders and the
    previous 79.6% state-of-the-art accuracy.
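
    Schematically, the described pipeline can be assembled from standard
    components; the following sketch uses placeholder channel counts and kernel
    sizes, not the published configuration:

        # Schematic LipNet-style pipeline: 3D convolutions -> LSTM -> CTC loss.
        import torch
        import torch.nn as nn

        class LipReader(nn.Module):
            def __init__(self, vocab_size):
                super().__init__()
                self.stcnn = nn.Sequential(
                    nn.Conv3d(3, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
                    nn.ReLU(),
                    nn.MaxPool3d(kernel_size=(1, 2, 2)),
                )
                self.lstm = nn.LSTM(input_size=32, hidden_size=256,
                                    bidirectional=True, batch_first=True)
                self.fc = nn.Linear(512, vocab_size + 1)  # +1 for the CTC blank

            def forward(self, video):                 # video: (B, 3, T, H, W)
                feats = self.stcnn(video)             # (B, 32, T, H', W')
                feats = feats.mean(dim=(3, 4))        # pool space -> (B, 32, T)
                feats, _ = self.lstm(feats.transpose(1, 2))
                return self.fc(feats).log_softmax(-1) # (B, T, vocab+1)

        # CTC training step; labels use indices 1..vocab_size, 0 is blank:
        ctc = nn.CTCLoss(blank=0)
        # loss = ctc(logits.transpose(0, 1), targets, input_lens, target_lens)

    The CTC loss is what removes the need for frame-level alignments, letting the
    whole model train end-to-end from video and sentence transcripts alone.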


    Artificial Intelligence

    Bayesian Non-parametric model to Target Gamification Notifications Using Big Data

    Meisam Hejazi Nia, Brian Ratchford
    Subjects: Artificial Intelligence (cs.AI)

    I suggest an approach that helps online marketers target their Gamification
    elements to users by modifying the order of the list of tasks they send. It is
    more realistic and flexible, as it allows the model to learn more parameters
    as the online marketers collect more data. The targeting approach is scalable
    and quick, and it can be used over streaming data.

    Dependence and Relevance: A probabilistic view

    Dan Geiger, David Heckerman
    Subjects: Artificial Intelligence (cs.AI); Combinatorics (math.CO); Probability (math.PR)

    We examine three probabilistic concepts related to the sentence “two
    variables have no bearing on each other”. We explore the relationships between
    these three concepts and establish their relevance to the process of
    constructing similarity networks—a tool for acquiring probabilistic knowledge
    from human experts. We also establish a precise relationship between
    connectedness in Bayesian networks and relevance in probability.

    Deep Reinforcement Learning with Averaged Target DQN

    Oron Anschel, Nir Baram, Nahum Shimkin
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    The commonly used Q-learning algorithm combined with function approximation
    induces systematic overestimations of state-action values. These systematic
    errors might cause instability, poor performance and sometimes divergence of
    learning. In this work, we present the Averaged Target DQN (ADQN)
    algorithm, an adaptation of the DQN class of algorithms that uses a weighted
    average over past learned networks to reduce generalization noise variance. As
    a consequence, this leads to reduced overestimations, a more stable learning
    process, and improved performance. Additionally, we analyze ADQN variance
    reduction along trajectories and demonstrate the performance of ADQN on a toy
    Gridworld problem, as well as on several of the Atari 2600 games from the
    Arcade Learning Environment.
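
    A minimal sketch of the averaging idea (variable names are ours): the
    bootstrap target averages the Q-values of the K most recently learned
    networks instead of using a single target network.

        # Sketch: averaged bootstrap target over the K most recent Q-networks.
        import copy
        from collections import deque
        import torch

        K = 10
        q_snapshots = deque(maxlen=K)   # snapshots of past learned networks

        def averaged_target(next_states, rewards, dones, gamma=0.99):
            """DQN target with Q-values averaged over past network snapshots."""
            with torch.no_grad():
                q_avg = torch.stack(
                    [net(next_states) for net in q_snapshots]).mean(dim=0)
                return rewards + gamma * (1 - dones) * q_avg.max(dim=1).values

        # After each learning iteration, snapshot the online network:
        # q_snapshots.append(copy.deepcopy(online_net).eval())

    Averaging K independent estimates shrinks the variance of the target, which
    in turn damps the max-operator overestimation the abstract describes.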

    Neuro-Symbolic Program Synthesis

    Emilio Parisotto, Abdel-rahman Mohamed, Rishabh Singh, Lihong Li, Dengyong Zhou, Pushmeet Kohli
    Subjects: Artificial Intelligence (cs.AI); Programming Languages (cs.PL)

    Recent years have seen the proposal of a number of neural architectures for
    the problem of Program Induction. Given a set of input-output examples, these
    architectures are able to learn mappings that generalize to new test inputs.
    While achieving impressive results, these approaches have a number of important
    limitations: (a) they are computationally expensive and hard to train, (b) a
    model has to be trained for each task (program) separately, and (c) it is hard
    to interpret or verify the correctness of the learnt mapping (as it is defined
    by a neural network). In this paper, we propose a novel technique,
    Neuro-Symbolic Program Synthesis, to overcome the above-mentioned problems.
    Once trained, our approach can automatically construct computer programs in a
    domain-specific language that are consistent with a set of input-output
    examples provided at test time. Our method is based on two novel neural
    modules. The first module, called the cross correlation I/O network, given a
    set of input-output examples, produces a continuous representation of the set
    of I/O examples. The second module, the Recursive-Reverse-Recursive Neural
    Network (R3NN), given the continuous representation of the examples,
    synthesizes a program by incrementally expanding partial programs. We
    demonstrate the effectiveness of our approach by applying it to the rich and
    complex domain of regular expression based string transformations. Experiments
    show that the R3NN model is not only able to construct programs from new
    input-output examples, but it is also able to construct new programs for tasks
    that it had never observed before during training.

    Self-Wiring Question Answering Systems

    Ricardo Usbeck, Jonathan Huthmann, Nico Duldhardt, Axel-Cyrille Ngonga Ngomo
    Comments: 6 pages, 1 figure, pre-print in lncs
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)

    Question answering (QA) has been the subject of a resurgence over the past
    years. The said resurgence has led to a multitude of question answering (QA)
    systems being developed both by companies and research facilities. While a few
    components of QA systems get reused across implementations, most systems do not
    leverage the full potential of component reuse. Hence, the development of QA
    systems is currently still a tedious and time-consuming process. We address the
    challenge of accelerating the creation of novel or tailored QA systems by
    presenting a concept for a self-wiring approach to composing QA systems. Our
    approach will allow the reuse of existing, web-based QA systems or modules
    while developing new QA platforms. To this end, it will rely on QA modules
    being described using the Web Ontology Language. Based on these descriptions,
    our approach will be able to automatically compose QA systems using a
    data-driven approach.

    Optimal Binary Autoencoding with Pairwise Correlations

    Akshay Balsubramani
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    We formulate learning of a binary autoencoder as a biconvex optimization
    problem which learns from the pairwise correlations between encoded and decoded
    bits. Among all possible algorithms that use this information, ours finds the
    autoencoder that reconstructs its inputs with worst-case optimal loss. The
    optimal decoder is a single layer of artificial neurons, emerging entirely from
    the minimax loss minimization, and with weights learned by convex optimization.
    All this is reflected in competitive experimental results, demonstrating that
    binary autoencoding can be done efficiently by conveying information in
    pairwise correlations in an optimal fashion.

    Gaussian Attention Model and Its Application to Knowledgebase Embedding and Question Answering

    Liwen Zhang, John Winn, Ryota Tomioka
    Comments: 12 pages, 2 figures
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    We propose the Gaussian attention model for content-based neural memory
    access. With the proposed attention model, a neural network has the additional
    degree of freedom to control the focus of its attention, from laser-sharp
    attention to blurred attention. It is applicable whenever we can assume that
    the distance in the latent space reflects some notion of semantics. We use the
    proposed attention model as a scoring function for the embedding of a
    knowledgebase into a continuous vector space and then train a model that
    performs question answering about the entities in the knowledgebase. The
    proposed attention model can handle both the propagation of uncertainty when
    following a series of relations and also the conjunction of thoughts in a
    natural way. On a dataset of soccer players who participated in the FIFA World
    Cup 2014, we demonstrate that our model can handle both path queries and
    conjunctive queries well.
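
    Read as a formula, the score of a memory slot is an unnormalized Gaussian in
    the latent space, so a sharp precision matrix gives laser-sharp attention and
    a broad one gives blurred attention. A minimal sketch in our own notation:

        # Sketch: Gaussian attention scores over memory slots (our notation).
        # `precision` is Sigma^{-1}; sharper precision -> more peaked attention.
        import numpy as np

        def gaussian_attention(memory, mean, precision):
            """memory: (n, d) slot embeddings -> (n,) attention weights."""
            diff = memory - mean                              # (n, d)
            scores = -0.5 * np.einsum("nd,de,ne->n", diff, precision, diff)
            w = np.exp(scores - scores.max())                 # stable softmax
            return w / w.sum()

        # Toy usage: the same query with sharp vs. blurred covariance.
        mem = np.random.randn(5, 3)
        w_sharp = gaussian_attention(mem, mem[0], 10.0 * np.eye(3))
        w_blur = gaussian_attention(mem, mem[0], 0.1 * np.eye(3))

    Composing such Gaussians along a chain of relations is what lets uncertainty
    propagate through path queries, as the abstract notes.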

    Hierarchical compositional feature learning

    Miguel Lázaro-Gredilla, Yi Liu, D. Scott Phoenix, Dileep George
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    We introduce the hierarchical compositional network (HCN), a directed
    generative model able to discover and disentangle, without supervision, the
    building blocks of a set of binary images. The building blocks are binary
    features defined hierarchically as a composition of some of the features in the
    layer immediately below, arranged in a particular manner. At a high level, HCN
    is similar to a sigmoid belief network with pooling. Inference and learning in
    HCN are very challenging and existing variational approximations do not work
    satisfactorily. A main contribution of this work is to show that both can be
    addressed using max-product message passing (MPMP) with a particular schedule
    (no EM required). Also, using MPMP as an inference engine for HCN makes new
    tasks simple: adding supervision information, classifying images, or performing
    inpainting all correspond to clamping some variables of the model to their
    known values and running MPMP on the rest. When used for classification, fast
    inference with HCN has exactly the same functional form as a convolutional
    neural network (CNN) with linear activations and binary weights. However, HCN’s
    features are qualitatively very different.

    Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic

    Shixiang Gu, Timothy Lillicrap, Zoubin Ghahramani, Richard E. Turner, Sergey Levine
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Model-free deep reinforcement learning (RL) methods have been successful in a
    wide variety of simulated domains. However, a major obstacle facing deep RL in
    the real world is the high sample complexity of such methods. Unbiased batch
    policy-gradient methods offer stable learning, but at the cost of high
    variance, which often requires large batches, while TD-style methods, such as
    off-policy actor-critic and Q-learning, are more sample-efficient but biased,
    and often require costly hyperparameter sweeps to stabilize. In this work, we
    aim to develop methods that combine the stability of unbiased policy gradients
    with the efficiency of off-policy RL. We present Q-Prop, a policy gradient
    method that uses a Taylor expansion of the off-policy critic as a control
    variate. Q-Prop is both sample efficient and stable, and effectively combines
    the benefits of on-policy and off-policy methods. We analyze the connection
    between Q-Prop and existing model-free algorithms, and use control variate
    theory to derive two variants of Q-Prop with conservative and aggressive
    adaptation. We show that conservative Q-Prop provides substantial gains in
    sample efficiency over trust region policy optimization (TRPO) with generalized
    advantage estimation (GAE), and improves stability over deep deterministic
    policy gradient (DDPG), the state-of-the-art on-policy and off-policy methods,
    on OpenAI Gym’s MuJoCo continuous control environments.
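
    In symbols (our paraphrase of the construction, not a verbatim equation from
    the paper; eta(s) is a weighting that distinguishes the conservative and
    aggressive variants), the first-order Taylor expansion of the critic around
    the policy mean mu_theta(s) serves as the control variate:

        % Sketch of the Q-Prop gradient estimator (paraphrased).
        \nabla_\theta J(\theta) \approx
          \mathbb{E}_{s,a}\!\big[\nabla_\theta \log \pi_\theta(a|s)\,
            \big(\hat{A}(s,a) - \eta(s)\,\bar{A}_w(s,a)\big)\big]
          + \mathbb{E}_{s}\!\big[\eta(s)\,
            \nabla_a Q_w(s,a)\big|_{a=\mu_\theta(s)}
            \nabla_\theta \mu_\theta(s)\big],
        \qquad
        \bar{A}_w(s,a) = \nabla_a Q_w(s,a)\big|_{a=\mu_\theta(s)}
          \big(a - \mu_\theta(s)\big).

    The first expectation is the standard likelihood-ratio gradient with the
    critic's analytic advantage subtracted out; the second term adds it back in
    expectation, computed deterministically through the critic, which is what
    reduces variance without introducing bias.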

    Playing SNES in the Retro Learning Environment

    Nadav Bhonker, Shai Rozenberg, Itay Hubara
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Mastering a video game requires skill, tactics and strategy. While these
    attributes may be acquired naturally by human players, teaching them to a
    computer program is a far more challenging task. In recent years, extensive
    research was carried out in the field of reinforcement learning and numerous
    algorithms were introduced, aiming to learn how to perform human tasks such as
    playing video games. As a result, the Arcade Learning Environment (ALE) has
    become a commonly used benchmark environment allowing algorithms to train on
    various Atari 2600 games. Most Atari games no longer pose a challenge to
    state-of-the-art algorithms. In this paper we introduce a new learning
    environment, the Retro Learning Environment — RLE, based on the Super
    Nintendo Entertainment System (SNES). The environment is expandable, allowing
    for more video games and consoles to be easily added to the environment, while
    maintaining the same interface as ALE. Moreover, RLE is compatible with Python
    and Torch. SNES games pose a significant challenge to current algorithms due to
    their higher level of complexity and versatility. To overcome these challenges,
    we introduce a novel training method based on training two agents against each
    other.

    Reinforcement-based Simultaneous Algorithm and its Hyperparameters Selection

    Valeria Efimova, Andrey Filchenkov, Anatoly Shalyto
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Many algorithms for data analysis exist, especially for classification
    problems. To solve a data analysis problem, a proper algorithm should be
    chosen, and also its hyperparameters should be selected. In this paper, we
    present a new method for the simultaneous selection of an algorithm and its
    hyperparameters. In order to do so, we reduced this problem to the multi-armed
    bandit problem. We consider an algorithm as an arm and algorithm
    hyperparameters search during a fixed time as the corresponding arm play. We
    also suggest a problem-specific reward function. We performed experiments
    on 10 real datasets and compared the suggested method with the existing one
    implemented in Auto-WEKA. The results show that our method is significantly
    better in most cases and never worse than Auto-WEKA.
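
    As a rough sketch of the reduction (UCB1 stands in here for the bandit
    policy, and the reward is simplified to the best validation score found in a
    time slice; both are our assumptions, not the paper's exact design): each
    learning algorithm is an arm, and one play runs its hyperparameter search for
    a fixed time budget.

        # Sketch: algorithm selection as a multi-armed bandit (UCB1 policy).
        # `search_step(alg)` is a placeholder for a fixed-time hyperparameter
        # search returning the best validation score (assumed in [0, 1]).
        import math

        def ucb1_select(algorithms, search_step, n_iters, c=1.4):
            plays = {a: 0 for a in algorithms}
            best = {a: 0.0 for a in algorithms}   # best score seen per arm
            for t in range(1, n_iters + 1):
                def ucb(a):
                    if plays[a] == 0:
                        return float("inf")       # play every arm once first
                    return best[a] + c * math.sqrt(math.log(t) / plays[a])
                arm = max(algorithms, key=ucb)
                score = search_step(arm)          # one fixed-time search slice
                plays[arm] += 1
                best[arm] = max(best[arm], score)
                yield arm, score

        # for arm, score in ucb1_select(["svm", "rf", "knn"], step, 100): ...

    The exploration bonus keeps some budget on rarely tried algorithms while the
    reward term concentrates search time on the most promising one.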

    Reinforcement Learning Approach for Parallelization in Filters Aggregation Based Feature Selection Algorithms

    Ivan Smetannikov, Ilya Isaev, Andrey Filchenkov
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    One of the classical problems in machine learning and data mining is feature
    selection. A feature selection algorithm is expected to be quick, and at the
    same time it should show high performance. The MeLiF algorithm effectively
    solves this problem using ensembles of ranking filters. This article describes
    two different ways to improve MeLiF performance through parallelization.
    Experiments show that the proposed schemes significantly improve algorithm
    performance and increase feature selection quality.

    An Information-Theoretic Framework for Fast and Robust Unsupervised Learning via Neural Population Infomax

    Wentao Huang, Kechen Zhang
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)

    A framework is presented for unsupervised learning of representations based
    on the infomax principle for large-scale neural populations. We use an
    asymptotic approximation to Shannon's mutual information for a large neural population
    to demonstrate that a good initial approximation to the global
    information-theoretic optimum can be obtained by a hierarchical infomax method.
    From the initial solution, an efficient algorithm based on gradient descent of
    the final objective function is proposed to learn representations from the
    input datasets, allowing complete, overcomplete, or undercomplete bases. As
    confirmed by numerical experiments, our method is robust and highly efficient
    for extracting salient features from image datasets. Compared with the main
    existing methods, our algorithm has a distinct advantage in both the training
    speed and the robustness of unsupervised representation learning.

    Learning to Perform Physics Experiments via Deep Reinforcement Learning

    Misha Denil, Pulkit Agrawal, Tejas D Kulkarni, Tom Erez, Peter Battaglia, Nando de Freitas
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Physics and Society (physics.soc-ph)

    When encountering novel objects, humans are able to infer a wide range of
    physical properties such as mass, friction and deformability by interacting
    with them in a goal-driven way. This process of active interaction is in the
    same spirit as a scientist performing an experiment to discover hidden facts.
    Recent advances in artificial intelligence have yielded machines that can
    achieve superhuman performance in Go, Atari, natural language processing, and
    complex control problems, but it is not clear that these systems can rival the
    scientific intuition of even a young child. In this work we introduce a basic
    set of tasks that require agents to estimate hidden properties such as mass and
    cohesion of objects in an interactive simulated environment where they can
    manipulate the objects and observe the consequences. We found that state-of-the-art
    deep reinforcement learning methods can learn to perform the experiments
    necessary to discover such hidden properties. By systematically manipulating
    the problem difficulty and the cost incurred by the agent for performing
    experiments, we found that agents learn different strategies that balance the
    cost of gathering information against the cost of making mistakes in different
    situations.

    Learning to Act by Predicting the Future

    Alexey Dosovitskiy, Vladlen Koltun
    Comments: ICLR-2017 submission
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

    We present an approach to sensorimotor control in immersive environments. Our
    approach utilizes a high-dimensional sensory stream and a lower-dimensional
    measurement stream. The cotemporal structure of these streams provides a rich
    supervisory signal, which enables training a sensorimotor control model by
    interacting with the environment. The model is trained using supervised
    learning techniques, but without extraneous supervision. It learns to act based
    on raw sensory input from a complex three-dimensional environment. The
    presented formulation allows changing the agent’s goal at test time. We conduct
    extensive experiments in three-dimensional simulations based on the classical
    first-person game Doom. The results demonstrate that the presented approach
    outperforms sophisticated prior formulations, particularly on challenging
    tasks. The results also show that trained models successfully generalize across
    environments and goals. A model trained using the presented approach won the
    Full Deathmatch track of the Visual Doom AI Competition, which was held in
    previously unseen environments.

    A Compare-Aggregate Model for Matching Text Sequences

    Shuohang Wang, Jing Jiang
    Comments: 11 pages, 2 figures
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    Many NLP tasks including machine comprehension, answer selection and text
    entailment require the comparison between sequences. Matching the important
    units between sequences is key to solving these problems. In this paper, we
    present a general “compare-aggregate” framework that performs word-level
    matching followed by aggregation using Convolutional Neural Networks. We
    particularly focus on the different comparison functions we can use to match
    two vectors. We use four different datasets to evaluate the model. We find that
    some simple comparison functions based on element-wise operations can work
    better than standard neural networks and neural tensor networks.
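
    To make the comparison functions concrete, a sketch of the pattern
    (dimensions are illustrative, and the attention step that aligns the second
    sequence to the first is assumed to have already happened): element-wise
    subtraction and multiplication compare the two vectors at every position, and
    a CNN aggregates the results.

        # Sketch of the compare-aggregate pattern with element-wise comparison
        # functions (squared difference and product) plus CNN aggregation.
        import torch
        import torch.nn as nn

        class CompareAggregate(nn.Module):
            def __init__(self, dim, n_classes, kernel=3):
                super().__init__()
                self.conv = nn.Conv1d(2 * dim, 128, kernel, padding=kernel // 2)
                self.out = nn.Linear(128, n_classes)

            def forward(self, a, b):
                # a, b: (batch, seq, dim); b is already aligned to a.
                cmp = torch.cat([(a - b) ** 2, a * b], dim=-1)  # SUB and MULT
                h = torch.relu(self.conv(cmp.transpose(1, 2)))  # aggregate
                return self.out(h.max(dim=2).values)            # pool positions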

    Causes for Query Answers from Databases: Datalog Abduction, View-Updates, and Integrity Constraints

    Leopoldo Bertossi, Babak Salimi
    Comments: Journal submission. Extended version of Flairs’16 and UAI’15 WS on Causality papers
    Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI)

    Causality has been recently introduced in databases, to model, characterize,
    and possibly compute causes for query answers. Connections between QA-causality
    and consistency-based diagnosis and database repairs (wrt. integrity constraint
    violations) have already been established. In this work we establish precise
    connections between QA-causality and both abductive diagnosis and the
    view-update problem in databases, allowing us to obtain new algorithmic and
    complexity results for QA-causality. We also obtain new results on the
    complexity of view-conditioned causality, and investigate the notion of
    QA-causality in the presence of integrity constraints, obtaining complexity
    results from a connection with view-conditioned causality. The abduction
    connection under integrity constraints allows us to obtain algorithmic tools
    for QA-causality.

    Detecting Dependencies in High-Dimensional, Sparse Databases Using Probabilistic Programming and Non-parametric Bayes

    Feras Saad, Vikash Mansinghka
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Sparse databases with hundreds of variables are commonplace. In this setting,
    it is both statistically and computationally challenging to detect true
    predictive relationships between variables and also to suppress false
    positives. This paper proposes a new approach to dependency detection that
    combines probabilistic programming, information theory, and non-parametric
    Bayesian modeling. The key ideas are to (i) build an ensemble of joint
    probability models for the whole database via approximate posterior inference
    in CrossCat, a non-parametric factorial mixture; (ii) identify independencies
    by analyzing model structures; and (iii) report the distribution on conditional
    mutual information induced by posterior uncertainty over the ensemble of
    models. This paper presents experiments showing that the approach finds
    relationships that pairwise correlation misses, including context-specific
    independencies, on databases of mathematics exam scores and global indicators
    of macroeconomic development.

    TopicRNN: A Recurrent Neural Network with Long-Range Semantic Dependency

    Adji B. Dieng, Chong Wang, Jianfeng Gao, John Paisley
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    In this paper, we propose TopicRNN, a recurrent neural network (RNN)-based
    language model designed to directly capture the global semantic meaning
    relating words in a document via latent topics. Because of their sequential
    nature, RNNs are good at capturing the local structure of a word sequence –
    both semantic and syntactic – but might face difficulty remembering long-range
    dependencies. Intuitively, these long-range dependencies are of semantic
    nature. In contrast, latent topic models are able to capture the global
    underlying semantic structure of a document but do not account for word
    ordering. The proposed TopicRNN model integrates the merits of RNNs and latent
    topic models: it captures local (syntactic) dependencies using an RNN and
    global (semantic) dependencies using latent topics. Unlike previous work on
    contextual RNN language modeling, our model is learned end-to-end. Empirical
    results on word prediction show that TopicRNN outperforms existing contextual
    RNN baselines. In addition, TopicRNN can be used as an unsupervised feature
    extractor for documents. We do this for sentiment analysis and report a new
    state-of-the-art error rate on the IMDB movie review dataset that amounts to a
    13.3% improvement over the previous best result. Finally, TopicRNN also
    yields sensible topics, making it a useful alternative to document models such
    as latent Dirichlet allocation.

    A Differentiable Physics Engine for Deep Learning in Robotics

    Jonas Degrave, Michiel Hermans, Joni Dambre, Francis wyffels
    Comments: International Conference on Learning Representations 2017
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Robotics (cs.RO)

    One of the most important fields in robotics is the optimization of
    controllers. Currently, robots are treated as a black box in this optimization
    process, which is the reason why derivative-free optimization methods such as
    evolutionary algorithms or reinforcement learning are omnipresent. We propose
    an implementation of a modern physics engine, which has the ability to
    differentiate control parameters. This has been implemented on both CPU and
    GPU. We show how this speeds up the optimization process, even for small
    problems, and why it will scale to bigger problems. We explain why this is an
    alternative approach to deep Q-learning for using deep learning in robotics.
    Lastly, we argue that this is a big step for deep learning in robotics, as it
    opens up new possibilities to optimize robots, both in hardware and software.

    PGQ: Combining policy gradient and Q-learning

    Brendan O'Donoghue, Remi Munos, Koray Kavukcuoglu, Volodymyr Mnih
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)

    Policy gradient is an efficient technique for improving a policy in a
    reinforcement learning setting. However, vanilla online variants are on-policy
    only and not able to take advantage of off-policy data. In this paper we
    describe a new technique that combines policy gradient with off-policy
    Q-learning, drawing experience from a replay buffer. This is motivated by
    making a connection between the fixed points of the regularized policy gradient
    algorithm and the Q-values. This connection allows us to estimate the Q-values
    from the action preferences of the policy, to which we apply Q-learning
    updates. We refer to the new technique as ‘PGQ’, for policy gradient and
    Q-learning. We also establish an equivalency between action-value fitting
    techniques and actor-critic algorithms, showing that regularized policy
    gradient techniques can be interpreted as advantage function learning
    algorithms. We conclude with some numerical examples that demonstrate improved
    data efficiency and stability of PGQ. In particular, we tested PGQ on the full
    suite of Atari games and achieved performance exceeding that of both
    asynchronous advantage actor-critic (A3C) and Q-learning.

    Neural Architecture Search with Reinforcement Learning

    Barret Zoph, Quoc V. Le
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    Neural networks are powerful and flexible models that work well for many
    difficult learning tasks in image, speech and natural language understanding.
    Despite their success, neural networks are still hard to design. In this paper,
    we use a recurrent network to generate the model descriptions of neural
    networks and train this RNN with reinforcement learning to maximize the
    expected accuracy of the generated architectures on a validation set. On the
    CIFAR-10 dataset, our method, starting from scratch, can design a novel network
    architecture that rivals the best human-invented architecture in terms of test
    set accuracy. Our CIFAR-10 model achieves a test error rate of 3.84 percent,
    which is only 0.1 percent worse than, and 1.2x faster than, the current
    state-of-the-art model.
    On the Penn Treebank dataset, our model can compose a novel recurrent cell that
    outperforms the widely-used LSTM cell, and other state-of-the-art baselines.
    Our cell achieves a test set perplexity of 62.4 on the Penn Treebank, which is
    3.6 perplexity better than the previous state-of-the-art.

    QBF Solving by Counterexample-guided Expansion

    Roderick Bloem, Nicolas Braud-Santoni, Vedad Hadzic
    Comments: pre-print, submitted at TACAS 2017
    Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI)

    We introduce a novel generalization of Counterexample-Guided Inductive
    Synthesis (CEGIS) and instantiate it to yield a novel, competitive algorithm
    for solving Quantified Boolean Formulas (QBF). Current QBF solvers based on
    counterexample-guided expansion use a recursive approach which scales poorly
    with the number of quantifier alternations. Our generalization of CEGIS removes
    the need for this recursive approach, and we instantiate it to yield a simple
    and efficient algorithm for QBF solving. Lastly, we support this research with
    a competitive, though straightforward, implementation of the algorithm, making
    it possible to study the practical impact of our algorithm design decisions,
    along with various optimizations.

    Topology and Geometry of Deep Rectified Network Optimization Landscapes

    C. Daniel Freeman, Joan Bruna
    Comments: 24 Pages (12 main + 12 Appendix), 4 Figures, 1 Table, Submitted to ICLR for review
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Statistics Theory (math.ST)

    The loss surface of deep neural networks has recently attracted interest in
    the optimization and machine learning communities as a prime example of
    a high-dimensional non-convex problem. Some insights were recently gained using
    spin glass models and mean-field approximations, but at the expense of strongly
    simplifying the nonlinear nature of the model.

    In this work, we do not make any such assumption and study conditions on the
    data distribution and model architecture that prevent the existence of bad
    local minima. Our theoretical work quantifies and formalizes two important
    “folklore” facts: (i) the landscape of deep linear networks has a
    radically different topology from that of deep half-rectified ones, and (ii)
    that the energy landscape in the non-linear case is fundamentally controlled by
    the interplay between the smoothness of the data distribution and model
    over-parametrization. These results are in accordance with empirical practice
    and recent literature.

    The conditioning of gradient descent is the next challenge we address. We
    study this question through the geometry of the level sets, and we introduce an
    algorithm to efficiently estimate the regularity of such sets on large-scale
    networks. Our empirical results show that these level sets remain connected
    throughout the learning phase, suggesting near-convex behavior, but they
    become exponentially more curved as the energy level decays, in accordance
    with what is observed in practice with very low curvature attractors.


    Information Retrieval

    EpistAid: Interactive Interface for Document Filtering in Evidence-based Health Care

    Ivania Donoso, Denis Parra
    Comments: 5 pages, 4 figures, pre-print submitted to ACM IUI 2017
    Subjects: Information Retrieval (cs.IR); Human-Computer Interaction (cs.HC)

    Evidence-based health care (EBHC) is an important practice of medicine which
    attempts to provide systematic scientific evidence to answer clinical
    questions. In this context, Epistemonikos (www.epistemonikos.org) is one of the
    first and most important online systems in the field, providing an interface
    that supports users on searching and filtering scientific articles for
    practicing EBHC. The system nowadays requires a large amount of expert human
    effort, where close to 500 physicians manually curate articles to be utilized
    in the platform. In order to cope with the large and continuous stream of data
    and keep the system updated, we introduce EpistAid, an interactive intelligent
    interface which supports clinicians in the process of curating documents for
    Epistemonikos within lists of papers called evidence matrices. We introduce the
    characteristics, design and algorithms of our solution, as well as a prototype
    implementation and a case study to show how our solution addresses the
    information overload problem in this area.

    Item-to-item recommendation based on Contextual Fisher Information

    Bálint Daróczy, Frederick Ayala-Gómez, András Benczúr
    Comments: 9 pages, 8 figures, 4 tables
    Subjects: Information Retrieval (cs.IR)

    Web recommendation services bear great importance in e-commerce, as they aid
    the user in navigating through the items that are most relevant to her needs.
    In a typical Web site, long history of previous activities or purchases by the
    user is rarely available. Hence in most cases, recommenders propose items that
    are similar to the most recent ones viewed in the current user session. The
    corresponding task is called session based item-to-item recommendation. For
    frequent items, it is easy to present item-to-item recommendations by “people
    who viewed this, also viewed” lists. However, most of the items belong to the
    long tail, where previous actions are sparsely available. Another difficulty is
    the so-called cold start problem, when the item has recently appeared and had
    no time yet to accumulate sufficient number of transactions. In order to
    recommend a next item in a session in sparse or cold start situations, we also
    have to incorporate item similarity models. In this paper we describe a
    probabilistic similarity model based on Random Fields to approximate
    item-to-item transition probabilities. We give a generative model for the item
    interactions based on arbitrary distance measures over the items including
    explicit and implicit ratings and external metadata. The model may change over
    time to better fit recent events and recommend the next item based on the updated
    Fisher Information. Our new model outperforms both simple similarity baseline
    methods and recent item-to-item recommenders, under several different
    performance metrics and publicly available data sets. We reach significant
    gains in particular for recommending a new item following a rare item.

    Multiround Private Information Retrieval: Capacity and Storage Overhead

    Hua Sun, Syed A. Jafar
    Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR); Information Retrieval (cs.IR)

    The capacity has recently been characterized for the private information
    retrieval (PIR) problem as well as several of its variants. In every case it is
    assumed that all the queries are generated by the user simultaneously. Here we
    consider multiround PIR, where the queries in each round are allowed to depend
    on the answers received in previous rounds. We show that the capacity of
    multiround PIR is the same as the capacity of single-round PIR (the result is
    generalized to also include T-privacy constraints). Combined with previous
    results, this shows that there is no capacity advantage from multiround over
    single-round schemes, non-linear over linear schemes, or from epsilon-error
    over zero-error schemes. However, we show through an example that there is an
    advantage in terms of storage overhead. We provide an example of a multiround,
    non-linear, epsilon-error PIR scheme that requires a strictly smaller
    storage overhead than the best possible with single-round, linear, zero-error
    PIR schemes.

    Self-Wiring Question Answering Systems

    Ricardo Usbeck, Jonathan Huthmann, Nico Duldhardt, Axel-Cyrille Ngonga Ngomo
    Comments: 6 pages, 1 figure, pre-print in lncs
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)

    Question answering (QA) has been the subject of a resurgence over the past
    years. The said resurgence has led to a multitude of question answering (QA)
    systems being developed both by companies and research facilities. While a few
    components of QA systems get reused across implementations, most systems do not
    leverage the full potential of component reuse. Hence, the development of QA
    systems is currently still a tedious and time-consuming process. We address the
    challenge of accelerating the creation of novel or tailored QA systems by
    presenting a concept for a self-wiring approach to composing QA systems. Our
    approach will allow the reuse of existing, web-based QA systems or modules
    while developing new QA platforms. To this end, it will rely on QA modules
    being described using the Web Ontology Language. Based on these descriptions,
    our approach will be able to automatically compose QA systems using a
    data-driven approach.

    Scalable Holistic Analysis of Multi-Source, Data-Intensive Problems Using Multilayered Networks

    Abhishek Santra, Sanjukta Bhowmick, Sharma Chakravarthy
    Subjects: Databases (cs.DB); Discrete Mathematics (cs.DM); Information Retrieval (cs.IR); Social and Information Networks (cs.SI)

    Holistic analysis of many real-world problems is based on data collected
    from multiple sources, each contributing to some aspect of that problem. The
    word fusion has also been used in the literature for such problems involving
    disparate data types. Holistically understanding traffic patterns, causes of
    accidents, bombings, terrorist planning, and many natural phenomena such as
    storms and earthquakes falls into this category. Some may have real-time
    requirements and some may need to be analyzed after the fact (post-mortem or
    forensic analysis). What is common to all these problems is the amount
    and types of data associated with the event. Data may also be incomplete and
    the trustworthiness of sources may vary. Currently, manual and ad-hoc
    approaches are used in aggregating data in different ways for analyzing and
    understanding these problems.

    In this paper, we approach this problem in a novel way using multilayered
    networks. We identify features of a central event and propose a network layer
    for each feature. This approach allows us to study the effect of each feature
    independently and its impact on the event. We also establish that the proposed
    approach allows us to compose these features in arbitrary ways (without loss of
    information) to analyze their combined effect. Additionally, formulation of
    relationships (e.g., distance measure for a single feature instead of several
    at the same time) is simpler. Further, computations can be done once on each
    layer in this approach and reused for mixing and matching the features for
    aggregate impacts and “what if” scenarios to understand the problem
    holistically. This has been demonstrated by recreating the communities for the
    AND-Composed network by using the communities of the individual layers.

    We believe that techniques proposed here make an important contribution to
    the nascent yet fast growing area of data fusion.


    Computation and Language

    Gaussian Attention Model and Its Application to Knowledgebase Embedding and Question Answering

    Liwen Zhang, John Winn, Ryota Tomioka
    Comments: 12 pages, 2 figures
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    We propose the Gaussian attention model for content-based neural memory
    access. With the proposed attention model, a neural network has the additional
    degree of freedom to control the focus of its attention, from laser-sharp
    attention to blurred attention. It is applicable whenever we can assume that
    the distance in the latent space reflects some notion of semantics. We use the
    proposed attention model as a scoring function for the embedding of a
    knowledgebase into a continuous vector space and then train a model that
    performs question answering about the entities in the knowledgebase. The
    proposed attention model can handle both the propagation of uncertainty when
    following a series of relations and also the conjunction of thoughts in a
    natural way. On a dataset of soccer players who participated in the FIFA World
    Cup 2014, we demonstrate that our model can handle both path queries and
    conjunctive queries well.

    Building a comprehensive syntactic and semantic corpus of Chinese clinical texts

    Bin He, Bin Dong, Yi Guan, Jinfeng Yang, Zhipeng Jiang, Qiubin Yu, Jianyi Cheng, Chunyan Qu
    Comments: 27 pages
    Subjects: Computation and Language (cs.CL)

    Objective: To build a comprehensive corpus covering syntactic and semantic
    annotations of Chinese clinical texts with corresponding annotation guidelines
    and methods as well as to develop tools trained on the annotated corpus, which
    supplies baselines for research on Chinese texts in the clinical domain.
    Materials and methods: An iterative annotation method was proposed to train
    annotators and to develop annotation guidelines. Then, by using annotation
    quality assurance measures, a comprehensive corpus was built, containing
    annotations of part-of-speech (POS) tags, syntactic tags, entities, assertions,
    and relations. Inter-annotator agreement (IAA) was calculated to evaluate the
    annotation quality and a Chinese clinical text processing and information
    extraction system (CCTPIES) was developed based on our annotated corpus.
    Results: The syntactic corpus consists of 138 Chinese clinical documents with
    47,424 tokens and 2553 full parsing trees, while the semantic corpus includes
    992 documents with 39,511 annotated entities, their assertions, and 7695
    relations. IAA evaluation shows that this comprehensive corpus is of good
    quality, and the system modules are effective. Discussion: The annotated corpus
    makes a considerable contribution to natural language processing (NLP) research
    into Chinese texts in the clinical domain. However, this corpus has a number of
    limitations. Some additional types of clinical text should be introduced to
    improve corpus coverage and active learning methods should be utilized to
    promote annotation efficiency. Conclusions: In this study, several annotation
    guidelines and an annotation method for Chinese clinical texts were proposed,
    and a comprehensive corpus with its NLP modules were constructed, providing a
    foundation for further study of applying NLP techniques to Chinese texts in the
    clinical domain.

    :telephone::person::sailboat::whale::okhand:; or "Call me Ishmael" – How do you translate emoji?

    Will Radford, Andrew Chisholm, Ben Hachey, Bo Han
    Subjects: Computation and Language (cs.CL)

    We report on an exploratory analysis of Emoji Dick, a project that leverages
    crowdsourcing to translate Melville’s Moby Dick into emoji. This distinctive
    use of emoji removes textual context and leads to varying translation
    quality. In this paper, we use statistical word alignment and part-of-speech
    tagging to explore how people use emoji. Despite these simple methods, we
    observed differences in token and part-of-speech distributions. Experiments
    also suggest that semantics are preserved in the translation, and repetition is
    more common in emoji.

    Presenting a New Dataset for the Timeline Generation Problem

    Xavier Holt, Will Radford, Ben Hachey
    Subjects: Computation and Language (cs.CL)

    The timeline generation task summarises an entity’s biography by selecting
    stories representing key events from a large pool of relevant documents. This
    paper addresses the lack of a standard dataset and evaluative methodology for
    the problem. We present and make publicly available a new dataset of 18,793
    news articles covering 39 entities. For each entity, we provide a gold standard
    timeline and a set of entity-related articles. We propose ROUGE as an
    evaluation metric and validate our dataset by showing that top Google results
    outperform straw-man baselines.

    Keyphrase Annotation with Graph Co-Ranking

    Adrien Bougouin, Florian Boudin, Béatrice Daille
    Comments: Accepted at the COLING 2016 conference
    Subjects: Computation and Language (cs.CL)

    Keyphrase annotation is the task of identifying textual units that represent
    the main content of a document. Keyphrase annotation is either carried out by
    extracting the most important phrases from a document, keyphrase extraction, or
    by assigning entries from a controlled domain-specific vocabulary, keyphrase
    assignment. Assignment methods are generally more reliable. They provide
    better-formed keyphrases, as well as keyphrases that do not occur in the
    document. But they are often silent (they fail to assign any keyphrase), unlike
    extraction methods, which do not depend on manually built resources. This paper
    proposes a new method to perform both keyphrase extraction and keyphrase
    assignment in an integrated and mutually reinforcing manner. Experiments have
    been carried out on datasets
    covering different domains of humanities and social sciences. They show
    statistically significant improvements compared to both keyphrase extraction
    and keyphrase assignment state-of-the-art methods.

    AC-BLSTM: Asymmetric Convolutional Bidirectional LSTM Networks for Text Classification

    Depeng Liang, Yongdong Zhang
    Comments: 7 pages
    Subjects: Computation and Language (cs.CL)

    Recently, deep learning models have been shown to achieve remarkable
    performance in sentence and document classification tasks. In this work, we
    propose a novel framework called AC-BLSTM for modeling sentences and
    documents, which combines the asymmetric convolutional neural network (ACNN)
    with the Bidirectional Long Short-Term Memory network (BLSTM). Experimental
    results demonstrate that our model achieves state-of-the-art results on all
    six tasks, including sentiment analysis, question type classification, and
    subjectivity classification.
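
    The asymmetric convolution alluded to factorizes a k x k convolution into a
    1 x k convolution followed by a k x 1 convolution. A sketch with illustrative
    sizes (not the published configuration):

        # Sketch: asymmetric convolution block feeding a bidirectional LSTM.
        import torch
        import torch.nn as nn

        class ACBLSTM(nn.Module):
            def __init__(self, emb_dim, n_classes, k=3, channels=64):
                super().__init__()
                self.conv = nn.Sequential(
                    nn.Conv2d(1, channels, (1, k), padding=(0, k // 2)),
                    nn.ReLU(),
                    nn.Conv2d(channels, channels, (k, 1), padding=(k // 2, 0)),
                    nn.ReLU(),
                )
                self.blstm = nn.LSTM(channels, 128, bidirectional=True,
                                     batch_first=True)
                self.out = nn.Linear(256, n_classes)

            def forward(self, emb):                  # emb: (B, seq, emb_dim)
                h = self.conv(emb.unsqueeze(1))      # (B, C, seq, emb_dim)
                h = h.mean(dim=3)                    # pool the embedding axis
                h, _ = self.blstm(h.transpose(1, 2)) # (B, seq, 256)
                return self.out(h[:, -1])            # last time step

    The factorization keeps the receptive field of a square kernel while using
    fewer parameters, which is the usual motivation for asymmetric convolutions.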

    Neural Machine Translation with Reconstruction

    Zhaopeng Tu, Yang Liu, Lifeng Shang, Xiaohua Liu, Hang Li
    Subjects: Computation and Language (cs.CL)

    Although end-to-end Neural Machine Translation (NMT) has achieved remarkable
    progress in the past two years, it suffers from a major drawback: translations
    generated by NMT systems often lack adequacy. It has been widely observed
    that NMT tends to repeatedly translate some source words while mistakenly
    ignoring other words. To alleviate this problem, we propose a novel
    encoder-decoder-reconstructor framework for NMT. The reconstructor,
    incorporated into the NMT model, manages to reconstruct the input source
    sentence from the hidden layer of the output target sentence, to ensure that
    the information in the source side is transformed to the target side as much as
    possible. Experiments show that the proposed framework significantly improves
    the adequacy of NMT output and achieves superior translation results over
    state-of-the-art NMT and statistical MT systems.

    Truth Discovery with Memory Network

    Luyang Li, Bing Qin, Wenjing Ren, Ting Liu
    Comments: 10 pages, 2 figures
    Subjects: Computation and Language (cs.CL); Databases (cs.DB)

    Truth discovery is to resolve conflicts and find the truth from
    multiple-source statements. Conventional methods mostly build on the
    mutual effect between the reliability of sources and the credibility of
    statements, but pay no attention to the mutual effect among the
    credibilities of statements about the same object. We propose memory network
    based models to incorporate these two ideas to do the truth discovery. We use
    feedforward memory network and feedback memory network to learn the
    representation of the credibility of statements which are about the same
    object. Specifically, we adopt a memory mechanism to learn source reliability and
    use it through truth prediction. During learning models, we use multiple types
    of data (categorical data and continuous data) by assigning different weights
    automatically in the loss function based on their own effect on truth discovery
    prediction. The experimental results show that the memory network based models
    substantially outperform the state-of-the-art method and other baseline methods.

    Latent Attention For If-Then Program Synthesis

    Xinyun Chen, Chang Liu, Richard Shin, Dawn Song, Mingcheng Chen
    Comments: Accepted by NIPS 2016
    Subjects: Computation and Language (cs.CL)

    Automatic translation from natural language descriptions into programs is a
    longstanding challenging problem. In this work, we consider a simple yet
    important sub-problem: translation from textual descriptions to If-Then
    programs. We devise a novel neural network architecture for this task which we
    train end-to-end. Specifically, we introduce Latent Attention, which computes
    multiplicative weights for the words in the description in a two-stage process
    with the goal of better leveraging the natural language structures that
    indicate the relevant parts for predicting program elements. Our architecture
    reduces the error rate by 28.57% compared to prior art. We also propose a
    one-shot learning scenario of If-Then program synthesis and simulate it with
    our existing dataset. We demonstrate a variation on the training procedure for
    this scenario that outperforms the original procedure, significantly closing
    the gap to the model trained with all data.

    Hierarchical Question Answering for Long Documents

    Eunsol Choi, Daniel Hewlett, Alexandre Lacoste, Illia Polosukhin, Jakob Uszkoreit, Jonathan Berant
    Subjects: Computation and Language (cs.CL)

    Reading an article and answering questions about its content is a fundamental
    task for natural language understanding. While most successful neural
    approaches to this problem rely on recurrent neural networks (RNNs), training
    RNNs over long documents can be prohibitively slow. We present a novel
    framework for question answering that can efficiently scale to longer documents
    while maintaining or even improving performance. Our approach combines a
    coarse, inexpensive model for selecting one or more relevant sentences and a
    more expensive RNN that produces the answer from those sentences. A central
    challenge is the lack of intermediate supervision for the coarse model, which
    we address using reinforcement learning. Experiments demonstrate
    state-of-the-art performance on a challenging subset of the WIKIREADING
    dataset (Hewlett et al., 2016) and on a newly-gathered dataset, while reducing
    the number of sequential RNN steps by 88% against a standard
    sequence-to-sequence model.

    Domain Adaptation For Formant Estimation Using Deep Learning

    Yehoshua Dissen, Joseph Keshet, Jacob Goldberger, Cynthia Clopper
    Subjects: Computation and Language (cs.CL); Sound (cs.SD)

    In this paper we present a domain adaptation technique for formant estimation
    using a deep network. We first train a deep learning network on a small read
    speech dataset. We then freeze the parameters of the trained network and use
    several different datasets to train an adaptation layer that makes the obtained
    network universal in the sense that it works well for a variety of speakers and
    speech domains with very different characteristics. We evaluated our adapted
    network on three datasets, each of which has different speaker characteristics
    and speech styles. The performance of our method compares favorably with
    alternative methods for formant estimation.

    A Compare-Aggregate Model for Matching Text Sequences

    Shuohang Wang, Jing Jiang
    Comments: 11 pages, 2 figures
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    Many NLP tasks including machine comprehension, answer selection and text
    entailment require comparing sequences. Matching the important units
    between sequences is key to solving these problems. In this paper, we
    present a general “compare-aggregate” framework that performs word-level
    matching followed by aggregation using Convolutional Neural Networks. We
    particularly focus on the different comparison functions we can use to match
    two vectors. We use four different datasets to evaluate the model. We find that
    some simple comparison functions based on element-wise operations can work
    better than standard neural network and neural tensor network models.
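
    For intuition, a minimal sketch of the word-level matching step: two of
    the simple element-wise comparison functions the abstract alludes to
    (names and shapes here are illustrative), with mean-pooling standing in
    for the paper's CNN aggregation.

        import numpy as np

        def compare_sub(a, b):
            return (a - b) * (a - b)   # squared element-wise difference

        def compare_mult(a, b):
            return a * b               # element-wise product

        rng = np.random.default_rng(1)
        A = rng.normal(size=(5, 4))    # sequence 1: 5 words, dim 4
        B = rng.normal(size=(5, 4))    # sequence 2, aligned to sequence 1

        matched = np.concatenate([compare_sub(A, B), compare_mult(A, B)], axis=1)
        features = matched.mean(axis=0)   # "aggregate"; a CNN in the paper
        print(features.shape)             # (8,)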

    Deep Biaffine Attention for Neural Dependency Parsing

    Timothy Dozat, Christopher D. Manning
    Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

    While deep learning parsing approaches have proven very successful at finding
    the structure of sentences, most neural dependency parsers use neural networks
    only for feature extraction, and then use those features in traditional parsing
    algorithms. In contrast, this paper builds off recent work using
    general-purpose neural network components, training an attention mechanism over
    an LSTM to attend to the head of the phrase. We get state-of-the-art results
    for standard dependency parsing benchmarks, achieving 95.44% UAS and 93.76% LAS
    on the PTB dataset, 0.8% and 1.0% improvement, respectively, over Andor et al.
    (2016). In addition to proposing a new parsing architecture using
    dimensionality reduction and biaffine interactions, we examine simple
    hyperparameter choices that had a profound influence on the model’s
    performance, such as reducing the value of beta2 in the Adam optimization
    algorithm.
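
    The biaffine scorer itself is compact enough to show: each
    (dependent, head) pair receives a bilinear term plus a head-only prior
    score. A hedged NumPy sketch, with random matrices standing in for the
    dimensionality-reduced LSTM states:

        import numpy as np

        # score(i, j) = dep_i^T U head_j + head_j . w  (a biaffine form)
        rng = np.random.default_rng(2)
        n, d = 7, 5                       # 7 words, reduced dimension 5
        H_dep = rng.normal(size=(n, d))   # dependent representations
        H_head = rng.normal(size=(n, d))  # head representations
        U = rng.normal(size=(d, d))
        w = rng.normal(size=d)

        scores = H_dep @ U @ H_head.T + H_head @ w  # (n, n); row i scores word i's heads
        pred_heads = scores.argmax(axis=1)
        print(pred_heads)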

    Words or Characters? Fine-grained Gating for Reading Comprehension

    Zhilin Yang, Bhuwan Dhingra, Ye Yuan, Junjie Hu, William W. Cohen, Ruslan Salakhutdinov
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Previous work combines word-level and character-level representations using
    concatenation or scalar weighting, which is suboptimal for high-level tasks
    like reading comprehension. We present a fine-grained gating mechanism to
    dynamically combine word-level and character-level representations based on
    properties of the words. We also extend the idea of fine-grained gating to
    modeling the interaction between questions and paragraphs for reading
    comprehension. Experiments show that our approach can improve the performance
    on reading comprehension tasks, achieving new state-of-the-art results on the
    Children’s Book Test dataset. To demonstrate the generality of our gating
    mechanism, we also show improved results on a social media tag prediction task.
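
    A minimal sketch of the gate: a per-dimension sigmoid, computed from
    properties of the word, interpolates between the word-level and
    character-level vectors. A plain linear map of the word embedding stands
    in here for the word-property features the abstract refers to.

        import numpy as np

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        rng = np.random.default_rng(3)
        d = 6
        word_vec = rng.normal(size=d)   # word-level embedding
        char_vec = rng.normal(size=d)   # character-level embedding
        W, b = rng.normal(size=(d, d)), np.zeros(d)

        g = sigmoid(W @ word_vec + b)             # fine-grained gate in (0,1)^d
        h = g * char_vec + (1.0 - g) * word_vec   # dynamic combination
        print(h.round(3))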

    TopicRNN: A Recurrent Neural Network with Long-Range Semantic Dependency

    Adji B. Dieng, Chong Wang, Jianfeng Gao, John Paisley
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    In this paper, we propose TopicRNN, a recurrent neural network (RNN)-based
    language model designed to directly capture the global semantic meaning
    relating words in a document via latent topics. Because of their sequential
    nature, RNNs are good at capturing the local structure of a word sequence –
    both semantic and syntactic – but might face difficulty remembering long-range
    dependencies. Intuitively, these long-range dependencies are of semantic
    nature. In contrast, latent topic models are able to capture the global
    underlying semantic structure of a document but do not account for word
    ordering. The proposed TopicRNN model integrates the merits of RNNs and latent
    topic models: it captures local (syntactic) dependencies using an RNN and
    global (semantic) dependencies using latent topics. Unlike previous work on
    contextual RNN language modeling, our model is learned end-to-end. Empirical
    results on word prediction show that TopicRNN outperforms existing contextual
    RNN baselines. In addition, TopicRNN can be used as an unsupervised feature
    extractor for documents. We do this for sentiment analysis and report a new
    state-of-the-art error rate on the IMDB movie review dataset that amounts
    to a 13.3% improvement over the previous best result. Finally, TopicRNN also
    yields sensible topics, making it a useful alternative to document models such
    as latent Dirichlet allocation.
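
    The integration can be pictured at the output layer: the next-word logits
    are the RNN's logits plus a topic contribution that is switched off for
    stop words. A toy NumPy sketch under that reading of the abstract; sizes
    and the stop-word flag are illustrative.

        import numpy as np

        def softmax(x):
            e = np.exp(x - x.max())
            return e / e.sum()

        rng = np.random.default_rng(4)
        V_size, h_dim, k = 10, 8, 3          # vocab, hidden, topics (toy)
        h_t = rng.normal(size=h_dim)         # RNN hidden state at step t
        theta = softmax(rng.normal(size=k))  # document's latent topic vector
        V = rng.normal(size=(V_size, h_dim))
        B = rng.normal(size=(V_size, k))
        is_stopword = 0                      # 1 would silence the topic term

        logits = V @ h_t + (1 - is_stopword) * (B @ theta)
        p_next_word = softmax(logits)
        print(p_next_word.round(3))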

    Reference-Aware Language Models

    Zichao Yang, Phil Blunsom, Chris Dyer, Wang Ling
    Comments: ICLR submission
    Subjects: Computation and Language (cs.CL)

    We propose a general class of language models that treat reference as an
    explicit stochastic latent variable. This architecture allows models to create
    mentions of entities and their attributes by accessing external databases
    (required by, e.g., dialogue generation and recipe generation) and internal
    state (required by, e.g., language models that are aware of coreference).
    This facilitates the incorporation of information that can be accessed in
    predictable locations in databases or discourse context, even when the
    targets of the reference may be rare words. Experiments on three tasks
    show that our model variants outperform models based on deterministic
    attention.

    Dynamic Coattention Networks For Question Answering

    Caiming Xiong, Victor Zhong, Richard Socher
    Comments: 13 pages, 7 figures
    Subjects: Computation and Language (cs.CL)

    Several deep learning models have been proposed for question answering.
    However, due to their single-pass nature, they have no way to recover from
    local maxima corresponding to incorrect answers. To address this problem, we
    introduce the Dynamic Coattention Network (DCN) for question answering. The DCN
    first fuses co-dependent representations of the question and the document in
    order to focus on relevant parts of both. Then a dynamic pointing decoder
    iterates over potential answer spans. This iterative procedure enables the
    model to recover from initial local maxima corresponding to incorrect answers.
    On the Stanford question answering dataset, a single DCN model improves the
    previous state of the art from 71.0% F1 to 75.9%, while a DCN ensemble obtains
    80.4% F1.

    Bidirectional Attention Flow for Machine Comprehension

    Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, Hannaneh Hajishirzi
    Subjects: Computation and Language (cs.CL)

    Machine Comprehension (MC), answering questions about a given context,
    requires modeling complex interactions between the context and the query.
    Recently, attention mechanisms have been successfully extended to MC.
    Typically these mechanisms use attention to summarize the query and
    context into a single vector, couple attentions temporally, and often form
    a unidirectional attention. In this paper we introduce the Bidirectional
    Attention Flow (BIDAF)
    Model, a multi-stage hierarchical process that represents the context at
    different levels of granularity and uses a bi-directional attention flow
    mechanism to achieve a query-aware context representation without early
    summarization. Our experimental evaluations show that our model achieves
    state-of-the-art results on the Stanford QA (SQuAD) and CNN/DailyMail
    Cloze Test datasets.

    A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks

    Kazuma Hashimoto, Caiming Xiong, Yoshimasa Tsuruoka, Richard Socher
    Comments: Under review as a conference paper at ICLR 2017
    Subjects: Computation and Language (cs.CL)

    Transfer and multi-task learning have traditionally focused on either a
    single source-target pair or very few, similar tasks. Ideally, the linguistic
    levels of morphology, syntax and semantics would benefit each other by being
    trained in a single model. We introduce such a joint many-task model together
    with a strategy for successively growing its depth to solve increasingly
    complex tasks. All layers include shortcut connections to both word
    representations and lower-level task predictions. We use a simple
    regularization term to allow optimizing all model weights to improve one
    task’s loss without exhibiting catastrophic interference with the other tasks.
    Our single end-to-end trainable model obtains state-of-the-art results on
    chunking, dependency parsing, semantic relatedness and textual entailment. It
    also performs competitively on POS tagging. Our dependency parsing layer relies
    only on a single feed-forward pass and does not require a beam search.

    Automated Generation of Multilingual Clusters for the Evaluation of Distributed Representations

    Philip Blair, Yuval Merhav, Joel Barry
    Comments: 12 pages
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    We propose a language-agnostic way of automatically generating sets of
    semantically similar clusters of entities along with sets of “outlier”
    elements, which may then be used to perform an intrinsic evaluation of word
    embeddings in the outlier detection task. We used our methodology to create a
    gold-standard dataset, which we call WikiSem500, and evaluated multiple
    state-of-the-art embeddings. The results show a correlation between performance
    on this dataset and performance on sentiment analysis.

    Self-Wiring Question Answering Systems

    Ricardo Usbeck, Jonathan Huthmann, Nico Duldhardt, Axel-Cyrille Ngonga Ngomo
    Comments: 6 pages, 1 figure, pre-print in lncs
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)

    Question answering (QA) has seen a resurgence over the past few years. This
    resurgence has led to a multitude of question answering (QA)
    systems being developed both by companies and research facilities. While a few
    components of QA systems get reused across implementations, most systems do not
    leverage the full potential of component reuse. Hence, the development of QA
    systems is currently still a tedious and time-consuming process. We address the
    challenge of accelerating the creation of novel or tailored QA systems by
    presenting a concept for a self-wiring approach to composing QA systems. Our
    approach will allow the reuse of existing, web-based QA systems or modules
    while developing new QA platforms. To this end, it will rely on QA modules
    being described using the Web Ontology Language. Based on these descriptions,
    our approach will be able to compose QA systems automatically using a
    data-driven approach.

    Beyond Fine Tuning: A Modular Approach to Learning on Small Data

    Ark Anderson, Kyle Shaffer, Artem Yankov, Court D. Corley, Nathan O. Hodas
    Subjects: Learning (cs.LG); Computation and Language (cs.CL)

    In this paper we present a technique to train neural network models on small
    amounts of data. Current methods for training neural networks on small amounts
    of rich data typically rely on strategies such as fine-tuning a pre-trained
    neural network or the use of domain-specific hand-engineered features. Here we
    take the approach of treating network layers, or entire networks, as modules
    and combine pre-trained modules with untrained modules, to learn the shift in
    distributions between data sets. The central impact of using a modular approach
    comes from adding new representations to a network, as opposed to replacing
    representations via fine-tuning. Using this technique, we are able to surpass
    results using standard fine-tuning transfer learning approaches, and we are
    also able to significantly increase performance over such approaches when using
    smaller amounts of data.

    LipNet: Sentence-level Lipreading

    Yannis M. Assael, Brendan Shillingford, Shimon Whiteson, Nando de Freitas
    Subjects: Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

    Lipreading is the task of decoding text from the movement of a speaker’s
    mouth. Traditional approaches separated the problem into two stages: designing
    or learning visual features, and prediction. More recent deep lipreading
    approaches are end-to-end trainable (Wand et al., 2016; Chung & Zisserman,
    2016a). All existing works, however, perform only word classification, not
    sentence-level sequence prediction. Studies have shown that human lipreading
    performance increases for longer words (Easton & Basala, 1982), indicating the
    importance of features capturing temporal context in an ambiguous communication
    channel. Motivated by this observation, we present LipNet, a model that maps a
    variable-length sequence of video frames to text, making use of spatiotemporal
    convolutions, an LSTM recurrent network, and the connectionist temporal
    classification loss, trained entirely end-to-end. To the best of our knowledge,
    LipNet is the first lipreading model to operate at the sentence level, using a
    single end-to-end speaker-independent deep model to simultaneously learn
    spatiotemporal visual features and a sequence model. On the GRID corpus, LipNet
    achieves 93.4% accuracy, outperforming experienced human lipreaders and the
    previous 79.6% state-of-the-art accuracy.

    Quasi-Recurrent Neural Networks

    James Bradbury, Stephen Merity, Caiming Xiong, Richard Socher
    Comments: Submitted to conference track at ICLR 2017
    Subjects: Neural and Evolutionary Computing (cs.NE); Computation and Language (cs.CL); Learning (cs.LG)

    Recurrent neural networks are a powerful tool for modeling sequential data,
    but the dependence of each timestep’s computation on the previous timestep’s
    output limits parallelism and makes RNNs unwieldy for very long sequences. We
    introduce quasi-recurrent neural networks (QRNNs), an approach to neural
    sequence modeling that alternates convolutional layers, which apply in parallel
    across timesteps, and a minimalist recurrent pooling function that applies in
    parallel across channels. Despite lacking trainable recurrent layers, stacked
    QRNNs have better predictive accuracy than stacked LSTMs of the same hidden
    size. Due to their increased parallelism, they are up to 16 times faster at
    train and test time. Experiments on language modeling, sentiment
    classification, and character-level neural machine translation demonstrate
    these advantages and underline the viability of QRNNs as a basic building block
    for a variety of sequence tasks.
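
    The recurrent pooling is cheap enough to write out: the gates come from
    convolutions that run in parallel over all timesteps, and only an
    element-wise recurrence remains sequential. A NumPy sketch of fo-pooling,
    with random arrays standing in for the convolution outputs:

        import numpy as np

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        rng = np.random.default_rng(5)
        T, d = 6, 4
        Z = np.tanh(rng.normal(size=(T, d)))   # candidate vectors (from conv)
        F = sigmoid(rng.normal(size=(T, d)))   # forget gates (from conv)
        O = sigmoid(rng.normal(size=(T, d)))   # output gates (from conv)

        c = np.zeros(d)
        H = np.zeros((T, d))
        for t in range(T):                      # sequential over time only,
            c = F[t] * c + (1.0 - F[t]) * Z[t]  # parallel across channels
            H[t] = O[t] * c
        print(H.round(3))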


    Distributed, Parallel, and Cluster Computing

    Cooperative Simultaneous Localization and Synchronization in Mobile Agent Networks

    Bernhard Etzlinger, Florian Meyer, Franz Hlawatsch, Andreas Springer, Henk Wymeersch
    Comments: 13 pages, 6 figures, 3 tables; manuscript submitted to IEEE Transactions on Signal Processing
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Systems and Control (cs.SY)

    Cooperative localization in agent networks based on interagent time-of-flight
    measurements is closely related to synchronization. To leverage this relation,
    we propose a Bayesian factor graph framework for cooperative simultaneous
    localization and synchronization (CoSLAS). This framework is suited to mobile
    agents and time-varying local clock parameters. Building on the CoSLAS factor
    graph, we develop a distributed (decentralized) belief propagation algorithm
    for CoSLAS in the practically important case of an affine clock model and
    asymmetric time stamping. Our algorithm allows for real-time operation and is
    suitable for time-varying network connectivity. To achieve high accuracy at
    reduced complexity and communication cost, the algorithm combines particle
    implementations with parametric message representations and takes advantage of
    a conditional independence property. Simulation results demonstrate the good
    performance of the proposed algorithm in a challenging scenario with
    time-varying network connectivity.

    A Fault-tolerance Linguistic Structure for Distributed Applications

    Vincenzo De Florio
    Comments: Doctoral thesis, successfully defended on October 13, 2000
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    The structures for the expression of fault-tolerance provisions into the
    application software are the central topic of this dissertation. Structuring
    techniques provide means to control complexity, the latter being a relevant
    factor for the introduction of design faults. This fact and the ever increasing
    complexity of today’s distributed software justify the need for simple,
    coherent, and effective structures for the expression of fault-tolerance in the
    application software. A first contribution of this dissertation is the
    definition of a base of structural attributes with which application-level
    fault-tolerance structures can be qualitatively assessed and compared with each
    other and with respect to the above mentioned need. This result is then used to
    provide an elaborated survey of the state-of-the-art of software
    fault-tolerance structures. The key contribution of this work is a novel
    structuring technique for the expression of the fault-tolerance design concerns
    in the application layer of those distributed software systems that are
    characterized by soft real-time requirements and a number of processing
    nodes known at compile-time. The main thesis of this dissertation is that this
    new structuring technique is capable of exhibiting satisfactory values of the
    structural attributes in the domain of soft real-time, distributed and parallel
    applications. Following this novel approach, besides the conventional
    programming language addressing the functional design concerns, a
    special-purpose linguistic structure (the so-called “recovery language”) is
    available to address error recovery and reconfiguration. This recovery language
    comes into play as soon as an error is detected by an underlying error
    detection layer, or when some erroneous condition is signaled by the
    application processes.

    Practical scalability assessment for parallel scientific numerical applications

    Natalie Perlin, Joel P. Zysman, Ben P. Kirtman
    Comments: 9 pages, 8 figures, 1 table
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computational Engineering, Finance, and Science (cs.CE); Atmospheric and Oceanic Physics (physics.ao-ph)

    The concept of scalability analysis of numerical parallel applications has
    been revisited, with specific goals defined for the performance estimation
    of research applications. A series of Community Climate Model System (CCSM)
    numerical simulations were used to test several MPI implementations,
    determine the optimal use of system resources, and assess their
    scalability. The scaling capacity and model throughput performance metrics
    for (N) cores showed a log-linear behavior that is well approximated by a
    power fit of the form (C(N)=bN^a), where (a) and (b) are two empirical
    constants. Different metrics yielded identical power coefficients ((a)) but
    different dimensionality coefficients ((b)). The model was consistent
    except at large values of (N). The power fit approach appears to be very
    useful for scalability estimates, especially when no serial testing is
    possible. Scalability analysis of an additional scientific application was
    conducted in a similar way to validate the robustness of the power fit
    approach.
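
    The fit itself is a one-liner: taking logs of (C(N)=bN^a) gives a straight
    line with slope (a) and intercept (log b), so both constants come from a
    one-dimensional least-squares fit. A sketch with made-up throughput
    numbers:

        import numpy as np

        N = np.array([16, 32, 64, 128, 256])       # core counts
        C = np.array([2.1, 3.9, 7.2, 12.8, 21.0])  # throughput (illustrative)

        a, log_b = np.polyfit(np.log(N), np.log(C), 1)
        b = np.exp(log_b)
        print(f"C(N) ~= {b:.3f} * N^{a:.3f}")
        # Extrapolate with care: the abstract notes the fit degrades at large N.
        print(f"predicted C(512) ~= {b * 512**a:.1f}")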

    MetaFlow: a Scalable SDN-based Metadata Lookup Service for Distributed File Systems in Data Centers

    Peng Sun, Yonggang Wen, Ta Nguyen Binh Duong, Haiyong Xie
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    In large-scale distributed file systems, efficient metadata operations are
    critical since most file operations have to interact with metadata servers
    first. In existing distributed hash table (DHT) based metadata management
    systems, the lookup service could be a performance bottleneck due to its
    significant CPU overhead. Our investigations showed that the lookup service
    could reduce system throughput by up to 70%, and increase system latency by a
    factor of up to 8 compared to ideal scenarios. In this paper, we present
    MetaFlow, a scalable metadata lookup service utilizing software-defined
    networking (SDN) techniques to distribute lookup workload over network
    components. MetaFlow tackles the lookup bottleneck problem by leveraging
    B-tree, which is constructed over the physical topology, to manage flow tables
    for SDN-enabled switches. Therefore, metadata requests can be forwarded to
    appropriate servers using only switches. Extensive performance evaluations in
    both simulations and testbed showed that MetaFlow increases system throughput
    by a factor of up to 3.2, and reduces system latency by a factor of up to 5
    compared to DHT-based systems. We also deployed MetaFlow in a distributed file
    system, and demonstrated significant performance improvement.

    Distributed Coordinate Descent for Generalized Linear Models with Regularization

    Ilya Trofimov, Alexander Genkin
    Comments: arXiv admin note: text overlap with arXiv:1411.6520
    Subjects: Machine Learning (stat.ML); Distributed, Parallel, and Cluster Computing (cs.DC)

    Generalized linear models with (L_1) and (L_2) regularization are a widely
    used technique for solving classification, class probability estimation,
    and regression problems. With the numbers of both features and examples
    growing rapidly in fields like text mining and clickstream data analysis,
    parallelization and the use of cluster architectures become important. We
    present a novel algorithm for fitting regularized generalized linear models
    in a distributed environment. The algorithm splits data between nodes by
    features, uses coordinate descent on each node, and uses a line search to
    merge results globally. A convergence proof is provided. A modification of
    the algorithm addresses the slow-node problem. For the important particular
    case of logistic regression, we empirically compare our program with
    several state-of-the-art approaches that rely on different algorithmic and
    data splitting methods.
    Experiments demonstrate that our approach is scalable and superior when
    training on large and sparse datasets.
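
    For context on the per-node building block, here is a single-machine
    sketch of coordinate descent with soft-thresholding for an L1-regularized
    least-squares objective (the lasso); the paper's algorithm runs updates of
    this kind on feature shards and merges the results with a line search.
    Data and constants are synthetic.

        import numpy as np

        def soft_threshold(z, t):
            return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

        rng = np.random.default_rng(6)
        n, p, lam = 100, 10, 0.1
        X = rng.normal(size=(n, p))
        w_true = rng.normal(size=p) * (rng.random(p) < 0.3)  # sparse truth
        y = X @ w_true + 0.1 * rng.normal(size=n)

        # Minimizes (1/2n)||y - Xw||^2 + lam * ||w||_1, one coordinate at a time.
        w = np.zeros(p)
        col_sq = (X ** 2).sum(axis=0)
        for sweep in range(50):
            for j in range(p):
                r = y - X @ w + X[:, j] * w[j]   # residual excluding feature j
                w[j] = soft_threshold(X[:, j] @ r, n * lam) / col_sq[j]
        print(w.round(3))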


    Learning

    Optimal Binary Autoencoding with Pairwise Correlations

    Akshay Balsubramani
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    We formulate learning of a binary autoencoder as a biconvex optimization
    problem which learns from the pairwise correlations between encoded and decoded
    bits. Among all possible algorithms that use this information, ours finds the
    autoencoder that reconstructs its inputs with worst-case optimal loss. The
    optimal decoder is a single layer of artificial neurons, emerging entirely from
    the minimax loss minimization, and with weights learned by convex optimization.
    All this is reflected in competitive experimental results, demonstrating that
    binary autoencoding can be done efficiently by conveying information in
    pairwise correlations in an optimal fashion.

    Hierarchical compositional feature learning

    Miguel Lázaro-Gredilla, Yi Liu, D. Scott Phoenix, Dileep George
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    We introduce the hierarchical compositional network (HCN), a directed
    generative model able to discover and disentangle, without supervision, the
    building blocks of a set of binary images. The building blocks are binary
    features defined hierarchically as a composition of some of the features in the
    layer immediately below, arranged in a particular manner. At a high level, HCN
    is similar to a sigmoid belief network with pooling. Inference and learning in
    HCN are very challenging and existing variational approximations do not work
    satisfactorily. A main contribution of this work is to show that both can be
    addressed using max-product message passing (MPMP) with a particular schedule
    (no EM required). Also, using MPMP as an inference engine for HCN makes new
    tasks simple: adding supervision information, classifying images, or performing
    inpainting all correspond to clamping some variables of the model to their
    known values and running MPMP on the rest. When used for classification, fast
    inference with HCN has exactly the same functional form as a convolutional
    neural network (CNN) with linear activations and binary weights. However, HCN’s
    features are qualitatively very different.

    Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic

    Shixiang Gu, Timothy Lillicrap, Zoubin Ghahramani, Richard E. Turner, Sergey Levine
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Model-free deep reinforcement learning (RL) methods have been successful in a
    wide variety of simulated domains. However, a major obstacle facing deep RL in
    the real world is the high sample complexity of such methods. Unbiased batch
    policy-gradient methods offer stable learning, but at the cost of high
    variance, which often requires large batches, while TD-style methods, such as
    off-policy actor-critic and Q-learning, are more sample-efficient but biased,
    and often require costly hyperparameter sweeps to stabilize. In this work, we
    aim to develop methods that combine the stability of unbiased policy gradients
    with the efficiency of off-policy RL. We present Q-Prop, a policy gradient
    method that uses a Taylor expansion of the off-policy critic as a control
    variate. Q-Prop is both sample efficient and stable, and effectively combines
    the benefits of on-policy and off-policy methods. We analyze the connection
    between Q-Prop and existing model-free algorithms, and use control variate
    theory to derive two variants of Q-Prop with conservative and aggressive
    adaptation. We show that conservative Q-Prop provides substantial gains in
    sample efficiency over trust region policy optimization (TRPO) with generalized
    advantage estimation (GAE), and improves stability over deep deterministic
    policy gradient (DDPG), the state-of-the-art on-policy and off-policy methods,
    on OpenAI Gym’s MuJoCo continuous control environments.
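
    The control-variate trick at the heart of Q-Prop is easy to see in
    isolation: subtract a correlated quantity with known mean from the Monte
    Carlo samples, and the expectation is unchanged while the variance drops.
    A generic one-dimensional demonstration (not the policy-gradient estimator
    itself), with a first-order Taylor expansion playing the role the critic
    plays in the paper:

        import numpy as np

        rng = np.random.default_rng(7)
        x = rng.normal(size=100_000)

        f = np.exp(x)           # target; E[exp(X)] = exp(0.5) for X ~ N(0, 1)
        g = 1.0 + x             # first-order Taylor expansion of f at 0
        g_mean = 1.0            # known analytically, since E[X] = 0

        cv = f - (g - g_mean)   # same expectation, smaller variance
        print(f.mean(), f.std())    # plain Monte Carlo estimator
        print(cv.mean(), cv.std())  # control-variate estimator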

    Playing SNES in the Retro Learning Environment

    Nadav Bhonker, Shai Rozenberg, Itay Hubara
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Mastering a video game requires skill, tactics and strategy. While these
    attributes may be acquired naturally by human players, teaching them to a
    computer program is a far more challenging task. In recent years, extensive
    research was carried out in the field of reinforcement learning and numerous
    algorithms were introduced, aiming to learn how to perform human tasks such as
    playing video games. As a result, the Arcade Learning Environment (ALE) has
    become a commonly used benchmark environment allowing algorithms to train on
    various Atari 2600 games. Most Atari games no longer pose a challenge to
    state-of-the-art algorithms. In this paper we introduce a new learning
    environment, the Retro Learning Environment — RLE, based on the Super
    Nintendo Entertainment System (SNES). The environment is expandable, allowing
    for more video games and consoles to be easily added to the environment, while
    maintaining the same interface as ALE. Moreover, RLE is compatible with Python
    and Torch. SNES games pose a significant challenge to current algorithms due to
    their higher level of complexity and versatility. To overcome these challenges,
    we introduce a novel training method based on training two agents against each
    other.

    CoCoA: A General Framework for Communication-Efficient Distributed Optimization

    Virginia Smith, Simone Forte, Chenxin Ma, Martin Takac, Michael I. Jordan, Martin Jaggi
    Subjects: Learning (cs.LG)

    The scale of modern datasets necessitates the development of efficient
    distributed optimization methods for machine learning. We present a
    general-purpose framework for the distributed environment, CoCoA, that has an
    efficient communication scheme and is applicable to a wide variety of problems
    in machine learning and signal processing. We extend the framework to cover
    general non-strongly convex regularizers, including L1-regularized problems
    like lasso, sparse logistic regression, and elastic net regularization, and
    show how earlier work can be derived as a special case. We provide convergence
    guarantees for the class of convex regularized loss minimization objectives,
    leveraging a novel approach in handling non-strongly convex regularizers and
    non-smooth loss functions. The resulting framework has markedly improved
    performance over state-of-the-art methods, as we illustrate with an extensive
    set of experiments on real distributed datasets.

    Trusting SVM for Piecewise Linear CNNs

    Leonard Berrada, Andrew Zisserman, M. Pawan Kumar
    Subjects: Learning (cs.LG)

    We present a novel layerwise optimization algorithm for the learning
    objective of a large class of convolutional neural networks (CNNs).
    Specifically, we consider CNNs that employ piecewise linear non-linearities
    such as the commonly used ReLU and max-pool, and an SVM classifier as the final
    layer. The key observation of our approach is that the problem corresponding to
    the parameter estimation of a layer can be formulated as a difference-of-convex
    (DC) program, which happens to be a latent structured SVM. We optimize the DC
    program using the concave-convex procedure, which requires us to iteratively
    solve a structured SVM problem. To this end, we extend the block-coordinate
    Frank-Wolfe (BCFW) algorithm in three important ways: (i) we include a
    trust-region for the parameters, which allows us to use the previous parameters
    as an initialization; (ii) we reduce the memory requirement of BCFW by
    potentially several orders of magnitude for the dense layers, which enables us
    to learn a large set of parameters; and (iii) we observe that, empirically, the
    optimal solution of the structured SVM problem can be obtained efficiently by
    solving a related, but significantly easier, multi-class SVM problem. Using
    publicly available data sets, we show that our approach outperforms the state
    of the art variants of backpropagation, and is also more robust to the
    hyperparameters of the learning objective.

    Designing Neural Network Architectures using Reinforcement Learning

    Bowen Baker, Otkrist Gupta, Nikhil Naik, Ramesh Raskar
    Subjects: Learning (cs.LG)

    At present, designing convolutional neural network (CNN) architectures
    requires both human expertise and labor. New architectures are handcrafted by
    careful experimentation or modified from a handful of existing networks. We
    propose a meta-modelling approach based on reinforcement learning to
    automatically generate high-performing CNN architectures for a given learning
    task. The learning agent is trained to sequentially choose CNN layers using
    Q-learning with an (epsilon)-greedy exploration strategy and experience
    replay. The agent explores a large but finite space of possible architectures
    and iteratively discovers designs with improved performance on the learning
    task. On image classification benchmarks, the agent-designed networks
    (consisting of only standard convolution, pooling, and fully-connected layers)
    beat existing networks designed with the same layer types and are competitive
    against the state-of-the-art methods that use more complex layer types. We also
    outperform existing network design meta-modelling approaches on image
    classification.
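
    A toy rendition of the loop: the agent picks layer types one position at a
    time with an epsilon-greedy policy and receives a terminal reward standing
    in for validation accuracy. The scoring function below is made up, and a
    simple Monte Carlo backup replaces the paper's Q-learning with experience
    replay.

        import numpy as np

        rng = np.random.default_rng(8)
        LAYERS = ["conv", "pool", "fc"]
        DEPTH = 3
        Q = np.zeros((DEPTH, len(LAYERS)))
        eps, alpha = 0.2, 0.1

        def fake_accuracy(arch):
            # Pretends conv->conv->fc is best; the real system trains the net.
            target = ["conv", "conv", "fc"]
            return sum(a == b for a, b in zip(arch, target)) / DEPTH

        for episode in range(2000):
            arch = []
            for depth in range(DEPTH):   # epsilon-greedy layer choice
                if rng.random() < eps:
                    a = int(rng.integers(len(LAYERS)))
                else:
                    a = int(Q[depth].argmax())
                arch.append(LAYERS[a])
            r = fake_accuracy(arch)                 # terminal reward only
            for depth, layer in enumerate(arch):    # Monte Carlo backup
                a = LAYERS.index(layer)
                Q[depth, a] += alpha * (r - Q[depth, a])

        print([LAYERS[int(Q[d].argmax())] for d in range(DEPTH)])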

    Unrolled Generative Adversarial Networks

    Luke Metz, Ben Poole, David Pfau, Jascha Sohl-Dickstein
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We introduce a method to stabilize Generative Adversarial Networks (GANs) by
    defining the generator objective with respect to an unrolled optimization of
    the discriminator. This allows training to be adjusted between using the
    optimal discriminator in the generator’s objective, which is ideal but
    infeasible in practice, and using the current value of the discriminator, which
    is often unstable and leads to poor solutions. We show how this technique
    solves the common problem of mode collapse, stabilizes training of GANs with
    complex recurrent generators, and increases diversity and coverage of the data
    distribution by the generator.

    Lifelong Perceptual Programming By Example

    Alexander L. Gaunt, Marc Brockschmidt, Nate Kushman, Daniel Tarlow
    Comments: Submitted to ICLR 2017
    Subjects: Learning (cs.LG)

    We introduce and develop solutions for the problem of Lifelong Perceptual
    Programming By Example (LPPBE). The problem is to induce a series of programs
    that require understanding perceptual data like images or text. LPPBE systems
    learn from weak supervision (input-output examples) and incrementally construct
    a shared library of components that grows and improves as more tasks are
    solved. Methodologically, we extend differentiable interpreters to operate on
    perceptual data and to share components across tasks. Empirically we show that
    this leads to a lifelong learning system that transfers knowledge to new tasks
    more effectively than baselines, and the performance on earlier tasks continues
    to improve even as the system learns on new, different tasks.

    Reinforcement-based Simultaneous Algorithm and its Hyperparameters Selection

    Valeria Efimova, Andrey Filchenkov, Anatoly Shalyto
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Many algorithms for data analysis exist, especially for classification
    problems. To solve a data analysis problem, a proper algorithm should be
    chosen, and also its hyperparameters should be selected. In this paper, we
    present a new method for the simultaneous selection of an algorithm and its
    hyperparameters. In order to do so, we reduced this problem to the multi-armed
    bandit problem. We consider an algorithm as an arm and a hyperparameter
    search for that algorithm during a fixed time as the corresponding arm
    play. We also suggest a problem-specific reward function. We performed
    experiments on 10 real datasets and compared the suggested method with the
    existing one implemented in Auto-WEKA. The results show that our method is
    significantly better in most cases and never worse than Auto-WEKA.
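
    To make the reduction concrete: each arm is a learning algorithm, and one
    play is a fixed-time hyperparameter search returning the best score found.
    The UCB1 policy and the simulated rewards below are stand-ins, since the
    abstract does not specify the index policy used.

        import numpy as np

        rng = np.random.default_rng(9)
        true_best = np.array([0.78, 0.85, 0.80])  # unknown to the agent
        n_arms, horizon = 3, 60
        counts, sums = np.zeros(n_arms), np.zeros(n_arms)

        def play(arm):
            # Simulated search: a noisy sample near the arm's best score.
            return true_best[arm] + 0.05 * rng.normal()

        for t in range(1, horizon + 1):
            if t <= n_arms:
                arm = t - 1               # initialize: play each arm once
            else:
                ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
                arm = int(ucb.argmax())
            counts[arm] += 1
            sums[arm] += play(arm)

        print("plays per algorithm:", counts)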

    Reinforcement Learning Approach for Parallelization in Filters Aggregation Based Feature Selection Algorithms

    Ivan Smetannikov, Ilya Isaev, Andrey Filchenkov
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    One of the classical problems in machine learning and data mining is feature
    selection. A feature selection algorithm is expected to be quick, and at the
    same time it should show high performance. MeLiF algorithm effectively solves
    this problem using ensembles of ranking filters. This article describes two
    different ways to improve MeLiF algorithm performance with parallelization.
    Experiments show that the proposed schemes significantly improve algorithm
    performance and increase feature selection quality.

    Multi-view Generative Adversarial Networks

    Mickaël Chen, Ludovic Denoyer
    Comments: Submitted at ICLR 2017
    Subjects: Learning (cs.LG)

    Learning over multi-view data is a challenging problem with strong practical
    applications. Most related studies focus on the classification point of view
    and assume that all the views are available at any time. We consider an
    extension of this framework in two directions. First, based on the BiGAN model,
    the Multi-view BiGAN (MV-BiGAN) is able to perform density estimation from
    multi-view inputs. Second, it can deal with missing views and is able to update
    its prediction when additional views are provided. We illustrate these
    properties on a set of experiments over different datasets.

    DeepCoder: Learning to Write Programs

    Matej Balog, Alexander L. Gaunt, Marc Brockschmidt, Sebastian Nowozin, Daniel Tarlow
    Comments: Submitted to ICLR 2017
    Subjects: Learning (cs.LG)

    We develop a first line of attack for solving programming competition-style
    problems from input-output examples using deep learning. The approach is to
    train a neural network to predict properties of the program that generated the
    outputs from the inputs. We use the neural network’s predictions to augment
    search techniques from the programming languages community, including
    enumerative search and an SMT-based solver. Empirically, we show that our
    approach leads to an order of magnitude speedup over the strong non-augmented
    baselines and a Recurrent Neural Network approach, and that we are able to
    solve problems of difficulty comparable to the simplest problems on programming
    competition websites.
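
    In miniature, the guidance works like this: a (stand-in) prediction of how
    likely each primitive is to appear reorders enumeration so that promising
    programs are tried first. The tiny DSL, the probabilities, and the
    examples below are all invented for illustration.

        import itertools

        PRIMS = {"inc": lambda x: x + 1, "dbl": lambda x: 2 * x, "neg": lambda x: -x}
        pred = {"inc": 0.7, "dbl": 0.9, "neg": 0.1}  # would come from the network

        examples = [(1, 4), (3, 8)]   # target behaviour: x -> 2 * (x + 1)

        def run(prog, x):
            for name in prog:
                x = PRIMS[name](x)
            return x

        order = sorted(PRIMS, key=lambda name: -pred[name])  # likely ops first
        for length in (1, 2, 3):
            for prog in itertools.product(order, repeat=length):
                if all(run(prog, x) == y for x, y in examples):
                    print("found:", prog)
                    raise SystemExit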

    Regularizing CNNs with Locally Constrained Decorrelations

    Pau Rodríguez, Jordi Gonzàlez, Guillem Cucurull, Josep M. Gonfaus, Xavier Roca
    Comments: Submitted to ICLR2017 conference
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Regularization is key for deep learning since it allows training more complex
    models while keeping lower levels of overfitting. However, the most prevalent
    regularizations do not leverage all the capacity of the models since they rely
    on reducing the effective number of parameters. Feature decorrelation is an
    alternative for using the full capacity of the models but the overfitting
    reduction margins are too narrow given the overhead it introduces. In this
    paper, we show that regularizing negatively correlated features is an obstacle
    for effective decorrelation and present OrthoReg, a novel regularization
    technique that locally enforces feature orthogonality. As a result, imposing
    locality constraints in feature decorrelation removes interferences between
    negatively correlated features, allowing the regularizer to reach higher
    decorrelation bounds, and reducing the overfitting more effectively. In
    particular, we show that the models regularized with OrthoReg have higher
    accuracy bounds even when batch normalization and dropout are present. Moreover,
    since our regularization is directly performed on the weights, it is especially
    suitable for fully convolutional neural networks, where the weight space is
    constant compared to the feature map space. As a result, we are able to reduce
    the overfitting of state-of-the-art CNNs on CIFAR-10, CIFAR-100, and SVHN.
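
    The locality idea fits in a few lines: compute pairwise cosine
    similarities between weight vectors and penalize only the positive ones,
    so negatively correlated features are left alone. The squared penalty
    below is a stand-in for the paper's exact regularizer.

        import numpy as np

        rng = np.random.default_rng(10)
        W = rng.normal(size=(8, 16))   # 8 filters, 16 weights each

        Wn = W / np.linalg.norm(W, axis=1, keepdims=True)
        cos = Wn @ Wn.T                # pairwise cosine similarities
        np.fill_diagonal(cos, 0.0)     # ignore self-similarity
        penalty = np.sum(np.maximum(cos, 0.0) ** 2) / 2.0  # positive pairs only
        print(f"decorrelation penalty: {penalty:.4f}")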

    Log-time and Log-space Extreme Classification

    Kalina Jasinska, Nikos Karampatziakis
    Subjects: Learning (cs.LG)

    We present LTLS, a technique for multiclass and multilabel prediction that
    can perform training and inference in logarithmic time and space. LTLS embeds
    large classification problems into simple structured prediction problems and
    relies on efficient dynamic programming algorithms for inference. We train LTLS
    with stochastic gradient descent on a number of multiclass and multilabel
    datasets and show that despite its small memory footprint it is often
    competitive with existing approaches.

    DeepSense: A Unified Deep Learning Framework for Time-Series Mobile Sensing Data Processing

    Shuochao Yao, Shaohan Hu, Yiran Zhao, Aston Zhang, Tarek Abdelzaher
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Networking and Internet Architecture (cs.NI)

    Mobile sensing applications usually require time-series inputs from sensors.
    Some applications, such as tracking, can use sensed acceleration and rate of
    rotation to calculate displacement based on physical system models. Other
    applications, such as activity recognition, extract manually designed features
    from sensor inputs for classification. Such applications face two challenges.
    On one hand, on-device sensor measurements are noisy. For many mobile
    applications, it is hard to find a distribution that exactly describes the
    noise in practice. Unfortunately, calculating target quantities based on
    physical system and noise models is only as accurate as the noise assumptions.
    Similarly, in classification applications, although manually designed features
    have proven to be effective, it is not always straightforward to find the most
    robust features to accommodate diverse sensor noise patterns and user
    behaviors. To this end, we propose DeepSense, a deep learning framework that
    directly addresses the aforementioned noise and feature customization
    challenges in a unified manner. DeepSense integrates convolutional and
    recurrent neural networks to exploit local interactions among similar mobile
    sensors, merge local interactions of different sensory modalities into global
    interactions, and extract temporal relationships to model signal dynamics.
    DeepSense thus provides a general signal estimation and classification
    framework that accommodates a wide range of applications. We demonstrate the
    effectiveness of DeepSense using three representative and challenging tasks:
    car tracking with motion sensors, heterogeneous human activity recognition, and
    user identification with biometric motion analysis. DeepSense significantly
    outperforms the state-of-the-art methods for all three tasks. In addition,
    DeepSense is feasible to implement on smartphones due to its moderate energy
    consumption and low latency.

    An Information-Theoretic Framework for Fast and Robust Unsupervised Learning via Neural Population Infomax

    Wentao Huang, Kechen Zhang
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)

    A framework is presented for unsupervised learning of representations based
    on the infomax principle for large-scale neural populations. We use an
    asymptotic approximation to Shannon’s mutual information for a large neural
    population
    to demonstrate that a good initial approximation to the global
    information-theoretic optimum can be obtained by a hierarchical infomax method.
    From the initial solution, an efficient algorithm based on gradient descent of
    the final objective function is proposed to learn representations from the
    input datasets, allowing complete, overcomplete, or undercomplete bases. As
    confirmed by numerical experiments, our method is robust and highly efficient
    for extracting salient features from image datasets. Compared with the main
    existing methods, our algorithm has a distinct advantage in both the training
    speed and the robustness of unsupervised representation learning.

    Challenges of Feature Selection for Big Data Analytics

    Jundong Li, Huan Liu
    Comments: Special Issue on Big Data, IEEE Intelligent Systems, 2016. arXiv admin note: text overlap with arXiv:1601.07996
    Subjects: Learning (cs.LG)

    We are surrounded by huge amounts of large-scale high dimensional data. It is
    desirable to reduce the dimensionality of data for many learning tasks due to
    the curse of dimensionality. Feature selection has shown its effectiveness in
    many applications by building simpler and more comprehensive models, improving
    learning performance, and preparing clean, understandable data. Recently, some
    unique characteristics of big data such as data velocity and data variety
    present challenges to the feature selection problem. In this paper, we envision
    these challenges of feature selection for big data analytics. In particular, we
    first give a brief introduction about feature selection and then detail the
    challenges of feature selection for structured, heterogeneous and streaming
    data as well as its scalability and stability issues. At last, to facilitate
    and promote the feature selection research, we present an open-source feature
    selection repository (scikit-feature), which consists of most of current
    popular feature selection algorithms.

    Entropy-SGD: Biasing Gradient Descent Into Wide Valleys

    Pratik Chaudhari, Anna Choromanska, Stefano Soatto, Yann LeCun
    Subjects: Learning (cs.LG)

    This paper proposes a new optimization algorithm called Entropy-SGD for
    training deep neural networks that is motivated by the local geometry of the
    energy landscape at solutions found by gradient descent. Local extrema with low
    generalization error have a large proportion of almost-zero eigenvalues in the
    Hessian with very few positive or negative eigenvalues. We leverage upon this
    observation to construct a local entropy based objective that favors
    well-generalizable solutions lying in the flat regions of the energy landscape,
    while avoiding poorly-generalizable solutions located in the sharp valleys. Our
    algorithm resembles two nested loops of SGD, where we use Langevin dynamics to
    compute the gradient of local entropy at each update of the weights. We prove
    that incorporating local entropy into the objective function results in a
    smoother energy landscape and use uniform stability to show improved
    generalization bounds over SGD. Our experiments on competitive baselines
    demonstrate that Entropy-SGD leads to improved generalization and has the
    potential to accelerate training.
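
    The bias toward wide valleys is easy to visualize in one dimension. Below,
    the local entropy F(x) = log integral of exp(-f(x') - 0.5*gamma*(x - x')^2) dx'
    is computed by numerical integration for a made-up loss with a deep narrow
    well and a shallower wide one; the wide valley wins under F even though it
    loses under f. (The paper estimates the gradient of F with Langevin
    dynamics rather than integrating.)

        import numpy as np

        xs = np.linspace(-2, 6, 4001)
        dx = xs[1] - xs[0]

        def f(x):
            sharp = -1.1 * np.exp(-x**2 / (2 * 0.1**2))       # deep, narrow well at 0
            wide = -1.0 * np.exp(-(x - 3)**2 / (2 * 1.0**2))  # shallow, wide well at 3
            return sharp + wide

        gamma = 2.0
        def local_entropy(x):
            return np.log(np.sum(np.exp(-f(xs) - 0.5 * gamma * (x - xs)**2)) * dx)

        print("loss f:          sharp %.3f   wide %.3f" % (f(0.0), f(3.0)))
        print("local entropy F: sharp %.3f   wide %.3f"
              % (local_entropy(0.0), local_entropy(3.0)))
        # The wide valley has higher local entropy despite its higher loss.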

    Generative Adversarial Networks as Variational Training of Energy Based Models

    Shuangfei Zhai, Yu Cheng, Rogerio Feris, Zhongfei Zhang
    Comments: Under review at ICLR 2017
    Subjects: Learning (cs.LG)

    In this paper, we study deep generative models for effective unsupervised
    learning. We propose VGAN, which works by minimizing a variational lower bound
    of the negative log likelihood (NLL) of an energy based model (EBM), where
    the model density p(x) is approximated by a variational distribution q(x)
    that is easy to sample from. The training of VGAN takes a two-step
    procedure: given p(x), q(x) is updated to maximize the lower bound; p(x)
    is then updated one step with samples drawn from q(x) to decrease the
    lower bound. VGAN is inspired by generative adversarial networks (GANs),
    where p(x) corresponds to the discriminator and q(x) corresponds to the
    generator, but with several notable differences. We hence name our model
    variational GANs (VGANs). VGAN provides a practical solution to training
    deep EBMs in high dimensional space by eliminating the need for MCMC
    sampling. From this view, we are also able to identify causes of the
    difficulty of training GANs and propose viable solutions. Experimental
    code is available at this https URL

    Modular Multitask Reinforcement Learning with Policy Sketches

    Jacob Andreas, Dan Klein, Sergey Levine
    Comments: Submitted to ICLR
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We describe a framework for multitask deep reinforcement learning guided by
    policy sketches. Sketches annotate each task with a sequence of named subtasks,
    providing high-level structural relationships among tasks, but not providing
    the detailed guidance required by previous work on learning policy abstractions
    for RL (e.g. intermediate rewards, subtask completion signals, or intrinsic
    motivations). Our approach associates every subtask with its own modular
    subpolicy, and jointly optimizes over full task-specific policies by tying
    parameters across shared subpolicies. This optimization is accomplished via a
    simple decoupled actor-critic training objective that facilitates learning
    common behaviors from dissimilar reward functions. We evaluate the
    effectiveness of our approach on a maze navigation game and a 2-D
    Minecraft-inspired crafting game. Both games feature extremely sparse rewards
    that can be obtained only after completing a number of high-level subgoals
    (e.g. escaping from a sequence of locked rooms or collecting and combining
    various ingredients in the proper order). Experiments illustrate two main
    advantages of our approach. First, we outperform standard baselines that learn
    task-specific or shared monolithic policies. Second, our method naturally
    induces a library of primitive behaviors that can be recombined to rapidly
    acquire policies for new tasks.

    Learning to superoptimize programs

    Rudy Bunel, Alban Desmaison, M. Pawan Kumar, Philip H.S. Torr, Pushmeet Kohli
    Comments: Submitted to ICLR 2017
    Subjects: Learning (cs.LG)

    Code super-optimization is the task of transforming any given program to a
    more efficient version while preserving its input-output behaviour. In some
    sense, it is similar to the paraphrase problem from natural language processing
    where the intention is to change the syntax of an utterance without changing
    its semantics. Code-optimization has been the subject of years of research that
    has resulted in the development of rule-based transformation strategies that
    are used by compilers. More recently, however, a class of stochastic search
    based methods have been shown to outperform these strategies. This approach
    involves repeated sampling of modifications to the program from a proposal
    distribution, which are accepted or rejected based on whether they preserve
    correctness, and the improvement they achieve. These methods, however, neither
    learn from past behaviour nor do they try to leverage the semantics of the
    program under consideration. Motivated by this observation, we present a novel
    learning based approach for code super-optimization. Intuitively, our method
    works by learning the proposal distribution using unbiased estimators of the
    gradient of the expected improvement. Experiments on benchmarks comprising
    automatically generated as well as existing (“Hacker’s Delight”) programs show
    that the proposed method is able to significantly outperform state of the art
    approaches for code super-optimization.

    Learning to Act by Predicting the Future

    Alexey Dosovitskiy, Vladlen Koltun
    Comments: ICLR-2017 submission
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

    We present an approach to sensorimotor control in immersive environments. Our
    approach utilizes a high-dimensional sensory stream and a lower-dimensional
    measurement stream. The cotemporal structure of these streams provides a rich
    supervisory signal, which enables training a sensorimotor control model by
    interacting with the environment. The model is trained using supervised
    learning techniques, but without extraneous supervision. It learns to act based
    on raw sensory input from a complex three-dimensional environment. The
    presented formulation allows changing the agent’s goal at test time. We conduct
    extensive experiments in three-dimensional simulations based on the classical
    first-person game Doom. The results demonstrate that the presented approach
    outperforms sophisticated prior formulations, particularly on challenging
    tasks. The results also show that trained models successfully generalize across
    environments and goals. A model trained using the presented approach won the
    Full Deathmatch track of the Visual Doom AI Competition, which was held in
    previously unseen environments.

    Beyond Fine Tuning: A Modular Approach to Learning on Small Data

    Ark Anderson, Kyle Shaffer, Artem Yankov, Court D. Corley, Nathan O. Hodas
    Subjects: Learning (cs.LG); Computation and Language (cs.CL)

    In this paper we present a technique to train neural network models on small
    amounts of data. Current methods for training neural networks on small amounts
    of rich data typically rely on strategies such as fine-tuning a pre-trained
    neural network or the use of domain-specific hand-engineered features. Here we
    take the approach of treating network layers, or entire networks, as modules
    and combine pre-trained modules with untrained modules, to learn the shift in
    distributions between data sets. The central impact of using a modular approach
    comes from adding new representations to a network, as opposed to replacing
    representations via fine-tuning. Using this technique, we are able to surpass
    results using standard fine-tuning transfer learning approaches, and we are
    also able to significantly increase performance over such approaches when using
    smaller amounts of data.

    Oracle-Efficient Learning and Auction Design

    Miroslav Dudík, Nika Haghtalab, Haipeng Luo, Robert E. Schapire, Vasilis Syrgkanis, Jennifer Wortman Vaughan
    Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS); Computer Science and Game Theory (cs.GT)

    We consider the design of online no-regret algorithms that are
    computationally efficient, given access to an offline optimization oracle. We
    present an algorithm we call Generalized Follow-the-Perturbed-Leader and
    provide conditions under which it achieves vanishing regret and is
    oracle-efficient. Our second main contribution is introducing a new adversarial
    auction-design framework for revenue maximization and applying our
    oracle-efficient learning results to the adaptive optimization of auctions.

    Our learning algorithm is a generalization of the FTPL algorithm of Kalai and
    Vempala that at every step plays the best-performing action subject to some
    perturbations. Our design uses a shared source of randomness across all actions
    that can be efficiently implemented using an oracle. Our work extends to
    oracle-efficient algorithms for contextual learning, learning with
    Maximal-in-Range approximation algorithms, and no-regret bidding in
    simultaneous auctions, answering an open problem of Daskalakis and Syrgkanis in
    the latter case.

    Our auction-design framework considers an auctioneer learning an optimal
    auction for a sequence of adversarially selected valuations with the goal of
    achieving revenue that is almost as good as the optimal auction in hindsight,
    among a class of auctions. We give oracle-efficient learning results for: (1)
    VCG auctions with bidder-specific reserves in single-parameter settings, (2)
    envy-free item pricing in multi-item auctions, and (3) s-level auctions of
    Morgenstern and Roughgarden for single-item settings. The last result leads to
    an approximation of the optimal Myerson auction for the stationary distribution
    of a Markov process, extending prior work that only gave such guarantees for
    the i.i.d. setting. We also extend our framework to allow the auctioneer to use
    side information about the bidders in the design of the optimal auction
    (contextual learning).
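
    A toy sketch of the perturbed-leader template with shared randomness: a
    single perturbation is drawn up front and reused every round, and each
    round's decision is one "oracle" call (here just an argmax over a payoff
    table). This simplifies the paper's construction, where the perturbation
    comes from a small synthetic dataset fed to an offline optimization
    oracle; the payoffs below are random for illustration.

        import numpy as np

        rng = np.random.default_rng(12)
        n_actions, T = 5, 500
        payoffs = rng.random((T, n_actions))  # oblivious adversary, fixed upfront

        perturb = rng.gumbel(size=n_actions)  # shared perturbation, drawn once
        cum = np.zeros(n_actions)
        total = 0.0
        for t in range(T):
            action = int((cum + perturb).argmax())  # the "oracle" call
            total += payoffs[t, action]
            cum += payoffs[t]                        # update observed history

        best_fixed = payoffs.sum(axis=0).max()
        print(f"FTPL payoff {total:.1f} vs best fixed action {best_fixed:.1f}")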

    Comparing learning algorithms in neural network for diagnosing cardiovascular disease

    Mirmorsal Madani
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Today, data mining techniques are exploited in medical science for diagnosing
and treating diseases. The neural network is one of the techniques widely used
for diagnosis in the medical field. In this article, the efficiency of nine
algorithms that form the basis of neural network learning is assessed for
diagnosing cardiovascular disease. The algorithms are assessed in terms of
accuracy, sensitivity, transparency, AROC, and convergence rate by means of
10-fold cross-validation. The results suggest that in the training phase the
Levenberg-Marquardt algorithm has the best efficiency in terms of all metrics,
the OSS algorithm has the maximum accuracy in the testing phase, the SCG
algorithm has the maximum transparency, and the CGB algorithm has the maximum
sensitivity.

    Generative Multi-Adversarial Networks

    Ishan Durugkar, Ian Gemp, Sridhar Mahadevan
    Comments: Submitted as a conference paper at ICLR 2017
    Subjects: Learning (cs.LG); Multiagent Systems (cs.MA); Neural and Evolutionary Computing (cs.NE)

    Generative adversarial networks (GANs) are a framework for producing a
    generative model by way of a two-player minimax game. In this paper, we propose
the Generative Multi-Adversarial Network (GMAN), a framework that
    extends GANs to multiple discriminators. In previous work, the successful
    training of GANs requires modifying the minimax objective to accelerate
    training early on. In contrast, GMAN can be reliably trained with the original,
    untampered objective. We explore a number of design perspectives with the
    discriminator role ranging from formidable adversary to forgiving teacher.
    Image generation tasks comparing the proposed framework to standard GANs
    demonstrate GMAN produces higher quality samples in a fraction of the
    iterations when measured by a pairwise GAM-type metric.

    Representation of uncertainty in deep neural networks through sampling

    Patrick McClure, Nikolaus Kriegeskorte
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)

    As deep neural networks (DNNs) are applied to increasingly challenging
    problems, they will need to be able to represent their own uncertainty.
    Modeling uncertainty is one of the key features of Bayesian methods. Scalable
    Bayesian DNNs that use dropout-based variational distributions have recently
    been proposed. Here we evaluate the ability of Bayesian DNNs trained with
    Bernoulli or Gaussian distributions over units (dropout) or weights
    (dropconnect) to represent their own uncertainty at the time of inference
    through sampling. We tested how well Bayesian fully connected and convolutional
    DNNs represented their own uncertainty in classifying the MNIST handwritten
    digits. By adding different levels of Gaussian noise to the test images, we
    assessed how DNNs represented their uncertainty about regions of input space
    not covered by the training set. Bayesian DNNs estimated their own uncertainty
    more accurately than traditional DNNs with a softmax output. These results are
    important for building better deep learning systems and for investigating the
    hypothesis that biological neural networks use sampling to represent
    uncertainty.
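
For readers unfamiliar with sampling-based uncertainty, the following is a small numpy sketch of Monte Carlo dropout at inference time (toy weights and sizes, not the authors' models): dropout masks stay active at test time, and the spread of the sampled softmax outputs serves as the uncertainty estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "pre-trained" weights for a two-layer classifier (illustrative values).
W1 = rng.normal(scale=0.05, size=(784, 128))
W2 = rng.normal(scale=0.05, size=(128, 10))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def mc_dropout_predict(x, n_samples=50, p_drop=0.5):
    # Keep dropout on at test time and average the sampled softmax outputs;
    # the per-class standard deviation is the uncertainty estimate.
    probs = []
    for _ in range(n_samples):
        mask = rng.binomial(1, 1 - p_drop, size=128) / (1 - p_drop)
        h = np.maximum(x @ W1, 0) * mask
        probs.append(softmax(h @ W2))
    probs = np.array(probs)
    return probs.mean(axis=0), probs.std(axis=0)

mean, std = mc_dropout_predict(rng.normal(size=784))
print("predictive mean:", mean.round(3))
print("per-class std:  ", std.round(3))
```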

    PGQ: Combining policy gradient and Q-learning

    Brendan O'Donoghue, Remi Munos, Koray Kavukcuoglu, Volodymyr Mnih
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)

    Policy gradient is an efficient technique for improving a policy in a
    reinforcement learning setting. However, vanilla online variants are on-policy
    only and not able to take advantage of off-policy data. In this paper we
    describe a new technique that combines policy gradient with off-policy
    Q-learning, drawing experience from a replay buffer. This is motivated by
    making a connection between the fixed points of the regularized policy gradient
    algorithm and the Q-values. This connection allows us to estimate the Q-values
    from the action preferences of the policy, to which we apply Q-learning
    updates. We refer to the new technique as ‘PGQ’, for policy gradient and
    Q-learning. We also establish an equivalency between action-value fitting
    techniques and actor-critic algorithms, showing that regularized policy
    gradient techniques can be interpreted as advantage function learning
    algorithms. We conclude with some numerical examples that demonstrate improved
    data efficiency and stability of PGQ. In particular, we tested PGQ on the full
    suite of Atari games and achieved performance exceeding that of both
    asynchronous advantage actor-critic (A3C) and Q-learning.
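
Concretely, the fixed-point connection implies (in its simplest form) that a softmax policy at the entropy-regularized fixed point satisfies pi(a|s) proportional to exp(Q(s,a)/alpha), so Q-values can be read off the policy's action preferences. A hedged numpy sketch of that identity plus one Q-learning-style correction; all numbers are illustrative, not the authors' code:

```python
import numpy as np

def q_from_policy(logits, value, alpha=0.1):
    # At the regularized fixed point, pi(a|s) ~ exp(Q(s,a)/alpha), so
    #   Q(s,a) ~= alpha * log pi(a|s) + V(s).
    log_pi = logits - np.log(np.exp(logits).sum())
    return alpha * log_pi + value

logits = np.array([2.0, 0.5, -1.0])   # action preferences for one state
q_est = q_from_policy(logits, value=1.3)

# A Q-learning-style correction on the recovered values (one transition).
r, gamma, a = 1.0, 0.99, 0
q_next = np.array([0.8, 0.2, 0.1])
td_error = r + gamma * q_next.max() - q_est[a]
print("recovered Q:", q_est.round(3), " TD error:", round(td_error, 3))
```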

    Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening

    Frank S. He, Yang Liu, Alexander G. Schwing, Jian Peng
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We propose a novel training algorithm for reinforcement learning which
    combines the strength of deep Q-learning with a constrained optimization
    approach to tighten optimality and encourage faster reward propagation. Our
    novel technique makes deep reinforcement learning more practical by drastically
    reducing the training time. We evaluate the performance of our approach on the
    49 games of the challenging Arcade Learning Environment, and report significant
    improvements in both training time and accuracy.

    LipNet: Sentence-level Lipreading

    Yannis M. Assael, Brendan Shillingford, Shimon Whiteson, Nando de Freitas
    Subjects: Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

    Lipreading is the task of decoding text from the movement of a speaker’s
    mouth. Traditional approaches separated the problem into two stages: designing
    or learning visual features, and prediction. More recent deep lipreading
    approaches are end-to-end trainable (Wand et al., 2016; Chung & Zisserman,
    2016a). All existing works, however, perform only word classification, not
    sentence-level sequence prediction. Studies have shown that human lipreading
    performance increases for longer words (Easton & Basala, 1982), indicating the
    importance of features capturing temporal context in an ambiguous communication
    channel. Motivated by this observation, we present LipNet, a model that maps a
    variable-length sequence of video frames to text, making use of spatiotemporal
    convolutions, an LSTM recurrent network, and the connectionist temporal
    classification loss, trained entirely end-to-end. To the best of our knowledge,
    LipNet is the first lipreading model to operate at sentence-level, using a
    single end-to-end speaker-independent deep model to simultaneously learn
    spatiotemporal visual features and a sequence model. On the GRID corpus, LipNet
    achieves 93.4% accuracy, outperforming experienced human lipreaders and the
    previous 79.6% state-of-the-art accuracy.

    Class-prior Estimation for Learning from Positive and Unlabeled Data

    Marthinus C. du Plessis, Gang Niu, Masashi Sugiyama
    Comments: To appear in Machine Learning
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We consider the problem of estimating the class prior in an unlabeled
    dataset. Under the assumption that an additional labeled dataset is available,
    the class prior can be estimated by fitting a mixture of class-wise data
    distributions to the unlabeled data distribution. However, in practice, such an
    additional labeled dataset is often not available. In this paper, we show that,
    with additional samples coming only from the positive class, the class prior of
    the unlabeled dataset can be estimated correctly. Our key idea is to use
    properly penalized divergences for model fitting to cancel the error caused by
    the absence of negative samples. We further show that the use of the penalized
    (L_1)-distance gives a computationally efficient algorithm with an analytic
    solution. The consistency, stability, and estimation error are theoretically
    analyzed. Finally, we experimentally demonstrate the usefulness of the proposed
    method.

    Neural Architecture Search with Reinforcement Learning

    Barret Zoph, Quoc V. Le
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    Neural networks are powerful and flexible models that work well for many
    difficult learning tasks in image, speech and natural language understanding.
    Despite their success, neural networks are still hard to design. In this paper,
    we use a recurrent network to generate the model descriptions of neural
    networks and train this RNN with reinforcement learning to maximize the
    expected accuracy of the generated architectures on a validation set. On the
    CIFAR-10 dataset, our method, starting from scratch, can design a novel network
    architecture that rivals the best human-invented architecture in terms of test
set accuracy. Our CIFAR-10 model achieves a test error rate of 3.84, which is
only 0.1 percent worse than the current state-of-the-art model while being 1.2x
faster.
    On the Penn Treebank dataset, our model can compose a novel recurrent cell that
    outperforms the widely-used LSTM cell, and other state-of-the-art baselines.
    Our cell achieves a test set perplexity of 62.4 on the Penn Treebank, which is
    3.6 perplexity better than the previous state-of-the-art.
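
The controller's training signal reduces to REINFORCE over sampled architecture descriptions. Below is a toy, self-contained sketch; the search space and the fake `validation_accuracy` reward are placeholders for actually training child networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Controller logits over a toy search space: filter size per layer.
choices, n_layers = [1, 3, 5, 7], 3
theta = np.zeros((n_layers, len(choices)))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def validation_accuracy(arch):
    # Stand-in for "train the child network, measure accuracy"; this fake
    # score simply prefers filter size 3 everywhere.
    return np.mean([1.0 - abs(f - 3) / 6 for f in arch])

baseline, lr = 0.0, 0.5
for step in range(200):
    probs = [softmax(theta[i]) for i in range(n_layers)]
    idx = [rng.choice(len(choices), p=p) for p in probs]
    reward = validation_accuracy([choices[i] for i in idx])
    baseline = 0.9 * baseline + 0.1 * reward       # moving-average baseline
    for i, (p, k) in enumerate(zip(probs, idx)):   # REINFORCE update
        grad = -p
        grad[k] += 1.0
        theta[i] += lr * (reward - baseline) * grad

print("most likely architecture:",
      [choices[int(np.argmax(theta[i]))] for i in range(n_layers)])
```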

    Gaussian Attention Model and Its Application to Knowledgebase Embedding and Question Answering

    Liwen Zhang, John Winn, Ryota Tomioka
    Comments: 12 pages, 2 figures
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    We propose the Gaussian attention model for content-based neural memory
    access. With the proposed attention model, a neural network has the additional
    degree of freedom to control the focus of its attention from a laser sharp
    attention to blurred attention. It is applicable whenever we can assume that
    the distance in the latent space reflects some notion of semantics. We use the
    proposed attention model as a scoring function for the embedding of a
    knowledgebase into a continuous vector space and then train a model that
    performs question answering about the entities in the knowledgebase. The
    proposed attention model can handle both the propagation of uncertainty when
    following a series of relations and also the conjunction of thoughts in a
    natural way. On a dataset of soccer players who participated in the FIFA World
    Cup 2014, we demonstrate that our model can handle both path queries and
    conjunctive queries well.
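
A minimal sketch of what a Gaussian attention score might look like (axis-aligned covariance and illustrative numbers; not the authors' exact parameterization): shrinking the query variance moves the focus from blurred to laser-sharp.

```python
import numpy as np

def gaussian_attention(query_mean, query_var, memory):
    # Score each memory item under a Gaussian centred on the query;
    # the variance controls how sharp or blurred the attention is.
    diff = memory - query_mean
    scores = np.exp(-0.5 * np.sum(diff**2 / query_var, axis=1))
    return scores / scores.sum()

memory = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])  # entity embeddings
sharp = gaussian_attention(np.array([1.0, 1.0]), np.array([0.1, 0.1]), memory)
blurred = gaussian_attention(np.array([1.0, 1.0]), np.array([4.0, 4.0]), memory)
print("sharp:  ", sharp.round(3))
print("blurred:", blurred.round(3))
```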

    Memory-augmented Attention Modelling for Videos

    Rasool Fakoor, Abdel-rahman Mohamed, Margaret Mitchell, Sing Bing Kang, Pushmeet Kohli
    Comments: ICLR 2017 submission
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Recent works on neural architectures have demonstrated the utility of
    attention mechanisms for a wide variety of tasks. Attention models used for
    problems such as image captioning typically depend on the image under
    consideration, as well as the previous sequence of words that come before the
    word currently being generated. While these types of models have produced
    impressive results, they are not able to model the higher-order interactions
    involved in problems such as video description/captioning, where the
    relationship between parts of the video and the concepts being depicted is
    complex. Motivated by these observations, we propose a novel memory-based
    attention model for video description. Our model utilizes memories of past
    attention when reasoning about where to attend to in the current time step,
    similar to the central executive system proposed in human cognition. This
allows the model not only to reason about local attention more effectively, but
also to consider the entire sequence of video frames while generating each
word. Evaluation on the challenging and popular MSVD and Charades datasets shows
that the proposed architecture outperforms all previously proposed methods and
leads to new state-of-the-art results in video description.

    Using Social Dynamics to Make Individual Predictions: Variational Inference with a Stochastic Kinetic Model

    Zhen Xu, Wen Dong, Sargur Srihari
    Comments: In proceedings of 29th Conference on Neural Information Processing Systems (NIPS 2016)
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Social dynamics is concerned primarily with interactions among individuals
    and the resulting group behaviors, modeling the temporal evolution of social
    systems via the interactions of individuals within these systems. In
    particular, the availability of large-scale data from social networks and
    sensor networks offers an unprecedented opportunity to predict state-changing
    events at the individual level. Examples of such events include disease
    transmission, opinion transition in elections, and rumor propagation. Unlike
    previous research focusing on the collective effects of social systems, this
    study makes efficient inferences at the individual level. In order to cope with
    dynamic interactions among a large number of individuals, we introduce the
    stochastic kinetic model to capture adaptive transition probabilities and
    propose an efficient variational inference algorithm the complexity of which
    grows linearly — rather than exponentially — with the number of
    individuals. To validate this method, we have performed epidemic-dynamics
    experiments on wireless sensor network data collected from more than ten
    thousand people over three years. The proposed algorithm was used to track
    disease transmission and predict the probability of infection for each
    individual. Our results demonstrate that this method is more efficient than
    sampling while nonetheless achieving high accuracy.

    Neural Functional Programming

    John K. Feser, Marc Brockschmidt, Alexander L. Gaunt, Daniel Tarlow
    Comments: Submitted to ICLR 2017
    Subjects: Programming Languages (cs.PL); Learning (cs.LG)

    We discuss a range of modeling choices that arise when constructing an
    end-to-end differentiable programming language suitable for learning programs
    from input-output examples. Taking cues from programming languages research, we
    study the effect of memory allocation schemes, immutable data, type systems,
    and built-in control-flow structures on the success rate of learning
    algorithms. We build a range of models leading up to a simple differentiable
functional programming language. Our empirical evaluation shows that this
language allows far more programs to be learned than existing baselines.

    Fixed-point Factorized Networks

    Peisong Wang, Jian Cheng
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    In recent years, Deep Neural Networks (DNNs) based methods have achieved
    remarkable performance in a wide range of tasks and have been among the most
    powerful and widely used techniques in computer vision, speech recognition and
    Natural Language Processing. However, DNN-based methods are both
    computational-intensive and resource-consuming, which hinders the application
    of these methods on embedded systems like smart phones. To alleviate this
    problem, we introduce a novel Fixed-point Factorized Networks (FFN) on
    pre-trained models to reduce the computational complexity as well as the
    storage requirement of networks. Extensive experiments on large-scale ImageNet
    classification task show the effectiveness of our proposed method.

    Deep Reinforcement Learning with Averaged Target DQN

    Oron Anschel, Nir Baram, Nahum Shimkin
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    The commonly used Q-learning algorithm combined with function approximation
    induces systematic overestimations of state-action values. These systematic
    errors might cause instability, poor performance and sometimes divergence of
learning. In this work, we present the Averaged Target DQN (ADQN)
    algorithm, an adaptation to the DQN class of algorithms which uses a weighted
    average over past learned networks to reduce generalization noise variance. As
    a consequence, this leads to reduced overestimations, more stable learning
    process and improved performance. Additionally, we analyze ADQN variance
    reduction along trajectories and demonstrate the performance of ADQN on a toy
    Gridworld problem, as well as on several of the Atari 2600 games from the
    Arcade Learning Environment.
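
The averaging idea fits in a few lines. A numpy sketch, assuming targets are formed by averaging the Q-values of the K most recent network snapshots before taking the max (a simplification of the paper's scheme; values are illustrative):

```python
import numpy as np

def averaged_target(reward, next_state_qs, gamma=0.99):
    # next_state_qs: Q-values of the next state under the K most recently
    # learned networks, shape (K, n_actions). Averaging before the max
    # reduces the noise variance that drives overestimation in vanilla DQN.
    q_avg = next_state_qs.mean(axis=0)
    return reward + gamma * q_avg.max()

past_qs = np.array([[1.2, 0.4], [0.9, 0.6], [1.1, 0.3]])  # K = 3 snapshots
print("averaged target:  ", round(averaged_target(1.0, past_qs), 3))
print("single-net target:", round(1.0 + 0.99 * past_qs[-1].max(), 3))
```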

    Decision Tree Classification with Differential Privacy: A Survey

    Sam Fletcher, Md Zahidul Islam
    Comments: Pre-print of paper submitted to ACM Computing Surveys, 33 pages
    Subjects: Databases (cs.DB); Learning (cs.LG)

    Data mining information about people is becoming increasingly important in
    the data-driven society of the 21st century. Unfortunately, sometimes there are
    real-world considerations that conflict with the goals of data mining;
    sometimes the privacy of the people being data mined needs to be considered.
    This necessitates that the output of data mining algorithms be modified to
    preserve privacy while simultaneously not ruining the predictive power of the
    outputted model. Differential privacy is a strong, enforceable definition of
    privacy that can be used in data mining algorithms, guaranteeing that nothing
    will be learned about the people in the data that could not already be
    discovered without their participation. In this survey, we focus on one
    particular data mining algorithm — decision trees — and how differential
    privacy interacts with each of the components that constitute decision tree
    algorithms. We analyze both greedy and random decision trees, and the conflicts
    that arise when trying to balance privacy requirements with the accuracy of the
    model.

    Joint Multimodal Learning with Deep Generative Models

    Masahiro Suzuki, Kotaro Nakayama, Yutaka Matsuo
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We investigate deep generative models that can exchange multiple modalities
    bi-directionally, e.g., generating images from corresponding texts and vice
    versa. Recently, some studies handle multiple modalities on deep generative
    models, such as variational autoencoders (VAEs). However, these models
typically assume a conditional relation between modalities, i.e., modalities
can be generated in only one direction. To achieve our
    objective, we should extract a joint representation that captures high-level
    concepts among all modalities and through which we can exchange them
    bi-directionally. As described herein, we propose a joint multimodal
    variational autoencoder (JMVAE), in which all modalities are independently
    conditioned on joint representation. In other words, it models a joint
    distribution of modalities. Furthermore, to be able to generate missing
    modalities from the remaining modalities properly, we develop an additional
    method, JMVAE-kl, that is trained by reducing the divergence between JMVAE’s
    encoder and prepared networks of respective modalities. Our experiments show
    that our proposed method can obtain appropriate joint representation from
    multiple modalities and that it can generate and reconstruct them more properly
    than conventional VAEs. We further demonstrate that JMVAE can generate multiple
    modalities bi-directionally.

    Learning to Perform Physics Experiments via Deep Reinforcement Learning

    Misha Denil, Pulkit Agrawal, Tejas D Kulkarni, Tom Erez, Peter Battaglia, Nando de Freitas
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Physics and Society (physics.soc-ph)

When encountering novel objects, humans are able to infer a wide range of
physical properties, such as mass, friction, and deformability, by interacting
with them in a goal-driven way. This process of active interaction is in the
same spirit as a scientist performing an experiment to discover hidden facts.
    Recent advances in artificial intelligence have yielded machines that can
    achieve superhuman performance in Go, Atari, natural language processing, and
    complex control problems, but it is not clear that these systems can rival the
    scientific intuition of even a young child. In this work we introduce a basic
    set of tasks that require agents to estimate hidden properties such as mass and
    cohesion of objects in an interactive simulated environment where they can
manipulate the objects and observe the consequences. We found that
state-of-the-art deep reinforcement learning methods can learn to perform the experiments
    necessary to discover such hidden properties. By systematically manipulating
    the problem difficulty and the cost incurred by the agent for performing
    experiments, we found that agents learn different strategies that balance the
    cost of gathering information against the cost of making mistakes in different
    situations.

    Learning a Static Analyzer from Data

    Pavol Bielik, Veselin Raychev, Martin Vechev
    Subjects: Programming Languages (cs.PL); Learning (cs.LG)

To be practically useful, modern static analyzers must precisely model the
effects of both the statements of the programming language and the frameworks
used by the program under analysis. While important, manually addressing these
challenges is difficult for at least two reasons: (i) the effects on the
overall analysis can be non-trivial, and (ii) as the size and complexity of
modern libraries increase, so does the number of cases the analysis must handle.

    In this paper we present a new, automated approach for creating static
    analyzers: instead of manually providing the various inference rules of the
    analyzer, the key idea is to learn these rules from a dataset of programs. Our
    method consists of two ingredients: (i) a synthesis algorithm capable of
    learning a candidate analyzer from a given dataset, and (ii) a counter-example
    guided learning procedure which generates new programs beyond those in the
    initial dataset, critical for discovering corner cases and ensuring the learned
    analysis generalizes to unseen programs.

    We implemented and instantiated our approach to the task of learning a
    points-to analysis for JavaScript, a challenging yet important problem that has
    received significant research attention. We show that our approach is
    effective: our system automatically discovered practical and useful inference
    rules for many corner cases that are tricky to manually identify and are missed
    by state-of-the-art, manually tuned solutions.

    LSTM-Based System-Call Language Modeling and Robust Ensemble Method for Designing Host-Based Intrusion Detection Systems

    Gyuwan Kim, Hayoon Yi, Jangho Lee, Yunheung Paek, Sungroh Yoon
    Comments: 12 pages, 5 figures
    Subjects: Cryptography and Security (cs.CR); Learning (cs.LG)

    In computer security, designing a robust intrusion detection system is one of
    the most fundamental and important problems. In this paper, we propose a
    system-call language-modeling approach for designing anomaly-based host
    intrusion detection systems. To remedy the issue of high false-alarm rates
    commonly arising in conventional methods, we employ a novel ensemble method
    that blends multiple thresholding classifiers into a single one, making it
    possible to accumulate ‘highly normal’ sequences. The proposed system-call
    language model has various advantages leveraged by the fact that it can learn
    the semantic meaning and interactions of each system call that existing methods
    cannot effectively consider. Through diverse experiments on public benchmark
    datasets, we demonstrate the validity and effectiveness of the proposed method.
    Moreover, we show that our model possesses high portability, which is one of
    the key aspects of realizing successful intrusion detection systems.

    Words or Characters? Fine-grained Gating for Reading Comprehension

    Zhilin Yang, Bhuwan Dhingra, Ye Yuan, Junjie Hu, William W. Cohen, Ruslan Salakhutdinov
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Previous work combines word-level and character-level representations using
    concatenation or scalar weighting, which is suboptimal for high-level tasks
    like reading comprehension. We present a fine-grained gating mechanism to
    dynamically combine word-level and character-level representations based on
    properties of the words. We also extend the idea of fine-grained gating to
    modeling the interaction between questions and paragraphs for reading
    comprehension. Experiments show that our approach can improve the performance
    on reading comprehension tasks, achieving new state-of-the-art results on the
    Children’s Book Test dataset. To demonstrate the generality of our gating
    mechanism, we also show improved results on a social media tag prediction task.
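
The core mechanism is a per-dimension gate in (0, 1) that mixes the two representations, instead of concatenating them or using one scalar weight. A minimal numpy sketch (parameter shapes and the conditioning signal are illustrative; the paper conditions the gate on word properties):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # embedding dimension

# Gate parameters; here the gate is conditioned on the word vector itself.
W_g, b_g = rng.normal(scale=0.1, size=(d, d)), np.zeros(d)

def fine_grained_gate(word_vec, char_vec):
    # A sigmoid gate per dimension decides, coordinate by coordinate, how
    # much to trust the word-level vs. character-level representation.
    g = 1.0 / (1.0 + np.exp(-(W_g @ word_vec + b_g)))
    return g * word_vec + (1.0 - g) * char_vec

word, char = rng.normal(size=d), rng.normal(size=d)
print(fine_grained_gate(word, char).round(3))
```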

    Learning to Draw Samples: With Application to Amortized MLE for Generative Adversarial Learning

    Dilin Wang, Qiang Liu
    Comments: Under review as a conference paper at ICLR 2017
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We propose a simple algorithm to train stochastic neural networks to draw
    samples from given target distributions for probabilistic inference. Our method
    is based on iteratively adjusting the neural network parameters so that the
output changes along a Stein variational gradient that maximally decreases the
KL divergence with the target distribution. Our method works for any target
distribution specified by its unnormalized density function, and can train
    any black-box architectures that are differentiable in terms of the parameters
    we want to adapt. As an application of our method, we propose an amortized MLE
    algorithm for training deep energy model, where a neural sampler is adaptively
    trained to approximate the likelihood function. Our method mimics an
    adversarial game between the deep energy model and the neural sampler, and
    obtains realistic-looking images competitive with the state-of-the-art results.
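
A hedged numpy sketch of the underlying Stein variational gradient update, assuming an RBF kernel and a 1-D standard Gaussian target (not the amortized neural-sampler version): the kernel-weighted gradient term pulls particles toward high density, while the repulsive term keeps them spread out.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_log_p(x):
    # Target: standard 1-D Gaussian, so grad log p(x) = -x.
    return -x

def svgd_step(x, step=0.1, h=0.5):
    diff = x[:, None] - x[None, :]          # diff[i, j] = x_i - x_j
    k = np.exp(-diff**2 / (2 * h))          # RBF kernel matrix
    grad_k = diff / h * k                   # sum_j grad_{x_j} k(x_j, x_i)
    phi = (k @ grad_log_p(x) + grad_k.sum(axis=1)) / len(x)
    return x + step * phi

x = rng.uniform(3, 4, size=100)             # particles start far from target
for _ in range(500):
    x = svgd_step(x)
print(f"mean {x.mean():.2f}, std {x.std():.2f}  (target: 0, 1)")
```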

    Detecting Dependencies in High-Dimensional, Sparse Databases Using Probabilistic Programming and Non-parametric Bayes

    Feras Saad, Vikash Mansinghka
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Sparse databases with hundreds of variables are commonplace. In this setting,
    it is both statistically and computationally challenging to detect true
    predictive relationships between variables and also to suppress false
    positives. This paper proposes a new approach to dependency detection that
    combines probabilistic programming, information theory, and non-parametric
    Bayesian modeling. The key ideas are to (i) build an ensemble of joint
    probability models for the whole database via approximate posterior inference
    in CrossCat, a non-parametric factorial mixture; (ii) identify independencies
    by analyzing model structures; and (iii) report the distribution on conditional
    mutual information induced by posterior uncertainty over the ensemble of
    models. This paper presents experiments showing that the approach finds
    relationships that pairwise correlation misses, including context-specific
    independencies, on databases of mathematics exam scores and global indicators
    of macroeconomic development.

    TopicRNN: A Recurrent Neural Network with Long-Range Semantic Dependency

    Adji B. Dieng, Chong Wang, Jianfeng Gao, John Paisley
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    In this paper, we propose TopicRNN, a recurrent neural network (RNN)-based
    language model designed to directly capture the global semantic meaning
    relating words in a document via latent topics. Because of their sequential
    nature, RNNs are good at capturing the local structure of a word sequence –
    both semantic and syntactic – but might face difficulty remembering long-range
    dependencies. Intuitively, these long-range dependencies are of semantic
    nature. In contrast, latent topic models are able to capture the global
    underlying semantic structure of a document but do not account for word
    ordering. The proposed TopicRNN model integrates the merits of RNNs and latent
    topic models: it captures local (syntactic) dependencies using an RNN and
    global (semantic) dependencies using latent topics. Unlike previous work on
    contextual RNN language modeling, our model is learned end-to-end. Empirical
    results on word prediction show that TopicRNN outperforms existing contextual
    RNN baselines. In addition, TopicRNN can be used as an unsupervised feature
    extractor for documents. We do this for sentiment analysis and report a new
    state-of-the-art error rate on the IMDB movie review dataset that amounts to a
13.3% improvement over the previous best result. Finally, TopicRNN also
    yields sensible topics, making it a useful alternative to document models such
    as latent Dirichlet allocation.

    Twenty (simple) questions

    Yuval Dagan, Yuval Filmus, Ariel Gabizon, Shay Moran
    Comments: 77 pages
    Subjects: Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Learning (cs.LG); Combinatorics (math.CO)

    A basic combinatorial interpretation of Shannon’s entropy function is via the
    “20 questions” game. This cooperative game is played by two players, Alice and
    Bob: Alice picks a distribution (pi) over the numbers ({1,ldots,n}), and
    announces it to Bob. She then chooses a number (x) according to (pi), and Bob
    attempts to identify (x) using as few Yes/No queries as possible, on average.

    An optimal strategy for the “20 questions” game is given by a Huffman code
    for (pi): Bob’s questions reveal the codeword for (x) bit by bit. This
    strategy finds (x) using fewer than (H(pi)+1) questions on average. However,
    the questions asked by Bob could be arbitrary. In this paper, we investigate
    the following question: Are there restricted sets of questions that match the
    performance of Huffman codes, either exactly or approximately?

    Our first main result shows that for every distribution (pi), Bob has a
    strategy that uses only questions of the form “(x < c)?” and “(x = c)?”, and
    uncovers (x) using at most (H(pi)+1) questions on average, matching the
    performance of Huffman codes in this sense. We also give a natural set of
    (O(rn^{1/r})) questions that achieve a performance of at most (H(pi)+r), and
    show that (Omega(rn^{1/r})) questions are required to achieve such a
    guarantee.

    Our second main result gives a set (mathcal{Q}) of (1.25^{n+o(n)}) questions
    such that for every distribution (pi), Bob can implement an optimal strategy
    for (pi) using only questions from (mathcal{Q}). We also show that
    (1.25^{n-o(n)}) questions are needed, for infinitely many (n). If we allow a
    small slack of (r) over the optimal strategy, then roughly ((rn)^{Theta(1/r)})
    questions are necessary and sufficient.
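
The H(pi)+1 benchmark against which these restricted strategies are measured is easy to verify numerically. The sketch below uses the standard identity that Huffman's expected codeword length equals the sum of merged probabilities (the distribution is illustrative):

```python
import heapq
import math

def huffman_expected_length(pi):
    # Each merge contributes the merged probability once to the expected
    # depth, which equals the Huffman code's expected codeword length.
    heap = list(pi)
    heapq.heapify(heap)
    expected = 0.0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        expected += a + b
        heapq.heappush(heap, a + b)
    return expected

pi = [0.4, 0.25, 0.2, 0.1, 0.05]
H = -sum(p * math.log2(p) for p in pi)
L = huffman_expected_length(pi)
print(f"H(pi) = {H:.3f}, Huffman expected length = {L:.3f} < H + 1 = {H + 1:.3f}")
```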

    Loss-aware Binarization of Deep Networks

    Lu Hou, Quanming Yao, James T. Kwok
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    Deep neural network models, though very powerful and highly successful, are
    computationally expensive in terms of space and time. Recently, there have been
    a number of attempts on binarizing the network weights and activations. This
greatly reduces the network size, and replaces the underlying multiplications
with additions or even XNOR bit operations. However, existing binarization
    schemes are based on simple matrix approximation and ignore the effect of
    binarization on the loss. In this paper, we propose a proximal Newton algorithm
    with diagonal Hessian approximation that directly minimizes the loss w.r.t. the
    binarized weights. The underlying proximal step has an efficient closed-form
    solution, and the second-order information can be efficiently obtained from the
    second moments already computed by the Adam optimizer. Experiments on both
    feedforward and recurrent networks show that the proposed loss-aware
    binarization algorithm outperforms existing binarization schemes, and is also
    more robust for wide and deep networks.
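
A sketch of the flavor of the closed-form proximal step, assuming the scaling takes the form alpha = ||d * |w|||_1 / ||d||_1 with binary part sign(w), where the diagonal Hessian estimate d is taken from Adam's second moments. This is an illustration, not the full algorithm:

```python
import numpy as np

def loss_aware_binarize(w, v, eps=1e-8):
    # Proximal step: binarize w to alpha * sign(w), with alpha weighting
    # each coordinate by d, a diagonal Hessian estimate derived from the
    # second moments v that Adam already maintains.
    d = np.sqrt(v) + eps
    alpha = np.sum(d * np.abs(w)) / np.sum(d)
    return alpha * np.sign(w)

w = np.array([0.7, -0.2, 1.5, -0.9])   # real-valued weights
v = np.array([0.5, 0.1, 0.9, 0.4])     # Adam second-moment estimates
print(loss_aware_binarize(w, v))       # same sign pattern, one shared scale
```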

    Quasi-Recurrent Neural Networks

    James Bradbury, Stephen Merity, Caiming Xiong, Richard Socher
    Comments: Submitted to conference track at ICLR 2017
    Subjects: Neural and Evolutionary Computing (cs.NE); Computation and Language (cs.CL); Learning (cs.LG)

    Recurrent neural networks are a powerful tool for modeling sequential data,
    but the dependence of each timestep’s computation on the previous timestep’s
    output limits parallelism and makes RNNs unwieldy for very long sequences. We
    introduce quasi-recurrent neural networks (QRNNs), an approach to neural
    sequence modeling that alternates convolutional layers, which apply in parallel
    across timesteps, and a minimalist recurrent pooling function that applies in
    parallel across channels. Despite lacking trainable recurrent layers, stacked
    QRNNs have better predictive accuracy than stacked LSTMs of the same hidden
    size. Due to their increased parallelism, they are up to 16 times faster at
    train and test time. Experiments on language modeling, sentiment
    classification, and character-level neural machine translation demonstrate
    these advantages and underline the viability of QRNNs as a basic building block
    for a variety of sequence tasks.
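
The split between parallel convolutions and a minimalist sequential pooling is the whole trick. A numpy sketch of the QRNN's f-pooling with a width-2 masked convolution (shapes and weights are toy values):

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 6, 4                      # timesteps, hidden size

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Width-2 "masked" convolutions compute all gates for all timesteps at once.
x = rng.normal(size=(T, d))
x_prev = np.vstack([np.zeros((1, d)), x[:-1]])
Wz, Wf, Vz, Vf = (rng.normal(scale=0.5, size=(d, d)) for _ in range(4))
Z = np.tanh(x_prev @ Vz + x @ Wz)      # candidate values, parallel over t
F = sigmoid(x_prev @ Vf + x @ Wf)      # forget gates, parallel over t

# The only sequential part: elementwise f-pooling across time.
c = np.zeros(d)
for t in range(T):
    c = F[t] * c + (1 - F[t]) * Z[t]
print("final state:", c.round(3))
```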

    Automated Generation of Multilingual Clusters for the Evaluation of Distributed Representations

    Philip Blair, Yuval Merhav, Joel Barry
    Comments: 12 pages
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    We propose a language-agnostic way of automatically generating sets of
    semantically similar clusters of entities along with sets of “outlier”
    elements, which may then be used to perform an intrinsic evaluation of word
    embeddings in the outlier detection task. We used our methodology to create a
    gold-standard dataset, which we call WikiSem500, and evaluated multiple
    state-of-the-art embeddings. The results show a correlation between performance
    on this dataset and performance on sentiment analysis.

    Accurate Image Super-Resolution Using Very Deep Convolutional Networks

    Jiwon Kim, Jung Kwon Lee, Kyoung Mu Lee
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We present a highly accurate single-image super-resolution (SR) method. Our
method uses a very deep convolutional network inspired by the VGG-net used for
ImageNet classification (Simonyan & Zisserman, 2015). We find that increasing
network depth yields a significant improvement in accuracy. Our final model uses 20
    weight layers. By cascading small filters many times in a deep network
    structure, contextual information over large image regions is exploited in an
    efficient way. With very deep networks, however, convergence speed becomes a
    critical issue during training. We propose a simple yet effective training
procedure. We learn residuals only and use extremely high learning rates
(10^4 times higher than SRCNN; Dong et al., 2015), enabled by adjustable
gradient clipping. Our proposed method performs better than existing methods in
    accuracy and visual improvements in our results are easily noticeable.
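
Adjustable gradient clipping here means bounding gradients to [-theta/gamma, theta/gamma] for the current learning rate gamma, so the effective parameter update stays within [-theta, theta] even at very high learning rates. A minimal sketch (theta and the gradients are illustrative):

```python
import numpy as np

def adjustable_clip(grad, theta, lr):
    # Clip gradients to [-theta/lr, theta/lr]: the effective update lr*grad
    # is then bounded by theta regardless of how high the learning rate is.
    bound = theta / lr
    return np.clip(grad, -bound, bound)

g = np.array([120.0, -0.03, 5e4])
clipped = adjustable_clip(g, theta=0.01, lr=0.1)
print(clipped)                 # [ 0.1  -0.03  0.1 ]
print(0.1 * clipped)           # effective updates, all within +/- 0.01
```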

    Deeply-Recursive Convolutional Network for Image Super-Resolution

    Jiwon Kim, Jung Kwon Lee, Kyoung Mu Lee
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We propose an image super-resolution method (SR) using a deeply-recursive
    convolutional network (DRCN). Our network has a very deep recursive layer (up
    to 16 recursions). Increasing recursion depth can improve performance without
introducing new parameters for additional convolutions. Despite this advantage,
    learning a DRCN is very hard with a standard gradient descent method due to
    exploding/vanishing gradients. To ease the difficulty of training, we propose
    two extensions: recursive-supervision and skip-connection. Our method
    outperforms previous methods by a large margin.


    Information Theory

    Multiround Private Information Retrieval: Capacity and Storage Overhead

    Hua Sun, Syed A. Jafar
    Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR); Information Retrieval (cs.IR)

    The capacity has recently been characterized for the private information
    retrieval (PIR) problem as well as several of its variants. In every case it is
    assumed that all the queries are generated by the user simultaneously. Here we
    consider multiround PIR, where the queries in each round are allowed to depend
    on the answers received in previous rounds. We show that the capacity of
    multiround PIR is the same as the capacity of single-round PIR (the result is
    generalized to also include (T)-privacy constraints). Combined with previous
    results, this shows that there is no capacity advantage from multiround over
    single-round schemes, non-linear over linear schemes or from (epsilon)-error
    over zero-error schemes. However, we show through an example that there is an
    advantage in terms of storage overhead. We provide an example of a multiround,
    non-linear, (epsilon)-error PIR scheme that requires a strictly smaller
    storage overhead than the best possible with single-round, linear, zero-error
    PIR schemes.

    Private Information Retrieval from Coded Databases with Colluding Servers

    Ragnar Freij-Hollanti, Oliver Gnilke, Camilla Hollanti, David Karpuk
    Subjects: Information Theory (cs.IT)

    We present a general framework for Private Information Retrieval (PIR) from
    arbitrary coded databases, that allows one to adjust the rate of the scheme
    according to the suspected number of colluding servers. If the storage code is
    a generalized Reed-Solomon code of length n and dimension k, we design PIR
    schemes which simultaneously protect against t colluding servers and provide
    PIR rate 1-(k+t-1)/n, for all t between 1 and n-k. This interpolates between
    the previously studied cases of t=1 and k=1 and asymptotically achieves the
    known capacity bounds in both of these cases, as the size of the database
    grows.
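
The stated rate is simple enough to tabulate. A two-line check of R = 1 - (k + t - 1)/n for an example length-8, dimension-3 storage code (n and k are illustrative):

```python
# PIR rate of the scheme: R = 1 - (k + t - 1) / n, for t colluding servers.
n, k = 8, 3
for t in range(1, n - k + 1):
    print(f"t = {t}: rate = {1 - (k + t - 1) / n:.3f}")
```

As expected, t = 1 recovers rate 1 - k/n and the rate degrades linearly as the suspected collusion size grows.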

    Finite-Horizon Throughput Region for Wireless Multi-User Interference Channels

    Yirui Cong, Xiangyun Zhou, Rodney A. Kennedy
    Comments: The paper is accepted for publication in IEEE Transactions on Wireless Communications
    Subjects: Information Theory (cs.IT)

    This paper studies a wireless network consisting of multiple
    transmitter-receiver pairs where interference is treated as noise. Previously,
    the throughput region of such networks was characterized for either one time
    slot or an infinite time horizon. We aim to fill the gap by investigating the
    throughput region for transmissions over a finite time horizon. Unlike the
    infinite-horizon throughput region, which is simply the convex hull of the
    throughput region of one time slot, the finite-horizon throughput region is
    generally non-convex. Instead of directly characterizing all achievable
    rate-tuples in the finite-horizon throughput region, we propose a metric termed
    the rate margin, which not only determines whether any given rate-tuple is
    within the throughput region (i.e., achievable or unachievable), but also tells
    the amount of scaling that can be done to the given achievable (unachievable)
    rate-tuple such that the resulting rate-tuple is still within (brought back
    into) the throughput region. Furthermore, we derive an efficient algorithm to
    find the rate-achieving policy for any given rate-tuple in the finite-horizon
    throughput region.

    Pilot Precoding and Combining in Multiuser MIMO Networks

    Nima N. Moghadam, Hossein Shokri-Ghadikolaei, Gabor Fodor, Mats Bengtsson, Carlo Fischione
    Comments: Submitted to IEEE Journal on Selected Areas in Communications
    Subjects: Information Theory (cs.IT)

    Although the benefits of precoding and combining of data signals are widely
    recognized, the potential of these techniques for pilot transmission is not
    fully understood. This is particularly relevant for multiuser multiple-input
    multiple-output (MU-MIMO) cellular systems using millimeter-wave (mmWave)
    communications, where multiple antennas will have to be used both at the
    transmitter and the receiver to overcome the severe path loss. In this paper,
    we characterize the gains of pilot precoding and combining in terms of channel
    estimation quality and achievable data rate. Specifically, we consider three
    uplink pilot transmission scenarios in a mmWave MU-MIMO cellular system: 1)
    non-precoded and uncombined, 2) precoded but uncombined, and 3) precoded and
    combined. We show that a simple precoder that utilizes only the second-order
    statistics of the channel reduces the variance of the channel estimation error
    by a factor that is proportional to the number of user equipment (UE) antennas.
    We also show that using a linear combiner designed based on the second-order
    statistics of the channel significantly reduces multiuser interference and
    provides the possibility of reusing some pilots. Specifically, in the large
    antenna regime, pilot precoding and combining help to accommodate a large
    number of UEs in one cell, significantly improve channel estimation quality,
    boost the signal-to-noise ratio of the UEs located close to the cell edges,
    alleviate pilot contamination, and address the imbalanced coverage of pilot and
    data signals.

    Artificial-Noise-Aided Secure Transmission in Wiretap Channels with Transmitter-Side Correlation

    Shihao Yan, Xiangyun Zhou, Nan Yang, Biao He, Thushara D. Abhayapala
    Comments: Accepted by IEEE Transactions on Wireless Communications
    Subjects: Information Theory (cs.IT)

    This work for the first time examines the impact of transmitter-side
    correlation on the artificial-noise-aided secure transmission, based on which a
    new power allocation strategy for artificial noise (AN) is devised for physical
    layer security enhancement. Specifically, we design a correlation-based power
    allocation (CPA) for AN, of which the optimality in terms of achieving the
    minimum secrecy outage probability is analytically proved in the large system
    regime with the number of transmit antennas approaching infinity. In order to
    fully reveal the benefits of the CPA, we derive easy-to-evaluate expressions
    for the secrecy outage probability achieved by the CPA. Our study demonstrates
    that the CPA is nearly optimal and significantly outperforms the widely-used
    uniform power allocation (UPA) even for a moderately small number of correlated
    transmit antennas. Furthermore, our numerical results reveal a fundamental
    difference between the CPA and UPA. That is when the number of correlated
    transmit antennas increases, the secrecy outage probability of the CPA always
    reduces while the secrecy outage probability of the UPA suffers from a
    saturation point.

    Sum-networks from incidence structures: construction and capacity analysis

    Ardhendu Tripathy, Aditya Ramamoorthy
    Comments: 27 pages, 7 figures, preprint submitted to IEEE Transactions on Information Theory
    Subjects: Information Theory (cs.IT)

    A sum-network is an instance of a network coding problem over a directed
    acyclic network in which each terminal node wants to compute the sum over a
    finite field of the information observed at all the source nodes. Many
    characteristics of the well-studied multiple unicast network communication
    problem also hold for sum-networks due to a known reduction between instances
    of these two problems. In this work, we describe an algorithm to construct
    families of sum-network instances using incidence structures. The computation
    capacity of several of these sum-network families is characterized. We
    demonstrate that unlike the multiple unicast problem, the computation capacity
    of sum-networks depends on the characteristic of the finite field over which
    the sum is computed. This dependence is very strong; we show examples of
    sum-networks that have a rate-1 solution over one characteristic but a rate
close to zero over a different characteristic. Additionally, a sum-network can
have an arbitrary number of different computation capacities over different
alphabets. This is in contrast to the multiple unicast problem, where it is
known that the capacity is independent of the network coding alphabet.

    Please Lower Small Cell Antenna Heights in 5G

    Ming Ding, David Lopez Perez
    Comments: 6 pages, 5 figures, to appear in IEEE Globecom 2016. arXiv admin note: substantial text overlap with arXiv:1608.06694
    Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

    In this paper, we present a new and significant theoretical discovery. If the
    absolute height difference between base station (BS) antenna and user equipment
    (UE) antenna is larger than zero, then the network capacity performance in
    terms of the area spectral efficiency (ASE) will continuously decrease as the
    BS density increases for ultra-dense (UD) small cell networks (SCNs). This
    performance behavior has a tremendous impact on the deployment of UD SCNs in
    the 5th-generation (5G) era. Network operators may invest large amounts of
money in deploying more network infrastructure, only to obtain even worse
network performance. Our results reveal that the SCN BS antenna height must be
lowered to the UE antenna height to fully achieve the capacity
    gains of UD SCNs in 5G. However, this requires a revolutionized approach of BS
    architecture and deployment, which is explored in this paper too.

    On the Approximate Analysis of Energy Detection over n*Rayleigh Fading Channels through Cooperative Spectrum Sensing

    Yahia Alghorani, Georges Kaddoum, Sami Muhaidat, Samuel Pierre
    Journal-ref: IEEE Wireless Communications Letters, Vol. 4, No. 4, August 2015
    Subjects: Information Theory (cs.IT)

    In this letter, we consider the problem of energy detection of unknown
    signals in an intervehicular communication (IVC) system over n*Rayleigh fading
    channels (also known as cascaded Rayleigh). Novel tight approximations for the
    probability of detection are derived for the no-diversity and the maximum ratio
combining (MRC) diversity schemes. Moreover, we investigate the system
    performance when cooperative spectrum sensing (CSS) is considered with and
    without imperfect reporting channels. The analytical results show that the
    detection reliability is decreased as the fading severity parameter n increases
    but reliability is substantially improved when CSS employs MRC schemes.

    Optimal High-Resolution Adaptive Sampling of Deterministic Signals

    Yehuda Dar, Alfred M. Bruckstein
    Subjects: Information Theory (cs.IT)

    In this work we study the topic of high-resolution adaptive sampling of a
    given deterministic signal and establish a clear connection to classic analyses
    of high-rate quantization. More specifically, we formulate solutions to the
    task of optimal high-resolution sampling of one-dimensional signals, which are
    shown as the counterparts of well-known results for high-rate scalar
    quantization. Our results reveal that the optimal high-resolution sampling
    structure is determined by the density of the signal-derivative energy, just as
    the probability-density-function defines the optimal high-rate quantization
    form. This paper has three main contributions: the first is establishing a
    fundamental paradigm bridging the topics of sampling and quantization. The
    second is a theoretical analysis of sampling that is relevant to the emerging
    field of high-resolution signal processing. The third is a new approach for
    nonuniform sampling of one-dimensional signals that is experimentally shown to
    outperform an optimized tree-structured sampling technique.

    Energy-Efficient Resource Allocation for Multi-User Mobile Edge Computing

    Junfeng Guo, Zhaozhe Song, Ying Cui
    Comments: submitted to ICC 2017
    Subjects: Information Theory (cs.IT)

    To increase mobile batteries’ lifetime and improve quality of experience for
    computation-intensive and latency-sensitive applications, mobile edge computing
    has received significant interest. Designing energy-efficient mobile edge
    computing systems requires joint optimization of communication and computation
    resources. In this paper, we consider energy-efficient resource allocation for
a multi-user mobile edge computing system. First, we establish two
computation-efficient models with negligible and non-negligible base station
    (BS) executing durations, respectively. Then, under each model, we formulate
    the overall weighted sum energy consumption minimization problem by optimally
    allocating communication and computation resources. The optimization problem
    for negligible BS executing duration is convex, and we obtain the optimal
    solution in closed-form to this problem. The optimization problem for
    non-negligible BS executing duration is NP-hard in general, and we obtain a
    sub-optimal solution with low-complexity to this problem, by connecting it to a
    three-stage flow-shop scheduling problem and wisely utilizing Johnson’s
    algorithm. Finally, numerical results show that the proposed solutions
    outperform some baseline schemes.

    Semantic Information Encoding in One Dimensional Time Domain Signals

    Kaushik Majumdar, Srinath Jayachandran
    Comments: 11 figures, 3 tables
    Subjects: Information Theory (cs.IT)

    A one dimensional time domain analog signal s(t) can be visualized as a
    trajectory of a moving particle in a force field with one degree of freedom.
Then the power of the particle at point t is P(s(t)) = s''(t)s'(t), which is the
    rate at which kinetic energy is dissipated (assuming the mass of the particle
    is unit) by the particle in order to create the trajectory or give shape to the
    signal. Assuming meaning of the signal or the semantic information is in its
    shape, we can say that P(s(t)) is the rate at which kinetic energy of the
    particle is dissipated to encode semantic information in s(t) at t. After s(t)
    is digitized (to make it s[n]) the discrete form P(s[n]) is valid. Considering
    the sign changes of P(s[n]) it has been shown that in the smallest neighborhood
    of n, in which n is the middle point, semantic information in s[n] can be
    encoded in 13 distinct ways. This list is exhaustive. A deterministic finite
automaton (DFA) has been designed which can accept any finite-length digital
signal; therefore, the collection of all finite-length digital signals forms a
regular language. The DFA has been generalized to a weighted finite state
    transducer (WFST), which has been used to identify action potentials in a spike
    train and also to distinguish two speakers when uttering the same phoneme. It
    has been shown that in any analog signal semantic information can be encoded at
    a point in the form of the shape of its infinitesimal neighborhood in 17
    distinct ways. The list is exhaustive. A new entropy measure called semantic
    entropy has been introduced. It has been shown that a signal s(t) is traceable
on a piece of paper or in an oscilloscope, only if s''(t) exists on all but at
    most a finite number of points within any finite interval. This is an essential
    condition for a signal to be the trajectory of a moving particle.
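
A small numpy sketch of the discrete quantity P(s[n]) = s''[n]s'[n] via finite differences, whose sign changes drive the shape classification described above (the test signal is illustrative):

```python
import numpy as np

# Discrete analogue of P(s(t)) = s''(t)s'(t): P is the rate of change of the
# kinetic energy (1/2)s'(t)^2 of the particle tracing the signal.
t = np.linspace(0, 1, 200)
s = np.sin(2 * np.pi * 3 * t) + 0.2 * t           # any digitized signal s[n]
ds = np.gradient(s, t)                             # s'[n]
dds = np.gradient(ds, t)                           # s''[n]
P = dds * ds
sign_changes = np.nonzero(np.diff(np.sign(P)))[0]  # local shape boundaries
print(f"{len(sign_changes)} sign changes of P(s[n]) over {len(s)} samples")
```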

    Non-Orthogonal Multiple Access in Multi-Cell Networks: Theory, Performance, and Practical Challenges

    Wonjae Shin, Mojtaba Vaezi, Byungju Lee, David J. Love, Jungwoo Lee, H. Vincent Poor
    Comments: 13 pages, 4 figures, submitted to IEEE Communications Magazine
    Subjects: Information Theory (cs.IT)

    Non-orthogonal multiple access (NOMA) is a potential enabler for the
    development of 5G and beyond wireless networks. By allowing multiple users to
    share the same time and frequency, NOMA can scale up the number of served
    users, increase the spectral efficiency, and improve user-fairness compared to
    existing orthogonal multiple access (OMA) techniques. While single-cell NOMA
    has drawn significant attention recently, much less attention has been given to
    multi-cell NOMA. This article discusses the opportunities and challenges of
    NOMA in a multi-cell environment. As the density of base stations and devices
    increases, inter-cell interference becomes a major obstacle in multi-cell
    networks. As such, identifying techniques that combine interference management
    approaches with NOMA is of great significance. After discussing the theory
    behind NOMA, this paper provides an overview of the current literature and
    discusses key implementation and research challenges, with an emphasis on
    multi-cell NOMA.
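
    To make the single-cell building block concrete, here is a minimal
    sketch (ours, not the authors') of two-user downlink power-domain NOMA
    with successive interference cancellation (SIC); the channel gains,
    power budget, and power split are illustrative assumptions.

        import math

        def noma_rates(g1, g2, p, alpha, noise=1.0):
            # Achievable rates (bits/s/Hz) for two users sharing the same
            # time-frequency resource. g1 < g2 (user 1 is the weak user);
            # alpha is the power fraction allocated to the weak user.
            p1, p2 = alpha * p, (1.0 - alpha) * p
            # Weak user decodes its own signal, treating the strong
            # user's signal as noise.
            r1 = math.log2(1.0 + g1 * p1 / (g1 * p2 + noise))
            # Strong user removes the weak user's signal via SIC, then
            # decodes its own signal interference-free.
            r2 = math.log2(1.0 + g2 * p2 / noise)
            return r1, r2

        # Hypothetical gains and an 80/20 power split.
        print(noma_rates(g1=0.3, g2=2.0, p=10.0, alpha=0.8))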

    Leveraging Social Communities for Optimizing Cellular Device-to-Device Communications

    Md Abdul Alim, Tianyi Pan, My Tra Thai, Walid Saad
    Comments: 34 pages
    Subjects: Information Theory (cs.IT)

    Device-to-device (D2D) communications over licensed wireless spectrum have
    recently been proposed as a promising technology to meet the capacity
    crunch of next generation cellular networks. However, due to the high
    mobility of cellular devices, establishing and maintaining successful D2D
    transmissions becomes a major challenge. To this end, in this paper, a
    novel framework is proposed to enable devices to form multi-hop D2D
    connections in an effort to maintain sustainable communication in the
    presence of device mobility. In contrast to existing works, which mostly
    focus on physical domain information, a durable community based approach is
    introduced that takes social encounters into account. It is shown that the
    proposed scheme can derive an optimal solution for time sensitive content
    transmission while also minimizing the cost that the base station pays in order
    to incentivize users to participate in D2D. Simulation results show that the
    proposed social community aware approach yields significant performance gain,
    in terms of the amount of traffic offloaded from the cellular network to the
    D2D tier, compared to the classical social-unaware methods.

    Decentralized Coded Caching with Distinct Cache Capacities

    Mohammad Mohammadi Amiri, Qianqian Yang, Deniz Gunduz
    Comments: Submitted for publication
    Subjects: Information Theory (cs.IT)

    Decentralized coded caching is studied in a content delivery system with a
    server holding a database of (N) files, each of size (F) bits, serving (K)
    users, each equipped with a cache memory, not necessarily of equal size. It is
    assumed that the users’ caches are filled in advance during the off-peak
    traffic period in a decentralized manner, i.e., without the knowledge of the
    number of active users, their identities, or their particular demands. User
    demands are revealed during the peak traffic period, and are served
    simultaneously through an error-free shared link. A group-based decentralized
    coded caching scheme is proposed, and it is shown to improve upon the
    state-of-the-art in terms of the minimum required delivery rate over the shared
    link, when there are more users in the system than files. Numerical results
    indicate that the improvement is more significant as the cache capacities of
    the users become more skewed. A new lower bound on the delivery rate is also
    presented, which provides a tighter bound than the classical cut-set bound.
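
    For context, the sketch below (ours) evaluates the classical
    decentralized delivery rate of Maddah-Ali and Niesen for equal cache
    sizes, the baseline setting that schemes like the one above generalize
    to distinct cache capacities; the parameter values are illustrative.

        def decentralized_rate(N, K, M):
            # Delivery rate (in files) of the classical decentralized
            # coded caching scheme for K users with equal cache size M
            # and N files, 0 <= M <= N.
            if M <= 0:
                return float(min(N, K))   # no caching at all
            f = 1.0 - M / N               # uncached fraction of each file
            return f * (N / M) * (1.0 - f ** K)

        for M in (0, 1, 2, 4):
            print(f"M={M}: rate = {decentralized_rate(N=4, K=8, M=M):.3f}")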

    Diversity Pulse Shaped Transmission in Ultra-Dense Small Cell Networks

    Amir H. Jafari, Vijay Venkateswaran, David Lopez-Perez, Jie Zhang
    Comments: 13 pages, 7 figures. Submitted to IEEE Transactions on Vehicular Technology
    Subjects: Information Theory (cs.IT)

    In ultra-dense small cell networks, spatial multiplexing gain is difficult
    to achieve because of the propagation conditions. The channels associated
    with different transmit-receive pairs can be highly correlated due to i)
    the high probability of line-of-sight (LOS) communication between user
    equipment (UE) and base station (BS), and ii) insufficient spacing between
    antenna elements at both the UE and the BS. In this paper, we propose a
    novel transmission technique, Diversity Pulse Shaped Transmission (DPST),
    to enhance the throughput over correlated MIMO channels in an ultra-dense
    small cell network. The fundamental idea of DPST is to shape the transmit
    signals at adjacent antennas with distinct interpolating filters,
    introducing pulse shaping diversity. In DPST, each antenna transmits its
    own data stream with a relative deterministic time offset, which must be a
    fraction of the symbol period, with respect to the adjacent antenna.
    Interpolating the delayed, pulse-shaped signals generates a virtual MIMO
    channel that, from the receiver’s perspective, benefits from increased
    diversity. To extract this diversity, the receiver must operate in an
    over-sampled domain, and hence a fractionally spaced equaliser (FSE) is
    proposed. The joint impact of DPST and the FSE lets the receiver sense a
    less correlated channel, ultimately enhancing the UE’s throughput.
    Moreover, to minimise the spatial correlation, we optimise the
    deterministic fractional delay. Simulation results show that applying DPST
    to a correlated channel can enhance the UE throughput by approximately
    1.93x and 3.76x in 2×2 and 4×4 MIMO systems, respectively.
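
    The transmit-side idea can be sketched in a few lines (our
    illustration, assuming raised-cosine pulses, BPSK symbols, and a fixed
    half-symbol offset; the paper instead optimizes the fractional delay).

        import numpy as np

        def rc_pulse(beta, span, sps):
            # Raised-cosine interpolating pulse: beta is the roll-off,
            # span is the filter length in symbols, sps samples/symbol.
            t = np.arange(-span * sps, span * sps + 1) / sps
            num = np.sinc(t) * np.cos(np.pi * beta * t)
            den = 1.0 - (2.0 * beta * t) ** 2
            out = np.empty_like(t)
            sing = np.isclose(den, 0.0)
            out[~sing] = num[~sing] / den[~sing]
            out[sing] = (np.pi / 4.0) * np.sinc(1.0 / (2.0 * beta))
            return out

        sps = 8                                  # oversampling factor
        rng = np.random.default_rng(0)
        s1 = rng.choice([-1.0, 1.0], 64)         # stream, antenna 1
        s2 = rng.choice([-1.0, 1.0], 64)         # stream, antenna 2
        up1 = np.zeros(64 * sps); up1[::sps] = s1
        up2 = np.zeros(64 * sps); up2[::sps] = s2

        # Distinct interpolating filters per antenna (pulse shaping
        # diversity), plus a deterministic half-symbol offset.
        tx1 = np.convolve(up1, rc_pulse(0.25, 6, sps))
        tx2 = np.convolve(up2, rc_pulse(0.50, 6, sps))
        d = sps // 2
        rx = (np.concatenate([tx1, np.zeros(d)]) +
              np.concatenate([np.zeros(d), tx2]))
        # A fractionally spaced equaliser would now process rx at sps
        # samples per symbol to extract the induced diversity.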

    Convergence Analysis of Distributed Inference with Vector-Valued Gaussian Belief Propagation

    Jian Du, Shaodan Ma, Yik-Chung Wu, Soummya Kar, José M. F. Moura
    Comments: 30 pages
    Subjects: Machine Learning (stat.ML); Information Theory (cs.IT)

    In networks such as the smart grid, communication networks, and social
    networks, local measurements/observations are scattered over a wide
    geographical area. Centralized inference algorithms are based on gathering all
    the observations at a central processing unit. However, with data explosion and
    ever-increasing network sizes, centralized inference suffers from large
    communication overhead, heavy computation burden at the center, and
    susceptibility to central node failure. This paper considers inference over
    networks using factor graphs and a distributed inference algorithm based on
    Gaussian belief propagation. The distributed inference involves only local
    computation of the information matrix and mean vector, together with message
    passing between neighbors. We discover and show analytically that the message
    information matrix converges exponentially fast to a unique positive definite
    limit matrix for arbitrary positive semidefinite initialization. We provide the
    necessary and sufficient convergence condition for the belief mean vector to
    converge to the optimal centralized estimator. An easily verifiable sufficient
    convergence condition on the topology of a factor graph is further provided.
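
    A scalar (one variable per node) version of the message updates is
    easy to write down. The sketch below (ours) runs Gaussian BP on a
    small loopy model and prints the per-iteration message change, which
    decays geometrically as the analysis predicts; the model and its
    numbers are illustrative.

        import numpy as np

        # Small diagonally dominant model: p(x) ~ exp(-0.5 x'Jx + h'x).
        J = np.array([[4.0, 1.0, 1.0],
                      [1.0, 4.0, 1.0],
                      [1.0, 1.0, 4.0]])
        h = np.array([1.0, 2.0, 3.0])
        n = len(h)
        edges = [(i, j) for i in range(n) for j in range(n)
                 if i != j and J[i, j] != 0.0]

        alpha = {e: 0.0 for e in edges}   # information (precision) messages
        beta = {e: 0.0 for e in edges}    # information-vector messages

        for it in range(12):
            na, nb = {}, {}
            for (i, j) in edges:
                prec = J[i, i] + sum(alpha[(k, m)] for (k, m) in edges
                                     if m == i and k != j)
                lin = h[i] + sum(beta[(k, m)] for (k, m) in edges
                                 if m == i and k != j)
                na[(i, j)] = -J[i, j] ** 2 / prec
                nb[(i, j)] = -J[i, j] * lin / prec
            diff = max(abs(na[e] - alpha[e]) for e in edges)
            alpha, beta = na, nb
            print(f"iter {it}: max message change {diff:.2e}")

        # Beliefs combine the local term with all incoming messages.
        mean = np.array([(h[i] + sum(beta[(k, m)] for (k, m) in edges
                                     if m == i)) /
                         (J[i, i] + sum(alpha[(k, m)] for (k, m) in edges
                                        if m == i)) for i in range(n)])
        print("GaBP mean:", mean, "exact:", np.linalg.solve(J, h))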

    Relations Between Work and Entropy Production for General Information-Driven, Finite-State Engines

    Neri Merhav
    Comments: 18 pages, 2 figures, submitted for publication
    Subjects: Statistical Mechanics (cond-mat.stat-mech); Information Theory (cs.IT)

    We consider a system model of a general finite-state machine (ratchet) that
    simultaneously interacts with three kinds of reservoirs: a heat reservoir, a
    work reservoir, and an information reservoir, the latter being taken to be a
    running digital tape whose symbols interact sequentially with the machine. As
    has been shown in earlier work, this finite-state machine can act as a demon
    (with memory), which creates a net flow of energy from the heat reservoir into
    the work reservoir (thus extracting useful work) at the price of increasing the
    entropy of the information reservoir. Under very few assumptions, we propose a
    simple derivation of a family of inequalities that relate the work extraction
    with the entropy production. These inequalities can be seen as either upper
    bounds on the extractable work or as lower bounds on the entropy production,
    depending on the point of view. Many of these bounds are relatively easy to
    calculate and they are tight in the sense that equality can be approached
    arbitrarily closely. In their basic forms, these inequalities are applicable to
    any finite number of cycles (and not only asymptotically), and for a general
    input information sequence (possibly correlated), which is not necessarily
    assumed even stationary. Several known results are obtained as special cases.

    A New Error Correction Scheme for Physical Unclonable Functions

    Sven Müelich, Martin Bossert
    Comments: 6 pages
    Subjects: Cryptography and Security (cs.CR); Information Theory (cs.IT)

    Error correction is an indispensable component when Physical Unclonable
    Functions (PUFs) are used in cryptographic applications. Existing schemes
    derive helper data, which they need within the error correction process. We
    introduce a new scheme that uses only an error correcting code, without any
    further helper data. The main idea is to construct, for each PUF instance,
    an individual code which contains the initial PUF response as a codeword.
    In this work we use LDPC codes; however, other code classes are also
    possible. Our scheme allows a trade-off between code rate and cryptographic
    security, and decoding with linear complexity is possible.
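
    The core idea can be illustrated with a toy linear code (ours; the
    paper uses LDPC codes and a more careful construction): embed the PUF
    response as a row of the generator matrix, so the response itself is a
    codeword, and correct later re-readings by nearest-codeword decoding
    with no stored helper data.

        import numpy as np
        from itertools import product

        rng = np.random.default_rng(1)

        r = rng.integers(0, 2, 16)        # toy PUF response

        # Individual code containing r as a codeword: r becomes a row of
        # the generator matrix (a stand-in for the LDPC construction).
        G = np.vstack([r, rng.integers(0, 2, (3, 16))]) % 2
        codebook = np.array([(np.array(m) @ G) % 2
                             for m in product([0, 1], repeat=4)])

        # A later, noisy re-reading of the same PUF (one bit flipped).
        noisy = r.copy()
        noisy[rng.integers(0, 16)] ^= 1

        # Nearest-codeword decoding recovers r whenever the error weight
        # is below half the code's minimum distance.
        best = codebook[np.argmin((codebook != noisy).sum(axis=1))]
        print("recovered original response:", bool(np.array_equal(best, r)))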

    An Information-Theoretic Framework for Fast and Robust Unsupervised Learning via Neural Population Infomax

    Wentao Huang, Kechen Zhang
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)

    A framework is presented for unsupervised learning of representations based
    on the infomax principle for large-scale neural populations. We use an
    asymptotic approximation to Shannon’s mutual information for a large neural
    population
    to demonstrate that a good initial approximation to the global
    information-theoretic optimum can be obtained by a hierarchical infomax method.
    From the initial solution, an efficient algorithm based on gradient descent of
    the final objective function is proposed to learn representations from the
    input datasets, allowing complete, overcomplete, or undercomplete bases. As
    confirmed by numerical experiments, our method is robust and highly efficient
    for extracting salient features from image datasets. Compared with the main
    existing methods, our algorithm has a distinct advantage in both the training
    speed and the robustness of unsupervised representation learning.

    End-to-end Optimized Image Compression

    Johannes Ballé, Valero Laparra, Eero P. Simoncelli
    Comments: Under review as a conference paper at ICLR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT)

    We describe an image compression system, consisting of a nonlinear encoding
    transformation, a uniform quantizer, and a nonlinear decoding transformation.
    Like many deep neural network architectures, the transforms consist of layers
    of convolutional linear filters and nonlinear activation functions, but we use
    a joint nonlinearity that implements a form of local gain control, inspired by
    those used to model biological neurons. Using a variant of stochastic gradient
    descent, we jointly optimize the system for rate-distortion performance over a
    database of training images, introducing a continuous proxy for the
    discontinuous loss function arising from the quantizer. The relaxed
    optimization problem resembles that of variational autoencoders, except that it
    must operate at any point along the rate-distortion curve, whereas the
    optimization of generative models aims only to minimize entropy of the data
    under the model. Across an independent database of test images, we find that
    the optimized coder exhibits significantly better rate-distortion performance
    than the standard JPEG and JPEG 2000 compression systems, as well as a dramatic
    improvement in visual quality of compressed images.
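
    The continuous proxy can be illustrated in isolation (a sketch under
    our assumptions, not the authors' code): during training, rounding is
    replaced by additive uniform noise, which matches the quantizer's
    marginal error statistics while keeping the loss differentiable.

        import numpy as np

        rng = np.random.default_rng(0)

        def quantize(y):
            # Test-time: hard, non-differentiable uniform quantizer.
            return np.round(y)

        def noisy_proxy(y):
            # Training-time relaxation: additive noise uniform on
            # [-1/2, 1/2), the quantizer's marginal error distribution.
            return y + rng.uniform(-0.5, 0.5, size=y.shape)

        y = rng.normal(0.0, 2.0, 100_000)
        print("quantizer MSE:  ", np.mean((quantize(y) - y) ** 2))
        print("noise-proxy MSE:", np.mean((noisy_proxy(y) - y) ** 2))
        # Both are close to 1/12 ~ 0.0833 for a unit-step quantizer.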

    Twenty (simple) questions

    Yuval Dagan, Yuval Filmus, Ariel Gabizon, Shay Moran
    Comments: 77 pages
    Subjects: Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Learning (cs.LG); Combinatorics (math.CO)

    A basic combinatorial interpretation of Shannon’s entropy function is via the
    “20 questions” game. This cooperative game is played by two players, Alice and
    Bob: Alice picks a distribution (pi) over the numbers ({1,ldots,n}), and
    announces it to Bob. She then chooses a number (x) according to (pi), and Bob
    attempts to identify (x) using as few Yes/No queries as possible, on average.

    An optimal strategy for the “20 questions” game is given by a Huffman code
    for (pi): Bob’s questions reveal the codeword for (x) bit by bit. This
    strategy finds (x) using fewer than (H(pi)+1) questions on average. However,
    the questions asked by Bob could be arbitrary. In this paper, we investigate
    the following question: Are there restricted sets of questions that match the
    performance of Huffman codes, either exactly or approximately?

    Our first main result shows that for every distribution (pi), Bob has a
    strategy that uses only questions of the form “(x < c)?” and “(x = c)?”, and
    uncovers (x) using at most (H(pi)+1) questions on average, matching the
    performance of Huffman codes in this sense. We also give a natural set of
    (O(rn^{1/r})) questions that achieve a performance of at most (H(pi)+r), and
    show that (Omega(rn^{1/r})) questions are required to achieve such a
    guarantee.

    Our second main result gives a set (mathcal{Q}) of (1.25^{n+o(n)}) questions
    such that for every distribution (pi), Bob can implement an optimal strategy
    for (pi) using only questions from (mathcal{Q}). We also show that
    (1.25^{n-o(n)}) questions are needed, for infinitely many (n). If we allow a
    small slack of (r) over the optimal strategy, then roughly ((rn)^{Theta(1/r)})
    questions are necessary and sufficient.
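
    The Huffman baseline is easy to check numerically; the sketch below
    (our illustration) builds a Huffman code for a small distribution and
    compares the expected number of Yes/No questions with the bound
    H(pi) + 1 mentioned above.

        import heapq, math

        def huffman_depths(p):
            # Codeword depths of a Huffman code for distribution p; one
            # bit of the codeword corresponds to one Yes/No question.
            heap = [(pi, i, [(i, 0)]) for i, pi in enumerate(p)]
            heapq.heapify(heap)
            tie = len(p)
            while len(heap) > 1:
                pa, _, la = heapq.heappop(heap)
                pb, _, lb = heapq.heappop(heap)
                merged = [(sym, d + 1) for sym, d in la + lb]
                heapq.heappush(heap, (pa + pb, tie, merged))
                tie += 1
            return dict(heap[0][2])

        p = [0.4, 0.3, 0.2, 0.1]
        depth = huffman_depths(p)
        avg = sum(p[i] * depth[i] for i in range(len(p)))
        H = -sum(pi * math.log2(pi) for pi in p)
        print(f"avg questions {avg:.3f}, entropy {H:.3f}, bound {H+1:.3f}")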



