
    arXiv Paper Daily: Wed, 30 Nov 2016

    Published by 我爱机器学习 (52ml.net, "I Love Machine Learning") on 2016-11-30 00:00:00

    Neural and Evolutionary Computing

    Emergence of foveal image sampling from learning to attend in visual scenes

    Brian Cheung, Eric Weiss, Bruno Olshausen
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG)

    We describe a neural attention model with a learnable retinal sampling
    lattice. The model is trained on a visual search task requiring the
    classification of an object embedded in a visual scene amidst background
    distractors using the smallest number of fixations. We explore the tiling
    properties that emerge in the model’s retinal sampling lattice after training.
    Specifically, we show that this lattice resembles the eccentricity dependent
    sampling lattice of the primate retina, with a high resolution region in the
    fovea surrounded by a low resolution periphery. Furthermore, we find conditions
    where these emergent properties are amplified or eliminated providing clues to
    their function.

    Multi-objective Active Control Policy Design for Commensurate and Incommensurate Fractional Order Chaotic Financial Systems

    Indranil Pan, Saptarshi Das, Shantanu Das
    Comments: 26 pages, 8 figures, 2 tables
    Journal-ref: Applied Mathematical Modelling, Volume 39, Issue 2, 15 January
    2015, Pages 500-514
    Subjects: Optimization and Control (math.OC); Computational Engineering, Finance, and Science (cs.CE); Neural and Evolutionary Computing (cs.NE); Systems and Control (cs.SY); Chaotic Dynamics (nlin.CD)

    In this paper, an active control policy design for a fractional order (FO)
    financial system is attempted, considering multiple conflicting objectives. An
    active control template as a nonlinear state feedback mechanism is developed
    and the controller gains are chosen within a multi-objective optimization (MOO)
    framework to satisfy the conditions of asymptotic stability, derived
    analytically. The MOO gives a set of solutions on the Pareto optimal front for
    the multiple conflicting objectives that are considered. It is shown that there
    is a trade-off between the multiple design objectives and a better performance
    in one objective can only be obtained at the cost of performance deterioration
    in the other objectives. The multi-objective controller design has been
    compared using three different MOO techniques, viz. the Non-dominated Sorting
    Genetic Algorithm-II (NSGA-II), the epsilon-variable Multi-Objective Genetic
    Algorithm (ev-MOGA), and the Multi-Objective Evolutionary Algorithm based on
    Decomposition (MOEA/D). The robustness of the same control policy, designed
    with the nominal system settings, has also been investigated for gradual
    decreases in the commensurate and incommensurate fractional orders of the
    financial system.
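    The Pareto-optimal front mentioned above can be made concrete with a small
    sketch: for a minimization problem, a solution is non-dominated if no other
    solution is at least as good in every objective and strictly better in at
    least one (illustrative Python, not the authors' implementation):

```python
def pareto_front(points):
    """Return the non-dominated subset of a list of objective vectors
    (minimization): a point survives if no other point is at least as
    good in every objective and strictly better in at least one."""
    front = []
    for i, p in enumerate(points):
        dominated = any(
            all(qj <= pj for qj, pj in zip(q, p)) and
            any(qj < pj for qj, pj in zip(q, p))
            for j, q in enumerate(points) if j != i
        )
        if not dominated:
            front.append(p)
    return front
```

    Every point on the returned front embodies the trade-off the abstract
    describes: improving one objective necessarily worsens another.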

    Fractional Order Load-Frequency Control of Interconnected Power Systems Using Chaotic Multi-objective Optimization

    Indranil Pan, Saptarshi Das
    Comments: 31 pages, 19 figures, 2 tables
    Journal-ref: Applied Soft Computing, Volume 29, April 2015, Pages 328-344
    Subjects: Optimization and Control (math.OC); Computational Engineering, Finance, and Science (cs.CE); Neural and Evolutionary Computing (cs.NE); Systems and Control (cs.SY)

    Fractional order proportional-integral-derivative (FOPID) controllers are
    designed for load frequency control (LFC) of two interconnected power systems.
    Conflicting time domain design objectives are considered in a multi-objective
    optimization (MOO) based design framework to design the gains and the
    fractional differ-integral orders of the FOPID controllers in the two areas.
    Here, we explore the effect of augmenting two different chaotic maps along with
    the uniform random number generator (RNG) in the popular MOO algorithm – the
    Non-dominated Sorting Genetic Algorithm-II (NSGA-II). Different measures of
    quality for MOO e.g. hypervolume indicator, moment of inertia based diversity
    metric, total Pareto spread, spacing metric are adopted to select the best set
    of controller parameters from multiple runs of all the NSGA-II variants (i.e.
    nominal and chaotic versions). The chaotic versions of the NSGA-II algorithm
    are compared with the standard NSGA-II in terms of solution quality and
    computational time. In addition, the Pareto optimal fronts showing the
    trade-off between the two conflicting time domain design objectives are
    compared to show the advantage of using the FOPID controller over that with
    simple PID controller. The nature of fast/slow and high/low noise amplification
    effects of the FOPID structure or the four quadrant operation in the two
    inter-connected areas of the power system is also explored. A fuzzy logic based
    method has been adopted next to select the best compromise solution from the
    best Pareto fronts corresponding to each MOO comparison criterion. The time
    domain system responses are shown for the fuzzy best compromise solutions under
    nominal operating conditions. A comparative analysis of the merits and
    demerits of each controller structure is then reported. A robustness analysis
    is also done for the PID and the FOPID controllers.
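    The chaotic maps used to augment the uniform RNG can be as simple as the
    logistic map; a minimal sketch (the specific map and the parameter r = 4
    here are illustrative assumptions, not necessarily the ones used in the
    paper):

```python
def logistic_map_sequence(x0, n, r=4.0):
    """Generate n values from the logistic map x_{k+1} = r * x_k * (1 - x_k).
    With r = 4 the map is fully chaotic on (0, 1); such a deterministic
    sequence can stand in for (or perturb) a uniform RNG when driving an
    evolutionary algorithm."""
    xs, x = [], x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        xs.append(x)
    return xs
```

    Unlike a pseudo-random stream, the sequence is fully reproducible from its
    seed x0, which is part of the appeal for comparing NSGA-II variants.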

    Fractional Order AGC for Distributed Energy Resources Using Robust Optimization

    Indranil Pan, Saptarshi Das
    Comments: 12 pages, 16 figures, 5 tables
    Journal-ref: IEEE Transactions on Smart Grid, Volume 7, Issue 5, Pages 2175 –
    2186, Sept 2016
    Subjects: Systems and Control (cs.SY); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC)

    The applicability of fractional order (FO) automatic generation control (AGC)
    for power system frequency oscillation damping is investigated in this paper,
    employing distributed energy generation. The hybrid power system employs
    various autonomous generation systems like wind turbine, solar photovoltaic,
    diesel engine, fuel-cell and aqua electrolyzer along with other energy storage
    devices like the battery and flywheel. The controller is placed in a remote
    location while receiving and sending signals over an unreliable communication
    network with stochastic delay. The controller parameters are tuned using robust
    optimization techniques employing different variants of Particle Swarm
    Optimization (PSO) and are compared with the corresponding optimal solutions.
    An archive-based strategy is used for reducing the number of function
    evaluations for the robust optimization methods. The solutions obtained through
    the robust optimization are able to handle higher variation in the controller
    gains and orders without significant decrease in the system performance. This
    is desirable from the FO controller implementation point of view, as the
    design is able to accommodate variations in the system parameters that may
    result from the approximation of FO operators using different realization
    methods and orders of accuracy. Also, a comparison is made between the FO
    and the integer
    order (IO) controllers to highlight the merits and demerits of each scheme.

    Intelligible Language Modeling with Input Switched Affine Networks

    Jakob N. Foerster, Justin Gilmer, Jan Chorowski, Jascha Sohl-Dickstein, David Sussillo
    Comments: ICLR 2017 submission: this https URL
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    The computational mechanisms by which nonlinear recurrent neural networks
    (RNNs) achieve their goals remain an open question. There exist many problem
    domains where intelligibility of the network model is crucial for deployment.
    Here we introduce a recurrent architecture composed of input-switched affine
    transformations, in other words an RNN without any nonlinearity and with one
    set of weights per input. We show that this architecture achieves near
    identical performance to traditional architectures on language modeling of
    Wikipedia text, for the same number of model parameters. It can obtain this
    performance with the potential for computational speedup compared to existing
    methods, by precomputing the composed affine transformations corresponding to
    longer input sequences. As our architecture is affine, we are able to
    understand the mechanisms by which it functions using linear methods. For
    example, we show how the network linearly combines contributions from the past
    to make predictions at the current time step. We show how representations for
    words can be combined in order to understand how context is transferred across
    word boundaries. Finally, we demonstrate how the system can be executed and
    analyzed in arbitrary bases to aid understanding.
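    The core idea, an affine transition with one weight set per input symbol,
    and the precomputation of composed affine maps can be sketched as follows
    (a toy illustration; the dimensions and names are assumptions, not the
    authors' code):

```python
import numpy as np

def isan_step(h, sym, W, b):
    """One transition of an input-switched affine network: no nonlinearity,
    one (W, b) pair per input symbol, h' = W[sym] @ h + b[sym]."""
    return W[sym] @ h + b[sym]

def compose_affine(seq, W, b):
    """Collapse the affine maps for a whole input sequence into a single
    pair (A, c), so that running the sequence equals h -> A @ h + c."""
    d = W[seq[0]].shape[0]
    A, c = np.eye(d), np.zeros(d)
    for sym in seq:
        A, c = W[sym] @ A, W[sym] @ c + b[sym]
    return A, c
```

    Because affine maps compose into affine maps, the whole effect of a long
    input sequence collapses into one matrix-vector pair, which is the source
    of both the potential speedup and the linear analyzability the abstract
    highlights.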


    Computer Vision and Pattern Recognition

    Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction

    Richard Zhang, Phillip Isola, Alexei A. Efros
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose split-brain autoencoders, a straightforward modification of the
    traditional autoencoder architecture, for unsupervised representation learning.
    The method adds a split to the network, resulting in two disjoint sub-networks.
    Each sub-network is trained to perform a difficult task — predicting one
    subset of the data channels from another. Together, the sub-networks extract
    features from the entire input signal. By forcing the network to solve
    cross-channel prediction tasks, we induce a representation within the network
    which transfers well to other, unseen tasks. This method achieves
    state-of-the-art performance on several large-scale transfer learning
    benchmarks.

    Monocular 3D Human Pose Estimation Using Transfer Learning and Improved CNN Supervision

    Dushyant Mehta, Helge Rhodin, Dan Casas, Oleksandr Sotnychenko, Weipeng Xu, Christian Theobalt
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose a new CNN-based method for regressing 3D human body pose from a
    single image that improves over the state-of-the-art on standard benchmarks by
    more than 25%. Our approach addresses the limited generalizability of models
    trained solely on the starkly limited publicly available 3D body pose data.
    Improved CNN supervision leverages first and second order parent relationships
    along the skeletal kinematic tree, and improved multi-level skip connections to
    learn better representations through implicit modification of the loss
    landscape. Further, transfer learning from 2D human pose prediction
    significantly improves accuracy and generalizability to unseen poses and camera
    views. Additionally, we contribute a new benchmark and training set for human
    body pose estimation from monocular images of real humans, with ground truth
    captured by marker-less motion capture. It complements existing corpora
    with greater diversity in pose, human appearance, clothing, occlusion, and
    viewpoints, and enables increased scope of augmentation. The benchmark covers
    outdoor and indoor scenes.

    3D Ultrasound image segmentation: A Survey

    Mohammad Hamed Mozaffari, WonSook Lee
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Three-dimensional Ultrasound image segmentation methods are surveyed in this
    paper. The focus of this report is to investigate applications of these
    techniques and a review of the original ideas and concepts. Although many
    two-dimensional image segmentation methods in the literature have mistakenly
    been presented as three-dimensional approaches, we review them here as
    three-dimensional techniques. We select the studies that have addressed the
    problem of medical
    three-dimensional Ultrasound image segmentation utilizing their proposed
    techniques. The evaluation methods and comparison between them are presented
    and tabulated in terms of evaluation techniques, interactivity, and robustness.

    InterpoNet, A brain inspired neural network for optical flow dense interpolation

    Shay Zweig, Lior Wolf
    Comments: 16 pages, 11 figures, 7 tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Sparse-to-dense interpolation for optical flow is a fundamental phase in the
    pipeline of most of the leading optical flow estimation algorithms. The current
    state-of-the-art method for interpolation, EpicFlow, is a local average method
    based on an edge aware geodesic distance. We propose a new data-driven
    sparse-to-dense interpolation algorithm based on a fully convolutional network.
    We draw inspiration from the filling-in process in the visual cortex and
    introduce lateral dependencies between neurons and multi-layer supervision into
    our learning process. We also show the importance of the image contour to the
    learning process. Our method is robust and outperforms EpicFlow on competitive
    optical flow benchmarks with several underlying matching algorithms. This leads
    to state-of-the-art performance on the Sintel and KITTI 2012 benchmarks.

    Computer Aided Detection of Oral Lesions on CT Images

    Shaikat Galib, Fahima Islam, Muhammad Abir, Hyoung-Koo Lee
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Oral lesions are important findings on computed tomography (CT) images. In
    this study, a fully automatic method to detect oral lesions in mandibular
    region from dental CT images is proposed. Two methods were developed to
    recognize two types of lesions namely (1) Close border (CB) lesions and (2)
    Open border (OB) lesions, which cover most of the lesion types that can be
    found on CT images. For the detection of CB lesions, fifteen features were
    extracted from each initial lesion candidate, and a multi-layer perceptron
    (MLP) neural network was used to classify suspicious regions. Moreover, OB
    lesions were detected using a rule-based image processing method, where no
    feature extraction or classification algorithm was used. The results were
    validated
    using a CT dataset of 52 patients, where 22 patients had abnormalities and 30
    patients were normal. On the non-training dataset, the CB detection algorithm
    yielded 71% sensitivity with 0.31 false positives per patient. Furthermore,
    the OB detection algorithm achieved 100% sensitivity with 0.13 false positives
    per patient. Results suggest that the proposed framework, which consists of
    two methods, has the potential to be used in a clinical context and assist
    radiologists in better diagnosis.

    Gossip training for deep learning

    Michael Blot, David Picard, Matthieu Cord, Nicolas Thome
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

    We address the issue of speeding up the training of convolutional networks.
    Here we study a distributed method adapted to stochastic gradient descent
    (SGD). The parallel optimization setup uses several threads, each applying
    individual gradient descents on a local variable. We propose a new way to share
    information between different threads inspired by gossip algorithms and showing
    good consensus convergence properties. Our method, called GoSGD, has the
    advantage of being fully asynchronous and decentralized. We compare our method
    to the recent EASGD [elastic] on CIFAR-10 and show encouraging results.
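    The gossip-style information sharing between threads can be illustrated with
    a pairwise mixing step (a simplified, synchronous sketch of the idea; GoSGD
    itself is asynchronous, and the mixing weight here is an assumption):

```python
def gossip_exchange(params_a, params_b, alpha=0.5):
    """One pairwise gossip step: two workers mix their local parameter
    vectors toward consensus. alpha is the weight each worker keeps on
    its own parameters; repeated exchanges drive all workers toward the
    same values while each continues its own gradient descent."""
    mixed_a = [alpha * a + (1 - alpha) * b for a, b in zip(params_a, params_b)]
    mixed_b = [alpha * b + (1 - alpha) * a for a, b in zip(params_a, params_b)]
    return mixed_a, mixed_b
```

    Note that the exchange preserves the sum of the parameters across the pair,
    which is the standard argument for why gossip averaging converges to
    consensus.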

    Efficient Linear Programming for Dense CRFs

    Thalaiyasingam Ajanthan, Alban Desmaison, Rudy Bunel, Mathieu Salzmann, Philip H.S. Torr, M. Pawan Kumar
    Comments: 24 pages, 10 figures, 4 tables and 51 equations
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The fully connected conditional random field (CRF) with Gaussian pairwise
    potentials has proven popular and effective for multi-class semantic
    segmentation. While the energy of a dense CRF can be minimized accurately using
    a linear programming (LP) relaxation, the state-of-the-art algorithm is too
    slow to be useful in practice. To alleviate this deficiency, we introduce an
    efficient LP minimization algorithm for dense CRFs. To this end, we develop a
    proximal minimization framework, where the dual of each proximal problem is
    optimized via block coordinate descent. We show that each block of variables
    can be efficiently optimized. Specifically, for one block, the problem
    decomposes into significantly smaller subproblems, each of which is defined
    over a single pixel. For the other block, the problem is optimized via
    conditional gradient descent. This has two advantages: 1) the conditional
    gradient can be computed in a time linear in the number of pixels and labels;
    and 2) the optimal step size can be computed analytically. Our experiments on
    standard datasets provide compelling evidence that our approach outperforms all
    existing baselines including the previous LP based approach for dense CRFs.
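    The analytic step size for conditional gradient (Frank-Wolfe) descent on a
    quadratic objective can be sketched on a toy problem over the probability
    simplex (illustrative only; this is not the paper's CRF objective):

```python
import numpy as np

def frank_wolfe_quadratic(Q, b, x, iters=50):
    """Conditional gradient (Frank-Wolfe) on f(x) = 0.5 x^T Q x - b^T x
    over the probability simplex. The linear subproblem is solved at a
    simplex vertex, and for a quadratic f the line-search step size has
    a closed form: gamma* = -g^T d / (d^T Q d), clipped to [0, 1]."""
    for _ in range(iters):
        g = Q @ x - b                      # gradient at current iterate
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0              # best vertex of the simplex
        d = s - x
        denom = d @ Q @ d
        gamma = 1.0 if denom <= 0 else min(1.0, max(0.0, -(g @ d) / denom))
        x = x + gamma * d
    return x
```

    The closed-form step follows from expanding f(x + gamma d) as a quadratic
    in gamma, which is the property the abstract exploits to avoid a line
    search.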

    Surveillance Video Parsing with Single Frame Supervision

    Si Liu, Changhu Wang, Ruihe Qian, Han Yu, Renda Bao
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Surveillance video parsing, which segments the video frames into several
    labels, e.g., face, pants, left-leg, has wide applications.
    However, pixel-wise annotation of all frames is tedious and inefficient. In
    this paper, we develop a Single frame Video Parsing (SVP) method which
    requires only one labeled frame per video in the training stage. To parse one
    particular frame,
    the video segment preceding the frame is jointly considered. SVP (1) roughly
    parses the frames within the video segment, (2) estimates the optical flow
    between frames and (3) fuses the rough parsing results warped by optical flow
    to produce the refined parsing result. The three components of SVP, namely
    frame parsing, optical flow estimation and temporal fusion are integrated in an
    end-to-end manner. Experimental results on two surveillance video datasets
    show the superiority of SVP over state-of-the-art methods.

    A Large-scale Distributed Video Parsing and Evaluation Platform

    Kai Yu, Yang Zhou, Da Li, Zhang Zhang, Kaiqi Huang
    Comments: Accepted by Chinese Conference on Intelligent Visual Surveillance 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Visual surveillance systems have become one of the largest sources of Big
    Visual Data in the real world. However, existing systems for video analysis
    still struggle with scalability, extensibility, and error-proneness, though
    great advances have been achieved in a number of visual recognition tasks and
    surveillance applications, e.g., pedestrian/vehicle detection and
    people/vehicle counting. Moreover, few algorithms explore the
    specific values/characteristics in large-scale surveillance videos. To address
    these problems in large-scale video analysis, we develop a scalable video
    parsing and evaluation platform through combining some advanced techniques for
    Big Data processing, including Spark Streaming, Kafka and Hadoop Distributed
    Filesystem (HDFS). Also, a Web User Interface is designed in the system, to
    collect users’ degrees of satisfaction on the recognition tasks so as to
    evaluate the performance of the whole system. Furthermore, the highly
    extensible platform running on the long-term surveillance videos makes it
    possible to develop more intelligent incremental algorithms to enhance the
    performance of various visual recognition tasks.

    Fast Face-swap Using Convolutional Neural Networks

    Iryna Korshunova, Wenzhe Shi, Joni Dambre, Lucas Theis
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We consider the problem of face swapping in images, where an input identity
    is transformed into a target identity while preserving pose, facial expression,
    and lighting. To perform this mapping, we use convolutional neural networks
    trained to capture the appearance of the target identity from an unstructured
    collection of his/her photographs. This approach is enabled by framing the
    face swapping problem in terms of style transfer, where the goal is to render
    an image in the style of another one. Building on recent advances in this
    area, we devise a new loss function that enables the network to produce
    highly photorealistic results. By combining neural networks with simple pre-
    and post-processing steps, we aim to make face swapping work in real time
    with no input from the user.

    Occlusion-Aware Video Deblurring with a New Layered Blur Model

    Byeongjoo Ahn, Tae Hyun Kim, Wonsik Kim, Kyoung Mu Lee
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present a deblurring method for scenes with occluding objects using a
    carefully designed layered blur model. The layered blur model is frequently
    used in motion deblurring to handle locally varying blurs, which are caused by
    object motions or depth variations in a scene. However, conventional models
    have a limitation in representing the layer interactions occurring at occlusion
    boundaries. In this paper, we address this limitation in both theoretical and
    experimental ways, and propose a new layered blur model that reflects the
    actual blur generation process. Based on this model, we develop an
    occlusion-aware
    deblurring method that can estimate not only the clear foreground and
    background, but also the object motion more accurately. We also provide a novel
    analysis on the blur kernel at object boundaries, which shows the distinctive
    characteristics of the blur kernel that cannot be captured by conventional blur
    models. Experimental results on synthetic and real blurred videos demonstrate
    that the proposed method yields superior results, especially at object
    boundaries.

    Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model

    Marcella Cornia, Lorenzo Baraldi, Giuseppe Serra, Rita Cucchiara
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Data-driven saliency has recently gained a lot of attention thanks to the use
    of Convolutional Neural Networks. In this paper we go beyond the standard
    approach to saliency prediction, in which gaze maps are computed with a
    feed-forward network, and we present a novel Saliency Attentive Model which can
    predict accurate saliency maps by incorporating attentive mechanisms. Our
    solution is composed of a Convolutional LSTM, that iteratively focuses on the
    most salient regions of the input, and a Residual Architecture designed to
    preserve spatial resolution. Additionally, to tackle the center bias present in
    human eye fixations, our model incorporates prior maps generated by learned
    Gaussian functions. We show, through an extensive evaluation, that the proposed
    architecture overcomes the current state of the art on three public saliency
    prediction datasets: SALICON, MIT300 and CAT2000. We further study the
    contribution of each key component to demonstrate their robustness in
    different scenarios.
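    The learned Gaussian prior maps used to model center bias can be illustrated
    by rendering an axis-aligned 2D Gaussian over the image grid (the parameter
    values here are illustrative defaults; in the model they are learned):

```python
import numpy as np

def gaussian_prior_map(h, w, mu=(0.5, 0.5), sigma=(0.25, 0.25)):
    """Render an h x w axis-aligned 2D Gaussian in normalized [0, 1]
    coordinates. Such a map, with learned mean and spread, encodes the
    tendency of human fixations to cluster near the image center."""
    ys = (np.arange(h) + 0.5) / h
    xs = (np.arange(w) + 0.5) / w
    gy = np.exp(-0.5 * ((ys - mu[1]) / sigma[1]) ** 2)
    gx = np.exp(-0.5 * ((xs - mu[0]) / sigma[0]) ** 2)
    return np.outer(gy, gx)
```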

    Lens Distortion Rectification using Triangulation based Interpolation

    Burak Benligiray, Cihan Topal
    Comments: International Symposium on Visual Computing, 2015
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Nonlinear lens distortion rectification is a common first step in image
    processing applications where the assumption of a linear camera model is
    essential. For rectifying the lens distortion, the forward distortion model
    needs to be known. However, many self-calibration methods estimate the inverse
    distortion model. In the literature, the inverse of the estimated model is
    approximated for image rectification, which introduces additional error to the
    system. We propose a novel distortion rectification method that uses the
    inverse distortion model directly. The method starts by mapping the distorted
    pixels to the rectified image using the inverse distortion model. The resulting
    set of points with subpixel locations are triangulated. The pixel values of the
    rectified image are linearly interpolated based on this triangulation. The
    method is applicable to all camera calibration methods that estimate the
    inverse distortion model and performs well across a large range of parameters.
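    The per-triangle linear interpolation step can be sketched with barycentric
    coordinates (a minimal illustration of the interpolation used after
    triangulating the forward-mapped pixel locations; not the authors' code):

```python
def barycentric_interpolate(tri, values, p):
    """Linear interpolation inside a single triangle via barycentric
    coordinates: the value at point p is the weighted sum of the three
    vertex values, with weights given by the relative sub-triangle areas."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    w1 = ((y2 - y3) * (p[0] - x3) + (x3 - x2) * (p[1] - y3)) / det
    w2 = ((y3 - y1) * (p[0] - x3) + (x1 - x3) * (p[1] - y3)) / det
    w3 = 1.0 - w1 - w2
    return w1 * values[0] + w2 * values[1] + w3 * values[2]
```

    In the full method, a Delaunay triangulation of the subpixel point set
    determines which triangle each rectified pixel falls into before this
    per-triangle interpolation is applied.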

    Is a picture worth a thousand words? A Deep Multi-Modal Fusion Architecture for Product Classification in e-commerce

    Tom Zahavy, Alessandro Magnani, Abhinandan Krishnan, Shie Mannor
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

    Classifying products into categories precisely and efficiently is a major
    challenge in modern e-commerce. The high traffic of new products uploaded daily
    and the dynamic nature of the categories raise the need for machine learning
    models that can reduce the cost and time of human editors. In this paper, we
    propose a decision level fusion approach for multi-modal product classification
    using text and image inputs. We train input specific state-of-the-art deep
    neural networks for each input source, show the potential of forging them
    together into a multi-modal architecture and train a novel policy network that
    learns to choose between them. Finally, we demonstrate that our multi-modal
    network improves the top-1 accuracy over both networks on a real-world
    large-scale product classification dataset that we collected from Walmart.com.
    While we focus on image-text fusion that characterizes e-commerce domains, our
    algorithms can be easily applied to other modalities such as audio, video,
    physical sensors, etc.

    Deep Quantization: Encoding Convolutional Activations with Deep Generative Model

    Zhaofan Qiu, Ting Yao, Tao Mei
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Deep convolutional neural networks (CNNs) have proven highly effective for
    visual recognition, where learning a universal representation from the
    activations of a convolutional layer is a fundamental problem. In this paper,
    we present Fisher Vector encoding with Variational Auto-Encoder (FV-VAE), a
    novel deep architecture that quantizes the local activations of a
    convolutional layer in a deep generative model by training them in an
    end-to-end manner. To incorporate the FV encoding strategy into deep
    generative models, we introduce the Variational Auto-Encoder model, which
    steers variational inference and learning in a neural network that can be
    straightforwardly optimized using standard stochastic gradient methods.
    Different from the FV characterized by conventional
    generative models (e.g., Gaussian Mixture Model) which parsimoniously fit a
    discrete mixture model to data distribution, the proposed FV-VAE is more
    flexible to represent the natural property of data for better generalization.
    Extensive experiments are conducted on three public datasets, i.e., UCF101,
    ActivityNet, and CUB-200-2011 in the context of video action recognition and
    fine-grained image classification, respectively. Superior results are reported
    when compared to state-of-the-art representations. Most remarkably, our
    proposed FV-VAE achieves the best published accuracy to date of 94.2% on
    UCF101.

    Inertial-Based Scale Estimation for Structure from Motion on Mobile Devices

    Janne Mustaniemi, Juho Kannala, Simo Särkkä, Jiri Matas, Janne Heikkilä
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Structure from motion algorithms have an inherent limitation that the
    reconstruction can only be determined up to the unknown scale factor. Modern
    mobile devices are equipped with an inertial measurement unit (IMU), which can
    be used for estimating the scale of the reconstruction. We propose a method
    that recovers the metric scale given inertial measurements and camera poses. In
    the process, we also perform a temporal and spatial alignment of the camera and
    the IMU. Therefore, our solution can be easily combined with any existing
    visual reconstruction software. The method can cope with noisy camera pose
    estimates, typically caused by motion blur or rolling shutter artifacts, via
    utilizing a Rauch-Tung-Striebel (RTS) smoother. Furthermore, the scale
    estimation is performed in the frequency domain, which provides more robustness
    to inaccurate sensor time stamps and noisy IMU samples than the previously used
    time domain representation. In contrast to previous methods, our approach has
    no parameters that need to be tuned for achieving a good performance. In the
    experiments, we show that the algorithm outperforms the state-of-the-art in
    both accuracy and convergence speed of the scale estimate. The accuracy of the
    scale is around 1% of the ground truth, depending on the recording. We also
    demonstrate that our method can improve the scale accuracy of the Project
    Tango’s built-in motion tracking.
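    A frequency-domain scale fit can be sketched as a least-squares ratio between
    Fourier magnitudes of the camera-derived and IMU accelerations (a simplified
    illustration; the paper's method additionally handles temporal/spatial
    alignment and RTS smoothing, and the band indices here are assumptions):

```python
import numpy as np

def estimate_scale_freq(cam_accel, imu_accel, band=(1, 30)):
    """Least-squares scale s minimizing ||A - s * C|| over the Fourier
    magnitudes of the camera-derived (C) and IMU (A) accelerations,
    restricted to a frequency band of interest. Working on magnitudes in
    a band discards DC drift and high-frequency sensor noise."""
    C = np.abs(np.fft.rfft(cam_accel))[band[0]:band[1]]
    A = np.abs(np.fft.rfft(imu_accel))[band[0]:band[1]]
    return float(np.dot(C, A) / np.dot(C, C))
```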

    Social Behavior Prediction from First Person Videos

    Shan Su, Jung Pyo Hong, Jianbo Shi, Hyun Soo Park
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper presents a method to predict the future movements (location and
    gaze direction) of basketball players as a whole from their first person
    videos. The predicted behaviors reflect an individual’s physical space that
    affords the next actions, while conforming to social behaviors by engaging in
    joint attention. Our key innovation is to use the 3D reconstruction of
    multiple first person cameras to automatically annotate each other’s visual
    semantics of social configurations.

    We leverage two learning signals uniquely embedded in first person videos.
    Individually, a first person video records the visual semantics of a spatial
    and social layout around a person that allows associating with past similar
    situations. Collectively, first person videos follow joint attention that can
    link the individuals to a group. We learn the egocentric visual semantics of
    group movements using a Siamese neural network to retrieve future trajectories.
    We consolidate the retrieved trajectories from all players by maximizing a
    measure of social compatibility—the gaze alignment towards joint attention
    predicted by their social formation, where the dynamics of joint attention is
    learned by a long-term recurrent convolutional network. This allows us to
    characterize which social configuration is more plausible and predict future
    group trajectories.

    Material Recognition from Local Appearance in Global Context

    Gabriel Schwartz, Ko Nishino
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recognition of materials has proven to be a challenging problem due to the
    wide variation in appearance within and between categories. Many recent
    material recognition methods treat materials as yet another set of labels like
    objects. Materials are, however, fundamentally different from objects as they
    have no inherent shape or defined spatial extent. This makes local material
    recognition particularly hard. Global image context, such as where the material
    is or what object it makes up, can be crucial to recognizing the material.
    Existing methods, however, operate on an implicit fusion of materials and
    context by using large receptive fields as input (i.e., large image patches).
    Such an approach can only take advantage of limited context as it appears
    during training, and will be bounded by the combinations seen in the training
    data. We instead show that recognizing materials purely from their local
    appearance and integrating separately recognized global contextual cues
    including objects and places leads to superior dense, per-pixel, material
    recognition. We achieve this by training a fully-convolutional material
    recognition network end-to-end with only material category supervision. We
    integrate object and place estimates into this network from independent CNNs.
    This approach avoids the necessity of preparing an infeasible amount of
    training data that covers the product space of materials, objects, and scenes,
    while fully leveraging contextual cues for dense material recognition.
    Experimental results validate the effectiveness of our approach and show that
    our method outperforms past methods that build on inseparable material and
    contextual information.

    Generating Holistic 3D Scene Abstractions for Text-based Image Retrieval

    Ang Li, Jin Sun, Joe Yue-Hei Ng, Ruichi Yu, Vlad I. Morariu, Larry S. Davis
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computational Geometry (cs.CG); Information Retrieval (cs.IR)

    Spatial relationships between objects provide important information for
    text-based image retrieval. Since users are more likely to describe a scene
    from a real-world perspective, using 3D spatial relationships rather than 2D
    relationships that assume a particular viewing direction, one of the main
    challenges is to infer the 3D structure that bridges images with users’ text
    descriptions. However, direct inference of 3D structure from images requires
    learning from large scale annotated data. Since interactions between objects
    can be reduced to a limited set of atomic spatial relations in 3D, we study the
    possibility of inferring 3D structure from a text description rather than an
    image, applying physical relation models to synthesize holistic 3D abstract
    object layouts satisfying the spatial constraints present in a textual
    description. We present a generic framework for retrieving images from a
    textual description of a scene by matching images with these generated abstract
    object layouts. Images are ranked by matching object detection outputs
    (bounding boxes) to 2D layout candidates (also represented by bounding boxes)
    which are obtained by projecting the 3D scenes with sampled camera directions.
    We validate our approach using public indoor scene datasets and show that our
    method outperforms both an object-occurrence-based baseline and a learned 2D
    pairwise-relation-based baseline.

    Measuring and modeling the perception of natural and unconstrained gaze in humans and machines

    Daniel Harari, Tao Gao, Nancy Kanwisher, Joshua Tenenbaum, Shimon Ullman
    Comments: Daniel Harari and Tao Gao contributed equally to this work
    Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

    Humans are remarkably adept at interpreting the gaze direction of other
    individuals in their surroundings. This skill is at the core of the ability to
    engage in joint visual attention, which is essential for establishing social
    interactions. How accurate are humans in determining the gaze direction of
    others in lifelike scenes, when they can move their heads and eyes freely, and
    what are the sources of information for the underlying perceptual processes?
    These questions pose a challenge from both empirical and computational
    perspectives, due to the complexity of the visual input in real-life
    situations. Here we measure empirically human accuracy in perceiving the gaze
    direction of others in lifelike scenes, and study computationally the sources
    of information and representations underlying this cognitive capacity. We show
    that humans perform better in face-to-face conditions compared with recorded
    conditions, and that this advantage is not due to the availability of input
    dynamics. We further show that humans still perform well when only the
    eyes region is visible, rather than the whole face. We develop a computational
    model that replicates the pattern of human performance, including the finding
    that the eyes region contains, on its own, the information required for
    estimating both head orientation and direction of gaze. Consistent with
    neurophysiological findings on task-specific face regions in the brain, the
    learned computational representations reproduce perceptual effects such as the
    Wollaston illusion, when trained to estimate direction of gaze, but not when
    trained to recognize objects or faces.

    On the Existence of Synchrostates in Multichannel EEG Signals during Face-perception Tasks

    Wasifa Jamal, Saptarshi Das, Koushik Maharatna, Fabio Apicella, Georgia Chronaki, Federico Sicca, David Cohen, Filippo Muratori
    Comments: 30 pages, 22 figures, 2 tables
    Journal-ref: Biomedical Physics & Engineering Express, vol. 1, no. 1, pp.
    015002, 2015
    Subjects: Medical Physics (physics.med-ph); Computer Vision and Pattern Recognition (cs.CV); Applications (stat.AP); Machine Learning (stat.ML)

    Phase synchronisation in multichannel EEG is known as the manifestation of
    functional brain connectivity. Traditional phase synchronisation studies are
    mostly based on time-averaged synchrony measures and hence do not preserve the
    temporal evolution of the phase difference. Here we propose a new method to
    show the existence of a small set of unique phase-synchronised patterns or
    “states” in multi-channel EEG recordings, each “state” remaining stable on the
    order of milliseconds, from typical and pathological subjects during face perception
    tasks. The proposed methodology bridges the concepts of EEG microstates and
    phase synchronisation in time and frequency domain respectively. The analysis
    is reported for four groups of children including typical, Autism Spectrum
    Disorder (ASD), low and high anxiety subjects – a total of 44 subjects. In all
    cases, we observe the consistent existence of these states – termed
    synchrostates – within specific cognition-related frequency bands (beta and
    gamma bands), though the topographies of these synchrostates differ for
    different subject groups with different pathological conditions. The
    inter-synchrostate switching follows a well-defined sequence capturing the
    underlying inter-electrode phase-relation dynamics in a stimulus- and
    person-centric manner. Our study is motivated by the well-known EEG
    microstates, which exhibit stable potential maps over the scalp. However, here we
    report a similar observation of quasi-stable phase synchronised states in
    multichannel EEG. The existence of the synchrostates, coupled with their unique
    switching-sequence characteristics, could open a potentially new direction
    beyond contemporary EEG phase synchronisation studies.
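
    As a rough sketch of the kind of analysis involved, instantaneous phase
    differences can be obtained via the Hilbert transform and the resulting
    per-sample phase-difference matrices clustered into a small set of
    recurring "states". This is a minimal illustration of the general idea,
    not the authors' pipeline; the function names and the plain k-means
    clustering step are assumptions.

```python
import numpy as np
from scipy.signal import hilbert

def phase_difference_frames(eeg):
    # eeg: (channels, samples), band-pass filtered to the band of interest
    phases = np.angle(hilbert(eeg, axis=1))      # instantaneous phase per channel
    d = phases[:, None, :] - phases[None, :, :]  # (channels, channels, samples)
    return np.transpose(d, (2, 0, 1))            # one C x C matrix per sample

def kmeans(frames, k, iters=50, seed=0):
    # plain k-means over the flattened phase-difference matrices; the
    # recurring cluster centroids play the role of the "synchrostates"
    rng = np.random.default_rng(seed)
    X = frames.reshape(len(frames), -1)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels
```

    The label sequence over time would then expose the inter-synchrostate
    switching pattern described in the abstract.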

    Easy-setup eye movement recording system for human-computer interaction

    Manh Duong Phung, Quang Vinh Tran, Kenji Hara, Hirohito Inagaki, Masanobu Abe
    Comments: In IEEE International Conference on Research, Innovation and Vision for the Future (RIVF), 2008
    Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV)

    Tracking the movement of human eyes is expected to yield natural and
    convenient applications based on human-computer interaction (HCI). To implement
    an effective eye-tracking system, eye movements must be recorded without
    restricting the user’s behavior or causing discomfort. This paper
    describes an eye movement recording system with a free-head, simple
    configuration. It does not require the user to wear anything on her head, and
    she can move her head freely. Instead of using a computer, the system uses a
    visual digital signal processor (DSP) camera to detect the positions of the eye
    corner and the pupil center, and then calculates the eye movement. Evaluation
    tests show that the sampling rate of the system can reach 300 Hz and that the
    accuracy is about 1.8 degrees/s.


    Artificial Intelligence

    Dialogue Learning With Human-In-The-Loop

    Jiwei Li, Alexander H. Miller, Sumit Chopra, Marc'Aurelio Ranzato, Jason Weston
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    An important aspect of developing conversational agents is to give a bot the
    ability to improve through communicating with humans and to learn from the
    mistakes that it makes. Most research has focused on learning from fixed
    training sets of labeled data rather than interacting with a dialogue partner
    in an online fashion. In this paper we explore this direction in a
    reinforcement learning setting where the bot improves its question-answering
    ability from feedback a teacher gives following its generated responses. We
    build a simulator that tests various aspects of such learning in a synthetic
    environment, and introduce models that work in this regime. Finally, real
    experiments with Mechanical Turk validate the approach.

    Learning Concept Hierarchies through Probabilistic Topic Modeling

    V. S. Anoop, S. Asharaf, P. Deepak
    Journal-ref: International Journal of Information Processing (IJIP), Volume 10,
    Issue 3, 2016
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)

    With the advent of the semantic web, various tools and techniques have been
    introduced for presenting and organizing knowledge. Concept hierarchies are one
    such technique that has gained significant attention due to its usefulness in
    creating domain ontologies, which are considered an integral part of the
    semantic web. Automated concept hierarchy learning algorithms focus on
    extracting relevant concepts from an unstructured text corpus and connecting
    them together by identifying potential relations that exist between them. In this paper, we
    propose a novel approach that identifies relevant concepts from plain text and
    then learns a hierarchy of concepts by exploiting the subsumption relation
    between them. To start with, we model topics using a probabilistic topic model
    and then apply lightweight linguistic processing to extract semantically rich
    concepts. We then connect concepts by identifying an “is-a” relationship
    between pairs of concepts. The proposed method is completely unsupervised and
    requires no domain-specific training corpus for concept extraction and
    learning. Experiments on large real-world text corpora such as the BBC News
    dataset and the Reuters News corpus show that the proposed method outperforms
    some of the existing methods for concept extraction, and that efficient concept
    hierarchy learning is possible if the overall task is guided by a probabilistic
    topic modeling algorithm.
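
    The subsumption step can be illustrated with a document-co-occurrence
    criterion in the spirit of Sanderson and Croft: concept A subsumes B when A
    appears in (nearly) all documents that contain B but not vice versa. A
    hypothetical sketch; the threshold and the attachment rule are assumptions,
    not the paper's exact method.

```python
def subsumes(doc_sets, a, b, threshold=0.8):
    # a subsumes b if b's documents are (mostly) a subset of a's
    # and a occurs in strictly more documents.
    da, db = doc_sets[a], doc_sets[b]
    if len(da) <= len(db):
        return False
    return len(da & db) / len(db) >= threshold

def build_hierarchy(doc_sets, threshold=0.8):
    # attach each concept to its most specific subsumer (fewest documents),
    # yielding child -> parent ("is-a") edges
    edges = {}
    for b in doc_sets:
        parents = [a for a in doc_sets
                   if a != b and subsumes(doc_sets, a, b, threshold)]
        if parents:
            edges[b] = min(parents, key=lambda a: len(doc_sets[a]))
    return edges
```

    For example, if "animal" occurs in every document where "dog" occurs but not
    conversely, "dog" is attached under "animal".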

    Intelligible Language Modeling with Input Switched Affine Networks

    Jakob N. Foerster, Justin Gilmer, Jan Chorowski, Jascha Sohl-Dickstein, David Sussillo
    Comments: ICLR 2017 submission: this https URL
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    The computational mechanisms by which nonlinear recurrent neural networks
    (RNNs) achieve their goals remain an open question. There exist many problem
    domains where intelligibility of the network model is crucial for deployment.
    Here we introduce a recurrent architecture composed of input-switched affine
    transformations, in other words an RNN without any nonlinearity and with one
    set of weights per input. We show that this architecture achieves near
    identical performance to traditional architectures on language modeling of
    Wikipedia text, for the same number of model parameters. It can obtain this
    performance with the potential for computational speedup compared to existing
    methods, by precomputing the composed affine transformations corresponding to
    longer input sequences. As our architecture is affine, we are able to
    understand the mechanisms by which it functions using linear methods. For
    example, we show how the network linearly combines contributions from the past
    to make predictions at the current time step. We show how representations for
    words can be combined in order to understand how context is transferred across
    word boundaries. Finally, we demonstrate how the system can be executed and
    analyzed in arbitrary bases to aid understanding.
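
    The dynamics described above, one affine map per input symbol and the
    ability to precompute the composition of maps for a longer input sequence,
    can be sketched as follows (dimensions and names are illustrative):

```python
import numpy as np

def isan_step(h, token, W, b):
    # one affine map (W_x, b_x) per input symbol; no nonlinearity
    return W[token] @ h + b[token]

def run(h0, tokens, W, b):
    # step-by-step recurrence over a token sequence
    h = h0
    for t in tokens:
        h = isan_step(h, t, W, b)
    return h

def compose(tokens, W, b, n):
    # precompute the single affine map h -> A h + c equivalent to the
    # whole token sequence, enabling the speedup mentioned in the abstract
    A, c = np.eye(n), np.zeros(n)
    for t in tokens:
        A, c = W[t] @ A, W[t] @ c + b[t]
    return A, c
```

    Because the composed map is itself affine, running the sequence step by
    step and applying the precomputed (A, c) give identical hidden states.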

    Adams Conditioning and Likelihood Ratio Transfer Mediated Inference

    Jan A. Bergstra
    Comments: 43 pages
    Subjects: Artificial Intelligence (cs.AI)

    Forensic science advocates the use of inference mechanisms which may be
    viewed as simple multi-agent protocols. An important protocol of this kind
    involves an agent FE (forensic expert) who communicates to a second agent TOF
    (trier of fact), first, its value of a certain likelihood ratio with respect to
    its own belief state, which is supposed to be captured by a probability function
    on FE’s proposition space. Subsequently FE communicates its recently acquired
    confirmation that a certain evidence proposition is true. The inference part of
    this sort of reasoning, here referred to as likelihood ratio transfer mediated
    reasoning, involves TOF’s revision of its own belief state, and in particular
    an evaluation of the resulting belief in the hypothesis proposition.

    Different realizations of likelihood ratio transfer mediated reasoning are
    distinguished: if the evidence hypothesis is included in the prior proposition
    space of TOF then a comparison is made between understanding the TOF side of a
    belief revision step as a composition of two successive steps of single
    likelihood Adams conditioning followed by a Bayes conditioning step, and as a
    single step of double likelihood Adams conditioning followed by Bayes
    conditioning; if, however, the evidence hypothesis is initially outside the
    proposition space of TOF an application of proposition kinetics for the
    introduction of the evidence proposition precedes Bayesian conditioning, which
    is followed by Jeffrey conditioning on the hypothesis proposition.
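
    At its simplest, the TOF-side update on a hypothesis already in its
    proposition space is Bayes' rule in odds form: posterior odds equal the
    communicated likelihood ratio times the prior odds. A minimal sketch (this
    elementary form does not capture the Adams/Jeffrey conditioning
    distinctions the paper analyzes):

```python
def posterior_from_lr(prior_h, likelihood_ratio):
    # Bayes by odds: posterior odds = LR * prior odds,
    # then convert the odds back to a probability.
    prior_odds = prior_h / (1.0 - prior_h)
    post_odds = likelihood_ratio * prior_odds
    return post_odds / (1.0 + post_odds)
```

    For instance, a prior of 0.5 combined with a likelihood ratio of 3 yields a
    posterior of 0.75.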

    NewsQA: A Machine Comprehension Dataset

    Adam Trischler, Tong Wang, Xingdi Yuan, Justin Harris, Alessandro Sordoni, Philip Bachman, Kaheer Suleman
    Comments: Under review for ICLR 2016
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    We present NewsQA, a challenging machine comprehension dataset of over
    100,000 question-answer pairs. Crowdworkers supply questions and answers based
    on a set of over 10,000 news articles from CNN, with answers consisting of
    spans of text from the corresponding articles. We collect this dataset through
    a four-stage process designed to solicit exploratory questions that require
    reasoning. A thorough analysis confirms that NewsQA demands abilities beyond
    simple word matching and recognizing entailment. We measure human performance
    on the dataset and compare it to several strong neural models. The performance
    gap between humans and machines (25.3% F1) indicates that significant progress
    can be made on NewsQA through future research. The dataset is freely available
    at datasets.maluuba.com/NewsQA.
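
    The reported F1 gap refers to token-overlap F1 between predicted and
    reference answer spans. A sketch of the standard (SQuAD-style)
    computation, assuming simple whitespace tokenization:

```python
from collections import Counter

def token_f1(prediction, reference):
    # token-overlap F1 commonly used to score extractive QA spans
    pred, ref = prediction.lower().split(), reference.lower().split()
    common = Counter(pred) & Counter(ref)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

    An exact-match prediction scores 1.0; a partial span such as "red car"
    against "the red car" scores 0.8.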

    Measuring and modeling the perception of natural and unconstrained gaze in humans and machines

    Daniel Harari, Tao Gao, Nancy Kanwisher, Joshua Tenenbaum, Shimon Ullman
    Comments: Daniel Harari and Tao Gao contributed equally to this work
    Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

    Humans are remarkably adept at interpreting the gaze direction of other
    individuals in their surroundings. This skill is at the core of the ability to
    engage in joint visual attention, which is essential for establishing social
    interactions. How accurate are humans in determining the gaze direction of
    others in lifelike scenes, when they can move their heads and eyes freely, and
    what are the sources of information for the underlying perceptual processes?
    These questions pose a challenge from both empirical and computational
    perspectives, due to the complexity of the visual input in real-life
    situations. Here we measure empirically human accuracy in perceiving the gaze
    direction of others in lifelike scenes, and study computationally the sources
    of information and representations underlying this cognitive capacity. We show
    that humans perform better in face-to-face conditions compared with recorded
    conditions, and that this advantage is not due to the availability of input
    dynamics. We further show that humans still perform well when only the
    eyes region is visible, rather than the whole face. We develop a computational
    model that replicates the pattern of human performance, including the finding
    that the eyes region contains, on its own, the information required for
    estimating both head orientation and direction of gaze. Consistent with
    neurophysiological findings on task-specific face regions in the brain, the
    learned computational representations reproduce perceptual effects such as the
    Wollaston illusion, when trained to estimate direction of gaze, but not when
    trained to recognize objects or faces.

    Fractional Order Fuzzy Control of Hybrid Power System with Renewable Generation Using Chaotic PSO

    Indranil Pan, Saptarshi Das
    Comments: 21 pages, 12 figures, 4 tables
    Journal-ref: ISA Transactions, Volume 62, May 2016, Pages 19-29
    Subjects: Systems and Control (cs.SY); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Optimization and Control (math.OC); Chaotic Dynamics (nlin.CD)

    This paper investigates the operation of a hybrid power system through a
    novel fuzzy control scheme. The hybrid power system employs various autonomous
    generation systems like wind turbine, solar photovoltaic, diesel engine,
    fuel-cell, aqua electrolyzer etc. Other energy storage devices like the
    battery, flywheel and ultra-capacitor are also present in the network. A novel
    fractional order (FO) fuzzy control scheme is employed and its parameters are
    tuned with a particle swarm optimization (PSO) algorithm augmented with two
    chaotic maps for achieving an improved performance. This FO fuzzy controller
    shows better performance than the classical PID and the integer-order fuzzy
    PID controller in both linear and nonlinear operating regimes. The FO fuzzy
    controller also shows stronger robustness against system parameter
    variation and rate-constraint nonlinearity than the other controller
    structures. Robustness is a highly desirable property in such a scenario
    since many components of the hybrid power system may be switched on/off or may
    run at lower/higher power output, at different time instants.
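
    One common way to augment PSO with chaotic maps is to replace the uniform
    random numbers in the velocity update with iterates of a chaotic map such
    as the logistic map. A hypothetical sketch; the abstract does not identify
    which two maps the paper actually uses.

```python
def logistic_sequence(x0, n, r=4.0):
    # fully chaotic logistic map (r = 4); its iterates in (0, 1) can
    # stand in for the uniform random numbers of standard PSO
    seq, x = [], x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        seq.append(x)
    return seq

def pso_velocity(v, x, pbest, gbest, c1, c2, r1, r2, w=0.7):
    # standard PSO velocity update; r1, r2 are drawn from the chaotic
    # sequence instead of a pseudo-random generator
    return [w * vi + c1 * r1 * (p - xi) + c2 * r2 * (g - xi)
            for vi, xi, p, g in zip(v, x, pbest, gbest)]
```

    The rest of the PSO loop (position update, personal/global best tracking)
    is unchanged; only the source of randomness differs.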

    Fractional Order AGC for Distributed Energy Resources Using Robust Optimization

    Indranil Pan, Saptarshi Das
    Comments: 12 pages, 16 figures, 5 tables
    Journal-ref: IEEE Transactions on Smart Grid, Volume 7, Issue 5, Pages 2175 –
    2186, Sept 2016
    Subjects: Systems and Control (cs.SY); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC)

    The applicability of fractional order (FO) automatic generation control (AGC)
    for power system frequency oscillation damping is investigated in this paper,
    employing distributed energy generation. The hybrid power system employs
    various autonomous generation systems like wind turbine, solar photovoltaic,
    diesel engine, fuel-cell and aqua electrolyzer along with other energy storage
    devices like the battery and flywheel. The controller is placed in a remote
    location while receiving and sending signals over an unreliable communication
    network with stochastic delay. The controller parameters are tuned using robust
    optimization techniques employing different variants of Particle Swarm
    Optimization (PSO) and are compared with the corresponding optimal solutions.
    An archival based strategy is used for reducing the number of function
    evaluations for the robust optimization methods. The solutions obtained through
    the robust optimization are able to handle higher variation in the controller
    gains and orders without significant decrease in the system performance. This
    is desirable from the FO controller implementation point of view, as the design
    is able to accommodate variations in the system parameters, which may result
    from the approximation of FO operators using different realization methods and
    orders of accuracy. A comparison is also made between the FO and the integer
    order (IO) controllers to highlight the merits and demerits of each scheme.

    Semantic Parsing of Mathematics by Context-based Learning from Aligned Corpora and Theorem Proving

    Cezary Kaliszyk, Josef Urban, Jiří Vyskočil
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    We study methods for automated parsing of informal mathematical expressions
    into formal ones, a main prerequisite for deep computer understanding of
    informal mathematical texts. We propose a context-based parsing approach that
    combines efficient statistical learning of deep parse trees with their semantic
    pruning by type checking and large-theory automated theorem proving. We show
    that the methods very significantly improve on previous results in parsing
    theorems from the Flyspeck corpus.

    Generic and Efficient Solution Solves the Shortest Paths Problem in Square Runtime

    Yong Tan
    Comments: 26 pages, 11,100 words, 2 pictures
    Subjects: Discrete Mathematics (cs.DM); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)

    We study a group of new methods to solve the shortest paths problem on a
    given fixed-weight instance. Our aim is to meet the qualities of genericity,
    efficiency, and precision that we generally require of a methodology. Besides
    proofs that guarantee our measures work correctly, we pay particular attention
    to the underlying theory of calculation and logic, in favor of extending our
    methods to a wide range of fields including decision making, operations
    research, economics, management, robotics, AI, etc.

    Learning Filter Banks Using Deep Learning For Acoustic Signals

    Shuhui Qu, Juncheng Li, Wei Dai, Samarjit Das
    Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)

    Designing appropriate features for acoustic event recognition tasks is an
    active field of research. Expressive features should both improve task
    performance and be interpretable. Currently, heuristically designed features
    based on domain knowledge require tremendous hand-crafting effort, while
    features extracted through deep networks are difficult for humans to interpret.
    In this work, we explore an experience-guided learning method for designing
    acoustic features. This is a novel hybrid approach combining domain knowledge
    and purely data-driven feature design. Based
    on the procedure of log Mel-filter banks, we design a filter bank learning
    layer. We concatenate this layer with a convolutional neural network (CNN)
    model. After training the network, the weight of the filter bank learning layer
    is extracted to facilitate the design of acoustic features. We smooth the
    trained weights of the learning layer and re-initialize the filter bank
    learning layer with them as an audio feature extractor. For the environmental
    sound recognition task based on the UrbanSound8K dataset, experience-guided
    learning leads to a 2% accuracy improvement compared with the fixed feature
    extractor (the log Mel-filter bank). The shapes of the new filter banks are
    visualized and explained to demonstrate the effectiveness of the feature design
    process.
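
    The fixed baseline the learnable layer starts from, the log Mel-filter
    bank, is built from triangular filters evenly spaced on the mel scale. A
    sketch of that standard construction (applied to a magnitude spectrogram
    before taking logs); parameter names are illustrative:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, sample_rate):
    # triangular filters with centers evenly spaced on the mel scale;
    # this is the fixed initialization a learnable layer could start from
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                       n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sample_rate).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):        # rising slope
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):       # falling slope
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb
```

    Multiplying a power spectrogram by this matrix and taking the log yields
    the usual log mel-filter-bank features.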

    Maximizing Non-Monotone DR-Submodular Functions with Cardinality Constraints

    Ali Khodabakhsh, Evdokia Nikolova
    Comments: 7 pages with 2 figures
    Subjects: Data Structures and Algorithms (cs.DS); Artificial Intelligence (cs.AI)

    We consider the problem of maximizing a non-monotone DR-submodular function
    subject to a cardinality constraint. Diminishing returns (DR) submodularity is
    a generalization of the diminishing returns property for functions defined over
    the integer lattice. This generalization can be used to solve many machine
    learning or combinatorial optimization problems such as optimal budget
    allocation, revenue maximization, etc. In this work we propose the first
    polynomial-time approximation algorithms for non-monotone constrained
    maximization. We implement our algorithms for a revenue maximization problem
    with a real-world dataset to check their efficiency and performance.
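
    For intuition, a plain coordinate-wise greedy over the integer lattice
    illustrates the setting; note that this baseline carries no approximation
    guarantee for non-monotone functions, which is precisely what the proposed
    algorithms address. Names and the stopping rule are illustrative.

```python
def lattice_greedy(f, n, budget, box):
    # maximize f over integer vectors x with sum(x) <= budget and
    # 0 <= x[i] <= box[i], adding one unit at a time to the coordinate
    # with the largest marginal gain (diminishing returns makes these
    # marginals shrink as x grows)
    x = [0] * n
    for _ in range(budget):
        gains = []
        for i in range(n):
            if x[i] < box[i]:
                y = x[:]
                y[i] += 1
                gains.append((f(y) - f(x), i))
        if not gains:
            break
        gain, i = max(gains)
        if gain <= 0:
            break  # no unit step helps (possible in the non-monotone case)
        x[i] += 1
    return x
```

    A separable concave function such as the sum of square roots is
    DR-submodular, so the greedy spreads the budget across coordinates.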

    Emergence of foveal image sampling from learning to attend in visual scenes

    Brian Cheung, Eric Weiss, Bruno Olshausen
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG)

    We describe a neural attention model with a learnable retinal sampling
    lattice. The model is trained on a visual search task requiring the
    classification of an object embedded in a visual scene amidst background
    distractors using the smallest number of fixations. We explore the tiling
    properties that emerge in the model’s retinal sampling lattice after training.
    Specifically, we show that this lattice resembles the eccentricity dependent
    sampling lattice of the primate retina, with a high resolution region in the
    fovea surrounded by a low resolution periphery. Furthermore, we find conditions
    where these emergent properties are amplified or eliminated providing clues to
    their function.

    Split-door criterion for causal identification: Automatic search for natural experiments

    Amit Sharma, Jake M. Hofman, Duncan J. Watts
    Subjects: Methodology (stat.ME); Artificial Intelligence (cs.AI); Applications (stat.AP)

    Unobserved or unknown confounders complicate even the simplest attempts to
    estimate the effect of one variable on another using observational data. When
    cause and effect are both affected by unobserved confounders, methods based on
    identifying natural experiments have been proposed to eliminate confounds.
    However, their validity is hard to verify because they depend on assumptions
    about the independence of variables that, by definition, cannot be measured. In
    this paper we investigate a particular scenario in time series data that
    permits causal identification in the presence of unobserved confounders and
    present an algorithm to automatically find such scenarios. Specifically, we
    examine what we call the split-door setting, when the effect variable can be
    split up into two parts: one that is potentially affected by the cause, and
    another that is independent of it. We show that when both of these variables
    are caused by the same (unobserved) confounders, the problem of identification
    reduces to that of testing for independence among observed variables. We
    discuss various situations in which split-door variables are commonly recorded
    in both online and offline settings, and demonstrate the method by estimating
    the causal impact of Amazon’s recommender system, obtaining more than 23,000
    natural experiments that provide similar—but more precise—estimates than
    past studies.
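
    A toy version of the criterion: given a split of the outcome into a part
    potentially affected by the cause and a part that should be independent of
    it, check the independence and, only if it holds, estimate the effect from
    the affected part. This correlation-threshold sketch is an illustration
    only; the paper relies on formal independence tests.

```python
import numpy as np

def correlation(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.corrcoef(x, y)[0, 1])

def split_door_effect(cause, y_affected, y_independent, tol=0.05):
    # If the "independent" part of the outcome is (approximately)
    # uncorrelated with the cause, treat the window as a natural
    # experiment and estimate the effect by a simple regression slope.
    if abs(correlation(cause, y_independent)) > tol:
        return None  # split-door criterion not satisfied in this window
    slope = np.polyfit(cause, y_affected, 1)[0]
    return float(slope)
```

    Windows that fail the independence check are discarded, which is how the
    automatic search filters candidate natural experiments.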


    Information Retrieval

    A Graph-based Push Service Platform

    Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, Xiuqiang He
    Subjects: Information Retrieval (cs.IR)

    It is well known that learning customers’ preferences and making
    recommendations to them in today’s information-exploded environment is
    critical and non-trivial in an online system. There are two different modes of
    recommendation systems, namely pull-mode and push-mode. The majority of
    recommendation systems are pull-mode, recommending items to users only when
    and after users enter the Application Market, while push-mode works more
    actively to enhance or rebuild the connection between the Application Market
    and users. As Huawei is one of the most successful phone manufacturers, both
    the number of users and the number of apps have increased dramatically in the
    Huawei Application Store (also named Hispace Store), which had approximately
    0.3 billion registered users and 1.2 million apps as of 2016 and whose user
    base continues to grow rapidly. For the needs of
    real scenarios, we establish a Push Service Platform (PSP for short) to
    discover the target user group automatically from web-scale user operation
    log data, together with an additional small set of labelled apps (usually
    around 10), in the Hispace Store. As presented in this work, PSP includes a
    distributed storage layer, an application layer, and an evaluation layer. In
    the application layer, we design a practical graph-based algorithm (named
    A-PARW) for user group discovery, which is an approximate version of
    partially absorbing random walk. Based on mode I of A-PARW, the effectiveness
    of our system is significantly improved compared to the predecessor of the
    presented system, which uses Personalized PageRank in its application layer.
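
    The predecessor's Personalized PageRank component can be sketched as a
    seeded power iteration; A-PARW generalizes this kind of seeded diffusion
    with node-dependent absorption. The matrix form and parameters here are
    illustrative only.

```python
import numpy as np

def personalized_pagerank(A, seeds, alpha=0.15, iters=100):
    # Power iteration for Personalized PageRank: random walk on the
    # adjacency matrix A with teleportation back to the seed (labelled)
    # nodes with probability alpha at every step.
    A = np.asarray(A, float)
    P = A / A.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    s = np.zeros(len(A))
    s[list(seeds)] = 1.0 / len(seeds)      # uniform mass on the seed set
    p = s.copy()
    for _ in range(iters):
        p = alpha * s + (1 - alpha) * P.T @ p
    return p  # higher score = stronger affinity to the seed apps/users
```

    Ranking nodes by this score and thresholding gives a simple target-group
    discovery baseline of the kind the platform improves upon.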

    Learning Concept Hierarchies through Probabilistic Topic Modeling

    V. S. Anoop, S. Asharaf, P. Deepak
    Journal-ref: International Journal of Information Processing (IJIP), Volume 10,
    Issue 3, 2016
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)

    With the advent of the semantic web, various tools and techniques have been
    introduced for presenting and organizing knowledge. Concept hierarchies are one
    such technique that has gained significant attention due to its usefulness in
    creating domain ontologies, which are considered an integral part of the
    semantic web. Automated concept hierarchy learning algorithms focus on
    extracting relevant concepts from an unstructured text corpus and connecting
    them together by identifying potential relations that exist between them. In this paper, we
    propose a novel approach that identifies relevant concepts from plain text and
    then learns a hierarchy of concepts by exploiting the subsumption relation
    between them. To start with, we model topics using a probabilistic topic model
    and then apply lightweight linguistic processing to extract semantically rich
    concepts. We then connect concepts by identifying an “is-a” relationship
    between pairs of concepts. The proposed method is completely unsupervised and
    requires no domain-specific training corpus for concept extraction and
    learning. Experiments on large real-world text corpora such as the BBC News
    dataset and the Reuters News corpus show that the proposed method outperforms
    some of the existing methods for concept extraction, and that efficient concept
    hierarchy learning is possible if the overall task is guided by a probabilistic
    topic modeling algorithm.

    Generating Holistic 3D Scene Abstractions for Text-based Image Retrieval

    Ang Li, Jin Sun, Joe Yue-Hei Ng, Ruichi Yu, Vlad I. Morariu, Larry S. Davis
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computational Geometry (cs.CG); Information Retrieval (cs.IR)

    Spatial relationships between objects provide important information for
    text-based image retrieval. As users are more likely to describe a scene from a
    real world perspective, using 3D spatial relationships rather than 2D
    relationships that assume a particular viewing direction, one of the main
    challenges is to infer the 3D structure that bridges images with users’ text
    descriptions. However, direct inference of 3D structure from images requires
    learning from large scale annotated data. Since interactions between objects
    can be reduced to a limited set of atomic spatial relations in 3D, we study the
    possibility of inferring 3D structure from a text description rather than an
    image, applying physical relation models to synthesize holistic 3D abstract
    object layouts satisfying the spatial constraints present in a textual
    description. We present a generic framework for retrieving images from a
    textual description of a scene by matching images with these generated abstract
    object layouts. Images are ranked by matching object detection outputs
    (bounding boxes) to 2D layout candidates (also represented by bounding boxes)
    which are obtained by projecting the 3D scenes with sampled camera directions.
    We validate our approach using public indoor scene datasets and show that
    our method outperforms both an object-occurrence-based baseline and a
    learned 2D pairwise-relation-based baseline.

    Times series averaging and denoising from a probabilistic perspective on time-elastic kernels

    Pierre-François Marteau (EXPRESSION)
    Subjects: Learning (cs.LG); Information Retrieval (cs.IR)

    In the light of regularized dynamic time warping kernels, this paper
    reconsiders the concept of time elastic centroid for a set of time series.
    We derive a new algorithm based on a probabilistic interpretation of kernel
    alignment matrices. This algorithm expresses the averaging process in terms
    of a stochastic alignment automaton. It uses an iterative agglomerative
    heuristic method for averaging the aligned samples, while also averaging the
    times of occurrence of the aligned samples. By comparing classification
    accuracies for 45 heterogeneous time series datasets obtained by first
    nearest centroid/medoid classifiers, we show that: i) centroid-based
    approaches significantly outperform medoid-based approaches, ii) for the
    considered datasets, our algorithm, which combines averaging in the sample
    space and along the time axes, emerges as the most significantly robust
    model for time-elastic averaging with a promising noise reduction
    capability. We also demonstrate its benefit in an isolated gesture
    recognition experiment and its ability to significantly reduce the size of
    training instance sets. Finally we highlight its denoising capability using
    demonstrative synthetic data: we show that it is possible to retrieve, from
    few noisy instances, a signal whose components are scattered in a wide
    spectral band.
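
    The paper's algorithm operates on kernel alignment matrices and a
    stochastic alignment automaton; as a rough intuition for "averaging aligned
    samples both in value and in time of occurrence", here is a much-simplified
    sketch that uses a plain DTW alignment between two series. All details
    (pairwise averaging, the resampling grid) are illustrative assumptions, not
    the paper's method.

```python
import numpy as np

def dtw_path(a, b):
    """Dynamic-time-warping alignment path between 1-D series a and b."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack from (n, m) to (1, 1).
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def pairwise_elastic_average(a, b):
    """Average two series along a DTW alignment: each aligned pair is
    averaged both in value and in time of occurrence."""
    path = dtw_path(a, b)
    times = np.array([(i + j) / 2.0 for i, j in path])
    values = np.array([(a[i] + b[j]) / 2.0 for i, j in path])
    # Resample onto a regular time grid of the average length.
    grid = np.linspace(times.min(), times.max(), (len(a) + len(b)) // 2)
    return np.interp(grid, times, values)

t = np.linspace(0, 2 * np.pi, 40)
a = np.sin(t)
b = np.sin(t + 0.3)  # time-shifted copy of the same signal
avg = pairwise_elastic_average(a, b)
```

    Averaging along the alignment path rather than index-by-index is what keeps
    a time-shifted pair from cancelling each other out.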


    Computation and Language

    NewsQA: A Machine Comprehension Dataset

    Adam Trischler, Tong Wang, Xingdi Yuan, Justin Harris, Alessandro Sordoni, Philip Bachman, Kaheer Suleman
    Comments: Under review for ICLR 2017
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    We present NewsQA, a challenging machine comprehension dataset of over
    100,000 question-answer pairs. Crowdworkers supply questions and answers based
    on a set of over 10,000 news articles from CNN, with answers consisting of
    spans of text from the corresponding articles. We collect this dataset through
    a four-stage process designed to solicit exploratory questions that require
    reasoning. A thorough analysis confirms that NewsQA demands abilities beyond
    simple word matching and recognizing entailment. We measure human performance
    on the dataset and compare it to several strong neural models. The performance
    gap between humans and machines (25.3% F1) indicates that significant progress
    can be made on NewsQA through future research. The dataset is freely available
    at datasets.maluuba.com/NewsQA.

    Geometry of Compositionality

    Hongyu Gong, Suma Bhat, Pramod Viswanath
    Subjects: Computation and Language (cs.CL)

    This paper proposes a simple test for compositionality (i.e., literal usage)
    of a word or phrase in a context-specific way. The test is computationally
    simple, relying on no external resources and using only a set of trained
    word vectors. Experiments show that the proposed method is competitive with state of
    the art and displays high accuracy in context-specific compositionality
    detection of a variety of natural language phenomena (idiomaticity, sarcasm,
    metaphor) for different datasets in multiple languages. The key insight is to
    connect compositionality to a curious geometric property of word embeddings,
    which is of independent interest.
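
    One way to operationalize a vector-space compositionality test is to
    measure how well the phrase vector is explained by the subspace spanned by
    its context word vectors. The sketch below assumes this projection-based
    reading; the paper's exact scoring function may differ.

```python
import numpy as np

def compositionality_score(phrase_vec, context_vecs):
    """Cosine between a phrase vector and its orthogonal projection onto
    the subspace spanned by the context word vectors. A score near 1
    suggests literal (compositional) usage in that context."""
    C = np.asarray(context_vecs).T                  # (dim, n_context)
    coef, *_ = np.linalg.lstsq(C, phrase_vec, rcond=None)
    proj = C @ coef                                 # projection onto span(C)
    return float(np.dot(proj, phrase_vec) /
                 (np.linalg.norm(proj) * np.linalg.norm(phrase_vec) + 1e-12))

rng = np.random.default_rng(0)
context = rng.normal(size=(3, 50))                  # 3 context words, 50-dim
literal = 0.5 * context[0] + 0.5 * context[1]       # lies in the context span
idiomatic = rng.normal(size=50)                     # unrelated direction
s_literal = compositionality_score(literal, context)
s_idiomatic = compositionality_score(idiomatic, context)
```

    A random 50-dimensional vector has only a small projection onto a
    3-dimensional subspace, so the idiomatic score comes out much lower.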

    Semantic Parsing of Mathematics by Context-based Learning from Aligned Corpora and Theorem Proving

    Cezary Kaliszyk, Josef Urban, Jiří Vyskočil
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    We study methods for automated parsing of informal mathematical expressions
    into formal ones, a main prerequisite for deep computer understanding of
    informal mathematical texts. We propose a context-based parsing approach that
    combines efficient statistical learning of deep parse trees with their semantic
    pruning by type checking and large-theory automated theorem proving. We show
    that the methods very significantly improve on previous results in parsing
    theorems from the Flyspeck corpus.

    Sentiment Analysis for Twitter : Going Beyond Tweet Text

    Lahari Poddar, Kishaloy Halder, Xianyan Jia
    Subjects: Computation and Language (cs.CL); Social and Information Networks (cs.SI)

    Analysing the sentiment of tweets is important as it helps to determine
    users’ opinions. Knowing people’s opinions is crucial for several purposes,
    ranging from gathering knowledge about a customer base to e-governance and
    campaigning. In this report, we aim to develop a system to detect the
    sentiment of tweets. We employ several linguistic features along with some
    other external sources of information to detect the sentiment of a tweet.
    We show that augmenting the 140-character-long tweet with information
    harvested from external URLs shared in the tweet as well as social media
    features enhances the sentiment prediction accuracy significantly.

    An End-to-End Architecture for Keyword Spotting and Voice Activity Detection

    Chris Lengerich, Awni Hannun
    Comments: NIPS 2016 End-to-End Learning for Speech and Audio Processing Workshop
    Subjects: Computation and Language (cs.CL)

    We propose a single neural network architecture for two tasks: on-line
    keyword spotting and voice activity detection. We develop novel inference
    algorithms for an end-to-end Recurrent Neural Network trained with the
    Connectionist Temporal Classification loss function which allow our model to
    achieve high accuracy on both keyword spotting and voice activity detection
    without retraining. In contrast to prior voice activity detection models, our
    architecture does not require aligned training data and uses the same
    parameters as the keyword spotting model. This allows us to deploy a high
    quality voice activity detector with no additional memory or maintenance
    requirements.

    Dialogue Learning With Human-In-The-Loop

    Jiwei Li, Alexander H. Miller, Sumit Chopra, Marc'Aurelio Ranzato, Jason Weston
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    An important aspect of developing conversational agents is to give a bot the
    ability to improve through communicating with humans and to learn from the
    mistakes that it makes. Most research has focused on learning from fixed
    training sets of labeled data rather than interacting with a dialogue partner
    in an online fashion. In this paper we explore this direction in a
    reinforcement learning setting where the bot improves its question-answering
    ability from feedback a teacher gives following its generated responses. We
    build a simulator that tests various aspects of such learning in a synthetic
    environment, and introduce models that work in this regime. Finally, real
    experiments with Mechanical Turk validate the approach.

    Learning Concept Hierarchies through Probabilistic Topic Modeling

    V. S. Anoop, S. Asharaf, P. Deepak
    Journal-ref: International Journal of Information Processing (IJIP), Volume 10,
    Issue 3, 2016
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)

    With the advent of the semantic web, various tools and techniques have been
    introduced for presenting and organizing knowledge. Concept hierarchies are
    one such technique, and they have gained significant attention due to their
    usefulness in creating domain ontologies, which are considered an integral
    part of the semantic web. Automated concept hierarchy learning algorithms
    focus on extracting relevant concepts from an unstructured text corpus and
    connecting them by identifying potential relations that exist between them.
    In this paper, we propose a novel approach that identifies relevant concepts
    from plain text and then learns a hierarchy of concepts by exploiting the
    subsumption relation between them. To start with, we model topics using a
    probabilistic topic model and then make use of lightweight linguistic
    processing to extract semantically rich concepts. Then we connect concepts
    by identifying an “is-a” relationship between pairs of concepts. The
    proposed method is completely unsupervised, and there is no need for a
    domain-specific training corpus for concept extraction and learning.
    Experiments on large, real-world text corpora such as the BBC News dataset
    and the Reuters News corpus show that the proposed method outperforms some
    existing methods for concept extraction, and that efficient concept
    hierarchy learning is possible when the overall task is guided by a
    probabilistic topic modeling algorithm.

    Is a picture worth a thousand words? A Deep Multi-Modal Fusion Architecture for Product Classification in e-commerce

    Tom Zahavy, Alessandro Magnani, Abhinandan Krishnan, Shie Mannor
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

    Classifying products into categories precisely and efficiently is a major
    challenge in modern e-commerce. The high traffic of new products uploaded daily
    and the dynamic nature of the categories raise the need for machine learning
    models that can reduce the cost and time of human editors. In this paper, we
    propose a decision level fusion approach for multi-modal product classification
    using text and image inputs. We train input specific state-of-the-art deep
    neural networks for each input source, show the potential of forging them
    together into a multi-modal architecture and train a novel policy network that
    learns to choose between them. Finally, we demonstrate that our multi-modal
    network improves the top-1 accuracy over both networks on a real-world
    large-scale product classification dataset that we collected from Walmart.com.
    While we focus on image-text fusion that characterizes e-commerce domains, our
    algorithms can be easily applied to other modalities such as audio, video,
    physical sensors, etc.

    Intelligible Language Modeling with Input Switched Affine Networks

    Jakob N. Foerster, Justin Gilmer, Jan Chorowski, Jascha Sohl-Dickstein, David Sussillo
    Comments: ICLR 2017 submission: this https URL
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    The computational mechanisms by which nonlinear recurrent neural networks
    (RNNs) achieve their goals remain an open question. There exist many problem
    domains where intelligibility of the network model is crucial for deployment.
    Here we introduce a recurrent architecture composed of input-switched affine
    transformations, in other words an RNN without any nonlinearity and with one
    set of weights per input. We show that this architecture achieves near
    identical performance to traditional architectures on language modeling of
    Wikipedia text, for the same number of model parameters. It can obtain this
    performance with the potential for computational speedup compared to existing
    methods, by precomputing the composed affine transformations corresponding to
    longer input sequences. As our architecture is affine, we are able to
    understand the mechanisms by which it functions using linear methods. For
    example, we show how the network linearly combines contributions from the past
    to make predictions at the current time step. We show how representations for
    words can be combined in order to understand how context is transferred across
    word boundaries. Finally, we demonstrate how the system can be executed and
    analyzed in arbitrary bases to aid understanding.
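
    The input-switched affine update can be written as h_t = W[x_t] h_{t-1} +
    b[x_t]: one affine map per input symbol and no pointwise nonlinearity.
    Because every step is affine, the maps for a frequent substring compose
    into a single affine map that can be precomputed, which is the source of
    the claimed speedup. A minimal sketch (dimensions and random initialization
    are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
vocab, hidden = 5, 8

# One affine map (W, b) per input symbol -- no pointwise nonlinearity.
W = rng.normal(scale=0.3, size=(vocab, hidden, hidden))
b = rng.normal(scale=0.1, size=(vocab, hidden))
W_out = rng.normal(scale=0.3, size=(vocab, hidden))

def isan_logits(tokens):
    """Run the input-switched affine RNN and return logits per step."""
    h = np.zeros(hidden)
    logits = []
    for tok in tokens:
        h = W[tok] @ h + b[tok]   # affine update selected by the input
        logits.append(W_out @ h)
    return np.array(logits)

def compose(tokens):
    """Fold a whole input sequence into one affine map (A, c)."""
    A, c = np.eye(hidden), np.zeros(hidden)
    for tok in tokens:
        A, c = W[tok] @ A, W[tok] @ c + b[tok]
    return A, c

seq = [0, 3, 1, 4, 2]
A, c = compose(seq)
h_direct = A @ np.zeros(hidden) + c   # final state via the composed map
```

    The composed map applied once reproduces the final state of the step-by-step
    recurrence, and the affinity is also what makes the linear analyses in the
    paper possible.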


    Distributed, Parallel, and Cluster Computing

    Serving the Grid: an Experimental Study of Server Clusters as Real-Time Demand Response Resources

    Josiah McClurg, Raghuraman Mudumbai
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)

    Demand response is a crucial technology to allow large-scale penetration of
    intermittent renewable energy sources in the electric grid. This paper is based
    on the thesis that datacenters represent especially attractive candidates for
    providing flexible, real-time demand response services to the grid; they are
    capable of finely-controllable power consumption, fast power ramp-rates, and
    large dynamic range. This paper makes two main contributions: (a) it provides
    detailed experimental evidence justifying this thesis, and (b) it presents a
    comparative investigation of three candidate software interfaces for power
    control within the servers. All of these results are based on a series of
    experiments involving real-time power measurements on a lab-scale server
    cluster. This cluster was specially instrumented for accurate and fast power
    measurements on a time-scale of 100 ms or less. Our results provide
    preliminary evidence for the feasibility of large-scale demand response
    using datacenters, and motivate future work on exploiting this capability.

    Proposal of Optimum Application Deployment Technology for Heterogeneous IaaS Cloud

    Yoji Yamato
    Comments: 4 pages, 1 figure, 2016 6th International Workshop on Computer Science and Engineering (WCSE 2016), June 2016
    Journal-ref: 2016 6th International Workshop on Computer Science and
    Engineering (WCSE 2016), pp.34-37, June 2016
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Recently, cloud systems composed of heterogeneous hardware have become
    increasingly common as a way to exploit advances in hardware performance.
    However, programming applications for heterogeneous hardware to achieve
    high performance requires considerable technical skill and is difficult for
    users. Therefore, to make high performance easily attainable, this paper
    proposes a PaaS that analyzes application logic and automatically offloads
    computations to GPUs and FPGAs when users deploy applications to clouds.

    Server Structure Proposal and Automatic Verification Technology on IaaS Cloud of Plural Type Servers

    Yoji Yamato
    Comments: 13 pages, 9 figures, International Conference on Internet Studies (NETs2015), July 2015
    Journal-ref: International Conference on Internet Studies (NETs2015), July 2015
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    In this paper, we propose a server structure proposal and automatic
    performance verification technology that proposes and verifies an
    appropriate server structure on an Infrastructure as a Service (IaaS) cloud
    with bare-metal servers, container-based virtual servers, and virtual
    machines. Recently, cloud services have progressed, and providers offer not
    only virtual machines but also bare-metal servers and container-based
    virtual servers. However, users need to design an appropriate server
    structure for their requirements based on the quantitative performance of
    these three server types, which demands much technical knowledge to
    optimize system performance. Therefore, we study a technology that
    satisfies users’ performance requirements on these three types of IaaS
    cloud. Firstly, we measure the performance of a bare-metal server, Docker
    containers, and KVM (Kernel-based Virtual Machine) virtual machines on
    OpenStack while varying the number of virtual servers. Secondly, we propose
    a server structure proposal technology based on the measured quantitative
    data: it receives an abstract OpenStack Heat template and
    function/performance requirements, and then creates a concrete template
    with server specification information. Thirdly, we propose an automatic
    performance verification technology that executes the necessary performance
    tests automatically on provisioned user environments according to the
    template.

    Cluster-wide Scheduling of Flexible, Distributed Analytic Applications

    Pace Francesco, Daniele Venzano, Damiano Carra, Pietro Michiardi
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    This work addresses the problem of scheduling user-defined analytic
    applications, which we define as high-level compositions of frameworks, their
    components, and the logic necessary to carry out work. The key idea in our
    application definition is to distinguish classes of components, including
    rigid and elastic types: the former being required for an application to make
    progress, the latter contributing to reduced execution times. We show that the
    problem of scheduling such applications poses new challenges, which existing
    approaches address inefficiently.

    Thus, we present the design and evaluation of a novel, flexible heuristic to
    schedule analytic applications that aims at high system responsiveness, by
    allocating resources efficiently thanks to the flexibility of elastic
    components. Our algorithm is evaluated using a trace-driven simulation
    approach, with large-scale real system traces: our flexible scheduler
    outperforms a baseline approach across a variety of metrics, including
    application turnaround times, and resource allocation efficiency.

    We also present the design and evaluation of a full-fledged system, which we
    call Zoe, that incorporates the ideas presented in this paper, and report
    concrete improvements in terms of efficiency and performance, with respect to
    prior generations of our system.


    Learning

    Improving Variational Auto-Encoders using Householder Flow

    Jakub M. Tomczak, Max Welling
    Comments: Bayesian Deep Learning Workshop (NIPS 2016)
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Variational auto-encoders (VAE) are scalable and powerful generative models.
    However, the choice of the variational posterior determines tractability and
    flexibility of the VAE. Commonly, latent variables are modeled using the normal
    distribution with a diagonal covariance matrix. This results in computational
    efficiency but typically it is not flexible enough to match the true posterior
    distribution. One fashion of enriching the variational posterior distribution
    is application of normalizing flows, i.e., a series of invertible
    transformations to latent variables with a simple posterior. In this paper, we
    follow this line of thinking and propose a volume-preserving flow that uses a
    series of Householder transformations. We show empirically on MNIST dataset and
    histopathology data that the proposed flow allows to obtain more flexible
    variational posterior and highly competitive results comparing to other
    normalizing flows.
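
    A Householder transformation H = I - 2 v v^T / ||v||^2 is an orthogonal
    reflection, so a chain of them is volume-preserving (|det J| = 1) and
    cheap to apply. A minimal numpy sketch of applying such a flow to a latent
    sample (dimensions and random vectors are illustrative; in the VAE the
    vectors v would be produced by the encoder):

```python
import numpy as np

def householder_flow(z, vs):
    """Apply a series of Householder reflections
    H = I - 2 v v^T / ||v||^2 to a latent sample z.
    Each reflection is orthogonal, so the flow preserves volume and
    requires no Jacobian-determinant correction in the ELBO."""
    for v in vs:
        z = z - 2.0 * v * (v @ z) / (v @ v)
    return z

rng = np.random.default_rng(0)
dim = 4
z0 = rng.normal(size=dim)        # sample from the simple diagonal posterior
vs = rng.normal(size=(3, dim))   # Householder vectors (learnable in the VAE)
zK = householder_flow(z0, vs)
```

    Since each reflection is orthogonal, the norm of the sample is preserved
    while its direction (and hence the posterior's covariance structure)
    becomes richer than a diagonal Gaussian.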

    Graph-Based Manifold Frequency Analysis for Denoising

    Shay Deutsch, Antonio Ortega, Gerard Medioni
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We propose a new framework for manifold denoising based on processing in the
    graph Fourier frequency domain, derived from the spectral decomposition of
    the discrete graph Laplacian. Our approach uses the Spectral Graph Wavelet
    transform in order to perform non-iterative denoising directly in the graph
    frequency domain, an approach inspired by conventional wavelet-based signal
    denoising methods. We theoretically justify our approach, based on the fact
    that for smooth manifolds the coordinate information energy is localized in
    the low spectral graph wavelet sub-bands, while the noise affects all
    frequency bands in a similar way. Experimental results show that our
    proposed manifold frequency denoising (MFD) approach significantly
    outperforms state-of-the-art denoising methods, and is robust to a wide
    range of parameter selections, e.g., the choice of k nearest neighbor
    connectivity of the graph.
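
    The paper filters in spectral graph wavelet sub-bands; the simpler
    graph-Fourier low-pass filter below illustrates only the underlying
    principle that smooth signals concentrate in low graph frequencies while
    noise spreads across all bands. The path graph, cutoff k, and noise level
    are illustrative assumptions.

```python
import numpy as np

# Path graph of n nodes sampling a smooth curve, plus additive noise.
n = 100
rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0, 2 * np.pi, n))
noisy = clean + 0.3 * rng.normal(size=n)

# Combinatorial Laplacian L = D - A of the path graph.
A = np.zeros((n, n))
idx = np.arange(n - 1)
A[idx, idx + 1] = A[idx + 1, idx] = 1.0
L = np.diag(A.sum(axis=1)) - A

# Graph Fourier basis: eigenvectors of L ordered by frequency (eigenvalue).
eigvals, U = np.linalg.eigh(L)

# Low-pass filter: keep only the k lowest graph frequencies.
k = 8
coeffs = U.T @ noisy      # graph Fourier transform of the noisy signal
coeffs[k:] = 0.0          # smooth signals live in the low bands
denoised = U @ coeffs     # inverse transform

err_noisy = np.linalg.norm(noisy - clean)
err_denoised = np.linalg.norm(denoised - clean)
```

    Keeping 8 of 100 graph frequencies discards most of the (spectrally flat)
    noise while retaining nearly all of the smooth signal's energy.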

    Cost-Sensitive Random Pair Encoding for Multi-Label Classification

    Yao-Yuan Yang, Chih-Wei Chang, Hsuan-Tien Lin
    Subjects: Learning (cs.LG)

    We propose a novel cost-sensitive multi-label classification algorithm called
    cost-sensitive random pair encoding (CSRPE). CSRPE reduces the cost-sensitive
    multi-label classification problem to many cost-sensitive binary classification
    problems through the label powerset approach followed by the classic
    one-versus-one decomposition. While such a naive reduction results in
    exponentially-many classifiers, we resolve the training challenge of building
    the many classifiers by random sampling, and the prediction challenge of voting
    from the many classifiers by nearest-neighbor decoding through casting the
    one-versus-one decomposition as a special case of error-correcting code.
    Extensive experimental results demonstrate that CSRPE achieves stable
    convergence and reaches better performance than other ensemble-learning and
    error-correcting-coding algorithms for multi-label classification. The
    results also show that CSRPE is competitive with state-of-the-art
    cost-sensitive multi-label classification algorithms.
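
    In CSRPE, each code bit comes from a binary classifier trained on a
    randomly sampled pair of label-powerset elements, and prediction is
    nearest-neighbor decoding over codewords. The sketch below replaces the
    learned per-bit classifiers with exact cost comparisons (here Hamming cost)
    to show only the encode/decode mechanics; the label count, number of pairs,
    and candidate set are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
K, M = 4, 64   # number of labels, number of random pairs (code bits)

# Each code bit corresponds to a random pair of label-vectors from the
# powerset; the bit says which member of the pair is the cheaper match.
pairs = rng.integers(0, 2, size=(M, 2, K))

def hamming_cost(y, z):
    return np.sum(y != z)

def encode(y):
    """Codeword of a label-vector: for each random pair, record which
    member is closer (in Hamming cost) to y."""
    return np.array([1 if hamming_cost(y, a) <= hamming_cost(y, b) else 0
                     for a, b in pairs])

def decode(bits, candidates):
    """Nearest-neighbor decoding over candidate label-vectors."""
    dists = [np.sum(bits != encode(c)) for c in candidates]
    return candidates[int(np.argmin(dists))]

candidates = np.array([[0, 0, 1, 1],
                       [1, 0, 1, 0],
                       [1, 1, 0, 0],
                       [0, 1, 0, 1]])
y = np.array([1, 0, 1, 0])
bits = encode(y)   # in CSRPE these bits would be predicted by classifiers
recovered = decode(bits, candidates)
```

    With enough random pairs, distinct label-vectors receive distinct
    codewords, so nearest-codeword decoding recovers the original labelset.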

    The Emergence of Organizing Structure in Conceptual Representation

    Brenden M. Lake, Neil D. Lawrence, Joshua B. Tenenbaum
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Both scientists and children make important structural discoveries, yet their
    computational underpinnings are not well understood. Structure discovery has
    previously been formalized as probabilistic inference about the right
    structural form — where form could be a tree, ring, chain, grid, etc. [Kemp &
    Tenenbaum (2008). The discovery of structural form. PNAS, 105(3), 10687-10692].
    While this approach can learn intuitive organizations, including a tree for
    animals and a ring for the color circle, it assumes a strong inductive bias
    that considers only these particular forms, and each form is explicitly
    provided as initial knowledge. Here we introduce a new computational model of
    how organizing structure can be discovered, utilizing a broad hypothesis space
    with a preference for sparse connectivity. Given that the inductive bias is
    more general, the model’s initial knowledge shows little qualitative
    resemblance to some of the discoveries it supports. As a consequence, the model
    can also learn complex structures for domains that lack intuitive description,
    as well as predict human property induction judgments without explicit
    structural forms. By allowing form to emerge from sparsity, our approach
    clarifies how both the richness and flexibility of human conceptual
    organization can coexist.

    Learning Features of Music from Scratch

    John Thickstun, Zaid Harchaoui, Sham Kakade
    Comments: 13 pages
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Sound (cs.SD)

    We introduce a new large-scale music dataset, MusicNet, to serve as a source
    of supervision and evaluation of machine learning methods for music research.
    MusicNet consists of hundreds of freely-licensed classical music recordings by
    10 composers, written for 11 instruments, together with instrument/note
    annotations resulting in over 1 million temporal labels on 34 hours of chamber
    music performances under various studio and microphone conditions.

    We define a multi-label classification task to predict notes in musical
    recordings, along with an evaluation protocol. We benchmark several machine
    learning architectures for this task: i) learning from “hand-crafted”
    spectrogram features; ii) end-to-end learning with a neural net; iii)
    end-to-end learning with a convolutional neural net. We show that several
    end-to-end learning proposals outperform approaches based on learning from
    hand-crafted audio features.

    Measuring and modeling the perception of natural and unconstrained gaze in humans and machines

    Daniel Harari, Tao Gao, Nancy Kanwisher, Joshua Tenenbaum, Shimon Ullman
    Comments: Daniel Harari and Tao Gao contributed equally to this work
    Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

    Humans are remarkably adept at interpreting the gaze direction of other
    individuals in their surroundings. This skill is at the core of the ability to
    engage in joint visual attention, which is essential for establishing social
    interactions. How accurate are humans in determining the gaze direction of
    others in lifelike scenes, when they can move their heads and eyes freely, and
    what are the sources of information for the underlying perceptual processes?
    These questions pose a challenge from both empirical and computational
    perspectives, due to the complexity of the visual input in real-life
    situations. Here we measure empirically human accuracy in perceiving the gaze
    direction of others in lifelike scenes, and study computationally the sources
    of information and representations underlying this cognitive capacity. We show
    that humans perform better in face-to-face conditions compared with recorded
    conditions, and that this advantage is not due to the availability of input
    dynamics. We further show that humans are still performing well when only the
    eyes-region is visible, rather than the whole face. We develop a computational
    model, which replicates the pattern of human performance, including the
    finding that the eyes-region contains, on its own, the required information
    for estimating both head orientation and direction of gaze. Consistent with
    neurophysiological findings on task-specific face regions in the brain, the
    learned computational representations reproduce perceptual effects such as the
    Wollaston illusion, when trained to estimate direction of gaze, but not when
    trained to recognize objects or faces.

    Co-adaptive learning over a countable space

    Michael Rabadi
    Comments: 6 pages, 1 figure, NIPS 2016 Time Series Workshop
    Journal-ref: In NIPS 2016 Time Series Workshop. Barcelona, Spain
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Co-adaptation is a special form of on-line learning where an algorithm
    (mathcal{A}) must assist an unknown algorithm (mathcal{B}) to perform some
    task. This is a general framework and has applications in recommendation
    systems, search, education, and much more. Today, the most common use of
    co-adaptive algorithms is in brain-computer interfacing (BCI), where algorithms
    help patients gain and maintain control over prosthetic devices. While
    previous studies have shown strong empirical results (Kowalski et al., 2013;
    Orsborn et al., 2014) or have analyzed specific examples (Merel et al.,
    2013, 2015), there is no general analysis of the co-adaptive learning
    problem. Here we will
    study the co-adaptive learning problem in the online, closed-loop setting. We
    will prove that, with high probability, co-adaptive learning is guaranteed to
    outperform learning with a fixed decoder as long as a particular condition is
    met.

    Gossip training for deep learning

    Michael Blot, David Picard, Matthieu Cord, Nicolas Thome
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

    We address the issue of speeding up the training of convolutional networks.
    Here we study a distributed method adapted to stochastic gradient descent
    (SGD). The parallel optimization setup uses several threads, each applying
    individual gradient descents on a local variable. We propose a new way to
    share information between different threads, inspired by gossip algorithms,
    that shows good consensus convergence properties. Our method, called GoSGD,
    has the advantage of being fully asynchronous and decentralized. Comparisons
    with the recent EASGD on CIFAR-10 show encouraging results.
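
    The consensus mechanism behind gossip-style training can be sketched as
    repeated pairwise parameter exchanges between randomly chosen workers. The
    sketch below simulates only these consensus dynamics (worker count,
    dimension, and exchange schedule are illustrative; actual GoSGD interleaves
    the exchanges with local SGD steps and uses weighted sharing).

```python
import numpy as np

rng = np.random.default_rng(0)
n_workers, dim = 4, 10

# Each thread keeps its own local copy of the model parameters.
params = [rng.normal(size=dim) for _ in range(n_workers)]
init_mean = np.mean(params, axis=0)

def gossip_exchange(params, i, j):
    """Pairwise gossip step: workers i and j average their parameters.
    Repeated random exchanges drive all local copies towards consensus
    while preserving the mean of the ensemble."""
    avg = 0.5 * (params[i] + params[j])
    params[i], params[j] = avg.copy(), avg.copy()

# Simulate a run of random pairwise exchanges.
for _ in range(200):
    i, j = rng.choice(n_workers, size=2, replace=False)
    gossip_exchange(params, i, j)

spread = max(np.linalg.norm(p - params[0]) for p in params)
```

    The spread between local copies contracts geometrically under the random
    exchanges, which is the "consensus convergence" property the method relies
    on; no central parameter server is involved.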

    Associative Memory using Dictionary Learning and Expander Decoding

    Arya Mazumdar, Ankit Singh Rawat
    Comments: To appear in AAAI 2017
    Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)

    An associative memory is a framework of content-addressable memory that
    stores a collection of message vectors (or a dataset) over a neural network
    while enabling a neurally feasible mechanism to recover any message in the
    dataset from its noisy version. Designing an associative memory requires
    addressing two main tasks: 1) learning phase: given a dataset, learn a concise
    representation of the dataset in the form of a graphical model (or a neural
    network), 2) recall phase: given a noisy version of a message vector from the
    dataset, output the correct message vector via a neurally feasible algorithm
    over the network learnt during the learning phase. This paper studies the
    problem of designing a class of neural associative memories which learns a
    network representation for a large dataset that ensures correction against a
    large number of adversarial errors during the recall phase. Specifically, the
    associative memories designed in this paper can store datasets containing
    (exp(n)) (n)-length message vectors over a network with (O(n)) nodes and can
    tolerate (Omega(n/polylog(n))) adversarial errors. This paper
    carries out this memory design by mapping the learning phase and recall phase
    to the tasks of dictionary learning with a square dictionary and iterative
    error correction in an expander code, respectively.

    Fast Wavenet Generation Algorithm

    Tom Le Paine, Pooya Khorrami, Shiyu Chang, Yang Zhang, Prajit Ramachandran, Mark A. Hasegawa-Johnson, Thomas S. Huang
    Comments: Technical Report
    Subjects: Sound (cs.SD); Data Structures and Algorithms (cs.DS); Learning (cs.LG)

    This paper presents an efficient implementation of the Wavenet generation
    process called Fast Wavenet. Compared to a naive implementation that has
    complexity O(2^L) (L denotes the number of layers in the network), our proposed
    approach removes redundant convolution operations by caching previous
    calculations, thereby reducing the complexity to O(L) time. Timing experiments
    show significant advantages of our fast implementation over a naive one. While
    this method is presented for Wavenet, the same scheme can be applied anytime
    one wants to perform autoregressive generation or online prediction using a
    model with dilated convolution layers. The code for our method is publicly
    available.
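    The caching idea can be sketched with per-layer queues: each dilated layer keeps its last `dilation` inputs, so generating one sample touches each layer once (O(L)) instead of re-expanding the whole receptive field. The toy below uses scalar activations and made-up 2-tap weights; the real Wavenet uses gated convolutions over vector channels.

    ```python
    from collections import deque

    # Per-layer 2-tap dilated "convolution": out = w_new*x_t + w_old*x_{t-dilation}.
    # Weights here are arbitrary toy values, not a trained model.

    def make_layers(num_layers):
        # dilation doubles per layer: 1, 2, 4, ...
        return [{"dilation": 2 ** i, "w_old": 0.5, "w_new": 0.5,
                 "queue": deque([0.0] * (2 ** i), maxlen=2 ** i)}
                for i in range(num_layers)]

    def generate_step(layers, x_t):
        """O(L) per sample: each layer pops its cached input from
        `dilation` steps ago instead of recomputing the whole subtree."""
        h = x_t
        for layer in layers:
            old = layer["queue"][0]           # input from dilation steps ago
            new_h = layer["w_old"] * old + layer["w_new"] * h
            layer["queue"].append(h)          # cache current input for later reuse
            h = new_h
        return h

    layers = make_layers(3)
    samples = [1.0]
    for _ in range(7):                        # autoregressive roll-out
        samples.append(generate_step(layers, samples[-1]))
    ```

    The `deque(maxlen=...)` acts as the ring buffer: appending the current input automatically evicts the one that has aged past the layer's dilation.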

    The Upper Bound on Knots in Neural Networks

    Kevin K. Chen
    Comments: 19 pages, 8 figures
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Neural networks with rectified linear unit activations are essentially
    multivariate linear splines. As such, one of many ways to measure the
    “complexity” or “expressivity” of a neural network is to count the number of
    knots in the spline model. We study the number of knots in fully-connected
    feedforward neural networks with rectified linear unit activation functions. We
    intentionally keep the neural networks very simple, so as to make theoretical
    analyses more approachable. An induction on the number of layers (l) reveals a
    tight upper bound on the number of knots in (mathbb{R} to mathbb{R}^p) deep
    neural networks. With (n_i gg 1) neurons in layer (i = 1, dots, l), the upper
    bound is approximately (n_1 dots n_l). We then show that the exact upper bound
    is tight, and we demonstrate the upper bound with an example. The purpose of
    these analyses is to pave a path for understanding the behavior of general
    (mathbb{R}^q to mathbb{R}^p) neural networks.
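    A small numeric illustration (my own sketch, not the paper's analysis): build a random fully-connected ReLU network from R to R, count its knots by scanning for slope changes on a fine grid, and compare against the approximate product bound n_1 ⋯ n_l.

    ```python
    import random

    def relu(x):
        return x if x > 0.0 else 0.0

    def make_net(widths, seed=0):
        """Random fully-connected ReLU net mapping R -> R (toy sizes)."""
        rng = random.Random(seed)
        layers, fan_in = [], 1
        for w in widths:
            W = [[rng.gauss(0, 1) for _ in range(fan_in)] for _ in range(w)]
            b = [rng.gauss(0, 1) for _ in range(w)]
            layers.append((W, b))
            fan_in = w
        w_out = [rng.gauss(0, 1) for _ in range(fan_in)]
        return layers, (w_out, rng.gauss(0, 1))

    def forward(net, x):
        layers, (w_out, b_out) = net
        h = [x]
        for W, b in layers:
            h = [relu(sum(wij * hj for wij, hj in zip(Wi, h)) + bi)
                 for Wi, bi in zip(W, b)]
        return sum(wi * hi for wi, hi in zip(w_out, h)) + b_out

    def count_knots(net, lo=-10.0, hi=10.0, steps=20000):
        """Count slope changes of the piecewise linear output on a fine grid;
        adjacent change flags straddling one grid cell count as one knot."""
        xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
        ys = [forward(net, x) for x in xs]
        slopes = [y1 - y0 for y0, y1 in zip(ys, ys[1:])]
        flags = [abs(s1 - s0) > 1e-6 for s0, s1 in zip(slopes, slopes[1:])]
        knots, prev = 0, False
        for f in flags:
            if f and not prev:
                knots += 1
            prev = f
        return knots

    widths = [3, 3]
    bound = 1
    for w in widths:
        bound *= w                     # approximate bound n_1 * ... * n_l
    print(count_knots(make_net(widths)), "knots; approximate bound", bound)
    ```

    Grid counting can only merge nearby knots, never invent them, so the measured count is a lower bound on the true knot count and stays below the theoretical bound.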

    The empirical size of trained neural networks

    Kevin K. Chen, Anthony Gamst, Alden Walker
    Comments: 6 pages, 5 figures
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    ReLU neural networks define piecewise linear functions of their inputs.
    However, initializing and training a neural network is very different from
    fitting a linear spline. In this paper, we expand empirically upon previous
    theoretical work to demonstrate features of trained neural networks. Standard
    network initialization and training produce networks vastly simpler than a
    naive parameter count would suggest and can impart odd features to the trained
    network. However, we also show the forced simplicity is beneficial and, indeed,
    critical for the wide success of these networks.

    Intelligible Language Modeling with Input Switched Affine Networks

    Jakob N. Foerster, Justin Gilmer, Jan Chorowski, Jascha Sohl-Dickstein, David Sussillo
    Comments: ICLR 2017 submission: this https URL
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    The computational mechanisms by which nonlinear recurrent neural networks
    (RNNs) achieve their goals remain an open question. There exist many problem
    domains where intelligibility of the network model is crucial for deployment.
    Here we introduce a recurrent architecture composed of input-switched affine
    transformations, in other words an RNN without any nonlinearity and with one
    set of weights per input. We show that this architecture achieves near
    identical performance to traditional architectures on language modeling of
    Wikipedia text, for the same number of model parameters. It can obtain this
    performance with the potential for computational speedup compared to existing
    methods, by precomputing the composed affine transformations corresponding to
    longer input sequences. As our architecture is affine, we are able to
    understand the mechanisms by which it functions using linear methods. For
    example, we show how the network linearly combines contributions from the past
    to make predictions at the current time step. We show how representations for
    words can be combined in order to understand how context is transferred across
    word boundaries. Finally, we demonstrate how the system can be executed and
    analyzed in arbitrary bases to aid understanding.
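    The precomputation trick rests on the fact that affine maps compose into affine maps. A minimal numeric sketch (toy sizes and random weights, not the paper's trained model): the hidden state is updated as h_t = W[x_t] h_{t-1} + b[x_t], and the two maps for a frequent bigram can be fused offline into a single affine map.

    ```python
    import random

    D = 3                                     # toy hidden size
    VOCAB = "ab"
    rng = random.Random(0)

    def rand_mat():
        return [[rng.gauss(0, 0.3) for _ in range(D)] for _ in range(D)]

    # One affine map (W, b) per input symbol -- the defining ISAN property.
    W = {c: rand_mat() for c in VOCAB}
    b = {c: [rng.gauss(0, 0.1) for _ in range(D)] for c in VOCAB}

    def apply_affine(Wc, bc, h):
        return [sum(Wc[i][j] * h[j] for j in range(D)) + bc[i] for i in range(D)]

    def run(sequence, h0):
        h = h0
        for c in sequence:
            h = apply_affine(W[c], b[c], h)   # no nonlinearity anywhere
        return h

    def compose(W2, b2, W1, b1):
        """Affine map equivalent to applying (W1, b1) then (W2, b2):
        W2 (W1 h + b1) + b2 = (W2 W1) h + (W2 b1 + b2)."""
        Wc = [[sum(W2[i][k] * W1[k][j] for k in range(D)) for j in range(D)]
              for i in range(D)]
        bc = [sum(W2[i][k] * b1[k] for k in range(D)) + b2[i] for i in range(D)]
        return Wc, bc

    h0 = [0.0] * D
    step_by_step = run("ab", h0)
    W_ab, b_ab = compose(W["b"], b["b"], W["a"], b["a"])  # precomputed bigram map
    fused = apply_affine(W_ab, b_ab, h0)
    assert all(abs(x - y) < 1e-9 for x, y in zip(step_by_step, fused))
    ```

    The same composition applied to longer n-grams is what gives the potential speedup mentioned above, and it is also why linear analysis tools apply directly to the trained network.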

    Emergence of foveal image sampling from learning to attend in visual scenes

    Brian Cheung, Eric Weiss, Bruno Olshausen
    Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG)

    We describe a neural attention model with a learnable retinal sampling
    lattice. The model is trained on a visual search task requiring the
    classification of an object embedded in a visual scene amidst background
    distractors using the smallest number of fixations. We explore the tiling
    properties that emerge in the model’s retinal sampling lattice after training.
    Specifically, we show that this lattice resembles the eccentricity dependent
    sampling lattice of the primate retina, with a high resolution region in the
    fovea surrounded by a low resolution periphery. Furthermore, we find conditions
    where these emergent properties are amplified or eliminated providing clues to
    their function.

    Safety-Aware Robot Damage Recovery Using Constrained Bayesian Optimization and Simulated Priors

    Vaios Papaspyros, Konstantinos Chatzilygeroudis, Vassilis Vassiliades, Jean-Baptiste Mouret
    Comments: Accepted at the BayesOpt 2016 NIPS workshop, 5 pages, 2 figures, 1 algorithm
    Subjects: Robotics (cs.RO); Learning (cs.LG)

    The recently introduced Intelligent Trial-and-Error (IT&E) algorithm showed
    that robots can adapt to damage within a few trials. The success of
    this algorithm relies on two components: prior knowledge acquired through
    simulation with an intact robot, and Bayesian optimization (BO) that operates
    on-line, on the damaged robot. While IT&E leads to fast damage recovery, it
    does not incorporate any safety constraints that prevent the robot from
    attempting harmful behaviors. In this work, we address this limitation by
    replacing the BO component with a constrained BO procedure. We evaluate our
    approach on a simulated damaged humanoid robot that needs to crawl as fast as
    possible, while performing as few unsafe trials as possible. We compare our new
    “safety-aware IT&E” algorithm to IT&E and a multi-objective version of IT&E in
    which the safety constraints are treated as separate objectives. Our results show
    that our algorithm outperforms the other approaches, both in crawling speed
    within the safe regions and in the number of unsafe trials.


    Information Theory

    Perturbation-Based Regularization for Signal Estimation in Linear Discrete Ill-posed Problems

    Mohamed Suliman, Tarig Ballal, Tareq Y. Al-Naffouri
    Comments: 13 pages, Journal
    Subjects: Information Theory (cs.IT)

    Estimating the values of unknown parameters from corrupted measured data
    poses many challenges in ill-posed problems. In such problems, many
    fundamental estimation methods fail to provide a meaningful stabilized
    solution. In this work, we propose a new regularization approach and a new
    regularization parameter selection approach for linear least-squares discrete
    ill-posed problems. The proposed approach is based on enhancing the
    singular-value structure of the ill-posed model matrix to acquire a better
    solution. Unlike many other regularization algorithms that seek to minimize the
    estimated data error, the proposed approach is developed to minimize the
    mean-squared error of the estimator, which is the objective in many typical
    estimation scenarios. The performance of the proposed approach is demonstrated
    by applying it to a large set of real-world discrete ill-posed problems.
    Simulation results demonstrate that the proposed approach outperforms a set of
    benchmark regularization methods in most cases. In addition, the approach also
    enjoys the lowest runtime and offers the highest level of robustness amongst
    all the tested benchmark regularization methods.
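    As a toy reference point (my own sketch, not the paper's perturbation-based method): on a diagonal ill-posed system the Tikhonov-filtered solution is x_i = s_i y_i / (s_i^2 + λ), and because a simulation knows x_true, λ can be chosen to minimize the actual mean-squared error of the estimator rather than the data misfit.

    ```python
    import random

    # Toy diagonal ill-posed problem: A = diag(s) with fast-decaying singular
    # values, y = A x + noise. All sizes and noise levels are made up.
    rng = random.Random(1)
    n = 20
    s = [0.5 ** i for i in range(n)]            # rapidly decaying singular values
    x_true = [rng.gauss(0, 1) for _ in range(n)]
    y = [si * xi + rng.gauss(0, 0.01) for si, xi in zip(s, x_true)]

    def solve(lam):
        """Tikhonov-filtered solution in the (trivial) SVD basis."""
        return [si * yi / (si * si + lam) for si, yi in zip(s, y)]

    def mse(x):
        return sum((a - b) ** 2 for a, b in zip(x, x_true)) / n

    lams = [10 ** (k / 2) for k in range(-16, 1)]   # grid from 1e-8 to 1
    best = min(lams, key=lambda lam: mse(solve(lam)))
    naive = solve(0.0)                              # unregularized inverse
    print(mse(naive), mse(solve(best)))             # naive MSE blows up
    ```

    The unregularized inverse amplifies noise by 1/s_i on the small singular values, while the MSE-minimizing λ trades a little bias for a large variance reduction, which is the trade-off the abstract's selection criterion targets.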

    Information Rates and post-FEC BER Prediction in Optical Fiber Communications

    Alex Alvarado
    Comments: Invited paper, OFC 2017
    Subjects: Information Theory (cs.IT)

    Information-theoretic metrics to analyze optical fiber communications systems
    with binary and nonbinary soft-decision FEC are reviewed. The numerical
    evaluation of these metrics in both simulations and experiments is also
    discussed. Ready-to-use closed-form approximations are presented.

    Transmit design for MIMO wiretap channel with a malicious jammer

    Duo Zhang, Weidong Mei, Lingxiang Li, Zhi Chen
    Comments: 2015 IEEE 81st Vehicular Technology Conference (VTC Spring)
    Subjects: Information Theory (cs.IT)

    In this paper, we consider the transmit design for multi-input multi-output
    (MIMO) wiretap channel including a malicious jammer. We first transform the
    system model into the traditional three-node wiretap channel by whitening the
    interference at the legitimate user. The eavesdropper channel
    state information (ECSI) may be fully known, statistically known, or entirely
    unknown to the transmitter. Hence, we propose strategies for the different
    levels of ECSI available to the transmitter. For the case of
    unknown ECSI, a target rate for the legitimate user is first specified; an
    inverse water-filling algorithm is then put forward to find the optimal
    power allocation for each information symbol, with a stepwise search being used
    to adjust the spatial dimension allocated to artificial noise (AN) such that
    the target rate is achievable. As for the case of statistical ECSI, several
    simulated channels are randomly generated according to the distribution of
    ECSI. We show that the ergodic secrecy capacity can be approximated as the
    average secrecy capacity of these simulated channels. Through maximizing this
    average secrecy capacity, we can obtain a feasible power and spatial dimension
    allocation scheme via a one-dimensional search. Finally, numerical results
    reveal the effectiveness and computational efficiency of our algorithms.
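    For context, the classic (forward) water-filling allocation that such schemes build on can be sketched as follows; this is a generic textbook routine, not the paper's inverse variant, and the channel gains and power budget below are made up.

    ```python
    def water_filling(gains, total_power, tol=1e-10):
        """Maximize sum(log(1 + p_i * g_i)) subject to sum(p_i) = total_power,
        p_i >= 0, by bisecting on the water level mu: p_i = max(0, mu - 1/g_i)."""
        lo, hi = 0.0, total_power + max(1.0 / g for g in gains)
        while hi - lo > tol:
            mu = (lo + hi) / 2
            used = sum(max(0.0, mu - 1.0 / g) for g in gains)
            if used > total_power:
                hi = mu
            else:
                lo = mu
        mu = (lo + hi) / 2
        return [max(0.0, mu - 1.0 / g) for g in gains]

    powers = water_filling([2.0, 1.0, 0.5], total_power=3.0)
    # stronger subchannels receive more power; the powers sum to (about) 3.0
    ```

    The bisection exploits the fact that the power used is monotone in the water level mu, so a one-dimensional search suffices, just as in the spatial-dimension search described above.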

    Generalization of the de Bruijn's identity to general (φ)-entropies and (φ)-Fisher informations

    Irene Valero Toranzo, Steeve Zozor, Jean-Marc Brossier
    Subjects: Information Theory (cs.IT)

    In this paper, we propose generalizations of the de Bruijn’s identities based
    on extensions of the Shannon entropy, Fisher information and their associated
    divergences or relative measures. The foundation of these generalizations is
    the (phi)-entropies and divergences of the Csiszár class (or Salicrú
    class) considered within a multidimensional context, including the
    one-dimensional case, and for several types of noisy channels characterized by a
    more general probability distribution beyond the well-known Gaussian noise. It
    is found that the gradient and/or the Hessian of these entropies or divergences
    with respect to the noise parameters naturally give rise to generalized
    versions of the Fisher information or divergence, which we name the
    (phi)-Fisher information (divergence). The obtained identities can be viewed
    as further extensions of the classical de Bruijn identity. Analogously, it is
    shown that a similar relation holds between the (phi)-divergence and an
    extended mean-square error, named the (phi)-mean square error, for the Gaussian
    channel.
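    For reference, the classical identity being generalized reads, in its standard scalar Gaussian-channel form:

    ```latex
    % Classical de Bruijn identity: for Y_t = X + \sqrt{t}\,Z with
    % Z \sim \mathcal{N}(0,1) independent of X, the differential entropy h
    % and the Fisher information J satisfy
    \frac{\mathrm{d}}{\mathrm{d}t}\, h\!\left(X + \sqrt{t}\,Z\right)
      = \frac{1}{2}\, J\!\left(X + \sqrt{t}\,Z\right).
    ```

    The paper's (phi)-versions recover this identity when the (phi)-entropy reduces to the Shannon entropy and the channel noise is Gaussian.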

    Associative Memory using Dictionary Learning and Expander Decoding

    Arya Mazumdar, Ankit Singh Rawat
    Comments: To appear in AAAI 2017
    Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)

    An associative memory is a framework of content-addressable memory that
    stores a collection of message vectors (or a dataset) over a neural network
    while enabling a neurally feasible mechanism to recover any message in the
    dataset from its noisy version. Designing an associative memory requires
    addressing two main tasks: 1) learning phase: given a dataset, learn a concise
    representation of the dataset in the form of a graphical model (or a neural
    network), 2) recall phase: given a noisy version of a message vector from the
    dataset, output the correct message vector via a neurally feasible algorithm
    over the network learnt during the learning phase. This paper studies the
    problem of designing a class of neural associative memories that learn a
    network representation for a large dataset and ensure correction of a
    large number of adversarial errors during the recall phase. Specifically, the
    associative memories designed in this paper can store a dataset containing
    (exp(n)) (n)-length message vectors over a network with (O(n)) nodes and can
    tolerate (Omega(n/mathrm{polylog}(n))) adversarial errors. This paper
    carries out this memory design by mapping the learning phase and recall phase
    to the tasks of dictionary learning with a square dictionary and iterative
    error correction in an expander code, respectively.



