IT博客汇 | arXiv Paper Daily: Thu, 20 Oct 2016

arXiv Paper Daily: Thu, 20 Oct 2016

我爱机器学习(52ml.net)发表于 2016-10-20 00:00:00

Neural and Evolutionary Computing

Particle Swarm Optimization for Generating Fuzzy Reinforcement Learning Policies

Daniel Hein, Alexander Hentschel, Thomas Runkler, Steffen Udluft
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG); Systems and Control (cs.SY)

Fuzzy controllers are known to serve as efficient and interpretable system
controllers for continuous state and action spaces. To date these controllers
have been constructed by hand, or automatically trained either on expert
generated problem specific cost functions or by incorporating detailed
knowledge about the optimal control strategy. Both requirements for automatic
training processes are not given in the majority of real world reinforcement
learning (RL) problems. We introduce a new particle swarm reinforcement
learning (PSRL) approach which is capable of constructing fuzzy RL policies
solely by training parameters on world models produced from randomly generated
samples of the real system. This approach relates self-organizing fuzzy
controllers to model-based RL for the first time. PSRL can be used
straightforward on any RL problem, which is demonstrated on three standard RL
benchmarks, mountain car, cart pole balancing and cart pole swing up. Our
experiments yielded high performing and well interpretable fuzzy policies.

Streaming Normalization: Towards Simpler and More Biologically-plausible Normalizations for Online and Recurrent Learning

Qianli Liao, Kenji Kawaguchi, Tomaso Poggio
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

We systematically explored a spectrum of normalization algorithms related to
Batch Normalization (BN) and propose a generalized formulation that
simultaneously solves two major limitations of BN: (1) online learning and (2)
recurrent learning. Our proposal is simpler and more biologically-plausible.
Unlike previous approaches, our technique can be applied out of the box to all
learning scenarios (e.g., online learning, batch learning, fully-connected,
convolutional, feedforward, recurrent and mixed — recurrent and
convolutional) and compare favorably with existing approaches. We also propose
Lp Normalization for normalizing by different orders of statistical moments. In
particular, L1 normalization is well-performing, simple to implement, fast to
compute, more biologically-plausible and thus ideal for GPU or hardware
implementations.

Computer Vision and Pattern Recognition

POI: Multiple Object Tracking with High Performance Detection and Appearance Feature

Fengwei Yu, Wenbo Li, Quanquan Li, Yu Liu, Xiaohua Shi, Junjie Yan
Comments: ECCV workshop BMTT 2016
Journal-ref: ECCV 2016 Workshops, Part II, LNCS 9914, paper approval (Chapter
3, 978-3-319-48880-6, 434776_1_En
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Detection and learning based appearance feature play the central role in data
association based multiple object tracking (MOT), but most recent MOT works
usually ignore them and only focus on the hand-crafted feature and association
algorithms. In this paper, we explore the high-performance detection and deep
learning based appearance feature, and show that they lead to significantly
better MOT results in both online and offline setting. We make our detection
and appearance feature publicly available. In the following part, we first
summarize the detection and appearance feature, and then introduce our tracker
named Person of Interest (POI), which has both online and offline version.

Robust Video Synchronization using Unsupervised Deep Learning

Ido Freeman, Patrick Wieschollek, Hendrik P.A. Lensch
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Aligning video sequences is a fundamental yet still unsolved component for a
wide range of applications in computer graphics and vision. Especially when
targeting video clips containing an extensively varying appearance. Using
recent advances in deep learning, we present a scalable and robust method for
computing optimal non-linear temporal video alignments. The presented algorithm
learns to retrieve and match similar video frames from input sequences without
any human interaction or additional annotations in an unsupervised fashion. An
iterative scheme is presented which leverages on the nature of the videos
themselves in order to remove the need for labels. We incorporate a variation
of Dijkstra’s shortest-path algorithm for extracting meaningful training
examples as well as a robust video alignment. While previous methods assume
similar settings as weather conditions, season and illumination, our approach
is able to robustly align videos regardless of such noise. This provides new
ways of compositing non-seasonal video clips from data recorded months apart.

An automatic bad band preremoval algorithm for hyperspectral imagery

Luyan Ji, Xiurui Geng, Yongchao Zhao, Fuxiang Wang
Comments: 17 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

For most hyperspectral remote sensing applications, removing bad bands, such
as water absorption bands, is a required preprocessing step. Currently, the
commonly applied method is by visual inspection, which is very time-consuming
and it is easy to overlook some noisy bands. In this study, we find an inherent
connection between target detection algorithms and the corrupted band removal.
As an example, for the matched filter (MF), which is the most widely used
target detection method for hyperspectral data, we present an automatic
MF-based algorithm for bad band identification. The MF detector is a filter
vector, and the resulting filter output is the sum of all bands weighted by the
MF coefficients. Therefore, we can identify bad bands only by using the MF
filter vector itself, the absolute value of whose entry accounts for the
importance of each band for the target detection. For a specific target of
interest, the bands with small MF weights correspond to the noisy or bad ones.
Based on this fact, we develop an automatic bad band preremoval algorithm by
utilizing the average absolute value of MF weights for multiple targets within
a scene. Experiments with three well known hyperspectral datasets show that our
method can always identify the water absorption and other low signal-to-noise
(SNR) bands that are usually chosen as bad bands manually.

A Robust 3D-2D Interactive Tool for Scene Segmentation and Annotation

Duc Thanh Nguyen, Binh-Son Hua, Lap-Fai Yu, Sai-Kit Yeung
Comments: 14 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recent advances of 3D acquisition devices have enabled large-scale
acquisition of 3D scene data. Such data, if completely and well annotated, can
serve as useful ingredients for a wide spectrum of computer vision and graphics
works such as data-driven modeling and scene understanding, object detection
and recognition. However, annotating a vast amount of 3D scene data remains
challenging due to the lack of an effective tool and/or the complexity of 3D
scenes (e.g. clutter, varying illumination conditions). This paper aims to
build a robust annotation tool that effectively and conveniently enables the
segmentation and annotation of massive 3D data. Our tool works by coupling 2D
and 3D information via an interactive framework, through which users can
provide high-level semantic annotation for objects. We have experimented our
tool and found that a typical indoor scene could be well segmented and
annotated in less than 30 minutes by using the tool, as opposed to a few hours
if done manually. Along with the tool, we created a dataset of over a hundred
3D scenes associated with complete annotations using our tool. The tool and
dataset are available at www.scenenn.net.

StuffNet: Using 'Stuff' to Improve Object Detection

Samarth Brahmbhatt, Henrik I. Christensen, James Hays
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose a Convolutional Neural Network (CNN) based algorithm – StuffNet –
for object detection. In addition to the standard convolutional features
trained for region proposal and object detection [31], StuffNet uses
convolutional features trained for segmentation of objects and ‘stuff’
(amorphous categories such as ground and water). Through experiments on Pascal
VOC 2010, we show the importance of features learnt from stuff segmentation for
improving object detection performance. StuffNet improves performance from
18.8% mAP to 23.9% mAP for small objects. We also devise a method to train
StuffNet on datasets that do not have stuff segmentation labels. Through
experiments on Pascal VOC 2007 and 2012, we demonstrate the effectiveness of
this method and show that StuffNet also significantly improves object detection
performance on such datasets.

Mixed context networks for semantic segmentation

Haiming Sun, Di Xie, Shiliang Pu
Comments: 5 pages, 3 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Semantic segmentation is challenging as it requires both object-level
information and pixel-level accuracy. Recently, FCN-based systems gained great
improvement in this area. Unlike classification networks, combining features of
different layers plays an important role in these dense prediction models, as
these features contains information of different levels. A number of models
have been proposed to show how to use these features. However, what is the best
architecture to make use of features of different layers is still a question.
In this paper, we propose a module, called mixed context network, and show that
our presented system outperforms most existing semantic segmentation systems by
making use of this module.

Lensless Imaging with Compressive Ultrafast Sensing

Guy Satat, Matthew Tancik, Ramesh Raskar
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Conventional imaging uses a set of lenses to form an image on the sensor
plane. This pure hardware-based approach doesn’t use any signal processing, nor
the extra information in the time of arrival of photons to the sensor.
Recently, modern compressive sensing techniques have been applied for lensless
imaging. However, this computational approach tends to depend as much as
possible on signal processing (for example, single pixel camera) and results in
a long acquisition time. Here we propose using compressive ultrafast sensing
for lensless imaging. We use extremely fast sensors (picosecond time
resolution) to time tag photons as they arrive to an omnidirectional pixel.
Thus, each measurement produces a time series where time is a function of the
photon source location in the scene. This allows lensless imaging with
significantly fewer measurements compared to regular single pixel imaging (( 33
imes) less measurements in our experiments). To achieve this goal, we
developed a framework for using ultrafast pixels with compressive sensing,
including an algorithm for ideal sensor placement, and an algorithm for
optimized active illumination patterns. We show that efficient lensless imaging
is possible with ultrafast imaging and compressive sensing. This paves the way
for novel imaging architectures, and remote sensing in extreme situations where
imaging with a lens is not possible.

Fast and Accurate Surface Normal Integration on Non-Rectangular Domains

Martin Bähr, Michael Breuß, Yvain Quéau, Ali Sharifi Boroujerdi, Jean-Denis Durou
Subjects: Computational Engineering, Finance, and Science (cs.CE); Computer Vision and Pattern Recognition (cs.CV)

The integration of surface normals for the purpose of computing the shape of
a surface in 3D space is a classic problem in computer vision. However, even
nowadays it is still a challenging task to devise a method that combines the
flexibility to work on non-trivial computational domains with high accuracy,
robustness and computational efficiency. By uniting a classic approach for
surface normal integration with modern computational techniques we construct a
solver that fulfils these requirements. Building upon the Poisson integration
model we propose to use an iterative Krylov subspace solver as a core step in
tackling the task. While such a method can be very efficient, it may only show
its full potential when combined with a suitable numerical preconditioning and
a problem-specific initialisation. We perform a thorough numerical study in
order to identify an appropriate preconditioner for our purpose. To address the
issue of a suitable initialisation we propose to compute this initial state via
a recently developed fast marching integrator. Detailed numerical experiments
illuminate the benefits of this novel combination. In addition, we show on
real-world photometric stereo datasets that the developed numerical framework
is flexible enough to tackle modern computer vision applications.

Visual-Inertial Monocular SLAM with Map Reuse

Raul Mur-Artal, Juan D. Tardos
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

In recent years there have been excellent results in Visual-Inertial Odometry
techniques, which aim to compute the incremental motion of the sensor with high
accuracy and robustness. However these approaches lack the capability to close
loops, and trajectory estimation accumulates drift even if the sensor is
continually revisiting the same place. In this work we present a novel
tightly-coupled Visual-Inertial Simultaneous Localization and Mapping system
that is able to close loops and reuse its map to achieve zero-drift
localization in already mapped areas. While our approach can be applied to any
camera configuration, we address here the most general problem of a monocular
camera, with its well-known scale ambiguity. We also propose a novel IMU
initialization method, which computes the scale, the gravity direction, the
velocity, and gyroscope and accelerometer biases, in a few seconds with high
accuracy. We test our system in the 11 sequences of a recent micro-aerial
vehicle public dataset achieving a typical scale factor error of 1% and
centimeter precision. We compare to the state-of-the-art in visual-inertial
odometry in sequences with revisiting, proving the better accuracy of our
method due to map reuse and no drift accumulation.

Robot Vision Architecture for Autonomous Clothes Manipulation

Li Sun, Gerardo Aragon-Camarasa, Simon Rogers, J. Paul Siebert
Comments: 14 pages, under review
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

This paper presents a novel robot vision architecture for perceiving generic
3D clothes configurations. Our architecture is hierarchically structured,
starting from low-level curvatures, across mid-level geometric shapes &
topology descriptions; and finally approaching high-level semantic surface
structure descriptions. We demonstrate our robot vision architecture in a
customised dual-arm industrial robot with our self-designed, off-the-self
stereo vision system, carrying out autonomous grasping and dual-arm flattening.
It is worth noting that the proposed dual-arm flattening approach is unique
among the state-of-the-art robot autonomous system, which is the major
contribution of this paper. The experimental results show that the proposed
dual-arm flattening using stereo vision system remarkably outperforms the
single-arm flattening and widely-cited Kinect-based sensing system for
dexterous manipulation tasks. In addition, the proposed grasping approach
achieves satisfactory performance on grasping various kind of garments,
verifying the capability of proposed visual perception architecture to be
adapted to more than one clothing manipulation tasks.

Artificial Intelligence

Constrained Cohort Intelligence using Static and Dynamic Penalty Function Approach for Mechanical Components Design

Omkar Kulkarni, Ninad Kulkarni, Anand J Kulkarni, Ganesh Kakandikar
Subjects: Artificial Intelligence (cs.AI)

Most of the metaheuristics can efficiently solve unconstrained problems;
however, their performance may degenerate if the constraints are involved. This
paper proposes two constraint handling approaches for an emerging metaheuristic
of Cohort Intelligence (CI). More specifically CI with static penalty function
approach (SCI) and CI with dynamic penalty function approach (DCI) are
proposed. The approaches have been tested by solving several constrained test
problems. The performance of the SCI and DCI have been compared with algorithms
like GA, PSO, ABC, d-Ds. In addition, as well as three real world problems from
mechanical engineering domain with improved solutions. The results were
satisfactory and validated the applicability of CI methodology for solving real
world problems.

Fairness as a Program Property

Aws Albarghouthi, Loris D'Antoni, Samuel Drews, Aditya Nori
Subjects: Programming Languages (cs.PL); Artificial Intelligence (cs.AI)

We explore the following question: Is a decision-making program fair, for
some useful definition of fairness? First, we describe how several algorithmic
fairness questions can be phrased as program verification problems. Second, we
discuss an automated verification technique for proving or disproving fairness
of decision-making programs with respect to a probabilistic model of the
population.

Particle Swarm Optimization for Generating Fuzzy Reinforcement Learning Policies

Fuzzy controllers are known to serve as efficient and interpretable system
controllers for continuous state and action spaces. To date these controllers
have been constructed by hand, or automatically trained either on expert
generated problem specific cost functions or by incorporating detailed
knowledge about the optimal control strategy. Both requirements for automatic
training processes are not given in the majority of real world reinforcement
learning (RL) problems. We introduce a new particle swarm reinforcement
learning (PSRL) approach which is capable of constructing fuzzy RL policies
solely by training parameters on world models produced from randomly generated
samples of the real system. This approach relates self-organizing fuzzy
controllers to model-based RL for the first time. PSRL can be used
straightforward on any RL problem, which is demonstrated on three standard RL
benchmarks, mountain car, cart pole balancing and cart pole swing up. Our
experiments yielded high performing and well interpretable fuzzy policies.

Information Retrieval

Finding Representative Points in Multivariate Data Using PCA

Ashwinkumar Ganesan, Tim Oates, Matt Schmill
Subjects: Information Retrieval (cs.IR)

The idea of representation has been used in various fields of study from data
analysis to political science. In this paper, we define representativeness and
describe a method to isolate data points that can represent the entire data
set. Also, we show how the minimum set of representative data points can be
generated. We use data from GLOBE (a project to study the effects on Land
Change based on a set of parameters that include temperature, forest cover,
human population, atmospheric parameters and many other variables) to test &
validate the algorithm. Principal Component Analysis (PCA) is used to reduce
the dimensions of the multivariate data set, so that the representative points
can be generated efficiently and its Representativeness has been compared
against Random Sampling of points from the data set.

The survey of sentiment and opinion mining for behavior analysis of social media

Saqib Iqbal, Ali Zulqurnain, Yaqoob Wani, Khalid Hussain
Comments: 21-28, International Journal of Computer Science & Engineering Survey (IJCSES), Vol.6, No.5, October 2015
Subjects: Social and Information Networks (cs.SI); Information Retrieval (cs.IR)

Nowadays, internet has changed the world into a global village. Social Media
has reduced the gaps among the individuals. Previously communication was a time
consuming and expensive task between the people. Social Media has earned fame
because it is a cheaper and faster communication provider. Besides, social
media has allowed us to reduce the gaps of physical distance, it also generates
and preserves huge amount of data. The data are very valuable and it presents
association degree between people and their opinions. The comprehensive
analysis of the methods which are used on user behavior prediction is presented
in this paper. This comparison will provide a detailed information, pros and
cons in the domain of sentiment and opinion mining.

Computation and Language

Chinese Restaurant Process for cognate clustering: A threshold free approach

Taraka Rama
Subjects: Computation and Language (cs.CL)

In this paper, we introduce a threshold free approach, motivated from Chinese
Restaurant Process, for the purpose of cognate clustering. We show that our
approach yields similar results to a linguistically motivated cognate
clustering system known as LexStat. Our Chinese Restaurant Process system is
fast and does not require any threshold and can be applied to any language
family of the world.

Bidirectional LSTM-CRF for Clinical Concept Extraction

Raghavendra Chalapathy, Ehsan Zare Borzeshi, Massimo Piccardi
Comments: This paper “Bidirectional LSTM-CRF for Clinical Concept Extraction” is accepted for short paper presentation at Clinical Natural Language Processing Workshop at COLING 2016 Osaka, Japan. December 11, 2016
Subjects: Computation and Language (cs.CL)

Extraction of concepts present in patient clinical records is an essential
step in clinical research. The 2010 i2b2/VA Workshop on Natural Language
Processing Challenges for clinical records presented concept extraction (CE)
task, with aim to identify concepts (such as treatments, tests, problems) and
classify them into predefined categories. State-of-the-art CE approaches
heavily rely on hand crafted features and domain specific resources which are
hard to collect and tune. For this reason, this paper employs bidirectional
LSTM with CRF decoding initialized with general purpose off-the-shelf word
embeddings for CE. The experimental results achieved on 2010 i2b2/VA reference
standard corpora using bidirectional LSTM CRF ranks closely with top ranked
systems.

Small-footprint Highway Deep Neural Networks for Speech Recognition

Liang Lu, Steve Renals
Comments: 9 pages, 6 figures. arXiv admin note: substantial text overlap with arXiv:1608.00892, arXiv:1607.01963
Subjects: Computation and Language (cs.CL); Learning (cs.LG)

Morden state-of-the-art speech recognition systems usually employ neural
networks for acoustic modeling. However, compared to the conventional Gaussian
mixture models, deep neural network (DNN) based acoustic models usually have
much larger number of model parameters, making it challenging for their
applications in resource constrained platforms such as mobile devices. In this
paper, we study the application of the recently proposed highway deep neural
network (HDNN) for training small-footprint acoustic models. HDNN is a type of
depth-gated feedforward neural network, which introduces two type of gate
functions to facilitate the information flow through different layers. Our
study demonstrates that HDNNs are more compact than plain DNNs for acoustic
modeling, i.e., they can achieve comparable recognition accuracy with much less
model parameters than plain DNN-based acoustic models. Furthermore, HDNNs are
more controllable than plain DNNs. The gate functions of a HDNN largely control
the behavior of the whole network with very small number of model parameters.
And finally, HDNNs are more adaptable than plain DNNs. For example, simply
updating the gate functions using the adaptation data can result in
considerable gains. We demonstrate these aspect by experiments using the
publicly available AMI meeting speech transcription corpus, which has around 80
hours of training data. Moreover, we also investigate the knowledge
distillation technique to further improve the small-footprint HDNN acoustic
models.

A Bayesian Approach to Estimation of Speaker Normalization Parameters

Dhananjay Ram, Debasis Kundu, Rajesh M. Hegde
Comments: 23 Pages, 9 Figures
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Applications (stat.AP)

In this work, a Bayesian approach to speaker normalization is proposed to
compensate for the degradation in performance of a speaker independent speech
recognition system. The speaker normalization method proposed herein uses the
technique of vocal tract length normalization (VTLN). The VTLN parameters are
estimated using a novel Bayesian approach which utilizes the Gibbs sampler, a
special type of Markov Chain Monte Carlo method. Additionally the
hyperparameters are estimated using maximum likelihood approach. This model is
used assuming that human vocal tract can be modeled as a tube of uniform cross
section. It captures the variation in length of the vocal tract of different
speakers more effectively, than the linear model used in literature. The work
has also investigated different methods like minimization of Mean Square Error
(MSE) and Mean Absolute Error (MAE) for the estimation of VTLN parameters. Both
single pass and two pass approaches are then used to build a VTLN based speech
recognizer. Experimental results on recognition of vowels and Hindi phrases
from a medium vocabulary indicate that the Bayesian method improves the
performance by a considerable margin.

Distributed, Parallel, and Cluster Computing

ERMrest: an entity-relationship data storage service for web-based, data-oriented collaboration

Karl Czajkowski, Carl Kesselman, Robert Schuler, Hongsuda Tangmunarunkit
Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC); Digital Libraries (cs.DL); Human-Computer Interaction (cs.HC)

Scientific discovery is increasingly dependent on a scientist’s ability to
acquire, curate, integrate, analyze, and share large and diverse collections of
data. While the details vary from domain to domain, these data often consist of
diverse digital assets (e.g. image files, sequence data, or simulation outputs)
that are organized with complex relationships and context which may evolve over
the course of an investigation. In addition, discovery is often collaborative,
such that sharing of the data and its organizational context is highly
desirable. Common systems for managing file or asset metadata hide their
inherent relational structures, while traditional relational database systems
do not extend to the distributed collaborative environment often seen in
scientific investigations. To address these issues, we introduce ERMrest, a
collaborative data management service which allows general entity-relationship
modeling of metadata manipulated by RESTful access methods. We present the
design criteria, architecture, and service implementation, as well as describe
an ecosystem of tools and services that we have created to integrate metadata
into an end-to-end scientific data life cycle. ERMrest has been deployed to
hundreds of users across multiple scientific research communities and projects.
We present two representative use cases: an international consortium and an
early-phase, multidisciplinary research project.

From Network Reliability to the Ising Model: A Parallel Scheme for Estimating the Joint Density of States

Yihui Ren, Stephen Eubank, Madhurima Nath
Comments: Ver.2. 8 pages, 7 figures. Accepted. Phys. Rev. E
Subjects: Statistical Mechanics (cond-mat.stat-mech); Distributed, Parallel, and Cluster Computing (cs.DC)

Network reliability is the probability that a dynamical system composed of
discrete elements interacting on a network will be found in a configuration
that satisfies a particular property. We introduce a new reliability property,
Ising feasibility, for which the network reliability is the Ising model s
partition function. As shown by Moore and Shannon, the network reliability can
be separated into two factors: structural, solely determined by the network
topology, and dynamical, determined by the underlying dynamics. In this case,
the structural factor is known as the joint density of states. Using methods
developed to approximate the structural factor for other reliability
properties, we simulate the joint density of states, yielding an approximation
for the partition function. Based on a detailed examination of why naive Monte
Carlo sampling gives a poor approximation, we introduce a novel parallel scheme
for estimating the joint density of states using a Markov chain Monte Carlo
method with a spin exchange random walk. This parallel scheme makes simulating
the Ising model in the presence of an external field practical on small
computer clusters for networks with arbitrary topology with 10 to 6 energy
levels and more than 10 to 308 microstates.

Learning

Streaming Normalization: Towards Simpler and More Biologically-plausible Normalizations for Online and Recurrent Learning

Qianli Liao, Kenji Kawaguchi, Tomaso Poggio
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

We systematically explored a spectrum of normalization algorithms related to
Batch Normalization (BN) and propose a generalized formulation that
simultaneously solves two major limitations of BN: (1) online learning and (2)
recurrent learning. Our proposal is simpler and more biologically-plausible.
Unlike previous approaches, our technique can be applied out of the box to all
learning scenarios (e.g., online learning, batch learning, fully-connected,
convolutional, feedforward, recurrent and mixed — recurrent and
convolutional) and compare favorably with existing approaches. We also propose
Lp Normalization for normalizing by different orders of statistical moments. In
particular, L1 normalization is well-performing, simple to implement, fast to
compute, more biologically-plausible and thus ideal for GPU or hardware
implementations.

Efficiency of active learning for the allocation of workers on crowdsourced classification tasks

Edoardo Manino, Long Tran-Thanh, Nicholas R. Jennings
Comments: paper accepted in the CrowdML workshop at NIPS 2016
Subjects: Learning (cs.LG); Human-Computer Interaction (cs.HC)

Crowdsourcing has been successfully employed in the past as an effective and
cheap way to execute classification tasks and has therefore attracted the
attention of the research community. However, we still lack a theoretical
understanding of how to collect the labels from the crowd in an optimal way. In
this paper we focus on the problem of worker allocation and compare two active
learning policies proposed in the empirical literature with a uniform
allocation of the available budget. To this end we make a thorough mathematical
analysis of the problem and derive a new bound on the performance of the
system. Furthermore we run extensive simulations in a more realistic scenario
and show that our theoretical results hold in practice.

Learning to Learn Neural Networks

Tom Bosc
Comments: presented at “Reasoning, Attention, Memory” workshop, NIPS 2015
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Meta-learning consists in learning learning algorithms. We use a Long Short
Term Memory (LSTM) based network to learn to compute on-line updates of the
parameters of another neural network. These parameters are stored in the cell
state of the LSTM. Our framework allows to compare learned algorithms to
hand-made algorithms within the traditional train and test methodology. In an
experiment, we learn a learning algorithm for a one-hidden layer Multi-Layer
Perceptron (MLP) on non-linearly separable datasets. The learned algorithm is
able to update parameters of both layers and generalise well on similar
datasets.

K-Nearest Neighbor Classification Using Anatomized Data

Koray Mancuhan, Chris Clifton
Comments: Technical Report
Subjects: Learning (cs.LG); Cryptography and Security (cs.CR); Databases (cs.DB)

This paper analyzes k nearest neighbor classification with training data
anonymized using anatomy. Anatomy preserves all data values, but introduces
uncertainty in the mapping between identifying and sensitive values. We first
study the theoretical effect of the anatomized training data on the k nearest
neighbor error rate bounds, nearest neighbor convergence rate, and Bayesian
error. We then validate the derived bounds empirically. We show that 1)
Learning from anatomized data approaches the limits of learning through the
unprotected data (although requiring larger training data), and 2) nearest
neighbor using anatomized data outperforms nearest neighbor on
generalization-based anonymization.

CuMF_SGD: Fast and Scalable Matrix Factorization

Xiaolong Xie, Wei Tan, Liana L. Fong, Yun Liang
Subjects: Learning (cs.LG); Numerical Analysis (cs.NA)

Matrix factorization (MF) has been widely used in e.g., recommender systems,
topic modeling and word embedding. Stochastic gradient descent (SGD) is popular
in solving MF problems because it can deal with large data sets and is easy to
do incremental learning. We observed that SGD for MF is memory bound.
Meanwhile, single-node CPU systems with caching performs well only for small
data sets; distributed systems have higher aggregated memory bandwidth but
suffer from relatively slow network connection. This observation inspires us to
accelerate MF by utilizing GPUs’s high memory bandwidth and fast intra-node
connection. We present cuMF_SGD, a CUDA-based SGD solution for large-scale MF
problems. On a single CPU, we design two workload schedule schemes, i.e.,
batch-Hogwild! and wavefront-update that fully exploit the massive amount of
cores. Especially, batch-Hogwild! as a vectorized version of Hogwild! overcomes
the issue of memory discontinuity. We also develop highly-optimized kernels for
SGD update, leveraging cache, warp-shuffle instructions and half-precision
floats. We also design a partition scheme to utilize multiple GPUs while
addressing the well-known convergence issue when parallelizing SGD. On three
data sets with only one Maxwell or Pascal GPU, cuMF_SGD runs 3.1X-28.2X as fast
compared with state-of-art CPU solutions on 1-64 CPU nodes. Evaluations also
show that cuMF_SGD scales well on multiple GPUs in large data sets.

Statistical Learning Theory Approach for Data Classification with l-diversity

Koray Mancuhan, Chris Clifton
Comments: Technical Report
Subjects: Learning (cs.LG); Cryptography and Security (cs.CR); Databases (cs.DB)

Corporations are retaining ever-larger corpuses of personal data; the
frequency or breaches and corresponding privacy impact have been rising
accordingly. One way to mitigate this risk is through use of anonymized data,
limiting the exposure of individual data to only where it is absolutely needed.
This would seem particularly appropriate for data mining, where the goal is
generalizable knowledge rather than data on specific individuals. In practice,
corporate data miners often insist on original data, for fear that they might
“miss something” with anonymized or differentially private approaches. This
paper provides a theoretical justification for the use of anonymized data.
Specifically, we show that a support vector classifier trained on anatomized
data satisfying l-diversity should be expected to do as well as on the original
data. Anatomy preserves all data values, but introduces uncertainty in the
mapping between identifying and sensitive values, thus satisfying l-diversity.
The theoretical effectiveness of the proposed approach is validated using
several publicly available datasets, showing that we outperform the state of
the art for support vector classification using training data protected by
k-anonymity, and are comparable to learning on the original data.

Decision Tree Classification on Outsourced Data

Koray Mancuhan, Chris Clifton
Comments: Presented in the Data Ethics Workshop at the 20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Subjects: Learning (cs.LG); Cryptography and Security (cs.CR); Databases (cs.DB)

This paper proposes a client-server decision tree learning method for
outsourced private data. The privacy model is anatomization/fragmentation: the
server sees data values, but the link between sensitive and identifying
information is encrypted with a key known only to clients. Clients have limited
processing and storage capability. Both sensitive and identifying information
thus are stored on the server. The approach presented also retains most
processing at the server, and client-side processing is amortized over
predictions made by the clients. Experiments on various datasets show that the
method produces decision trees approaching the accuracy of a non-private
decision tree, while substantially reducing the client’s computing resource
requirements.

Big Batch SGD: Automated Inference using Adaptive Batch Sizes

Soham De, Abhay Yadav, David Jacobs, Tom Goldstein
Subjects: Learning (cs.LG); Numerical Analysis (math.NA); Optimization and Control (math.OC); Machine Learning (stat.ML)

Classical stochastic gradient methods for optimization rely on noisy gradient
approximations that become progressively less accurate as iterates approach a
solution. The large noise and small signal in the resulting gradients makes it
difficult to use them for adaptive stepsize selection and automatic stopping.
We propose alternative “big batch” SGD schemes that adaptively grow the batch
size over time to maintain a nearly constant signal-to-noise ratio in the
gradient approximation. The resulting methods have similar convergence rates to
classical SGD methods without requiring convexity of the objective function.
The high fidelity gradients enable automated learning rate selection and do not
require stepsize decay. For this reason, big batch methods are easily automated
and can run with little or no user oversight.

Particle Swarm Optimization for Generating Fuzzy Reinforcement Learning Policies

Fuzzy controllers are known to serve as efficient and interpretable system
controllers for continuous state and action spaces. To date these controllers
have been constructed by hand, or automatically trained either on expert
generated problem specific cost functions or by incorporating detailed
knowledge about the optimal control strategy. Both requirements for automatic
training processes are not given in the majority of real world reinforcement
learning (RL) problems. We introduce a new particle swarm reinforcement
learning (PSRL) approach which is capable of constructing fuzzy RL policies
solely by training parameters on world models produced from randomly generated
samples of the real system. This approach relates self-organizing fuzzy
controllers to model-based RL for the first time. PSRL can be used
straightforward on any RL problem, which is demonstrated on three standard RL
benchmarks, mountain car, cart pole balancing and cart pole swing up. Our
experiments yielded high performing and well interpretable fuzzy policies.

A multi-task learning model for malware classification with useful file access pattern from API call sequence

Xin Wang, Siu Ming Yiu
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Learning (cs.LG)

Based on API call sequences, semantic-aware and machine learning (ML) based
malware classifiers can be built for malware detection or classification.
Previous works concentrate on crafting and extracting various features from
malware binaries, disassembled binaries or API calls via static or dynamic
analysis and resorting to ML to build classifiers. However, they tend to
involve too much feature engineering and fail to provide interpretability. We
solve these two problems with the recent advances in deep learning: 1)
RNN-based autoencoders (RNN-AEs) can automatically learn low-dimensional
representation of a malware from its raw API call sequence. 2) Multiple
decoders can be trained under different supervisions to give more information,
other than the class or family label of a malware. Inspired by the works of
document classification and automatic sentence summarization, each API call
sequence can be regarded as a sentence. In this paper, we make the first
attempt to build a multi-task malware learning model based on API call
sequences. The model consists of two decoders, one for malware classification
and one for (emph{file access pattern}) (FAP) generation given the API call
sequence of a malware. We base our model on the general seq2seq framework.
Experiments show that our model can give competitive classification results as
well as insightful FAP information.

Learning Determinantal Point Processes in Sublinear Time

Christophe Dupuy (SIERRA), Francis Bach (SIERRA, LIENS)
Comments: Under review for AISTATS 2017
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

We propose a new class of determinantal point processes (DPPs) which can be
manipulated for inference and parameter learning in potentially sublinear time
in the number of items. This class, based on a specific low-rank factorization
of the marginal kernel, is particularly suited to a subclass of continuous DPPs
and DPPs defined on exponentially many items. We apply this new class to
modelling text documents as sampling a DPP of sentences, and propose a
conditional maximum likelihood formulation to model topic proportions, which is
made possible with no approximation for our class of DPPs. We present an
application to document summarization with a DPP on (2^{500}) items.

Membership Inference Attacks against Machine Learning Models

Reza Shokri, Marco Stronati, Vitaly Shmatikov
Subjects: Cryptography and Security (cs.CR); Learning (cs.LG)

We investigate how machine learning models leak information about the
individual data records on which they were trained. We focus on the basic
membership inference attack: given a data record and black-box access to a
model, determine whether the record was in the model’s training dataset. To
perform membership inference against a target model, we make adversarial use of
machine learning and train our own inference attack model to recognize
differences in the target model’s predictions on inputs that it trained on
versus inputs that it did not use during training.

We empirically evaluate our inference techniques on classification models
trained by commercial “machine learning as a service” providers such as Google
and Amazon. Using realistic datasets and classification tasks, we show that
these models can be significantly vulnerable to membership inference attacks.

Small-footprint Highway Deep Neural Networks for Speech Recognition

Morden state-of-the-art speech recognition systems usually employ neural
networks for acoustic modeling. However, compared to the conventional Gaussian
mixture models, deep neural network (DNN) based acoustic models usually have
much larger number of model parameters, making it challenging for their
applications in resource constrained platforms such as mobile devices. In this
paper, we study the application of the recently proposed highway deep neural
network (HDNN) for training small-footprint acoustic models. HDNN is a type of
depth-gated feedforward neural network, which introduces two type of gate
functions to facilitate the information flow through different layers. Our
study demonstrates that HDNNs are more compact than plain DNNs for acoustic
modeling, i.e., they can achieve comparable recognition accuracy with much less
model parameters than plain DNN-based acoustic models. Furthermore, HDNNs are
more controllable than plain DNNs. The gate functions of a HDNN largely control
the behavior of the whole network with very small number of model parameters.
And finally, HDNNs are more adaptable than plain DNNs. For example, simply
updating the gate functions using the adaptation data can result in
considerable gains. We demonstrate these aspect by experiments using the
publicly available AMI meeting speech transcription corpus, which has around 80
hours of training data. Moreover, we also investigate the knowledge
distillation technique to further improve the small-footprint HDNN acoustic
models.

Modeling the Dynamics of Online Learning Activity

Charalampos Mavroforakis, Isabel Valera, Manuel Gomez Rodriguez
Comments: Python implementation of the proposed HDHP is available at this https URL
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Social and Information Networks (cs.SI)

People are increasingly relying on the Web and social media to find solutions
to their problems in a wide range of domains. In this online setting, closely
related problems often lead to the same characteristic learning pattern, in
which people sharing these problems visit related pieces of information,
perform almost identical queries or, more generally, take a series of similar
actions. In this paper, we introduce a novel modeling framework for clustering
continuous-time grouped streaming data, the hierarchical Dirichlet Hawkes
process (HDHP), which allows us to automatically uncover a wide variety of
learning patterns from detailed traces of learning activity. Our model allows
for efficient inference, scaling to millions of actions taken by thousands of
users. Experiments on real data gathered from Stack Overflow reveal that our
framework can recover meaningful learning patterns in terms of both content and
temporal dynamics, as well as accurately track users’ interests and goals over
time.

RedQueen: An Online Algorithm for Smart Broadcasting in Social Networks

Ali Zarezade, Utkarsh Upadhyay, Hamid Rabiee, Manuel Gomez Rodriguez
Comments: To appear at the 10th ACM International Conference on Web Search and Data Mining (WSDM)
Subjects: Machine Learning (stat.ML); Data Structures and Algorithms (cs.DS); Learning (cs.LG); Social and Information Networks (cs.SI)

Users in social networks whose posts stay at the top of their followers'{}
feeds the longest time are more likely to be noticed. Can we design an online
algorithm to help them decide when to post to stay at the top? In this paper,
we address this question as a novel optimal control problem for jump stochastic
differential equations. For a wide variety of feed dynamics, we show that the
optimal broadcasting intensity for any user is surprisingly simple — it is
given by the position of her most recent post on each of her follower’s feeds.
As a consequence, we are able to develop a simple and highly efficient online
algorithm, RedQueen, to sample the optimal times for the user to post.
Experiments on both synthetic and real data gathered from Twitter show that our
algorithm is able to consistently make a user’s posts more visible over time,
is robust to volume changes on her followers’ feeds, and significantly
outperforms the state of the art.

Information Theory

Leveraging Diversity and Sparsity in Blind Deconvolution

Ali Ahmed, Laurent Demanet
Subjects: Information Theory (cs.IT)

This paper considers recovering (L)-dimensional vectors (oldsymbol{w}), and
(oldsymbol{x}_n,~ n =1 , ldots, N) from their circular convolutions
(oldsymbol{y}_n = oldsymbol{w}*oldsymbol{x}_n). The vector
(oldsymbol{w}) is assumed to be (S)-sparse in a known basis that is spread
out in the Fourier domain, and each input (oldsymbol{x}_n) is a member of a
known (K)-dimensional random subspace.

We prove that whenever (K + Slog^2S lesssim L /log^4(LN)), the problem can
be solved effectively by using only the nuclear-norm minimization as the convex
relaxation, as long as the inputs are sufficiently diverse and obey (N gtrsim
log^2(LN)). By “diverse inputs”, we mean that the (oldsymbol{x}_n) belong to
different, generic subspaces. To our knowledge, this is the first theoretical
result on blind deconvolution where the subspace to which the impulse response
belongs is not fixed, but needs to be determined.

We discuss the result in the context of multipath channel estimation in
wireless communications. Both the fading coefficients, and the delays in the
channel impulse response (oldsymbol{w}) are unknown. The encoder codes the
(K)-dimensional message vectors randomly and then transmits them over a fixed
channel one after the other. The decoder then discovers all of the messages and
the channel response when the number of samples taken for each received message
are roughly greater than ((K+Slog^2S)log^4(LN)), and the number of messages
is roughly at least (log^2(LN)).

Communication efficient and strongly secure secret sharing schemes based on algebraic geometry codes

Umberto Martínez-Peñas
Subjects: Information Theory (cs.IT)

Secret sharing schemes with optimal and universal communication overheads
have been obtained independently by Bitar et al. and Huang et al. However,
their constructions require a finite field of size q > n, where n is the number
of shares, and do not provide strong security. In this work, we give a general
framework to construct communication efficient secret sharing schemes based on
sequences of nested linear codes, which allows to use in particular algebraic
geometry codes and allows to obtain strongly secure and communication efficient
schemes. Using this framework, we obtain: 1) schemes with universal and close
to optimal communication overheads for arbitrarily large lengths n and a fixed
finite field, 2) the first construction of schemes with universal and optimal
communication overheads and optimal strong security (for restricted lengths),
which has the security advantages of perfect schemes and the storage efficiency
of ramp schemes, and 3) schemes with universal and close to optimal
communication overheads and close to optimal strong security defined for
arbitrarily large lengths n and a fixed finite field.

D-OAMP: A Denoising-based Signal Recovery Algorithm for Compressed Sensing

Zhipeng Xue, Junjie Ma, Xiaojun Yuan
Comments: 5 pages, 3 figures, GlobalSip 2016
Subjects: Information Theory (cs.IT)

Approximate message passing (AMP) is an efficient iterative signal recovery
algorithm for compressed sensing (CS). For sensing matrices with independent
and identically distributed (i.i.d.) Gaussian entries, the behavior of AMP can
be asymptotically described by a scaler recursion called state evolution.
Orthogonal AMP (OAMP) is a variant of AMP that imposes a divergence-free
constraint on the denoiser. In this paper, we extend OAMP to incorporate
generic denoisers, hence the name D-OAMP. Our numerical results show that state
evolution predicts the performance of D-OAMP well for generic denoisers when
i.i.d. Gaussian or partial orthogonal sensing matrices are involved. We compare
the performances of denosing-AMP (D-AMP) and D-OAMP for recovering natural
images from CS measurements. Simulation results show that D-OAMP outperforms
D-AMP in both convergence speed and recovery accuracy for partial orthogonal
sensing matrices.

An algorithmic approach using multivariate polynomials for the nonlinearity of Boolean functions

Emanuele Bellini, Teo Mora, Massimiliano Sala
Comments: arXiv admin note: substantial text overlap with arXiv:1404.2741
Subjects: Information Theory (cs.IT); Commutative Algebra (math.AC)

The nonlinearity of a Boolean function is a key property in deciding its
suitability for cryptographic purposes, e.g. as a combining function in stream
ciphers, and so the nonlinearity computation is an important problem for
applications. Traditional methods to compute the nonlinearity are based on
transforms, such as the Fast Walsh Transform. In 2007 Simonetti proposed a
method to solve the above problem seen as a decision problem on the existence
of solutions for some multivariate polynomial systems. Although novel as
approach, her algorithm suffered from a direct application of Groebner bases
and was thus impractical. We now propose two more practical approaches, one
that determines the existence of solutions for Simonetti’s systems in a faster
way and another that writes similar systems but over fields with a different
characteristics. For our algorithms we provide an efficient implementation in
the software package MAGMA.

A 9.96 dB NCG FEC scheme and 164 bits/cycle low-complexity product decoder architecture

Carlo Condo, Pascal Giard, François Leduc-Primeau, Gabi Sarkis, Warren J. Gross
Subjects: Hardware Architecture (cs.AR); Information Theory (cs.IT)

Powerful Forward Error Correction (FEC) schemes are used in optical
communications to achieve bit-error rates below (10^{-15}). These FECs follow
one of two approaches: concatenation of simpler hard-decision codes or usage of
inherently powerful soft-decision codes. The first approach yields lower Net
Coding Gains (NCG), but can usually work at higher code rates and have lower
complexity decoders. In this work, we propose a novel FEC scheme based on a
product code and a post-processing technique. It can achieve an NCG of 9.96 dB
at a BER of (10^{-18}) without encountering an error floor, an error-correction
performance that sits between that of current hard-decision and soft-decision
FECs. A decoder architecture is designed, tested on FPGA and synthesized in 65
nm CMOS technology: its 164 bits/cycle worst-case information throughput can
reach 100 Gb/s at the achieved frequency of 609 MHz. Its complexity is shown to
be lower than that of hard-decision decoders in literature, and an order of
magnitude lower than the estimated complexity of soft-decision decoders.

Proximity-Aware Balanced Allocations in Cache Networks

Ali Pourmiri, Mahdi Jafari Siavoshani, Seyed Pooya Shariatpanahi
Comments: 14 pages, 6 figures
Subjects: Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

We consider load balancing in a network of caching servers delivering
contents to end users. Randomized load balancing via the so-called power of two
choices is a well-known approach in parallel and distributed systems that
reduces network imbalance. In this paper, we propose a randomized load
balancing scheme which considers cache size limitation and proximity in the
server redirection process. Since the memory limitation and the proximity
constraint cause correlation in the server selection process, we may not
benefit from the power of two choices in general.

However, we prove that in certain regimes, in terms of memory limitation and
proximity constraint, our scheme results in the maximum load of order
(Theta(loglog n)) (here (n) is the number of servers and requests), and at
the same time, leads to a low communication cost. This is an exponential
improvement in the maximum load compared to the scheme which assigns each
request to the nearest available replica. Finally, we investigate our scheme
performance by extensive simulations.