Eric Price, Wojciech Zaremba, Ilya Sutskever
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
The Neural GPU is a recent model that can learn algorithms such as
multi-digit binary addition and binary multiplication in a way that generalizes
to inputs of arbitrary length. We show that there are two simple ways of
improving the performance of the Neural GPU: by carefully designing a
curriculum, and by increasing model size. The latter requires careful memory
management, as a naive implementation of the Neural GPU is memory intensive. We
find that these techniques increase the set of algorithmic problems that can
be solved by the Neural GPU: we have been able to learn to perform all the
arithmetic operations (and generalize to arbitrarily long numbers) when the
arguments are given in the decimal representation (which, surprisingly, has not
been possible before). We have also been able to train the Neural GPU to
evaluate long arithmetic expressions with multiple operands that require
respecting the precedence order of the operations, although these have succeeded
only in their binary representation, and not with 100% accuracy.
In addition, we attempt to gain insight into the Neural GPU by understanding
its failure modes. We find that Neural GPUs that correctly generalize to
arbitrarily long numbers still fail to compute the correct answer on
highly-symmetric, atypical inputs: for example, a Neural GPU that achieves
near-perfect generalization on decimal multiplication of up to 100-digit long
numbers can fail on \(000\ldots002 \times 000\ldots002\) while succeeding at
\(2 \times 2\). These failure modes are reminiscent of adversarial examples.
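For illustration, a minimal probe of this failure mode could pad the operands with leading zeros and compare against the unpadded case (a hedged Python sketch; the model interface is an assumption, not the authors' code):

    # Hypothetical probe: build zero-padded, highly symmetric multiplication
    # inputs of a given length and compare them with the short unpadded case.
    def symmetric_case(digit="2", length=100):
        """Return decimal operand strings such as 000...002 and 000...002."""
        padded = "0" * (length - 1) + digit
        return padded, padded

    a, b = symmetric_case("2", 100)
    expected = str(int(a) * int(b))            # ground truth: "4"
    # model_output = neural_gpu.predict(a, b)  # assumed interface, illustration only
    print(a[-5:], b[-5:], expected)
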
Jonathan Binas, Giacomo Indiveri, Michael Pfeiffer
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)
Despite their advantages in terms of computational resources, latency, and
power consumption, event-based implementations of neural networks have not been
able to achieve the same performance figures as their equivalent
state-of-the-art deep network models. We propose counter neurons as minimal
spiking neuron models which only require addition and comparison operations,
thus avoiding costly multiplications. We show how inference carried out in deep
counter networks converges to the same accuracy levels as are achieved with
state-of-the-art conventional networks. As their event-based style of
computation leads to reduced latency and sparse updates, counter networks are
ideally suited for efficient compact and low-power hardware implementation. We
present theory and training methods for counter networks, and demonstrate on
the MNIST benchmark that counter networks converge quickly, both in terms of
time and number of operations required, to state-of-the-art classification
accuracy.
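For intuition, a minimal counter-neuron sketch in Python (the threshold value and reset rule are assumptions, not the authors' exact model): the unit only accumulates and compares, emitting an event when the counter crosses its threshold.

    class CounterNeuron:
        """Accumulates weighted input events and emits an output event when the
        counter reaches a threshold; only addition and comparison are used."""
        def __init__(self, threshold=8):
            self.counter = 0
            self.threshold = threshold

        def receive(self, weight):
            self.counter += weight                 # addition only
            if self.counter >= self.threshold:     # comparison only
                self.counter -= self.threshold
                return 1                           # output event
            return 0

    neuron = CounterNeuron(threshold=8)
    print([neuron.receive(3) for _ in range(10)])  # sparse, event-driven output
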
Zeinab Borhanifar, Elham Shadkam
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
In this article, the hybrid COAW algorithm, combining the Cuckoo Optimization
Algorithm with the simple additive weighting method, is presented to solve
multi-objective problems. The cuckoo algorithm is an efficient and structured
method for solving nonlinear continuous problems. The Pareto frontiers created
by the proposed COAW algorithm are exact and well dispersed. The method finds
Pareto frontiers quickly and identifies their beginning and end points properly.
To validate the proposed algorithm, several experimental problems were analyzed;
the results indicate that COAW is effective for solving multi-objective problems.
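As a rough illustration of the combination (a sketch only, not the paper's implementation), simple additive weighting scalarizes the objectives so that a single-objective search, here a plain random search standing in for the cuckoo optimizer, traces out Pareto points as the weights vary:

    import random

    def objectives(x):
        # Toy bi-objective problem, assumed for illustration.
        return (x ** 2, (x - 2) ** 2)

    def weighted_sum(objs, weights):
        # Simple additive weighting: scalarize multiple objectives into one score.
        return sum(w * f for w, f in zip(weights, objs))

    def single_objective_search(weights, iters=2000):
        # Stand-in for the cuckoo optimizer: any single-objective solver fits here.
        best_x, best_val = None, float("inf")
        for _ in range(iters):
            x = random.uniform(-2.0, 4.0)
            val = weighted_sum(objectives(x), weights)
            if val < best_val:
                best_x, best_val = x, val
        return best_x

    pareto_points = [objectives(single_objective_search((w, 1.0 - w)))
                     for w in (i / 10 for i in range(11))]
    print(pareto_points)
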
Kshiteej Sheth
Comments: 9 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
We propose novel methods for solving two tasks using Convolutional Neural
Networks: first, generating an HDR map of a static scene from differently
exposed LDR images of the scene captured with conventional cameras, and second,
finding an optimal tone mapping operator that gives a better score on the TMQI
metric than existing methods. We quantitatively evaluate the performance of our
networks and illustrate cases where they perform well as well as cases where
they perform poorly.
Mina Nouredanesh, Andrew McCormick, Sunil L. Kukreja, James Tung
Comments: Accepted paper-The 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2016)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, a method to detect environmental hazards related to fall risk
using a mobile vision system is proposed. First-person perspective videos are
proposed as a means to provide objective evidence on the causes and circumstances
of perturbed balance during activities of daily living, targeted at seniors. A
classification problem was defined with 12 total classes of potential fall
risks, including slope changes (e.g., stairs, curbs, ramps) and surfaces (e.g.,
gravel, grass, concrete). Data was collected using a chest-mounted GoPro
camera. We developed a convolutional neural network for automatic feature
extraction, reduction, and classification of frames. Initial results, with a
mean square error of 8%, are promising.
Kshiteej Sheth
Comments: 9 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
We propose novel methods for solving two tasks using Convolutional Neural
Networks: first, generating an HDR map of a static scene from differently
exposed LDR images of the scene captured with conventional cameras, and second,
finding an optimal tone mapping operator that gives a better score on the TMQI
metric than existing methods. We quantitatively evaluate the performance of our
networks and illustrate cases where they perform well as well as cases where
they perform poorly.
Hyeonseob Nam, Jung-Woo Ha, Jeonghee Kim
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose Dual Attention Networks (DANs) which jointly leverage visual and
textual attention mechanisms to capture fine-grained interplay between vision
and language. DANs attend to specific regions in images and words in text
through multiple steps and gather essential information from both modalities.
Based on this framework, we introduce two types of DANs for multimodal
reasoning and matching, respectively. First, the reasoning model allows visual
and textual attentions to steer each other during collaborative inference,
which is useful for tasks such as Visual Question Answering (VQA). Second, the
matching model exploits the two attention mechanisms to estimate the similarity
between images and sentences by focusing on their shared semantics. Our
extensive experiments validate the effectiveness of DANs in combining vision
and language, achieving the state-of-the-art performance on public benchmarks
for VQA and image-text matching.
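A minimal sketch of one joint attention step (feature shapes and the memory-update rule are assumptions; the actual DAN architecture is more involved):

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def attention_step(regions, words, memory):
        """Score each region and word against a shared memory vector, then
        return attended visual and textual summaries plus an updated memory."""
        v_scores = softmax(regions @ memory)         # (num_regions,)
        t_scores = softmax(words @ memory)           # (num_words,)
        v_ctx = v_scores @ regions                   # attended visual vector
        t_ctx = t_scores @ words                     # attended textual vector
        return v_ctx, t_ctx, memory + v_ctx + t_ctx  # assumed update rule

    rng = np.random.default_rng(0)
    regions, words = rng.normal(size=(36, 64)), rng.normal(size=(12, 64))
    memory = np.zeros(64)
    for _ in range(2):                               # multiple reasoning steps
        v_ctx, t_ctx, memory = attention_step(regions, words, memory)
    print(v_ctx.shape, t_ctx.shape)
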
Xiao Chu, Wanli Ouyang, Hongsheng Li, Xiaogang Wang
Comments: NIPS
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Deep convolutional neural networks (CNN) have achieved great success. On the
other hand, modeling structural information has been proved critical in many
vision problems. It is of great interest to integrate them effectively. In a
classical neural network, there is no message passing between neurons in the
same layer. In this paper, we propose a CRF-CNN framework which can
simultaneously model structural information in both output and hidden feature
layers in a probabilistic way, and it is applied to human pose estimation. A
message passing scheme is proposed, so that in various layers each body joint
receives messages from all the others in an efficient way. Such message passing
can be implemented with convolution between feature maps in the same layer,
and it is also integrated with feedforward propagation in neural networks.
Finally, a neural network implementation of CRF-CNN with end-to-end learning is
provided. Its effectiveness is demonstrated through experiments on two
benchmark datasets.
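A rough sketch of the message-passing idea (shapes and kernels here are assumptions, not the paper's exact formulation): each joint's feature map receives convolved messages from every other joint's map within the same layer.

    import numpy as np
    from scipy.signal import convolve2d

    def message_passing(feature_maps, kernels):
        """feature_maps: joint -> 2D map; kernels: (src, dst) -> 2D kernel.
        Every joint receives convolved messages from all other joints."""
        updated = {}
        for dst, fmap in feature_maps.items():
            msg = np.zeros_like(fmap)
            for src, src_map in feature_maps.items():
                if src != dst:
                    msg += convolve2d(src_map, kernels[(src, dst)], mode="same")
            updated[dst] = fmap + msg    # combine own evidence with messages
        return updated

    rng = np.random.default_rng(0)
    joints = ["head", "neck", "l_wrist"]
    maps = {j: rng.random((32, 32)) for j in joints}
    kernels = {(s, d): 0.01 * rng.random((5, 5))
               for s in joints for d in joints if s != d}
    print(message_passing(maps, kernels)["head"].shape)
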
Michał Januszewski, Jeremy Maitin-Shepard, Peter Li, Jörgen Kornfeld, Winfried Denk, Viren Jain
Comments: 11 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
State-of-the-art image segmentation algorithms generally consist of at least
two successive and distinct computations: a boundary detection process that
uses local image information to classify image locations as boundaries between
objects, followed by a pixel grouping step such as watershed or connected
components that clusters pixels into segments. Prior work has varied the
complexity and approach employed in these two steps, including the
incorporation of multi-layer neural networks to perform boundary prediction,
and the use of global optimizations during pixel clustering. We propose a
unified and end-to-end trainable machine learning approach, flood-filling
networks, in which a recurrent 3d convolutional network directly produces
individual segments from a raw image. The proposed approach robustly segments
images with an unknown and variable number of objects as well as highly
variable object sizes. We demonstrate the approach on a challenging 3d image
segmentation task, connectomic reconstruction from volume electron microscopy
data, on which flood-filling neural networks substantially improve accuracy
over other state-of-the-art methods. The proposed approach can replace complex
multi-step segmentation pipelines with a single neural network that is learned
end-to-end.
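For intuition, a hedged sketch of the flood-filling inference loop (the growth rule below is a trivial placeholder for the recurrent 3D convolutional network): starting from a seed voxel, a single object's mask is grown iteratively until it stops changing.

    import numpy as np
    from scipy.ndimage import binary_dilation

    def predict_mask(image, mask):
        # Placeholder for the recurrent network: grow the current mask into
        # neighbouring bright voxels only (illustrates the fill dynamics).
        return mask | (binary_dilation(mask) & (image > 0.5))

    def flood_fill(image, seed, steps=50):
        """Iteratively grow one object's mask from a seed voxel."""
        mask = np.zeros(image.shape, dtype=bool)
        mask[seed] = True
        for _ in range(steps):
            new_mask = predict_mask(image, mask)     # one refinement pass
            if np.array_equal(new_mask, mask):
                break
            mask = new_mask
        return mask

    volume = np.random.rand(16, 64, 64)
    segment = flood_fill(volume, seed=(8, 32, 32))
    print(int(segment.sum()), "voxels in the recovered segment")
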
Tatiana Tommasi, Arun Mallya, Bryan Plummer, Svetlana Lazebnik, Alexander C. Berg, Tamara L. Berg
Comments: submitted to IJCV — under review
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper presents an approach for answering fill-in-the-blank multiple
choice questions from the Visual Madlibs dataset. Instead of generic and
commonly used representations trained on the ImageNet dataset, our approach
employs a combination of networks trained for specialized tasks such as scene
recognition, person activity classification, and attribute prediction. We also
present a method for localizing phrases from candidate answers in order to
provide spatial support for feature extraction. We map each of these features,
together with candidate answers, to a joint embedding space through normalized
canonical correlation analysis (CCA). Finally, we solve an optimization problem
to learn to combine CCA scores from multiple cues to select the best answer.
Extensive experimental results show a significant improvement over the previous
state of the art and confirm that answering questions from a wide range of
types benefits from examining a variety of image cues and carefully choosing
the spatial support of feature extraction.
Hao Wang, Xingjian Shi, Dit-Yan Yeung
Comments: To appear at NIPS 2016
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Hybrid methods that utilize both content and rating information are commonly
used in many recommender systems. However, most of them use either handcrafted
features or the bag-of-words representation as a surrogate for the content
information but they are neither effective nor natural enough. To address this
problem, we develop a collaborative recurrent autoencoder (CRAE) which is a
denoising recurrent autoencoder (DRAE) that models the generation of content
sequences in the collaborative filtering (CF) setting. The model generalizes
recent advances in recurrent deep learning from i.i.d. input to non-i.i.d.
(CF-based) input and provides a new denoising scheme along with a novel
learnable pooling scheme for the recurrent autoencoder. To do this, we first
develop a hierarchical Bayesian model for the DRAE and then generalize it to
the CF setting. The synergy between denoising and CF enables CRAE to make
accurate recommendations while learning to fill in the blanks in sequences.
Experiments on real-world datasets from different domains (CiteULike and
Netflix) show that, by jointly modeling the order-aware generation of sequences
for the content information and performing CF for the ratings, CRAE is able to
significantly outperform the state of the art on both the recommendation task
based on ratings and the sequence generation task based on content information.
Hao Wang, Xingjian Shi, Dit-Yan Yeung
Comments: To appear at NIPS 2016
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Neural networks (NN) have achieved state-of-the-art performance in various
applications. Unfortunately in applications where training data is
insufficient, they are often prone to overfitting. One effective way to
alleviate this problem is to exploit the Bayesian approach by using Bayesian
neural networks (BNN). Another shortcoming of NN is the lack of flexibility to
customize different distributions for the weights and neurons according to the
data, as is often done in probabilistic graphical models. To address these
problems, we propose a class of probabilistic neural networks, dubbed
natural-parameter networks (NPN), as a novel and lightweight Bayesian treatment
of NN. NPN allows the usage of arbitrary exponential-family distributions to
model the weights and neurons. Different from traditional NN and BNN, NPN takes
distributions as input and goes through layers of transformation before
producing distributions to match the target output distributions. As a Bayesian
treatment, efficient backpropagation (BP) is performed to learn the natural
parameters for the distributions over both the weights and neurons. The output
distributions of each layer, as byproducts, may be used as second-order
representations for the associated tasks such as link prediction. Experiments
on real-world datasets show that NPN can achieve state-of-the-art performance.
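For intuition, a minimal Gaussian natural-parameter-style linear layer (a sketch assuming independent diagonal Gaussian weights and inputs; the paper covers arbitrary exponential-family distributions and full BP):

    import numpy as np

    def gaussian_linear(x_mean, x_var, w_mean, w_var):
        """Propagate a diagonal Gaussian input through a linear layer with
        independent Gaussian weights; returns pre-activation mean and variance."""
        out_mean = w_mean @ x_mean
        # Var(w x) for independent w, x: s_w*s_x + s_w*m_x^2 + m_w^2*s_x, summed.
        out_var = w_var @ (x_var + x_mean ** 2) + (w_mean ** 2) @ x_var
        return out_mean, out_var

    rng = np.random.default_rng(0)
    x_m, x_v = rng.normal(size=4), np.full(4, 0.1)
    w_m, w_v = rng.normal(size=(3, 4)), np.full((3, 4), 0.05)
    print(gaussian_linear(x_m, x_v, w_m, w_v))
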
Marek Rosa, Jan Feyereisl, The GoodAI Collective
Subjects: Artificial Intelligence (cs.AI)
There is a significant lack of unified approaches to building generally
intelligent machines. The majority of current artificial intelligence research
operates within a very narrow field of focus, frequently without considering
the importance of the ‘big picture’. In this document, we seek to describe and
unify principles that guide the basis of our development of general artificial
intelligence. These principles revolve around the idea that intelligence is a
tool for searching for general solutions to problems. We define intelligence as
the ability to acquire skills that narrow this search, diversify it and help
steer it to more promising areas. We also provide suggestions for studying,
measuring, and testing the various skills and abilities that a human-level
intelligent machine needs to acquire. The document aims to be both
implementation agnostic, and to provide an analytic, systematic, and scalable
way to generate hypotheses that we believe are needed to meet the necessary
conditions in the search for general artificial intelligence. We believe that
such a framework is an important stepping stone for bringing together
definitions, highlighting open problems, connecting researchers willing to
collaborate, and for unifying the arguably most significant search of this
century.
W. B. Vasantha Kandasamy, Ilanthenral K, Florentin Smarandache
Comments: 226 pages, many graphs, Europa Belgique, 2016
Subjects: Artificial Intelligence (cs.AI)
In this book, the authors introduce for the first time the notion of strong
neutrosophic graphs. They are very different from the usual graphs and from
neutrosophic graphs. Using these new structures, special subgraph topological
spaces are defined. Further, special lattice graphs of subgraphs of these graphs
are defined and described. Several interesting properties using subgraphs of a
strong neutrosophic graph are obtained, and several open conjectures are proposed.
This new class of strong neutrosophic graphs will certainly find applications
in Neutrosophic Cognitive Maps (NCM), Neutrosophic Relational Maps (NRM) and
Neutrosophic Relational Equations (NRE) with appropriate modifications.
Oliver M. Cliff, Mikhail Prokopenko, Robert Fitch
Subjects: Artificial Intelligence (cs.AI)
In this work, we are interested in structure learning for a set of spatially
distributed dynamical systems, where individual subsystems are coupled via
latent variables and observed through a filter. We represent this model as a
directed acyclic graph (DAG) that characterises the unidirectional coupling
between subsystems. Standard approaches to structure learning are not
applicable in this framework due to the hidden variables, however we can
exploit the properties of certain dynamical systems to formulate exact methods
based on state space reconstruction. We approach the problem by using
reconstruction theorems to analytically derive a tractable expression for the
KL-divergence of a candidate DAG from the observed dataset. We show this
measure can be decomposed as a function of two information-theoretic measures,
transfer entropy and stochastic interaction. We then present two mathematically
robust scoring functions based on transfer entropy and statistical independence
tests. These results support the previously held conjecture that transfer
entropy can be used to infer effective connectivity in complex networks.
Sándor Bozóki, László Csató, József Temesi
Comments: 14 pages, 2 figures
Journal-ref: European Journal of Operational Research (2016). 248(1): 211-218
Subjects: Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Applications (stat.AP)
Pairwise comparison is an important tool in multi-attribute decision making.
Pairwise comparison matrices (PCM) have been applied for ranking criteria and
for scoring alternatives according to a given criterion. Our paper presents a
special application of incomplete PCMs: ranking of professional tennis players
based on their results against each other. The selected 25 players have been on
the top of the ATP rankings for a shorter or longer period in the last 40
years. Some of them have never met on the court. One of the aims of the paper
is to provide a ranking of the selected players; however, the analysis of
incomplete pairwise comparison matrices is also a focus. The eigenvector
method and the logarithmic least squares method were used to calculate weights
from incomplete PCMs. In our results the top three players of four decades were
Nadal, Federer and Sampras. Some questions raised on the properties of
incomplete PCMs remain open for further investigation.
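As an illustration of the logarithmic least squares idea for incomplete matrices (a sketch; the eigenvector method and the paper's full treatment are not reproduced here), each known comparison a_ij is fit as a difference of log-weights:

    import numpy as np

    def llsm_incomplete(n, known):
        """known: dict (i, j) -> a_ij. Solve min sum (log a_ij - (y_i - y_j))^2,
        then return w = exp(y) normalized to sum to one."""
        rows, rhs = [], []
        for (i, j), a in known.items():
            r = np.zeros(n)
            r[i], r[j] = 1.0, -1.0
            rows.append(r)
            rhs.append(np.log(a))
        rows.append(np.ones(n))     # fix the scale: log-weights sum to zero
        rhs.append(0.0)
        y, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
        w = np.exp(y)
        return w / w.sum()

    # Toy example with one missing comparison (players 0 and 2 never met).
    print(llsm_incomplete(3, {(0, 1): 2.0, (1, 2): 3.0}))
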
Eric Price, Wojciech Zaremba, Ilya Sutskever
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
The Neural GPU is a recent model that can learn algorithms such as
multi-digit binary addition and binary multiplication in a way that generalizes
to inputs of arbitrary length. We show that there are two simple ways of
improving the performance of the Neural GPU: by carefully designing a
curriculum, and by increasing model size. The latter requires careful memory
management, as a naive implementation of the Neural GPU is memory intensive. We
find that these techniques increase the set of algorithmic problems that can
be solved by the Neural GPU: we have been able to learn to perform all the
arithmetic operations (and generalize to arbitrarily long numbers) when the
arguments are given in the decimal representation (which, surprisingly, has not
been possible before). We have also been able to train the Neural GPU to
evaluate long arithmetic expressions with multiple operands that require
respecting the precedence order of the operations, although these have succeeded
only in their binary representation, and not with 100% accuracy.
In addition, we attempt to gain insight into the Neural GPU by understanding
its failure modes. We find that Neural GPUs that correctly generalize to
arbitrarily long numbers still fail to compute the correct answer on
highly-symmetric, atypical inputs: for example, a Neural GPU that achieves
near-perfect generalization on decimal multiplication of up to 100-digit long
numbers can fail on \(000\ldots002 \times 000\ldots002\) while succeeding at
\(2 \times 2\). These failure modes are reminiscent of adversarial examples.
Gabriel Synnaeve, Nantas Nardelli, Alex Auvolat, Soumith Chintala, Timothée Lacroix, Zeming Lin, Florian Richoux, Nicolas Usunier
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
We present TorchCraft, an open-source library that enables deep learning
research on Real-Time Strategy (RTS) games such as StarCraft: Brood War, by
making it easier to control these games from a machine learning framework, here
Torch. This white paper argues for using RTS games as a benchmark for AI
research, and describes the design and components of TorchCraft.
Zeinab Borhanifar, Elham Shadkam
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
In this article, the hybrid COAW algorithm, combining the Cuckoo Optimization
Algorithm with the simple additive weighting method, is presented to solve
multi-objective problems. The cuckoo algorithm is an efficient and structured
method for solving nonlinear continuous problems. The Pareto frontiers created
by the proposed COAW algorithm are exact and well dispersed. The method finds
Pareto frontiers quickly and identifies their beginning and end points properly.
To validate the proposed algorithm, several experimental problems were analyzed;
the results indicate that COAW is effective for solving multi-objective problems.
Hao Wang, Xingjian Shi, Dit-Yan Yeung
Comments: To appear at NIPS 2016
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Hybrid methods that utilize both content and rating information are commonly
used in many recommender systems. However, most of them use either handcrafted
features or the bag-of-words representation as a surrogate for the content
information but they are neither effective nor natural enough. To address this
problem, we develop a collaborative recurrent autoencoder (CRAE) which is a
denoising recurrent autoencoder (DRAE) that models the generation of content
sequences in the collaborative filtering (CF) setting. The model generalizes
recent advances in recurrent deep learning from i.i.d. input to non-i.i.d.
(CF-based) input and provides a new denoising scheme along with a novel
learnable pooling scheme for the recurrent autoencoder. To do this, we first
develop a hierarchical Bayesian model for the DRAE and then generalize it to
the CF setting. The synergy between denoising and CF enables CRAE to make
accurate recommendations while learning to fill in the blanks in sequences.
Experiments on real-world datasets from different domains (CiteULike and
Netflix) show that, by jointly modeling the order-aware generation of sequences
for the content information and performing CF for the ratings, CRAE is able to
significantly outperform the state of the art on both the recommendation task
based on ratings and the sequence generation task based on content information.
Hao Wang, Xingjian Shi, Dit-Yan Yeung
Comments: To appear at NIPS 2016
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Neural networks (NN) have achieved state-of-the-art performance in various
applications. Unfortunately in applications where training data is
insufficient, they are often prone to overfitting. One effective way to
alleviate this problem is to exploit the Bayesian approach by using Bayesian
neural networks (BNN). Another shortcoming of NN is the lack of flexibility to
customize different distributions for the weights and neurons according to the
data, as is often done in probabilistic graphical models. To address these
problems, we propose a class of probabilistic neural networks, dubbed
natural-parameter networks (NPN), as a novel and lightweight Bayesian treatment
of NN. NPN allows the usage of arbitrary exponential-family distributions to
model the weights and neurons. Different from traditional NN and BNN, NPN takes
distributions as input and goes through layers of transformation before
producing distributions to match the target output distributions. As a Bayesian
treatment, efficient backpropagation (BP) is performed to learn the natural
parameters for the distributions over both the weights and neurons. The output
distributions of each layer, as byproducts, may be used as second-order
representations for the associated tasks such as link prediction. Experiments
on real-world datasets show that NPN can achieve state-of-the-art performance.
Peter M Krafft, Michael Macy, Alex Pentland
Comments: Forthcoming in CSCW 2017
Journal-ref: The 20th ACM Conference on Computer-Supported Cooperative Work and
Social Computing (CSCW) (2016)
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)
The use of bots as virtual confederates in online field experiments holds
extreme promise as a new methodological tool in computational social science.
However, this potential tool comes with inherent ethical challenges. Informed
consent can be difficult to obtain in many cases, and the use of confederates
necessarily implies the use of deception. In this work we outline a design
space for bots as virtual confederates, and we propose a set of guidelines for
meeting the status quo for ethical experimentation. We draw upon examples from
prior work in the CSCW community and the broader social science literature for
illustration. While a handful of prior researchers have used bots in online
experimentation, our work is meant to inspire future work in this area and
raise awareness of the associated ethical issues.
Junseok Park, Gwangmin Kim, Dongjin Jang, Sungji Choo, Sunghwa Bae, Doheon Lee
Comments: 5 pages, 4 figures
Subjects: Information Retrieval (cs.IR); Distributed, Parallel, and Cluster Computing (cs.DC)
Literature analysis is a key step in obtaining background information in
biomedical research. However, it is difficult for researchers to obtain
knowledge of their interests in an efficient manner because of the massive
amount of the published biomedical literature. Therefore, efficient and
systematic search strategies are required, which allow ready access to the
substantial amount of literature. In this paper, we propose a novel search
system, named Co-Occurrence based on Co-Operational Formation with Advanced
Method (COCOFAM), which is suitable for large-scale literature analysis.
COCOFAM is based on integrating both Spark for local clusters and a global job
scheduler to gather crowdsourced co-occurrence data on global clusters. It will
allow users to obtain information of their interests from the substantial
amount of literature.
João Vinagre, Alípio Mário Jorge, João Gama
Comments: Presented at STREAMEVOLV 2016, held in conjunction with ECML/PKDD 2016, Riva del Garda, Italy, September 23rd, 2016
Subjects: Information Retrieval (cs.IR)
Online recommender systems often deal with continuous, potentially fast and
unbounded flows of data. Ensemble methods for recommender systems have been
used in the past with batch algorithms; however, they have never been studied
with incremental algorithms that are capable of processing those data streams on
the fly. We propose online bagging, using an incremental matrix factorization
algorithm for positive-only data streams. Using prequential evaluation, we show
that bagging is able to improve accuracy more than 35% over the baseline with
small computational overhead.
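A hedged sketch of online bagging around an incremental recommender (the Poisson(1) re-weighting follows the standard online bagging recipe; the base-model interface is an assumption, not the paper's code):

    import random

    class OnlineBaggingRecommender:
        """Keep an ensemble of incremental models; every incoming (user, item)
        event is fed to each model k ~ Poisson(1) times, as in online bagging."""
        def __init__(self, base_model_factory, n_models=8):
            self.models = [base_model_factory() for _ in range(n_models)]

        def update(self, user, item):
            for model in self.models:
                for _ in range(self._poisson1()):
                    model.partial_fit(user, item)   # assumed incremental-MF interface

        def score(self, user, item):
            return sum(m.predict(user, item) for m in self.models) / len(self.models)

        @staticmethod
        def _poisson1():
            # Knuth's method for Poisson(lambda = 1), no external dependencies.
            k, p = 0, 1.0
            while True:
                p *= random.random()
                if p <= 0.36787944117144233:        # e^(-1)
                    return k
                k += 1
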
Elvyna Tunggawan, Yustinus Eko Soelistio
Comments: This is the non-final version of the paper. The final version is published in the IC3INA 2016 Conference (3-5 Oct. 2016, this http URL). All citation should be directed to the final version
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
This paper describes a Naive-Bayesian predictive model for the 2016 U.S.
Presidential Election based on Twitter data. We use 33,708 tweets gathered
from December 16, 2015 to February 29, 2016. We introduce a simpler data
preprocessing method to label the data and train the model. The model achieves
95.8% accuracy on 10-fold cross validation and predicts Ted Cruz and Bernie
Sanders as the Republican and Democratic nominees, respectively. It achieves
results comparable to those of competing methods.
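A minimal sketch of such a pipeline on labelled tweet text (toy data and labels only; not the paper's preprocessing or corpus):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import cross_val_score

    # Tiny illustrative corpus; the paper uses 33,708 labelled tweets.
    tweets = ["great rally tonight", "disappointing debate performance",
              "huge win in iowa", "campaign is falling apart",
              "strong support from voters", "losing ground in the polls"]
    labels = [1, 0, 1, 0, 1, 0]   # 1 = positive toward the candidate (assumed labels)

    model = make_pipeline(CountVectorizer(), MultinomialNB())
    print(cross_val_score(model, tweets, labels, cv=3).mean())
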
Oren Barkan, Noam Koenigstein, Eylon Yogev
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Learning (cs.LG)
In Recommender Systems research, algorithms are often characterized as either
Collaborative Filtering (CF) or Content Based (CB). CF algorithms are trained
using a dataset of user explicit or implicit preferences while CB algorithms
are typically based on item profiles. These approaches harness very different
data sources hence the resulting recommended items are generally also very
different. This paper presents a novel model that serves as a bridge from item
content to CF representations. We introduce a multiple-input deep
regression model to predict the CF latent embedding vectors of items based on
their textual description and metadata. We showcase the effectiveness of the
proposed model by predicting the CF vectors of movies and apps based on their
textual descriptions. Finally, we show that the model can be further improved
by incorporating metadata such as the movie release year and tags which
contribute to a higher accuracy.
Yuanzhi Ke, Masafumi Hagiwara
Comments: 10 pages, 1 figures, 5 tables. Under review as a conference paper at ICLR 2017
Subjects: Computation and Language (cs.CL)
We identify a pitfall that has not been carefully addressed in previous work
using lexicons or ontologies to train or improve distributed word
representations: for polysemous words and utterances whose meaning changes with
context, the paraphrases or related entities in a lexicon or an ontology are
unreliable and can deteriorate the learning of word representations. To address
this problem, we propose an approach that treats each paraphrase of a word in a
lexicon not as a full paraphrase, but as a fuzzy member (i.e., a fuzzy
paraphrase) of the paraphrase set, whose degree of truth (i.e., membership)
depends on the context. Then we propose an efficient
method to use the fuzzy paraphrases to learn word embeddings. We approximately
estimate the local membership of paraphrases, and train word embeddings using a
lexicon jointly by replacing the words in the contexts with their paraphrases
randomly, subject to the membership of each paraphrase. The experimental results
show that our method is efficient, overcomes the weakness of previous related
work in extracting semantic information, and outperforms previous approaches to
learning word representations with lexicons.
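A rough sketch of the replacement step (memberships here are fixed constants rather than estimated from context as in the paper, and the paraphrase sets are assumptions): during training, a context word is swapped for a paraphrase with probability given by its fuzzy membership.

    import random

    # Assumed fuzzy paraphrase sets: word -> {paraphrase: membership in [0, 1]}.
    fuzzy_paraphrases = {
        "bank": {"riverbank": 0.3, "lender": 0.6},
        "cold": {"chilly": 0.8, "unfriendly": 0.2},
    }

    def augment_context(tokens):
        """Randomly replace tokens with paraphrases, subject to their membership."""
        out = []
        for tok in tokens:
            for para, membership in fuzzy_paraphrases.get(tok, {}).items():
                if random.random() < membership:
                    out.append(para)
                    break
            else:
                out.append(tok)     # no replacement drawn; keep the original token
        return out

    print(augment_context("the water was cold near the bank".split()))
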
Sheng Zhang, Rachel Rudinger, Kevin Duh, Benjamin Van Durme
Subjects: Computation and Language (cs.CL)
Humans have the capacity to draw common-sense inferences from natural
language: various things that are likely but not certain to hold based on
established discourse, and are rarely stated explicitly. We propose an
evaluation of automated common-sense inference based on an extension of
recognizing textual entailment: predicting ordinal human responses of
subjective likelihood of an inference holding in a given context. We describe a
framework for extracting common-sense knowledge from corpora, which is then used
to construct a dataset for this ordinal entailment task; we use this dataset to
train and evaluate a sequence-to-sequence neural network model. Further, we
annotate subsets of previously established datasets via our ordinal annotation
protocol in order to then analyze the distinctions between these and what we
have constructed.
Abbas Khosravani, Cornelius Glackin, Nazim Dugan, Gérard Chollet, Nigel Cannings
Comments: 7 pages, 3 figures, NIST SRE 2016 Workshop
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Machine Learning (stat.ML)
This paper presents the Intelligent Voice (IV) system submitted to the NIST
2016 Speaker Recognition Evaluation (SRE). The primary emphasis of SRE this
year was on developing speaker recognition technology that is robust to novel
languages much more heterogeneous than those used in the current state of the
art, using significantly less training data and without meta-data from those
languages. The system is based on the state-of-the-art i-vector/PLDA framework,
developed under the fixed training condition, and the results are reported on
the protocol defined on the development set of the
challenge.
Chaozhuo Li, Yu Wu, Wei Wu, Chen Xing, Zhoujun Li, Ming Zhou
Subjects: Computation and Language (cs.CL)
While automatic response generation for building chatbot systems has drawn a
lot of attention recently, there is limited understanding of when we need to
consider the linguistic context of an input text in the generation process. The
task is challenging, as messages in a conversational environment are short and
informal, and evidence that can indicate a message is context dependent is
scarce. After a study of social conversation data crawled from the web, we
observed that some characteristics estimated from the responses of messages are
discriminative for identifying context dependent messages. With the
characteristics as weak supervision, we propose using a Long Short Term Memory
(LSTM) network to learn a classifier. Our method carries out text
representation and classifier learning in a unified framework. Experimental
results show that the proposed method can significantly outperform baseline
methods on accuracy of classification.
Ameya Prabhu, Aditya Joshi, Manish Shrivastava, Vasudeva Varma
Comments: Accepted paper at COLING 2016
Subjects: Computation and Language (cs.CL)
Sentiment analysis (SA) using code-mixed data from social media has several
applications in opinion mining ranging from customer satisfaction to social
campaign analysis in multilingual societies. Advances in this area are impeded
by the lack of a suitable annotated dataset. We introduce a Hindi-English
(Hi-En) code-mixed dataset for sentiment analysis and perform empirical
analysis comparing the suitability and performance of various state-of-the-art
SA methods in social media.
In this paper, we introduce learning sub-word level representations in LSTM
(Subword-LSTM) architecture instead of character-level or word-level
representations. This linguistic prior in our architecture enables us to learn
the information about the sentiment value of important morphemes. This also
works well on highly noisy text containing misspellings, as shown in our
experiments and demonstrated by the morpheme-level feature maps learned by our
model. We hypothesize that encoding this linguistic prior in the Subword-LSTM
architecture leads to its superior performance. Our system attains
accuracy 4-5% greater than traditional approaches on our dataset, and also
outperforms the available system for sentiment analysis in Hi-En code-mixed
text by 18%.
Bo Wang, Yingjun Sun, Yuan Wang
Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL)
Though current research often studies the properties of online social
relationships from an objective view, we also need to understand individuals’
subjective opinions on their interrelationships in social computing studies.
Inspired by the theories from sociolinguistics, the latest work indicates that
interactive language can reveal individuals’ asymmetric opinions on their
interrelationship. In this work, in order to explain the opinions’ asymmetry on
interrelationship with more latent factors, we extend the investigation from
single relationship to the structural context in online social network. We
analyze the correlation between interactive language features and the
structural context of interrelationships. The structural contexts of vertices,
edges, and triangles in the social network are considered. With statistical
analysis on the Enron email dataset, we find that individuals’ opinions (measured by
interactive language features) on their interrelationship are related to some
of their important structural context in social network. This result can help
us to understand and measure the individuals’ opinions on their
interrelationship with more intrinsic information.
Bo Wang, Yanshu Yu, Yuan Wang
Comments: arXiv admin note: text overlap with arXiv:1409.2450 by other authors
Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL)
Instead of studying the properties of social relationship from an objective
view, in this paper, we focus on individuals’ subjective and asymmetric
opinions on their interrelationships. Inspired by the theories from
sociolinguistics, we investigate two individuals’ opinions on their
interrelationship with their interactive language features. Eliminating the
difference of personal language style, we clarify that the asymmetry of
interactive language feature values can indicate individuals’ asymmetric
opinions on their interrelationship. We also discuss how the degree of
opinions’ asymmetry is related to the individuals’ personality traits.
Furthermore, to measure the individuals’ asymmetric opinions on
interrelationship concretely, we develop a novel model synthesizing interactive
language and social network features. The experimental results on the Enron
email dataset provide multiple pieces of evidence of asymmetric opinions on
interrelationship, and also verify the effectiveness of the proposed model in
measuring the degree of opinions’ asymmetry.
Hao Wang, Xingjian Shi, Dit-Yan Yeung
Comments: To appear at NIPS 2016
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Hybrid methods that utilize both content and rating information are commonly
used in many recommender systems. However, most of them use either handcrafted
features or the bag-of-words representation as a surrogate for the content
information but they are neither effective nor natural enough. To address this
problem, we develop a collaborative recurrent autoencoder (CRAE) which is a
denoising recurrent autoencoder (DRAE) that models the generation of content
sequences in the collaborative filtering (CF) setting. The model generalizes
recent advances in recurrent deep learning from i.i.d. input to non-i.i.d.
(CF-based) input and provides a new denoising scheme along with a novel
learnable pooling scheme for the recurrent autoencoder. To do this, we first
develop a hierarchical Bayesian model for the DRAE and then generalize it to
the CF setting. The synergy between denoising and CF enables CRAE to make
accurate recommendations while learning to fill in the blanks in sequences.
Experiments on real-world datasets from different domains (CiteULike and
Netflix) show that, by jointly modeling the order-aware generation of sequences
for the content information and performing CF for the ratings, CRAE is able to
significantly outperform the state of the art on both the recommendation task
based on ratings and the sequence generation task based on content information.
Hao Wang, Xingjian Shi, Dit-Yan Yeung
Comments: To appear at NIPS 2016
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Neural networks (NN) have achieved state-of-the-art performance in various
applications. Unfortunately in applications where training data is
insufficient, they are often prone to overfitting. One effective way to
alleviate this problem is to exploit the Bayesian approach by using Bayesian
neural networks (BNN). Another shortcoming of NN is the lack of flexibility to
customize different distributions for the weights and neurons according to the
data, as is often done in probabilistic graphical models. To address these
problems, we propose a class of probabilistic neural networks, dubbed
natural-parameter networks (NPN), as a novel and lightweight Bayesian treatment
of NN. NPN allows the usage of arbitrary exponential-family distributions to
model the weights and neurons. Different from traditional NN and BNN, NPN takes
distributions as input and goes through layers of transformation before
producing distributions to match the target output distributions. As a Bayesian
treatment, efficient backpropagation (BP) is performed to learn the natural
parameters for the distributions over both the weights and neurons. The output
distributions of each layer, as byproducts, may be used as second-order
representations for the associated tasks such as link prediction. Experiments
on real-world datasets show that NPN can achieve state-of-the-art performance.
Elvyna Tunggawan, Yustinus Eko Soelistio
Comments: This is the non-final version of the paper. The final version is published in the IC3INA 2016 Conference (3-5 Oct. 2016, this http URL). All citation should be directed to the final version
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
This paper describes a Naive-Bayesian predictive model for the 2016 U.S.
Presidential Election based on Twitter data. We use 33,708 tweets gathered
from December 16, 2015 to February 29, 2016. We introduce a simpler data
preprocessing method to label the data and train the model. The model achieves
95.8% accuracy on 10-fold cross validation and predicts Ted Cruz and Bernie
Sanders as the Republican and Democratic nominees, respectively. It achieves
results comparable to those of competing methods.
Oren Barkan, Noam Koenigstein, Eylon Yogev
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Learning (cs.LG)
In Recommender Systems research, algorithms are often characterized as either
Collaborative Filtering (CF) or Content Based (CB). CF algorithms are trained
using a dataset of user explicit or implicit preferences while CB algorithms
are typically based on item profiles. These approaches harness very different
data sources hence the resulting recommended items are generally also very
different. This paper presents a novel model that serves as a bridge from item
content to CF representations. We introduce a multiple-input deep
regression model to predict the CF latent embedding vectors of items based on
their textual description and metadata. We showcase the effectiveness of the
proposed model by predicting the CF vectors of movies and apps based on their
textual descriptions. Finally, we show that the model can be further improved
by incorporating metadata such as the movie release year and tags which
contribute to a higher accuracy.
Zahra Khatami, Sungpack Hong, Jinsu Lee, Siegfried Depner, Hassan Chafi
Comments: 8 pages, 12 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Sorting has been one of the most extensively studied problems across scientific
disciplines. Although many techniques and algorithms have been proposed for
efficient parallel sorting, achieving the desired performance on a variety of
architectures with large numbers of processors is still a challenging issue.
Maximizing the level of parallelism in an application requires minimizing the
overhead due to load imbalance and the waiting time due to memory latencies. In this
paper, we present a distributed sorting implemented in PGX.D, a fast
distributed graph processing system, which outperforms Spark distributed
sorting by around 2x-3x by hiding communication latencies and minimizing
unnecessary overheads. Furthermore, we show that the proposed PGX.D sorting
method handles duplicated data efficiently and maintains load balance across
different input data distributions.
Jalal Khamse-Ashari, Ioannis Lambadaris, George Kesidis, Bhuvan Urgaonkar, Yiqiang Zhao
Comments: 10 pages, 7 figures, technical report
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Users of cloud computing platforms pose different types of demands for
multiple resources on servers (physical or virtual machines). Besides
differences in their resource capacities, servers may be additionally
heterogeneous in their ability to service users – certain users’ tasks may only
be serviced by a subset of the servers. We identify important shortcomings in
existing multi-resource fair allocation mechanisms – Dominant Resource Fairness
(DRF) and its follow up work – when used in such environments. We develop a new
fair allocation mechanism called Per-Server Dominant-Share Fairness (PS-DSF)
which we show offers all desirable sharing properties that DRF is able to offer
in the case of a single “resource pool” (i.e., if the resources of all servers
were pooled together into one hypothetical server). We evaluate the performance
of PS-DSF through simulations. Our evaluation shows the enhanced efficiency of
PS-DSF compared to the existing allocation mechanisms. We argue that the
proposed allocation mechanism is applicable to cloud computing networks and
especially to large-scale data centers.
Alexander Jung, Alfred O. Hero III, Alexandru Mara, Sabeur Aridhi
Subjects: Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
We propose a scalable method for semi-supervised (transductive) learning from
massive network-structured datasets. Our approach to semi-supervised learning
is based on representing the underlying hypothesis as a graph signal with small
total variation. Requiring a small total variation of the graph signal
representing the underlying hypothesis corresponds to the central smoothness
assumption that forms the basis for semi-supervised learning, i.e., input
points forming clusters have similar output values or labels. We formulate the
learning problem as a nonsmooth convex optimization problem which we solve by
appealing to Nesterov's optimal first-order method for nonsmooth optimization.
We also provide a message passing formulation of the learning method which
allows for a highly scalable implementation in big data frameworks.
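For intuition only, a toy subgradient scheme on the total-variation objective (this is not the Nesterov-type method or the message-passing implementation described above): minimize the sum of w_ij |x_i - x_j| over edges while keeping labeled nodes clamped.

    import numpy as np

    def tv_subgradient_step(x, edges, labeled, step):
        """One subgradient step on sum_{(i,j)} w_ij * |x_i - x_j| with labels clamped."""
        g = np.zeros_like(x)
        for i, j, w in edges:
            s = np.sign(x[i] - x[j])
            g[i] += w * s
            g[j] -= w * s
        x = x - step * g
        for idx, val in labeled.items():    # clamp the known labels
            x[idx] = val
        return x

    # Toy chain graph: nodes 0..5, unit weights, endpoints labeled 0 and 1.
    edges = [(i, i + 1, 1.0) for i in range(5)]
    labeled = {0: 0.0, 5: 1.0}
    x = np.full(6, 0.5)
    for t in range(200):
        x = tv_subgradient_step(x, edges, labeled, step=0.05 / (1 + t))
    print(np.round(x, 2))
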
Diego Fabregat-Traver (1), Davor Davidović (2), Markus Höhnerbach (1), Edoardo Di Napoli (3 and 4) ((1) AICES, RWTH Aachen University, (2) RBI, Zagreb, Croatia, (3) Jülich Supercomputing Centre, (4) Jülich Aachen Research Alliance — High-performance Computing)
Subjects: Computational Engineering, Finance, and Science (cs.CE); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
In this paper we focus on the integration of high-performance numerical
libraries in ab initio codes and the portability of performance and
scalability. The target of our work is FLEUR, a software for electronic
structure calculations developed in the Forschungszentrum Jülich over the
course of two decades. The presented work follows up on a previous effort to
modernize legacy code by re-engineering and rewriting it in terms of highly
optimized libraries. We illustrate how this initial effort to get efficient and
portable shared-memory code enables fast porting of the code to emerging
heterogeneous architectures. More specifically, we port the code to nodes
equipped with multiple GPUs. We divide our study in two parts. First, we show
considerable speedups attained by minor and relatively straightforward code
changes to off-load parts of the computation to the GPUs. Then, we identify
further possible improvements to achieve even higher performance and
scalability. On a system consisting of 16 cores and 2 GPUs, we observe speedups
of up to 5x with respect to our optimized shared-memory code, which in turn
means between 7.5x and 12.5x speedup with respect to the original FLEUR code.
Junseok Park, Gwangmin Kim, Dongjin Jang, Sungji Choo, Sunghwa Bae, Doheon Lee
Comments: 5 pages, 4 figures
Subjects: Information Retrieval (cs.IR); Distributed, Parallel, and Cluster Computing (cs.DC)
Literature analysis is a key step in obtaining background information in
biomedical research. However, it is difficult for researchers to obtain
knowledge of their interests in an efficient manner because of the massive
amount of the published biomedical literature. Therefore, efficient and
systematic search strategies are required, which allow ready access to the
substantial amount of literature. In this paper, we propose a novel search
system, named Co-Occurrence based on Co-Operational Formation with Advanced
Method (COCOFAM), which is suitable for large-scale literature analysis.
COCOFAM is based on integrating both Spark for local clusters and a global job
scheduler to gather crowdsourced co-occurrence data on global clusters. It will
allow users to obtain information of their interests from the substantial
amount of literature.
Tomaso Poggio, Hrushikesh Mhaskar, Lorenzo Rosasco, Brando Miranda, Qianli Liao
Subjects: Learning (cs.LG)
The paper reviews an emerging body of theoretical results on deep learning
including the conditions under which it can be exponentially better than
shallow learning. Deep convolutional networks represent an important special
case of these conditions, though weight sharing is not the main reason for
their exponential advantage. Explanation of a few key theorems is provided
together with new results, open problems and conjectures.
Alexander Jung, Alfred O. Hero III, Alexandru Mara, Sabeur Aridhi
Subjects: Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
We propose a scalable method for semi-supervised (transductive) learning from
massive network-structured datasets. Our approach to semi-supervised learning
is based on representing the underlying hypothesis as a graph signal with small
total variation. Requiring a small total variation of the graph signal
representing the underlying hypothesis corresponds to the central smoothness
assumption that forms the basis for semi-supervised learning, i.e., input
points forming clusters have similar output values or labels. We formulate the
learning problem as a nonsmooth convex optimization problem which we solve by
appealing to Nesterov's optimal first-order method for nonsmooth optimization.
We also provide a message passing formulation of the learning method which
allows for a highly scalable implementation in big data frameworks.
Chris J. Maddison, Andriy Mnih, Yee Whye Teh
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
The reparameterization trick enables the optimization of large scale
stochastic computation graphs via gradient descent. The essence of the trick is
to refactor each stochastic node into a differentiable function of its
parameters and a random variable with fixed distribution. After refactoring,
the gradients of the loss propagated by the chain rule through the graph are
low variance unbiased estimators of the gradients of the expected loss. While
many continuous random variables have such reparameterizations, discrete random
variables lack continuous reparameterizations due to the discontinuous nature
of discrete states. In this work we introduce concrete random variables —
continuous relaxations of discrete random variables. The concrete distribution
is a new family of distributions with closed form densities and a simple
reparameterization. Whenever a discrete stochastic node of a computation graph
can be refactored into a one-hot bit representation that is treated
continuously, concrete stochastic nodes can be used with automatic
differentiation to produce low-variance biased gradients of objectives
(including objectives that depend on the log-likelihood of latent stochastic
nodes) on the corresponding discrete graph. We demonstrate their effectiveness
on density estimation and structured prediction tasks using neural networks.
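A minimal sketch of sampling a Concrete (Gumbel-Softmax) relaxation of a categorical variable (logits and temperatures are illustrative):

    import numpy as np

    def sample_concrete(logits, temperature, rng):
        """Relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
        gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
        y = (logits + gumbel) / temperature
        y = y - y.max()                  # numerical stability
        expy = np.exp(y)
        return expy / expy.sum()

    rng = np.random.default_rng(0)
    logits = np.log(np.array([0.1, 0.6, 0.3]))
    print(sample_concrete(logits, temperature=0.5, rng=rng))  # close to one-hot
    print(sample_concrete(logits, temperature=5.0, rng=rng))  # smoother relaxation
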
Gabriel Synnaeve, Nantas Nardelli, Alex Auvolat, Soumith Chintala, Timothée Lacroix, Zeming Lin, Florian Richoux, Nicolas Usunier
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
We present TorchCraft, an open-source library that enables deep learning
research on Real-Time Strategy (RTS) games such as StarCraft: Brood War, by
making it easier to control these games from a machine learning framework, here
Torch. This white paper argues for using RTS games as a benchmark for AI
research, and describes the design and components of TorchCraft.
Weixiang Shao, Lifang He, Chun-Ta Lu, Philip S. Yu
Subjects: Learning (cs.LG)
In the era of big data, it is common to have data with multiple modalities or
coming from multiple sources, known as “multi-view data”. Multi-view clustering
provides a natural way to generate clusters from such data. Since different
views share some consistency and complementary information, previous works on
multi-view clustering mainly focus on how to combine various numbers of views
to improve clustering performance. However, in reality, each view may be
incomplete, i.e., instances missing in the view. Furthermore, the size of data
could be extremely huge. It is unrealistic to apply multi-view clustering in
large real-world applications without considering the incompleteness of views
and the memory requirement. None of previous works have addressed all these
challenges simultaneously. In this paper, we propose an online multi-view
clustering algorithm, OMVC, which deals with large-scale incomplete views. We
model the multi-view clustering problem as a joint weighted nonnegative matrix
factorization problem and process the multi-view data chunk by chunk to reduce
the memory requirement. OMVC learns the latent feature matrices for all the
views and pushes them towards a consensus. We further increase the robustness
of the learned latent feature matrices in OMVC via lasso regularization. To
minimize the influence of incompleteness, dynamic weight setting is introduced
to give lower weights to the incoming missing instances in different views.
More importantly, to reduce the computational time, we incorporate a faster
projected gradient descent by utilizing the Hessian matrices in OMVC. Extensive
experiments conducted on four real datasets demonstrate the effectiveness of the
proposed OMVC method.
Hao Wang, Xingjian Shi, Dit-Yan Yeung
Comments: To appear at NIPS 2016
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Hybrid methods that utilize both content and rating information are commonly
used in many recommender systems. However, most of them use either handcrafted
features or the bag-of-words representation as a surrogate for the content
information but they are neither effective nor natural enough. To address this
problem, we develop a collaborative recurrent autoencoder (CRAE) which is a
denoising recurrent autoencoder (DRAE) that models the generation of content
sequences in the collaborative filtering (CF) setting. The model generalizes
recent advances in recurrent deep learning from i.i.d. input to non-i.i.d.
(CF-based) input and provides a new denoising scheme along with a novel
learnable pooling scheme for the recurrent autoencoder. To do this, we first
develop a hierarchical Bayesian model for the DRAE and then generalize it to
the CF setting. The synergy between denoising and CF enables CRAE to make
accurate recommendations while learning to fill in the blanks in sequences.
Experiments on real-world datasets from different domains (CiteULike and
Netflix) show that, by jointly modeling the order-aware generation of sequences
for the content information and performing CF for the ratings, CRAE is able to
significantly outperform the state of the art on both the recommendation task
based on ratings and the sequence generation task based on content information.
Hao Wang, Xingjian Shi, Dit-Yan Yeung
Comments: To appear at NIPS 2016
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Neural networks (NN) have achieved state-of-the-art performance in various
applications. Unfortunately in applications where training data is
insufficient, they are often prone to overfitting. One effective way to
alleviate this problem is to exploit the Bayesian approach by using Bayesian
neural networks (BNN). Another shortcoming of NN is the lack of flexibility to
customize different distributions for the weights and neurons according to the
data, as is often done in probabilistic graphical models. To address these
problems, we propose a class of probabilistic neural networks, dubbed
natural-parameter networks (NPN), as a novel and lightweight Bayesian treatment
of NN. NPN allows the usage of arbitrary exponential-family distributions to
model the weights and neurons. Different from traditional NN and BNN, NPN takes
distributions as input and goes through layers of transformation before
producing distributions to match the target output distributions. As a Bayesian
treatment, efficient backpropagation (BP) is performed to learn the natural
parameters for the distributions over both the weights and neurons. The output
distributions of each layer, as byproducts, may be used as second-order
representations for the associated tasks such as link prediction. Experiments
on real-world datasets show that NPN can achieve state-of-the-art performance.
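For intuition, the sketch below shows a single Gaussian-case layer that maps an
input distribution (mean and variance) to an output distribution, with Gaussian
distributions over the weights propagated by moment matching; this is our
simplified illustration of the distributions-in, distributions-out idea, not
the exact NPN parameterization.

    import torch
    import torch.nn as nn

    class GaussianNPNLinear(nn.Module):
        """Gaussian-case sketch: propagate (mean, variance) through a linear layer
        whose weights are themselves Gaussian (moment matching)."""
        def __init__(self, d_in, d_out):
            super().__init__()
            self.w_mean = nn.Parameter(0.1 * torch.randn(d_in, d_out))
            self.w_logvar = nn.Parameter(-6.0 * torch.ones(d_in, d_out))
            self.b_mean = nn.Parameter(torch.zeros(d_out))
            self.b_logvar = nn.Parameter(-6.0 * torch.ones(d_out))

        def forward(self, x_mean, x_var):
            w_var = self.w_logvar.exp()
            out_mean = x_mean @ self.w_mean + self.b_mean
            # variance of a sum of independent products of Gaussian variables
            out_var = (x_var @ w_var
                       + x_var @ self.w_mean.pow(2)
                       + x_mean.pow(2) @ w_var
                       + self.b_logvar.exp())
            return out_mean, out_var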
Ananda Theertha Suresh, Felix X. Yu, H. Brendan McMahan, Sanjiv Kumar
Subjects: Learning (cs.LG)
Motivated by the need for distributed optimization algorithms with low
communication cost, we study communication efficient algorithms to perform
distributed mean estimation. We study scenarios in which each client sends one
bit per dimension. We first show that for \(d\)-dimensional data with \(n\)
clients, a naive stochastic rounding approach yields a mean squared error of
\(\Theta(d/n)\). We then show that by applying a structured random rotation of
the data (an \(\mathcal{O}(d \log d)\) algorithm), the error can be reduced to
\(\mathcal{O}((\log d)/n)\). The algorithms and the analysis make no
distributional assumptions on the data.
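A minimal sketch of the one-bit baseline, assuming per-client stochastic
rounding between each vector's minimum and maximum (function names and toy
sizes are ours):

    import numpy as np

    rng = np.random.default_rng(0)

    def encode_one_bit(x):
        """Stochastic 1-bit-per-dimension quantization of one client vector."""
        lo, hi = x.min(), x.max()
        p = (x - lo) / (hi - lo + 1e-12)       # probability of rounding up
        return rng.random(x.shape) < p, lo, hi

    def decode(bits, lo, hi):
        return np.where(bits, hi, lo)          # unbiased: E[decode] equals x

    n, d = 64, 1000                            # clients, dimensions
    clients = rng.normal(size=(n, d))
    estimate = np.mean([decode(*encode_one_bit(x)) for x in clients], axis=0)
    print(np.mean((estimate - clients.mean(axis=0)) ** 2))

The per-coordinate error grows with each client's dynamic range; the structured
random rotation (e.g., a randomized Hadamard transform) flattens that range
before quantization, which is what drives the reduction from \(\Theta(d/n)\) to
\(\mathcal{O}((\log d)/n)\).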
Kshiteej Sheth
Comments: 9 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
We propose novel methods of solving two tasks using Convolutional Neural
Networks: first, generating an HDR map of a static scene from differently
exposed LDR images captured with conventional cameras, and second, finding an
optimal tone mapping operator that achieves a better score on the TMQI metric
than existing methods. We quantitatively evaluate the performance of our
networks and illustrate the cases where they perform well as well as those
where they fail.
Oren Barkan, Noam Koenigstein, Eylon Yogev
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Learning (cs.LG)
In Recommender Systems research, algorithms are often characterized as either
Collaborative Filtering (CF) or Content Based (CB). CF algorithms are trained
using a dataset of user explicit or implicit preferences while CB algorithms
are typically based on item profiles. These approaches harness very different
data sources hence the resulting recommended items are generally also very
different. This paper presents a novel model that serves as a bridge from
items' content to their CF representations. We introduce a multiple-input deep
regression model to predict the CF latent embedding vectors of items based on
their textual description and metadata. We showcase the effectiveness of the
proposed model by predicting the CF vectors of movies and apps based on their
textual descriptions. Finally, we show that the model can be further improved
by incorporating metadata such as the movie release year and tags, which
contribute to higher accuracy.
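A minimal sketch of such a multiple-input regressor, assuming a bag-of-words
text vector and a dense metadata vector as inputs and pre-trained CF item
embeddings as the regression targets (layer sizes and names are illustrative):

    import torch
    import torch.nn as nn

    class Content2CF(nn.Module):
        """Regress an item's CF embedding from its textual and metadata features."""
        def __init__(self, text_dim, meta_dim, cf_dim=64):
            super().__init__()
            self.text_branch = nn.Sequential(nn.Linear(text_dim, 256), nn.ReLU())
            self.meta_branch = nn.Sequential(nn.Linear(meta_dim, 32), nn.ReLU())
            self.head = nn.Linear(256 + 32, cf_dim)

        def forward(self, text_feats, meta_feats):
            h = torch.cat([self.text_branch(text_feats),
                           self.meta_branch(meta_feats)], dim=-1)
            return self.head(h)                 # predicted CF latent vector

    # train with nn.MSELoss() against item vectors learned from the rating matrix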
Rebecca Fiebrink, Baptiste Caramiaux
Comments: Pre-print to appear in the Oxford Handbook on Algorithmic Music. Oxford University Press
Subjects: Human-Computer Interaction (cs.HC); Learning (cs.LG)
Machine learning is the capacity of a computational system to learn
structures from datasets in order to make predictions on newly seen data. Such
an approach offers a significant advantage in music scenarios in which
musicians can teach the system to learn an idiosyncratic style, or can break
the rules to explore the system’s capacity in unexpected ways. In this chapter
we draw on music, machine learning, and human-computer interaction to elucidate
an understanding of machine learning algorithms as creative tools for music and
the sonic arts. We motivate a new understanding of learning algorithms as
human-computer interfaces. We show that, like other interfaces, learning
algorithms can be characterised by the ways their affordances intersect with
goals of human users. We also argue that the nature of interaction between
users and algorithms impacts the usability and usefulness of those algorithms
in profound ways. This human-centred view of machine learning motivates our
concluding discussion of what it means to employ machine learning as a creative
tool.
Jonathan Binas, Daniel Neil, Giacomo Indiveri, Shih-Chii Liu, Michael Pfeiffer
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)
There is an urgent need for compact, fast, and power-efficient hardware
implementations of state-of-the-art artificial intelligence. Here we propose a
power-efficient approach for real-time inference, in which deep neural networks
(DNNs) are implemented through low-power analog circuits. Although analog
implementations can be extremely compact, they have been largely supplanted by
digital designs, partly because of device mismatch effects due to fabrication.
We propose a framework that exploits the power of Deep Learning to compensate
for this mismatch by incorporating the measured variations of the devices as
constraints in the DNN training process. This eliminates the use of mismatch
minimization strategies such as the use of very large transistors, and allows
circuit complexity and power consumption to be reduced to a minimum. Our
results, based on large-scale simulations as well as a prototype VLSI chip
implementation, indicate at least a 3-fold improvement in processing efficiency
over current digital implementations.
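The idea of folding measured variations into training can be sketched as
follows: each unit's measured gain enters the forward pass as a fixed,
non-trainable factor, so backpropagation learns weights that compensate for it.
The module name and the simple multiplicative-gain mismatch model are our
illustrative assumptions, not the authors' circuit model.

    import torch
    import torch.nn as nn

    class MismatchAwareLinear(nn.Module):
        """Linear layer whose output is scaled by fixed, measured per-unit gains."""
        def __init__(self, d_in, d_out, measured_gain):
            super().__init__()
            self.linear = nn.Linear(d_in, d_out)
            # measured_gain: (d_out,) values from chip characterization, not trained
            self.register_buffer("gain", measured_gain)

        def forward(self, x):
            return self.gain * self.linear(x)

    gains = torch.exp(0.1 * torch.randn(128))   # e.g. ~10% lognormal gain spread
    layer = MismatchAwareLinear(64, 128, gains)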
Dengkui Zhu, Boyu Li, Ping Liang
Comments: accepted to journal
Subjects: Information Theory (cs.IT)
Hybrid beamforming (HB) has been widely studied for reducing the number of
costly radio frequency (RF) chains in massive multiple-input multiple-output
(MIMO) systems. However, previous works on HB are limited to a single user
equipment (UE) or a single group of UEs, employing the frequency-flat
first-level analog beamforming (AB) that cannot be applied to multiple groups
of UEs served in different frequency resources in an orthogonal
frequency-division multiplexing (OFDM) system. In this paper, a novel HB
algorithm with unified AB based on the spatial covariance matrix (SCM)
knowledge of all UEs is proposed for a massive MIMO-OFDM system in order to
support multiple groups of UEs. The proposed HB method with a much smaller
number of RF chains can achieve more than 95% of the performance of full
digital beamforming. In addition, a novel practical subspace construction (SC)
algorithm based on partial channel state information is proposed to estimate
the required SCM. The proposed SC method can offer more than 97% of the
performance of the perfect-SCM case. With the proposed methods, significant
cost and power
savings can be achieved without large loss in performance. Furthermore, the
proposed methods can be applied to massive MIMO-OFDM systems in both
time-division duplex and frequency-division duplex modes.
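As a rough sketch of SCM-based analog beamforming, one common heuristic (not
necessarily the paper's exact algorithm) builds the frequency-flat analog stage
from the dominant eigenvectors of the summed spatial covariance matrices and
leaves per-subcarrier digital precoding to the reduced RF-chain space.

    import numpy as np

    def analog_beamformer_from_scm(scm_list, n_rf):
        """Frequency-flat analog beamformer from the dominant subspace of the
        summed per-group spatial covariance matrices (illustrative heuristic)."""
        R_sum = sum(scm_list)                    # (M, M), M = number of BS antennas
        _, eigvec = np.linalg.eigh(R_sum)        # eigenvalues in ascending order
        F_a = eigvec[:, -n_rf:]                  # M x n_rf strongest subspace
        # constant-modulus projection for phase-shifter-only implementations
        return np.exp(1j * np.angle(F_a)) / np.sqrt(F_a.shape[0])

    # digital precoding then operates per subcarrier on the reduced effective
    # channel H @ F_a, e.g. with zero-forcing.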
C.F. Lopez, C.-X. Wang, R. Feng
Comments: 6 pages, 5 figures, conference
Journal-ref: IEEE International Workshop on Computer Aided Modelling and Design
of Communication Links and Networks (CAMAD), Toronto, Canada, Oct. 2016
Subjects: Information Theory (cs.IT)
In this paper, a novel two-dimensional (2D) non-stationary wideband
geometry-based stochastic model (GBSM) for massive multiple-input
multiple-output (MIMO) communication systems is proposed. Key characteristics
of massive MIMO channels such as near field effects and cluster evolution along
the array are addressed in this model. Near field effects are modelled by a
second-order approximation to spherical wavefronts, i.e., parabolic wavefronts,
leading to linear drifts of the angles of multipath components (MPCs) and
non-stationarity along the array. Cluster evolution along the array involving
cluster (dis)appearance and smooth average power variations is considered.
Cluster (dis)appearance is modelled by a two-state Markov process and smooth
average power variations are modelled by a spatial lognormal process.
Statistical properties of the channel model such as time autocorrelation
function (ACF), spatial cross-correlation function (CCF), and cluster average
power and Rician factor variations over the array are derived. Finally,
simulation results are presented and analyzed, demonstrating that parabolic
wavefronts and cluster soft evolution are good candidates to model important
massive MIMO channel characteristics.
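A toy sketch of the cluster-evolution ingredients, assuming a two-state Markov
chain over the antenna index for cluster (dis)appearance and a first-order
autoregressive process in dB for the smooth lognormal power drift (parameter
values are illustrative, not fitted):

    import numpy as np

    rng = np.random.default_rng(1)

    def cluster_evolution_along_array(n_ant, p_appear=0.05, p_disappear=0.05,
                                      sigma_db=3.0, rho=0.95):
        visible = np.zeros(n_ant, dtype=bool)
        power_db = np.zeros(n_ant)
        visible[0] = True
        for m in range(1, n_ant):
            # two-state Markov chain: cluster survives, disappears, or reappears
            if visible[m - 1]:
                visible[m] = rng.random() > p_disappear
            else:
                visible[m] = rng.random() < p_appear
            # smooth average power variation: AR(1) in dB (spatial lognormal)
            power_db[m] = rho * power_db[m - 1] + np.sqrt(1 - rho**2) * sigma_db * rng.normal()
        return visible, 10 ** (power_db / 10) * visible

    vis, power = cluster_evolution_along_array(n_ant=128)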
Huimei Han, Xudong Guo, Ying Li
Comments: 5 pages,6 figures
Subjects: Information Theory (cs.IT)
A new scheme to resolve the intra-cell pilot collision for M2M communication
in crowded massive multiple-input multiple-output (MIMO) systems is proposed.
The proposed scheme permits the user equipments (UEs) judged to have failed by
the strongest-user collision resolution (SUCR) protocol to contend for the idle
pilots, i.e., the pilots that are not selected by any UE in the initial step.
This scheme is called SUCR combined idle pilots access (SUCR-IPA). To
analyze the performance of the SUCR-IPA scheme, we develop a simple method to
compute the access success probability of the UEs in each random access slot
(RAST). The simulation results coincide well with the analysis. It is also
shown that, compared to the SUCR protocol, the proposed SUCR-IPA scheme
increases the throughput of the system significantly, and thus decreases the
number of access attempts dramatically.
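A toy Monte Carlo of the idle-pilot re-access idea is sketched below; it
replaces the channel-strength-based SUCR decision with a random winner per
collided pilot, so it only illustrates the mechanism, not the paper's exact
protocol or its throughput figures.

    import numpy as np

    rng = np.random.default_rng(2)

    def access_success_prob(n_ue, n_pilots, trials=10000):
        successes = 0
        for _ in range(trials):
            picks = rng.integers(0, n_pilots, size=n_ue)
            counts = np.bincount(picks, minlength=n_pilots)
            successes += np.sum(counts == 1)          # collision-free UEs succeed
            successes += np.sum(counts > 1)           # one winner per collided pilot
            losers = n_ue - np.sum(counts == 1) - np.sum(counts > 1)
            idle = np.sum(counts == 0)                # pilots nobody picked
            if losers > 0 and idle > 0:               # losers re-contend on idle pilots
                re_picks = rng.integers(0, idle, size=losers)
                successes += np.sum(np.bincount(re_picks, minlength=idle) == 1)
        return successes / (trials * n_ue)

    print(access_success_prob(n_ue=30, n_pilots=20))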
Adam Noel, Andrew W. Eckford
Comments: 6 pages, 1 table, 4 figures. Submitted to IEEE ICC 2017
Subjects: Information Theory (cs.IT)
Molecular communication requires low-complexity symbol detection algorithms
to deal with the many sources of uncertainty that are inherent in these
channels. This paper proposes two variants of a high-performance asynchronous
peak detection algorithm for a receiver that makes independent observations.
The first variant has low complexity and measures the largest observation
within a sampling interval. The second variant adds decision feedback to
mitigate inter-symbol interference. Although the algorithm does not require
synchronization between the transmitter and receiver, results demonstrate that
the bit error performance of symbol-by-symbol detection using the first variant
is better than using a single sample whose sampling time is chosen a priori.
The second variant is shown to have performance comparable to that of an energy
detector. Both variants of the algorithm demonstrate better resilience to
timing offsets than existing detectors.
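A minimal sketch of the peak-detection idea, assuming a threshold decision on
the largest observation within each symbol interval and a crude
decision-feedback correction (the threshold and the feedback model are
illustrative, not the authors' detector):

    import numpy as np

    def detect_symbols(observations, interval_len, threshold, feedback_gain=0.0):
        """Asynchronous peak detection over consecutive sampling intervals."""
        bits, isi_estimate = [], 0.0
        n_symbols = len(observations) // interval_len
        for k in range(n_symbols):
            window = observations[k * interval_len:(k + 1) * interval_len]
            peak = np.max(window) - feedback_gain * isi_estimate
            bit = int(peak > threshold)               # variant 1: feedback_gain = 0
            bits.append(bit)
            isi_estimate = peak if bit else 0.0       # variant 2: decision feedback
        return bits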
Zheda Li, Nadisanka Rupasinghe, Ozgun Y. Bursalioglu, Chenwei Wang, Haralabos Papadopoulos, Giuseppe Caire
Comments: 12 pages, 7 figures
Subjects: Information Theory (cs.IT)
We consider a single-cell scenario involving a single base station (BS) with
a massive array serving multi-antenna terminals in the downlink of a mmWave
channel. We present a class of multiuser MIMO schemes, which rely on
uplink training from the user terminals, and on uplink/downlink channel
reciprocity. The BS employs virtual sector-based processing, according to which
user-channel estimation and data transmission are performed in parallel over
non-overlapping angular sectors. The uplink training schemes we consider are
non-orthogonal, that is, we allow multiple users to transmit pilots on the same
pilot dimension (thereby potentially interfering with one another). Elementary
processing allows each sector to determine the subset of user channels that can
be resolved on the sector (effectively pilot contamination free) and, thus, the
subset of users that can be served by the sector. This allows resolving
multiple users on the same pilot dimension at different sectors, thereby
increasing the overall multiplexing gains of the system. Our analysis and
simulations reveal that, by using appropriately designed directional training
beams at the user terminals, the sector-based transmission schemes we present
can yield substantial spatial multiplexing and ergodic user-rates improvements
with respect to their orthogonal-training counterparts.
Eleonesio Strey, Sueli I. R. Costa
Comments: 15 pages
Subjects: Information Theory (cs.IT)
Lattices have been used in several problems in coding theory and
cryptography. In this paper we study \(q\)-ary lattices obtained via
Constructions D, \(D'\) and \(\overline{D}\). Connections between Constructions
D and \(D'\) are shown. Bounds for the minimum \(l_1\)-distance of the lattices
\(\Lambda_{D}\), \(\Lambda_{D'}\) and \(\Lambda_{\overline{D}}\) and, under
certain conditions, a generator matrix for \(\Lambda_{D'}\) are presented. In
addition, when the chain of codes used is closed under the zero-one addition,
we derive explicit expressions for the minimum \(l_1\)-distances of the
lattices \(\Lambda_{D}\) and \(\Lambda_{\overline{D}}\) in terms of the minimum
distances of the codes used in these constructions.
Yusuke Kawamoto, Konstantinos Chatzikokolakis, Catuscia Palamidessi
Comments: 30 pages
Subjects: Logic in Computer Science (cs.LO); Information Theory (cs.IT)
In the min-entropy approach to quantitative information flow, the leakage is
defined in terms of a minimization problem, which, in the case of large
systems, can be computationally rather heavy. The same happens for the recently
proposed generalization called g-vulnerability. In this paper we study the case
in which the channel associated to the system can be decomposed into simpler
channels, which typically happens when the observables consist of multiple
components. Our main contribution is the derivation of bounds on the g-leakage
of the whole system in terms of the g-leakages of its components. We also
consider the particular cases of min-entropy leakage and of parallel channels,
generalizing and systematizing results from the literature. We demonstrate the
effectiveness of our method and evaluate the precision of our bounds using
examples.
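For reference, the standard definitions from this literature (our recap, not
the paper's new material): with prior \(\pi\) and channel \(C\), the prior and
posterior vulnerabilities, the min-entropy leakage, and their g-generalizations
are
\[
V(\pi) = \max_{x} \pi(x), \qquad
V(\pi, C) = \sum_{y} \max_{x} \pi(x)\, C(y \mid x), \qquad
\mathcal{L}(\pi, C) = \log \frac{V(\pi, C)}{V(\pi)},
\]
\[
V_g(\pi) = \max_{w} \sum_{x} \pi(x)\, g(w, x), \qquad
V_g(\pi, C) = \sum_{y} \max_{w} \sum_{x} \pi(x)\, C(y \mid x)\, g(w, x),
\]
with the g-leakage defined as the corresponding log-ratio; the bounds in the
paper control the g-leakage of the composed channel in terms of the g-leakages
of its components.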