Distributed vector representation is a family of techniques which take a domain (usually words) and embed it into a linear space, representing each element as a large vector of numbers. Useful tasks can then be expressed as manipulations of these embedded representations. The embedding can be created in a variety of ways; often, it is learned by optimizing task performance. SENNA demonstrated that representations learned for one task are often useful for others.
So many interesting advances are being made in distributed vector
representations that a nice toolset seems to be emerging, one which will
soon be considered a basic part of machine intelligence.
Google's
word2vec assigns distributed vector representations to individual words
and a few short phrases. These representations have been shown to give intuitively reasonable results on analogy tasks with simple vector math: king - man + woman is approximately equal to the vector for queen, for example. This holds despite the vectors not being explicitly optimized for that task, again showing that these representations tend to be useful for a wide range of tasks.
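To make the analogy trick concrete, here is a minimal numpy sketch. The tiny embedding dictionary and its values are made up for illustration; in practice the vectors would come from a trained model like word2vec.

import numpy as np

# Toy embeddings standing in for trained word2vec vectors.
embeddings = {
    "king":  np.array([0.8, 0.6, 0.1]),
    "man":   np.array([0.7, 0.1, 0.0]),
    "woman": np.array([0.7, 0.1, 0.9]),
    "queen": np.array([0.8, 0.6, 0.9]),
    "apple": np.array([0.1, 0.9, 0.2]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# king - man + woman should land near queen.
target = embeddings["king"] - embeddings["man"] + embeddings["woman"]
best = max(
    (w for w in embeddings if w not in {"king", "man", "woman"}),
    key=lambda w: cosine(target, embeddings[w]),
)
print(best)  # "queen" with these toy vectors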
Similar approaches have aided
machine translation tasks by turning word translation into a linear
transform from one vector space to another.
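A rough sketch of what that linear transform looks like: given vectors for source-language words and vectors for their known translations, fit a matrix W by least squares and use it to map new words into the target space. The data below is randomly generated stand-in, not real embeddings, and the nearest-neighbour lookup on the target side is omitted.

import numpy as np

rng = np.random.default_rng(0)
d_src, d_tgt, n_pairs = 50, 40, 500

# X holds source-language word vectors, Y the vectors of their known translations.
X = rng.normal(size=(n_pairs, d_src))
W_true = rng.normal(size=(d_src, d_tgt))
Y = X @ W_true + 0.01 * rng.normal(size=(n_pairs, d_tgt))

# Fit W minimizing ||XW - Y||^2 by least squares.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# To translate a new word: map its vector into the target space, then
# look up the nearest target-language word vector (lookup not shown).
x_new = rng.normal(size=(d_src,))
y_pred = x_new @ W
print(y_pred.shape)  # (40,) vector in the target-language space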
One limitation of these word-level representations is that we cannot do much to represent sentences.
Sequences of words can be given somewhat useful representations by
adding together the individual word vectors, but addition discards ordering and only goes so far.
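For illustration, additive composition is just a sum of word vectors, which is why it cannot distinguish different orderings of the same words; the embeddings below are toy values.

import numpy as np

# Toy embeddings; additive composition sums them into a phrase vector.
embeddings = {
    "dog":   np.array([1.0, 0.0, 0.2]),
    "bites": np.array([0.1, 1.0, 0.0]),
    "man":   np.array([0.0, 0.3, 1.0]),
}

def phrase_vector(words):
    return np.sum([embeddings[w] for w in words], axis=0)

# Word order is lost: both sentences map to the same vector.
a = phrase_vector(["dog", "bites", "man"])
b = phrase_vector(["man", "bites", "dog"])
print(np.allclose(a, b))  # True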
Socher's recursive neural network (RNN) learns a
matrix transform which composes two elements into one and assigns the result a score;
the scores are then used for greedy parsing, repeatedly composing the
highest-scoring adjacent pair, with great success. This gives us useful vector
representations for phrases and sentences.
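Here is a rough sketch of that composition-and-scoring loop, with random parameters standing in for trained ones; it is meant to show the shape of the computation, not Socher's actual model.

import numpy as np

d = 10
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(d, 2 * d))  # composition matrix
b = np.zeros(d)
w_score = rng.normal(scale=0.1, size=d)     # scoring vector

def compose(left, right):
    # Concatenate the two children, transform, and score the parent.
    parent = np.tanh(W @ np.concatenate([left, right]) + b)
    return parent, w_score @ parent

def greedy_parse(vectors):
    nodes = list(vectors)
    while len(nodes) > 1:
        # Score every adjacent pair and merge the highest-scoring one.
        candidates = [compose(nodes[i], nodes[i + 1]) for i in range(len(nodes) - 1)]
        best = max(range(len(candidates)), key=lambda i: candidates[i][1])
        nodes[best:best + 2] = [candidates[best][0]]
    return nodes[0]  # vector for the whole phrase/sentence

sentence = [rng.normal(size=d) for _ in range(5)]
print(greedy_parse(sentence).shape)  # (10,)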
Another approach which
has been suggested is circular convolution. This combines vectors in a
way which captures ordering information, unlike addition or
multiplication. Impressively, the technique has been used to solve Raven's
Progressive Matrices problems:
http://eblerim.net/?page_id=2383
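Circular convolution itself is easy to write down with the FFT. The sketch below just shows the binding operation (not the Raven's matrices system): it binds two random vectors and approximately recovers one from the other with circular correlation.

import numpy as np

def circular_convolve(a, b):
    # Circular convolution binds two vectors into one of the same size.
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def circular_correlate(a, c):
    # Approximate inverse: recovers b from c = a (*) b.
    return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(c)))

d = 256
rng = np.random.default_rng(0)
a, b = rng.normal(scale=1 / np.sqrt(d), size=(2, d))

bound = circular_convolve(a, b)
b_hat = circular_correlate(a, bound)
print(np.corrcoef(b, b_hat)[0, 1])  # well above chance: b is approximately recovered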
Then
there's a project, COMPOSES, which seeks to create a language
representation in which nouns get vector representations and other parts
of speech get matrix representations (and possibly tensor
representations?).
http://clic.cimec.unitn.it/composes/
I
haven't looked into the details fully, but conceptually it makes sense:
the parts of speech which intuitively act as modifiers become linear functions, while the parts of speech which are intuitively static
objects are operated on by those functions.
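As a toy illustration of that idea (not the actual COMPOSES model or data): a noun is a vector, an adjective is a matrix, and modification is matrix-vector multiplication.

import numpy as np

d = 50
rng = np.random.default_rng(0)

# Random placeholders for learned representations.
noun_vectors = {"dog": rng.normal(size=d), "house": rng.normal(size=d)}
adjective_matrices = {"old": rng.normal(scale=0.1, size=(d, d))}

def apply_modifier(adjective, noun):
    # The adjective acts as a linear function on the noun vector.
    return adjective_matrices[adjective] @ noun_vectors[noun]

old_dog = apply_modifier("old", "dog")      # a new vector in the same space
old_house = apply_modifier("old", "house")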
The following paper gives a related approach:
http://www.cs.utoronto.ca/~ilya/pubs/2008/mre.pdf
Here,
everything is represented as a matrix of the same size. Representing
the objects as functions is somewhat limiting, but the uniform
representation makes it easy to jump to higher-level functions
(modifiers on modifiers) without adding any new machinery. This seems to have the
potential to enable a surprisingly wide range of reasoning
capabilities, given the narrow representation.
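A toy version of the uniform-representation point, with random matrices standing in for learned ones: because everything is a same-sized matrix, a modifier of a modifier is composed in exactly the same way as a modifier of an object.

import numpy as np

d = 20
rng = np.random.default_rng(0)

dog = rng.normal(scale=0.1, size=(d, d))   # an "object"
old = rng.normal(scale=0.1, size=(d, d))   # a modifier
very = rng.normal(scale=0.1, size=(d, d))  # a modifier of modifiers

old_dog = old @ dog            # modifier applied to object
very_old = very @ old          # modifier applied to a modifier...
very_old_dog = very_old @ dog  # ...and the result applies like any modifier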
As the authors of
that last paper mention, the approach can only support reasoning of a
"memorized" sort. There is no mechanism which would allow chained
logical inferences to answer questions. This seems like a good characterization of the general limitations of the broader set of techniques. The distributed representation of a word, phrase, image, or other object is a static encoding which represents, in some sense, a classification of the object into a fuzzy categorization system we've learned. How can we push the boundary here, allowing for more complex reasoning? Can these vector representations be integrated into a more generally capable probabilistic logic system?