Transformations: Modular tensor computation graphs in Julia

November 14, 2016

In this post I’ll try to summarize my design goals for the Transformations package in the JuliaML ecosystem. Transformations should be seen as a modular, higher-level approach to building complex tensor computation graphs, similar to those you might build in TensorFlow or Theano. The main reason for building this package from the ground up is the flexibility a pure-Julia implementation offers for new research directions. If you want to apply convolutional neural nets to identify cats in pictures, this is not the package for you. My focus is on complex, real-time, and incremental algorithms for learning from temporal data. I want the ability to track learning progress in real time, and to build workflows and algorithms that don’t require a GPU server farm.

What is a tensor computation graph?

A computation graph, or data flow graph, is a representation of math equations using a directed graph of nodes and edges. Here’s a simple example using Mike Innes’ cool package DataFlow. I’ve built a recipe for converting DataFlow graphs to PlotRecipes graphplot calls. See the full notebook for complete Julia code.

We’ll compute $f(x) = w * x + b$:

g = @flow f(x) = w * x + b

The computation graph is a graphical representation of the flow of mathematical calculations needed to compute a function. Follow the arrows and apply the operations to the inputs: first we multiply w and x together, then we add b to the result. The result of the addition is the output of the function f.
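To make "follow the arrows" concrete, here is a toy computation graph in plain Julia. The `Leaf` and `Op` types and the `evaluate` function are hypothetical names used only for illustration; they are not DataFlow's internal representation. Evaluating a node means evaluating its incoming edges first, then applying the node's own operation.

```julia
# A toy computation graph. Each node is either a leaf (an input or parameter)
# or an operation applied to the outputs of other nodes (its incoming edges).
# Hypothetical types for illustration; this is not DataFlow's API.
abstract type Node end

struct Leaf <: Node
    name::Symbol          # looked up in the environment at evaluation time
end

struct Op <: Node
    f::Function           # the operation at this node, e.g. * or +
    inputs::Vector{Node}  # incoming edges
end

# Follow the arrows: evaluate every input first, then apply the operation.
evaluate(n::Leaf, env::Dict{Symbol,Any}) = env[n.name]
evaluate(n::Op, env::Dict{Symbol,Any}) = n.f((evaluate(i, env) for i in n.inputs)...)

# f(x) = w * x + b: multiply w and x, then add b to the product.
x, w, b = Leaf(:x), Leaf(:w), Leaf(:b)
f = Op(+, Node[Op(*, Node[w, x]), b])

evaluate(f, Dict{Symbol,Any}(:x => 3.0, :w => 2.0, :b => 1.0))  # 7.0
```

Because the nodes just call Julia's generic `*` and `+`, this same graph also evaluates when w is a matrix and x and b are vectors.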

When x, w, and b are scalars, this computation flow is easy to follow. But when they are tensors, the graph becomes much more complicated. Here’s the same example with two-element vectors, where w is a 2x2 weight matrix and x and b are 2x1 column vectors:

plot(@flow f(x) = out(w11*x1 + w12*x2 + b1, w21*x1 + w22*x2 + b2))

Already this computation graph is getting out of hand. A tensor computation graph simply treats the vector/matrix computations as the core units, so the first representation ($f(x) = w * x + b$) now stands for the tensor math: a matrix-vector multiply followed by a vector add.
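A quick numerical check of that equivalence, using arbitrary example values: the element-by-element scalar graph and the single matrix-vector multiply plus vector add produce the same 2-element output.

```julia
# Arbitrary example values for the 2-element case.
w11, w12, w21, w22 = 1.0, 2.0, 3.0, 4.0
x1, x2 = 0.5, -1.0
b1, b2 = 0.1, 0.2

# Scalar graph: every output element gets its own chain of multiplies and adds.
scalar_out = [w11*x1 + w12*x2 + b1,
              w21*x1 + w22*x2 + b2]

# Tensor graph: the same math as two operations on whole arrays.
W = [w11 w12; w21 w22]
x = [x1, x2]
b = [b1, b2]
tensor_out = W * x + b

scalar_out ≈ tensor_out  # true
```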

Transformation Graphs

Making the jump from computation graph to tensor computation graph was a big reduction in complexity and a big improvement in understanding of the underlying operations. This improvement is the core of frameworks like TensorFlow. But we can do better. In the same way a matrix-vector product

\[(W*x)_i = \sum_j W_{ij} x_j\]

can be represented as simply the vector $Wx$, we can treat the tensor transformation $f(x) = wx + b$ as a black-box function that takes an input vector $x$ and produces an output vector $f(x)$. The parameter nodes (w and b) become learnable parameters internal to the learnable transformation. The new transformation graph looks like:

plot(@flow f(x) = affine(x))

Quite the improvement! We have created a modular, black-box representation of our affine transformation, which takes a vector input, multiplies it by a weight matrix, and adds a bias vector, producing a vector output.
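Here is a minimal sketch of such a black box in Julia, assuming a callable struct is a reasonable stand-in. `Affine` and its constructors are hypothetical and not the actual Transformations API; the point is that w and b are stored inside the object, so from the outside it is just a function from vector to vector.

```julia
# A hypothetical black-box affine transformation. The learnable parameters
# live inside the object, so a graph needs only one node for the whole thing.
# Not the Transformations.jl API; an illustrative sketch only.
struct Affine
    w::Matrix{Float64}   # learnable weight matrix (nout × nin)
    b::Vector{Float64}   # learnable bias vector (length nout)
end

# Convenience constructor: small random weights and a zero bias.
Affine(nin::Int, nout::Int) = Affine(0.1 .* randn(nout, nin), zeros(nout))

# Calling the transformation maps an input vector to an output vector.
(t::Affine)(x::AbstractVector) = t.w * x .+ t.b

affine = Affine([1.0 2.0; 3.0 4.0], [0.5, -0.5])
affine([1.0, 1.0])  # [3.5, 6.5]
```

Hiding the parameters this way means a learning algorithm can update `w` and `b` without the surrounding graph ever needing to know they exist.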

And here’s the comparison for a basic recurrent network:

g = @flow function net(x)
    hidden = relu( Wxh*x + Whh*hidden + bh )
    y = logistic( Why*hidden + Wxy*x + by )
end
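For comparison, one time step of the same network can be written in plain Julia. This is a sketch under a few assumptions: `hidden` on the right-hand side is read as the previous step's hidden state, `relu` and `logistic` are defined here and applied elementwise, and `SimpleRNN`, `step!`, the layer sizes, and the random initialization are hypothetical choices, not part of any package.

```julia
# Elementwise activations used by the recurrent network.
relu(z) = max(z, zero(z))
logistic(z) = 1 / (1 + exp(-z))

# Parameters and state for one hypothetical recurrent layer.
mutable struct SimpleRNN
    Wxh::Matrix{Float64}      # input -> hidden
    Whh::Matrix{Float64}      # previous hidden -> hidden
    bh::Vector{Float64}       # hidden bias
    Why::Matrix{Float64}      # hidden -> output
    Wxy::Matrix{Float64}      # input -> output (skip connection)
    by::Vector{Float64}       # output bias
    hidden::Vector{Float64}   # state carried between time steps
end

function SimpleRNN(nin::Int, nh::Int, nout::Int)
    r(m, n) = 0.1 .* randn(m, n)   # small random weights
    SimpleRNN(r(nh, nin), r(nh, nh), zeros(nh),
              r(nout, nh), r(nout, nin), zeros(nout), zeros(nh))
end

# Advance one time step: update the hidden state, then compute the output.
function step!(net::SimpleRNN, x::AbstractVector)
    net.hidden = relu.(net.Wxh * x + net.Whh * net.hidden + net.bh)
    logistic.(net.Why * net.hidden + net.Wxy * x + net.by)
end

net = SimpleRNN(3, 4, 2)
ys = [step!(net, randn(3)) for t in 1:5]   # outputs for a 5-step sequence
```

Written out like this, six parameter arrays and a piece of mutable state are visible at the top level, which is exactly the bookkeeping a transformation graph would hide inside one node.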

But… TensorFlow

The unfortunate climate of ML/AI research is: “TensorFlow is love, TensorFlow is life”. If it can’t be built into a TF graph, it’s not worth researching. I think we’re in a bit of a deep learning hype bubble at the moment. Lots of time and money is poured into hand-designed networks with (sometimes arbitrary) researcher-chosen hyperparameters and algorithms. Your average high school student can install a framework and build a neural network that produces quite complex and impressive models. But this is not the path to human-level intelligence. You could represent the human brain by a fully connected deep recurrent neural network with a billion neurons and a quintillion connections, but I don’t think NVIDIA has built that GPU yet.

I believe that researchers need a more flexible framework to build, test, and train complex approaches. I want to make it easy to explore spiking neural nets, dynamically changing structure, evolutionary algorithms, and anything else that may get us closer to human intelligence (and beyond). See my first post on efficiency for a little more background on my perspective. JuliaML is my playground for exploring the future of AI research. TensorFlow and its competitors are solutions to a very specific problem: gradient-based training of static tensor graphs. We need to break out of the mindset that research should only pursue solutions that fit (or can be hacked into) this paradigm. Transformations (the Julia package) is a generalization of tensor computation that should be able to support other paradigms and new algorithmic approaches to learning, while the broader JuliaML design aims to let researchers approach the problem from whatever perspective they see fit.