Deep Learning Programming on Ruby
Transcript
-
About us (1)
• Kenta Murata (@mrkn)
• Full-time CRuby committer at Speee, Inc.
• bigdecimal, enumerable-statistics, pycall.rb, mxnet.rb, etc.
• Ruby, C/C++, Python, Julia, etc.
• neovim, vscode (neovim client)
-
Session Introduction
-
Deep Learning in Ruby
-
mxnet.rb
-
What is mxnet.rb?
‣ A Ruby binding library for Apache MXNet
‣ In development since Nov 2017
‣ You can write deep learning programs in Ruby by using mxnet.rb and the MXNet runtime library
‣ It doesn’t depend on the Python runtime
‣ You need only Ruby
‣ But `pip install mxnet` is currently the easiest way to install the MXNet runtime library
-
Why I wrote mxnet.rb
‣ I want to write deep learning programs in Ruby
‣ Without a dependency on Python (pycall.rb)
‣ There is tensorflow.rb, but I don’t want to use the TensorFlow C API
‣ I think Apache MXNet is the best fit for Ruby
-
Companies that support Apache MXNet https://mxnet.incubator.apache.org/community/powered_by.html
-
Academic organizations that support Apache MXNet
-
Imperative vs Symbolic
• Imperative programs tend to be more flexible
• They let us write loops directly in the syntax of the programming language, e.g. while, until, loop { … }, each { … }, etc.
• Symbolic programs tend to be more efficient
• They can optimize memory usage automatically
• They can optimize computation order
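The distinction above can be sketched in plain Ruby (an illustrative toy, not the mxnet.rb API): an imperative program computes values as each statement runs, while a symbolic program first builds an expression graph and evaluates it later, which is the stage where a real framework can plan memory reuse and reorder operations.

```ruby
# Imperative: each statement executes immediately, so it composes
# naturally with Ruby's own control flow (while, each, etc.).
acc = 0
[1, 2, 3].each { |x| acc += x * x }
# acc is now 14

# Symbolic: first describe the computation as a graph of nodes,
# then evaluate it later.
def sym_var(name)
  { op: :var, name: name }
end

def sym_add(a, b)
  { op: :add, args: [a, b] }
end

def sym_mul(a, b)
  { op: :mul, args: [a, b] }
end

def sym_eval(node, env)
  case node[:op]
  when :var then env.fetch(node[:name])
  when :add then sym_eval(node[:args][0], env) + sym_eval(node[:args][1], env)
  when :mul then sym_eval(node[:args][0], env) * sym_eval(node[:args][1], env)
  end
end

x = sym_var(:x)
y = sym_add(sym_mul(x, x), x)  # describes x*x + x; nothing is computed yet
sym_eval(y, { x: 3 })          # => 12, computed only now
```

Because the whole graph is known before evaluation, a symbolic framework can, for example, notice that intermediate buffers are no longer needed and reuse their memory.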
-
Multi-node computation
• You can use MXNet as a framework for distributed scientific computation
• It uses a Key-Value Store to exchange parameters among the threads on each machine
• For example: distributed model training
https://mxnet.incubator.apache.org/versions/master/faq/distributed_training.html
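As a rough mental model of the Key-Value Store (plain Ruby, not the actual MXNet KVStore API): each worker pushes its locally computed gradient under a parameter key, the store merges the contributions, and every worker pulls the same merged value back.

```ruby
# Toy parameter server: push accumulates gradients per key,
# pull reads the merged value back. A real KVStore merges across
# machines; here it is just an in-process Hash.
class ToyKVStore
  def initialize
    @store = Hash.new { |h, k| h[k] = 0.0 }
  end

  # Each worker pushes its locally computed gradient; the store
  # merges contributions under the same key (summation here).
  def push(key, grad)
    @store[key] += grad
  end

  # Every worker pulls the same merged value to update its parameters.
  def pull(key)
    @store[key]
  end
end

kv = ToyKVStore.new
kv.push(:w0, 0.5)   # gradient from worker 1
kv.push(:w0, 0.25)  # gradient from worker 2
kv.pull(:w0)        # => 0.75, the aggregated gradient
```

In a real distributed setup the push/pull calls cross the network, but the program structure stays the same, which is why the training loop barely changes between single-node and multi-node runs.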
-
DEMO
(The MLPScratch code is shown in full on the next slide.)
-
require 'mxnet'

module MLPScratch
  ND = MXNet::NDArray

  class MLP
    def initialize(num_inputs: 784, num_outputs: 10,
                   num_hidden_units: [256, 128, 64], ctx: nil)
      @layer_dims = [num_inputs, *num_hidden_units, num_outputs]
      @weight_scale = 0.01
      @ctx = ctx || MXNet::Context.default
      @all_parameters = init_parameters
    end

    attr_reader :ctx, :all_parameters, :layer_dims

    private def rnorm(shape)
      ND.random_normal(shape: shape, scale: @weight_scale, ctx: @ctx)
    end

    private def init_parameters
      @weights = []
      @biases = []
      @layer_dims.each_cons(2) do |dims|
        @weights << rnorm(dims)
        @biases << rnorm([dims[1]])
      end
      [*@weights, *@biases].each(&:attach_grad)
    end

    private def relu(x)
      ND.maximum(x, ND.zeros_like(x))
    end

    def forward(x)
      h = x
      n = @layer_dims.length
      (n - 2).times do |i|
        h_linear = ND.dot(h, @weights[i]) + @biases[i]
        h = relu(h_linear)
      end
      y_hat_linear = ND.dot(h, @weights[-1]) + @biases[-1]
    end

    private def softmax_cross_entropy(y_hat_linear, t)
      -ND.nansum(t * ND.log_softmax(y_hat_linear), axis: 0, exclude: true)
    end

    def loss(y_hat_linear, t)
      softmax_cross_entropy(y_hat_linear, t)
    end

    def predict(x)
      y_hat_linear = forward(x)
      ND.argmax(y_hat_linear, axis: 1)
    end
  end

  module_function

  def SGD(params, lr)
    params.each do |param|
      param[0..-1] = param - lr * param.grad
    end
  end

  def evaluate_accuracy(data_iter, model)
    num, den = 0.0, 0.0
    data_iter.each_with_index do |batch, i|
      data = batch.data[0].as_in_context(model.ctx)
      data = data.reshape([-1, model.layer_dims[0]])
      label = batch.label[0].as_in_context(model.ctx)
      predictions = model.predict(data)
      num += ND.sum(predictions == label)
      den += data.shape[0]
    end
    (num / den).as_scalar
  end

  def learning_loop(train_iter, test_iter, model, epochs: 10,
                    learning_rate: 0.001, smoothing_constant: 0.01)
    epochs.times do |e|
      start = Time.now
      cumloss = 0.0
      num_batches = 0
      train_iter.each_with_index do |batch, i|
        data = batch.data[0].as_in_context(model.ctx)
        data = data.reshape([-1, model.layer_dims[0]])
        label = batch.label[0].as_in_context(model.ctx)
        label_one_hot = ND.one_hot(label, depth: model.layer_dims[-1])
        loss = MXNet::Autograd.record do
          y = model.forward(data)
          model.loss(y, label_one_hot)
        end
        loss.backward
        SGD(model.all_parameters, learning_rate)
        cumloss += ND.sum(loss).as_scalar
        num_batches += 1
      end
      test_acc = evaluate_accuracy(test_iter, model)
      train_acc = evaluate_accuracy(train_iter, model)
      duration = Time.now - start
      puts "Epoch #{e}. Loss: #{cumloss / (train_iter.batch_size * num_batches)}, " +
           "train-acc: #{train_acc}, test-acc: #{test_acc} (#{duration} sec)"
    end
  end
end
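To see what `forward` and `SGD` above are doing without the MXNet runtime, here is the same logic on plain Ruby arrays (illustrative only; shapes and helpers are simplified, and the names `dot` and `relu` are local stand-ins for the NDArray operations):

```ruby
# x . W as nested-array math: x is a vector, w a matrix (rows = inputs).
def dot(x, w)
  (0...w[0].length).map do |j|
    x.each_with_index.sum { |xi, i| xi * w[i][j] }
  end
end

# Same idea as ND.maximum(x, ND.zeros_like(x)) in MLP#relu.
def relu(v)
  v.map { |e| e > 0 ? e : 0.0 }
end

# One hidden layer, mirroring the body of MLP#forward:
x = [1.0, -2.0]
w = [[0.5, -1.0], [0.25, 0.75]]
b = [0.0, 0.5]

h_linear = dot(x, w).zip(b).map { |a, c| a + c }  # affine: x.W + b
h = relu(h_linear)                                # => [0.0, 0.0]

# The update rule in MLPScratch.SGD: param <- param - lr * grad
lr    = 0.1
param = [0.5, -0.25]
grad  = [1.0, -1.0]
param = param.zip(grad).map { |p, g| p - lr * g } # approx. [0.4, -0.15]
```

The real code does exactly this, but on NDArray batches, with `attach_grad` and `Autograd.record` supplying the `grad` values that this sketch hard-codes.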
-
Red Chainer
-
Networks can be constructed like ordinary Ruby code
-
[Diagram: by having Red Chainer, applications gain access to deep learning — Application ↔ Red Chainer ↔ Deep Learning]
-
Future of Red Chainer
• GPU support: sonots/cumo
• “Fast Numerical Computing and Deep Learning in Ruby with Cumo” http://rubykaigi.org/2018/presentations/sonots.html#may31
• Support Apache Arrow
• Develop the ecosystem around Red Chainer
• red-datasets: provides common datasets
• red-arrow: the Apache Arrow Ruby binding
-
Summary
• Introduced Red Chainer, a deep learning framework created in Ruby
• If you are interested in Red Data Tools or Red Chainer:
• online
• en: https://gitter.im/red-data-tools/en
• ja: https://gitter.im/red-data-tools/ja
• offline
• We hold a meetup every month at Speee, Inc. in Tokyo
• https://speee.connpass.com/
• I’m at the Speee booth at RubyKaigi 2018
-
Overview of the current status of Ruby’s data science support
-
SciRuby GSoC
In GSoC 2018, SciRuby accepted 5 students, and the following 4 projects are running:
• Business Intelligence with daru
• Advanced features in daru-view
• NetworkX.rb: a Ruby version of NetworkX
• A Ruby version of matplotlib
The discussions are held on RubyData’s Discourse: https://discourse.ruby-data.org/c/gsoc/gsoc2018
-
Talk Summary