Representation Learning for Sketches
About
In this project, a representation space is learned on the Quick, Draw! dataset for classifying sketches and for suggesting similar sketches.
For instructions on running the code, please see the Instructions README.
Sample sketches of the dataset:

Models
Four different models are implemented, covering both temporal and image data: LSTM, BLSTM, ResNet-v2, and VRNN.
A model with combined mode of representation can be found in this repository: Combined Representations
VRNN Implementation
Overview of VRNN architecture:

The learned representation is a merged vector consisting of the h and z states:

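The merging of the two states can be sketched as a simple concatenation. This is a minimal illustration, not the repository's actual code; the batch size and the dimensions of h and z are assumptions.

```python
import numpy as np

# Hypothetical VRNN outputs at one time step: a hidden state h_t and a
# latent sample z_t per sketch in the batch (dimensions are assumptions).
h_t = np.random.randn(32, 256)  # hidden states, size 256
z_t = np.random.randn(32, 64)   # latent samples, size 64

# The representation used for classification and retrieval is the
# concatenation of the two along the feature axis.
representation = np.concatenate([h_t, z_t], axis=-1)  # shape (32, 320)
```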
To improve classification accuracy, the VRNN losses (KL divergence and log-likelihood) and the classification loss were added together to form a combined loss function.
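The combination described above can be sketched as a plain sum of the three terms. This is an illustrative snippet, not the project's implementation; the cross-entropy helper and the unweighted sum are assumptions.

```python
import numpy as np

def cross_entropy(logits, label):
    # Softmax cross-entropy for a single example (illustrative helper).
    logits = logits - logits.max()  # stabilize the softmax
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[label]

def combined_loss(kl_term, nll_term, logits, label):
    # The VRNN losses and the classification loss are simply summed,
    # as described above; per-term weights would be a further tuning knob.
    return kl_term + nll_term + cross_entropy(logits, label)
```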
Results
Classification
For a given batch size, the VRNN performs best in terms of classification accuracy (all models trained with batch size 32):
Suggestions
Looking up the nearest neighbors in the learned representation space, the VRNN generally returns neighbors with more closely matching styles, e.g. wheels drawn with two circles:

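The lookup above can be sketched as a brute-force nearest-neighbor search over the stored representation vectors. This is a minimal sketch under the assumption of Euclidean distance; the actual project may use a different metric or an approximate index.

```python
import numpy as np

def nearest_neighbors(query, reps, k=5):
    # Euclidean distance from the query representation to every stored
    # sketch representation; return the indices of the k closest sketches.
    dists = np.linalg.norm(reps - query, axis=1)
    return np.argsort(dists)[:k]

# Toy usage: three 2-D representations, query closest to the first two.
reps = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
query = np.array([0.1, 0.0])
idx = nearest_neighbors(query, reps, k=2)
```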
Batch size
Note that the batch size significantly affects training performance. This limits the larger models, which do not fit into GPU memory at high batch sizes:
Possible explanation of batch size effect
One possible reason is that a smaller batch is more likely to consist mostly of faulty sketches, yielding noisier gradient estimates.



