Representation Learning for Sketches
About
In this project, a representation space is learned on the Quick, Draw! dataset for classifying sketches and for suggesting similar sketches.
For instructions on running the code, please see the Instructions README.
Sample sketches of the dataset:

Models
Four different models are implemented, covering both temporal and image data: LSTM, BLSTM, ResNet-v2, and VRNN.
A model with combined mode of representation can be found in this repository: Combined Representations
VRNN Implementation
Overview of VRNN architecture:

The learned representation is a merged vector consisting of the h and z states:

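The merging of the two states can be sketched as a simple concatenation. This is a minimal illustration, not the repository's actual code; the batch size and the dimensions of h and z are assumptions.

```python
import numpy as np

# Hypothetical VRNN outputs at one time step: a hidden state h_t and a
# latent sample z_t per sketch in the batch (dimensions are assumptions).
h_t = np.random.randn(32, 256)  # hidden states, size 256
z_t = np.random.randn(32, 64)   # latent samples, size 64

# The representation used for classification and retrieval is the
# concatenation of the two along the feature axis.
representation = np.concatenate([h_t, z_t], axis=-1)  # shape (32, 320)
```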
To improve classification accuracy, the VRNN losses (KL divergence and log-likelihood) and the classification loss were added together to form a combined loss function.
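The combination described above can be sketched as a plain sum of the three terms. This is an illustrative snippet, not the project's implementation; the cross-entropy helper and the unweighted sum are assumptions.

```python
import numpy as np

def cross_entropy(logits, label):
    # Softmax cross-entropy for a single example (illustrative helper).
    logits = logits - logits.max()  # stabilize the softmax
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[label]

def combined_loss(kl_term, nll_term, logits, label):
    # The VRNN losses and the classification loss are simply summed,
    # as described above; per-term weights would be a further tuning knob.
    return kl_term + nll_term + cross_entropy(logits, label)
```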
Results
Classification
For a given batch size, the VRNN performs best in terms of classification accuracy (all models trained with batch size 32):
Suggestions
Looking up the nearest neighbors in the learned representation space, the VRNN generally returns neighbors with more closely matching styles, e.g. wheels drawn with two circles:

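The lookup above can be sketched as a brute-force nearest-neighbor search over the stored representation vectors. This is a minimal sketch under the assumption of Euclidean distance; the actual project may use a different metric or an approximate index.

```python
import numpy as np

def nearest_neighbors(query, reps, k=5):
    # Euclidean distance from the query representation to every stored
    # sketch representation; return the indices of the k closest sketches.
    dists = np.linalg.norm(reps - query, axis=1)
    return np.argsort(dists)[:k]

# Toy usage: three 2-D representations, query closest to the first two.
reps = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
query = np.array([0.1, 0.0])
idx = nearest_neighbors(query, reps, k=2)
```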
Batch size
Note that the batch size significantly affects training performance. This limits the larger models, which do not fit into GPU memory at high batch sizes:
Possible explanation of batch size effect
One possible reason is that a smaller batch is more likely to consist mostly of faulty sketches, yielding noisier gradient estimates.



