ArXiv: 1902.06818 · DOI: 10.48550/1902.06818
Sentiment analysis is a task that may suffer from a lack of data in certain
cases, as the datasets are often generated and annotated by humans. In cases
where data is inadequate for training discriminative models, generate models
may aid training via data augmentation. Generative Adversarial Networks (GANs)
are one such model that has advanced the state of the art in several tasks,
including as image and text generation. In this paper, I train GAN models on
low resource datasets, then use them for the purpose of data augmentation
towards improving sentiment classifier generalization. Given the constraints of
limited data, I explore various techniques to train the GAN models. I also
present an analysis of the quality of generated GAN data as more training data
for the GAN is made available. In this analysis, the generated data is
evaluated as a test set (against a model trained on real data points) as well
as a training set to train classification models. Finally, I also conduct a
visual analysis by projecting the generated and the real data into a
two-dimensional space using the t-Distributed Stochastic Neighbor Embedding
(t-SNE) method.