In many machine learning applications, it is often expensive and time-consuming to collect annotated training data. A significant amount of research effort has therefore been devoted to developing techniques that enable training from as little annotated data as possible. Towards this goal, automatic data augmentations have proven to be very effective, particularly for image data. Existing forms of data augmentation often focus on geometric or photometric image transformations such as cropping and color jittering. In this work, we explore a different kind of data augmentation, which we call “semantic augmentations,” and which can be added to most existing semi-supervised learning methods. New semantic augmentations of existing labeled training images are obtained by denoising a noisy version of the training data using a diffusion model. We show that by adding our semantic augmentation module to FixMatch, one of the leading semi-supervised learning methods, we obtain accuracy gains of up to 10% in the extreme case where we have only one labeled training example per class.
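The noise-then-denoise idea described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a DDPM-style forward noising step, reuses the source image's label for every augmentation, and uses a hypothetical `denoise` callable in place of a pretrained diffusion model. The names `add_noise`, `semantic_augment`, `toy_denoise`, `alpha_bar`, and `n_aug` are all illustrative.

```python
import math
import random


def add_noise(image, alpha_bar, rng):
    """Forward-noise a flat list of pixel values scaled to [-1, 1].

    Assumes a DDPM-style forward process (an assumption, not from the source):
    x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps, eps ~ N(0, 1).
    """
    signal = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    return [signal * p + sigma * rng.gauss(0.0, 1.0) for p in image]


def semantic_augment(image, label, denoise, alpha_bar=0.5, n_aug=4, seed=0):
    """Return n_aug (augmented_image, label) pairs for one labeled image.

    `denoise` is a hypothetical stand-in for a diffusion model's reverse
    process: it maps a noisy image back to a clean one. Each call draws
    fresh noise, so each augmentation can differ from the others.
    """
    rng = random.Random(seed)
    augmented = []
    for _ in range(n_aug):
        noisy = add_noise(image, alpha_bar, rng)
        augmented.append((denoise(noisy), label))
    return augmented


def toy_denoise(noisy):
    """Placeholder only: clips values to [-1, 1]. Not a diffusion model."""
    return [max(-1.0, min(1.0, p)) for p in noisy]


# One labeled "image" of four pixels with class label 3.
augmented = semantic_augment([0.2, -0.5, 0.9, 0.0], label=3, denoise=toy_denoise)
```

In this sketch, a smaller `alpha_bar` adds more noise, so the denoiser has more freedom to change the image; how much noise still preserves the class label is a design choice the sketch leaves open.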