An autoencoder compresses its input down to a vector with far fewer dimensions than the input data, and then transforms it back into a tensor with the same shape as its input over several neural net layers. It's trained to reproduce its input, so it's kind of like learning a compression algorithm for that specific dataset.
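To make that concrete, here's a minimal sketch of a vanilla autoencoder in PyTorch (the framework choice, the 784-dimensional flattened input, and the 32-dimensional code are assumptions for illustration, not from any particular setup):

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        # Encoder: squeeze the input down to a small code vector.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, code_dim),
        )
        # Decoder: expand the code back to the original shape.
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Trained to reproduce the input: minimize reconstruction error.
model = Autoencoder()
x = torch.rand(64, 784)                      # a batch of stand-in "images"
loss = nn.functional.mse_loss(model(x), x)
```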
A GAN looks kind of like an inside-out autoencoder - instead of compressing high-dimensional data down to a small vector in the middle, it takes low-dimensional vectors as input and has the high-dimensional data in the middle.
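Continuing in the same assumed PyTorch setup, the two networks in that inside-out picture might look like this (all sizes are made up for illustration):

```python
import torch.nn as nn

latent_dim, data_dim = 100, 784   # assumed sizes, just for the sketch

# Generator: small random vector in, full-sized sample out.
generator = nn.Sequential(
    nn.Linear(latent_dim, 128), nn.ReLU(),
    nn.Linear(128, data_dim), nn.Tanh(),
)

# Discriminator: full-sized sample in, single "real vs. fake" probability out.
discriminator = nn.Sequential(
    nn.Linear(data_dim, 128), nn.LeakyReLU(0.2),
    nn.Linear(128, 1), nn.Sigmoid(),
)
```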
Instead of being given a bit of data as input, it's given a small vector of random numbers. The generator network tries to transform this little vector into a realistic sample from the training data. The discriminator network then takes this generated sample (and some real samples from the dataset) and learns to guess whether the samples are real or fake. Well, more precisely, it's trained to minimize the cross entropy between the probability it outputs and a vector of 0s for fake images and 1s for real images. The generator learns to make more convincing samples (or, equivalently, to minimize the cross entropy between the discriminator's guess about its creations and 1s).
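Here's roughly what those two cross entropy losses look like, reusing the generator and discriminator sketched above (an illustrative sketch of a single step, not a full training loop):

```python
import torch
import torch.nn.functional as F

batch_size = 64
real = torch.rand(batch_size, data_dim)    # stand-in for real training samples
z = torch.randn(batch_size, latent_dim)    # small vector of random numbers
fake = generator(z)

ones, zeros = torch.ones(batch_size, 1), torch.zeros(batch_size, 1)

# Discriminator: cross entropy against 1s for real samples, 0s for fakes.
d_loss = F.binary_cross_entropy(discriminator(real), ones) + \
         F.binary_cross_entropy(discriminator(fake.detach()), zeros)

# Generator: cross entropy between the discriminator's guess about its
# creations and 1s, i.e. "try to get the fakes labelled real".
g_loss = F.binary_cross_entropy(discriminator(fake), ones)
```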
Another difference: while they both fall under the umbrella of unsupervised learning, they are different approaches to the problem. A GAN is a generative model - it’s supposed to learn to generate realistic *new* samples of a dataset. Variational autoencoders are generative models, but normal “vanilla” autoencoders just reconstruct their inputs and can’t generate realistic new samples.
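To see that difference operationally: with a trained GAN generator (or a VAE decoder), you get a brand-new sample by decoding freshly drawn random noise, while a vanilla autoencoder only maps existing inputs to reconstructions. A tiny sketch, reusing the generator from above:

```python
# Generating a *new* sample: decode a freshly drawn random vector.
z_new = torch.randn(1, latent_dim)
new_sample = generator(z_new)

# A vanilla autoencoder has no comparable recipe: nothing in its training
# makes a randomly chosen code vector decode to a realistic sample.
```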