What Makes Generative AI Different
Generative AI represents a fundamental shift in what artificial intelligence can do. Before the current generation of generative AI tools, most AI systems were discriminative — they classified and categorized existing information. A discriminative model could look at an image and tell you it contained a cat, or analyze an email and label it as spam. These models were incredibly useful, but their output was limited to labels, categories, and predictions about existing data.
Generative AI models do something fundamentally different: they create new content. A generative model can produce a never-before-seen image of a cat, write an original poem, compose a melody, or generate code to solve a programming problem. Instead of drawing boundaries between categories, generative models learn the underlying distribution of their training data and sample from it to produce novel outputs that resemble the training data without copying it.
After testing this extensively, this distinction matters because it opens up entirely new categories of AI applications. Discriminative AI helps you analyze and understand the world. Generative AI helps you create and produce. Where discriminative AI augments your judgment, generative AI augments your creativity. A doctor might use discriminative AI to analyze a medical scan and detect a tumor, but they could use generative AI to produce a 3D visualization of the tumor for surgical planning, generate a patient-friendly explanation of the condition, or create synthetic training data for other AI models.
How Generative Models Are Built
My take: generative AI models are built using the same fundamental deep learning technologies as other AI systems, but with crucial differences in their training objectives and architectures. The training of a generative model involves learning the probability distribution of the training data — essentially, the model learns what makes a cat image look like a cat image, or what makes a sentence sound like natural English. Once it has learned this distribution, it can sample from it to create new examples.
Generative Adversarial Networks (GANs), introduced by Ian Goodfellow in 2014, were one of the earliest breakthroughs in generative AI. A GAN consists of two neural networks — a generator and a discriminator — that are trained together in a competitive game. The generator creates fake images, and the discriminator tries to distinguish them from real ones. The generator improves by learning to create images that fool the discriminator, while the discriminator improves by getting better at detecting fakes. Through this adversarial process, the generator becomes capable of creating remarkably realistic images.
Diffusion models, the technology behind popular image generators like DALL-E, Midjourney, and Stable Diffusion, take a different approach. They are trained by taking clean images and gradually adding random noise until the image becomes pure static. The model learns to reverse this process — starting with pure noise and gradually removing it to reconstruct a clean image. During generation, the model starts with random noise and iteratively denoises it into a coherent image that matches a text description. Diffusion models have become the dominant approach for image generation because they produce higher quality and more diverse outputs than GANs.
For text generation, the dominant architecture is the Transformer, which powers models like GPT-4, Claude, and Gemini. These models are trained on the simple task of predicting the next token in a sequence. When generating text, they start with a prompt and predict one token at a time, with each prediction conditioned on all the tokens generated so far. The element of randomness in the selection process ensures that the output isn't deterministic — the same prompt can produce different responses on different occasions.
Why does this matter?
Types of Generative AI by Medium
Generative AI capabilities have expanded across multiple types of media, each with its own specialized models and techniques. Text generation is the most widely used form of generative AI. Large language models can write articles, compose emails, generate code, create marketing copy, draft legal documents, and engage in creative writing. These tools have become essential productivity aids for knowledge workers, writers, and developers.
Image generation has captured the public imagination perhaps more than any other generative AI capability. Tools like Midjourney, DALL-E 3, and Stable Diffusion can transform text descriptions into stunning visual artwork, photorealistic images, and creative designs. Users can generate images in virtually any style — from oil painting to anime to architectural visualization — simply by describing what they want. The technology has sparked important discussions about copyright, artistic authenticity, and the nature of creativity.
Music and audio generation is a rapidly advancing frontier. AI systems can generate original music in various genres and styles, create realistic voice recordings from text, and even clone a person's voice from a short sample. Tools like Suno and ElevenLabs allow users to generate complete songs with lyrics and melodies, or produce professional-quality voiceovers without a recording studio.
In my experience, video generation is the newest and most technically demanding form of generative AI. Systems like OpenAI's Sora and Runway can generate short video clips from text descriptions. These models must maintain consistency across frames, handle motion and physics, and generate coherent temporal sequences — a much harder problem than generating single images. While still in early stages, video generation is advancing rapidly and will likely transform video production, advertising, and entertainment.
Code generation has become one of the most practically useful forms of generative AI. Tools like GitHub Copilot, Cursor, and Claude can generate functional code from natural language descriptions, explain existing code, find bugs, suggest optimizations, and even write complete applications. These tools are transforming software development by automating routine coding tasks and making programming more accessible to non-experts.
Real-World Applications and Impact
I was skeptical at first, but generative AI has found practical applications across nearly every industry. In marketing and advertising, teams use generative AI to create campaign copy, social media content, and visual assets at a fraction of the traditional cost and time. In product design, generative AI helps designers explore thousands of design variations quickly, accelerating the creative process. In education, teachers use generative AI to create lesson plans, generate practice problems, and provide personalized explanations for students.
The entertainment industry has embraced generative AI for previsulization, storyboarding, concept art, and even script development. Game developers use generative AI to create environments, characters, dialogue, and quest content. Journalists use AI to draft news briefs, summarize research, and transcribe interviews. Architects use generative AI to explore building designs and generate renderings.
Generative AI also raises important challenges. The ease of creating realistic fake content has amplified concerns about misinformation and deepfakes. Copyright law is grappling with questions about whether AI-generated content can be copyrighted and whether training on copyrighted works constitutes infringement. There are valid concerns about job displacement in creative fields. And the environmental impact of training and running large generative models is significant.
Understanding generative AI helps you make sense of the broader AI landscape. For context on the technology that powers generative models, read about deep learning and neural networks and understanding large language models.
What I'd Tell My Past Self
I've been using this for a while now, and - Generative AI creates new content rather than just classifying or analyzing existing data
- Key architectures include GANs, diffusion models, and Transformers, each suited to different types of content — took me a while to figure this out
- Generative AI can produce text, images, music, video, code, and other creative outputs — your experience may differ, but this worked for me
- Training involves learning the probability distribution of training data and sampling from it — wish I'd known this six months ago
- Applications span marketing, design, entertainment, education, software development, and many other fields — wish I'd known this six months ago
- Important challenges include misinformation, copyright, job displacement, and environmental impact (this one actually surprised me)
- Generative AI augments human creativity rather than replacing it, enabling new forms of expression and productivity
Why does this matter?