The Third Language
What a hundred years of media history says about the vertical video gold rush
A Sidebar from this week’s Vertical Media Summit in Los Angeles, California…
I spent the afternoon of June 3rd in a room at the W Hollywood watching a very good argument get made on vertical LED screens. The Vertical Media Summit is a half-day event hosted by Owl & Co, the media and tech advisory firm run by Hernan Lopez, and Lopez built the whole thing around a single claim he laid out in the opening keynote. Film was the first audiovisual language. Television was the second. Vertical video is the third, something genuinely new, with its own economics, aesthetics, and audience psychology, and most of it is still up for grabs.
He made the case well. The line that stuck with me was his evolution analogy. If you had watched a daytime serial in 1950, he asked, would you have imagined that the same box would one day give you Game of Thrones, South Park, and the Super Bowl? Microdramas, in his telling, are to vertical what soap operas were to early television. An early, slightly embarrassing expression of a medium that has not met its prestige era yet. He also moved a number while I watched. A year ago he had sized the vertical video economy outside China at around a hundred billion dollars. On stage he pushed it to a hundred and fifty billion, a figure he is set to lay out in detail in a forthcoming piece, with at least three vertical-native companies already running as billion-dollar businesses.
I am not here to argue with any of that. The keynote was good enough to send me researching, which is the highest compliment I know how to pay a talk. I sat through the panels and the firesides with the third-language idea running underneath everything else, and by the time the cocktail hour started I had a different question than the one I walked in with. Not whether Hernan is wrong. He mostly is not. The question is whether “third language” is the right name for what I was looking at, or whether vertical has a longer and stranger parentage than a clean count of one, two, three lets on.
This is my attempt to answer that. It is a reaction, not a rebuttal. His keynote gave me a lens. The history made the lens sharper, and I am just pedantic enough to want that extra edge.
What counts as a language
Before you can decide whether vertical is a new language, you have to decide what a new language is. This is where most of the vertical-versus-format arguments go sideways, because the two sides are using the word to mean different things.
The test I ended up using is this. A real break in audiovisual language changes more than the image. It changes how the work is produced, how it is distributed, how audiences build a habit around it, and how money flows through the system. Through that lens, things like color, widescreen, and IMAX look less like new languages than expansions of an existing one. They enrich the image without fully reorganizing production, distribution, habit, and money. Sound film was a new language, because synchronized sound reorganized production from the first day of the shoot. Television was a new language for reasons I will get to.
Run that lens across a hundred years and a useful thing happens. The history stops looking like a single line and starts looking like two. One lineage is about the image, how we compose and cut visual meaning. The other is about habit, how a medium earns your return, builds a daily reflex, and turns that reflex into money. Film is the headwater of the first. Radio, of all things, is the headwater of the second. They run separately for decades, touch once when television arrives, and do not fully merge until the device in your pocket forces them together. Vertical is what happens at that second meeting.
Hold those two threads in mind. I will try not to over-name them, because the more you point at a metaphor the less it carries. But they are the shape underneath everything that follows here.
The image begins as a free-for-all
Here is the part of film history that the studios would rather you not dwell on. Before Hollywood was Hollywood, the movies looked less like an industry and more like a creator field.
In the first decade of the 1900s, film was a fragmented, entrepreneurial scramble. Regional rental exchanges moved prints around the country. Independent producers fought the Edison-backed Motion Picture Patents Company, the Trust, for the right to make pictures at all. There was no fixed grammar yet, no standard length, no settled idea of what a movie even was. People were working it out reel by reel, which is to say the grammar of cinema was being written in public by a loose population of operators with cameras. It is worth pausing on how open it really was. Before the feature won, a movie could be any length, about anything, made by almost anyone with access to a camera and a storefront to show it in. Had that moment lasted, American cinema might have grown up looking less like a studio system and more like YouTube, a churn of short, various, creator-made pieces with no settled form.
Then it closed. As features replaced shorts and budgets climbed, the business centralized. The First World War knocked out the European competition, Pathé chief among them, and left American film to fill the gap worldwide. By 1917 the thing we now call Hollywood had taken its modern shape, and the studio system moved to absorb or eliminate the independents. Thomas Ince’s central-producer system, with its detailed continuity scripts and tight budgets and a single authority over the final cut, became the prototype the studios would run on for the next forty years.
I am going to call that closing an enclosure, and it will happen more than once in this piece. The first time it is worth being plain. An enclosure is the period when a messy creator field hardens into an industry, with the gatekeepers, the capital thresholds, the labor rules, the standards, and the winners that come with one. Hold it, because we are watching the same thing happen in vertical right now, and Seth Hallen and I take the early-film version of this argument up directly in a forthcoming piece called The Clubhouse and the Graph. For today, it is enough to notice the pattern forming.
One more thing the early years teach, quickly. Sound arrived at the end of the 1920s and proved the test from the inside. It did not just add audio. It changed how films were shot, slowed the camera down for a while, reorganized how a scene was built and carried, and rewrote the economics of the theaters that had to wire themselves for it. That is what a real grammar break looks like. It is the threshold that vertical video will have to clear.
Radio writes the rules nobody credits
While film was learning to compose an image, an entirely separate medium was building something early film hadn’t yet had the chance to. The habit.
Before the 1920s, radio had not yet settled into what we would recognize as regular programming. Broadcasts were irregular, a few hours of talk and music, the occasional concert or ballgame. Radio drama proper arrived around 1927, when the national networks started writing and adapting scripts for the air. The early 1930s is when the machine assembled itself. National advertisers realized they could buy airtime and sponsor whole programs, and once they did, the entire vocabulary of habitual broadcasting appeared almost at once. Serialized drama. The recurring cast you came back for. The cliffhanger. The daypart, the idea that a particular kind of show belongs at a particular hour. And the daytime serial, which we still call the soap opera for the simple reason that soap companies paid for it.
There is a tidy little prequel here too. Pulp-fiction magazines had spent decades training a mass audience on cheap genre serials, romance and westerns and detective mysteries, funded by advertising. When radio drama arrived, the advertisers simply moved their money from the page to the air, and the genres came with them. The attention-for-advertising machine did not start with radio. Radio just gave it a voice.
Notice what radio built with no pictures at all. Serialization, the engineered reflex of daily return, and the business model where you assemble an audience and rent its attention to a sponsor. Those are not minor. Two of Hernan’s three pillars for vertical, the distinct economics and the distinct audience psychology, were invented here, by an audio medium, before television existed and decades before anyone held a phone sideways. That observation matters more once cable arrives, so I will leave it sitting here for now.
Television, where the two threads first touch
Television is the first confluence, the first time the image lineage and the habit lineage run through the same box.
When TV arrived in force in the late 1940s and 1950s, it did not invent its structure from scratch. It inherited it wholesale from radio. The serials, the dayparts, the sponsor model, the genres, all of it walked across from one medium to the other, in some cases literally. Guiding Light ran on radio for fifteen years, then moved to television and ran for decades more. The habit grammar transferred intact. What television added was the image half. Its own shooting conventions grew up fast: the dominance of the close-up over the wide shot, the talking head, multi-camera capture, and a story structure cut to fit around commercial breaks.
But the clean “TV was the second language” line hides something that matters for judging vertical fairly. Television was never one grammar. A live broadcast, a multi-camera sitcom, and a single-camera filmed drama are about as different from each other as any of them is from a feature film. Live television cuts an unrepeatable event in real time and can never shoot it again. The sitcom runs on a stage in front of an audience with a fixed camera geometry. The filmed drama borrows the language of cinema almost entirely. We bundled a family of grammars under one tidy label, then forgot we bundled them. That matters later, because vertical gets the opposite treatment. One creator lineage gets split off and renamed a language.
Cable makes the case impossible to miss
Now for the beat I promised earlier. If you want to see vertical’s supposedly novel traits living a full life forty years early, you do not look at television in general. You look at cable.
Start with the economics. Cable broke broadcast’s defining rule, which was that you served the largest possible audience for free and sold it to advertisers. In its place came the earliest form of “narrowcasting,” the deliberate pursuit of a niche, devoted slice rather than a mass middle, and a dual revenue model where you paid a subscription fee, part of which reached the channel as a carriage fee, and watched ads anyway. That hybrid, subscription plus advertising, is the exact monetization question Hernan framed as one of vertical’s great open puzzles. Cable answered a version of it when ESPN went on the air in 1979.
And then the habit. CNN launched in 1980 as the first 24-hour news channel (RIP Ted Turner), and ESPN had already built the around-the-clock sports wheel. Between them they dissolved the daypart that radio invented and television inherited. The schedule stopped being a sequence of appointments and became an ambient thing, always on, waiting, refreshable. That is the bridge between radio’s “same time tomorrow” and the feed’s “whatever is on right now.” The infinite scroll did not invent always-on. Cable television did. The phone plus the internet just made always-on personal and portable, then swapped the programmer for an algorithm.
Then there’s the grammar. Niche channels could not survive on reruns, so economic pressure forced them to invent native visual forms. Distribution manufacturing new language. ESPN built the grammar of live sports into a language of its own, the score bug, the lower-third, the highlight package, the replay as a unit of meaning. CNN built the grammar of continuous news, the crawl, the breaking-news bug, the anchor as a permanent present-tense presence. HBO, freed by subscription from the ad break, let the serialized drama stretch into the episodic long form that streaming later inherited. Amanda D. Lotz dates the golden era of television to the 2010s, but people binged HBO seasons years before the word went mainstream. And MTV, from 1981, maybe produced the purest grammar break of the lot, rhythmic cutting decoupled from narrative, image driven by music rather than story, discontinuity as a style instead of a mistake. That grammar leaked into advertising and then into film itself.
This is why cable matters here. Niche economics, hybrid monetization, always-on habit, and brand-new native grammars, all present and accounted for, a decade before the web and three decades before the vertical microdrama. When Hernan says vertical has its own economics and its own audience psychology, he is right that vertical has them. He is not right that they are new. They are inheritances. Ask the engineers who built cable and they will tell you they solved streaming years early, and they are mostly right. The tech was never what failed. The customer service and the business model were. Cable invented the grammar of always-on and then lost the audience anyway, which is the first hint at where the durable value in any of this actually sits.
Control moves to the viewer
Cable handed the audience something else along the way, almost as a side effect, and it sets up everything after. Choice and pace started migrating from the programmer to the person on the couch.
The remote control arrived alongside the cable box. The VCR added time-shifting, the first real crack in appointment viewing and the first time you could watch on your schedule rather than the network’s. The web extended the same motion into something close to infinite, a shelf with no end and a link to everything on it. None of this changed the grammar of the image much. What it changed was who held the controls (literally). By the broadband era, the audience had already been trained to steer, and what came next handed it the wheel completely: a medium where the viewer did not just choose when to watch but, through sheer aggregate behavior, decided what got made. That medium was YouTube.
YouTube is the missing ancestor
Here is where I think the third-language story goes furthest off, and where the history clarifies the most. If you want vertical’s actual parent, you do not look at television. You look at YouTube and the creator culture it organized and amplified. This is the creator-native visual grammar vertical inherits most directly, and vertical is its child, not television’s and not cable’s.
Start with the grammar, because it is real and it is native to the platform. Early video bloggers were not bound by the conventions of film or television, and in that freedom they built a new vocabulary of the moving image more or less by collective experiment. Direct address straight down the lens, the performer also the storyteller. The vlogging jump cut, the abrupt mid-sentence splice that signals YouTube-native authenticity and keeps energy up for a solo creator with one camera, became so standard it now reads as a genre marker rather than an error. Watch an early creator like Brian Brushwood work, the direct address, the relentless cut-on-every-beat pacing, the energy that never dips because the edit never lets it, and you are watching a grammar that film and television never taught and never would have. He was one of many who saw it early, built a craft on it, and turned a webcam and a tight edit into the kind of public following the old gatekeepers used to hand out. He is still at it, most recently with his podcast World’s Greatest Con.
But grammar is only half of why YouTube is the ancestor. The other half is everything the platform added that television never had. The thumbnail and title as a unit of competition. The parasocial bond between creator and audience that runs deeper than anything between a network and its viewers. The comment loop, where the audience talks back and the next video answers. Format mutation, where forms evolve in weeks because creators copy and remix each other’s work in real time. And underneath all of it, the recommendation algorithm, which did something no network and no cable wheel ever could: it replaced the schedule itself. Not a programmer deciding what airs at eight o’clock, but a machine deciding, viewer by viewer, what gets seen at all. None of that is television grammar. Most of it matters more to how vertical actually behaves than anything TV ever did.
And notice what kind of place YouTube was in its early years. Fragmented, entrepreneurial, no real fixed grammar, a loose population of operators with cameras working it out in public. If that sounds familiar, it should. It is 1910 all over again. The open field had reopened, on a new surface, and the same kind of cycle that closed cinema once was about to begin its second turn.
The phone, where the threads finally merge
The two lineages I have been tracking, the image and the habit, finally fuse completely on a single device. That, to me, is the event. Not the vertical frame itself, but the moment one object in your pocket carries the entire image grammar and the entire radio-born habit machine at the same time, and never has to leave your side.
The vertical frame is almost an accident of that fusion. Hernan dates it nicely to the moment people simply stopped rotating their phones to watch, and the platforms noticed. Vine looped its six seconds and helped write the hook-first playbook that TikTok, Instagram Reels, and YouTube Shorts still run on. In the mobile feed, audio becomes a retention signal and the first half-second decides everything. Stories went vertical and disappeared. Then TikTok did the thing none of the others had managed: it made vertical genuinely entertaining and let an algorithm replace the schedule entirely. The phone is the highest-frequency distribution surface ever built. In plain terms, it is the thing people check before they are out of bed, in the coffee line, in the elevator avoiding eye contact, and occasionally while half-watching an actual television. Radio asked you to come back tomorrow. The phone never lets you leave.
So is vertical a third audiovisual language, born cleanly after film and television? No. That genealogy skips its real parents. But here is where I want to give Hernan back most of what the clean count took away, because the honest answer is more interesting than a flat denial. Vertical is a dialect, and it’s a substantial one. It forms at the confluence of older visual grammar, the radio-born habit, the creator grammar and platform logic of YouTube and Vine, and the physical interface of the phone. And like any dialect worth the name, it changes real things. It changes blocking and gaze, because the frame is tall and the face fills it. It changes intimacy and pacing, because you are inches away and your thumb is a threat. It changes body framing, subtitle density, scene construction, and the rhythm of when and how it asks you for money. It is the newest dialect of a mobile-native grammar that YouTube started writing twenty years ago, and it is carried by the first device built to hold both lineages at once.
Microdrama is the dialect’s current accent
This brings the whole hundred-year run back to the room I was sitting in on Wednesday. The vertical microdrama is the current accent of that dialect, and it is also where the pattern I have been tracing becomes visible in real time.
The form itself is a confluence in miniature, and Hernan said so from the stage, to his credit. It took the swipe-up grammar of TikTok, the emotional close-up and cliffhanger of the soap opera, and the pay-to-continue monetization of mobile games, and it fused them, first at scale in China. That genealogy is exactly right. What the history adds is the part after the genealogy. Then the enclosure signals start arriving. China proved the scale. ReelShort proved the Western appetite, with U.S. users spending more time per day in the app than in Netflix. SAG-AFTRA wrote a verticals agreement, which is labor formalizing around the form. Fox took a stake in a vertical studio, which is capital arriving. TikTok launched a standalone microdrama app, which is consolidation pressure from the largest player in the room. Read that list again. Scale, demand, labor rules, capital, consolidation.
It is not the same enclosure in costume. It is the same kind of enclosure, running on a new surface. The gates are going back up.
Hernan’s three questions from the keynote, what will people watch, where will they watch it, and what is the business model, are the right questions. They are the questions you ask at exactly this moment in the cycle, when the field is still open enough to be worth asking and closing fast enough to make it urgent. I just think they sit inside a longer story than the one-two-three count suggests.
Where the value actually settles
If the pattern holds, and patterns are worth watching precisely because they sometimes do not, I would not bet on a specific winner. I would bet on the layer where the winner becomes inevitable. The habit engine, the distribution surface, and the business system underneath the format itself. Radio’s economics outlived every show it ever aired. Cable’s bundle outlived nearly every channel that rode it. The feed’s logic will outlive any format that scrolls through it, vertical microdrama included.
That is the one place I will leave a small door open for the readers who follow the larger argument I have been building elsewhere. The reference architecture I keep returning to, my personal “one architecture to rule them all,” reaches the same conclusion from the present-day side that this history reaches from the past. The lasting value accrues in the layers underneath the visible surface. If you have not been following that work, you have lost nothing, and the history stands on its own.
Back in the room
Near the end of his keynote, Hernan said that a year from now there will be companies in that room that do not exist yet, building genres that have not been invented, in categories nobody has named. I think he is right, and I think it is one of the truest things he said.


