May 4, 2024

The Case for a New Media Format

Two definitive patterns play out in the evolution of mobile / media applications:

1) More cellular bandwidth capacity means more information (bits) can be packed into a new media asset, and platform evolution follows:

Text > Photo > Video

Twitter > IG / Snap > TikTok

2) Long form content compresses to shorter, quickly consumable media formats:

Books > blogs > tweets

Movies > YouTube > TikTok

And attention spans shrink…

Assuming both hold true as mobile (and spatial) applications continue to develop, the emergence of a subsequent media format feels inevitable. If photo is a more expressive version of text, and video is a more expressive version of photo, what is the more expressive version of video? I think the answer exists somewhere in the realm of immersion. What does this mean? A content format, interactive video, that is driven by immediate feedback and shifts dynamically toward outcomes, either predefined or newly generated, that are determined by user engagement / user request. Interactive video in this sense is best understood as a compression of game design. More precisely, what are the elements of long form gameplay that can define a short form media package?

  • Reward function: Part of the consumption experience is working to achieve the reward. This requires that the asset is interactive (I perform action X and the format responds with action Y, i.e. dynamic outcomes) and that the prizes (likes, objects, etc.) are well defined

  • Predefined goals: A standardized set of actions / decision paths to follow. This also helps define the creator experience, since creators need to understand the package they are meant to create and distribute

  • Generative experiences: Generative and / or immersive environments (immersion only happens in spatial), i.e. worldbuilding as a version of the above reward function
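To make the first two components a bit more concrete, here is a minimal sketch of how a short form interactive asset might be modeled: a handful of clips connected by viewer actions, each action carrying a reward. Everything here is an assumption for illustration (the class names, the `act` method, the clip file names); it is not a description of any existing platform or format. The generative component is left out, though a scene's clip could in principle come from a model rather than a file.

```python
from dataclasses import dataclass, field


@dataclass
class Scene:
    """One short clip plus the viewer actions it accepts (hypothetical model)."""
    clip: str
    # Predefined goals: action name -> (next scene id, reward earned)
    choices: dict[str, tuple[str, int]] = field(default_factory=dict)


@dataclass
class InteractiveAsset:
    """A short form package: a small graph of scenes and a running score."""
    scenes: dict[str, Scene]
    current: str
    score: int = 0

    def act(self, action: str) -> str:
        """Apply a viewer action and return the clip that should play next."""
        scene = self.scenes[self.current]
        if action not in scene.choices:
            # Unknown action: stay on the current clip, no reward.
            return scene.clip
        next_id, reward = scene.choices[action]
        self.current = next_id
        self.score += reward  # reward function: the prize is well defined
        return self.scenes[next_id].clip


def demo_asset() -> InteractiveAsset:
    """Build a three-scene example asset (file names are placeholders)."""
    return InteractiveAsset(
        scenes={
            "intro": Scene("intro.mp4", {"tap": ("win", 10), "swipe": ("exit", 0)}),
            "win": Scene("win.mp4"),
            "exit": Scene("exit.mp4"),
        },
        current="intro",
    )
```

In this sketch the creator's job is to author the scene graph, which is one way of reading "understanding the package they are meant to create and distribute."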

What is notable about each of these components is that none of them would emerge from improved cellular bandwidth alone. This makes the progression to interactive media different from the earlier progressions through text, image and video. It also explains why a new format has not emerged in ~10 years (Musical.ly, now TikTok, started in 2014). Standing up any of the components above as a working content package will additionally require piping together a variety of AI models (see things like ComfyUI), likely running on device, and, in the case of immersion, a spatial computing experience enabled by hardware. With some of the more recent tiny model releases like Apple's OpenELM, and what people in the open source community are doing with llama.cpp, quantization techniques, etc., we are closer to this being both accessible and feasible.

Someone told me recently about a meditation application that uses computer vision to monitor the end user while they meditate, dynamically shifting the experience (music, audio, etc.) when the user drifts from focus… it was described as so accurate at measuring deviations from focus that it felt like meditation at gunpoint (this was articulated positively). This experience feels close to what I'd imagine a more generalizable media format would look like: LLMs / computer vision, faster data pass through, a dynamic experience that evolves based on user action, and an immersive environment.

There are perhaps more early / skeleton examples of this dynamic playing out on the existing content platforms (the markets want it…). NPC streaming and vtubing come to mind: an immediate feedback / reward function as demanded by the user, with the response in this case human generated rather than machine generated. Another emergent area that feels close to this vision is natural language prompted immersive experiences on Vision Pro.

Every new content format commands a net new content platform, which defines a new distribution function for a new set of creators. I am searching for the platform that sets out to package this next generation media experience, one that looks much more like an active consumption game than passive entertainment. This could be mobile native or exist on a headset. Still a lot to be defined.

