When I was about 10 years old, my favorite book was A Mango-Shaped Space. I read it over and over. It's about a girl named Mia who has synesthesia, a world in which colors, sounds and shapes converge, and who feels so misunderstood that she hides it from everyone around her. I was extremely jealous, and desperate to understand that feeling.
With recent advancements in NLP and generative AI enabling new formats for expression, I've been thinking a lot about what it might mean for a product to translate the feeling of synesthesia (the interplay between the senses), or more generally to unlock a means of expressing the subconscious: a feeling or mood rendered in a familiar format. Midjourney and DALL-E are first-generation examples of what this might look like (input words, output imagery), enabling a shared visual understanding of something intangible, like a dream.
What happens when this abstracts slightly further? Input one format (words, image, sound) and output another sensory experience (scent, taste). Companies like Osmo (generative scent) are experimenting with this concept. One layer deeper, one might consider how a particular feeling or mood shapes that translation. How would the sound of a song translate to a visual configuration, based partly on BPM or lyrics but oriented more intensely around the felt or intended experience? How does a "vibe" turn into an interactive universe, or into something even more physically tangible like touch (material, product), or, as mentioned above, scent and taste? What does the Mona Lisa taste like? What does Bohemian Rhapsody look like? Chroma seems to be thinking about this audiovisual process in an interesting way.
To build a model that enables this, I'd imagine the input would be a kind of "genomic data" that maps the qualities of a medium to things like emotion, character arc and felt sense. How much of this requires human tagging or human reinforcement, versus how much GPT can understand on its own (possibly on a level we cannot entirely comprehend), is a question I can't answer.
I think these models will create new mediums around which platforms and networks can be built, enabling a next generation of artists, more expressive layers of communication, and better recommendation engines for content and products based on vibe rather than physical traits. Using the Mona Lisa example, what feels most commercially feasible in the near term is translating the input into a restaurant recommendation rather than a sensory profile (the two are obviously linked). Or a recommendation engine that analyzes the thematic content of a book to suggest the materials or color palette of an outfit. This week I met a founder who aims to build intelligent "vibe" profiles for exactly this purpose, which feels close in shape to what I'm picturing.
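To make the idea of recommending "based on vibe rather than physical traits" a little more concrete, here is a minimal sketch. It assumes every item, whatever its medium, can be scored along a shared set of emotional dimensions. The dimensions, item names and scores are all hypothetical and made up for illustration, not drawn from any real product or dataset. Once everything lives in the same "vibe" space, jumping from a book to an outfit becomes a nearest-neighbor lookup by cosine similarity.

```python
import math
from dataclasses import dataclass

# Hypothetical "vibe" dimensions; a real system would learn or human-tag these.
VIBE_DIMENSIONS = ("melancholy", "warmth", "energy", "mystery")


@dataclass(frozen=True)
class Item:
    name: str
    medium: str  # e.g. "book", "outfit", "restaurant"
    vibe: tuple[float, ...]  # one score per entry in VIBE_DIMENSIONS


def cosine_similarity(a, b):
    """Similarity of two vibe vectors, ignoring their overall magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(source, catalog, target_medium, k=1):
    """Return the k items in target_medium whose vibe is closest to source's."""
    candidates = [item for item in catalog if item.medium == target_medium]
    ranked = sorted(
        candidates,
        key=lambda item: cosine_similarity(source.vibe, item.vibe),
        reverse=True,
    )
    return ranked[:k]


# Made-up example profiles, purely for illustration.
book = Item("moody gothic novel", "book", (0.9, 0.2, 0.3, 0.8))
catalog = [
    Item("charcoal wool coat", "outfit", (0.8, 0.3, 0.2, 0.7)),
    Item("neon tracksuit", "outfit", (0.1, 0.6, 0.9, 0.1)),
    Item("candlelit wine bar", "restaurant", (0.7, 0.6, 0.2, 0.6)),
]

print([item.name for item in recommend(book, catalog, "outfit")])
```

With these invented scores the gothic novel maps to the charcoal coat rather than the tracksuit. The hard part this sketch skips entirely is the one raised above: producing trustworthy vibe scores in the first place, whether through human tagging or a model's own understanding.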
While artificial intelligence will inevitably enhance and shift workflows across every part of the economy as we know it (and I'm interested in those use cases too), I'm most excited about the things we couldn't do before because of the constraints of technology. What net-new behaviors and formats will emerge?