Google's Genie Creates Playable Worlds from Sketches
The field of generative AI has seen tremendous advances in recent years, with models capable of generating remarkably realistic images, videos, and text. However, most of these models focus on passive generation from a prompt. In a new paper, researchers from DeepMind introduce a different paradigm: generative interactive environments.
Imagine sketching a whimsical landscape on a napkin over lunch and, by evening, stepping into it as a playable 2D world. That's not a page from a sci-fi novel; it's the premise behind Google's latest AI marvel, Genie.
Unlike the magical beings of lore, this Genie doesn't grant three wishes but offers endless possibilities to creators, transforming mere images into interactive experiences. Trained on a vast trove of gameplay footage, Genie crafts worlds more aligned with classic platformers than VR, but its implications ripple far beyond gaming.
The model can take a text or image prompt and generate an entire playable, game-like environment. What's more, Genie is trained without any action labels or supervision, using only raw internet videos of people playing games. This allows it to learn in a completely unsupervised manner, opening up the possibility of internet-scale training.
Under the hood, Genie consists of three core components: a video tokenizer, a latent action model, and a dynamics model. The tokenizer compresses the raw video frames into discrete tokens. The latent action model then infers a discrete set of "actions" between frames, despite no ground truth being available. Finally, the dynamics model takes the frame tokens and latent actions as input, and predicts the next frame in an autoregressive manner.
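To make the data flow between the three components concrete, here is a toy sketch in Python. It is not Genie's implementation: each function is a hand-written stand-in for a learned model, and the 1D "frames", the three-entry SHIFTS codebook, and all function names are assumptions made up for illustration. The real dynamics model also conditions on a history of frames, while this sketch uses only the most recent one.

```python
from typing import List

# Hypothetical latent action codebook: index 0 = left, 1 = stay, 2 = right.
SHIFTS = [-1, 0, 1]


def tokenize(frame: List[int], levels: int = 4) -> List[int]:
    """Stand-in for the video tokenizer: bucket 0-255 pixels into discrete tokens."""
    return [pixel * levels // 256 for pixel in frame]


def shift(tokens: List[int], offset: int) -> List[int]:
    """Cyclically shift tokens to the right by `offset` (to the left if negative)."""
    n = len(tokens)
    return [tokens[(i - offset) % n] for i in range(n)]


def infer_latent_action(prev_tokens: List[int], next_tokens: List[int]) -> int:
    """Stand-in for the latent action model.

    No action labels are given; we pick the codebook index whose effect
    best explains the transition between two consecutive frames.
    """
    def mismatches(action: int) -> int:
        predicted = shift(prev_tokens, SHIFTS[action])
        return sum(p != q for p, q in zip(predicted, next_tokens))

    return min(range(len(SHIFTS)), key=mismatches)


def predict_next(tokens: List[int], action: int) -> List[int]:
    """Stand-in for the dynamics model: next frame tokens from tokens plus action."""
    return shift(tokens, SHIFTS[action])


def rollout(first_frame: List[int], actions: List[int]) -> List[List[int]]:
    """Autoregressive generation: each predicted frame feeds the next step."""
    tokens = tokenize(first_frame)
    frames = [tokens]
    for action in actions:
        tokens = predict_next(tokens, action)
        frames.append(tokens)
    return frames


if __name__ == "__main__":
    frame_a = [0, 0, 255, 0, 0]   # a bright "character" in the middle
    frame_b = [0, 0, 0, 255, 0]   # the same character one step to the right
    action = infer_latent_action(tokenize(frame_a), tokenize(frame_b))
    print("inferred latent action index:", action)
    for step in rollout(frame_a, [2, 2, 0]):
        print(step)
```

At play time, the user's button presses take the place of the inferred actions: the prompt image becomes the first frame, and each chosen action index drives one more step of the rollout.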
A key innovation is Genie's use of a spatiotemporal transformer architecture. By applying self-attention along the spatial and temporal dimensions separately, rather than across every token in the video at once, Genie can efficiently model long video sequences. Experiments confirm that Genie scales well as more parameters and data are added, culminating in an 11-billion-parameter model trained on over 200,000 hours of gaming videos.
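A rough back-of-the-envelope calculation shows why splitting attention this way helps. The Python sketch below counts query-key pairs for full attention versus attention factorized into a per-frame spatial pass and a per-position temporal pass. The function names and the example sizes (16 frames, 256 tokens per frame) are illustrative assumptions, not figures from the paper, and the count ignores causal masking and constant factors.

```python
def full_attention_pairs(num_frames: int, tokens_per_frame: int) -> int:
    """Every token attends to every other token in the whole clip."""
    total_tokens = num_frames * tokens_per_frame
    return total_tokens * total_tokens


def factorized_attention_pairs(num_frames: int, tokens_per_frame: int) -> int:
    """Spatial pass within each frame, then temporal pass at each position."""
    spatial = num_frames * tokens_per_frame ** 2   # one N x N block per frame
    temporal = tokens_per_frame * num_frames ** 2  # one T x T block per position
    return spatial + temporal


if __name__ == "__main__":
    frames, tokens = 16, 256  # hypothetical clip size
    full = full_attention_pairs(frames, tokens)
    factorized = factorized_attention_pairs(frames, tokens)
    print(f"full attention:       {full:,} pairs")
    print(f"factorized attention: {factorized:,} pairs")
    print(f"reduction:            {full / factorized:.1f}x")
```

With these sizes the dominant spatial term grows linearly with the number of frames instead of quadratically, which is what makes longer sequences affordable.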
The results are seriously impressive. Genie can take sketches, text descriptions, and even photorealistic images as prompts to generate interactive game worlds. The latent actions provide smooth control, moving characters and objects accordingly. One remarkable demonstration is Genie's ability to emulate parallax - foreground objects moving faster than distant background ones.
While limitations remain in consistency and speed, the authors argue that Genie opens up many exciting avenues for future work. It could be a general simulation engine for training reinforcement learning agents or robots. More broadly, by unlocking creative interactive experiences from any user's imagination, Genie points the way towards more humanistic generative AI.
As we marvel at Genie's potential to democratize game design, we might also ponder: How will this technology influence our perception of creativity and authorship? Are we edging closer to a world where our imaginations are the only limits, or will these tools reshape our very concept of creativity?
Read the full article on Tom's Guide.