Level Up

Text-to-image AI? Scratch that. Now AI can make full-blown video from words

Forget text-to-image AI. The next iteration of AI-generated content is going to be all about video, and TikTok powerhouse ByteDance is on board

Paige Cook

Published

November 25, 2022

Welcome to Magic Video, a text-to-video AI generation framework that enables a text description to be turned into a (currently) bite-sized video. Magic Video can generate photo-realistic video clips that promise to be highly relevant to the inputted text.

Text-to-video AI is something we have already seen with the likes of Meta’s Make-A-Video, which enables the creation of videos derived from just a few lines of text. Magic Video works by taking the inputted text and using it as keywords to find a meaningful sequence of frames. These are then put together, and voilà, you have your AI-generated video.

Magic Video works on a different framework from other platforms: it avoids the cascade diffusion pipeline and is instead based on latent diffusion models. Currently there are not a ton of details on the project, but the website features some example videos, and they are… interesting.

To be blunt, some of the videos have a rather creepy air, and the focus on high definition seems odd when you consider how some of these videos turn out. Many of the videos are also plastered with Shutterstock watermarks, indicating that, at least for now, source images are being pulled from there and then animated.

It’s interesting to see social giant ByteDance exploring these avenues, though it does make you ponder just what it’s now doing with billions of TikTok videos in the background.

HD: Is that high definition? Or highly disturbing?

One of my favourite videos on display has to be ‘face of a happy macho mature man looking at camera’, firstly because the description is something that only a machine wouldn’t find funny, and secondly because the rendered video is a nightmarish merge between a real man and some strange video game graphics. The uncanny valley has never been so real.

Another personal favourite would be ‘Woman in sunset’. Sounds simple, right? Well, the outcome is some super creepy eye movement and zero sunset. This isn’t to say they’re all bad, but the jump from single image (as per DALL-E) to video is certainly a trickier one, with wider margins for error.

Clever… But why?

So, what’s the point of all of this anyway? You can argue that text-to-image is a form of artistic creation, a way of seeing what these AI frameworks can conjure up from a simple idea. The same can be said for its video counterpart: it’s a way to show creative expression and grant people simple and quick tools for creation. It also speaks to the advancement of AI in general, and how these platforms can be used by creators.

But as with all things AI, it’s a double-edged sword. One could argue that text-to-video generation tools could be invaluable for artists, especially those who perhaps have disabilities that restrict their capabilities. The ability to make something ‘real’ from nothing is clearly next to miraculous. However, such tools can also be used to create content that is massively misleading and potentially dangerous. AI-generated imagery and deepfake ‘tech’ have already been used in rights-infringing uses of likenesses and, more dangerously, in the production of nonconsensual pornography.

While the work on advancing AI is an interesting concept and something that could prove beneficial for the future, the way it is handled today will require more moderation and careful control than perhaps any other tech that humans have so far developed.

It seems like getting to the end of this road without harmful content tarnishing the whole idea en route will be a feat as miraculous as AI video generation itself.


Written By Paige Cook

Paige Cook is a writer with a multi-media background. She has experience covering video games and technology and also has freelance experience in video editing, graphic design, and photography. Paige is a massive fan of the movie industry and loves a good TV show; if she is not watching something interesting then she's probably playing video games or buried in a good book. Her latest addiction is virtual photography, and she currently spends far too much time taking pretty pictures in games rather than actually finishing them.
