[00:00] So Tencent has launched HunyuanVideo, a 13-billion-parameter open-source AI model for text-to-video generation. Now, this is something that nobody expected, because looking at the open-source ecosystem, especially for these kinds of videos, we wouldn't have expected the technology to get to this level. The quality is absolutely incredible; some of the shots we're seeing here, I wouldn't be surprised if some people even think they're from Sora. It's also remarkable because it basically means video prices are going to fall. With this level of quality, we have to ask ourselves: how are people going to justify paying hundreds and hundreds of dollars per month for a video system that you could literally run for free?

[00:54] Now, this video tool is really cool because it comes with a unique set of features. It's not just text-to-video; it has a decent number of really cool features that we're likely going to use. And the AI space is moving rapidly: it was only the other week that we got Genmo AI, which was also really fascinating. So we're going to take a look at some of the amazing features that this Tencent HunyuanVideo open-source video generator has to offer.
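
Since the weights are open, you can in principle try generation yourself. Here is a minimal sketch assuming the community diffusers integration of HunyuanVideo; the model id, classes, and memory-saving calls follow the diffusers documentation as I recall it, so treat it as illustrative and check the official repo for current usage.

import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video

# Community mirror of the weights on the Hugging Face Hub (an assumption;
# verify against the official Tencent/HunyuanVideo repository).
model_id = "hunyuanvideo-community/HunyuanVideo"

transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = HunyuanVideoPipeline.from_pretrained(
    model_id, transformer=transformer, torch_dtype=torch.float16
)
pipe.vae.enable_tiling()         # decode the video in tiles to save VRAM
pipe.enable_model_cpu_offload()  # keep only the active submodule on the GPU

frames = pipe(
    prompt="A panda riding a bike through the streets of London, cinematic",
    height=320,
    width=512,
    num_frames=61,
    num_inference_steps=30,
).frames[0]
export_to_video(frames, "panda.mp4", fps=15)

Even with offloading and tiled decoding, a 13-billion-parameter video model is heavy, so short, low-resolution clips are the realistic starting point on consumer hardware.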

01:24 - 01:28

they actually talk about is the fact

01:25 - 01:30

that they have really high quality

01:28 - 01:32

videos in this short clip here you can

01:30 - 01:34

see exactly how high quality their

01:32 - 01:36

footage and the Fidelity is I want to

01:34 - 01:38

say that looking over many different

01:36 - 01:40

clips although some of them are probably

01:38 - 01:43

going to be cherry-picked the quality

01:40 - 01:45

and consistency is quite remarkable I

01:43 - 01:47

don't notice that many mistakes in those

01:45 - 01:50

clips and in certain demonstrations the

01:47 - 01:53

quality often exceeds what we do get

01:50 - 01:55

from other certain areas now in addition

01:53 - 01:57

to this we also see that they do have

01:55 - 01:59

something called High Dynamics which is

01:57 - 02:01

really good because apparently it breaks

01:59 - 02:03

the curse of dynamic motions which means

02:01 - 02:05

that it completes the actions in one

02:03 - 02:07

shot so one thing that we do have when

02:05 - 02:09

we're trying to get these really Dynamic

02:07 - 02:11

motion field shots is that when you have

02:09 - 02:14

the camera panning from one thing to

02:11 - 02:16

another often times we can enter a

02:14 - 02:18

situation where the model won't actually

02:16 - 02:20

be able to generate the continuous

02:18 - 02:22

action correctly it may actually make

02:20 - 02:24

some mistakes and this is something that

02:22 - 02:26

really doesn't you know perform well

02:24 - 02:28

with the generative AI video so it seems

02:26 - 02:30

that with this model this is something

02:28 - 02:31

that they've managed to focus on and

02:30 - 02:33

something that they have apparently

02:31 - 02:35

managed to solve next we actually do

[02:33] Next, we take a look at what they call Artistic Shots, described as "breaking single camera movements, seamless integration of director-level camera work". I'm guessing this is a little different: what we have here is something where you can get multiple shots of the same person in any kind of style, which is really interesting. It makes me believe that people are now going to be able to get multiple shots of a certain character doing certain things, which leads to more artistic creativity that most people wouldn't otherwise have, and it levels the playing field for those who want to enter the space.

[03:12] Now, I couldn't help but notice that this specific example looks very familiar from another company's demo. If you remember the OpenAI demo, you'll remember seeing the same man with the same glasses and the same hat, sitting in a coffee shop, and I don't think the quality difference between the two is that far apart. Obviously Sora was really good, but we have to understand that this one is open source, only 13 billion parameters, and probably doesn't take as long as OpenAI's model. For something that you could likely run on local hardware, I think this is truly incredible. And remember, this is the worst the technology is ever going to be, which means that when we talk about future implications, the use cases and the things we could create for ourselves are going to be insane.
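
To put "local hardware" in perspective, here is a quick back-of-envelope estimate. These are my own numbers, not Tencent's: just the parameter count times two bytes for 16-bit weights.

# Back-of-envelope VRAM estimate (assumption: 13B parameters held in
# 16-bit precision; the text encoder, VAE, and activations add more).
params = 13e9
bytes_per_param = 2                              # fp16 / bf16
weight_gib = params * bytes_per_param / 1024**3
print(f"weights alone: ~{weight_gib:.0f} GiB")   # -> ~24 GiB

So the weights alone roughly fill a 24 GB card before a single frame is generated, which is why CPU offloading or quantization matters for running it at home.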

[04:08] Next, what we have is concept generalization, and this is of course where different things are combined with other things. Essentially it's stating that, if you want to, you can get really creative: you can have a panda riding a bike through the streets of London or Prague, or whatever it is you may fancy. I think this is really important for these models, because you have to have a rich understanding of objects and their relations to other objects.

[04:35] With physical compliance, basically what we have here is a system that understands enough of the physical properties of objects to model how they interact with each other. You can see it here with the water dropping and the ripples in the waves, which actually seem pretty accurate. And this isn't the only example of physical compliance: you need it for objects to interact with one another, and you need it to perform at a really high degree for the video footage to actually look real and legible, which is something a lot of video models struggle with. Now, I don't know of any model that does this consistently, but an open-source model with a high degree of it is going to be something people really value. The fact that they explicitly state they worked on this is a clear indicator that they've got themselves a good model.

[05:34] Now, what they also have is something I think is completely incredible: native camera cuts. Recently I did see this in OpenAI's Sora, but basically what it means is that the model natively cuts around the scene in order to generate a consistent storyline. This is remarkable, because if we can get native camera cuts in open-source models right now, what's stopping future models from creating an entire natively cut movie? Of course, you will have to prompt for it; this prompt has two levels to it, but I do think something like this is going to be really incredible for the wide range of use cases it can serve. I mean, the quality here is just outstanding: the way the sand is dropping, the way the sand is brushing against the dune hills, is completely outstanding. So hats off to them for this level of consistency, and these cuts are going to allow people to get a lot more out of these systems.
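
The exact "two-level" prompt isn't shown in the demo, but conceptually it would pair a global scene description with per-shot directions the model can cut between. The sketch below is purely illustrative; the real prompt format HunyuanVideo expects may differ.

# Hypothetical two-level prompt: a global description that pins down the
# scene, plus per-shot directions for the cuts. Illustrative only.
prompt = (
    "Global: a vast desert at golden hour, warm cinematic light, "
    "consistent color palette. "
    "Shot 1: wide aerial view of dunes, sand drifting off the crests. "
    "Shot 2: cut to a low close-up of sand cascading down a dune face."
)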

[06:34] Now, next what we have here is sound generation. What you're about to hear was generated by this model, which is able to generate sound based on certain prompts. I think this is really cool, because it shows that this is the kind of video creation tool that not only allows you to create really good videos, but also lets you create the things that assist you in that video creation.

[07:01] [Music] [Applause]

[07:17] Another interesting thing they have, and I have seen this before but it actually looks like a really effective method here, is motion-driven movement. There's some kind of reference motion footage capture, and they essentially use that footage to drive the motion of the image on the right; the final output, as you can see, is someone dancing. I do think that kind of thing is really fascinating, because it gives you a lot more control over what your character is doing and how you want the video to be. Stuff like that is really important for creative control over the final output, and an open-source video tool adding that in natively is going to be really effective. And of course, we also have reference-driven movement for these kinds of shots, which is once again something where you can capture a video of yourself, or use a video you found online, and then use that to drive the video of the character performing any kind of speech or any kind of shot.
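
As a conceptual sketch of that motion-driven workflow: pull frames (or, more realistically, pose cues) out of the reference footage, then condition generation on a still image plus that motion. Only the OpenCV calls below are real; the driving function is a hypothetical stand-in, since the transcript doesn't name the model's actual interface.

import cv2  # pip install opencv-python

def extract_motion_frames(reference_path: str, stride: int = 2) -> list:
    # Read every `stride`-th frame from the reference clip; a real pipeline
    # would typically run pose estimation on these rather than use raw frames.
    cap = cv2.VideoCapture(reference_path)
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % stride == 0:
            frames.append(frame)
        index += 1
    cap.release()
    return frames

def drive_image_with_motion(source_image: str, motion_frames: list) -> None:
    # Hypothetical stand-in: the real driving entry point lives in the
    # model's own tooling, not in this sketch.
    raise NotImplementedError("replace with the model's actual driving API")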

[08:14] So when we take a look at the overall system here, I think it shows a remarkable fact: the AI industry is getting a wave of different AI tools that are really going to change how this space shapes up. When open-source tools flood the market with this level of quality, prompt adherence, and character consistency, it's going to change the entire game for how AI content creation works. I mean, these video tools are going to allow people to create their own films, and it'll be really interesting to see how this kind of technology develops within the next couple of years.

The Future of Video Creation: Exploring Tencent's HunyuanVideo Model

In the realm of AI technology, Tencent has launched an open-source video generation model named HunyuanVideo, with an impressive 13 billion parameters. This unexpected release advances text-to-video generation, showcasing quality that rivals closed systems such as OpenAI's Sora. The implications are significant, particularly for the economics of video production.

Unveiling the Advanced Features of Tencent's HunyuanVideo Model

High-Quality Videos and Fidelity

Tencent's HunyuanVideo model sets a high bar for open-source video quality, with consistent, high-fidelity footage. Its ability to produce top-tier visuals marks a genuine step forward in AI-driven video generation.

High Dynamics for Seamless Action

By introducing High Dynamics, the model overcomes challenges related to dynamic motions, ensuring smooth and accurate action transitions. This feature enhances the realism and fluidity of generated videos.

Artistic Shots for Creative Expression

The inclusion of Artistic Shots enables diverse camera movements and styles, empowering users to explore imaginative storytelling possibilities. This feature paves the way for enhanced artistic creativity in video content creation.

Concept Generalization for Creative Fusion

With Concept Generalization, users can combine elements in unique and unconventional ways, fostering creativity and innovation. This feature expands the creative freedom of creators to craft engaging and imaginative narratives.

Physical Compliance for Realistic Interactions

The model's emphasis on Physical Compliance ensures realistic interactions between objects, enhancing the authenticity of generated videos. Accurate simulations of physical properties, like water ripples, elevate the realism of the visual content.

Native Camera Cuts for Seamless Storytelling

Through native camera cuts, the model intelligently navigates scenes to create a coherent storyline, enhancing narrative flow and continuity. This feature streamlines the video production process, resulting in a more engaging viewer experience.

Sound Generation for Enhanced Immersion

The model's ability to generate sound based on specific prompts enriches the audio-visual experience, offering a comprehensive tool for immersive video creation. The integration of sound enhances the overall quality and impact of the generated videos.

Motion-Driven Movement for Creative Control

Motion-Driven Movement allows users to manipulate character movements based on reference motion footage, granting greater creative control over video production. This dynamic feature enables users to customize actions and behaviors with precision.

The Evolution of AI Content Creation: A Glimpse into the Future

With Tencent's HunyuanVideo model leading the charge in open-source video generation, the industry is shifting toward accessible, high-quality content creation. Open-source tools like HunyuanVideo democratize the filmmaking process, empowering creators to produce professional-grade videos at little cost.

As the AI landscape continues to evolve, the proliferation of innovative tools and features promises a future where creativity knows no bounds. The blend of advanced technologies and artistic expression is reshaping the way videos are conceptualized and produced, setting the stage for a new era of content creation.

In conclusion, Tencent's HunyuanVideo model offers a glimpse of a future where AI-driven tools redefine the boundaries of creativity and innovation. The impact of these technologies on content creation could reshape the industry, moving toward a future where anyone can harness AI to tell their own stories.