00:00 - 00:06
so Tencent has launched HunyuanVideo a 13
00:03 - 00:09
billion parameter open-source AI model
00:06 - 00:11
for text to video generation now this is
00:09 - 00:13
something that nobody expected because
00:11 - 00:16
when we take a look at the open-source
00:13 - 00:18
ecosystem especially for these kinds of
00:16 - 00:20
videos we wouldn't have expected that
00:18 - 00:23
the technology would have gotten to the
00:20 - 00:25
level where you can see the quality is
00:23 - 00:27
absolutely incredible I mean some of the
00:25 - 00:29
shots that we are seeing here I wouldn't
00:27 - 00:32
be surprised if some people even think
00:29 - 00:34
that it is from Sora now this is
00:32 - 00:36
something that is really really
00:34 - 00:38
incredible also for the fact that this
00:36 - 00:40
basically means that video
00:38 - 00:42
prices are going to fall I mean when we
00:40 - 00:43
take a look at the kind of level of
00:42 - 00:45
quality we have here we have to ask
00:43 - 00:47
ourselves how are people going to be
00:45 - 00:49
justifying paying you know hundreds and
00:47 - 00:51
hundreds of dollars per month for a
00:49 - 00:54
video system that you could literally
00:51 - 00:56
run for free now this video tool is
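As a quick back-of-the-envelope sketch of what "running it for free locally" would actually cost in memory (this is my own arithmetic, not Tencent's published requirements, and it covers only the weights, not activations or the text encoder and VAE):

```python
# Rough VRAM estimate for a 13B-parameter model's weights alone.
# Real usage adds activations, the text encoder, the VAE, and
# framework overhead, so treat these as lower bounds.
def weight_memory_gb(params_billions: float, bytes_per_param: int) -> float:
    return params_billions * 1e9 * bytes_per_param / 1e9

fp16_gb = weight_memory_gb(13, 2)   # half precision
int8_gb = weight_memory_gb(13, 1)   # 8-bit quantized
print(f"fp16 weights: ~{fp16_gb:.0f} GB, 8-bit: ~{int8_gb:.0f} GB")
```

So even in half precision the weights alone sit around 26 GB, which is why quantized variants matter for consumer GPUs.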
00:54 - 00:59
really really cool because it actually
00:56 - 01:01
comes with a number of unique features
00:59 - 01:03
and it's not just text-to-video it
01:01 - 01:05
actually has a decent amount of really
01:03 - 01:08
cool features that we're likely going to
01:05 - 01:10
use so I want to say that the AI space
01:08 - 01:12
is moving really rapidly because it was
01:10 - 01:14
only the other week that we got Genmo AI
01:12 - 01:16
that was also really fascinating so
01:14 - 01:18
we're going to take a look at some of
01:16 - 01:21
the amazing features that this Tencent
01:18 - 01:24
HunyuanVideo open-source video generator
01:21 - 01:25
has to offer so one of the things that
01:24 - 01:28
they actually talk about is the fact
01:25 - 01:30
that they have really high quality
01:28 - 01:32
videos in this short clip here you can
01:30 - 01:34
see exactly how high quality their
01:32 - 01:36
footage and fidelity are I want to
01:34 - 01:38
say that looking over many different
01:36 - 01:40
clips although some of them are probably
01:38 - 01:43
going to be cherry-picked the quality
01:40 - 01:45
and consistency is quite remarkable I
01:43 - 01:47
don't notice that many mistakes in those
01:45 - 01:50
clips and in certain demonstrations the
01:47 - 01:53
quality often exceeds what we get
01:50 - 01:55
from certain other models now in addition
01:53 - 01:57
to this we also see that they do have
01:55 - 01:59
something called High Dynamics which is
01:57 - 02:01
really good because apparently it breaks
01:59 - 02:03
the curse of dynamic motions which means
02:01 - 02:05
that it completes the actions in one
02:03 - 02:07
shot so one thing that we do have when
02:05 - 02:09
we're trying to get these really dynamic
02:07 - 02:11
motion-filled shots is that when you have
02:09 - 02:14
the camera panning from one thing to
02:11 - 02:16
another often times we can enter a
02:14 - 02:18
situation where the model won't actually
02:16 - 02:20
be able to generate the continuous
02:18 - 02:22
action correctly it may actually make
02:20 - 02:24
some mistakes and this is something that
02:22 - 02:26
generative AI video often struggles
02:24 - 02:28
with so it seems
02:26 - 02:30
that with this model this is something
02:28 - 02:31
that they've managed to focus on and
02:30 - 02:33
something that they have apparently
02:31 - 02:35
managed to solve next we actually do
02:33 - 02:38
take a look at this thing that they call
02:35 - 02:40
artistic shots which it says breaking
02:38 - 02:42
single-camera movements seamless
02:40 - 02:44
integration of director-level camera
02:42 - 02:46
work so I'm guessing that this is a
02:44 - 02:48
little different so what we have here is
02:46 - 02:51
something where you can actually get
02:48 - 02:53
multiple shots of the same person in any
02:51 - 02:55
kind of style which is really really
02:53 - 02:57
interesting because this is something
02:55 - 02:59
that makes me believe that now people
02:57 - 03:01
are going to be able to get multiple
02:59 - 03:02
shots of a certain character doing
03:01 - 03:05
certain things which leads to more
03:02 - 03:07
artistic creativity which of course most
03:05 - 03:08
people won't originally have which
03:07 - 03:10
levels the
03:08 - 03:12
playing field for those who want to
03:10 - 03:15
enter the space now I couldn't help but
03:12 - 03:16
notice that this specific example
03:15 - 03:19
actually looks like a very familiar
03:16 - 03:21
example from another company's
03:19 - 03:24
demo now if you remember the OpenAI
03:21 - 03:26
demo you'll remember seeing the same man
03:24 - 03:29
that had the same glasses the same hat
03:26 - 03:31
and was actually at a coffee shop and
03:29 - 03:33
you can see that the quality difference
03:31 - 03:35
between the two isn't that
03:33 - 03:38
large obviously I mean Sora was really
03:35 - 03:40
good but let's take a look at this here
03:38 - 03:42
guys we have to understand that this one
03:40 - 03:44
was you know open source and only 13
03:42 - 03:46
billion parameters and probably doesn't
03:44 - 03:48
take as long as OpenAI's so for something
03:46 - 03:51
that you could likely run on some local
03:48 - 03:53
hardware I think this is truly truly
03:51 - 03:56
incredible and remember guys this is you
03:53 - 03:57
know the worst the technology is ever going
03:56 - 03:59
to be which means that when we talk
03:57 - 04:01
about future implications this is going
03:59 - 04:03
to be something that is you know insane
04:01 - 04:05
when we think about the use cases for
04:03 - 04:08
the things that we could create for
04:05 - 04:10
ourselves now next what we do have is
04:08 - 04:12
the concept generalization and this is
04:10 - 04:14
of course where we do have different
04:12 - 04:16
things combined with other things
04:14 - 04:17
essentially just stating that look if
04:16 - 04:20
you want to you can get really creative
04:17 - 04:22
and you can combine a panda riding a
04:20 - 04:25
bike cycling through the streets of
04:22 - 04:27
London or Prague or whatever it is that
04:25 - 04:28
you may fancy so I think this is
04:27 - 04:29
something that is really important for
04:28 - 04:32
these models because you have to
04:29 - 04:35
have a rich understanding of objects and
04:32 - 04:37
the relation to other objects too so
04:35 - 04:39
with physical compliance basically what
04:37 - 04:42
we have here is the system that has
04:39 - 04:43
enough of the physical properties to
04:42 - 04:45
understand how objects interact with
04:43 - 04:48
each other so this is something where
04:45 - 04:50
you can see we have the water dropping
04:48 - 04:52
and the ripples in the waves that
04:50 - 04:54
actually seem to be pretty pretty
04:52 - 04:57
accurate now this isn't the only example
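For context on what "physical compliance" means here: ripple propagation is something you would classically simulate explicitly, whereas a video model has to learn it implicitly from data. A minimal classical baseline, one explicit step of the 1-D wave equation u_tt = c² · u_xx (purely illustrative, not anything from the model):

```python
# One explicit finite-difference step of the 1-D wave equation,
# the classical way to produce the ripple behavior seen in the demo.
def wave_step(u_prev, u_curr, c=1.0, dt=0.1, dx=1.0):
    n = len(u_curr)
    u_next = [0.0] * n
    r2 = (c * dt / dx) ** 2  # Courant number squared
    for i in range(1, n - 1):
        u_next[i] = (2 * u_curr[i] - u_prev[i]
                     + r2 * (u_curr[i + 1] - 2 * u_curr[i] + u_curr[i - 1]))
    return u_next

# A single "droplet" disturbance in the middle of a still surface:
u0 = [0.0] * 11
u1 = [0.0] * 11
u1[5] = 1.0
u2 = wave_step(u0, u1)
print(u2)  # the disturbance starts spreading to neighboring points
```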
04:54 - 04:59
of physical compliance you actually need
04:57 - 05:01
physical compliance in order
04:59 - 05:03
for objects to interact
05:01 - 05:06
with each other and of course you
05:03 - 05:08
need that to perform at a really high
05:06 - 05:10
degree in order for the video footage to
05:08 - 05:11
actually look real and for it to
05:10 - 05:13
actually look believable and that's
05:11 - 05:15
something that you know a lot of video
05:13 - 05:16
models struggle with now I do know that
05:15 - 05:19
there's no model that does this
05:16 - 05:21
consistently but an open source model
05:19 - 05:24
that does have a high degree of this is
05:21 - 05:26
going to be something that people really
05:24 - 05:28
really do value so the team actually
05:26 - 05:30
stating that they ensured
05:28 - 05:32
they worked on this is a clear
05:30 - 05:34
indicator that they've got themselves a
05:32 - 05:36
good model now what they also do have is
05:34 - 05:38
I think something that is completely
05:36 - 05:40
incredible so they have something that
05:38 - 05:42
they call native camera cuts now
05:40 - 05:44
recently this was something that I did
05:42 - 05:46
see in OpenAI's Sora but basically
05:44 - 05:49
what this means is that the model
05:46 - 05:51
actually natively cuts around the scene
05:49 - 05:52
in order to generate a consistent
05:51 - 05:54
storyline I mean this is something that
05:52 - 05:56
is completely incredible because it
05:54 - 05:58
shows us that you know I guess if we can
05:56 - 06:00
get these native camera cuts in models
05:58 - 06:01
that are open source right now
06:00 - 06:04
what's stopping future
06:01 - 06:06
models from being able to create an entire
06:04 - 06:08
movie with native camera cuts now of course
06:06 - 06:10
you will have to prompt it this prompt
06:08 - 06:12
has two levels to it
06:10 - 06:14
but I do think something like this is
06:12 - 06:15
going to be really really incredible for
06:14 - 06:17
the wide range of use cases that it can
06:15 - 06:20
have I mean something like this is just
06:17 - 06:21
really really outstanding the quality
06:20 - 06:24
the way the sand is dropping the way the
06:21 - 06:26
sand is brushing against the dune hills
06:24 - 06:28
I mean that is just completely
06:26 - 06:30
outstanding so hats off to them for this
06:28 - 06:32
level of consistency and the fact that
06:30 - 06:34
these cuts are going to allow people to
06:32 - 06:36
get a lot more out of these systems so
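The "two levels" of prompting mentioned above can be pictured as a global scene description plus per-shot directions the model cuts between. The exact format HunyuanVideo expects may differ; this is just a hypothetical sketch of the idea:

```python
# Hypothetical two-level prompt: one global scene description plus
# numbered per-shot directions for the model to cut between.
# The concrete prompt format is an assumption, not Tencent's spec.
def build_prompt(global_desc: str, shots: list[str]) -> str:
    lines = [f"Scene: {global_desc}"]
    for i, shot in enumerate(shots, 1):
        lines.append(f"Shot {i}: {shot}")
    return "\n".join(lines)

prompt = build_prompt(
    "golden sand dunes at sunset, cinematic lighting",
    ["wide shot of wind brushing sand off a dune crest",
     "close-up of sand grains streaming downhill"],
)
print(prompt)
```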
06:34 - 06:39
now next what we actually have here is
06:36 - 06:42
we have sound generation so what you're
06:39 - 06:44
about to hear is the generation for this
06:42 - 06:46
model which is able to generate sound
06:44 - 06:48
based on certain prompts and I think
06:46 - 06:50
this is really cool because it shows us
06:48 - 06:52
that this is the kind of video creation
06:50 - 06:54
tool that not only allows you to create
06:52 - 06:56
really good videos but allows you to
06:54 - 07:00
create things that can assist you in
06:56 - 07:00
that video creation
07:17 - 07:20
another interesting thing that they do
07:19 - 07:22
have and I have seen this before but
07:20 - 07:24
this actually looks like a really
07:22 - 07:27
effective method is that they do have
07:24 - 07:28
motion driven movement so there's some
07:27 - 07:30
kind of you know reference motion
07:28 - 07:33
footage capture and they essentially use
07:30 - 07:35
that footage to drive the motion on the
07:33 - 07:38
image on the right and then of course
07:35 - 07:40
the final output is you can see someone
07:38 - 07:42
that is dancing now I do think that kind
07:40 - 07:44
of thing is really fascinating because
07:42 - 07:46
it gives you a lot more control over
07:44 - 07:48
what your character is doing and how you
07:46 - 07:50
want the video to be so stuff like that
07:48 - 07:52
is really really important for the
07:50 - 07:54
creative control over the final output
07:52 - 07:56
and an open source video tool adding
07:54 - 07:57
that in natively is something that's
07:56 - 07:59
really really going to be effective and
07:57 - 08:01
of course we do have reference-driven
07:59 - 08:03
movement for these kinds of shots which is
08:01 - 08:05
going to be once again something where
08:03 - 08:07
you can capture a video of yourself use
08:05 - 08:09
a video you found online and then use
08:07 - 08:11
that to drive the video for the
08:09 - 08:14
character performing any kind of speech
08:11 - 08:16
or any kind of shot so when we do take a
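The motion-driven idea described above boils down to extracting per-frame pose keypoints from reference footage and mapping them onto the target character. Real pipelines use a pose estimator plus a conditioned video generator; this toy only shows the retargeting arithmetic, and every name in it is illustrative:

```python
# Toy motion transfer: scale 2-D keypoints captured from reference
# footage onto a target character with a different body scale.
def retarget(frames, ref_scale, target_scale):
    """frames: list of frames, each a list of (x, y) keypoints."""
    ratio = target_scale / ref_scale
    return [[(x * ratio, y * ratio) for (x, y) in frame] for frame in frames]

# Two frames of reference keypoints (e.g. head and hand positions):
reference = [[(0.0, 0.0), (10.0, 20.0)], [(1.0, 2.0), (11.0, 22.0)]]
driven = retarget(reference, ref_scale=20.0, target_scale=10.0)
print(driven[0])  # keypoints rescaled to the target character
```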
08:14 - 08:19
look at the overall system here I think
08:16 - 08:21
it shows a remarkable fact the AI
08:19 - 08:22
industry is getting a wave of different
08:21 - 08:25
AI tools that are going to really change
08:22 - 08:27
how this space shapes up to be when we
08:25 - 08:29
do have open source tools flooding the
08:27 - 08:31
market in terms of quality adherence
08:29 - 08:33
and character consistency it's going to
08:31 - 08:35
really change the entire game for how AI
08:33 - 08:37
content creation is going to be I mean
08:35 - 08:38
these video tools are going to allow
08:37 - 08:40
people to create their own films and
08:38 - 08:42
it'll be really interesting to see how
08:40 - 08:45
this kind of technology develops within
08:42 - 08:45
the next couple of years