00:03 - 00:06
all right buckle up for this one we're
00:04 - 00:08
diving into research where a team
00:06 - 00:11
simulated 9500 years of Driving
00:08 - 00:13
Experience 9500 yeah and they did it
00:11 - 00:16
without using any data from you know
00:13 - 00:18
real human drivers wow so no learning
00:16 - 00:20
from our messy human habits just pure AI
00:18 - 00:21
learning from scratch right it's all
00:20 - 00:24
thanks to their system called Giga flow
00:21 - 00:26
which trains AI drivers entirely in a
00:24 - 00:28
simulated world so like a giant virtual
00:26 - 00:30
driving school but all the students and
00:28 - 00:34
the instructors are AI exactly they create
00:30 - 00:36
this simplified virtual world right okay
00:34 - 00:39
and they put up to 150 AI-controlled agents
00:36 - 00:41
in there cars trucks bikes pedestrians
00:39 - 00:43
wow all learning from each other the
00:41 - 00:46
scale of it is unbelievable yeah how big
00:43 - 00:49
are we talking over 1.6 billion kilometers of
00:46 - 00:51
driving simulated hold on 1.6 billion
00:49 - 00:53
that's like farther than the distance
00:51 - 00:55
from the Sun to Saturn it is how long
00:53 - 00:59
did that even take to run well here's
00:55 - 01:01
the kicker Giga flow simulates 360,000 times
00:59 - 01:03
faster than real time wait what so all
01:01 - 01:04
those thousands of years of Driving
01:03 - 01:06
Experience it took a little over a week
01:04 - 01:09
on a single computer with
01:06 - 01:12
8 GPUs okay back up how can you simulate
01:09 - 01:13
driving that much faster than real time
01:12 - 01:16
is this some kind of Time Warp not time
01:13 - 01:18
travel just really clever optimizations
01:16 - 01:19
they use techniques like batched
01:18 - 01:21
simulation where thousands of
01:19 - 01:23
simulations run at the same time on
01:21 - 01:24
different gpus so like having tons of
01:23 - 01:26
virtual drivers learning at once yeah
01:24 - 01:27
kind of like an army of them all
01:26 - 01:29
learning and evolving together that
01:27 - 01:31
makes sense but wouldn't running all
01:29 - 01:33
those simulations still require a ton of
01:31 - 01:35
computing power surprisingly no they
01:33 - 01:37
designed Giga flow to be very
01:35 - 01:39
cost-effective the whole simulation costs
01:37 - 01:42
less than $5 per million
01:39 - 01:43
kilometers simulated seriously that's
01:42 - 01:44
cheaper than a cup of coffee all right
01:43 - 01:47
so they've got this huge simulation
01:44 - 01:48
running but how does one AI learn to
01:47 - 01:51
control all these different types of
01:48 - 01:53
vehicles do they have to train separate
01:51 - 01:55
AIs for each like one for cars one for
01:53 - 01:56
trucks one for bikes no they don't and
01:55 - 01:59
that's what's so cool about this one
01:56 - 02:00
single trained AI policy can control any
01:59 - 02:01
of the vehicles
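To make that concrete, here is a minimal sketch of how one policy can drive any agent type by taking a conditioning input alongside its observations. It assumes a PyTorch-style setup; the layer sizes, the field layout of the conditioning vector, and names like ConditionedPolicy are illustrative, not the actual Giga flow architecture.

import torch
import torch.nn as nn

class ConditionedPolicy(nn.Module):
    # One shared network for every agent type; the description of the agent
    # being controlled is simply extra input features.
    def __init__(self, obs_dim=64, cond_dim=8, act_dim=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + cond_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, act_dim),  # e.g. steering, acceleration, braking
        )

    def forward(self, obs, agent_spec):
        # agent_spec encodes what is being embodied right now: length, width,
        # top speed, and a one-hot class (car / truck / bike / pedestrian).
        return self.net(torch.cat([obs, agent_spec], dim=-1))

# Same weights, different conditioning vectors, different behavior.
policy = ConditionedPolicy()
obs = torch.randn(2, 64)                              # two agents' observations
car = torch.tensor([[4.5, 1.8, 38.0, 1., 0., 0., 0., 0.]])
truck = torch.tensor([[12.0, 2.5, 25.0, 0., 1., 0., 0., 0.]])
actions = policy(obs, torch.cat([car, truck], dim=0))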
02:00 - 02:03
really it just adjusts its parameters
02:01 - 02:05
depending on what it's controlling no
02:03 - 02:07
need to train separate AIs so it's like
02:05 - 02:08
this AI just understands how to move
02:07 - 02:11
through the world no matter what it's
02:08 - 02:13
driving yeah pretty much fascinating but
02:11 - 02:16
how do you even start training an AI
02:13 - 02:18
like that with a pretty simple reward
02:16 - 02:20
system actually the AI gets rewards for
02:18 - 02:21
reaching its goal it's penalized for
02:20 - 02:23
things like collisions and it's
02:21 - 02:26
encouraged to follow traffic laws and
02:23 - 02:28
stay in its Lane so just basic safe
02:26 - 02:30
driving stuff yeah but here's the thing
02:28 - 02:32
about self-play okay as AI agents
02:30 - 02:34
interact and learn from each other they
02:32 - 02:36
start to do much more complex things
02:34 - 02:37
things that go beyond just following the
02:37 - 02:42
rules oh interesting like what kind of
02:40 - 02:44
things like making unprotected left
02:42 - 02:47
turns in heavy traffic oh yeah those are
02:44 - 02:50
tricky or navigating through bottlenecks
02:47 - 02:52
or merging smoothly onto a highway wow
02:50 - 02:54
these are all things that require like
02:52 - 02:56
real understanding of driving right
02:54 - 02:58
anticipating what others will do making
02:56 - 03:00
split-second decisions and you're seeing
02:58 - 03:02
these behaviors just emerge emerged they
03:00 - 03:03
weren't programmed in exactly it's kind
03:02 - 03:06
of like watching a kid learn to ride a
03:03 - 03:08
bike at first they're wobbly and unsure
03:06 - 03:10
right but with practice they just
03:08 - 03:11
develop this like natural flow and
03:10 - 03:13
confidence that's what's happening here
03:11 - 03:15
with these AI drivers so this is more
03:13 - 03:17
like I don't know a virtual evolution of
03:15 - 03:19
driving intelligence that's a great way
03:17 - 03:20
to put it and it all happens really fast
03:19 - 03:22
because of something called Advantage
03:20 - 03:24
filtering Advantage filtering remind me
03:22 - 03:27
what that is again it basically makes
03:24 - 03:29
the AI focus on the most important parts
03:27 - 03:30
of the learning process like think about
03:29 - 03:33
it a lot of the time you spend
03:30 - 03:35
driving is pretty uneventful right
03:33 - 03:38
driving on a straight empty road it
03:35 - 03:39
doesn't really teach you much true so
03:38 - 03:41
they filter out all those boring Parts
03:39 - 03:43
the AI focuses on the situations that
03:41 - 03:45
are more challenging like near
03:43 - 03:46
collisions or tricky Maneuvers ah so
03:45 - 03:48
it's like a highlight reel of the
03:46 - 03:50
important stuff which speeds up the
03:48 - 03:51
learning how do they even decide which
03:50 - 03:54
parts are valuable they use a metric
03:51 - 03:56
called Advantage it basically measures
03:54 - 03:57
how much better or worse the ai's action
03:56 - 03:59
was compared to all the other things it
03:57 - 04:01
could have done the experiences with a
03:59 - 04:03
High Advantage either positive or
04:01 - 04:05
negative are the ones they focus on for
04:03 - 04:08
learning so the AI isn't just learning
04:05 - 04:09
from its mistakes no it's also learning
04:08 - 04:11
from the times that it made a really good
04:09 - 04:14
decision exactly it's learning from both
04:11 - 04:16
the successes and the failures and focusing
04:14 - 04:18
on the experiences that really make a
04:16 - 04:20
difference in its driving ability
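Since advantage filtering comes up again later, here is a rough sketch of the idea in code. The one-step advantage estimate and the threshold rule below are simplifying assumptions for illustration; the conversation doesn't specify exactly how the real system estimates advantages or decides what to keep.

import numpy as np

def one_step_advantage(rewards, values, next_values, gamma=0.99):
    # A_t = r_t + gamma * V(s_{t+1}) - V(s_t): how much better the outcome was
    # than what the value function expected from that state.
    return rewards + gamma * next_values - values

def filter_by_advantage(batch, advantages, threshold=0.1):
    # Keep only transitions where the action really mattered, positively or
    # negatively; uneventful cruising (advantage near zero) is dropped before
    # the policy update, so learning focuses on the impactful moments.
    keep = np.abs(advantages) > threshold
    return {key: value[keep] for key, value in batch.items()}, advantages[keep]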
04:18 - 04:23
welcome back let's take a closer look at
04:20 - 04:25
what makes Giga flow so impressive like
04:23 - 04:26
from a technical perspective all right
04:25 - 04:29
you mentioned before how they process a
04:26 - 04:30
crazy amount of data in this simulation
04:29 - 04:33
how much are we talking about here oh tons
04:30 - 04:35
we're talking 4.4 billion State
04:33 - 04:37
transitions per hour okay I'm not even
04:35 - 04:39
sure what that means it basically means
04:37 - 04:41
they're simulating about 42 years of
04:39 - 04:45
Driving Experience every hour on that
04:41 - 04:47
computer 42 years every hour that's wild
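Those headline figures hang together; here is a quick arithmetic check using only the numbers quoted in this conversation (the implied average speed at the end is derived here, not a figure from the paper).

HOURS_PER_YEAR = 24 * 365.25                 # ~8,766 hours

speedup = 360_000                            # simulated hours per wall-clock hour
years_per_hour = speedup / HOURS_PER_YEAR
print(round(years_per_hour, 1))              # ~41.1 -> "about 42 years every hour"

total_years = 9_500
wall_clock_days = total_years / years_per_hour / 24
print(round(wall_clock_days, 1))             # ~9.6 -> "a little over a week"

total_km = 1.6e9
avg_speed_kmh = total_km / (total_years * HOURS_PER_YEAR)
print(round(avg_speed_kmh, 1))               # ~19 km/h implied average speed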
04:45 - 04:49
how is that even possible what's the
04:47 - 04:51
secret sauce well it's a few things
04:49 - 04:54
working together they've got a really
04:51 - 04:57
fast batched simulator a small but
04:54 - 05:00
powerful AI policy and a super efficient
04:57 - 05:01
training algorithm all right break that
05:00 - 05:03
down for me let's start with the
05:01 - 05:06
simulator what makes it so fast so
05:03 - 05:08
picture this Giga flow is running
05:06 - 05:11
38,400 simulated environments all at
05:08 - 05:13
once wow across eight gpus each
05:11 - 05:15
environment can have up to 150 vehicles
05:13 - 05:17
in it okay and all the basic
05:15 - 05:18
calculations like figuring out what
05:17 - 05:20
actions to take and updating the
05:18 - 05:22
positions of all the cars they do that
05:20 - 05:24
in batches I see so they're really
05:22 - 05:27
taking advantage of the parallel
05:24 - 05:29
processing power of those gpus right
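A rough sketch of what batched simulation means in practice: the whole population of agents lives in a few big arrays, and one vectorized update advances every environment at once. This is illustrative only (NumPy on CPU, a toy kinematic update); the real simulator runs on GPUs and its state layout isn't described here.

import numpy as np

NUM_ENVS, MAX_AGENTS, DT = 38_400, 150, 0.1

# One big set of arrays instead of 38,400 separate simulators.
pos = np.zeros((NUM_ENVS, MAX_AGENTS, 2))      # x, y for every agent everywhere
heading = np.zeros((NUM_ENVS, MAX_AGENTS))
speed = np.zeros((NUM_ENVS, MAX_AGENTS))

def step(pos, heading, speed, accel, yaw_rate, dt=DT):
    # accel and yaw_rate are (NUM_ENVS, MAX_AGENTS) arrays: the policy is
    # queried for every agent in one batch, and the physics update is one
    # vectorized operation with no per-environment Python loop.
    speed = np.clip(speed + accel * dt, 0.0, None)
    heading = heading + yaw_rate * dt
    velocity = speed[..., None] * np.stack([np.cos(heading), np.sin(heading)], axis=-1)
    return pos + velocity * dt, heading, speed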
05:27 - 05:31
it's like having a whole Army of virtual
05:29 - 05:33
drivers all learning and making
05:31 - 05:36
decisions simultaneously but how do they
05:33 - 05:38
even manage all that data wouldn't it be
05:36 - 05:40
super chaotic that's where spatial
05:38 - 05:42
hashing comes in spatial hashing yeah
05:40 - 05:45
it's kind of like a super efficient GPS
05:42 - 05:46
system for all those virtual drivers oh
05:45 - 05:49
okay it keeps track of everyone's
05:46 - 05:51
locations speeds and surroundings and
05:49 - 05:53
because the maps used in the simulations
05:51 - 05:55
can get pretty big they pre-compute all
05:53 - 05:57
the observations about the environment
05:55 - 05:59
and store it in this hash ah so it's
05:57 - 06:00
like a shared mental map for all the AI
06:00 - 06:04
constantly being updated exactly and it
06:02 - 06:06
makes looking up information during the
06:04 - 06:08
simulation super fast
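Spatial hashing itself is a simple idea; here is a minimal sketch, with the cell size and bucket contents as assumptions (per the description above, the same kind of structure can also hold the precomputed map observations).

from collections import defaultdict

CELL = 20.0  # metres per grid cell (illustrative)

def cell_of(x, y):
    return (int(x // CELL), int(y // CELL))

def build_hash(agents):
    # agents: iterable of (agent_id, x, y). Bucketing every agent by grid cell
    # means "who is near me?" never requires scanning the whole map.
    grid = defaultdict(list)
    for agent_id, x, y in agents:
        grid[cell_of(x, y)].append(agent_id)
    return grid

def neighbors(grid, x, y):
    # Only the 3x3 block of cells around the query point is examined.
    cx, cy = cell_of(x, y)
    return [agent_id
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            for agent_id in grid.get((cx + dx, cy + dy), ())]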
06:06 - 06:11
so you've got this massive simulation running smoothly but
06:08 - 06:12
how can one AI Control so many different
06:11 - 06:14
agents each with their own unique
06:12 - 06:16
properties yeah that was my question too
06:14 - 06:18
the Giga flow policy itself is pretty
06:16 - 06:20
amazing it's this single neural network
06:18 - 06:22
but it's surprisingly small only 6
06:20 - 06:24
million parameters and the reason it can
06:22 - 06:26
control all these different agents is
06:24 - 06:28
that it's um
06:26 - 06:29
parameterized parameterized what does
06:28 - 06:31
that even mean are you saying the
06:29 - 06:34
same network can drive a car a truck and
06:31 - 06:36
a pedestrian yep it's kind of like
06:34 - 06:38
giving the AI different instructions
06:36 - 06:40
depending on what it's controlling so it
06:38 - 06:41
gets this input a conditioning input
06:40 - 06:43
that tells it what type of agent it's
06:41 - 06:44
embodying at that moment and that
06:43 - 06:46
changes how it behaves without
06:44 - 06:48
retraining the whole network right no
06:46 - 06:49
retraining needed that's really
06:48 - 06:52
efficient so it's like the AI has this
06:49 - 06:54
core understanding of movement and then
06:52 - 06:55
it adapts that based on what it's
06:54 - 06:57
controlling exactly and the single
06:55 - 07:00
policy architecture is one of the big
06:57 - 07:02
reasons the system is so fast one batch
07:00 - 07:05
processing step handles millions of
07:02 - 07:07
agents at once wow okay that covers the
07:05 - 07:08
simulator and the policy what about the
07:07 - 07:10
third element you mentioned the high
07:08 - 07:12
throughput training algorithm right they
07:10 - 07:15
use a version of an algorithm called
07:12 - 07:16
proximal policy optimization but they've
07:15 - 07:19
tweaked it to work even better with
07:16 - 07:20
their setup that's the filtering part
07:19 - 07:22
right you got it they Incorporated
07:20 - 07:24
Advantage filtering directly into the
07:22 - 07:25
training process can you walk me through
07:24 - 07:27
how that works again I'm still trying to
07:25 - 07:28
wrap my head around it sure remember how
07:27 - 07:30
we talked about how a lot of driving is
07:28 - 07:32
kind of boring like just cruising along
07:30 - 07:35
a straight road yeah well Advantage
07:32 - 07:36
filtering takes that idea even further
07:35 - 07:38
it's not just about skipping over the
07:36 - 07:40
boring Parts it's about focusing on the
07:38 - 07:43
parts where the ai's decisions actually
07:40 - 07:44
matter like those close calls or those
07:43 - 07:46
difficult Maneuvers so instead of
07:44 - 07:48
learning from every second the AI is
07:46 - 07:50
learning from the most impactful moments
07:48 - 07:51
right it makes learning much faster and
07:50 - 07:53
more efficient how do they decide which
07:51 - 07:55
moments are the most impactful though
07:53 - 07:57
they use that Advantage metric basically
07:55 - 07:59
it measures how good or bad the ai's
07:57 - 08:00
choice was compared to all the other options it could have taken
08:00 - 08:03
the moments with a high Advantage
08:01 - 08:04
whether positive or negative get more
08:03 - 08:07
emphasis during
08:04 - 08:08
training so it's not just about learning
08:07 - 08:10
from mistakes it's about learning from
08:08 - 08:12
the really good choices too exactly it's
08:10 - 08:14
about optimizing the learning process to
08:12 - 08:15
get the most out of the simulation
08:14 - 08:17
that's really clever so we've
08:15 - 08:19
got the simulator producing all this
08:17 - 08:22
data Advantage filtering is making sure
08:19 - 08:24
the AI learns efficiently but what about
08:22 - 08:25
the neural network itself what's going
08:24 - 08:27
on under the hood there well the neural
08:25 - 08:29
network is designed specifically for the
08:27 - 08:31
challenges of driving it takes in tons
08:29 - 08:33
of information about the environment and
08:31 - 08:35
uses that to make decisions what kind of
08:33 - 08:38
information are we talking about well it
08:35 - 08:40
gets basic stuff like the car's speed its
08:38 - 08:42
position relative to the Lane the
08:40 - 08:44
curvature of the road so the basic
08:42 - 08:46
physics of driving essentially right but
08:44 - 08:48
it also gets more complex information
08:46 - 08:51
like a detailed map of the surrounding
08:48 - 08:52
area Lane markings Road boundaries
08:51 - 08:54
traffic lights so it's like it has a
08:52 - 08:56
mental map of the whole environment
08:54 - 08:58
exactly and it's not just a static map
08:56 - 09:00
the AI also gets information about the
08:58 - 09:03
movements of other agents in the
09:00 - 09:04
environment cars pedestrians cyclists so
09:03 - 09:06
it's aware of everything that's moving
09:04 - 09:07
around it that seems like a lot to
09:06 - 09:09
process it is but the network is
09:07 - 09:11
designed to handle it efficiently for
09:09 - 09:14
example instead of keeping the entire
09:11 - 09:16
map in memory it uses these pre-computed
09:14 - 09:17
representations that it can access
09:16 - 09:20
really quickly based on the ai's current
09:17 - 09:21
location so it's only focusing on the
09:20 - 09:23
relevant parts of the map at any given
09:21 - 09:25
time yeah exactly they use similar
09:23 - 09:27
techniques to keep track of the other
09:25 - 09:28
agents in the environment so the AI
09:27 - 09:31
knows where everyone is and how they're
09:28 - 09:33
moving yes but it doesn't know things
09:31 - 09:35
like their goals or intentions that's
09:33 - 09:36
important because in the real world you
09:35 - 09:38
don't know what other drivers are
09:36 - 09:41
thinking right right so the AI has to
09:38 - 09:42
learn to predict their behavior based on
09:41 - 09:44
what it can actually see just like a
09:42 - 09:46
human driver that makes sense so they've
09:44 - 09:48
created this system where the AI has to
09:46 - 09:50
learn to drive defensively and
09:48 - 09:52
anticipate the actions of others exactly
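Putting the pieces of that description together, each agent's observation might be structured roughly like this. The field names and grouping are assumptions made for illustration; the key point from the conversation is that other agents appear only through their observable motion, never their goals or intentions.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class NearbyAgent:
    rel_position: Tuple[float, float]    # where it is, relative to me
    rel_velocity: Tuple[float, float]    # how it is moving right now
    size: Tuple[float, float]            # its footprint; NOT its destination or plan

@dataclass
class Observation:
    speed: float                         # ego kinematics
    lane_offset: float                   # position relative to the lane
    road_curvature: float
    map_features: List[float]            # precomputed local map: lanes, boundaries, lights
    others: List[NearbyAgent]            # observable state of nearby agents only
    goal_direction: Tuple[float, float]  # the agent knows its own goal, nobody else's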
09:50 - 09:54
and this ability to handle these
09:52 - 09:56
multi-agent interactions is what makes
09:54 - 09:58
Giga flow so robust this is also
09:56 - 10:00
impressive can we walk through how the
09:58 - 10:02
training actually works like how does
10:00 - 10:04
the AI actually learn to drive okay so
10:02 - 10:07
imagine the AI gets dropped into a
10:04 - 10:09
random spot on the map okay and it's
10:07 - 10:10
given a random destination so it's like
10:09 - 10:13
being thrown into a new city and told to
10:10 - 10:14
find your way somewhere right and the AI
10:13 - 10:16
has to figure out how to get there
10:14 - 10:19
safely efficiently following the rules
10:16 - 10:20
of the road navigating around all these
10:19 - 10:22
other agents that's quite the challenge
10:20 - 10:25
what happens next the AI uses its neural
10:22 - 10:27
network to decide what to do accelerate
10:25 - 10:29
brake turn left turn right some
10:27 - 10:31
combination of those okay and then the
10:29 - 10:33
simulator updates the world based on
10:31 - 10:35
that action so it's constantly making
10:33 - 10:36
decisions and seeing the results of
10:35 - 10:39
those decisions exactly and based on
10:36 - 10:41
what happens it gets either a reward or
10:39 - 10:42
a penalty remember rewards for reaching
10:41 - 10:44
goals penalties for things like
10:42 - 10:46
collisions so it's learning through
10:44 - 10:48
trial and error basically yeah but at a
10:46 - 10:50
massively accelerated Pace that's the
10:48 - 10:53
beauty of simulation it can cram years
10:50 - 10:55
of experience into just a few days
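The loop being described looks roughly like this when written out. Everything here is a stand-in: env, policy, and ppo_update represent the real components, and the reward weights are assumptions rather than the paper's actual values.

def reward(events):
    r = 0.0
    if events["reached_goal"]:
        r += 1.0                             # reward for reaching the goal
    if events["collision"]:
        r -= 1.0                             # penalty for collisions
    r -= 0.1 * events["off_lane"]            # encouraged to stay in its lane
    r -= 0.1 * events["traffic_violation"]   # encouraged to follow traffic laws
    return r

def train(env, policy, ppo_update, num_iterations, horizon):
    for _ in range(num_iterations):
        obs = env.reset()                    # random start positions, random goals
        rollout = []
        for _ in range(horizon):
            action = policy(obs)             # accelerate / brake / steer
            obs_next, events = env.step(action)
            rollout.append((obs, action, reward(events), obs_next))
            obs = obs_next                   # see the result, then decide again
        ppo_update(policy, rollout)          # adjust the network from the feedback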
10:53 - 10:57
okay so the AI is driving making decisions
10:55 - 10:58
getting feedback how does it actually
10:57 - 10:59
improve how does it get better at
10:58 - 11:02
driving that's where the reinforcement
10:59 - 11:03
learning part comes in the AI uses all
11:02 - 11:05
that feedback those rewards and
11:03 - 11:07
penalties to adjust its neural network
11:05 - 11:09
so it's constantly refining its ability
11:07 - 11:11
to choose actions that lead to good
11:09 - 11:13
outcomes it's like it's constantly
11:11 - 11:15
analyzing its own driving and making
11:13 - 11:17
adjustments to improve exactly and
11:15 - 11:19
because it's learning from such a
11:17 - 11:21
diverse set of experiences it becomes a
11:19 - 11:23
much more robust driver than systems
11:21 - 11:24
trained using traditional methods this
11:23 - 11:26
is starting to sound like we're on the
11:24 - 11:28
verge of creating some truly intelligent
11:26 - 11:30
self-driving cars but how does this all
11:28 - 11:32
translate to the real world that's
11:30 - 11:33
the real test right absolutely and
11:32 - 11:35
that's exactly what we're going to
11:33 - 11:36
explore next we'll see how Giga flow
11:35 - 11:38
performs when it's faced with the
11:36 - 11:40
complexities of real world driving
11:38 - 11:42
scenarios all right so we've spent the
11:40 - 11:44
last two parts learning about this Giga
11:42 - 11:46
flow system simulating thousands of
11:44 - 11:49
years of Driving Experience in a virtual
11:46 - 11:51
world but the big question is can this
11:49 - 11:53
AI trained entirely in simulation
11:51 - 11:55
actually handle driving in the real
11:53 - 11:57
world yeah that's the key right to find
11:55 - 12:00
out they tested Giga flow on three
11:57 - 12:03
different benchmarks for autonomous
12:00 - 12:04
driving okay CARLA nuPlan and Waymo's
12:03 - 12:06
Waymax I've heard of those they're not
12:04 - 12:09
easy tests are they nope not at all each
12:06 - 12:11
benchmark has its own challenges like
12:09 - 12:13
what well CARLA tests the AI on longer
12:11 - 12:16
driving scenarios with things like
12:13 - 12:19
pedestrians suddenly jaywalking or cars
12:16 - 12:21
swerving unexpectedly oh nuPlan uses
12:19 - 12:23
real world map data and has more
12:21 - 12:25
reactive less predictable traffic it's
12:23 - 12:27
closer to real driving conditions makes
12:25 - 12:30
sense and Waymo's Waymax what's that one
12:27 - 12:32
about Waymax uses a huge amount of real
12:30 - 12:34
world driving data from Waymo's
12:32 - 12:36
self-driving car so it's like a library
12:34 - 12:38
of the hardest and most varied driving
12:36 - 12:39
situations they've encountered so
12:38 - 12:41
they're throwing some really tough
12:39 - 12:44
challenges at this Ai and it's all data
12:41 - 12:46
from real-world systems exactly but they
12:44 - 12:48
didn't try to change Giga flow for these
12:46 - 12:49
tests right like fine-tune it no they
12:48 - 12:51
didn't they wanted to see how well it
12:49 - 12:53
could adapt to new situations so they
12:51 - 12:55
ran it zero shot meaning meaning Giga flow
12:53 - 12:57
had never seen any of the data from
12:55 - 12:59
these benchmarks before it was going in
12:57 - 13:01
totally blind wow that's a bold move
12:59 - 13:03
usually AIs are trained on the specific
13:01 - 13:05
benchmarks they're going to be tested on
13:03 - 13:07
so how did Giga flow do this is the
13:05 - 13:09
really cool part Giga flow even though
13:07 - 13:12
it was trained entirely in a simulation
13:09 - 13:14
it actually beat AI systems that were
13:12 - 13:17
specifically trained on real world data
13:14 - 13:19
on all three benchmarks seriously an AI
13:17 - 13:22
trained in a simulation outperformed
13:19 - 13:24
systems trained on real data how is that
13:22 - 13:26
even possible think about it Giga flow went
13:24 - 13:28
through so much training right billions
13:26 - 13:30
of kilometers thousands of years of
13:28 - 13:32
experience so it developed a really
13:30 - 13:34
Broad and adaptable understanding of
13:32 - 13:37
driving more so than those systems
13:34 - 13:39
trained on smaller real world data sets
13:37 - 13:41
Okay I could see that but did it
13:39 - 13:43
actually drive like a human or was it
13:41 - 13:45
just following the rules really rigidly
13:43 - 13:47
to find that out they looked closely at
13:45 - 13:49
how the AI was making decisions and they
13:47 - 13:50
found some amazing stuff Giga flow was
13:49 - 13:52
doing things that you'd expect from a
13:50 - 13:54
human driver making unprotected left
13:52 - 13:57
turns in busy traffic navigating
13:54 - 13:58
bottlenecks merging smoothly into
13:57 - 14:00
traffic so it wasn't just following a
13:58 - 14:03
set of rules it was making strategic
14:00 - 14:04
decisions based on the situation like a
14:03 - 14:06
person would yeah exactly and it wasn't
14:04 - 14:07
just reacting to what was right in front
14:06 - 14:10
of it either it was actually planning
14:07 - 14:12
ahead anticipating future events and
14:10 - 14:13
making decisions based on that you mean
14:12 - 14:15
like changing its route based on traffic
14:13 - 14:16
further down the road yeah things like
14:15 - 14:18
that that's really impressive what about
14:16 - 14:20
safety did it ever make any mistakes
14:18 - 14:22
with all this complex driving well one
14:20 - 14:24
of the most impressive findings was Giga
14:22 - 14:26
flow's safety record they ran these long
14:24 - 14:28
simulations designed to really push it
14:26 - 14:30
to its limits okay and the average time
14:28 - 14:34
between accidents for the Giga flow
14:30 - 14:36
agents was 17.5 years 17.5 years that's
14:34 - 14:37
way better than the average human driver
14:36 - 14:39
how do they measure that in the
14:37 - 14:41
simulation they set up a special testing
14:39 - 14:44
environment within Giga flow designed
14:41 - 14:45
for long-term safety they removed some
14:44 - 14:48
of the randomness from the training
14:45 - 14:50
environment sped up how fast the AI made
14:48 - 14:52
decisions to be more like real world
14:50 - 14:54
conditions and they really emphasized
14:52 - 14:56
safe driving in the reward system so it
14:54 - 14:59
was like a super intense driving exam
14:56 - 15:01
and Giga flow aced it exactly it was
14:59 - 15:03
incredibly stable consistently avoided
15:01 - 15:05
accidents which really shows how
15:03 - 15:07
powerful self-play and reinforcement
15:05 - 15:09
learning can be for creating safe AI
15:07 - 15:10
systems this is all amazing but
15:09 - 15:12
realistically are we going to have Giga
15:10 - 15:15
flow powered cars on the roads anytime
15:12 - 15:18
soon maybe not tomorrow but the research
15:15 - 15:20
is incredibly promising sure I mean
15:18 - 15:24
Giga flow was trained in a simp— oh wait
15:20 - 15:26
someone wants to join hey go for it hey
15:24 - 15:30
thank you for your sharing and can we
15:26 - 15:30
talk about the word initialization
15:34 - 15:36
absolutely I'm glad you asked
15:35 - 15:38
initialization is a really important
15:36 - 15:40
part of the process here yeah and it's a
15:38 - 15:42
great question because it really
15:40 - 15:44
highlights how they managed to make this
15:42 - 15:46
system so efficient so when we talk
15:44 - 15:48
about initialization in this context it
15:46 - 15:51
refers to how they set up the starting
15:48 - 15:52
conditions for each simulated Drive okay
15:51 - 15:54
so it's like where the AI drivers start
15:52 - 15:56
from and what their goals are exactly
15:54 - 15:58
they use two main techniques to speed
15:56 - 16:01
this up first they create a big pool of
15:58 - 16:03
random starting points and then they pull
16:01 - 16:05
from that pool as they need new initial
16:03 - 16:06
positions for each car oh so instead of
16:05 - 16:08
generating each starting position from
16:06 - 16:09
scratch every time they have this
16:09 - 16:13
buffer exactly that saves a lot of
16:11 - 16:15
computing time got it and then the
16:13 - 16:17
second part is how they place all the
16:15 - 16:19
cars without having them all start in a
16:17 - 16:20
pileup yeah that makes sense you
16:19 - 16:22
wouldn't want a massive crash before the
16:20 - 16:24
simulation even gets started right they
16:22 - 16:26
use something called sequential
16:24 - 16:28
rejection sampling where they add one
16:26 - 16:30
car at a time making sure it's not
16:28 - 16:34
colliding with any existing cars so it's
16:30 - 16:37
like carefully placing each car so that
16:34 - 16:39
everyone is starting in a safe spot
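In code, those two tricks might look something like the sketch below: a precomputed pool of candidate start states, and sequential rejection sampling so no one starts inside anyone else. The pool size, the minimum gap, and the assumption that a start state is just an (x, y) position are all simplifications for illustration.

import random

def build_start_pool(sample_start_state, pool_size=100_000):
    # Precompute a big pool of candidate starting states once, instead of
    # generating every start position from scratch during training.
    return [sample_start_state() for _ in range(pool_size)]

def distance(a, b):
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

def place_agents(pool, num_agents, min_gap=6.0):
    placed = []
    while len(placed) < num_agents:
        candidate = random.choice(pool)       # pull from the precomputed pool
        # Sequential rejection sampling: add one agent at a time, accepting a
        # candidate only if it doesn't collide with anyone already placed.
        if all(distance(candidate, other) > min_gap for other in placed):
            placed.append(candidate)
    return placed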
16:37 - 16:41
yeah these smart methods allowed them to keep
16:39 - 16:43
the simulation running smoothly without
16:41 - 16:45
constant crashes and restarts letting
16:43 - 16:47
them get to that massive scale we talked
16:45 - 16:49
about okay that makes sense so it's not
16:47 - 16:51
just about having a cool AI it's also
16:49 - 16:53
about setting up the simulation in a
16:51 - 16:55
Smart Way exactly now back to where we
16:53 - 16:57
were Giga flow was trained in a
16:55 - 16:59
simplified version of reality right so
16:57 - 17:01
there's still work to do to bridge the
16:59 - 17:03
gap between simulation and real world
17:01 - 17:05
driving so how do they do that how do
17:03 - 17:06
they make the jump from the virtual
17:05 - 17:08
world to the real world well the
17:06 - 17:10
researchers have some ideas for one they
17:08 - 17:13
want to use more realistic sensor data
17:10 - 17:15
in the simulations okay so instead of
17:13 - 17:17
having perfect information about the
17:15 - 17:20
environment the AI would have to make
17:17 - 17:22
decisions based on data from cameras lidar
17:20 - 17:23
you know things like that like a real
17:22 - 17:25
self-driving car so it would be like
17:23 - 17:28
experiencing the messiness of the real
17:25 - 17:30
world but still in a safe simulated
17:28 - 17:33
setting right another idea is to combine
17:30 - 17:34
self-play with real world driving data
17:33 - 17:37
interesting so not just one or the other
17:34 - 17:39
but a mix of both exactly like maybe use
17:37 - 17:42
imitation learning where the AI learns
17:39 - 17:45
by watching human drivers or incorporate
17:42 - 17:48
actual real world driving
17:45 - 17:50
scenarios so can we just wrap up and get
17:53 - 17:56
conclusion of course I understand you're
17:55 - 17:57
ready for the wrap up yeah we can
17:56 - 17:59
definitely get to that conclusion for
17:57 - 18:01
you now before we do though I think it's
17:59 - 18:03
worth briefly mentioning the other ideas
18:01 - 18:04
the researchers had yeah just to tie it
18:03 - 18:06
all together so they were thinking about
18:04 - 18:09
things like incorporating more real
18:06 - 18:11
world sensor data right so like camera
18:09 - 18:14
and lidar data and they also wanted to
18:11 - 18:16
explore combining the self-play with
18:14 - 18:18
actual driving
18:16 - 18:20
data exactly things like imitation
18:18 - 18:22
learning and incorporating real world
18:20 - 18:24
scenarios directly into the simulation
18:22 - 18:26
but okay with that said let's get to the
18:24 - 18:27
big takeaway this has been a deep dive
18:26 - 18:30
right absolutely we've covered so much
18:27 - 18:32
ground yeah from the crazy scale of the
18:30 - 18:33
Giga flow simulations where they racked
18:32 - 18:35
up thousands of years of Driving
18:33 - 18:37
Experience to how they used this
18:35 - 18:40
Innovative self-play method to train a
18:37 - 18:42
powerful AI and how that AI even though
18:40 - 18:44
it was trained only in a virtual world
18:42 - 18:46
actually outperformed systems that were
18:44 - 18:49
trained on real world data it really
18:46 - 18:52
does show the potential for self-play
18:49 - 18:54
and AI to revolutionize the future of
18:52 - 18:56
Transportation it really is incredible
18:54 - 18:58
This research shows that we're getting
18:56 - 19:00
closer to safe and reliable self-driving
18:58 - 19:02
cars and it's all thanks to this
19:00 - 19:04
Innovative system and the researchers
19:02 - 19:05
who developed it so that wraps up our
19:04 - 19:07
Deep dive for today thanks for joining
19:05 - 19:09
us and engaging with us yes it was great
19:07 - 19:10
to have you along for the ride keep
19:09 - 19:13
exploring keep learning and we'll see
19:10 - 19:15
you next time thanks again and goodbye
19:26 - 19:32
now ah yeah uh-huh
19:29 - 19:35
so the human data is not very important
19:38 - 19:41
per the paper that's a really insightful
19:40 - 19:43
point and it gets to the heart of This
19:41 - 19:46
research yeah it's not that human
19:43 - 19:48
driving data is completely unimportant
19:46 - 19:50
but this paper really challenges the
19:48 - 19:52
traditional idea that it's absolutely
19:50 - 19:54
essential right like usually you'd think
19:52 - 19:56
an AI would need to learn from tons of
19:54 - 19:59
human driving examples and while human
19:56 - 20:01
data can be helpful this work shows that
19:59 - 20:03
a lot of really robust and complex
20:01 - 20:05
driving behaviors can emerge purely from
20:03 - 20:10
self-play so the AI is learning from its
20:05 - 20:10
own experiences in the simulated world