00:01 - 00:07
I want to thank the
00:03 - 00:10
organizers for choosing our paper for this
00:07 - 00:13
award it was very
00:10 - 00:16
nice and I also want to thank my
00:13 - 00:20
incredible co-authors and collaborators
00:16 - 00:22
Oriol Vinyals and Quoc Le, who stood right
00:20 - 00:26
before you a moment
00:22 - 00:31
ago and what you have here
00:26 - 00:31
is an image a screenshot
00:34 - 00:40
from a similar talk 10 years ago at
00:37 - 00:43
NeurIPS in 2014 in
00:40 - 00:46
Montreal and it was a much more innocent
00:43 - 00:49
time here we are shown in the
00:46 - 00:52
photos this is the before here's the
00:49 - 00:52
after by the
00:53 - 01:00
way and now we've gotten more
00:56 - 01:01
experienced, hopefully wiser
01:00 - 01:06
but here I'd like to talk a little bit
01:01 - 01:08
about the work itself and maybe a 10-year
01:06 - 01:12
retrospective on
01:08 - 01:16
it because a lot of the things in this
01:12 - 01:18
work were correct but some not so much
01:16 - 01:20
and we can review them and we can see
01:18 - 01:23
what happened and how it gently flowed
01:20 - 01:27
to where we are
01:23 - 01:30
today so let's begin by talking about
01:27 - 01:32
what we did and the way we'll do it
01:30 - 01:37
is by showing
01:32 - 01:40
slides from the same talk 10 years
01:37 - 01:42
ago but the summary of what we did is
01:40 - 01:45
the following three bullet points it's
01:42 - 01:47
an autoregressive model trained on text
01:45 - 01:50
it's a large neural network and it's a
01:47 - 01:54
large data set and that's it now let's
01:50 - 01:58
dive into the details a little bit
01:54 - 02:01
more so this was a slide 10 years ago
01:58 - 02:04
not too bad: the deep learning
02:01 - 02:07
hypothesis. And what we said here is that
02:04 - 02:08
if you have a large neural network with
02:08 - 02:14
10 layers, then it can do anything that a
02:11 - 02:16
human being can do in a fraction of a
02:14 - 02:19
second like why did we have this
02:16 - 02:21
emphasis on things that human
02:19 - 02:25
beings can do in a fraction of a second
02:21 - 02:28
why this thing specifically well if you
02:25 - 02:30
believe the deep learning dogma, so to
02:28 - 02:31
say that artificial neurons and
02:30 - 02:33
biological neurons are similar or at
02:31 - 02:35
least not too
02:33 - 02:37
different and you believe that real
02:35 - 02:40
neurons are slow, then anything that we
02:37 - 02:42
can do quickly by we I mean human beings
02:40 - 02:44
I even mean just one human in the entire
02:42 - 02:46
world if there is one human in the
02:44 - 02:48
entire world that can do some task in a
02:46 - 02:50
fraction of a second then a 10 layer
02:48 - 02:52
neural network can do it too right it
02:50 - 02:54
follows you just take their connections
02:52 - 02:57
and you embed them inside your neural net,
02:54 - 02:58
the artificial one so this was the
02:57 - 03:00
motivation anything that a human being
02:58 - 03:01
can do in a fraction of a second,
03:00 - 03:04
a big 10-layer neural network can do
03:01 - 03:06
too. We focused on 10-layer neural
03:04 - 03:08
networks because these were the neural
03:06 - 03:09
networks we knew how to train back in the
03:09 - 03:15
day. If you could go beyond 10
03:12 - 03:17
layers somehow then you could do more,
03:15 - 03:20
but back then we could only do 10 layers
03:17 - 03:23
which is why we emphasized whatever
03:20 - 03:25
human beings can do in a fraction of a second.
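To make the slow-neuron argument concrete, here is a rough back-of-the-envelope sketch; the numbers are illustrative assumptions, not figures from the talk.

```python
# Illustrative arithmetic behind the "fraction of a second" emphasis (assumed
# order-of-magnitude numbers, not from the talk): biological neurons fire at
# most on the order of ~100 times per second, so a ~0.1 s human reaction
# leaves room for only about 10 sequential neuron-to-neuron steps -- roughly
# the depth of a 10-layer network.
firing_rate_hz = 100      # assumed upper bound on biological firing rate
reaction_time_s = 0.1     # "a fraction of a second"
sequential_steps = firing_rate_hz * reaction_time_s
print(f"~{sequential_steps:.0f} sequential steps available")
```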
03:23 - 03:28
A different slide from the talk, a
03:25 - 03:29
slide which says our main idea and you
03:28 - 03:31
may be able to recognize two things or
03:29 - 03:33
at least one thing you might be able to
03:31 - 03:36
recognize that something autoregressive
03:33 - 03:38
is going on here what is it saying
03:36 - 03:41
really what does this slide really say
03:38 - 03:42
this slide says that if you have an autoregressive
03:42 - 03:48
model and it predicts the next token
03:45 - 03:50
well enough then it will in fact grab
03:48 - 03:52
and capture and grasp the correct
03:50 - 03:54
distribution over whatever over
03:52 - 03:57
sequences that come next and this was a
03:54 - 03:58
relatively new thing. It wasn't literally the first-
03:58 - 04:03
ever autoregressive neural network, but
04:01 - 04:05
I would argue it was the first autoregressive
04:03 - 04:07
neural network where we
04:05 - 04:09
really believed that if you train it
04:07 - 04:12
really well then you will get whatever
04:09 - 04:14
you want. In our case back then it was the
04:12 - 04:16
humble, today humble, then incredibly
04:14 - 04:18
audacious, task of translation.
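To make the autoregressive idea concrete, here is a minimal sketch; it is not the original seq2seq code, and the toy vocabulary, stand-in model, and function names are illustrative.

```python
# Minimal sketch of the autoregressive idea (illustrative, not the paper's code):
# a next-token model p(x_t | x_1..x_{t-1}), applied step by step, defines the
# full distribution over sequences p(x_1..x_T) = prod_t p(x_t | x_<t), so
# predicting the next token well enough captures the distribution over what
# comes next.
import numpy as np

VOCAB = ["<s>", "a", "b", "</s>"]  # hypothetical toy vocabulary

def next_token_probs(prefix):
    """Stand-in for a trained network: return p(token | prefix)."""
    # A real model (LSTM, Transformer, ...) would compute this from the prefix;
    # a fixed distribution keeps the sketch runnable.
    return np.array([0.0, 0.4, 0.4, 0.2])

def sample_sequence(max_len=10, seed=0):
    rng = np.random.default_rng(seed)
    seq = ["<s>"]
    for _ in range(max_len):
        probs = next_token_probs(seq)
        token = VOCAB[rng.choice(len(VOCAB), p=probs)]
        seq.append(token)
        if token == "</s>":
            break
    return seq

print(sample_sequence())
```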
04:16 - 04:20
Now I'm going to show you
04:18 - 04:22
some ancient history that many of you
04:20 - 04:24
might have never seen before it's called
04:24 - 04:31
LSTM. To those unfamiliar, an LSTM is the
04:28 - 04:32
thing that poor deep learning researchers did before
04:32 - 04:40
Transformers, and it's basically a
04:35 - 04:43
ResNet but rotated 90°. So that's an
04:40 - 04:45
LSTM, and it came before. It's
04:43 - 04:47
like, it's kind of like a slightly
04:45 - 04:51
more complicated ResNet. You can see there
04:47 - 04:53
is your integrator which is now called
04:51 - 04:55
the residual stream but you've got some
04:53 - 04:55
multiplication going on it's a little
04:55 - 05:01
complicated but that's what we did it
04:58 - 05:04
was a ResNet rotated 90°.
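To make the "ResNet rotated 90 degrees" comparison concrete, here is a minimal sketch of one LSTM step; the shapes and random weights are assumed for illustration, not the setup from the paper. The cell state is the additive integrator, today's "residual stream", and the gates are the extra multiplications.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step; W maps [x; h_prev] to the four gate pre-activations."""
    z = W @ np.concatenate([x, h_prev]) + b
    i, f, o, g = np.split(z, 4)
    # Additive cell-state update: the "integrator" / residual-stream analogue,
    # combined with the gating multiplications mentioned in the talk.
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

# Tiny usage example with random weights (purely illustrative).
rng = np.random.default_rng(0)
d_in, d_hidden = 3, 4
W = rng.normal(size=(4 * d_hidden, d_in + d_hidden))
b = np.zeros(4 * d_hidden)
h = c = np.zeros(d_hidden)
for _ in range(5):
    h, c = lstm_step(rng.normal(size=d_in), h, c, W, b)
print(h)
```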
05:01 - 05:05
Another cool feature from that old
05:04 - 05:09
talk that I want to highlight is that we
05:05 - 05:11
used parallelization but not just any
05:09 - 05:14
parallelization we used
05:11 - 05:16
pipelining, as witnessed by this: one layer per
05:16 - 05:22
GPU. Was it wise to pipeline? As we now
05:19 - 05:25
know pipelining is not wise but we were
05:22 - 05:28
not as wise back then so we used that
05:25 - 05:31
and we got a 3.5x speedup using eight GPUs.
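As a rough illustration of why naive pipelining disappoints, here is an idealized model with assumed numbers; it is not a reconstruction of the 2014 setup. With one stage per GPU, the pipeline only fills up when many microbatches are in flight, so the speedup stays well below the number of GPUs.

```python
# Not from the talk: a tiny idealized model of the pipeline "bubble". With G
# pipeline stages (one layer group per GPU) and B microbatches, speedup over a
# single device is at most G*B / (G + B - 1), well below G unless B >> G.
def pipeline_speedup(num_stages: int, num_microbatches: int) -> float:
    sequential_time = num_stages * num_microbatches      # one device does everything
    pipelined_time = num_stages + num_microbatches - 1   # fill + drain the pipeline
    return sequential_time / pipelined_time

for b in (1, 4, 8, 32):
    print(f"8 stages, {b:2d} microbatches -> {pipeline_speedup(8, b):.2f}x speedup")
```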
05:32 - 05:36
and the conclusion slide in some sense
05:35 - 05:39
the conclusion slide from the talk from
05:36 - 05:41
back then is the most important slide
05:39 - 05:45
because it spelled out what could
05:41 - 05:47
arguably be the beginning of the scaling
05:45 - 05:49
hypothesis right that if you have a very
05:47 - 05:51
big data set and you train a very big
05:49 - 05:54
neural network then success is
05:51 - 05:58
guaranteed and one can argue if one is
05:54 - 06:01
charitable that this indeed has been
05:58 - 06:05
what's been happening
06:01 - 06:07
I want to mention one other idea and
06:05 - 06:09
this is I claim the idea that truly
06:07 - 06:11
stood the test of time it's the core
06:09 - 06:14
idea of deep learning itself, it's the idea
06:11 - 06:17
of connectionism it's the idea that if
06:14 - 06:21
you allow yourself to
06:17 - 06:23
believe that an artificial neuron is
06:21 - 06:26
kind of sort
06:23 - 06:29
of like a biological
06:26 - 06:32
neuron right if you believe that one is
06:29 - 06:32
kind of sort of like the
06:32 - 06:37
other then it gives you the confidence
06:36 - 06:39
to believe that very large neural
06:37 - 06:41
networks they don't need to be literally
06:39 - 06:44
human brain scale they might be a little
06:41 - 06:46
bit smaller but you could configure them
06:44 - 06:50
to do pretty much all the
06:46 - 06:52
things that we human beings do. There's
06:50 - 06:55
still a difference, oh, I forgot, to the
06:52 - 06:57
end, there is still a difference because
06:55 - 07:00
the human brain also figures out how to
06:57 - 07:02
reconfigure itself whereas we are using
07:00 - 07:04
the best learning algorithms that we have, which
07:04 - 07:09
require as many data points as there are
07:07 - 07:10
parameters. Human beings are still better in this
07:10 - 07:19
regard. But what this led, so I claim,
07:16 - 07:21
arguably to the age of pre-training and
07:19 - 07:25
the age of pre-training is what we might
07:21 - 07:28
say the GPT-2 model, the GPT-3 model, the
07:25 - 07:31
scaling laws and I want to specifically
07:28 - 07:36
call out my former collaborators Alec
07:31 - 07:36
Radford, also Jared Kaplan, Dario
07:36 - 07:41
Amodei, for really making this
07:39 - 07:43
work but that led to the age of
07:41 - 07:45
pre-training and this is what's been the
07:43 - 07:48
driver of all of progress all the
07:45 - 07:49
progress that we see today: extra large
07:49 - 07:55
networks, extraordinarily large neural
07:51 - 07:58
networks trained on huge data
07:55 - 08:00
sets but pre-training as we know it will
07:58 - 08:04
unquestionably end
08:00 - 08:06
pre-training will end why will it end
08:04 - 08:09
because while compute is growing through
08:06 - 08:12
better hardware, better
08:09 - 08:15
algorithms, and larger clusters, right, all
08:12 - 08:17
those things keep increasing your
08:15 - 08:20
compute all these things keep increasing
08:17 - 08:23
your compute the data is not
08:20 - 08:26
growing because we have but one
08:23 - 08:28
internet we have but one
08:26 - 08:31
internet you could even say you can even
08:28 - 08:33
go as far as to say that data is the
08:31 - 08:36
fossil fuel of
08:33 - 08:39
AI it was like created
08:36 - 08:44
somehow and now we use
08:39 - 08:46
it and we've achieved Peak data and
08:44 - 08:48
there'll be no more we have to deal with
08:46 - 08:52
the data that we have. Now, it will still
08:48 - 08:56
let us go quite far, but this
08:52 - 09:00
is, there's only one
08:56 - 09:02
internet so here I'll take um a bit of
09:00 - 09:04
Liberty to speculate about what comes
09:02 - 09:06
next actually I don't need to speculate
09:04 - 09:08
because many people are speculating too
09:06 - 09:10
and I'll mention their
09:08 - 09:13
speculations you may have heard the
09:10 - 09:16
phrase agents it's common and I'm sure
09:13 - 09:18
that eventually something will happen
09:16 - 09:20
but people feel like agents are the
09:20 - 09:26
future. More concretely, but also a little
09:23 - 09:28
bit vaguely synthetic data but what does
09:26 - 09:30
synthetic data mean figuring this out is
09:28 - 09:32
a big challenge
09:30 - 09:35
and I'm sure that different people have
09:32 - 09:37
all kinds of interesting progress there
09:35 - 09:39
and inference-time compute, or maybe
09:37 - 09:42
what's been most recently most vividly
09:39 - 09:45
seen in o1, the o1 model, these are all
09:42 - 09:46
examples of things, of people trying to figure
09:46 - 09:50
out what to do after
09:48 - 09:52
pre-training, and those are all very good things to
09:52 - 09:58
do. I want to mention one other example
09:56 - 10:00
from biology which I think is really cool,
10:00 - 10:07
and the example is this. So about many,
10:04 - 10:09
many years ago at this conference also I
10:07 - 10:12
saw a talk where someone presented this
10:09 - 10:16
graph but the graph showed the
10:12 - 10:18
relationship between the size of the
10:18 - 10:24
body of a mammal and
10:21 - 10:27
the size of their brain in this case
10:24 - 10:29
it's in mass, and in that talk, I
10:27 - 10:31
remember vividly they were saying look
10:29 - 10:33
it's in biology everything is so messy
10:31 - 10:36
but here you have one rare example where
10:33 - 10:38
there is a very tight relationship
10:36 - 10:41
between the size of the body of the
10:38 - 10:43
animal and their brain and totally
10:41 - 10:46
randomly I became curious about this graph,
10:43 - 10:48
and, one of the early, so
10:46 - 10:50
I went to Google to do research, to
10:48 - 10:53
look for this graph, and one of the
10:50 - 10:55
images in Google Images was this, and
10:53 - 10:58
the interesting thing in this
10:55 - 11:00
image is you see like I don't know is
10:58 - 11:02
the mouse working, oh yeah, the mouse
11:00 - 11:05
is working great. So you've got these
11:02 - 11:07
mammals right all the different
11:05 - 11:10
mammals then you've got nonhuman
11:07 - 11:12
primates it's basically the same thing
11:10 - 11:16
but then you've got the hominids and to
11:12 - 11:19
my knowledge hominids are like close
11:16 - 11:22
relatives to the humans in
11:19 - 11:22
evolution like the
11:24 - 11:29
Neanderthals. There's a bunch of them, like, it's
11:27 - 11:33
called Homo habilis maybe, there's a whole
11:29 - 11:35
bunch and they're all here and what's
11:33 - 11:38
interesting is that they have a
11:35 - 11:40
different slope on their brain-to-body
11:40 - 11:46
exponent so that's pretty cool what that
11:43 - 11:49
means is that there is a precedent there
11:46 - 11:53
is an example
11:49 - 11:56
of biology figuring out some kind of
11:53 - 11:58
different scaling something clearly is
11:56 - 12:00
different so I think that is cool and by
11:58 - 12:02
the way I want to highlight that
12:00 - 12:06
this x-axis is log scale: you see, this is
12:02 - 12:12
100, this is 1,000, 10,000, 100,000, and
12:06 - 12:12
likewise in grams: 1 g, 10 g, 100 g, 1,000 g.
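For the curious, here is a small sketch of what "a different slope on a log-log plot" means quantitatively; the numbers are made up, not the data from the slide. Fitting a line to log(brain mass) versus log(body mass) recovers the scaling exponent, so a steeper line is a different exponent.

```python
# Illustrative only (synthetic numbers): on a log-log plot, the slope of
# log(brain mass) vs. log(body mass) is the scaling exponent.
import numpy as np

def scaling_exponent(body_g, brain_g):
    slope, _intercept = np.polyfit(np.log10(body_g), np.log10(brain_g), 1)
    return slope

body = np.array([10.0, 1e2, 1e3, 1e4, 1e5])   # grams, made up
brain_a = 0.05 * body**0.75                    # one made-up scaling law
brain_b = 0.05 * body**0.90                    # a steeper made-up one
print(scaling_exponent(body, brain_a))         # ~0.75
print(scaling_exponent(body, brain_b))         # ~0.90
```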
12:15 - 12:19
so it is possible for things to be
12:17 - 12:20
different the things that we are doing
12:19 - 12:22
the things that we've been scaling so
12:20 - 12:25
far is actually the first thing that we
12:22 - 12:28
figured out how to scale and without
12:25 - 12:30
doubt the field everyone who's working
12:28 - 12:33
here will figure out
12:30 - 12:36
what to do but I want to talk here I
12:33 - 12:38
want to take a few minutes and speculate
12:36 - 12:40
about the longer term the longer term
12:38 - 12:42
where are we all headed right we're
12:40 - 12:47
making all this progress it's an it's
12:42 - 12:49
astounding progress It's really I
12:47 - 12:51
mean those of you who have been in the
12:49 - 12:55
field 10 years ago and you remember just
12:51 - 12:58
how incapable everything has
12:55 - 13:00
been. Like, yes, you can say, even if you
12:58 - 13:02
kind of say, of course, it's deep learning, still, to
13:00 - 13:04
see it is just
13:02 - 13:06
unbelievable it's
13:04 - 13:07
completely I can't convey that feeling
13:07 - 13:11
you you know if you joined the field in
13:09 - 13:13
the last two years then of course you
13:11 - 13:15
speak to computers and they talk back to
13:13 - 13:18
you and they disagree and that's what
13:15 - 13:21
computers are but it hasn't always been
13:18 - 13:24
the case. But I want to talk a little
13:21 - 13:27
bit about superintelligence, just a bit,
13:24 - 13:29
because that is obviously where this field
13:27 - 13:32
is headed this is obviously what's being
13:29 - 13:33
built here. And the thing about super-
13:33 - 13:37
intelligence is that it will be
13:33 - 13:40
different qualitatively from what we
13:37 - 13:42
have. And my goal in the next minute is to give
13:42 - 13:47
you some concrete intuition of how it
13:45 - 13:48
will be different so that you yourself
13:47 - 13:51
could reason about
13:48 - 13:53
it so right now we have our incredible
13:51 - 13:54
language models and the unbelievable
13:53 - 13:57
chatbots, and they can even do things, but
13:54 - 13:59
they're also kind of strangely
13:57 - 14:01
unreliable and they get confused
13:59 - 14:04
when while also
14:01 - 14:06
having dramatically superhuman
14:04 - 14:09
performance on evals so it's really
14:06 - 14:13
unclear how to reconcile this but
14:09 - 14:15
eventually sooner or later the following
14:13 - 14:17
will be achieved those systems are
14:15 - 14:19
actually going to be agentic in a real
14:17 - 14:22
ways whereas right now the systems are
14:19 - 14:24
not agents in any meaningful sense just
14:22 - 14:27
very that might be too strong they're
14:24 - 14:30
very very slightly agentic just
14:27 - 14:31
beginning it will actually reason and by
14:30 - 14:35
the way I want to mention something
14:31 - 14:37
about reasoning is that a system that
14:35 - 14:40
reasons the more it reasons the more
14:37 - 14:41
unpredictable it becomes the more it
14:40 - 14:43
reasons the more unpredictable it
14:41 - 14:45
becomes. All the deep learning that we've
14:43 - 14:47
been used to is very predictable because
14:45 - 14:49
if you've been working on replicating
14:47 - 14:53
human intuition essentially it's like
14:49 - 14:55
the gut feeling, if you come back to the 0.1
14:53 - 14:59
second reaction time what kind of
14:55 - 15:02
processing we do in our brains well
14:59 - 15:05
it's our intuition so we've endowed
15:02 - 15:07
our AIs with some of that intuition, but
15:05 - 15:10
reasoning you're seeing some early signs
15:07 - 15:11
of that reasoning is unpredictable and
15:10 - 15:14
one reason to see that is because the
15:11 - 15:16
chess AIs, the really good ones, are
15:14 - 15:20
unpredictable to the best human chess
15:16 - 15:23
players so we will have to be dealing
15:20 - 15:25
with AI systems that are incredibly
15:23 - 15:27
unpredictable they will understand
15:25 - 15:29
things from limited data they will not
15:27 - 15:31
get confused all the things which are
15:29 - 15:34
really big limitations I'm not saying
15:31 - 15:36
how by the way and I'm not saying when
15:34 - 15:38
I'm saying that it
15:36 - 15:40
will and when all those things will
15:38 - 15:42
happen together with
15:40 - 15:45
self-awareness because why not
15:42 - 15:48
self-awareness is useful, it is part, we
15:45 - 15:50
ourselves are parts of our own world
15:48 - 15:52
models when all those things come
15:50 - 15:54
together we will have systems of
15:52 - 15:56
radically different qualities and
15:54 - 15:58
properties compared to those that exist today, and of
15:56 - 16:00
course they will have incredible and
15:58 - 16:01
amazing capabilities, but the kind of
16:00 - 16:03
issues that come up with systems like
16:01 - 16:06
this and I'll just leave it as an
16:03 - 16:08
exercise to um
16:06 - 16:12
imagine. It's very different from what we're used to,
16:12 - 16:18
and I would say that it's definitely
16:15 - 16:22
also impossible to predict the future
16:18 - 16:25
really all kinds of stuff is possible
16:22 - 16:30
but on this uplifting note I will
16:25 - 16:30
conclude thank you so much um
16:45 - 16:50
you um now in
16:47 - 16:52
2024 are there other biological
16:50 - 16:54
structures that are part of human
16:52 - 16:57
cognition that you think are worth
16:54 - 17:00
exploring in a similar way or that
16:57 - 17:00
you're interested in anyway
17:03 - 17:10
so the way I'd answer this
17:05 - 17:12
question is that if you are or someone
17:10 - 17:16
is a person who has a specific insight
17:12 - 17:18
about hey we are all being extremely
17:16 - 17:19
silly because clearly the brain does
17:18 - 17:21
something and we are
17:19 - 17:25
not and that's something that can be
17:21 - 17:28
done they should pursue it I personally
17:29 - 17:32
well depends on the level of abstraction
17:31 - 17:34
you're looking at maybe I'll answer it
17:32 - 17:38
this way like there's been a lot of
17:34 - 17:40
desire to make biologically inspired AI,
17:38 - 17:42
and you could argue on some level that
17:40 - 17:44
biologically inspired AI is incredibly
17:42 - 17:46
successful, which is that all of deep learning is
17:44 - 17:48
biologically inspired AI, but on the
17:46 - 17:50
other hand the biological inspiration
17:48 - 17:53
was very very very modest it's like
17:50 - 17:55
let's use neurons this is the full
17:53 - 17:56
extent of the biological inspiration
17:55 - 17:59
let's use
17:56 - 18:01
neurons. And more detailed biological
17:59 - 18:04
inspiration has been very hard to come
18:01 - 18:06
by but I wouldn't rule it out I think if
18:04 - 18:08
someone has a special insight, they might
18:06 - 18:10
be able to see something, and that would be
18:11 - 18:18
useful. I have a question for you, um,
18:14 - 18:19
about sort of autocorrect um so here is
18:18 - 18:23
here's the question you mentioned
18:19 - 18:25
reasoning as being um one of the core
18:23 - 18:29
aspects of maybe the modeling in the
18:25 - 18:31
future and maybe a differentiator
18:29 - 18:33
um what we saw in some of the poster
18:31 - 18:36
sessions is that hallucinations in
18:33 - 18:37
today's models are the way we're
18:36 - 18:39
analyzing I mean maybe you correct me
18:37 - 18:41
you're the expert on this but the way
18:39 - 18:43
we're analyzing whether a model is
18:41 - 18:45
hallucinating today without because we
18:43 - 18:47
know of the dangers of models not being
18:45 - 18:50
able to reason that we're using a
18:47 - 18:51
statistical analysis let's say some
18:50 - 18:54
amount of standard deviations or
18:51 - 18:57
whatever away from the mean in the
18:54 - 19:00
future wouldn't it would do you think
18:57 - 19:01
that a model given reasoning will be
19:00 - 19:04
able to correct itself sort of
19:01 - 19:07
autocorrect itself and that will be a
19:04 - 19:09
core feature of future models, so that
19:07 - 19:11
there won't be as many hallucinations
19:09 - 19:13
because the model will recognize when I
19:11 - 19:15
maybe that's too esoteric of a question
19:13 - 19:16
but the model will be able to reason and
19:15 - 19:18
understand when a Hallucination is
19:16 - 19:21
occurring does the question make sense
19:18 - 19:24
yes and the answer is also yes I think
19:21 - 19:27
what you described is extremely highly
19:24 - 19:29
plausible yeah I mean you should check I
19:27 - 19:31
mean for yeah it's I wouldn't I wouldn't
19:29 - 19:33
rule out that it might already be
19:31 - 19:36
happening with some of the you know
19:33 - 19:39
early reasoning models of today I don't
19:36 - 19:41
know but longer term why not yeah I mean
19:39 - 19:43
it's part of, like, Microsoft Word,
19:41 - 19:46
like autocorrect it's a you know it's a
19:43 - 19:48
it's a core feature yeah I just I mean I
19:46 - 19:51
think calling it autocorrect is really
19:48 - 19:55
doing it a disservice. I think, you are,
19:51 - 19:58
when you say autocorrect you evoke
19:55 - 19:59
like it's far grander than autocorrect
19:58 - 20:03
but other than but you know this point
19:59 - 20:03
aside the answer is yes thank
20:03 - 20:11
you. Hi Ilya, I loved the ending, uh,
20:08 - 20:13
mysteriously uh leaving out do they
20:11 - 20:15
replace us or are they you know Superior
20:13 - 20:19
do they need rights you know it's a new
20:15 - 20:21
species of Homo sapiens-spawned
20:19 - 20:24
intelligence so maybe they need I mean
20:21 - 20:26
uh I think the RL guy uh thinks they
20:24 - 20:30
think uh you know we need rights for
20:26 - 20:32
these things. I have an unrelated question to
20:30 - 20:35
that how do you how do you create the
20:32 - 20:39
right incentive mechanisms for Humanity
20:35 - 20:42
to actually create it in a way that
20:39 - 20:45
gives it the freedoms that we have as
20:45 - 20:50
sapiens? You know, I feel like, in
20:48 - 20:52
some sense, those are
20:50 - 20:55
the kind of questions that
20:52 - 20:59
people should be uh reflecting on
21:05 - 21:10
to your question about what incentive
21:07 - 21:12
structure should we create I I don't
21:10 - 21:15
feel that I know I don't feel confident
21:12 - 21:17
answering questions like this
21:15 - 21:20
because uh it's like you're talking
21:17 - 21:23
about creating some kind of a top down
21:20 - 21:26
structure government thing I don't know
21:23 - 21:29
it could be a cryptocurrency too yeah I
21:26 - 21:32
mean there's Bittensor, you know, those
21:29 - 21:34
things I don't feel like I am the right
21:32 - 21:38
person to comment on
21:34 - 21:38
cryptocurrency but
21:39 - 21:43
but you know there is a chance by the
21:42 - 21:46
way, what you're describing will
21:43 - 21:49
happen that indeed we will have you know
21:46 - 21:53
in some sense it's not a bad
21:49 - 21:55
end result if you have AIS and all they
21:53 - 21:58
want is to coexist with
21:55 - 22:00
us and also just to have rights maybe
21:58 - 22:03
that will be fine
22:00 - 22:05
it's but I don't know I mean I think
22:03 - 22:06
things are so incredibly unpredictable I
22:05 - 22:10
I hesitate to comment but I encourage
22:06 - 22:12
the speculation thank you uh and uh yeah
22:10 - 22:15
thank you for the talk it's really
22:12 - 22:17
awesome hi there thank you for the great
22:15 - 22:20
talk my name is shalev liit from
22:17 - 22:22
University of Toronto working with
22:20 - 22:26
Sheila thanks for all the work you've
22:22 - 22:29
done. I wanted to ask, do you think LLMs
22:26 - 22:31
generalize multi-hop reasoning out of
22:32 - 22:39
distribution? So, okay, the question
22:36 - 22:41
assumes that the answer is yes or no but
22:39 - 22:44
the question should not be answered with
22:41 - 22:46
yes or no because what does it mean out
22:44 - 22:48
of distribution generalization what does
22:46 - 22:49
it mean what does it mean in
22:48 - 22:52
distribution and what does it mean out
22:49 - 22:56
of distribution because it's a test of
22:52 - 22:59
time talk I'll say that long long
22:56 - 23:01
ago before people were using deep
22:59 - 23:04
learning they were using things like
23:01 - 23:06
string matching, n-grams, for machine
23:04 - 23:09
translation people were using
23:06 - 23:11
statistical phrase tables can you
23:09 - 23:15
imagine, they had tens of thousands of lines of
23:11 - 23:16
code of complexity, which was, I mean, it's
23:15 - 23:20
it was truly
23:16 - 23:23
unfathomable and back then
23:20 - 23:25
generalization meant: is it literally not
23:23 - 23:29
the same phrasing as in the data
23:25 - 23:32
set now we may say well my model
23:29 - 23:34
achieves this high score on um I don't
23:32 - 23:37
know math competitions but maybe the
23:34 - 23:39
math maybe some discussion in some Forum
23:37 - 23:41
on the internet was about the same ideas
23:39 - 23:43
and therefore it's memorized well okay
23:41 - 23:44
you could say maybe it's in distribution
23:43 - 23:46
maybe it's
23:44 - 23:47
memorization but I also think that our
23:46 - 23:50
standards for what counts as
23:47 - 23:52
generalization have increased really
23:50 - 23:54
quite substantially dramatically
23:52 - 23:56
unimaginably, if you keep track.
23:56 - 24:02
And so I think the answer is, to some
24:00 - 24:04
degree probably not as well as human
24:02 - 24:08
beings I think it is true that human
24:04 - 24:10
beings generalize much better but at the
24:08 - 24:12
same time they definitely generalize out
24:10 - 24:16
of distribution to some
24:12 - 24:18
degree. I hope it's a useful, tautological
24:16 - 24:22
answer thank
24:18 - 24:23
you and unfortunately we're out of time
24:22 - 24:26
for this session I have a feeling we
24:23 - 24:28
could go on for the next six hours uh
24:26 - 24:32
but thank you so much, Ilya, for the talk,
24:28 - 24:32
thank you wonderful