00:21 - 00:25
hello welcome to the 12 days of OpenAI
00:24 - 00:27
we're going to try something that as far
00:25 - 00:28
as we know no tech company has done
00:27 - 00:31
before which is every day for the next
00:28 - 00:33
12 days every weekday we are going to launch or
00:31 - 00:35
demo some new thing that we've built and
00:33 - 00:36
we think we've got some great stuff for
00:35 - 00:39
you starting today we hope you'll really
00:36 - 00:41
love it and you know we'll try to make
00:39 - 00:42
this fun and fast and not take too long
00:41 - 00:44
but it'll be a way to show you what
00:42 - 00:46
we've been working on and a little
00:44 - 00:47
holiday present from us so we'll jump
00:46 - 00:49
right into this first day uh today we
00:47 - 00:51
actually have two things to launch the
00:49 - 00:53
first one is the full version of o1 we
00:51 - 00:54
have been very hard at work we've
00:53 - 00:56
listened to your feedback you want uh
00:54 - 00:59
you like o1 preview but you want it to
00:56 - 01:00
be smarter and faster and be multimodal
00:59 - 01:02
and be better at instruction following and a
01:00 - 01:04
bunch of other things so we've put a lot
01:02 - 01:06
of work into this and for scientists
01:04 - 01:08
engineers coders we think they will
01:06 - 01:10
really love this new model uh I'd like
01:08 - 01:13
to show you quickly how it
01:10 - 01:16
performs so you can see uh the jump from
01:13 - 01:20
GPT-4o to o1 preview across math
01:16 - 01:22
competition coding GPQA Diamond um and
01:20 - 01:24
you can see that o1 is a pretty big step
01:22 - 01:26
forward um it's also much better in a
01:24 - 01:27
lot of other ways but raw intelligence
01:26 - 01:29
is something that we care about coding
01:27 - 01:30
performance in particular is an area
01:29 - 01:34
where people are using the model
01:30 - 01:35
a lot so in just a minute uh these guys
01:34 - 01:38
will demo some things about o1 they'll
01:35 - 01:39
show you how it does at speed how it
01:38 - 01:41
does at really hard problems how it does
01:39 - 01:42
with multimodality but first I want to
01:41 - 01:45
talk just for a minute about the second
01:42 - 01:47
thing we're launching today a lot of
01:45 - 01:49
people uh power users of ChatGPT at
01:47 - 01:51
this point they really use it a lot and
01:49 - 01:53
they want more compute than $20 a month
01:51 - 01:56
can buy so we're launching a new tier
01:53 - 01:58
ChatGPT Pro and Pro has unlimited
01:56 - 02:01
access to our models uh and also things
01:58 - 02:04
like Advanced Voice Mode it also has a
02:01 - 02:06
uh a new thing called o1 pro mode so o1
02:04 - 02:09
is the smartest model in the world now
02:06 - 02:10
except for o1 being used in pro mode and
02:09 - 02:13
for the hardest problems that people
02:10 - 02:15
have uh o1 Pro mode lets you do even a
02:13 - 02:17
little bit better um so you can see at
02:15 - 02:19
competition math you can see GPQA
02:17 - 02:21
Diamond um and these boosts may look
02:19 - 02:22
small but in complex workflows where
02:21 - 02:25
you're really pushing the limits of
02:22 - 02:27
these models it's pretty significant uh
02:25 - 02:30
I'll show you one more thing about Pro
02:27 - 02:31
about the pro mode so one thing that people
02:30 - 02:34
really have said they want is
02:31 - 02:36
reliability and here you can see how the
02:34 - 02:37
reliability of an answer from pro mode
02:36 - 02:40
compares to o1 and this is an even
02:37 - 02:41
stronger delta and again for our Pro
02:40 - 02:44
users we've heard a lot about how much
02:41 - 02:46
people want this ChatGPT Pro is $200 a
02:44 - 02:48
month uh launches today over the course
02:46 - 02:50
of these 12 days we have some other
02:48 - 02:52
things to add to it that we think you
02:50 - 02:55
will also really love um but unlimited model
02:52 - 02:57
use and uh this new o1 pro mode so I want
02:55 - 02:59
to jump right in and we'll show some of
02:57 - 03:01
those demos that we talked about uh and
02:59 - 03:03
these are some of the guys that helped
03:01 - 03:06
build o1 uh with many other people
03:03 - 03:09
behind them on the team thanks Sam hi um
03:06 - 03:11
I'm Hyung Won I'm Jason and I'm Max we're
03:09 - 03:13
all research scientists who worked on
03:11 - 03:15
building o1 o1 is really distinctive
03:13 - 03:17
because it's the first model we've
03:15 - 03:19
trained that thinks before it responds
03:17 - 03:21
meaning it gives much better and often
03:19 - 03:22
more detailed and more correct responses
03:21 - 03:25
than other models you might have tried
03:22 - 03:28
o1 is being rolled out today to all uh
03:25 - 03:31
Plus and soon-to-be Pro subscribers on
03:28 - 03:34
ChatGPT replacing o1 preview
03:31 - 03:36
the o1 model is uh faster and smarter than
03:34 - 03:38
the o1 preview model which we launched
03:36 - 03:40
in September after the launch many
03:38 - 03:43
people asked about the multimodal input
03:40 - 03:46
so we added that uh so now the o1 model
03:43 - 03:48
live today is able to reason through both
03:46 - 03:50
images and text
03:48 - 03:52
jointly as Sam mentioned today we're
03:50 - 03:56
also going to launch a new tier of
03:52 - 03:59
ChatGPT called ChatGPT Pro ChatGPT Pro offers
03:56 - 04:03
unlimited access to our best models like
03:59 - 04:06
o1 GPT-4o and Advanced Voice ChatGPT Pro also
04:03 - 04:09
has a special way of using o1 called o1
04:06 - 04:11
Pro mode with o1 Pro mode you can ask
04:09 - 04:13
the model to use even more compute to
04:11 - 04:14
think even harder on some of the most
04:14 - 04:19
difficult problems we think the audience for
04:17 - 04:21
ChatGPT Pro will be the power users of
04:19 - 04:22
ChatGPT those who are already pushing the
04:21 - 04:25
models to the limits of their
04:22 - 04:27
capabilities on tasks like math
04:25 - 04:28
programming and writing it's been
04:27 - 04:30
amazing to see how much people are
04:28 - 04:32
pushing o1 preview uh how much people
04:30 - 04:33
who do technical work all day get out of
04:32 - 04:36
this and uh we're really excited to let
04:33 - 04:38
them push it further yeah we also really
04:36 - 04:40
think that o1 will be much better for
04:38 - 04:42
everyday use cases not necessarily just
04:40 - 04:43
really hard math and programming
04:42 - 04:45
problems in particular one piece of
04:43 - 04:47
feedback we received about o1 preview
04:45 - 04:48
constantly was that it was way too slow
04:47 - 04:50
it would think for 10 seconds if you
04:48 - 04:52
said hi to it and we fixed that that was
04:50 - 04:55
really annoying it was kind of funny
04:52 - 04:56
honestly it really cared it
04:55 - 04:59
really thought hard about saying hi back
04:56 - 05:01
yeah um and so we fixed that o1 will now
04:59 - 05:03
think much more intelligently if you ask
05:01 - 05:04
it a simple question it'll respond
05:03 - 05:06
really quickly and if you ask it a
05:04 - 05:08
really hard question it'll think for a
05:06 - 05:09
really long time uh we ran a pretty
05:08 - 05:11
detailed suite of human evaluations for
05:09 - 05:14
this model and what we found was that it
05:11 - 05:17
made major mistakes about 34% less often
05:14 - 05:19
than o1 preview while thinking fully
05:17 - 05:21
about 50% faster and we think this will
05:19 - 05:23
be a really really noticeable difference
05:21 - 05:25
for all of you so I really enjoy just
05:23 - 05:26
talking to these models I'm a big
05:25 - 05:28
history buff and I'll show you a really
05:26 - 05:30
quick demo of for example a sort of
05:28 - 05:33
question that I might ask one of these
05:30 - 05:36
models so uh right here on the left I
05:33 - 05:37
have o1 on the right I have o1 preview
05:36 - 05:39
and I'm just asking it a really simple
05:37 - 05:41
history question list the Roman emperors of
05:39 - 05:44
the second century tell me about their
05:41 - 05:46
dates what they did um not hard but you
05:44 - 05:49
know GPT-4o actually gets this wrong a
05:46 - 05:51
reasonable fraction of the time um and
05:49 - 05:53
so I've asked o1 this I've asked o1
05:51 - 05:55
preview this I tested this offline a few
05:53 - 05:58
times and I found that o1 on average
05:55 - 05:59
responded about 60% faster than o1 preview
05:58 - 06:01
um this could be a little bit variable
05:59 - 06:04
because right now we're in the process
06:01 - 06:07
of swapping all our GPUs from o1
06:04 - 06:11
preview to o1 so actually o1 thought for
06:07 - 06:13
about 14 seconds o1 preview still
06:11 - 06:15
going there's a lot of Roman emperors
06:13 - 06:16
there's a lot of Roman emperors yeah 4o
06:15 - 06:17
actually gets this wrong a lot of the
06:16 - 06:20
time there are a lot of folks who ruled
06:17 - 06:22
for like uh 6 days 12 days a month and
06:20 - 06:23
it sometimes forgets those can you do
06:22 - 06:25
them all from memory including the six
06:25 - 06:30
no yep so here we go o1 thought for
06:28 - 06:32
about 14 seconds preview thought for
06:30 - 06:33
about 33 seconds these should both be
06:32 - 06:35
faster once we finish deploying but we
06:33 - 06:37
wanted this to go live right now exactly
06:35 - 06:39
um so yeah we think you'll really
06:37 - 06:40
enjoy talking to this model we found
06:39 - 06:42
that it gave great responses it thought
06:40 - 06:44
much faster it should just be a much
06:42 - 06:45
better user experience for everyone so
06:44 - 06:47
one other feature we know that people
06:45 - 06:49
really wanted for everyday use cases
06:47 - 06:50
that we've had requested a lot is
06:49 - 06:52
multimodal inputs and image
06:50 - 06:54
understanding and Hyung Won is going to
06:52 - 06:57
talk about that now yep to illustrate
06:54 - 07:00
the multimodal input and reasoning uh I
06:57 - 07:03
created this toy problem uh with some
07:00 - 07:05
hand-drawn diagrams and so on so here it
07:03 - 07:08
is it's hard to see so I already took a
07:05 - 07:11
photo of this and so let's look at this
07:08 - 07:14
photo on a laptop so once you upload the
07:11 - 07:17
image into ChatGPT you can click on
07:14 - 07:20
it um to see the zoomed-in version
07:17 - 07:25
so this is a system of a data center in
07:20 - 07:28
space so maybe um in the future we might
07:25 - 07:30
want to train AI models in space uh
07:28 - 07:33
I think we should do that but the power
07:30 - 07:35
number looks a little low one gigawatt okay but
07:33 - 07:39
the general idea rookie numbers
07:35 - 07:41
rookie numbers okay yeah so uh we
07:39 - 07:44
have a sun right here uh taking in power
07:41 - 07:46
on this solar panel and then uh there's
07:44 - 07:50
a small data center here that's exactly
07:46 - 07:53
what they look like yeah GPU rack and then
07:50 - 07:55
a pump nice pump here and one interesting
07:53 - 07:58
thing about um operation in space is
07:55 - 08:01
that on Earth we can do air cooling
07:58 - 08:03
water cooling to cool the GPUs but in
08:01 - 08:06
space there's nothing there so we have
08:03 - 08:09
to radiate this um heat into deep
08:06 - 08:12
space and that's why we need this uh
08:09 - 08:15
giant radiator cooling panel and this
08:12 - 08:18
problem is about finding the lower bound
08:15 - 08:22
estimate of the cooling panel area
08:18 - 08:24
required to operate um this 1 gigawatt
08:22 - 08:28
data center probably going to be very
08:24 - 08:30
big yeah let's see how big it is let's see so
08:28 - 08:33
that's the problem and I'm going to use this
08:30 - 08:36
prompt and uh yeah this is essentially
08:33 - 08:39
asking for that so let me uh hit go and
08:36 - 08:41
the model will think for
08:39 - 08:43
a few seconds by the way most people don't
08:41 - 08:46
know I've been working with Hyung Won for a
08:43 - 08:48
long time Hyung Won actually has a PhD in
08:46 - 08:50
thermodynamics which is totally
08:48 - 08:52
unrelated to AI and you always joke that
08:50 - 08:55
you haven't been able to use your PhD
08:52 - 08:57
work in your job until today so you can
08:55 - 09:00
you can trust Hyung Won on this analysis
08:57 - 09:03
finally finally uh thanks for hyping it up
09:00 - 09:06
now I really have to get this right uh
09:03 - 09:08
okay so the model finished thinking in only
09:06 - 09:11
10 seconds it's a simple problem so
09:08 - 09:14
let's see how the model did so
09:11 - 09:17
power input um so first of all this one
09:14 - 09:19
gigawatt that was only drawn on the paper
09:17 - 09:21
so the model was able to pick that up
09:19 - 09:23
nicely and then um radiative heat
09:21 - 09:25
transfer only that's the thing I
09:23 - 09:29
mentioned so in space nothing else and
09:25 - 09:31
then some simplifying um uh choices and
09:29 - 09:32
one critical thing is that I
09:31 - 09:36
intentionally made this problem
09:32 - 09:37
underspecified meaning that um the critical
09:36 - 09:41
parameter is the temperature of the
09:37 - 09:43
cooling panel uh I left it out so that
09:41 - 09:47
uh we can test out the model's ability
09:43 - 09:50
to handle um ambiguity and so on so the
09:47 - 09:53
model was able to recognize that this is
09:50 - 09:55
actually an unspecified but important
09:53 - 09:58
parameter and it actually picked the
09:55 - 10:00
right um range of temperature
09:58 - 10:03
which is about room temperature and
10:00 - 10:05
with that it continues the analysis
10:03 - 10:09
and does a whole bunch of things and
10:05 - 10:10
then found out the area which is 2.42
10:09 - 10:13
million square meters just to get a
10:10 - 10:16
sense of how big this is this is about
10:13 - 10:19
2% of the uh land area of San Francisco
10:16 - 10:20
this is huge not that bad not that bad
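(For reference, a minimal back-of-the-envelope version of this lower-bound estimate uses the Stefan-Boltzmann law, assuming an ideal one-sided blackbody radiator with emissivity \varepsilon \approx 1 held near room temperature, T \approx 295\,\mathrm{K}, rejecting the full 1 GW:

A = \frac{P}{\varepsilon \sigma T^{4}} = \frac{10^{9}\,\mathrm{W}}{(1)\,(5.67\times 10^{-8}\,\mathrm{W\,m^{-2}\,K^{-4}})\,(295\,\mathrm{K})^{4}} \approx 2.3\times 10^{6}\,\mathrm{m^{2}}

which is in the same ballpark as the 2.42 million square meters the model reports; the exact figure shifts with the emissivity and panel temperature assumed.)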
10:20 - 10:26
okay um yeah so I guess this is
10:24 - 10:28
reasonable I'll skip through the rest of
10:26 - 10:33
the details but I think the model did a
10:28 - 10:35
great job um making nice consistent
10:33 - 10:38
assumptions that um you know make the
10:35 - 10:42
required area as small as possible and
10:38 - 10:45
so um yeah so this is the demonstration
10:42 - 10:48
of the multimodal reasoning and this is
10:45 - 10:51
a simple problem but o1 is actually very
10:48 - 10:54
strong and on standard benchmarks like
10:51 - 10:55
MMMU and MathVista o1 actually has
10:54 - 10:58
state-of-the-art
10:55 - 10:59
performance now Jason will showcase the
10:58 - 11:02
pro mode
10:59 - 11:06
great so I want to give a short demo of
11:02 - 11:09
uh ChatGPT o1 Pro mode um people will
11:06 - 11:11
find uh o1 Pro mode the most useful for
11:09 - 11:13
say hard math science or programming
11:11 - 11:16
problems so here I have a pretty
11:13 - 11:19
challenging chemistry problem that o1
11:16 - 11:22
preview usually gets incorrect and so I
11:19 - 11:24
will uh let the model start
11:22 - 11:27
thinking um one thing we've learned with
11:24 - 11:29
these models is that uh for these very
11:27 - 11:31
challenging problems the model can think
11:29 - 11:32
for up to a few minutes I think for this
11:31 - 11:35
problem the model usually thinks
11:32 - 11:37
anywhere from 1 minute to up to 3
11:35 - 11:39
minutes um and so we have to provide
11:37 - 11:41
some entertainment for people while
11:39 - 11:43
the model is thinking so I'll describe
11:41 - 11:45
the problem a little bit and then if the
11:43 - 11:48
model is still thinking when I'm done
11:45 - 11:51
I've prepared a dad joke for us uh
11:48 - 11:52
to fill the rest of the time um so I
11:51 - 11:56
hope it thinks for a long
11:52 - 11:59
time you can see uh the problem asks for
11:56 - 12:01
a protein that fits a very
11:59 - 12:04
specific set of criteria so uh there are
12:01 - 12:06
six criteria and the challenge is each
12:04 - 12:08
of them asks for pretty chemistry domain-
12:06 - 12:09
specific knowledge that the model would
12:09 - 12:14
recall and the other thing to know about
12:11 - 12:16
this problem uh is that none of these
12:14 - 12:18
criteria actually give away what the
12:16 - 12:20
correct answer is so for any given
12:18 - 12:23
criterion there could be dozens of
12:20 - 12:24
proteins that might fit that criterion
12:23 - 12:26
and so the model has to think through
12:24 - 12:27
all the candidates and then check if
12:26 - 12:30
they fit all the
12:27 - 12:33
criteria okay so you could see the model
12:30 - 12:36
actually was faster this time uh so it
12:33 - 12:38
finished in 53 seconds you can click and
12:36 - 12:40
see some of the thought process that the
12:38 - 12:42
model went through to get the answer uh
12:40 - 12:44
you could see it's uh thinking about
12:42 - 12:46
different candidates like neuroligin
12:44 - 12:49
initially um and then it arrives at the
12:46 - 12:51
correct answer which is uh retinoschisin
12:51 - 12:59
great um okay so to summarize um we saw
12:54 - 13:02
from Max that o1 is smarter and faster
12:59 - 13:05
than uh o1 preview we saw from Hyung Won
13:02 - 13:08
that o1 can now reason over both text
13:05 - 13:11
and images and then finally we saw with
13:08 - 13:15
ChatGPT Pro mode uh you can use o1 to
13:11 - 13:17
think about uh
13:15 - 13:20
reason about the hardest uh science and
13:17 - 13:23
math problems yep there's more to come
13:20 - 13:26
um for the ChatGPT Pro tier uh we're
13:23 - 13:28
working on even more compute-intensive
13:26 - 13:31
tasks to uh power longer and bigger
13:28 - 13:34
tasks for those who want to push
13:31 - 13:37
the model even further and we're still
13:34 - 13:41
working on adding tools to the o1 um
13:37 - 13:43
model such as web browsing file uploads
13:41 - 13:45
and things like that we're also hard at
13:43 - 13:47
work to bring o1 to the API we're
13:45 - 13:49
going to be adding some new features for
13:47 - 13:52
developers structured outputs function
13:49 - 13:53
calling developer messages and API image
13:52 - 13:55
understanding which we think you'll
13:53 - 13:57
really enjoy we expect this to be a
13:55 - 13:59
great model for developers and really
13:57 - 14:00
unlock a whole new frontier of agentic
13:59 - 14:02
things you guys can build we hope you
14:00 - 14:05
love it as much as we do
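(As a rough illustration of the developer features just mentioned, a request to o1 through the API might look something like the Python sketch below, written against the existing Chat Completions interface; the model name "o1", the "developer" message role, and the example schema are assumptions for illustration, not confirmed details from this announcement.

# Hypothetical sketch: calling o1 via the API with a developer message and a
# structured (JSON-schema) output; names and parameters are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

completion = client.chat.completions.create(
    model="o1",  # assumed model identifier
    messages=[
        {"role": "developer", "content": "Answer concisely and include reign dates."},
        {"role": "user", "content": "List the Roman emperors of the second century."},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "emperor_list",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "emperors": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "name": {"type": "string"},
                                "reign": {"type": "string"},
                            },
                            "required": ["name", "reign"],
                            "additionalProperties": False,
                        },
                    }
                },
                "required": ["emperors"],
                "additionalProperties": False,
            },
        },
    },
)

print(completion.choices[0].message.content)  # JSON in the requested shape

The structured-output schema is what would let a developer get back JSON in a fixed shape rather than free-form text.)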
14:02 - 14:07
that was great thank you guys so much
14:05 - 14:10
congratulations uh to you and the team
14:07 - 14:13
on getting this done uh we really
14:10 - 14:15
hope that you'll enjoy o1 and Pro mode uh
14:13 - 14:16
or the Pro tier uh we have a lot more stuff
14:15 - 14:19
to come tomorrow we'll be back with
14:16 - 14:21
something great for developers uh and
14:19 - 14:24
we'll keep going from there before we
14:21 - 14:27
wrap up can we hear your joke yes uh
14:24 - 14:31
so um I made this joke this
14:27 - 14:33
morning the joke is this so Santa
14:31 - 14:36
was trying to get his large language
14:33 - 14:37
model to do a math problem and he was
14:36 - 14:40
prompting it really hard but it wasn't
14:37 - 14:46
working how did he eventually fix
14:40 - 14:46
it no idea he used reindeer-forcement
14:48 - 14:53
learning thank you very much thank you