00:00 - 00:13
please welcome Andrew Ng
00:13 - 00:19
thank you it's such a good time to be
00:16 - 00:20
a builder I'm excited to be back here at
00:20 - 00:25
Build what I'd like to do today is share
00:23 - 00:26
with you where I think some of AI's
00:26 - 00:30
opportunities are you may have heard me say
00:28 - 00:32
that I think AI is the new electricity
00:30 - 00:35
that's because AI is a general purpose
00:32 - 00:36
technology like electricity if I ask you
00:35 - 00:38
what is electricity good for it's always
00:36 - 00:41
hard to answer because it's good for so
00:38 - 00:43
many different things and new AI
00:41 - 00:45
technology is creating a huge set of
00:43 - 00:47
opportunities for us to build new
00:45 - 00:50
applications that weren't possible
00:47 - 00:52
before people often ask me hey Andrew
00:50 - 00:54
where are the biggest AI opportunities
00:52 - 00:56
this is what I think of as the AI stack
00:54 - 00:58
at the lowest level is the
00:56 - 01:00
semiconductors and then on top of that
00:58 - 01:03
a lot of the cloud infrastructure including of
01:00 - 01:05
Course Snowflake and then on top of that
01:03 - 01:08
are many of the foundation model
01:05 - 01:10
trainers and models and it turns out
01:08 - 01:11
that a lot of the media hype and
01:10 - 01:13
excitement and social media Buzz has
01:11 - 01:16
been on these layers of the stack kind
01:13 - 01:17
of the new technology layers when if
01:16 - 01:19
there's a new technology like generative
01:17 - 01:21
AI the buzz is on these technology
01:19 - 01:24
layers and there's nothing wrong with
01:21 - 01:26
that but I think that almost by
01:24 - 01:29
definition there's another layer of the
01:26 - 01:31
stack that has to work out even better
01:29 - 01:32
and that's the application layer
01:31 - 01:34
because we need the applications to
01:32 - 01:36
generate even more value and even more
01:34 - 01:39
revenue so that you know they can really
01:36 - 01:41
afford to pay the technology providers
01:39 - 01:43
below so I spend a lot of my time
01:41 - 01:45
thinking about AI applications and I
01:43 - 01:48
think that's where a lot of the best
01:45 - 01:51
opportunities will be to build new
01:48 - 01:53
things one of the trends that has been
01:51 - 01:55
growing for the last couple years in no
01:53 - 01:58
small part because of generative AI is
01:55 - 02:01
faster and faster machine learning model
01:58 - 02:03
development um and in particular
02:01 - 02:06
generative AI is letting us build things
02:03 - 02:09
faster than ever before take the problem
02:06 - 02:10
of say building a sentiment classifier
02:09 - 02:12
taking text and deciding is this a
02:10 - 02:14
positive or negative sentiment for
02:12 - 02:16
reputation monitoring say typical
02:14 - 02:19
workflow using supervised learning might
02:16 - 02:22
be that it will take a month to get some
02:19 - 02:24
labeled data and then you know train an AI
02:22 - 02:27
model that might take a few months and
02:24 - 02:28
then find a cloud service or something
02:27 - 02:31
to deploy on that'll take another few
02:28 - 02:33
months and so for a long time very
02:31 - 02:35
valuable AI systems might take good AI
02:33 - 02:37
teams six to 12 months to build right
02:35 - 02:39
and there's nothing wrong with that I
02:37 - 02:41
think many people create very valuable
02:39 - 02:44
AI systems this way but with generative
02:41 - 02:48
AI there are certain classes of applications
02:44 - 02:51
where you can write a prompt in days and
02:48 - 02:53
then deploy it in you know again maybe
02:51 - 02:55
days and what this means is there are a
02:53 - 02:57
lot of applications that used to take me
02:55 - 02:59
and used to take very good AI teams
02:57 - 03:02
months to build that today you can build
02:59 - 03:06
in maybe 10 days or so.
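To make that concrete, here is a minimal sketch (my own illustration, not code from the talk) of a prompt-based sentiment classifier. It assumes the OpenAI Python SDK with an API key in the environment, and the model name is just a placeholder for any capable chat model.

```python
# Hedged sketch: prompt-based sentiment classification, no training data needed.
# Assumptions: `openai` package installed, OPENAI_API_KEY set, model name is a placeholder.
from openai import OpenAI

client = OpenAI()

def classify_sentiment(text: str) -> str:
    """Ask a chat model to label one piece of text as positive or negative."""
    prompt = (
        "Classify the sentiment of the following text as exactly one word, "
        "either 'positive' or 'negative':\n\n" + text
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip().lower()

print(classify_sentiment("The support team resolved my issue in minutes."))
```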
03:02 - 03:09
And this opens up the opportunity to experiment with build
03:06 - 03:10
new prototypes and ship new AI
03:09 - 03:13
products that's certainly the
03:10 - 03:15
prototyping aspect of it and these are
03:13 - 03:18
some of the consequences of this trend
03:15 - 03:21
which is fast experimentation is
03:18 - 03:23
becoming a more promising path to
03:21 - 03:25
invention previously if it took six
03:23 - 03:26
months to build something then you know
03:25 - 03:28
we'd better study it make sure there's user
03:26 - 03:30
demand have product managers look at
03:28 - 03:32
it document it and then spend all
03:30 - 03:33
that effort to build it and hopefully it
03:32 - 03:35
turns out to be
03:33 - 03:38
worthwhile but now for fast moving AI
03:35 - 03:40
teams I see a design pattern where you
03:38 - 03:42
can say you know what it'll take us a
03:40 - 03:43
weekend to throw together a prototype
03:42 - 03:45
let's build 20 prototypes and see what
03:43 - 03:47
sticks and if 18 of them don't work out
03:45 - 03:50
we'll just ditch them and stick with
03:47 - 03:53
what works so fast iteration and fast
03:50 - 03:55
experimentation is becoming a new path
03:53 - 03:57
to inventing new user
03:55 - 04:00
experiences um one of interesting
03:57 - 04:01
implication is that evaluations or evals
04:00 - 04:04
for short are becoming a bigger
04:01 - 04:06
bottleneck for how we build things so it
04:04 - 04:08
turns out back in supervised learning
04:06 - 04:10
world if you're collecting 10,000 data
04:08 - 04:12
points anyway to train a model then you
04:10 - 04:14
know if you needed to collect an extra
04:12 - 04:18
1,000 data points for testing it was
04:14 - 04:19
fine it was just an extra 10% increase in cost
04:18 - 04:21
but for a lot of large language model
04:19 - 04:24
based apps there's no need to have
04:21 - 04:26
any training data if you made me slow
04:24 - 04:28
down to collect a thousand test examples
04:26 - 04:30
boy that seems like a huge bottleneck
04:28 - 04:32
and so the new development workflow
04:30 - 04:34
often feels as if we're building and
04:32 - 04:37
collecting data more in parallel rather
04:34 - 04:39
than sequentially um in which we build a
04:37 - 04:42
prototype and then as it becomes
04:39 - 04:43
more important and as robustness and
04:42 - 04:46
reliability becomes more important then
04:43 - 04:48
we gradually build up that test set here
04:46 - 04:50
in parallel but I see exciting
04:48 - 04:53
Innovations to be had still in how we
04:50 - 04:56
build evals.
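As a rough illustration of building the test set in parallel with the application, here is a tiny eval-harness sketch. It reuses the hypothetical `classify_sentiment` function from the earlier sketch, and the labeled examples are placeholders you would keep appending to as failures show up.

```python
# Hedged sketch: a small, growing eval set checked on every change.
labeled_examples = [
    {"text": "The support team resolved my issue in minutes.", "label": "positive"},
    {"text": "The package arrived broken and nobody replied to my emails.", "label": "negative"},
    # append new cases here as real failures are discovered
]

def run_evals(examples):
    correct = 0
    for ex in examples:
        prediction = classify_sentiment(ex["text"])  # from the earlier sketch
        if prediction == ex["label"]:
            correct += 1
        else:
            print(f"MISS: {ex['text']!r} -> {prediction} (expected {ex['label']})")
    print(f"accuracy: {correct}/{len(examples)}")

run_evals(labeled_examples)
```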
04:53 - 04:58
And then what I'm seeing as well is that the prototyping of machine
04:56 - 05:00
learning has become much faster but
04:58 - 05:02
building a software application has lots
05:00 - 05:03
of steps does the product work you know
05:02 - 05:06
the design work does the software
05:03 - 05:08
integration work a lot of Plumbing work
05:06 - 05:10
um then after deployment DevOps and LLM
05:08 - 05:13
Ops so some of those other pieces are
05:10 - 05:14
becoming faster but they haven't become
05:13 - 05:17
faster at the same rate that the machine
05:14 - 05:19
learning modeling part has become faster
05:17 - 05:21
so you take a process and one piece of
05:19 - 05:23
it becomes much faster um what I'm
05:21 - 05:25
seeing is prototyping is now really
05:23 - 05:28
really fast but when you take a
05:25 - 05:30
prototype into robust reliable
05:28 - 05:33
production with guard rails and so on
05:30 - 05:34
those other steps still take some time
05:33 - 05:36
but the interesting Dynamic I'm seeing
05:34 - 05:38
is the fact that the machine learning part
05:36 - 05:41
is so fast is putting a lot of pressure
05:38 - 05:43
on organizations to speed up all of
05:41 - 05:46
those other parts as well so that's been
05:43 - 05:48
exciting progress for our field and in
05:46 - 05:51
terms of how machine learning
05:48 - 05:53
development um is speeding things up I
05:51 - 05:57
think the mantra move fast and break
05:53 - 06:00
things got a bad rep because you know it
05:57 - 06:01
broke things um I think some people
06:00 - 06:04
interpret this to mean we shouldn't move
06:01 - 06:08
fast but I disagree with that I think
06:04 - 06:10
the better mantra is move fast and be
06:08 - 06:12
responsible I'm seeing a lot of teams
06:10 - 06:14
able to prototype quickly evaluate and
06:12 - 06:16
test robustly so without shipping
06:14 - 06:18
anything out to The Wider world that
06:16 - 06:21
could you know cause damage or cause um
06:18 - 06:23
meaningful harm I'm finding smart teams
06:21 - 06:25
able to build really quickly and move
06:23 - 06:26
really fast but also do this in a very
06:25 - 06:28
responsible way and I find this
06:26 - 06:30
exhilarating that you can build things
06:28 - 06:32
and ship things in a responsible way much
06:30 - 06:35
faster than ever
06:32 - 06:38
before now there's a lot going on in Ai
06:35 - 06:41
and of all the things going on AI um in
06:38 - 06:44
terms of technical Trend the one Trend
06:41 - 06:46
I'm most excited about is agentic AI
06:44 - 06:48
workflows and so if you were to ask what's
06:46 - 06:50
the one most important AI technology to
06:48 - 06:55
pay attention to I would say is agentic
06:50 - 06:56
AI um I think when I started saying this
06:55 - 06:58
you know near the beginning of this year
06:56 - 07:01
it was a bit of a controversial
06:58 - 07:04
statement but now the term AI agents
07:01 - 07:06
has become so widely used uh by
07:04 - 07:08
technical and non-technical people it's
07:06 - 07:10
become you know a little bit of a hype
07:08 - 07:13
term uh but so let me just share with
07:10 - 07:15
you how I view AI agents and why I think
07:13 - 07:16
they're important approaching just from
07:16 - 07:21
perspective the way that most of us use
07:19 - 07:23
large language models today is with what
07:21 - 07:25
something called zero-shot prompting
07:23 - 07:29
and that roughly means we give it a
07:25 - 07:31
prompt and ask it to write an essay or
07:29 - 07:33
write an output for us and it's a bit
07:31 - 07:36
like if we're going to a person or in
07:33 - 07:38
this case going to an AI and asking it
07:36 - 07:40
to type out an essay for us by going
07:38 - 07:42
from the first word
07:40 - 07:44
to the last word all in one
07:42 - 07:47
go without ever using backspace just
07:44 - 07:49
writing from start to finish like that and
07:47 - 07:51
it turns out people you know we don't do
07:49 - 07:53
our best writing this way uh but despite
07:51 - 07:55
the difficulty of being forced to write
07:53 - 07:56
this way large language models do surprisingly
07:56 - 08:02
well here's what an agentic workflow
07:59 - 08:04
looks like uh to generate an essay we ask an
08:02 - 08:06
AI to First write an essay outline and
08:04 - 08:07
ask it do you need to do some web
08:06 - 08:09
research if so let's download some web
08:07 - 08:11
pages and put into the context of the
08:09 - 08:12
large language model then let's write the first
08:11 - 08:15
draft and then let's read the first
08:12 - 08:17
draft and critique it and revise the
08:15 - 08:20
draft and so on and this workflow looks
08:17 - 08:23
more like um doing some thinking or some
08:20 - 08:24
research and then some revision and then
08:23 - 08:27
going back to do more thinking and more
08:24 - 08:29
research and by going round this Loop
08:27 - 08:31
over and over um it takes longer but
08:29 - 08:34
this results in a much better work output.
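A minimal sketch of that kind of loop might look like the following; the `llm` helper, model name, and prompts are my own assumptions, and a real system would add the web-research step and better prompts.

```python
# Hedged sketch: outline -> draft -> critique -> revise, looping a few times.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def llm(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def write_essay(topic: str, rounds: int = 2) -> str:
    outline = llm(f"Write a short outline for an essay on: {topic}")
    draft = llm(f"Using this outline, write a first draft:\n{outline}")
    for _ in range(rounds):
        critique = llm(f"Read this draft and list concrete improvements:\n{draft}")
        draft = llm(
            f"Revise the draft to address the critique.\n\nDraft:\n{draft}\n\nCritique:\n{critique}"
        )
    return draft

print(write_essay("Why fast iteration is becoming a path to invention"))
```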
08:31 - 08:36
So in some teams I work with we
08:34 - 08:38
apply this agentic workflow to
08:36 - 08:41
processing complex tricky legal
08:38 - 08:43
documents or to um do Health Care
08:41 - 08:45
diagnosis Assistance or to do very
08:43 - 08:47
complex compliance with government
08:45 - 08:50
paperwork so many times I'm seeing this
08:47 - 08:51
drive much better results than was ever
08:50 - 08:53
possible and one thing I want to focus
08:51 - 08:55
on in this presentation I'll talk about
08:53 - 08:58
later is the rise of visual AI where
08:55 - 09:00
agentic workflows are letting us process
08:58 - 09:03
image and video data
09:00 - 09:05
but I'll get back to that later um it
09:03 - 09:07
turns out that there are benchmarks that
09:05 - 09:10
seem to show agentic workflows
09:07 - 09:12
deliver much better results um this is
09:10 - 09:15
the HumanEval benchmark which is a
09:12 - 09:17
benchmark from OpenAI that measures
09:15 - 09:20
a large language model's ability to
09:17 - 09:23
solve coding puzzles like this one and
09:20 - 09:25
um my team collected some data turns out
09:23 - 09:29
that um on this Benchmark I think it was
09:25 - 09:33
the pass@k metric GPT-3.5 got
09:29 - 09:36
48% right on this coding benchmark GPT-4
09:33 - 09:39
huge Improvement you know
09:36 - 09:42
67% but the improvement from GPT-3.5 to
09:39 - 09:46
GPT-4 is dwarfed by the improvement from
09:42 - 09:49
GPT-3.5 to GPT-3.5 using an agentic
09:46 - 09:53
workflow um which gets up to about
09:49 - 09:58
95% and GPT-4 with an agentic workflow
09:53 - 10:00
also does much better um and so it turns
09:58 - 10:03
out that in the way builders build
10:00 - 10:05
agentic reasoning or agentic workflows
10:03 - 10:07
in their applications there are I want
10:05 - 10:09
to say four major design patterns which
10:07 - 10:12
are reflection tool use planning and
10:09 - 10:14
multi-agent collaboration and to
10:12 - 10:16
demystify agentic workflows a little bit
10:14 - 10:19
let me quickly step through what these
10:16 - 10:21
workflows mean um and I find that
10:19 - 10:22
agentic workflows sometimes seem a
10:21 - 10:24
little bit mysterious until you actually
10:22 - 10:26
read through the code for one or two of
10:24 - 10:28
these go oh that's it you know that's
10:26 - 10:29
really cool but oh that's all it takes
10:28 - 10:32
but let me just step through
10:29 - 10:36
um for concreteness what
10:32 - 10:39
reflection with LLMs looks like so I might
10:36 - 10:41
start off uh prompting an LLM here a
10:39 - 10:43
coder agent LLM so maybe an assistant
10:41 - 10:45
message telling it its role is to be a coder and
10:43 - 10:47
write code um so you can tell you know
10:45 - 10:50
please write code for certain tasks and
10:47 - 10:52
the LLM may generate code and then it
10:50 - 10:54
turns out that you can construct a
10:52 - 10:57
prompt that takes the code that was just
10:54 - 10:59
generated and copy paste the code back
10:57 - 11:01
into the prompt and ask it you know here's
10:59 - 11:04
some code intended for a task examine
11:01 - 11:05
this code and critique it right and it
11:04 - 11:09
turns out if you prompt the same LLM this
11:05 - 11:12
way it may sometimes um find some
11:09 - 11:14
problems with it or make some useful
11:12 - 11:17
suggestions to improve the code then you
11:14 - 11:19
prompt the same LLM with the feedback and
11:17 - 11:21
ask it to improve the code and come up
11:19 - 11:23
with a new version and uh maybe
11:21 - 11:25
foreshadowing tool use you can have the
11:23 - 11:28
LLM run some unit tests and give the
11:25 - 11:29
feedback of the unit test back to the LM
11:28 - 11:31
then that can be additional feedback to
11:29 - 11:33
help it iterate further to further
11:31 - 11:35
improve the code and it turns out that
11:33 - 11:37
this type of reflection workflow is not
11:35 - 11:39
magic doesn't solve all problems um but
11:37 - 11:43
it will often take the Baseline level
11:39 - 11:46
performance and lift it uh to a better
11:43 - 11:47
level performance and it turns out also
11:46 - 11:49
with this type of workflow where we're
11:47 - 11:51
thinking of prompting an LLM to critique its
11:49 - 11:54
own output and use its own criticism to
11:51 - 11:56
improve it this may be also foreshadows
11:54 - 11:58
multi-agent planning or multi-agent
11:56 - 12:00
workflows where you can prompt an
11:58 - 12:03
LLM to sometimes play the role
12:00 - 12:06
of a coder and sometimes prompt it to play
12:03 - 12:08
the role of a critic um to
12:06 - 12:10
review the code so it's the same
12:08 - 12:13
conversation but we can prompt the LLM
12:10 - 12:15
you know differently to tell it sometimes
12:13 - 12:17
work on the code sometimes try to make
12:15 - 12:19
helpful suggestions and this also
12:17 - 12:24
results in improved performance so this
12:19 - 12:27
is the reflection design pattern.
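To demystify it a little further, here is a minimal sketch of the reflection pattern for code; the prompts, helper, and model name are my own assumptions rather than the exact prompts from the talk.

```python
# Hedged sketch: generate code, ask the same model to critique it, then revise.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def llm(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def write_code_with_reflection(task: str) -> str:
    code = llm(f"You are a coder. Write Python code for this task:\n{task}")
    critique = llm(
        "Here's some code intended for a task. Examine the code carefully and "
        f"critique it:\n\nTask: {task}\n\nCode:\n{code}"
    )
    return llm(
        "Improve the code using this feedback. Return only the revised code.\n\n"
        f"Code:\n{code}\n\nFeedback:\n{critique}"
    )
```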
12:24 - 12:29
The second major design pattern is tool use
12:27 - 12:31
in which a large language model can be
12:29 - 12:34
prompted to generate a request for an
12:31 - 12:37
API call to have it decide when it needs
12:34 - 12:39
to uh search the web or execute code or
12:37 - 12:41
take an action like um issue a customer
12:39 - 12:43
refund or send an email or pull up a
12:41 - 12:45
calendar entry so tool use is a major
12:43 - 12:47
design pattern that is letting large
12:45 - 12:49
language models make function calls and
12:47 - 12:52
I think this is expanding what we can do
12:49 - 12:55
with these agentic workflows.
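Here is a minimal sketch of what that looks like with a function-calling style API; the `issue_refund` tool is hypothetical, and the schema and model name are assumptions.

```python
# Hedged sketch: let the model decide whether to request a (hypothetical) tool call.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

tools = [{
    "type": "function",
    "function": {
        "name": "issue_refund",  # hypothetical tool exposed by the application
        "description": "Issue a refund for a customer order.",
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string"},
                "amount": {"type": "number"},
            },
            "required": ["order_id", "amount"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Please refund order 1234 for $20."}],
    tools=tools,
)

# If the model chose to call the tool, the application would execute it.
for call in response.choices[0].message.tool_calls or []:
    args = json.loads(call.function.arguments)
    print("model requested:", call.function.name, args)
```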
12:52 - 12:57
Real quick here's the planning or reasoning
12:55 - 12:58
design pattern in which if you were to
12:57 - 13:01
give a fairly complex request you know
12:58 - 13:04
generate an image of a girl reading a
13:01 - 13:06
book and so on then an LLM in this example
13:04 - 13:09
adapted from the HuggingGPT paper an LLM
13:06 - 13:12
can look at the picture and decide to
13:09 - 13:14
first use a um OpenPose model to detect
13:12 - 13:17
the pose and then after that generate a
13:14 - 13:19
picture of a girl um after that it'll
13:17 - 13:21
describe the image and after that use
13:19 - 13:24
text to speech or TTS to generate the audio
13:21 - 13:27
but so in planning an LLM can look at a
13:24 - 13:30
complex request and pick a sequence of
13:27 - 13:33
actions to execute in order to deliver on a complex task.
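A hedged sketch of a planning step, loosely inspired by that example: the model is asked to return an ordered list of tool invocations as JSON, which the application then walks through. The tool names and the JSON-plan format are my own simplification, not the HuggingGPT implementation.

```python
# Hedged sketch: ask the model for a JSON plan over a fixed set of (hypothetical) tools.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

AVAILABLE_TOOLS = ["pose_detection", "pose_to_image", "image_captioning", "text_to_speech"]

def llm(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def plan(request: str) -> list:
    prompt = (
        f"You can use these tools in sequence: {AVAILABLE_TOOLS}.\n"
        'Return only a JSON list of steps, each {"tool": ..., "input": ...}, '
        f"that fulfills this request:\n{request}"
    )
    return json.loads(llm(prompt))  # a real system would validate and retry on bad JSON

steps = plan("Generate an image of a girl reading a book in the pose from example.jpg, "
             "then describe the new image and read the description aloud.")
for step in steps:
    print("would run:", step["tool"], "with input", step["input"])
```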
13:30 - 13:35
And lastly multi-agent
13:33 - 13:37
collaboration is the design pattern I
13:35 - 13:40
alluded to where instead of prompting an
13:37 - 13:42
LLM to just do one thing you prompt the
13:40 - 13:44
LLM to play different roles at different
13:42 - 13:46
points in time so the different agents
13:44 - 13:49
simulated agents interact with each other
13:46 - 13:52
and come together to solve a task and I
13:49 - 13:54
know that some people may wonder you
13:52 - 13:57
know if you're using one LLM why do you need
13:54 - 13:59
to make this one LLM play the role of
13:57 - 14:02
multiple agents um many teams
13:59 - 14:04
have demonstrated significantly improved
14:02 - 14:07
performance for a variety of tasks using
14:04 - 14:08
this design pattern and it turns out
14:07 - 14:10
that if you have an LLM sometimes
14:08 - 14:13
specialize on different tasks maybe one
14:10 - 14:14
at a time have it interact many teams
14:13 - 14:18
seem to really get much better results
14:14 - 14:20
using this I feel like maybe um there's
14:18 - 14:23
an analogy to if you're running jobs on
14:20 - 14:25
a processor on a CPU you why do we need
14:23 - 14:27
multiple processes it's all the same
14:25 - 14:29
processor there you know at the end of the
14:27 - 14:31
day but we found that having multiple
14:29 - 14:33
processes is a useful abstraction for
14:31 - 14:35
developers to take a task and break it
14:33 - 14:37
down into subtasks and I think multi-agent
14:35 - 14:39
collaboration is a bit like that too if
14:37 - 14:41
you have a big task then if you think of
14:39 - 14:43
hiring a bunch of agents to do different
14:41 - 14:46
pieces of the task that then interact sometimes
14:43 - 14:48
that helps the developer um build
14:46 - 14:52
complex systems to deliver a good result.
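Here is a minimal sketch of that idea: the "agents" are just the same model prompted with different system messages, passing work back and forth. The system prompts and round count are my own assumptions.

```python
# Hedged sketch: a coder agent and a critic agent implemented as two prompts on one model.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def agent(system: str, message: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": message}],
    )
    return response.choices[0].message.content

task = "Write a Python function that merges two sorted lists into one sorted list."
code = agent("You are an expert coder. Return only code.", task)
for _ in range(2):  # a couple of coder/critic rounds
    review = agent("You are a strict code reviewer. List concrete problems.", code)
    code = agent("You are an expert coder. Return only code.",
                 f"Revise the code to address this review.\n\nCode:\n{code}\n\nReview:\n{review}")
print(code)
```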
14:48 - 14:54
So I think with these four major
14:52 - 14:57
agentic design patterns agentic
14:54 - 14:59
reasoning workflow design patterns um it
14:57 - 15:01
gives us a huge space to play with to
14:59 - 15:04
build Rich agents to do things that
15:01 - 15:08
frankly were just not possible you know
15:04 - 15:10
even a year ago um and one
15:08 - 15:13
aspect of this I'm particularly excited
15:10 - 15:15
about is the rise of not just large
15:13 - 15:17
language model based agents but
15:15 - 15:21
large multimodal
15:17 - 15:25
model based agents so um given an image
15:21 - 15:27
like this if you wanted to uh use a
15:25 - 15:29
LMM a large multimodal model you could
15:27 - 15:31
actually do zero-shot prompting and that's a
15:29 - 15:33
bit like telling it you know take a
15:31 - 15:36
glance at the image and just tell me the
15:33 - 15:38
output and for simple image tasks
15:36 - 15:40
that's okay you can actually have it you
15:38 - 15:42
know look at the image and uh right give
15:40 - 15:44
you the numbers of the runners or
15:42 - 15:46
something but it turns out just as with
15:44 - 15:48
large language model based agents large
15:46 - 15:51
multimodal model based agents can
15:48 - 15:53
do better with an iterative workflow where
15:51 - 15:55
you can approach this problem step by
15:53 - 15:58
step so detect the faces detect the
15:55 - 16:00
numbers put it together and so with this
15:58 - 16:03
more iterative workflow uh you can actually
16:00 - 16:06
get an agent to do some planning testing
16:03 - 16:08
write code plan test write code and come
16:06 - 16:11
up with a more complex plan as
16:08 - 16:14
articulated in code to deliver
16:11 - 16:17
on more complex tasks so what I'd like
16:14 - 16:20
to do is um show you a demo of some work
16:17 - 16:22
that uh Dan Malone and I and the Landing AI
16:20 - 16:27
team has been working on building
16:22 - 16:31
agentic workflows for visual AI
16:27 - 16:32
tasks so if we switch to my
16:32 - 16:41
um I have an image here of a uh
16:38 - 16:43
soccer game or football game and um I'm
16:41 - 16:47
going to say let's see count the
16:43 - 16:49
players in the field oh and just so you know if
16:47 - 16:50
you're not sure how to prompt it after
16:49 - 16:53
uploading an image This little light
16:50 - 16:55
bulb here you know gives some suggested
16:53 - 16:57
prompts you may ask for this uh but let
16:55 - 17:00
me run this so count players on the
16:57 - 17:02
field right and what this kicks off is a
17:00 - 17:04
process that actually runs for a couple
17:02 - 17:07
minutes um to Think Through how to write
17:04 - 17:10
code uh in order to come up with a plan to
17:07 - 17:11
give an accurate result for uh counting
17:10 - 17:12
the number of players in the field this is
17:11 - 17:13
actually a little bit complex because
17:12 - 17:15
you don't want the players in the
17:13 - 17:18
background just the ones in the field I already
17:15 - 17:22
ran this earlier so we'll just jump to
17:18 - 17:26
the result um but it says the code has
17:22 - 17:28
detected seven players on the field and
17:26 - 17:30
I think that should be right 1 2 3 4 5 6
17:30 - 17:37
um and if I were to zoom in to the model
17:33 - 17:39
output Now 1 2 3 4 five six seven I
17:37 - 17:45
think that's actually right and the part
17:39 - 17:48
of the output of this is that um it has
17:45 - 17:51
also generated code uh that you can run
17:48 - 17:54
over and over um actually generated
17:51 - 17:56
python code uh
17:54 - 17:59
that if you want you can run over and
17:56 - 18:01
over on a large collection of images.
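To give a feel for it, here is a hedged sketch of the kind of reusable detection script such a tool might generate; it is not the actual Vision Agent output. It uses an off-the-shelf YOLO model from the `ultralytics` package, and excluding people in the background would still need extra logic such as a field mask.

```python
# Hedged sketch: count people in an image with a pretrained detector.
# Assumptions: `ultralytics` installed; COCO class 0 is "person"; image path is a placeholder.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # small pretrained COCO model

def count_people(image_path: str, min_confidence: float = 0.5) -> int:
    result = model(image_path)[0]
    count = 0
    for cls, conf in zip(result.boxes.cls.tolist(), result.boxes.conf.tolist()):
        if int(cls) == 0 and conf >= min_confidence:  # class 0 == person
            count += 1
    # NOTE: ignoring spectators in the background (as in the demo) needs extra
    # logic, e.g. restricting counts to the field region.
    return count

print(count_people("soccer_frame.jpg"))
```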
17:59 - 18:04
And I think this is exciting because
18:01 - 18:06
there are a lot of companies um and
18:04 - 18:09
teams that actually have a lot of visual
18:06 - 18:12
AI data have a lot of images um have a
18:09 - 18:15
lot of videos kind of stored somewhere
18:12 - 18:18
and until now it's been really difficult
18:15 - 18:20
to get value out of this data so for a
18:18 - 18:23
lot of the you know small teams or large
18:20 - 18:25
businesses with a lot of visual data
18:23 - 18:27
visual AI capabilities like the vision
18:25 - 18:29
agent lets you take all this data
18:27 - 18:31
previously shoved somewhere in blob storage
18:29 - 18:32
and you know get real value out of
18:31 - 18:35
this I think this is a big
18:32 - 18:38
transformation for AI um here's another
18:35 - 18:42
example you know this says um given a
18:38 - 18:43
video this is another soccer game or football
18:43 - 18:48
game so given a video split the video into
18:46 - 18:50
clips of 5 seconds find the clip where a
18:48 - 18:52
goal is being scored and display a frame of the
18:50 - 18:54
output so I ran this already because it takes
18:52 - 18:56
a little time to run then this will
18:54 - 19:00
generate code evaluate code for a while
18:56 - 19:04
and this is the output and it says true
19:00 - 19:06
10 to 15 so it thinks there's a goal scored you know
19:04 - 19:10
around here around between
19:06 - 19:13
the right and there you go that's the goal
19:10 - 19:15
and also as instructed you know
19:13 - 19:17
extracted some of the frames associated
19:15 - 19:21
with this so really useful for
19:17 - 19:23
processing um video data and maybe
19:21 - 19:25
here's one last example uh of the
19:23 - 19:27
vision agent which is um you can also
19:25 - 19:29
ask it for a program to split the input
19:27 - 19:32
video into small video chunks every 6
19:29 - 19:33
seconds describe each chunk and store the
19:32 - 19:35
information in a pandas data frame along
19:33 - 19:38
with clip name start and end time and return the pandas data frame.
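Here is a hedged sketch of what that generated chunking code could look like; it is my own illustration rather than the demo's actual output, and `describe_chunk` is a placeholder where a multimodal model call would go.

```python
# Hedged sketch: split a video into 6-second chunks and collect metadata in a DataFrame.
import cv2
import pandas as pd

def describe_chunk(video_path: str, start: float, end: float) -> str:
    # Placeholder: sample frames between start/end and send them to a multimodal model.
    return "description placeholder"

def chunk_video(video_path: str, chunk_seconds: float = 6.0) -> pd.DataFrame:
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    duration = cap.get(cv2.CAP_PROP_FRAME_COUNT) / fps
    cap.release()

    rows, start = [], 0.0
    while start < duration:
        end = min(start + chunk_seconds, duration)
        rows.append({
            "clip_name": f"clip_{int(start):04d}.mp4",
            "start_time": start,
            "end_time": end,
            "description": describe_chunk(video_path, start, end),
        })
        start = end
    return pd.DataFrame(rows)

print(chunk_video("soccer_game.mp4").head())
```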
19:35 - 19:41
So this is a way to
19:38 - 19:44
look at video data that you may have and
19:41 - 19:46
generate metadata for this uh that you
19:44 - 19:48
can then store you know in Snowflake or
19:46 - 19:50
somewhere uh to then build other
19:48 - 19:54
applications on top of but just to show
19:50 - 19:57
you the output of this um so you know
19:54 - 20:00
clip name start time end time and then
19:57 - 20:02
it has actually written code um here
20:00 - 20:03
right wrote code that you can then run
20:02 - 20:06
elsewhere if you want uh put it in a
20:03 - 20:10
Streamlit app or something that you can
20:06 - 20:15
then use to then write a lot of you know
20:10 - 20:17
text descriptions for this um and using
20:15 - 20:21
this capability of the vision agent to
20:17 - 20:24
help write code my team at Landing AI
20:21 - 20:26
actually built this little demo app that
20:24 - 20:28
um uses code from the vision agent so
20:26 - 20:30
instead of us writing the code we have
20:28 - 20:34
the Vision agent write the code to build
20:30 - 20:36
this metadata and then um indexes a
20:34 - 20:39
bunch of videos so let's see I say
20:36 - 20:42
browsing so skier airborne right I
20:39 - 20:45
actually ran this earlier hope it works
20:42 - 20:47
so what this demo shows is um we already
20:45 - 20:50
ran the code to take the video split in
20:47 - 20:52
chunks store the metadata and then when
20:50 - 20:55
I do a search for skier Airborne you
20:52 - 20:57
know it shows the clips uh that have
20:57 - 21:02
similarity right right oh marked here
20:59 - 21:03
with the green has high similarity well
21:02 - 21:08
this is getting my heart rate up seeing
21:03 - 21:11
do that oh here's another one whoa all
21:08 - 21:13
right all right and the green parts
21:11 - 21:18
of the timeline show where the skier is
21:13 - 21:20
Airborne let's see gray wolf at night I
21:18 - 21:22
actually find it pretty fun yeah when
21:20 - 21:24
when you have a collection of video to
21:22 - 21:26
index it and then just browse through
21:24 - 21:29
right here's a gray wolf at night and
21:26 - 21:30
this timeline in green shows where a gray
21:29 - 21:33
wolf at night is and if I actually
21:30 - 21:35
jump to different part of the video
21:33 - 21:37
there's a bunch of other stuff as well
21:35 - 21:40
right there that's not a gray wolf at night
21:37 - 21:46
so I think that's pretty cool
21:40 - 21:46
um let's see just one last example so
21:47 - 21:53
um yeah I've actually been on the road a
21:50 - 21:56
lot uh but if you search for your luggage this
21:53 - 21:59
black luggage right
21:56 - 22:00
um there's this but it turns out
21:59 - 22:02
there's actually a lot of black luggage so
22:00 - 22:04
if you want to find your luggage let's say black
22:02 - 22:08
luggage with
22:04 - 22:09
rainbow strap since there are a lot of black
22:11 - 22:16
then you know there right black luggage
22:14 - 22:18
with rainbow strap so a lot of fun
22:16 - 22:22
things to do um and I think the nice
22:18 - 22:25
thing about this is uh the work needed
22:22 - 22:27
to build applications like this is lower
22:25 - 22:30
than ever before so let's go back to the
22:33 - 22:42
and in terms of AI opportunities I spoke
22:37 - 22:44
a bit about agentic workflows and um how
22:42 - 22:48
that is changing the AI stack is as
22:44 - 22:51
follows it turns out that in addition to
22:48 - 22:54
the stack I showed there's actually a new
22:51 - 22:56
emerging um agentic orchestration layer
22:54 - 22:58
and there are orchestration layers
22:56 - 22:59
like LangChain that have been around for a
22:58 - 23:02
while that are also becoming
22:59 - 23:04
increasingly agentic through LangGraph for
23:02 - 23:06
example and this new agentic
23:04 - 23:08
orchestration layer is also making it
23:06 - 23:10
easier for developers to build
23:08 - 23:13
applications on top uh and I hope that
23:10 - 23:15
Landing AI's Vision Agent is another
23:13 - 23:17
contribution to this that makes it easier
23:15 - 23:21
for you to build visual AI applications
23:17 - 23:22
to process all this image and video data
23:21 - 23:25
that possibly you had but that was
23:22 - 23:28
really hard to get value out of um until
23:25 - 23:30
until more recently so but fire when I
23:28 - 23:32
you what I think are maybe four of the
23:30 - 23:34
most important AI Trends there's a lot
23:32 - 23:36
going on in AI it's impossible to
23:34 - 23:38
summarize everything in one slide if you
23:36 - 23:40
had to make me pick what's the one most
23:38 - 23:42
important trend I would say it's agentic
23:40 - 23:45
AI but here are four things I think
23:42 - 23:47
are worth paying attention to first um
23:45 - 23:49
turns out agentic workflows need to read
23:47 - 23:51
a lot of text or images and generate a
23:49 - 23:54
lot of text so we say they generate a
23:51 - 23:56
lot of tokens and there are exciting efforts
23:54 - 23:59
to speed up token generation including
23:56 - 24:01
semiconductor work by companies like SambaNova
23:59 - 24:02
and others a lot of software and other
24:01 - 24:05
types of Hardware work as well this will
24:02 - 24:07
make agentic workflows work much better
24:05 - 24:09
second trend I'm excited about
24:07 - 24:11
today's large language models
24:09 - 24:14
started off being optimized to answer
24:11 - 24:16
human questions and human generated
24:14 - 24:18
instructions things like you know why
24:16 - 24:19
did Shakespeare write Macbeth or explain
24:18 - 24:21
why Shakespeare wrote Macbeth these
24:19 - 24:23
are the types of questions that large
24:21 - 24:25
language models are often asked to answer on
24:23 - 24:28
the internet but agentic workflows call
24:25 - 24:30
for other operations like tool use so the
24:28 - 24:32
fact that large language models are
24:30 - 24:35
often now tuned explicitly to support
24:32 - 24:37
tool use or just a couple weeks ago um
24:35 - 24:39
Anthropic released a model that can
24:37 - 24:41
support computer use I think these
24:39 - 24:43
exciting developments create a lot
24:41 - 24:45
of lift and create a much higher
24:43 - 24:48
ceiling for what we can now get agentic
24:45 - 24:50
workflows to do with large language models
24:48 - 24:53
that are tuned not just to answer human
24:50 - 24:57
queries but tuned explicitly to
24:53 - 24:58
fit into these iterative agentic workflows
24:58 - 25:03
third data engineering's importance is rising
25:01 - 25:05
particularly with unstructured data it
25:03 - 25:07
turns out that a lot of the value of
25:05 - 25:10
machine learning was on structured data
25:07 - 25:12
kind of tables of numbers but with gen AI
25:10 - 25:14
we're much better than ever before at
25:12 - 25:17
processing text and images and video and
25:14 - 25:19
maybe audio and so the importance of
25:17 - 25:21
data engineering is increasing in terms
25:19 - 25:22
of how to manage your unstructured data
25:21 - 25:24
and the metadata for that and
25:22 - 25:26
deployment to get the unstructured data
25:24 - 25:28
where it needs to go to create value so
25:26 - 25:31
that would be a major effort for a
25:28 - 25:32
lot of large businesses and then lastly
25:31 - 25:34
um I think we've all seen that the text
25:32 - 25:36
processing revolution has already
25:34 - 25:38
arrived the image processing Revolution
25:36 - 25:40
is in a slightly early phase but it is
25:38 - 25:42
coming and as it comes many people many
25:40 - 25:45
businesses um will be able to get a lot
25:42 - 25:48
more value out of the visual data than
25:45 - 25:49
was possible ever before and I'm excited
25:48 - 25:51
because I think that will significantly
25:49 - 25:56
increase the space of applications we
25:51 - 25:59
can build as well so just to wrap up this
25:56 - 26:01
is a great time to be a builder uh gen AI
25:59 - 26:03
is letting us experiment faster than
26:01 - 26:05
ever agentic AI is expanding the set of
26:03 - 26:08
things that are now possible and there are just
26:05 - 26:11
so many new applications that we can now
26:08 - 26:13
build in visual AI or not in visual AI
26:11 - 26:15
that just weren't possible ever before
26:13 - 26:19
if you're interested in checking out the
26:15 - 26:21
uh visual AI demos that I ran uh please
26:19 - 26:24
go to va.landing.ai the exact demos
26:21 - 26:26
that I ran you can try out yourself
26:24 - 26:28
online and get the code and uh run code
26:26 - 26:31
yourself in your own applications so
26:28 - 26:32
with that let me say thank you all very
26:31 - 26:34
much and please also join me in
26:32 - 26:37
welcoming Elsa back onto the stage thank you