Please welcome Andrew.

[Applause]

Thank you. It's such a good time to be a builder. I'm excited to be back here at Snowflake BUILD. What I'd like to do today is share with you where I think some of AI's biggest opportunities are. You may have heard me say that I think AI is the new electricity. That's because AI is a general-purpose technology, like electricity. If I ask you what electricity is good for, it's hard to answer because it's good for so many different things. And new AI technology is creating a huge set of opportunities for us to build new applications that weren't possible before.

People often ask me, "Hey Andrew, where are the biggest AI opportunities?" This is what I think of as the AI stack. At the lowest level are the semiconductors, and on top of that is a lot of the cloud infrastructure, including of course Snowflake. On top of that are many of the foundation model trainers and models. It turns out that a lot of the media hype and excitement and social media buzz has been on these layers of the stack, the new technology layers. When there's a new technology like generative AI, the buzz lands on those technology layers, and there's nothing wrong with that. But I think that, almost by definition, there's another layer of the stack that has to work out even better, and that's the application layer, because we need the applications to generate even more value and even more revenue in order to afford to pay the technology providers below. So I spend a lot of my time thinking about AI applications, and I think that's where a lot of the best opportunities will be to build new things.

One of the trends that has been growing for the last couple of years, in no small part because of generative AI, is faster and faster machine learning model development. In particular, generative AI is letting us build things faster than ever before. Take the problem of, say, building a sentiment classifier: taking text and deciding whether it expresses positive or negative sentiment, for reputation monitoring, say. A typical workflow using supervised learning might be that it takes a month to get some labeled data, then training the AI model might take a few months, and then finding a cloud service or something to deploy on takes another few months. So for a long time, very valuable AI systems might take good AI teams six to twelve months to build, and there's nothing wrong with that; I think many people created very valuable AI systems this way. But with generative AI, there are certain classes of applications where you can write a prompt in days and then deploy it in, again, maybe days. What this means is there are a lot of applications that used to take me, and used to take very good AI teams, months to build, that today you can build in maybe ten days or so. This opens up the opportunity to experiment with and build new prototypes and ship new AI products. That's certainly the prototyping aspect of it.
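
For illustration, here is a minimal sketch of the kind of prompt-based sentiment classifier described above, assuming the OpenAI Python SDK as the model provider; any chat-model API would do, and the model name and prompt wording are illustrative choices, not from the talk.

```python
# A minimal sketch of a prompt-based sentiment classifier, assuming the
# OpenAI Python SDK; swap in whichever chat-model API you actually use.
# Model name and prompt wording are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def classify_sentiment(text: str) -> str:
    """Return 'positive' or 'negative' for a piece of text."""
    prompt = (
        "Classify the sentiment of the following text as exactly one word, "
        "'positive' or 'negative'.\n\n"
        f"Text: {text}\n\nSentiment:"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip().lower()

print(classify_sentiment("The support team resolved my issue in minutes."))
```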

These are some of the consequences of this trend. First, fast experimentation is becoming a more promising path to invention. Previously, if it took six months to build something, then we'd better study it, make sure there's user demand, have product managers look at it, document it, and then spend all that effort to build it, and hopefully it turns out to be worthwhile. But now, for fast-moving AI teams, I see a design pattern where you can say, you know what, it takes us a weekend to throw together a prototype, so let's build 20 prototypes and see what sticks; and if 18 of them don't work out, we'll just ditch them and stick with what works. So fast iteration and fast experimentation are becoming a new path to inventing new user experiences.

One interesting implication is that evaluations, or evals for short, are becoming a bigger bottleneck in how we build things. It turns out that back in the supervised learning world, if you were collecting 10,000 data points anyway to train a model, then needing to collect an extra 1,000 data points for testing was fine; it was roughly a 10% increase in cost. But for a lot of large-language-model-based apps, there's no need to have any training data at all, so if you made me slow down to collect a thousand test examples, boy, that seems like a huge bottleneck. And so the new development workflow often feels as if we're building and collecting data more in parallel rather than sequentially: we build a prototype, and then, as it becomes more important and as robustness and reliability become more important, we gradually build up that test set in parallel. But I see exciting innovations still to be had in how we build evals.
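
As a small illustration of growing a test set in parallel with the prototype, here is a minimal sketch of a hand-built eval loop; the examples are made up for illustration, and `classify_sentiment` refers to the prompt-based classifier sketched earlier.

```python
# A minimal sketch of the "grow a small eval set in parallel" idea:
# a handful of hand-labeled examples checked on every change to the prompt.
# The examples below are illustrative, not from the talk.
EVAL_SET = [
    ("I love how fast the new checkout flow is.", "positive"),
    ("The app crashes every time I open my cart.", "negative"),
    ("Support never answered my ticket.", "negative"),
    ("Great update, the dashboard is much clearer now.", "positive"),
]

def run_evals(classify) -> float:
    """Return the accuracy of a classifier function over the small eval set."""
    correct = 0
    for text, expected in EVAL_SET:
        predicted = classify(text)
        if predicted == expected:
            correct += 1
        else:
            print(f"MISS: {text!r} -> {predicted!r} (expected {expected!r})")
    return correct / len(EVAL_SET)

# Example usage, assuming classify_sentiment from the previous sketch:
# print(f"accuracy: {run_evals(classify_sentiment):.0%}")
```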

What I'm seeing as well is that the prototyping of the machine learning piece has become much faster, but building a software application has lots of other steps: does the product work, does the design work, does the software integration work, a lot of plumbing work, and then after deployment, DevOps and LLMOps. Some of those other pieces are becoming faster too, but they haven't become faster at the same rate that the machine learning modeling part has. So you take a process and one piece of it becomes much faster: prototyping is now really, really fast, but to take a prototype into robust, reliable production with guardrails and so on, those other steps still take some time. The interesting dynamic I'm seeing is that the fact that the machine learning part is so fast is putting a lot of pressure on organizations to speed up all of those other parts as well. So that's been exciting progress for our field.

In terms of how machine learning development is speeding things up, I think the mantra "move fast and break things" got a bad rap because, you know, it broke things. Some people interpret this to mean we shouldn't move fast, but I disagree with that. I think the better mantra is "move fast and be responsible." I'm seeing a lot of teams able to prototype quickly and to evaluate and test robustly, without shipping anything out to the wider world that could cause damage or meaningful harm. I'm finding smart teams able to build really quickly and move really fast, but also to do this in a very responsible way, and I find it exhilarating that you can build things and ship things responsibly much faster than ever before.

Now, there's a lot going on in AI, and of all the things going on, in terms of technical trends the one trend I'm most excited about is agentic AI workflows. If you were to ask what's the one most important AI technology to pay attention to, I would say it's agentic AI. I think when I started saying this near the beginning of this year it was a bit of a controversial statement, but now the term "AI agents" has become so widely used, by technical and non-technical people alike, that it's become a little bit of a hype term. So let me just share with you how I view AI agents and why I think they're important, approaching it purely from a technical perspective.

The way that most of us use large language models today is with something called zero-shot prompting, and that roughly means we give the model a prompt and ask it to write an essay or produce an output for us. It's a bit like going to a person, or in this case to an AI, and asking it to type out an essay by writing from the first word to the last word all in one go, without ever using backspace, just writing from start to finish. It turns out we don't do our best writing that way, but despite the difficulty of being forced to write this way, large language models do pretty well.

Here's what an agentic workflow looks like instead. To generate an essay, we ask an AI to first write an essay outline, then ask it whether it needs to do some web research; if so, download some web pages and put them into the context of the large language model; then write the first draft; then read the first draft, critique it, and revise the draft; and so on. This workflow looks much more like doing some thinking or some research, then some revision, then going back to do more thinking and more research. Going around this loop over and over takes longer, but it results in a much better work output.
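
To make the loop concrete, here is a minimal sketch of the outline-draft-critique-revise workflow described above, assuming the OpenAI Python SDK; the prompts, model name, and number of revision rounds are illustrative choices, not from the talk.

```python
# A minimal sketch of the outline -> draft -> critique -> revise essay loop.
# Assumes the OpenAI Python SDK; prompts and model name are illustrative.
from openai import OpenAI

client = OpenAI()

def llm(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def write_essay(topic: str, rounds: int = 2) -> str:
    outline = llm(f"Write a short outline for an essay on: {topic}")
    draft = llm(f"Using this outline, write a first draft.\n\nOutline:\n{outline}")
    for _ in range(rounds):
        critique = llm(f"Critique this draft. List concrete weaknesses.\n\n{draft}")
        draft = llm(
            f"Revise the draft to address this critique.\n\n"
            f"Draft:\n{draft}\n\nCritique:\n{critique}"
        )
    return draft

print(write_essay("Why AI is the new electricity"))
```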

So in some teams I work with, we apply this agentic workflow to processing complex, tricky legal documents, or to healthcare diagnosis assistance, or to very complex compliance with government paperwork, and many times I'm seeing this drive much better results than were ever possible before. One thing I want to focus on later in this presentation is the rise of visual AI, where agentic workflows are letting us process image and video data; I'll get back to that later.

It turns out that there are benchmarks that seem to show agentic workflows deliver much better results. This is the HumanEval benchmark, a benchmark from OpenAI that measures a large language model's ability to solve coding puzzles like this one. My team collected some data, and it turns out that on this benchmark, I think it was the pass@k metric, GPT-3.5 got 48% right on this coding benchmark. GPT-4 was a huge improvement, at 67%. But the improvement from GPT-3.5 to GPT-4 is dwarfed by the improvement from GPT-3.5 to GPT-3.5 used in an agentic workflow, which gets up to about 95%. And GPT-4 wrapped in an agentic workflow also does much better.

In the way builders build agentic reasoning, or agentic workflows, into their applications, there are, I want to say, four major design patterns: reflection, tool use, planning, and multi-agent collaboration. To demystify agentic workflows a little bit, let me quickly step through what these workflows mean. I find that agentic workflows sometimes seem a little bit mysterious until you actually read through the code for one or two of them and go, "Oh, that's it? That's really cool; that's all it takes?" So let me just step through them.

For concreteness, here's what reflection with LLMs looks like. I might start off prompting an LLM, a coder agent, maybe with a system message telling it that its role is to be a coder and write code. So you can tell it, please write code for a certain task, and the LLM may generate code. Then it turns out that you can construct a prompt that takes the code that was just generated, copy-pastes that code back into the prompt, and asks: here's some code intended for a task; examine this code and critique it. And it turns out that if you prompt the same LLM this way, it may sometimes find some problems with the code or make some useful suggestions to improve it. You then prompt the same LLM with that feedback and ask it to improve the code, and it comes back with a new version. And, maybe foreshadowing tool use, you can have the LLM run some unit tests and give the output of the unit tests back to the LLM as additional feedback, to help it iterate further and further improve the code. It turns out that this type of reflection workflow is not magic, and it doesn't solve all problems, but it will often take the baseline level of performance and lift it to a better level of performance. It also turns out that with this type of workflow, where we're prompting an LLM to critique its own output and use its own criticism to improve it, this maybe also foreshadows multi-agent workflows, where you can prompt one LLM to sometimes play the role of a coder and sometimes play the role of a critic that reviews the code. It's the same conversation, but we prompt the LLM differently, telling it to sometimes work on the code and sometimes try to make helpful suggestions, and this likewise results in improved performance. So this is the reflection design pattern.
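
Here is a minimal sketch of the reflection pattern just described, write code, critique it, revise it, assuming the OpenAI Python SDK; the model name, prompts, and round count are illustrative choices, not from the talk.

```python
# A minimal sketch of the reflection pattern: one model call writes code,
# a second call critiques it, and a third call revises it. Assumes the
# OpenAI Python SDK; prompts and model name are illustrative.
from openai import OpenAI

client = OpenAI()

def llm(system: str, user: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    )
    return resp.choices[0].message.content

def write_code_with_reflection(task: str, rounds: int = 2) -> str:
    code = llm("You are a careful Python programmer.",
               f"Write code for this task:\n{task}")
    for _ in range(rounds):
        critique = llm(
            "You are a code reviewer.",
            f"Here is some code intended for the task: {task}\n\n"
            f"{code}\n\nExamine the code and critique it.",
        )
        code = llm(
            "You are a careful Python programmer.",
            f"Improve the code based on this critique.\n\n"
            f"Code:\n{code}\n\nCritique:\n{critique}",
        )
    return code

print(write_code_with_reflection("Return the n-th Fibonacci number."))
```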

The second major design pattern is tool use, in which a large language model can be prompted to generate a request for an API call, so that it decides when it needs to search the web, or execute code, or carry out a task like issuing a customer refund, sending an email, or pulling up a calendar entry. Tool use is a major design pattern that is letting large language models make function calls, and I think this is expanding what we can do with these agentic workflows.

Real quick, here's the planning, or reasoning, design pattern. If you give a fairly complex request, say, generate an image of a girl reading a book, and so on, then an LLM, in this example adapted from the HuggingGPT paper, can look at the request and decide to first use an OpenPose model to detect the pose, then after that generate a picture of a girl, then after that describe the image, and then after that use text-to-speech (TTS) to generate the audio. So in planning, an LLM looks at a complex request and picks a sequence of actions to execute in order to deliver on the complex task.

complex task um and lastly multi Asian

13:33 - 13:37

collaboration is that design pattern

13:35 - 13:40

alluded to where instead of prompting an

13:37 - 13:42

LM to just do one thing you prompt the

13:40 - 13:44

LM to play different roles at different

13:42 - 13:46

points in time so the different agents

13:44 - 13:49

simulate agents interact with each other

13:46 - 13:52

and come together to solve a task and I

13:49 - 13:54

know that some people may may wonder you

13:52 - 13:57

know if you're using one why do you need

13:54 - 13:59

to make this one play the role with

13:57 - 14:02

multip multiple agents um many teams

13:59 - 14:04

have demonstrated significant improved

14:02 - 14:07

performance for a variety of tasks using

14:04 - 14:08

this design pattern and it turns out

14:07 - 14:10

that if you have an LM sometimes

14:08 - 14:13

specialize on different tasks maybe one

14:10 - 14:14

at a time have it interact many teams

14:13 - 14:18

seem to really get much better results

14:14 - 14:20

using this I feel like maybe um there's

14:18 - 14:23

an analogy to if you're running jobs on

14:20 - 14:25

a processor on a CPU you why do we need

14:23 - 14:27

multiple processes it's all the same

14:25 - 14:29

process there you know at the end of the

14:27 - 14:31

day but we found that having multiple FS

14:29 - 14:33

of processes is a useful extraction for

14:31 - 14:35

developers to take a task and break it

14:33 - 14:37

down to subtask and I think multi-agent

14:35 - 14:39

collaboration is a bit like that too if

14:37 - 14:41

you were big task then if you think of

14:39 - 14:43

hiring a bunch of agents to do different

14:41 - 14:46

pieces of task then interact sometimes

14:43 - 14:48

that helps the developer um build

14:46 - 14:52

complex systems to deliver a good

14:48 - 14:54

result so I think with these four major

14:52 - 14:57

So I think these four major agentic design patterns, these agentic reasoning workflow design patterns, give us a huge space to play with, to build rich agents that do things that frankly were just not possible even a year ago. One aspect of this I'm particularly excited about is the rise of not just large language model based agents but large multimodal model based agents. Given an image like this, if you wanted to use an LMM, a large multimodal model, you could actually do zero-shot prompting, which is a bit like telling it: take a glance at the image and just tell me the output. For simple image tasks that's okay; you can have it look at the image and, say, give you the numbers of the runners or something. But it turns out that, just as with large language model based agents, large multimodal model based agents can do better with an iterative workflow, where you approach the problem step by step: detect the faces, detect the numbers, put it all together. With this more iterative workflow you can actually get an agent to do some planning, testing, and writing of code, then plan, test, and write code again, and come up with a more complex plan, articulated and expressed in code, to deliver on more complex tasks.

So what I'd like to do is show you a demo of some work that Dan Malone and I and the Landing AI team have been working on, building agentic workflows for visual AI tasks. So, if we switch to my laptop: I have an image here of a soccer game, or football game, and I'm going to say, let's see, count the players in the view. Oh, and just for fun, if you're not sure how to prompt it, after uploading an image this little light bulb here gives some suggested prompts you might ask. But let me run this: count players on the field. What this kicks off is a process that actually runs for a couple of minutes, thinking through how to write code in order to come up with a plan that gives an accurate result for counting the number of players in the view. This is actually a little bit complex, because you don't want the players in the background, just the ones on the field. I already ran this earlier, so we'll just jump to the result, but it says the code has selected seven players on the field, and I think that should be right: one, two, three, four, five, six, seven. And if I zoom in to the model output: one, two, three, four, five, six, seven; I think that's actually right. Part of the output is that it has also generated code that you can run over and over, actually generated Python code that, if you want, you can run over and over on a large collection of images.

I think this is exciting because there are a lot of companies and teams that actually have a lot of visual AI data, a lot of images and a lot of videos, stored somewhere, and until now it's been really difficult to get value out of this data. So for a lot of small teams or large businesses with a lot of visual data, visual AI capabilities like the Vision Agent let you take all this data, previously shoved somewhere in blob storage, and get real value out of it. I think this is a big transformation for AI.

Here's another example. This one says, given a video, and this is another soccer game, or football game: split the video into clips of five seconds, find the clip where a goal is being scored, and display a frame of the output. I ran this already, because it takes a little time to run; it generates code and evaluates code for a while, and this is the output. It says true, 10 to 15, so it thinks a goal is scored around there, between the 10- and 15-second marks, and there you go, that's the goal. Also, as instructed, it extracted some of the frames associated with this. So this is really useful for processing video data.

And maybe here's one last example of the Vision Agent: you can also ask it for a program to split the input video into small video chunks every six seconds, describe each chunk, and store the information in a pandas DataFrame along with the clip name and start and end time, then return the pandas DataFrame. So this is a way to look at video data you may have and generate metadata for it, which you can then store, you know, in Snowflake or somewhere, in order to build other applications on top of. Just to show you the output of this: there's the clip name, start time, end time, and it has actually written code, right, code that you can then run elsewhere if you want, say in a Streamlit app or something, that you can then use to write a lot of text descriptions for this.
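
To give a feel for what a program along these lines might look like, here is a rough sketch that chunks a video every six seconds and collects per-chunk metadata into a pandas DataFrame. This is not the actual code the Vision Agent generates; the `describe_chunk` helper is a hypothetical stand-in for whatever captioning model you would use, and the file name is made up.

```python
# A rough sketch of the kind of program described above: split a video into
# six-second chunks and collect per-chunk metadata into a pandas DataFrame.
# describe_chunk() is a hypothetical stand-in for a captioning model.
import cv2
import pandas as pd

def describe_chunk(start_s: float, end_s: float) -> str:
    return f"(pretend description of {start_s:.0f}-{end_s:.0f}s)"  # placeholder

def index_video(path: str, chunk_seconds: float = 6.0) -> pd.DataFrame:
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    frames = cap.get(cv2.CAP_PROP_FRAME_COUNT)
    duration = frames / fps if fps else 0.0
    cap.release()

    rows = []
    start = 0.0
    while start < duration:
        end = min(start + chunk_seconds, duration)
        rows.append({
            "clip_name": f"clip_{int(start):04d}",
            "start_time": start,
            "end_time": end,
            "description": describe_chunk(start, end),
        })
        start = end
    return pd.DataFrame(rows)

print(index_video("match.mp4").head())
```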

And using this capability of the Vision Agent to help write code, my team at Landing AI actually built this little demo app that uses code from the Vision Agent. So instead of us writing the code, we had the Vision Agent write the code to build this metadata, and it then indexes a bunch of videos. So let's see, browsing, "skier airborne," right; I actually ran this earlier, so I hope it works. What this demo shows is that we already ran the code to take the video, split it into chunks, and store the metadata, and then when I do a search for "skier airborne," it shows the clips that have high similarity, marked here in green. Well, this is getting my heart rate up just watching it. Oh, here's another one, whoa, all right. And the green parts of the timeline show where the skier is airborne.

Let's see: "gray wolf at night." I actually find it pretty fun, when you have a collection of video, to index it and then just browse through it. Right, here's a gray wolf at night, and this timeline in green shows where a gray wolf at night appears. And if I jump to a different part of the video, there's a bunch of other stuff as well, right there, that's not a gray wolf at night. So I think that's pretty cool.

Let's see, just one last example. I've actually been on the road a lot, but if you search for your luggage, say this black luggage, right, there it is; but it turns out there's actually a lot of black luggage. So if you want your luggage, let's say "black luggage with rainbow strap": there's a lot of black luggage out there, but then, you know, there it is, right, black luggage with rainbow strap. So there are a lot of fun things to do, and I think the nice thing about this is that the work needed to build applications like this is lower than ever before.
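
One plausible way to implement the "high similarity" clip search shown in these demos is to embed each clip's text description and the query with a text-embedding model and rank clips by cosine similarity. The sketch below is an assumed implementation for illustration, not the actual demo code; it reuses the metadata DataFrame from the earlier sketch and an embedding model name chosen as an example.

```python
# A plausible sketch of clip search by similarity: embed clip descriptions
# and the query, then rank by cosine similarity. Assumes the OpenAI
# embeddings API; the model name is an example choice.
import numpy as np
import pandas as pd
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

def search_clips(df: pd.DataFrame, query: str, top_k: int = 3) -> pd.DataFrame:
    clip_vecs = embed(df["description"].tolist())
    query_vec = embed([query])[0]
    sims = clip_vecs @ query_vec / (
        np.linalg.norm(clip_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    return df.assign(similarity=sims).nlargest(top_k, "similarity")

# Example usage with the metadata DataFrame built by the earlier sketch:
# print(search_clips(index_video("ski_trip.mp4"), "skier airborne"))
```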

So let's go back to the slides.

In terms of AI opportunities, I spoke a bit about agentic workflows, and how that is changing the AI stack is as follows. It turns out that in addition to the stack I showed, there's actually a new, emerging agentic orchestration layer. There have been orchestration layers like LangChain around for a while, and they are also becoming increasingly agentic, through LangGraph for example, and this new agentic orchestration layer is also making it easier for developers to build applications on top. I hope that Landing AI's Vision Agent is another contribution to this, making it easier for you to build visual AI applications to process all the image and video data that you may already have but that was really hard to get value out of until more recently.

So before I finish, let me share with you what I think are maybe four of the most important AI trends. There's a lot going on in AI, and it's impossible to summarize everything in one slide; if you made me pick the one most important trend, I would say it's agentic AI, but here are four things I think are worth paying attention to. First, it turns out that agentic workflows need to read a lot of text or images and generate a lot of text, so we say they generate a lot of tokens, and there are exciting efforts to speed up token generation, including semiconductor work by SambaNova, Cerebras, Groq, and others, plus a lot of software and other types of hardware work as well. This will make agentic workflows work much better.

The second trend I'm excited about: today's large language models started off being optimized to answer human questions and follow human-generated instructions, things like "why did Shakespeare write Macbeth?" or "explain why Shakespeare wrote Macbeth." These are the types of questions that large language models are often asked to answer on the internet. But agentic workflows call for other operations, like tool use. So the fact that large language models are now often tuned explicitly to support tool use, or that just a couple of weeks ago Anthropic released a model that can support computer use, I think these exciting developments create a lot of lift, a much higher ceiling for what we can get agentic workflows to do, with large language models tuned not just to answer human queries but tuned explicitly to fit into these iterative agentic workflows.

Third, data engineering's importance is rising, particularly with unstructured data. It turns out that a lot of the value of machine learning used to come from structured data, tables of numbers, but with gen AI we're much better than ever before at processing text and images and video, and maybe audio. So the importance of data engineering is increasing, in terms of how to manage your unstructured data and the metadata for it, and how to deploy it to get the unstructured data where it needs to go to create value. I think that will be a major effort for a lot of large businesses.

And then lastly, I think we've all seen that the text processing revolution has already arrived. The image processing revolution is in a slightly earlier phase, but it is coming, and as it comes, many people and many businesses will be able to get a lot more value out of visual data than was ever possible before. I'm excited because I think that will significantly increase the space of applications we can build as well.

So, just to wrap up: this is a great time to be a builder. Gen AI is letting us experiment faster than ever, agentic AI is expanding the set of things that are now possible, and there are just so many new applications, in visual AI and beyond, that we can now build that just weren't possible before. If you're interested in checking out the visual AI demos that I ran, please go to va.landing.ai; the exact demos that I ran you can try out yourself online, and you can get the code and run it yourself in your own applications. So with that, let me say thank you all very much, and please also join me in welcoming Elsa back onto the stage. Thank you.

Building AI Applications with Agentic Workflows

In the video, Andrew discusses the exciting opportunities in AI and the shift he sees towards agentic AI workflows, a key aspect of AI development. Andrew believes AI is comparable to electricity - a versatile tool with endless possibilities. He emphasizes the importance of the application layer in AI development, highlighting how fast and iterative development cycles are shortening the time needed to build valuable AI systems.

Agentic AI: A New Era in Development

Andrew introduces the concept of agentic AI workflows, emphasizing the shift towards more iterative, interactive, and thoughtful AI processes. He details how these workflows involve prompting AI models to perform complex tasks step by step, enabling better outcomes and problem-solving capabilities. The discussion includes the reflection, tool use, planning, and multi-agent collaboration design patterns that enhance the effectiveness of AI systems.

Advancements in Agentic AI Workflows

Moreover, Andrew showcases practical examples of agentic workflows in action, particularly in the realm of visual AI tasks. He demonstrates how AI models can process visual data efficiently, such as counting players on a field or identifying specific scenes in videos. These capabilities open up new opportunities for businesses to derive insights from visual data that were previously challenging to extract.

The Future of AI Development

Looking ahead, Andrew underscores the importance of data engineering with unstructured data, the rise of large language models optimized for agentic workflows, and the ongoing revolution in image processing. He envisions a future where businesses can leverage visual data to create innovative applications that were once unimaginable.

In conclusion, Andrew expresses optimism for builders in the AI space, citing the increased speed of experimentation and the expanding possibilities with agentic AI. The accessibility of demos and tools like Landing AI's Vision Agent signifies a transformative phase in AI development, empowering developers to create groundbreaking applications more efficiently than ever.

In a rapidly evolving AI landscape, the adoption of agentic workflows heralds a new era of intelligent, efficient, and responsible AI development practices. As AI continues to revolutionize various industries, the emphasis on thoughtful, iterative workflows will be crucial in unlocking the full potential of artificial intelligence.