00:00 - 00:05

hey I'm Dave welcome to my shop this is

00:03 - 00:07

NVIDIA's Jetson Orin Nano Super, and it's

00:05 - 00:09

an impressive edge computer capable of

00:07 - 00:12

running DeepSeek R1 models right on the

00:09 - 00:14

device. It's got 1024 CUDA cores, 32

00:12 - 00:18

tensor cores, 8 GB of

00:14 - 00:20

LPDDR5, six ARM CPU cores, SSD expansion,

00:18 - 00:22

and much much more we're going to use it

00:20 - 00:24

today to flip the script on AI as I show

00:22 - 00:27

you how to run it locally on your own

00:24 - 00:30

desktop or on the Nano and then for

00:27 - 00:31

comparison I'll show you the massive

00:30 - 00:34

671-billion-parameter version running

00:31 - 00:36

uncorked on a top-end Threadripper. You

00:34 - 00:37

see when it comes to AI most of us are

00:36 - 00:39

used to asking questions in a web

00:37 - 00:41

browser window and then waiting for the

00:39 - 00:43

cloud to do its thing but what if you

00:41 - 00:44

didn't need the cloud at all what if you

00:43 - 00:46

could ask the same questions and get

00:44 - 00:49

answers from an AI running right on your

00:46 - 00:51

desk in the privacy of your home lab or

00:49 - 00:54

maybe even your own garage that's where

00:51 - 00:56

DeepSeek R1 comes in. It's a next-gen

00:54 - 00:58

conversational AI that, unlike its cloud-

00:56 - 01:00

locked cousins can be self-hosted at

00:58 - 01:02

home the advantages of this are clear

01:00 - 01:04

once you think about them you get full

01:02 - 01:06

control over your data privacy isn't

01:04 - 01:07

somebody else's problem and you avoid

01:06 - 01:10

the recurring subscription fees that

01:07 - 01:12

many services charge and perhaps best of

01:10 - 01:14

all it can be just plain faster or at

01:12 - 01:16

least more responsive when you're not at

01:14 - 01:18

the mercy of server latency or network

01:16 - 01:19

outages. You've suddenly got yourself an AI

01:18 - 01:22

assistant that's truly yours no

01:19 - 01:23

middleman required and if like me you're

01:22 - 01:26

working on something that has complex

01:23 - 01:27

code that requires a large context window

01:26 - 01:29

you won't burn through your OpenAI

01:27 - 01:31

subscription meter quite as quickly. Now,

01:29 - 01:33

the specs on this little guy as I said

01:31 - 01:38

earlier, are pretty impressive: 1024 CUDA

01:33 - 01:41

cores, 32 tensor cores, six CPU cores, 8 GB

01:38 - 01:43

of RAM, and what do we got in here, a 1 TB

01:41 - 01:45

SSD as it's configured but what does all

01:43 - 01:46

this mean in practical terms? Well, it's a

01:45 - 01:48

bit like having the brain of a

01:46 - 01:50

workstation GPU packed into something

01:48 - 01:52

small enough to well almost fit in the

01:50 - 01:53

palm of your hand but the kicker here is

01:52 - 01:55

that it's specifically tuned for AI

01:53 - 01:58

workloads that makes it the perfect

01:55 - 02:00

platform for DeepSeek R1, an AI model that

01:58 - 02:03

thrives on edge hardware. So let's talk

02:00 - 02:04

setup. To get DeepSeek R1 up and running

02:03 - 02:06

at home we're using a program called

02:04 - 02:08

Ollama. If you're not familiar with it,

02:06 - 02:09

think of it like a streamlined

02:08 - 02:12

deployment tool for large language

02:09 - 02:14

models: you run Ollama, Ollama downloads and

02:12 - 02:16

runs the models for you. Ollama simplifies

02:14 - 02:18

the process of downloading setting up

02:16 - 02:20

and configuring AI models without

02:18 - 02:22

needing to be a wizard or really even

02:20 - 02:23

know that much about them I know some of

02:22 - 02:25

you probably love the command line as

02:23 - 02:26

much as I do especially if you grew up

02:25 - 02:28

compiling kernels on machines that

02:26 - 02:30

couldn't load web pages yet, but trust me

02:28 - 02:32

when I say that Ollama does make life a lot

02:30 - 02:34

easier you'll be up and running in

02:32 - 02:36

minutes not hours and the good news is

02:34 - 02:37

you can still use the command line to

02:36 - 02:40

operate it if you prefer once it's

02:37 - 02:41

running, I'll set Ollama up on the Orin

02:40 - 02:44

Nano just as you might on your own

02:41 - 02:45

desktop PC and we'll use it the same way

02:44 - 02:48

so everything we cover here works with

02:45 - 02:49

your own desktop GPU as well the

02:48 - 02:51

installation is straightforward, and once

02:49 - 02:54

it's done, we'll pull the DeepSeek R1

02:51 - 02:58

model down from Ollama's catalog. We use the

02:54 - 03:02

following command:

02:58 - 03:03

ollama pull deepseek-r1:1.5b
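For reference, the basic terminal sequence looks something like this; the install one-liner is Ollama's published Linux script, so treat the exact URL as an assumption and check ollama.com if it has changed:

  # install Ollama (assumed current Linux install script from ollama.com)
  curl -fsSL https://ollama.com/install.sh | sh

  # download the 1.5-billion-parameter DeepSeek R1 model
  ollama pull deepseek-r1:1.5b

  # start an interactive chat session with it in the terminal
  ollama run deepseek-r1:1.5b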

03:02 - 03:04

And yes, this step does require an internet connection, but

03:03 - 03:06

here's the beauty of it once the model

03:04 - 03:08

is then downloaded you're done with the

03:06 - 03:10

web. You could pull the cable; everything

03:08 - 03:12

after that is completely local and why

03:10 - 03:14

does this matter? Well, for one, privacy:

03:12 - 03:16

when you run DeepSeek R1 locally, your

03:14 - 03:18

queries and data never leave your

03:16 - 03:20

machine if you've ever hesitated to ask

03:18 - 03:22

a sensitive question to a cloud-based AI

03:20 - 03:24

you're not alone the idea that your

03:22 - 03:26

inquiries might live on forever in some

03:24 - 03:28

faraway server, in a state that represents

03:26 - 03:30

you, can be a bit unnerving. With DeepSeek

03:28 - 03:32

R1, what you ask stays right where

03:30 - 03:34

you ask it on the Jetson Nano sitting on

03:32 - 03:36

your desk but privacy isn't the only win

03:34 - 03:38

here there's something satisfying about

03:36 - 03:40

the idea of self-hosting it's the same

03:38 - 03:42

appeal that drew many of us into running

03:40 - 03:43

our own web servers back in the day I

03:42 - 03:45

mean I didn't need to be running

03:43 - 03:47

exchange server at home for my email but

03:45 - 03:49

I was. And running DeepSeek R1 locally

03:47 - 03:51

scratches that same kind of itch it's a

03:49 - 03:53

project that you control and there's a

03:51 - 03:54

sense of ownership that comes with that

03:53 - 03:56

plus you get the added benefit of

03:54 - 03:58

knowing that your setup can run even

03:56 - 04:00

when your internet connection doesn't

03:58 - 04:02

Once Ollama is installed and the model is

04:00 - 04:03

loaded running queries is as simple as

04:02 - 04:05

opening a terminal or connecting to its

04:03 - 04:07

web interface you can input your

04:05 - 04:09

questions just like you would any other

04:07 - 04:11

AI chatbot and the responses come back

04:09 - 04:12

in near real time assuming you're not

04:11 - 04:15

asking it to write the Great American

04:12 - 04:17

novel or do innovative fluid-dynamics

04:15 - 04:18

simulations.
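As an illustration of the local-only workflow, you can either chat interactively with ollama run, or send a request to Ollama's local HTTP API, which listens on port 11434 by default; the prompt below is just a stand-in example:

  # one-off query against the local API (no cloud involved)
  curl http://localhost:11434/api/generate -d '{
    "model": "deepseek-r1:1.5b",
    "prompt": "Why are no two snowflakes alike?",
    "stream": false
  }'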

04:17 - 04:20

Now, this is also a reasoning model, so it does think for a while

04:18 - 04:22

before it generates an answer but the

04:20 - 04:24

thinking is fast and starts immediately

04:22 - 04:25

the Jetson Orin Nano handles most

04:24 - 04:28

conversational queries with ease thanks

04:25 - 04:30

to its optimized tensor cores and GPU

04:28 - 04:31

compute capabilities

04:30 - 04:33

Let's consider the practical side of

04:31 - 04:36

things say you're working on a coding

04:33 - 04:37

project, maybe something in Python or C++.

04:36 - 04:40

Now, I've managed to burn through my OpenAI

04:37 - 04:42

monthly credits in just a few days by

04:40 - 04:44

iterating with the AI on a complex piece

04:42 - 04:46

of code because the longer the context

04:44 - 04:47

window gets the more resources it

04:46 - 04:50

consumes but if you're running it

04:47 - 04:51

locally you don't care you just want the

04:50 - 04:53

code that it produces to work, and you don't

04:51 - 04:55

want to be billed for it as it goes about

04:53 - 04:57

it and what about home automation

04:55 - 04:59

enthusiasts well this setup can serve as

04:57 - 05:01

the brains behind your smart home taking

04:59 - 05:03

voice commands analyzing sensor data

05:01 - 05:04

offering suggestions all without needing

05:03 - 05:07

to send a single byte of your

05:04 - 05:09

information to a cloud server. Imagine

05:07 - 05:11

asking your AI to analyze the security

05:09 - 05:14

footage to find a particular person all

05:11 - 05:15

handled locally and securely in a

05:14 - 05:17

previous video you might have seen how I

05:15 - 05:20

rigged the Orin Nano up to monitor the

05:17 - 05:22

feed from my own driveway. It used PyTorch

05:20 - 05:24

and YOLO to watch for and announce as new

05:22 - 05:26

vehicles came and left and I think

05:24 - 05:28

that's a killer feature of the Nano it's

05:26 - 05:30

small but it's not a toy it's got the

05:28 - 05:32

hardware to do real work and it does it

05:30 - 05:34

admirably well. Of course, the Jetson Orin

05:32 - 05:36

Nano isn't the only hardware capable of

05:34 - 05:38

running DeepSeek R1, but it's arguably

05:36 - 05:40

one of the most cost-effective options

05:38 - 05:42

for its level of performance there's no

05:40 - 05:45

need to invest thousands into enterprise-

05:42 - 05:47

grade GPUs or cloud credits, because for

05:45 - 05:48

under 250 bucks you've got a system that's

05:47 - 05:50

powerful enough for most personal AI

05:48 - 05:52

workloads and flexible enough to handle

05:50 - 05:55

a variety of projects Beyond just

05:52 - 05:56

chat-based queries and because the

05:55 - 05:57

Jetson series is designed for edge

05:56 - 05:59

computing, it's also well suited for

05:57 - 06:01

mobile or embedded use cases meaning you

05:59 - 06:04

could deploy it in everything from

06:01 - 06:05

robots to custom IoT devices. But at this

06:04 - 06:07

point you might be wondering what's the

06:05 - 06:09

catch well honestly there isn't much of

06:07 - 06:11

one sure there are limitations to

06:09 - 06:12

running AI models locally you're

06:11 - 06:14

constrained by the hardware and you're

06:12 - 06:16

not going to train a large language

06:14 - 06:19

model on the Jetson Nano but that's not

06:16 - 06:21

the point here for inference to actually

06:19 - 06:22

use the AI to generate answers the

06:21 - 06:25

Jetson Nano punches well above its

06:22 - 06:27

weight to prove that point let's start

06:25 - 06:29

with the smallest model with only 1.5

06:27 - 06:31

billion parameters. I'll ask it a simple

06:29 - 06:33

science question like why no two

06:31 - 06:35

snowflakes are apparently alike and see

06:33 - 06:36

what it comes up with. It processes the

06:35 - 06:38

prompt and begins thinking almost

06:36 - 06:40

immediately, in what appears to be less

06:38 - 06:42

than 1 second it then goes into its

06:40 - 06:44

reasoning phase, because, you see, DeepSeek

06:42 - 06:46

is not just a regular large

06:44 - 06:48

language model but a reasoning model. A

06:46 - 06:50

reasoning model is a type of AI system

06:48 - 06:52

specifically designed to go beyond

06:50 - 06:54

surface level responses and to provide

06:52 - 06:56

conclusions based on deeper contextual

06:54 - 06:58

understanding and logical deductions

06:56 - 07:00

unlike traditional large language models

06:58 - 07:01

which focus on predicting the next word or

07:00 - 07:04

token based on patterns they find in

07:01 - 07:06

massive data sets reasoning models are

07:04 - 07:08

engineered to evaluate facts consider

07:06 - 07:10

possible outcomes and synthesize answers

07:08 - 07:12

that demonstrate a level of structured

07:10 - 07:14

thought. And here's where DeepSeek R1

07:12 - 07:16

stands apart it's not just regurgitating

07:14 - 07:18

patterns from its training data that it

07:16 - 07:20

saw on the web somewhere it's capable of

07:18 - 07:22

understanding the relationships between

07:20 - 07:24

concepts and applying deductive,

07:22 - 07:26

inductive, or abductive reasoning

07:24 - 07:28

processes deductive reasoning works by

07:26 - 07:31

applying general rules to specific cases

07:28 - 07:33

such as: all humans are mortal, Socrates

07:31 - 07:36

is a human and therefore Socrates is

07:33 - 07:38

mortal. Inductive reasoning generalizes

07:36 - 07:40

based on observations for example I've

07:38 - 07:43

seen many swans and they've always been

07:40 - 07:44

white, therefore swans are likely white.

07:43 - 07:46

Abductive reasoning deals with the best

07:44 - 07:48

explanation given the evidence often

07:46 - 07:50

used in scenarios where multiple

07:48 - 07:53

hypotheses could explain an observation

07:50 - 07:54

DeepSeek, as a reasoning model, handles

07:53 - 07:57

queries by considering how multiple

07:54 - 07:58

pieces of information relate and whether

07:57 - 08:01

a given response fits logically within

07:58 - 08:03

the presented context. For example, if you

08:01 - 08:05

asked a reasoning model to explain why a

08:03 - 08:07

system might be overheating it wouldn't

08:05 - 08:09

just list common causes from the

08:07 - 08:11

training data instead it would evaluate

08:09 - 08:13

context-specific variables like airflow,

08:11 - 08:16

component specs, or recent system

08:13 - 08:18

behavior, and produce a well-thought-out

08:16 - 08:20

diagnosis. This is a significant leap

08:18 - 08:23

forward for self-hosted AI. A reasoning

08:20 - 08:25

model like DeepSeek on local hardware

08:23 - 08:26

doesn't just save bandwidth it brings

08:25 - 08:28

meaningful decision-making directly to

08:26 - 08:30

your machine making it perfect for

08:28 - 08:32

environments where privacy, latency, or

08:30 - 08:35

cost are critical whether you're

08:32 - 08:37

analyzing system logs making predictions

08:35 - 08:39

or solving complex problems a reasoning

08:37 - 08:41

model adds the structured thinking that

08:39 - 08:43

large language models otherwise

08:41 - 08:45

sometimes overlook. With the smallest

08:43 - 08:47

model, the 1.5-billion-parameter model, we

08:45 - 08:49

saw a performance of about 32 tokens per

08:47 - 08:51

second which is fast enough for almost

08:49 - 08:53

all interactive purposes that I can

08:51 - 08:55

think of at least once the thinking part

08:53 - 08:57

is over. If we step up to the next larger

08:55 - 08:59

model which is a 7 billion parameter

08:57 - 09:01

model, we find that it can produce

08:59 - 09:03

reasoning at a rate of about 12 tokens

09:01 - 09:04

per second that's a fair bit slower than

09:03 - 09:06

the smallest model but it's still

09:04 - 09:07

reasonable performance akin to what

09:06 - 09:10

you're going to experience in the cloud

09:07 - 09:12

at least for speed. I find that it's also

09:10 - 09:14

just slightly slower than my reading

09:12 - 09:16

speed so I can read its line of thinking

09:14 - 09:18

in that model at about the rate that it

09:16 - 09:19

produces it. And it's all still local, and

09:18 - 09:21

it's all still running on affordable

09:19 - 09:22

hardware.
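If you want to reproduce these token rates yourself, Ollama will report them for you; adding --verbose to a run prints timing statistics, including the eval rate in tokens per second, after each response:

  # run the 7B model and print token-rate statistics after each answer
  ollama run deepseek-r1:7b --verbose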

09:21 - 09:23

We could just keep working our way up the food chain until we couldn't

09:22 - 09:25

load one of the models and that's

09:23 - 09:26

precisely what I did but I won't make

09:25 - 09:28

you watch me load and test them all

09:26 - 09:30

because anything bigger than 8 GB is

09:28 - 09:32

not going to fit into memory, and that limits

09:30 - 09:34

us to about the 7 billion parameter

09:32 - 09:36

model size if we want to run a larger

09:34 - 09:37

model then we're going to have to leave

09:36 - 09:39

the Orin Nano behind for a moment and

09:37 - 09:41

break out another one of NVIDIA's big

09:39 - 09:44

party tricks this one in the form of an

09:41 - 09:47

RTX 6000 Ada GPU, which can still push

09:44 - 09:52

$10,000 on the retail market. With its

09:47 - 09:55

48 GB of GDDR6, 18,176 CUDA cores, and 91

09:52 - 09:57

teraflops of floating-point performance,

09:55 - 10:00

we'll pair it with a CPU of a similar

09:57 - 10:03

price, the AMD Threadripper 7995WX, and then

10:00 - 10:05

throw in 512 GB of RAM to make sure that

10:03 - 10:07

there is room for even the largest of the

10:05 - 10:09

large models. And we're going to need it,

10:07 - 10:12

because the largest DeepSeek R1 model has

10:09 - 10:16

671 billion parameters.
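At the time of writing, the full model is published in Ollama's library under a tag along these lines (check the library page for the exact name before pulling, since it is a very large download):

  # pull the full 671-billion-parameter DeepSeek R1 model (roughly 404 GB)
  ollama pull deepseek-r1:671b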

10:12 - 10:17

Now, thankfully, I'm on 5-gigabit fiber, because it's 404 GB to

10:16 - 10:19

download, and it's still a lengthy

10:17 - 10:21

download though only about 20 minutes I

10:19 - 10:23

think I recall it being but even once

10:21 - 10:25

you have the model downloaded verifying

10:23 - 10:27

the hash will take many minutes, as will

10:25 - 10:29

simply loading the model each time you

10:27 - 10:33

go to start it. After all, the model

10:29 - 10:35

is 404 GB and if your SSD manages 4 GB

10:33 - 10:37

per second in sustained reads that's

10:35 - 10:38

still 100 seconds minimum to load that

10:37 - 10:40

much data and since it's not perfectly

10:38 - 10:43

efficient you're realistically looking

10:40 - 10:44

at a couple of minutes to load the model

10:43 - 10:47

Once it loads, though, it works fine and

10:44 - 10:49

has impressive reasoning skills in fact

10:47 - 10:50

on the now famous performance slide

10:49 - 10:52

that's been making the rounds with deep

10:50 - 10:54

seek, you can see that it even bests Chat-

10:52 - 10:57

GPT's o1 in some tasks and effectively

10:54 - 10:58

equals it on the remainder the

10:57 - 11:00

performance however does leave something

10:58 - 11:02

to be desired in terms of real-time

11:00 - 11:04

interaction. Even with this mighty

11:02 - 11:05

hardware that we've brought to the task,

11:04 - 11:08

the system manages a best of only

11:05 - 11:10

about four tokens per second I also

11:08 - 11:12

found that on Windows, Ollama isn't great

11:10 - 11:14

about taking advantage of all your CPU

11:12 - 11:15

cores at least if you have more than 64

11:14 - 11:17

of them if you do it's important to

11:15 - 11:18

issue a command in the interpreter to

11:17 - 11:21

set the maximum number of threads to

11:18 - 11:23

match your CPU and that way it will take

11:21 - 11:25

advantage of all your cores; see the

11:23 - 11:27

video description for details.
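As a rough sketch of what that looks like, inside an interactive Ollama session you can set the num_thread parameter; the value 96 below is just an example for a 96-core Threadripper, so substitute your own core count:

  ollama run deepseek-r1:671b
  # then, at the >>> prompt inside the session:
  /set parameter num_thread 96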

11:25 - 11:29

On the Threadripper, the CPU is pegged at 100%, but with the

11:27 - 11:31

smaller models, more of it runs on

11:29 - 11:34

the GPU and you'll see your GPU loads

11:31 - 11:36

approaching 100%.
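If you'd like to watch that load split yourself, the standard NVIDIA monitoring tools are enough; these are general utilities rather than anything specific to this setup:

  # on a desktop GPU, refresh utilization once per second
  watch -n 1 nvidia-smi

  # on the Jetson itself, stream CPU/GPU/memory stats
  sudo tegrastats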

11:34 - 11:37

And now, for one last trick: the smallest model and the fastest

11:36 - 11:40

hardware, just so we can see how many

11:37 - 11:41

tokens per second it can generate.

11:40 - 11:43

I'll ask DeepSeek to tell me a long and

11:41 - 11:46

interesting story, so it has to spend some time

11:43 - 11:48

thinking and as it does we see a GPU

11:46 - 11:51

load of 100% And this time it's in

11:48 - 11:53

contrast to a largely idle CPU and when

11:51 - 11:55

running the 1.5 billion parameter model

11:53 - 11:59

the big RTX 6000 cranks out an

11:55 - 12:01

impressive 233 tokens per second if

11:59 - 12:02

you've enjoyed today's little foray into

12:01 - 12:04

DeepSeek on both ends of the hardware

12:02 - 12:05

spectrum, remember I'm mostly in this for

12:04 - 12:07

the subs and likes so I'd be honored if

12:05 - 12:09

you consider subscribing to my channel

12:07 - 12:11

to get more like it and if you're

12:09 - 12:12

already subscribed thank you don't

12:11 - 12:14

forget to turn on the bell icon, leave a

12:12 - 12:15

like on the video and maybe click on

12:14 - 12:17

share to send it to a friend who might

12:15 - 12:19

also be interested I always appreciate

12:17 - 12:21

any organic efforts to hack the YouTube

12:19 - 12:22

algorithm as that other guy likes to say

12:21 - 12:24

and if you have any interest in matters

12:22 - 12:26

related to the autism spectrum please be

12:24 - 12:28

sure to check out the sample of my book

12:26 - 12:29

on Amazon, link in the video description.

12:28 - 12:31

it's everything I know now about living

12:29 - 12:34

your best life on the spectrum that I

12:31 - 12:35

wish I'd known years ago in the meantime

12:34 - 12:38

and in between time hope to see you next

12:35 - 12:41

time right here in Dave's Garage hello

12:38 - 12:43

my baby, hello my honey, hello my ragtime

12:41 - 12:43

gal

Running AI Locally: Exploring NVIDIA's Jetson Orin Nano and DeepSeek R1

Title: "Unleashing the Power of AI: Running DeepSeek R1 Locally with NVIDIA's Jetson Nano"

Introduction:
In today's tech-driven world, most of us are accustomed to relying on the cloud for AI tasks. But what if there was a way to harness the power of AI right on our desktops or edge devices without the need for the cloud? Enter NVIDIA's Jetson Orin Nano and DeepSeek R1. In this article, we will explore the impressive capabilities of the Jetson Orin Nano and learn how to run DeepSeek R1 locally, empowering us with privacy, control, and faster response times.

The Power of NVIDIA's Jetson Orin Nano:
The Jetson Orin Nano is a compact yet powerful edge computer designed specifically for AI workloads. With 1024 CUDA cores, 32 tensor cores, 8 GB of LPDDR5 RAM, and six ARM CPU cores, the Jetson Orin Nano packs the brain of a workstation GPU into a palm-sized device. Its optimized hardware makes it an ideal platform for running DeepSeek R1, a next-gen conversational AI model that thrives on edge computing.

Advantages of Running DeepSeek R1 Locally:
Traditionally, AI queries are processed in the cloud, raising concerns about data privacy and recurring subscription fees. By running DeepSeek R1 locally on the Jetson Orin Nano, users gain full control over their data and avoid dependency on cloud servers. Additionally, local execution avoids network latency and offers a more responsive AI assistant. Running AI locally also proves beneficial for those working on projects involving complex code, as it minimizes OpenAI subscription expenses.

Setting Up DeepSeek R1 with Ollama:
To get started with DeepSeek R1 on the Jetson Orin Nano, we use a simplified deployment tool called Ollama. Ollama acts as a streamlined interface for downloading, setting up, and configuring AI models. With Ollama, the installation process is quick and easy, and users can choose to operate it via command line or web interface. Once installed, DeepSeek R1 can be downloaded from Ollama's catalog using a straightforward command.

Privacy and Ownership:
Running DeepSeek R1 locally not only enhances data privacy but also gives users a sense of ownership. Similar to hosting your own web server, self-hosting an AI model allows you to have complete control over the system. Furthermore, with local execution, users can rely on DeepSeek R1 even when the internet connection is unavailable, ensuring uninterrupted AI assistance.

Real-World Applications:
DeepSeek R1's power extends beyond mere chat-based queries. Its capabilities make it suitable for various real-world applications. For instance, it can serve as the brains behind a smart home, taking voice commands, analyzing sensor data, and reviewing security camera footage without sending a single byte to a cloud server, or it can act as a local coding assistant that doesn't bill you as the context window grows.