00:00 - 01:00

Hey guys, welcome back to the channel. Admittedly, I've got a pretty low-effort video for you guys this week, because I felt really shitty this entire time, had a bad headache, and then proceeded to spend as little time as possible on my YouTube channel. Slightly different channel announcement coming up, which is that I will begin doing some sponsored posts starting in May. Now, you may be thinking to yourself, "Oh Jordan, well, evidently you must need this money for production costs on the channel, right, to keep things going, like every other YouTuber says." No, I don't; I'm just greedy. It costs me $0 to make these videos, I'm incredibly lazy, it's just me writing on my iPad, but I would just like a few hundred extra dollars. So if you are complaining about the advertisements, that would be why. You're welcome to skip them, but otherwise I'm going to try to make them as painless as possible and hopefully a little bit funny. So let's go ahead and get into some systems design, and then we can all have ourselves a great weekend.

00:59 - 01:49

Okay, so today we're going to be talking about how to build a notification service. I'm letting you guys know right off the bat that this is going to be a relatively short video, the reason being that building a notification service relies on a pattern we've now seen three different times on this channel: what I like to dub the fan-out pattern, AKA delivering a message to a variety of users that are all subscribed or interested in some sort of topic. We've seen this in Dropbox, where we have to send document changes to a variety of interested users; we've seen it with Facebook Messenger, where we basically have to do the same; and we've seen it with Twitter. So really, what I want to do in this video is formalize that pattern once and for all, and then add a little bit of extra thought on how we can make sure we've delivered each message only once to every single client. From there we'll call it a day, so everyone can have a nice Saturday.

01:47 - 02:14

So what does a notification service look like? Well, if I'm on my phone here, we've got a few different types of messages that I can receive from a bunch of different applications. But even though these are from different applications, they're effectively centralized through the actual device provider. Apple, for example, has a push notification server that you would hook into when making an app, hitting some API to actually push a notification there.

02:11 - 03:00

Cool, so let's formalize some problem requirements and capacity estimates. Again, this is a pretty abstract problem, so I'm just going to give some very broad capacity estimates, and then maybe we'll discuss them a little more concretely later in the video. The idea is that if we have a bunch of users, say on their cell phones, we want to be able to give them notifications in real time based on some sort of topic. Now, in theory it is possible that notifications might be sent to a specific device ID, but in my opinion that's a little bit of an easier problem: you just send it to that specific device ID, or maybe to the server that the device is connected to, if it's connected via some sort of websocket or something like that. And then, number two, if users are offline, we need to be able to store their notifications so that we can run some sort of query and fetch them later.

02:59 - 03:34

Cool. Another thing: let's say we have a billion topics, because at the end of the day there are a lot of apps out there and they're probably each registering their own topics, and each topic is going to receive 1,000 notifications per day of around 100 bytes on average. Now, all of a sudden, if we have to store all of these messages in our database, we're storing 100 terabytes of data per day, over 30 petabytes a year. That's a lot. It means we're definitely going to have to do some amount of partitioning in our tables, and whenever we're doing partitioning and we have data distributed across multiple nodes, it does allow us to do a little bit of caching to speed up reads at times.
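
As a quick sanity check on those numbers, here is a minimal back-of-the-envelope calculation; the topic count, per-topic rate, and message size are the assumptions stated above, not measured figures.

```python
# Back-of-the-envelope storage estimate for the notification log.
# Assumptions from the discussion: 1B topics, 1,000 notifications
# per topic per day, ~100 bytes per notification.
TOPICS = 1_000_000_000
NOTIFICATIONS_PER_TOPIC_PER_DAY = 1_000
BYTES_PER_NOTIFICATION = 100

bytes_per_day = TOPICS * NOTIFICATIONS_PER_TOPIC_PER_DAY * BYTES_PER_NOTIFICATION
terabytes_per_day = bytes_per_day / 1e12          # ~100 TB/day
petabytes_per_year = bytes_per_day * 365 / 1e15   # ~36.5 PB/year

print(f"{terabytes_per_day:.0f} TB/day, {petabytes_per_year:.1f} PB/year")
```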

03:32 - 03:58

Cool. So again, note that topics can have millions of users subscribed to them, in the same way that if I'm running Twitter, a given person posting tweets can have millions of followers, or if I'm running Dropbox, a given document can have millions of users subscribed to its changes. So the question is: do we always want to push the notification to each user? Well, if you're thinking of those previous videos and the patterns there, the answer is probably not.

03:57 - 04:38

Cool, so the first thing I'm going to do is try to formalize that fan-out design pattern, because at least on my channel it's probably always going to look the same. There are many different ways of implementing this type of thing, but you know how I do it, which is that I am going to go jerk off over Flink, and we're going to do things that way. So, if we have all of these notifications to be delivered, we would just be throwing them into a Kafka stream, and we can shard that up by the actual topic ID. We would also need our Flink consumer to have a sense of which users care about which topics, so that we can say something like, "Oh, for the messages topic, users 10 and 12 are subscribed."

04:35 - 05:29

Well, the way we would do this is as follows. As opposed to having Flink reach out to the topic subscriptions database every single time to figure out all the users that care about a topic (that takes a lot of time, those are expensive reads, and there are potentially a lot of users subscribed to a given topic), what would be better is to pre-populate our Flink node with all of this information for the topic IDs that node cares about. So instead, what we'll do is take our topic subscription table, which we can shard by user ID (the reason being that later we'll want to figure out, for a given user, what topics they care about), but when we push this change data to Kafka, we can re-shard it by topic ID. So the change data goes in sharded by topic ID, Flink can consume from just one Kafka queue, and that way, for every single topic that Flink cares about, it knows which users subscribe to it.
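
As a rough sketch of that change-data re-sharding step, here is what a small relay could look like, assuming the kafka-python client and a hypothetical topic-subscriptions Kafka topic; the event shape and names are illustrative, not from the video.

```python
import json
from kafka import KafkaProducer  # assumes the kafka-python package

# Re-key topic-subscription change events by topic_id so that the Flink
# subtask handling a given topic sees every subscribe/unsubscribe event
# for that topic on the same partition.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",          # placeholder broker address
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_subscription_change(user_id: str, topic_id: str, subscribed: bool) -> None:
    """Forward one change-data event, partitioned by topic_id."""
    event = {"user_id": user_id, "topic_id": topic_id, "subscribed": subscribed}
    producer.send("topic-subscriptions", key=topic_id, value=event)

publish_subscription_change("user-10", "messages", True)
publish_subscription_change("user-12", "messages", True)
producer.flush()
```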

05:27 - 05:55

The thing to note here is that for certain topics, we're going to have way too many users subscribing, such that it doesn't actually make sense to deliver that message to each user individually; the cost is just going to be too expensive. So maybe we've got some other topic called "all", and Flink realizes, "Oh shoot, I've got a thousand users subscribed to this thing; we're not going to store all that state, we're just going to mark it as popular."

05:53 - 06:40

Cool, so let's say we do that. Depending on the incoming message, if it's for the messages topic, let's say that's going to users 10 and 12. Flink can either have some intermediary layer to reach out to, or just read from ZooKeeper, but the point is that it now knows where to route these messages. To eventually get them to users 10 and 12, they have to go through our sync servers, so maybe this is the server connected to user 10 and this is the server connected to user 12, and Flink sends the message both here and over here. On the other hand, if a message for "all" comes in, that's a popular one, so I want to send it to the popular messages server, which is going to treat it a little bit differently; AKA it's not going to try to send it to individual users, we're just going to end up polling for it later, but I'll discuss that in a little bit.
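
To make the fan-out logic concrete, here is a minimal Python sketch of the keyed, per-topic state and the routing decision. This is not actual Flink code (Flink jobs are typically written in Java/Scala or PyFlink); it is a plain-Python stand-in, and the popularity threshold, callback names, and state layout are all assumptions for illustration.

```python
from collections import defaultdict

POPULAR_THRESHOLD = 1_000  # assumed cutoff; tune per workload

class TopicFanOut:
    """Per-topic state a Flink-style consumer might keep: who is
    subscribed, and whether the topic has tipped over into 'popular'."""

    def __init__(self, send_to_sync_server, write_to_notifications_table):
        self.subscribers = defaultdict(set)   # topic_id -> {user_id}
        self.popular = set()                  # topic_ids served via polling instead
        self.send_to_sync_server = send_to_sync_server                    # assumed callback
        self.write_to_notifications_table = write_to_notifications_table  # assumed callback

    def on_subscription_change(self, topic_id, user_id, subscribed):
        """Consume the re-sharded subscription change stream."""
        if subscribed:
            self.subscribers[topic_id].add(user_id)
            if len(self.subscribers[topic_id]) >= POPULAR_THRESHOLD:
                self.popular.add(topic_id)
                self.subscribers[topic_id].clear()  # stop tracking individual users
        else:
            self.subscribers[topic_id].discard(user_id)

    def on_notification(self, topic_id, message):
        """Consume the notification stream, sharded by the same topic_id."""
        self.write_to_notifications_table(topic_id, message)  # always persisted
        if topic_id in self.popular:
            return  # popular topics are fetched by clients via polling
        for user_id in self.subscribers[topic_id]:
            self.send_to_sync_server(user_id, message)  # push path for small topics
```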

06:39 - 07:25

Cool. So the main nuance of this video that I want to discuss, which I probably didn't spend much time on in the Facebook Messenger or Google Drive videos, is idempotence: we want to make sure that all of our messages are delivered only once. The nice thing about Flink is that it ensures all of those messages are handled at least once, because of the way Flink does checkpointing. So within Flink, we know all of these messages are going to be handled at least once, and then, depending on how we handle things downstream, all of the messages that reach our sync servers are going to be delivered to the user at least once. So it's our job to make sure they're delivered only once, or at least processed only once, and not more than that.

07:24 - 08:23

So what could we actually do? Well, keep in mind that our notification server is just forwarding messages to the user, so it's stateless: if it hears from Flink, it's just going to say, "Ooh, am I connected to this user? Let me go ahead and forward this over there." So again, this message is going to be delivered at least once due to that Flink state replay. The question is what we can do to ensure our client isn't just showing duplicate notifications all the time. One thing we could do is store the notifications that we've seen, or at least their IDs, on the client, to make sure that we don't redisplay them. The concept here is called something like an idempotency key, meaning we're basically keeping a small set in memory on our client and checking, based on hashing that ID, whether we already have it. The problem is that this uses up extra memory; the question is how much extra memory, and do we actually care?
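
Here is a minimal sketch of that client-side deduplication, assuming each notification carries a UUID idempotency key; the bounded-size eviction is my own addition to keep memory flat, not something specified in the video.

```python
from collections import OrderedDict

class NotificationDeduper:
    """Client-side set of recently seen idempotency keys.
    Oldest keys are evicted once max_keys is reached (an assumption)."""

    def __init__(self, max_keys: int = 10_000):
        self.seen = OrderedDict()
        self.max_keys = max_keys

    def should_display(self, idempotency_key: str) -> bool:
        if idempotency_key in self.seen:
            return False                      # duplicate delivery: drop it
        self.seen[idempotency_key] = True
        if len(self.seen) > self.max_keys:
            self.seen.popitem(last=False)     # evict the oldest key
        return True

deduper = NotificationDeduper()
for key in ["abc-123", "abc-123", "def-456"]:
    if deduper.should_display(key):
        print("show notification", key)       # "abc-123" is shown only once
```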

08:20 - 09:10

So, in theory, if we assume that we're getting about 1,000 notifications per day on a device, and each one is 16 bytes, because that's how much a UUID (which we could use for our idempotency key) takes, that alone comes to basically 16 kilobytes per day, which is not so bad, and in theory we could totally store this on the client. But I would imagine that in a systems design interview someone might push you a little and say, "Well, what if we don't want to use that extra memory footprint on the client? Let's try to store all of these idempotency keys elsewhere." To which I say, fair enough; I suppose in certain edge cases this would be too much memory for certain devices. Maybe you've got an Apple Watch or something that just isn't going to have the memory to store all of this.

09:07 - 09:38

So again, if we wanted to store this on the notification server itself (the thing up here forwarding the messages), that means we would have to store all of the idempotency keys for all of the users that notification server cares about. Let's say it's connected to around 65,000 users, because that's around how many ports you can use to connect to people; that comes out to around a gigabyte of memory, which itself is not too bad. But again, they may push you and say, "Well, what if it were greater than this? What would you do then?"
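
Again as a sanity check, here is the arithmetic behind those two figures, using the same assumed numbers (1,000 notifications per day, 16-byte UUIDs, roughly 65,000 connected users per server).

```python
# Memory needed to remember one day's idempotency keys.
NOTIFICATIONS_PER_DAY = 1_000
UUID_BYTES = 16
USERS_PER_SERVER = 65_000   # roughly the port bound mentioned above

per_client_kb = NOTIFICATIONS_PER_DAY * UUID_BYTES / 1e3                      # ~16 KB per device
per_server_gb = NOTIFICATIONS_PER_DAY * UUID_BYTES * USERS_PER_SERVER / 1e9   # ~1 GB per server

print(f"{per_client_kb:.0f} KB per client, {per_server_gb:.2f} GB per notification server")
```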

09:37 - 11:21

The one thing I want to note about keeping all of these keys on our server, to ensure that we aren't sending duplicate messages to the client, is that we have the potential for partial failures. Say one thing we do is send our message to the client, and then, when we hear back from the client saying "I got the message," we record that idempotency key so we don't send that message again. Well, what about when the client device sends that acknowledgement back to the notification server and, for whatever reason, it never arrives? Then we're never going to record the idempotency key; that's one partial failure scenario. The other is that you might say, "What if, instead of recording the idempotency key after we send the message, we write it down before we send it to the client?" Well, it would then be possible that we write down the idempotency key, say key1, we try to send the message over to the client, that doesn't work out, and then maybe our server goes down, so it never realizes the client didn't get it. Now we've recorded an idempotency key for an undelivered message, which is bad, because if that message comes in again and we want to redeliver it, we won't, and it will never reach the client. So in theory, if we wanted to make this process completely perfect, we would need two-phase commit. In practice, it's not really the end of the world if a push notification never shows up, so who cares; but note that if we want everything to be proper, we would have to record our idempotency key conditional on the client device actually processing the message, and that, of course, is a distributed transaction, which requires two-phase commit.
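
The two orderings described above can be made concrete with a short sketch. The helpers below are hypothetical (the send/ack transport and the key store aren't specified in the video); the comments mark where each partial-failure window sits.

```python
def deliver_record_after_ack(message, send_and_wait_for_ack, record_key):
    """Ordering 1: send first, record the idempotency key only after the ack.
    Failure window: the client displays the message but the ack is lost,
    so the key is never recorded and the message may be shown again."""
    acked = send_and_wait_for_ack(message)          # assumed transport call
    if acked:
        record_key(message.idempotency_key)         # never runs if the ack is lost

def deliver_record_before_send(message, send, record_key):
    """Ordering 2: record the key first, then send.
    Failure window: the key is recorded but the send fails (or the server
    crashes), so a later redelivery is wrongly suppressed."""
    record_key(message.idempotency_key)             # persisted up front
    send(message)                                   # may fail after the key already exists

# Making either ordering exact would require an atomic commit across the
# client and the key store, i.e. a distributed transaction / two-phase commit.
```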

11:18 - 12:17

Cool, so the last piece of this: maybe you say, "Ooh, 16 kilobytes on the client or a gigabyte of memory on our notification server, we just don't want to pay for that; we would rather do this all on disk in some sort of database." Let's imagine instead we get 10,000 messages a day per user, and all of a sudden we need on the order of 10 gigabytes (10,000 keys times 16 bytes times roughly 65,000 users), and that's just too much for our notification server. We could do this in a database like I mentioned, and the way we would do that, to make reads and writes as fast as possible, is to index on the idempotency key and partition the table by user ID, so that for a given user and a given idempotency key we can jump into that index relatively quickly and find it. However, this is obviously going to incur quite a bit of extra latency: in step one we have to check whether the idempotency key exists, in step two we write our message to the client, and in step three we write that idempotency key back to the database. So is there anything we can do to optimize this process a little bit?
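
Before the optimization, the naive version of that three-step, database-backed flow might look like the following; the db object and its methods are placeholders for whatever store you pick, not a real client API.

```python
def deliver_with_db_check(message, db, send):
    """Step 1: read the DB to see if this key was already delivered.
    Step 2: forward the message to the client.
    Step 3: write the key back so future duplicates are dropped.
    Every delivery pays at least one DB read and one DB write of latency."""
    key = (message.user_id, message.idempotency_key)   # table partitioned by user_id
    if db.exists(key):                                  # step 1: extra read on every message
        return                                          # already delivered; drop the duplicate
    send(message)                                       # step 2
    db.put(key, True)                                   # step 3
```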

12:16 - 14:13

Well, actually, there is: we could use something like a bloom filter. A bloom filter is going to let us ideally get rid of step one, where we have to read the database to check whether that idempotency key has been seen already. It's not perfect, but at least some percentage of the time the bloom filter will help us out. A bloom filter is a probabilistic data structure that we've spoken about plenty of times on this channel, but I'll quickly go over it one more time. The idea is that we want to avoid one extra read to the database. A bloom filter is a fixed-size memory buffer, and we use multiple hash functions to map a given key to multiple places in that buffer. Let's say I see some key called Jordan: hash function one maps the key Jordan into this bucket, hash function two maps it into this bucket, and hash function three maps it into this bucket. Then we add the key Kate; Kate goes here, here, and here with our three hash functions. Then the key Megan comes around and we want to ask, "Have we already seen Megan?" Well, we know just from one of the hash functions that we haven't, because no other element has filled up that bucket, so Megan has to be new. Even if Megan's other hashes landed in buckets that were already set, the fact that hash function three put Megan in an untouched bucket tells us Megan is a key we have not already seen, and then we don't have to read the database; we can just send that message right over to the client without incurring an extra read. Now, it is the case that a key might look like it is completely in the bloom filter when it actually isn't, because those buckets may have been filled up by a combination of other keys. So the bloom filter can tell us for certain when we have not seen a key, but it cannot tell us for certain that we have seen one; that's worth noting. Of course, whenever it tells us we haven't seen the message, we skip the database read and send it straight to the client.
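
Here is a minimal bloom filter sketch in Python, along with how it would front the database check from the flow above. The bit-array size, the hash count, and the SHA-256-based hashing are simplifications for illustration; a production filter would size the array from the expected key count and target false-positive rate.

```python
import hashlib

class BloomFilter:
    """Fixed-size bit array probed by k hash functions.
    might_contain() can return false positives but never false negatives."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key: str):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def deliver_with_bloom_filter(message, bloom, db, send):
    """Skip the DB read entirely whenever the filter says the key is new."""
    key = f"{message.user_id}:{message.idempotency_key}"
    if bloom.might_contain(key) and db.exists(key):   # DB consulted only on a filter hit
        return                                         # genuine duplicate: drop it
    send(message)
    bloom.add(key)
    db.put(key, True)
```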

14:11 - 15:13

So, you can see that I'm losing my voice a little bit here, but I'm going to try to get through the rest of this thing. As far as the client is concerned, because the client itself is connected to the notification service, we need to be able to do two things. The first is that for the messages that don't have to be delivered to too many users, which I'll call the unpopular ones, we want to bring those in in real time. The idea here is that, again, we've got some sort of routing server, and we've used this pattern in a ton of places: Uber, Facebook Messenger, all that. The routing server is basically just going to look at the existing notification servers and how many connections they have, probably use some sort of consistent hashing scheme on the user ID, and assign a notification server to a given user that wants to connect. So it hands out an address based on consistent hashing, and then the two sides send heartbeats back and forth; if for whatever reason the client device realizes it's no longer connected to that notification server, it will just reach right back out to the routing server to get another connection.
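
A minimal consistent-hash ring that the routing server could use to map user IDs onto notification servers might look like this; the virtual-node count and the MD5-based hashing are implementation assumptions, not details from the video.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps user IDs to notification servers with minimal reshuffling
    when servers join or leave. Each server gets several virtual nodes."""

    def __init__(self, servers, vnodes: int = 100):
        self.vnodes = vnodes
        self.ring = []          # sorted list of (hash, server) points
        for server in servers:
            self.add_server(server)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_server(self, server: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{server}#{i}"), server))

    def remove_server(self, server: str) -> None:
        self.ring = [(h, s) for h, s in self.ring if s != server]

    def server_for(self, user_id: str) -> str:
        h = self._hash(user_id)
        idx = bisect.bisect_right(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["notif-server-1", "notif-server-2", "notif-server-3"])
print(ring.server_for("user-10"))   # which server this client should connect to
```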

15:12 - 15:34

Now, the one thing I wanted to point out is that we should probably add a random jitter every time that connection breaks down, so that we wait a little bit before we reconnect. Otherwise, if one notification server goes down, we're going to have its many thousands of devices all trying to reconnect to new servers at the same time, and that could cause a thundering herd.
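
The reconnect loop with jitter could be as simple as the following sketch, where the backoff base, the cap, and the connect callback are assumptions for illustration.

```python
import random
import time

def reconnect_with_jitter(connect, base_delay=1.0, max_delay=30.0):
    """Retry the connection with exponential backoff plus full jitter,
    so a dead notification server doesn't stampede the routing layer."""
    attempt = 0
    while True:
        delay = random.uniform(0, min(max_delay, base_delay * (2 ** attempt)))
        time.sleep(delay)                 # spread reconnect attempts out in time
        try:
            return connect()              # assumed callback: ask the router for a new server
        except ConnectionError:
            attempt += 1                  # failed again; back off further
```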

15:33 - 16:26

Cool. The next thing is that we also want to poll for popular messages on an interval. The idea here is that because we have all of this subscription information stored in our database and partitioned by user, we can really quickly get all of the topic subscriptions for a given client device, and that will tell me whether a given topic is also popular. One thing I didn't mention much is that in this fan-out pattern, when Flink deems a given topic popular because so many users are subscribed to it, it can reach back out to the topics table and say, "Hey, by the way, this topic is now considered popular." Or we could make it configurable: an app might be able to say, "I'm going to send this particular notification to all users," marking it in advance as a popular topic. Either way works; it's a design-specific choice.

16:24 - 16:59

Cool, so the idea is that once we know a given topic is popular, we can go to our actual notifications table. Something like Cassandra works here because we want fast ingestion, which we get through the LSM tree (writes are buffered in memory) and through having multiple leader nodes. Then we can just read from Cassandra: we partition it by topic so we can easily find those messages, and sort within a topic on timestamp, which is going to make our life pretty easy.
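
Here is a sketch of what that table and the polling read could look like, assuming the Python cassandra-driver and a hypothetical keyspace and table layout (partition key topic_id, clustering column created_at); the names are illustrative, not from the video.

```python
from cassandra.cluster import Cluster  # assumes the cassandra-driver package

# Hypothetical schema, partitioned by topic and clustered by time:
# CREATE TABLE notifications (
#     topic_id   text,
#     created_at timestamp,
#     body       text,
#     PRIMARY KEY ((topic_id), created_at)
# ) WITH CLUSTERING ORDER BY (created_at DESC);

cluster = Cluster(["127.0.0.1"])               # placeholder contact point
session = cluster.connect("notifications_ks")  # hypothetical keyspace

def poll_popular_topic(topic_id, since_timestamp):
    """Fetch recent notifications for one popular topic: a single-partition
    read ordered by the clustering column, so it stays cheap."""
    query = (
        "SELECT created_at, body FROM notifications "
        "WHERE topic_id = %s AND created_at > %s LIMIT 100"
    )
    return list(session.execute(query, (topic_id, since_timestamp)))
```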

16:57 - 17:49

Cool. Sorry I'm rushing through this one, guys, I'm just in quite a bit of pain at the moment, so let me go ahead and finish it off. As you can see, the actual notification service diagram is pretty standard relative to what we've described throughout this video. On the left, let's first discuss how a client is going to receive notifications. We've got our client over here, and on one side we've got some sort of notification server router, which is going to listen to ZooKeeper, get the current consistent hashing policy, and connect us to a notification server. Once we're connected to a notification server, we can communicate back and forth, sending one another heartbeats via websockets. This allows me, as the client, to realize whether I'm still connected to the notification server; if I'm not, then I have to go back to the router and reconnect to a different one. Cool, so that is going to make sure I'm getting the unpopular notifications.

17:47 - 18:58

For the popular notifications, I'm going to go through my load balancer and hit my notification polling service, which is going to tell me what topics I'm subscribed to. Then, knowing the topics I'm subscribed to, I can first read from some sort of popular notifications cache, which is in Redis. This guy is just going to act as a proxy in front of the actual notifications table, and it should hopefully end up filled with all of the popular notifications, because those are going to be read a lot more; we can just use an LRU replacement policy to ensure this. The cache is fed from our actual notifications table that I described before, which is going to be Cassandra, because we're going to be publishing a ton of messages there from Flink and it would be good to be able to ingest them pretty quickly. We can shard it by topic ID and sort it by timestamp; that way we can just say, for a given topic, "Give me all of the messages I care about between these two timestamps." That data is already indexed, so it should be a relatively fast query, and ideally, most of the time, we're just hitting the cache over here.

18:55 - 19:42

As far as Flink is concerned, the way this works, like I mentioned before, is the fan-out pattern. We have all of our topic subscriptions, which can just be stored in a MySQL database; we're not going to be writing there too often, but we are going to be reading from there a decent amount, so I think using SQL, or a B-tree-based database, is going to be nice here, and it also keeps things simple. Shard this guy out on user ID, and then, when we actually put everything in Kafka (apologies, typo here on my end), this Kafka queue should be sharded on topic ID, at which point we go ahead and put everything into Flink. Flink is itself sharded again on topic ID, so that for each topic it cares about, it has a sense of all the users that it needs to reach out to.

19:42 - 20:30

So, when a message comes in from our actual notification queue over here (and this can be published from a variety of different places; every app is effectively publishing to this massive Kafka queue, probably through some sort of notification service that then proxies over to this Kafka), the gist is that Flink is going to get the messages, put them in the notifications table whether they're popular or not, and then, assuming they're not popular, fan them out to the proper notification servers. We can either do that fan-out via some sort of middleware over here that's listening to ZooKeeper, or we can have Flink itself listen to ZooKeeper, get a sense of where the messages actually have to be delivered, and then deliver them there accordingly.

20:29 - 21:02

Well guys, I hope this video was somewhat helpful. Again, there's nothing too novel here in terms of what we've already covered in this series, and that's the point: for at least four or five of the problems I've covered so far, this fan-out design is something you want to know super well, have in the back of your head, and be able to recognize when to use, because it will definitely come in handy. Anyways, I hope you all enjoy your weekend, and I will see you in the next one.

Building a Notification Service: Enhancing User Experience through Efficient Delivery

In this video, Jordan explores the notification service design, emphasizing the importance of efficiently delivering messages to users. Despite facing a rough week, Jordan briefly introduces upcoming sponsored posts on the channel. The central theme revolves around the fan out pattern in notification services, focusing on reaching multiple interested users subscribed to various topics.

Fan Out Pattern in Notification Services

Jordan delves into the fan out pattern, where messages are distributed to multiple users subscribed to specific topics, drawing parallels with platforms like Dropbox, Facebook Messenger, and Twitter. He formalizes the pattern, stressing the need to ensure message delivery occurs only once to each client.

Designing the Notification Service

  1. Message Handling: Jordan illustrates the need to efficiently handle messages for a billion topics, accommodating around 100 terabytes of data daily. He suggests partitioning data and using caching mechanisms to enhance read speeds.
  2. Implementing Fan Out: Through tools like Kafka and Flink, Jordan creates a streamlined process for managing topic subscriptions and delivering messages based on user interests.
  3. Ensuring Delivery: Jordan examines idempotency to prevent duplicate message deliveries, proposing strategies like storing idempotency keys on the client or server.

Optimizing Message Delivery

  1. Client Interaction: Jordan discusses how the client device interacts with the notification service, receiving unpopular messages in real time over websockets and fetching popular ones by polling.
  2. Scalability and Performance: Exploring on-disk storage options, Jordan suggests indexing and partitioning strategies to optimize reads and writes, and highlights how a bloom filter can cut down database reads.

Conclusion

While navigating the complexities of building a notification service, Jordan emphasizes the importance of efficient message delivery for a seamless user experience. Through the fan out design pattern and strategic optimizations, the notification service aims to cater to user preferences effectively.

By understanding the nuances of notification service design, from the fan-out pattern to idempotency keys and bloom filters, developers can deliver timely, relevant notifications without flooding users with duplicates. In short: plan capacity up front, optimize the delivery path, and deduplicate aggressively to keep the user experience smooth.