00:00 - 00:05
hey guys welcome back to the channel
00:02 - 00:06
admittedly I've got a pretty low effort
00:05 - 00:08
video for you guys this week because I
00:06 - 00:11
felt really shitty this entire time had
00:08 - 00:13
a bad headache uh and then I proceeded
00:11 - 00:16
to spend as little time as possible on
00:13 - 00:17
my YouTube channel uh slightly different
00:16 - 00:19
Channel announcement coming up which is
00:17 - 00:22
that I will begin to be doing some
00:19 - 00:24
sponsored posts starting in May now you
00:22 - 00:26
may be thinking to yourself oh Jordan uh
00:24 - 00:28
well evidently you must need this money
00:26 - 00:29
for production costs on the channel
00:28 - 00:31
right and to keep things going like
00:29 - 00:33
every other YouTuber says uh no I don't
00:31 - 00:35
I'm just greedy it costs me $0 to make
00:33 - 00:38
these videos I'm incredibly lazy it's
00:35 - 00:39
just me writing on my iPad uh but I
00:38 - 00:42
would just like a few hundred extra
00:39 - 00:44
dollars so uh if you are complaining
00:42 - 00:46
about the advertisements that would be
00:44 - 00:48
why I guess you're welcome to skip them
00:46 - 00:50
um but otherwise I'm going to try and
00:48 - 00:52
make them as painless as
00:50 - 00:54
possible and hopefully a little bit
00:52 - 00:56
funny so let's go ahead and get into
00:54 - 00:59
some systems design and then we can all
00:56 - 01:00
have ourselves a great weekend okay so
00:59 - 01:02
today we're going to be talking about
01:00 - 01:04
how to build a notification service so
01:02 - 01:06
I'm just letting you guys know right off
01:04 - 01:08
the bat this is going to be a relatively
01:06 - 01:10
short video the reason being that
01:08 - 01:11
building a notification service is a
01:10 - 01:13
pattern that we've now seen three
01:11 - 01:15
different times on this channel and that
01:13 - 01:17
pattern is going to be what I like to
01:15 - 01:20
dub the fan out pattern, AKA when you
01:17 - 01:21
deliver a message to a variety of users
01:20 - 01:23
that are all subscribed or interested to
01:21 - 01:25
some sort of topic we've seen this in
01:23 - 01:27
Dropbox when we have to send document
01:25 - 01:29
changes to a variety of interested
01:27 - 01:30
users we've seen it with Facebook
01:29 - 01:32
Messenger when we basically have to
01:30 - 01:34
do the same and we've seen it with
01:32 - 01:36
Twitter and so really what I want to do
01:34 - 01:38
in this video is formalize that pattern
01:36 - 01:39
once and for all and then maybe add a
01:38 - 01:41
little bit of extra thought into how we
01:39 - 01:43
can make sure that we've delivered each
01:41 - 01:45
message only once to every single client
01:43 - 01:47
and then from there we'll call it a day
01:45 - 01:49
so everyone can have a nice Saturday so
01:47 - 01:51
what does a notification service look
01:49 - 01:52
like well obviously if I'm on my phone
01:51 - 01:54
here we've got a few different types of
01:52 - 01:56
messages that I can receive from a bunch
01:54 - 01:57
of different applications but even
01:56 - 01:59
though these are from different
01:57 - 02:00
applications they're effectively
01:59 - 02:03
centralized through the actual device
02:00 - 02:05
provider themselves, like Apple for
02:03 - 02:06
example has a push notification server
02:05 - 02:08
that when you're making an app you would
02:06 - 02:11
hook into and you would hit some API to
02:08 - 02:14
actually push a notification there cool
02:11 - 02:15
so let's formalize some problem
02:14 - 02:17
requirements and capacity estimates
02:15 - 02:19
again this is a pretty abstract problem
02:17 - 02:21
so I'm going to you know just try and
02:19 - 02:22
give some very broad capacity estimates
02:21 - 02:24
then maybe we'll discuss them a little
02:22 - 02:26
bit more concretely later in the video
02:24 - 02:28
so the idea is that if we have a bunch
02:26 - 02:29
of users let's say on our cell phone
02:28 - 02:31
device we want to be able to give them
02:29 - 02:34
notifications in real time based on
02:31 - 02:35
some sort of topic now in theory it is
02:34 - 02:38
possible that notifications might be
02:35 - 02:39
sent to a specific device ID but at
02:38 - 02:41
least in my opinion I guess that's a
02:39 - 02:43
little bit of an easier problem you just
02:41 - 02:45
go send it to their specific device ID
02:43 - 02:46
and uh yeah or maybe to the server that
02:45 - 02:48
they're connected to if they're
02:46 - 02:50
connected via some sort of websocket or
02:48 - 02:52
something like that and then number two
02:50 - 02:54
is that if users are offline we need to
02:52 - 02:56
be able to store their notifications so
02:54 - 02:59
that we can run some sort of query and
02:56 - 03:00
ultimately fetch them later cool so
02:59 - 03:02
another thing is if we have let's say a
03:00 - 03:03
billion topics because at the end of the
03:02 - 03:04
day there are a lot of apps out there
03:03 - 03:08
they're probably each registering their
03:04 - 03:09
own topic and basically
03:08 - 03:12
each topic is going to receive a thousand
03:09 - 03:14
notifications per day of around 100
03:12 - 03:15
bytes on average now all of a sudden if
03:14 - 03:17
we have to store all of these messages
03:15 - 03:19
in our database we're storing 100
03:17 - 03:21
terabytes of data per day, over 30 petabytes a
03:19 - 03:22
year so that's a lot it means we're
03:21 - 03:23
going to definitely have to be doing
03:22 - 03:25
some amount of partitioning in our
03:23 - 03:27
tables and uh whenever we're doing
03:25 - 03:28
partitioning and we have data
03:27 - 03:29
distributed across multiple nodes it
03:28 - 03:32
does allow us to do a little bit of
03:29 - 03:34
caching to speed up reads at times cool
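A quick back-of-envelope sketch of those storage numbers, treating the topic count, per-topic message rate, and message size above as rough assumptions:

```python
# Back-of-envelope storage estimate using the rough assumptions above.
topics = 1_000_000_000            # ~1 billion topics
msgs_per_topic_per_day = 1_000    # ~1,000 notifications per topic per day
bytes_per_msg = 100               # ~100 bytes per notification

bytes_per_day = topics * msgs_per_topic_per_day * bytes_per_msg
bytes_per_year = bytes_per_day * 365

print(f"{bytes_per_day / 1e12:.0f} TB/day")    # ~100 TB/day
print(f"{bytes_per_year / 1e15:.1f} PB/year")  # ~36.5 PB/year, i.e. tens of petabytes
```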
03:32 - 03:35
so again note the topics can have
03:34 - 03:37
millions of users subscribed to them in
03:35 - 03:39
the same way that if I'm running Twitter
03:37 - 03:41
a given person who's posting tweets can
03:39 - 03:43
have millions of followers or if I'm
03:41 - 03:45
running Dropbox a given document can
03:43 - 03:47
have millions of users subscribed to
03:45 - 03:49
those changes so the question is do we
03:47 - 03:51
always want to push the notification to
03:49 - 03:52
each user well if you're thinking of
03:51 - 03:54
those previous videos and thinking about
03:52 - 03:57
the patterns here the answer is probably
03:54 - 03:58
not cool so the first thing that I'm
03:57 - 04:00
going to do is try and formalize that
03:58 - 04:02
fan out design pattern because at
04:00 - 04:03
least on my channel it's probably always
04:02 - 04:05
going to look the same there are many
04:03 - 04:07
different ways of implementing this type
04:05 - 04:09
of thing but you know how I do it which
04:07 - 04:11
is that I am going to go jerk off over
04:09 - 04:13
Flink and we're going to do things
04:11 - 04:15
that way so if we have all of these
04:13 - 04:17
notifications to be delivered we would
04:15 - 04:19
just be throwing them into a Kafka
04:17 - 04:22
stream and we can shard that up by the
04:19 - 04:25
actual topic ID we would also have to
04:22 - 04:27
have, in our Flink consumer, a sense
04:25 - 04:29
of basically which users
04:27 - 04:30
care about which topics right so that we
04:29 - 04:33
can say something like oh for the
04:30 - 04:35
messages topic user 10 and 12 are
04:33 - 04:38
subscribed to that well the way that we
04:35 - 04:39
would do this is you know as opposed to
04:38 - 04:41
having to have Flink reach out to the
04:39 - 04:43
database of topic subscriptions every
04:41 - 04:43
single time to figure out all the
04:43 - 04:47
users that care about a topic that takes
04:45 - 04:48
a lot of time those are expensive reads
04:47 - 04:50
there are potentially a lot of users
04:48 - 04:52
that are subscribed to a given topic
04:50 - 04:53
what would be better is if we could
04:52 - 04:56
actually just pre-populate our Flink
04:53 - 04:58
node with all of this information per
04:56 - 05:00
topic ID that our Flink node cares about
04:58 - 05:02
so instead what we'll do is we'll take
05:00 - 05:04
our topic subscription table we can
05:02 - 05:06
shard it out by user ID the reason being
05:04 - 05:07
that in the future we'll want to figure
05:06 - 05:09
out for a given user what topics they
05:07 - 05:11
care about but then when we actually
05:09 - 05:14
push this change data to Kafka we can
05:11 - 05:16
basically re-shard it by our topic ID and
05:14 - 05:18
so this goes in sharded by topic ID so
05:16 - 05:21
that Flink can easily consume from just
05:18 - 05:22
one Kafka queue, and
05:21 - 05:24
that way for every single topic that
05:22 - 05:27
Flink cares about it knows what users
05:24 - 05:29
subscribe to it so the thing to note
05:27 - 05:30
here is that for certain topics uh we're
05:29 - 05:32
going to have way too many users
05:30 - 05:34
subscribing to it right such that it
05:32 - 05:36
doesn't actually make sense to deliver
05:34 - 05:38
that message to each user individually
05:36 - 05:40
the cost is just going to be too
05:38 - 05:41
expensive so maybe we've got some other
05:40 - 05:44
topic called
05:41 - 05:46
all and Flink realizes oh shoot you know
05:44 - 05:47
what, I've got a thousand users subscribed to
05:46 - 05:49
this thing we're just going to write
05:47 - 05:50
down here we're not going to store all
05:49 - 05:53
that state we're just going to call it
05:50 - 05:55
popular cool so let's say we do that
05:53 - 05:57
basically depending on whether this
05:55 - 06:00
incoming message is for the messages
05:57 - 06:02
topic let's say that's going to user 10
06:00 - 06:04
and 12 so then you know Flink also can
06:02 - 06:07
either have some intermediary layer to
06:04 - 06:08
reach out to or just read from ZooKeeper
06:07 - 06:10
but the point is now it knows where to
06:08 - 06:12
Route these messages to eventually get
06:10 - 06:14
them to user 10 and 12 they have to go
06:12 - 06:15
to our sync servers so maybe this is the
06:14 - 06:17
server that's connected to user 10 this
06:15 - 06:19
is the server that's connected to user
06:17 - 06:21
12 so it's going to send that message
06:19 - 06:23
both here and over here on the other
06:21 - 06:25
hand if a message all comes
06:23 - 06:27
in now that's a popular one I want to
06:25 - 06:28
send it to the popular messages server
06:27 - 06:30
which is going to treat this thing a
06:28 - 06:32
little bit differently
06:30 - 06:34
AKA it's not going to try and send them
06:32 - 06:35
to individual users uh we're just going
06:34 - 06:39
to end up polling that later but I'll
06:35 - 06:40
discuss that in a little bit cool
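To make that fan-out routing concrete, here is a minimal sketch of the state a Flink job could keep, keyed by topic ID. The function names, threshold, and the plain dict/set state are illustrative assumptions, not real Flink APIs:

```python
# A minimal sketch of the fan-out state a Flink job could keep, keyed by topic ID.
# All names and the plain dict/set state here are hypothetical placeholders, not real Flink APIs.
POPULAR_THRESHOLD = 10_000

topic_subscribers: dict[str, set[str]] = {}  # topic_id -> subscribed user IDs
popular_topics: set[str] = set()             # topics too large to fan out per user

def store_notification(topic_id: str, payload: str) -> None:
    print(f"store {payload!r} under topic {topic_id}")                # stand-in for the Cassandra write

def forward_to_sync_server(user_id: str, payload: str) -> None:
    print(f"push {payload!r} to the sync server for user {user_id}")  # stand-in for the push path

def on_subscription_change(topic_id: str, user_id: str, subscribed: bool) -> None:
    # Consumed from the topic-subscriptions change stream, re-sharded by topic ID.
    users = topic_subscribers.setdefault(topic_id, set())
    if subscribed:
        users.add(user_id)
    else:
        users.discard(user_id)
    if len(users) > POPULAR_THRESHOLD:
        popular_topics.add(topic_id)  # mark it popular and stop tracking individual users
        users.clear()

def on_notification(topic_id: str, payload: str) -> None:
    # Consumed from the notification stream, sharded on the same topic ID key.
    store_notification(topic_id, payload)                 # always persisted for later polling
    if topic_id not in popular_topics:
        for user_id in topic_subscribers.get(topic_id, ()):
            forward_to_sync_server(user_id, payload)      # push path for unpopular topics
```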
06:39 - 06:42
So the main nuance of this video
06:40 - 06:43
that I want to discuss that I probably
06:42 - 06:46
didn't spend too much time discussing in
06:43 - 06:49
the Facebook Messenger or the Google
06:46 - 06:50
Drive video is idempotency so we do
06:49 - 06:53
want to make sure that all of our
06:50 - 06:54
messages are delivered only once so the
06:53 - 06:56
nice thing about Flink is that Flink is
06:54 - 06:57
going to ensure that all of those
06:56 - 06:59
messages are going to be delivered at
06:57 - 07:01
least once right because of the way that
06:59 - 07:03
Flink does checkpointing so at least
07:01 - 07:05
within Flink we know that all of these
07:03 - 07:08
messages are going to be handled at
07:05 - 07:10
least once over here and then depending
07:08 - 07:12
on how we handle kind of Downstream all
07:10 - 07:14
of the messages that get to our sync
07:12 - 07:16
servers we know that uh they're going to
07:14 - 07:18
be delivered to the user at least once
07:16 - 07:19
so it's our job to make sure that
07:18 - 07:21
they're going to be delivered only once
07:19 - 07:24
and not more than once or at least
07:21 - 07:25
processed only once so what could we
07:24 - 07:27
actually do well keep in mind that our
07:25 - 07:29
notification server right that's just
07:27 - 07:31
forwarding messages to the user so
07:29 - 07:32
that's stateless if it hears from Flink
07:31 - 07:33
it's just going to say ooh am I
07:32 - 07:35
connected to this user let me go ahead
07:33 - 07:37
and forward this over there so again
07:35 - 07:39
this message is going to be delivered at
07:37 - 07:42
least once due to that Flink
07:39 - 07:44
state replay so the question is well
07:42 - 07:46
what can we do to actually ensure that
07:44 - 07:48
our client isn't just showing duplicate
07:46 - 07:50
notifications all the time one thing we
07:48 - 07:52
could do is that we can actually store
07:50 - 07:54
the notifications that we've seen or at
07:52 - 07:55
least the IDS of them on the client to
07:54 - 07:57
make sure that we don't redeliver them
07:55 - 07:59
so the concept of this is called
07:57 - 08:00
something like an idempotency key
07:59 - 08:02
meaning that uh you know we're basically
08:00 - 08:05
just keeping a little bit of memory on
08:02 - 08:06
our client uh and you know keeping a set
08:05 - 08:08
and making sure that you know based on
08:06 - 08:10
hashing that ID that we don't already
08:08 - 08:12
have it somewhere on our client the
08:10 - 08:14
problem is this is going to use up extra
08:12 - 08:16
memory the question is how much extra
08:14 - 08:20
memory and do we actually
08:16 - 08:23
care so uh in theory um basically if we
08:20 - 08:24
are going to assume that uh you know
08:23 - 08:26
we're getting about a thousand
08:24 - 08:28
notifications per day on a device and
08:26 - 08:30
each one is 16 bytes because that's how
08:28 - 08:33
much a UUID which we could use for our
08:30 - 08:36
idempotency key takes that alone is
08:33 - 08:38
going to be like basically 16 kilobytes
08:36 - 08:40
which is not so bad and in theory we
08:38 - 08:42
could totally store this on the client
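As a sketch, client-side dedupe can be as simple as a bounded set of seen idempotency keys; the class name, cap, and oldest-first eviction here are arbitrary choices for illustration:

```python
# A minimal sketch of client-side dedupe using a bounded set of idempotency keys.
from collections import OrderedDict

class NotificationDeduper:
    def __init__(self, max_keys: int = 10_000):
        self._seen = OrderedDict()   # key -> None, insertion-ordered so we can evict the oldest
        self._max_keys = max_keys

    def should_display(self, idempotency_key: str) -> bool:
        if idempotency_key in self._seen:
            return False                       # duplicate delivery, drop it
        self._seen[idempotency_key] = None
        if len(self._seen) > self._max_keys:   # bound the memory used on the device
            self._seen.popitem(last=False)
        return True

deduper = NotificationDeduper()
print(deduper.should_display("123e4567-e89b-12d3-a456-426614174000"))  # True, first delivery
print(deduper.should_display("123e4567-e89b-12d3-a456-426614174000"))  # False, duplicate
```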
08:40 - 08:43
but uh I would imagine that in a systems
08:42 - 08:45
design interview someone might push you
08:43 - 08:46
a little bit and say well what if we
08:45 - 08:48
don't want to use that extra memory
08:46 - 08:50
footprint on the client let's go ahead
08:48 - 08:53
and try and store all of these idempotency
08:50 - 08:55
keys elsewhere to which I say
08:53 - 08:56
fair enough um you know I suppose in
08:55 - 08:59
certain edge cases this would be too
08:56 - 09:00
much memory for certain devices maybe
08:59 - 09:02
you've got like an Apple Watch or
09:00 - 09:03
something and uh at the end of the day
09:02 - 09:07
that is just not going to have the
09:03 - 09:10
memory to store all of this so again uh
09:07 - 09:13
if we wanted to store this let's say on
09:10 - 09:14
our actual notification server itself so
09:13 - 09:16
that would be up here the thing
09:14 - 09:18
forwarding the messages that means we
09:16 - 09:20
would have to store all of the idempotency
09:18 - 09:22
keys for all of the users that
09:20 - 09:24
that notification server cares
09:22 - 09:26
about and so let's say it's connected to
09:24 - 09:27
around 65,000 users because that's around how
09:26 - 09:29
many ports you can use to connect to
09:27 - 09:31
people that comes out to around a
09:29 - 09:33
gigabyte of memory which itself is not
09:31 - 09:35
too bad but again they may push you and
09:33 - 09:37
say well what if it were greater than
09:35 - 09:38
this what would you do then so the one
09:37 - 09:40
thing I want to note about actually
09:38 - 09:42
keeping all of these uh keys on our
09:40 - 09:44
server to ensure that we aren't sending
09:42 - 09:46
duplicate messages to the client is that
09:44 - 09:49
we have the potential for partial
09:46 - 09:51
failures right so let's say that uh one
09:49 - 09:53
thing that we do is we say okay we're
09:51 - 09:54
going to send our message to the client
09:53 - 09:56
and then when we hear back from the
09:54 - 09:58
client saying I got the message we're
09:56 - 10:00
going to keep track of that idempotency
09:58 - 10:03
key so we don't actually send that
10:00 - 10:05
message again well what about when the
10:03 - 10:07
client device for whatever reason is
10:05 - 10:09
sending that uh acknowledgement back to
10:07 - 10:10
the notification server and it doesn't
10:09 - 10:12
work well we're never actually going to
10:10 - 10:14
add the idempotency key so that's one
10:12 - 10:15
partial failure scenario and the other
10:14 - 10:17
is that you might say oh well what if
10:15 - 10:19
instead of keeping track of the idempotency
10:17 - 10:21
key after we send it to the
10:19 - 10:23
client we write it down before we send
10:21 - 10:26
it to the client uh and then we'll just
10:23 - 10:28
go ahead and store that around well it
10:26 - 10:31
would be possible that you know we write
10:28 - 10:33
down the idempotency key so it's like
10:31 - 10:35
key1 we try and send that over to the
10:33 - 10:37
client that doesn't work out for us and
10:35 - 10:39
then maybe our server goes down so it
10:37 - 10:41
never actually realizes that the client
10:39 - 10:42
didn't get it now all of a sudden we've
10:41 - 10:45
added an idempotency key for an
10:42 - 10:46
undelivered message which is bad because
10:45 - 10:48
if that message then comes in again and
10:46 - 10:49
we want to redeliver it we're not going
10:48 - 10:51
to do so and that's never going to get
10:49 - 10:54
to the client so in theory if we did
10:51 - 10:56
want to make this process completely
10:54 - 10:58
perfect we would need two-phase commit
10:56 - 11:00
in practice uh it's not really the end
10:58 - 11:02
of the world if you don't get a push
11:00 - 11:05
notification and so who cares
11:02 - 11:06
uh but yeah you know note that uh if we
11:05 - 11:08
want everything to be proper we would
11:06 - 11:10
have to basically add our idempotency
11:08 - 11:12
key conditional on the fact that the
11:10 - 11:13
client device is able to process it and
11:12 - 11:15
that of course is a distributed
11:13 - 11:18
transaction which requires two phase
11:15 - 11:21
commit cool
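Here is a small sketch of those two orderings and the failure window each one leaves open; send_to_client and wait_for_ack are hypothetical stand-ins for the real websocket push and acknowledgement:

```python
# Two ways the notification server could record idempotency keys, and the partial
# failure each one allows. send_to_client / wait_for_ack are hypothetical stand-ins.
seen_keys: set[str] = set()

def send_to_client(payload: str) -> None:
    print(f"pushing {payload!r} over the websocket")

def wait_for_ack(timeout_s: float = 1.0) -> bool:
    return True  # pretend the ack arrived; in reality it can be lost

def deliver_record_after_ack(key: str, payload: str) -> None:
    if key in seen_keys:
        return
    send_to_client(payload)
    if wait_for_ack():        # if the ack is lost here, the key is never recorded,
        seen_keys.add(key)    # so a replay of this message shows up as a duplicate

def deliver_record_before_send(key: str, payload: str) -> None:
    if key in seen_keys:
        return
    seen_keys.add(key)        # key recorded first; if the send below fails and the
    send_to_client(payload)   # server dies, a replay is dropped and the message is lost
```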
11:18 - 11:22
So the last piece of this is maybe you say, oh, you know, 16 kilobytes or
11:21 - 11:24
a gigabyte of memory on our notification
11:22 - 11:26
server we just don't want to pay for
11:24 - 11:28
that we would rather do this all on disk
11:26 - 11:30
in some sort of database so you know
11:28 - 11:32
let's imagine instead we got 10,000
11:30 - 11:34
messages a day per user and all of a
11:32 - 11:35
sudden we need to store 16 gigabytes of memory
11:34 - 11:37
and that's just too much for our
11:35 - 11:39
notification server we could actually do
11:37 - 11:41
this in a database like I
11:39 - 11:43
mentioned and uh the way we would do
11:41 - 11:44
that to probably make reads and writes
11:43 - 11:46
as fast as possible is we would go ahead
11:44 - 11:49
and index on our specific idempotency
11:46 - 11:50
key and then partition the table by user
11:49 - 11:52
ID so that you know for a given user and
11:50 - 11:54
a given idempotency key we can at least
11:52 - 11:57
relatively quickly jump into that index
11:54 - 11:59
and find it however uh this is obviously
11:57 - 12:01
going to incur quite a bit of extra
11:59 - 12:04
latency as in step one we have to check
12:01 - 12:06
whether the idempotency key exists step
12:04 - 12:08
two we have to go ahead and write our
12:06 - 12:09
message to the client and then in step
12:08 - 12:11
three we have to write that idempotency
12:09 - 12:14
key back to the database
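Here is a minimal sketch of that three-step flow, with an in-memory stand-in for the idempotency-key table partitioned by user ID and indexed on the key:

```python
# A sketch of the three-step flow above. The in-memory KeyStore stands in for a real
# table partitioned by user_id with an index on the idempotency key.
class KeyStore:
    def __init__(self):
        self._keys = {}  # user_id -> set of seen idempotency keys (one "partition" per user)

    def key_exists(self, user_id: str, key: str) -> bool:
        return key in self._keys.get(user_id, set())

    def record_key(self, user_id: str, key: str) -> None:
        self._keys.setdefault(user_id, set()).add(key)

def deliver(store: KeyStore, user_id: str, key: str, payload: str) -> None:
    if store.key_exists(user_id, key):            # step 1: read - have we seen this key?
        return                                    # duplicate, drop it
    print(f"push {payload!r} to user {user_id}")  # step 2: write the message to the client
    store.record_key(user_id, key)                # step 3: write the key back to the database
```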
12:11 - 12:16
So is there anything that we can do to optimize this
12:14 - 12:17
process a little bit well actually in
12:16 - 12:19
fact there is we could use something
12:17 - 12:21
like a bloom filter so a bloom filter is
12:19 - 12:23
going to allow us to ideally get rid of
12:21 - 12:25
this step one where we have to read the
12:23 - 12:28
database to check if that idempotency
12:25 - 12:29
key has been seen already it's
12:28 - 12:31
not perfect but at least some percentage
12:29 - 12:33
of the time the bloom filter will help
12:31 - 12:35
us out here so a bloom filter is a
12:33 - 12:36
probabilistic data structure that we've
12:35 - 12:37
spoken about plenty of times on this
12:36 - 12:40
channel but I'll quickly go over it one
12:37 - 12:42
more time so the idea is that we
12:40 - 12:45
basically want to avoid one extra read
12:42 - 12:48
to database so let's say I see some key
12:45 - 12:49
called Jordan right that's going to hash
12:48 - 12:52
and basically our bloom filter is a fixed-size
12:49 - 12:54
memory buffer that involves
12:52 - 12:56
basically using multiple hash functions
12:54 - 12:58
to map a given key to multiple places in
12:56 - 12:59
the memory buffer so let's say hash
12:58 - 13:02
function one is going to map the
12:59 - 13:03
key Jordan over here into that bucket
13:02 - 13:05
hash function two maps it into this
13:03 - 13:08
bucket hash function three Maps it into
13:05 - 13:11
this bucket then we add the key Kate uh
13:08 - 13:13
Kate is going to go here here and here
13:11 - 13:14
with our three hash functions and then
13:13 - 13:17
the key Megan comes around and we want
13:14 - 13:18
to say oh, have we already seen Megan well
13:17 - 13:21
we know already just using hash function
13:18 - 13:23
one that we haven't because there's no
13:21 - 13:25
other element that's already filled up
13:23 - 13:28
this bucket right here so Megan has to
13:25 - 13:30
be unique even if this guy went and
13:28 - 13:32
hashed over here and this guy on hash
13:30 - 13:34
two hashed over here we could see that
13:32 - 13:37
because hash function 3 put Megan in a
13:34 - 13:38
unique bucket that Megan is a key that
13:37 - 13:40
we have not already seen and then we
13:38 - 13:42
don't actually have to read the database
13:40 - 13:43
we can just go ahead and send that
13:42 - 13:46
message right over to the client without
13:43 - 13:48
incurring an extra read cost now it is
13:46 - 13:50
the case that you know we might see a
13:48 - 13:51
key and it looks like it is completely
13:50 - 13:52
in the bloom filter but we can't
13:51 - 13:54
guarantee that it is because those
13:52 - 13:56
buckets may have been filled up by a
13:54 - 13:58
combination of other keys so the bloom
13:56 - 14:00
filter will tell us when we haven't
13:58 - 14:01
already seen a key but we can't be sure
14:00 - 14:03
that we have already seen a key so
14:01 - 14:05
that's worth noting so of course when we
14:03 - 14:07
haven't already seen that message we
14:05 - 14:09
don't have to read the database we just
14:07 - 14:11
go ahead and write it right to our
14:09 - 14:13
client
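Here is a minimal bloom filter sketch matching the Jordan/Kate/Megan walkthrough above; the bit-array size, number of hash functions, and the SHA-256-based hashing are arbitrary choices for illustration:

```python
# A minimal bloom filter: a fixed-size bit array and k hash functions per key.
import hashlib

class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, key: str):
        # Derive k bucket positions for the key from k independent-ish hashes.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key: str) -> bool:
        # False means definitely never seen; True means "maybe seen", so fall back to the DB read.
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
bf.add("Jordan")
bf.add("Kate")
print(bf.might_contain("Megan"))   # False -> skip the DB read, deliver immediately
print(bf.might_contain("Jordan"))  # True  -> possibly a duplicate, check the database
```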
14:11 - 14:14
So you can see that I'm losing my voice a little bit here but I'm going
14:13 - 14:17
to try and get through the rest of this
14:14 - 14:18
thing so as far as the client is
14:17 - 14:20
concerned right because the client
14:18 - 14:22
itself is connected to the notification
14:20 - 14:23
service we need to be able to do two
14:22 - 14:25
things the first is that for the
14:23 - 14:27
messages that don't have to be delivered
14:25 - 14:29
to too many users which I'll call the
14:27 - 14:31
unpopular ones we want to deliver
14:29 - 14:33
those in real time so the idea
14:31 - 14:35
here is that again we've got some sort
14:33 - 14:37
of routing server right and we've used
14:35 - 14:39
this pattern as well in a ton of places
14:37 - 14:41
Uber Facebook Messenger all that so the
14:39 - 14:43
routing server is basically just going
14:41 - 14:45
to look at the existing notification
14:43 - 14:47
servers how many connections they have
14:45 - 14:49
probably use some sort of consistent
14:47 - 14:51
hashing scheme on the user ID and then
14:49 - 14:54
assign a notification server to a given
14:51 - 14:55
user that wants to connect to one so
14:54 - 14:57
it's going to assign it some sort of
14:55 - 14:59
address based on consistent hashing and
14:57 - 15:00
then if for whatever reason uh you know
14:59 - 15:03
they're sending heartbeats back and
15:00 - 15:04
forth with one another and uh the client
15:03 - 15:06
device
15:04 - 15:08
realizes that it's no longer connected
15:06 - 15:10
to that notification Service uh it will
15:08 - 15:12
just reach right back out to the routing
15:10 - 15:13
server to get another connection now the
15:12 - 15:17
one thing I wanted to point out is we
15:13 - 15:18
should probably add a random jitter
15:17 - 15:20
every time that connection breaks
15:18 - 15:23
down to basically wait a little bit
15:20 - 15:25
before we reconnect otherwise if one
15:23 - 15:27
notification server goes down we're
15:25 - 15:28
going to have all of its multiple
15:27 - 15:30
thousands of devices trying to reconnect
15:28 - 15:33
to new servers at the same time and that
15:30 - 15:34
could cause a thundering herd cool
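Here is a sketch of that client reconnect loop with jitter; request_assignment and connect are hypothetical stand-ins for the routing server lookup and the websocket connection:

```python
# Client reconnect loop with backoff plus random jitter to avoid a thundering herd
# when a notification server dies. The helpers below are illustrative stand-ins.
import random
import time

def request_assignment(user_id: str) -> str:
    # Stand-in for the routing server's consistent-hash lookup (a real one would use a stable hash).
    return f"notif-server-{hash(user_id) % 8}"

def connect(address: str) -> bool:
    return random.random() > 0.3  # stand-in: pretend some connection attempts fail

def reconnect_with_jitter(user_id: str, base_delay: float = 1.0, max_delay: float = 30.0) -> str:
    attempt = 0
    while True:
        address = request_assignment(user_id)
        if connect(address):
            return address
        attempt += 1
        backoff = min(max_delay, base_delay * 2 ** attempt)
        time.sleep(random.uniform(0, backoff))  # jitter spreads the reconnects out over time

print(reconnect_with_jitter("user-12"))
```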
15:33 - 15:37
The next thing is that we also want to be
15:34 - 15:39
polling popular messages on an interval
15:37 - 15:41
so the idea here is that you know
15:39 - 15:42
because uh we have all this information
15:41 - 15:45
stored in our database and partitioned
15:42 - 15:46
by user we can really quickly just get
15:45 - 15:48
all of our topic subscriptions for a
15:46 - 15:50
given client device and then that is
15:48 - 15:52
going to tell me whether a given topic
15:50 - 15:55
is also popular right so one thing that
15:52 - 15:58
I didn't mention too much is that in
15:55 - 16:00
this fan out pattern when Flink deems
15:58 - 16:02
that a given topic is popular because
16:00 - 16:04
of how many users are subscribed to it it
16:02 - 16:06
can actually go ahead and reach back out
16:04 - 16:08
to the topics table and say hey by the
16:06 - 16:10
way this topic is now considered a
16:08 - 16:12
popular one or we could make it user
16:10 - 16:13
configurable as well like an app might
16:12 - 16:14
be able to say hey I'm going to send
16:13 - 16:17
this particular notification to all
16:14 - 16:19
users marked in advance as a popular
16:17 - 16:21
topic either way works but you know
16:19 - 16:24
it's a design-specific
16:21 - 16:26
choice cool so the idea is you know
16:24 - 16:29
when we figure out which information on
16:26 - 16:31
a given topic is popular then we
16:29 - 16:34
can basically go ahead and go to our
16:31 - 16:35
actual notifications table which uh you
16:34 - 16:36
know something like Cassandra works
16:35 - 16:39
because we want fast ingestion there
16:36 - 16:41
through the LSM
16:39 - 16:43
tree which is uh buffered in memory and
16:41 - 16:45
basically also multiple different leader
16:43 - 16:47
nodes and then we can just read from
16:45 - 16:49
Cassandra we can partition it out by
16:47 - 16:51
topic so that uh we can easily find
16:49 - 16:53
those messages and sort within a topic
16:51 - 16:54
on timestamp which is going to make our
16:54 - 16:59
reads easy cool sorry I'm rushing through
16:57 - 17:01
this one guys I just am in quite a bit
16:59 - 17:04
of pain at the moment so let me go ahead
17:01 - 17:05
and finish it off so as you can see as
17:04 - 17:07
far as the actual notification service
17:05 - 17:09
diagram goes uh it's pretty standard to
17:07 - 17:11
what we've been describing in this video on
17:09 - 17:13
the left let's actually first discuss
17:11 - 17:16
how a client is going to receive
17:13 - 17:18
notifications basically we've got you
17:16 - 17:20
know our client over here and on one
17:18 - 17:21
side we've got some sort of notification
17:20 - 17:23
server router which is going to listen
17:21 - 17:25
to ZooKeeper get the current consistent
17:23 - 17:27
hashing policy connect us to a
17:25 - 17:29
notification server and then once we're
17:27 - 17:30
connected to a notification server we
17:29 - 17:33
can communicate back and forth sending
17:30 - 17:34
one another heartbeats via websockets
17:33 - 17:36
this is going to allow me as the client
17:34 - 17:38
to realize if I'm still connected to the
17:36 - 17:40
notification server and if I'm not then
17:38 - 17:43
I have to once again go back here and
17:40 - 17:44
reconnect to a different one cool so
17:43 - 17:46
that is going to make sure that I'm
17:44 - 17:47
getting basically the unpopular
17:46 - 17:49
notifications for the popular
17:47 - 17:51
notifications I'm going to go ahead and
17:49 - 17:54
read from my load balancer I'm going to
17:51 - 17:55
hit my notification polling service
17:54 - 17:57
which is now going to all of a sudden
17:55 - 17:59
basically tell me what topics I'm
17:57 - 18:01
subscribed to and then from knowing the
17:59 - 18:03
topics that I'm subscribed to I can
18:01 - 18:05
actually go ahead and first off read
18:03 - 18:08
from some sort of popular notifications
18:05 - 18:10
cache which is in Redis this guy is
18:08 - 18:12
just going to basically act as a proxy
18:10 - 18:14
between the actual notifications
18:12 - 18:16
database table and uh hopefully should
18:14 - 18:18
end up being filled with all of the
18:16 - 18:19
popular notifications because those are
18:18 - 18:21
going to be getting read a lot more uh
18:19 - 18:24
we can just use an LRU replacement
18:21 - 18:25
policy to basically ensure this and then
18:24 - 18:26
that is going to be getting uh proxied
18:25 - 18:28
information from our actual
18:26 - 18:29
notifications table that I just
18:28 - 18:31
described before so this guy is going to
18:29 - 18:33
be Cassandra because we are going to be
18:31 - 18:35
publishing a ton of messages there from
18:33 - 18:37
Flink and it would be good to be able
18:35 - 18:40
to ingest them pretty quickly we can
18:37 - 18:42
shard this guy by our topic ID and sort
18:40 - 18:44
it by our timestamp that way we can
18:42 - 18:46
basically just say for a given topic
18:44 - 18:47
that I want to get the messages for all
18:46 - 18:48
of our messages that I care about
18:47 - 18:50
between these two timestamps are
18:48 - 18:52
already pre-indexed so it should be a
18:50 - 18:53
relatively fast query and ideally most
18:52 - 18:55
of the time we're just going to be
18:53 - 18:58
hitting over here on our cache
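As a sketch of that read path, assuming something like a redis-py client and a cassandra-driver session, with an illustrative table where topic_id is the partition key and created_at (epoch millis) is the clustering column:

```python
# Poll popular notifications: hit the Redis cache first, then fall back to the Cassandra
# notifications table. The table layout and cache key format here are assumptions.
import json

def poll_popular(redis_client, cassandra_session, topic_id: str, since_ms: int) -> list:
    cache_key = f"popular:{topic_id}:{since_ms}"  # clients polling on a shared interval hit the same key
    cached = redis_client.get(cache_key)
    if cached is not None:
        return json.loads(cached)  # cache hit: no database read needed

    rows = cassandra_session.execute(
        # topic_id is the partition key and created_at the clustering column, so this
        # range read stays inside one partition and comes back already sorted by time.
        "SELECT created_at, payload FROM notifications "
        "WHERE topic_id = %s AND created_at > %s",
        (topic_id, since_ms),
    )
    result = [[row.created_at, row.payload] for row in rows]
    redis_client.setex(cache_key, 30, json.dumps(result))  # short TTL; LRU eviction does the rest
    return result
```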
18:55 - 18:59
As far as Flink is concerned basically the way
18:58 - 19:01
that this works like I mentioned before
18:59 - 19:04
is using the fan out pattern so we have
19:01 - 19:05
all of our topic subscriptions this can
19:04 - 19:07
just be stored in a mySQL database we're
19:05 - 19:08
not going to be reading there too often
19:07 - 19:10
we're not going to be writing there too
19:08 - 19:11
often but we are going to be reading
19:10 - 19:13
from there a decent amount and as a
19:11 - 19:15
result of that I think using SQL or a B
19:13 - 19:17
tree based database is going to be nice
19:15 - 19:19
here and also again it just keeps things
19:17 - 19:22
simple shard this guy out on user ID and
19:19 - 19:24
then when we actually put everything in
19:22 - 19:27
Kafka apologies typo here on my end this
19:24 - 19:29
Kafka queue should be sharded on topic ID at
19:27 - 19:32
which point we go ahead and put
19:29 - 19:34
everything into Flink so Flink is itself
19:32 - 19:36
sharded again on topic ID so that per
19:34 - 19:38
topic that it cares about it has a sense
19:36 - 19:39
of all the users that it needs to go
19:38 - 19:42
ahead and reach out
19:39 - 19:44
to so when a message comes in from our
19:42 - 19:45
actual notification queue over here and
19:44 - 19:47
this can be published from a variety of
19:45 - 19:49
different places right every app is
19:47 - 19:51
effectively publishing to this massive
19:49 - 19:53
Kafka queue probably through some sort of you
19:51 - 19:56
know notification service that then you
19:53 - 19:58
know proxies over to this Kafka but the
19:56 - 19:59
gist is that Flink is basically going to
19:59 - 20:05
take those messages, put them in the notifications
20:02 - 20:06
table whether they're popular or not and
20:05 - 20:08
then also assuming they're not popular
20:06 - 20:11
it is going to Fan them out to the
20:08 - 20:12
proper notification servers that we care
20:11 - 20:15
about and so we can either you know do
20:12 - 20:17
that fan out via some sort of middleware
20:15 - 20:20
over here that's listening to a
20:17 - 20:23
zookeeper or we can have Flink itself
20:20 - 20:24
listen to zookeeper you know get a sense
20:23 - 20:27
of where the messages actually have to
20:24 - 20:29
be delivered and then
20:27 - 20:30
deliver them there accordingly
20:29 - 20:33
well guys I hope this video was somewhat
20:30 - 20:35
helpful again like nothing too novel
20:33 - 20:37
here in terms of um what we've actually
20:35 - 20:38
spoken about in this series so far the
20:37 - 20:41
reason being that I want to just Hammer
20:38 - 20:43
home the point that at least for four to
20:41 - 20:45
five of these problems that I've covered
20:43 - 20:47
right now this fan out problem or rather
20:45 - 20:48
this fan out design is going to be
20:47 - 20:50
something that you want to really know
20:48 - 20:52
super well and uh have that in the back
20:50 - 20:54
of your head and also be able to
20:52 - 20:56
recognize when to use it uh because it
20:54 - 20:59
will definitely come in handy anyways I
20:56 - 21:02
hope you all enjoy your weekend and uh I
20:59 - 21:02
will see you in the next one