00:00 - 00:04
hey everyone today we are going to dive
00:02 - 00:07
into mail servers the real backbone of
00:04 - 00:09
email systems a mail server is a
00:07 - 00:12
computer system responsible for sending
00:09 - 00:14
receiving and storing email messages it
00:12 - 00:16
acts as a virtual post office that
00:14 - 00:18
handles the entire email flow allowing
00:16 - 00:21
users to communicate over the
00:18 - 00:23
Internet so when you hit send your email
00:21 - 00:25
client like the Gmail app sends the
00:23 - 00:28
email to Google's outgoing mail server
00:25 - 00:31
using SMTP now the email is in Google's
00:28 - 00:33
system ready for further processing
00:31 - 00:35
Google's server performs a DNS lookup to
00:33 - 00:37
identify the correct mail server for the
00:35 - 00:40
recipient's domain such as Yahoo's MX or
00:37 - 00:42
Mail Exchange Server for an @yahoo.com
00:40 - 00:45
address this lookup helps Google find
00:42 - 00:47
where to send the email next and once
00:45 - 00:49
Google has the MX record it begins
00:47 - 00:52
transmitting the email to Yahoo's server
00:49 - 00:53
via SMTP the email might pass through
00:52 - 00:56
multiple Network routers or relay
00:53 - 00:58
servers ensuring it reaches Yahoo and
00:56 - 01:00
finally the email arrives at Yahoo's
00:58 - 01:02
server where it is processed
01:00 - 01:04
filtered and stored in the recipient's
01:02 - 01:07
mailbox waiting for them to open and
01:04 - 01:09
read it in this video you will not only
01:07 - 01:11
learn more about mail servers but we'll
01:09 - 01:13
also do a high level system design so
01:11 - 01:13
let's get
01:16 - 01:21
started just like physical mail needs a
01:19 - 01:23
post office for sorting and delivery
01:21 - 01:26
email needs a mail server to handle
01:23 - 01:28
sending receiving and storing messages
01:26 - 01:30
the two main types we'll focus on today
01:28 - 01:33
are the outgoing mail server usually
01:30 - 01:37
using SMTP and the incoming mail server
01:33 - 01:39
using either IMAP or POP3 together they
01:37 - 01:42
keep our digital communication flowing
01:39 - 01:44
smoothly when you hit send on an email
01:42 - 01:46
your email client such as Outlook or
01:44 - 01:49
Gmail connects to an outgoing email
01:46 - 01:52
server via SMTP or simple mail transfer
01:49 - 01:55
protocol the SMTP server checks the
01:52 - 01:58
domain of the email recipient say at
01:55 - 02:00
yahoo.com to identify where it should go
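To make that hand-off a bit more concrete, here is a minimal sketch of what a client does when you hit send, using Python's standard `smtplib` and `email` modules. The addresses, the helper names (`build_message`, `recipient_domain`, `submit`) and the idea of submitting over STARTTLS are assumptions for illustration, not something shown in the video; `submit` is defined but not called because it needs a real outgoing server.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Assemble a plain-text email ready for SMTP submission."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def recipient_domain(address: str) -> str:
    """Return the part after the last '@'; this is what decides where the mail goes."""
    return address.rsplit("@", 1)[1].lower()


def submit(msg: EmailMessage, host: str, port: int, user: str, password: str) -> None:
    """Hand the message to an outgoing SMTP server (hypothetical host and credentials)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()              # upgrade the connection to TLS before logging in
        smtp.login(user, password)
        smtp.send_message(msg)


# Illustrative addresses only.
msg = build_message("alice@example.com", "bob@yahoo.com", "Hello", "Hi Bob!")
print(recipient_domain(msg["To"]))
```

The outgoing server runs the same kind of domain extraction on the recipient address before it looks up where to deliver.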
01:58 - 02:03
now to actually find Yahoo's server the
02:00 - 02:06
SMTP server performs a DNS lookup think
02:03 - 02:08
of DNS as the internet's address book
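As a rough sketch of what the sending server does with the answer to that lookup: an MX query returns one or more (preference, host) pairs, and the server tries the host with the lowest preference value first. The record values and the names `MXRecord` and `delivery_order` below are made up for illustration, and the DNS query itself is left out because Python's standard library has no MX resolver.

```python
from typing import List, NamedTuple


class MXRecord(NamedTuple):
    preference: int   # lower value is tried first
    host: str


def delivery_order(records: List[MXRecord]) -> List[str]:
    """Return mail hosts in the order a sending server would try them."""
    return [r.host for r in sorted(records, key=lambda r: r.preference)]


# Hypothetical records; a real server would get these from a DNS MX query.
records = [
    MXRecord(20, "mx-backup.example.net"),
    MXRecord(10, "mx-primary.example.net"),
]
print(delivery_order(records))
```

If the first host is unreachable, the sender moves on to the next one in the list, which is why domains usually publish more than one MX record.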
02:06 - 02:10
its goal is to locate the recipient's mail
02:08 - 02:12
server through its Mail Exchange or MX records
02:10 - 02:13
basically the coordinates of Yahoo's
02:13 - 02:17
server with the MX record in hand our
02:15 - 02:19
email is ready to be handed off to
02:17 - 02:22
Yahoo's mail server here is where things
02:19 - 02:23
get interesting the email might pass
02:22 - 02:26
through several intermediary relay
02:23 - 02:28
servers and these are like transfer
02:26 - 02:30
stations along the way each one ensuring
02:28 - 02:33
your message stays on track
02:30 - 02:35
some relay servers belong to ISPs others
02:33 - 02:37
to Backbone providers that make up the
02:35 - 02:38
internet's infrastructure the email
02:37 - 02:42
travels through these networks using
02:38 - 02:45
routes chosen by BGP or Border
02:42 - 02:47
Gateway Protocol and a system of peering
02:45 - 02:50
agreements between internet providers
02:47 - 02:53
and all of this happens in milliseconds
02:50 - 02:55
now the internet is made up of thousands
02:53 - 02:58
of autonomous systems or ASes large
02:55 - 03:00
networks owned by ISPs backbone
02:58 - 03:02
providers data centers and large
03:00 - 03:05
organizations like Google or Yahoo are
03:02 - 03:07
the autonomous systems BGP is
03:05 - 03:09
responsible for figuring out the most
03:07 - 03:11
efficient route for your data to travel
03:09 - 03:14
across multiple networks routers within
03:11 - 03:16
each AS advertise their available routes
03:14 - 03:18
to neighboring ASes using
03:16 - 03:20
BGP these advertisements include
03:18 - 03:23
information about the paths and metrics
03:20 - 03:25
that help determine the optimal route
03:23 - 03:27
considering things like distance
03:25 - 03:29
connection speed and agreements between
03:27 - 03:31
networks so as your email leaves Google's
03:29 - 03:33
network it is routed through various
03:31 - 03:35
intermediary networks such as ISPs or
03:33 - 03:39
backbone providers before reaching
03:35 - 03:41
Yahoo's Network neither Google nor Yahoo
03:39 - 03:44
manually selects each ISP along the way
03:41 - 03:46
instead BGP dynamically routes the data
03:44 - 03:48
based on the currently advertised paths
03:46 - 03:50
and the routing policies of each AS all
03:48 - 03:53
right so our email finally arrives at
03:50 - 03:55
Yahoo's mail server now what happens the
03:53 - 03:57
email is stored on the server in the
03:55 - 03:59
recipient's inbox until they log in to
03:57 - 04:01
retrieve it at this point the
03:59 - 04:04
recipient's email client whether it is
04:01 - 04:07
on their phone laptop or browser uses
04:04 - 04:09
either IMAP or POP3 to fetch the
04:07 - 04:12
email IMAP or Internet message access
04:09 - 04:15
protocol keeps emails on the server
04:12 - 04:18
allowing access from multiple devices
04:15 - 04:19
whereas POP3 or Post Office Protocol version 3
04:18 - 04:22
downloads the email to one device and
04:19 - 04:24
typically removes it from the server and it's a
04:22 - 04:26
great option if you want offline access
04:24 - 04:28
but again it's less flexible this
04:26 - 04:31
layered approach allows emails to reach
04:28 - 04:32
their destination securely and efficiently
04:31 - 04:35
while relying on various network
04:32 - 04:37
resources now if you're curious about
04:35 - 04:39
setting up your own mail server while
04:37 - 04:42
it's rewarding it's a bit like running
04:39 - 04:44
your own post office here is a high
04:42 - 04:46
level view of the process you first
04:44 - 04:49
register a custom domain like
04:46 - 04:50
example.com so your email can use that domain
04:50 - 04:56
you then set up a dedicated server a VPS
04:54 - 04:59
from providers like AWS Linode or
04:56 - 05:01
DigitalOcean will do most mail server
04:59 - 05:04
software is optimized for Linux you then
05:01 - 05:07
have to install mail server software you
05:04 - 05:10
use an SMTP server like Postfix or Exim for
05:07 - 05:12
sending emails and Dovecot for IMAP or
05:10 - 05:15
POP3 if you want to retrieve
05:12 - 05:19
emails you then configure the DNS with
05:15 - 05:21
MX SPF DKIM and DMARC records and this
05:19 - 05:24
step is essential to Route emails to
05:21 - 05:25
your server and keep them secure these
05:24 - 05:27
records authenticate and protect your
05:25 - 05:30
domain from spammers and
05:27 - 05:33
spoofers and then you use software like
05:30 - 05:35
SpamAssassin and ClamAV that can help
05:33 - 05:37
with filtering out spam and viruses and
05:35 - 05:39
finally tools like Roundcube provide a
05:37 - 05:41
browser based inbox so you can manage
05:39 - 05:42
your emails like on Gmail but with your
05:42 - 05:46
domain setting up your own mail server
05:45 - 05:48
might not be difficult but it requires
05:46 - 05:50
maintenance monitoring and ongoing
05:48 - 05:52
security updates but if you're up for
05:50 - 05:53
the challenge it's a great way to
05:52 - 05:55
control your Communication in this video
05:53 - 05:56
we are not going to do that setup but
05:55 - 05:58
we'll be deep diving into the system
05:56 - 06:01
design aspect of it now before we
05:58 - 06:02
proceed note that a regular server isn't
06:01 - 06:05
typically configured for email
06:02 - 06:06
processing mail servers are specialized
06:05 - 06:09
for handling email-specific tasks and
06:06 - 06:12
protocols they support email protocols
06:09 - 06:14
like SMTP IMAP and POP3 which aren't
06:12 - 06:16
configured by default on regular servers
06:14 - 06:18
email can experience delays due to
06:16 - 06:20
network issues so mail servers use
06:18 - 06:23
queuing mechanisms to retry delivery if
06:20 - 06:26
needed regular servers lack these robust
06:23 - 06:28
retry mechanisms so clearly designing a
06:26 - 06:30
robust mail server system involves
06:28 - 06:33
integrating several key components to
06:30 - 06:35
ensure efficient email delivery storage
06:33 - 06:37
and security let's explore each
06:35 - 06:39
component and its role in the system at
06:37 - 06:41
a high level a mail server system
06:39 - 06:43
includes a few key components SMTP
06:41 - 06:46
servers handle sending and receiving
06:43 - 06:48
emails over the network mail retrieval
06:46 - 06:51
servers manage email access through
06:48 - 06:53
protocols like IMAP or POP3 and then we
06:51 - 06:55
have the mail storage system which stores
06:53 - 06:58
email messages and attachments securely
06:55 - 07:00
and efficiently you can also think of a
06:58 - 07:03
user authentication Service
07:00 - 07:05
that keeps user login secure a web and
07:03 - 07:06
mobile interface for users to access and
07:05 - 07:09
interact with their
07:06 - 07:11
inboxes a Spam and virus filtering
07:09 - 07:13
service that protects against unwanted
07:11 - 07:15
and malicious content and then of course
07:13 - 07:17
we'll have database systems to store
07:15 - 07:19
metadata user profiles and email
07:17 - 07:21
indexing information let's explore each
07:19 - 07:23
component its function and how they
07:21 - 07:25
interact with each other the SMTP
07:23 - 07:27
servers or the simple mail transfer
07:25 - 07:30
protocol servers are the backbone of
07:27 - 07:32
email delivery they handle both outgoing
07:30 - 07:34
and incoming emails so when you send an
07:32 - 07:36
email the SMTP server reaches out to a
07:34 - 07:39
DNS resolver to find the recipient
07:36 - 07:41
domain's Mail Exchange record the record
07:39 - 07:44
tells the SMTP server where to deliver the
07:41 - 07:46
email if there are multiple MX records
07:44 - 07:49
they are prioritized based on preference
07:46 - 07:51
values with the lowest value tried first and to speed things up SMTP
07:49 - 07:52
servers cache DNS lookups reducing
07:51 - 07:55
latency for frequent
07:52 - 07:56
addresses in order to scale the SMTP
07:55 - 07:58
servers these servers are typically
07:56 - 08:01
deployed behind load balancers to handle
07:58 - 08:03
high email volumes rate limiting and
08:01 - 08:05
logging are in place to prevent abuse
08:03 - 08:08
and ensure smooth traffic flow in
08:05 - 08:10
essence SMTP servers are like the mail
08:08 - 08:13
carriers of the system ensuring emails
08:10 - 08:15
get from point A to point B as securely
08:13 - 08:17
and efficiently as possible once emails
08:15 - 08:19
are on the server we need a way for
08:17 - 08:21
users to access them that's where mail
08:19 - 08:23
retrieval servers come in using
08:21 - 08:26
protocols like IMAP and POP3 like we
08:23 - 08:28
have seen before IMAP keeps emails on
08:26 - 08:31
the server making it ideal for users who
08:28 - 08:33
access email from multiple devices POP3
08:31 - 08:35
on the other hand downloads emails to a
08:33 - 08:36
device and often deletes them from the server
08:36 - 08:41
afterwards like SMTP servers IMAP and
08:39 - 08:43
POP3 servers are deployed behind load
08:41 - 08:46
balancers to manage high traffic the
08:43 - 08:48
setup keeps things responsive even with
08:46 - 08:51
a larger user base now let's talk about
08:48 - 08:53
where all those emails actually Live
08:51 - 08:55
Mail storage systems are designed to
08:53 - 08:57
store massive amounts of email data
08:55 - 08:59
while keeping it accessible and secure
08:57 - 09:02
so let's say each user sends or receives
08:59 - 09:04
around 20 emails daily with an average
09:02 - 09:07
email size of 75 KB including the
09:04 - 09:09
attachments so for 10 million users
09:07 - 09:12
that's about 15 terabytes of data every day
09:09 - 09:15
over a year we're looking at around 5.5
09:12 - 09:18
petabytes so daily storage is about 15
09:15 - 09:21
terabytes per day and our yearly storage
09:18 - 09:23
is 5.5 petabytes per year and to handle all
09:21 - 09:26
this we use distributed Storage
09:23 - 09:29
Solutions so for attachments object
09:26 - 09:32
storage like Amazon S3 or HDFS will work
09:29 - 09:34
well for metadata and indexing a NoSQL
09:32 - 09:35
database like Cassandra provides fast
09:35 - 09:39
retrieval we replicate data across
09:37 - 09:42
multiple data centers and use erasure
09:39 - 09:44
coding for efficiency and fault tolerance
09:42 - 09:48
this ensures that even if some storage
09:44 - 09:49
nodes fail no data is lost in short the
09:48 - 09:52
mail storage system is like a massive
09:49 - 09:54
filing cabinet structured to handle
09:52 - 09:56
immense amounts of data while keeping
09:54 - 09:58
everything safe and easily accessible
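The storage estimate from this section is easy to re-check. This is just the arithmetic spelled out, assuming decimal units (1 TB = 10^9 KB, 1 PB = 1000 TB) and a 365-day year; the constant names are made up for the example.

```python
USERS = 10_000_000
EMAILS_PER_USER_PER_DAY = 20
AVG_EMAIL_KB = 75          # average size including attachments

KB_PER_TB = 10**9          # decimal units assumed
TB_PER_PB = 1000

daily_tb = USERS * EMAILS_PER_USER_PER_DAY * AVG_EMAIL_KB / KB_PER_TB
yearly_pb = daily_tb * 365 / TB_PER_PB

print(f"daily: {daily_tb:.1f} TB, yearly: {yearly_pb:.1f} PB")
```

Note that this counts raw message data only; replication across data centers multiplies the footprint by however many copies are kept.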
09:56 - 10:01
now for a fully functional email system
09:58 - 10:03
users need web and mobile interfaces
10:01 - 10:06
just like what you see with Gmail or
10:03 - 10:08
Outlook so scalable web servers like
10:06 - 10:10
Nginx or Apache sit behind load
10:08 - 10:12
balancers handling traffic and serving
10:10 - 10:14
the user interface you can expose
10:12 - 10:16
RESTful APIs for both web and mobile
10:14 - 10:18
clients and these APIs are secured with
10:16 - 10:20
authentication and rate limiting to
10:18 - 10:22
ensure safe and smooth access these
10:20 - 10:24
interfaces provide users with a seamless
10:22 - 10:26
responsive experience Bridging the
10:24 - 10:28
backend services and the front-end
10:26 - 10:30
experience furthermore every secure
10:28 - 10:33
system needs an authentication Service
10:30 - 10:36
to ensure only authorized users access it
10:33 - 10:38
so for secure logins you can use OAuth 2.0
10:36 - 10:41
and strong hashing algorithms like
10:38 - 10:44
bcrypt or Argon2 to store passwords and
10:41 - 10:46
we add multi-factor authentication or MFA
10:44 - 10:48
for an extra layer of security this
10:46 - 10:50
component ensures that every login is
10:48 - 10:52
secure protecting user accounts from
10:50 - 10:54
unauthorized access one of the key
10:52 - 10:57
features of any mail server system is
10:54 - 11:00
Spam and virus filtering keeping your
10:57 - 11:02
inboxes safe no one wants spam or viruses in
11:00 - 11:05
their inbox right and that's where spam
11:02 - 11:07
and virus filtering service comes in so
11:05 - 11:09
we use machine learning models like
11:07 - 11:11
naive Bayes classifiers and logistic
11:09 - 11:15
regression to filter out spam blacklists
11:11 - 11:17
and whitelists help refine these filters
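Here is a toy version of that idea: a multinomial naive Bayes classifier with add-one smoothing, plus sender whitelist and blacklist checks that run before the model. The class name, the four training messages and the sender addresses are invented for illustration, and a production filter would train on far more data with better tokenization.

```python
import math
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple


class NaiveBayesSpamFilter:
    """Tiny multinomial naive Bayes spam filter with sender lists."""

    def __init__(self, whitelist: FrozenSet[str] = frozenset(),
                 blacklist: FrozenSet[str] = frozenset()) -> None:
        self.whitelist = whitelist
        self.blacklist = blacklist
        self.word_counts: Dict[str, Counter] = {"spam": Counter(), "ham": Counter()}
        self.doc_counts: Dict[str, int] = {"spam": 0, "ham": 0}

    def train(self, samples: List[Tuple[str, str]]) -> None:
        for text, label in samples:
            self.doc_counts[label] += 1
            self.word_counts[label].update(text.lower().split())

    def _log_score(self, words: List[str], label: str) -> float:
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(self.word_counts[label].values())
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)   # class prior
        for w in words:
            # add-one smoothing keeps unseen words from zeroing the probability
            score += math.log((self.word_counts[label][w] + 1) / (total_words + len(vocab)))
        return score

    def classify(self, text: str, sender: str = "") -> str:
        if sender in self.whitelist:      # trusted senders bypass the model
            return "ham"
        if sender in self.blacklist:      # known bad senders are rejected outright
            return "spam"
        words = text.lower().split()
        return max(("spam", "ham"), key=lambda label: self._log_score(words, label))


spam_filter = NaiveBayesSpamFilter(
    whitelist=frozenset({"boss@example.com"}),
    blacklist=frozenset({"bad@example.net"}),
)
spam_filter.train([
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting schedule for monday", "ham"),
    ("project report attached", "ham"),
])
print(spam_filter.classify("claim your free money"))
print(spam_filter.classify("monday meeting report"))
```

Checking the lists first means a trusted sender is never lost to a model mistake, while the model handles everyone else.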
11:15 - 11:19
then you can integrate your antivirus
11:17 - 11:21
software that scans attachments and email
11:19 - 11:24
content to catch any malicious files
11:21 - 11:26
before they reach the inbox you can also
11:24 - 11:28
include a real-time feed that helps us
11:26 - 11:31
stay updated on the latest threats
11:28 - 11:33
adjusting your filters as needed this
11:31 - 11:35
filter sits right in the email pipeline
11:33 - 11:38
scanning every incoming message to keep
11:35 - 11:40
users' inboxes clean and secure behind
11:38 - 11:42
the scenes our database system stores
11:40 - 11:45
the metadata things like user profiles
11:42 - 11:47
email indexes and configurations so we
11:45 - 11:50
can use databases like MySQL or
11:47 - 11:52
PostgreSQL for structured data sharding and
11:50 - 11:55
replication ensure scalability and
11:52 - 11:57
reliability for quick access we use
11:55 - 11:59
in-memory caches like Redis or Memcached to
11:57 - 12:02
store frequently accessed data
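A common way to wire a cache in front of the database is the cache-aside pattern: read from the cache, fall back to the database on a miss, then populate the cache. The sketch below uses a small in-process TTL cache as a stand-in for Redis or Memcached; `TTLCache`, `get_profile` and `fake_db` are hypothetical names, not part of any real client library.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class TTLCache:
    """In-process stand-in for a cache such as Redis or Memcached."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._data: Dict[str, Tuple[float, dict]] = {}

    def get(self, key: str) -> Optional[dict]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:   # entry is stale, drop it
            del self._data[key]
            return None
        return value

    def set(self, key: str, value: dict) -> None:
        self._data[key] = (self._clock() + self._ttl, value)


def get_profile(user_id: str, cache: TTLCache,
                load_from_db: Callable[[str], dict]) -> dict:
    """Cache-aside read: try the cache, fall back to the database, then fill the cache."""
    profile = cache.get(user_id)
    if profile is None:
        profile = load_from_db(user_id)
        cache.set(user_id, profile)
    return profile


db_calls: List[str] = []


def fake_db(user_id: str) -> dict:
    """Pretend database lookup that records how often it is hit."""
    db_calls.append(user_id)
    return {"user_id": user_id, "name": "Alice"}


cache = TTLCache(ttl_seconds=60)
get_profile("u1", cache, fake_db)
get_profile("u1", cache, fake_db)   # served from the cache, no second database hit
print(len(db_calls))
```

The TTL bounds how stale a cached profile can get, which matters for things like changed display names or settings.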
11:59 - 12:04
these databases act as the backbone of our
12:02 - 12:07
system managing metadata efficiently and
12:04 - 12:08
keeping the system responsive and here
12:07 - 12:10
is the list of some important tables
12:08 - 12:12
from the context of mail server
12:10 - 12:13
functionality each of these tables
12:12 - 12:15
interacts with others to form a
12:13 - 12:18
comprehensive mail server backend
12:15 - 12:20
handling everything from user profiles
12:18 - 12:22
to secure email transmission storage and
12:20 - 12:25
retrieval the users table stores information
12:22 - 12:27
about each user of the email service
12:25 - 12:30
each user has a unique user ID and basic
12:27 - 12:32
profile details
12:30 - 12:34
the emails table records each email sent or
12:32 - 12:37
received by users linking each email to
12:34 - 12:39
the sender recipient and Associated
12:37 - 12:42
metadata like subject and
12:39 - 12:46
timestamps the folders table manages folders for
12:42 - 12:49
organizing emails like inbox sent drafts
12:46 - 12:50
Etc with a relationship to the user ID
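The three tables described so far could look roughly like this. The column names are guesses based on the description, and SQLite (from Python's standard library) stands in for MySQL or PostgreSQL so the sketch runs anywhere. For simplicity it assumes sender and recipient are both local users, which a real mail system would not require.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    email   TEXT NOT NULL UNIQUE,
    name    TEXT
);
CREATE TABLE emails (
    email_id     INTEGER PRIMARY KEY,
    sender_id    INTEGER NOT NULL REFERENCES users(user_id),
    recipient_id INTEGER NOT NULL REFERENCES users(user_id),
    subject      TEXT,
    sent_at      TEXT NOT NULL
);
CREATE TABLE folders (
    folder_id INTEGER PRIMARY KEY,
    user_id   INTEGER NOT NULL REFERENCES users(user_id),
    name      TEXT NOT NULL,
    UNIQUE (user_id, name)   -- one Inbox per user, one Sent per user, and so on
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Illustrative data only.
conn.execute(
    "INSERT INTO users (user_id, email, name) VALUES (1, 'alice@example.com', 'Alice')"
)
conn.executemany(
    "INSERT INTO folders (user_id, name) VALUES (1, ?)",
    [("Inbox",), ("Sent",), ("Drafts",)],
)
folders = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM folders WHERE user_id = 1 ORDER BY folder_id"
    )
]
print(folders)
```

Keying folders by user ID is what lets every account have its own inbox, sent and drafts folders without name clashes.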
12:49 - 12:53
next is a many to many relationship
12:50 - 12:55
table linking emails to folders allowing
12:53 - 12:58
an email to be associated with multiple
12:55 - 12:59
folders for example inbox and important
12:58 - 13:02
emails you also have an attachments
12:59 - 13:04
table that manages email attachments
13:02 - 13:06
linking each attachment to an email ID
13:04 - 13:09
and storing relevant
13:06 - 13:12
metadata the spam classification table is
13:09 - 13:15
crucial for our topic this table stores
13:12 - 13:17
information about emails flagged as spam
13:15 - 13:19
either by Machine learning algorithms or
13:17 - 13:21
by user reports now this is not a full
13:19 - 13:23
list and the design can be further
13:21 - 13:26
optimized or expanded as necessary to
13:23 - 13:27
suit more specific requirements all
13:26 - 13:30
right so that's pretty much it at a high
13:27 - 13:32
level about databases now let's not
13:30 - 13:34
forget reliability and security which
13:32 - 13:37
are also two crucial aspects of any mail
13:34 - 13:39
system so for data replication we store
13:37 - 13:41
multiple copies of each email across
13:39 - 13:43
storage nodes with replication across
13:41 - 13:45
different data centers this protects us
13:43 - 13:47
from data loss due to server
13:45 - 13:49
failures you can have regular backups to
13:47 - 13:52
long-term storage and a solid Disaster
13:49 - 13:53
Recovery plan ensuring we are prepared for the
13:53 - 13:58
unexpected in terms of security measures
13:56 - 14:00
we'll have data in transit for which we
13:58 - 14:04
use TLS or SSL encryption for all
14:00 - 14:07
communication from SMTP to IMAP to
14:04 - 14:09
HTTPS for data at rest we have disk
14:07 - 14:11
level encryption which protects emails
14:09 - 14:13
and sensitive data stored on the
14:11 - 14:16
servers we also implement role-based
14:13 - 14:19
access control or RBAC and enforce OAuth
14:16 - 14:21
tokens for API access and finally we'll
14:19 - 14:23
have auditing such as detailed logs of
14:21 - 14:25
system activity which helps us detect
14:23 - 14:27
suspicious behaviors and ensure
14:25 - 14:29
compliance these measures create a
14:27 - 14:32
robust mail server environment that
14:29 - 14:34
is secure reliable and resilient against
14:32 - 14:36
failures finally for any mail server
14:34 - 14:39
managing High volumes of spam
14:36 - 14:42
efficiently is crucial so we'll have a
14:39 - 14:45
pre-queue filtering that is before even
14:42 - 14:47
accepting an email we verify sender IPs
14:45 - 14:49
and use techniques like greylisting to
14:47 - 14:51
reduce spam we apply rate limiting and
14:49 - 14:54
content analysis that is we limit the
14:51 - 14:56
number of emails from a single IP and
14:54 - 14:58
analyze email content for spam
14:56 - 15:00
characteristics you'll also have a spam
14:58 - 15:02
processing pipeline so by using
15:00 - 15:04
tools like Apache Kafka and Apache Storm
15:02 - 15:06
we can create a scalable pipeline that
15:04 - 15:09
can process emails in parallel keeping
15:06 - 15:10
performance High these layers of
15:09 - 15:13
protection allow us to process large
15:10 - 15:15
volumes of spam without impacting system
15:13 - 15:17
performance from handling billions of
15:15 - 15:20
emails a day to keeping inboxes secure
15:17 - 15:22
and spam free every component has a role
15:20 - 15:23
in making the system robust and scalable
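As one concrete piece of the spam handling described above, here is a minimal sketch of per-IP rate limiting using a sliding time window. The class name, the limits and the IP address (from a documentation-only range) are illustrative assumptions; a real deployment would keep this state in a shared store so every SMTP server sees the same counts.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class PerIPRateLimiter:
    """Allow at most max_messages per sender IP within a sliding time window."""

    def __init__(self, max_messages: int, window_seconds: float) -> None:
        self.max_messages = max_messages
        self.window_seconds = window_seconds
        self._seen: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        timestamps = self._seen[ip]
        # drop timestamps that have fallen out of the window
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_messages:
            return False
        timestamps.append(now)
        return True


limiter = PerIPRateLimiter(max_messages=2, window_seconds=60)
# Four delivery attempts from the same IP at t = 0, 10, 20 and 70 seconds.
decisions = [limiter.allow("203.0.113.7", t) for t in (0, 10, 20, 70)]
print(decisions)   # [True, True, False, True]
```

The third attempt is refused because two messages were already accepted within the last minute; by the fourth attempt the window has moved on.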
15:22 - 15:25
I hope this breakdown gives you a clear
15:23 - 15:27
picture of what's Happening behind the
15:25 - 15:29
scenes every time you hit send and if you
15:27 - 15:31
have enjoyed this breakdown don't forget
15:29 - 15:35
to like subscribe and check out my other
15:31 - 15:38
videos on system design