00:00 - 00:04
Would you believe me if I told you that
00:02 - 00:07
there's a database out there that can
00:04 - 00:08
continuously handle large volumes of data,
00:08 - 00:13
scale automagically,
00:10 - 00:15
and be available to keep on continuously
00:13 - 00:17
taking on data?
00:15 - 00:19
Hello, my name is Jamil Spain, Developer
00:17 - 00:22
Advocate with IBM,
00:19 - 00:24
and today's topic is the answer to that:
00:22 - 00:27
Elasticsearch.
00:24 - 00:28
All right, it's a great database, a data
00:27 - 00:31
store, and I want to talk a little more
00:28 - 00:33
about some of its characteristics.
00:31 - 00:35
We're going to compare it to a
00:33 - 00:37
relational database management system
00:35 - 00:39
and then talk about the ecosystem that
00:37 - 00:42
comes with it.
00:39 - 00:45
So to get started, let's talk about what
00:42 - 00:48
Elasticsearch is exactly. Well,
00:45 - 00:51
first, it is distributed in nature,
00:48 - 00:54
and it is a NoSQL, JSON-based data store.
00:56 - 00:59
We're going to abbreviate that with the
00:57 - 01:00
DS there as well.
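Being distributed means an index is split into shards spread across several nodes. As a rough sketch of the idea (not Elasticsearch's code), a simplified form of the documented routing rule sends each document to `hash(routing) % number_of_primary_shards`, where the routing value defaults to the document ID. Elasticsearch uses a Murmur3 hash; this sketch swaps in `zlib.crc32` from Python's standard library, and the shard count of 3 is a hypothetical setting.

```python
import zlib
from collections import Counter

NUM_PRIMARY_SHARDS = 3  # hypothetical; fixed when the index is created

def pick_shard(routing: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    """Simplified routing: hash(routing) % number of primary shards.

    Elasticsearch hashes with Murmur3; crc32 is swapped in here only because
    it ships with Python's standard library.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_shards

# The routing value defaults to the document ID, so the same ID always maps
# to the same shard and a later lookup knows where to go.
placement = {f"log-{i}": pick_shard(f"log-{i}") for i in range(10)}
print(placement)
print(Counter(placement.values()))  # how many of the 10 IDs landed on each shard
```

Because the shard is a pure function of the ID and the shard count, changing the number of primary shards would move documents around, which is why that setting is fixed once an index exists.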
01:02 - 01:07
On the spectrum of where databases fall,
01:04 - 01:10
with Postgres and MySQL kind of being
01:07 - 01:12
the most structured types of databases,
01:10 - 01:15
I'd put this out on the far end,
01:12 - 01:18
past MongoDB, when it comes to how
01:15 - 01:20
unstructured and NoSQL it can be.
01:18 - 01:23
When it comes to interacting with
01:20 - 01:27
Elasticsearch,
01:23 - 01:29
interestingly enough, it's done through a REST API,
01:33 - 01:37
so all your queries happen that way. You
01:35 - 01:40
programmatically
01:37 - 01:42
set up all your indexes, and
01:40 - 01:43
pretty much anything you
01:42 - 01:45
need to do to interact with it goes
01:43 - 01:48
through REST URLs.
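To make the REST interaction concrete, here is a minimal Python sketch that only builds the HTTP requests for storing one JSON document and running a full-text search. It assumes a local, unsecured development cluster at `http://localhost:9200` and a hypothetical index named `app-logs`; `PUT /<index>/_doc/<id>` and `POST /<index>/_search` with a `match` query are standard Elasticsearch endpoints, but check them against the version you run.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local dev cluster, security disabled
INDEX = "app-logs"                # hypothetical index name

def index_request(doc_id: str, doc: dict) -> urllib.request.Request:
    """Build PUT /<index>/_doc/<id>, which stores one JSON document."""
    return urllib.request.Request(
        f"{ES_URL}/{INDEX}/_doc/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

def search_request(field: str, text: str) -> urllib.request.Request:
    """Build POST /<index>/_search with a full-text match query on one field."""
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        f"{ES_URL}/{INDEX}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

put_req = index_request("1", {"service": "checkout", "message": "payment timeout"})
search_req = search_request("message", "timeout")
print(put_req.get_method(), put_req.full_url)
print(search_req.get_method(), search_req.full_url, search_req.data.decode())
```

Nothing is sent until `urllib.request.urlopen(...)` is called on one of these requests, so the sketch runs without a cluster; a deployment with security enabled would also need HTTPS and credentials.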
01:45 - 01:48
And a lot of the major use cases for it,
01:48 - 01:52
you know, could be
01:50 - 01:55
that you can take many different data sources,
01:52 - 01:57
from logs, it could be
01:55 - 01:58
any type of metrics you have from
01:57 - 02:01
different systems,
01:58 - 02:03
and maybe even some application data
02:03 - 02:08
that comes in, and you can have one
02:05 - 02:09
system where you can combine all of this.
02:08 - 02:12
Think about data coming from all
02:09 - 02:13
these different sources
02:12 - 02:16
and it being able to
02:13 - 02:19
push them into JSON documents and
02:16 - 02:21
then give you the ability to search
02:19 - 02:23
and get that information back in real time.
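Under the hood, Elasticsearch relies on Apache Lucene's inverted index: a map from each term to the documents that contain it. The toy below is not Elasticsearch's implementation; it normalizes three hypothetical records (a log line, a metric, an application event) into JSON-style documents and answers a query by intersecting the document sets of its terms. The AND semantics is a simplification: Elasticsearch's `match` query defaults to OR and ranks hits by relevance.

```python
import json
import re
from collections import defaultdict

# Three hypothetical records, each shaped differently by its source.
RAW_LOG = "2024-05-01T12:00:00Z ERROR checkout payment timeout"
RAW_METRIC = {"host": "web-1", "cpu": 0.93}
RAW_EVENT = ("signup", "user-42")

def tokenize(text: str) -> list:
    """Lowercase and split into alphanumeric terms (a crude analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def to_documents() -> list:
    """Normalize every source into a JSON-style document (a plain dict)."""
    ts, level, service, *words = RAW_LOG.split()
    return [
        {"source": "logs", "@timestamp": ts, "level": level,
         "service": service, "message": " ".join(words)},
        {"source": "metrics", **RAW_METRIC},
        {"source": "app", "event": RAW_EVENT[0], "user": RAW_EVENT[1]},
    ]

def build_inverted_index(docs: list) -> dict:
    """Map each term to the set of document positions whose fields contain it."""
    index = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for value in doc.values():
            for term in tokenize(str(value)):
                index[term].add(doc_id)
    return index

def search(index: dict, docs: list, query: str) -> list:
    """Return documents containing every query term (AND, unlike match's default OR)."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return [docs[i] for i in sorted(hits)]

docs = to_documents()
index = build_inverted_index(docs)
print(json.dumps(search(index, docs, "payment timeout"), indent=2))
```

Looking up a term is a single dictionary access no matter how many documents exist, which is the property that keeps search fast as data keeps flowing in.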
02:23 - 02:27
So it sounds like a big job that it has
02:25 - 02:30
to do, and certainly let's
02:27 - 02:32
look at it through our usual comparison with what
02:30 - 02:34
we know from
02:32 - 02:36
relational databases
02:34 - 02:39
to see how that compares and how the
02:36 - 02:42
lingo and the context change. Well, we
02:39 - 02:44
know that with relational database
02:42 - 02:46
management systems, they are called databases;
02:48 - 02:52
in Elasticsearch, these are known as
03:00 - 03:05
ES indexes. All right,
03:03 - 03:06
and also, in a
03:06 - 03:10
relational database, we have the term tables.
03:13 - 03:16
In Elasticsearch, they're going to be called,
03:15 - 03:18
or it could be called, kind of an index,
03:18 - 03:22
and in some of the earlier versions they
03:20 - 03:25
were known as types.
03:22 - 03:28
All right, so now we know from our tables
03:25 - 03:30
that a relational database has many tables,
03:28 - 03:33
all right, and we know the obvious next
03:30 - 03:35
ones we're going to look at.
03:33 - 03:36
I'm going to put both of these down, as
03:35 - 03:41
we're getting to the bottom of the
03:36 - 03:41
screen here: rows and then columns.
03:43 - 03:46
Okay, let's get my other marker here.
03:46 - 03:52
Rows, just like we know from most
03:49 - 03:55
NoSQL data sources, are going to be documents,
03:57 - 04:01
and normally in a relational database
04:00 - 04:04
you know you have tables, you have the
04:01 - 04:08
rows and individual columns. These are going
04:04 - 04:08
to be called fields.
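The row-to-document and column-to-field correspondence fits in a few lines of Python. The `orders` table, its columns and its values are hypothetical; the point is only that one row becomes one JSON document and each column name becomes a field name.

```python
import json

# A hypothetical "orders" table: its column names and one row from a query result.
COLUMNS = ("order_id", "customer", "total", "status")
ROW = (1001, "ada", 42.5, "shipped")

def row_to_document(columns: tuple, row: tuple) -> dict:
    """One row becomes one document; each column name becomes a field name."""
    if len(columns) != len(row):
        raise ValueError("row does not match the column list")
    return dict(zip(columns, row))

doc = row_to_document(COLUMNS, ROW)
print(json.dumps(doc))
# {"order_id": 1001, "customer": "ada", "total": 42.5, "status": "shipped"}
```

Unlike a row, a document is not bound to a fixed column list, so individual documents can carry extra fields or nested objects and arrays.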
04:08 - 04:13
So that's just a quick comparison. If you have a
04:11 - 04:16
lot of familiarity with relational
04:13 - 04:18
databases like MySQL or Postgres, this is
04:16 - 04:19
kind of a way to transition your
04:18 - 04:21
understanding
04:19 - 04:23
of all that and know how things kind of
04:21 - 04:25
map together. When you start planning
04:23 - 04:27
out your structure, these are things
04:25 - 04:29
that you need to consider: how you
04:27 - 04:31
can translate that over. So we know that
04:31 - 04:35
it's a JSON-based data store, you're going to
04:33 - 04:37
interact with it through REST, and
04:35 - 04:41
it's very powerful: it
04:37 - 04:43
has the capability to
04:41 - 04:45
ingest data from many, many data sources
04:43 - 04:47
and scale out. If I think about the CAP
04:45 - 04:50
theorem concepts, I would probably put
04:47 - 04:52
this on A and P, for availability
04:50 - 04:53
and partition tolerance, already built in,
04:52 - 04:55
and depending on how you want to
04:53 - 04:56
configure it, you could probably achieve
04:56 - 05:02
different consistency levels as well. But
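Availability and partition tolerance in Elasticsearch come largely from replica shards, and one configurable knob on writes is the `wait_for_active_shards` parameter. The sketch below only builds the JSON body for creating an index (`PUT /<index>` with `number_of_shards` and `number_of_replicas`) and an indexing URL using that parameter; the index name and the sizing of 3 primaries with 1 replica are hypothetical, and this simplifies Elasticsearch's actual consistency model. `wait_for_active_shards` is a pre-flight check that enough shard copies are active before a write proceeds; it does not by itself guarantee that every copy acknowledged the write.

```python
import json

def create_index_body(primary_shards: int, replicas: int) -> dict:
    """Body for PUT /<index>: primary shard count plus replica copies per primary."""
    return {"settings": {"number_of_shards": primary_shards,
                         "number_of_replicas": replicas}}

def total_shard_copies(primary_shards: int, replicas: int) -> int:
    """Every primary shard is stored once, plus `replicas` copies on other nodes."""
    return primary_shards * (1 + replicas)

def write_url(index: str, doc_id: str, active_shards: int, replicas: int) -> str:
    """Indexing path that waits for `active_shards` active copies before writing.

    Only integer values are handled here; the real parameter also accepts "all".
    """
    if not 1 <= active_shards <= replicas + 1:
        raise ValueError("active_shards must be between 1 and replicas + 1")
    return f"/{index}/_doc/{doc_id}?wait_for_active_shards={active_shards}"

body = create_index_body(primary_shards=3, replicas=1)
print(json.dumps(body))
print(total_shard_copies(3, 1))           # 6 shard copies to place across nodes
print(write_url("app-logs", "1", 2, 1))   # wait for the primary and its replica
```

Raising the replica count buys availability at the cost of storage and write work, and the active-shard requirement trades write latency for a stronger pre-write check.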
04:59 - 05:04
let's move on to the whole ecosystem.
05:02 - 05:07
So you hear the name Elasticsearch out
05:04 - 05:08
there, but often you will hear about the ELK stack;
05:12 - 05:16
this is how you will hear about it
05:14 - 05:19
being referenced. And I think the easiest
05:16 - 05:21
way to break down how the stack works
05:19 - 05:23
is to diagram it out, and then we'll talk
05:21 - 05:25
about each component and the
05:23 - 05:27
place that it fits. That would be a
05:25 - 05:29
great way to really help understand this.
05:27 - 05:31
So let's put
05:29 - 05:33
Elasticsearch,
05:31 - 05:35
which I'm going to abbreviate ES, that's
05:33 - 05:38
going to be kind of in the center
05:35 - 05:41
of everything here. Next,
05:42 - 05:46
the K is for Kibana,
05:46 - 05:53
and Kibana is a web-based UI.
05:51 - 05:56
This will be how you actually interact
05:53 - 05:58
with a lot of the data that
05:56 - 06:01
Elasticsearch prepares and indexes for
05:58 - 06:04
you to use, and so you can build
06:01 - 06:04
your dashboard
06:05 - 06:08
and you can build different widgets
06:09 - 06:12
or visualizations
06:16 - 06:22
that can continuously update
06:19 - 06:24
as data comes in on that side. So this
06:22 - 06:27
could really be your main interface that
06:24 - 06:29
you use to keep
06:27 - 06:32
updating and looking at your data
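Kibana visualizations such as a count-over-time chart are typically backed by Elasticsearch aggregation queries. A minimal sketch, assuming documents carry an `@timestamp` field: the first function builds a `date_histogram` aggregation body (`fixed_interval` is the parameter name in Elasticsearch 7.2 and later), and the second computes the same per-minute buckets locally so the idea is visible without a cluster. The sample timestamps are hypothetical.

```python
import json
from collections import Counter
from datetime import datetime

def per_minute_agg_body(field: str = "@timestamp") -> dict:
    """Search body asking for event counts per minute, with no raw hits returned."""
    return {
        "size": 0,
        "aggs": {"per_minute": {"date_histogram": {"field": field,
                                                   "fixed_interval": "1m"}}},
    }

def bucket_locally(timestamps: list) -> dict:
    """What that aggregation computes, by hand: truncate to the minute and count."""
    counts = Counter(
        datetime.fromisoformat(ts).replace(second=0, microsecond=0).isoformat()
        for ts in timestamps
    )
    return dict(sorted(counts.items()))

events = ["2024-05-01T12:00:05", "2024-05-01T12:00:40", "2024-05-01T12:01:10"]
print(json.dumps(per_minute_agg_body()))
print(bucket_locally(events))  # {'2024-05-01T12:00:00': 2, '2024-05-01T12:01:00': 1}
```

A dashboard that re-runs a query like this on a refresh interval is what makes a widget appear to update continuously as new documents arrive.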
06:29 - 06:34
as it flows in. Now let's talk about the
06:32 - 06:36
other side. So we talked about the output:
06:34 - 06:37
we have this great data store,
06:36 - 06:40
Elasticsearch, and we're going to be
06:37 - 06:42
visualizing things with Kibana, kind of
06:40 - 06:44
our gateway to view our data and how
06:42 - 06:47
things are running. Now let's talk about
06:44 - 06:51
how data gets in. There are two parts
06:47 - 06:51
that I would like to talk about here:
06:51 - 06:55
we have something called Logstash, and
06:53 - 06:58
you'll also
06:55 - 06:58
hear about something called Beats.
07:00 - 07:03
So for Logstash,
07:04 - 07:08
think of this as, well, it actually is an
07:06 - 07:12
open source
07:08 - 07:14
server-side processing pipeline,
07:12 - 07:17
and its main job is to do three things. First, to
07:14 - 07:17
take data in:
07:21 - 07:28
input data from many different sources.
07:24 - 07:28
It's then going to transform it,
07:30 - 07:35
and then you get to what we like to call,
07:33 - 07:38
so eloquently, stash it somewhere. All
07:35 - 07:40
right, now the inputs can be from a variety
07:38 - 07:42
of things. You can actually just put it
07:40 - 07:44
in a format; most of the time you can add
07:42 - 07:46
SDKs or things to your
07:44 - 07:48
code or different systems, and they
07:46 - 07:50
push the data into Logstash.
07:48 - 07:53
Transformations might be some
07:50 - 07:54
formatting on the data, minor structuring
07:53 - 07:57
before it comes in,
07:54 - 07:59
if you would like, and then you can
07:57 - 08:00
output that
07:59 - 08:03
to stash it somewhere.
08:00 - 08:05
And you can imagine one of the first
08:03 - 08:08
output plugins there is Elasticsearch.
08:05 - 08:10
So let's complete our triangle here:
08:08 - 08:11
we'll go from Logstash
08:11 - 08:16
to Elasticsearch, and so you can
08:13 - 08:18
continuously feed things in.
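The input, transform and stash stages correspond to the input, filter and output sections of a Logstash pipeline. The Python below is a toy analogue of that flow, not Logstash itself: each raw line becomes an event with a `message` field (as Logstash does by default), a filter splits it into structured fields (where a real pipeline might use the grok filter), and an output hands each JSON event to a sink. The log lines are hypothetical, and a plain list stands in for the Elasticsearch output.

```python
import json
from typing import Callable, Iterable, Iterator, List

# Hypothetical raw lines, as they might arrive from a file, socket, or SDK.
RAW_LINES = [
    "2024-05-01T12:00:00Z ERROR checkout payment timeout",
    "2024-05-01T12:00:02Z INFO checkout payment ok",
]

def input_stage(lines: Iterable[str]) -> Iterator[dict]:
    """Input: wrap each raw line in an event under a `message` field."""
    for line in lines:
        yield {"message": line}

def filter_stage(events: Iterable[dict]) -> Iterator[dict]:
    """Transform: split the message into structured fields (a tiny grok stand-in)."""
    for event in events:
        ts, level, service, text = event["message"].split(maxsplit=3)
        yield {**event, "@timestamp": ts, "level": level,
               "service": service, "text": text}

def output_stage(events: Iterable[dict], stash: Callable[[str], None]) -> int:
    """Output ("stash"): send each event as JSON to a sink and count them."""
    count = 0
    for event in events:
        stash(json.dumps(event))
        count += 1
    return count

stashed: List[str] = []  # stands in for an Elasticsearch output
n = output_stage(filter_stage(input_stage(RAW_LINES)), stashed.append)
print(n, "events stashed")
print(stashed[0])
```

Because each stage is a generator, events stream through one at a time instead of being loaded all at once, which loosely mirrors a pipeline that keeps handling data as it arrives.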
08:16 - 08:21
Now, we mentioned the Beats part
08:18 - 08:22
here. Unlike the headphones, these
08:21 - 08:25
Beats are set up to
08:22 - 08:27
be kind of agents on different servers.
08:25 - 08:29
So say you have something maybe in
08:27 - 08:31
serverless,
08:29 - 08:32
or you have some files that you want
08:31 - 08:33
to collect, or different
08:33 - 08:37
things, maybe something on Windows Server. So
08:35 - 08:39
it's a complementary
08:37 - 08:41
component that's very Logstash-like in nature,
08:39 - 08:42
but it has plugins for many different
08:41 - 08:45
other services,
08:42 - 08:46
and one of its outputs is to go directly
08:45 - 08:48
into Logstash.
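Beats such as Filebeat are lightweight agents that read data where it is produced and ship it onward, remembering how far they have read (Filebeat keeps this in a registry) so a restart does not resend old lines. The class below is a hypothetical, heavily simplified stand-in for that behaviour: it keeps a character offset into an in-memory "file", holds back any line that is not finished yet, and tags each event with a host name. In a real setup the events would go to Logstash (commonly on port 5044) or directly to Elasticsearch.

```python
import io

class TinyShipper:
    """Toy Beats-style agent: ships only new, complete lines from a text source."""

    def __init__(self, source: io.StringIO, host: str):
        self.source = source
        self.host = host    # hypothetical host name attached to every event
        self.offset = 0     # characters already shipped (a real Beat persists this)

    def poll(self) -> list:
        """Return one event per complete line appended since the last poll."""
        pending = self.source.getvalue()[self.offset:]
        # Everything up to the last newline is complete; the rest waits for next time.
        complete, newline, _unfinished = pending.rpartition("\n")
        self.offset += len(complete) + len(newline)
        return [{"host": self.host, "message": line}
                for line in complete.split("\n") if line]

log_file = io.StringIO()
shipper = TinyShipper(log_file, host="web-1")

log_file.write("service started\n")
first = shipper.poll()
log_file.write("request ha")          # the application is mid-write
second = shipper.poll()
log_file.write("ndled\n")
third = shipper.poll()
print(first, second, third, sep="\n")
```

Tracking an offset instead of re-reading the whole file keeps the agent cheap to run on every server, which is the niche Beats fill next to the heavier Logstash pipeline.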
08:46 - 08:50
So collectively, you're kind of building
08:48 - 08:52
this consistent
08:50 - 08:55
pipeline that keeps going
08:52 - 08:57
in. As you visualize, you can kind of say
08:55 - 09:00
you program more
08:57 - 09:01
things to come in and continuously keep this
09:01 - 09:05
circular nature coming
09:04 - 09:06
and keep flowing.
09:06 - 09:12
This can scale up to
09:09 - 09:13
massive amounts of information and nodes
09:12 - 09:15
that are already set up to be
09:13 - 09:18
distributed in nature and handle a
09:15 - 09:21
variety of scenarios. But one great thing
09:18 - 09:23
is there are containers available so that
09:21 - 09:25
you can set up this complete
09:23 - 09:28
infrastructure all on your laptop to
09:25 - 09:31
taste-test things out on a much
09:28 - 09:33
smaller scale and have it grow to a much
09:31 - 09:35
larger scale, effectively making it a
09:33 - 09:38
great component in your architecture for
09:35 - 09:39
visualizing your data that will
09:38 - 09:40
be in the data lake that you are
09:40 - 09:45
Thank you very much for your time.
09:43 - 09:48
If you have any questions, please drop us
09:45 - 09:50
a line below, and if you want to see more
09:48 - 09:54
videos like this in the future, please
09:50 - 09:54
like and subscribe.