00:00 - 00:04

would you believe me if i told you that

00:02 - 00:07

there's a database out there that can

00:04 - 00:08

continuously handle large volumes of

00:07 - 00:10

information

00:08 - 00:13

scale automagically

00:10 - 00:15

and be available to keep on continuously

00:13 - 00:17

taking on data

00:15 - 00:19

hello my name is jamil spain developer

00:17 - 00:22

advocate with ibm

00:19 - 00:24

and today's topic is the answer to that

00:22 - 00:27

elasticsearch

00:24 - 00:28

all right it's a great database data

00:27 - 00:31

store and i want to talk a little more

00:28 - 00:33

about some of the characteristics of

00:31 - 00:35

it we're going to compare it to a

00:33 - 00:37

relational database management system

00:35 - 00:39

and then talk about the ecosystem that

00:37 - 00:42

comes with it

00:39 - 00:45

so to get started let's talk about what

00:42 - 00:48

is elasticsearch exactly well

00:45 - 00:51

first it is distributed in nature

00:48 - 00:54

and it is a nosql

00:51 - 00:54

json based

00:54 - 00:57

datastore

00:56 - 00:59

we're going to abbreviate that with the

00:57 - 01:00

ds there as well

00:59 - 01:02

um

01:00 - 01:04

so um

01:02 - 01:07

on the spectrum of where databases fall

01:04 - 01:10

with postgres and mysql kind of being

01:07 - 01:12

the most structured type of databases

01:10 - 01:15

put this on the outer sphere

01:12 - 01:18

past mongodb when it comes to how

01:15 - 01:20

unstructured and nosql it can be

01:18 - 01:23

when it comes to interacting with

01:20 - 01:27

elasticsearch

01:23 - 01:29

interestingly enough it's done through a

01:27 - 01:29

restful

01:31 - 01:35

api

01:33 - 01:37

so all your queries happen that way you

01:35 - 01:40

programmatically

01:37 - 01:42

program all your indexes and all the

01:40 - 01:43

stuff that you pretty much anything you

01:42 - 01:45

need to interact with it would be

01:43 - 01:48

through rest urls

01:45 - 01:48

and a lot of the major use cases for

01:48 - 01:50

this

01:48 - 01:52

you know could be

01:50 - 01:55

you can take many different data sources

01:52 - 01:57

from logs it could be

01:55 - 01:58

any type of metrics you have from

01:57 - 02:01

different systems

01:58 - 02:03

and maybe even some application

02:01 - 02:05

trace data

02:03 - 02:08

that comes in and you can have one

02:05 - 02:09

system that you can combine all of this

02:08 - 02:12

you think about data coming from all

02:09 - 02:13

these different sources

02:12 - 02:16

and it being able to

02:13 - 02:19

uh push them into json documents and

02:16 - 02:21

then allow you the ability to search

02:19 - 02:23

and get that information back in real

02:21 - 02:25

time

02:23 - 02:27

so it sounds like a big job that it has

02:25 - 02:30

to do and certainly let's

02:27 - 02:32

do it from our normal comparison of what

02:30 - 02:34

a relational what we know of from

02:32 - 02:36

relational databases

02:34 - 02:39

to see how that compares and how the

02:36 - 02:42

lingo and the context changes well we

02:39 - 02:44

know that with relational database

02:42 - 02:46

management systems they are called

02:44 - 02:48

databases

02:46 - 02:52

and in

02:48 - 02:52

elasticsearch these are known as

02:52 - 02:55

indexes

02:56 - 02:58

or

02:57 - 03:00

i

02:58 - 03:03

n d i c

03:00 - 03:05

e s indices all right

03:03 - 03:06

and also in a

03:05 - 03:09

uh

03:06 - 03:10

relational database we have the term of

03:09 - 03:11

tables

03:10 - 03:13

okay

03:11 - 03:15

and in

03:13 - 03:16

this they're going to be called

03:15 - 03:18

it could be called kind of index

03:16 - 03:20

patterns

03:18 - 03:22

and some of the earlier versions they

03:20 - 03:25

were known as types

03:22 - 03:28

all right so now we know from our tables

03:25 - 03:30

in relational database has many tables

03:28 - 03:33

all right and we know the obvious second

03:30 - 03:35

one we're going to look at is

03:33 - 03:36

i'm going to put both of these down as

03:35 - 03:41

we're getting to the bottom of the

03:36 - 03:41

screen here rows and then columns

03:43 - 03:46

okay let's get my other marker here

03:45 - 03:49

and

03:46 - 03:52

rows just like we know from most

03:49 - 03:55

nosql data sources are going to be as

03:52 - 03:55

documents

03:57 - 04:01

and normally in a relational database

04:00 - 04:04

you know you have tables you have the

04:01 - 04:08

rows individual columns these are going

04:04 - 04:08

to be called fields

04:08 - 04:13

so just a quick comparison if you have a

04:11 - 04:16

lot of familiarity with relational

04:13 - 04:18

databases like mysql or postgres this is

04:16 - 04:19

kind of a way to transition your

04:18 - 04:21

understanding

04:19 - 04:23

of all that and know how things kind of

04:21 - 04:25

map together and when you start planning

04:23 - 04:27

out your structure these are things

04:25 - 04:29

that you need to consider that how you

04:27 - 04:31

can translate that over so we know that

04:29 - 04:33

it's a

04:31 - 04:35

json based data store you're going to

04:33 - 04:37

interact with it with rest and we're

04:35 - 04:41

looking to get many it's very powerful

04:37 - 04:43

has the capability to

04:41 - 04:45

ingest data from many many data sources

04:43 - 04:47

and scale out if i think about the cap

04:45 - 04:50

theorem concepts i will probably put

04:47 - 04:52

this on an a and a p for availability

04:50 - 04:53

and partition tolerance already built in

04:52 - 04:55

and depending on how you want to

04:53 - 04:56

configure it you could probably achieve

04:55 - 04:59

some

04:56 - 05:02

different consistency bases as well but

04:59 - 05:04

let's move on to the whole ecosystem

05:02 - 05:07

so you hear the name elasticsearch out

05:04 - 05:08

there but often you will hear about this

05:07 - 05:12

term elk

05:08 - 05:14

elk stack

05:12 - 05:16

this is how you will hear about it

05:14 - 05:19

being referenced and i think the easiest

05:16 - 05:21

way to break down how the stack works

05:19 - 05:23

let's diagram it out and then we'll talk

05:21 - 05:25

about each component and the

05:23 - 05:27

place that it fits and that would be a

05:25 - 05:29

great way to really help understand this

05:27 - 05:31

so let's put

05:29 - 05:33

elasticsearch

05:31 - 05:35

i'm going to abbreviate this es that's

05:33 - 05:38

going to be kind of in the center

05:35 - 05:41

of everything here and what we're going

05:38 - 05:41

to do the

05:42 - 05:46

the k is for kibana

05:46 - 05:53

and kibana is a web-based ui

05:51 - 05:56

this will be how you actually interact

05:53 - 05:58

with a lot of the data that uh

05:56 - 06:01

elasticsearch prepares and indexes for

05:58 - 06:04

you to use and so you can build um

06:01 - 06:04

your dashboard

06:05 - 06:08

and you can build different widgets

06:09 - 06:12

or visualizations

06:16 - 06:22

that can continuously update as well as

06:19 - 06:24

data comes in uh on that side so this

06:22 - 06:27

could really be your main interface that

06:24 - 06:29

you use to keep

06:27 - 06:32

updating and looking at your data

06:29 - 06:34

as it flows in now let's talk about the

06:32 - 06:36

other side so we talked about the output

06:34 - 06:37

we have this great data store

06:36 - 06:40

elasticsearch we're going to be

06:37 - 06:42

visualizing things with kibana kind of

06:40 - 06:44

our gateway to view our data and how

06:42 - 06:47

things are running now let's talk about

06:44 - 06:51

how data gets in and there are two parts

06:47 - 06:51

that i would like to talk about here

06:51 - 06:55

we have something called logstash and

06:53 - 06:58

you'll also

06:55 - 06:58

hear something called beats

06:58 - 07:03

all right

07:00 - 07:03

so for logstash

07:04 - 07:08

think of this as well it actually is a

07:06 - 07:12

very open source

07:08 - 07:14

server-side uh processing pipeline

07:12 - 07:17

and its main job is to do two things to

07:14 - 07:17

take data in

07:21 - 07:28

input data from many different sources

07:24 - 07:28

is then going to transform

07:29 - 07:33

that data

07:30 - 07:35

and then you get to what we like to call

07:33 - 07:38

so eloquently stash it somewhere all

07:35 - 07:40

right now the inputs can be from a variety

07:38 - 07:42

of things you can actually just put it

07:40 - 07:44

in a format most of the time you can add

07:42 - 07:46

sdks or things to your

07:44 - 07:48

code or different systems and they

07:46 - 07:50

push the data into logstash

07:48 - 07:53

transformations may be to do some

07:50 - 07:54

formatting on the data minor structuring

07:53 - 07:57

before it comes in

07:54 - 07:59

if you would like and then you can

07:57 - 08:00

output that through

07:59 - 08:03

to stash that somewhere

08:00 - 08:05

and you can imagine one of the first

08:03 - 08:08

plugins that are there is elasticsearch

08:05 - 08:10

so let's complete our triangle here

08:08 - 08:11

we'll go from logstash

08:10 - 08:13

into

08:11 - 08:16

uh elasticsearch and so you can

08:13 - 08:18

continuously feed things in

08:16 - 08:21

now we mentioned the part beats uh that

08:18 - 08:22

were here unlike the headphones these

08:21 - 08:25

beats are set up to

08:22 - 08:27

be kind of agents on different servers

08:25 - 08:29

so say you have something in maybe in

08:27 - 08:31

serverless or

08:29 - 08:32

um or you have some files that you want

08:31 - 08:33

to do or different

08:33 - 08:37

maybe something on windows server so

08:35 - 08:39

it's kind of a complementary kind of

08:37 - 08:41

component that's very logstash in nature

08:39 - 08:42

but it has plug-ins to many different

08:41 - 08:45

other services

08:42 - 08:46

and one of its outputs is to go directly

08:45 - 08:48

into logstash

08:46 - 08:50

so collectively you're kind of building

08:48 - 08:52

this consistent

08:50 - 08:55

pipeline that keeps going

08:52 - 08:57

in as you visualize you can kind of say

08:55 - 09:00

you program more

08:57 - 09:01

things to come in and continuously keep

09:00 - 09:04

this

09:01 - 09:05

circular nature coming

09:04 - 09:06

and keep flowing

09:05 - 09:09

now

09:06 - 09:12

this can scale up to

09:09 - 09:13

massive amounts of information and nodes

09:12 - 09:15

that are really already set up to be

09:13 - 09:18

distributed in nature and handle a

09:15 - 09:21

variety of scenarios but one great thing

09:18 - 09:23

is there are containers available that

09:21 - 09:25

you can set up this complete

09:23 - 09:28

infrastructure all on your laptop to

09:25 - 09:31

taste test things out on a very much

09:28 - 09:33

smaller scale and have it grow to a much

09:31 - 09:35

larger scale effectively making it a

09:33 - 09:38

great component in your architecture to

09:35 - 09:39

be how you visualize your data that will

09:38 - 09:40

be in the data lake that you are

09:39 - 09:43

building

09:40 - 09:45

thank you very much for your time

09:43 - 09:48

if you have any questions please drop us

09:45 - 09:50

a line below and if you want to see more

09:48 - 09:54

videos like this in the future please

09:50 - 09:54

like and subscribe

The Power of Elasticsearch: Revolutionizing Data Storage

Applications increasingly need a datastore that can continuously ingest large volumes of information, scale out, and stay available while doing so. Elasticsearch, a distributed, NoSQL, JSON-based datastore, is built for that job. Let's look at what characterizes Elasticsearch, how its terminology compares to that of relational databases, and what ecosystem surrounds it.

Understanding Elasticsearch: A Paradigm Shift in Data Storage

At its core, Elasticsearch is distributed, so data can be spread across nodes and scaled out. Its vocabulary differs from that of relational databases like MySQL or PostgreSQL: what a relational system calls a database, Elasticsearch calls an index (plural: indexes or indices); tables correspond roughly to index patterns (known as types in some earlier versions); rows become documents; and columns become fields. The terminology may seem daunting at first, but mapping it onto familiar relational concepts makes it easier to plan how your data will be structured.

One of the key features that sets Elasticsearch apart is its RESTful API interface. By interacting with Elasticsearch via REST URLs, users can programmatically manage indexes and execute queries, making the process of data manipulation more intuitive and efficient.
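As a rough illustration of what "managing indexes and executing queries over REST URLs" looks like, the sketch below builds three HTTP requests with Python's standard library without sending them. The host `http://localhost:9200`, the index name `app-logs`, and the document fields are assumptions made up for this example; the paths follow Elasticsearch's documented create-index, index-document, and search endpoints.

```python
import json
from urllib.request import Request

ES_URL = "http://localhost:9200"  # assumption: a local, unsecured test node


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return Request(
        url=f"{ES_URL}{path}",
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


# Create an index named "app-logs" (hypothetical name).
create_index = build_request("PUT", "/app-logs")

# Store one JSON document in that index.
index_doc = build_request(
    "POST",
    "/app-logs/_doc",
    {"level": "error", "message": "connection timeout", "service": "checkout"},
)

# Full-text search for documents whose "message" field matches "timeout".
search = build_request(
    "POST",
    "/app-logs/_search",
    {"query": {"match": {"message": "timeout"}}},
)

for req in (create_index, index_doc, search):
    print(req.get_method(), req.full_url)
```

To actually run these against a node, each request object could be passed to `urllib.request.urlopen`; a secured cluster would additionally need credentials and HTTPS, which this sketch leaves out.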

Bridging the Gap: Elasticsearch vs. Relational Databases

In the realm of relational databases, the structure is rigid, defined by tables, rows, and columns. In contrast, Elasticsearch adopts a more dynamic approach, treating data as documents with nested fields. This flexibility allows for the seamless integration of data from diverse sources, such as logs, metrics, and application trace data.
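To make the row-versus-document contrast concrete, here is a minimal sketch that turns relational-style rows into JSON documents. The column names and values are invented for illustration, and the nested `error` object shows the kind of structure a flat row cannot hold without an extra table.

```python
import json

# A relational-style table: fixed columns, one tuple per row (made-up data).
columns = ("id", "host", "status", "latency_ms")
rows = [
    (1, "web-01", 200, 12.5),
    (2, "web-02", 503, 840.0),
]


def row_to_document(columns, row):
    """Pair each column name with its value: columns become fields."""
    return dict(zip(columns, row))


documents = [row_to_document(columns, row) for row in rows]

# A document can carry nested fields that only some records need.
documents[1]["error"] = {"type": "upstream_timeout", "retries": 3}

print(json.dumps(documents[1], indent=2))
```

Each dictionary here plays the role of one document, and each key plays the role of one field.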

When it comes to scalability, Elasticsearch shines. In CAP theorem terms, it is probably best placed on the availability (A) and partition tolerance (P) side, with consistency behavior that depends on how it is configured. Its ability to ingest data from many sources, turn it into JSON documents, and make those documents searchable in real time is what makes it a versatile datastore.

Exploring the Elasticsearch Ecosystem: The ELK Stack

Beyond its standalone capabilities, Elasticsearch is often part of a broader ecosystem known as the ELK stack, comprising Elasticsearch, Logstash, and Kibana. This integrated framework offers a comprehensive solution for data ingestion, processing, and visualization.

  • Elasticsearch (ES): The central datastore where data is indexed and stored.
  • Logstash: An open source, server-side processing pipeline that takes in data from diverse sources, transforms it, and sends ("stashes") it to a destination such as Elasticsearch.
  • Kibana: A web-based UI that enables users to visualize and interact with data stored in Elasticsearch.

Additionally, the ELK stack features "Beats," lightweight agents that serve as data shippers, seamlessly feeding data into Logstash for processing.
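The input, transform, and stash stages can be sketched in a few lines of Python. This is only a conceptual model of the flow, not Logstash's implementation or configuration language; the log format, the field names, and the `app-logs` index are assumptions. The output stage renders events in the newline-delimited JSON layout that Elasticsearch's `_bulk` endpoint accepts.

```python
import json

# Hypothetical raw log lines, as an agent or application might ship them.
RAW_LINES = [
    "2024-01-05T10:00:00Z ERROR checkout connection timeout",
    "2024-01-05T10:00:02Z INFO search query served",
]


def parse(line):
    """Input stage: split a raw log line into named fields."""
    timestamp, level, service, message = line.split(" ", 3)
    return {
        "@timestamp": timestamp,
        "level": level,
        "service": service,
        "message": message,
    }


def transform(event):
    """Transform stage: normalize the level and flag errors."""
    event = dict(event, level=event["level"].lower())
    event["is_error"] = event["level"] == "error"
    return event


def to_bulk(events, index):
    """Output stage: one action line plus one document line per event."""
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(event))
    return "\n".join(lines) + "\n"


events = [transform(parse(line)) for line in RAW_LINES]
payload = to_bulk(events, "app-logs")
print(payload)
```

Keeping the three stages as separate functions mirrors the way the pipeline is described: each stage can be swapped out (a different source, a different transformation, a different destination) without touching the others.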

Building a Seamless Data Pipeline with Elasticsearch

By leveraging the ELK stack, organizations can establish a continuous data pipeline that efficiently manages data from source to visualization. The synergy between Logstash, Beats, and Elasticsearch ensures that data is ingested, processed, and stored effectively, allowing for real-time insights and analytics.
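Dashboards and visualizations of this kind generally rest on aggregation queries sent to Elasticsearch. The sketch below shows one such request body, a terms aggregation that counts events per log level, alongside a small in-memory function that mimics the shape of the buckets such a query returns. The sample documents, the aggregation name `events_per_level`, and the `level.keyword` field (which assumes the default dynamic mapping for string fields) are all illustrative assumptions.

```python
import json
from collections import Counter

# Hypothetical documents, standing in for events already stored in an index.
documents = [
    {"service": "checkout", "level": "error"},
    {"service": "checkout", "level": "info"},
    {"service": "search", "level": "error"},
    {"service": "search", "level": "info"},
    {"service": "search", "level": "info"},
]

# Request body for POST /<index>/_search: return no hits, only per-level counts.
terms_query = {
    "size": 0,
    "aggs": {"events_per_level": {"terms": {"field": "level.keyword"}}},
}


def local_terms_buckets(docs, field):
    """Mimic the bucket shape of a terms aggregation on in-memory data."""
    counts = Counter(doc[field] for doc in docs)
    return [{"key": key, "doc_count": n} for key, n in counts.most_common()]


print(json.dumps(terms_query))
print(local_terms_buckets(documents, "level"))
```

A chart that refreshes as data flows in is, in effect, re-running a query like this on a schedule and redrawing the buckets.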

The scalability of Elasticsearch, coupled with the flexibility of the ELK stack, makes it a powerhouse for organizations seeking to harness the full potential of their data. Whether you're dealing with terabytes of logs or complex metrics, Elasticsearch offers a streamlined and efficient solution for data management and analysis.

In conclusion, Elasticsearch represents a paradigm shift in data storage, offering a dynamic and flexible alternative to traditional relational databases. Its seamless scalability, distributed nature, and integration capabilities make it a valuable asset for organizations looking to optimize their data infrastructure. Embrace the power of Elasticsearch and unlock new possibilities in data management and analysis!


If you're ready to revolutionize your data storage and analysis capabilities, Elasticsearch is your key to success. Dive into the world of distributed, scalable databases and unleash the full potential of your data.