This article is about an ELK (quite the buzzword these days) implementation on AWS.
The ELK stack consists of Elasticsearch, Logstash, and Kibana. Logstash is a tool for log data intake, processing, and output. This includes virtually any type of log that you manage: system logs, webserver logs, error logs, and app logs.
In this post, Logstash will be replaced by AWS CloudWatch and AWS Kinesis Firehose.
Elasticsearch is a NoSQL data store and search engine based on Apache Lucene. It is a popular open-source search and analytics engine, designed to be distributed across multiple nodes so it can work with large datasets. It handles use cases such as log analytics, real-time application monitoring, clickstream analytics, and text search.
In this post, the AWS Elasticsearch Service will be used for the Elasticsearch component.
Kibana is your log-data dashboard. It’s a stylish interface for visualizing logs and other time-stamped data, and it gives you a better grip on your large data stores with point-and-click pie charts, bar graphs, trend lines, maps, and scatter plots.
First Implementation – ELK With CloudTrail/CloudWatch (as Logstash)
We’ll try to list a few easy steps to do so:
- Go to AWS Elasticsearch
- Create ES Domain – amelasticsearchdomain
o Set Access Policy to Allow All/Your ID (a sample policy is sketched after this list)
- Go to AWS CloudTrail Service
- Create CloudTrail - amElasticSearchCloudTrail
o Create S3 Bucket – amelasticsearchbucket (used to hold CloudTrail data)
o Create CloudWatch Logs Group - amElasticSearchCloudWatchGroup
o In order to deliver CloudTrail events to the CloudWatch Logs log group, CloudTrail will assume a role with the two permissions below (a sample policy is also sketched after this list):
§ CreateLogStream: Create a CloudWatch Logs log stream in the CloudWatch Logs log group you specify
§ PutLogEvents: Deliver CloudTrail events to the CloudWatch Logs log stream
- Go and set up CloudWatch
- Select the log group, and then choose the action to stream data to the Elasticsearch domain
o Create New Role - AM_lambda_elasticsearch_execution
o A Lambda function is created automatically - LogsToElasticsearch_amelasticsearchdomain. CloudWatch Logs uses this Lambda function to deliver log data to the Amazon Elasticsearch Service cluster.
- Go to Elasticsearch
o Hit the Kibana link
o On Kibana, configure an index pattern
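For reference, here is a minimal sketch of what an "allow all" resource-based access policy on the ES domain could look like. The account ID 123456789012 and the us-east-1 region are placeholder assumptions; in practice you would narrow the Principal down to your own IAM identity instead of "*".

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "*" },
      "Action": "es:*",
      "Resource": "arn:aws:es:us-east-1:123456789012:domain/amelasticsearchdomain/*"
    }
  ]
}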
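Similarly, a rough sketch of the permissions policy on the role that CloudTrail assumes when writing to the CloudWatch Logs log group; again, the account ID and region are placeholders, and the statement simply grants the two permissions mentioned above on the log group.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:us-east-1:123456789012:log-group:amElasticSearchCloudWatchGroup:*"
    }
  ]
}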
Second Implementation – ELK With AWS Kinesis Firehose/CloudWatch (as Logstash)
We’ll try to list a few easy steps to do so:
- Go to AWS Elasticsearch
- Create ES Domain - amelasticsearchdomain
o Set Access Policy to Allow All/Your ID
- Create Kinesis Firehose Delivery Stream - amelasticsearchkinesisfirehosestream
o Attach it to the ES domain above
o Create Lambda (optional) - amelasticsearchkinesisfirehoselambda
o Create S3 Bucket for backup - amelasticsearchkinesisfirehosebucket
o Create Role - am_kinesisfirehose_delivery_role (a sample trust policy is sketched at the end of this list)
- Create an EC2 instance (to send log data to the Kinesis Firehose configured above)
o This will use the 1995 NASA Apache access log (http://ita.ee.lbl.gov/html/contrib/NASA-HTTP.html) to feed data into Kinesis Firehose.
o The EC2 instance uses the Amazon Kinesis Agent to flow data from its file system into the Firehose stream.
o The Amazon Kinesis Agent is a standalone Java software application that offers an easy way to collect and send data to Amazon Kinesis and to Firehose.
- Steps:
- Launch an EC2 instance (t2.micro) running the Amazon Linux Amazon Machine Image (AMI)
- PuTTY into the instance
- Install the Kinesis Agent - sudo yum install -y aws-kinesis-agent
- Go to directory - /etc/aws-kinesis/
- Open the file - nano agent.json
- Make sure it has this data:
{
  "cloudwatch.emitMetrics": true,
  "firehose.endpoint": "https://firehose.us-east-1.amazonaws.com",
  "flows": [
    {
      "filePattern": "/tmp/mylog.txt",
      "deliveryStream": "amelasticsearchkinesisfirehosestream",
      "initialPosition": "START_OF_FILE"
    }
  ]
}
- Now download the NASA access log file to your local desktop and upload it to S3
- URL - http://ita.ee.lbl.gov/html/contrib/NASA-HTTP.html
- File download - Jul 01 to Jul 31, ASCII format, 20.7 MB gzip compressed
- Unzip and upload this file to any S3 bucket (other than any used above)
- Make sure the file is public
- Again go to the EC2 PuTTY session
- Go to directory - /etc/aws-kinesis/
- Download the file from S3 - wget https://s3-us-west-1.amazonaws.com/arunm/access_log_Jul95
- Concatenate this file to mylog.txt - cat access_log_Jul95 >> /tmp/mylog.txt
- Again go to the EC2 PuTTY session
- Come to root - cd ~
- Go to directory - /var/log/aws-kinesis-agent/
- Monitor the agent’s log at /var/log/aws-kinesis-agent/aws-kinesis-agent.log
- Open the file - nano aws-kinesis-agent.log
- You’ll find log lines like: 2017-03-01 21:46:38.476+0000 ip-10-0-0-55 (Agent.MetricsEmitter RUNNING) com.amazon.kinesis.streaming.agent.Agent [INFO] Agent: Progress: 1891715 records parsed (205242369 bytes), and 1891715 records sent successfully to destinations. Uptime: 630024ms
- Create Kibana (to visualize the data)
o Go to AWS Elasticsearch
o Click on the link to Kibana
o The first thing you need to do is configure an index pattern. Use the index root you set when you created the Firehose stream (in our case, logs*).
o Kibana should recognize the logs indexes and let you set the Time-field name value. Firehose provides two possibilities:
§ @timestamp – the time as recorded in the file
§ @timestamp_utc – available when time zone information is present in the log data
o Choose either one, and you should see a summary of the fields detected.
o Select the Discover tab, and you will see a graph of events by time along with some expandable details for each event.
o As we are using the NASA dataset, we get a message that there are no results. That’s because the data is from way back in 1995.
o Expand the time selector in the top right of the Kibana dashboard and choose an absolute time range. Pick a start of June 30, 1995, and an end of August 1, 1995. You’ll then see the NASA log events plotted across that range.
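For completeness, here is a minimal sketch of the trust policy that lets Kinesis Firehose assume the am_kinesisfirehose_delivery_role created above. The role’s permissions policy (not shown) would additionally need write access to the backup S3 bucket and to the amelasticsearchdomain ES domain.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "firehose.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}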
Hope this helps.
Regards,
Arun Manglick