This article is about an ELK (quite the buzzword these days) implementation on AWS.
The ELK stack consists of Elasticsearch, Logstash, and Kibana. Logstash is a tool for log data intake, processing, and output. This includes virtually any type of log that you manage: system logs, webserver logs, error logs, and app logs.
In this post, Logstash will be replaced by AWS CloudWatch and AWS Kinesis Firehose.
Elasticsearch is a NoSQL data store and search engine based on Apache Lucene. It is a popular open-source search and analytics engine, designed to be distributed across multiple nodes so it can work with large datasets. It handles use cases such as log analytics, real-time application monitoring, clickstream analytics, and text search.
In this post, the AWS Elasticsearch Service will be used for the Elasticsearch component.
Kibana is your log-data dashboard. It’s a stylish interface for visualizing logs and other time-stamped data, and it gives you a better grip on your large data stores with point-and-click pie charts, bar graphs, trend lines, maps, and scatter plots.
First Implementation – ELK With CloudTrail/CloudWatch (as Logstash)
We’ll try to list a few easy steps to do so:
- Go to AWS Elasticsearch
- Create ES Domain – amelasticsearchdomain
o Set Access Policy to Allow All/Your ID (a sample policy is sketched after this list)
- Go to AWS CloudTrail Service
- Create CloudTrail - amElasticSearchCloudTrail
o Create S3 Bucket – amelasticsearchbucket (used to hold CloudTrail data)
o Create CloudWatch Logs Group - amElasticSearchCloudWatchGroup
o In order to deliver CloudTrail events to the CloudWatch Logs log group, CloudTrail will assume a role with the two permissions below (a sample policy is also sketched after this list):
§ CreateLogStream: Create a CloudWatch Logs log stream in the CloudWatch Logs log group you specify
§ PutLogEvents: Deliver CloudTrail events to the CloudWatch Logs log stream
- Go and set up CloudWatch
- Select the log group, and then choose the action to stream data to the Elasticsearch domain
o Create New Role - AM_lambda_elasticsearch_execution
o A Lambda function is created automatically - LogsToElasticsearch_amelasticsearchdomain. CloudWatch Logs uses this Lambda function to deliver log data to the Amazon Elasticsearch Service cluster.
- Go to Elasticsearch
o Hit the Kibana link
o On Kibana, configure an index pattern
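For reference, here is a minimal sketch of what an "allow all" resource-based access policy on the ES domain could look like. The account ID 123456789012 and the us-east-1 region are placeholder assumptions; in practice you would narrow the Principal down to your own IAM identity instead of "*".

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "*" },
      "Action": "es:*",
      "Resource": "arn:aws:es:us-east-1:123456789012:domain/amelasticsearchdomain/*"
    }
  ]
}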
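Similarly, a rough sketch of the permissions policy on the role that CloudTrail assumes when writing to the CloudWatch Logs log group; again, the account ID and region are placeholders, and the statement simply grants the two permissions mentioned above on the log group.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:us-east-1:123456789012:log-group:amElasticSearchCloudWatchGroup:*"
    }
  ]
}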
Second Implementation – ELK With AWS Kinesis Firehose/CloudWatch (as Logstash)
We’ll try to list a few easy steps to do so:
- Go to AWS Elasticsearch
- Create ES Domain - amelasticsearchdomain
o Set Access Policy to Allow All/Your ID
- Create Kinesis Firehose Delivery Stream - amelasticsearchkinesisfirehosestream
o Attach it to the ES domain above
o Create Lambda (optional) - amelasticsearchkinesisfirehoselambda
o Create S3 Bucket for backup - amelasticsearchkinesisfirehosebucket
o Create Role - am_kinesisfirehose_delivery_role (a sample trust policy is sketched at the end of this list)
- Create an EC2 instance (to send log data to the Kinesis Firehose configured above)
o This will use the 1995 NASA Apache access log (http://ita.ee.lbl.gov/html/contrib/NASA-HTTP.html) to feed data into Kinesis Firehose.
o The EC2 instance uses the Amazon Kinesis Agent to flow data from its file system into the Firehose stream.
o The Amazon Kinesis Agent is a standalone Java software application that offers an easy way to collect and send data to Amazon Kinesis and to Firehose.
- Steps:
- Launch an EC2 instance (t2.micro) running the Amazon Linux Amazon Machine Image (AMI)
- PuTTY into the instance
- Install the Kinesis Agent - sudo yum install -y aws-kinesis-agent
- Go to directory - /etc/aws-kinesis/
- Open the file - nano agent.json
- Make sure it has this data:
{
  "cloudwatch.emitMetrics": true,
  "firehose.endpoint": "https://firehose.us-east-1.amazonaws.com",
  "flows": [
    {
      "filePattern": "/tmp/mylog.txt",
      "deliveryStream": "amelasticsearchkinesisfirehosestream",
      "initialPosition": "START_OF_FILE"
    }
  ]
}
- Now download the NASA access log file to your local desktop and upload it to S3
- URL - http://ita.ee.lbl.gov/html/contrib/NASA-HTTP.html
- File download - Jul 01 to Jul 31, ASCII format, 20.7 MB gzip compressed
- Unzip and upload this file to any S3 bucket (other than any used above)
- Make sure the file is public
- Again go to the EC2 PuTTY session
- Go to directory - /etc/aws-kinesis/
- Download the file from S3 - wget https://s3-us-west-1.amazonaws.com/arunm/access_log_Jul95
- Concatenate this file to mylog.txt - cat access_log_Jul95 >> /tmp/mylog.txt
- Again go to the EC2 PuTTY session
- Come to root - cd ~
- Go to directory - /var/log/aws-kinesis-agent/
- Monitor the agent’s log at /var/log/aws-kinesis-agent/aws-kinesis-agent.log
- Open the file - nano aws-kinesis-agent.log
- You’ll find log lines like: 2017-03-01 21:46:38.476+0000 ip-10-0-0-55 (Agent.MetricsEmitter RUNNING) com.amazon.kinesis.streaming.agent.Agent [INFO] Agent: Progress: 1891715 records parsed (205242369 bytes), and 1891715 records sent successfully to destinations. Uptime: 630024ms
- Create Kibana (to visualize the data)
o Go to AWS Elasticsearch
o Click on the link to Kibana
o The first thing you need to do is configure an index pattern. Use the index root you set when you created the Firehose stream (in our case, logs*).
o Kibana should recognize the logs indexes and let you set the Time-field name value. Firehose provides two possibilities:
§ @timestamp – the time as recorded in the file
§ @timestamp_utc – available when time zone information is present in the log data
o Choose either one, and you should see a summary of the fields detected.
o Select the Discover tab, and you will see a graph of events by time along with some expandable details for each event.
o As we are using the NASA dataset, we get a message that there are no results. That’s because the data is from way back in 1995.
o Expand the time selector in the top right of the Kibana dashboard and choose an absolute time range. Pick a start of June 30, 1995, and an end of August 1, 1995. You’ll then see the NASA log events plotted across that range.
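For completeness, here is a minimal sketch of the trust policy that lets Kinesis Firehose assume the am_kinesisfirehose_delivery_role created above. The role’s permissions policy (not shown) would additionally need write access to the backup S3 bucket and to the amelasticsearchdomain ES domain.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "firehose.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}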
Hope this helps.
Regards,
Arun Manglick