Monday, May 1, 2017

ELK/EKK - AWS Implementation

This article is about implementing the ELK stack (quite the buzzword now) on AWS.

The ELK stack consists of Elasticsearch, Logstash, and Kibana.





Logstash is a tool for log data intake, processing, and output. This includes virtually any type of log that you manage: system logs, webserver logs, error logs, and app logs.
Here in this post, Logstash will be replaced by AWS CloudWatch and AWS Kinesis Firehose.

Elasticsearch is a NoSQL datastore based on the Lucene search engine, and a popular open-source search and analytics engine. It is designed to be distributed across multiple nodes, enabling it to work with large datasets. It handles use cases such as log analytics, real-time application monitoring, clickstream analytics, and text search.
Here in this post, the AWS Elasticsearch Service will be used for the Elasticsearch component.

Kibana is your log-data dashboard. It’s a stylish interface for visualizing logs and other time-stamped data.
It gives you a better grip on your large data stores with point-and-click pie charts, bar graphs, trend lines, maps, and scatter plots.

First Implementation – ELK With CloudTrail/CloudWatch (as Logstash)

We’ll list a few easy steps to do so:

-          Go to AWS Elasticsearch
-          Create ES Domain – amelasticsearchdomain
o   Set Access Policy to Allow All/Your Id

-          Go to AWS CloudTrail Service
-          Create Cloud Trail - amElasticSearchCloudTrail
o   Create S3 Bucket – amelasticsearchbucket (Used to hold cloudtrail data)
o   Create CloudWatch Group - amElasticSearchCloudWatchGroup
o   In order to deliver CloudTrail events to the CloudWatch Logs log group, CloudTrail will assume a role with the two permissions below:
§  CreateLogStream: Create a CloudWatch Logs log stream in the CloudWatch Logs log group you specify
§  PutLogEvents: Deliver CloudTrail events to the CloudWatch Logs log stream
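A hedged example of what that role’s permissions policy could look like (the region, account ID, and resource ARN are placeholders for illustration only):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
      "Resource": "arn:aws:logs:us-east-1:123456789012:log-group:amElasticSearchCloudWatchGroup:*"
    }
  ]
}
```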

-          Go & set up CloudWatch
-          Select the log group, then choose the action to stream data to the Elasticsearch domain
o   Create New Role - AM_lambda_elasticsearch_execution
o   Create Lambda (created automatically) - LogsToElasticsearch_amelasticsearchdomain. CloudWatch Logs uses this Lambda function to deliver log data to the Amazon Elasticsearch Service cluster.
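Under the hood, CloudWatch Logs invokes that Lambda with a base64-encoded, gzip-compressed payload. A minimal Python sketch of just the decoding step (the function name and document shape here are my own, not the generated Lambda’s actual code):

```python
import base64
import gzip
import json

def decode_cloudwatch_event(event):
    """Unpack the base64 + gzip payload CloudWatch Logs sends to a subscriber Lambda."""
    raw = base64.b64decode(event["awslogs"]["data"])
    payload = json.loads(gzip.decompress(raw))
    # Each log event would become one document indexed into Elasticsearch.
    return [{"@timestamp": e["timestamp"], "@message": e["message"]}
            for e in payload.get("logEvents", [])]

# Exercise the decoder locally with a synthetic CloudTrail-style event.
sample = {"messageType": "DATA_MESSAGE",
          "logGroup": "amElasticSearchCloudWatchGroup",
          "logEvents": [{"id": "1", "timestamp": 1493596800000,
                         "message": '{"eventName":"CreateBucket"}'}]}
packed = base64.b64encode(gzip.compress(json.dumps(sample).encode())).decode()
docs = decode_cloudwatch_event({"awslogs": {"data": packed}})
print(docs)  # [{'@timestamp': 1493596800000, '@message': '{"eventName":"CreateBucket"}'}]
```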

-          Go to Elasticsearch
o   Hit the Kibana link
o   In Kibana - configure an index pattern


Second Implementation – ELK With AWS Kinesis Firehose/CloudWatch (as Logstash)

We’ll list a few easy steps to do so:

-          Go to AWS Elasticsearch
-          Create ES Domain - amelasticsearchdomain
o   Set Access Policy to Allow All/Your Id
      
-          Create Kinesis Firehose Delivery Stream - amelasticsearchkinesisfirehosestream
o   Attach it to above ES Domain
o   Create Lambda (Optional)  - amelasticsearchkinesisfirehoselambda
o   Create S3 Bucket for Backup - amelasticsearchkinesisfirehosebucket
o   Create Role - am_kinesisfirehose_delivery_role
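If you do create the optional Lambda, Firehose hands it batches of base64-encoded records and expects each back with a recordId, a result, and the (re-encoded) data. A hedged Python sketch of such a transform, assuming the NASA logs’ Apache Common Log Format (the handler and output field names are illustrative, not the actual function):

```python
import base64
import json
import re

# The NASA access logs are in Apache Common Log Format.
CLF = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "([A-Z]+) (\S+)[^"]*" (\d{3}) (\S+)')

def handler(event, context=None):
    """Illustrative Firehose transform: one CLF line in, one JSON document out."""
    out = []
    for rec in event["records"]:
        line = base64.b64decode(rec["data"]).decode("utf-8", "replace")
        m = CLF.match(line)
        if not m:
            # Records marked ProcessingFailed land in the S3 backup bucket.
            out.append({"recordId": rec["recordId"],
                        "result": "ProcessingFailed", "data": rec["data"]})
            continue
        doc = {"host": m.group(1), "@timestamp": m.group(2),
               "request": m.group(3) + " " + m.group(4),
               "response": int(m.group(5)), "bytes": m.group(6)}
        out.append({"recordId": rec["recordId"], "result": "Ok",
                    "data": base64.b64encode((json.dumps(doc) + "\n").encode()).decode()})
    return {"records": out}

# A real line from the July 1995 NASA dataset:
line = '199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245'
result = handler({"records": [{"recordId": "1",
                               "data": base64.b64encode(line.encode()).decode()}]})
print(result["records"][0]["result"])  # Ok
```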

-          Create EC2 System - (to send log data to the Kinesis Firehose configured above)
o   This will use the 1995 NASA Apache access log (http://ita.ee.lbl.gov/html/contrib/NASA-HTTP.html) to feed Kinesis Firehose.
o   The EC2 instance uses the Amazon Kinesis Agent to flow data from its file system into the Firehose stream.
o   The Amazon Kinesis Agent is a standalone Java application that offers an easy way to collect and send data to Kinesis Streams and Firehose.
               
- Steps:
       - Launch an EC2 Instance (t2.micro) running the Amazon Linux Amazon Machine Image (AMI)
       - PuTTY into the instance
       - Install Kinesis Agent - sudo yum install -y aws-kinesis-agent
       - Go to directory - /etc/aws-kinesis/
       - Open file - nano agent.json
       - Make sure it has this data:
                       {
                         "cloudwatch.emitMetrics": true,
                         "firehose.endpoint": "https://firehose.us-east-1.amazonaws.com",

                         "flows": [
                                       {
                                         "filePattern": "/tmp/mylog.txt",
                                         "deliveryStream": "amelasticsearchkinesisfirehosestream",
                                         "initialPosition": "START_OF_FILE"
                                       }
                         ]
                       }
       - Now download the NASA access log file to your local desktop and upload it to S3
                        - URL - http://ita.ee.lbl.gov/html/contrib/NASA-HTTP.html
                        - File download - Jul 01 to Jul 31, ASCII format, 20.7 MB gzip compressed
                        - Unzip and upload this file to any S3 bucket (other than any used above)
                        - Make sure the file is public
                      
       - Go back to the EC2 PuTTY session
                        - Go to directory - /etc/aws-kinesis/
                        - Download the file from S3 - wget https://s3-us-west-1.amazonaws.com/arunm/access_log_Jul95
                        - Append this file to mylog.txt - cat access_log_Jul95 >> /tmp/mylog.txt
                      
       -  Again go to the EC2 PuTTY session
                        - Go to the home directory - cd ~
                        - Go to directory - /var/log/aws-kinesis-agent/
                        - Monitor the agent’s log at /var/log/aws-kinesis-agent/aws-kinesis-agent.log
                        - Open file - nano aws-kinesis-agent.log
                        - You’ll find log lines like: 2017-03-01 21:46:38.476+0000 ip-10-0-0-55 (Agent.MetricsEmitter RUNNING) com.amazon.kinesis.streaming.agent.Agent [INFO] Agent: Progress: 1891715 records parsed (205242369 bytes), and 1891715 records sent successfully to destinations. Uptime: 630024ms
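A quick way to confirm delivery is keeping up is to compare the "parsed" and "sent" counts in that Progress line; for example, in Python (using the log line above):

```python
import re

progress = ("2017-03-01 21:46:38.476+0000 ip-10-0-0-55 (Agent.MetricsEmitter RUNNING) "
            "com.amazon.kinesis.streaming.agent.Agent [INFO] Agent: Progress: "
            "1891715 records parsed (205242369 bytes), and 1891715 records sent "
            "successfully to destinations. Uptime: 630024ms")

# Pull the three counters out of the agent's Progress line.
m = re.search(r"(\d+) records parsed \((\d+) bytes\), and (\d+) records sent", progress)
parsed, nbytes, sent = (int(g) for g in m.groups())
print(parsed, sent, parsed == sent)  # 1891715 1891715 True
```

If parsed stays well ahead of sent for long stretches, records are backing up in the agent’s retry queue.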

-          Set up Kibana (to visualize data)
o   Go to AWS Elasticsearch
o   Click on link to Kibana
o   The first thing you need to do is configure an index pattern. Use the index root you set when you created the Firehose stream (in our case, logs*).
o   Kibana should recognize the logs indexes and let you set the Time-field name value. Firehose provides two possibilities:
§  @timestamp – the time as recorded in the file
§  @timestamp_utc – available when time zone information is present in the log data
o   Choose either one, and you should see a summary of the fields detected.
o   Select the Discover tab, and you’ll see a graph of events over time along with expandable details for each event.
o   As we are using the NASA dataset, we get a message that there are no results. That’s because the data is way back in 1995.
o   Expand the time selector in the top right of the Kibana dashboard and choose an absolute time. Pick a start of June 30, 1995, and an end of August 1, 1995. You’ll see something like this.



Hope this helps.

Regards,
Arun Manglick


Tuesday, March 7, 2017

AWS - Key Components Limits

Port:

·         SSH: 22
·         RDP: 3389
·         HTTP: 80
·         HTTPS: 443
·         MySQL: 3306
·         Redshift: 5439
·         MS SQL: 1433
ELB
·         Supported Protocols: HTTP/HTTPS/TCP/SSL

·         Supported Ports: 25, 80, 443, 1024 - 65535


Compute

EC2


·         Max Number of Tags – 10 Per EC2 Instance
·         How many regions are there on the AWS platform currently - 11
·         Total Regions Supported for EC2 - 9
·         EC2 Instance Limits (Per Region)
·         On-Demand : 20
·         Reserved: 20
·         SPOT – No Limit
·         Number of EBS volumes - 5000

Auto Scaling

·         Default Cooling Period – 5 mins
·         Health Check Grace Period – 300Secs (5 Mins) Default

Elastic Beanstalk

·         Default Limit
·         Applications: 75
·         Application Version: 1000
·         Environments: 200

 

Storage

S3


·         Per AWS A/c S3 Buckets – 100 (Call AWS to increase limit)
·         File Size: 1 Byte – 5 TB
·         Object/File Size in Single PUT – 5 GB
·         Multipart Upload – Greater than 100 MB

Glacier

·         1000 Vaults – Per A/c Per Region
·         No Max Limit to the total amount of data.
·         Individual Archives Limit: 1 Byte – 40 TB
·         Object/File Size in Single PUT – 4 GB

Storage Gateway

·         Gateway Stored Volume – 16 TB, 32 Volumes: 512 TB
·         Gateway Cached Volume – 32 TB, 32 Volumes: 1 PB
·         Virtual Tape  – 1 PB (1500 Virtual Tapes) (Takes 24 Hours for retrieval)

Import/Export

·         Max Device Capacity – 16 TB
·         Snowball - Max Device Capacity – 50 TB

CloudFront: (Can be Writable)

·         1000 – Request Per Second
·         Max File Size that can be delivered thru CloudFront – 20 GB
·         TTL – 24 Hrs (86400 secs)
·         Cannot be - RDS, Glacier
·         Can be – S3, EC2, ELB, Route53





Database


RDS:

·         Limit – 40 RDS Instances
·         Max DB on Single SQL-Server Instance – 30
·         Max DB on Single Oracle Instance – 01
·         RDS Backup Retention Period – 1 - 35 Days
·         Read-Replicas – 05
·         MySQL DB Size – 6 TB
·         Maximum RDS Volume size using RDS PIOPS storage with MySQL & Oracle DB Engine  - 6 TB
·         Maximum PIOPS capacity on an MySQL and Oracle RDS instance is 30,000 IOPS (Default)
·         Maximum size for a Microsoft SQL Server DB Instance with SQL Server Express edition – 10 GB (SA Mega Quiz #20)

Dynamo DB

·         Storage – No Limit
·         Single Item Size (Row Size): 1 – 400 KB
·         Local/Global Secondary Index – 05 per Table
·         Streams – Stored for 24 Hours only.
·         Maximum Write Throughput – Can go beyond 10,000 capacity units, but contact AWS first.
·         Projected Non-Key Attributes – 20 Per Table
·         LSIs - Limit the total size of all elements (tables and indexes) to 10 GB per partition key value. (GSI does not have any such limitations)
·         Tags: 50 Tags Per DynamoDB Table
·         Triggers for a Table - Unlimited

RedShift

·         Block Size – 1024 KB
·         Maintain 3 copies
·         Compute Node: 1 – 128
·         Backup Retention Period – 1 Day (Max)


Aurora

·         Maintain 6 copies in 3 AZs

ElastiCache

·         Reserved Cache Nodes – 20




Networking


ELB:

·         Allowed Load balancer : 20
·         Port Supported: HTTP, HTTPS, SSL, TCP
·         Acceptable ports for both the HTTPS/SSL and HTTP/TCP connections are 25, 80, 443, and 1024-65535

Per Region

·         05 - VPCs
·         05 - EIP
·         05  – Virtual Private Gateway
·         50 - VPN Connections
·         50 – Customer Gateway

Per VPC

·         01 - Internet Gateway
·         01 – IP Address Range
·         200 – Subnets
·         20 - EC2 Instances (Default)

Per Subnet

·         01 – AZ
·         01 - ACL

Notes:

·         An instance retains its private IP address, which persists across stops and starts
·         You can assign multiple IP addresses to your instances
·         An EIP is associated with your AWS account, not a particular instance. It remains associated with your account until you explicitly release it.
·         Subnet Cannot Span Multi-AZ
·         Security Group Can Span Multi-AZ
·         N/W ACL Can Span Multi-AZ
·         Route Table Can Span Multi-AZ



Route53

·         Number of domains you can manage using Route 53 is 50  (however it is a soft limit and can be raised by contacting AWS support)

Management


Cloud Watch

·         Logs – Unlimited/ Indefinitely
·         Alarms – 2 Weeks (14 Days)
·         Metrics – 2 Weeks (14 Days)
·         To retain more, use the GetMetricStatistics API or third-party tools
·         EC2 Metrics Monitoring
·         Standard – 5 Mins
·         Detailed – 1 Min (Paid)

·         Custom Metrics Monitoring – Minimum 1 Min


Cloud Formation

·         Templates – No Limits
·         Stack – 200 Per A/c
·         Parameters – 60 per Template
·         Output – 60 per Template
·         Description Field Size – 4096 Characters

Cloud Trail

·         5 Trails – Per Region
·         Deliver Log Files – Every 5 mins
·         Capture API Activity – Last 7 Days


OpsWorks

·         40 Stacks
·         40 Layers per stack
·         40 Instances per stack
·         40 - Apps per stack

Security


·         Roles: 250 Per AWS Account
·         KMS
·         Master Keys – 1000 Per AWS A/c
·         Data Key – No Limit
·         Resource Base Permission:
·         S3, Glacier, EBS
·         SNS, SQS
·         Elastic Beanstalk, Cloud Trail

Analytics


EMR:
·         EC2 Instances Across All clusters - 20

  

Application


SQS:

·         Visibility Timeout – 30 secs (default); value must be between 0 seconds and 12 hours
·         Retention Period – 4 days (default); value must be between 1 minute and 2 weeks (14 days)
·         Max Long Polling Timeout – 20 secs; value must be between 0 and 20 seconds
·         Message Size – 256 KB max; value must be between 1 and 256 KB
·         Number of Queues – Unlimited
·         Number of messages per queue – Unlimited
·         Queue name: 80 characters

SES:

·         SES Email Size – 10 MB (including Attachments)
·         SES Recipients – 50 for every message
·         Sending Limits:

SNS:

·         Topics – 100,000 (Per A/c)
·         Subscription – 10 Million Per Topic (Per A/c)
·         TTL – 04 Weeks

SWF:

·         Retention Period – 1 Year
·         Max workflow execution – 1 Year
·         History of Execution – 90 Days Max
·         Max  Workflow and Activity Types – 10,000
·         Max Amazon SWF domains – 100
·         Max open executions in a domain – 100,000

Elastic Transcoder

·         Jobs – 10k Per Pipeline


Hope this helps.
Keep Blogging!!!

http://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html

Regards,
Arun Manglick