Log aggregation with CloudWatch Logs

Log aggregation is now easier than ever to set up thanks to CloudWatch Logs. If you aren't familiar with log management, check out this article for a brief introduction.

I'm a fan of CloudWatch Logs for several reasons:

This article highlights another cool feature: you can tail your logs in near real time too! So if you're a fan of tail -f /path/to/mylog.log, this article is for you.

Create Policies for CloudWatch Logs

If you're running this from EC2, you can add the following policy to the IAM Role associated with the instance:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents",
        "logs:DescribeLogStreams"
      ],
      "Resource": [
        "arn:aws:logs:*:*:*"
      ]
    }
  ]
}

Install CloudWatch Logs Agent

You can install the agent like this (I'm using Ubuntu 14.04 on EC2):

wget https://s3.amazonaws.com/aws-cloudwatch/downloads/latest/awslogs-agent-setup.py
sudo python ./awslogs-agent-setup.py --region us-east-1

The installer will configure log monitoring for syslog by default. I find it easier to edit the configuration file manually rather than through the tool. For our example, let's say that we want to send the following log files to CloudWatch:

  • /var/log/nginx/error.log
  • /var/log/nginx/access.log

Just add the following to the bottom of /var/awslogs/etc/awslogs.conf (replace APP_ID with a more meaningful name, like acme-api or acme-web):

[/var/log/nginx/error.log]
datetime_format = %Y-%m-%d %H:%M:%S
file = /var/log/nginx/error.log
buffer_duration = 5000
log_stream_name = APP_ID {instance_id}
initial_position = end_of_file
log_group_name = /var/log/nginx/error.log

[/var/log/nginx/access.log]
datetime_format = %Y-%m-%d %H:%M:%S
file = /var/log/nginx/access.log
buffer_duration = 5000
log_stream_name = APP_ID {instance_id}
initial_position = end_of_file
log_group_name = /var/log/nginx/access.log

Then restart the logs service by running sudo service awslogs restart.
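
If you have more than a couple of log files, typing these stanzas by hand gets tedious. Here's a minimal Python sketch that generates them with the standard library's configparser. The function name build_awslogs_stanzas is my own invention for illustration, and the keys simply mirror the stanzas above; it's a convenience sketch, not part of the agent.

```python
import configparser
import io


def build_awslogs_stanzas(app_id, log_files):
    """Return awslogs.conf stanzas, one per log file, as a string."""
    # interpolation=None keeps the literal % signs in datetime_format intact.
    config = configparser.ConfigParser(interpolation=None)
    for path in log_files:
        config[path] = {
            "datetime_format": "%Y-%m-%d %H:%M:%S",
            "file": path,
            "buffer_duration": "5000",
            # {instance_id} is left as-is here; the agent expands it, not Python.
            "log_stream_name": app_id + " {instance_id}",
            "initial_position": "end_of_file",
            "log_group_name": path,
        }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(build_awslogs_stanzas("acme-web", [
        "/var/log/nginx/error.log",
        "/var/log/nginx/access.log",
    ]))
```

You could append the printed output to /var/awslogs/etc/awslogs.conf and then restart the service as described above.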

Get your log data

To get log events you can use the AWS CLI like this:

$ aws logs --profile=my_profile get-log-events --log-group-name /var/log/nginx/error.log --log-stream-name "acme-web i-07fdf4d3"
{
    "nextForwardToken": "f/32258355170809589614344124816922785120512659126568615936",
    "events": [
        {
            "ingestionTime": 1446513625731,
            "timestamp": 1446513629293,
            "message": "App 22007 stdout: { specialtyData: "
        },
...
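
The timestamp and ingestionTime values in that output are milliseconds since the Unix epoch, which isn't very readable. Here's a small Python sketch that turns the JSON into one line per event with a UTC timestamp. The function name format_events is hypothetical, and it assumes the JSON has the shape shown above.

```python
import json
from datetime import datetime, timezone


def format_events(response_json):
    """Turn get-log-events JSON into '<UTC time> <message>' lines."""
    response = json.loads(response_json)
    lines = []
    for event in response["events"]:
        # Split the millisecond timestamp with integer math to avoid float rounding.
        seconds, millis = divmod(event["timestamp"], 1000)
        when = datetime.fromtimestamp(seconds, tz=timezone.utc)
        when = when.replace(microsecond=millis * 1000)
        stamp = when.isoformat(timespec="milliseconds")
        lines.append(f"{stamp} {event['message']}")
    return lines


if __name__ == "__main__":
    sample = json.dumps({
        "events": [
            {"ingestionTime": 1446513625731,
             "timestamp": 1446513629293,
             "message": "App 22007 stdout: { specialtyData: "}
        ]
    })
    for line in format_events(sample):
        print(line)
```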

I prefer to use the human-friendly awslogs tool (pip install awslogs) so that I can tail logs like this:

$ AWS_PROFILE=my_profile awslogs get /var/log/nginx/error.log "acme-web*" --watch
/var/log/nginx/error.log acme-web i-07fdf4d3 App 12432 stdout:
/var/log/nginx/error.log acme-web i-07fdf4d3 App 12432 stdout: Node Environment is: production
/var/log/nginx/error.log acme-web i-07fdf4d3 App 12432 stdout: Creating httpServer
/var/log/nginx/error.log acme-web i-07fdf4d3 App 12432 stdout: listening on 80
...

NOTE: Ensure that you're using version 0.1.2 or higher of this tool. There was an issue with --watch that caused the command to hang.

This just scratches the surface of what you can do. Happy logging!

S3 Object Expiration

Situation: You're putting a bunch of files into an S3 bucket and don't want the bucket to grow uncontrollably. Maybe your build process is generating a bunch of zip files that are being used with CodeDeploy. If you're not careful, you're gonna have a lot of files sitting around.

You could fix this with a cron job that cleans up the S3 bucket, but there's an even easier fix. Use object expiration! This isn't a new feature (it was announced back in 2011), but it might be something that you haven't set up.

Getting Started

Full documentation is located here. You can set this up using either the console or the AWS CLI. I'll be using the s3api put-bucket-lifecycle command of the CLI.

Example 1: Delete Objects After 5 days

For lifecycle management, the word "expire" really means delete. A policy that deletes objects 5 days after they are created could be placed in lifecycle.json:

{
  "Rules": [
    {
      "ID": "Delete objects older than 5 days",
      "Prefix": "",
      "Status": "Enabled",
      "Expiration": {
        "Days": 5
      }
    }
  ]
}

You could then apply it to the bucket like this:

aws --profile=<NOT NECESSARY IF YOU WANT THE DEFAULT> --region=<AWS_REGION> s3api put-bucket-lifecycle --bucket <BUCKET_NAME> --lifecycle-configuration file://lifecycle.json

Example 2: Move objects to Glacier after 5 days, store indefinitely

If you don't specify an expiration, the object will stay stored indefinitely. All you do is specify that you want to move objects to GLACIER after 5 days. The policy would look like this:

{
    "Rules": [
        {
            "Status": "Enabled",
            "Prefix": "",
            "Transition": {
                "Days": 5,
                "StorageClass": "GLACIER"
            },
            "ID": "Glacier after 5 days, store indefinitely"
        }
    ]
}

You could then apply it to the bucket like this:

aws --profile=<NOT NECESSARY IF YOU WANT THE DEFAULT> --region=<AWS_REGION> s3api put-bucket-lifecycle --bucket <BUCKET_NAME> --lifecycle-configuration file://lifecycle.json

Example 3: Move Objects to Glacier after 5 days, delete after 30 days

You can also combine two actions in a single rule. The rule first moves the object to GLACIER after 5 days, and then expires (deletes) it from GLACIER 30 days after it was created. The policy looks like this:

{
    "Rules": [
        {
            "Status": "Enabled",
            "Prefix": "",
            "Transition": {
                "Days": 5,
                "StorageClass": "GLACIER"
            },
            "Expiration": {
                "Days": 30
            },
            "ID": "Glacier after 5 days, expire after 30 days"
        }
    ]
}

You could then apply it to the bucket like this:

aws --profile=<NOT NECESSARY IF YOU WANT THE DEFAULT> --region=<AWS_REGION> s3api put-bucket-lifecycle --bucket <BUCKET_NAME> --lifecycle-configuration file://lifecycle.json
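
All three examples share the same shape, so the JSON can be generated rather than hand-edited. Here's a minimal Python sketch; build_lifecycle and its parameter names are hypothetical, and the check that expiration comes after the transition is my own sanity check, not an official validator.

```python
import json


def build_lifecycle(rule_id, prefix="", glacier_after_days=None, expire_after_days=None):
    """Build a single-rule lifecycle document like the examples above."""
    if glacier_after_days is None and expire_after_days is None:
        raise ValueError("a rule needs a transition, an expiration, or both")
    if (glacier_after_days is not None and expire_after_days is not None
            and expire_after_days <= glacier_after_days):
        raise ValueError("expiration must come after the Glacier transition")

    rule = {"ID": rule_id, "Prefix": prefix, "Status": "Enabled"}
    if glacier_after_days is not None:
        rule["Transition"] = {"Days": glacier_after_days, "StorageClass": "GLACIER"}
    if expire_after_days is not None:
        rule["Expiration"] = {"Days": expire_after_days}
    return {"Rules": [rule]}


if __name__ == "__main__":
    config = build_lifecycle("Glacier after 5 days, expire after 30 days",
                             glacier_after_days=5, expire_after_days=30)
    # Redirect this output into lifecycle.json to use it with the CLI.
    print(json.dumps(config, indent=2))
```

Passing only expire_after_days=5 gives you Example 1, and passing only glacier_after_days=5 gives you Example 2.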

What policies are defined for a bucket?

You can see what policies are in effect for a bucket by running this:

$ aws --profile=<NOT NECESSARY IF YOU WANT THE DEFAULT> s3api get-bucket-lifecycle --bucket <BUCKET_NAME> --region=<AWS_REGION>
{
    "Rules": [
        {
            "Status": "Enabled",
            "Prefix": null,
            "Expiration": {
                "Days": 5
            },
            "ID": "Delete objects older than 5 days"
        }
    ]
}
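
If a bucket has several rules, the raw JSON gets hard to scan. Here's a small Python sketch that boils the output down to one line per rule. The function name describe_rules is hypothetical, and it only assumes the Transition and Expiration keys used in the examples in this article.

```python
import json


def describe_rules(lifecycle_json):
    """Summarize get-bucket-lifecycle output as one line per rule."""
    summaries = []
    for rule in json.loads(lifecycle_json)["Rules"]:
        actions = []
        if "Transition" in rule:
            transition = rule["Transition"]
            actions.append(
                f"move to {transition['StorageClass']} after {transition['Days']} days")
        if "Expiration" in rule:
            actions.append(f"delete after {rule['Expiration']['Days']} days")
        # An empty or null prefix means the rule applies to every object.
        prefix = rule.get("Prefix")
        scope = f"prefix {prefix!r}" if prefix else "all objects"
        summaries.append(
            f"[{rule['Status']}] {rule['ID']} ({scope}): {', '.join(actions)}")
    return summaries


if __name__ == "__main__":
    output = """{"Rules": [{"Status": "Enabled", "Prefix": null,
                 "Expiration": {"Days": 5},
                 "ID": "Delete objects older than 5 days"}]}"""
    for line in describe_rules(output):
        print(line)
```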

That's it. Once the policy is in place you don't have any additional work to do. Happy expiration!

Mock APIs with API Blueprint, Dredd, api-mocks, and rails

Here's a story we're all too familiar with: you're developing an API and webapp at the same time. API first is all the rage these days. At some point frontend developers will need to connect to a backend API. They can mock out API behavior for a little while, but may wind up in trouble if the final API is very different from what was expected. You don't want to rush API development, but you need to share something with the people developing the frontend...sooner rather than later.

Now this may sound obvious, but the thing about coding is this: you don't just start writing code. Instead, you spend a good chunk of time figuring out what to build. The better idea you have of what needs to be built, the faster you'll be able to build it once you get started. You're also less likely to build features that aren't needed or introduce unnecessary code (less is more!). Just like the frontend, the design phase is equally important for the API.

Rather than starting to code the API, wouldn't it be nice if you could:

  • Design the API using a modeling language that is programming language agnostic.
  • Provide frontend developers with a mocked API that matches the specification above, with no additional development effort.
  • Implement the actual API such that it behaves according to the same specification as the mocked API.

Thankfully there are several such API modeling languages out there that make it relatively straightforward to accomplish the goals above. The main competitors are:

Looking for some comparisons? Check out the following articles:

The general consensus seems to be that there's no clear winner in the API modeling language game. Plus, there are tools to convert between modeling languages, so migration is always an option should one of the modeling languages become deprecated down the road.

For my project, I chose API Blueprint for the following reasons:

  • Easy to understand (markdown is pretty easy to read)
  • Easy to write (markdown)
  • Tooling seems decent (dredd, api-mock, and many more)

Ok, so how do I get started with all of this?

Start by installing dependencies

Please note that I'm working on a Mac and using Homebrew so my notes are for that kind of setup.

node (using v0.10.40)

dredd and api-mock are node tools so you'll need node.

brew install nodejs

dredd (v1.0.1) / api-mock (v0.2.2)

Use npm to install dredd and api-mock:

npm install -g dredd api-mock

ruby2.2 / rails4.2

This is documented here. If you're on Yosemite and experiencing any OpenSSL issues, take a look at this article. Once ruby is installed you can install rails and bundler like this:

gem install --no-ri --no-rdoc rails bundler

Great, now we can get started.

Time to create the dummy API

We want our API to have a single endpoint located at /dummy. It will respond to a GET request with the following plaintext: hello world. If we're using curl, it would behave like this:

$ curl -v -X GET localhost:3000/dummy
*   Trying ::1...
* connect to ::1 port 3000 failed: Connection refused
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 3000 (#0)
> GET /dummy HTTP/1.1
> Host: localhost:3000
> User-Agent: curl/7.43.0
> Accept: */*
>
< HTTP/1.1 200 OK
< X-Powered-By: Express
< Content-Type: text/plain
< Content-Length: 17
< Date: Mon, 23 Nov 2015 06:00:07 GMT
< Connection: keep-alive
<
hello world
* Connection #0 to host localhost left intact

API Blueprint

The associated specification (a.k.a. "spec") file describing this API would look like this:

dummyapi.apib

# My API
## GET /dummy
+ Response 200 (text/plain)

        hello world

Mock API Server

To run this mock API, run this:

$ api-mock ./dummyapi.apib --port 3000
info:    Enabled Cross-Origin-Resource-Sharing (CORS)
info:     Allow-Origin: *
info:     Allow-Methods: GET, PUT, POST, PATCH, DELETE, TRACE, OPTIONS
info:     Allow-Headers: Origin, X-Requested-With, Content-Type, Accept, Authorization, Referer, Prefer
info:    Listening on port 3000

You now have a dummy mocked API (on http://localhost:3000). Sweet!

$ curl -v -X GET localhost:3000/dummy
*   Trying ::1...
* connect to ::1 port 3000 failed: Connection refused
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 3000 (#0)
> GET /dummy HTTP/1.1
> Host: localhost:3000
> User-Agent: curl/7.43.0
> Accept: */*
>
< HTTP/1.1 200 OK
< X-Powered-By: Express
< Content-Type: text/plain
< Content-Length: 17
< Date: Mon, 23 Nov 2015 06:00:07 GMT
< Connection: keep-alive
<
hello world
* Connection #0 to host localhost left intact
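
Frontend code can talk to the mock exactly as it would talk to the real API. Here's a rough Python sketch of such a client check. fetch_dummy is a hypothetical helper, and DummyHandler is only a stand-in server I added so the example runs without api-mock installed; against the real mock you'd call fetch_dummy("http://localhost:3000").

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DummyHandler(BaseHTTPRequestHandler):
    """Stand-in for the mock: answers GET /dummy with plain text."""

    def do_GET(self):
        if self.path != "/dummy":
            self.send_error(404)
            return
        body = b"hello world"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch_dummy(base_url):
    """Return (status, content type, body) for GET <base_url>/dummy."""
    with urllib.request.urlopen(base_url + "/dummy") as response:
        return (response.status,
                response.headers.get_content_type(),
                response.read().decode("utf-8"))


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), DummyHandler)  # port 0 picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print(fetch_dummy(f"http://127.0.0.1:{server.server_port}"))
    server.shutdown()
    server.server_close()
```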

Implementation Time

All is fine and dandy, but what do you do when you want to start implementing this API in some language like Ruby, Python, Go, etc.? That's where dredd comes in. The tagline for dredd is:

> Dredd is a language agnostic command-line tool for testing API documentation written in the API Blueprint format against its backend implementation.

This means that:

  • Your API documentation can always be up to date (use a CI system like Travis or Jenkins to accomplish this).
  • You can verify that the API you build conforms to the same standards as the mocked API.

Configure dredd

dredd has an interactive configuration mode so you can get started quickly. For our example, run dredd init to get started:

$ dredd init
? Location of the API blueprint: apiary.apib
? Command to start API backend server e.g. (bundle exec rails server) bundle exec rails s
? URL of tested API endpoint: http://localhost:3000
? Programming language of hooks: ruby
? Do you want to use Apiary test inspector? No
? Found CircleCI configuration, do you want to add Dredd to the build? No

Configuration saved to dredd.yml

Install hooks handler and run Dredd test with:

  $ gem install dredd_hooks
  $ dredd

This results in a dredd.yml file that looks like this:

dry-run: null
hookfiles: ./hooks.rb
language: ruby
sandbox: false
server: bundle exec rails s
server-wait: 3
init: false
custom: {}
names: false
only: []
reporter: []
output: []
header: []
sorted: false
user: null
inline-errors: false
details: false
method: []
color: true
level: info
timestamp: false
silent: false
path: []
blueprint: apiary.apib
endpoint: 'http://localhost:3000'

NOTE: Ignore the hookfiles part right now. We'll get to that soon.

Implement the dummy API

Before creating the implementation, let's see what happens if we just try running this by itself:

$ dredd
Configuration dredd.yml found, ignoring other arguments.
Starting server with command: bundle exec rails s
Waiting 3 seconds for server command to start...
Could not locate Gemfile or .bundle/ directory
info: Beginning Dredd testing...
error: GET /dummy duration: 4ms
error: Error connecting to server under test!
info: Displaying failed tests...
fail: GET /dummy duration: 4ms
fail: Error connecting to server under test!
complete: 0 passing, 0 failing, 1 errors, 0 skipped, 1 total
complete: Tests took 9ms

Dredd attempted to start up our API...and it didn't exist. Can't test something that doesn't exist! I'll quickly set up a Rails app.

$  rails new dummyapi
      create
      create  README.rdoc
      ...
Bundle complete! 12 Gemfile dependencies, 53 gems now installed.

And run dredd again:

$ dredd
Configuration dredd.yml found, ignoring other arguments.
Starting server with command: bundle exec rails s
Waiting 3 seconds for server command to start...
[2015-11-22 23:01:20] INFO  WEBrick 1.3.1
[2015-11-22 23:01:20] INFO  ruby 2.2.2 (2015-04-13) [x86_64-darwin14]
[2015-11-22 23:01:20] INFO  WEBrick::HTTPServer#start: pid=52324 port=3000
info: Beginning Dredd testing...
fail: GET /dummy duration: 447ms
info: Displaying failed tests...
fail: GET /dummy duration: 447ms
fail: headers: Header 'content-type' has value 'text/html; charset=utf-8' instead of 'text/plain; charset=utf-8'
body: Real and expected data does not match.
statusCode: Status code is not '200'

request:
body:

headers:
    User-Agent: Dredd/1.0.1 (Darwin 15.0.0; x64)

uri: /dummy
method: GET


expected:
headers:
    Content-Type: text/plain; charset=utf-8

body:
hello world

statusCode: 200


actual:
statusCode: 404
headers:
    content-type: text/html; charset=utf-8
    content-length: 37322
    x-web-console-session-id: 5266580f4d873b0f4c7f9fd7a0bd0e6d
    x-request-id: 10e10910-d808-4b8c-a0e7-11bbd8c851db
    x-runtime: 0.283871
    server: WEBrick/1.3.1 (Ruby/2.2.2/2015-04-13)
    date: Mon, 23 Nov 2015 07:01:22 GMT
    connection: Keep-Alive

body:
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8" />
  <title>Action Controller: Exception caught</title>
...
complete: 0 passing, 1 failing, 0 errors, 0 skipped, 1 total
complete: Tests took 451ms

We now see 1 failing test (/dummy), with the following errors:

  • fail: headers: Header 'content-type' has value 'text/html; charset=utf-8' instead of 'text/plain; charset=utf-8'
  • body: Real and expected data does not match.
  • statusCode: Status code is not '200'

I haven't set any endpoints up yet, so these errors are letting me know what's missing.
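
To make those three checks concrete, here's a simplified Python sketch of the kind of comparison being reported: status code, expected headers, and body. This is not Dredd's actual implementation; the validate function and the dictionary layout are assumptions of mine, and the messages just imitate the output above.

```python
def validate(expected, actual):
    """Return a list of mismatch messages; an empty list means a pass."""
    errors = []
    # Header names are case-insensitive, so compare them lowercased.
    actual_headers = {k.lower(): v for k, v in actual["headers"].items()}
    for name, value in expected["headers"].items():
        got = actual_headers.get(name.lower())
        if got != value:
            errors.append(
                f"headers: Header '{name.lower()}' has value '{got}' instead of '{value}'")
    if actual["body"] != expected["body"]:
        errors.append("body: Real and expected data does not match.")
    if actual["status"] != expected["status"]:
        errors.append(f"statusCode: Status code is not '{expected['status']}'")
    return errors


if __name__ == "__main__":
    expected = {
        "status": 200,
        "headers": {"Content-Type": "text/plain; charset=utf-8"},
        "body": "hello world",
    }
    actual = {
        "status": 404,
        "headers": {"content-type": "text/html; charset=utf-8"},
        "body": "<!DOCTYPE html>",
    }
    for message in validate(expected, actual):
        print("fail:", message)
```

Only the headers named in the expected response are checked, which is why extra headers from the server don't cause failures in this sketch.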

config/routes.rb

Rails.application.routes.draw do
  get 'dummy', to: 'api#dummy'
  ...
end

app/controllers/api_controller.rb

class ApiController < ApplicationController

  def dummy
    render plain: "hello world"
  end

end

Now to test again...

$ dredd
Configuration dredd.yml found, ignoring other arguments.
Starting server with command: bundle exec rails s
Waiting 3 seconds for server command to start...
[2015-11-22 23:08:44] INFO  WEBrick 1.3.1
[2015-11-22 23:08:44] INFO  ruby 2.2.2 (2015-04-13) [x86_64-darwin14]
[2015-11-22 23:08:44] INFO  WEBrick::HTTPServer#start: pid=52409 port=3000
info: Beginning Dredd testing...
fail: GET /dummy duration: 155ms
info: Displaying failed tests...
fail: GET /dummy duration: 155ms
fail: body: Real and expected data does not match.

request:
body:

headers:
    User-Agent: Dredd/1.0.1 (Darwin 15.0.0; x64)

uri: /dummy
method: GET


expected:
headers:
    Content-Type: text/plain; charset=utf-8

body:
hello world

statusCode: 200


actual:
statusCode: 200
headers:
    x-frame-options: SAMEORIGIN
    x-xss-protection: 1; mode=block
    x-content-type-options: nosniff
    content-type: text/plain; charset=utf-8
    etag: W/"bd13b94ec091c54f6f01d47ce47a54a5"
    cache-control: max-age=0, private, must-revalidate
    x-request-id: bc936f24-2dcb-4e96-8c03-3673d8edaf01
    x-runtime: 0.114245
    server: WEBrick/1.3.1 (Ruby/2.2.2/2015-04-13)
    date: Mon, 23 Nov 2015 07:08:46 GMT
    content-length: 16
    connection: Keep-Alive

body:
hello world

complete: 0 passing, 1 failing, 0 errors, 0 skipped, 1 total
complete: Tests took 158ms

What's going on here? Well, it turns out that there's some extra whitespace (a trailing newline) in the expected body of a text/plain response. This is documented here. This is where the hook system comes into play.

Dredd hooks

Dredd hooks are documented here. Common use cases for hooks include (taken from the dredd documentation):

  • loading db fixtures
  • cleanup after test step or steps
  • handling authentication and sessions
  • passing data between transactions (saving state from responses to stash)
  • modifying request generated from blueprint
  • changing generated expectations
  • setting custom expectations
  • debugging via logging stuff

In our case we can use the hooks to clean up the newline character in the expected result. For ruby/rails, this means that we'll need to install the dredd_hooks gem:

gem install --no-ri --no-rdoc dredd_hooks

Then we'll create a hooks.rb file that looks like the following:

hooks.rb

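require 'dredd_hooks'
include DreddHooks::Methods

# Runs before every transaction that Dredd is about to test.
before_each do |transaction|
  # to_s guards against expected responses that declare no Content-Type.
  content_type = transaction['expected']['headers']['Content-Type'].to_s
  if content_type.match(/^text\/plain/)
    # Trim leading/trailing whitespace (such as the trailing newline) from the expected body.
    transaction['expected']['body'] = transaction['expected']['body'].gsub(/^\s+|\s+$/, "")
  end
end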
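
If the regex is hard to read, here's the same idea as a standalone Python sketch you can poke at without running dredd. The transaction dictionary only imitates the keys the hook touches, and strip_expected_body is a hypothetical name.

```python
import re


def strip_expected_body(transaction):
    """Trim whitespace from expected text/plain bodies, like hooks.rb does."""
    expected = transaction["expected"]
    content_type = expected["headers"].get("Content-Type", "")
    if content_type.startswith("text/plain"):
        # re.MULTILINE makes ^ and $ match at every line boundary, as they do in Ruby.
        expected["body"] = re.sub(
            r"^\s+|\s+$", "", expected["body"], flags=re.MULTILINE)
    return transaction


if __name__ == "__main__":
    transaction = {
        "expected": {
            "headers": {"Content-Type": "text/plain; charset=utf-8"},
            "body": "hello world\n",
        }
    }
    strip_expected_body(transaction)
    print(repr(transaction["expected"]["body"]))
```

Bodies with any other content type are left untouched, so a JSON response would still be compared as written in the blueprint.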

Now to run dredd again (important: hookfiles must be set in dredd.yml to ./hooks.rb):

$ dredd
Configuration dredd.yml found, ignoring other arguments.
Starting server with command: bundle exec rails s
Waiting 3 seconds for server command to start...
[2015-11-22 23:17:31] INFO  WEBrick 1.3.1
[2015-11-22 23:17:31] INFO  ruby 2.2.2 (2015-04-13) [x86_64-darwin14]
[2015-11-22 23:17:31] INFO  WEBrick::HTTPServer#start: pid=52619 port=3000
info: Beginning Dredd testing...
Native thread-sleep not available.
This will result in much slower performance, but it will still work.
You should re-install spawn-sync or upgrade to the lastest version of node if possible.
Check /usr/local/lib/node_modules/dredd/node_modules/spawn-sync/error.log for more details
Spawning `ruby` hooks handler
Hook handler stdout: ./hooks.rb
Starting Ruby Dredd Hooks Worker

Hook handler stderr: Dredd connected to Ruby Dredd hooks worker

pass: GET /dummy duration: NaNms
complete: 1 passing, 0 failing, 0 errors, 0 skipped, 1 total
complete: Tests took 2245ms

Success! You have now accomplished the following:

  • Created an API spec using API Blueprint.
  • Created a mock API for the API spec (this can be used by anyone working on the frontend, and yes, they can work offline with the mock API).
  • Used dredd to test the implemented API against the specification. We used ruby, but language really doesn't matter here!

Hi there

Hi there. This is my first post here. Looking for more information? Check out my about page.