Monday, December 7, 2015

AWS Lambda: "Occasionally Reliable Caching"

One of the biggest misconceptions I've heard from developers working with Lambda for the first time is that every execution of a Lambda function is entirely isolated and independent. AWS may be slightly responsible for this, as their documentation states:
Each AWS Lambda function runs in its own isolated environment, with its own resources and file system view. AWS Lambda uses the same techniques as Amazon EC2 to provide security and separation at the infrastructure and execution levels.
The trouble with this statement is that, while each function may run in an isolated environment, successive executions of that function can land in the same environment (the same container) and share its memory and disk. I'll skip right to an example.

var str = 'hello';

exports.handler = function(event, context) {
  console.log(str);
  str = 'it\'s me';
  context.succeed();
};

Create a new Lambda function using this code and then run it twice. Here's the output:

START RequestId: 031528d9-8eea-22e5-b1e5-13516c9e3426 Version: $LATEST
2015-12-07T22:37:50.125Z 031528d9-8eea-22e5-b1e5-13516c9e3426 hello
END RequestId: 031528d9-8eea-22e5-b1e5-13516c9e3426
REPORT RequestId: 031528d9-8eea-22e5-b1e5-13516c9e3426 Duration: 10.91 ms Billed Duration: 100 ms Memory Size: 128 MB Max Memory Used: 8 MB

START RequestId: 4ac77131-8aca-11e3-874c-cb361dfcaaf1 Version: $LATEST
2015-12-07T22:38:10.101Z 4ac77131-8aca-11e3-874c-cb361dfcaaf1 it's me
END RequestId: 4ac77131-8aca-11e3-874c-cb361dfcaaf1
REPORT RequestId: 4ac77131-8aca-11e3-874c-cb361dfcaaf1 Duration: 8.49 ms Billed Duration: 100 ms Memory Size: 128 MB Max Memory Used: 9 MB

Run the function again after about 30 minutes of inactivity and it will likely output "hello" again, because the idle container was recycled and a new one had to be initialized.

Not so isolated, right? To be fair, AWS does claim that any code outside of the handler function is treated as global code and is only initialized once per container. However, I've seen numerous developers mistakenly add code outside of the handler that should remain private. As Lambda becomes more ubiquitous, the distinction is critical for both performance and security.
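One way to picture this is to treat each container as a closure: the module-level code runs once when the container is created, and every invocation routed to that container reuses it. The sketch below is a local simulation, not Lambda itself; `createContainer` and `invoke` are hypothetical helpers, and the handler hands back the value the original example would have logged so that it can be checked.

```javascript
// Hypothetical stand-in for a Lambda container: calling createContainer()
// runs the "module-level" code once, like a cold start.
function createContainer() {
  var str = 'hello'; // initialized once per container

  // The handler runs once per invocation and shares `str` with later ones.
  return function handler(event, context) {
    var seen = str;    // the value the original example would log
    str = 'it\'s me';  // this assignment outlives the invocation
    context.succeed(seen);
  };
}

// Invoke a handler with a fake context and return what it passed to succeed().
function invoke(handler, event) {
  var result;
  handler(event || {}, {
    succeed: function (value) { result = value; },
    fail: function (err) { throw err; }
  });
  return result;
}

var warm = createContainer();
console.log(invoke(warm)); // hello
console.log(invoke(warm)); // it's me (same container, state reused)

var fresh = createContainer(); // a new container after the old one is recycled
console.log(invoke(fresh)); // hello
```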

Performance

Because code outside of the handler is only initialized once per container, it is the perfect spot for loading Node modules, opening database connections, and so on. It's also where global caching can be added. Take the following sample code:

var CACHE = {};

exports.handler = function(event, context) {
  // Serve from the container-level cache if this item was seen before
  if (CACHE[event.item]) {
    return context.succeed(CACHE[event.item]);
  }

  // Look up the object in the database (`db` stands in for your own client)
  db.find(event.item, function(err, item) {
    if (err) return context.fail(err);

    CACHE[event.item] = item;
    context.succeed(item);
  });
};

Adding caching outside of the handler allows cached items to be shared across multiple executions of the same function. Because AWS does not guarantee that each execution will occur on the same container (due to scaling or lack of use), I've nicknamed this form of caching "Occasionally Reliable Caching" (ORC).
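Here is the cache in runnable form, so the hit on a second invocation within the same container is visible. The `db` object is a hypothetical stub that answers synchronously and counts lookups, and `invoke` is a hypothetical fake-context helper. One deliberate difference from the snippet above: the cache is created with `Object.create(null)` and checked with `in`, so a cached falsy value still counts as a hit.

```javascript
// Module level: one cache per container, shared by every invocation it serves.
var CACHE = Object.create(null);

// Hypothetical database stub: answers synchronously and counts lookups.
var db = {
  lookups: 0,
  find: function (key, callback) {
    this.lookups += 1;
    callback(null, { id: key });
  }
};

function handler(event, context) {
  // `in` treats any stored value, even a falsy one, as a cache hit.
  if (event.item in CACHE) {
    return context.succeed(CACHE[event.item]);
  }
  db.find(event.item, function (err, item) {
    if (err) return context.fail(err);
    CACHE[event.item] = item;
    context.succeed(item);
  });
}

// Hypothetical fake context that captures the outcome of one invocation.
function invoke(handler, event) {
  var outcome = {};
  handler(event, {
    succeed: function (value) { outcome.value = value; },
    fail: function (err) { outcome.error = err; }
  });
  return outcome;
}
```

If the container is recycled, `CACHE` starts empty again and the first lookup goes back to the database, which is exactly the "occasionally" in ORC.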

Security

As you can imagine, there are important security considerations here as well. Take the following hypothetical code:

var creditCard;

exports.handler = function(event, context) {
  creditCard = {
    number: event.credit_card_number,
    exp: event.credit_card_expiration,
    code: event.credit_card_security_code
  };
  // Connect to the payment system to verify (`payments` is a placeholder client)
  payments.verify(creditCard, function(err) {
    if (err) {
      context.fail('Invalid card');
    } else {
      context.succeed('Payment okay');
    }
  });
};

This is honestly just as bad as storing credit card information in a global variable outside the request handler on a web server, but the distinction isn't as clear or as well known with Lambda. In the example above, the card details stay in the container's memory after the execution finishes, where any later execution on the same container can read them; and if executions ever overlap, there is no guarantee as to which card is being verified.

This is an extreme example of an obvious coding error, but also note that Lambda is not yet PCI certified, so it shouldn't be used for this kind of data in the first place.
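The fix is to keep per-request data in the handler's own scope, so each execution gets its own object and nothing is left behind for the next one. A minimal sketch, with a hypothetical `payments` stub in place of a real payment system and a hypothetical `invoke` helper to drive the handler locally:

```javascript
// Hypothetical payment client: rejects a card with no number.
var payments = {
  verify: function (card, callback) {
    callback(card.number ? null : new Error('missing number'));
  }
};

function handler(event, context) {
  // Local to this invocation: never visible to other executions on the
  // container, and (as long as `payments` keeps no reference to it)
  // unreachable once the callback below has run.
  var creditCard = {
    number: event.credit_card_number,
    exp: event.credit_card_expiration,
    code: event.credit_card_security_code
  };
  payments.verify(creditCard, function (err) {
    if (err) {
      context.fail('Invalid card');
    } else {
      context.succeed('Payment okay');
    }
  });
}

// Hypothetical fake context that captures the outcome of one invocation.
function invoke(handler, event) {
  var outcome = {};
  handler(event, {
    succeed: function (value) { outcome.value = value; },
    fail: function (err) { outcome.error = err; }
  });
  return outcome;
}
```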

As developers gain more experience with Lambda, I expect this post to become unnecessary; until Lambda's execution model is better understood, though, it is important to keep in mind.

Enjoy this post? Want to learn more about Lambda? Pre-order my complete Lambda book on Amazon for only $3.99!

Monday, November 2, 2015

AWS Lambda Pricing Calculator

The AWS Lambda pricing scheme is a bit confusing. To help calculate the cost of running Lambda functions, I've embedded my simple AWS Lambda pricing calculator below.
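The arithmetic behind a calculator like this is short: a per-request charge plus a charge per GB-second of allocated memory over time, with duration rounded up to the next 100 ms (the "Billed Duration: 100 ms" for a 10.91 ms run in the logs above shows the rounding). The rates below are an assumption, the published 2015 prices of $0.20 per million requests and $0.00001667 per GB-second, so check the current pricing page; the sketch also ignores the free tier.

```javascript
// ASSUMED rates (published 2015 Lambda prices); check current pricing.
var PRICE_PER_MILLION_REQUESTS = 0.20; // USD
var PRICE_PER_GB_SECOND = 0.00001667;  // USD

// Estimate cost for `invocations` runs that each take about `durationMs`
// with `memoryMb` of memory allocated. The free tier is ignored.
function estimateLambdaCost(invocations, durationMs, memoryMb) {
  // Duration is billed in 100 ms increments, rounded up.
  var billedMs = Math.max(1, Math.ceil(durationMs / 100)) * 100;
  var gbSeconds = invocations * (billedMs / 1000) * (memoryMb / 1024);
  var requestCost = (invocations / 1e6) * PRICE_PER_MILLION_REQUESTS;
  var computeCost = gbSeconds * PRICE_PER_GB_SECOND;
  return {
    billedMs: billedMs,
    gbSeconds: gbSeconds,
    requestCost: requestCost,
    computeCost: computeCost,
    total: requestCost + computeCost
  };
}

console.log(estimateLambdaCost(1000000, 10.91, 128).total.toFixed(2)); // 0.41
```

With those rates, one million 10.91 ms invocations at 128 MB come to 12,500 GB-seconds and roughly $0.41 in total. Passing a single typical duration is a simplification: each invocation is rounded up separately, so a real bill depends on how durations are distributed.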


Pre-Order the Complete Lambda Guide


Pre-order it for only $3.99! The Lambda guide covers everything you need to know to get started with Lambda and dives deep into scores of topics.

Tuesday, May 12, 2015

YouTube's Content ID Program is Leading to Hundreds of Distorted Variations of Songs and Shows

Recently while looking for a song on YouTube, I came across what appeared to be the song I was searching for, but with a slightly-off pitch. I assumed it was a fluke and moved on. However, over the past few months I have noticed hundreds of videos, both of popular songs and hit television shows, that have been distorted in some way or another. There are songs that have been slowed down or sped up by ~5%, videos that are missing the top or bottom 20%, videos that have black boxes over parts of the video, audio that has had its pitch altered, videos where the original source is shrunk to half its size and embedded within some background, the list goes on.

After a bit of digging, it became apparent that this was almost certainly an attempt by the uploaders to defeat YouTube's Content ID recognition system. For those who aren't familiar, Content ID is an automated system used by YouTube to detect copyrighted works within videos uploaded to the site. In most cases, Content ID will alert the uploader that the work is copyrighted and give them the option to swap the audio or display advertisements on the video (with the proceeds going to the rights holders). While this is arguably a workable approach to copyright detection, it has misidentified countless videos in the past. Besides the constant false positives and the lack of recourse for users accused of a violation, another major issue exists with Content ID: it struggles to detect derivatives of an original work, leading to multiple copies, all of them distorted in some way.

Given that Content ID is an automated system, it lends itself to the constant back-and-forth battles that exist with such technologies; for every step YouTube takes to curb copyright violations, the violators seek ways to defeat the system. This very issue is now littering YouTube with thousands, if not millions, of defective copies of original works.

Take the hit show, "Shark Tank," as an example.

Here's a copy of a recent episode where about 10% of the left side of the video has been removed: https://www.youtube.com/watch?v=O9wPE-GZpQI.

Here's an episode where the entire video has been framed by a black box and translucent lines: https://www.youtube.com/watch?v=9kJq9hNgPrw.

In this episode, the uploader shrunk the entire video within a large black border: https://www.youtube.com/watch?v=bmg5oqXWskY.

Here's an even more annoying example where the actual video only makes up about 40% of the box, while the rest is filled with a background pattern: https://www.youtube.com/watch?v=JB78NFhmIYs.

In this copy, the uploader has added a giant image background behind the original video, which has also been cropped: https://www.youtube.com/watch?v=fsSlp9b2P4Q.

When it comes to audio changes, pitch shifting on Taylor Swift's music has become so rampant that not having pitch shift has become a selling point:

Unfortunately, I doubt that Content ID will ever be able to detect all of the variations of videos and songs that are copyrighted. Because these works have now been altered, sometimes to an almost laughable extent, I'm afraid that the outcome for the original artists is actually worse than if the originals were allowed to remain. Now, instead of hearing Taylor Swift's "Welcome to New York" the way she intended, I've heard about a hundred different variations, some fast, some slow, some higher-pitched, some lower-pitched. It remains to be seen how YouTube will respond to these videos, but my guess is that they will continue to adjust Content ID. Eventually, new uploads of Justin Bieber may sound like this: https://www.youtube.com/watch?v=bidHnEekXpE

Wednesday, April 8, 2015

More Graceful Deployments with CodeDeploy

UPDATE: As user /u/Andy_Troutman on Reddit kindly pointed out to me, the AWS team has already considered functionality very similar to this. They even wrote a similar script (long before I wrote this one). If you want the more "official" version, here it is: https://github.com/awslabs/aws-codedeploy-samples/tree/master/load-balancing/elb

If you haven't used AWS CodeDeploy before, it's a new service aimed at automating deployments across a fleet of EC2 instances. It works using an agent on the instance that polls AWS for new changes to application code. When a change is detected (because you or a CI tool triggered a deployment), the instance downloads the new code and runs a series of steps you define in a YAML file. These steps can include installing dependencies, validating that the service is running, and pretty much anything you can fit in a script. CodeDeploy can be configured to deploy the code to all instances at once, one at a time, or in percentage groups.

Ideally, a new deployment revision should not cause any loss of traffic. However, once the application is installed (the code is unzipped and copied to the correct location), most services must be restarted for the changes to take effect. Personally, I use a simple Node.js process running with "forever" for 90% of my projects. When CodeDeploy finishes installing the code, I have to run "forever stop" and "forever start" for the changes to be applied. This takes about 500 to 4000 milliseconds depending on how large the application is and whether it has to make database connections or perform other startup procedures. During this time, traffic is obviously rejected and the load balancer returns a "503 - Backend Service at Capacity" error to the client.

Although ELBs do have health check options (and theoretically, you could set a 1-second ping), the application restart still causes an instantaneous cut-off of all connections, followed by a failed health check. Until the check fails, the ELB keeps sending traffic, which could amount to hundreds of connections for a highly trafficked service.

The solution to this issue is to tell the load balancer to stop sending the instance traffic and then wait for the existing connections to drain before restarting the application. Honestly, CodeDeploy could easily implement a simple option for "remove instance from the load balancer when deploying," but I've decided to recreate the effect using the start, stop, and validate scripts used by the agent.

Overview

In the next several steps, I'm going to script out the process of:

1. De-registering the instance from the load balancer
2. Performing the application restart
3. Re-registering it once the health check passes

Instance Preparation

Your instances must have the AWS command line tools installed. This can be done on the AMI (recommended) or during the bootstrap process. You could even add it as a "BeforeInstall" hook (just know that it will run the install command before every deployment until it's removed).

The instances also need permission to make the API calls that register and deregister them from the load balancer. I've granted this using IAM roles (you are using IAM roles, right?) in CloudFormation below:

{
  "Effect" : "Allow",
  "Action" : [
    "elasticloadbalancing:DeregisterInstancesFromLoadBalancer",
    "elasticloadbalancing:RegisterInstancesWithLoadBalancer"
  ],
  "Resource" : [
    {
      "Fn::Join": [
        "",
        [
          "arn:aws:elasticloadbalancing:",
          { "Ref" : "AWS::Region" },
          ":",
          { "Ref" : "AWS::AccountId" },
          ":loadbalancer/your-elb-name"
        ]
      ]
    }
  ]
}

You can also modify the instance's IAM role directly from the console and use the policy generator to give it the same permissions.

ELB Preparation

You should also enable connection draining on your ELB and set the time to whatever is appropriate for your application (if you're just serving webpages, 30 seconds is probably fine; if users are uploading files to your service, you may want to increase it).

CodeDeploy Files

Now that your instances have the correct permissions, you can include the code in your scripts to gracefully remove them from the ELB before running the application restart. Your scripts may differ considerably, but I have the following appspec.yml file:

version: 0.0
os: linux
files:
  - source: /
    destination: /path/to/install/location
hooks:
  AfterInstall:
    - location: deployment/stop.sh
      runas: user
  ApplicationStart:
    - location: deployment/start.sh
      runas: user
  ValidateService:
    - location: deployment/validate.sh
      runas: user

When a deployment is triggered, CodeDeploy runs the "ApplicationStop" script, downloads your artifact, runs the "BeforeInstall" script, copies the files to the correct location, runs the "AfterInstall" script, then the "ApplicationStart" script, and then finally the "ValidateService" script. As you can see, they are not all required, and I have not made use of every one.
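To make that ordering concrete, the hooks from the appspec.yml above are written here as a plain JavaScript object (parsing the YAML itself would need a third-party library), and a small hypothetical `executionPlan` function lists what runs, in order. `DownloadBundle` and `Install` are the agent's own steps (downloading the artifact and copying the files), so no scripts can be attached to them.

```javascript
// Lifecycle events in the order CodeDeploy runs them, as described above.
var LIFECYCLE_ORDER = [
  'ApplicationStop',
  'DownloadBundle',
  'BeforeInstall',
  'Install',
  'AfterInstall',
  'ApplicationStart',
  'ValidateService'
];

// Steps performed by the agent itself; scripts cannot be attached to them.
var AGENT_STEPS = ['DownloadBundle', 'Install'];

// hooks: { EventName: [{ location: 'path/to/script.sh', runas: 'user' }, ...] }
function executionPlan(hooks) {
  Object.keys(hooks).forEach(function (name) {
    if (LIFECYCLE_ORDER.indexOf(name) === -1 || AGENT_STEPS.indexOf(name) !== -1) {
      throw new Error('Not a scriptable lifecycle hook: ' + name);
    }
  });
  var plan = [];
  LIFECYCLE_ORDER.forEach(function (event) {
    if (AGENT_STEPS.indexOf(event) !== -1) {
      plan.push(event + ' (agent)');
    } else if (hooks[event]) {
      // Scripts listed under one hook run in the order they are listed.
      hooks[event].forEach(function (script) {
        plan.push(event + ': ' + script.location);
      });
    }
  });
  return plan;
}
```

Events with nothing defined, such as "ApplicationStop" and "BeforeInstall" here, simply drop out of the plan.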

Once the artifact is downloaded and unzipped, the "AfterInstall" script is run, which I've configured to remove the instance from the ELB, wait for the connections to drain, then stop my application:

#!/bin/bash

# Get the instance-id from the EC2 instance metadata service
INSTANCEID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)

# Remove the instance from the load balancer
aws elb deregister-instances-from-load-balancer --load-balancer-name elb-name --instances "$INSTANCEID" --region us-east-1

# Let connections drain for 30 seconds (replace with your drain time)
sleep 30

# Now stop the server
forever stop /path/to/process.js

At this point, the instance has been removed from the ELB, connections have been drained, and you can do whatever is needed to restart your app without worrying about losing requests. My start.sh script restarts the server:

#!/bin/bash
forever start /path/to/process.js

Finally, you should add validation to ensure your app is actually running before you re-attach the instance to the ELB. I've done this in the validate.sh script:

#!/bin/bash

# Wait for however long the service takes to be responsive
sleep 10

# Capture only the HTTP status code returned by the health check endpoint
res=$(curl -s -o /dev/null -w '%{http_code}' http://localhost/ping)
echo "$res"

if [ "$res" = "200" ]
then
  # Get the instance-id from the EC2 instance metadata service
  INSTANCEID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)

  # Add the instance back to the ELB
  aws elb register-instances-with-load-balancer --load-balancer-name elb-name --instances "$INSTANCEID" --region us-east-1

  # Wait for the instance to be detected by the ELB (set this to the health check interval)
  sleep 10
  exit 0
else
  exit 1
fi

If everything is successful, CodeDeploy will complete this step and move to the next instance (assuming you're deploying one at a time). If not, the deployment will fail but the instance will remain removed from the ELB. You can either re-trigger a deployment with a fix or roll back to a previously working one.
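The control flow of the three scripts, including the failure case, can be summarized in a short sketch. Every side effect is injected through a hypothetical `ops` object standing in for the `aws elb` commands, `sleep`, `forever`, and the `/ping` check; it is not an AWS SDK. The outer loop models the "one at a time" configuration: a failed instance is left out of the ELB and the rollout stops there.

```javascript
// Mirror of stop.sh, start.sh, and validate.sh for one instance.
// Returns true if the instance passed validation and was put back in the ELB.
function deployInstance(instanceId, ops) {
  // stop.sh (AfterInstall): leave the ELB, drain, stop the app
  ops.deregister(instanceId);
  ops.wait(ops.drainSeconds);
  ops.stopApp(instanceId);

  // start.sh (ApplicationStart)
  ops.startApp(instanceId);

  // validate.sh (ValidateService): wait, check /ping, then rejoin the ELB
  ops.wait(ops.warmupSeconds);
  if (ops.healthStatus(instanceId) !== 200) {
    return false; // left deregistered so it receives no traffic
  }
  ops.register(instanceId);
  ops.wait(ops.healthCheckSeconds);
  return true;
}

// "One at a time": deploy instances sequentially, stopping at the first failure.
function deployOneAtATime(instanceIds, ops) {
  var succeeded = [];
  for (var i = 0; i < instanceIds.length; i++) {
    if (!deployInstance(instanceIds[i], ops)) {
      return { succeeded: succeeded, failed: instanceIds[i] };
    }
    succeeded.push(instanceIds[i]);
  }
  return { succeeded: succeeded, failed: null };
}
```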

Additional Thoughts

Depending on the size of your application, this may not fully replace proper A-B stack deployments that include switching DNS. If you only have a few servers, taking one offline will increase the load on the others substantially. Finally, these steps will increase the time of your deployments by 30 seconds to a few minutes per server. If you have 100 servers, consider using the "percentage at a time" deployment method, but balance this with the increased load on the remaining servers.
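A back-of-the-envelope model helps with that trade-off. Assuming traffic is spread evenly and every server takes about the same time (the 30 s drain plus the two 10 s sleeps in the scripts above already add up to 50 s before any restart time), the hypothetical `rolloutEstimate` function below returns the total rollout time and how much traffic each remaining server carries.

```javascript
// servers:          instances behind the ELB
// perServerSeconds: time one instance spends out of service (an assumption you supply)
// batchPercent:     share of the fleet deployed at once; anything that rounds to
//                   a single server behaves like "one at a time"
function rolloutEstimate(servers, perServerSeconds, batchPercent) {
  var batchSize = Math.min(servers,
    Math.max(1, Math.ceil(servers * batchPercent / 100)));
  var batches = Math.ceil(servers / batchSize);
  var inService = servers - batchSize;
  return {
    batchSize: batchSize,
    // Batches run back to back; servers within a batch run in parallel.
    totalSeconds: batches * perServerSeconds,
    // Multiple of normal traffic each in-service server carries during a batch.
    loadFactor: inService > 0 ? servers / inService : Infinity
  };
}
```

For 100 servers at 50 s each, one at a time takes 5,000 s (over 80 minutes), while 10% at a time takes 500 s but leaves each remaining server carrying about 11% more traffic. With only 3 servers, taking one out raises the load on the other two by 50%.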


Wednesday, March 4, 2015

Update Your AWS ELBs to Protect Against the FREAK Vulnerability

The recently announced "FREAK" vulnerability is yet another blight on SSL. Fortunately, AWS has been quick as always to issue another cipher update. To protect your AWS Elastic Load Balancers from FREAK, follow these steps.

1. Navigate to your ELB in the EC2 console and select the "Listeners" tab.
2. Click on "Cipher."
3. Change the dropdown to the 2015-02 predefined security policy (ELBSecurityPolicy-2015-02).
4. Save.

That's it! Your ELBs will now be protected.

Wednesday, February 11, 2015

AWS CodeDeploy: An In-Depth First Look

At Amazon's 2014 re:Invent conference in Las Vegas, AWS announced CodeDeploy, a tool designed to simplify the process of deploying applications to groups of servers, sometimes numbering in the hundreds. The primary objective of CodeDeploy is to make deployments consistent, repeatable, and integrated with existing AWS services (you can complain about vendor lock-in now, but AWS is doing a great job of providing value for that lock-in).

I spent a few hours setting up CodeDeploy and documenting the issues I ran into while trying to get it running on an Ubuntu 12.04 server (despite 14.04 being the only "officially" supported version).

First Impressions

At first glance, CodeDeploy really seems like a game changer; it's built by Amazon, integrated with their services, and a convenient way to do rolling, all-at-once, or grouped deployments. Once I got past the first few issues, it felt like a solid product. Of course, given its recent release, it also lacks much community support or online discussion, which left me manually digging through error logs and support forums for dependencies. While the documentation is pretty decent, there are currently only about thirty questions in the AWS forums about CodeDeploy. The biggest issue I found was that I had to manually add an alternative source for ruby2.0 on Ubuntu 12.04 and install it myself before continuing, but this was not the fault of CodeDeploy.

IAM Setup

CodeDeploy requires a moderate amount of setup to get working properly. The most error-prone aspect is creating the appropriate IAM roles for both the CodeDeploy service and the instances. First, I created the CodeDeploy IAM role with the following policy:

{
  "PolicyName" : "AWSCodeDeployPolicy",
  "PolicyDocument" : {
    "Statement": [
      {
        "Action": [
          "autoscaling:PutLifecycleHook",
          "autoscaling:DeleteLifecycleHook",
          "autoscaling:RecordLifecycleActionHeartbeat",
          "autoscaling:CompleteLifecycleAction",
          "autoscaling:DescribeAutoScalingGroups",
          "autoscaling:PutInstanceInStandby",
          "autoscaling:PutInstanceInService",
          "ec2:Describe*"
        ],
        "Effect": "Allow",
        "Resource": "*"
      }
    ]
  }
}

This allows CodeDeploy to access the tags and Auto Scaling groups it needs in order to create applications and deployment configurations.

Next, I created the instance IAM role. It is important to remember that the CodeDeploy service needs access to the autoscaling and EC2 resources listed above while the instance itself only needs access to the S3 bucket containing the CodeDeploy agent and whatever bucket you store your final compressed file in.

{
  "Effect" : "Allow",
  "Action" : [
    "s3:Get*",
    "s3:List*"
  ],
  "Resource" : [
    "arn:aws:s3:::aws-codedeploy-us-east-1/*",
    "arn:aws:s3:::your-bucket/path/*"
  ]
}

Here's a good place to tell you what I did wrong. Being security conscious, I thought I could get away with giving the instance role GetObject permissions only. My existing deployment strategy only requires this permission to pull the file from S3. However, apparently CodeDeploy tries to list the file and its ACL before downloading, which results in an error without the additional permissions. Lesson learned.
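The lesson comes down to how IAM matches action names: a statement allowing `s3:GetObject` covers exactly that action, while `s3:Get*` also covers related actions such as `s3:GetObjectAcl`. The rough matcher below models this (wildcards `*` and `?`, case-insensitive matching); it is a simplification for illustration, not a policy evaluator, and the specific action names are examples rather than a list of what CodeDeploy actually calls.

```javascript
// Convert an IAM-style action pattern into a case-insensitive regular expression:
// '*' matches any sequence of characters and '?' matches exactly one.
function patternToRegExp(pattern) {
  // Escape regex metacharacters other than the two IAM wildcards.
  var escaped = pattern.replace(/[.+^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*').replace(/\?/g, '.') + '$', 'i');
}

// True if any allowed pattern matches the action.
function isAllowed(allowedPatterns, action) {
  return allowedPatterns.some(function (pattern) {
    return patternToRegExp(pattern).test(action);
  });
}

// The instance role from the first attempt vs. the one that worked.
var getObjectOnly = ['s3:GetObject'];
var getAndList = ['s3:Get*', 's3:List*'];
```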

The CodeDeploy Agent

The next step was to get the agent installed on the Ubuntu Server instance. Amazon provides its own "Amazon Linux" if you're looking for an officially AWS-supported AMI, but I'm much more familiar with Debian-based distros, so I chose to stick with that. When you launch your instance, make sure you either give it a descriptive tag or place it in an Auto Scaling group.

Installing the CodeDeploy agent on Ubuntu 12.04 proved to be a bit more difficult than the documentation reveals for 14.04 (again, not the fault of CodeDeploy, just my own need to use an older version). According to AWS, all you have to do is run:

sudo apt-get update
sudo apt-get install awscli
sudo apt-get install ruby2.0
cd /home/ubuntu
sudo aws s3 cp s3://bucket-name/latest/install . --region region-name
sudo chmod +x ./install
sudo ./install auto


However, if you try that, you'll notice it fails at the third line with:

E: Couldn't find any package by regex 'ruby2.0'

There is a yet-unanswered forum post about this here.

I decided to get Ruby installed another way. After getting it installed via rvm and rerunning the install script, it failed again, this time with:

"Dependency not satisfiable: ruby2.0"

So, I finally installed Ruby by adding an alternative source from Brightbox as documented here. In case that's ever not available, here were the steps:

sudo apt-get install software-properties-common
sudo apt-add-repository ppa:brightbox/ruby-ng
sudo apt-get update

Finally, I ran the install script yet again and it worked!

Preparing the Application

The application I wanted to deploy was a simple Node.js web app. It runs on the server using "forever," a daemon that keeps the process running in the background. To prepare it for CodeDeploy, I had to add an appspec.yml file and two scripts: a start and stop script.

The appspec.yml file looked very simple:

version: 0.0
os: linux
files:
  - source: /
    destination: /usr/local/projects/source
hooks:
  AfterInstall:
    - location: deployment/stop.sh
      runas: root
  ApplicationStart:
    - location: deployment/start.sh
      runas: root

Keep in mind that the YAML file is super-particular about spacing. There's an entire section devoted to it on the AWS docs.

Next, I added the start and stop scripts to the deployment directory of the project. Obviously they can be much more complex than this, but I'm trying to keep it relatively simple:

start.sh:

#!/bin/sh
forever start /usr/local/projects/source/server.js --flags --here;

stop.sh:

#!/bin/sh
forever stopall

Like I said, super simple, but it works. The basic premise of this is that CodeDeploy will execute each file that you provide during the correct lifecycle event, as defined by the appspec file. Besides "AfterInstall" and "ApplicationStart," there are also "ApplicationStop," "BeforeInstall," and "ValidateService." AWS provides explanations here, but keep in mind that "Install" purely means copying files to the right directories. In my example, "AfterInstall" means that CodeDeploy will wait until the files have been copied before stopping the previously running application.

Once all of this has been done, create a compressed file of your choice (zip, tar, and tar.gz are supported on Linux, zips for Windows). Put the file in the same S3 bucket that you gave your instance permissions to earlier.

CodeDeploy Console

Within the AWS console, you can now set up your application. To do this, head to the CodeDeploy page and create a new application. Provide a name and a deployment group name. The console doesn't make this clear, but the difference is that you can have multiple deployment groups belonging to an application. For example, you could have an app called "node-app" and create a "node-app-a" deployment group and then later create a "node-app-b" group, which would help with A-B style deployments.

In the tags section, enter either the autoscaling group or the tags you created earlier. If everything is successful, you should see the instance count increase.

The next section, Deployment Configuration, allows you to determine how you want your apps deployed. This is not really relevant when you only have one server, but it becomes very helpful if you have multiple servers. If you choose "one at a time," AWS will go to each server, attempt to deploy your app, and stop if any servers fail along the way. With "all at once" or "half at a time," CodeDeploy will deploy in parallel accordingly. These options are much faster, but also much more dangerous.

The service role should be the role created earlier with the necessary permissions. This role can be re-used for every application, as the permissions are the same regardless.

Finally, the application can be created. The next page is a bit confusing because it does not contain any action buttons. Instead, it says to use the command line to upload an application. Instead of doing that, head back to the main CodeDeploy page and click on "Deployments."

On this page, select your application from the list, then select the group name, paste the full S3 URL to your source into the box, and select your deployment method. Then, click deploy.


You can then see the results of the deployment.

Potential Issues

Besides the Ruby dependency issue I mentioned above, I also ran into a very ambiguous error message:

UnknownError: Not Opened for Reading

This message really didn't tell me what was happening, but after logging into the instances, going to the /opt/codedeploy-agent/deployment-root directory and finding that all of the source files contained an XML error from S3 instead of the actual files, I was able to debug it. Be sure that you use all of the permissions listed above for the instance role or you might run into the same problems.

Other Options and Thoughts

Besides deploying from an S3 object, you can also tie into GitHub. While this could work, I prefer to have a 100% working source file before actually deploying to an instance. I still use Jenkins to pull my changes from GitHub, install dependencies (node modules for my apps), run tests, zip everything up, and put it on S3. Once that's done, I can launch a new deployment from the console, or even have Jenkins use the AWS CLI to launch a deployment pointed at the file it just uploaded. While I could certainly install node modules as part of the pre-install hooks on the instance itself, that is much more error prone and slower as well.
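For the Jenkins step, the AWS CLI's `aws deploy create-deployment` command takes the artifact location through `--s3-location` in the shorthand form `bucket=...,key=...,bundleType=...`. The hypothetical helper below turns the S3 URL of the artifact Jenkins just uploaded into those arguments, inferring the bundle type from the file extension; the application and group names are placeholders.

```javascript
// Build `aws deploy create-deployment` arguments from an s3://bucket/key URL.
function createDeploymentArgs(s3Url, applicationName, deploymentGroupName) {
  var match = /^s3:\/\/([^\/]+)\/(.+)$/.exec(s3Url);
  if (!match) throw new Error('Expected s3://bucket/key, got ' + s3Url);
  var bucket = match[1];
  var key = match[2];
  // Commas would break the CLI shorthand syntax.
  if (key.indexOf(',') !== -1) throw new Error('Key contains a comma: ' + key);

  var bundleType;
  if (/\.zip$/i.test(key)) bundleType = 'zip';
  else if (/\.(tar\.gz|tgz)$/i.test(key)) bundleType = 'tgz';
  else if (/\.tar$/i.test(key)) bundleType = 'tar';
  else throw new Error('Unsupported bundle type: ' + key);

  return [
    'deploy', 'create-deployment',
    '--application-name', applicationName,
    '--deployment-group-name', deploymentGroupName,
    '--s3-location', 'bucket=' + bucket + ',key=' + key + ',bundleType=' + bundleType
  ];
}
```

Running them is then a matter of `require('child_process').execFileSync('aws', args, { stdio: 'inherit' })`, which assumes the AWS CLI is installed and configured on the build machine.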

UPDATE: AWS has also informed me that there is an open-source Jenkins CodeDeploy plugin available. I've installed the plugin and it works quite nicely; you can easily specify the application name, deployment group, and deployment policy from within Jenkins. Then, it executes as a build step with the same exit codes as Jenkins. Essentially, you can push to GitHub, copy to Jenkins, run tests, then execute a CodeDeploy deployment all as a result of one push (assuming you have the appropriate webhooks).

Overall, CodeDeploy worked very well once I got it working. I was able to deploy my app multiple times in a row without issues and even tested out the "one at a time" feature with an Auto Scaling group. Everything worked as expected. While I don't think CodeDeploy will be a complete replacement for a tool like Jenkins or other CI suites, it does take over the last few steps and integrates them more tightly with AWS. I highly recommend you try CodeDeploy out, but definitely do it in a test environment first until you have the process down to a science.

Wednesday, January 28, 2015

NYC Blizzard 2015

I am going to break with my traditional technology-based posts to share some images I took during the "blizzard" this past Monday. NYC only wound up getting about eight inches of snow - a lot less than expected.