Monday, December 7, 2015

AWS Lambda: "Occasionally Reliable Caching"

One of the biggest misconceptions I've heard from developers working with Lambda for the first time is that every execution of a Lambda function is entirely isolated and independent. AWS may be slightly responsible for this, as their documentation states:
"Each AWS Lambda function runs in its own isolated environment, with its own resources and file system view. AWS Lambda uses the same techniques as Amazon EC2 to provide security and separation at the infrastructure and execution levels."
The trouble with this statement is that, while each function may be running in an isolated environment, separate executions of that function can land on the same container, where they have access to the same memory and disk resources. I'll skip right to an example.

var str = 'hello';

exports.handler = function(event, context) {
    console.log(str);
    str = 'it\'s me';
    context.succeed();
};

Create a new Lambda function using this code and then run it twice. Here's the output:

START RequestId: 031528d9-8eea-22e5-b1e5-13516c9e3426 Version: $LATEST
2015-12-07T22:37:50.125Z 031528d9-8eea-22e5-b1e5-13516c9e3426 hello
END RequestId: 031528d9-8eea-22e5-b1e5-13516c9e3426
REPORT RequestId: 031528d9-8eea-22e5-b1e5-13516c9e3426 Duration: 10.91 ms Billed Duration: 100 ms Memory Size: 128 MB Max Memory Used: 8 MB

START RequestId: 4ac77131-8aca-11e3-874c-cb361dfcaaf1 Version: $LATEST
2015-12-07T22:38:10.101Z 4ac77131-8aca-11e3-874c-cb361dfcaaf1 it's me
END RequestId: 4ac77131-8aca-11e3-874c-cb361dfcaaf1
REPORT RequestId: 4ac77131-8aca-11e3-874c-cb361dfcaaf1 Duration: 8.49 ms Billed Duration: 100 ms Memory Size: 128 MB Max Memory Used: 9 MB

Run the function again after it has sat idle for a while (30 minutes, say). It will most likely output "hello" again, because the old container was discarded and a new one had to be initialized.
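If you'd rather not wait for a container to be recycled, the same behavior can be approximated locally. The sketch below is only a simulation, not the real Lambda runtime: createContainer and invoke are hypothetical helpers, and the stub context implements nothing beyond succeed. Each call to createContainer stands in for a fresh container running the global code once.

```javascript
// Local simulation of container reuse. This is NOT the Lambda runtime;
// createContainer() and invoke() are hypothetical helpers for illustration.
function createContainer() {
    // Stands in for global code: runs once per "container".
    var str = 'hello';

    return function handler(event, context) {
        var seen = str;      // value visible at the start of this execution
        str = 'it\'s me';    // mutation survives until the container is gone
        context.succeed(seen);
    };
}

// Calls a handler with a stub context and returns what it passed to succeed().
function invoke(handler, event) {
    var result;
    handler(event, { succeed: function (value) { result = value; } });
    return result;
}

var container = createContainer();
var first = invoke(container, {});                 // fresh container
var second = invoke(container, {});                // same container, reused
var afterRecycle = invoke(createContainer(), {});  // replacement container

console.log(first, '|', second, '|', afterRecycle);
```

Run with Node, this prints hello | it's me | hello, matching the three runs described above.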

Not so isolated, right? To be fair, AWS does document that any code outside of the handler function is treated as global code and is only initialized once per container. However, I've seen numerous developers mistakenly put code outside of the handler that should remain private to a single execution. As Lambda becomes more widely used, the distinction is critical for both performance and security.

Performance

Because code outside of the handler is only initialized once per container, it is the perfect spot for loading Node modules, opening database connections, etc. It's also where global caching can be added. Take the following sample code:

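// db is assumed to be a database client that was also set up out here,
// once per container (its initialization is omitted from this sample)
var CACHE = {};

exports.handler = function(event, context) {
    // Serve from the cache if an earlier execution on this container stored the item
    if (CACHE[event.item]) {
        return context.succeed(CACHE[event.item]);
    }

    // Lookup object in database
    db.find(event.item, function(err, item) {
        if (err) return context.fail(err);

        CACHE[event.item] = item;
        context.succeed(item);
    });
};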

Adding caching outside of the handler allows cached items to be shared across multiple executions of the same function. Because AWS does not guarantee that each execution will occur on the same container (due to scaling or lack of use), I've nicknamed this form of caching "Occasionally Reliable Caching" (ORC).
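To see the effect without deploying anything, here is a minimal local sketch of the same pattern. Everything around the handler is an assumption made for illustration: the db object is a stub that only counts lookups, and the stub context collects results instead of ending an execution. The cache is created with Object.create(null) so that an item named, for example, "constructor" is not mistaken for a cache hit.

```javascript
// Stub database client (hypothetical): counts how often it is queried.
var lookups = 0;
var db = {
    find: function (key, callback) {
        lookups += 1;
        callback(null, { id: key });
    }
};

// Module scope: shared by every execution that lands on this container.
var CACHE = Object.create(null);

function handler(event, context) {
    if (CACHE[event.item]) {
        return context.succeed(CACHE[event.item]);
    }

    db.find(event.item, function (err, item) {
        if (err) return context.fail(err);

        CACHE[event.item] = item;
        context.succeed(item);
    });
}

// Stub context (hypothetical): records results instead of ending an execution.
var results = [];
var stubContext = {
    succeed: function (value) { results.push(value); },
    fail: function (err) { throw err; }
};

handler({ item: 'a' }, stubContext);  // miss: goes to the database
handler({ item: 'a' }, stubContext);  // hit: served from CACHE
handler({ item: 'b' }, stubContext);  // miss: different item

console.log('database lookups:', lookups);
```

Three executions result in only two database lookups. On a real container the cache would vanish whenever the container is recycled, which is exactly the "occasionally reliable" part.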

Security

As you can imagine, there are important security considerations here as well. Take the following hypothetical code:

// payments is assumed to be a payment client initialized elsewhere
var creditCard;

exports.handler = function(event, context) {
    creditCard = {
        number: event.credit_card_number,
        exp: event.credit_card_expiration,
        code: event.credit_card_security_code
    };
    // Connect to payment system to verify
    payments.verify(creditCard, function(err) {
        if (err) {
            context.fail('Invalid card');
        } else {
            context.succeed('Payment okay');
        }
    });
};

While this is honestly just as bad as storing credit card information in a global variable on a web server, outside of the request handler, the distinction isn't as clear or as well-known with Lambda. In the example above, if there are multiple executions occurring simultaneously, there is no guarantee as to which card is being verified.
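Whatever the concurrency behavior turns out to be, one effect is easy to show locally: a value assigned to a module-scope variable is still sitting in memory after the execution has finished, while a variable declared inside the handler is not reachable afterwards. The sketch below is a simulation under stated assumptions: payments is a stub that accepts any card with a number, and stubContext, leakyHandler and scopedHandler are hypothetical names.

```javascript
// Hypothetical stand-in for a payment client; accepts any card with a number.
var payments = {
    verify: function (card, callback) {
        callback(card.number ? null : new Error('missing number'));
    }
};

var creditCard; // module scope: lives as long as the container does

// Mirrors the example above: the card is stored in the module-scope variable.
function leakyHandler(event, context) {
    creditCard = { number: event.credit_card_number };
    payments.verify(creditCard, function (err) {
        if (err) return context.fail('Invalid card');
        context.succeed('Payment okay');
    });
}

// Same logic, but the card only exists for the duration of one execution.
function scopedHandler(event, context) {
    var card = { number: event.credit_card_number };
    payments.verify(card, function (err) {
        if (err) return context.fail('Invalid card');
        context.succeed('Payment okay');
    });
}

// Stub context (hypothetical): records outcomes instead of ending an execution.
var outcomes = [];
var stubContext = {
    succeed: function (message) { outcomes.push(message); },
    fail: function (message) { outcomes.push(message); }
};

scopedHandler({ credit_card_number: '4111111111111111' }, stubContext);
var afterScoped = creditCard;        // nothing was left behind

leakyHandler({ credit_card_number: '4111111111111111' }, stubContext);
var afterLeaky = creditCard.number;  // card number is still in memory

console.log(afterScoped, afterLeaky);
```

Both handlers report "Payment okay", but only the scoped one leaves no card data behind for the next execution on the same container to find.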

This is an extreme example chosen to make the coding error obvious. Note also that Lambda is not yet PCI certified, so it shouldn't be used for this kind of data in the first place.

As developers begin working more with Lambda, I expect this post to become unnecessary; but until Lambda's execution model is better understood, it is important to keep in mind.

3 comments:

  1. AWS explains this on the compute blog: https://aws.amazon.com/blogs/compute/container-reuse-in-lambda/

  2. "if there are multiple executions occurring simultaneously, there is no guarantee as to which card is being verified."

    This is not true in the example as currently written. Sure, storing the credit card in a global variable that persists after the function is invoked is bad practice. However, the next time the function is invoked you overwrite the old value and test the new value correctly.

    Each instance of the AWS Lambda function will have its own copy of the global variable. No values will migrate between them even if multiple copies of the function are invoked simultaneously.

    AWS Lambda is not multithreaded programming. Multiple concurrent invocations do not share memory.

  3. This can also be an advantage for performance if you use container re-use to your advantage.
