Author here. I didn't expect this to show up here. I found containers on Lambda to work really well. I can easily test them locally, and they scale to zero when I'm not using them in AWS. So they're perfect for things that are idle a lot.
I have a follow-up coming where I use Go in a container, and the request speed got a lot better.
The service in this article is my HTML-to-text converter, so having a container where I could install OS dependencies was crucial to getting this working. It's covered here and here:
https://news.ycombinator.com/item?id=30829568
https://earthly-tools.com/text-mode
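For the curious, the image follows AWS's standard Lambda container pattern. A minimal sketch (the OS package below is a placeholder, not my actual dependency list):

    FROM public.ecr.aws/lambda/python:3.9
    # OS-level dependencies get baked in at build time
    # (libxml2 is a placeholder; swap in whatever your service needs)
    RUN yum install -y libxml2 && yum clean all
    # copy the handler into the task root the base image expects
    COPY app.py ${LAMBDA_TASK_ROOT}
    # "module.function" that Lambda invokes
    CMD ["app.handler"]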
> I have a follow-up coming where I use Go in a container, and the request speed got a lot better.
My team transitioned a lightweight Python-based lambda from zip to container-based, and we also saw a small (but definitely real) speedup in request time. I'm not the one who did the benchmarking, but IIRC it was about 10ms faster. From ~50ms to ~40ms or so.
edit: originally phrased it backwards, making it seem like the container method was slower.
My experience has been the opposite, with simple Python docker containers showing noticeably slower cold starts than Python zips. I've also found the size of the zip package matters (smaller is faster). However, things change over time; maybe there have been recent improvements?
Specifically regarding cold starts of lambda instances, I recall this excellent article from Jan 2021 which measures cold start latencies for a variety of AWS lambda runtimes:
Our benchmarking was only against a warmed lambda; I don't believe any language can do a 50ms cold start on Lambda. The fastest example in your link is Python at around 150ms cold start, and that's an outlier from the average.
Are you saying latency is lower on containers than plain Python?
I recently made a project for the company and didn't even consider containers (which I love, btw) because I thought I could save some ms without a container.
Do you know if lambda container performance is also good in Node.js? I need to have Python and Node.js lambdas.
Versus the .zip file deploy method, if that's what you mean: yes, the container had lower latency. The core code didn't change. This is just the response time on a warmed-up lambda. We didn't benchmark cold-start times, but just from using it, the cold starts don't seem noticeably worse (maybe they're also better? Honestly I don't know).
Sorry, don't know about Node.js yet. This was for a new lambda that wasn't in production yet, so it made sense to experiment.
The increase in speed was unexpected but nice. We're also assuming this increase is a static ~10ms speedup and not a 20% speedup, but we only tested one lambda. I'm interested to see Adam's followup post and see if I'm wrong about that.
Oh, also just want to say: since the workings of Lambda are hidden, it could be some totally different reason why containers perform better. Since it's a newer feature, maybe it's running a newer version of the backend that the zip-file backend will be moved to. Maybe container-based lambdas are allocated on newer hardware. Maybe containers are actually just faster. Can't ever really know without getting hired by Amazon and signing an NDA, lol.
Someone here recently said that the Lambda Python runtime is not stock Python (apparently it's compiled without optimisations: https://news.ycombinator.com/item?id=30953202) - maybe that has something to do with it?
It definitely could. We're talking on the order of ~10ms; even the Python version alone can mean that kind of improvement over the years (or better, depending on the code, of course).
I was interested in the cold starts, though. I will have to experiment in the future.
I had a much better experience with GCP Cloud Run. Prepare an OCI/Docker image, type `gcloud run` with a few flags, and you're done. In 2021 they added a bunch of features which, in my opinion, make Cloud Run one of the most trivial ways of deploying containers intended for production usage.
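For the curious, the whole deploy is essentially one command (service name, image, and region below are placeholders):

    gcloud run deploy my-service \
      --image gcr.io/my-project/my-image \
      --region us-central1 \
      --allow-unauthenticated

You get autoscaling (including to zero) and an HTTPS endpoint out of that without any further setup.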
Simply the best cloud service from any of the three big CSPs, bar none. The day AWS supports this more out of the box on Lambda (no, I will not use your required base image or signatures) will be the day containers and serverless become one (like they already are on GCP).
We're trying this out at a large insurance company. Historically, actuarial teams created Excel workbooks, R code, and Python. Then those models were given to development teams to implement in a different language. As one might guess, there were loads of bugs and the process was slow. Now we're going to deploy an R lambda, owned by DevOps, which integrates all the I/O into dataframes. The lambda calls a calculation in R that takes those dataframes and returns a dataframe answer. If all goes well (the prototype works fine), we've saved probably $500k and 6 months.
You’ll have to deal with lambda cold starts if you want it to be performant:
> When the Lambda service receives a request to run a function via the Lambda API, the service first prepares an execution environment. During this step, the service downloads the code for the function, which is stored in an internal Amazon S3 bucket (or in Amazon Elastic Container Registry if the function uses container packaging). It then creates an environment with the memory, runtime, and configuration specified. Once complete, Lambda runs any initialization code outside of the event handler before finally running the handler code.
It's not entirely accurate that Lambda pulls container images from ECR at start-up time. Here's me talking about what happens behind the scenes (which, in the real world, often makes things orders of magnitude faster than a full container pull): https://www.youtube.com/watch?v=A-7j0QlGwFk
But your broader point is correct. Cold starts are a challenge, but they're one that the team is constantly working on and improving. You can also help reduce cold-start time by picking languages without heavy VMs (Go, Rust, etc), by reducing work done in 'static' code, and by minimizing the size of your container image. All those things will get less important over time, but they all can have a huge impact on cold-starts now.
Pardon the ignorance, but are lambda containers considered single-threaded? Or can they serve requests in parallel?
If I had a Spring Java (well, Kotlin) app that processes stuff off SQS (large startup time but potentially very high parallelism), would you recommend running ECS containers and scaling them up based on SQS back-pressure? Or would you package them up as Lambdas with provisioned capacity? Throughput will be fairly consistent (never zero) and occasionally bursty.
I would not use Spring, or Java for that matter, for lambdas, speaking from experience.
"Lambda containers" is a bit of a misnomer, as you can have multiple instances of a function run on the same container, it's just that initial startup time once the container shuts down that is slow (which can be somewhat avoided by a "warming" function set to trigger on a cron).
I would definitely go with containers if your intention is to use Spring. ECS containers can autoscale just the same as lambdas.
There's some work being done to package Java code to run more efficiently in serverless computing environments, but IIRC, it's not there yet.
Thanks! I wasn't planning it, but can't hurt to ask.
When I looked, the Lambda API seemed uncomplicated to implement (I saw an example somewhere), and it felt like you could just write a few controllers and gain the ability to run a subset of functionality in Lambda, especially if your app could be kept warm.
(to your cron comment, I thought that the reserved capacity would mean the container would be forcibly kept warm?)
Provisioned concurrency is nice, but can get pricey, especially in an autoscaling scenario. It moves you from a pay-per-usage model to an hourly fee + usage model. I would wait until your requirements show you absolutely need it. For most use cases, you will either have enough traffic to keep the lambda warm, or can incur the cost of the cold start. Warming functions did the trick for us. If you think about it, provisioned concurrency is paying for managed warmers.
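The warmer itself is tiny. A rough Python sketch, where the "warmer" flag is just a convention you set in the scheduled rule's payload, not anything standard:

    import json

    def handler(event, context):
        # short-circuit scheduled warming pings before doing real work
        # (the "warmer" key is our own convention, not a Lambda feature)
        if isinstance(event, dict) and event.get("warmer"):
            return {"warmed": True}

        # ... normal request handling below ...
        return {"statusCode": 200, "body": json.dumps({"ok": True})}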
Spring is one thing; Java is really another. You can use Java without reflection, and then cold starts are greatly reduced. Additionally, there's GraalVM, an optimized VM which should be even faster. On top of that, if reflection is not used, these days you can compile Java to a native image, which has none of these problems.
When you say fast though, you really are talking in comparison to other methods of using Java on Lambda. But compared to using something like Go, they are all slow.
I'm not an expert in this area, but have you all considered using CRIU[0] (checkpoint restore in userspace) for container-based Lambdas, to allow users to snapshot their containers after most of a language VM's startup (like Python's) has completed? Do you think this would reduce startup times?
Accelerating cold starts with checkpoint and restore is a good idea. There's been a lot of research in academia around it, and some progress in industry too. It's one of those things, though, that works really well for specific use-cases or at small scale, but takes a lot of work to generalize and scale up.
For example, one challenge is making sure that random number generators (RNGs) don't ever return the same values ever after cloning (because that completely breaks GCM mode, for example). More details here: https://arxiv.org/abs/2102.12892
As for CRIU specifically, it turned out not to be the right fit for Lambda, because Lambda lets you create multiple processes, interact with the OS in various ways, store local state, and other things that CRIU doesn't model in the way we needed. It's cool stuff, though, and likely a good fit for other use-cases.
They have a feature called "provisioned concurrency" where basically one "instance" of your lambda (or however many you want to configure) stays running warm, so that it can handle requests quickly.
I know it defeats the conceptual purpose of serverless, but it's a nice workaround while cloud platforms work on mitigating the cold start problem.
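If you want to try it, it's one CLI call (function name and alias below are placeholders):

    aws lambda put-provisioned-concurrency-config \
      --function-name my-function \
      --qualifier live \
      --provisioned-concurrent-executions 1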
That'll also cost you $$$$ and takes any provisioned lambda out of the free tier. Also note that only your specified number of instances will stay warm, meaning if your lambda needs to scale up, you risk slow cold starts on additional instances beyond the number you provisioned. You could specify the number of reserved concurrent executions (limiting the number that run), but that also costs money and will eat into your quota.
Using containers for Lambda is generally a bad idea for anything TypeScript/JavaScript that handles a realtime request - you just can't beat the speed of a single JavaScript file (compiled, in the case of TS). AWS CDK now ships with the NodejsFunction construct as well, which makes generating those a breeze with esbuild.
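For anyone who hasn't used it, a minimal sketch; shown here with CDK v2's Python bindings (the TypeScript construct is identical, and the entry path is a placeholder):

    from aws_cdk import App, Stack
    from aws_cdk.aws_lambda_nodejs import NodejsFunction
    from constructs import Construct

    class ApiStack(Stack):
        def __init__(self, scope: Construct, id_: str) -> None:
            super().__init__(scope, id_)
            # CDK runs esbuild on the entry file and ships one bundled JS file
            NodejsFunction(self, "Handler", entry="src/handler.ts")

    app = App()
    ApiStack(app, "ApiStack")
    app.synth()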
I've had some pretty good luck slimming things down as well. That's a win-win usually, even for non-lambda cases (trying things like docker-slim / switching stuff that needs a quick response to Go).
That said, the $2-5/month is fine as well for some cases.
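(For reference, the basic docker-slim invocation is just pointing it at an image; the name below is a placeholder:

    docker-slim build my-app:latest

It produces a new, smaller image alongside the original.)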
It’s nice to have that dial. Running lambda with provisioned concurrency is still a very managed experience: much different than running a container cluster.
If cold starts are at all an issue for whatever use-case, you can just do a warming job like we do (in our case it's built into Ruby on Jets). We find invoking every 30 seconds is enough to never have a cold start. It's still quite cheap as well. The lambda portion of our bill (with tons of platform usage) is still incredibly low, in the low double digits.
Just doing a warming job with no other usage falls well within free tier usage, I can confirm.
This is definitely an issue especially with infrequently accessed functions but I've seen cold start issues regardless. I assume some scaling events will cause cold starts (measured in seconds).
There's a good argument to go with packaged code instead of containers if you can manage the development complexity and versioning (cold starts measured in milliseconds).
My team owns a Node 14 JS lambda application that is completely serverless. We're focused on keeping our lambdas small, with single responsibilities, and we leverage lambda layers for anything common across multiple lambdas. Cold starts are a concern, but they're very negligible (< 100ms tops) and unnoticed by our client apps. We host all of our static web and JS assets via CloudFront so they load quickly. If a user happens to visit our site when a lambda incurs a cold start, it's not perceptible. We were much more worried initially about cold starts than it turned out we needed to be. Keeping lambdas small, picking a good language for the job, and leveraging lambda layers help minimize this a lot.
But Lambda also does things that CGI didn't do: dynamic auto-scaling, integration with queues and streams for async processing, strong security isolation with fine-grained authorization, host failure detection and recovery, datacenter failure resilience, and many other things. The interface being familiar and relatively analogous to existing things is intentional. The point is to be different where being different helps, and familiar where being different doesn't help.
You get like 90% of that dropping your CGI directory on NFS shared among multiple hosts. AWS offers a full featured host environment, so you don't have to build that part, but the part you write is quite similar.
As someone who has been around long enough to actually remember setting up /cgi-bin/... there is a lot more to lambdas.
They are scalable (including, importantly, down to zero!), have their own dedicated resources for each process, and are pretty efficient at being put to sleep and woken up. Plus the code is immutable by default and you get a lot out of the box like logging and setting limits.
I wouldn't start a new project with CGI at all right now but I use Lambda constantly.
In addition to what the other replies said I'd like to offer the following observation:
Lambda is the control plane running inside an "AWS OS" context; that means it has access to internal APIs with scoped permissions. Most commonly people discuss user-facing lambdas on the edge; however, you are not obligated to expose them to the world.
If you do choose to go the cloud route, understand that your account activities generate quite a lot of data. The simplest example would be custom CloudWatch events generated from, say, autoscaling groups, i.e. "Server 23 has RAM usage above some moving-average threshold" => kick off lambda => custom logic (send email/Slack, reboot server, spin up more instances, whatever).
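A rough Python sketch of that flow (the fields assume EventBridge's "CloudWatch Alarm State Change" event shape; the action taken is a placeholder):

    import json

    def handler(event, context):
        # fields follow the CloudWatch alarm state-change event shape
        detail = event.get("detail", {})
        alarm = detail.get("alarmName", "unknown")
        state = detail.get("state", {}).get("value")
        if state == "ALARM":
            # custom logic goes here: email/Slack, reboot, scale out, whatever
            print(json.dumps({"action": "notify", "alarm": alarm}))
        return {"handled": True}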
People who like middlebrow dismissals would say "what does it matter where the script runs, it could just as easily be running on the instance itself" - to them I say, pain is the best teacher. :)
We are using a container hosting .NET 6 with our lambda. We use it where I think lambdas really work well, that is, to process queue items off of SQS. It works well with the dead-letter queue as well. We don't notice any performance issues, but this is just a processor, so we don't need to worry about real-time responses either.
Until it fails, and the backing CF errors and won't resolve itself for over 24 hrs. Bitten twice, never again with Copilot. Good idea they had, just a shaky foundation.
I think CGI is a good high-level way of thinking about AWS Lambda and other serverless compute platforms. I think the key innovations are the integrations with cloud services and the scaling/economic model. Videos like the one linked below really demonstrate the level of optimization implemented to make this work at "cloud" scale. I think serverless compute platforms are going to become a really meaningful part of the software development ecosystem.
Feels pretty similar to any typical fastcgi implementation with pools, scaling up/down, separate spaces for initialization code and per-request code, etc.
I've found that using this will sometimes cause Lambda to return 500 errors while it's reloading the container image from the registry. This might be the price for allowing large images; they've decided not to do it in a blocking way.
How long does it take to fetch the container - is it warm or cold? For AWS Batch it was taking me 1-3 min. So I was really surprised/happy to see this lambda container post.
It's warm - when you change the ImageUri or "Update Code" for the lambda definition, it downloads the container into "somewhere" lambda-y - this takes a few seconds depending on size. Startups are fairly quick, but because of the way it persists your running image in memory, your container is generally (on frequent usage) 'paused' between invocations, and resumes quite quickly.
We're working on some more formal documentation of this, both to help customers understand how to optimize things, and to join the research conversation about container loading. Our solution combines deterministic flattening, demand loading, erasure coding, convergent encryption, and a couple of other cool technologies to both accelerate the average case and drive down tail latency.
Cost-wise, you are just paying for the time and memory when the lambda is running. So the image may be 'close' to the lambda, but it's billed like other lambdas.
Not sure about Django, but on the Node.js side there are Express.js-compatible libraries that let you write your app like you would in Express, but it's exposed via Lambda and API Gateway. Good chance Python has something similar. The biggest difference is you're not dealing with HTTP (directly), but it can be abstracted to act like HTTP from the framework's perspective.
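Python does have one: Mangum wraps an ASGI app so API Gateway events look like plain HTTP to the framework. A minimal sketch with FastAPI:

    from fastapi import FastAPI
    from mangum import Mangum

    app = FastAPI()

    @app.get("/")
    def index():
        # normal framework code; no Lambda event parsing in sight
        return {"ok": True}

    # Mangum translates API Gateway events to ASGI requests and back
    handler = Mangum(app)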
I am actually working on a Python library that works like you described, at https://cdevframework.io! The goal is to provide a developer experience like Django that then deploys onto serverless infrastructure. It is a long way from that goal, but one day it will get there!
Is Aurora v2 still inaccessible from outside the VPC? The fact that I have to set up some kind of proxy EC2 in order to view the database from my machine is a non-starter for me.
This is going to throw unknowing readers for a loop, because it's a comment trying to be cheeky.
Simply put: Fargate/ECS/EC2+EB = long-running tasks
Lambda = Short burst tasks with a max life of 15 minutes
Running a lambda 24/7 will nuke your credit card. Using Fargate for a scheduler/cron job that only runs 4 times a day will nuke your credit card. Use the right tool for the right job.
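Some rough list-price arithmetic (us-east-1, and prices drift over time) shows why:

    Lambda, 1 GB, running 24/7 for a 30-day month:
      2,592,000 s x $0.0000166667 per GB-s  ~= $43/month
      (per concurrent execution, before request charges)

    Small always-on instance (e.g. t3.micro, 1 GB):
      730 h x ~$0.0104/h                    ~= $7.60/month

The crossover is the whole point: always-on workloads belong on instances or containers; bursty, idle-most-of-the-time workloads belong on Lambda.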
The bulk of the cost here is usually on the EC2 side. The main extra cost is the time it takes your runner EC2 instance(s) to bootstrap - i.e. usually about 2-3 minutes. This can be mitigated by using prepaid discounts, etc.