Getting the “ChatGPT” effect of streaming tokens back to the frontend is invaluable, especially for chatbots, where response times can range from 1 to 60 seconds.
In April 2023, AWS announced Lambda HTTP Response Streaming: an easy-to-implement way to improve Time to First Byte (TTFB) and reduce memory usage.
But is it the right choice for your chatbot or generative AI app? As always, it depends.
While Lambda shines for small-scale applications and prototypes, scaling to production workloads introduces challenges. In this blog post, I’ll cover:
- How Lambda Response Streaming works and the benefits it offers.
- Key cost considerations, with real examples of monthly costs.
- Caveats and limitations for scaling Lambda in production.
- Alternatives to consider for high-traffic or persistent chatbots.
Need help with a generative AI idea? Book your free 30-minute consultation call.
How Lambda HTTP Response Streaming Works

Lambda allows you to stream HTTP responses via its InvokeWithResponseStream API or function URLs. This approach is designed to improve Time to First Byte (TTFB) performance and handle payloads larger than the standard 6 MB limit, up to 20 MB.
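To give a feel for the developer experience, here is a minimal sketch of a streaming handler for the Node.js runtime. The `awslambda.streamifyResponse` wrapper is a global provided by Lambda's Node.js runtime (hence the `declare` rather than an import); `fakeTokenSource` is a stand-in for whatever LLM client you actually use.

```ts
// handler.ts: minimal response-streaming sketch for the Node.js 18+ runtime.
// `awslambda` is a global injected by the Lambda runtime, so we declare it
// here instead of importing it.
declare const awslambda: {
  streamifyResponse: (
    handler: (
      event: unknown,
      responseStream: NodeJS.WritableStream,
      context: unknown
    ) => Promise<void>
  ) => unknown;
};

// Stand-in for a real LLM client that yields tokens as they are generated.
async function* fakeTokenSource(): AsyncGenerator<string> {
  for (const token of ["Hello", ", ", "streaming", " world", "!"]) {
    await new Promise((resolve) => setTimeout(resolve, 200)); // simulate model latency
    yield token;
  }
}

export const handler = awslambda.streamifyResponse(
  async (_event, responseStream, _context) => {
    // Each write is flushed to the client as it happens, instead of
    // buffering the whole response in memory first.
    for await (const token of fakeTokenSource()) {
      responseStream.write(token);
    }
    responseStream.end(); // signal that the stream is complete
  }
);
```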

Is this really necessary?
The comparison below contrasts serverless architectures with and without response streaming. Because there is less buffering overhead, the streaming architecture starts sending data back to the user sooner.
The comparison between the Traditional API Method and the Lambda Response Streaming API Method highlights the key advantage of Lambda's streaming capability: significantly improved Time to First Byte (TTFB).
Unlike the traditional approach, where the client waits for the entire response to be processed before receiving any data, Lambda Response Streaming delivers data incrementally, improving user experience and reducing perceived latency. This makes it ideal for real-time applications and dynamic datasets.
With roughly 85% faster TTFB and 30% faster total API times in this comparison, Lambda's streaming approach can keep users engaged and reduce drop-off rates, though it does require a slightly more complex setup.
| Feature | Traditional API Method | Lambda Response Streaming API Method |
|---|---|---|
| Response Timing | Sent after all processing is complete. | Sent incrementally as data becomes available. |
| User Experience | High perceived latency (poor TTFB). | Low perceived latency (faster TTFB). |
| Ideal Use Case | Small datasets, batch responses. | Large datasets, dynamic or partial rendering. |
| Architecture Complexity | Simple, no streaming logic. | More complex, requires streaming setup. |
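On the client side, the incremental delivery described in the table looks roughly like this. A sketch using the standard fetch API (Node.js 18+ or a browser); the function URL is a placeholder and assumes the function was configured with `InvokeMode: RESPONSE_STREAM`.

```ts
// Reads a streamed Lambda response chunk by chunk.
// FUNCTION_URL is a placeholder; substitute your own function URL
// created with InvokeMode: RESPONSE_STREAM.
const FUNCTION_URL = "https://<your-id>.lambda-url.us-east-1.on.aws/";

async function readStream(): Promise<void> {
  const res = await fetch(FUNCTION_URL);
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // Each chunk arrives as soon as the function writes it,
    // so the UI can render tokens immediately.
    process.stdout.write(decoder.decode(value, { stream: true }));
  }
}

readStream().catch(console.error);
```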

Advantages of Lambda Response Streaming
- Faster Initial Rendering: Improves perceived performance by starting the response earlier.
- Lower Memory Usage: No need to store the entire dataset in Lambda’s memory before responding.
- Dynamic Experience: Ideal for chatbots, streaming dashboards, or any app needing real-time updates.
Pricing Highlights
- Response Streaming:
  - The first 6 MB per request is free.
  - Beyond 6 MB: $0.008 per GB streamed.
- Standard Lambda Costs:
  - Memory cost: $0.0000166667 per GB-second (first pricing tier).
  - Request cost: $0.20 per 1M requests.
Cost Comparison for Chatbot Scenarios
Let’s compare costs across different traffic levels and architectures. At low to moderate traffic, Lambda makes a lot of sense thanks to its developer velocity and generous free tier.
| Scenario | Lambda (Without Streaming) | Lambda (With Streaming) | ECS with Fargate | App Runner |
|---|---|---|---|---|
| Cost per Unit | $0.0000167/GB-second | $0.0000167/GB-second + $0.008/GB | $75 flat/month | $85 flat/month |
| Low traffic (10K req/month) | ~$0.84 | ~$1.17 | ~$8 | ~$10 |
| Moderate traffic (100K req/month) | ~$8.35 | ~$11.67 | ~$20 | ~$25 |
| High traffic (1M req/month) | ~$83.53 | ~$116.66 | ~$75 | ~$85 |
| Very high traffic (10M req/month) | ~$835.34 | ~$1166.59 | ~$75 | ~$85 |
At 10-100K requests/month (typical for most small apps), Lambda is competitive. At higher volumes (10M requests per month), however, Lambda costs climb to ~$835/month without streaming and ~$1,167/month with it, while ECS with Fargate and App Runner remain at a predictable ~$75–85/month.
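For transparency, here is a small sketch of the cost model behind the Lambda columns above. The per-request assumptions (1 GB of memory, ~5 s duration, and roughly 10 MB streamed per response, with ~4 MB billed beyond the free 6 MB) are reverse-engineered from the table, not official AWS figures; plug in your own measured values.

```ts
// Rough monthly cost model for the Lambda columns in the table above.
// Assumptions (reverse-engineered from the table, not official figures):
//   - 1 GB memory, ~5 s duration per request
//   - ~10 MB streamed per response (~4.14 MB billed beyond the free 6 MB)
const GB_SECOND = 0.0000166667; // USD per GB-second (first tier)
const PER_MILLION_REQUESTS = 0.2; // USD per 1M requests
const STREAMING_PER_GB = 0.008; // USD per GB beyond the free 6 MB

function monthlyCost(requests: number, withStreaming: boolean): number {
  const computeUsd = requests * 1 /* GB */ * 5 /* s */ * GB_SECOND;
  const requestUsd = (requests / 1_000_000) * PER_MILLION_REQUESTS;
  const billedGbPerRequest = withStreaming ? 4.14 / 1000 : 0; // ~4 MB over the free tier
  const streamingUsd = requests * billedGbPerRequest * STREAMING_PER_GB;
  return computeUsd + requestUsd + streamingUsd;
}

console.log(monthlyCost(10_000, true).toFixed(2)); // ~1.17
console.log(monthlyCost(1_000_000, false).toFixed(2)); // ~83.53
console.log(monthlyCost(10_000_000, true).toFixed(2)); // ~1166.5
```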
When Lambda Works for Streaming Chatbots
Lambda can work well for:
- Prototypes and MVPs: Launch quickly without infrastructure overhead.
- Low Traffic: If your chatbot handles fewer than 10,000 requests per month, Lambda’s free tier makes it cost-effective.
- Short-Lived Use Cases: Ideal for bursty, event-driven workflows (e.g., chatbots for limited campaigns).
Caveats of Lambda Response Streaming for Chatbots
While AWS Lambda Response Streaming offers significant advantages for small-scale applications and prototyping, there are several caveats to consider when scaling up to production-grade workloads:
1. Costs Scale Linearly with Traffic
Lambda’s cost structure is usage-based, which makes it highly flexible for low-traffic applications. However, at higher request volumes, the costs grow significantly due to:
- Compute costs: $0.0000167/GB-second for memory usage.
- Streaming costs: $0.008 per GB streamed beyond the free 6MB per request.
For example:
- At 10,000 requests/month, Lambda costs only ~$1.17 (with streaming).
- At 10 million requests/month, costs balloon to ~$1166, far exceeding the flat ~$75–85/month pricing of ECS with Fargate or App Runner.
2. Cold Starts Impact User Experience
Cold starts can delay your chatbot’s first response by anywhere from ~100 ms to over a second, especially during periods of low traffic. For real-time applications, this can noticeably hurt user experience. Provisioned concurrency mitigates the issue, but adds cost and complexity, as shown in the sketch below.
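For completeness, here is a hedged sketch of enabling provisioned concurrency with the AWS CDK in TypeScript; the stack layout, asset path, and warm-environment count are illustrative, not a recommendation.

```ts
import { Stack, StackProps } from "aws-cdk-lib";
import * as lambda from "aws-cdk-lib/aws-lambda";
import { Construct } from "constructs";

export class ChatbotStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Illustrative function definition; asset path is a placeholder.
    const fn = new lambda.Function(this, "ChatbotFn", {
      runtime: lambda.Runtime.NODEJS_18_X,
      handler: "handler.handler",
      code: lambda.Code.fromAsset("dist"),
    });

    // Keep two execution environments warm. Tune to your traffic,
    // keeping in mind each warm environment is billed continuously.
    new lambda.Alias(this, "LiveAlias", {
      aliasName: "live",
      version: fn.currentVersion,
      provisionedConcurrentExecutions: 2,
    });
  }
}
```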
3. Stateless Nature Adds Overhead
Lambda functions are stateless, so any context (e.g., chat history) must be managed externally in a store like DynamoDB or Redis. This:
- Increases latency, since external calls add processing time.
- Adds complexity to your architecture, especially as traffic scales (see the sketch below).
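As a concrete illustration of that overhead, here is a sketch of loading chat history from DynamoDB on each invocation. The table name, key schema, and `ChatMessage` shape are assumptions for the example, not a prescribed design.

```ts
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, QueryCommand } from "@aws-sdk/lib-dynamodb";

// Illustrative only: the table name and key schema are assumptions.
const TABLE_NAME = "ChatHistory";
const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Every invocation pays this extra round trip, because the
// function itself holds no state between requests.
async function loadHistory(sessionId: string): Promise<ChatMessage[]> {
  const result = await ddb.send(
    new QueryCommand({
      TableName: TABLE_NAME,
      KeyConditionExpression: "sessionId = :sid",
      ExpressionAttributeValues: { ":sid": sessionId },
      ScanIndexForward: true, // oldest messages first
    })
  );
  return (result.Items ?? []) as ChatMessage[];
}
```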
4. Limited Long-Lived Connection Support
While Lambda Response Streaming improves Time to First Byte (TTFB), it’s not designed for maintaining persistent connections or handling continuous streams of data like WebSocket APIs or long-lived containerized applications.
When to Choose Lambda Response Streaming
Ideal Use Cases
Lambda Response Streaming is a great choice for:
- Small Applications and Prototypes:
  - If your chatbot handles fewer than 100,000 requests per month, Lambda’s free tier and granular pricing make it cost-effective and easy to deploy.
- Short-Term or Event-Driven Workloads:
  - Perfect for seasonal chatbots, limited-time campaigns, or testing concepts.
- Fast Prototyping:
  - With no infrastructure to manage, you can quickly iterate and test ideas.
Key Benefits
- Developer Velocity: Focus on building functionality without worrying about infrastructure.
- Cost-Effective for Low Traffic: Takes full advantage of the free tier and usage-based pricing.
- Simple Setup: No container orchestration or server management required.
When to Consider Alternatives
For high-traffic workloads (>1M requests/month) or persistent, stateful applications, ECS with Fargate or App Runner offer significant advantages:
- Flat-Rate Pricing: Consistent costs of ~$75–85/month make these solutions more predictable and affordable for large-scale deployments.
- Better Performance: No cold starts or state management overhead.
- Persistent Connections: Ideal for real-time interactions using WebSockets or long-running tasks.
Balancing the Decision
If you’re building a chatbot and expect low to moderate traffic, AWS Lambda with Response Streaming can be a powerful solution. However, as your application scales, consider transitioning to ECS with Fargate or App Runner to control costs, reduce latency, and simplify architecture.
For those just starting, Lambda provides a great foundation. For established, high-traffic workloads, containers are the way forward.
Need help with a generative AI idea? Book your free 30-minute consultation call.
