How to Stream LLM Responses with AWS Bedrock and Lambda

Getting the “ChatGPT” effect of streaming tokens back to the frontend is invaluable, especially for chatbots, where response times can range from 1 to 60 seconds.

In April 2023, AWS announced Lambda HTTP Response Streaming – an easy-to-implement way to improve Time to First Byte (TTFB) and reduce memory usage.

But is it the right choice for your chatbot or generative AI app? As always, it depends.

While Lambda shines for small-scale applications and prototypes, scaling to production workloads introduces challenges. In this blog post, I’ll cover:

  • How Lambda Response Streaming works and the benefits it offers.
  • Key cost considerations, with real examples of monthly costs.
  • Caveats and limitations for scaling Lambda in production.
  • Alternatives to consider for high-traffic or persistent chatbots.
Lambda Response Streaming in action!

Need help with a generative AI idea? Book your free 30min consultation call.

How Lambda HTTP Response Streaming Works

A technical diagram, for you technical folks.

Lambda allows you to stream HTTP responses via its InvokeWithResponseStream API or a function URL configured for streaming. This approach improves Time to First Byte (TTFB) and can return payloads larger than the standard 6 MB limit, up to a soft cap of 20 MB.

Response streaming improves user experience with time to first byte. Source: AWS Documentation
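
Here’s a minimal sketch of what this looks like in practice, assuming the Node.js Lambda runtime (which injects the global awslambda.streamifyResponse helper), the @aws-sdk/client-bedrock-runtime package, and a function URL configured with the RESPONSE_STREAM invoke mode. The model ID and prompt are placeholders:

```typescript
// Minimal sketch: a Lambda handler that streams Bedrock tokens to the client.
// Assumes the Node.js Lambda runtime and a function URL in RESPONSE_STREAM mode.
import {
  BedrockRuntimeClient,
  InvokeModelWithResponseStreamCommand,
} from "@aws-sdk/client-bedrock-runtime";

// The `awslambda` global is provided by the Lambda runtime, not an npm package,
// so we declare its shape here for TypeScript.
declare const awslambda: {
  streamifyResponse(
    handler: (event: unknown, responseStream: NodeJS.WritableStream) => Promise<void>
  ): unknown;
};

const client = new BedrockRuntimeClient({});

export const handler = awslambda.streamifyResponse(async (event, responseStream) => {
  const response = await client.send(
    new InvokeModelWithResponseStreamCommand({
      modelId: "anthropic.claude-3-haiku-20240307-v1:0", // example model ID
      contentType: "application/json",
      accept: "application/json",
      body: JSON.stringify({
        anthropic_version: "bedrock-2023-05-31",
        max_tokens: 512,
        messages: [{ role: "user", content: "Explain Lambda response streaming." }],
      }),
    })
  );

  // Forward each text delta to the client as soon as Bedrock emits it.
  for await (const item of response.body ?? []) {
    if (item.chunk?.bytes) {
      const parsed = JSON.parse(Buffer.from(item.chunk.bytes).toString("utf8"));
      if (parsed.type === "content_block_delta" && parsed.delta?.text) {
        responseStream.write(parsed.delta.text);
      }
    }
  }
  responseStream.end();
});
```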

Is this really necessary?

The following two figures show the difference between serverless architectures with and without response streaming. With less overhead before the first byte, the function starts sending information back to the user sooner.

Traditional serverless architectures with React / API Gateway / Lambda / DynamoDB.
You can stream responses directly from AWS! Source: AWS

The comparison between the Traditional API Method and Lambda Response Streaming API Method highlights a key advantage of AWS Lambda’s streaming capabilities: significantly improved Time to First Byte (TTFB).

Unlike the traditional approach, where the client waits for the entire response to be processed before receiving data, Lambda Response Streaming delivers data incrementally, enhancing user experience and reducing perceived latency. This makes it ideal for real-time applications or dynamic datasets.
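
On the client side, those incremental chunks can be consumed with the standard fetch ReadableStream API. A minimal browser-side sketch, where the function URL and element ID are placeholders for your own endpoint and UI:

```typescript
// Minimal browser-side sketch: render each chunk as soon as it arrives,
// instead of waiting for the full response body.
async function streamChat(prompt: string, onToken: (text: string) => void): Promise<void> {
  const response = await fetch("https://<your-function-url>.lambda-url.us-east-1.on.aws/", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });

  if (!response.body) throw new Error("Streaming not supported by this response");

  const reader = response.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    onToken(decoder.decode(value, { stream: true }));
  }
}

// Example usage: append tokens to the page as they stream in.
streamChat("Why stream responses?", (token) => {
  document.querySelector("#answer")!.textContent += token;
});
```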

With 85% faster TTFB and 30% faster total API times in AWS’s comparison, Lambda’s streaming approach can keep users engaged and reduce drop-off rates, though it comes with a slightly more complex setup.

| Feature | Traditional API Method | Lambda Response Streaming API Method |
|---|---|---|
| Response Timing | Sent after all processing is complete. | Sent incrementally as data becomes available. |
| User Experience | High perceived latency (poor TTFB). | Low perceived latency (faster TTFB). |
| Ideal Use Case | Small datasets, batch responses. | Large datasets, dynamic or partial rendering. |
| Architecture Complexity | Simple, no streaming logic. | More complex, requires streaming setup. |
An extra 1.5 seconds could be your competitive advantage.

Advantages of Lambda Response Streaming

  1. Faster Initial Rendering: Improves perceived performance by starting the response earlier.
  2. Lower Memory Usage: No need to store the entire dataset in Lambda’s memory before responding.
  3. Dynamic Experience: Ideal for chatbots, streaming dashboards, or any app needing real-time updates.

Pricing Highlights

  • Response Streaming: $0.008 per GB streamed beyond the first 6 MB of each response.
  • Standard Lambda Costs: $0.0000167 per GB-second of compute, based on configured memory and duration, plus the standard per-request charge ($0.20 per million requests).

Cost Comparison for Chatbot Scenarios

Let’s compare costs across different traffic levels and architectures. As the table below shows, at low to moderate traffic Lambda makes a lot of sense, thanks to its high developer velocity and generous free tier.

| Scenario | Lambda (Without Streaming) | Lambda (With Streaming) | ECS with Fargate | App Runner |
|---|---|---|---|---|
| Cost per Unit | $0.0000167/GB-second | $0.0000167/GB-second + $0.008/GB | $75 flat/month | $85 flat/month |
| Low traffic (10K req/month) | ~$0.84 | ~$1.17 | ~$8 | ~$10 |
| Moderate traffic (100K req/month) | ~$8.35 | ~$11.67 | ~$20 | ~$25 |
| High traffic (1M req/month) | ~$83.53 | ~$116.66 | ~$75 | ~$85 |
| Very high traffic (10M req/month) | ~$835.34 | ~$1,166.59 | ~$75 | ~$85 |
Comparing costs for different streaming options. Source: Lambda Pricing

At 10–100K requests/month (most small apps), Lambda is competitive. However, at higher request volumes (10M requests per month), AWS Lambda costs skyrocket to ~$835/month (~$1,167 with streaming), while ECS with Fargate and App Runner remain at a predictable ~$75–85/month.

When Lambda Works for Streaming Chatbots

Lambda can work well for:

  • Prototypes and MVPs: Launch quickly without infrastructure overhead.
  • Low Traffic: If your chatbot handles fewer than 10,000 requests per month, Lambda’s free tier makes it cost-effective.
  • Short-Lived Use Cases: Ideal for bursty, event-driven workflows (e.g., chatbots for limited campaigns).

Caveats of Lambda Response Streaming for Chatbots

While AWS Lambda Response Streaming offers significant advantages for small-scale applications and prototyping, there are several caveats to consider when scaling up to production-grade workloads:

1. Costs Scale Linearly with Traffic

Lambda’s cost structure is usage-based, which makes it highly flexible for low-traffic applications. However, at higher request volumes, the costs grow significantly due to:

  • Compute costs: $0.0000167/GB-second for memory usage.
  • Streaming costs: $0.008 per GB streamed beyond the free 6MB per request.

For example:

  • At 10,000 requests/month, Lambda costs only ~$1.17 (with streaming).
  • At 10 million requests/month, costs balloon to ~$1,167, far exceeding the flat ~$75–85/month pricing of ECS with Fargate or App Runner.
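
To make these estimates reproducible, here’s a back-of-envelope estimator using the prices above. The memory size (0.5 GB) and average duration (10 seconds) are assumptions, chosen because they reproduce the compute column of the cost table; responses under 6 MB incur no streaming charge in this simplified model:

```typescript
// Back-of-envelope Lambda cost estimator using the prices quoted above.
// Memory size, average duration, and response size are illustrative assumptions.
const COMPUTE_RATE = 0.0000167; // USD per GB-second
const STREAM_RATE = 0.008;      // USD per GB streamed beyond the first 6 MB per request

function estimateMonthlyCost(
  requests: number,
  memoryGb = 0.5,       // assumed function memory
  avgDurationSec = 10,  // assumed duration while tokens stream back
  avgResponseMb = 0.05  // assumed response size (well under the free 6 MB)
): number {
  const compute = requests * memoryGb * avgDurationSec * COMPUTE_RATE;
  const billableGb = (Math.max(0, avgResponseMb - 6) / 1024) * requests;
  return compute + billableGb * STREAM_RATE;
}

// ~$0.84 at 10K requests/month and ~$835 at 10M, matching the table's compute column.
console.log(estimateMonthlyCost(10_000).toFixed(2));
console.log(estimateMonthlyCost(10_000_000).toFixed(2));
```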

2. Cold Starts Impact User Experience

Cold starts can delay your chatbot’s initial response by 100 ms to 1 second, especially after periods of low traffic. For real-time applications, this can noticeably degrade user experience. Provisioned concurrency mitigates the issue, but adds cost and complexity.
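
If you do opt into provisioned concurrency, the configuration itself is small. A hypothetical AWS CDK sketch, where the function definition, alias name, and count of 5 are illustrative assumptions:

```typescript
// Hypothetical CDK sketch: keep a few warm instances with provisioned concurrency.
// The function definition, alias name, and count of 5 are illustrative assumptions.
import * as cdk from "aws-cdk-lib";
import * as lambda from "aws-cdk-lib/aws-lambda";

const app = new cdk.App();
const stack = new cdk.Stack(app, "ChatbotStack");

const fn = new lambda.Function(stack, "StreamingChatFn", {
  runtime: lambda.Runtime.NODEJS_18_X,
  handler: "index.handler",
  code: lambda.Code.fromAsset("dist"),
});

// Provisioned concurrency attaches to a version via an alias; these warm
// instances skip the cold start but are billed around the clock.
new lambda.Alias(stack, "LiveAlias", {
  aliasName: "live",
  version: fn.currentVersion,
  provisionedConcurrentExecutions: 5,
});
```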

3. Stateless Nature Adds Overhead

Lambda functions are stateless, meaning any context (e.g., chat history) must be managed externally in databases like DynamoDB or Redis. This:

  • Increases latency as external calls add processing time.
  • Adds complexity to your architecture, especially as traffic scales.
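
As a concrete illustration of that overhead, here’s a hypothetical sketch of persisting chat history in DynamoDB between invocations; the table name and key schema are invented for the example:

```typescript
// Hypothetical sketch: persist chat history externally because Lambda is stateless.
// The table name ("ChatHistory") and key schema are assumptions for illustration.
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, GetCommand, PutCommand } from "@aws-sdk/lib-dynamodb";

type Message = { role: "user" | "assistant"; content: string };

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Every invocation pays for these round trips before the model is even called.
async function loadHistory(sessionId: string): Promise<Message[]> {
  const result = await ddb.send(
    new GetCommand({ TableName: "ChatHistory", Key: { sessionId } })
  );
  return (result.Item?.messages as Message[]) ?? [];
}

async function saveHistory(sessionId: string, messages: Message[]): Promise<void> {
  await ddb.send(
    new PutCommand({ TableName: "ChatHistory", Item: { sessionId, messages } })
  );
}
```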

4. Limited Long-Lived Connection Support

While Lambda Response Streaming improves Time to First Byte (TTFB), it’s not designed for maintaining persistent connections or handling continuous data streams the way WebSocket APIs or long-lived containerized applications can, and every invocation is still capped by Lambda’s 15-minute maximum execution time.


When to Choose Lambda Response Streaming

Ideal Use Cases

Lambda Response Streaming is a great choice for:

  1. Small Applications and Prototypes:
    • If your chatbot handles fewer than 100,000 requests per month, Lambda’s free tier and granular pricing make it cost-effective and easy to deploy.
  2. Short-Term or Event-Driven Workloads:
    • Perfect for seasonal chatbots, limited-time campaigns, or testing concepts.
  3. Fast Prototyping:
    • With no infrastructure to manage, you can quickly iterate and test ideas.

Key Benefits

  • Developer Velocity: Focus on building functionality without worrying about infrastructure.
  • Cost-Effective for Low Traffic: Takes full advantage of the free tier and usage-based pricing.
  • Simple Setup: No container orchestration or server management required.

When to Consider Alternatives

For high-traffic workloads (>1M requests/month) or persistent, stateful applications, ECS with Fargate or App Runner offer significant advantages:

  • Flat-Rate Pricing: Consistent costs of ~$75–85/month make these solutions more predictable and affordable for large-scale deployments.
  • Better Performance: No cold starts or state management overhead.
  • Persistent Connections: Ideal for real-time interactions using WebSockets or long-running tasks.

Balancing the Decision

If you’re building a chatbot and expect low to moderate traffic, AWS Lambda with Response Streaming can be a powerful solution. However, as your application scales, consider transitioning to ECS with Fargate or App Runner to control costs, reduce latency, and simplify architecture.

For those just starting, Lambda provides a great foundation. For established, high-traffic workloads, containers are the way forward.


Helpful Links

Need help with a generative AI idea? Book your free 30min consultation call.