Streaming Response Part 2: Implementing Streaming with AWS Lambda, Python, FastAPI, and OpenAI, Leveraging the AWS Lambda Web Adapter

Our company has a new requirement involving a series of data analyses followed by a summary of the conclusions using OpenAI. However, the colleague responsible for the data analysis part writes in Python, the entire process needs to be wrapped in a Lambda function, and the Lambda needs to run with InvokeMode RESPONSE_STREAM enabled. Since my colleague is not familiar with web request/response handling, I stepped in to help speed up development.

I'm not very familiar with Python myself, so I spent quite a bit of time researching. I've decided to document this journey in detail, hoping it might help others in similar situations.

3 methods to stream a response

In fact, there are three methods you can use to achieve a streaming effect:

  1. Server-Sent Events (SSE) => you can refer to this article I wrote

  2. Transfer-Encoding: chunked => the method discussed in this article

  3. WebSocket => there are many tutorials about this online, so I didn't prepare a separate article on it

Supporting RESPONSE_STREAM on AWS Lambda is more complicated with Python than with Node.js.

Transfer-Encoding: chunked

Initially, I found that AWS officially announced support for RESPONSE_STREAM in April 2023. This effect is achieved mainly through HTTP Transfer-Encoding: chunked. When you call an API capable of streaming, you'll find Transfer-Encoding: chunked in its response headers.
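
To see what produces that header, here is a minimal sketch of a streaming endpoint, assuming a local FastAPI app served by uvicorn (the route and chunk text are made up for illustration, not taken from the demo repo):

    # Hypothetical minimal example, not from the demo repo
    import asyncio

    from fastapi import FastAPI
    from fastapi.responses import StreamingResponse

    app = FastAPI()

    async def number_generator():
        # Yield a few chunks with a delay so the streaming is observable
        for i in range(1, 6):
            yield f"chunk {i}\n"
            await asyncio.sleep(0.5)

    @app.get("/demo/stream")
    async def demo_stream():
        # Since no Content-Length is known up front, the server
        # sends the body with Transfer-Encoding: chunked
        return StreamingResponse(number_generator(), media_type="text/plain")

Run it with uvicorn and hit it with curl -vN to watch the chunks arrive one by one, with Transfer-Encoding: chunked in the response headers.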

Python requires the use of the AWS Lambda Web Adapter (paired here with FastAPI).

However, upon further investigation, I discovered that native response streaming only supports the Node.js runtime...

After searching online for quite a while, I found the article "aws-lambda-response-streaming," which contains working code examples. It turns out that you need to use it in conjunction with the AWS Lambda Web Adapter. The example in that article imports the Web Adapter image directly, which means that when you deploy, you must package your entire application into a Docker image and push it to AWS ECR.
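
For orientation, the packaging pattern looks roughly like the Dockerfile below. This is a sketch based on the Web Adapter's documented usage, not the exact file from the repo; the base image, adapter version tag, port, and module name (main:app) are assumptions, so check the repo and the adapter's README for the real values:

    # Sketch of a Dockerfile bundling the AWS Lambda Web Adapter
    # (base image, version tag, and file names are assumptions)
    FROM public.ecr.aws/docker/library/python:3.11-slim

    # Copy in the Web Adapter binary as a Lambda extension; it translates
    # Lambda invocations into plain HTTP requests against the local app
    COPY --from=public.ecr.aws/awsguru/aws-lambda-adapter:0.8.1 /lambda-adapter /opt/extensions/lambda-adapter

    WORKDIR /app
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt
    COPY . .

    # Tell the adapter to stream the response and where the app listens
    ENV AWS_LWA_INVOKE_MODE=response_stream
    ENV PORT=8000

    CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

Note that AWS_LWA_INVOKE_MODE=response_stream should match the Lambda Function URL's InvokeMode of RESPONSE_STREAM; otherwise the response gets buffered.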

Demo code

The main difference between my demo code and the previously mentioned article "aws-lambda-response-streaming" is that I use GitHub Actions and AWS SAM for deployment.

https://github.com/surferintaiwan/aws-sam-deploy-lambda-python-openai-streaming-response
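
To give a feel for the core of the demo, here is a hedged sketch of what such a streaming chat endpoint can look like. The route matches the curl test below, but the request model, the OpenAI model name, and other details are assumptions rather than a copy of the repo's code:

    # Hypothetical sketch of the streaming chat endpoint; see the repo for the real code
    from fastapi import FastAPI
    from fastapi.responses import StreamingResponse
    from openai import OpenAI
    from pydantic import BaseModel

    app = FastAPI()
    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    class ChatRequest(BaseModel):
        messages: list[dict]
        prompt: str = ""

    @app.post("/api/chat/stream")
    def chat_stream(req: ChatRequest):
        def token_generator():
            # stream=True makes the OpenAI SDK yield partial deltas as they arrive
            stream = client.chat.completions.create(
                model="gpt-3.5-turbo",  # assumed model; the repo may use another
                messages=req.messages,
                stream=True,
            )
            for chunk in stream:
                delta = chunk.choices[0].delta.content
                if delta:
                    yield delta

        # Each yielded token is flushed to the client as a chunk
        return StreamingResponse(token_generator(), media_type="text/plain")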

Test time

  1. Use curl

      curl -v -N --location '${{FastAPIFunctionUrl}}/api/chat/stream' \
        --header 'Content-Type: application/json' \
        --header 'Transfer-Encoding: chunked' \
        --data '{"messages":[{"role":"user","content":"Count to 100, with a comma between each number and no newlines. E.g., 1, 2, 3, ..."}],"prompt":""}'

  2. Use Postman

  3. Node.js fetch
     It took me a long time to get this working, but I finally succeeded.

     // Call the Lambda Function URL and read the body as a stream
     const response = await fetch('yourLambdaFunctionUrl/api/chat/stream', {
       method: 'POST',
       body: JSON.stringify({ messages: [{ role: 'user', content: 'how to write a blog?' }], prompt: '' }),
       headers: {
         'Content-Type': 'application/json'
       }
     });

     console.log('response', response);

     // Read the chunked body piece by piece instead of waiting for the whole payload
     const reader = response.body.getReader();
     const decoder = new TextDecoder('utf-8');

     while (true) {
       const { done, value } = await reader.read();
       if (done) break;
       console.log('value', decoder.decode(value));
     }
