Streaming Response Part 2: Implementing Streaming with AWS Lambda, Python, FastAPI, and OpenAI Using the AWS Lambda Web Adapter
Our company has a new requirement: run a series of data analyses and then summarize the conclusions with OpenAI. However, the colleague responsible for the data analysis writes in Python, the entire process needs to be wrapped in a Lambda function, and the function should have Lambda's InvokeMode RESPONSE_STREAM enabled. Since my colleague is not familiar with web request/response handling, I stepped in to help speed up development.
I'm not very familiar with Python myself, so I spent quite a bit of time researching. I've decided to document this journey in detail, hoping it might help others in similar situations.
3 methods to stream a response
In fact, there are three ways to achieve the streaming effect:
Server-Sent Events (SSE) => You can refer to this article I wrote
Transfer-Encoding: chunked => This is the method discussed in the current article
WebSocket => There are many tutorials about this online, so I didn't prepare a separate article on it
Supporting RESPONSE_STREAM on AWS Lambda is more complicated with Python than with Node.js.
Transfer-Encoding: chunked
Initially, I found that AWS officially announced support for RESPONSE_STREAM in April 2023. The effect is mainly achieved through HTTP's Transfer-Encoding: chunked; when you call an API capable of streaming, you'll find Transfer-Encoding: chunked in the response headers.
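To make that concrete, here is roughly what a chunked response looks like on the wire (the payload is made up for illustration). Each chunk is prefixed with its length in hexadecimal, and a zero-length chunk marks the end of the stream:

```
HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked

5
Hello
7
, world
0
```

Each size line and each chunk is terminated by \r\n; the client concatenates the chunks back into "Hello, world". Because the total length is never declared up front, the server can keep emitting chunks as they become available.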
Python requires the use of AWS Lambda Web Adapter and FastAPI.
However, upon further investigation, I discovered that native response streaming only supports the Node.js managed runtimes...
After searching online for quite a while, I found the article "aws-lambda-response-streaming", which contains working code examples. It turns out you need to use the AWS Lambda Web Adapter: the article's example pulls the Web Adapter straight out of its published Docker image. Consequently, when deploying, you must package your entire application into a Docker image of your own and upload it to AWS ECR.
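As a rough sketch of that pattern (the adapter version tag, the app module name main:app, and the port are placeholders, not the article's exact values):

```dockerfile
FROM python:3.11-slim

# Pull the Web Adapter binary out of its published image; Lambda
# picks up anything placed under /opt/extensions as an extension.
COPY --from=public.ecr.aws/awsguru/aws-lambda-adapter:0.7.0 /lambda-adapter /opt/extensions/lambda-adapter

# Tell the adapter to stream instead of buffering the response,
# and which port the app listens on.
ENV AWS_LWA_INVOKE_MODE=response_stream
ENV PORT=8000

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```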
Demo code
The main difference between my demo code and the previously mentioned article "aws-lambda-response-streaming" is that I use GitHub Actions and AWS SAM for deployment.
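For streaming to work end to end, the function URL's invoke mode has to be RESPONSE_STREAM and match the adapter's setting. A minimal sketch of the relevant part of the SAM template (the resource name is illustrative, not the repo's exact template):

```yaml
Resources:
  FastAPIFunction:
    Type: AWS::Serverless::Function
    Properties:
      PackageType: Image        # the app ships as a Docker image
      Timeout: 300
      FunctionUrlConfig:
        AuthType: NONE
        InvokeMode: RESPONSE_STREAM   # enables chunked streaming on the function URL
      Environment:
        Variables:
          AWS_LWA_INVOKE_MODE: response_stream  # must match the URL's invoke mode
    Metadata:
      Dockerfile: Dockerfile
      DockerContext: .
```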
Full demo code: https://github.com/surferintaiwan/aws-sam-deploy-lambda-python-openai-streaming-response
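The heart of the demo is a FastAPI endpoint that relays OpenAI's streamed tokens as chunks. A minimal sketch of what such an endpoint can look like (the request shape mirrors the curl test below; the model name is an assumption, and this is not the repo's exact code):

```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from openai import OpenAI
from pydantic import BaseModel

app = FastAPI()
client = OpenAI()  # reads OPENAI_API_KEY from the environment


class ChatRequest(BaseModel):
    messages: list[dict]
    prompt: str = ""


@app.post("/api/chat/stream")
def chat_stream(req: ChatRequest):
    def token_generator():
        # stream=True makes OpenAI return the completion incrementally
        stream = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=req.messages,
            stream=True,
        )
        for chunk in stream:
            delta = chunk.choices[0].delta.content
            if delta:
                yield delta

    # Each yielded piece goes out as its own chunk (Transfer-Encoding: chunked)
    return StreamingResponse(token_generator(), media_type="text/plain")
```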
Test time
Use curl
```bash
curl -v -N --location '${{FastAPIFunctionUrl}}/api/chat/stream' \
  --header 'Content-Type: application/json' \
  --header 'Transfer-Encoding: chunked' \
  --data '{"messages":[{"role":"user","content":"Count to 100, with a comma between each number and no newlines. E.g., 1, 2, 3, ..."}],"prompt":""}'
```
Use Postman
Node.js Fetch
I've been working on this for a long time and finally succeeded.

```javascript
const response = await fetch('yourLambdaFunctionUrl/api/chat/stream', {
  method: 'POST',
  body: JSON.stringify({
    messages: [{ role: 'user', content: 'how to write a blog?' }],
    prompt: '',
  }),
  headers: { 'Content-Type': 'application/json' },
});
console.log('response', response);

// Read the body as a stream and print each chunk as it arrives
const reader = response.body.getReader();
const decoder = new TextDecoder('utf-8');
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  console.log('value', decoder.decode(value));
}
```
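Note that this snippet relies on the global fetch and the Web Streams reader (response.body.getReader()), both available in Node.js 18+ and in modern browsers.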