Langchain LLM Streaming

Through its callback mechanism, LangChain can process the tokens an LLM emits in real time, as they are generated:
```python
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

chat = ChatOpenAI(
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
    temperature=0,
)
resp = chat([HumanMessage(content="Write me a song about sparkling water.")])
```

LangChain supports emitting tokens with either synchronous or asynchronous IO, via StreamingStdOutCallbackHandler and AsyncIteratorCallbackHandler respectively.
StreamingStdOutCallbackHandler

First, let's look at LangChain's official StreamingStdOutCallbackHandler, which prints each token the LLM emits to the terminal in real time. Its core is the on_llm_new_token method:
```python
class StreamingStdOutCallbackHandler(BaseCallbackHandler):
    ...

    def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
        """Run on new LLM token. Only available when streaming is enabled."""
        sys.stdout.write(token)
        sys.stdout.flush()
```
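Any subclass of BaseCallbackHandler can override on_llm_new_token to route the stream somewhere other than stdout. Here is a minimal sketch; the TokenCollectorHandler name is hypothetical, not part of LangChain, and it simply accumulates tokens in memory:

```python
from typing import Any, List

from langchain.callbacks.base import BaseCallbackHandler


class TokenCollectorHandler(BaseCallbackHandler):
    """Hypothetical handler that collects streamed tokens instead of printing them."""

    def __init__(self) -> None:
        self.tokens: List[str] = []

    def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
        # Invoked once per token when streaming=True; runs synchronously
        # in the same thread as the LLM call.
        self.tokens.append(token)
```

Like StreamingStdOutCallbackHandler, such a handler is driven synchronously as tokens arrive.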
This approach is synchronous, however. For asynchronous IO, LangChain provides AsyncIteratorCallbackHandler.

AsyncIteratorCallbackHandler

Below is an example that uses AsyncIteratorCallbackHandler to print the returned tokens asynchronously:
```python
import asyncio

from langchain.callbacks import AsyncIteratorCallbackHandler
from langchain.chat_models import ChatOpenAI
```
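A minimal, self-contained sketch of the typical consumption pattern, assuming ChatOpenAI's agenerate coroutine and the handler's aiter() iterator: the LLM call is scheduled as a background task, while the current coroutine prints each token as the callback pushes it into the handler's queue.

```python
import asyncio

from langchain.callbacks import AsyncIteratorCallbackHandler
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage


async def main() -> None:
    handler = AsyncIteratorCallbackHandler()
    chat = ChatOpenAI(streaming=True, callbacks=[handler], temperature=0)

    # Schedule the LLM call as a background task so tokens can be
    # consumed concurrently while the model is still generating.
    task = asyncio.create_task(
        chat.agenerate([[HumanMessage(content="Write me a song about sparkling water.")]])
    )

    # handler.aiter() yields each token as on_llm_new_token pushes it
    # into the handler's internal queue; it stops when the LLM run ends.
    async for token in handler.aiter():
        print(token, end="", flush=True)

    await task


asyncio.run(main())
```

The design choice here is that the callback only enqueues tokens; the async for loop awaits the queue, so the event loop stays free for other work while the model is still generating.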