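With semantic search, a search engine is no longer limited to the literal wording of the query the user types. Instead, it tries to understand the real intent behind the query, so it can return the results that best match what the user actually needs.
Semantic search uses the user's intent, the context, and the conceptual meaning of words to match a query with relevant content. It relies on vector search and machine learning to return results that match the query, even when those results share no words with it.
It applies natural language processing (NLP) and machine learning techniques to better understand the meaning of text, which improves the accuracy and relevance of search results.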
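Here is a minimal sketch of the ranking step behind semantic search. It assumes the query and the documents have already been turned into vectors by some embedding model. The three-dimensional vectors below are hand-made, hypothetical stand-ins for real embeddings. The point is that both pet-related documents outrank the finance one for a dog-related query, even though neither shares a word with it.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(query_vec: list[float], docs: dict, k: int = 2) -> list[tuple[str, float]]:
    """Return the k (text, score) pairs whose vectors are most similar to the query."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; dimensions loosely mean (pets, finance, other).
docs = {
    "How do I train my puppy?": [0.9, 0.0, 0.1],
    "Tips for adopting a kitten": [0.7, 0.1, 0.3],
    "Interest rates rose again this quarter": [0.0, 0.95, 0.1],
}
query = [0.8, 0.0, 0.1]  # stand-in embedding for "caring for a new dog"

for text, score in semantic_search(query, docs):
    print(f"{score:.3f}  {text}")
```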
Implementing semantic search typically involves the following steps:
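Retrieval-Augmented Generation (RAG) is a technique introduced by Meta (then Facebook AI Research) in 2020. It improves the performance of language models by giving the model relevant context together with the details of the question or task. In practice, it augments large language models (LLMs) such as ChatGPT with an information-retrieval step that supplies data. With RAG, natural language processing can be restricted to enterprise content derived from vectorized documents, images, audio, and video.
Retrieval-augmented generation is a fascinating fusion of two powerful techniques in machine learning: retrieval and generation.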
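In RAG, the "augmentation" step largely comes down to prompt construction: the retrieved passages are inserted into the prompt next to the question before the model is called. Below is a minimal sketch. The template wording and the `build_rag_prompt` helper are our own, hypothetical choices rather than any library's API.

```python
def build_rag_prompt(question: str, passages: list[str]) -> str:
    """Combine retrieved passages and the user's question into a single prompt."""
    # Number the passages so the model (and a reader) can refer to them.
    context = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say that you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_rag_prompt(
    "When was RAG introduced?",
    ["RAG was introduced by Facebook AI Research (now Meta) in 2020."],
)
print(prompt)
```

The resulting string is what gets sent to the LLM. The "generation" half is an ordinary model call on an enriched prompt.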
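Embeddings capture the "relatedness" of text, images, video, or other kinds of information. This relatedness is most commonly used for the following:
An embedding compresses discrete information (words and symbols) into distributed, continuous-valued data (vectors).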
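To see what "compressing discrete symbols into continuous vectors" means mechanically, here is a toy embedding lookup. It rests on these assumptions:
- a five-word vocabulary;
- a randomly initialized (untrained) matrix;
- mean pooling as a simple way to turn token vectors into one text vector.

Real embedding models learn the matrix, use far more dimensions, and pool more elaborately.

```python
import random

# Discrete symbols: each word is just an integer id.
vocab = {"cat": 0, "dog": 1, "kitten": 2, "stock": 3, "market": 4}

# A one-hot encoding would need len(vocab) dimensions; the embedding uses fewer.
EMBED_DIM = 3

# Embedding matrix: one dense, continuous row per id. Random here for illustration;
# in a trained model these values are learned so that related words end up close.
rng = random.Random(42)
embedding_matrix = [[rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)] for _ in vocab]


def embed_tokens(tokens: list[str]) -> list[list[float]]:
    """Embedding lookup: token -> id -> row of the matrix."""
    return [embedding_matrix[vocab[t]] for t in tokens]


def mean_pool(vectors: list[list[float]]) -> list[float]:
    """Average per-token vectors into one fixed-size vector for the whole text."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


text_vector = mean_pool(embed_tokens(["stock", "market"]))
print(len(text_vector), text_vector)
```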
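OpenAI provides an API (other models, of course, offer similar capabilities) that uses its language models to generate embeddings for text strings. Its embedding model text-embedding-ada-002, the latest at the time of writing, outputs 1536-dimensional vectors.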
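The sketch below calls the embeddings endpoint through the official `openai` Python package, using the v1.x client interface (`client.embeddings.create`). The helper name `embed_texts` and the dimension check are our own additions. The client is passed in as an argument, so the function can be exercised without network access. The live call at the bottom only runs when `OPENAI_API_KEY` is set.

```python
import os

EMBEDDING_MODEL = "text-embedding-ada-002"
EXPECTED_DIM = 1536  # output dimension of text-embedding-ada-002


def embed_texts(client, texts: list[str]) -> list[list[float]]:
    """Embed a batch of strings and check that each vector has the expected size."""
    response = client.embeddings.create(model=EMBEDDING_MODEL, input=texts)
    vectors = [item.embedding for item in response.data]
    for vector in vectors:
        if len(vector) != EXPECTED_DIM:
            raise ValueError(f"unexpected embedding size: {len(vector)}")
    return vectors


if __name__ == "__main__" and os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI  # pip install openai

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    vectors = embed_texts(client, ["semantic search", "vector database"])
    print(len(vectors), len(vectors[0]))
```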
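A vector database is a special type of database designed to store and process vector data. Its key feature is the ability to quickly find the vectors most similar to a given vector. It does this by computing a distance or similarity between vectors, for example Euclidean distance or cosine similarity.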
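The two measures named above can disagree. Euclidean distance is sensitive to vector length, while cosine similarity only looks at direction. For unit-length vectors they are interchangeable for ranking, because ‖a − b‖² = 2 − 2·cos(a, b). This matters in practice, since OpenAI states that its embeddings are normalized to length 1. A small pure-Python check:

```python
import math


def euclidean_distance(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def normalize(v: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


a, b = [3.0, 4.0], [6.0, 8.0]    # same direction, different length
print(euclidean_distance(a, b))  # 5.0: far apart in Euclidean terms
print(cosine_similarity(a, b))   # 1.0: identical direction

# For unit vectors: squared Euclidean distance == 2 - 2 * cosine similarity.
ua, ub = normalize([1.0, 2.0]), normalize([2.0, 1.0])
print(euclidean_distance(ua, ub) ** 2, 2 - 2 * cosine_similarity(ua, ub))
```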
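Vector databases vs. relational databases vs. non-relational databases
Vector databases are more efficient for large-scale, high-dimensional similarity tasks. Because they search directly in vector space, they can quickly find the vectors most similar to a given vector. Through embeddings, they can also work with unstructured data such as images and text. Relational databases are not designed to search such data by similarity.
Relational databases are the most common type of database. They store data in tables and link different tables through predefined relationships. Their main strength is guaranteeing data consistency and integrity. However, they can struggle with large-scale, high-dimensional data.
Non-relational databases, also called NoSQL databases, are a flexible type of database that can handle many kinds of data: structured, semi-structured, and unstructured. Their main strength is handling large volumes of data and scaling out horizontally with ease. However, they can struggle with complex queries and high-dimensional data.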
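What a vector database does at its core can be sketched in a few lines. The class below is a hypothetical, exact (brute-force) store written only for illustration. Production systems such as those listed next build on this idea with approximate-nearest-neighbour indexes, persistence, and filtering.

```python
import math


class InMemoryVectorStore:
    """Brute-force vector store: exact nearest-neighbour search by cosine similarity."""

    def __init__(self) -> None:
        self._ids: list[str] = []
        self._vectors: list[list[float]] = []

    @staticmethod
    def _unit(vector: list[float]) -> list[float]:
        norm = math.sqrt(sum(x * x for x in vector))
        return [x / norm for x in vector]

    def add(self, item_id: str, vector: list[float]) -> None:
        # Store unit-length copies so that a dot product equals cosine similarity.
        self._ids.append(item_id)
        self._vectors.append(self._unit(vector))

    def search(self, query: list[float], k: int = 3) -> list[tuple[str, float]]:
        q = self._unit(query)
        # O(n) scan over every stored vector; real vector databases avoid this with indexes.
        scores = [sum(a * b for a, b in zip(q, v)) for v in self._vectors]
        ranked = sorted(zip(self._ids, scores), key=lambda pair: pair[1], reverse=True)
        return ranked[:k]


store = InMemoryVectorStore()
store.add("doc-a", [1.0, 0.0])
store.add("doc-b", [0.7, 0.7])
store.add("doc-c", [0.0, 1.0])
print(store.search([0.9, 0.1], k=2))
```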
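Mainstream vector databases
Several vector databases are popular on the market, including Faiss, Milvus, Annoy, Pinecone, and Chroma. Strictly speaking, Faiss and Annoy are vector-search libraries rather than full database systems.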
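| Feature / Library | Faiss | Milvus | Annoy | Pinecone | Chroma |
| --- | --- | --- | --- | --- | --- |
| Developer | Facebook AI | Zilliz (open source) | Spotify (open source) | Pinecone (SaaS) | Chroma (open source) |
| Open source | Yes | Yes | Yes | No | Yes |
| Cloud support | No | Yes | No | Yes | Yes |
| Pros | High-performance similarity search; rich indexing options; active community | Powerful vector retrieval; supports multiple index algorithms; good community support | Lightweight and easy to use; memory-efficient; handles large datasets | Simple managed service; strong real-time search; scalable | High-performance vector search; easy-to-use API; automatic index optimization |
| Cons | Must be set up and managed yourself; steep learning curve | Must be set up and managed yourself; relatively complex configuration | Index cannot be modified once built; only basic functionality | Relatively expensive; cannot be self-hosted | Some features may be limited |
| Pricing | Free, open source | Free, open source | Free, open source | Usage-based billing | Free, open source |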
The following tools for text splitting and token counting are worth trying out first:
https://langchain-text-splitter.streamlit.app/
https://platform.openai.com/tokenizer
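The splitter demo above exposes two main knobs: chunk size and chunk overlap. The flow later in this article uses 1000 and 200 characters. A simplified fixed-window splitter shows what these knobs mean. Note that this is a sketch of our own, not LangChain's `RecursiveCharacterTextSplitter`, which additionally tries to break on separators such as paragraphs and sentences.

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(split_text("abcdefghij", chunk_size=4, chunk_overlap=1))
```

The overlap repeats the tail of one chunk at the head of the next. A sentence cut at a chunk boundary therefore still appears whole in at least one chunk, as long as it is shorter than the overlap.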
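The diagram is based on LangChain, and nearly every component in it is a plug-in from one of LangChain's six core modules. The process is: load the file -> read the text -> split the text -> vectorize the text chunks -> vectorize the question -> find the top k text vectors most similar to the question vector -> add the matched text to the prompt as context, together with the question -> submit the prompt to the LLM to generate the answer.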
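In the diagram's numbering, steps 1-11 form the retrieval stage, which relies on the vector database's ability to find similar content. Steps 12-15 form the generation stage: the LLM assembles the retrieved material into an answer and makes the output more readable. Traditional keyword-matching search could replace vector search for retrieval, and the later steps would still work, provided the retrieved text fits within the LLM's token limit.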
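The flow above can be sketched end to end in plain Python. Everything external is replaced by a hypothetical stand-in:
- `embed` is a hashed bag-of-words. Unlike a real embedding model, it reflects shared words, not meaning.
- Splitting produces one chunk per line.
- `echo_llm` simply returns the prompt.

The retriever is a parameter, which also illustrates the point above: swapping vector retrieval for keyword matching leaves the generation half untouched.

```python
import math
import re
import zlib

DIM = 256  # size of the stand-in embedding vectors (arbitrary choice)


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def embed(text: str) -> list[float]:
    """Stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in tokenize(text):
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def vector_retrieve(question: str, chunks: list[str], k: int) -> list[str]:
    """Vectorize chunks and question, then return the top k chunks by cosine similarity."""
    q = embed(question)
    scores = [sum(a * b for a, b in zip(q, embed(c))) for c in chunks]
    order = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    return [chunks[i] for i in order[:k]]


def keyword_retrieve(question: str, chunks: list[str], k: int) -> list[str]:
    """Traditional alternative: rank chunks by the number of shared distinct words."""
    q = set(tokenize(question))
    overlap = [len(q & set(tokenize(c))) for c in chunks]
    hits = [i for i in range(len(chunks)) if overlap[i] > 0]
    hits.sort(key=lambda i: overlap[i], reverse=True)
    return [chunks[i] for i in hits[:k]]


def answer(question: str, document: str, retrieve, llm, k: int = 2) -> str:
    chunks = [line.strip() for line in document.splitlines() if line.strip()]  # read + split
    context = retrieve(question, chunks, k)  # retrieval stage: top k chunks
    prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}\nAnswer:"
    return llm(prompt)  # generation stage: hand the prompt to the LLM


def echo_llm(prompt: str) -> str:
    """Hypothetical LLM stand-in that just returns its prompt."""
    return prompt


doc = """Milvus is an open-source vector database.
Pinecone is a managed vector database service.
Annoy is a library from Spotify for approximate nearest neighbours."""

for retriever in (vector_retrieve, keyword_retrieve):
    print(answer("Which library comes from Spotify?", doc, retriever, echo_llm, k=1))
```

For Chinese text, `\w+` would treat a whole run of characters as one token. Keyword matching on Chinese therefore needs a proper word segmenter.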
From the document's point of view, the process breaks down into the following five steps:
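… Top_K results.
… the Top_K results are mapped back to the original text.
Reference information: the context backfilled in step 3.b is used as reference material.
Query: the Query is used as the question material.
The original documents contain a lot of duplicated content. These duplicates are a poor fit for the LLM and add a great deal of unnecessary context.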
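For the duplicated-content problem, here is a minimal sketch of exact de-duplication. It assumes duplicates are verbatim repeats up to case and whitespace: normalize each chunk, hash it, and keep the first occurrence. Near-duplicates, such as lightly edited copies, would need a fuzzy method instead, for example MinHash or a similarity threshold on embeddings.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(chunks: list[str]) -> list[str]:
    """Keep the first occurrence of each chunk; drop later exact duplicates."""
    seen: set[str] = set()
    unique = []
    for chunk in chunks:
        digest = hashlib.sha256(normalize(chunk).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(chunk)
    return unique


print(deduplicate(["Refund policy: 30 days.", "refund  policy: 30 days.", "Shipping takes 5 days."]))
```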
Solution:
If different sources give different answers to the same question, the information conflicts. Passing all of this data to the LLM may confuse it.
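One possible mitigation, sketched here under stated assumptions: each chunk carries hypothetical `source` and `topic` metadata. A hand-written priority table then decides which source wins when several sources answer the same topic. Whether such a ranking exists, and how topics are assigned, depends entirely on the application.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str
    topic: str  # hypothetical metadata: which question/topic the chunk answers


# Hypothetical trust ranking; a lower number means a more authoritative source.
SOURCE_PRIORITY = {"official_docs": 0, "internal_wiki": 1, "forum": 2}


def source_rank(chunk: Chunk) -> int:
    # Unknown sources rank below every listed source.
    return SOURCE_PRIORITY.get(chunk.source, len(SOURCE_PRIORITY))


def resolve_conflicts(chunks: list[Chunk]) -> list[Chunk]:
    """For each topic, keep only the chunk from the most authoritative source."""
    best: dict[str, Chunk] = {}
    for chunk in chunks:
        current = best.get(chunk.topic)
        if current is None or source_rank(chunk) < source_rank(current):
            best[chunk.topic] = chunk
    return list(best.values())


retrieved = [
    Chunk("Refunds are accepted within 14 days.", "forum", "refund_window"),
    Chunk("Refunds are accepted within 30 days.", "official_docs", "refund_window"),
    Chunk("Shipping takes 5 days.", "internal_wiki", "shipping"),
]
for c in resolve_conflicts(retrieved):
    print(c.source, "->", c.text)
```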
Solution:
Documents may keep changing and acquire new versions over time; in practice, the LLM should be given the most up-to-date information.
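Here is a minimal sketch of one option. It assumes every chunk carries hypothetical `doc_id` and `updated` metadata. Before building the prompt, keep only the chunks from the newest version of each document. Another common approach is to overwrite (upsert) or delete a document's old vectors whenever a new version is ingested, so stale chunks never reach retrieval.

```python
from datetime import date


def latest_versions(chunks: list[dict]) -> list[dict]:
    """Keep only chunks that belong to the newest version of each document."""
    newest: dict[str, date] = {}
    for c in chunks:
        doc_id, updated = c["doc_id"], c["updated"]
        if doc_id not in newest or updated > newest[doc_id]:
            newest[doc_id] = updated
    return [c for c in chunks if c["updated"] == newest[c["doc_id"]]]


chunks = [
    {"doc_id": "pricing", "updated": date(2023, 1, 5), "text": "Basic plan: $10/month"},
    {"doc_id": "pricing", "updated": date(2023, 9, 1), "text": "Basic plan: $12/month"},
    {"doc_id": "faq", "updated": date(2023, 3, 2), "text": "Support hours: 9-17"},
]
for c in latest_versions(chunks):
    print(c["doc_id"], c["text"])
```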
Solution:
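In some cases, the user's question is more about metadata than about the content itself.
For example, a user might ask for "movies about aliens made in 1980". The "movies about aliens" part can be handled by semantic search, but "made in 1980" calls for exact matching to filter the results.
Many vector stores support filtering on metadata before the similarity query runs. If your vector store does not support pre-query metadata filtering, filtering the results after the semantic search is also a workable option.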
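The difference between filtering before and after the similarity search can be sketched with a toy catalogue. The titles, years, and two-dimensional vectors are all hypothetical. With `k=1`, pre-filtering returns the best 1980 match. Post-filtering picks the overall best match first, then discards it because its year is wrong, leaving nothing.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def matches(metadata: dict, where: dict) -> bool:
    """Exact-match metadata filter, e.g. where={"year": 1980}."""
    return all(metadata.get(key) == value for key, value in where.items())


def search(query: list[float], items: list, k: int, where: dict, prefilter: bool = True) -> list:
    """items is a list of (vector, metadata) pairs."""
    def score(item):
        return cosine(query, item[0])

    if prefilter:
        # Pre-filter: keep only items that satisfy the metadata, then rank them.
        candidates = [it for it in items if matches(it[1], where)]
        return sorted(candidates, key=score, reverse=True)[:k]
    # Post-filter: rank everything, keep the top k, then drop non-matching hits.
    top = sorted(items, key=score, reverse=True)[:k]
    return [it for it in top if matches(it[1], where)]


# Hypothetical catalogue; vector dimensions loosely mean (aliens, romance).
movies = [
    ([0.90, 0.10], {"title": "Movie A", "year": 1979}),
    ([0.95, 0.05], {"title": "Movie B", "year": 1986}),
    ([0.80, 0.20], {"title": "Movie C", "year": 1980}),
    ([0.10, 0.90], {"title": "Movie D", "year": 1980}),
]
query = [1.0, 0.0]  # stand-in embedding for "movies about aliens"

for pre in (True, False):
    hits = search(query, movies, k=1, where={"year": 1980}, prefilter=pre)
    print("pre-filter" if pre else "post-filter", [m["title"] for _, m in hits])
```

Post-filtering can return fewer than k results, so setups that post-filter usually fetch more than k candidates before filtering.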
Solution:
Users may ask several questions at once, which makes semantic search harder.
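Here is a naive sketch, assuming the questions are separated by question marks. Split the query into sub-questions, retrieve for each one separately, then merge the hits. `fake_retrieve` is a hypothetical placeholder for a real retriever. A more robust variant would ask an LLM to decompose the query, since users do not always separate their questions with question marks.

```python
import re


def split_questions(query: str) -> list[str]:
    """Naively split a compound query on ASCII or full-width question marks."""
    parts = re.split(r"[?？]", query)
    return [part.strip() + "?" for part in parts if part.strip()]


def retrieve_per_question(query: str, retrieve, k: int = 2) -> list[str]:
    """Run retrieval once per sub-question and merge the hits, dropping duplicates."""
    merged: list[str] = []
    for sub_question in split_questions(query):
        for hit in retrieve(sub_question, k):
            if hit not in merged:
                merged.append(hit)
    return merged


def fake_retrieve(question: str, k: int) -> list[str]:
    """Hypothetical retriever used only for this demo."""
    return [f"passage for: {question}"][:k]


print(retrieve_per_question("What is Milvus? Who develops Annoy?", fake_retrieve))
```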
Solution:
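As the figure shows, the index details are listed here; the index's TITLE, ENV and Key are the key pieces of information needed to connect to the index.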
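Rather than typing these values into the flow's `pinecone_api_key`, `pinecone_env` and `index_name` fields, you can keep them out of the exported JSON by reading them from the environment. Below is a minimal sketch. The variable names `PINECONE_ENV` and `PINECONE_INDEX`, the `PineconeSettings` class, and the mapping of TITLE to the index name are our own assumptions.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class PineconeSettings:
    api_key: str      # the index's Key
    environment: str  # ENV, e.g. "gcp-starter" in the flow below
    index_name: str   # presumably the TITLE shown for the index


REQUIRED_VARS = ("PINECONE_API_KEY", "PINECONE_ENV", "PINECONE_INDEX")  # our own naming


def load_pinecone_settings() -> PineconeSettings:
    """Read connection details from environment variables instead of hardcoding them."""
    missing = [name for name in REQUIRED_VARS if not os.environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return PineconeSettings(
        api_key=os.environ["PINECONE_API_KEY"],
        environment=os.environ["PINECONE_ENV"],
        index_name=os.environ["PINECONE_INDEX"],
    )
```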
{
"name": "PDF Loader-demo",
"description": "Load a PDF and start asking questions about it.",
"data": {
"nodes": [
{
"width": 384,
"height": 267,
"id": "VectorStoreAgent-oWxqW",
"type": "genericNode",
"position": {
"x": 1759.0521504033006,
"y": -1084.8109307754983
},
"data": {
"type": "VectorStoreAgent",
"node": {
"template": {
"llm": {
"required": true,
"placeholder": "",
"show": true,
"multiline": false,
"password": false,
"name": "llm",
"display_name": "LLM",
"advanced": false,
"dynamic": false,
"info": "",
"type": "BaseLanguageModel",
"list": false
},
"vectorstoreinfo": {
"required": true,
"placeholder": "",
"show": true,
"multiline": false,
"password": false,
"name": "vectorstoreinfo",
"display_name": "Vector Store Info",
"advanced": false,
"dynamic": false,
"info": "",
"type": "VectorStoreInfo",
"list": false
},
"_type": "vectorstore_agent"
},
"description": "Construct an agent from a Vector Store.",
"base_classes": [
"AgentExecutor"
],
"display_name": "VectorStoreAgent",
"documentation": ""
},
"id": "VectorStoreAgent-oWxqW",
"value": null
},
"selected": false,
"positionAbsolute": {
"x": 1759.0521504033006,
"y": -1084.8109307754983
}
},
{
"width": 384,
"height": 399,
"id": "VectorStoreInfo-xaM04",
"type": "genericNode",
"position": {
"x": 1196.8213224104938,
"y": -1126.393770900602
},
"data": {
"type": "VectorStoreInfo",
"node": {
"template": {
"vectorstore": {
"required": true,
"placeholder": "",
"show": true,
"multiline": false,
"password": false,
"name": "vectorstore",
"advanced": false,
"dynamic": false,
"info": "",
"type": "VectorStore",
"list": false
},
"description": {
"required": true,
"placeholder": "",
"show": true,
"multiline": true,
"password": false,
"name": "description",
"advanced": false,
"dynamic": false,
"info": "",
"type": "str",
"list": false,
"value": "Information about a PDF File"
},
"name": {
"required": true,
"placeholder": "",
"show": true,
"multiline": false,
"password": false,
"name": "name",
"advanced": false,
"dynamic": false,
"info": "",
"type": "str",
"list": false,
"value": "PDF"
},
"_type": "VectorStoreInfo"
},
"description": "Information about a VectorStore.",
"base_classes": [
"VectorStoreInfo"
],
"display_name": "VectorStoreInfo",
"documentation": ""
},
"id": "VectorStoreInfo-xaM04",
"value": null
},
"selected": false,
"positionAbsolute": {
"x": 1196.8213224104938,
"y": -1126.393770900602
},
"dragging": false
},
{
"width": 384,
"height": 359,
"id": "OpenAIEmbeddings-CQwCi",
"type": "genericNode",
"position": {
"x": 246.115851921792,
"y": -523.4791223726195
},
"data": {
"type": "OpenAIEmbeddings",
"node": {
"template": {
"allowed_special": {
"required": false,
"placeholder": "",
"show": true,
"multiline": false,
"value": [],
"password": false,
"name": "allowed_special",
"advanced": true,
"dynamic": false,
"info": "",
"type": "Literal'all'",
"list": true
},
"disallowed_special": {
"required": false,
"placeholder": "",
"show": true,
"multiline": false,
"value": "all",
"password": false,
"name": "disallowed_special",
"advanced": true,
"dynamic": false,
"info": "",
"type": "Literal'all'",
"list": true
},
"chunk_size": {
"required": false,
"placeholder": "",
"show": true,
"multiline": false,
"value": 1000,
"password": false,
"name": "chunk_size",
"advanced": true,
"dynamic": false,
"info": "",
"type": "int",
"list": false
},
"client": {
"required": false,
"placeholder": "",
"show": true,
"multiline": false,
"password": false,
"name": "client",
"advanced": true,
"dynamic": false,
"info": "",
"type": "Any",
"list": false
},
"deployment": {
"required": false,
"placeholder": "",
"show": true,
"multiline": false,
"value": "text-embedding-ada-002",
"password": false,
"name": "deployment",
"advanced": true,
"dynamic": false,
"info": "",
"type": "str",
"list": false
},
"embedding_ctx_length": {
"required": false,
"placeholder": "",
"show": true,
"multiline": false,
"value": 8191,
"password": false,
"name": "embedding_ctx_length",
"advanced": true,
"dynamic": false,
"info": "",
"type": "int",
"list": false
},
"headers": {
"required": false,
"placeholder": "",
"show": false,
"multiline": true,
"value": "{'Authorization':\n 'Bearer <token>'}",
"password": false,
"name": "headers",
"advanced": true,
"dynamic": false,
"info": "",
"type": "Any",
"list": false
},
"max_retries": {
"required": false,
"placeholder": "",
"show": true,
"multiline": false,
"value": 6,
"password": false,
"name": "max_retries",
"advanced": true,
"dynamic": false,
"info": "",
"type": "int",
"list": false
},
"model": {
"required": false,
"placeholder": "",
"show": true,
"multiline": false,
"value": "text-embedding-ada-002",
"password": false,
"name": "model",
"advanced": true,
"dynamic": false,
"info": "",
"type": "str",
"list": false
},
"model_kwargs": {
"required": false,
"placeholder": "",
"show": true,
"multiline": false,
"password": false,
"name": "model_kwargs",
"advanced": true,
"dynamic": false,
"info": "",
"type": "code",
"list": false
},
"openai_api_base": {
"required": false,
"placeholder": "",
"show": true,
"multiline": false,
"password": true,
"name": "openai_api_base",
"display_name": "OpenAI API Base",
"advanced": true,
"dynamic": false,
"info": "",
"type": "str",
"list": false,
"value": ""
},
"openai_api_key": {
"required": false,
"placeholder": "",
"show": true,
"multiline": false,
"value": "",
"password": true,
"name": "openai_api_key",
"display_name": "OpenAI API Key",
"advanced": false,
"dynamic": false,
"info": "",
"type": "str",
"list": false
},
"openai_api_type": {
"required": false,
"placeholder": "",
"show": true,
"multiline": false,
"password": true,
"name": "openai_api_type",
"display_name": "OpenAI API Type",
"advanced": true,
"dynamic": false,
"info": "",
"type": "str",
"list": false,
"value": ""
},
"openai_api_version": {
"required": false,
"placeholder": "",
"show": true,
"multiline": false,
"password": true,
"name": "openai_api_version",
"display_name": "OpenAI API Version",
"advanced": true,
"dynamic": false,
"info": "",
"type": "str",
"list": false,
"value": ""
},
"openai_organization": {
"required": false,
"placeholder": "",
"show": true,
"multiline": false,
"password": false,
"name": "openai_organization",
"display_name": "OpenAI Organization",
"advanced": true,
"dynamic": false,
"info": "",
"type": "str",
"list": false
},
"openai_proxy": {
"required": false,
"placeholder": "",
"show": true,
"multiline": false,
"password": false,
"name": "openai_proxy",
"display_name": "OpenAI Proxy",
"advanced": true,
"dynamic": false,
"info": "",
"type": "str",
"list": false
},
"request_timeout": {
"required": false,
"placeholder": "",
"show": true,
"multiline": false,
"password": false,
"name": "request_timeout",
"advanced": true,
"dynamic": false,
"info": "",
"type": "float",
"list": false
},
"show_progress_bar": {
"required": false,
"placeholder": "",
"show": true,
"multiline": false,
"value": false,
"password": false,
"name": "show_progress_bar",
"advanced": true,
"dynamic": false,
"info": "",
"type": "bool",
"list": false
},
"tiktoken_model_name": {
"required": false,
"placeholder": "",
"show": true,
"multiline": false,
"password": true,
"name": "tiktoken_model_name",
"advanced": false,
"dynamic": false,
"info": "",
"type": "str",
"list": false,
"value": ""
},
"_type": "OpenAIEmbeddings"
},
"description": "OpenAI embedding models.",
"base_classes": [
"OpenAIEmbeddings",
"Embeddings"
],
"display_name": "OpenAIEmbeddings",
"documentation": "https://python.langchain.com/docs/modules/data_connection/text_embedding/integrations/openai"
},
"id": "OpenAIEmbeddings-CQwCi",
"value": null
},
"selected": false,
"positionAbsolute": {
"x": 246.115851921792,
"y": -523.4791223726195
},
"dragging": false
},
{
"width": 384,
"height": 575,
"id": "RecursiveCharacterTextSplitter-O1O0g",
"type": "genericNode",
"position": {
"x": 248.90133783569058,
"y": -1150.9950743649817
},
"data": {
"type": "RecursiveCharacterTextSplitter",
"node": {
"template": {
"documents": {
"required": true,
"placeholder": "",
"show": true,
"multiline": false,
"password": false,
"name": "documents",
"advanced": false,
"dynamic": false,
"info": "",
"type": "Document",
"list": true
},
"chunk_overlap": {
"required": true,
"placeholder": "",
"show": true,
"multiline": false,
"value": 200,
"password": false,
"name": "chunk_overlap",
"display_name": "Chunk Overlap",
"advanced": false,
"dynamic": false,
"info": "",
"type": "int",
"list": false
},
"chunk_size": {
"required": true,
"placeholder": "",
"show": true,
"multiline": false,
"value": 1000,
"password": false,
"name": "chunk_size",
"display_name": "Chunk Size",
"advanced": false,
"dynamic": false,
"info": "",
"type": "int",
"list": false
},
"separator_type": {
"required": true,
"placeholder": "",
"show": true,
"multiline": false,
"value": "Text",
"password": false,
"options": [
"Text",
"cpp",
"go",
"html",
"java",
"js",
"latex",
"markdown",
"php",
"proto",
"python",
"rst",
"ruby",
"rust",
"scala",
"sol",
"swift"
],
"name": "separator_type",
"display_name": "Separator Type",
"advanced": false,
"dynamic": false,
"info": "",
"type": "str",
"list": true
},
"separators": {
"required": true,
"placeholder": "",
"show": true,
"multiline": false,
"value": ".",
"password": false,
"name": "separators",
"display_name": "Separator",
"advanced": false,
"dynamic": false,
"info": "",
"type": "str",
"list": false
},
"_type": "RecursiveCharacterTextSplitter"
},
"description": "Splitting text by recursively look at characters.",
"base_classes": [
"Document"
],
"display_name": "RecursiveCharacterTextSplitter",
"custom_fields": {},
"output_types": [
"Document"
],
"documentation": "https://python.langchain.com/docs/modules/data_connection/document_transformers/text_splitters/recursive_text_splitter"
},
"id": "RecursiveCharacterTextSplitter-O1O0g",
"value": null
},
"selected": false,
"positionAbsolute": {
"x": 248.90133783569058,
"y": -1150.9950743649817
},
"dragging": false
},
{
"width": 384,
"height": 621,
"id": "ChatOpenAI-y0z8v",
"type": "genericNode",
"position": {
"x": 1201.3143261061039,
"y": -704.8915816630376
},
"data": {
"type": "ChatOpenAI",
"node": {
"template": {
"callbacks": {
"required": false,
"placeholder": "",
"show": false,
"multiline": false,
"password": false,
"name": "callbacks",
"advanced": false,
"dynamic": false,
"info": "",
"type": "langchain.callbacks.base.BaseCallbackHandler",
"list": true
},
"cache": {
"required": false,
"placeholder": "",
"show": false,
"multiline": false,
"password": false,
"name": "cache",
"advanced": false,
"dynamic": false,
"info": "",
"type": "bool",
"list": false
},
"client": {
"required": false,
"placeholder": "",
"show": false,
"multiline": false,
"password": false,
"name": "client",
"advanced": false,
"dynamic": false,
"info": "",
"type": "Any",
"list": false
},
"max_retries": {
"required": false,
"placeholder": "",
"show": false,
"multiline": false,
"value": 6,
"password": false,
"name": "max_retries",
"advanced": false,
"dynamic": false,
"info": "",
"type": "int",
"list": false
},
"max_tokens": {
"required": false,
"placeholder": "",
"show": true,
"multiline": false,
"password": true,
"name": "max_tokens",
"advanced": false,
"dynamic": false,
"info": "",
"type": "int",
"list": false,
"value": ""
},
"metadata": {
"required": false,
"placeholder": "",
"show": false,
"multiline": false,
"password": false,
"name": "metadata",
"advanced": false,
"dynamic": false,
"info": "",
"type": "code",
"list": false
},
"model_kwargs": {
"required": false,
"placeholder": "",
"show": true,
"multiline": false,
"password": false,
"name": "model_kwargs",
"advanced": true,
"dynamic": false,
"info": "",
"type": "code",
"list": false
},
"model_name": {
"required": false,
"placeholder": "",
"show": true,
"multiline": false,
"value": "gpt-3.5-turbo-0613",
"password": false,
"options": [
"gpt-3.5-turbo-0613",
"gpt-3.5-turbo",
"gpt-3.5-turbo-16k-0613",
"gpt-3.5-turbo-16k",
"gpt-4-0613",
"gpt-4-32k-0613",
"gpt-4",
"gpt-4-32k"
],
"name": "model_name",
"advanced": false,
"dynamic": false,
"info": "",
"type": "str",
"list": true
},
"n": {
"required": false,
"placeholder": "",
"show": false,
"multiline": false,
"value": 1,
"password": false,
"name": "n",
"advanced": false,
"dynamic": false,
"info": "",
"type": "int",
"list": false
},
"openai_api_base": {
"required": false,
"placeholder": "",
"show": true,
"multiline": false,
"password": false,
"name": "openai_api_base",
"display_name": "OpenAI API Base",
"advanced": false,
"dynamic": false,
"info": "\nThe base URL of the OpenAI API. Defaults to https://api.openai.com/v1.\n\nYou can change this to use other APIs like JinaChat, LocalAI and Prem.\n",
"type": "str",
"list": false
},
"openai_api_key": {
"required": false,
"placeholder": "",
"show": true,
"multiline": false,
"value": "",
"password": true,
"name": "openai_api_key",
"display_name": "OpenAI API Key",
"advanced": false,
"dynamic": false,
"info": "",
"type": "str",
"list": false
},
"openai_organization": {
"required": false,
"placeholder": "",
"show": false,
"multiline": false,
"password": false,
"name": "openai_organization",
"display_name": "OpenAI Organization",
"advanced": false,
"dynamic": false,
"info": "",
"type": "str",
"list": false
},
"openai_proxy": {
"required": false,
"placeholder": "",
"show": false,
"multiline": false,
"password": false,
"name": "openai_proxy",
"display_name": "OpenAI Proxy",
"advanced": false,
"dynamic": false,
"info": "",
"type": "str",
"list": false
},
"request_timeout": {
"required": false,
"placeholder": "",
"show": false,
"multiline": false,
"password": false,
"name": "request_timeout",
"advanced": false,
"dynamic": false,
"info": "",
"type": "float",
"list": false
},
"streaming": {
"required": false,
"placeholder": "",
"show": false,
"multiline": false,
"value": false,
"password": false,
"name": "streaming",
"advanced": false,
"dynamic": false,
"info": "",
"type": "bool",
"list": false
},
"tags": {
"required": false,
"placeholder": "",
"show": false,
"multiline": false,
"password": false,
"name": "tags",
"advanced": false,
"dynamic": false,
"info": "",
"type": "str",
"list": true
},
"temperature": {
"required": false,
"placeholder": "",
"show": true,
"multiline": false,
"value": "0.2",
"password": false,
"name": "temperature",
"advanced": false,
"dynamic": false,
"info": "",
"type": "float",
"list": false
},
"tiktoken_model_name": {
"required": false,
"placeholder": "",
"show": false,
"multiline": false,
"password": false,
"name": "tiktoken_model_name",
"advanced": false,
"dynamic": false,
"info": "",
"type": "str",
"list": false
},
"verbose": {
"required": false,
"placeholder": "",
"show": false,
"multiline": false,
"value": false,
"password": false,
"name": "verbose",
"advanced": false,
"dynamic": false,
"info": "",
"type": "bool",
"list": false
},
"_type": "ChatOpenAI"
},
"description": "`OpenAI` Chat large language models API.",
"base_classes": [
"ChatOpenAI",
"BaseChatModel",
"BaseLanguageModel",
"BaseLLM"
],
"display_name": "ChatOpenAI",
"custom_fields": {},
"output_types": [],
"documentation": "https://python.langchain.com/docs/modules/model_io/models/chat/integrations/openai"
},
"id": "ChatOpenAI-y0z8v",
"value": null
},
"selected": false,
"positionAbsolute": {
"x": 1201.3143261061039,
"y": -704.8915816630376
}
},
{
"width": 384,
"height": 379,
"id": "PyPDFLoader-my24T",
"type": "genericNode",
"position": {
"x": -249.89545919397153,
"y": -1327.2789565489504
},
"data": {
"type": "PyPDFLoader",
"node": {
"template": {
"file_path": {
"required": true,
"placeholder": "",
"show": true,
"multiline": false,
"value": "xxxxxxxxxxxxxx.pdf",
"suffixes": [
".pdf"
],
"password": false,
"name": "file_path",
"advanced": false,
"dynamic": false,
"info": "",
"type": "file",
"list": false,
"fileTypes": [
"pdf"
],
"file_path": "/root/.cache/langflow/cc058308-260c-4176-9c4a-3dc89e7724b4/ecdebbde4d4748094cc738da70630e62342bbaf6ec2fb7176dd124cac2fbb3e1"
},
"metadata": {
"required": true,
"placeholder": "",
"show": true,
"multiline": false,
"value": "{}",
"password": false,
"name": "metadata",
"display_name": "Metadata",
"advanced": false,
"dynamic": false,
"info": "",
"type": "code",
"list": false
},
"_type": "PyPDFLoader"
},
"description": "Load `PDF using `pypdf` and chunks at character level.",
"base_classes": [
"Document"
],
"display_name": "PyPDFLoader",
"custom_fields": {},
"output_types": [
"Document"
],
"documentation": "https://python.langchain.com/docs/modules/data_connection/document_loaders/how_to/pdf"
},
"id": "PyPDFLoader-my24T",
"value": null
},
"selected": true,
"positionAbsolute": {
"x": -249.89545919397153,
"y": -1327.2789565489504
},
"dragging": false
},
{
"width": 384,
"height": 525,
"id": "Pinecone-0Rzmp",
"type": "genericNode",
"position": {
"x": 726.5519511332589,
"y": -721.4659012184297
},
"data": {
"type": "Pinecone",
"node": {
"template": {
"documents": {
"required": false,
"placeholder": "",
"show": true,
"multiline": false,
"password": false,
"name": "documents",
"display_name": "Documents",
"advanced": false,
"dynamic": false,
"info": "",
"type": "Document",
"list": true
},
"embedding": {
"required": true,
"placeholder": "",
"show": true,
"multiline": false,
"password": false,
"name": "embedding",
"display_name": "Embedding",
"advanced": false,
"dynamic": false,
"info": "",
"type": "Embeddings",
"list": false
},
"batch_size": {
"required": false,
"placeholder": "",
"show": false,
"multiline": false,
"value": 32,
"password": false,
"name": "batch_size",
"advanced": false,
"dynamic": false,
"info": "",
"type": "int",
"list": false
},
"ids": {
"required": false,
"placeholder": "",
"show": false,
"multiline": false,
"password": false,
"name": "ids",
"advanced": false,
"dynamic": false,
"info": "",
"type": "str",
"list": true
},
"index_name": {
"required": false,
"placeholder": "",
"show": true,
"multiline": false,
"password": false,
"name": "index_name",
"advanced": false,
"dynamic": false,
"info": "",
"type": "str",
"list": false,
"value": "ming"
},
"metadatas": {
"required": false,
"placeholder": "",
"show": false,
"multiline": false,
"password": false,
"name": "metadatas",
"advanced": false,
"dynamic": false,
"info": "",
"type": "code",
"list": true
},
"namespace": {
"required": false,
"placeholder": "",
"show": true,
"multiline": false,
"password": false,
"name": "namespace",
"advanced": false,
"dynamic": false,
"info": "",
"type": "str",
"list": false,
"value": ""
},
"pinecone_api_key": {
"required": false,
"placeholder": "",
"show": true,
"multiline": false,
"value": "a3d5f648-f41b-4453-836d-996442bf76ea",
"password": false,
"name": "pinecone_api_key",
"advanced": false,
"dynamic": false,
"info": "",
"type": "str",
"list": false
},
"pinecone_env": {
"required": false,
"placeholder": "",
"show": true,
"multiline": false,
"value": "gcp-starter",
"password": false,
"name": "pinecone_env",
"advanced": true,
"dynamic": false,
"info": "",
"type": "str",
"list": false
},
"search_kwargs": {
"required": false,
"placeholder": "",
"show": true,
"multiline": false,
"value": "{}",
"password": false,
"name": "search_kwargs",
"advanced": true,
"dynamic": false,
"info": "",
"type": "code",
"list": false
},
"text_key": {
"required": false,
"placeholder": "",
"show": false,
"multiline": false,
"value": "",
"password": true,
"name": "text_key",
"advanced": false,
"dynamic": false,
"info": "",
"type": "str",
"list": false
},
"upsert_kwargs": {
"required": false,
"placeholder": "",
"show": false,
"multiline": false,
"password": false,
"name": "upsert_kwargs",
"advanced": true,
"dynamic": false,
"info": "",
"type": "code",
"list": false
},
"_type": "Pinecone"
},
"description": "Construct Pinecone wrapper from raw documents.",
"base_classes": [
"Pinecone",
"VectorStore",
"BaseRetriever",
"VectorStoreRetriever"
],
"display_name": "Pinecone",
"custom_fields": {},
"output_types": [],
"documentation": "https://python.langchain.com/docs/modules/data_connection/vectorstores/integrations/pinecone",
"beta": false,
"error": null
},
"id": "Pinecone-0Rzmp"
},
"selected": false,
"positionAbsolute": {
"x": 726.5519511332589,
"y": -721.4659012184297
},
"dragging": false
}
],
"edges": [
{
"source": "VectorStoreInfo-xaM04",
"target": "VectorStoreAgent-oWxqW",
"sourceHandle": "VectorStoreInfo|VectorStoreInfo-xaM04|VectorStoreInfo",
"targetHandle": "VectorStoreInfo|vectorstoreinfo|VectorStoreAgent-oWxqW",
"id": "reactflow__edge-VectorStoreInfo-xaM04VectorStoreInfo|VectorStoreInfo-xaM04|VectorStoreInfo-VectorStoreAgent-oWxqWVectorStoreInfo|vectorstoreinfo|VectorStoreAgent-oWxqW",
"style": {
"stroke": "#555"
},
"className": "",
"animated": false,
"selected": false
},
{
"source": "ChatOpenAI-y0z8v",
"target": "VectorStoreAgent-oWxqW",
"sourceHandle": "ChatOpenAI|ChatOpenAI-y0z8v|ChatOpenAI|BaseChatModel|BaseLanguageModel|BaseLLM",
"targetHandle": "BaseLanguageModel|llm|VectorStoreAgent-oWxqW",
"id": "reactflow__edge-ChatOpenAI-y0z8vChatOpenAI|ChatOpenAI-y0z8v|ChatOpenAI|BaseChatModel|BaseLanguageModel|BaseLLM-VectorStoreAgent-oWxqWBaseLanguageModel|llm|VectorStoreAgent-oWxqW",
"style": {
"stroke": "#555"
},
"className": "",
"animated": false,
"selected": false
},
{
"source": "PyPDFLoader-my24T",
"sourceHandle": "PyPDFLoader|PyPDFLoader-my24T|Document",
"target": "RecursiveCharacterTextSplitter-O1O0g",
"targetHandle": "Document|documents|RecursiveCharacterTextSplitter-O1O0g",
"style": {
"stroke": "#555"
},
"className": "",
"animated": false,
"id": "reactflow__edge-PyPDFLoader-my24TPyPDFLoader|PyPDFLoader-my24T|Document-RecursiveCharacterTextSplitter-O1O0gDocument|documents|RecursiveCharacterTextSplitter-O1O0g",
"selected": false
},
{
"source": "RecursiveCharacterTextSplitter-O1O0g",
"sourceHandle": "RecursiveCharacterTextSplitter|RecursiveCharacterTextSplitter-O1O0g|Document",
"target": "Pinecone-0Rzmp",
"targetHandle": "Document|documents|Pinecone-0Rzmp",
"style": {
"stroke": "#555"
},
"className": "",
"animated": false,
"id": "reactflow__edge-RecursiveCharacterTextSplitter-O1O0gRecursiveCharacterTextSplitter|RecursiveCharacterTextSplitter-O1O0g|Document-Pinecone-0RzmpDocument|documents|Pinecone-0Rzmp"
},
{
"source": "OpenAIEmbeddings-CQwCi",
"sourceHandle": "OpenAIEmbeddings|OpenAIEmbeddings-CQwCi|OpenAIEmbeddings|Embeddings",
"target": "Pinecone-0Rzmp",
"targetHandle": "Embeddings|embedding|Pinecone-0Rzmp",
"style": {
"stroke": "#555"
},
"className": "",
"animated": false,
"id": "reactflow__edge-OpenAIEmbeddings-CQwCiOpenAIEmbeddings|OpenAIEmbeddings-CQwCi|OpenAIEmbeddings|Embeddings-Pinecone-0RzmpEmbeddings|embedding|Pinecone-0Rzmp"
},
{
"source": "Pinecone-0Rzmp",
"sourceHandle": "Pinecone|Pinecone-0Rzmp|Pinecone|VectorStore|BaseRetriever|VectorStoreRetriever",
"target": "VectorStoreInfo-xaM04",
"targetHandle": "VectorStore|vectorstore|VectorStoreInfo-xaM04",
"style": {
"stroke": "#555"
},
"className": "",
"animated": false,
"id": "reactflow__edge-Pinecone-0RzmpPinecone|Pinecone-0Rzmp|Pinecone|VectorStore|BaseRetriever|VectorStoreRetriever-VectorStoreInfo-xaM04VectorStore|vectorstore|VectorStoreInfo-xaM04"
}
],
"viewport": {
"x": 183.0697158912123,
"y": 741.0434923492705,
"zoom": 0.4953951104905531
}
},
"id": "cc058308-260c-4176-9c4a-3dc89e7724b4",
"user_id": "6052609e-a1b3-4a8d-9d64-de8b6d87f5e2"
}
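The JSON above appears to be a flow exported from Langflow; note the `genericNode` node type and the `langflow` cache path. Its wiring is easier to follow as a list of edges. The small stdlib-only sketch below reads such an export. It relies only on fields visible above: `data.nodes[].id`, `data.nodes[].data.type`, and `data.edges[].source/target`.

```python
import json
import sys


def describe_flow(flow_json: str) -> list[str]:
    """Return each edge of an exported flow as 'SourceType -> TargetType'."""
    data = json.loads(flow_json)["data"]
    node_types = {node["id"]: node["data"]["type"] for node in data["nodes"]}
    return [f"{node_types[e['source']]} -> {node_types[e['target']]}" for e in data["edges"]]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python describe_flow.py flow.json
    with open(sys.argv[1], encoding="utf-8") as f:
        for line in describe_flow(f.read()):
            print(line)
```

Run on the flow above, it lists six edges, including PyPDFLoader -> RecursiveCharacterTextSplitter, OpenAIEmbeddings -> Pinecone and ChatOpenAI -> VectorStoreAgent.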
For details of the project, see https://github.com/chatchat-space/Langchain-Chatchat