AI Agent in Practice (Built on the LangChain Framework)

阿司匹林

2025-10-31 16:37:09

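Summary

From AI that answers to AI that acts: LangChain is building the underlying infrastructure for the agent revolution. This article walks through the complete construction of a news-editor agent and shows how it combines models, memory, and external tools like building blocks, giving the AI the "hands and feet" to fetch web pages and analyze data.

Over the past two years, AI has undergone a notable shift, moving from a question-answering mode to an action mode.

Today's AI agents can crawl web pages, summarize documents, call APIs, and analyze data for you.

One of the pieces of infrastructure that makes all of this feasible is LangChain.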

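I. What Problem Does LangChain Solve?

LangChain's design philosophy is simple:

Language models are powerful but hard to use;

LangChain's goal is to make them programmable and composable.

In other words, it is an open-source framework that helps developers build intelligent agents by combining components (models, prompts, memory, tools) like building blocks.

Three Core Advantages

Model integration: easily connect to mainstream LLMs from China and abroad (such as GPT and Gemini), so you can stand on the shoulders of giants.

Context management and memory: through context management, an agent can remember the content of long earlier conversations, which makes interactions more coherent and natural.

Tools and the ability to act: LangChain lets you wrap any external API as a tool that the LLM can call, giving the agent the ability to interact with the real world.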

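II. From Scratch: Building an AI Agent That Fetches News and Summarizes It

1. Define the System Prompt

We want to turn the model into a seasoned news editor with the following capabilities:

Accept a news URL;

Automatically fetch the main text of the page;

Extract the key information;

Output a summary (optional: include the article's sentiment or news category).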

SYSTEM_PROMPT = """You are a professional news summarization agent.

Your task:
- Retrieve the news article from a given URL.
- Extract the main content (ignore ads, comments, unrelated links).
- Summarize the key points in clear, concise English.
- Optionally identify the sentiment (positive, negative, neutral) and topic (politics, tech, sports, etc).

You have access to two tools:
- fetch_news_html: fetches the HTML content of a given URL.
- extract_text_from_html: extracts main readable text from the raw HTML.

If the user gives you a URL, first use fetch_news_html to get the HTML, 
then use extract_text_from_html to clean it, 
and finally produce a structured summary of the content."""

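The prompt tells the model that two tools are available: one to fetch the web page and one to extract the main text.

It also specifies the order of operations (fetch first → then clean → then summarize).

2. Create the Tools

We need two core tools:

1. fetch_news_html: fetches the page HTML for a given URL;

2. extract_text_from_html: extracts the main text from the HTML.

LangChain's @tool decorator automatically generates a description of each tool function, so the model knows how to call it.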

import requests
from bs4 import BeautifulSoup
from langchain.tools import tool

@tool
def fetch_news_html(url: str) -> str:
    """Fetch raw HTML from a news webpage."""
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        return response.text
    except Exception as e:
        return f"Error fetching the URL: {str(e)}"

@tool
def extract_text_from_html(html: str) -> str:
    """Extract main readable text from HTML using simple heuristics."""
    try:
        soup = BeautifulSoup(html, "html.parser")
        # Drop tags that rarely contain article text.
        for tag in soup(["script", "style", "nav", "footer", "header", "aside"]):
            tag.decompose()
        text = " ".join(soup.stripped_strings)
        # Cap the length so the text fits comfortably in the model's context.
        return text[:5000]
    except Exception as e:
        return f"Error parsing HTML: {str(e)}"
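
To make concrete what "a function description the model can read" might look like, here is a minimal, framework-free sketch. It is not LangChain's actual implementation of @tool: describe_tool, _TYPE_NAMES, and fetch_news_html_plain are hypothetical names introduced only for illustration, and the dictionary layout is loosely modeled on JSON Schema as an assumption.

```python
import inspect
from typing import Any, Callable, get_type_hints

# Hypothetical mapping from Python annotations to schema type names.
_TYPE_NAMES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_tool(func: Callable[..., Any]) -> dict:
    """Build a plain-dict description of a function (illustrative only)."""
    hints = get_type_hints(func)
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        # Unknown or missing annotations fall back to "string".
        properties[name] = {"type": _TYPE_NAMES.get(hints.get(name), "string")}
        # Parameters without a default value must be supplied by the caller.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def fetch_news_html_plain(url: str, timeout: int = 10) -> str:
    """Fetch raw HTML from a news webpage."""
    raise NotImplementedError("placeholder body; only the signature matters here")


schema = describe_tool(fetch_news_html_plain)
print(schema)
```

The point of the sketch is that the function name, docstring, and type hints carry all the information a model needs to decide when and how to call a tool, which is why clear docstrings on tool functions matter.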

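3. Configure the Language Model

temperature controls output randomness; for news summarization, a value between 0.2 and 0.4 is recommended;

max_tokens limits the maximum generation length.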
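
For intuition about why a low temperature suits summarization, the sketch below applies the textbook temperature-scaled softmax to a handful of made-up scores. It is a simplified illustration, not how any particular provider implements sampling; softmax_with_temperature and the logits values are assumptions for demonstration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
low = softmax_with_temperature(logits, 0.3)
high = softmax_with_temperature(logits, 1.0)
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

At the lower temperature nearly all of the probability mass sits on the top-scoring candidate, so repeated runs produce more consistent wording; at 1.0 the alternatives keep a meaningful share and output varies more.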

from langchain.chat_models import init_chat_model

model = init_chat_model(
    "anthropic:claude-sonnet-4-5",
    temperature=0.3,  
    timeout=15,
    max_tokens=1500
)

4. Define the Output Format

We can define an output schema so that the data the model returns is more consistent.

from dataclasses import dataclass

@dataclass
class NewsSummary:
    """Structured output schema for summarized news."""
    title: str
    summary: str
    sentiment: str | None = None
    category: str | None = None
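
A schema is only useful if downstream code can rely on it, so it can help to validate a payload before using it. The sketch below parses a JSON string into the same dataclass shape and rejects malformed input. It is a standalone illustration under stated assumptions: parse_news_summary and ALLOWED_SENTIMENTS are hypothetical names, the allowed sentiment values are taken from the system prompt, and the framework's own response_format handling is not shown. Optional[str] is used as an equivalent of str | None so the snippet also runs on older Python versions.

```python
import json
from dataclasses import dataclass
from typing import Optional

ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}  # from the system prompt


@dataclass
class NewsSummary:
    """Repeated here so the snippet runs on its own."""
    title: str
    summary: str
    sentiment: Optional[str] = None
    category: Optional[str] = None


def parse_news_summary(raw_json: str) -> NewsSummary:
    """Parse a JSON string into NewsSummary, rejecting malformed payloads."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field in ("title", "summary"):
        value = data.get(field)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"missing or empty required field: {field}")
    sentiment = data.get("sentiment")
    if sentiment is not None and sentiment not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment: {sentiment!r}")
    return NewsSummary(
        title=data["title"],
        summary=data["summary"],
        sentiment=sentiment,
        category=data.get("category"),
    )


example = '{"title": "Sample headline", "summary": "Short recap.", "sentiment": "neutral"}'
parsed = parse_news_summary(example)
print(parsed)
```

Failing loudly on an unexpected sentiment value is a design choice; a more lenient variant could map unknown values to None instead.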

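5. Add Memory

In news tasks, memory helps the agent retain the user's context, for example when the user asks about several news stories in a row.

Here we use in-memory storage (InMemorySaver); for production, a database-backed version is recommended.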

from langgraph.checkpoint.memory import InMemorySaver

checkpointer = InMemorySaver()

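6. Assemble and Run the Agent

create_agent() combines all the components into a runnable agent;

agent.invoke() runs one conversational turn;

The arguments passed in include the user input and the context configuration;

Finally, we read the formatted result from response['structured_response'].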

from langchain.agents import create_agent

agent = create_agent(
    model=model,
    system_prompt=SYSTEM_PROMPT,
    tools=[fetch_news_html, extract_text_from_html],
    response_format=NewsSummary,
    checkpointer=checkpointer
)

config = {"configurable": {"thread_id": "session_1"}}

response = agent.invoke(
    {"messages": [{"role": "user", "content": "Summarize the latest news from https://www.bbc.com/news/technology-123456"}]},
    config=config
)

print(response["structured_response"])

Expected output (example):

NewsSummary(
    title="AI Breakthrough in Open Source Models",
    summary="The BBC reports that several open-source AI models are closing the gap with proprietary systems. Experts believe this trend will accelerate innovation across industries, though concerns about misuse remain.",
    sentiment="neutral",
    category="technology"
)

You can call the agent multiple times with the same thread_id to continue the conversation, for example:

response = agent.invoke(
    {"messages": [{"role": "user", "content": "Can you compare it with the latest OpenAI announcement?"}]},
    config=config
)

At this point, the agent remembers the previous news summary and generates a comparative analysis based on that context.
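
Conceptually, a thread_id acts as a key under which the conversation history is stored and looked up again on the next call. The toy class below illustrates that idea only; it is not how InMemorySaver or LangGraph checkpoints work internally, and ThreadMemory and max_messages are hypothetical names chosen for this sketch.

```python
from collections import defaultdict


class ThreadMemory:
    """Toy per-thread message store (illustrative, not LangGraph's design)."""

    def __init__(self, max_messages=20):
        self.max_messages = max_messages
        self._threads = defaultdict(list)

    def append(self, thread_id, role, content):
        history = self._threads[thread_id]
        history.append({"role": role, "content": content})
        # Drop the oldest messages once the cap is exceeded.
        del history[: max(0, len(history) - self.max_messages)]

    def history(self, thread_id):
        # Return a copy so callers cannot mutate the stored list.
        return list(self._threads.get(thread_id, []))


memory = ThreadMemory(max_messages=3)
memory.append("session_1", "user", "Summarize this article.")
memory.append("session_1", "assistant", "Here is the summary.")
memory.append("session_1", "user", "Compare it with another announcement.")
memory.append("session_2", "user", "Unrelated question.")

print(len(memory.history("session_1")))  # 3
print(len(memory.history("session_2")))  # 1
```

Each thread sees only its own history, which is why reusing "session_1" lets a follow-up question refer to "it" while a different thread_id starts from a blank slate. The cap on stored messages is one simple way to keep long threads from growing without bound.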

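III. A Few Practical Lessons with LangChain

If you can get the demo above running, you have genuinely crossed the threshold into agent development.

But LangChain is easy to pick up, hard to tune, and harder still to maintain.

Here are a few lessons from practice:

● Prompt engineering is the core competitive advantage: with the same tool-calling logic, different prompts produce vastly different results.

● Beginners like to cram a dozen or more steps of logic into a single chain, which almost always blows up; prefer composing several small chains and thinking in pipelines.

● Long memory easily leads to context confusion; turn memory off first, get the main logic working, and then add the memory layer.

● Logging and tracing matter: use verbose=True or a tool such as LangSmith to trace the agent's reasoning process, which helps you understand what it is actually thinking.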
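
The "small steps plus tracing" advice can be sketched without any framework. The example below composes three tiny functions into one callable and logs what goes into and out of each step. It deliberately does not use LangChain's own composition or tracing features; pipe, traced, and the three step functions are hypothetical stand-ins for real chain steps.

```python
import logging
from functools import reduce

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("pipeline")


def traced(step):
    """Wrap a step so its input and output are logged."""
    def wrapper(value):
        result = step(value)
        log.info("%s: %r -> %r", step.__name__, value, result)
        return result
    return wrapper


def pipe(*steps):
    """Compose steps left to right into a single callable."""
    return lambda value: reduce(lambda acc, step: traced(step)(acc), steps, value)


# Hypothetical stand-ins for real chain steps.
def normalize_whitespace(text):
    return " ".join(text.split())


def truncate(text):
    return text[:40]


def to_prompt(text):
    return f"Summarize: {text}"


prepare = pipe(normalize_whitespace, truncate, to_prompt)
result = prepare("  Open-source   models are\nclosing the gap.  ")
print(result)  # Summarize: Open-source models are closing the gap.
```

Because each step is small and logged separately, a bad result can be traced to the step that produced it, which is much harder when a dozen operations are fused into one chain.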
