What are Python libraries like LLM and LangChain?
A couple weeks ago I was hacking together a new idea (as I tend to do when I’m avoiding “real” work).
I wanted to build a script that would scan through my project README files and use an AI to summarize them for me—so I could quickly get a high-level view of which repos needed the most cleanup.
And of course, I immediately ran into the usual dilemma:
Do I build everything from scratch with raw API calls, or do I reach for one of these fancy AI Python libraries?
In this post, I want to give you a slightly deeper, more technical take on two popular Python ecosystems for working with large language models: LLM by Simon Willison, and the now-famous LangChain.
I’ll show some code, talk about tradeoffs, and hopefully give you a sense of when to use what.
Python libraries are just (smart) toolkits
Quick recap. A Python library is code written by someone else that you install (usually via pip) and use in your own Python scripts.
It saves you from reinventing the wheel. For instance:
pandas does data analysis.
requests does HTTP.
llm or langchain help you talk to large language models such as OpenAI's GPT models, Google's Gemini, etc.
In AI, these libraries are even more valuable because otherwise you’d be manually crafting HTTP requests, handling retries, parsing JSON, managing costs and tokens yourself. Boring and error-prone.
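To make that concrete, here is a sketch of one small piece of that plumbing: retrying a flaky call with exponential backoff. Everything here is illustrative, using only the standard library; `flaky_api_call` is a stand-in that simulates a real HTTP request.

```python
import time

def with_retries(fn, max_attempts=3, base_delay=0.1):
    """Retry fn with exponential backoff -- the kind of plumbing
    AI client libraries quietly handle for you."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

# Simulated flaky endpoint: fails twice, then succeeds.
calls = {"n": 0}
def flaky_api_call():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary outage")
    return {"answer": "42"}

print(with_retries(flaky_api_call))  # {'answer': '42'}
```

Multiply that by JSON parsing, token counting, and rate limiting, and the appeal of a library becomes obvious.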
The LLM library (minimalist + SQLite-backed)
LLM is super minimalist. It's mostly a CLI and a tiny Python interface.
What makes it interesting is that it automatically logs your prompts and responses in a local SQLite database. That means every experiment you run is stored for later introspection.
You can install it with:
pip install llm
llm keys set openai your_api_key_here
Then on the command line:
llm "Give me 3 startup ideas using AI for small shops"
It logs this to a local SQLite database (run llm logs path to see exactly where it lives). You can review your past prompts from the CLI:
llm logs list
Using LLM from Python
The API is extremely simple. Here’s how I’ve used it to iterate on prompt experiments:
import llm
model = llm.get_model("gpt-4")
response = model.prompt("Summarize why we use context managers in Python.")
print(response.text())
Want to see all your past prompts? The log database is plain SQLite, so you can browse it with sqlite-utils:
import sqlite_utils
# Point this at the path reported by `llm logs path`
db = sqlite_utils.Database("logs.db")
for row in db["responses"].rows:
    print(row["prompt"], "=>", row["response"])
It’s such a small tool, but super powerful for prompt engineering.
I often keep llm around as a playground to rapidly test prompts before embedding them in more complex apps.
LangChain: the big framework for building LLM-powered workflows
LangChain is way more than a simple wrapper. It’s a whole framework designed to help you build chains of logic around large language models.
Think of it like a Django for multi-step reasoning.
Official docs: https://python.langchain.com
And its PyPI page: https://pypi.org/project/langchain/
It gives you:
Prompt templates — structured ways to fill in dynamic text.
Chains — sequences like: user input → search a database → feed to LLM → return summary.
Agents — autonomous loops where the LLM decides which tool to call next.
Retrievers & vector stores — so you can do RAG (retrieval augmented generation).
Chat histories — for building memory in your apps.
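To demystify the first of those: a prompt template is, at its core, structured string substitution. Here is a stdlib-only sketch of the idea (LangChain's `PromptTemplate` adds validation and composition on top, but the mental model is the same):

```python
from string import Template

# A minimal prompt "template": named slots filled in at call time.
summary_prompt = Template(
    "You are a code reviewer.\n"
    "Summarize the following $language file in $n bullet points:\n\n$code"
)

prompt = summary_prompt.substitute(
    language="Python",
    n=3,
    code="def add(a, b):\n    return a + b",
)
print(prompt)
```

Once prompts are templates rather than hand-concatenated strings, chaining them together becomes much more tractable.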
A more complex LangChain example
Say you want to build a Q&A app over your Markdown files.
LangChain lets you load your files into a vector store (like FAISS or Chroma), then run similarity search before asking GPT.
from langchain.vectorstores import FAISS
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA
from langchain.document_loaders import DirectoryLoader, TextLoader
# Load your markdown files (TextLoader reads a single file,
# so use DirectoryLoader to walk the folder)
loader = DirectoryLoader("docs/", glob="**/*.md", loader_cls=TextLoader)
documents = loader.load()
# Embed them
embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(documents, embeddings)
# Build a retrieval QA chain
qa = RetrievalQA.from_chain_type(
llm=OpenAI(),
retriever=db.as_retriever(),
)
# Ask it stuff
result = qa.run("Which files mention performance optimization?")
print(result)
This is an actual skeleton of how many production-grade document bots are built today.
In raw Python, this would take you hundreds of lines and you’d reinvent embedding models, nearest neighbors search, etc.
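For a feel of what the vector store is doing under the hood, here is a toy version of its core operation: rank documents by cosine similarity to a query vector. The three-dimensional "embeddings" below are hand-written for illustration; real embedding models produce vectors with hundreds or thousands of dimensions.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" of three documents.
docs = {
    "perf.md":   [0.9, 0.1, 0.0],
    "intro.md":  [0.1, 0.8, 0.1],
    "deploy.md": [0.2, 0.2, 0.9],
}
query = [0.85, 0.15, 0.05]  # pretend embedding of "performance optimization"

ranked = sorted(docs, key=lambda name: cosine(query, docs[name]), reverse=True)
print(ranked[0])  # perf.md
```

A real vector store like FAISS does the same thing, just with approximate nearest-neighbor indexes so it stays fast at millions of vectors.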
Agents with LangChain
You can also build tools that the LLM can use dynamically.
For example, it might decide: “I need to call a calculator function now,” or “I should do a web search.”
from langchain.agents import initialize_agent, Tool
from langchain.llms import OpenAI
def multiply_numbers(text):
    # Expects two numbers separated by a space, e.g. "12 8"
    a, b = map(int, text.split())
    return str(a * b)
tools = [
Tool(
name="Multiplier",
func=multiply_numbers,
description="Multiplies two numbers given as 'X Y'"
)
]
agent = initialize_agent(
tools,
OpenAI(temperature=0),
agent="zero-shot-react-description"
)
print(agent.run("What's 12 times 8?"))
This is mind-blowing because the LLM is not just generating text—it's deciding when to call your functions. As a programmer you make this possible by writing a description for each tool, so the LLM can decide which function to call based on those descriptions.
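Conceptually, the agent is a dispatch table keyed on those descriptions. Here is a stripped-down, non-LLM sketch of that idea: the tool registry mirrors LangChain's Tool (name, func, description), but selection is a naive keyword match where the real thing asks the model to choose.

```python
# Hypothetical tool registry for illustration only.
tools = {
    "Multiplier": {
        "description": "multiplies two numbers",
        "func": lambda a, b: a * b,
    },
    "Adder": {
        "description": "adds two numbers",
        "func": lambda a, b: a + b,
    },
}

def pick_tool(task: str) -> str:
    """Stand-in for the LLM's decision: match task words against descriptions."""
    for name, tool in tools.items():
        if any(word in tool["description"] for word in task.lower().split()):
            return name
    raise ValueError("no matching tool")

task = "multiplies 12 and 8"
name = pick_tool(task)
print(name, "->", tools[name]["func"](12, 8))  # Multiplier -> 96
```

The interesting part in a real agent is exactly the bit this sketch fakes: the model reads the descriptions and reasons its way to a choice, rather than string-matching.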
Trade-offs: why I don’t always use LangChain
LangChain is awesome, but it also comes with:
more dependencies
more hidden logic (which can be hard to debug)
sometimes slower iterations
For many of my own experiments, I still prefer writing my own tiny pipeline:
# my_minimal_pipeline.py
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

md_files = Path("docs").glob("*.md")
context = " ".join(f.read_text() for f in md_files)
prompt = f"Summarize the main topics:\n\n{context}"

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
This is fast, transparent, and easy to tweak.
If you’re exploring LLM ideas in Python:
Try llm for quick local experiments and to keep track of your prompts.
Dive into LangChain if you’re building more ambitious multi-step AI applications, especially ones that need memory, tool usage, or document retrieval.
And if you’re like me—sometimes it’s fun to avoid all frameworks and just glue a few API calls together with requests or openai.
Because at the end of the day, knowing what’s under the hood is the best way to debug when it inevitably breaks.
If you enjoyed this breakdown, let me know—I can do deeper dives, like a “Building your first LangChain Q&A bot step by step” or a mini-course on advanced prompt workflows with llm.
Just hit reply.