Review of https://github.com/mlabonne/llm-course

Running LLMs
Running LLMs can be resource-intensive, but there are flexible ways to use them, from hosted APIs to local setups.
APIs vs Local Models: Proprietary APIs (OpenAI, Google, Anthropic) are fastest to integrate, while open-weight models, served through providers like OpenRouter, Hugging Face, and Together AI or run locally, allow for customization and privacy.
Prompt Engineering: Techniques like zero-shot, few-shot, Chain-of-Thought, and ReAct prompting strongly influence output quality; a few-shot sketch follows this list.
Structured Outputs: Tools like Outlines and JSON schemas can guide LLM responses into clean, machine-readable formats; a schema-validation sketch appears below as well.
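To make few-shot prompting concrete, here is a minimal sketch using the OpenAI Python client; the model name and the sentiment task are illustrative placeholders, not recommendations from the course.

```python
# Few-shot prompting: show the model worked examples before the real query.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

messages = [
    {"role": "system", "content": "Classify each review as positive or negative."},
    # Demonstrations the model imitates (the "shots"):
    {"role": "user", "content": "Review: The battery lasts all day."},
    {"role": "assistant", "content": "positive"},
    {"role": "user", "content": "Review: It broke after one week."},
    {"role": "assistant", "content": "negative"},
    # The actual query:
    {"role": "user", "content": "Review: Setup was quick and painless."},
]

response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(response.choices[0].message.content)  # expected: "positive"
```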
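For structured outputs, one portable pattern is to embed a JSON Schema in the prompt and validate the reply with Pydantic. This is a sketch of that pattern, not Outlines' API (Outlines instead constrains generation token by token), and the ticket schema is invented for illustration.

```python
# Schema-guided output: prompt for JSON matching a schema, then validate it.
import json
from pydantic import BaseModel

class Ticket(BaseModel):
    title: str
    priority: int  # e.g. 1 (low) to 3 (high)

schema = json.dumps(Ticket.model_json_schema())
prompt = (
    "Extract a support ticket from the message below. "
    f"Reply only with JSON matching this schema:\n{schema}\n\n"
    "Message: The app crashes on startup, please fix ASAP!"
)
# raw = <send `prompt` to any chat model>
raw = '{"title": "App crashes on startup", "priority": 3}'  # example reply
ticket = Ticket.model_validate_json(raw)  # raises if the JSON is malformed
print(ticket.priority)  # 3
```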
Building a Vector Storage
Vector databases form the foundation of Retrieval Augmented Generation (RAG) systems.
Document ingestion and splitting: Use structured loaders and semantic text splitters (LangChain provides many).
Embeddings: Task-specific embedding models improve semantic retrieval accuracy.
Vector databases: Tools like Chroma, Pinecone, Milvus, FAISS, and Annoy efficiently store and search embeddings; the sketch below wires all three steps together.
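The three steps above compose into a short pipeline. A sketch under stated assumptions: the file name is hypothetical, and the chunk sizes and MiniLM model are illustrative defaults rather than the course's settings.

```python
# Ingest -> split -> embed -> index -> search, end to end.
import faiss
import numpy as np
from langchain_text_splitters import RecursiveCharacterTextSplitter
from sentence_transformers import SentenceTransformer

text = open("handbook.txt").read()  # hypothetical source document

# 1. Split into overlapping chunks so passages keep local context.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text(text)

# 2. Embed each chunk into a dense vector.
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(chunks).astype(np.float32)

# 3. Index the vectors in FAISS and run a nearest-neighbor search.
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

query = model.encode(["What is the vacation policy?"]).astype(np.float32)
distances, ids = index.search(query, 3)  # top-3 most similar chunks
for i in ids[0]:
    print(chunks[i][:80])
```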
Retrieval Augmented Generation
RAG enhances LLM responses with real-time knowledge retrieval.
Orchestrators: Frameworks like LangChain and LlamaIndex streamline RAG pipelines; the sketch after this list shows the bare pattern they automate.
Retrievers & Memory: Advanced retrievers (CoRAG, HyDE) and memory that carries conversation context across turns boost relevance; a HyDE sketch also follows below.
Evaluation: Tools like Ragas and DeepEval assess retrieval precision and answer quality.
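What an orchestrator automates can be seen in miniature below: retrieve the top chunks, stuff them into a prompt, and generate. This reuses `model`, `index`, and `chunks` from the vector-storage sketch; the `call_llm` helper and the model name are assumptions for illustration, not framework APIs.

```python
# Bare-bones RAG: retrieval + prompt stuffing + generation.
from openai import OpenAI

client = OpenAI()

def call_llm(prompt: str) -> str:
    """Minimal chat wrapper; swap in any provider or a local model."""
    response = client.chat.completions.create(
        model="gpt-4o-mini", messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

def answer(question: str, k: int = 3) -> str:
    query_vec = model.encode([question]).astype("float32")
    _, ids = index.search(query_vec, k)  # nearest chunks
    context = "\n\n".join(chunks[i] for i in ids[0])
    return call_llm(
        f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    )
```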
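HyDE can be sketched in a few lines on top of the same pieces: generate a hypothetical answer first and search with its embedding, which often lands closer to the relevant passages than the raw question does.

```python
# HyDE: embed a hypothetical answer instead of the question itself.
def hyde_search(question: str, k: int = 3):
    hypothetical = call_llm(f"Write a short passage that answers: {question}")
    vec = model.encode([hypothetical]).astype("float32")
    return index.search(vec, k)  # same FAISS index as before
```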
Advanced RAG
For production-grade systems, RAG can integrate structured databases and external APIs, and its prompts can even be optimized programmatically.
Query Construction: Translate user intent into SQL or graph queries; a text-to-SQL sketch appears after this list.
Agents & Tools: Combine LLMs with external APIs and interpreters for more powerful reasoning.
Post-processing: Techniques like re-ranking, RAG-fusion, and classification refine the final output; the fusion step is sketched below.
DSPy: Enables programmatic optimization of prompts and model weights; see the sketch below.
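A hedged text-to-SQL sketch of query construction; the table schema is invented for illustration, and any generated SQL should be validated before execution.

```python
# Query construction: give the model the schema, ask for one SQL query.
schema = "CREATE TABLE orders (id INT, customer TEXT, total REAL, created_at DATE);"
question = "What was the total revenue in March 2024?"

sql = call_llm(  # `call_llm` helper from the RAG sketch above
    f"Given this schema:\n{schema}\n\n"
    f"Write a single SQL query that answers: {question}\nSQL:"
)
print(sql)  # inspect/parameterize before running against a real database
```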
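The fusion step behind RAG-fusion is Reciprocal Rank Fusion, which is simple enough to show in full: merge the ranked lists retrieved for several paraphrases of the query into one ranking.

```python
# Reciprocal Rank Fusion: documents ranked high in many lists win.
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + rank + 1)  # standard RRF formula
    return sorted(scores, key=scores.get, reverse=True)

# Rankings returned for three paraphrases of the same question:
print(reciprocal_rank_fusion([["d1", "d2"], ["d2", "d3"], ["d2", "d1"]]))
# ['d2', 'd1', 'd3']: d2 wins because every list ranks it highly
```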
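A minimal DSPy sketch of the declarative style it optimizes over; the model identifier is a placeholder, and exact APIs may differ across DSPy versions.

```python
# DSPy: declare the task as a signature; the framework owns the prompt.
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # placeholder model id

qa = dspy.ChainOfThought("question -> answer")  # adds a reasoning step
result = qa(question="What does retrieval augmented generation add to an LLM?")
print(result.answer)
```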
Agents
Agents bring autonomy to LLMs, enabling them to reason, take actions, and learn from results.
Core Loop: Thought → Action → Observation (a toy implementation closes this section).
Frameworks:
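As a closing illustration of the core loop above, here is a toy Thought → Action → Observation agent. It reuses the `call_llm` helper from the RAG sketch; the single tool, the prompt format, and the parsing are deliberately simplistic assumptions rather than any framework's behavior.

```python
# A toy ReAct-style loop: the model thinks, picks a tool, sees the result.
def calculator(expression: str) -> str:
    return str(eval(expression))  # toy only: never eval untrusted input

TOOLS = {"calculator": calculator}

INSTRUCTIONS = (
    "\nThink step by step. Reply with 'Thought: ...' followed by either\n"
    "'Action: <tool>: <input>' (available tools: calculator) or 'Answer: ...'.\n"
)

history = "Question: What is 17 * 24 + 8?\n"
for _ in range(5):  # cap the number of reasoning steps
    step = call_llm(history + INSTRUCTIONS)
    history += step + "\n"
    action = next(
        (line for line in step.splitlines() if line.startswith("Action:")), None
    )
    if action is None:
        break  # the model answered (or went off-script), so stop
    _, tool, arg = (part.strip() for part in action.split(":", 2))
    history += f"Observation: {TOOLS[tool](arg)}\n"  # feed the result back

print(history)  # full Thought/Action/Observation trace and final answer
```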