Optimizing Gen-AI Applications with DSPy and Haystack: A Practical Guide
Building Gen-AI applications often involves the challenging task of manually optimizing prompts. DSPy, an open-source library, addresses this by turning prompt engineering into an optimization problem, making the process more scalable and robust.
Overview of DSPy
DSPy simplifies prompt engineering by providing abstractions like signatures and modules to define inputs and outputs for systems interacting with Large Language Models (LLMs).
Example: Defining a Signature
import dspy

class Emotion(dspy.Signature):
    """Classify emotions in a sentence."""

    sentence = dspy.InputField()
    sentiment = dspy.OutputField(desc="Possible choices: sadness, joy, love, anger, fear, surprise.")
This signature translates into a structured prompt for classifying emotions in a sentence.
Using Modules for Optimization
DSPy modules, such as dspy.Predict and dspy.ChainOfThought, define predictors with optimizable parameters. The dspy.ChainOfThought module, for example, asks the LLM to provide its reasoning before answering, which typically improves response accuracy.
Optimizing Modules
To optimize a DSPy module, you need:
- The module to be optimized.
- A labeled training set.
- Evaluation metrics.
The BootstrapFewShot optimizer searches through the training set, selecting the best examples to include in the prompt as few-shot demonstrations.
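In practice, the labeled training set is a list of dspy.Example objects whose input fields are marked with with_inputs. The sketch below uses hypothetical placeholders rather than real dataset entries:

# Hypothetical labeled examples; real question/answer pairs would come from the dataset.
trainset = [
    dspy.Example(question="<a question from the dataset>", answer="<its reference answer>").with_inputs("question"),
    # ... more labeled examples
]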
Example: Simplified BootstrapFewShot Algorithm
class SimplifiedBootstrapFewShot(Teleprompter):
    def __init__(self, metric=None):
        self.metric = metric

    def compile(self, student, trainset, teacher=None):
        teacher = teacher if teacher is not None else student
        compiled_program = student.deepcopy()

        # Map the student's and teacher's predictors to names (details omitted),
        # so bootstrapped demonstrations can be attached to the right predictor.

        for example in trainset:
            # Run the teacher program on the training example to obtain its
            # prediction and a trace of each predictor's inputs and outputs.
            prediction, predicted_traces = teacher(**example.inputs())

            # Keep the bootstrapped trace only if it passes the metric.
            if self.metric(example, prediction, predicted_traces):
                for predictor, inputs, outputs in predicted_traces:
                    d = dspy.Example(automated=True, **inputs, **outputs)
                    predictor_name = self.predictor2name[id(predictor)]
                    compiled_program[predictor_name].demonstrations.append(d)

        return compiled_program
The algorithm runs the teacher program over the training inputs and, whenever a prediction satisfies the evaluation metric, stores the corresponding trace as a demonstration on the matching predictor of the compiled program.
Building a Custom Haystack Pipeline
Using a dataset derived from PubMedQA, we can create a Haystack pipeline to retrieve and generate concise answers.
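The data-loading code is omitted here; the following is a minimal sketch of how the document store used below could be populated, assuming the PubMedQA-derived abstracts are available as plain strings:

from haystack import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Hypothetical list of abstracts from the PubMedQA-derived dataset.
contexts = ["<abstract 1>", "<abstract 2>", "<abstract 3>"]

document_store = InMemoryDocumentStore()
document_store.write_documents([Document(content=ctx) for ctx in contexts])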
Example Pipeline Setup
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.generators import OpenAIGenerator
from haystack.components.builders import PromptBuilder
from haystack import Pipeline
retriever = InMemoryBM25Retriever(document_store, top_k=3)
generator = OpenAIGenerator(model="gpt-3.5-turbo")
template = """
Given the following information, answer the question.
Context:
Question:
Answer:
"""
prompt_builder = PromptBuilder(template=template)
rag_pipeline = Pipeline()
rag_pipeline.add_component("retriever", retriever)
rag_pipeline.add_component("prompt_builder", prompt_builder)
rag_pipeline.add_component("llm", generator)
rag_pipeline.connect("retriever", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder", "llm")
Example Query and Response
question = "What effects does ketamine have on rat neural stem cells?"
response = rag_pipeline.run({"retriever": {"query": question}, "prompt_builder": {"question": question}})
print(response["llm"]["replies"][0])
The response is detailed and verbose, which indicates the need for more concise answers.
Using DSPy for Concise Answers
Defining the Signature and Module
class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="short and precise answer")


class RAG(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def retrieve(self, question):
        # Reuse the Haystack BM25 retriever defined above.
        results = retriever.run(query=question)
        passages = [res.content for res in results['documents']]
        return dspy.Prediction(passages=passages)

    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)
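Assuming DSPy has been pointed at an OpenAI model (the exact LM class may vary between DSPy versions), the uncompiled module can already be called directly:

# Hedged sketch: in older DSPy releases the LM wrapper is dspy.OpenAI,
# in newer ones dspy.LM("openai/gpt-3.5-turbo").
dspy.settings.configure(lm=dspy.OpenAI(model="gpt-3.5-turbo"))

uncompiled_rag = RAG()
print(uncompiled_rag(question="What effects does ketamine have on rat neural stem cells?").answer)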
Defining Evaluation Metrics
from haystack.components.evaluators import SASEvaluator
sas_evaluator = SASEvaluator()
sas_evaluator.warm_up()
def mixed_metric(example, pred, trace=None):
    # Semantic similarity between the predicted answer and the ground truth.
    semantic_similarity = sas_evaluator.run(ground_truth_answers=[example.answer], predicted_answers=[pred.answer])["score"]

    # Penalize answers longer than 20 words; answers of 40 words or more lose a flat 0.5.
    n_words = len(pred.answer.split())
    long_answer_penalty = 0
    if 20 < n_words < 40:
        long_answer_penalty = 0.025 * (n_words - 20)
    elif n_words >= 40:
        long_answer_penalty = 0.5

    return semantic_similarity - long_answer_penalty
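With the metric in place, the uncompiled RAG module can be scored against a held-out set. A minimal sketch, where devset is assumed to be a list of labeled dspy.Example objects built the same way as trainset:

from dspy.evaluate.evaluate import Evaluate

# devset: held-out labeled examples, constructed like trainset.
evaluate = Evaluate(devset=devset, metric=mixed_metric, num_threads=1, display_progress=True)
evaluate(RAG())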
Compiling the Optimized Pipeline
from dspy.teleprompt import BootstrapFewShot
optimizer = BootstrapFewShot(metric=mixed_metric)
compiled_rag = optimizer.compile(RAG(), trainset=trainset)
Re-evaluating the compiled pipeline shows improved performance, with concise answers scoring higher.
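Under the same assumptions as the evaluation sketch above, the comparison reuses the Evaluate helper:

# Score the compiled program with the same metric and devset as before.
evaluate(compiled_rag)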
Final Optimized Pipeline
template = static_prompt + """
---
Context:
Question:
Reasoning: Let's think step by step in order to
"""
from haystack.components.builders import AnswerBuilder

new_prompt_builder = PromptBuilder(template=template)
new_retriever = InMemoryBM25Retriever(document_store, top_k=3)
new_generator = OpenAIGenerator(model="gpt-3.5-turbo")
# Extract only the short answer from the LLM reply (the text after "Answer: ").
answer_builder = AnswerBuilder(pattern="Answer: (.*)")
optimized_rag_pipeline = Pipeline()
optimized_rag_pipeline.add_component("retriever", new_retriever)
optimized_rag_pipeline.add_component("prompt_builder", new_prompt_builder)
optimized_rag_pipeline.add_component("llm", new_generator)
optimized_rag_pipeline.add_component("answer_builder", answer_builder)
optimized_rag_pipeline.connect("retriever", "prompt_builder.documents")
optimized_rag_pipeline.connect("prompt_builder", "llm")
optimized_rag_pipeline.connect("llm.replies", "answer_builder.replies")
Testing the optimized pipeline confirms shorter, more precise answers.
Conclusion
By leveraging DSPy to optimize prompts in a Haystack RAG pipeline, we improved performance by nearly 40% without any manual prompt engineering. This approach makes prompt optimization scalable and robust, enhancing the quality and efficiency of Gen-AI applications.