TEXTGRAD - Revolutionizing LLMs with Automatic Differentiation via Text



What is TEXTGRAD?

Traditional neural networks rely on backpropagation and numerical gradients to optimize their parameters. TEXTGRAD extends this idea by using textual feedback as gradients to improve the individual components of an AI system. It treats the system as a computation graph in which each variable is the input or output of a complex function, such as an LLM API call or an external numerical solver.

In traditional automatic differentiation, gradients provide a direction for improving variables in a model. Similarly, TEXTGRAD uses natural language feedback (referred to as “textual gradients”) to inform how variables should be modified to enhance the system’s performance.

How TEXTGRAD Works

Basic Example

Let’s consider a simple example to understand how TEXTGRAD operates:

Prediction = LLM(Prompt + Question)
Evaluation = LLM(Evaluation Instruction + Prediction)

In this system, we aim to optimize the Prompt to enhance the evaluation result. The gradient computation follows these steps:

  1. Calculate feedback on the Prediction based on the Evaluation.
  2. Use this feedback to adjust the Prompt.

Mathematically, this process can be represented as:

\[\frac{\partial \text{Evaluation}}{\partial \text{Prediction}} = \nabla_{\text{LLM}}(\text{Prediction}, \text{Evaluation})\]

\[\frac{\partial \text{Evaluation}}{\partial \text{Prompt}} = \nabla_{\text{LLM}}\left(\text{Prompt}, \text{Prediction}, \frac{\partial \text{Evaluation}}{\partial \text{Prediction}}\right)\]

Here, \(\nabla_{\text{LLM}}\) is the gradient operator for a forward function that is an LLM call; it returns natural language feedback such as “This prediction can be improved by…”. Passing \(\frac{\partial \text{Evaluation}}{\partial \text{Prediction}}\) into the second call is the analogue of the chain rule: feedback on the Prediction is propagated back to the Prompt.
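This two-call chain maps directly onto the textgrad package. Below is a minimal sketch following the library’s published quickstart; the engine name, question, and evaluation instruction are illustrative:

import textgrad as tg

# The engine that generates textual gradients during backward().
tg.set_backward_engine("gpt-4o", override=True)

model = tg.BlackboxLLM("gpt-4o")

question = tg.Variable(
    "If it takes 1 hour to dry 25 shirts under the sun, how long do 30 shirts take?",
    role_description="question to the LLM",
    requires_grad=False,
)

# Forward pass: Prediction = LLM(Prompt + Question)
prediction = model(question)
prediction.set_role_description("concise and accurate answer to the question")

# Evaluation = LLM(Evaluation Instruction + Prediction)
loss_fn = tg.TextLoss("Evaluate the answer. Be critical and provide concise feedback.")
loss = loss_fn(prediction)

# Backward pass: attaches natural-language feedback (the textual gradient)
# to each upstream variable, here the prediction.
loss.backward()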

General Framework

For more complex systems, TEXTGRAD treats the AI system as a computation graph with variables and functions. Each variable is updated based on feedback from its successors in the graph:

\[\frac{\partial L}{\partial v} = \bigcup_{w \in \text{SuccessorsOf}(v)} \nabla_f \left( v, w, \frac{\partial L}{\partial w} \right)\]

where \(L\) is the loss function, \(v\) is a variable, and \(\nabla_f\) denotes the gradient function for a function \(f\).

To update any desired variable \(v\), TEXTGRAD uses Textual Gradient Descent (TGD):

\[v_{\text{new}} = \text{TGD.step}\left(v, \frac{\partial L}{\partial v}\right)\]
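In the library, this update is a single optimizer step. Continuing the sketch from the basic example and taking the prediction as the variable to update:

# Textual Gradient Descent: rewrite the variable using its textual gradient.
optimizer = tg.TGD(parameters=[prediction])
optimizer.step()   # prediction.value now holds the revised answer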

Applications of TEXTGRAD in LLMs

1. Code Optimization

TEXTGRAD is highly effective at optimizing code. For instance, it can refine solutions to coding problems from platforms like LeetCode: given feedback on a solution’s correctness and runtime, TEXTGRAD iteratively revises the code until it passes the test cases, substantially raising the completion rate on hard problems.
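Before the concrete example, here is how such a refinement loop can be wired up in textgrad. This is a sketch: initial_solution and run_tests are hypothetical stand-ins for the starting code and the test harness.

import textgrad as tg

tg.set_backward_engine("gpt-4o", override=True)

code = tg.Variable(
    initial_solution,   # candidate Python source as a string (placeholder)
    role_description="code solution to a hard coding problem",
    requires_grad=True,
)

loss_fn = tg.TextLoss(
    "Evaluate this code for correctness on edge cases and for runtime "
    "complexity. Point out the exact logic that fails."
)
optimizer = tg.TGD(parameters=[code])

for _ in range(5):              # a few refinement iterations
    loss = loss_fn(code)        # textual critique of the current code
    loss.backward()             # turn the critique into a gradient on `code`
    optimizer.step()            # rewrite the code using that gradient
    if run_tests(code.value):   # hypothetical test harness; stop when all pass
        break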

Here’s an example:

Initial Code:

# Count the subarrays of nums whose median equals k, via prefix balances.
n = len(nums)
balance = 0        # (# elements > k) - (# elements < k) seen so far
count = {0: 1}     # frequencies of prefix balances
result = 0
for i in range(n):
    if nums[i] < k:
        balance -= 1
    elif nums[i] > k:
        balance += 1
    if nums[i] == k:
        result += count.get(balance, 0) + count.get(balance - 1, 0)
    else:
        count[balance] = count.get(balance, 0) + 1

TEXTGRAD Feedback:

Handling `nums[i] == k`: The current logic does not correctly handle the case when `nums[i] == k`. The balance should be reset or adjusted differently when `k` is encountered.

Optimized Code:

n = len(nums)
balance = 0        # (# elements > k) - (# elements < k) seen so far
count = {0: 1}     # prefix balances recorded before k is encountered
result = 0
found_k = False    # tracks whether k has entered the prefix yet
for i in range(n):
    if nums[i] < k:
        balance -= 1
    elif nums[i] > k:
        balance += 1
    else:
        found_k = True
    if found_k:    # once k is in the prefix, count valid subarrays ending at i
        result += count.get(balance, 0) + count.get(balance - 1, 0)
    else:
        count[balance] = count.get(balance, 0) + 1

2. Problem Solving

TEXTGRAD can optimize answers to hard scientific questions by refining them iteratively at test time. For example, on the Google-Proof Question Answering (GPQA) benchmark, TEXTGRAD improved the zero-shot accuracy of GPT-4o from 51% to 55%.

The objective function for the refinement looks like:

\[\text{Solution Refinement Objective} = \text{LLM}(\text{Question} + \text{Solution} + \text{Test-time Instruction})\]
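A sketch of this test-time refinement in textgrad, where question_text and zero_shot_solution are placeholders for the benchmark question and the model’s first attempt:

import textgrad as tg

tg.set_backward_engine("gpt-4o", override=True)

solution = tg.Variable(
    zero_shot_solution,   # the model's initial answer (placeholder)
    role_description="solution to a graduate-level science question",
    requires_grad=True,
)

# Test-time instruction: critique the reasoning without the ground-truth answer.
loss_fn = tg.TextLoss(
    f"Question: {question_text}\n"
    "Evaluate the given solution step by step and identify logical or factual errors."
)
optimizer = tg.TGD(parameters=[solution])

for _ in range(3):   # a few self-refinement iterations at test time
    loss_fn(solution).backward()
    optimizer.step()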

3. Reasoning Enhancement

Prompt optimization is another key application. By iteratively refining prompts, TEXTGRAD can significantly improve the reasoning performance of LLMs; for example, it can push GPT-3.5 closer to GPT-4-level performance on several reasoning tasks purely by optimizing the prompt supplied to the model.

Here’s an example of prompt optimization:

Initial Prompt:

You will answer a reasoning question. Think step by step. The last line of your response should be of the following format: 'Answer: $VALUE' where VALUE is a numerical value.

Optimized Prompt:

You will answer a reasoning question. List each item and its quantity in a clear and consistent format, such as '- Item: Quantity'. Sum the values directly from the list and provide a concise summation. Ensure the final answer is clearly indicated in the format: 'Answer: $VALUE' where VALUE is a numerical value. Verify the relevance of each item to the context of the query and handle potential errors or ambiguities in the input. Double-check the final count to ensure accuracy.
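A training loop that produces this kind of rewrite treats the system prompt itself as the trainable variable, with a strong model generating feedback for the weaker one. A sketch, where train_set is a placeholder for (question, answer) pairs and method names follow the library’s examples but may vary across versions:

import textgrad as tg

tg.set_backward_engine("gpt-4o", override=True)   # the feedback model

system_prompt = tg.Variable(
    "You will answer a reasoning question. Think step by step. ...",
    role_description="system prompt for a reasoning task",
    requires_grad=True,
)

# The cheaper model whose behavior we want to improve.
model = tg.BlackboxLLM("gpt-3.5-turbo", system_prompt=system_prompt)
optimizer = tg.TGD(parameters=[system_prompt])

for question, gold_answer in train_set:   # train_set is a placeholder
    x = tg.Variable(question, role_description="reasoning question", requires_grad=False)
    prediction = model(x)
    loss_fn = tg.TextLoss(
        f"The correct answer is {gold_answer}. Critique the response: is the "
        "reasoning sound, and is the final line formatted as required?"
    )
    loss_fn(prediction).backward()
    optimizer.step()
    optimizer.zero_grad()   # clear accumulated feedback before the next example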

4. Molecule Design

TEXTGRAD can be applied in chemistry to design new small molecules with desirable properties. By iteratively editing molecular structures based on textual feedback about their drug-likeness and binding affinity, TEXTGRAD can assist drug discovery and development. In the example below, the docking score from AutoDock Vina (more negative means stronger predicted binding) and the quantitative estimate of drug-likeness (QED, higher is better) are the two objectives.

Example Optimization:

Iteration 1:

Molecule: [Initial SMILES String]
Vina Score: -4.3 kcal/mol
QED: 0.44

Iteration 2:

Molecule: [Optimized SMILES String]
Vina Score: -5.5 kcal/mol
QED: 0.59

Iteration 3:

Molecule: [Further Optimized SMILES String]
Vina Score: -7.5 kcal/mol
QED: 0.79
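A sketch of how such a loop could be assembled. The numeric objectives come from external tools, so compute_vina_score and compute_qed below are hypothetical wrappers around a docking engine and a QED calculator:

import textgrad as tg

tg.set_backward_engine("gpt-4o", override=True)

smiles = tg.Variable(
    initial_smiles,   # starting SMILES string (placeholder)
    role_description="SMILES string of a drug-like small molecule",
    requires_grad=True,
)
optimizer = tg.TGD(parameters=[smiles])

for _ in range(3):
    vina = compute_vina_score(smiles.value)   # hypothetical; lower is better
    qed = compute_qed(smiles.value)           # hypothetical; higher is better
    # Fold the numeric scores into a textual loss the backward engine can act on.
    loss_fn = tg.TextLoss(
        f"This molecule has a Vina score of {vina:.1f} kcal/mol and a QED of "
        f"{qed:.2f}. Suggest structural modifications that improve binding "
        "affinity and drug-likeness while keeping the molecule synthesizable."
    )
    loss_fn(smiles).backward()
    optimizer.step()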

5. Medical Treatment Planning

In medicine, TEXTGRAD can optimize radiotherapy treatment plans for cancer patients. By adjusting the importance weights assigned to the planning target volume (PTV) and to nearby healthy organs, TEXTGRAD steers the plan toward delivering the prescribed dose to the target while minimizing exposure to healthy tissue.

Example Hyperparameters:

Initial Weights:
- PTV: 0.5
- Bladder: 0.2
- Rectum: 0.2
- Femoral Heads: 0.1

Optimized Weights (after iterations):
- PTV: 0.6
- Bladder: 0.3
- Rectum: 0.3
- Femoral Heads: 0.1
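A sketch of the corresponding loop, where plan_and_evaluate is a hypothetical stand-in for the treatment-planning system that turns a weight assignment into a dose report:

import textgrad as tg

tg.set_backward_engine("gpt-4o", override=True)

weights = tg.Variable(
    "PTV: 0.5, Bladder: 0.2, Rectum: 0.2, Femoral Heads: 0.1",
    role_description="objective weights for a radiotherapy treatment plan",
    requires_grad=True,
)
optimizer = tg.TGD(parameters=[weights])

for _ in range(5):
    dose_report = plan_and_evaluate(weights.value)   # hypothetical planner
    loss_fn = tg.TextLoss(
        f"Dose report: {dose_report}\n"
        "Critique whether the PTV receives the prescribed dose and whether the "
        "bladder, rectum, and femoral heads stay within their dose limits."
    )
    loss_fn(weights).backward()
    optimizer.step()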

Conclusion

TEXTGRAD represents a significant advancement in the optimization of AI systems, particularly those involving LLMs. By leveraging natural language feedback as gradients, it provides a flexible and powerful way to enhance the performance of various AI components across different domains. Whether it’s coding, problem-solving, reasoning, molecule design, or medical treatment planning, TEXTGRAD demonstrates its potential to revolutionize the next generation of AI systems.

To explore more about TEXTGRAD and its applications, visit the TEXTGRAD GitHub repository.
