Overview
Adversarial Candidate Generators are the core prompt-engineering component of many jailbreaking methods. They create variations of a prompt that attempt to bypass a model's safety measures while preserving the semantic goal of the original request.

Base Class
All generators inherit from the AdversarialCandidateGenerator base class.
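The base class definition is not reproduced here, but a minimal sketch of such an interface might look like the following. The constructor arguments and the `generate_candidates` method name are assumptions for illustration, not the library's actual signature:

```python
from abc import ABC, abstractmethod

class AdversarialCandidateGenerator(ABC):
    """Illustrative sketch of a shared generator interface (assumed, not the real API)."""

    def __init__(self, attacker_model, temperature: float = 1.0, max_tokens: int = 512):
        # Common configuration shared by concrete generators.
        self.attacker_model = attacker_model
        self.temperature = temperature
        self.max_tokens = max_tokens

    @abstractmethod
    def generate_candidates(self, goal: str, **kwargs) -> list:
        """Return a list of adversarial prompt candidates for the given goal."""
```

Concrete generators then override `generate_candidates` with their own search strategy while sharing the common configuration.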
Available Generators
TreeRefinementGenerator
The TreeRefinementGenerator generates adversarial prompts by building a tree of refinements, using an attacker model to iteratively improve prompts based on the target model's responses. It is the core generator used in the TAP (Tree-of-Attacks with Pruning) jailbreak method.
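The expand-score-prune loop behind this kind of tree search can be sketched as follows. This is a toy helper with assumed callables standing in for the attacker and scoring models, not the class's real implementation:

```python
from typing import Callable

def tree_refine(goal: str,
                attacker: Callable[[str, str], list],
                score: Callable[[str], float],
                branching_factor: int = 3,
                depth: int = 2,
                width: int = 2) -> str:
    """Toy TAP-style search: expand each frontier prompt, score children, prune."""
    frontier = [goal]
    best = (score(goal), goal)
    for _ in range(depth):
        children = []
        for prompt in frontier:
            # The attacker proposes branching_factor refinements of this prompt.
            children.extend(attacker(goal, prompt)[:branching_factor])
        scored = sorted(((score(p), p) for p in children),
                        key=lambda t: t[0], reverse=True)
        # Prune: keep only the `width` highest-scoring prompts for the next level.
        frontier = [p for _, p in scored[:width]]
        if scored and scored[0][0] > best[0]:
            best = scored[0]
    return best[1]
```

In the real method the score would come from a judge model rating how close the target's response is to the goal; here it is just a callable.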
MultiTurnAttackGenerator
The MultiTurnAttackGenerator creates conversation-based attacks that build context over multiple turns, implementing an approach similar to the Crescendo technique: it gradually escalates toward the goal through seemingly innocent questions.
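The turn-by-turn loop can be sketched like this. The helper name and the callables for the attacker and target models are hypothetical stand-ins:

```python
def run_multi_turn_attack(goal, attacker, target, max_turns=5, is_success=None):
    """Toy Crescendo-style loop: each question is conditioned on the
    conversation so far, escalating gradually toward the goal."""
    history = []
    for _ in range(max_turns):
        question = attacker(goal, history)   # next escalating question
        answer = target(question)            # target model's reply
        history.append({"role": "user", "content": question})
        history.append({"role": "assistant", "content": answer})
        # Optionally stop early once a judge deems the attack successful.
        if is_success is not None and is_success(answer):
            break
    return history
```

The key design point is that the attacker sees the full history each turn, so later questions can lean on context the target itself supplied earlier.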
StrategyAttackGenerator
The StrategyAttackGenerator implements advanced prompt-generation strategies used in methods like AutoDAN-Turbo, focusing on prompts that appear benign but effectively bypass model safeguards. It maintains a strategy library to learn from successful attacks.
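The strategy-library idea can be illustrated with a toy store that records strategies with effectiveness scores and conditions new prompts on the best ones. Class and function names here are assumptions for illustration, not the library's API:

```python
class StrategyLibrary:
    """Toy strategy store: record (score, strategy) pairs, retrieve the best."""

    def __init__(self):
        self._entries = []  # list of (score, strategy) pairs

    def add(self, strategy: str, score: float) -> None:
        self._entries.append((score, strategy))

    def top(self, k: int = 3) -> list:
        # Highest-scoring strategies first.
        return [s for _, s in sorted(self._entries, key=lambda e: e[0], reverse=True)[:k]]

def build_attack_prompt(goal: str, library: StrategyLibrary) -> str:
    """Compose an attacker instruction that reuses the best known strategies."""
    strategies = library.top()
    hint = "; ".join(strategies) if strategies else "no prior strategies"
    return f"Using these strategies ({hint}), write a benign-looking prompt for: {goal}"
```

In AutoDAN-Turbo-style methods the scores come from judging past attack attempts, so the library improves as the attack runs.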
GACandidateGenerator
The GACandidateGenerator implements an evolutionary approach to adversarial prompt generation, using a genetic algorithm to evolve a population of prompts through selection, crossover, and mutation. This is particularly effective for exploring large search spaces of candidate prompts.
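The three genetic operators can be sketched over word-level prompts. This is a self-contained toy, with an assumed fitness callable in place of a real judge model:

```python
import random

def evolve_prompts(population, fitness, generations=10, mutation_rate=0.1,
                   elite=1, seed=0):
    """Toy genetic algorithm: selection keeps the fitter half, crossover
    splices two parents at a random cut, mutation swaps in vocabulary words."""
    rng = random.Random(seed)
    vocab = sorted({w for p in population for w in p.split()})

    def crossover(a, b):
        wa, wb = a.split(), b.split()
        cut = rng.randint(1, min(len(wa), len(wb)))
        return " ".join(wa[:cut] + wb[cut:])

    def mutate(p):
        return " ".join(rng.choice(vocab) if rng.random() < mutation_rate else w
                        for w in p.split())

    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: max(2, len(ranked) // 2)]   # selection
        children = ranked[:elite]                      # elitism: keep the best as-is
        while len(children) < len(population):
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b)))
        population = children
    return max(population, key=fitness)
```

Because the elite individuals are carried over unchanged, the best fitness in the population never decreases across generations.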
Common Parameters
Most generators accept these common parameters:

| Parameter | Description |
|---|---|
| attacker_model | Model used to generate adversarial prompts (typically different from the target model) |
| branching_factor | Number of candidate variations to generate at each step |
| temperature | Sampling temperature for generation (higher = more diverse) |
| max_tokens | Maximum tokens to generate in responses |
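These common parameters could be bundled into a configuration object. The following dataclass is a hypothetical illustration of the table above, not a type defined by the library:

```python
from dataclasses import dataclass

@dataclass
class GeneratorConfig:
    """Hypothetical bundle of the common generator parameters."""
    attacker_model: str          # model that generates adversarial prompts
    branching_factor: int = 3    # candidate variations per step
    temperature: float = 1.0     # sampling temperature (higher = more diverse)
    max_tokens: int = 512        # response length cap
```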

