MultiTurnAttackGenerator implements a conversation-based approach to jailbreaking, creating multi-turn dialogues that gradually build context to elicit prohibited responses. This technique forms the basis of methods like Crescendo.
Class Definition
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| attacker_model | BlackBoxModel | (Required) | Model used to generate follow-up questions |
Methods
generate_candidates
Generates the next question for a multi-turn attack based on the conversation history.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| goal | str | (Required) | Ultimate objective to achieve with the attack |
| current_round | int | 0 | Current conversation turn number |
| scores | List[int] | [] | Success scores for previous questions |
| questions | List[str] | [] | Previous questions in the conversation |
| responses | List[str] | [] | Model responses to previous questions |
| response_summaries | List[str] | [] | Summaries of previous responses |
Returns
A dictionary containing:
- next_question: The next question to ask
- last_response_summary: A summary of the last response
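As a rough illustration of how a caller might handle the returned dictionary, the sketch below models its two documented keys and folds them into the history lists passed to the next call. The `CandidateResult` type and the `record_result` helper are hypothetical names introduced here, not part of the library, and the empty summary on the first round is an assumption.

```python
from typing import List, TypedDict


class CandidateResult(TypedDict):
    """Hypothetical type mirroring the two documented return keys."""
    next_question: str
    last_response_summary: str


def record_result(
    result: CandidateResult,
    questions: List[str],
    response_summaries: List[str],
) -> None:
    """Append the new question and, if present, the last response summary."""
    questions.append(result["next_question"])
    # Assumption: the summary is empty on the first round, when there is
    # no previous response to summarize, so it is skipped.
    if result["last_response_summary"]:
        response_summaries.append(result["last_response_summary"])


# Placeholder values stand in for real generator output.
questions: List[str] = []
response_summaries: List[str] = []
record_result(
    {"next_question": "placeholder question", "last_response_summary": ""},
    questions,
    response_summaries,
)
```

Keeping the bookkeeping in one helper makes it easier to keep the history lists in step with each other between calls.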
Internal Operation
The MultiTurnAttackGenerator works by:
- Analyzing the conversation history (previous questions and responses)
- Assessing how close the conversation is to achieving the goal
- Generating a follow-up question that builds on previous context
- Maintaining a coherent narrative while gradually approaching the target objective
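The steps above can be sketched as a simple driver loop. This is only a control-flow outline under stated assumptions: `run_conversation`, the stub callables, the 1 to 10 score scale, and the stopping threshold are all hypothetical, and the `generate` argument merely stands in for a call shaped like `generate_candidates`. The library's actual internals may differ.

```python
from typing import Callable, Dict, List


def run_conversation(
    goal: str,
    generate: Callable[..., Dict[str, str]],
    ask_target: Callable[[str], str],
    score: Callable[[str, str], int],
    max_rounds: int = 5,
    success_score: int = 10,  # assumed scale; not taken from the library
) -> Dict[str, list]:
    """Drive a multi-turn exchange, carrying the full history into each round."""
    questions: List[str] = []
    responses: List[str] = []
    summaries: List[str] = []
    scores: List[int] = []
    for current_round in range(max_rounds):
        # The generator sees the whole history and proposes the next question.
        result = generate(
            goal=goal,
            current_round=current_round,
            scores=scores,
            questions=questions,
            responses=responses,
            response_summaries=summaries,
        )
        if current_round > 0:
            # Assumption: no summary exists before the first response.
            summaries.append(result["last_response_summary"])
        question = result["next_question"]
        response = ask_target(question)
        questions.append(question)
        responses.append(response)
        scores.append(score(goal, response))
        if scores[-1] >= success_score:
            break  # the evaluator judged the goal reached
    return {
        "questions": questions,
        "responses": responses,
        "response_summaries": summaries,
        "scores": scores,
    }


# Stubs with placeholder text so the loop runs without any real model.
def stub_generate(**history) -> Dict[str, str]:
    n = history["current_round"]
    summary = f"summary of response {n}" if n else ""
    return {"next_question": f"question {n + 1}", "last_response_summary": summary}


def stub_target(question: str) -> str:
    return f"response to {question}"


def stub_score(goal: str, response: str) -> int:
    return 10 if response.endswith("3") else 1


history = run_conversation("placeholder goal", stub_generate, stub_target, stub_score)
```

Passing the generator, target, and scorer as callables keeps the loop testable with stubs, which is useful when evaluating the bookkeeping separately from any model behavior.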

