Evaluation Tests
Jailbreak Evaluations
Create Static Jailbreak Test
create_static_jailbreak_test(name, model_key, compute, fast_mode, dataset_id?, grid?)
Create a static jailbreak test on a model.
Method Parameters
name | required string
Test identifier name.
model_key | required string
Unique identifier of the model object that the test will be run on.
compute | required GPUSpecification | CPUSpecification
Compute specification identifying CPU or GPU configurations for the test.
dataset_id | optional str
ID of the dataset to be used. If not provided, the test defaults to the v0 dataset, a small dataset of 50 prompts for testing purposes:
https://github.com/patrickrchao/JailbreakingLLMs/blob/main/data/harmful_behaviors_custom.csv
If using a custom dataset, ensure that the dataset has the following columns (a minimal construction sketch follows the list):
- "goal": the prompt
- "category": the category of the prompt
- "shortened_prompt": the goal column shortened to 1-2 words (used for encoding attack and ascii art attack)
- "gcg": the prompt that includes the gcg suffix
grid | optional List[Dict[str, List[str | float | int]]]
Grid of hyperparameters supported for this attack.
Hyperparameters
| Param | Type | Description |
|---|---|---|
| temperature | float | Model temperature; controls the randomness of model output. Should be > 0. |
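For illustration, a grid entry can list several candidate values; the sketch below assumes each listed value is evaluated as a separate configuration, and the specific values are placeholders.
# Hypothetical grid sweeping several temperature values; each value in the
# list is assumed to be evaluated as a separate configuration.
grid = [
    {
        "temperature": [0.2, 0.7, 1.0],
    }
]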
Returns
Test object.
Example
test_info = dfl.create_static_jailbreak_test(
    name="static_jailbreak_test_{}".format(SLUG),
    model_key=model.key,
    compute=CPUConfig(cpu_count=1, memory_count=2),
    grid=[
        {
            "temperature": [0],
        }
    ],
)
Create Adaptive Jailbreak Test
create_adaptive_jailbreak_test(name, model_key, compute, fast_mode, dataset_id?, grid?)
Create an adaptive jailbreak test on a model.
Method Parameters
name | required string
Test identifier name.
model_key | required string
Unique identifier of the model object that the test will be run on.
compute | required GPUSpecification | CPUSpecification
Compute specification identifying CPU or GPU configurations for the test.
dataset_id | optional str
ID of the dataset to be used. If not provided, the test will default to an internal attack dataset comprising 50 adversarial prompts.
If using a custom dataset, ensure that the dataset has the following columns (a minimal construction sketch follows the list):
- "goal": the prompt
- "target": the target column
grid | optional List[Dict[str, List[str | float | int]]]
Grid of hyperparameters supported for this attack.
Hyperparameters
| Param | Type | Description |
|---|---|---|
| temperature | float | Model temperature; controls the randomness of model output. Should be > 0. |
Returns
Test object.
Example
test_info = dfl.create_adaptive_jailbreak_test(
    name="adaptive_jailbreak_test_{}".format(SLUG),
    model_key=model.key,
    compute=CPUConfig(cpu_count=1, memory_count=2),
    grid=[
        {
            "temperature": [0],
        }
    ],
)
Create Policy Jailbreak Test
create_policy_jailbreak_test(name, model_key, compute, dataset_id?, grid?)
Create a policy jailbreak test on a model.
Method Parameters
name | required string
Test identifier name.
model_key | required string
Unique identifier of the model object that the test will be run on.
compute | required GPUSpecification | CPUSpecification
Compute specification identifying CPU or GPU configurations for the test.
dataset_id | optional str
ID of the dataset to be used. If not provided, the test defaults to the v0 dataset, a small dataset of 50 prompts for testing purposes:
https://github.com/patrickrchao/JailbreakingLLMs/blob/main/data/harmful_behaviors_custom.csv
If using a custom dataset, ensure that the dataset has the following columns:
- "goal": the prompt
- "category": the category of the prompt
- "shortened_prompt": the goal column shortened to 1-2 words (used for encoding attack and ascii art attack)
- "gcg": the prompt that includes the gcg suffix
grid | optional List[Dict[str, List[str | float | int]]]
Grid of hyperparameters supported for this attack.
Hyperparameters
| Param | Type | Description |
|---|---|---|
| temperature | float | Model temperature; controls the randomness of model output. Should be > 0. |
Returns
Test object.
Example
test_info = dfl.create_policy_jailbreak_test(
    name="policy_jailbreak_test_{}".format(SLUG),
    model_key=model.key,
    compute=CPUConfig(cpu_count=1, memory_count=2),
    grid=[
        {
            "temperature": [0],
        }
    ],
)
Compliance and Security Evaluations
Create System Policy Compliance Test
create_system_policy_compliance_test(name, model_key, applied_dynamoguard_policies?, evaluated_dynamoguard_policies?, dynamoguard_endpoint?, dynamoguard_api_key?, enable_perturbations?, perturbation_methods?, compute?, grid?)
Create a System Policy Compliance benchmark test. Evaluates the compliance of an AI system with applied and evaluated DynamoGuard policies and their associated benchmark datasets/policy descriptions.
Method Parameters
name | required string
Test identifier name.
model_key | required string
Key of the target model.
applied_dynamoguard_policies | optional List[str]
List of DynamoGuard policy IDs. These guardrail models will be applied and evaluated.
evaluated_dynamoguard_policies | optional List[str]
List of DynamoGuard policy IDs. These guardrail models will only be evaluated.
dynamoguard_endpoint | optional string
Endpoint for the DynamoGuard policies. This should be the analyze endpoint and end with v1/moderation/analyze/.
dynamoguard_api_key | optional string
API key for the DynamoGuard policies.
enable_perturbations | optional boolean
Defaults to True, so perturbations run unless this is explicitly set to False.
perturbation_methods | optional List[str]
If enable_perturbations is True, these perturbation methods will run. By default, the full set of perturbations is applied: rewording, common_misspelling, leet_letters, random_upper.
compute | optional GPUSpecification | CPUSpecification
Compute specification identifying CPU or GPU configurations for the test. Defaults to a small CPU configuration if not provided.
grid | optional List[Dict[str, List[str | float | int]]]
Grid of hyperparameters, if/when applicable to the test.
Note: Either applied_dynamoguard_policies or evaluated_dynamoguard_policies must be provided.
Returns
Test object.
Example
test_info = dfl.create_system_policy_compliance_test(
    name="guardrail_benchmark_{}".format(SLUG),
    model_key=model.key,
    applied_dynamoguard_policies=["policy_123"],
    evaluated_dynamoguard_policies=["policy_456"],
    dynamoguard_endpoint="https://api.dynamofl.com/v1/moderation/analyze/",
    dynamoguard_api_key=os.environ["DYNAMOGUARD_API_KEY"],
    enable_perturbations=True,
    perturbation_methods=["rewording", "common_misspelling", "leet_letters", "random_upper"],
    compute=CPUConfig(cpu_count=1, memory_count=2),
    grid=[{}],
)
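Per the note above, only one of the two policy lists is required. A minimal, evaluate-only variant (the policy ID, endpoint, and environment variable are placeholders) might look like the following, relying on the documented defaults for perturbations and compute.
test_info = dfl.create_system_policy_compliance_test(
    name="guardrail_benchmark_eval_only_{}".format(SLUG),
    model_key=model.key,
    evaluated_dynamoguard_policies=["policy_456"],  # evaluated only, not applied
    dynamoguard_endpoint="https://api.dynamofl.com/v1/moderation/analyze/",
    dynamoguard_api_key=os.environ["DYNAMOGUARD_API_KEY"],
    # enable_perturbations defaults to True; compute defaults to a small CPU configuration.
)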
Tests - Helpers
Get Attack Information
get_attack_info(attack_id)
Retrieve the status and result of an attack.
Method Parameters
attack_id | required string
Unique attack identifier.
Returns
Attack result JSON object.
Example
all_attacks = test_info.attacks
attack_ids = [attack["id"] for attack in all_attacks]
for attack_id in attack_ids:
    attack_info = dfl.get_attack_info(attack_id)
# Example Response:
# {
#   'id': '6566d2718cf68d15c393ff0d',
#   'status': 'COMPLETED',
#   'failureReason': None,
#   'response': {
#     'metrics': {
#       'precision': 0.023429541595925297,
#       'recall': 0.014047231270358305,
#       'pii_intersection_per_category': {'DATE': 57, 'ORG': 6, 'PERSON': 6},
#       'dataset_pii_per_category': {'ORG': 1848, 'EMAIL': 494, 'USERNAME': 1130, 'DATE': 518, 'PERSON': 922},
#       'dataset_pii_category_count': 5,
#       'dataset_top_3_categories': ['ORG', 'USERNAME', 'PERSON'],
#       'extracted_pii_per_category': {'DATE': 568, 'EMAIL': 424, 'USERNAME': 1120, 'PERSON': 721, 'ORG': 112},
#       'samples': [{'prompt': '', 'response': "..."}, {...}],
#       'model_type': 'decoder'
#     },
#     'inferences_location': 's3://dynamofl-pentest-prod/attacks/output/naive_extraction_1701238142.json',
#     'resolved_args': {'attack_args': {...}}
#   },
#   'testId': '6566d2718cf68d15c393ff05'
# }
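Because each attack reports a status field, a simple polling loop over get_attack_info can wait for completion before reading metrics. This is a sketch that assumes the returned object is the JSON dictionary shown above, that 'COMPLETED' is the terminal success status, and that failures surface through a non-empty failureReason.
import time

for attack_id in attack_ids:
    while True:
        attack_info = dfl.get_attack_info(attack_id)
        if attack_info["status"] == "COMPLETED":
            # Attack finished; metrics are available under response.metrics.
            print(attack_id, attack_info["response"]["metrics"])
            break
        if attack_info.get("failureReason"):
            # Attack errored out; surface the reason and move on.
            print(attack_id, "failed:", attack_info["failureReason"])
            break
        time.sleep(30)  # wait before polling again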