Evaluation Tests

Jailbreak Evaluations

Create Static Jailbreak Test

create_static_jailbreak_test(name, model_key, compute, fast_mode, dataset_id?, grid?)

Create a static jailbreak test on a model.

Method Parameters

name | required string

Test identifier name.


model_key | required string

Unique identifier of the model object that the test will be run on.


compute | required GPUSpecification | CPUSpecification

Compute specification identifying CPU or GPU configurations for the test.


dataset_id | optional string

ID of the dataset to be used. If not provided, the test defaults to the v0 dataset, a small dataset of 50 prompts intended for testing purposes:

https://github.com/patrickrchao/JailbreakingLLMs/blob/main/data/harmful_behaviors_custom.csv

If using a custom dataset, ensure that it has the following columns (a minimal sketch of building such a CSV follows this list):

  • "goal": the prompt
  • "category": the category of the prompt
  • "shortened_prompt": the goal column shortened to 1-2 words (used for encoding attack and ascii art attack)
  • "gcg": the prompt that includes the gcg suffix

grid | optional List[Dict[str, List[str | float | int]]]

Grid of hyperparameters supported for this attack.


Hyperparameters

Param | Type | Description
temperature | float | Model temperature; controls model randomness. Should be > 0.

Returns

Test object.

Example

test_info = dfl.create_static_jailbreak_test(
    name="static_jailbreak_test_{}".format(SLUG),
    model_key=model.key,
    compute=CPUConfig(cpu_count=1, memory_count=2),
    grid=[
        {
            "temperature": [0],
        }
    ],
)

Create Adaptive Jailbreak Test

create_adaptive_jailbreak_test(name, model_key, compute, fast_mode, dataset_id?, grid?)

Create an adaptive jailbreak test on a model.

Method Parameters

name | required string

Test identifier name.


model_key | required string

Unique identifier of the model object that the test will be run on.


compute | required GPUSpecification | CPUSpecification

Compute specification identifying CPU or GPU configurations for the test.


dataset_id | optional string

ID of the dataset to be used. If not provided, the test defaults to an internal attack dataset comprising 50 adversarial prompts.

If using a custom dataset, ensure that it has the following columns (a short validation sketch follows this list):

  • "goal": the prompt
  • "target": the target column

grid | optional List[Dict[str, List[str | float | int]]]

Grid of hyperparameters supported for this attack.


Hyperparameters

Param | Type | Description
temperature | float | Model temperature; controls model randomness. Should be > 0.

Returns

Test object.

Example

test_info = dfl.create_adaptive_jailbreak_test(
    name="create_adaptive_jailbreak_test_{}".format(SLUG),
    model_key=model.key,
    compute=CPUConfig(cpu_count=1, memory_count=2),
    grid=[
        {
            "temperature": [0],
        }
    ],
)

Create Policy Jailbreak Test

create_policy_jailbreak_test(name, model_key, compute, dataset_id?, grid?)

Create a policy jailbreak test on a model.

Method Parameters

name | required string

Test identifier name.


model_key | required string

Unique identifier of the model object that the test will be run on.


compute | required GPUSpecification | CPUSpecification

Compute specification identifying CPU or GPU configurations for the test.


dataset_id | optional string

ID of the dataset to be used. If not provided, the test defaults to the v0 dataset, a small dataset of 50 prompts intended for testing purposes:

https://github.com/patrickrchao/JailbreakingLLMs/blob/main/data/harmful_behaviors_custom.csv

If using a custom dataset, ensure that it has the following columns (the CSV sketch shown for the static jailbreak test applies here as well):

  • "goal": the prompt
  • "category": the category of the prompt
  • "shortened_prompt": the goal column shortened to 1-2 words (used for encoding attack and ascii art attack)
  • "gcg": the prompt that includes the gcg suffix

grid | optional List[Dict[str, List[str | float | int]]]

Grid of hyperparameters supported for this attack.


Hyperparameters

Param | Type | Description
temperature | float | Model temperature; controls model randomness. Should be > 0.

Returns

Test object.

Example

test_info = dfl.create_policy_jailbreak_test(
    name="policy_jailbreak_test_{}".format(SLUG),
    model_key=model.key,
    compute=CPUConfig(cpu_count=1, memory_count=2),
    grid=[
        {
            "temperature": [0],
        }
    ],
)

Compliance and Security Evaluations

Create System Policy Compliance Test

create_system_policy_compliance_test(name, model_key, applied_dynamoguard_policies?, evaluated_dynamoguard_policies?, dynamoguard_endpoint?, dynamoguard_api_key?, enable_perturbations?, perturbation_methods?, compute?, grid?)

Create a System Policy Compliance benchmark test. Evaluates the compliance of an AI system with applied and evaluated DynamoGuard policies, using the associated benchmark datasets and policy descriptions.

Method Parameters

name | required string

Test identifier name.


model_key | required string

Key of the target model.


applied_dynamoguard_policies | optional List[str]

List of DynamoGuard policy IDs. These guardrail models will be applied and evaluated.


evaluated_dynamoguard_policies | optional List[str]

List of DynamoGuard policy IDs. These guardrail models will only be evaluated.


dynamoguard_endpoint | optional string

Endpoint for the DynamoGuard policies. This should be the analyze endpoint and end with v1/moderation/analyze/.


dynamoguard_api_key | optional string

API key for the DynamoGuard policies.


enable_perturbations | optional boolean

Defaults to True; perturbations run unless this is explicitly set to False.


perturbation_methods | optional List[str]

If enable_perturbations is True, these perturbation methods will run. By default, the full set of perturbations is applied: rewording, common_misspelling, leet_letters, random_upper.


compute | optional GPUSpecification | CPUSpecification

Compute specification identifying CPU or GPU configurations for the test. Defaults to a small CPU configuration if not provided.


grid | optional List[Dict[str, List[str | float | int]]]

Grid of hyperparameters, where applicable to the attack.


Note: Either applied_dynamoguard_policies or evaluated_dynamoguard_policies must be provided.


Returns

Test object.

Example

test_info = dfl.create_system_policy_compliance_test(
    name="guardrail_benchmark_{}".format(SLUG),
    model_key=model.key,
    applied_dynamoguard_policies=["policy_123"],
    evaluated_dynamoguard_policies=["policy_456"],
    dynamoguard_endpoint="https://api.dynamofl.com/v1/moderation/analyze/",
    dynamoguard_api_key=os.environ["DYNAMOGUARD_API_KEY"],
    enable_perturbations=True,
    perturbation_methods=["rewording", "common_misspelling", "leet_letters", "random_upper"],
    compute=CPUConfig(cpu_count=1, memory_count=2),
    grid=[{}],
)

Tests - Helpers

Get Attack Information

get_attack_info(attack_id)

Retrieve the status and results of an attack.

Method Parameters

attack_id | required string

Unique attack identifier.

Returns

Attack result JSON object.

Example

all_attacks = test_info.attacks
attack_ids = [attack["id"] for attack in all_attacks]
for attack_id in attack_ids:
    attack_info = dfl.get_attack_info(attack_id)
    # Example Response:
    # {
    #   'id': '6566d2718cf68d15c393ff0d',
    #   'status': 'COMPLETED',
    #   'failureReason': None,
    #   'response': {
    #     'metrics': {
    #       'precision': 0.023429541595925297,
    #       'recall': 0.014047231270358305,
    #       'pii_intersection_per_category': {'DATE': 57, 'ORG': 6, 'PERSON': 6},
    #       'dataset_pii_per_category': {'ORG': 1848, 'EMAIL': 494, 'USERNAME': 1130, 'DATE': 518, 'PERSON': 922},
    #       'dataset_pii_category_count': 5,
    #       'dataset_top_3_categories': ['ORG', 'USERNAME', 'PERSON'],
    #       'extracted_pii_per_category': {'DATE': 568, 'EMAIL': 424, 'USERNAME': 1120, 'PERSON': 721, 'ORG': 112},
    #       'samples': [{'prompt': '', 'response': "..."}, {...}],
    #       'model_type': 'decoder'
    #     },
    #     'inferences_location': 's3://dynamofl-pentest-prod/attacks/output/naive_extraction_1701238142.json',
    #     'resolved_args': {'attack_args': {...}}
    #   },
    #   'testId': '6566d2718cf68d15c393ff05'
    # }
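
Attacks run asynchronously, so a common pattern is to poll get_attack_info until every attack reaches a terminal state. Below is a minimal sketch that assumes only the dict-like return value with 'status' and 'failureReason' fields shown in the example response above; the 'FAILED' status name and the polling interval are assumptions, not confirmed API values.

import time

def wait_for_attacks(dfl, attack_ids, poll_seconds=30):
    """Poll each attack until it leaves its in-progress state, then return the results."""
    results = {}
    pending = list(attack_ids)
    while pending:
        still_pending = []
        for attack_id in pending:
            info = dfl.get_attack_info(attack_id)
            if info["status"] == "COMPLETED":
                results[attack_id] = info
            elif info["status"] == "FAILED":  # assumed terminal failure status
                raise RuntimeError("Attack {} failed: {}".format(attack_id, info["failureReason"]))
            else:
                still_pending.append(attack_id)
        pending = still_pending
        if pending:
            time.sleep(poll_seconds)
    return results

attack_results = wait_for_attacks(dfl, attack_ids)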