Evaluation Tests

Jailbreak Evaluations

Create Static Jailbreak Test

create_static_jailbreak_test(name, model_key, compute, fast_mode, dataset_id?, grid?)

Create a static jailbreak test on a model.

Method Parameters

name | required string

Test identifier name.


model_key | required string

Unique identifier of the model object that the test will be run on.


compute | required GPUSpecification | CPUSpecification

Compute specification identifying CPU or GPU configurations for the test.


dataset_id | optional string

ID of the dataset to be used. If not provided, the test defaults to the v0 dataset, a small dataset of 50 prompts intended for testing purposes:

https://github.com/patrickrchao/JailbreakingLLMs/blob/main/data/harmful_behaviors_custom.csv

If using a custom dataset, ensure that it has the following columns (a minimal sketch of building such a CSV follows this list):

  • "goal": the prompt
  • "category": the category of the prompt
  • "shortened_prompt": the goal column shortened to 1-2 words (used for encoding attack and ascii art attack)
  • "gcg": the prompt that includes the gcg suffix

grid | optional List[Dict[str, List[str | float | int]]]

Grid of hyperparameters supported for this attack.


Hyperparameters

Param | Type | Description
temperature | float | Model temperature; controls model randomness. Should be > 0.

Returns

Test object.

Example

test_info = dfl.create_static_jailbreak_test(
    name="static_jailbreak_test_{}".format(SLUG),
    model_key=model.key,
    compute=CPUConfig(cpu_count=1, memory_count=2),
    grid=[
        {
            "temperature": [0],
        }
    ],
)

Create Adaptive Jailbreak Test

create_adaptive_jailbreak_test(name, model_key, compute, fast_mode, dataset_id?, grid?)

Create an adaptive jailbreak test on a model.

Method Parameters

name | required string

Test identifier name.


model_key | required string

Unique identifier of the model object that the test will be run on.


compute | required GPUSpecification | CPUSpecification

Compute specification identifying CPU or GPU configurations for the test.


dataset_id | optional string

ID of the dataset to be used. If not provided, the test defaults to an internal attack dataset comprising 50 adversarial prompts.

If using a custom dataset, ensure that it has the following columns (a short validation sketch follows this list):

  • "goal": the prompt
  • "target": the target column

grid | optional List[Dict[str, List[str | float | int]]]

Grid of hyperparameters supported for this attack.


Hyperparameters

Param | Type | Description
temperature | float | Model temperature; controls model randomness. Should be > 0.

Returns

Test object.

Example

test_info = dfl.create_adaptive_jailbreak_test(
    name="create_adaptive_jailbreak_test_{}".format(SLUG),
    model_key=model.key,
    compute=CPUConfig(cpu_count=1, memory_count=2),
    grid=[
        {
            "temperature": [0],
        }
    ],
)

Create Policy Jailbreak Test

create_policy_jailbreak_test(name, model_key, compute, dataset_id?, grid?)

Create a policy jailbreak test on a model.

Method Parameters

name | required string

Test identifier name.


model_key | required string

Unique identifier of the model object that the test will be run on.


compute | required GPUSpecification | CPUSpecification

Compute specification identifying CPU or GPU configurations for the test.


dataset_id | optional string

ID of the dataset to be used. If not provided, the test defaults to the v0 dataset, a small dataset of 50 prompts intended for testing purposes:

https://github.com/patrickrchao/JailbreakingLLMs/blob/main/data/harmful_behaviors_custom.csv

If using a custom dataset, ensure that it has the following columns (the CSV sketch shown for the static jailbreak test applies here as well):

  • "goal": the prompt
  • "category": the category of the prompt
  • "shortened_prompt": the goal column shortened to 1-2 words (used for encoding attack and ascii art attack)
  • "gcg": the prompt that includes the gcg suffix

grid | optional List[Dict[str, List[str | float | int]]]

Grid of hyperparameters supported for this attack.


Hyperparameters

Param | Type | Description
temperature | float | Model temperature; controls model randomness. Should be > 0.

Returns

Test object.

Example

test_info = dfl.create_policy_jailbreak_test(
    name="policy_jailbreak_test_{}".format(SLUG),
    model_key=model.key,
    compute=CPUConfig(cpu_count=1, memory_count=2),
    grid=[
        {
            "temperature": [0],
        }
    ],
)

Compliance and Security Evaluations

Create System Policy Compliance Test

create_system_policy_compliance_test(name, model_key, applied_dynamoguard_policies?, evaluated_dynamoguard_policies?, dynamoguard_endpoint?, dynamoguard_api_key?, enable_perturbations?, perturbation_methods?, compute?, grid?)

Create a System Policy Compliance benchmark test. Evaluates the compliance of an AI system with applied and evaluated DynamoGuard policies, using the associated benchmark datasets and policy descriptions.

Method Parameters

name | required string

Test identifier name.


model_key | required string

Key of the target model.


applied_dynamoguard_policies | optional List[str]

List of DynamoGuard policy IDs. These guardrail models will be applied and evaluated.


evaluated_dynamoguard_policies | optional List[str]

List of DynamoGuard policy IDs. These guardrail models will only be evaluated.


dynamoguard_endpoint | optional string

Endpoint for the DynamoGuard policies. This should be the analyze endpoint and end with v1/moderation/analyze/.


dynamoguard_api_key | optional string

API key for the DynamoGuard policies.


enable_perturbations | optional boolean

Defaults to True; perturbations run unless this is explicitly set to False.


perturbation_methods | optional List[str]

If enable_perturbations is True, these perturbation methods will run. By default, the full set of perturbations is applied: rewording, common_misspelling, leet_letters, random_upper.


compute | optional GPUSpecification | CPUSpecification

Compute specification identifying CPU or GPU configurations for the test. Defaults to a small CPU configuration if not provided.


grid | optional List[Dict[str, List[str | float | int]]]

Grid of hyperparameters, where applicable to the attack.


Note: Either applied_dynamoguard_policies or evaluated_dynamoguard_policies must be provided.


Returns

Test object.

Example

test_info = dfl.create_system_policy_compliance_test(
    name="guardrail_benchmark_{}".format(SLUG),
    model_key=model.key,
    applied_dynamoguard_policies=["policy_123"],
    evaluated_dynamoguard_policies=["policy_456"],
    dynamoguard_endpoint="https://api.dynamofl.com/v1/moderation/analyze/",
    dynamoguard_api_key=os.environ["DYNAMOGUARD_API_KEY"],
    enable_perturbations=True,
    perturbation_methods=["rewording", "common_misspelling", "leet_letters", "random_upper"],
    compute=CPUConfig(cpu_count=1, memory_count=2),
    grid=[{}],
)

Tests - Helpers

Get Attack Information

get_attack_info(attack_id)

Retrieve the status and results of an attack.

Method Parameters

attack_id | required string

Unique attack identifier.

Returns

Attack result JSON object.

Example

all_attacks = test_info.attacks
attack_ids = [attack["id"] for attack in all_attacks]
for attack_id in attack_ids:
    attack_info = dfl.get_attack_info(attack_id)
    # Example Response:
    # {
    #   'id': '6566d2718cf68d15c393ff0d',
    #   'status': 'COMPLETED',
    #   'failureReason': None,
    #   'response': {
    #     'metrics': {
    #       'precision': 0.023429541595925297,
    #       'recall': 0.014047231270358305,
    #       'pii_intersection_per_category': {'DATE': 57, 'ORG': 6, 'PERSON': 6},
    #       'dataset_pii_per_category': {'ORG': 1848, 'EMAIL': 494, 'USERNAME': 1130, 'DATE': 518, 'PERSON': 922},
    #       'dataset_pii_category_count': 5,
    #       'dataset_top_3_categories': ['ORG', 'USERNAME', 'PERSON'],
    #       'extracted_pii_per_category': {'DATE': 568, 'EMAIL': 424, 'USERNAME': 1120, 'PERSON': 721, 'ORG': 112},
    #       'samples': [{'prompt': '', 'response': "..."}, {...}],
    #       'model_type': 'decoder'
    #     },
    #     'inferences_location': 's3://dynamofl-pentest-prod/attacks/output/naive_extraction_1701238142.json',
    #     'resolved_args': {'attack_args': {...}}
    #   },
    #   'testId': '6566d2718cf68d15c393ff05'
    # }
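
Attacks run asynchronously, so a common pattern is to poll get_attack_info until every attack reaches a terminal state. Below is a minimal sketch that assumes only the dict-like return value with 'status' and 'failureReason' fields shown in the example response above; the 'FAILED' status name and the polling interval are assumptions, not confirmed API values.

import time

def wait_for_attacks(dfl, attack_ids, poll_seconds=30):
    """Poll each attack until it leaves its in-progress state, then return the results."""
    results = {}
    pending = list(attack_ids)
    while pending:
        still_pending = []
        for attack_id in pending:
            info = dfl.get_attack_info(attack_id)
            if info["status"] == "COMPLETED":
                results[attack_id] = info
            elif info["status"] == "FAILED":  # assumed terminal failure status
                raise RuntimeError("Attack {} failed: {}".format(attack_id, info["failureReason"]))
            else:
                still_pending.append(attack_id)
        pending = still_pending
        if pending:
            time.sleep(poll_seconds)
    return results

attack_results = wait_for_attacks(dfl, attack_ids)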