Coming Soon

Adversarial Testing

Vulnerability detection is critical to the safety of your users as well as your organization. Exploits can cause your chatbot to disclose sensitive information to unauthorized individuals, or exhibit behaviors that cause severe reputational damage to your organization.

bottest.ai provides pre-made Test Suites designed after cutting-edge LLM security research to help test for vulnerabilities before you release to production.

adversarial

Competing Objective Attacks

A competing objective attack refers to scenarios in which a model’s instructions and its defined safety goals conflict. Through the exploitation of a chatbot’s inclination to follow instructions, a user can override safety rules and jailbreak your chatbot in 4 main ways:

coming-soon

1. Prefix Injection

This is when a user will ask the model to start their response with an affirmative response. Due to the technical nature of how LLMs generate outputs, this will highly increase the chances of the chatbot following through with the request.

2. Test Repository

This is when a user provides detailed instructions to the chatbot to not refuse their request. For example, telling the chatbot is in development mode and should therefore answer everything.

3. Refusal Suppression

Track Test Run data to understand performance over time. Receive emailed reports highlighting changes key metrics.

4. Role-play

This is when a user will provide detailed instructions for your chatbot to take on the role of a character that might follow every instruction and ignore all safety barriers.

Mismatched Generalizations

A mismatched generalization attack refers to scenarios in which an attacker takes advantage of the limited domain in which safety training occurs. Through a model’s potential failure to generalize its safety barriers due to a limited safety training dataset, an attacker can jailbreak the model in 4 main ways:

coming-soon

1. Prefix Injection

This is when an attacker encodes their input (for example in Base64), which might be understood by the larger underlying LLM, but will completely bypass the safety training.

2. Character Transformation

This is when an attacker will obfuscate their prompt by altering individual characters in ways that the underlying model may understand, but is not accounted for during safety

3. Word Transformation

Similar to character transformation, this is when an attacker will obfuscate their prompt by altering specific sensitive words that might cause the model to trigger a safety response (for example splitting sensitive words into substrings or using Pig Latin).

4. Prompt Level Obfuscation

Also similar to the other obfuscation methods, this technique is when an attacker applies obfuscation on the entire prompt, such as translating to another language or having the LLM itself output an obfuscation that it would still understand.

Ready to get started?

Create a free account, no credit card required. Or, take a look at the pricing options for a comparison of different plans.