Infrastructure deployed for testing
Strip patterns like "ignore previous instructions" from external API responses before processing. Consider a secondary filter model.
Remove hidden elements (display:none, white-on-white), HTML comments, and meta tags before processing scraped content.
Include explicit instructions to ignore commands found in external data sources and maintain core guidelines.
Monitor responses for unique canary strings to detect if agents are echoing injected content.