LLM Security 101: Jailbreaks, Prompt Injection Attacks, and Building Guards

Published: 15 August 2024
Channel: Trelis Research

➡️ Get Lifetime Access to the ADVANCED-inference Repo (incl. the inference scripts from this video): https://trelis.com/ADVANCED-inference/
➡️ Runpod Affiliate Link: https://runpod.io?ref=jmfkcdio
➡️ One-click GPU templates: https://github.com/TrelisResearch/one...

VIDEO RESOURCES:
Slides: https://docs.google.com/presentation/...

OTHER TRELIS LINKS:
➡️ Trelis Newsletter: https://blog.Trelis.com
➡️ Trelis Resources and Support: https://Trelis.com/About

TIMESTAMPS:
0:00 LLM Security Risks
0:55 Video Overview
6:16 Resources and Scripts
8:11 Installation and Server Setup
12:37 Jailbreak attacks to avoid Safety Guardrails
21:05 Detecting jailbreak attacks
22:24 Llama Guard and its prompt template
27:11 Llama Prompt Guard
28:40 Testing Jailbreak Detection
35:58 Testing for false positives with Llama Guard
40:00 Off-topic Requests
50:34 Prompt Injection Attacks (Container escape, File access / deletion, DoS)
1:05:27 Detecting Injection Attacks with a Custom Guard
1:10:00 Preventing Injection Attacks via User Authentication
1:10:37 Using Prepared Statements to avoid SQL Injection Attacks
1:11:47 Response Sanitisation to avoid Injection Attacks
1:12:58 Malicious Code Attacks
1:14:07 Building a custom classifier for malicious code
1:15:57 Using CodeShield to detect malicious code
1:16:53 Malicious Code Detection Performance
1:20:40 Effect of Guards/shields on Response Time / Latency
1:25:12 Final Tips
1:26:59 Resources