AI Safety Researcher · Ex-Tech Leader · Machine Learning Engineer
Katherine Soto
Building red-teaming agents and using mechanistic interpretability to detect multi-turn jailbreak attacks on LLMs. Ex Tech Lead. Peruvian, based in Spain.
01 — RESEARCH
Current thesis
MSc Thesis · La Salle — Universitat Ramon Llull
Interpretability in Multi-turn Jailbreak Attacks on LLMs
My research sits at the intersection of AI security and mechanistic interpretability. I'm working on both sides of the problem — building adversarial agents that attack, and using interpretability to understand and detect those attacks from inside the model.
Red-teaming Agent
Building a multi-turn adversarial agent that generates jailbreak attacks against LLMs, fine-tuned with LoRA on the ScaleAI/MHJ dataset.
Attack Detection via Interpretability
Using mechanistic interpretability to identify internal signals that distinguish an ongoing multi-turn attack from normal conversation.
02 — PROJECTS
Selected work
01
Multi-turn Red-team Bench
Training and evaluating multi-turn adversarial agents to test LLM safety robustness. Comparing censored vs uncensored models as attackers, fine-tuned with LoRA on ScaleAI/MHJ.
Python · LoRA · LLMs · Red-teaming
02
Moltbook Safety Pipeline
End-to-end data pipeline that scrapes data, processes it with Polars, and predicts user karma with H2O AutoML. Production data engineering for content safety.
Polars · H2O AutoML · Pipeline
03
Network Attack Detection
Entropy-based analysis of network traffic to distinguish between normal patterns and various types of cyber attacks using the KDD Cup 1999 dataset.
Entropy · Network Security · Classification
04
Emotion Recognition — ViT
Emotion recognition in images using Vision Transformers combined with LightGBM, with Optuna for hyperparameter optimization.
ViT · LightGBM · Optuna
03 — JOURNEY
Builder to researcher
Experience
2025 — PRESENT
MSc Researcher — AI Safety
La Salle — Universitat Ramon Llull
MSc in Data Science. Thesis on interpretability in multi-turn jailbreak attacks: building red-teaming agents and applying mechanistic interpretability to attack detection.
PREVIOUS
Tech Lead
US-based Startup
Led the technical team across full-stack development, DevOps infrastructure, and production ML systems.
AI Safety Programs
BlueDot AI Safety
Technical Program · 2026
ML4Good
AI Safety Bootcamp · 2025
MSc Data Science
Universitat Ramon Llull · 2025–2026