TRICT
Injecting attacks into coder LLMs.
Back in the early days of LLMs, no one knew how easy they were to fool! This was a simple experiment that showed that it is extremely easy to inject attack code into coding LLMs. As a government advisor, it was important to demonstrate that generative AI has no real “guard rails”. I finetuned a lightweight LLM with a simple python injection attack and found that the model preditably injected the attack into suggested output. This demonstrated how LLMs trained on open source repositories could be vulnerable to malcious behavior.
I was the PI on this project. This project was an internally funded research project that I pitched and won a small grant for.