Projects

Remembering More, Risking More: Longitudinal Safety Risks in Memory-Equipped LLM Agents Preprint 2026

Ahmad Al-Tawaha (Virginia Tech), Shangding Gu (UC Berkeley), Peizhi Niu (UIUC), Ruoxi Jia (Virginia Tech), Ming Jin (Virginia Tech)

Preprint, 2026 — under review at NeurIPS

Memory-equipped LLM agents can become less safe over time even without an attacker. We introduce temporal memory contamination, a failure mode driven by benign memory accumulation across tasks, and a trigger-probe protocol that measures it. Across five deployment streams, eight memory architectures, and two agent classes, violation rates rise with exposure length, and a retrieval-time diagnostic flags contamination before generation (0.970 / 0.984 recall).

A Dynamic Penalization Framework for Online Rank-1 Semidefinite Programming Relaxations L4DC 2025

Ahmad Al-Tawaha, Javad Lavaei, Ming Jin

L4DC, 2025

We differentiate through a penalized SDP solver to learn penalty matrices that drive relaxations toward rank-1 solutions, and we meta-learn initializations across tasks for faster, feasibility-preserving solves on Max-Cut and optimal power flow.
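
For orientation, the general shape of a penalized relaxation can be written as below. This is a generic sketch, not the paper's formulation: the symbols $C$, $A_i$, $b_i$, and the penalty matrix $W$ are notation assumed here for illustration.

```latex
\begin{aligned}
\text{(nonconvex QCQP)}\quad
  & \min_{x \in \mathbb{R}^n} \; x^\top C x
  && \text{s.t. } x^\top A_i x = b_i,\quad i = 1,\dots,m \\
\text{(SDP relaxation)}\quad
  & \min_{X \succeq 0} \; \langle C, X \rangle
  && \text{s.t. } \langle A_i, X \rangle = b_i,\quad i = 1,\dots,m \\
\text{(penalized SDP)}\quad
  & \min_{X \succeq 0} \; \langle C, X \rangle + \langle W, X \rangle
  && \text{s.t. } \langle A_i, X \rangle = b_i,\quad i = 1,\dots,m
\end{aligned}
```

The relaxation is exact when its solution has rank 1, that is, $X^\star = x^\star x^{\star\top}$. For Max-Cut the constraints are $X_{ii} = 1$, so every feasible $X$ has $\operatorname{tr}(X) = n$ and a plain trace penalty $W = \lambda I$ only shifts the objective by a constant; a non-trivial $W$ is needed to change which solution is selected.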

Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models ICML 2024

Bilgehan Sel, Ahmad Al-Tawaha, Vanshaj Khattar, Lu Wang, Ruoxi Jia, Ming Jin

ICML, 2024

Some tasks, such as planning problems, cannot be solved in a single linear pass. Chain-of-Thought works for step-by-step reasoning but fails when the model needs to explore multiple paths. Tree-of-Thoughts handles this, but it requires many separate LLM queries per problem plus external code to manage the tree. The insight behind Algorithm of Thoughts (AoT) is simpler: show the model examples of complete search trajectories, including backtracking and dead ends, and it learns to internalize the search itself. No external tree management. One query.
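
To make "complete search trajectory" concrete, here is a minimal sketch that generates one for a toy arithmetic puzzle (combine four numbers to reach 24) and places it in front of a new problem as a single prompt. Everything in it is an assumption for illustration: the function names `search_24` and `build_prompt`, the exhaustive depth-first search, and the prompt wording are hypothetical and are not taken from the paper.

```python
from fractions import Fraction
from itertools import combinations


def search_24(numbers, target=24):
    """Depth-first search for an expression over `numbers` equal to `target`.

    Returns (expression or None, trace). The trace is a text log of every
    step tried, including dead ends and the points where the search backtracks.
    """
    trace = []

    def dfs(state, depth):
        indent = "  " * depth
        if len(state) == 1:
            value, expr = state[0]
            if value == target:
                trace.append(f"{indent}reached {target}: {expr}")
                return expr
            trace.append(f"{indent}dead end ({expr} = {value}), backtrack")
            return None
        # Combine any two remaining numbers with one arithmetic operation.
        for i, j in combinations(range(len(state)), 2):
            (a, ea), (b, eb) = state[i], state[j]
            rest = [s for k, s in enumerate(state) if k not in (i, j)]
            options = [
                (a + b, f"({ea} + {eb})"),
                (a * b, f"({ea} * {eb})"),
                (a - b, f"({ea} - {eb})"),
                (b - a, f"({eb} - {ea})"),
            ]
            if b != 0:
                options.append((a / b, f"({ea} / {eb})"))
            if a != 0:
                options.append((b / a, f"({eb} / {ea})"))
            for value, expr in options:
                trace.append(f"{indent}try {expr} = {value}")
                found = dfs(rest + [(value, expr)], depth + 1)
                if found is not None:
                    return found
        return None

    # Exact rational arithmetic avoids floating-point error in divisions.
    start = [(Fraction(n), str(n)) for n in numbers]
    return dfs(start, 0), "\n".join(trace)


def build_prompt(example_numbers, new_numbers, target=24):
    """Assemble one prompt: a worked search trajectory, then the new problem."""
    _, trace = search_24(example_numbers, target)

    def fmt(nums):
        return " ".join(str(n) for n in nums)

    return (
        f"Use the numbers {fmt(example_numbers)} to make {target}.\n"
        f"{trace}\n\n"
        f"Use the numbers {fmt(new_numbers)} to make {target}.\n"
    )
```

Calling `build_prompt([1, 2, 3, 4], [4, 9, 10, 13])` yields one string that would be sent to the model in a single query. The trace from this exhaustive search is long; a practical in-context example would likely use a shorter trajectory that explores only a few promising branches.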