Projects
SLM-RL Search →
Trained a 4B-parameter model with RL to write and execute its own search strategies. 3.7x improvement over baseline.
Multi-Agent LLM Collaboration →
Multi-agent architecture with custom reflection and inter-agent communication. Fine-tuned LLaMA-3 8B with LoRA, reaching an 85% success rate across 7 coordination tasks.
PyCxsim →
Open-source Python package for running multi-agent simulations with a real-time visual interface.