Bridging research and engineering to make AI more reliable and controllable.
At Zhejiang University and Berkeley AI Research, I focus on vision-language systems.
My work spans text-to-image generation, multimodal reasoning, and tools that make advanced models more accessible.
Key focus areas
Joint reasoning across text and vision
Steerable generation workflows
Responsible model behaviors
Highlights from ongoing research and collaborations
Curated a benchmarking suite for evaluating grounding and reasoning across captioning, VQA, and editing tasks.
Snapshots of what I've been up to
Started an internship at BAIR to explore multimodal instruction-following models.
Shared work on layout-aware diffusion editing at a university seminar.
Released a public repo of prompts and evaluation scripts for diffusion-based editing pipelines.