Bridging research and engineering to make AI more reliable and controllable.
At Zhejiang University and Berkeley AI Research, I focus on vision-language systems.
My work spans text-to-image generation, multimodal reasoning, and tools that make advanced models more accessible.
Key focus areas
Joint reasoning across text and vision
Steerable generation workflows
Responsible model behaviors
Highlights from ongoing research and collaborations
Curated a benchmarking suite for evaluating grounding and reasoning across captioning, VQA, and editing tasks.
Snapshots of what I've been up to
Started an internship at BAIR to explore multimodal instruction-following models.
Shared work on layout-aware diffusion editing at a university seminar.
Released a public repo of prompts and evaluation scripts for diffusion-based editing pipelines.