CV · Generative AI · Multimodal

Hi, I'm Ji Xie

Building controllable vision-language systems at Berkeley AI Research & Zhejiang University.

About

Bridging research and engineering to make AI more reliable and controllable.

At Zhejiang University and Berkeley AI Research, I focus on vision-language systems.

My work spans text-to-image generation, multimodal reasoning, and tooling that makes advanced models more accessible.

Research

Key focus areas

Multimodal Models

Joint reasoning across text and vision

Controllable AI

Steerable generation workflows

Trustworthy AI

Responsible model behavior

Selected Work

Highlights from ongoing research and collaborations

2024 · Research

Instruction-Guided Diffusion Editing

Building a controllable editing stack that aligns diffusion models with free-form instructions and layout hints.

2023 · Collaboration

Unified Vision-Language Benchmarks

Curated a benchmarking suite for evaluating grounding and reasoning across captioning, VQA, and editing tasks.

Recent Updates

Snapshots of what I've been up to

Jan 2024

Joined Berkeley AI Research

Started an internship at BAIR to explore multimodal instruction-following models.

Jun 2023

Presented on controllable generation

Shared work on layout-aware diffusion editing at a university seminar.

Dec 2022

Launched open-source resources

Published a public repo of prompts and evaluation scripts for diffusion-based editing pipelines.

Contact

Let's connect about vision and generative AI