May 1, 2026 · 8 min read
GPT Image 2 Team Exposed: 13-Person Elite Squad, 4 Months to the Top
Why is GPT Image 2 so good? The core team has only 13 members. From Research Lead Boyuan Chen's Diffusion Forcing to Jianfeng Wang's world knowledge understanding, discover how this elite squad achieved an image generation leap in just 4 months.
Why Is GPT Image 2 So Good?
After GPT Image 2 went viral across the internet, one question kept coming up: why are the results so impressive?
Research Lead Boyuan Chen offered a high-level answer: the underlying architecture has been completely rebuilt. He declined to say whether the model uses diffusion or autoregressive techniques, describing it only as a “general-purpose model” or “the GPT of the image domain.”
According to one of Chen’s tweets, the team achieved this improvement in just four months, counting from the release of GPT Image 1.5 at the end of last December. Even more striking, the results came from a core team of only 13 people.
Team lead Gabriel Goh also shared an AI-generated family portrait of the team members, with commenters marveling: why are they all Asian faces?

Boyuan Chen: From Not Knowing Python to Research Lead
To understand GPT Image 2’s technical strength, you only need to look at the academic backgrounds of its core team members.
Boyuan Chen is the team’s Research Lead. He and another team member, Kiwhan Song, both completed their PhDs at MIT under the same advisor — Professor Vincent Sitzmann.
His signature doctoral work, “Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion,” was accepted at NeurIPS 2024. The paper proposes a new training paradigm for sequence generation, Diffusion Forcing, in which a causal next-token prediction model learns to denoise tokens that each carry their own independent noise level. This merges the variable-length generation of autoregressive models with the long-range guidance of full-sequence diffusion models.
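To make the "independent noise level per token" idea concrete, here is a minimal sketch of the noising step only. It is not the paper's implementation: the scalar tokens, the linear beta schedule, and the function names `make_alpha_bar`, `sample_independent_levels`, and `noise_sequence` are all assumptions chosen for illustration.

```python
import math
import random

def make_alpha_bar(num_levels, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention schedule (assumed linear betas).
    alpha_bar[0] == 1.0 means 'no noise'; higher levels keep less signal."""
    alpha_bar = [1.0]
    for k in range(1, num_levels + 1):
        frac = (k - 1) / max(num_levels - 1, 1)
        beta = beta_start + (beta_end - beta_start) * frac
        alpha_bar.append(alpha_bar[-1] * (1.0 - beta))
    return alpha_bar

def sample_independent_levels(seq_len, num_levels, rng):
    """Draw one noise level per token, independently of its neighbors."""
    return [rng.randint(0, num_levels) for _ in range(seq_len)]

def noise_sequence(tokens, levels, alpha_bar, rng):
    """Noise each (scalar) token at its own level:
    x_noisy = sqrt(alpha_bar_k) * x + sqrt(1 - alpha_bar_k) * eps."""
    noisy = []
    for x, k in zip(tokens, levels):
        eps = rng.gauss(0.0, 1.0)
        a = alpha_bar[k]
        noisy.append(math.sqrt(a) * x + math.sqrt(1.0 - a) * eps)
    return noisy
```

In a full training loop, a causal model would then be asked to denoise every token given the noisy tokens before it. Setting all levels to 0 for past tokens and a high level for the next token resembles next-token prediction, while giving every token the same level resembles full-sequence diffusion.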
During his internship at Google, he also published SpatialVLM as a co-first author. This research automatically constructed an internet-scale 3D spatial reasoning VQA dataset (covering 10 million images and 2 billion QA pairs), giving vision-language models both quantitative and qualitative spatial reasoning capabilities: from a single 2D image, they can output metric distances, dimensions, bearings, and more.
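As a rough illustration of how quantitative and qualitative QA pairs can be produced automatically, the sketch below fills fixed templates from 3D object positions. It assumes the positions (in meters, camera coordinates with x increasing to the right) have already been estimated by some upstream step; the function names and templates are hypothetical and are not taken from SpatialVLM.

```python
import math

def make_distance_qa(name_a, pos_a, name_b, pos_b):
    """Quantitative QA: fill a template with the Euclidean distance in meters."""
    d = math.dist(pos_a, pos_b)
    question = f"How far is the {name_a} from the {name_b}?"
    answer = f"The {name_a} is about {d:.2f} meters from the {name_b}."
    return question, answer

def make_left_right_qa(name_a, pos_a, name_b, pos_b):
    """Qualitative QA: compare x coordinates (assumed to grow to the right)."""
    question = f"Is the {name_a} to the left of the {name_b}?"
    answer = "Yes." if pos_a[0] < pos_b[0] else "No."
    return question, answer
```

Applying templates like these across many images and object pairs is one way a dataset can reach billions of QA examples without manual labeling.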
This research was later applied to chain-of-thought spatial reasoning in the embodied intelligence domain.
Also during his Google internship, the instruction fine-tuning technology he developed was subsequently adopted by Gemini 2.0.
Interestingly, when Boyuan Chen attended a research summer camp in high school, he didn’t even know basic Python syntax. It was there that he met Xia Fei, a senior researcher at Google DeepMind, who introduced him to the world of AI. Xia Fei later invited him to intern at DeepMind twice, and these internships gave Chen engineering experience in large-scale model training, as well as a valuable perspective on the data needs of multimodal systems.
After completing his PhD, Chen joined OpenAI in June 2025 and quickly became one of the five core members of GPT image generation, responsible for all training of the GPT image generation model, while also being a member of the Sora video generation team.
In a public demonstration, he created a poster for his hometown of Wuxi, then made a Korean poster for a teammate from Seoul, and a Bengali poster for a teammate from Bangladesh — each with perfectly rendered text.
Jianfeng Wang: Making Image Generation AI Understand World Knowledge
Jianfeng Wang, who completed his PhD at USTC, is responsible for another standout capability of GPT Image 2: instruction following and world understanding.
Older models almost always drew clocks showing 10:10. The cause is the training data: the internet is flooded with clock and watch advertisements, and nearly all of them show 10:10, reportedly because manufacturers worked with psychologists and concluded that this hand position best stimulates consumers’ willingness to buy.
Wang had the new model draw 2:25, 3:30, 9:10, 7:45 — and the results were essentially accurate.
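Checking a drawn clock against a requested time comes down to simple geometry. The sketch below computes where the hands should point and compares that with measured hand angles. The function names, the 6-degree tolerance, and the assumption that hand angles can be measured from the image are all illustrative, not part of any OpenAI evaluation.

```python
def clock_hand_angles(hour, minute):
    """Return (hour_hand, minute_hand) in degrees, clockwise from 12 o'clock."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("time out of range")
    minute_angle = minute * 6.0                      # 360 degrees / 60 minutes
    hour_angle = (hour % 12) * 30.0 + minute * 0.5   # hour hand drifts with minutes
    return hour_angle, minute_angle

def angular_error(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)

def matches_time(hour_hand, minute_hand, hour, minute, tolerance=6.0):
    """True if both measured hands are within `tolerance` degrees of the target."""
    want_hour, want_minute = clock_hand_angles(hour, minute)
    return (angular_error(hour_hand, want_hour) <= tolerance
            and angular_error(minute_hand, want_minute) <= tolerance)
```

For the familiar 10:10 pose, the hour hand sits at 305 degrees and the minute hand at 60 degrees. A request for 2:25 expects 72.5 and 150 degrees, so a model that falls back to the advertising pose fails the check.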
That was just the appetizer. In a more complex spatial layout test (apple in the center, cup on the right, book on top, camera on the left, basketball below), the model placed every object precisely.
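A layout prompt like this can be verified mechanically once the objects are located. The sketch below assumes a hypothetical detector has already returned the center of each object in image coordinates, where y grows downward; `check_layout` and its relation names are illustrative only.

```python
def check_layout(centers, anchor, expected):
    """Compare each object's center with the anchor object's center.

    centers:  dict of name -> (x, y) in image coordinates (y grows downward)
    anchor:   name of the reference object, e.g. "apple"
    expected: dict of name -> one of "left", "right", "above", "below"
    Returns a dict of name -> bool (True if the relation holds).
    """
    ax, ay = centers[anchor]
    results = {}
    for name, relation in expected.items():
        x, y = centers[name]
        if relation == "right":
            ok = x > ax
        elif relation == "left":
            ok = x < ax
        elif relation == "above":
            ok = y < ay
        elif relation == "below":
            ok = y > ay
        else:
            raise ValueError(f"unknown relation: {relation}")
        results[name] = ok
    return results

# Example with made-up detections for the prompt described above.
centers = {
    "apple": (512, 512), "cup": (820, 510), "book": (515, 180),
    "camera": (200, 515), "basketball": (510, 850),
}
expected = {"cup": "right", "book": "above", "camera": "left", "basketball": "below"}
layout_ok = all(check_layout(centers, "apple", expected).values())
```

This check treats "book on top" as "above the apple". A stricter test could also require each object to stay roughly aligned with the anchor on the other axis.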
Before joining OpenAI, he worked at Microsoft for nearly 9 years. During his time at Microsoft, he had already collaborated with the OpenAI team on DALL-E 3.
He has published multiple papers in computer vision, with research covering image classification, object detection, semantic segmentation, and visual representation learning. The significant improvement in GPT Image 2’s world knowledge understanding is largely attributed to his deep understanding of objects’ semantic content and functional structure.
At the end of his demo video, Wang said: GPT Image 2 is eliminating the gap between your intent and the model’s output, truly delivering exactly what you want.
Yuguang Yang: Generating High-Precision Complex Infographics
Yuguang Yang demonstrated infographic and presentation generation capabilities during the GPT Image 2 launch event.
In the demo, the full 75-page GPT-3 paper was dropped into ChatGPT, which automatically generated 7 slides from it.
His background is arguably the most diverse on the team. Each career move crossed into a new field, but machine learning ran through all of them:
- Undergraduate: Zhejiang University Chu Kochen Honors College, Engineering
- PhD: Johns Hopkins University, Computational Chemical Physics and Machine Learning
- First full-time job: Quantitative Analyst
- During visiting research at Tsinghua: Reinforcement learning and control algorithms for nanorobots
- At Amazon: Alexa Speech Research
- At Microsoft: Bing Search query understanding and retrieval, document understanding
- After joining OpenAI in early 2025: Participated in the ChatGPT agent project in addition to image generation
On his personal account, Yang introduced GPT Image 2’s infographic generation capabilities, noting that it can save researchers a lot of time. He also repeatedly reminded everyone: don’t forget to select thinking mode when creating infographics.
From DALL-E to GPT Image 2.0
From team member Kenji Hata’s self-introduction, we learned that GPT Image 1.0 was essentially the image generation part of GPT-4o.
One person has taken part in OpenAI’s entire line of multimodal research since the DALL-E days: Gabriel Goh, the GPT Image 2.0 team lead.
After he joined OpenAI in 2019, his early research was more theoretical, focusing on interpretability and convex optimization. From DALL-E onward, he gradually shifted toward image generation.
The research background of another team member, Weixin Liang, also hints at GPT Image 2’s technical foundation.
His signature work from his Meta internship, Mixture-of-Transformers, decouples the model’s parameters by modality, including the feed-forward networks and attention projections, while keeping self-attention global across the full multimodal sequence. This significantly reduces the computational cost of multimodal model pre-training.
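The core idea of modality decoupling can be shown with a toy layer: every token takes part in one shared, sequence-wide mixing step, but is then transformed by parameters that belong only to its own modality. This is a deliberately simplified sketch, not the Mixture-of-Transformers implementation. Tokens are scalars, the mean stands in for self-attention, and the names `global_mix`, `modality_ffn`, and `mot_layer` are hypothetical.

```python
def global_mix(values):
    """Stand-in for global self-attention: every token sees the whole sequence."""
    mean = sum(values) / len(values)
    return [v + mean for v in values]

def modality_ffn(values, modalities, params):
    """Route each token to its own modality's parameters (weight, bias).
    Routing is deterministic by modality tag, with no learned router."""
    out = []
    for v, m in zip(values, modalities):
        w, b = params[m]
        out.append(w * v + b)
    return out

def mot_layer(values, modalities, params):
    """One toy layer: shared sequence-wide mixing, then modality-specific weights."""
    return modality_ffn(global_mix(values), modalities, params)

# Example: a two-token sequence mixing text and image tokens.
params = {"text": (2.0, 0.0), "image": (1.0, 1.0)}
output = mot_layer([1.0, 3.0], ["text", "image"], params)
```

Because each token only touches its own modality’s weights, the compute per token stays at the cost of one parameter set even as more modalities are added, which is the intuition behind the reduced pre-training cost.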
He completed his PhD at Stanford and his undergraduate degree at Zhejiang University Chu Kochen Honors College, though a few years later than Yuguang Yang. Like Boyuan Chen, Weixin Liang joined OpenAI right after completing his PhD in 2025 and quickly became a core member of the team.
Other GPT Image 2.0 Team Members
| Name | Background | Role |
|---|---|---|
| Ayaan Haque | Previously at Luma AI, participated in training Luma’s video generation foundation model Dream Machine | Image Generation |
| Bing Liang | Worked at Google for 5+ years, participated in Imagen 3, Veo, Gemini Multimodal, joined OpenAI in 2025 | Image Generation Research |
| Mengchao Zhong | Undergraduate at Shanghai Jiao Tong University, Master’s from Texas A&M, previously software engineer at Pinterest and Airtable | Multimodal Product Engineering |
| Dibya Bhattacharjee | Yale University, 2015 IPhO Bronze Medal, highest global score in CIE A-Level Math and Biology | Core Research |
| Kiwhan Song | Latest to join the team in October 2025, MIT PhD graduate | Research & Prompt Master |
From the earliest DALL-E to today’s GPT Image 2.0, this team has successively solved four core problems:
- Can draw it (DALL-E phase)
- Can draw it clearly (DALL-E 2/3 phase)
- Can draw it beautifully (GPT-4o image generation phase)
- Can draw it accurately (GPT Image 2.0 phase)
Despite significant talent turnover in recent years, OpenAI continues to attract distinctive people: it places no restrictions on professional background, welcomes crossovers, and believes in bottom-up, emergent research. Teams start small, gain resources after a breakthrough, and go on to change the world.
One More Thing
Ghibli-style avatars generated with GPT-4o image generation once swept the world.
Now, GPT Image 2.0 team members have all changed their avatars to this unique long-neck style.
Ready to Try GPT Image 2 Yourself?
All this technical detail is great, but nothing beats trying it yourself.
How good is GPT Image 2’s image generation really? Text rendering precision, spatial understanding, and instruction following look impressive in papers and demo videos, but you can feel the difference for yourself with a single try.
This article was organized with the assistance of AI tools and reviewed and edited by humans to ensure accuracy. If you have any questions or feedback about GPT Image 2, please contact our team at: support@gpt-image2.cn.