Zixin Guo

I'm a fourth-year PhD student in the Department of Computer Science at Aalto University. My current research interests include:

  • Multimodal Learning, advised by Jorma Laaksonen (Aalto University, Finland 🇫🇮), in close collaboration with Min Cao (Soochow University, China 🇨🇳).
  • Eye Tracking, led by Antti Oulasvirta (Aalto University, Finland 🇫🇮), in close collaboration with Yue Jiang (University of Utah, USA 🇺🇸) and Luis A. Leiva (University of Luxembourg, Luxembourg 🇱🇺).

Email  /  Scholar  /  GitHub  /  LinkedIn


Selected Publications

* = Equal contribution
# = Mentoring role

Imagine How To Change: Explicit Procedure Modeling for Change Captioning
Jiayang Sun*, Zixin Guo*#, Min Cao, Guibo Zhu, Jorma Laaksonen
ICLR, 2026  (Top tier)
paper / code

SeekUI: Predicting Visual Search Behavior on Graphical User Interfaces with a Reward-Augmented Vision Language Model
Zixin Guo*, Yue Jiang*, Luis A. Leiva, Antti Oulasvirta
CHI, 2026  (Top tier)
paper / code

Learning to Describe Implicit Changes: Noise-Robust Pre-training for Image Difference Captioning
Zixin Guo, Jiayang Sun, Tzu-Jui Julius Wang, Abduljalil Radman, Selen Pehlivan, Min Cao, Jorma Laaksonen
Findings of EMNLP, 2025  (Top tier)
paper

Valor32k-AVQA v2.0: Open-Ended Audio-Visual Question Answering Dataset and Benchmark
Ines Riahi, Abduljalil Radman, Zixin Guo, Rachid Hedjam, Jorma Laaksonen
ACM MM, 2025  (Top tier)
paper / code

EyeFormer: Predicting Scanpaths in Free-Viewing Tasks with Transformer-Guided Reinforcement Learning
Yue Jiang*, Zixin Guo*, Hamed Rezazadegan Tavakoli, Luis A. Leiva, Antti Oulasvirta
UIST, 2024  (Top tier)
paper / code

PiTL: Cross-modal Retrieval with Weakly-supervised Vision-language Pre-training via Prompting
Zixin Guo, Tzu-Jui Julius Wang, Selen Pehlivan, Abduljalil Radman, Jorma Laaksonen
SIGIR, 2023  (Short Paper)  (Top tier)
paper

CLIP4IDC: CLIP for Image Difference Captioning
Zixin Guo, Tzu-Jui Julius Wang, Jorma Laaksonen
AACL, 2022  (Short Paper)  (50+ Citations)
paper / code

See Full List

Website template from Jon Barron.