|
Zixin Guo
I'm a fourth-year PhD student at the Department of Computer Science, Aalto University. My current research interests include:
- Multimodal Learning, advised by Jorma Laaksonen (Aalto University, Finland 🇫🇮) and in collaboration with Min Cao (Soochow University, China 🇨🇳).
- Eye Tracking, in close collaboration with Yue Jiang (University of Utah, USA 🇺🇸).
Email / Scholar / GitHub / LinkedIn
|
|
Selected Publications
* = Equal contribution; # = Mentoring role
|
Imagine How To Change: Explicit Procedure Modeling for Change Captioning
Jiayang Sun*, Zixin Guo*#, Min Cao, Guibo Zhu, Jorma Laaksonen
ICLR, 2026 (CCF A)
paper / code
|
SeekUI: Predicting Visual Search Behavior on Graphical User Interfaces with a Reward-Augmented Vision Language Model
Zixin Guo*, Yue Jiang*, Luis A. Leiva, Antti Oulasvirta
CHI, 2026 (CCF A)
paper / code
|
Learning to Describe Implicit Changes: Noise-Robust Pre-training for Image Difference Captioning
Zixin Guo, Jiayang Sun, Tzu-Jui Julius Wang, Abduljalil Radman, Selen Pehlivan, Min Cao, Jorma Laaksonen
Findings of EMNLP, 2025 (CCF B)
paper
|
EyeFormer: Predicting Scanpaths in Free-Viewing Tasks with Transformer-Guided Reinforcement Learning
Yue Jiang*, Zixin Guo*, Hamed Rezazadegan Tavakoli, Luis A. Leiva, Antti Oulasvirta
UIST, 2024 (CCF A)
paper / code
|
PiTL: Cross-modal Retrieval with Weakly-supervised Vision-language Pre-training via Prompting
Zixin Guo, Tzu-Jui Julius Wang, Selen Pehlivan, Abduljalil Radman, Jorma Laaksonen
SIGIR, 2023 (Short Paper) (CCF A)
paper
|
CLIP4IDC: CLIP for Image Difference Captioning
Zixin Guo, Tzu-Jui Julius Wang, Jorma Laaksonen
AACL, 2022 (Short Paper) (50+ Citations)
paper / code
|
Valor32k-AVQA v2.0: Open-Ended Audio-Visual Question Answering Dataset and Benchmark
Ines Riahi, Abduljalil Radman, Zixin Guo, Rachid Hedjam, Jorma Laaksonen
ACM MM, 2025
paper
|
Prompt-based Weakly-supervised Vision-language Pre-training
Zixin Guo, Tzu-Jui Julius Wang, Selen Pehlivan, Abduljalil Radman, Min Cao, Jorma Laaksonen
Pattern Recognition Letters, 2025
paper
|
Efficient Text-to-video Retrieval via Multi-modal Multi-tagger Derived Pre-screening
Yingjia Xu, Mengxia Wu, Zixin Guo, Min Cao, Mang Ye, Jorma Laaksonen
Visual Intelligence, 2025
paper
|
Diffusion-based Multimodal Video Captioning
Jaakko Kainulainen, Zixin Guo#, Jorma Laaksonen
ACCV, 2024
paper
|
Impact of Design Decisions in Scanpath Modeling
Parvin Emami, Yue Jiang, Zixin Guo, Luis A. Leiva
ETRA, 2024
paper
|
Post-Attention Modulator for Dense Video Captioning
Zixin Guo, Tzu-Jui Julius Wang, Jorma Laaksonen
ICPR, 2022
paper
|
|