2026

Rethinking Model Efficiency: Multi-Agent Inference with Large Models

arXiv'2604

Sixun Dong, Juhua Hu, Steven Li, Wei Wen, Qi Qian

TL;DR

Analyzes VLM end-to-end latency and shows that output token length dominates inference cost. Large models with short outputs outperform small models with long generations, yet explicit reasoning remains essential for complex tasks. Bridges the two with a multi-agent framework in which a small model generates the reasoning tokens and hands them to a large model, achieving large-model accuracy at minimal latency.
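The division of labor can be sketched as a two-stage pipeline. This is an illustrative sketch only, not the paper's implementation; `small_reason` and `large_answer` are hypothetical placeholder callables standing in for the two models.

```python
def multi_agent_infer(question, small_reason, large_answer):
    """Hypothetical two-stage pipeline: a cheap small model drafts the
    (long) chain of thought, and the expensive large model, conditioned
    on that draft, emits only a short final answer. Since output length
    dominates latency, the large model's cost stays minimal."""
    reasoning = small_reason(question)          # long output, cheap model
    prompt = f"{question}\nDraft reasoning:\n{reasoning}\nFinal answer:"
    return large_answer(prompt)                 # short output, strong model
```

With stub models, the pipeline simply threads the draft reasoning into the large model's prompt; in practice both callables would wrap real model endpoints.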

To Think or Not To Think, That is The Question for Large Reasoning Models in Theory of Mind Tasks

arXiv'2602

Nanxu Gong, Haotian Li, Sixun Dong, Jianxun Lian, Yanjie Fu, Xing Xie

TL;DR

Investigates the capabilities of Large Reasoning Models (like OpenAI o1/o3) on Theory of Mind tasks, revealing that continuous reasoning does not always guarantee better socially-aware outcomes.

Towards Robust Dysarthric Speech Recognition: LLM-Agent Post-ASR Correction Beyond WER

ICASSP'26

Xiuwen Zheng, Sixun Dong, Bornali Phukon, Mark Hasegawa-Johnson, Chang D. Yoo

TL;DR

Proposes an LLM-agent-based post-ASR correction framework for dysarthric speech recognition that focuses on semantic accuracy rather than just minimizing Word Error Rate (WER).

2025

MMTok: Multimodal Coverage Maximization for Efficient Inference of VLMs

ICLR'26

Sixun Dong, Juhua Hu, Mian Zhang, Ming Yin, Yanjie Fu, Qi Qian

TL;DR

Proposes MMTok to efficiently select informative vision tokens via a multimodal coverage maximization strategy, significantly accelerating VLM inference while maintaining model performance.
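A coverage-maximization selection of this kind can be approximated greedily. The sketch below is a generic greedy max-coverage routine under assumed definitions (coverage of a text token = its best cosine similarity to any kept vision token); the paper's actual objective and optimizer may differ.

```python
import numpy as np

def select_tokens_by_coverage(vision, text, k):
    """Greedy max-coverage selection of vision tokens (illustrative
    sketch; not MMTok's exact algorithm).

    vision: (N, d) vision-token embeddings
    text:   (M, d) text-token embeddings
    k:      number of vision tokens to keep
    """
    v = vision / np.linalg.norm(vision, axis=1, keepdims=True)
    t = text / np.linalg.norm(text, axis=1, keepdims=True)
    sim = v @ t.T                        # (N, M) cosine similarities

    covered = np.zeros(text.shape[0])    # best similarity per text token so far
    selected = []
    for _ in range(min(k, vision.shape[0])):
        # marginal gain in total coverage for adding each candidate
        gains = np.maximum(sim, covered).sum(axis=1) - covered.sum()
        gains[selected] = -np.inf        # never re-pick a token
        best = int(np.argmax(gains))
        selected.append(best)
        covered = np.maximum(covered, sim[best])
    return selected
```

Greedy selection is the standard approximation for max-coverage objectives (submodular, so greedy gives a (1 - 1/e) guarantee), which is why it is a natural fit for picking a small informative token subset.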

LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries

arXiv'2508

Ming Yin, Dinghan Shen, Silei Xu, Jianbing Han, Sixun Dong, Mian Zhang, Yebowen Hu, Shujian Liu, Simin Ma, Song Wang, Sathish Reddy Indurthi, Xun Wang, Yiran Chen, Kaiqiang Song

TL;DR

Constructs a stress-testing benchmark specifically designed to evaluate and diagnose the stability and limitations of MCP-enabled agents when handling complex, challenging queries.

Complex Logical Instruction Generation

arXiv'2508

Mian Zhang, Shujian Liu, Sixun Dong, Ming Yin, Yebowen Hu, Xun Wang, Steven Ma, Song Wang, Sathish Reddy Indurthi, Haoyun Deng, Zhiyu Zoey Chen, Kaiqiang Song

TL;DR

Introduces an automated data generation framework and evaluation benchmark for complex logical instructions to assess and enhance the multi-step reasoning capabilities of LLMs.

Teaching Time Series to See and Speak: Forecasting with Aligned Visual and Textual Perspectives

arXiv'2506

Sixun Dong, Wei Fan, Teresa Wu, Yanjie Fu

TL;DR

Proposes TimesCLIP, which aligns time series data with visual and textual multi-modal perspectives to enhance both short-term and long-term forecasting performance.

TimesFrame: Multi-Variable Time Series is a Video of Numerical Data

Under Review

Sixun Dong, Nanxu Gong, Haoyue Bai, Xinyuan Wang, Wangyang Ying, Wei Fan, Yanjie Fu

Agentic Feature Augmentation: Unifying Selection and Generation with Teaming, Planning, and Memories

arXiv'2505

Nanxu Gong*, Sixun Dong*, Haoyue Bai, Xinyuan Wang, Wangyang Ying, Yanjie Fu

TL;DR

Builds an LLM-driven agentic framework that unifies feature selection and generation through collaborative teaming, task planning, and memory mechanisms.

Sculpting Features from Noise: Reward-Guided Hierarchical Diffusion for Task-Optimal Feature Transformation

NeurIPS'25

Nanxu Gong*, Zijun Li*, Sixun Dong, Haoyue Bai, Wangyang Ying, Xinyuan Wang, Yanjie Fu

TL;DR

Proposes a reward-guided hierarchical diffusion model that "sculpts" data representations from noise to generate optimal feature transformations for specific downstream tasks.

MECT: From Multimodal Knowledge Acquisition To Contrastive Embedding Construction For Generative Feature Transformation

Under Review

Nanxu Gong, Sixun Dong, Haoyue Bai, Wangyang Ying, Yanjie Fu

Efficient Post-Training Refinement of Latent Reasoning in Large Language Models

AAAI'26

Xinyuan Wang, Dongjie Wang, Wangyang Ying, Haoyue Bai, Nanxu Gong, Sixun Dong, Kunpeng Liu, Yanjie Fu

TL;DR

Proposes an efficient post-training refinement strategy that optimizes the logical reasoning of LLMs directly in their latent representation space.

Brownian Bridge Augmented Surrogate Simulation and Injection Planning for Geological CO2 Storage

AAAI'26

Haoyue Bai, Guodong Chen, Wangyang Ying, Xinyuan Wang, Nanxu Gong, Sixun Dong, Giulia Pedrielli, Haoyu Wang, Haifeng Chen, Yanjie Fu

TL;DR

Uses Brownian Bridge-augmented surrogate models to provide efficient simulation and injection planning for optimizing geological CO2 storage strategies.

Unsupervised Feature Transformation via In-Context Generation, Generator-Critic LLM Agents, and Duet-Play Teaming

IJCAI'25

Nanxu Gong, Xinyuan Wang, Wangyang Ying, Haoyue Bai, Sixun Dong, Haifeng Chen, Yanjie Fu

TL;DR

Presents a fully unsupervised feature transformation framework that pairs generator-critic LLM agents in a duet-play teaming loop with in-context generation to automatically extract high-quality features.
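The generator-critic pattern can be sketched as a simple propose-score-refine loop. This is a schematic only, under assumed interfaces: `generate` and `critique` are hypothetical placeholders for the paper's LLM agents, and the scoring protocol is invented for illustration.

```python
def generator_critic_loop(dataset_desc, generate, critique, rounds=3):
    """Illustrative duet-play loop (not the paper's implementation):
    the generator proposes a feature transformation conditioned on the
    critic's last feedback, the critic scores the proposal and emits
    new feedback, and the best-scoring proposal so far is kept."""
    best, best_score = None, float("-inf")
    feedback = ""
    for _ in range(rounds):
        proposal = generate(dataset_desc, feedback)      # generator agent
        score, feedback = critique(dataset_desc, proposal)  # critic agent
        if score > best_score:
            best, best_score = proposal, score
    return best
```

The loop needs no labels: the critic's score plays the role of a self-supervised reward, which is what makes the framework fully unsupervised.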

2024

MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning

WACV'25

Chenyu Wang, Weixin Luo, Sixun Dong, Xiaohua Xuan, Zhengxin Li, Lin Ma, Shenghua Gao

TL;DR

Introduces the MLLM-Tool framework and contributes a specialized dataset to empower Multimodal Large Language Models (MLLMs) with the ability to understand, learn, and invoke external tools.

RoomDesigner: Encoding Anchor-latents for Style-consistent and Shape-compatible Indoor Scene Generation

3DV'24

Yiqun Zhao, Zibo Zhao, Jing Li, Sixun Dong, Shenghua Gao

TL;DR

Presents RoomDesigner, which encodes "anchor-latents" to guide the generation of 3D indoor scenes that are both highly style-consistent and shape-compatible.

2023

Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos

CVPR'23

Sixun Dong*, Huazhang Hu*, Dongze Lian, Weixin Luo, Yicheng Qian, Shenghua Gao

TL;DR

Proposes a weakly supervised video representation learning approach that utilizes unaligned text for sequential videos, significantly reducing the reliance on fine-grained video-text alignment annotations.

2022

TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting

CVPR'22 🏆 Oral

Huazhang Hu*, Sixun Dong*, Yiqun Zhao, Dongze Lian, Zhengxin Li, Shenghua Gao

TL;DR

Introduces the RepCount dataset and pioneers the use of regression density maps alongside a Transformer-based architecture to encode multi-scale temporal correlations, significantly improving repetitive action counting.
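The density-map idea can be illustrated with a toy example: each repetition contributes a unit-mass bump to a per-frame density, so the count is just the map's sum. The Gaussian construction below is an assumption for illustration; in the paper the density map is predicted by the Transformer, not built from annotations at inference time.

```python
import numpy as np

def make_density(num_frames, cycle_centers, sigma=2.0):
    """Toy ground-truth-style density map: one unit-mass Gaussian per
    repetition cycle, centered on its frame (illustrative only)."""
    t = np.arange(num_frames)
    density = np.zeros(num_frames)
    for c in cycle_centers:
        g = np.exp(-0.5 * ((t - c) / sigma) ** 2)
        density += g / g.sum()          # normalize each bump to unit mass
    return density

def count_repetitions(density):
    # The repetition count is the integral (sum) of the density map,
    # which is what makes regression on density maps a counting method.
    return float(density.sum())
```

Because each bump is normalized to unit mass, the map for three cycles sums to exactly 3, regardless of how the cycles overlap in time.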

Survey Papers

Towards Data-Centric AI: A Comprehensive Survey of Traditional, Reinforcement, and Generative Approaches for Tabular Data Transformation

TKDD'26

Dongjie Wang, Yanyong Huang, Wangyang Ying, Haoyue Bai, Nanxu Gong, Xinyuan Wang, Sixun Dong, Tao Zhe, Kunpeng Liu, Meng Xiao, et al.

TL;DR

A comprehensive survey mapping the evolution of Data-Centric AI in tabular data transformation, covering traditional statistical methods, reinforcement learning, and generative AI approaches.

A Survey on Data-Centric AI: Tabular Learning from Reinforcement Learning and Generative AI Perspective

arXiv'2502

Wangyang Ying, Cong Wei, Nanxu Gong, Xinyuan Wang, Haoyue Bai, Arun Vignesh Malarkkan, Sixun Dong, Dongjie Wang, Denghui Zhang, Yanjie Fu

TL;DR

A systematic review focusing on how state-of-the-art reinforcement learning and generative AI models can be leveraged to improve the quality and efficiency of tabular data learning.