CAT-Serve

Content-Aware Caching and Scheduling for Transformer Model Serving

March 31st 2025 @ Rotterdam, The Netherlands

Held in conjunction with ASPLOS 2025

About

The increasing deployment of transformer models in production environments has revealed fundamental limitations of traditional caching and scheduling approaches that treat all requests equally. This workshop focuses on the emerging paradigm of content-aware system design for transformer serving, in which caching decisions and scheduling policies adapt to the semantic patterns and computational characteristics of incoming requests. As transformer applications range from simple queries to complex chain-of-thought reasoning and RAG-enhanced processing, the variance in resource requirements and execution patterns demands more sophisticated serving strategies.

CAT-Serve brings together researchers and practitioners to explore novel approaches that leverage request content understanding for system optimization.

Important Dates

All deadlines are 11:59 pm AoE (Anywhere on Earth).

Call for Papers

CAT-Serve emphasizes solutions that bridge hardware capabilities, system software, and ML serving frameworks to enable intelligent resource management.

Topics include, but are not limited to:

We aim to find novel ways to:

Submission Guidelines

Paper submission system:

Submit your paper here
Please note that you must be a registered CMT user to submit.
Register with CMT here

Workshop Committee

Prof. Freddy Gabbay, Hebrew University

Prof. Avi Mendelson, Technion

Dr. Xiuqiao Li, Huawei

Igor Gov, Huawei

Avigail Oron, Huawei

Contact Us

Questions may be directed to: cat-serve2025@googlegroups.com


The Microsoft CMT service was used for managing the peer-reviewing process for this conference. This service was provided for free by Microsoft and they bore all expenses, including costs for Azure cloud services as well as for software development and support.