March 31st 2025 @ Rotterdam, The Netherlands
Held in conjunction with ASPLOS 2025
The increasing deployment of transformer models in production environments has revealed fundamental challenges in traditional caching and scheduling approaches that treat all requests equally. This workshop focuses on the emerging paradigm of content-aware system design for transformer serving, where caching decisions and scheduling policies adapt to the semantic patterns and computational characteristics of incoming requests. As transformer applications range from simple queries to complex chain-of-thought reasoning and RAG-enhanced processing, the variance in resource requirements and execution patterns demands more sophisticated serving strategies.
CAT-Serve brings together researchers and practitioners to explore novel approaches that leverage request content understanding for system optimization.
All deadlines are 11:59 pm AoE (Anywhere on Earth).
CAT-Serve emphasizes solutions that bridge hardware capabilities, system software, and ML serving frameworks to enable intelligent resource management.
Topics include, but are not limited to:
Aiming to find novel ways to:
Submit your paper here
Please note that you must be a registered CMT user to submit.
Register to CMT here
Prof. Freddy Gabbay, Hebrew University
Prof. Avi Mendelson, Technion
Dr. Xiuqiao Li, Huawei
Igor Gov, Huawei
Avigail Oron, Huawei
Any questions may be directed to: cat-serve2025@googlegroups.com
The Microsoft CMT service was used for managing the peer-reviewing process for this conference. This service was provided for free by Microsoft and they bore all expenses, including costs for Azure cloud services as well as for software development and support.