March 31st 2025 @ Rotterdam, The Netherlands
Held in conjunction with ASPLOS 2025
The increasing deployment of transformer models in production environments has revealed fundamental challenges in traditional caching and scheduling approaches that treat all requests equally. This workshop focuses on the emerging paradigm of content-aware system design for transformer serving, where caching decisions and scheduling policies adapt to the semantic patterns and computational characteristics of incoming requests. As transformer applications range from simple queries to complex chain-of-thought reasoning and RAG-enhanced processing, the variance in resource requirements and execution patterns demands more sophisticated serving strategies.
CAT-Serve brings together researchers and practitioners to explore novel approaches that leverage request content understanding for system optimization.
Time | Topic | Speaker | Institution |
---|---|---|---|
8:30-9:00 | Coffee and Registration | ||
9:00-9:10 | Welcome speech | Prof. Freddy Gabbay | HUJI, Israel |
9:10-9:30 | Keynote: Optimizing Transformer Model Serving: From Training to Content-Aware Inference | Gil Bloch, Principal Architect | Nvidia |
9:30-10:00 | Context fast fusion for enhanced LLM Scheduling | Dr. Joseph Kampeas | Huawei Tel-Aviv Research Center, Israel |
10:00-10:30 | Boosting Transformer Efficiency with Decomposition and Adaptive Scheduling | Dr. Ori Schweitzer | Technion - Israel Institute of Technology, Israel |
10:30-11:00 | Coffee Break | ||
11:00-11:30 | CacheTrie: Optimizing Reuse of KV Cache via Path-compressed Trie | Chou Weizhong | Huawei Computing Product Line in cooperation with University of Science and Technology of China, China |
11:30-12:00 | SSSD: Simply-Scalable Speculative Decoding for Large Batch LLM Inference | Michele Marzollo | Huawei Zurich Research Center |
12:00-12:30 | Energy-aware and Adaptive Model Serving for DL Inference Stream (invited talk) | Prof. Demetris Trihinas | University of Nicosia, Cyprus |
12:30-13:00 | Planning Rust Features for System Programming Requirements (invited talk) | Prof. Yijun Yu | The Open University, UK; Director of the Ada Language Engineering Lab, Huawei Ireland Research Center |
12:30-14:00 | Lunch Break | ||
All deadlines are 11:59 pm AoE (Anywhere on Earth).
CAT-Serve emphasizes solutions that bridge hardware capabilities, system software, and ML serving frameworks to enable intelligent resource management.
Topics include, but are not limited to:
Aiming to find novel ways to:
Submit your paper here
Please note that you must be a registered CMT user to submit.
Register with CMT here
Prof. Freddy Gabbay, Hebrew University
Prof. Avi Mendelson, Technion
Dr. Xiuqiao Li, Huawei
Igor Gov, Huawei
Avigail Oron, Huawei
Any questions may be directed to: cat-serve2025@googlegroups.com
The Microsoft CMT service was used for managing the peer-reviewing process for this conference. This service was provided for free by Microsoft and they bore all expenses, including costs for Azure cloud services as well as for software development and support.