Annus Zulfiqar
I’m a PhD candidate in the Computer Science and Engineering department at the University of Michigan, Ann Arbor, advised by Professor Muhammad Shahbaz. My research rethinks data plane architectures and algorithms to build scalable networked systems that overcome traditional resource limitations and adapt quickly to dynamic network conditions.
Publications
SIGCOMM 2025 Poster
Kairo: Incremental View Maintenance for Scalable Virtual Switch Caching
Annus Zulfiqar, Ben Pfaff, Gianni Antichi, Muhammad Shahbaz
To support diverse data center environments and workloads, virtual switches (vSwitches) offer flexibility through multi-table programmable packet pipelines with flexible control flows, and they rely on caching packet lookups into a single table to deliver high performance. Keeping this cache consistent with the vSwitch pipeline under frequent updates is hard because of the nature of these pipelines: tables contain prioritized wildcard rules, making it difficult to determine how a rule update should be reflected in individual cache entries. As a result, each update triggers a brute-force "revalidation" of the entire cache. This poster presents our vision for a new approach to cache consistency that frames the problem as an instance of Incremental View Maintenance (IVM), a long-studied problem in the database community in which a view generated by a query is updated incrementally as the underlying database changes.
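
For intuition, here is a minimal Python sketch of the contrast between brute-force revalidation and IVM-style incremental maintenance. The rules, cache layout, and function names are illustrative assumptions, not Kairo's actual design:

from dataclasses import dataclass

@dataclass
class Rule:
    priority: int
    match: dict     # field -> required value; absent fields are wildcards
    action: str

def matches(rule, packet_key):
    """A wildcard rule matches if every field it specifies agrees."""
    return all(packet_key.get(f) == v for f, v in rule.match.items())

def lookup(rules, packet_key):
    """Highest-priority matching rule wins, as in an OpenFlow table."""
    hits = [r for r in rules if matches(r, packet_key)]
    return max(hits, key=lambda r: r.priority, default=None)

# cache maps frozenset(packet_fields.items()) -> action from the full pipeline.
def revalidate_all(cache, rules):
    """Brute force: recompute every cache entry after any rule update."""
    for key in list(cache):
        hit = lookup(rules, dict(key))
        cache[key] = hit.action if hit else None

def revalidate_incremental(cache, rules, updated_rule):
    """IVM-style: recompute only the entries the updated rule can affect."""
    for key in list(cache):
        if matches(updated_rule, dict(key)):   # the overlap test is the "delta"
            hit = lookup(rules, dict(key))
            cache[key] = hit.action if hit else None

The overlap test plays the role of the IVM "delta": rather than recomputing the whole view (the cache) on every update, only the rows (cache entries) the update can reach are recomputed.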

ASPLOS 2025
Gigaflow: Pipeline-Aware Sub-Traversal Caching for Modern SmartNICs
Annus Zulfiqar, Ali Imran, Venkat Kunaparaju, Ben Pfaff, Gianni Antichi, Muhammad Shahbaz
Gigaflow is a novel caching architecture designed for SmartNICs to accelerate virtual switch (vSwitch) packet processing. It leverages the inherent pipeline-aware locality of vSwitches, defined by their policies (e.g., L2, L3, ACL) and execution order (e.g., expressed in P4 or OpenFlow), to cache rules for shared segments (sub-traversals) rather than full flows (traversals). These reusable segments improve cache efficiency, delivering up to a 51% higher cache hit rate and capturing up to 450x more rule space than traditional Megaflow caching, all within the limited memory of modern SmartNICs.
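
To illustrate the idea, the minimal Python sketch below contrasts a full-traversal (Megaflow-style) cache key with per-segment (Gigaflow-style) keys; the pipeline stages and header fields are hypothetical, not Gigaflow's actual data structures:

# Hypothetical three-stage pipeline: each stage reads a subset of header fields.
PIPELINE = [
    ("L2",  ["dst_mac"]),
    ("L3",  ["dst_ip"]),
    ("ACL", ["src_ip", "dst_port"]),
]

def megaflow_key(pkt):
    # Full-traversal key over ALL fields the pipeline touched: two flows that
    # differ in any one field occupy entirely separate cache entries.
    return tuple(pkt[f] for _, fields in PIPELINE for f in fields)

def gigaflow_keys(pkt):
    # One entry per pipeline segment, keyed only on that segment's fields:
    # flows that agree on, e.g., L2 and L3 behavior reuse those entries.
    return [(name, tuple(pkt[f] for f in fields)) for name, fields in PIPELINE]

pkt_a = {"dst_mac": "aa", "dst_ip": "10.0.0.1", "src_ip": "1.1.1.1", "dst_port": 80}
pkt_b = {"dst_mac": "aa", "dst_ip": "10.0.0.1", "src_ip": "2.2.2.2", "dst_port": 443}

assert megaflow_key(pkt_a) != megaflow_key(pkt_b)            # no sharing at all
assert gigaflow_keys(pkt_a)[:2] == gigaflow_keys(pkt_b)[:2]  # L2 + L3 entries reused

Because each segment is keyed only on the fields its stage examines, the two packets above share the L2 and L3 entries and differ only in the ACL segment, which is where the higher hit rate and larger rule-space coverage come from.
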
In the Media

Computer Science and Engineering, University of Michigan
Streamlining cloud traffic with a Gigaflow Cache
Gigaflow improves virtual switches for programmable SmartNICs, delivering a 51% higher hit rate and 90% lower misses.

P4 Language Consortium
P4 Developer Days webinar, "Gigaflow: Pipeline-Aware Sub-Traversal Caching for Modern SmartNICs"

NSDI 2025 Poster
A Smart Cache for a SmartNIC! Rethinking Caching, Locality, & Revalidation for Modern Virtual Switches
Annus Zulfiqar, Ali Imran, Venkat Kunaparaju, Ben Pfaff, Gianni Antichi, Muhammad Shahbaz
This poster presents our vision for a modern virtual switch (vSwitch) design built around a multi-table cache architecture, called Gigaflow, offloaded to SmartNICs. Gigaflow captures pipeline-aware locality from programmable vSwitch packet pipelines and delivers state-of-the-art end-to-end performance. We also reframe vSwitch cache consistency as an instance of the Incremental View Maintenance (IVM) problem from large-scale databases and highlight key open challenges in this space.

NSDI 2025 Poster
SpliDT: Partitioned Decision Trees for Scalable Stateful Inference at Line Rate
Murayyiam Parvez*, Annus Zulfiqar*, Roman Beltiukov, Shir Landau-Feibish, Walter Willinger, Arpit Gupta, Muhammad Shahbaz
Decision Trees (DTs) are ideal for in-network ML on programmable RMT switches, owing to their hierarchical decisions and post-training interpretability. However, traditional implementations are constrained by the small stateful memory of these switches: they accommodate only the top-k features and deliver low classification performance. We present our vision of a partitioned DT inference architecture, SpliDT, which delegates feature collection to subtree-level inference and reuses stateful resources across sliding windows of packets, scaling far beyond the top-k features. Preliminary results show that SpliDT inference pushes the Pareto frontier of F1 score versus the number of stateful flows for many real-world security applications.
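
The minimal Python sketch below illustrates the partitioning idea; the subtrees, features, thresholds, and budget K are all made-up stand-ins rather than SpliDT's actual model:

K = 2  # stateful feature slots available per window (switch memory budget)

def subtree_0(f):
    return "subtree_1" if f["pkt_len_mean"] > 500 else "subtree_2"

def subtree_1(f):
    return "attack" if f["syn_rate"] > 0.8 else "benign"

# Each partition: (features it needs this window, step function -> next node or label)
SUBTREES = {
    "subtree_0": (["pkt_len_mean", "iat_var"], subtree_0),
    "subtree_1": (["syn_rate", "dur"], subtree_1),
    "subtree_2": (["dur"], lambda f: "benign"),
}

def classify(flow_windows):
    """flow_windows yields one dict of features collected per packet window."""
    node = "subtree_0"
    for feats in flow_windows:          # one window of packets -> one subtree hop
        needed, step = SUBTREES[node]
        assert len(needed) <= K         # each partition fits the per-window budget
        node = step(feats)
        if node not in SUBTREES:        # reached a leaf label
            return node
    return None                         # flow ended before reaching a leaf

# e.g., classify(iter([{"pkt_len_mean": 600, "iat_var": 1.0},
#                      {"syn_rate": 0.9, "dur": 3.0}])) -> "attack"

Because each subtree needs at most K features, the same stateful registers are reused window after window instead of holding every feature for the whole tree at once.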

Hot Chips 2024 Poster
A Smart Cache for a SmartNIC! Scaling End-host Networking to 400 Gbps & Beyond
Annus Zulfiqar, Ali Imran, Venkat Kunaparaju, Ben Pfaff, Gianni Antichi, Muhammad Shahbaz
Virtual switches (vSwitches) evolved to deliver high performance by implementing N-to-1 table caching, but this architecture was designed around the constraints of CPUs (e.g., multi-table lookups are expensive), the underlying hardware at the time of their inception. Modern vSwitches operate in data centers where SmartNICs are pervasive and do not share those limitations (e.g., they offer line-rate, multi-table, P4-programmable pipelines). We envision a SmartNIC-native vSwitch architecture with a novel multi-table cache subsystem, present our preliminary benchmarks, and discuss open challenges in this endeavor.

SIGCOMM CCR 2023
The Slow Path Needs An Accelerator Too!
Annus Zulfiqar, Ben Pfaff, William Tu, Gianni Antichi, Muhammad Shahbaz
Traditional Software-Defined Networking (SDN) literature abstracts the network into a centralized control plane and a simple, highly performant data plane. However, real-world SDN deployments (across virtual switches, hardware switches, 5G cores, and service meshes) rely on a third component: the slow path, which acts as an exception handler and provides an infinite-resource abstraction to the control plane. Though it has historically been overlooked, we argue that the slow path is becoming a key bottleneck in modern data center networks and warrants its own dedicated accelerator.

ASPLOS 2023
Homunculus: Auto-Generating Efficient Data-Plane ML Pipelines for Datacenter Networks (Distinguished Artifact Award)
Tushar Swamy, Annus Zulfiqar, Luigi Nardi, Muhammad Shahbaz, Kunle Olukotun
In recent years, in-network support for machine learning (ML) has improved significantly through dedicated line-rate inference architectures (e.g., Taurus). However, programming in-network ML models demands deep knowledge of ML and of low-level hardware languages (e.g., Spatial HDL), along with careful handling of strict performance and hardware constraints. We introduce Homunculus, a supervised-learning framework and domain-specific language (DSL) that automates this process: given a dataset and a set of performance (e.g., latency, throughput) and hardware (e.g., compute, memory) constraints, it performs a full design-space exploration to generate a compliant, high-performance ML model.
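
As a rough illustration of constraint-driven design-space exploration in this spirit, here is a minimal Python sketch; the search space, cost model, and exhaustive sweep are hypothetical stand-ins for Homunculus's actual search strategy:

import itertools

# Hypothetical space of small in-network models (e.g., DNNs for a Taurus-like target).
SEARCH_SPACE = {"layers": [1, 2, 3], "width": [8, 16, 32]}

def estimate(cfg):
    """Stand-in cost/quality model; a real system would train and profile each candidate."""
    params = cfg["layers"] * cfg["width"] ** 2
    return {
        "latency_ns": 50 * cfg["layers"],       # deeper pipeline -> higher latency
        "memory_kb": params * 4 / 1024,         # 4-byte weights must fit on-chip
        "f1": min(0.99, 0.70 + 0.04 * cfg["layers"] + 0.005 * cfg["width"]),
    }

def explore(max_latency_ns=200, max_memory_kb=8.0):
    best = None
    for layers, width in itertools.product(*SEARCH_SPACE.values()):
        cfg = {"layers": layers, "width": width}
        m = estimate(cfg)
        if m["latency_ns"] <= max_latency_ns and m["memory_kb"] <= max_memory_kb:
            if best is None or m["f1"] > best[1]["f1"]:
                best = (cfg, m)
    return best  # highest-F1 candidate that satisfies every constraint

The key design point is that constraints prune the space before quality is compared: any candidate that misses the latency or memory budget is discarded outright, and the best-performing model is chosen only among compliant ones.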