Annus Zulfiqar
I’m a PhD candidate in the Computer Science and Engineering department at the University of Michigan, Ann Arbor, advised by Professor Muhammad Shahbaz. My research focuses on rethinking data plane architectures and algorithms to build scalable networked systems that overcome traditional resource limitations and adapt quickly to dynamic network conditions.
Publications
[ASPLOS 2025] Gigaflow: Pipeline-Aware Sub-Traversal Caching for Modern SmartNICs
Annus Zulfiqar, Ali Imran, Venkat Kunaparaju, Ben Pfaff, Gianni Antichi, Muhammad Shahbaz
Gigaflow is a novel caching architecture designed for SmartNICs to accelerate virtual switch (vSwitch) packet processing. It leverages inherent pipeline-aware locality in vSwitches—defined by policies (e.g., L2, L3, ACL) and their execution order (e.g., using P4 or OpenFlow)—to cache rules for shared segments (sub-traversals) rather than full flows (traversals). These reusable segments improve cache efficiency (e.g., up to 51% higher cache hit rate) and capture more rule-space (up to 450x) compared to traditional Megaflow caching, all within the limited memory of modern SmartNICs.
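To make the idea concrete, here is a minimal Python sketch (not the Gigaflow implementation) contrasting a single Megaflow-style cache keyed on a full traversal with per-segment sub-traversal caches; the table names, header fields, and two-segment split are illustrative assumptions.

# Toy sketch (not the Gigaflow implementation): it contrasts caching one entry
# per full pipeline traversal (Megaflow-style) with caching reusable
# per-segment entries (sub-traversals). Table names, header fields, and the
# two-segment split are illustrative assumptions.

PIPELINE = [
    ("l2", ("dst_mac",)),                  # segment 1: L2 forwarding
    ("l3_acl", ("dst_ip", "tcp_dport")),   # segment 2: L3 routing + ACL
]

megaflow_cache = {}                                       # one entry per full traversal
subtraversal_caches = {name: {} for name, _ in PIPELINE}  # one cache per segment

def lookup_megaflow(pkt):
    # Key spans every field the traversal touched, so entries are rarely shared.
    key = tuple(pkt[f] for _, fields in PIPELINE for f in fields)
    return megaflow_cache.get(key)

def lookup_gigaflow(pkt):
    # Each segment is keyed only on the fields it matches, so flows that share a
    # segment (e.g., the same L2 next hop) reuse the same cached sub-traversal.
    actions = []
    for name, fields in PIPELINE:
        hit = subtraversal_caches[name].get(tuple(pkt[f] for f in fields))
        if hit is None:
            return None    # miss: the slow path runs the pipeline and installs entries
        actions.append(hit)
    return actions

Because two flows that differ only in their L3/ACL behavior still reuse the cached L2 segment, fewer entries cover more of the rule space, which is the locality the paper exploits.
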
In the Media

Computer Science and Engineering, University of Michigan: “Streamlining cloud traffic with a Gigaflow Cache.” Gigaflow improves virtual switches for programmable SmartNICs, delivering a 51% higher hit rate and 90% fewer misses.

P4 Language Consortium: P4 Developer Days webinar, “Gigaflow: Pipeline-Aware Sub-Traversal Caching for Modern Smart NICs.”

[NSDI 2025] A Smart Cache for a SmartNIC! Rethinking Caching, Locality, & Revalidation for Modern Virtual Switches
Annus Zulfiqar, Ali Imran, Venkat Kunaparaju, Ben Pfaff, Gianni Antichi, Muhammad Shahbaz
This poster presents our vision for a modern virtual switch (vSwitch) design with a multi-table cache architecture, called Gigaflow, for SmartNIC offload that captures pipeline-aware locality from vSwitch programmable packet pipelines and delivers state-of-the-art end-to-end performance. We also reframe vSwitch cache consistency as an instance of the Incremental View Maintenance (IVM) problem from large-scale databases and highlight key open challenges in this space.
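As a rough illustration of the IVM framing (an assumption-laden sketch for exposition, not the poster's design), the snippet below keeps a reverse index from pipeline rules to the cache entries derived from them, so a rule update revalidates only the dependent entries rather than scanning the whole cache.

# Illustrative sketch of the IVM framing: cached entries are "views" derived
# from pipeline rules, and a reverse index maps each rule to the entries it
# produced, so a rule update touches only the dependent entries. The data
# structures and names here are assumptions for illustration.

cache = {}          # cache_key -> cached action
derived_from = {}   # rule_id -> set of cache_keys that depend on that rule

def install(cache_key, action, rule_ids):
    # Record which rules this cache entry was derived from.
    cache[cache_key] = action
    for rid in rule_ids:
        derived_from.setdefault(rid, set()).add(cache_key)

def on_rule_update(rule_id):
    # Incrementally revalidate: touch only entries derived from the changed rule.
    for key in derived_from.pop(rule_id, set()):
        cache.pop(key, None)   # or recompute the entry against the updated rules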

[NSDI 2025] SpliDT: Partitioned Decision Trees for Scalable Stateful Inference at Line Rate
Murayyiam Parvez*, Annus Zulfiqar*, Roman Beltiukov, Shir Landau-Feibish, Walter Willinger, Arpit Gupta, Muhammad Shahbaz
Decision trees (DTs) are ideal for in-network ML on programmable RMT switches, owing to their hierarchical decisions and post-training interpretability. Traditional implementations, however, are constrained by the switches' small stateful memory: they accommodate only the top-k features and, as a result, deliver low classification performance. We present our vision of a partitioned DT inference architecture, SpliDT, which delegates feature collection to subtree-level inference and reuses stateful resources over sliding windows of packets to scale far beyond the top-k features. Preliminary results show that SpliDT sits on the Pareto frontier of F1 score versus the number of stateful flows for many real-world security applications.
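The toy Python sketch below illustrates the partitioned-inference idea under simplifying assumptions (feature names, thresholds, and the two-subtree split are invented for illustration): each flow remembers which subtree it is in, and each window of packets collects only that subtree's features before advancing or classifying.

# Toy sketch of partitioned decision-tree inference: the tree is split into
# subtrees, each flow tracks which subtree it is in, and each window of packets
# collects only that subtree's features. Feature names, thresholds, and the
# two-subtree split are invented for illustration.

SUBTREES = {
    0: {"features": ["pkt_count"],
        "decide": lambda f: ("leaf", "benign") if f["pkt_count"] < 10
                            else ("goto", 1)},
    1: {"features": ["mean_iat"],
        "decide": lambda f: ("leaf", "attack") if f["mean_iat"] < 0.01
                            else ("leaf", "benign")},
}

flow_state = {}   # flow_id -> current subtree id (the only per-flow state kept)

def on_window(flow_id, window_features):
    sub = SUBTREES[flow_state.setdefault(flow_id, 0)]
    feats = {k: window_features[k] for k in sub["features"]}  # collect only what this subtree needs
    kind, value = sub["decide"](feats)
    if kind == "leaf":
        flow_state.pop(flow_id, None)
        return value             # final classification for this flow
    flow_state[flow_id] = value  # descend into the next subtree
    return None                  # keep collecting features in the next window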

[Hot Chips 2024] A Smart Cache for a SmartNIC! Scaling End-host Networking to 400 Gbps & Beyond
Annus Zulfiqar, Ali Imran, Venkat Kunaparaju, Ben Pfaff, Gianni Antichi, Muhammad Shahbaz
Virtual switches (vSwitches) have evolved to deliver high performance through N-to-1 table caching, an architecture shaped by the CPU constraints of the era in which they were conceived (e.g., multi-table lookups are expensive on CPUs). Modern vSwitches, however, run in data centers where SmartNICs are pervasive and do not share those limitations (e.g., they offer line-rate, multi-table, P4-programmable pipelines). We envision a SmartNIC-native vSwitch architecture with a novel multi-table cache subsystem, present our preliminary benchmarks, and discuss the open challenges in this endeavor.

[SIGCOMM CCR 2023] The Slow Path Needs An Accelerator Too!
Annus Zulfiqar, Ben Pfaff, William Tu, Gianni Antichi, Muhammad Shahbaz
Traditional Software-Defined Networking (SDN) literature abstracts the network into a centralized control plane and a simple, highly performant data plane. However, real-world SDN deployments—across virtual switches, hardware switches, 5G cores, and service meshes—rely on a third component: the slow path, which acts as an exception handler and provides an infinite-resource abstraction to the control plane. Though historically overlooked, we argue that the slow path is becoming a key bottleneck in modern data center networks and warrants its own dedicated accelerator.

[ASPLOS 2023] Homunculus: Auto-Generating Efficient Data-Plane ML Pipelines for Datacenter Networks (Distinguished Artifact Award)
Tushar Swamy, Annus Zulfiqar, Luigi Nardi, Muhammad Shahbaz, Kunle Olukotun
In recent years, in-network support for machine learning (ML) has improved significantly through dedicated line-rate inference architectures (e.g., Taurus). However, programming in-network ML models requires deep knowledge of ML, low-level hardware (e.g., Spatial HDL), and strict performance/hardware constraints. We introduce Homunculus, a supervised learning framework and domain-specific language (DSL) that automates this process: given a dataset and a set of performance (e.g., latency, throughput) and hardware (e.g., compute, memory) constraints, it performs full design-space exploration to generate a compliant, high-performance ML model.
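As a hedged illustration of the workflow Homunculus automates (this is not its actual DSL or API), the sketch below sweeps a candidate model space, discards candidates that violate the supplied performance and hardware constraints, and returns the best compliant model; the candidate space, cost model, and train/score hooks are assumptions.

# Hedged sketch of constraint-driven design-space exploration (NOT Homunculus's
# DSL or API): sweep candidate model configurations, drop those that violate the
# supplied performance/hardware constraints, and keep the best compliant model.
# The candidate space, cost model, and train/score hooks are assumptions.

def explore(dataset, constraints, candidates, train, score, cost):
    best_model, best_score = None, float("-inf")
    for cfg in candidates:                      # e.g., model depth, width, feature set
        model = train(dataset, cfg)
        c = cost(model)                         # e.g., {"latency_ns": ..., "sram_kb": ...}
        if any(c[k] > constraints[k] for k in constraints):
            continue                            # violates a latency/throughput/memory target
        s = score(model, dataset)               # e.g., F1 on a validation split
        if s > best_score:
            best_model, best_score = model, s
    return best_model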