Annus Zulfiqar
I’m a PhD candidate in the Computer Science and Engineering department at the University of Michigan, Ann Arbor, advised by Professor Muhammad Shahbaz. My research rethinks data plane architectures and algorithms to build scalable networked systems that overcome traditional resource limitations and adapt quickly to dynamic network conditions.
Publications
SIGCOMM 2025 Poster
Kairo: Incremental View Maintenance for Scalable Virtual Switch Caching
Annus Zulfiqar, Ben Pfaff, Gianni Antichi, Muhammad Shahbaz
To support diverse data center environments and workloads, virtual switches (vSwitches) offer flexibility through multi-table programmable packet pipelines with flexible control flows, and they rely on caching packet lookups into a single table to deliver high performance. Keeping this cache consistent with the vSwitch pipeline under frequent updates is hard because of the nature of these pipelines: tables contain prioritized wildcard rules, making it difficult to determine how a rule update should be reflected in individual cache entries. As a result, each update triggers a brute-force "revalidation" of the entire cache. This poster presents our vision for a new approach to cache consistency that frames the problem as an instance of Incremental View Maintenance (IVM), a long-studied problem in the database community in which a view generated by a query is updated incrementally as the underlying database changes.
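
For intuition, here is a minimal Python sketch of the contrast between brute-force revalidation and IVM-style incremental maintenance. The rules, cache layout, and function names are illustrative assumptions, not Kairo's actual design:

from dataclasses import dataclass

@dataclass
class Rule:
    priority: int
    match: dict     # field -> required value; absent fields are wildcards
    action: str

def matches(rule, packet_key):
    """A wildcard rule matches if every field it specifies agrees."""
    return all(packet_key.get(f) == v for f, v in rule.match.items())

def lookup(rules, packet_key):
    """Highest-priority matching rule wins, as in an OpenFlow table."""
    hits = [r for r in rules if matches(r, packet_key)]
    return max(hits, key=lambda r: r.priority, default=None)

# cache maps frozenset(packet_fields.items()) -> action from the full pipeline.
def revalidate_all(cache, rules):
    """Brute force: recompute every cache entry after any rule update."""
    for key in list(cache):
        hit = lookup(rules, dict(key))
        cache[key] = hit.action if hit else None

def revalidate_incremental(cache, rules, updated_rule):
    """IVM-style: recompute only the entries the updated rule can affect."""
    for key in list(cache):
        if matches(updated_rule, dict(key)):   # the overlap test is the "delta"
            hit = lookup(rules, dict(key))
            cache[key] = hit.action if hit else None

The overlap test plays the role of the IVM "delta": rather than recomputing the whole view (the cache) on every update, only the rows (cache entries) the update can reach are recomputed.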

ASPLOS 2025
Gigaflow: Pipeline-Aware Sub-Traversal Caching for Modern SmartNICs
Annus Zulfiqar, Ali Imran, Venkat Kunaparaju, Ben Pfaff, Gianni Antichi, Muhammad Shahbaz
Gigaflow is a novel caching architecture designed for SmartNICs to accelerate virtual switch (vSwitch) packet processing. It leverages the inherent pipeline-aware locality of vSwitches, defined by their policies (e.g., L2, L3, ACL) and execution order (e.g., expressed in P4 or OpenFlow), to cache rules for shared segments (sub-traversals) rather than full flows (traversals). These reusable segments improve cache efficiency, delivering up to a 51% higher cache hit rate and capturing up to 450x more rule space than traditional Megaflow caching, all within the limited memory of modern SmartNICs.
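
To illustrate the idea, the minimal Python sketch below contrasts a full-traversal (Megaflow-style) cache key with per-segment (Gigaflow-style) keys; the pipeline stages and header fields are hypothetical, not Gigaflow's actual data structures:

# Hypothetical three-stage pipeline: each stage reads a subset of header fields.
PIPELINE = [
    ("L2",  ["dst_mac"]),
    ("L3",  ["dst_ip"]),
    ("ACL", ["src_ip", "dst_port"]),
]

def megaflow_key(pkt):
    # Full-traversal key over ALL fields the pipeline touched: two flows that
    # differ in any one field occupy entirely separate cache entries.
    return tuple(pkt[f] for _, fields in PIPELINE for f in fields)

def gigaflow_keys(pkt):
    # One entry per pipeline segment, keyed only on that segment's fields:
    # flows that agree on, e.g., L2 and L3 behavior reuse those entries.
    return [(name, tuple(pkt[f] for f in fields)) for name, fields in PIPELINE]

pkt_a = {"dst_mac": "aa", "dst_ip": "10.0.0.1", "src_ip": "1.1.1.1", "dst_port": 80}
pkt_b = {"dst_mac": "aa", "dst_ip": "10.0.0.1", "src_ip": "2.2.2.2", "dst_port": 443}

assert megaflow_key(pkt_a) != megaflow_key(pkt_b)            # no sharing at all
assert gigaflow_keys(pkt_a)[:2] == gigaflow_keys(pkt_b)[:2]  # L2 + L3 entries reused

Because each segment is keyed only on the fields its stage examines, the two packets above share the L2 and L3 entries and differ only in the ACL segment, which is where the higher hit rate and larger rule-space coverage come from.
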
In the Media

Computer Science and Engineering, University of Michigan
Streamlining cloud traffic with a Gigaflow Cache
Gigaflow improves virtual switches for programmable SmartNICs, delivering a 51% higher hit rate and 90% lower misses.

P4 Language Consortium
P4 Developer Days webinar, "Gigaflow: Pipeline-Aware Sub-Traversal Caching for Modern SmartNICs"

NSDI 2025 Poster
A Smart Cache for a SmartNIC! Rethinking Caching, Locality, & Revalidation for Modern Virtual Switches
Annus Zulfiqar, Ali Imran, Venkat Kunaparaju, Ben Pfaff, Gianni Antichi, Muhammad Shahbaz
This poster presents our vision for a modern virtual switch (vSwitch) design built around a multi-table cache architecture, called Gigaflow, offloaded to SmartNICs. Gigaflow captures pipeline-aware locality from programmable vSwitch packet pipelines and delivers state-of-the-art end-to-end performance. We also reframe vSwitch cache consistency as an instance of the Incremental View Maintenance (IVM) problem from large-scale databases and highlight key open challenges in this space.

NSDI 2025 Poster
SpliDT: Partitioned Decision Trees for Scalable Stateful Inference at Line Rate
Murayyiam Parvez*, Annus Zulfiqar*, Roman Beltiukov, Shir Landau-Feibish, Walter Willinger, Arpit Gupta, Muhammad Shahbaz
Decision Trees (DTs) are ideal for in-network ML on programmable RMT switches, owing to their hierarchical decisions and post-training interpretability. However, traditional implementations are constrained by the small stateful memory of these switches: they accommodate only the top-k features and deliver low classification performance. We present our vision of a partitioned DT inference architecture, SpliDT, which delegates feature collection to subtree-level inference and reuses stateful resources across sliding windows of packets, scaling far beyond the top-k features. Preliminary results show that SpliDT inference pushes the Pareto frontier of F1 score versus the number of stateful flows for many real-world security applications.
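
The minimal Python sketch below illustrates the partitioning idea; the subtrees, features, thresholds, and budget K are all made-up stand-ins rather than SpliDT's actual model:

K = 2  # stateful feature slots available per window (switch memory budget)

def subtree_0(f):
    return "subtree_1" if f["pkt_len_mean"] > 500 else "subtree_2"

def subtree_1(f):
    return "attack" if f["syn_rate"] > 0.8 else "benign"

# Each partition: (features it needs this window, step function -> next node or label)
SUBTREES = {
    "subtree_0": (["pkt_len_mean", "iat_var"], subtree_0),
    "subtree_1": (["syn_rate", "dur"], subtree_1),
    "subtree_2": (["dur"], lambda f: "benign"),
}

def classify(flow_windows):
    """flow_windows yields one dict of features collected per packet window."""
    node = "subtree_0"
    for feats in flow_windows:          # one window of packets -> one subtree hop
        needed, step = SUBTREES[node]
        assert len(needed) <= K         # each partition fits the per-window budget
        node = step(feats)
        if node not in SUBTREES:        # reached a leaf label
            return node
    return None                         # flow ended before reaching a leaf

# e.g., classify(iter([{"pkt_len_mean": 600, "iat_var": 1.0},
#                      {"syn_rate": 0.9, "dur": 3.0}])) -> "attack"

Because each subtree needs at most K features, the same stateful registers are reused window after window instead of holding every feature for the whole tree at once.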

Hot Chips 2024 Poster
A Smart Cache for a SmartNIC! Scaling End-host Networking to 400 Gbps & Beyond
Annus Zulfiqar, Ali Imran, Venkat Kunaparaju, Ben Pfaff, Gianni Antichi, Muhammad Shahbaz
Virtual switches (vSwitches) evolved to deliver high performance by implementing N-to-1 table caching, but this architecture was designed around the constraints of CPUs (e.g., multi-table lookups are expensive), the underlying hardware at the time of their inception. Modern vSwitches operate in data centers where SmartNICs are pervasive and do not share those limitations (e.g., they offer line-rate, multi-table, P4-programmable pipelines). We envision a SmartNIC-native vSwitch architecture with a novel multi-table cache subsystem, present our preliminary benchmarks, and discuss open challenges in this endeavor.

SIGCOMM CCR 2023
The Slow Path Needs An Accelerator Too!
Annus Zulfiqar, Ben Pfaff, William Tu, Gianni Antichi, Muhammad Shahbaz
Traditional Software-Defined Networking (SDN) literature abstracts the network into a centralized control plane and a simple, highly performant data plane. However, real-world SDN deployments (across virtual switches, hardware switches, 5G cores, and service meshes) rely on a third component: the slow path, which acts as an exception handler and provides an infinite-resource abstraction to the control plane. Though it has historically been overlooked, we argue that the slow path is becoming a key bottleneck in modern data center networks and warrants its own dedicated accelerator.

ASPLOS 2023
Homunculus: Auto-Generating Efficient Data-Plane ML Pipelines for Datacenter Networks (Distinguished Artifact Award)
Tushar Swamy, Annus Zulfiqar, Luigi Nardi, Muhammad Shahbaz, Kunle Olukotun
In recent years, in-network support for machine learning (ML) has improved significantly through dedicated line-rate inference architectures (e.g., Taurus). However, programming in-network ML models demands deep knowledge of ML and of low-level hardware languages (e.g., Spatial HDL), along with careful handling of strict performance and hardware constraints. We introduce Homunculus, a supervised-learning framework and domain-specific language (DSL) that automates this process: given a dataset and a set of performance (e.g., latency, throughput) and hardware (e.g., compute, memory) constraints, it performs a full design-space exploration to generate a compliant, high-performance ML model.
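
As a rough illustration of constraint-driven design-space exploration in this spirit, here is a minimal Python sketch; the search space, cost model, and exhaustive sweep are hypothetical stand-ins for Homunculus's actual search strategy:

import itertools

# Hypothetical space of small in-network models (e.g., DNNs for a Taurus-like target).
SEARCH_SPACE = {"layers": [1, 2, 3], "width": [8, 16, 32]}

def estimate(cfg):
    """Stand-in cost/quality model; a real system would train and profile each candidate."""
    params = cfg["layers"] * cfg["width"] ** 2
    return {
        "latency_ns": 50 * cfg["layers"],       # deeper pipeline -> higher latency
        "memory_kb": params * 4 / 1024,         # 4-byte weights must fit on-chip
        "f1": min(0.99, 0.70 + 0.04 * cfg["layers"] + 0.005 * cfg["width"]),
    }

def explore(max_latency_ns=200, max_memory_kb=8.0):
    best = None
    for layers, width in itertools.product(*SEARCH_SPACE.values()):
        cfg = {"layers": layers, "width": width}
        m = estimate(cfg)
        if m["latency_ns"] <= max_latency_ns and m["memory_kb"] <= max_memory_kb:
            if best is None or m["f1"] > best[1]["f1"]:
                best = (cfg, m)
    return best  # highest-F1 candidate that satisfies every constraint

The key design point is that constraints prune the space before quality is compared: any candidate that misses the latency or memory budget is discarded outright, and the best-performing model is chosen only among compliant ones.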