netdev.vger.kernel.org archive mirror
* [PATCH net v2 0/3] virtio-net: fix the deadlock when disabling rx NAPI
@ 2026-01-02 15:20 Bui Quang Minh
From: Bui Quang Minh @ 2026-01-02 15:20 UTC
  To: netdev
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
	Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	virtualization, linux-kernel, bpf, Bui Quang Minh

Calling napi_disable() on an already disabled NAPI can cause a
deadlock. Commit 4bc12818b363 ("virtio-net: disable delayed refill
when pausing rx") avoids this by disabling and cancelling the delayed
refill work when pausing RX in virtnet_rx_pause[_all](). However,
virtnet_rx_resume_all() re-enables the delayed refill work too early,
before all the receive queue NAPIs have been enabled.
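The hazard can be illustrated with a small userspace model. This is only
a sketch under assumptions, not driver code: all model_* names are
hypothetical, and the model assumes the refill worker brackets its refill
with a NAPI disable/enable pair. Where the real napi_disable() would wait,
the model returns false instead.

```c
#include <stdbool.h>

/* Hypothetical stand-in for one receive queue's NAPI state. */
struct model_napi {
	bool disabled;	/* true after a disable not yet paired with an enable */
};

/*
 * Model of disabling NAPI. The real call waits until the instance is
 * free; here we return false to mean "would wait forever", because
 * nobody is going to re-enable an instance that is already disabled.
 */
static bool model_napi_disable(struct model_napi *n)
{
	if (n->disabled)
		return false;
	n->disabled = true;
	return true;
}

static void model_napi_enable(struct model_napi *n)
{
	n->disabled = false;
}

/*
 * Model of the refill worker body (assumed shape): disable NAPI,
 * refill the ring, enable NAPI again. Returns false if it would hang.
 */
static bool model_refill_work(struct model_napi *n)
{
	if (!model_napi_disable(n))
		return false;
	/* the ring would be refilled here */
	model_napi_enable(n);
	return true;
}
```

In this model, running model_refill_work() on a queue that has been
paused but not yet resumed returns false, which corresponds to the worker
hanging when the refill work is enabled before that queue's NAPI.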

The deadlock can be reproduced by running
selftests/drivers/net/hw/xsk_reconfig.py with a multiqueue virtio-net
device and inserting a cond_resched() inside the for loop in
virtnet_rx_resume_all() to make the race more likely to hit. Because the
worker processing the delayed refill work runs on the same CPU as
virtnet_rx_resume_all(), a reschedule is needed to trigger the deadlock.
In a real scenario, contention on netdev_lock can cause the
reschedule.

Because the delayed refill worker is complex, this series removes it.
When we fail to refill the receive buffers, we retry in the next NAPI
poll instead.
- Patch 1: stops scheduling the delayed refill worker and retries the
refill in the next NAPI poll
- Patches 2, 3: remove and clean up the now-unused delayed refill worker
code
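
The retry-in-next-poll idea can be sketched in a userspace model. This is
an illustration under assumptions, not the code from patch 1: the model_*
names and the struct are hypothetical, and reporting the full budget to
get polled again is one plausible way to arrange the retry.

```c
#include <stdbool.h>

/* Hypothetical model of one receive queue. */
struct model_rq {
	int capacity;		/* total buffer slots in the ring */
	int filled;		/* slots that currently hold a buffer */
	int allocs_left;	/* allocations that will still succeed */
};

/* Fill every empty slot; return false if an allocation fails. */
static bool model_try_fill_recv(struct model_rq *rq)
{
	while (rq->filled < rq->capacity) {
		if (rq->allocs_left == 0)
			return false;	/* e.g. memory pressure */
		rq->allocs_left--;
		rq->filled++;
	}
	return true;
}

/*
 * Model of one poll: consume `received` buffers, then refill.
 * A return value equal to `budget` means "poll me again", so a failed
 * refill is retried on the next poll without any worker.
 */
static int model_poll(struct model_rq *rq, int received, int budget)
{
	if (received > budget)
		received = budget;
	if (received > rq->filled)
		received = rq->filled;
	rq->filled -= received;

	if (!model_try_fill_recv(rq))
		return budget;	/* refill fell short: ask for another poll */
	return received;
}
```

With capacity 4 and a single allocation available, a poll that consumes 3
buffers returns the full budget; once allocations succeed again, the next
poll completes the refill and returns the number of packets it received.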

For testing, I've run the following tests with no issues so far:
- selftests/drivers/net/hw/xsk_reconfig.py, which sets up XDP zerocopy
without providing any descriptors to the fill ring. As a result,
try_fill_recv() always fails.
- Sending TCP packets from host to guest while the guest is nearly OOM,
so that some try_fill_recv() calls fail.

Changes in v2:
- Remove the delayed refill worker to simplify the logic instead of trying
to fix it
- Link to v1:
https://lore.kernel.org/netdev/20251223152533.24364-1-minhquangbui99@gmail.com/

Link to the previous approach and discussion:
https://lore.kernel.org/netdev/20251212152741.11656-1-minhquangbui99@gmail.com/

Thanks,
Quang Minh.

Bui Quang Minh (3):
  virtio-net: don't schedule delayed refill worker
  virtio-net: remove unused delayed refill worker
  virtio-net: clean up __virtnet_rx_pause/resume

 drivers/net/virtio_net.c | 171 +++++++++------------------------------
 1 file changed, 40 insertions(+), 131 deletions(-)

-- 
2.43.0


Thread overview: 9+ messages
2026-01-02 15:20 [PATCH net v2 0/3] virtio-net: fix the deadlock when disabling rx NAPI Bui Quang Minh
2026-01-02 15:20 ` [PATCH net v2 1/3] virtio-net: don't schedule delayed refill worker Bui Quang Minh
2026-01-03  0:16   ` Michael S. Tsirkin
2026-01-03 16:57   ` Michael S. Tsirkin
2026-01-03 17:34     ` Michael S. Tsirkin
2026-01-02 15:20 ` [PATCH net v2 2/3] virtio-net: remove unused " Bui Quang Minh
2026-01-03  9:09   ` Michael S. Tsirkin
2026-01-02 15:20 ` [PATCH net v2 3/3] virtio-net: clean up __virtnet_rx_pause/resume Bui Quang Minh
2026-01-03  9:13 ` [PATCH net v2 0/3] virtio-net: fix the deadlock when disabling rx NAPI Michael S. Tsirkin
