* [PATCH net-next v3 1/7] virtio_net: correct netdev_tx_reset_queue() invocation point
From: Koichiro Den @ 2024-12-04 5:07 UTC
To: virtualization
Cc: mst, jasowang, xuanzhuo, eperezma, andrew+netdev, davem, edumazet,
kuba, pabeni, jiri, netdev, linux-kernel, stable
When virtnet_close is followed by virtnet_open, some TX completions can
remain unconsumed until they are finally processed during the first NAPI
poll after the netdev_tx_reset_queue(), resulting in a crash [1]. Commit
b96ed2c97c79 ("virtio_net: move netdev_tx_reset_queue() call
before RX napi enable") was not sufficient to eliminate all BQL crash
cases for virtio-net.
This issue can be reproduced with the latest net-next master by running:
`while :; do ip l set DEV down; ip l set DEV up; done` under heavy network
TX load from inside the machine.
netdev_tx_reset_queue() can actually be dropped from the virtnet_open
path; the device is not stopped in any case. For the BQL core, it is just
as if traffic nearly ceased for some period. As for the stall detector
added to BQL, even if virtnet_close somehow left some TX completions
delayed for long and virtnet_open then followed, we can simply treat that
as a stall, as mentioned in commit 6025b9135f7a ("net: dqs: add NIC stall
detector based on BQL"). Note also that users can still reset stall_max
via sysfs.
So, drop netdev_tx_reset_queue() from virtnet_enable_queue_pair(). This
eliminates the BQL crashes. Note that netdev_tx_reset_queue() is now
explicitly required in the freeze/restore path, so this patch adds it to
free_unused_bufs().
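For reference, the check that fires in the trace below is the completion
accounting invariant in dql_completed(). A minimal sketch of it,
paraphrasing lib/dynamic_queue_limits.c (the exact code may differ by
tree):

void dql_completed(struct dql *dql, unsigned int count)
{
	unsigned int num_queued = READ_ONCE(dql->num_queued);

	/* Can't complete more than what's in flight. Stale TX completions
	 * processed after netdev_tx_reset_queue() has zeroed the counters
	 * violate exactly this, hence the crash below.
	 */
	BUG_ON(count > num_queued - dql->num_completed);

	/* ... limit recalculation elided ... */
}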
[1]:
------------[ cut here ]------------
kernel BUG at lib/dynamic_queue_limits.c:99!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 UID: 0 PID: 1598 Comm: ip Tainted: G N 6.12.0net-next_main+ #2
Tainted: [N]=TEST
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), \
BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
RIP: 0010:dql_completed+0x26b/0x290
Code: b7 c2 49 89 e9 44 89 da 89 c6 4c 89 d7 e8 ed 17 47 00 58 65 ff 0d
4d 27 90 7e 0f 85 fd fe ff ff e8 ea 53 8d ff e9 f3 fe ff ff <0f> 0b 01
d2 44 89 d1 29 d1 ba 00 00 00 00 0f 48 ca e9 28 ff ff ff
RSP: 0018:ffffc900002b0d08 EFLAGS: 00010297
RAX: 0000000000000000 RBX: ffff888102398c80 RCX: 0000000080190009
RDX: 0000000000000000 RSI: 000000000000006a RDI: 0000000000000000
RBP: ffff888102398c00 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000000000ca R11: 0000000000015681 R12: 0000000000000001
R13: ffffc900002b0d68 R14: ffff88811115e000 R15: ffff8881107aca40
FS: 00007f41ded69500(0000) GS:ffff888667dc0000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000556ccc2dc1a0 CR3: 0000000104fd8003 CR4: 0000000000772ef0
PKRU: 55555554
Call Trace:
<IRQ>
? die+0x32/0x80
? do_trap+0xd9/0x100
? dql_completed+0x26b/0x290
? dql_completed+0x26b/0x290
? do_error_trap+0x6d/0xb0
? dql_completed+0x26b/0x290
? exc_invalid_op+0x4c/0x60
? dql_completed+0x26b/0x290
? asm_exc_invalid_op+0x16/0x20
? dql_completed+0x26b/0x290
__free_old_xmit+0xff/0x170 [virtio_net]
free_old_xmit+0x54/0xc0 [virtio_net]
virtnet_poll+0xf4/0xe30 [virtio_net]
? __update_load_avg_cfs_rq+0x264/0x2d0
? update_curr+0x35/0x260
? reweight_entity+0x1be/0x260
__napi_poll.constprop.0+0x28/0x1c0
net_rx_action+0x329/0x420
? enqueue_hrtimer+0x35/0x90
? trace_hardirqs_on+0x1d/0x80
? kvm_sched_clock_read+0xd/0x20
? sched_clock+0xc/0x30
? kvm_sched_clock_read+0xd/0x20
? sched_clock+0xc/0x30
? sched_clock_cpu+0xd/0x1a0
handle_softirqs+0x138/0x3e0
do_softirq.part.0+0x89/0xc0
</IRQ>
<TASK>
__local_bh_enable_ip+0xa7/0xb0
virtnet_open+0xc8/0x310 [virtio_net]
__dev_open+0xfa/0x1b0
__dev_change_flags+0x1de/0x250
dev_change_flags+0x22/0x60
do_setlink.isra.0+0x2df/0x10b0
? rtnetlink_rcv_msg+0x34f/0x3f0
? netlink_rcv_skb+0x54/0x100
? netlink_unicast+0x23e/0x390
? netlink_sendmsg+0x21e/0x490
? ____sys_sendmsg+0x31b/0x350
? avc_has_perm_noaudit+0x67/0xf0
? cred_has_capability.isra.0+0x75/0x110
? __nla_validate_parse+0x5f/0xee0
? __pfx___probestub_irq_enable+0x3/0x10
? __create_object+0x5e/0x90
? security_capable+0x3b/0x70
rtnl_newlink+0x784/0xaf0
? avc_has_perm_noaudit+0x67/0xf0
? cred_has_capability.isra.0+0x75/0x110
? stack_depot_save_flags+0x24/0x6d0
? __pfx_rtnl_newlink+0x10/0x10
rtnetlink_rcv_msg+0x34f/0x3f0
? do_syscall_64+0x6c/0x180
? entry_SYSCALL_64_after_hwframe+0x76/0x7e
? __pfx_rtnetlink_rcv_msg+0x10/0x10
netlink_rcv_skb+0x54/0x100
netlink_unicast+0x23e/0x390
netlink_sendmsg+0x21e/0x490
____sys_sendmsg+0x31b/0x350
? copy_msghdr_from_user+0x6d/0xa0
___sys_sendmsg+0x86/0xd0
? __pte_offset_map+0x17/0x160
? preempt_count_add+0x69/0xa0
? __call_rcu_common.constprop.0+0x147/0x610
? preempt_count_add+0x69/0xa0
? preempt_count_add+0x69/0xa0
? _raw_spin_trylock+0x13/0x60
? trace_hardirqs_on+0x1d/0x80
__sys_sendmsg+0x66/0xc0
do_syscall_64+0x6c/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f41defe5b34
Code: 15 e1 12 0f 00 f7 d8 64 89 02 b8 ff ff ff ff eb bf 0f 1f 44 00 00
f3 0f 1e fa 80 3d 35 95 0f 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00
f0 ff ff 77 4c c3 0f 1f 00 55 48 89 e5 48 83 ec 20 89 55
RSP: 002b:00007ffe5336ecc8 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f41defe5b34
RDX: 0000000000000000 RSI: 00007ffe5336ed30 RDI: 0000000000000003
RBP: 00007ffe5336eda0 R08: 0000000000000010 R09: 0000000000000001
R10: 00007ffe5336f6f9 R11: 0000000000000202 R12: 0000000000000003
R13: 0000000067452259 R14: 0000556ccc28b040 R15: 0000000000000000
</TASK>
[...]
---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
Fixes: c8bd1f7f3e61 ("virtio_net: add support for Byte Queue Limits")
Cc: <stable@vger.kernel.org> # v6.11+
Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
---
drivers/net/virtio_net.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 64c87bb48a41..48ce8b3881b6 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3054,7 +3054,6 @@ static int virtnet_enable_queue_pair(struct virtnet_info *vi, int qp_index)
 	if (err < 0)
 		goto err_xdp_reg_mem_model;
 
-	netdev_tx_reset_queue(netdev_get_tx_queue(vi->dev, qp_index));
 	virtnet_napi_enable(vi->rq[qp_index].vq, &vi->rq[qp_index].napi);
 	virtnet_napi_tx_enable(vi, vi->sq[qp_index].vq, &vi->sq[qp_index].napi);
 
@@ -6243,6 +6242,7 @@ static void free_unused_bufs(struct virtnet_info *vi)
 		struct virtqueue *vq = vi->sq[i].vq;
 		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
 			virtnet_sq_free_unused_buf(vq, buf);
+		netdev_tx_reset_queue(netdev_get_tx_queue(vi->dev, i));
 		cond_resched();
 	}
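With the hunks above applied, the TX side of free_unused_bufs() reads
roughly as follows (a sketch reconstructed from the diff context; the
surrounding declarations are assumed):

for (i = 0; i < vi->max_queue_pairs; i++) {
	struct virtqueue *vq = vi->sq[i].vq;

	while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
		virtnet_sq_free_unused_buf(vq, buf);
	/* The buffers above were freed without netdev_tx_completed_queue(),
	 * so BQL must also forget the bytes it accounted for them.
	 */
	netdev_tx_reset_queue(netdev_get_tx_queue(vi->dev, i));
	cond_resched();
}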
--
2.43.0
* Re: [PATCH net-next v3 1/7] virtio_net: correct netdev_tx_reset_queue() invocation point
From: Jason Wang @ 2024-12-05 7:30 UTC
To: Koichiro Den
Cc: virtualization, mst, xuanzhuo, eperezma, andrew+netdev, davem,
edumazet, kuba, pabeni, jiri, netdev, linux-kernel, stable
On Wed, Dec 4, 2024 at 1:08 PM Koichiro Den <koichiro.den@canonical.com> wrote:
>
> When virtnet_close is followed by virtnet_open, some TX completions can
> possibly remain unconsumed, until they are finally processed during the
> first NAPI poll after the netdev_tx_reset_queue(), resulting in a crash
> [1]. Commit b96ed2c97c79 ("virtio_net: move netdev_tx_reset_queue() call
> before RX napi enable") was not sufficient to eliminate all BQL crash
> cases for virtio-net.
>
> This issue can be reproduced with the latest net-next master by running:
> `while :; do ip l set DEV down; ip l set DEV up; done` under heavy network
> TX load from inside the machine.
>
> netdev_tx_reset_queue() can actually be dropped from virtnet_open path;
> the device is not stopped in any case. For BQL core part, it's just like
> traffic nearly ceases to exist for some period. For stall detector added
> to BQL, even if virtnet_close could somehow lead to some TX completions
> delayed for long, followed by virtnet_open, we can just take it as stall
> as mentioned in commit 6025b9135f7a ("net: dqs: add NIC stall detector
> based on BQL"). Note also that users can still reset stall_max via sysfs.
>
> So, drop netdev_tx_reset_queue() from virtnet_enable_queue_pair(). This
> eliminates the BQL crashes. Note that netdev_tx_reset_queue() is now
> explicitly required in freeze/restore path, so this patch adds it to
> free_unused_bufs().
>
> [1]:
> kernel BUG at lib/dynamic_queue_limits.c:99!
> [...]
>
> Fixes: c8bd1f7f3e61 ("virtio_net: add support for Byte Queue Limits")
> Cc: <stable@vger.kernel.org> # v6.11+
> Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
> ---
Acked-by: Jason Wang <jasowang@redhat.com>
Thanks
* Re: [PATCH net-next v3 1/7] virtio_net: correct netdev_tx_reset_queue() invocation point
From: Michael S. Tsirkin @ 2024-12-05 10:33 UTC
To: Koichiro Den
Cc: virtualization, jasowang, xuanzhuo, eperezma, andrew+netdev,
davem, edumazet, kuba, pabeni, jiri, netdev, linux-kernel, stable
On Wed, Dec 04, 2024 at 02:07:18PM +0900, Koichiro Den wrote:
> When virtnet_close is followed by virtnet_open, some TX completions can
> possibly remain unconsumed, until they are finally processed during the
> first NAPI poll after the netdev_tx_reset_queue(), resulting in a crash
> [1].
So it's a bugfix. Why net-next not net?
> Commit b96ed2c97c79 ("virtio_net: move netdev_tx_reset_queue() call
> before RX napi enable") was not sufficient to eliminate all BQL crash
> cases for virtio-net.
>
> This issue can be reproduced with the latest net-next master by running:
> `while :; do ip l set DEV down; ip l set DEV up; done` under heavy network
> TX load from inside the machine.
>
> netdev_tx_reset_queue() can actually be dropped from virtnet_open path;
> the device is not stopped in any case. For BQL core part, it's just like
> traffic nearly ceases to exist for some period. For stall detector added
> to BQL, even if virtnet_close could somehow lead to some TX completions
> delayed for long, followed by virtnet_open, we can just take it as stall
> as mentioned in commit 6025b9135f7a ("net: dqs: add NIC stall detector
> based on BQL"). Note also that users can still reset stall_max via sysfs.
>
> So, drop netdev_tx_reset_queue() from virtnet_enable_queue_pair(). This
> eliminates the BQL crashes. Note that netdev_tx_reset_queue() is now
> explicitly required in freeze/restore path, so this patch adds it to
> free_unused_bufs().
I don't much like that free_unused_bufs now has this side effect.
I think it would be better to just add a loop in virtnet_restore.
Or if you want to keep it there, pls rename the function
to hint it does more.
>
> [1]:
> kernel BUG at lib/dynamic_queue_limits.c:99!
> [...]
* Re: [PATCH net-next v3 1/7] virtio_net: correct netdev_tx_reset_queue() invocation point
From: Koichiro Den @ 2024-12-05 12:43 UTC
To: Michael S. Tsirkin
Cc: virtualization, jasowang, xuanzhuo, eperezma, andrew+netdev,
davem, edumazet, kuba, pabeni, jiri, netdev, linux-kernel, stable
On Thu, Dec 05, 2024 at 05:33:36AM -0500, Michael S. Tsirkin wrote:
> On Wed, Dec 04, 2024 at 02:07:18PM +0900, Koichiro Den wrote:
> > When virtnet_close is followed by virtnet_open, some TX completions can
> > possibly remain unconsumed, until they are finally processed during the
> > first NAPI poll after the netdev_tx_reset_queue(), resulting in a crash
> > [1].
>
>
> So it's a bugfix. Why net-next not net?
I was mistaken (I just read netdev-FAQ again). I'll resend to net, with
adjustments reflecting your feedback.
>
> > Commit b96ed2c97c79 ("virtio_net: move netdev_tx_reset_queue() call
> > before RX napi enable") was not sufficient to eliminate all BQL crash
> > cases for virtio-net.
> >
> > This issue can be reproduced with the latest net-next master by running:
> > `while :; do ip l set DEV down; ip l set DEV up; done` under heavy network
> > TX load from inside the machine.
> >
> > netdev_tx_reset_queue() can actually be dropped from virtnet_open path;
> > the device is not stopped in any case. For BQL core part, it's just like
> > traffic nearly ceases to exist for some period. For stall detector added
> > to BQL, even if virtnet_close could somehow lead to some TX completions
> > delayed for long, followed by virtnet_open, we can just take it as stall
> > as mentioned in commit 6025b9135f7a ("net: dqs: add NIC stall detector
> > based on BQL"). Note also that users can still reset stall_max via sysfs.
> >
> > So, drop netdev_tx_reset_queue() from virtnet_enable_queue_pair(). This
> > eliminates the BQL crashes. Note that netdev_tx_reset_queue() is now
> > explicitly required in freeze/restore path, so this patch adds it to
> > free_unused_bufs().
>
> I don't much like that free_unused_bufs now has this side effect.
> I think would be better to just add a loop in virtnet_restore.
> Or if you want to keep it there, pls rename the function
> to hint it does more.
It makes sense. I would go for the former. Thanks.
>
> > > [1]:
> > > kernel BUG at lib/dynamic_queue_limits.c:99!
> > > [...]
* Re: [PATCH net-next v3 1/7] virtio_net: correct netdev_tx_reset_queue() invocation point
From: Koichiro Den @ 2024-12-05 13:16 UTC
To: Michael S. Tsirkin
Cc: virtualization, jasowang, xuanzhuo, eperezma, andrew+netdev,
davem, edumazet, kuba, pabeni, jiri, netdev, linux-kernel, stable
On Thu, Dec 05, 2024 at 09:43:38PM +0900, Koichiro Den wrote:
> On Thu, Dec 05, 2024 at 05:33:36AM -0500, Michael S. Tsirkin wrote:
> > On Wed, Dec 04, 2024 at 02:07:18PM +0900, Koichiro Den wrote:
> > > When virtnet_close is followed by virtnet_open, some TX completions can
> > > possibly remain unconsumed, until they are finally processed during the
> > > first NAPI poll after the netdev_tx_reset_queue(), resulting in a crash
> > > [1].
> >
> >
> > So it's a bugfix. Why net-next not net?
>
> I was mistaken (I just read netdev-FAQ again). I'll resend to net, with
> adjustments reflecting your feedback.
>
> >
> > > Commit b96ed2c97c79 ("virtio_net: move netdev_tx_reset_queue() call
> > > before RX napi enable") was not sufficient to eliminate all BQL crash
> > > cases for virtio-net.
> > >
> > > This issue can be reproduced with the latest net-next master by running:
> > > `while :; do ip l set DEV down; ip l set DEV up; done` under heavy network
> > > TX load from inside the machine.
> > >
> > > netdev_tx_reset_queue() can actually be dropped from virtnet_open path;
> > > the device is not stopped in any case. For BQL core part, it's just like
> > > traffic nearly ceases to exist for some period. For stall detector added
> > > to BQL, even if virtnet_close could somehow lead to some TX completions
> > > delayed for long, followed by virtnet_open, we can just take it as stall
> > > as mentioned in commit 6025b9135f7a ("net: dqs: add NIC stall detector
> > > based on BQL"). Note also that users can still reset stall_max via sysfs.
> > >
> > > So, drop netdev_tx_reset_queue() from virtnet_enable_queue_pair(). This
> > > eliminates the BQL crashes. Note that netdev_tx_reset_queue() is now
> > > explicitly required in freeze/restore path, so this patch adds it to
> > > free_unused_bufs().
> >
> > I don't much like that free_unused_bufs now has this side effect.
> > I think would be better to just add a loop in virtnet_restore.
> > Or if you want to keep it there, pls rename the function
> > to hint it does more.
>
> It makes sense. I would go for the former. Thanks.
Hmm, as Jacob pointed out in v1
(https://lore.kernel.org/all/20241202181445.0da50076@kernel.org/),
it looks better to follow the rule of thumb. Taking both suggestions
from Jacob and you, adding a loop to remove_vq_common(), just after
free_unused_bufs(), seems more fitting now, like this:
static void remove_vq_common(struct virtnet_info *vi)
{
+	int i;
+
 	virtio_reset_device(vi->vdev);
 
 	/* Free unused buffers in both send and recv, if any. */
 	free_unused_bufs(vi);
 
+	/* Tx unused buffers flushed, so reset BQL counter */
+	for (i = 0; i < vi->max_queue_pairs; i++)
+		netdev_tx_reset_queue(netdev_get_tx_queue(vi->dev, i));
+
 	free_receive_bufs(vi);
What do you think?
Thanks,
-Koichiro Den
>
> > > [1]:
> > > kernel BUG at lib/dynamic_queue_limits.c:99!
> > > [...]
* Re: [PATCH net-next v3 1/7] virtio_net: correct netdev_tx_reset_queue() invocation point
From: Michael S. Tsirkin @ 2024-12-05 15:14 UTC
To: Koichiro Den
Cc: virtualization, jasowang, xuanzhuo, eperezma, andrew+netdev,
davem, edumazet, kuba, pabeni, jiri, netdev, linux-kernel, stable
On Thu, Dec 05, 2024 at 10:16:35PM +0900, Koichiro Den wrote:
> On Thu, Dec 05, 2024 at 09:43:38PM +0900, Koichiro Den wrote:
> > On Thu, Dec 05, 2024 at 05:33:36AM -0500, Michael S. Tsirkin wrote:
> > > On Wed, Dec 04, 2024 at 02:07:18PM +0900, Koichiro Den wrote:
> > > > When virtnet_close is followed by virtnet_open, some TX completions can
> > > > possibly remain unconsumed, until they are finally processed during the
> > > > first NAPI poll after the netdev_tx_reset_queue(), resulting in a crash
> > > > [1].
> > >
> > >
> > > So it's a bugfix. Why net-next not net?
> >
> > I was mistaken (I just read netdev-FAQ again). I'll resend to net, with
> > adjustments reflecting your feedback.
> >
> > >
> > > > Commit b96ed2c97c79 ("virtio_net: move netdev_tx_reset_queue() call
> > > > before RX napi enable") was not sufficient to eliminate all BQL crash
> > > > cases for virtio-net.
> > > >
> > > > This issue can be reproduced with the latest net-next master by running:
> > > > `while :; do ip l set DEV down; ip l set DEV up; done` under heavy network
> > > > TX load from inside the machine.
> > > >
> > > > netdev_tx_reset_queue() can actually be dropped from virtnet_open path;
> > > > the device is not stopped in any case. For BQL core part, it's just like
> > > > traffic nearly ceases to exist for some period. For stall detector added
> > > > to BQL, even if virtnet_close could somehow lead to some TX completions
> > > > delayed for long, followed by virtnet_open, we can just take it as stall
> > > > as mentioned in commit 6025b9135f7a ("net: dqs: add NIC stall detector
> > > > based on BQL"). Note also that users can still reset stall_max via sysfs.
> > > >
> > > > So, drop netdev_tx_reset_queue() from virtnet_enable_queue_pair(). This
> > > > eliminates the BQL crashes. Note that netdev_tx_reset_queue() is now
> > > > explicitly required in freeze/restore path, so this patch adds it to
> > > > free_unused_bufs().
> > >
> > > I don't much like that free_unused_bufs now has this side effect.
> > > I think would be better to just add a loop in virtnet_restore.
> > > Or if you want to keep it there, pls rename the function
> > > to hint it does more.
> >
> > It makes sense. I would go for the former. Thanks.
>
> Hmm, as Jacob pointed out in v1
> (https://lore.kernel.org/all/20241202181445.0da50076@kernel.org/),
> it looks better to follow the rule of thumb. Taking both suggestions
> from Jacob and you, adding a loop to remove_vq_common(), just after
> free_unused_bufs(), seems more fitting now, like this:
>
> static void remove_vq_common(struct virtnet_info *vi)
> {
> + int i;
> +
> virtio_reset_device(vi->vdev);
>
> /* Free unused buffers in both send and recv, if any. */
> free_unused_bufs(vi);
>
> + /* Tx unused buffers flushed, so reset BQL counter */
> + for (i = 0; i < vi->max_queue_pairs; i++)
> + netdev_tx_reset_queue(netdev_get_tx_queue(vi->dev, i));
> +
> free_receive_bufs(vi);
>
> What do you think?
>
> Thanks,
>
> -Koichiro Den
why in remove_vq_common? It's only used on freeze/restore.
> >
> > > > [1]:
> > > > kernel BUG at lib/dynamic_queue_limits.c:99!
> > > > [...]
* Re: [PATCH net-next v3 1/7] virtio_net: correct netdev_tx_reset_queue() invocation point
From: Michael S. Tsirkin @ 2024-12-05 15:17 UTC
To: Koichiro Den
Cc: virtualization, jasowang, xuanzhuo, eperezma, andrew+netdev,
davem, edumazet, kuba, pabeni, jiri, netdev, linux-kernel, stable
On Thu, Dec 05, 2024 at 10:16:35PM +0900, Koichiro Den wrote:
> On Thu, Dec 05, 2024 at 09:43:38PM +0900, Koichiro Den wrote:
> > On Thu, Dec 05, 2024 at 05:33:36AM -0500, Michael S. Tsirkin wrote:
> > > On Wed, Dec 04, 2024 at 02:07:18PM +0900, Koichiro Den wrote:
> > > > When virtnet_close is followed by virtnet_open, some TX completions can
> > > > possibly remain unconsumed, until they are finally processed during the
> > > > first NAPI poll after the netdev_tx_reset_queue(), resulting in a crash
> > > > [1].
> > >
> > >
> > > So it's a bugfix. Why net-next not net?
> >
> > I was mistaken (I just read netdev-FAQ again). I'll resend to net, with
> > adjustments reflecting your feedback.
> >
> > >
> > > > Commit b96ed2c97c79 ("virtio_net: move netdev_tx_reset_queue() call
> > > > before RX napi enable") was not sufficient to eliminate all BQL crash
> > > > cases for virtio-net.
> > > >
> > > > This issue can be reproduced with the latest net-next master by running:
> > > > `while :; do ip l set DEV down; ip l set DEV up; done` under heavy network
> > > > TX load from inside the machine.
> > > >
> > > > netdev_tx_reset_queue() can actually be dropped from virtnet_open path;
> > > > the device is not stopped in any case. For BQL core part, it's just like
> > > > traffic nearly ceases to exist for some period. For stall detector added
> > > > to BQL, even if virtnet_close could somehow lead to some TX completions
> > > > delayed for long, followed by virtnet_open, we can just take it as stall
> > > > as mentioned in commit 6025b9135f7a ("net: dqs: add NIC stall detector
> > > > based on BQL"). Note also that users can still reset stall_max via sysfs.
> > > >
> > > > So, drop netdev_tx_reset_queue() from virtnet_enable_queue_pair(). This
> > > > eliminates the BQL crashes. Note that netdev_tx_reset_queue() is now
> > > > explicitly required in freeze/restore path, so this patch adds it to
> > > > free_unused_bufs().
> > >
> > > I don't much like that free_unused_bufs now has this side effect.
> > > I think would be better to just add a loop in virtnet_restore.
> > > Or if you want to keep it there, pls rename the function
> > > to hint it does more.
> >
> > It makes sense. I would go for the former. Thanks.
>
> Hmm, as Jacob pointed out in v1
> (https://lore.kernel.org/all/20241202181445.0da50076@kernel.org/),
> it looks better to follow the rule of thumb.
OK then. I'm fine with keeping your code as is, just a squash,
and add a comment:
/*
 * Rule of thumb is netdev_tx_reset_queue() should follow any
 * skb freeing not followed by netdev_tx_completed_queue()
 */
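For readers less familiar with BQL, a minimal, driver-agnostic sketch of
the pairing this rule of thumb refers to (illustrative only, not actual
virtio_net code):

/* xmit path: account bytes handed to the device */
netdev_tx_sent_queue(txq, skb->len);

/* TX completion path: account bytes the device has finished */
netdev_tx_completed_queue(txq, packets, bytes);

/* Any path that frees queued skbs without completing them (e.g.
 * flushing unused buffers after a device reset) must instead
 * resynchronize BQL:
 */
netdev_tx_reset_queue(txq);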
> Taking both suggestions
> from Jacob and you, adding a loop to remove_vq_common(), just after
> free_unused_bufs(), seems more fitting now, like this:
>
> static void remove_vq_common(struct virtnet_info *vi)
> {
> + int i;
> +
> virtio_reset_device(vi->vdev);
>
> /* Free unused buffers in both send and recv, if any. */
> free_unused_bufs(vi);
>
> + /* Tx unused buffers flushed, so reset BQL counter */
> + for (i = 0; i < vi->max_queue_pairs; i++)
> + netdev_tx_reset_queue(netdev_get_tx_queue(vi->dev, i));
> +
> free_receive_bufs(vi);
>
> What do you think?
>
> Thanks,
>
> -Koichiro Den
>
> > > > [1]:
> > > > kernel BUG at lib/dynamic_queue_limits.c:99!
> > > > [...]
* Re: [PATCH net-next v3 1/7] virtio_net: correct netdev_tx_reset_queue() invocation point
2024-12-05 15:17 ` Michael S. Tsirkin
@ 2024-12-06 0:27 ` Koichiro Den
0 siblings, 0 replies; 25+ messages in thread
From: Koichiro Den @ 2024-12-06 0:27 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: virtualization, jasowang, xuanzhuo, eperezma, andrew+netdev,
davem, edumazet, kuba, pabeni, jiri, netdev, linux-kernel, stable
On Thu, Dec 05, 2024 at 10:17:59AM -0500, Michael S. Tsirkin wrote:
> On Thu, Dec 05, 2024 at 10:16:35PM +0900, Koichiro Den wrote:
> > On Thu, Dec 05, 2024 at 09:43:38PM +0900, Koichiro Den wrote:
> > > On Thu, Dec 05, 2024 at 05:33:36AM -0500, Michael S. Tsirkin wrote:
> > > > On Wed, Dec 04, 2024 at 02:07:18PM +0900, Koichiro Den wrote:
> > > > > When virtnet_close is followed by virtnet_open, some TX completions can
> > > > > possibly remain unconsumed, until they are finally processed during the
> > > > > first NAPI poll after the netdev_tx_reset_queue(), resulting in a crash
> > > > > [1].
> > > >
> > > >
> > > > So it's a bugfix. Why net-next not net?
> > >
> > > I was mistaken (I just read netdev-FAQ again). I'll resend to net, with
> > > adjustments reflecting your feedback.
> > >
> > > >
> > > > > Commit b96ed2c97c79 ("virtio_net: move netdev_tx_reset_queue() call
> > > > > before RX napi enable") was not sufficient to eliminate all BQL crash
> > > > > cases for virtio-net.
> > > > >
> > > > > This issue can be reproduced with the latest net-next master by running:
> > > > > `while :; do ip l set DEV down; ip l set DEV up; done` under heavy network
> > > > > TX load from inside the machine.
> > > > >
> > > > > netdev_tx_reset_queue() can actually be dropped from virtnet_open path;
> > > > > the device is not stopped in any case. For BQL core part, it's just like
> > > > > traffic nearly ceases to exist for some period. For stall detector added
> > > > > to BQL, even if virtnet_close could somehow lead to some TX completions
> > > > > delayed for long, followed by virtnet_open, we can just take it as stall
> > > > > as mentioned in commit 6025b9135f7a ("net: dqs: add NIC stall detector
> > > > > based on BQL"). Note also that users can still reset stall_max via sysfs.
> > > > >
> > > > > So, drop netdev_tx_reset_queue() from virtnet_enable_queue_pair(). This
> > > > > eliminates the BQL crashes. Note that netdev_tx_reset_queue() is now
> > > > > explicitly required in freeze/restore path, so this patch adds it to
> > > > > free_unused_bufs().
> > > >
> > > > I don't much like that free_unused_bufs now has this side effect.
> > > > I think it would be better to just add a loop in virtnet_restore.
> > > > Or if you want to keep it there, pls rename the function
> > > > to hint it does more.
> > >
> > > It makes sense. I would go for the former. Thanks.
> >
> > Hmm, as Jacob pointed out in v1
> > (https://lore.kernel.org/all/20241202181445.0da50076@kernel.org/),
> > it looks better to follow the rule of thumb.
>
> OK then. I'm fine with keeping your code as is, just a squash,
> and add a comment
>
> /*
> * Rule of thumb is netdev_tx_reset_queue() should follow any
> * skb freeing not followed by netdev_tx_completed_queue()
> */
Ok, thanks for the review!
>
> > Taking both suggestions
> > from Jacob and you, adding a loop to remove_vq_common(), just after
> > free_unused_bufs(), seems more fitting now, like this:
> >
> > static void remove_vq_common(struct virtnet_info *vi)
> > {
> > + int i;
> > +
> > virtio_reset_device(vi->vdev);
> >
> > /* Free unused buffers in both send and recv, if any. */
> > free_unused_bufs(vi);
> >
> > + /* Tx unused buffers flushed, so reset BQL counter */
> > + for (i = 0; i < vi->max_queue_pairs; i++)
> > + netdev_tx_reset_queue(netdev_get_tx_queue(vi->dev, i));
> > +
> > free_receive_bufs(vi);
> >
> > What do you think?
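(For reference, the rule of thumb above boils down to the pattern below;
this is a minimal sketch based on the free_unused_bufs() hunk in patch
1/7 plus the suggested comment, with locking and the rx side omitted:

	/*
	 * Rule of thumb is netdev_tx_reset_queue() should follow any
	 * skb freeing not followed by netdev_tx_completed_queue()
	 */
	for (i = 0; i < vi->max_queue_pairs; i++) {
		struct virtqueue *vq = vi->sq[i].vq;

		/* free tx buffers that were queued but never completed ... */
		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
			virtnet_sq_free_unused_buf(vq, buf);
		/* ... so the BQL counters must be reset to match */
		netdev_tx_reset_queue(netdev_get_tx_queue(vi->dev, i));
		cond_resched();
	}

Patches 3/7 onward generalize the same reset into a reusable callback.)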
* [PATCH net-next v3 2/7] virtio_net: replace vq2rxq with vq2txq where appropriate
2024-12-04 5:07 [PATCH net-next v3 0/7] virtio_net: correct netdev_tx_reset_queue() invocation points Koichiro Den
2024-12-04 5:07 ` [PATCH net-next v3 1/7] virtio_net: correct netdev_tx_reset_queue() invocation point Koichiro Den
@ 2024-12-04 5:07 ` Koichiro Den
2024-12-05 7:30 ` Jason Wang
2024-12-05 10:34 ` Michael S. Tsirkin
2024-12-04 5:07 ` [PATCH net-next v3 3/7] virtio_net: introduce virtnet_sq_free_unused_buf_done() Koichiro Den
` (5 subsequent siblings)
7 siblings, 2 replies; 25+ messages in thread
From: Koichiro Den @ 2024-12-04 5:07 UTC (permalink / raw)
To: virtualization
Cc: mst, jasowang, xuanzhuo, eperezma, andrew+netdev, davem, edumazet,
kuba, pabeni, jiri, netdev, linux-kernel, stable
While not harmful, using vq2rxq in a path where the virtqueue is always
a send queue looks odd. Replace it with the more appropriate vq2txq for
clarity and correctness.
Fixes: 89f86675cb03 ("virtio_net: xsk: tx: support xmit xsk buffer")
Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
---
drivers/net/virtio_net.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 48ce8b3881b6..1b7a85e75e14 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -6213,7 +6213,7 @@ static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf)
{
struct virtnet_info *vi = vq->vdev->priv;
struct send_queue *sq;
- int i = vq2rxq(vq);
+ int i = vq2txq(vq);
sq = &vi->sq[i];
--
2.43.0
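(The mix-up was benign because of how virtio-net numbers its virtqueues:
rx and tx vqs alternate, i.e. rx0=0, tx0=1, rx1=2, tx1=3, and so on. The
index helpers in drivers/net/virtio_net.c are essentially:

	static int vq2txq(struct virtqueue *vq)
	{
		return (vq->index - 1) / 2;	/* tx vq index is 2 * txq + 1 */
	}

	static int vq2rxq(struct virtqueue *vq)
	{
		return vq->index / 2;		/* rx vq index is 2 * rxq */
	}

For a tx vq, vq2rxq() computes (2 * txq + 1) / 2, which truncates to txq,
so both helpers happen to return the same value there, hence the "not
harmful" above.)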
* Re: [PATCH net-next v3 2/7] virtio_net: replace vq2rxq with vq2txq where appropriate
2024-12-04 5:07 ` [PATCH net-next v3 2/7] virtio_net: replace vq2rxq with vq2txq where appropriate Koichiro Den
@ 2024-12-05 7:30 ` Jason Wang
2024-12-05 10:34 ` Michael S. Tsirkin
1 sibling, 0 replies; 25+ messages in thread
From: Jason Wang @ 2024-12-05 7:30 UTC (permalink / raw)
To: Koichiro Den
Cc: virtualization, mst, xuanzhuo, eperezma, andrew+netdev, davem,
edumazet, kuba, pabeni, jiri, netdev, linux-kernel, stable
On Wed, Dec 4, 2024 at 1:08 PM Koichiro Den <koichiro.den@canonical.com> wrote:
>
> While not harmful, using vq2rxq in a path where the virtqueue is always
> a send queue looks odd. Replace it with the more appropriate vq2txq for
> clarity and correctness.
>
> Fixes: 89f86675cb03 ("virtio_net: xsk: tx: support xmit xsk buffer")
> Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
> ---
> drivers/net/virtio_net.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
Acked-by: Jason Wang <jasowang@redhat.com>
Thanks
* Re: [PATCH net-next v3 2/7] virtio_net: replace vq2rxq with vq2txq where appropriate
2024-12-04 5:07 ` [PATCH net-next v3 2/7] virtio_net: replace vq2rxq with vq2txq where appropriate Koichiro Den
2024-12-05 7:30 ` Jason Wang
@ 2024-12-05 10:34 ` Michael S. Tsirkin
1 sibling, 0 replies; 25+ messages in thread
From: Michael S. Tsirkin @ 2024-12-05 10:34 UTC (permalink / raw)
To: Koichiro Den
Cc: virtualization, jasowang, xuanzhuo, eperezma, andrew+netdev,
davem, edumazet, kuba, pabeni, jiri, netdev, linux-kernel, stable
On Wed, Dec 04, 2024 at 02:07:19PM +0900, Koichiro Den wrote:
> While not harmful, using vq2rxq in a path where the virtqueue is always
> a send queue looks odd. Replace it with the more appropriate vq2txq for
> clarity and correctness.
>
> Fixes: 89f86675cb03 ("virtio_net: xsk: tx: support xmit xsk buffer")
> Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> drivers/net/virtio_net.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 48ce8b3881b6..1b7a85e75e14 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -6213,7 +6213,7 @@ static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf)
> {
> struct virtnet_info *vi = vq->vdev->priv;
> struct send_queue *sq;
> - int i = vq2rxq(vq);
> + int i = vq2txq(vq);
>
> sq = &vi->sq[i];
>
> --
> 2.43.0
* [PATCH net-next v3 3/7] virtio_net: introduce virtnet_sq_free_unused_buf_done()
2024-12-04 5:07 [PATCH net-next v3 0/7] virtio_net: correct netdev_tx_reset_queue() invocation points Koichiro Den
2024-12-04 5:07 ` [PATCH net-next v3 1/7] virtio_net: correct netdev_tx_reset_queue() invocation point Koichiro Den
2024-12-04 5:07 ` [PATCH net-next v3 2/7] virtio_net: replace vq2rxq with vq2txq where appropriate Koichiro Den
@ 2024-12-04 5:07 ` Koichiro Den
2024-12-05 7:31 ` Jason Wang
2024-12-05 10:40 ` Michael S. Tsirkin
2024-12-04 5:07 ` [PATCH net-next v3 4/7] virtio_ring: add a func argument 'recycle_done' to virtqueue_resize() Koichiro Den
` (4 subsequent siblings)
7 siblings, 2 replies; 25+ messages in thread
From: Koichiro Den @ 2024-12-04 5:07 UTC (permalink / raw)
To: virtualization
Cc: mst, jasowang, xuanzhuo, eperezma, andrew+netdev, davem, edumazet,
kuba, pabeni, jiri, netdev, linux-kernel, stable
This will be used in the following commits, to ensure DQL reset occurs
iff. all unused buffers are actually recycled.
Cc: <stable@vger.kernel.org> # v6.11+
Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
---
drivers/net/virtio_net.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 1b7a85e75e14..b3cbbd8052e4 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -503,6 +503,7 @@ struct virtio_net_common_hdr {
static struct virtio_net_common_hdr xsk_hdr;
static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf);
+static void virtnet_sq_free_unused_buf_done(struct virtqueue *vq);
static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct xdp_buff *xdp,
struct net_device *dev,
unsigned int *xdp_xmit,
@@ -6233,6 +6234,14 @@ static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf)
}
}
+static void virtnet_sq_free_unused_buf_done(struct virtqueue *vq)
+{
+ struct virtnet_info *vi = vq->vdev->priv;
+ int i = vq2txq(vq);
+
+ netdev_tx_reset_queue(netdev_get_tx_queue(vi->dev, i));
+}
+
static void free_unused_bufs(struct virtnet_info *vi)
{
void *buf;
--
2.43.0
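(Some context on why these resets matter: with CONFIG_BQL enabled,
netdev_tx_reset_queue() amounts to the following, a sketch of the helper
in include/linux/netdevice.h at the time of this series:

	static inline void netdev_tx_reset_queue(struct netdev_queue *q)
	{
		/* forget the stack's flow-control state ... */
		clear_bit(__QUEUE_STATE_STACK_XOFF, &q->state);
		/* ... and zero the DQL byte counters */
		dql_reset(&q->dql);
	}

After the counters are zeroed, consuming a TX completion that predates
the reset makes dql_completed() account for more bytes than it ever saw
queued, which trips the BUG at lib/dynamic_queue_limits.c:99 shown in
the trace above.)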
* Re: [PATCH net-next v3 3/7] virtio_net: introduce virtnet_sq_free_unused_buf_done()
2024-12-04 5:07 ` [PATCH net-next v3 3/7] virtio_net: introduce virtnet_sq_free_unused_buf_done() Koichiro Den
@ 2024-12-05 7:31 ` Jason Wang
2024-12-05 10:40 ` Michael S. Tsirkin
1 sibling, 0 replies; 25+ messages in thread
From: Jason Wang @ 2024-12-05 7:31 UTC (permalink / raw)
To: Koichiro Den
Cc: virtualization, mst, xuanzhuo, eperezma, andrew+netdev, davem,
edumazet, kuba, pabeni, jiri, netdev, linux-kernel, stable
On Wed, Dec 4, 2024 at 1:08 PM Koichiro Den <koichiro.den@canonical.com> wrote:
>
> This will be used in the following commits, to ensure DQL reset occurs
> iff. all unused buffers are actually recycled.
>
> Cc: <stable@vger.kernel.org> # v6.11+
> Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Thanks
* Re: [PATCH net-next v3 3/7] virtio_net: introduce virtnet_sq_free_unused_buf_done()
2024-12-04 5:07 ` [PATCH net-next v3 3/7] virtio_net: introduce virtnet_sq_free_unused_buf_done() Koichiro Den
2024-12-05 7:31 ` Jason Wang
@ 2024-12-05 10:40 ` Michael S. Tsirkin
2024-12-05 12:53 ` Koichiro Den
1 sibling, 1 reply; 25+ messages in thread
From: Michael S. Tsirkin @ 2024-12-05 10:40 UTC (permalink / raw)
To: Koichiro Den
Cc: virtualization, jasowang, xuanzhuo, eperezma, andrew+netdev,
davem, edumazet, kuba, pabeni, jiri, netdev, linux-kernel, stable
On Wed, Dec 04, 2024 at 02:07:20PM +0900, Koichiro Den wrote:
> This will be used in the following commits, to ensure DQL reset occurs
> iff. all unused buffers are actually recycled.
>
> Cc: <stable@vger.kernel.org> # v6.11+
> Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
to avoid adding an unused function, squash with a patch that uses it.
> ---
> drivers/net/virtio_net.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 1b7a85e75e14..b3cbbd8052e4 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -503,6 +503,7 @@ struct virtio_net_common_hdr {
> static struct virtio_net_common_hdr xsk_hdr;
>
> static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf);
> +static void virtnet_sq_free_unused_buf_done(struct virtqueue *vq);
> static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct xdp_buff *xdp,
> struct net_device *dev,
> unsigned int *xdp_xmit,
> @@ -6233,6 +6234,14 @@ static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf)
> }
> }
>
> +static void virtnet_sq_free_unused_buf_done(struct virtqueue *vq)
> +{
> + struct virtnet_info *vi = vq->vdev->priv;
> + int i = vq2txq(vq);
> +
> + netdev_tx_reset_queue(netdev_get_tx_queue(vi->dev, i));
> +}
> +
> static void free_unused_bufs(struct virtnet_info *vi)
> {
> void *buf;
> --
> 2.43.0
* Re: [PATCH net-next v3 3/7] virtio_net: introduce virtnet_sq_free_unused_buf_done()
2024-12-05 10:40 ` Michael S. Tsirkin
@ 2024-12-05 12:53 ` Koichiro Den
0 siblings, 0 replies; 25+ messages in thread
From: Koichiro Den @ 2024-12-05 12:53 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: virtualization, jasowang, xuanzhuo, eperezma, andrew+netdev,
davem, edumazet, kuba, pabeni, jiri, netdev, linux-kernel, stable
On Thu, Dec 05, 2024 at 05:40:33AM -0500, Michael S. Tsirkin wrote:
> On Wed, Dec 04, 2024 at 02:07:20PM +0900, Koichiro Den wrote:
> > This will be used in the following commits, to ensure DQL reset occurs
> > iff. all unused buffers are actually recycled.
> >
> > Cc: <stable@vger.kernel.org> # v6.11+
> > Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
>
> to avoid adding an unused function, squash with a patch that uses it.
I originally separated this out because some patches were supposed to land
in the stable tree while others were not, and this was the common
prerequisite. However, it can be squashed with [5/7] regardless of that,
and should be, as you pointed out.
I'll do so and send v4 later, thanks for the review.
>
>
> > ---
> > drivers/net/virtio_net.c | 9 +++++++++
> > 1 file changed, 9 insertions(+)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 1b7a85e75e14..b3cbbd8052e4 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -503,6 +503,7 @@ struct virtio_net_common_hdr {
> > static struct virtio_net_common_hdr xsk_hdr;
> >
> > static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf);
> > +static void virtnet_sq_free_unused_buf_done(struct virtqueue *vq);
> > static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct xdp_buff *xdp,
> > struct net_device *dev,
> > unsigned int *xdp_xmit,
> > @@ -6233,6 +6234,14 @@ static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf)
> > }
> > }
> >
> > +static void virtnet_sq_free_unused_buf_done(struct virtqueue *vq)
> > +{
> > + struct virtnet_info *vi = vq->vdev->priv;
> > + int i = vq2txq(vq);
> > +
> > + netdev_tx_reset_queue(netdev_get_tx_queue(vi->dev, i));
> > +}
> > +
> > static void free_unused_bufs(struct virtnet_info *vi)
> > {
> > void *buf;
> > --
> > 2.43.0
>
* [PATCH net-next v3 4/7] virtio_ring: add a func argument 'recycle_done' to virtqueue_resize()
2024-12-04 5:07 [PATCH net-next v3 0/7] virtio_net: correct netdev_tx_reset_queue() invocation points Koichiro Den
` (2 preceding siblings ...)
2024-12-04 5:07 ` [PATCH net-next v3 3/7] virtio_net: introduce virtnet_sq_free_unused_buf_done() Koichiro Den
@ 2024-12-04 5:07 ` Koichiro Den
2024-12-05 7:31 ` Jason Wang
2024-12-04 5:07 ` [PATCH net-next v3 5/7] virtio_net: ensure netdev_tx_reset_queue is called on tx ring resize Koichiro Den
` (3 subsequent siblings)
7 siblings, 1 reply; 25+ messages in thread
From: Koichiro Den @ 2024-12-04 5:07 UTC (permalink / raw)
To: virtualization
Cc: mst, jasowang, xuanzhuo, eperezma, andrew+netdev, davem, edumazet,
kuba, pabeni, jiri, netdev, linux-kernel, stable
When virtqueue_resize() has actually recycled all unused buffers,
additional work may be required in some cases. Relying solely on its
return status is fragile, since the recycle step may already have run by
the time a later step fails, so introduce a new function argument
'recycle_done', which is invoked once recycling has actually occurred.
Cc: <stable@vger.kernel.org> # v6.11+
Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
---
drivers/net/virtio_net.c | 4 ++--
drivers/virtio/virtio_ring.c | 6 +++++-
include/linux/virtio.h | 3 ++-
3 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index b3cbbd8052e4..2a90655cfa4f 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3332,7 +3332,7 @@ static int virtnet_rx_resize(struct virtnet_info *vi,
virtnet_rx_pause(vi, rq);
- err = virtqueue_resize(rq->vq, ring_num, virtnet_rq_unmap_free_buf);
+ err = virtqueue_resize(rq->vq, ring_num, virtnet_rq_unmap_free_buf, NULL);
if (err)
netdev_err(vi->dev, "resize rx fail: rx queue index: %d err: %d\n", qindex, err);
@@ -3395,7 +3395,7 @@ static int virtnet_tx_resize(struct virtnet_info *vi, struct send_queue *sq,
virtnet_tx_pause(vi, sq);
- err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf);
+ err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf, NULL);
if (err)
netdev_err(vi->dev, "resize tx fail: tx queue index: %d err: %d\n", qindex, err);
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 82a7d2cbc704..6af8cf6a619e 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2772,6 +2772,7 @@ EXPORT_SYMBOL_GPL(vring_create_virtqueue_dma);
* @_vq: the struct virtqueue we're talking about.
* @num: new ring num
* @recycle: callback to recycle unused buffers
+ * @recycle_done: callback to be invoked when recycle for all unused buffers done
*
* When it is really necessary to create a new vring, it will set the current vq
* into the reset state. Then call the passed callback to recycle the buffer
@@ -2792,7 +2793,8 @@ EXPORT_SYMBOL_GPL(vring_create_virtqueue_dma);
*
*/
int virtqueue_resize(struct virtqueue *_vq, u32 num,
- void (*recycle)(struct virtqueue *vq, void *buf))
+ void (*recycle)(struct virtqueue *vq, void *buf),
+ void (*recycle_done)(struct virtqueue *vq))
{
struct vring_virtqueue *vq = to_vvq(_vq);
int err;
@@ -2809,6 +2811,8 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
err = virtqueue_disable_and_recycle(_vq, recycle);
if (err)
return err;
+ if (recycle_done)
+ recycle_done(_vq);
if (vq->packed_ring)
err = virtqueue_resize_packed(_vq, num);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 57cc4b07fd17..0aa7df4ed5ca 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -109,7 +109,8 @@ dma_addr_t virtqueue_get_avail_addr(const struct virtqueue *vq);
dma_addr_t virtqueue_get_used_addr(const struct virtqueue *vq);
int virtqueue_resize(struct virtqueue *vq, u32 num,
- void (*recycle)(struct virtqueue *vq, void *buf));
+ void (*recycle)(struct virtqueue *vq, void *buf),
+ void (*recycle_done)(struct virtqueue *vq));
int virtqueue_reset(struct virtqueue *vq,
void (*recycle)(struct virtqueue *vq, void *buf));
--
2.43.0
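(The ordering callers can rely on, per the hunk above: recycle() runs
for every unused buffer via virtqueue_disable_and_recycle(), then
recycle_done() fires exactly once, and only then is the ring actually
resized. A condensed view of the resulting control flow:

	err = virtqueue_disable_and_recycle(_vq, recycle);
	if (err)
		return err;		/* recycle_done() is NOT called */
	if (recycle_done)
		recycle_done(_vq);	/* all unused buffers recycled */

	/* the resize itself may still fail after this point */

So a non-zero return does not imply that recycling never happened, which
is why the series adds an explicit hook rather than keying off the
return value.)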
* Re: [PATCH net-next v3 4/7] virtio_ring: add a func argument 'recycle_done' to virtqueue_resize()
2024-12-04 5:07 ` [PATCH net-next v3 4/7] virtio_ring: add a func argument 'recycle_done' to virtqueue_resize() Koichiro Den
@ 2024-12-05 7:31 ` Jason Wang
0 siblings, 0 replies; 25+ messages in thread
From: Jason Wang @ 2024-12-05 7:31 UTC (permalink / raw)
To: Koichiro Den
Cc: virtualization, mst, xuanzhuo, eperezma, andrew+netdev, davem,
edumazet, kuba, pabeni, jiri, netdev, linux-kernel, stable
On Wed, Dec 4, 2024 at 1:08 PM Koichiro Den <koichiro.den@canonical.com> wrote:
>
> When virtqueue_resize() has actually recycled all unused buffers,
> additional work may be required in some cases. Relying solely on its
> return status is fragile, since the recycle step may already have run by
> the time a later step fails, so introduce a new function argument
> 'recycle_done', which is invoked once recycling has actually occurred.
>
> Cc: <stable@vger.kernel.org> # v6.11+
> Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Thanks
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH net-next v3 5/7] virtio_net: ensure netdev_tx_reset_queue is called on tx ring resize
2024-12-04 5:07 [PATCH net-next v3 0/7] virtio_net: correct netdev_tx_reset_queue() invocation points Koichiro Den
` (3 preceding siblings ...)
2024-12-04 5:07 ` [PATCH net-next v3 4/7] virtio_ring: add a func argument 'recycle_done' to virtqueue_resize() Koichiro Den
@ 2024-12-04 5:07 ` Koichiro Den
2024-12-05 7:32 ` Jason Wang
2024-12-04 5:07 ` [PATCH net-next v3 6/7] virtio_ring: add a func argument 'recycle_done' to virtqueue_reset() Koichiro Den
` (2 subsequent siblings)
7 siblings, 1 reply; 25+ messages in thread
From: Koichiro Den @ 2024-12-04 5:07 UTC (permalink / raw)
To: virtualization
Cc: mst, jasowang, xuanzhuo, eperezma, andrew+netdev, davem, edumazet,
kuba, pabeni, jiri, netdev, linux-kernel, stable
virtnet_tx_resize() flushes remaining tx skbs, requiring DQL counters to
be reset when flushing has actually occurred. Add
virtnet_sq_free_unused_buf_done() as a callback for virtqueue_resize() to
handle this.
Fixes: c8bd1f7f3e61 ("virtio_net: add support for Byte Queue Limits")
Cc: <stable@vger.kernel.org> # v6.11+
Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
---
drivers/net/virtio_net.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 2a90655cfa4f..d0cf29fd8255 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3395,7 +3395,8 @@ static int virtnet_tx_resize(struct virtnet_info *vi, struct send_queue *sq,
virtnet_tx_pause(vi, sq);
- err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf, NULL);
+ err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf,
+ virtnet_sq_free_unused_buf_done);
if (err)
netdev_err(vi->dev, "resize tx fail: tx queue index: %d err: %d\n", qindex, err);
--
2.43.0
* Re: [PATCH net-next v3 5/7] virtio_net: ensure netdev_tx_reset_queue is called on tx ring resize
2024-12-04 5:07 ` [PATCH net-next v3 5/7] virtio_net: ensure netdev_tx_reset_queue is called on tx ring resize Koichiro Den
@ 2024-12-05 7:32 ` Jason Wang
0 siblings, 0 replies; 25+ messages in thread
From: Jason Wang @ 2024-12-05 7:32 UTC (permalink / raw)
To: Koichiro Den
Cc: virtualization, mst, xuanzhuo, eperezma, andrew+netdev, davem,
edumazet, kuba, pabeni, jiri, netdev, linux-kernel, stable
On Wed, Dec 4, 2024 at 1:08 PM Koichiro Den <koichiro.den@canonical.com> wrote:
>
> virtnet_tx_resize() flushes remaining tx skbs, requiring DQL counters to
> be reset when flushing has actually occurred. Add
> virtnet_sq_free_unused_buf_done() as a callback for virtqueue_resize() to
> handle this.
>
> Fixes: c8bd1f7f3e61 ("virtio_net: add support for Byte Queue Limits")
> Cc: <stable@vger.kernel.org> # v6.11+
> Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
> ---
Acked-by: Jason Wang <jasowang@redhat.com>
Thanks
* [PATCH net-next v3 6/7] virtio_ring: add a func argument 'recycle_done' to virtqueue_reset()
2024-12-04 5:07 [PATCH net-next v3 0/7] virtio_net: correct netdev_tx_reset_queue() invocation points Koichiro Den
` (4 preceding siblings ...)
2024-12-04 5:07 ` [PATCH net-next v3 5/7] virtio_net: ensure netdev_tx_reset_queue is called on tx ring resize Koichiro Den
@ 2024-12-04 5:07 ` Koichiro Den
2024-12-05 7:32 ` Jason Wang
2024-12-04 5:07 ` [PATCH net-next v3 7/7] virtio_net: ensure netdev_tx_reset_queue is called on bind xsk for tx Koichiro Den
2024-12-05 10:41 ` [PATCH net-next v3 0/7] virtio_net: correct netdev_tx_reset_queue() invocation points Michael S. Tsirkin
7 siblings, 1 reply; 25+ messages in thread
From: Koichiro Den @ 2024-12-04 5:07 UTC (permalink / raw)
To: virtualization
Cc: mst, jasowang, xuanzhuo, eperezma, andrew+netdev, davem, edumazet,
kuba, pabeni, jiri, netdev, linux-kernel, stable
When virtqueue_reset() has actually recycled all unused buffers,
additional work may be required in some cases. Relying solely on its
return status is fragile, since the recycle step may already have run by
the time a later step fails, so introduce a new function argument
'recycle_done', which is invoked once recycling has actually occurred.
Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
---
drivers/net/virtio_net.c | 4 ++--
drivers/virtio/virtio_ring.c | 6 +++++-
include/linux/virtio.h | 3 ++-
3 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index d0cf29fd8255..5eaa7a2884d5 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -5711,7 +5711,7 @@ static int virtnet_rq_bind_xsk_pool(struct virtnet_info *vi, struct receive_queu
virtnet_rx_pause(vi, rq);
- err = virtqueue_reset(rq->vq, virtnet_rq_unmap_free_buf);
+ err = virtqueue_reset(rq->vq, virtnet_rq_unmap_free_buf, NULL);
if (err) {
netdev_err(vi->dev, "reset rx fail: rx queue index: %d err: %d\n", qindex, err);
@@ -5740,7 +5740,7 @@ static int virtnet_sq_bind_xsk_pool(struct virtnet_info *vi,
virtnet_tx_pause(vi, sq);
- err = virtqueue_reset(sq->vq, virtnet_sq_free_unused_buf);
+ err = virtqueue_reset(sq->vq, virtnet_sq_free_unused_buf, NULL);
if (err) {
netdev_err(vi->dev, "reset tx fail: tx queue index: %d err: %d\n", qindex, err);
pool = NULL;
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 6af8cf6a619e..fdd2d2b07b5a 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2827,6 +2827,7 @@ EXPORT_SYMBOL_GPL(virtqueue_resize);
* virtqueue_reset - detach and recycle all unused buffers
* @_vq: the struct virtqueue we're talking about.
* @recycle: callback to recycle unused buffers
+ * @recycle_done: callback to be invoked when recycle for all unused buffers done
*
* Caller must ensure we don't call this with other virtqueue operations
* at the same time (except where noted).
@@ -2838,7 +2839,8 @@ EXPORT_SYMBOL_GPL(virtqueue_resize);
* -EPERM: Operation not permitted
*/
int virtqueue_reset(struct virtqueue *_vq,
- void (*recycle)(struct virtqueue *vq, void *buf))
+ void (*recycle)(struct virtqueue *vq, void *buf),
+ void (*recycle_done)(struct virtqueue *vq))
{
struct vring_virtqueue *vq = to_vvq(_vq);
int err;
@@ -2846,6 +2848,8 @@ int virtqueue_reset(struct virtqueue *_vq,
err = virtqueue_disable_and_recycle(_vq, recycle);
if (err)
return err;
+ if (recycle_done)
+ recycle_done(_vq);
if (vq->packed_ring)
virtqueue_reinit_packed(vq);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 0aa7df4ed5ca..dd88682e27e3 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -112,7 +112,8 @@ int virtqueue_resize(struct virtqueue *vq, u32 num,
void (*recycle)(struct virtqueue *vq, void *buf),
void (*recycle_done)(struct virtqueue *vq));
int virtqueue_reset(struct virtqueue *vq,
- void (*recycle)(struct virtqueue *vq, void *buf));
+ void (*recycle)(struct virtqueue *vq, void *buf),
+ void (*recycle_done)(struct virtqueue *vq));
struct virtio_admin_cmd {
__le16 opcode;
--
2.43.0
* Re: [PATCH net-next v3 6/7] virtio_ring: add a func argument 'recycle_done' to virtqueue_reset()
2024-12-04 5:07 ` [PATCH net-next v3 6/7] virtio_ring: add a func argument 'recycle_done' to virtqueue_reset() Koichiro Den
@ 2024-12-05 7:32 ` Jason Wang
0 siblings, 0 replies; 25+ messages in thread
From: Jason Wang @ 2024-12-05 7:32 UTC (permalink / raw)
To: Koichiro Den
Cc: virtualization, mst, xuanzhuo, eperezma, andrew+netdev, davem,
edumazet, kuba, pabeni, jiri, netdev, linux-kernel, stable
On Wed, Dec 4, 2024 at 1:08 PM Koichiro Den <koichiro.den@canonical.com> wrote:
>
> When virtqueue_reset() has actually recycled all unused buffers,
> additional work may be required in some cases. Relying solely on its
> return status is fragile, since the recycle step may already have run by
> the time a later step fails, so introduce a new function argument
> 'recycle_done', which is invoked once recycling has actually occurred.
>
> Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Thanks
* [PATCH net-next v3 7/7] virtio_net: ensure netdev_tx_reset_queue is called on bind xsk for tx
2024-12-04 5:07 [PATCH net-next v3 0/7] virtio_net: correct netdev_tx_reset_queue() invocation points Koichiro Den
` (5 preceding siblings ...)
2024-12-04 5:07 ` [PATCH net-next v3 6/7] virtio_ring: add a func argument 'recycle_done' to virtqueue_reset() Koichiro Den
@ 2024-12-04 5:07 ` Koichiro Den
2024-12-05 7:33 ` Jason Wang
2024-12-05 10:41 ` [PATCH net-next v3 0/7] virtio_net: correct netdev_tx_reset_queue() invocation points Michael S. Tsirkin
7 siblings, 1 reply; 25+ messages in thread
From: Koichiro Den @ 2024-12-04 5:07 UTC (permalink / raw)
To: virtualization
Cc: mst, jasowang, xuanzhuo, eperezma, andrew+netdev, davem, edumazet,
kuba, pabeni, jiri, netdev, linux-kernel, stable
virtnet_sq_bind_xsk_pool() flushes tx skbs and then resets the tx queue,
so DQL counters need to be reset when flushing has actually occurred. Add
virtnet_sq_free_unused_buf_done() as a callback for virtqueue_reset()
to handle this.
Fixes: 21a4e3ce6dc7 ("virtio_net: xsk: bind/unbind xsk for tx")
Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
---
drivers/net/virtio_net.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 5eaa7a2884d5..177705a56812 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -5740,7 +5740,8 @@ static int virtnet_sq_bind_xsk_pool(struct virtnet_info *vi,
virtnet_tx_pause(vi, sq);
- err = virtqueue_reset(sq->vq, virtnet_sq_free_unused_buf, NULL);
+ err = virtqueue_reset(sq->vq, virtnet_sq_free_unused_buf,
+ virtnet_sq_free_unused_buf_done);
if (err) {
netdev_err(vi->dev, "reset tx fail: tx queue index: %d err: %d\n", qindex, err);
pool = NULL;
--
2.43.0
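(With this, all three tx flush points, i.e. freeze/restore, tx ring
resize and xsk bind, follow the same pause/flush/reset pattern. A
condensed view of virtnet_sq_bind_xsk_pool() as modified above, with
error handling and pool bookkeeping trimmed:

	virtnet_tx_pause(vi, sq);

	/* flush unused tx buffers; virtnet_sq_free_unused_buf_done()
	 * resets the queue's DQL state exactly when flushing actually
	 * took place */
	err = virtqueue_reset(sq->vq, virtnet_sq_free_unused_buf,
			      virtnet_sq_free_unused_buf_done);

	virtnet_tx_resume(vi, sq);
)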
* Re: [PATCH net-next v3 7/7] virtio_net: ensure netdev_tx_reset_queue is called on bind xsk for tx
2024-12-04 5:07 ` [PATCH net-next v3 7/7] virtio_net: ensure netdev_tx_reset_queue is called on bind xsk for tx Koichiro Den
@ 2024-12-05 7:33 ` Jason Wang
0 siblings, 0 replies; 25+ messages in thread
From: Jason Wang @ 2024-12-05 7:33 UTC (permalink / raw)
To: Koichiro Den
Cc: virtualization, mst, xuanzhuo, eperezma, andrew+netdev, davem,
edumazet, kuba, pabeni, jiri, netdev, linux-kernel, stable
On Wed, Dec 4, 2024 at 1:08 PM Koichiro Den <koichiro.den@canonical.com> wrote:
>
> virtnet_sq_bind_xsk_pool() flushes tx skbs and then resets the tx queue,
> so DQL counters need to be reset when flushing has actually occurred. Add
> virtnet_sq_free_unused_buf_done() as a callback for virtqueue_reset()
> to handle this.
>
> Fixes: 21a4e3ce6dc7 ("virtio_net: xsk: bind/unbind xsk for tx")
> Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
> ---
Acked-by: Jason Wang <jasowang@redhat.com>
Thanks
* Re: [PATCH net-next v3 0/7] virtio_net: correct netdev_tx_reset_queue() invocation points
2024-12-04 5:07 [PATCH net-next v3 0/7] virtio_net: correct netdev_tx_reset_queue() invocation points Koichiro Den
` (6 preceding siblings ...)
2024-12-04 5:07 ` [PATCH net-next v3 7/7] virtio_net: ensure netdev_tx_reset_queue is called on bind xsk for tx Koichiro Den
@ 2024-12-05 10:41 ` Michael S. Tsirkin
7 siblings, 0 replies; 25+ messages in thread
From: Michael S. Tsirkin @ 2024-12-05 10:41 UTC (permalink / raw)
To: Koichiro Den
Cc: virtualization, jasowang, xuanzhuo, eperezma, andrew+netdev,
davem, edumazet, kuba, pabeni, jiri, netdev, linux-kernel, stable
On Wed, Dec 04, 2024 at 02:07:17PM +0900, Koichiro Den wrote:
> When virtnet_close is followed by virtnet_open, some TX completions can
> possibly remain unconsumed, until they are finally processed during the
> first NAPI poll after the netdev_tx_reset_queue(), resulting in a crash
> [1]. Commit b96ed2c97c79 ("virtio_net: move netdev_tx_reset_queue() call
> before RX napi enable") was not sufficient to eliminate all BQL crash
> cases for virtio-net.
>
> This issue can be reproduced with the latest net-next master by running:
> `while :; do ip l set DEV down; ip l set DEV up; done` under heavy network
> TX load from inside the machine.
>
> This patch series resolves the issue and also addresses similar existing
> problems:
>
> (a). Drop netdev_tx_reset_queue() from open/close path. This eliminates the
> BQL crashes due to the problematic open/close path.
>
> (b). As a result of (a), netdev_tx_reset_queue() is now explicitly required
> in freeze/restore path. Add netdev_tx_reset_queue() to
> free_unused_bufs().
>
> (c). Fix missing resetting in virtnet_tx_resize().
> virtnet_tx_resize() has lacked proper resetting since commit
> c8bd1f7f3e61 ("virtio_net: add support for Byte Queue Limits").
>
> (d). Fix missing resetting in the XDP_SETUP_XSK_POOL path.
> Similar to (c), this path lacked proper resetting. Call
> netdev_tx_reset_queue() when virtqueue_reset() has actually recycled
> unused buffers.
>
> This patch series consists of seven commits:
> [1/7]: Resolves (a) and (b).
> [2/7]: Minor fix to streamline [3/7].
> [3/7]: Prerequisite for (c) and (d).
> [4/7]: Prerequisite for (c).
> [5/7]: Resolves (c).
> [6/7]: Prerequisite for (d).
> [7/7]: Resolves (d).
>
> Changes for v3:
> - replace 'flushed' argument with 'recycle_done'
> Changes for v2:
> - add tx queue resetting for (b) to (d) above
>
> v2: https://lore.kernel.org/all/20241203073025.67065-1-koichiro.den@canonical.com/
> v1: https://lore.kernel.org/all/20241130181744.3772632-1-koichiro.den@canonical.com/
Overall looks good, but I think this is net material, not net-next.
Sent some suggestions for cosmetic changes.
> Koichiro Den (7):
> virtio_net: correct netdev_tx_reset_queue() invocation point
> virtio_ring: add a func argument 'recycle_done' to virtqueue_resize()
> virtio_net: replace vq2rxq with vq2txq where appropriate
> virtio_net: introduce virtnet_sq_free_unused_buf_done()
> virtio_net: ensure netdev_tx_reset_queue is called on tx ring resize
> virtio_ring: add a func argument 'recycle_done' to virtqueue_reset()
> virtio_net: ensure netdev_tx_reset_queue is called on bind xsk for tx
>
> drivers/net/virtio_net.c | 23 +++++++++++++++++------
> drivers/virtio/virtio_ring.c | 12 ++++++++++--
> include/linux/virtio.h | 6 ++++--
> 3 files changed, 31 insertions(+), 10 deletions(-)
>
> --
> 2.43.0