From: "Michael S. Tsirkin" <mst@redhat.com>
To: Koichiro Den <koichiro.den@canonical.com>
Cc: virtualization@lists.linux.dev, jasowang@redhat.com,
xuanzhuo@linux.alibaba.com, eperezma@redhat.com,
andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
kuba@kernel.org, pabeni@redhat.com, jiri@resnulli.us,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
stable@vger.kernel.org
Subject: Re: [PATCH net-next v3 1/7] virtio_net: correct netdev_tx_reset_queue() invocation point
Date: Thu, 5 Dec 2024 10:14:19 -0500 [thread overview]
Message-ID: <20241205101351-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <cv7ph7yna6d5a37k7hoxplyzrbmrdxrcjd67nrttevsta3r54h@35ztxhqaczqd>
On Thu, Dec 05, 2024 at 10:16:35PM +0900, Koichiro Den wrote:
> On Thu, Dec 05, 2024 at 09:43:38PM +0900, Koichiro Den wrote:
> > On Thu, Dec 05, 2024 at 05:33:36AM -0500, Michael S. Tsirkin wrote:
> > > On Wed, Dec 04, 2024 at 02:07:18PM +0900, Koichiro Den wrote:
> > > > When virtnet_close is followed by virtnet_open, some TX completions can
> > > > possibly remain unconsumed, until they are finally processed during the
> > > > first NAPI poll after the netdev_tx_reset_queue(), resulting in a crash
> > > > [1].
> > >
> > >
> > > So it's a bugfix. Why net-next not net?
> >
> > I was mistaken (I just read netdev-FAQ again). I'll resend to net, with
> > adjustments reflecting your feedback.
> >
> > >
> > > > Commit b96ed2c97c79 ("virtio_net: move netdev_tx_reset_queue() call
> > > > before RX napi enable") was not sufficient to eliminate all BQL crash
> > > > cases for virtio-net.
> > > >
> > > > This issue can be reproduced with the latest net-next master by running:
> > > > `while :; do ip l set DEV down; ip l set DEV up; done` under heavy network
> > > > TX load from inside the machine.
> > > >
> > > > netdev_tx_reset_queue() can actually be dropped from virtnet_open path;
> > > > the device is not stopped in any case. For BQL core part, it's just like
> > > > traffic nearly ceases to exist for some period. For stall detector added
> > > > to BQL, even if virtnet_close could somehow lead to some TX completions
> > > > delayed for long, followed by virtnet_open, we can just take it as stall
> > > > as mentioned in commit 6025b9135f7a ("net: dqs: add NIC stall detector
> > > > based on BQL"). Note also that users can still reset stall_max via sysfs.
> > > >
> > > > So, drop netdev_tx_reset_queue() from virtnet_enable_queue_pair(). This
> > > > eliminates the BQL crashes. Note that netdev_tx_reset_queue() is now
> > > > explicitly required in freeze/restore path, so this patch adds it to
> > > > free_unused_bufs().
> > >
> > > I don't much like that free_unused_bufs now has this side effect.
> > > I think would be better to just add a loop in virtnet_restore.
> > > Or if you want to keep it there, pls rename the function
> > > to hint it does more.
> >
> > It makes sense. I would go for the former. Thanks.
>
> Hmm, as Jacob pointed out in v1
> (https://lore.kernel.org/all/20241202181445.0da50076@kernel.org/),
> it looks better to follow the rule of thumb. Taking both suggestions
> from Jacob and you, adding a loop to remove_vq_common(), just after
> free_unused_bufs(), seems more fitting now, like this:
>
> static void remove_vq_common(struct virtnet_info *vi)
> {
> + int i;
> +
> virtio_reset_device(vi->vdev);
>
> /* Free unused buffers in both send and recv, if any. */
> free_unused_bufs(vi);
>
> + /* Tx unused buffers flushed, so reset BQL counter */
> + for (i = 0; i < vi->max_queue_pairs; i++)
> + netdev_tx_reset_queue(netdev_get_tx_queue(vi->dev, i));
> +
> free_receive_bufs(vi);
>
> What do you think?
>
> Thanks,
>
> -Koichiro Den
why in remove_vq_common? It's only used on freeze/restore.
> >
> > >
> > >
> > > >
> > > > [1]:
> > > > ------------[ cut here ]------------
> > > > kernel BUG at lib/dynamic_queue_limits.c:99!
> > > > Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> > > > CPU: 7 UID: 0 PID: 1598 Comm: ip Tainted: G N 6.12.0net-next_main+ #2
> > > > Tainted: [N]=TEST
> > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), \
> > > > BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> > > > RIP: 0010:dql_completed+0x26b/0x290
> > > > Code: b7 c2 49 89 e9 44 89 da 89 c6 4c 89 d7 e8 ed 17 47 00 58 65 ff 0d
> > > > 4d 27 90 7e 0f 85 fd fe ff ff e8 ea 53 8d ff e9 f3 fe ff ff <0f> 0b 01
> > > > d2 44 89 d1 29 d1 ba 00 00 00 00 0f 48 ca e9 28 ff ff ff
> > > > RSP: 0018:ffffc900002b0d08 EFLAGS: 00010297
> > > > RAX: 0000000000000000 RBX: ffff888102398c80 RCX: 0000000080190009
> > > > RDX: 0000000000000000 RSI: 000000000000006a RDI: 0000000000000000
> > > > RBP: ffff888102398c00 R08: 0000000000000000 R09: 0000000000000000
> > > > R10: 00000000000000ca R11: 0000000000015681 R12: 0000000000000001
> > > > R13: ffffc900002b0d68 R14: ffff88811115e000 R15: ffff8881107aca40
> > > > FS: 00007f41ded69500(0000) GS:ffff888667dc0000(0000)
> > > > knlGS:0000000000000000
> > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > CR2: 0000556ccc2dc1a0 CR3: 0000000104fd8003 CR4: 0000000000772ef0
> > > > PKRU: 55555554
> > > > Call Trace:
> > > > <IRQ>
> > > > ? die+0x32/0x80
> > > > ? do_trap+0xd9/0x100
> > > > ? dql_completed+0x26b/0x290
> > > > ? dql_completed+0x26b/0x290
> > > > ? do_error_trap+0x6d/0xb0
> > > > ? dql_completed+0x26b/0x290
> > > > ? exc_invalid_op+0x4c/0x60
> > > > ? dql_completed+0x26b/0x290
> > > > ? asm_exc_invalid_op+0x16/0x20
> > > > ? dql_completed+0x26b/0x290
> > > > __free_old_xmit+0xff/0x170 [virtio_net]
> > > > free_old_xmit+0x54/0xc0 [virtio_net]
> > > > virtnet_poll+0xf4/0xe30 [virtio_net]
> > > > ? __update_load_avg_cfs_rq+0x264/0x2d0
> > > > ? update_curr+0x35/0x260
> > > > ? reweight_entity+0x1be/0x260
> > > > __napi_poll.constprop.0+0x28/0x1c0
> > > > net_rx_action+0x329/0x420
> > > > ? enqueue_hrtimer+0x35/0x90
> > > > ? trace_hardirqs_on+0x1d/0x80
> > > > ? kvm_sched_clock_read+0xd/0x20
> > > > ? sched_clock+0xc/0x30
> > > > ? kvm_sched_clock_read+0xd/0x20
> > > > ? sched_clock+0xc/0x30
> > > > ? sched_clock_cpu+0xd/0x1a0
> > > > handle_softirqs+0x138/0x3e0
> > > > do_softirq.part.0+0x89/0xc0
> > > > </IRQ>
> > > > <TASK>
> > > > __local_bh_enable_ip+0xa7/0xb0
> > > > virtnet_open+0xc8/0x310 [virtio_net]
> > > > __dev_open+0xfa/0x1b0
> > > > __dev_change_flags+0x1de/0x250
> > > > dev_change_flags+0x22/0x60
> > > > do_setlink.isra.0+0x2df/0x10b0
> > > > ? rtnetlink_rcv_msg+0x34f/0x3f0
> > > > ? netlink_rcv_skb+0x54/0x100
> > > > ? netlink_unicast+0x23e/0x390
> > > > ? netlink_sendmsg+0x21e/0x490
> > > > ? ____sys_sendmsg+0x31b/0x350
> > > > ? avc_has_perm_noaudit+0x67/0xf0
> > > > ? cred_has_capability.isra.0+0x75/0x110
> > > > ? __nla_validate_parse+0x5f/0xee0
> > > > ? __pfx___probestub_irq_enable+0x3/0x10
> > > > ? __create_object+0x5e/0x90
> > > > ? security_capable+0x3b/0x70
> > > > rtnl_newlink+0x784/0xaf0
> > > > ? avc_has_perm_noaudit+0x67/0xf0
> > > > ? cred_has_capability.isra.0+0x75/0x110
> > > > ? stack_depot_save_flags+0x24/0x6d0
> > > > ? __pfx_rtnl_newlink+0x10/0x10
> > > > rtnetlink_rcv_msg+0x34f/0x3f0
> > > > ? do_syscall_64+0x6c/0x180
> > > > ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > > ? __pfx_rtnetlink_rcv_msg+0x10/0x10
> > > > netlink_rcv_skb+0x54/0x100
> > > > netlink_unicast+0x23e/0x390
> > > > netlink_sendmsg+0x21e/0x490
> > > > ____sys_sendmsg+0x31b/0x350
> > > > ? copy_msghdr_from_user+0x6d/0xa0
> > > > ___sys_sendmsg+0x86/0xd0
> > > > ? __pte_offset_map+0x17/0x160
> > > > ? preempt_count_add+0x69/0xa0
> > > > ? __call_rcu_common.constprop.0+0x147/0x610
> > > > ? preempt_count_add+0x69/0xa0
> > > > ? preempt_count_add+0x69/0xa0
> > > > ? _raw_spin_trylock+0x13/0x60
> > > > ? trace_hardirqs_on+0x1d/0x80
> > > > __sys_sendmsg+0x66/0xc0
> > > > do_syscall_64+0x6c/0x180
> > > > entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > > RIP: 0033:0x7f41defe5b34
> > > > Code: 15 e1 12 0f 00 f7 d8 64 89 02 b8 ff ff ff ff eb bf 0f 1f 44 00 00
> > > > f3 0f 1e fa 80 3d 35 95 0f 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00
> > > > f0 ff ff 77 4c c3 0f 1f 00 55 48 89 e5 48 83 ec 20 89 55
> > > > RSP: 002b:00007ffe5336ecc8 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
> > > > RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f41defe5b34
> > > > RDX: 0000000000000000 RSI: 00007ffe5336ed30 RDI: 0000000000000003
> > > > RBP: 00007ffe5336eda0 R08: 0000000000000010 R09: 0000000000000001
> > > > R10: 00007ffe5336f6f9 R11: 0000000000000202 R12: 0000000000000003
> > > > R13: 0000000067452259 R14: 0000556ccc28b040 R15: 0000000000000000
> > > > </TASK>
> > > > [...]
> > > > ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
> > > >
> > > > Fixes: c8bd1f7f3e61 ("virtio_net: add support for Byte Queue Limits")
> > > > Cc: <stable@vger.kernel.org> # v6.11+
> > > > Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
> > > > ---
> > > > drivers/net/virtio_net.c | 2 +-
> > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index 64c87bb48a41..48ce8b3881b6 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -3054,7 +3054,6 @@ static int virtnet_enable_queue_pair(struct virtnet_info *vi, int qp_index)
> > > > if (err < 0)
> > > > goto err_xdp_reg_mem_model;
> > > >
> > > > - netdev_tx_reset_queue(netdev_get_tx_queue(vi->dev, qp_index));
> > > > virtnet_napi_enable(vi->rq[qp_index].vq, &vi->rq[qp_index].napi);
> > > > virtnet_napi_tx_enable(vi, vi->sq[qp_index].vq, &vi->sq[qp_index].napi);
> > > >
> > > > @@ -6243,6 +6242,7 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > > > struct virtqueue *vq = vi->sq[i].vq;
> > > > while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > > virtnet_sq_free_unused_buf(vq, buf);
> > > > + netdev_tx_reset_queue(netdev_get_tx_queue(vi->dev, i));
> > > > cond_resched();
> > > > }
> > > >
> > > > --
> > > > 2.43.0
> > >
next prev parent reply other threads:[~2024-12-05 15:14 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-04 5:07 [PATCH net-next v3 0/7] virtio_net: correct netdev_tx_reset_queue() invocation points Koichiro Den
2024-12-04 5:07 ` [PATCH net-next v3 1/7] virtio_net: correct netdev_tx_reset_queue() invocation point Koichiro Den
2024-12-05 7:30 ` Jason Wang
2024-12-05 10:33 ` Michael S. Tsirkin
2024-12-05 12:43 ` Koichiro Den
2024-12-05 13:16 ` Koichiro Den
2024-12-05 15:14 ` Michael S. Tsirkin [this message]
2024-12-05 15:17 ` Michael S. Tsirkin
2024-12-06 0:27 ` Koichiro Den
2024-12-04 5:07 ` [PATCH net-next v3 2/7] virtio_net: replace vq2rxq with vq2txq where appropriate Koichiro Den
2024-12-04 5:09 ` kernel test robot
2024-12-05 7:30 ` Jason Wang
2024-12-05 10:34 ` Michael S. Tsirkin
2024-12-04 5:07 ` [PATCH net-next v3 3/7] virtio_net: introduce virtnet_sq_free_unused_buf_done() Koichiro Den
2024-12-05 7:31 ` Jason Wang
2024-12-05 10:40 ` Michael S. Tsirkin
2024-12-05 12:53 ` Koichiro Den
2024-12-04 5:07 ` [PATCH net-next v3 4/7] virtio_ring: add a func argument 'recycle_done' to virtqueue_resize() Koichiro Den
2024-12-05 7:31 ` Jason Wang
2024-12-04 5:07 ` [PATCH net-next v3 5/7] virtio_net: ensure netdev_tx_reset_queue is called on tx ring resize Koichiro Den
2024-12-05 7:32 ` Jason Wang
2024-12-04 5:07 ` [PATCH net-next v3 6/7] virtio_ring: add a func argument 'recycle_done' to virtqueue_reset() Koichiro Den
2024-12-05 7:32 ` Jason Wang
2024-12-04 5:07 ` [PATCH net-next v3 7/7] virtio_net: ensure netdev_tx_reset_queue is called on bind xsk for tx Koichiro Den
2024-12-05 7:33 ` Jason Wang
2024-12-05 10:41 ` [PATCH net-next v3 0/7] virtio_net: correct netdev_tx_reset_queue() invocation points Michael S. Tsirkin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241205101351-mutt-send-email-mst@kernel.org \
--to=mst@redhat.com \
--cc=andrew+netdev@lunn.ch \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=eperezma@redhat.com \
--cc=jasowang@redhat.com \
--cc=jiri@resnulli.us \
--cc=koichiro.den@canonical.com \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=stable@vger.kernel.org \
--cc=virtualization@lists.linux.dev \
--cc=xuanzhuo@linux.alibaba.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.