* [PATCH net] virtio_net: adjust the execution order of function `virtnet_close` during freeze [not found] <CGME20250812090805epcas5p25ba25ca1c5c68bbdf9df27d95ccae4f5@epcas5p2.samsung.com> @ 2025-08-12 9:08 ` Junnan Wu 2025-08-14 0:23 ` Jakub Kicinski 0 siblings, 1 reply; 16+ messages in thread From: Junnan Wu @ 2025-08-12 9:08 UTC (permalink / raw) To: mst, jasowang, andrew+netdev, davem, edumazet, kuba, pabeni Cc: xuanzhuo, eperezma, virtualization, netdev, linux-kernel, ying123.xu, lei19.wang, q1.huang, Junnan Wu "Use after free" issue appears in suspend once race occurs when napi poll scheduls after `netif_device_detach` and before napi disables. For details, during suspend flow of virtio-net, the tx queue state is set to "__QUEUE_STATE_DRV_XOFF" by CPU-A. And at some coincidental times, if a TCP connection is still working, CPU-B does `virtnet_poll` before napi disable. In this flow, the state "__QUEUE_STATE_DRV_XOFF" of tx queue will be cleared. This is not the normal process it expects. After that, CPU-A continues to close driver then virtqueue is removed. Sequence likes below: -------------------------------------------------------------------------- CPU-A CPU-B ----- ----- suspend is called A TCP based on virtio-net still work virtnet_freeze |- virtnet_freeze_down | |- netif_device_detach | | |- netif_tx_stop_all_queues | | |- netif_tx_stop_queue | | |- set_bit | | (__QUEUE_STATE_DRV_XOFF,...) | | softirq rasied | | |- net_rx_action | | |- napi_poll | | |- virtnet_poll | | |- virtnet_poll_cleantx | | |- netif_tx_wake_queue | | |- test_and_clear_bit | | (__QUEUE_STATE_DRV_XOFF,...) | |- virtnet_close | |- virtnet_disable_queue_pair | |- virtnet_napi_tx_disable |- remove_vq_common -------------------------------------------------------------------------- When TCP delayack timer is up, a cpu gets softirq and irq handler `tcp_delack_timer_handler` will be called, which will finally call `start_xmit` in virtio net driver. Then the access to tx virtq will cause panic. The root cause of this issue is that napi tx is not disable before `netif_tx_stop_queue`, once `virnet_poll` schedules in such coincidental time, the tx queue state will be cleared. To solve this issue, adjusts the order of function `virtnet_close` in `virtnet_freeze_down`. Co-developed-by: Ying Xu <ying123.xu@samsung.com> Signed-off-by: Ying Xu <ying123.xu@samsung.com> Signed-off-by: Junnan Wu <junnan01.wu@samsung.com> --- drivers/net/virtio_net.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index d14e6d602273..975bdc5dab84 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -5758,14 +5758,15 @@ static void virtnet_freeze_down(struct virtio_device *vdev) disable_rx_mode_work(vi); flush_work(&vi->rx_mode_work); - netif_tx_lock_bh(vi->dev); - netif_device_detach(vi->dev); - netif_tx_unlock_bh(vi->dev); if (netif_running(vi->dev)) { rtnl_lock(); virtnet_close(vi->dev); rtnl_unlock(); } + + netif_tx_lock_bh(vi->dev); + netif_device_detach(vi->dev); + netif_tx_unlock_bh(vi->dev); } static int init_vqs(struct virtnet_info *vi); -- 2.34.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH net] virtio_net: adjust the execution order of function `virtnet_close` during freeze 2025-08-12 9:08 ` [PATCH net] virtio_net: adjust the execution order of function `virtnet_close` during freeze Junnan Wu @ 2025-08-14 0:23 ` Jakub Kicinski [not found] ` <CGME20250814023554epcas5p4b3dcab50835e2da4749be1be135def20@epcas5p4.samsung.com> 0 siblings, 1 reply; 16+ messages in thread From: Jakub Kicinski @ 2025-08-14 0:23 UTC (permalink / raw) To: Junnan Wu Cc: mst, jasowang, andrew+netdev, davem, edumazet, pabeni, xuanzhuo, eperezma, virtualization, netdev, linux-kernel, ying123.xu, lei19.wang, q1.huang On Tue, 12 Aug 2025 17:08:17 +0800 Junnan Wu wrote: > "Use after free" issue appears in suspend once race occurs when > napi poll scheduls after `netif_device_detach` and before napi disables. Sounds like a fix people may want to backport. Could you repost with an appropriate Fixes tag added, pointing to the earliest commit where the problem can be observed? -- pw-bot: cr ^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <CGME20250814023554epcas5p4b3dcab50835e2da4749be1be135def20@epcas5p4.samsung.com>]
* Re: [PATCH net] virtio_net: adjust the execution order of function `virtnet_close` during freeze [not found] ` <CGME20250814023554epcas5p4b3dcab50835e2da4749be1be135def20@epcas5p4.samsung.com> @ 2025-08-14 2:36 ` Junnan Wu 2025-08-14 4:01 ` Jason Wang 0 siblings, 1 reply; 16+ messages in thread From: Junnan Wu @ 2025-08-14 2:36 UTC (permalink / raw) To: kuba Cc: andrew+netdev, davem, edumazet, eperezma, jasowang, junnan01.wu, lei19.wang, linux-kernel, mst, netdev, pabeni, q1.huang, virtualization, xuanzhuo, ying123.xu On Wed, 13 Aug 2025 17:23:07 -0700 Jakub Kicinski wrote: > Sounds like a fix people may want to backport. Could you repost with > an appropriate Fixes tag added, pointing to the earliest commit where > the problem can be observed? This issue is caused by commit "7b0411ef4aa69c9256d6a2c289d0a2b320414633" After this patch, during `virtnet_poll`, function `virtnet_poll_cleantx` will be invoked, which will wakeup tx queue and clear queue state. If you agree with it, I will repost with this Fixes tag later. Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi") ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net] virtio_net: adjust the execution order of function `virtnet_close` during freeze 2025-08-14 2:36 ` Junnan Wu @ 2025-08-14 4:01 ` Jason Wang [not found] ` <CGME20250814054321epcas5p1dd83614241a15c78645e7f08d5e959c3@epcas5p1.samsung.com> 0 siblings, 1 reply; 16+ messages in thread From: Jason Wang @ 2025-08-14 4:01 UTC (permalink / raw) To: Junnan Wu Cc: kuba, andrew+netdev, davem, edumazet, eperezma, lei19.wang, linux-kernel, mst, netdev, pabeni, q1.huang, virtualization, xuanzhuo, ying123.xu On Thu, Aug 14, 2025 at 10:36 AM Junnan Wu <junnan01.wu@samsung.com> wrote: > > On Wed, 13 Aug 2025 17:23:07 -0700 Jakub Kicinski wrote: > > Sounds like a fix people may want to backport. Could you repost with > > an appropriate Fixes tag added, pointing to the earliest commit where > > the problem can be observed? > > This issue is caused by commit "7b0411ef4aa69c9256d6a2c289d0a2b320414633" > After this patch, during `virtnet_poll`, function `virtnet_poll_cleantx` > will be invoked, which will wakeup tx queue and clear queue state. > If you agree with it, I will repost with this Fixes tag later. > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi") Could you please explain why it is specific to RX NAPI but not TX? Thanks ^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <CGME20250814054321epcas5p1dd83614241a15c78645e7f08d5e959c3@epcas5p1.samsung.com>]
* Re: [PATCH net] virtio_net: adjust the execution order of function `virtnet_close` during freeze [not found] ` <CGME20250814054321epcas5p1dd83614241a15c78645e7f08d5e959c3@epcas5p1.samsung.com> @ 2025-08-14 5:43 ` Junnan Wu 2025-08-14 6:49 ` Jason Wang 2025-08-15 1:07 ` Jason Wang 0 siblings, 2 replies; 16+ messages in thread From: Junnan Wu @ 2025-08-14 5:43 UTC (permalink / raw) To: jasowang Cc: andrew+netdev, davem, edumazet, eperezma, junnan01.wu, kuba, lei19.wang, linux-kernel, mst, netdev, pabeni, q1.huang, virtualization, xuanzhuo, ying123.xu On Thu, 14 Aug 2025 12:01:18 +0800 Jason Wang wrote: > On Thu, Aug 14, 2025 at 10:36 AM Junnan Wu <junnan01.wu@samsung.com> wrote: > > > > On Wed, 13 Aug 2025 17:23:07 -0700 Jakub Kicinski wrote: > > > Sounds like a fix people may want to backport. Could you repost with > > > an appropriate Fixes tag added, pointing to the earliest commit where > > > the problem can be observed? > > > > This issue is caused by commit "7b0411ef4aa69c9256d6a2c289d0a2b320414633" > > After this patch, during `virtnet_poll`, function `virtnet_poll_cleantx` > > will be invoked, which will wakeup tx queue and clear queue state. > > If you agree with it, I will repost with this Fixes tag later. > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi") > > Could you please explain why it is specific to RX NAPI but not TX? > > Thanks This issue appears in suspend flow, if a TCP connection in host VM is still sending packet before driver suspend is completed, it will tigger RX napi schedule, Finally "use after free" happens when tcp ack timer is up. And in suspend flow, the action to send packet is already stopped in guest VM, therefore TX napi will not be scheduled. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net] virtio_net: adjust the execution order of function `virtnet_close` during freeze 2025-08-14 5:43 ` Junnan Wu @ 2025-08-14 6:49 ` Jason Wang [not found] ` <CGME20250814070809epcas5p46bc5a5fe62fd5bedc8a858ae4d28a92a@epcas5p4.samsung.com> 2025-08-15 1:07 ` Jason Wang 1 sibling, 1 reply; 16+ messages in thread From: Jason Wang @ 2025-08-14 6:49 UTC (permalink / raw) To: Junnan Wu Cc: andrew+netdev, davem, edumazet, eperezma, kuba, lei19.wang, linux-kernel, mst, netdev, pabeni, q1.huang, virtualization, xuanzhuo, ying123.xu On Thu, Aug 14, 2025 at 2:44 PM Junnan Wu <junnan01.wu@samsung.com> wrote: > > On Thu, 14 Aug 2025 12:01:18 +0800 Jason Wang wrote: > > On Thu, Aug 14, 2025 at 10:36 AM Junnan Wu <junnan01.wu@samsung.com> wrote: > > > > > > On Wed, 13 Aug 2025 17:23:07 -0700 Jakub Kicinski wrote: > > > > Sounds like a fix people may want to backport. Could you repost with > > > > an appropriate Fixes tag added, pointing to the earliest commit where > > > > the problem can be observed? > > > > > > This issue is caused by commit "7b0411ef4aa69c9256d6a2c289d0a2b320414633" > > > After this patch, during `virtnet_poll`, function `virtnet_poll_cleantx` > > > will be invoked, which will wakeup tx queue and clear queue state. > > > If you agree with it, I will repost with this Fixes tag later. > > > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi") > > > > Could you please explain why it is specific to RX NAPI but not TX? > > > > Thanks > > This issue appears in suspend flow, if a TCP connection in host VM is still > sending packet before driver suspend is completed, it will tigger RX napi schedule, > Finally "use after free" happens when tcp ack timer is up. > > And in suspend flow, the action to send packet is already stopped in guest VM, The TX interrupt and NAPI is not disabled yet. Or anything I miss here? > therefore TX napi will not be scheduled. > Thanks ^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <CGME20250814070809epcas5p46bc5a5fe62fd5bedc8a858ae4d28a92a@epcas5p4.samsung.com>]
* Re: [PATCH net] virtio_net: adjust the execution order of function `virtnet_close` during freeze [not found] ` <CGME20250814070809epcas5p46bc5a5fe62fd5bedc8a858ae4d28a92a@epcas5p4.samsung.com> @ 2025-08-14 7:08 ` Junnan Wu 0 siblings, 0 replies; 16+ messages in thread From: Junnan Wu @ 2025-08-14 7:08 UTC (permalink / raw) To: jasowang Cc: andrew+netdev, davem, edumazet, eperezma, junnan01.wu, kuba, lei19.wang, linux-kernel, mst, netdev, pabeni, q1.huang, virtualization, xuanzhuo, ying123.xu On Thu, 14 Aug 2025 14:49:06 +0800 Jason Wang wrote: > On Thu, Aug 14, 2025 at 2:44 PM Junnan Wu <junnan01.wu@samsung.com> wrote: > > > > On Thu, 14 Aug 2025 12:01:18 +0800 Jason Wang wrote: > > > On Thu, Aug 14, 2025 at 10:36 AM Junnan Wu <junnan01.wu@samsung.com> wrote: > > > > > > > > On Wed, 13 Aug 2025 17:23:07 -0700 Jakub Kicinski wrote: > > > > > Sounds like a fix people may want to backport. Could you repost with > > > > > an appropriate Fixes tag added, pointing to the earliest commit where > > > > > the problem can be observed? > > > > > > > > This issue is caused by commit "7b0411ef4aa69c9256d6a2c289d0a2b320414633" > > > > After this patch, during `virtnet_poll`, function `virtnet_poll_cleantx` > > > > will be invoked, which will wakeup tx queue and clear queue state. > > > > If you agree with it, I will repost with this Fixes tag later. > > > > > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi") > > > > > > Could you please explain why it is specific to RX NAPI but not TX? > > > > > > Thanks > > > > This issue appears in suspend flow, if a TCP connection in host VM is still > > sending packet before driver suspend is completed, it will tigger RX napi schedule, > > Finally "use after free" happens when tcp ack timer is up. > > > > And in suspend flow, the action to send packet is already stopped in guest VM, > > The TX interrupt and NAPI is not disabled yet. Or anything I miss here? When system suspends, the userspace progress which based on virtio_net will be freezed firstly, and then driver suspend callback executes. so though TX interrupt and NAPI is not disabled at that time, it will also not be scheduled. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net] virtio_net: adjust the execution order of function `virtnet_close` during freeze 2025-08-14 5:43 ` Junnan Wu 2025-08-14 6:49 ` Jason Wang @ 2025-08-15 1:07 ` Jason Wang [not found] ` <CGME20250815022257epcas5p3de66f4e633a87e17275c0b16b888bb4e@epcas5p3.samsung.com> 1 sibling, 1 reply; 16+ messages in thread From: Jason Wang @ 2025-08-15 1:07 UTC (permalink / raw) To: Junnan Wu Cc: andrew+netdev, davem, edumazet, eperezma, kuba, lei19.wang, linux-kernel, mst, netdev, pabeni, q1.huang, virtualization, xuanzhuo, ying123.xu On Thu, Aug 14, 2025 at 2:44 PM Junnan Wu <junnan01.wu@samsung.com> wrote: > > On Thu, 14 Aug 2025 12:01:18 +0800 Jason Wang wrote: > > On Thu, Aug 14, 2025 at 10:36 AM Junnan Wu <junnan01.wu@samsung.com> wrote: > > > > > > On Wed, 13 Aug 2025 17:23:07 -0700 Jakub Kicinski wrote: > > > > Sounds like a fix people may want to backport. Could you repost with > > > > an appropriate Fixes tag added, pointing to the earliest commit where > > > > the problem can be observed? > > > > > > This issue is caused by commit "7b0411ef4aa69c9256d6a2c289d0a2b320414633" > > > After this patch, during `virtnet_poll`, function `virtnet_poll_cleantx` > > > will be invoked, which will wakeup tx queue and clear queue state. > > > If you agree with it, I will repost with this Fixes tag later. > > > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi") > > > > Could you please explain why it is specific to RX NAPI but not TX? > > > > Thanks > > This issue appears in suspend flow, if a TCP connection in host VM is still > sending packet before driver suspend is completed, it will tigger RX napi schedule, > Finally "use after free" happens when tcp ack timer is up. > > And in suspend flow, the action to send packet is already stopped in guest VM, > therefore TX napi will not be scheduled. I basically mean who guarantees the TX NAPI is not scheduled? Thanks > ^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <CGME20250815022257epcas5p3de66f4e633a87e17275c0b16b888bb4e@epcas5p3.samsung.com>]
* Re: [PATCH net] virtio_net: adjust the execution order of function `virtnet_close` during freeze [not found] ` <CGME20250815022257epcas5p3de66f4e633a87e17275c0b16b888bb4e@epcas5p3.samsung.com> @ 2025-08-15 2:23 ` Junnan Wu 2025-08-15 5:38 ` Jason Wang 0 siblings, 1 reply; 16+ messages in thread From: Junnan Wu @ 2025-08-15 2:23 UTC (permalink / raw) To: jasowang Cc: andrew+netdev, davem, edumazet, eperezma, junnan01.wu, kuba, lei19.wang, linux-kernel, mst, netdev, pabeni, q1.huang, virtualization, xuanzhuo, ying123.xu Sorry, I basically mean that the tx napi which caused by userspace will not be scheduled during suspend, others can not be guaranteed, such as unfinished packets already in tx vq etc. But after this patch, once `virtnet_close` completes, both tx and rq napi will be disabled which guarantee their napi will not be scheduled in future. And the tx state will be set to "__QUEUE_STATE_DRV_XOFF" correctly in `netif_device_detach`. Thanks. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net] virtio_net: adjust the execution order of function `virtnet_close` during freeze 2025-08-15 2:23 ` Junnan Wu @ 2025-08-15 5:38 ` Jason Wang [not found] ` <CGME20250815060604epcas5p3a6856bcd64ee4ed80abb43b09aab8a42@epcas5p3.samsung.com> 0 siblings, 1 reply; 16+ messages in thread From: Jason Wang @ 2025-08-15 5:38 UTC (permalink / raw) To: Junnan Wu Cc: andrew+netdev, davem, edumazet, eperezma, kuba, lei19.wang, linux-kernel, mst, netdev, pabeni, q1.huang, virtualization, xuanzhuo, ying123.xu On Fri, Aug 15, 2025 at 10:24 AM Junnan Wu <junnan01.wu@samsung.com> wrote: > > Sorry, I basically mean that the tx napi which caused by userspace will not be scheduled during suspend, > others can not be guaranteed, such as unfinished packets already in tx vq etc. > > But after this patch, once `virtnet_close` completes, > both tx and rq napi will be disabled which guarantee their napi will not be scheduled in future. > And the tx state will be set to "__QUEUE_STATE_DRV_XOFF" correctly in `netif_device_detach`. Ok, so the commit mentioned by fix tag is incorrect. Thanks > > Thanks. > ^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <CGME20250815060604epcas5p3a6856bcd64ee4ed80abb43b09aab8a42@epcas5p3.samsung.com>]
* Re: [PATCH net] virtio_net: adjust the execution order of function `virtnet_close` during freeze [not found] ` <CGME20250815060604epcas5p3a6856bcd64ee4ed80abb43b09aab8a42@epcas5p3.samsung.com> @ 2025-08-15 6:06 ` Junnan Wu 2025-08-15 15:55 ` Jakub Kicinski 0 siblings, 1 reply; 16+ messages in thread From: Junnan Wu @ 2025-08-15 6:06 UTC (permalink / raw) To: jasowang Cc: andrew+netdev, davem, edumazet, eperezma, junnan01.wu, kuba, lei19.wang, linux-kernel, mst, netdev, pabeni, q1.huang, virtualization, xuanzhuo, ying123.xu On Fri, 15 Aug 2025 13:38:21 +0800 Jason Wang <jasowang@redhat.com> wrote > On Fri, Aug 15, 2025 at 10:24 AM Junnan Wu <junnan01.wu@samsung.com> wrote: > > > > Sorry, I basically mean that the tx napi which caused by userspace will not be scheduled during suspend, > > others can not be guaranteed, such as unfinished packets already in tx vq etc. > > > > But after this patch, once `virtnet_close` completes, > > both tx and rq napi will be disabled which guarantee their napi will not be scheduled in future. > > And the tx state will be set to "__QUEUE_STATE_DRV_XOFF" correctly in `netif_device_detach`. > > Ok, so the commit mentioned by fix tag is incorrect. Yes, you are right. The commit of this fix tag is the first commit I found which add function `virtnet_poll_cleantx`. Actually, we are not sure whether this issue appears after this commit. In our side, this issue is found by chance in version 5.15. It's hard to find the key commit which cause this issue for reason that the reproduction of this scenario is too complex. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net] virtio_net: adjust the execution order of function `virtnet_close` during freeze 2025-08-15 6:06 ` Junnan Wu @ 2025-08-15 15:55 ` Jakub Kicinski [not found] ` <CGME20250818011515epcas5p21295745d0e831fd988706877d598f913@epcas5p2.samsung.com> 0 siblings, 1 reply; 16+ messages in thread From: Jakub Kicinski @ 2025-08-15 15:55 UTC (permalink / raw) To: Junnan Wu Cc: jasowang, andrew+netdev, davem, edumazet, eperezma, lei19.wang, linux-kernel, mst, netdev, pabeni, q1.huang, virtualization, xuanzhuo, ying123.xu On Fri, 15 Aug 2025 14:06:15 +0800 Junnan Wu wrote: > On Fri, 15 Aug 2025 13:38:21 +0800 Jason Wang <jasowang@redhat.com> wrote > > On Fri, Aug 15, 2025 at 10:24 AM Junnan Wu <junnan01.wu@samsung.com> wrote: > > > Sorry, I basically mean that the tx napi which caused by > > > userspace will not be scheduled during suspend, others can not be > > > guaranteed, such as unfinished packets already in tx vq etc. > > > > > > But after this patch, once `virtnet_close` completes, > > > both tx and rq napi will be disabled which guarantee their napi > > > will not be scheduled in future. And the tx state will be set to > > > "__QUEUE_STATE_DRV_XOFF" correctly in `netif_device_detach`. > > > > Ok, so the commit mentioned by fix tag is incorrect. > > Yes, you are right. The commit of this fix tag is the first commit I > found which add function `virtnet_poll_cleantx`. Actually, we are not > sure whether this issue appears after this commit. > > In our side, this issue is found by chance in version 5.15. > > It's hard to find the key commit which cause this issue > for reason that the reproduction of this scenario is too complex. I think the problem needs to be more clearly understood, and then it will be easier to find the fixes tag. At the face of it the patch makes it look like close() doesn't reliably stop the device, which is highly odd. ^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <CGME20250818011515epcas5p21295745d0e831fd988706877d598f913@epcas5p2.samsung.com>]
* Re: [PATCH net] virtio_net: adjust the execution order of function `virtnet_close` during freeze [not found] ` <CGME20250818011515epcas5p21295745d0e831fd988706877d598f913@epcas5p2.samsung.com> @ 2025-08-18 1:15 ` Junnan Wu 2025-08-18 15:39 ` Jakub Kicinski 0 siblings, 1 reply; 16+ messages in thread From: Junnan Wu @ 2025-08-18 1:15 UTC (permalink / raw) To: kuba Cc: andrew+netdev, davem, edumazet, eperezma, jasowang, junnan01.wu, lei19.wang, linux-kernel, mst, netdev, pabeni, q1.huang, virtualization, xuanzhuo, ying123.xu On Fri, 15 Aug 2025 08:55:03 -0700 Jakub Kicinski wrote > On Fri, 15 Aug 2025 14:06:15 +0800 Junnan Wu wrote: > > On Fri, 15 Aug 2025 13:38:21 +0800 Jason Wang <jasowang@redhat.com> wrote > > > On Fri, Aug 15, 2025 at 10:24 AM Junnan Wu <junnan01.wu@samsung.com> wrote: > > > > Sorry, I basically mean that the tx napi which caused by > > > > userspace will not be scheduled during suspend, others can not be > > > > guaranteed, such as unfinished packets already in tx vq etc. > > > > > > > > But after this patch, once `virtnet_close` completes, > > > > both tx and rq napi will be disabled which guarantee their napi > > > > will not be scheduled in future. And the tx state will be set to > > > > "__QUEUE_STATE_DRV_XOFF" correctly in `netif_device_detach`. > > > > > > Ok, so the commit mentioned by fix tag is incorrect. > > > > Yes, you are right. The commit of this fix tag is the first commit I > > found which add function `virtnet_poll_cleantx`. Actually, we are not > > sure whether this issue appears after this commit. > > > > In our side, this issue is found by chance in version 5.15. > > > > It's hard to find the key commit which cause this issue > > for reason that the reproduction of this scenario is too complex. > > I think the problem needs to be more clearly understood, and then it > will be easier to find the fixes tag. At the face of it the patch > makes it look like close() doesn't reliably stop the device, which > is highly odd. Yes, you are right. It is really strange that `close()` acts like that, because current order has worked for long time. But panic call stack in our env shows that the function `virtnet_close` and `netif_device_detach` should have a correct execution order. And it needs more time to find the fixes tag. I wonder that is it must have fixes tag to merge? By the way, you mentioned that "the problem need to be more clearly understood", did you mean the descriptions and sequences in commit message are not easy to understand? Do you have some suggestions about this? Thanks! ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net] virtio_net: adjust the execution order of function `virtnet_close` during freeze 2025-08-18 1:15 ` Junnan Wu @ 2025-08-18 15:39 ` Jakub Kicinski 2025-08-19 2:48 ` Jason Wang 0 siblings, 1 reply; 16+ messages in thread From: Jakub Kicinski @ 2025-08-18 15:39 UTC (permalink / raw) To: Junnan Wu Cc: andrew+netdev, davem, edumazet, eperezma, jasowang, lei19.wang, linux-kernel, mst, netdev, pabeni, q1.huang, virtualization, xuanzhuo, ying123.xu On Mon, 18 Aug 2025 09:15:22 +0800 Junnan Wu wrote: > > > Yes, you are right. The commit of this fix tag is the first commit I > > > found which add function `virtnet_poll_cleantx`. Actually, we are not > > > sure whether this issue appears after this commit. > > > > > > In our side, this issue is found by chance in version 5.15. > > > > > > It's hard to find the key commit which cause this issue > > > for reason that the reproduction of this scenario is too complex. > > > > I think the problem needs to be more clearly understood, and then it > > will be easier to find the fixes tag. At the face of it the patch > > makes it look like close() doesn't reliably stop the device, which > > is highly odd. > > Yes, you are right. It is really strange that `close()` acts like > that, because current order has worked for long time. But panic call > stack in our env shows that the function `virtnet_close` and > `netif_device_detach` should have a correct execution order. And it > needs more time to find the fixes tag. I wonder that is it must have > fixes tag to merge? > > By the way, you mentioned that "the problem need to be more clearly > understood", did you mean the descriptions and sequences in commit > message are not easy to understand? Do you have some suggestions > about this? Perhaps Jason gets your explanation and will correct me, but to me it seems like the fix is based on trial and error rather than clear understanding of the problem. If you understood the problem clearly you should be able to find the Fixes tag without a problem.. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net] virtio_net: adjust the execution order of function `virtnet_close` during freeze 2025-08-18 15:39 ` Jakub Kicinski @ 2025-08-19 2:48 ` Jason Wang [not found] ` <CGME20250819033318epcas5p3325d9c2481db3e40d776197d13f09d5a@epcas5p3.samsung.com> 0 siblings, 1 reply; 16+ messages in thread From: Jason Wang @ 2025-08-19 2:48 UTC (permalink / raw) To: Jakub Kicinski Cc: Junnan Wu, andrew+netdev, davem, edumazet, eperezma, lei19.wang, linux-kernel, mst, netdev, pabeni, q1.huang, virtualization, xuanzhuo, ying123.xu On Mon, Aug 18, 2025 at 11:39 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Mon, 18 Aug 2025 09:15:22 +0800 Junnan Wu wrote: > > > > Yes, you are right. The commit of this fix tag is the first commit I > > > > found which add function `virtnet_poll_cleantx`. Actually, we are not > > > > sure whether this issue appears after this commit. > > > > > > > > In our side, this issue is found by chance in version 5.15. > > > > > > > > It's hard to find the key commit which cause this issue > > > > for reason that the reproduction of this scenario is too complex. > > > > > > I think the problem needs to be more clearly understood, and then it > > > will be easier to find the fixes tag. At the face of it the patch > > > makes it look like close() doesn't reliably stop the device, which > > > is highly odd. > > > > Yes, you are right. It is really strange that `close()` acts like > > that, because current order has worked for long time. But panic call > > stack in our env shows that the function `virtnet_close` and > > `netif_device_detach` should have a correct execution order. And it > > needs more time to find the fixes tag. I wonder that is it must have > > fixes tag to merge? > > > > By the way, you mentioned that "the problem need to be more clearly > > understood", did you mean the descriptions and sequences in commit > > message are not easy to understand? Do you have some suggestions > > about this? > > Perhaps Jason gets your explanation and will correct me, but to me it > seems like the fix is based on trial and error rather than clear > understanding of the problem. If you understood the problem clearly > you should be able to find the Fixes tag without a problem.. > +1 The code looks fine but the fixes tag needs to be correct. Thanks ^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <CGME20250819033318epcas5p3325d9c2481db3e40d776197d13f09d5a@epcas5p3.samsung.com>]
* Re: [PATCH net] virtio_net: adjust the execution order of function `virtnet_close` during freeze [not found] ` <CGME20250819033318epcas5p3325d9c2481db3e40d776197d13f09d5a@epcas5p3.samsung.com> @ 2025-08-19 3:33 ` Junnan Wu 0 siblings, 0 replies; 16+ messages in thread From: Junnan Wu @ 2025-08-19 3:33 UTC (permalink / raw) To: jasowang Cc: andrew+netdev, davem, edumazet, eperezma, junnan01.wu, kuba, lei19.wang, linux-kernel, mst, netdev, pabeni, q1.huang, virtualization, xuanzhuo, ying123.xu On Tue, 19 Aug 2025 10:48:37 +0800 Jason Wang wrote: > On Mon, Aug 18, 2025 at 11:39 PM Jakub Kicinski <kuba@kernel.org> wrote: > > > > On Mon, 18 Aug 2025 09:15:22 +0800 Junnan Wu wrote: > > > > > Yes, you are right. The commit of this fix tag is the first commit I > > > > > found which add function `virtnet_poll_cleantx`. Actually, we are not > > > > > sure whether this issue appears after this commit. > > > > > > > > > > In our side, this issue is found by chance in version 5.15. > > > > > > > > > > It's hard to find the key commit which cause this issue > > > > > for reason that the reproduction of this scenario is too complex. > > > > > > > > I think the problem needs to be more clearly understood, and then it > > > > will be easier to find the fixes tag. At the face of it the patch > > > > makes it look like close() doesn't reliably stop the device, which > > > > is highly odd. > > > > > > Yes, you are right. It is really strange that `close()` acts like > > > that, because current order has worked for long time. But panic call > > > stack in our env shows that the function `virtnet_close` and > > > `netif_device_detach` should have a correct execution order. And it > > > needs more time to find the fixes tag. I wonder that is it must have > > > fixes tag to merge? > > > > > > By the way, you mentioned that "the problem need to be more clearly > > > understood", did you mean the descriptions and sequences in commit > > > message are not easy to understand? Do you have some suggestions > > > about this? > > > > Perhaps Jason gets your explanation and will correct me, but to me it > > seems like the fix is based on trial and error rather than clear > > understanding of the problem. If you understood the problem clearly > > you should be able to find the Fixes tag without a problem.. > > > > +1 > > The code looks fine but the fixes tag needs to be correct. > > Thanks Well, I will do some works to find out the fixes tag. Once there's progress, I will let you know. ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2025-08-19 3:34 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <CGME20250812090805epcas5p25ba25ca1c5c68bbdf9df27d95ccae4f5@epcas5p2.samsung.com> 2025-08-12 9:08 ` [PATCH net] virtio_net: adjust the execution order of function `virtnet_close` during freeze Junnan Wu 2025-08-14 0:23 ` Jakub Kicinski [not found] ` <CGME20250814023554epcas5p4b3dcab50835e2da4749be1be135def20@epcas5p4.samsung.com> 2025-08-14 2:36 ` Junnan Wu 2025-08-14 4:01 ` Jason Wang [not found] ` <CGME20250814054321epcas5p1dd83614241a15c78645e7f08d5e959c3@epcas5p1.samsung.com> 2025-08-14 5:43 ` Junnan Wu 2025-08-14 6:49 ` Jason Wang [not found] ` <CGME20250814070809epcas5p46bc5a5fe62fd5bedc8a858ae4d28a92a@epcas5p4.samsung.com> 2025-08-14 7:08 ` Junnan Wu 2025-08-15 1:07 ` Jason Wang [not found] ` <CGME20250815022257epcas5p3de66f4e633a87e17275c0b16b888bb4e@epcas5p3.samsung.com> 2025-08-15 2:23 ` Junnan Wu 2025-08-15 5:38 ` Jason Wang [not found] ` <CGME20250815060604epcas5p3a6856bcd64ee4ed80abb43b09aab8a42@epcas5p3.samsung.com> 2025-08-15 6:06 ` Junnan Wu 2025-08-15 15:55 ` Jakub Kicinski [not found] ` <CGME20250818011515epcas5p21295745d0e831fd988706877d598f913@epcas5p2.samsung.com> 2025-08-18 1:15 ` Junnan Wu 2025-08-18 15:39 ` Jakub Kicinski 2025-08-19 2:48 ` Jason Wang [not found] ` <CGME20250819033318epcas5p3325d9c2481db3e40d776197d13f09d5a@epcas5p3.samsung.com> 2025-08-19 3:33 ` Junnan Wu
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).