* [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
@ 2024-01-15  1:29 Zhu Yanjun
  2024-01-15  2:20 ` Jason Wang
  2024-01-16 12:04 ` Paolo Abeni
  0 siblings, 2 replies; 29+ messages in thread
From: Zhu Yanjun @ 2024-01-15  1:29 UTC (permalink / raw)
  To: mst, jasowang, xuanzhuo, davem, edumazet, kuba, pabeni,
      virtualization, netdev
  Cc: Zhu Yanjun

From: Zhu Yanjun <yanjun.zhu@linux.dev>

Some devices emulate virtio_net hardware. When the virtio_net driver
sends commands to such emulated hardware, the hardware normally needs
time to respond, and sometimes that time is very long. When it is, the
soft lockup below appears and the whole system hangs.

Similar problems also occur with Intel NICs and Mellanox NICs, so a
similar solution is borrowed from their drivers: a timeout value is
added, set as large as possible, so that the driver waits as long as it
reasonably can for a response from the hardware before giving up.

"
[ 213.795860] watchdog: BUG: soft lockup - CPU#108 stuck for 26s! [(udev-worker):3157]
[ 213.796114] Modules linked in: virtio_net(+) net_failover failover qrtr rfkill sunrpc intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common intel_ifs i10nm_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt rapl intel_pmc_bxt dax_hmem iTCO_vendor_support vfat cxl_acpi intel_cstate pmt_telemetry pmt_class intel_sdsi joydev intel_uncore cxl_core fat pcspkr mei_me isst_if_mbox_pci isst_if_mmio idxd i2c_i801 isst_if_common mei intel_vsec idxd_bus i2c_smbus i2c_ismt ipmi_ssif acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter pfr_telemetry pfr_update fuse loop zram xfs crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 bnxt_en sha256_ssse3 sha1_ssse3 nvme ast nvme_core i2c_algo_bit wmi pinctrl_emmitsburg scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_multipath
[ 213.796194] irq event stamp: 67740
[ 213.796195] hardirqs last enabled at (67739): [<ffffffff8c2015ca>] asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 213.796203] hardirqs last disabled at (67740): [<ffffffff8c14108e>] sysvec_apic_timer_interrupt+0xe/0x90
[ 213.796208] softirqs last enabled at (67686): [<ffffffff8b12115e>] __irq_exit_rcu+0xbe/0xe0
[ 213.796214] softirqs last disabled at (67681): [<ffffffff8b12115e>] __irq_exit_rcu+0xbe/0xe0
[ 213.796217] CPU: 108 PID: 3157 Comm: (udev-worker) Kdump: loaded Not tainted 6.7.0+ #9
[ 213.796220] Hardware name: Intel Corporation M50FCP2SBSTD/M50FCP2SBSTD, BIOS SE5C741.86B.01.01.0001.2211140926 11/14/2022
[ 213.796221] RIP: 0010:virtqueue_get_buf_ctx_split+0x8d/0x110
[ 213.796228] Code: 89 df e8 26 fe ff ff 0f b7 43 50 83 c0 01 66 89 43 50 f6 43 78 01 75 12 80 7b 42 00 48 8b 4b 68 8b 53 58 74 0f 66 87 44 51 04 <48> 89 e8 5b 5d c3 cc cc cc cc 66 89 44 51 04 0f ae f0 48 89 e8 5b
[ 213.796230] RSP: 0018:ff4bbb362306f9b0 EFLAGS: 00000246
[ 213.796233] RAX: 0000000000000000 RBX: ff2f15095896f000 RCX: 0000000000000001
[ 213.796235] RDX: 0000000000000000 RSI: ff4bbb362306f9cc RDI: ff2f15095896f000
[ 213.796236] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 213.796237] R10: 0000000000000003 R11: ff2f15095893cc40 R12: 0000000000000002
[ 213.796239] R13: 0000000000000004 R14: 0000000000000000 R15: ff2f1509534f3000
[ 213.796240] FS:  00007f775847d0c0(0000) GS:ff2f1528bac00000(0000) knlGS:0000000000000000
[ 213.796242] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 213.796243] CR2: 0000557f987b6e70 CR3: 0000002098602006 CR4: 0000000000f71ef0
[ 213.796245] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 213.796246] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 213.796247] PKRU: 55555554
[ 213.796249] Call Trace:
[ 213.796250]  <IRQ>
[ 213.796252]  ? watchdog_timer_fn+0x1c0/0x220
[ 213.796258]  ? __pfx_watchdog_timer_fn+0x10/0x10
[ 213.796261]  ? __hrtimer_run_queues+0x1af/0x380
[ 213.796269]  ? hrtimer_interrupt+0xf8/0x230
[ 213.796274]  ? __sysvec_apic_timer_interrupt+0x64/0x1a0
[ 213.796279]  ? sysvec_apic_timer_interrupt+0x6d/0x90
[ 213.796282]  </IRQ>
[ 213.796284]  <TASK>
[ 213.796285]  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 213.796293]  ? virtqueue_get_buf_ctx_split+0x8d/0x110
[ 213.796297]  virtnet_send_command+0x18a/0x1f0 [virtio_net]
[ 213.796310]  _virtnet_set_queues+0xc6/0x120 [virtio_net]
[ 213.796319]  virtnet_probe+0xa06/0xd50 [virtio_net]
[ 213.796328]  virtio_dev_probe+0x195/0x230
[ 213.796333]  really_probe+0x19f/0x400
[ 213.796338]  ? __pfx___driver_attach+0x10/0x10
[ 213.796340]  __driver_probe_device+0x78/0x160
[ 213.796343]  driver_probe_device+0x1f/0x90
[ 213.796346]  __driver_attach+0xd6/0x1d0
[ 213.796349]  bus_for_each_dev+0x8c/0xe0
[ 213.796355]  bus_add_driver+0x119/0x220
[ 213.796359]  driver_register+0x59/0x100
[ 213.796362]  ? __pfx_virtio_net_driver_init+0x10/0x10 [virtio_net]
[ 213.796369]  virtio_net_driver_init+0x8e/0xff0 [virtio_net]
[ 213.796375]  do_one_initcall+0x6f/0x380
[ 213.796384]  do_init_module+0x60/0x240
[ 213.796388]  init_module_from_file+0x86/0xc0
[ 213.796396]  idempotent_init_module+0x129/0x2c0
[ 213.796406]  __x64_sys_finit_module+0x5e/0xb0
[ 213.796409]  do_syscall_64+0x60/0xe0
[ 213.796415]  ? do_syscall_64+0x6f/0xe0
[ 213.796418]  ? lockdep_hardirqs_on_prepare+0xe4/0x1a0
[ 213.796424]  ? do_syscall_64+0x6f/0xe0
[ 213.796427]  ? do_syscall_64+0x6f/0xe0
[ 213.796431]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 213.796435] RIP: 0033:0x7f7758f279cd
[ 213.796465] Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 33 e4 0c 00 f7 d8 64 89 01 48
[ 213.796467] RSP: 002b:00007ffe2cad8738 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 213.796469] RAX: ffffffffffffffda RBX: 0000557f987a8180 RCX: 00007f7758f279cd
[ 213.796471] RDX: 0000000000000000 RSI: 00007f77593e5453 RDI: 000000000000000f
[ 213.796472] RBP: 00007f77593e5453 R08: 0000000000000000 R09: 00007ffe2cad8860
[ 213.796473] R10: 000000000000000f R11: 0000000000000246 R12: 0000000000020000
[ 213.796475] R13: 0000557f9879f8e0 R14: 0000000000000000 R15: 0000557f98783aa0
[ 213.796482]  </TASK>
"

Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
---
 drivers/net/virtio_net.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 51b1868d2f22..28b7dd917a43 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2468,7 +2468,7 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
 {
 	struct scatterlist *sgs[4], hdr, stat;
 	unsigned out_num = 0, tmp;
-	int ret;
+	int ret, timeout = 200;
 
 	/* Caller should know better */
 	BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ));
@@ -2502,8 +2502,14 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
 	 * into the hypervisor, so the request should be handled immediately.
 	 */
 	while (!virtqueue_get_buf(vi->cvq, &tmp) &&
-	       !virtqueue_is_broken(vi->cvq))
+	       !virtqueue_is_broken(vi->cvq)) {
+		if (timeout)
+			timeout--;
+		else
+			return false; /* long time no response */
+
 		cpu_relax();
+	}
 
 	return vi->ctrl->status == VIRTIO_NET_OK;
 }
-- 
2.42.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-15  1:29 [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang Zhu Yanjun
@ 2024-01-15  2:20 ` Jason Wang
  2024-01-15 10:25   ` Zhu Yanjun
  [not found]       ` <6cf2699a-483d-4124-9782-b6a771a41e70@linux.dev>
  2024-01-16 12:04 ` Paolo Abeni
  1 sibling, 2 replies; 29+ messages in thread
From: Jason Wang @ 2024-01-15  2:20 UTC (permalink / raw)
  To: Zhu Yanjun
  Cc: mst, xuanzhuo, davem, edumazet, kuba, pabeni, virtualization,
      netdev, Zhu Yanjun

On Mon, Jan 15, 2024 at 9:35 AM Zhu Yanjun <yanjun.zhu@intel.com> wrote:
>
> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>
> Some devices emulate virtio_net hardware. When the virtio_net driver
> sends commands to such emulated hardware, the hardware normally needs
> time to respond, and sometimes that time is very long. When it is, the
> soft lockup below appears and the whole system hangs.
>
> Similar problems also occur with Intel NICs and Mellanox NICs, so a
> similar solution is borrowed from their drivers: a timeout value is
> added, set as large as possible, so that the driver waits as long as it
> reasonably can for a response from the hardware before giving up.
>
> [soft lockup trace snipped]
>
> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
> ---
>  drivers/net/virtio_net.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 51b1868d2f22..28b7dd917a43 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2468,7 +2468,7 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
>  {
>         struct scatterlist *sgs[4], hdr, stat;
>         unsigned out_num = 0, tmp;
> -       int ret;
> +       int ret, timeout = 200;

Any reason we choose this value or how can we know it works for all
types of devices?

An easier way is to use cond_resched(), but it may have side effects
as well [1]. Still, it seems less intrusive anyhow than the proposal
here?

Thanks

[1] https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60297.html

>
>         /* Caller should know better */
>         BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ));
> @@ -2502,8 +2502,14 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
>          * into the hypervisor, so the request should be handled immediately.
>          */
>         while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> -              !virtqueue_is_broken(vi->cvq))
> +              !virtqueue_is_broken(vi->cvq)) {
> +               if (timeout)
> +                       timeout--;
> +               else
> +                       return false; /* long time no response */
> +
>                 cpu_relax();
> +       }
>
>         return vi->ctrl->status == VIRTIO_NET_OK;
>  }
> --
> 2.42.0

^ permalink raw reply	[flat|nested] 29+ messages in thread
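For concreteness, the cond_resched() variant Jason points to in [1]
would amount to roughly the sketch below. This is only an illustration
of the idea, not the patch from that link; it keeps the wait unbounded
but yields the CPU so the soft-lockup watchdog no longer fires.
virtnet_send_command() runs in process context with the rtnl lock held,
so sleeping is allowed here.

	/* Sketch only: unbounded wait, but scheduler-friendly.
	 * Replaces cpu_relax() in the busy loop of virtnet_send_command(). */
	while (!virtqueue_get_buf(vi->cvq, &tmp) &&
	       !virtqueue_is_broken(vi->cvq))
		cond_resched();

As Zhu reports below, this trades the soft lockup for an ordinary
infinite wait: the task no longer trips the watchdog, but it still never
returns if the device never answers.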
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-15  2:20 ` Jason Wang
@ 2024-01-15 10:25   ` Zhu Yanjun
  [not found]       ` <6cf2699a-483d-4124-9782-b6a771a41e70@linux.dev>
  1 sibling, 0 replies; 29+ messages in thread
From: Zhu Yanjun @ 2024-01-15 10:25 UTC (permalink / raw)
  To: Jason Wang, Zhu Yanjun
  Cc: mst, xuanzhuo, davem, edumazet, kuba, pabeni, virtualization,
      netdev

On 2024/1/15 10:20, Jason Wang wrote:
> On Mon, Jan 15, 2024 at 9:35 AM Zhu Yanjun <yanjun.zhu@intel.com> wrote:
>> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>>
>> [patch and soft lockup trace snipped]
>>
>> -       int ret;
>> +       int ret, timeout = 200;
> Any reason we choose this value or how can we know it works for all
> types of devices?

Sorry, resending this because the first copy was rejected by netdev.

As I mentioned in the commit log, a similar problem also occurs in the
Intel and Mellanox NIC drivers. This commit borrows the solution from
the Intel NIC driver, so the value also comes from the Intel NIC driver
solution.

> An easier way is to use cond_resched(), but it may have side effects
> as well [1]. Still, it seems less intrusive anyhow than the proposal
> here?

Thanks a lot for your suggestions. I have tested the commits in the
link https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60297.html.

Because the virtio_net driver spins waiting for the hardware's
response, the module cannot be unloaded, and the kernel hang still
occurs when running "ip link" or when unloading the virtio_net module.

Zhu Yanjun

> Thanks
>
> [1] https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60297.html
>
>> [rest of the patch snipped]

^ permalink raw reply	[flat|nested] 29+ messages in thread
[parent not found: <6cf2699a-483d-4124-9782-b6a771a41e70@linux.dev>]
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  [not found]       ` <6cf2699a-483d-4124-9782-b6a771a41e70@linux.dev>
@ 2024-01-22  3:06         ` Jason Wang
  0 siblings, 0 replies; 29+ messages in thread
From: Jason Wang @ 2024-01-22  3:06 UTC (permalink / raw)
  To: Zhu Yanjun
  Cc: Zhu Yanjun, mst, xuanzhuo, davem, edumazet, kuba, pabeni,
      virtualization, netdev

On Mon, Jan 15, 2024 at 6:22 PM Zhu Yanjun <yanjun.zhu@linux.dev> wrote:
>
> On 2024/1/15 10:20, Jason Wang wrote:
> > On Mon, Jan 15, 2024 at 9:35 AM Zhu Yanjun <yanjun.zhu@intel.com> wrote:
> > >
> > > [patch and soft lockup trace snipped]
> > >
> > > -       int ret;
> > > +       int ret, timeout = 200;
> >
> > Any reason we choose this value or how can we know it works for all
> > types of devices?
>
> As I mentioned in the commit log, a similar problem also occurs in the
> Intel and Mellanox NIC drivers.
>
> This commit borrows the solution from the Intel NIC driver, so the
> value also comes from the Intel NIC driver solution.

Right, so basically I meant we need a solution that works for all
vendors.

> > An easier way is to use cond_resched(), but it may have side effects
> > as well [1]. Still, it seems less intrusive anyhow than the proposal
> > here?
>
> Thanks a lot for your suggestions. I have tested the commits in the
> link https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60297.html.
>
> Because the virtio_net driver spins waiting for the hardware's
> response, the module cannot be unloaded, and the kernel hang still
> occurs when running "ip link" or when unloading the virtio_net module.

Well, yes. It aims to solve the lockups as described in this commit
log. It doesn't solve the issue of infinite waiting etc.

Thanks

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-15  1:29 [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang Zhu Yanjun
  2024-01-15  2:20 ` Jason Wang
@ 2024-01-16 12:04 ` Paolo Abeni
  2024-01-18 12:01   ` Zhu Yanjun
  2024-01-18 13:14   ` Michael S. Tsirkin
  1 sibling, 2 replies; 29+ messages in thread
From: Paolo Abeni @ 2024-01-16 12:04 UTC (permalink / raw)
  To: Zhu Yanjun, mst, jasowang, xuanzhuo, davem, edumazet, kuba,
      virtualization, netdev
  Cc: Zhu Yanjun

On Mon, 2024-01-15 at 09:29 +0800, Zhu Yanjun wrote:
> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>
> Some devices emulate virtio_net hardware. When the virtio_net driver
> sends commands to such emulated hardware, the hardware normally needs
> time to respond, and sometimes that time is very long. When it is, the
> soft lockup below appears and the whole system hangs.
>
> [soft lockup trace and first hunk snipped]
>
> @@ -2502,8 +2502,14 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
>          * into the hypervisor, so the request should be handled immediately.
>          */
>         while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> -              !virtqueue_is_broken(vi->cvq))
> +              !virtqueue_is_broken(vi->cvq)) {
> +               if (timeout)
> +                       timeout--;

This is not really a timeout, just a loop counter. 200 iterations could
be a very short time on reasonable H/W. I guess this avoids the soft
lockup, but possibly (likely?) breaks the functionality when we need to
loop for some non-negligible time.

I fear we need a more complex solution, as mentioned by Michael in the
thread you quoted.

Cheers,

Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread
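To make Paolo's distinction concrete: a loop counter bounds iterations,
while a timeout bounds wall-clock time. A time-based bound would look
something like the sketch below. It is illustrative only:
VIRTNET_CVQ_TO_MS is a made-up constant (not something the driver or
device defines), and this variant still busy-waits; it merely guarantees
the loop exits after a fixed amount of real time regardless of CPU
speed.

	/* Sketch: bound the wait by time rather than by iterations.
	 * 200 loop iterations can pass in well under a microsecond on
	 * fast hardware; 200 ms of jiffies cannot. */
	unsigned long end = jiffies + msecs_to_jiffies(VIRTNET_CVQ_TO_MS);

	while (!virtqueue_get_buf(vi->cvq, &tmp) &&
	       !virtqueue_is_broken(vi->cvq)) {
		if (time_after(jiffies, end))
			return false; /* device never answered */
		cpu_relax();
	}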
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-16 12:04 ` Paolo Abeni
@ 2024-01-18 12:01   ` Zhu Yanjun
  2024-01-19 14:27     ` Heng Qi
  1 sibling, 1 reply; 29+ messages in thread
From: Zhu Yanjun @ 2024-01-18 12:01 UTC (permalink / raw)
  To: Paolo Abeni, Zhu Yanjun, mst, jasowang, xuanzhuo, davem,
      edumazet, kuba, virtualization, netdev

On 2024/1/16 20:04, Paolo Abeni wrote:
> On Mon, 2024-01-15 at 09:29 +0800, Zhu Yanjun wrote:
>> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>>
>> [patch and soft lockup trace snipped]
>>
>>         while (!virtqueue_get_buf(vi->cvq, &tmp) &&
>> -              !virtqueue_is_broken(vi->cvq))
>> +              !virtqueue_is_broken(vi->cvq)) {
>> +               if (timeout)
>> +                       timeout--;
> This is not really a timeout, just a loop counter. 200 iterations could
> be a very short time on reasonable H/W. I guess this avoids the soft
> lockup, but possibly (likely?) breaks the functionality when we need to
> loop for some non-negligible time.
>
> I fear we need a more complex solution, as mentioned by Michael in the
> thread you quoted.

Got it. I also look forward to a more complex solution to this problem.

Zhu Yanjun

> Cheers,
>
> Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-18 12:01   ` Zhu Yanjun
@ 2024-01-19 14:27     ` Heng Qi
  2024-01-19 17:29       ` Andrew Lunn
  2024-01-22  3:08       ` Jason Wang
  0 siblings, 2 replies; 29+ messages in thread
From: Heng Qi @ 2024-01-19 14:27 UTC (permalink / raw)
  To: Zhu Yanjun, Paolo Abeni, Zhu Yanjun, mst, jasowang, xuanzhuo,
      davem, edumazet, kuba, virtualization, netdev

On 2024/1/18 20:01, Zhu Yanjun wrote:
> On 2024/1/16 20:04, Paolo Abeni wrote:
>> On Mon, 2024-01-15 at 09:29 +0800, Zhu Yanjun wrote:
>>>
>>> [patch and soft lockup trace snipped]
>>>
>>>         while (!virtqueue_get_buf(vi->cvq, &tmp) &&
>>> -              !virtqueue_is_broken(vi->cvq))
>>> +              !virtqueue_is_broken(vi->cvq)) {
>>> +               if (timeout)
>>> +                       timeout--;
>> This is not really a timeout, just a loop counter. 200 iterations could
>> be a very short time on reasonable H/W. I guess this avoids the soft
>> lockup, but possibly (likely?) breaks the functionality when we need to
>> loop for some non-negligible time.
>>
>> I fear we need a more complex solution, as mentioned by Michael in the
>> thread you quoted.
>
> Got it. I also look forward to a more complex solution to this problem.

Can we add a device capability (a new feature bit) such as
ctrq_wait_timeout, so that the driver can obtain a reasonable timeout
from the device?

Thanks,
Heng

> Zhu Yanjun

^ permalink raw reply	[flat|nested] 29+ messages in thread
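If such a capability existed, the driver side would presumably read the
value at negotiation time, along the lines of the sketch below. This is
entirely hypothetical: neither the feature bit nor the config field
exists in the virtio spec today, and both names and the offset are made
up for illustration.

	/* Hypothetical: a device-advertised control-vq timeout. */
	#define VIRTIO_NET_F_CTRL_TIMEOUT	63	/* made-up bit */
	#define VIRTNET_CFG_CTRL_TO_OFF		24	/* made-up offset */

	u32 timeout_ms = 200;	/* made-up fallback default */

	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_TIMEOUT))
		timeout_ms = virtio_cread32(vi->vdev, VIRTNET_CFG_CTRL_TO_OFF);

The appeal of this design is that it answers Jason's earlier objection:
each device, rather than the driver, decides how long a command may
reasonably take, so one hard-coded value does not have to fit all
vendors.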
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-19 14:27       ` Heng Qi
@ 2024-01-19 17:29         ` Andrew Lunn
  2024-01-20  4:21           ` Zhu Yanjun
  2024-01-22  2:12           ` Zhu Yanjun
  2024-01-22  3:08         ` Jason Wang
  1 sibling, 2 replies; 29+ messages in thread
From: Andrew Lunn @ 2024-01-19 17:29 UTC (permalink / raw)
  To: Heng Qi
  Cc: Zhu Yanjun, Paolo Abeni, Zhu Yanjun, mst, jasowang, xuanzhuo,
	davem, edumazet, kuba, virtualization, netdev

> > > >          while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> > > > -               !virtqueue_is_broken(vi->cvq))
> > > > +               !virtqueue_is_broken(vi->cvq)) {
> > > > +              if (timeout)
> > > > +                      timeout--;
> > > This is not really a timeout, just a loop counter. 200 iterations could
> > > be a very short time on reasonable H/W. I guess this avoids the soft
> > > lockup, but possibly (likely?) breaks the functionality when we need to
> > > loop for some non-negligible time.
> > >
> > > I fear we need a more complex solution, as mentioned by Michael in the
> > > thread you quoted.
> >
> > Got it. I also look forward to the more complex solution to this problem.
>
> Can we add a device capability (new feature bit) such as ctrq_wait_timeout
> to get a reasonable timeout?

The usual solution to this is include/linux/iopoll.h. If you can sleep,
use read_poll_timeout(); otherwise read_poll_timeout_atomic().

	Andrew

^ permalink raw reply	[flat|nested] 29+ messages in thread
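For reference, a minimal sketch of what Andrew's suggestion could look
like inside virtnet_send_command(). virtnet_cvq_done() is a made-up
helper, and the 100 us poll interval and 30 s budget are invented
numbers, not values proposed anywhere in this thread; vi and tmp are
the locals already present in that function:

#include <linux/iopoll.h>

/* Hypothetical helper: done once the device returns the command
 * buffer, or once the vq is marked broken. */
static bool virtnet_cvq_done(struct virtnet_info *vi, unsigned int *tmp)
{
	return virtqueue_get_buf(vi->cvq, tmp) ||
	       virtqueue_is_broken(vi->cvq);
}

/* After the kick in virtnet_send_command(); valid only in sleeping
 * context, otherwise read_poll_timeout_atomic() would be needed. */
bool done;

if (read_poll_timeout(virtnet_cvq_done, done, done,
		      100, 30 * USEC_PER_SEC, false, vi, &tmp))
	return false;	/* timed out: the device never answered */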
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-19 17:29         ` Andrew Lunn
@ 2024-01-20  4:21           ` Zhu Yanjun
  2024-01-22  2:12           ` Zhu Yanjun
  1 sibling, 0 replies; 29+ messages in thread
From: Zhu Yanjun @ 2024-01-20 4:21 UTC (permalink / raw)
  To: Andrew Lunn, Heng Qi
  Cc: Paolo Abeni, Zhu Yanjun, mst, jasowang, xuanzhuo, davem,
	edumazet, kuba, virtualization, netdev

On 2024/1/20 1:29, Andrew Lunn wrote:
> [...]
> The usual solution to this is include/linux/iopoll.h. If you can sleep,
> use read_poll_timeout(); otherwise read_poll_timeout_atomic().

Thanks. The two functions read_poll_timeout() and
read_poll_timeout_atomic() are interesting.

Zhu Yanjun

>
> Andrew

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-19 17:29         ` Andrew Lunn
  2024-01-20  4:21           ` Zhu Yanjun
@ 2024-01-22  2:12           ` Zhu Yanjun
  2024-01-22  3:14             ` Jason Wang
  1 sibling, 1 reply; 29+ messages in thread
From: Zhu Yanjun @ 2024-01-22 2:12 UTC (permalink / raw)
  To: Andrew Lunn, Heng Qi
  Cc: Paolo Abeni, Zhu Yanjun, mst, jasowang, xuanzhuo, davem,
	edumazet, kuba, virtualization, netdev

On 2024/1/20 1:29, Andrew Lunn wrote:
> [...]
> The usual solution to this is include/linux/iopoll.h. If you can sleep,
> use read_poll_timeout(); otherwise read_poll_timeout_atomic().

I read the functions read_poll_timeout() and read_poll_timeout_atomic()
carefully. The timeout is set by the caller of the two functions.

As such, can we add a module parameter so that the user can customize
this timeout value?

Or should this timeout value be stored in a device register, with the
virtio_net driver reading it at initialization?

Zhu Yanjun

>
> Andrew

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-22  2:12           ` Zhu Yanjun
@ 2024-01-22  3:14             ` Jason Wang
  2024-01-22  3:58               ` Xuan Zhuo
  0 siblings, 1 reply; 29+ messages in thread
From: Jason Wang @ 2024-01-22 3:14 UTC (permalink / raw)
  To: Zhu Yanjun
  Cc: Andrew Lunn, Heng Qi, Paolo Abeni, Zhu Yanjun, mst, xuanzhuo,
	davem, edumazet, kuba, virtualization, netdev

On Mon, Jan 22, 2024 at 10:12 AM Zhu Yanjun <yanjun.zhu@linux.dev> wrote:
> [...]
> I read the functions read_poll_timeout() and read_poll_timeout_atomic()
> carefully. The timeout is set by the caller of the two functions.

FYI, in order to avoid a switch between atomic and non-atomic waits, we
need to convert the rx mode setting to a workqueue first:

https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60298.html

> As such, can we add a module parameter so that the user can customize
> this timeout value?

Who is the "user" here, or how can the "user" know the value?

> Or should this timeout value be stored in a device register, with the
> virtio_net driver reading it at initialization?

See another thread. The design needs to be general, or you can post an RFC.

On another note, we've already had a tx watchdog; maybe we can have
something similar for the cvq and use timeout + reset in that case.

Thanks

>
> Zhu Yanjun
>
> [...]

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-22  3:14             ` Jason Wang
@ 2024-01-22  3:58               ` Xuan Zhuo
  2024-01-22  4:16                 ` Jason Wang
  0 siblings, 1 reply; 29+ messages in thread
From: Xuan Zhuo @ 2024-01-22 3:58 UTC (permalink / raw)
  To: Jason Wang
  Cc: Andrew Lunn, Heng Qi, Paolo Abeni, Zhu Yanjun, mst, davem,
	edumazet, kuba, virtualization, netdev, Zhu Yanjun

On Mon, 22 Jan 2024 11:14:30 +0800, Jason Wang <jasowang@redhat.com> wrote:
> [...]
> On another note, we've already had a tx watchdog; maybe we can have
> something similar for the cvq and use timeout + reset in that case.

But we may be blocked by the reset ^_^ if the device is broken?

Thanks.

> [...]

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-22  3:58               ` Xuan Zhuo
@ 2024-01-22  4:16                 ` Jason Wang
  2024-01-22  6:16                   ` Xuan Zhuo
  0 siblings, 1 reply; 29+ messages in thread
From: Jason Wang @ 2024-01-22 4:16 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Andrew Lunn, Heng Qi, Paolo Abeni, Zhu Yanjun, mst, davem,
	edumazet, kuba, virtualization, netdev, Zhu Yanjun

On Mon, Jan 22, 2024 at 12:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> [...]
> > On another note, we've already had a tx watchdog; maybe we can have
> > something similar for the cvq and use timeout + reset in that case.
>
> But we may be blocked by the reset ^_^ if the device is broken?

I mean vq reset here.

It looks like we have multiple goals here:

1) avoid lockups; using workqueue + cond_resched() seems to be
   sufficient, it has issues but nothing new
2) recover from an unresponsive device; the issue with a timeout is
   that it needs to deal with false positives

Thanks

> [...]

^ permalink raw reply	[flat|nested] 29+ messages in thread
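To illustrate goal 1), a sketch of the lockup-safe (but still
unbounded) wait, assuming the command is issued from process context
such as a workqueue; vi and tmp are the locals of
virtnet_send_command():

/* Yielding poll: cond_resched() keeps the soft lockup watchdog quiet,
 * but the loop still never terminates if the device never answers. */
while (!virtqueue_get_buf(vi->cvq, &tmp) &&
       !virtqueue_is_broken(vi->cvq))
	cond_resched();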
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-22  4:16                 ` Jason Wang
@ 2024-01-22  6:16                   ` Xuan Zhuo
  2024-01-22  6:55                     ` Jason Wang
  0 siblings, 1 reply; 29+ messages in thread
From: Xuan Zhuo @ 2024-01-22 6:16 UTC (permalink / raw)
  To: Jason Wang
  Cc: Andrew Lunn, Heng Qi, Paolo Abeni, Zhu Yanjun, mst, davem,
	edumazet, kuba, virtualization, netdev, Zhu Yanjun

On Mon, 22 Jan 2024 12:16:27 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Jan 22, 2024 at 12:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > [...]
> > But we may be blocked by the reset ^_^ if the device is broken?
>
> I mean vq reset here.

I see.

I mean that when the device is broken, the vq reset may also be blocked:

void vp_modern_set_queue_reset(struct virtio_pci_modern_device *mdev, u16 index)
{
	struct virtio_pci_modern_common_cfg __iomem *cfg;

	cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common;

	vp_iowrite16(index, &cfg->cfg.queue_select);
	vp_iowrite16(1, &cfg->queue_reset);

	while (vp_ioread16(&cfg->queue_reset))
		msleep(1);

	while (vp_ioread16(&cfg->cfg.queue_enable))
		msleep(1);
}
EXPORT_SYMBOL_GPL(vp_modern_set_queue_reset);

In this function, with a broken device, we cannot expect these reads to
ever make progress.

> It looks like we have multiple goals here:
>
> 1) avoid lockups; using workqueue + cond_resched() seems to be
>    sufficient, it has issues but nothing new
> 2) recover from an unresponsive device; the issue with a timeout is
>    that it needs to deal with false positives

I agree.

But I want to add a new goal: cvq async. In the netdim case, we will
send many requests via the cvq, so an async cvq would be nice.

Thanks.

> [...]

^ permalink raw reply	[flat|nested] 29+ messages in thread
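As an aside, the reset wait itself could be bounded in the same
read_poll_timeout() style. A sketch, where vp_modern_wait_queue_reset()
is a made-up helper and the 1 ms poll interval / 100 ms budget are
invented for illustration:

#include <linux/iopoll.h>

/* Sketch: give up on a dead device instead of msleep()ing forever.
 * Returns 0 on success, -ETIMEDOUT if queue_reset never clears. */
static int vp_modern_wait_queue_reset(struct virtio_pci_modern_common_cfg __iomem *cfg)
{
	u16 val;

	return read_poll_timeout(vp_ioread16, val, !val,
				 1000, 100 * 1000, false,
				 &cfg->queue_reset);
}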
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-22  6:16                   ` Xuan Zhuo
@ 2024-01-22  6:55                     ` Jason Wang
  2024-01-22  6:58                       ` Jason Wang
  2024-01-22  7:01                       ` Xuan Zhuo
  0 siblings, 2 replies; 29+ messages in thread
From: Jason Wang @ 2024-01-22 6:55 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Andrew Lunn, Heng Qi, Paolo Abeni, Zhu Yanjun, mst, davem,
	edumazet, kuba, virtualization, netdev, Zhu Yanjun

On Mon, Jan 22, 2024 at 2:20 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> [...]
> I mean that when the device is broken, the vq reset may also be blocked.
>
> [vp_modern_set_queue_reset() snipped]
>
> In this function, with a broken device, we cannot expect these reads to
> ever make progress.

Yes, it's best effort; there's no guarantee then. But there is no harm
in trying.

Thanks

> [...]

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-22  6:55                     ` Jason Wang
@ 2024-01-22  6:58                       ` Jason Wang
  2024-01-22  7:02                         ` Xuan Zhuo
  1 sibling, 1 reply; 29+ messages in thread
From: Jason Wang @ 2024-01-22 6:58 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Andrew Lunn, Heng Qi, Paolo Abeni, Zhu Yanjun, mst, davem,
	edumazet, kuba, virtualization, netdev, Zhu Yanjun

On Mon, Jan 22, 2024 at 2:55 PM Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Jan 22, 2024 at 2:20 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > [...]
> > But I want to add a new goal: cvq async. In the netdim case, we will
> > send many requests via the cvq, so an async cvq would be nice.

Then you need an interrupt for the cvq.

FYI, I've posted a series that uses an interrupt for the cvq in the past:

https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368621@redhat.com/t/

I haven't found time to keep working on this; maybe we can start from
that, or not.

Thanks

> [...]

^ permalink raw reply	[flat|nested] 29+ messages in thread
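For context, a rough sketch of the interrupt-driven shape (invented
names throughout, not the code from the linked series): the cvq
callback completes a completion that the submitter sleeps on.
ctrl_done is an assumed new struct completion in virtnet_info, and the
10-second budget is an arbitrary pick:

/* Callback registered for the cvq at find_vqs() time. */
static void virtnet_cvq_done_cb(struct virtqueue *cvq)
{
	struct virtnet_info *vi = cvq->vdev->priv;

	complete(&vi->ctrl_done);
}

/* Submission side, replacing the busy-wait in virtnet_send_command(): */
reinit_completion(&vi->ctrl_done);
if (virtqueue_add_sgs(vi->cvq, sgs, out_num, 1, vi, GFP_ATOMIC) < 0)
	return false;
virtqueue_kick(vi->cvq);

/* Sleep until the interrupt fires instead of spinning. */
if (!wait_for_completion_timeout(&vi->ctrl_done, 10 * HZ))
	return false;	/* unresponsive device */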
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-22  6:58                       ` Jason Wang
@ 2024-01-22  7:02                         ` Xuan Zhuo
  2024-01-22  7:19                           ` Jason Wang
      [not found]                           ` <e46d04d7-4eb7-4fcd-821a-d558c07531b7@linux.dev>
  0 siblings, 2 replies; 29+ messages in thread
From: Xuan Zhuo @ 2024-01-22 7:02 UTC (permalink / raw)
  To: Jason Wang
  Cc: Andrew Lunn, Heng Qi, Paolo Abeni, Zhu Yanjun, mst, davem,
	edumazet, kuba, virtualization, netdev, Zhu Yanjun

On Mon, 22 Jan 2024 14:58:09 +0800, Jason Wang <jasowang@redhat.com> wrote:
> [...]
> Then you need an interrupt for the cvq.
>
> FYI, I've posted a series that uses an interrupt for the cvq in the past:
>
> https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368621@redhat.com/t/

I know this. But the interrupt may not be a good solution without new space.

> I haven't found time to keep working on this; maybe we can start from
> that, or not.

I said async, but my aim is to put many requests on the cvq before
getting the responses.

Heng Qi posted this:

https://lore.kernel.org/all/1705410693-118895-4-git-send-email-hengqi@linux.alibaba.com/

Thanks.

> [...]

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-22  7:02                         ` Xuan Zhuo
@ 2024-01-22  7:19                           ` Jason Wang
  2024-01-22  7:25                             ` Xuan Zhuo
  0 siblings, 1 reply; 29+ messages in thread
From: Jason Wang @ 2024-01-22 7:19 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Andrew Lunn, Heng Qi, Paolo Abeni, Zhu Yanjun, mst, davem,
	edumazet, kuba, virtualization, netdev, Zhu Yanjun

On Mon, Jan 22, 2024 at 3:07 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> [...]
> I know this. But the interrupt may not be a good solution without new space.

What do you mean by "new space"?

We can introduce something like enable_cb_delayed(); then you will only
get notified after several requests.

> I said async, but my aim is to put many requests on the cvq before
> getting the responses.

It doesn't differ from TX/RX in this case.

> Heng Qi posted this:
>
> https://lore.kernel.org/all/1705410693-118895-4-git-send-email-hengqi@linux.alibaba.com/

This seems like a hack; if an interrupt is used, you can simply do that
in the callback.

Thanks

> [...]

^ permalink raw reply	[flat|nested] 29+ messages in thread
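virtqueue_enable_cb_delayed() already exists in the virtio ring code; a
sketch of how batched cvq requests might use it (the cmds array and the
cvq_work work item are invented for illustration, and this assumes a
cvq callback is wired up at all):

/* Queue a batch of commands, then request a single deferred
 * notification instead of one interrupt per command. */
for (i = 0; i < n_cmds; i++)
	virtqueue_add_sgs(vi->cvq, cmds[i].sgs, cmds[i].out_num, 1,
			  &cmds[i], GFP_ATOMIC);
virtqueue_kick(vi->cvq);

/* Returns false if buffers are already pending (a race with the
 * device); in that case process the completions right away. */
if (!virtqueue_enable_cb_delayed(vi->cvq))
	schedule_work(&vi->cvq_work);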
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-22  7:19                           ` Jason Wang
@ 2024-01-22  7:25                             ` Xuan Zhuo
  2024-01-22  7:57                               ` Jason Wang
  0 siblings, 1 reply; 29+ messages in thread
From: Xuan Zhuo @ 2024-01-22 7:25 UTC (permalink / raw)
  To: Jason Wang
  Cc: Andrew Lunn, Heng Qi, Paolo Abeni, Zhu Yanjun, mst, davem,
	edumazet, kuba, virtualization, netdev, Zhu Yanjun

On Mon, 22 Jan 2024 15:19:12 +0800, Jason Wang <jasowang@redhat.com> wrote:
> [...]
> > I know this. But the interrupt may not be a good solution without new space.
>
> What do you mean by "new space"?

Yes, I know the cvq can work with interrupts per the virtio spec.
But as far as I know, many hypervisors implement the cvq without
supporting interrupts. If we let the cvq work with interrupts without
negotiation, then many hypervisors will hang on the new kernel.

> We can introduce something like enable_cb_delayed(); then you will only
> get notified after several requests.
> [...]
> This seems like a hack; if an interrupt is used, you can simply do that
> in the callback.

YES.

I also want to change the code; I just want to say that async is a goal.

For the rx mode, we have introduced a workqueue; I want to move the
command-sending job to the workqueue too. The caller just wakes the
workqueue up.

If the caller wants the result synchronously, it can wait for it.
If not, it can register a callback function with the workqueue.

And I think it will be easy to implement the timeout inside the
workqueue.

Thanks.

> [...]

^ permalink raw reply	[flat|nested] 29+ messages in thread
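A compact sketch of that workqueue design (all names invented): one
work item drains a command list, synchronous callers block on a
completion, asynchronous callers leave a callback.
virtnet_cvq_dequeue() and virtnet_issue_cmd() are assumed helpers that
pop the next queued command and then add + kick + wait with a timeout:

struct virtnet_cvq_cmd {
	struct list_head node;
	void (*done)(struct virtnet_cvq_cmd *cmd);	/* async callers */
	struct completion completion;			/* sync callers */
	bool ok;
};

static void virtnet_cvq_work(struct work_struct *work)
{
	struct virtnet_info *vi = container_of(work, struct virtnet_info,
					       cvq_work);
	struct virtnet_cvq_cmd *cmd;

	while ((cmd = virtnet_cvq_dequeue(vi))) {
		cmd->ok = virtnet_issue_cmd(vi, cmd);
		if (cmd->done)
			cmd->done(cmd);			/* async path */
		else
			complete(&cmd->completion);	/* wake sync caller */
	}
}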
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-22  7:25                             ` Xuan Zhuo
@ 2024-01-22  7:57                               ` Jason Wang
  2024-01-22  8:01                                 ` Xuan Zhuo
  0 siblings, 1 reply; 29+ messages in thread
From: Jason Wang @ 2024-01-22 7:57 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Andrew Lunn, Heng Qi, Paolo Abeni, Zhu Yanjun, mst, davem,
	edumazet, kuba, virtualization, netdev, Zhu Yanjun

On Mon, Jan 22, 2024 at 3:36 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> [...]
> Yes, I know the cvq can work with interrupts per the virtio spec.
> But as far as I know, many hypervisors implement the cvq without
> supporting interrupts.

It's a bug of the hypervisor that needs to be fixed. Interrupts are
provided by the transport, not by virtio itself.

Otherwise it can only support Linux, but not other OSes.

> If we let the cvq work with interrupts without negotiation, then many
> hypervisors will hang on the new kernel.
> [...]
> For the rx mode, we have introduced a workqueue; I want to move the
> command-sending job to the workqueue too. The caller just wakes the
> workqueue up.
>
> If the caller wants the result synchronously, it can wait for it.
> If not, it can register a callback function with the workqueue.
>
> And I think it will be easy to implement the timeout inside the
> workqueue.

Looks much more complicated than a simple interrupt + timer/watchdog etc.

Thanks

> [...]

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang 2024-01-22 7:57 ` Jason Wang @ 2024-01-22 8:01 ` Xuan Zhuo 2024-01-22 8:32 ` Jason Wang 0 siblings, 1 reply; 29+ messages in thread From: Xuan Zhuo @ 2024-01-22 8:01 UTC (permalink / raw) To: Jason Wang Cc: Andrew Lunn, Heng Qi, Paolo Abeni, Zhu Yanjun, mst, davem, edumazet, kuba, virtualization, netdev, Zhu Yanjun On Mon, 22 Jan 2024 15:57:08 +0800, Jason Wang <jasowang@redhat.com> wrote: > On Mon, Jan 22, 2024 at 3:36 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > On Mon, 22 Jan 2024 15:19:12 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > On Mon, Jan 22, 2024 at 3:07 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > On Mon, 22 Jan 2024 14:58:09 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > On Mon, Jan 22, 2024 at 2:55 PM Jason Wang <jasowang@redhat.com> wrote: > > > > > > > > > > > > On Mon, Jan 22, 2024 at 2:20 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > > > > > > > On Mon, 22 Jan 2024 12:16:27 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > > > > On Mon, Jan 22, 2024 at 12:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > > > > > > > > > > > On Mon, 22 Jan 2024 11:14:30 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > > > > > > On Mon, Jan 22, 2024 at 10:12 AM Zhu Yanjun <yanjun.zhu@linux.dev> wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 在 2024/1/20 1:29, Andrew Lunn 写道: > > > > > > > > > > > >>>>> while (!virtqueue_get_buf(vi->cvq, &tmp) && > > > > > > > > > > > >>>>> - !virtqueue_is_broken(vi->cvq)) > > > > > > > > > > > >>>>> + !virtqueue_is_broken(vi->cvq)) { > > > > > > > > > > > >>>>> + if (timeout) > > > > > > > > > > > >>>>> + timeout--; > > > > > > > > > > > >>>> This is not really a timeout, just a loop counter. 200 iterations could > > > > > > > > > > > >>>> be a very short time on reasonable H/W. I guess this avoid the soft > > > > > > > > > > > >>>> lockup, but possibly (likely?) breaks the functionality when we need to > > > > > > > > > > > >>>> loop for some non negligible time. > > > > > > > > > > > >>>> > > > > > > > > > > > >>>> I fear we need a more complex solution, as mentioned by Micheal in the > > > > > > > > > > > >>>> thread you quoted. > > > > > > > > > > > >>> Got it. I also look forward to the more complex solution to this problem. > > > > > > > > > > > >> Can we add a device capability (new feature bit) such as ctrq_wait_timeout > > > > > > > > > > > >> to get a reasonable timeout? > > > > > > > > > > > > The usual solution to this is include/linux/iopoll.h. If you can sleep > > > > > > > > > > > > read_poll_timeout() otherwise read_poll_timeout_atomic(). > > > > > > > > > > > > > > > > > > > > > > I read carefully the functions read_poll_timeout() and > > > > > > > > > > > read_poll_timeout_atomic(). The timeout is set by the caller of the 2 > > > > > > > > > > > functions. > > > > > > > > > > > > > > > > > > > > FYI, in order to avoid a swtich of atomic or not, we need convert rx > > > > > > > > > > mode setting to workqueue first: > > > > > > > > > > > > > > > > > > > > https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60298.html > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > As such, can we add a module parameter to customize this timeout value > > > > > > > > > > > by the user? > > > > > > > > > > > > > > > > > > > > Who is the "user" here, or how can the "user" know the value? 
> > > > > > > > > > > > Or this timeout value is stored in device register, virtio_net driver > > > > > > > > > > > will read this timeout value at initialization? > > > > > > > > > > See another thread. The design needs to be general, or you can post a RFC. > > > > > > > > > > In another thought, we've already had a tx watchdog, maybe we can have > > > > > > > > > > something similar to cvq and use timeout + reset in that case. > > > > > > > > > But we may block by the reset ^_^ if the device is broken? > > > > > > > > I mean vq reset here. > > > > > > > I see. > > > > > > > I mean when the deivce is broken, the vq reset also many be blocked. > > > > > > > void vp_modern_set_queue_reset(struct virtio_pci_modern_device *mdev, u16 index) > > > > > > > { > > > > > > > struct virtio_pci_modern_common_cfg __iomem *cfg; > > > > > > > cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common; > > > > > > > vp_iowrite16(index, &cfg->cfg.queue_select); > > > > > > > vp_iowrite16(1, &cfg->queue_reset); > > > > > > > while (vp_ioread16(&cfg->queue_reset)) > > > > > > > msleep(1); > > > > > > > while (vp_ioread16(&cfg->cfg.queue_enable)) > > > > > > > msleep(1); > > > > > > > } > > > > > > > EXPORT_SYMBOL_GPL(vp_modern_set_queue_reset); > > > > > > > In this function, for the broken device, we can not expect something. > > > > > > Yes, it's best effort, there's no guarantee then. But it doesn't harm to try. > > > > > > Thanks > > > > > > > > It looks like we have multiple goals here > > > > > > > > 1) avoid lockups, using workqueue + cond_resched() seems to be > > > > > > > > sufficient, it has issue but nothing new > > > > > > > > 2) recover from the unresponsive device, the issue for timeout is that > > > > > > > > it needs to deal with false positives > > > > > > > I agree. > > > > > > > But I want to add a new goal, cvq async. In the netdim, we will > > > > > > > send many requests via the cvq, so the cvq async will be nice. > > > > > Then you need an interrupt for cvq. > > > > > FYI, I've posted a series that use interrupt for cvq in the past: > > > > > https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368621@redhat.com/t/ > > > > I know this. But the interrupt maybe not a good solution without new space. > > > What do you mean by "new space"? > > Yes, I know, the cvq can work with interrupt by the virtio spec. > > But as I know, many hypervisors implement the cvq without supporting interrupt. > > It's a bug of the hypervisor that needs to be fix. Interrupt is > provided by transport not the virtio itself. YES. I agree. But I still think we should not enable the interrupt directly, without any negotiation. I would rather introduce a new feature to enable this. Thanks. > > Otherwise it can only support for Linux but not other OSes. > > > If we let the cvq work with interrupt without negotiation then > > many hypervisors will hang on the new kernel. > > > We can introduce something like enable_cb_delayed(), then you will > > > only get notified after several requests.
> > > > > Haven't found time in working on this anymore, maybe we can start from > > > > > this or not. > > > > I said async, but my aim is to put many requests to the cvq before getting the > > > > response. > > > It doesn't differ from TX/RX in this case. > > > > Heng Qi posted this https://lore.kernel.org/all/1705410693-118895-4-git-send-email-hengqi@linux.alibaba.com/ > > > This seems like a hack, if interrupt is used, you can simply do that > > > in the callback. > > YES. > > I also want to change the code, I just want to say the async is a goal. > > For the rx mode, we have introduce a work queue, I want to move the > > sending command job to the work queue. The caller just wakeup > > the work queue. > > If the caller want to got the result sync, then the caller can wait for it. > > If not, the caller can register an function to the work queue. > > And I think it will be easy to implement the timeout inside the workqueue. > Looks much more complicated than a simple interrupt + timer/watchdog etc. > Thanks ^ permalink raw reply [flat|nested] 29+ messages in thread
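The enable_cb_delayed() mentioned in this exchange maps onto the existing virtqueue_enable_cb_delayed() API. A rough sketch of batching control-queue completions with it could look as follows — illustrative only; cvq_work is again an assumed work item, and the request-adding step is elided:

static void virtnet_cvq_submit_batch(struct virtnet_info *vi)
{
	virtqueue_disable_cb(vi->cvq);	/* no callback per request */

	/* ... add a batch of requests with virtqueue_add_sgs() ... */

	virtqueue_kick(vi->cvq);

	/* Ask for one delayed callback for the whole batch; a false
	 * return means completions are already pending, so process
	 * them now instead of waiting for an interrupt.
	 */
	if (!virtqueue_enable_cb_delayed(vi->cvq))
		schedule_work(&vi->cvq_work);
}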
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang 2024-01-22 8:01 ` Xuan Zhuo @ 2024-01-22 8:32 ` Jason Wang 2024-01-22 9:11 ` Xuan Zhuo 0 siblings, 1 reply; 29+ messages in thread From: Jason Wang @ 2024-01-22 8:32 UTC (permalink / raw) To: Xuan Zhuo Cc: Andrew Lunn, Heng Qi, Paolo Abeni, Zhu Yanjun, mst, davem, edumazet, kuba, virtualization, netdev, Zhu Yanjun On Mon, Jan 22, 2024 at 4:04 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > On Mon, 22 Jan 2024 15:57:08 +0800, Jason Wang <jasowang@redhat.com> wrote: > > On Mon, Jan 22, 2024 at 3:36 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > On Mon, 22 Jan 2024 15:19:12 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > On Mon, Jan 22, 2024 at 3:07 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > > > On Mon, 22 Jan 2024 14:58:09 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > > On Mon, Jan 22, 2024 at 2:55 PM Jason Wang <jasowang@redhat.com> wrote: > > > > > > > > > > > > > > On Mon, Jan 22, 2024 at 2:20 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > > > > > > > > > On Mon, 22 Jan 2024 12:16:27 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > > > > > On Mon, Jan 22, 2024 at 12:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > > > > > > > > > > > > > On Mon, 22 Jan 2024 11:14:30 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > > > > > > > On Mon, Jan 22, 2024 at 10:12 AM Zhu Yanjun <yanjun.zhu@linux.dev> wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 在 2024/1/20 1:29, Andrew Lunn 写道: > > > > > > > > > > > > >>>>> while (!virtqueue_get_buf(vi->cvq, &tmp) && > > > > > > > > > > > > >>>>> - !virtqueue_is_broken(vi->cvq)) > > > > > > > > > > > > >>>>> + !virtqueue_is_broken(vi->cvq)) { > > > > > > > > > > > > >>>>> + if (timeout) > > > > > > > > > > > > >>>>> + timeout--; > > > > > > > > > > > > >>>> This is not really a timeout, just a loop counter. 200 iterations could > > > > > > > > > > > > >>>> be a very short time on reasonable H/W. I guess this avoid the soft > > > > > > > > > > > > >>>> lockup, but possibly (likely?) breaks the functionality when we need to > > > > > > > > > > > > >>>> loop for some non negligible time. > > > > > > > > > > > > >>>> > > > > > > > > > > > > >>>> I fear we need a more complex solution, as mentioned by Micheal in the > > > > > > > > > > > > >>>> thread you quoted. > > > > > > > > > > > > >>> Got it. I also look forward to the more complex solution to this problem. > > > > > > > > > > > > >> Can we add a device capability (new feature bit) such as ctrq_wait_timeout > > > > > > > > > > > > >> to get a reasonable timeout? > > > > > > > > > > > > > The usual solution to this is include/linux/iopoll.h. If you can sleep > > > > > > > > > > > > > read_poll_timeout() otherwise read_poll_timeout_atomic(). > > > > > > > > > > > > > > > > > > > > > > > > I read carefully the functions read_poll_timeout() and > > > > > > > > > > > > read_poll_timeout_atomic(). The timeout is set by the caller of the 2 > > > > > > > > > > > > functions. 
> > > > > > > > > > > > > > > > > > > > > > FYI, in order to avoid a swtich of atomic or not, we need convert rx > > > > > > > > > > > mode setting to workqueue first: > > > > > > > > > > > > > > > > > > > > > > https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60298.html > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > As such, can we add a module parameter to customize this timeout value > > > > > > > > > > > > by the user? > > > > > > > > > > > > > > > > > > > > > > Who is the "user" here, or how can the "user" know the value? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Or this timeout value is stored in device register, virtio_net driver > > > > > > > > > > > > will read this timeout value at initialization? > > > > > > > > > > > > > > > > > > > > > > See another thread. The design needs to be general, or you can post a RFC. > > > > > > > > > > > > > > > > > > > > > > In another thought, we've already had a tx watchdog, maybe we can have > > > > > > > > > > > something similar to cvq and use timeout + reset in that case. > > > > > > > > > > > > > > > > > > > > But we may block by the reset ^_^ if the device is broken? > > > > > > > > > > > > > > > > > > I mean vq reset here. > > > > > > > > > > > > > > > > I see. > > > > > > > > > > > > > > > > I mean when the deivce is broken, the vq reset also many be blocked. > > > > > > > > > > > > > > > > void vp_modern_set_queue_reset(struct virtio_pci_modern_device *mdev, u16 index) > > > > > > > > { > > > > > > > > struct virtio_pci_modern_common_cfg __iomem *cfg; > > > > > > > > > > > > > > > > cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common; > > > > > > > > > > > > > > > > vp_iowrite16(index, &cfg->cfg.queue_select); > > > > > > > > vp_iowrite16(1, &cfg->queue_reset); > > > > > > > > > > > > > > > > while (vp_ioread16(&cfg->queue_reset)) > > > > > > > > msleep(1); > > > > > > > > > > > > > > > > while (vp_ioread16(&cfg->cfg.queue_enable)) > > > > > > > > msleep(1); > > > > > > > > } > > > > > > > > EXPORT_SYMBOL_GPL(vp_modern_set_queue_reset); > > > > > > > > > > > > > > > > In this function, for the broken device, we can not expect something. > > > > > > > > > > > > > > Yes, it's best effort, there's no guarantee then. But it doesn't harm to try. > > > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > It looks like we have multiple goals here > > > > > > > > > > > > > > > > > > 1) avoid lockups, using workqueue + cond_resched() seems to be > > > > > > > > > sufficient, it has issue but nothing new > > > > > > > > > 2) recover from the unresponsive device, the issue for timeout is that > > > > > > > > > it needs to deal with false positives > > > > > > > > > > > > > > > > > > > > > > > > I agree. > > > > > > > > > > > > > > > > But I want to add a new goal, cvq async. In the netdim, we will > > > > > > > > send many requests via the cvq, so the cvq async will be nice. > > > > > > > > > > > > Then you need an interrupt for cvq. > > > > > > > > > > > > FYI, I've posted a series that use interrupt for cvq in the past: > > > > > > > > > > > > https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368621@redhat.com/t/ > > > > > > > > > > I know this. But the interrupt maybe not a good solution without new space. > > > > > > > > What do you mean by "new space"? > > > > > > Yes, I know, the cvq can work with interrupt by the virtio spec. 
> > > But as I know, many hypervisors implement the cvq without supporting interrupt. > > > > It's a bug of the hypervisor that needs to be fix. Interrupt is > > provided by transport not the virtio itself. > > YES. I agree. > > But I still think we should not enable the interrupt directly, without any > > negotiation. I would rather introduce a new feature to enable this. I can hardly believe we need to work around the issue of specific hypervisors like this... Thanks > > Thanks. > > ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang 2024-01-22 8:32 ` Jason Wang @ 2024-01-22 9:11 ` Xuan Zhuo 0 siblings, 0 replies; 29+ messages in thread From: Xuan Zhuo @ 2024-01-22 9:11 UTC (permalink / raw) To: Jason Wang Cc: Andrew Lunn, Heng Qi, Paolo Abeni, Zhu Yanjun, mst, davem, edumazet, kuba, virtualization, netdev, Zhu Yanjun On Mon, 22 Jan 2024 16:32:46 +0800, Jason Wang <jasowang@redhat.com> wrote: > On Mon, Jan 22, 2024 at 4:04 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > On Mon, 22 Jan 2024 15:57:08 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > On Mon, Jan 22, 2024 at 3:36 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > On Mon, 22 Jan 2024 15:19:12 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > On Mon, Jan 22, 2024 at 3:07 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > > > > > On Mon, 22 Jan 2024 14:58:09 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > > > On Mon, Jan 22, 2024 at 2:55 PM Jason Wang <jasowang@redhat.com> wrote: > > > > > > > > > > > > > > > > On Mon, Jan 22, 2024 at 2:20 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > > > > > > > > > > > On Mon, 22 Jan 2024 12:16:27 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > > > > > > On Mon, Jan 22, 2024 at 12:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > > > > > > > > > > > > > > > On Mon, 22 Jan 2024 11:14:30 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > > > > > > > > On Mon, Jan 22, 2024 at 10:12 AM Zhu Yanjun <yanjun.zhu@linux.dev> wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 在 2024/1/20 1:29, Andrew Lunn 写道: > > > > > > > > > > > > > >>>>> while (!virtqueue_get_buf(vi->cvq, &tmp) && > > > > > > > > > > > > > >>>>> - !virtqueue_is_broken(vi->cvq)) > > > > > > > > > > > > > >>>>> + !virtqueue_is_broken(vi->cvq)) { > > > > > > > > > > > > > >>>>> + if (timeout) > > > > > > > > > > > > > >>>>> + timeout--; > > > > > > > > > > > > > >>>> This is not really a timeout, just a loop counter. 200 iterations could > > > > > > > > > > > > > >>>> be a very short time on reasonable H/W. I guess this avoid the soft > > > > > > > > > > > > > >>>> lockup, but possibly (likely?) breaks the functionality when we need to > > > > > > > > > > > > > >>>> loop for some non negligible time. > > > > > > > > > > > > > >>>> > > > > > > > > > > > > > >>>> I fear we need a more complex solution, as mentioned by Micheal in the > > > > > > > > > > > > > >>>> thread you quoted. > > > > > > > > > > > > > >>> Got it. I also look forward to the more complex solution to this problem. > > > > > > > > > > > > > >> Can we add a device capability (new feature bit) such as ctrq_wait_timeout > > > > > > > > > > > > > >> to get a reasonable timeout? > > > > > > > > > > > > > > The usual solution to this is include/linux/iopoll.h. If you can sleep > > > > > > > > > > > > > > read_poll_timeout() otherwise read_poll_timeout_atomic(). > > > > > > > > > > > > > > > > > > > > > > > > > > I read carefully the functions read_poll_timeout() and > > > > > > > > > > > > > read_poll_timeout_atomic(). The timeout is set by the caller of the 2 > > > > > > > > > > > > > functions. 
> > > > > > > > > > > > > > > > > > > > > > > > FYI, in order to avoid a swtich of atomic or not, we need convert rx > > > > > > > > > > > > mode setting to workqueue first: > > > > > > > > > > > > > > > > > > > > > > > > https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60298.html > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > As such, can we add a module parameter to customize this timeout value > > > > > > > > > > > > > by the user? > > > > > > > > > > > > > > > > > > > > > > > > Who is the "user" here, or how can the "user" know the value? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Or this timeout value is stored in device register, virtio_net driver > > > > > > > > > > > > > will read this timeout value at initialization? > > > > > > > > > > > > > > > > > > > > > > > > See another thread. The design needs to be general, or you can post a RFC. > > > > > > > > > > > > > > > > > > > > > > > > In another thought, we've already had a tx watchdog, maybe we can have > > > > > > > > > > > > something similar to cvq and use timeout + reset in that case. > > > > > > > > > > > > > > > > > > > > > > But we may block by the reset ^_^ if the device is broken? > > > > > > > > > > > > > > > > > > > > I mean vq reset here. > > > > > > > > > > > > > > > > > > I see. > > > > > > > > > > > > > > > > > > I mean when the deivce is broken, the vq reset also many be blocked. > > > > > > > > > > > > > > > > > > void vp_modern_set_queue_reset(struct virtio_pci_modern_device *mdev, u16 index) > > > > > > > > > { > > > > > > > > > struct virtio_pci_modern_common_cfg __iomem *cfg; > > > > > > > > > > > > > > > > > > cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common; > > > > > > > > > > > > > > > > > > vp_iowrite16(index, &cfg->cfg.queue_select); > > > > > > > > > vp_iowrite16(1, &cfg->queue_reset); > > > > > > > > > > > > > > > > > > while (vp_ioread16(&cfg->queue_reset)) > > > > > > > > > msleep(1); > > > > > > > > > > > > > > > > > > while (vp_ioread16(&cfg->cfg.queue_enable)) > > > > > > > > > msleep(1); > > > > > > > > > } > > > > > > > > > EXPORT_SYMBOL_GPL(vp_modern_set_queue_reset); > > > > > > > > > > > > > > > > > > In this function, for the broken device, we can not expect something. > > > > > > > > > > > > > > > > Yes, it's best effort, there's no guarantee then. But it doesn't harm to try. > > > > > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > It looks like we have multiple goals here > > > > > > > > > > > > > > > > > > > > 1) avoid lockups, using workqueue + cond_resched() seems to be > > > > > > > > > > sufficient, it has issue but nothing new > > > > > > > > > > 2) recover from the unresponsive device, the issue for timeout is that > > > > > > > > > > it needs to deal with false positives > > > > > > > > > > > > > > > > > > > > > > > > > > > I agree. > > > > > > > > > > > > > > > > > > But I want to add a new goal, cvq async. In the netdim, we will > > > > > > > > > send many requests via the cvq, so the cvq async will be nice. > > > > > > > > > > > > > > Then you need an interrupt for cvq. > > > > > > > > > > > > > > FYI, I've posted a series that use interrupt for cvq in the past: > > > > > > > > > > > > > > https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368621@redhat.com/t/ > > > > > > > > > > > > I know this. But the interrupt maybe not a good solution without new space. 
> > > > > > > > > > What do you mean by "new space"? > > > > > > > > Yes, I know, the cvq can work with interrupt by the virtio spec. > > > > But as I know, many hypervisors implement the cvq without supporting interrupt. > > > > > > It's a bug of the hypervisor that needs to be fix. Interrupt is > > > provided by transport not the virtio itself. > > > > YES. I agree. > > > > But I still think we should not enable the interrupt directly, without any > > negotiation. I would rather introduce a new feature to enable this. > > I can hardly believe we need to work around the issue of specific > hypervisors like this... Maybe we should hear from others. We are OK with this; I am just thinking of other hypervisors. Thanks. ^ permalink raw reply [flat|nested] 29+ messages in thread
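A sketch of the negotiation being asked for here might be a new feature bit gating cvq interrupts, so hypervisors that never wired them up keep the polling path. The bit name and number below are pure placeholders, not anything allocated in the virtio spec:

/* Placeholder: a real bit would need to be reserved in the spec. */
#define VIRTIO_NET_F_CTRL_IRQ	62

static bool virtnet_cvq_irq_negotiated(struct virtnet_info *vi)
{
	return virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_IRQ);
}

At probe time the driver would only install a cvq callback in find_vqs() when this returns true, falling back to polling otherwise, which keeps existing hypervisors working unchanged.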
[parent not found: <e46d04d7-4eb7-4fcd-821a-d558c07531b7@linux.dev>]
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang [not found] ` <e46d04d7-4eb7-4fcd-821a-d558c07531b7@linux.dev> @ 2024-01-26 3:13 ` Zhu Yanjun 0 siblings, 0 replies; 29+ messages in thread From: Zhu Yanjun @ 2024-01-26 3:13 UTC (permalink / raw) To: Xuan Zhuo, Jason Wang Cc: Andrew Lunn, Heng Qi, Paolo Abeni, Zhu Yanjun, mst, davem, edumazet, kuba, virtualization, netdev 在 2024/1/26 11:11, Zhu Yanjun 写道: > > > 在 2024/1/22 15:02, Xuan Zhuo 写道: >> On Mon, 22 Jan 2024 14:58:09 +0800, Jason Wang<jasowang@redhat.com> wrote: >>> On Mon, Jan 22, 2024 at 2:55 PM Jason Wang<jasowang@redhat.com> wrote: >>>> On Mon, Jan 22, 2024 at 2:20 PM Xuan Zhuo<xuanzhuo@linux.alibaba.com> wrote: >>>>> On Mon, 22 Jan 2024 12:16:27 +0800, Jason Wang<jasowang@redhat.com> wrote: >>>>>> On Mon, Jan 22, 2024 at 12:00 PM Xuan Zhuo<xuanzhuo@linux.alibaba.com> wrote: >>>>>>> On Mon, 22 Jan 2024 11:14:30 +0800, Jason Wang<jasowang@redhat.com> wrote: >>>>>>>> On Mon, Jan 22, 2024 at 10:12 AM Zhu Yanjun<yanjun.zhu@linux.dev> wrote: >>>>>>>>> 在 2024/1/20 1:29, Andrew Lunn 写道: >>>>>>>>>>>>>> while (!virtqueue_get_buf(vi->cvq, &tmp) && >>>>>>>>>>>>>> - !virtqueue_is_broken(vi->cvq)) >>>>>>>>>>>>>> + !virtqueue_is_broken(vi->cvq)) { >>>>>>>>>>>>>> + if (timeout) >>>>>>>>>>>>>> + timeout--; >>>>>>>>>>>>> This is not really a timeout, just a loop counter. 200 iterations could >>>>>>>>>>>>> be a very short time on reasonable H/W. I guess this avoid the soft >>>>>>>>>>>>> lockup, but possibly (likely?) breaks the functionality when we need to >>>>>>>>>>>>> loop for some non negligible time. >>>>>>>>>>>>> >>>>>>>>>>>>> I fear we need a more complex solution, as mentioned by Micheal in the >>>>>>>>>>>>> thread you quoted. >>>>>>>>>>>> Got it. I also look forward to the more complex solution to this problem. >>>>>>>>>>> Can we add a device capability (new feature bit) such as ctrq_wait_timeout >>>>>>>>>>> to get a reasonable timeout? >>>>>>>>>> The usual solution to this is include/linux/iopoll.h. If you can sleep >>>>>>>>>> read_poll_timeout() otherwise read_poll_timeout_atomic(). >>>>>>>>> I read carefully the functions read_poll_timeout() and >>>>>>>>> read_poll_timeout_atomic(). The timeout is set by the caller of the 2 >>>>>>>>> functions. >>>>>>>> FYI, in order to avoid a swtich of atomic or not, we need convert rx >>>>>>>> mode setting to workqueue first: >>>>>>>> >>>>>>>> https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60298.html >>>>>>>> >>>>>>>>> As such, can we add a module parameter to customize this timeout value >>>>>>>>> by the user? >>>>>>>> Who is the "user" here, or how can the "user" know the value? >>>>>>>> >>>>>>>>> Or this timeout value is stored in device register, virtio_net driver >>>>>>>>> will read this timeout value at initialization? >>>>>>>> See another thread. The design needs to be general, or you can post a RFC. >>>>>>>> >>>>>>>> In another thought, we've already had a tx watchdog, maybe we can have >>>>>>>> something similar to cvq and use timeout + reset in that case. >>>>>>> But we may block by the reset ^_^ if the device is broken? >>>>>> I mean vq reset here. >>>>> I see. >>>>> >>>>> I mean when the deivce is broken, the vq reset also many be blocked. 
>>>>> void vp_modern_set_queue_reset(struct virtio_pci_modern_device *mdev, u16 index) >>>>> { >>>>> struct virtio_pci_modern_common_cfg __iomem *cfg; >>>>> cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common; >>>>> vp_iowrite16(index, &cfg->cfg.queue_select); >>>>> vp_iowrite16(1, &cfg->queue_reset); >>>>> while (vp_ioread16(&cfg->queue_reset)) >>>>> msleep(1); >>>>> while (vp_ioread16(&cfg->cfg.queue_enable)) >>>>> msleep(1); >>>>> } >>>>> EXPORT_SYMBOL_GPL(vp_modern_set_queue_reset); >>>>> In this function, for the broken device, we can not expect something. >>>> Yes, it's best effort, there's no guarantee then. But it doesn't harm to try. >>>> Thanks >>>>>> It looks like we have multiple goals here >>>>>> 1) avoid lockups, using workqueue + cond_resched() seems to be >>>>>> sufficient, it has issue but nothing new >>>>>> 2) recover from the unresponsive device, the issue for timeout is that >>>>>> it needs to deal with false positives >>>>> I agree. >>>>> But I want to add a new goal, cvq async. In the netdim, we will >>>>> send many requests via the cvq, so the cvq async will be nice. >>> Then you need an interrupt for cvq. >>> FYI, I've posted a series that use interrupt for cvq in the past: >>> https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368621@redhat.com/t/ >> I know this. But the interrupt maybe not a good solution without new space. >>> Haven't found time in working on this anymore, maybe we can start from >>> this or not. >> I said async, but my aim is to put many requests to the cvq before getting the >> response. >> Heng Qi posted this https://lore.kernel.org/all/1705410693-118895-4-git-send-email-hengqi@linux.alibaba.com/ Sorry, this mail was rejected by the netdev mailing list, so I have to resend it. Thanks a lot. I read Heng Qi's commits carefully. This patch series is similar to the NIC xmit_more feature. But if a cvq command is urgent, can we let that urgent command be passed to the device ASAP? I mean, can we set a flag similar to xmit_more? If a cvq command is not urgent, it can be queued; if it is urgent, it is passed ASAP. Zhu Yanjun ^ permalink raw reply [flat|nested] 29+ messages in thread
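A minimal sketch of the xmit_more-style flag proposed here, under the assumption that a batching path exists; the 'urgent' parameter is the new idea, the rest mirrors the existing virtnet_send_command() setup:

/* Add a command to the control vq: urgent commands notify the device
 * immediately, others stay queued until the next urgent command or an
 * explicit flush kicks the whole batch.
 */
static int virtnet_ctrl_add(struct virtnet_info *vi,
			    struct scatterlist *sgs[], unsigned int out_num,
			    bool urgent)
{
	int ret;

	ret = virtqueue_add_sgs(vi->cvq, sgs, out_num, 1, vi, GFP_ATOMIC);
	if (ret < 0)
		return ret;

	if (urgent)
		virtqueue_kick(vi->cvq);	/* pass it on ASAP */

	return 0;
}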
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang 2024-01-22 6:55 ` Jason Wang 2024-01-22 6:58 ` Jason Wang @ 2024-01-22 7:01 ` Xuan Zhuo 1 sibling, 0 replies; 29+ messages in thread From: Xuan Zhuo @ 2024-01-22 7:01 UTC (permalink / raw) To: Jason Wang Cc: Andrew Lunn, Heng Qi, Paolo Abeni, Zhu Yanjun, mst, davem, edumazet, kuba, virtualization, netdev, Zhu Yanjun On Mon, 22 Jan 2024 14:55:46 +0800, Jason Wang <jasowang@redhat.com> wrote: > On Mon, Jan 22, 2024 at 2:20 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > On Mon, 22 Jan 2024 12:16:27 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > On Mon, Jan 22, 2024 at 12:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > On Mon, 22 Jan 2024 11:14:30 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > On Mon, Jan 22, 2024 at 10:12 AM Zhu Yanjun <yanjun.zhu@linux.dev> wrote: > > > > > > > > > > > > > > > > > > 在 2024/1/20 1:29, Andrew Lunn 写道: > > > > > > >>>>> while (!virtqueue_get_buf(vi->cvq, &tmp) && > > > > > > >>>>> - !virtqueue_is_broken(vi->cvq)) > > > > > > >>>>> + !virtqueue_is_broken(vi->cvq)) { > > > > > > >>>>> + if (timeout) > > > > > > >>>>> + timeout--; > > > > > > >>>> This is not really a timeout, just a loop counter. 200 iterations could > > > > > > >>>> be a very short time on reasonable H/W. I guess this avoid the soft > > > > > > >>>> lockup, but possibly (likely?) breaks the functionality when we need to > > > > > > >>>> loop for some non negligible time. > > > > > > >>>> > > > > > > >>>> I fear we need a more complex solution, as mentioned by Micheal in the > > > > > > >>>> thread you quoted. > > > > > > >>> Got it. I also look forward to the more complex solution to this problem. > > > > > > >> Can we add a device capability (new feature bit) such as ctrq_wait_timeout > > > > > > >> to get a reasonable timeout? > > > > > > > The usual solution to this is include/linux/iopoll.h. If you can sleep > > > > > > > read_poll_timeout() otherwise read_poll_timeout_atomic(). > > > > > > > > > > > > I read carefully the functions read_poll_timeout() and > > > > > > read_poll_timeout_atomic(). The timeout is set by the caller of the 2 > > > > > > functions. > > > > > > > > > > FYI, in order to avoid a swtich of atomic or not, we need convert rx > > > > > mode setting to workqueue first: > > > > > > > > > > https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60298.html > > > > > > > > > > > > > > > > > As such, can we add a module parameter to customize this timeout value > > > > > > by the user? > > > > > > > > > > Who is the "user" here, or how can the "user" know the value? > > > > > > > > > > > > > > > > > Or this timeout value is stored in device register, virtio_net driver > > > > > > will read this timeout value at initialization? > > > > > > > > > > See another thread. The design needs to be general, or you can post a RFC. > > > > > > > > > > In another thought, we've already had a tx watchdog, maybe we can have > > > > > something similar to cvq and use timeout + reset in that case. > > > > > > > > But we may block by the reset ^_^ if the device is broken? > > > > > > I mean vq reset here. > > > > I see. > > > > I mean when the deivce is broken, the vq reset also many be blocked. 
> > void vp_modern_set_queue_reset(struct virtio_pci_modern_device *mdev, u16 index) > > { > > struct virtio_pci_modern_common_cfg __iomem *cfg; > > cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common; > > vp_iowrite16(index, &cfg->cfg.queue_select); > > vp_iowrite16(1, &cfg->queue_reset); > > while (vp_ioread16(&cfg->queue_reset)) > > msleep(1); > > while (vp_ioread16(&cfg->cfg.queue_enable)) > > msleep(1); > > } > > EXPORT_SYMBOL_GPL(vp_modern_set_queue_reset); > > In this function, for the broken device, we can not expect something. > Yes, it's best effort, there's no guarantee then. But it doesn't harm to try. YES. I agree. Thanks. > Thanks ^ permalink raw reply [flat|nested] 29+ messages in thread
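For reference, the unbounded msleep() loops quoted above could be given a budget with read_poll_timeout() from include/linux/iopoll.h, the helper Andrew Lunn pointed at earlier in the thread. A sketch, with an arbitrary 100ms budget per register:

#include <linux/iopoll.h>

static int vp_modern_set_queue_reset_bounded(struct virtio_pci_modern_device *mdev,
					     u16 index)
{
	struct virtio_pci_modern_common_cfg __iomem *cfg;
	u16 val;
	int err;

	cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common;

	vp_iowrite16(index, &cfg->cfg.queue_select);
	vp_iowrite16(1, &cfg->queue_reset);

	/* Poll every 1ms, give up after 100ms: a broken device now
	 * yields -ETIMEDOUT instead of looping forever.
	 */
	err = read_poll_timeout(vp_ioread16, val, !val, 1000, 100000, false,
				&cfg->queue_reset);
	if (err)
		return err;

	return read_poll_timeout(vp_ioread16, val, !val, 1000, 100000, false,
				 &cfg->cfg.queue_enable);
}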
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang 2024-01-19 14:27 ` Heng Qi 2024-01-19 17:29 ` Andrew Lunn @ 2024-01-22 3:08 ` Jason Wang 2024-01-22 4:42 ` Heng Qi 1 sibling, 1 reply; 29+ messages in thread From: Jason Wang @ 2024-01-22 3:08 UTC (permalink / raw) To: Heng Qi Cc: Zhu Yanjun, Paolo Abeni, Zhu Yanjun, mst, xuanzhuo, davem, edumazet, kuba, virtualization, netdev On Fri, Jan 19, 2024 at 10:27 PM Heng Qi <hengqi@linux.alibaba.com> wrote: > > > > 在 2024/1/18 下午8:01, Zhu Yanjun 写道: > > > > 在 2024/1/16 20:04, Paolo Abeni 写道: > >> On Mon, 2024-01-15 at 09:29 +0800, Zhu Yanjun wrote: > >>> From: Zhu Yanjun <yanjun.zhu@linux.dev> > >>> > >>> Some devices emulate the virtio_net hardwares. When virtio_net > >>> driver sends commands to the emulated hardware, normally the > >>> hardware needs time to response. Sometimes the time is very > >>> long. Thus, the following will appear. Then the whole system > >>> will hang. > >>> The similar problems also occur in Intel NICs and Mellanox NICs. > >>> As such, the similar solution is borrowed from them. A timeout > >>> value is added and the timeout value as large as possible is set > >>> to ensure that the driver gets the maximum possible response from > >>> the hardware. > >>> > >>> " > >>> [ 213.795860] watchdog: BUG: soft lockup - CPU#108 stuck for 26s! > >>> [(udev-worker):3157] > >>> [ 213.796114] Modules linked in: virtio_net(+) net_failover > >>> failover qrtr rfkill sunrpc intel_rapl_msr intel_rapl_common > >>> intel_uncore_frequency intel_uncore_frequency_common intel_ifs > >>> i10nm_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp > >>> coretemp iTCO_wdt rapl intel_pmc_bxt dax_hmem iTCO_vendor_support > >>> vfat cxl_acpi intel_cstate pmt_telemetry pmt_class intel_sdsi joydev > >>> intel_uncore cxl_core fat pcspkr mei_me isst_if_mbox_pci > >>> isst_if_mmio idxd i2c_i801 isst_if_common mei intel_vsec idxd_bus > >>> i2c_smbus i2c_ismt ipmi_ssif acpi_ipmi ipmi_si ipmi_devintf > >>> ipmi_msghandler acpi_pad acpi_power_meter pfr_telemetry pfr_update > >>> fuse loop zram xfs crct10dif_pclmul crc32_pclmul crc32c_intel > >>> polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 > >>> bnxt_en sha256_ssse3 sha1_ssse3 nvme ast nvme_core i2c_algo_bit wmi > >>> pinctrl_emmitsburg scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_multipath > >>> [ 213.796194] irq event stamp: 67740 > >>> [ 213.796195] hardirqs last enabled at (67739): > >>> [<ffffffff8c2015ca>] asm_sysvec_apic_timer_interrupt+0x1a/0x20 > >>> [ 213.796203] hardirqs last disabled at (67740): > >>> [<ffffffff8c14108e>] sysvec_apic_timer_interrupt+0xe/0x90 > >>> [ 213.796208] softirqs last enabled at (67686): > >>> [<ffffffff8b12115e>] __irq_exit_rcu+0xbe/0xe0 > >>> [ 213.796214] softirqs last disabled at (67681): > >>> [<ffffffff8b12115e>] __irq_exit_rcu+0xbe/0xe0 > >>> [ 213.796217] CPU: 108 PID: 3157 Comm: (udev-worker) Kdump: loaded > >>> Not tainted 6.7.0+ #9 > >>> [ 213.796220] Hardware name: Intel Corporation > >>> M50FCP2SBSTD/M50FCP2SBSTD, BIOS SE5C741.86B.01.01.0001.2211140926 > >>> 11/14/2022 > >>> [ 213.796221] RIP: 0010:virtqueue_get_buf_ctx_split+0x8d/0x110 > >>> [ 213.796228] Code: 89 df e8 26 fe ff ff 0f b7 43 50 83 c0 01 66 89 > >>> 43 50 f6 43 78 01 75 12 80 7b 42 00 48 8b 4b 68 8b 53 58 74 0f 66 87 > >>> 44 51 04 <48> 89 e8 5b 5d c3 cc cc cc cc 66 89 44 51 04 0f ae f0 48 > >>> 89 e8 5b > >>> [ 213.796230] RSP: 0018:ff4bbb362306f9b0 EFLAGS: 00000246 > >>> [ 213.796233] RAX: 0000000000000000 RBX: ff2f15095896f000 RCX: > >>> 
0000000000000001 > >>> [ 213.796235] RDX: 0000000000000000 RSI: ff4bbb362306f9cc RDI: > >>> ff2f15095896f000 > >>> [ 213.796236] RBP: 0000000000000000 R08: 0000000000000000 R09: > >>> 0000000000000000 > >>> [ 213.796237] R10: 0000000000000003 R11: ff2f15095893cc40 R12: > >>> 0000000000000002 > >>> [ 213.796239] R13: 0000000000000004 R14: 0000000000000000 R15: > >>> ff2f1509534f3000 > >>> [ 213.796240] FS: 00007f775847d0c0(0000) GS:ff2f1528bac00000(0000) > >>> knlGS:0000000000000000 > >>> [ 213.796242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >>> [ 213.796243] CR2: 0000557f987b6e70 CR3: 0000002098602006 CR4: > >>> 0000000000f71ef0 > >>> [ 213.796245] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > >>> 0000000000000000 > >>> [ 213.796246] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: > >>> 0000000000000400 > >>> [ 213.796247] PKRU: 55555554 > >>> [ 213.796249] Call Trace: > >>> [ 213.796250] <IRQ> > >>> [ 213.796252] ? watchdog_timer_fn+0x1c0/0x220 > >>> [ 213.796258] ? __pfx_watchdog_timer_fn+0x10/0x10 > >>> [ 213.796261] ? __hrtimer_run_queues+0x1af/0x380 > >>> [ 213.796269] ? hrtimer_interrupt+0xf8/0x230 > >>> [ 213.796274] ? __sysvec_apic_timer_interrupt+0x64/0x1a0 > >>> [ 213.796279] ? sysvec_apic_timer_interrupt+0x6d/0x90 > >>> [ 213.796282] </IRQ> > >>> [ 213.796284] <TASK> > >>> [ 213.796285] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 > >>> [ 213.796293] ? virtqueue_get_buf_ctx_split+0x8d/0x110 > >>> [ 213.796297] virtnet_send_command+0x18a/0x1f0 [virtio_net] > >>> [ 213.796310] _virtnet_set_queues+0xc6/0x120 [virtio_net] > >>> [ 213.796319] virtnet_probe+0xa06/0xd50 [virtio_net] > >>> [ 213.796328] virtio_dev_probe+0x195/0x230 > >>> [ 213.796333] really_probe+0x19f/0x400 > >>> [ 213.796338] ? __pfx___driver_attach+0x10/0x10 > >>> [ 213.796340] __driver_probe_device+0x78/0x160 > >>> [ 213.796343] driver_probe_device+0x1f/0x90 > >>> [ 213.796346] __driver_attach+0xd6/0x1d0 > >>> [ 213.796349] bus_for_each_dev+0x8c/0xe0 > >>> [ 213.796355] bus_add_driver+0x119/0x220 > >>> [ 213.796359] driver_register+0x59/0x100 > >>> [ 213.796362] ? __pfx_virtio_net_driver_init+0x10/0x10 [virtio_net] > >>> [ 213.796369] virtio_net_driver_init+0x8e/0xff0 [virtio_net] > >>> [ 213.796375] do_one_initcall+0x6f/0x380 > >>> [ 213.796384] do_init_module+0x60/0x240 > >>> [ 213.796388] init_module_from_file+0x86/0xc0 > >>> [ 213.796396] idempotent_init_module+0x129/0x2c0 > >>> [ 213.796406] __x64_sys_finit_module+0x5e/0xb0 > >>> [ 213.796409] do_syscall_64+0x60/0xe0 > >>> [ 213.796415] ? do_syscall_64+0x6f/0xe0 > >>> [ 213.796418] ? lockdep_hardirqs_on_prepare+0xe4/0x1a0 > >>> [ 213.796424] ? do_syscall_64+0x6f/0xe0 > >>> [ 213.796427] ? 
do_syscall_64+0x6f/0xe0 > >>> [ 213.796431] entry_SYSCALL_64_after_hwframe+0x6e/0x76 > >>> [ 213.796435] RIP: 0033:0x7f7758f279cd > >>> [ 213.796465] Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e > >>> fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 > >>> 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 33 e4 0c 00 f7 d8 64 > >>> 89 01 48 > >>> [ 213.796467] RSP: 002b:00007ffe2cad8738 EFLAGS: 00000246 ORIG_RAX: > >>> 0000000000000139 > >>> [ 213.796469] RAX: ffffffffffffffda RBX: 0000557f987a8180 RCX: > >>> 00007f7758f279cd > >>> [ 213.796471] RDX: 0000000000000000 RSI: 00007f77593e5453 RDI: > >>> 000000000000000f > >>> [ 213.796472] RBP: 00007f77593e5453 R08: 0000000000000000 R09: > >>> 00007ffe2cad8860 > >>> [ 213.796473] R10: 000000000000000f R11: 0000000000000246 R12: > >>> 0000000000020000 > >>> [ 213.796475] R13: 0000557f9879f8e0 R14: 0000000000000000 R15: > >>> 0000557f98783aa0 > >>> [ 213.796482] </TASK> > >>> " > >>> > >>> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> > >>> --- > >>> drivers/net/virtio_net.c | 10 ++++++++-- > >>> 1 file changed, 8 insertions(+), 2 deletions(-) > >>> > >>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > >>> index 51b1868d2f22..28b7dd917a43 100644 > >>> --- a/drivers/net/virtio_net.c > >>> +++ b/drivers/net/virtio_net.c > >>> @@ -2468,7 +2468,7 @@ static bool virtnet_send_command(struct > >>> virtnet_info *vi, u8 class, u8 cmd, > >>> { > >>> struct scatterlist *sgs[4], hdr, stat; > >>> unsigned out_num = 0, tmp; > >>> - int ret; > >>> + int ret, timeout = 200; > >>> /* Caller should know better */ > >>> BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ)); > >>> @@ -2502,8 +2502,14 @@ static bool virtnet_send_command(struct > >>> virtnet_info *vi, u8 class, u8 cmd, > >>> * into the hypervisor, so the request should be handled > >>> immediately. > >>> */ > >>> while (!virtqueue_get_buf(vi->cvq, &tmp) && > >>> - !virtqueue_is_broken(vi->cvq)) > >>> + !virtqueue_is_broken(vi->cvq)) { > >>> + if (timeout) > >>> + timeout--; > >> This is not really a timeout, just a loop counter. 200 iterations could > >> be a very short time on reasonable H/W. I guess this avoid the soft > >> lockup, but possibly (likely?) breaks the functionality when we need to > >> loop for some non negligible time. > >> > >> I fear we need a more complex solution, as mentioned by Micheal in the > >> thread you quoted. > > > > Got it. I also look forward to the more complex solution to this problem. > > Can we add a device capability (new feature bit) such as > ctrq_wait_timeout to get a reasonable timeout? This adds another kind of complexity for migration compatibility. And we need to make it more general, e.g 1) it should not be cvq specific 2) or we can have a timeout that works for all queues ? Thanks > > Thanks, > Heng > > > > > Zhu Yanjun > > > >> > >> Cheers, > >> > >> Paolo > >> > ^ permalink raw reply [flat|nested] 29+ messages in thread
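To make the "loop counter vs. timeout" point quoted above concrete, a real time-based bound on the polling loop in virtnet_send_command() could look like the sketch below. The 30 second budget is arbitrary, and as noted in the thread a timeout alone still leaves the false-positive and recovery questions open:

/* Poll for the device's answer, but give up after a wall-clock
 * deadline rather than a fixed iteration count.
 */
static bool virtnet_wait_ctrl_ack(struct virtnet_info *vi)
{
	unsigned int tmp;
	unsigned long deadline = jiffies + 30 * HZ;	/* arbitrary budget */

	while (!virtqueue_get_buf(vi->cvq, &tmp) &&
	       !virtqueue_is_broken(vi->cvq)) {
		if (time_after(jiffies, deadline)) {
			dev_warn(&vi->vdev->dev,
				 "control command timed out\n");
			return false;	/* device state is now unknown */
		}
		cpu_relax();
	}

	return !virtqueue_is_broken(vi->cvq);
}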
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang 2024-01-22 3:08 ` Jason Wang @ 2024-01-22 4:42 ` Heng Qi 0 siblings, 0 replies; 29+ messages in thread From: Heng Qi @ 2024-01-22 4:42 UTC (permalink / raw) To: Jason Wang Cc: Zhu Yanjun, Paolo Abeni, Zhu Yanjun, mst, xuanzhuo, davem, edumazet, kuba, virtualization, netdev 在 2024/1/22 上午11:08, Jason Wang 写道: > On Fri, Jan 19, 2024 at 10:27 PM Heng Qi <hengqi@linux.alibaba.com> wrote: >> >> >> 在 2024/1/18 下午8:01, Zhu Yanjun 写道: >>> 在 2024/1/16 20:04, Paolo Abeni 写道: >>>> On Mon, 2024-01-15 at 09:29 +0800, Zhu Yanjun wrote: >>>>> From: Zhu Yanjun <yanjun.zhu@linux.dev> >>>>> >>>>> Some devices emulate the virtio_net hardwares. When virtio_net >>>>> driver sends commands to the emulated hardware, normally the >>>>> hardware needs time to response. Sometimes the time is very >>>>> long. Thus, the following will appear. Then the whole system >>>>> will hang. >>>>> The similar problems also occur in Intel NICs and Mellanox NICs. >>>>> As such, the similar solution is borrowed from them. A timeout >>>>> value is added and the timeout value as large as possible is set >>>>> to ensure that the driver gets the maximum possible response from >>>>> the hardware. >>>>> >>>>> " >>>>> [ 213.795860] watchdog: BUG: soft lockup - CPU#108 stuck for 26s! >>>>> [(udev-worker):3157] >>>>> [ 213.796114] Modules linked in: virtio_net(+) net_failover >>>>> failover qrtr rfkill sunrpc intel_rapl_msr intel_rapl_common >>>>> intel_uncore_frequency intel_uncore_frequency_common intel_ifs >>>>> i10nm_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp >>>>> coretemp iTCO_wdt rapl intel_pmc_bxt dax_hmem iTCO_vendor_support >>>>> vfat cxl_acpi intel_cstate pmt_telemetry pmt_class intel_sdsi joydev >>>>> intel_uncore cxl_core fat pcspkr mei_me isst_if_mbox_pci >>>>> isst_if_mmio idxd i2c_i801 isst_if_common mei intel_vsec idxd_bus >>>>> i2c_smbus i2c_ismt ipmi_ssif acpi_ipmi ipmi_si ipmi_devintf >>>>> ipmi_msghandler acpi_pad acpi_power_meter pfr_telemetry pfr_update >>>>> fuse loop zram xfs crct10dif_pclmul crc32_pclmul crc32c_intel >>>>> polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 >>>>> bnxt_en sha256_ssse3 sha1_ssse3 nvme ast nvme_core i2c_algo_bit wmi >>>>> pinctrl_emmitsburg scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_multipath >>>>> [ 213.796194] irq event stamp: 67740 >>>>> [ 213.796195] hardirqs last enabled at (67739): >>>>> [<ffffffff8c2015ca>] asm_sysvec_apic_timer_interrupt+0x1a/0x20 >>>>> [ 213.796203] hardirqs last disabled at (67740): >>>>> [<ffffffff8c14108e>] sysvec_apic_timer_interrupt+0xe/0x90 >>>>> [ 213.796208] softirqs last enabled at (67686): >>>>> [<ffffffff8b12115e>] __irq_exit_rcu+0xbe/0xe0 >>>>> [ 213.796214] softirqs last disabled at (67681): >>>>> [<ffffffff8b12115e>] __irq_exit_rcu+0xbe/0xe0 >>>>> [ 213.796217] CPU: 108 PID: 3157 Comm: (udev-worker) Kdump: loaded >>>>> Not tainted 6.7.0+ #9 >>>>> [ 213.796220] Hardware name: Intel Corporation >>>>> M50FCP2SBSTD/M50FCP2SBSTD, BIOS SE5C741.86B.01.01.0001.2211140926 >>>>> 11/14/2022 >>>>> [ 213.796221] RIP: 0010:virtqueue_get_buf_ctx_split+0x8d/0x110 >>>>> [ 213.796228] Code: 89 df e8 26 fe ff ff 0f b7 43 50 83 c0 01 66 89 >>>>> 43 50 f6 43 78 01 75 12 80 7b 42 00 48 8b 4b 68 8b 53 58 74 0f 66 87 >>>>> 44 51 04 <48> 89 e8 5b 5d c3 cc cc cc cc 66 89 44 51 04 0f ae f0 48 >>>>> 89 e8 5b >>>>> [ 213.796230] RSP: 0018:ff4bbb362306f9b0 EFLAGS: 00000246 >>>>> [ 213.796233] RAX: 0000000000000000 RBX: ff2f15095896f000 RCX: >>>>> 0000000000000001 >>>>> [ 
213.796235] RDX: 0000000000000000 RSI: ff4bbb362306f9cc RDI: >>>>> ff2f15095896f000 >>>>> [ 213.796236] RBP: 0000000000000000 R08: 0000000000000000 R09: >>>>> 0000000000000000 >>>>> [ 213.796237] R10: 0000000000000003 R11: ff2f15095893cc40 R12: >>>>> 0000000000000002 >>>>> [ 213.796239] R13: 0000000000000004 R14: 0000000000000000 R15: >>>>> ff2f1509534f3000 >>>>> [ 213.796240] FS: 00007f775847d0c0(0000) GS:ff2f1528bac00000(0000) >>>>> knlGS:0000000000000000 >>>>> [ 213.796242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>> [ 213.796243] CR2: 0000557f987b6e70 CR3: 0000002098602006 CR4: >>>>> 0000000000f71ef0 >>>>> [ 213.796245] DR0: 0000000000000000 DR1: 0000000000000000 DR2: >>>>> 0000000000000000 >>>>> [ 213.796246] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: >>>>> 0000000000000400 >>>>> [ 213.796247] PKRU: 55555554 >>>>> [ 213.796249] Call Trace: >>>>> [ 213.796250] <IRQ> >>>>> [ 213.796252] ? watchdog_timer_fn+0x1c0/0x220 >>>>> [ 213.796258] ? __pfx_watchdog_timer_fn+0x10/0x10 >>>>> [ 213.796261] ? __hrtimer_run_queues+0x1af/0x380 >>>>> [ 213.796269] ? hrtimer_interrupt+0xf8/0x230 >>>>> [ 213.796274] ? __sysvec_apic_timer_interrupt+0x64/0x1a0 >>>>> [ 213.796279] ? sysvec_apic_timer_interrupt+0x6d/0x90 >>>>> [ 213.796282] </IRQ> >>>>> [ 213.796284] <TASK> >>>>> [ 213.796285] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 >>>>> [ 213.796293] ? virtqueue_get_buf_ctx_split+0x8d/0x110 >>>>> [ 213.796297] virtnet_send_command+0x18a/0x1f0 [virtio_net] >>>>> [ 213.796310] _virtnet_set_queues+0xc6/0x120 [virtio_net] >>>>> [ 213.796319] virtnet_probe+0xa06/0xd50 [virtio_net] >>>>> [ 213.796328] virtio_dev_probe+0x195/0x230 >>>>> [ 213.796333] really_probe+0x19f/0x400 >>>>> [ 213.796338] ? __pfx___driver_attach+0x10/0x10 >>>>> [ 213.796340] __driver_probe_device+0x78/0x160 >>>>> [ 213.796343] driver_probe_device+0x1f/0x90 >>>>> [ 213.796346] __driver_attach+0xd6/0x1d0 >>>>> [ 213.796349] bus_for_each_dev+0x8c/0xe0 >>>>> [ 213.796355] bus_add_driver+0x119/0x220 >>>>> [ 213.796359] driver_register+0x59/0x100 >>>>> [ 213.796362] ? __pfx_virtio_net_driver_init+0x10/0x10 [virtio_net] >>>>> [ 213.796369] virtio_net_driver_init+0x8e/0xff0 [virtio_net] >>>>> [ 213.796375] do_one_initcall+0x6f/0x380 >>>>> [ 213.796384] do_init_module+0x60/0x240 >>>>> [ 213.796388] init_module_from_file+0x86/0xc0 >>>>> [ 213.796396] idempotent_init_module+0x129/0x2c0 >>>>> [ 213.796406] __x64_sys_finit_module+0x5e/0xb0 >>>>> [ 213.796409] do_syscall_64+0x60/0xe0 >>>>> [ 213.796415] ? do_syscall_64+0x6f/0xe0 >>>>> [ 213.796418] ? lockdep_hardirqs_on_prepare+0xe4/0x1a0 >>>>> [ 213.796424] ? do_syscall_64+0x6f/0xe0 >>>>> [ 213.796427] ? 
do_syscall_64+0x6f/0xe0 >>>>> [ 213.796431] entry_SYSCALL_64_after_hwframe+0x6e/0x76 >>>>> [ 213.796435] RIP: 0033:0x7f7758f279cd >>>>> [ 213.796465] Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e >>>>> fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 >>>>> 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 33 e4 0c 00 f7 d8 64 >>>>> 89 01 48 >>>>> [ 213.796467] RSP: 002b:00007ffe2cad8738 EFLAGS: 00000246 ORIG_RAX: >>>>> 0000000000000139 >>>>> [ 213.796469] RAX: ffffffffffffffda RBX: 0000557f987a8180 RCX: >>>>> 00007f7758f279cd >>>>> [ 213.796471] RDX: 0000000000000000 RSI: 00007f77593e5453 RDI: >>>>> 000000000000000f >>>>> [ 213.796472] RBP: 00007f77593e5453 R08: 0000000000000000 R09: >>>>> 00007ffe2cad8860 >>>>> [ 213.796473] R10: 000000000000000f R11: 0000000000000246 R12: >>>>> 0000000000020000 >>>>> [ 213.796475] R13: 0000557f9879f8e0 R14: 0000000000000000 R15: >>>>> 0000557f98783aa0 >>>>> [ 213.796482] </TASK> >>>>> " >>>>> >>>>> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> >>>>> --- >>>>> drivers/net/virtio_net.c | 10 ++++++++-- >>>>> 1 file changed, 8 insertions(+), 2 deletions(-) >>>>> >>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c >>>>> index 51b1868d2f22..28b7dd917a43 100644 >>>>> --- a/drivers/net/virtio_net.c >>>>> +++ b/drivers/net/virtio_net.c >>>>> @@ -2468,7 +2468,7 @@ static bool virtnet_send_command(struct >>>>> virtnet_info *vi, u8 class, u8 cmd, >>>>> { >>>>> struct scatterlist *sgs[4], hdr, stat; >>>>> unsigned out_num = 0, tmp; >>>>> - int ret; >>>>> + int ret, timeout = 200; >>>>> /* Caller should know better */ >>>>> BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ)); >>>>> @@ -2502,8 +2502,14 @@ static bool virtnet_send_command(struct >>>>> virtnet_info *vi, u8 class, u8 cmd, >>>>> * into the hypervisor, so the request should be handled >>>>> immediately. >>>>> */ >>>>> while (!virtqueue_get_buf(vi->cvq, &tmp) && >>>>> - !virtqueue_is_broken(vi->cvq)) >>>>> + !virtqueue_is_broken(vi->cvq)) { >>>>> + if (timeout) >>>>> + timeout--; >>>> This is not really a timeout, just a loop counter. 200 iterations could >>>> be a very short time on reasonable H/W. I guess this avoid the soft >>>> lockup, but possibly (likely?) breaks the functionality when we need to >>>> loop for some non negligible time. >>>> >>>> I fear we need a more complex solution, as mentioned by Micheal in the >>>> thread you quoted. >>> Got it. I also look forward to the more complex solution to this problem. >> Can we add a device capability (new feature bit) such as >> ctrq_wait_timeout to get a reasonable timeout? > This adds another kind of complexity for migration compatibility. Yes, this requires device adaptation. But the value will be more reasonable, different devices have different requirements for timeout. > > And we need to make it more general, e.g > > 1) it should not be cvq specific Agree. Especially considering avq also has this problem. > 2) or we can have a timeout that works for all queues I remember Parav mentioned that they were also making some bulk improvements to ctrlq/avq, or we should look at a combined solution. rxq timeout can be used for our detection aliveness needs, especially after a live migration. txq timeout can be used with watchdog + vq reset. Thanks, Heng > ? > > Thanks > >> Thanks, >> Heng >> >>> Zhu Yanjun >>> >>>> Cheers, >>>> >>>> Paolo >>>> ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang 2024-01-16 12:04 ` Paolo Abeni 2024-01-18 12:01 ` Zhu Yanjun @ 2024-01-18 13:14 ` Michael S. Tsirkin 2024-01-19 1:42 ` Xuan Zhuo 1 sibling, 1 reply; 29+ messages in thread From: Michael S. Tsirkin @ 2024-01-18 13:14 UTC (permalink / raw) To: Paolo Abeni Cc: Zhu Yanjun, jasowang, xuanzhuo, davem, edumazet, kuba, virtualization, netdev, Zhu Yanjun On Tue, Jan 16, 2024 at 01:04:49PM +0100, Paolo Abeni wrote: > On Mon, 2024-01-15 at 09:29 +0800, Zhu Yanjun wrote: > > From: Zhu Yanjun <yanjun.zhu@linux.dev> > > > > Some devices emulate the virtio_net hardwares. When virtio_net > > driver sends commands to the emulated hardware, normally the > > hardware needs time to response. Sometimes the time is very > > long. Thus, the following will appear. Then the whole system > > will hang. > > The similar problems also occur in Intel NICs and Mellanox NICs. > > As such, the similar solution is borrowed from them. A timeout > > value is added and the timeout value as large as possible is set > > to ensure that the driver gets the maximum possible response from > > the hardware. > > > > " > > [ 213.795860] watchdog: BUG: soft lockup - CPU#108 stuck for 26s! [(udev-worker):3157] > > [ 213.796114] Modules linked in: virtio_net(+) net_failover failover qrtr rfkill sunrpc intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common intel_ifs i10nm_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt rapl intel_pmc_bxt dax_hmem iTCO_vendor_support vfat cxl_acpi intel_cstate pmt_telemetry pmt_class intel_sdsi joydev intel_uncore cxl_core fat pcspkr mei_me isst_if_mbox_pci isst_if_mmio idxd i2c_i801 isst_if_common mei intel_vsec idxd_bus i2c_smbus i2c_ismt ipmi_ssif acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter pfr_telemetry pfr_update fuse loop zram xfs crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 bnxt_en sha256_ssse3 sha1_ssse3 nvme ast nvme_core i2c_algo_bit wmi pinctrl_emmitsburg scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_multipath > > [ 213.796194] irq event stamp: 67740 > > [ 213.796195] hardirqs last enabled at (67739): [<ffffffff8c2015ca>] asm_sysvec_apic_timer_interrupt+0x1a/0x20 > > [ 213.796203] hardirqs last disabled at (67740): [<ffffffff8c14108e>] sysvec_apic_timer_interrupt+0xe/0x90 > > [ 213.796208] softirqs last enabled at (67686): [<ffffffff8b12115e>] __irq_exit_rcu+0xbe/0xe0 > > [ 213.796214] softirqs last disabled at (67681): [<ffffffff8b12115e>] __irq_exit_rcu+0xbe/0xe0 > > [ 213.796217] CPU: 108 PID: 3157 Comm: (udev-worker) Kdump: loaded Not tainted 6.7.0+ #9 > > [ 213.796220] Hardware name: Intel Corporation M50FCP2SBSTD/M50FCP2SBSTD, BIOS SE5C741.86B.01.01.0001.2211140926 11/14/2022 > > [ 213.796221] RIP: 0010:virtqueue_get_buf_ctx_split+0x8d/0x110 > > [ 213.796228] Code: 89 df e8 26 fe ff ff 0f b7 43 50 83 c0 01 66 89 43 50 f6 43 78 01 75 12 80 7b 42 00 48 8b 4b 68 8b 53 58 74 0f 66 87 44 51 04 <48> 89 e8 5b 5d c3 cc cc cc cc 66 89 44 51 04 0f ae f0 48 89 e8 5b > > [ 213.796230] RSP: 0018:ff4bbb362306f9b0 EFLAGS: 00000246 > > [ 213.796233] RAX: 0000000000000000 RBX: ff2f15095896f000 RCX: 0000000000000001 > > [ 213.796235] RDX: 0000000000000000 RSI: ff4bbb362306f9cc RDI: ff2f15095896f000 > > [ 213.796236] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 > > [ 213.796237] R10: 0000000000000003 R11: ff2f15095893cc40 R12: 0000000000000002 > > [ 213.796239] R13: 
0000000000000004 R14: 0000000000000000 R15: ff2f1509534f3000 > > [ 213.796240] FS: 00007f775847d0c0(0000) GS:ff2f1528bac00000(0000) knlGS:0000000000000000 > > [ 213.796242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 213.796243] CR2: 0000557f987b6e70 CR3: 0000002098602006 CR4: 0000000000f71ef0 > > [ 213.796245] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > [ 213.796246] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 > > [ 213.796247] PKRU: 55555554 > > [ 213.796249] Call Trace: > > [ 213.796250] <IRQ> > > [ 213.796252] ? watchdog_timer_fn+0x1c0/0x220 > > [ 213.796258] ? __pfx_watchdog_timer_fn+0x10/0x10 > > [ 213.796261] ? __hrtimer_run_queues+0x1af/0x380 > > [ 213.796269] ? hrtimer_interrupt+0xf8/0x230 > > [ 213.796274] ? __sysvec_apic_timer_interrupt+0x64/0x1a0 > > [ 213.796279] ? sysvec_apic_timer_interrupt+0x6d/0x90 > > [ 213.796282] </IRQ> > > [ 213.796284] <TASK> > > [ 213.796285] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 > > [ 213.796293] ? virtqueue_get_buf_ctx_split+0x8d/0x110 > > [ 213.796297] virtnet_send_command+0x18a/0x1f0 [virtio_net] > > [ 213.796310] _virtnet_set_queues+0xc6/0x120 [virtio_net] > > [ 213.796319] virtnet_probe+0xa06/0xd50 [virtio_net] > > [ 213.796328] virtio_dev_probe+0x195/0x230 > > [ 213.796333] really_probe+0x19f/0x400 > > [ 213.796338] ? __pfx___driver_attach+0x10/0x10 > > [ 213.796340] __driver_probe_device+0x78/0x160 > > [ 213.796343] driver_probe_device+0x1f/0x90 > > [ 213.796346] __driver_attach+0xd6/0x1d0 > > [ 213.796349] bus_for_each_dev+0x8c/0xe0 > > [ 213.796355] bus_add_driver+0x119/0x220 > > [ 213.796359] driver_register+0x59/0x100 > > [ 213.796362] ? __pfx_virtio_net_driver_init+0x10/0x10 [virtio_net] > > [ 213.796369] virtio_net_driver_init+0x8e/0xff0 [virtio_net] > > [ 213.796375] do_one_initcall+0x6f/0x380 > > [ 213.796384] do_init_module+0x60/0x240 > > [ 213.796388] init_module_from_file+0x86/0xc0 > > [ 213.796396] idempotent_init_module+0x129/0x2c0 > > [ 213.796406] __x64_sys_finit_module+0x5e/0xb0 > > [ 213.796409] do_syscall_64+0x60/0xe0 > > [ 213.796415] ? do_syscall_64+0x6f/0xe0 > > [ 213.796418] ? lockdep_hardirqs_on_prepare+0xe4/0x1a0 > > [ 213.796424] ? do_syscall_64+0x6f/0xe0 > > [ 213.796427] ? 
do_syscall_64+0x6f/0xe0 > > [ 213.796431] entry_SYSCALL_64_after_hwframe+0x6e/0x76 > > [ 213.796435] RIP: 0033:0x7f7758f279cd > > [ 213.796465] Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 33 e4 0c 00 f7 d8 64 89 01 48 > > [ 213.796467] RSP: 002b:00007ffe2cad8738 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 > > [ 213.796469] RAX: ffffffffffffffda RBX: 0000557f987a8180 RCX: 00007f7758f279cd > > [ 213.796471] RDX: 0000000000000000 RSI: 00007f77593e5453 RDI: 000000000000000f > > [ 213.796472] RBP: 00007f77593e5453 R08: 0000000000000000 R09: 00007ffe2cad8860 > > [ 213.796473] R10: 000000000000000f R11: 0000000000000246 R12: 0000000000020000 > > [ 213.796475] R13: 0000557f9879f8e0 R14: 0000000000000000 R15: 0000557f98783aa0 > > [ 213.796482] </TASK> > > " > > > > Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> > > --- > > drivers/net/virtio_net.c | 10 ++++++++-- > > 1 file changed, 8 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > index 51b1868d2f22..28b7dd917a43 100644 > > --- a/drivers/net/virtio_net.c > > +++ b/drivers/net/virtio_net.c > > @@ -2468,7 +2468,7 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd, > > { > > struct scatterlist *sgs[4], hdr, stat; > > unsigned out_num = 0, tmp; > > - int ret; > > + int ret, timeout = 200; > > > > /* Caller should know better */ > > BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ)); > > @@ -2502,8 +2502,14 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd, > > * into the hypervisor, so the request should be handled immediately. > > */ > > while (!virtqueue_get_buf(vi->cvq, &tmp) && > > - !virtqueue_is_broken(vi->cvq)) > > + !virtqueue_is_broken(vi->cvq)) { > > + if (timeout) > > + timeout--; > > This is not really a timeout, just a loop counter. 200 iterations could > be a very short time on reasonable H/W. I guess this avoid the soft > lockup, but possibly (likely?) breaks the functionality when we need to > loop for some non negligible time. > > I fear we need a more complex solution, as mentioned by Micheal in the > thread you quoted. > > Cheers, > > Paolo So a bunch of devices just queue a request on a workqueue. Problem is, errors are not reported then. I really feel this is something netdev core should support better: return a "deferred" status and have netdev core wait outside a lock on a waitqueue. -- MST ^ permalink raw reply [flat|nested] 29+ messages in thread
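A sketch of the "deferred" pattern suggested here, with all names invented for the example: the driver queues the command and returns at once, and the core waits for the result on a completion outside any lock, so errors are still reported and a slow device cannot trigger a soft lockup.

struct virtnet_deferred {
	struct work_struct work;
	struct completion done;
	int status;
};

static void virtnet_deferred_fn(struct work_struct *work)
{
	struct virtnet_deferred *def =
		container_of(work, struct virtnet_deferred, work);

	/* The real cvq command would run here, outside any core lock. */
	def->status = 0;
	complete(&def->done);		/* report the outcome to the waiter */
}

static int virtnet_run_deferred(struct virtnet_deferred *def)
{
	INIT_WORK(&def->work, virtnet_deferred_fn);
	init_completion(&def->done);
	schedule_work(&def->work);

	/* The waiting side, conceptually netdev core: no lock is held
	 * while we sleep, and a timeout bounds a broken device.
	 */
	if (!wait_for_completion_timeout(&def->done, 30 * HZ))
		return -ETIMEDOUT;

	return def->status;
}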
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-18 13:14           ` Michael S. Tsirkin
@ 2024-01-19  1:42             ` Xuan Zhuo
  0 siblings, 0 replies; 29+ messages in thread
From: Xuan Zhuo @ 2024-01-19 1:42 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Zhu Yanjun, jasowang, davem, edumazet, kuba, virtualization,
	netdev, Zhu Yanjun, Paolo Abeni, Heng Qi

On Thu, 18 Jan 2024 08:14:21 -0500, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Tue, Jan 16, 2024 at 01:04:49PM +0100, Paolo Abeni wrote:
> > On Mon, 2024-01-15 at 09:29 +0800, Zhu Yanjun wrote:
> > > From: Zhu Yanjun <yanjun.zhu@linux.dev>
> > >
> > > [... commit message, soft-lockup trace and patch trimmed; they are
> > > quoted in full earlier in the thread ...]
> > >
> > >  	while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> > > -	       !virtqueue_is_broken(vi->cvq))
> > > +	       !virtqueue_is_broken(vi->cvq)) {
> > > +		if (timeout)
> > > +			timeout--;
> >
> > This is not really a timeout, just a loop counter. 200 iterations could
> > be a very short time on reasonable H/W. I guess this avoids the soft
> > lockup, but possibly (likely?) breaks the functionality when we need to
> > loop for some non-negligible time.
> >
> > I fear we need a more complex solution, as mentioned by Michael in the
> > thread you quoted.
> >
> > Cheers,
> >
> > Paolo
>
> So a bunch of devices just queue a request on a workqueue. Problem is,
> errors are not reported then. I really feel this is something netdev
> core should support better: return a "deferred" status and have netdev
> core wait outside a lock on a waitqueue.

And on the other side, more and more requests will be posted to the
cvq, so we should refactor this into an async mode. netdim now has this
requirement.

@Heng Qi

Thanks.

> --
> MST

^ permalink raw reply	[flat|nested] 29+ messages in thread
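For readers following the async suggestion: the deferral pattern
Michael describes, with the error-reporting gap closed by a completion
the caller can sleep on, might look roughly like the sketch below. All
of the names (cvq_cmd, cvq_cmd_work, cvq_send_deferred) are
hypothetical, invented only for illustration; the workqueue and
completion primitives are real kernel APIs, but the 30-second timeout
and the overall structure are assumptions, not the driver's actual
code.

	#include <linux/workqueue.h>
	#include <linux/completion.h>
	#include <linux/jiffies.h>
	#include <linux/errno.h>

	/* Hypothetical request wrapper; not part of virtio_net. */
	struct cvq_cmd {
		struct work_struct work;
		struct completion done;
		int status;
	};

	static void cvq_cmd_work(struct work_struct *work)
	{
		struct cvq_cmd *cmd = container_of(work, struct cvq_cmd, work);

		/* ... submit to the control virtqueue and poll for the
		 * reply here, away from the caller's locks ... */
		cmd->status = 0;	/* result from the device, so errors survive */
		complete(&cmd->done);	/* wake the waiting caller */
	}

	/* Queue the command, then sleep until it completes or times out. */
	static int cvq_send_deferred(struct workqueue_struct *wq,
				     struct cvq_cmd *cmd)
	{
		INIT_WORK(&cmd->work, cvq_cmd_work);
		init_completion(&cmd->done);
		queue_work(wq, &cmd->work);

		/* Sleeping wait, outside any spinlock; an unresponsive
		 * device yields -ETIMEDOUT instead of a soft lockup. */
		if (!wait_for_completion_timeout(&cmd->done,
						 msecs_to_jiffies(30 * 1000)))
			return -ETIMEDOUT;

		return cmd->status;
	}

On timeout, real code would still own the in-flight work item, so it
would need cancel_work_sync() or reference counting before freeing the
command; that lifetime question is exactly where the "deferred" status
support in netdev core that Michael asks for would help.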
end of thread

Thread overview: 29+ messages
2024-01-15 1:29 [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang Zhu Yanjun
2024-01-15 2:20 ` Jason Wang
2024-01-15 10:25 ` Zhu Yanjun
[not found] ` <6cf2699a-483d-4124-9782-b6a771a41e70@linux.dev>
2024-01-22 3:06 ` Jason Wang
2024-01-16 12:04 ` Paolo Abeni
2024-01-18 12:01 ` Zhu Yanjun
2024-01-19 14:27 ` Heng Qi
2024-01-19 17:29 ` Andrew Lunn
2024-01-20 4:21 ` Zhu Yanjun
2024-01-22 2:12 ` Zhu Yanjun
2024-01-22 3:14 ` Jason Wang
2024-01-22 3:58 ` Xuan Zhuo
2024-01-22 4:16 ` Jason Wang
2024-01-22 6:16 ` Xuan Zhuo
2024-01-22 6:55 ` Jason Wang
2024-01-22 6:58 ` Jason Wang
2024-01-22 7:02 ` Xuan Zhuo
2024-01-22 7:19 ` Jason Wang
2024-01-22 7:25 ` Xuan Zhuo
2024-01-22 7:57 ` Jason Wang
2024-01-22 8:01 ` Xuan Zhuo
2024-01-22 8:32 ` Jason Wang
2024-01-22 9:11 ` Xuan Zhuo
[not found] ` <e46d04d7-4eb7-4fcd-821a-d558c07531b7@linux.dev>
2024-01-26 3:13 ` Zhu Yanjun
2024-01-22 7:01 ` Xuan Zhuo
2024-01-22 3:08 ` Jason Wang
2024-01-22 4:42 ` Heng Qi
2024-01-18 13:14 ` Michael S. Tsirkin
2024-01-19 1:42 ` Xuan Zhuo