* [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
@ 2024-01-15  1:29 Zhu Yanjun
  2024-01-15  2:20 ` Jason Wang
  2024-01-16 12:04 ` Paolo Abeni
  0 siblings, 2 replies; 29+ messages in thread
From: Zhu Yanjun @ 2024-01-15  1:29 UTC (permalink / raw)
  To: mst, jasowang, xuanzhuo, davem, edumazet, kuba, pabeni,
      virtualization, netdev
  Cc: Zhu Yanjun

From: Zhu Yanjun <yanjun.zhu@linux.dev>

Some devices emulate virtio_net hardware. When the virtio_net driver
sends commands to such emulated hardware, the hardware normally needs
time to respond, and sometimes that time is very long. When it is, the
soft lockup below appears and the whole system hangs.

Similar problems also occur with Intel NICs and Mellanox NICs, so a
similar solution is borrowed from their drivers: a timeout value is
added, set as large as possible, so that the driver waits as long as it
reasonably can for a response from the hardware before giving up.

"
[ 213.795860] watchdog: BUG: soft lockup - CPU#108 stuck for 26s! [(udev-worker):3157]
[ 213.796114] Modules linked in: virtio_net(+) net_failover failover qrtr rfkill sunrpc intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common intel_ifs i10nm_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt rapl intel_pmc_bxt dax_hmem iTCO_vendor_support vfat cxl_acpi intel_cstate pmt_telemetry pmt_class intel_sdsi joydev intel_uncore cxl_core fat pcspkr mei_me isst_if_mbox_pci isst_if_mmio idxd i2c_i801 isst_if_common mei intel_vsec idxd_bus i2c_smbus i2c_ismt ipmi_ssif acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter pfr_telemetry pfr_update fuse loop zram xfs crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 bnxt_en sha256_ssse3 sha1_ssse3 nvme ast nvme_core i2c_algo_bit wmi pinctrl_emmitsburg scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_multipath
[ 213.796194] irq event stamp: 67740
[ 213.796195] hardirqs last enabled at (67739): [<ffffffff8c2015ca>] asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 213.796203] hardirqs last disabled at (67740): [<ffffffff8c14108e>] sysvec_apic_timer_interrupt+0xe/0x90
[ 213.796208] softirqs last enabled at (67686): [<ffffffff8b12115e>] __irq_exit_rcu+0xbe/0xe0
[ 213.796214] softirqs last disabled at (67681): [<ffffffff8b12115e>] __irq_exit_rcu+0xbe/0xe0
[ 213.796217] CPU: 108 PID: 3157 Comm: (udev-worker) Kdump: loaded Not tainted 6.7.0+ #9
[ 213.796220] Hardware name: Intel Corporation M50FCP2SBSTD/M50FCP2SBSTD, BIOS SE5C741.86B.01.01.0001.2211140926 11/14/2022
[ 213.796221] RIP: 0010:virtqueue_get_buf_ctx_split+0x8d/0x110
[ 213.796228] Code: 89 df e8 26 fe ff ff 0f b7 43 50 83 c0 01 66 89 43 50 f6 43 78 01 75 12 80 7b 42 00 48 8b 4b 68 8b 53 58 74 0f 66 87 44 51 04 <48> 89 e8 5b 5d c3 cc cc cc cc 66 89 44 51 04 0f ae f0 48 89 e8 5b
[ 213.796230] RSP: 0018:ff4bbb362306f9b0 EFLAGS: 00000246
[ 213.796233] RAX: 0000000000000000 RBX: ff2f15095896f000 RCX: 0000000000000001
[ 213.796235] RDX: 0000000000000000 RSI: ff4bbb362306f9cc RDI: ff2f15095896f000
[ 213.796236] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 213.796237] R10: 0000000000000003 R11: ff2f15095893cc40 R12: 0000000000000002
[ 213.796239] R13: 0000000000000004 R14: 0000000000000000 R15: ff2f1509534f3000
[ 213.796240] FS:  00007f775847d0c0(0000) GS:ff2f1528bac00000(0000) knlGS:0000000000000000
[ 213.796242] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 213.796243] CR2: 0000557f987b6e70 CR3: 0000002098602006 CR4: 0000000000f71ef0
[ 213.796245] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 213.796246] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 213.796247] PKRU: 55555554
[ 213.796249] Call Trace:
[ 213.796250]  <IRQ>
[ 213.796252]  ? watchdog_timer_fn+0x1c0/0x220
[ 213.796258]  ? __pfx_watchdog_timer_fn+0x10/0x10
[ 213.796261]  ? __hrtimer_run_queues+0x1af/0x380
[ 213.796269]  ? hrtimer_interrupt+0xf8/0x230
[ 213.796274]  ? __sysvec_apic_timer_interrupt+0x64/0x1a0
[ 213.796279]  ? sysvec_apic_timer_interrupt+0x6d/0x90
[ 213.796282]  </IRQ>
[ 213.796284]  <TASK>
[ 213.796285]  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 213.796293]  ? virtqueue_get_buf_ctx_split+0x8d/0x110
[ 213.796297]  virtnet_send_command+0x18a/0x1f0 [virtio_net]
[ 213.796310]  _virtnet_set_queues+0xc6/0x120 [virtio_net]
[ 213.796319]  virtnet_probe+0xa06/0xd50 [virtio_net]
[ 213.796328]  virtio_dev_probe+0x195/0x230
[ 213.796333]  really_probe+0x19f/0x400
[ 213.796338]  ? __pfx___driver_attach+0x10/0x10
[ 213.796340]  __driver_probe_device+0x78/0x160
[ 213.796343]  driver_probe_device+0x1f/0x90
[ 213.796346]  __driver_attach+0xd6/0x1d0
[ 213.796349]  bus_for_each_dev+0x8c/0xe0
[ 213.796355]  bus_add_driver+0x119/0x220
[ 213.796359]  driver_register+0x59/0x100
[ 213.796362]  ? __pfx_virtio_net_driver_init+0x10/0x10 [virtio_net]
[ 213.796369]  virtio_net_driver_init+0x8e/0xff0 [virtio_net]
[ 213.796375]  do_one_initcall+0x6f/0x380
[ 213.796384]  do_init_module+0x60/0x240
[ 213.796388]  init_module_from_file+0x86/0xc0
[ 213.796396]  idempotent_init_module+0x129/0x2c0
[ 213.796406]  __x64_sys_finit_module+0x5e/0xb0
[ 213.796409]  do_syscall_64+0x60/0xe0
[ 213.796415]  ? do_syscall_64+0x6f/0xe0
[ 213.796418]  ? lockdep_hardirqs_on_prepare+0xe4/0x1a0
[ 213.796424]  ? do_syscall_64+0x6f/0xe0
[ 213.796427]  ? do_syscall_64+0x6f/0xe0
[ 213.796431]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 213.796435] RIP: 0033:0x7f7758f279cd
[ 213.796465] Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 33 e4 0c 00 f7 d8 64 89 01 48
[ 213.796467] RSP: 002b:00007ffe2cad8738 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 213.796469] RAX: ffffffffffffffda RBX: 0000557f987a8180 RCX: 00007f7758f279cd
[ 213.796471] RDX: 0000000000000000 RSI: 00007f77593e5453 RDI: 000000000000000f
[ 213.796472] RBP: 00007f77593e5453 R08: 0000000000000000 R09: 00007ffe2cad8860
[ 213.796473] R10: 000000000000000f R11: 0000000000000246 R12: 0000000000020000
[ 213.796475] R13: 0000557f9879f8e0 R14: 0000000000000000 R15: 0000557f98783aa0
[ 213.796482]  </TASK>
"

Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
---
 drivers/net/virtio_net.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 51b1868d2f22..28b7dd917a43 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2468,7 +2468,7 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
 {
 	struct scatterlist *sgs[4], hdr, stat;
 	unsigned out_num = 0, tmp;
-	int ret;
+	int ret, timeout = 200;
 
 	/* Caller should know better */
 	BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ));
@@ -2502,8 +2502,14 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
 	 * into the hypervisor, so the request should be handled immediately.
 	 */
 	while (!virtqueue_get_buf(vi->cvq, &tmp) &&
-	       !virtqueue_is_broken(vi->cvq))
+	       !virtqueue_is_broken(vi->cvq)) {
+		if (timeout)
+			timeout--;
+		else
+			return false; /* long time no response */
+
 		cpu_relax();
+	}
 
 	return vi->ctrl->status == VIRTIO_NET_OK;
 }
-- 
2.42.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-15  1:29 [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang Zhu Yanjun
@ 2024-01-15  2:20 ` Jason Wang
  2024-01-15 10:25   ` Zhu Yanjun
  [not found]       ` <6cf2699a-483d-4124-9782-b6a771a41e70@linux.dev>
  2024-01-16 12:04 ` Paolo Abeni
  1 sibling, 2 replies; 29+ messages in thread
From: Jason Wang @ 2024-01-15  2:20 UTC (permalink / raw)
  To: Zhu Yanjun
  Cc: mst, xuanzhuo, davem, edumazet, kuba, pabeni, virtualization,
      netdev, Zhu Yanjun

On Mon, Jan 15, 2024 at 9:35 AM Zhu Yanjun <yanjun.zhu@intel.com> wrote:
>
> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>
> Some devices emulate virtio_net hardware. When the virtio_net driver
> sends commands to such emulated hardware, the hardware normally needs
> time to respond, and sometimes that time is very long. When it is, the
> soft lockup below appears and the whole system hangs.
>
> Similar problems also occur with Intel NICs and Mellanox NICs, so a
> similar solution is borrowed from their drivers: a timeout value is
> added, set as large as possible, so that the driver waits as long as it
> reasonably can for a response from the hardware before giving up.
>
> [soft lockup trace snipped]
>
> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
> ---
>  drivers/net/virtio_net.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 51b1868d2f22..28b7dd917a43 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2468,7 +2468,7 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
>  {
>         struct scatterlist *sgs[4], hdr, stat;
>         unsigned out_num = 0, tmp;
> -       int ret;
> +       int ret, timeout = 200;

Any reason we choose this value or how can we know it works for all
types of devices?

An easier way is to use cond_resched(), but it may have side effects
as well [1]. Still, it seems less intrusive anyhow than the proposal
here?

Thanks

[1] https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60297.html

>
>         /* Caller should know better */
>         BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ));
> @@ -2502,8 +2502,14 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
>          * into the hypervisor, so the request should be handled immediately.
>          */
>         while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> -              !virtqueue_is_broken(vi->cvq))
> +              !virtqueue_is_broken(vi->cvq)) {
> +               if (timeout)
> +                       timeout--;
> +               else
> +                       return false; /* long time no response */
> +
>                 cpu_relax();
> +       }
>
>         return vi->ctrl->status == VIRTIO_NET_OK;
>  }
> --
> 2.42.0

^ permalink raw reply	[flat|nested] 29+ messages in thread
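For concreteness, the cond_resched() variant Jason points to in [1]
would amount to roughly the sketch below. This is only an illustration
of the idea, not the patch from that link; it keeps the wait unbounded
but yields the CPU so the soft-lockup watchdog no longer fires.
virtnet_send_command() runs in process context with the rtnl lock held,
so sleeping is allowed here.

	/* Sketch only: unbounded wait, but scheduler-friendly.
	 * Replaces cpu_relax() in the busy loop of virtnet_send_command(). */
	while (!virtqueue_get_buf(vi->cvq, &tmp) &&
	       !virtqueue_is_broken(vi->cvq))
		cond_resched();

As Zhu reports below, this trades the soft lockup for an ordinary
infinite wait: the task no longer trips the watchdog, but it still never
returns if the device never answers.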
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-15  2:20 ` Jason Wang
@ 2024-01-15 10:25   ` Zhu Yanjun
  [not found]       ` <6cf2699a-483d-4124-9782-b6a771a41e70@linux.dev>
  1 sibling, 0 replies; 29+ messages in thread
From: Zhu Yanjun @ 2024-01-15 10:25 UTC (permalink / raw)
  To: Jason Wang, Zhu Yanjun
  Cc: mst, xuanzhuo, davem, edumazet, kuba, pabeni, virtualization,
      netdev

On 2024/1/15 10:20, Jason Wang wrote:
> On Mon, Jan 15, 2024 at 9:35 AM Zhu Yanjun <yanjun.zhu@intel.com> wrote:
>> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>>
>> [patch and soft lockup trace snipped]
>>
>> -       int ret;
>> +       int ret, timeout = 200;
> Any reason we choose this value or how can we know it works for all
> types of devices?

Sorry, resending this because the first copy was rejected by netdev.

As I mentioned in the commit log, a similar problem also occurs in the
Intel and Mellanox NIC drivers. This commit borrows the solution from
the Intel NIC driver, so the value also comes from the Intel NIC driver
solution.

> An easier way is to use cond_resched(), but it may have side effects
> as well [1]. Still, it seems less intrusive anyhow than the proposal
> here?

Thanks a lot for your suggestions. I have tested the commits in the
link https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60297.html.

Because the virtio_net driver spins waiting for the hardware's
response, the module cannot be unloaded, and the kernel hang still
occurs when running "ip link" or when unloading the virtio_net module.

Zhu Yanjun

> Thanks
>
> [1] https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60297.html
>
>> [rest of the patch snipped]

^ permalink raw reply	[flat|nested] 29+ messages in thread
[parent not found: <6cf2699a-483d-4124-9782-b6a771a41e70@linux.dev>]
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  [not found]       ` <6cf2699a-483d-4124-9782-b6a771a41e70@linux.dev>
@ 2024-01-22  3:06         ` Jason Wang
  0 siblings, 0 replies; 29+ messages in thread
From: Jason Wang @ 2024-01-22  3:06 UTC (permalink / raw)
  To: Zhu Yanjun
  Cc: Zhu Yanjun, mst, xuanzhuo, davem, edumazet, kuba, pabeni,
      virtualization, netdev

On Mon, Jan 15, 2024 at 6:22 PM Zhu Yanjun <yanjun.zhu@linux.dev> wrote:
>
> On 2024/1/15 10:20, Jason Wang wrote:
> > On Mon, Jan 15, 2024 at 9:35 AM Zhu Yanjun <yanjun.zhu@intel.com> wrote:
> > >
> > > [patch and soft lockup trace snipped]
> > >
> > > -       int ret;
> > > +       int ret, timeout = 200;
> >
> > Any reason we choose this value or how can we know it works for all
> > types of devices?
>
> As I mentioned in the commit log, a similar problem also occurs in the
> Intel and Mellanox NIC drivers.
>
> This commit borrows the solution from the Intel NIC driver, so the
> value also comes from the Intel NIC driver solution.

Right, so basically I meant we need a solution that works for all
vendors.

> > An easier way is to use cond_resched(), but it may have side effects
> > as well [1]. Still, it seems less intrusive anyhow than the proposal
> > here?
>
> Thanks a lot for your suggestions. I have tested the commits in the
> link https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60297.html.
>
> Because the virtio_net driver spins waiting for the hardware's
> response, the module cannot be unloaded, and the kernel hang still
> occurs when running "ip link" or when unloading the virtio_net module.

Well, yes. It aims to solve the lockups as described in this commit
log. It doesn't solve the issue of infinite waiting etc.

Thanks

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-15  1:29 [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang Zhu Yanjun
  2024-01-15  2:20 ` Jason Wang
@ 2024-01-16 12:04 ` Paolo Abeni
  2024-01-18 12:01   ` Zhu Yanjun
  2024-01-18 13:14   ` Michael S. Tsirkin
  1 sibling, 2 replies; 29+ messages in thread
From: Paolo Abeni @ 2024-01-16 12:04 UTC (permalink / raw)
  To: Zhu Yanjun, mst, jasowang, xuanzhuo, davem, edumazet, kuba,
      virtualization, netdev
  Cc: Zhu Yanjun

On Mon, 2024-01-15 at 09:29 +0800, Zhu Yanjun wrote:
> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>
> Some devices emulate virtio_net hardware. When the virtio_net driver
> sends commands to such emulated hardware, the hardware normally needs
> time to respond, and sometimes that time is very long. When it is, the
> soft lockup below appears and the whole system hangs.
>
> [soft lockup trace and first hunk snipped]
>
> @@ -2502,8 +2502,14 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
>          * into the hypervisor, so the request should be handled immediately.
>          */
>         while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> -              !virtqueue_is_broken(vi->cvq))
> +              !virtqueue_is_broken(vi->cvq)) {
> +               if (timeout)
> +                       timeout--;

This is not really a timeout, just a loop counter. 200 iterations could
be a very short time on reasonable H/W. I guess this avoids the soft
lockup, but possibly (likely?) breaks the functionality when we need to
loop for some non-negligible time.

I fear we need a more complex solution, as mentioned by Michael in the
thread you quoted.

Cheers,

Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread
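To make Paolo's distinction concrete: a loop counter bounds iterations,
while a timeout bounds wall-clock time. A time-based bound would look
something like the sketch below. It is illustrative only:
VIRTNET_CVQ_TO_MS is a made-up constant (not something the driver or
device defines), and this variant still busy-waits; it merely guarantees
the loop exits after a fixed amount of real time regardless of CPU
speed.

	/* Sketch: bound the wait by time rather than by iterations.
	 * 200 loop iterations can pass in well under a microsecond on
	 * fast hardware; 200 ms of jiffies cannot. */
	unsigned long end = jiffies + msecs_to_jiffies(VIRTNET_CVQ_TO_MS);

	while (!virtqueue_get_buf(vi->cvq, &tmp) &&
	       !virtqueue_is_broken(vi->cvq)) {
		if (time_after(jiffies, end))
			return false; /* device never answered */
		cpu_relax();
	}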
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-16 12:04 ` Paolo Abeni
@ 2024-01-18 12:01   ` Zhu Yanjun
  2024-01-19 14:27     ` Heng Qi
  1 sibling, 1 reply; 29+ messages in thread
From: Zhu Yanjun @ 2024-01-18 12:01 UTC (permalink / raw)
  To: Paolo Abeni, Zhu Yanjun, mst, jasowang, xuanzhuo, davem,
      edumazet, kuba, virtualization, netdev

On 2024/1/16 20:04, Paolo Abeni wrote:
> On Mon, 2024-01-15 at 09:29 +0800, Zhu Yanjun wrote:
>> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>>
>> [patch and soft lockup trace snipped]
>>
>>         while (!virtqueue_get_buf(vi->cvq, &tmp) &&
>> -              !virtqueue_is_broken(vi->cvq))
>> +              !virtqueue_is_broken(vi->cvq)) {
>> +               if (timeout)
>> +                       timeout--;
> This is not really a timeout, just a loop counter. 200 iterations could
> be a very short time on reasonable H/W. I guess this avoids the soft
> lockup, but possibly (likely?) breaks the functionality when we need to
> loop for some non-negligible time.
>
> I fear we need a more complex solution, as mentioned by Michael in the
> thread you quoted.

Got it. I also look forward to a more complex solution to this problem.

Zhu Yanjun

> Cheers,
>
> Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-18 12:01   ` Zhu Yanjun
@ 2024-01-19 14:27     ` Heng Qi
  2024-01-19 17:29       ` Andrew Lunn
  2024-01-22  3:08       ` Jason Wang
  0 siblings, 2 replies; 29+ messages in thread
From: Heng Qi @ 2024-01-19 14:27 UTC (permalink / raw)
  To: Zhu Yanjun, Paolo Abeni, Zhu Yanjun, mst, jasowang, xuanzhuo,
      davem, edumazet, kuba, virtualization, netdev

On 2024/1/18 20:01, Zhu Yanjun wrote:
> On 2024/1/16 20:04, Paolo Abeni wrote:
>> On Mon, 2024-01-15 at 09:29 +0800, Zhu Yanjun wrote:
>>>
>>> [patch and soft lockup trace snipped]
>>>
>>>         while (!virtqueue_get_buf(vi->cvq, &tmp) &&
>>> -              !virtqueue_is_broken(vi->cvq))
>>> +              !virtqueue_is_broken(vi->cvq)) {
>>> +               if (timeout)
>>> +                       timeout--;
>> This is not really a timeout, just a loop counter. 200 iterations could
>> be a very short time on reasonable H/W. I guess this avoids the soft
>> lockup, but possibly (likely?) breaks the functionality when we need to
>> loop for some non-negligible time.
>>
>> I fear we need a more complex solution, as mentioned by Michael in the
>> thread you quoted.
>
> Got it. I also look forward to a more complex solution to this problem.

Can we add a device capability (a new feature bit) such as
ctrq_wait_timeout, so that the driver can obtain a reasonable timeout
from the device?

Thanks,
Heng

> Zhu Yanjun

^ permalink raw reply	[flat|nested] 29+ messages in thread
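If such a capability existed, the driver side would presumably read the
value at negotiation time, along the lines of the sketch below. This is
entirely hypothetical: neither the feature bit nor the config field
exists in the virtio spec today, and both names and the offset are made
up for illustration.

	/* Hypothetical: a device-advertised control-vq timeout. */
	#define VIRTIO_NET_F_CTRL_TIMEOUT	63	/* made-up bit */
	#define VIRTNET_CFG_CTRL_TO_OFF		24	/* made-up offset */

	u32 timeout_ms = 200;	/* made-up fallback default */

	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_TIMEOUT))
		timeout_ms = virtio_cread32(vi->vdev, VIRTNET_CFG_CTRL_TO_OFF);

The appeal of this design is that it answers Jason's earlier objection:
each device, rather than the driver, decides how long a command may
reasonably take, so one hard-coded value does not have to fit all
vendors.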
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-19 14:27       ` Heng Qi
@ 2024-01-19 17:29         ` Andrew Lunn
  2024-01-20  4:21           ` Zhu Yanjun
  2024-01-22  2:12           ` Zhu Yanjun
  2024-01-22  3:08         ` Jason Wang
  1 sibling, 2 replies; 29+ messages in thread
From: Andrew Lunn @ 2024-01-19 17:29 UTC (permalink / raw)
  To: Heng Qi
  Cc: Zhu Yanjun, Paolo Abeni, Zhu Yanjun, mst, jasowang, xuanzhuo,
	davem, edumazet, kuba, virtualization, netdev

> > > >          while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> > > > -               !virtqueue_is_broken(vi->cvq))
> > > > +               !virtqueue_is_broken(vi->cvq)) {
> > > > +              if (timeout)
> > > > +                      timeout--;
> > > This is not really a timeout, just a loop counter. 200 iterations could
> > > be a very short time on reasonable H/W. I guess this avoids the soft
> > > lockup, but possibly (likely?) breaks the functionality when we need to
> > > loop for some non-negligible time.
> > >
> > > I fear we need a more complex solution, as mentioned by Michael in the
> > > thread you quoted.
> >
> > Got it. I also look forward to the more complex solution to this problem.
>
> Can we add a device capability (new feature bit) such as ctrq_wait_timeout
> to get a reasonable timeout?

The usual solution to this is include/linux/iopoll.h. If you can sleep,
use read_poll_timeout(); otherwise read_poll_timeout_atomic().

	Andrew

^ permalink raw reply	[flat|nested] 29+ messages in thread
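For reference, a minimal sketch of what Andrew's suggestion could look
like inside virtnet_send_command(). virtnet_cvq_done() is a made-up
helper, and the 100 us poll interval and 30 s budget are invented
numbers, not values proposed anywhere in this thread; vi and tmp are
the locals already present in that function:

#include <linux/iopoll.h>

/* Hypothetical helper: done once the device returns the command
 * buffer, or once the vq is marked broken. */
static bool virtnet_cvq_done(struct virtnet_info *vi, unsigned int *tmp)
{
	return virtqueue_get_buf(vi->cvq, tmp) ||
	       virtqueue_is_broken(vi->cvq);
}

/* After the kick in virtnet_send_command(); valid only in sleeping
 * context, otherwise read_poll_timeout_atomic() would be needed. */
bool done;

if (read_poll_timeout(virtnet_cvq_done, done, done,
		      100, 30 * USEC_PER_SEC, false, vi, &tmp))
	return false;	/* timed out: the device never answered */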
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-19 17:29         ` Andrew Lunn
@ 2024-01-20  4:21           ` Zhu Yanjun
  2024-01-22  2:12           ` Zhu Yanjun
  1 sibling, 0 replies; 29+ messages in thread
From: Zhu Yanjun @ 2024-01-20 4:21 UTC (permalink / raw)
  To: Andrew Lunn, Heng Qi
  Cc: Paolo Abeni, Zhu Yanjun, mst, jasowang, xuanzhuo, davem,
	edumazet, kuba, virtualization, netdev

On 2024/1/20 1:29, Andrew Lunn wrote:
> [...]
> The usual solution to this is include/linux/iopoll.h. If you can sleep,
> use read_poll_timeout(); otherwise read_poll_timeout_atomic().

Thanks. The two functions read_poll_timeout() and
read_poll_timeout_atomic() are interesting.

Zhu Yanjun

>
> Andrew

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-19 17:29         ` Andrew Lunn
  2024-01-20  4:21           ` Zhu Yanjun
@ 2024-01-22  2:12           ` Zhu Yanjun
  2024-01-22  3:14             ` Jason Wang
  1 sibling, 1 reply; 29+ messages in thread
From: Zhu Yanjun @ 2024-01-22 2:12 UTC (permalink / raw)
  To: Andrew Lunn, Heng Qi
  Cc: Paolo Abeni, Zhu Yanjun, mst, jasowang, xuanzhuo, davem,
	edumazet, kuba, virtualization, netdev

On 2024/1/20 1:29, Andrew Lunn wrote:
> [...]
> The usual solution to this is include/linux/iopoll.h. If you can sleep,
> use read_poll_timeout(); otherwise read_poll_timeout_atomic().

I read the functions read_poll_timeout() and read_poll_timeout_atomic()
carefully. The timeout is set by the caller of the two functions.

As such, can we add a module parameter so that the user can customize
this timeout value?

Or should this timeout value be stored in a device register, with the
virtio_net driver reading it at initialization?

Zhu Yanjun

>
> Andrew

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-22  2:12           ` Zhu Yanjun
@ 2024-01-22  3:14             ` Jason Wang
  2024-01-22  3:58               ` Xuan Zhuo
  0 siblings, 1 reply; 29+ messages in thread
From: Jason Wang @ 2024-01-22 3:14 UTC (permalink / raw)
  To: Zhu Yanjun
  Cc: Andrew Lunn, Heng Qi, Paolo Abeni, Zhu Yanjun, mst, xuanzhuo,
	davem, edumazet, kuba, virtualization, netdev

On Mon, Jan 22, 2024 at 10:12 AM Zhu Yanjun <yanjun.zhu@linux.dev> wrote:
> [...]
> I read the functions read_poll_timeout() and read_poll_timeout_atomic()
> carefully. The timeout is set by the caller of the two functions.

FYI, in order to avoid a switch between atomic and non-atomic waits, we
need to convert the rx mode setting to a workqueue first:

https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60298.html

> As such, can we add a module parameter so that the user can customize
> this timeout value?

Who is the "user" here, or how can the "user" know the value?

> Or should this timeout value be stored in a device register, with the
> virtio_net driver reading it at initialization?

See another thread. The design needs to be general, or you can post an RFC.

On another note, we've already had a tx watchdog; maybe we can have
something similar for the cvq and use timeout + reset in that case.

Thanks

>
> Zhu Yanjun
>
> [...]

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-22  3:14             ` Jason Wang
@ 2024-01-22  3:58               ` Xuan Zhuo
  2024-01-22  4:16                 ` Jason Wang
  0 siblings, 1 reply; 29+ messages in thread
From: Xuan Zhuo @ 2024-01-22 3:58 UTC (permalink / raw)
  To: Jason Wang
  Cc: Andrew Lunn, Heng Qi, Paolo Abeni, Zhu Yanjun, mst, davem,
	edumazet, kuba, virtualization, netdev, Zhu Yanjun

On Mon, 22 Jan 2024 11:14:30 +0800, Jason Wang <jasowang@redhat.com> wrote:
> [...]
> On another note, we've already had a tx watchdog; maybe we can have
> something similar for the cvq and use timeout + reset in that case.

But we may be blocked by the reset ^_^ if the device is broken?

Thanks.

> [...]

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-22  3:58               ` Xuan Zhuo
@ 2024-01-22  4:16                 ` Jason Wang
  2024-01-22  6:16                   ` Xuan Zhuo
  0 siblings, 1 reply; 29+ messages in thread
From: Jason Wang @ 2024-01-22 4:16 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Andrew Lunn, Heng Qi, Paolo Abeni, Zhu Yanjun, mst, davem,
	edumazet, kuba, virtualization, netdev, Zhu Yanjun

On Mon, Jan 22, 2024 at 12:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> [...]
> > On another note, we've already had a tx watchdog; maybe we can have
> > something similar for the cvq and use timeout + reset in that case.
>
> But we may be blocked by the reset ^_^ if the device is broken?

I mean vq reset here.

It looks like we have multiple goals here:

1) avoid lockups; using workqueue + cond_resched() seems to be
   sufficient, it has issues but nothing new
2) recover from an unresponsive device; the issue with a timeout is
   that it needs to deal with false positives

Thanks

> [...]

^ permalink raw reply	[flat|nested] 29+ messages in thread
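To illustrate goal 1), a sketch of the lockup-safe (but still
unbounded) wait, assuming the command is issued from process context
such as a workqueue; vi and tmp are the locals of
virtnet_send_command():

/* Yielding poll: cond_resched() keeps the soft lockup watchdog quiet,
 * but the loop still never terminates if the device never answers. */
while (!virtqueue_get_buf(vi->cvq, &tmp) &&
       !virtqueue_is_broken(vi->cvq))
	cond_resched();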
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-22  4:16                 ` Jason Wang
@ 2024-01-22  6:16                   ` Xuan Zhuo
  2024-01-22  6:55                     ` Jason Wang
  0 siblings, 1 reply; 29+ messages in thread
From: Xuan Zhuo @ 2024-01-22 6:16 UTC (permalink / raw)
  To: Jason Wang
  Cc: Andrew Lunn, Heng Qi, Paolo Abeni, Zhu Yanjun, mst, davem,
	edumazet, kuba, virtualization, netdev, Zhu Yanjun

On Mon, 22 Jan 2024 12:16:27 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Jan 22, 2024 at 12:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > [...]
> > But we may be blocked by the reset ^_^ if the device is broken?
>
> I mean vq reset here.

I see.

I mean that when the device is broken, the vq reset may also be blocked:

void vp_modern_set_queue_reset(struct virtio_pci_modern_device *mdev, u16 index)
{
	struct virtio_pci_modern_common_cfg __iomem *cfg;

	cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common;

	vp_iowrite16(index, &cfg->cfg.queue_select);
	vp_iowrite16(1, &cfg->queue_reset);

	while (vp_ioread16(&cfg->queue_reset))
		msleep(1);

	while (vp_ioread16(&cfg->cfg.queue_enable))
		msleep(1);
}
EXPORT_SYMBOL_GPL(vp_modern_set_queue_reset);

In this function, with a broken device, we cannot expect these reads to
ever make progress.

> It looks like we have multiple goals here:
>
> 1) avoid lockups; using workqueue + cond_resched() seems to be
>    sufficient, it has issues but nothing new
> 2) recover from an unresponsive device; the issue with a timeout is
>    that it needs to deal with false positives

I agree.

But I want to add a new goal: cvq async. In the netdim case, we will
send many requests via the cvq, so an async cvq would be nice.

Thanks.

> [...]

^ permalink raw reply	[flat|nested] 29+ messages in thread
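As an aside, the reset wait itself could be bounded in the same
read_poll_timeout() style. A sketch, where vp_modern_wait_queue_reset()
is a made-up helper and the 1 ms poll interval / 100 ms budget are
invented for illustration:

#include <linux/iopoll.h>

/* Sketch: give up on a dead device instead of msleep()ing forever.
 * Returns 0 on success, -ETIMEDOUT if queue_reset never clears. */
static int vp_modern_wait_queue_reset(struct virtio_pci_modern_common_cfg __iomem *cfg)
{
	u16 val;

	return read_poll_timeout(vp_ioread16, val, !val,
				 1000, 100 * 1000, false,
				 &cfg->queue_reset);
}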
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-22  6:16                   ` Xuan Zhuo
@ 2024-01-22  6:55                     ` Jason Wang
  2024-01-22  6:58                       ` Jason Wang
  2024-01-22  7:01                       ` Xuan Zhuo
  0 siblings, 2 replies; 29+ messages in thread
From: Jason Wang @ 2024-01-22 6:55 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Andrew Lunn, Heng Qi, Paolo Abeni, Zhu Yanjun, mst, davem,
	edumazet, kuba, virtualization, netdev, Zhu Yanjun

On Mon, Jan 22, 2024 at 2:20 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> [...]
> I mean that when the device is broken, the vq reset may also be blocked.
>
> [vp_modern_set_queue_reset() snipped]
>
> In this function, with a broken device, we cannot expect these reads to
> ever make progress.

Yes, it's best effort; there's no guarantee then. But there is no harm
in trying.

Thanks

> [...]

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-22  6:55                     ` Jason Wang
@ 2024-01-22  6:58                       ` Jason Wang
  2024-01-22  7:02                         ` Xuan Zhuo
  1 sibling, 1 reply; 29+ messages in thread
From: Jason Wang @ 2024-01-22 6:58 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Andrew Lunn, Heng Qi, Paolo Abeni, Zhu Yanjun, mst, davem,
	edumazet, kuba, virtualization, netdev, Zhu Yanjun

On Mon, Jan 22, 2024 at 2:55 PM Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Jan 22, 2024 at 2:20 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > [...]
> > But I want to add a new goal: cvq async. In the netdim case, we will
> > send many requests via the cvq, so an async cvq would be nice.

Then you need an interrupt for the cvq.

FYI, I've posted a series that uses an interrupt for the cvq in the past:

https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368621@redhat.com/t/

I haven't found time to keep working on this; maybe we can start from
that, or not.

Thanks

> [...]

^ permalink raw reply	[flat|nested] 29+ messages in thread
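For context, a rough sketch of the interrupt-driven shape (invented
names throughout, not the code from the linked series): the cvq
callback completes a completion that the submitter sleeps on.
ctrl_done is an assumed new struct completion in virtnet_info, and the
10-second budget is an arbitrary pick:

/* Callback registered for the cvq at find_vqs() time. */
static void virtnet_cvq_done_cb(struct virtqueue *cvq)
{
	struct virtnet_info *vi = cvq->vdev->priv;

	complete(&vi->ctrl_done);
}

/* Submission side, replacing the busy-wait in virtnet_send_command(): */
reinit_completion(&vi->ctrl_done);
if (virtqueue_add_sgs(vi->cvq, sgs, out_num, 1, vi, GFP_ATOMIC) < 0)
	return false;
virtqueue_kick(vi->cvq);

/* Sleep until the interrupt fires instead of spinning. */
if (!wait_for_completion_timeout(&vi->ctrl_done, 10 * HZ))
	return false;	/* unresponsive device */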
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-22  6:58                       ` Jason Wang
@ 2024-01-22  7:02                         ` Xuan Zhuo
  2024-01-22  7:19                           ` Jason Wang
      [not found]                           ` <e46d04d7-4eb7-4fcd-821a-d558c07531b7@linux.dev>
  0 siblings, 2 replies; 29+ messages in thread
From: Xuan Zhuo @ 2024-01-22 7:02 UTC (permalink / raw)
  To: Jason Wang
  Cc: Andrew Lunn, Heng Qi, Paolo Abeni, Zhu Yanjun, mst, davem,
	edumazet, kuba, virtualization, netdev, Zhu Yanjun

On Mon, 22 Jan 2024 14:58:09 +0800, Jason Wang <jasowang@redhat.com> wrote:
> [...]
> Then you need an interrupt for the cvq.
>
> FYI, I've posted a series that uses an interrupt for the cvq in the past:
>
> https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368621@redhat.com/t/

I know this. But the interrupt may not be a good solution without new space.

> I haven't found time to keep working on this; maybe we can start from
> that, or not.

I said async, but my aim is to put many requests on the cvq before
getting the responses.

Heng Qi posted this:

https://lore.kernel.org/all/1705410693-118895-4-git-send-email-hengqi@linux.alibaba.com/

Thanks.

> [...]

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-22  7:02                         ` Xuan Zhuo
@ 2024-01-22  7:19                           ` Jason Wang
  2024-01-22  7:25                             ` Xuan Zhuo
  0 siblings, 1 reply; 29+ messages in thread
From: Jason Wang @ 2024-01-22 7:19 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Andrew Lunn, Heng Qi, Paolo Abeni, Zhu Yanjun, mst, davem,
	edumazet, kuba, virtualization, netdev, Zhu Yanjun

On Mon, Jan 22, 2024 at 3:07 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> [...]
> I know this. But the interrupt may not be a good solution without new space.

What do you mean by "new space"?

We can introduce something like enable_cb_delayed(); then you will only
get notified after several requests.

> I said async, but my aim is to put many requests on the cvq before
> getting the responses.

It doesn't differ from TX/RX in this case.

> Heng Qi posted this:
>
> https://lore.kernel.org/all/1705410693-118895-4-git-send-email-hengqi@linux.alibaba.com/

This seems like a hack; if an interrupt is used, you can simply do that
in the callback.

Thanks

> [...]

^ permalink raw reply	[flat|nested] 29+ messages in thread
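virtqueue_enable_cb_delayed() already exists in the virtio ring code; a
sketch of how batched cvq requests might use it (the cmds array and the
cvq_work work item are invented for illustration, and this assumes a
cvq callback is wired up at all):

/* Queue a batch of commands, then request a single deferred
 * notification instead of one interrupt per command. */
for (i = 0; i < n_cmds; i++)
	virtqueue_add_sgs(vi->cvq, cmds[i].sgs, cmds[i].out_num, 1,
			  &cmds[i], GFP_ATOMIC);
virtqueue_kick(vi->cvq);

/* Returns false if buffers are already pending (a race with the
 * device); in that case process the completions right away. */
if (!virtqueue_enable_cb_delayed(vi->cvq))
	schedule_work(&vi->cvq_work);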
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-22  7:19                           ` Jason Wang
@ 2024-01-22  7:25                             ` Xuan Zhuo
  2024-01-22  7:57                               ` Jason Wang
  0 siblings, 1 reply; 29+ messages in thread
From: Xuan Zhuo @ 2024-01-22 7:25 UTC (permalink / raw)
  To: Jason Wang
  Cc: Andrew Lunn, Heng Qi, Paolo Abeni, Zhu Yanjun, mst, davem,
	edumazet, kuba, virtualization, netdev, Zhu Yanjun

On Mon, 22 Jan 2024 15:19:12 +0800, Jason Wang <jasowang@redhat.com> wrote:
> [...]
> > I know this. But the interrupt may not be a good solution without new space.
>
> What do you mean by "new space"?

Yes, I know the cvq can work with interrupts per the virtio spec.
But as far as I know, many hypervisors implement the cvq without
supporting interrupts. If we let the cvq work with interrupts without
negotiation, then many hypervisors will hang on the new kernel.

> We can introduce something like enable_cb_delayed(); then you will only
> get notified after several requests.
> [...]
> This seems like a hack; if an interrupt is used, you can simply do that
> in the callback.

YES.

I also want to change the code; I just want to say that async is a goal.

For the rx mode, we have introduced a workqueue; I want to move the
command-sending job to the workqueue too. The caller just wakes the
workqueue up.

If the caller wants the result synchronously, it can wait for it.
If not, it can register a callback function with the workqueue.

And I think it will be easy to implement the timeout inside the
workqueue.

Thanks.

> [...]

^ permalink raw reply	[flat|nested] 29+ messages in thread
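A compact sketch of that workqueue design (all names invented): one
work item drains a command list, synchronous callers block on a
completion, asynchronous callers leave a callback.
virtnet_cvq_dequeue() and virtnet_issue_cmd() are assumed helpers that
pop the next queued command and then add + kick + wait with a timeout:

struct virtnet_cvq_cmd {
	struct list_head node;
	void (*done)(struct virtnet_cvq_cmd *cmd);	/* async callers */
	struct completion completion;			/* sync callers */
	bool ok;
};

static void virtnet_cvq_work(struct work_struct *work)
{
	struct virtnet_info *vi = container_of(work, struct virtnet_info,
					       cvq_work);
	struct virtnet_cvq_cmd *cmd;

	while ((cmd = virtnet_cvq_dequeue(vi))) {
		cmd->ok = virtnet_issue_cmd(vi, cmd);
		if (cmd->done)
			cmd->done(cmd);			/* async path */
		else
			complete(&cmd->completion);	/* wake sync caller */
	}
}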
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-22  7:25                             ` Xuan Zhuo
@ 2024-01-22  7:57                               ` Jason Wang
  2024-01-22  8:01                                 ` Xuan Zhuo
  0 siblings, 1 reply; 29+ messages in thread
From: Jason Wang @ 2024-01-22 7:57 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Andrew Lunn, Heng Qi, Paolo Abeni, Zhu Yanjun, mst, davem,
	edumazet, kuba, virtualization, netdev, Zhu Yanjun

On Mon, Jan 22, 2024 at 3:36 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> [...]
> Yes, I know the cvq can work with interrupts per the virtio spec.
> But as far as I know, many hypervisors implement the cvq without
> supporting interrupts.

It's a bug of the hypervisor that needs to be fixed. Interrupts are
provided by the transport, not by virtio itself.

Otherwise it can only support Linux, but not other OSes.

> If we let the cvq work with interrupts without negotiation, then many
> hypervisors will hang on the new kernel.
> [...]
> For the rx mode, we have introduced a workqueue; I want to move the
> command-sending job to the workqueue too. The caller just wakes the
> workqueue up.
>
> If the caller wants the result synchronously, it can wait for it.
> If not, it can register a callback function with the workqueue.
>
> And I think it will be easy to implement the timeout inside the
> workqueue.

Looks much more complicated than a simple interrupt + timer/watchdog etc.

Thanks

> [...]

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang 2024-01-22 7:57 ` Jason Wang @ 2024-01-22 8:01 ` Xuan Zhuo 2024-01-22 8:32 ` Jason Wang 0 siblings, 1 reply; 29+ messages in thread From: Xuan Zhuo @ 2024-01-22 8:01 UTC (permalink / raw) To: Jason Wang Cc: Andrew Lunn, Heng Qi, Paolo Abeni, Zhu Yanjun, mst, davem, edumazet, kuba, virtualization, netdev, Zhu Yanjun On Mon, 22 Jan 2024 15:57:08 +0800, Jason Wang <jasowang@redhat.com> wrote: > On Mon, Jan 22, 2024 at 3:36 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > On Mon, 22 Jan 2024 15:19:12 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > On Mon, Jan 22, 2024 at 3:07 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > On Mon, 22 Jan 2024 14:58:09 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > On Mon, Jan 22, 2024 at 2:55 PM Jason Wang <jasowang@redhat.com> wrote: > > > > > > > > > > > > On Mon, Jan 22, 2024 at 2:20 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > > > > > > > On Mon, 22 Jan 2024 12:16:27 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > > > > On Mon, Jan 22, 2024 at 12:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > > > > > > > > > > > On Mon, 22 Jan 2024 11:14:30 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > > > > > > On Mon, Jan 22, 2024 at 10:12 AM Zhu Yanjun <yanjun.zhu@linux.dev> wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 在 2024/1/20 1:29, Andrew Lunn 写道: > > > > > > > > > > > >>>>> while (!virtqueue_get_buf(vi->cvq, &tmp) && > > > > > > > > > > > >>>>> - !virtqueue_is_broken(vi->cvq)) > > > > > > > > > > > >>>>> + !virtqueue_is_broken(vi->cvq)) { > > > > > > > > > > > >>>>> + if (timeout) > > > > > > > > > > > >>>>> + timeout--; > > > > > > > > > > > >>>> This is not really a timeout, just a loop counter. 200 iterations could > > > > > > > > > > > >>>> be a very short time on reasonable H/W. I guess this avoid the soft > > > > > > > > > > > >>>> lockup, but possibly (likely?) breaks the functionality when we need to > > > > > > > > > > > >>>> loop for some non negligible time. > > > > > > > > > > > >>>> > > > > > > > > > > > >>>> I fear we need a more complex solution, as mentioned by Micheal in the > > > > > > > > > > > >>>> thread you quoted. > > > > > > > > > > > >>> Got it. I also look forward to the more complex solution to this problem. > > > > > > > > > > > >> Can we add a device capability (new feature bit) such as ctrq_wait_timeout > > > > > > > > > > > >> to get a reasonable timeout? > > > > > > > > > > > > The usual solution to this is include/linux/iopoll.h. If you can sleep > > > > > > > > > > > > read_poll_timeout() otherwise read_poll_timeout_atomic(). > > > > > > > > > > > > > > > > > > > > > > I read carefully the functions read_poll_timeout() and > > > > > > > > > > > read_poll_timeout_atomic(). The timeout is set by the caller of the 2 > > > > > > > > > > > functions. > > > > > > > > > > > > > > > > > > > > FYI, in order to avoid a swtich of atomic or not, we need convert rx > > > > > > > > > > mode setting to workqueue first: > > > > > > > > > > > > > > > > > > > > https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60298.html > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > As such, can we add a module parameter to customize this timeout value > > > > > > > > > > > by the user? > > > > > > > > > > > > > > > > > > > > Who is the "user" here, or how can the "user" know the value? 
> > > > > > > > > > > > Or this timeout value is stored in device register, virtio_net driver > > > > > > > > > > > will read this timeout value at initialization? > > > > > > > > > > See another thread. The design needs to be general, or you can post a RFC. > > > > > > > > > > In another thought, we've already had a tx watchdog, maybe we can have > > > > > > > > > > something similar to cvq and use timeout + reset in that case. > > > > > > > > > But we may block by the reset ^_^ if the device is broken? > > > > > > > > I mean vq reset here. > > > > > > > I see. > > > > > > > I mean when the deivce is broken, the vq reset also many be blocked. > > > > > > > void vp_modern_set_queue_reset(struct virtio_pci_modern_device *mdev, u16 index) > > > > > > > { > > > > > > > struct virtio_pci_modern_common_cfg __iomem *cfg; > > > > > > > cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common; > > > > > > > vp_iowrite16(index, &cfg->cfg.queue_select); > > > > > > > vp_iowrite16(1, &cfg->queue_reset); > > > > > > > while (vp_ioread16(&cfg->queue_reset)) > > > > > > > msleep(1); > > > > > > > while (vp_ioread16(&cfg->cfg.queue_enable)) > > > > > > > msleep(1); > > > > > > > } > > > > > > > EXPORT_SYMBOL_GPL(vp_modern_set_queue_reset); > > > > > > > In this function, for the broken device, we can not expect something. > > > > > > Yes, it's best effort, there's no guarantee then. But it doesn't harm to try. > > > > > > Thanks > > > > > > > > It looks like we have multiple goals here > > > > > > > > 1) avoid lockups, using workqueue + cond_resched() seems to be > > > > > > > > sufficient, it has issue but nothing new > > > > > > > > 2) recover from the unresponsive device, the issue for timeout is that > > > > > > > > it needs to deal with false positives > > > > > > > I agree. > > > > > > > But I want to add a new goal, cvq async. In the netdim, we will > > > > > > > send many requests via the cvq, so the cvq async will be nice. > > > > > Then you need an interrupt for cvq. > > > > > FYI, I've posted a series that use interrupt for cvq in the past: > > > > > https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368621@redhat.com/t/ > > > > I know this. But the interrupt maybe not a good solution without new space. > > > What do you mean by "new space"? > > Yes, I know, the cvq can work with interrupt by the virtio spec. > > But as I know, many hypervisors implement the cvq without supporting interrupt. > > It's a bug of the hypervisor that needs to be fix. Interrupt is > provided by transport not the virtio itself. YES. I agree. But I still think we should not enable the interrupt directly, without any negotiation. I would rather introduce a new feature to enable this. Thanks. > > Otherwise it can only support for Linux but not other OSes. > > > If we let the cvq work with interrupt without negotiation then > > many hypervisors will hang on the new kernel. > > > We can introduce something like enable_cb_delayed(), then you will > > > only get notified after several requests.
> > > > > Haven't found time in working on this anymore, maybe we can start from > > > > > this or not. > > > > I said async, but my aim is to put many requests to the cvq before getting the > > > > response. > > > It doesn't differ from TX/RX in this case. > > > > Heng Qi posted this https://lore.kernel.org/all/1705410693-118895-4-git-send-email-hengqi@linux.alibaba.com/ > > > This seems like a hack, if interrupt is used, you can simply do that > > > in the callback. > > YES. > > I also want to change the code, I just want to say the async is a goal. > > For the rx mode, we have introduce a work queue, I want to move the > > sending command job to the work queue. The caller just wakeup > > the work queue. > > If the caller want to got the result sync, then the caller can wait for it. > > If not, the caller can register an function to the work queue. > > And I think it will be easy to implement the timeout inside the workqueue. > Looks much more complicated than a simple interrupt + timer/watchdog etc. > Thanks ^ permalink raw reply [flat|nested] 29+ messages in thread
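The enable_cb_delayed() mentioned in this exchange maps onto the existing virtqueue_enable_cb_delayed() API. A rough sketch of batching control-queue completions with it could look as follows — illustrative only; cvq_work is again an assumed work item, and the request-adding step is elided:

static void virtnet_cvq_submit_batch(struct virtnet_info *vi)
{
	virtqueue_disable_cb(vi->cvq);	/* no callback per request */

	/* ... add a batch of requests with virtqueue_add_sgs() ... */

	virtqueue_kick(vi->cvq);

	/* Ask for one delayed callback for the whole batch; a false
	 * return means completions are already pending, so process
	 * them now instead of waiting for an interrupt.
	 */
	if (!virtqueue_enable_cb_delayed(vi->cvq))
		schedule_work(&vi->cvq_work);
}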
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang 2024-01-22 8:01 ` Xuan Zhuo @ 2024-01-22 8:32 ` Jason Wang 2024-01-22 9:11 ` Xuan Zhuo 0 siblings, 1 reply; 29+ messages in thread From: Jason Wang @ 2024-01-22 8:32 UTC (permalink / raw) To: Xuan Zhuo Cc: Andrew Lunn, Heng Qi, Paolo Abeni, Zhu Yanjun, mst, davem, edumazet, kuba, virtualization, netdev, Zhu Yanjun On Mon, Jan 22, 2024 at 4:04 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > On Mon, 22 Jan 2024 15:57:08 +0800, Jason Wang <jasowang@redhat.com> wrote: > > On Mon, Jan 22, 2024 at 3:36 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > On Mon, 22 Jan 2024 15:19:12 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > On Mon, Jan 22, 2024 at 3:07 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > > > On Mon, 22 Jan 2024 14:58:09 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > > On Mon, Jan 22, 2024 at 2:55 PM Jason Wang <jasowang@redhat.com> wrote: > > > > > > > > > > > > > > On Mon, Jan 22, 2024 at 2:20 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > > > > > > > > > On Mon, 22 Jan 2024 12:16:27 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > > > > > On Mon, Jan 22, 2024 at 12:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > > > > > > > > > > > > > On Mon, 22 Jan 2024 11:14:30 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > > > > > > > On Mon, Jan 22, 2024 at 10:12 AM Zhu Yanjun <yanjun.zhu@linux.dev> wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 在 2024/1/20 1:29, Andrew Lunn 写道: > > > > > > > > > > > > >>>>> while (!virtqueue_get_buf(vi->cvq, &tmp) && > > > > > > > > > > > > >>>>> - !virtqueue_is_broken(vi->cvq)) > > > > > > > > > > > > >>>>> + !virtqueue_is_broken(vi->cvq)) { > > > > > > > > > > > > >>>>> + if (timeout) > > > > > > > > > > > > >>>>> + timeout--; > > > > > > > > > > > > >>>> This is not really a timeout, just a loop counter. 200 iterations could > > > > > > > > > > > > >>>> be a very short time on reasonable H/W. I guess this avoid the soft > > > > > > > > > > > > >>>> lockup, but possibly (likely?) breaks the functionality when we need to > > > > > > > > > > > > >>>> loop for some non negligible time. > > > > > > > > > > > > >>>> > > > > > > > > > > > > >>>> I fear we need a more complex solution, as mentioned by Micheal in the > > > > > > > > > > > > >>>> thread you quoted. > > > > > > > > > > > > >>> Got it. I also look forward to the more complex solution to this problem. > > > > > > > > > > > > >> Can we add a device capability (new feature bit) such as ctrq_wait_timeout > > > > > > > > > > > > >> to get a reasonable timeout? > > > > > > > > > > > > > The usual solution to this is include/linux/iopoll.h. If you can sleep > > > > > > > > > > > > > read_poll_timeout() otherwise read_poll_timeout_atomic(). > > > > > > > > > > > > > > > > > > > > > > > > I read carefully the functions read_poll_timeout() and > > > > > > > > > > > > read_poll_timeout_atomic(). The timeout is set by the caller of the 2 > > > > > > > > > > > > functions. 
> > > > > > > > > > > > > > > > > > > > > > FYI, in order to avoid a swtich of atomic or not, we need convert rx > > > > > > > > > > > mode setting to workqueue first: > > > > > > > > > > > > > > > > > > > > > > https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60298.html > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > As such, can we add a module parameter to customize this timeout value > > > > > > > > > > > > by the user? > > > > > > > > > > > > > > > > > > > > > > Who is the "user" here, or how can the "user" know the value? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Or this timeout value is stored in device register, virtio_net driver > > > > > > > > > > > > will read this timeout value at initialization? > > > > > > > > > > > > > > > > > > > > > > See another thread. The design needs to be general, or you can post a RFC. > > > > > > > > > > > > > > > > > > > > > > In another thought, we've already had a tx watchdog, maybe we can have > > > > > > > > > > > something similar to cvq and use timeout + reset in that case. > > > > > > > > > > > > > > > > > > > > But we may block by the reset ^_^ if the device is broken? > > > > > > > > > > > > > > > > > > I mean vq reset here. > > > > > > > > > > > > > > > > I see. > > > > > > > > > > > > > > > > I mean when the deivce is broken, the vq reset also many be blocked. > > > > > > > > > > > > > > > > void vp_modern_set_queue_reset(struct virtio_pci_modern_device *mdev, u16 index) > > > > > > > > { > > > > > > > > struct virtio_pci_modern_common_cfg __iomem *cfg; > > > > > > > > > > > > > > > > cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common; > > > > > > > > > > > > > > > > vp_iowrite16(index, &cfg->cfg.queue_select); > > > > > > > > vp_iowrite16(1, &cfg->queue_reset); > > > > > > > > > > > > > > > > while (vp_ioread16(&cfg->queue_reset)) > > > > > > > > msleep(1); > > > > > > > > > > > > > > > > while (vp_ioread16(&cfg->cfg.queue_enable)) > > > > > > > > msleep(1); > > > > > > > > } > > > > > > > > EXPORT_SYMBOL_GPL(vp_modern_set_queue_reset); > > > > > > > > > > > > > > > > In this function, for the broken device, we can not expect something. > > > > > > > > > > > > > > Yes, it's best effort, there's no guarantee then. But it doesn't harm to try. > > > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > It looks like we have multiple goals here > > > > > > > > > > > > > > > > > > 1) avoid lockups, using workqueue + cond_resched() seems to be > > > > > > > > > sufficient, it has issue but nothing new > > > > > > > > > 2) recover from the unresponsive device, the issue for timeout is that > > > > > > > > > it needs to deal with false positives > > > > > > > > > > > > > > > > > > > > > > > > I agree. > > > > > > > > > > > > > > > > But I want to add a new goal, cvq async. In the netdim, we will > > > > > > > > send many requests via the cvq, so the cvq async will be nice. > > > > > > > > > > > > Then you need an interrupt for cvq. > > > > > > > > > > > > FYI, I've posted a series that use interrupt for cvq in the past: > > > > > > > > > > > > https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368621@redhat.com/t/ > > > > > > > > > > I know this. But the interrupt maybe not a good solution without new space. > > > > > > > > What do you mean by "new space"? > > > > > > Yes, I know, the cvq can work with interrupt by the virtio spec. 
> > > But as I know, many hypervisors implement the cvq without supporting interrupt. > > > > It's a bug of the hypervisor that needs to be fix. Interrupt is > > provided by transport not the virtio itself. > > YES. I agree. > > But I still think we should not enable the interrupt directly, without any > > negotiation. I would rather introduce a new feature to enable this. I can hardly believe we need to work around the issue of specific hypervisors like this... Thanks > > Thanks. > > ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang 2024-01-22 8:32 ` Jason Wang @ 2024-01-22 9:11 ` Xuan Zhuo 0 siblings, 0 replies; 29+ messages in thread From: Xuan Zhuo @ 2024-01-22 9:11 UTC (permalink / raw) To: Jason Wang Cc: Andrew Lunn, Heng Qi, Paolo Abeni, Zhu Yanjun, mst, davem, edumazet, kuba, virtualization, netdev, Zhu Yanjun On Mon, 22 Jan 2024 16:32:46 +0800, Jason Wang <jasowang@redhat.com> wrote: > On Mon, Jan 22, 2024 at 4:04 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > On Mon, 22 Jan 2024 15:57:08 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > On Mon, Jan 22, 2024 at 3:36 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > On Mon, 22 Jan 2024 15:19:12 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > On Mon, Jan 22, 2024 at 3:07 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > > > > > On Mon, 22 Jan 2024 14:58:09 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > > > On Mon, Jan 22, 2024 at 2:55 PM Jason Wang <jasowang@redhat.com> wrote: > > > > > > > > > > > > > > > > On Mon, Jan 22, 2024 at 2:20 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > > > > > > > > > > > On Mon, 22 Jan 2024 12:16:27 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > > > > > > On Mon, Jan 22, 2024 at 12:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > > > > > > > > > > > > > > > On Mon, 22 Jan 2024 11:14:30 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > > > > > > > > On Mon, Jan 22, 2024 at 10:12 AM Zhu Yanjun <yanjun.zhu@linux.dev> wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 在 2024/1/20 1:29, Andrew Lunn 写道: > > > > > > > > > > > > > >>>>> while (!virtqueue_get_buf(vi->cvq, &tmp) && > > > > > > > > > > > > > >>>>> - !virtqueue_is_broken(vi->cvq)) > > > > > > > > > > > > > >>>>> + !virtqueue_is_broken(vi->cvq)) { > > > > > > > > > > > > > >>>>> + if (timeout) > > > > > > > > > > > > > >>>>> + timeout--; > > > > > > > > > > > > > >>>> This is not really a timeout, just a loop counter. 200 iterations could > > > > > > > > > > > > > >>>> be a very short time on reasonable H/W. I guess this avoid the soft > > > > > > > > > > > > > >>>> lockup, but possibly (likely?) breaks the functionality when we need to > > > > > > > > > > > > > >>>> loop for some non negligible time. > > > > > > > > > > > > > >>>> > > > > > > > > > > > > > >>>> I fear we need a more complex solution, as mentioned by Micheal in the > > > > > > > > > > > > > >>>> thread you quoted. > > > > > > > > > > > > > >>> Got it. I also look forward to the more complex solution to this problem. > > > > > > > > > > > > > >> Can we add a device capability (new feature bit) such as ctrq_wait_timeout > > > > > > > > > > > > > >> to get a reasonable timeout? > > > > > > > > > > > > > > The usual solution to this is include/linux/iopoll.h. If you can sleep > > > > > > > > > > > > > > read_poll_timeout() otherwise read_poll_timeout_atomic(). > > > > > > > > > > > > > > > > > > > > > > > > > > I read carefully the functions read_poll_timeout() and > > > > > > > > > > > > > read_poll_timeout_atomic(). The timeout is set by the caller of the 2 > > > > > > > > > > > > > functions. 
> > > > > > > > > > > > > > > > > > > > > > > > FYI, in order to avoid a swtich of atomic or not, we need convert rx > > > > > > > > > > > > mode setting to workqueue first: > > > > > > > > > > > > > > > > > > > > > > > > https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60298.html > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > As such, can we add a module parameter to customize this timeout value > > > > > > > > > > > > > by the user? > > > > > > > > > > > > > > > > > > > > > > > > Who is the "user" here, or how can the "user" know the value? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Or this timeout value is stored in device register, virtio_net driver > > > > > > > > > > > > > will read this timeout value at initialization? > > > > > > > > > > > > > > > > > > > > > > > > See another thread. The design needs to be general, or you can post a RFC. > > > > > > > > > > > > > > > > > > > > > > > > In another thought, we've already had a tx watchdog, maybe we can have > > > > > > > > > > > > something similar to cvq and use timeout + reset in that case. > > > > > > > > > > > > > > > > > > > > > > But we may block by the reset ^_^ if the device is broken? > > > > > > > > > > > > > > > > > > > > I mean vq reset here. > > > > > > > > > > > > > > > > > > I see. > > > > > > > > > > > > > > > > > > I mean when the deivce is broken, the vq reset also many be blocked. > > > > > > > > > > > > > > > > > > void vp_modern_set_queue_reset(struct virtio_pci_modern_device *mdev, u16 index) > > > > > > > > > { > > > > > > > > > struct virtio_pci_modern_common_cfg __iomem *cfg; > > > > > > > > > > > > > > > > > > cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common; > > > > > > > > > > > > > > > > > > vp_iowrite16(index, &cfg->cfg.queue_select); > > > > > > > > > vp_iowrite16(1, &cfg->queue_reset); > > > > > > > > > > > > > > > > > > while (vp_ioread16(&cfg->queue_reset)) > > > > > > > > > msleep(1); > > > > > > > > > > > > > > > > > > while (vp_ioread16(&cfg->cfg.queue_enable)) > > > > > > > > > msleep(1); > > > > > > > > > } > > > > > > > > > EXPORT_SYMBOL_GPL(vp_modern_set_queue_reset); > > > > > > > > > > > > > > > > > > In this function, for the broken device, we can not expect something. > > > > > > > > > > > > > > > > Yes, it's best effort, there's no guarantee then. But it doesn't harm to try. > > > > > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > It looks like we have multiple goals here > > > > > > > > > > > > > > > > > > > > 1) avoid lockups, using workqueue + cond_resched() seems to be > > > > > > > > > > sufficient, it has issue but nothing new > > > > > > > > > > 2) recover from the unresponsive device, the issue for timeout is that > > > > > > > > > > it needs to deal with false positives > > > > > > > > > > > > > > > > > > > > > > > > > > > I agree. > > > > > > > > > > > > > > > > > > But I want to add a new goal, cvq async. In the netdim, we will > > > > > > > > > send many requests via the cvq, so the cvq async will be nice. > > > > > > > > > > > > > > Then you need an interrupt for cvq. > > > > > > > > > > > > > > FYI, I've posted a series that use interrupt for cvq in the past: > > > > > > > > > > > > > > https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368621@redhat.com/t/ > > > > > > > > > > > > I know this. But the interrupt maybe not a good solution without new space. 
> > > > > > > > > > What do you mean by "new space"? > > > > > > > > Yes, I know, the cvq can work with interrupt by the virtio spec. > > > > But as I know, many hypervisors implement the cvq without supporting interrupt. > > > > > > It's a bug of the hypervisor that needs to be fix. Interrupt is > > > provided by transport not the virtio itself. > > > > YES. I agree. > > > > But I still think we should not enable the interrupt directly, without any > > negotiation. I would rather introduce a new feature to enable this. > > I can hardly believe we need to work around the issue of specific > hypervisors like this... Maybe we should hear from others. We are OK with this; I am just thinking of other hypervisors. Thanks. ^ permalink raw reply [flat|nested] 29+ messages in thread
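A sketch of the negotiation being asked for here might be a new feature bit gating cvq interrupts, so hypervisors that never wired them up keep the polling path. The bit name and number below are pure placeholders, not anything allocated in the virtio spec:

/* Placeholder: a real bit would need to be reserved in the spec. */
#define VIRTIO_NET_F_CTRL_IRQ	62

static bool virtnet_cvq_irq_negotiated(struct virtnet_info *vi)
{
	return virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_IRQ);
}

At probe time the driver would only install a cvq callback in find_vqs() when this returns true, falling back to polling otherwise, which keeps existing hypervisors working unchanged.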
[parent not found: <e46d04d7-4eb7-4fcd-821a-d558c07531b7@linux.dev>]
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang [not found] ` <e46d04d7-4eb7-4fcd-821a-d558c07531b7@linux.dev> @ 2024-01-26 3:13 ` Zhu Yanjun 0 siblings, 0 replies; 29+ messages in thread From: Zhu Yanjun @ 2024-01-26 3:13 UTC (permalink / raw) To: Xuan Zhuo, Jason Wang Cc: Andrew Lunn, Heng Qi, Paolo Abeni, Zhu Yanjun, mst, davem, edumazet, kuba, virtualization, netdev 在 2024/1/26 11:11, Zhu Yanjun 写道: > > > 在 2024/1/22 15:02, Xuan Zhuo 写道: >> On Mon, 22 Jan 2024 14:58:09 +0800, Jason Wang<jasowang@redhat.com> wrote: >>> On Mon, Jan 22, 2024 at 2:55 PM Jason Wang<jasowang@redhat.com> wrote: >>>> On Mon, Jan 22, 2024 at 2:20 PM Xuan Zhuo<xuanzhuo@linux.alibaba.com> wrote: >>>>> On Mon, 22 Jan 2024 12:16:27 +0800, Jason Wang<jasowang@redhat.com> wrote: >>>>>> On Mon, Jan 22, 2024 at 12:00 PM Xuan Zhuo<xuanzhuo@linux.alibaba.com> wrote: >>>>>>> On Mon, 22 Jan 2024 11:14:30 +0800, Jason Wang<jasowang@redhat.com> wrote: >>>>>>>> On Mon, Jan 22, 2024 at 10:12 AM Zhu Yanjun<yanjun.zhu@linux.dev> wrote: >>>>>>>>> 在 2024/1/20 1:29, Andrew Lunn 写道: >>>>>>>>>>>>>> while (!virtqueue_get_buf(vi->cvq, &tmp) && >>>>>>>>>>>>>> - !virtqueue_is_broken(vi->cvq)) >>>>>>>>>>>>>> + !virtqueue_is_broken(vi->cvq)) { >>>>>>>>>>>>>> + if (timeout) >>>>>>>>>>>>>> + timeout--; >>>>>>>>>>>>> This is not really a timeout, just a loop counter. 200 iterations could >>>>>>>>>>>>> be a very short time on reasonable H/W. I guess this avoid the soft >>>>>>>>>>>>> lockup, but possibly (likely?) breaks the functionality when we need to >>>>>>>>>>>>> loop for some non negligible time. >>>>>>>>>>>>> >>>>>>>>>>>>> I fear we need a more complex solution, as mentioned by Micheal in the >>>>>>>>>>>>> thread you quoted. >>>>>>>>>>>> Got it. I also look forward to the more complex solution to this problem. >>>>>>>>>>> Can we add a device capability (new feature bit) such as ctrq_wait_timeout >>>>>>>>>>> to get a reasonable timeout? >>>>>>>>>> The usual solution to this is include/linux/iopoll.h. If you can sleep >>>>>>>>>> read_poll_timeout() otherwise read_poll_timeout_atomic(). >>>>>>>>> I read carefully the functions read_poll_timeout() and >>>>>>>>> read_poll_timeout_atomic(). The timeout is set by the caller of the 2 >>>>>>>>> functions. >>>>>>>> FYI, in order to avoid a swtich of atomic or not, we need convert rx >>>>>>>> mode setting to workqueue first: >>>>>>>> >>>>>>>> https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60298.html >>>>>>>> >>>>>>>>> As such, can we add a module parameter to customize this timeout value >>>>>>>>> by the user? >>>>>>>> Who is the "user" here, or how can the "user" know the value? >>>>>>>> >>>>>>>>> Or this timeout value is stored in device register, virtio_net driver >>>>>>>>> will read this timeout value at initialization? >>>>>>>> See another thread. The design needs to be general, or you can post a RFC. >>>>>>>> >>>>>>>> In another thought, we've already had a tx watchdog, maybe we can have >>>>>>>> something similar to cvq and use timeout + reset in that case. >>>>>>> But we may block by the reset ^_^ if the device is broken? >>>>>> I mean vq reset here. >>>>> I see. >>>>> >>>>> I mean when the deivce is broken, the vq reset also many be blocked. 
>>>>> void vp_modern_set_queue_reset(struct virtio_pci_modern_device *mdev, u16 index) >>>>> { >>>>> struct virtio_pci_modern_common_cfg __iomem *cfg; >>>>> cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common; >>>>> vp_iowrite16(index, &cfg->cfg.queue_select); >>>>> vp_iowrite16(1, &cfg->queue_reset); >>>>> while (vp_ioread16(&cfg->queue_reset)) >>>>> msleep(1); >>>>> while (vp_ioread16(&cfg->cfg.queue_enable)) >>>>> msleep(1); >>>>> } >>>>> EXPORT_SYMBOL_GPL(vp_modern_set_queue_reset); >>>>> In this function, for the broken device, we can not expect something. >>>> Yes, it's best effort, there's no guarantee then. But it doesn't harm to try. >>>> Thanks >>>>>> It looks like we have multiple goals here >>>>>> 1) avoid lockups, using workqueue + cond_resched() seems to be >>>>>> sufficient, it has issue but nothing new >>>>>> 2) recover from the unresponsive device, the issue for timeout is that >>>>>> it needs to deal with false positives >>>>> I agree. >>>>> But I want to add a new goal, cvq async. In the netdim, we will >>>>> send many requests via the cvq, so the cvq async will be nice. >>> Then you need an interrupt for cvq. >>> FYI, I've posted a series that use interrupt for cvq in the past: >>> https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368621@redhat.com/t/ >> I know this. But the interrupt maybe not a good solution without new space. >>> Haven't found time in working on this anymore, maybe we can start from >>> this or not. >> I said async, but my aim is to put many requests to the cvq before getting the >> response. >> Heng Qi posted this https://lore.kernel.org/all/1705410693-118895-4-git-send-email-hengqi@linux.alibaba.com/ Sorry, this mail was rejected by the netdev mailing list, so I have to resend it. Thanks a lot. I read Heng Qi's commits carefully. This patch series is similar to the NIC xmit_more feature. But if a cvq command is urgent, can we let that urgent command be passed to the device ASAP? I mean, can we set a flag similar to xmit_more? If a cvq command is not urgent, it can be queued; if it is urgent, it is passed ASAP. Zhu Yanjun ^ permalink raw reply [flat|nested] 29+ messages in thread
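A minimal sketch of the xmit_more-style flag proposed here, under the assumption that a batching path exists; the 'urgent' parameter is the new idea, the rest mirrors the existing virtnet_send_command() setup:

/* Add a command to the control vq: urgent commands notify the device
 * immediately, others stay queued until the next urgent command or an
 * explicit flush kicks the whole batch.
 */
static int virtnet_ctrl_add(struct virtnet_info *vi,
			    struct scatterlist *sgs[], unsigned int out_num,
			    bool urgent)
{
	int ret;

	ret = virtqueue_add_sgs(vi->cvq, sgs, out_num, 1, vi, GFP_ATOMIC);
	if (ret < 0)
		return ret;

	if (urgent)
		virtqueue_kick(vi->cvq);	/* pass it on ASAP */

	return 0;
}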
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang 2024-01-22 6:55 ` Jason Wang 2024-01-22 6:58 ` Jason Wang @ 2024-01-22 7:01 ` Xuan Zhuo 1 sibling, 0 replies; 29+ messages in thread From: Xuan Zhuo @ 2024-01-22 7:01 UTC (permalink / raw) To: Jason Wang Cc: Andrew Lunn, Heng Qi, Paolo Abeni, Zhu Yanjun, mst, davem, edumazet, kuba, virtualization, netdev, Zhu Yanjun On Mon, 22 Jan 2024 14:55:46 +0800, Jason Wang <jasowang@redhat.com> wrote: > On Mon, Jan 22, 2024 at 2:20 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > On Mon, 22 Jan 2024 12:16:27 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > On Mon, Jan 22, 2024 at 12:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > On Mon, 22 Jan 2024 11:14:30 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > On Mon, Jan 22, 2024 at 10:12 AM Zhu Yanjun <yanjun.zhu@linux.dev> wrote: > > > > > > > > > > > > > > > > > > 在 2024/1/20 1:29, Andrew Lunn 写道: > > > > > > >>>>> while (!virtqueue_get_buf(vi->cvq, &tmp) && > > > > > > >>>>> - !virtqueue_is_broken(vi->cvq)) > > > > > > >>>>> + !virtqueue_is_broken(vi->cvq)) { > > > > > > >>>>> + if (timeout) > > > > > > >>>>> + timeout--; > > > > > > >>>> This is not really a timeout, just a loop counter. 200 iterations could > > > > > > >>>> be a very short time on reasonable H/W. I guess this avoid the soft > > > > > > >>>> lockup, but possibly (likely?) breaks the functionality when we need to > > > > > > >>>> loop for some non negligible time. > > > > > > >>>> > > > > > > >>>> I fear we need a more complex solution, as mentioned by Micheal in the > > > > > > >>>> thread you quoted. > > > > > > >>> Got it. I also look forward to the more complex solution to this problem. > > > > > > >> Can we add a device capability (new feature bit) such as ctrq_wait_timeout > > > > > > >> to get a reasonable timeout? > > > > > > > The usual solution to this is include/linux/iopoll.h. If you can sleep > > > > > > > read_poll_timeout() otherwise read_poll_timeout_atomic(). > > > > > > > > > > > > I read carefully the functions read_poll_timeout() and > > > > > > read_poll_timeout_atomic(). The timeout is set by the caller of the 2 > > > > > > functions. > > > > > > > > > > FYI, in order to avoid a swtich of atomic or not, we need convert rx > > > > > mode setting to workqueue first: > > > > > > > > > > https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60298.html > > > > > > > > > > > > > > > > > As such, can we add a module parameter to customize this timeout value > > > > > > by the user? > > > > > > > > > > Who is the "user" here, or how can the "user" know the value? > > > > > > > > > > > > > > > > > Or this timeout value is stored in device register, virtio_net driver > > > > > > will read this timeout value at initialization? > > > > > > > > > > See another thread. The design needs to be general, or you can post a RFC. > > > > > > > > > > In another thought, we've already had a tx watchdog, maybe we can have > > > > > something similar to cvq and use timeout + reset in that case. > > > > > > > > But we may block by the reset ^_^ if the device is broken? > > > > > > I mean vq reset here. > > > > I see. > > > > I mean when the deivce is broken, the vq reset also many be blocked. 
> > void vp_modern_set_queue_reset(struct virtio_pci_modern_device *mdev, u16 index) > > { > > struct virtio_pci_modern_common_cfg __iomem *cfg; > > cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common; > > vp_iowrite16(index, &cfg->cfg.queue_select); > > vp_iowrite16(1, &cfg->queue_reset); > > while (vp_ioread16(&cfg->queue_reset)) > > msleep(1); > > while (vp_ioread16(&cfg->cfg.queue_enable)) > > msleep(1); > > } > > EXPORT_SYMBOL_GPL(vp_modern_set_queue_reset); > > In this function, for the broken device, we can not expect something. > Yes, it's best effort, there's no guarantee then. But it doesn't harm to try. YES. I agree. Thanks. > Thanks ^ permalink raw reply [flat|nested] 29+ messages in thread
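For reference, the unbounded msleep() loops quoted above could be given a budget with read_poll_timeout() from include/linux/iopoll.h, the helper Andrew Lunn pointed at earlier in the thread. A sketch, with an arbitrary 100ms budget per register:

#include <linux/iopoll.h>

static int vp_modern_set_queue_reset_bounded(struct virtio_pci_modern_device *mdev,
					     u16 index)
{
	struct virtio_pci_modern_common_cfg __iomem *cfg;
	u16 val;
	int err;

	cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common;

	vp_iowrite16(index, &cfg->cfg.queue_select);
	vp_iowrite16(1, &cfg->queue_reset);

	/* Poll every 1ms, give up after 100ms: a broken device now
	 * yields -ETIMEDOUT instead of looping forever.
	 */
	err = read_poll_timeout(vp_ioread16, val, !val, 1000, 100000, false,
				&cfg->queue_reset);
	if (err)
		return err;

	return read_poll_timeout(vp_ioread16, val, !val, 1000, 100000, false,
				 &cfg->cfg.queue_enable);
}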
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang 2024-01-19 14:27 ` Heng Qi 2024-01-19 17:29 ` Andrew Lunn @ 2024-01-22 3:08 ` Jason Wang 2024-01-22 4:42 ` Heng Qi 1 sibling, 1 reply; 29+ messages in thread From: Jason Wang @ 2024-01-22 3:08 UTC (permalink / raw) To: Heng Qi Cc: Zhu Yanjun, Paolo Abeni, Zhu Yanjun, mst, xuanzhuo, davem, edumazet, kuba, virtualization, netdev On Fri, Jan 19, 2024 at 10:27 PM Heng Qi <hengqi@linux.alibaba.com> wrote: > > > > 在 2024/1/18 下午8:01, Zhu Yanjun 写道: > > > > 在 2024/1/16 20:04, Paolo Abeni 写道: > >> On Mon, 2024-01-15 at 09:29 +0800, Zhu Yanjun wrote: > >>> From: Zhu Yanjun <yanjun.zhu@linux.dev> > >>> > >>> Some devices emulate the virtio_net hardwares. When virtio_net > >>> driver sends commands to the emulated hardware, normally the > >>> hardware needs time to response. Sometimes the time is very > >>> long. Thus, the following will appear. Then the whole system > >>> will hang. > >>> The similar problems also occur in Intel NICs and Mellanox NICs. > >>> As such, the similar solution is borrowed from them. A timeout > >>> value is added and the timeout value as large as possible is set > >>> to ensure that the driver gets the maximum possible response from > >>> the hardware. > >>> > >>> " > >>> [ 213.795860] watchdog: BUG: soft lockup - CPU#108 stuck for 26s! > >>> [(udev-worker):3157] > >>> [ 213.796114] Modules linked in: virtio_net(+) net_failover > >>> failover qrtr rfkill sunrpc intel_rapl_msr intel_rapl_common > >>> intel_uncore_frequency intel_uncore_frequency_common intel_ifs > >>> i10nm_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp > >>> coretemp iTCO_wdt rapl intel_pmc_bxt dax_hmem iTCO_vendor_support > >>> vfat cxl_acpi intel_cstate pmt_telemetry pmt_class intel_sdsi joydev > >>> intel_uncore cxl_core fat pcspkr mei_me isst_if_mbox_pci > >>> isst_if_mmio idxd i2c_i801 isst_if_common mei intel_vsec idxd_bus > >>> i2c_smbus i2c_ismt ipmi_ssif acpi_ipmi ipmi_si ipmi_devintf > >>> ipmi_msghandler acpi_pad acpi_power_meter pfr_telemetry pfr_update > >>> fuse loop zram xfs crct10dif_pclmul crc32_pclmul crc32c_intel > >>> polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 > >>> bnxt_en sha256_ssse3 sha1_ssse3 nvme ast nvme_core i2c_algo_bit wmi > >>> pinctrl_emmitsburg scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_multipath > >>> [ 213.796194] irq event stamp: 67740 > >>> [ 213.796195] hardirqs last enabled at (67739): > >>> [<ffffffff8c2015ca>] asm_sysvec_apic_timer_interrupt+0x1a/0x20 > >>> [ 213.796203] hardirqs last disabled at (67740): > >>> [<ffffffff8c14108e>] sysvec_apic_timer_interrupt+0xe/0x90 > >>> [ 213.796208] softirqs last enabled at (67686): > >>> [<ffffffff8b12115e>] __irq_exit_rcu+0xbe/0xe0 > >>> [ 213.796214] softirqs last disabled at (67681): > >>> [<ffffffff8b12115e>] __irq_exit_rcu+0xbe/0xe0 > >>> [ 213.796217] CPU: 108 PID: 3157 Comm: (udev-worker) Kdump: loaded > >>> Not tainted 6.7.0+ #9 > >>> [ 213.796220] Hardware name: Intel Corporation > >>> M50FCP2SBSTD/M50FCP2SBSTD, BIOS SE5C741.86B.01.01.0001.2211140926 > >>> 11/14/2022 > >>> [ 213.796221] RIP: 0010:virtqueue_get_buf_ctx_split+0x8d/0x110 > >>> [ 213.796228] Code: 89 df e8 26 fe ff ff 0f b7 43 50 83 c0 01 66 89 > >>> 43 50 f6 43 78 01 75 12 80 7b 42 00 48 8b 4b 68 8b 53 58 74 0f 66 87 > >>> 44 51 04 <48> 89 e8 5b 5d c3 cc cc cc cc 66 89 44 51 04 0f ae f0 48 > >>> 89 e8 5b > >>> [ 213.796230] RSP: 0018:ff4bbb362306f9b0 EFLAGS: 00000246 > >>> [ 213.796233] RAX: 0000000000000000 RBX: ff2f15095896f000 RCX: > >>> 
0000000000000001 > >>> [ 213.796235] RDX: 0000000000000000 RSI: ff4bbb362306f9cc RDI: > >>> ff2f15095896f000 > >>> [ 213.796236] RBP: 0000000000000000 R08: 0000000000000000 R09: > >>> 0000000000000000 > >>> [ 213.796237] R10: 0000000000000003 R11: ff2f15095893cc40 R12: > >>> 0000000000000002 > >>> [ 213.796239] R13: 0000000000000004 R14: 0000000000000000 R15: > >>> ff2f1509534f3000 > >>> [ 213.796240] FS: 00007f775847d0c0(0000) GS:ff2f1528bac00000(0000) > >>> knlGS:0000000000000000 > >>> [ 213.796242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >>> [ 213.796243] CR2: 0000557f987b6e70 CR3: 0000002098602006 CR4: > >>> 0000000000f71ef0 > >>> [ 213.796245] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > >>> 0000000000000000 > >>> [ 213.796246] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: > >>> 0000000000000400 > >>> [ 213.796247] PKRU: 55555554 > >>> [ 213.796249] Call Trace: > >>> [ 213.796250] <IRQ> > >>> [ 213.796252] ? watchdog_timer_fn+0x1c0/0x220 > >>> [ 213.796258] ? __pfx_watchdog_timer_fn+0x10/0x10 > >>> [ 213.796261] ? __hrtimer_run_queues+0x1af/0x380 > >>> [ 213.796269] ? hrtimer_interrupt+0xf8/0x230 > >>> [ 213.796274] ? __sysvec_apic_timer_interrupt+0x64/0x1a0 > >>> [ 213.796279] ? sysvec_apic_timer_interrupt+0x6d/0x90 > >>> [ 213.796282] </IRQ> > >>> [ 213.796284] <TASK> > >>> [ 213.796285] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 > >>> [ 213.796293] ? virtqueue_get_buf_ctx_split+0x8d/0x110 > >>> [ 213.796297] virtnet_send_command+0x18a/0x1f0 [virtio_net] > >>> [ 213.796310] _virtnet_set_queues+0xc6/0x120 [virtio_net] > >>> [ 213.796319] virtnet_probe+0xa06/0xd50 [virtio_net] > >>> [ 213.796328] virtio_dev_probe+0x195/0x230 > >>> [ 213.796333] really_probe+0x19f/0x400 > >>> [ 213.796338] ? __pfx___driver_attach+0x10/0x10 > >>> [ 213.796340] __driver_probe_device+0x78/0x160 > >>> [ 213.796343] driver_probe_device+0x1f/0x90 > >>> [ 213.796346] __driver_attach+0xd6/0x1d0 > >>> [ 213.796349] bus_for_each_dev+0x8c/0xe0 > >>> [ 213.796355] bus_add_driver+0x119/0x220 > >>> [ 213.796359] driver_register+0x59/0x100 > >>> [ 213.796362] ? __pfx_virtio_net_driver_init+0x10/0x10 [virtio_net] > >>> [ 213.796369] virtio_net_driver_init+0x8e/0xff0 [virtio_net] > >>> [ 213.796375] do_one_initcall+0x6f/0x380 > >>> [ 213.796384] do_init_module+0x60/0x240 > >>> [ 213.796388] init_module_from_file+0x86/0xc0 > >>> [ 213.796396] idempotent_init_module+0x129/0x2c0 > >>> [ 213.796406] __x64_sys_finit_module+0x5e/0xb0 > >>> [ 213.796409] do_syscall_64+0x60/0xe0 > >>> [ 213.796415] ? do_syscall_64+0x6f/0xe0 > >>> [ 213.796418] ? lockdep_hardirqs_on_prepare+0xe4/0x1a0 > >>> [ 213.796424] ? do_syscall_64+0x6f/0xe0 > >>> [ 213.796427] ? 
do_syscall_64+0x6f/0xe0 > >>> [ 213.796431] entry_SYSCALL_64_after_hwframe+0x6e/0x76 > >>> [ 213.796435] RIP: 0033:0x7f7758f279cd > >>> [ 213.796465] Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e > >>> fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 > >>> 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 33 e4 0c 00 f7 d8 64 > >>> 89 01 48 > >>> [ 213.796467] RSP: 002b:00007ffe2cad8738 EFLAGS: 00000246 ORIG_RAX: > >>> 0000000000000139 > >>> [ 213.796469] RAX: ffffffffffffffda RBX: 0000557f987a8180 RCX: > >>> 00007f7758f279cd > >>> [ 213.796471] RDX: 0000000000000000 RSI: 00007f77593e5453 RDI: > >>> 000000000000000f > >>> [ 213.796472] RBP: 00007f77593e5453 R08: 0000000000000000 R09: > >>> 00007ffe2cad8860 > >>> [ 213.796473] R10: 000000000000000f R11: 0000000000000246 R12: > >>> 0000000000020000 > >>> [ 213.796475] R13: 0000557f9879f8e0 R14: 0000000000000000 R15: > >>> 0000557f98783aa0 > >>> [ 213.796482] </TASK> > >>> " > >>> > >>> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> > >>> --- > >>> drivers/net/virtio_net.c | 10 ++++++++-- > >>> 1 file changed, 8 insertions(+), 2 deletions(-) > >>> > >>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > >>> index 51b1868d2f22..28b7dd917a43 100644 > >>> --- a/drivers/net/virtio_net.c > >>> +++ b/drivers/net/virtio_net.c > >>> @@ -2468,7 +2468,7 @@ static bool virtnet_send_command(struct > >>> virtnet_info *vi, u8 class, u8 cmd, > >>> { > >>> struct scatterlist *sgs[4], hdr, stat; > >>> unsigned out_num = 0, tmp; > >>> - int ret; > >>> + int ret, timeout = 200; > >>> /* Caller should know better */ > >>> BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ)); > >>> @@ -2502,8 +2502,14 @@ static bool virtnet_send_command(struct > >>> virtnet_info *vi, u8 class, u8 cmd, > >>> * into the hypervisor, so the request should be handled > >>> immediately. > >>> */ > >>> while (!virtqueue_get_buf(vi->cvq, &tmp) && > >>> - !virtqueue_is_broken(vi->cvq)) > >>> + !virtqueue_is_broken(vi->cvq)) { > >>> + if (timeout) > >>> + timeout--; > >> This is not really a timeout, just a loop counter. 200 iterations could > >> be a very short time on reasonable H/W. I guess this avoid the soft > >> lockup, but possibly (likely?) breaks the functionality when we need to > >> loop for some non negligible time. > >> > >> I fear we need a more complex solution, as mentioned by Micheal in the > >> thread you quoted. > > > > Got it. I also look forward to the more complex solution to this problem. > > Can we add a device capability (new feature bit) such as > ctrq_wait_timeout to get a reasonable timeout? This adds another kind of complexity for migration compatibility. And we need to make it more general, e.g 1) it should not be cvq specific 2) or we can have a timeout that works for all queues ? Thanks > > Thanks, > Heng > > > > > Zhu Yanjun > > > >> > >> Cheers, > >> > >> Paolo > >> > ^ permalink raw reply [flat|nested] 29+ messages in thread
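To make the "loop counter vs. timeout" point quoted above concrete, a real time-based bound on the polling loop in virtnet_send_command() could look like the sketch below. The 30 second budget is arbitrary, and as noted in the thread a timeout alone still leaves the false-positive and recovery questions open:

/* Poll for the device's answer, but give up after a wall-clock
 * deadline rather than a fixed iteration count.
 */
static bool virtnet_wait_ctrl_ack(struct virtnet_info *vi)
{
	unsigned int tmp;
	unsigned long deadline = jiffies + 30 * HZ;	/* arbitrary budget */

	while (!virtqueue_get_buf(vi->cvq, &tmp) &&
	       !virtqueue_is_broken(vi->cvq)) {
		if (time_after(jiffies, deadline)) {
			dev_warn(&vi->vdev->dev,
				 "control command timed out\n");
			return false;	/* device state is now unknown */
		}
		cpu_relax();
	}

	return !virtqueue_is_broken(vi->cvq);
}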
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang 2024-01-22 3:08 ` Jason Wang @ 2024-01-22 4:42 ` Heng Qi 0 siblings, 0 replies; 29+ messages in thread From: Heng Qi @ 2024-01-22 4:42 UTC (permalink / raw) To: Jason Wang Cc: Zhu Yanjun, Paolo Abeni, Zhu Yanjun, mst, xuanzhuo, davem, edumazet, kuba, virtualization, netdev 在 2024/1/22 上午11:08, Jason Wang 写道: > On Fri, Jan 19, 2024 at 10:27 PM Heng Qi <hengqi@linux.alibaba.com> wrote: >> >> >> 在 2024/1/18 下午8:01, Zhu Yanjun 写道: >>> 在 2024/1/16 20:04, Paolo Abeni 写道: >>>> On Mon, 2024-01-15 at 09:29 +0800, Zhu Yanjun wrote: >>>>> From: Zhu Yanjun <yanjun.zhu@linux.dev> >>>>> >>>>> Some devices emulate the virtio_net hardwares. When virtio_net >>>>> driver sends commands to the emulated hardware, normally the >>>>> hardware needs time to response. Sometimes the time is very >>>>> long. Thus, the following will appear. Then the whole system >>>>> will hang. >>>>> The similar problems also occur in Intel NICs and Mellanox NICs. >>>>> As such, the similar solution is borrowed from them. A timeout >>>>> value is added and the timeout value as large as possible is set >>>>> to ensure that the driver gets the maximum possible response from >>>>> the hardware. >>>>> >>>>> " >>>>> [ 213.795860] watchdog: BUG: soft lockup - CPU#108 stuck for 26s! >>>>> [(udev-worker):3157] >>>>> [ 213.796114] Modules linked in: virtio_net(+) net_failover >>>>> failover qrtr rfkill sunrpc intel_rapl_msr intel_rapl_common >>>>> intel_uncore_frequency intel_uncore_frequency_common intel_ifs >>>>> i10nm_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp >>>>> coretemp iTCO_wdt rapl intel_pmc_bxt dax_hmem iTCO_vendor_support >>>>> vfat cxl_acpi intel_cstate pmt_telemetry pmt_class intel_sdsi joydev >>>>> intel_uncore cxl_core fat pcspkr mei_me isst_if_mbox_pci >>>>> isst_if_mmio idxd i2c_i801 isst_if_common mei intel_vsec idxd_bus >>>>> i2c_smbus i2c_ismt ipmi_ssif acpi_ipmi ipmi_si ipmi_devintf >>>>> ipmi_msghandler acpi_pad acpi_power_meter pfr_telemetry pfr_update >>>>> fuse loop zram xfs crct10dif_pclmul crc32_pclmul crc32c_intel >>>>> polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 >>>>> bnxt_en sha256_ssse3 sha1_ssse3 nvme ast nvme_core i2c_algo_bit wmi >>>>> pinctrl_emmitsburg scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_multipath >>>>> [ 213.796194] irq event stamp: 67740 >>>>> [ 213.796195] hardirqs last enabled at (67739): >>>>> [<ffffffff8c2015ca>] asm_sysvec_apic_timer_interrupt+0x1a/0x20 >>>>> [ 213.796203] hardirqs last disabled at (67740): >>>>> [<ffffffff8c14108e>] sysvec_apic_timer_interrupt+0xe/0x90 >>>>> [ 213.796208] softirqs last enabled at (67686): >>>>> [<ffffffff8b12115e>] __irq_exit_rcu+0xbe/0xe0 >>>>> [ 213.796214] softirqs last disabled at (67681): >>>>> [<ffffffff8b12115e>] __irq_exit_rcu+0xbe/0xe0 >>>>> [ 213.796217] CPU: 108 PID: 3157 Comm: (udev-worker) Kdump: loaded >>>>> Not tainted 6.7.0+ #9 >>>>> [ 213.796220] Hardware name: Intel Corporation >>>>> M50FCP2SBSTD/M50FCP2SBSTD, BIOS SE5C741.86B.01.01.0001.2211140926 >>>>> 11/14/2022 >>>>> [ 213.796221] RIP: 0010:virtqueue_get_buf_ctx_split+0x8d/0x110 >>>>> [ 213.796228] Code: 89 df e8 26 fe ff ff 0f b7 43 50 83 c0 01 66 89 >>>>> 43 50 f6 43 78 01 75 12 80 7b 42 00 48 8b 4b 68 8b 53 58 74 0f 66 87 >>>>> 44 51 04 <48> 89 e8 5b 5d c3 cc cc cc cc 66 89 44 51 04 0f ae f0 48 >>>>> 89 e8 5b >>>>> [ 213.796230] RSP: 0018:ff4bbb362306f9b0 EFLAGS: 00000246 >>>>> [ 213.796233] RAX: 0000000000000000 RBX: ff2f15095896f000 RCX: >>>>> 0000000000000001 >>>>> [ 
213.796235] RDX: 0000000000000000 RSI: ff4bbb362306f9cc RDI: >>>>> ff2f15095896f000 >>>>> [ 213.796236] RBP: 0000000000000000 R08: 0000000000000000 R09: >>>>> 0000000000000000 >>>>> [ 213.796237] R10: 0000000000000003 R11: ff2f15095893cc40 R12: >>>>> 0000000000000002 >>>>> [ 213.796239] R13: 0000000000000004 R14: 0000000000000000 R15: >>>>> ff2f1509534f3000 >>>>> [ 213.796240] FS: 00007f775847d0c0(0000) GS:ff2f1528bac00000(0000) >>>>> knlGS:0000000000000000 >>>>> [ 213.796242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>> [ 213.796243] CR2: 0000557f987b6e70 CR3: 0000002098602006 CR4: >>>>> 0000000000f71ef0 >>>>> [ 213.796245] DR0: 0000000000000000 DR1: 0000000000000000 DR2: >>>>> 0000000000000000 >>>>> [ 213.796246] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: >>>>> 0000000000000400 >>>>> [ 213.796247] PKRU: 55555554 >>>>> [ 213.796249] Call Trace: >>>>> [ 213.796250] <IRQ> >>>>> [ 213.796252] ? watchdog_timer_fn+0x1c0/0x220 >>>>> [ 213.796258] ? __pfx_watchdog_timer_fn+0x10/0x10 >>>>> [ 213.796261] ? __hrtimer_run_queues+0x1af/0x380 >>>>> [ 213.796269] ? hrtimer_interrupt+0xf8/0x230 >>>>> [ 213.796274] ? __sysvec_apic_timer_interrupt+0x64/0x1a0 >>>>> [ 213.796279] ? sysvec_apic_timer_interrupt+0x6d/0x90 >>>>> [ 213.796282] </IRQ> >>>>> [ 213.796284] <TASK> >>>>> [ 213.796285] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 >>>>> [ 213.796293] ? virtqueue_get_buf_ctx_split+0x8d/0x110 >>>>> [ 213.796297] virtnet_send_command+0x18a/0x1f0 [virtio_net] >>>>> [ 213.796310] _virtnet_set_queues+0xc6/0x120 [virtio_net] >>>>> [ 213.796319] virtnet_probe+0xa06/0xd50 [virtio_net] >>>>> [ 213.796328] virtio_dev_probe+0x195/0x230 >>>>> [ 213.796333] really_probe+0x19f/0x400 >>>>> [ 213.796338] ? __pfx___driver_attach+0x10/0x10 >>>>> [ 213.796340] __driver_probe_device+0x78/0x160 >>>>> [ 213.796343] driver_probe_device+0x1f/0x90 >>>>> [ 213.796346] __driver_attach+0xd6/0x1d0 >>>>> [ 213.796349] bus_for_each_dev+0x8c/0xe0 >>>>> [ 213.796355] bus_add_driver+0x119/0x220 >>>>> [ 213.796359] driver_register+0x59/0x100 >>>>> [ 213.796362] ? __pfx_virtio_net_driver_init+0x10/0x10 [virtio_net] >>>>> [ 213.796369] virtio_net_driver_init+0x8e/0xff0 [virtio_net] >>>>> [ 213.796375] do_one_initcall+0x6f/0x380 >>>>> [ 213.796384] do_init_module+0x60/0x240 >>>>> [ 213.796388] init_module_from_file+0x86/0xc0 >>>>> [ 213.796396] idempotent_init_module+0x129/0x2c0 >>>>> [ 213.796406] __x64_sys_finit_module+0x5e/0xb0 >>>>> [ 213.796409] do_syscall_64+0x60/0xe0 >>>>> [ 213.796415] ? do_syscall_64+0x6f/0xe0 >>>>> [ 213.796418] ? lockdep_hardirqs_on_prepare+0xe4/0x1a0 >>>>> [ 213.796424] ? do_syscall_64+0x6f/0xe0 >>>>> [ 213.796427] ? 
do_syscall_64+0x6f/0xe0 >>>>> [ 213.796431] entry_SYSCALL_64_after_hwframe+0x6e/0x76 >>>>> [ 213.796435] RIP: 0033:0x7f7758f279cd >>>>> [ 213.796465] Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e >>>>> fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 >>>>> 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 33 e4 0c 00 f7 d8 64 >>>>> 89 01 48 >>>>> [ 213.796467] RSP: 002b:00007ffe2cad8738 EFLAGS: 00000246 ORIG_RAX: >>>>> 0000000000000139 >>>>> [ 213.796469] RAX: ffffffffffffffda RBX: 0000557f987a8180 RCX: >>>>> 00007f7758f279cd >>>>> [ 213.796471] RDX: 0000000000000000 RSI: 00007f77593e5453 RDI: >>>>> 000000000000000f >>>>> [ 213.796472] RBP: 00007f77593e5453 R08: 0000000000000000 R09: >>>>> 00007ffe2cad8860 >>>>> [ 213.796473] R10: 000000000000000f R11: 0000000000000246 R12: >>>>> 0000000000020000 >>>>> [ 213.796475] R13: 0000557f9879f8e0 R14: 0000000000000000 R15: >>>>> 0000557f98783aa0 >>>>> [ 213.796482] </TASK> >>>>> " >>>>> >>>>> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> >>>>> --- >>>>> drivers/net/virtio_net.c | 10 ++++++++-- >>>>> 1 file changed, 8 insertions(+), 2 deletions(-) >>>>> >>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c >>>>> index 51b1868d2f22..28b7dd917a43 100644 >>>>> --- a/drivers/net/virtio_net.c >>>>> +++ b/drivers/net/virtio_net.c >>>>> @@ -2468,7 +2468,7 @@ static bool virtnet_send_command(struct >>>>> virtnet_info *vi, u8 class, u8 cmd, >>>>> { >>>>> struct scatterlist *sgs[4], hdr, stat; >>>>> unsigned out_num = 0, tmp; >>>>> - int ret; >>>>> + int ret, timeout = 200; >>>>> /* Caller should know better */ >>>>> BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ)); >>>>> @@ -2502,8 +2502,14 @@ static bool virtnet_send_command(struct >>>>> virtnet_info *vi, u8 class, u8 cmd, >>>>> * into the hypervisor, so the request should be handled >>>>> immediately. >>>>> */ >>>>> while (!virtqueue_get_buf(vi->cvq, &tmp) && >>>>> - !virtqueue_is_broken(vi->cvq)) >>>>> + !virtqueue_is_broken(vi->cvq)) { >>>>> + if (timeout) >>>>> + timeout--; >>>> This is not really a timeout, just a loop counter. 200 iterations could >>>> be a very short time on reasonable H/W. I guess this avoid the soft >>>> lockup, but possibly (likely?) breaks the functionality when we need to >>>> loop for some non negligible time. >>>> >>>> I fear we need a more complex solution, as mentioned by Micheal in the >>>> thread you quoted. >>> Got it. I also look forward to the more complex solution to this problem. >> Can we add a device capability (new feature bit) such as >> ctrq_wait_timeout to get a reasonable timeout? > This adds another kind of complexity for migration compatibility. Yes, this requires device adaptation. But the value will be more reasonable, different devices have different requirements for timeout. > > And we need to make it more general, e.g > > 1) it should not be cvq specific Agree. Especially considering avq also has this problem. > 2) or we can have a timeout that works for all queues I remember Parav mentioned that they were also making some bulk improvements to ctrlq/avq, or we should look at a combined solution. rxq timeout can be used for our detection aliveness needs, especially after a live migration. txq timeout can be used with watchdog + vq reset. Thanks, Heng > ? > > Thanks > >> Thanks, >> Heng >> >>> Zhu Yanjun >>> >>>> Cheers, >>>> >>>> Paolo >>>> ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang 2024-01-16 12:04 ` Paolo Abeni 2024-01-18 12:01 ` Zhu Yanjun @ 2024-01-18 13:14 ` Michael S. Tsirkin 2024-01-19 1:42 ` Xuan Zhuo 1 sibling, 1 reply; 29+ messages in thread From: Michael S. Tsirkin @ 2024-01-18 13:14 UTC (permalink / raw) To: Paolo Abeni Cc: Zhu Yanjun, jasowang, xuanzhuo, davem, edumazet, kuba, virtualization, netdev, Zhu Yanjun On Tue, Jan 16, 2024 at 01:04:49PM +0100, Paolo Abeni wrote: > On Mon, 2024-01-15 at 09:29 +0800, Zhu Yanjun wrote: > > From: Zhu Yanjun <yanjun.zhu@linux.dev> > > > > Some devices emulate the virtio_net hardwares. When virtio_net > > driver sends commands to the emulated hardware, normally the > > hardware needs time to response. Sometimes the time is very > > long. Thus, the following will appear. Then the whole system > > will hang. > > The similar problems also occur in Intel NICs and Mellanox NICs. > > As such, the similar solution is borrowed from them. A timeout > > value is added and the timeout value as large as possible is set > > to ensure that the driver gets the maximum possible response from > > the hardware. > > > > " > > [ 213.795860] watchdog: BUG: soft lockup - CPU#108 stuck for 26s! [(udev-worker):3157] > > [ 213.796114] Modules linked in: virtio_net(+) net_failover failover qrtr rfkill sunrpc intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common intel_ifs i10nm_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt rapl intel_pmc_bxt dax_hmem iTCO_vendor_support vfat cxl_acpi intel_cstate pmt_telemetry pmt_class intel_sdsi joydev intel_uncore cxl_core fat pcspkr mei_me isst_if_mbox_pci isst_if_mmio idxd i2c_i801 isst_if_common mei intel_vsec idxd_bus i2c_smbus i2c_ismt ipmi_ssif acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter pfr_telemetry pfr_update fuse loop zram xfs crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 bnxt_en sha256_ssse3 sha1_ssse3 nvme ast nvme_core i2c_algo_bit wmi pinctrl_emmitsburg scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_multipath > > [ 213.796194] irq event stamp: 67740 > > [ 213.796195] hardirqs last enabled at (67739): [<ffffffff8c2015ca>] asm_sysvec_apic_timer_interrupt+0x1a/0x20 > > [ 213.796203] hardirqs last disabled at (67740): [<ffffffff8c14108e>] sysvec_apic_timer_interrupt+0xe/0x90 > > [ 213.796208] softirqs last enabled at (67686): [<ffffffff8b12115e>] __irq_exit_rcu+0xbe/0xe0 > > [ 213.796214] softirqs last disabled at (67681): [<ffffffff8b12115e>] __irq_exit_rcu+0xbe/0xe0 > > [ 213.796217] CPU: 108 PID: 3157 Comm: (udev-worker) Kdump: loaded Not tainted 6.7.0+ #9 > > [ 213.796220] Hardware name: Intel Corporation M50FCP2SBSTD/M50FCP2SBSTD, BIOS SE5C741.86B.01.01.0001.2211140926 11/14/2022 > > [ 213.796221] RIP: 0010:virtqueue_get_buf_ctx_split+0x8d/0x110 > > [ 213.796228] Code: 89 df e8 26 fe ff ff 0f b7 43 50 83 c0 01 66 89 43 50 f6 43 78 01 75 12 80 7b 42 00 48 8b 4b 68 8b 53 58 74 0f 66 87 44 51 04 <48> 89 e8 5b 5d c3 cc cc cc cc 66 89 44 51 04 0f ae f0 48 89 e8 5b > > [ 213.796230] RSP: 0018:ff4bbb362306f9b0 EFLAGS: 00000246 > > [ 213.796233] RAX: 0000000000000000 RBX: ff2f15095896f000 RCX: 0000000000000001 > > [ 213.796235] RDX: 0000000000000000 RSI: ff4bbb362306f9cc RDI: ff2f15095896f000 > > [ 213.796236] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 > > [ 213.796237] R10: 0000000000000003 R11: ff2f15095893cc40 R12: 0000000000000002 > > [ 213.796239] R13: 
0000000000000004 R14: 0000000000000000 R15: ff2f1509534f3000 > > [ 213.796240] FS: 00007f775847d0c0(0000) GS:ff2f1528bac00000(0000) knlGS:0000000000000000 > > [ 213.796242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 213.796243] CR2: 0000557f987b6e70 CR3: 0000002098602006 CR4: 0000000000f71ef0 > > [ 213.796245] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > [ 213.796246] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 > > [ 213.796247] PKRU: 55555554 > > [ 213.796249] Call Trace: > > [ 213.796250] <IRQ> > > [ 213.796252] ? watchdog_timer_fn+0x1c0/0x220 > > [ 213.796258] ? __pfx_watchdog_timer_fn+0x10/0x10 > > [ 213.796261] ? __hrtimer_run_queues+0x1af/0x380 > > [ 213.796269] ? hrtimer_interrupt+0xf8/0x230 > > [ 213.796274] ? __sysvec_apic_timer_interrupt+0x64/0x1a0 > > [ 213.796279] ? sysvec_apic_timer_interrupt+0x6d/0x90 > > [ 213.796282] </IRQ> > > [ 213.796284] <TASK> > > [ 213.796285] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 > > [ 213.796293] ? virtqueue_get_buf_ctx_split+0x8d/0x110 > > [ 213.796297] virtnet_send_command+0x18a/0x1f0 [virtio_net] > > [ 213.796310] _virtnet_set_queues+0xc6/0x120 [virtio_net] > > [ 213.796319] virtnet_probe+0xa06/0xd50 [virtio_net] > > [ 213.796328] virtio_dev_probe+0x195/0x230 > > [ 213.796333] really_probe+0x19f/0x400 > > [ 213.796338] ? __pfx___driver_attach+0x10/0x10 > > [ 213.796340] __driver_probe_device+0x78/0x160 > > [ 213.796343] driver_probe_device+0x1f/0x90 > > [ 213.796346] __driver_attach+0xd6/0x1d0 > > [ 213.796349] bus_for_each_dev+0x8c/0xe0 > > [ 213.796355] bus_add_driver+0x119/0x220 > > [ 213.796359] driver_register+0x59/0x100 > > [ 213.796362] ? __pfx_virtio_net_driver_init+0x10/0x10 [virtio_net] > > [ 213.796369] virtio_net_driver_init+0x8e/0xff0 [virtio_net] > > [ 213.796375] do_one_initcall+0x6f/0x380 > > [ 213.796384] do_init_module+0x60/0x240 > > [ 213.796388] init_module_from_file+0x86/0xc0 > > [ 213.796396] idempotent_init_module+0x129/0x2c0 > > [ 213.796406] __x64_sys_finit_module+0x5e/0xb0 > > [ 213.796409] do_syscall_64+0x60/0xe0 > > [ 213.796415] ? do_syscall_64+0x6f/0xe0 > > [ 213.796418] ? lockdep_hardirqs_on_prepare+0xe4/0x1a0 > > [ 213.796424] ? do_syscall_64+0x6f/0xe0 > > [ 213.796427] ? 
do_syscall_64+0x6f/0xe0 > > [ 213.796431] entry_SYSCALL_64_after_hwframe+0x6e/0x76 > > [ 213.796435] RIP: 0033:0x7f7758f279cd > > [ 213.796465] Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 33 e4 0c 00 f7 d8 64 89 01 48 > > [ 213.796467] RSP: 002b:00007ffe2cad8738 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 > > [ 213.796469] RAX: ffffffffffffffda RBX: 0000557f987a8180 RCX: 00007f7758f279cd > > [ 213.796471] RDX: 0000000000000000 RSI: 00007f77593e5453 RDI: 000000000000000f > > [ 213.796472] RBP: 00007f77593e5453 R08: 0000000000000000 R09: 00007ffe2cad8860 > > [ 213.796473] R10: 000000000000000f R11: 0000000000000246 R12: 0000000000020000 > > [ 213.796475] R13: 0000557f9879f8e0 R14: 0000000000000000 R15: 0000557f98783aa0 > > [ 213.796482] </TASK> > > " > > > > Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> > > --- > > drivers/net/virtio_net.c | 10 ++++++++-- > > 1 file changed, 8 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > index 51b1868d2f22..28b7dd917a43 100644 > > --- a/drivers/net/virtio_net.c > > +++ b/drivers/net/virtio_net.c > > @@ -2468,7 +2468,7 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd, > > { > > struct scatterlist *sgs[4], hdr, stat; > > unsigned out_num = 0, tmp; > > - int ret; > > + int ret, timeout = 200; > > > > /* Caller should know better */ > > BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ)); > > @@ -2502,8 +2502,14 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd, > > * into the hypervisor, so the request should be handled immediately. > > */ > > while (!virtqueue_get_buf(vi->cvq, &tmp) && > > - !virtqueue_is_broken(vi->cvq)) > > + !virtqueue_is_broken(vi->cvq)) { > > + if (timeout) > > + timeout--; > > This is not really a timeout, just a loop counter. 200 iterations could > be a very short time on reasonable H/W. I guess this avoid the soft > lockup, but possibly (likely?) breaks the functionality when we need to > loop for some non negligible time. > > I fear we need a more complex solution, as mentioned by Micheal in the > thread you quoted. > > Cheers, > > Paolo So a bunch of devices just queue a request on a workqueue. Problem is, errors are not reported then. I really feel this is something netdev core should support better: return a "deferred" status and have netdev core wait outside a lock on a waitqueue. -- MST ^ permalink raw reply [flat|nested] 29+ messages in thread
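A sketch of the "deferred" pattern suggested here, with all names invented for the example: the driver queues the command and returns at once, and the core waits for the result on a completion outside any lock, so errors are still reported and a slow device cannot trigger a soft lockup.

struct virtnet_deferred {
	struct work_struct work;
	struct completion done;
	int status;
};

static void virtnet_deferred_fn(struct work_struct *work)
{
	struct virtnet_deferred *def =
		container_of(work, struct virtnet_deferred, work);

	/* The real cvq command would run here, outside any core lock. */
	def->status = 0;
	complete(&def->done);		/* report the outcome to the waiter */
}

static int virtnet_run_deferred(struct virtnet_deferred *def)
{
	INIT_WORK(&def->work, virtnet_deferred_fn);
	init_completion(&def->done);
	schedule_work(&def->work);

	/* The waiting side, conceptually netdev core: no lock is held
	 * while we sleep, and a timeout bounds a broken device.
	 */
	if (!wait_for_completion_timeout(&def->done, 30 * HZ))
		return -ETIMEDOUT;

	return def->status;
}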
* Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang
  2024-01-18 13:14           ` Michael S. Tsirkin
@ 2024-01-19  1:42             ` Xuan Zhuo
  0 siblings, 0 replies; 29+ messages in thread
From: Xuan Zhuo @ 2024-01-19 1:42 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Zhu Yanjun, jasowang, davem, edumazet, kuba, virtualization,
	netdev, Zhu Yanjun, Paolo Abeni, Heng Qi

On Thu, 18 Jan 2024 08:14:21 -0500, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Tue, Jan 16, 2024 at 01:04:49PM +0100, Paolo Abeni wrote:
> > On Mon, 2024-01-15 at 09:29 +0800, Zhu Yanjun wrote:
> > > From: Zhu Yanjun <yanjun.zhu@linux.dev>
> > >
> > > [... commit message, soft-lockup trace and patch trimmed; they are
> > > quoted in full earlier in the thread ...]
> > >
> > >  	while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> > > -	       !virtqueue_is_broken(vi->cvq))
> > > +	       !virtqueue_is_broken(vi->cvq)) {
> > > +		if (timeout)
> > > +			timeout--;
> >
> > This is not really a timeout, just a loop counter. 200 iterations could
> > be a very short time on reasonable H/W. I guess this avoids the soft
> > lockup, but possibly (likely?) breaks the functionality when we need to
> > loop for some non-negligible time.
> >
> > I fear we need a more complex solution, as mentioned by Michael in the
> > thread you quoted.
> >
> > Cheers,
> >
> > Paolo
>
> So a bunch of devices just queue a request on a workqueue. Problem is,
> errors are not reported then. I really feel this is something netdev
> core should support better: return a "deferred" status and have netdev
> core wait outside a lock on a waitqueue.

And on the other side, more and more requests will be posted to the
cvq, so we should refactor this into an async mode. netdim now has this
requirement.

@Heng Qi

Thanks.

> --
> MST

^ permalink raw reply	[flat|nested] 29+ messages in thread
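For readers following the async suggestion: the deferral pattern
Michael describes, with the error-reporting gap closed by a completion
the caller can sleep on, might look roughly like the sketch below. All
of the names (cvq_cmd, cvq_cmd_work, cvq_send_deferred) are
hypothetical, invented only for illustration; the workqueue and
completion primitives are real kernel APIs, but the 30-second timeout
and the overall structure are assumptions, not the driver's actual
code.

	#include <linux/workqueue.h>
	#include <linux/completion.h>
	#include <linux/jiffies.h>
	#include <linux/errno.h>

	/* Hypothetical request wrapper; not part of virtio_net. */
	struct cvq_cmd {
		struct work_struct work;
		struct completion done;
		int status;
	};

	static void cvq_cmd_work(struct work_struct *work)
	{
		struct cvq_cmd *cmd = container_of(work, struct cvq_cmd, work);

		/* ... submit to the control virtqueue and poll for the
		 * reply here, away from the caller's locks ... */
		cmd->status = 0;	/* result from the device, so errors survive */
		complete(&cmd->done);	/* wake the waiting caller */
	}

	/* Queue the command, then sleep until it completes or times out. */
	static int cvq_send_deferred(struct workqueue_struct *wq,
				     struct cvq_cmd *cmd)
	{
		INIT_WORK(&cmd->work, cvq_cmd_work);
		init_completion(&cmd->done);
		queue_work(wq, &cmd->work);

		/* Sleeping wait, outside any spinlock; an unresponsive
		 * device yields -ETIMEDOUT instead of a soft lockup. */
		if (!wait_for_completion_timeout(&cmd->done,
						 msecs_to_jiffies(30 * 1000)))
			return -ETIMEDOUT;

		return cmd->status;
	}

On timeout, real code would still own the in-flight work item, so it
would need cancel_work_sync() or reference counting before freeing the
command; that lifetime question is exactly where the "deferred" status
support in netdev core that Michael asks for would help.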
end of thread

Thread overview: 29+ messages
2024-01-15 1:29 [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang Zhu Yanjun
2024-01-15 2:20 ` Jason Wang
2024-01-15 10:25 ` Zhu Yanjun
[not found] ` <6cf2699a-483d-4124-9782-b6a771a41e70@linux.dev>
2024-01-22 3:06 ` Jason Wang
2024-01-16 12:04 ` Paolo Abeni
2024-01-18 12:01 ` Zhu Yanjun
2024-01-19 14:27 ` Heng Qi
2024-01-19 17:29 ` Andrew Lunn
2024-01-20 4:21 ` Zhu Yanjun
2024-01-22 2:12 ` Zhu Yanjun
2024-01-22 3:14 ` Jason Wang
2024-01-22 3:58 ` Xuan Zhuo
2024-01-22 4:16 ` Jason Wang
2024-01-22 6:16 ` Xuan Zhuo
2024-01-22 6:55 ` Jason Wang
2024-01-22 6:58 ` Jason Wang
2024-01-22 7:02 ` Xuan Zhuo
2024-01-22 7:19 ` Jason Wang
2024-01-22 7:25 ` Xuan Zhuo
2024-01-22 7:57 ` Jason Wang
2024-01-22 8:01 ` Xuan Zhuo
2024-01-22 8:32 ` Jason Wang
2024-01-22 9:11 ` Xuan Zhuo
[not found] ` <e46d04d7-4eb7-4fcd-821a-d558c07531b7@linux.dev>
2024-01-26 3:13 ` Zhu Yanjun
2024-01-22 7:01 ` Xuan Zhuo
2024-01-22 3:08 ` Jason Wang
2024-01-22 4:42 ` Heng Qi
2024-01-18 13:14 ` Michael S. Tsirkin
2024-01-19 1:42 ` Xuan Zhuo