* [Regression] 6.9.0: WARNING: workqueue: WQ_MEM_RECLAIM ttm:ttm_bo_delayed_delete [ttm] is flushing !WQ_MEM_RECLAIM events:qxl_gc_work [qxl]
@ 2024-04-30 6:13 David Wang
2024-05-06 14:30 ` David Wang
0 siblings, 1 reply; 6+ messages in thread
From: David Wang @ 2024-04-30 6:13 UTC (permalink / raw)
To: dreaming.about.electric.sheep, airlied, kraxel, maarten.lankhorst,
mripard, tzimmermann, airlied, daniel
Cc: virtualization, spice-devel, dri-devel, linux-kernel, regressions,
David Wang
Hi,
I got the following kernel WARNING when my 2-core KVM guest (6.9.0-rc6) was under high CPU load.
[Mon Apr 29 21:36:04 2024] ------------[ cut here ]------------
[Mon Apr 29 21:36:04 2024] workqueue: WQ_MEM_RECLAIM ttm:ttm_bo_delayed_delete [ttm] is flushing !WQ_MEM_RECLAIM events:qxl_gc_work [qxl]
[Mon Apr 29 21:36:04 2024] WARNING: CPU: 1 PID: 792 at kernel/workqueue.c:3728 check_flush_dependency+0xfd/0x120
[Mon Apr 29 21:36:04 2024] Modules linked in: xt_conntrack(E) nft_chain_nat(E) xt_MASQUERADE(E) nf_nat(E) nf_conntrack_netlink(E) xfrm_user(E) xfrm_algo(E) xt_addrtype(E) nft_compat(E) nf_tables(E) br_netfilter(E) bridge(E) stp(E) llc(E) ip_set(E) nfnetlink(E) ip_vs_sh(E) ip_vs_wrr(E) ip_vs_rr(E) ip_vs(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) intel_rapl_msr(E) intel_rapl_common(E) crct10dif_pclmul(E) ghash_clmulni_intel(E) snd_hda_codec_generic(E) snd_hda_intel(E) snd_intel_dspcfg(E) sha512_ssse3(E) snd_hda_codec(E) sha512_generic(E) sha256_ssse3(E) overlay(E) sha1_ssse3(E) snd_hda_core(E) snd_hwdep(E) aesni_intel(E) snd_pcm(E) crypto_simd(E) pcspkr(E) cryptd(E) joydev(E) qxl(E) snd_timer(E) drm_ttm_helper(E) ttm(E) evdev(E) snd(E) iTCO_wdt(E) serio_raw(E) sg(E) virtio_balloon(E) virtio_console(E) iTCO_vendor_support(E) soundcore(E) qemu_fw_cfg(E) drm_kms_helper(E) button(E) binfmt_misc(E) fuse(E) drm(E) configfs(E) virtio_rng(E) rng_core(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E)
[Mon Apr 29 21:36:04 2024] hid_generic(E) usbhid(E) hid(E) sr_mod(E) cdrom(E) ahci(E) libahci(E) virtio_net(E) net_failover(E) failover(E) virtio_blk(E) libata(E) xhci_pci(E) crc32_pclmul(E) crc32c_intel(E) scsi_mod(E) scsi_common(E) lpc_ich(E) i2c_i801(E) xhci_hcd(E) psmouse(E) i2c_smbus(E) virtio_pci(E) usbcore(E) virtio_pci_legacy_dev(E) virtio_pci_modern_dev(E) usb_common(E) virtio(E) mfd_core(E) virtio_ring(E)
[Mon Apr 29 21:36:04 2024] CPU: 1 PID: 792 Comm: kworker/u13:4 Tainted: G E 6.9.0-rc6-linan-5 #197
[Mon Apr 29 21:36:04 2024] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[Mon Apr 29 21:36:04 2024] Workqueue: ttm ttm_bo_delayed_delete [ttm]
[Mon Apr 29 21:36:04 2024] RIP: 0010:check_flush_dependency+0xfd/0x120
[Mon Apr 29 21:36:04 2024] Code: 8b 45 18 48 8d b2 c0 00 00 00 49 89 e8 48 8d 8b c0 00 00 00 48 c7 c7 68 30 a4 a7 c6 05 9b 12 6e 01 01 48 89 c2 e8 53 b9 fd ff <0f> 0b e9 1e ff ff ff 80 3d 86 12 6e 01 00 75 93 e9 4a ff ff ff 66
[Mon Apr 29 21:36:04 2024] RSP: 0018:ffff9d31805abce8 EFLAGS: 00010086
[Mon Apr 29 21:36:04 2024] RAX: 0000000000000000 RBX: ffff8c8c4004ee00 RCX: 0000000000000000
[Mon Apr 29 21:36:04 2024] RDX: 0000000000000003 RSI: 0000000000000027 RDI: 00000000ffffffff
[Mon Apr 29 21:36:04 2024] RBP: ffffffffc0b53570 R08: 0000000000000000 R09: 0000000000000003
[Mon Apr 29 21:36:04 2024] R10: ffff9d31805abb80 R11: ffffffffa7cc1108 R12: ffff8c8c42eb8000
[Mon Apr 29 21:36:04 2024] R13: ffff8c8c48077900 R14: ffff8c8cbbd30b80 R15: 0000000000000001
[Mon Apr 29 21:36:04 2024] FS: 0000000000000000(0000) GS:ffff8c8cbbd00000(0000) knlGS:0000000000000000
[Mon Apr 29 21:36:04 2024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Mon Apr 29 21:36:04 2024] CR2: 00007ffd38bb3ff8 CR3: 000000010217a000 CR4: 0000000000350ef0
[Mon Apr 29 21:36:04 2024] Call Trace:
[Mon Apr 29 21:36:04 2024] <TASK>
[Mon Apr 29 21:36:04 2024] ? __warn+0x7c/0x120
[Mon Apr 29 21:36:04 2024] ? check_flush_dependency+0xfd/0x120
[Mon Apr 29 21:36:04 2024] ? report_bug+0x18d/0x1c0
[Mon Apr 29 21:36:04 2024] ? srso_return_thunk+0x5/0x5f
[Mon Apr 29 21:36:04 2024] ? handle_bug+0x3c/0x80
[Mon Apr 29 21:36:04 2024] ? exc_invalid_op+0x13/0x60
[Mon Apr 29 21:36:04 2024] ? asm_exc_invalid_op+0x16/0x20
[Mon Apr 29 21:36:04 2024] ? __pfx_qxl_gc_work+0x10/0x10 [qxl]
[Mon Apr 29 21:36:04 2024] ? check_flush_dependency+0xfd/0x120
[Mon Apr 29 21:36:04 2024] ? check_flush_dependency+0xfd/0x120
[Mon Apr 29 21:36:04 2024] __flush_work.isra.0+0xc0/0x270
[Mon Apr 29 21:36:04 2024] ? srso_return_thunk+0x5/0x5f
[Mon Apr 29 21:36:04 2024] ? srso_return_thunk+0x5/0x5f
[Mon Apr 29 21:36:04 2024] ? __queue_work.part.0+0x18b/0x3d0
[Mon Apr 29 21:36:04 2024] ? srso_return_thunk+0x5/0x5f
[Mon Apr 29 21:36:04 2024] qxl_queue_garbage_collect+0x7f/0x90 [qxl]
[Mon Apr 29 21:36:04 2024] qxl_fence_wait+0x9c/0x180 [qxl]
[Mon Apr 29 21:36:04 2024] dma_fence_wait_timeout+0x61/0x130
[Mon Apr 29 21:36:04 2024] dma_resv_wait_timeout+0x6d/0xd0
[Mon Apr 29 21:36:04 2024] ttm_bo_delayed_delete+0x26/0x80 [ttm]
[Mon Apr 29 21:36:04 2024] process_one_work+0x18c/0x3b0
[Mon Apr 29 21:36:04 2024] worker_thread+0x273/0x390
[Mon Apr 29 21:36:04 2024] ? __pfx_worker_thread+0x10/0x10
[Mon Apr 29 21:36:04 2024] kthread+0xdd/0x110
[Mon Apr 29 21:36:04 2024] ? __pfx_kthread+0x10/0x10
[Mon Apr 29 21:36:04 2024] ret_from_fork+0x30/0x50
[Mon Apr 29 21:36:04 2024] ? __pfx_kthread+0x10/0x10
[Mon Apr 29 21:36:04 2024] ret_from_fork_asm+0x1a/0x30
[Mon Apr 29 21:36:04 2024] </TASK>
[Mon Apr 29 21:36:04 2024] ---[ end trace 0000000000000000 ]---
I found the exact same warning message mentioned in
https://lore.kernel.org/lkml/20240404181448.1643-1-dreaming.about.electric.sheep@gmail.com/T/#m8c2ecc83ebba8717b1290ec28d4dc15f2fa595d5
and confirmed that the warning is caused by 07ed11afb68d94eadd4ffc082b97c2331307c5ea; reverting that commit makes the warning go away.
It seems that under heavy load, qxl_queue_garbage_collect is called from within
a WQ_MEM_RECLAIM worker (ttm:ttm_bo_delayed_delete) and flushes qxl_gc_work,
which runs on the !WQ_MEM_RECLAIM "events" workqueue. This triggers the kernel
WARNING in check_flush_dependency.
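
To make the dependency rule concrete, here is a minimal userspace sketch of the condition that the warning text describes. It is a simplified model, not the kernel's check_flush_dependency: the struct model_wq type, the MODEL_WQ_MEM_RECLAIM value and the flush_would_warn helper are all hypothetical names made up for illustration, and other conditions the real function may consider are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag value for this model; not the kernel's definition. */
#define MODEL_WQ_MEM_RECLAIM 0x1u

/* Hypothetical stand-in for a workqueue: just a name and its flags. */
struct model_wq {
	const char *name;
	unsigned int flags;
};

/*
 * Returns true when a flush would trip the warning in this simplified model:
 * the flusher is a worker of a reclaim workqueue and the flushed work lives
 * on a workqueue without the reclaim flag.
 * current_wq is NULL when the flusher is not a workqueue worker at all.
 */
static bool flush_would_warn(const struct model_wq *current_wq,
			     const struct model_wq *target_wq)
{
	assert(target_wq != NULL);

	/* Flushing a reclaim-safe workqueue is always fine in this model. */
	if (target_wq->flags & MODEL_WQ_MEM_RECLAIM)
		return false;

	return current_wq != NULL &&
	       (current_wq->flags & MODEL_WQ_MEM_RECLAIM) != 0;
}
```

In the trace above, the flusher corresponds to the "ttm" workqueue (reclaim flag set) and the target to "events" (flag not set), which is the one combination this model reports.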
I tried the following change, which sets the flush flag to false.
The warning is gone, but I am not sure whether there are other side effects,
especially with regard to the issue mentioned in
https://lore.kernel.org/lkml/20240404181448.1643-2-dreaming.about.electric.sheep@gmail.com/T/#m988ffad2000c794dcfdab7e60b03db93d8726391
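
As a rough illustration of what the flush flag changes for the caller, the toy model below contrasts the two modes. It is only a sketch under assumptions: struct model_dev, model_gc_work and model_queue_gc are hypothetical stand-ins rather than the qxl driver's code, and "waiting for the work" is modelled by simply running it inline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the device state; not struct qxl_device. */
struct model_dev {
	int ring_entries;	/* releases waiting to be collected */
	bool gc_pending;	/* work queued but not yet run */
};

/* Stand-in for the work item: drains the ring and clears the pending mark. */
static void model_gc_work(struct model_dev *dev)
{
	dev->ring_entries = 0;
	dev->gc_pending = false;
}

/*
 * Queue garbage collection if the ring is not idle.
 * With flush == true the caller waits for the work (modelled here by running
 * it inline); with flush == false the caller returns while it is still pending.
 * Returns false when there was nothing to collect.
 */
static bool model_queue_gc(struct model_dev *dev, bool flush)
{
	assert(dev != NULL);

	if (dev->ring_entries == 0)
		return false;

	dev->gc_pending = true;
	if (flush)
		model_gc_work(dev);
	return true;
}
```

In this model, a caller that passes false can no longer assume the ring has been drained when the call returns, so any loop around it has to re-check its own completion condition on each pass. Whether that matters for the issue linked above is exactly the open question.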
Signed-off-by: David Wang <00107082@163.com>
---
drivers/gpu/drm/qxl/qxl_release.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
index 9febc8b73f09..f372085c5aad 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -76,7 +76,7 @@ static long qxl_fence_wait(struct dma_fence *fence, bool intr,
 		qxl_io_notify_oom(qdev);
 
 		for (count = 0; count < 11; count++) {
-			if (!qxl_queue_garbage_collect(qdev, true))
+			if (!qxl_queue_garbage_collect(qdev, false))
 				break;
 
 			if (dma_fence_is_signaled(fence))
--
2.39.2
David
* Re: [Regression] 6.9.0: WARNING: workqueue: WQ_MEM_RECLAIM ttm:ttm_bo_delayed_delete [ttm] is flushing !WQ_MEM_RECLAIM events:qxl_gc_work [qxl]
2024-04-30 6:13 [Regression] 6.9.0: WARNING: workqueue: WQ_MEM_RECLAIM ttm:ttm_bo_delayed_delete [ttm] is flushing !WQ_MEM_RECLAIM events:qxl_gc_work [qxl] David Wang
@ 2024-05-06 14:30 ` David Wang
2024-05-07 5:04 ` Linux regression tracking (Thorsten Leemhuis)
0 siblings, 1 reply; 6+ messages in thread
From: David Wang @ 2024-05-06 14:30 UTC (permalink / raw)
To: 00107082
Cc: airlied, airlied, daniel, dreaming.about.electric.sheep,
dri-devel, kraxel, linux-kernel, maarten.lankhorst, mripard,
regressions, spice-devel, tzimmermann, virtualization
The kernel warning still shows up in 6.9.0-rc7.
(I think 4 high load processes on a 2-Core VM could easily trigger the kernel warning.)
Thanks
David
* Re: [Regression] 6.9.0: WARNING: workqueue: WQ_MEM_RECLAIM ttm:ttm_bo_delayed_delete [ttm] is flushing !WQ_MEM_RECLAIM events:qxl_gc_work [qxl]
2024-05-06 14:30 ` David Wang
@ 2024-05-07 5:04 ` Linux regression tracking (Thorsten Leemhuis)
2024-05-08 12:35 ` Anders Blomdell
0 siblings, 1 reply; 6+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2024-05-07 5:04 UTC (permalink / raw)
To: David Wang
Cc: airlied, airlied, daniel, dreaming.about.electric.sheep,
dri-devel, kraxel, linux-kernel, maarten.lankhorst, mripard,
regressions, spice-devel, tzimmermann, virtualization
On 06.05.24 16:30, David Wang wrote:
>> On 30.04.24 08:13, David Wang wrote:
>> And confirmed that the warning is caused by
>> 07ed11afb68d94eadd4ffc082b97c2331307c5ea and reverting it can fix.
>
> The kernel warning still shows up in 6.9.0-rc7.
> (I think 4 high load processes on a 2-Core VM could easily trigger the kernel warning.)
Thx for the report. Linus just reverted the commit 07ed11afb68 you
mentioned in your initial mail (I put that quote in again, see above):
3628e0383dd349 ("Reapply "drm/qxl: simplify qxl_fence_wait"")
https://git.kernel.org/torvalds/c/3628e0383dd349f02f882e612ab6184e4bb3dc10
So this hopefully should be history now.
Ciao, Thorsten
* Re: [Regression] 6.9.0: WARNING: workqueue: WQ_MEM_RECLAIM ttm:ttm_bo_delayed_delete [ttm] is flushing !WQ_MEM_RECLAIM events:qxl_gc_work [qxl]
2024-05-07 5:04 ` Linux regression tracking (Thorsten Leemhuis)
@ 2024-05-08 12:35 ` Anders Blomdell
2024-05-08 12:51 ` Linux regression tracking (Thorsten Leemhuis)
0 siblings, 1 reply; 6+ messages in thread
From: Anders Blomdell @ 2024-05-08 12:35 UTC (permalink / raw)
To: Linux regressions mailing list, David Wang
Cc: airlied, airlied, daniel, dreaming.about.electric.sheep,
dri-devel, kraxel, linux-kernel, maarten.lankhorst, mripard,
spice-devel, tzimmermann, virtualization, stable
On 2024-05-07 07:04, Linux regression tracking (Thorsten Leemhuis) wrote:
>
>
> On 06.05.24 16:30, David Wang wrote:
>>> On 30.04.24 08:13, David Wang wrote:
>
>>> And confirmed that the warning is caused by
>>> 07ed11afb68d94eadd4ffc082b97c2331307c5ea and reverting it can fix.
>>
>> The kernel warning still shows up in 6.9.0-rc7.
>> (I think 4 high load processes on a 2-Core VM could easily trigger the kernel warning.)
>
> Thx for the report. Linus just reverted the commit 07ed11afb68 you
> mentioned in your initial mail (I put that quote in again, see above):
>
> 3628e0383dd349 ("Reapply "drm/qxl: simplify qxl_fence_wait"")
> https://git.kernel.org/torvalds/c/3628e0383dd349f02f882e612ab6184e4bb3dc10
>
> So this hopefully should be history now.
>
> Ciao, Thorsten
>
Since this affects the 6.8 series (6.8.7 and onwards), I made a CC to stable@vger.kernel.org
/Anders
* Re: [Regression] 6.9.0: WARNING: workqueue: WQ_MEM_RECLAIM ttm:ttm_bo_delayed_delete [ttm] is flushing !WQ_MEM_RECLAIM events:qxl_gc_work [qxl]
2024-05-08 12:35 ` Anders Blomdell
@ 2024-05-08 12:51 ` Linux regression tracking (Thorsten Leemhuis)
2024-05-13 11:32 ` Greg KH
0 siblings, 1 reply; 6+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2024-05-08 12:51 UTC (permalink / raw)
To: stable, Greg KH
Cc: airlied, airlied, daniel, dreaming.about.electric.sheep,
dri-devel, kraxel, linux-kernel, maarten.lankhorst, mripard,
spice-devel, tzimmermann, virtualization, Anders Blomdell,
Linux regressions mailing list, David Wang
On 08.05.24 14:35, Anders Blomdell wrote:
> On 2024-05-07 07:04, Linux regression tracking (Thorsten Leemhuis) wrote:
>> On 06.05.24 16:30, David Wang wrote:
>>>> On 30.04.24 08:13, David Wang wrote:
>>
>>>> And confirmed that the warning is caused by
>>>> 07ed11afb68d94eadd4ffc082b97c2331307c5ea and reverting it can fix.
>>>
>>> The kernel warning still shows up in 6.9.0-rc7.
>>> (I think 4 high load processes on a 2-Core VM could easily trigger
>>> the kernel warning.)
>>
>> Thx for the report. Linus just reverted the commit 07ed11afb68 you
>> mentioned in your initial mail (I put that quote in again, see above):
>>
>> 3628e0383dd349 ("Reapply "drm/qxl: simplify qxl_fence_wait"")
>> https://git.kernel.org/torvalds/c/3628e0383dd349f02f882e612ab6184e4bb3dc10
>>
>> So this hopefully should be history now.
>>
> Since this affects the 6.8 series (6.8.7 and onwards), I made a CC to
> stable@vger.kernel.org
Ohh, good idea, I thought Linus had added a stable tag, but that is not
the case. Adding Greg as well and making things explicit:
@Greg: you might want to add 3628e0383dd349 ("Reapply "drm/qxl: simplify
qxl_fence_wait"") to all branches that received 07ed11afb68d94 ("Revert
"drm/qxl: simplify qxl_fence_wait"") (which afaics went into v6.8.7,
v6.6.28, v6.1.87, and v5.15.156).
Ciao, Thorsten
* Re: [Regression] 6.9.0: WARNING: workqueue: WQ_MEM_RECLAIM ttm:ttm_bo_delayed_delete [ttm] is flushing !WQ_MEM_RECLAIM events:qxl_gc_work [qxl]
2024-05-08 12:51 ` Linux regression tracking (Thorsten Leemhuis)
@ 2024-05-13 11:32 ` Greg KH
0 siblings, 0 replies; 6+ messages in thread
From: Greg KH @ 2024-05-13 11:32 UTC (permalink / raw)
To: Linux regressions mailing list
Cc: stable, airlied, airlied, daniel, dreaming.about.electric.sheep,
dri-devel, kraxel, linux-kernel, maarten.lankhorst, mripard,
spice-devel, tzimmermann, virtualization, Anders Blomdell,
David Wang
On Wed, May 08, 2024 at 02:51:10PM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 08.05.24 14:35, Anders Blomdell wrote:
> > On 2024-05-07 07:04, Linux regression tracking (Thorsten Leemhuis) wrote:
> >> On 06.05.24 16:30, David Wang wrote:
> >>>> On 30.04.24 08:13, David Wang wrote:
> >>
> >>>> And confirmed that the warning is caused by
> >>>> 07ed11afb68d94eadd4ffc082b97c2331307c5ea and reverting it can fix.
> >>>
> >>> The kernel warning still shows up in 6.9.0-rc7.
> >>> (I think 4 high load processes on a 2-Core VM could easily trigger
> >>> the kernel warning.)
> >>
> >> Thx for the report. Linus just reverted the commit 07ed11afb68 you
> >> mentioned in your initial mail (I put that quote in again, see above):
> >>
> >> 3628e0383dd349 ("Reapply "drm/qxl: simplify qxl_fence_wait"")
> >> https://git.kernel.org/torvalds/c/3628e0383dd349f02f882e612ab6184e4bb3dc10
> >>
> >> So this hopefully should be history now.
> >>
> > Since this affects the 6.8 series (6.8.7 and onwards), I made a CC to
> > stable@vger.kernel.org
>
> Ohh, good idea, I thought Linus had added a stable tag, but that is not
> the case. Adding Greg as well and making things explicit:
>
> @Greg: you might want to add 3628e0383dd349 ("Reapply "drm/qxl: simplify
> qxl_fence_wait"") to all branches that received 07ed11afb68d94 ("Revert
> "drm/qxl: simplify qxl_fence_wait"") (which afaics went into v6.8.7,
> v6.6.28, v6.1.87, and v5.15.156).
Now queued up, thanks.
greg k-h