* next: WARNING: CPU: 0 PID: 63 at kernel/workqueue.c:1999 worker_enter_idle+0xb2/0xc0
@ 2023-05-19 10:16 Naresh Kamboju
[not found] ` <CALm+0cXew-QX9gkHkL7majJuENP1C_WLVG=pQ_6RGZ1UgQ79Eg@mail.gmail.com>
0 siblings, 1 reply; 4+ messages in thread
From: Naresh Kamboju @ 2023-05-19 10:16 UTC (permalink / raw)
To: open list, Linux-Next Mailing List, lkft-triage,
clang-built-linux
Cc: Nathan Chancellor, Arnd Bergmann, Dan Carpenter, Nick Desaulniers,
Anders Roxell
Linux next 6.4.0-rc2-next-20230519 fails intermittently on qemu-x86_64
with the following kernel warning. The occurrence is 3 times out of 100 times.
Noticed with gcc-12 and clang-17 built kernels.
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Boot log:
-----------
<4>[ 2.460214] ------------[ cut here ]------------
<4>[ 2.460602] WARNING: CPU: 0 PID: 63 at kernel/workqueue.c:1999
worker_enter_idle+0xb2/0xc0
<4>[ 2.462163] Modules linked in:
<4>[ 2.463401] CPU: 0 PID: 63 Comm: kworker/0:2 Not tainted
6.4.0-rc2-next-20230519 #1
<4>[ 2.463771] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
BIOS 1.14.0-2 04/01/2014
<4>[ 2.465127] Workqueue: 0x0 (events)
<4>[ 2.465678] RIP: 0010:worker_enter_idle+0xb2/0xc0
<4>[ 2.466095] Code: e0 93 04 00 48 03 35 7d 59 bc 01 e8 48 1e 0a
00 8b 43 3c 39 43 38 75 06 83 7b 24 00 75 0c 5b 5d c3 cc cc cc cc cc
0f 0b eb f4 <0f> 0b eb f0 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90
90 90 90
<4>[ 2.467476] RSP: 0000:ffffa44a4042feb0 EFLAGS: 00000086
<4>[ 2.468003] RAX: 0000000000000003 RBX: ffff8d71bbc2b000 RCX:
0000000050480000
<4>[ 2.468470] RDX: 000000010000087f RSI: 0000000000000086 RDI:
0000000000000001
<4>[ 2.468900] RBP: ffffa44a4042feb8 R08: 0000000000100000 R09:
0000000000000001
<4>[ 2.469608] R10: ffff8d7140882d80 R11: 0000000000000000 R12:
ffff8d71bbc2b028
<4>[ 2.470204] R13: ffff8d7140882d00 R14: ffff8d71bbc2b000 R15:
ffff8d71413df700
<4>[ 2.470720] FS: 0000000000000000(0000)
GS:ffff8d71bbc00000(0000) knlGS:0000000000000000
<4>[ 2.471248] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 2.471648] CR2: ffff8d71bffff000 CR3: 0000000123e40000 CR4:
00000000000006f0
<4>[ 2.472614] Call Trace:
<4>[ 2.473152] <TASK>
<4>[ 2.474182] worker_thread+0x71/0x430
<4>[ 2.474992] ? _raw_spin_unlock_irqrestore+0x28/0x50
<4>[ 2.475263] kthread+0x103/0x120
<4>[ 2.475493] ? __pfx_worker_thread+0x10/0x10
<4>[ 2.476355] ? __pfx_kthread+0x10/0x10
<4>[ 2.476635] ret_from_fork+0x2c/0x50
<4>[ 2.477051] </TASK>
<4>[ 2.477785] ---[ end trace 0000000000000000 ]---
<6>[ 2.748347] tsc: Refined TSC clocksource calibration: 2899.874 MHz
<6>[ 2.749124] clocksource: tsc: mask: 0xffffffffffffffff
max_cycles: 0x29cccc5071e, max_idle_ns: 440795322437 ns
<6>[ 2.751064] clocksource: Switched to clocksource tsc
log:
-----
- https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230519/testrun/17078554/suite/boot/test/clang-nightly-lkftconfig/log
- https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230519/testrun/17078554/suite/boot/test/clang-nightly-lkftconfig/details/
--
Linaro LKFT
https://lkft.linaro.org
^ permalink raw reply [flat|nested] 4+ messages in thread[parent not found: <CALm+0cXew-QX9gkHkL7majJuENP1C_WLVG=pQ_6RGZ1UgQ79Eg@mail.gmail.com>]
* Re: next: WARNING: CPU: 0 PID: 63 at kernel/workqueue.c:1999 worker_enter_idle+0xb2/0xc0 [not found] ` <CALm+0cXew-QX9gkHkL7majJuENP1C_WLVG=pQ_6RGZ1UgQ79Eg@mail.gmail.com> @ 2023-05-22 18:20 ` Tejun Heo 2023-05-22 21:12 ` Tejun Heo 0 siblings, 1 reply; 4+ messages in thread From: Tejun Heo @ 2023-05-22 18:20 UTC (permalink / raw) To: Z qiang Cc: Naresh Kamboju, open list, Linux-Next Mailing List, lkft-triage, clang-built-linux, Nathan Chancellor, Arnd Bergmann, Dan Carpenter, Nick Desaulniers, Anders Roxell Hello, On Mon, May 22, 2023 at 09:24:09PM +0800, Z qiang wrote: > diff --git a/kernel/workqueue.c b/kernel/workqueue.c > index 9c5c1cfa478f..f8d739fef311 100644 > --- a/kernel/workqueue.c > +++ b/kernel/workqueue.c > @@ -1060,10 +1060,9 @@ void wq_worker_running(struct task_struct *task) > * and leave with an unexpected pool->nr_running == 1 on the newly > unbound > * pool. Protect against such race. > */ > - preempt_disable(); > + local_irq_disable(); > if (!(worker->flags & WORKER_NOT_RUNNING)) > worker->pool->nr_running++; > - preempt_enable(); > > /* > * CPU intensive auto-detection cares about how long a work item > hogged > @@ -1072,6 +1071,7 @@ void wq_worker_running(struct task_struct *task) > worker->current_at = worker->task->se.sum_exec_runtime; > > worker->sleeping = 0; > + local_irq_enable(); > } Ah, yeah, this is correct. Now we're modifying nr_running from timer tick too, so if don't block irq, the timer tick can ruin the not-irq-protected read-write-modify nr_running update from wq_worker_running(). Naresh, can you please confirm the fix? Thanks. -- tejun ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: next: WARNING: CPU: 0 PID: 63 at kernel/workqueue.c:1999 worker_enter_idle+0xb2/0xc0 2023-05-22 18:20 ` Tejun Heo @ 2023-05-22 21:12 ` Tejun Heo 2023-05-23 9:25 ` Naresh Kamboju 0 siblings, 1 reply; 4+ messages in thread From: Tejun Heo @ 2023-05-22 21:12 UTC (permalink / raw) To: Z qiang Cc: Naresh Kamboju, open list, Linux-Next Mailing List, lkft-triage, clang-built-linux, Nathan Chancellor, Arnd Bergmann, Dan Carpenter, Nick Desaulniers, Anders Roxell On Mon, May 22, 2023 at 08:20:38AM -1000, Tejun Heo wrote: > Hello, > > On Mon, May 22, 2023 at 09:24:09PM +0800, Z qiang wrote: > > diff --git a/kernel/workqueue.c b/kernel/workqueue.c > > index 9c5c1cfa478f..f8d739fef311 100644 > > --- a/kernel/workqueue.c > > +++ b/kernel/workqueue.c > > @@ -1060,10 +1060,9 @@ void wq_worker_running(struct task_struct *task) > > * and leave with an unexpected pool->nr_running == 1 on the newly > > unbound > > * pool. Protect against such race. > > */ > > - preempt_disable(); > > + local_irq_disable(); > > if (!(worker->flags & WORKER_NOT_RUNNING)) > > worker->pool->nr_running++; > > - preempt_enable(); > > > > /* > > * CPU intensive auto-detection cares about how long a work item > > hogged > > @@ -1072,6 +1071,7 @@ void wq_worker_running(struct task_struct *task) > > worker->current_at = worker->task->se.sum_exec_runtime; > > > > worker->sleeping = 0; > > + local_irq_enable(); > > } > > Ah, yeah, this is correct. Now we're modifying nr_running from timer tick > too, so if don't block irq, the timer tick can ruin the not-irq-protected > read-write-modify nr_running update from wq_worker_running(). Naresh, can > you please confirm the fix? Z qiang, while waiting for Naresh's test result, can you send the fix as a proper signed-off-patch? Thanks. -- tejun ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: next: WARNING: CPU: 0 PID: 63 at kernel/workqueue.c:1999 worker_enter_idle+0xb2/0xc0 2023-05-22 21:12 ` Tejun Heo @ 2023-05-23 9:25 ` Naresh Kamboju 0 siblings, 0 replies; 4+ messages in thread From: Naresh Kamboju @ 2023-05-23 9:25 UTC (permalink / raw) To: Tejun Heo, Anders Roxell Cc: Z qiang, open list, Linux-Next Mailing List, lkft-triage, clang-built-linux, Nathan Chancellor, Arnd Bergmann, Dan Carpenter, Nick Desaulniers On Tue, 23 May 2023 at 02:42, Tejun Heo <tj@kernel.org> wrote: > > On Mon, May 22, 2023 at 08:20:38AM -1000, Tejun Heo wrote: > > Hello, > > > > On Mon, May 22, 2023 at 09:24:09PM +0800, Z qiang wrote: > > > diff --git a/kernel/workqueue.c b/kernel/workqueue.c > > > index 9c5c1cfa478f..f8d739fef311 100644 > > > --- a/kernel/workqueue.c > > > +++ b/kernel/workqueue.c > > > @@ -1060,10 +1060,9 @@ void wq_worker_running(struct task_struct *task) > > > * and leave with an unexpected pool->nr_running == 1 on the newly > > > unbound > > > * pool. Protect against such race. > > > */ > > > - preempt_disable(); > > > + local_irq_disable(); > > > if (!(worker->flags & WORKER_NOT_RUNNING)) > > > worker->pool->nr_running++; > > > - preempt_enable(); > > > > > > /* > > > * CPU intensive auto-detection cares about how long a work item > > > hogged > > > @@ -1072,6 +1071,7 @@ void wq_worker_running(struct task_struct *task) > > > worker->current_at = worker->task->se.sum_exec_runtime; > > > > > > worker->sleeping = 0; > > > + local_irq_enable(); > > > } > > > > Ah, yeah, this is correct. Now we're modifying nr_running from timer tick > > too, so if don't block irq, the timer tick can ruin the not-irq-protected > > read-write-modify nr_running update from wq_worker_running(). Naresh, can > > you please confirm the fix? > > Z qiang, while waiting for Naresh's test result, can you send the fix as a > proper signed-off-patch? The proposed patch applied on top of Linux next and boot tested for more than 100 times and is still running in a loop by Anders. The reported warning did not reproduce again. Tested-by: Anders Roxell <anders.roxell@linaro.org> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> - Naresh > > Thanks. > > -- > tejun ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2023-05-23 9:25 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-05-19 10:16 next: WARNING: CPU: 0 PID: 63 at kernel/workqueue.c:1999 worker_enter_idle+0xb2/0xc0 Naresh Kamboju
[not found] ` <CALm+0cXew-QX9gkHkL7majJuENP1C_WLVG=pQ_6RGZ1UgQ79Eg@mail.gmail.com>
2023-05-22 18:20 ` Tejun Heo
2023-05-22 21:12 ` Tejun Heo
2023-05-23 9:25 ` Naresh Kamboju
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox