* [peterz-queue:sched/core] [sched/fair] 420356c350: WARNING:at_kernel/sched/core.c:#__might_sleep
From: kernel test robot @ 2024-08-16 9:15 UTC
To: Peter Zijlstra
Cc: oe-lkp, lkp, linux-kernel, aubrey.li, yu.c.chen, oliver.sang
Hello,
kernel test robot noticed "WARNING:at_kernel/sched/core.c:#__might_sleep" on:
commit: 420356c3504091f0f6021974389df7c58f365dad ("sched/fair: Implement delayed dequeue")
https://git.kernel.org/cgit/linux/kernel/git/peterz/queue.git sched/core
in testcase: rcutorture
version:
with the following parameters:
runtime: 300s
test: cpuhotplug
torture_type: tasks-rude
compiler: clang-18
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
(please refer to attached dmesg/kmsg for entire log/backtrace)
+-----------------------------------------------+------------+------------+
|                                               | 18fdefe603 | 420356c350 |
+-----------------------------------------------+------------+------------+
| WARNING:at_kernel/sched/core.c:#__might_sleep | 0          | 18         |
| RIP:__might_sleep                             | 0          | 18         |
+-----------------------------------------------+------------+------------+
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202408161619.9ed8b83e-lkp@intel.com
[ 86.252370][ T674] ------------[ cut here ]------------
[ 86.252945][ T674] do not call blocking ops when !TASK_RUNNING; state=1 set at kthread_worker_fn (kernel/kthread.c:?)
[ 86.254001][ T674] WARNING: CPU: 1 PID: 674 at kernel/sched/core.c:8469 __might_sleep (kernel/sched/core.c:8465)
[ 86.255224][ T674] Modules linked in: rcutorture torture crct10dif_pclmul crc32c_intel polyval_clmulni polyval_generic tiny_power_button sha256_ssse3 rtc_cmos sha1_ssse3 button aesni_intel evdev ipmi_devintf ipmi_msghandler drm fuse loop efi_pstore drm_panel_orientation_quirks pstore qemu_fw_cfg ip_tables x_tables autofs4
[ 86.260658][ T674] CPU: 1 UID: 0 PID: 674 Comm: erofs_worker/1 Tainted: G T 6.11.0-rc1-00042-g420356c35040 #1 f37f1ad66c1ebf3b07abcf4fc3040d14b7156f8a
[ 86.262530][ T674] Tainted: [T]=RANDSTRUCT
[ 86.263000][ T674] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 86.264783][ T674] RIP: 0010:__might_sleep (kernel/sched/core.c:8465)
[ 86.265305][ T674] Code: 00 00 00 00 fc ff df 80 3c 08 00 74 08 4c 89 ff e8 e6 50 62 00 49 8b 0f 48 c7 c7 e0 b0 ee 84 44 89 e6 48 89 ca e8 c1 0e f5 ff <0f> 0b 48 c7 c7 b0 01 8c 86 be 01 00 00 00 31 d2 b9 01 00 00 00 eb
All code
========
0: 00 00 add %al,(%rax)
2: 00 00 add %al,(%rax)
4: fc cld
5: ff (bad)
6: df 80 3c 08 00 74 filds 0x7400083c(%rax)
c: 08 4c 89 ff or %cl,-0x1(%rcx,%rcx,4)
10: e8 e6 50 62 00 call 0x6250fb
15: 49 8b 0f mov (%r15),%rcx
18: 48 c7 c7 e0 b0 ee 84 mov $0xffffffff84eeb0e0,%rdi
1f: 44 89 e6 mov %r12d,%esi
22: 48 89 ca mov %rcx,%rdx
25: e8 c1 0e f5 ff call 0xfffffffffff50eeb
2a:* 0f 0b ud2 <-- trapping instruction
2c: 48 c7 c7 b0 01 8c 86 mov $0xffffffff868c01b0,%rdi
33: be 01 00 00 00 mov $0x1,%esi
38: 31 d2 xor %edx,%edx
3a: b9 01 00 00 00 mov $0x1,%ecx
3f: eb .byte 0xeb
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 48 c7 c7 b0 01 8c 86 mov $0xffffffff868c01b0,%rdi
9: be 01 00 00 00 mov $0x1,%esi
e: 31 d2 xor %edx,%edx
10: b9 01 00 00 00 mov $0x1,%ecx
15: eb .byte 0xeb
[ 86.266970][ T674] RSP: 0000:ffffc90000f1fdb8 EFLAGS: 00010246
[ 86.267710][ T674] RAX: 93ef2e1fd3a87a00 RBX: 0000000000000001 RCX: 0000000000000000
[ 86.268961][ T674] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffffffff868af248
[ 86.269995][ T674] RBP: ffffc90000f1fdf0 R08: ffffc90000f1fbf7 R09: 1ffff920001e3f7e
[ 86.270937][ T674] R10: dffffc0000000000 R11: fffff520001e3f7f R12: 0000000000000001
[ 86.272231][ T674] R13: 0000000000000001 R14: 0000000000000001 R15: ffff88817b340080
[ 86.273485][ T674] FS: 0000000000000000(0000) GS:ffff8883aef00000(0000) knlGS:0000000000000000
[ 86.274487][ T674] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 86.275275][ T674] CR2: 0000000000000000 CR3: 0000000005815000 CR4: 00000000000406f0
[ 86.276454][ T674] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 86.277565][ T674] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 86.278443][ T674] Call Trace:
[ 86.279070][ T674] <TASK>
[ 86.279553][ T674] ? show_regs (arch/x86/kernel/dumpstack.c:479)
[ 86.279942][ T674] ? __warn (kernel/panic.c:735)
[ 86.280703][ T674] ? __might_sleep (kernel/sched/core.c:8465)
[ 86.281360][ T674] ? report_bug (lib/bug.c:?)
[ 86.282183][ T674] ? __might_sleep (kernel/sched/core.c:8465)
[ 86.282816][ T674] ? __wake_up_klogd
[ 86.283398][ T674] ? handle_bug (arch/x86/kernel/traps.c:239)
[ 86.283995][ T674] ? exc_invalid_op (arch/x86/kernel/traps.c:260)
[ 86.284787][ T674] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:621)
[ 86.285682][ T674] ? __might_sleep (kernel/sched/core.c:8465)
[ 86.286380][ T674] ? __might_sleep (kernel/sched/core.c:8465)
[ 86.287116][ T674] kthread_worker_fn (include/linux/kernel.h:73 include/linux/freezer.h:53 kernel/kthread.c:851)
[ 86.287701][ T674] ? kthread_worker_fn (kernel/kthread.c:?)
[ 86.288138][ T674] kthread (kernel/kthread.c:391)
[ 86.288482][ T674] ? __cfi_kthread_worker_fn (kernel/kthread.c:803)
[ 86.288951][ T674] ? __cfi_kthread (kernel/kthread.c:342)
[ 86.289560][ T674] ret_from_fork (arch/x86/kernel/process.c:153)
[ 86.290162][ T674] ? __cfi_kthread (kernel/kthread.c:342)
[ 86.291465][ T674] ret_from_fork_asm (arch/x86/entry/entry_64.S:254)
[ 86.292080][ T674] </TASK>
[ 86.292468][ T674] irq event stamp: 579
[ 86.292872][ T674] hardirqs last enabled at (589): console_unlock (arch/x86/include/asm/irqflags.h:42 arch/x86/include/asm/irqflags.h:97 arch/x86/include/asm/irqflags.h:155 kernel/printk/printk.c:341 kernel/printk/printk.c:2801 kernel/printk/printk.c:3120)
[ 86.293852][ T674] hardirqs last disabled at (598): console_unlock (kernel/printk/printk.c:339 kernel/printk/printk.c:2801 kernel/printk/printk.c:3120)
[ 86.294901][ T674] softirqs last enabled at (0): copy_process (include/linux/rcupdate.h:326 include/linux/rcupdate.h:838 kernel/fork.c:2243)
[ 86.296048][ T674] softirqs last disabled at (0): 0x0
[ 86.296931][ T674] ---[ end trace 0000000000000000 ]---
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240816/202408161619.9ed8b83e-lkp@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [peterz-queue:sched/core] [sched/fair] 420356c350: WARNING:at_kernel/sched/core.c:#__might_sleep
From: Peter Zijlstra @ 2024-08-17 9:33 UTC
To: kernel test robot; +Cc: oe-lkp, lkp, linux-kernel, aubrey.li, yu.c.chen
On Fri, Aug 16, 2024 at 05:15:12PM +0800, kernel test robot wrote:
> kernel test robot noticed "WARNING:at_kernel/sched/core.c:#__might_sleep" on:
>
> commit: 420356c3504091f0f6021974389df7c58f365dad ("sched/fair: Implement delayed dequeue")
> https://git.kernel.org/cgit/linux/kernel/git/peterz/queue.git sched/core
> [ 86.252370][ T674] ------------[ cut here ]------------
> [ 86.252945][ T674] do not call blocking ops when !TASK_RUNNING; state=1 set at kthread_worker_fn (kernel/kthread.c:?)
> [ 86.254001][ T674] WARNING: CPU: 1 PID: 674 at kernel/sched/core.c:8469 __might_sleep (kernel/sched/core.c:8465)
> [ 86.283398][ T674] ? handle_bug (arch/x86/kernel/traps.c:239)
> [ 86.283995][ T674] ? exc_invalid_op (arch/x86/kernel/traps.c:260)
> [ 86.284787][ T674] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:621)
> [ 86.285682][ T674] ? __might_sleep (kernel/sched/core.c:8465)
> [ 86.286380][ T674] ? __might_sleep (kernel/sched/core.c:8465)
> [ 86.287116][ T674] kthread_worker_fn (include/linux/kernel.h:73 include/linux/freezer.h:53 kernel/kthread.c:851)
> [ 86.287701][ T674] ? kthread_worker_fn (kernel/kthread.c:?)
> [ 86.288138][ T674] kthread (kernel/kthread.c:391)
> [ 86.288482][ T674] ? __cfi_kthread_worker_fn (kernel/kthread.c:803)
> [ 86.288951][ T674] ? __cfi_kthread (kernel/kthread.c:342)
> [ 86.289560][ T674] ret_from_fork (arch/x86/kernel/process.c:153)
> [ 86.290162][ T674] ? __cfi_kthread (kernel/kthread.c:342)
> [ 86.291465][ T674] ret_from_fork_asm (arch/x86/entry/entry_64.S:254)
AFAICT this is a pre-existing issue. Notably that all transcribes to:
  kthread_worker_fn()
    ...
  repeat:
    set_current_state(TASK_INTERRUPTIBLE);
    ...
    if (work) { // false
      __set_current_state(TASK_RUNNING);
      ...
    } else if (!freezing(current)) // false -- we are freezing
      schedule();

    // so state really is still TASK_INTERRUPTIBLE here
    try_to_freeze()
      might_sleep() <--- boom, per the above.
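A minimal user-space model of the path transcribed above, for illustration only. The two state constants use the kernel's values for TASK_RUNNING (0) and TASK_INTERRUPTIBLE (1, which is why the splat says "state=1"); `cur_state`, `model_might_sleep_warns()` and `model_worker_pass()` are hypothetical stand-ins, not kernel APIs. The model assumes that returning from schedule() leaves the task running and does not simulate the scheduler itself (including delayed dequeue).

```c
#include <assert.h>
#include <stdbool.h>

/* Same values as the kernel: "state=1" in the warning is TASK_INTERRUPTIBLE. */
#define TASK_RUNNING       0x00000000u
#define TASK_INTERRUPTIBLE 0x00000001u

/* Hypothetical stand-in for current->__state in this user-space model. */
static unsigned int cur_state = TASK_RUNNING;

/* Simplified model of the debug check behind might_sleep(). */
static bool model_might_sleep_warns(void)
{
	return cur_state != TASK_RUNNING;
}

/*
 * One pass of the unpatched kthread_worker_fn() loop, following the
 * transcription above. Returns true if the might_sleep() check inside
 * try_to_freeze() would warn.
 */
static bool model_worker_pass(bool have_work, bool freezing)
{
	cur_state = TASK_INTERRUPTIBLE;		/* set_current_state(TASK_INTERRUPTIBLE) */

	if (have_work)
		cur_state = TASK_RUNNING;	/* __set_current_state(TASK_RUNNING), run work */
	else if (!freezing)
		cur_state = TASK_RUNNING;	/* schedule(): assumed to return running */

	/* No work and freezing: schedule() was skipped, state is still 1. */
	return model_might_sleep_warns();	/* try_to_freeze() -> might_sleep() */
}
```

In this model only `model_worker_pass(false, true)` (no work, freezing) reports a warning, and it leaves `cur_state` at TASK_INTERRUPTIBLE, matching the state reported by __might_sleep.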
* Re: [peterz-queue:sched/core] [sched/fair] 420356c350: WARNING:at_kernel/sched/core.c:#__might_sleep
From: Chen Yu @ 2024-08-19 4:44 UTC
To: Peter Zijlstra, Oliver Sang
Cc: kernel test robot, oe-lkp, lkp, linux-kernel, aubrey.li
On 2024-08-17 at 11:33:29 +0200, Peter Zijlstra wrote:
> AFAICT this is a pre-existing issue. Notably that all transcribes to:
>
>   kthread_worker_fn()
>     ...
>   repeat:
>     set_current_state(TASK_INTERRUPTIBLE);
>     ...
>     if (work) { // false
>       __set_current_state(TASK_RUNNING);
>       ...
>     } else if (!freezing(current)) // false -- we are freezing
>       schedule();
>
>     // so state really is still TASK_INTERRUPTIBLE here
>     try_to_freeze()
>       might_sleep() <--- boom, per the above.
>
Would the following fix make sense?
diff --git a/kernel/kthread.c b/kernel/kthread.c
index f7be976ff88a..09850b2109c9 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -848,6 +848,12 @@ int kthread_worker_fn(void *worker_ptr)
 	} else if (!freezing(current))
 		schedule();
 
+	/*
+	 * Explicitly set the running state in case we are being frozen
+	 * and skip the schedule() above. try_to_freeze() expects the
+	 * current task to be in running state.
+	 */
+	__set_current_state(TASK_RUNNING);
 	try_to_freeze();
 	cond_resched();
 	goto repeat;
--
2.25.1
Hi Oliver,
Could you please help check whether the above change makes the warning go away?
thanks,
Chenyu
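A hypothetical user-space model comparing the loop with and without the proposed `__set_current_state(TASK_RUNNING)` before try_to_freeze(). `state_at_try_to_freeze()` and `would_warn()` are illustrative names, not kernel functions; as before, the model simply assumes schedule() returns with the task running and does not simulate the scheduler.

```c
#include <assert.h>
#include <stdbool.h>

/* Same values as the kernel's TASK_RUNNING / TASK_INTERRUPTIBLE. */
#define TASK_RUNNING       0x00000000u
#define TASK_INTERRUPTIBLE 0x00000001u

/*
 * Returns the task state the might_sleep() check in try_to_freeze()
 * would observe after one pass of the loop. 'patched' adds the
 * unconditional __set_current_state(TASK_RUNNING) from the diff.
 */
static unsigned int state_at_try_to_freeze(bool have_work, bool freezing,
					   bool patched)
{
	unsigned int state = TASK_INTERRUPTIBLE; /* set_current_state() */

	if (have_work)
		state = TASK_RUNNING;	/* set before running the work */
	else if (!freezing)
		state = TASK_RUNNING;	/* assume schedule() returns running */

	if (patched)
		state = TASK_RUNNING;	/* the line added by the proposed fix */

	return state;
}

/* Simplified: the might_sleep() debug check complains unless TASK_RUNNING. */
static bool would_warn(bool have_work, bool freezing, bool patched)
{
	return state_at_try_to_freeze(have_work, freezing, patched) != TASK_RUNNING;
}
```

With `patched` set, every combination reaches try_to_freeze() in TASK_RUNNING. Because the assignment is unconditional, it would also hide any other path that arrives there in a non-running state.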
* Re: [peterz-queue:sched/core] [sched/fair] 420356c350: WARNING:at_kernel/sched/core.c:#__might_sleep
From: Oliver Sang @ 2024-08-19 8:40 UTC
To: Chen Yu; +Cc: Peter Zijlstra, oe-lkp, lkp, linux-kernel, aubrey.li, oliver.sang
hi, Chen Yu,
On Mon, Aug 19, 2024 at 12:44:39PM +0800, Chen Yu wrote:
> Would the following fix make sense?
>
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index f7be976ff88a..09850b2109c9 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -848,6 +848,12 @@ int kthread_worker_fn(void *worker_ptr)
>  	} else if (!freezing(current))
>  		schedule();
>
> +	/*
> +	 * Explicitly set the running state in case we are being frozen
> +	 * and skip the schedule() above. try_to_freeze() expects the
> +	 * current task to be in running state.
> +	 */
> +	__set_current_state(TASK_RUNNING);
>  	try_to_freeze();
>  	cond_resched();
>  	goto repeat;
> --
> 2.25.1
>
> Hi Oliver,
> Could you please help check whether the above change makes the warning go away?
Confirmed that this patch resolves the WARNING. Thanks.
>
> thanks,
> Chenyu
* Re: [peterz-queue:sched/core] [sched/fair] 420356c350: WARNING:at_kernel/sched/core.c:#__might_sleep
From: Peter Zijlstra @ 2024-08-22 15:49 UTC
To: Chen Yu; +Cc: Oliver Sang, oe-lkp, lkp, linux-kernel, aubrey.li
On Mon, Aug 19, 2024 at 12:44:39PM +0800, Chen Yu wrote:
> Would the following fix make sense?
Yeah, that looks fine. Could you write it up as a proper patch please?
* Re: [peterz-queue:sched/core] [sched/fair] 420356c350: WARNING:at_kernel/sched/core.c:#__might_sleep
From: Chen Yu @ 2024-08-26 8:25 UTC
To: Peter Zijlstra; +Cc: Oliver Sang, oe-lkp, lkp, linux-kernel, aubrey.li
On 2024-08-22 at 17:49:23 +0200, Peter Zijlstra wrote:
> On Mon, Aug 19, 2024 at 12:44:39PM +0800, Chen Yu wrote:
> > Would the following fix make sense?
>
> Yeah, that looks fine. Could you write it up as a proper patch please?
>
Yes, this is a race condition in theory, and I've sent a patch here:
https://lore.kernel.org/lkml/20240819141551.111610-1-yu.c.chen@intel.com/
Andrew has given some comments on it.
However, after some further investigation, this warning seems not to be
directly related to task freezing, but rather connected with the delayed
dequeue. I'm planning to add a debug patch and investigate the symptom in
0day's environment, and will send the findings later.
thanks,
Chenyu
* Re: [peterz-queue:sched/core] [sched/fair] 420356c350: WARNING:at_kernel/sched/core.c:#__might_sleep
From: Chen Yu @ 2024-08-27 9:40 UTC
To: Peter Zijlstra; +Cc: Oliver Sang, oe-lkp, lkp, linux-kernel, aubrey.li
On 2024-08-26 at 16:25:56 +0800, Chen Yu wrote:
> Yes, this is a race condition in theory, and I've sent a patch here:
> https://lore.kernel.org/lkml/20240819141551.111610-1-yu.c.chen@intel.com/
> Andrew has given some comments on it.
>
> However, after some further investigation, this warning seems not to be
> directly related to task freezing, but rather connected with the delayed
> dequeue. I'm planning to add a debug patch and investigate the symptom in
> 0day's environment, and will send the findings later.
>
I have posted the root cause as a reply to the delayed dequeue patch set here:
https://lore.kernel.org/lkml/Zs2ZoAcUsZMX2B%2FI@chenyu5-mobl2/
Since the race condition mentioned previously is real
(although it is not the root cause of the warning reported in this thread),
I'll send a v2 patch to fix it.
thanks,
Chenyu