* [peterz-queue:sched/prep] [sched/fair] 124c8f4374: WARNING:at_kernel/sched/sched.h:#update_load_avg
@ 2024-07-29 6:14 kernel test robot
2024-07-31 6:16 ` K Prateek Nayak
0 siblings, 1 reply; 4+ messages in thread
From: kernel test robot @ 2024-07-29 6:14 UTC (permalink / raw)
To: Peter Zijlstra
Cc: oe-lkp, lkp, linux-kernel, aubrey.li, yu.c.chen, oliver.sang
Hello,
kernel test robot noticed "WARNING:at_kernel/sched/sched.h:#update_load_avg" on:
commit: 124c8f43740f3f1e3e2aeecd976bcf5ddd7bd435 ("sched/fair: Implement delayed dequeue")
https://git.kernel.org/cgit/linux/kernel/git/peterz/queue.git sched/prep
in testcase: boot
compiler: clang-18
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
(please refer to attached dmesg/kmsg for entire log/backtrace)
we noticed the issue does not always happen, 60 times out of 150 runs as below.
but the parent keeps clean.
a1133ac6dc7c6bde 124c8f43740f3f1e3e2aeecd976
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
:150 40% 60:150 dmesg.WARNING:at_kernel/sched/sched.h:#update_load_avg
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202407291328.761441a7-oliver.sang@intel.com
[ 13.522527][ C0] ------------[ cut here ]------------
[ 13.523046][ C0] rq->clock_update_flags < RQCF_ACT_SKIP
[ 13.523055][ C0] WARNING: CPU: 0 PID: 158 at kernel/sched/sched.h:1550 update_load_avg (kernel/sched/sched.h:1550 kernel/sched/pelt.h:67 kernel/sched/pelt.h:174 kernel/sched/fair.c:4747)
[ 13.524265][ C0] Modules linked in: drm fuse ip_tables
[ 13.524722][ C0] CPU: 0 UID: 0 PID: 158 Comm: openipmi Not tainted 6.10.0-12745-g124c8f43740f #1
[ 13.525464][ C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 13.526307][ C0] RIP: 0010:update_load_avg (kernel/sched/sched.h:1550 kernel/sched/pelt.h:67 kernel/sched/pelt.h:174 kernel/sched/fair.c:4747)
[ 13.526793][ C0] Code: 00 4c 2b bb 88 01 00 00 40 f6 c5 02 0f 84 b7 fa ff ff e9 ca fa ff ff c6 05 b9 1b e4 01 01 48 c7 c7 34 74 d6 a6 e8 a1 c7 fa ff <0f> 0b e9 78 fa ff ff 65 8b 0d 8f 83 91 5a 89 c9 48 0f a3 0d e9 7b
All code
========
0: 00 4c 2b bb add %cl,-0x45(%rbx,%rbp,1)
4: 88 01 mov %al,(%rcx)
6: 00 00 add %al,(%rax)
8: 40 f6 c5 02 test $0x2,%bpl
c: 0f 84 b7 fa ff ff je 0xfffffffffffffac9
12: e9 ca fa ff ff jmpq 0xfffffffffffffae1
17: c6 05 b9 1b e4 01 01 movb $0x1,0x1e41bb9(%rip) # 0x1e41bd7
1e: 48 c7 c7 34 74 d6 a6 mov $0xffffffffa6d67434,%rdi
25: e8 a1 c7 fa ff callq 0xfffffffffffac7cb
2a:* 0f 0b ud2 <-- trapping instruction
2c: e9 78 fa ff ff jmpq 0xfffffffffffffaa9
31: 65 8b 0d 8f 83 91 5a mov %gs:0x5a91838f(%rip),%ecx # 0x5a9183c7
38: 89 c9 mov %ecx,%ecx
3a: 48 rex.W
3b: 0f .byte 0xf
3c: a3 .byte 0xa3
3d: 0d .byte 0xd
3e: e9 .byte 0xe9
3f: 7b .byte 0x7b
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: e9 78 fa ff ff jmpq 0xfffffffffffffa7f
7: 65 8b 0d 8f 83 91 5a mov %gs:0x5a91838f(%rip),%ecx # 0x5a91839d
e: 89 c9 mov %ecx,%ecx
10: 48 rex.W
11: 0f .byte 0xf
12: a3 .byte 0xa3
13: 0d .byte 0xd
14: e9 .byte 0xe9
15: 7b .byte 0x7b
[ 13.528234][ C0] RSP: 0000:ffffbaef40003d90 EFLAGS: 00010046
[ 13.528723][ C0] RAX: b53834d2850b7b00 RBX: ffff9ef2efd34640 RCX: 0000000000000027
[ 13.529379][ C0] RDX: 0000000000000002 RSI: 00000000ffff7fff RDI: ffff9ef2efc20b88
[ 13.530029][ C0] RBP: 0000000000000001 R08: 0000000000007fff R09: ffffffffa7053610
[ 13.530685][ C0] R10: 0000000000017ffd R11: 0000000000000004 R12: ffff9ef2efd345c0
[ 13.531344][ C0] R13: ffff9ef2efd34640 R14: ffff9eefc0e26400 R15: 0000000000000000
[ 13.532001][ C0] FS: 0000000000000000(0000) GS:ffff9ef2efc00000(0063) knlGS:00000000f7f5f180
[ 13.532720][ C0] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
[ 13.533242][ C0] CR2: 00000000577c51ac CR3: 0000000162e6c000 CR4: 00000000000406f0
[ 13.533891][ C0] Call Trace:
[ 13.534215][ C0] <IRQ>
[ 13.534504][ C0] ? __warn (kernel/panic.c:240 kernel/panic.c:735)
[ 13.534863][ C0] ? update_load_avg (kernel/sched/sched.h:1550 kernel/sched/pelt.h:67 kernel/sched/pelt.h:174 kernel/sched/fair.c:4747)
[ 13.535278][ C0] ? report_bug (lib/bug.c:? lib/bug.c:219)
[ 13.535695][ C0] ? handle_bug (arch/x86/kernel/traps.c:239)
[ 13.536079][ C0] ? exc_invalid_op (arch/x86/kernel/traps.c:260)
[ 13.536498][ C0] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:621)
[ 13.536920][ C0] ? update_load_avg (kernel/sched/sched.h:1550 kernel/sched/pelt.h:67 kernel/sched/pelt.h:174 kernel/sched/fair.c:4747)
[ 13.537334][ C0] dequeue_entity (kernel/sched/sched.h:817 kernel/sched/fair.c:5429)
[ 13.537729][ C0] dequeue_entities (kernel/sched/fair.c:6972)
[ 13.538130][ C0] unregister_fair_sched_group (kernel/sched/fair.c:379 kernel/sched/fair.c:13159)
[ 13.554647][ C0] ? __pfx_sched_unregister_group_rcu (kernel/sched/core.c:8759)
[ 13.555167][ C0] sched_unregister_group_rcu (kernel/sched/core.c:8707 kernel/sched/core.c:8761)
[ 13.555621][ C0] rcu_do_batch (arch/x86/include/asm/preempt.h:26 kernel/rcu/tree.c:2576)
[ 13.556008][ C0] rcu_core (kernel/rcu/tree.c:2845)
[ 13.556373][ C0] handle_softirqs (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 include/trace/events/irq.h:142 kernel/softirq.c:555)
[ 13.556775][ C0] __irq_exit_rcu (kernel/softirq.c:617 kernel/softirq.c:639)
[ 13.557166][ C0] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1043 arch/x86/kernel/apic/apic.c:1043)
[ 13.557638][ C0] </IRQ>
[ 13.557920][ C0] <TASK>
[ 13.560463][ C0] asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:702)
[ 13.560957][ C0] RIP: 0010:irqentry_exit_to_user_mode (kernel/entry/common.c:? include/linux/entry-common.h:328 kernel/entry/common.c:231)
[ 13.561465][ C0] Code: 90 41 57 41 56 53 48 89 fb e8 63 90 1f ff 65 4c 8b 34 25 40 37 03 00 49 8b 3e f7 c7 0e 30 02 00 74 7c 49 89 ff e8 97 8f 1f ff <4c> 89 f8 a8 08 75 38 a9 00 10 00 00 75 40 a9 00 20 00 00 75 4b a9
All code
========
0: 90 nop
1: 41 57 push %r15
3: 41 56 push %r14
5: 53 push %rbx
6: 48 89 fb mov %rdi,%rbx
9: e8 63 90 1f ff callq 0xffffffffff1f9071
e: 65 4c 8b 34 25 40 37 mov %gs:0x33740,%r14
15: 03 00
17: 49 8b 3e mov (%r14),%rdi
1a: f7 c7 0e 30 02 00 test $0x2300e,%edi
20: 74 7c je 0x9e
22: 49 89 ff mov %rdi,%r15
25: e8 97 8f 1f ff callq 0xffffffffff1f8fc1
2a:* 4c 89 f8 mov %r15,%rax <-- trapping instruction
2d: a8 08 test $0x8,%al
2f: 75 38 jne 0x69
31: a9 00 10 00 00 test $0x1000,%eax
36: 75 40 jne 0x78
38: a9 00 20 00 00 test $0x2000,%eax
3d: 75 4b jne 0x8a
3f: a9 .byte 0xa9
Code starting with the faulting instruction
===========================================
0: 4c 89 f8 mov %r15,%rax
3: a8 08 test $0x8,%al
5: 75 38 jne 0x3f
7: a9 00 10 00 00 test $0x1000,%eax
c: 75 40 jne 0x4e
e: a9 00 20 00 00 test $0x2000,%eax
13: 75 4b jne 0x60
15: a9 .byte 0xa9
[ 13.562938][ C0] RSP: 0000:ffffbaef40613f38 EFLAGS: 00000202
[ 13.563444][ C0] RAX: ffff9ef0287dd080 RBX: ffffbaef40613f58 RCX: 000000000000080b
[ 13.564125][ C0] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000008
[ 13.564796][ C0] RBP: 0000000000000000 R08: 000000000000002b R09: 000000000000002b
[ 13.565456][ C0] R10: 0000000000000296 R11: 00000000ff8772ac R12: 0000000000000000
[ 13.566115][ C0] R13: 0000000000000000 R14: ffff9ef0287dd080 R15: 0000000020000008
[ 13.566782][ C0] ? irqentry_exit_to_user_mode (kernel/entry/common.c:? include/linux/entry-common.h:328 kernel/entry/common.c:231)
[ 13.567254][ C0] asm_sysvec_reschedule_ipi (arch/x86/include/asm/idtentry.h:707)
[ 13.567709][ C0] RIP: 0023:0x56652630
[ 13.568072][ C0] Code: ff ff 83 c4 10 85 c0 74 05 83 c4 08 5b c3 83 ec 0c 8d 83 8d 6b ff ff 50 e8 cd 7c ff ff 8d b4 26 00 00 00 00 8d b6 00 00 00 00 <55> 57 56 53 e8 e7 61 ff ff 81 c3 13 38 01 00 83 ec 0c 8b 44 24 20
All code
========
0: ff (bad)
1: ff 83 c4 10 85 c0 incl -0x3f7aef3c(%rbx)
7: 74 05 je 0xe
9: 83 c4 08 add $0x8,%esp
c: 5b pop %rbx
d: c3 retq
e: 83 ec 0c sub $0xc,%esp
11: 8d 83 8d 6b ff ff lea -0x9473(%rbx),%eax
17: 50 push %rax
18: e8 cd 7c ff ff callq 0xffffffffffff7cea
1d: 8d b4 26 00 00 00 00 lea 0x0(%rsi,%riz,1),%esi
24: 8d b6 00 00 00 00 lea 0x0(%rsi),%esi
2a:* 55 push %rbp <-- trapping instruction
2b: 57 push %rdi
2c: 56 push %rsi
2d: 53 push %rbx
2e: e8 e7 61 ff ff callq 0xffffffffffff621a
33: 81 c3 13 38 01 00 add $0x13813,%ebx
39: 83 ec 0c sub $0xc,%esp
3c: 8b 44 24 20 mov 0x20(%rsp),%eax
Code starting with the faulting instruction
===========================================
0: 55 push %rbp
1: 57 push %rdi
2: 56 push %rsi
3: 53 push %rbx
4: e8 e7 61 ff ff callq 0xffffffffffff61f0
9: 81 c3 13 38 01 00 add $0x13813,%ebx
f: 83 ec 0c sub $0xc,%esp
12: 8b 44 24 20 mov 0x20(%rsp),%eax
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240729/202407291328.761441a7-oliver.sang@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [peterz-queue:sched/prep] [sched/fair] 124c8f4374: WARNING:at_kernel/sched/sched.h:#update_load_avg
2024-07-29 6:14 [peterz-queue:sched/prep] [sched/fair] 124c8f4374: WARNING:at_kernel/sched/sched.h:#update_load_avg kernel test robot
@ 2024-07-31 6:16 ` K Prateek Nayak
2024-07-31 11:54 ` Peter Zijlstra
0 siblings, 1 reply; 4+ messages in thread
From: K Prateek Nayak @ 2024-07-31 6:16 UTC (permalink / raw)
To: Peter Zijlstra
Cc: oe-lkp, lkp, linux-kernel, aubrey.li, yu.c.chen,
kernel test robot
Hello Peter, Oliver,
P.S. I see that peterz-queue has been updated recently so I'm not sure
if these observations still hold but if they do ...
On 7/29/2024 11:44 AM, kernel test robot wrote:
>
>
> Hello,
>
> kernel test robot noticed "WARNING:at_kernel/sched/sched.h:#update_load_avg" on:
>
> commit: 124c8f43740f3f1e3e2aeecd976bcf5ddd7bd435 ("sched/fair: Implement delayed dequeue")
> https://git.kernel.org/cgit/linux/kernel/git/peterz/queue.git sched/prep
>
> in testcase: boot
>
> compiler: clang-18
> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>
> (please refer to attached dmesg/kmsg for entire log/backtrace)
>
>
> we noticed the issue does not always happen, 60 times out of 150 runs as below.
> but the parent keeps clean.
>
> a1133ac6dc7c6bde 124c8f43740f3f1e3e2aeecd976
> ---------------- ---------------------------
> fail:runs %reproduction fail:runs
> | | |
> :150 40% 60:150 dmesg.WARNING:at_kernel/sched/sched.h:#update_load_avg
>
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202407291328.761441a7-oliver.sang@intel.com
>
>
> [ 13.522527][ C0] ------------[ cut here ]------------
> [ 13.523046][ C0] rq->clock_update_flags < RQCF_ACT_SKIP
> [ 13.523055][ C0] WARNING: CPU: 0 PID: 158 at kernel/sched/sched.h:1550 update_load_avg (kernel/sched/sched.h:1550 kernel/sched/pelt.h:67 kernel/sched/pelt.h:174 kernel/sched/fair.c:4747)
> [ 13.524265][ C0] Modules linked in: drm fuse ip_tables
> [ 13.524722][ C0] CPU: 0 UID: 0 PID: 158 Comm: openipmi Not tainted 6.10.0-12745-g124c8f43740f #1
> [ 13.525464][ C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [ 13.526307][ C0] RIP: 0010:update_load_avg (kernel/sched/sched.h:1550 kernel/sched/pelt.h:67 kernel/sched/pelt.h:174 kernel/sched/fair.c:4747)
> [ 13.526793][ C0] Code: 00 4c 2b bb 88 01 00 00 40 f6 c5 02 0f 84 b7 fa ff ff e9 ca fa ff ff c6 05 b9 1b e4 01 01 48 c7 c7 34 74 d6 a6 e8 a1 c7 fa ff <0f> 0b e9 78 fa ff ff 65 8b 0d 8f 83 91 5a 89 c9 48 0f a3 0d e9 7b
> All code
> ========
> 0: 00 4c 2b bb add %cl,-0x45(%rbx,%rbp,1)
> 4: 88 01 mov %al,(%rcx)
> 6: 00 00 add %al,(%rax)
> 8: 40 f6 c5 02 test $0x2,%bpl
> c: 0f 84 b7 fa ff ff je 0xfffffffffffffac9
> 12: e9 ca fa ff ff jmpq 0xfffffffffffffae1
> 17: c6 05 b9 1b e4 01 01 movb $0x1,0x1e41bb9(%rip) # 0x1e41bd7
> 1e: 48 c7 c7 34 74 d6 a6 mov $0xffffffffa6d67434,%rdi
> 25: e8 a1 c7 fa ff callq 0xfffffffffffac7cb
> 2a:* 0f 0b ud2 <-- trapping instruction
> 2c: e9 78 fa ff ff jmpq 0xfffffffffffffaa9
> 31: 65 8b 0d 8f 83 91 5a mov %gs:0x5a91838f(%rip),%ecx # 0x5a9183c7
> 38: 89 c9 mov %ecx,%ecx
> 3a: 48 rex.W
> 3b: 0f .byte 0xf
> 3c: a3 .byte 0xa3
> 3d: 0d .byte 0xd
> 3e: e9 .byte 0xe9
> 3f: 7b .byte 0x7b
>
> Code starting with the faulting instruction
> ===========================================
> 0: 0f 0b ud2
> 2: e9 78 fa ff ff jmpq 0xfffffffffffffa7f
> 7: 65 8b 0d 8f 83 91 5a mov %gs:0x5a91838f(%rip),%ecx # 0x5a91839d
> e: 89 c9 mov %ecx,%ecx
> 10: 48 rex.W
> 11: 0f .byte 0xf
> 12: a3 .byte 0xa3
> 13: 0d .byte 0xd
> 14: e9 .byte 0xe9
> 15: 7b .byte 0x7b
> [ 13.528234][ C0] RSP: 0000:ffffbaef40003d90 EFLAGS: 00010046
> [ 13.528723][ C0] RAX: b53834d2850b7b00 RBX: ffff9ef2efd34640 RCX: 0000000000000027
> [ 13.529379][ C0] RDX: 0000000000000002 RSI: 00000000ffff7fff RDI: ffff9ef2efc20b88
> [ 13.530029][ C0] RBP: 0000000000000001 R08: 0000000000007fff R09: ffffffffa7053610
> [ 13.530685][ C0] R10: 0000000000017ffd R11: 0000000000000004 R12: ffff9ef2efd345c0
> [ 13.531344][ C0] R13: ffff9ef2efd34640 R14: ffff9eefc0e26400 R15: 0000000000000000
> [ 13.532001][ C0] FS: 0000000000000000(0000) GS:ffff9ef2efc00000(0063) knlGS:00000000f7f5f180
> [ 13.532720][ C0] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
> [ 13.533242][ C0] CR2: 00000000577c51ac CR3: 0000000162e6c000 CR4: 00000000000406f0
> [ 13.533891][ C0] Call Trace:
> [ 13.534215][ C0] <IRQ>
> [ 13.534504][ C0] ? __warn (kernel/panic.c:240 kernel/panic.c:735)
> [ 13.534863][ C0] ? update_load_avg (kernel/sched/sched.h:1550 kernel/sched/pelt.h:67 kernel/sched/pelt.h:174 kernel/sched/fair.c:4747)
> [ 13.535278][ C0] ? report_bug (lib/bug.c:? lib/bug.c:219)
> [ 13.535695][ C0] ? handle_bug (arch/x86/kernel/traps.c:239)
> [ 13.536079][ C0] ? exc_invalid_op (arch/x86/kernel/traps.c:260)
> [ 13.536498][ C0] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:621)
> [ 13.536920][ C0] ? update_load_avg (kernel/sched/sched.h:1550 kernel/sched/pelt.h:67 kernel/sched/pelt.h:174 kernel/sched/fair.c:4747)
> [ 13.537334][ C0] dequeue_entity (kernel/sched/sched.h:817 kernel/sched/fair.c:5429)
> [ 13.537729][ C0] dequeue_entities (kernel/sched/fair.c:6972)
> [ 13.538130][ C0] unregister_fair_sched_group (kernel/sched/fair.c:379 kernel/sched/fair.c:13159)
> [ 13.554647][ C0] ? __pfx_sched_unregister_group_rcu (kernel/sched/core.c:8759)
> [ 13.555167][ C0] sched_unregister_group_rcu (kernel/sched/core.c:8707 kernel/sched/core.c:8761)
> [ 13.555621][ C0] rcu_do_batch (arch/x86/include/asm/preempt.h:26 kernel/rcu/tree.c:2576)
> [ 13.556008][ C0] rcu_core (kernel/rcu/tree.c:2845)
> [ 13.556373][ C0] handle_softirqs (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 include/trace/events/irq.h:142 kernel/softirq.c:555)
> [ 13.556775][ C0] __irq_exit_rcu (kernel/softirq.c:617 kernel/softirq.c:639)
> [ 13.557166][ C0] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1043 arch/x86/kernel/apic/apic.c:1043)
Looking at the call stack, any chance it is the following case that is
tripping this warning:
CPU0 CPU1
==== ====
wakeup_preempt(rq) {
resched_curr(rq); ------------------------> sysvec_reschedule_ipi() {
... scheduler_ipi();
if (task_on_rq_queued(rq->curr) && } /* sysvec_reschedule_ipi */
test_tsk_need_resched(rq->curr)) handle_softirqs() {
rq_clock_skip_update(rq); /* RCU_SOFTIRQ */
} /* wakeup_preempt(); */ rcu_core_si() {
unregister_fair_sched_group() {
/* On the rq that got the resched IPI */
if (se->sched_delayed) {
dequeue_entities() {
/* Trips RQCF_ACT_SKIP check within since waker called rq_clock_skip_update() */
} /* dequeue_entities(); */
} /* if() */
} /* unregister_fair_sched_group */
} /* rcu_core_si */
} /* handle_softirqs */
--
commit a64692a3afd8 ("sched: Cleanup/optimize clock updates") seems to
have added this condition to skip clock updates after an enqueue since
wakeup target will do a schedule() soon. Since delayed dequeue may has
entity to dequeue now, looking at rq_clock_pelt() in dequeue path will
trip this since PELT believes that the clock is stale. Could something
like the below work?
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cd4a6bf14828..c437b408d29b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -13297,10 +13297,34 @@ void unregister_fair_sched_group(struct task_group *tg)
if (se->sched_delayed) {
guard(rq_lock_irqsave)(rq);
if (se->sched_delayed) {
+ /*
+ * We can reach here when processing RCU_SOFTIRQ on exit path from
+ * a reschedule IPI. wakeup_preempt() may have set RQCF_REQ_SKIP to
+ * skip a close clock update in schedule(), however, in presence of
+ * a delayed entity, this trips the check in rq_clock_pelt() which
+ * now believes the clock value is stale and needs updating. To
+ * prevent such situation, cancel any pending skip updates, and
+ * update the rq clock.
+ */
+ rq_clock_cancel_skipupdate(rq);
+
+ /*
+ * XXX: Will this trip WARN_DOUBLE_CLOCK? In which case, can
+ * rq_clock_cancel_skipupdate() be made to return a bool if
+ * RQCF_REQ_SKIP is set and we avoid this update?
+ */
update_rq_clock(rq);
+
dequeue_entities(rq, se, DEQUEUE_SLEEP | DEQUEUE_DELAYED);
+
+ /* Avoid updating the clock again if a schedule() is pending */
+ if (task_on_rq_queued(rq->curr) &&
+ test_tsk_need_resched(rq->curr))
+ rq_clock_skip_update(rq);
}
list_del_leaf_cfs_rq(cfs_rq);
+
+
}
remove_entity_load_avg(se);
}
--
I was not able to hit the SCHED_WARN_ON() reliably on my end when
launching a guest to confirm my hypothesis - I was able to hit it
approx. 1 in 10 run, but I've ran with the above changes 10s of
times and I've not hit the splat (yet!)
Note: Although the diff was updated for the latest
peterz-queue:sched/eevdf, all my testing was done on the previous
version on this branch based on v6.10.0 kernel.
--
Thanks and Regards,
Prateek
> [ 13.557638][ C0] </IRQ>
> [ 13.557920][ C0] <TASK>
> [ 13.560463][ C0] asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:702)
> [ 13.560957][ C0] RIP: 0010:irqentry_exit_to_user_mode (kernel/entry/common.c:? include/linux/entry-common.h:328 kernel/entry/common.c:231)
> [ 13.561465][ C0] Code: 90 41 57 41 56 53 48 89 fb e8 63 90 1f ff 65 4c 8b 34 25 40 37 03 00 49 8b 3e f7 c7 0e 30 02 00 74 7c 49 89 ff e8 97 8f 1f ff <4c> 89 f8 a8 08 75 38 a9 00 10 00 00 75 40 a9 00 20 00 00 75 4b a9
> All code
> ========
> 0: 90 nop
> 1: 41 57 push %r15
> 3: 41 56 push %r14
> 5: 53 push %rbx
> 6: 48 89 fb mov %rdi,%rbx
> 9: e8 63 90 1f ff callq 0xffffffffff1f9071
> e: 65 4c 8b 34 25 40 37 mov %gs:0x33740,%r14
> 15: 03 00
> 17: 49 8b 3e mov (%r14),%rdi
> 1a: f7 c7 0e 30 02 00 test $0x2300e,%edi
> 20: 74 7c je 0x9e
> 22: 49 89 ff mov %rdi,%r15
> 25: e8 97 8f 1f ff callq 0xffffffffff1f8fc1
> 2a:* 4c 89 f8 mov %r15,%rax <-- trapping instruction
> 2d: a8 08 test $0x8,%al
> 2f: 75 38 jne 0x69
> 31: a9 00 10 00 00 test $0x1000,%eax
> 36: 75 40 jne 0x78
> 38: a9 00 20 00 00 test $0x2000,%eax
> 3d: 75 4b jne 0x8a
> 3f: a9 .byte 0xa9
>
> Code starting with the faulting instruction
> ===========================================
> 0: 4c 89 f8 mov %r15,%rax
> 3: a8 08 test $0x8,%al
> 5: 75 38 jne 0x3f
> 7: a9 00 10 00 00 test $0x1000,%eax
> c: 75 40 jne 0x4e
> e: a9 00 20 00 00 test $0x2000,%eax
> 13: 75 4b jne 0x60
> 15: a9 .byte 0xa9
> [ 13.562938][ C0] RSP: 0000:ffffbaef40613f38 EFLAGS: 00000202
> [ 13.563444][ C0] RAX: ffff9ef0287dd080 RBX: ffffbaef40613f58 RCX: 000000000000080b
> [ 13.564125][ C0] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000008
> [ 13.564796][ C0] RBP: 0000000000000000 R08: 000000000000002b R09: 000000000000002b
> [ 13.565456][ C0] R10: 0000000000000296 R11: 00000000ff8772ac R12: 0000000000000000
> [ 13.566115][ C0] R13: 0000000000000000 R14: ffff9ef0287dd080 R15: 0000000020000008
> [ 13.566782][ C0] ? irqentry_exit_to_user_mode (kernel/entry/common.c:? include/linux/entry-common.h:328 kernel/entry/common.c:231)
> [ 13.567254][ C0] asm_sysvec_reschedule_ipi (arch/x86/include/asm/idtentry.h:707)
> [ 13.567709][ C0] RIP: 0023:0x56652630
> [ 13.568072][ C0] Code: ff ff 83 c4 10 85 c0 74 05 83 c4 08 5b c3 83 ec 0c 8d 83 8d 6b ff ff 50 e8 cd 7c ff ff 8d b4 26 00 00 00 00 8d b6 00 00 00 00 <55> 57 56 53 e8 e7 61 ff ff 81 c3 13 38 01 00 83 ec 0c 8b 44 24 20
> All code
> ========
> 0: ff (bad)
> 1: ff 83 c4 10 85 c0 incl -0x3f7aef3c(%rbx)
> 7: 74 05 je 0xe
> 9: 83 c4 08 add $0x8,%esp
> c: 5b pop %rbx
> d: c3 retq
> e: 83 ec 0c sub $0xc,%esp
> 11: 8d 83 8d 6b ff ff lea -0x9473(%rbx),%eax
> 17: 50 push %rax
> 18: e8 cd 7c ff ff callq 0xffffffffffff7cea
> 1d: 8d b4 26 00 00 00 00 lea 0x0(%rsi,%riz,1),%esi
> 24: 8d b6 00 00 00 00 lea 0x0(%rsi),%esi
> 2a:* 55 push %rbp <-- trapping instruction
> 2b: 57 push %rdi
> 2c: 56 push %rsi
> 2d: 53 push %rbx
> 2e: e8 e7 61 ff ff callq 0xffffffffffff621a
> 33: 81 c3 13 38 01 00 add $0x13813,%ebx
> 39: 83 ec 0c sub $0xc,%esp
> 3c: 8b 44 24 20 mov 0x20(%rsp),%eax
>
> Code starting with the faulting instruction
> ===========================================
> 0: 55 push %rbp
> 1: 57 push %rdi
> 2: 56 push %rsi
> 3: 53 push %rbx
> 4: e8 e7 61 ff ff callq 0xffffffffffff61f0
> 9: 81 c3 13 38 01 00 add $0x13813,%ebx
> f: 83 ec 0c sub $0xc,%esp
> 12: 8b 44 24 20 mov 0x20(%rsp),%eax
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20240729/202407291328.761441a7-oliver.sang@intel.com
>
>
>
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [peterz-queue:sched/prep] [sched/fair] 124c8f4374: WARNING:at_kernel/sched/sched.h:#update_load_avg
2024-07-31 6:16 ` K Prateek Nayak
@ 2024-07-31 11:54 ` Peter Zijlstra
2024-08-01 3:40 ` K Prateek Nayak
0 siblings, 1 reply; 4+ messages in thread
From: Peter Zijlstra @ 2024-07-31 11:54 UTC (permalink / raw)
To: K Prateek Nayak
Cc: oe-lkp, lkp, linux-kernel, aubrey.li, yu.c.chen,
kernel test robot
On Wed, Jul 31, 2024 at 11:46:48AM +0530, K Prateek Nayak wrote:
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index cd4a6bf14828..c437b408d29b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -13297,10 +13297,34 @@ void unregister_fair_sched_group(struct task_group *tg)
> if (se->sched_delayed) {
> guard(rq_lock_irqsave)(rq);
> if (se->sched_delayed) {
> + /*
> + * We can reach here when processing RCU_SOFTIRQ on exit path from
> + * a reschedule IPI. wakeup_preempt() may have set RQCF_REQ_SKIP to
> + * skip a close clock update in schedule(), however, in presence of
> + * a delayed entity, this trips the check in rq_clock_pelt() which
> + * now believes the clock value is stale and needs updating. To
> + * prevent such situation, cancel any pending skip updates, and
> + * update the rq clock.
> + */
> + rq_clock_cancel_skipupdate(rq);
> +
> + /*
> + * XXX: Will this trip WARN_DOUBLE_CLOCK? In which case, can
> + * rq_clock_cancel_skipupdate() be made to return a bool if
> + * RQCF_REQ_SKIP is set and we avoid this update?
> + */
> update_rq_clock(rq);
> +
> dequeue_entities(rq, se, DEQUEUE_SLEEP | DEQUEUE_DELAYED);
> +
> + /* Avoid updating the clock again if a schedule() is pending */
> + if (task_on_rq_queued(rq->curr) &&
> + test_tsk_need_resched(rq->curr))
> + rq_clock_skip_update(rq);
> }
> list_del_leaf_cfs_rq(cfs_rq);
> +
> +
> }
> remove_entity_load_avg(se);
> }
So I did update this to simply add update_rq_clock() before the
dequeue_entity(SLEEP|DELAYED). I initially had, these, then confused
myself between deactivate_task() and dequeue_entity(), where the former
updates the clock but the latter does not, and removed them. Then Mike
complained, and I restored it for the regular exit path and forgot the
cgroup exit path.
But now they should both be doing update_rq_clock() here.
https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/eevdf&id=5b3a132d4dd5c91f26beb3e8973c03cdb77d7873
Since this is all with our own rq->lock held, I don't think skip would
be relevant here.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [peterz-queue:sched/prep] [sched/fair] 124c8f4374: WARNING:at_kernel/sched/sched.h:#update_load_avg
2024-07-31 11:54 ` Peter Zijlstra
@ 2024-08-01 3:40 ` K Prateek Nayak
0 siblings, 0 replies; 4+ messages in thread
From: K Prateek Nayak @ 2024-08-01 3:40 UTC (permalink / raw)
To: Peter Zijlstra
Cc: oe-lkp, lkp, linux-kernel, aubrey.li, yu.c.chen,
kernel test robot
Hello Peter,
On 7/31/2024 5:24 PM, Peter Zijlstra wrote:
> On Wed, Jul 31, 2024 at 11:46:48AM +0530, K Prateek Nayak wrote:
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index cd4a6bf14828..c437b408d29b 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -13297,10 +13297,34 @@ void unregister_fair_sched_group(struct task_group *tg)
>> if (se->sched_delayed) {
>> guard(rq_lock_irqsave)(rq);
>> if (se->sched_delayed) {
>> + /*
>> + * We can reach here when processing RCU_SOFTIRQ on exit path from
>> + * a reschedule IPI. wakeup_preempt() may have set RQCF_REQ_SKIP to
>> + * skip a close clock update in schedule(), however, in presence of
>> + * a delayed entity, this trips the check in rq_clock_pelt() which
>> + * now believes the clock value is stale and needs updating. To
>> + * prevent such situation, cancel any pending skip updates, and
>> + * update the rq clock.
>> + */
>> + rq_clock_cancel_skipupdate(rq);
>> +
>> + /*
>> + * XXX: Will this trip WARN_DOUBLE_CLOCK? In which case, can
>> + * rq_clock_cancel_skipupdate() be made to return a bool if
>> + * RQCF_REQ_SKIP is set and we avoid this update?
>> + */
>> update_rq_clock(rq);
>> +
>> dequeue_entities(rq, se, DEQUEUE_SLEEP | DEQUEUE_DELAYED);
>> +
>> + /* Avoid updating the clock again if a schedule() is pending */
>> + if (task_on_rq_queued(rq->curr) &&
>> + test_tsk_need_resched(rq->curr))
>> + rq_clock_skip_update(rq);
>> }
>> list_del_leaf_cfs_rq(cfs_rq);
>> +
>> +
>> }
>> remove_entity_load_avg(se);
>> }
>
> So I did update this to simply add update_rq_clock() before the
> dequeue_entity(SLEEP|DELAYED). I initially had, these, then confused
> myself between deactivate_task() and dequeue_entity(), where the former
> updates the clock but the latter does not, and removed them. Then Mike
> complained, and I restored it for the regular exit path and forgot the
> cgroup exit path.
>
> But now they should both be doing update_rq_clock() here.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/eevdf&id=5b3a132d4dd5c91f26beb3e8973c03cdb77d7873
>
> Since this is all with our own rq->lock held, I don't think skip would
> be relevant here.
Thank you for clarifying the nuances. I'll continue testing with the
latest branch.
--
Thanks and Regards,
Prateek
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2024-08-01 3:42 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-29 6:14 [peterz-queue:sched/prep] [sched/fair] 124c8f4374: WARNING:at_kernel/sched/sched.h:#update_load_avg kernel test robot
2024-07-31 6:16 ` K Prateek Nayak
2024-07-31 11:54 ` Peter Zijlstra
2024-08-01 3:40 ` K Prateek Nayak
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox