* [peterz-queue:sched/prep] [sched/fair] 124c8f4374: WARNING:at_kernel/sched/sched.h:#update_load_avg
@ 2024-07-29 6:14 kernel test robot
2024-07-31 6:16 ` K Prateek Nayak
0 siblings, 1 reply; 4+ messages in thread
From: kernel test robot @ 2024-07-29 6:14 UTC (permalink / raw)
To: Peter Zijlstra
Cc: oe-lkp, lkp, linux-kernel, aubrey.li, yu.c.chen, oliver.sang
Hello,
kernel test robot noticed "WARNING:at_kernel/sched/sched.h:#update_load_avg" on:
commit: 124c8f43740f3f1e3e2aeecd976bcf5ddd7bd435 ("sched/fair: Implement delayed dequeue")
https://git.kernel.org/cgit/linux/kernel/git/peterz/queue.git sched/prep
in testcase: boot
compiler: clang-18
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
(please refer to attached dmesg/kmsg for entire log/backtrace)
we noticed the issue does not always happen, 60 times out of 150 runs as below.
but the parent keeps clean.
a1133ac6dc7c6bde 124c8f43740f3f1e3e2aeecd976
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
:150 40% 60:150 dmesg.WARNING:at_kernel/sched/sched.h:#update_load_avg
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202407291328.761441a7-oliver.sang@intel.com
[ 13.522527][ C0] ------------[ cut here ]------------
[ 13.523046][ C0] rq->clock_update_flags < RQCF_ACT_SKIP
[ 13.523055][ C0] WARNING: CPU: 0 PID: 158 at kernel/sched/sched.h:1550 update_load_avg (kernel/sched/sched.h:1550 kernel/sched/pelt.h:67 kernel/sched/pelt.h:174 kernel/sched/fair.c:4747)
[ 13.524265][ C0] Modules linked in: drm fuse ip_tables
[ 13.524722][ C0] CPU: 0 UID: 0 PID: 158 Comm: openipmi Not tainted 6.10.0-12745-g124c8f43740f #1
[ 13.525464][ C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 13.526307][ C0] RIP: 0010:update_load_avg (kernel/sched/sched.h:1550 kernel/sched/pelt.h:67 kernel/sched/pelt.h:174 kernel/sched/fair.c:4747)
[ 13.526793][ C0] Code: 00 4c 2b bb 88 01 00 00 40 f6 c5 02 0f 84 b7 fa ff ff e9 ca fa ff ff c6 05 b9 1b e4 01 01 48 c7 c7 34 74 d6 a6 e8 a1 c7 fa ff <0f> 0b e9 78 fa ff ff 65 8b 0d 8f 83 91 5a 89 c9 48 0f a3 0d e9 7b
All code
========
0: 00 4c 2b bb add %cl,-0x45(%rbx,%rbp,1)
4: 88 01 mov %al,(%rcx)
6: 00 00 add %al,(%rax)
8: 40 f6 c5 02 test $0x2,%bpl
c: 0f 84 b7 fa ff ff je 0xfffffffffffffac9
12: e9 ca fa ff ff jmpq 0xfffffffffffffae1
17: c6 05 b9 1b e4 01 01 movb $0x1,0x1e41bb9(%rip) # 0x1e41bd7
1e: 48 c7 c7 34 74 d6 a6 mov $0xffffffffa6d67434,%rdi
25: e8 a1 c7 fa ff callq 0xfffffffffffac7cb
2a:* 0f 0b ud2 <-- trapping instruction
2c: e9 78 fa ff ff jmpq 0xfffffffffffffaa9
31: 65 8b 0d 8f 83 91 5a mov %gs:0x5a91838f(%rip),%ecx # 0x5a9183c7
38: 89 c9 mov %ecx,%ecx
3a: 48 rex.W
3b: 0f .byte 0xf
3c: a3 .byte 0xa3
3d: 0d .byte 0xd
3e: e9 .byte 0xe9
3f: 7b .byte 0x7b
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: e9 78 fa ff ff jmpq 0xfffffffffffffa7f
7: 65 8b 0d 8f 83 91 5a mov %gs:0x5a91838f(%rip),%ecx # 0x5a91839d
e: 89 c9 mov %ecx,%ecx
10: 48 rex.W
11: 0f .byte 0xf
12: a3 .byte 0xa3
13: 0d .byte 0xd
14: e9 .byte 0xe9
15: 7b .byte 0x7b
[ 13.528234][ C0] RSP: 0000:ffffbaef40003d90 EFLAGS: 00010046
[ 13.528723][ C0] RAX: b53834d2850b7b00 RBX: ffff9ef2efd34640 RCX: 0000000000000027
[ 13.529379][ C0] RDX: 0000000000000002 RSI: 00000000ffff7fff RDI: ffff9ef2efc20b88
[ 13.530029][ C0] RBP: 0000000000000001 R08: 0000000000007fff R09: ffffffffa7053610
[ 13.530685][ C0] R10: 0000000000017ffd R11: 0000000000000004 R12: ffff9ef2efd345c0
[ 13.531344][ C0] R13: ffff9ef2efd34640 R14: ffff9eefc0e26400 R15: 0000000000000000
[ 13.532001][ C0] FS: 0000000000000000(0000) GS:ffff9ef2efc00000(0063) knlGS:00000000f7f5f180
[ 13.532720][ C0] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
[ 13.533242][ C0] CR2: 00000000577c51ac CR3: 0000000162e6c000 CR4: 00000000000406f0
[ 13.533891][ C0] Call Trace:
[ 13.534215][ C0] <IRQ>
[ 13.534504][ C0] ? __warn (kernel/panic.c:240 kernel/panic.c:735)
[ 13.534863][ C0] ? update_load_avg (kernel/sched/sched.h:1550 kernel/sched/pelt.h:67 kernel/sched/pelt.h:174 kernel/sched/fair.c:4747)
[ 13.535278][ C0] ? report_bug (lib/bug.c:? lib/bug.c:219)
[ 13.535695][ C0] ? handle_bug (arch/x86/kernel/traps.c:239)
[ 13.536079][ C0] ? exc_invalid_op (arch/x86/kernel/traps.c:260)
[ 13.536498][ C0] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:621)
[ 13.536920][ C0] ? update_load_avg (kernel/sched/sched.h:1550 kernel/sched/pelt.h:67 kernel/sched/pelt.h:174 kernel/sched/fair.c:4747)
[ 13.537334][ C0] dequeue_entity (kernel/sched/sched.h:817 kernel/sched/fair.c:5429)
[ 13.537729][ C0] dequeue_entities (kernel/sched/fair.c:6972)
[ 13.538130][ C0] unregister_fair_sched_group (kernel/sched/fair.c:379 kernel/sched/fair.c:13159)
[ 13.554647][ C0] ? __pfx_sched_unregister_group_rcu (kernel/sched/core.c:8759)
[ 13.555167][ C0] sched_unregister_group_rcu (kernel/sched/core.c:8707 kernel/sched/core.c:8761)
[ 13.555621][ C0] rcu_do_batch (arch/x86/include/asm/preempt.h:26 kernel/rcu/tree.c:2576)
[ 13.556008][ C0] rcu_core (kernel/rcu/tree.c:2845)
[ 13.556373][ C0] handle_softirqs (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 include/trace/events/irq.h:142 kernel/softirq.c:555)
[ 13.556775][ C0] __irq_exit_rcu (kernel/softirq.c:617 kernel/softirq.c:639)
[ 13.557166][ C0] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1043 arch/x86/kernel/apic/apic.c:1043)
[ 13.557638][ C0] </IRQ>
[ 13.557920][ C0] <TASK>
[ 13.560463][ C0] asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:702)
[ 13.560957][ C0] RIP: 0010:irqentry_exit_to_user_mode (kernel/entry/common.c:? include/linux/entry-common.h:328 kernel/entry/common.c:231)
[ 13.561465][ C0] Code: 90 41 57 41 56 53 48 89 fb e8 63 90 1f ff 65 4c 8b 34 25 40 37 03 00 49 8b 3e f7 c7 0e 30 02 00 74 7c 49 89 ff e8 97 8f 1f ff <4c> 89 f8 a8 08 75 38 a9 00 10 00 00 75 40 a9 00 20 00 00 75 4b a9
All code
========
0: 90 nop
1: 41 57 push %r15
3: 41 56 push %r14
5: 53 push %rbx
6: 48 89 fb mov %rdi,%rbx
9: e8 63 90 1f ff callq 0xffffffffff1f9071
e: 65 4c 8b 34 25 40 37 mov %gs:0x33740,%r14
15: 03 00
17: 49 8b 3e mov (%r14),%rdi
1a: f7 c7 0e 30 02 00 test $0x2300e,%edi
20: 74 7c je 0x9e
22: 49 89 ff mov %rdi,%r15
25: e8 97 8f 1f ff callq 0xffffffffff1f8fc1
2a:* 4c 89 f8 mov %r15,%rax <-- trapping instruction
2d: a8 08 test $0x8,%al
2f: 75 38 jne 0x69
31: a9 00 10 00 00 test $0x1000,%eax
36: 75 40 jne 0x78
38: a9 00 20 00 00 test $0x2000,%eax
3d: 75 4b jne 0x8a
3f: a9 .byte 0xa9
Code starting with the faulting instruction
===========================================
0: 4c 89 f8 mov %r15,%rax
3: a8 08 test $0x8,%al
5: 75 38 jne 0x3f
7: a9 00 10 00 00 test $0x1000,%eax
c: 75 40 jne 0x4e
e: a9 00 20 00 00 test $0x2000,%eax
13: 75 4b jne 0x60
15: a9 .byte 0xa9
[ 13.562938][ C0] RSP: 0000:ffffbaef40613f38 EFLAGS: 00000202
[ 13.563444][ C0] RAX: ffff9ef0287dd080 RBX: ffffbaef40613f58 RCX: 000000000000080b
[ 13.564125][ C0] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000008
[ 13.564796][ C0] RBP: 0000000000000000 R08: 000000000000002b R09: 000000000000002b
[ 13.565456][ C0] R10: 0000000000000296 R11: 00000000ff8772ac R12: 0000000000000000
[ 13.566115][ C0] R13: 0000000000000000 R14: ffff9ef0287dd080 R15: 0000000020000008
[ 13.566782][ C0] ? irqentry_exit_to_user_mode (kernel/entry/common.c:? include/linux/entry-common.h:328 kernel/entry/common.c:231)
[ 13.567254][ C0] asm_sysvec_reschedule_ipi (arch/x86/include/asm/idtentry.h:707)
[ 13.567709][ C0] RIP: 0023:0x56652630
[ 13.568072][ C0] Code: ff ff 83 c4 10 85 c0 74 05 83 c4 08 5b c3 83 ec 0c 8d 83 8d 6b ff ff 50 e8 cd 7c ff ff 8d b4 26 00 00 00 00 8d b6 00 00 00 00 <55> 57 56 53 e8 e7 61 ff ff 81 c3 13 38 01 00 83 ec 0c 8b 44 24 20
All code
========
0: ff (bad)
1: ff 83 c4 10 85 c0 incl -0x3f7aef3c(%rbx)
7: 74 05 je 0xe
9: 83 c4 08 add $0x8,%esp
c: 5b pop %rbx
d: c3 retq
e: 83 ec 0c sub $0xc,%esp
11: 8d 83 8d 6b ff ff lea -0x9473(%rbx),%eax
17: 50 push %rax
18: e8 cd 7c ff ff callq 0xffffffffffff7cea
1d: 8d b4 26 00 00 00 00 lea 0x0(%rsi,%riz,1),%esi
24: 8d b6 00 00 00 00 lea 0x0(%rsi),%esi
2a:* 55 push %rbp <-- trapping instruction
2b: 57 push %rdi
2c: 56 push %rsi
2d: 53 push %rbx
2e: e8 e7 61 ff ff callq 0xffffffffffff621a
33: 81 c3 13 38 01 00 add $0x13813,%ebx
39: 83 ec 0c sub $0xc,%esp
3c: 8b 44 24 20 mov 0x20(%rsp),%eax
Code starting with the faulting instruction
===========================================
0: 55 push %rbp
1: 57 push %rdi
2: 56 push %rsi
3: 53 push %rbx
4: e8 e7 61 ff ff callq 0xffffffffffff61f0
9: 81 c3 13 38 01 00 add $0x13813,%ebx
f: 83 ec 0c sub $0xc,%esp
12: 8b 44 24 20 mov 0x20(%rsp),%eax
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240729/202407291328.761441a7-oliver.sang@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: [peterz-queue:sched/prep] [sched/fair] 124c8f4374: WARNING:at_kernel/sched/sched.h:#update_load_avg 2024-07-29 6:14 [peterz-queue:sched/prep] [sched/fair] 124c8f4374: WARNING:at_kernel/sched/sched.h:#update_load_avg kernel test robot @ 2024-07-31 6:16 ` K Prateek Nayak 2024-07-31 11:54 ` Peter Zijlstra 0 siblings, 1 reply; 4+ messages in thread From: K Prateek Nayak @ 2024-07-31 6:16 UTC (permalink / raw) To: Peter Zijlstra Cc: oe-lkp, lkp, linux-kernel, aubrey.li, yu.c.chen, kernel test robot Hello Peter, Oliver, P.S. I see that peterz-queue has been updated recently so I'm not sure if these observations still hold but if they do ... On 7/29/2024 11:44 AM, kernel test robot wrote: > > > Hello, > > kernel test robot noticed "WARNING:at_kernel/sched/sched.h:#update_load_avg" on: > > commit: 124c8f43740f3f1e3e2aeecd976bcf5ddd7bd435 ("sched/fair: Implement delayed dequeue") > https://git.kernel.org/cgit/linux/kernel/git/peterz/queue.git sched/prep > > in testcase: boot > > compiler: clang-18 > test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G > > (please refer to attached dmesg/kmsg for entire log/backtrace) > > > we noticed the issue does not always happen, 60 times out of 150 runs as below. > but the parent keeps clean. > > a1133ac6dc7c6bde 124c8f43740f3f1e3e2aeecd976 > ---------------- --------------------------- > fail:runs %reproduction fail:runs > | | | > :150 40% 60:150 dmesg.WARNING:at_kernel/sched/sched.h:#update_load_avg > > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of > the same patch/commit), kindly add following tags > | Reported-by: kernel test robot <oliver.sang@intel.com> > | Closes: https://lore.kernel.org/oe-lkp/202407291328.761441a7-oliver.sang@intel.com > > > [ 13.522527][ C0] ------------[ cut here ]------------ > [ 13.523046][ C0] rq->clock_update_flags < RQCF_ACT_SKIP > [ 13.523055][ C0] WARNING: CPU: 0 PID: 158 at kernel/sched/sched.h:1550 update_load_avg (kernel/sched/sched.h:1550 kernel/sched/pelt.h:67 kernel/sched/pelt.h:174 kernel/sched/fair.c:4747) > [ 13.524265][ C0] Modules linked in: drm fuse ip_tables > [ 13.524722][ C0] CPU: 0 UID: 0 PID: 158 Comm: openipmi Not tainted 6.10.0-12745-g124c8f43740f #1 > [ 13.525464][ C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 > [ 13.526307][ C0] RIP: 0010:update_load_avg (kernel/sched/sched.h:1550 kernel/sched/pelt.h:67 kernel/sched/pelt.h:174 kernel/sched/fair.c:4747) > [ 13.526793][ C0] Code: 00 4c 2b bb 88 01 00 00 40 f6 c5 02 0f 84 b7 fa ff ff e9 ca fa ff ff c6 05 b9 1b e4 01 01 48 c7 c7 34 74 d6 a6 e8 a1 c7 fa ff <0f> 0b e9 78 fa ff ff 65 8b 0d 8f 83 91 5a 89 c9 48 0f a3 0d e9 7b > All code > ======== > 0: 00 4c 2b bb add %cl,-0x45(%rbx,%rbp,1) > 4: 88 01 mov %al,(%rcx) > 6: 00 00 add %al,(%rax) > 8: 40 f6 c5 02 test $0x2,%bpl > c: 0f 84 b7 fa ff ff je 0xfffffffffffffac9 > 12: e9 ca fa ff ff jmpq 0xfffffffffffffae1 > 17: c6 05 b9 1b e4 01 01 movb $0x1,0x1e41bb9(%rip) # 0x1e41bd7 > 1e: 48 c7 c7 34 74 d6 a6 mov $0xffffffffa6d67434,%rdi > 25: e8 a1 c7 fa ff callq 0xfffffffffffac7cb > 2a:* 0f 0b ud2 <-- trapping instruction > 2c: e9 78 fa ff ff jmpq 0xfffffffffffffaa9 > 31: 65 8b 0d 8f 83 91 5a mov %gs:0x5a91838f(%rip),%ecx # 0x5a9183c7 > 38: 89 c9 mov %ecx,%ecx > 3a: 48 rex.W > 3b: 0f .byte 0xf > 3c: a3 .byte 0xa3 > 3d: 0d .byte 0xd > 3e: e9 .byte 0xe9 > 3f: 7b .byte 0x7b > > Code starting with the faulting instruction > =========================================== > 0: 0f 0b ud2 > 2: e9 78 fa ff ff jmpq 0xfffffffffffffa7f > 7: 65 8b 0d 8f 83 91 5a mov %gs:0x5a91838f(%rip),%ecx # 0x5a91839d > e: 89 c9 mov %ecx,%ecx > 10: 48 rex.W > 11: 0f .byte 0xf > 12: a3 .byte 0xa3 > 13: 0d .byte 0xd > 14: e9 .byte 0xe9 > 15: 7b .byte 0x7b > [ 13.528234][ C0] RSP: 0000:ffffbaef40003d90 EFLAGS: 00010046 > [ 13.528723][ C0] RAX: b53834d2850b7b00 RBX: ffff9ef2efd34640 RCX: 0000000000000027 > [ 13.529379][ C0] RDX: 0000000000000002 RSI: 00000000ffff7fff RDI: ffff9ef2efc20b88 > [ 13.530029][ C0] RBP: 0000000000000001 R08: 0000000000007fff R09: ffffffffa7053610 > [ 13.530685][ C0] R10: 0000000000017ffd R11: 0000000000000004 R12: ffff9ef2efd345c0 > [ 13.531344][ C0] R13: ffff9ef2efd34640 R14: ffff9eefc0e26400 R15: 0000000000000000 > [ 13.532001][ C0] FS: 0000000000000000(0000) GS:ffff9ef2efc00000(0063) knlGS:00000000f7f5f180 > [ 13.532720][ C0] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 > [ 13.533242][ C0] CR2: 00000000577c51ac CR3: 0000000162e6c000 CR4: 00000000000406f0 > [ 13.533891][ C0] Call Trace: > [ 13.534215][ C0] <IRQ> > [ 13.534504][ C0] ? __warn (kernel/panic.c:240 kernel/panic.c:735) > [ 13.534863][ C0] ? update_load_avg (kernel/sched/sched.h:1550 kernel/sched/pelt.h:67 kernel/sched/pelt.h:174 kernel/sched/fair.c:4747) > [ 13.535278][ C0] ? report_bug (lib/bug.c:? lib/bug.c:219) > [ 13.535695][ C0] ? handle_bug (arch/x86/kernel/traps.c:239) > [ 13.536079][ C0] ? exc_invalid_op (arch/x86/kernel/traps.c:260) > [ 13.536498][ C0] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:621) > [ 13.536920][ C0] ? update_load_avg (kernel/sched/sched.h:1550 kernel/sched/pelt.h:67 kernel/sched/pelt.h:174 kernel/sched/fair.c:4747) > [ 13.537334][ C0] dequeue_entity (kernel/sched/sched.h:817 kernel/sched/fair.c:5429) > [ 13.537729][ C0] dequeue_entities (kernel/sched/fair.c:6972) > [ 13.538130][ C0] unregister_fair_sched_group (kernel/sched/fair.c:379 kernel/sched/fair.c:13159) > [ 13.554647][ C0] ? __pfx_sched_unregister_group_rcu (kernel/sched/core.c:8759) > [ 13.555167][ C0] sched_unregister_group_rcu (kernel/sched/core.c:8707 kernel/sched/core.c:8761) > [ 13.555621][ C0] rcu_do_batch (arch/x86/include/asm/preempt.h:26 kernel/rcu/tree.c:2576) > [ 13.556008][ C0] rcu_core (kernel/rcu/tree.c:2845) > [ 13.556373][ C0] handle_softirqs (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 include/trace/events/irq.h:142 kernel/softirq.c:555) > [ 13.556775][ C0] __irq_exit_rcu (kernel/softirq.c:617 kernel/softirq.c:639) > [ 13.557166][ C0] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1043 arch/x86/kernel/apic/apic.c:1043) Looking at the call stack, any chance it is the following case that is tripping this warning: CPU0 CPU1 ==== ==== wakeup_preempt(rq) { resched_curr(rq); ------------------------> sysvec_reschedule_ipi() { ... scheduler_ipi(); if (task_on_rq_queued(rq->curr) && } /* sysvec_reschedule_ipi */ test_tsk_need_resched(rq->curr)) handle_softirqs() { rq_clock_skip_update(rq); /* RCU_SOFTIRQ */ } /* wakeup_preempt(); */ rcu_core_si() { unregister_fair_sched_group() { /* On the rq that got the resched IPI */ if (se->sched_delayed) { dequeue_entities() { /* Trips RQCF_ACT_SKIP check within since waker called rq_clock_skip_update() */ } /* dequeue_entities(); */ } /* if() */ } /* unregister_fair_sched_group */ } /* rcu_core_si */ } /* handle_softirqs */ -- commit a64692a3afd8 ("sched: Cleanup/optimize clock updates") seems to have added this condition to skip clock updates after an enqueue since wakeup target will do a schedule() soon. Since delayed dequeue may has entity to dequeue now, looking at rq_clock_pelt() in dequeue path will trip this since PELT believes that the clock is stale. Could something like the below work? diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index cd4a6bf14828..c437b408d29b 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -13297,10 +13297,34 @@ void unregister_fair_sched_group(struct task_group *tg) if (se->sched_delayed) { guard(rq_lock_irqsave)(rq); if (se->sched_delayed) { + /* + * We can reach here when processing RCU_SOFTIRQ on exit path from + * a reschedule IPI. wakeup_preempt() may have set RQCF_REQ_SKIP to + * skip a close clock update in schedule(), however, in presence of + * a delayed entity, this trips the check in rq_clock_pelt() which + * now believes the clock value is stale and needs updating. To + * prevent such situation, cancel any pending skip updates, and + * update the rq clock. + */ + rq_clock_cancel_skipupdate(rq); + + /* + * XXX: Will this trip WARN_DOUBLE_CLOCK? In which case, can + * rq_clock_cancel_skipupdate() be made to return a bool if + * RQCF_REQ_SKIP is set and we avoid this update? + */ update_rq_clock(rq); + dequeue_entities(rq, se, DEQUEUE_SLEEP | DEQUEUE_DELAYED); + + /* Avoid updating the clock again if a schedule() is pending */ + if (task_on_rq_queued(rq->curr) && + test_tsk_need_resched(rq->curr)) + rq_clock_skip_update(rq); } list_del_leaf_cfs_rq(cfs_rq); + + } remove_entity_load_avg(se); } -- I was not able to hit the SCHED_WARN_ON() reliably on my end when launching a guest to confirm my hypothesis - I was able to hit it approx. 1 in 10 run, but I've ran with the above changes 10s of times and I've not hit the splat (yet!) Note: Although the diff was updated for the latest peterz-queue:sched/eevdf, all my testing was done on the previous version on this branch based on v6.10.0 kernel. -- Thanks and Regards, Prateek > [ 13.557638][ C0] </IRQ> > [ 13.557920][ C0] <TASK> > [ 13.560463][ C0] asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:702) > [ 13.560957][ C0] RIP: 0010:irqentry_exit_to_user_mode (kernel/entry/common.c:? include/linux/entry-common.h:328 kernel/entry/common.c:231) > [ 13.561465][ C0] Code: 90 41 57 41 56 53 48 89 fb e8 63 90 1f ff 65 4c 8b 34 25 40 37 03 00 49 8b 3e f7 c7 0e 30 02 00 74 7c 49 89 ff e8 97 8f 1f ff <4c> 89 f8 a8 08 75 38 a9 00 10 00 00 75 40 a9 00 20 00 00 75 4b a9 > All code > ======== > 0: 90 nop > 1: 41 57 push %r15 > 3: 41 56 push %r14 > 5: 53 push %rbx > 6: 48 89 fb mov %rdi,%rbx > 9: e8 63 90 1f ff callq 0xffffffffff1f9071 > e: 65 4c 8b 34 25 40 37 mov %gs:0x33740,%r14 > 15: 03 00 > 17: 49 8b 3e mov (%r14),%rdi > 1a: f7 c7 0e 30 02 00 test $0x2300e,%edi > 20: 74 7c je 0x9e > 22: 49 89 ff mov %rdi,%r15 > 25: e8 97 8f 1f ff callq 0xffffffffff1f8fc1 > 2a:* 4c 89 f8 mov %r15,%rax <-- trapping instruction > 2d: a8 08 test $0x8,%al > 2f: 75 38 jne 0x69 > 31: a9 00 10 00 00 test $0x1000,%eax > 36: 75 40 jne 0x78 > 38: a9 00 20 00 00 test $0x2000,%eax > 3d: 75 4b jne 0x8a > 3f: a9 .byte 0xa9 > > Code starting with the faulting instruction > =========================================== > 0: 4c 89 f8 mov %r15,%rax > 3: a8 08 test $0x8,%al > 5: 75 38 jne 0x3f > 7: a9 00 10 00 00 test $0x1000,%eax > c: 75 40 jne 0x4e > e: a9 00 20 00 00 test $0x2000,%eax > 13: 75 4b jne 0x60 > 15: a9 .byte 0xa9 > [ 13.562938][ C0] RSP: 0000:ffffbaef40613f38 EFLAGS: 00000202 > [ 13.563444][ C0] RAX: ffff9ef0287dd080 RBX: ffffbaef40613f58 RCX: 000000000000080b > [ 13.564125][ C0] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000008 > [ 13.564796][ C0] RBP: 0000000000000000 R08: 000000000000002b R09: 000000000000002b > [ 13.565456][ C0] R10: 0000000000000296 R11: 00000000ff8772ac R12: 0000000000000000 > [ 13.566115][ C0] R13: 0000000000000000 R14: ffff9ef0287dd080 R15: 0000000020000008 > [ 13.566782][ C0] ? irqentry_exit_to_user_mode (kernel/entry/common.c:? include/linux/entry-common.h:328 kernel/entry/common.c:231) > [ 13.567254][ C0] asm_sysvec_reschedule_ipi (arch/x86/include/asm/idtentry.h:707) > [ 13.567709][ C0] RIP: 0023:0x56652630 > [ 13.568072][ C0] Code: ff ff 83 c4 10 85 c0 74 05 83 c4 08 5b c3 83 ec 0c 8d 83 8d 6b ff ff 50 e8 cd 7c ff ff 8d b4 26 00 00 00 00 8d b6 00 00 00 00 <55> 57 56 53 e8 e7 61 ff ff 81 c3 13 38 01 00 83 ec 0c 8b 44 24 20 > All code > ======== > 0: ff (bad) > 1: ff 83 c4 10 85 c0 incl -0x3f7aef3c(%rbx) > 7: 74 05 je 0xe > 9: 83 c4 08 add $0x8,%esp > c: 5b pop %rbx > d: c3 retq > e: 83 ec 0c sub $0xc,%esp > 11: 8d 83 8d 6b ff ff lea -0x9473(%rbx),%eax > 17: 50 push %rax > 18: e8 cd 7c ff ff callq 0xffffffffffff7cea > 1d: 8d b4 26 00 00 00 00 lea 0x0(%rsi,%riz,1),%esi > 24: 8d b6 00 00 00 00 lea 0x0(%rsi),%esi > 2a:* 55 push %rbp <-- trapping instruction > 2b: 57 push %rdi > 2c: 56 push %rsi > 2d: 53 push %rbx > 2e: e8 e7 61 ff ff callq 0xffffffffffff621a > 33: 81 c3 13 38 01 00 add $0x13813,%ebx > 39: 83 ec 0c sub $0xc,%esp > 3c: 8b 44 24 20 mov 0x20(%rsp),%eax > > Code starting with the faulting instruction > =========================================== > 0: 55 push %rbp > 1: 57 push %rdi > 2: 56 push %rsi > 3: 53 push %rbx > 4: e8 e7 61 ff ff callq 0xffffffffffff61f0 > 9: 81 c3 13 38 01 00 add $0x13813,%ebx > f: 83 ec 0c sub $0xc,%esp > 12: 8b 44 24 20 mov 0x20(%rsp),%eax > > > The kernel config and materials to reproduce are available at: > https://download.01.org/0day-ci/archive/20240729/202407291328.761441a7-oliver.sang@intel.com > > > ^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [peterz-queue:sched/prep] [sched/fair] 124c8f4374: WARNING:at_kernel/sched/sched.h:#update_load_avg 2024-07-31 6:16 ` K Prateek Nayak @ 2024-07-31 11:54 ` Peter Zijlstra 2024-08-01 3:40 ` K Prateek Nayak 0 siblings, 1 reply; 4+ messages in thread From: Peter Zijlstra @ 2024-07-31 11:54 UTC (permalink / raw) To: K Prateek Nayak Cc: oe-lkp, lkp, linux-kernel, aubrey.li, yu.c.chen, kernel test robot On Wed, Jul 31, 2024 at 11:46:48AM +0530, K Prateek Nayak wrote: > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index cd4a6bf14828..c437b408d29b 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -13297,10 +13297,34 @@ void unregister_fair_sched_group(struct task_group *tg) > if (se->sched_delayed) { > guard(rq_lock_irqsave)(rq); > if (se->sched_delayed) { > + /* > + * We can reach here when processing RCU_SOFTIRQ on exit path from > + * a reschedule IPI. wakeup_preempt() may have set RQCF_REQ_SKIP to > + * skip a close clock update in schedule(), however, in presence of > + * a delayed entity, this trips the check in rq_clock_pelt() which > + * now believes the clock value is stale and needs updating. To > + * prevent such situation, cancel any pending skip updates, and > + * update the rq clock. > + */ > + rq_clock_cancel_skipupdate(rq); > + > + /* > + * XXX: Will this trip WARN_DOUBLE_CLOCK? In which case, can > + * rq_clock_cancel_skipupdate() be made to return a bool if > + * RQCF_REQ_SKIP is set and we avoid this update? > + */ > update_rq_clock(rq); > + > dequeue_entities(rq, se, DEQUEUE_SLEEP | DEQUEUE_DELAYED); > + > + /* Avoid updating the clock again if a schedule() is pending */ > + if (task_on_rq_queued(rq->curr) && > + test_tsk_need_resched(rq->curr)) > + rq_clock_skip_update(rq); > } > list_del_leaf_cfs_rq(cfs_rq); > + > + > } > remove_entity_load_avg(se); > } So I did update this to simply add update_rq_clock() before the dequeue_entity(SLEEP|DELAYED). I initially had, these, then confused myself between deactivate_task() and dequeue_entity(), where the former updates the clock but the latter does not, and removed them. Then Mike complained, and I restored it for the regular exit path and forgot the cgroup exit path. But now they should both be doing update_rq_clock() here. https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/eevdf&id=5b3a132d4dd5c91f26beb3e8973c03cdb77d7873 Since this is all with our own rq->lock held, I don't think skip would be relevant here. ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [peterz-queue:sched/prep] [sched/fair] 124c8f4374: WARNING:at_kernel/sched/sched.h:#update_load_avg 2024-07-31 11:54 ` Peter Zijlstra @ 2024-08-01 3:40 ` K Prateek Nayak 0 siblings, 0 replies; 4+ messages in thread From: K Prateek Nayak @ 2024-08-01 3:40 UTC (permalink / raw) To: Peter Zijlstra Cc: oe-lkp, lkp, linux-kernel, aubrey.li, yu.c.chen, kernel test robot Hello Peter, On 7/31/2024 5:24 PM, Peter Zijlstra wrote: > On Wed, Jul 31, 2024 at 11:46:48AM +0530, K Prateek Nayak wrote: >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> index cd4a6bf14828..c437b408d29b 100644 >> --- a/kernel/sched/fair.c >> +++ b/kernel/sched/fair.c >> @@ -13297,10 +13297,34 @@ void unregister_fair_sched_group(struct task_group *tg) >> if (se->sched_delayed) { >> guard(rq_lock_irqsave)(rq); >> if (se->sched_delayed) { >> + /* >> + * We can reach here when processing RCU_SOFTIRQ on exit path from >> + * a reschedule IPI. wakeup_preempt() may have set RQCF_REQ_SKIP to >> + * skip a close clock update in schedule(), however, in presence of >> + * a delayed entity, this trips the check in rq_clock_pelt() which >> + * now believes the clock value is stale and needs updating. To >> + * prevent such situation, cancel any pending skip updates, and >> + * update the rq clock. >> + */ >> + rq_clock_cancel_skipupdate(rq); >> + >> + /* >> + * XXX: Will this trip WARN_DOUBLE_CLOCK? In which case, can >> + * rq_clock_cancel_skipupdate() be made to return a bool if >> + * RQCF_REQ_SKIP is set and we avoid this update? >> + */ >> update_rq_clock(rq); >> + >> dequeue_entities(rq, se, DEQUEUE_SLEEP | DEQUEUE_DELAYED); >> + >> + /* Avoid updating the clock again if a schedule() is pending */ >> + if (task_on_rq_queued(rq->curr) && >> + test_tsk_need_resched(rq->curr)) >> + rq_clock_skip_update(rq); >> } >> list_del_leaf_cfs_rq(cfs_rq); >> + >> + >> } >> remove_entity_load_avg(se); >> } > > So I did update this to simply add update_rq_clock() before the > dequeue_entity(SLEEP|DELAYED). I initially had, these, then confused > myself between deactivate_task() and dequeue_entity(), where the former > updates the clock but the latter does not, and removed them. Then Mike > complained, and I restored it for the regular exit path and forgot the > cgroup exit path. > > But now they should both be doing update_rq_clock() here. > > https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/eevdf&id=5b3a132d4dd5c91f26beb3e8973c03cdb77d7873 > > Since this is all with our own rq->lock held, I don't think skip would > be relevant here. Thank you for clarifying the nuances. I'll continue testing with the latest branch. -- Thanks and Regards, Prateek ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2024-08-01 3:42 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-07-29 6:14 [peterz-queue:sched/prep] [sched/fair] 124c8f4374: WARNING:at_kernel/sched/sched.h:#update_load_avg kernel test robot 2024-07-31 6:16 ` K Prateek Nayak 2024-07-31 11:54 ` Peter Zijlstra 2024-08-01 3:40 ` K Prateek Nayak
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox