[peterz-queue:sched/prep] [sched/fair] 124c8f4374: WARNING:at_kernel/sched/sched.h:#update_load

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [peterz-queue:sched/prep] [sched/fair]  124c8f4374: WARNING:at_kernel/sched/sched.h:#update_load_avg
@ 2024-07-29  6:14 kernel test robot
  2024-07-31  6:16 ` K Prateek Nayak
  0 siblings, 1 reply; 4+ messages in thread
From: kernel test robot @ 2024-07-29  6:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: oe-lkp, lkp, linux-kernel, aubrey.li, yu.c.chen, oliver.sang



Hello,

kernel test robot noticed "WARNING:at_kernel/sched/sched.h:#update_load_avg" on:

commit: 124c8f43740f3f1e3e2aeecd976bcf5ddd7bd435 ("sched/fair: Implement delayed dequeue")
https://git.kernel.org/cgit/linux/kernel/git/peterz/queue.git sched/prep

in testcase: boot

compiler: clang-18
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)


we noticed the issue does not always happen, 60 times out of 150 runs as below.
but the parent keeps clean.

a1133ac6dc7c6bde 124c8f43740f3f1e3e2aeecd976
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
           :150         40%          60:150   dmesg.WARNING:at_kernel/sched/sched.h:#update_load_avg



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202407291328.761441a7-oliver.sang@intel.com


[   13.522527][    C0] ------------[ cut here ]------------
[   13.523046][    C0] rq->clock_update_flags < RQCF_ACT_SKIP
[ 13.523055][ C0] WARNING: CPU: 0 PID: 158 at kernel/sched/sched.h:1550 update_load_avg (kernel/sched/sched.h:1550 kernel/sched/pelt.h:67 kernel/sched/pelt.h:174 kernel/sched/fair.c:4747) 
[   13.524265][    C0] Modules linked in: drm fuse ip_tables
[   13.524722][    C0] CPU: 0 UID: 0 PID: 158 Comm: openipmi Not tainted 6.10.0-12745-g124c8f43740f #1
[   13.525464][    C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 13.526307][ C0] RIP: 0010:update_load_avg (kernel/sched/sched.h:1550 kernel/sched/pelt.h:67 kernel/sched/pelt.h:174 kernel/sched/fair.c:4747) 
[ 13.526793][ C0] Code: 00 4c 2b bb 88 01 00 00 40 f6 c5 02 0f 84 b7 fa ff ff e9 ca fa ff ff c6 05 b9 1b e4 01 01 48 c7 c7 34 74 d6 a6 e8 a1 c7 fa ff <0f> 0b e9 78 fa ff ff 65 8b 0d 8f 83 91 5a 89 c9 48 0f a3 0d e9 7b
All code
========
   0:	00 4c 2b bb          	add    %cl,-0x45(%rbx,%rbp,1)
   4:	88 01                	mov    %al,(%rcx)
   6:	00 00                	add    %al,(%rax)
   8:	40 f6 c5 02          	test   $0x2,%bpl
   c:	0f 84 b7 fa ff ff    	je     0xfffffffffffffac9
  12:	e9 ca fa ff ff       	jmpq   0xfffffffffffffae1
  17:	c6 05 b9 1b e4 01 01 	movb   $0x1,0x1e41bb9(%rip)        # 0x1e41bd7
  1e:	48 c7 c7 34 74 d6 a6 	mov    $0xffffffffa6d67434,%rdi
  25:	e8 a1 c7 fa ff       	callq  0xfffffffffffac7cb
  2a:*	0f 0b                	ud2    		<-- trapping instruction
  2c:	e9 78 fa ff ff       	jmpq   0xfffffffffffffaa9
  31:	65 8b 0d 8f 83 91 5a 	mov    %gs:0x5a91838f(%rip),%ecx        # 0x5a9183c7
  38:	89 c9                	mov    %ecx,%ecx
  3a:	48                   	rex.W
  3b:	0f                   	.byte 0xf
  3c:	a3                   	.byte 0xa3
  3d:	0d                   	.byte 0xd
  3e:	e9                   	.byte 0xe9
  3f:	7b                   	.byte 0x7b

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2    
   2:	e9 78 fa ff ff       	jmpq   0xfffffffffffffa7f
   7:	65 8b 0d 8f 83 91 5a 	mov    %gs:0x5a91838f(%rip),%ecx        # 0x5a91839d
   e:	89 c9                	mov    %ecx,%ecx
  10:	48                   	rex.W
  11:	0f                   	.byte 0xf
  12:	a3                   	.byte 0xa3
  13:	0d                   	.byte 0xd
  14:	e9                   	.byte 0xe9
  15:	7b                   	.byte 0x7b
[   13.528234][    C0] RSP: 0000:ffffbaef40003d90 EFLAGS: 00010046
[   13.528723][    C0] RAX: b53834d2850b7b00 RBX: ffff9ef2efd34640 RCX: 0000000000000027
[   13.529379][    C0] RDX: 0000000000000002 RSI: 00000000ffff7fff RDI: ffff9ef2efc20b88
[   13.530029][    C0] RBP: 0000000000000001 R08: 0000000000007fff R09: ffffffffa7053610
[   13.530685][    C0] R10: 0000000000017ffd R11: 0000000000000004 R12: ffff9ef2efd345c0
[   13.531344][    C0] R13: ffff9ef2efd34640 R14: ffff9eefc0e26400 R15: 0000000000000000
[   13.532001][    C0] FS:  0000000000000000(0000) GS:ffff9ef2efc00000(0063) knlGS:00000000f7f5f180
[   13.532720][    C0] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
[   13.533242][    C0] CR2: 00000000577c51ac CR3: 0000000162e6c000 CR4: 00000000000406f0
[   13.533891][    C0] Call Trace:
[   13.534215][    C0]  <IRQ>
[ 13.534504][ C0] ? __warn (kernel/panic.c:240 kernel/panic.c:735) 
[ 13.534863][ C0] ? update_load_avg (kernel/sched/sched.h:1550 kernel/sched/pelt.h:67 kernel/sched/pelt.h:174 kernel/sched/fair.c:4747) 
[ 13.535278][ C0] ? report_bug (lib/bug.c:? lib/bug.c:219) 
[ 13.535695][ C0] ? handle_bug (arch/x86/kernel/traps.c:239) 
[ 13.536079][ C0] ? exc_invalid_op (arch/x86/kernel/traps.c:260) 
[ 13.536498][ C0] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:621) 
[ 13.536920][ C0] ? update_load_avg (kernel/sched/sched.h:1550 kernel/sched/pelt.h:67 kernel/sched/pelt.h:174 kernel/sched/fair.c:4747) 
[ 13.537334][ C0] dequeue_entity (kernel/sched/sched.h:817 kernel/sched/fair.c:5429) 
[ 13.537729][ C0] dequeue_entities (kernel/sched/fair.c:6972) 
[ 13.538130][ C0] unregister_fair_sched_group (kernel/sched/fair.c:379 kernel/sched/fair.c:13159) 
[ 13.554647][ C0] ? __pfx_sched_unregister_group_rcu (kernel/sched/core.c:8759) 
[ 13.555167][ C0] sched_unregister_group_rcu (kernel/sched/core.c:8707 kernel/sched/core.c:8761) 
[ 13.555621][ C0] rcu_do_batch (arch/x86/include/asm/preempt.h:26 kernel/rcu/tree.c:2576) 
[ 13.556008][ C0] rcu_core (kernel/rcu/tree.c:2845) 
[ 13.556373][ C0] handle_softirqs (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 include/trace/events/irq.h:142 kernel/softirq.c:555) 
[ 13.556775][ C0] __irq_exit_rcu (kernel/softirq.c:617 kernel/softirq.c:639) 
[ 13.557166][ C0] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1043 arch/x86/kernel/apic/apic.c:1043) 
[   13.557638][    C0]  </IRQ>
[   13.557920][    C0]  <TASK>
[ 13.560463][ C0] asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:702) 
[ 13.560957][ C0] RIP: 0010:irqentry_exit_to_user_mode (kernel/entry/common.c:? include/linux/entry-common.h:328 kernel/entry/common.c:231) 
[ 13.561465][ C0] Code: 90 41 57 41 56 53 48 89 fb e8 63 90 1f ff 65 4c 8b 34 25 40 37 03 00 49 8b 3e f7 c7 0e 30 02 00 74 7c 49 89 ff e8 97 8f 1f ff <4c> 89 f8 a8 08 75 38 a9 00 10 00 00 75 40 a9 00 20 00 00 75 4b a9
All code
========
   0:	90                   	nop
   1:	41 57                	push   %r15
   3:	41 56                	push   %r14
   5:	53                   	push   %rbx
   6:	48 89 fb             	mov    %rdi,%rbx
   9:	e8 63 90 1f ff       	callq  0xffffffffff1f9071
   e:	65 4c 8b 34 25 40 37 	mov    %gs:0x33740,%r14
  15:	03 00 
  17:	49 8b 3e             	mov    (%r14),%rdi
  1a:	f7 c7 0e 30 02 00    	test   $0x2300e,%edi
  20:	74 7c                	je     0x9e
  22:	49 89 ff             	mov    %rdi,%r15
  25:	e8 97 8f 1f ff       	callq  0xffffffffff1f8fc1
  2a:*	4c 89 f8             	mov    %r15,%rax		<-- trapping instruction
  2d:	a8 08                	test   $0x8,%al
  2f:	75 38                	jne    0x69
  31:	a9 00 10 00 00       	test   $0x1000,%eax
  36:	75 40                	jne    0x78
  38:	a9 00 20 00 00       	test   $0x2000,%eax
  3d:	75 4b                	jne    0x8a
  3f:	a9                   	.byte 0xa9

Code starting with the faulting instruction
===========================================
   0:	4c 89 f8             	mov    %r15,%rax
   3:	a8 08                	test   $0x8,%al
   5:	75 38                	jne    0x3f
   7:	a9 00 10 00 00       	test   $0x1000,%eax
   c:	75 40                	jne    0x4e
   e:	a9 00 20 00 00       	test   $0x2000,%eax
  13:	75 4b                	jne    0x60
  15:	a9                   	.byte 0xa9
[   13.562938][    C0] RSP: 0000:ffffbaef40613f38 EFLAGS: 00000202
[   13.563444][    C0] RAX: ffff9ef0287dd080 RBX: ffffbaef40613f58 RCX: 000000000000080b
[   13.564125][    C0] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000008
[   13.564796][    C0] RBP: 0000000000000000 R08: 000000000000002b R09: 000000000000002b
[   13.565456][    C0] R10: 0000000000000296 R11: 00000000ff8772ac R12: 0000000000000000
[   13.566115][    C0] R13: 0000000000000000 R14: ffff9ef0287dd080 R15: 0000000020000008
[ 13.566782][ C0] ? irqentry_exit_to_user_mode (kernel/entry/common.c:? include/linux/entry-common.h:328 kernel/entry/common.c:231) 
[ 13.567254][ C0] asm_sysvec_reschedule_ipi (arch/x86/include/asm/idtentry.h:707) 
[   13.567709][    C0] RIP: 0023:0x56652630
[ 13.568072][ C0] Code: ff ff 83 c4 10 85 c0 74 05 83 c4 08 5b c3 83 ec 0c 8d 83 8d 6b ff ff 50 e8 cd 7c ff ff 8d b4 26 00 00 00 00 8d b6 00 00 00 00 <55> 57 56 53 e8 e7 61 ff ff 81 c3 13 38 01 00 83 ec 0c 8b 44 24 20
All code
========
   0:	ff                   	(bad)  
   1:	ff 83 c4 10 85 c0    	incl   -0x3f7aef3c(%rbx)
   7:	74 05                	je     0xe
   9:	83 c4 08             	add    $0x8,%esp
   c:	5b                   	pop    %rbx
   d:	c3                   	retq   
   e:	83 ec 0c             	sub    $0xc,%esp
  11:	8d 83 8d 6b ff ff    	lea    -0x9473(%rbx),%eax
  17:	50                   	push   %rax
  18:	e8 cd 7c ff ff       	callq  0xffffffffffff7cea
  1d:	8d b4 26 00 00 00 00 	lea    0x0(%rsi,%riz,1),%esi
  24:	8d b6 00 00 00 00    	lea    0x0(%rsi),%esi
  2a:*	55                   	push   %rbp		<-- trapping instruction
  2b:	57                   	push   %rdi
  2c:	56                   	push   %rsi
  2d:	53                   	push   %rbx
  2e:	e8 e7 61 ff ff       	callq  0xffffffffffff621a
  33:	81 c3 13 38 01 00    	add    $0x13813,%ebx
  39:	83 ec 0c             	sub    $0xc,%esp
  3c:	8b 44 24 20          	mov    0x20(%rsp),%eax

Code starting with the faulting instruction
===========================================
   0:	55                   	push   %rbp
   1:	57                   	push   %rdi
   2:	56                   	push   %rsi
   3:	53                   	push   %rbx
   4:	e8 e7 61 ff ff       	callq  0xffffffffffff61f0
   9:	81 c3 13 38 01 00    	add    $0x13813,%ebx
   f:	83 ec 0c             	sub    $0xc,%esp
  12:	8b 44 24 20          	mov    0x20(%rsp),%eax


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240729/202407291328.761441a7-oliver.sang@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [peterz-queue:sched/prep] [sched/fair] 124c8f4374: WARNING:at_kernel/sched/sched.h:#update_load_avg
  2024-07-29  6:14 [peterz-queue:sched/prep] [sched/fair] 124c8f4374: WARNING:at_kernel/sched/sched.h:#update_load_avg kernel test robot
@ 2024-07-31  6:16 ` K Prateek Nayak
  2024-07-31 11:54   ` Peter Zijlstra
  0 siblings, 1 reply; 4+ messages in thread
From: K Prateek Nayak @ 2024-07-31  6:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: oe-lkp, lkp, linux-kernel, aubrey.li, yu.c.chen,
	kernel test robot

Hello Peter, Oliver,

P.S. I see that peterz-queue has been updated recently so I'm not sure
if these observations still hold but if they do ...

On 7/29/2024 11:44 AM, kernel test robot wrote:
> 
> 
> Hello,
> 
> kernel test robot noticed "WARNING:at_kernel/sched/sched.h:#update_load_avg" on:
> 
> commit: 124c8f43740f3f1e3e2aeecd976bcf5ddd7bd435 ("sched/fair: Implement delayed dequeue")
> https://git.kernel.org/cgit/linux/kernel/git/peterz/queue.git sched/prep
> 
> in testcase: boot
> 
> compiler: clang-18
> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> 
> (please refer to attached dmesg/kmsg for entire log/backtrace)
> 
> 
> we noticed the issue does not always happen, 60 times out of 150 runs as below.
> but the parent keeps clean.
> 
> a1133ac6dc7c6bde 124c8f43740f3f1e3e2aeecd976
> ---------------- ---------------------------
>         fail:runs  %reproduction    fail:runs
>             |             |             |
>             :150         40%          60:150   dmesg.WARNING:at_kernel/sched/sched.h:#update_load_avg
> 
> 
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202407291328.761441a7-oliver.sang@intel.com
> 
> 
> [   13.522527][    C0] ------------[ cut here ]------------
> [   13.523046][    C0] rq->clock_update_flags < RQCF_ACT_SKIP
> [ 13.523055][ C0] WARNING: CPU: 0 PID: 158 at kernel/sched/sched.h:1550 update_load_avg (kernel/sched/sched.h:1550 kernel/sched/pelt.h:67 kernel/sched/pelt.h:174 kernel/sched/fair.c:4747)
> [   13.524265][    C0] Modules linked in: drm fuse ip_tables
> [   13.524722][    C0] CPU: 0 UID: 0 PID: 158 Comm: openipmi Not tainted 6.10.0-12745-g124c8f43740f #1
> [   13.525464][    C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [ 13.526307][ C0] RIP: 0010:update_load_avg (kernel/sched/sched.h:1550 kernel/sched/pelt.h:67 kernel/sched/pelt.h:174 kernel/sched/fair.c:4747)
> [ 13.526793][ C0] Code: 00 4c 2b bb 88 01 00 00 40 f6 c5 02 0f 84 b7 fa ff ff e9 ca fa ff ff c6 05 b9 1b e4 01 01 48 c7 c7 34 74 d6 a6 e8 a1 c7 fa ff <0f> 0b e9 78 fa ff ff 65 8b 0d 8f 83 91 5a 89 c9 48 0f a3 0d e9 7b
> All code
> ========
>     0:	00 4c 2b bb          	add    %cl,-0x45(%rbx,%rbp,1)
>     4:	88 01                	mov    %al,(%rcx)
>     6:	00 00                	add    %al,(%rax)
>     8:	40 f6 c5 02          	test   $0x2,%bpl
>     c:	0f 84 b7 fa ff ff    	je     0xfffffffffffffac9
>    12:	e9 ca fa ff ff       	jmpq   0xfffffffffffffae1
>    17:	c6 05 b9 1b e4 01 01 	movb   $0x1,0x1e41bb9(%rip)        # 0x1e41bd7
>    1e:	48 c7 c7 34 74 d6 a6 	mov    $0xffffffffa6d67434,%rdi
>    25:	e8 a1 c7 fa ff       	callq  0xfffffffffffac7cb
>    2a:*	0f 0b                	ud2    		<-- trapping instruction
>    2c:	e9 78 fa ff ff       	jmpq   0xfffffffffffffaa9
>    31:	65 8b 0d 8f 83 91 5a 	mov    %gs:0x5a91838f(%rip),%ecx        # 0x5a9183c7
>    38:	89 c9                	mov    %ecx,%ecx
>    3a:	48                   	rex.W
>    3b:	0f                   	.byte 0xf
>    3c:	a3                   	.byte 0xa3
>    3d:	0d                   	.byte 0xd
>    3e:	e9                   	.byte 0xe9
>    3f:	7b                   	.byte 0x7b
> 
> Code starting with the faulting instruction
> ===========================================
>     0:	0f 0b                	ud2
>     2:	e9 78 fa ff ff       	jmpq   0xfffffffffffffa7f
>     7:	65 8b 0d 8f 83 91 5a 	mov    %gs:0x5a91838f(%rip),%ecx        # 0x5a91839d
>     e:	89 c9                	mov    %ecx,%ecx
>    10:	48                   	rex.W
>    11:	0f                   	.byte 0xf
>    12:	a3                   	.byte 0xa3
>    13:	0d                   	.byte 0xd
>    14:	e9                   	.byte 0xe9
>    15:	7b                   	.byte 0x7b
> [   13.528234][    C0] RSP: 0000:ffffbaef40003d90 EFLAGS: 00010046
> [   13.528723][    C0] RAX: b53834d2850b7b00 RBX: ffff9ef2efd34640 RCX: 0000000000000027
> [   13.529379][    C0] RDX: 0000000000000002 RSI: 00000000ffff7fff RDI: ffff9ef2efc20b88
> [   13.530029][    C0] RBP: 0000000000000001 R08: 0000000000007fff R09: ffffffffa7053610
> [   13.530685][    C0] R10: 0000000000017ffd R11: 0000000000000004 R12: ffff9ef2efd345c0
> [   13.531344][    C0] R13: ffff9ef2efd34640 R14: ffff9eefc0e26400 R15: 0000000000000000
> [   13.532001][    C0] FS:  0000000000000000(0000) GS:ffff9ef2efc00000(0063) knlGS:00000000f7f5f180
> [   13.532720][    C0] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
> [   13.533242][    C0] CR2: 00000000577c51ac CR3: 0000000162e6c000 CR4: 00000000000406f0
> [   13.533891][    C0] Call Trace:
> [   13.534215][    C0]  <IRQ>
> [ 13.534504][ C0] ? __warn (kernel/panic.c:240 kernel/panic.c:735)
> [ 13.534863][ C0] ? update_load_avg (kernel/sched/sched.h:1550 kernel/sched/pelt.h:67 kernel/sched/pelt.h:174 kernel/sched/fair.c:4747)
> [ 13.535278][ C0] ? report_bug (lib/bug.c:? lib/bug.c:219)
> [ 13.535695][ C0] ? handle_bug (arch/x86/kernel/traps.c:239)
> [ 13.536079][ C0] ? exc_invalid_op (arch/x86/kernel/traps.c:260)
> [ 13.536498][ C0] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:621)
> [ 13.536920][ C0] ? update_load_avg (kernel/sched/sched.h:1550 kernel/sched/pelt.h:67 kernel/sched/pelt.h:174 kernel/sched/fair.c:4747)
> [ 13.537334][ C0] dequeue_entity (kernel/sched/sched.h:817 kernel/sched/fair.c:5429)
> [ 13.537729][ C0] dequeue_entities (kernel/sched/fair.c:6972)
> [ 13.538130][ C0] unregister_fair_sched_group (kernel/sched/fair.c:379 kernel/sched/fair.c:13159)
> [ 13.554647][ C0] ? __pfx_sched_unregister_group_rcu (kernel/sched/core.c:8759)
> [ 13.555167][ C0] sched_unregister_group_rcu (kernel/sched/core.c:8707 kernel/sched/core.c:8761)
> [ 13.555621][ C0] rcu_do_batch (arch/x86/include/asm/preempt.h:26 kernel/rcu/tree.c:2576)
> [ 13.556008][ C0] rcu_core (kernel/rcu/tree.c:2845)
> [ 13.556373][ C0] handle_softirqs (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 include/trace/events/irq.h:142 kernel/softirq.c:555)
> [ 13.556775][ C0] __irq_exit_rcu (kernel/softirq.c:617 kernel/softirq.c:639)
> [ 13.557166][ C0] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1043 arch/x86/kernel/apic/apic.c:1043)

Looking at the call stack, any chance it is the following case that is
tripping this warning:

	CPU0						CPU1
	====						====
wakeup_preempt(rq) {
   resched_curr(rq);  ------------------------>	sysvec_reschedule_ipi() {
   ...						  scheduler_ipi();
   if (task_on_rq_queued(rq->curr) &&		} /* sysvec_reschedule_ipi */
       test_tsk_need_resched(rq->curr))		handle_softirqs() {
      rq_clock_skip_update(rq);			  /* RCU_SOFTIRQ */
} /* wakeup_preempt(); */			  rcu_core_si() {
						    unregister_fair_sched_group() {
						      /* On the rq that got the resched IPI */
						      if (se->sched_delayed) {
						        dequeue_entities() {
							  /* Trips RQCF_ACT_SKIP check within since waker called rq_clock_skip_update() */
							} /* dequeue_entities(); */
						      } /* if() */
						    } /* unregister_fair_sched_group */
						  } /* rcu_core_si */
						} /* handle_softirqs */
--

commit a64692a3afd8 ("sched: Cleanup/optimize clock updates") seems to
have added this condition to skip clock updates after an enqueue since
wakeup target will do a schedule() soon. Since delayed dequeue may has
entity to dequeue now, looking at rq_clock_pelt() in dequeue path will
trip this since PELT believes that the clock is stale. Could something
like the below work?

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cd4a6bf14828..c437b408d29b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -13297,10 +13297,34 @@ void unregister_fair_sched_group(struct task_group *tg)
  			if (se->sched_delayed) {
  				guard(rq_lock_irqsave)(rq);
  				if (se->sched_delayed) {
+					/*
+					 * We can reach here when processing RCU_SOFTIRQ on exit path from
+					 * a reschedule IPI. wakeup_preempt() may have set RQCF_REQ_SKIP to
+					 * skip a close clock update in schedule(), however, in presence of
+					 * a delayed entity, this trips the check in rq_clock_pelt() which
+					 * now believes the clock value is stale and needs updating. To
+					 * prevent such situation, cancel any pending skip updates, and
+					 * update the rq clock.
+					 */
+					rq_clock_cancel_skipupdate(rq);
+
+					/*
+					 * XXX: Will this trip WARN_DOUBLE_CLOCK? In which case, can
+					 * rq_clock_cancel_skipupdate() be made to return a bool if
+					 * RQCF_REQ_SKIP is set and we avoid this update?
+					 */
  					update_rq_clock(rq);
+
  					dequeue_entities(rq, se, DEQUEUE_SLEEP | DEQUEUE_DELAYED);
+
+					/* Avoid updating the clock again if a schedule() is pending */
+					if (task_on_rq_queued(rq->curr) &&
+					    test_tsk_need_resched(rq->curr))
+						rq_clock_skip_update(rq);
  				}
  				list_del_leaf_cfs_rq(cfs_rq);
+
+
  			}
  			remove_entity_load_avg(se);
  		}
--

I was not able to hit the SCHED_WARN_ON() reliably on my end when
launching a guest to confirm my hypothesis - I was able to hit it
approx. 1 in 10 run, but I've ran with the above changes 10s of
times and I've not hit the splat (yet!)

Note: Although the diff was updated for the latest
peterz-queue:sched/eevdf, all my testing was done on the previous
version on this branch based on v6.10.0 kernel.
--
Thanks and Regards,
Prateek

> [   13.557638][    C0]  </IRQ>
> [   13.557920][    C0]  <TASK>
> [ 13.560463][ C0] asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:702)
> [ 13.560957][ C0] RIP: 0010:irqentry_exit_to_user_mode (kernel/entry/common.c:? include/linux/entry-common.h:328 kernel/entry/common.c:231)
> [ 13.561465][ C0] Code: 90 41 57 41 56 53 48 89 fb e8 63 90 1f ff 65 4c 8b 34 25 40 37 03 00 49 8b 3e f7 c7 0e 30 02 00 74 7c 49 89 ff e8 97 8f 1f ff <4c> 89 f8 a8 08 75 38 a9 00 10 00 00 75 40 a9 00 20 00 00 75 4b a9
> All code
> ========
>     0:	90                   	nop
>     1:	41 57                	push   %r15
>     3:	41 56                	push   %r14
>     5:	53                   	push   %rbx
>     6:	48 89 fb             	mov    %rdi,%rbx
>     9:	e8 63 90 1f ff       	callq  0xffffffffff1f9071
>     e:	65 4c 8b 34 25 40 37 	mov    %gs:0x33740,%r14
>    15:	03 00
>    17:	49 8b 3e             	mov    (%r14),%rdi
>    1a:	f7 c7 0e 30 02 00    	test   $0x2300e,%edi
>    20:	74 7c                	je     0x9e
>    22:	49 89 ff             	mov    %rdi,%r15
>    25:	e8 97 8f 1f ff       	callq  0xffffffffff1f8fc1
>    2a:*	4c 89 f8             	mov    %r15,%rax		<-- trapping instruction
>    2d:	a8 08                	test   $0x8,%al
>    2f:	75 38                	jne    0x69
>    31:	a9 00 10 00 00       	test   $0x1000,%eax
>    36:	75 40                	jne    0x78
>    38:	a9 00 20 00 00       	test   $0x2000,%eax
>    3d:	75 4b                	jne    0x8a
>    3f:	a9                   	.byte 0xa9
> 
> Code starting with the faulting instruction
> ===========================================
>     0:	4c 89 f8             	mov    %r15,%rax
>     3:	a8 08                	test   $0x8,%al
>     5:	75 38                	jne    0x3f
>     7:	a9 00 10 00 00       	test   $0x1000,%eax
>     c:	75 40                	jne    0x4e
>     e:	a9 00 20 00 00       	test   $0x2000,%eax
>    13:	75 4b                	jne    0x60
>    15:	a9                   	.byte 0xa9
> [   13.562938][    C0] RSP: 0000:ffffbaef40613f38 EFLAGS: 00000202
> [   13.563444][    C0] RAX: ffff9ef0287dd080 RBX: ffffbaef40613f58 RCX: 000000000000080b
> [   13.564125][    C0] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000008
> [   13.564796][    C0] RBP: 0000000000000000 R08: 000000000000002b R09: 000000000000002b
> [   13.565456][    C0] R10: 0000000000000296 R11: 00000000ff8772ac R12: 0000000000000000
> [   13.566115][    C0] R13: 0000000000000000 R14: ffff9ef0287dd080 R15: 0000000020000008
> [ 13.566782][ C0] ? irqentry_exit_to_user_mode (kernel/entry/common.c:? include/linux/entry-common.h:328 kernel/entry/common.c:231)
> [ 13.567254][ C0] asm_sysvec_reschedule_ipi (arch/x86/include/asm/idtentry.h:707)
> [   13.567709][    C0] RIP: 0023:0x56652630
> [ 13.568072][ C0] Code: ff ff 83 c4 10 85 c0 74 05 83 c4 08 5b c3 83 ec 0c 8d 83 8d 6b ff ff 50 e8 cd 7c ff ff 8d b4 26 00 00 00 00 8d b6 00 00 00 00 <55> 57 56 53 e8 e7 61 ff ff 81 c3 13 38 01 00 83 ec 0c 8b 44 24 20
> All code
> ========
>     0:	ff                   	(bad)
>     1:	ff 83 c4 10 85 c0    	incl   -0x3f7aef3c(%rbx)
>     7:	74 05                	je     0xe
>     9:	83 c4 08             	add    $0x8,%esp
>     c:	5b                   	pop    %rbx
>     d:	c3                   	retq
>     e:	83 ec 0c             	sub    $0xc,%esp
>    11:	8d 83 8d 6b ff ff    	lea    -0x9473(%rbx),%eax
>    17:	50                   	push   %rax
>    18:	e8 cd 7c ff ff       	callq  0xffffffffffff7cea
>    1d:	8d b4 26 00 00 00 00 	lea    0x0(%rsi,%riz,1),%esi
>    24:	8d b6 00 00 00 00    	lea    0x0(%rsi),%esi
>    2a:*	55                   	push   %rbp		<-- trapping instruction
>    2b:	57                   	push   %rdi
>    2c:	56                   	push   %rsi
>    2d:	53                   	push   %rbx
>    2e:	e8 e7 61 ff ff       	callq  0xffffffffffff621a
>    33:	81 c3 13 38 01 00    	add    $0x13813,%ebx
>    39:	83 ec 0c             	sub    $0xc,%esp
>    3c:	8b 44 24 20          	mov    0x20(%rsp),%eax
> 
> Code starting with the faulting instruction
> ===========================================
>     0:	55                   	push   %rbp
>     1:	57                   	push   %rdi
>     2:	56                   	push   %rsi
>     3:	53                   	push   %rbx
>     4:	e8 e7 61 ff ff       	callq  0xffffffffffff61f0
>     9:	81 c3 13 38 01 00    	add    $0x13813,%ebx
>     f:	83 ec 0c             	sub    $0xc,%esp
>    12:	8b 44 24 20          	mov    0x20(%rsp),%eax
> 
> 
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20240729/202407291328.761441a7-oliver.sang@intel.com
> 
> 
> 

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [peterz-queue:sched/prep] [sched/fair] 124c8f4374: WARNING:at_kernel/sched/sched.h:#update_load_avg
  2024-07-31  6:16 ` K Prateek Nayak
@ 2024-07-31 11:54   ` Peter Zijlstra
  2024-08-01  3:40     ` K Prateek Nayak
  0 siblings, 1 reply; 4+ messages in thread
From: Peter Zijlstra @ 2024-07-31 11:54 UTC (permalink / raw)
  To: K Prateek Nayak
  Cc: oe-lkp, lkp, linux-kernel, aubrey.li, yu.c.chen,
	kernel test robot

On Wed, Jul 31, 2024 at 11:46:48AM +0530, K Prateek Nayak wrote:
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index cd4a6bf14828..c437b408d29b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -13297,10 +13297,34 @@ void unregister_fair_sched_group(struct task_group *tg)
>  			if (se->sched_delayed) {
>  				guard(rq_lock_irqsave)(rq);
>  				if (se->sched_delayed) {
> +					/*
> +					 * We can reach here when processing RCU_SOFTIRQ on exit path from
> +					 * a reschedule IPI. wakeup_preempt() may have set RQCF_REQ_SKIP to
> +					 * skip a close clock update in schedule(), however, in presence of
> +					 * a delayed entity, this trips the check in rq_clock_pelt() which
> +					 * now believes the clock value is stale and needs updating. To
> +					 * prevent such situation, cancel any pending skip updates, and
> +					 * update the rq clock.
> +					 */
> +					rq_clock_cancel_skipupdate(rq);
> +
> +					/*
> +					 * XXX: Will this trip WARN_DOUBLE_CLOCK? In which case, can
> +					 * rq_clock_cancel_skipupdate() be made to return a bool if
> +					 * RQCF_REQ_SKIP is set and we avoid this update?
> +					 */
>  					update_rq_clock(rq);
> +
>  					dequeue_entities(rq, se, DEQUEUE_SLEEP | DEQUEUE_DELAYED);
> +
> +					/* Avoid updating the clock again if a schedule() is pending */
> +					if (task_on_rq_queued(rq->curr) &&
> +					    test_tsk_need_resched(rq->curr))
> +						rq_clock_skip_update(rq);
>  				}
>  				list_del_leaf_cfs_rq(cfs_rq);
> +
> +
>  			}
>  			remove_entity_load_avg(se);
>  		}

So I did update this to simply add update_rq_clock() before the
dequeue_entity(SLEEP|DELAYED). I initially had, these, then confused
myself between deactivate_task() and dequeue_entity(), where the former
updates the clock but the latter does not, and removed them. Then Mike
complained, and I restored it for the regular exit path and forgot the
cgroup exit path.

But now they should both be doing update_rq_clock() here.

  https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/eevdf&id=5b3a132d4dd5c91f26beb3e8973c03cdb77d7873

Since this is all with our own rq->lock held, I don't think skip would
be relevant here.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [peterz-queue:sched/prep] [sched/fair] 124c8f4374: WARNING:at_kernel/sched/sched.h:#update_load_avg
  2024-07-31 11:54   ` Peter Zijlstra
@ 2024-08-01  3:40     ` K Prateek Nayak
  0 siblings, 0 replies; 4+ messages in thread
From: K Prateek Nayak @ 2024-08-01  3:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: oe-lkp, lkp, linux-kernel, aubrey.li, yu.c.chen,
	kernel test robot

Hello Peter,

On 7/31/2024 5:24 PM, Peter Zijlstra wrote:
> On Wed, Jul 31, 2024 at 11:46:48AM +0530, K Prateek Nayak wrote:
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index cd4a6bf14828..c437b408d29b 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -13297,10 +13297,34 @@ void unregister_fair_sched_group(struct task_group *tg)
>>   			if (se->sched_delayed) {
>>   				guard(rq_lock_irqsave)(rq);
>>   				if (se->sched_delayed) {
>> +					/*
>> +					 * We can reach here when processing RCU_SOFTIRQ on exit path from
>> +					 * a reschedule IPI. wakeup_preempt() may have set RQCF_REQ_SKIP to
>> +					 * skip a close clock update in schedule(), however, in presence of
>> +					 * a delayed entity, this trips the check in rq_clock_pelt() which
>> +					 * now believes the clock value is stale and needs updating. To
>> +					 * prevent such situation, cancel any pending skip updates, and
>> +					 * update the rq clock.
>> +					 */
>> +					rq_clock_cancel_skipupdate(rq);
>> +
>> +					/*
>> +					 * XXX: Will this trip WARN_DOUBLE_CLOCK? In which case, can
>> +					 * rq_clock_cancel_skipupdate() be made to return a bool if
>> +					 * RQCF_REQ_SKIP is set and we avoid this update?
>> +					 */
>>   					update_rq_clock(rq);
>> +
>>   					dequeue_entities(rq, se, DEQUEUE_SLEEP | DEQUEUE_DELAYED);
>> +
>> +					/* Avoid updating the clock again if a schedule() is pending */
>> +					if (task_on_rq_queued(rq->curr) &&
>> +					    test_tsk_need_resched(rq->curr))
>> +						rq_clock_skip_update(rq);
>>   				}
>>   				list_del_leaf_cfs_rq(cfs_rq);
>> +
>> +
>>   			}
>>   			remove_entity_load_avg(se);
>>   		}
> 
> So I did update this to simply add update_rq_clock() before the
> dequeue_entity(SLEEP|DELAYED). I initially had, these, then confused
> myself between deactivate_task() and dequeue_entity(), where the former
> updates the clock but the latter does not, and removed them. Then Mike
> complained, and I restored it for the regular exit path and forgot the
> cgroup exit path.
> 
> But now they should both be doing update_rq_clock() here.
> 
>    https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/eevdf&id=5b3a132d4dd5c91f26beb3e8973c03cdb77d7873
> 
> Since this is all with our own rq->lock held, I don't think skip would
> be relevant here.

Thank you for clarifying the nuances. I'll continue testing with the
latest branch.
-- 
Thanks and Regards,
Prateek

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2024-08-01  3:42 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-29  6:14 [peterz-queue:sched/prep] [sched/fair] 124c8f4374: WARNING:at_kernel/sched/sched.h:#update_load_avg kernel test robot
2024-07-31  6:16 ` K Prateek Nayak
2024-07-31 11:54   ` Peter Zijlstra
2024-08-01  3:40     ` K Prateek Nayak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox