[PATCH 0/1] sched/fair: Update blocked averages on tick

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/1] sched/fair: Update blocked averages on tick
@ 2024-10-11 12:32 Pierre Gondois
  2024-10-11 12:32 ` [PATCH 1/1] " Pierre Gondois
  2024-10-11 12:45 ` [PATCH 0/1] " Pierre Gondois
  0 siblings, 2 replies; 7+ messages in thread
From: Pierre Gondois @ 2024-10-11 12:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: Hongyan Xia, Chritian Loehle, Pierre Gondois, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider

The patch below was tested on a Pixel6 based on v6.8. The patch was
then ported to v6.12-rc2. The energy consumption of the Pixel6 is:
- measured using the phone's energy counters
- estimated using util signals, cluster frequency and the Energy Model

An rt-app workload runs during 20s, 5 iterations. 'count' tasks are created
with a utilization of 'util'%. Task's period is 16ms.

Measured energy:
+-------------+------+-------+-----------+------------+------------+
| granularity | util | count | base      | with patch | ratio      |
+-------------+------+-------+-----------+------------+------------+
| 0           | 5    | 8     |  5703.759 |  5624.371  | - 1.391    |
| 0           | 5    | 16    | 15021.425 | 14540.839  | - 3.199    |
| 0           | 5    | 24    | 30018.226 | 27046.899  | - 9.898    |
| 0           | 10   | 8     | 12843.869 | 12346.898  | - 3.869    |
| 0           | 10   | 16    | 45925.643 | 43081.072  | - 6.193    |
| 2           | 5    | 8     |  6288.044 |  6174.609  | - 1.803    |
| 2           | 5    | 16    | 15657.632 | 15037.596  | - 3.959    |
| 2           | 5    | 24    | 31483.032 | 27205.665  | -13.58     |
| 2           | 10   | 8     | 14003.121 | 12675.037  | - 9.484    |
| 2           | 10   | 16    | 45005.382 | 42088.515  | - 6.481    |
+-------------+------+-------+-----------+------------+------------+

Computed energy:
+-------------+------+-------+-----------+------------+---------+
| granularity | util | count | base      | with patch | ratio   |
+-------------+------+-------+-----------+------------+---------+
| 0           | 5    | 8     | 2.5764e10 | 2.4581e10  | - 4.591 |
| 0           | 5    | 16    | 7.0705e10 | 6.2580e10  | -11.490 |
| 0           | 5    | 24    | 1.5034e11 | 1.1889e11  | -20.916 |
| 0           | 10   | 8     | 4.7312e10 | 4.1573e10  | -12.131 |
| 0           | 10   | 16    | 1.7860e11 | 1.4748e11  | -17.423 |
| 2           | 5    | 8     | 3.0811e10 | 2.8950e10  | - 6.040 |
| 2           | 5    | 16    | 8.3871e10 | 7.1320e10  | -14.963 |
| 2           | 5    | 24    | 1.3414e11 | 1.2771e11  | - 4.788 |
| 2           | 10   | 8     | 5.8014e10 | 4.5861e10  | -20.949 |
| 2           | 10   | 16    | 1.8283e11 | 1.6708e11  | - 8.613 |
+-------------+------+-------+-----------+------------+---------+

Pierre Gondois (1):
  sched/fair: Update blocked averages on tick

 kernel/sched/fair.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/1] sched/fair: Update blocked averages on tick
  2024-10-11 12:32 [PATCH 0/1] sched/fair: Update blocked averages on tick Pierre Gondois
@ 2024-10-11 12:32 ` Pierre Gondois
  2024-10-14  5:26   ` kernel test robot
  2024-10-15 12:44   ` Vincent Guittot
  2024-10-11 12:45 ` [PATCH 0/1] " Pierre Gondois
  1 sibling, 2 replies; 7+ messages in thread
From: Pierre Gondois @ 2024-10-11 12:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: Hongyan Xia, Chritian Loehle, Pierre Gondois, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider

The Energy Aware Scheduler (EAS) relies on CPU/tasks utilization
signals. On an idle CPU, the blocked load is updated during
load balancing.

sd->balance_interval increases with the number of CPUs in the domain.
On an Arm DynamIQ system, sched domains containing CPUs with the same
capacity do not exist. On a Pixel6 with 4 little, 2 mid, 2 big CPUs:
- sd->min_interval = 8
- sd->min_interval = 16

The balance interval is doubled if the system is balanced, meaning
that a balanced system will likely update blocked load every 16ms.

The find_energy_efficient_cpu() function might thus relies on outdated
util signals to place tasks, leading to bad energy placement.

Update blocked load on sched tick if:
- the rq is idle
- the load balancer will not be triggered.

Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
---
 kernel/sched/fair.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 225b31aaee55..2f03bd10ac7a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9841,15 +9841,12 @@ static unsigned long task_h_load(struct task_struct *p)
 }
 #endif
 
-static void sched_balance_update_blocked_averages(int cpu)
+static void update_blocked_averages(struct rq *rq)
 {
 	bool decayed = false, done = true;
-	struct rq *rq = cpu_rq(cpu);
-	struct rq_flags rf;
 
-	rq_lock_irqsave(rq, &rf);
-	update_blocked_load_tick(rq);
 	update_rq_clock(rq);
+	update_blocked_load_tick(rq);
 
 	decayed |= __update_blocked_others(rq, &done);
 	decayed |= __update_blocked_fair(rq, &done);
@@ -9857,6 +9854,18 @@ static void sched_balance_update_blocked_averages(int cpu)
 	update_blocked_load_status(rq, !done);
 	if (decayed)
 		cpufreq_update_util(rq, 0);
+}
+
+static void sched_balance_update_blocked_averages(int cpu)
+{
+	struct rq *rq = cpu_rq(cpu);
+	struct cfs_rq *cfs_rq;
+	struct rq_flags rf;
+
+	cfs_rq = &rq->cfs;
+
+	rq_lock_irqsave(rq, &rf);
+	update_blocked_averages(rq);
 	rq_unlock_irqrestore(rq, &rf);
 }
 
@@ -12877,6 +12886,8 @@ void sched_balance_trigger(struct rq *rq)
 
 	if (time_after_eq(jiffies, rq->next_balance))
 		raise_softirq(SCHED_SOFTIRQ);
+	else if (idle_cpu(rq->cpu))
+		update_blocked_averages(rq);
 
 	nohz_balancer_kick(rq);
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/1] sched/fair: Update blocked averages on tick
  2024-10-11 12:32 [PATCH 0/1] sched/fair: Update blocked averages on tick Pierre Gondois
  2024-10-11 12:32 ` [PATCH 1/1] " Pierre Gondois
@ 2024-10-11 12:45 ` Pierre Gondois
  1 sibling, 0 replies; 7+ messages in thread
From: Pierre Gondois @ 2024-10-11 12:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Hongyan Xia, Chritian Loehle, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider



On 10/11/24 14:32, Pierre Gondois wrote:
> The patch below was tested on a Pixel6 based on v6.8. The patch was
> then ported to v6.12-rc2. The energy consumption of the Pixel6 is:
> - measured using the phone's energy counters
> - estimated using util signals, cluster frequency and the Energy Model
> 
> An rt-app workload runs during 20s, 5 iterations. 'count' tasks are created
> with a utilization of 'util'%. Task's period is 16ms.
> 
> Measured energy:
> +-------------+------+-------+-----------+------------+------------+
> | granularity | util | count | base      | with patch | ratio      |
> +-------------+------+-------+-----------+------------+------------+
> | 0           | 5    | 8     |  5703.759 |  5624.371  | - 1.391    |
> | 0           | 5    | 16    | 15021.425 | 14540.839  | - 3.199    |
> | 0           | 5    | 24    | 30018.226 | 27046.899  | - 9.898    |
> | 0           | 10   | 8     | 12843.869 | 12346.898  | - 3.869    |
> | 0           | 10   | 16    | 45925.643 | 43081.072  | - 6.193    |

I forgot to trimm a configuration off the results. Please ignore the
lines with a 'granularity=2'.

8>
> | 2           | 5    | 8     |  6288.044 |  6174.609  | - 1.803    |
> | 2           | 5    | 16    | 15657.632 | 15037.596  | - 3.959    |
> | 2           | 5    | 24    | 31483.032 | 27205.665  | -13.58     |
> | 2           | 10   | 8     | 14003.121 | 12675.037  | - 9.484    |
> | 2           | 10   | 16    | 45005.382 | 42088.515  | - 6.481    |
<8
> +-------------+------+-------+-----------+------------+------------+
> 
> Computed energy:
> +-------------+------+-------+-----------+------------+---------+
> | granularity | util | count | base      | with patch | ratio   |
> +-------------+------+-------+-----------+------------+---------+
> | 0           | 5    | 8     | 2.5764e10 | 2.4581e10  | - 4.591 |
> | 0           | 5    | 16    | 7.0705e10 | 6.2580e10  | -11.490 |
> | 0           | 5    | 24    | 1.5034e11 | 1.1889e11  | -20.916 |
> | 0           | 10   | 8     | 4.7312e10 | 4.1573e10  | -12.131 |
> | 0           | 10   | 16    | 1.7860e11 | 1.4748e11  | -17.423 |

8>
> | 2           | 5    | 8     | 3.0811e10 | 2.8950e10  | - 6.040 |
> | 2           | 5    | 16    | 8.3871e10 | 7.1320e10  | -14.963 |
> | 2           | 5    | 24    | 1.3414e11 | 1.2771e11  | - 4.788 |
> | 2           | 10   | 8     | 5.8014e10 | 4.5861e10  | -20.949 |
> | 2           | 10   | 16    | 1.8283e11 | 1.6708e11  | - 8.613 |
<8
> +-------------+------+-------+-----------+------------+---------+
> 
> Pierre Gondois (1):
>    sched/fair: Update blocked averages on tick
> 
>   kernel/sched/fair.c | 21 ++++++++++++++++-----
>   1 file changed, 16 insertions(+), 5 deletions(-)
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] sched/fair: Update blocked averages on tick
  2024-10-11 12:32 ` [PATCH 1/1] " Pierre Gondois
@ 2024-10-14  5:26   ` kernel test robot
  2024-10-15 12:44   ` Vincent Guittot
  1 sibling, 0 replies; 7+ messages in thread
From: kernel test robot @ 2024-10-14  5:26 UTC (permalink / raw)
  To: Pierre Gondois
  Cc: oe-lkp, lkp, linux-kernel, aubrey.li, yu.c.chen, Hongyan Xia,
	Chritian Loehle, Pierre Gondois, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider, oliver.sang



Hello,

kernel test robot noticed "WARNING:at_kernel/sched/sched.h:#update_rq_clock" on:

commit: 8250cf6004de715b0abbbd5bf837d5ac869f1641 ("[PATCH 1/1] sched/fair: Update blocked averages on tick")
url: https://github.com/intel-lab-lkp/linux/commits/Pierre-Gondois/sched-fair-Update-blocked-averages-on-tick/20241011-203402
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 7266f0a6d3bb73f42ea06656d3cc48c7d0386f71
patch link: https://lore.kernel.org/all/20241011123222.1282936-2-pierre.gondois@arm.com/
patch subject: [PATCH 1/1] sched/fair: Update blocked averages on tick

in testcase: boot

config: x86_64-randconfig-006-20241013
compiler: clang-18
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)


+-------------------------------------------------------------+------------+------------+
|                                                             | 7266f0a6d3 | 8250cf6004 |
+-------------------------------------------------------------+------------+------------+
| boot_successes                                              | 18         | 0          |
| boot_failures                                               | 0          | 21         |
| WARNING:at_kernel/sched/sched.h:#update_rq_clock            | 0          | 21         |
| RIP:update_rq_clock                                         | 0          | 21         |
| RIP:default_idle                                            | 0          | 21         |
| WARNING:at_kernel/sched/sched.h:#_update_idle_rq_clock_pelt | 0          | 21         |
| RIP:_update_idle_rq_clock_pelt                              | 0          | 21         |
| WARNING:at_kernel/sched/sched.h:#update_other_load_avgs     | 0          | 21         |
| RIP:update_other_load_avgs                                  | 0          | 21         |
| WARNING:at_kernel/sched/sched.h:#update_blocked_averages    | 0          | 21         |
| RIP:update_blocked_averages                                 | 0          | 21         |
+-------------------------------------------------------------+------------+------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202410141238.aa194a45-lkp@intel.com


[    2.078083][    C1] ------------[ cut here ]------------
[ 2.079049][ C1] WARNING: CPU: 1 PID: 0 at kernel/sched/sched.h:1496 update_rq_clock (kernel/sched/sched.h:1496) 
[    2.080428][    C1] Modules linked in:
[    2.081070][    C1] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.12.0-rc2-00014-g8250cf6004de #1 dfe5ae60b9ab8d21e49756916bb824f564f72e7b
[    2.081391][    C1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 2.081391][ C1] RIP: 0010:update_rq_clock (kernel/sched/sched.h:1496) 
[ 2.081391][ C1] Code: 00 00 48 89 d8 48 c1 e8 03 42 80 3c 28 00 74 08 48 89 df e8 c7 38 41 00 4c 01 3b 48 83 c4 18 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b e9 52 fd ff ff 48 89 df e8 fe 38 01 00 eb e0 48 c7 c1 3c 21
All code
========
   0:	00 00                	add    %al,(%rax)
   2:	48 89 d8             	mov    %rbx,%rax
   5:	48 c1 e8 03          	shr    $0x3,%rax
   9:	42 80 3c 28 00       	cmpb   $0x0,(%rax,%r13,1)
   e:	74 08                	je     0x18
  10:	48 89 df             	mov    %rbx,%rdi
  13:	e8 c7 38 41 00       	call   0x4138df
  18:	4c 01 3b             	add    %r15,(%rbx)
  1b:	48 83 c4 18          	add    $0x18,%rsp
  1f:	5b                   	pop    %rbx
  20:	41 5c                	pop    %r12
  22:	41 5d                	pop    %r13
  24:	41 5e                	pop    %r14
  26:	41 5f                	pop    %r15
  28:	5d                   	pop    %rbp
  29:	c3                   	ret
  2a:*	0f 0b                	ud2		<-- trapping instruction
  2c:	e9 52 fd ff ff       	jmp    0xfffffffffffffd83
  31:	48 89 df             	mov    %rbx,%rdi
  34:	e8 fe 38 01 00       	call   0x13937
  39:	eb e0                	jmp    0x1b
  3b:	48                   	rex.W
  3c:	c7                   	.byte 0xc7
  3d:	c1                   	.byte 0xc1
  3e:	3c 21                	cmp    $0x21,%al

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2
   2:	e9 52 fd ff ff       	jmp    0xfffffffffffffd59
   7:	48 89 df             	mov    %rbx,%rdi
   a:	e8 fe 38 01 00       	call   0x1390d
   f:	eb e0                	jmp    0xfffffffffffffff1
  11:	48                   	rex.W
  12:	c7                   	.byte 0xc7
  13:	c1                   	.byte 0xc1
  14:	3c 21                	cmp    $0x21,%al
[    2.081391][    C1] RSP: 0000:ffffc900001e8df0 EFLAGS: 00010046
[    2.081391][    C1] RAX: 0000000000000000 RBX: ffff8883aeb396c0 RCX: dffffc0000000000
[    2.081391][    C1] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: ffff8883aeb396d8
[    2.081391][    C1] RBP: ffffc900001e8e30 R08: ffffffffb6af076f R09: 1ffffffff6d5e0ed
[    2.081391][    C1] R10: dffffc0000000000 R11: fffffbfff6d5e0ee R12: dffffc0000000000
[    2.081391][    C1] R13: dffffc0000000000 R14: ffff8883aeb3a158 R15: ffff8883aeb3a088
[    2.081391][    C1] FS:  0000000000000000(0000) GS:ffff8883aeb00000(0000) knlGS:0000000000000000
[    2.081391][    C1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.081391][    C1] CR2: 0000000000000000 CR3: 0000000182a4c000 CR4: 00000000000406f0
[    2.081391][    C1] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    2.081391][    C1] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    2.081391][    C1] Call Trace:
[    2.081391][    C1]  <IRQ>
[ 2.081391][ C1] ? show_regs (arch/x86/kernel/dumpstack.c:479) 
[ 2.081391][ C1] ? __warn (kernel/panic.c:748) 
[ 2.081391][ C1] ? update_rq_clock (kernel/sched/sched.h:1496) 
[ 2.081391][ C1] ? report_bug (lib/bug.c:?) 
[ 2.081391][ C1] ? update_rq_clock (kernel/sched/sched.h:1496) 
[ 2.081391][ C1] ? handle_bug (arch/x86/kernel/traps.c:285) 
[ 2.081391][ C1] ? exc_invalid_op (arch/x86/kernel/traps.c:309) 
[ 2.081391][ C1] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:621) 
[ 2.081391][ C1] ? update_rq_clock (kernel/sched/sched.h:1496) 
[ 2.081391][ C1] update_blocked_averages (kernel/sched/fair.c:9720 kernel/sched/fair.c:9845) 
[ 2.081391][ C1] sched_balance_trigger (kernel/sched/fair.c:12887) 
[ 2.081391][ C1] sched_tick (kernel/sched/core.c:5617) 
[ 2.081391][ C1] update_process_times (kernel/time/timer.c:2526) 
[ 2.081391][ C1] tick_periodic (kernel/time/tick-common.c:102) 
[ 2.081391][ C1] tick_handle_periodic (kernel/time/tick-common.c:120) 
[ 2.081391][ C1] __sysvec_apic_timer_interrupt (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 arch/x86/include/asm/trace/irq_vectors.h:41 arch/x86/kernel/apic/apic.c:1044) 
[ 2.081391][ C1] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1037) 
[    2.081391][    C1]  </IRQ>
[    2.081391][    C1]  <TASK>
[ 2.081391][ C1] asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:702) 
[ 2.081391][ C1] RIP: 0010:default_idle (arch/x86/include/asm/irqflags.h:37 arch/x86/include/asm/irqflags.h:92 arch/x86/kernel/process.c:743) 
[ 2.081391][ C1] Code: 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 72 ff ff ff cc cc cc cc 90 90 90 b8 0c 67 40 a5 eb 07 0f 00 2d 9f 89 47 00 55 48 89 e5 fb f4 <fa> 5d c3 66 0f 1f 44 00 00 90 90 90 b8 0c 67 40 a5 55 48 89 e5 e8
All code
========
   0:	00 4d 29             	add    %cl,0x29(%rbp)
   3:	c8 4c 01 c7          	enter  $0x14c,$0xc7
   7:	4c 29 c2             	sub    %r8,%rdx
   a:	e9 72 ff ff ff       	jmp    0xffffffffffffff81
   f:	cc                   	int3
  10:	cc                   	int3
  11:	cc                   	int3
  12:	cc                   	int3
  13:	90                   	nop
  14:	90                   	nop
  15:	90                   	nop
  16:	b8 0c 67 40 a5       	mov    $0xa540670c,%eax
  1b:	eb 07                	jmp    0x24
  1d:	0f 00 2d 9f 89 47 00 	verw   0x47899f(%rip)        # 0x4789c3
  24:	55                   	push   %rbp
  25:	48 89 e5             	mov    %rsp,%rbp
  28:	fb                   	sti
  29:	f4                   	hlt
  2a:*	fa                   	cli		<-- trapping instruction
  2b:	5d                   	pop    %rbp
  2c:	c3                   	ret
  2d:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
  33:	90                   	nop
  34:	90                   	nop
  35:	90                   	nop
  36:	b8 0c 67 40 a5       	mov    $0xa540670c,%eax
  3b:	55                   	push   %rbp
  3c:	48 89 e5             	mov    %rsp,%rbp
  3f:	e8                   	.byte 0xe8

Code starting with the faulting instruction
===========================================
   0:	fa                   	cli
   1:	5d                   	pop    %rbp
   2:	c3                   	ret
   3:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
   9:	90                   	nop
   a:	90                   	nop
   b:	90                   	nop
   c:	b8 0c 67 40 a5       	mov    $0xa540670c,%eax
  11:	55                   	push   %rbp
  12:	48 89 e5             	mov    %rsp,%rbp
  15:	e8                   	.byte 0xe8


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20241014/202410141238.aa194a45-lkp@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] sched/fair: Update blocked averages on tick
  2024-10-11 12:32 ` [PATCH 1/1] " Pierre Gondois
  2024-10-14  5:26   ` kernel test robot
@ 2024-10-15 12:44   ` Vincent Guittot
  2024-10-21  9:47     ` Pierre Gondois
  1 sibling, 1 reply; 7+ messages in thread
From: Vincent Guittot @ 2024-10-15 12:44 UTC (permalink / raw)
  To: Pierre Gondois
  Cc: linux-kernel, Hongyan Xia, Chritian Loehle, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider

On Fri, 11 Oct 2024 at 14:32, Pierre Gondois <pierre.gondois@arm.com> wrote:
>
> The Energy Aware Scheduler (EAS) relies on CPU/tasks utilization
> signals. On an idle CPU, the blocked load is updated during
> load balancing.
>
> sd->balance_interval increases with the number of CPUs in the domain.
> On an Arm DynamIQ system, sched domains containing CPUs with the same
> capacity do not exist. On a Pixel6 with 4 little, 2 mid, 2 big CPUs:
> - sd->min_interval = 8
> - sd->min_interval = 16
>
> The balance interval is doubled if the system is balanced, meaning
> that a balanced system will likely update blocked load every 16ms.

The real max boundary is LOAD_AVG_PERIOD that is used to update
nohz.next_blocked. This is the max between 2 updates of blocked load.
The other ones are opportunistics updates when a normal load balance
is triggered.

>
> The find_energy_efficient_cpu() function might thus relies on outdated
> util signals to place tasks, leading to bad energy placement.

Moving from 8ms to 16 ms is what makes the difference for you ?

The LOAD_AVG_PERIOD mas period has been used as a default value but if
it's too long, we could consider changing the max period between 2
updates

>
> Update blocked load on sched tick if:
> - the rq is idle
> - the load balancer will not be triggered.
>
> Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
> ---
>  kernel/sched/fair.c | 21 ++++++++++++++++-----
>  1 file changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 225b31aaee55..2f03bd10ac7a 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9841,15 +9841,12 @@ static unsigned long task_h_load(struct task_struct *p)
>  }
>  #endif
>
> -static void sched_balance_update_blocked_averages(int cpu)
> +static void update_blocked_averages(struct rq *rq)
>  {
>         bool decayed = false, done = true;
> -       struct rq *rq = cpu_rq(cpu);
> -       struct rq_flags rf;
>
> -       rq_lock_irqsave(rq, &rf);
> -       update_blocked_load_tick(rq);
>         update_rq_clock(rq);
> +       update_blocked_load_tick(rq);
>
>         decayed |= __update_blocked_others(rq, &done);
>         decayed |= __update_blocked_fair(rq, &done);
> @@ -9857,6 +9854,18 @@ static void sched_balance_update_blocked_averages(int cpu)
>         update_blocked_load_status(rq, !done);
>         if (decayed)
>                 cpufreq_update_util(rq, 0);
> +}
> +
> +static void sched_balance_update_blocked_averages(int cpu)
> +{
> +       struct rq *rq = cpu_rq(cpu);
> +       struct cfs_rq *cfs_rq;
> +       struct rq_flags rf;
> +
> +       cfs_rq = &rq->cfs;
> +
> +       rq_lock_irqsave(rq, &rf);
> +       update_blocked_averages(rq);
>         rq_unlock_irqrestore(rq, &rf);
>  }
>
> @@ -12877,6 +12886,8 @@ void sched_balance_trigger(struct rq *rq)
>
>         if (time_after_eq(jiffies, rq->next_balance))
>                 raise_softirq(SCHED_SOFTIRQ);
> +       else if (idle_cpu(rq->cpu))
> +               update_blocked_averages(rq);

would be good to explain why you don't need rq lock here

There is no rate limit so we can do this every tick (possibly  1ms)
when staying in shallowest state

So it's looks better to update the period between 2 update of blocked
load instead of adding a new path

>
>         nohz_balancer_kick(rq);
>  }
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] sched/fair: Update blocked averages on tick
  2024-10-15 12:44   ` Vincent Guittot
@ 2024-10-21  9:47     ` Pierre Gondois
  2024-10-22 15:08       ` Vincent Guittot
  0 siblings, 1 reply; 7+ messages in thread
From: Pierre Gondois @ 2024-10-21  9:47 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: linux-kernel, Hongyan Xia, Chritian Loehle, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider

Hello Vincent,

On 10/15/24 14:44, Vincent Guittot wrote:
> On Fri, 11 Oct 2024 at 14:32, Pierre Gondois <pierre.gondois@arm.com> wrote:
>>
>> The Energy Aware Scheduler (EAS) relies on CPU/tasks utilization
>> signals. On an idle CPU, the blocked load is updated during
>> load balancing.
>>
>> sd->balance_interval increases with the number of CPUs in the domain.
>> On an Arm DynamIQ system, sched domains containing CPUs with the same
>> capacity do not exist. On a Pixel6 with 4 little, 2 mid, 2 big CPUs:
>> - sd->min_interval = 8
>> - sd->min_interval = 16
>>
>> The balance interval is doubled if the system is balanced, meaning
>> that a balanced system will likely update blocked load every 16ms.
> 
> The real max boundary is LOAD_AVG_PERIOD that is used to update
> nohz.next_blocked. This is the max between 2 updates of blocked load.
> The other ones are opportunistics updates when a normal load balance
> is triggered.

I wanted to mean that on an idle CPU with ticks still on, the cfs_rq load
is only updated through this path:
sched_balance_trigger() {
   if (time_after_eq(jiffies, rq->next_balance))
     raise_softirq(SCHED_SOFTIRQ);
}

...

sched_balance_softirq()
\-sched_balance_update_blocked_averages()

If the next_balance happens every 16ms, this means feec() might operate
its task placement using an (up to) 16ms old util signal. The CPU might
thus look busier than what it actually is.

> 
>>
>> The find_energy_efficient_cpu() function might thus relies on outdated
>> util signals to place tasks, leading to bad energy placement.
> 
> Moving from 8ms to 16 ms is what makes the difference for you ?

With this patch, the cfs_rq signal of an idle CPU is updated every tick,
so every 4ms.

> 
> The LOAD_AVG_PERIOD mas period has been used as a default value but if
> it's too long, we could consider changing the max period between 2
> updates
> 
>>
>> Update blocked load on sched tick if:
>> - the rq is idle
>> - the load balancer will not be triggered.
>>
>> Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
>> ---
>>   kernel/sched/fair.c | 21 ++++++++++++++++-----
>>   1 file changed, 16 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 225b31aaee55..2f03bd10ac7a 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -9841,15 +9841,12 @@ static unsigned long task_h_load(struct task_struct *p)
>>   }
>>   #endif
>>
>> -static void sched_balance_update_blocked_averages(int cpu)
>> +static void update_blocked_averages(struct rq *rq)
>>   {
>>          bool decayed = false, done = true;
>> -       struct rq *rq = cpu_rq(cpu);
>> -       struct rq_flags rf;
>>
>> -       rq_lock_irqsave(rq, &rf);
>> -       update_blocked_load_tick(rq);
>>          update_rq_clock(rq);
>> +       update_blocked_load_tick(rq);
>>
>>          decayed |= __update_blocked_others(rq, &done);
>>          decayed |= __update_blocked_fair(rq, &done);
>> @@ -9857,6 +9854,18 @@ static void sched_balance_update_blocked_averages(int cpu)
>>          update_blocked_load_status(rq, !done);
>>          if (decayed)
>>                  cpufreq_update_util(rq, 0);
>> +}
>> +
>> +static void sched_balance_update_blocked_averages(int cpu)
>> +{
>> +       struct rq *rq = cpu_rq(cpu);
>> +       struct cfs_rq *cfs_rq;
>> +       struct rq_flags rf;
>> +
>> +       cfs_rq = &rq->cfs;
>> +
>> +       rq_lock_irqsave(rq, &rf);
>> +       update_blocked_averages(rq);
>>          rq_unlock_irqrestore(rq, &rf);
>>   }
>>
>> @@ -12877,6 +12886,8 @@ void sched_balance_trigger(struct rq *rq)
>>
>>          if (time_after_eq(jiffies, rq->next_balance))
>>                  raise_softirq(SCHED_SOFTIRQ);
>> +       else if (idle_cpu(rq->cpu))
>> +               update_blocked_averages(rq);
> 
> would be good to explain why you don't need rq lock here

This is a mistake, the lock is indeed required.

> 
> There is no rate limit so we can do this every tick (possibly  1ms)
> when staying in shallowest state

I'm not sure we understood each other as this patch should no be related
to NOHZ CPUs. So please let me know if I used a wrong path as you said,
or if a rate limit would be needed.

> 
> So it's looks better to update the period between 2 update of blocked
> load instead of adding a new path
> 
>>
>>          nohz_balancer_kick(rq);
>>   }
>> --
>> 2.25.1
>>

Regards,
Pierre

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] sched/fair: Update blocked averages on tick
  2024-10-21  9:47     ` Pierre Gondois
@ 2024-10-22 15:08       ` Vincent Guittot
  0 siblings, 0 replies; 7+ messages in thread
From: Vincent Guittot @ 2024-10-22 15:08 UTC (permalink / raw)
  To: Pierre Gondois
  Cc: linux-kernel, Hongyan Xia, Chritian Loehle, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider

On Mon, 21 Oct 2024 at 11:47, Pierre Gondois <pierre.gondois@arm.com> wrote:
>
> Hello Vincent,
>
> On 10/15/24 14:44, Vincent Guittot wrote:
> > On Fri, 11 Oct 2024 at 14:32, Pierre Gondois <pierre.gondois@arm.com> wrote:
> >>
> >> The Energy Aware Scheduler (EAS) relies on CPU/tasks utilization
> >> signals. On an idle CPU, the blocked load is updated during
> >> load balancing.
> >>
> >> sd->balance_interval increases with the number of CPUs in the domain.
> >> On an Arm DynamIQ system, sched domains containing CPUs with the same
> >> capacity do not exist. On a Pixel6 with 4 little, 2 mid, 2 big CPUs:
> >> - sd->min_interval = 8
> >> - sd->min_interval = 16
> >>
> >> The balance interval is doubled if the system is balanced, meaning
> >> that a balanced system will likely update blocked load every 16ms.
> >
> > The real max boundary is LOAD_AVG_PERIOD that is used to update
> > nohz.next_blocked. This is the max between 2 updates of blocked load.
> > The other ones are opportunistics updates when a normal load balance
> > is triggered.
>
> I wanted to mean that on an idle CPU with ticks still on, the cfs_rq load
> is only updated through this path:
> sched_balance_trigger() {
>    if (time_after_eq(jiffies, rq->next_balance))
>      raise_softirq(SCHED_SOFTIRQ);
> }
>
> ...
>
> sched_balance_softirq()
> \-sched_balance_update_blocked_averages()
>
> If the next_balance happens every 16ms, this means feec() might operate
> its task placement using an (up to) 16ms old util signal. The CPU might

This is true for all idle CPUs and not only the local one with tick
still firing when idle

> thus look busier than what it actually is.

yes and it can be up to 32 ms because the real max limit between 2
updates is currently set to LOAD_AVG_PERIOD.  You can probably find a
unitary test on a board that takes advantage of some "random" update
during idle ticks but you will still have some old values on the
system and you don't have any control of their max period.

You could also reduce min_interval to a lower value but this will not
take care of other idle cpus

>
> >
> >>
> >> The find_energy_efficient_cpu() function might thus relies on outdated
> >> util signals to place tasks, leading to bad energy placement.
> >
> > Moving from 8ms to 16 ms is what makes the difference for you ?
>
> With this patch, the cfs_rq signal of an idle CPU is updated every tick,
> so every 4ms.

If the CPU is kept in the shallowest idle state then it is not
expected to stay for a long time which also means not that old
outdated value.

And what if the tick is 1ms or 10ms ?

>
> >
> > The LOAD_AVG_PERIOD mas period has been used as a default value but if
> > it's too long, we could consider changing the max period between 2
> > updates
> >
> >>
> >> Update blocked load on sched tick if:
> >> - the rq is idle
> >> - the load balancer will not be triggered.
> >>
> >> Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
> >> ---
> >>   kernel/sched/fair.c | 21 ++++++++++++++++-----
> >>   1 file changed, 16 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> index 225b31aaee55..2f03bd10ac7a 100644
> >> --- a/kernel/sched/fair.c
> >> +++ b/kernel/sched/fair.c
> >> @@ -9841,15 +9841,12 @@ static unsigned long task_h_load(struct task_struct *p)
> >>   }
> >>   #endif
> >>
> >> -static void sched_balance_update_blocked_averages(int cpu)
> >> +static void update_blocked_averages(struct rq *rq)
> >>   {
> >>          bool decayed = false, done = true;
> >> -       struct rq *rq = cpu_rq(cpu);
> >> -       struct rq_flags rf;
> >>
> >> -       rq_lock_irqsave(rq, &rf);
> >> -       update_blocked_load_tick(rq);
> >>          update_rq_clock(rq);
> >> +       update_blocked_load_tick(rq);
> >>
> >>          decayed |= __update_blocked_others(rq, &done);
> >>          decayed |= __update_blocked_fair(rq, &done);
> >> @@ -9857,6 +9854,18 @@ static void sched_balance_update_blocked_averages(int cpu)
> >>          update_blocked_load_status(rq, !done);
> >>          if (decayed)
> >>                  cpufreq_update_util(rq, 0);
> >> +}
> >> +
> >> +static void sched_balance_update_blocked_averages(int cpu)
> >> +{
> >> +       struct rq *rq = cpu_rq(cpu);
> >> +       struct cfs_rq *cfs_rq;
> >> +       struct rq_flags rf;
> >> +
> >> +       cfs_rq = &rq->cfs;
> >> +
> >> +       rq_lock_irqsave(rq, &rf);
> >> +       update_blocked_averages(rq);
> >>          rq_unlock_irqrestore(rq, &rf);
> >>   }
> >>
> >> @@ -12877,6 +12886,8 @@ void sched_balance_trigger(struct rq *rq)
> >>
> >>          if (time_after_eq(jiffies, rq->next_balance))
> >>                  raise_softirq(SCHED_SOFTIRQ);
> >> +       else if (idle_cpu(rq->cpu))
> >> +               update_blocked_averages(rq);
> >
> > would be good to explain why you don't need rq lock here
>
> This is a mistake, the lock is indeed required.

update_blocked_averages() can take time as we go through all cgroups
and interrupts are still disabled here whereas they are not during
softirq. Some already complained that running
update_blocked_averages() can impact the system latency that is why we
moved the update out of schedler path for the newly idle case for
example.

IMO, you should keep the update in the softirq and take advantage if
this tick to update other idle CPUs which don't have idle tick.

>
> >
> > There is no rate limit so we can do this every tick (possibly  1ms)
> > when staying in shallowest state
>
> I'm not sure we understood each other as this patch should no be related
> to NOHZ CPUs. So please let me know if I used a wrong path as you said,
> or if a rate limit would be needed.

If your problem is about too old pelt value for idle CPUs because the
load balance interval and the next_block period are too high then you
should fix it for all idle CPUs with the  next_block period probably

>
> >
> > So it's looks better to update the period between 2 update of blocked
> > load instead of adding a new path
> >
> >>
> >>          nohz_balancer_kick(rq);
> >>   }
> >> --
> >> 2.25.1
> >>
>
> Regards,
> Pierre

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-10-22 15:08 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-11 12:32 [PATCH 0/1] sched/fair: Update blocked averages on tick Pierre Gondois
2024-10-11 12:32 ` [PATCH 1/1] " Pierre Gondois
2024-10-14  5:26   ` kernel test robot
2024-10-15 12:44   ` Vincent Guittot
2024-10-21  9:47     ` Pierre Gondois
2024-10-22 15:08       ` Vincent Guittot
2024-10-11 12:45 ` [PATCH 0/1] " Pierre Gondois

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox