[PATCH v2] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH v2] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
@ 2019-03-19 13:00 Phil Auld
  2019-03-21 18:01 ` Peter Zijlstra
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Phil Auld @ 2019-03-19 13:00 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ben Segall, Ingo Molnar, Peter Zijlstra, Anton Blanchard

sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup

With extremely short cfs_period_us setting on a parent task group with a large
number of children the for loop in sched_cfs_period_timer can run until the
watchdog fires. There is no guarantee that the call to hrtimer_forward_now()
will ever return 0.  The large number of children can make
do_sched_cfs_period_timer() take longer than the period.

[  217.264946] NMI watchdog: Watchdog detected hard LOCKUP on cpu 24
[  217.264948] Modules linked in: sunrpc iTCO_wdt gpio_ich iTCO_vendor_support intel_powerclamp coretemp kvm_intel
+kvm ipmi_ssif irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ipmi_si intel_cstate joydev
+ipmi_devintf pcspkr hpilo intel_uncore sg hpwdt ipmi_msghandler acpi_power_meter i7core_edac lpc_ich acpi_cpufreq
+ip_tables xfs libcrc32c sr_mod sd_mod cdrom ata_generic radeon i2c_algo_bit drm_kms_helper syscopyarea
+sysfillrect sysimgblt fb_sys_fops ttm ata_piix drm serio_raw crc32c_intel hpsa myri10ge libata dca
+scsi_transport_sas netxen_nic dm_mirror dm_region_hash dm_log dm_mod
[  217.264964] CPU: 24 PID: 46684 Comm: myapp Not tainted 5.0.0-rc7+ #25
[  217.264965] Hardware name: HP ProLiant DL580 G7, BIOS P65 08/16/2015
[  217.264965] RIP: 0010:tg_nop+0x0/0x10
[  217.264966] Code: 83 c9 08 f0 48 0f b1 0f 48 39 c2 74 0e 48 89 c2 f7 c2 00 00 20 00 75 dc 31 c0 c3 b8 01 00 00
+00 c3 66 0f 1f 84 00 00 00 00 00 <66> 66 66 66 90 31 c0 c3 0f 1f 84 00 00 00 00 00 66 66 66 66 90 8b
[  217.264967] RSP: 0000:ffff976a7f703e48 EFLAGS: 00000087
[  217.264967] RAX: ffff976a7c7c8f00 RBX: ffff976a6f4fad00 RCX: ffff976a7c7c90f0
[  217.264968] RDX: ffff976a6f4faee0 RSI: ffff976a7f961f00 RDI: ffff976a6f4fad00
[  217.264968] RBP: ffff976a7f961f00 R08: 0000000000000002 R09: ffffffad2c9b3331
[  217.264969] R10: 0000000000000000 R11: 0000000000000000 R12: ffff976a7c7c8f00
[  217.264969] R13: ffffffffb2305c00 R14: 0000000000000000 R15: ffffffffb22f8510
[  217.264970] FS:  00007f6240678740(0000) GS:ffff976a7f700000(0000) knlGS:0000000000000000
[  217.264970] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  217.264971] CR2: 00000000006dee20 CR3: 000000bf2bffc005 CR4: 00000000000206e0
[  217.264971] Call Trace:
[  217.264971]  <IRQ>
[  217.264972]  walk_tg_tree_from+0x29/0xb0
[  217.264972]  unthrottle_cfs_rq+0xe0/0x1a0
[  217.264972]  distribute_cfs_runtime+0xd3/0xf0
[  217.264973]  sched_cfs_period_timer+0xcb/0x160
[  217.264973]  ? sched_cfs_slack_timer+0xd0/0xd0
[  217.264973]  __hrtimer_run_queues+0xfb/0x270
[  217.264974]  hrtimer_interrupt+0x122/0x270
[  217.264974]  smp_apic_timer_interrupt+0x6a/0x140
[  217.264975]  apic_timer_interrupt+0xf/0x20
[  217.264975]  </IRQ>
[  217.264975] RIP: 0033:0x7f6240125fe5
[  217.264976] Code: 44 17 d0 f3 44 0f 7f 47 30 f3 44 0f 7f 44 17 c0 48 01 fa 48 83 e2 c0 48 39 d1 74 a3 66 0f 1f
+84 00 00 00 00 00 66 44 0f 7f 01 <66> 44 0f 7f 41 10 66 44 0f 7f 41 20 66 44 0f 7f 41 30 48 83 c1 40
...

To prevent this we add protection to the loop that detects when the loop has run
too many times and scales the period and quota up, proportionally, so that the timer
can complete before then next period expires.  This preserves the relative runtime
quota while preventing the hard lockup.

A warning is issued reporting this state and the new values.

v2: Math reworked/simplified by Peter Zijlstra.

Signed-off-by: Phil Auld <pauld@redhat.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Anton Blanchard <anton@ozlabs.org>
---
 kernel/sched/fair.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ea74d43924b2..4e43a40ec13c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4885,6 +4885,8 @@ static enum hrtimer_restart sched_cfs_slack_timer(struct hrtimer *timer)
 	return HRTIMER_NORESTART;
 }
 
+extern const u64 max_cfs_quota_period;
+
 static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 {
 	struct cfs_bandwidth *cfs_b =
@@ -4892,6 +4894,7 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 	unsigned long flags;
 	int overrun;
 	int idle = 0;
+	int count = 0;
 
 	raw_spin_lock_irqsave(&cfs_b->lock, flags);
 	for (;;) {
@@ -4899,6 +4902,28 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 		if (!overrun)
 			break;
 
+		if (++count > 3) {
+			u64 new, old = ktime_to_ns(cfs_b->period);
+
+			new = (old * 147) / 128; /* ~115% */
+			new = min(new, max_cfs_quota_period);
+
+			cfs_b->period = ns_to_ktime(new);
+
+			/* since max is 1s, this is limited to 1e9^2, which fits in u64 */
+			cfs_b->quota *= new;
+			cfs_b->quota /= old;
+
+			pr_warn_ratelimited(
+	"cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us %lld, cfs_quota_us = %lld)\n",
+				smp_processor_id(),
+				new/NSEC_PER_USEC,
+				cfs_b->quota/NSEC_PER_USEC);
+
+			/* reset count so we don't come right back in here */
+			count = 0;
+		}
+
 		idle = do_sched_cfs_period_timer(cfs_b, overrun, flags);
 	}
 	if (idle)
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
  2019-03-19 13:00 [PATCH v2] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup Phil Auld
@ 2019-03-21 18:01 ` Peter Zijlstra
  2019-03-21 18:32   ` Phil Auld
  2019-04-03  8:38 ` [tip:sched/core] " tip-bot for Phil Auld
  2019-04-16 15:32 ` [tip:sched/urgent] sched/fair: Limit sched_cfs_period_timer() " tip-bot for Phil Auld
  2 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2019-03-21 18:01 UTC (permalink / raw)
  To: Phil Auld; +Cc: linux-kernel, Ben Segall, Ingo Molnar, Anton Blanchard

On Tue, Mar 19, 2019 at 09:00:05AM -0400, Phil Auld wrote:
> sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
> 
> With extremely short cfs_period_us setting on a parent task group with a large
> number of children the for loop in sched_cfs_period_timer can run until the
> watchdog fires. There is no guarantee that the call to hrtimer_forward_now()
> will ever return 0.  The large number of children can make
> do_sched_cfs_period_timer() take longer than the period.

> 
> To prevent this we add protection to the loop that detects when the loop has run
> too many times and scales the period and quota up, proportionally, so that the timer
> can complete before then next period expires.  This preserves the relative runtime
> quota while preventing the hard lockup.
> 
> A warning is issued reporting this state and the new values.
> 
> v2: Math reworked/simplified by Peter Zijlstra.
> 
> Signed-off-by: Phil Auld <pauld@redhat.com>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Anton Blanchard <anton@ozlabs.org>

Thanks!

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
  2019-03-21 18:01 ` Peter Zijlstra
@ 2019-03-21 18:32   ` Phil Auld
  0 siblings, 0 replies; 11+ messages in thread
From: Phil Auld @ 2019-03-21 18:32 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, Ben Segall, Ingo Molnar, Anton Blanchard

On Thu, Mar 21, 2019 at 07:01:37PM +0100 Peter Zijlstra wrote:
> On Tue, Mar 19, 2019 at 09:00:05AM -0400, Phil Auld wrote:
> > sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
> > 
> > With extremely short cfs_period_us setting on a parent task group with a large
> > number of children the for loop in sched_cfs_period_timer can run until the
> > watchdog fires. There is no guarantee that the call to hrtimer_forward_now()
> > will ever return 0.  The large number of children can make
> > do_sched_cfs_period_timer() take longer than the period.
> 
> > 
> > To prevent this we add protection to the loop that detects when the loop has run
> > too many times and scales the period and quota up, proportionally, so that the timer
> > can complete before then next period expires.  This preserves the relative runtime
> > quota while preventing the hard lockup.
> > 
> > A warning is issued reporting this state and the new values.
> > 
> > v2: Math reworked/simplified by Peter Zijlstra.
> > 
> > Signed-off-by: Phil Auld <pauld@redhat.com>
> > Cc: Ben Segall <bsegall@google.com>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Cc: Anton Blanchard <anton@ozlabs.org>
> 
> Thanks!

Thank you for your time and help.

What do you think about Cc: stable?


Cheers,
Phil

-- 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [tip:sched/core] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
  2019-03-19 13:00 [PATCH v2] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup Phil Auld
  2019-03-21 18:01 ` Peter Zijlstra
@ 2019-04-03  8:38 ` tip-bot for Phil Auld
  2019-04-09 12:48   ` Phil Auld
  2019-04-16 15:32 ` [tip:sched/urgent] sched/fair: Limit sched_cfs_period_timer() " tip-bot for Phil Auld
  2 siblings, 1 reply; 11+ messages in thread
From: tip-bot for Phil Auld @ 2019-04-03  8:38 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, pauld, efault, tglx, peterz, anton, bsegall, torvalds,
	stable, hpa, linux-kernel

Commit-ID:  06ec5d30e8d57b820d44df6340dcb25010d6d0fa
Gitweb:     https://git.kernel.org/tip/06ec5d30e8d57b820d44df6340dcb25010d6d0fa
Author:     Phil Auld <pauld@redhat.com>
AuthorDate: Tue, 19 Mar 2019 09:00:05 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 3 Apr 2019 09:50:23 +0200

sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup

With extremely short cfs_period_us setting on a parent task group with a large
number of children the for loop in sched_cfs_period_timer can run until the
watchdog fires. There is no guarantee that the call to hrtimer_forward_now()
will ever return 0.  The large number of children can make
do_sched_cfs_period_timer() take longer than the period.

 NMI watchdog: Watchdog detected hard LOCKUP on cpu 24
 RIP: 0010:tg_nop+0x0/0x10
  <IRQ>
  walk_tg_tree_from+0x29/0xb0
  unthrottle_cfs_rq+0xe0/0x1a0
  distribute_cfs_runtime+0xd3/0xf0
  sched_cfs_period_timer+0xcb/0x160
  ? sched_cfs_slack_timer+0xd0/0xd0
  __hrtimer_run_queues+0xfb/0x270
  hrtimer_interrupt+0x122/0x270
  smp_apic_timer_interrupt+0x6a/0x140
  apic_timer_interrupt+0xf/0x20
  </IRQ>

To prevent this we add protection to the loop that detects when the loop has run
too many times and scales the period and quota up, proportionally, so that the timer
can complete before then next period expires.  This preserves the relative runtime
quota while preventing the hard lockup.

A warning is issued reporting this state and the new values.

Signed-off-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190319130005.25492-1-pauld@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 40bd1e27b1b7..d4cce633eac8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4885,6 +4885,8 @@ static enum hrtimer_restart sched_cfs_slack_timer(struct hrtimer *timer)
 	return HRTIMER_NORESTART;
 }
 
+extern const u64 max_cfs_quota_period;
+
 static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 {
 	struct cfs_bandwidth *cfs_b =
@@ -4892,6 +4894,7 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 	unsigned long flags;
 	int overrun;
 	int idle = 0;
+	int count = 0;
 
 	raw_spin_lock_irqsave(&cfs_b->lock, flags);
 	for (;;) {
@@ -4899,6 +4902,28 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 		if (!overrun)
 			break;
 
+		if (++count > 3) {
+			u64 new, old = ktime_to_ns(cfs_b->period);
+
+			new = (old * 147) / 128; /* ~115% */
+			new = min(new, max_cfs_quota_period);
+
+			cfs_b->period = ns_to_ktime(new);
+
+			/* since max is 1s, this is limited to 1e9^2, which fits in u64 */
+			cfs_b->quota *= new;
+			cfs_b->quota /= old;
+
+			pr_warn_ratelimited(
+	"cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us %lld, cfs_quota_us = %lld)\n",
+				smp_processor_id(),
+				new/NSEC_PER_USEC,
+				cfs_b->quota/NSEC_PER_USEC);
+
+			/* reset count so we don't come right back in here */
+			count = 0;
+		}
+
 		idle = do_sched_cfs_period_timer(cfs_b, overrun, flags);
 	}
 	if (idle)

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [tip:sched/core] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
  2019-04-03  8:38 ` [tip:sched/core] " tip-bot for Phil Auld
@ 2019-04-09 12:48   ` Phil Auld
  2019-04-09 13:05     ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: Phil Auld @ 2019-04-09 12:48 UTC (permalink / raw)
  To: linux-kernel, hpa, peterz, bsegall, torvalds, anton, efault,
	mingo, tglx
  Cc: linux-tip-commits

Hi Ingo, Peter,

On Wed, Apr 03, 2019 at 01:38:39AM -0700 tip-bot for Phil Auld wrote:
> Commit-ID:  06ec5d30e8d57b820d44df6340dcb25010d6d0fa
> Gitweb:     https://git.kernel.org/tip/06ec5d30e8d57b820d44df6340dcb25010d6d0fa
> Author:     Phil Auld <pauld@redhat.com>
> AuthorDate: Tue, 19 Mar 2019 09:00:05 -0400
> Committer:  Ingo Molnar <mingo@kernel.org>
> CommitDate: Wed, 3 Apr 2019 09:50:23 +0200

This commit seems to have gotten lost. It's not in tip and now the 
direct gitweb link is also showing bad commit reference. 

Did this fall victim to a reset or something?


Thanks,

Phil


> 
> sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
> 
> With extremely short cfs_period_us setting on a parent task group with a large
> number of children the for loop in sched_cfs_period_timer can run until the
> watchdog fires. There is no guarantee that the call to hrtimer_forward_now()
> will ever return 0.  The large number of children can make
> do_sched_cfs_period_timer() take longer than the period.
> 
>  NMI watchdog: Watchdog detected hard LOCKUP on cpu 24
>  RIP: 0010:tg_nop+0x0/0x10
>   <IRQ>
>   walk_tg_tree_from+0x29/0xb0
>   unthrottle_cfs_rq+0xe0/0x1a0
>   distribute_cfs_runtime+0xd3/0xf0
>   sched_cfs_period_timer+0xcb/0x160
>   ? sched_cfs_slack_timer+0xd0/0xd0
>   __hrtimer_run_queues+0xfb/0x270
>   hrtimer_interrupt+0x122/0x270
>   smp_apic_timer_interrupt+0x6a/0x140
>   apic_timer_interrupt+0xf/0x20
>   </IRQ>
> 
> To prevent this we add protection to the loop that detects when the loop has run
> too many times and scales the period and quota up, proportionally, so that the timer
> can complete before then next period expires.  This preserves the relative runtime
> quota while preventing the hard lockup.
> 
> A warning is issued reporting this state and the new values.
> 
> Signed-off-by: Phil Auld <pauld@redhat.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Anton Blanchard <anton@ozlabs.org>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Mike Galbraith <efault@gmx.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: <stable@vger.kernel.org>
> Link: https://lkml.kernel.org/r/20190319130005.25492-1-pauld@redhat.com
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> ---
>  kernel/sched/fair.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 40bd1e27b1b7..d4cce633eac8 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4885,6 +4885,8 @@ static enum hrtimer_restart sched_cfs_slack_timer(struct hrtimer *timer)
>  	return HRTIMER_NORESTART;
>  }
>  
> +extern const u64 max_cfs_quota_period;
> +
>  static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
>  {
>  	struct cfs_bandwidth *cfs_b =
> @@ -4892,6 +4894,7 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
>  	unsigned long flags;
>  	int overrun;
>  	int idle = 0;
> +	int count = 0;
>  
>  	raw_spin_lock_irqsave(&cfs_b->lock, flags);
>  	for (;;) {
> @@ -4899,6 +4902,28 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
>  		if (!overrun)
>  			break;
>  
> +		if (++count > 3) {
> +			u64 new, old = ktime_to_ns(cfs_b->period);
> +
> +			new = (old * 147) / 128; /* ~115% */
> +			new = min(new, max_cfs_quota_period);
> +
> +			cfs_b->period = ns_to_ktime(new);
> +
> +			/* since max is 1s, this is limited to 1e9^2, which fits in u64 */
> +			cfs_b->quota *= new;
> +			cfs_b->quota /= old;
> +
> +			pr_warn_ratelimited(
> +	"cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us %lld, cfs_quota_us = %lld)\n",
> +				smp_processor_id(),
> +				new/NSEC_PER_USEC,
> +				cfs_b->quota/NSEC_PER_USEC);
> +
> +			/* reset count so we don't come right back in here */
> +			count = 0;
> +		}
> +
>  		idle = do_sched_cfs_period_timer(cfs_b, overrun, flags);
>  	}
>  	if (idle)

-- 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [tip:sched/core] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
  2019-04-09 12:48   ` Phil Auld
@ 2019-04-09 13:05     ` Peter Zijlstra
  2019-04-09 13:15       ` Phil Auld
                         ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Peter Zijlstra @ 2019-04-09 13:05 UTC (permalink / raw)
  To: Phil Auld
  Cc: linux-kernel, hpa, bsegall, torvalds, anton, efault, mingo, tglx,
	linux-tip-commits

On Tue, Apr 09, 2019 at 08:48:16AM -0400, Phil Auld wrote:
> Hi Ingo, Peter,
> 
> On Wed, Apr 03, 2019 at 01:38:39AM -0700 tip-bot for Phil Auld wrote:
> > Commit-ID:  06ec5d30e8d57b820d44df6340dcb25010d6d0fa
> > Gitweb:     https://git.kernel.org/tip/06ec5d30e8d57b820d44df6340dcb25010d6d0fa
> > Author:     Phil Auld <pauld@redhat.com>
> > AuthorDate: Tue, 19 Mar 2019 09:00:05 -0400
> > Committer:  Ingo Molnar <mingo@kernel.org>
> > CommitDate: Wed, 3 Apr 2019 09:50:23 +0200
> 
> This commit seems to have gotten lost. It's not in tip and now the 
> direct gitweb link is also showing bad commit reference. 
> 
> Did this fall victim to a reset or something?

It had (trivial) builds fails on 32 bit. I have a fixed up version
around somewhere, but that hasn't made it back in yet.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [tip:sched/core] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
  2019-04-09 13:05     ` Peter Zijlstra
@ 2019-04-09 13:15       ` Phil Auld
  2019-04-16 13:33       ` Phil Auld
  2019-04-16 16:18       ` Phil Auld
  2 siblings, 0 replies; 11+ messages in thread
From: Phil Auld @ 2019-04-09 13:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, hpa, bsegall, torvalds, anton, efault, mingo, tglx,
	linux-tip-commits

On Tue, Apr 09, 2019 at 03:05:27PM +0200 Peter Zijlstra wrote:
> On Tue, Apr 09, 2019 at 08:48:16AM -0400, Phil Auld wrote:
> > Hi Ingo, Peter,
> > 
> > On Wed, Apr 03, 2019 at 01:38:39AM -0700 tip-bot for Phil Auld wrote:
> > > Commit-ID:  06ec5d30e8d57b820d44df6340dcb25010d6d0fa
> > > Gitweb:     https://git.kernel.org/tip/06ec5d30e8d57b820d44df6340dcb25010d6d0fa
> > > Author:     Phil Auld <pauld@redhat.com>
> > > AuthorDate: Tue, 19 Mar 2019 09:00:05 -0400
> > > Committer:  Ingo Molnar <mingo@kernel.org>
> > > CommitDate: Wed, 3 Apr 2019 09:50:23 +0200
> > 
> > This commit seems to have gotten lost. It's not in tip and now the 
> > direct gitweb link is also showing bad commit reference. 
> > 
> > Did this fall victim to a reset or something?
> 
> It had (trivial) builds fails on 32 bit. I have a fixed up version
> around somewhere, but that hasn't made it back in yet.

Drat. Sorry about that. At least I'm not going crazy...  

Do you want me to fix it and push it back or will you do that? 


Thanks,
Phil


-- 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [tip:sched/core] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
  2019-04-09 13:05     ` Peter Zijlstra
  2019-04-09 13:15       ` Phil Auld
@ 2019-04-16 13:33       ` Phil Auld
  2019-04-16 16:18       ` Phil Auld
  2 siblings, 0 replies; 11+ messages in thread
From: Phil Auld @ 2019-04-16 13:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, hpa, bsegall, torvalds, anton, efault, mingo, tglx,
	linux-tip-commits

On Tue, Apr 09, 2019 at 03:05:27PM +0200 Peter Zijlstra wrote:
> On Tue, Apr 09, 2019 at 08:48:16AM -0400, Phil Auld wrote:
> > Hi Ingo, Peter,
> > 
> > On Wed, Apr 03, 2019 at 01:38:39AM -0700 tip-bot for Phil Auld wrote:
> > > Commit-ID:  06ec5d30e8d57b820d44df6340dcb25010d6d0fa
> > > Gitweb:     https://git.kernel.org/tip/06ec5d30e8d57b820d44df6340dcb25010d6d0fa
> > > Author:     Phil Auld <pauld@redhat.com>
> > > AuthorDate: Tue, 19 Mar 2019 09:00:05 -0400
> > > Committer:  Ingo Molnar <mingo@kernel.org>
> > > CommitDate: Wed, 3 Apr 2019 09:50:23 +0200
> > 
> > This commit seems to have gotten lost. It's not in tip and now the 
> > direct gitweb link is also showing bad commit reference. 
> > 
> > Did this fall victim to a reset or something?
> 
> It had (trivial) builds fails on 32 bit. I have a fixed up version
> around somewhere, but that hasn't made it back in yet.


Friendly ping...


Thanks,
Phil

-- 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [tip:sched/urgent] sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup
  2019-03-19 13:00 [PATCH v2] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup Phil Auld
  2019-03-21 18:01 ` Peter Zijlstra
  2019-04-03  8:38 ` [tip:sched/core] " tip-bot for Phil Auld
@ 2019-04-16 15:32 ` tip-bot for Phil Auld
  2019-04-16 19:26   ` Phil Auld
  2 siblings, 1 reply; 11+ messages in thread
From: tip-bot for Phil Auld @ 2019-04-16 15:32 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: torvalds, bsegall, linux-kernel, pauld, stable, anton, tglx,
	peterz, hpa, mingo

Commit-ID:  2e8e19226398db8265a8e675fcc0118b9e80c9e8
Gitweb:     https://git.kernel.org/tip/2e8e19226398db8265a8e675fcc0118b9e80c9e8
Author:     Phil Auld <pauld@redhat.com>
AuthorDate: Tue, 19 Mar 2019 09:00:05 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 16 Apr 2019 16:50:05 +0200

sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup

With extremely short cfs_period_us setting on a parent task group with a large
number of children the for loop in sched_cfs_period_timer() can run until the
watchdog fires. There is no guarantee that the call to hrtimer_forward_now()
will ever return 0.  The large number of children can make
do_sched_cfs_period_timer() take longer than the period.

 NMI watchdog: Watchdog detected hard LOCKUP on cpu 24
 RIP: 0010:tg_nop+0x0/0x10
  <IRQ>
  walk_tg_tree_from+0x29/0xb0
  unthrottle_cfs_rq+0xe0/0x1a0
  distribute_cfs_runtime+0xd3/0xf0
  sched_cfs_period_timer+0xcb/0x160
  ? sched_cfs_slack_timer+0xd0/0xd0
  __hrtimer_run_queues+0xfb/0x270
  hrtimer_interrupt+0x122/0x270
  smp_apic_timer_interrupt+0x6a/0x140
  apic_timer_interrupt+0xf/0x20
  </IRQ>

To prevent this we add protection to the loop that detects when the loop has run
too many times and scales the period and quota up, proportionally, so that the timer
can complete before then next period expires.  This preserves the relative runtime
quota while preventing the hard lockup.

A warning is issued reporting this state and the new values.

Signed-off-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190319130005.25492-1-pauld@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 40bd1e27b1b7..a4d9e14bf138 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4885,6 +4885,8 @@ static enum hrtimer_restart sched_cfs_slack_timer(struct hrtimer *timer)
 	return HRTIMER_NORESTART;
 }
 
+extern const u64 max_cfs_quota_period;
+
 static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 {
 	struct cfs_bandwidth *cfs_b =
@@ -4892,6 +4894,7 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 	unsigned long flags;
 	int overrun;
 	int idle = 0;
+	int count = 0;
 
 	raw_spin_lock_irqsave(&cfs_b->lock, flags);
 	for (;;) {
@@ -4899,6 +4902,28 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 		if (!overrun)
 			break;
 
+		if (++count > 3) {
+			u64 new, old = ktime_to_ns(cfs_b->period);
+
+			new = (old * 147) / 128; /* ~115% */
+			new = min(new, max_cfs_quota_period);
+
+			cfs_b->period = ns_to_ktime(new);
+
+			/* since max is 1s, this is limited to 1e9^2, which fits in u64 */
+			cfs_b->quota *= new;
+			cfs_b->quota = div64_u64(cfs_b->quota, old);
+
+			pr_warn_ratelimited(
+	"cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us %lld, cfs_quota_us = %lld)\n",
+				smp_processor_id(),
+				div_u64(new, NSEC_PER_USEC),
+				div_u64(cfs_b->quota, NSEC_PER_USEC));
+
+			/* reset count so we don't come right back in here */
+			count = 0;
+		}
+
 		idle = do_sched_cfs_period_timer(cfs_b, overrun, flags);
 	}
 	if (idle)

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [tip:sched/core] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
  2019-04-09 13:05     ` Peter Zijlstra
  2019-04-09 13:15       ` Phil Auld
  2019-04-16 13:33       ` Phil Auld
@ 2019-04-16 16:18       ` Phil Auld
  2 siblings, 0 replies; 11+ messages in thread
From: Phil Auld @ 2019-04-16 16:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, hpa, bsegall, torvalds, anton, efault, mingo, tglx,
	linux-tip-commits

On Tue, Apr 09, 2019 at 03:05:27PM +0200 Peter Zijlstra wrote:
> On Tue, Apr 09, 2019 at 08:48:16AM -0400, Phil Auld wrote:
> > Hi Ingo, Peter,
> > 
> > On Wed, Apr 03, 2019 at 01:38:39AM -0700 tip-bot for Phil Auld wrote:
> > > Commit-ID:  06ec5d30e8d57b820d44df6340dcb25010d6d0fa
> > > Gitweb:     https://git.kernel.org/tip/06ec5d30e8d57b820d44df6340dcb25010d6d0fa
> > > Author:     Phil Auld <pauld@redhat.com>
> > > AuthorDate: Tue, 19 Mar 2019 09:00:05 -0400
> > > Committer:  Ingo Molnar <mingo@kernel.org>
> > > CommitDate: Wed, 3 Apr 2019 09:50:23 +0200
> > 
> > This commit seems to have gotten lost. It's not in tip and now the 
> > direct gitweb link is also showing bad commit reference. 
> > 
> > Did this fall victim to a reset or something?
> 
> It had (trivial) builds fails on 32 bit. I have a fixed up version
> around somewhere, but that hasn't made it back in yet.


Thank you :)

I'll post the follow up version for the stable trees soon.


-- 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [tip:sched/urgent] sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup
  2019-04-16 15:32 ` [tip:sched/urgent] sched/fair: Limit sched_cfs_period_timer() " tip-bot for Phil Auld
@ 2019-04-16 19:26   ` Phil Auld
  0 siblings, 0 replies; 11+ messages in thread
From: Phil Auld @ 2019-04-16 19:26 UTC (permalink / raw)
  To: sashal
  Cc: tglx, hpa, peterz, mingo, torvalds, linux-kernel, bsegall, stable,
	anton


Hi Sasha,

On Tue, Apr 16, 2019 at 08:32:09AM -0700 tip-bot for Phil Auld wrote:
> Commit-ID:  2e8e19226398db8265a8e675fcc0118b9e80c9e8
> Gitweb:     https://git.kernel.org/tip/2e8e19226398db8265a8e675fcc0118b9e80c9e8
> Author:     Phil Auld <pauld@redhat.com>
> AuthorDate: Tue, 19 Mar 2019 09:00:05 -0400
> Committer:  Ingo Molnar <mingo@kernel.org>
> CommitDate: Tue, 16 Apr 2019 16:50:05 +0200
> 
> sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup
> 
> With extremely short cfs_period_us setting on a parent task group with a large
> number of children the for loop in sched_cfs_period_timer() can run until the
> watchdog fires. There is no guarantee that the call to hrtimer_forward_now()
> will ever return 0.  The large number of children can make
> do_sched_cfs_period_timer() take longer than the period.
> 
>  NMI watchdog: Watchdog detected hard LOCKUP on cpu 24
>  RIP: 0010:tg_nop+0x0/0x10
>   <IRQ>
>   walk_tg_tree_from+0x29/0xb0
>   unthrottle_cfs_rq+0xe0/0x1a0
>   distribute_cfs_runtime+0xd3/0xf0
>   sched_cfs_period_timer+0xcb/0x160
>   ? sched_cfs_slack_timer+0xd0/0xd0
>   __hrtimer_run_queues+0xfb/0x270
>   hrtimer_interrupt+0x122/0x270
>   smp_apic_timer_interrupt+0x6a/0x140
>   apic_timer_interrupt+0xf/0x20
>   </IRQ>
> 
> To prevent this we add protection to the loop that detects when the loop has run
> too many times and scales the period and quota up, proportionally, so that the timer
> can complete before then next period expires.  This preserves the relative runtime
> quota while preventing the hard lockup.
> 
> A warning is issued reporting this state and the new values.
> 
> Signed-off-by: Phil Auld <pauld@redhat.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: <stable@vger.kernel.org>
> Cc: Anton Blanchard <anton@ozlabs.org>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Link: https://lkml.kernel.org/r/20190319130005.25492-1-pauld@redhat.com
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> ---

The above commit won't work on the stable trees. Below is an updated version
that will work on v5.0.7, v4.19.34, v4.14.111, v4.9.168, and v4.4.178 with 
increasing offsets. I believe v3.18.138 will require more so that one is not 
included.

There is only a minor change to context, none of actual changes in the patch are
different.


Thanks,
Phil
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 310d0637fe4b..f0380229b6f2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4859,12 +4859,15 @@ static enum hrtimer_restart sched_cfs_slack_timer(struct hrtimer *timer)
 	return HRTIMER_NORESTART;
 }
 
+extern const u64 max_cfs_quota_period;
+
 static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 {
 	struct cfs_bandwidth *cfs_b =
 		container_of(timer, struct cfs_bandwidth, period_timer);
 	int overrun;
 	int idle = 0;
+	int count = 0;
 
 	raw_spin_lock(&cfs_b->lock);
 	for (;;) {
@@ -4872,6 +4875,28 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 		if (!overrun)
 			break;
 
+		if (++count > 3) {
+			u64 new, old = ktime_to_ns(cfs_b->period);
+
+			new = (old * 147) / 128; /* ~115% */
+			new = min(new, max_cfs_quota_period);
+
+			cfs_b->period = ns_to_ktime(new);
+
+			/* since max is 1s, this is limited to 1e9^2, which fits in u64 */
+			cfs_b->quota *= new;
+			cfs_b->quota = div64_u64(cfs_b->quota, old);
+
+			pr_warn_ratelimited(
+        "cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us %lld, cfs_quota_us = %lld)\n",
+	                        smp_processor_id(),
+	                        div_u64(new, NSEC_PER_USEC),
+                                div_u64(cfs_b->quota, NSEC_PER_USEC));
+
+			/* reset count so we don't come right back in here */
+			count = 0;
+		}
+
 		idle = do_sched_cfs_period_timer(cfs_b, overrun);
 	}
 	if (idle)



-- 

^ permalink raw reply related	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2019-04-16 19:26 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-03-19 13:00 [PATCH v2] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup Phil Auld
2019-03-21 18:01 ` Peter Zijlstra
2019-03-21 18:32   ` Phil Auld
2019-04-03  8:38 ` [tip:sched/core] " tip-bot for Phil Auld
2019-04-09 12:48   ` Phil Auld
2019-04-09 13:05     ` Peter Zijlstra
2019-04-09 13:15       ` Phil Auld
2019-04-16 13:33       ` Phil Auld
2019-04-16 16:18       ` Phil Auld
2019-04-16 15:32 ` [tip:sched/urgent] sched/fair: Limit sched_cfs_period_timer() " tip-bot for Phil Auld
2019-04-16 19:26   ` Phil Auld

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox