* [PATCH v2] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
From: Phil Auld @ 2019-03-19 13:00 UTC (permalink / raw)
To: linux-kernel; +Cc: Ben Segall, Ingo Molnar, Peter Zijlstra, Anton Blanchard
sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
With an extremely short cfs_period_us setting on a parent task group with a
large number of children, the for loop in sched_cfs_period_timer() can run
until the watchdog fires. There is no guarantee that the call to
hrtimer_forward_now() will ever return 0. The large number of children can
make do_sched_cfs_period_timer() take longer than the period.
[ 217.264946] NMI watchdog: Watchdog detected hard LOCKUP on cpu 24
[ 217.264948] Modules linked in: sunrpc iTCO_wdt gpio_ich iTCO_vendor_support intel_powerclamp coretemp kvm_intel kvm ipmi_ssif irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ipmi_si intel_cstate joydev ipmi_devintf pcspkr hpilo intel_uncore sg hpwdt ipmi_msghandler acpi_power_meter i7core_edac lpc_ich acpi_cpufreq ip_tables xfs libcrc32c sr_mod sd_mod cdrom ata_generic radeon i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ata_piix drm serio_raw crc32c_intel hpsa myri10ge libata dca scsi_transport_sas netxen_nic dm_mirror dm_region_hash dm_log dm_mod
[ 217.264964] CPU: 24 PID: 46684 Comm: myapp Not tainted 5.0.0-rc7+ #25
[ 217.264965] Hardware name: HP ProLiant DL580 G7, BIOS P65 08/16/2015
[ 217.264965] RIP: 0010:tg_nop+0x0/0x10
[ 217.264966] Code: 83 c9 08 f0 48 0f b1 0f 48 39 c2 74 0e 48 89 c2 f7 c2 00 00 20 00 75 dc 31 c0 c3 b8 01 00 00 00 c3 66 0f 1f 84 00 00 00 00 00 <66> 66 66 66 90 31 c0 c3 0f 1f 84 00 00 00 00 00 66 66 66 66 90 8b
[ 217.264967] RSP: 0000:ffff976a7f703e48 EFLAGS: 00000087
[ 217.264967] RAX: ffff976a7c7c8f00 RBX: ffff976a6f4fad00 RCX: ffff976a7c7c90f0
[ 217.264968] RDX: ffff976a6f4faee0 RSI: ffff976a7f961f00 RDI: ffff976a6f4fad00
[ 217.264968] RBP: ffff976a7f961f00 R08: 0000000000000002 R09: ffffffad2c9b3331
[ 217.264969] R10: 0000000000000000 R11: 0000000000000000 R12: ffff976a7c7c8f00
[ 217.264969] R13: ffffffffb2305c00 R14: 0000000000000000 R15: ffffffffb22f8510
[ 217.264970] FS: 00007f6240678740(0000) GS:ffff976a7f700000(0000) knlGS:0000000000000000
[ 217.264970] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 217.264971] CR2: 00000000006dee20 CR3: 000000bf2bffc005 CR4: 00000000000206e0
[ 217.264971] Call Trace:
[ 217.264971] <IRQ>
[ 217.264972] walk_tg_tree_from+0x29/0xb0
[ 217.264972] unthrottle_cfs_rq+0xe0/0x1a0
[ 217.264972] distribute_cfs_runtime+0xd3/0xf0
[ 217.264973] sched_cfs_period_timer+0xcb/0x160
[ 217.264973] ? sched_cfs_slack_timer+0xd0/0xd0
[ 217.264973] __hrtimer_run_queues+0xfb/0x270
[ 217.264974] hrtimer_interrupt+0x122/0x270
[ 217.264974] smp_apic_timer_interrupt+0x6a/0x140
[ 217.264975] apic_timer_interrupt+0xf/0x20
[ 217.264975] </IRQ>
[ 217.264975] RIP: 0033:0x7f6240125fe5
[ 217.264976] Code: 44 17 d0 f3 44 0f 7f 47 30 f3 44 0f 7f 44 17 c0 48 01 fa 48 83 e2 c0 48 39 d1 74 a3 66 0f 1f 84 00 00 00 00 00 66 44 0f 7f 01 <66> 44 0f 7f 41 10 66 44 0f 7f 41 20 66 44 0f 7f 41 30 48 83 c1 40
...
To prevent this we add protection to the loop that detects when the loop has run
too many times and scales the period and quota up, proportionally, so that the
timer can complete before the next period expires. This preserves the relative
runtime quota while preventing the hard lockup.
A warning is issued reporting this state and the new values.
v2: Math reworked/simplified by Peter Zijlstra.
Signed-off-by: Phil Auld <pauld@redhat.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Anton Blanchard <anton@ozlabs.org>
---
kernel/sched/fair.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ea74d43924b2..4e43a40ec13c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4885,6 +4885,8 @@ static enum hrtimer_restart sched_cfs_slack_timer(struct hrtimer *timer)
 	return HRTIMER_NORESTART;
 }
 
+extern const u64 max_cfs_quota_period;
+
 static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 {
 	struct cfs_bandwidth *cfs_b =
@@ -4892,6 +4894,7 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 	unsigned long flags;
 	int overrun;
 	int idle = 0;
+	int count = 0;
 
 	raw_spin_lock_irqsave(&cfs_b->lock, flags);
 	for (;;) {
@@ -4899,6 +4902,28 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 		if (!overrun)
 			break;
 
+		if (++count > 3) {
+			u64 new, old = ktime_to_ns(cfs_b->period);
+
+			new = (old * 147) / 128; /* ~115% */
+			new = min(new, max_cfs_quota_period);
+
+			cfs_b->period = ns_to_ktime(new);
+
+			/* since max is 1s, this is limited to 1e9^2, which fits in u64 */
+			cfs_b->quota *= new;
+			cfs_b->quota /= old;
+
+			pr_warn_ratelimited(
+	"cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us %lld, cfs_quota_us = %lld)\n",
+				smp_processor_id(),
+				new/NSEC_PER_USEC,
+				cfs_b->quota/NSEC_PER_USEC);
+
+			/* reset count so we don't come right back in here */
+			count = 0;
+		}
+
 		idle = do_sched_cfs_period_timer(cfs_b, overrun, flags);
 	}
 	if (idle)
--
2.18.0
* Re: [PATCH v2] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
From: Peter Zijlstra @ 2019-03-21 18:01 UTC (permalink / raw)
To: Phil Auld; +Cc: linux-kernel, Ben Segall, Ingo Molnar, Anton Blanchard

On Tue, Mar 19, 2019 at 09:00:05AM -0400, Phil Auld wrote:
> sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
>
> With extremely short cfs_period_us setting on a parent task group with a large
> number of children the for loop in sched_cfs_period_timer can run until the
> watchdog fires. There is no guarantee that the call to hrtimer_forward_now()
> will ever return 0. The large number of children can make
> do_sched_cfs_period_timer() take longer than the period.
>
> To prevent this we add protection to the loop that detects when the loop has run
> too many times and scales the period and quota up, proportionally, so that the
> timer can complete before the next period expires. This preserves the relative
> runtime quota while preventing the hard lockup.
>
> A warning is issued reporting this state and the new values.
>
> v2: Math reworked/simplified by Peter Zijlstra.
>
> Signed-off-by: Phil Auld <pauld@redhat.com>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Anton Blanchard <anton@ozlabs.org>

Thanks!
* Re: [PATCH v2] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
From: Phil Auld @ 2019-03-21 18:32 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: linux-kernel, Ben Segall, Ingo Molnar, Anton Blanchard

On Thu, Mar 21, 2019 at 07:01:37PM +0100 Peter Zijlstra wrote:
> On Tue, Mar 19, 2019 at 09:00:05AM -0400, Phil Auld wrote:
> > sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
> >
> > With extremely short cfs_period_us setting on a parent task group with a large
> > number of children the for loop in sched_cfs_period_timer can run until the
> > watchdog fires. There is no guarantee that the call to hrtimer_forward_now()
> > will ever return 0. The large number of children can make
> > do_sched_cfs_period_timer() take longer than the period.
> >
> > To prevent this we add protection to the loop that detects when the loop has run
> > too many times and scales the period and quota up, proportionally, so that the
> > timer can complete before the next period expires. This preserves the relative
> > runtime quota while preventing the hard lockup.
> >
> > A warning is issued reporting this state and the new values.
> >
> > v2: Math reworked/simplified by Peter Zijlstra.
> >
> > Signed-off-by: Phil Auld <pauld@redhat.com>
> > Cc: Ben Segall <bsegall@google.com>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Cc: Anton Blanchard <anton@ozlabs.org>
>
> Thanks!

Thank you for your time and help.

What do you think about Cc: stable?

Cheers,
Phil
* [tip:sched/core] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
From: tip-bot for Phil Auld @ 2019-04-03  8:38 UTC (permalink / raw)
To: linux-tip-commits
Cc: mingo, pauld, efault, tglx, peterz, anton, bsegall, torvalds, stable, hpa, linux-kernel

Commit-ID:  06ec5d30e8d57b820d44df6340dcb25010d6d0fa
Gitweb:     https://git.kernel.org/tip/06ec5d30e8d57b820d44df6340dcb25010d6d0fa
Author:     Phil Auld <pauld@redhat.com>
AuthorDate: Tue, 19 Mar 2019 09:00:05 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 3 Apr 2019 09:50:23 +0200

sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup

With extremely short cfs_period_us setting on a parent task group with a large
number of children the for loop in sched_cfs_period_timer can run until the
watchdog fires. There is no guarantee that the call to hrtimer_forward_now()
will ever return 0. The large number of children can make
do_sched_cfs_period_timer() take longer than the period.

 NMI watchdog: Watchdog detected hard LOCKUP on cpu 24
 RIP: 0010:tg_nop+0x0/0x10
  <IRQ>
  walk_tg_tree_from+0x29/0xb0
  unthrottle_cfs_rq+0xe0/0x1a0
  distribute_cfs_runtime+0xd3/0xf0
  sched_cfs_period_timer+0xcb/0x160
  ? sched_cfs_slack_timer+0xd0/0xd0
  __hrtimer_run_queues+0xfb/0x270
  hrtimer_interrupt+0x122/0x270
  smp_apic_timer_interrupt+0x6a/0x140
  apic_timer_interrupt+0xf/0x20
  </IRQ>

To prevent this we add protection to the loop that detects when the loop has run
too many times and scales the period and quota up, proportionally, so that the
timer can complete before the next period expires. This preserves the relative
runtime quota while preventing the hard lockup.

A warning is issued reporting this state and the new values.

Signed-off-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190319130005.25492-1-pauld@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 40bd1e27b1b7..d4cce633eac8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4885,6 +4885,8 @@ static enum hrtimer_restart sched_cfs_slack_timer(struct hrtimer *timer)
 	return HRTIMER_NORESTART;
 }
 
+extern const u64 max_cfs_quota_period;
+
 static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 {
 	struct cfs_bandwidth *cfs_b =
@@ -4892,6 +4894,7 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 	unsigned long flags;
 	int overrun;
 	int idle = 0;
+	int count = 0;
 
 	raw_spin_lock_irqsave(&cfs_b->lock, flags);
 	for (;;) {
@@ -4899,6 +4902,28 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 		if (!overrun)
 			break;
 
+		if (++count > 3) {
+			u64 new, old = ktime_to_ns(cfs_b->period);
+
+			new = (old * 147) / 128; /* ~115% */
+			new = min(new, max_cfs_quota_period);
+
+			cfs_b->period = ns_to_ktime(new);
+
+			/* since max is 1s, this is limited to 1e9^2, which fits in u64 */
+			cfs_b->quota *= new;
+			cfs_b->quota /= old;
+
+			pr_warn_ratelimited(
+	"cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us %lld, cfs_quota_us = %lld)\n",
+				smp_processor_id(),
+				new/NSEC_PER_USEC,
+				cfs_b->quota/NSEC_PER_USEC);
+
+			/* reset count so we don't come right back in here */
+			count = 0;
+		}
+
 		idle = do_sched_cfs_period_timer(cfs_b, overrun, flags);
 	}
 	if (idle)
* Re: [tip:sched/core] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
From: Phil Auld @ 2019-04-09 12:48 UTC (permalink / raw)
To: linux-kernel, hpa, peterz, bsegall, torvalds, anton, efault, mingo, tglx
Cc: linux-tip-commits

Hi Ingo, Peter,

On Wed, Apr 03, 2019 at 01:38:39AM -0700 tip-bot for Phil Auld wrote:
> Commit-ID:  06ec5d30e8d57b820d44df6340dcb25010d6d0fa
> Gitweb:     https://git.kernel.org/tip/06ec5d30e8d57b820d44df6340dcb25010d6d0fa
> Author:     Phil Auld <pauld@redhat.com>
> AuthorDate: Tue, 19 Mar 2019 09:00:05 -0400
> Committer:  Ingo Molnar <mingo@kernel.org>
> CommitDate: Wed, 3 Apr 2019 09:50:23 +0200

This commit seems to have gotten lost. It's not in tip and now the
direct gitweb link is also showing bad commit reference.

Did this fall victim to a reset or something?

Thanks,
Phil

>
> sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
>
> With extremely short cfs_period_us setting on a parent task group with a large
> number of children the for loop in sched_cfs_period_timer can run until the
> watchdog fires. There is no guarantee that the call to hrtimer_forward_now()
> will ever return 0. The large number of children can make
> do_sched_cfs_period_timer() take longer than the period.
>
>  NMI watchdog: Watchdog detected hard LOCKUP on cpu 24
>  RIP: 0010:tg_nop+0x0/0x10
>   <IRQ>
>   walk_tg_tree_from+0x29/0xb0
>   unthrottle_cfs_rq+0xe0/0x1a0
>   distribute_cfs_runtime+0xd3/0xf0
>   sched_cfs_period_timer+0xcb/0x160
>   ? sched_cfs_slack_timer+0xd0/0xd0
>   __hrtimer_run_queues+0xfb/0x270
>   hrtimer_interrupt+0x122/0x270
>   smp_apic_timer_interrupt+0x6a/0x140
>   apic_timer_interrupt+0xf/0x20
>   </IRQ>
>
> To prevent this we add protection to the loop that detects when the loop has run
> too many times and scales the period and quota up, proportionally, so that the
> timer can complete before the next period expires. This preserves the relative
> runtime quota while preventing the hard lockup.
>
> A warning is issued reporting this state and the new values.
>
> Signed-off-by: Phil Auld <pauld@redhat.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Anton Blanchard <anton@ozlabs.org>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Mike Galbraith <efault@gmx.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: <stable@vger.kernel.org>
> Link: https://lkml.kernel.org/r/20190319130005.25492-1-pauld@redhat.com
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> ---
>  kernel/sched/fair.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 40bd1e27b1b7..d4cce633eac8 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4885,6 +4885,8 @@ static enum hrtimer_restart sched_cfs_slack_timer(struct hrtimer *timer)
>  	return HRTIMER_NORESTART;
>  }
>  
> +extern const u64 max_cfs_quota_period;
> +
>  static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
>  {
>  	struct cfs_bandwidth *cfs_b =
> @@ -4892,6 +4894,7 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
>  	unsigned long flags;
>  	int overrun;
>  	int idle = 0;
> +	int count = 0;
>  
>  	raw_spin_lock_irqsave(&cfs_b->lock, flags);
>  	for (;;) {
> @@ -4899,6 +4902,28 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
>  		if (!overrun)
>  			break;
>  
> +		if (++count > 3) {
> +			u64 new, old = ktime_to_ns(cfs_b->period);
> +
> +			new = (old * 147) / 128; /* ~115% */
> +			new = min(new, max_cfs_quota_period);
> +
> +			cfs_b->period = ns_to_ktime(new);
> +
> +			/* since max is 1s, this is limited to 1e9^2, which fits in u64 */
> +			cfs_b->quota *= new;
> +			cfs_b->quota /= old;
> +
> +			pr_warn_ratelimited(
> +	"cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us %lld, cfs_quota_us = %lld)\n",
> +				smp_processor_id(),
> +				new/NSEC_PER_USEC,
> +				cfs_b->quota/NSEC_PER_USEC);
> +
> +			/* reset count so we don't come right back in here */
> +			count = 0;
> +		}
> +
>  		idle = do_sched_cfs_period_timer(cfs_b, overrun, flags);
>  	}
>  	if (idle)
--
* Re: [tip:sched/core] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
From: Peter Zijlstra @ 2019-04-09 13:05 UTC (permalink / raw)
To: Phil Auld
Cc: linux-kernel, hpa, bsegall, torvalds, anton, efault, mingo, tglx, linux-tip-commits

On Tue, Apr 09, 2019 at 08:48:16AM -0400, Phil Auld wrote:
> Hi Ingo, Peter,
>
> On Wed, Apr 03, 2019 at 01:38:39AM -0700 tip-bot for Phil Auld wrote:
> > Commit-ID:  06ec5d30e8d57b820d44df6340dcb25010d6d0fa
> > Gitweb:     https://git.kernel.org/tip/06ec5d30e8d57b820d44df6340dcb25010d6d0fa
> > Author:     Phil Auld <pauld@redhat.com>
> > AuthorDate: Tue, 19 Mar 2019 09:00:05 -0400
> > Committer:  Ingo Molnar <mingo@kernel.org>
> > CommitDate: Wed, 3 Apr 2019 09:50:23 +0200
>
> This commit seems to have gotten lost. It's not in tip and now the
> direct gitweb link is also showing bad commit reference.
>
> Did this fall victim to a reset or something?

It had (trivial) build fails on 32 bit. I have a fixed up version
around somewhere, but that hasn't made it back in yet.
* Re: [tip:sched/core] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
From: Phil Auld @ 2019-04-09 13:15 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, hpa, bsegall, torvalds, anton, efault, mingo, tglx, linux-tip-commits

On Tue, Apr 09, 2019 at 03:05:27PM +0200 Peter Zijlstra wrote:
> On Tue, Apr 09, 2019 at 08:48:16AM -0400, Phil Auld wrote:
> > Hi Ingo, Peter,
> >
> > On Wed, Apr 03, 2019 at 01:38:39AM -0700 tip-bot for Phil Auld wrote:
> > > Commit-ID:  06ec5d30e8d57b820d44df6340dcb25010d6d0fa
> > > Gitweb:     https://git.kernel.org/tip/06ec5d30e8d57b820d44df6340dcb25010d6d0fa
> > > Author:     Phil Auld <pauld@redhat.com>
> > > AuthorDate: Tue, 19 Mar 2019 09:00:05 -0400
> > > Committer:  Ingo Molnar <mingo@kernel.org>
> > > CommitDate: Wed, 3 Apr 2019 09:50:23 +0200
> >
> > This commit seems to have gotten lost. It's not in tip and now the
> > direct gitweb link is also showing bad commit reference.
> >
> > Did this fall victim to a reset or something?
>
> It had (trivial) builds fails on 32 bit. I have a fixed up version
> around somewhere, but that hasn't made it back in yet.

Drat. Sorry about that. At least I'm not going crazy...

Do you want me to fix it and push it back or will you do that?

Thanks,
Phil
* Re: [tip:sched/core] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
From: Phil Auld @ 2019-04-16 13:33 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, hpa, bsegall, torvalds, anton, efault, mingo, tglx, linux-tip-commits

On Tue, Apr 09, 2019 at 03:05:27PM +0200 Peter Zijlstra wrote:
> On Tue, Apr 09, 2019 at 08:48:16AM -0400, Phil Auld wrote:
> > Hi Ingo, Peter,
> >
> > On Wed, Apr 03, 2019 at 01:38:39AM -0700 tip-bot for Phil Auld wrote:
> > > Commit-ID:  06ec5d30e8d57b820d44df6340dcb25010d6d0fa
> > > Gitweb:     https://git.kernel.org/tip/06ec5d30e8d57b820d44df6340dcb25010d6d0fa
> > > Author:     Phil Auld <pauld@redhat.com>
> > > AuthorDate: Tue, 19 Mar 2019 09:00:05 -0400
> > > Committer:  Ingo Molnar <mingo@kernel.org>
> > > CommitDate: Wed, 3 Apr 2019 09:50:23 +0200
> >
> > This commit seems to have gotten lost. It's not in tip and now the
> > direct gitweb link is also showing bad commit reference.
> >
> > Did this fall victim to a reset or something?
>
> It had (trivial) builds fails on 32 bit. I have a fixed up version
> around somewhere, but that hasn't made it back in yet.

Friendly ping...

Thanks,
Phil
* Re: [tip:sched/core] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
From: Phil Auld @ 2019-04-16 16:18 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, hpa, bsegall, torvalds, anton, efault, mingo, tglx, linux-tip-commits

On Tue, Apr 09, 2019 at 03:05:27PM +0200 Peter Zijlstra wrote:
> On Tue, Apr 09, 2019 at 08:48:16AM -0400, Phil Auld wrote:
> > Hi Ingo, Peter,
> >
> > On Wed, Apr 03, 2019 at 01:38:39AM -0700 tip-bot for Phil Auld wrote:
> > > Commit-ID:  06ec5d30e8d57b820d44df6340dcb25010d6d0fa
> > > Gitweb:     https://git.kernel.org/tip/06ec5d30e8d57b820d44df6340dcb25010d6d0fa
> > > Author:     Phil Auld <pauld@redhat.com>
> > > AuthorDate: Tue, 19 Mar 2019 09:00:05 -0400
> > > Committer:  Ingo Molnar <mingo@kernel.org>
> > > CommitDate: Wed, 3 Apr 2019 09:50:23 +0200
> >
> > This commit seems to have gotten lost. It's not in tip and now the
> > direct gitweb link is also showing bad commit reference.
> >
> > Did this fall victim to a reset or something?
>
> It had (trivial) builds fails on 32 bit. I have a fixed up version
> around somewhere, but that hasn't made it back in yet.

Thank you :)

I'll post the follow up version for the stable trees soon.

--
* [tip:sched/urgent] sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup
From: tip-bot for Phil Auld @ 2019-04-16 15:32 UTC (permalink / raw)
To: linux-tip-commits
Cc: torvalds, bsegall, linux-kernel, pauld, stable, anton, tglx, peterz, hpa, mingo

Commit-ID:  2e8e19226398db8265a8e675fcc0118b9e80c9e8
Gitweb:     https://git.kernel.org/tip/2e8e19226398db8265a8e675fcc0118b9e80c9e8
Author:     Phil Auld <pauld@redhat.com>
AuthorDate: Tue, 19 Mar 2019 09:00:05 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 16 Apr 2019 16:50:05 +0200

sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup

With extremely short cfs_period_us setting on a parent task group with a large
number of children the for loop in sched_cfs_period_timer() can run until the
watchdog fires. There is no guarantee that the call to hrtimer_forward_now()
will ever return 0. The large number of children can make
do_sched_cfs_period_timer() take longer than the period.

 NMI watchdog: Watchdog detected hard LOCKUP on cpu 24
 RIP: 0010:tg_nop+0x0/0x10
  <IRQ>
  walk_tg_tree_from+0x29/0xb0
  unthrottle_cfs_rq+0xe0/0x1a0
  distribute_cfs_runtime+0xd3/0xf0
  sched_cfs_period_timer+0xcb/0x160
  ? sched_cfs_slack_timer+0xd0/0xd0
  __hrtimer_run_queues+0xfb/0x270
  hrtimer_interrupt+0x122/0x270
  smp_apic_timer_interrupt+0x6a/0x140
  apic_timer_interrupt+0xf/0x20
  </IRQ>

To prevent this we add protection to the loop that detects when the loop has run
too many times and scales the period and quota up, proportionally, so that the
timer can complete before the next period expires. This preserves the relative
runtime quota while preventing the hard lockup.

A warning is issued reporting this state and the new values.

Signed-off-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190319130005.25492-1-pauld@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 40bd1e27b1b7..a4d9e14bf138 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4885,6 +4885,8 @@ static enum hrtimer_restart sched_cfs_slack_timer(struct hrtimer *timer)
 	return HRTIMER_NORESTART;
 }
 
+extern const u64 max_cfs_quota_period;
+
 static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 {
 	struct cfs_bandwidth *cfs_b =
@@ -4892,6 +4894,7 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 	unsigned long flags;
 	int overrun;
 	int idle = 0;
+	int count = 0;
 
 	raw_spin_lock_irqsave(&cfs_b->lock, flags);
 	for (;;) {
@@ -4899,6 +4902,28 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 		if (!overrun)
 			break;
 
+		if (++count > 3) {
+			u64 new, old = ktime_to_ns(cfs_b->period);
+
+			new = (old * 147) / 128; /* ~115% */
+			new = min(new, max_cfs_quota_period);
+
+			cfs_b->period = ns_to_ktime(new);
+
+			/* since max is 1s, this is limited to 1e9^2, which fits in u64 */
+			cfs_b->quota *= new;
+			cfs_b->quota = div64_u64(cfs_b->quota, old);
+
+			pr_warn_ratelimited(
+	"cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us %lld, cfs_quota_us = %lld)\n",
+				smp_processor_id(),
+				div_u64(new, NSEC_PER_USEC),
+				div_u64(cfs_b->quota, NSEC_PER_USEC));
+
+			/* reset count so we don't come right back in here */
+			count = 0;
+		}
+
 		idle = do_sched_cfs_period_timer(cfs_b, overrun, flags);
 	}
 	if (idle)
* Re: [tip:sched/urgent] sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup
From: Phil Auld @ 2019-04-16 19:26 UTC (permalink / raw)
To: sashal
Cc: tglx, hpa, peterz, mingo, torvalds, linux-kernel, bsegall, stable, anton

Hi Sasha,

On Tue, Apr 16, 2019 at 08:32:09AM -0700 tip-bot for Phil Auld wrote:
> Commit-ID:  2e8e19226398db8265a8e675fcc0118b9e80c9e8
> Gitweb:     https://git.kernel.org/tip/2e8e19226398db8265a8e675fcc0118b9e80c9e8
> Author:     Phil Auld <pauld@redhat.com>
> AuthorDate: Tue, 19 Mar 2019 09:00:05 -0400
> Committer:  Ingo Molnar <mingo@kernel.org>
> CommitDate: Tue, 16 Apr 2019 16:50:05 +0200
>
> sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup
>
> With extremely short cfs_period_us setting on a parent task group with a large
> number of children the for loop in sched_cfs_period_timer() can run until the
> watchdog fires. There is no guarantee that the call to hrtimer_forward_now()
> will ever return 0. The large number of children can make
> do_sched_cfs_period_timer() take longer than the period.
>
>  NMI watchdog: Watchdog detected hard LOCKUP on cpu 24
>  RIP: 0010:tg_nop+0x0/0x10
>   <IRQ>
>   walk_tg_tree_from+0x29/0xb0
>   unthrottle_cfs_rq+0xe0/0x1a0
>   distribute_cfs_runtime+0xd3/0xf0
>   sched_cfs_period_timer+0xcb/0x160
>   ? sched_cfs_slack_timer+0xd0/0xd0
>   __hrtimer_run_queues+0xfb/0x270
>   hrtimer_interrupt+0x122/0x270
>   smp_apic_timer_interrupt+0x6a/0x140
>   apic_timer_interrupt+0xf/0x20
>   </IRQ>
>
> To prevent this we add protection to the loop that detects when the loop has run
> too many times and scales the period and quota up, proportionally, so that the
> timer can complete before the next period expires. This preserves the relative
> runtime quota while preventing the hard lockup.
>
> A warning is issued reporting this state and the new values.
>
> Signed-off-by: Phil Auld <pauld@redhat.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: <stable@vger.kernel.org>
> Cc: Anton Blanchard <anton@ozlabs.org>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Link: https://lkml.kernel.org/r/20190319130005.25492-1-pauld@redhat.com
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> ---

The above commit won't work on the stable trees. Below is an updated
version that will work on v5.0.7, v4.19.34, v4.14.111, v4.9.168, and
v4.4.178 with increasing offsets. I believe v3.18.138 will require more
so that one is not included. There is only a minor change to context,
none of the actual changes in the patch are different.

Thanks,
Phil

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 310d0637fe4b..f0380229b6f2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4859,12 +4859,15 @@ static enum hrtimer_restart sched_cfs_slack_timer(struct hrtimer *timer)
 	return HRTIMER_NORESTART;
 }
 
+extern const u64 max_cfs_quota_period;
+
 static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 {
 	struct cfs_bandwidth *cfs_b =
 		container_of(timer, struct cfs_bandwidth, period_timer);
 	int overrun;
 	int idle = 0;
+	int count = 0;
 
 	raw_spin_lock(&cfs_b->lock);
 	for (;;) {
@@ -4872,6 +4875,28 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 		if (!overrun)
 			break;
 
+		if (++count > 3) {
+			u64 new, old = ktime_to_ns(cfs_b->period);
+
+			new = (old * 147) / 128; /* ~115% */
+			new = min(new, max_cfs_quota_period);
+
+			cfs_b->period = ns_to_ktime(new);
+
+			/* since max is 1s, this is limited to 1e9^2, which fits in u64 */
+			cfs_b->quota *= new;
+			cfs_b->quota = div64_u64(cfs_b->quota, old);
+
+			pr_warn_ratelimited(
+	"cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us %lld, cfs_quota_us = %lld)\n",
+				smp_processor_id(),
+				div_u64(new, NSEC_PER_USEC),
+				div_u64(cfs_b->quota, NSEC_PER_USEC));
+
+			/* reset count so we don't come right back in here */
+			count = 0;
+		}
+
 		idle = do_sched_cfs_period_timer(cfs_b, overrun);
 	}
 	if (idle)
--