From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Phil Auld <pauld@redhat.com>,
"Peter Zijlstra (Intel)" <peterz@infradead.org>,
Anton Blanchard <anton@ozlabs.org>,
Ben Segall <bsegall@google.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@kernel.org>, Sasha Levin <sashal@kernel.org>
Subject: [PATCH 4.9 39/44] sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup
Date: Wed, 24 Apr 2019 19:10:17 +0200 [thread overview]
Message-ID: <20190424170900.992630680@linuxfoundation.org> (raw)
In-Reply-To: <20190424170839.924291114@linuxfoundation.org>
[ Upstream commit 2e8e19226398db8265a8e675fcc0118b9e80c9e8 ]
With extremely short cfs_period_us setting on a parent task group with a large
number of children the for loop in sched_cfs_period_timer() can run until the
watchdog fires. There is no guarantee that the call to hrtimer_forward_now()
will ever return 0. The large number of children can make
do_sched_cfs_period_timer() take longer than the period.
NMI watchdog: Watchdog detected hard LOCKUP on cpu 24
RIP: 0010:tg_nop+0x0/0x10
<IRQ>
walk_tg_tree_from+0x29/0xb0
unthrottle_cfs_rq+0xe0/0x1a0
distribute_cfs_runtime+0xd3/0xf0
sched_cfs_period_timer+0xcb/0x160
? sched_cfs_slack_timer+0xd0/0xd0
__hrtimer_run_queues+0xfb/0x270
hrtimer_interrupt+0x122/0x270
smp_apic_timer_interrupt+0x6a/0x140
apic_timer_interrupt+0xf/0x20
</IRQ>
To prevent this we add protection to the loop that detects when the loop has run
too many times and scales the period and quota up, proportionally, so that the timer
can complete before then next period expires. This preserves the relative runtime
quota while preventing the hard lockup.
A warning is issued reporting this state and the new values.
Signed-off-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190319130005.25492-1-pauld@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
kernel/sched/fair.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1c630d94f86b..4b1e0669740c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4347,12 +4347,15 @@ static enum hrtimer_restart sched_cfs_slack_timer(struct hrtimer *timer)
return HRTIMER_NORESTART;
}
+extern const u64 max_cfs_quota_period;
+
static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
{
struct cfs_bandwidth *cfs_b =
container_of(timer, struct cfs_bandwidth, period_timer);
int overrun;
int idle = 0;
+ int count = 0;
raw_spin_lock(&cfs_b->lock);
for (;;) {
@@ -4360,6 +4363,28 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
if (!overrun)
break;
+ if (++count > 3) {
+ u64 new, old = ktime_to_ns(cfs_b->period);
+
+ new = (old * 147) / 128; /* ~115% */
+ new = min(new, max_cfs_quota_period);
+
+ cfs_b->period = ns_to_ktime(new);
+
+ /* since max is 1s, this is limited to 1e9^2, which fits in u64 */
+ cfs_b->quota *= new;
+ cfs_b->quota = div64_u64(cfs_b->quota, old);
+
+ pr_warn_ratelimited(
+ "cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us %lld, cfs_quota_us = %lld)\n",
+ smp_processor_id(),
+ div_u64(new, NSEC_PER_USEC),
+ div_u64(cfs_b->quota, NSEC_PER_USEC));
+
+ /* reset count so we don't come right back in here */
+ count = 0;
+ }
+
idle = do_sched_cfs_period_timer(cfs_b, overrun);
}
if (idle)
--
2.19.1
next prev parent reply other threads:[~2019-04-24 17:25 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-04-24 17:09 [PATCH 4.9 00/44] 4.9.171-stable review Greg Kroah-Hartman
2019-04-24 17:09 ` [PATCH 4.9 01/44] bonding: fix event handling for stacked bonds Greg Kroah-Hartman
2019-04-24 17:09 ` [PATCH 4.9 02/44] net: atm: Fix potential Spectre v1 vulnerabilities Greg Kroah-Hartman
2019-04-24 17:09 ` [PATCH 4.9 03/44] net: bridge: fix per-port af_packet sockets Greg Kroah-Hartman
2019-04-24 17:09 ` [PATCH 4.9 04/44] net: bridge: multicast: use rcu to access port list from br_multicast_start_querier Greg Kroah-Hartman
2019-04-24 17:09 ` [PATCH 4.9 05/44] net: fou: do not use guehdr after iptunnel_pull_offloads in gue_udp_recv Greg Kroah-Hartman
2019-04-24 17:09 ` [PATCH 4.9 06/44] tcp: tcp_grow_window() needs to respect tcp_space() Greg Kroah-Hartman
2019-04-24 17:09 ` [PATCH 4.9 07/44] team: set slave to promisc if team is already in promisc mode Greg Kroah-Hartman
2019-04-24 17:09 ` [PATCH 4.9 08/44] vhost: reject zero size iova range Greg Kroah-Hartman
2019-04-24 17:09 ` [PATCH 4.9 09/44] ipv4: recompile ip options in ipv4_link_failure Greg Kroah-Hartman
2019-04-24 17:09 ` [PATCH 4.9 10/44] ipv4: ensure rcu_read_lock() in ipv4_link_failure() Greg Kroah-Hartman
2019-04-24 17:09 ` [PATCH 4.9 11/44] crypto: crypto4xx - properly set IV after de- and encrypt Greg Kroah-Hartman
2019-04-24 17:09 ` [PATCH 4.9 12/44] mmc: sdhci: Fix data command CRC error handling Greg Kroah-Hartman
2019-04-24 17:09 ` [PATCH 4.9 13/44] modpost: file2alias: go back to simple devtable lookup Greg Kroah-Hartman
2019-04-24 17:09 ` [PATCH 4.9 14/44] modpost: file2alias: check prototype of handler Greg Kroah-Hartman
2019-04-24 17:09 ` [PATCH 4.9 15/44] tpm/tpm_i2c_atmel: Return -E2BIG when the transfer is incomplete Greg Kroah-Hartman
2019-04-24 17:09 ` [PATCH 4.9 16/44] CIFS: keep FileInfo handle live during oplock break Greg Kroah-Hartman
2019-04-24 17:09 ` [PATCH 4.9 17/44] KVM: x86: Dont clear EFER during SMM transitions for 32-bit vCPU Greg Kroah-Hartman
2019-04-24 17:09 ` [PATCH 4.9 18/44] staging: iio: ad7192: Fix ad7193 channel address Greg Kroah-Hartman
2019-04-24 17:09 ` [PATCH 4.9 19/44] iio/gyro/bmg160: Use millidegrees for temperature scale Greg Kroah-Hartman
2019-04-24 17:09 ` [PATCH 4.9 20/44] iio: ad_sigma_delta: select channel when reading register Greg Kroah-Hartman
2019-04-24 17:09 ` [PATCH 4.9 21/44] iio: adc: at91: disable adc channel interrupt in timeout case Greg Kroah-Hartman
2019-04-24 17:10 ` [PATCH 4.9 22/44] io: accel: kxcjk1013: restore the range after resume Greg Kroah-Hartman
2019-04-24 17:10 ` [PATCH 4.9 23/44] staging: comedi: vmk80xx: Fix use of uninitialized semaphore Greg Kroah-Hartman
2019-04-24 17:10 ` [PATCH 4.9 24/44] staging: comedi: vmk80xx: Fix possible double-free of ->usb_rx_buf Greg Kroah-Hartman
2019-04-24 17:10 ` [PATCH 4.9 25/44] staging: comedi: ni_usb6501: Fix use of uninitialized mutex Greg Kroah-Hartman
2019-04-24 17:10 ` [PATCH 4.9 26/44] staging: comedi: ni_usb6501: Fix possible double-free of ->usb_rx_buf Greg Kroah-Hartman
2019-04-24 17:10 ` [PATCH 4.9 27/44] ALSA: core: Fix card races between register and disconnect Greg Kroah-Hartman
2019-04-24 17:10 ` [PATCH 4.9 28/44] Revert "scsi: fcoe: clear FC_RP_STARTED flags when receiving a LOGO" Greg Kroah-Hartman
2019-04-24 17:10 ` [PATCH 4.9 29/44] Revert "svm: Fix AVIC incomplete IPI emulation" Greg Kroah-Hartman
2019-04-24 17:10 ` [PATCH 4.9 30/44] crypto: x86/poly1305 - fix overflow during partial reduction Greg Kroah-Hartman
2019-04-24 17:10 ` [PATCH 4.9 31/44] arm64: futex: Restore oldval initialization to work around buggy compilers Greg Kroah-Hartman
2019-04-24 17:10 ` [PATCH 4.9 32/44] x86/kprobes: Verify stack frame on kretprobe Greg Kroah-Hartman
2019-04-24 17:10 ` [PATCH 4.9 33/44] kprobes: Mark ftrace mcount handler functions nokprobe Greg Kroah-Hartman
2019-04-24 17:10 ` [PATCH 4.9 34/44] kprobes: Fix error check when reusing optimized probes Greg Kroah-Hartman
2019-04-24 17:10 ` [PATCH 4.9 35/44] rt2x00: do not increment sequence number while re-transmitting Greg Kroah-Hartman
2019-04-24 17:10 ` [PATCH 4.9 36/44] mac80211: do not call driver wake_tx_queue op during reconfig Greg Kroah-Hartman
2019-04-24 17:10 ` [PATCH 4.9 37/44] perf/x86/amd: Add event map for AMD Family 17h Greg Kroah-Hartman
2019-04-24 17:10 ` [PATCH 4.9 38/44] Revert "kbuild: use -Oz instead of -Os when using clang" Greg Kroah-Hartman
2019-04-24 17:10 ` Greg Kroah-Hartman [this message]
2019-04-24 17:10 ` [PATCH 4.9 40/44] device_cgroup: fix RCU imbalance in error case Greg Kroah-Hartman
2019-04-24 17:10 ` [PATCH 4.9 41/44] mm/vmstat.c: fix /proc/vmstat format for CONFIG_DEBUG_TLBFLUSH=y CONFIG_SMP=n Greg Kroah-Hartman
2019-04-24 17:10 ` [PATCH 4.9 42/44] ALSA: info: Fix racy addition/deletion of nodes Greg Kroah-Hartman
2019-04-24 17:10 ` [PATCH 4.9 43/44] percpu: stop printing kernel addresses Greg Kroah-Hartman
2019-04-24 17:10 ` [PATCH 4.9 44/44] i2c-hid: properly terminate i2c_hid_dmi_desc_override_table[] array Greg Kroah-Hartman
2019-04-24 21:25 ` [PATCH 4.9 00/44] 4.9.171-stable review kernelci.org bot
2019-04-25 5:24 ` Naresh Kamboju
2019-04-25 11:55 ` Jon Hunter
2019-04-25 16:26 ` shuah
2019-04-25 21:40 ` Guenter Roeck
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190424170900.992630680@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=anton@ozlabs.org \
--cc=bsegall@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=pauld@redhat.com \
--cc=peterz@infradead.org \
--cc=sashal@kernel.org \
--cc=stable@vger.kernel.org \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox