From: Phil Auld <pauld@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, Ben Segall <bsegall@google.com>,
	Ingo Molnar <mingo@redhat.com>
Subject: Re: [PATCH] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
Date: Fri, 15 Mar 2019 09:51:25 -0400	[thread overview]
Message-ID: <20190315135124.GC27131@pauld.bos.csb> (raw)
In-Reply-To: <20190315103357.GC6521@hirez.programming.kicks-ass.net>

On Fri, Mar 15, 2019 at 11:33:57AM +0100 Peter Zijlstra wrote:
> On Fri, Mar 15, 2019 at 11:11:50AM +0100, Peter Zijlstra wrote:
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index ea74d43924b2..b71557be6b42 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -4885,6 +4885,8 @@ static enum hrtimer_restart sched_cfs_slack_timer(struct hrtimer *timer)
> >  	return HRTIMER_NORESTART;
> >  }
> >  
> > +extern const u64 max_cfs_quota_period;
> > +
> >  static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
> >  {
> >  	struct cfs_bandwidth *cfs_b =
> > @@ -4892,6 +4894,7 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
> >  	unsigned long flags;
> >  	int overrun;
> >  	int idle = 0;
> > +	int count = 0;
> >  
> >  	raw_spin_lock_irqsave(&cfs_b->lock, flags);
> >  	for (;;) {
> > @@ -4899,6 +4902,28 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
> >  		if (!overrun)
> >  			break;
> >  
> > +		if (++count > 3) {
> > +			u64 new, old = ktime_to_ns(cfs_b->period);
> > +
> > +			new = (old * 147) / 128; /* ~115% */
> > +			new = min(new, max_cfs_quota_period);
> 
> Also, we can still engineer things to come unstuck; if we explicitly
> configure period at 1e9 and then set a really small quota and then
> create this insane amount of cgroups you have..
> 
> this code has no room left to manoeuvre.
> 
> Do we want to do anything about that? Or leave it as is, don't do that
> then?
> 

If the period is 1s it would be hard to make this loop fire repeatedly. I don't think
it's that dependent on the quota, beyond needing some rqs throttled. A small quota
would also mean fewer of them get unthrottled per distribute call. You'd probably
need _significantly_ more cgroups than my insane 2500 to hit it.

Right now it settles out with a new period of ~12-15ms.  So ~200,000 cgroups?

Ben and I talked a little about this in another thread. I think hitting this is enough of
an edge case that this approach will make the problem go away. The only alternative we
came up with to reduce the time spent in unthrottle involved a fair bit of complexity
added to the everyday code paths, and it might not help if the children all had their
own quota/period settings active.

Thoughts?


Cheers,
Phil



> > +
> > +			cfs_b->period = ns_to_ktime(new);
> > +
> > +			/* since max is 1s, this is limited to 1e9^2, which fits in u64 */
> > +			cfs_b->quota *= new;
> > +			cfs_b->quota /= old;
> > +
> > +			pr_warn_ratelimited(
> > +	"cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us %lld, cfs_quota_us = %lld)\n",
> > +				smp_processor_id(),
> > +				new/NSEC_PER_USEC,
> > +				cfs_b->quota/NSEC_PER_USEC);
> > +
> > +			/* reset count so we don't come right back in here */
> > +			count = 0;
> > +		}
> > +
> >  		idle = do_sched_cfs_period_timer(cfs_b, overrun, flags);
> >  	}
> >  	if (idle)

-- 
