Re: [PATCH] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Phil Auld <pauld@redhat.com>
To: bsegall@google.com
Cc: Peter Zijlstra <peterz@infradead.org>,
	linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>
Subject: Re: [PATCH] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
Date: Mon, 18 Mar 2019 13:52:29 -0400	[thread overview]
Message-ID: <20190318175229.GB15377@pauld.bos.csb> (raw)
In-Reply-To: <xm264l80xfyp.fsf@bsegall-linux.svl.corp.google.com>

On Mon, Mar 18, 2019 at 10:14:22AM -0700 bsegall@google.com wrote:
> Phil Auld <pauld@redhat.com> writes:
> 
> > On Fri, Mar 15, 2019 at 05:03:47PM +0100 Peter Zijlstra wrote:
> >> On Fri, Mar 15, 2019 at 11:30:42AM -0400, Phil Auld wrote:
> >> 
> >> >> I'll rework the maths in the averaged version and post v2 if that makes sense.
> >> > 
> >> > It may have the extra timer fetch, although maybe I could rework it so that it used the 
> >> > nsstart time the first time and did not need to do it twice in a row. I had originally
> >> > reverted the hrtimer_forward_now() to hrtimer_forward() but put that back. 
> >> 
> >> Sure; but remember, simpler is often better, esp. for code that
> >> typically 'never' runs.
> >
> > I reworked it to the below. This settles a bit faster. The average is sort of squishy because
> > it's 3 samples divided by 4.  And if we stay in a single call after updating the period the "average"
> > will be even less accurate. 
> >
> > It settles at a larger value faster so produces fewer messages and none of the callback supressed ones.
> > The added complexity may not be worth it, though.
> >
> > I think this or your version, either one, would work.  
> >
> > What needs to happen now to get one of them to land somewhere? Should I just repost one with my 
> > signed-off and let you add whatever other tags?  And if so do you have a preference for which one?  
> >
> > Also, Ben, thoughts?
> 
> It would probably make sense to have it just be ++count > 4 then I
> think? But otherwise yeah, I'm fine with either.
> 

That would certainly work, of course :)

I liked the 3 to catch it as soon as possible but in the end one more loop won't likely matter, 
and it may prevent more since we are more likely to have it right. 


Cheers,
Phil 

> >
> > Cheers,
> > Phil
> >
> > --
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index ea74d43924b2..297fd228fdb0 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -4885,6 +4885,8 @@ static enum hrtimer_restart sched_cfs_slack_timer(struct hrtimer *timer)
> >  	return HRTIMER_NORESTART;
> >  }
> >  
> > +extern const u64 max_cfs_quota_period;
> > +
> >  static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
> >  {
> >  	struct cfs_bandwidth *cfs_b =
> > @@ -4892,14 +4894,46 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
> >  	unsigned long flags;
> >  	int overrun;
> >  	int idle = 0;
> > +	int count = 0;
> > +	u64 start, now;
> >  
> >  	raw_spin_lock_irqsave(&cfs_b->lock, flags);
> > +	now = start = ktime_to_ns(hrtimer_cb_get_time(timer));
> >  	for (;;) {
> > -		overrun = hrtimer_forward_now(timer, cfs_b->period);
> > +		overrun = hrtimer_forward(timer, now, cfs_b->period);
> >  		if (!overrun)
> >  			break;
> >  
> > +		if (++count > 3) {
> > +			u64 new, old = ktime_to_ns(cfs_b->period);
> > +
> > +                        /* rough average of the time each loop is taking
> > +			  * really should be (n-s)/3 but this is easier for the machine
> > +			  */
> > +			new = (now - start) >> 2; 
> > +			if (new < old)
> > +				new = old;
> > +			new = (new * 147) / 128; /* ~115% */
> > +			new = min(new, max_cfs_quota_period);
> > +
> > +			cfs_b->period = ns_to_ktime(new);
> > +
> > +			/* since max is 1s, this is limited to 1e9^2, which fits in u64 */
> > +			cfs_b->quota *= new;
> > +			cfs_b->quota /= old;
> > +
> > +			pr_warn_ratelimited(
> > +	"cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us %lld, cfs_quota_us = %lld)\n",
> > +				smp_processor_id(),
> > +				new/NSEC_PER_USEC,
> > +				cfs_b->quota/NSEC_PER_USEC);
> > +
> > +			/* reset count so we don't come right back in here */
> > +			count = 0;
> > +		}
> > +
> >  		idle = do_sched_cfs_period_timer(cfs_b, overrun, flags);
> > +		now = ktime_to_ns(hrtimer_cb_get_time(timer));
> >  	}
> >  	if (idle)
> >  		cfs_b->period_active = 0;

--

     prev parent reply	other threads:[~2019-03-18 17:52 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-13 15:08 [PATCH] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup Phil Auld
2019-03-15 10:11 ` Peter Zijlstra
2019-03-15 10:33   ` Peter Zijlstra
2019-03-15 13:51     ` Phil Auld
2019-03-15 15:59       ` Peter Zijlstra
2019-03-15 16:19         ` Phil Auld
2019-03-15 13:30   ` Phil Auld
2019-03-15 16:29     ` Peter Zijlstra
2019-03-15 15:30   ` Phil Auld
2019-03-15 16:03     ` Peter Zijlstra
2019-03-15 16:17       ` Phil Auld
2019-03-18 13:29       ` Phil Auld
2019-03-18 17:14         ` bsegall
2019-03-18 17:52           ` Phil Auld [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190318175229.GB15377@pauld.bos.csb \
    --to=pauld@redhat.com \
    --cc=bsegall@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox