From: john stultz <johnstul@us.ibm.com>
To: Andrew Lutomirski <luto@mit.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org, pc@us.ibm.com
Subject: Re: [PATCH] Improve clocksource unstable warning
Date: Tue, 16 Nov 2010 17:54:10 -0800
Message-ID: <1289958850.3860.70.camel@localhost.localdomain>
In-Reply-To: <AANLkTimU63uKP1g4OB1wXMLJK85eNA=uTG+ZMbv2_NRv@mail.gmail.com>

On Tue, 2010-11-16 at 20:24 -0500, Andrew Lutomirski wrote:
> On Tue, Nov 16, 2010 at 8:19 PM, john stultz <johnstul@us.ibm.com> wrote:
> > On Tue, 2010-11-16 at 19:54 -0500, Andrew Lutomirski wrote:
> >> On Tue, Nov 16, 2010 at 7:26 PM, john stultz <johnstul@us.ibm.com> wrote:
> >> > I'm starting to think we should be pushing the watchdog check into the
> >> > timekeeping accumulation loop (or have it hang off of the accumulation
> >> > loop).
> >> >
> >> > 1) The clocksource cyc2ns conversion code is built with assumptions
> >> > linked to how frequently we accumulate time via update_wall_time().
> >> >
> >> > 2) update_wall_time() happens in timer irq context, so we don't have to
> >> > worry about being delayed. If an irq storm or something does actually
> >> > cause the timer irq to be delayed, we have bigger issues.
> >>
> >> That's why I hit this.  It would be nice if we didn't respond to irq
> >> storms by calling stop_machine.
> >
> > So even if we don't change clocksources, if you have a long enough
> > interrupt storm that delays the hard timer irq, such that the
> > clocksources wrap (or hit the mult overflow), your system time will be
> > lagging behind anyway. So that would be broken regardless of if the
> > watchdog kicked in or not.
> >
> > I suspect that even with such an irq storm, the timer irq will hopefully
> > be high enough priority to be serviced first, avoiding the accumulation
> > loss.
> >
> >
> >> > The only trouble with this, is that if we actually push the max_idle_ns
> >> > out to something like 10 seconds on the TSC, we could end up having the
> >> > watchdog clocksource wrapping while we're in nohz idle.  So that could
> >> > be ugly. Maybe if the current clocksource needs the watchdog
> >> > observations, we should cap the max_idle_ns to the smaller of the
> >> > current clocksource and the watchdog clocksource.
> >> >
> >>
> >> What would you think about implementing non-overflowing
> >> clocksource_cyc2ns on architectures that can do it efficiently?  You'd
> >> have to artificially limit the mask to 2^64 / (rate in GHz), rounded
> >> down to a power of 2, but that shouldn't be a problem for any sensible
> >> clocksource.
> >
> > You would run into accuracy issues. The reason why we use large
> > mult/shift pairs for timekeeping is because we need to make very fine
> > grained adjustments to steer the clock (also just the freq accuracy can
> > be poor if you use too low a shift value in the cyc2ns conversions).
> >
> 
> Why would it be any worse than right now?  We could keep shift as high
> as 32 (or even higher) and use the exact same logic as we use now.

Oh. My apologies, I thought you were suggesting dropping shift down so
that the 64-bit mult doesn't overflow, not using a 128-bit mult to avoid
the issue altogether.
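
To make the accuracy concern concrete, here is a rough userspace sketch
(not kernel code) of how the choice of shift limits how precisely mult
can represent a counter frequency. The helper names mult_for() and
ns_per_second_of_cycles() and the 3 GHz frequency are made up for
illustration; unsigned __int128 is a GCC/Clang extension that is only
available on 64-bit targets.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: round-to-nearest mult for a counter running at
 * freq_hz, so that ns = (cycles * mult) >> shift.
 */
static uint32_t mult_for(uint64_t freq_hz, uint32_t shift)
{
	uint64_t m;

	assert(freq_hz != 0 && shift < 32);
	/* 10^9 < 2^30, so (10^9 << shift) cannot overflow for shift < 32. */
	m = ((UINT64_C(1000000000) << shift) + freq_hz / 2) / freq_hz;
	assert(m <= UINT32_MAX);
	return (uint32_t)m;
}

/*
 * Hypothetical helper: nanoseconds reported for exactly one second's
 * worth of cycles. The wide multiply keeps the product from wrapping,
 * so any error comes only from the rounding of mult.
 */
static uint64_t ns_per_second_of_cycles(uint64_t freq_hz, uint32_t shift)
{
	uint32_t mult = mult_for(freq_hz, shift);

	return (uint64_t)(((unsigned __int128)freq_hz * mult) >> shift);
}
```

With a 3 GHz counter, shift = 8 gives mult = 85 and one second of cycles
converts to 996093750 ns, roughly 3.9 ms short. With shift = 24 the same
second converts to 999999940 ns, about 60 ns short. The higher shift
also leaves much finer steps for steering the clock.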

> gcc compiles this code:
> 
> uint64_t mul_64_32_shift(uint64_t a, uint32_t mult, uint32_t shift)
> {
> #if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 5)
>   if (shift >= 32)
>     __builtin_unreachable();
> #endif
>   return (uint64_t)( ((__uint128_t)a * (__uint128_t)mult) >> shift );
> }
> 
> To:
> 
>    0:   89 f0                   mov    %esi,%eax
>    2:   89 d1                   mov    %edx,%ecx
>    4:   48 f7 e7                mul    %rdi
>    7:   48 0f ad d0             shrd   %cl,%rdx,%rax
>    b:   48 d3 ea                shr    %cl,%rdx
>    e:   f6 c1 40                test   $0x40,%cl
>   11:   48 0f 45 c2             cmovne %rdx,%rax
>   15:   c3                      retq
> 
> And if the compiler were a little smarter, it would generate:
> 
> mov    %esi,%eax
> mov    %edx,%ecx
> mul    %rdi
> shrd   %cl,%rdx,%rax
> retq
> 
> So it would be essentially free.

So yes, on 64-bit systems it won't be so bad, but again, I'm a bit
worried about overhead on 32-bit systems, as clocksource_cyc2ns is in
the gettimeofday hot path for quite a lot of applications.

But it is an interesting thought.

And something like the following could avoid the overhead most of the
time:

	if (unlikely(delta > cs->max_mult64_cycles))
		return cyc2ns128(delta, cs->mult, cs->shift);
	return cyc2ns64(delta, cs->mult, cs->shift);

Here we would optimize the mult/shift pair for the likely maximum nohz
idle interval, but still allow deeper sleeps without problems.
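
A slightly fuller userspace sketch of that idea, under some assumptions:
struct cs_sketch and cs_sketch_init() are hypothetical stand-ins (the
real struct clocksource has no max_mult64_cycles field), the threshold
is simply taken as UINT64_MAX / mult, __builtin_expect stands in for
the kernel's unlikely(), and unsigned __int128 is a GCC/Clang extension
on 64-bit targets. A 32-bit build would need a hand-written wide
multiply for the slow path.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant clocksource fields. */
struct cs_sketch {
	uint32_t mult;
	uint32_t shift;
	uint64_t max_mult64_cycles;	/* largest delta where delta * mult fits in 64 bits */
};

/* Fast path: plain 64-bit multiply; the product wraps if delta is too large. */
static inline uint64_t cyc2ns64(uint64_t delta, uint32_t mult, uint32_t shift)
{
	return (delta * mult) >> shift;
}

/* Slow path: 128-bit intermediate product, so the multiply cannot wrap. */
static inline uint64_t cyc2ns128(uint64_t delta, uint32_t mult, uint32_t shift)
{
	return (uint64_t)(((unsigned __int128)delta * mult) >> shift);
}

static void cs_sketch_init(struct cs_sketch *cs, uint32_t mult, uint32_t shift)
{
	assert(mult != 0 && shift < 64);
	cs->mult = mult;
	cs->shift = shift;
	/* Assumed threshold: any delta above this overflows the 64-bit product. */
	cs->max_mult64_cycles = UINT64_MAX / mult;
}

static uint64_t cyc2ns(const struct cs_sketch *cs, uint64_t delta)
{
	/* Large deltas (e.g. after a long idle) are expected to be rare. */
	if (__builtin_expect(delta > cs->max_mult64_cycles, 0))
		return cyc2ns128(delta, cs->mult, cs->shift);
	return cyc2ns64(delta, cs->mult, cs->shift);
}
```

For example, with mult = 1 << 22 and shift = 22 (one cycle per
nanosecond), the threshold is 2^42 - 1 cycles. A delta of 2^43 takes the
128-bit path and converts correctly, whereas cyc2ns64() alone would wrap
and return 0 for it.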

-john


