From: john stultz <johnstul@us.ibm.com>
To: Andrew Lutomirski <luto@mit.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>,
linux-kernel@vger.kernel.org, pc@us.ibm.com
Subject: Re: [PATCH] Improve clocksource unstable warning
Date: Tue, 16 Nov 2010 17:54:10 -0800 [thread overview]
Message-ID: <1289958850.3860.70.camel@localhost.localdomain> (raw)
In-Reply-To: <AANLkTimU63uKP1g4OB1wXMLJK85eNA=uTG+ZMbv2_NRv@mail.gmail.com>
On Tue, 2010-11-16 at 20:24 -0500, Andrew Lutomirski wrote:
> On Tue, Nov 16, 2010 at 8:19 PM, john stultz <johnstul@us.ibm.com> wrote:
> > On Tue, 2010-11-16 at 19:54 -0500, Andrew Lutomirski wrote:
> >> On Tue, Nov 16, 2010 at 7:26 PM, john stultz <johnstul@us.ibm.com> wrote:
> >> > I'm starting to think we should be pushing the watchdog check into the
> >> > timekeeping accumulation loop (or have it hang off of the accumulation
> >> > loop).
> >> >
> >> > 1) The clocksource cyc2ns conversion code is built with assumptions
> >> > linked to how frequently we accumulate time via update_wall_time().
> >> >
> >> > 2) update_wall_time() happens in timer irq context, so we don't have to
> >> > worry about being delayed. If an irq storm or something does actually
> >> > cause the timer irq to be delayed, we have bigger issues.
> >>
> >> That's why I hit this. It would be nice if we didn't respond to irq
> >> storms by calling stop_machine.
> >
> > So even if we don't change clocksources, if you have a long enough
> > interrupt storm that delays the hard timer irq, such that the
> > clocksources wrap (or hit the mult overflow), your system time will be
> > lagging behind anyway. So that would be broken regardless of if the
> > watchdog kicked in or not.
> >
> > I suspect that even with such an irq storm, the timer irq will hopefully
> > be high enough priority to be serviced first, avoiding the accumulation
> > loss.
> >
> >
> >> > The only trouble with this, is that if we actually push the max_idle_ns
> >> > out to something like 10 seconds on the TSC, we could end up having the
> >> > watchdog clocksource wrapping while we're in nohz idle. So that could
> >> > be ugly. Maybe if the current clocksource needs the watchdog
> >> > observations, we should cap the max_idle_ns to the smaller of the
> >> > current clocksource and the watchdog clocksource.
> >> >
> >>
> >> What would you think about implementing non-overflowing
> >> clocksource_cyc2ns on architectures that can do it efficiently? You'd
> >> have to artificially limit the mask to 2^64 / (rate in GHz), rounded
> >> down to a power of 2, but that shouldn't be a problem for any sensible
> >> clocksource.
> >
> > You would run into accuracy issues. The reason why we use large
> > mult/shift pairs for timekeeping is because we need to make very fine
> > grained adjustments to steer the clock (also just the freq accuracy can
> > be poor if you use too low a shift value in the cyc2ns conversions).
> >
>
> Why would it be any worse than right now? We could keep shift as high
> as 32 (or even higher) and use the exact same logic as we use now.
Oh. My apologies, I thought you were suggesting to drop shift down, so
the 64bit mult doesn't overflow, not using a 128 bit mult to just avoid
that issue.
> gcc compiles this code:
>
> uint64_t mul_64_32_shift(uint64_t a, uint32_t mult, uint32_t shift)
> {
> #if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 5)
> if (shift >= 32)
> __builtin_unreachable();
> #endif
> return (uint64_t)( ((__uint128_t)a * (__uint128_t)mult) >> shift );
> }
>
> To:
>
> 0: 89 f0 mov %esi,%eax
> 2: 89 d1 mov %edx,%ecx
> 4: 48 f7 e7 mul %rdi
> 7: 48 0f ad d0 shrd %cl,%rdx,%rax
> b: 48 d3 ea shr %cl,%rdx
> e: f6 c1 40 test $0x40,%cl
> 11: 48 0f 45 c2 cmovne %rdx,%rax
> 15: c3 retq
>
> And if the compiler were a little smarter, it would generate:
>
> mov %esi,%eax
> mov %edx,%ecx
> mul %rdi
> shrd %cl,%rdx,%rax
> retq
>
> So it would be essentially free.
So yes, on 64bit systems it won't be so bad, but again, I'm worried a
bit about overhead on 32bit systems, as clocksource_cyc2ns is in the
gettimeofday hot path for a quite a lot of applications.
But it is an interesting thought.
And something like the following could avoid the overhead most of the
time.
if(unlikely(delta > cs->max_mult64_cycles))
return cyc2ns128(delta, cs->mult, cs->shift);
return cyc2ns64(delta, cs->mult, cs->shift);
Where we optimize mult/shift pair so for the likely max nohz time
interval, but allow deeper sleeps without problems.
-john
next prev parent reply other threads:[~2010-11-17 1:54 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-11-10 22:16 [PATCH] Improve clocksource unstable warning Andy Lutomirski
2010-11-10 22:28 ` Thomas Gleixner
2010-11-10 22:42 ` [PATCH v2] " Andy Lutomirski
2010-11-12 21:31 ` [PATCH] " john stultz
2010-11-12 21:51 ` john stultz
2010-11-12 21:52 ` Andrew Lutomirski
2010-11-12 23:40 ` john stultz
2010-11-12 23:48 ` Andrew Lutomirski
2010-11-12 23:51 ` Andrew Lutomirski
2010-11-13 0:22 ` john stultz
2010-11-13 0:58 ` john stultz
2010-11-17 0:05 ` Andrew Lutomirski
2010-11-17 0:26 ` john stultz
2010-11-17 0:54 ` Andrew Lutomirski
2010-11-17 1:19 ` john stultz
2010-11-17 1:24 ` Andrew Lutomirski
2010-11-17 1:54 ` john stultz [this message]
2010-11-12 23:52 ` john stultz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1289958850.3860.70.camel@localhost.localdomain \
--to=johnstul@us.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@mit.edu \
--cc=pc@us.ibm.com \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.