From: Preeti U Murthy <preeti@linux.vnet.ibm.com>
To: Santosh Shukla <santosh.shukla@linaro.org>,
Thomas Gleixner <tglx@linutronix.de>,
fweisbec@gmail.com
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
linaro-kernel <linaro-kernel@lists.linaro.org>,
daniel.lezcano@linaro.org, Kevin Hilman <khilman@linaro.org>,
Viresh Kumar <viresh.kumar@linaro.org>,
Steven Rostedt <rostedt@goodmis.org>
Subject: Re: [Query] Spurious interrupts from clockevent device on X86 Ivybridge
Date: Wed, 10 Dec 2014 18:03:58 +0530 [thread overview]
Message-ID: <54883DB6.10504@linux.vnet.ibm.com> (raw)
In-Reply-To: <CA+iXiiOhPxvJKR5-9U_9Hd6mOnXJadBS+WRZhtv3yPw2r8KgJQ@mail.gmail.com>
Hi Santhosh,
On 12/10/2014 11:00 AM, Santosh Shukla wrote:
> Hi Guys,
>
> I have been chasing spurious clockevent interrupts (In HIGHRES mode) on my x86
> IvyBridge server and stuck on some blockers (mostly due to lack of knowledge)
> and need some help..
>
> What does spurious mean ?
> I take it as the case when a core is unnecessarily interrupted by the clockevent
> device while we didn't wanted it to.
Right. We get an interrupt when nobody had asked for it to be delivered
or had asked for it to be delivered and later canceled the request. It
is most often in the latter situation, that there can be race
conditions. If these race conditions are not taken care of, they can
result in spurious interrupts.
>
> I tried to catch them with the help of few new trace points in
> hrtimer_interrupt() when it didn't serviced any pending hrtimers..
>
> The causes I could find:
> - Clockevent device's counter overflow: When we need to program a clockevt
> device for times larger than it can handle, we program it for max time and
> then reprogram again on expiration. Because of hardware counter limitations we
> have to do it this way and this isn't really a spurious interrupt as we
> programmed it that way.
Yes this is a possible cause for spurious interrupts. On PowerPC we
check for such a situation specifically before we begin to handle timers.
>
> - Hardware timer firing before its expiration: Yes, its hard to digest but from
> my tests it looks like this does happen. Will circulate another mail once I am
> sure about it.
Yep, will need more information if at all this is happening.
>
> - Difference between time returned by hrtimer_update_base() and clockevt
> device's notion of time: Wanted to discuss this here..
>
>
> An example of what I am witnessing in my tests:
>
> clockevt device programmed to fire after 199 ms and it fires exactly after that
> time (Verified with ftrace time-diff and tsc diff [tsc: timestamp counter
> instructions for x86]).
>
> But when hrtimer_interrupt() tries to find the hrtimer for which the event
> occurred, it finds the value returned by hrtimer_update_base() being lesser than
> the softexpires of the timer for which the event was programmed. And the diff is
> approximately 1 us most of the times I see the spurious interrupt.
>
> <idle>-0 [010] d... 68.961171: clk_evt_program: delta:199933679
> expires=69071000000 /* programmed for 199 ms */
> <idle>-0 [010] d... 68.961171: lapic_next_deadline:
> cur_tsc:365799488705427 new_tsc:365799907536051 delta:199443 usec /*
> sets new_tsc */
> <idle>-0 [010] d.h. 69.161183: local_timer_entry: vector=239
> <idle>-0 [010] d.h. 69.161185: local_apic_timer_interrupt:
> cur_tsc:365799907547595 delta:199448 usec /* Interrupt arrived
> afterexactly 199 ms */
> <idle>-0 [010] d.h. 69.161187:
> hrtimer_interrupt:function=tick_sched_timer expires=69071000000
> softexpires=69071000000
> basenow:69070997234 /* But Softexpires > basenow ?? */
> <idle>-0 [010] d.h. 69.161188: clk_evt_program: delta:1000
> expires=69071000000 /* new expiry value (1 us) */
> <idle>-0 [010] d.h. 69.161189: lapic_next_deadline:
> cur_tsc:365799907556179 new_tsc:365799907558259 delta:0 usec /* sets
> new tsc value */
> <idle>-0 [010] d.h. 69.161189: local_timer_exit: vector=239
Since the difference is 1us and not a noticeably high value, it is most
probably because during hrtimer handling, we traverse all queued timers
and call their handlers, till we find timers whose expiry is in the
future. I would not be surprised if we overshoot the expiry of the
timers at the end of the list by a microsecond by the time we call their
handlers.
Regards
Preeti U Murthy
>
> Tested on: Ivybrdge V2, 12 core X86 server, though it occurs on ARM as well
> (but not that frequent).
>
> Regards,
> Santosh
>
next prev parent reply other threads:[~2014-12-10 12:34 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-12-10 5:30 [Query] Spurious interrupts from clockevent device on X86 Ivybridge Santosh Shukla
2014-12-10 12:33 ` Preeti U Murthy [this message]
2014-12-10 12:52 ` Viresh Kumar
2014-12-11 4:44 ` Preeti U Murthy
2014-12-11 4:56 ` Santosh Shukla
2014-12-12 11:47 ` Preeti U Murthy
2014-12-11 5:18 ` Viresh Kumar
2014-12-10 14:59 ` Thomas Gleixner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=54883DB6.10504@linux.vnet.ibm.com \
--to=preeti@linux.vnet.ibm.com \
--cc=daniel.lezcano@linaro.org \
--cc=fweisbec@gmail.com \
--cc=khilman@linaro.org \
--cc=linaro-kernel@lists.linaro.org \
--cc=linux-kernel@vger.kernel.org \
--cc=rostedt@goodmis.org \
--cc=santosh.shukla@linaro.org \
--cc=tglx@linutronix.de \
--cc=viresh.kumar@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox