* hrtimer_interrupt time sync issues across cores
@ 2017-12-14  7:01 Rajasekaran Chandrasekaran
  2017-12-14  7:47 ` Greg KH
  2017-12-14 17:02 ` valdis.kletnieks at vt.edu
  0 siblings, 2 replies; 3+ messages in thread
From: Rajasekaran Chandrasekaran @ 2017-12-14  7:01 UTC (permalink / raw)
  To: kernelnewbies

Hi,


On our multi-core x86 system running kernel 3.4.19, hrtimer_interrupt
(called from apic_timer_interrupt) occasionally keeps looping in hardirq
context for at least 1.6 seconds. We use the TSC as our clock source. The
issue occurs very rarely and is hard to reproduce.



Problem:

Inside hrtimer_interrupt, basenow.tv64 on CPU-3 is 1.6 seconds ahead of
the other CPUs (we have 4 cores), whereas hrtimer->_softexpires.tv64 is
in sync with the remaining CPUs. As a result, the check
basenow.tv64 < hrtimer_get_softexpires_tv64(timer) inside
hrtimer_interrupt stays false for 1.6 seconds, so the while loop inside
hrtimer_interrupt does not exit. Below is the ftrace output captured
during the problem.



<idle>-0     [002] d.h. 800364.533632: hrtimer_expire_entry:
hrtimer=ffff88017fd0c960 function=tick_sched_timer now=801616439840902

ksoftirqd/3-19    [003] dNh. 800364.539178: hrtimer_expire_entry:
hrtimer=ffff88017fd8c960 function=tick_sched_timer now=801618042768641

ksoftirqd/3-19    [003] dNh. 800364.539185: hrtimer_start:
hrtimer=ffff88017fd8c960 function=tick_sched_timer expires=801616446505014
softexpires=801616446505014
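
To make the loop behaviour described above the trace concrete, here is a toy user-space model in Python. It is not the kernel code: the function name, the abstract time units, and the assumption that the callback re-arms the timer one period past a clock that is not affected by the jump are all mine, for illustration only. The only part taken from the description is the exit test (basenow < softexpires).

```python
def runs_before_exit(basenow, softexpires, clock, period, callback_cost):
    """Count callback runs until the exit test (basenow < softexpires) holds.

    Toy model, not kernel code. basenow is sampled once and stays fixed
    while the loop runs; 'clock' stands for a time source that is not
    affected by the jump and advances by callback_cost per callback.
    """
    runs = 0
    while not (basenow < softexpires):
        runs += 1
        clock += callback_cost          # time spent in the callback
        softexpires = clock + period    # assumed re-arm: one period past 'clock'
    return runs


# basenow far ahead of the re-arm clock: the timer keeps looking expired.
stale_runs = runs_before_exit(basenow=1600, softexpires=0, clock=0,
                              period=4, callback_cost=1)

# basenow in step with the re-arm clock: one run, then the loop exits.
sane_runs = runs_before_exit(basenow=0, softexpires=0, clock=0,
                             period=4, callback_cost=1)
```

Under these assumptions the loop only exits once the re-arm clock has caught up with the stale basenow, which would be consistent with the time spent in hardirq being about the same as the size of the jump.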



As we can see, the difference in the now value between CPU-2 and CPU-3
(where the time jump is seen) is significant. Ftrace shows that now on
CPU-3 is ahead by about 1602 milliseconds, even though the trace
timestamps are only about 6 milliseconds apart. Also, since the hrtimer
expiry time is in the past relative to now on CPU-3, we end up spending a
lot of time in hardirq. From my understanding of the code, basenow.tv64
is computed in hrtimer_update_base() -> ktime_get_update_offsets() as
timekeeper.xtime - offs_real. Both timekeeper.xtime and offs_real are
always updated under a lock, so I am still unsure how only one core is
seeing the wrong time.
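
For reference, the figures quoted above can be re-derived from the trace lines with a few lines of Python. All values are copied from the ftrace excerpt; the variable and helper names are my own, and I am assuming that now and expires are in nanoseconds and that the trace timestamp is seconds with microsecond resolution.

```python
# Values copied verbatim from the ftrace excerpt.
NOW_CPU2_NS = 801616439840902       # now= on [002]
NOW_CPU3_NS = 801618042768641       # now= on [003]
EXPIRES_CPU3_NS = 801616446505014   # expires= in hrtimer_start on [003]
TS_CPU2 = "800364.533632"           # trace timestamp on [002]
TS_CPU3 = "800364.539178"           # trace timestamp on [003]


def ts_to_ns(ts):
    """Convert a 'seconds.microseconds' trace timestamp to nanoseconds."""
    sec, usec = ts.split(".")
    return int(sec) * 1_000_000_000 + int(usec) * 1_000


now_skew_ns = NOW_CPU3_NS - NOW_CPU2_NS             # gap between the two now values
ts_delta_ns = ts_to_ns(TS_CPU3) - ts_to_ns(TS_CPU2) # gap between trace timestamps
expiry_lag_ns = NOW_CPU3_NS - EXPIRES_CPU3_NS       # how far expires is behind now

print(f"now skew:   {now_skew_ns / 1e6:.3f} ms")
print(f"ts delta:   {ts_delta_ns / 1e6:.3f} ms")
print(f"expiry lag: {expiry_lag_ns / 1e6:.3f} ms")
```

This gives a now skew of about 1602.9 ms against a trace timestamp delta of about 5.5 ms, and shows the re-armed expiry on CPU-3 sitting about 1596.3 ms behind that CPU's now.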


Any input would be a great help.



Thanks,

Raj
