* [PATCH] smp: print only local CPU info when sched_clock goes backward
@ 2024-07-15 17:49 Rik van Riel

From: Rik van Riel @ 2024-07-15 17:49 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Peter Zijlstra (Intel), linux-kernel, kernel-team, neeraj.upadhyay,
    mingo, peterz, rostedt, Leonardo Bras

About 40% of all csd_lock warnings observed in our fleet appear to
be due to sched_clock() going backward in time (usually only a little
bit), resulting in ts0 being larger than ts2.

When the local CPU is at fault, we should print out a message reflecting
that, rather than trying to get the remote CPU's stack trace.

Signed-off-by: Rik van Riel <riel@surriel.com>
---
 kernel/smp.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel/smp.c b/kernel/smp.c
index f085ebcdf9e7..5656ef63ea82 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -237,6 +237,14 @@ static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, in
 	if (likely(ts_delta <= csd_lock_timeout_ns || csd_lock_timeout_ns == 0))
 		return false;
 
+	if (ts0 > ts2) {
+		/* Our own sched_clock went backward; don't blame another CPU. */
+		ts_delta = ts0 - ts2;
+		pr_alert("sched_clock on CPU %d went backward by %llu ns\n", raw_smp_processor_id(), ts_delta);
+		*ts1 = ts2;
+		return false;
+	}
+
 	firsttime = !*bug_id;
 	if (firsttime)
 		*bug_id = atomic_inc_return(&csd_bug_count);
-- 
2.45.2
* Re: [PATCH] smp: print only local CPU info when sched_clock goes backward
@ 2024-07-15 18:07 Paul E. McKenney

From: Paul E. McKenney @ 2024-07-15 18:07 UTC (permalink / raw)
To: Rik van Riel
Cc: Peter Zijlstra (Intel), linux-kernel, kernel-team, neeraj.upadhyay,
    mingo, rostedt, Leonardo Bras

On Mon, Jul 15, 2024 at 01:49:41PM -0400, Rik van Riel wrote:
> About 40% of all csd_lock warnings observed in our fleet appear to
> be due to sched_clock() going backward in time (usually only a little
> bit), resulting in ts0 being larger than ts2.
> 
> When the local CPU is at fault, we should print out a message reflecting
> that, rather than trying to get the remote CPU's stack trace.
> 
> Signed-off-by: Rik van Riel <riel@surriel.com>

Tested-by: Paul E. McKenney <paulmck@kernel.org>

> ---
>  kernel/smp.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/kernel/smp.c b/kernel/smp.c
> index f085ebcdf9e7..5656ef63ea82 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -237,6 +237,14 @@ static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, in
>  	if (likely(ts_delta <= csd_lock_timeout_ns || csd_lock_timeout_ns == 0))
>  		return false;
>  
> +	if (ts0 > ts2) {
> +		/* Our own sched_clock went backward; don't blame another CPU. */
> +		ts_delta = ts0 - ts2;
> +		pr_alert("sched_clock on CPU %d went backward by %llu ns\n", raw_smp_processor_id(), ts_delta);
> +		*ts1 = ts2;
> +		return false;
> +	}
> +
>  	firsttime = !*bug_id;
>  	if (firsttime)
>  		*bug_id = atomic_inc_return(&csd_bug_count);
> -- 
> 2.45.2
* Re: [PATCH] smp: print only local CPU info when sched_clock goes backward
@ 2024-07-24 17:19 Neeraj Upadhyay

From: Neeraj Upadhyay @ 2024-07-24 17:19 UTC (permalink / raw)
To: riel
Cc: Paul E. McKenney, linux-kernel, kernel-team, neeraj.upadhyay,
    mingo, rostedt, Leonardo Bras

On Mon, Jul 15, 2024 at 11:07:30AM -0700, Paul E. McKenney wrote:
> On Mon, Jul 15, 2024 at 01:49:41PM -0400, Rik van Riel wrote:
> > About 40% of all csd_lock warnings observed in our fleet appear to
> > be due to sched_clock() going backward in time (usually only a little
> > bit), resulting in ts0 being larger than ts2.
> > 
> > When the local CPU is at fault, we should print out a message reflecting
> > that, rather than trying to get the remote CPU's stack trace.
> > 
> > Signed-off-by: Rik van Riel <riel@surriel.com>
> 
> Tested-by: Paul E. McKenney <paulmck@kernel.org>
> 

I have included this patch as part of the CSD-lock diagnostics series,
which has been submitted for review and is planned for v6.12 [1]. I have
also included it in the RCU tree [2] for more testing.

[1] https://lore.kernel.org/lkml/20240722133559.GA667117@neeraj.linux/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/neeraj.upadhyay/linux-rcu.git/log/?h=next

- Neeraj

> > ---
> >  kernel/smp.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/kernel/smp.c b/kernel/smp.c
> > index f085ebcdf9e7..5656ef63ea82 100644
> > --- a/kernel/smp.c
> > +++ b/kernel/smp.c
> > @@ -237,6 +237,14 @@ static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, in
> >  	if (likely(ts_delta <= csd_lock_timeout_ns || csd_lock_timeout_ns == 0))
> >  		return false;
> >  
> > +	if (ts0 > ts2) {
> > +		/* Our own sched_clock went backward; don't blame another CPU. */
> > +		ts_delta = ts0 - ts2;
> > +		pr_alert("sched_clock on CPU %d went backward by %llu ns\n", raw_smp_processor_id(), ts_delta);
> > +		*ts1 = ts2;
> > +		return false;
> > +	}
> > +
> >  	firsttime = !*bug_id;
> >  	if (firsttime)
> >  		*bug_id = atomic_inc_return(&csd_bug_count);
> > -- 
> > 2.45.2
* Re: [PATCH] smp: print only local CPU info when sched_clock goes backward
@ 2024-07-16  9:04 Peter Zijlstra

From: Peter Zijlstra @ 2024-07-16 9:04 UTC (permalink / raw)
To: Rik van Riel
Cc: Paul E. McKenney, linux-kernel, kernel-team, neeraj.upadhyay,
    mingo, rostedt, Leonardo Bras

On Mon, Jul 15, 2024 at 01:49:41PM -0400, Rik van Riel wrote:
> About 40% of all csd_lock warnings observed in our fleet appear to
> be due to sched_clock() going backward in time (usually only a little
> bit), resulting in ts0 being larger than ts2.
> 
> When the local CPU is at fault, we should print out a message reflecting
> that, rather than trying to get the remote CPU's stack trace.
> 
> Signed-off-by: Rik van Riel <riel@surriel.com>
> ---
>  kernel/smp.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/kernel/smp.c b/kernel/smp.c
> index f085ebcdf9e7..5656ef63ea82 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -237,6 +237,14 @@ static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, in
>  	if (likely(ts_delta <= csd_lock_timeout_ns || csd_lock_timeout_ns == 0))
>  		return false;
>  
> +	if (ts0 > ts2) {
> +		/* Our own sched_clock went backward; don't blame another CPU. */
> +		ts_delta = ts0 - ts2;
> +		pr_alert("sched_clock on CPU %d went backward by %llu ns\n", raw_smp_processor_id(), ts_delta);
> +		*ts1 = ts2;
> +		return false;
> +	}

So I've seen some chatter about this on IRC and was WTF, seeing this
patch I'm still WTF. What is going on with those machines?!?!
* Re: [PATCH] smp: print only local CPU info when sched_clock goes backward
@ 2024-07-16 13:10 Rik van Riel

From: Rik van Riel @ 2024-07-16 13:10 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Paul E. McKenney, linux-kernel, kernel-team, neeraj.upadhyay,
    mingo, rostedt, Leonardo Bras

On Tue, 2024-07-16 at 11:04 +0200, Peter Zijlstra wrote:
> On Mon, Jul 15, 2024 at 01:49:41PM -0400, Rik van Riel wrote:
> > About 40% of all csd_lock warnings observed in our fleet appear to
> > be due to sched_clock() going backward in time (usually only a little
> > bit), resulting in ts0 being larger than ts2.
> > 
> > When the local CPU is at fault, we should print out a message
> > reflecting that, rather than trying to get the remote CPU's stack
> > trace.
> > 
> > Signed-off-by: Rik van Riel <riel@surriel.com>
> > ---
> >  kernel/smp.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/kernel/smp.c b/kernel/smp.c
> > index f085ebcdf9e7..5656ef63ea82 100644
> > --- a/kernel/smp.c
> > +++ b/kernel/smp.c
> > @@ -237,6 +237,14 @@ static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, in
> >  	if (likely(ts_delta <= csd_lock_timeout_ns || csd_lock_timeout_ns == 0))
> >  		return false;
> >  
> > +	if (ts0 > ts2) {
> > +		/* Our own sched_clock went backward; don't blame another CPU. */
> > +		ts_delta = ts0 - ts2;
> > +		pr_alert("sched_clock on CPU %d went backward by %llu ns\n", raw_smp_processor_id(), ts_delta);
> > +		*ts1 = ts2;
> > +		return false;
> > +	}
> 
> So I've seen some chatter about this on IRC and was WTF, seeing this
> patch I'm still WTF. What is going on with those machines?!?!
> 
Unfortunately this particular behavior does not appear to be limited
to a few bad systems. The distribution over time looks as follows:
- 1 day: N affected systems
- 3 days: 3N affected systems
- 1 week: 6N affected systems
- 2 weeks: 14N affected systems
- 30 days: 29N affected systems
- 90 days: 72N affected systems

In other words, we are not looking at a few "repeat customers", but
at a large number of systems that show the backward TSC behavior a
few times, and then not again for months. N is a triple digit number,
so I'm fairly confident in these numbers.

If it's any consolation this seems to happen on both Intel and AMD
systems.

The only detection code we have currently is in csd_lock_wait_toolong,
so there could be a lot of cases of the TSC moving backward that are
falling through the cracks.

Most of the time the TSC only moves backward a few dozen, or a few
hundred nanoseconds, and backward time past a few microseconds is a
very long tail.

I have not sliced this data by CPU model, and don't know if the
distribution is different for different CPUs.

The code in question simply loads one sched_clock value into ts0,
and another into ts2:

static void __csd_lock_wait(call_single_data_t *csd)
{
	int bug_id = 0;
	u64 ts0, ts1;

	ts1 = ts0 = sched_clock();
	for (;;) {
		if (csd_lock_wait_toolong(csd, ts0, &ts1, &bug_id))
			break;
		cpu_relax();
	}
	...

static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, int *bug_id)
{
	...
	ts2 = sched_clock();

I don't know if it matters for the CPU that csd_lock_wait_toolong
gets inlined into __csd_lock_wait.

On one particular system replacing the rdtsc() in native_sched_clock()
with rdtsc_ordered() did not help, but that system sees frequent
negative time jumps of 2.5ms, and might just be broken.

The MTBF is high enough that I'm not sure whether always using
rdtsc_ordered() would be warranted, and I don't know whether it would
even help...

-- 
All Rights Reversed.
* Re: [PATCH] smp: print only local CPU info when sched_clock goes backward
@ 2024-07-16 13:47 Paul E. McKenney

From: Paul E. McKenney @ 2024-07-16 13:47 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Rik van Riel, linux-kernel, kernel-team, neeraj.upadhyay,
    mingo, rostedt, Leonardo Bras

On Tue, Jul 16, 2024 at 11:04:43AM +0200, Peter Zijlstra wrote:
> On Mon, Jul 15, 2024 at 01:49:41PM -0400, Rik van Riel wrote:
> > About 40% of all csd_lock warnings observed in our fleet appear to
> > be due to sched_clock() going backward in time (usually only a little
> > bit), resulting in ts0 being larger than ts2.
> > 
> > When the local CPU is at fault, we should print out a message reflecting
> > that, rather than trying to get the remote CPU's stack trace.
> > 
> > Signed-off-by: Rik van Riel <riel@surriel.com>
> > ---
> >  kernel/smp.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/kernel/smp.c b/kernel/smp.c
> > index f085ebcdf9e7..5656ef63ea82 100644
> > --- a/kernel/smp.c
> > +++ b/kernel/smp.c
> > @@ -237,6 +237,14 @@ static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, in
> >  	if (likely(ts_delta <= csd_lock_timeout_ns || csd_lock_timeout_ns == 0))
> >  		return false;
> >  
> > +	if (ts0 > ts2) {
> > +		/* Our own sched_clock went backward; don't blame another CPU. */
> > +		ts_delta = ts0 - ts2;
> > +		pr_alert("sched_clock on CPU %d went backward by %llu ns\n", raw_smp_processor_id(), ts_delta);
> > +		*ts1 = ts2;
> > +		return false;
> > +	}
> 
> So I've seen some chatter about this on IRC and was WTF, seeing this
> patch I'm still WTF. What is going on with those machines?!?!

I wish I knew, but for the record, I completely agree that both your
acronym and interrobang apply to this case. ;-)

						Thanx, Paul