* [PATCH] smp: print only local CPU info when sched_clock goes backward
@ 2024-07-15 17:49 Rik van Riel

From: Rik van Riel @ 2024-07-15 17:49 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Peter Zijlstra (Intel), linux-kernel, kernel-team, neeraj.upadhyay,
    mingo, peterz, rostedt, Leonardo Bras

About 40% of all csd_lock warnings observed in our fleet appear to
be due to sched_clock() going backward in time (usually only a little
bit), resulting in ts0 being larger than ts2.

When the local CPU is at fault, we should print out a message reflecting
that, rather than trying to get the remote CPU's stack trace.

Signed-off-by: Rik van Riel <riel@surriel.com>
---
 kernel/smp.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel/smp.c b/kernel/smp.c
index f085ebcdf9e7..5656ef63ea82 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -237,6 +237,14 @@ static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, in
 	if (likely(ts_delta <= csd_lock_timeout_ns || csd_lock_timeout_ns == 0))
 		return false;
 
+	if (ts0 > ts2) {
+		/* Our own sched_clock went backward; don't blame another CPU. */
+		ts_delta = ts0 - ts2;
+		pr_alert("sched_clock on CPU %d went backward by %llu ns\n", raw_smp_processor_id(), ts_delta);
+		*ts1 = ts2;
+		return false;
+	}
+
 	firsttime = !*bug_id;
 	if (firsttime)
 		*bug_id = atomic_inc_return(&csd_bug_count);
-- 
2.45.2
* Re: [PATCH] smp: print only local CPU info when sched_clock goes backward
@ 2024-07-15 18:07 Paul E. McKenney

From: Paul E. McKenney @ 2024-07-15 18:07 UTC (permalink / raw)
To: Rik van Riel
Cc: Peter Zijlstra (Intel), linux-kernel, kernel-team, neeraj.upadhyay,
    mingo, rostedt, Leonardo Bras

On Mon, Jul 15, 2024 at 01:49:41PM -0400, Rik van Riel wrote:
> About 40% of all csd_lock warnings observed in our fleet appear to
> be due to sched_clock() going backward in time (usually only a little
> bit), resulting in ts0 being larger than ts2.
> 
> When the local CPU is at fault, we should print out a message reflecting
> that, rather than trying to get the remote CPU's stack trace.
> 
> Signed-off-by: Rik van Riel <riel@surriel.com>

Tested-by: Paul E. McKenney <paulmck@kernel.org>

> ---
>  kernel/smp.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/kernel/smp.c b/kernel/smp.c
> index f085ebcdf9e7..5656ef63ea82 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -237,6 +237,14 @@ static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, in
>  	if (likely(ts_delta <= csd_lock_timeout_ns || csd_lock_timeout_ns == 0))
>  		return false;
>  
> +	if (ts0 > ts2) {
> +		/* Our own sched_clock went backward; don't blame another CPU. */
> +		ts_delta = ts0 - ts2;
> +		pr_alert("sched_clock on CPU %d went backward by %llu ns\n", raw_smp_processor_id(), ts_delta);
> +		*ts1 = ts2;
> +		return false;
> +	}
> +
>  	firsttime = !*bug_id;
>  	if (firsttime)
>  		*bug_id = atomic_inc_return(&csd_bug_count);
> -- 
> 2.45.2
* Re: [PATCH] smp: print only local CPU info when sched_clock goes backward
@ 2024-07-24 17:19 Neeraj Upadhyay

From: Neeraj Upadhyay @ 2024-07-24 17:19 UTC (permalink / raw)
To: riel
Cc: Paul E. McKenney, linux-kernel, kernel-team, neeraj.upadhyay,
    mingo, rostedt, Leonardo Bras

On Mon, Jul 15, 2024 at 11:07:30AM -0700, Paul E. McKenney wrote:
> On Mon, Jul 15, 2024 at 01:49:41PM -0400, Rik van Riel wrote:
> > About 40% of all csd_lock warnings observed in our fleet appear to
> > be due to sched_clock() going backward in time (usually only a little
> > bit), resulting in ts0 being larger than ts2.
> > 
> > When the local CPU is at fault, we should print out a message reflecting
> > that, rather than trying to get the remote CPU's stack trace.
> > 
> > Signed-off-by: Rik van Riel <riel@surriel.com>
> 
> Tested-by: Paul E. McKenney <paulmck@kernel.org>
> 

I have included this patch as part of the CSD-lock diagnostics series,
which has been submitted for review and is planned for v6.12 [1]. I have
also included it in the RCU tree [2] for more testing.

[1] https://lore.kernel.org/lkml/20240722133559.GA667117@neeraj.linux/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/neeraj.upadhyay/linux-rcu.git/log/?h=next

- Neeraj

> > ---
> >  kernel/smp.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/kernel/smp.c b/kernel/smp.c
> > index f085ebcdf9e7..5656ef63ea82 100644
> > --- a/kernel/smp.c
> > +++ b/kernel/smp.c
> > @@ -237,6 +237,14 @@ static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, in
> >  	if (likely(ts_delta <= csd_lock_timeout_ns || csd_lock_timeout_ns == 0))
> >  		return false;
> >  
> > +	if (ts0 > ts2) {
> > +		/* Our own sched_clock went backward; don't blame another CPU. */
> > +		ts_delta = ts0 - ts2;
> > +		pr_alert("sched_clock on CPU %d went backward by %llu ns\n", raw_smp_processor_id(), ts_delta);
> > +		*ts1 = ts2;
> > +		return false;
> > +	}
> > +
> >  	firsttime = !*bug_id;
> >  	if (firsttime)
> >  		*bug_id = atomic_inc_return(&csd_bug_count);
> > -- 
> > 2.45.2
* Re: [PATCH] smp: print only local CPU info when sched_clock goes backward
@ 2024-07-16  9:04 Peter Zijlstra

From: Peter Zijlstra @ 2024-07-16 9:04 UTC (permalink / raw)
To: Rik van Riel
Cc: Paul E. McKenney, linux-kernel, kernel-team, neeraj.upadhyay,
    mingo, rostedt, Leonardo Bras

On Mon, Jul 15, 2024 at 01:49:41PM -0400, Rik van Riel wrote:
> About 40% of all csd_lock warnings observed in our fleet appear to
> be due to sched_clock() going backward in time (usually only a little
> bit), resulting in ts0 being larger than ts2.
> 
> When the local CPU is at fault, we should print out a message reflecting
> that, rather than trying to get the remote CPU's stack trace.
> 
> Signed-off-by: Rik van Riel <riel@surriel.com>
> ---
>  kernel/smp.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/kernel/smp.c b/kernel/smp.c
> index f085ebcdf9e7..5656ef63ea82 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -237,6 +237,14 @@ static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, in
>  	if (likely(ts_delta <= csd_lock_timeout_ns || csd_lock_timeout_ns == 0))
>  		return false;
>  
> +	if (ts0 > ts2) {
> +		/* Our own sched_clock went backward; don't blame another CPU. */
> +		ts_delta = ts0 - ts2;
> +		pr_alert("sched_clock on CPU %d went backward by %llu ns\n", raw_smp_processor_id(), ts_delta);
> +		*ts1 = ts2;
> +		return false;
> +	}

So I've seen some chatter about this on IRC and was WTF, seeing this
patch I'm still WTF. What is going on with those machines?!?!
* Re: [PATCH] smp: print only local CPU info when sched_clock goes backward
@ 2024-07-16 13:10 Rik van Riel

From: Rik van Riel @ 2024-07-16 13:10 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Paul E. McKenney, linux-kernel, kernel-team, neeraj.upadhyay,
    mingo, rostedt, Leonardo Bras

On Tue, 2024-07-16 at 11:04 +0200, Peter Zijlstra wrote:
> On Mon, Jul 15, 2024 at 01:49:41PM -0400, Rik van Riel wrote:
> > About 40% of all csd_lock warnings observed in our fleet appear to
> > be due to sched_clock() going backward in time (usually only a little
> > bit), resulting in ts0 being larger than ts2.
> > 
> > When the local CPU is at fault, we should print out a message
> > reflecting that, rather than trying to get the remote CPU's stack
> > trace.
> > 
> > Signed-off-by: Rik van Riel <riel@surriel.com>
> > ---
> >  kernel/smp.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/kernel/smp.c b/kernel/smp.c
> > index f085ebcdf9e7..5656ef63ea82 100644
> > --- a/kernel/smp.c
> > +++ b/kernel/smp.c
> > @@ -237,6 +237,14 @@ static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, in
> >  	if (likely(ts_delta <= csd_lock_timeout_ns || csd_lock_timeout_ns == 0))
> >  		return false;
> >  
> > +	if (ts0 > ts2) {
> > +		/* Our own sched_clock went backward; don't blame another CPU. */
> > +		ts_delta = ts0 - ts2;
> > +		pr_alert("sched_clock on CPU %d went backward by %llu ns\n", raw_smp_processor_id(), ts_delta);
> > +		*ts1 = ts2;
> > +		return false;
> > +	}
> 
> So I've seen some chatter about this on IRC and was WTF, seeing this
> patch I'm still WTF. What is going on with those machines?!?!
> 
Unfortunately this particular behavior does not appear to be limited
to a few bad systems. The distribution over time looks as follows:
- 1 day: N affected systems
- 3 days: 3N affected systems
- 1 week: 6N affected systems
- 2 weeks: 14N affected systems
- 30 days: 29N affected systems
- 90 days: 72N affected systems

In other words, we are not looking at a few "repeat customers", but
at a large number of systems that show the backward TSC behavior a
few times, and then not again for months. N is a triple digit number,
so I'm fairly confident in these numbers.

If it's any consolation this seems to happen on both Intel and AMD
systems.

The only detection code we have currently is in csd_lock_wait_toolong,
so there could be a lot of cases of the TSC moving backward that are
falling through the cracks.

Most of the time the TSC only moves backward a few dozen, or a few
hundred nanoseconds, and backward time past a few microseconds is a
very long tail.

I have not sliced this data by CPU model, and don't know if the
distribution is different for different CPUs.

The code in question simply loads one sched_clock value into ts0,
and another into ts2:

static void __csd_lock_wait(call_single_data_t *csd)
{
	int bug_id = 0;
	u64 ts0, ts1;

	ts1 = ts0 = sched_clock();
	for (;;) {
		if (csd_lock_wait_toolong(csd, ts0, &ts1, &bug_id))
			break;
		cpu_relax();
	}
	...

static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, int *bug_id)
{
	...
	ts2 = sched_clock();

I don't know if it matters for the CPU that csd_lock_wait_toolong
gets inlined into __csd_lock_wait.

On one particular system replacing the rdtsc() in native_sched_clock()
with rdtsc_ordered() did not help, but that system sees frequent
negative time jumps of 2.5ms, and might just be broken.

The MTBF is high enough that I'm not sure whether always using
rdtsc_ordered() would be warranted, and I don't know whether it would
even help...

-- 
All Rights Reversed.
* Re: [PATCH] smp: print only local CPU info when sched_clock goes backward
@ 2024-07-16 13:47 Paul E. McKenney

From: Paul E. McKenney @ 2024-07-16 13:47 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Rik van Riel, linux-kernel, kernel-team, neeraj.upadhyay,
    mingo, rostedt, Leonardo Bras

On Tue, Jul 16, 2024 at 11:04:43AM +0200, Peter Zijlstra wrote:
> On Mon, Jul 15, 2024 at 01:49:41PM -0400, Rik van Riel wrote:
> > About 40% of all csd_lock warnings observed in our fleet appear to
> > be due to sched_clock() going backward in time (usually only a little
> > bit), resulting in ts0 being larger than ts2.
> > 
> > When the local CPU is at fault, we should print out a message reflecting
> > that, rather than trying to get the remote CPU's stack trace.
> > 
> > Signed-off-by: Rik van Riel <riel@surriel.com>
> > ---
> >  kernel/smp.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/kernel/smp.c b/kernel/smp.c
> > index f085ebcdf9e7..5656ef63ea82 100644
> > --- a/kernel/smp.c
> > +++ b/kernel/smp.c
> > @@ -237,6 +237,14 @@ static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, in
> >  	if (likely(ts_delta <= csd_lock_timeout_ns || csd_lock_timeout_ns == 0))
> >  		return false;
> >  
> > +	if (ts0 > ts2) {
> > +		/* Our own sched_clock went backward; don't blame another CPU. */
> > +		ts_delta = ts0 - ts2;
> > +		pr_alert("sched_clock on CPU %d went backward by %llu ns\n", raw_smp_processor_id(), ts_delta);
> > +		*ts1 = ts2;
> > +		return false;
> > +	}
> 
> So I've seen some chatter about this on IRC and was WTF, seeing this
> patch I'm still WTF. What is going on with those machines?!?!

I wish I knew, but for the record, I completely agree that both your
acronym and interrobang apply to this case. ;-)

						Thanx, Paul