From: Ingo Molnar <mingo@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chris J Arges <chris.j.arges@canonical.com>,
Rafael David Tinoco <inaddy@ubuntu.com>,
Peter Anvin <hpa@zytor.com>,
Jiang Liu <jiang.liu@linux.intel.com>,
Peter Zijlstra <peterz@infradead.org>,
LKML <linux-kernel@vger.kernel.org>, Jens Axboe <axboe@kernel.dk>,
Frederic Weisbecker <fweisbec@gmail.com>,
Gema Gomez <gema.gomez-solano@canonical.com>,
the arch/x86 maintainers <x86@kernel.org>
Subject: Re: [PATCH] smp/call: Detect stuck CSD locks
Date: Tue, 7 Apr 2015 11:21:21 +0200 [thread overview]
Message-ID: <20150407092121.GA9971@gmail.com> (raw)
In-Reply-To: <CA+55aFyqbih4XLYtC0kAWFrEH+_uUMx_mbNsnKWKNuKQJRA3Dg@mail.gmail.com>
* Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Mon, Apr 6, 2015 at 9:58 AM, Chris J Arges
> <chris.j.arges@canonical.com> wrote:
> >
> > I noticed that with this patch it never reached 'csd: Detected
> > non-responsive CSD lock...' because it seems that ts_delta never reached
> > CSD_LOCK_TIMEOUT. I tried adjusting the TIMEOUT value and still got a
> > hang without reaching this statement. I made the ts0,ts1 values global
> > and put a counter into the while loop and found that the loop iterated
> > about 670 million times before the softlockup was detected. In addition
> > ts0 and ts1 both had the same values upon soft lockup, and thus would
> > never trip the CSD_LOCK_TIMEOUT check.
>
> Sounds like jiffies stops updating. Which doesn't sound unreasonable
> for when there is some IPI problem.
Yeah - although it weakens the 'IPI lost spuriously' hypothesis: we
ought to have irqs enabled here which normally doesn't stop jiffies
from updating, and the timer interrupt stopping suggests a much deeper
problem than just some lost IPI ...
>
> How about just changing the debug patch to count iterations, and
> print out a warning when it reaches ten million or so.
Yeah, or replace jiffies_to_ms() with:
sched_clock()/1000000
sched_clock() should be safe to call in these codepaths.
Like the attached patch. (Totally untested.)
Thanks,
Ingo
---
kernel/smp.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 46 insertions(+), 5 deletions(-)
diff --git a/kernel/smp.c b/kernel/smp.c
index f38a1e692259..e0eec1ab3ef2 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -98,22 +98,63 @@ void __init call_function_init(void)
register_cpu_notifier(&hotplug_cfd_notifier);
}
+/* Locking timeout in ms: */
+#define CSD_LOCK_TIMEOUT (10*1000ULL)
+
+/* Print this ID in every printk line we output, to be able to easily sort them apart: */
+static int csd_bug_count;
+
/*
* csd_lock/csd_unlock used to serialize access to per-cpu csd resources
*
* For non-synchronous ipi calls the csd can still be in use by the
* previous function call. For multi-cpu calls its even more interesting
* as we'll have to ensure no other cpu is observing our csd.
+ *
+ * ( The overhead of deadlock detection is not a big problem, this is a
+ * cpu_relax() loop that is actively wasting CPU cycles to poll for
+ * completion. )
*/
-static void csd_lock_wait(struct call_single_data *csd)
+static void csd_lock_wait(struct call_single_data *csd, int cpu)
{
- while (csd->flags & CSD_FLAG_LOCK)
+ int bug_id = 0;
+ u64 ts0, ts1, ts_delta;
+
+ ts0 = sched_clock()/1000000;
+
+ if (unlikely(!csd_bug_count)) {
+ csd_bug_count++;
+ printk("csd: CSD deadlock debugging initiated!\n");
+ }
+
+ while (csd->flags & CSD_FLAG_LOCK) {
+ ts1 = sched_clock()/1000000;
+
+ ts_delta = ts1-ts0;
+ if (unlikely(ts_delta >= CSD_LOCK_TIMEOUT)) { /* Uh oh, it took too long. Why? */
+
+ bug_id = csd_bug_count;
+ csd_bug_count++;
+
+ ts0 = ts1; /* Re-start the timeout detection */
+
+ printk("csd: Detected non-responsive CSD lock (#%d) on CPU#%02d, waiting %Ld.%03Ld secs for CPU#%02d\n",
+ bug_id, raw_smp_processor_id(), ts_delta/1000ULL, ts_delta % 1000ULL, cpu);
+ if (cpu >= 0) {
+ printk("csd: Re-sending CSD lock (#%d) IPI from CPU#%02d to CPU#%02d\n", bug_id, raw_smp_processor_id(), cpu);
+ arch_send_call_function_single_ipi(cpu);
+ }
+ dump_stack();
+ }
cpu_relax();
+ }
+ if (unlikely(bug_id))
+ printk("csd: CSD lock (#%d) got unstuck on CPU#%02d, CPU#%02d released the lock after all. Phew!\n", bug_id, raw_smp_processor_id(), cpu);
}
static void csd_lock(struct call_single_data *csd)
{
- csd_lock_wait(csd);
+ csd_lock_wait(csd, -1);
csd->flags |= CSD_FLAG_LOCK;
/*
@@ -191,7 +232,7 @@ static int generic_exec_single(int cpu, struct call_single_data *csd,
arch_send_call_function_single_ipi(cpu);
if (wait)
- csd_lock_wait(csd);
+ csd_lock_wait(csd, cpu);
return 0;
}
@@ -446,7 +487,7 @@ void smp_call_function_many(const struct cpumask *mask,
struct call_single_data *csd;
csd = per_cpu_ptr(cfd->csd, cpu);
- csd_lock_wait(csd);
+ csd_lock_wait(csd, cpu);
}
}
}
next prev parent reply other threads:[~2015-04-07 9:21 UTC|newest]
Thread overview: 75+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-02-11 13:19 smp_call_function_single lockups Rafael David Tinoco
2015-02-11 18:18 ` Linus Torvalds
2015-02-11 19:59 ` Linus Torvalds
2015-02-11 20:42 ` Linus Torvalds
2015-02-12 16:38 ` Rafael David Tinoco
2015-02-18 22:25 ` Peter Zijlstra
2015-02-19 15:42 ` Rafael David Tinoco
2015-02-19 16:14 ` Linus Torvalds
2015-02-23 14:01 ` Rafael David Tinoco
2015-02-23 19:32 ` Linus Torvalds
2015-02-23 20:50 ` Peter Zijlstra
2015-02-23 21:02 ` Rafael David Tinoco
2015-02-19 16:16 ` Peter Zijlstra
2015-02-19 16:26 ` Linus Torvalds
2015-02-19 16:32 ` Rafael David Tinoco
2015-02-19 16:59 ` Linus Torvalds
2015-02-19 17:30 ` Rafael David Tinoco
2015-02-19 17:39 ` Linus Torvalds
2015-02-19 20:29 ` Linus Torvalds
2015-02-19 21:59 ` Linus Torvalds
2015-02-19 22:45 ` Linus Torvalds
2015-03-31 3:15 ` Chris J Arges
2015-03-31 4:28 ` Linus Torvalds
2015-03-31 10:56 ` [debug PATCHes] " Ingo Molnar
2015-03-31 22:38 ` Chris J Arges
2015-04-01 12:39 ` Ingo Molnar
2015-04-01 14:10 ` Chris J Arges
2015-04-01 14:55 ` Ingo Molnar
2015-03-31 4:46 ` Linus Torvalds
2015-03-31 15:08 ` Linus Torvalds
2015-03-31 22:23 ` Chris J Arges
2015-03-31 23:07 ` Linus Torvalds
2015-04-01 14:32 ` Chris J Arges
2015-04-01 15:36 ` Linus Torvalds
2015-04-02 9:55 ` Ingo Molnar
2015-04-02 17:35 ` Linus Torvalds
2015-04-01 12:43 ` Ingo Molnar
2015-04-01 16:10 ` Chris J Arges
2015-04-01 16:14 ` Linus Torvalds
2015-04-01 21:59 ` Chris J Arges
2015-04-02 17:31 ` Linus Torvalds
2015-04-02 18:26 ` Ingo Molnar
2015-04-02 18:51 ` Chris J Arges
2015-04-02 19:07 ` Ingo Molnar
2015-04-02 20:57 ` Linus Torvalds
2015-04-02 21:13 ` Chris J Arges
2015-04-03 5:43 ` [PATCH] smp/call: Detect stuck CSD locks Ingo Molnar
2015-04-03 5:47 ` Ingo Molnar
2015-04-06 16:58 ` Chris J Arges
2015-04-06 17:32 ` Linus Torvalds
2015-04-07 9:21 ` Ingo Molnar [this message]
2015-04-07 20:59 ` Chris J Arges
2015-04-07 21:15 ` Linus Torvalds
2015-04-08 6:47 ` Ingo Molnar
2015-04-13 3:56 ` Chris J Arges
2015-04-13 6:14 ` Ingo Molnar
2015-04-15 19:54 ` Chris J Arges
2015-04-16 11:04 ` Ingo Molnar
2015-04-16 15:58 ` Chris J Arges
2015-04-16 16:31 ` Ingo Molnar
2015-04-29 21:08 ` Chris J Arges
2015-05-11 14:00 ` Ingo Molnar
2015-05-20 18:19 ` Chris J Arges
2015-04-03 5:45 ` smp_call_function_single lockups Ingo Molnar
2015-04-06 17:23 ` Chris J Arges
2015-02-20 9:30 ` Ingo Molnar
2015-02-20 16:49 ` Linus Torvalds
2015-02-20 19:41 ` Ingo Molnar
2015-02-20 20:03 ` Linus Torvalds
2015-02-20 20:11 ` Ingo Molnar
2015-03-20 10:15 ` Peter Zijlstra
2015-03-20 16:26 ` Linus Torvalds
2015-03-20 17:14 ` Mike Galbraith
2015-04-01 14:22 ` Frederic Weisbecker
2015-04-18 10:13 ` [tip:locking/urgent] smp: Fix smp_call_function_single_async() locking tip-bot for Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150407092121.GA9971@gmail.com \
--to=mingo@kernel.org \
--cc=axboe@kernel.dk \
--cc=chris.j.arges@canonical.com \
--cc=fweisbec@gmail.com \
--cc=gema.gomez-solano@canonical.com \
--cc=hpa@zytor.com \
--cc=inaddy@ubuntu.com \
--cc=jiang.liu@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=peterz@infradead.org \
--cc=torvalds@linux-foundation.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.