* [PATCH] KGDB: add smp_mb() in synchronisation during exception handler exit
@ 2010-03-09 18:20 Will Deacon
2010-03-09 21:02 ` [Kgdb-bugreport] " George Anzinger
0 siblings, 1 reply; 3+ messages in thread
From: Will Deacon @ 2010-03-09 18:20 UTC (permalink / raw)
To: linux-arm-kernel
KGDB uses atomic variables and busy-wait loops to co-ordinate between
multiple CPUs on an SMP system. When an exception is handled, the primary
CPU executes kgdb_handle_exception() whilst the others execute kgdb_wait.
There comes a point when the waiters are waiting for the primary CPU to finish:
/* Wait till primary CPU is done with debugging */
(1) while (atomic_read(&passive_cpu_wait[cpu]))
cpu_relax();
/* Do important KGDB stuff */
/* Signal the primary CPU that we are done: */
atomic_set(&cpu_in_kgdb[cpu], 0);
In parallel to this, the primary CPU is doing:
for (i = NR_CPUS-1; i >= 0; i--)
atomic_set(&passive_cpu_wait[i], 0);
/*
* Wait till all the CPUs have quit
* from the debugger.
*/
for_each_online_cpu(i) {
(1) while (atomic_read(&cpu_in_kgdb[i]))
cpu_relax();
}
There is a potential deadlock situation at point (1) because the previous
writes to the passive_cpu_wait variables by the primary CPU may not yet be
visible to the other CPUs [for instance, they may be sitting in the local
store buffer]. This means that the waiter CPUs will never exit the while loop
and therefore never write to the cpu_in_kgdb variables, which the primary CPU
is blocked on. Furthermore, because the primary CPU is aggressively performing
reads, the store buffer may not necessarily drain so the system will deadlock.
This deadlock has been experienced on a quad-core ARM11MPCore platform.
The following patch addresses the issue by adding a memory barrier to the
primary CPU before the polling loop, therefore forcing the previous atomic_sets
to be visible before waiting for the waiters to finish.
Cc: KGDB Mailing List <kgdb-bugreport@lists.sourceforge.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
Cc: linux-arm-kernel at lists.infradead.org
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
kernel/kgdb.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/kernel/kgdb.c b/kernel/kgdb.c
index 761fdd2..ee7694b 100644
--- a/kernel/kgdb.c
+++ b/kernel/kgdb.c
@@ -1537,6 +1537,7 @@ acquirelock:
* Wait till all the CPUs have quit
* from the debugger.
*/
+ smp_mb();
for_each_online_cpu(i) {
while (atomic_read(&cpu_in_kgdb[i]))
cpu_relax();
--
1.6.3.3
^ permalink raw reply related [flat|nested] 3+ messages in thread
* [Kgdb-bugreport] [PATCH] KGDB: add smp_mb() in synchronisation during exception handler exit
2010-03-09 18:20 [PATCH] KGDB: add smp_mb() in synchronisation during exception handler exit Will Deacon
@ 2010-03-09 21:02 ` George Anzinger
2010-03-10 10:17 ` Will Deacon
0 siblings, 1 reply; 3+ messages in thread
From: George Anzinger @ 2010-03-09 21:02 UTC (permalink / raw)
To: linux-arm-kernel
On 03/09/2010 11:20 AM, Will Deacon was caught saying:
> KGDB uses atomic variables and busy-wait loops to co-ordinate between
> multiple CPUs on an SMP system. When an exception is handled, the primary
> CPU executes kgdb_handle_exception() whilst the others execute kgdb_wait.
>
> There comes a point when the waiters are waiting for the primary CPU to finish:
>
> /* Wait till primary CPU is done with debugging */
> (1) while (atomic_read(&passive_cpu_wait[cpu]))
> cpu_relax();
>
> /* Do important KGDB stuff */
>
> /* Signal the primary CPU that we are done: */
> atomic_set(&cpu_in_kgdb[cpu], 0);
>
> In parallel to this, the primary CPU is doing:
>
> for (i = NR_CPUS-1; i>= 0; i--)
> atomic_set(&passive_cpu_wait[i], 0);
> /*
> * Wait till all the CPUs have quit
> * from the debugger.
> */
> for_each_online_cpu(i) {
> (1) while (atomic_read(&cpu_in_kgdb[i]))
> cpu_relax();
> }
>
> There is a potential deadlock situation at point (1) because the previous
> writes to the passive_cpu_wait variables by the primary CPU may not yet be
> visible to the other CPUs [for instance, they may be sitting in the local
> store buffer]. This means that the waiter CPUs will never exit the while loop
> and therefore never write to the cpu_in_kgdb variables, which the primary CPU
> is blocked on. Furthermore, because the primary CPU is aggressively performing
> reads, the store buffer may not necessarily drain so the system will deadlock.
>
> This deadlock has been experienced on a quad-core ARM11MPCore platform.
>
> The following patch addresses the issue by adding a memory barrier to the
> primary CPU before the polling loop, therefore forcing the previous atomic_sets
> to be visible before waiting for the waiters to finish.
>
> Cc: KGDB Mailing List<kgdb-bugreport@lists.sourceforge.net>
> Cc: Catalin Marinas<catalin.marinas@arm.com>
> Cc: Russell King - ARM Linux<linux@arm.linux.org.uk>
> Cc: linux-arm-kernel at lists.infradead.org
> Signed-off-by: Will Deacon<will.deacon@arm.com>
> ---
> kernel/kgdb.c | 1 +
> 1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/kernel/kgdb.c b/kernel/kgdb.c
> index 761fdd2..ee7694b 100644
> --- a/kernel/kgdb.c
> +++ b/kernel/kgdb.c
> @@ -1537,6 +1537,7 @@ acquirelock:
> * Wait till all the CPUs have quit
> * from the debugger.
> */
> + smp_mb();
> for_each_online_cpu(i) {
> while (atomic_read(&cpu_in_kgdb[i]))
> cpu_relax();
>
Doesn't this have the same issue if this cpu gets to the while prior to
the other cpu doing its write. I would think the "smp_mb()" should be
in the while loop not prior to it.
--
George Anzinger george at wildturkeyranch.net
^ permalink raw reply [flat|nested] 3+ messages in thread
* [Kgdb-bugreport] [PATCH] KGDB: add smp_mb() in synchronisation during exception handler exit
2010-03-09 21:02 ` [Kgdb-bugreport] " George Anzinger
@ 2010-03-10 10:17 ` Will Deacon
0 siblings, 0 replies; 3+ messages in thread
From: Will Deacon @ 2010-03-10 10:17 UTC (permalink / raw)
To: linux-arm-kernel
Hi George,
> > diff --git a/kernel/kgdb.c b/kernel/kgdb.c
> > index 761fdd2..ee7694b 100644
> > --- a/kernel/kgdb.c
> > +++ b/kernel/kgdb.c
> > @@ -1537,6 +1537,7 @@ acquirelock:
> > * Wait till all the CPUs have quit
> > * from the debugger.
> > */
> > + smp_mb();
> > for_each_online_cpu(i) {
> > while (atomic_read(&cpu_in_kgdb[i]))
> > cpu_relax();
> >
> Doesn't this have the same issue if this cpu gets to the while prior to
> the other cpu doing its write. I would think the "smp_mb()" should be
> in the while loop not prior to it.
I don't think so. The deadlock in question is caused because the aggressive
reading in the polling loop prevents the write-buffer from being able to drain.
For a dual-core system the setup can be simplified like this:
Initially: x = 1, y = 1;
CPU1 CPU0
==== ====
while (y == 1); y = 0;
x = 0; while (x == 1); /* This loop prevents y from
leaving the store buffer */
So the patch inserts a memory barrier on CPU0 between the two lines,
forcing the update to y to become visible before the reads to x occur.
CPU1 will then trundle along happily and update x. The absence of a further
polling loop means that the update to x will become visible to CPU0 at some
future time.
Cheers,
Will
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2010-03-10 10:17 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-03-09 18:20 [PATCH] KGDB: add smp_mb() in synchronisation during exception handler exit Will Deacon
2010-03-09 21:02 ` [Kgdb-bugreport] " George Anzinger
2010-03-10 10:17 ` Will Deacon
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).