linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC 0/4] ARM/KGDB: Some fixes for SMP machines
@ 2010-07-07 17:12 Anton Vorontsov
  2010-07-07 17:13 ` [PATCH RFC 1/4] ARM: kgdb: Must poll for IPIs during busy-waiting Anton Vorontsov
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Anton Vorontsov @ 2010-07-07 17:12 UTC (permalink / raw)
  To: linux-arm-kernel

Hi all,

It appears that KGDB can easily hang SMP ARM machines.

Here are few fixes that I come up with.

Before the patches, doing 'b schedule' and then 1-3 'continue'
or 'si' gdb commands would cause the board to lock-up completely.

After the patches, KGDB seems to be bullet proof. Well, at least
with the 'b schedule' testcase, there might be other bugs which
I didn't hit yet. ;-)

Notes:

1. I'm testing with I/D caches disabled. With enabled caches,
   ARMv6K still locks-up. I'll test ARMv7 soon, and will try
   to debug cache issues on ARMv6K.

2. The patches are against a heavily patched kernel, and so far
   I didn't rebase them onto the 'debug core' rework as found
   in the very latest mainline kernels. I'll rebase the patches
   soon, so for now this is just an RFC.

Thanks!

-- 
Anton Vorontsov
email: cbouatmailru at gmail.com
irc://irc.freenode.net/bd2

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH RFC 1/4] ARM: kgdb: Must poll for IPIs during busy-waiting
  2010-07-07 17:12 [PATCH RFC 0/4] ARM/KGDB: Some fixes for SMP machines Anton Vorontsov
@ 2010-07-07 17:13 ` Anton Vorontsov
  2010-07-07 17:13 ` [PATCH RFC 2/4] ARM: kgdb: Disable preemption before re-enabling interrupts Anton Vorontsov
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Anton Vorontsov @ 2010-07-07 17:13 UTC (permalink / raw)
  To: linux-arm-kernel

On architectures w/o NMIs (e.g. ARM), ordinary (maskable) IRQs are used
for SMP IPI calls.

Various deadlocks are possible if we not poll for IPIs:

- The master CPU might hang in kgdb_roundup_cpus() because the slave CPU
  does not process IPIs;
- DMA cache coherency calls are implemented as IPIs, so the master CPU
  might hang in *dma_sync_*() calls that may be issued by the KGDB IO
  back-ends.

Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
---
 arch/arm/kernel/kgdb.c |    8 ++++++++
 kernel/kgdb.c          |   23 ++++++++++++++++++++++-
 2 files changed, 30 insertions(+), 1 deletions(-)

diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c
index a5b846b..5c61100 100644
--- a/arch/arm/kernel/kgdb.c
+++ b/arch/arm/kernel/kgdb.c
@@ -11,6 +11,7 @@
  */
 #include <linux/irq.h>
 #include <linux/kgdb.h>
+#include <asm/smp.h>
 #include <asm/traps.h>
 
 /* Make a local copy of the registers passed into the handler (bletch) */
@@ -171,6 +172,13 @@ void kgdb_roundup_cpus(unsigned long flags)
        local_irq_disable();
 }
 
+#ifdef CONFIG_SMP
+void kgdb_arch_poll_ipi(struct pt_regs *regs)
+{
+	do_IPI(regs);
+}
+#endif
+
 /**
  *	kgdb_arch_init - Perform any architecture specific initalization.
  *
diff --git a/kernel/kgdb.c b/kernel/kgdb.c
index 9934aa0..97edb05 100644
--- a/kernel/kgdb.c
+++ b/kernel/kgdb.c
@@ -225,6 +225,25 @@ int __weak kgdb_arch_init(void)
 	return 0;
 }
 
+/*
+ * On architectures w/o NMIs (e.g. ARM), ordinary (maskable) IRQs are used
+ * for SMP IPI calls.
+ *
+ * Various deadlocks are possible if we not poll for IPIs:
+ *
+ * - The master CPU might hang in kgdb_roundup_cpus() because the slave CPU
+ *   does not process IPIs;
+ * - DMA cache coherency calls are implemented as IPIs, so the master CPU
+ *   might hang in *dma_sync_*() calls that may be issued by the KGDB IO
+ *   back-ends.
+ *
+ * The rule of thumb: always poll for IPIs in busy-waiting loops until
+ * all CPUs are in KGDB (i.e. all cpu_in_kgdb[] are set).
+ */
+void __weak kgdb_arch_poll_ipi(struct pt_regs *regs)
+{
+}
+
 int __weak kgdb_skipexception(int exception, struct pt_regs *regs)
 {
 	return 0;
@@ -1389,6 +1408,8 @@ acquirelock:
 	 * master cpu and acquire the kgdb_active lock:
 	 */
 	while (1) {
+		kgdb_arch_poll_ipi(regs);
+
 		if (kgdb_info[cpu].exception_state & DCPU_WANT_MASTER) {
 			if (atomic_cmpxchg(&kgdb_active, -1, cpu) == cpu)
 				break;
@@ -1465,7 +1486,7 @@ return_normal:
 	 */
 	for_each_online_cpu(i) {
 		while (!atomic_read(&cpu_in_kgdb[i]))
-			cpu_relax();
+			kgdb_arch_poll_ipi(regs);
 	}
 
 	/*
-- 
1.7.0.5

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH RFC 2/4] ARM: kgdb: Disable preemption before re-enabling interrupts
  2010-07-07 17:12 [PATCH RFC 0/4] ARM/KGDB: Some fixes for SMP machines Anton Vorontsov
  2010-07-07 17:13 ` [PATCH RFC 1/4] ARM: kgdb: Must poll for IPIs during busy-waiting Anton Vorontsov
@ 2010-07-07 17:13 ` Anton Vorontsov
  2010-07-07 17:13 ` [PATCH RFC 3/4] net: Implement napi_try_disable() Anton Vorontsov
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Anton Vorontsov @ 2010-07-07 17:13 UTC (permalink / raw)
  To: linux-arm-kernel

We have to disable preemption before enabling local IRQs because
local_irq_enable() makes it possible for the kernel to reschedule,
so the kernel might hit a breakpoint causing itself to re-enter
KGDB and die.

Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
---
 arch/arm/kernel/kgdb.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c
index 5c61100..6ece654 100644
--- a/arch/arm/kernel/kgdb.c
+++ b/arch/arm/kernel/kgdb.c
@@ -167,9 +167,17 @@ static void kgdb_call_nmi_hook(void *ignored)
 
 void kgdb_roundup_cpus(unsigned long flags)
 {
+	/*
+	 * We have to disable preemption before enabling local
+	 * IRQs because local_irq_enable() makes it possible for
+	 * the kernel to reschedule, so the kernel might hit a
+	 * breakpoint causing itself to re-enter KGDB and die.
+	 */
+	preempt_disable();
        local_irq_enable();
        smp_call_function(kgdb_call_nmi_hook, NULL, 0);
        local_irq_disable();
+	preempt_enable();
 }
 
 #ifdef CONFIG_SMP
-- 
1.7.0.5

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH RFC 3/4] net: Implement napi_try_disable()
  2010-07-07 17:12 [PATCH RFC 0/4] ARM/KGDB: Some fixes for SMP machines Anton Vorontsov
  2010-07-07 17:13 ` [PATCH RFC 1/4] ARM: kgdb: Must poll for IPIs during busy-waiting Anton Vorontsov
  2010-07-07 17:13 ` [PATCH RFC 2/4] ARM: kgdb: Disable preemption before re-enabling interrupts Anton Vorontsov
@ 2010-07-07 17:13 ` Anton Vorontsov
  2010-07-07 17:13 ` [PATCH RFC 4/4] kgdb: Quiesce IO back-end before rounding up secondary CPUs Anton Vorontsov
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Anton Vorontsov @ 2010-07-07 17:13 UTC (permalink / raw)
  To: linux-arm-kernel

For KGDBoE we'll need to disable NAPI, so that we don't randomly
interrupt secondary CPUs while they process networking stuff.

We need to wait for NAPI to become disabled from an atomic context,
thus we can't use napi_disable() as it calls msleep().

Plus, in KGDB we have to always poll for IPIs during busy-waiting,
which means that we must implement the function in a such way that
we can do some caller specific work during the time we wait for
NAPI disable, so just changing msleep() to mdelay() won't work.

Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
---
 include/linux/netdevice.h |   21 +++++++++++++++++++++
 1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 08a1dd8..4fc24eb 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -414,6 +414,27 @@ extern void __napi_complete(struct napi_struct *n);
 extern void napi_complete(struct napi_struct *n);
 
 /**
+ *	napi_try_disable - try to prevent NAPI from scheduling
+ *	@n: napi context
+ *
+ * This call is similar to napi_disable(), except that it doesn't
+ * wait till any outstanding processing completes, but returns
+ * whether napi was successfuly disabled.
+ *
+ * Note that you still must wait till napi is actually disabled,
+ * so the only benefit from using this call is that you can do
+ * some useful work while you wait.
+ */
+static inline int napi_try_disable(struct napi_struct *n)
+{
+	set_bit(NAPI_STATE_DISABLE, &n->state);
+	if (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
+		return 0;
+	clear_bit(NAPI_STATE_DISABLE, &n->state);
+	return 1;
+}
+
+/**
  *	napi_disable - prevent NAPI from scheduling
  *	@n: napi context
  *
-- 
1.7.0.5

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH RFC 4/4] kgdb: Quiesce IO back-end before rounding up secondary CPUs
  2010-07-07 17:12 [PATCH RFC 0/4] ARM/KGDB: Some fixes for SMP machines Anton Vorontsov
                   ` (2 preceding siblings ...)
  2010-07-07 17:13 ` [PATCH RFC 3/4] net: Implement napi_try_disable() Anton Vorontsov
@ 2010-07-07 17:13 ` Anton Vorontsov
  2010-07-07 17:54 ` [PATCH RFC 0/4] ARM/KGDB: Some fixes for SMP machines Anton Vorontsov
  2010-07-08  8:48 ` Will Deacon
  5 siblings, 0 replies; 7+ messages in thread
From: Anton Vorontsov @ 2010-07-07 17:13 UTC (permalink / raw)
  To: linux-arm-kernel

KGDB may try to roundup secondary CPUs while they're in the IO
back-end code (e.g. NAPI is polling an eth device), so the
secondary CPU might stop processing IO at a random place, which
may cause the IO back-end to become unusable for KGDB itself.

This patch implements try_quiesce and activate KGDB IO callbacks,
so that we quiesce the IO back-end before rounding up the CPUs.
So far it's only implemented for KGDBoE driver.

Note that we have to quiesce the devices via _try mechanism since
we have to poll for IPIs during the time we wait for the other
CPUs.

Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
---
 drivers/net/kgdboe.c |   21 +++++++++++++++++++++
 include/linux/kgdb.h |    2 ++
 kernel/kgdb.c        |    6 ++++++
 3 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/drivers/net/kgdboe.c b/drivers/net/kgdboe.c
index 939797a..b46cc9a 100644
--- a/drivers/net/kgdboe.c
+++ b/drivers/net/kgdboe.c
@@ -270,8 +270,29 @@ static int param_set_kgdboe_var(const char *kmessage, struct kernel_param *kp)
 	return 0;
 }
 
+static int eth_try_quiesce(void)
+{
+	struct napi_struct *napi;
+
+	list_for_each_entry(napi, &np.dev->napi_list, dev_list) {
+		if (!napi_try_disable(napi))
+			return 0;
+	}
+	return 1;
+}
+
+static void eth_activate(void)
+{
+	struct napi_struct *napi;
+
+	list_for_each_entry(napi, &np.dev->napi_list, dev_list)
+		napi_enable(napi);
+}
+
 static struct kgdb_io local_kgdb_io_ops = {
 	.name = "kgdboe",
+	.try_quiesce = eth_try_quiesce,
+	.activate = eth_activate,
 	.read_char = eth_get_char,
 	.write_char = eth_put_char,
 	.flush = eth_flush_buf,
diff --git a/include/linux/kgdb.h b/include/linux/kgdb.h
index 4c80859..93cd305 100644
--- a/include/linux/kgdb.h
+++ b/include/linux/kgdb.h
@@ -250,6 +250,8 @@ struct kgdb_arch {
  */
 struct kgdb_io {
 	const char		*name;
+	int			(*try_quiesce) (void);
+	void			(*activate) (void);
 	int			(*read_char) (void);
 	void			(*write_char) (u8);
 	void			(*flush) (void);
diff --git a/kernel/kgdb.c b/kernel/kgdb.c
index 97edb05..e7a2274 100644
--- a/kernel/kgdb.c
+++ b/kernel/kgdb.c
@@ -1475,6 +1475,9 @@ return_normal:
 			atomic_inc(&passive_cpu_wait[i]);
 	}
 
+	while (kgdb_io_ops->try_quiesce && !kgdb_io_ops->try_quiesce())
+		kgdb_arch_poll_ipi(regs);
+
 #ifdef CONFIG_SMP
 	/* Signal the other CPUs to enter kgdb_wait() */
 	if ((!kgdb_single_step) && kgdb_do_roundup)
@@ -1489,6 +1492,9 @@ return_normal:
 			kgdb_arch_poll_ipi(regs);
 	}
 
+	if (kgdb_io_ops->activate)
+		kgdb_io_ops->activate();
+
 	/*
 	 * At this point the primary processor is completely
 	 * in the debugger and all secondary CPUs are quiescent
-- 
1.7.0.5

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH RFC 0/4] ARM/KGDB: Some fixes for SMP machines
  2010-07-07 17:12 [PATCH RFC 0/4] ARM/KGDB: Some fixes for SMP machines Anton Vorontsov
                   ` (3 preceding siblings ...)
  2010-07-07 17:13 ` [PATCH RFC 4/4] kgdb: Quiesce IO back-end before rounding up secondary CPUs Anton Vorontsov
@ 2010-07-07 17:54 ` Anton Vorontsov
  2010-07-08  8:48 ` Will Deacon
  5 siblings, 0 replies; 7+ messages in thread
From: Anton Vorontsov @ 2010-07-07 17:54 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jul 07, 2010 at 09:12:22PM +0400, Anton Vorontsov wrote:
[...] 
> 2. The patches are against a heavily patched kernel, and so far
>    I didn't rebase them onto the 'debug core' rework as found
>    in the very latest mainline kernels. I'll rebase the patches
>    soon, so for now this is just an RFC.

BTW, I'm testing with another small fixup applied, I didn't send
it as a patch because this deadlock was already fixed in the
debug_core implementation (which KGDB is using nowadays). But
for the completeness, here it is:

(Don't deadlock if there's a wannabe-master CPUs, which entered
KGDB via exception, not NMI/IPI).

diff --git a/kernel/kgdb.c b/kernel/kgdb.c
index e7a2274..65bf75d 100644
--- a/kernel/kgdb.c
+++ b/kernel/kgdb.c
@@ -1522,7 +1522,9 @@ return_normal:
 		 * from the debugger.
 		 */
 		for_each_online_cpu(i) {
-			while (atomic_read(&cpu_in_kgdb[i]))
+			while (atomic_read(&cpu_in_kgdb[i]) &&
+					!(kgdb_info[i].exception_state &
+						DCPU_WANT_MASTER))
 				cpu_relax();
 		}
 	}

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH RFC 0/4] ARM/KGDB: Some fixes for SMP machines
  2010-07-07 17:12 [PATCH RFC 0/4] ARM/KGDB: Some fixes for SMP machines Anton Vorontsov
                   ` (4 preceding siblings ...)
  2010-07-07 17:54 ` [PATCH RFC 0/4] ARM/KGDB: Some fixes for SMP machines Anton Vorontsov
@ 2010-07-08  8:48 ` Will Deacon
  5 siblings, 0 replies; 7+ messages in thread
From: Will Deacon @ 2010-07-08  8:48 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Anton,

> It appears that KGDB can easily hang SMP ARM machines.

Yep. As pointed out at:

http://lkml.org/lkml/2010/4/8/214

Commit ae6bf53e broke kgdb on SMP systems because it incorrectly assumes
that atomic_{inc,dec} imply barriers. Until this is fixed, it will never
work reliably on ARM. Try reverting the patch [this doesn't completely
resolve the situation, but it might help].

> 1. I'm testing with I/D caches disabled. With enabled caches,
>    ARMv6K still locks-up. I'll test ARMv7 soon, and will try
>    to debug cache issues on ARMv6K.

This patch might help if you're using the ARM11MPCore:

http://lists.infradead.org/pipermail/linux-arm-kernel/2010-May/016567.html

Hope that helps,

Will

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2010-07-08  8:48 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-07-07 17:12 [PATCH RFC 0/4] ARM/KGDB: Some fixes for SMP machines Anton Vorontsov
2010-07-07 17:13 ` [PATCH RFC 1/4] ARM: kgdb: Must poll for IPIs during busy-waiting Anton Vorontsov
2010-07-07 17:13 ` [PATCH RFC 2/4] ARM: kgdb: Disable preemption before re-enabling interrupts Anton Vorontsov
2010-07-07 17:13 ` [PATCH RFC 3/4] net: Implement napi_try_disable() Anton Vorontsov
2010-07-07 17:13 ` [PATCH RFC 4/4] kgdb: Quiesce IO back-end before rounding up secondary CPUs Anton Vorontsov
2010-07-07 17:54 ` [PATCH RFC 0/4] ARM/KGDB: Some fixes for SMP machines Anton Vorontsov
2010-07-08  8:48 ` Will Deacon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).