* [PATCH 0/7] serial: Finish kgdb on qcom_geni; fix many lockdep splats w/ kgdb
@ 2018-10-29 18:07 Douglas Anderson
  2018-10-29 18:07 ` [PATCH 6/7] smp: Don't yell about IRQs disabled in kgdb_roundup_cpus() Douglas Anderson
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Douglas Anderson @ 2018-10-29 18:07 UTC (permalink / raw)
  To: Jason Wessel, Daniel Thompson, tglx, mingo, gregkh
  Cc: linux-arm-msm, kgdb-bugreport, Douglas Anderson, nm, linux-mips,
	dalias, catalin.marinas, vigneshr, linux-aspeed, linux-sh, peterz,
	will.deacon, mhocko, paulus, hpa, sparclinux, marex, sfr, ysato,
	linux-hexagon, x86, linux, pombredanne, tony, mingo, joel,
	linux-serial, rolf.evers.fischer, jhogan, asierra, linux-snps-arc,
	dan.carpenter, ying.huang
I started out this series trying to make sysrq work over the serial
console on qcom_geni_serial, then fell into a rat's nest.
To solve the deadlock I faced when enabling sysrq I tried to borrow
code from '8250_port.c' which avoided grabbing the port lock in
console_write().  ...but since these days I try to run with lockdep on
all the time, I found it caused an annoying lockdep splat (which I
also reproduced on my rk3399 board).  ...so I instead changed my
qcom_geni_serial solution to borrow code from 'msm_serial.c'.
I wasn't super happy with the solution in 'msm_serial.c' though.  I
don't like releasing the spinlock there.  Not only is it ugly but it
means we are unlocking / re-locking _all the time_ even though sysrq
characters are rare.  ...so I came up with what I think is a better
solution and then implemented it for qcom_geni_serial.
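The idea of handling sysrq at port unlock time can be modeled in userspace roughly as follows. This is only a sketch: every name in it (`model_port`, `model_note_sysrq()`, `model_unlock_and_check_sysrq()`) is hypothetical, and a plain flag stands in for the port spinlock. It is not the code from the patches in this series.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a UART port; not the kernel's struct uart_port. */
struct model_port {
	bool locked;               /* models the port spinlock being held */
	char pending_sysrq;        /* 0 means "no sysrq character recorded" */
	int handled_count;         /* number of sysrq characters handled */
	char last_handled;         /* most recent character handled */
	bool handled_while_locked; /* set if the handler ever ran under the lock */
};

static void model_port_init(struct model_port *port)
{
	port->locked = false;
	port->pending_sysrq = 0;
	port->handled_count = 0;
	port->last_handled = 0;
	port->handled_while_locked = false;
}

static void model_lock(struct model_port *port)
{
	assert(!port->locked);
	port->locked = true;
}

/* Receive path, lock held: only record the character, do not handle it. */
static void model_note_sysrq(struct model_port *port, char ch)
{
	assert(port->locked);
	port->pending_sysrq = ch;
}

/* Stand-in for the real sysrq handler, which may need the port lock itself. */
static void model_handle_sysrq(struct model_port *port, char ch)
{
	if (port->locked)
		port->handled_while_locked = true;
	port->handled_count++;
	port->last_handled = ch;
}

/* Drop the lock first, then handle whatever was recorded while it was held. */
static void model_unlock_and_check_sysrq(struct model_port *port)
{
	char ch = port->pending_sysrq;

	port->pending_sysrq = 0;
	port->locked = false;

	if (ch)
		model_handle_sysrq(port, ch);
}
```

In this model the common case (no sysrq character pending) costs one extra load and compare at unlock time, and the lock is never dropped and re-taken in the middle of the receive path.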
Since I had a good way to test 8250-based UARTs, I also fixed that
driver to use my new method.  When doing so, I ran into a missing
include in serial_core.h and fixed that too.  I had no good way to test
msm_serial.c at all, so I didn't switch that (or all other serial
drivers for that matter) to the new method.
After fixing all the above issues, I found the next lockdep splat in
kgdb and I think I've worked around it in a good-enough way, but I'm
much less confident about this.  Hopefully folks can take a look at
it.
In general, patches earlier in this series should be "less
controversial" and hopefully can land even if people don't like
patches later in the series.  ;-)
Looking back, this is pretty much two series squashed that could be
treated independently.  The first is a serial series and the second is a
kgdb series.
For all serial patches I'd expect them to go through the tty tree once
they've been reviewed.
If folks are OK w/ the 'smp' patch it probably should go in some core
kernel tree.  The kgdb patch won't work without it, though, so to land
that we'd need coordination between the folks landing that and the
folks landing the 'smp' patch.
Douglas Anderson (7):
  serial: qcom_geni_serial: Finish supporting sysrq
  serial: core: Allow processing sysrq at port unlock time
  serial: qcom_geni_serial: Process sysrq at port unlock time
  serial: core: Include console.h from serial_core.h
  serial: 8250: Process sysrq at port unlock time
  smp: Don't yell about IRQs disabled in kgdb_roundup_cpus()
  kgdb: Remove irq flags and local_irq_enable/disable from roundup
 arch/arc/kernel/kgdb.c                      |  4 +--
 arch/arm/kernel/kgdb.c                      |  4 +--
 arch/arm64/kernel/kgdb.c                    |  4 +--
 arch/hexagon/kernel/kgdb.c                  | 11 ++----
 arch/mips/kernel/kgdb.c                     |  4 +--
 arch/powerpc/kernel/kgdb.c                  |  2 +-
 arch/sh/kernel/kgdb.c                       |  4 +--
 arch/sparc/kernel/smp_64.c                  |  2 +-
 arch/x86/kernel/kgdb.c                      |  9 ++---
 drivers/tty/serial/8250/8250_aspeed_vuart.c |  6 +++-
 drivers/tty/serial/8250/8250_fsl.c          |  6 +++-
 drivers/tty/serial/8250/8250_omap.c         |  6 +++-
 drivers/tty/serial/8250/8250_port.c         |  8 ++---
 drivers/tty/serial/qcom_geni_serial.c       | 10 ++++--
 include/linux/kgdb.h                        |  9 ++---
 include/linux/serial_core.h                 | 38 ++++++++++++++++++++-
 kernel/debug/debug_core.c                   |  2 +-
 kernel/smp.c                                |  4 ++-
 18 files changed, 80 insertions(+), 53 deletions(-)
-- 
2.19.1.568.g152ad8e336-goog
^ permalink raw reply	[flat|nested] 12+ messages in thread
* [PATCH 6/7] smp: Don't yell about IRQs disabled in kgdb_roundup_cpus()
  2018-10-29 18:07 [PATCH 0/7] serial: Finish kgdb on qcom_geni; fix many lockdep splats w/ kgdb Douglas Anderson
@ 2018-10-29 18:07 ` Douglas Anderson
  2018-10-30  8:25   ` Peter Zijlstra
  2018-10-30  9:41   ` Daniel Thompson
  2018-10-29 18:07 ` [PATCH 7/7] kgdb: Remove irq flags and local_irq_enable/disable from roundup Douglas Anderson
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 12+ messages in thread
From: Douglas Anderson @ 2018-10-29 18:07 UTC (permalink / raw)
  To: Jason Wessel, Daniel Thompson, tglx, mingo, gregkh
  Cc: linux-arm-msm, kgdb-bugreport, Douglas Anderson, linux-mips,
	linux-sh, peterz, linux-hexagon, frederic, riel, linux-kernel,
	luto, sparclinux, linux-snps-arc, linuxppc-dev, linux-arm-kernel
In kgdb_roundup_cpus() we've got code that looks like:
  local_irq_enable();
  smp_call_function(kgdb_call_nmi_hook, NULL, 0);
  local_irq_disable();
In certain cases when we drop into kgdb (like with sysrq-g on a serial
console) we'll get a big yell that looks like:
  sysrq: SysRq : DEBUG
  ------------[ cut here ]------------
  DEBUG_LOCKS_WARN_ON(current->hardirq_context)
  WARNING: CPU: 0 PID: 0 at .../kernel/locking/lockdep.c:2875 lockdep_hardirqs_on+0xf0/0x160
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0 #27
  pstate: 604003c9 (nZCv DAIF +PAN -UAO)
  pc : lockdep_hardirqs_on+0xf0/0x160
  ...
  Call trace:
   lockdep_hardirqs_on+0xf0/0x160
   trace_hardirqs_on+0x188/0x1ac
   kgdb_roundup_cpus+0x14/0x3c
   kgdb_cpu_enter+0x53c/0x5cc
   kgdb_handle_exception+0x180/0x1d4
   kgdb_compiled_brk_fn+0x30/0x3c
   brk_handler+0x134/0x178
   do_debug_exception+0xfc/0x178
   el1_dbg+0x18/0x78
   kgdb_breakpoint+0x34/0x58
   sysrq_handle_dbg+0x54/0x5c
   __handle_sysrq+0x114/0x21c
   handle_sysrq+0x30/0x3c
   qcom_geni_serial_isr+0x2dc/0x30c
  ...
  ...
  irq event stamp: ...45
  hardirqs last  enabled at (...44): [...] __do_softirq+0xd8/0x4e4
  hardirqs last disabled at (...45): [...] el1_irq+0x74/0x130
  softirqs last  enabled at (...42): [...] _local_bh_enable+0x2c/0x34
  softirqs last disabled at (...43): [...] irq_exit+0xa8/0x100
  ---[ end trace adf21f830c46e638 ]---
Let's add kgdb to the list of reasons not to warn in
smp_call_function_many().  That will allow us (in a future patch) to
stop calling local_irq_enable() which will get rid of the original
splat.
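As a reading aid, the effect on the warning condition can be written as a pair of pure predicates. This is a userspace sketch only: `struct model_state` and both function names are hypothetical, and the boolean fields merely mirror the terms that appear in the WARN_ON_ONCE() expression in the diff below.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the inputs to the WARN_ON_ONCE() condition. */
struct model_state {
	bool cpu_online;               /* cpu_online(this_cpu) */
	bool irqs_disabled;            /* irqs_disabled() */
	bool oops_in_progress;
	bool early_boot_irqs_disabled;
	bool in_dbg_master;            /* in_dbg_master() */
};

/* Condition before the patch: kgdb gets no special treatment. */
static bool model_would_warn_before(const struct model_state *s)
{
	return s->cpu_online && s->irqs_disabled &&
	       !s->oops_in_progress && !s->early_boot_irqs_disabled;
}

/* Condition after the patch: being the kgdb master CPU also suppresses it. */
static bool model_would_warn_after(const struct model_state *s)
{
	return model_would_warn_before(s) && !s->in_dbg_master;
}
```

The two predicates differ only when the caller is the kgdb master CPU with IRQs disabled, which is exactly the situation after the local_irq_enable() call is dropped from the roundup code.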
NOTE: with this change comes the obvious question: will we start
deadlocking more often now when we drop into the debugger?  I can't
say that for sure one way or the other, but the fact that we do the
same logic for "oops_in_progress" makes me feel like it shouldn't
happen too often.  Also note that the old logic of turning on
interrupts temporarily wasn't exactly safe since (I presume) that
could have violated spin_lock_irqsave() semantics and ended up with a
deadlock of its own.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
 kernel/smp.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/smp.c b/kernel/smp.c
index 163c451af42e..bb581e58c8dc 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -19,6 +19,7 @@
 #include <linux/sched.h>
 #include <linux/sched/idle.h>
 #include <linux/hypervisor.h>
+#include <linux/kgdb.h>
 
 #include "smpboot.h"
 
@@ -413,7 +414,8 @@ void smp_call_function_many(const struct cpumask *mask,
 	 * can't happen.
 	 */
 	WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
-		     && !oops_in_progress && !early_boot_irqs_disabled);
+		     && !oops_in_progress && !early_boot_irqs_disabled
+		     && !in_dbg_master());
 
 	/* Try to fastpath.  So, what's a CPU they want? Ignoring this one. */
 	cpu = cpumask_first_and(mask, cpu_online_mask);
-- 
2.19.1.568.g152ad8e336-goog
* [PATCH 7/7] kgdb: Remove irq flags and local_irq_enable/disable from roundup
  2018-10-29 18:07 [PATCH 0/7] serial: Finish kgdb on qcom_geni; fix many lockdep splats w/ kgdb Douglas Anderson
  2018-10-29 18:07 ` [PATCH 6/7] smp: Don't yell about IRQs disabled in kgdb_roundup_cpus() Douglas Anderson
@ 2018-10-29 18:07 ` Douglas Anderson
  2018-10-30 11:46   ` Daniel Thompson
  2018-10-30 11:56 ` [PATCH 0/7] serial: Finish kgdb on qcom_geni; fix many lockdep splats w/ kgdb Daniel Thompson
  2018-10-30 12:36 ` Andy Shevchenko
  3 siblings, 1 reply; 12+ messages in thread
From: Douglas Anderson @ 2018-10-29 18:07 UTC (permalink / raw)
  To: Jason Wessel, Daniel Thompson, tglx, mingo, gregkh
  Cc: linux-arm-msm, kgdb-bugreport, Douglas Anderson, kstewart,
	linux-mips, dalias, linux-sh, benh, will.deacon, linux-kernel,
	mhocko, paulus, hpa, sparclinux, linux-hexagon, sfr, ysato, mpe,
	x86, linux, mingo, catalin.marinas, jhogan, linux-snps-arc,
	ying.huang, rppt, bp, linux-arm-kernel, christophe.leroy,
	pombredanne, ralf, rkuo
The function kgdb_roundup_cpus() was passed a parameter that was
documented as:
> the flags that will be used when restoring the interrupts. There is
> local_irq_save() call before kgdb_roundup_cpus().
Nobody used those flags.  Anyone who wanted to temporarily turn on
interrupts just did local_irq_enable() and local_irq_disable() without
looking at them.  So we can definitely remove the flags.
Speaking of calling local_irq_enable(), it seems like a bad idea and
it caused a nice splat on my system with lockdep turned on.
Specifically it hit:
  DEBUG_LOCKS_WARN_ON(current->hardirq_context)
See the previous patch in this series ("smp: Don't yell about IRQs
disabled in kgdb_roundup_cpus()") for more details, but the last few
things on the stack were this on my arm64 board:
  lockdep_hardirqs_on+0xf0/0x160
  trace_hardirqs_on+0x188/0x1ac
  kgdb_roundup_cpus+0x14/0x3c
As argued in the commit text of ("smp: Don't yell about IRQs
disabled in kgdb_roundup_cpus()"), it seems better to make
smp_call_function() lenient about kgdb than to locally turn on IRQs
here.  Thus let's totally remove all the local_irq_enable() and
local_irq_disable() calls from all of the kgdb_roundup_cpus() calls.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
 arch/arc/kernel/kgdb.c     |  4 +---
 arch/arm/kernel/kgdb.c     |  4 +---
 arch/arm64/kernel/kgdb.c   |  4 +---
 arch/hexagon/kernel/kgdb.c | 11 ++---------
 arch/mips/kernel/kgdb.c    |  4 +---
 arch/powerpc/kernel/kgdb.c |  2 +-
 arch/sh/kernel/kgdb.c      |  4 +---
 arch/sparc/kernel/smp_64.c |  2 +-
 arch/x86/kernel/kgdb.c     |  9 ++-------
 include/linux/kgdb.h       |  9 ++-------
 kernel/debug/debug_core.c  |  2 +-
 11 files changed, 14 insertions(+), 41 deletions(-)
diff --git a/arch/arc/kernel/kgdb.c b/arch/arc/kernel/kgdb.c
index 9a3c34af2ae8..d94d3cb7f9e8 100644
--- a/arch/arc/kernel/kgdb.c
+++ b/arch/arc/kernel/kgdb.c
@@ -197,11 +197,9 @@ static void kgdb_call_nmi_hook(void *ignored)
 	kgdb_nmicallback(raw_smp_processor_id(), NULL);
 }
 
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
-	local_irq_enable();
 	smp_call_function(kgdb_call_nmi_hook, NULL, 0);
-	local_irq_disable();
 }
 
 struct kgdb_arch arch_kgdb_ops = {
diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c
index caa0dbe3dc61..a80e9259f7e9 100644
--- a/arch/arm/kernel/kgdb.c
+++ b/arch/arm/kernel/kgdb.c
@@ -175,11 +175,9 @@ static void kgdb_call_nmi_hook(void *ignored)
        kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
 }
 
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
-       local_irq_enable();
        smp_call_function(kgdb_call_nmi_hook, NULL, 0);
-       local_irq_disable();
 }
 
 static int __kgdb_notify(struct die_args *args, unsigned long cmd)
diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
index a20de58061a8..5d171c26788f 100644
--- a/arch/arm64/kernel/kgdb.c
+++ b/arch/arm64/kernel/kgdb.c
@@ -289,11 +289,9 @@ static void kgdb_call_nmi_hook(void *ignored)
 	kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
 }
 
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
-	local_irq_enable();
 	smp_call_function(kgdb_call_nmi_hook, NULL, 0);
-	local_irq_disable();
 }
 
 static int __kgdb_notify(struct die_args *args, unsigned long cmd)
diff --git a/arch/hexagon/kernel/kgdb.c b/arch/hexagon/kernel/kgdb.c
index 16c24b22d0b2..30fbc491cf45 100644
--- a/arch/hexagon/kernel/kgdb.c
+++ b/arch/hexagon/kernel/kgdb.c
@@ -119,17 +119,12 @@ void kgdb_arch_set_pc(struct pt_regs *regs, unsigned long pc)
 
 /**
  * kgdb_roundup_cpus - Get other CPUs into a holding pattern
- * @flags: Current IRQ state
  *
  * On SMP systems, we need to get the attention of the other CPUs
  * and get them be in a known state.  This should do what is needed
  * to get the other CPUs to call kgdb_wait(). Note that on some arches,
  * the NMI approach is not used for rounding up all the CPUs. For example,
- * in case of MIPS, smp_call_function() is used to roundup CPUs. In
- * this case, we have to make sure that interrupts are enabled before
- * calling smp_call_function(). The argument to this function is
- * the flags that will be used when restoring the interrupts. There is
- * local_irq_save() call before kgdb_roundup_cpus().
+ * in case of MIPS, smp_call_function() is used to roundup CPUs.
  *
  * On non-SMP systems, this is not called.
  */
@@ -139,11 +134,9 @@ static void hexagon_kgdb_nmi_hook(void *ignored)
 	kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
 }
 
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
-	local_irq_enable();
 	smp_call_function(hexagon_kgdb_nmi_hook, NULL, 0);
-	local_irq_disable();
 }
 #endif
 
diff --git a/arch/mips/kernel/kgdb.c b/arch/mips/kernel/kgdb.c
index eb6c0d582626..6671a279966f 100644
--- a/arch/mips/kernel/kgdb.c
+++ b/arch/mips/kernel/kgdb.c
@@ -219,11 +219,9 @@ static void kgdb_call_nmi_hook(void *ignored)
 	set_fs(old_fs);
 }
 
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
-	local_irq_enable();
 	smp_call_function(kgdb_call_nmi_hook, NULL, 0);
-	local_irq_disable();
 }
 
 static int compute_signal(int tt)
diff --git a/arch/powerpc/kernel/kgdb.c b/arch/powerpc/kernel/kgdb.c
index 59c578f865aa..b0e804844be0 100644
--- a/arch/powerpc/kernel/kgdb.c
+++ b/arch/powerpc/kernel/kgdb.c
@@ -124,7 +124,7 @@ static int kgdb_call_nmi_hook(struct pt_regs *regs)
 }
 
 #ifdef CONFIG_SMP
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
 	smp_send_debugger_break();
 }
diff --git a/arch/sh/kernel/kgdb.c b/arch/sh/kernel/kgdb.c
index 4f04c6638a4d..86b3ea927e42 100644
--- a/arch/sh/kernel/kgdb.c
+++ b/arch/sh/kernel/kgdb.c
@@ -319,11 +319,9 @@ static void kgdb_call_nmi_hook(void *ignored)
 	kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
 }
 
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
-	local_irq_enable();
 	smp_call_function(kgdb_call_nmi_hook, NULL, 0);
-	local_irq_disable();
 }
 
 static int __kgdb_notify(struct die_args *args, unsigned long cmd)
diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c
index 4792e08ad36b..f45d876983f1 100644
--- a/arch/sparc/kernel/smp_64.c
+++ b/arch/sparc/kernel/smp_64.c
@@ -1014,7 +1014,7 @@ void flush_dcache_page_all(struct mm_struct *mm, struct page *page)
 }
 
 #ifdef CONFIG_KGDB
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
 	smp_cross_call(&xcall_kgdb_capture, 0, 0, 0);
 }
diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index 8e36f249646e..ac6291a4178d 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -422,21 +422,16 @@ static void kgdb_disable_hw_debug(struct pt_regs *regs)
 #ifdef CONFIG_SMP
 /**
  *	kgdb_roundup_cpus - Get other CPUs into a holding pattern
- *	@flags: Current IRQ state
  *
  *	On SMP systems, we need to get the attention of the other CPUs
  *	and get them be in a known state.  This should do what is needed
  *	to get the other CPUs to call kgdb_wait(). Note that on some arches,
  *	the NMI approach is not used for rounding up all the CPUs. For example,
- *	in case of MIPS, smp_call_function() is used to roundup CPUs. In
- *	this case, we have to make sure that interrupts are enabled before
- *	calling smp_call_function(). The argument to this function is
- *	the flags that will be used when restoring the interrupts. There is
- *	local_irq_save() call before kgdb_roundup_cpus().
+ *	in case of MIPS, smp_call_function() is used to roundup CPUs.
  *
  *	On non-SMP systems, this is not called.
  */
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
 	apic->send_IPI_allbutself(APIC_DM_NMI);
 }
diff --git a/include/linux/kgdb.h b/include/linux/kgdb.h
index e465bb15912d..05e5b2eb0d32 100644
--- a/include/linux/kgdb.h
+++ b/include/linux/kgdb.h
@@ -178,21 +178,16 @@ kgdb_arch_handle_exception(int vector, int signo, int err_code,
 
 /**
  *	kgdb_roundup_cpus - Get other CPUs into a holding pattern
- *	@flags: Current IRQ state
  *
  *	On SMP systems, we need to get the attention of the other CPUs
  *	and get them into a known state.  This should do what is needed
  *	to get the other CPUs to call kgdb_wait(). Note that on some arches,
  *	the NMI approach is not used for rounding up all the CPUs. For example,
- *	in case of MIPS, smp_call_function() is used to roundup CPUs. In
- *	this case, we have to make sure that interrupts are enabled before
- *	calling smp_call_function(). The argument to this function is
- *	the flags that will be used when restoring the interrupts. There is
- *	local_irq_save() call before kgdb_roundup_cpus().
+ *	in case of MIPS, smp_call_function() is used to roundup CPUs.
  *
  *	On non-SMP systems, this is not called.
  */
-extern void kgdb_roundup_cpus(unsigned long flags);
+extern void kgdb_roundup_cpus(void);
 
 /**
  *	kgdb_arch_set_pc - Generic call back to the program counter
diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index 65c0f1363788..f3cadda45f07 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -593,7 +593,7 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
 
 	/* Signal the other CPUs to enter kgdb_wait() */
 	else if ((!kgdb_single_step) && kgdb_do_roundup)
-		kgdb_roundup_cpus(flags);
+		kgdb_roundup_cpus();
 #endif
 
 	/*
-- 
2.19.1.568.g152ad8e336-goog
* Re: [PATCH 6/7] smp: Don't yell about IRQs disabled in kgdb_roundup_cpus()
  2018-10-29 18:07 ` [PATCH 6/7] smp: Don't yell about IRQs disabled in kgdb_roundup_cpus() Douglas Anderson
@ 2018-10-30  8:25   ` Peter Zijlstra
  2018-10-30  9:41   ` Daniel Thompson
  1 sibling, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2018-10-30  8:25 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Jason Wessel, Daniel Thompson, tglx, mingo, gregkh, linux-arm-msm,
	kgdb-bugreport, linux-mips, linux-sh, linux-hexagon, frederic,
	riel, linux-kernel, luto, sparclinux, linux-snps-arc,
	linuxppc-dev, linux-arm-kernel
On Mon, Oct 29, 2018 at 11:07:06AM -0700, Douglas Anderson wrote:
> In kgdb_roundup_cpus() we've got code that looks like:
>   local_irq_enable();
>   smp_call_function(kgdb_call_nmi_hook, NULL, 0);
>   local_irq_disable();
> Let's add kgdb to the list of reasons not to warn in
> smp_call_function_many().  That will allow us (in a future patch) to
> stop calling local_irq_enable() which will get rid of the original
> splat.
> 
> NOTE: with this change comes the obvious question: will we start
> deadlocking more often now when we drop into the debugger.  I can't
> say that for sure one way or the other, but the fact that we do the
> same logic for "oops_in_progress" makes me feel like it shouldn't
> happen too often.  Also note that the old logic of turning on
> interrupts temporarily wasn't exactly safe since (I presume) that
> could have violated spin_lock_irqsave() semantics and ended up with a
> deadlock of its own.
How is any of that not utterly and terminally broken?
> @@ -413,7 +414,8 @@ void smp_call_function_many(const struct cpumask *mask,
>  	 * can't happen.
>  	 */
>  	WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
> -		     && !oops_in_progress && !early_boot_irqs_disabled);
> +		     && !oops_in_progress && !early_boot_irqs_disabled
> +		     && !in_dbg_master());
>  
>  	/* Try to fastpath.  So, what's a CPU they want? Ignoring this one. */
>  	cpu = cpumask_first_and(mask, cpu_online_mask);
Not a fan of this. There is a distinct difference between
oops_in_progress and dropping into kgdb in that you don't ever expect to
return/survive oopses, whereas we do expect to survive kgdb.
Also, how does kgdb work at all without actual NMIs ?
So no, NAK on this.
* Re: [PATCH 6/7] smp: Don't yell about IRQs disabled in kgdb_roundup_cpus()
  2018-10-29 18:07 ` [PATCH 6/7] smp: Don't yell about IRQs disabled in kgdb_roundup_cpus() Douglas Anderson
  2018-10-30  8:25   ` Peter Zijlstra
@ 2018-10-30  9:41   ` Daniel Thompson
  2018-10-30 22:21     ` Doug Anderson
  1 sibling, 1 reply; 12+ messages in thread
From: Daniel Thompson @ 2018-10-30  9:41 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Jason Wessel, tglx, mingo, gregkh, linux-arm-msm, kgdb-bugreport,
	linux-mips, linux-sh, peterz, linux-hexagon, frederic, riel,
	linux-kernel, luto, sparclinux, linux-snps-arc, linuxppc-dev,
	linux-arm-kernel
On Mon, Oct 29, 2018 at 11:07:06AM -0700, Douglas Anderson wrote:
> In kgdb_roundup_cpus() we've got code that looks like:
>   local_irq_enable();
>   smp_call_function(kgdb_call_nmi_hook, NULL, 0);
>   local_irq_disable();
> 
> In certain cases when we drop into kgdb (like with sysrq-g on a serial
> console) we'll get a big yell that looks like:
> 
>   sysrq: SysRq : DEBUG
>   ------------[ cut here ]------------
>   DEBUG_LOCKS_WARN_ON(current->hardirq_context)
>   WARNING: CPU: 0 PID: 0 at .../kernel/locking/lockdep.c:2875 lockdep_hardirqs_on+0xf0/0x160
>   CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0 #27
>   pstate: 604003c9 (nZCv DAIF +PAN -UAO)
>   pc : lockdep_hardirqs_on+0xf0/0x160
>   ...
>   Call trace:
>    lockdep_hardirqs_on+0xf0/0x160
>    trace_hardirqs_on+0x188/0x1ac
>    kgdb_roundup_cpus+0x14/0x3c
>    kgdb_cpu_enter+0x53c/0x5cc
>    kgdb_handle_exception+0x180/0x1d4
>    kgdb_compiled_brk_fn+0x30/0x3c
>    brk_handler+0x134/0x178
>    do_debug_exception+0xfc/0x178
>    el1_dbg+0x18/0x78
>    kgdb_breakpoint+0x34/0x58
>    sysrq_handle_dbg+0x54/0x5c
>    __handle_sysrq+0x114/0x21c
>    handle_sysrq+0x30/0x3c
>    qcom_geni_serial_isr+0x2dc/0x30c
>   ...
>   ...
>   irq event stamp: ...45
>   hardirqs last  enabled at (...44): [...] __do_softirq+0xd8/0x4e4
>   hardirqs last disabled at (...45): [...] el1_irq+0x74/0x130
>   softirqs last  enabled at (...42): [...] _local_bh_enable+0x2c/0x34
>   softirqs last disabled at (...43): [...] irq_exit+0xa8/0x100
>   ---[ end trace adf21f830c46e638 ]---
> 
> Let's add kgdb to the list of reasons not to warn in
> smp_call_function_many().  That will allow us (in a future patch) to
> stop calling local_irq_enable() which will get rid of the original
> splat.
> 
> NOTE: with this change comes the obvious question: will we start
> deadlocking more often now when we drop into the debugger.  I can't
> say that for sure one way or the other, but the fact that we do the
> same logic for "oops_in_progress" makes me feel like it shouldn't
> happen too often.  Also note that the old logic of turning on
> interrupts temporarily wasn't exactly safe since (I presume) that
> could have violated spin_lock_irqsave() semantics and ended up with a
> deadlock of its own.
This is part of the code to bring all the cores to a halt and since
the other cores are still running kgdb isn't yet able to use the fact
all the CPUs are halted to bend the rules. It is better for this code
to play by the rules if it can.
Is it possible to get the roundup functions to use a private csd
alongside smp_call_function_single_async()? We could add a helper
function to the debug core to avoid having to add cpu_online loops into
every kgdb_roundup_cpus() implementation.
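One way to picture such a helper is the userspace model below. It is a sketch under stated assumptions, not kernel code: `model_cpu`, `model_csd`, `model_call_single_async()` and `model_roundup_cpus()` are all hypothetical names, the stub only counts queued calls instead of sending an IPI, and the `rounding_up` flag is an assumed way to avoid reusing a csd that is still in flight.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4

/* Hypothetical stand-in for a per-CPU call_single_data_t. */
struct model_csd {
	void (*func)(void *info);
	void *info;
};

/* Hypothetical per-CPU bookkeeping for the model. */
struct model_cpu {
	bool online;
	bool rounding_up;      /* a roundup request is still in flight */
	struct model_csd csd;  /* private csd, one per CPU */
	int calls_queued;      /* how many requests the stub has accepted */
};

static struct model_cpu model_cpus[MODEL_NR_CPUS];

/* Stand-in for the per-arch hook that would run on the target CPU. */
static void model_nmi_hook(void *info)
{
	(void)info;
}

/*
 * Stub modeled on smp_call_function_single_async(): queue the request
 * and return 0 on success. Here it only counts the request.
 */
static int model_call_single_async(int cpu, struct model_csd *csd)
{
	(void)csd;
	model_cpus[cpu].calls_queued++;
	return 0;
}

/* Ask every other online CPU to stop, without waiting for any of them. */
static void model_roundup_cpus(int this_cpu)
{
	int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		struct model_cpu *c = &model_cpus[cpu];

		if (!c->online || cpu == this_cpu)
			continue;
		/* Never reuse a csd whose previous request has not run yet. */
		if (c->rounding_up)
			continue;

		c->rounding_up = true;
		c->csd.func = model_nmi_hook;
		c->csd.info = NULL;
		if (model_call_single_async(cpu, &c->csd))
			c->rounding_up = false;
	}
}
```

In this model the target CPU would be expected to clear `rounding_up` once its hook has run. Because nothing here waits for completion, the caller would not need interrupts enabled, which is the property the suggestion is after.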
Daniel.
> 
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> 
>  kernel/smp.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/smp.c b/kernel/smp.c
> index 163c451af42e..bb581e58c8dc 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -19,6 +19,7 @@
>  #include <linux/sched.h>
>  #include <linux/sched/idle.h>
>  #include <linux/hypervisor.h>
> +#include <linux/kgdb.h>
>  
>  #include "smpboot.h"
>  
> @@ -413,7 +414,8 @@ void smp_call_function_many(const struct cpumask *mask,
>  	 * can't happen.
>  	 */
>  	WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
> -		     && !oops_in_progress && !early_boot_irqs_disabled);
> +		     && !oops_in_progress && !early_boot_irqs_disabled
> +		     && !in_dbg_master());
>  
>  	/* Try to fastpath.  So, what's a CPU they want? Ignoring this one. */
>  	cpu = cpumask_first_and(mask, cpu_online_mask);
> -- 
> 2.19.1.568.g152ad8e336-goog
> 
* Re: [PATCH 7/7] kgdb: Remove irq flags and local_irq_enable/disable from roundup
  2018-10-29 18:07 ` [PATCH 7/7] kgdb: Remove irq flags and local_irq_enable/disable from roundup Douglas Anderson
@ 2018-10-30 11:46   ` Daniel Thompson
  2018-10-30 22:22     ` Doug Anderson
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel Thompson @ 2018-10-30 11:46 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Jason Wessel, tglx, mingo, gregkh, linux-arm-msm, kgdb-bugreport,
	kstewart, linux-mips, dalias, linux-sh, benh, will.deacon,
	linux-kernel, mhocko, paulus, hpa, sparclinux, linux-hexagon, sfr,
	ysato, mpe, x86, linux, mingo, catalin.marinas, jhogan,
	linux-snps-arc, ying.huang, rppt, bp, linux-arm-kernel,
	christophe.leroy
On Mon, Oct 29, 2018 at 11:07:07AM -0700, Douglas Anderson wrote:
> The function kgdb_roundup_cpus() was passed a parameter that was
> documented as:
> 
> > the flags that will be used when restoring the interrupts. There is
> > local_irq_save() call before kgdb_roundup_cpus().
> 
> Nobody used those flags.  Anyone who wanted to temporarily turn on
> interrupts just did local_irq_enable() and local_irq_disable() without
> looking at them.  So we can definitely remove the flags.
On the whole I'd rather that this change...
> Speaking of calling local_irq_enable(), it seems like a bad idea and
> it caused a nice splat on my system with lockdep turned on.
> Specifically it hit:
>   DEBUG_LOCKS_WARN_ON(current->hardirq_context)
... and fixes for this were in separate patches. They don't appear
especially related.
Daniel.
 
> See the previous patch in this series ("smp: Don't yell about IRQs
> disabled in kgdb_roundup_cpus()") for more details, but the last few
> things on the stack were this on my arm64 board:
>   lockdep_hardirqs_on+0xf0/0x160
>   trace_hardirqs_on+0x188/0x1ac
>   kgdb_roundup_cpus+0x14/0x3c
> 
> As argued in the commit text of ("smp: Don't yell about IRQs
> disabled in kgdb_roundup_cpus()"), it seems better to make
> smp_call_function() lenient about kgdb than to locally turn on IRQs
> here.  Thus let's totally remove all the local_irq_enable() and
> local_irq_disable() calls from all of the kgdb_roundup_cpus() calls.
> 
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> 
>  arch/arc/kernel/kgdb.c     |  4 +---
>  arch/arm/kernel/kgdb.c     |  4 +---
>  arch/arm64/kernel/kgdb.c   |  4 +---
>  arch/hexagon/kernel/kgdb.c | 11 ++---------
>  arch/mips/kernel/kgdb.c    |  4 +---
>  arch/powerpc/kernel/kgdb.c |  2 +-
>  arch/sh/kernel/kgdb.c      |  4 +---
>  arch/sparc/kernel/smp_64.c |  2 +-
>  arch/x86/kernel/kgdb.c     |  9 ++-------
>  include/linux/kgdb.h       |  9 ++-------
>  kernel/debug/debug_core.c  |  2 +-
>  11 files changed, 14 insertions(+), 41 deletions(-)
> 
> diff --git a/arch/arc/kernel/kgdb.c b/arch/arc/kernel/kgdb.c
> index 9a3c34af2ae8..d94d3cb7f9e8 100644
> --- a/arch/arc/kernel/kgdb.c
> +++ b/arch/arc/kernel/kgdb.c
> @@ -197,11 +197,9 @@ static void kgdb_call_nmi_hook(void *ignored)
>  	kgdb_nmicallback(raw_smp_processor_id(), NULL);
>  }
>  
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
>  {
> -	local_irq_enable();
>  	smp_call_function(kgdb_call_nmi_hook, NULL, 0);
> -	local_irq_disable();
>  }
>  
>  struct kgdb_arch arch_kgdb_ops = {
> diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c
> index caa0dbe3dc61..a80e9259f7e9 100644
> --- a/arch/arm/kernel/kgdb.c
> +++ b/arch/arm/kernel/kgdb.c
> @@ -175,11 +175,9 @@ static void kgdb_call_nmi_hook(void *ignored)
>         kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
>  }
>  
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
>  {
> -       local_irq_enable();
>         smp_call_function(kgdb_call_nmi_hook, NULL, 0);
> -       local_irq_disable();
>  }
>  
>  static int __kgdb_notify(struct die_args *args, unsigned long cmd)
> diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
> index a20de58061a8..5d171c26788f 100644
> --- a/arch/arm64/kernel/kgdb.c
> +++ b/arch/arm64/kernel/kgdb.c
> @@ -289,11 +289,9 @@ static void kgdb_call_nmi_hook(void *ignored)
>  	kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
>  }
>  
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
>  {
> -	local_irq_enable();
>  	smp_call_function(kgdb_call_nmi_hook, NULL, 0);
> -	local_irq_disable();
>  }
>  
>  static int __kgdb_notify(struct die_args *args, unsigned long cmd)
> diff --git a/arch/hexagon/kernel/kgdb.c b/arch/hexagon/kernel/kgdb.c
> index 16c24b22d0b2..30fbc491cf45 100644
> --- a/arch/hexagon/kernel/kgdb.c
> +++ b/arch/hexagon/kernel/kgdb.c
> @@ -119,17 +119,12 @@ void kgdb_arch_set_pc(struct pt_regs *regs, unsigned long pc)
>  
>  /**
>   * kgdb_roundup_cpus - Get other CPUs into a holding pattern
> - * @flags: Current IRQ state
>   *
>   * On SMP systems, we need to get the attention of the other CPUs
>   * and get them be in a known state.  This should do what is needed
>   * to get the other CPUs to call kgdb_wait(). Note that on some arches,
>   * the NMI approach is not used for rounding up all the CPUs. For example,
> - * in case of MIPS, smp_call_function() is used to roundup CPUs. In
> - * this case, we have to make sure that interrupts are enabled before
> - * calling smp_call_function(). The argument to this function is
> - * the flags that will be used when restoring the interrupts. There is
> - * local_irq_save() call before kgdb_roundup_cpus().
> + * in case of MIPS, smp_call_function() is used to roundup CPUs.
>   *
>   * On non-SMP systems, this is not called.
>   */
> @@ -139,11 +134,9 @@ static void hexagon_kgdb_nmi_hook(void *ignored)
>  	kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
>  }
>  
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
>  {
> -	local_irq_enable();
>  	smp_call_function(hexagon_kgdb_nmi_hook, NULL, 0);
> -	local_irq_disable();
>  }
>  #endif
>  
> diff --git a/arch/mips/kernel/kgdb.c b/arch/mips/kernel/kgdb.c
> index eb6c0d582626..6671a279966f 100644
> --- a/arch/mips/kernel/kgdb.c
> +++ b/arch/mips/kernel/kgdb.c
> @@ -219,11 +219,9 @@ static void kgdb_call_nmi_hook(void *ignored)
>  	set_fs(old_fs);
>  }
>  
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
>  {
> -	local_irq_enable();
>  	smp_call_function(kgdb_call_nmi_hook, NULL, 0);
> -	local_irq_disable();
>  }
>  
>  static int compute_signal(int tt)
> diff --git a/arch/powerpc/kernel/kgdb.c b/arch/powerpc/kernel/kgdb.c
> index 59c578f865aa..b0e804844be0 100644
> --- a/arch/powerpc/kernel/kgdb.c
> +++ b/arch/powerpc/kernel/kgdb.c
> @@ -124,7 +124,7 @@ static int kgdb_call_nmi_hook(struct pt_regs *regs)
>  }
>  
>  #ifdef CONFIG_SMP
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
>  {
>  	smp_send_debugger_break();
>  }
> diff --git a/arch/sh/kernel/kgdb.c b/arch/sh/kernel/kgdb.c
> index 4f04c6638a4d..86b3ea927e42 100644
> --- a/arch/sh/kernel/kgdb.c
> +++ b/arch/sh/kernel/kgdb.c
> @@ -319,11 +319,9 @@ static void kgdb_call_nmi_hook(void *ignored)
>  	kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
>  }
>  
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
>  {
> -	local_irq_enable();
>  	smp_call_function(kgdb_call_nmi_hook, NULL, 0);
> -	local_irq_disable();
>  }
>  
>  static int __kgdb_notify(struct die_args *args, unsigned long cmd)
> diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c
> index 4792e08ad36b..f45d876983f1 100644
> --- a/arch/sparc/kernel/smp_64.c
> +++ b/arch/sparc/kernel/smp_64.c
> @@ -1014,7 +1014,7 @@ void flush_dcache_page_all(struct mm_struct *mm, struct page *page)
>  }
>  
>  #ifdef CONFIG_KGDB
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
>  {
>  	smp_cross_call(&xcall_kgdb_capture, 0, 0, 0);
>  }
> diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
> index 8e36f249646e..ac6291a4178d 100644
> --- a/arch/x86/kernel/kgdb.c
> +++ b/arch/x86/kernel/kgdb.c
> @@ -422,21 +422,16 @@ static void kgdb_disable_hw_debug(struct pt_regs *regs)
>  #ifdef CONFIG_SMP
>  /**
>   *	kgdb_roundup_cpus - Get other CPUs into a holding pattern
> - *	@flags: Current IRQ state
>   *
>   *	On SMP systems, we need to get the attention of the other CPUs
>   *	and get them be in a known state.  This should do what is needed
>   *	to get the other CPUs to call kgdb_wait(). Note that on some arches,
>   *	the NMI approach is not used for rounding up all the CPUs. For example,
> - *	in case of MIPS, smp_call_function() is used to roundup CPUs. In
> - *	this case, we have to make sure that interrupts are enabled before
> - *	calling smp_call_function(). The argument to this function is
> - *	the flags that will be used when restoring the interrupts. There is
> - *	local_irq_save() call before kgdb_roundup_cpus().
> + *	in case of MIPS, smp_call_function() is used to roundup CPUs.
>   *
>   *	On non-SMP systems, this is not called.
>   */
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
>  {
>  	apic->send_IPI_allbutself(APIC_DM_NMI);
>  }
> diff --git a/include/linux/kgdb.h b/include/linux/kgdb.h
> index e465bb15912d..05e5b2eb0d32 100644
> --- a/include/linux/kgdb.h
> +++ b/include/linux/kgdb.h
> @@ -178,21 +178,16 @@ kgdb_arch_handle_exception(int vector, int signo, int err_code,
>  
>  /**
>   *	kgdb_roundup_cpus - Get other CPUs into a holding pattern
> - *	@flags: Current IRQ state
>   *
>   *	On SMP systems, we need to get the attention of the other CPUs
>   *	and get them into a known state.  This should do what is needed
>   *	to get the other CPUs to call kgdb_wait(). Note that on some arches,
>   *	the NMI approach is not used for rounding up all the CPUs. For example,
> - *	in case of MIPS, smp_call_function() is used to roundup CPUs. In
> - *	this case, we have to make sure that interrupts are enabled before
> - *	calling smp_call_function(). The argument to this function is
> - *	the flags that will be used when restoring the interrupts. There is
> - *	local_irq_save() call before kgdb_roundup_cpus().
> + *	in case of MIPS, smp_call_function() is used to roundup CPUs.
>   *
>   *	On non-SMP systems, this is not called.
>   */
> -extern void kgdb_roundup_cpus(unsigned long flags);
> +extern void kgdb_roundup_cpus(void);
>  
>  /**
>   *	kgdb_arch_set_pc - Generic call back to the program counter
> diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> index 65c0f1363788..f3cadda45f07 100644
> --- a/kernel/debug/debug_core.c
> +++ b/kernel/debug/debug_core.c
> @@ -593,7 +593,7 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
>  
>  	/* Signal the other CPUs to enter kgdb_wait() */
>  	else if ((!kgdb_single_step) && kgdb_do_roundup)
> -		kgdb_roundup_cpus(flags);
> +		kgdb_roundup_cpus();
>  #endif
>  
>  	/*
> -- 
> 2.19.1.568.g152ad8e336-goog
> 
* Re: [PATCH 0/7] serial: Finish kgdb on qcom_geni; fix many lockdep splats w/ kgdb
  2018-10-29 18:07 [PATCH 0/7] serial: Finish kgdb on qcom_geni; fix many lockdep splats w/ kgdb Douglas Anderson
  2018-10-29 18:07 ` [PATCH 6/7] smp: Don't yell about IRQs disabled in kgdb_roundup_cpus() Douglas Anderson
  2018-10-29 18:07 ` [PATCH 7/7] kgdb: Remove irq flags and local_irq_enable/disable from roundup Douglas Anderson
@ 2018-10-30 11:56 ` Daniel Thompson
  2018-10-30 12:31   ` Russell King - ARM Linux
  2018-10-30 22:20   ` Doug Anderson
  2018-10-30 12:36 ` Andy Shevchenko
  3 siblings, 2 replies; 12+ messages in thread
From: Daniel Thompson @ 2018-10-30 11:56 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Jason Wessel, tglx, mingo, gregkh, linux-arm-msm, kgdb-bugreport,
	nm, linux-mips, dalias, catalin.marinas, vigneshr, linux-aspeed,
	linux-sh, peterz, will.deacon, mhocko, paulus, hpa, sparclinux,
	marex, sfr, ysato, linux-hexagon, x86, linux, pombredanne, tony,
	mingo, joel, linux-serial, rolf.evers.fischer, jhogan, asierra,
	linux-snps-arc
On Mon, Oct 29, 2018 at 11:07:00AM -0700, Douglas Anderson wrote:
> Looking back, this is pretty much two series squashed that could be
> treated independently.  The first is a serial series and the second is a
> kgdb series.
Indeed.
I couldn't work out the link between the first 5 patches and the last 2
until I read this...
Is anything in the 01->05 patch set even related to kgdb? From the stack
traces it looks to me like the lock dep warning would trigger for any
sysrq. I think separating into two threads for v2 would be sensible.
Daniel.
> 
> For all serial patches I'd expect them to go through the tty tree once
> they've been reviewed.
> 
> If folks are OK w/ the 'smp' patch it probably should go in some core
> kernel tree.  The kgdb patch won't work without it, though, so to land
> that we'd need coordination between the folks landing that and the
> folks landing the 'smp' patch.
> 
> 
> Douglas Anderson (7):
>   serial: qcom_geni_serial: Finish supporting sysrq
>   serial: core: Allow processing sysrq at port unlock time
>   serial: qcom_geni_serial: Process sysrq at port unlock time
>   serial: core: Include console.h from serial_core.h
>   serial: 8250: Process sysrq at port unlock time
>   smp: Don't yell about IRQs disabled in kgdb_roundup_cpus()
>   kgdb: Remove irq flags and local_irq_enable/disable from roundup
> 
>  arch/arc/kernel/kgdb.c                      |  4 +--
>  arch/arm/kernel/kgdb.c                      |  4 +--
>  arch/arm64/kernel/kgdb.c                    |  4 +--
>  arch/hexagon/kernel/kgdb.c                  | 11 ++----
>  arch/mips/kernel/kgdb.c                     |  4 +--
>  arch/powerpc/kernel/kgdb.c                  |  2 +-
>  arch/sh/kernel/kgdb.c                       |  4 +--
>  arch/sparc/kernel/smp_64.c                  |  2 +-
>  arch/x86/kernel/kgdb.c                      |  9 ++---
>  drivers/tty/serial/8250/8250_aspeed_vuart.c |  6 +++-
>  drivers/tty/serial/8250/8250_fsl.c          |  6 +++-
>  drivers/tty/serial/8250/8250_omap.c         |  6 +++-
>  drivers/tty/serial/8250/8250_port.c         |  8 ++---
>  drivers/tty/serial/qcom_geni_serial.c       | 10 ++++--
>  include/linux/kgdb.h                        |  9 ++---
>  include/linux/serial_core.h                 | 38 ++++++++++++++++++++-
>  kernel/debug/debug_core.c                   |  2 +-
>  kernel/smp.c                                |  4 ++-
>  18 files changed, 80 insertions(+), 53 deletions(-)
> 
> -- 
> 2.19.1.568.g152ad8e336-goog
> 
* Re: [PATCH 0/7] serial: Finish kgdb on qcom_geni; fix many lockdep splats w/ kgdb
  2018-10-30 11:56 ` [PATCH 0/7] serial: Finish kgdb on qcom_geni; fix many lockdep splats w/ kgdb Daniel Thompson
@ 2018-10-30 12:31   ` Russell King - ARM Linux
  2018-10-30 22:20   ` Doug Anderson
  1 sibling, 0 replies; 12+ messages in thread
From: Russell King - ARM Linux @ 2018-10-30 12:31 UTC (permalink / raw)
  To: Daniel Thompson
  Cc: Douglas Anderson, Jason Wessel, tglx, mingo, gregkh,
	linux-arm-msm, kgdb-bugreport, nm, linux-mips, dalias,
	catalin.marinas, vigneshr, linux-aspeed, linux-sh, peterz,
	will.deacon, mhocko, paulus, hpa, sparclinux, marex, sfr, ysato,
	linux-hexagon, x86, pombredanne, tony, mingo, joel, linux-serial,
	rolf.evers.fischer, jhogan, asierra
On Tue, Oct 30, 2018 at 11:56:28AM +0000, Daniel Thompson wrote:
> On Mon, Oct 29, 2018 at 11:07:00AM -0700, Douglas Anderson wrote:
> > Looking back, this is pretty much two series squashed that could be
> > treated independently.  The first is a serial series and the second is a
> > kgdb series.
> 
> Indeed.
> 
> I couldn't work out the link between the first 5 patches and the last 2
> until I read this...
> 
> Is anything in the 01->05 patch set even related to kgdb? From the stack
> traces it looks to me like the lock dep warning would trigger for any
> sysrq. I think separating into two threads for v2 would be sensible.
I'm concerned about calling smp_call_function() from IRQ context with
IRQs disabled - that will block the ability of the _calling_ CPU to
process IPIs from other CPUs in the system.  If we have other CPUs
waiting on their IPIs to complete on _this_ CPU, we could end up
deadlocking while trying to grab the CSD lock.
This is the intention of warnings in smp_call_function*() - to catch
cases where deadlocks _can_ occur, but do not reliably show up.
The exceptions to the warning (disregarding oops_in_progress) are
chosen to allow IRQs-disabled calls when we're sure that the rest of
the system isn't going to be sending the calling CPU an IPI (eg,
because the CPU isn't marked online, and we only send IPIs to online
CPUs, or if we're still early in the kernel boot and hence have no
other CPUs running.)  The exception is oops_in_progress, which can
occur at any time - even with the current CPU already holding some
CSD locks (eg, oops while trying to send an IPI.)
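The conditions described above can be sketched as a small userspace model. This is only an illustration of the reasoning, not the code in kernel/smp.c: the struct and function names are hypothetical, and the four inputs are simply the cases named in the text (CPU online, IRQs disabled, oops in progress, early boot).

```c
#include <stdbool.h>

/*
 * Userspace model of the sanity check discussed above.  All names here
 * are hypothetical; only the four conditions come from the discussion.
 */
struct ipi_caller_state {
	bool cpu_online;       /* calling CPU is marked online */
	bool irqs_disabled;    /* caller has local IRQs masked */
	bool oops_in_progress; /* kernel is already oopsing */
	bool early_boot;       /* no other CPUs are running yet */
};

/*
 * Returns true when a cross-CPU call made in this state should warn:
 * the caller cannot service incoming IPIs (IRQs off), other CPUs may
 * be sending it some (it is online and boot has progressed), and we
 * are not in the "anything goes" oops path.
 */
static bool cross_call_should_warn(const struct ipi_caller_state *s)
{
	return s->cpu_online && s->irqs_disabled &&
	       !s->oops_in_progress && !s->early_boot;
}
```

In this model an online CPU calling with IRQs disabled warns, while the same call from an offline CPU, during early boot, or with an oops in progress does not.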
-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
* Re: [PATCH 0/7] serial: Finish kgdb on qcom_geni; fix many lockdep splats w/ kgdb
  2018-10-29 18:07 [PATCH 0/7] serial: Finish kgdb on qcom_geni; fix many lockdep splats w/ kgdb Douglas Anderson
                   ` (2 preceding siblings ...)
  2018-10-30 11:56 ` [PATCH 0/7] serial: Finish kgdb on qcom_geni; fix many lockdep splats w/ kgdb Daniel Thompson
@ 2018-10-30 12:36 ` Andy Shevchenko
  3 siblings, 0 replies; 12+ messages in thread
From: Andy Shevchenko @ 2018-10-30 12:36 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Jason Wessel, Daniel Thompson, tglx, mingo, gregkh, linux-arm-msm,
	kgdb-bugreport, nm, linux-mips, dalias, catalin.marinas, vigneshr,
	linux-aspeed, linux-sh, peterz, will.deacon, mhocko, paulus, hpa,
	sparclinux, marex, sfr, ysato, linux-hexagon, x86, linux,
	pombredanne, tony, mingo, joel, linux-serial, rolf.evers.fischer,
	jhogan
On Mon, Oct 29, 2018 at 11:07:00AM -0700, Douglas Anderson wrote:
> I started out this series trying to make sysrq work over the serial
> console on qcom_geni_serial, then fell into a rat's nest.
> 
> To solve the deadlock I faced when enabling sysrq I tried to borrow
> code from '8250_port.c' which avoided grabbing the port lock in
> console_write().  ...but since these days I try to run with lockdep on
> all the time, I found it caused an annoying lockdep splat (which I
> also reproduced on my rk3399 board).  ...so I instead changed my
> qcom_geni_serial solution to borrow code from 'msm_serial.c'
> 
> I wasn't super happy with the solution in 'msm_serial.c' though.  I
> don't like releasing the spinlock there.  Not only is it ugly but it
> means we are unlocking / re-locking _all the time_ even though sysrq
> characters are rare.  ...so I came up with what I think is a better
> solution and then implemented it for qcom_geni_serial.
> 
> Since I had a good way to test 8250-based UARTs, I also fixed that
> driver to use my new method.  When doing so, I ran into a missing
> [...] msm_serial.c at all, so I didn't switch that (or all other serial
> drivers for that matter) to the new method.
> 
> After fixing all the above issues, I found the next lockdep splat in
> kgdb and I think I've worked around it in a good-enough way, but I'm
> much less confident about this.  Hopefully folks can take a look at
> it.
> 
> In general, patches earlier in this series should be "less
> controversial" and hopefully can land even if people don't like
> patches later in the series.  ;-)
> 
> Looking back, this is pretty much two series squashed that could be
> treated independently.  The first is a serial series and the second is a
> kgdb series.
> 
> For all serial patches I'd expect them to go through the tty tree once
> they've been reviewed.
> 
> If folks are OK w/ the 'smp' patch it probably should go in some core
> kernel tree.  The kgdb patch won't work without it, though, so to land
> that we'd need coordination between the folks landing that and the
> folks landing the 'smp' patch.
I have received only 0/7 and 5/7.  Is everything okay with your mail client and other tools?
> 
> 
> Douglas Anderson (7):
>   serial: qcom_geni_serial: Finish supporting sysrq
>   serial: core: Allow processing sysrq at port unlock time
>   serial: qcom_geni_serial: Process sysrq at port unlock time
>   serial: core: Include console.h from serial_core.h
>   serial: 8250: Process sysrq at port unlock time
>   smp: Don't yell about IRQs disabled in kgdb_roundup_cpus()
>   kgdb: Remove irq flags and local_irq_enable/disable from roundup
> 
>  arch/arc/kernel/kgdb.c                      |  4 +--
>  arch/arm/kernel/kgdb.c                      |  4 +--
>  arch/arm64/kernel/kgdb.c                    |  4 +--
>  arch/hexagon/kernel/kgdb.c                  | 11 ++----
>  arch/mips/kernel/kgdb.c                     |  4 +--
>  arch/powerpc/kernel/kgdb.c                  |  2 +-
>  arch/sh/kernel/kgdb.c                       |  4 +--
>  arch/sparc/kernel/smp_64.c                  |  2 +-
>  arch/x86/kernel/kgdb.c                      |  9 ++---
>  drivers/tty/serial/8250/8250_aspeed_vuart.c |  6 +++-
>  drivers/tty/serial/8250/8250_fsl.c          |  6 +++-
>  drivers/tty/serial/8250/8250_omap.c         |  6 +++-
>  drivers/tty/serial/8250/8250_port.c         |  8 ++---
>  drivers/tty/serial/qcom_geni_serial.c       | 10 ++++--
>  include/linux/kgdb.h                        |  9 ++---
>  include/linux/serial_core.h                 | 38 ++++++++++++++++++++-
>  kernel/debug/debug_core.c                   |  2 +-
>  kernel/smp.c                                |  4 ++-
>  18 files changed, 80 insertions(+), 53 deletions(-)
> 
> -- 
> 2.19.1.568.g152ad8e336-goog
> 
-- 
With Best Regards,
Andy Shevchenko
* Re: [PATCH 0/7] serial: Finish kgdb on qcom_geni; fix many lockdep splats w/ kgdb
  2018-10-30 11:56 ` [PATCH 0/7] serial: Finish kgdb on qcom_geni; fix many lockdep splats w/ kgdb Daniel Thompson
  2018-10-30 12:31   ` Russell King - ARM Linux
@ 2018-10-30 22:20   ` Doug Anderson
  1 sibling, 0 replies; 12+ messages in thread
From: Doug Anderson @ 2018-10-30 22:20 UTC (permalink / raw)
  To: Daniel Thompson
  Cc: Nishanth Menon, linux-mips, dalias, Benjamin Herrenschmidt,
	vigneshr, linux-aspeed, linux-sh, Peter Zijlstra, Catalin Marinas,
	Will Deacon, mhocko, paulus, mpe, H. Peter Anvin, sparclinux,
	Jiri Slaby, Ingo Molnar, marex, Stephen Rothwell, ysato,
	linux-hexagon, x86, Russell King - ARM Linux, Tony Lindgren,
	Ingo Molnar, joel, linux-serial, kgdb-bugreport, jhogan,
	rolf.evers.fischer, asierra
Hi,
On Tue, Oct 30, 2018 at 4:56 AM Daniel Thompson
<daniel.thompson@linaro.org> wrote:
>
> On Mon, Oct 29, 2018 at 11:07:00AM -0700, Douglas Anderson wrote:
> > Looking back, this is pretty much two series squashed that could be
> > treated independently.  The first is a serial series and the second is a
> > kgdb series.
>
> Indeed.
>
> I couldn't work out the link between the first 5 patches and the last 2
> until I read this...
>
> Is anything in the 01->05 patch set even related to kgdb? From the stack
> traces it looks to me like the lock dep warning would trigger for any
> sysrq. I think separating into two threads for v2 would be sensible.
Yes, sorry about the mess.  Splitting this into two series makes the
most sense.  Then I can focus more on trying to get the CCs right and
people can just get the patches that matter to them.
OK, I've sent v2 of both series out now.  Please yell if you can't
find them for whatever reason.
-Doug
* Re: [PATCH 6/7] smp: Don't yell about IRQs disabled in kgdb_roundup_cpus()
  2018-10-30  9:41   ` Daniel Thompson
@ 2018-10-30 22:21     ` Doug Anderson
  0 siblings, 0 replies; 12+ messages in thread
From: Doug Anderson @ 2018-10-30 22:21 UTC (permalink / raw)
  To: Daniel Thompson
  Cc: Jason Wessel, Thomas Gleixner, Ingo Molnar, Greg Kroah-Hartman,
	linux-arm-msm, kgdb-bugreport, linux-mips, linux-sh,
	Peter Zijlstra, linux-hexagon, frederic, riel, LKML, luto,
	sparclinux, linux-snps-arc, linuxppc-dev, Linux ARM
Daniel,
On Tue, Oct 30, 2018 at 2:42 AM Daniel Thompson
<daniel.thompson@linaro.org> wrote:
>
> On Mon, Oct 29, 2018 at 11:07:06AM -0700, Douglas Anderson wrote:
> > In kgdb_roundup_cpus() we've got code that looks like:
> >   local_irq_enable();
> >   smp_call_function(kgdb_call_nmi_hook, NULL, 0);
> >   local_irq_disable();
> >
> > In certain cases when we drop into kgdb (like with sysrq-g on a serial
> > console) we'll get a big yell that looks like:
> >
> >   sysrq: SysRq : DEBUG
> >   ------------[ cut here ]------------
> >   DEBUG_LOCKS_WARN_ON(current->hardirq_context)
> >   WARNING: CPU: 0 PID: 0 at .../kernel/locking/lockdep.c:2875 lockdep_hardirqs_on+0xf0/0x160
> >   CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0 #27
> >   pstate: 604003c9 (nZCv DAIF +PAN -UAO)
> >   pc : lockdep_hardirqs_on+0xf0/0x160
> >   ...
> >   Call trace:
> >    lockdep_hardirqs_on+0xf0/0x160
> >    trace_hardirqs_on+0x188/0x1ac
> >    kgdb_roundup_cpus+0x14/0x3c
> >    kgdb_cpu_enter+0x53c/0x5cc
> >    kgdb_handle_exception+0x180/0x1d4
> >    kgdb_compiled_brk_fn+0x30/0x3c
> >    brk_handler+0x134/0x178
> >    do_debug_exception+0xfc/0x178
> >    el1_dbg+0x18/0x78
> >    kgdb_breakpoint+0x34/0x58
> >    sysrq_handle_dbg+0x54/0x5c
> >    __handle_sysrq+0x114/0x21c
> >    handle_sysrq+0x30/0x3c
> >    qcom_geni_serial_isr+0x2dc/0x30c
> >   ...
> >   ...
> >   irq event stamp: ...45
> >   hardirqs last  enabled at (...44): [...] __do_softirq+0xd8/0x4e4
> >   hardirqs last disabled at (...45): [...] el1_irq+0x74/0x130
> >   softirqs last  enabled at (...42): [...] _local_bh_enable+0x2c/0x34
> >   softirqs last disabled at (...43): [...] irq_exit+0xa8/0x100
> >   ---[ end trace adf21f830c46e638 ]---
> >
> > Let's add kgdb to the list of reasons not to warn in
> > smp_call_function_many().  That will allow us (in a future patch) to
> > stop calling local_irq_enable() which will get rid of the original
> > splat.
> >
> > NOTE: with this change comes the obvious question: will we start
> > deadlocking more often now when we drop into the debugger.  I can't
> > say that for sure one way or the other, but the fact that we do the
> > same logic for "oops_in_progress" makes me feel like it shouldn't
> > happen too often.  Also note that the old logic of turning on
> > interrupts temporarily wasn't exactly safe since (I presume) that
> > could have violated spin_lock_irqsave() semantics and ended up with a
> > deadlock of its own.
>
> This is part of the code to bring all the cores to a halt and since
> the other cores are still running kgdb isn't yet able to use the fact
> all the CPUs are halted to bend the rules. It is better for this code
> to play by the rules if it can.
>
> Is it possible to get the roundup functions to use a private csd
> alongside smp_call_function_single_async()? We could add a helper
> function to the debug core to avoid having to add cpu_online loops into
> every kgdb_roundup_cpus() implementation.
Exactly the kind of helpful suggestion I was looking for.  Thank you
very much!  See v2 and hopefully it matches what you were thinking of.
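For readers following along, the shape of that suggestion (one private csd per CPU, a single shared loop over online CPUs, and an asynchronous call that never waits) can be sketched as a userspace model. Everything named model_* below, the NR_CPUS value and the online map are hypothetical stand-ins, and the "already queued" return value is a simplification of this model rather than a statement about the kernel API.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4	/* hypothetical CPU count for this model */

/* Stand-in for the kernel's call_single_data_t. */
struct model_csd {
	void (*func)(void *info);
	void *info;
	bool queued;	/* request handed off, not yet run */
};

/* Hypothetical online map: CPU 2 is offline. */
static bool cpu_is_online[NR_CPUS] = { true, true, false, true };

/* One private csd per target CPU, owned by the debugger alone. */
static struct model_csd roundup_csd[NR_CPUS];

/* Stand-in for smp_call_function_single_async(): queue and return. */
static int model_call_single_async(int cpu, struct model_csd *csd)
{
	(void)cpu;
	if (csd->queued)
		return -1;	/* earlier request still pending */
	csd->queued = true;
	return 0;
}

static void model_nmi_hook(void *ignored)
{
	(void)ignored;
}

/*
 * Shared helper: ask every other online CPU to enter the debugger.
 * Returns how many requests were queued.  Because nothing here waits
 * for the targets, the caller can keep its IRQs disabled.
 */
static int model_roundup_cpus(int this_cpu)
{
	int cpu, sent = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		/* No need to round up ourselves or offline CPUs. */
		if (cpu == this_cpu || !cpu_is_online[cpu])
			continue;

		roundup_csd[cpu].func = model_nmi_hook;
		roundup_csd[cpu].info = NULL;
		if (model_call_single_async(cpu, &roundup_csd[cpu]) == 0)
			sent++;
	}
	return sent;
}
```

Keeping the loop in one helper means each architecture would only need to supply the hook, which is the point of the "avoid cpu_online loops in every implementation" remark.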
-Doug
* Re: [PATCH 7/7] kgdb: Remove irq flags and local_irq_enable/disable from roundup
  2018-10-30 11:46   ` Daniel Thompson
@ 2018-10-30 22:22     ` Doug Anderson
  0 siblings, 0 replies; 12+ messages in thread
From: Doug Anderson @ 2018-10-30 22:22 UTC (permalink / raw)
  To: Daniel Thompson
  Cc: kstewart, linux-mips, dalias, Catalin Marinas, linux-sh,
	Benjamin Herrenschmidt, Will Deacon, mhocko, paulus,
	H. Peter Anvin, sparclinux, linux-hexagon, Ingo Molnar,
	Stephen Rothwell, ysato, mpe, x86, Russell King - ARM Linux,
	Ingo Molnar, kgdb-bugreport, jhogan, linux-snps-arc,
	linux-arm-msm, ying.huang, rppt, bp, Thomas Gleixner, Linux ARM,
	christophe.leroy
Hi,
On Tue, Oct 30, 2018 at 4:46 AM Daniel Thompson
<daniel.thompson@linaro.org> wrote:
>
> On Mon, Oct 29, 2018 at 11:07:07AM -0700, Douglas Anderson wrote:
> > The function kgdb_roundup_cpus() was passed a parameter that was
> > documented as:
> >
> > > the flags that will be used when restoring the interrupts. There is
> > > local_irq_save() call before kgdb_roundup_cpus().
> >
> > Nobody used those flags.  Anyone who wanted to temporarily turn on
> > interrupts just did local_irq_enable() and local_irq_disable() without
> > looking at them.  So we can definitely remove the flags.
>
> On the whole I'd rather that this change...
>
>
> > Speaking of calling local_irq_enable(), it seems like a bad idea and
> > it caused a nice splat on my system with lockdep turned on.
> > Specifically it hit:
> >   DEBUG_LOCKS_WARN_ON(current->hardirq_context)
>
> ... and fixes for this were in separate patches. They don't appear
> especially related.
Agreed that this is cleaner.  Done for v2.
-Doug