linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH] kgdb: Timeout if secondary CPUs ignore the roundup
@ 2014-07-01 14:16 Daniel Thompson
  2014-07-01 14:22 ` Jason Wessel
  2014-07-02 14:12 ` [PATCH v2] " Daniel Thompson
  0 siblings, 2 replies; 7+ messages in thread
From: Daniel Thompson @ 2014-07-01 14:16 UTC
  To: Jason Wessel
  Cc: Daniel Thompson, linux-kernel, patches, linaro-kernel,
	Mike Travis, Randy Dunlap, Dimitri Sivanich, Andrew Morton,
	Borislav Petkov, kgdb-bugreport

Currently if an active CPU fails to respond to a roundup request the
CPU that requested the roundup will become stuck. This needlessly
reduces the robustness of the debugger.

This patch introduces a timeout allowing the system state to be examined
even when the system contains unresponsive processors. It also modifies
kdb's cpu command to make it censor attempts to switch to unresponsive
processors and to report their state as (D)ead.

Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: kgdb-bugreport@lists.sourceforge.net
---
 kernel/debug/debug_core.c   | 9 +++++++--
 kernel/debug/kdb/kdb_main.c | 4 +++-
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index 1adf62b..acd7497 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -471,6 +471,7 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
 	int cpu;
 	int trace_on = 0;
 	int online_cpus = num_online_cpus();
+	u64 time_left;
 
 	kgdb_info[ks->cpu].enter_kgdb++;
 	kgdb_info[ks->cpu].exception_state |= exception_state;
@@ -595,9 +596,13 @@ return_normal:
 	/*
 	 * Wait for the other CPUs to be notified and be waiting for us:
 	 */
-	while (kgdb_do_roundup && (atomic_read(&masters_in_kgdb) +
-				atomic_read(&slaves_in_kgdb)) != online_cpus)
+	time_left = loops_per_jiffy * HZ;
+	while (kgdb_do_roundup && --time_left &&
+	       (atomic_read(&masters_in_kgdb) + atomic_read(&slaves_in_kgdb)) !=
+		   online_cpus)
 		cpu_relax();
+	if (!time_left)
+		pr_crit("KGDB: Timed out waiting for secondary CPUs.\n");
 
 	/*
 	 * At this point the primary processor is completely
diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
index 2f7c760..49f2425 100644
--- a/kernel/debug/kdb/kdb_main.c
+++ b/kernel/debug/kdb/kdb_main.c
@@ -2157,6 +2157,8 @@ static void kdb_cpu_status(void)
 	for (start_cpu = -1, i = 0; i < NR_CPUS; i++) {
 		if (!cpu_online(i)) {
 			state = 'F';	/* cpu is offline */
+		} else if (!kgdb_info[i].enter_kgdb) {
+			state = 'D';	/* cpu is online but unresponsive */
 		} else {
 			state = ' ';	/* cpu is responding to kdb */
 			if (kdb_task_state_char(KDB_TSK(i)) == 'I')
@@ -2210,7 +2212,7 @@ static int kdb_cpu(int argc, const char **argv)
 	/*
 	 * Validate cpunum
 	 */
-	if ((cpunum > NR_CPUS) || !cpu_online(cpunum))
+	if ((cpunum > NR_CPUS) || !kgdb_info[cpunum].enter_kgdb)
 		return KDB_BADCPUNUM;
 
 	dbg_switch_cpu = cpunum;
-- 
1.9.3



* Re: [RFC PATCH] kgdb: Timeout if secondary CPUs ignore the roundup
  2014-07-01 14:16 [RFC PATCH] kgdb: Timeout if secondary CPUs ignore the roundup Daniel Thompson
@ 2014-07-01 14:22 ` Jason Wessel
  2014-07-02 14:12 ` [PATCH v2] " Daniel Thompson
  1 sibling, 0 replies; 7+ messages in thread
From: Jason Wessel @ 2014-07-01 14:22 UTC
  To: Daniel Thompson
  Cc: linux-kernel, patches, linaro-kernel, Mike Travis, Randy Dunlap,
	Dimitri Sivanich, Andrew Morton, Borislav Petkov, kgdb-bugreport

On 07/01/2014 09:16 AM, Daniel Thompson wrote:
> Currently if an active CPU fails to respond to a roundup request the
> CPU that requested the roundup will become stuck. This needlessly
> reduces the robustness of the debugger.
>
> This patch introduces a timeout allowing the system state to be examined
> even when the system contains unresponsive processors. It also modifies
> kdb's cpu command to make it censor attempts to switch to unresponsive
> processors and to report their state as (D)ead.


It seems reasonable to allow entry on the master core because there could certainly be useful information about how you got there in the first place, but I wonder about the case of resuming the system. In general, if you couldn't sync in the first place, the system is dead. My opinion is that we should probably explicitly disallow a resume or single step at that point.
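
[Editorial sketch, not part of the original message.] The two ideas in play here, a bounded wait for the roundup and a refusal to resume once the roundup has failed, can be illustrated with a small userspace simulation. Everything below is hypothetical: `struct roundup_sim`, `wait_for_roundup()` and `resume_allowed()` are invented names, the budget is an arbitrary loop count rather than `loops_per_jiffy * HZ`, and the sketch decides "timed out" by re-checking the CPU count rather than by testing `time_left` as the patch does. It only assumes that a failed roundup should latch a flag that later blocks resume, which is roughly what setting CATASTROPHIC in v2 is meant to achieve.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the debugger's view of the roundup. */
struct roundup_sim {
	int online_cpus;	/* CPUs expected to join the roundup */
	int cpus_in_debugger;	/* CPUs that have actually checked in */
	bool catastrophic;	/* latched when the roundup fails */
};

/*
 * Spin until every CPU has checked in or the loop budget runs out.
 * Returns true on a complete roundup, false on timeout.
 */
static bool wait_for_roundup(struct roundup_sim *sim, uint64_t budget)
{
	uint64_t time_left = budget;

	assert(budget > 0);
	while (--time_left && sim->cpus_in_debugger != sim->online_cpus)
		;	/* the kernel loop calls cpu_relax() here */

	if (sim->cpus_in_debugger != sim->online_cpus) {
		sim->catastrophic = true;	/* assumption: latch the failure */
		return false;
	}
	return true;
}

/* Hypothetical policy: never resume or single step after a failed roundup. */
static bool resume_allowed(const struct roundup_sim *sim)
{
	return !sim->catastrophic;
}
```

With four CPUs online and only three checked in, `wait_for_roundup()` gives up after the budget and `resume_allowed()` then returns false, while the debugger itself remains usable for inspection.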

Jason.



* [PATCH v2] kgdb: Timeout if secondary CPUs ignore the roundup
  2014-07-01 14:16 [RFC PATCH] kgdb: Timeout if secondary CPUs ignore the roundup Daniel Thompson
  2014-07-01 14:22 ` Jason Wessel
@ 2014-07-02 14:12 ` Daniel Thompson
  2014-08-18 15:01   ` [RESEND PATCH v2 3.17rc1] " Daniel Thompson
  1 sibling, 1 reply; 7+ messages in thread
From: Daniel Thompson @ 2014-07-02 14:12 UTC
  To: Jason Wessel
  Cc: Daniel Thompson, linux-kernel, patches, linaro-kernel,
	Mike Travis, Randy Dunlap, Dimitri Sivanich, Andrew Morton,
	Borislav Petkov, kgdb-bugreport

Currently if an active CPU fails to respond to a roundup request the
CPU that requested the roundup will become stuck. This needlessly
reduces the robustness of the debugger.

This patch introduces a timeout allowing the system state to be examined
even when the system contains unresponsive processors. It also modifies
kdb's cpu command to make it censor attempts to switch to unresponsive
processors and to report their state as (D)ead.

Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: kgdb-bugreport@lists.sourceforge.net
---

Notes:
    Changes since v1:
    - Set CATASTROPHIC if the system contains unresponsive processors
      (Jason Wessel)

 kernel/debug/debug_core.c       | 9 +++++++--
 kernel/debug/kdb/kdb_debugger.c | 4 ++++
 kernel/debug/kdb/kdb_main.c     | 4 +++-
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index 1adf62b..acd7497 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -471,6 +471,7 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
 	int cpu;
 	int trace_on = 0;
 	int online_cpus = num_online_cpus();
+	u64 time_left;
 
 	kgdb_info[ks->cpu].enter_kgdb++;
 	kgdb_info[ks->cpu].exception_state |= exception_state;
@@ -595,9 +596,13 @@ return_normal:
 	/*
 	 * Wait for the other CPUs to be notified and be waiting for us:
 	 */
-	while (kgdb_do_roundup && (atomic_read(&masters_in_kgdb) +
-				atomic_read(&slaves_in_kgdb)) != online_cpus)
+	time_left = loops_per_jiffy * HZ;
+	while (kgdb_do_roundup && --time_left &&
+	       (atomic_read(&masters_in_kgdb) + atomic_read(&slaves_in_kgdb)) !=
+		   online_cpus)
 		cpu_relax();
+	if (!time_left)
+		pr_crit("KGDB: Timed out waiting for secondary CPUs.\n");
 
 	/*
 	 * At this point the primary processor is completely
diff --git a/kernel/debug/kdb/kdb_debugger.c b/kernel/debug/kdb/kdb_debugger.c
index 8859ca3..15e1a7a 100644
--- a/kernel/debug/kdb/kdb_debugger.c
+++ b/kernel/debug/kdb/kdb_debugger.c
@@ -129,6 +129,10 @@ int kdb_stub(struct kgdb_state *ks)
 		ks->pass_exception = 1;
 		KDB_FLAG_SET(CATASTROPHIC);
 	}
+	/* set CATASTROPHIC if the system contains unresponsive processors */
+	for_each_online_cpu(i)
+		if (!kgdb_info[i].enter_kgdb)
+			KDB_FLAG_SET(CATASTROPHIC);
 	if (KDB_STATE(SSBPT) && reason == KDB_REASON_SSTEP) {
 		KDB_STATE_CLEAR(SSBPT);
 		KDB_STATE_CLEAR(DOING_SS);
diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
index 2f7c760..49f2425 100644
--- a/kernel/debug/kdb/kdb_main.c
+++ b/kernel/debug/kdb/kdb_main.c
@@ -2157,6 +2157,8 @@ static void kdb_cpu_status(void)
 	for (start_cpu = -1, i = 0; i < NR_CPUS; i++) {
 		if (!cpu_online(i)) {
 			state = 'F';	/* cpu is offline */
+		} else if (!kgdb_info[i].enter_kgdb) {
+			state = 'D';	/* cpu is online but unresponsive */
 		} else {
 			state = ' ';	/* cpu is responding to kdb */
 			if (kdb_task_state_char(KDB_TSK(i)) == 'I')
@@ -2210,7 +2212,7 @@ static int kdb_cpu(int argc, const char **argv)
 	/*
 	 * Validate cpunum
 	 */
-	if ((cpunum > NR_CPUS) || !cpu_online(cpunum))
+	if ((cpunum > NR_CPUS) || !kgdb_info[cpunum].enter_kgdb)
 		return KDB_BADCPUNUM;
 
 	dbg_switch_cpu = cpunum;
-- 
1.9.3



* [RESEND PATCH v2 3.17rc1] kgdb: Timeout if secondary CPUs ignore the roundup
  2014-07-02 14:12 ` [PATCH v2] " Daniel Thompson
@ 2014-08-18 15:01   ` Daniel Thompson
  2014-11-11 17:50     ` [PATCH v3] " Daniel Thompson
  0 siblings, 1 reply; 7+ messages in thread
From: Daniel Thompson @ 2014-08-18 15:01 UTC
  To: Jason Wessel
  Cc: Daniel Thompson, linux-kernel, patches, linaro-kernel,
	Mike Travis, Randy Dunlap, Dimitri Sivanich, Andrew Morton,
	Borislav Petkov, kgdb-bugreport, Ingo Molnar

Currently if an active CPU fails to respond to a roundup request the
CPU that requested the roundup will become stuck. This needlessly
reduces the robustness of the debugger.

This patch introduces a timeout allowing the system state to be examined
even when the system contains unresponsive processors. It also modifies
kdb's cpu command to make it censor attempts to switch to unresponsive
processors and to report their state as (D)ead.

Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: kgdb-bugreport@lists.sourceforge.net
Cc: Ingo Molnar <mingo@kernel.org>
---

Notes:
    Changes since v1:
    - Set CATASTROPHIC if the system contains unresponsive processors
      (Jason Wessel)

 kernel/debug/debug_core.c       | 9 +++++++--
 kernel/debug/kdb/kdb_debugger.c | 4 ++++
 kernel/debug/kdb/kdb_main.c     | 4 +++-
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index 1adf62b..acd7497 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -471,6 +471,7 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
 	int cpu;
 	int trace_on = 0;
 	int online_cpus = num_online_cpus();
+	u64 time_left;

 	kgdb_info[ks->cpu].enter_kgdb++;
 	kgdb_info[ks->cpu].exception_state |= exception_state;
@@ -595,9 +596,13 @@ return_normal:
 	/*
 	 * Wait for the other CPUs to be notified and be waiting for us:
 	 */
-	while (kgdb_do_roundup && (atomic_read(&masters_in_kgdb) +
-				atomic_read(&slaves_in_kgdb)) != online_cpus)
+	time_left = loops_per_jiffy * HZ;
+	while (kgdb_do_roundup && --time_left &&
+	       (atomic_read(&masters_in_kgdb) + atomic_read(&slaves_in_kgdb)) !=
+		   online_cpus)
 		cpu_relax();
+	if (!time_left)
+		pr_crit("KGDB: Timed out waiting for secondary CPUs.\n");

 	/*
 	 * At this point the primary processor is completely
diff --git a/kernel/debug/kdb/kdb_debugger.c b/kernel/debug/kdb/kdb_debugger.c
index 8859ca3..15e1a7a 100644
--- a/kernel/debug/kdb/kdb_debugger.c
+++ b/kernel/debug/kdb/kdb_debugger.c
@@ -129,6 +129,10 @@ int kdb_stub(struct kgdb_state *ks)
 		ks->pass_exception = 1;
 		KDB_FLAG_SET(CATASTROPHIC);
 	}
+	/* set CATASTROPHIC if the system contains unresponsive processors */
+	for_each_online_cpu(i)
+		if (!kgdb_info[i].enter_kgdb)
+			KDB_FLAG_SET(CATASTROPHIC);
 	if (KDB_STATE(SSBPT) && reason == KDB_REASON_SSTEP) {
 		KDB_STATE_CLEAR(SSBPT);
 		KDB_STATE_CLEAR(DOING_SS);
diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
index 379650b..0633e78 100644
--- a/kernel/debug/kdb/kdb_main.c
+++ b/kernel/debug/kdb/kdb_main.c
@@ -2157,6 +2157,8 @@ static void kdb_cpu_status(void)
 	for (start_cpu = -1, i = 0; i < NR_CPUS; i++) {
 		if (!cpu_online(i)) {
 			state = 'F';	/* cpu is offline */
+		} else if (!kgdb_info[i].enter_kgdb) {
+			state = 'D';	/* cpu is online but unresponsive */
 		} else {
 			state = ' ';	/* cpu is responding to kdb */
 			if (kdb_task_state_char(KDB_TSK(i)) == 'I')
@@ -2210,7 +2212,7 @@ static int kdb_cpu(int argc, const char **argv)
 	/*
 	 * Validate cpunum
 	 */
-	if ((cpunum > NR_CPUS) || !cpu_online(cpunum))
+	if ((cpunum > NR_CPUS) || !kgdb_info[cpunum].enter_kgdb)
 		return KDB_BADCPUNUM;

 	dbg_switch_cpu = cpunum;
--
1.9.3



* [PATCH v3] kgdb: Timeout if secondary CPUs ignore the roundup
  2014-08-18 15:01   ` [RESEND PATCH v2 3.17rc1] " Daniel Thompson
@ 2014-11-11 17:50     ` Daniel Thompson
  2014-12-10 20:46       ` Daniel Thompson
  2015-01-07 16:15       ` [RESEND PATCH v3 3.19-rc2] " Daniel Thompson
  0 siblings, 2 replies; 7+ messages in thread
From: Daniel Thompson @ 2014-11-11 17:50 UTC
  To: Jason Wessel
  Cc: Daniel Thompson, linux-kernel, patches, linaro-kernel,
	John Stultz, Sumit Semwal, Mike Travis, Randy Dunlap,
	Dimitri Sivanich, Andrew Morton, Borislav Petkov, kgdb-bugreport,
	Ingo Molnar

Currently if an active CPU fails to respond to a roundup request the
CPU that requested the roundup will become stuck. This needlessly
reduces the robustness of the debugger.

This patch introduces a timeout allowing the system state to be examined
even when the system contains unresponsive processors. It also modifies
kdb's cpu command to make it censor attempts to switch to unresponsive
processors and to report their state as (D)ead.

Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: kgdb-bugreport@lists.sourceforge.net
Cc: Ingo Molnar <mingo@kernel.org>
---

Notes:
    I had to spin this one again due to an out-by-one in kdb_cpu() so
    I've tackled a couple of other small issues at the same time.
    
    v3:
    
    * Fix an out-by-one error in kdb_cpu().
    
    * Replace NR_CPUS with CONFIG_NR_CPUS to tell checkpatch that we
      really want a static limit (Jason Wessel).
    
    * Removed the "KGDB: " prefix from the pr_crit() in debug_core.c
      (kgdb-next contains a patch which introduced pr_fmt() to this file
      so the tag will now be applied automatically).
    
    v2:
    
    * Set CATASTROPHIC if the system contains unresponsive processors
      (Jason Wessel)
    

 kernel/debug/debug_core.c       | 9 +++++++--
 kernel/debug/kdb/kdb_debugger.c | 4 ++++
 kernel/debug/kdb/kdb_main.c     | 4 +++-
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index 1adf62b39b96..f21580b347cc 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -471,6 +471,7 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
 	int cpu;
 	int trace_on = 0;
 	int online_cpus = num_online_cpus();
+	u64 time_left;

 	kgdb_info[ks->cpu].enter_kgdb++;
 	kgdb_info[ks->cpu].exception_state |= exception_state;
@@ -595,9 +596,13 @@ return_normal:
 	/*
 	 * Wait for the other CPUs to be notified and be waiting for us:
 	 */
-	while (kgdb_do_roundup && (atomic_read(&masters_in_kgdb) +
-				atomic_read(&slaves_in_kgdb)) != online_cpus)
+	time_left = loops_per_jiffy * HZ;
+	while (kgdb_do_roundup && --time_left &&
+	       (atomic_read(&masters_in_kgdb) + atomic_read(&slaves_in_kgdb)) !=
+		   online_cpus)
 		cpu_relax();
+	if (!time_left)
+		pr_crit("Timed out waiting for secondary CPUs.\n");

 	/*
 	 * At this point the primary processor is completely
diff --git a/kernel/debug/kdb/kdb_debugger.c b/kernel/debug/kdb/kdb_debugger.c
index 8859ca34dcfe..15e1a7af5dd0 100644
--- a/kernel/debug/kdb/kdb_debugger.c
+++ b/kernel/debug/kdb/kdb_debugger.c
@@ -129,6 +129,10 @@ int kdb_stub(struct kgdb_state *ks)
 		ks->pass_exception = 1;
 		KDB_FLAG_SET(CATASTROPHIC);
 	}
+	/* set CATASTROPHIC if the system contains unresponsive processors */
+	for_each_online_cpu(i)
+		if (!kgdb_info[i].enter_kgdb)
+			KDB_FLAG_SET(CATASTROPHIC);
 	if (KDB_STATE(SSBPT) && reason == KDB_REASON_SSTEP) {
 		KDB_STATE_CLEAR(SSBPT);
 		KDB_STATE_CLEAR(DOING_SS);
diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
index 379650b984f8..0c1dc7fa2e58 100644
--- a/kernel/debug/kdb/kdb_main.c
+++ b/kernel/debug/kdb/kdb_main.c
@@ -2157,6 +2157,8 @@ static void kdb_cpu_status(void)
 	for (start_cpu = -1, i = 0; i < NR_CPUS; i++) {
 		if (!cpu_online(i)) {
 			state = 'F';	/* cpu is offline */
+		} else if (!kgdb_info[i].enter_kgdb) {
+			state = 'D';	/* cpu is online but unresponsive */
 		} else {
 			state = ' ';	/* cpu is responding to kdb */
 			if (kdb_task_state_char(KDB_TSK(i)) == 'I')
@@ -2210,7 +2212,7 @@ static int kdb_cpu(int argc, const char **argv)
 	/*
 	 * Validate cpunum
 	 */
-	if ((cpunum > NR_CPUS) || !cpu_online(cpunum))
+	if ((cpunum >= CONFIG_NR_CPUS) || !kgdb_info[cpunum].enter_kgdb)
 		return KDB_BADCPUNUM;

 	dbg_switch_cpu = cpunum;
--
1.9.3



* Re: [PATCH v3] kgdb: Timeout if secondary CPUs ignore the roundup
  2014-11-11 17:50     ` [PATCH v3] " Daniel Thompson
@ 2014-12-10 20:46       ` Daniel Thompson
  2015-01-07 16:15       ` [RESEND PATCH v3 3.19-rc2] " Daniel Thompson
  1 sibling, 0 replies; 7+ messages in thread
From: Daniel Thompson @ 2014-12-10 20:46 UTC
  To: Jason Wessel
  Cc: linux-kernel, patches, linaro-kernel, John Stultz, Sumit Semwal,
	Mike Travis, Randy Dunlap, Dimitri Sivanich, Andrew Morton,
	Borislav Petkov, kgdb-bugreport, Ingo Molnar, Andrew Morton

Hi Jason

On 11/11/14 17:50, Daniel Thompson wrote:
> Currently if an active CPU fails to respond to a roundup request the
> CPU that requested the roundup will become stuck. This needlessly
> reduces the robustness of the debugger.
> 
> This patch introduces a timeout allowing the system state to be examined
> even when the system contains unresponsive processors. It also modifies
> kdb's cpu command to make it censor attempts to switch to unresponsive
> processors and to report their state as (D)ead.
> 
> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
> Cc: Jason Wessel <jason.wessel@windriver.com>
> Cc: Mike Travis <travis@sgi.com>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: Dimitri Sivanich <sivanich@sgi.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: kgdb-bugreport@lists.sourceforge.net
> Cc: Ingo Molnar <mingo@kernel.org>
> ---
> 
> Notes:
>     I had to spin this one again due to an out-by-one in kdb_cpu() so
>     I've tackled a couple of other small issues at the same time.
>     
>     v3:
>     
>     * Fix an out-by-one error in kdb_cpu().
>     
>     * Replace NR_CPUS with CONFIG_NR_CPUS to tell checkpatch that we
>       really want a static limit (Jason Wessel).
>     
>     * Removed the "KGDB: " prefix from the pr_crit() in debug_core.c
>       (kgdb-next contains a patch which introduced pr_fmt() to this file
>       so the tag will now be applied automatically).

It looks like this patch is still at v2 in your kgdb-next tree. It
really would be better to switch it for v3 before sending Linus a pull
request. My patch didn't introduce the out-by-one error mentioned above,
but the changes I made did turn it from a latent bug into a real one.
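
[Editorial sketch, not part of the original message.] The out-by-one can be shown in isolation. The v3 diff changes the bounds test in kdb_cpu() from `cpunum > NR_CPUS` to `cpunum >= CONFIG_NR_CPUS` and, since v1, follows it with an index into `kgdb_info[cpunum]`. Assuming that array has one entry per possible CPU (an assumption, its declaration is not shown in this thread), a value equal to the limit passes the old test and then reads one element past the end, which is a plausible reading of "latent bug into a real one". The names `SIM_NR_CPUS` and `sim_enter_kgdb` below are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_NR_CPUS 4	/* hypothetical stand-in for CONFIG_NR_CPUS */

/* Hypothetical stand-in for kgdb_info[].enter_kgdb; CPU 2 never checked in. */
static const int sim_enter_kgdb[SIM_NR_CPUS] = { 1, 1, 0, 1 };

/* Pre-v3 bounds test: cpunum == SIM_NR_CPUS is wrongly accepted. */
static bool old_bounds_ok(unsigned long cpunum)
{
	return !(cpunum > SIM_NR_CPUS);
}

/* v3 bounds test: only valid array indices are accepted. */
static bool new_bounds_ok(unsigned long cpunum)
{
	return !(cpunum >= SIM_NR_CPUS);
}

/* Mirrors the v3 validation order: bounds first, then "did it check in?". */
static bool can_switch_to(unsigned long cpunum)
{
	if (!new_bounds_ok(cpunum))
		return false;
	assert(cpunum < SIM_NR_CPUS);	/* the index below is now in range */
	return sim_enter_kgdb[cpunum] != 0;
}
```

Here `old_bounds_ok(SIM_NR_CPUS)` is true even though index 4 is outside the four-element array, whereas `new_bounds_ok()` and `can_switch_to()` both reject it; `can_switch_to(2)` is also false because that simulated CPU never entered the debugger.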

Also, at this stage in the merge window I'm not sure what you plan to do
with the other patches still pending from me. They've all been out for
review for a long time but, obviously, haven't spent any time in linux-next:


kdb: Remove stack dump when entering kgdb due to NMI
http://thread.gmane.org/gmane.linux.kernel/1748706/focus=1823086

kdb: Improve command output searching
http://thread.gmane.org/gmane.linux.kernel.debugging.kgdb.bugs/6569/focus=1823109

kdb: Avoid printing KERN_ levels to consoles
  (KERN_ levels are now non-printable characters, so sending them to
  consoles is bad)
http://thread.gmane.org/gmane.linux.kernel.debugging.kgdb.bugs/6577/focus=112757


Daniel.


>     v2:
>     
>     * Set CATASTROPHIC if the system contains unresponsive processors
>       (Jason Wessel)
>     
> 
>  kernel/debug/debug_core.c       | 9 +++++++--
>  kernel/debug/kdb/kdb_debugger.c | 4 ++++
>  kernel/debug/kdb/kdb_main.c     | 4 +++-
>  3 files changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> index 1adf62b39b96..f21580b347cc 100644
> --- a/kernel/debug/debug_core.c
> +++ b/kernel/debug/debug_core.c
> @@ -471,6 +471,7 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
>  	int cpu;
>  	int trace_on = 0;
>  	int online_cpus = num_online_cpus();
> +	u64 time_left;
> 
>  	kgdb_info[ks->cpu].enter_kgdb++;
>  	kgdb_info[ks->cpu].exception_state |= exception_state;
> @@ -595,9 +596,13 @@ return_normal:
>  	/*
>  	 * Wait for the other CPUs to be notified and be waiting for us:
>  	 */
> -	while (kgdb_do_roundup && (atomic_read(&masters_in_kgdb) +
> -				atomic_read(&slaves_in_kgdb)) != online_cpus)
> +	time_left = loops_per_jiffy * HZ;
> +	while (kgdb_do_roundup && --time_left &&
> +	       (atomic_read(&masters_in_kgdb) + atomic_read(&slaves_in_kgdb)) !=
> +		   online_cpus)
>  		cpu_relax();
> +	if (!time_left)
> +		pr_crit("Timed out waiting for secondary CPUs.\n");
> 
>  	/*
>  	 * At this point the primary processor is completely
> diff --git a/kernel/debug/kdb/kdb_debugger.c b/kernel/debug/kdb/kdb_debugger.c
> index 8859ca34dcfe..15e1a7af5dd0 100644
> --- a/kernel/debug/kdb/kdb_debugger.c
> +++ b/kernel/debug/kdb/kdb_debugger.c
> @@ -129,6 +129,10 @@ int kdb_stub(struct kgdb_state *ks)
>  		ks->pass_exception = 1;
>  		KDB_FLAG_SET(CATASTROPHIC);
>  	}
> +	/* set CATASTROPHIC if the system contains unresponsive processors */
> +	for_each_online_cpu(i)
> +		if (!kgdb_info[i].enter_kgdb)
> +			KDB_FLAG_SET(CATASTROPHIC);
>  	if (KDB_STATE(SSBPT) && reason == KDB_REASON_SSTEP) {
>  		KDB_STATE_CLEAR(SSBPT);
>  		KDB_STATE_CLEAR(DOING_SS);
> diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
> index 379650b984f8..0c1dc7fa2e58 100644
> --- a/kernel/debug/kdb/kdb_main.c
> +++ b/kernel/debug/kdb/kdb_main.c
> @@ -2157,6 +2157,8 @@ static void kdb_cpu_status(void)
>  	for (start_cpu = -1, i = 0; i < NR_CPUS; i++) {
>  		if (!cpu_online(i)) {
>  			state = 'F';	/* cpu is offline */
> +		} else if (!kgdb_info[i].enter_kgdb) {
> +			state = 'D';	/* cpu is online but unresponsive */
>  		} else {
>  			state = ' ';	/* cpu is responding to kdb */
>  			if (kdb_task_state_char(KDB_TSK(i)) == 'I')
> @@ -2210,7 +2212,7 @@ static int kdb_cpu(int argc, const char **argv)
>  	/*
>  	 * Validate cpunum
>  	 */
> -	if ((cpunum > NR_CPUS) || !cpu_online(cpunum))
> +	if ((cpunum >= CONFIG_NR_CPUS) || !kgdb_info[cpunum].enter_kgdb)
>  		return KDB_BADCPUNUM;
> 
>  	dbg_switch_cpu = cpunum;
> --
> 1.9.3
> 



* [RESEND PATCH v3 3.19-rc2] kgdb: Timeout if secondary CPUs ignore the roundup
  2014-11-11 17:50     ` [PATCH v3] " Daniel Thompson
  2014-12-10 20:46       ` Daniel Thompson
@ 2015-01-07 16:15       ` Daniel Thompson
  1 sibling, 0 replies; 7+ messages in thread
From: Daniel Thompson @ 2015-01-07 16:15 UTC
  To: Jason Wessel
  Cc: Daniel Thompson, linux-kernel, patches, linaro-kernel,
	John Stultz, Sumit Semwal, Mike Travis, Randy Dunlap,
	Dimitri Sivanich, Andrew Morton, Borislav Petkov, kgdb-bugreport,
	Ingo Molnar

Currently if an active CPU fails to respond to a roundup request the
CPU that requested the roundup will become stuck. This needlessly
reduces the robustness of the debugger.

This patch introduces a timeout allowing the system state to be examined
even when the system contains unresponsive processors. It also modifies
kdb's cpu command to make it censor attempts to switch to unresponsive
processors and to report their state as (D)ead.

Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: kgdb-bugreport@lists.sourceforge.net
Cc: Ingo Molnar <mingo@kernel.org>
---

Notes:
    Jason:
      v2 of this patch is already integrated into kgdb-next. It's probably
      best to nuke v2 and replace it with this patch. However I can easily
      provide a diff of v2 versus v3 if you prefer. Just ask...
    
    v3:
    
    * Fix an out-by-one error in kdb_cpu().
    
    * Replace NR_CPUS with CONFIG_NR_CPUS to tell checkpatch that we
      really want a static limit (Jason Wessel).
    
    * Removed the "KGDB: " prefix from the pr_crit() in debug_core.c
      (kgdb-next contains a patch which introduced pr_fmt() to this file
      so the tag will now be applied automatically).
    
    v2:
    
    * Set CATASTROPHIC if the system contains unresponsive processors
      (Jason Wessel)
    

 kernel/debug/debug_core.c       | 9 +++++++--
 kernel/debug/kdb/kdb_debugger.c | 4 ++++
 kernel/debug/kdb/kdb_main.c     | 4 +++-
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index 1adf62b39b96..f21580b347cc 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -471,6 +471,7 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
 	int cpu;
 	int trace_on = 0;
 	int online_cpus = num_online_cpus();
+	u64 time_left;

 	kgdb_info[ks->cpu].enter_kgdb++;
 	kgdb_info[ks->cpu].exception_state |= exception_state;
@@ -595,9 +596,13 @@ return_normal:
 	/*
 	 * Wait for the other CPUs to be notified and be waiting for us:
 	 */
-	while (kgdb_do_roundup && (atomic_read(&masters_in_kgdb) +
-				atomic_read(&slaves_in_kgdb)) != online_cpus)
+	time_left = loops_per_jiffy * HZ;
+	while (kgdb_do_roundup && --time_left &&
+	       (atomic_read(&masters_in_kgdb) + atomic_read(&slaves_in_kgdb)) !=
+		   online_cpus)
 		cpu_relax();
+	if (!time_left)
+		pr_crit("Timed out waiting for secondary CPUs.\n");

 	/*
 	 * At this point the primary processor is completely
diff --git a/kernel/debug/kdb/kdb_debugger.c b/kernel/debug/kdb/kdb_debugger.c
index 8859ca34dcfe..15e1a7af5dd0 100644
--- a/kernel/debug/kdb/kdb_debugger.c
+++ b/kernel/debug/kdb/kdb_debugger.c
@@ -129,6 +129,10 @@ int kdb_stub(struct kgdb_state *ks)
 		ks->pass_exception = 1;
 		KDB_FLAG_SET(CATASTROPHIC);
 	}
+	/* set CATASTROPHIC if the system contains unresponsive processors */
+	for_each_online_cpu(i)
+		if (!kgdb_info[i].enter_kgdb)
+			KDB_FLAG_SET(CATASTROPHIC);
 	if (KDB_STATE(SSBPT) && reason == KDB_REASON_SSTEP) {
 		KDB_STATE_CLEAR(SSBPT);
 		KDB_STATE_CLEAR(DOING_SS);
diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
index 379650b984f8..0c1dc7fa2e58 100644
--- a/kernel/debug/kdb/kdb_main.c
+++ b/kernel/debug/kdb/kdb_main.c
@@ -2157,6 +2157,8 @@ static void kdb_cpu_status(void)
 	for (start_cpu = -1, i = 0; i < NR_CPUS; i++) {
 		if (!cpu_online(i)) {
 			state = 'F';	/* cpu is offline */
+		} else if (!kgdb_info[i].enter_kgdb) {
+			state = 'D';	/* cpu is online but unresponsive */
 		} else {
 			state = ' ';	/* cpu is responding to kdb */
 			if (kdb_task_state_char(KDB_TSK(i)) == 'I')
@@ -2210,7 +2212,7 @@ static int kdb_cpu(int argc, const char **argv)
 	/*
 	 * Validate cpunum
 	 */
-	if ((cpunum > NR_CPUS) || !cpu_online(cpunum))
+	if ((cpunum >= CONFIG_NR_CPUS) || !kgdb_info[cpunum].enter_kgdb)
 		return KDB_BADCPUNUM;

 	dbg_switch_cpu = cpunum;
--
1.9.3




Thread overview: 7+ messages
2014-07-01 14:16 [RFC PATCH] kgdb: Timeout if secondary CPUs ignore the roundup Daniel Thompson
2014-07-01 14:22 ` Jason Wessel
2014-07-02 14:12 ` [PATCH v2] " Daniel Thompson
2014-08-18 15:01   ` [RESEND PATCH v2 3.17rc1] " Daniel Thompson
2014-11-11 17:50     ` [PATCH v3] " Daniel Thompson
2014-12-10 20:46       ` Daniel Thompson
2015-01-07 16:15       ` [RESEND PATCH v3 3.19-rc2] " Daniel Thompson
