* [PATCH] powerpc/eeh: Disable EEH stack dump by default
@ 2017-09-19 14:25 Jose Ricardo Ziviani
2017-09-20 4:47 ` Michael Ellerman
2017-09-20 5:54 ` Andrew Donnellan
0 siblings, 2 replies; 4+ messages in thread
From: Jose Ricardo Ziviani @ 2017-09-19 14:25 UTC
To: linuxppc-dev
Today, each EEH causes a stack dump to be printed in the logs. In
production environment it's not quite necessary. Thus, this patch
adds a new command line argument in order to enable the stack
dump for debugging purposes.
For example, instead of the following:
[ 131.778661] EEH: Frozen PHB#2-PE#fd detected
[ 131.778672] EEH: PE location: N/A, PHB location: N/A
[ 131.778677] CPU: 21 PID: 10098 Comm: lspci Not tainted ...
[ 131.778680] Call Trace:
[ 131.778686] [c0000003a140bab0] [c000000000beb58c] dump_stack+...
<snip ~10 lines>
[ 131.778770] EEH: Detected PCI bus error on PHB#2-PE#fd
[ 131.778775] EEH: This PCI device has failed 1 times in the last hour
...
we will have this by default:
[12777.175880] EEH: Frozen PHB#2-PE#fd detected
[12777.175893] EEH: PE location: N/A, PHB location: N/A
[12777.175922] EEH: Detected PCI bus error on PHB#2-PE#fd
[12777.175931] EEH: This PCI device has failed 2 times in the last hour
...
Signed-off-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
---
arch/powerpc/kernel/eeh.c | 26 +++++++++++++++++++++++---
1 file changed, 23 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 9e81678..4336c3b1 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -157,6 +157,19 @@ static int __init eeh_setup(char *str)
__setup("eeh=", eeh_setup);
/*
+ * It's not necessary to dump the stack trace when an EEH occurs
+ * in the production environment. For debugging, the command line
+ * option "enable_eeh_stacktrace" brings the stack dump back.
+ */
+static bool eeh_show_stacktrace;
+static int __init enable_eeh_stacktrace(char *p)
+{
+	eeh_show_stacktrace = true;
+	return 0;
+}
+early_param("enable_eeh_stacktrace", enable_eeh_stacktrace);
+
+/*
* This routine captures assorted PCI configuration space data
* for the indicated PCI device, and puts them into a buffer
* for RTAS error logging.
@@ -407,7 +420,10 @@ static int eeh_phb_check_failure(struct eeh_pe *pe)
 	pr_err("EEH: PHB#%x failure detected, location: %s\n",
 		phb_pe->phb->global_number, eeh_pe_loc_get(phb_pe));
-	dump_stack();
+
+	if (eeh_show_stacktrace)
+		dump_stack();
+
 	eeh_send_failure_event(phb_pe);
 
 	return 1;
@@ -504,7 +520,9 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 				eeh_driver_name(dev), eeh_pci_name(dev));
 			printk(KERN_ERR "EEH: Might be infinite loop in %s driver\n",
 				eeh_driver_name(dev));
-			dump_stack();
+
+			if (eeh_show_stacktrace)
+				dump_stack();
 		}
 		goto dn_unlock;
 	}
@@ -572,7 +590,9 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 		pe->phb->global_number, pe->addr);
 	pr_err("EEH: PE location: %s, PHB location: %s\n",
 		eeh_pe_loc_get(pe), eeh_pe_loc_get(phb_pe));
-	dump_stack();
+
+	if (eeh_show_stacktrace)
+		dump_stack();
 
 	eeh_send_failure_event(pe);
--
2.7.4
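
Editorial sketch (not part of the patch): the behaviour the patch describes can be modelled in user space roughly as below. This is only an illustration under stated assumptions. `parse_cmdline()` and `report_frozen_pe()` are hypothetical names that do not exist in the kernel, and the real option parsing is done by the kernel's early_param() machinery, not by a string search like this one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* User-space stand-in for the patch's eeh_show_stacktrace flag. */
static bool eeh_show_stacktrace;

/*
 * Hypothetical helper: look for the bare token "enable_eeh_stacktrace"
 * in a space-separated command line. It only imitates the visible
 * effect of the early_param() handler in the patch.
 */
static void parse_cmdline(const char *cmdline)
{
	static const char opt[] = "enable_eeh_stacktrace";
	const size_t len = sizeof(opt) - 1;
	const char *p = cmdline;

	assert(cmdline != NULL);
	eeh_show_stacktrace = false;

	while ((p = strstr(p, opt)) != NULL) {
		/* Accept the match only if it is a whole token. */
		bool starts = (p == cmdline) || (p[-1] == ' ');
		bool ends = (p[len] == '\0') || (p[len] == ' ');

		if (starts && ends) {
			eeh_show_stacktrace = true;
			return;
		}
		p += len;
	}
}

/*
 * Hypothetical stand-in for the frozen-PE path: the error line is always
 * written, the trace placeholder only when the flag is set. Returns the
 * number of lines written.
 */
static int report_frozen_pe(FILE *out, unsigned int phb, unsigned int pe)
{
	int lines = 0;

	fprintf(out, "EEH: Frozen PHB#%x-PE#%x detected\n", phb, pe);
	lines++;

	if (eeh_show_stacktrace) {
		fprintf(out, "Call Trace: (stack dump would appear here)\n");
		lines++;
	}
	return lines;
}
```

Calling `parse_cmdline()` with a string that lacks the option and then `report_frozen_pe(stdout, 2, 0xfd)` writes only the "Frozen" line; with the option present, the trace placeholder is written as well.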
* Re: [PATCH] powerpc/eeh: Disable EEH stack dump by default
From: Michael Ellerman @ 2017-09-20 4:47 UTC
To: Jose Ricardo Ziviani, linuxppc-dev
Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com> writes:
> Today, each EEH causes a stack dump to be printed in the logs. In
> production environment it's not quite necessary. Thus, this patch
I'm unconvinced. A production environment is exactly where you don't
want to be getting an EEH, and so if you *do* then every bit of
information is helpful.
> For example, instead of the following:
>
> [ 131.778661] EEH: Frozen PHB#2-PE#fd detected
> [ 131.778672] EEH: PE location: N/A, PHB location: N/A
> [ 131.778677] CPU: 21 PID: 10098 Comm: lspci Not tainted ...
> [ 131.778680] Call Trace:
> [ 131.778686] [c0000003a140bab0] [c000000000beb58c] dump_stack+...
> <snip ~10 lines>
> [ 131.778770] EEH: Detected PCI bus error on PHB#2-PE#fd
> [ 131.778775] EEH: This PCI device has failed 1 times in the last hour
> ...
>
> we will have this by default:
>
> [12777.175880] EEH: Frozen PHB#2-PE#fd detected
> [12777.175893] EEH: PE location: N/A, PHB location: N/A
> [12777.175922] EEH: Detected PCI bus error on PHB#2-PE#fd
> [12777.175931] EEH: This PCI device has failed 2 times in the last hour
*What* PCI device?
How am I supposed to know what device/driver just failed? If I had the
stack trace I could probably at least work it out based on the driver
involved.
cheers
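
Editorial sketch (not from any message in this thread): the question above is which device and driver failed. One conceivable way to answer it without a stack trace would be to put both names in the log line itself. The user-space model below illustrates that idea only. `struct fake_eeh_dev` and `format_eeh_failure()` are hypothetical, and the device and driver names used with them are made-up example values, not data from the logs quoted here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical user-space stand-in for the device behind an EEH event. */
struct fake_eeh_dev {
	const char *pci_name;	/* PCI address string, NULL if unknown */
	const char *driver;	/* bound driver name, NULL if none */
};

/*
 * Format a single log line that names the device and the driver.
 * Returns what snprintf() returns: the length the full line needs.
 */
static int format_eeh_failure(char *buf, size_t size,
			      unsigned int phb, unsigned int pe,
			      const struct fake_eeh_dev *dev)
{
	assert(buf != NULL && dev != NULL);

	return snprintf(buf, size,
			"EEH: Frozen PHB#%x-PE#%x detected, device %s, driver %s",
			phb, pe,
			dev->pci_name ? dev->pci_name : "<unknown>",
			dev->driver ? dev->driver : "<none>");
}
```

Whether such a line would be an adequate substitute for the trace is a separate question; the replies in this thread argue for keeping the trace.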
* Re: [PATCH] powerpc/eeh: Disable EEH stack dump by default
From: joserz @ 2017-09-20 17:55 UTC
To: Michael Ellerman, andrew.donnellan; +Cc: linuxppc-dev, ruscur
On Wed, Sep 20, 2017 at 02:47:08PM +1000, Michael Ellerman wrote:
> Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com> writes:
>
> > Today, each EEH causes a stack dump to be printed in the logs. In
> > production environment it's not quite necessary. Thus, this patch
>
> I'm unconvinced. A production environment is exactly where you don't
> want to be getting an EEH, and so if you *do* then every bit of
> information is helpful.
>
> > For example, instead of the following:
> >
> > [ 131.778661] EEH: Frozen PHB#2-PE#fd detected
> > [ 131.778672] EEH: PE location: N/A, PHB location: N/A
> > [ 131.778677] CPU: 21 PID: 10098 Comm: lspci Not tainted ...
> > [ 131.778680] Call Trace:
> > [ 131.778686] [c0000003a140bab0] [c000000000beb58c] dump_stack+...
> > <snip ~10 lines>
> > [ 131.778770] EEH: Detected PCI bus error on PHB#2-PE#fd
> > [ 131.778775] EEH: This PCI device has failed 1 times in the last hour
> > ...
> >
> > we will have this by default:
> >
> > [12777.175880] EEH: Frozen PHB#2-PE#fd detected
> > [12777.175893] EEH: PE location: N/A, PHB location: N/A
> > [12777.175922] EEH: Detected PCI bus error on PHB#2-PE#fd
> > [12777.175931] EEH: This PCI device has failed 2 times in the last hour
>
> *What* PCI device?
>
> How am I supposed to know what device/driver just failed? If I had the
> stack trace I could probably at least work it out based on the driver
> involved.
>
> cheers
>
Thank you guys! More people told me it's important to keep it as is.
Please, disregard this patch.
* Re: [PATCH] powerpc/eeh: Disable EEH stack dump by default
From: Andrew Donnellan @ 2017-09-20 5:54 UTC
To: Jose Ricardo Ziviani, linuxppc-dev
On 20/09/17 00:25, Jose Ricardo Ziviani wrote:
> Today, each EEH causes a stack dump to be printed in the logs. In
> production environment it's not quite necessary. Thus, this patch
> adds a new command line argument in order to enable the stack
> dump for debugging purposes.
>
> For example, instead of the following:
>
> [ 131.778661] EEH: Frozen PHB#2-PE#fd detected
> [ 131.778672] EEH: PE location: N/A, PHB location: N/A
> [ 131.778677] CPU: 21 PID: 10098 Comm: lspci Not tainted ...
> [ 131.778680] Call Trace:
> [ 131.778686] [c0000003a140bab0] [c000000000beb58c] dump_stack+...
> <snip ~10 lines>
> [ 131.778770] EEH: Detected PCI bus error on PHB#2-PE#fd
> [ 131.778775] EEH: This PCI device has failed 1 times in the last hour
> ...
>
> we will have this by default:
>
> [12777.175880] EEH: Frozen PHB#2-PE#fd detected
> [12777.175893] EEH: PE location: N/A, PHB location: N/A
> [12777.175922] EEH: Detected PCI bus error on PHB#2-PE#fd
> [12777.175931] EEH: This PCI device has failed 2 times in the last hour
> ...
>
> Signed-off-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
As someone who's had to debug far too many EEH-related bugs, I'd really
prefer if this remained as is.
Andrew
> ---
> arch/powerpc/kernel/eeh.c | 26 +++++++++++++++++++++++---
> 1 file changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index 9e81678..4336c3b1 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -157,6 +157,19 @@ static int __init eeh_setup(char *str)
> __setup("eeh=", eeh_setup);
>
> /*
> + * It's not necessary to dump the stack trace when an EEH occurs
> + * in the production environment. For debugging, the command line
> + * option "enable_eeh_stacktrace" brings the stack dump back.
> + */
> +static bool eeh_show_stacktrace;
> +static int __init enable_eeh_stacktrace(char *p)
> +{
> + eeh_show_stacktrace = true;
> + return 0;
> +}
> +early_param("enable_eeh_stacktrace", enable_eeh_stacktrace);
> +
> +/*
> * This routine captures assorted PCI configuration space data
> * for the indicated PCI device, and puts them into a buffer
> * for RTAS error logging.
> @@ -407,7 +420,10 @@ static int eeh_phb_check_failure(struct eeh_pe *pe)
>
> pr_err("EEH: PHB#%x failure detected, location: %s\n",
> phb_pe->phb->global_number, eeh_pe_loc_get(phb_pe));
> - dump_stack();
> +
> + if (eeh_show_stacktrace)
> + dump_stack();
> +
> eeh_send_failure_event(phb_pe);
>
> return 1;
> @@ -504,7 +520,9 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
> eeh_driver_name(dev), eeh_pci_name(dev));
> printk(KERN_ERR "EEH: Might be infinite loop in %s driver\n",
> eeh_driver_name(dev));
> - dump_stack();
> +
> + if (eeh_show_stacktrace)
> + dump_stack();
> }
> goto dn_unlock;
> }
> @@ -572,7 +590,9 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
> pe->phb->global_number, pe->addr);
> pr_err("EEH: PE location: %s, PHB location: %s\n",
> eeh_pe_loc_get(pe), eeh_pe_loc_get(phb_pe));
> - dump_stack();
> +
> + if (eeh_show_stacktrace)
> + dump_stack();
>
> eeh_send_failure_event(pe);
>
>
--
Andrew Donnellan OzLabs, ADL Canberra
andrew.donnellan@au1.ibm.com IBM Australia Limited