* [PATCH v3] powerpc/pseries: Limit EPOW reset event warnings
@ 2015-07-14 15:09 Kamalesh Babulal
2015-07-14 18:01 ` Vipin K Parashar
0 siblings, 1 reply; 9+ messages in thread
From: Kamalesh Babulal @ 2015-07-14 15:09 UTC
To: linuxppc-dev
Cc: vipin, Kamalesh Babulal, Anshuman Khandual, Anton Blanchard,
Michael Ellerman
We print the respective warning after parsing EPOW interrupts,
prompting user to take action depending upon the severity of the
event.
Some times EPOW rest event warning, such as below could flood
kernel log, over a period of time. Limit these warnings by use of
epow_state flag, which is initialized to false and when any event
gets reported, the flag set to true once an event gets acknowledged
by a reset.
The reset action is guarded by bool flag (set only if there was event
reported previously) and ignore multiple resets, without real EPOW
event. Also, merge adjacent pr_err/pr_emerg into single one to reduce
the number of lines printed per warning.
May 25 03:46:34 alp kernel: Non critical power or cooling issue cleared
May 25 03:46:52 alp kernel: Non critical power or cooling issue cleared
May 25 03:53:48 alp kernel: Non critical power or cooling issue cleared
May 25 03:55:46 alp kernel: Non critical power or cooling issue cleared
May 25 03:56:34 alp kernel: Non critical power or cooling issue cleared
May 25 03:59:04 alp kernel: Non critical power or cooling issue cleared
May 25 04:02:01 alp kernel: Non critical power or cooling issue cleared
May 25 04:04:24 alp kernel: Non critical power or cooling issue cleared
May 25 04:07:18 alp kernel: Non critical power or cooling issue cleared
May 25 04:13:04 alp kernel: Non critical power or cooling issue cleared
May 25 04:22:04 alp kernel: Non critical power or cooling issue cleared
May 25 04:22:26 alp kernel: Non critical power or cooling issue cleared
May 25 04:22:36 alp kernel: Non critical power or cooling issue cleared
Suggested-by: Vipin K Parashar <vipin@linux.vnet.ibm.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
---
v3 Changes:
- Limit warning printed by EPOW RESET event, by guarding it with bool flag.
Instead of rate limiting all the EPOW events.
v2 Changes:
- Merged multiple adjacent pr_err/pr_emerg into single line to reduce multi-line
warnings, based on Michael's comments.
arch/powerpc/platforms/pseries/ras.c | 25 +++++++++++++++++--------
1 file changed, 17 insertions(+), 8 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index 02e4a17..b30396a 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -40,6 +40,9 @@ static int ras_check_exception_token;
#define EPOW_SENSOR_TOKEN 9
#define EPOW_SENSOR_INDEX 0
+/* Flag to limit EPOW RESET warning. */
+static bool epow_state;
+
static irqreturn_t ras_epow_interrupt(int irq, void *dev_id);
static irqreturn_t ras_error_interrupt(int irq, void *dev_id);
@@ -145,21 +148,27 @@ static void rtas_parse_epow_errlog(struct rtas_error_log *log)
switch (action_code) {
case EPOW_RESET:
- pr_err("Non critical power or cooling issue cleared");
+ if (epow_state) {
+ pr_err("Non critical power or cooling issue cleared");
+ epow_state = false;
+ }
break;
case EPOW_WARN_COOLING:
- pr_err("Non critical cooling issue reported by firmware");
- pr_err("Check RTAS error log for details");
+ pr_err("Non critical cooling issue reported by firmware, "
+ "Check RTAS error log for details");
+ epow_state = true;
break;
case EPOW_WARN_POWER:
- pr_err("Non critical power issue reported by firmware");
- pr_err("Check RTAS error log for details");
+ pr_err("Non critical power issue reported by firmware, "
+ "Check RTAS error log for details");
+ epow_state = true;
break;
case EPOW_SYSTEM_SHUTDOWN:
handle_system_shutdown(epow_log->event_modifier);
+ epow_state = true;
break;
case EPOW_SYSTEM_HALT:
@@ -169,9 +178,8 @@ static void rtas_parse_epow_errlog(struct rtas_error_log *log)
case EPOW_MAIN_ENCLOSURE:
case EPOW_POWER_OFF:
- pr_emerg("Critical power/cooling issue reported by firmware");
- pr_emerg("Check RTAS error log for details");
- pr_emerg("Immediate power off");
+ pr_emerg("Critical power/cooling issue reported by firmware, "
+ "Check RTAS error log for details. Immediate power off.");
emergency_sync();
kernel_power_off();
break;
@@ -179,6 +187,7 @@ static void rtas_parse_epow_errlog(struct rtas_error_log *log)
default:
pr_err("Unknown power/cooling event (action code %d)",
action_code);
+ epow_state = true;
}
}
--
2.1.2
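The flag logic in the diff above can be modelled in userspace to see which events would produce a log line. This is a minimal sketch, not kernel code: the enum values, `model_epow_event()` and `count_logged()` are hypothetical stand-ins, printf() replaces pr_err(), and only the reset and the two warning cases are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel's EPOW action codes. */
enum epow_action { EPOW_RESET, EPOW_WARN_COOLING, EPOW_WARN_POWER };

/* Mirrors the patch: false until some EPOW event has been reported. */
static bool epow_state;

/* Returns true when a message would be logged for this action. */
static bool model_epow_event(enum epow_action action)
{
	switch (action) {
	case EPOW_RESET:
		if (!epow_state)
			return false;	/* reset without a prior event: stay quiet */
		printf("Non critical power or cooling issue cleared\n");
		epow_state = false;
		return true;
	case EPOW_WARN_COOLING:
	case EPOW_WARN_POWER:
		printf("Non critical issue reported by firmware\n");
		epow_state = true;
		return true;
	}
	return false;
}

/* Counts how many of the given actions produce a log line. */
static int count_logged(const enum epow_action *actions, int n)
{
	int logged = 0;

	for (int i = 0; i < n; i++)
		if (model_epow_event(actions[i]))
			logged++;
	return logged;
}
```

Feeding it two resets, a power warning, and two more resets logs only the warning and the first reset that follows it.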
* Re: [PATCH v3] powerpc/pseries: Limit EPOW reset event warnings
2015-07-14 15:09 [PATCH v3] powerpc/pseries: Limit EPOW reset event warnings Kamalesh Babulal
@ 2015-07-14 18:01 ` Vipin K Parashar
2015-07-15 4:22 ` [RESEND PATCH " Kamalesh Babulal
0 siblings, 1 reply; 9+ messages in thread
From: Vipin K Parashar @ 2015-07-14 18:01 UTC
To: Kamalesh Babulal, linuxppc-dev
Cc: Anshuman Khandual, Anton Blanchard, Michael Ellerman
Patch looks good, though it seems we can improve the commit log
description to better describe the problem and solution.
A few suggestions below:
Avoid multiple EPOW reset ..........
is better suited as a one-line description of this problem.
On 07/14/2015 08:39 PM, Kamalesh Babulal wrote:
> We print the respective warning after parsing EPOW interrupts,
Kernel prints respective warnings about various EPOW events for
user information/action after parsing EPOW interrupts.
> prompting user to take action depending upon the severity of the
> event.
Please merge below line with above one.
>
> Some times EPOW rest event warning, such as below could flood
^ ^ ^^
At times EPOW reset event warning is found to be flooding..
> kernel log, over a period of time.
Paste the multiple warnings log here.
> Limit these warnings by use of
This patch avoids these multiple EPOW reset warnings by using a boolean
flag.
> epow_state flag, which is initialized to false and when any event
This flag is initialized to false and is set to true upon arrival of
EPOW event.
> gets reported, the flag set to true once an event gets acknowledged
> by a reset.
>
> The reset action is guarded by bool flag (set only if there was event
This same flag is checked and reset during EPOW_RESET scenario to
filter out valid EPOW reset events and avoid multiple warning logs.
> reported previously) and ignore multiple resets, without real EPOW
> event.
> Also, merge adjacent pr_err/pr_emerg into single one to reduce
^
merged
> the number of lines printed per warning.
>
> May 25 03:46:34 alp kernel: Non critical power or cooling issue cleared
> May 25 03:46:52 alp kernel: Non critical power or cooling issue cleared
> May 25 03:53:48 alp kernel: Non critical power or cooling issue cleared
> May 25 03:55:46 alp kernel: Non critical power or cooling issue cleared
> May 25 03:56:34 alp kernel: Non critical power or cooling issue cleared
> May 25 03:59:04 alp kernel: Non critical power or cooling issue cleared
> May 25 04:02:01 alp kernel: Non critical power or cooling issue cleared
> May 25 04:04:24 alp kernel: Non critical power or cooling issue cleared
> May 25 04:07:18 alp kernel: Non critical power or cooling issue cleared
> May 25 04:13:04 alp kernel: Non critical power or cooling issue cleared
> May 25 04:22:04 alp kernel: Non critical power or cooling issue cleared
> May 25 04:22:26 alp kernel: Non critical power or cooling issue cleared
> May 25 04:22:36 alp kernel: Non critical power or cooling issue cleared
>
> Suggested-by: Vipin K Parashar <vipin@linux.vnet.ibm.com>
> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
> Cc: Anton Blanchard <anton@samba.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
> ---
> v3 Changes:
> - Limit warning printed by EPOW RESET event, by guarding it with bool flag.
> Instead of rate limiting all the EPOW events.
>
> v2 Changes:
> - Merged multiple adjacent pr_err/pr_emerg into single line to reduce multi-line
> warnings, based on Michael's comments.
>
> arch/powerpc/platforms/pseries/ras.c | 25 +++++++++++++++++--------
> 1 file changed, 17 insertions(+), 8 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
> index 02e4a17..b30396a 100644
> --- a/arch/powerpc/platforms/pseries/ras.c
> +++ b/arch/powerpc/platforms/pseries/ras.c
> @@ -40,6 +40,9 @@ static int ras_check_exception_token;
> #define EPOW_SENSOR_TOKEN 9
> #define EPOW_SENSOR_INDEX 0
>
> +/* Flag to limit EPOW RESET warning. */
> +static bool epow_state;
> +
> static irqreturn_t ras_epow_interrupt(int irq, void *dev_id);
> static irqreturn_t ras_error_interrupt(int irq, void *dev_id);
>
> @@ -145,21 +148,27 @@ static void rtas_parse_epow_errlog(struct rtas_error_log *log)
>
> switch (action_code) {
> case EPOW_RESET:
> - pr_err("Non critical power or cooling issue cleared");
> + if (epow_state) {
> + pr_err("Non critical power or cooling issue cleared");
> + epow_state = false;
> + }
> break;
>
> case EPOW_WARN_COOLING:
> - pr_err("Non critical cooling issue reported by firmware");
> - pr_err("Check RTAS error log for details");
> + pr_err("Non critical cooling issue reported by firmware, "
> + "Check RTAS error log for details");
> + epow_state = true;
> break;
>
> case EPOW_WARN_POWER:
> - pr_err("Non critical power issue reported by firmware");
> - pr_err("Check RTAS error log for details");
> + pr_err("Non critical power issue reported by firmware, "
> + "Check RTAS error log for details");
> + epow_state = true;
> break;
>
> case EPOW_SYSTEM_SHUTDOWN:
> handle_system_shutdown(epow_log->event_modifier);
> + epow_state = true;
> break;
>
> case EPOW_SYSTEM_HALT:
> @@ -169,9 +178,8 @@ static void rtas_parse_epow_errlog(struct rtas_error_log *log)
>
> case EPOW_MAIN_ENCLOSURE:
> case EPOW_POWER_OFF:
> - pr_emerg("Critical power/cooling issue reported by firmware");
> - pr_emerg("Check RTAS error log for details");
> - pr_emerg("Immediate power off");
> + pr_emerg("Critical power/cooling issue reported by firmware, "
> + "Check RTAS error log for details. Immediate power off.");
> emergency_sync();
> kernel_power_off();
> break;
> @@ -179,6 +187,7 @@ static void rtas_parse_epow_errlog(struct rtas_error_log *log)
> default:
> pr_err("Unknown power/cooling event (action code %d)",
> action_code);
> + epow_state = true;
> }
> }
>
* [RESEND PATCH v3] powerpc/pseries: Limit EPOW reset event warnings
2015-07-14 18:01 ` Vipin K Parashar
@ 2015-07-15 4:22 ` Kamalesh Babulal
2015-07-15 7:01 ` Vipin K Parashar
2015-07-16 4:05 ` [RESEND,v3] " Michael Ellerman
0 siblings, 2 replies; 9+ messages in thread
From: Kamalesh Babulal @ 2015-07-15 4:22 UTC
To: linuxppc-dev
Cc: Kamalesh Babulal, vipin, Anshuman Khandual, Anton Blanchard,
Michael Ellerman
The kernel prints warnings about the various EPOW events for user
information/action after parsing EPOW interrupts, prompting the
user to take action depending upon the severity of the event.
At times EPOW reset event warnings, such as those below, can flood
the kernel log over a period of time.
May 25 03:46:34 alp kernel: Non critical power or cooling issue cleared
May 25 03:46:52 alp kernel: Non critical power or cooling issue cleared
May 25 03:53:48 alp kernel: Non critical power or cooling issue cleared
May 25 03:55:46 alp kernel: Non critical power or cooling issue cleared
May 25 03:56:34 alp kernel: Non critical power or cooling issue cleared
May 25 03:59:04 alp kernel: Non critical power or cooling issue cleared
May 25 04:02:01 alp kernel: Non critical power or cooling issue cleared
May 25 04:04:24 alp kernel: Non critical power or cooling issue cleared
May 25 04:07:18 alp kernel: Non critical power or cooling issue cleared
May 25 04:13:04 alp kernel: Non critical power or cooling issue cleared
May 25 04:22:04 alp kernel: Non critical power or cooling issue cleared
May 25 04:22:26 alp kernel: Non critical power or cooling issue cleared
May 25 04:22:36 alp kernel: Non critical power or cooling issue cleared
This patch avoids these multiple EPOW reset warnings by using a boolean
flag. The flag is initialized to false and is set to true upon arrival
of an EPOW event. The same flag is checked and cleared when EPOW_RESET
is handled, so that only a reset following a real EPOW event is logged
and repeated reset warnings are suppressed.
Also, merge adjacent pr_err/pr_emerg calls into a single one to reduce
the number of lines printed per warning.
Suggested-by: Vipin K Parashar <vipin@linux.vnet.ibm.com>
[Vipin: edited the changelog]
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
---
v3 Changes:
- Limit warning printed by EPOW RESET event, by guarding it with bool flag.
Instead of rate limiting all the EPOW events.
v2 Changes:
- Merged multiple adjacent pr_err/pr_emerg into single line to reduce multi-line
warnings, based on Michael's comments.
arch/powerpc/platforms/pseries/ras.c | 25 +++++++++++++++++--------
1 file changed, 17 insertions(+), 8 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index 02e4a17..b30396a 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -40,6 +40,9 @@ static int ras_check_exception_token;
#define EPOW_SENSOR_TOKEN 9
#define EPOW_SENSOR_INDEX 0
+/* Flag to limit EPOW RESET warning. */
+static bool epow_state;
+
static irqreturn_t ras_epow_interrupt(int irq, void *dev_id);
static irqreturn_t ras_error_interrupt(int irq, void *dev_id);
@@ -145,21 +148,27 @@ static void rtas_parse_epow_errlog(struct rtas_error_log *log)
switch (action_code) {
case EPOW_RESET:
- pr_err("Non critical power or cooling issue cleared");
+ if (epow_state) {
+ pr_err("Non critical power or cooling issue cleared");
+ epow_state = false;
+ }
break;
case EPOW_WARN_COOLING:
- pr_err("Non critical cooling issue reported by firmware");
- pr_err("Check RTAS error log for details");
+ pr_err("Non critical cooling issue reported by firmware, "
+ "Check RTAS error log for details");
+ epow_state = true;
break;
case EPOW_WARN_POWER:
- pr_err("Non critical power issue reported by firmware");
- pr_err("Check RTAS error log for details");
+ pr_err("Non critical power issue reported by firmware, "
+ "Check RTAS error log for details");
+ epow_state = true;
break;
case EPOW_SYSTEM_SHUTDOWN:
handle_system_shutdown(epow_log->event_modifier);
+ epow_state = true;
break;
case EPOW_SYSTEM_HALT:
@@ -169,9 +178,8 @@ static void rtas_parse_epow_errlog(struct rtas_error_log *log)
case EPOW_MAIN_ENCLOSURE:
case EPOW_POWER_OFF:
- pr_emerg("Critical power/cooling issue reported by firmware");
- pr_emerg("Check RTAS error log for details");
- pr_emerg("Immediate power off");
+ pr_emerg("Critical power/cooling issue reported by firmware, "
+ "Check RTAS error log for details. Immediate power off.");
emergency_sync();
kernel_power_off();
break;
@@ -179,6 +187,7 @@ static void rtas_parse_epow_errlog(struct rtas_error_log *log)
default:
pr_err("Unknown power/cooling event (action code %d)",
action_code);
+ epow_state = true;
}
}
--
2.1.2
* Re: [RESEND PATCH v3] powerpc/pseries: Limit EPOW reset event warnings
2015-07-15 4:22 ` [RESEND PATCH " Kamalesh Babulal
@ 2015-07-15 7:01 ` Vipin K Parashar
2015-07-16 4:05 ` [RESEND,v3] " Michael Ellerman
1 sibling, 0 replies; 9+ messages in thread
From: Vipin K Parashar @ 2015-07-15 7:01 UTC
To: Kamalesh Babulal, linuxppc-dev
Cc: Anshuman Khandual, Anton Blanchard, Michael Ellerman
On 07/15/2015 09:52 AM, Kamalesh Babulal wrote:
> Kernel prints respective warnings about various EPOW events for
> user information/action after parsing EPOW interrupts. Prompting
> user to take action depending upon the severity of the event.
The second line probably isn't needed. Also, the line below can be merged
with the first one, as both describe the problem in the same context.
>
> At times EPOW reset event warning, such as below could flood
> kernel log, over a period of time.
>
> May 25 03:46:34 alp kernel: Non critical power or cooling issue cleared
> May 25 03:46:52 alp kernel: Non critical power or cooling issue cleared
> May 25 03:53:48 alp kernel: Non critical power or cooling issue cleared
> May 25 03:55:46 alp kernel: Non critical power or cooling issue cleared
> May 25 03:56:34 alp kernel: Non critical power or cooling issue cleared
> May 25 03:59:04 alp kernel: Non critical power or cooling issue cleared
> May 25 04:02:01 alp kernel: Non critical power or cooling issue cleared
> May 25 04:04:24 alp kernel: Non critical power or cooling issue cleared
> May 25 04:07:18 alp kernel: Non critical power or cooling issue cleared
> May 25 04:13:04 alp kernel: Non critical power or cooling issue cleared
> May 25 04:22:04 alp kernel: Non critical power or cooling issue cleared
> May 25 04:22:26 alp kernel: Non critical power or cooling issue cleared
> May 25 04:22:36 alp kernel: Non critical power or cooling issue cleared
>
> This patch avoids these multiple EPOW reset warnings by using a boolean
> flag. This flag is initialized to false and is set to true upon arrival
> of EPOW event. This same flag is checked and reset during EPOW_RESET
> scenario to filter out valid EPOW reset events and avoid multiple warning
> logs.
>
> Also, merged adjacent pr_err/pr_emerg into single one to reduce
> the number of lines printed per warning.
>
> Suggested-by: Vipin K Parashar <vipin@linux.vnet.ibm.com>
> [Vipin: edited the changelog]
This probably should go to change summary below.
> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
> Cc: Anton Blanchard <anton@samba.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
> ---
> v3 Changes:
> - Limit warning printed by EPOW RESET event, by guarding it with bool flag.
> Instead of rate limiting all the EPOW events.
>
> v2 Changes:
> - Merged multiple adjacent pr_err/pr_emerg into single line to reduce multi-line
> warnings, based on Michael's comments.
>
> arch/powerpc/platforms/pseries/ras.c | 25 +++++++++++++++++--------
> 1 file changed, 17 insertions(+), 8 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
> index 02e4a17..b30396a 100644
> --- a/arch/powerpc/platforms/pseries/ras.c
> +++ b/arch/powerpc/platforms/pseries/ras.c
> @@ -40,6 +40,9 @@ static int ras_check_exception_token;
> #define EPOW_SENSOR_TOKEN 9
> #define EPOW_SENSOR_INDEX 0
>
> +/* Flag to limit EPOW RESET warning. */
> +static bool epow_state;
> +
> static irqreturn_t ras_epow_interrupt(int irq, void *dev_id);
> static irqreturn_t ras_error_interrupt(int irq, void *dev_id);
>
> @@ -145,21 +148,27 @@ static void rtas_parse_epow_errlog(struct rtas_error_log *log)
>
> switch (action_code) {
> case EPOW_RESET:
> - pr_err("Non critical power or cooling issue cleared");
> + if (epow_state) {
> + pr_err("Non critical power or cooling issue cleared");
> + epow_state = false;
> + }
> break;
>
> case EPOW_WARN_COOLING:
> - pr_err("Non critical cooling issue reported by firmware");
> - pr_err("Check RTAS error log for details");
> + pr_err("Non critical cooling issue reported by firmware, "
> + "Check RTAS error log for details");
> + epow_state = true;
> break;
>
> case EPOW_WARN_POWER:
> - pr_err("Non critical power issue reported by firmware");
> - pr_err("Check RTAS error log for details");
> + pr_err("Non critical power issue reported by firmware, "
> + "Check RTAS error log for details");
> + epow_state = true;
> break;
>
> case EPOW_SYSTEM_SHUTDOWN:
> handle_system_shutdown(epow_log->event_modifier);
> + epow_state = true;
> break;
>
> case EPOW_SYSTEM_HALT:
> @@ -169,9 +178,8 @@ static void rtas_parse_epow_errlog(struct rtas_error_log *log)
>
> case EPOW_MAIN_ENCLOSURE:
> case EPOW_POWER_OFF:
> - pr_emerg("Critical power/cooling issue reported by firmware");
> - pr_emerg("Check RTAS error log for details");
> - pr_emerg("Immediate power off");
> + pr_emerg("Critical power/cooling issue reported by firmware, "
> + "Check RTAS error log for details. Immediate power off.");
> emergency_sync();
> kernel_power_off();
> break;
> @@ -179,6 +187,7 @@ static void rtas_parse_epow_errlog(struct rtas_error_log *log)
> default:
> pr_err("Unknown power/cooling event (action code %d)",
> action_code);
> + epow_state = true;
> }
> }
>
* Re: [RESEND,v3] powerpc/pseries: Limit EPOW reset event warnings
2015-07-15 4:22 ` [RESEND PATCH " Kamalesh Babulal
2015-07-15 7:01 ` Vipin K Parashar
@ 2015-07-16 4:05 ` Michael Ellerman
2015-07-17 8:17 ` Kamalesh Babulal
2015-07-17 9:51 ` Vipin K Parashar
1 sibling, 2 replies; 9+ messages in thread
From: Michael Ellerman @ 2015-07-16 4:05 UTC
To: Kamalesh Babulal, linuxppc-dev
Cc: vipin, Anshuman Khandual, Anton Blanchard, Kamalesh Babulal
On Wed, 2015-15-07 at 04:22:06 UTC, Kamalesh Babulal wrote:
> Kernel prints respective warnings about various EPOW events for
> user information/action after parsing EPOW interrupts.Prompting
> user to take action depending upon the severity of the event.
>
> At times EPOW reset event warning, such as below could flood
> kernel log, over a period of time.
>
> May 25 03:46:34 alp kernel: Non critical power or cooling issue cleared
> May 25 03:46:52 alp kernel: Non critical power or cooling issue cleared
> May 25 03:53:48 alp kernel: Non critical power or cooling issue cleared
> May 25 03:55:46 alp kernel: Non critical power or cooling issue cleared
> May 25 03:56:34 alp kernel: Non critical power or cooling issue cleared
> May 25 03:59:04 alp kernel: Non critical power or cooling issue cleared
> May 25 04:02:01 alp kernel: Non critical power or cooling issue cleared
> May 25 04:04:24 alp kernel: Non critical power or cooling issue cleared
> May 25 04:07:18 alp kernel: Non critical power or cooling issue cleared
> May 25 04:13:04 alp kernel: Non critical power or cooling issue cleared
> May 25 04:22:04 alp kernel: Non critical power or cooling issue cleared
> May 25 04:22:26 alp kernel: Non critical power or cooling issue cleared
> May 25 04:22:36 alp kernel: Non critical power or cooling issue cleared
>
> This patch avoids these multiple EPOW reset warnings by using a boolean
> flag. This flag is initialized to false and is set to true upon arrival
> of EPOW event. This same flag is checked and reset during EPOW_RESET
> scenario to filter out valid EPOW reset events and avoid multiple warning
> logs.
Why are we even getting these reset events when nothing has happened?
> Also, merged adjacent pr_err/pr_emerg into single one to reduce
> the number of lines printed per warning.
>
> Suggested-by: Vipin K Parashar <vipin@linux.vnet.ibm.com>
> [Vipin: edited the changelog]
> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
> Cc: Anton Blanchard <anton@samba.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
> ---
> v3 Changes:
> - Limit warning printed by EPOW RESET event, by guarding it with bool flag.
> Instead of rate limiting all the EPOW events.
>
> v2 Changes:
> - Merged multiple adjacent pr_err/pr_emerg into single line to reduce multi-line
> warnings, based on Michael's comments.
>
> arch/powerpc/platforms/pseries/ras.c | 25 +++++++++++++++++--------
> 1 file changed, 17 insertions(+), 8 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
> index 02e4a17..b30396a 100644
> --- a/arch/powerpc/platforms/pseries/ras.c
> +++ b/arch/powerpc/platforms/pseries/ras.c
> @@ -40,6 +40,9 @@ static int ras_check_exception_token;
> #define EPOW_SENSOR_TOKEN 9
> #define EPOW_SENSOR_INDEX 0
>
> +/* Flag to limit EPOW RESET warning. */
> +static bool epow_state;
This name is terrible, it doesn't give me any hint to what it means.
But really it should be a counter, not a boolean.
We could have multiple EPOW events come in and then later get the reset events
for them, couldn't we?
So what about:
static unsigned epow_event_depth;
And then below:
> @@ -145,21 +148,27 @@ static void rtas_parse_epow_errlog(struct rtas_error_log *log)
>
epow_event_depth++;
switch (action_code) {
case EPOW_RESET:
if (epow_event_depth)
epow_event_depth--;
if (epow_event_depth)
> + pr_err("Non critical power or cooling issue cleared");
> break;
And that's all you need.
> case EPOW_WARN_COOLING:
> - pr_err("Non critical cooling issue reported by firmware");
> - pr_err("Check RTAS error log for details");
> + pr_err("Non critical cooling issue reported by firmware, "
> + "Check RTAS error log for details");
This should be:
pr_err("Non-critical cooling issue reported by firmware, check RTAS error log for details.\n");
But that's too long, so how about:
pr_err("Non-critical cooling issue reported, check RTAS error log for details.\n");
And if it's non-critical it shouldn't be pr_err(), it should be pr_info().
Similarly for all the other messages.
cheers
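The difference between the boolean flag and the suggested counter shows up when several events arrive before their resets. The following is a minimal userspace sketch, assuming one reading of the suggestion in which a reset is logged only while the depth is non-zero and then decrements it. `epow_kind`, `resets_logged_with_flag()` and `resets_logged_with_depth()` are hypothetical names used for illustration, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event kinds: any EPOW event, or the reset that clears one. */
enum epow_kind { EPOW_EVENT, EPOW_RESET_EVENT };

/* Number of reset messages a boolean flag would let through. */
static int resets_logged_with_flag(const enum epow_kind *seq, int n)
{
	bool pending = false;
	int logged = 0;

	for (int i = 0; i < n; i++) {
		if (seq[i] == EPOW_EVENT) {
			pending = true;
		} else if (pending) {
			pending = false;	/* first reset clears the flag */
			logged++;
		}
	}
	return logged;
}

/* Number of reset messages a depth counter would let through. */
static int resets_logged_with_depth(const enum epow_kind *seq, int n)
{
	unsigned int depth = 0;
	int logged = 0;

	for (int i = 0; i < n; i++) {
		if (seq[i] == EPOW_EVENT) {
			depth++;
		} else if (depth) {
			depth--;	/* one reset accounted for per event */
			logged++;
		}
	}
	return logged;
}
```

For two events followed by three resets, the flag model logs one reset, while the depth model logs two and ignores the unmatched third.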
* Re: [RESEND,v3] powerpc/pseries: Limit EPOW reset event warnings
2015-07-16 4:05 ` [RESEND,v3] " Michael Ellerman
@ 2015-07-17 8:17 ` Kamalesh Babulal
2015-09-07 21:40 ` Vipin K Parashar
2015-09-10 17:17 ` Vipin K Parashar
2015-07-17 9:51 ` Vipin K Parashar
1 sibling, 2 replies; 9+ messages in thread
From: Kamalesh Babulal @ 2015-07-17 8:17 UTC
To: Michael Ellerman; +Cc: linuxppc-dev, vipin, Anton Blanchard, Anshuman Khandual
* Michael Ellerman <mpe@ellerman.id.au> [2015-07-16 14:05:52]:
[..]
>
> Why are we even getting these reset events when nothing has happened?
Thanks for the review. It was seen on only one machine, and I could not
get hold of that machine any more. I am guessing here that it might be
the firmware.
>
> > Also, merged adjacent pr_err/pr_emerg into single one to reduce
> > the number of lines printed per warning.
[..]
> >
> > +/* Flag to limit EPOW RESET warning. */
> > +static bool epow_state;
>
> This name is terrible, it doesn't give me any hint to what it means.
>
> But really it should be a counter, not a boolean.
>
> We could have multiple EPOW events come in and then later get the reset events
> for them, couldn't we?
>
>
> So what about:
>
> static unsigned epow_event_depth;
>
--->8----
From 0d27916fd09a9f0912a217432a41e2b579dc2952 Mon Sep 17 00:00:00 2001
From: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Date: Fri, 17 Jul 2015 13:19:31 +0530
Subject: [PATCH v4] powerpc/pseries: Limit EPOW reset event warnings
The kernel prints warnings about the various EPOW events for user
information/action after parsing EPOW interrupts. At times EPOW
reset event warnings, such as those below, can flood the kernel log
over a period of time.
May 25 03:46:34 alp kernel: Non critical power or cooling issue cleared
May 25 03:46:52 alp kernel: Non critical power or cooling issue cleared
May 25 03:53:48 alp kernel: Non critical power or cooling issue cleared
May 25 03:55:46 alp kernel: Non critical power or cooling issue cleared
May 25 03:56:34 alp kernel: Non critical power or cooling issue cleared
May 25 03:59:04 alp kernel: Non critical power or cooling issue cleared
May 25 04:02:01 alp kernel: Non critical power or cooling issue cleared
May 25 04:04:24 alp kernel: Non critical power or cooling issue cleared
May 25 04:07:18 alp kernel: Non critical power or cooling issue cleared
May 25 04:13:04 alp kernel: Non critical power or cooling issue cleared
May 25 04:22:04 alp kernel: Non critical power or cooling issue cleared
May 25 04:22:26 alp kernel: Non critical power or cooling issue cleared
May 25 04:22:36 alp kernel: Non critical power or cooling issue cleared
This patch avoids these multiple EPOW reset warnings by using the
epow_event_depth counter, which is incremented every time an EPOW event
is reported and decremented on an EPOW_RESET event. With this approach
the number of EPOW_RESET warnings never exceeds the number of EPOW events.
Also, merge adjacent pr_info/pr_err/pr_emerg calls into a single one to
reduce the number of lines printed per warning across the file, convert
non-critical errors from pr_err to pr_info, and correct the grammar
of the printed warnings.
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Vipin K Parashar <vipin@linux.vnet.ibm.com>
Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
---
v4 Changes:
- Changed the approach to a depth counter to match the EPOW events and
EPOW resets.
- Converted pr_err() to pr_info() for non-critical errors.
- Merged adjacent warnings into a single line across the file.
- Fixed grammar in the warnings to make them shorter.
v3 Changes:
- Limit warning printed by EPOW RESET event, by guarding it with bool flag.
Instead of rate limiting all the EPOW events.
v2 Changes:
- Merged multiple adjacent pr_err/pr_emerg into single line to reduce multi-line
warnings, based on Michael's comments.
arch/powerpc/platforms/pseries/ras.c | 53 ++++++++++++++++++++----------------
1 file changed, 29 insertions(+), 24 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index 02e4a17..995cab8 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -40,6 +40,8 @@ static int ras_check_exception_token;
#define EPOW_SENSOR_TOKEN 9
#define EPOW_SENSOR_INDEX 0
+static unsigned epow_event_depth;
+
static irqreturn_t ras_epow_interrupt(int irq, void *dev_id);
static irqreturn_t ras_error_interrupt(int irq, void *dev_id);
@@ -82,32 +84,30 @@ static void handle_system_shutdown(char event_modifier)
{
switch (event_modifier) {
case EPOW_SHUTDOWN_NORMAL:
- pr_emerg("Firmware initiated power off");
+ pr_emerg("Firmware initiated power off\n");
orderly_poweroff(true);
break;
case EPOW_SHUTDOWN_ON_UPS:
- pr_emerg("Loss of power reported by firmware, system is "
- "running on UPS/battery");
- pr_emerg("Check RTAS error log for details");
+ pr_emerg("Loss of power reported, system is running on"
+ " UPS/battery. Check RTAS error log for details\n");
orderly_poweroff(true);
break;
case EPOW_SHUTDOWN_LOSS_OF_CRITICAL_FUNCTIONS:
- pr_emerg("Loss of system critical functions reported by "
- "firmware");
- pr_emerg("Check RTAS error log for details");
+ pr_emerg("Loss of system critical functions reported. Check"
+ " RTAS error log for details\n");
orderly_poweroff(true);
break;
case EPOW_SHUTDOWN_AMBIENT_TEMPERATURE_TOO_HIGH:
- pr_emerg("Ambient temperature too high reported by firmware");
- pr_emerg("Check RTAS error log for details");
+ pr_emerg("Ambient temperature too high reported. Check RTAS"
+ " error log for details\n");
orderly_poweroff(true);
break;
default:
- pr_err("Unknown power/cooling shutdown event (modifier %d)",
+ pr_info("Unknown power/cooling shutdown event (modifier %d)\n",
event_modifier);
}
}
@@ -145,40 +145,46 @@ static void rtas_parse_epow_errlog(struct rtas_error_log *log)
switch (action_code) {
case EPOW_RESET:
- pr_err("Non critical power or cooling issue cleared");
+ if (epow_event_depth) {
+ pr_err("Non critical power/cooling issue cleared\n");
+ epow_event_depth--;
+ }
break;
case EPOW_WARN_COOLING:
- pr_err("Non critical cooling issue reported by firmware");
- pr_err("Check RTAS error log for details");
+ pr_info("Non-critical cooling issue reported, check RTAS error"
+ " log for details\n");
+ epow_event_depth++;
break;
case EPOW_WARN_POWER:
- pr_err("Non critical power issue reported by firmware");
- pr_err("Check RTAS error log for details");
+ pr_info("Non-critical power issue reported, check RTAS error"
+ " log for details\n");
+ epow_event_depth++;
break;
case EPOW_SYSTEM_SHUTDOWN:
handle_system_shutdown(epow_log->event_modifier);
+ epow_event_depth++;
break;
case EPOW_SYSTEM_HALT:
- pr_emerg("Firmware initiated power off");
+ pr_emerg("Firmware initiated power off\n");
orderly_poweroff(true);
break;
case EPOW_MAIN_ENCLOSURE:
case EPOW_POWER_OFF:
- pr_emerg("Critical power/cooling issue reported by firmware");
- pr_emerg("Check RTAS error log for details");
- pr_emerg("Immediate power off");
+ pr_emerg("Critical power/cooling issue reported, Check RTAS"
+ " error log for details. Immediate power off\n");
emergency_sync();
kernel_power_off();
break;
default:
- pr_err("Unknown power/cooling event (action code %d)",
+ pr_info("Unknown power/cooling event (action code %d)\n",
action_code);
+ epow_event_depth++;
}
}
@@ -248,13 +254,12 @@ static irqreturn_t ras_error_interrupt(int irq, void *dev_id)
log_error(ras_log_buf, ERR_TYPE_RTAS_LOG, fatal);
if (fatal) {
- pr_emerg("Fatal hardware error reported by firmware");
- pr_emerg("Check RTAS error log for details");
- pr_emerg("Immediate power off");
+ pr_emerg("Fatal hardware error reported, Check RTAS error"
+ " log for details. Immediate power off\n");
emergency_sync();
kernel_power_off();
} else {
- pr_err("Recoverable hardware error reported by firmware");
+ pr_err("Recoverable hardware error reported by firmware\n");
}
spin_unlock(&ras_log_buf_lock);
--
2.1.2
* Re: [RESEND,v3] powerpc/pseries: Limit EPOW reset event warnings
2015-07-16 4:05 ` [RESEND,v3] " Michael Ellerman
2015-07-17 8:17 ` Kamalesh Babulal
@ 2015-07-17 9:51 ` Vipin K Parashar
1 sibling, 0 replies; 9+ messages in thread
From: Vipin K Parashar @ 2015-07-17 9:51 UTC (permalink / raw)
To: Michael Ellerman, Kamalesh Babulal, linuxppc-dev
Cc: Anshuman Khandual, Anton Blanchard
On 07/16/2015 09:35 AM, Michael Ellerman wrote:
> On Wed, 2015-15-07 at 04:22:06 UTC, Kamalesh Babulal wrote:
>> Kernel prints respective warnings about various EPOW events for
>> user information/action after parsing EPOW interrupts.Prompting
>> user to take action depending upon the severity of the event.
>>
>> At times EPOW reset event warning, such as below could flood
>> kernel log, over a period of time.
>>
>> May 25 03:46:34 alp kernel: Non critical power or cooling issue cleared
>> May 25 03:46:52 alp kernel: Non critical power or cooling issue cleared
>> May 25 03:53:48 alp kernel: Non critical power or cooling issue cleared
>> May 25 03:55:46 alp kernel: Non critical power or cooling issue cleared
>> May 25 03:56:34 alp kernel: Non critical power or cooling issue cleared
>> May 25 03:59:04 alp kernel: Non critical power or cooling issue cleared
>> May 25 04:02:01 alp kernel: Non critical power or cooling issue cleared
>> May 25 04:04:24 alp kernel: Non critical power or cooling issue cleared
>> May 25 04:07:18 alp kernel: Non critical power or cooling issue cleared
>> May 25 04:13:04 alp kernel: Non critical power or cooling issue cleared
>> May 25 04:22:04 alp kernel: Non critical power or cooling issue cleared
>> May 25 04:22:26 alp kernel: Non critical power or cooling issue cleared
>> May 25 04:22:36 alp kernel: Non critical power or cooling issue cleared
>>
>> This patch avoids these multiple EPOW reset warnings by using a boolean
>> flag. This flag is initialized to false and is set to true upon arrival
>> of EPOW event. This same flag is checked and reset during EPOW_RESET
>> scenario to filter out valid EPOW reset events and avoid multiple warning
>> logs.
> Why are we even getting these reset events when nothing has happened?
>
>> Also, merged adjacent pr_err/pr_emerg into single one to reduce
>> the number of lines printed per warning.
>>
>> Suggested-by: Vipin K Parashar <vipin@linux.vnet.ibm.com>
>> [Vipin: edited the changelog]
>> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
>> Cc: Anton Blanchard <anton@samba.org>
>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
>> ---
>> v3 Changes:
>> - Limit warning printed by EPOW RESET event, by guarding it with bool flag.
>> Instead of rate limiting all the EPOW events.
>>
>> v2 Changes:
>> - Merged multiple adjacent pr_err/pr_emerg into single line to reduce multi-line
>> warnings, based on Michael's comments.
>>
>> arch/powerpc/platforms/pseries/ras.c | 25 +++++++++++++++++--------
>> 1 file changed, 17 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
>> index 02e4a17..b30396a 100644
>> --- a/arch/powerpc/platforms/pseries/ras.c
>> +++ b/arch/powerpc/platforms/pseries/ras.c
>> @@ -40,6 +40,9 @@ static int ras_check_exception_token;
>> #define EPOW_SENSOR_TOKEN 9
>> #define EPOW_SENSOR_INDEX 0
>>
>> +/* Flag to limit EPOW RESET warning. */
>> +static bool epow_state;
> This name is terrible, it doesn't give me any hint to what it means.
>
> But really it should be a counter, not a boolean.
>
> We could have multiple EPOW events come in and then later get the reset events
> for them, couldn't we?
As per PAPR, I see the following description of EPOW_RESET:
EPOW_RESET / MESSAGE (0) - No EPOW event is pending.
So we probably need to understand whether it is sent only after all EPOW
events have been reset, or after just the last EPOW event. From the PAPR
description it seems to be the first case.
>
>
> So what about:
>
> static unsigned epow_event_depth;
>
> And then below:
>
>> @@ -145,21 +148,27 @@ static void rtas_parse_epow_errlog(struct rtas_error_log *log)
>>
> epow_event_depth++;
>
> switch (action_code) {
> case EPOW_RESET:
> if (epow_event_depth)
> epow_event_depth--;
>
> if (epow_event_depth)
>> + pr_err("Non critical power or cooling issue cleared");
>> break;
>
> And that's all you need.
>
>
>> case EPOW_WARN_COOLING:
>> - pr_err("Non critical cooling issue reported by firmware");
>> - pr_err("Check RTAS error log for details");
>> + pr_err("Non critical cooling issue reported by firmware, "
>> + "Check RTAS error log for details");
> This should be:
>
> pr_err("Non-critical cooling issue reported by firmware, check RTAS error log for details.\n");
>
> But that's too long, so how about:
>
> pr_err("Non-critical cooling issue reported, check RTAS error log for details.\n");
>
> And if it's non-critical it shouldn't be pr_err(), it should be pr_info().
>
> Similarly for all the other messages.
>
>
> cheers
>
* Re: [RESEND,v3] powerpc/pseries: Limit EPOW reset event warnings
2015-07-17 8:17 ` Kamalesh Babulal
@ 2015-09-07 21:40 ` Vipin K Parashar
2015-09-10 17:17 ` Vipin K Parashar
1 sibling, 0 replies; 9+ messages in thread
From: Vipin K Parashar @ 2015-09-07 21:40 UTC (permalink / raw)
To: Kamalesh Babulal, Michael Ellerman
Cc: linuxppc-dev, Anton Blanchard, Anshuman Khandual
On 07/17/2015 01:47 PM, Kamalesh Babulal wrote:
> * Michael Ellerman <mpe@ellerman.id.au> [2015-07-16 14:05:52]:
>
> [..]
>> Why are we even getting these reset events when nothing has happened?
> Thanks for the review. It was seen only on one machine, couldn't
> get hold of the machine any more. I am guessing here, that it might be
> the firmware.
I am checking with the PFW guys as to under what circumstances one would
see so many reset events being reported. I will post the findings as soon
as I hear back from them.
>>> Also, merged adjacent pr_err/pr_emerg into single one to reduce
>>> the number of lines printed per warning.
> [..]
>>>
>>> +/* Flag to limit EPOW RESET warning. */
>>> +static bool epow_state;
>> This name is terrible, it doesn't give me any hint to what it means.
>>
>> But really it should be a counter, not a boolean.
>>
>> We could have multiple EPOW events come in and then later get the reset events
>> for them, couldn't we?
>>
>>
>> So what about:
>>
>> static unsigned epow_event_depth;
>>
> --->8----
>
> From 0d27916fd09a9f0912a217432a41e2b579dc2952 Mon Sep 17 00:00:00 2001
> From: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
> Date: Fri, 17 Jul 2015 13:19:31 +0530
> Subject: [PATCH v4] powerpc/pseries: Limit EPOW reset event warnings
>
> Kernel prints respective warnings about various EPOW events for
> user information/action after parsing EPOW interrupts. At times
> EPOW reset event warning, such as below could flood kernel log,
> over a period of time.
>
> May 25 03:46:34 alp kernel: Non critical power or cooling issue cleared
> May 25 03:46:52 alp kernel: Non critical power or cooling issue cleared
> May 25 03:53:48 alp kernel: Non critical power or cooling issue cleared
> May 25 03:55:46 alp kernel: Non critical power or cooling issue cleared
> May 25 03:56:34 alp kernel: Non critical power or cooling issue cleared
> May 25 03:59:04 alp kernel: Non critical power or cooling issue cleared
> May 25 04:02:01 alp kernel: Non critical power or cooling issue cleared
> May 25 04:04:24 alp kernel: Non critical power or cooling issue cleared
> May 25 04:07:18 alp kernel: Non critical power or cooling issue cleared
> May 25 04:13:04 alp kernel: Non critical power or cooling issue cleared
> May 25 04:22:04 alp kernel: Non critical power or cooling issue cleared
> May 25 04:22:26 alp kernel: Non critical power or cooling issue cleared
> May 25 04:22:36 alp kernel: Non critical power or cooling issue cleared
>
> This patch avoids these multiple EPOW reset warnings by using the
> epow_event_depth counter, which is incremented every time an EPOW event
> is reported and decremented on an EPOW_RESET event. With this approach the
> number of EPOW RESET warnings matches the number of EPOW events.
>
> Also, merged adjacent pr_info/pr_err/pr_emerg calls into a single one to
> reduce the number of lines printed per warning across the file, converted
> non-critical errors from pr_err to pr_info, and corrected the grammar
> of the warnings printed.
>
> Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
> Cc: Anton Blanchard <anton@samba.org>
> Cc: Vipin K Parashar <vipin@linux.vnet.ibm.com>
> Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
> ---
> V4: Changes:
> - Changed the approach to depth counter to match the EPOW events and
> EPOW reset.
> - Converted pr_err() to pr_info() for non-critical errors.
> - Merged adjacent warnings into single line across the file.
> - Fixed grammar in the warnings to make them short.
>
> v3 Changes:
> - Limit warning printed by EPOW RESET event, by guarding it with bool flag.
> Instead of rate limiting all the EPOW events.
>
> v2 Changes:
> - Merged multiple adjacent pr_err/pr_emerg into single line to reduce multi-line
> warnings, based on Michael's comments.
>
> arch/powerpc/platforms/pseries/ras.c | 53 ++++++++++++++++++++----------------
> 1 file changed, 29 insertions(+), 24 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
> index 02e4a17..995cab8 100644
> --- a/arch/powerpc/platforms/pseries/ras.c
> +++ b/arch/powerpc/platforms/pseries/ras.c
> @@ -40,6 +40,8 @@ static int ras_check_exception_token;
> #define EPOW_SENSOR_TOKEN 9
> #define EPOW_SENSOR_INDEX 0
>
> +static unsigned epow_event_depth;
> +
> static irqreturn_t ras_epow_interrupt(int irq, void *dev_id);
> static irqreturn_t ras_error_interrupt(int irq, void *dev_id);
>
> @@ -82,32 +84,30 @@ static void handle_system_shutdown(char event_modifier)
> {
> switch (event_modifier) {
> case EPOW_SHUTDOWN_NORMAL:
> - pr_emerg("Firmware initiated power off");
> + pr_emerg("Firmware initiated power off\n");
> orderly_poweroff(true);
> break;
>
> case EPOW_SHUTDOWN_ON_UPS:
> - pr_emerg("Loss of power reported by firmware, system is "
> - "running on UPS/battery");
> - pr_emerg("Check RTAS error log for details");
> + pr_emerg("Loss of power reported, system is running on"
> + " UPS/battery. Check RTAS error log for details\n");
> orderly_poweroff(true);
> break;
>
> case EPOW_SHUTDOWN_LOSS_OF_CRITICAL_FUNCTIONS:
> - pr_emerg("Loss of system critical functions reported by "
> - "firmware");
> - pr_emerg("Check RTAS error log for details");
> + pr_emerg("Loss of system critical functions reported. Check"
> + " RTAS error log for details\n");
> orderly_poweroff(true);
> break;
>
> case EPOW_SHUTDOWN_AMBIENT_TEMPERATURE_TOO_HIGH:
> - pr_emerg("Ambient temperature too high reported by firmware");
> - pr_emerg("Check RTAS error log for details");
> + pr_emerg("Ambient temperature too high reported. Check RTAS"
> + " error log for details\n");
> orderly_poweroff(true);
> break;
>
> default:
> - pr_err("Unknown power/cooling shutdown event (modifier %d)",
> + pr_info("Unknown power/cooling shutdown event (modifier %d)\n",
> event_modifier);
> }
> }
> @@ -145,40 +145,46 @@ static void rtas_parse_epow_errlog(struct rtas_error_log *log)
>
> switch (action_code) {
> case EPOW_RESET:
> - pr_err("Non critical power or cooling issue cleared");
> + if (epow_event_depth) {
> + pr_err("Non critical power/cooling issue cleared\n");
> + epow_event_depth--;
> + }
> break;
>
> case EPOW_WARN_COOLING:
> - pr_err("Non critical cooling issue reported by firmware");
> - pr_err("Check RTAS error log for details");
> + pr_info("Non-critical cooling issue reported, check RTAS error"
> + " log for details\n");
> + epow_event_depth++;
> break;
>
> case EPOW_WARN_POWER:
> - pr_err("Non critical power issue reported by firmware");
> - pr_err("Check RTAS error log for details");
> + pr_info("Non-critical power issue reported, check RTAS error"
> + " log for details\n");
> + epow_event_depth++;
> break;
>
> case EPOW_SYSTEM_SHUTDOWN:
> handle_system_shutdown(epow_log->event_modifier);
> + epow_event_depth++;
> break;
>
> case EPOW_SYSTEM_HALT:
> - pr_emerg("Firmware initiated power off");
> + pr_emerg("Firmware initiated power off\n");
> orderly_poweroff(true);
> break;
>
> case EPOW_MAIN_ENCLOSURE:
> case EPOW_POWER_OFF:
> - pr_emerg("Critical power/cooling issue reported by firmware");
> - pr_emerg("Check RTAS error log for details");
> - pr_emerg("Immediate power off");
> + pr_emerg("Critical power/cooling issue reported, Check RTAS"
> + " error log for details. Immediate power off\n");
> emergency_sync();
> kernel_power_off();
> break;
>
> default:
> - pr_err("Unknown power/cooling event (action code %d)",
> + pr_info("Unknown power/cooling event (action code %d)\n",
> action_code);
> + epow_event_depth++;
> }
> }
>
> @@ -248,13 +254,12 @@ static irqreturn_t ras_error_interrupt(int irq, void *dev_id)
> log_error(ras_log_buf, ERR_TYPE_RTAS_LOG, fatal);
>
> if (fatal) {
> - pr_emerg("Fatal hardware error reported by firmware");
> - pr_emerg("Check RTAS error log for details");
> - pr_emerg("Immediate power off");
> + pr_emerg("Fatal hardware error reported, Check RTAS error"
> + " log for details. Immediate power off\n");
> emergency_sync();
> kernel_power_off();
> } else {
> - pr_err("Recoverable hardware error reported by firmware");
> + pr_err("Recoverable hardware error reported by firmware\n");
> }
>
> spin_unlock(&ras_log_buf_lock);
* Re: [RESEND,v3] powerpc/pseries: Limit EPOW reset event warnings
2015-07-17 8:17 ` Kamalesh Babulal
2015-09-07 21:40 ` Vipin K Parashar
@ 2015-09-10 17:17 ` Vipin K Parashar
1 sibling, 0 replies; 9+ messages in thread
From: Vipin K Parashar @ 2015-09-10 17:17 UTC (permalink / raw)
To: Kamalesh Babulal, Michael Ellerman
Cc: linuxppc-dev, Anton Blanchard, Anshuman Khandual
On 07/17/2015 01:47 PM, Kamalesh Babulal wrote:
> * Michael Ellerman <mpe@ellerman.id.au> [2015-07-16 14:05:52]:
>
> [..]
>> Why are we even getting these reset events when nothing has happened?
Based on info received from the PFW guys and on how EPOW works in HW:
FW only acts as a passthrough here, passing on the EPOW info obtained from
the underlying PHYP/HW.
On FSP-based POWER systems, EPOW information is sent via Panel status
notification alerts, which also contain SPCN info along with the EPOW
details. As a result, even when no EPOW condition is present, these
notifications are still sent by HW to report any SPCN-related changes.
Thus FW ends up sending multiple EPOW_RESET notifications with no actual
EPOW event being active. So multiple EPOW_RESET notifications are expected
by design, and Linux needs a fix to avoid logging each of them.
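To make the effect of such a run of EPOW_RESET notifications concrete, here is a minimal user-space model (not kernel code) of the guard used in the v4 patch quoted below. The helper names epow_model_handle() and epow_model_count_logs() are made up for this illustration, and the enum values other than EPOW_RESET == 0 are placeholders. It only counts how many messages would be logged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder action codes for the model; only EPOW_RESET == 0 is taken
 * from the PAPR text quoted in this thread. */
enum epow_action { EPOW_RESET = 0, EPOW_WARN_COOLING, EPOW_WARN_POWER };

static unsigned int epow_event_depth;

/* Returns true when the v4 logic would log a message for this action. */
static bool epow_model_handle(enum epow_action action)
{
	if (action == EPOW_RESET) {
		if (!epow_event_depth)
			return false;	/* reset with nothing pending: silent */
		epow_event_depth--;
		return true;		/* "issue cleared" is logged */
	}
	epow_event_depth++;
	return true;			/* the warning itself is logged */
}

/* Counts the messages logged for a sequence, starting from depth 0. */
static size_t epow_model_count_logs(const enum epow_action *seq, size_t n)
{
	size_t i, logged = 0;

	assert(seq != NULL || n == 0);
	epow_event_depth = 0;
	for (i = 0; i < n; i++)
		logged += epow_model_handle(seq[i]);
	return logged;
}
```

In this model, a sequence made only of EPOW_RESET logs nothing, and one warning followed by several resets logs the warning plus a single "cleared" message.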
> Thanks for the review. It was seen only on one machine, couldn't
> get hold of the machine any more. I am guessing here, that it might be
> the firmware.
>
>>> Also, merged adjacent pr_err/pr_emerg into single one to reduce
>>> the number of lines printed per warning.
> [..]
>>>
>>> +/* Flag to limit EPOW RESET warning. */
>>> +static bool epow_state;
>> This name is terrible, it doesn't give me any hint to what it means.
>>
>> But really it should be a counter, not a boolean.
>>
>> We could have multiple EPOW events come in and then later get the reset events
>> for them, couldn't we?
Below is the EPOW_RESET description from PAPR:
EPOW_RESET/MESSAGE (0) - No EPOW event is pending. This action code is
the lowest priority.
PFW sends EPOW_RESET only when no EPOW condition is present in the system.
For two outstanding EPOW conditions, HW also doesn't provide any means to
know that one of them has been reset. It only tells PHYP/PFW the highest
priority EPOW condition and reports the reset case when all such
conditions go away.
Given this, would a boolean flag be more appropriate here?
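A small user-space comparison may help with this question. It is a sketch under the assumption stated above (one EPOW_RESET arrives only once all conditions are gone, and further resets may follow with nothing pending). The function names and the enum values other than EPOW_RESET == 0 are invented for the illustration. Each function returns how many "issue cleared" messages would be logged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder action codes; only EPOW_RESET == 0 comes from PAPR. */
enum epow_action { EPOW_RESET = 0, EPOW_WARN_COOLING, EPOW_WARN_POWER };

/* "Cleared" messages when each reset is paired with a depth counter. */
static size_t cleared_logs_counter(const enum epow_action *seq, size_t n)
{
	unsigned int depth = 0;
	size_t i, cleared = 0;

	assert(seq != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (seq[i] != EPOW_RESET) {
			depth++;		/* every event bumps the depth */
			continue;
		}
		if (depth) {
			depth--;
			cleared++;
		}
	}
	return cleared;
}

/* "Cleared" messages when a single boolean tracks "something pending". */
static size_t cleared_logs_flag(const enum epow_action *seq, size_t n)
{
	bool pending = false;
	size_t i, cleared = 0;

	assert(seq != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (seq[i] != EPOW_RESET) {
			pending = true;		/* any event sets the flag */
			continue;
		}
		if (pending) {
			pending = false;	/* one reset clears everything */
			cleared++;
		}
	}
	return cleared;
}
```

For two warnings followed by three resets, the counter variant logs "cleared" twice (the second time for a reset that carried no new information), while the flag variant logs it once.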
>>
>>
>> So what about:
>>
>> static unsigned epow_event_depth;
>>
> --->8----
>
> From 0d27916fd09a9f0912a217432a41e2b579dc2952 Mon Sep 17 00:00:00 2001
> From: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
> Date: Fri, 17 Jul 2015 13:19:31 +0530
> Subject: [PATCH v4] powerpc/pseries: Limit EPOW reset event warnings
>
> Kernel prints respective warnings about various EPOW events for
> user information/action after parsing EPOW interrupts. At times
> EPOW reset event warning, such as below could flood kernel log,
> over a period of time.
>
> May 25 03:46:34 alp kernel: Non critical power or cooling issue cleared
> May 25 03:46:52 alp kernel: Non critical power or cooling issue cleared
> May 25 03:53:48 alp kernel: Non critical power or cooling issue cleared
> May 25 03:55:46 alp kernel: Non critical power or cooling issue cleared
> May 25 03:56:34 alp kernel: Non critical power or cooling issue cleared
> May 25 03:59:04 alp kernel: Non critical power or cooling issue cleared
> May 25 04:02:01 alp kernel: Non critical power or cooling issue cleared
> May 25 04:04:24 alp kernel: Non critical power or cooling issue cleared
> May 25 04:07:18 alp kernel: Non critical power or cooling issue cleared
> May 25 04:13:04 alp kernel: Non critical power or cooling issue cleared
> May 25 04:22:04 alp kernel: Non critical power or cooling issue cleared
> May 25 04:22:26 alp kernel: Non critical power or cooling issue cleared
> May 25 04:22:36 alp kernel: Non critical power or cooling issue cleared
>
> This patch avoids these multiple EPOW reset warnings by using the
> epow_event_depth counter, which is incremented every time an EPOW event
> is reported and decremented on an EPOW_RESET event. With this approach the
> number of EPOW RESET warnings matches the number of EPOW events.
>
> Also, merged adjacent pr_info/pr_err/pr_emerg calls into a single one to
> reduce the number of lines printed per warning across the file, converted
> non-critical errors from pr_err to pr_info, and corrected the grammar
> of the warnings printed.
>
> Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
> Cc: Anton Blanchard <anton@samba.org>
> Cc: Vipin K Parashar <vipin@linux.vnet.ibm.com>
> Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
> ---
> V4: Changes:
> - Changed the approach to depth counter to match the EPOW events and
> EPOW reset.
> - Converted pr_err() to pr_info() for non-critical errors.
> - Merged adjacent warnings into single line across the file.
> - Fixed grammar in the warnings to make them short.
> v3 Changes:
> - Limit warning printed by EPOW RESET event, by guarding it with bool flag.
> Instead of rate limiting all the EPOW events.
>
> v2 Changes:
> - Merged multiple adjacent pr_err/pr_emerg into single line to reduce multi-line
> warnings, based on Michael's comments.
>
> arch/powerpc/platforms/pseries/ras.c | 53 ++++++++++++++++++++----------------
> 1 file changed, 29 insertions(+), 24 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
> index 02e4a17..995cab8 100644
> --- a/arch/powerpc/platforms/pseries/ras.c
> +++ b/arch/powerpc/platforms/pseries/ras.c
> @@ -40,6 +40,8 @@ static int ras_check_exception_token;
> #define EPOW_SENSOR_TOKEN 9
> #define EPOW_SENSOR_INDEX 0
>
> +static unsigned epow_event_depth;
> +
> static irqreturn_t ras_epow_interrupt(int irq, void *dev_id);
> static irqreturn_t ras_error_interrupt(int irq, void *dev_id);
>
> @@ -82,32 +84,30 @@ static void handle_system_shutdown(char event_modifier)
> {
> switch (event_modifier) {
> case EPOW_SHUTDOWN_NORMAL:
> - pr_emerg("Firmware initiated power off");
> + pr_emerg("Firmware initiated power off\n");
> orderly_poweroff(true);
> break;
>
> case EPOW_SHUTDOWN_ON_UPS:
> - pr_emerg("Loss of power reported by firmware, system is "
> - "running on UPS/battery");
> - pr_emerg("Check RTAS error log for details");
> + pr_emerg("Loss of power reported, system is running on"
> + " UPS/battery. Check RTAS error log for details\n");
> orderly_poweroff(true);
> break;
>
> case EPOW_SHUTDOWN_LOSS_OF_CRITICAL_FUNCTIONS:
> - pr_emerg("Loss of system critical functions reported by "
> - "firmware");
> - pr_emerg("Check RTAS error log for details");
> + pr_emerg("Loss of system critical functions reported. Check"
> + " RTAS error log for details\n");
> orderly_poweroff(true);
> break;
>
> case EPOW_SHUTDOWN_AMBIENT_TEMPERATURE_TOO_HIGH:
> - pr_emerg("Ambient temperature too high reported by firmware");
> - pr_emerg("Check RTAS error log for details");
> + pr_emerg("Ambient temperature too high reported. Check RTAS"
> + " error log for details\n");
How about "High ambient temperature reported"?
> orderly_poweroff(true);
> break;
>
> default:
> - pr_err("Unknown power/cooling shutdown event (modifier %d)",
> + pr_info("Unknown power/cooling shutdown event (modifier %d)\n",
> event_modifier);
pr_err seems apt here, as we don't know the severity of an
unknown/unsupported event.
> }
> }
> @@ -145,40 +145,46 @@ static void rtas_parse_epow_errlog(struct rtas_error_log *log)
>
> switch (action_code) {
> case EPOW_RESET:
> - pr_err("Non critical power or cooling issue cleared");
> + if (epow_event_depth) {
> + pr_err("Non critical power/cooling issue cleared\n");
pr_info ?
> + epow_event_depth--;
> + }
> break;
>
> case EPOW_WARN_COOLING:
> - pr_err("Non critical cooling issue reported by firmware");
> - pr_err("Check RTAS error log for details");
> + pr_info("Non-critical cooling issue reported, check RTAS error"
> + " log for details\n");
> + epow_event_depth++;
> break;
>
> case EPOW_WARN_POWER:
> - pr_err("Non critical power issue reported by firmware");
> - pr_err("Check RTAS error log for details");
> + pr_info("Non-critical power issue reported, check RTAS error"
> + " log for details\n");
> + epow_event_depth++;
> break;
>
> case EPOW_SYSTEM_SHUTDOWN:
> handle_system_shutdown(epow_log->event_modifier);
> + epow_event_depth++;
> break;
>
> case EPOW_SYSTEM_HALT:
> - pr_emerg("Firmware initiated power off");
> + pr_emerg("Firmware initiated power off\n");
> orderly_poweroff(true);
> break;
>
> case EPOW_MAIN_ENCLOSURE:
> case EPOW_POWER_OFF:
> - pr_emerg("Critical power/cooling issue reported by firmware");
> - pr_emerg("Check RTAS error log for details");
> - pr_emerg("Immediate power off");
> + pr_emerg("Critical power/cooling issue reported, Check RTAS"
> + " error log for details. Immediate power off\n");
> emergency_sync();
> kernel_power_off();
> break;
>
> default:
> - pr_err("Unknown power/cooling event (action code %d)",
> + pr_info("Unknown power/cooling event (action code %d)\n",
> action_code);
pr_err ?
> + epow_event_depth++;
> }
> }
>
> @@ -248,13 +254,12 @@ static irqreturn_t ras_error_interrupt(int irq, void *dev_id)
> log_error(ras_log_buf, ERR_TYPE_RTAS_LOG, fatal);
>
> if (fatal) {
> - pr_emerg("Fatal hardware error reported by firmware");
> - pr_emerg("Check RTAS error log for details");
> - pr_emerg("Immediate power off");
> + pr_emerg("Fatal hardware error reported, Check RTAS error"
> + " log for details. Immediate power off\n");
> emergency_sync();
> kernel_power_off();
> } else {
> - pr_err("Recoverable hardware error reported by firmware");
> + pr_err("Recoverable hardware error reported by firmware\n");
> }
We can omit braces here.
>
> spin_unlock(&ras_log_buf_lock);
end of thread, other threads:[~2015-09-10 17:17 UTC | newest]
Thread overview: 9+ messages
2015-07-14 15:09 [PATCH v3] powerpc/pseries: Limit EPOW reset event warnings Kamalesh Babulal
2015-07-14 18:01 ` Vipin K Parashar
2015-07-15 4:22 ` [RESEND PATCH " Kamalesh Babulal
2015-07-15 7:01 ` Vipin K Parashar
2015-07-16 4:05 ` [RESEND,v3] " Michael Ellerman
2015-07-17 8:17 ` Kamalesh Babulal
2015-09-07 21:40 ` Vipin K Parashar
2015-09-10 17:17 ` Vipin K Parashar
2015-07-17 9:51 ` Vipin K Parashar