linuxppc-dev.lists.ozlabs.org archive mirror
* [RFC PATCH] powerpc/pseries: Ratelimit EPOW event warnings
@ 2015-05-28  4:33 Kamalesh Babulal
  2015-06-01 11:26 ` Michael Ellerman
  0 siblings, 1 reply; 4+ messages in thread
From: Kamalesh Babulal @ 2015-05-28  4:33 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Kamalesh Babulal, Anshuman Khandual, Anton Blanchard,
	Michael Ellerman

We print the respective warning after parsing EPOW interrupts,
prompting the user to take action depending upon the severity of the
event.

Sometimes the same EPOW event warning, such as the one below, can flood
the kernel log within a very short duration. Limit the messages by using
the ratelimited variant of pr_err().

May 25 03:46:34 alp kernel: Non critical power or cooling issue cleared
May 25 03:46:52 alp kernel: Non critical power or cooling issue cleared
May 25 03:53:48 alp kernel: Non critical power or cooling issue cleared
May 25 03:55:46 alp kernel: Non critical power or cooling issue cleared
May 25 03:56:34 alp kernel: Non critical power or cooling issue cleared
May 25 03:59:04 alp kernel: Non critical power or cooling issue cleared
May 25 04:02:01 alp kernel: Non critical power or cooling issue cleared
May 25 04:04:24 alp kernel: Non critical power or cooling issue cleared
May 25 04:07:18 alp kernel: Non critical power or cooling issue cleared
May 25 04:13:04 alp kernel: Non critical power or cooling issue cleared
May 25 04:22:04 alp kernel: Non critical power or cooling issue cleared
May 25 04:22:26 alp kernel: Non critical power or cooling issue cleared
May 25 04:22:36 alp kernel: Non critical power or cooling issue cleared

Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/powerpc/platforms/pseries/ras.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index 02e4a17..2556bc2 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -145,17 +145,17 @@ static void rtas_parse_epow_errlog(struct rtas_error_log *log)
 
 	switch (action_code) {
 	case EPOW_RESET:
-		pr_err("Non critical power or cooling issue cleared");
+		pr_err_ratelimited("Non critical power or cooling issue cleared");
 		break;
 
 	case EPOW_WARN_COOLING:
-		pr_err("Non critical cooling issue reported by firmware");
-		pr_err("Check RTAS error log for details");
+		pr_err_ratelimited("Non critical cooling issue reported by firmware");
+		pr_err_ratelimited("Check RTAS error log for details");
 		break;
 
 	case EPOW_WARN_POWER:
-		pr_err("Non critical power issue reported by firmware");
-		pr_err("Check RTAS error log for details");
+		pr_err_ratelimited("Non critical power issue reported by firmware");
+		pr_err_ratelimited("Check RTAS error log for details");
 		break;
 
 	case EPOW_SYSTEM_SHUTDOWN:
@@ -177,7 +177,7 @@ static void rtas_parse_epow_errlog(struct rtas_error_log *log)
 		break;
 
 	default:
-		pr_err("Unknown power/cooling event (action code %d)",
+		pr_err_ratelimited("Unknown power/cooling event (action code %d)",
 			action_code);
 	}
 }
-- 
2.1.2


* Re: [RFC PATCH] powerpc/pseries: Ratelimit EPOW event warnings
  2015-05-28  4:33 [RFC PATCH] powerpc/pseries: Ratelimit EPOW event warnings Kamalesh Babulal
@ 2015-06-01 11:26 ` Michael Ellerman
  2015-06-02  5:03   ` Kamalesh Babulal
  0 siblings, 1 reply; 4+ messages in thread
From: Michael Ellerman @ 2015-06-01 11:26 UTC (permalink / raw)
  To: Kamalesh Babulal; +Cc: linuxppc-dev, Anshuman Khandual, Anton Blanchard

On Thu, 2015-05-28 at 10:03 +0530, Kamalesh Babulal wrote:
> We print the respective warning after parsing EPOW interrupts,
> prompting the user to take action depending upon the severity of the
> event.
> 
> Sometimes the same EPOW event warning, such as the one below, can flood
> the kernel log within a very short duration. Limit the messages by using
> the ratelimited variant of pr_err().
> 
> May 25 03:46:34 alp kernel: Non critical power or cooling issue cleared
> May 25 03:46:52 alp kernel: Non critical power or cooling issue cleared
> May 25 03:53:48 alp kernel: Non critical power or cooling issue cleared
> May 25 03:55:46 alp kernel: Non critical power or cooling issue cleared
> May 25 03:56:34 alp kernel: Non critical power or cooling issue cleared
> May 25 03:59:04 alp kernel: Non critical power or cooling issue cleared
> May 25 04:02:01 alp kernel: Non critical power or cooling issue cleared
> May 25 04:04:24 alp kernel: Non critical power or cooling issue cleared
> May 25 04:07:18 alp kernel: Non critical power or cooling issue cleared
> May 25 04:13:04 alp kernel: Non critical power or cooling issue cleared
> May 25 04:22:04 alp kernel: Non critical power or cooling issue cleared
> May 25 04:22:26 alp kernel: Non critical power or cooling issue cleared
> May 25 04:22:36 alp kernel: Non critical power or cooling issue cleared

Looking at the time stamps those are actually all fairly far apart in time,
aren't they? So do we actually see them within a short duration in practice?

It does seem sensible to rate limit them though.

> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
> index 02e4a17..2556bc2 100644
> --- a/arch/powerpc/platforms/pseries/ras.c
> +++ b/arch/powerpc/platforms/pseries/ras.c
> @@ -145,17 +145,17 @@ static void rtas_parse_epow_errlog(struct rtas_error_log *log)
>  
>  	switch (action_code) {
>  	case EPOW_RESET:
> -		pr_err("Non critical power or cooling issue cleared");
> +		pr_err_ratelimited("Non critical power or cooling issue cleared");
>  		break;
>  
>  	case EPOW_WARN_COOLING:
> -		pr_err("Non critical cooling issue reported by firmware");
> -		pr_err("Check RTAS error log for details");
> +		pr_err_ratelimited("Non critical cooling issue reported by firmware");
> +		pr_err_ratelimited("Check RTAS error log for details");
>  		break;
>  
>  	case EPOW_WARN_POWER:
> -		pr_err("Non critical power issue reported by firmware");
> -		pr_err("Check RTAS error log for details");
> +		pr_err_ratelimited("Non critical power issue reported by firmware");
> +		pr_err_ratelimited("Check RTAS error log for details");
>  		break;

Those last two could be collapsed onto one line which would reduce the spam.
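
A rough sketch of what the collapsed form might look like, written as a
standalone userspace snippet so it can be compiled outside the kernel tree.
The pr_err_ratelimited() macro here is only a stand-in that counts and prints
lines, the enum values are placeholders, and report_epow_warning() is a
hypothetical helper rather than anything in ras.c. The exact wording of the
merged message is also an assumption.

```c
#include <assert.h>
#include <stdio.h>

static int log_lines;	/* number of log lines emitted by the stand-in */

/* Stand-in for the kernel macro: counts the call and prints to stderr. */
#define pr_err_ratelimited(...) \
	do { log_lines++; fprintf(stderr, __VA_ARGS__); } while (0)

/* Placeholder action codes; the real definitions live in ras.c. */
enum epow_action { EPOW_WARN_COOLING = 1, EPOW_WARN_POWER = 2 };

/* Hypothetical helper: one log line per warning instead of two. */
static void report_epow_warning(int action_code)
{
	/* This sketch only models the two non-critical warning cases. */
	assert(action_code == EPOW_WARN_COOLING ||
	       action_code == EPOW_WARN_POWER);

	switch (action_code) {
	case EPOW_WARN_COOLING:
		pr_err_ratelimited("Non critical cooling issue reported by firmware, "
				   "check RTAS error log for details\n");
		break;

	case EPOW_WARN_POWER:
		pr_err_ratelimited("Non critical power issue reported by firmware, "
				   "check RTAS error log for details\n");
		break;
	}
}
```

Besides halving the output, a single call keeps the two halves of the message
together: each pr_err_ratelimited() call site has its own rate-limit state, so
two separate calls can in principle be suppressed independently of each other.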

cheers


* Re: [RFC PATCH] powerpc/pseries: Ratelimit EPOW event warnings
  2015-06-01 11:26 ` Michael Ellerman
@ 2015-06-02  5:03   ` Kamalesh Babulal
  2015-06-02  7:01     ` Michael Ellerman
  0 siblings, 1 reply; 4+ messages in thread
From: Kamalesh Babulal @ 2015-06-02  5:03 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, Anton Blanchard, Anshuman Khandual

* Michael Ellerman <mpe@ellerman.id.au> [2015-06-01 21:26:51]:

> On Thu, 2015-05-28 at 10:03 +0530, Kamalesh Babulal wrote:
> > We print the respective warning after parsing EPOW interrupts,
> > prompting the user to take action depending upon the severity of the
> > event.
> > 
> > Sometimes the same EPOW event warning, such as the one below, can flood
> > the kernel log within a very short duration. Limit the messages by using
> > the ratelimited variant of pr_err().
> > 
> > May 25 03:46:34 alp kernel: Non critical power or cooling issue cleared
> > May 25 03:46:52 alp kernel: Non critical power or cooling issue cleared
> > May 25 03:53:48 alp kernel: Non critical power or cooling issue cleared
> > May 25 03:55:46 alp kernel: Non critical power or cooling issue cleared
> > May 25 03:56:34 alp kernel: Non critical power or cooling issue cleared
> > May 25 03:59:04 alp kernel: Non critical power or cooling issue cleared
> > May 25 04:02:01 alp kernel: Non critical power or cooling issue cleared
> > May 25 04:04:24 alp kernel: Non critical power or cooling issue cleared
> > May 25 04:07:18 alp kernel: Non critical power or cooling issue cleared
> > May 25 04:13:04 alp kernel: Non critical power or cooling issue cleared
> > May 25 04:22:04 alp kernel: Non critical power or cooling issue cleared
> > May 25 04:22:26 alp kernel: Non critical power or cooling issue cleared
> > May 25 04:22:36 alp kernel: Non critical power or cooling issue cleared
> 
> Looking at the time stamps those are actually all fairly far apart in time,
> aren't they? So do we actually see them within a short duration in practice?

Thanks for the review. Agreed, I should have phrased it better. My intent was
to say that these warnings keep flooding the kernel log over a period of time.

[..]
> >  	case EPOW_WARN_POWER:
> > -		pr_err("Non critical power issue reported by firmware");
> > -		pr_err("Check RTAS error log for details");
> > +		pr_err_ratelimited("Non critical power issue reported by firmware");
> > +		pr_err_ratelimited("Check RTAS error log for details");
> >  		break;
> 
> Those last two could be collapsed onto one line which would reduce the spam.

Yes, that would reduce the number of lines printed. I will resend the patch
with the changes.

Thanks,
Kamalesh.


* Re: [RFC PATCH] powerpc/pseries: Ratelimit EPOW event warnings
  2015-06-02  5:03   ` Kamalesh Babulal
@ 2015-06-02  7:01     ` Michael Ellerman
  0 siblings, 0 replies; 4+ messages in thread
From: Michael Ellerman @ 2015-06-02  7:01 UTC (permalink / raw)
  To: Kamalesh Babulal; +Cc: linuxppc-dev, Anton Blanchard, Anshuman Khandual

On Tue, 2015-06-02 at 10:33 +0530, Kamalesh Babulal wrote:
> * Michael Ellerman <mpe@ellerman.id.au> [2015-06-01 21:26:51]:
> 
> > On Thu, 2015-05-28 at 10:03 +0530, Kamalesh Babulal wrote:
> > > We print the respective warning after parsing EPOW interrupts,
> > > prompting the user to take action depending upon the severity of the
> > > event.
> > > 
> > > Sometimes the same EPOW event warning, such as the one below, can flood
> > > the kernel log within a very short duration. Limit the messages by using
> > > the ratelimited variant of pr_err().
> > > 
> > > May 25 03:46:34 alp kernel: Non critical power or cooling issue cleared
> > > May 25 03:46:52 alp kernel: Non critical power or cooling issue cleared
> > > May 25 03:53:48 alp kernel: Non critical power or cooling issue cleared
> > > May 25 03:55:46 alp kernel: Non critical power or cooling issue cleared
> > > May 25 03:56:34 alp kernel: Non critical power or cooling issue cleared
> > > May 25 03:59:04 alp kernel: Non critical power or cooling issue cleared
> > > May 25 04:02:01 alp kernel: Non critical power or cooling issue cleared
> > > May 25 04:04:24 alp kernel: Non critical power or cooling issue cleared
> > > May 25 04:07:18 alp kernel: Non critical power or cooling issue cleared
> > > May 25 04:13:04 alp kernel: Non critical power or cooling issue cleared
> > > May 25 04:22:04 alp kernel: Non critical power or cooling issue cleared
> > > May 25 04:22:26 alp kernel: Non critical power or cooling issue cleared
> > > May 25 04:22:36 alp kernel: Non critical power or cooling issue cleared
> > 
> > Looking at the time stamps those are actually all fairly far apart in time,
> > aren't they? So do we actually see them within a short duration in practice?
> 
> Thanks for the review. Agreed, I should have phrased it better. My intent was
> to say that these warnings keep flooding the kernel log over a period of time.

OK. By default printk_ratelimited() allows up to 10 messages in five seconds,
so it won't reduce the number of messages in the above example.

But I'm still OK with a patch to ratelimit them.
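
To illustrate the point about the default limit: the sketch below is a
simplified userspace model of a fixed-window rate limiter, not the kernel's
implementation. The struct and function names are made up for illustration.
The interval (5 seconds) and burst (10) are the defaults quoted above, and the
timestamps are the thirteen log lines from the changelog. The closest pair of
messages is 10 seconds apart, so every message opens a fresh window and none
is suppressed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Convert a wall-clock time to seconds since midnight. */
#define HMS(h, m, s) ((h) * 3600L + (m) * 60L + (s))

/* Hypothetical model of a rate-limit state; not the kernel's struct. */
struct ratelimit_model {
	long interval;	/* window length in seconds */
	int burst;	/* messages allowed per window */
	long begin;	/* start of the current window */
	bool started;	/* false until the first message arrives */
	int printed;	/* messages let through in the current window */
	int missed;	/* messages suppressed so far */
};

/* Return true if a message arriving at time 'now' may be printed. */
static bool ratelimit_model_allow(struct ratelimit_model *rs, long now)
{
	assert(rs != NULL);

	/* Open a new window on the first message or once the old one expired. */
	if (!rs->started || now - rs->begin >= rs->interval) {
		rs->started = true;
		rs->begin = now;
		rs->printed = 0;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}

	rs->missed++;
	return false;
}

/* Count how many of the n timestamps in ts[] would be printed. */
static int count_allowed(struct ratelimit_model *rs, const long *ts, size_t n)
{
	int allowed = 0;

	for (size_t i = 0; i < n; i++)
		if (ratelimit_model_allow(rs, ts[i]))
			allowed++;
	return allowed;
}

/* Timestamps of the "issue cleared" lines quoted in the changelog. */
static const long epow_log_times[] = {
	HMS(3, 46, 34), HMS(3, 46, 52), HMS(3, 53, 48), HMS(3, 55, 46),
	HMS(3, 56, 34), HMS(3, 59, 4),  HMS(4, 2, 1),   HMS(4, 4, 24),
	HMS(4, 7, 18),  HMS(4, 13, 4),  HMS(4, 22, 4),  HMS(4, 22, 26),
	HMS(4, 22, 36),
};
```

Running count_allowed() over epow_log_times with interval 5 and burst 10 lets
all thirteen messages through. In this model, messages are only dropped when
more than ten arrive within a single five-second window.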

> [..]
> > >  	case EPOW_WARN_POWER:
> > > -		pr_err("Non critical power issue reported by firmware");
> > > -		pr_err("Check RTAS error log for details");
> > > +		pr_err_ratelimited("Non critical power issue reported by firmware");
> > > +		pr_err_ratelimited("Check RTAS error log for details");
> > >  		break;
> > 
> > Those last two could be collapsed onto one line which would reduce the spam.
> 
> Yes, it could reduce the number of lines printed. Will resend the patch with the
> changes.

Thanks.

cheers


