public inbox for linux-acpi@vger.kernel.org
 help / color / mirror / Atom feed
* ACPI: EC: Fix logspam in "GPE storm avoidance" code
@ 2008-10-28 14:05 Alan Jenkins
  2008-10-28 18:25 ` Alexey Starikovskiy
  0 siblings, 1 reply; 10+ messages in thread
From: Alan Jenkins @ 2008-10-28 14:05 UTC (permalink / raw)
  To: linux acpi; +Cc: Alexey Starikovskiy

<http://bugzilla.kernel.org/show_bug.cgi?id=11841>
"plenty of line "ACPI: EC: non-query interrupt received,
 switching to interrupt mode" in dmesg"

GPE storm avoidance involves disabling the EC GPE during transactions.
Unfortunately we forget to re-enable EC_FLAGS_GPE_MODE afterwards.

This happens to work because we re-enable it anyway when we detect
the first confirmation interrupt of the next transaction.  However,
this causes lots of bogus "switching to interrupt mode" messages.

Signed-off-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>

diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index 7f0d81c..ba47c7c 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -285,6 +285,7 @@ static int acpi_ec_transaction_unlocked(struct acpi_ec *ec, u8 command,
 		/* check if we received SCI during transaction */
 		ec_check_sci(ec, acpi_ec_read_status(ec));
 		/* it is safe to enable GPE outside of transaction */
+		set_bit(EC_FLAGS_GPE_MODE, &ec->flags);
 		acpi_enable_gpe(NULL, ec->gpe, ACPI_NOT_ISR);
 	} else if (test_bit(EC_FLAGS_GPE_MODE, &ec->flags) &&
 		   t.irq_count > ACPI_EC_STORM_THRESHOLD) {



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: ACPI: EC: Fix logspam in "GPE storm avoidance" code
  2008-10-28 14:05 ACPI: EC: Fix logspam in "GPE storm avoidance" code Alan Jenkins
@ 2008-10-28 18:25 ` Alexey Starikovskiy
  2008-10-28 20:18   ` Alan Jenkins
  2008-10-29  0:12   ` Henrique de Moraes Holschuh
  0 siblings, 2 replies; 10+ messages in thread
From: Alexey Starikovskiy @ 2008-10-28 18:25 UTC (permalink / raw)
  To: Alan Jenkins; +Cc: linux acpi

Alan Jenkins wrote:
> <http://bugzilla.kernel.org/show_bug.cgi?id=11841>
> "plenty of line "ACPI: EC: non-query interrupt received,
>  switching to interrupt mode" in dmesg"
Probably, it is better to make pr_debug().
> 
> GPE storm avoidance involves disabling the EC GPE during transactions.
> Unfortunately we forget to re-enable EC_FLAGS_GPE_MODE afterwards.
"We" don't forget it, "we" just use same mechanism as driver init. Thus if
for some reason interrupts do not start to arrive after we issue enable_gpe(),
we will not fall to timeouts. 
> 
> This happens to work because we re-enable it anyway when we detect
> the first confirmation interrupt of the next transaction.  However,
> this causes lots of bogus "switching to interrupt mode" messages.
they are not "bogus", it is really switching the mode...
> 
> Signed-off-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
> 
> diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
> index 7f0d81c..ba47c7c 100644
> --- a/drivers/acpi/ec.c
> +++ b/drivers/acpi/ec.c
> @@ -285,6 +285,7 @@ static int acpi_ec_transaction_unlocked(struct acpi_ec *ec, u8 command,
>  		/* check if we received SCI during transaction */
>  		ec_check_sci(ec, acpi_ec_read_status(ec));
>  		/* it is safe to enable GPE outside of transaction */
> +		set_bit(EC_FLAGS_GPE_MODE, &ec->flags);
>  		acpi_enable_gpe(NULL, ec->gpe, ACPI_NOT_ISR);
>  	} else if (test_bit(EC_FLAGS_GPE_MODE, &ec->flags) &&
>  		   t.irq_count > ACPI_EC_STORM_THRESHOLD) {
> 
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: ACPI: EC: Fix logspam in "GPE storm avoidance" code
  2008-10-28 18:25 ` Alexey Starikovskiy
@ 2008-10-28 20:18   ` Alan Jenkins
  2008-10-29  0:12   ` Henrique de Moraes Holschuh
  1 sibling, 0 replies; 10+ messages in thread
From: Alan Jenkins @ 2008-10-28 20:18 UTC (permalink / raw)
  To: Alexey Starikovskiy; +Cc: linux acpi

Alexey Starikovskiy wrote:
> Alan Jenkins wrote:
>> <http://bugzilla.kernel.org/show_bug.cgi?id=11841>
>> "plenty of line "ACPI: EC: non-query interrupt received,
>>  switching to interrupt mode" in dmesg"
> Probably, it is better to make pr_debug().

Yes, it seems useful to have when all the DEBUG output is enabled.

In that case, the initial message about starting in poll mode should 
also be switched to pr_debug().  Otherwise, the change will give the 
impression that the EC never switches to interrupt mode.

That just leaves the message "missing confirmations, switching to 
polling mode".  That should only happen once per boot, or after a 
suspend/resume cycle, right?  Because it sets NO_GPE, which inhibits 
GPE_MODE and is only reset on resume.  So that can be preserved, and 
it's probably a useful message to have.

I'll leave it to you to submit a more appropriate patch.

>>
>> GPE storm avoidance involves disabling the EC GPE during transactions.
>> Unfortunately we forget to re-enable EC_FLAGS_GPE_MODE afterwards.
> "We" don't forget it, "we" just use same mechanism as driver init. 
> Thus if
> for some reason interrupts do not start to arrive after we issue 
> enable_gpe(),
> we will not fall to timeouts.

Interesting.  I agree that sort of caution is meritted for the EC 
driver!  If that's by design, then great.

I will have a look at the code again for my own peace of mind; I thought 
I understood the GPE storm avoidance but now I'm a bit fuzzy on the 
details.  We apologize for the overuse of the third person.  After 
staring at the code for so long I have to identify with it to some 
extent, despite not having actually written any of it.

>>
>> This happens to work because we re-enable it anyway when we detect
>> the first confirmation interrupt of the next transaction.  However,
>> this causes lots of bogus "switching to interrupt mode" messages.
> they are not "bogus", it is really switching the mode...

Bad words.  It seems bogus because the message changes meaning - for me 
as _user_.  It used to be a gentle reassurance that my computer was 
booting or resuming it's way towards normal working state.  Now it has 
changed to a weaker meaning, because it can also mean that a new 
transaction started in GPE_STORM mode.

In the technical and literal sense the meaning is unchanged.  If the 
message is hidden under DEBUG, that is the only meaning that will matter :).

Thanks
Alan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: ACPI: EC: Fix logspam in "GPE storm avoidance" code
  2008-10-28 18:25 ` Alexey Starikovskiy
  2008-10-28 20:18   ` Alan Jenkins
@ 2008-10-29  0:12   ` Henrique de Moraes Holschuh
  2008-10-29  9:51     ` Alan Jenkins
  1 sibling, 1 reply; 10+ messages in thread
From: Henrique de Moraes Holschuh @ 2008-10-29  0:12 UTC (permalink / raw)
  To: Alexey Starikovskiy; +Cc: Alan Jenkins, linux acpi, Len Brown

On Tue, 28 Oct 2008, Alexey Starikovskiy wrote:
> Alan Jenkins wrote:
>> <http://bugzilla.kernel.org/show_bug.cgi?id=11841>
>> "plenty of line "ACPI: EC: non-query interrupt received,
>>  switching to interrupt mode" in dmesg"
> Probably, it is better to make pr_debug().

Please don't do that.  This code has had a lot of churn, and *regressions*
as of lately, and sometimes we only notice these because we see those
messages in the logs.  Moving them to pr_debug() pretty much makes them
utterly useless in a large number of the cases they could be of help.

Besides, I very much doubt we will stop seing EC interrupt crappage. Not
only our code is NOT good and resilient enough yet (if it were, there
wouldn't be so many patches flying about it), the vendors are obviously
getting this wrong at a fast rate.

We need those messages.  Rate-limit them, but don't hide them or move them
to pr_debug, please.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: ACPI: EC: Fix logspam in "GPE storm avoidance" code
  2008-10-29  0:12   ` Henrique de Moraes Holschuh
@ 2008-10-29  9:51     ` Alan Jenkins
  2008-10-29 15:08       ` Henrique de Moraes Holschuh
  0 siblings, 1 reply; 10+ messages in thread
From: Alan Jenkins @ 2008-10-29  9:51 UTC (permalink / raw)
  To: Henrique de Moraes Holschuh; +Cc: Alexey Starikovskiy, linux acpi, Len Brown

Henrique de Moraes Holschuh wrote:
> On Tue, 28 Oct 2008, Alexey Starikovskiy wrote:
>   
>> Alan Jenkins wrote:
>>     
>>> <http://bugzilla.kernel.org/show_bug.cgi?id=11841>
>>> "plenty of line "ACPI: EC: non-query interrupt received,
>>>  switching to interrupt mode" in dmesg"
>>>       
>> Probably, it is better to make pr_debug().
>>     
>
> Please don't do that.  This code has had a lot of churn, and *regressions*
> as of lately, and sometimes we only notice these because we see those
> messages in the logs.  Moving them to pr_debug() pretty much makes them
> utterly useless in a large number of the cases they could be of help.
>
> Besides, I very much doubt we will stop seing EC interrupt crappage. Not
> only our code is NOT good and resilient enough yet (if it were, there
> wouldn't be so many patches flying about it), the vendors are obviously
> getting this wrong at a fast rate.
>
> We need those messages.  Rate-limit them, but don't hide them or move them
> to pr_debug, please.
>   

Please have a look at the dmesg attached to the bug.  They're already
rate-limited.

When in GPE storm avoidance mode, they will trigger once for each
transaction.  Transactions happen frequently, and will happen
continually once e.g. gnome-power-manager is polling the battery level. 
In this special case, they're not a useful message to users or
blackbox-level testers; they are only useful as part of a full DEBUG
trace that actually shows the transactions.

My original patch suppresses the message in this particular case, but it
preserves it for the common non-storm case, where it may provide useful
information.  And the message would still happen once on boot, before
the GPE storm is detected.  Unfortunately my patch also makes the driver
a little less robust.  If the robustness issue can be addressed, do you
accept that it's a good idea to suppress the flood of duplicate messages
reported in this bug?

We already have... damn.  I think you missed a more important omission. 
There _used_ to be a message that says we've switched to storm avoidance
mode.  However, it was removed in the latest re-write.  This bug report
suggests that a) the cause would have been more obvious if we had the
GPE storm message, and b) the stormy case wasn't really tested so we
really do need a message, in case it goes wrong.

Thanks
Alan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: ACPI: EC: Fix logspam in "GPE storm avoidance" code
  2008-10-29  9:51     ` Alan Jenkins
@ 2008-10-29 15:08       ` Henrique de Moraes Holschuh
  2008-10-29 16:35         ` Alan Jenkins
  0 siblings, 1 reply; 10+ messages in thread
From: Henrique de Moraes Holschuh @ 2008-10-29 15:08 UTC (permalink / raw)
  To: Alan Jenkins; +Cc: Alexey Starikovskiy, linux acpi, Len Brown

On Wed, 29 Oct 2008, Alan Jenkins wrote:
> Henrique de Moraes Holschuh wrote:
> > On Tue, 28 Oct 2008, Alexey Starikovskiy wrote:
> >> Alan Jenkins wrote:
> >>> <http://bugzilla.kernel.org/show_bug.cgi?id=11841>
> >>> "plenty of line "ACPI: EC: non-query interrupt received,
> >>>  switching to interrupt mode" in dmesg"
> >>>       
> >> Probably, it is better to make pr_debug().
> >>     
> >
> > Please don't do that.  This code has had a lot of churn, and *regressions*
> > as of lately, and sometimes we only notice these because we see those
> > messages in the logs.  Moving them to pr_debug() pretty much makes them
> > utterly useless in a large number of the cases they could be of help.
> >
> > Besides, I very much doubt we will stop seing EC interrupt crappage. Not
> > only our code is NOT good and resilient enough yet (if it were, there
> > wouldn't be so many patches flying about it), the vendors are obviously
> > getting this wrong at a fast rate.
> >
> > We need those messages.  Rate-limit them, but don't hide them or move them
> > to pr_debug, please.
> 
> Please have a look at the dmesg attached to the bug.  They're already
> rate-limited.

If people are considering moving it to pr_debug, it is not rate-limited
enough (one per mode switch is not enough if the EC or the code is behaving
so bad that it switches modes too often)...

> When in GPE storm avoidance mode, they will trigger once for each
> transaction.  Transactions happen frequently, and will happen
> continually once e.g. gnome-power-manager is polling the battery level. 
> In this special case, they're not a useful message to users or
> blackbox-level testers; they are only useful as part of a full DEBUG
> trace that actually shows the transactions.

Well, if they move to pr_debug _only_ when in GPE storm avoidance mode, AND
we get the logging of entering AND exiting GPE storm avoidance modes, that
would be quite acceptable, I think.

> My original patch suppresses the message in this particular case, but it
> preserves it for the common non-storm case, where it may provide useful
> information.  And the message would still happen once on boot, before
> the GPE storm is detected.  Unfortunately my patch also makes the driver
> a little less robust.  If the robustness issue can be addressed, do you
> accept that it's a good idea to suppress the flood of duplicate messages
> reported in this bug?

As I said above, if you supress them ONLY during GPE storm avoidance, then
yes, I am quite OK with it.

But we really should have GPE storm avoidance events logged, if we don't do
that already.

> We already have... damn.  I think you missed a more important omission. 
> There _used_ to be a message that says we've switched to storm avoidance
> mode.  However, it was removed in the latest re-write.  This bug report
> suggests that a) the cause would have been more obvious if we had the
> GPE storm message, and b) the stormy case wasn't really tested so we
> really do need a message, in case it goes wrong.

Indeed, that's bad, and needs to be fixed IMHO.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: ACPI: EC: Fix logspam in "GPE storm avoidance" code
  2008-10-29 15:08       ` Henrique de Moraes Holschuh
@ 2008-10-29 16:35         ` Alan Jenkins
  2008-10-29 19:00           ` Alexey Starikovskiy
  2008-10-30  1:47           ` Zhao Yakui
  0 siblings, 2 replies; 10+ messages in thread
From: Alan Jenkins @ 2008-10-29 16:35 UTC (permalink / raw)
  To: Henrique de Moraes Holschuh; +Cc: Alexey Starikovskiy, linux acpi, Len Brown

Henrique de Moraes Holschuh wrote:
> On Wed, 29 Oct 2008, Alan Jenkins wrote:
>   
>> Henrique de Moraes Holschuh wrote:
>>     
>>> On Tue, 28 Oct 2008, Alexey Starikovskiy wrote:
>>>       
>>>> Alan Jenkins wrote:
>>>>         
>>>>> <http://bugzilla.kernel.org/show_bug.cgi?id=11841>
>>>>> "plenty of line "ACPI: EC: non-query interrupt received,
>>>>>  switching to interrupt mode" in dmesg"
>>>>>       
>>>>>           
>>>> Probably, it is better to make pr_debug().
>>>>     
>>>>         
>>> Please don't do that.  This code has had a lot of churn, and *regressions*
>>> as of lately, and sometimes we only notice these because we see those
>>> messages in the logs.  Moving them to pr_debug() pretty much makes them
>>> utterly useless in a large number of the cases they could be of help.
>>>
>>> Besides, I very much doubt we will stop seing EC interrupt crappage. Not
>>> only our code is NOT good and resilient enough yet (if it were, there
>>> wouldn't be so many patches flying about it), the vendors are obviously
>>> getting this wrong at a fast rate.
>>>
>>> We need those messages.  Rate-limit them, but don't hide them or move them
>>> to pr_debug, please.
>>>       
>> Please have a look at the dmesg attached to the bug.  They're already
>> rate-limited.
>>     
>
> If people are considering moving it to pr_debug, it is not rate-limited
> enough (one per mode switch is not enough if the EC or the code is behaving
> so bad that it switches modes too often)...
>
>   
>> When in GPE storm avoidance mode, they will trigger once for each
>> transaction.  Transactions happen frequently, and will happen
>> continually once e.g. gnome-power-manager is polling the battery level. 
>> In this special case, they're not a useful message to users or
>> blackbox-level testers; they are only useful as part of a full DEBUG
>> trace that actually shows the transactions.
>>     
>
> Well, if they move to pr_debug _only_ when in GPE storm avoidance mode, AND
> we get the logging of entering AND exiting GPE storm avoidance modes, that
> would be quite acceptable, I think.
>
>   
>> My original patch suppresses the message in this particular case, but it
>> preserves it for the common non-storm case, where it may provide useful
>> information.  And the message would still happen once on boot, before
>> the GPE storm is detected.  Unfortunately my patch also makes the driver
>> a little less robust.  If the robustness issue can be addressed, do you
>> accept that it's a good idea to suppress the flood of duplicate messages
>> reported in this bug?
>>     
>
> As I said above, if you supress them ONLY during GPE storm avoidance, then
> yes, I am quite OK with it.
>
> But we really should have GPE storm avoidance events logged, if we don't do
> that already.
>
>   
>> We already have... damn.  I think you missed a more important omission. 
>> There _used_ to be a message that says we've switched to storm avoidance
>> mode.  However, it was removed in the latest re-write.  This bug report
>> suggests that a) the cause would have been more obvious if we had the
>> GPE storm message, and b) the stormy case wasn't really tested so we
>> really do need a message, in case it goes wrong.
>>     
>
> Indeed, that's bad, and needs to be fixed IMHO.
>   

Alex, can we persuade you?  Here are the two changes in code form
(untested).

------------>

ACPI: EC: make kernel messages more useful when GPE storm is detected

Make sure we can tell if the GPE storm workaround gets activated,
and avoid flooding the logs afterwards.

diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index ef42316..139046c 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -286,7 +286,8 @@ static int acpi_ec_transaction_unlocked(struct acpi_ec *ec,
 		acpi_enable_gpe(NULL, ec->gpe, ACPI_NOT_ISR);
 	} else if (test_bit(EC_FLAGS_GPE_MODE, &ec->flags) &&
 		   t->irq_count > ACPI_EC_STORM_THRESHOLD) {
-		pr_debug(PREFIX "GPE storm detected\n");
+		pr_info(PREFIX "GPE storm detected, "
+			"transactions will use polling mode\n");
 		set_bit(EC_FLAGS_GPE_STORM, &ec->flags);
 	}
 	return ret;
@@ -566,9 +567,15 @@ static u32 acpi_ec_gpe_handler(void *data)
 	if (!test_bit(EC_FLAGS_GPE_MODE, &ec->flags) &&
 	    !test_bit(EC_FLAGS_NO_GPE, &ec->flags)) {
 		/* this is non-query, must be confirmation */
-		if (printk_ratelimit())
-			pr_info(PREFIX "non-query interrupt received,"
+		if (!test_bit(EC_FLAGS_GPE_STORM, &ec->flags)) {
+			if (printk_ratelimit())
+				pr_info(PREFIX "non-query interrupt received,"
+					" switching to interrupt mode\n");
+		} else {
+			/* hush, STORM switches the mode every transaction */
+			pr_debug(PREFIX "non-query interrupt received,"
 				" switching to interrupt mode\n");
+		}
 		set_bit(EC_FLAGS_GPE_MODE, &ec->flags);
 	}
 	return ACPI_INTERRUPT_HANDLED;




^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: ACPI: EC: Fix logspam in "GPE storm avoidance" code
  2008-10-29 16:35         ` Alan Jenkins
@ 2008-10-29 19:00           ` Alexey Starikovskiy
  2008-11-01 11:03             ` Alan Jenkins
  2008-10-30  1:47           ` Zhao Yakui
  1 sibling, 1 reply; 10+ messages in thread
From: Alexey Starikovskiy @ 2008-10-29 19:00 UTC (permalink / raw)
  To: Alan Jenkins; +Cc: Henrique de Moraes Holschuh, linux acpi, Len Brown

Alan Jenkins wrote:
> Henrique de Moraes Holschuh wrote:
>> On Wed, 29 Oct 2008, Alan Jenkins wrote:
>>   
>>> Henrique de Moraes Holschuh wrote:
>>>     
>>>> On Tue, 28 Oct 2008, Alexey Starikovskiy wrote:
>>>>       
>>>>> Alan Jenkins wrote:
>>>>>         
>>>>>> <http://bugzilla.kernel.org/show_bug.cgi?id=11841>
>>>>>> "plenty of line "ACPI: EC: non-query interrupt received,
>>>>>>  switching to interrupt mode" in dmesg"
>>>>>>       
>>>>>>           
>>>>> Probably, it is better to make pr_debug().
>>>>>     
>>>>>         
>>>> Please don't do that.  This code has had a lot of churn, and *regressions*
>>>> as of lately, and sometimes we only notice these because we see those
>>>> messages in the logs.  Moving them to pr_debug() pretty much makes them
>>>> utterly useless in a large number of the cases they could be of help.
>>>>
>>>> Besides, I very much doubt we will stop seing EC interrupt crappage. Not
>>>> only our code is NOT good and resilient enough yet (if it were, there
>>>> wouldn't be so many patches flying about it), the vendors are obviously
>>>> getting this wrong at a fast rate.
>>>>
>>>> We need those messages.  Rate-limit them, but don't hide them or move them
>>>> to pr_debug, please.
>>>>       
>>> Please have a look at the dmesg attached to the bug.  They're already
>>> rate-limited.
>>>     
>> If people are considering moving it to pr_debug, it is not rate-limited
>> enough (one per mode switch is not enough if the EC or the code is behaving
>> so bad that it switches modes too often)...
>>
>>   
>>> When in GPE storm avoidance mode, they will trigger once for each
>>> transaction.  Transactions happen frequently, and will happen
>>> continually once e.g. gnome-power-manager is polling the battery level. 
>>> In this special case, they're not a useful message to users or
>>> blackbox-level testers; they are only useful as part of a full DEBUG
>>> trace that actually shows the transactions.
>>>     
>> Well, if they move to pr_debug _only_ when in GPE storm avoidance mode, AND
>> we get the logging of entering AND exiting GPE storm avoidance modes, that
>> would be quite acceptable, I think.
>>
>>   
>>> My original patch suppresses the message in this particular case, but it
>>> preserves it for the common non-storm case, where it may provide useful
>>> information.  And the message would still happen once on boot, before
>>> the GPE storm is detected.  Unfortunately my patch also makes the driver
>>> a little less robust.  If the robustness issue can be addressed, do you
>>> accept that it's a good idea to suppress the flood of duplicate messages
>>> reported in this bug?
>>>     
>> As I said above, if you supress them ONLY during GPE storm avoidance, then
>> yes, I am quite OK with it.
>>
>> But we really should have GPE storm avoidance events logged, if we don't do
>> that already.
>>
>>   
>>> We already have... damn.  I think you missed a more important omission. 
>>> There _used_ to be a message that says we've switched to storm avoidance
>>> mode.  However, it was removed in the latest re-write.  This bug report
>>> suggests that a) the cause would have been more obvious if we had the
>>> GPE storm message, and b) the stormy case wasn't really tested so we
>>> really do need a message, in case it goes wrong.
>>>     
>> Indeed, that's bad, and needs to be fixed IMHO.
>>   
> 
> Alex, can we persuade you?  Here are the two changes in code form
> (untested).
Well, I was the one who inserted all this printouts in the first place...
Then every second reader of dmesg, and there are quite a lot of those starts to
open a bug report -- "my system prints bla-bla"...
So, if there is demand to have this printed -- there is no objection from me.
> 
> ------------>
> 
> ACPI: EC: make kernel messages more useful when GPE storm is detected
> 
> Make sure we can tell if the GPE storm workaround gets activated,
> and avoid flooding the logs afterwards.
> 
> diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
> index ef42316..139046c 100644
> --- a/drivers/acpi/ec.c
> +++ b/drivers/acpi/ec.c
> @@ -286,7 +286,8 @@ static int acpi_ec_transaction_unlocked(struct acpi_ec *ec,
>  		acpi_enable_gpe(NULL, ec->gpe, ACPI_NOT_ISR);
>  	} else if (test_bit(EC_FLAGS_GPE_MODE, &ec->flags) &&
>  		   t->irq_count > ACPI_EC_STORM_THRESHOLD) {
> -		pr_debug(PREFIX "GPE storm detected\n");
> +		pr_info(PREFIX "GPE storm detected, "
> +			"transactions will use polling mode\n");
>  		set_bit(EC_FLAGS_GPE_STORM, &ec->flags);
>  	}
>  	return ret;
> @@ -566,9 +567,15 @@ static u32 acpi_ec_gpe_handler(void *data)
>  	if (!test_bit(EC_FLAGS_GPE_MODE, &ec->flags) &&
>  	    !test_bit(EC_FLAGS_NO_GPE, &ec->flags)) {
>  		/* this is non-query, must be confirmation */
> -		if (printk_ratelimit())
> -			pr_info(PREFIX "non-query interrupt received,"
> +		if (!test_bit(EC_FLAGS_GPE_STORM, &ec->flags)) {
> +			if (printk_ratelimit())
> +				pr_info(PREFIX "non-query interrupt received,"
> +					" switching to interrupt mode\n");
> +		} else {
> +			/* hush, STORM switches the mode every transaction */
> +			pr_debug(PREFIX "non-query interrupt received,"
>  				" switching to interrupt mode\n");
> +		}
>  		set_bit(EC_FLAGS_GPE_MODE, &ec->flags);
>  	}
>  	return ACPI_INTERRUPT_HANDLED;
> 
> 
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: ACPI: EC: Fix logspam in "GPE storm avoidance" code
  2008-10-29 16:35         ` Alan Jenkins
  2008-10-29 19:00           ` Alexey Starikovskiy
@ 2008-10-30  1:47           ` Zhao Yakui
  1 sibling, 0 replies; 10+ messages in thread
From: Zhao Yakui @ 2008-10-30  1:47 UTC (permalink / raw)
  To: Alan Jenkins
  Cc: Henrique de Moraes Holschuh, Alexey Starikovskiy, linux acpi,
	Len Brown

On Thu, 2008-10-30 at 00:35 +0800, Alan Jenkins wrote:
> Henrique de Moraes Holschuh wrote:
> > On Wed, 29 Oct 2008, Alan Jenkins wrote:
> >   
> >> Henrique de Moraes Holschuh wrote:
> >>     
> >>> On Tue, 28 Oct 2008, Alexey Starikovskiy wrote:
> >>>       
> >>>> Alan Jenkins wrote:
> >>>>         
> >>>>> <http://bugzilla.kernel.org/show_bug.cgi?id=11841>
> >>>>> "plenty of line "ACPI: EC: non-query interrupt received,
> >>>>>  switching to interrupt mode" in dmesg"
> >>>>>       
> >>>>>           
> >>>> Probably, it is better to make pr_debug().
> >>>>     
> >>>>         
> >>> Please don't do that.  This code has had a lot of churn, and *regressions*
> >>> as of lately, and sometimes we only notice these because we see those
> >>> messages in the logs.  Moving them to pr_debug() pretty much makes them
> >>> utterly useless in a large number of the cases they could be of help.
> >>>
> >>> Besides, I very much doubt we will stop seing EC interrupt crappage. Not
> >>> only our code is NOT good and resilient enough yet (if it were, there
> >>> wouldn't be so many patches flying about it), the vendors are obviously
> >>> getting this wrong at a fast rate.
> >>>
> >>> We need those messages.  Rate-limit them, but don't hide them or move them
> >>> to pr_debug, please.
> >>>       
> >> Please have a look at the dmesg attached to the bug.  They're already
> >> rate-limited.
> >>     
> >
> > If people are considering moving it to pr_debug, it is not rate-limited
> > enough (one per mode switch is not enough if the EC or the code is behaving
> > so bad that it switches modes too often)...
> >
> >   
> >> When in GPE storm avoidance mode, they will trigger once for each
> >> transaction.  Transactions happen frequently, and will happen
> >> continually once e.g. gnome-power-manager is polling the battery level. 
> >> In this special case, they're not a useful message to users or
> >> blackbox-level testers; they are only useful as part of a full DEBUG
> >> trace that actually shows the transactions.
> >>     
> >
> > Well, if they move to pr_debug _only_ when in GPE storm avoidance mode, AND
> > we get the logging of entering AND exiting GPE storm avoidance modes, that
> > would be quite acceptable, I think.
> >
> >   
> >> My original patch suppresses the message in this particular case, but it
> >> preserves it for the common non-storm case, where it may provide useful
> >> information.  And the message would still happen once on boot, before
> >> the GPE storm is detected.  Unfortunately my patch also makes the driver
> >> a little less robust.  If the robustness issue can be addressed, do you
> >> accept that it's a good idea to suppress the flood of duplicate messages
> >> reported in this bug?
> >>     
> >
> > As I said above, if you supress them ONLY during GPE storm avoidance, then
> > yes, I am quite OK with it.
> >
> > But we really should have GPE storm avoidance events logged, if we don't do
> > that already.
> >
> >   
> >> We already have... damn.  I think you missed a more important omission. 
> >> There _used_ to be a message that says we've switched to storm avoidance
> >> mode.  However, it was removed in the latest re-write.  This bug report
> >> suggests that a) the cause would have been more obvious if we had the
> >> GPE storm message, and b) the stormy case wasn't really tested so we
> >> really do need a message, in case it goes wrong.
> >>     
> >
> > Indeed, that's bad, and needs to be fixed IMHO.
> >   
> 
> Alex, can we persuade you?  Here are the two changes in code form
> (untested).
> 
> ------------>
> 
> ACPI: EC: make kernel messages more useful when GPE storm is detected
> 
> Make sure we can tell if the GPE storm workaround gets activated,
> and avoid flooding the logs afterwards.
> 
> diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
> index ef42316..139046c 100644
> --- a/drivers/acpi/ec.c
> +++ b/drivers/acpi/ec.c
> @@ -286,7 +286,8 @@ static int acpi_ec_transaction_unlocked(struct acpi_ec *ec,
>  		acpi_enable_gpe(NULL, ec->gpe, ACPI_NOT_ISR);
>  	} else if (test_bit(EC_FLAGS_GPE_MODE, &ec->flags) &&
>  		   t->irq_count > ACPI_EC_STORM_THRESHOLD) {
> -		pr_debug(PREFIX "GPE storm detected\n");
this is very good. It is helpful to analyze the root cause.
> +		pr_info(PREFIX "GPE storm detected, "
> +			"transactions will use polling mode\n");
>  		set_bit(EC_FLAGS_GPE_STORM, &ec->flags);
>  	}
>  	return ret;
> @@ -566,9 +567,15 @@ static u32 acpi_ec_gpe_handler(void *data)
>  	if (!test_bit(EC_FLAGS_GPE_MODE, &ec->flags) &&
>  	    !test_bit(EC_FLAGS_NO_GPE, &ec->flags)) {
>  		/* this is non-query, must be confirmation */
> -		if (printk_ratelimit())
> -			pr_info(PREFIX "non-query interrupt received,"
After the following is added, the  plenty of "non-query interrupt
received" will disappear. But if the DEBUG is enabled in EC driver, the
problem still exists. In fact when EC GPE storm is detected, the EC GPE
is disabled while in EC transaction. 
    Maybe it is appropriate that the NO_GPE bit is set when GPE storm is
detected. If so, it won't be switched to interrupt mode again in the EC
GPE handler.
> +		if (!test_bit(EC_FLAGS_GPE_STORM, &ec->flags)) {
> +			if (printk_ratelimit())
> +				pr_info(PREFIX "non-query interrupt received,"
> +					" switching to interrupt mode\n");
> +		} else {
> +			/* hush, STORM switches the mode every transaction */
> +			pr_debug(PREFIX "non-query interrupt received,"
>  				" switching to interrupt mode\n");
> +		}
>  		set_bit(EC_FLAGS_GPE_MODE, &ec->flags);
>  	}
>  	return ACPI_INTERRUPT_HANDLED;
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: ACPI: EC: Fix logspam in "GPE storm avoidance" code
  2008-10-29 19:00           ` Alexey Starikovskiy
@ 2008-11-01 11:03             ` Alan Jenkins
  0 siblings, 0 replies; 10+ messages in thread
From: Alan Jenkins @ 2008-11-01 11:03 UTC (permalink / raw)
  To: Alexey Starikovskiy; +Cc: Henrique de Moraes Holschuh, linux acpi, Len Brown

Alexey Starikovskiy wrote:
> Alan Jenkins wrote:
>> Henrique de Moraes Holschuh wrote:
>>> On Wed, 29 Oct 2008, Alan Jenkins wrote:
>>>  
>>>> Henrique de Moraes Holschuh wrote:
>>>>    
>>>>> On Tue, 28 Oct 2008, Alexey Starikovskiy wrote:
>>>>>      
>>>>>> Alan Jenkins wrote:
>>>>>>        
>>>>>>> <http://bugzilla.kernel.org/show_bug.cgi?id=11841>
>>>>>>> "plenty of line "ACPI: EC: non-query interrupt received,
>>>>>>>  switching to interrupt mode" in dmesg"
>>>>>>>                 
>>>>>> Probably, it is better to make pr_debug().
>>>>>>             
>>>>> Please don't do that.  This code has had a lot of churn, and
>>>>> *regressions*
>>>>> as of lately, and sometimes we only notice these because we see those
>>>>> messages in the logs.  Moving them to pr_debug() pretty much makes
>>>>> them
>>>>> utterly useless in a large number of the cases they could be of help.
>>>>>
>>>>> Besides, I very much doubt we will stop seing EC interrupt
>>>>> crappage. Not
>>>>> only our code is NOT good and resilient enough yet (if it were, there
>>>>> wouldn't be so many patches flying about it), the vendors are
>>>>> obviously
>>>>> getting this wrong at a fast rate.
>>>>>
>>>>> We need those messages.  Rate-limit them, but don't hide them or
>>>>> move them
>>>>> to pr_debug, please.
>>>>>       
>>>> Please have a look at the dmesg attached to the bug.  They're already
>>>> rate-limited.
>>>>     
>>> If people are considering moving it to pr_debug, it is not rate-limited
>>> enough (one per mode switch is not enough if the EC or the code is
>>> behaving
>>> so bad that it switches modes too often)...
>>>
>>>  
>>>> When in GPE storm avoidance mode, they will trigger once for each
>>>> transaction.  Transactions happen frequently, and will happen
>>>> continually once e.g. gnome-power-manager is polling the battery
>>>> level. In this special case, they're not a useful message to users or
>>>> blackbox-level testers; they are only useful as part of a full DEBUG
>>>> trace that actually shows the transactions.
>>>>     
>>> Well, if they move to pr_debug _only_ when in GPE storm avoidance
>>> mode, AND
>>> we get the logging of entering AND exiting GPE storm avoidance
>>> modes, that
>>> would be quite acceptable, I think.
>>>
>>>  
>>>> My original patch suppresses the message in this particular case,
>>>> but it
>>>> preserves it for the common non-storm case, where it may provide
>>>> useful
>>>> information.  And the message would still happen once on boot, before
>>>> the GPE storm is detected.  Unfortunately my patch also makes the
>>>> driver
>>>> a little less robust.  If the robustness issue can be addressed, do
>>>> you
>>>> accept that it's a good idea to suppress the flood of duplicate
>>>> messages
>>>> reported in this bug?
>>>>     
>>> As I said above, if you supress them ONLY during GPE storm
>>> avoidance, then
>>> yes, I am quite OK with it.
>>>
>>> But we really should have GPE storm avoidance events logged, if we
>>> don't do
>>> that already.
>>>
>>>  
>>>> We already have... damn.  I think you missed a more important
>>>> omission. There _used_ to be a message that says we've switched to
>>>> storm avoidance
>>>> mode.  However, it was removed in the latest re-write.  This bug
>>>> report
>>>> suggests that a) the cause would have been more obvious if we had the
>>>> GPE storm message, and b) the stormy case wasn't really tested so we
>>>> really do need a message, in case it goes wrong.
>>>>     
>>> Indeed, that's bad, and needs to be fixed IMHO.
>>>   
>>
>> Alex, can we persuade you?  Here are the two changes in code form
>> (untested).
> Well, I was the one who inserted all this printouts in the first place...
> Then every second reader of dmesg, and there are quite a lot of those
> starts to
> open a bug report -- "my system prints bla-bla"...
> So, if there is demand to have this printed -- there is no objection
> from me.

To be fair, some of those reports were from people whose hardware used
to work, and was then degraded by the old polling code.  I.e. laggy /
jerky brightness hotkeys, which are rather noticable, especially with
auto-repeat.  It wasn't the _message_ that they were complaining about :).

Sorry for the delay, I had to test the code first!  I set
ACPI_EC_STORM_THRESHOLD to zero to get the storm avoidance to trigger on
my hardware.  I had to apply on top of the acpi-test tree, i.e. the
msleep()->udelay() patch, otherwise the storm avoidance kills the EC. 
After that, I could see the flood of messages.  My patch got rid of the
message flood, as expected.

Alan

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2008-11-01 11:04 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-10-28 14:05 ACPI: EC: Fix logspam in "GPE storm avoidance" code Alan Jenkins
2008-10-28 18:25 ` Alexey Starikovskiy
2008-10-28 20:18   ` Alan Jenkins
2008-10-29  0:12   ` Henrique de Moraes Holschuh
2008-10-29  9:51     ` Alan Jenkins
2008-10-29 15:08       ` Henrique de Moraes Holschuh
2008-10-29 16:35         ` Alan Jenkins
2008-10-29 19:00           ` Alexey Starikovskiy
2008-11-01 11:03             ` Alan Jenkins
2008-10-30  1:47           ` Zhao Yakui

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox