[RFC] [Patch 0/4] ACPI : several patches for EC

linux-acpi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [RFC] [Patch 0/4]  ACPI : several patches for EC
@ 2008-09-25  3:53 Zhao Yakui
  2008-09-25  5:25 ` Alexey Starikovskiy
  0 siblings, 1 reply; 9+ messages in thread
From: Zhao Yakui @ 2008-09-25  3:53 UTC (permalink / raw)
  To: linux-acpi; +Cc: lenb

Hi, All
 
    The following are several patches for EC driver.
    
     Patch 1/4: Don't issue the burst disable command if EC exits the
burst mode
     It is meaningless that OS continues to issue the burst disable
command if the EC already exists the burst mode. So if EC already exists
the burst mode, don't issue burst disable command again.
     If OS continues to issue the burst disable command, there exist
some meaningless ACPI interrupts.
   
    Patch 2/4: Simplify EC working flowchart and always enable EC GPE 
    In current kernel the EC driver will be started in polling mode. When the EC
GPE interrupt is triggered, it will be switched to EC GPE interrupt mode.
And when there is no EC GPE confirmation for some EC transaction on some broken
laptops,the EC driver will be switched to polling mode and EC GPE will be
disabled. In such case the EC notification event will be lost.
        For example: Bug 11428/8459
    http://bugzilla.kernel.org/show_bug.cgi?id=11428

    But if the EC GPE is still enable although switching to polling mode, the EC
notification event won't be lost.
    
    In the patch the EC default work mode will be interrupt-driven while EC driver
is started. Only when there is no EC GPE interrupt for some EC transactions, it 
will be switched to polling mode. But EC GPE is still enabled.

    Patch 3/4: Add some delay in EC GPE handler to workaround EC GPE storm
    On some broken BIOS there exists the EC GPE interrupt storm. It means that there exist
too many ACPI interrupts related with EC GPE.
    If the EC GPE is still enabled, it will cause that keystrokes will be lost and 
the system can't work well. For example: 
    http://bugzilla.kernel.org/show_bug.cgi?id=9998

	If the EC GPE is disabled after EC GPE storm happens, some system will be unstable.
     http://bugzilla.kernel.org/show_bug.cgi?id=10724

     And on some laptops with "incorrect EC status before EC GPE arrives" OS will
report the incorrect thermal temperature if EC GPE is disabled while EC GPE storm happens.
    http://bugzilla.kernel.org/show_bug.cgi?id=11309

    This patch is to add some delay in EC GPE handler on some broken
BIOS.Only when more than five interrupts happen in the same jiffies and
EC status are the same, OS will add some delay in the EC GPE handler. If
the same issue still happens after adding delay,the delay time will be
increased.But the max delay time is ten microseconds.
   In such case the EC GPE is still enabled although EC GPE storm happens. And EC driver
will continue to work in interrupt-driven mode. But the number of EC GPE interrupts can
be reduced very significantly.

   I know that this method is ugly. Some delay is added in the EC GPE handler(interrupt context).
But no delay is added on the normal laptops. It only workarounds the broken laptops.
    
    Patch 4/4: Cleanup some useless code in EC driver
    If the above three patches are applied, there exist some useless code. Cleanup it.
    

Thanks for the comments.
  Yakui



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] [Patch 0/4]  ACPI : several patches for EC
  2008-09-25  3:53 [RFC] [Patch 0/4] ACPI : several patches for EC Zhao Yakui
@ 2008-09-25  5:25 ` Alexey Starikovskiy
  2008-09-25  6:41   ` Zhao Yakui
  0 siblings, 1 reply; 9+ messages in thread
From: Alexey Starikovskiy @ 2008-09-25  5:25 UTC (permalink / raw)
  To: Zhao Yakui; +Cc: linux-acpi, lenb

Hi Yakui,

As a current maintainer of EC driver I NAK the whole series,
as conflicting with the patch proposed by me.

Regards,
Alex.

Zhao Yakui wrote:
> Hi, All
>  
>     The following are several patches for EC driver.
>     
>      Patch 1/4: Don't issue the burst disable command if EC exits the
> burst mode
>      It is meaningless that OS continues to issue the burst disable
> command if the EC already exists the burst mode. So if EC already exists
> the burst mode, don't issue burst disable command again.
>      If OS continues to issue the burst disable command, there exist
> some meaningless ACPI interrupts.
>    
>     Patch 2/4: Simplify EC working flowchart and always enable EC GPE 
>     In current kernel the EC driver will be started in polling mode. When the EC
> GPE interrupt is triggered, it will be switched to EC GPE interrupt mode.
> And when there is no EC GPE confirmation for some EC transaction on some broken
> laptops,the EC driver will be switched to polling mode and EC GPE will be
> disabled. In such case the EC notification event will be lost.
>         For example: Bug 11428/8459
>     http://bugzilla.kernel.org/show_bug.cgi?id=11428
>
>     But if the EC GPE is still enable although switching to polling mode, the EC
> notification event won't be lost.
>     
>     In the patch the EC default work mode will be interrupt-driven while EC driver
> is started. Only when there is no EC GPE interrupt for some EC transactions, it 
> will be switched to polling mode. But EC GPE is still enabled.
>
>     Patch 3/4: Add some delay in EC GPE handler to workaround EC GPE storm
>     On some broken BIOS there exists the EC GPE interrupt storm. It means that there exist
> too many ACPI interrupts related with EC GPE.
>     If the EC GPE is still enabled, it will cause that keystrokes will be lost and 
> the system can't work well. For example: 
>     http://bugzilla.kernel.org/show_bug.cgi?id=9998
>
> 	If the EC GPE is disabled after EC GPE storm happens, some system will be unstable.
>      http://bugzilla.kernel.org/show_bug.cgi?id=10724
>
>      And on some laptops with "incorrect EC status before EC GPE arrives" OS will
> report the incorrect thermal temperature if EC GPE is disabled while EC GPE storm happens.
>     http://bugzilla.kernel.org/show_bug.cgi?id=11309
>
>     This patch is to add some delay in EC GPE handler on some broken
> BIOS.Only when more than five interrupts happen in the same jiffies and
> EC status are the same, OS will add some delay in the EC GPE handler. If
> the same issue still happens after adding delay,the delay time will be
> increased.But the max delay time is ten microseconds.
>    In such case the EC GPE is still enabled although EC GPE storm happens. And EC driver
> will continue to work in interrupt-driven mode. But the number of EC GPE interrupts can
> be reduced very significantly.
>
>    I know that this method is ugly. Some delay is added in the EC GPE handler(interrupt context).
> But no delay is added on the normal laptops. It only workarounds the broken laptops.
>     
>     Patch 4/4: Cleanup some useless code in EC driver
>     If the above three patches are applied, there exist some useless code. Cleanup it.
>     
>
> Thanks for the comments.
>   Yakui
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>   


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] [Patch 0/4]  ACPI : several patches for EC
  2008-09-25  5:25 ` Alexey Starikovskiy
@ 2008-09-25  6:41   ` Zhao Yakui
  2008-09-25  8:31     ` Alexey Starikovskiy
  0 siblings, 1 reply; 9+ messages in thread
From: Zhao Yakui @ 2008-09-25  6:41 UTC (permalink / raw)
  To: Alexey Starikovskiy; +Cc: linux-acpi, lenb

On Thu, 2008-09-25 at 09:25 +0400, Alexey Starikovskiy wrote:
> Hi Yakui,
> 
> As a current maintainer of EC driver I NAK the whole series,
> as conflicting with the patch proposed by me.
Please NAK the whole series if my patch is not reasonable or will cause
some regression. If so, please give some explanation.
Otherwise it is unconvincing if you NAK my patch only because it is
conflicted with your patch.

Now the patch proposed by your patch is already reverted in acpi_test
tree. There are a lot of errors in it.

At the same time I raise two issues about your proposed patch. But there
is no explanation.
   a. Bogus timeout
   b. How to deal with the laptop with "incorrect EC status before EC
GPE arrives". For example: bug 11309 (GPE storm happens and OS will
report the incorrect temperature while EC GPE is disabled.)
      Maybe the laptop of bug 8110 is broken again by your proposed
patch.

Best regards.
   Yakui
> 
> Regards,
> Alex.
> 
> Zhao Yakui wrote:
> > Hi, All
> >  
> >     The following are several patches for EC driver.
> >     
> >      Patch 1/4: Don't issue the burst disable command if EC exits the
> > burst mode
> >      It is meaningless that OS continues to issue the burst disable
> > command if the EC already exists the burst mode. So if EC already exists
> > the burst mode, don't issue burst disable command again.
> >      If OS continues to issue the burst disable command, there exist
> > some meaningless ACPI interrupts.
> >    
> >     Patch 2/4: Simplify EC working flowchart and always enable EC GPE 
> >     In current kernel the EC driver will be started in polling mode. When the EC
> > GPE interrupt is triggered, it will be switched to EC GPE interrupt mode.
> > And when there is no EC GPE confirmation for some EC transaction on some broken
> > laptops,the EC driver will be switched to polling mode and EC GPE will be
> > disabled. In such case the EC notification event will be lost.
> >         For example: Bug 11428/8459
> >     http://bugzilla.kernel.org/show_bug.cgi?id=11428
> >
> >     But if the EC GPE is still enable although switching to polling mode, the EC
> > notification event won't be lost.
> >     
> >     In the patch the EC default work mode will be interrupt-driven while EC driver
> > is started. Only when there is no EC GPE interrupt for some EC transactions, it 
> > will be switched to polling mode. But EC GPE is still enabled.
> >
> >     Patch 3/4: Add some delay in EC GPE handler to workaround EC GPE storm
> >     On some broken BIOS there exists the EC GPE interrupt storm. It means that there exist
> > too many ACPI interrupts related with EC GPE.
> >     If the EC GPE is still enabled, it will cause that keystrokes will be lost and 
> > the system can't work well. For example: 
> >     http://bugzilla.kernel.org/show_bug.cgi?id=9998
> >
> > 	If the EC GPE is disabled after EC GPE storm happens, some system will be unstable.
> >      http://bugzilla.kernel.org/show_bug.cgi?id=10724
> >
> >      And on some laptops with "incorrect EC status before EC GPE arrives" OS will
> > report the incorrect thermal temperature if EC GPE is disabled while EC GPE storm happens.
> >     http://bugzilla.kernel.org/show_bug.cgi?id=11309
> >
> >     This patch is to add some delay in EC GPE handler on some broken
> > BIOS.Only when more than five interrupts happen in the same jiffies and
> > EC status are the same, OS will add some delay in the EC GPE handler. If
> > the same issue still happens after adding delay,the delay time will be
> > increased.But the max delay time is ten microseconds.
> >    In such case the EC GPE is still enabled although EC GPE storm happens. And EC driver
> > will continue to work in interrupt-driven mode. But the number of EC GPE interrupts can
> > be reduced very significantly.
> >
> >    I know that this method is ugly. Some delay is added in the EC GPE handler(interrupt context).
> > But no delay is added on the normal laptops. It only workarounds the broken laptops.
> >     
> >     Patch 4/4: Cleanup some useless code in EC driver
> >     If the above three patches are applied, there exist some useless code. Cleanup it.
> >     
> >
> > Thanks for the comments.
> >   Yakui
> >
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >   
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] [Patch 0/4]  ACPI : several patches for EC
  2008-09-25  6:41   ` Zhao Yakui
@ 2008-09-25  8:31     ` Alexey Starikovskiy
  2008-09-25 10:05       ` Zhao Yakui
  0 siblings, 1 reply; 9+ messages in thread
From: Alexey Starikovskiy @ 2008-09-25  8:31 UTC (permalink / raw)
  To: Zhao Yakui; +Cc: linux-acpi, lenb

Zhao Yakui wrote:
> On Thu, 2008-09-25 at 09:25 +0400, Alexey Starikovskiy wrote:
>   
>> Hi Yakui,
>>
>> As a current maintainer of EC driver I NAK the whole series,
>> as conflicting with the patch proposed by me.
>>     
> Please NAK the whole series if my patch is not reasonable or will cause
> some regression. If so, please give some explanation.
>   
I gave you explanations for two weeks already. Enough is enough.
> Otherwise it is unconvincing if you NAK my patch only because it is
> conflicted with your patch.
>
> Now the patch proposed by your patch is already reverted in acpi_test
> tree. There are a lot of errors in it.
>   
This is a lie.
> At the same time I raise two issues about your proposed patch. But there
> is no explanation.
>    a. Bogus timeout
>   
You are referring to bug report, there system was stalling to 120 seconds
due to broken HPET.  While there is no need to try to workaround such 
things,
changing to udelay() from msleep() in poll mode will fix it.
>    b. How to deal with the laptop with "incorrect EC status before EC
> GPE arrives". For example: bug 11309 (GPE storm happens and OS will
> report the incorrect temperature while EC GPE is disabled.)
>   
This is _your guess_. None of your patches was reported to fix a situation,
and submitter is not able to compile kernel.
>       Maybe the laptop of bug 8110 is broken again by your proposed
> patch.
>   
It is not, and I explained to you why.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] [Patch 0/4]  ACPI : several patches for EC
  2008-09-25  8:31     ` Alexey Starikovskiy
@ 2008-09-25 10:05       ` Zhao Yakui
  2008-09-25 11:05         ` Alexey Starikovskiy
  0 siblings, 1 reply; 9+ messages in thread
From: Zhao Yakui @ 2008-09-25 10:05 UTC (permalink / raw)
  To: Alexey Starikovskiy; +Cc: linux-acpi, lenb

On Thu, 2008-09-25 at 12:31 +0400, Alexey Starikovskiy wrote:
> Zhao Yakui wrote:
> >   
> I gave you explanations for two weeks already. Enough is enough.
Please send you explanation to Linux ACPI mailing list. OK?
    If my patch has the possibility to break some laptops or cause some
regression, please also give some bug number.
> > Otherwise it is unconvincing if you NAK my patch only because it is
> > conflicted with your patch.
> >
> > Now the patch proposed by your patch is already reverted in acpi_test
> > tree. There are a lot of errors in it.
On 9/4 you sent the patch to Andi. And there are a lot of errors.
   a. t->command is always zero.
   b. GPE_STORM mode is always set.
   c. There is no synchronization while t is accessed in different
routines.

   Of course the above three errors are fixed in the updated patch of
    http://bugzilla.kernel.org/show_bug.cgi?id=11549#C3.
   
   But I think that the spin_lock is overkill in the updated patch.
   Assuming that 1000 EC transactions are done per second, the CPU
interrupt is disabled for 1ms. It is important that the normal laptops
will be affected by this.
   
   At the same time the following source code looks so ugly.
>  The local variable in function is assigned to the global pointer variable.
>      >struct transaction_data t = {.wdata = wdata, .rdata = rdata,
>                                      .wlen = wdata_len, .rlen = rdata_len};
>       >ec->t = &t;
> 
    And this will be accessed by interrupt context. 
   
   Assuming that an application tries to access the battery info through the /proc 
interface,maybe there will exist kernel panic if the application is killed before it returns normally.
(The battery info is related with EC.)
In normal conditions it will be OK. But in some specific cases , maybe there exists the problem.

> This is a lie.
> > At the same time I raise two issues about your proposed patch. But there
> > is no explanation.
> >    a. Bogus timeout>   
> You are referring to bug report, there system was stalling to 120 seconds
> due to broken HPET.  While there is no need to try to workaround such 
> things,
> changing to udelay() from msleep() in poll mode will fix it.
Please look at the following source to see whether the bogus timeout
happens.
   >spin_unlock_irqrestore(&ec->spinlock, tmp);
   >	/* if we selected poll mode or failed in GPE-mode do a poll loop */  
   >	if (force_poll ||
   >	    !test_bit(EC_FLAGS_GPE_MODE, &ec->flags) ||
   >	    acpi_ec_wait(ec))
   >		ret = ec_poll(ec);
		
	When EC GPE storm happens, EC driver will work in polling mode. And it will
call the function of ec_poll after issuing the EC command.
   static int ec_poll(struct acpi_ec *ec)
{
	unsigned long delay = jiffies + msecs_to_jiffies(ACPI_EC_DELAY);
        //If preempt schedule happens here, the timeout happens after it is rescheduled.
        // In such case how to issue the following EC command sequence?
	while (time_before(jiffies, delay)) {
		gpe_transaction(ec, acpi_ec_read_status(ec));
		/* do a shortest sleep */
		msleep(1);
		if (ec_transaction_done(ec))
 			return 0;
	        //If the preempt schedule happens here, maybe the timeout happens when it is rescheduled. 
                // And the EC transaction is not finished. How to explain it?
 	}
-	pr_err(PREFIX "acpi_ec_wait timeout, status = 0x%2.2x, event = %s\n",
-		acpi_ec_read_status(ec),
-		(event == ACPI_EC_EVENT_OBF_1) ? "\"b0=1\"" : "\"b1=0\"");
 	return -ETIME;
 }
       I don't care whether the above bogus timeout often happens.  What I cared is
how to explain it or process it. 
       Regarded it as Timeout or ignore it? Unreasonable.       

> >    b. How to deal with the laptop with "incorrect EC status before EC
> > GPE arrives". For example: bug 11309 (GPE storm happens and OS will
> > report the incorrect temperature while EC GPE is disabled.)
> This is _your guess_. None of your patches was reported to fix a situation,
> and submitter is not able to compile kernel.
The bug reporter doesn't give response. But please pay attention to this
is a regression.
    The detect of EC GPE storm is not shipped in 2.6.24.x kernel.
    The detect of EC GPE storm is shipped in 2.6.26.x kernel.
    On the 2.6.24.x kernel there is no detection of EC GPE storm and EC
GPE is enabled. In such case the temperature is reported correctly.   
    And after EC GPE storm happens , sometimes it will report the
incorrect thermal zone temperature, which causes that the system will be
shutdown. (In such case EC GPE is disabled and EC will work in polling
mode).
    How to explain it? The possible cause is that EC status is incorrect
before EC GPE arrives.

The laptop of bug 8110 is fixed by the following commit.
    >commit 9e197219605513c14d3eae41039ecf1b82d1920d
     > Author: Alexey Starikovskiy <alexey.y.starikovskiy@intel.com>
     > Date:   Wed Mar 7 18:29:35 2007 -0500
        >ACPI: ec: fix race in status register access
   
   If the EC GPE storm happens on this laptop, I believe that this
laptop will be broken by your proposed patch again.
   
> It is not, and I explained to you why.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] [Patch 0/4]  ACPI : several patches for EC
  2008-09-25 10:05       ` Zhao Yakui
@ 2008-09-25 11:05         ` Alexey Starikovskiy
  2008-09-25 11:22           ` Alexey Starikovskiy
  2008-09-26  1:47           ` Zhao Yakui
  0 siblings, 2 replies; 9+ messages in thread
From: Alexey Starikovskiy @ 2008-09-25 11:05 UTC (permalink / raw)
  To: Zhao Yakui; +Cc: linux-acpi, lenb

Zhao Yakui wrote:
> But I think that the spin_lock is overkill in the updated patch.
> Assuming that 1000 EC transactions are done per second, the CPU
> interrupt is disabled for 1ms. It is important that the normal laptops
> will be affected by this.
>   
How do you arrive with these numbers? Where do you get this 1ms?
Spinlock is around single inb/outb instruction plus several even simpler
instructions. Do you claim it is going to take 1us? Do you claim that it 
will
add anything to interrupts-disabled time of ACPI SCI interrupt handler
itself?
>    
>    At the same time the following source code looks so ugly.
>   
It is a matter of taste and, probably, experience.

 

> Please look at the following source to see whether the bogus timeout
> happens.
>    >spin_unlock_irqrestore(&ec->spinlock, tmp);
>    >	/* if we selected poll mode or failed in GPE-mode do a poll loop */  
>    >	if (force_poll ||
>    >	    !test_bit(EC_FLAGS_GPE_MODE, &ec->flags) ||
>    >	    acpi_ec_wait(ec))
>    >		ret = ec_poll(ec);
> 		
> 	When EC GPE storm happens, EC driver will work in polling mode. And it will
> call the function of ec_poll after issuing the EC command.
>    static int ec_poll(struct acpi_ec *ec)
> {
> 	unsigned long delay = jiffies + msecs_to_jiffies(ACPI_EC_DELAY);
>         //If preempt schedule happens here, the timeout happens after it is rescheduled.
>         // In such case how to issue the following EC command sequence?
>   
> 	while (time_before(jiffies, delay)) {
> 		gpe_transaction(ec, acpi_ec_read_status(ec));
> 		/* do a shortest sleep */
> 		msleep(1);
> 		if (ec_transaction_done(ec))
>  			return 0;
> 	        //If the preempt schedule happens here, maybe the timeout happens when it is rescheduled. 
>                 // And the EC transaction is not finished. How to explain it?
>  	}
> -	pr_err(PREFIX "acpi_ec_wait timeout, status = 0x%2.2x, event = %s\n",
> -		acpi_ec_read_status(ec),
> -		(event == ACPI_EC_EVENT_OBF_1) ? "\"b0=1\"" : "\"b1=0\"");
>  	return -ETIME;
>  }
>        I don't care whether the above bogus timeout often happens.  What I cared is
> how to explain it or process it. 
>        Regarded it as Timeout or ignore it? Unreasonable.       
>
>   
You seem to agree, that with working scheduler it is not possible that 
this code will not be
executed for half a second, right? Then, with broken scheduler (the one 
with 120s sleeps),
you arrive at this code 120 seconds late and still want to continue 
transaction?
500msec here is _cut off_ after which we don't care about this 
transaction. In normal
conditions transaction should not _ever_ come close to 1/10th of this value.
>>>    b. How to deal with the laptop with "incorrect EC status before EC
>>> GPE arrives". For example: bug 11309 (GPE storm happens and OS will
>>> report the incorrect temperature while EC GPE is disabled.)
>>>       
>> This is _your guess_. None of your patches was reported to fix a situation,
>> and submitter is not able to compile kernel.
>>     
> The bug reporter doesn't give response. But please pay attention to this
> is a regression.
>     The detect of EC GPE storm is not shipped in 2.6.24.x kernel.
>     The detect of EC GPE storm is shipped in 2.6.26.x kernel.
>     On the 2.6.24.x kernel there is no detection of EC GPE storm and EC
> GPE is enabled. In such case the temperature is reported correctly.   
>     And after EC GPE storm happens , sometimes it will report the
> incorrect thermal zone temperature, which causes that the system will be
> shutdown. (In such case EC GPE is disabled and EC will work in polling
> mode).
>     How to explain it? The possible cause is that EC status is incorrect
> before EC GPE arrives.
>
>   
It is easy to add same msleep(1) before the while() loop to overcome
this, right? So, as soon as the submitter will be able to test patches,
and if he finds his hardware still not working correctly, we could
easily fix it with 1-liner patch.
> The laptop of bug 8110 is fixed by the following commit.
>     >commit 9e197219605513c14d3eae41039ecf1b82d1920d
>      > Author: Alexey Starikovskiy <alexey.y.starikovskiy@intel.com>
>      > Date:   Wed Mar 7 18:29:35 2007 -0500
>         >ACPI: ec: fix race in status register access
>    
>    If the EC GPE storm happens on this laptop, I believe that this
> laptop will be broken by your proposed patch again.
>   
Do you want me to insert the above msleep(1) now, or do you want
me to drop my patch and instead go with yours?


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] [Patch 0/4]  ACPI : several patches for EC
  2008-09-25 11:05         ` Alexey Starikovskiy
@ 2008-09-25 11:22           ` Alexey Starikovskiy
  2008-09-26  1:47           ` Zhao Yakui
  1 sibling, 0 replies; 9+ messages in thread
From: Alexey Starikovskiy @ 2008-09-25 11:22 UTC (permalink / raw)
  To: Zhao Yakui; +Cc: linux-acpi, lenb

Alexey Starikovskiy wrote:
> Zhao Yakui wrote:
>> But I think that the spin_lock is overkill in the updated patch.
>> Assuming that 1000 EC transactions are done per second, the CPU
>> interrupt is disabled for 1ms. It is important that the normal laptops
>> will be affected by this.
>>   
> How do you arrive with these numbers? Where do you get this 1ms?
> Spinlock is around single inb/outb instruction plus several even simpler
> instructions. Do you claim it is going to take 1us? Do you claim that 
> it will
> add anything to interrupts-disabled time of ACPI SCI interrupt handler
> itself?
>>       At the same time the following source code looks so ugly.
>>   
> It is a matter of taste and, probably, experience.
>
>
>
>> Please look at the following source to see whether the bogus timeout
>> happens.
>>    >spin_unlock_irqrestore(&ec->spinlock, tmp);
>>    >    /* if we selected poll mode or failed in GPE-mode do a poll 
>> loop */     >    if (force_poll ||
>>    >        !test_bit(EC_FLAGS_GPE_MODE, &ec->flags) ||
>>    >        acpi_ec_wait(ec))
>>    >        ret = ec_poll(ec);
>>        
>>     When EC GPE storm happens, EC driver will work in polling mode. 
>> And it will
>> call the function of ec_poll after issuing the EC command.
>>    static int ec_poll(struct acpi_ec *ec)
>> {
>>     unsigned long delay = jiffies + msecs_to_jiffies(ACPI_EC_DELAY);
>>         //If preempt schedule happens here, the timeout happens after 
>> it is rescheduled.
>>         // In such case how to issue the following EC command sequence?
>>       while (time_before(jiffies, delay)) {
>>         gpe_transaction(ec, acpi_ec_read_status(ec));
>>         /* do a shortest sleep */
>>         msleep(1);
>>         if (ec_transaction_done(ec))
>>              return 0;
>>             //If the preempt schedule happens here, maybe the timeout 
>> happens when it is rescheduled.                 // And the EC 
>> transaction is not finished. How to explain it?
>>      }
>> -    pr_err(PREFIX "acpi_ec_wait timeout, status = 0x%2.2x, event = 
>> %s\n",
>> -        acpi_ec_read_status(ec),
>> -        (event == ACPI_EC_EVENT_OBF_1) ? "\"b0=1\"" : "\"b1=0\"");
>>      return -ETIME;
>>  }
>>        I don't care whether the above bogus timeout often happens.  
>> What I cared is
>> how to explain it or process it.        Regarded it as Timeout or 
>> ignore it? Unreasonable.      
>>   
> You seem to agree, that with working scheduler it is not possible that 
> this code will not be
> executed for half a second, right? Then, with broken scheduler (the 
> one with 120s sleeps),
> you arrive at this code 120 seconds late and still want to continue 
> transaction?
> 500msec here is _cut off_ after which we don't care about this 
> transaction. In normal
> conditions transaction should not _ever_ come close to 1/10th of this 
> value.
>>>>    b. How to deal with the laptop with "incorrect EC status before EC
>>>> GPE arrives". For example: bug 11309 (GPE storm happens and OS will
>>>> report the incorrect temperature while EC GPE is disabled.)
>>>>       
>>> This is _your guess_. None of your patches was reported to fix a 
>>> situation,
>>> and submitter is not able to compile kernel.
>>>     
>> The bug reporter doesn't give response. But please pay attention to this
>> is a regression.
>>     The detect of EC GPE storm is not shipped in 2.6.24.x kernel.
>>     The detect of EC GPE storm is shipped in 2.6.26.x kernel.
>>     On the 2.6.24.x kernel there is no detection of EC GPE storm and EC
>> GPE is enabled. In such case the temperature is reported correctly.   
>>     And after EC GPE storm happens , sometimes it will report the
>> incorrect thermal zone temperature, which causes that the system will be
>> shutdown. (In such case EC GPE is disabled and EC will work in polling
>> mode).
>>     How to explain it? The possible cause is that EC status is incorrect
>> before EC GPE arrives.
>>
>>   
> It is easy to add same msleep(1) before the while() loop to overcome
> this, right? So, as soon as the submitter will be able to test patches,
> and if he finds his hardware still not working correctly, we could
> easily fix it with 1-liner patch.
>> The laptop of bug 8110 is fixed by the following commit.
>>     >commit 9e197219605513c14d3eae41039ecf1b82d1920d
>>      > Author: Alexey Starikovskiy <alexey.y.starikovskiy@intel.com>
>>      > Date:   Wed Mar 7 18:29:35 2007 -0500
>>         >ACPI: ec: fix race in status register access
>>       If the EC GPE storm happens on this laptop, I believe that this
>> laptop will be broken by your proposed patch again.
>>  
Actually, this patch suggests, that there is _no_ storm. If storm was were,
simply waiting for an interrupt (which come in tens) does not help.
> Do you want me to insert the above msleep(1) now, or do you want
> me to drop my patch and instead go with yours?
>


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] [Patch 0/4]  ACPI : several patches for EC
  2008-09-25 11:05         ` Alexey Starikovskiy
  2008-09-25 11:22           ` Alexey Starikovskiy
@ 2008-09-26  1:47           ` Zhao Yakui
  2008-09-26  6:27             ` Alexey Starikovskiy
  1 sibling, 1 reply; 9+ messages in thread
From: Zhao Yakui @ 2008-09-26  1:47 UTC (permalink / raw)
  To: Alexey Starikovskiy; +Cc: linux-acpi, lenb

On Thu, 2008-09-25 at 15:05 +0400, Alexey Starikovskiy wrote:
> Zhao Yakui wrote:
> > But I think that the spin_lock is overkill in the updated patch.
> > Assuming that 1000 EC transactions are done per second, the CPU
> > interrupt is disabled for 1ms. It is important that the normal laptops
> > will be affected by this.
> >   
> How do you arrive with these numbers? Where do you get this 1ms?
> Spinlock is around single inb/outb instruction plus several even simpler
> instructions. Do you claim it is going to take 1us? Do you claim that it 
> will
It is noted that inb/outb instruction is not memory read/write
operation. It will take some time to read/write the external peripheral
device.
For example: 
On intel platform : The EC is connected with ICH device through LPC bus.
For every LCP I/O read/write operation, it will take almost 0.7us.

When OS reads the SBS battery info, there will a lot of SCI interrupts,
most of which are related with EC transaction. 
> add anything to interrupts-disabled time of ACPI SCI interrupt handler
> itself?
Not included.

Why not explain the following ? The local function variable resides on
the process stack space. If the process that is doing EC transaction is
killed before it is finished, what will happen? Maybe the system will be
kernel panic. 
   At the same time the following source code looks so ugly.
>  The local variable in function is assigned to the global pointer
variable.
>      >struct transaction_data t = {.wdata = wdata, .rdata = rdata,
>                                      .wlen = wdata_len, .rlen =
rdata_len};
>       >ec->t = &t;
> 
    And this will be accessed by interrupt context. 
   
   Assuming that an application tries to access the battery info through
the /proc interface, maybe there will exist kernel panic if the
application is killed before it returns normally.(The battery info is
related with EC.)
In normal conditions it will be OK. But in some specific cases , maybe
there exists the problem.
> >    
> >    At the same time the following source code looks so ugly.
> >   
> It is a matter of taste and, probably, experience.
> 
>  
> 
> > Please look at the following source to see whether the bogus timeout
> > happens.
> >    >spin_unlock_irqrestore(&ec->spinlock, tmp);
> >    >	/* if we selected poll mode or failed in GPE-mode do a poll loop */  
> >    >	if (force_poll ||
> >    >	    !test_bit(EC_FLAGS_GPE_MODE, &ec->flags) ||
> >    >	    acpi_ec_wait(ec))
> >    >		ret = ec_poll(ec);
> > 		
> > 	When EC GPE storm happens, EC driver will work in polling mode. And it will
> > call the function of ec_poll after issuing the EC command.
> >    static int ec_poll(struct acpi_ec *ec)
> > {
> > 	unsigned long delay = jiffies + msecs_to_jiffies(ACPI_EC_DELAY);
> >         //If preempt schedule happens here, the timeout happens after it is rescheduled.
> >         // In such case how to issue the following EC command sequence?
> >   
> > 	while (time_before(jiffies, delay)) {
> > 		gpe_transaction(ec, acpi_ec_read_status(ec));
> > 		/* do a shortest sleep */
> > 		msleep(1);
> > 		if (ec_transaction_done(ec))
> >  			return 0;
> > 	        //If the preempt schedule happens here, maybe the timeout happens when it is rescheduled. 
> >                 // And the EC transaction is not finished. How to explain it?
> >  	}
> > -	pr_err(PREFIX "acpi_ec_wait timeout, status = 0x%2.2x, event = %s\n",
> > -		acpi_ec_read_status(ec),
> > -		(event == ACPI_EC_EVENT_OBF_1) ? "\"b0=1\"" : "\"b1=0\"");
> >  	return -ETIME;
> >  }
> >        I don't care whether the above bogus timeout often happens.  What I cared is
> > how to explain it or process it. 
> >        Regarded it as Timeout or ignore it? Unreasonable.       
> >
> >   
> You seem to agree, that with working scheduler it is not possible that 
> this code will not be
> executed for half a second, right? Then, with broken scheduler (the one 
> with 120s sleeps),
I haven't  said anything about the scheduler.  
> you arrive at this code 120 seconds late and still want to continue 
> transaction?
Why not continue transaction? 
When timeout happens, the following EC command sequence can't be issued.
Maybe this will crash the EC internal working state.
If not,please help me explain why so many timeouts happen on the laptop
of bug 11141 after one bogus timeout happens.
    http://bugzilla.kernel.org/show_bug.cgi?id=11141

> 500msec here is _cut off_ after which we don't care about this 
> transaction. In normal
> conditions transaction should not _ever_ come close to 1/10th of this value.
Yes. I agree. In normal conditions transaction will be finished very
quickly. 
But in the exception conditions? What will happen?
Please give me a clear explanation. 
> >>>    b. How to deal with the laptop with "incorrect EC status before EC
> >>> GPE arrives". For example: bug 11309 (GPE storm happens and OS will
> >>> report the incorrect temperature while EC GPE is disabled.)
> >>>       
> >> This is _your guess_. None of your patches was reported to fix a situation,
> >> and submitter is not able to compile kernel.
> >>     
> > The bug reporter doesn't give response. But please pay attention to this
> > is a regression.
> >     The detect of EC GPE storm is not shipped in 2.6.24.x kernel.
> >     The detect of EC GPE storm is shipped in 2.6.26.x kernel.
> >     On the 2.6.24.x kernel there is no detection of EC GPE storm and EC
> > GPE is enabled. In such case the temperature is reported correctly.   
> >     And after EC GPE storm happens , sometimes it will report the
> > incorrect thermal zone temperature, which causes that the system will be
> > shutdown. (In such case EC GPE is disabled and EC will work in polling
> > mode).
> >     How to explain it? The possible cause is that EC status is incorrect
> > before EC GPE arrives.
> >
I only want to know why you raise the following commit?
    >commit 9e197219605513c14d3eae41039ecf1b82d1920d
    >Author: Alexey Starikovskiy <alexey.y.starikovskiy@intel.com>
    > Date:   Wed Mar 7 18:29:35 2007 -0500
      >ACPI: ec: fix race in status register access
   
   If your understanding is correct, why push it?
   Why give two different explanations about the same issue?
At the same time please confirm whether the laptop of bug 8110 is broken
again by your patch if the GPE storm happens ?
> >   
> It is easy to add same msleep(1) before the while() loop to overcome
> this, right? So, as soon as the submitter will be able to test patches,
> and if he finds his hardware still not working correctly, we could
> easily fix it with 1-liner patch.
> > The laptop of bug 8110 is fixed by the following commit.
> >     >commit 9e197219605513c14d3eae41039ecf1b82d1920d
> >      > Author: Alexey Starikovskiy <alexey.y.starikovskiy@intel.com>
> >      > Date:   Wed Mar 7 18:29:35 2007 -0500
> >         >ACPI: ec: fix race in status register access
> >    
> >    If the EC GPE storm happens on this laptop, I believe that this
> > laptop will be broken by your proposed patch again.
> >   
> Do you want me to insert the above msleep(1) now, or do you want
> me to drop my patch and instead go with yours?
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] [Patch 0/4]  ACPI : several patches for EC
  2008-09-26  1:47           ` Zhao Yakui
@ 2008-09-26  6:27             ` Alexey Starikovskiy
  0 siblings, 0 replies; 9+ messages in thread
From: Alexey Starikovskiy @ 2008-09-26  6:27 UTC (permalink / raw)
  To: Zhao Yakui; +Cc: linux-acpi, lenb

Zhao Yakui wrote:
> On Thu, 2008-09-25 at 15:05 +0400, Alexey Starikovskiy wrote:
>   
>> Zhao Yakui wrote:
>>     
>>> But I think that the spin_lock is overkill in the updated patch.
>>> Assuming that 1000 EC transactions are done per second, the CPU
>>> interrupt is disabled for 1ms. It is important that the normal laptops
>>> will be affected by this.
>>>   
>>>       
>> How do you arrive with these numbers? Where do you get this 1ms?
>> Spinlock is around single inb/outb instruction plus several even simpler
>> instructions. Do you claim it is going to take 1us? Do you claim that it 
>> will
>>     
> It is noted that inb/outb instruction is not memory read/write
> operation. It will take some time to read/write the external peripheral
> device.
> For example: 
> On intel platform : The EC is connected with ICH device through LPC bus.
> For every LCP I/O read/write operation, it will take almost 0.7us.
>
> When OS reads the SBS battery info, there will a lot of SCI interrupts,
> most of which are related with EC transaction. 
>   
You seem to forget, that SBS driver is also written by me... Don't
explain to me how it works. Thanks.
>> add anything to interrupts-disabled time of ACPI SCI interrupt handler
>> itself?
>>     
> Not included.
>
> Why not explain the following ? The local function variable resides on
> the process stack space. If the process that is doing EC transaction is
> killed before it is finished, what will happen? Maybe the system will be
> kernel panic.
Ever tried to kill process, which entered syscall (e.g. in kernel)?
Please read some books first.
> I only want to know why you raise the following commit?
>     >commit 9e197219605513c14d3eae41039ecf1b82d1920d
>     >Author: Alexey Starikovskiy <alexey.y.starikovskiy@intel.com>
>     > Date:   Wed Mar 7 18:29:35 2007 -0500
>       >ACPI: ec: fix race in status register access
>    
>    If your understanding is correct, why push it?
>    Why give two different explanations about the same issue?
> At the same time please confirm whether the laptop of bug 8110 is broken
> again by your patch if the GPE storm happens ?
>   
I just explained everything in the above mail. If you don't understand --
this is not my problem, but _yours_. I am not going to explain same things
to you several times.

My patch does not break machine in bug report 8110.


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2008-09-26  6:27 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-09-25  3:53 [RFC] [Patch 0/4] ACPI : several patches for EC Zhao Yakui
2008-09-25  5:25 ` Alexey Starikovskiy
2008-09-25  6:41   ` Zhao Yakui
2008-09-25  8:31     ` Alexey Starikovskiy
2008-09-25 10:05       ` Zhao Yakui
2008-09-25 11:05         ` Alexey Starikovskiy
2008-09-25 11:22           ` Alexey Starikovskiy
2008-09-26  1:47           ` Zhao Yakui
2008-09-26  6:27             ` Alexey Starikovskiy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).