* Re: eeepc_hotkey rmmod issues
@ 2009-07-29 10:01 Alan Jenkins
2009-07-29 10:21 ` Corentin Chary
0 siblings, 1 reply; 5+ messages in thread
From: Alan Jenkins @ 2009-07-29 10:01 UTC (permalink / raw)
To: Corentin Chary; +Cc: Luciano Rocha, Pekka Enberg, linux-kernel
On 7/28/09, Corentin Chary <corentin.chary@gmail.com> wrote:
> On Tue, Jul 28, 2009 at 9:19 PM, Alan
> Jenkins<sourcejedi.lkml@googlemail.com> wrote:
>> On 7/28/09, Luciano Rocha <luciano@eurotux.com> wrote:
>>> On Tue, Jul 28, 2009 at 05:50:26PM +0100, Luciano Rocha wrote:
>>>> On Mon, Jul 27, 2009 at 10:45:14PM +0300, Pekka Enberg wrote:
>>>> > (Adding Corentin to cc)
>>>> >
>>>> > On Mon, Jul 27, 2009 at 10:27 PM, Luciano Rocha<luciano@eurotux.com>
>>>> > wrote:
>>>> > > Also, a "rmmod eeepc_hotkeys" resulted in a kernel panic. If asked,
>>>> > > I'll
>>>> > > try to replicate it.
>>>> >
>>>> > Yes, please.
>>>>
>>>> Hm, rebooted without i2c_i801, browsed some, then did a rmmod
>>>> eeepc_laptop:
>>>> ERROR!!! H2M_MAILBOX still hold by MCU. command fail
>>>> ERROR!!! H2M_MAILBOX still hold by MCU. command fail
>>>>
>>>> Two equal lines, yes. What does it mean?
>>>
>>> Nevermind, the wireless driver didn't like that the hardware
>>> disappeared.
>>
>> Thanks for the bug report anyway :-).
>>
>> So presumably this is what caused your oops earlier. I assume the
>> wireless toggle button doesn't normally cause any errors.
>>
>> The new rfkill core in 2.6.31 should avoid triggering this bug. The
>> new core won't disable the wireless when the eeepc-laptop module is
>> removed.
>>
>> But we should still fix the underlying problem. It sounds like
>> there's a narrow danger window on module unload. And it's still there
>> in 2.6.31-rc4:
>>
>> 1019 static void eeepc_rfkill_exit(void)
>> 1020 {
>> 1021 eeepc_unregister_rfkill_notifier("\\_SB.PCI0.P0P6");
>> 1022 eeepc_unregister_rfkill_notifier("\\_SB.PCI0.P0P7");
>> 1023 if (ehotk->wlan_rfkill)
>> 1024 rfkill_unregister(ehotk->wlan_rfkill);
>>
>> Really we need to perform these unregistrations "at the same time".
>> The rfkill device relies on the notifier, but the notifier callback
>> also uses the rfkill device. I guess we will need to a mutex to
>> synchronize unregistration (and registration).
>
> I think 2.6.31 is ok,
> In 2.6.30, we called eeepc_unregister_rfkill_notifier after
> rfkill_free, which was an error because
> the notifier callback uses the rfkill device.
Ok. I don't see how that causes Luciano's errors. So I guess he was
right to blame the wireless driver.
> But I believe that the rfkill device can work without the notifier
> (which is an acpi notifier).
I don't think it can.
If the rfkill device is set to "soft blocked", the pci device is
removed. If the acpi notifier is not called, the pci driver (e.g.
ath5k) won't realise the device is gone. The network device (e.g.
wlan0) will remain present, but it won't work.
So I believe there's a circular dependency which we need to resolve.
Would you like me to write a patch for it?
Regards
Alan
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: eeepc_hotkey rmmod issues
2009-07-29 10:01 eeepc_hotkey rmmod issues Alan Jenkins
@ 2009-07-29 10:21 ` Corentin Chary
2009-07-29 12:02 ` Alan Jenkins
0 siblings, 1 reply; 5+ messages in thread
From: Corentin Chary @ 2009-07-29 10:21 UTC (permalink / raw)
To: Alan Jenkins; +Cc: Luciano Rocha, Pekka Enberg, linux-kernel
On Wed, Jul 29, 2009 at 12:01 PM, Alan
Jenkins<sourcejedi.lkml@googlemail.com> wrote:
> On 7/28/09, Corentin Chary <corentin.chary@gmail.com> wrote:
>> On Tue, Jul 28, 2009 at 9:19 PM, Alan
>> Jenkins<sourcejedi.lkml@googlemail.com> wrote:
>>> On 7/28/09, Luciano Rocha <luciano@eurotux.com> wrote:
>>>> On Tue, Jul 28, 2009 at 05:50:26PM +0100, Luciano Rocha wrote:
>>>>> On Mon, Jul 27, 2009 at 10:45:14PM +0300, Pekka Enberg wrote:
>>>>> > (Adding Corentin to cc)
>>>>> >
>>>>> > On Mon, Jul 27, 2009 at 10:27 PM, Luciano Rocha<luciano@eurotux.com>
>>>>> > wrote:
>>>>> > > Also, a "rmmod eeepc_hotkeys" resulted in a kernel panic. If asked,
>>>>> > > I'll
>>>>> > > try to replicate it.
>>>>> >
>>>>> > Yes, please.
>>>>>
>>>>> Hm, rebooted without i2c_i801, browsed some, then did a rmmod
>>>>> eeepc_laptop:
>>>>> ERROR!!! H2M_MAILBOX still hold by MCU. command fail
>>>>> ERROR!!! H2M_MAILBOX still hold by MCU. command fail
>>>>>
>>>>> Two equal lines, yes. What does it mean?
>>>>
>>>> Nevermind, the wireless driver didn't like that the hardware
>>>> disappeared.
>>>
>>> Thanks for the bug report anyway :-).
>>>
>>> So presumably this is what caused your oops earlier. I assume the
>>> wireless toggle button doesn't normally cause any errors.
>>>
>>> The new rfkill core in 2.6.31 should avoid triggering this bug. The
>>> new core won't disable the wireless when the eeepc-laptop module is
>>> removed.
>>>
>>> But we should still fix the underlying problem. It sounds like
>>> there's a narrow danger window on module unload. And it's still there
>>> in 2.6.31-rc4:
>>>
>>> 1019 static void eeepc_rfkill_exit(void)
>>> 1020 {
>>> 1021 eeepc_unregister_rfkill_notifier("\\_SB.PCI0.P0P6");
>>> 1022 eeepc_unregister_rfkill_notifier("\\_SB.PCI0.P0P7");
>>> 1023 if (ehotk->wlan_rfkill)
>>> 1024 rfkill_unregister(ehotk->wlan_rfkill);
>>>
>>> Really we need to perform these unregistrations "at the same time".
>>> The rfkill device relies on the notifier, but the notifier callback
>>> also uses the rfkill device. I guess we will need to a mutex to
>>> synchronize unregistration (and registration).
>>
>> I think 2.6.31 is ok,
>
>> In 2.6.30, we called eeepc_unregister_rfkill_notifier after
>> rfkill_free, which was an error because
>> the notifier callback uses the rfkill device.
>
> Ok. I don't see how that causes Luciano's errors. So I guess he was
> right to blame the wireless driver.
If he was using 2.6.30, then :
eeepc_unregister_rfkill_notifier() was called after rfkill_unregister()
And the callback was still registered after rfkill_unregister(), *Ooops*
In 2.6.31 we first unregister the callback, and then rfkill, so rmmod
should works.
>> But I believe that the rfkill device can work without the notifier
>> (which is an acpi notifier).
>
> I don't think it can.
>
> If the rfkill device is set to "soft blocked", the pci device is
> removed. If the acpi notifier is not called, the pci driver (e.g.
> ath5k) won't realise the device is gone. The network device (e.g.
> wlan0) will remain present, but it won't work.
Hum, there is a misunderstanding here. What I mean is : I think
eeepc_rfkill_exit(void) is ok in 2.6.31 (Luciano used 2.6.30).
And eeepc_rfkill_exit() is only called on rmmod eeepc-laptop
Commit 7de39389d8f61aa517ce2a8b4d925acc62696ae5 did a lot of
change in rfkill code.
> So I believe there's a circular dependency which we need to resolve.
> Would you like me to write a patch for it?
It's possible that I miss the issue here, so go ahead :)
--
Corentin Chary
http://xf.iksaif.net - http://uffs.org
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: eeepc_hotkey rmmod issues
2009-07-29 10:21 ` Corentin Chary
@ 2009-07-29 12:02 ` Alan Jenkins
2009-07-29 12:15 ` Corentin Chary
0 siblings, 1 reply; 5+ messages in thread
From: Alan Jenkins @ 2009-07-29 12:02 UTC (permalink / raw)
To: Corentin Chary; +Cc: Pekka Enberg, linux-kernel
On 7/29/09, Corentin Chary <corentin.chary@gmail.com> wrote:
> On Wed, Jul 29, 2009 at 12:01 PM, Alan
> Jenkins<sourcejedi.lkml@googlemail.com> wrote:
>> On 7/28/09, Corentin Chary <corentin.chary@gmail.com> wrote:
>>> On Tue, Jul 28, 2009 at 9:19 PM, Alan
>>> Jenkins<sourcejedi.lkml@googlemail.com> wrote:
>>>> But we should still fix the underlying problem. It sounds like
>>>> there's a narrow danger window on module unload. And it's still there
>>>> in 2.6.31-rc4:
>>>>
>>>> 1019 static void eeepc_rfkill_exit(void)
>>>> 1020 {
>>>> 1021 eeepc_unregister_rfkill_notifier("\\_SB.PCI0.P0P6");
>>>> 1022 eeepc_unregister_rfkill_notifier("\\_SB.PCI0.P0P7");
>>>> 1023 if (ehotk->wlan_rfkill)
>>>> 1024 rfkill_unregister(ehotk->wlan_rfkill);
>>>>
>>>> Really we need to perform these unregistrations "at the same time".
>>>> The rfkill device relies on the notifier, but the notifier callback
>>>> also uses the rfkill device. I guess we will need to a mutex to
>>>> synchronize unregistration (and registration).
>>>
>>> I think 2.6.31 is ok,
>>
>>> In 2.6.30, we called eeepc_unregister_rfkill_notifier after
>>> rfkill_free, which was an error because
>>> the notifier callback uses the rfkill device.
>>
>> Ok. I don't see how that causes Luciano's errors. So I guess he was
>> right to blame the wireless driver.
>
> If he was using 2.6.30, then :
> eeepc_unregister_rfkill_notifier() was called after rfkill_unregister()
> And the callback was still registered after rfkill_unregister(), *Ooops*
>
> In 2.6.31 we first unregister the callback, and then rfkill, so rmmod
> should works.
>
>>> But I believe that the rfkill device can work without the notifier
>>> (which is an acpi notifier).
>>
>> I don't think it can.
>>
>> If the rfkill device is set to "soft blocked", the pci device is
>> removed. If the acpi notifier is not called, the pci driver (e.g.
>> ath5k) won't realise the device is gone. The network device (e.g.
>> wlan0) will remain present, but it won't work.
>
> Hum, there is a misunderstanding here. What I mean is : I think
> eeepc_rfkill_exit(void) is ok in 2.6.31 (Luciano used 2.6.30).
>
> And eeepc_rfkill_exit() is only called on rmmod eeepc-laptop
>
> Commit 7de39389d8f61aa517ce2a8b4d925acc62696ae5 did a lot of
> change in rfkill code.
>
>> So I believe there's a circular dependency which we need to resolve.
>> Would you like me to write a patch for it?
>
> It's possible that I miss the issue here, so go ahead :)
Thanks :)
Here is a test case to show the race I am talking about
diff --git a/drivers/platform/x86/eeepc-laptop.c b/drivers/platform/x86/eeepc-laptop.c
index ec560f1..c478db5 100644
--- a/drivers/platform/x86/eeepc-laptop.c
+++ b/drivers/platform/x86/eeepc-laptop.c
@@ -1020,6 +1020,17 @@ static void eeepc_rfkill_exit(void)
{
eeepc_unregister_rfkill_notifier("\\_SB.PCI0.P0P6");
eeepc_unregister_rfkill_notifier("\\_SB.PCI0.P0P7");
+
+ //
+ // Simulated error
+ // Imagine that userspace set the wifi to "soft blocked" at this exact moment
+ // (or the wireless toggle key was pressed)
+ //
+ // The PCI device will disappear, but we will not see any notification
+ //
+ set_acpi(CM_ASL_WLAN, 0);
+ rfkill_set_sw_state(ehotk->wlan_rfkill, true);
+
if (ehotk->wlan_rfkill)
rfkill_unregister(ehotk->wlan_rfkill);
if (ehotk->bluetooth_rfkill)
If you unload eeepc-laptop with this simulated race, the wireless
interface stays around but stops working.
[ 191.391155] ath5k phy0: can't reset hardware (-5)
[ 191.432983] ath5k phy0: failed to wakeup the MAC Chip
[ 196.940835] __ratelimit: 21 callbacks suppressed
Alan
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: eeepc_hotkey rmmod issues
2009-07-29 12:02 ` Alan Jenkins
@ 2009-07-29 12:15 ` Corentin Chary
2009-07-29 12:17 ` Alan Jenkins
0 siblings, 1 reply; 5+ messages in thread
From: Corentin Chary @ 2009-07-29 12:15 UTC (permalink / raw)
To: alan-jenkins; +Cc: Pekka Enberg, linux-kernel
On Wed, Jul 29, 2009 at 2:02 PM, Alan
Jenkins<sourcejedi.lkml@googlemail.com> wrote:
> On 7/29/09, Corentin Chary <corentin.chary@gmail.com> wrote:
>> On Wed, Jul 29, 2009 at 12:01 PM, Alan
>> Jenkins<sourcejedi.lkml@googlemail.com> wrote:
>>> On 7/28/09, Corentin Chary <corentin.chary@gmail.com> wrote:
>>>> On Tue, Jul 28, 2009 at 9:19 PM, Alan
>>>> Jenkins<sourcejedi.lkml@googlemail.com> wrote:
>
>>>>> But we should still fix the underlying problem. It sounds like
>>>>> there's a narrow danger window on module unload. And it's still there
>>>>> in 2.6.31-rc4:
>>>>>
>>>>> 1019 static void eeepc_rfkill_exit(void)
>>>>> 1020 {
>>>>> 1021 eeepc_unregister_rfkill_notifier("\\_SB.PCI0.P0P6");
>>>>> 1022 eeepc_unregister_rfkill_notifier("\\_SB.PCI0.P0P7");
>>>>> 1023 if (ehotk->wlan_rfkill)
>>>>> 1024 rfkill_unregister(ehotk->wlan_rfkill);
>>>>>
>>>>> Really we need to perform these unregistrations "at the same time".
>>>>> The rfkill device relies on the notifier, but the notifier callback
>>>>> also uses the rfkill device. I guess we will need to a mutex to
>>>>> synchronize unregistration (and registration).
>>>>
>>>> I think 2.6.31 is ok,
>>>
>>>> In 2.6.30, we called eeepc_unregister_rfkill_notifier after
>>>> rfkill_free, which was an error because
>>>> the notifier callback uses the rfkill device.
>>>
>>> Ok. I don't see how that causes Luciano's errors. So I guess he was
>>> right to blame the wireless driver.
>>
>> If he was using 2.6.30, then :
>> eeepc_unregister_rfkill_notifier() was called after rfkill_unregister()
>> And the callback was still registered after rfkill_unregister(), *Ooops*
>>
>> In 2.6.31 we first unregister the callback, and then rfkill, so rmmod
>> should works.
>>
>>>> But I believe that the rfkill device can work without the notifier
>>>> (which is an acpi notifier).
>>>
>>> I don't think it can.
>>>
>>> If the rfkill device is set to "soft blocked", the pci device is
>>> removed. If the acpi notifier is not called, the pci driver (e.g.
>>> ath5k) won't realise the device is gone. The network device (e.g.
>>> wlan0) will remain present, but it won't work.
>>
>> Hum, there is a misunderstanding here. What I mean is : I think
>> eeepc_rfkill_exit(void) is ok in 2.6.31 (Luciano used 2.6.30).
>>
>> And eeepc_rfkill_exit() is only called on rmmod eeepc-laptop
>>
>> Commit 7de39389d8f61aa517ce2a8b4d925acc62696ae5 did a lot of
>> change in rfkill code.
>>
>>> So I believe there's a circular dependency which we need to resolve.
>>> Would you like me to write a patch for it?
>>
>> It's possible that I miss the issue here, so go ahead :)
>
> Thanks :)
>
> Here is a test case to show the race I am talking about
>
> diff --git a/drivers/platform/x86/eeepc-laptop.c b/drivers/platform/x86/eeepc-laptop.c
> index ec560f1..c478db5 100644
> --- a/drivers/platform/x86/eeepc-laptop.c
> +++ b/drivers/platform/x86/eeepc-laptop.c
> @@ -1020,6 +1020,17 @@ static void eeepc_rfkill_exit(void)
> {
> eeepc_unregister_rfkill_notifier("\\_SB.PCI0.P0P6");
> eeepc_unregister_rfkill_notifier("\\_SB.PCI0.P0P7");
> +
> + //
> + // Simulated error
> + // Imagine that userspace set the wifi to "soft blocked" at this exact moment
> + // (or the wireless toggle key was pressed)
> + //
> + // The PCI device will disappear, but we will not see any notification
> + //
> + set_acpi(CM_ASL_WLAN, 0);
> + rfkill_set_sw_state(ehotk->wlan_rfkill, true);
> +
> if (ehotk->wlan_rfkill)
> rfkill_unregister(ehotk->wlan_rfkill);
> if (ehotk->bluetooth_rfkill)
>
>
>
> If you unload eeepc-laptop with this simulated race, the wireless
> interface stays around but stops working.
>
> [ 191.391155] ath5k phy0: can't reset hardware (-5)
> [ 191.432983] ath5k phy0: failed to wakeup the MAC Chip
> [ 196.940835] __ratelimit: 21 callbacks suppressed
>
> Alan
>
Indeed :) . Let's serialize that. Do you want me to do it ?
Thanks,
--
Corentin Chary
http://xf.iksaif.net - http://uffs.org
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: eeepc_hotkey rmmod issues
2009-07-29 12:15 ` Corentin Chary
@ 2009-07-29 12:17 ` Alan Jenkins
0 siblings, 0 replies; 5+ messages in thread
From: Alan Jenkins @ 2009-07-29 12:17 UTC (permalink / raw)
To: Corentin Chary; +Cc: Pekka Enberg, linux-kernel
Corentin Chary wrote:
> On Wed, Jul 29, 2009 at 2:02 PM, Alan
> Jenkins<sourcejedi.lkml@googlemail.com> wrote:
>
>> On 7/29/09, Corentin Chary <corentin.chary@gmail.com> wrote:
>>
>>> On Wed, Jul 29, 2009 at 12:01 PM, Alan
>>> Jenkins<sourcejedi.lkml@googlemail.com> wrote:
>>>
>>>> On 7/28/09, Corentin Chary <corentin.chary@gmail.com> wrote:
>>>>
>>>>> On Tue, Jul 28, 2009 at 9:19 PM, Alan
>>>>> Jenkins<sourcejedi.lkml@googlemail.com> wrote:
>>>>>
>>>>>> But we should still fix the underlying problem. It sounds like
>>>>>> there's a narrow danger window on module unload. And it's still there
>>>>>> in 2.6.31-rc4:
>>>>>>
>>>>>> 1019 static void eeepc_rfkill_exit(void)
>>>>>> 1020 {
>>>>>> 1021 eeepc_unregister_rfkill_notifier("\\_SB.PCI0.P0P6");
>>>>>> 1022 eeepc_unregister_rfkill_notifier("\\_SB.PCI0.P0P7");
>>>>>> 1023 if (ehotk->wlan_rfkill)
>>>>>> 1024 rfkill_unregister(ehotk->wlan_rfkill);
>>>>>>
>>>>>> Really we need to perform these unregistrations "at the same time".
>>>>>> The rfkill device relies on the notifier, but the notifier callback
>>>>>> also uses the rfkill device. I guess we will need to a mutex to
>>>>>> synchronize unregistration (and registration).
>>>>>>
>>>>> I think 2.6.31 is ok,
>>>>>
>>>>> In 2.6.30, we called eeepc_unregister_rfkill_notifier after
>>>>> rfkill_free, which was an error because
>>>>> the notifier callback uses the rfkill device.
>>>>>
>>>> Ok. I don't see how that causes Luciano's errors. So I guess he was
>>>> right to blame the wireless driver.
>>>>
>>> If he was using 2.6.30, then :
>>> eeepc_unregister_rfkill_notifier() was called after rfkill_unregister()
>>> And the callback was still registered after rfkill_unregister(), *Ooops*
>>>
>>> In 2.6.31 we first unregister the callback, and then rfkill, so rmmod
>>> should works.
>>>
>>>
>>>>> But I believe that the rfkill device can work without the notifier
>>>>> (which is an acpi notifier).
>>>>>
>>>> I don't think it can.
>>>>
>>>> If the rfkill device is set to "soft blocked", the pci device is
>>>> removed. If the acpi notifier is not called, the pci driver (e.g.
>>>> ath5k) won't realise the device is gone. The network device (e.g.
>>>> wlan0) will remain present, but it won't work.
>>>>
>>> Hum, there is a misunderstanding here. What I mean is : I think
>>> eeepc_rfkill_exit(void) is ok in 2.6.31 (Luciano used 2.6.30).
>>>
>>> And eeepc_rfkill_exit() is only called on rmmod eeepc-laptop
>>>
>>> Commit 7de39389d8f61aa517ce2a8b4d925acc62696ae5 did a lot of
>>> change in rfkill code.
>>>
>>>
>>>> So I believe there's a circular dependency which we need to resolve.
>>>> Would you like me to write a patch for it?
>>>>
>>> It's possible that I miss the issue here, so go ahead :)
>>>
>> Thanks :)
>>
>> Here is a test case to show the race I am talking about
>>
>> diff --git a/drivers/platform/x86/eeepc-laptop.c b/drivers/platform/x86/eeepc-laptop.c
>> index ec560f1..c478db5 100644
>> --- a/drivers/platform/x86/eeepc-laptop.c
>> +++ b/drivers/platform/x86/eeepc-laptop.c
>> @@ -1020,6 +1020,17 @@ static void eeepc_rfkill_exit(void)
>> {
>> eeepc_unregister_rfkill_notifier("\\_SB.PCI0.P0P6");
>> eeepc_unregister_rfkill_notifier("\\_SB.PCI0.P0P7");
>> +
>> + //
>> + // Simulated error
>> + // Imagine that userspace set the wifi to "soft blocked" at this exact moment
>> + // (or the wireless toggle key was pressed)
>> + //
>> + // The PCI device will disappear, but we will not see any notification
>> + //
>> + set_acpi(CM_ASL_WLAN, 0);
>> + rfkill_set_sw_state(ehotk->wlan_rfkill, true);
>> +
>> if (ehotk->wlan_rfkill)
>> rfkill_unregister(ehotk->wlan_rfkill);
>> if (ehotk->bluetooth_rfkill)
>>
>>
>>
>> If you unload eeepc-laptop with this simulated race, the wireless
>> interface stays around but stops working.
>>
>> [ 191.391155] ath5k phy0: can't reset hardware (-5)
>> [ 191.432983] ath5k phy0: failed to wakeup the MAC Chip
>> [ 196.940835] __ratelimit: 21 callbacks suppressed
>>
>> Alan
>>
>>
>
> Indeed :) . Let's serialize that. Do you want me to do it ?
> Thanks,
>
It's ok, I'm already working on a fix.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2009-07-29 12:17 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-07-29 10:01 eeepc_hotkey rmmod issues Alan Jenkins
2009-07-29 10:21 ` Corentin Chary
2009-07-29 12:02 ` Alan Jenkins
2009-07-29 12:15 ` Corentin Chary
2009-07-29 12:17 ` Alan Jenkins
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox