linux-wireless.vger.kernel.org archive mirror
* BUG REPORT: libertas causing kernel lockups
@ 2009-07-04  9:03 Alexander Barinov
  2009-07-04 10:41 ` Johannes Berg
  2009-07-07  6:59 ` Holger Schurig
  0 siblings, 2 replies; 10+ messages in thread
From: Alexander Barinov @ 2009-07-04  9:03 UTC
  To: linux-wireless

Hi,

I have an Eking I1 UMPC that features a Marvell 8686 SDIO wireless card.
I am currently using vanilla kernel 2.6.30, but the bug is also present
in the Debian 2.6.26 and 2.6.29 kernels as well as the vanilla 2.6.29
kernel I tried previously.

The initial symptom of the bug was a system lockup, with 'BUG:
scheduling while atomic', when executing 'ifdown eth0'. While trying to
understand the cause of the bug and find a workaround, I found an easier
way to reproduce it. By executing 'cat /proc/net/wireless' I get the
same bug:

BUG: scheduling while atomic: cat/1885/0x00000002 
Pid: 1885, comm: cat Not tainted 2.6.30.090704 #2 
Call Trace: 
[<c042eb7f>] ? __schedule+0x37f/0x8c0 
[<c01237cd>] ? try_to_wake_up+0x8d/0x1e0 
[<c011ee0e>] ? __wake_up+0x3e/0x60 
[<c02e882c>] ? __lbs_cmd_async+0x11c/0x280 
[<c018ddbe>] ? d_rehash+0x2e/0x50 
[<c042f0d0>] ? schedule+0x10/0x30 
[<c02e8be4>] ? __lbs_cmd+0xa4/0x1a0 
[<c02ea3f0>] ? lbs_cmd_copyback+0x0/0x40 
[<c013c460>] ? autoremove_wake_function+0x0/0x50 
[<c02e67d5>] ? lbs_get_wireless_stats+0xf5/0x3c0 
[<c02ea3f0>] ? lbs_cmd_copyback+0x0/0x40 
[<c03fb1e7>] ? wireless_seq_show+0x47/0x180 
[<c03965bf>] ? dev_seq_start+0x1f/0xb0 
[<c0196c6d>] ? seq_read+0x1fd/0x360 
[<c0196a70>] ? seq_read+0x0/0x360 
[<c01b4774>] ? proc_reg_read+0x64/0xa0 
[<c01b4710>] ? proc_reg_read+0x0/0xa0 
[<c017f9ab>] ? vfs_read+0x9b/0x120 
[<c017fb01>] ? sys_read+0x41/0x80 
[<c0102f21>] ? syscall_call+0x7/0xb

Please let me know if I should provide any additional details regarding
the bug.

-- 
Alex

* Re: BUG REPORT: libertas causing kernel lockups
  2009-07-04  9:03 BUG REPORT: libertas causing kernel lockups Alexander Barinov
@ 2009-07-04 10:41 ` Johannes Berg
  2009-07-05  8:59   ` Alexander Barinov
  2009-07-07  6:59 ` Holger Schurig
  1 sibling, 1 reply; 10+ messages in thread
From: Johannes Berg @ 2009-07-04 10:41 UTC
  To: Alexander Barinov; +Cc: linux-wireless

On Sat, 2009-07-04 at 13:03 +0400, Alexander Barinov wrote:
> Hi,
> 
> I have an Eking I1 UMPC that features a Marvell 8686 SDIO wireless card.
> I am currently using vanilla kernel 2.6.30, but the bug is also present
> in the Debian 2.6.26 and 2.6.29 kernels as well as the vanilla 2.6.29
> kernel I tried previously.
> 
> The initial symptom of the bug was a system lockup, with 'BUG:
> scheduling while atomic', when executing 'ifdown eth0'. While trying to
> understand the cause of the bug and find a workaround, I found an easier
> way to reproduce it. By executing 'cat /proc/net/wireless' I get the
> same bug:
> 
> BUG: scheduling while atomic: cat/1885/0x00000002 
> Pid: 1885, comm: cat Not tainted 2.6.30.090704 #2 
> Call Trace: 
> [<c042eb7f>] ? __schedule+0x37f/0x8c0 
> [<c01237cd>] ? try_to_wake_up+0x8d/0x1e0 
> [<c011ee0e>] ? __wake_up+0x3e/0x60 
> [<c02e882c>] ? __lbs_cmd_async+0x11c/0x280 
> [<c018ddbe>] ? d_rehash+0x2e/0x50 
> [<c042f0d0>] ? schedule+0x10/0x30 
> [<c02e8be4>] ? __lbs_cmd+0xa4/0x1a0 
> [<c02ea3f0>] ? lbs_cmd_copyback+0x0/0x40 
> [<c013c460>] ? autoremove_wake_function+0x0/0x50 
> [<c02e67d5>] ? lbs_get_wireless_stats+0xf5/0x3c0 
> [<c02ea3f0>] ? lbs_cmd_copyback+0x0/0x40 
> [<c03fb1e7>] ? wireless_seq_show+0x47/0x180 
> [<c03965bf>] ? dev_seq_start+0x1f/0xb0 
> [<c0196c6d>] ? seq_read+0x1fd/0x360 
> [<c0196a70>] ? seq_read+0x0/0x360 
> [<c01b4774>] ? proc_reg_read+0x64/0xa0 
> [<c01b4710>] ? proc_reg_read+0x0/0xa0 
> [<c017f9ab>] ? vfs_read+0x9b/0x120 
> [<c017fb01>] ? sys_read+0x41/0x80 
> [<c0102f21>] ? syscall_call+0x7/0xb
> 
> Please let me know if I should provide any additional details regarding
> the bug.

This should have been fixed by 87057825824973f29cf2f37cff1e549170b2d7e6.
For some reason everybody seems to have assumed that get_wireless_stats
can sleep, which before that commit it could _not_.

johannes
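Johannes' point can be illustrated with a small userspace toy model (not kernel code, and not the actual fix). Assumptions: a plain global counter stands in for the kernel's preempt count, `toy_read_lock()` stands in for a non-sleeping lock such as the one presumably held around `dev_seq_start()`/`wireless_seq_show()` in the trace, and `toy_get_wireless_stats()` is a hypothetical driver hook that sleeps while waiting for the firmware. All `toy_*` names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for the kernel's preempt count (a plain global here). */
static int toy_preempt_count;
/* Records whether the toy scheduler was entered from atomic context. */
static bool toy_atomic_bug;

/* A non-sleeping lock (think read_lock()) only raises the count. */
static void toy_read_lock(void)
{
	toy_preempt_count++;
}

static void toy_read_unlock(void)
{
	assert(toy_preempt_count > 0);
	toy_preempt_count--;
}

/* Sleeping is only legal while nothing has disabled preemption. */
static void toy_schedule(void)
{
	if (toy_preempt_count != 0) {
		toy_atomic_bug = true;
		fprintf(stderr, "BUG: scheduling while atomic: 0x%08x\n",
			(unsigned int)toy_preempt_count);
	}
}

/* Hypothetical driver hook that waits for a firmware reply, so it sleeps. */
static int toy_get_wireless_stats(void)
{
	toy_schedule();	/* models blocking until the command response */
	return 42;	/* made-up link-quality value */
}

/* Old-style caller: the stats hook runs with a non-sleeping lock held. */
static int toy_proc_read_atomic(void)
{
	int quality;

	toy_read_lock();
	quality = toy_get_wireless_stats();
	toy_read_unlock();
	return quality;
}

/* Caller in a context where sleeping is allowed. */
static int toy_proc_read_sleepable(void)
{
	return toy_get_wireless_stats();
}
```

Calling `toy_proc_read_sleepable()` leaves `toy_atomic_bug` false, while `toy_proc_read_atomic()` sets it and prints a message shaped like the one in the report. The kernel's real check in the scheduler is more involved than this single counter comparison.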

* Re: BUG REPORT: libertas causing kernel lockups
  2009-07-04 10:41 ` Johannes Berg
@ 2009-07-05  8:59   ` Alexander Barinov
  2009-07-06 17:29     ` Dan Williams
  0 siblings, 1 reply; 10+ messages in thread
From: Alexander Barinov @ 2009-07-05  8:59 UTC
  To: Johannes Berg; +Cc: linux-wireless

On Sat, 04 Jul 2009 12:41:42 +0200
Johannes Berg <johannes@sipsolutions.net> wrote:
> On Sat, 2009-07-04 at 13:03 +0400, Alexander Barinov wrote:
> > The initial symptom of the bug was a system lockup, with 'BUG:
> > scheduling while atomic', when executing 'ifdown eth0'. While trying
> > to understand the cause of the bug and find a workaround, I found an
> > easier way to reproduce it. By executing 'cat /proc/net/wireless' I
> > get the same bug:
> This should have been fixed by
> 87057825824973f29cf2f37cff1e549170b2d7e6. For some reason everybody
> seems to have assumed that get_wireless_stats can sleep, which before
> that commit it could _not_.

The patch you mentioned was not directly applicable to the 2.6.30
kernel, so I pulled the wireless-testing git tree and compiled it. This
made the situation worse: now, after executing 'cat /proc/net/wireless',
I get a kernel lockup without any further messages.

* Re: BUG REPORT: libertas causing kernel lockups
  2009-07-05  8:59   ` Alexander Barinov
@ 2009-07-06 17:29     ` Dan Williams
  2009-07-06 19:23       ` Alexander Barinov
  0 siblings, 1 reply; 10+ messages in thread
From: Dan Williams @ 2009-07-06 17:29 UTC
  To: Alexander Barinov; +Cc: Johannes Berg, linux-wireless

On Sun, 2009-07-05 at 12:59 +0400, Alexander Barinov wrote:
> On Sat, 04 Jul 2009 12:41:42 +0200
> Johannes Berg <johannes@sipsolutions.net> wrote:
> > On Sat, 2009-07-04 at 13:03 +0400, Alexander Barinov wrote:
> > > The initial symptom of the bug was a system lockup, with 'BUG:
> > > scheduling while atomic', when executing 'ifdown eth0'. While trying
> > > to understand the cause of the bug and find a workaround, I found an
> > > easier way to reproduce it. By executing 'cat /proc/net/wireless' I
> > > get the same bug:
> > This should have been fixed by
> > 87057825824973f29cf2f37cff1e549170b2d7e6. For some reason everybody
> > seems to have assumed that get_wireless_stats can sleep, which before
> > that commit it could _not_.
> 
> The patch you mentioned was not directly applicable to the 2.6.30
> kernel, so I pulled the wireless-testing git tree and compiled it. This
> made the situation worse: now, after executing 'cat /proc/net/wireless',
> I get a kernel lockup without any further messages.

Can you get anything at all out of the kernel on that?  2.6.29.5 with
a fairly recent wireless-testing libertas driver works fine with sd8686
on my machine (HP 2530p laptop with a Ricoh controller).  Can you post
your backport of that patch too?

Dan

* Re: BUG REPORT: libertas causing kernel lockups
  2009-07-06 17:29     ` Dan Williams
@ 2009-07-06 19:23       ` Alexander Barinov
  0 siblings, 0 replies; 10+ messages in thread
From: Alexander Barinov @ 2009-07-06 19:23 UTC
  To: Dan Williams; +Cc: Johannes Berg, linux-wireless

On Mon, 06 Jul 2009 13:29:55 -0400
Dan Williams <dcbw@redhat.com> wrote:
> On Sun, 2009-07-05 at 12:59 +0400, Alexander Barinov wrote:
> > On Sat, 04 Jul 2009 12:41:42 +0200
> > Johannes Berg <johannes@sipsolutions.net> wrote:
> > > On Sat, 2009-07-04 at 13:03 +0400, Alexander Barinov wrote:
> > > > The initial symptom of the bug was a system lockup, with 'BUG:
> > > > scheduling while atomic', when executing 'ifdown eth0'. While
> > > > trying to understand the cause of the bug and find a workaround, I
> > > > found an easier way to reproduce it. By executing
> > > > 'cat /proc/net/wireless' I get the same bug:
> > > This should have been fixed by
> > > 87057825824973f29cf2f37cff1e549170b2d7e6. For some reason
> > > everybody seems to have assumed that get_wireless_stats can
> > > sleep, which before that commit it could _not_.
> > 
> > The patch you mentioned was not directly applicable to the 2.6.30
> > kernel, so I pulled the wireless-testing git tree and compiled it.
> > This made the situation worse: now, after executing
> > 'cat /proc/net/wireless', I get a kernel lockup without any further
> > messages.
> 
> Can you get anything at all out of the kernel on that?  2.6.29.5 with
> a fairly recent wireless-testing libertas driver works fine with sd8686
> on my machine (HP 2530p laptop with a Ricoh controller).  Can you
> post your backport of that patch too?

The problem is I get no kernel messages at all, just a lockup. Wireless
debug (CONFIG_MAC80211_*_DEBUG) and libertas debug
(CONFIG_LIBERTAS_DEBUG) are on, and there is no "quiet" option on boot.
Is there any other way I can increase the level of debug output, so that
I can give you some meaningful information about my problem?

BTW, in my case the sd8686 sits on some SDHCI PCI controller; I am not
sure how to find the vendor.

I am not sure which backport you are talking about: I am using
wireless-testing pulled from git, which should be something around
2.6.31-rc1.

Regards,
Alexander

* Re: BUG REPORT: libertas causing kernel lockups
  2009-07-04  9:03 BUG REPORT: libertas causing kernel lockups Alexander Barinov
  2009-07-04 10:41 ` Johannes Berg
@ 2009-07-07  6:59 ` Holger Schurig
  2009-07-07  8:56   ` Johannes Berg
  2009-07-07 20:37   ` Alexander Barinov
  1 sibling, 2 replies; 10+ messages in thread
From: Holger Schurig @ 2009-07-07  6:59 UTC
  To: linux-wireless; +Cc: Alexander Barinov

> By executing 'cat /proc/net/wireless' I get the same bug:

I haven't tested your exact bug, but I remember the following from
about a year ago, when I did lots of libertas and libertas_cs work:

Accessing the wireless stats made the libertas driver (in wext.c) issue
a command to the firmware in the hardware. I think it was the command
to get the current SNR/RSSI/whatever.

The libertas command/response cycle does sleep, so you could trigger
this bug. I once had a patch (and I think I posted it). The patch
cheated: it returned the last known SNR/RSSI/whatever and issued a
"get SNR/RSSI/whatever" command in the background, just storing the
result. So a single "cat /proc/net/wireless" returns bogus data, but
repeated reads return "valid" info, e.g. when doing "watch iwconfig
eth1".

-- 
http://www.holgerschurig.de
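Holger's workaround can be sketched as follows. This is not his patch, which is not quoted in the thread: `struct toy_stats_cache`, `toy_get_stats()` and `toy_refresh_done()` are hypothetical names, there is no locking, and the background firmware query is modelled by an explicit call to `toy_refresh_done()` rather than a real work item or libertas command.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device cache for the "return the last value" approach. */
struct toy_stats_cache {
	int last_rssi;		/* value handed to readers; 0 until first refresh */
	bool refresh_queued;	/* a background firmware query is outstanding */
	unsigned int queries;	/* number of firmware queries issued */
};

/* Read path: never waits for the firmware, so it never sleeps. */
static int toy_get_stats(struct toy_stats_cache *c)
{
	if (!c->refresh_queued) {
		c->refresh_queued = true;	/* ask for fresh data later */
		c->queries++;
	}
	return c->last_rssi;	/* stale, or bogus on the very first call */
}

/* Background step: runs later, where the command round trip may sleep. */
static void toy_refresh_done(struct toy_stats_cache *c, int fw_rssi)
{
	assert(c->refresh_queued);
	c->last_rssi = fw_rssi;
	c->refresh_queued = false;
}
```

The trade-off is the one Holger describes: the first read returns a placeholder, and later reads (for example under `watch iwconfig eth1`) see values that lag one refresh behind the hardware.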

* Re: BUG REPORT: libertas causing kernel lockups
  2009-07-07  6:59 ` Holger Schurig
@ 2009-07-07  8:56   ` Johannes Berg
  2009-07-08  7:42     ` Holger Schurig
  2009-07-07 20:37   ` Alexander Barinov
  1 sibling, 1 reply; 10+ messages in thread
From: Johannes Berg @ 2009-07-07  8:56 UTC
  To: Holger Schurig; +Cc: linux-wireless, Alexander Barinov

On Tue, 2009-07-07 at 08:59 +0200, Holger Schurig wrote:
> > By executing 'cat /proc/net/wireless' I get the same bug:
> 
> I haven't tested your exact bug, but I remember the following from
> about a year ago, when I did lots of libertas and libertas_cs work:
> 
> Accessing the wireless stats made the libertas driver (in wext.c) issue
> a command to the firmware in the hardware. I think it was the command
> to get the current SNR/RSSI/whatever.

Yes, but the patch that I quoted makes it legal to sleep there, so it
must be something else. Is it maybe using the RTNL there? Or is it
calling schedule_work() and then waiting for that work, or for something
the work triggers? That will deadlock on the RTNL if there's something
in front of it on the queue that needs the RTNL, because
get_wireless_stats is executed under the RTNL. (lockdep couldn't find
that particular case because it knows nothing about completions.)

johannes
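The deadlock Johannes suspects depends only on queue ordering, so it can be modelled without threads. In this hypothetical sketch (not workqueue or libertas code), `struct toy_work` and `toy_wait_completes()` are made-up names, and the workqueue is assumed to be served by a single worker that runs items strictly in order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One item on a hypothetical single-threaded, strictly FIFO workqueue. */
struct toy_work {
	bool needs_rtnl;	/* the item takes the RTNL before doing its job */
};

/*
 * Can a caller that queued queue[target] and now waits for its completion
 * ever be woken? The single worker runs items in order, so if any item up
 * to and including the target needs the RTNL while the waiting caller
 * still holds it, the worker blocks on the RTNL and the wait never ends.
 */
static bool toy_wait_completes(const struct toy_work *queue, size_t target,
			       bool caller_holds_rtnl)
{
	assert(queue != NULL);
	if (!caller_holds_rtnl)
		return true;
	for (size_t i = 0; i <= target; i++) {
		if (queue[i].needs_rtnl)
			return false;	/* worker stuck behind the caller's RTNL */
	}
	return true;
}
```

With an RTNL-taking item ahead of the driver's own item, the wait only completes if the caller does not hold the RTNL. Because the caller blocks on a completion rather than on a lock, this is the kind of cycle that, as Johannes notes, lockdep cannot flag.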

* Re: BUG REPORT: libertas causing kernel lockups
  2009-07-07  6:59 ` Holger Schurig
  2009-07-07  8:56   ` Johannes Berg
@ 2009-07-07 20:37   ` Alexander Barinov
  2009-07-08  7:32     ` Holger Schurig
  1 sibling, 1 reply; 10+ messages in thread
From: Alexander Barinov @ 2009-07-07 20:37 UTC
  To: Holger Schurig; +Cc: linux-wireless

On Tue, 7 Jul 2009 08:59:39 +0200
Holger Schurig <hs4233@mail.mn-solutions.de> wrote:
> > By executing 'cat /proc/net/wireless' I get the same bug:
> 
> I haven't tested your exact bug, but I remember the following from
> about a year ago, when I did lots of libertas and libertas_cs work:
> 
> Accessing the wireless stats made the libertas driver (in wext.c) issue
> a command to the firmware in the hardware. I think it was the command
> to get the current SNR/RSSI/whatever.
> 
> The libertas command/response cycle does sleep, so you could trigger
> this bug. I once had a patch (and I think I posted it). The patch
> cheated: it returned the last known SNR/RSSI/whatever and issued a
> "get SNR/RSSI/whatever" command in the background, just storing the
> result. So a single "cat /proc/net/wireless" returns bogus data, but
> repeated reads return "valid" info, e.g. when doing "watch iwconfig
> eth1".

Sorry for asking a potentially stupid question, as I am in no way a
wireless hardware expert, but could this bug be caused by the wrong
firmware version?

BTW, it is really hard to find the patch you are talking about in the
archive. Could you please give me some keywords to search for (besides
your name and "patch", of course)?

* Re: BUG REPORT: libertas causing kernel lockups
  2009-07-07 20:37   ` Alexander Barinov
@ 2009-07-08  7:32     ` Holger Schurig
  0 siblings, 0 replies; 10+ messages in thread
From: Holger Schurig @ 2009-07-08  7:32 UTC
  To: linux-wireless; +Cc: Alexander Barinov

> Sorry for asking a potentially stupid question, as I am in no
> way a wireless hardware expert, but could this bug be caused by
> the wrong firmware version?

AFAIK not.

> BTW, it is really hard to find the patch you are talking about
> in the archive.

It's dead easy:

$ cd linux-git
$ git show 87057825824973f29cf2f37cff1e549170b2d7e6

Or you can use the web interface: start at
http://git.kernel.org, look for Linus' tree, and you end up at
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=summary.

Now select any commitdiff and substitute the hex commit ID from
above into the URL. And voilà, you're at

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=87057825824973f29cf2f37cff1e549170b2d7e6


-- 
http://www.holgerschurig.de

* Re: BUG REPORT: libertas causing kernel lockups
  2009-07-07  8:56   ` Johannes Berg
@ 2009-07-08  7:42     ` Holger Schurig
  0 siblings, 0 replies; 10+ messages in thread
From: Holger Schurig @ 2009-07-08  7:42 UTC
  To: linux-wireless; +Cc: Johannes Berg, Alexander Barinov

> Yes, but the patch that I quoted makes it legal to sleep
> there, so it must be something else. Is it maybe using the
> RTNL there? Or is it calling schedule_work() and then waiting
> for that work, or for something the work triggers? That will
> deadlock on the RTNL if there's something in front of it on
> the queue that needs the RTNL, because get_wireless_stats is
> executed under the RTNL. (lockdep couldn't find that
> particular case because it knows nothing about completions.)

Again, this is all from memory, from around the 2.6.25
timeframe. "iwconfig" or "cat /proc/net/wireless" ended up in
drivers/net/wireless/libertas/wext.c, AFAIK in
lbs_get_wireless_stats(). This calls

   lbs_cmd_with_response(priv, CMD_802_11_GET_LOG, &log);

This is a macro calling lbs_cmd(), which in turn calls
__lbs_cmd_async(). That creates a "command node", queues it,
and calls

	wake_up_interruptible(&priv->waitq);

(cmd.c, around line 2050) to get the queue processed (i.e. to
send the command to the firmware). And I think that this
wake_up call now calls __schedule().

Later, lbs_cmd() does this:

   might_sleep();
   wait_event_interruptible(cmdnode->cmdwait_q,
        cmdnode->cmdwaitqwoken);

But AFAIK this isn't problematic.

-- 
http://www.holgerschurig.de
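The command/response cycle Holger traces (queue a command node, wake the driver thread, then sleep until `cmdwaitqwoken` is set) can be mimicked in userspace with POSIX threads. This is a hypothetical sketch, not libertas code: the `toy_*` names are invented, a condition variable stands in for `cmdwait_q`, the wait is not interruptible, and for brevity a thread is spawned per command instead of waking a long-lived driver thread through `priv->waitq`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace analogue of a libertas "command node". */
struct toy_cmdnode {
	int request;		/* command payload */
	int response;		/* filled in by the worker */
	bool woken;		/* stands in for cmdnode->cmdwaitqwoken */
	pthread_mutex_t lock;
	pthread_cond_t done;	/* stands in for cmdnode->cmdwait_q */
};

/* Worker: plays the driver thread that sends the command and gets a reply. */
static void *toy_worker(void *arg)
{
	struct toy_cmdnode *node = arg;

	pthread_mutex_lock(&node->lock);
	node->response = node->request * 2;	/* pretend firmware reply */
	node->woken = true;
	pthread_cond_signal(&node->done);	/* like waking cmdwait_q */
	pthread_mutex_unlock(&node->lock);
	return NULL;
}

/* Submit a command and block until the worker has answered it. */
static int toy_cmd_with_response(int request)
{
	struct toy_cmdnode node = { .request = request };
	pthread_t worker;
	int resp = -1;

	pthread_mutex_init(&node.lock, NULL);
	pthread_cond_init(&node.done, NULL);

	/* Hand the node to the worker (the queue-and-wake-up step). */
	if (pthread_create(&worker, NULL, toy_worker, &node) == 0) {
		/* The wait_event()-style step: this is where the caller sleeps. */
		pthread_mutex_lock(&node.lock);
		while (!node.woken)
			pthread_cond_wait(&node.done, &node.lock);
		resp = node.response;
		pthread_mutex_unlock(&node.lock);
		pthread_join(worker, NULL);
	}

	pthread_mutex_destroy(&node.lock);
	pthread_cond_destroy(&node.done);
	return resp;
}
```

`toy_cmd_with_response(21)` returns 42 once the worker replies. The `pthread_cond_wait()` loop is the part that blocks, which is the kind of sleep that must not happen while a non-sleeping lock is held. Older toolchains may need `-pthread` to build it.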
