linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* RE: kernel BUG in iwl-agn-rs.c:2076,was: iwlagn + some accesspoint = hardlock
@ 2010-04-29 18:26 lkml
  0 siblings, 0 replies; 30+ messages in thread
From: lkml @ 2010-04-29 18:26 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1986 bytes --]

  Hi,

  In addition to the previous message it seems that:

  - RFKILL switch, switching radio on, (obviously) is a precondition. But then, it only 
    happens shortly after switching radio on,
  - it happens more frequently and faster after switching radio on when kde + firefox + xy is running

Please find attached an image of the bug. Finally I could catch a glimpse of what happens, I longued
for this quite some time.

Strangely, as stated in the previous message, this bug does only happen in conjunction at a specific 
geographical position, so far it only happened there that is. How does this correlate with the designated
code at line 2076 in iwl-agn-rs.c?:

  /* Sanity-check TPT calculations */
  BUG_ON(window->average_tpt != ((window->success_ratio *
      tbl->expected_tpt[index] + 64) / 128));


Furthermore, it seems like there was already an encounter with this bug before. Previously I had no means 
of grabbing more information off it, though, so I wrote down some notes:

2010-03-20: iwl-agn-rs.c line 2076: bug
The file where I kept my note for this bug contains: 

- crashed during wireless AP scan using wpa_supplicant (after RFKILL switching to on..)
- this happend right after a fresh reboot (after crashing for the same bug)

- afterwards, auth.log contained this (bad, bad where's my data gone?):
Mar 20 18:24:01 localhost CRON[16684]: pam_unix(cron:session): session closed for user root
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@i686; it; rv:1.9.2.1) Gecko/20091211 Firefox/3.5.6 (like Firefox/3.5.11; Debian-3.5.11-1)"

Noticed the line in the bug from 2010-03-20? Yep, the same as in the one of today.

Happy to serve with more info if required (and available..).

  Regards,



[-- Attachment #2: 2010-04-29_kernel-oops_iwlagn_2076_cut4www.png --]
[-- Type: image/png, Size: 83116 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread
* RE: kernel BUG in iwl-agn-rs.c:2076, WAS: iwlagn + some accesspoint == hardlock
@ 2010-05-03 19:17 NilsRadtkelkml
  2010-05-03 19:22 ` John W. Linville
  0 siblings, 1 reply; 30+ messages in thread
From: NilsRadtkelkml @ 2010-05-03 19:17 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 4610 bytes --]


  Hello,

  Some of the previous messages about this BUG seem lost, so here again in a single message:

When connecting to the local university's wireless APs, rather regularly the kernel crashes.

So, connecting to one of those the system hardlocks. No panic LED blinking, no mouse cursor moving, 
no magic sysreq key working. Poweroff is the only way to revive the machine. Until the upcoming
lock-up.

What might serve as a hint: using knemo and kwifimanager (with wpa_supplicant in the bg) it feels like it is 
freezing more frequently. Not running kwifimanager seems to reduce the frequency w/o however eliminating it
completely. Sometimes iceweasel has been touched and then the crash occurred. 
Maybe some TX (or subsequent RX) provokes the crash? Some idea that also came up was that accessing iwlagn 
drivers isn't clean when it's happening from more that one place at the same time 
(->wpa_supplicant + kwifimanager)? [edit: this isn't probably it, or not solely]

This behaviour is currently only known to happen in this exact place/geolocation. So one assumption is that it
is somehow correlated to a particular type of access point, firmware respectively. Nonetheless, even
if there's some fw sending kill packets, the kernel should not crash, obviously.

The access points in range are of type: (obviously, fw versions are unknown..)

  - one 0:1d:8b PIRELLI BROADBAND SOLUTIONS 
  - couple of those 0:40:96 Cisco Systems, Inc.
  - one 0:1a:70 Cisco-Linksys

This notebook has no serial port, so we haven't been able to get a glimpse on what is happening at that 
precise moment. xconsole or dmesg haven't been helpful as the system just freezes and that's it.

Other circumstances leading to a crash (reproducibility):

  - RFKILL switch, switching radio on, (obviously) is a precondition. But then, it mainly 
    happens shortly after switching radio on,
  - it happens more frequently and faster after switching radio on when kde + firefox + xy is running

Please find attached an image of the bug. Finally I could catch a glimpse of what happens, I longued
for this quite some time.

Strangely, as stated in the previous message, this bug does only happen in conjunction at a specific 
geographical position, so far it only happened there that is. How does this correlate with the designated
code at line 2076 in iwl-agn-rs.c?:

  /* Sanity-check TPT calculations */
  BUG_ON(window->average_tpt != ((window->success_ratio *
      tbl->expected_tpt[index] + 64) / 128));


Furthermore, it seems like there was already an encounter with this bug before. Previously I had no means 
of grabbing more information off it, though, so I wrote down some notes:

2010-03-20: iwl-agn-rs.c line 2076: bug
The file where I kept my note for this bug contains: 

  - crashed during wireless AP scan using wpa_supplicant (after RFKILL switching to on..)
  - this happend right after a fresh reboot (after crashing for the same bug)

  - afterwards, auth.log contained this (bad, bad where's my data gone?):
Mar 20 18:24:01 localhost CRON[16684]: pam_unix(cron:session): session closed for user root
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@i686; it; rv:1.9.2.1) Gecko/20091211 Firefox/3.5.6 (like Firefox/3.5.11; Debian-3.5.11-1)"

Noticed the line in the bug from 2010-03-20? Yep, the same as in the one of today.

2010-04-30: One crash, two oopses: Msg restored mainly from remainders left in some brain's corner, w/ 
heavy massaging applied, mainly Code has been written down properly due to outside constrains..
Unfortunately, the proper ordering of lines got lost, so, yes, this is a poor report..
Order is as it's believed it was shown on-screen.

Number 1: nearly lost as subsequent oops made it scroll off-screen, only bottom half visible:

in CPU idle +x3f/x90

EIP rs_tx_status +x8f/x2030    SS: ESP 0668:f749fc4c
  
Code: 7f 22 69 c7 80 01 00 00 8b 8c 24 fc 01 00 00 8b 84 01 a8 00 00 00 6b 04 b0 64 39 44 24 1c 0f 8e 
      d2 fe ff ff 31 db e9 cb fe ff ff <0f> 0b fe 8b 54 24 4c 8b 8c 24 fc 01 00 00 8b 82 04 00 00 00

in iwl_rx_queue restock xbd/x130   iwlcore

panic -> oops begin 

kernel panic not syncing fatal exec in int

pid 0 comm: swapper tainted GD .33.2


Comments, ideas, fixes welcome... this BUG is highly annoying..

Happy to serve with more info if required (and available..).

  Regards,

      Nils


[-- Attachment #2: 2010-04-29_kernel-oops_iwlagn_2076_cut4www.png --]
[-- Type: image/png, Size: 83116 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: kernel BUG in iwl-agn-rs.c:2076, WAS: iwlagn + some accesspoint == hardlock
@ 2010-05-11  9:41 Nils Radtke
  0 siblings, 0 replies; 30+ messages in thread
From: Nils Radtke @ 2010-05-11  9:41 UTC (permalink / raw)
  To: reinette.chatre; +Cc: linville, linux-kernel, linux-wireless

  Hi,

Thanks a lot for the driver not hanging w/ bug_on() any more. At least the machine 
keeps working and when on battery no repeated reboots are required any more. That alone
already means a lot.

# On Mon, 2010-05-10 at 11:36 -0700, Nils Radtke wrote:
# >   Today weather was fine again, finally. So testing with .33.3 w/ the patch applied: 
# > 
# >   http://marc.info/?l=linux-wireless&m=127290931304496&w=2
# > 
# > The kernel kernel .32 was still running before it crashed immediately on wireless activation.
# > The crash log showed again at least two messages, the last was as already described in my first
# > message, bug from 2010-04-30: I think even the 0x2030 was the same:
# > 
# > EIP rs_tx_status +x8f/x2030 
# 
# You report an issue on 2.6.32 ...
Yes. These errors happened to be the same regardless of .32 or .33

# > W/ .33.3 and the above patch applied:
# 
# ... but then test the patch with 2.6.33.
# 
# Which kernel are you focused on?
Sorry, no intention to confuse or show erratic behaviour.. :)

It's just that the errors occur on both of them. Then I accidently booted the old one again (now
removed from the system), but again, the error showed up on .32, .33{1,2,3} . But you always had
had an indication which kernel it happened on.

OTH, it's basically the same, the identical error persists, so I can't seem the difference here. 
Except for a scientific approach one shouldn't do that, ACK. But, hey, I'd like to use the machine in
the meantime and happened to update the kernel source. 

# > Linux mypole 2.6.33.3 #18 SMP PREEMPT Thu May 6 21:51:37 CEST 2010 i686 GNU/Linux
# > 
# > May 10 19:14:11 [   80.586637] iwlagn 0000:03:00.0: expected_tpt should have been calculated by now
# > May 10 19:23:17 [  626.476078] iwlagn 0000:03:00.0: expected_tpt should have been calculated by now
# > May 10 19:23:30 [  638.913740] iwlagn 0000:03:00.0: expected_tpt should have been calculated by now
# > May 10 19:23:32 [  641.232425] iwlagn 0000:03:00.0: expected_tpt should have been calculated by now
# > May 10 19:23:54 [  663.392697] iwlagn 0000:03:00.0: expected_tpt should have been calculated by now
# > May 10 19:23:58 [  666.980247] iwlagn 0000:03:00.0: expected_tpt should have been calculated by now
# > May 10 19:24:02 [  671.121826] iwlagn 0000:03:00.0: expected_tpt should have been calculated by now
# Can you see any impact on your connection speed that can be connected to
# these messages?
I'm glad you're asking. Yes, indeed, speed it exceptionally low to what might be achievable. Around 30k/s
average, burst with maybe 200k/s, instead of 700k/s.

# > Additionally these were logged, could you tell why they're there and what to do? (also .33.3 w/ patch)
# > 
# > May 10 19:24:16 [  685.079617] iwlagn 0000:03:00.0: iwl_tx_agg_start on ra = 00:1a:70:12:23:25 tid = 0
# > May 10 19:24:22 [  691.026737] iwlagn 0000:03:00.0: iwl_tx_agg_start on ra = 00:1a:70:12:23:25 tid = 0
# > May 10 19:28:02 [  911.406162] iwlagn 0000:03:00.0: iwl_tx_agg_start on ra = 00:1a:70:12:23:25 tid = 0
# > May 10 19:35:38 [ 1367.251240] iwlagn 0000:03:00.0: iwl_tx_agg_start on ra = 00:1a:70:12:23:25 tid = 0
# > 
# > The above "iwl_tx_agg_start" lines happen when connecting - again to a Cisco AP - and the connection gets
# > dropped the exact moment when a download is started. It even often drops when dhcp is still negotiating, has
# > got it's IP but the nego isn't finished yet. Conn drops, same procedure again and again. This happens only
# > with this Cisco AP (which is BTW another one from the "expected_tpt should have been calculated by now" 
# > problem).
# It could be that some of the queues get stuck. Can you try with the
# patches in
# http://bugzilla.intellinuxwireless.org/show_bug.cgi?id=2037#c113 ? They
# are based on 2.6.33.
Good, no wait, bad, now running on .34-rc7. *sigh

I'll apply the patches to .33. .34-rc7 hadn't brought the desired success w/ the olicard100 usb-umts-stick.

Update: noticed you mean 2.6.33 not .33.x ;) On .33.3 it doesn't apply cleanly for a couple of files..
Any objections if I apply it to .33.3 anyway? (Fixing the rej of course..)

Interestingly enough, quilt import 0001*patch imports, quilt push patches but it applies the patch w/o
rej. patch -p1 0001*patch does recognize the patch already applied and rejects..

All patches applied successfully, trying again these days.

Thanks for your comments.

Will keep you informed.

Nils


^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2010-06-10 16:19 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-04-29 18:26 kernel BUG in iwl-agn-rs.c:2076,was: iwlagn + some accesspoint = hardlock lkml
  -- strict thread matches above, loose matches on Subject: below --
2010-05-03 19:17 kernel BUG in iwl-agn-rs.c:2076, WAS: iwlagn + some accesspoint == hardlock NilsRadtkelkml
2010-05-03 19:22 ` John W. Linville
2010-05-06  9:14   ` Christian Borntraeger
2010-05-06 16:28     ` reinette chatre
2010-05-11 15:50       ` Christian Borntraeger
2010-05-11 17:21         ` reinette chatre
2010-05-12 15:18           ` Christian Borntraeger
2010-05-10 18:36   ` Nils Radtke
2010-05-10 23:32     ` reinette chatre
2010-05-12 14:39       ` Nils Radtke
2010-05-12 23:14         ` reinette chatre
2010-05-13 10:34           ` Nils Radtke
2010-05-13 11:32           ` Nils Radtke
2010-05-13 16:31             ` reinette chatre
2010-05-14 17:45               ` Nils Radtke
2010-05-13 15:05           ` Nils Radtke
2010-05-17 23:19             ` reinette chatre
2010-05-20 12:15               ` Nils Radtke
2010-05-20 18:33                 ` reinette chatre
2010-05-31 20:12                   ` Nils Radtke
2010-06-02 17:51                     ` reinette chatre
2010-06-04 16:57                       ` Nils Radtke
2010-06-08 17:46                         ` reinette chatre
2010-06-10 14:22                           ` Nils Radtke
2010-06-10 16:19                             ` reinette chatre
2010-05-20 12:31               ` Nils Radtke
2010-05-20 18:26                 ` reinette chatre
2010-05-20 22:30                 ` David Miller
2010-05-11  9:41 Nils Radtke

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).