Kernel Panic in mac80211

linux-wireless.vger.kernel.org archive mirror

* Kernel Panic in mac80211
@ 2008-01-24  7:16 Larry Finger
  2008-01-24 14:33 ` Johannes Berg
  0 siblings, 1 reply; 6+ messages in thread
From: Larry Finger @ 2008-01-24  7:16 UTC
  To: Johannes Berg; +Cc: wireless, John Linville

I have been having "random" kernel panics where the "Caps Lock" LED is flashing at ~1 Hz. These 
crashes only occur for the wireless-2.6 tree and have been happening for roughly 3 weeks. After 
running a memory test to ensure that these panics were not caused by a hardware problem, I enabled 
netconsole logging and caught the following crash report for my x86_64 system:

Unable to handle kernel paging request at ffffffff88246288 RIP:
  [<ffffffff88202940>] :mac80211:rate_control_pid_tx_status+0x426/0x45a
PGD 203067 PUD 207063 PMD 580d3067 PTE 0
Oops: 0000 [1] SMP
CPU 1
Modules linked in: netconsole af_packet snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device sunrpc 
rfkill_input cpufreq_conservative cpufreq_ondemand cpufreq_userspace cpufreq_powersave powernow_k8 
freq_table usbserial ipt_MASQUERADE xt_state nf_nat_ftp iptable_nat nf_nat nf_conntrack_ftp 
nf_conntrack_ipv4 nf_conntrack iptable_filter ip_tables x_tables fuse loop dm_mod snd_hda_intel 
snd_pcm b43 rfkill mac80211 led_class snd_timer snd soundcore rtc_cmos rtc_core k8temp 
snd_page_alloc rtc_lib forcedeth hwmon ohci1394 sdhci mmc_core sr_mod ieee1394 cdrom ssb serio_raw 
i2c_nforce2 output button sg ehci_hcd ohci_hcd sd_mod usbcore edd ext3 mbcache jbd fan sata_nv 
pata_amd libata scsi_mod thermal processor
Pid: 0, comm: swapper Not tainted 2.6.24-rc7-L2.6-gbc77eb36-dirty #11
RIP: 0010:[<ffffffff88202940>]  [<ffffffff88202940>] :mac80211:rate_control_pid_tx_status+0x426/0x45a
RSP: 0018:ffff810059b77dd0  EFLAGS: 00010202
RAX: 0000000000003600 RBX: ffff8100581e6000 RCX: 0000000000000240
RDX: ffffffff88242c80 RSI: 0000000000000001 RDI: ffff810056066800
RBP: ffff810059b77e00 R08: 0000000000000007 R09: 000000000000000b
R10: 000000000000000c R11: ffff8100581e6010 R12: 0000000000000001
R13: 00000000ffffffff R14: 0000000000000000 R15: ffff810037a75028
FS:  00002b0125b744f0(0000) GS:ffff810059801780(0000) knlGS:00000000f70de6c0
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: ffffffff88246288 CR3: 0000000056044000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff810059b72000, task ffff810059b70000)
Stack:  ffff810058b2e5c8 ffff810058071400 ffff810058b2e460 ffff810058b2e460
  ffff810058165ba0 ffff810053e9d823 ffff810059b77e50 ffffffff881ee684
  ffff810058b2e5b0 ffff810058071400 ffff810059b77e50 ffff810058071400
Call Trace:
  <IRQ>  [<ffffffff881ee684>] :mac80211:ieee80211_tx_status+0x19d/0x424
  [<ffffffff881eee0a>] :mac80211:ieee80211_tasklet_handler+0x7e/0xe2
  [<ffffffff8023ef23>] tasklet_action+0x6e/0xc8
  [<ffffffff8023edf6>] __do_softirq+0x70/0xf1
  [<ffffffff8020b03c>] default_idle+0x0/0x51
  [<ffffffff8020d38c>] call_softirq+0x1c/0x28
  [<ffffffff8020f2fb>] do_softirq+0x39/0x9f
  [<ffffffff8023ed44>] irq_exit+0x4e/0x90
  [<ffffffff8020f41b>] do_IRQ+0xba/0xdb
  [<ffffffff8020b03c>] default_idle+0x0/0x51
  [<ffffffff8020c686>] ret_from_intr+0x0/0xf
  <EOI>  [<ffffffff8020b073>] default_idle+0x37/0x51
  [<ffffffff8020b071>] default_idle+0x35/0x51
  [<ffffffff8020b133>] cpu_idle+0xa6/0xce
  [<ffffffff8021df9f>] start_secondary+0x3de/0x3ef


Code: f6 44 02 08 10 74 12 45 85 ed 78 05 44 39 e9 7f 08 89 8f 24
RIP  [<ffffffff88202940>] :mac80211:rate_control_pid_tx_status+0x426/0x45a
  RSP <ffff810059b77dd0>
CR2: ffffffff88246288
---[ end trace 638a30c2fdaf8180 ]---
Kernel panic - not syncing: Aiee, killing interrupt handler!

I have not yet figured out which instruction is found at mac80211:rate_control_pid_tx_status+0x426, 
but I will continue to work on it. I wanted to get this report filed so that the problem can be 
found before 2.6.25-rcX comes out.

Larry


* Re: Kernel Panic in mac80211
  2008-01-24  7:16 Kernel Panic in mac80211 Larry Finger
@ 2008-01-24 14:33 ` Johannes Berg
  2008-01-25  4:35   ` Larry Finger
  0 siblings, 1 reply; 6+ messages in thread
From: Johannes Berg @ 2008-01-24 14:33 UTC
  To: Larry Finger; +Cc: wireless, John Linville, Stefano Brivio


On Thu, 2008-01-24 at 00:16 -0700, Larry Finger wrote:
> I have been having "random" kernel panics where the "Caps Lock" LED is flashing at ~1 Hz. These 
> crashes only occur for the wireless-2.6 tree and have been happening for roughly 3 weeks. After 
> running a memory test to ensure that these panics were not caused by a hardware problem, I enabled 
> netconsole logging and caught the following crash report for my x86_64 system:

> Code: f6 44 02 08 10 74 12 45 85 ed 78 05 44 39 e9 7f 08 89 8f 24
> RIP  [<ffffffff88202940>] :mac80211:rate_control_pid_tx_status+0x426/0x45a
>   RSP <ffff810059b77dd0>
> CR2: ffffffff88246288
> ---[ end trace 638a30c2fdaf8180 ]---
> Kernel panic - not syncing: Aiee, killing interrupt handler!
> 
> I have not yet figured out which instruction is found at mac80211:rate_control_pid_tx_status+0x426, 
> but I will continue to work on it. I wanted to get this report filed so that the problem can be 
> found before 2.6.25-rcX comes out.

Damn, I've seen that too but blamed it on my own patching. Stefano, any
idea? IIRC some sta struct was NULL in pid_tx_status.

johannes


* Re: Kernel Panic in mac80211
  2008-01-24 14:33 ` Johannes Berg
@ 2008-01-25  4:35   ` Larry Finger
  2008-01-25  5:24     ` Stefano Brivio
  2008-01-25 20:59     ` Johannes Berg
  0 siblings, 2 replies; 6+ messages in thread
From: Larry Finger @ 2008-01-25  4:35 UTC
  To: Johannes Berg; +Cc: wireless, John Linville, Stefano Brivio

Johannes Berg wrote:
> On Thu, 2008-01-24 at 00:16 -0700, Larry Finger wrote:
>> I have been having "random" kernel panics where the "Caps Lock" LED is flashing at ~1 Hz. These 
>> crashes only occur for the wireless-2.6 tree and have been happening for roughly 3 weeks. After 
>> running a memory test to ensure that these panics were not caused by a hardware problem, I enabled 
>> netconsole logging and caught the following crash report for my x86_64 system:
> 
>> Code: f6 44 02 08 10 74 12 45 85 ed 78 05 44 39 e9 7f 08 89 8f 24
>> RIP  [<ffffffff88202940>] :mac80211:rate_control_pid_tx_status+0x426/0x45a
> 
> Damn, I've seen that too but blamed it on my own patching. Stefano, any
> idea? IIRC some sta struct was NULL in pid_tx_status.

The problem is not a NULL in one of the structs, but a runaway loop. The error occurs in the
following loop in rate_control_pid_adjust_rate():

         while (newidx != sta->txrate) {
                 if (rate_supported(sta, mode, newidx) &&
                     (maxrate < 0 || newidx <= maxrate)) {
                         sta->txrate = newidx;
                         break;
                 }

                 newidx += back;
         }

The panic triggers in rate_supported(), which is compiled inline, with newidx having a value of 576
at the time of the panic! I'm not sure of the fix, but I think newidx should always be <
mode->num_rates. The following patch should cure the crash, but may not be the best fix.

Index: wireless-2.6/net/mac80211/rc80211_pid_algo.c
===================================================================
--- wireless-2.6.orig/net/mac80211/rc80211_pid_algo.c
+++ wireless-2.6/net/mac80211/rc80211_pid_algo.c
@@ -123,6 +123,8 @@ static void rate_control_pid_adjust_rate
  		}

  		newidx += back;
+		if (newidx < 0 || newidx >= mode->num_rates)
+			return;
  	}

  #ifdef CONFIG_MAC80211_DEBUGFS

This patch has only been compile-tested so far, but it will get further testing after this e-mail
is sent.
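
The effect of the bounds check in the patch above can be sketched as a small
user-space simulation. This is only an illustration under stated assumptions:
walk_rates(), struct walk_result, NUM_RATES and the max_steps cap are
hypothetical names that do not exist in mac80211, the maxrate test is left out
for brevity, and `back` is passed in as a parameter because the thread does not
show how it is computed. Out-of-table indexes are only flagged here; the kernel
loop would actually read memory at that index.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_RATES 12   /* assumed size of the rate table (hypothetical) */

struct walk_result {
    int txrate;    /* rate index in effect after the walk */
    int last_idx;  /* last value newidx took */
    bool escaped;  /* true if an index outside the table was probed */
};

/*
 * Simulate the "while (newidx != sta->txrate)" walk from the patch.
 * With guarded == true the walk stops as soon as newidx leaves
 * [0, NUM_RATES), which is what the two added lines do. max_steps
 * only keeps the unguarded simulation finite.
 */
static struct walk_result walk_rates(const bool supported[NUM_RATES],
                                     int txrate, int newidx, int back,
                                     bool guarded, int max_steps)
{
    struct walk_result r = { txrate, newidx, false };
    int steps = 0;

    assert(back == 1 || back == -1);

    while (newidx != txrate && steps < max_steps) {
        if (newidx < 0 || newidx >= NUM_RATES) {
            /* The real loop would index past the rate table here. */
            r.escaped = true;
        } else if (supported[newidx]) {
            r.txrate = newidx;
            break;
        }

        newidx += back;
        steps++;
        /* The check added by the patch: bail out, txrate unchanged. */
        if (guarded && (newidx < 0 || newidx >= NUM_RATES))
            break;
    }

    r.last_idx = newidx;
    return r;
}
```

With only indexes 0 to 3 marked as supported, a current rate of 3, a starting
newidx of 5 and back = 1, the unguarded walk never finds a supported rate and
keeps incrementing until the step cap, while the guarded walk stops at index 12
and leaves the current rate alone.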

Larry



* Re: Kernel Panic in mac80211
  2008-01-25  4:35   ` Larry Finger
@ 2008-01-25  5:24     ` Stefano Brivio
  2008-01-25  7:12       ` Larry Finger
  2008-01-25 20:59     ` Johannes Berg
  1 sibling, 1 reply; 6+ messages in thread
From: Stefano Brivio @ 2008-01-25  5:24 UTC
  To: Larry Finger; +Cc: Johannes Berg, wireless, John Linville

[D'oh, I missed this report until now.]

On Thu, 24 Jan 2008 21:35:13 -0700
Larry Finger <larry.finger@lwfinger.net> wrote:

> Johannes Berg wrote:
> > On Thu, 2008-01-24 at 00:16 -0700, Larry Finger wrote:
> >> I have been having "random" kernel panics where the "Caps Lock" LED is flashing at ~1 Hz. These 
> >> crashes only occur for the wireless-2.6 tree and have been happening for roughly 3 weeks. After 
> >> running a memory test to ensure that these panics were not caused by a hardware problem, I enabled 
> >> netconsole logging and caught the following crash report for my x86_64 system:
> > 
> >> Code: f6 44 02 08 10 74 12 45 85 ed 78 05 44 39 e9 7f 08 89 8f 24
> >> RIP  [<ffffffff88202940>] :mac80211:rate_control_pid_tx_status+0x426/0x45a
> > 
> > Damn, I've seen that too but blamed it on my own patching. Stefano, any
> > idea? IIRC some sta struct was NULL in pid_tx_status.
> 
> The problem is not a NULL in one of the structs, but a runaway loop. The error occurs in the
> following loop in rate_control_pid_adjust_rate():
> 
>          while (newidx != sta->txrate) {
>                  if (rate_supported(sta, mode, newidx) &&
>                      (maxrate < 0 || newidx <= maxrate)) {
>                          sta->txrate = newidx;
>                          break;
>                  }
> 
>                  newidx += back;
>          }

Is this commit in the tree you are testing?

commit 5bfcaca1279835867e2aa3406cfaf2fd7d92ff7c
Author: Stefano Brivio <stefano.brivio@polimi.it>
Date:   Sun Dec 23 04:41:19 2007 +0100

    rc80211-pid: simplify and fix shift_adjust

> The panic triggers in rate_supported(), which is compiled in-line, with newidx having a value of 576
> at the time of the panic!! I'm not sure of the fix, but I think newindex should always be <=
> mode->num_rates. The following patch should cure the crash, but may not be the best fix.

Sure, but rate_control_pid_shift_adjust() ensures that the newidx we
start with is within range, so I can't actually explain how a
runaway can ever happen (as soon as we hit the lower or the upper
limit, that should be a supported rate!). A bug used to prevent that from
working correctly, but it should be fixed by the commit I mentioned above.

> Index: wireless-2.6/net/mac80211/rc80211_pid_algo.c
> ===================================================================
> --- wireless-2.6.orig/net/mac80211/rc80211_pid_algo.c
> +++ wireless-2.6/net/mac80211/rc80211_pid_algo.c
> @@ -123,6 +123,8 @@ static void rate_control_pid_adjust_rate
>   		}
> 
>   		newidx += back;
> +		if (newidx < 0 || newidx >= mode->num_rates)
> +			return;
>   	}
> 
>   #ifdef CONFIG_MAC80211_DEBUGFS
> 
> This patch has been compile tested at the moment, but it will get further testing after this E-mail
> is sent.

ACK, this can be useful as an additional sanity check, even if that
shouldn't be needed. Please ensure you have that commit in your tree -- I'll
investigate further, in case. Thank you for the report, I never hit that!


--
Ciao
Stefano


* Re: Kernel Panic in mac80211
  2008-01-25  5:24     ` Stefano Brivio
@ 2008-01-25  7:12       ` Larry Finger
  0 siblings, 0 replies; 6+ messages in thread
From: Larry Finger @ 2008-01-25  7:12 UTC
  To: Stefano Brivio; +Cc: Johannes Berg, wireless, John Linville

Stefano Brivio wrote:
> [D'oh, I missed this report until now.]
> 
> On Thu, 24 Jan 2008 21:35:13 -0700
> Larry Finger <larry.finger@lwfinger.net> wrote:
> 
>> Johannes Berg wrote:
>>> On Thu, 2008-01-24 at 00:16 -0700, Larry Finger wrote:
>>>> I have been having "random" kernel panics where the "Caps Lock" LED is flashing at ~1 Hz. These 
>>>> crashes only occur for the wireless-2.6 tree and have been happening for roughly 3 weeks. After 
>>>> running a memory test to ensure that these panics were not caused by a hardware problem, I enabled 
>>>> netconsole logging and caught the following crash report for my x86_64 system:
>>>> Code: f6 44 02 08 10 74 12 45 85 ed 78 05 44 39 e9 7f 08 89 8f 24
>>>> RIP  [<ffffffff88202940>] :mac80211:rate_control_pid_tx_status+0x426/0x45a
>>> Damn, I've seen that too but blamed it on my own patching. Stefano, any
>>> idea? IIRC some sta struct was NULL in pid_tx_status.
>> The problem is not a NULL in one of the structs, but a runaway loop. The error occurs in the
>> following loop in rate_control_pid_adjust_rate():
>>
>>          while (newidx != sta->txrate) {
>>                  if (rate_supported(sta, mode, newidx) &&
>>                      (maxrate < 0 || newidx <= maxrate)) {
>>                          sta->txrate = newidx;
>>                          break;
>>                  }
>>
>>                  newidx += back;
>>          }
> 
> Is this commit in the tree you are testing?
> 
> commit 5bfcaca1279835867e2aa3406cfaf2fd7d92ff7c
> Author: Stefano Brivio <stefano.brivio@polimi.it>
> Date:   Sun Dec 23 04:41:19 2007 +0100
> 
>     rc80211-pid: simplify and fix shift_adjust
> 
>> The panic triggers in rate_supported(), which is compiled in-line, with newidx having a value of 576
>> at the time of the panic!! I'm not sure of the fix, but I think newindex should always be <=
>> mode->num_rates. The following patch should cure the crash, but may not be the best fix.
> 
> Sure, but rate_control_pid_shift_adjust() ensures that the newindex we
> start with is within the ranges, so that I can't actually explain how
> run-away can ever happen (because as soon as we hit the lower or the higher
> limit, that should be a supported rate!) However, a bug prevented that from
> working correctly, but should be fixed by the commit I mentioned above.
> 
>> Index: wireless-2.6/net/mac80211/rc80211_pid_algo.c
>> ===================================================================
>> --- wireless-2.6.orig/net/mac80211/rc80211_pid_algo.c
>> +++ wireless-2.6/net/mac80211/rc80211_pid_algo.c
>> @@ -123,6 +123,8 @@ static void rate_control_pid_adjust_rate
>>   		}
>>
>>   		newidx += back;
>> +		if (newidx < 0 || newidx >= mode->num_rates)
>> +			return;
>>   	}
>>
>>   #ifdef CONFIG_MAC80211_DEBUGFS
>>
>> This patch has been compile tested at the moment, but it will get further testing after this E-mail
>> is sent.
> 
> ACK, this can be useful as an additional sanity check, even if that
> shouldn't be needed. Please ensure you have that commit in your tree -- I'll
> investigate further, in case. Thank you for the report, I never hit that!

Yes, the commit you mentioned is in my tree. I'm now running with some additional diagnostics. If my
new check is triggered, I'll let you know what I find.

Larry


* Re: Kernel Panic in mac80211
  2008-01-25  4:35   ` Larry Finger
  2008-01-25  5:24     ` Stefano Brivio
@ 2008-01-25 20:59     ` Johannes Berg
  1 sibling, 0 replies; 6+ messages in thread
From: Johannes Berg @ 2008-01-25 20:59 UTC
  To: Larry Finger; +Cc: wireless, John Linville, Stefano Brivio


> >> Code: f6 44 02 08 10 74 12 45 85 ed 78 05 44 39 e9 7f 08 89 8f 24
> >> RIP  [<ffffffff88202940>] :mac80211:rate_control_pid_tx_status+0x426/0x45a
> > 
> > Damn, I've seen that too but blamed it on my own patching. Stefano, any
> > idea? IIRC some sta struct was NULL in pid_tx_status.
> 
> The problem is not a NULL in one of the structs, but a runaway loop.

Ah, you're right, I remember now that the access was to some random,
rather high address, which matches what you observed.
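
The register dump in the original report appears consistent with this. As a
worked check, and assuming the first bytes of the "Code:" line (f6 44 02 08 10)
are the faulting instruction, they decode to "testb $0x10,0x8(%rdx,%rax,1)", so
the faulting address should be RDX + RAX + 8. The sketch below redoes that
arithmetic with the values from the oops. The helper names are hypothetical,
and the 24-byte stride is only inferred from RAX / RCX, not taken from any
struct definition.

```c
#include <stdint.h>

/* Register values copied from the oops in the original report. */
#define OOPS_RAX 0x0000000000003600ULL  /* byte offset into the table */
#define OOPS_RCX 0x0000000000000240ULL  /* 576, the newidx Larry reported */
#define OOPS_RDX 0xffffffff88242c80ULL  /* presumably the rate table base */
#define OOPS_CR2 0xffffffff88246288ULL  /* faulting address */

/* Effective address of an x86 "disp(base,index,1)" memory operand. */
static uint64_t effective_address(uint64_t base, uint64_t index,
                                  uint64_t disp)
{
    return base + index + disp;
}

/* Element size implied by a byte offset and the index that produced it. */
static uint64_t implied_stride(uint64_t byte_offset, uint64_t index)
{
    return byte_offset / index;
}
```

effective_address(OOPS_RDX, OOPS_RAX, 8) equals OOPS_CR2, and
implied_stride(OOPS_RAX, OOPS_RCX) is 24, so the fault looks like a read of a
flags byte at offset 8 in element 576 of an array with 24-byte elements, far
past the end of any real rate table.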

johannes
