* Kernel Panic in mac80211
From: Larry Finger @ 2008-01-24 7:16 UTC
To: Johannes Berg; +Cc: wireless, John Linville

I have been having "random" kernel panics where the "Caps Lock" LED is flashing at ~1 Hz. These
crashes only occur for the wireless-2.6 tree and have been happening for roughly 3 weeks. After
running a memory test to ensure that these panics were not caused by a hardware problem, I enabled
netconsole logging and caught the following crash report for my x86_64 system:

Unable to handle kernel paging request at ffffffff88246288 RIP:
 [<ffffffff88202940>] :mac80211:rate_control_pid_tx_status+0x426/0x45a
PGD 203067 PUD 207063 PMD 580d3067 PTE 0
Oops: 0000 [1] SMP
CPU 1
Modules linked in: netconsole af_packet snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device sunrpc
 rfkill_input cpufreq_conservative cpufreq_ondemand cpufreq_userspace cpufreq_powersave powernow_k8
 freq_table usbserial ipt_MASQUERADE xt_state nf_nat_ftp iptable_nat nf_nat nf_conntrack_ftp
 nf_conntrack_ipv4 nf_conntrack iptable_filter ip_tables x_tables fuse loop dm_mod snd_hda_intel
 snd_pcm b43 rfkill mac80211 led_class snd_timer snd soundcore rtc_cmos rtc_core k8temp
 snd_page_alloc rtc_lib forcedeth hwmon ohci1394 sdhci mmc_core sr_mod ieee1394 cdrom ssb serio_raw
 i2c_nforce2 output button sg ehci_hcd ohci_hcd sd_mod usbcore edd ext3 mbcache jbd fan sata_nv
 pata_amd libata scsi_mod thermal processor
Pid: 0, comm: swapper Not tainted 2.6.24-rc7-L2.6-gbc77eb36-dirty #11
RIP: 0010:[<ffffffff88202940>] [<ffffffff88202940>] :mac80211:rate_control_pid_tx_status+0x426/0x45a
RSP: 0018:ffff810059b77dd0 EFLAGS: 00010202
RAX: 0000000000003600 RBX: ffff8100581e6000 RCX: 0000000000000240
RDX: ffffffff88242c80 RSI: 0000000000000001 RDI: ffff810056066800
RBP: ffff810059b77e00 R08: 0000000000000007 R09: 000000000000000b
R10: 000000000000000c R11: ffff8100581e6010 R12: 0000000000000001
R13: 00000000ffffffff R14: 0000000000000000 R15: ffff810037a75028
FS: 00002b0125b744f0(0000) GS:ffff810059801780(0000) knlGS:00000000f70de6c0
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: ffffffff88246288 CR3: 0000000056044000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff810059b72000, task ffff810059b70000)
Stack: ffff810058b2e5c8 ffff810058071400 ffff810058b2e460 ffff810058b2e460
 ffff810058165ba0 ffff810053e9d823 ffff810059b77e50 ffffffff881ee684
 ffff810058b2e5b0 ffff810058071400 ffff810059b77e50 ffff810058071400
Call Trace:
 <IRQ> [<ffffffff881ee684>] :mac80211:ieee80211_tx_status+0x19d/0x424
 [<ffffffff881eee0a>] :mac80211:ieee80211_tasklet_handler+0x7e/0xe2
 [<ffffffff8023ef23>] tasklet_action+0x6e/0xc8
 [<ffffffff8023edf6>] __do_softirq+0x70/0xf1
 [<ffffffff8020b03c>] default_idle+0x0/0x51
 [<ffffffff8020d38c>] call_softirq+0x1c/0x28
 [<ffffffff8020f2fb>] do_softirq+0x39/0x9f
 [<ffffffff8023ed44>] irq_exit+0x4e/0x90
 [<ffffffff8020f41b>] do_IRQ+0xba/0xdb
 [<ffffffff8020b03c>] default_idle+0x0/0x51
 [<ffffffff8020c686>] ret_from_intr+0x0/0xf
 <EOI> [<ffffffff8020b073>] default_idle+0x37/0x51
 [<ffffffff8020b071>] default_idle+0x35/0x51
 [<ffffffff8020b133>] cpu_idle+0xa6/0xce
 [<ffffffff8021df9f>] start_secondary+0x3de/0x3ef

Code: f6 44 02 08 10 74 12 45 85 ed 78 05 44 39 e9 7f 08 89 8f 24
RIP [<ffffffff88202940>] :mac80211:rate_control_pid_tx_status+0x426/0x45a
 RSP <ffff810059b77dd0>
CR2: ffffffff88246288
---[ end trace 638a30c2fdaf8180 ]---
Kernel panic - not syncing: Aiee, killing interrupt handler!

I have not yet figured out which instruction is found at mac80211:rate_control_pid_tx_status+0x426,
but I will continue to work on it. I wanted to get this report filed so that the problem can be
found before 2.6.25-rcX comes out.

Larry
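The faulting access can be cross-checked from the register dump alone. What follows is a minimal sketch, assuming the Code: bytes start at the faulting instruction (the usual marker around the faulting byte is missing from this report). Under that assumption, f6 44 02 08 10 decodes as testb $0x10,0x8(%rdx,%rax,1), so the address being read should be RDX + RAX + 8. The helper names are hypothetical and only reproduce the arithmetic; they are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Register values copied from the oops above. */
static const uint64_t OOPS_RAX = 0x0000000000003600ULL;
static const uint64_t OOPS_RCX = 0x0000000000000240ULL;
static const uint64_t OOPS_RDX = 0xffffffff88242c80ULL;
static const uint64_t OOPS_CR2 = 0xffffffff88246288ULL;

/*
 * Effective address of an x86-64 memory operand of the form
 * disp8(base,index,1), e.g. 0x8(%rdx,%rax,1).
 * Hypothetical helper, for illustration only.
 */
static uint64_t effective_address(uint64_t base, uint64_t index, int8_t disp)
{
	/* Sign-extend the 8-bit displacement before adding it. */
	return base + index + (uint64_t)(int64_t)disp;
}

/*
 * Element size implied by a byte offset and an element index.
 * Returns 0 if the offset is not an exact multiple of the index.
 */
static uint64_t implied_stride(uint64_t byte_offset, uint64_t index)
{
	assert(index != 0);
	return (byte_offset % index == 0) ? byte_offset / index : 0;
}
```

With these values, effective_address(OOPS_RDX, OOPS_RAX, 8) equals OOPS_CR2, and implied_stride(OOPS_RAX, OOPS_RCX) is 24 with RCX = 0x240 = 576. That is consistent with a read at index 576 of an array of 24-byte elements, far past the end of any plausible rate table. Which array is being indexed is an inference here, not something the oops states.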
* Re: Kernel Panic in mac80211
From: Johannes Berg @ 2008-01-24 14:33 UTC
To: Larry Finger; +Cc: wireless, John Linville, Stefano Brivio

On Thu, 2008-01-24 at 00:16 -0700, Larry Finger wrote:
> I have been having "random" kernel panics where the "Caps Lock" LED is flashing at ~1 Hz. These
> crashes only occur for the wireless-2.6 tree and have been happening for roughly 3 weeks. After
> running a memory test to ensure that these panics were not caused by a hardware problem, I enabled
> netconsole logging and caught the following crash report for my x86_64 system:

> Code: f6 44 02 08 10 74 12 45 85 ed 78 05 44 39 e9 7f 08 89 8f 24
> RIP [<ffffffff88202940>] :mac80211:rate_control_pid_tx_status+0x426/0x45a
> RSP <ffff810059b77dd0>
> CR2: ffffffff88246288
> ---[ end trace 638a30c2fdaf8180 ]---
> Kernel panic - not syncing: Aiee, killing interrupt handler!
>
> I have not yet figured out which instruction is found at mac80211:rate_control_pid_tx_status+0x426,
> but I will continue to work on it. I wanted to get this report filed so that the problem can be
> found before 2.6.25-rcX comes out.

Damn, I've seen that too but blamed it on my own patching. Stefano, any
idea? IIRC some sta struct was NULL in pid_tx_status.

johannes
* Re: Kernel Panic in mac80211
From: Larry Finger @ 2008-01-25 4:35 UTC
To: Johannes Berg; +Cc: wireless, John Linville, Stefano Brivio

Johannes Berg wrote:
> On Thu, 2008-01-24 at 00:16 -0700, Larry Finger wrote:
>> I have been having "random" kernel panics where the "Caps Lock" LED is flashing at ~1 Hz. These
>> crashes only occur for the wireless-2.6 tree and have been happening for roughly 3 weeks. After
>> running a memory test to ensure that these panics were not caused by a hardware problem, I enabled
>> netconsole logging and caught the following crash report for my x86_64 system:
>
>> Code: f6 44 02 08 10 74 12 45 85 ed 78 05 44 39 e9 7f 08 89 8f 24
>> RIP [<ffffffff88202940>] :mac80211:rate_control_pid_tx_status+0x426/0x45a
>
> Damn, I've seen that too but blamed it on my own patching. Stefano, any
> idea? IIRC some sta struct was NULL in pid_tx_status.

The problem is not a NULL in one of the structs, but a runaway loop. The error occurs in the
following loop in rate_control_pid_adjust_rate():

	while (newidx != sta->txrate) {
		if (rate_supported(sta, mode, newidx) &&
		    (maxrate < 0 || newidx <= maxrate)) {
			sta->txrate = newidx;
			break;
		}

		newidx += back;
	}

The panic triggers in rate_supported(), which is compiled in-line, with newidx having a value of 576
at the time of the panic!! I'm not sure of the fix, but I think newindex should always be <=
mode->num_rates. The following patch should cure the crash, but may not be the best fix.

Index: wireless-2.6/net/mac80211/rc80211_pid_algo.c
===================================================================
--- wireless-2.6.orig/net/mac80211/rc80211_pid_algo.c
+++ wireless-2.6/net/mac80211/rc80211_pid_algo.c
@@ -123,6 +123,8 @@ static void rate_control_pid_adjust_rate
 		}
 
 		newidx += back;
+		if (newidx < 0 || newidx >= mode->num_rates)
+			return;
 	}
 
 #ifdef CONFIG_MAC80211_DEBUGFS

This patch has been compile tested at the moment, but it will get further testing after this E-mail
is sent.

Larry
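The effect of the bounds check in the patch above can be shown outside the kernel. The following is a minimal sketch, not the mac80211 code: struct rate_table and pick_rate() are hypothetical stand-ins for the mode's rate array and for the search loop in rate_control_pid_adjust_rate(). It differs from the patch in one respect, which is an assumption of this sketch: it validates the index before every table access rather than after each step, so an out-of-range starting index is caught as well.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-mode rate table (not the mac80211 struct). */
struct rate_table {
	int num_rates;
	const bool *supported;	/* supported[i] is true if rate i is usable */
};

/*
 * Walk from 'start' in direction 'step' (+1 or -1) looking for a usable rate.
 * Returns the new rate index, or 'cur' if the walk reaches 'cur' or leaves
 * the table without finding one. A negative 'maxrate' means no upper limit.
 */
static int pick_rate(const struct rate_table *t, int cur, int start,
		     int step, int maxrate)
{
	int idx = start;

	assert(step == 1 || step == -1);

	while (idx != cur) {
		/* Stop before touching the table with an invalid index. */
		if (idx < 0 || idx >= t->num_rates)
			return cur;

		if (t->supported[idx] && (maxrate < 0 || idx <= maxrate))
			return idx;

		idx += step;
	}

	return cur;
}
```

Without the range check, a walk that steps away from the current index and meets no supported rate never terminates on its own; it keeps reading past the table until it faults, which matches the index of 576 reported above. With the check, the same walk gives up and leaves the current rate unchanged.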
* Re: Kernel Panic in mac80211
From: Stefano Brivio @ 2008-01-25 5:24 UTC
To: Larry Finger; +Cc: Johannes Berg, wireless, John Linville

[D'oh, I missed this report until now.]

On Thu, 24 Jan 2008 21:35:13 -0700
Larry Finger <larry.finger@lwfinger.net> wrote:

> Johannes Berg wrote:
> > On Thu, 2008-01-24 at 00:16 -0700, Larry Finger wrote:
> >> I have been having "random" kernel panics where the "Caps Lock" LED is flashing at ~1 Hz. These
> >> crashes only occur for the wireless-2.6 tree and have been happening for roughly 3 weeks. After
> >> running a memory test to ensure that these panics were not caused by a hardware problem, I enabled
> >> netconsole logging and caught the following crash report for my x86_64 system:
> >
> >> Code: f6 44 02 08 10 74 12 45 85 ed 78 05 44 39 e9 7f 08 89 8f 24
> >> RIP [<ffffffff88202940>] :mac80211:rate_control_pid_tx_status+0x426/0x45a
> >
> > Damn, I've seen that too but blamed it on my own patching. Stefano, any
> > idea? IIRC some sta struct was NULL in pid_tx_status.
>
> The problem is not a NULL in one of the structs, but a runaway loop. The error occurs in the
> following loop in rate_control_pid_adjust_rate():
>
> 	while (newidx != sta->txrate) {
> 		if (rate_supported(sta, mode, newidx) &&
> 		    (maxrate < 0 || newidx <= maxrate)) {
> 			sta->txrate = newidx;
> 			break;
> 		}
>
> 		newidx += back;
> 	}

Is this commit in the tree you are testing?

commit 5bfcaca1279835867e2aa3406cfaf2fd7d92ff7c
Author: Stefano Brivio <stefano.brivio@polimi.it>
Date: Sun Dec 23 04:41:19 2007 +0100

    rc80211-pid: simplify and fix shift_adjust

> The panic triggers in rate_supported(), which is compiled in-line, with newidx having a value of 576
> at the time of the panic!! I'm not sure of the fix, but I think newindex should always be <=
> mode->num_rates. The following patch should cure the crash, but may not be the best fix.

Sure, but rate_control_pid_shift_adjust() ensures that the newindex we
start with is within the ranges, so that I can't actually explain how
run-away can ever happen (because as soon as we hit the lower or the higher
limit, that should be a supported rate!) However, a bug prevented that from
working correctly, but should be fixed by the commit I mentioned above.

> Index: wireless-2.6/net/mac80211/rc80211_pid_algo.c
> ===================================================================
> --- wireless-2.6.orig/net/mac80211/rc80211_pid_algo.c
> +++ wireless-2.6/net/mac80211/rc80211_pid_algo.c
> @@ -123,6 +123,8 @@ static void rate_control_pid_adjust_rate
>  		}
>
>  		newidx += back;
> +		if (newidx < 0 || newidx >= mode->num_rates)
> +			return;
>  	}
>
>  #ifdef CONFIG_MAC80211_DEBUGFS
>
> This patch has been compile tested at the moment, but it will get further testing after this E-mail
> is sent.

ACK, this can be useful as an additional sanity check, even if that
shouldn't be needed. Please ensure you have that commit in your tree -- I'll
investigate further, in case. Thank you for the report, I never hit that!

--
Ciao
Stefano
* Re: Kernel Panic in mac80211
From: Larry Finger @ 2008-01-25 7:12 UTC
To: Stefano Brivio; +Cc: Johannes Berg, wireless, John Linville

Stefano Brivio wrote:
> [D'oh, I missed this report until now.]
>
> On Thu, 24 Jan 2008 21:35:13 -0700
> Larry Finger <larry.finger@lwfinger.net> wrote:
>
>> Johannes Berg wrote:
>>> On Thu, 2008-01-24 at 00:16 -0700, Larry Finger wrote:
>>>> I have been having "random" kernel panics where the "Caps Lock" LED is flashing at ~1 Hz. These
>>>> crashes only occur for the wireless-2.6 tree and have been happening for roughly 3 weeks. After
>>>> running a memory test to ensure that these panics were not caused by a hardware problem, I enabled
>>>> netconsole logging and caught the following crash report for my x86_64 system:
>>>> Code: f6 44 02 08 10 74 12 45 85 ed 78 05 44 39 e9 7f 08 89 8f 24
>>>> RIP [<ffffffff88202940>] :mac80211:rate_control_pid_tx_status+0x426/0x45a
>>> Damn, I've seen that too but blamed it on my own patching. Stefano, any
>>> idea? IIRC some sta struct was NULL in pid_tx_status.
>> The problem is not a NULL in one of the structs, but a runaway loop. The error occurs in the
>> following loop in rate_control_pid_adjust_rate():
>>
>> 	while (newidx != sta->txrate) {
>> 		if (rate_supported(sta, mode, newidx) &&
>> 		    (maxrate < 0 || newidx <= maxrate)) {
>> 			sta->txrate = newidx;
>> 			break;
>> 		}
>>
>> 		newidx += back;
>> 	}
>
> Is this commit in the tree you are testing?
>
> commit 5bfcaca1279835867e2aa3406cfaf2fd7d92ff7c
> Author: Stefano Brivio <stefano.brivio@polimi.it>
> Date: Sun Dec 23 04:41:19 2007 +0100
>
>     rc80211-pid: simplify and fix shift_adjust
>
>> The panic triggers in rate_supported(), which is compiled in-line, with newidx having a value of 576
>> at the time of the panic!! I'm not sure of the fix, but I think newindex should always be <=
>> mode->num_rates. The following patch should cure the crash, but may not be the best fix.
>
> Sure, but rate_control_pid_shift_adjust() ensures that the newindex we
> start with is within the ranges, so that I can't actually explain how
> run-away can ever happen (because as soon as we hit the lower or the higher
> limit, that should be a supported rate!) However, a bug prevented that from
> working correctly, but should be fixed by the commit I mentioned above.
>
>> Index: wireless-2.6/net/mac80211/rc80211_pid_algo.c
>> ===================================================================
>> --- wireless-2.6.orig/net/mac80211/rc80211_pid_algo.c
>> +++ wireless-2.6/net/mac80211/rc80211_pid_algo.c
>> @@ -123,6 +123,8 @@ static void rate_control_pid_adjust_rate
>>  		}
>>
>>  		newidx += back;
>> +		if (newidx < 0 || newidx >= mode->num_rates)
>> +			return;
>>  	}
>>
>>  #ifdef CONFIG_MAC80211_DEBUGFS
>>
>> This patch has been compile tested at the moment, but it will get further testing after this E-mail
>> is sent.
>
> ACK, this can be useful as an additional sanity check, even if that
> shouldn't be needed. Please ensure you have that commit in your tree -- I'll
> investigate further, in case. Thank you for the report, I never hit that!

Yes, the commit you mentioned is in my tree. I'm now running with some additional diagnostics. If
my new check is triggered, I'll let you know what I find.

Larry
* Re: Kernel Panic in mac80211
From: Johannes Berg @ 2008-01-25 20:59 UTC
To: Larry Finger; +Cc: wireless, John Linville, Stefano Brivio

> >> Code: f6 44 02 08 10 74 12 45 85 ed 78 05 44 39 e9 7f 08 89 8f 24
> >> RIP [<ffffffff88202940>] :mac80211:rate_control_pid_tx_status+0x426/0x45a
> >
> > Damn, I've seen that too but blamed it on my own patching. Stefano, any
> > idea? IIRC some sta struct was NULL in pid_tx_status.
>
> The problem is not a NULL in one of the structs, but a runaway loop.

Ah, you're right, I remember now that the access was to some random
rather high address which matches what you observed.

johannes