* [3.3-rc2+] Thousands of ath9k warnings on dmesg before laptop froze @ 2012-02-06 0:29 Carlos R. Mafra 2012-02-06 22:57 ` [ath9k-devel] " Pavel Roskin 0 siblings, 1 reply; 4+ messages in thread From: Carlos R. Mafra @ 2012-02-06 0:29 UTC (permalink / raw) To: LKML; +Cc: Luis R. Rodriguez, ath9k-devel I'm testing the latest kernel 3.3.0-rc2+ I pulled from git this morning. My laptop just froze, and when I rebooted I noticed that /var/log/messages contained 48 thousand (!) warnings coming from ath9k since a few hours ago. I'm pasting the first one: ------------[ cut here ]------------ WARNING: at /home/mafra/linux-2.6/drivers/net/wireless/ath/ath9k/rc.c:697 ath_rc_get_highest_rix+0x156/0x210 [ath9k]() Hardware name: VPCEB4X1E Modules linked in: dvb_usb_dib0700 dib0090 dib7000p dib7000m dib0070 dib8000 dib3000mc dibx000_common af_packet snd_pcm_oss snd_mixer_oss microcode arc4 sr_mod cdrom sg ath9k mac80211 uvcvideo ath9k_common ath9k_hw videobuf2_core videobuf2_vmalloc videobuf2_memops ath i2c_i801 cfg80211 sony_laptop rfkill radeon ttm drm_kms_helper drm i2c_algo_bit Pid: 48, comm: kworker/u:6 Not tainted 3.3.0-rc2+ #6 Call Trace: [<ffffffff81032f2a>] warn_slowpath_common+0x7a/0xb0 [<ffffffff81032f75>] warn_slowpath_null+0x15/0x20 [<ffffffffa0298eb6>] ath_rc_get_highest_rix+0x156/0x210 [ath9k] [<ffffffffa0298ff3>] ath_get_rate+0x83/0x480 [ath9k] [<ffffffffa024f006>] rate_control_get_rate+0x86/0x160 [mac80211] [<ffffffffa025a112>] invoke_tx_handlers+0x7b2/0x1420 [mac80211] [<ffffffff81061a00>] ? load_balance+0x90/0x8a0 [<ffffffffa025af22>] ieee80211_tx+0x82/0xf0 [mac80211] [<ffffffffa025b030>] ieee80211_xmit+0xa0/0x100 [mac80211] [<ffffffffa025c23a>] ieee80211_tx_skb_tid+0x5a/0x70 [mac80211] [<ffffffffa0246f56>] ieee80211_send_nullfunc+0x56/0x70 [mac80211] [<ffffffffa0248460>] ieee80211_dynamic_ps_enable_work+0x240/0x310 [mac80211] [<ffffffffa0248220>] ? ieee80211_dynamic_ps_disable_work+0x50/0x50 [mac80211] [<ffffffff81048d67>] process_one_work+0x117/0x390 [<ffffffff81049899>] worker_thread+0x169/0x350 [<ffffffff81049730>] ? manage_workers+0x240/0x240 [<ffffffff8104f096>] kthread+0x96/0xa0 [<ffffffff814f7714>] kernel_thread_helper+0x4/0x10 [<ffffffff8104f000>] ? flush_kthread_worker+0xb0/0xb0 [<ffffffff814f7710>] ? gs_change+0xb/0xb ---[ end trace 7a4892c93f508f3a ]--- ------------[ cut here ]------------ I must say that until the laptop froze, I didn't notice anything unusual to the point of checking my dmesg. Sometimes the internet was slow, but I thought it was unrelated to the kernel. Is there any other info I should provide or the above warning is enough? ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [ath9k-devel] [3.3-rc2+] Thousands of ath9k warnings on dmesg before laptop froze 2012-02-06 0:29 [3.3-rc2+] Thousands of ath9k warnings on dmesg before laptop froze Carlos R. Mafra @ 2012-02-06 22:57 ` Pavel Roskin 2012-02-07 12:23 ` Carlos R. Mafra 0 siblings, 1 reply; 4+ messages in thread From: Pavel Roskin @ 2012-02-06 22:57 UTC (permalink / raw) To: Carlos R. Mafra; +Cc: LKML, Luis R. Rodriguez, ath9k-devel [-- Attachment #1: Type: text/plain, Size: 2004 bytes --] On Mon, 6 Feb 2012 00:29:07 +0000 "Carlos R. Mafra" <crmafra@gmail.com> wrote: > > I'm testing the latest kernel 3.3.0-rc2+ I pulled from git > this morning. > > My laptop just froze, and when I rebooted I noticed > that /var/log/messages contained 48 thousand (!) warnings coming from > ath9k since a few hours ago. I'm pasting the first one: > > ------------[ cut here ]------------ > WARNING: > at /home/mafra/linux-2.6/drivers/net/wireless/ath/ath9k/rc.c:697 > ath_rc_get_highest_rix+0x156/0x210 [ath9k]() Hardware name: VPCEB4X1E I believe I found a solution for this today. Please see this bug tracker: https://bugzilla.redhat.com/show_bug.cgi?id=768639 While Fedora users report a warning, I've seen panic reports in the list. It's a memory corruption bug, so it can manifest in different ways. Please test the latest patch (attached). Here's my comment to the patch: This patch is based on my analysis of printk() output I added to the ath9k driver. I didn't have a chance to test the patch, so testing would be greatly appreciated. The corruption must be happening in ath_debug_stat_rc(), which is given the result of ath_rc_get_rateindex(). ath_rc_get_rateindex() can return -1, which causes ath_debug_stat_rc() to increment the value that lies 16 bytes before rcstats in struct ath_rate_priv. On 64-bit systems, that happens to be rate_table. Once the rate_table pointer is incremented, all data there becomes invalid, which leads to the warning. On 32-bit systems, the corruption should happen in neg_ht_rates. The -1 value of idx in struct ieee80211_tx_rate is described in net/mac80211.h. I don't know why we have -1 there and how to reproduce the problem reliably. But -1 can be there and ath9k has no checks for it. The patch introduces two protections: ath_rc_get_rateindex() never returns a negative value and ath_debug_stat_rc() checks the array bounds. It may not be good enough for the kernel, but it may be good enough for Fedora. -- Regards, Pavel Roskin [-- Attachment #2: 01-rix-check.patch --] [-- Type: text/x-patch, Size: 1280 bytes --] Prevent memory corruption in ath9k rate control algorithm From: Pavel Roskin <proski@gnu.org> Check final_rate in ath_debug_stat_rc(). Don't return negative values from ath_rc_get_rateindex(), callers don't expect it. Signed-off-by: Pavel Roskin <proski@gnu.org> --- drivers/net/wireless/ath/ath9k/rc.c | 10 ++++++++++ 1 files changed, 10 insertions(+), 0 deletions(-) diff --git a/drivers/net/wireless/ath/ath9k/rc.c b/drivers/net/wireless/ath/ath9k/rc.c index 635b592..afe22f4 100644 --- a/drivers/net/wireless/ath/ath9k/rc.c +++ b/drivers/net/wireless/ath/ath9k/rc.c @@ -385,6 +385,11 @@ static int ath_rc_get_rateindex(const struct ath_rate_table *rate_table, int rix = 0, i = 0; static const int mcs_rix_off[] = { 7, 15, 20, 21, 22, 23 }; + if (rate->idx < 0) { + printk(KERN_ERR "%s: rate->idx = %d\n", __func__, rate->idx); + return 0; + } + if (!(rate->flags & IEEE80211_TX_RC_MCS)) return rate->idx; @@ -1324,6 +1329,11 @@ static void ath_debug_stat_rc(struct ath_rate_priv *rc, int final_rate) { struct ath_rc_stats *stats; + if (final_rate < 0 || final_rate >= RATE_TABLE_SIZE) { + printk(KERN_ERR "%s: invalid final_rate: %d\n", __func__, + final_rate); + return; + } stats = &rc->rcstats[final_rate]; stats->success++; } ^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [ath9k-devel] [3.3-rc2+] Thousands of ath9k warnings on dmesg before laptop froze 2012-02-06 22:57 ` [ath9k-devel] " Pavel Roskin @ 2012-02-07 12:23 ` Carlos R. Mafra 2012-02-08 22:29 ` Calvin Owens 0 siblings, 1 reply; 4+ messages in thread From: Carlos R. Mafra @ 2012-02-07 12:23 UTC (permalink / raw) To: Pavel Roskin; +Cc: LKML, Luis R. Rodriguez, ath9k-devel On Mon, 6 Feb 2012 at 17:57:02 -0500, Pavel Roskin wrote: > On Mon, 6 Feb 2012 00:29:07 +0000 > "Carlos R. Mafra" <crmafra@gmail.com> wrote: > > > > > I'm testing the latest kernel 3.3.0-rc2+ I pulled from git > > this morning. > > > > My laptop just froze, and when I rebooted I noticed > > that /var/log/messages contained 48 thousand (!) warnings coming from > > ath9k since a few hours ago. I'm pasting the first one: > > > > > ------------[ cut here ]------------ > > WARNING: > > at /home/mafra/linux-2.6/drivers/net/wireless/ath/ath9k/rc.c:697 > > ath_rc_get_highest_rix+0x156/0x210 [ath9k]() Hardware name: VPCEB4X1E > > I believe I found a solution for this today. Please see this bug > tracker: https://bugzilla.redhat.com/show_bug.cgi?id=768639 > > While Fedora users report a warning, I've seen panic reports in the > list. It's a memory corruption bug, so it can manifest in different > ways. Please test the latest patch (attached). > > Here's my comment to the patch: > > This patch is based on my analysis of printk() output I added to the > ath9k driver. I didn't have a chance to test the patch, so testing > would be greatly appreciated. > > The corruption must be happening in ath_debug_stat_rc(), which is given > the result of ath_rc_get_rateindex(). ath_rc_get_rateindex() can > return -1, which causes ath_debug_stat_rc() to increment the value that > lies 16 bytes before rcstats in struct ath_rate_priv. On 64-bit > systems, that happens to be rate_table. Once the rate_table pointer is > incremented, all data there becomes invalid, which leads to the > warning. On 32-bit systems, the corruption should happen in > neg_ht_rates. > > The -1 value of idx in struct ieee80211_tx_rate is described in > net/mac80211.h. I don't know why we have -1 there and how to reproduce > the problem reliably. But -1 can be there and ath9k has no checks for > it. > > The patch introduces two protections: ath_rc_get_rateindex() never > returns a negative value and ath_debug_stat_rc() checks the array > bounds. > > It may not be good enough for the kernel, but it may be good enough for > Fedora. Thanks for the link to the bugzilla and for the attached patch. I'm currently testing it, and so far so good. > Prevent memory corruption in ath9k rate control algorithm > > From: Pavel Roskin <proski@gnu.org> > > Check final_rate in ath_debug_stat_rc(). Don't return negative values > from ath_rc_get_rateindex(), callers don't expect it. > > Signed-off-by: Pavel Roskin <proski@gnu.org> > --- > > drivers/net/wireless/ath/ath9k/rc.c | 10 ++++++++++ > 1 files changed, 10 insertions(+), 0 deletions(-) > > > diff --git a/drivers/net/wireless/ath/ath9k/rc.c b/drivers/net/wireless/ath/ath9k/rc.c > index 635b592..afe22f4 100644 > --- a/drivers/net/wireless/ath/ath9k/rc.c > +++ b/drivers/net/wireless/ath/ath9k/rc.c > @@ -385,6 +385,11 @@ static int ath_rc_get_rateindex(const struct ath_rate_table *rate_table, > int rix = 0, i = 0; > static const int mcs_rix_off[] = { 7, 15, 20, 21, 22, 23 }; > > + if (rate->idx < 0) { > + printk(KERN_ERR "%s: rate->idx = %d\n", __func__, rate->idx); > + return 0; > + } > + > if (!(rate->flags & IEEE80211_TX_RC_MCS)) > return rate->idx; > > @@ -1324,6 +1329,11 @@ static void ath_debug_stat_rc(struct ath_rate_priv *rc, int final_rate) > { > struct ath_rc_stats *stats; > > + if (final_rate < 0 || final_rate >= RATE_TABLE_SIZE) { > + printk(KERN_ERR "%s: invalid final_rate: %d\n", __func__, > + final_rate); > + return; > + } > stats = &rc->rcstats[final_rate]; > stats->success++; > } ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [ath9k-devel] [3.3-rc2+] Thousands of ath9k warnings on dmesg before laptop froze 2012-02-07 12:23 ` Carlos R. Mafra @ 2012-02-08 22:29 ` Calvin Owens 0 siblings, 0 replies; 4+ messages in thread From: Calvin Owens @ 2012-02-08 22:29 UTC (permalink / raw) To: Carlos R. Mafra; +Cc: Pavel Roskin, LKML, Luis R. Rodriguez, ath9k-devel I've been having this issue on x86_64 with 3.3-rc2+ as well. I got a panic a couple days ago, but I haven't been able to reproduce it. I can, however, reproduce the WARN_ON()'s on my laptop by pinging an external server with a high frequency (.1s works well) while walking in circles around the 2nd floor of a nearby library. (It's kind of hilarious, but that's the only way I seem to be able to get it). I can confirm that Pavel's patch resolves the problem. Feel free to add: Tested-by: Calvin Owens <jcalvinowens@gmail.com> Thanks, Calvin ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2012-02-08 22:29 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-02-06 0:29 [3.3-rc2+] Thousands of ath9k warnings on dmesg before laptop froze Carlos R. Mafra 2012-02-06 22:57 ` [ath9k-devel] " Pavel Roskin 2012-02-07 12:23 ` Carlos R. Mafra 2012-02-08 22:29 ` Calvin Owens
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox