linux-wireless.vger.kernel.org archive mirror
* v3.4.4 ath9k: kernel NULL pointer dereference in skb_dequeue during heavy udp xmit
       [not found] <CANugF35YQzs7gi4htrgCUKrTDp_ia0RUo9g_HNJ=CzfnOCkO2g@mail.gmail.com>
@ 2012-07-06  4:36 ` Andrew Chant
  2012-07-06  7:15   ` Johannes Berg
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Chant @ 2012-07-06  4:36 UTC
  To: Johannes Berg, John Linville, linux-wireless

Hello linux-wireless,
 While performance testing an ath9k -> ath9k link on 3.4.4, I got
a nasty kernel panic.  My testing involved filling the air
with 1410-byte UDP packets between the machines, and switching the
frequencies of the two cards to see how frequency affected
performance.  I had switched between channels 36, 40, 44, and 48.
The oops was on the transmitting machine, which was acting as the AP.
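
For anyone wanting to approximate this load, the sender side could look
roughly like the Python sketch below. The tool actually used for the test
is not named in this report, so `flood_udp`, the host, and the port are all
hypothetical; only the 1410-byte payload size comes from the description
above.

```python
import errno
import socket

PAYLOAD_SIZE = 1410  # payload size quoted in the report


def flood_udp(host, port, count, payload_size=PAYLOAD_SIZE):
    """Send `count` UDP datagrams of `payload_size` bytes back to back.

    Hypothetical helper, not the tool used in the original test.
    Returns the number of datagrams the kernel accepted.
    """
    payload = bytes(payload_size)  # zero-filled payload
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for _ in range(count):
            try:
                sock.sendto(payload, (host, port))
            except OSError as exc:
                if exc.errno != errno.ENOBUFS:
                    raise
                continue  # local queue full: skip this datagram, keep going
            sent += 1
    return sent
```

Something like `flood_udp("192.0.2.10", 5001, 10_000_000)`, run on the AP
and aimed at the station, should keep the transmit path busy; the address
and port are placeholders.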

Very clear screen image of the oops is at
https://picasaweb.google.com/lh/photo/CjBdHLZH0up5PrnmCySJidMTjNZETYmyPJy0liipFm0?feat=directlink

Rough transcription:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff8125103a>] skb_dequeue+0x3a/0x58
PGD 0
Oops: 0002 [#1] SMP
CPU 4
Modules linked in: vfat fat usb_storage loop hid_microsoft usbhid
snd_hda_codec_hdmi snd_hda_codec_via i915 cfbimgblt arc4 cfbcopyarea
cfbfillarea ath9k i2c_algo_bit drm_kms_helper ath9k_common ath9k_hw
snd_hda_intel mac80211 ath snd_hda_codec snd_hwdep snd_pcm snd_timer
xhci_hcd cfg80211 drm ehci_hcd usbcore snd psmouse intel_agp atl1c
usb_common video intel_gtt i2c_core evdev crc32c_intel microcode
snd_page_alloc agpgart

Pid: 0, comm: swapper/4 Not tainted 3.4.4 #37 Gigabyte Technology Co.,
Ltd.  To be filled by O.E.M./Z77-D3H
RIP: 0010:[<ffffffff8125103a>][<ffffffff8125103a>] skb_dequeue+0x3a/0x58
RSP: blah... look at image if you care
RAX: 0000...00012 ... RCX: 0
blah blah blah

Call Trace:
 test_and_clear_sta_flag+0x33/0x33 [mac80211]
 ieee80211_add_pending_skbs_fn+0x81/0xf7 [mac80211]
 ieee80211_sta_ps_deliver_wakeup+0x170/0x18a [mac80211]
 ieee80211_rx_handlers+0x5b3/0x1685 [mac80211]
 get_pageblock_migratetype+0xc/0xd
 ieee80211_prepare_and_rx_handle+0x634/0x6c6 [mac80211]
 ieee80211_rx+0x492/0x5a1 [mac80211]
 ath_rx_tasklet+0x135/0x15a1 [ath9k]
 ath9k_tasklet+0xce/0x10b [ath9k]
...blah blah blah

Code: 32 a8 07 00 48 8b 5d 00 48 39 eb 74 27 48 85 db 74 24 ff 4d 10
48 8b 0b 48 c7 03 00 00 00 00 48 8b ...
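
The number after "Oops:" is the x86 page-fault error code, so the
"Oops: 0002" line above can be decoded bit by bit. The sketch below is only
an illustration; `decode_oops_code` is a hypothetical helper, not a kernel
or distribution tool, and the bit meanings are the standard x86 ones.

```python
# x86 page-fault error code bits (the hex number printed after "Oops:")
PF_PROT = 0x01   # 0: page not present, 1: protection violation
PF_WRITE = 0x02  # 0: read access,      1: write access
PF_USER = 0x04   # 0: kernel mode,      1: user mode
PF_RSVD = 0x08   # reserved bit set in a page-table entry
PF_INSTR = 0x10  # fault happened on an instruction fetch


def decode_oops_code(code_hex):
    """Translate the hex error code from an oops line into readable fields."""
    code = int(code_hex, 16)
    return {
        "cause": "protection violation" if code & PF_PROT else "page not present",
        "access": "write" if code & PF_WRITE else "read",
        "mode": "user" if code & PF_USER else "kernel",
        "reserved_bit": bool(code & PF_RSVD),
        "instruction_fetch": bool(code & PF_INSTR),
    }
```

For "0002" this gives a kernel-mode write to a page that is not present,
which fits a store through a pointer that is NULL plus a small offset
(the faulting address here is 8).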


* Re: v3.4.4 ath9k: kernel NULL pointer dereference in skb_dequeue during heavy udp xmit
  2012-07-06  4:36 ` v3.4.4 ath9k: kernel NULL pointer dereference in skb_dequeue during heavy udp xmit Andrew Chant
@ 2012-07-06  7:15   ` Johannes Berg
  2012-07-06  7:46     ` Andrew Chant
  0 siblings, 1 reply; 8+ messages in thread
From: Johannes Berg @ 2012-07-06  7:15 UTC
  To: Andrew Chant
  Cc: linux-wireless, Luis R. Rodriguez, Jouni Malinen,
	Vasanthakumar Thiagarajan, Senthil Balasubramanian

-John
+QCA folks

On Thu, 2012-07-05 at 21:36 -0700, Andrew Chant wrote:

>  While performance testing an ath9k -> ath9k link on 3.4.4, I got
> a nasty kernel panic.  My testing involved filling the air
> with 1410-byte UDP packets between the machines, and switching the
> frequencies of the two cards to see how frequency affected
> performance.  I had switched between channels 36, 40, 44, and 48.
> The oops was on the transmitting machine, which was acting as the AP.
> 
> Very clear screen image of the oops is at
> https://picasaweb.google.com/lh/photo/CjBdHLZH0up5PrnmCySJidMTjNZETYmyPJy0liipFm0?feat=directlink

I briefly looked at this, but I don't see a bug in mac80211. It seems
likely that ath9k hands back a corrupted SKB, or frees one it no longer
owns, or such. The skb->next/prev pointers seem corrupted (rcx is NULL)
in one of the SKBs on the list, but mac80211 can't do that afaict.
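
To make the failure mode concrete, here is a small Python model of an SKB
queue. It is a sketch under assumptions, not kernel code: `Skb` and
`SkbQueue` are hypothetical stand-ins for struct sk_buff and
struct sk_buff_head, and the unlink order in `dequeue` follows what the
kernel's unlink helper does as far as I can tell. It assumes the layout
where `next` and `prev` are the first two pointers of the SKB, so a store
to `next->prev` with `next == NULL` lands at address 8 on a 64-bit
machine.

```python
class Skb:
    """Stand-in for struct sk_buff: only the two list pointers are modelled."""

    def __init__(self, label):
        self.label = label
        self.next = None  # offset 0 in the assumed 64-bit layout
        self.prev = None  # offset 8 in the assumed 64-bit layout


class SkbQueue:
    """Stand-in for struct sk_buff_head: circular list, head acts as sentinel."""

    def __init__(self):
        self.next = self
        self.prev = self
        self.qlen = 0

    def enqueue_tail(self, skb):
        last = self.prev
        skb.prev, skb.next = last, self
        last.next = skb
        self.prev = skb
        self.qlen += 1

    def dequeue(self):
        skb = self.next
        if skb is self:  # empty queue: head points at itself
            return None
        self.qlen -= 1
        nxt, prv = skb.next, skb.prev
        skb.next = skb.prev = None
        nxt.prev = prv  # with nxt == NULL the kernel would write to 0 + 8 here
        prv.next = nxt
        return skb
```

If something else clears or overwrites `next` on an SKB that is still
linked into the queue (simulated by setting `skb.next = None` by hand), the
following `dequeue()` fails on the `nxt.prev = prv` line. In Python that is
an AttributeError; in the kernel it would be a NULL pointer write of the
kind shown in the oops.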

johannes



* Re: v3.4.4 ath9k: kernel NULL pointer dereference in skb_dequeue during heavy udp xmit
  2012-07-06  7:15   ` Johannes Berg
@ 2012-07-06  7:46     ` Andrew Chant
  2012-07-12  6:35       ` Andrew Chant
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Chant @ 2012-07-06  7:46 UTC
  To: Johannes Berg
  Cc: linux-wireless, Luis R. Rodriguez, Jouni Malinen,
	Vasanthakumar Thiagarajan, Senthil Balasubramanian

I was able to reproduce this on a boot shortly afterwards without
changing the frequencies.
Exact same stack trace, with the exception of slightly different values for
RBX & R15, and R10 had 0x7f instead of 0x80.  I have not been able to
reproduce since despite trying quite hard :)  I have a picture of the
second oops if that helps.
PCI ID is 168c:0030 (AR9300 Wireless LAN adaptor (rev 01))
-Andrew


* Re: v3.4.4 ath9k: kernel NULL pointer dereference in skb_dequeue during heavy udp xmit
  2012-07-06  7:46     ` Andrew Chant
@ 2012-07-12  6:35       ` Andrew Chant
  2012-07-16  5:19         ` Mohammed Shafi
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Chant @ 2012-07-12  6:35 UTC
  Cc: linux-wireless, Luis R. Rodriguez, Jouni Malinen,
	Vasanthakumar Thiagarajan, Senthil Balasubramanian

Have any QCA people had a chance to take a look?  This is completely
reproducible for me on 3.4.4, sometimes within a few minutes, but
occasionally it takes up to an hour.  Do you QCA folks have any tests
where you continuously transmit as many UDP packets as you possibly
can to another host?


* Re: v3.4.4 ath9k: kernel NULL pointer dereference in skb_dequeue during heavy udp xmit
  2012-07-12  6:35       ` Andrew Chant
@ 2012-07-16  5:19         ` Mohammed Shafi
  2012-07-17  3:06           ` Andrew Chant
  0 siblings, 1 reply; 8+ messages in thread
From: Mohammed Shafi @ 2012-07-16  5:19 UTC
  To: Andrew Chant
  Cc: linux-wireless, Luis R. Rodriguez, Jouni Malinen,
	Vasanthakumar Thiagarajan, Senthil Balasubramanian

On Thu, Jul 12, 2012 at 12:05 PM, Andrew Chant <andrew.chant@gmail.com> wrote:
> Have any QCA people had a chance to take a look?  This is completely
> reproducible for me on 3.4.4, sometimes within a few minutes, but
> occasionally it takes up to an hour.  Do you QCA folks have any tests
> where you continuously transmit as many UDP packets as you possibly
> can to another host?

Please check whether the following patch helps:
http://comments.gmane.org/gmane.linux.kernel.wireless.general/93723
Could you please also check whether it happens with the wireless-testing tree?
http://linuxwireless.org/en/developers/Documentation/git-guide#Cloning_latest_wireless-testing




-- 
thanks,
shafi


* Re: v3.4.4 ath9k: kernel NULL pointer dereference in skb_dequeue during heavy udp xmit
  2012-07-16  5:19         ` Mohammed Shafi
@ 2012-07-17  3:06           ` Andrew Chant
  2012-07-17  3:18             ` Mohammed Shafi
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Chant @ 2012-07-17  3:06 UTC
  To: Mohammed Shafi
  Cc: linux-wireless, Luis R. Rodriguez, Jouni Malinen,
	Vasanthakumar Thiagarajan, Senthil Balasubramanian

Thanks.  That patch seems good against 3.4.4 after the first few
minutes - I'll leave it to run overnight.


* Re: v3.4.4 ath9k: kernel NULL pointer dereference in skb_dequeue during heavy udp xmit
  2012-07-17  3:06           ` Andrew Chant
@ 2012-07-17  3:18             ` Mohammed Shafi
  2012-07-17 15:05               ` Andrew Chant
  0 siblings, 1 reply; 8+ messages in thread
From: Mohammed Shafi @ 2012-07-17  3:18 UTC
  To: Andrew Chant
  Cc: linux-wireless, Luis R. Rodriguez, Jouni Malinen,
	Vasanthakumar Thiagarajan, Senthil Balasubramanian

Hi Andrew,

On Tue, Jul 17, 2012 at 8:36 AM, Andrew Chant <andrew.chant@gmail.com> wrote:
> Thanks.  That patch seems good against 3.4.4 after the first few
> minutes - I'll leave it to run overnight.

thanks, sure!

-- 
thanks,
shafi


* Re: v3.4.4 ath9k: kernel NULL pointer dereference in skb_dequeue during heavy udp xmit
  2012-07-17  3:18             ` Mohammed Shafi
@ 2012-07-17 15:05               ` Andrew Chant
  0 siblings, 0 replies; 8+ messages in thread
From: Andrew Chant @ 2012-07-17 15:05 UTC
  To: Mohammed Shafi
  Cc: linux-wireless, Luis R. Rodriguez, Jouni Malinen,
	Vasanthakumar Thiagarajan, Senthil Balasubramanian

It was good overnight.  Is this worth trying to put into 3.4.6 if there is one?

