* Crash in cfg80211_unlink_bss
From: Ben Greear @ 2010-10-06 17:28 UTC
To: linux-wireless@vger.kernel.org

This test scenario has 72 stations on ath5k trying to connect to a cisco AP
that supposedly only supports 63 stations.

The 72 STA were created without ssid's configured, then we re-configured all
72 'at once' to give them the proper SSID (ifdown, ifup, iwconfig to set values).

The system crashed and rebooted.

Kernel is wireless-testing as of later yesterday, with a few additional
patches mostly dealing with counters in /proc/net/wireless and some lockdep
fixes pulled in from lkml etc.

We have seen this before, but this is the first good stacktrace we got.

Likely we can reproduce this if extra information is needed.

------------[ cut here ]------------
WARNING: at /home/greearb/git/linux.wireless-testing/lib/list_debug.c:48 list_del+0x24/0xab()
Hardware name: PDSM4+
list_del corruption, next is LIST_POISON1 (00100100)
Modules linked in: xt_CT iptable_raw ipt_addrtype xt_DSCP xt_dscp xt_string xt_owner xt_NFQUEUE xt_multiport xt_mark xt_ipra]
Pid: 27077, comm: kworker/u:1 Not tainted 2.6.36-rc6-wl+ #6
Call Trace:
 [<c04345cb>] warn_slowpath_common+0x65/0x7a
 [<c058f3c8>] ? list_del+0x24/0xab
 [<c0434644>] warn_slowpath_fmt+0x26/0x2a
 [<c058f3c8>] list_del+0x24/0xab
 [<fac52fed>] cfg80211_unlink_bss+0x4e/0x7d [cfg80211]
 [<fb8aa464>] ieee80211_work_work+0x562/0xcfa [mac80211]
 [<c0443cea>] ? process_one_work+0x145/0x295
 [<c0443d34>] process_one_work+0x18f/0x295
 [<c0443cea>] ? process_one_work+0x145/0x295
 [<fb8a9f02>] ? ieee80211_work_work+0x0/0xcfa [mac80211]
 [<c044535d>] worker_thread+0xf9/0x1b8
 [<c0445264>] ? worker_thread+0x0/0x1b8
 [<c0447ca7>] kthread+0x62/0x67
 [<c0447c45>] ? kthread+0x0/0x67
 [<c0403506>] kernel_thread_helper+0x6/0x1a
---[ end trace 378943e5dc829f28 ]---
BUG: unable to handle kernel paging request at 00200200
IP: [<c058f3f8>] list_del+0x54/0xab
*pde = 00000000
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.2/0000:03:01.0/ieee80211/phy0/macaddress
Modules linked in: xt_CT iptable_raw ipt_addrtype xt_DSCP xt_dscp xt_string xt_owner xt_NFQUEUE xt_multiport xt_mark xt_ipra]

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com
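The two addresses in the trace above (00100100 in the warning, 00200200 in the paging fault) are the values the kernel's list code writes into an entry's `next` and `prev` pointers when it deletes it, so seeing them in `list_del` suggests the same entry was deleted twice. The following is a minimal userspace sketch of that mechanism, not the kernel's implementation: the `demo_` names are hypothetical, and unlike the kernel (which warns and then faults, as the trace shows) this version returns an error instead of dereferencing the poison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct list_head. */
struct demo_list_head {
	struct demo_list_head *next, *prev;
};

/* Poison values as printed in the oops above (32-bit x86 build). */
#define DEMO_POISON1 ((struct demo_list_head *)(uintptr_t)0x00100100)
#define DEMO_POISON2 ((struct demo_list_head *)(uintptr_t)0x00200200)

/* An empty list is a head that points at itself. */
static void demo_list_init(struct demo_list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after head. */
static void demo_list_add(struct demo_list_head *entry,
			  struct demo_list_head *head)
{
	assert(entry != head);
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/*
 * Unlink entry and poison its pointers.
 * Returns 0 on success, -1 if the entry already carries the poison,
 * i.e. it was deleted before.
 */
static int demo_list_del(struct demo_list_head *entry)
{
	if (entry->next == DEMO_POISON1 || entry->prev == DEMO_POISON2)
		return -1;
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	entry->next = DEMO_POISON1;
	entry->prev = DEMO_POISON2;
	return 0;
}
```

Calling `demo_list_del()` twice on the same entry succeeds the first time and reports the double delete the second time, which is the situation the warning in the trace describes.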
* Re: Crash in cfg80211_unlink_bss
From: Johannes Berg @ 2010-10-06 18:04 UTC
To: Ben Greear; +Cc: linux-wireless@vger.kernel.org

On Wed, 2010-10-06 at 10:28 -0700, Ben Greear wrote:
> This test scenario has 72 stations on ath5k trying to connect to a cisco AP
> that supposedly only supports 63 stations.
>
> The 72 STA were created without ssid's configured, then we re-configured all
> 72 'at once' to give them the proper SSID (ifdown, ifup, iwconfig to set values).

Eww, iwconfig ;-)

> The system crashed and rebooted.
>
> Kernel is wireless-testing as of later yesterday, with a few additional
> patches mostly dealing with counters in /proc/net/wireless and some lockdep
> fixes pulled in from lkml etc.
>
> We have seen this before, but this is the first good stacktrace we got.
>
> Likely we can reproduce this if extra information is needed.

> list_del corruption, next is LIST_POISON1 (00100100)

This one's interesting.

But anyway, now that I look at it in more detail, it seems fairly
obvious. You should be able to trigger it with two stations, but it's
probably harder ...

I need to analyse the refcounting here again and in more detail, but in
the meantime can you try below patch?

johannes

---
 net/wireless/scan.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

--- wireless-testing.orig/net/wireless/scan.c	2010-10-06 19:59:41.000000000 +0200
+++ wireless-testing/net/wireless/scan.c	2010-10-06 20:01:20.000000000 +0200
@@ -668,11 +668,11 @@ void cfg80211_unlink_bss(struct wiphy *w
 	bss = container_of(pub, struct cfg80211_internal_bss, pub);
 
 	spin_lock_bh(&dev->bss_lock);
-
-	list_del(&bss->list);
-	dev->bss_generation++;
-	rb_erase(&bss->rbn, &dev->bss_tree);
-
+	if (!list_empty(&bss->list)) {
+		list_del_init(&bss->list);
+		dev->bss_generation++;
+		rb_erase(&bss->rbn, &dev->bss_tree);
+	}
 	spin_unlock_bh(&dev->bss_lock);
 
 	kref_put(&bss->ref, bss_release);
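The patch above swaps `list_del` for `list_del_init` and wraps the unlink in a `list_empty` check on the entry itself, which makes a repeated unlink a no-op. Below is a small userspace sketch of that pattern only. The `sketch_` names are hypothetical, locking and the rb-tree are left out, and it deliberately says nothing about reference counting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct list_head. */
struct sketch_node {
	struct sketch_node *next, *prev;
};

/* Hypothetical stand-in for the device's BSS bookkeeping. */
struct sketch_table {
	struct sketch_node entries;	/* list head */
	unsigned int generation;	/* bumped on every real change */
};

/* A node that points at itself is "not on any list". */
static void sketch_init(struct sketch_node *n)
{
	n->next = n;
	n->prev = n;
}

static bool sketch_unlinked(const struct sketch_node *n)
{
	return n->next == n;
}

static void sketch_add(struct sketch_node *n, struct sketch_node *head)
{
	assert(sketch_unlinked(n));
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink and re-initialise, so sketch_unlinked() is true afterwards. */
static void sketch_del_init(struct sketch_node *n)
{
	n->next->prev = n->prev;
	n->prev->next = n->next;
	sketch_init(n);
}

/* Returns true only if the node was actually removed by this call. */
static bool sketch_unlink(struct sketch_table *t, struct sketch_node *n)
{
	if (sketch_unlinked(n))
		return false;
	sketch_del_init(n);
	t->generation++;
	return true;
}
```

The check on the entry is only meaningful because removal re-initialises it; after a plain delete that poisons the pointers, the same check would report the node as still linked.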
* Re: Crash in cfg80211_unlink_bss
From: Ben Greear @ 2010-10-06 18:08 UTC
To: Johannes Berg; +Cc: linux-wireless@vger.kernel.org

On 10/06/2010 11:04 AM, Johannes Berg wrote:
> On Wed, 2010-10-06 at 10:28 -0700, Ben Greear wrote:
>> This test scenario has 72 stations on ath5k trying to connect to a cisco AP
>> that supposedly only supports 63 stations.
>>
>> The 72 STA were created without ssid's configured, then we re-configured all
>> 72 'at once' to give them the proper SSID (ifdown, ifup, iwconfig to set values).
>
> Eww, iwconfig ;-)

Heh, one thing at a time :)

>> The system crashed and rebooted.
>>
>> Kernel is wireless-testing as of later yesterday, with a few additional
>> patches mostly dealing with counters in /proc/net/wireless and some lockdep
>> fixes pulled in from lkml etc.
>>
>> We have seen this before, but this is the first good stacktrace we got.
>>
>> Likely we can reproduce this if extra information is needed.
>
>> list_del corruption, next is LIST_POISON1 (00100100)
>
> This one's interesting.
>
> But anyway, now that I look at it in more detail, it seems fairly
> obvious. You should be able to trigger it with two stations, but it's
> probably harder ...
>
> I need to analyse the refcounting here again and in more detail, but in
> the meantime can you try below patch?

Yes, will do so and let you know the results.

Thanks,
Ben

>
> johannes
>
> ---
>  net/wireless/scan.c |   10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> --- wireless-testing.orig/net/wireless/scan.c	2010-10-06 19:59:41.000000000 +0200
> +++ wireless-testing/net/wireless/scan.c	2010-10-06 20:01:20.000000000 +0200
> @@ -668,11 +668,11 @@ void cfg80211_unlink_bss(struct wiphy *w
>  	bss = container_of(pub, struct cfg80211_internal_bss, pub);
>
>  	spin_lock_bh(&dev->bss_lock);
> -
> -	list_del(&bss->list);
> -	dev->bss_generation++;
> -	rb_erase(&bss->rbn,&dev->bss_tree);
> -
> +	if (!list_empty(&bss->list)) {
> +		list_del_init(&bss->list);
> +		dev->bss_generation++;
> +		rb_erase(&bss->rbn,&dev->bss_tree);
> +	}
>  	spin_unlock_bh(&dev->bss_lock);
>
>  	kref_put(&bss->ref, bss_release);
>

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com
* Re: Crash in cfg80211_unlink_bss
From: Johannes Berg @ 2010-10-06 18:16 UTC
To: Ben Greear; +Cc: linux-wireless@vger.kernel.org

On Wed, 2010-10-06 at 11:08 -0700, Ben Greear wrote:

> > I need to analyse the refcounting here again and in more detail, but in
> > the meantime can you try below patch?
>
> Yes, will do so and let you know the results.

Please try the new version, and let me know. I should submit this ASAP.

johannes
* Re: Crash in cfg80211_unlink_bss
From: Ben Greear @ 2010-10-06 18:20 UTC
To: Johannes Berg; +Cc: linux-wireless@vger.kernel.org

On 10/06/2010 11:16 AM, Johannes Berg wrote:
> On Wed, 2010-10-06 at 11:08 -0700, Ben Greear wrote:
>
>>> I need to analyse the refcounting here again and in more detail, but in
>>> the meantime can you try below patch?
>>
>> Yes, will do so and let you know the results.
>
> Please try the new version, and let me know. I should submit this ASAP.

I have to stash my slub debug hacking and re-compile...should have
it tested within 30 min if all goes well.

Thanks,
Ben

>
> johannes

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com
* Re: Crash in cfg80211_unlink_bss
From: Ben Greear @ 2010-10-06 19:14 UTC
To: Johannes Berg; +Cc: linux-wireless@vger.kernel.org, Hun-Kyi Wynn

On 10/06/2010 11:16 AM, Johannes Berg wrote:
> On Wed, 2010-10-06 at 11:08 -0700, Ben Greear wrote:
>
>>> I need to analyse the refcounting here again and in more detail, but in
>>> the meantime can you try below patch?
>>
>> Yes, will do so and let you know the results.
>
> Please try the new version, and let me know. I should submit this ASAP.

That patch seems to fix things.

Feel free to add:

Tested-by: Hun-Kyi Wynn <hkwynn@candelatech.com>

Thanks,
Ben

>
> johannes

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com
* Re: Crash in cfg80211_unlink_bss
From: Johannes Berg @ 2010-10-06 19:19 UTC
To: Ben Greear; +Cc: linux-wireless@vger.kernel.org, Hun-Kyi Wynn

On Wed, 2010-10-06 at 12:14 -0700, Ben Greear wrote:
> On 10/06/2010 11:16 AM, Johannes Berg wrote:
> > On Wed, 2010-10-06 at 11:08 -0700, Ben Greear wrote:
> >
> >>> I need to analyse the refcounting here again and in more detail, but in
> >>> the meantime can you try below patch?
> >>
> >> Yes, will do so and let you know the results.
> >
> > Please try the new version, and let me know. I should submit this ASAP.
>
> That patch seems to fix things.
>
> Feel free to add:
> Tested-by: Hun-Kyi Wynn <hkwynn@candelatech.com>

Thanks, I've sent out the patch.

johannes
* Re: Crash in cfg80211_unlink_bss
From: Johannes Berg @ 2010-10-06 18:11 UTC
To: Ben Greear; +Cc: linux-wireless@vger.kernel.org

On Wed, 2010-10-06 at 20:04 +0200, Johannes Berg wrote:

> But anyway, now that I look at it in more detail, it seems fairly
> obvious. You should be able to trigger it with two stations, but it's
> probably harder ...
>
> I need to analyse the refcounting here again and in more detail, but in
> the meantime can you try below patch?

Ok, I did that, and it's because the BSS list owns a reference, so
this'll just crash somewhere else for you now ... Below should fix it
completely.

johannes

---
 net/wireless/scan.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

--- wireless-testing.orig/net/wireless/scan.c	2010-10-06 19:59:41.000000000 +0200
+++ wireless-testing/net/wireless/scan.c	2010-10-06 20:10:41.000000000 +0200
@@ -668,14 +668,14 @@ void cfg80211_unlink_bss(struct wiphy *w
 	bss = container_of(pub, struct cfg80211_internal_bss, pub);
 
 	spin_lock_bh(&dev->bss_lock);
+	if (!list_empty(&bss->list)) {
+		list_del_init(&bss->list);
+		dev->bss_generation++;
+		rb_erase(&bss->rbn, &dev->bss_tree);
 
-	list_del(&bss->list);
-	dev->bss_generation++;
-	rb_erase(&bss->rbn, &dev->bss_tree);
-
+		kref_put(&bss->ref, bss_release);
+	}
 	spin_unlock_bh(&dev->bss_lock);
-
-	kref_put(&bss->ref, bss_release);
 }
 EXPORT_SYMBOL(cfg80211_unlink_bss);
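The point that "the BSS list owns a reference" can be illustrated with a toy reference count: if the list's reference is dropped on every unlink call rather than only on the call that actually removes the entry, a second call releases the object while another holder still uses it. This is a rough sketch under simplifying assumptions; the `ref_` names are hypothetical, a plain `int` stands in for `struct kref`, a flag stands in for list membership, and locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct cfg80211_internal_bss. */
struct ref_bss {
	int refs;	/* stand-in for struct kref */
	bool linked;	/* stand-in for !list_empty(&bss->list) */
	bool released;	/* set where real code would free the object */
};

/* Drop one reference; "release" the object when the count hits zero. */
static void ref_put(struct ref_bss *b)
{
	assert(b->refs > 0);
	if (--b->refs == 0)
		b->released = true;
}

/* Old shape: every call drops a reference, even if already unlinked. */
static void ref_unlink_unguarded(struct ref_bss *b)
{
	b->linked = false;
	ref_put(b);
}

/* Patched shape: only the call that removes the entry drops the list's reference. */
static void ref_unlink_guarded(struct ref_bss *b)
{
	if (b->linked) {
		b->linked = false;
		ref_put(b);
	}
}
```

With one reference held by the list and one by a caller, two guarded unlinks leave the caller's reference intact, while two unguarded unlinks bring the count to zero underneath the caller.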