From: Ben Greear <greearb@candelatech.com>
To: "Luis R. Rodriguez" <mcgrof@gmail.com>
Cc: Johannes Berg <johannes@sipsolutions.net>,
"linux-wireless@vger.kernel.org" <linux-wireless@vger.kernel.org>
Subject: Re: Crash in agg-tx.c, with ath9k and lots of STA VIFs.
Date: Mon, 04 Oct 2010 20:39:39 -0700
Message-ID: <4CAA9DFB.4090009@candelatech.com>
In-Reply-To: <AANLkTi=3EXnANmEP28Z2ASmqxopbe-EDypska72bL=to@mail.gmail.com>

On 10/04/2010 04:48 PM, Luis R. Rodriguez wrote:
> On Mon, Oct 4, 2010 at 2:38 PM, Ben Greear<greearb@candelatech.com> wrote:
>> On 10/04/2010 02:13 PM, Luis R. Rodriguez wrote:
>>>
>>> On Mon, Oct 4, 2010 at 2:12 PM, Luis R. Rodriguez<mcgrof@gmail.com>
>>> wrote:
>>>>
>>>> On Mon, Oct 4, 2010 at 12:10 PM, Johannes Berg
>>>> <johannes@sipsolutions.net> wrote:
>>>>>
>>>>> On Mon, 2010-10-04 at 12:04 -0700, Ben Greear wrote:
>>>>>>
>>>>>> On 10/04/2010 12:01 PM, Johannes Berg wrote:
>>>>>>>
>>>>>>> On Mon, 2010-10-04 at 11:51 -0700, Ben Greear wrote:
>>>>>>>>
>>>>>>>> Just in case this seems familiar to anyone...
>>>>>>>>
>>>>>>>> IP: [<f8ba74da>] ieee80211_stop_tx_ba_session+0x14/0x84 [mac80211]
>>>>>>>
>>>>>>> Do you have debug info that'd point to a code line?
>>>>>>>
>>>>>>> I have never heard of this.
>>>>>>
>>>>>> I don't actually know how to get a line of code out of those
>>>>>> hex offsets...
>>>>>>
>>>>>> Someone told me many years ago..but I lost that information :P
>>>>>
>>>>> Err, I never remember either, I think Luis knows the gdb thing ... I
>>>>> usually use "objdump -dS"
>>>>
>>>> gdb net/mac80211/mac80211.ko
>>>> l *(ieee80211_stop_tx_ba_session+0x14/0x84)
>>>
>>> Oops I meant:
>>>
>>> gdb net/mac80211/mac80211.ko
>>> l *(ieee80211_stop_tx_ba_session+0x14)
>>
>> Thank!
>>
>> I had to re-compile with debugging symbols, and added kgdb (hopefully
>> that won't mess anything up).
>
> You may want to look at using netconsole instead if your goal is
> just to get some oops off the box.
>
> CONFIG_NETCONSOLE=m
>
> mcgrof@tux ~/bin $ cat netconsole
> #!/bin/bash
> sudo dmesg -n 8
> sudo ip addr add 192.168.4.2/24 dev eth4
> sudo modprobe netconsole \
>     netconsole="@192.168.4.2/eth4,@192.168.4.3/00:1e:37:82:48:5a"
>
> I'd run that script on the dev box, and on 192.168.4.3 just do `nc -l
> -p 6666 | tee log`. To test just modprobe and rmmod ath9k.
>
>> Reading symbols from
>> /home/greearb/kernel/2.6/wireless-testing-dbg.p4s/net/mac80211/mac80211.ko...done.
>> (gdb) l *(ieee80211_stop_tx_ba_session+0x14)
>> 0x54fe is in ieee80211_stop_tx_ba_session
>> (/home/greearb/git/linux.wireless-testing/net/mac80211/agg-tx.c:595).
>> 590
>> 591 int ieee80211_stop_tx_ba_session(struct ieee80211_sta *pubsta, u16
>> tid)
>> 592 {
>> 593 struct sta_info *sta = container_of(pubsta, struct sta_info,
>> sta);
>> 594 struct ieee80211_sub_if_data *sdata = sta->sdata;
>> 595 struct ieee80211_local *local = sdata->local;
>
> What was the oops complaint? NULL pointer dereference? If sdata got
> screwed up that would be pretty serious, the only way that could
> happen is if somehow it managed to get removed prior to the
> ieee80211_stop_tx_ba_session() or if there is some sort of memory
> corruption. What steps do you follow to reproduce?

It's dying trying to dereference something, probably sdata, but for some reason I didn't
think it was NULL. (I was having trouble getting clean stack dumps
on the serial console, on top of my other issues today.) In a probably similar
crash it was trying to dereference 0x00100104 (see my 3:42 email
in this series).
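
As an aside, 0x00100104 looks like a list-poison address rather than random
garbage. A minimal sketch of the arithmetic, assuming a 32-bit x86 kernel where
LIST_POISON1 is 0x00100100 and LIST_POISON2 is 0x00200200 (the values in
include/linux/poison.h) and pointers are 4 bytes. The helper name
describe_fault is made up for illustration:

```python
# Poison values from include/linux/poison.h.
# Assumption: 32-bit x86, so POISON_POINTER_DELTA is 0 and pointers are 4 bytes.
LIST_POISON1 = 0x00100100  # stored in ->next by list_del() / hlist_del()
LIST_POISON2 = 0x00200200  # stored in ->prev (or ->pprev) by the same calls
POINTER_SIZE = 4


def describe_fault(addr):
    """Return a hint string if addr is a small offset from a list poison value.

    Hypothetical helper: it only does the subtraction, it does not inspect
    any kernel state.
    """
    for name, base in (("LIST_POISON1", LIST_POISON1),
                       ("LIST_POISON2", LIST_POISON2)):
        offset = addr - base
        if 0 <= offset < 0x100:
            field_index = offset // POINTER_SIZE
            return "%s+0x%x (pointer field #%d)" % (name, offset, field_index)
    return None


if __name__ == "__main__":
    print(describe_fault(0x00100104))
```

If that reading is right, the faulting access is to the second pointer field
(->prev or ->pprev) of an entry whose ->next was already poisoned, which would
fit something touching a list entry after it has been deleted more than it
fits a plain NULL sdata. That is only a guess from the address.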

I added printks to ieee80211_stop_tx_ba_session() to try to figure out what was happening, but
of course then I could no longer reproduce it, or at least it crashed in cfg80211_unlink_bss()
first.

To reproduce, I have a user-space app that creates 130 or so STA devices, starts wpa_supplicant
for each one, and then watches events with 'iw event', and reads /proc/net/wireless quite often
(and grabs some other stats out of debugfs, etc.). It runs 'iwconfig' and parses output for
other stats. In short, it does a bunch of things that would be hard to reproduce with any
simple script. The user-space app is proprietary, though I would of course give you a free
binary and help you set it up should you wish to use it.
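
A rough idea of the kind of setup and polling involved, sketched in Python.
This is not the proprietary app: the parent device name wlan0, the sta<N>
naming, the config directory and all function names are assumptions, and the
sample /proc/net/wireless text is illustrative rather than captured output.
The commands are only built as argument lists, not executed.

```python
def sta_setup_commands(parent, count, conf_dir="/tmp/wpa"):
    """Build (not run) the commands to create `count` STA VIFs on `parent`.

    Assumptions: interface names sta0..staN and one wpa_supplicant config
    file per interface under conf_dir (a hypothetical path).
    """
    cmds = []
    for i in range(count):
        sta = "sta%d" % i
        cmds.append(["iw", "dev", parent, "interface", "add", sta,
                     "type", "managed"])
        cmds.append(["wpa_supplicant", "-B", "-D", "nl80211", "-i", sta,
                     "-c", "%s/%s.conf" % (conf_dir, sta)])
    return cmds


def parse_proc_net_wireless(text):
    """Parse /proc/net/wireless content into {ifname: {status, link, level, noise}}."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, rest = line.split(":", 1)
        fields = rest.split()
        if len(fields) < 4:
            continue
        stats[name.strip()] = {
            "status": fields[0],
            "link": float(fields[1]),   # values carry a trailing '.', e.g. "70."
            "level": float(fields[2]),
            "noise": float(fields[3]),
        }
    return stats


# Illustrative sample only, not output from the crashing system.
SAMPLE = """\
Inter-| sta-|   Quality        |   Discarded packets               | Missed | WE
 face | tus | link level noise |  nwid  crypt   frag  retry   misc | beacon | 22
  sta1: 0000   70.  -40.  -256        0      0      0      0      0        0
  sta2: 0000   55.  -55.  -256        0      0      0      3      0        0
"""

if __name__ == "__main__":
    for cmd in sta_setup_commands("wlan0", 2):
        print(" ".join(cmd))
    print(parse_proc_net_wireless(SAMPLE))
```

Something this simple leaves out the 'iw event' watcher, the debugfs reads and
the iwconfig parsing, so it may well not trigger the crash on its own.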

When I disabled power-save, it ran a lot longer, but it would still hard-hang or occasionally
crash with a stack trace pointing to the 0x00100104 dereference.

Perhaps related: with power-save disabled, after a while (maybe 10-20 minutes), the system
would often get to a state where the ath9k no longer showed any additional transmitted packets
in its debugfs traffic output. The netdevices (sta1, etc.) would show tx pkt counters increasing,
and the qdiscs showed no backlog. It was getting rx interrupts, but no tx, according to
debugfs output. I didn't get a chance to debug that any further.

We have much better luck with ath5k in general, so I think most of these issues are
related to ath9k and/or 802.11n in general. But, even so, we do see deadlocks (on rtnl_lock, it seems)
with ath5k, and I still have some lockdep warnings to deal with in the mac80211 code,
so it's possible the problem is more general and ath9k just triggers it much more easily.

Thanks,
Ben
>
> Luis
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com