linux-wireless.vger.kernel.org archive mirror
* Crash in agg-tx.c, with ath9k and lots of STA VIFs.
@ 2010-10-04 18:51 Ben Greear
  2010-10-04 19:01 ` Johannes Berg
  0 siblings, 1 reply; 14+ messages in thread
From: Ben Greear @ 2010-10-04 18:51 UTC
  To: linux-wireless@vger.kernel.org

Just in case this seems familiar to anyone...

IP: [<f8ba74da>] ieee80211_stop_tx_ba_session+0x14/0x84 [mac80211]
*pde = 00000000
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:08:01.0/ieee80211/phy0/index
Modules linked in: michael_mic aes_i586 aes_generic 8021q garp stp llc macvlan pktgen fuse nfs lockd fscache nfs_acl ]

Pid: 16563, comm: parse-iwconfig. Not tainted 2.6.36-rc6-wl+ #23 PDSBM/PDSBM
EIP: 0060:[<f8ba74da>] EFLAGS: 00010282 CPU: 0
EIP is at ieee80211_stop_tx_ba_session+0x14/0x84 [mac80211]
EAX: 00010000 EBX: f30d5a98 ECX: 00000001 EDX: 00000000
ESI: f30d5800 EDI: f30d5800 EBP: f5ed5ef4 ESP: f5ed5ee0
  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process parse-iwconfig. (pid: 16563, ti=f5ed4000 task=f44ee860 task.ti=f5ed4000)
Stack:
  00000246 00000246 00000000 f30d5800 00000000 f5ed5f08 f8ba7b28 c0a2be80
<0> f2f11090 f5ed5f50 f5ed5f64 c043d5a8 00000000 00000002 00000000 c043d546
<0> f5ed5f50 f5ed5f44 f30d5a44 c0a2cca8 c0a2caa8 c0a2c8a8 c0a2c6a8 f8ba7a90
Call Trace:
  [<f8ba7b28>] ? sta_addba_resp_timer_expired+0x98/0xb4 [mac80211]
  [<c043d5a8>] ? run_timer_softirq+0x14f/0x1e7
  [<c043d546>] ? run_timer_softirq+0xed/0x1e7
  [<f8ba7a90>] ? sta_addba_resp_timer_expired+0x0/0xb4 [mac80211]
  [<c04393f2>] ? __do_softirq+0x86/0x111
  [<c04394b3>] ? do_softirq+0x36/0x5a
  [<c04395ec>] ? irq_exit+0x35/0x69
  [<c0418d1b>] ? smp_apic_timer_interrupt+0x6e/0x7c
  [<c076459f>] ? apic_timer_interrupt+0x2f/0x40
Code: b2 c7 8b 45 ec 8d 93 00 fe ff ff e8 46 23 01 00 58 5a 5b 5e 5f 5d c3 55 89 e5 57 56 53 89 c3 8d b8 68 fd ff ff
EIP: [<f8ba74da>] ieee80211_stop_tx_ba_session+0x14/0x84 [mac80211] SS:ESP 0068:f5ed5ee0
CR2: 00000000000101e0
---[ end trace 4c1f9d6f2b4d6888 ]---



-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: Crash in agg-tx.c, with ath9k and lots of STA VIFs.
  2010-10-04 18:51 Crash in agg-tx.c, with ath9k and lots of STA VIFs Ben Greear
@ 2010-10-04 19:01 ` Johannes Berg
  2010-10-04 19:04   ` Ben Greear
  0 siblings, 1 reply; 14+ messages in thread
From: Johannes Berg @ 2010-10-04 19:01 UTC
  To: Ben Greear; +Cc: linux-wireless@vger.kernel.org

On Mon, 2010-10-04 at 11:51 -0700, Ben Greear wrote:
> Just in case this seems familiar to anyone...
> 
> IP: [<f8ba74da>] ieee80211_stop_tx_ba_session+0x14/0x84 [mac80211]

Do you have debug info that'd point to a code line?

I have never heard of this.

johannes



* Re: Crash in agg-tx.c, with ath9k and lots of STA VIFs.
  2010-10-04 19:01 ` Johannes Berg
@ 2010-10-04 19:04   ` Ben Greear
  2010-10-04 19:10     ` Johannes Berg
  0 siblings, 1 reply; 14+ messages in thread
From: Ben Greear @ 2010-10-04 19:04 UTC
  To: Johannes Berg; +Cc: linux-wireless@vger.kernel.org

On 10/04/2010 12:01 PM, Johannes Berg wrote:
> On Mon, 2010-10-04 at 11:51 -0700, Ben Greear wrote:
>> Just in case this seems familiar to anyone...
>>
>> IP: [<f8ba74da>] ieee80211_stop_tx_ba_session+0x14/0x84 [mac80211]
>
> Do you have debug info that'd point to a code line?
>
> I have never heard of this.

I don't actually know how to get a line of code out of those
hex offsets...

Someone told me many years ago..but I lost that information :P

Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: Crash in agg-tx.c, with ath9k and lots of STA VIFs.
  2010-10-04 19:04   ` Ben Greear
@ 2010-10-04 19:10     ` Johannes Berg
  2010-10-04 21:12       ` Luis R. Rodriguez
  0 siblings, 1 reply; 14+ messages in thread
From: Johannes Berg @ 2010-10-04 19:10 UTC
  To: Ben Greear; +Cc: linux-wireless@vger.kernel.org

On Mon, 2010-10-04 at 12:04 -0700, Ben Greear wrote:
> On 10/04/2010 12:01 PM, Johannes Berg wrote:
> > On Mon, 2010-10-04 at 11:51 -0700, Ben Greear wrote:
> >> Just in case this seems familiar to anyone...
> >>
> >> IP: [<f8ba74da>] ieee80211_stop_tx_ba_session+0x14/0x84 [mac80211]
> >
> > Do you have debug info that'd point to a code line?
> >
> > I have never heard of this.
> 
> I don't actually know how to get a line of code out of those
> hex offsets...
> 
> Someone told me many years ago..but I lost that information :P

Err, I never remember either, I think Luis knows the gdb thing ... I
usually use "objdump -dS"

johannes



* Re: Crash in agg-tx.c, with ath9k and lots of STA VIFs.
  2010-10-04 19:10     ` Johannes Berg
@ 2010-10-04 21:12       ` Luis R. Rodriguez
  2010-10-04 21:13         ` Luis R. Rodriguez
  0 siblings, 1 reply; 14+ messages in thread
From: Luis R. Rodriguez @ 2010-10-04 21:12 UTC
  To: Johannes Berg; +Cc: Ben Greear, linux-wireless@vger.kernel.org

On Mon, Oct 4, 2010 at 12:10 PM, Johannes Berg
<johannes@sipsolutions.net> wrote:
> On Mon, 2010-10-04 at 12:04 -0700, Ben Greear wrote:
>> On 10/04/2010 12:01 PM, Johannes Berg wrote:
>> > On Mon, 2010-10-04 at 11:51 -0700, Ben Greear wrote:
>> >> Just in case this seems familiar to anyone...
>> >>
>> >> IP: [<f8ba74da>] ieee80211_stop_tx_ba_session+0x14/0x84 [mac80211]
>> >
>> > Do you have debug info that'd point to a code line?
>> >
>> > I have never heard of this.
>>
>> I don't actually know how to get a line of code out of those
>> hex offsets...
>>
>> Someone told me many years ago..but I lost that information :P
>
> Err, I never remember either, I think Luis knows the gdb thing ... I
> usually use "objdump -dS"

gdb net/mac80211/mac80211.ko
l *(ieee80211_stop_tx_ba_session+0x14/0x84)


  Luis


* Re: Crash in agg-tx.c, with ath9k and lots of STA VIFs.
  2010-10-04 21:12       ` Luis R. Rodriguez
@ 2010-10-04 21:13         ` Luis R. Rodriguez
  2010-10-04 21:38           ` Ben Greear
  0 siblings, 1 reply; 14+ messages in thread
From: Luis R. Rodriguez @ 2010-10-04 21:13 UTC
  To: Johannes Berg; +Cc: Ben Greear, linux-wireless@vger.kernel.org

On Mon, Oct 4, 2010 at 2:12 PM, Luis R. Rodriguez <mcgrof@gmail.com> wrote:
> On Mon, Oct 4, 2010 at 12:10 PM, Johannes Berg
> <johannes@sipsolutions.net> wrote:
>> On Mon, 2010-10-04 at 12:04 -0700, Ben Greear wrote:
>>> On 10/04/2010 12:01 PM, Johannes Berg wrote:
>>> > On Mon, 2010-10-04 at 11:51 -0700, Ben Greear wrote:
>>> >> Just in case this seems familiar to anyone...
>>> >>
>>> >> IP: [<f8ba74da>] ieee80211_stop_tx_ba_session+0x14/0x84 [mac80211]
>>> >
>>> > Do you have debug info that'd point to a code line?
>>> >
>>> > I have never heard of this.
>>>
>>> I don't actually know how to get a line of code out of those
>>> hex offsets...
>>>
>>> Someone told me many years ago..but I lost that information :P
>>
>> Err, I never remember either, I think Luis knows the gdb thing ... I
>> usually use "objdump -dS"
>
> gdb net/mac80211/mac80211.ko
> l *(ieee80211_stop_tx_ba_session+0x14/0x84)

Oops I meant:

gdb net/mac80211/mac80211.ko
l *(ieee80211_stop_tx_ba_session+0x14)
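
For reference, the location printed in the oops has the form
symbol+offset/size: 0x14 is the offset of the faulting instruction from
the start of the function and 0x84 is the length of the function, which
is why only the offset belongs in the gdb expression. A minimal C sketch
of that split follows; parse_oops_loc and format_gdb_list are
hypothetical helper names used for illustration, not part of any kernel
tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Pieces of an oops location such as "func+0x14/0x84". */
struct oops_loc {
    char symbol[128];      /* function name */
    unsigned long offset;  /* bytes from function start to the faulting instruction */
    unsigned long size;    /* total length of the function in bytes */
};

/* Returns 0 on success, -1 if text is not "symbol+0xOFF/0xSIZE". */
int parse_oops_loc(const char *text, struct oops_loc *out)
{
    assert(text != NULL && out != NULL);
    /* %lx accepts an optional 0x prefix; the scanset stops at '+'. */
    if (sscanf(text, "%127[^+]+%lx/%lx",
               out->symbol, &out->offset, &out->size) != 3)
        return -1;
    return 0;
}

/* Builds the gdb "list" command: the size part is dropped. */
int format_gdb_list(const struct oops_loc *loc, char *buf, size_t len)
{
    assert(loc != NULL && buf != NULL);
    return snprintf(buf, len, "l *(%s+0x%lx)", loc->symbol, loc->offset);
}
```

Feeding it the string from the oops should give back the same command
as the corrected one above.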

   Luis


* Re: Crash in agg-tx.c, with ath9k and lots of STA VIFs.
  2010-10-04 21:13         ` Luis R. Rodriguez
@ 2010-10-04 21:38           ` Ben Greear
  2010-10-04 22:42             ` Ben Greear
  2010-10-04 23:48             ` Luis R. Rodriguez
  0 siblings, 2 replies; 14+ messages in thread
From: Ben Greear @ 2010-10-04 21:38 UTC
  To: Luis R. Rodriguez; +Cc: Johannes Berg, linux-wireless@vger.kernel.org

On 10/04/2010 02:13 PM, Luis R. Rodriguez wrote:
> On Mon, Oct 4, 2010 at 2:12 PM, Luis R. Rodriguez<mcgrof@gmail.com>  wrote:
>> On Mon, Oct 4, 2010 at 12:10 PM, Johannes Berg
>> <johannes@sipsolutions.net>  wrote:
>>> On Mon, 2010-10-04 at 12:04 -0700, Ben Greear wrote:
>>>> On 10/04/2010 12:01 PM, Johannes Berg wrote:
>>>>> On Mon, 2010-10-04 at 11:51 -0700, Ben Greear wrote:
>>>>>> Just in case this seems familiar to anyone...
>>>>>>
>>>>>> IP: [<f8ba74da>] ieee80211_stop_tx_ba_session+0x14/0x84 [mac80211]
>>>>>
>>>>> Do you have debug info that'd point to a code line?
>>>>>
>>>>> I have never heard of this.
>>>>
>>>> I don't actually know how to get a line of code out of those
>>>> hex offsets...
>>>>
>>>> Someone told me many years ago..but I lost that information :P
>>>
>>> Err, I never remember either, I think Luis knows the gdb thing ... I
>>> usually use "objdump -dS"
>>
>> gdb net/mac80211/mac80211.ko
>> l *(ieee80211_stop_tx_ba_session+0x14/0x84)
>
> Oops I meant:
>
> gdb net/mac80211/mac80211.ko
> l *(ieee80211_stop_tx_ba_session+0x14)

Thanks!

I had to re-compile with debugging symbols, and added kgdb (hopefully
that won't mess anything up).

Reading symbols from /home/greearb/kernel/2.6/wireless-testing-dbg.p4s/net/mac80211/mac80211.ko...done.
(gdb) l *(ieee80211_stop_tx_ba_session+0x14)
0x54fe is in ieee80211_stop_tx_ba_session (/home/greearb/git/linux.wireless-testing/net/mac80211/agg-tx.c:595).
590	
591	int ieee80211_stop_tx_ba_session(struct ieee80211_sta *pubsta, u16 tid)
592	{
593		struct sta_info *sta = container_of(pubsta, struct sta_info, sta);
594		struct ieee80211_sub_if_data *sdata = sta->sdata;
595		struct ieee80211_local *local = sdata->local;
596		struct tid_ampdu_tx *tid_tx;
597		int ret = 0;
598	
599		trace_api_stop_tx_ba_session(pubsta, tid);
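
The pointer chain in lines 593-595 of that listing can be sketched in
userspace as follows. The mock_* structures are hypothetical, heavily
reduced stand-ins for the real mac80211 ones, and the container_of
macro here is a simplified copy without the kernel's type checking.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace container_of: step back from a member to its parent. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for ieee80211_local, sub_if_data, sta, sta_info. */
struct mock_local { int id; };
struct mock_sdata { struct mock_local *local; };
struct mock_pubsta { unsigned char addr[6]; };
struct mock_sta_info {
    struct mock_sdata *sdata;
    struct mock_pubsta sta;   /* embedded public part handed to drivers */
};

/* Mirrors lines 593-595: pubsta -> sta_info -> sdata -> local. */
struct mock_local *local_from_pubsta(struct mock_pubsta *pubsta)
{
    struct mock_sta_info *sta =
        container_of(pubsta, struct mock_sta_info, sta);

    /* If sta was freed or corrupted, sdata is garbage and the next load faults. */
    assert(sta->sdata != NULL);
    return sta->sdata->local;
}
```

The point of the sketch is that line 595 only faults if the sdata
value loaded on line 594 is bad, which in turn points at the sta_info
itself being stale or overwritten.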


I'm not sure I quite got the hang of kgdb yet, but hoping to get that working
and reproduce with it enabled...

Thanks,
Ben

>
>     Luis


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: Crash in agg-tx.c, with ath9k and lots of STA VIFs.
  2010-10-04 21:38           ` Ben Greear
@ 2010-10-04 22:42             ` Ben Greear
  2010-10-05  7:56               ` Johannes Berg
  2010-10-04 23:48             ` Luis R. Rodriguez
  1 sibling, 1 reply; 14+ messages in thread
From: Ben Greear @ 2010-10-04 22:42 UTC
  To: Luis R. Rodriguez; +Cc: Johannes Berg, linux-wireless@vger.kernel.org

On 10/04/2010 02:38 PM, Ben Greear wrote:
> On 10/04/2010 02:13 PM, Luis R. Rodriguez wrote:
>> On Mon, Oct 4, 2010 at 2:12 PM, Luis R. Rodriguez<mcgrof@gmail.com>
>> wrote:
>>> On Mon, Oct 4, 2010 at 12:10 PM, Johannes Berg
>>> <johannes@sipsolutions.net> wrote:
>>>> On Mon, 2010-10-04 at 12:04 -0700, Ben Greear wrote:
>>>>> On 10/04/2010 12:01 PM, Johannes Berg wrote:
>>>>>> On Mon, 2010-10-04 at 11:51 -0700, Ben Greear wrote:
>>>>>>> Just in case this seems familiar to anyone...
>>>>>>>
>>>>>>> IP: [<f8ba74da>] ieee80211_stop_tx_ba_session+0x14/0x84 [mac80211]
>>>>>>
>>>>>> Do you have debug info that'd point to a code line?
>>>>>>
>>>>>> I have never heard of this.
>>>>>
>>>>> I don't actually know how to get a line of code out of those
>>>>> hex offsets...
>>>>>
>>>>> Someone told me many years ago..but I lost that information :P
>>>>
>>>> Err, I never remember either, I think Luis knows the gdb thing ... I
>>>> usually use "objdump -dS"
>>>
>>> gdb net/mac80211/mac80211.ko
>>> l *(ieee80211_stop_tx_ba_session+0x14/0x84)
>>
>> Oops I meant:
>>
>> gdb net/mac80211/mac80211.ko
>> l *(ieee80211_stop_tx_ba_session+0x14)
>
> Thank!
>
> I had to re-compile with debugging symbols, and added kgdb (hopefully
> that won't mess anything up).
>
> Reading symbols from
> /home/greearb/kernel/2.6/wireless-testing-dbg.p4s/net/mac80211/mac80211.ko...done.
>
> (gdb) l *(ieee80211_stop_tx_ba_session+0x14)
> 0x54fe is in ieee80211_stop_tx_ba_session
> (/home/greearb/git/linux.wireless-testing/net/mac80211/agg-tx.c:595).
> 590
> 591 int ieee80211_stop_tx_ba_session(struct ieee80211_sta *pubsta, u16 tid)
> 592 {
> 593 struct sta_info *sta = container_of(pubsta, struct sta_info, sta);
> 594 struct ieee80211_sub_if_data *sdata = sta->sdata;
> 595 struct ieee80211_local *local = sdata->local;
> 596 struct tid_ampdu_tx *tid_tx;
> 597 int ret = 0;
> 598
> 599 trace_api_stop_tx_ba_session(pubsta, tid);
>
>
> I'm not sure I quite got the hang of kgdb yet, but hoping to get that
> working
> and reproduce with it enabled...

I gave up on getting kgdb to work..seemed to cause more problems than
it fixed.

I did find a similar crash while debugging a kernel with just symbols
compiled in (not kgdb) though:

The interesting thing to me is that the 00100104 address is the same in
both crashes, though this one is in a different method.

Also, this is with power-save NOT disabled (I was hoping to
hit some debugging code I put in for the other crash..hit this instead.)

Oct  4 15:26:15 localhost kernel: BUG: unable to handle kernel paging request at 00100104
Oct  4 15:26:15 localhost kernel: IP: [<fc77e038>] cfg80211_unlink_bss+0x4d/0x8d [cfg80211]
Oct  4 15:26:15 localhost kernel: *pde = 00000000
Oct  4 15:26:15 localhost kernel: Oops: 0002 [#1] SMP
Oct  4 15:26:15 localhost kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:08:01.0/net/sta26/flags
Oct  4 15:26:15 localhost kernel: Modules linked in: michael_mic ath5k arc4 ath9k mac80211 ath9k_common ath9k_hw ath cfg80211 aes_i586 aes_generic 8021q garp stp llc macvlan pktgen fuse nfs lockd fscache nfs_acl auth_rpcgss sunrpc ipv6 uinput ecb e1000e iTCO_wdt iTCO_vendor_support pcspkr i2c_i801 microcode i915 drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded: ipt_addrtype]
Oct  4 15:26:15 localhost kernel:
Oct  4 15:26:15 localhost kernel: Pid: 41, comm: kworker/u:2 Not tainted 2.6.36-rc6-wl+ #4 PDSBM/PDSBM
Oct  4 15:26:15 localhost kernel: EIP: 0060:[<fc77e038>] EFLAGS: 00010282 CPU: 1
Oct  4 15:26:15 localhost kernel: EIP is at cfg80211_unlink_bss+0x4d/0x8d [cfg80211]
Oct  4 15:26:15 localhost kernel: EAX: 00200200 EBX: f2779424 ECX: 00100100 EDX: f2779400
Oct  4 15:26:15 localhost kernel: ESI: f53a0180 EDI: f53a0000 EBP: f73d3ec4 ESP: f73d3eb0
Oct  4 15:26:15 localhost kernel: DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Oct  4 15:26:15 localhost kernel: Process kworker/u:2 (pid: 41, ti=f73d2000 task=f71272d0 task.ti=f73d2000)


(gdb) l *(cfg80211_unlink_bss+0x4d)
0x405c is in cfg80211_unlink_bss (/home/greearb/git/linux.wireless-testing/include/linux/list.h:89).
84	 * This is only for internal list manipulation where we know
85	 * the prev/next entries already!
86	 */
87	static inline void __list_del(struct list_head * prev, struct list_head * next)
88	{
89		next->prev = prev;
90		prev->next = next;
91	}
92	
93	/**


Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: Crash in agg-tx.c, with ath9k and lots of STA VIFs.
  2010-10-04 21:38           ` Ben Greear
  2010-10-04 22:42             ` Ben Greear
@ 2010-10-04 23:48             ` Luis R. Rodriguez
  2010-10-05  3:39               ` Ben Greear
  2010-10-05  3:43               ` Ben Greear
  1 sibling, 2 replies; 14+ messages in thread
From: Luis R. Rodriguez @ 2010-10-04 23:48 UTC
  To: Ben Greear; +Cc: Johannes Berg, linux-wireless@vger.kernel.org

On Mon, Oct 4, 2010 at 2:38 PM, Ben Greear <greearb@candelatech.com> wrote:
> On 10/04/2010 02:13 PM, Luis R. Rodriguez wrote:
>>
>> On Mon, Oct 4, 2010 at 2:12 PM, Luis R. Rodriguez<mcgrof@gmail.com>
>>  wrote:
>>>
>>> On Mon, Oct 4, 2010 at 12:10 PM, Johannes Berg
>>> <johannes@sipsolutions.net>  wrote:
>>>>
>>>> On Mon, 2010-10-04 at 12:04 -0700, Ben Greear wrote:
>>>>>
>>>>> On 10/04/2010 12:01 PM, Johannes Berg wrote:
>>>>>>
>>>>>> On Mon, 2010-10-04 at 11:51 -0700, Ben Greear wrote:
>>>>>>>
>>>>>>> Just in case this seems familiar to anyone...
>>>>>>>
>>>>>>> IP: [<f8ba74da>] ieee80211_stop_tx_ba_session+0x14/0x84 [mac80211]
>>>>>>
>>>>>> Do you have debug info that'd point to a code line?
>>>>>>
>>>>>> I have never heard of this.
>>>>>
>>>>> I don't actually know how to get a line of code out of those
>>>>> hex offsets...
>>>>>
>>>>> Someone told me many years ago..but I lost that information :P
>>>>
>>>> Err, I never remember either, I think Luis knows the gdb thing ... I
>>>> usually use "objdump -dS"
>>>
>>> gdb net/mac80211/mac80211.ko
>>> l *(ieee80211_stop_tx_ba_session+0x14/0x84)
>>
>> Oops I meant:
>>
>> gdb net/mac80211/mac80211.ko
>> l *(ieee80211_stop_tx_ba_session+0x14)
>
> Thank!
>
> I had to re-compile with debugging symbols, and added kgdb (hopefully
> that won't mess anything up).

You may want to look at using netconsole instead if your goal is
just to get some oops off the box.

CONFIG_NETCONSOLE=m

mcgrof@tux ~/bin $ cat netconsole
#!/bin/bash
sudo dmesg -n 8
sudo ip addr add 192.168.4.2/24 dev eth4
# netconsole= is a module parameter, so it has to be passed to modprobe
sudo modprobe netconsole \
	netconsole="@192.168.4.2/eth4,@192.168.4.3/00:1e:37:82:48:5a"

I'd run that script on the dev box, and on 192.168.4.3 just do `nc -u -l
-p 6666 | tee log` (netconsole sends UDP, hence the -u). To test, just
modprobe and rmmod ath9k.

> Reading symbols from
> /home/greearb/kernel/2.6/wireless-testing-dbg.p4s/net/mac80211/mac80211.ko...done.
> (gdb) l *(ieee80211_stop_tx_ba_session+0x14)
> 0x54fe is in ieee80211_stop_tx_ba_session
> (/home/greearb/git/linux.wireless-testing/net/mac80211/agg-tx.c:595).
> 590
> 591     int ieee80211_stop_tx_ba_session(struct ieee80211_sta *pubsta, u16
> tid)
> 592     {
> 593             struct sta_info *sta = container_of(pubsta, struct sta_info,
> sta);
> 594             struct ieee80211_sub_if_data *sdata = sta->sdata;
> 595             struct ieee80211_local *local = sdata->local;

What was the oops complaint? NULL pointer dereference? If sdata got
screwed up that would be pretty serious, the only way that could
happen is if somehow it managed to get removed prior to the
ieee80211_stop_tx_ba_session() or if there is some sort of memory
corruption. What steps do you follow to reproduce?

  Luis


* Re: Crash in agg-tx.c, with ath9k and lots of STA VIFs.
  2010-10-04 23:48             ` Luis R. Rodriguez
@ 2010-10-05  3:39               ` Ben Greear
  2010-10-05  6:09                 ` Luis R. Rodriguez
  2010-10-05  3:43               ` Ben Greear
  1 sibling, 1 reply; 14+ messages in thread
From: Ben Greear @ 2010-10-05  3:39 UTC
  To: Luis R. Rodriguez; +Cc: Johannes Berg, linux-wireless@vger.kernel.org

On 10/04/2010 04:48 PM, Luis R. Rodriguez wrote:
> On Mon, Oct 4, 2010 at 2:38 PM, Ben Greear<greearb@candelatech.com>  wrote:
>> On 10/04/2010 02:13 PM, Luis R. Rodriguez wrote:
>>>
>>> On Mon, Oct 4, 2010 at 2:12 PM, Luis R. Rodriguez<mcgrof@gmail.com>
>>>   wrote:
>>>>
>>>> On Mon, Oct 4, 2010 at 12:10 PM, Johannes Berg
>>>> <johannes@sipsolutions.net>    wrote:
>>>>>
>>>>> On Mon, 2010-10-04 at 12:04 -0700, Ben Greear wrote:
>>>>>>
>>>>>> On 10/04/2010 12:01 PM, Johannes Berg wrote:
>>>>>>>
>>>>>>> On Mon, 2010-10-04 at 11:51 -0700, Ben Greear wrote:
>>>>>>>>
>>>>>>>> Just in case this seems familiar to anyone...
>>>>>>>>
>>>>>>>> IP: [<f8ba74da>] ieee80211_stop_tx_ba_session+0x14/0x84 [mac80211]
>>>>>>>
>>>>>>> Do you have debug info that'd point to a code line?
>>>>>>>
>>>>>>> I have never heard of this.
>>>>>>
>>>>>> I don't actually know how to get a line of code out of those
>>>>>> hex offsets...
>>>>>>
>>>>>> Someone told me many years ago..but I lost that information :P
>>>>>
>>>>> Err, I never remember either, I think Luis knows the gdb thing ... I
>>>>> usually use "objdump -dS"
>>>>
>>>> gdb net/mac80211/mac80211.ko
>>>> l *(ieee80211_stop_tx_ba_session+0x14/0x84)
>>>
>>> Oops I meant:
>>>
>>> gdb net/mac80211/mac80211.ko
>>> l *(ieee80211_stop_tx_ba_session+0x14)
>>
>> Thank!
>>
>> I had to re-compile with debugging symbols, and added kgdb (hopefully
>> that won't mess anything up).
>
> You may want to look at using netconsole instead if you're goal is
> just to get some oops off the box.
>
> CONFIG_NETCONSOLE=m
>
> mcgrof@tux ~/bin $ cat netconsole
> #!/bin/bash
> sudo dmesg -n 8
> sudo ip addr add 192.168.4.2/24 dev eth4
> sudo modprobe netconsole
> netconsole="@192.168.4.2/eth4,@192.168.4.3/00:1e:37:82:48:5a"
>
> I'd run that script on the dev box, and on 192.168.4.3 just do `nc -l
> -p 6666 | tee log`. To test just modprobe and rmmod ath9k.
>
>> Reading symbols from
>> /home/greearb/kernel/2.6/wireless-testing-dbg.p4s/net/mac80211/mac80211.ko...done.
>> (gdb) l *(ieee80211_stop_tx_ba_session+0x14)
>> 0x54fe is in ieee80211_stop_tx_ba_session
>> (/home/greearb/git/linux.wireless-testing/net/mac80211/agg-tx.c:595).
>> 590
>> 591     int ieee80211_stop_tx_ba_session(struct ieee80211_sta *pubsta, u16
>> tid)
>> 592     {
>> 593             struct sta_info *sta = container_of(pubsta, struct sta_info,
>> sta);
>> 594             struct ieee80211_sub_if_data *sdata = sta->sdata;
>> 595             struct ieee80211_local *local = sdata->local;
>
> What was the oops complaint? NULL pointer dereference? If sdata got
> screwed up that would be pretty serious, the only way that could
> happen is if somehow it managed to get removed prior to the
> ieee80211_stop_tx_ba_session() or if there is some sort of memory
> corruption., What steps do you follow to reproduce?

It's dying trying to dereference something, probably sdata, but for some reason I didn't
think it was NULL.  (I was having trouble getting clean stack dumps
on the serial console on top of my other issues today.)  In a probably-similar
crash it was trying to dereference 0x00100104 (see my 3:42 email
in this thread).
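
The register dump in the first oops fits that impression: CR2 is
0x000101e0, well above the first page, so it does not look like a plain
NULL dereference, and EAX holds 0x00010000. A small sketch of that
reasoning follows. It assumes EAX held sdata at the faulting
instruction, which the truncated Code: line cannot confirm, and the
helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_X86 4096u

/* A fault inside the first page usually means NULL plus a field offset. */
int looks_like_null_deref(uint32_t cr2)
{
    return cr2 < PAGE_SIZE_X86;
}

/* Offset of the accessed field if `base` was the pointer being dereferenced. */
uint32_t field_offset(uint32_t cr2, uint32_t base)
{
    assert(cr2 >= base);
    return cr2 - base;
}
```

Under that assumption the faulting access would be at offset 0x1e0 from
a bogus pointer value of 0x00010000, which is neither NULL nor a list
poison value.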

I added printks to ieee80211_stop_tx_ba_session() to try to figure out what was happening, but
of course then I could no longer reproduce it, or at least it crashed in cfg80211_unlink_bss()
first.

To reproduce, I have a user-space app that creates 130 or so STA devices, starts wpa_supplicant
for each one, and then watches events with 'iw event', and reads /proc/net/wireless quite often
(and grabs some other stats out of debugfs, etc).  It runs 'iwconfig' and parses output for
other stats.  In short, it does a bunch of things that would be hard to reproduce with any
simple script.  The user-space app is proprietary, though I would of course give you a free
binary and help you set it up should you wish to use it.

When I disabled power-save, it ran a lot longer, but it would still hard-hang or occasionally
crash with stack-trace pointing to the 0x00100104 dereference.

Perhaps related, with power-save disabled, after a while (maybe 10-20 minutes), the system
would often get to a state where the ath9k no longer showed any additional transmitted packets
in its debugfs traffic.  The netdevices (sta1, etc.) would show tx pkt counters increasing,
and the qdiscs showed no backlog.  It was getting rx interrupts, but no tx, according to
debugfs output.  I didn't get any chance to debug that any further.

We have much better luck with ath5k in general, so I think most of these issues are
related to ath9k and/or 802.11n in general.  But, even so, we do see deadlocks (on rtnl_lock, it seems)
with ath5k, and I still have some lockdep warnings to deal with in the mac80211 code,
so it's possible the problem is more general and ath9k just triggers it much more easily.

Thanks,
Ben

>
>    Luis


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: Crash in agg-tx.c, with ath9k and lots of STA VIFs.
  2010-10-04 23:48             ` Luis R. Rodriguez
  2010-10-05  3:39               ` Ben Greear
@ 2010-10-05  3:43               ` Ben Greear
  1 sibling, 0 replies; 14+ messages in thread
From: Ben Greear @ 2010-10-05  3:43 UTC
  To: Luis R. Rodriguez; +Cc: Johannes Berg, linux-wireless@vger.kernel.org

On 10/04/2010 04:48 PM, Luis R. Rodriguez wrote:
> On Mon, Oct 4, 2010 at 2:38 PM, Ben Greear<greearb@candelatech.com>  wrote:
>> On 10/04/2010 02:13 PM, Luis R. Rodriguez wrote:
>>>
>>> On Mon, Oct 4, 2010 at 2:12 PM, Luis R. Rodriguez<mcgrof@gmail.com>
>>>   wrote:
>>>>
>>>> On Mon, Oct 4, 2010 at 12:10 PM, Johannes Berg
>>>> <johannes@sipsolutions.net>    wrote:
>>>>>
>>>>> On Mon, 2010-10-04 at 12:04 -0700, Ben Greear wrote:
>>>>>>
>>>>>> On 10/04/2010 12:01 PM, Johannes Berg wrote:
>>>>>>>
>>>>>>> On Mon, 2010-10-04 at 11:51 -0700, Ben Greear wrote:
>>>>>>>>
>>>>>>>> Just in case this seems familiar to anyone...
>>>>>>>>
>>>>>>>> IP: [<f8ba74da>] ieee80211_stop_tx_ba_session+0x14/0x84 [mac80211]
>>>>>>>
>>>>>>> Do you have debug info that'd point to a code line?
>>>>>>>
>>>>>>> I have never heard of this.
>>>>>>
>>>>>> I don't actually know how to get a line of code out of those
>>>>>> hex offsets...
>>>>>>
>>>>>> Someone told me many years ago..but I lost that information :P
>>>>>
>>>>> Err, I never remember either, I think Luis knows the gdb thing ... I
>>>>> usually use "objdump -dS"
>>>>
>>>> gdb net/mac80211/mac80211.ko
>>>> l *(ieee80211_stop_tx_ba_session+0x14/0x84)
>>>
>>> Oops I meant:
>>>
>>> gdb net/mac80211/mac80211.ko
>>> l *(ieee80211_stop_tx_ba_session+0x14)
>>
>> Thank!
>>
>> I had to re-compile with debugging symbols, and added kgdb (hopefully
>> that won't mess anything up).
>
> You may want to look at using netconsole instead if you're goal is
> just to get some oops off the box.
>
> CONFIG_NETCONSOLE=m
>
> mcgrof@tux ~/bin $ cat netconsole
> #!/bin/bash
> sudo dmesg -n 8
> sudo ip addr add 192.168.4.2/24 dev eth4
> sudo modprobe netconsole
> netconsole="@192.168.4.2/eth4,@192.168.4.3/00:1e:37:82:48:5a"
>
> I'd run that script on the dev box, and on 192.168.4.3 just do `nc -l
> -p 6666 | tee log`. To test just modprobe and rmmod ath9k.

I have serial-console, is netconsole any better, or just useful
if you don't have serial console?

I was sort of hoping kgdb would magically drop me into a debug
shell on panic and let me look at backtraces and variables...but
instead the system would go OOM, hard-lock, or panic, show a single
line of panic output, and then be dead to the world.  I've enough to
debug without debugging kgdb as well :)

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: Crash in agg-tx.c, with ath9k and lots of STA VIFs.
  2010-10-05  3:39               ` Ben Greear
@ 2010-10-05  6:09                 ` Luis R. Rodriguez
  0 siblings, 0 replies; 14+ messages in thread
From: Luis R. Rodriguez @ 2010-10-05  6:09 UTC
  To: Ben Greear; +Cc: Johannes Berg, linux-wireless@vger.kernel.org

On Mon, Oct 4, 2010 at 8:39 PM, Ben Greear <greearb@candelatech.com> wrote:
> On 10/04/2010 04:48 PM, Luis R. Rodriguez wrote:
>>
>> On Mon, Oct 4, 2010 at 2:38 PM, Ben Greear<greearb@candelatech.com>
>>  wrote:
>>>
>>> On 10/04/2010 02:13 PM, Luis R. Rodriguez wrote:
>>>>
>>>> On Mon, Oct 4, 2010 at 2:12 PM, Luis R. Rodriguez<mcgrof@gmail.com>
>>>>  wrote:
>>>>>
>>>>> On Mon, Oct 4, 2010 at 12:10 PM, Johannes Berg
>>>>> <johannes@sipsolutions.net>    wrote:
>>>>>>
>>>>>> On Mon, 2010-10-04 at 12:04 -0700, Ben Greear wrote:
>>>>>>>
>>>>>>> On 10/04/2010 12:01 PM, Johannes Berg wrote:
>>>>>>>>
>>>>>>>> On Mon, 2010-10-04 at 11:51 -0700, Ben Greear wrote:
>>>>>>>>>
>>>>>>>>> Just in case this seems familiar to anyone...
>>>>>>>>>
>>>>>>>>> IP: [<f8ba74da>] ieee80211_stop_tx_ba_session+0x14/0x84 [mac80211]
>>>>>>>>
>>>>>>>> Do you have debug info that'd point to a code line?
>>>>>>>>
>>>>>>>> I have never heard of this.
>>>>>>>
>>>>>>> I don't actually know how to get a line of code out of those
>>>>>>> hex offsets...
>>>>>>>
>>>>>>> Someone told me many years ago..but I lost that information :P
>>>>>>
>>>>>> Err, I never remember either, I think Luis knows the gdb thing ... I
>>>>>> usually use "objdump -dS"
>>>>>
>>>>> gdb net/mac80211/mac80211.ko
>>>>> l *(ieee80211_stop_tx_ba_session+0x14/0x84)
>>>>
>>>> Oops I meant:
>>>>
>>>> gdb net/mac80211/mac80211.ko
>>>> l *(ieee80211_stop_tx_ba_session+0x14)
>>>
>>> Thank!
>>>
>>> I had to re-compile with debugging symbols, and added kgdb (hopefully
>>> that won't mess anything up).
>>
>> You may want to look at using netconsole instead if you're goal is
>> just to get some oops off the box.
>>
>> CONFIG_NETCONSOLE=m
>>
>> mcgrof@tux ~/bin $ cat netconsole
>> #!/bin/bash
>> sudo dmesg -n 8
>> sudo ip addr add 192.168.4.2/24 dev eth4
>> sudo modprobe netconsole
>> netconsole="@192.168.4.2/eth4,@192.168.4.3/00:1e:37:82:48:5a"
>>
>> I'd run that script on the dev box, and on 192.168.4.3 just do `nc -l
>> -p 6666 | tee log`. To test just modprobe and rmmod ath9k.
>>
>>> Reading symbols from
>>>
>>> /home/greearb/kernel/2.6/wireless-testing-dbg.p4s/net/mac80211/mac80211.ko...done.
>>> (gdb) l *(ieee80211_stop_tx_ba_session+0x14)
>>> 0x54fe is in ieee80211_stop_tx_ba_session
>>> (/home/greearb/git/linux.wireless-testing/net/mac80211/agg-tx.c:595).
>>> 590
>>> 591     int ieee80211_stop_tx_ba_session(struct ieee80211_sta *pubsta,
>>> u16
>>> tid)
>>> 592     {
>>> 593             struct sta_info *sta = container_of(pubsta, struct
>>> sta_info,
>>> sta);
>>> 594             struct ieee80211_sub_if_data *sdata = sta->sdata;
>>> 595             struct ieee80211_local *local = sdata->local;
>>
>> What was the oops complaint? NULL pointer dereference? If sdata got
>> screwed up that would be pretty serious, the only way that could
>> happen is if somehow it managed to get removed prior to the
>> ieee80211_stop_tx_ba_session() or if there is some sort of memory
>> corruption., What steps do you follow to reproduce?
>
> It's dying trying to de-reference something, probably sdata, but for some
> reason I didn't
> think it was NULL.  (I was having trouble getting clean stack dumps
> on the serial console on top of my other issues today.)  In A
> probably-similar
> crash it was trying to dereference 0x00100104 (See my 3:42 email)
> in this series.
>
> I added printks to the stop_tx_ba_session method to try to figure out what
> was happening, but
> of course then I could no longer reproduce it, or at least it crashed in the
> cfg80211_unlink_bss
> first.
>
> To reproduce, I have a user-space app that creates 130 or so STA devices,
> starts wpa_supplicant
> for each one, and then watches events with 'iw event', and reads
> /proc/net/wireless quite often
> (and grabs some other stats out of debugfs, etc).  It runs 'iwconfig' and
> parses output for
> other stats.  In short, it does a bunch of things that would be hard to
> reproduce with any
> simple script.  The user-space app is proprietary, though I would of course
> give you a free
> binary and help you set it up should you wish to use it.
>
> When I disabled power-save, it ran a lot longer, but it would still
> hard-hang or occasionally
> crash with stack-trace pointing to the 0x00100104 dereference.
>
> Perhaps related, with power-save disabled, after a while (maybe 10-20
> minutes), the system
> would often get to a state where the ath9k no longer showed any additional
> transmitted packets
> in it's debugfs traffic.  The netdevices (sta1, etc), would show tx pkt
> counters increasing,
> and the qdiscs showed no backlog.  It was getting rx interrupts, but no tx,
> according to
> debugfs output.  I didn't get any chance to debug that any further.
>
> We have much better luck with ath5k in general, so I think most of these
> issues are
> related to ath9k and/or /n in general.  But, even so, we do see deadlocks
> (on rtnl_lock, it seems)
> with ath5k, and I still have some lockdep warnings to deal with in the
> mac80211 code,
> so it's possible the problem is more general and ath9k just triggers it much
> easier.

Can you try with mac80211_hwsim ?

  Luis


* Re: Crash in agg-tx.c, with ath9k and lots of STA VIFs.
  2010-10-04 22:42             ` Ben Greear
@ 2010-10-05  7:56               ` Johannes Berg
  2010-10-05 16:24                 ` Ben Greear
  0 siblings, 1 reply; 14+ messages in thread
From: Johannes Berg @ 2010-10-05  7:56 UTC
  To: Ben Greear; +Cc: Luis R. Rodriguez, linux-wireless@vger.kernel.org

On Mon, 2010-10-04 at 15:42 -0700, Ben Greear wrote:

> I did find a similar crash while debugging a kernel with just symbols
> compiled in (not kgdb) though:

Doesn't seem all that similar...

> The interesting thing to me is that the 00100104 address is the same in
> both crashes, though this one is in a different method.

except for that maybe. But this is LIST_POISON1, and with seemingly
different lists. It may be faster to reproduce if you enable slab/slub
debugging.
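
To spell out the LIST_POISON1 connection: list_del() leaves
LIST_POISON1 (0x00100100) in entry->next and LIST_POISON2 (0x00200200)
in entry->prev, and the registers in the cfg80211_unlink_bss oops show
exactly those two values in ECX and EAX. On a 32-bit kernel the prev
pointer sits 4 bytes into struct list_head, so "next->prev = prev"
through a poisoned next pointer writes to 0x00100104. The sketch below
only does that arithmetic; the function names are hypothetical, and
reading this as a second deletion of an already-deleted entry is an
interpretation, not something the trace proves.

```c
#include <assert.h>
#include <stdint.h>

/* Values from include/linux/poison.h on 32-bit x86 (no pointer delta). */
#define LIST_POISON1 0x00100100u  /* left in entry->next by list_del() */
#define LIST_POISON2 0x00200200u  /* left in entry->prev by list_del() */

/* struct list_head on a 32-bit kernel: next at offset 0, prev at offset 4. */
#define LIST_HEAD_PREV_OFFSET 4u

/* Address written by "next->prev = prev" in __list_del() for a given next. */
uint32_t list_del_first_store(uint32_t next)
{
    assert(next <= UINT32_MAX - LIST_HEAD_PREV_OFFSET);
    return next + LIST_HEAD_PREV_OFFSET;
}

/* True if the fault address matches a store through a poisoned next pointer. */
int is_store_through_poisoned_next(uint32_t fault_addr)
{
    return fault_addr == list_del_first_store(LIST_POISON1);
}
```

The write fault (error code 0002) at 00100104 in the second oops
matches this pattern; the CR2 value of the first oops does not.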

> Also, this is with power-save NOT disabled (I was hoping to
> hit some debugging code I put in for the other crash..hit this instead.)
> 
> Oct  4 15:26:15 localhost kernel: BUG: unable to handle kernel paging request at 00100104
> Oct  4 15:26:15 localhost kernel: IP: [<fc77e038>] cfg80211_unlink_bss+0x4d/0x8d [cfg80211]

Looks like a mac80211 bug, but you snipped the stack trace.

johannes



* Re: Crash in agg-tx.c, with ath9k and lots of STA VIFs.
  2010-10-05  7:56               ` Johannes Berg
@ 2010-10-05 16:24                 ` Ben Greear
  0 siblings, 0 replies; 14+ messages in thread
From: Ben Greear @ 2010-10-05 16:24 UTC
  To: Johannes Berg; +Cc: Luis R. Rodriguez, linux-wireless@vger.kernel.org

On 10/05/2010 12:56 AM, Johannes Berg wrote:
> On Mon, 2010-10-04 at 15:42 -0700, Ben Greear wrote:
>
>> I did find a similar crash while debugging a kernel with just symbols
>> compiled in (not kgdb) though:
>
> Doesn't seem all that similar...
>
>> The interesting thing to me is that the 00100104 address is the same in
>> both crashes, though this one is in a different method.
>
> except for that maybe. But this is LIST_POISON1, and with seemingly
> different lists. It may be faster to reproduce if you enable slab/slub
> debugging.
>
>> Also, this is with power-save NOT disabled (I was hoping to
>> hit some debugging code I put in for the other crash..hit this instead.)
>>
>> Oct  4 15:26:15 localhost kernel: BUG: unable to handle kernel paging request at 00100104
>> Oct  4 15:26:15 localhost kernel: IP: [<fc77e038>] cfg80211_unlink_bss+0x4d/0x8d [cfg80211]
>
> Looks like a mac80211 bug, but you snipped the stack trace.

I'll apply your deadlock-fix patch, enable slab debugging, and have another go
at it today.

Thanks!
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

