* [ath9k-devel] ath9k deadbeef
@ 2011-01-07 16:34 Brian Prodoehl
2011-01-07 16:55 ` Senthilkumar Balasubramanian
0 siblings, 1 reply; 10+ messages in thread
From: Brian Prodoehl @ 2011-01-07 16:34 UTC (permalink / raw)
To: ath9k-devel
What's the story on reading a register in ath9k and getting
0xDEADBEEF? I know this has come up before, and whatever the symptom
is, it sort of goes away and everyone moves on, but I haven't seen a
discussion of why deadbeef comes up in the first place. A couple
weeks ago I had a calibration fail on an AR9280 with a recent
compat-wireless, and dmesg revealed that some register value was
0xDEADBEEF (sadly I wasn't in a position to capture it). If I set up
a loop to read a big register block over and over again on AR9220 and
AR9280 chips, it's very easy to get large chunks of the register set
to read as 0xDEADBEEF. Should the register read routine be checking
for this? Are we occasionally getting deadbeef at random points, and
doing the wrong thing because finally some bit we're checking on is
cleared or set?
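
A minimal sketch of the kind of test loop described above, written as plain userspace C rather than driver code. The names `reg_read_fn`, `count_sentinel_reads`, and `fake_chip` are assumptions made for illustration and are not taken from ath9k; the fake device simply returns the sentinel while it is marked asleep.

```c
#include <stddef.h>
#include <stdint.h>

#define DEADBEEF_SENTINEL 0xDEADBEEFu

/* Hypothetical read callback: returns the 32-bit value at a register offset. */
typedef uint32_t (*reg_read_fn)(void *ctx, uint32_t offset);

/*
 * Read the block [start, end) in 4-byte steps, `passes` times over, and
 * return how many individual reads came back as the sentinel.
 */
size_t count_sentinel_reads(reg_read_fn read, void *ctx,
                            uint32_t start, uint32_t end, unsigned passes)
{
    size_t hits = 0;

    for (unsigned p = 0; p < passes; p++)
        for (uint32_t off = start; off < end; off += 4)
            if (read(ctx, off) == DEADBEEF_SENTINEL)
                hits++;

    return hits;
}

/* Fake device for demonstration only: reads return the sentinel while asleep. */
struct fake_chip {
    int asleep;
};

uint32_t fake_read(void *ctx, uint32_t offset)
{
    const struct fake_chip *chip = ctx;

    return chip->asleep ? DEADBEEF_SENTINEL : offset;
}
```

Calling `count_sentinel_reads(fake_read, &chip, 0x0, 0x40, 3)` on a chip marked asleep counts every read as a hit. Against real hardware, the count per pass would show whether the bad reads come in bursts or are scattered.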
I wonder if adding a WARN_ON whenever a register reads as deadbeef
would shed some light on any of the common bugs (such as the "ath:
Could not stop RX, we could be confusing the DMA engine when we start
RX up" problem, which I see all the time) or the others that are more
easily reproduced with lots of vifs.
I just don't know enough about why I'm seeing deadbeef in the first
place to say where it may or may not cause problems.
-Brian
* [ath9k-devel] ath9k deadbeef
2011-01-07 16:34 [ath9k-devel] ath9k deadbeef Brian Prodoehl
@ 2011-01-07 16:55 ` Senthilkumar Balasubramanian
2011-01-07 17:24 ` Ben Greear
0 siblings, 1 reply; 10+ messages in thread
From: Senthilkumar Balasubramanian @ 2011-01-07 16:55 UTC (permalink / raw)
To: ath9k-devel
On Fri, Jan 7, 2011 at 10:04 PM, Brian Prodoehl <bprodoehl@gmail.com> wrote:
> What's the story on reading a register in ath9k and getting
> 0xDEADBEEF? I know this has come up before, and whatever the symptom
If you are seeing DEADBEEF then the chip is in SLEEP state, and I believe
there is still a race condition where we need to handle
sleep/wakeup properly.
Can you please provide us the steps to reproduce this issue consistently?
I assume you are using the latest wireless testing, please confirm.
> is sort of goes away and everyone moves on, but I haven't seen a
> discussion of why deadbeef comes up in the first place. A couple
> weeks ago I had a calibration fail on an AR9280 with a recent
> compat-wireless, and dmesg revealed that some register value was
> 0xDEADBEEF (sadly I wasn't in a position to capture it). If I set up
> a loop to read a big register block over and over again on AR9220 and
> AR9280 chips, it's very easy to get large chunks of the register set
> to read as 0xDEADBEEF. Should the register read routine be checking
> for this? Are we occasionally getting deadbeef at random points, and
> doing the wrong thing because finally some bit we're checking on is
> cleared or set?
>
> I wonder if adding a WARN_ON whenever a register reads as deadbeef
> would shed some light on any of the common bugs (such as the "ath:
> Could not stop RX, we could be confusing the DMA engine when we start
> RX up" problem, which I see all the time) or the others that are more
> easily reproduced with lots of vifs.
>
> I just don't know enough about why I'm seeing deadbeef in the first
> place to say where it may or may not cause problems.
>
> -Brian
> _______________________________________________
> ath9k-devel mailing list
> ath9k-devel at lists.ath9k.org
> https://lists.ath9k.org/mailman/listinfo/ath9k-devel
>
* [ath9k-devel] ath9k deadbeef
2011-01-07 16:55 ` Senthilkumar Balasubramanian
@ 2011-01-07 17:24 ` Ben Greear
2011-01-07 17:44 ` Brian Prodoehl
0 siblings, 1 reply; 10+ messages in thread
From: Ben Greear @ 2011-01-07 17:24 UTC (permalink / raw)
To: ath9k-devel
On 01/07/2011 08:55 AM, Senthilkumar Balasubramanian wrote:
> On Fri, Jan 7, 2011 at 10:04 PM, Brian Prodoehl<bprodoehl@gmail.com> wrote:
>> What's the story on reading a register in ath9k and getting
>> 0xDEADBEEF? I know this has come up before, and whatever the symptom
>
> If you are seeing DEADBEEF then the chip is in SLEEP state, and I believe
> there is still a race condition where we need to handle
> sleep/wakeup properly.
>
> Can you please provide us the steps to reproduce this issue consistently?
>
> I assume you are using the latest wireless testing, please confirm.
For me: Create 60 stations, bring them up, rmmod ath9k modules.
Watch DMA fail to stop and very often DEADBEEF register reads.
Probably far fewer stations would do the same, but it happens very
often with 60.
Doesn't seem to do any lasting harm, however... The failure to
stop DMA *might* be the cause of the rmmod corruption crash I reported
yesterday, but not certain.
I am using latest wireless-testing, but have seen this same general
problem for months.
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
* [ath9k-devel] ath9k deadbeef
2011-01-07 17:24 ` Ben Greear
@ 2011-01-07 17:44 ` Brian Prodoehl
2011-01-07 18:05 ` Peter Stuge
0 siblings, 1 reply; 10+ messages in thread
From: Brian Prodoehl @ 2011-01-07 17:44 UTC (permalink / raw)
To: ath9k-devel
On Fri, Jan 7, 2011 at 12:24 PM, Ben Greear <greearb@candelatech.com> wrote:
> On 01/07/2011 08:55 AM, Senthilkumar Balasubramanian wrote:
>>
>> On Fri, Jan 7, 2011 at 10:04 PM, Brian Prodoehl<bprodoehl@gmail.com> wrote:
>>>
>>> What's the story on reading a register in ath9k and getting
>>> 0xDEADBEEF? I know this has come up before, and whatever the symptom
>>
>> If you are seeing DEADBEEF then the chip is in SLEEP state, and I believe
>> there is still a race condition where we need to handle
>> sleep/wakeup properly.
>>
>> Can you please provide us the steps to reproduce this issue consistently?
>>
>> I assume you are using the latest wireless testing, please confirm.
>
> For me: Create 60 stations, bring them up, rmmod ath9k modules.
>
> Watch DMA fail to stop and very often DEADBEEF register reads.
>
> Probably far fewer stations would do the same, but it happens very
> often with 60.
>
> Doesn't seem to do any lasting harm, however... The failure to
> stop DMA *might* be the cause of the rmmod corruption crash I reported
> yesterday, but not certain.
>
> I am using latest wireless-testing, but have seen this same general
> problem for months.
>
> Thanks,
> Ben
Getting that value while the chip is sleeping makes perfect sense.
Thanks for that nugget of info! It's probably fair to say that we
should never see that value on a register read during normal
operation, so I'm adding the WARN_ON that I talked about to my tree,
and will post whatever code paths I'm seeing traversed while the chip
is sleeping. By normal operation, I mean a register read from within
the driver, and not test loops or something out of debugfs. For the
past few weeks I've been running compat-wireless-2010-12-16 on x86 and
ARM. Now I'm setting up to run 12-26.
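
A rough userspace model of the check described above, offered only as a sketch. The helper `check_reg_value` and the macro `CHECKED_READ` are hypothetical names, not ath9k code. In the kernel, the `fprintf()` would be replaced by `WARN_ON()`, which also dumps a stack trace to the kernel log and so identifies the code path that read the register.

```c
#include <stdint.h>
#include <stdio.h>

#define DEADBEEF_SENTINEL 0xDEADBEEFu

/* Count of suspicious reads seen so far; used here only for demonstration. */
static unsigned long sentinel_hits;

/*
 * Userspace stand-in for the check: report the call site whenever a read
 * returns the sentinel, then pass the value through unchanged.
 */
static uint32_t check_reg_value(uint32_t val, uint32_t offset,
                                const char *file, int line)
{
    if (val == DEADBEEF_SENTINEL) {
        sentinel_hits++;
        fprintf(stderr, "%s:%d: register 0x%04x read as 0x%08x\n",
                file, line, (unsigned)offset, (unsigned)val);
    }
    return val;
}

/* Wrap any expression that yields a register value. */
#define CHECKED_READ(expr, offset) \
    check_reg_value((expr), (offset), __FILE__, __LINE__)
```

One caveat with this approach: a register that legitimately holds 0xDEADBEEF would trigger the same report, so a hit is a hint to investigate rather than proof that the chip was asleep.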
-Brian
* [ath9k-devel] ath9k deadbeef
2011-01-07 17:44 ` Brian Prodoehl
@ 2011-01-07 18:05 ` Peter Stuge
2011-01-07 18:14 ` [ath9k-devel] unknown header type 04, ignoring device Peter Stuge
2011-01-07 18:15 ` [ath9k-devel] ath9k deadbeef Ben Greear
0 siblings, 2 replies; 10+ messages in thread
From: Peter Stuge @ 2011-01-07 18:05 UTC (permalink / raw)
To: ath9k-devel
Brian Prodoehl wrote:
> I'm adding the WARN_ON that I talked about to my tree,
Where can I grab that commit?
> and will post whatever code paths I'm seeing traversed while the
> chip is sleeping.
They end up in kernel log, right?
I'm inspired by the recent dialogue and am just about to reboot to
wireless-testing master-2011-01-05 and try to live with it for a
while.
Linus' 2010-12-30 master locks up hard.
Happened within one second of associating while in X, tried first
with iw connect, then with wpa_supplicant. I use KMS.
Third try I remained in console mode, shut down X, and could then
work over wifi and listen to streaming for a while, maybe ten or
fifteen minutes. After the third hard lock and a power cycle the
kernel says:
[ 0.375411] pci 0000:02:02.0: [0000:0004] type 4 class 0x000004
[ 0.375416] pci 0000:02:02.0: unknown header type 04, ignoring device
After this I usually need to power cycle a couple of times to get the
card responding again on the bus. This card was installed brand new
from ESD bag, clearly without marks on the miniPCI card edge
contacts.
Hard lock of course means no debug messages. :\ Note not kernel
panic, just CPU stop. Suspect PCI problem. Would be nice to have
a miniPCI proxy card that I could hook logic analyzer to.
Well, this is as it has been, more to come with wireless-testing..
//Peter
* [ath9k-devel] unknown header type 04, ignoring device
2011-01-07 18:05 ` Peter Stuge
@ 2011-01-07 18:14 ` Peter Stuge
2011-01-11 15:30 ` Mohammed Shafi
2011-01-07 18:15 ` [ath9k-devel] ath9k deadbeef Ben Greear
1 sibling, 1 reply; 10+ messages in thread
From: Peter Stuge @ 2011-01-07 18:14 UTC (permalink / raw)
To: ath9k-devel
Peter Stuge wrote:
> After the third hard lock and a power cycle the kernel says:
>
> [ 0.375411] pci 0000:02:02.0: [0000:0004] type 4 class 0x000004
> [ 0.375416] pci 0000:02:02.0: unknown header type 04, ignoring device
>
> After this I usually need to power cycle a couple of times to get the
> card responding again on the bus.
(on boot)
Using direct access a while later the card responds:
# lspci -A intel-conf1 -vs 2:2 -xxx -nn
02:02.0 Network controller [0280]: Atheros Communications Inc. AR922X Wireless Network Adapter [168c:0029] (rev 01)
Subsystem: Atheros Communications Inc. Device [168c:2096]
Flags: 66MHz, medium devsel
Memory at 00040000 (32-bit, non-prefetchable) [disabled]
Capabilities: [44] Power Management version 2
Kernel modules: ath9k
00: 8c 16 29 00 00 01 b0 82 01 00 80 02 00 00 00 00
10: 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 8c 16 96 20
30: 00 00 00 00 44 00 00 00 00 00 00 00 00 01 00 00
40: 80 01 00 00 01 00 82 48 00 01 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
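
For reference, the first row of the dump above can be decoded with a few lines of C. This is only a sketch: `pci_hdr_summary` and `pci_summarize` are made-up names, and the offsets are the standard PCI configuration header layout (vendor ID at 0x00, device ID at 0x02, command at 0x04, class code at 0x09 to 0x0b, header type at 0x0e).

```c
#include <stdint.h>

/* Fields from the first 16 bytes of a PCI configuration header. */
struct pci_hdr_summary {
    uint16_t vendor_id;    /* offset 0x00, little-endian */
    uint16_t device_id;    /* offset 0x02 */
    uint16_t command;      /* offset 0x04 */
    uint32_t class_code;   /* offsets 0x09..0x0b, base class in the top byte */
    uint8_t  header_type;  /* offset 0x0e, low 7 bits select the layout */
    int      mem_enabled;  /* Command register bit 1 (Memory Space) */
};

static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

struct pci_hdr_summary pci_summarize(const uint8_t cfg[16])
{
    struct pci_hdr_summary s;

    s.vendor_id   = le16(&cfg[0x00]);
    s.device_id   = le16(&cfg[0x02]);
    s.command     = le16(&cfg[0x04]);
    s.class_code  = ((uint32_t)cfg[0x0b] << 16) |
                    ((uint32_t)cfg[0x0a] << 8) | cfg[0x09];
    s.header_type = cfg[0x0e] & 0x7f;
    s.mem_enabled = (s.command & 0x0002) != 0;
    return s;
}

/* First row of the lspci dump. */
static const uint8_t first_row[16] = {
    0x8c, 0x16, 0x29, 0x00, 0x00, 0x01, 0xb0, 0x82,
    0x01, 0x00, 0x80, 0x02, 0x00, 0x00, 0x00, 0x00,
};
```

Decoded this way, the row gives vendor 168c, device 0029, header type 00, and the Memory Space bit clear, which agrees with the `[disabled]` in the lspci output. The boot-time log instead showed vendor 0000, device 0004, and header type 04. Header types 0, 1, and 2 are the defined layouts, so type 04 is not a valid header.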
Could I have some issues in my system that cause the card to not boot
up quick enough? If yes, what should I look for?
Is there a sidechannel on the chip/card that I could listen to for
some information?
//Peter
* [ath9k-devel] ath9k deadbeef
2011-01-07 18:05 ` Peter Stuge
2011-01-07 18:14 ` [ath9k-devel] unknown header type 04, ignoring device Peter Stuge
@ 2011-01-07 18:15 ` Ben Greear
2011-01-07 23:53 ` Peter Stuge
1 sibling, 1 reply; 10+ messages in thread
From: Ben Greear @ 2011-01-07 18:15 UTC (permalink / raw)
To: ath9k-devel
On 01/07/2011 10:05 AM, Peter Stuge wrote:
> Brian Prodoehl wrote:
>> I'm adding the WARN_ON that I talked about to my tree,
>
> Where can I grab that commit?
>
>
>> and will post whatever code paths I'm seeing traversed while the
>> chip is sleeping.
>
> They end up in kernel log, right?
>
>
> I'm inspired by the recent dialogue and am just about to reboot to
> wireless-testing master-2011-01-05 and try to live with it for a
> while.
>
>
> Linus' 2010-12-30 master locks up hard.
>
> Happened within one second of associating while in X, tried first
> with iw connect, then with wpa_supplicant. I use KMS.
>
> Third try I remained in console mode, shut down X, and could then
> work over wifi and listen to streaming for a while, maybe ten or
> fifteen minutes. After the third hard lock and a power cycle the
> kernel says:
>
> [ 0.375411] pci 0000:02:02.0: [0000:0004] type 4 class 0x000004
> [ 0.375416] pci 0000:02:02.0: unknown header type 04, ignoring device
>
> After this I usually need to power cycle a couple of times to get the
> card responding again on the bus. This card was installed brand new
> from ESD bag, clearly without marks on the miniPCI card edge
> contacts.
>
> Hard lock of course means no debug messages. :\ Note not kernel
> panic, just CPU stop. Suspect PCI problem. Would be nice to have
> a miniPCI proxy card that I could hook logic analyzer to.
>
> Well, this is as it has been, more to come with wireless-testing..
I assume you tried sysrq when it locked hard?
Also, can you try that same NIC in different hardware?
What NIC are you using?
Thanks,
Ben
>
>
> //Peter
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
* [ath9k-devel] ath9k deadbeef
2011-01-07 18:15 ` [ath9k-devel] ath9k deadbeef Ben Greear
@ 2011-01-07 23:53 ` Peter Stuge
2011-01-08 0:10 ` Ben Greear
0 siblings, 1 reply; 10+ messages in thread
From: Peter Stuge @ 2011-01-07 23:53 UTC (permalink / raw)
To: ath9k-devel
Ben Greear wrote:
> > Hard lock of course means no debug messages. :\ Note not kernel
> > panic, just CPU stop.
>
> I assume you tried sysrq when it locked hard?
I generally disable sysrq because this is my laptop that goes on the
road. But good point, I've enabled it now.
> Also, can you try that same NIC in different hardware?
Not without remodeling my environment a bit, repurposing the current
access point. I'd like to avoid changing that parameter.
But I have replaced the mainboard in this ThinkPad X40 during the
last year, and had issues both with old and new mainboard.
> What NIC are you using?
An AR9280 Mini PCI card. Previously I was using an Apple AR5008
(AR5414) Mini PCI card which was much more problematic in some ways,
but on the other hand I can't recall that it ever failed to answer on
the PCI bus so that Linux would disregard it in the system.
When the AR9280 seems solid I'm happy to switch back to AR5008 and
look if issues seem resolved with it as well.
//Peter
* [ath9k-devel] ath9k deadbeef
2011-01-07 23:53 ` Peter Stuge
@ 2011-01-08 0:10 ` Ben Greear
0 siblings, 0 replies; 10+ messages in thread
From: Ben Greear @ 2011-01-08 0:10 UTC (permalink / raw)
To: ath9k-devel
On 01/07/2011 03:53 PM, Peter Stuge wrote:
>> Also, can you try that same NIC in different hardware?
>
> Not without remodeling my environment a bit, repurposing the current
> access point. I'd like to avoid changing that parameter.
>
> But I have replaced the mainboard in this ThinkPad X40 during the
> last year, and had issues both with old and new mainboard.
>
>
>> What NIC are you using?
>
> An AR9280 Mini PCI card. Previously I was using an Apple AR5008
> (AR5414) Mini PCI card which was much more problematic in some ways,
> but on the other hand I can't recall that it ever failed to answer on
> the PCI bus so that Linux would disregard it in the system.
What vendor & model?
Some PCI issues like this can be exacerbated by BIOS and/or motherboard chipsets,
so if you could put it in a different server, that might be interesting.
There are adapter cards to make mini-pci adapters work in regular
PCI slots, for instance.
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
* [ath9k-devel] unknown header type 04, ignoring device
2011-01-07 18:14 ` [ath9k-devel] unknown header type 04, ignoring device Peter Stuge
@ 2011-01-11 15:30 ` Mohammed Shafi
0 siblings, 0 replies; 10+ messages in thread
From: Mohammed Shafi @ 2011-01-11 15:30 UTC (permalink / raw)
To: ath9k-devel
On Fri, Jan 7, 2011 at 11:44 PM, Peter Stuge <peter@stuge.se> wrote:
> Peter Stuge wrote:
>> After the third hard lock and a power cycle the kernel says:
>>
>> [ 0.375411] pci 0000:02:02.0: [0000:0004] type 4 class 0x000004
>> [ 0.375416] pci 0000:02:02.0: unknown header type 04, ignoring device
>>
>> After this I usually need to power cycle a couple of times to get the
>> card responding again on the bus.
>
> (on boot)
>
> Using direct access a while later the card responds:
>
> # lspci -A intel-conf1 -vs 2:2 -xxx -nn
> 02:02.0 Network controller [0280]: Atheros Communications Inc. AR922X Wireless Network Adapter [168c:0029] (rev 01)
> Subsystem: Atheros Communications Inc. Device [168c:2096]
> Flags: 66MHz, medium devsel
> Memory at 00040000 (32-bit, non-prefetchable) [disabled]
> Capabilities: [44] Power Management version 2
> Kernel modules: ath9k
> 00: 8c 16 29 00 00 01 b0 82 01 00 80 02 00 00 00 00
> 10: 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 8c 16 96 20
> 30: 00 00 00 00 44 00 00 00 00 00 00 00 00 01 00 00
> 40: 80 01 00 00 01 00 82 48 00 01 00 00 00 00 00 00
> 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
>
> Could I have some issues in my system that cause the card to not boot
> up quick enough? If yes, what should I look for?
Maybe, and this does not seem to be an issue specific to ath9k (based
on googling your problem header).
>
> Is there a sidechannel on the chip/card that I could listen to for
> some information?
>
>
> //Peter
end of thread, other threads:[~2011-01-11 15:30 UTC | newest]
Thread overview: 10+ messages
2011-01-07 16:34 [ath9k-devel] ath9k deadbeef Brian Prodoehl
2011-01-07 16:55 ` Senthilkumar Balasubramanian
2011-01-07 17:24 ` Ben Greear
2011-01-07 17:44 ` Brian Prodoehl
2011-01-07 18:05 ` Peter Stuge
2011-01-07 18:14 ` [ath9k-devel] unknown header type 04, ignoring device Peter Stuge
2011-01-11 15:30 ` Mohammed Shafi
2011-01-07 18:15 ` [ath9k-devel] ath9k deadbeef Ben Greear
2011-01-07 23:53 ` Peter Stuge
2011-01-08 0:10 ` Ben Greear