* [ath9k-devel] Irritating issue (-tip)
@ 2008-10-04 10:25 ` Steven Noonan
2008-10-04 11:45 ` Sujith
0 siblings, 1 reply; 3+ messages in thread
From: Steven Noonan @ 2008-10-04 10:25 UTC
To: ath9k-devel
On Sat, Oct 4, 2008 at 2:38 AM, Steven Noonan <steven@uplinklabs.net> wrote:
> On Sat, Oct 4, 2008 at 2:31 AM, Ingo Molnar <mingo@elte.hu> wrote:
>>
>> * Steven Noonan <steven@uplinklabs.net> wrote:
>>
>>> On Sat, Oct 4, 2008 at 12:43 AM, Ingo Molnar <mingo@elte.hu> wrote:
>>> >
>>> > * Steven Noonan <steven@uplinklabs.net> wrote:
>>> >
>>> >> > Looks like it is probably a genuine issue and not my own doing!
>>> >>
>>> >> It's definitely not -tip specific. I got the same thing on Linus'
>>> >> latest tree. I have not yet had the problem occur on a clean
>>> >> 2.6.27-rc8 build, but it may still be there. If it isn't, that gives
>>> >> me a known-good point to start a git-bisect from.
>>> >
>>> > is it a hard lockup - i.e. when you trigger it in text mode does the
>>> > NumLock key stop working?
>>> >
>>> > if it's a hard lockup there's a chance that nmi_watchdog=2 will catch it
>>> > (and it's easier than a full-blown bisection!) and produce a stack
>>> > dump that you could take a digital picture of.
>>> >
>>> > If you boot with nmi_watchdog=2 then double-check it really works: the
>>> > NMI count in /proc/interrupts should increase by one for each CPU/core,
>>> > per second.
>>> >
>>> > An artificial hard-lockup program run as root should be detected by it
>>> > as well, within a minute:
>>> >
>>> > $ cat > lockupcli.c
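>>> > #include <sys/io.h>	/* iopl(): x86 Linux/glibc only */
>>> > int main(void)
>>> > {
>>> > 	if (iopl(3))		/* raise I/O privilege level; needs root */
>>> > 		return 1;
>>> > 	for (;;)
>>> > 		asm("cli");	/* disable interrupts and spin forever */
>>> > }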
>>> > <Ctrl-D>
>>> > $ make lockupcli
>>> > $ ./lockupcli
>>> >
>>> > (note: save all data before executing this ;-)
>>> >
>>>
>>> The NMI watchdog does indeed catch the lockup caused by your short C
>>> program there. It doesn't, however, catch the lockup we -want- to catch.
>>> Also, the NMI watchdog interrupt count does not increase by 1 each
>>> second. It seems to increase only every 5 or 10 seconds. Not sure why.
>>
>> hm, does the NMI count increase on all cores/CPUs?
>
> Yes, but it seems there's a bit of a gap between when one core's NMI
> count increases and the other follows suit.
>
>>
>>> I did manage to capture a video of the lockup, but it's useless
>>> without any debug printout. It doesn't seem to behave like a -typical-
>>> lockup, because I noticed that the kernel was still picking up
>>> hotplugged hardware (and printing info about it on VT12).
>>>
>>> Any ideas before I torture myself with a bisection?
>>
>> ah, so it's not a _real_ hard lockup.
>
> I suppose that's good news. I still fear a bisection could be the only
> way to pin this thing down. But the elusiveness of this particular bug
> is going to make the bisection very nondeterministic.
>
>> do you have the softlockup detector enabled:
>>
>> CONFIG_DETECT_SOFTLOCKUP=y
>>
>> ?
>>
>> That facility should also report other kinds of lockups, best case
>> within 60 seconds and worst case within 480 seconds.
>>
>
> I do indeed have that detector enabled. Also, this is a somewhat
> elusive bug. I've been running the same typically-crashing kernel for
> an hour now with no such lockup. On an earlier boot, it locked up
> immediately after 'local' started; on another, after X started; and on
> another, after I started tinkering in Bash. Not sure what to make of
> it. I'm going to try rebooting and see if I can trigger it again. And
> instead of giving up so quickly (I waited about 20 seconds in previous
> lockups), I'll wait as long as you recommend.
>
Oh GOODIE. I finally caught the soft lockup. Which driver/subsystem is
at fault? *drumroll*
http://www.uplinklabs.net/~tycho/linux/soft_lockup.jpg
ath9k! What a surprise. Don't get me wrong, I love the ath9k driver's
-existence-, but it's amusing to me that all but a couple of my issues
during the 2.6.27-rc* series have been ath9k-related.
Anyway, I'm CC-ing this to ath9k-devel and Luis Rodriguez. Finally, I
can sleep tonight.
- Steven