* [ath9k-devel] Irritating issue (-tip)
From: Steven Noonan @ 2008-10-04 10:25 UTC
To: ath9k-devel
On Sat, Oct 4, 2008 at 2:38 AM, Steven Noonan <steven@uplinklabs.net> wrote:
> On Sat, Oct 4, 2008 at 2:31 AM, Ingo Molnar <mingo@elte.hu> wrote:
>>
>> * Steven Noonan <steven@uplinklabs.net> wrote:
>>
>>> On Sat, Oct 4, 2008 at 12:43 AM, Ingo Molnar <mingo@elte.hu> wrote:
>>> >
>>> > * Steven Noonan <steven@uplinklabs.net> wrote:
>>> >
>>> >> > Looks like it is probably a genuine issue and not my own doing!
>>> >>
>>> >> It's definitely not -tip specific. I got the same thing on Linus'
>>> >> latest tree. I have not yet had the problem occur on a clean
>>> >> 2.6.27-rc8 build, but it still may be there. If not, I have a spot to
>>> >> start a git-bisect with.
>>> >
>>> > is it a hard lockup - i.e. when you trigger it in text mode does the
>>> > NumLock key stop working?
>>> >
>>> > if it's a hard lockup there's a chance that nmi_watchdog=2 might catch it
>>> > (and it's easier than a full-blown bisection!) and produce some stack
>>> > dump that you could make a digital picture of.
>>> >
>>> > If you boot with nmi_watchdog=2 then double-check it really works: the
>>> > NMI count in /proc/interrupts should increase by one for each CPU/core,
>>> > per second.
>>> >
>>> > An artificial hard-lockup program run as root should be detected by it
>>> > as well within a minute:
>>> >
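>>> > $ cat > lockupcli.c
>>> > #include <sys/io.h>     /* iopl() */
>>> >
>>> > int main(void)
>>> > {
>>> >         iopl(3);                /* I/O privilege level 3, needed for cli */
>>> >         for (;;) asm("cli");    /* keep interrupts disabled forever */
>>> > }
>>> > <Ctrl-D>
>>> > $ make lockupcli
>>> > $ ./lockupcli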
>>> >
>>> > (note: save all data before executing this ;-)
>>> >
>>>
>>> The NMI watchdog does indeed catch the lockup by your short C program
>>> there. It doesn't, however, catch the lockup we -want- to catch. Also,
>>> the NMI watchdog interrupt count does not increase by 1 each second.
>>> It seems to do so every 5 seconds, or 10 seconds. Not sure why.
>>
>> hm, does the NMI count increase on all cores/CPUs?
>
> Yes, but it seems there's a bit of a gap between when one core's NMI
> count increases and the other follows suit.
>
>>
>>> I did manage to capture a video of the lockup, but it's useless
>>> without any debug printout. It doesn't seem to behave like a -typical-
>>> lockup, because I noticed that the kernel was still picking up
>>> hotplugged hardware (and printing info about it on VT12).
>>>
>>> Any ideas before I torture myself with a bisection?
>>
>> ah, so it's not a _real_ hard lockup.
>
> I suppose that's good news. I still fear a bisection could be the only
> way to pin this thing down. But the elusiveness of this particular bug
> is going to make the bisection very nondeterministic.
>
>> do you have the softlockup detector enabled:
>>
>> CONFIG_DETECT_SOFTLOCKUP=y
>>
>> ?
>>
>> That facility should print out lockups of different kinds too, best-case
>> within 60 seconds and worst-case within 480 seconds.
>>
>
> I do indeed have that detector enabled. Also, this is a somewhat
> elusive bug. I've been running the same typically-crashing kernel for
> an hour now with no such lockup. On an earlier boot, it locked
> immediately after 'local' started. And another, after X started. And
> another after I started tinkering in BASH. Not sure what to make of
> it. I'm going to try rebooting and see if I can trigger it again. And
> instead of giving up so quickly (I waited about 20 seconds in previous
> lockups), I'll wait as you recommend.
>
Oh GOODIE. I finally caught the soft lockup. Which driver/subsystem is
at fault? *drumroll*
http://www.uplinklabs.net/~tycho/linux/soft_lockup.jpg
ath9k! What a surprise. Don't get me wrong, I love the ath9k driver's
-existence-, but it's amusing to me that all but a couple of my issues
during the 2.6.27-rc* series have been ath9k-related.
Anyway, I'm CC-ing this to ath9k-devel, and Luis Rodriguez. Finally, I
can sleep tonight.
- Steven
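
For anyone wanting to repeat the NMI-count check discussed above (the
count in /proc/interrupts should rise by about one per CPU per second
with nmi_watchdog=2), here is a minimal sketch in C. It is only an
illustration: the function names and the MAX_CPUS limit are made up for
this example, and it assumes the usual layout where the row labelled
"NMI:" carries one counter column per CPU followed by a text description.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MAX_CPUS 256   /* hypothetical upper bound chosen for this sketch */

/* Parse one line of /proc/interrupts. If it is the "NMI:" row, store the
 * per-CPU counters in counts[] and return how many were found; else 0. */
size_t parse_nmi_line(const char *line, unsigned long *counts, size_t max)
{
    const char *p = line;
    size_t n = 0;

    while (isspace((unsigned char)*p))
        p++;
    if (strncmp(p, "NMI:", 4) != 0)
        return 0;
    p += 4;

    while (n < max) {
        char *end;

        while (isspace((unsigned char)*p))
            p++;
        if (!isdigit((unsigned char)*p))
            break;                      /* reached the description text */
        counts[n++] = strtoul(p, &end, 10);
        p = end;
    }
    return n;
}

/* Read the current per-CPU NMI counters from an interrupts-style file. */
size_t read_nmi_counts(const char *path, unsigned long *counts, size_t max)
{
    char line[4096];
    size_t n = 0;
    FILE *f = fopen(path, "r");

    if (!f)
        return 0;
    while (n == 0 && fgets(line, sizeof line, f))
        n = parse_nmi_line(line, counts, max);
    fclose(f);
    return n;
}

/* Sample the counters twice, interval_s seconds apart, and print the
 * observed NMIs per second for each CPU. Returns 0 on success, -1 on error. */
int report_nmi_rate(const char *path, unsigned int interval_s)
{
    unsigned long before[MAX_CPUS], after[MAX_CPUS];
    size_t n1, n2, i;

    if (interval_s == 0)
        return -1;
    n1 = read_nmi_counts(path, before, MAX_CPUS);
    if (n1 == 0)
        return -1;
    sleep(interval_s);
    n2 = read_nmi_counts(path, after, MAX_CPUS);
    if (n2 != n1)
        return -1;

    for (i = 0; i < n1; i++)
        printf("CPU%zu: %.2f NMIs/s\n", i,
               (double)(after[i] - before[i]) / interval_s);
    return 0;
}
```

Calling report_nmi_rate("/proc/interrupts", 10) from a main() would show
whether each CPU is near the one-per-second rate or closer to the one
every 5 to 10 seconds reported in this thread. A longer interval smooths
out the gap between cores noted earlier.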
From: Sujith @ 2008-10-04 11:45 UTC
To: ath9k-devel
Steven Noonan wrote:
> Oh GOODIE. I finally caught the soft lockup. Which driver/subsystem is
> at fault? *drumroll*
>
> http://www.uplinklabs.net/~tycho/linux/soft_lockup.jpg
>
> ath9k! What a surprise. Don't get me wrong, I love the ath9k driver's
> -existence-, but it's amusing to me that all but a couple of my issues
> during the 2.6.27-rc* series have been ath9k-related.
>
> Anyway, I'm CC-ing this to ath9k-devel, and Luis Rodriguez. Finally, I
> can sleep tonight.
>
This is the same issue for which a patch was posted earlier [1].
Please verify if the issue is still seen with that patch.
[1]: http://marc.info/?l=linux-wireless&m=122309915413328&w=2
Sujith
From: Ingo Molnar @ 2008-10-04 12:06 UTC
To: ath9k-devel
* Sujith <m.sujith@gmail.com> wrote:
> Steven Noonan wrote:
> > Oh GOODIE. I finally caught the soft lockup. Which driver/subsystem is
> > at fault? *drumroll*
> >
> > http://www.uplinklabs.net/~tycho/linux/soft_lockup.jpg
> >
> > ath9k! What a surprise. Don't get me wrong, I love the ath9k driver's
> > -existence-, but it's amusing to me that all but a couple of my issues
> > during the 2.6.27-rc* series have been ath9k-related.
> >
> > Anyway, I'm CC-ing this to ath9k-devel, and Luis Rodriguez. Finally, I
> > can sleep tonight.
> >
>
> This is the same issue for which a patch was posted earlier [1].
> Please verify if the issue is still seen with that patch.
>
> [1]: http://marc.info/?l=linux-wireless&m=122309915413328&w=2
i've applied that patch to tip/out-of-tree and pushed out the latest
tip/master - Steven, could you check whether that fixes the
crashes/lockups you are experiencing?
Ingo