* [ath9k-devel] Irritating issue (-tip)
From: Steven Noonan @ 2008-10-04 10:25 UTC
  To: ath9k-devel

On Sat, Oct 4, 2008 at 2:38 AM, Steven Noonan <steven@uplinklabs.net> wrote:
> On Sat, Oct 4, 2008 at 2:31 AM, Ingo Molnar <mingo@elte.hu> wrote:
>>
>> * Steven Noonan <steven@uplinklabs.net> wrote:
>>
>>> On Sat, Oct 4, 2008 at 12:43 AM, Ingo Molnar <mingo@elte.hu> wrote:
>>> >
>>> > * Steven Noonan <steven@uplinklabs.net> wrote:
>>> >
>>> >> > Looks like it is probably a genuine issue and not my own doing!
>>> >>
>>> >> It's definitely not -tip specific. I got the same thing on Linus'
>>> >> latest tree. I have not yet had the problem occur on a clean
>>> >> 2.6.27-rc8 build, but it still may be there. If not, I have a spot to
>>> >> start a git-bisect with.
>>> >
>>> > is it a hard lockup - i.e. when you trigger it in text mode does the
>>> > NumLock key stop working?
>>> >
>>> > if it's a hard lockup there's a chance that nmi_watchdog=2 might catch it
>>> > (and it's easier than a full-blown bisection!) and produce some stack
>>> > dump that you could make a digital picture of.
>>> >
>>> > If you boot with nmi_watchdog=2 then double-check it really works: the
>>> > NMI count in /proc/interrupts should increase by one for each CPU/core,
>>> > per second.
>>> >
>>> > An artificial hard-lockup program run as root should be detected by it
>>> > as well within a minute:
>>> >
>>> >  $ cat > lockupcli.c
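>>> >  #include <sys/io.h>   /* declares iopl() */
>>> >
>>> >  int main(void)
>>> >  {
>>> >        iopl(3);               /* needs root: lets user space execute cli */
>>> >        for (;;) asm("cli");   /* spin forever with interrupts disabled */
>>> >  }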
>>> >  <Ctrl-D>
>>> >  $ make lockupcli
>>> >  $ ./lockupcli
>>> >
>>> > (note: save all data before executing this ;-)
>>> >
>>>
>>> The NMI watchdog does indeed catch the lockup by your short C program
>>> there. It doesn't, however, catch the lockup we -want- to catch. Also,
>>> the NMI watchdog interrupt count does not increase by 1 each second.
>>> It seems to do so every 5 seconds, or 10 seconds. Not sure why.
>>
>> hm, does the NMI count increase on all cores/CPUs?
>
> Yes, but it seems there's a bit of a gap between when one core's NMI
> count increases and the other follows suit.
>
>>
>>> I did manage to capture a video of the lockup, but it's useless
>>> without any debug printout. It doesn't seem to behave like a -typical-
>>> lockup, because I noticed that the kernel was still picking up
>>> hotplugged hardware (and printing info about it on VT12).
>>>
>>> Any ideas before I torture myself with a bisection?
>>
>> ah, so it's not a _real_ hard lockup.
>
> I suppose that's good news. I still fear a bisection could be the only
> way to pin this thing down. But the elusiveness of this particular bug
> is going to make the bisection very nondeterministic.
>
>> do you have the softlockup detector enabled:
>>
>>  CONFIG_DETECT_SOFTLOCKUP=y
>>
>> ?
>>
>> That facility should also print out lockups of different kinds, best-case
>> within 60 seconds and worst-case within 480 seconds.
>>
>
> I do indeed have that detector enabled. Also, this is a somewhat
> elusive bug. I've been running the same typically-crashing kernel for
> an hour now with no such lockup. On an earlier boot, it locked
> immediately after 'local' started. And another, after X started. And
> another after I started tinkering in BASH. Not sure what to make of
> it. I'm going to try rebooting and see if I can trigger it again. And
> instead of giving up so quickly (I waited about 20 seconds in previous
> lockups), I'll wait as you recommend.
>

Oh GOODIE. I finally caught the soft lockup. Which driver/subsystem is
at fault? *drumroll*

http://www.uplinklabs.net/~tycho/linux/soft_lockup.jpg

ath9k! What a surprise. Don't get me wrong, I love the ath9k driver's
-existence-, but it's amusing to me that all but a couple of my issues
during the 2.6.27-rc* series have been ath9k-related.

Anyway, I'm CC-ing this to ath9k-devel, and Luis Rodriguez. Finally, I
can sleep tonight.

- Steven


* [ath9k-devel] Irritating issue (-tip)
From: Sujith @ 2008-10-04 11:45 UTC
  To: ath9k-devel

Steven Noonan wrote:
 > Oh GOODIE. I finally caught the soft lockup. Which driver/subsystem is
 > at fault? *drumroll*
 > 
 > http://www.uplinklabs.net/~tycho/linux/soft_lockup.jpg
 > 
 > ath9k! What a surprise. Don't get me wrong, I love the ath9k driver's
 > -existence-, but it's amusing to me that all but a couple of my issues
 > during the 2.6.27-rc* series have been ath9k-related.
 > 
 > Anyway, I'm CC-ing this to ath9k-devel, and Luis Rodriguez. Finally, I
 > can sleep tonight.
 > 

This is the same issue for which a patch was posted earlier [1].
Please verify if the issue is still seen with that patch.

[1]: http://marc.info/?l=linux-wireless&m=122309915413328&w=2

Sujith


* [ath9k-devel] Irritating issue (-tip)
From: Ingo Molnar @ 2008-10-04 12:06 UTC
  To: ath9k-devel


* Sujith <m.sujith@gmail.com> wrote:

> Steven Noonan wrote:
>  > Oh GOODIE. I finally caught the soft lockup. Which driver/subsystem is
>  > at fault? *drumroll*
>  > 
>  > http://www.uplinklabs.net/~tycho/linux/soft_lockup.jpg
>  > 
>  > ath9k! What a surprise. Don't get me wrong, I love the ath9k driver's
>  > -existence-, but it's amusing to me that all but a couple of my issues
>  > during the 2.6.27-rc* series have been ath9k-related.
>  > 
>  > Anyway, I'm CC-ing this to ath9k-devel, and Luis Rodriguez. Finally, I
>  > can sleep tonight.
>  > 
> 
> This is the same issue for which a patch was posted earlier [1].
> Please verify if the issue is still seen with that patch.
> 
> [1]: http://marc.info/?l=linux-wireless&m=122309915413328&w=2

i've applied that patch to tip/out-of-tree and pushed out the latest 
tip/master - Steven, could you check whether that fixes the 
crashes/lockups you are experiencing?

	Ingo
