From: Johannes Weiner <hannes@saeurebad.de>
To: Vegard Nossum <vegard.nossum@gmail.com>
Cc: a.p.zijlstra@chello.nl, arjan@linux.intel.com,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] softirq softlockup debugging
Date: Tue, 24 Jun 2008 12:41:13 +0200
Message-ID: <87prq79kty.fsf@skyscraper.fehenstaub.lan>
In-Reply-To: <20080622122845.GA10133@damson.getinternet.no> (Vegard Nossum's message of "Sun, 22 Jun 2008 14:28:45 +0200")

Hi Vegard,

Vegard Nossum <vegard.nossum@gmail.com> writes:

> Hi,
>
> I'm debugging a problem with a softirq that gets stuck for a long time,
> so I wrote this patch to help find out what's going wrong.
>
> I actually think it can be useful in general as well, see for example
> http://www.kerneloops.org/search.php?search=__do_softirq&btnG=Function+Search
>
> ..and these cases are virtually impossible to debug since we don't know
> anything about *what* it was that got stuck. (The NMI watchdog could
> help, though.)
>
> The patch is #ifdef-ugly, I know... Suggestions are welcome.
>
>
> Vegard
>
>
> From: Vegard Nossum <vegard.nossum@gmail.com>
> Date: Sun, 22 Jun 2008 14:12:31 +0200
> Subject: [PATCH] softirq softlockup debugging
>
> From the Kconfig: If a softlockup happens in a softirq, the softlockup
> stack trace is utterly unhelpful; it will only show the stack up to
> __do_softirq(), since this is where interrupts are reenabled.

After more staring at the code in question, I think that the approach is
not correct (or I didn't understand it, which is not unlikely).

I hunted down the address from the kerneloops.org traces
(__do_softirq+0x6d) in a kernel image built with a Fedora config, and it
is at the local_irq_enable() right after the restart: label in
__do_softirq().

So if the softirq handler had disabled interrupts, the softlockup would
still have been detected within the handler (when it reenables irqs and
the timer irq runs), and the handler's stack frame should be in the trace.

do_softirq()
  local_irq_save()			1)
  local_softirq_pending()
  __do_softirq()
   restart:				2)
    local_irq_enable()			3)
    run a handler
    local_irq_disable()			4)
    jnz restart

So the lockup must be caused somewhere
  between 1) and 3)
or
  between 4) and 3) [when we jump back]
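
To make that argument concrete, here is a small userspace toy model in C.
It is only a sketch of the reasoning above, not kernel code: the step
names and the helpers irqs_on(), next_step() and reported_at() are made up
for illustration, and the handler is simply treated as running with irqs
enabled. The assumption is that the timer irq (and so the softlockup
report) can only land at the first point after the stall where irqs are
enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the regions in the outline above. */
enum step {
	STEP_IRQ_SAVE,      /* 1) local_irq_save() in do_softirq()      */
	STEP_PENDING_CHECK, /* local_softirq_pending() etc., irqs off   */
	STEP_IRQ_ENABLE,    /* 3) local_irq_enable() after restart:     */
	STEP_HANDLER,       /* run a handler, irqs assumed on           */
	STEP_IRQ_DISABLE,   /* 4) local_irq_disable()                   */
	STEP_COUNT
};

/* True if a timer irq can be delivered while this region executes. */
static bool irqs_on(enum step s)
{
	return s == STEP_IRQ_ENABLE || s == STEP_HANDLER;
}

/* Successor region; after 4) we jump back to restart:, i.e. to 3). */
static enum step next_step(enum step s)
{
	return s == STEP_IRQ_DISABLE ? STEP_IRQ_ENABLE : (enum step)(s + 1);
}

/*
 * Given the region where the CPU actually stalled, return the region
 * where the pending timer irq is finally taken, which is where the
 * softlockup stack trace would point.
 */
static enum step reported_at(enum step stalled)
{
	enum step s = stalled;

	assert(stalled >= 0 && stalled < STEP_COUNT);
	while (!irqs_on(s))
		s = next_step(s);
	return s;
}
```

In this model, reported_at(STEP_HANDLER) stays in the handler, while a
stall in any irqs-off region is reported at STEP_IRQ_ENABLE, which is
where __do_softirq+0x6d points.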

These functions are in the path and possible candidates for causing it:

- local_softirq_pending()
- account_system_vtime()
- __local_bh_disable()
- trace_softirq_enter()
- smp_processor_id()
- set_softirq_pending()

What do you think?  You said you have already used your patch to debug
lockups in softirq handlers, so I am confused about why the handler's
stack frame was no longer present.

	Hannes

Thread overview: 7+ messages
2008-06-22 12:28 [PATCH] softirq softlockup debugging Vegard Nossum
2008-06-22 21:18 ` Arjan van de Ven
2008-06-23 19:31   ` Vegard Nossum
2008-06-24 10:41 ` Johannes Weiner [this message]
2008-06-24 11:45   ` Vegard Nossum
2008-06-24 13:06     ` Johannes Weiner
2008-06-24 12:06 ` Ingo Molnar
