From: David Miller <davem@davemloft.net>
To: linux-kernel@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Subject: NMI watchdog + NOHZ question
Date: Mon, 22 Jun 2009 00:27:22 -0700 (PDT)
Message-ID: <20090622.002722.53993293.davem@davemloft.net>
If some expert in this area can help I'd appreciate it.
And I'll note immediately that the issue I'm looking into
I've only investigated thoroughly with 2.6.29 vanilla.
In 2.6.29 we added an NMI watchdog timer to sparc64. It
operates identically to the x86 one, except that
it's on by default :-)
When the qla2xxx driver is built into the kernel statically,
the firmware load causes an NMI watchdog timeout.
The qla2xxx driver is fine: it only disables interrupts for
very short periods to program the chip registers, telling the chip to
load a few blocks of the firmware via DMA or similar.
Then it waits for the interrupt that signals the partial firmware load
is done, using wait_for_completion_timeout() (see
qla2x00_mailbox_command() in drivers/scsi/qla2xxx/qla_mbx.c).
Assuming NOHZ is enabled, what if qla2xxx driver init is the only
running task on a cpu, no timers (at least for 5 seconds, the NMI
timeout) are due to fire, and the qla2xxx code loops in this manner
for more than 5 seconds loading the firmware?
As far as I can see, the NOHZ code has no reason to start the timer
firing again in this situation.
So we'll just loop continuously into the scheduler (to wait for
the qla2xxx driver completion). I believe the events trigger quickly
enough that need_resched() is not true if the scheduler even makes
it to the idle thread.
So the sequence seems to be scheduling in and out of a pure kernel
thread, with no pending non-scheduler timers for a long time, and all
this happening for longer than the NMI watchdog timeout, with NOHZ
enabled.
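To make the suspected sequence concrete, here is a toy user-space model,
a sketch only. It assumes the watchdog decides a cpu is alive by checking
whether that cpu's timer interrupt count has advanced since the previous
NMI, and that it checks once per second; neither detail is stated in this
message. The struct and function names are hypothetical. Only the 5 second
timeout comes from the text above.

```c
#include <stdbool.h>

#define NMI_TIMEOUT_SECS 5  /* the NMI timeout mentioned in the message */
#define NMI_HZ           1  /* assumption: one watchdog check per second */

/* Hypothetical per-cpu state for the model; not a kernel structure. */
struct cpu_model {
    unsigned long timer_irqs;     /* timer interrupts taken on this cpu */
    unsigned long last_seen;      /* count observed at the previous check */
    unsigned int  alert_counter;  /* consecutive checks with no progress */
};

/* One watchdog NMI. Returns true if a lockup would be reported. */
static bool nmi_check(struct cpu_model *cpu)
{
    if (cpu->timer_irqs != cpu->last_seen) {
        /* The timer fired since the last check: cpu looks alive. */
        cpu->last_seen = cpu->timer_irqs;
        cpu->alert_counter = 0;
        return false;
    }
    cpu->alert_counter++;
    return cpu->alert_counter >= NMI_TIMEOUT_SECS * NMI_HZ;
}

/*
 * Run the model for `secs` seconds. A timer interrupt arrives every
 * `tick_every` seconds; 0 means no timer fires at all, which models
 * NOHZ with no pending timers. Returns the second at which the
 * watchdog would fire, or -1 if it never does.
 */
static int simulate(unsigned int secs, unsigned int tick_every)
{
    struct cpu_model cpu = {0};

    for (unsigned int t = 1; t <= secs; t++) {
        if (tick_every && t % tick_every == 0)
            cpu.timer_irqs++;
        if (nmi_check(&cpu))
            return (int)t;
    }
    return -1;
}
```

In this model simulate(10, 0) reports a lockup at second 5 even though
nothing is actually stuck, while simulate(10, 1) never does. That is the
shape of the problem described here: the cpu is making progress, but
nothing the watchdog looks at is moving.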
I'll note that adding printk's (this is a serial console) to the
qla2xxx mailbox command code makes the NMI watchdog problem go away :)
But if I only put printk's around the entire firmware loading
sequence, the NMI watchdog does trigger.
Is there something fundamental that should be preventing this?