From: Andrew Morton <akpm@linux-foundation.org>
To: mike@nauticaltech.com
Cc: bugme-daemon@bugzilla.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: [Bugme-new] [Bug 12201] New: long wait in call_usermodehelper() / queue_work() / wait_for_completion()
Date: Thu, 11 Dec 2008 14:37:58 -0800 [thread overview]
Message-ID: <20081211143758.510b51b6.akpm@linux-foundation.org> (raw)
In-Reply-To: <bug-12201-10286@http.bugzilla.kernel.org/>
(switched to email. Please respond via emailed reply-to-all, not via the
bugzilla web interface).
On Thu, 11 Dec 2008 14:15:21 -0800 (PST)
bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=12201
>
> Summary: long wait in call_usermodehelper() / queue_work() /
> wait_for_completion()
> Product: Process Management
> Version: 2.5
> KernelVersion: 2.6.26.8
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: Other
> AssignedTo: process_other@kernel-bugs.osdl.org
> ReportedBy: mike@nauticaltech.com
>
>
> Latest working kernel version: None
> Earliest failing kernel version: 2.6.26 (can't test any older)
It'd be great if you could test something more recent please.
> Distribution: CentOS 5
> Hardware Environment: Sun x4450, 16-cores, 128GB of RAM
> Software Environment: CentOS 5 + Apache webserver
> Problem Description:
> My problem started with SSH using the audit library, and my kernel not having
> AUDIT support. During strace, any call to socket(PF_NETLINK, SOCK_RAW,
> NETLINK_AUDIT) took 1-2 seconds to return. During this time, sys% was high.
Well this is bad. We don't want the kernel calling out to userspace
each time you run socket(PF_NETLINK, ...). The performance could be
awful.
I don't know if this is a net problem, an audit problem or whatever.
Probably the offending kernel code simply shouldn't exist if
CONFIG_AUDIT=n.
Please attach a copy of the config to
http://bugzilla.kernel.org/show_bug.cgi?id=12201
> As I continued to dig deeper (using lots of printks), I found that these delays
> were caused by the netlink_create() code calling request_module() to find/load
> a module for AUDIT support which doesn't exist.
>
> Continuing to dig, I found that request_module() uses call_usermodehelper() to
> run /sbin/modprobe to find/load the module.
>
> The farthest I got is that after the process is created, we call
> wait_for_completion() to get the result of that process. This waiting process
> takes 1-2 seconds.
>
> The big problem in troubleshooting here is that this only starts to happen
> after the server has been online for a while (10 days maybe) and serving lots
> of traffic. The delay gradually builds up and maxes out at around 2 seconds.
>
> If I manually call /sbin/modprobe on the commandline and provide it the same
> arguments that call_usermodehelper() uses, the command returns instantly 100%
> of the time (assuming server has been on for a while).
>
> If I write a small pilot program that calls socket(PF_NETLINK, SOCK_RAW,
> NETLINK_AUDIT), it will delay by 1-2 seconds 100% of the time (assuming server
> has been online for a while). Certain protocol types given to socket() have
> zero delay (because no module needs to be loaded).
>
> Steps to reproduce:
> Once server has been online for a while, a simple call to socket(PF_NETLINK,
> SOCK_RAW, NETLINK_AUDIT) shows the problem.
OK, weird.
Please get sysrq working then get us a task trace, so we can see who is
sleeping where. Do this:
- run your "small pilot program"
- wait one second (so we catch it while it is delaying)
- echo t > /proc/sysrq-trigger
- dmesg -s 1000000 > foo
- send us a copy of foo.
(foo will be large, so it would be best to attach it to
http://bugzilla.kernel.org/show_bug.cgi?id=12201 then email us the URL)
(Using `echo w > /proc/sysrq-trigger' would work too, if we know that
the offending processes are stuck in D state. It will produce less
output)
next parent reply other threads:[~2008-12-11 22:38 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <bug-12201-10286@http.bugzilla.kernel.org/>
2008-12-11 22:37 ` Andrew Morton [this message]
2008-12-11 22:48 ` [Bugme-new] [Bug 12201] New: long wait in call_usermodehelper() / queue_work() / wait_for_completion() Kay Sievers
2008-12-12 19:47 ` Michael Spiegle
2008-12-11 22:51 ` Evgeniy Polyakov
2008-12-11 23:05 ` Evgeniy Polyakov
2008-12-12 19:51 ` Michael Spiegle
2008-12-12 19:42 ` Michael Spiegle
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20081211143758.510b51b6.akpm@linux-foundation.org \
--to=akpm@linux-foundation.org \
--cc=bugme-daemon@bugzilla.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mike@nauticaltech.com \
--cc=netdev@vger.kernel.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).