From: Stephen Champion <schamp@sgi.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Robin Holt <holt@sgi.com>,
linux-kernel@vger.kernel.org, Pavel Emelyanov <xemul@openvz.org>,
Oleg Nesterov <oleg@tv-sign.ru>,
Sukadev Bhattiprolu <sukadev@us.ibm.com>,
Paul Menage <menage@google.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [Patch] Scale pidhash_shift/pidhash_size up based on num_possible_cpus().
Date: Mon, 04 Aug 2008 06:11:26 -0700
Message-ID: <4896FFFE.7080400@sgi.com>
In-Reply-To: <m14p64zetj.fsf@frodo.ebiederm.org>

Eric W. Biederman wrote:
> Robin Holt <holt@sgi.com> writes:
>> Oops, confusing details. That was a different problem we had been
>> tracking.
>
> Which leads back to the original question. What were you measuring
> that showed improvement with a larger pid hash size?
>
> Almost by definition a larger hash table will perform better. However
> my intuition is that we are talking about something that should be in
> the noise for most workloads.

Robin asked me to chime in on this, as I did the early "look at that"
work and suggested it to Robin.

I noticed the potential for increasing pidhash_shift while chasing down a
patch to our kernel (2.6.16 stable based) which had proc_pid_readdir()
calling find_pid() for init_task through the highest pid #. This patch
caused a rather serious problem on a 2048 core Altix. Before
identifying the culprit, I increased pidhash_shift. This made a *huge*
difference: enough to get the box marginally functional while I tracked
down the origins of the problem.

After backing out the problematic patch, I took a look at pidhash_shift
in normal circumstances: With pidhash_shift == 12, running only a few
common services and monitoring tools (sendmail, nagios, etc., for ~28k
active processes, mostly of the kernel variety), the 20 cpu boot cpuset
we use on that system to confine normal system processes and interactive
logins was spending >1% of its time in find_pid(), and an 'ls /proc >
/dev/null' took >0.4s. With pidhash_shift == 16, the timing went to
<0.2s, and the total time spent in find_pid() was reduced to noise level.
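
A rough way to see why the lookup cost drops: with chained hashing, the average bucket holds (entries / buckets) items. The sketch below assumes a uniformly distributed hash and exactly one hash entry per process, with 28,000 standing in for the "~28k" figure above. The function names are illustrative and are not kernel code.

```python
def pid_hash_buckets(pidhash_shift: int) -> int:
    """Number of buckets in a table sized as 2**pidhash_shift."""
    return 1 << pidhash_shift


def average_chain_length(num_entries: int, pidhash_shift: int) -> float:
    """Mean entries per bucket, assuming a uniformly distributed hash."""
    return num_entries / pid_hash_buckets(pidhash_shift)


if __name__ == "__main__":
    num_entries = 28_000  # assumption: stands in for "~28k active processes"
    for shift in (12, 16):
        print(f"pidhash_shift={shift}: {pid_hash_buckets(shift)} buckets, "
              f"{average_chain_length(num_entries, shift):.2f} entries per bucket")
```

Under those assumptions the average chain shrinks from about 6.8 entries to about 0.43, which is in line with find_pid() dropping out of the profile.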

In addition to raising the limit on larger systems, it looked reasonable
to scale the pid hash with the # processors instead of memory. While I
observed variably high process:cpu ratios on small systems (2c - 32c),
they also have relatively few processes. The 192c - 2048c systems I was
able to look at were all hovering at 13 +/- 2 processes per cpu, even
with wildly varying memory sizes.
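
One way to picture CPU-based scaling is to size the table for the expected process count. The sketch below is hypothetical and is not the patch under discussion: it takes the 13 processes per cpu observed above, rounds the expected count up to a power of two, and clamps the shift to the two values mentioned in this message (12 and 16). The function name and all defaults are assumptions.

```python
def pidhash_shift_for_cpus(num_cpus: int,
                           procs_per_cpu: int = 13,
                           min_shift: int = 12,
                           max_shift: int = 16) -> int:
    """Pick a shift so that 2**shift buckets cover the expected process count.

    Hypothetical policy for illustration only; the defaults come from the
    figures quoted in the message, not from kernel source.
    """
    if num_cpus < 1:
        raise ValueError("num_cpus must be at least 1")
    expected = num_cpus * procs_per_cpu
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1
    shift = (expected - 1).bit_length()
    return max(min_shift, min(shift, max_shift))


if __name__ == "__main__":
    for cpus in (2, 192, 2048, 4096):
        print(f"{cpus} cpus -> pidhash_shift {pidhash_shift_for_cpus(cpus)}")
```

The floor matters for the small systems described above, where the process:cpu ratio is too variable for the CPU count alone to be a good predictor.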

Despite more recent changes in proc_pid_readdir, my results should apply
to current source. It looks like both the old 2.6.16 implementation and
the current version will call find_pid (or equivalent) once for each
successive getdents() call on /proc, excepting when the cursor is on the
first entry. A quick look, and we have 88 getdents64() calls from both 'ps'
and 'ls /proc' with 29k processes running, which appears to be the
primary source of calls.

It's not giganormous, although I probably could come up with a pointless
microbenchmark to show it's 300% better. Importantly, it does
noticeably improve normal interactive tools like 'ps' and 'top', and a
performance visualization tool developed by a customer (nodemon)
refreshes faster. For a 512k init allocation, that seems like a very
good deal.
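
The 512k figure can be checked with a little arithmetic. This sketch assumes each bucket is a single 8-byte pointer, as for a list head holding one pointer on a 64-bit system. The function name is illustrative.

```python
def pid_hash_bytes(pidhash_shift: int, bucket_bytes: int = 8) -> int:
    """Memory used by the bucket array alone, in bytes.

    bucket_bytes=8 is an assumption: one pointer per bucket on a 64-bit system.
    """
    return (1 << pidhash_shift) * bucket_bytes


if __name__ == "__main__":
    for shift in (12, 16):
        print(f"pidhash_shift={shift}: {pid_hash_bytes(shift) // 1024} KiB")
```

With these assumptions, pidhash_shift == 16 costs 512 KiB for the bucket array, against 32 KiB for pidhash_shift == 12.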

I'd like to lose 20,000 kernel processes in addition to growing the pid
hash!