From: Stephen Champion <schamp@sgi.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Robin Holt <holt@sgi.com>,
linux-kernel@vger.kernel.org, Pavel Emelyanov <xemul@openvz.org>,
Oleg Nesterov <oleg@tv-sign.ru>,
Sukadev Bhattiprolu <sukadev@us.ibm.com>,
Paul Menage <menage@google.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [Patch] Scale pidhash_shift/pidhash_size up based on num_possible_cpus().
Date: Mon, 04 Aug 2008 06:11:26 -0700
Message-ID: <4896FFFE.7080400@sgi.com>
In-Reply-To: <m14p64zetj.fsf@frodo.ebiederm.org>
Eric W. Biederman wrote:
> Robin Holt <holt@sgi.com> writes:
>> Oops, confusing details. That was a different problem we had been
>> tracking.
>
> Which leads back to the original question. What were you measuring
> that showed improvement with a larger pid hash size?
>
> Almost by definition a larger hash table will perform better. However
> my intuition is that we are talking about something that should be in
> the noise for most workloads.
Robin asked me to chime in on this, as I did the early "look at that"
work and suggested it to Robin.
I noticed the potential for increasing pidhash_shift while chasing down a
patch to our kernel (2.6.16-stable based) which had proc_pid_readdir()
calling find_pid() for every pid from init_task through the highest pid #.
This patch caused a rather serious problem on a 2048-core Altix. Before
identifying the culprit, I increased pidhash_shift. This made a *huge*
difference: enough to get the box marginally functional while I tracked
down the origins of the problem.
After backing out the problematic patch, I took a look at pidhash_shift
in normal circumstances: With pidhash_shift == 12, running only a few
common services and monitoring tools (sendmail, nagios, etc., for ~28k
active processes, mostly of the kernel variety), the 20-cpu boot cpuset
we use on that system to confine normal system processes and interactive
logins was spending >1% of its time in find_pid(), and an 'ls /proc >
/dev/null' took >0.4s. With pidhash_shift == 16, the timing went to
<0.2s, and the total time spent in find_pid() was reduced to noise level.
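
As a rough illustration of why the larger table helps, the average hash
chain length can be estimated from the numbers above. This is only a
back-of-the-envelope sketch in Python: it assumes pids spread uniformly
across buckets, rounds the process count to 28,000, and the function name
is made up for illustration.

```python
def avg_chain_length(nr_pids: int, pidhash_shift: int) -> float:
    """Average entries per hash bucket, assuming a uniform hash (an assumption)."""
    buckets = 1 << pidhash_shift  # the table has 2**pidhash_shift buckets
    return nr_pids / buckets


# ~28k processes, as on the system described above
for shift in (12, 16):
    print(f"pidhash_shift={shift}: {1 << shift} buckets, "
          f"~{avg_chain_length(28_000, shift):.2f} pids per bucket")
```

Under those assumptions a lookup walks several entries per bucket at
shift 12, and usually at most one at shift 16.
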
In addition to raising the limit on larger systems, it looked reasonable
to scale the pid hash with the # processors instead of memory. While I
observed variably high process:cpu ratios on small systems (2c - 32c),
they also have relatively few processes. The 192c - 2048c systems I was
able to look at were all hovering at 13 +/- 2 processes per cpu, even
with wildly varying memory sizes.
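
To make the cpu-based scaling idea concrete, here is a hypothetical sketch
(Python, not the patch itself) that picks the smallest shift giving at
least one bucket per expected process. The 13 processes per cpu comes from
the observation above; the function name and the clamp values of 4 and 16
are assumptions for illustration only.

```python
def pidhash_shift_for(cpus: int, procs_per_cpu: int = 13,
                      min_shift: int = 4, max_shift: int = 16) -> int:
    """Hypothetical heuristic: smallest shift with 2**shift >= expected pids.

    min_shift and max_shift are illustrative clamps, not values taken
    from the patch under discussion.
    """
    expected_pids = cpus * procs_per_cpu
    # (n - 1).bit_length() is ceil(log2(n)) for n >= 1
    shift = max(expected_pids - 1, 0).bit_length()
    return max(min_shift, min(max_shift, shift))


for cpus in (2, 32, 192, 2048):
    print(cpus, "cpus ->", "pidhash_shift", pidhash_shift_for(cpus))
```
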
Despite more recent changes in proc_pid_readdir, my results should apply
to current source. It looks like both the old 2.6.16 implementation and
the current version will call find_pid (or equivalent) once for each
successive getdents() call on /proc, excepting when the cursor is on the
first entry. A quick look, and we have 88 getdents64() calls from both
'ps' and 'ls /proc' with 29k processes running, which appear to be the
primary source of find_pid() calls.
It's not giganormous, although I probably could come up with a pointless
microbenchmark to show it's 300% better. Importantly, it does
noticeably improve normal interactive tools like 'ps' and 'top', and a
performance visualization tool developed by a customer (nodemon)
refreshes faster. For a 512k init allocation, that seems like a very
good deal.
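
The 512k figure can be checked with simple arithmetic. The sketch below
(Python) assumes each bucket is a single 8-byte pointer on a 64-bit kernel;
the helper name is made up for illustration.

```python
POINTER_SIZE = 8  # bytes per bucket head; assumes a 64-bit kernel


def pidhash_bytes(pidhash_shift: int, bucket_size: int = POINTER_SIZE) -> int:
    """Memory used by the bucket array: 2**pidhash_shift buckets."""
    return (1 << pidhash_shift) * bucket_size


for shift in (12, 16):
    print(f"pidhash_shift={shift}: {pidhash_bytes(shift) // 1024} KiB")
```
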
I'd like to lose 20,000 kernel processes in addition to growing the pid
hash!