Re: [Patch] Scale pidhash_shift/pidhash_size up based on num_possible_cpus().

From: Stephen Champion <schamp@sgi.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Robin Holt <holt@sgi.com>,
	linux-kernel@vger.kernel.org, Pavel Emelyanov <xemul@openvz.org>,
	Oleg Nesterov <oleg@tv-sign.ru>,
	Sukadev Bhattiprolu <sukadev@us.ibm.com>,
	Paul Menage <menage@google.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [Patch] Scale pidhash_shift/pidhash_size up based on num_possible_cpus().
Date: Mon, 04 Aug 2008 06:11:26 -0700
Message-ID: <4896FFFE.7080400@sgi.com>
In-Reply-To: <m14p64zetj.fsf@frodo.ebiederm.org>

Eric W. Biederman wrote:
> Robin Holt <holt@sgi.com> writes:
>> Oops, confusing details.  That was a different problem we had been
>> tracking.
> 
> Which leads back to the original question.  What were you measuring
> that showed improvement with a larger pid hash size?
> 
> Almost by definition a larger hash table will perform better.  However
> my intuition is that we are talking about something that should be in
> the noise for most workloads.

Robin asked me to chime in on this, as I did the early "look at that" 
work and suggested it to Robin.

I noticed the potential for increasing pid_shift while chasing down a 
patch to our kernel (2.6.16 stable based) which had proc_pid_readdir() 
calling find_pid() for init_task through the highest pid #.  This patch 
caused a rather serious problem on a 2048 core Altix.  Before 
identifying the culprit, I increased pidhash_shift.  This made a *huge* 
difference: enough to get the box marginally functional while I tracked 
down the origins of the problem.

After backing out the problematic patch, I took a look at pidhash_shift 
in normal circumstances:  With pidhash_shift == 12, running only a few 
common services and monitoring tools (sendmail, nagios, etc., for ~28k 
active processes, mostly of the kernel variety), the 20 cpu boot cpuset 
we use on that system to confine normal system processes and interactive 
logins was spending >1% of its time in find_pid(), and an 'ls /proc > 
/dev/null' took >0.4s.  With pidhash_shift == 16, the timing went to 
<0.2s, and the total time spent in find_pid() was reduced to noise level.
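
As a rough illustration of why the larger shift helps, the sketch below 
computes the average number of entries per hash bucket for a given 
process count.  It assumes the hash spreads pids uniformly across 
buckets, which is an idealisation; the function names are made up for 
this example and are not kernel symbols.

```c
#include <stdio.h>

/*
 * Average number of entries per bucket for a table of (1 << shift)
 * buckets.  Assumes a perfectly uniform hash, so real chains may be
 * longer or shorter than this figure.
 */
static double avg_chain_length(unsigned long nr_pids, unsigned int shift)
{
	return (double)nr_pids / (double)(1UL << shift);
}

/* Print the load factor for the two shifts compared in the message. */
static void print_load_factors(unsigned long nr_pids)
{
	printf("shift 12: %.2f entries/bucket\n",
	       avg_chain_length(nr_pids, 12));
	printf("shift 16: %.2f entries/bucket\n",
	       avg_chain_length(nr_pids, 16));
}
```

Calling print_load_factors(28000) gives roughly 7 entries per bucket at 
shift 12 and well under one entry per bucket at shift 16, under the 
uniform-hash assumption.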

In addition to raising the limit on larger systems, it looked reasonable 
to scale the pid hash with the # processors instead of memory.  While I 
observed variably high process:cpu ratios on small systems (2c - 32c), 
they also have relatively few processes.  The 192c - 2048c systems I was 
able to look at were all hovering at 13 +/- 2 processes per cpu, even 
with wildly varying memory sizes.
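
One way to picture scaling by processor count is sketched below.  This 
is not the patch under discussion: the function names, the floor of 4 
and the ceiling of 16 are assumptions chosen for illustration, and the 
13 processes per cpu figure is simply the ratio observed above.

```c
/* Hypothetical bounds for the example; not taken from the patch. */
#define EXAMPLE_SHIFT_MIN 4
#define EXAMPLE_SHIFT_MAX 16

/* Smallest shift such that (1 << shift) >= n, capped at 31. */
static unsigned int order_for(unsigned long n)
{
	unsigned int shift = 0;

	while (shift < 31 && (1UL << shift) < n)
		shift++;
	return shift;
}

/*
 * Pick a hash shift so the table has at least one bucket per expected
 * process, given a cpu count and an assumed processes-per-cpu ratio,
 * then clamp the result to the example bounds.
 */
static unsigned int example_pidhash_shift(unsigned int ncpus,
					  unsigned int procs_per_cpu)
{
	unsigned int shift = order_for((unsigned long)ncpus * procs_per_cpu);

	if (shift < EXAMPLE_SHIFT_MIN)
		shift = EXAMPLE_SHIFT_MIN;
	if (shift > EXAMPLE_SHIFT_MAX)
		shift = EXAMPLE_SHIFT_MAX;
	return shift;
}
```

With 13 processes per cpu, this sketch yields a shift of 12 for 192 
cpus and 15 for 2048 cpus.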

Despite more recent changes in proc_pid_readdir, my results should apply 
to current source.  It looks like both the old 2.6.16 implementation and 
the current version will call find_pid (or equivalent) once for each 
successive getdents() call on /proc, excepting when the cursor is on the 
first entry.  A quick look shows 88 getdents64() calls for both 'ps' 
and 'ls /proc' with 29k processes running, which appears to be the 
primary source of calls.

It's not giganormous, although I probably could come up with a pointless 
microbenchmark to show it's 300% better.  Importantly, it does 
noticeably improve normal interactive tools like 'ps' and 'top', and a 
performance visualization tool developed by a customer (nodemon) 
refreshes faster.  For a 512k init allocation, that seems like a very 
good deal.
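
The 512k figure can be checked with a little arithmetic, sketched 
below.  The sketch assumes each bucket is a single pointer (as with the 
kernel's struct hlist_head) and a 64-bit system with 8-byte pointers; 
struct bucket_head and pidhash_bytes() are stand-in names for this 
example only.

```c
#include <stddef.h>

/* Stand-in for a one-pointer list head such as struct hlist_head. */
struct bucket_head {
	void *first;
};

/* Bytes needed for a table of (1 << shift) one-pointer buckets. */
static size_t pidhash_bytes(unsigned int shift)
{
	return ((size_t)1 << shift) * sizeof(struct bucket_head);
}
```

On a system with 8-byte pointers, pidhash_bytes(16) is 65536 * 8 = 
524288 bytes (512 KiB), against 32 KiB for pidhash_bytes(12).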

I'd like to lose 20,000 kernel processes in addition to growing the pid 
hash!

Thread overview: 16+ messages
2008-07-31 17:00 [Patch] Scale pidhash_shift/pidhash_size up based on num_possible_cpus() Robin Holt
2008-07-31 18:35 ` Eric W. Biederman
2008-07-31 19:32   ` Robin Holt
2008-07-31 19:49     ` Eric W. Biederman
2008-07-31 20:08       ` Robin Holt
2008-07-31 22:04         ` Eric W. Biederman
2008-08-01 12:04           ` Robin Holt
2008-08-01 18:27             ` Eric W. Biederman
2008-08-01 19:13               ` Robin Holt
2008-08-01 19:59                 ` Eric W. Biederman
2008-08-04 13:11                   ` Stephen Champion [this message]
2008-08-04 20:36                     ` Eric W. Biederman
2008-08-04 23:58                       ` Robin Holt
2008-08-05  0:38                         ` Eric W. Biederman
2008-08-06  3:21                           ` Stephen Champion
2008-08-01 18:49             ` Linus Torvalds