From: Stephen Champion <schamp@sgi.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Robin Holt <holt@sgi.com>,
	linux-kernel@vger.kernel.org, Pavel Emelyanov <xemul@openvz.org>,
	Oleg Nesterov <oleg@tv-sign.ru>,
	Sukadev Bhattiprolu <sukadev@us.ibm.com>,
	Paul Menage <menage@google.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [Patch] Scale pidhash_shift/pidhash_size up based on num_possible_cpus().
Date: Mon, 04 Aug 2008 06:11:26 -0700
Message-ID: <4896FFFE.7080400@sgi.com>
In-Reply-To: <m14p64zetj.fsf@frodo.ebiederm.org>

Eric W. Biederman wrote:
> Robin Holt <holt@sgi.com> writes:
>> Oops, confusing details.  That was a different problem we had been
>> tracking.
> 
> Which leads back to the original question.  What were you measuring
> that showed improvement with a larger pid hash size?
> 
> Almost by definition a larger hash table will perform better.  However
> my intuition is that we are talking about something that should be in
> the noise for most workloads.

Robin asked me to chime in on this, as I did the early "look at that" 
work and suggested it to Robin.

I noticed the potential for increasing pidhash_shift while chasing down a 
patch to our kernel (2.6.16 stable based) which had proc_pid_readdir() 
calling find_pid() for init_task through the highest pid #.  This patch 
caused a rather serious problem on a 2048 core Altix.  Before 
identifying the culprit, I increased pidhash_shift.  This made a *huge* 
difference: enough to get the box marginally functional while I tracked 
down the origins of the problem.

After backing out the problematic patch, I took a look at pidhash_shift 
in normal circumstances:  With pidhash_shift == 12, running only a few 
common services and monitoring tools (sendmail, nagios, etc for ~28k 
active processes, mostly of the kernel variety), the 20 cpu boot cpuset 
we use on that system to confine normal system processes and interactive 
logins was spending >1% of its time in find_pid(), and an 'ls /proc > 
/dev/null' took >0.4s.  With pidhash_shift == 16, the timing went to 
<0.2s, and the total time spent in find_pid() was reduced to noise level.
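
To put rough numbers on the two settings above, here is a minimal sketch 
of the average hash chain length.  It assumes pids spread evenly over 
2**pidhash_shift buckets and that the ~28k processes map to about 28k 
hash entries; the function name is made up for illustration.

```python
def avg_chain_length(nr_pids: int, pidhash_shift: int) -> float:
    """Average entries per bucket, assuming a uniform spread (an assumption)."""
    buckets = 1 << pidhash_shift  # table has 2**pidhash_shift buckets
    return nr_pids / buckets


# ~28k active processes, as in the measurements above
for shift in (12, 16):
    print(f"pidhash_shift={shift}: "
          f"{avg_chain_length(28_000, shift):.2f} pids per bucket")
```

Under those assumptions a lookup walks several entries per bucket at 
shift 12, and usually at most one at shift 16.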

In addition to raising the limit on larger systems, it looked reasonable 
to scale the pid hash with the # processors instead of memory.  While I 
observed variably high process:cpu ratios on small systems (2c - 32c), 
they also have relatively few processes.  The 192c - 2048c systems I was 
able to look at were all hovering at 13 +/- 2 processes per cpu, even 
with wildly varying memory sizes.
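
As a sketch of what scaling with the cpu count could look like (this is 
NOT the formula from Robin's patch, just an illustration): pick the 
smallest shift that gives about one bucket per expected process.  The 
13 processes per cpu comes from the observation above; the function 
name and the min/max clamps of 4 and 16 are assumptions.

```python
def pidhash_shift_for(cpus: int, procs_per_cpu: int = 13,
                      min_shift: int = 4, max_shift: int = 16) -> int:
    """Hypothetical sizing rule: smallest shift with 2**shift >= expected pids."""
    expected = max(1, cpus * procs_per_cpu)
    shift = (expected - 1).bit_length()  # ceil(log2(expected))
    # Clamp to the assumed lower and upper limits.
    return max(min_shift, min(max_shift, shift))


for cpus in (192, 512, 2048):
    print(f"{cpus} cpus -> pidhash_shift {pidhash_shift_for(cpus)}")
```

With these assumed parameters a 2048 cpu system lands on a shift of 15, 
and a 192 cpu system stays at 12.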

Despite more recent changes in proc_pid_readdir, my results should apply 
to current source.  It looks like both the old 2.6.16 implementation and 
the current version will call find_pid (or equivalent) once for each 
successive getdents() call on /proc, excepting when the cursor is on the 
first entry.  A quick look shows 88 getdents64() calls for both 'ps' 
and 'ls /proc' with 29k processes running, which appears to be the 
primary source of find_pid() calls.

It's not giganormous, although I probably could come up with a pointless 
microbenchmark to show it's 300% better.  Importantly, it does 
noticeably improve normal interactive tools like 'ps' and 'top', and a 
performance visualization tool developed by a customer (nodemon) 
refreshes faster.  For a 512k init allocation, that seems like a very 
good deal.
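
For reference, the 512k figure can be checked with a small sketch.  It 
assumes each bucket is a single pointer (a struct hlist_head) and that 
pointers are 8 bytes, as on a 64-bit system; the function name is made 
up for illustration.

```python
def pidhash_bytes(pidhash_shift: int, bytes_per_bucket: int = 8) -> int:
    """Table size in bytes, assuming one 8-byte pointer per bucket."""
    return (1 << pidhash_shift) * bytes_per_bucket


for shift in (12, 16):
    print(f"pidhash_shift={shift}: {pidhash_bytes(shift) // 1024} KiB")
```

Under those assumptions shift 12 costs 32 KiB and shift 16 costs 512 KiB.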


I'd like to lose 20,000 kernel processes in addition to growing the pid 
hash!

