From: Jean Delvare <jdelvare@suse.de>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andrew Morton <akpm@osdl.org>, Oleg Nesterov <oleg@tv-sign.ru>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
linux-kernel@vger.kernel.org, ak@suse.de,
Albert Cahalan <acahalan@gmail.com>, Paul Jackson <pj@sgi.com>
Subject: Re: [PATCH] proc: readdir race fix (take 3)
Date: Wed, 6 Sep 2006 11:01:11 +0200 [thread overview]
Message-ID: <200609061101.11544.jdelvare@suse.de> (raw)
In-Reply-To: <m1ac5e2woe.fsf_-_@ebiederm.dsl.xmission.com>
On Tuesday 5 September 2006 16:52, Eric W. Biederman wrote:
> The problem: An opendir, readdir, closedir sequence can fail to report
> process ids that are continually in use throughout the sequence of
> system calls. For this race to trigger the process that
> proc_pid_readdir stops at must exit before readdir is called again.
>
> This can cause ps to fail to report processes, and it is in violation
> of posix guarantees and normal application expectations with respect
> to readdir.
>
> Currently there is no way to work around this problem in user space
> short of providing a gargantuan buffer to user space so the directory
> read all happens in on system call.
>
> This patch implements the normal directory semantics for proc,
> that guarantee that a directory entry that is neither created nor
> destroyed while reading the directory entry will be returned. For
> directory that are either created or destroyed during the readdir you
> may or may not see them. Furthermore you may seek to a directory
> offset you have previously seen.
>
> These are the guarantee that ext[23] provides and that posix requires,
> and more importantly that user space expects. Plus it is a simple
> semantic to implement reliable service. It is just a matter of
> calling readdir a second time if you are wondering if something new
> has show up.
>
> These better semantics are implemented by scanning through the
> pids in numerical order and by making the file offset a pid
> plus a fixed offset.
>
> The pid scan happens on the pid bitmap, which when you look at it is
> remarkably efficient for a brute force algorithm. Given that a typical
> cache line is 64 bytes and thus covers space for 64*8 == 200 pids.
> There are only 40 cache lines for the entire 32K pid space. A typical
> system will have 100 pids or more so this is actually fewer cache lines
> we have to look at to scan a linked list, and the worst case of having
> to scan the entire pid bitmap is pretty reasonable.
>
> If we need something more efficient we can go to a more efficient data
> structure for indexing the pids, but for now what we have should be
> sufficient.
>
> In addition this takes no additional locks and is actually less
> code than what we are doing now.
>
> Also another very subtle bug in this area has been fixed. It is
> possible to catch a task in the middle of de_thread where a thread is
> assuming the thread of it's thread group leader. This patch carefully
> handles that case so if we hit it we don't fail to return the pid, that
> is undergoing the de_thread dance.
>
> This patch is against 2.6.18-rc6 and it should be relatively straight
> forward to backport to older kernels as well.
>
> Thanks to KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> for
> providing the first fix, pointing this out and working on it.
Eric, Kame, thanks a lot for working on this. I'll be giving some good
testing to this patch today, and will return back to you when I'm done.
--
Jean Delvare
next prev parent reply other threads:[~2006-09-06 9:00 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-08-25 9:29 [RFC][PATCH] ps command race fix take 4 [4/4] proc root open/release/llseek KAMEZAWA Hiroyuki
2006-09-04 23:13 ` [PATCH] proc: readdir race fix Eric W. Biederman
2006-09-05 1:30 ` KAMEZAWA Hiroyuki
2006-09-05 2:30 ` Eric W. Biederman
2006-09-05 2:41 ` KAMEZAWA Hiroyuki
2006-09-05 2:26 ` KAMEZAWA Hiroyuki
2006-09-05 2:54 ` Eric W. Biederman
2006-09-05 3:07 ` Eric W. Biederman
2006-09-05 5:39 ` KAMEZAWA Hiroyuki
2006-09-05 10:10 ` Oleg Nesterov
2006-09-05 11:36 ` Eric W. Biederman
2006-09-05 14:52 ` [PATCH] proc: readdir race fix (take 3) Eric W. Biederman
2006-09-06 9:01 ` Jean Delvare [this message]
2006-09-06 21:12 ` Jean Delvare
2006-09-06 22:25 ` Oleg Nesterov
2006-09-06 22:38 ` [PATCH] proc-readdir-race-fix-take-3-fix-3 Oleg Nesterov
2006-09-06 22:59 ` Eric W. Biederman
2006-09-08 6:38 ` Eric W. Biederman
2006-09-06 22:43 ` [PATCH] proc: readdir race fix (take 3) Eric W. Biederman
2006-09-07 8:31 ` Jean Delvare
2006-09-07 13:57 ` Eric W. Biederman
2006-09-07 18:07 ` Jean Delvare
2006-09-07 18:40 ` Eric W. Biederman
2006-09-05 5:26 ` [PATCH] proc: readdir race fix KAMEZAWA Hiroyuki
2006-09-05 5:41 ` KAMEZAWA Hiroyuki
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=200609061101.11544.jdelvare@suse.de \
--to=jdelvare@suse.de \
--cc=acahalan@gmail.com \
--cc=ak@suse.de \
--cc=akpm@osdl.org \
--cc=ebiederm@xmission.com \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=oleg@tv-sign.ru \
--cc=pj@sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox