public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: Paul Menage <menage@google.com>
Cc: Benjamin Blum <bblum@google.com>,
	lizf@cn.fujitzu.com, serue@us.ibm.com,
	containers@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] Adds a read-only "procs" file similar to "tasks" that  shows only unique tgids
Date: Thu, 2 Jul 2009 23:55:27 -0700	[thread overview]
Message-ID: <20090702235527.7ddc873c.akpm@linux-foundation.org> (raw)
In-Reply-To: <6599ad830907022116n7a711c7fs52ff9b400ec8797f@mail.gmail.com>

On Thu, 2 Jul 2009 21:16:15 -0700 Paul Menage <menage@google.com> wrote:

> On Thu, Jul 2, 2009 at 7:08 PM, Andrew Morton<akpm@linux-foundation.org> wrote:
> >
> > Why are we doing all this anyway? __To avoid presenting duplicated pids
> > to userspace? __Nothing else?
> 
> To present the pids or tgids in sorted order. Removing duplicates is
> only for the case of the "procs" file; that could certainly be left to
> userspace, but it wouldn't by itself remove the existing requirement
> for a contiguous array.
> 
> The seq_file iterator for these files relies on them being sorted so
> that it can pick up where it left off even in the event of the pid set
> changing between reads - it does a binary search to find the first pid
> greater than the last one that was returned, so as to guarantee that
> we return every pid that was in the cgroup before the scan started and
> remained in the cgroup until after the scan finished; there are no
> guarantees about pids that enter/leave the cgroup during the scan.

OIC.  Gee we made it hard for ourselves.  That tears it.

> > Or we can do it the other way? __Create an initially-empty local IDR
> > tree or radix tree and, within that, mark off any pids which we've
> > already emitted? __That'll have a worst-case memory consumption of
> > approximately PID_MAX_LIMIT bits -- presently that's half a megabyte.
> > With no large allocations needed?
> >
> 
> But that would be half a megabyte per open fd?

worst-case, assuming that there are 4,000,000/64 pids, spread evenly
across the pid space.

> That's a lot of memory
> that userspace can pin down by opening fds.

yup.

> The reason for the current
> pid array approach is to mean that there's only ever one pid array
> allocated at a time per cgroup, rather than per open fd.

Oh.  Didn't know that.  That would have been an interesting thing to
have documeted, but the sorry little comment which tried to explain
this got deleted by this patch :(

> There's actually a structure already for doing that - cgroup_scanner,
> which uses a high-watermark and a priority heap to provide a similar
> guarantee, with a constant memory size overhead (typically one page).
> But it can take O(n^2) time to scan a large cgroup, as would, I
> suspect, using an IDR, so it's only used for cases where we really
> can't avoid it due to locking reasons. I'd rather have something that
> accumulates unsorted pids in page-size chunks as we iterate through
> the cgroup, and then sorts them using something like Lai Jiangshan's
> patch did.

Was it a mistake to try to present an ordered, dupes-removed view of a
large amount of data from the kernel?
 
> >
> > btw, did pidlist_uniq() actually needs to allocate new memory for the
> > output array? __Could it have done the filtering in-place?
> 
> Yes - or just omit duplicates in the seq_file iterator, I guess

OK.


So now what?  lib/dynarray.c?

  reply	other threads:[~2009-07-03  6:55 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-07-02 23:26 [PATCH 0/2] CGroups: cgroup member list enhancement/fix Paul Menage
2009-07-02 23:26 ` [PATCH 1/2] Adds a read-only "procs" file similar to "tasks" that shows only unique tgids Paul Menage
2009-07-02 23:46   ` Andrew Morton
2009-07-03  0:31     ` Benjamin Blum
2009-07-03  0:53       ` Andrew Morton
2009-07-03  1:08         ` Paul Menage
2009-07-03  1:17           ` Benjamin Blum
2009-07-03  2:08             ` Andrew Morton
2009-07-03  4:16               ` Paul Menage
2009-07-03  6:55                 ` Andrew Morton [this message]
2009-07-03  7:54                   ` KAMEZAWA Hiroyuki
2009-07-03 16:11                   ` Paul Menage
2009-07-03 16:50                     ` Andrew Morton
2009-07-03 17:54                       ` Paul Menage
2009-07-03 18:10                         ` Andrew Morton
2009-07-15  8:33                           ` Eric W. Biederman
2009-07-15 16:18                             ` Paul Menage
2009-07-03  2:25             ` Matt Helsley
2009-07-03  3:49               ` Paul Menage
2009-07-03  7:08               ` Benjamin Blum
2009-07-03  1:30           ` Andrew Morton
2009-07-03  5:54             ` KAMEZAWA Hiroyuki
2009-07-03 15:52               ` Paul Menage
2009-07-04  2:07                 ` KAMEZAWA Hiroyuki
2009-07-04 16:10                   ` Paul Menage
2009-07-05 23:53                     ` KAMEZAWA Hiroyuki
2009-07-02 23:26 ` [PATCH 2/2] Ensures correct concurrent opening/reading of pidlists across pid namespaces Paul Menage
2009-07-02 23:54   ` Andrew Morton
2009-07-03  0:22     ` Paul Menage
2009-07-03  0:26       ` Paul Menage
2009-07-03  0:43     ` Benjamin Blum
2009-07-03  1:15 ` [PATCH 0/2] CGroups: cgroup member list enhancement/fix Li Zefan
2009-07-05  6:38 ` Balbir Singh
2009-07-10 23:58   ` Paul Menage
2009-07-13 12:11     ` Balbir Singh
2009-07-13 16:26       ` Paul Menage
2009-07-14  5:56         ` Balbir Singh
2009-07-14  6:49           ` Paul Menage
2009-07-14  7:16             ` Balbir Singh
2009-07-14 17:34               ` Benjamin Blum
2009-07-14 17:43                 ` Paul Menage
2009-07-14 20:38                   ` Paul Menage
2009-07-14 23:08                     ` Matt Helsley

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090702235527.7ddc873c.akpm@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=bblum@google.com \
    --cc=containers@lists.linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lizf@cn.fujitzu.com \
    --cc=menage@google.com \
    --cc=serue@us.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox