Re: [PATCH 1/2] Adds a read-only "procs" file similar to "tasks" that shows only unique tgids


From: Andrew Morton <akpm@linux-foundation.org>
To: Benjamin Blum <bblum@google.com>
Cc: Paul Menage <menage@google.com>,
	lizf@cn.fujitzu.com, serue@us.ibm.com,
	containers@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] Adds a read-only "procs" file similar to "tasks" that shows only unique tgids
Date: Thu, 2 Jul 2009 17:53:41 -0700
Message-ID: <20090702175341.fd2e26d5.akpm@linux-foundation.org>
In-Reply-To: <2f86c2480907021731h13e0bb95q94f06829eded9aa6@mail.gmail.com>

On Thu, 2 Jul 2009 17:31:38 -0700 Benjamin Blum <bblum@google.com> wrote:

> On Thu, Jul 2, 2009 at 4:46 PM, Andrew Morton<akpm@linux-foundation.org> wrote:
> >> +/**
> >> + * pidlist_uniq - given a kmalloc()ed list, strip out all duplicate entries
> >> + * returns -ENOMEM if the malloc fails; 0 on success
> >> + */
> >
> > The comment purports to be kerneldoc ("/**") but didn't document the
> > function's arguments.
> 
> Wasn't aware of that restriction. Recommend making
> scripts/checkpatch.pl look for that sort of thing?

ooh, hard.

Probably the kerneldoc parsing tools are the place to do this checking
- there's no point in duplicating it.  But they might not be smart
enough to detect missing arguments.
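
For reference, a kerneldoc comment that documents its arguments could
look roughly like the sketch below. count_unique_pids() is a hypothetical
userspace helper written only to illustrate the comment layout; it is not
part of the patch, and its signature is an assumption.

```c
#include <stddef.h>
#include <sys/types.h>

/**
 * count_unique_pids - count distinct values in a sorted pid array
 * @list: array of pids, sorted so that equal values are adjacent
 * @length: number of entries in @list
 *
 * Hypothetical helper, for illustration only; not part of the patch.
 *
 * Return: the number of distinct values, which is at most @length.
 */
size_t count_unique_pids(const pid_t *list, size_t length)
{
	size_t i, count;

	if (length == 0)
		return 0;
	count = 1;	/* the 0th element is always unique */
	for (i = 1; i < length; i++)
		if (list[i] != list[i - 1])
			count++;
	return count;
}
```

Each "@name:" line describes one parameter, which is the part the
comment under review left out.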

> >> +	list = *p;
> >> +	/*
> >> +	 * we presume the 0th element is unique, so i starts at 1. trivial
> >> +	 * edge cases first; no work needs to be done for either
> >> +	 */
> >> +	if (*length == 0 || *length == 1)
> >> +		return 0;
> >> +	for (i = 1; i < *length; i++) {
> >> +		BUG_ON(list[i] == PIDLIST_VALUE_NONE);
> >> +		if (list[i] == list[last]) {
> >> +			list[i] = PIDLIST_VALUE_NONE;
> >> +		} else {
> >> +			last = i;
> >> +			count++;
> >> +		}
> >> +	}

Someone's email client is doing s/0x20/0xa0/grr
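
The quoted loop marks duplicates and then copies the survivors into a
second kmalloc()ed buffer. One alternative, sketched here in userspace C
and not what the patch does, is to compact the sorted array in place, so
that no second allocation is needed. uniq_in_place() is a hypothetical
name, and the sketch assumes the input is already sorted.

```c
#include <stddef.h>
#include <sys/types.h>

/*
 * Compact a sorted array in place so that each value appears once.
 * Returns the new length. Userspace sketch, not the patch's code.
 */
size_t uniq_in_place(pid_t *list, size_t length)
{
	size_t src, dest;

	if (length < 2)
		return length;	/* nothing to strip */
	dest = 1;		/* list[0] is always kept */
	for (src = 1; src < length; src++) {
		/* keep list[src] only if it differs from the last kept value */
		if (list[src] != list[dest - 1])
			list[dest++] = list[src];
	}
	return dest;
}
```

The tail of the buffer beyond the returned length is left unused, so
whether this is a win depends on how long the list is kept around.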

> >> +	newlist = kmalloc(count * sizeof(pid_t), GFP_KERNEL);
> >
> > What is the maximum possible value of `count' here?
> >
> > Is there any way in which malicious code can exploit the potential
> > multiplicative overflow in this statement?  (kcalloc() checks for
> > this).
> >
> >> +	/*
> >> +	 * If cgroup gets more users after we read count, we won't have
> >> +	 * enough space - tough.  This race is indistinguishable to the
> >> +	 * caller from the case that the additional cgroup users didn't
> >> +	 * show up until sometime later on.
> >> +	 */
> >> +	length = cgroup_task_count(cgrp);
> >> +	array = kmalloc(length * sizeof(pid_t), GFP_KERNEL);
> >
> > max size?
> >
> > overflowable?
> 
> In the first snippet, count will be at most equal to length. As length
> is determined from cgroup_task_count, it can be no greater than the
> total number of pids on the system.

Well that's a problem, because there can be tens or hundreds of
thousands of pids, and there's a fairly low maximum size for kmalloc()s
(include/linux/kmalloc_sizes.h).

And even if this allocation attempt doesn't exceed KMALLOC_MAX_SIZE,
large allocations are less reliable.  There is a large break point at
8*PAGE_SIZE (PAGE_ALLOC_COSTLY_ORDER).

It would be really nice if we could avoid "fixing" this via vmalloc(),
because vmalloc() causes, and is vulnerable to, internal fragmentation
problems.
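
As a rough userspace illustration of the two checks discussed here
(multiplicative overflow and an upper bound on the request), something
like the following could be used. alloc_pid_array() and
SKETCH_MAX_ALLOC_BYTES are hypothetical names, and the cap value is an
arbitrary stand-in, not the kernel's actual limit.

```c
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical cap standing in for the allocator's size limit. */
#define SKETCH_MAX_ALLOC_BYTES (128u * 1024u)

/*
 * Allocate room for n pids, refusing requests whose byte count would
 * overflow size_t or exceed the cap. Returns NULL on refusal or failure.
 */
pid_t *alloc_pid_array(size_t n)
{
	if (n != 0 && n > SIZE_MAX / sizeof(pid_t))
		return NULL;	/* n * sizeof(pid_t) would wrap */
	if (n * sizeof(pid_t) > SKETCH_MAX_ALLOC_BYTES)
		return NULL;	/* too large for one contiguous chunk */
	return malloc(n * sizeof(pid_t));
}
```

This only rejects oversized requests; it does not by itself solve the
problem of a cgroup with more pids than fit under the cap.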

> (Also, the second snippet of code
> was there before, just relocated, so if there's an overflow bug in
> either it'll have already been there.)
> 
> >> @@ -2389,21 +2457,27 @@ static int cgroup_write_notify_on_release(struct cgroup *cgrp,
> >>  /*
> >>   * for the common functions, 'private' gives the type of file
> >>   */
> >> +/* for hysterical reasons, we can't put this on the older files */
> >
> > "raisins" ;)
> 
> The keys are right next to each other, I promise.
> 
> There was a bit of discussion on how to name these files. Paul wanted
> to start naming the generic cgroup files with the "cgroup." prefix,
> but we can't change "tasks" and "notify_on_release" etc. We decided to
> use the new name format but only for the new file - can anything be
> done about the other ones, or do they have to stay as is?

One could perhaps create an alias (symlink?) and leave that in place
for a few kernel releases and then remove the old names.  The trick to
doing this politely is to arrange for a friendly printk to come out
when userspace uses the old filename, so people know to change their
tools.  That printk should come out once-per-boot, not once-per-access.
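
The once-per-boot behaviour can be modelled in userspace with a static
flag, as in the sketch below. In the kernel the printk_once() macro
covers the same pattern. is_legacy_name(), the warnings_emitted counter
and the "cgroup.tasks" replacement name are all hypothetical and exist
only to make the sketch self-contained.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Count of warnings actually emitted; present only so the sketch is testable. */
static int warnings_emitted;

/*
 * Report whether a (hypothetical) legacy file name was used, warning
 * on the first such access only.
 */
bool is_legacy_name(const char *name)
{
	static bool warned;	/* stays set for the life of the process */

	if (strcmp(name, "tasks") != 0)
		return false;
	if (!warned) {
		warned = true;
		fprintf(stderr, "'%s' is deprecated, use 'cgroup.%s'\n",
			name, name);
		warnings_emitted++;
	}
	return true;
}
```

The plain static flag is not thread-safe; two racing callers could both
print. For a friendly deprecation notice that is usually acceptable.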
