public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Matthew Dobson <colpatch@us.ibm.com>
To: Andi Kleen <ak@suse.de>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@osdl.org>,
	"Martin J. Bligh" <mbligh@aracnet.com>
Subject: Re: NUMA API for Linux
Date: Wed, 07 Apr 2004 17:58:23 -0700	[thread overview]
Message-ID: <1081385903.9925.109.camel@arrakis> (raw)
In-Reply-To: <20040407234525.4f775c16.ak@suse.de>

On Wed, 2004-04-07 at 14:45, Andi Kleen wrote:

> diff -u linux-2.6.5-mc2-numa/mm/mempolicy.c-o linux-2.6.5-mc2-numa/mm/mempolicy.c
> --- linux-2.6.5-mc2-numa/mm/mempolicy.c-o	2004-04-07 12:07:41.000000000 +0200
> +++ linux-2.6.5-mc2-numa/mm/mempolicy.c	2004-04-07 13:07:02.000000000 +0200

<snip>

> +/* Do sanity checking on a policy */
> +static int check_policy(int mode, unsigned long *nodes)
> +{
> +	int empty = bitmap_empty(nodes, MAX_NUMNODES);
> +	switch (mode) {
> +	case MPOL_DEFAULT:
> +		if (!empty)
> +			return -EINVAL;
> +		break;
> +	case MPOL_BIND:
> +	case MPOL_INTERLEAVE:
> +		/* Preferred will only use the first bit, but allow
> +		   more for now. */
> +		if (empty)
> +			return -EINVAL;
> +		break;
> +	}
> +	return check_online(nodes);
> +}

Is there a reason you don't have a case for MPOL_PREFERRED?  You have a
comment about it in the function, but you don't check the nodemask isn't
empty...

> +/* Copy a node mask from user space. */
> +static int get_nodes(unsigned long *nodes, unsigned long *nmask,
> +		     unsigned long maxnode, int mode)
> +{
> +	unsigned long k;
> +	unsigned long nlongs;
> +	unsigned long endmask;
> +
> +	--maxnode;
> +	nlongs = BITS_TO_LONGS(maxnode);
> +	if ((maxnode % BITS_PER_LONG) == 0)
> +		endmask = ~0UL;
> +	else
> +		endmask = (1UL << (maxnode % BITS_PER_LONG)) - 1;
> +
> +	/* When the user specified more nodes than supported just check
> +	   if the non supported part is all zero. */
> +	if (nmask && nlongs > BITS_TO_LONGS(MAX_NUMNODES)) {
> +		for (k = BITS_TO_LONGS(MAX_NUMNODES); k < nlongs; k++) {
> +			unsigned long t;
> +			if (get_user(t,  nmask + k))
> +				return -EFAULT;
> +			if (k == nlongs - 1) {
> +				if (t & endmask)
> +					return -EINVAL;
> +			} else if (t)
> +				return -EINVAL;
> +		}
> +		nlongs = BITS_TO_LONGS(MAX_NUMNODES);
> +		endmask = ~0UL;
> +	}
> +
> +	bitmap_clear(nodes, MAX_NUMNODES);
> +	if (nmask && copy_from_user(nodes, nmask, nlongs*sizeof(unsigned long)))
> +		return -EFAULT;
> +	nodes[nlongs-1] &= endmask;
> +	return check_policy(mode, nodes);
> +}

In this function, why do we care what bits the user set past
MAX_NUMNODES?  Why shouldn't we just silently ignore the bits like we do
in sys_sched_setaffinity?  If a user tries to hand us an 8k bitmask, my
opinion is we should just grab as much as we care about (MAX_NUMNODES
bits rounded up to the nearest UL).

> +/* Generate a custom zonelist for the BIND policy. */
> +static struct zonelist *bind_zonelist(unsigned long *nodes)
> +{
> +	struct zonelist *zl;
> +	int num, max, nd;
> +
> +	max = 1 + MAX_NR_ZONES * bitmap_weight(nodes, MAX_NUMNODES);
> +	zl = kmalloc(sizeof(void *) * max, GFP_KERNEL);
> +	if (!zl)
> +		return NULL;
> +	num = 0;
> +	for (nd = find_first_bit(nodes, MAX_NUMNODES);
> +	     nd < MAX_NUMNODES;
> +	     nd = find_next_bit(nodes, MAX_NUMNODES, 1+nd)) {
> +		int k;
> +		for (k = MAX_NR_ZONES-1; k >= 0; k--) {
> +			struct zone *z = &NODE_DATA(nd)->node_zones[k];
> +			if (!z->present_pages)
> +				continue;
> +			zl->zones[num++] = z;
> +			if (k > policy_zone)
> +				policy_zone = k;
> +		}
> +	}
> +	BUG_ON(num >= max);
> +	zl->zones[num] = NULL;
> +	return zl;
> +}

This seems a bit strange to me.  Instead of just allocating a whole
struct zonelist, you're allocating part of one?  I guess it's safe,
since the array is meant to be NULL terminated, but we should put a note
in any code using these zonelists that they *aren't* regular zonelists,
they will be smaller, and dereferencing arbitrary array elements in the
struct could be dangerous.  I think we'd be better off creating a
kmem_cache_t for these and using *whole* zonelist structures. 
Allocating part of a well-defined structure makes me a bit nervous...

> +/* Create a new policy */
> +static struct mempolicy *new_policy(int mode, unsigned long *nodes)
> +{
> +	struct mempolicy *policy;
> +	PDprintk("setting mode %d nodes[0] %lx\n", mode, nodes[0]);
> +	if (mode == MPOL_DEFAULT)
> +		return NULL;
> +	policy = kmem_cache_alloc(policy_cache, GFP_KERNEL);
> +	if (!policy)
> +		return ERR_PTR(-ENOMEM);
> +	atomic_set(&policy->refcnt, 1);
> +	switch (mode) {
> +	case MPOL_INTERLEAVE:
> +		bitmap_copy(policy->v.nodes, nodes, MAX_NUMNODES);
> +		break;
> +	case MPOL_PREFERRED:
> +		policy->v.preferred_node = find_first_bit(nodes, MAX_NUMNODES);
> +		if (policy->v.preferred_node >= MAX_NUMNODES)
> +			policy->v.preferred_node = -1;
> +		break;
> +	case MPOL_BIND:
> +		policy->v.zonelist = bind_zonelist(nodes);
> +		if (policy->v.zonelist == NULL) {
> +			kmem_cache_free(policy_cache, policy);
> +			return ERR_PTR(-ENOMEM);
> +		}
> +		break;
> +	}
> +	policy->policy = mode;
> +	return policy;
> +}

I'm guessing this is why you aren't checking MPOL_PREFERRED in
check_policy()?  So the user can call mbind() with MPOL_PREFERRED and an
empty nodes bitmap and get the default behavior you mentioned in the
comments?

I've got to get some dinner now...  I'll keep reading and send more
comments as I come up with them.

Cheers!

-Matt


  parent reply	other threads:[~2004-04-08  0:58 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-04-07 21:24 NUMA API for Linux Matthew Dobson
2004-04-07 21:27 ` Andi Kleen
2004-04-07 21:41   ` Matthew Dobson
2004-04-07 21:45     ` Andi Kleen
2004-04-07 22:19       ` Matthew Dobson
2004-04-08  0:58       ` Matthew Dobson [this message]
2004-04-08  1:31         ` Andi Kleen
2004-04-08 18:36           ` Matthew Dobson
2004-04-09  1:09       ` Matthew Dobson
2004-04-09  5:29         ` Martin J. Bligh
2004-04-09 18:44           ` Matthew Dobson
2004-04-15  0:38   ` Matthew Dobson
2004-04-15 10:39     ` Andi Kleen
2004-04-15 11:48       ` Robin Holt
2004-04-15 18:32         ` Matthew Dobson
2004-04-15 19:44       ` Matthew Dobson
2004-04-07 21:35 ` Matthew Dobson
2004-04-07 21:51 ` Andrew Morton
2004-04-07 22:16   ` Andi Kleen
2004-04-07 22:34     ` Andrew Morton
2004-04-07 22:39     ` Martin J. Bligh
2004-04-07 22:33       ` Andi Kleen
2004-04-07 22:38   ` Martin J. Bligh
2004-04-07 22:38     ` Andi Kleen
2004-04-07 22:52       ` Andrew Morton
2004-04-07 23:09         ` Martin J. Bligh
2004-04-07 23:35         ` Andi Kleen
2004-04-07 23:56           ` Andrew Morton
2004-04-08  0:14             ` Andi Kleen
2004-04-08  0:26               ` Andrea Arcangeli
2004-04-08  0:51                 ` Andi Kleen
2004-04-08 16:15             ` Hugh Dickins
2004-04-08 17:05               ` Martin J. Bligh
2004-04-08 18:16                 ` Hugh Dickins
2004-04-08 19:25               ` Andrew Morton
2004-04-09  2:41                 ` Wim Coekaerts
2004-04-08  0:22           ` Andrea Arcangeli
     [not found] <1IsMQ-3vi-35@gated-at.bofh.it>
     [not found] ` <1IsMS-3vi-45@gated-at.bofh.it>
     [not found]   ` <1It5U-3J1-21@gated-at.bofh.it>
     [not found]     ` <1ItfE-3PL-3@gated-at.bofh.it>
     [not found]       ` <1ISQC-7Cv-5@gated-at.bofh.it>
2004-04-09  5:39         ` Andi Kleen
     [not found] <1IL3l-1dP-35@gated-at.bofh.it>
     [not found] ` <1IMik-2is-37@gated-at.bofh.it>
2004-04-08 19:20   ` Rajesh Venkatasubramanian
2004-04-08 19:48     ` Hugh Dickins
2004-04-08 19:57       ` Rajesh Venkatasubramanian
2004-04-08 19:52     ` Andrea Arcangeli
  -- strict thread matches above, loose matches on Subject: below --
2004-04-06 13:33 Andi Kleen
2004-04-06 23:35 ` Paul Jackson
2004-04-08 20:12 ` Pavel Machek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1081385903.9925.109.camel@arrakis \
    --to=colpatch@us.ibm.com \
    --cc=ak@suse.de \
    --cc=akpm@osdl.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mbligh@aracnet.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox