From: Matthew Dobson <colpatch@us.ibm.com>
To: Andi Kleen <ak@suse.de>
Cc: LKML <linux-kernel@vger.kernel.org>,
Andrew Morton <akpm@osdl.org>,
"Martin J. Bligh" <mbligh@aracnet.com>
Subject: Re: NUMA API for Linux
Date: Thu, 08 Apr 2004 11:36:47 -0700
Message-ID: <1081449406.12673.27.camel@arrakis>
In-Reply-To: <20040408033125.376459b3.ak@suse.de>
On Wed, 2004-04-07 at 18:31, Andi Kleen wrote:
> On Wed, 07 Apr 2004 17:58:23 -0700
> Matthew Dobson <colpatch@us.ibm.com> wrote:
>
>
> > Is there a reason you don't have a case for MPOL_PREFERRED? You have a
> > comment about it in the function, but you don't check the nodemask isn't
> > empty...
>
> Empty preferred is a special case. It means DEFAULT. This is useful
> when you have a process policy != DEFAULT, but want to set a specific
> VMA to default. Normally default in a VMA would mean use process policy.
Ok.. That makes sense.
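
For readers following along, the lookup rule described above can be sketched in a few lines of userspace C. This is only an illustration of the semantics as stated in this thread, not the patch's code: the `sketch_` names, the enum values, and the single-word node mask are all assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical, simplified policy modes for illustration only. */
enum sketch_mode { SKETCH_DEFAULT, SKETCH_PREFERRED, SKETCH_BIND };

struct sketch_policy {
    enum sketch_mode mode;
    unsigned long nodes;   /* one-word node mask, enough for a sketch */
};

/*
 * Pick the policy that governs an allocation in a VMA.
 *  - VMA policy absent or DEFAULT: fall back to the process policy.
 *  - VMA policy PREFERRED with an empty mask: force system default,
 *    even if the process policy is something else.
 */
const struct sketch_policy *
sketch_effective_policy(const struct sketch_policy *vma_pol,
                        const struct sketch_policy *task_pol)
{
    static const struct sketch_policy system_default = { SKETCH_DEFAULT, 0 };

    if (vma_pol && vma_pol->mode != SKETCH_DEFAULT) {
        if (vma_pol->mode == SKETCH_PREFERRED && vma_pol->nodes == 0)
            return &system_default;   /* empty PREFERRED means DEFAULT */
        return vma_pol;
    }
    return task_pol ? task_pol : &system_default;
}
```

With a process policy of BIND, a VMA set to DEFAULT still inherits BIND, while a VMA set to PREFERRED with an empty mask gets the system default.
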
> > In this function, why do we care what bits the user set past
> > MAX_NUMNODES? Why shouldn't we just silently ignore the bits like we do
> > in sys_sched_setaffinity? If a user tries to hand us an 8k bitmask, my
> > opinion is we should just grab as much as we care about (MAX_NUMNODES
> > bits rounded up to the nearest UL).
>
> This is to catch uninitialized bits. Otherwise it could work on a kernel
> with small MAX_NUMNODES, and then suddenly fail on a kernel with bigger
> MAX_NUMNODES when a node isn't online.
I am of the opinion that we should allow currently offline nodes in the
user's mask. Those nodes may come online later on, and we should
respect the user's request to allocate from those nodes if possible.
Just like in sched_setaffinity() we take in the user's mask, and when we
actually use the mask to make a decision, we check it against
cpu_online_map. Just because a node isn't online at the time of the
mbind() call doesn't mean it won't be soon. Besides, we should be
checking against node_online_map anyway, because nodes could go away.
Well, maybe not right now, but in the near future. Hotpluggable memory
is a reality, even if we don't support it just yet.
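
A minimal userspace sketch of the behaviour argued for here, under stated assumptions: the mask is stored as the user gave it (bits past MAX_NUMNODES silently dropped, offline nodes kept), and the online check happens only when the mask is used. The `sketch_` names, the tiny MAX_NUMNODES value, and the one-word bitmask are hypothetical stand-ins, not the kernel's data structures.

```c
#define SKETCH_MAX_NUMNODES 8   /* hypothetical small value for the sketch */

/* Stand-in for node_online_map: bit n set means node n is online. */
unsigned long sketch_node_online_map = 0x3UL;   /* nodes 0 and 1 */

struct sketch_bind_policy {
    unsigned long nodes;   /* the user's request, offline nodes included */
};

/* Store the request; ignore bits past SKETCH_MAX_NUMNODES, keep the rest. */
void sketch_policy_set(struct sketch_bind_policy *p, unsigned long user_mask)
{
    p->nodes = user_mask & ((1UL << SKETCH_MAX_NUMNODES) - 1);
}

/* At allocation time, intersect with whatever is online right now. */
unsigned long sketch_policy_usable_nodes(const struct sketch_bind_policy *p)
{
    return p->nodes & sketch_node_online_map;
}
```

If the user asks for nodes 1 and 2 while only nodes 0 and 1 are online, the usable set is node 1 for now; once node 2 comes online, the same stored policy starts using it without another mbind() call.
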
> > This seems a bit strange to me. Instead of just allocating a whole
> > struct zonelist, you're allocating part of one? I guess it's safe,
> > since the array is meant to be NULL terminated, but we should put a note
> > in any code using these zonelists that they *aren't* regular zonelists,
> > they will be smaller, and dereferencing arbitrary array elements in the
> > struct could be dangerous. I think we'd be better off creating a
> > kmem_cache_t for these and using *whole* zonelist structures.
> > Allocating part of a well-defined structure makes me a bit nervous...
>
> And that after all the whining about sharing policies? ;-) (a BIND policy will
> always carry a zonelist). As far as I can see all existing zonelist code
> just walks it until NULL.
>
> I would not be opposed to always using a full one, but it would use considerably
> more memory in many cases.
I'm not whining about sharing policies because of the space usage,
although that is a small side issue. I'm whining about sharing policies
because it just makes sense. You've got a data structure that is always
dynamically allocated and referenced by pointers, that has no instance
specific data in it, and that *already has* an atomic reference counter
in it. And you decided not to share this data structure?! In my
opinion, it's harder and more code to *not* share it... Instead of
copying the structure in mpol_copy(), just atomic_inc(&policy->refcnt)
and we're pretty much done. You already do an atomic_dec_and_test() in
mpol_free()...
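
To make the sharing argument concrete, here is a rough userspace sketch using C11 atomics in place of the kernel's atomic_t. It assumes, as argued above, that the policy carries no per-instance data; the struct layout and the `sketch_` function names are invented for the example and are not the patch's mpol_copy()/mpol_free().

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical, simplified policy object: just a refcount and a mode. */
struct sketch_mempolicy {
    atomic_int refcnt;
    int mode;
};

struct sketch_mempolicy *sketch_mpol_new(int mode)
{
    struct sketch_mempolicy *pol = malloc(sizeof(*pol));

    if (!pol)
        return NULL;
    atomic_init(&pol->refcnt, 1);   /* creator holds the first reference */
    pol->mode = mode;
    return pol;
}

/* "Copy" by sharing: take another reference instead of duplicating. */
struct sketch_mempolicy *sketch_mpol_share(struct sketch_mempolicy *pol)
{
    if (pol)
        atomic_fetch_add(&pol->refcnt, 1);
    return pol;
}

/* Drop a reference; returns 1 if this call freed the object, else 0. */
int sketch_mpol_put(struct sketch_mempolicy *pol)
{
    if (!pol)
        return 0;
    /* atomic_fetch_sub returns the old value: 1 means we were the last. */
    if (atomic_fetch_sub(&pol->refcnt, 1) == 1) {
        free(pol);
        return 1;
    }
    return 0;
}
```

The copy path shrinks to one atomic increment, and the free path is the same decrement-and-test that already exists.
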
-Matt