public inbox for linux-kernel@vger.kernel.org
* Re: NUMA API
       [not found] <1QAMU-4gf-15@gated-at.bofh.it>
@ 2004-04-30 20:01 ` Andi Kleen
  2004-05-01  5:15   ` Martin J. Bligh
  2004-05-03 18:34   ` Ulrich Drepper
  2004-04-30 20:39 ` Andi Kleen
       [not found] ` <1RLdk-29R-11@gated-at.bofh.it>
  2 siblings, 2 replies; 7+ messages in thread
From: Andi Kleen @ 2004-04-30 20:01 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: linux-kernel

Ulrich Drepper <drepper@redhat.com> writes:

>In the last weeks I have been working on designing a new API for a NUMA
>support library.  I am aware of the code in libnuma by ak but this code
>has many shortcomings:

> ~ a completely unacceptable library interface (e.g., global variables as
> part of the API, WTF?)

You mean numa_no_nodes et al.?

This is essentially static data that never changes (like in6addr_any).
numa_all_nodes could perhaps change in the future with node hotplug
support, but even then it will be a global property.

Everything else is thread local.

> ~ inadequate topology discovery

I believe it is good enough for current machines, at least
until there is enough experience to really figure out what
node discovery is needed. I have seen some proposals
for complex graph-based descriptions, but so far I have seen
nothing that could really take advantage of something so fancy.
If it is really needed it can be added later.

IMHO we just do not know enough right now to design a good topology
discovery interface. Until that is fixed it is best to err on the
side of simplicity.

> ~ fixed cpu set size

That is wrong. The latest version does not have a fixed cpu set size.

> ~ no inclusion of SMT/multicore in the cpu hierarchy

Not sure why you would care about that. NUMA is only about "what CPUs
belong to which memory block". While multicore can affect the number
of CPUs in a node, packages that share a cache only have cache
effects, not "sticky memory" effects. To handle cache effects all
you need to do is change scheduling, not NUMA policy. Supporting
cache policy in the NUMA policy would result in a quite complex
optimization problem of how to tune the scheduler. But the whole point
of why I started libnuma in the first place was to avoid this complex
problem and just use simple hints. For this reason, putting cache
policy into the memory policy is IMHO quite misguided.

> As specified, the implementation of the interface is designed with only
> the requirements of a program on NUMA hardware in mind.  I have paid no
> attention to the currently proposed kernel extensions.  If the latter do
> not really allow implementing the functionality programmers need then it
> is wasted effort.

Well, I spent a lot of time talking to various users, and IMHO
it matches the needs of a lot of them. I did not add all the features
everybody wanted, but that was simply not possible while still coming
up with a reasonable design.

> For instance, I think the way memory is allocated in interleaved fashion
> is not "ideal".  Interleaved allocation is a property of a specific
> allocation.  Global states for processes (or threads) are a terrible way
> to handle this and other properties since it requires the programmer to
> constantly switch the mode back and forth since any part of the runtime
> might be NUMA aware and reset the mode.

If you do not want per-process state, just use the allocation functions
in libnuma instead. They use mbind() and have no per-thread state,
only per-VMA state.

The per process state is needed for numactl though.

I kept the support for this visible in libnuma to make it easier to convert
old code (just wrap some code with a policy). For programs designed from
scratch it is probably better to use the allocation functions
with mbind directly.

> Also, the concept of hard/soft sets for CPUs is useful.  Likewise
> "spilling" over to other memory nodes.  Usually using NUMA means hinting
> the desired configuration to the system.  It'll be used whenever
> possible.  If it is not possible (for instance, if a given processor is
> not available) it is mostly a bad idea to completely fail the

Agreed. That is why preferred and bind are different policies
and you can switch between them in libnuma.

> execution.  Instead a less optimal resource should be used.  For memory
> it is hard to know how much memory on which node is in use etc.

numa_node_size()

> Another missing feature in libnuma and the current kernel design is
> support for changes in the configuration.  CPUs might be added or
> removed, likewise memory.  Additional interconnects between NUMA blocks
> might be added etc.

It is version 1.0. So far the CPU hotplug code still seems too much
in flux to really do something good about it. I expect once all this
settles down, libnuma will also grow some support for dynamic
reconfiguration.

Comments on some (not all) of your criticisms:

>
> Comparison with libnuma
> =======================
>
> nodemask_t:
>
>   Unlike nodemask_t, cpu_set_t is already in use in glibc.  The affinity
>   interfaces use it so there is no need to duplicate the functionality
>   and no need to define other versions of the affinity interfaces.
>
>   Furthermore, the nodemask_t type is of fixed size.  The cpu_set_t
>   has a convenience version which is of fixed size but can be of
>   arbitrary size.  This is important, as a bit of math shows:
>
>     Assume a processor with four cores and four threads per core
>
>     Four such processors on a single NUMA node
>
>     That's a total of 64 virtual processors for one node.  With 32 such
>     nodes the 1024 processors of cpu_set_t would be filled.  And we do
>     not want to mention the total of 64 supported processors in libnuma's
>     nodemask_t.  To be future safe the bitset size must be variable.

nodemask_t has nothing to do with virtual CPUs, only with nodes
(= memory controllers).

There is no fixed size for CPUs in the current version.
There was in some earlier version, but I quickly dropped that because
it was indeed a bad idea.

There is a fixed-size nodemask type though, although its upper limit
is extremely high (4096 nodes on IA64). I traded this limit
for simplicity of use.


> numa_bind()    --> NUMA_mem_set_home() or NUMA_mem_set_home_thread()
>                   or NUMA_aff_set_cpu() or NUMA_aff_set_cpu_thread()
>
>  numa_bind() misses A LOT of flexibility.  First, memory and CPU need
>  to be on the same nodes.  Second, thread handling is missing.  Third,
>  hard versus soft requirements are not handled for CPU usage.

Correct. That is why lower-level functions exist too. numa_bind is
merely a convenient high-level utility function to make libnuma more
pleasant to use for many (but not all) users. It trades some
flexibility to cater to the common case.

> numa_police_memory()  -->  nothing yet
>
>  I don't see why this is necessary.  Yes, address space allocation and
>  the actual allocation of memory are two steps.  But this should be
>  taken care of by the allocation functions (if necessary).  To support
>  memory allocation with other interfaces than those described here and
>  magically treat them in the "NUMA-way" seems dumb.

You need process policy for command-line policy. To make converting
old programs easier I opted to expose it in libnuma too. For new
programs I agree it is better to just use the allocator functions.

> numa_set_bind_policy() --> too coarse grained
>
>  This cannot be a process property.  And it must be possible to change

It is a per thread property.

-Andi


^ permalink raw reply	[flat|nested] 7+ messages in thread


* Re: NUMA API
  2004-04-30 20:01 ` NUMA API Andi Kleen
@ 2004-05-01  5:15   ` Martin J. Bligh
  2004-05-03 18:34   ` Ulrich Drepper
  1 sibling, 0 replies; 7+ messages in thread
From: Martin J. Bligh @ 2004-05-01  5:15 UTC (permalink / raw)
  To: Andi Kleen, Ulrich Drepper; +Cc: linux-kernel

>> As specified, the implementation of the interface is designed with only
>> the requirements of a program on NUMA hardware in mind.  I have paid no
>> attention to the currently proposed kernel extensions.  If the latter do
>> not really allow implementing the functionality programmers need then it
> is wasted effort.
> 
> Well, I spent a lot of time talking to various users; and IMHO
> it matches the needs of a lot of them. 

As have I, and the rest of IBM ... and what Andi has done (and the design
was discussed extensively with other people during the process) fulfills
the needs that we see out there.

> I did not add all the features everybody wanted, but that was simply
> not possible while still coming up with a reasonable design.

Exactly ... this needs to be simple.

M.


* Re: NUMA API - wish list
  2004-04-30  7:35 NUMA API Ulrich Drepper
@ 2004-05-03 12:48 ` Zoltan Menyhart
  2004-05-03 17:57   ` Paul Jackson
  0 siblings, 1 reply; 7+ messages in thread
From: Zoltan Menyhart @ 2004-05-03 12:48 UTC (permalink / raw)
  To: Ulrich Drepper, linux-kernel

Can you remember back to the "good old days" when there were no open(),
read(), lseek(), write(), mmap(), etc., and one had to state explicitly
(on job control punched cards) that one needed sectors 123... 145 on
the disk on channel 6, unit 7?
Or, somewhat more recently, when one had to manage memory and overlays
by hand?
Now we are going to manage the topology and CPU or memory binding from
applications. Moreover, we are going to have the applications resolve
resource management / dependency problems / conflicts among themselves...

The operating systems should provide for abstractions of the actual
HW platform: file system, virtual memory, shared CPUs, etc.

Why should an application care about the actual physical characteristics?
Including counting nanoseconds of some HW resource access time? We'll
end up with some completely unportable applications.

I think an application should describe what it needs for its optimal run,
e.g.:
	- I need 3 * N (where N = 1, 2, 3,...) CPUs "very close"
	  together and 2.5 Gbytes / N of real memory (working set size)
	  "very very close to" each of the respective CPUs
	- Should it not fit into a "domain", the CPUs have to be
	  "very very close" to each other, 3 by 3
	- If no resources for even N == 1, do not start it at all
	- Use "gang scheduling" for them, otherwise I'll busy wait :-)
	- In addition, I need M CPUs + X Gbytes of memory
	  "where my previous group is" and I need a disk I/O path of
	  the capacity of 200 Mbytes / sec "more or less close to" my
	  memory
	- I need "some more" CPUs "somewhere" with some 100 Mbytes of
	  memory "preferably close to" the CPUs and 10 Mbytes / sec
	  TCP/IP bandwidth "close to" my memory 

	- I need 70 % of the CPU time on my CPUs (the scheduler can
	  select others for the 30 % of the time left)

	- O.K. should my request be too much, here is my minimal,
	  "degraded" configuration:...

The OS reserves the resources for the application (assigned at exec time)
and reports to the application which of its needs have been granted.

When the application allocates some memory, it'll say: you know, this
is for the memory pool I've described in the 5th criterion.
When it creates threads, it'll say they are in the 2nd group of threads
mentioned in the 1st line.

The work load manager / load balancer can negotiate other resource
assignment at any time with the application.
The work load manager / load balancer is free to move a collection of
resources from some NUMA domains to others, provided the application's
requirements are still met. (No hard binding.)

Billing is done accordingly :-)

As you do not need to know anything about SCSI LUNs, sector IDs,
physical memory maps or the other applications when you compile your
kernel, why should an application care about HW NUMA details?

Thanks,

Zoltán Menyhárt


* Re: NUMA API - wish list
       [not found] ` <1RLdk-29R-11@gated-at.bofh.it>
@ 2004-05-03 13:17   ` Andi Kleen
  0 siblings, 0 replies; 7+ messages in thread
From: Andi Kleen @ 2004-05-03 13:17 UTC (permalink / raw)
  To: Zoltan.Menyhart; +Cc: linux-kernel

Zoltan Menyhart <Zoltan.Menyhart_AT_bull.net@nospam.org> writes:

> The work load manager / load balancer can negotiate other resource
> assignment at any time with the application.
> The work load manager / load balancer is free to move a collection of
> resources from some NUMA domains to others, provided the application's
> requirements are still met. (No hard binding.)

IMHO these are hard research topics that will need considerably more
work to be automated, if they can ever be automated at all.
The main problem is that you have several conflicting goals: you
want to use all available CPU power, all available memory,
all available memory bandwidth and the best average memory latency.
They all conflict.

First: basically any more advanced automatic scheme will require
going all the way to a full workload manager that can move memory
around later, because it is near impossible to get even two of these
goals right in advance.

I first tried to develop an automatic NUMA "homenode" scheduler that
attempted to do a lot of this. I then realized that it is just too
hard to do, and it never worked very well. That is why I changed gears
and just started with a simple API to let the user tell the kernel
what he wants.

The advantage of this is that a lot of complexity is avoided; 
e.g. the NUMA API avoids any need to move memory around.

Now if somebody comes up with a good design for a workload manager and
does all the experiments needed to validate it, then it could be added
later. But deferring NUMA optimization efforts until this considerable
task is solved (if it can be solved at all) would be a big mistake IMHO.

> Billing is done accordingly :-)
>
> As you do not need to know anything about SCSI LUNs, sector IDs,
> physical memory maps or the other applications when you compile your
> kernel, why should an application care about HW NUMA details?

There is a big difference between these and NUMA. 

LUNs, sectors and physical memory are all hidden for correctness. For
that, virtualization is fine, because performance is secondary to
correctness.

But NUMA knowledge is purely for optimization. And for optimization
purposes you want to avoid virtualization layers, because they get
in the way of your optimization efforts.

When a human does NUMA optimization they usually want to work near the
bare hardware. And if your dream of an automatic workload manager ever
worked, it would also work on the bare hardware.

-Andi



* Re: NUMA API - wish list
  2004-05-03 12:48 ` NUMA API - wish list Zoltan Menyhart
@ 2004-05-03 17:57   ` Paul Jackson
  0 siblings, 0 replies; 7+ messages in thread
From: Paul Jackson @ 2004-05-03 17:57 UTC (permalink / raw)
  To: Zoltan.Menyhart; +Cc: drepper, linux-kernel

> The operating systems should provide for abstractions of the actual ...

True ... so long as you don't confuse "operating system" with "kernel".

Most of what you describe can and should be in user space, as what I
call "system software", constructed of libraries, daemons, utilities
and specific language support.

Having the kernel support the abstraction of "file", to hide details of
sectors, channels and devices has been a great success.  But the kernel
doesn't need to support every such abstraction, such as in this case
"abstract computers" with certain amounts of compute, memory and i/o
resources.

Rather the kernel only needs to provide the essential primitives, such
as cpu and memory placement, jobs (as related set of tasks), and access
to primitive topology and hardware attributes.

(Your spam encoded from address "Zoltan.Menyhart_AT_bull.net@nospam.org"
is a minor annoyance ...).

-- 
                          I won't rest till it's the best ...
                          Programmer, Linux Scalability
                          Paul Jackson <pj@sgi.com> 1.650.933.1373


* Re: NUMA API
  2004-04-30 20:01 ` NUMA API Andi Kleen
  2004-05-01  5:15   ` Martin J. Bligh
@ 2004-05-03 18:34   ` Ulrich Drepper
  1 sibling, 0 replies; 7+ messages in thread
From: Ulrich Drepper @ 2004-05-03 18:34 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel

Andi Kleen wrote:

> You mean numa_no_nodes et.al. ? 
> 
> This is essentially static data that never changes (like in6addr_any).
> numa_all_nodes could maybe in future change with node hotplug support,
> but even then it will be a global property.

And you don't see a problem with this?  You are hardcoding variables of
a certain size and layout.  This is against every good design principle.

There are other problems like not using a protected namespace.


> Everything else is thread local.

This, too, is inadequate.  It requires working with the 1-on-1 threading
model.  Using user-level contexts (setcontext etc.) is made very hard and
expensive.  There is not just one state to change.  Using any of the code
(or functionality using the state) in signal handlers, possibly
recursively, will throw things into disorder.

There is no reason not to make the state explicit.


> I believe it is good enough for current machines, at least 
> until there is enough experience to really figure out what
> node discovery is needed..

That's the point.  We cannot start using an inadequate API now since one
will _never_ be able to get rid of it again.  We have accumulated
several examples of this in the past years.

The API design should be general enough to work for all the
architectures which are currently envisioned and must be extensible
for future architectures.  Your API does not allow writing adequate
code even on some/many of today's architectures.



> I have seen some proposals
> for complex graph based descriptions, but so far I have seen
> nothing that could really take advantage of something so fancy.

I am not proposing graph-based descriptions.  And this is where
you miss the point.  This is only the lowest-level interface.  It
provides enough functionality to describe the machine architecture.  If
some fancy alternative representations are needed, that is something for
a higher-level interface.

What must be avoided at all costs is programs peeking into /sys and
/proc to determine the topology.  First of all, this makes programs
architecture- and even machine-specific.  Second, the /sys and /proc
formats will change over time.  All these accesses must be hidden, and
the NUMA library is just the place for that.

Saying

> If it should be really needed it can be added later.

just means we will get programs today which hardcode today's existing
and most probably inadequate representation of the topology in /sys and
/proc.


>>~ no inclusion of SMT/multicore in the cpu hierarchy
> 
> 
> Not sure why you would care about that.

These are two sides of the same coin.  Today we already have problems
with programs running on machines with SMT processors.  How can those
use pthread_setaffinity() to create the optimal number of threads and
place them accordingly?  It requires magic /proc parsing for each and
every architecture.  The problem is exactly the same as with NUMA, and
the interface extensions to cover MC/SMT as well are minimal.


> Well, I spent a lot of time talking to various users; and IMHO
> it matches the needs of a lot of them. I did not add all the features
> everybody wanted, but that was simply not possible and still comming
> up with a reasonable design.

And this means it should not be done?


> The per process state is needed for numactl though.
> 
> I kept the support for this visible in libnuma to make it easier to convert
> old code to this (just wrap some code with a policy) For designed from 
> scratch programs it is probably better to use the allocation functions
> with mbind directly.

The NUMA library interface should not be cluttered because of
considerations of legacy apps which need to be converted.  These are
separate issues, the design of the API must not be influenced by this.
The problem always has been that in such cases the correct interfaces
are not being used but instead the "easier to use" legacy interfaces are
used.


>>Also, the concept of hard/soft sets for CPUs is useful.  Likewise
>>"spilling" over to other memory nodes.  Usually using NUMA means hinting
>>the desired configuration to the system.  It'll be used whenever
>>possible.  If it is not possible (for instance, if a given processor is
>>not available) it is mostly a bad idea to completely fail the
> 
> 
> Agreed. That is why prefered and bind are different policies
> and you can switch between them in libnuma. 

That is inadequate.  Any process/thread state like this increases
program cost since it means that the program must at all times remember
the current state and switch if necessary.  Combine this with 3rd-party
libraries using the functionality as well and you'll get explicit
switching before *every* memory allocation because one cannot assume
anything about the state.  Even if the NUMA library keeps track of the
state internally, there is always the possibility that more than one
instance of the library is in use at any one time (e.g., statically
linked into a DSO).

I repeat myself: global or thread-local states are bad.  Always have
been, always will be.


-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


end of thread, other threads:[~2004-05-03 18:35 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1QAMU-4gf-15@gated-at.bofh.it>
2004-04-30 20:01 ` NUMA API Andi Kleen
2004-05-01  5:15   ` Martin J. Bligh
2004-05-03 18:34   ` Ulrich Drepper
2004-04-30 20:39 ` Andi Kleen
     [not found] ` <1RLdk-29R-11@gated-at.bofh.it>
2004-05-03 13:17   ` NUMA API - wish list Andi Kleen
2004-04-30  7:35 NUMA API Ulrich Drepper
2004-05-03 12:48 ` NUMA API - wish list Zoltan Menyhart
2004-05-03 17:57   ` Paul Jackson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox