* [PATCH] Run PCI driver initialization on local node
From: Andi Kleen @ 2005-07-06 13:32 UTC
To: christoph, akpm, linux-kernel; +Cc: linux-pci, gregkh
Run PCI driver initialization on local node
Instead of adding messy kmalloc_node()s everywhere run the
PCI driver probe on the node local to the device.
Then the normal NUMA aware allocators do the right thing.
This would not have helped for IDE, but should help for
other, cleaner drivers that do more of their initialization in probe().
It won't help for drivers that do most of the work
on first open (like many network drivers).
Signed-off-by: Andi Kleen <ak@suse.de>
cc: christoph@lameter.com
Index: linux/drivers/pci/pci-driver.c
===================================================================
--- linux.orig/drivers/pci/pci-driver.c
+++ linux/drivers/pci/pci-driver.c
@@ -167,6 +167,27 @@ const struct pci_device_id *pci_match_de
return NULL;
}
+static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
+ const struct pci_device_id *id)
+{
+ int error;
+#ifdef CONFIG_NUMA
+ /* Execute driver initialization on the node to which the
+ device's bus is attached. This way the driver likely
+ allocates its local memory on the right node without
+ any need to change it. */
+ cpumask_t oldmask = current->cpus_allowed;
+ int node = pcibus_to_node(dev->bus);
+ if (node >= 0 && node_online(node))
+ set_cpus_allowed(current, node_to_cpumask(node));
+#endif
+ error = drv->probe(dev, id);
+#ifdef CONFIG_NUMA
+ set_cpus_allowed(current, oldmask);
+#endif
+ return error;
+}
+
/**
* __pci_device_probe()
*
@@ -184,7 +205,7 @@ __pci_device_probe(struct pci_driver *dr
id = pci_match_device(drv, pci_dev);
if (id)
- error = drv->probe(pci_dev, id);
+ error = pci_call_probe(drv, pci_dev, id);
if (error >= 0) {
pci_dev->driver = drv;
error = 0;
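The save/pin/restore pattern in pci_call_probe() can be tried out in userspace. The following is a minimal sketch assuming Linux and glibc: sched_getaffinity()/sched_setaffinity() stand in for current->cpus_allowed and set_cpus_allowed(), a single CPU stands in for node_to_cpumask(node), and call_pinned(), first_allowed_cpu() and count_allowed() are hypothetical helpers, not kernel functions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Stand-in for drv->probe(); this signature is hypothetical. */
typedef int (*probe_fn)(void *arg);

/*
 * Userspace analog of pci_call_probe(): save the calling thread's CPU
 * mask, restrict it to `target` (the "local node"), run the callback,
 * then restore the saved mask. If pinning fails, the callback still
 * runs, much as the patch skips set_cpus_allowed() for an unknown node.
 */
static int call_pinned(const cpu_set_t *target, probe_fn fn, void *arg)
{
	cpu_set_t oldmask;
	int pinned = 0;
	int error;

	assert(fn != NULL);
	if (sched_getaffinity(0, sizeof(oldmask), &oldmask) == 0 &&
	    CPU_COUNT(target) > 0 &&
	    sched_setaffinity(0, sizeof(*target), target) == 0)
		pinned = 1;

	error = fn(arg);

	/* Restore the original mask only if we actually changed it. */
	if (pinned)
		sched_setaffinity(0, sizeof(oldmask), &oldmask);
	return error;
}

/* Returns the lowest CPU the calling thread may run on, or -1. */
static int first_allowed_cpu(void)
{
	cpu_set_t mask;

	if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
		return -1;
	for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
		if (CPU_ISSET(cpu, &mask))
			return cpu;
	return -1;
}

/* Example callback: records how many CPUs it was allowed to run on. */
static int count_allowed(void *arg)
{
	cpu_set_t mask;

	if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
		return -1;
	*(int *)arg = CPU_COUNT(&mask);
	return 0;
}
```

As in the patch, a failed pin does not stop the callback from running; it only loses the locality benefit.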
* Re: [PATCH] Run PCI driver initialization on local node
From: Christoph Lameter @ 2005-07-06 16:35 UTC
To: Andi Kleen; +Cc: akpm, linux-kernel, linux-pci, gregkh
On Wed, 6 Jul 2005, Andi Kleen wrote:
> Instead of adding messy kmalloc_node()s everywhere run the
> PCI driver probe on the node local to the device.
> Then the normal NUMA aware allocators do the right thing.
That depends on the architecture. Some do round robin allocs for periods
of time during bootup. I think it is better to explicitly place control
structures.
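Christoph's objection can be illustrated with a toy model (everything here is hypothetical, not kernel code): when the active policy interleaves allocations round-robin, running on the device's node changes nothing, while an explicit node argument, in the spirit of kmalloc_node(), still lands where requested.

```c
#include <assert.h>

#define NR_NODES 4

/* Hypothetical model of two allocation policies during bootup. */
enum policy { POLICY_LOCAL, POLICY_INTERLEAVE };

struct model {
	enum policy policy;
	int next_rr; /* next node for round-robin (interleave) allocations */
};

/* Node an implicit allocation lands on, given the node of the running CPU. */
static int implicit_alloc_node(struct model *m, int cpu_node)
{
	int node;

	if (m->policy == POLICY_LOCAL)
		return cpu_node;
	/* Interleave ignores where we run and just rotates through nodes. */
	node = m->next_rr;
	m->next_rr = (m->next_rr + 1) % NR_NODES;
	return node;
}

/* An explicit node argument bypasses the policy entirely. */
static int explicit_alloc_node(int requested_node)
{
	assert(requested_node >= 0 && requested_node < NR_NODES);
	return requested_node;
}
```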
* Re: [PATCH] Run PCI driver initialization on local node
From: Andi Kleen @ 2005-07-06 17:56 UTC
To: Christoph Lameter; +Cc: Andi Kleen, akpm, linux-kernel, linux-pci, gregkh
On Wed, Jul 06, 2005 at 09:35:32AM -0700, Christoph Lameter wrote:
> On Wed, 6 Jul 2005, Andi Kleen wrote:
>
> > Instead of adding messy kmalloc_node()s everywhere run the
> > PCI driver probe on the node local to the device.
> > Then the normal NUMA aware allocators do the right thing.
>
> That depends on the architecture. Some do round robin allocs for periods
> of time during bootup. I think it is better to explicitly place control
slab will usually do the right thing because it has a forced
local node policy, but __gfp might not.
I agree it would be better to force the policy.
I kept restricting the CPUs as well, because slab still needs it.
Updated patch appended.
> structures.
Patching every driver in existence? That sounds like a lot of
work.
The node local placement should be correct for nearly all drivers. I didn't
see any other fancy placement in your patches. If a driver still wants to do
fancy placement it is free to override the policy. Having a good
default is good.
Greg, please consider applying.
-Andi
Run PCI driver initialization on local node
Instead of adding messy kmalloc_node()s everywhere run the
PCI driver probe on the node local to the device.
This would not have helped for IDE, but should help for
other, cleaner drivers that do more of their initialization in probe().
It won't help for drivers that do most of the work
on first open (like many network drivers).
Signed-off-by: Andi Kleen <ak@suse.de>
cc: christoph@lameter.com
Index: linux/drivers/pci/pci-driver.c
===================================================================
--- linux.orig/drivers/pci/pci-driver.c
+++ linux/drivers/pci/pci-driver.c
@@ -7,6 +7,7 @@
#include <linux/module.h>
#include <linux/init.h>
#include <linux/device.h>
+#include <linux/mempolicy.h>
#include "pci.h"
/*
@@ -167,6 +168,34 @@ const struct pci_device_id *pci_match_de
return NULL;
}
+static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
+ const struct pci_device_id *id)
+{
+ int error;
+#ifdef CONFIG_NUMA
+ /* Execute driver initialization on the node to which the
+ device's bus is attached. This way the driver likely
+ allocates its local memory on the right node without
+ any need to change it. */
+ struct mempolicy *oldpol;
+ cpumask_t oldmask = current->cpus_allowed;
+ int node = pcibus_to_node(dev->bus);
+ if (node >= 0 && node_online(node))
+ set_cpus_allowed(current, node_to_cpumask(node));
+ /* And set default memory allocation policy */
+ oldpol = current->mempolicy;
+ current->mempolicy = &default_policy;
+ mpol_get(current->mempolicy);
+#endif
+ error = drv->probe(dev, id);
+#ifdef CONFIG_NUMA
+ set_cpus_allowed(current, oldmask);
+ mpol_free(current->mempolicy);
+ current->mempolicy = oldpol;
+#endif
+ return error;
+}
+
/**
* __pci_device_probe()
*
@@ -184,7 +213,7 @@ __pci_device_probe(struct pci_driver *dr
id = pci_match_device(drv, pci_dev);
if (id)
- error = drv->probe(pci_dev, id);
+ error = pci_call_probe(drv, pci_dev, id);
if (error >= 0) {
pci_dev->driver = drv;
error = 0;
Index: linux/mm/mempolicy.c
===================================================================
--- linux.orig/mm/mempolicy.c
+++ linux/mm/mempolicy.c
@@ -88,7 +88,7 @@ static kmem_cache_t *sn_cache;
policied. */
static int policy_zone;
-static struct mempolicy default_policy = {
+struct mempolicy default_policy = {
.refcnt = ATOMIC_INIT(1), /* never free it */
.policy = MPOL_DEFAULT,
};
Index: linux/include/linux/mempolicy.h
===================================================================
--- linux.orig/include/linux/mempolicy.h
+++ linux/include/linux/mempolicy.h
@@ -152,6 +152,7 @@ struct mempolicy *mpol_shared_policy_loo
extern void numa_default_policy(void);
extern void numa_policy_init(void);
+extern struct mempolicy default_policy;
#else
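The reference counting around the policy swap in the updated patch can be modeled in userspace. This sketch uses hypothetical toy types (struct policy, struct task, pol_get()/pol_put()) in place of struct mempolicy, current, mpol_get() and mpol_free(); like default_policy, the static default starts with a reference count of 1 so it is never freed.

```c
#include <assert.h>

/* Toy stand-in for struct mempolicy. */
struct policy {
	int refcnt;
	int mode;
};

enum { MODE_DEFAULT, MODE_INTERLEAVE };

/* Static default, refcount starts at 1 so it never drops to zero. */
static struct policy default_pol = { 1, MODE_DEFAULT };

/* Toy stand-in for the task's policy pointer (current->mempolicy). */
struct task {
	struct policy *pol;
};

static void pol_get(struct policy *p)
{
	if (p)
		p->refcnt++;
}

static void pol_put(struct policy *p)
{
	if (p) {
		p->refcnt--;
		assert(p->refcnt >= 0);
	}
}

/* Install the default policy for the duration of fn, then restore. */
static int call_with_default_policy(struct task *t, int (*fn)(struct task *))
{
	struct policy *oldpol = t->pol;
	int error;

	t->pol = &default_pol;
	pol_get(t->pol);
	error = fn(t);
	pol_put(t->pol); /* drops the reference taken above */
	t->pol = oldpol;
	return error;
}

/* Example "probe": reports which policy mode it ran under. */
static int probe_sees_mode(struct task *t)
{
	return t->pol->mode;
}
```

Like the patch, the sketch drops the reference on whatever policy is installed when the callback returns, so it assumes the probe routine does not replace the task's policy itself.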
* Re: [PATCH] Run PCI driver initialization on local node
From: Christoph Lameter @ 2005-07-06 18:01 UTC
To: Andi Kleen; +Cc: akpm, linux-kernel, linux-pci, gregkh
On Wed, 6 Jul 2005, Andi Kleen wrote:
> On Wed, Jul 06, 2005 at 09:35:32AM -0700, Christoph Lameter wrote:
> > On Wed, 6 Jul 2005, Andi Kleen wrote:
> >
> > > Instead of adding messy kmalloc_node()s everywhere run the
> > > PCI driver probe on the node local to the device.
> > > Then the normal NUMA aware allocators do the right thing.
> >
> > That depends on the architecture. Some do round robin allocs for periods
> > of time during bootup. I think it is better to explicitly place control
>
> slab will usually do the right thing because it has a forced
> local node policy, but __gfp might not.
GFP allocs may not do the right thing. If you want to do this then it
may be best to set the memory policy to restrict allocations to the node
on which the device resides.
Plus there are CPU-less nodes. What happens to those?
> Patching every driver in existence? That sounds like a lot of
> work.
No, just patch those that would benefit from it. The existing
dma_alloc_coherent already takes care of many of the placement
issues for driver memory. We would probably need to patch more locations
where higher level control structure allocations are being done.
> The node local placement should be correct for nearly all drivers. I didn't
> see any other fancy placement in your patches. If a driver still wants to do
> fancy placement it is free to override the policy. Having a good
> default is good.
Definitely.
* Re: [PATCH] Run PCI driver initialization on local node
From: Andi Kleen @ 2005-07-06 18:13 UTC
To: Christoph Lameter; +Cc: Andi Kleen, akpm, linux-kernel, linux-pci, gregkh
On Wed, Jul 06, 2005 at 11:01:14AM -0700, Christoph Lameter wrote:
> On Wed, 6 Jul 2005, Andi Kleen wrote:
>
> > On Wed, Jul 06, 2005 at 09:35:32AM -0700, Christoph Lameter wrote:
> > > On Wed, 6 Jul 2005, Andi Kleen wrote:
> > >
> > > > Instead of adding messy kmalloc_node()s everywhere run the
> > > > PCI driver probe on the node local to the device.
> > > > Then the normal NUMA aware allocators do the right thing.
> > >
> > > That depends on the architecture. Some do round robin allocs for periods
> > > of time during bootup. I think it is better to explicitly place control
> >
> > slab will usually do the right thing because it has a forced
> > local node policy, but __gfp might not.
>
> GFP allocs may not do the right thing. If you want to do this then it
> may be best to set the memory policy to restrict allocations to the node
> on which the device resides.
They will do the right thing. Under memory pressure on the node
it is better to back off than to fail.
>
> Plus there are CPU-less nodes. What happens to those?
They are no worse off than they are now.
>
> > Patching every driver in existence? That sounds like a lot of
> > work.
>
> No, just patch those that would benefit from it. The existing
This would be "all devices that SGI ships on Altixes"?
IMHO all can benefit.
-Andi
* Re: [PATCH] Run PCI driver initialization on local node
From: Christoph Lameter @ 2005-07-06 18:28 UTC
To: Andi Kleen; +Cc: akpm, linux-kernel, linux-pci, gregkh
On Wed, 6 Jul 2005, Andi Kleen wrote:
> > > Patching every driver in existence? That sounds like a lot of
> > > work.
> >
> > No, just patch those that would benefit from it. The existing
>
> This would be "all devices that SGI ships on Altixes"?
Anyone can patch device drivers. High-performance drivers suffer the most
from wrong node placement. These are most likely 10G Ethernet, high-speed
SCSI, etc.
The main concern at this point are the higher abstraction layers. These
are generic and if they do the right thing then we have already come a
long way.
> IMHO all can benefit.
Absolutely.
* Re: [PATCH] Run PCI driver initialization on local node
From: Christoph Lameter @ 2005-07-06 19:31 UTC
To: Andi Kleen; +Cc: akpm, linux-kernel, linux-pci, gregkh
On Wed, 6 Jul 2005, Andi Kleen wrote:
> > GFP allocs may not do the right thing. If you want to do this then it
> > may be best to set the memory policy to restrict allocations to the node
> > on which the device resides.
>
> They will do the right thing. Under memory pressure on the node
> it is better to back off than to fail.
Node specific allocs fall back to other nodes and will not fail unless
there is no memory available.
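A toy model of that fallback behavior, with hypothetical per-node free-page counters: the allocation tries the preferred node first and only fails once every node is exhausted. The (preferred + i) % NR_NODES scan order is a simplification; the kernel walks a per-node zonelist, typically ordered by node distance, instead.

```c
#include <assert.h>

#define NR_NODES 3

/* Hypothetical free page counts per node. */
static int node_free_pages[NR_NODES];

/*
 * Node-preferred allocation: try the preferred node first, then the
 * remaining nodes in order; return the node used, or -1 only when
 * every node is empty.
 */
static int alloc_page_pref(int preferred)
{
	assert(preferred >= 0 && preferred < NR_NODES);
	for (int i = 0; i < NR_NODES; i++) {
		int node = (preferred + i) % NR_NODES;

		if (node_free_pages[node] > 0) {
			node_free_pages[node]--;
			return node;
		}
	}
	return -1;
}
```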
* Re: [PATCH] Run PCI driver initialization on local node
From: Christoph Lameter @ 2005-07-06 19:33 UTC
To: Andi Kleen; +Cc: akpm, linux-kernel, linux-pci, gregkh
On Wed, 6 Jul 2005, Andi Kleen wrote:
> > That depends on the architecture. Some do round robin allocs for periods
> > of time during bootup. I think it is better to explicitly place control
>
> slab will usually do the right thing because it has a forced
> local node policy, but __gfp might not.
The slab allocator will do the right thing with the NUMA slab allocator in
Andrew's tree but not with the one in Linus' tree. The one in Linus' tree
will just pick up whatever slab is available, regardless of the node.
Only kmalloc_node will make a reasonable attempt to locate the memory on
a specific node.
* Re: [PATCH] Run PCI driver initialization on local node
From: Andi Kleen @ 2005-07-07 10:39 UTC
To: Christoph Lameter; +Cc: Andi Kleen, akpm, linux-kernel, linux-pci, gregkh
On Wed, Jul 06, 2005 at 12:33:51PM -0700, Christoph Lameter wrote:
> On Wed, 6 Jul 2005, Andi Kleen wrote:
>
> > > That depends on the architecture. Some do round robin allocs for periods
> > > of time during bootup. I think it is better to explicitly place control
> >
> > slab will usually do the right thing because it has a forced
> > local node policy, but __gfp might not.
>
> The slab allocator will do the right thing with the NUMA slab allocator in
> Andrew's tree but not with the one in Linus' tree. The one in Linus' tree
> will just pick up whatever slab is available, regardless of the node.
It should usually do the right thing because it
runs on the correct CPUs. The only case that doesn't work
is freeing on different CPUs than it was allocated, but hopefully
that is not too common during system startup.
And then at some point NUMA aware slab will make it into mainline
I guess.
> Only kmalloc_node will make a reasonable attempt to locate the memory on
> a specific node.
You forgot __get_free_pages.
-Andi
* Re: [PATCH] Run PCI driver initialization on local node
From: Christoph Lameter @ 2005-07-07 13:52 UTC
To: Andi Kleen; +Cc: akpm, linux-kernel, linux-pci, gregkh
On Thu, 7 Jul 2005, Andi Kleen wrote:
> > The slab allocator will do the right thing with the NUMA slab allocator in
> > Andrew's tree but not with the one in Linus' tree. The one in Linus' tree
> > will just pick up whatever slab is available, regardless of the node.
>
> It should usually do the right thing because it
> runs on the correct CPUs. The only case that doesn't work
> is freeing on different CPUs than it was allocated, but hopefully
> that is not too common during system startup.
The current slab won't do that unless you allocate enough entries so
that a new page is retrieved from the page allocator. Then you may have
local memory from the slab (if the memory policy is not on round robin).
If you allocate some slab entries on one node then you typically have a
partially used page from that node. If you then switch to a different
processor on a different node and then use the slab allocator to get an
entry for that slab then that partially used page will be used! The
slab allocator will return an entry from the *prior* node.
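The scenario Christoph describes can be reduced to a toy model (hypothetical, and ignoring the per-CPU object caches of the real allocator): a cache that is not NUMA aware keeps a single partially used page, so a CPU on another node keeps receiving objects from the old node's page until that page is full.

```c
#include <assert.h>

#define OBJS_PER_PAGE 4

/* Toy cache that is not NUMA aware: one partial page shared by all CPUs. */
struct toy_cache {
	int page_node; /* node the current partial page came from, -1 if none */
	int used;      /* objects already handed out from that page */
};

/* Returns the node of the object handed to a CPU running on cpu_node. */
static int toy_cache_alloc(struct toy_cache *c, int cpu_node)
{
	assert(cpu_node >= 0);
	if (c->page_node < 0 || c->used == OBJS_PER_PAGE) {
		/* Only a fresh page comes from the caller's local node. */
		c->page_node = cpu_node;
		c->used = 0;
	}
	c->used++;
	return c->page_node;
}
```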
> And then at some point NUMA aware slab will make it into mainline
> I guess.
Hopefully.
> > Only kmalloc_node will make a reasonable attempt to locate the memory on
> > a specific node.
>
> You forgot __get_free_pages.
The slab allocator uses alloc_pages and alloc_pages_node
* Re: [PATCH] Run PCI driver initialization on local node
From: Andi Kleen @ 2005-07-07 14:13 UTC
To: Christoph Lameter; +Cc: Andi Kleen, akpm, linux-kernel, linux-pci, gregkh
> > > Only kmalloc_node will make a reasonable attempt to locate the memory on
> > > a specific node.
> >
> > You forgot __get_free_pages.
>
> The slab allocator uses alloc_pages and alloc_pages_node
alloc_pages is NUMA aware. __get_free_pages is just a wrapper for it.
-Andi
Thread overview:
2005-07-06 13:32 [PATCH] Run PCI driver initialization on local node Andi Kleen
2005-07-06 16:35 ` Christoph Lameter
2005-07-06 17:56 ` Andi Kleen
2005-07-06 18:01 ` Christoph Lameter
2005-07-06 18:13 ` Andi Kleen
2005-07-06 18:28 ` Christoph Lameter
2005-07-06 19:31 ` Christoph Lameter
2005-07-06 19:33 ` Christoph Lameter
2005-07-07 10:39 ` Andi Kleen
2005-07-07 13:52 ` Christoph Lameter
2005-07-07 14:13 ` Andi Kleen