* [PATCH] Run PCI driver initialization on local node
@ 2005-07-06 13:32 Andi Kleen
2005-07-06 16:35 ` Christoph Lameter
0 siblings, 1 reply; 11+ messages in thread
From: Andi Kleen @ 2005-07-06 13:32 UTC
To: christoph, akpm, linux-kernel; +Cc: linux-pci, gregkh
Run PCI driver initialization on local node
Instead of adding messy kmalloc_node()s everywhere run the
PCI driver probe on the node local to the device.
Then the normal NUMA aware allocators do the right thing.
This would not have helped for IDE, but should for
other more clean drivers that do more initialization in probe().
It won't help for drivers that do most of the work
on first open (like many network drivers)
Signed-off-by: Andi Kleen <ak@suse.de>
cc: christoph@lameter.com
Index: linux/drivers/pci/pci-driver.c
===================================================================
--- linux.orig/drivers/pci/pci-driver.c
+++ linux/drivers/pci/pci-driver.c
@@ -167,6 +167,27 @@ const struct pci_device_id *pci_match_de
 	return NULL;
 }

+static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
+			  const struct pci_device_id *id)
+{
+	int error;
+#ifdef CONFIG_NUMA
+	/* Execute driver initialization on node where the
+	   device's bus is attached to. This way the driver likely
+	   allocates its local memory on the right node without
+	   any need to change it. */
+	cpumask_t oldmask = current->cpus_allowed;
+	int node = pcibus_to_node(dev->bus);
+	if (node >= 0 && node_online(node))
+		set_cpus_allowed(current, node_to_cpumask(node));
+#endif
+	error = drv->probe(dev, id);
+#ifdef CONFIG_NUMA
+	set_cpus_allowed(current, oldmask);
+#endif
+	return error;
+}
+
 /**
  * __pci_device_probe()
  *
@@ -184,7 +205,7 @@ __pci_device_probe(struct pci_driver *dr

 		id = pci_match_device(drv, pci_dev);
 		if (id)
-			error = drv->probe(pci_dev, id);
+			error = pci_call_probe(drv, pci_dev, id);
 		if (error >= 0) {
 			pci_dev->driver = drv;
 			error = 0;
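
For illustration only, and not part of the patch: a minimal userspace sketch of the save/pin/restore pattern that pci_call_probe() follows. Everything here is a toy stand-in. The names toy_task, toy_topology, toy_call_probe and toy_record_mask are hypothetical, and a 64-bit integer replaces cpumask_t. It only shows that the probe callback runs under the node's CPU mask and that the caller's mask is put back afterwards.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODES 4

/* Toy stand-ins for kernel state; every name here is hypothetical. */
struct toy_task {
	uint64_t cpus_allowed;              /* bit n set: CPU n is allowed */
};

struct toy_topology {
	int      node_online[MAX_NODES];    /* 1 if the node is online */
	uint64_t node_cpumask[MAX_NODES];   /* CPUs that belong to each node */
};

/* Stand-in for drv->probe(). */
typedef int (*toy_probe_fn)(struct toy_task *task, void *ctx);

/*
 * Run probe() with the task restricted to the CPUs of `node`, then restore
 * the previous mask. An invalid or offline node leaves the mask unchanged.
 */
static int toy_call_probe(struct toy_task *task, const struct toy_topology *topo,
			  int node, toy_probe_fn probe, void *ctx)
{
	uint64_t oldmask;
	int error;

	assert(task && topo && probe);
	oldmask = task->cpus_allowed;

	if (node >= 0 && node < MAX_NODES && topo->node_online[node])
		task->cpus_allowed = topo->node_cpumask[node];

	error = probe(task, ctx);

	/* Like the patch, restore unconditionally. */
	task->cpus_allowed = oldmask;
	return error;
}

/* Example probe: records the CPU mask it was called under. */
static int toy_record_mask(struct toy_task *task, void *ctx)
{
	*(uint64_t *)ctx = task->cpus_allowed;
	return 0;
}
```

With two online nodes owning CPUs 0-1 and 2-3, calling toy_call_probe() for node 1 lets the probe observe only the mask for CPUs 2-3, while the caller's original mask is intact on return. A negative node number (no node information) runs the probe under the original mask.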

* Re: [PATCH] Run PCI driver initialization on local node
2005-07-06 13:32 [PATCH] Run PCI driver initialization on local node Andi Kleen
@ 2005-07-06 16:35 ` Christoph Lameter
2005-07-06 17:56 ` Andi Kleen
0 siblings, 1 reply; 11+ messages in thread
From: Christoph Lameter @ 2005-07-06 16:35 UTC
To: Andi Kleen; +Cc: akpm, linux-kernel, linux-pci, gregkh

On Wed, 6 Jul 2005, Andi Kleen wrote:

> Instead of adding messy kmalloc_node()s everywhere run the
> PCI driver probe on the node local to the device.
> Then the normal NUMA aware allocators do the right thing.

That depends on the architecture. Some do round robin allocs for periods
of time during bootup. I think it is better to explicitly place control
structures.

* Re: [PATCH] Run PCI driver initialization on local node
2005-07-06 16:35 ` Christoph Lameter
@ 2005-07-06 17:56 ` Andi Kleen
2005-07-06 18:01 ` Christoph Lameter
2005-07-06 19:33 ` Christoph Lameter
0 siblings, 2 replies; 11+ messages in thread
From: Andi Kleen @ 2005-07-06 17:56 UTC
To: Christoph Lameter; +Cc: Andi Kleen, akpm, linux-kernel, linux-pci, gregkh

On Wed, Jul 06, 2005 at 09:35:32AM -0700, Christoph Lameter wrote:
> On Wed, 6 Jul 2005, Andi Kleen wrote:
>
> > Instead of adding messy kmalloc_node()s everywhere run the
> > PCI driver probe on the node local to the device.
> > Then the normal NUMA aware allocators do the right thing.
>
> That depends on the architecture. Some do round robin allocs for periods
> of time during bootup. I think it is better to explicitly place control

slab will usually do the right thing because it has a forced
local node policy, but __gfp might not. I agree it would be better
to force the policy. I kept the scheduling to CPU because slab still
needs it. Updated patch appended.

> structures.

Patching every driver in existence? That sounds like a lot of
work.

The node local placement should be correct for nearly all drivers. I didn't
see any other fancy placement in your patches. If a driver still wants to do
fancy placement it is free to overwrite the policy. Having a good
default is good.

Greg, please consider applying.

-Andi

Run PCI driver initialization on local node

Instead of adding messy kmalloc_node()s everywhere run the
PCI driver probe on the node local to the device.

This would not have helped for IDE, but should for
other more clean drivers that do more initialization in probe().
It won't help for drivers that do most of the work
on first open (like many network drivers)

Signed-off-by: Andi Kleen <ak@suse.de>
cc: christoph@lameter.com

Index: linux/drivers/pci/pci-driver.c
===================================================================
--- linux.orig/drivers/pci/pci-driver.c
+++ linux/drivers/pci/pci-driver.c
@@ -7,6 +7,7 @@
 #include <linux/module.h>
 #include <linux/init.h>
 #include <linux/device.h>
+#include <linux/mempolicy.h>
 #include "pci.h"

 /*
@@ -167,6 +168,34 @@ const struct pci_device_id *pci_match_de
 	return NULL;
 }

+static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
+			  const struct pci_device_id *id)
+{
+	int error;
+#ifdef CONFIG_NUMA
+	/* Execute driver initialization on node where the
+	   device's bus is attached to. This way the driver likely
+	   allocates its local memory on the right node without
+	   any need to change it. */
+	struct mempolicy *oldpol;
+	cpumask_t oldmask = current->cpus_allowed;
+	int node = pcibus_to_node(dev->bus);
+	if (node >= 0 && node_online(node))
+		set_cpus_allowed(current, node_to_cpumask(node));
+	/* And set default memory allocation policy */
+	oldpol = current->mempolicy;
+	current->mempolicy = &default_policy;
+	mpol_get(current->mempolicy);
+#endif
+	error = drv->probe(dev, id);
+#ifdef CONFIG_NUMA
+	set_cpus_allowed(current, oldmask);
+	mpol_free(current->mempolicy);
+	current->mempolicy = oldpol;
+#endif
+	return error;
+}
+
 /**
  * __pci_device_probe()
  *
@@ -184,7 +213,7 @@ __pci_device_probe(struct pci_driver *dr

 		id = pci_match_device(drv, pci_dev);
 		if (id)
-			error = drv->probe(pci_dev, id);
+			error = pci_call_probe(drv, pci_dev, id);
 		if (error >= 0) {
 			pci_dev->driver = drv;
 			error = 0;
Index: linux/mm/mempolicy.c
===================================================================
--- linux.orig/mm/mempolicy.c
+++ linux/mm/mempolicy.c
@@ -88,7 +88,7 @@ static kmem_cache_t *sn_cache;
    policied. */
 static int policy_zone;

-static struct mempolicy default_policy = {
+struct mempolicy default_policy = {
 	.refcnt = ATOMIC_INIT(1), /* never free it */
 	.policy = MPOL_DEFAULT,
 };
Index: linux/include/linux/mempolicy.h
===================================================================
--- linux.orig/include/linux/mempolicy.h
+++ linux/include/linux/mempolicy.h
@@ -152,6 +152,7 @@ struct mempolicy *mpol_shared_policy_loo
 extern void numa_default_policy(void);
 extern void numa_policy_init(void);
+extern struct mempolicy default_policy;
 #else

* Re: [PATCH] Run PCI driver initialization on local node
2005-07-06 17:56 ` Andi Kleen
@ 2005-07-06 18:01 ` Christoph Lameter
2005-07-06 18:13 ` Andi Kleen
2005-07-06 19:33 ` Christoph Lameter
1 sibling, 1 reply; 11+ messages in thread
From: Christoph Lameter @ 2005-07-06 18:01 UTC
To: Andi Kleen; +Cc: akpm, linux-kernel, linux-pci, gregkh

On Wed, 6 Jul 2005, Andi Kleen wrote:

> On Wed, Jul 06, 2005 at 09:35:32AM -0700, Christoph Lameter wrote:
> > On Wed, 6 Jul 2005, Andi Kleen wrote:
> >
> > > Instead of adding messy kmalloc_node()s everywhere run the
> > > PCI driver probe on the node local to the device.
> > > Then the normal NUMA aware allocators do the right thing.
> >
> > That depends on the architecture. Some do round robin allocs for periods
> > of time during bootup. I think it is better to explicitly place control
>
> slab will usually do the right thing because it has a forced
> local node policy, but __gfp might not.

GFP allocs may not do the right thing. If you want to do this then it
may be best to set the memory policy to restrict allocations to the node
on which the device resides.

Plus there are CPU less nodes. What happens to those?

> Patching every driver in existence? That sounds like a lot of
> work.

No just patch those that would benefit from it. The existing
dma_alloc_coherent already takes care of many of the placement issues for
driver memory. We would probably need to patch more locations where
higher level control structure allocations are being done.

> The node local placement should be correct for nearly all drivers. I didn't
> see any other fancy placement in your patches. If a driver still wants to do
> fancy placement it is free to overwrite the policy. Having a good
> default is good.

Definitely.

* Re: [PATCH] Run PCI driver initialization on local node
2005-07-06 18:01 ` Christoph Lameter
@ 2005-07-06 18:13 ` Andi Kleen
2005-07-06 18:28 ` Christoph Lameter
2005-07-06 19:31 ` Christoph Lameter
0 siblings, 2 replies; 11+ messages in thread
From: Andi Kleen @ 2005-07-06 18:13 UTC
To: Christoph Lameter; +Cc: Andi Kleen, akpm, linux-kernel, linux-pci, gregkh

On Wed, Jul 06, 2005 at 11:01:14AM -0700, Christoph Lameter wrote:
> On Wed, 6 Jul 2005, Andi Kleen wrote:
>
> > On Wed, Jul 06, 2005 at 09:35:32AM -0700, Christoph Lameter wrote:
> > > On Wed, 6 Jul 2005, Andi Kleen wrote:
> > >
> > > > Instead of adding messy kmalloc_node()s everywhere run the
> > > > PCI driver probe on the node local to the device.
> > > > Then the normal NUMA aware allocators do the right thing.
> > >
> > > That depends on the architecture. Some do round robin allocs for periods
> > > of time during bootup. I think it is better to explicitly place control
> >
> > slab will usually do the right thing because it has a forced
> > local node policy, but __gfp might not.
>
> GFP allocs may not do the right thing. If you want to do this then it
> may be best to set the memory policy to restrict allocations to the node
> on which the device resides.

They will do the right thing. Under memory pressure on the node
it is better to back off than to fail.

>
> Plus there are CPU less nodes. What happens to those?

They are not worse off than they are now.

>
> > Patching every driver in existence? That sounds like a lot of
> > work.
>
> No just patch those that would benefit from it. The existing

This would be "all devices that SGI ships on Altixes" ?

IMHO all can benefit.

-Andi

* Re: [PATCH] Run PCI driver initialization on local node
2005-07-06 18:13 ` Andi Kleen
@ 2005-07-06 18:28 ` Christoph Lameter
2005-07-06 19:31 ` Christoph Lameter
1 sibling, 0 replies; 11+ messages in thread
From: Christoph Lameter @ 2005-07-06 18:28 UTC
To: Andi Kleen; +Cc: akpm, linux-kernel, linux-pci, gregkh

On Wed, 6 Jul 2005, Andi Kleen wrote:

> > > Patching every driver in existence? That sounds like a lot of
> > > work.
> >
> > No just patch those that would benefit from it. The existing
>
> This would be "all devices that SGI ships on Altixes" ?

Anyone can patch device drivers. High performance drivers suffer the most
from wrong node placement. These are most likely 10G ethernet, high speed
scsi etc.

The main concern at this point is the higher abstraction layers. These
are generic and if they do the right thing then we have already come a
long way.

> IMHO all can benefit.

Absolutely.

* Re: [PATCH] Run PCI driver initialization on local node
2005-07-06 18:13 ` Andi Kleen
2005-07-06 18:28 ` Christoph Lameter
@ 2005-07-06 19:31 ` Christoph Lameter
1 sibling, 0 replies; 11+ messages in thread
From: Christoph Lameter @ 2005-07-06 19:31 UTC
To: Andi Kleen; +Cc: akpm, linux-kernel, linux-pci, gregkh

On Wed, 6 Jul 2005, Andi Kleen wrote:

> > GFP allocs may not do the right thing. If you want to do this then it
> > may be best to set the memory policy to restrict allocations to the node
> > on which the device resides.
>
> They will do the right thing. Under memory pressure on the node
> it is better to back off than to fail.

Node specific allocs fall back to other nodes and will not fail unless
there is no memory available.
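
For illustration only: a toy model of the fallback behaviour described in the message above, where a node-specific allocation tries the requested node first and fails only when no node has memory left. The names toy_mem and toy_alloc_page_node are hypothetical, and the simple wrap-around search order is an assumption made for this sketch. It does not reproduce how the kernel orders its fallback nodes.

```c
#include <assert.h>

#define MAX_NODES 4

/* Toy model: free_pages[n] is the number of free pages left on node n. */
struct toy_mem {
	int free_pages[MAX_NODES];
};

/*
 * Take one page, preferring `node`. If that node is empty, try the other
 * nodes in wrap-around order (an assumption of this sketch).
 * Returns the node the page came from, or -1 if every node is empty.
 */
static int toy_alloc_page_node(struct toy_mem *mem, int node)
{
	assert(mem && node >= 0 && node < MAX_NODES);

	for (int i = 0; i < MAX_NODES; i++) {
		int n = (node + i) % MAX_NODES;

		if (mem->free_pages[n] > 0) {
			mem->free_pages[n]--;
			return n;
		}
	}
	return -1;	/* no memory available anywhere */
}
```

In this model a request for node 1 is served from node 1 while it has pages, then from the next node that has any, and returns -1 only once all nodes are exhausted.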

* Re: [PATCH] Run PCI driver initialization on local node
2005-07-06 17:56 ` Andi Kleen
2005-07-06 18:01 ` Christoph Lameter
@ 2005-07-06 19:33 ` Christoph Lameter
2005-07-07 10:39 ` Andi Kleen
1 sibling, 1 reply; 11+ messages in thread
From: Christoph Lameter @ 2005-07-06 19:33 UTC
To: Andi Kleen; +Cc: akpm, linux-kernel, linux-pci, gregkh

On Wed, 6 Jul 2005, Andi Kleen wrote:

> > That depends on the architecture. Some do round robin allocs for periods
> > of time during bootup. I think it is better to explicitly place control
>
> slab will usually do the right thing because it has a forced
> local node policy, but __gfp might not.

The slab allocator will do the right thing with the numa slab allocator in
Andrew's tree but not with the one in Linus' tree. The one in Linus' tree
will just pick up whatever slab is available regardless of the node.

Only kmalloc_node will make a reasonable attempt to locate the memory on
a specific node.

* Re: [PATCH] Run PCI driver initialization on local node
2005-07-06 19:33 ` Christoph Lameter
@ 2005-07-07 10:39 ` Andi Kleen
2005-07-07 13:52 ` Christoph Lameter
0 siblings, 1 reply; 11+ messages in thread
From: Andi Kleen @ 2005-07-07 10:39 UTC
To: Christoph Lameter; +Cc: Andi Kleen, akpm, linux-kernel, linux-pci, gregkh

On Wed, Jul 06, 2005 at 12:33:51PM -0700, Christoph Lameter wrote:
> On Wed, 6 Jul 2005, Andi Kleen wrote:
>
> > > That depends on the architecture. Some do round robin allocs for periods
> > > of time during bootup. I think it is better to explicitly place control
> >
> > slab will usually do the right thing because it has a forced
> > local node policy, but __gfp might not.
>
> The slab allocator will do the right thing with the numa slab allocator in
> Andrew's tree but not with the one in Linus' tree. The one in Linus' tree
> will just pick up whatever slab is available regardless of the node.

It should usually do the right thing because it
runs on the correct CPUs. The only case that doesn't work
is freeing on different CPUs than it was allocated, but hopefully
that is not too common during system startup.

And then at some point NUMA aware slab will make it into mainline
I guess.

> Only kmalloc_node will make a reasonable attempt to locate the memory on
> a specific node.

You forgot __get_free_pages.

-Andi

* Re: [PATCH] Run PCI driver initialization on local node
2005-07-07 10:39 ` Andi Kleen
@ 2005-07-07 13:52 ` Christoph Lameter
2005-07-07 14:13 ` Andi Kleen
0 siblings, 1 reply; 11+ messages in thread
From: Christoph Lameter @ 2005-07-07 13:52 UTC
To: Andi Kleen; +Cc: akpm, linux-kernel, linux-pci, gregkh

On Thu, 7 Jul 2005, Andi Kleen wrote:

> > The slab allocator will do the right thing with the numa slab allocator in
> > Andrew's tree but not with the one in Linus' tree. The one in Linus' tree
> > will just pick up whatever slab is available regardless of the node.
>
> It should usually do the right thing because it
> runs on the correct CPUs. The only case that doesn't work
> is freeing on different CPUs than it was allocated, but hopefully
> that is not too common during system startup.

The current slab won't do that unless you allocate enough entries so that
a new page is retrieved from the page allocator. Then you may have local
memory from the slab (if the memory policy is not on round robin).

If you allocate some slab entries on one node then you typically have a
partially used page from that node. If you then switch to a different
processor on a different node and then use the slab allocator to get an
entry for that slab then that partially used page will be used! The slab
allocator will return an entry from the *prior* node.

> And then at some point NUMA aware slab will make it into mainline
> I guess.

Hopefully.

> > Only kmalloc_node will make a reasonable attempt to locate the memory on
> > a specific node.
>
> You forgot __get_free_pages.

The slab allocator uses alloc_pages and alloc_pages_node
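
For illustration only: a toy model of the partial-page effect described in the message above. It is not the kernel's slab code. The names toy_slab_cache and toy_slab_alloc are hypothetical, the cache tracks a single partially used page, and a fresh page is assumed to come from the node the caller currently runs on (so no round-robin policy).

```c
#include <assert.h>

#define OBJS_PER_PAGE 4

/* Toy, non-NUMA-aware cache that tracks one partially used page. */
struct toy_slab_cache {
	int partial_node;	/* node backing the partial page, -1 if none */
	int partial_free;	/* free objects left on that page */
};

/*
 * Allocate one object while running on `cur_node`.
 * Returns the node whose memory backs the returned object.
 */
static int toy_slab_alloc(struct toy_slab_cache *c, int cur_node)
{
	assert(c && cur_node >= 0);

	if (c->partial_node < 0 || c->partial_free == 0) {
		/* No usable partial page: take a fresh page from the local node. */
		c->partial_node = cur_node;
		c->partial_free = OBJS_PER_PAGE;
	}

	/* Otherwise the existing partial page is used, whatever node it is on. */
	c->partial_free--;
	return c->partial_node;
}
```

Starting from an empty cache, one allocation on node 0 followed by allocations on node 1 keeps returning node 0 objects until the four-object page is used up. Only then does an allocation on node 1 get node 1 memory, which is the behaviour the message attributes to the slab allocator in Linus' tree.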

* Re: [PATCH] Run PCI driver initialization on local node
2005-07-07 13:52 ` Christoph Lameter
@ 2005-07-07 14:13 ` Andi Kleen
0 siblings, 0 replies; 11+ messages in thread
From: Andi Kleen @ 2005-07-07 14:13 UTC
To: Christoph Lameter; +Cc: Andi Kleen, akpm, linux-kernel, linux-pci, gregkh

> > > Only kmalloc_node will make a reasonable attempt to locate the memory on
> > > a specific node.
> >
> > You forgot __get_free_pages.
>
> The slab allocator uses alloc_pages and alloc_pages_node

alloc_pages is NUMA aware. __get_free_pages is just a wrapper for it.

-Andi