public inbox for linux-kernel@vger.kernel.org
* [PATCH] Run PCI driver initialization on local node
@ 2005-07-06 13:32 Andi Kleen
  2005-07-06 16:35 ` Christoph Lameter
  0 siblings, 1 reply; 11+ messages in thread
From: Andi Kleen @ 2005-07-06 13:32 UTC (permalink / raw)
  To: christoph, akpm, linux-kernel; +Cc: linux-pci, gregkh


Run PCI driver initialization on local node

Instead of adding messy kmalloc_node()s everywhere run the 
PCI driver probe on the node local to the device.
Then the normal NUMA aware allocators do the right thing.

This would not have helped for IDE, but should help other,
cleaner drivers that do more initialization in probe().
It won't help drivers that do most of the work
on first open (like many network drivers).

Signed-off-by: Andi Kleen <ak@suse.de> 
cc: christoph@lameter.com

Index: linux/drivers/pci/pci-driver.c
===================================================================
--- linux.orig/drivers/pci/pci-driver.c
+++ linux/drivers/pci/pci-driver.c
@@ -167,6 +167,27 @@ const struct pci_device_id *pci_match_de
 	return NULL;
 }
 
+static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev, 
+			  const struct pci_device_id *id)
+{
+	int error;
+#ifdef CONFIG_NUMA
+	/* Execute driver initialization on node where the 
+	   device's bus is attached to.  This way the driver likely
+	   allocates its local memory on the right node without
+	   any need to change it. */
+	cpumask_t oldmask = current->cpus_allowed;
+	int node = pcibus_to_node(dev->bus);
+	if (node >= 0 && node_online(node))
+	    set_cpus_allowed(current, node_to_cpumask(node));	
+#endif
+	error = drv->probe(dev, id);
+#ifdef CONFIG_NUMA
+	set_cpus_allowed(current, oldmask);
+#endif
+	return error;
+}
+
 /**
  * __pci_device_probe()
  * 
@@ -184,7 +205,7 @@ __pci_device_probe(struct pci_driver *dr
 
 		id = pci_match_device(drv, pci_dev);
 		if (id)
-			error = drv->probe(pci_dev, id);
+			error = pci_call_probe(drv, pci_dev, id);
 		if (error >= 0) {
 			pci_dev->driver = drv;
 			error = 0;

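The save / restrict / call / restore shape of pci_call_probe() can be
tried outside the kernel. The following is only a userspace sketch of
that pattern, not the kernel code: it assumes Linux with glibc and uses
sched_getaffinity()/sched_setaffinity() in place of the kernel-internal
set_cpus_allowed(). The names call_with_affinity(), count_allowed_cpus()
and affinity_demo() are hypothetical and exist only for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Hypothetical callback type standing in for drv->probe(). */
typedef int (*probe_fn)(void *ctx);

/*
 * Run fn(ctx) with the calling thread restricted to `allowed`, then
 * restore the previous affinity. Returns fn's result, or -1 if the old
 * mask could not be read.
 */
static int call_with_affinity(const cpu_set_t *allowed, probe_fn fn, void *ctx)
{
	cpu_set_t oldmask;
	int error;

	if (sched_getaffinity(0, sizeof(oldmask), &oldmask) != 0)
		return -1;

	/* Best effort: if pinning fails, the callback still runs. */
	(void)sched_setaffinity(0, sizeof(*allowed), allowed);

	error = fn(ctx);

	(void)sched_setaffinity(0, sizeof(oldmask), &oldmask);
	return error;
}

/* Example callback: records how many CPUs the thread may run on now. */
static int count_allowed_cpus(void *ctx)
{
	cpu_set_t now;

	if (sched_getaffinity(0, sizeof(now), &now) != 0)
		return -1;
	*(int *)ctx = CPU_COUNT(&now);
	return 0;
}

/*
 * Pin to the first CPU we are allowed on, run the callback, and check
 * that the original mask is back afterwards. Returns 0 on success.
 */
static int affinity_demo(void)
{
	cpu_set_t before, one, after;
	int cpu, seen = 0;

	if (sched_getaffinity(0, sizeof(before), &before) != 0)
		return -1;

	CPU_ZERO(&one);
	for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
		if (CPU_ISSET(cpu, &before)) {
			CPU_SET(cpu, &one);
			break;
		}
	}

	if (call_with_affinity(&one, count_allowed_cpus, &seen) != 0)
		return -1;
	if (sched_getaffinity(0, sizeof(after), &after) != 0)
		return -1;

	/* seen is 1 when the pinning succeeded; the mask must be restored. */
	return (seen >= 1 && CPU_EQUAL(&before, &after)) ? 0 : -1;
}
```

As in the patch, the restriction is best effort: a failure to pin does
not prevent the callback from running, and the old mask is always put
back afterwards.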

* Re: [PATCH] Run PCI driver initialization on local node
  2005-07-06 13:32 [PATCH] Run PCI driver initialization on local node Andi Kleen
@ 2005-07-06 16:35 ` Christoph Lameter
  2005-07-06 17:56   ` Andi Kleen
  0 siblings, 1 reply; 11+ messages in thread
From: Christoph Lameter @ 2005-07-06 16:35 UTC (permalink / raw)
  To: Andi Kleen; +Cc: akpm, linux-kernel, linux-pci, gregkh

On Wed, 6 Jul 2005, Andi Kleen wrote:

> Instead of adding messy kmalloc_node()s everywhere run the 
> PCI driver probe on the node local to the device.
> Then the normal NUMA aware allocators do the right thing.

That depends on the architecture. Some do round robin allocs for periods 
of time during bootup. I think it is better to explicitly place control 
structures.


* Re: [PATCH] Run PCI driver initialization on local node
  2005-07-06 16:35 ` Christoph Lameter
@ 2005-07-06 17:56   ` Andi Kleen
  2005-07-06 18:01     ` Christoph Lameter
  2005-07-06 19:33     ` Christoph Lameter
  0 siblings, 2 replies; 11+ messages in thread
From: Andi Kleen @ 2005-07-06 17:56 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Andi Kleen, akpm, linux-kernel, linux-pci, gregkh

On Wed, Jul 06, 2005 at 09:35:32AM -0700, Christoph Lameter wrote:
> On Wed, 6 Jul 2005, Andi Kleen wrote:
> 
> > Instead of adding messy kmalloc_node()s everywhere run the 
> > PCI driver probe on the node local to the device.
> > Then the normal NUMA aware allocators do the right thing.
> 
> That depends on the architecture. Some do round robin allocs for periods 
> of time during bootup. I think it is better to explicitly place control 

slab will usually do the right thing because it has a forced
local node policy, but __gfp might not.
I agree it would be better to force the policy.

I kept the scheduling to CPU because slab still needs it.

Updated patch appended.

> structures.

Patching every driver in existence? That sounds like a lot of
work. 

The node local placement should be correct for nearly all drivers. I didn't 
see any other fancy placement in your patches. If a driver still wants to do 
fancy placement it is free to overwrite the policy. Having a good
default is good.

Greg, please consider applying.

-Andi


Run PCI driver initialization on local node

Instead of adding messy kmalloc_node()s everywhere run the 
PCI driver probe on the node local to the device.

This would not have helped for IDE, but should help other,
cleaner drivers that do more initialization in probe().
It won't help drivers that do most of the work
on first open (like many network drivers).

Signed-off-by: Andi Kleen <ak@suse.de> 
cc: christoph@lameter.com

Index: linux/drivers/pci/pci-driver.c
===================================================================
--- linux.orig/drivers/pci/pci-driver.c
+++ linux/drivers/pci/pci-driver.c
@@ -7,6 +7,7 @@
 #include <linux/module.h>
 #include <linux/init.h>
 #include <linux/device.h>
+#include <linux/mempolicy.h>
 #include "pci.h"
 
 /*
@@ -167,6 +168,34 @@ const struct pci_device_id *pci_match_de
 	return NULL;
 }
 
+static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev, 
+			  const struct pci_device_id *id)
+{
+	int error;
+#ifdef CONFIG_NUMA
+	/* Execute driver initialization on node where the 
+	   device's bus is attached to.  This way the driver likely
+	   allocates its local memory on the right node without
+	   any need to change it. */
+	struct mempolicy *oldpol; 
+	cpumask_t oldmask = current->cpus_allowed;
+	int node = pcibus_to_node(dev->bus);
+	if (node >= 0 && node_online(node))
+	    set_cpus_allowed(current, node_to_cpumask(node));	
+	/* And set default memory allocation policy */
+	oldpol = current->mempolicy;
+	current->mempolicy = &default_policy;
+	mpol_get(current->mempolicy);
+#endif
+	error = drv->probe(dev, id);
+#ifdef CONFIG_NUMA
+	set_cpus_allowed(current, oldmask);
+	mpol_free(current->mempolicy);
+	current->mempolicy = oldpol;
+#endif
+	return error;
+}
+
 /**
  * __pci_device_probe()
  * 
@@ -184,7 +213,7 @@ __pci_device_probe(struct pci_driver *dr
 
 		id = pci_match_device(drv, pci_dev);
 		if (id)
-			error = drv->probe(pci_dev, id);
+			error = pci_call_probe(drv, pci_dev, id);
 		if (error >= 0) {
 			pci_dev->driver = drv;
 			error = 0;
Index: linux/mm/mempolicy.c
===================================================================
--- linux.orig/mm/mempolicy.c
+++ linux/mm/mempolicy.c
@@ -88,7 +88,7 @@ static kmem_cache_t *sn_cache;
    policied. */
 static int policy_zone;
 
-static struct mempolicy default_policy = {
+struct mempolicy default_policy = {
 	.refcnt = ATOMIC_INIT(1), /* never free it */
 	.policy = MPOL_DEFAULT,
 };
Index: linux/include/linux/mempolicy.h
===================================================================
--- linux.orig/include/linux/mempolicy.h
+++ linux/include/linux/mempolicy.h
@@ -152,6 +152,7 @@ struct mempolicy *mpol_shared_policy_loo
 
 extern void numa_default_policy(void);
 extern void numa_policy_init(void);
+extern struct mempolicy default_policy;
 
 #else
 






* Re: [PATCH] Run PCI driver initialization on local node
  2005-07-06 17:56   ` Andi Kleen
@ 2005-07-06 18:01     ` Christoph Lameter
  2005-07-06 18:13       ` Andi Kleen
  2005-07-06 19:33     ` Christoph Lameter
  1 sibling, 1 reply; 11+ messages in thread
From: Christoph Lameter @ 2005-07-06 18:01 UTC (permalink / raw)
  To: Andi Kleen; +Cc: akpm, linux-kernel, linux-pci, gregkh

On Wed, 6 Jul 2005, Andi Kleen wrote:

> On Wed, Jul 06, 2005 at 09:35:32AM -0700, Christoph Lameter wrote:
> > On Wed, 6 Jul 2005, Andi Kleen wrote:
> > 
> > > Instead of adding messy kmalloc_node()s everywhere run the 
> > > PCI driver probe on the node local to the device.
> > > Then the normal NUMA aware allocators do the right thing.
> > 
> > That depends on the architecture. Some do round robin allocs for periods 
> > of time during bootup. I think it is better to explicitly place control 
> 
> slab will usually do the right thing because it has a forced
> local node policy, but __gfp might not.

GFP allocs may not do the right thing. If you want to do this then it 
may be best to set the memory policy to restrict allocations to the node 
on which the device resides.

Plus there are CPU-less nodes. What happens to those?

> Patching every driver in existence? That sounds like a lot of
> work. 

No, just patch those that would benefit from it. The existing 
dma_alloc_coherent already takes care of many of the placement 
issues for driver memory. We would probably need to patch more locations 
where higher level control structure allocations are being done.

> The node local placement should be correct for nearly all drivers. I didn't 
> see any other fancy placement in your patches. If a driver still wants to do 
> fancy placement it is free to overwrite the policy. Having a good
> default is good.

Definitely.


* Re: [PATCH] Run PCI driver initialization on local node
  2005-07-06 18:01     ` Christoph Lameter
@ 2005-07-06 18:13       ` Andi Kleen
  2005-07-06 18:28         ` Christoph Lameter
  2005-07-06 19:31         ` Christoph Lameter
  0 siblings, 2 replies; 11+ messages in thread
From: Andi Kleen @ 2005-07-06 18:13 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Andi Kleen, akpm, linux-kernel, linux-pci, gregkh

On Wed, Jul 06, 2005 at 11:01:14AM -0700, Christoph Lameter wrote:
> On Wed, 6 Jul 2005, Andi Kleen wrote:
> 
> > On Wed, Jul 06, 2005 at 09:35:32AM -0700, Christoph Lameter wrote:
> > > On Wed, 6 Jul 2005, Andi Kleen wrote:
> > > 
> > > > Instead of adding messy kmalloc_node()s everywhere run the 
> > > > PCI driver probe on the node local to the device.
> > > > Then the normal NUMA aware allocators do the right thing.
> > > 
> > > That depends on the architecture. Some do round robin allocs for periods 
> > > of time during bootup. I think it is better to explicitly place control 
> > 
> > slab will usually do the right thing because it has a forced
> > local node policy, but __gfp might not.
> 
> GFP allocs may not do the right thing. If you want to do this then it 
> may be best to set the memory policy to restrict allocations to the node 
> on which the device resides.

They will do the right thing. Under memory pressure on the node 
it is better to back off than to fail.

> 
> Plus there are CPU-less nodes. What happens to those?

They are not worse off than they are now.

> 
> > Patching every driver in existence? That sounds like a lot of
> > work. 
> 
> No, just patch those that would benefit from it. The existing 

This would be "all devices that SGI ships on Altixes" ?

IMHO all can benefit.

-Andi


* Re: [PATCH] Run PCI driver initialization on local node
  2005-07-06 18:13       ` Andi Kleen
@ 2005-07-06 18:28         ` Christoph Lameter
  2005-07-06 19:31         ` Christoph Lameter
  1 sibling, 0 replies; 11+ messages in thread
From: Christoph Lameter @ 2005-07-06 18:28 UTC (permalink / raw)
  To: Andi Kleen; +Cc: akpm, linux-kernel, linux-pci, gregkh

On Wed, 6 Jul 2005, Andi Kleen wrote:

> > > Patching every driver in existence? That sounds like a lot of
> > > work. 
> > 
> > No, just patch those that would benefit from it. The existing 
> 
> This would be "all devices that SGI ships on Altixes" ?

Anyone can patch device drivers. High performance drivers suffer the most 
from wrong node placement. These are most likely 10G ethernet, high speed 
SCSI etc.

The main concern at this point is the higher abstraction layers. These 
are generic and if they do the right thing then we have already come a 
long way.

> IMHO all can benefit.

Absolutely. 


* Re: [PATCH] Run PCI driver initialization on local node
  2005-07-06 18:13       ` Andi Kleen
  2005-07-06 18:28         ` Christoph Lameter
@ 2005-07-06 19:31         ` Christoph Lameter
  1 sibling, 0 replies; 11+ messages in thread
From: Christoph Lameter @ 2005-07-06 19:31 UTC (permalink / raw)
  To: Andi Kleen; +Cc: akpm, linux-kernel, linux-pci, gregkh

On Wed, 6 Jul 2005, Andi Kleen wrote:

> > GFP allocs may not do the right thing. If you want to do this then it 
> > may be best to set the memory policy to restrict allocations to the node 
> > on which the device resides.
> 
> They will do the right thing. Under memory pressure on the node 
> it is better to back off than to fail.

Node specific allocs fall back to other nodes and will not fail unless 
there is no memory available.
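The fallback behaviour described here can be pictured with a toy model.
This is only a sketch under stated assumptions, not the kernel's page
allocator: struct toy_numa and toy_alloc_page_node() are hypothetical
names, nodes are tried in plain numeric order rather than in a real
zonelist's distance order, and the caller is assumed to pass a node in
the range 0..MAX_NODES-1.

```c
#include <assert.h>

#define MAX_NODES 4

/* Hypothetical toy state: free pages per node; not a kernel structure. */
struct toy_numa {
	int free_pages[MAX_NODES];
};

/*
 * Take one page, preferring `node` and then walking the remaining nodes
 * in numeric order (a stand-in for a zonelist). Returns the node that
 * supplied the page, or -1 only when every node is empty.
 */
static int toy_alloc_page_node(struct toy_numa *n, int node)
{
	int i;

	for (i = 0; i < MAX_NODES; i++) {
		int candidate = (node + i) % MAX_NODES;

		if (n->free_pages[candidate] > 0) {
			n->free_pages[candidate]--;
			return candidate;
		}
	}
	return -1;
}
```

In this model a node-specific request is a preference, not a hard
restriction: it only fails once no node has a free page left.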



* Re: [PATCH] Run PCI driver initialization on local node
  2005-07-06 17:56   ` Andi Kleen
  2005-07-06 18:01     ` Christoph Lameter
@ 2005-07-06 19:33     ` Christoph Lameter
  2005-07-07 10:39       ` Andi Kleen
  1 sibling, 1 reply; 11+ messages in thread
From: Christoph Lameter @ 2005-07-06 19:33 UTC (permalink / raw)
  To: Andi Kleen; +Cc: akpm, linux-kernel, linux-pci, gregkh

On Wed, 6 Jul 2005, Andi Kleen wrote:

> > That depends on the architecture. Some do round robin allocs for periods 
> > of time during bootup. I think it is better to explicitly place control 
> 
> slab will usually do the right thing because it has a forced
> local node policy, but __gfp might not.

The slab allocator will do the right thing with the numa slab allocator in 
Andrew's tree but not with the one in Linus' tree. The one in Linus' tree
will just pick up whatever slab is available regardless of the node.
Only kmalloc_node will make a reasonable attempt to locate the memory on 
a specific node.



* Re: [PATCH] Run PCI driver initialization on local node
  2005-07-06 19:33     ` Christoph Lameter
@ 2005-07-07 10:39       ` Andi Kleen
  2005-07-07 13:52         ` Christoph Lameter
  0 siblings, 1 reply; 11+ messages in thread
From: Andi Kleen @ 2005-07-07 10:39 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Andi Kleen, akpm, linux-kernel, linux-pci, gregkh

On Wed, Jul 06, 2005 at 12:33:51PM -0700, Christoph Lameter wrote:
> On Wed, 6 Jul 2005, Andi Kleen wrote:
> 
> > > That depends on the architecture. Some do round robin allocs for periods 
> > > of time during bootup. I think it is better to explicitly place control 
> > 
> > slab will usually do the right thing because it has a forced
> > local node policy, but __gfp might not.
> 
> The slab allocator will do the right thing with the numa slab allocator in 
> Andrew's tree but not with the one in Linus' tree. The one in Linus' tree
> will just pick up whatever slab is available regardless of the node.

It should usually do the right thing because it
runs on the correct CPUs. The only case that doesn't work 
is freeing on different CPUs than it was allocated, but hopefully
that is not too common during system startup.

And then at some point NUMA aware slab will make it into mainline
I guess.

> Only kmalloc_node will make a reasonable attempt to locate the memory on 
> a specific node.

You forgot __get_free_pages.

-Andi


* Re: [PATCH] Run PCI driver initialization on local node
  2005-07-07 10:39       ` Andi Kleen
@ 2005-07-07 13:52         ` Christoph Lameter
  2005-07-07 14:13           ` Andi Kleen
  0 siblings, 1 reply; 11+ messages in thread
From: Christoph Lameter @ 2005-07-07 13:52 UTC (permalink / raw)
  To: Andi Kleen; +Cc: akpm, linux-kernel, linux-pci, gregkh

On Thu, 7 Jul 2005, Andi Kleen wrote:

> > The slab allocator will do the right thing with the numa slab allocator in 
> > Andrew's tree but not with the one in Linus' tree. The one in Linus' tree
> > will just pick up whatever slab is available regardless of the node.
> 
> It should usually do the right thing because it
> runs on the correct CPUs. The only case that doesn't work 
> is freeing on different CPUs than it was allocated, but hopefully
> that is not too common during system startup.

The current slab won't do that unless you allocate enough entries so 
that a new page is retrieved from the page allocator. Then you may have 
local memory from the slab (if the memory policy is not on round robin).

If you allocate some slab entries on one node then you typically have a 
partially used page from that node. If you then switch to a different 
processor on a different node and then use the slab allocator to get an 
entry for that slab then that partially used page will be used! The
slab allocator will return an entry from the *prior* node.
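The partial-page effect described above can be modelled in a few lines.
This is a deliberately simplified sketch, not the real slab code: struct
toy_cache and toy_cache_alloc() are hypothetical, the cache holds a
single shared partial page, and per-CPU object caches are ignored.

```c
#include <assert.h>

#define OBJS_PER_PAGE 4

/* Hypothetical toy cache with a single shared partial page. */
struct toy_cache {
	int partial_node;	/* node the partial page came from, -1 if none */
	int partial_free;	/* objects still free on that page */
};

/*
 * Allocate one object on behalf of a CPU on `cpu_node`. Returns the node
 * the object's memory lives on. The partial page is consumed first,
 * whatever cpu_node is; only when it is used up does a fresh page come
 * from the caller's node.
 */
static int toy_cache_alloc(struct toy_cache *c, int cpu_node)
{
	if (c->partial_free == 0) {
		/* Fresh page from the page allocator: local to the caller. */
		c->partial_node = cpu_node;
		c->partial_free = OBJS_PER_PAGE;
	}
	c->partial_free--;
	return c->partial_node;
}
```

After one allocation on node 0, the next three allocations made from
node 1 still return node 0 memory in this model; node 1 memory only
appears once the old page is exhausted and a new page is fetched.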

> And then at some point NUMA aware slab will make it into mainline
> I guess.

Hopefully.

> > Only kmalloc_node will make a reasonable attempt to locate the memory on 
> > a specific node.
> 
> You forgot __get_free_pages.

The slab allocator uses alloc_pages and alloc_pages_node


* Re: [PATCH] Run PCI driver initialization on local node
  2005-07-07 13:52         ` Christoph Lameter
@ 2005-07-07 14:13           ` Andi Kleen
  0 siblings, 0 replies; 11+ messages in thread
From: Andi Kleen @ 2005-07-07 14:13 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Andi Kleen, akpm, linux-kernel, linux-pci, gregkh

> > > Only kmalloc_node will make a reasonable attempt to locate the memory on 
> > > a specific node.
> > 
> > You forgot __get_free_pages.
> 
> The slab allocator uses alloc_pages and alloc_pages_node

alloc_pages is NUMA aware. __get_free_pages is just a wrapper for it.

-Andi
