public inbox for linux-kernel@vger.kernel.org
* [PATCH]  pci: derive nearby CPUs from device's instead of bus' NUMA information
@ 2009-04-17 10:01 Andreas Herrmann
  2009-04-17 16:21 ` Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: Andreas Herrmann @ 2009-04-17 10:01 UTC (permalink / raw)
  To: jbarnes; +Cc: Ingo Molnar, H. Peter Anvin, linux-kernel

For AMD CPU northbridge functions, the device's NUMA information might
differ from that of the bus.

Here is an example from a 4-socket system.

Currently Linux shows

  root@hagen:/sys/devices/pci0000:00/0000:00:1a.4# cat numa_node
  0
  root@hagen:/sys/devices/pci0000:00/0000:00:1a.4# cat local_cpu*
  0-3
  00000000,0000000f

which is not correct for northbridge functions as the local CPUs
are those of the same socket.

With this patch and a quirk for AMD CPU NB functions Linux can
do better and correctly show

  root@hagen:/sys/devices/pci0000:00/0000:00:1a.4# cat numa_node
  2
  root@hagen:/sys/devices/pci0000:00/0000:00:1a.4# cat local_cpu*
  8-11
  00000000,00000f00
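
The two local_cpu* files show the same set of CPUs in two notations: a
range list (8-11) and a hex bitmask printed as 32-bit words, most
significant word first. A minimal userspace sketch of that correspondence
follows. The helper names are hypothetical and the 64-CPU width is an
assumption taken from the two-word output above; this is not the kernel's
cpumask_scnprintf().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: set bits first..last (inclusive) in a 64-bit mask. */
static uint64_t cpu_range_to_mask(unsigned int first, unsigned int last)
{
	uint64_t mask = 0;
	unsigned int cpu;

	assert(first <= last && last < 64);
	for (cpu = first; cpu <= last; cpu++)
		mask |= (uint64_t)1 << cpu;
	return mask;
}

/*
 * Hypothetical helper: print the mask as two comma-separated 32-bit hex
 * words, high word first, matching the layout shown in the example.
 * Returns the number of characters snprintf() would have written.
 */
static int format_cpu_mask(char *buf, size_t size, uint64_t mask)
{
	assert(buf != NULL);
	return snprintf(buf, size, "%08x,%08x",
			(unsigned int)(mask >> 32),
			(unsigned int)(mask & 0xffffffffu));
}
```

Calling format_cpu_mask() on cpu_range_to_mask(8, 11) gives the string
shown for node 2 above; the range 0-3 gives the one shown for node 0.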

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
---
 drivers/pci/pci-sysfs.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

The quirk for AMD CPU NB functions is contained in another patch
that I'll send to x86-maintainers for inclusion into tip tree.

Please apply.

Thanks,
Andreas

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index a7eb1b4..9360f3d 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -74,7 +74,11 @@ static ssize_t local_cpus_show(struct device *dev,
 	const struct cpumask *mask;
 	int len;
 
+#ifdef CONFIG_NUMA
+	mask = cpumask_of_node(dev_to_node(dev));
+#else
 	mask = cpumask_of_pcibus(to_pci_dev(dev)->bus);
+#endif
 	len = cpumask_scnprintf(buf, PAGE_SIZE-2, mask);
 	buf[len++] = '\n';
 	buf[len] = '\0';
@@ -88,7 +92,11 @@ static ssize_t local_cpulist_show(struct device *dev,
 	const struct cpumask *mask;
 	int len;
 
+#ifdef CONFIG_NUMA
+	mask = cpumask_of_node(dev_to_node(dev));
+#else
 	mask = cpumask_of_pcibus(to_pci_dev(dev)->bus);
+#endif
 	len = cpulist_scnprintf(buf, PAGE_SIZE-2, mask);
 	buf[len++] = '\n';
 	buf[len] = '\0';
-- 
1.6.2





* Re: [PATCH]  pci: derive nearby CPUs from device's instead of bus' NUMA information
  2009-04-17 10:01 [PATCH] pci: derive nearby CPUs from device's instead of bus' NUMA information Andreas Herrmann
@ 2009-04-17 16:21 ` Ingo Molnar
  2009-04-17 19:26   ` Yinghai Lu
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2009-04-17 16:21 UTC (permalink / raw)
  To: Andreas Herrmann; +Cc: jbarnes, H. Peter Anvin, linux-kernel


* Andreas Herrmann <andreas.herrmann3@amd.com> wrote:

> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index a7eb1b4..9360f3d 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -74,7 +74,11 @@ static ssize_t local_cpus_show(struct device *dev,
>  	const struct cpumask *mask;
>  	int len;
>  
> +#ifdef CONFIG_NUMA
> +	mask = cpumask_of_node(dev_to_node(dev));
> +#else
>  	mask = cpumask_of_pcibus(to_pci_dev(dev)->bus);
> +#endif
>  	len = cpumask_scnprintf(buf, PAGE_SIZE-2, mask);
>  	buf[len++] = '\n';
>  	buf[len] = '\0';
> @@ -88,7 +92,11 @@ static ssize_t local_cpulist_show(struct device *dev,
>  	const struct cpumask *mask;
>  	int len;
>  
> +#ifdef CONFIG_NUMA
> +	mask = cpumask_of_node(dev_to_node(dev));
> +#else
>  	mask = cpumask_of_pcibus(to_pci_dev(dev)->bus);
> +#endif

No objections against the change (at all), but this pattern cries 
out for a different, cleaner solution.

Shouldn't there be a cpumask_of_pcidev(dev) helper instead, which
[recognizing that most PCI devices don't get their node info
initialized in practice] would do something like:

const struct cpumask * cpumask_of_pcidev(struct pci_dev *dev)
{
	if (dev->numa_node == -1)
		return cpumask_of_pcibus(to_pci_dev(dev)->bus);

	return cpumask_of_node(dev_to_node(dev));
}

? This would work fine in all cases.

Which you could thus use in both cases above, cleanly.
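
The fallback logic of the proposed helper can be tried out in userspace
with mock types. Everything below is a sketch under stated assumptions:
the mock_* names, the 64-bit mask type and the per-node CPU table are
hypothetical stand-ins (nodes 0 and 2 follow the 4-socket example from
the patch description, nodes 1 and 3 are assumed). In the kernel itself
the node is stored in the struct device embedded in struct pci_dev, so a
real helper would read it via dev_to_node(&pdev->dev) and use pdev->bus
directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_NO_NODE	(-1)
#define MOCK_MAX_NODES	4

typedef uint64_t mock_cpumask_t;	/* stand-in for struct cpumask */

struct mock_pci_bus {
	mock_cpumask_t cpus;		/* CPUs considered local to the bus */
};

struct mock_pci_dev {
	int numa_node;			/* MOCK_NO_NODE if never assigned */
	const struct mock_pci_bus *bus;
};

/* Assumed topology: 4 nodes with 4 CPUs each. */
static const mock_cpumask_t mock_node_cpus[MOCK_MAX_NODES] = {
	0x000f, 0x00f0, 0x0f00, 0xf000,
};

/* Prefer the device's own node; fall back to the bus if it has none. */
static mock_cpumask_t mock_cpumask_of_pcidev(const struct mock_pci_dev *pdev)
{
	assert(pdev != NULL && pdev->bus != NULL);

	if (pdev->numa_node == MOCK_NO_NODE)
		return pdev->bus->cpus;

	assert(pdev->numa_node >= 0 && pdev->numa_node < MOCK_MAX_NODES);
	return mock_node_cpus[pdev->numa_node];
}
```

Both sysfs show routines could then call the one helper, with no #ifdef
at the call sites.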

	Ingo


* Re: [PATCH] pci: derive nearby CPUs from device's instead of bus'  NUMA information
  2009-04-17 16:21 ` Ingo Molnar
@ 2009-04-17 19:26   ` Yinghai Lu
  2009-04-20  8:47     ` Andreas Herrmann
  0 siblings, 1 reply; 10+ messages in thread
From: Yinghai Lu @ 2009-04-17 19:26 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andreas Herrmann, jbarnes, H. Peter Anvin, linux-kernel

On Fri, Apr 17, 2009 at 9:21 AM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Andreas Herrmann <andreas.herrmann3@amd.com> wrote:
>
>> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
>> index a7eb1b4..9360f3d 100644
>> --- a/drivers/pci/pci-sysfs.c
>> +++ b/drivers/pci/pci-sysfs.c
>> @@ -74,7 +74,11 @@ static ssize_t local_cpus_show(struct device *dev,
>>       const struct cpumask *mask;
>>       int len;
>>
>> +#ifdef CONFIG_NUMA
>> +     mask = cpumask_of_node(dev_to_node(dev));
>> +#else
>>       mask = cpumask_of_pcibus(to_pci_dev(dev)->bus);
>> +#endif
>>       len = cpumask_scnprintf(buf, PAGE_SIZE-2, mask);
>>       buf[len++] = '\n';
>>       buf[len] = '\0';
>> @@ -88,7 +92,11 @@ static ssize_t local_cpulist_show(struct device *dev,
>>       const struct cpumask *mask;
>>       int len;
>>
>> +#ifdef CONFIG_NUMA
>> +     mask = cpumask_of_node(dev_to_node(dev));
>> +#else
>>       mask = cpumask_of_pcibus(to_pci_dev(dev)->bus);
>> +#endif
>
> No objections against the change (at all), but this pattern cries
> out for a different, cleaner solution.
>
> Shouldnt there be a cpumask_of_pcidev(dev) helper instead, which
> [recognizing that most PCI devices dont get their node info
> initialized in practice] would do something like:
>
> const struct cpumask * cpumask_of_pcidev(struct pci_dev *dev)
> {
>        if (dev->numa_node == -1)
>                return cpumask_of_pcibus(to_pci_dev(dev)->bus);
>
>        return cpumask_of_node(dev_to_node(dev));
> }
>
> ? This would work fine in all cases.

You are right, dev_to_node(dev) could return -1 on 64-bit if there is
no memory on that node.

YH


* Re: [PATCH] pci: derive nearby CPUs from device's instead of bus' NUMA information
  2009-04-17 19:26   ` Yinghai Lu
@ 2009-04-20  8:47     ` Andreas Herrmann
  2009-04-20 20:03       ` Jesse Barnes
  2009-04-20 21:23       ` Yinghai Lu
  0 siblings, 2 replies; 10+ messages in thread
From: Andreas Herrmann @ 2009-04-20  8:47 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Ingo Molnar, jbarnes, H. Peter Anvin, linux-kernel

On Fri, Apr 17, 2009 at 12:26:54PM -0700, Yinghai Lu wrote:
> On Fri, Apr 17, 2009 at 9:21 AM, Ingo Molnar <mingo@elte.hu> wrote:
> > const struct cpumask * cpumask_of_pcidev(struct pci_dev *dev)
> > {
> >        if (dev->numa_node == -1)
> >                return cpumask_of_pcibus(to_pci_dev(dev)->bus);
> >
> >        return cpumask_of_node(dev_to_node(dev));
> > }
> >
> > ? This would work fine in all cases.

Yes, I think so. That's the general solution w/o additional
"ifdefing".

> you are right, dev_to_node(dev) could return -1 on 64bit, if there is
> no memory on that node.

Hmm, I thought -1 is returned only in the CONFIG_NUMA=n case.

During initialization the struct device's numa_node is set to -1 and
later on the information is inherited from the parent numa_node.

So what am I missing?
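
The initialization and inheritance described above can be modelled in a
few lines. This is a minimal userspace sketch with hypothetical mock_*
names, not the driver core code: a device starts with node -1 and takes
over whatever value its parent holds when it is added, so the result
depends entirely on the parent.

```c
#include <assert.h>
#include <stddef.h>

struct mock_device {
	struct mock_device *parent;
	int numa_node;
};

/* A freshly initialized device has no node assigned. */
static void mock_device_initialize(struct mock_device *dev)
{
	assert(dev != NULL);
	dev->parent = NULL;
	dev->numa_node = -1;
}

/* On registration the child copies the parent's node, whatever it is. */
static void mock_device_add(struct mock_device *dev,
			    struct mock_device *parent)
{
	assert(dev != NULL);
	dev->parent = parent;
	if (parent != NULL)
		dev->numa_node = parent->numa_node;
}
```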


Thanks,
Andreas

-- 
Operating | Advanced Micro Devices GmbH
  System  | Karl-Hammerschmidt-Str. 34, 85609 Dornach b. München, Germany
 Research | Geschäftsführer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni
  Center  | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
  (OSRC)  | Registergericht München, HRB Nr. 43632




* Re: [PATCH] pci: derive nearby CPUs from device's instead of bus' NUMA information
  2009-04-20  8:47     ` Andreas Herrmann
@ 2009-04-20 20:03       ` Jesse Barnes
  2009-05-07  8:51         ` Andreas Herrmann
  2009-04-20 21:23       ` Yinghai Lu
  1 sibling, 1 reply; 10+ messages in thread
From: Jesse Barnes @ 2009-04-20 20:03 UTC (permalink / raw)
  To: Andreas Herrmann; +Cc: Yinghai Lu, Ingo Molnar, H. Peter Anvin, linux-kernel

On Mon, 20 Apr 2009 10:47:47 +0200
Andreas Herrmann <andreas.herrmann3@amd.com> wrote:

> On Fri, Apr 17, 2009 at 12:26:54PM -0700, Yinghai Lu wrote:
> > On Fri, Apr 17, 2009 at 9:21 AM, Ingo Molnar <mingo@elte.hu> wrote:
> > > const struct cpumask * cpumask_of_pcidev(struct pci_dev *dev)
> > > {
> > >        if (dev->numa_node == -1)
> > >                return cpumask_of_pcibus(to_pci_dev(dev)->bus);
> > >
> > >        return cpumask_of_node(dev_to_node(dev));
> > > }
> > >
> > > ? This would work fine in all cases.
> 
> Yes, I think so. That's the general solution w/o additional
> "ifdefing".
> 
> > you are right, dev_to_node(dev) could return -1 on 64bit, if there
> > is no memory on that node.
> 
> Hmm, I thought just in the CONFIG_NUMA=n case -1 is returned.
> 
> During initialization the struct device's numa_node is set to -1 and
> later on the information is inherited from the parent numa_node.
> 
> So what do I miss?

I like the idea of cpumask_of_pcidev(), but it seems like
cpumask_of_pcibus should return the same value.  So if the node is
unassigned or "equidistant" (there's code that treats -1 as both I
think), cpumask_of_pcibus should figure out what the nearest CPUs are
and return that, right?

-- 
Jesse Barnes, Intel Open Source Technology Center


* Re: [PATCH] pci: derive nearby CPUs from device's instead of bus'  NUMA information
  2009-04-20  8:47     ` Andreas Herrmann
  2009-04-20 20:03       ` Jesse Barnes
@ 2009-04-20 21:23       ` Yinghai Lu
  2009-04-21 18:05         ` Andreas Herrmann
  1 sibling, 1 reply; 10+ messages in thread
From: Yinghai Lu @ 2009-04-20 21:23 UTC (permalink / raw)
  To: Andreas Herrmann; +Cc: Ingo Molnar, jbarnes, H. Peter Anvin, linux-kernel

On Mon, Apr 20, 2009 at 1:47 AM, Andreas Herrmann
<andreas.herrmann3@amd.com> wrote:
> On Fri, Apr 17, 2009 at 12:26:54PM -0700, Yinghai Lu wrote:
>> On Fri, Apr 17, 2009 at 9:21 AM, Ingo Molnar <mingo@elte.hu> wrote:
>> > const struct cpumask * cpumask_of_pcidev(struct pci_dev *dev)
>> > {
>> >        if (dev->numa_node == -1)
>> >                return cpumask_of_pcibus(to_pci_dev(dev)->bus);
>> >
>> >        return cpumask_of_node(dev_to_node(dev));
>> > }
>> >
>> > ? This would work fine in all cases.
>
> Yes, I think so. That's the general solution w/o additional
> "ifdefing".
>
>> you are right, dev_to_node(dev) could return -1 on 64bit, if there is
>> no memory on that node.
>
> Hmm, I thought just in the CONFIG_NUMA=n case -1 is returned.
>
> During initialization the struct device's numa_node is set to -1 and
> later on the information is inherited from the parent numa_node.
>
The parent's numa_node could be -1 too.

In amd_bus.c:
int get_mp_bus_to_node(int busnum)
{
        int node = -1;

        if (busnum < 0 || busnum > (BUS_NR - 1))
                return node;

        node = mp_bus_to_node[busnum];

        /*
         * let numa_node_id to decide it later in dma_alloc_pages
         * if there is no ram on that node
         */
        if (node != -1 && !node_online(node))
                node = -1;

        return node;
}
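
To see the three outcomes of that function without booting a kernel, the
same control flow can be run against mock tables. This is a sketch: the
mock_* arrays, MOCK_BUS_NR and MOCK_MAX_NODES are hypothetical stand-ins
for mp_bus_to_node[], BUS_NR and node_online(), and their sizes are
assumptions.

```c
#include <assert.h>
#include <stdbool.h>

#define MOCK_BUS_NR	256
#define MOCK_MAX_NODES	8

static int mock_bus_to_node[MOCK_BUS_NR];
static bool mock_node_online[MOCK_MAX_NODES];

/* Mark every bus as unassigned and every node as offline. */
static void mock_reset(void)
{
	int i;

	for (i = 0; i < MOCK_BUS_NR; i++)
		mock_bus_to_node[i] = -1;
	for (i = 0; i < MOCK_MAX_NODES; i++)
		mock_node_online[i] = false;
}

/* Same decisions as the quoted function, on the mock tables. */
static int mock_get_mp_bus_to_node(int busnum)
{
	int node;

	if (busnum < 0 || busnum > MOCK_BUS_NR - 1)
		return -1;		/* bus number out of range */

	node = mock_bus_to_node[busnum];
	assert(node >= -1 && node < MOCK_MAX_NODES);

	/* An assigned but offline node is reported as "no node". */
	if (node != -1 && !mock_node_online[node])
		node = -1;

	return node;
}
```

A bus therefore reports -1 when it is out of range, was never assigned a
node, or points at a node that is not online; only the last case is
specific to this function.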


YH


* Re: [PATCH] pci: derive nearby CPUs from device's instead of bus' NUMA information
  2009-04-20 21:23       ` Yinghai Lu
@ 2009-04-21 18:05         ` Andreas Herrmann
  0 siblings, 0 replies; 10+ messages in thread
From: Andreas Herrmann @ 2009-04-21 18:05 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Ingo Molnar, jbarnes, H. Peter Anvin, linux-kernel

On Mon, Apr 20, 2009 at 02:23:47PM -0700, Yinghai Lu wrote:
> On Mon, Apr 20, 2009 at 1:47 AM, Andreas Herrmann
> <andreas.herrmann3@amd.com> wrote:
> > During initialization the struct device's numa_node is set to -1 and
> > later on the information is inherited from the parent numa_node.
> >
> parent numa_node could be -1 too.
> 
> in amd_bus.c
> int get_mp_bus_to_node(int busnum)
> {
>         int node = -1;
> 
>         if (busnum < 0 || busnum > (BUS_NR - 1))
>                 return node;
> 
>         node = mp_bus_to_node[busnum];
> 
>         /*
>          * let numa_node_id to decide it later in dma_alloc_pages
>          * if there is no ram on that node
>          */
>         if (node != -1 && !node_online(node))
>                 node = -1;
> 
>         return node;
> }

Ok, I see.
Thanks,

Andreas




* Re: [PATCH] pci: derive nearby CPUs from device's instead of bus' NUMA information
  2009-04-20 20:03       ` Jesse Barnes
@ 2009-05-07  8:51         ` Andreas Herrmann
  2009-05-11 21:54           ` Jesse Barnes
  0 siblings, 1 reply; 10+ messages in thread
From: Andreas Herrmann @ 2009-05-07  8:51 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Yinghai Lu, Ingo Molnar, H. Peter Anvin, linux-kernel

On Mon, Apr 20, 2009 at 01:03:41PM -0700, Jesse Barnes wrote:
> On Mon, 20 Apr 2009 10:47:47 +0200
> Andreas Herrmann <andreas.herrmann3@amd.com> wrote:
> 
> > On Fri, Apr 17, 2009 at 12:26:54PM -0700, Yinghai Lu wrote:
> > > On Fri, Apr 17, 2009 at 9:21 AM, Ingo Molnar <mingo@elte.hu> wrote:
> > > > const struct cpumask * cpumask_of_pcidev(struct pci_dev *dev)
> > > > {
> > > >        if (dev->numa_node == -1)
> > > >                return cpumask_of_pcibus(to_pci_dev(dev)->bus);
> > > >
> > > >        return cpumask_of_node(dev_to_node(dev));
> > > > }
> > > >
> > > > ? This would work fine in all cases.
> > 
> > Yes, I think so. That's the general solution w/o additional
> > "ifdefing".
> > 
> > > you are right, dev_to_node(dev) could return -1 on 64bit, if there
> > > is no memory on that node.
> > 
> > Hmm, I thought just in the CONFIG_NUMA=n case -1 is returned.
> > 
> > During initialization the struct device's numa_node is set to -1 and
> > later on the information is inherited from the parent numa_node.
> > 
> > So what do I miss?
> 
> I like the idea of cpumask_of_pcidev(), but it seems like
> cpumask_of_pcibus should return the same value.  So if the node is
> unassigned or "equadistant" (there's code that treats -1 as both I
> think), cpumask_of_pcibus should figure out what the nearest CPUs are
> and return that, right?

Usually this is true.

But there is one special case.

Northbridge functions of AMD CPUs appear on bus 0, devices 24-31
(each having 4 or 5 functions depending on the CPU family).

Requests to those devices (e.g. reading config space) are handled by
the processor(s) themselves and aren't routed to the PCI bus.
At most such requests are routed to another processor (node) if the
request is for a northbridge function of a different processor.

See 9b94b3a19b13e094c10f65f24bc358f6ffe4eacd for some additional info.

That is why I think that using cpumask_of_pcidev should have
precedence over cpumask_of_pcibus. (numa_node information of a PCI
device can be fixed up and then differ from node information of the
PCI bus.)
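
As an illustration of why the device and bus nodes can differ, here is a
sketch that maps a northbridge function's PCI address to a node. The rule
"node = device number - 24" is an assumption for illustration only; it is
consistent with the example in the patch description (0000:00:1a.4, i.e.
device 26, shown on node 2), but the actual quirk lives in a separate
patch and may determine the node differently. All mock_* names are
hypothetical; the devfn encoding (device in bits 7:3, function in bits
2:0) is the standard PCI one.

```c
#include <assert.h>

#define MOCK_PCI_DEVFN(slot, func)	((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define MOCK_PCI_SLOT(devfn)		(((devfn) >> 3) & 0x1f)

#define MOCK_AMD_NB_FIRST_SLOT	24
#define MOCK_AMD_NB_LAST_SLOT	31

/*
 * Return the assumed node for an AMD northbridge function, or -1 if the
 * address is not in the northbridge range on bus 0.
 */
static int mock_amd_nb_node(unsigned int bus, unsigned int devfn)
{
	unsigned int slot;

	assert(devfn <= 0xff);
	slot = MOCK_PCI_SLOT(devfn);

	if (bus != 0 || slot < MOCK_AMD_NB_FIRST_SLOT ||
	    slot > MOCK_AMD_NB_LAST_SLOT)
		return -1;

	return (int)(slot - MOCK_AMD_NB_FIRST_SLOT);
}
```

All eight devices sit on bus 0, so a bus-derived mask reports the same
CPUs for each of them, while a per-device node distinguishes them.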


Regards,
Andreas

-- 
Operating | Advanced Micro Devices GmbH
  System  | Karl-Hammerschmidt-Str. 34, 85609 Dornach b. München, Germany
 Research | Geschäftsführer: Thomas M. McCoy, Giuliano Meroni
  Center  | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
  (OSRC)  | Registergericht München, HRB Nr. 43632




* Re: [PATCH] pci: derive nearby CPUs from device's instead of bus' NUMA information
  2009-05-07  8:51         ` Andreas Herrmann
@ 2009-05-11 21:54           ` Jesse Barnes
  2009-06-09  5:47             ` Andreas Herrmann
  0 siblings, 1 reply; 10+ messages in thread
From: Jesse Barnes @ 2009-05-11 21:54 UTC (permalink / raw)
  To: Andreas Herrmann; +Cc: Yinghai Lu, Ingo Molnar, H. Peter Anvin, linux-kernel

On Thu, 7 May 2009 10:51:36 +0200
Andreas Herrmann <andreas.herrmann3@amd.com> wrote:

> On Mon, Apr 20, 2009 at 01:03:41PM -0700, Jesse Barnes wrote:
> > On Mon, 20 Apr 2009 10:47:47 +0200
> > Andreas Herrmann <andreas.herrmann3@amd.com> wrote:
> > 
> > > On Fri, Apr 17, 2009 at 12:26:54PM -0700, Yinghai Lu wrote:
> > > > On Fri, Apr 17, 2009 at 9:21 AM, Ingo Molnar <mingo@elte.hu>
> > > > wrote:
> > > > > const struct cpumask * cpumask_of_pcidev(struct pci_dev *dev)
> > > > > {
> > > > >        if (dev->numa_node == -1)
> > > > >                return cpumask_of_pcibus(to_pci_dev(dev)->bus);
> > > > >
> > > > >        return cpumask_of_node(dev_to_node(dev));
> > > > > }
> > > > >
> > > > > ? This would work fine in all cases.
> > > 
> > > Yes, I think so. That's the general solution w/o additional
> > > "ifdefing".
> > > 
> > > > you are right, dev_to_node(dev) could return -1 on 64bit, if
> > > > there is no memory on that node.
> > > 
> > > Hmm, I thought just in the CONFIG_NUMA=n case -1 is returned.
> > > 
> > > During initialization the struct device's numa_node is set to -1
> > > and later on the information is inherited from the parent
> > > numa_node.
> > > 
> > > So what do I miss?
> > 
> > I like the idea of cpumask_of_pcidev(), but it seems like
> > cpumask_of_pcibus should return the same value.  So if the node is
> > unassigned or "equadistant" (there's code that treats -1 as both I
> > think), cpumask_of_pcibus should figure out what the nearest CPUs
> > are and return that, right?
> 
> Usually this is true.
> 
> But there is one special case.
> 
> Northbridge functions of AMD CPUs appear to be on bus 0 device 24-31
> (each having 4 or 5 functions depending on the CPU family).
> 
> Requests to those devices (e.g. reading config space) are handled by
> the processor(s) themselves and aren't routed to the PCI bus.
> At most such requests are routed to another processor (node) if the
> request is for a northbridge function of a different processor.
> 
> See 9b94b3a19b13e094c10f65f24bc358f6ffe4eacd for some additional info.
> 
> That is why I think that using cpumask_of_pcidev should have
> precedence over cpumask_of_pcibus. (numa_node information of a PCI
> device can be fixed up and then differ from node information of the
> PCI bus .)

So we're making the generic code more confusing to handle an AMD
special case?  Are the functions you mention likely to have drivers
that allocate memory or need cpumask_of_pcibus info?  I guess there are
no nice solutions given the above split of the device across busses (in
a logical sense), so the cleanups Ingo suggested may be the best we can
do.

-- 
Jesse Barnes, Intel Open Source Technology Center


* Re: [PATCH] pci: derive nearby CPUs from device's instead of bus' NUMA information
  2009-05-11 21:54           ` Jesse Barnes
@ 2009-06-09  5:47             ` Andreas Herrmann
  0 siblings, 0 replies; 10+ messages in thread
From: Andreas Herrmann @ 2009-06-09  5:47 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Yinghai Lu, Ingo Molnar, H. Peter Anvin, linux-kernel

On Mon, May 11, 2009 at 02:54:23PM -0700, Jesse Barnes wrote:
> On Thu, 7 May 2009 10:51:36 +0200
> Andreas Herrmann <andreas.herrmann3@amd.com> wrote:
> > On Mon, Apr 20, 2009 at 01:03:41PM -0700, Jesse Barnes wrote:
> > > On Mon, 20 Apr 2009 10:47:47 +0200
> > > Andreas Herrmann <andreas.herrmann3@amd.com> wrote:
> > > > On Fri, Apr 17, 2009 at 12:26:54PM -0700, Yinghai Lu wrote:
> > > > > On Fri, Apr 17, 2009 at 9:21 AM, Ingo Molnar <mingo@elte.hu>
> > > > > wrote:
> > > > > > const struct cpumask * cpumask_of_pcidev(struct pci_dev *dev)
> > > > > > {
> > > > > >        if (dev->numa_node == -1)
> > > > > >                return cpumask_of_pcibus(to_pci_dev(dev)->bus);
> > > > > >
> > > > > >        return cpumask_of_node(dev_to_node(dev));
> > > > > > }
> > > > > >
> > > > > > ? This would work fine in all cases.
> > > > 
> > > > Yes, I think so. That's the general solution w/o additional
> > > > "ifdefing".
> > > > 
> > > > > you are right, dev_to_node(dev) could return -1 on 64bit, if
> > > > > there is no memory on that node.
> > > > 
> > > > Hmm, I thought just in the CONFIG_NUMA=n case -1 is returned.
> > > > 
> > > > During initialization the struct device's numa_node is set to -1
> > > > and later on the information is inherited from the parent
> > > > numa_node.
> > > > 
> > > > So what do I miss?
> > > 
> > > I like the idea of cpumask_of_pcidev(), but it seems like
> > > cpumask_of_pcibus should return the same value.  So if the node is
> > > unassigned or "equadistant" (there's code that treats -1 as both I
> > > think), cpumask_of_pcibus should figure out what the nearest CPUs
> > > are and return that, right?
> > 
> > Usually this is true.
> > 
> > But there is one special case.
> > 
> > Northbridge functions of AMD CPUs appear to be on bus 0 device 24-31
> > (each having 4 or 5 functions depending on the CPU family).
> > 
> > Requests to those devices (e.g. reading config space) are handled by
> > the processor(s) themselves and aren't routed to the PCI bus.
> > At most such requests are routed to another processor (node) if the
> > request is for a northbridge function of a different processor.
> > 
> > See 9b94b3a19b13e094c10f65f24bc358f6ffe4eacd for some additional info.
> > 
> > That is why I think that using cpumask_of_pcidev should have
> > precedence over cpumask_of_pcibus. (numa_node information of a PCI
> > device can be fixed up and then differ from node information of the
> > PCI bus .)
> 
> So we're making the generic code more confusing to handle an AMD
> special case?

Yes.

> Are the functions you mention likely to have drivers
> that allocate memory or need cpumask_of_pcibus info?

Rarely, or rather, not at the moment.

> I guess there are no nice solutions given the above split of the
> device across busses (in a logical sense), so the cleanups Ingo
> suggested may be the best we can do.

Yes, I think so.


Regards,
Andreas


