[PATCH 00/11] x86: cpumask: some more cpumask cleanups

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 00/11] x86: cpumask: some more cpumask cleanups
@ 2009-01-04 13:17 Mike Travis
  2009-01-04 13:18 ` [PATCH 01/11] [PATCH] ia64: cpumask fix for is_affinity_mask_valid() Mike Travis
                   ` (11 more replies)
  0 siblings, 12 replies; 24+ messages in thread
From: Mike Travis @ 2009-01-04 13:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rusty Russell, H. Peter Anvin, Thomas Gleixner, Linus Torvalds,
	Jack Steiner, linux-kernel


Here's some more cpumask cleanups.

    ia64: cpumask fix for is_affinity_mask_valid()
    cpumask: update local_cpus_show to use new cpumask API
    cpumask: update pci_bus_show_cpuaffinity to use new cpumask API
    x86: cleanup remaining cpumask_t ops in smpboot code
    x86: clean up speedstep-centrino and reduce cpumask_t usage
    cpumask: Replace CPUMASK_ALLOC etc with cpumask_var_t.
    cpumask: convert struct cpufreq_policy to cpumask_var_t.
    cpumask: use work_on_cpu in acpi/cstate.c
    cpumask: use cpumask_var_t in acpi-cpufreq.c
    cpumask: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write
    cpumask: use work_on_cpu in acpi-cpufreq.c for read_measured_perf_ctrs

This version basically splits out the changes to make it more
bisectable, and has been patch-wise compile/boot tested.  Updated
stats are below.

Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Len Brown <len.brown@intel.com>

---

The number of stack hogs have been significantly reduced:

====== Stack (-l 500)
    1 - allyesconfig-128
    2 - allyesconfig-4k

  .1.    .2.    ..final..
    0  +1032   1032      .  flush_tlb_page
    0  +1024   1024      .  kvm_reload_remote_mmus
    0  +1024   1024      .  kvm_flush_remote_tlbs
    0  +1024   1024      .  flush_tlb_mm
    0  +1024   1024      .  flush_tlb_current_task
    0   +824    824      .  efx_pci_probe_main
    0   +800    800      .  cpuset_write_resmask
    0   +736    736      .  update_flag
    0   +696    696      .  init_intel_cacheinfo
    0   +640    640      .  cpuset_attach
    0   +600    600      .  shmem_getpage
    0   +600    600      .  powernowk8_target
    0   +584    584      .  powernowk8_cpu_init
    0   +584    584      .  __percpu_alloc_mask
    0   +552    552      .  xen_flush_tlb_others
    0   +552    552      .  smp_call_function_many
    0   +536    536      .  reload_store
    0   +536    536      .  native_flush_tlb_others
    0   +536    536      .  cpuset_common_file_read
    0   +528    528      .  get_cur_freq
    0   +528    528      .  check_supported_cpu
    0   +520    520      .  powernowk8_get
    0   +520    520      .  microcode_write
    0   +520    520      .  microcode_init_cpu
    0   +520    520      .  cpuset_can_attach
    0   +512    512      .  smi_request
    0   +512    512      .  pci_device_probe
    0   +512    512      .  find_lowest_rq


And the overall memory usage is becoming quite less affected by changing
NR_CPUS from 128 to 4096:

====== Text/Data ()
    1 - allyesconfig-128
    2 - allyesconfig-4k

       .1.       .2.    ..final..
  32950272    +10240   32960512 +0.03%  TextSize
  45082624    +77824   45160448 +0.17%  DataSize
  11436032  +4167680   15603712   +36%  BssSize
   3514368   +794624    4308992   +22%  InitSize
   1904640     +4096    1908736 +0.22%  PerCPU
    360448   +722944    1083392  +200%  OtherSize
  95248384  +5777408  101025792  +106%  Totals


Detailing it by section shows there still are a few areas to examine more
closely.  Note init.data is "give back" memory, but data.cacheline_aligned
indicates there are still many lists sized by NR_CPUS.


====== Sections (-l 500)
    1 - allyesconfig-128
    2 - allyesconfig-4k

        .1.       .2.    ..final..
  563607471  +6217546   569825017 +1.10%  Total
  362852723   +357693   363210416 +0.10%  .debug_info
   43002053    +10004    43012057 +0.02%  .debug_loc
   32949913    +10464    32960377 +0.03%  .text
   25793792     +4848    25798640 +0.02%  .debug_ranges
   22693731    +39360    22733091 +0.17%  .rodata
   21157048    +33968    21191016 +0.16%  .data
   19190023      +983    19191006 +0.01%  .debug_line
   11436936  +4167424    15604360   +36%  .bss
    7205306    +85021     7290327 +1.18%  .debug_abbrev
    1904160     +5504     1909664 +0.29%  .data.percpu
     829999      +709      830708 +0.09%  .init.text
     595328   +783920     1379248  +131%  .init.data
     150400     +1456      151856 +0.97%  __param
     101464   +239632      341096  +236%  .data.read_mostly
      32384   +475648      508032 +1468%  .data.cacheline_aligned


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 01/11] [PATCH] ia64: cpumask fix for is_affinity_mask_valid()
  2009-01-04 13:17 [PATCH 00/11] x86: cpumask: some more cpumask cleanups Mike Travis
@ 2009-01-04 13:18 ` Mike Travis
  2009-01-04 13:18 ` [PATCH 02/11] cpumask: update local_cpus_show to use new cpumask API Mike Travis
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Mike Travis @ 2009-01-04 13:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rusty Russell, H. Peter Anvin, Thomas Gleixner, Linus Torvalds,
	Jack Steiner, linux-kernel

[-- Attachment #1: ia64:fixup-cpumask-refs.patch --]
[-- Type: text/plain, Size: 2819 bytes --]

Impact: cleanup

The function prototype should use 'struct cpumask *' to declare
cpumask arguments (instead of cpumask_var_t).

Note: arch/ia64/kernel/irq.c still had the following "old cpumask_t" usages:

105:	cpumask_t mask = CPU_MASK_NONE;
107:	cpu_set(cpu_logical_id(hwid), mask);
110:                 irq_desc[irq].affinity = mask;

	... replaced with a simple "cpumask_of(cpu_logical_id(hwid))".

161:			new_cpu = any_online_cpu(cpu_online_map);
194:		time_keeper_id = first_cpu(cpu_online_map);

	... replaced with cpu_online_mask refs.

Based on tip/cpus4096-v2.

Build tested with ia64-allyesconfig.

Signed-off-by: Mike Travis <travis@sgi.com>
---
 arch/ia64/include/asm/irq.h |    2 +-
 arch/ia64/kernel/irq.c      |   15 ++++++---------
 2 files changed, 7 insertions(+), 10 deletions(-)

--- linux-2.6-for-ingo.orig/arch/ia64/include/asm/irq.h
+++ linux-2.6-for-ingo/arch/ia64/include/asm/irq.h
@@ -27,7 +27,7 @@ irq_canonicalize (int irq)
 }
 
 extern void set_irq_affinity_info (unsigned int irq, int dest, int redir);
-bool is_affinity_mask_valid(cpumask_var_t cpumask);
+bool is_affinity_mask_valid(const struct cpumask *cpumask);
 
 #define is_affinity_mask_valid is_affinity_mask_valid
 
--- linux-2.6-for-ingo.orig/arch/ia64/kernel/irq.c
+++ linux-2.6-for-ingo/arch/ia64/kernel/irq.c
@@ -102,17 +102,14 @@ static char irq_redir [NR_IRQS]; // = { 
 
 void set_irq_affinity_info (unsigned int irq, int hwid, int redir)
 {
-	cpumask_t mask = CPU_MASK_NONE;
-
-	cpu_set(cpu_logical_id(hwid), mask);
-
 	if (irq < NR_IRQS) {
-		irq_desc[irq].affinity = mask;
+		cpumask_copy(&irq_desc[irq].affinity,
+			     cpumask_of(cpu_logical_id(hwid)));
 		irq_redir[irq] = (char) (redir & 0xff);
 	}
 }
 
-bool is_affinity_mask_valid(cpumask_var_t cpumask)
+bool is_affinity_mask_valid(const struct cpumask *cpumask)
 {
 	if (ia64_platform_is("sn2")) {
 		/* Only allow one CPU to be specified in the smp_affinity mask */
@@ -128,7 +125,7 @@ bool is_affinity_mask_valid(cpumask_var_
 unsigned int vectors_in_migration[NR_IRQS];
 
 /*
- * Since cpu_online_map is already updated, we just need to check for
+ * Since cpu_online_mask is already updated, we just need to check for
  * affinity that has zeros
  */
 static void migrate_irqs(void)
@@ -158,7 +155,7 @@ static void migrate_irqs(void)
 			 */
 			vectors_in_migration[irq] = irq;
 
-			new_cpu = any_online_cpu(cpu_online_map);
+			new_cpu = cpumask_any(cpu_online_mask);
 
 			/*
 			 * Al three are essential, currently WARN_ON.. maybe panic?
@@ -191,7 +188,7 @@ void fixup_irqs(void)
 	 * Find a new timesync master
 	 */
 	if (smp_processor_id() == time_keeper_id) {
-		time_keeper_id = first_cpu(cpu_online_map);
+		time_keeper_id = cpumask_first(cpu_online_mask);
 		printk ("CPU %d is now promoted to time-keeper master\n", time_keeper_id);
 	}
 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 02/11] cpumask: update local_cpus_show to use new cpumask API
  2009-01-04 13:17 [PATCH 00/11] x86: cpumask: some more cpumask cleanups Mike Travis
  2009-01-04 13:18 ` [PATCH 01/11] [PATCH] ia64: cpumask fix for is_affinity_mask_valid() Mike Travis
@ 2009-01-04 13:18 ` Mike Travis
  2009-01-04 13:18 ` [PATCH 03/11] cpumask: update pci_bus_show_cpuaffinity " Mike Travis
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Mike Travis @ 2009-01-04 13:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rusty Russell, H. Peter Anvin, Thomas Gleixner, Linus Torvalds,
	Jack Steiner, linux-kernel, Jesse Barnes

[-- Attachment #1: cpumask:update-local_cpus_show.patch --]
[-- Type: text/plain, Size: 1411 bytes --]

Impact: cleanup, reduce stack usage, use new cpumask API.

Replace the local cpumask_t variable with a pointer to the
const cpumask that needs to be printed.

Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
---
 drivers/pci/pci-sysfs.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

--- linux-2.6-for-ingo.orig/drivers/pci/pci-sysfs.c
+++ linux-2.6-for-ingo/drivers/pci/pci-sysfs.c
@@ -70,11 +70,11 @@ static ssize_t broken_parity_status_stor
 static ssize_t local_cpus_show(struct device *dev,
 			struct device_attribute *attr, char *buf)
 {		
-	cpumask_t mask;
+	const struct cpumask *mask;
 	int len;
 
-	mask = pcibus_to_cpumask(to_pci_dev(dev)->bus);
-	len = cpumask_scnprintf(buf, PAGE_SIZE-2, &mask);
+	mask = cpumask_of_pcibus(to_pci_dev(dev)->bus);
+	len = cpumask_scnprintf(buf, PAGE_SIZE-2, mask);
 	buf[len++] = '\n';
 	buf[len] = '\0';
 	return len;
@@ -84,11 +84,11 @@ static ssize_t local_cpus_show(struct de
 static ssize_t local_cpulist_show(struct device *dev,
 			struct device_attribute *attr, char *buf)
 {
-	cpumask_t mask;
+	const struct cpumask *mask;
 	int len;
 
-	mask = pcibus_to_cpumask(to_pci_dev(dev)->bus);
-	len = cpulist_scnprintf(buf, PAGE_SIZE-2, &mask);
+	mask = cpumask_of_pcibus(to_pci_dev(dev)->bus);
+	len = cpulist_scnprintf(buf, PAGE_SIZE-2, mask);
 	buf[len++] = '\n';
 	buf[len] = '\0';
 	return len;


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 03/11] cpumask: update pci_bus_show_cpuaffinity to use new cpumask API
  2009-01-04 13:17 [PATCH 00/11] x86: cpumask: some more cpumask cleanups Mike Travis
  2009-01-04 13:18 ` [PATCH 01/11] [PATCH] ia64: cpumask fix for is_affinity_mask_valid() Mike Travis
  2009-01-04 13:18 ` [PATCH 02/11] cpumask: update local_cpus_show to use new cpumask API Mike Travis
@ 2009-01-04 13:18 ` Mike Travis
  2009-01-05 19:27   ` Jesse Barnes
  2009-01-04 13:18 ` [PATCH 04/11] x86: cleanup remaining cpumask_t ops in smpboot code Mike Travis
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: Mike Travis @ 2009-01-04 13:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rusty Russell, H. Peter Anvin, Thomas Gleixner, Linus Torvalds,
	Jack Steiner, linux-kernel, Jesse Barnes

[-- Attachment #1: cpumask:update-pci_bus_show_cpuaffinity.patch --]
[-- Type: text/plain, Size: 942 bytes --]

Impact: cleanup, reduce stack usage, use new cpumask API.

Replace the local cpumask_t variable with a pointer to the
const cpumask that needs to be printed.

Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
---
 drivers/pci/probe.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- linux-2.6-for-ingo.orig/drivers/pci/probe.c
+++ linux-2.6-for-ingo/drivers/pci/probe.c
@@ -51,12 +51,12 @@ static ssize_t pci_bus_show_cpuaffinity(
 					char *buf)
 {
 	int ret;
-	cpumask_t cpumask;
+	const struct cpumask *cpumask;
 
-	cpumask = pcibus_to_cpumask(to_pci_bus(dev));
+	cpumask = cpumask_of_pcibus(to_pci_bus(dev));
 	ret = type?
-		cpulist_scnprintf(buf, PAGE_SIZE-2, &cpumask) :
-		cpumask_scnprintf(buf, PAGE_SIZE-2, &cpumask);
+		cpulist_scnprintf(buf, PAGE_SIZE-2, cpumask) :
+		cpumask_scnprintf(buf, PAGE_SIZE-2, cpumask);
 	buf[ret++] = '\n';
 	buf[ret] = '\0';
 	return ret;


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 03/11] cpumask: update pci_bus_show_cpuaffinity to use new cpumask API
  2009-01-04 13:18 ` [PATCH 03/11] cpumask: update pci_bus_show_cpuaffinity " Mike Travis
@ 2009-01-05 19:27   ` Jesse Barnes
  2009-01-05 19:31     ` Mike Travis
  2009-01-05 19:44     ` Linus Torvalds
  0 siblings, 2 replies; 24+ messages in thread
From: Jesse Barnes @ 2009-01-05 19:27 UTC (permalink / raw)
  To: Mike Travis
  Cc: Ingo Molnar, Rusty Russell, H. Peter Anvin, Thomas Gleixner,
	Linus Torvalds, Jack Steiner, linux-kernel

On Sunday, January 4, 2009 5:18 am Mike Travis wrote:
> Impact: cleanup, reduce stack usage, use new cpumask API.
>
> Replace the local cpumask_t variable with a pointer to the
> const cpumask that needs to be printed.
>
> Signed-off-by: Mike Travis <travis@sgi.com>
> Cc: Jesse Barnes <jbarnes@virtuousgeek.org>

Can you resend these two against my linux-next branch?

Thanks,
-- 
Jesse Barnes, Intel Open Source Technology Center


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 03/11] cpumask: update pci_bus_show_cpuaffinity to use new cpumask API
  2009-01-05 19:27   ` Jesse Barnes
@ 2009-01-05 19:31     ` Mike Travis
  2009-01-05 19:59       ` Jesse Barnes
  2009-01-05 19:44     ` Linus Torvalds
  1 sibling, 1 reply; 24+ messages in thread
From: Mike Travis @ 2009-01-05 19:31 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Ingo Molnar, Rusty Russell, H. Peter Anvin, Thomas Gleixner,
	Linus Torvalds, Jack Steiner, linux-kernel

Jesse Barnes wrote:
> On Sunday, January 4, 2009 5:18 am Mike Travis wrote:
>> Impact: cleanup, reduce stack usage, use new cpumask API.
>>
>> Replace the local cpumask_t variable with a pointer to the
>> const cpumask that needs to be printed.
>>
>> Signed-off-by: Mike Travis <travis@sgi.com>
>> Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
> 
> Can you resend these two against my linux-next branch?
> 
> Thanks,

Sure thing.  Would this be the latest .../sfr/linux-next.git master
tree or would I need to select some other branch?

Thanks,
Mike


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 03/11] cpumask: update pci_bus_show_cpuaffinity to use new cpumask API
  2009-01-05 19:31     ` Mike Travis
@ 2009-01-05 19:59       ` Jesse Barnes
  2009-01-07 15:19         ` Ingo Molnar
  0 siblings, 1 reply; 24+ messages in thread
From: Jesse Barnes @ 2009-01-05 19:59 UTC (permalink / raw)
  To: Mike Travis
  Cc: Ingo Molnar, Rusty Russell, H. Peter Anvin, Thomas Gleixner,
	Linus Torvalds, Jack Steiner, linux-kernel

On Monday, January 5, 2009 11:31 am Mike Travis wrote:
> Jesse Barnes wrote:
> > On Sunday, January 4, 2009 5:18 am Mike Travis wrote:
> >> Impact: cleanup, reduce stack usage, use new cpumask API.
> >>
> >> Replace the local cpumask_t variable with a pointer to the
> >> const cpumask that needs to be printed.
> >>
> >> Signed-off-by: Mike Travis <travis@sgi.com>
> >> Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
> >
> > Can you resend these two against my linux-next branch?
> >
> > Thanks,
>
> Sure thing.  Would this be the latest .../sfr/linux-next.git master
> tree or would I need to select some other branch?

That would probably work, but my actual tree is at 
git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6.git; the linux-
next branch is the one that I'll be sending to Linus soon.

Thanks,
-- 
Jesse Barnes, Intel Open Source Technology Center


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 03/11] cpumask: update pci_bus_show_cpuaffinity to use new cpumask API
  2009-01-05 19:59       ` Jesse Barnes
@ 2009-01-07 15:19         ` Ingo Molnar
  2009-01-07 16:59           ` Jesse Barnes
  0 siblings, 1 reply; 24+ messages in thread
From: Ingo Molnar @ 2009-01-07 15:19 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Mike Travis, Ingo Molnar, Rusty Russell, H. Peter Anvin,
	Thomas Gleixner, Linus Torvalds, Jack Steiner, linux-kernel


* Jesse Barnes <jbarnes@virtuousgeek.org> wrote:

> On Monday, January 5, 2009 11:31 am Mike Travis wrote:
> > Jesse Barnes wrote:
> > > On Sunday, January 4, 2009 5:18 am Mike Travis wrote:
> > >> Impact: cleanup, reduce stack usage, use new cpumask API.
> > >>
> > >> Replace the local cpumask_t variable with a pointer to the
> > >> const cpumask that needs to be printed.
> > >>
> > >> Signed-off-by: Mike Travis <travis@sgi.com>
> > >> Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
> > >
> > > Can you resend these two against my linux-next branch?
> > >
> > > Thanks,
> >
> > Sure thing.  Would this be the latest .../sfr/linux-next.git master
> > tree or would I need to select some other branch?
> 
> That would probably work, but my actual tree is at 
> git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6.git; the linux-
> next branch is the one that I'll be sending to Linus soon.

hm, i already have it queued up in tip/cpus4096:

588235b: cpumask: update pci_bus_show_cpuaffinity to use new cpumask API

as you said that it would be fine to do it there. I guess it's not a 
problem to have it duplicate - the code changes one narrow area of code.

	Ingo

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 03/11] cpumask: update pci_bus_show_cpuaffinity to use new cpumask API
  2009-01-07 15:19         ` Ingo Molnar
@ 2009-01-07 16:59           ` Jesse Barnes
  0 siblings, 0 replies; 24+ messages in thread
From: Jesse Barnes @ 2009-01-07 16:59 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Mike Travis, Ingo Molnar, Rusty Russell, H. Peter Anvin,
	Thomas Gleixner, Linus Torvalds, Jack Steiner, linux-kernel

On Wednesday, January 7, 2009 7:19 am Ingo Molnar wrote:
> * Jesse Barnes <jbarnes@virtuousgeek.org> wrote:
> > On Monday, January 5, 2009 11:31 am Mike Travis wrote:
> > > Jesse Barnes wrote:
> > > > On Sunday, January 4, 2009 5:18 am Mike Travis wrote:
> > > >> Impact: cleanup, reduce stack usage, use new cpumask API.
> > > >>
> > > >> Replace the local cpumask_t variable with a pointer to the
> > > >> const cpumask that needs to be printed.
> > > >>
> > > >> Signed-off-by: Mike Travis <travis@sgi.com>
> > > >> Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
> > > >
> > > > Can you resend these two against my linux-next branch?
> > > >
> > > > Thanks,
> > >
> > > Sure thing.  Would this be the latest .../sfr/linux-next.git master
> > > tree or would I need to select some other branch?
> >
> > That would probably work, but my actual tree is at
> > git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6.git; the
> > linux- next branch is the one that I'll be sending to Linus soon.
>
> hm, i already have it queued up in tip/cpus4096:
>
> 588235b: cpumask: update pci_bus_show_cpuaffinity to use new cpumask API
>
> as you said that it would be fine to do it there. I guess it's not a
> problem to have it duplicate - the code changes one narrow area of code.

I don't have it queued in my tree; it depends on other changes in Linus' tree 
I haven't rebased to yet (will do today when I pull in Rafael's PCI 
suspend/resume fixes), but I'll let this stuff come through your tree alone.

Thanks,
-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 03/11] cpumask: update pci_bus_show_cpuaffinity to use new cpumask API
  2009-01-05 19:27   ` Jesse Barnes
  2009-01-05 19:31     ` Mike Travis
@ 2009-01-05 19:44     ` Linus Torvalds
  2009-01-05 19:49       ` Jesse Barnes
  1 sibling, 1 reply; 24+ messages in thread
From: Linus Torvalds @ 2009-01-05 19:44 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Mike Travis, Ingo Molnar, Rusty Russell, H. Peter Anvin,
	Thomas Gleixner, Jack Steiner, linux-kernel

On Mon, 5 Jan 2009, Jesse Barnes wrote:
> 
> Can you resend these two against my linux-next branch?

Btw, Jesse, what's the schedule for merging the pci thing.

I'm currently planning on -rc1 next weekend, which gives us time to do at 
least a shortened -rc2 before people are at LCA. And the suspend/resume 
changes are some of the more "exciting" (aka scary) parts of the whole 
merge window, so I'd rather get them merged with a few days to go, rather 
than just before -rc1.

In fact, they are probably more scary than the cpumask changes, since at 
least the cpumask issues are likely to not be a big deal with any normal 
sane config (ie you really do have to enable MAXSMP to hit the stack usage 
issues). So if you end up waiting for those, I'd rather prefer to first 
merge the rest of the PCI code.

Or did the PCI late-suspend/early-resume patches go in somebody elses 
tree and I'm barking up the wrong tree entirely?

			Linus

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 03/11] cpumask: update pci_bus_show_cpuaffinity to use new cpumask API
  2009-01-05 19:44     ` Linus Torvalds
@ 2009-01-05 19:49       ` Jesse Barnes
  0 siblings, 0 replies; 24+ messages in thread
From: Jesse Barnes @ 2009-01-05 19:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Mike Travis, Ingo Molnar, Rusty Russell, H. Peter Anvin,
	Thomas Gleixner, Jack Steiner, linux-kernel

On Monday, January 5, 2009 11:44 am Linus Torvalds wrote:
> On Mon, 5 Jan 2009, Jesse Barnes wrote:
> > Can you resend these two against my linux-next branch?
>
> Btw, Jesse, what's the schedule for merging the pci thing.

Just pulling together some more patches today; was planning on sending a pull 
request tomorrow.

> I'm currently planning on -rc1 next weekend, which gives us time to do at
> least a shortened -rc2 before people are at LCA. And the suspend/resume
> changes are some of the more "exciting" (aka scary) parts of the whole
> merge window, so I'd rather get them merged with a few days to go, rather
> than just before -rc1.
>
> In fact, they are probably more scary than the cpumask changes, since at
> least the cpumask issues are likely to not be a big deal with any normal
> sane config (ie you really do have to enable MAXSMP to hit the stack usage
> issues). So if you end up waiting for those, I'd rather prefer to first
> merge the rest of the PCI code.

Yeah, they're no big deal, just wanted to get them (the cpumask changes) 
queued up...

> Or did the PCI late-suspend/early-resume patches go in somebody elses
> tree and I'm barking up the wrong tree entirely?

No they're coming through my tree; just need to ping Rafael again and get the 
latest set.

Thanks,
-- 
Jesse Barnes, Intel Open Source Technology Center


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 04/11] x86: cleanup remaining cpumask_t ops in smpboot code
  2009-01-04 13:17 [PATCH 00/11] x86: cpumask: some more cpumask cleanups Mike Travis
                   ` (2 preceding siblings ...)
  2009-01-04 13:18 ` [PATCH 03/11] cpumask: update pci_bus_show_cpuaffinity " Mike Travis
@ 2009-01-04 13:18 ` Mike Travis
  2009-01-04 13:18 ` [PATCH 05/11] x86: clean up speedstep-centrino and reduce cpumask_t usage From: Rusty Russell <rusty@rustcorp.com.au> Mike Travis
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Mike Travis @ 2009-01-04 13:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rusty Russell, H. Peter Anvin, Thomas Gleixner, Linus Torvalds,
	Jack Steiner, linux-kernel

[-- Attachment #1: x86:cleanup-cpumasks-in-smpboot --]
[-- Type: text/plain, Size: 17162 bytes --]

Impact: Reduce memory and stack usage and use new cpumask API.

Allocate the following local cpumasks based on the number of cpus that
are present.  References will use new cpumask API.  (Currently only
modified for x86_64, x86_32 continues to use the *_map variants.)

    cpu_callin_mask
    cpu_callout_mask
    cpu_initialized_mask
    cpu_sibling_setup_mask

Provide the following accessor functions:

    struct cpumask *cpu_sibling_mask(int cpu)
    struct cpumask *cpu_core_mask(int cpu)

Other changes are when setting or clearing the cpu online, possible
or present maps, use the accessor functions.

Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
---
 arch/x86/include/asm/smp.h     |   32 +++++++++-
 arch/x86/kernel/cpu/common.c   |   26 +++++++-
 arch/x86/kernel/setup_percpu.c |   25 +++++++-
 arch/x86/kernel/smp.c          |   17 +++--
 arch/x86/kernel/smpboot.c      |  128 ++++++++++++++++++++---------------------
 5 files changed, 152 insertions(+), 76 deletions(-)

--- linux-2.6-for-ingo.orig/arch/x86/include/asm/smp.h
+++ linux-2.6-for-ingo/arch/x86/include/asm/smp.h
@@ -18,9 +18,26 @@
 #include <asm/pda.h>
 #include <asm/thread_info.h>
 
+#ifdef CONFIG_X86_64
+
+extern cpumask_var_t cpu_callin_mask;
+extern cpumask_var_t cpu_callout_mask;
+extern cpumask_var_t cpu_initialized_mask;
+extern cpumask_var_t cpu_sibling_setup_mask;
+
+#else /* CONFIG_X86_32 */
+
+extern cpumask_t cpu_callin_map;
 extern cpumask_t cpu_callout_map;
 extern cpumask_t cpu_initialized;
-extern cpumask_t cpu_callin_map;
+extern cpumask_t cpu_sibling_setup_map;
+
+#define cpu_callin_mask		((struct cpumask *)&cpu_callin_map)
+#define cpu_callout_mask	((struct cpumask *)&cpu_callout_map)
+#define cpu_initialized_mask	((struct cpumask *)&cpu_initialized)
+#define cpu_sibling_setup_mask	((struct cpumask *)&cpu_sibling_setup_map)
+
+#endif /* CONFIG_X86_32 */
 
 extern void (*mtrr_hook)(void);
 extern void zap_low_mappings(void);
@@ -29,7 +46,6 @@ extern int __cpuinit get_local_pda(int c
 
 extern int smp_num_siblings;
 extern unsigned int num_processors;
-extern cpumask_t cpu_initialized;
 
 DECLARE_PER_CPU(cpumask_t, cpu_sibling_map);
 DECLARE_PER_CPU(cpumask_t, cpu_core_map);
@@ -38,6 +54,16 @@ DECLARE_PER_CPU(u16, cpu_llc_id);
 DECLARE_PER_CPU(int, cpu_number);
 #endif
 
+static inline struct cpumask *cpu_sibling_mask(int cpu)
+{
+	return &per_cpu(cpu_sibling_map, cpu);
+}
+
+static inline struct cpumask *cpu_core_mask(int cpu)
+{
+	return &per_cpu(cpu_core_map, cpu);
+}
+
 DECLARE_EARLY_PER_CPU(u16, x86_cpu_to_apicid);
 DECLARE_EARLY_PER_CPU(u16, x86_bios_cpu_apicid);
 
@@ -149,7 +175,7 @@ void smp_store_cpu_info(int id);
 /* We don't mark CPUs online until __cpu_up(), so we need another measure */
 static inline int num_booting_cpus(void)
 {
-	return cpus_weight(cpu_callout_map);
+	return cpumask_weight(cpu_callout_mask);
 }
 #else
 static inline void prefill_possible_map(void)
--- linux-2.6-for-ingo.orig/arch/x86/kernel/cpu/common.c
+++ linux-2.6-for-ingo/arch/x86/kernel/cpu/common.c
@@ -40,6 +40,26 @@
 
 #include "cpu.h"
 
+#ifdef CONFIG_X86_64
+
+/* all of these masks are initialized in setup_cpu_local_masks() */
+cpumask_var_t cpu_callin_mask;
+cpumask_var_t cpu_callout_mask;
+cpumask_var_t cpu_initialized_mask;
+
+/* representing cpus for which sibling maps can be computed */
+cpumask_var_t cpu_sibling_setup_mask;
+
+#else /* CONFIG_X86_32 */
+
+cpumask_t cpu_callin_map;
+cpumask_t cpu_callout_map;
+cpumask_t cpu_initialized;
+cpumask_t cpu_sibling_setup_map;
+
+#endif /* CONFIG_X86_32 */
+
+
 static struct cpu_dev *this_cpu __cpuinitdata;
 
 #ifdef CONFIG_X86_64
@@ -856,8 +876,6 @@ static __init int setup_disablecpuid(cha
 }
 __setup("clearcpuid=", setup_disablecpuid);
 
-cpumask_t cpu_initialized __cpuinitdata = CPU_MASK_NONE;
-
 #ifdef CONFIG_X86_64
 struct x8664_pda **_cpu_pda __read_mostly;
 EXPORT_SYMBOL(_cpu_pda);
@@ -976,7 +994,7 @@ void __cpuinit cpu_init(void)
 
 	me = current;
 
-	if (cpu_test_and_set(cpu, cpu_initialized))
+	if (cpumask_test_and_set_cpu(cpu, cpu_initialized_mask))
 		panic("CPU#%d already initialized!\n", cpu);
 
 	printk(KERN_INFO "Initializing CPU#%d\n", cpu);
@@ -1085,7 +1103,7 @@ void __cpuinit cpu_init(void)
 	struct tss_struct *t = &per_cpu(init_tss, cpu);
 	struct thread_struct *thread = &curr->thread;
 
-	if (cpu_test_and_set(cpu, cpu_initialized)) {
+	if (cpumask_test_and_set_cpu(cpu, cpu_initialized_mask)) {
 		printk(KERN_WARNING "CPU#%d already initialized!\n", cpu);
 		for (;;) local_irq_enable();
 	}
--- linux-2.6-for-ingo.orig/arch/x86/kernel/setup_percpu.c
+++ linux-2.6-for-ingo/arch/x86/kernel/setup_percpu.c
@@ -131,7 +131,27 @@ static void __init setup_cpu_pda_map(voi
 	/* point to new pointer table */
 	_cpu_pda = new_cpu_pda;
 }
-#endif
+
+#endif /* CONFIG_SMP && CONFIG_X86_64 */
+
+#ifdef CONFIG_X86_64
+
+/* correctly size the local cpu masks */
+static void setup_cpu_local_masks(void)
+{
+	alloc_bootmem_cpumask_var(&cpu_initialized_mask);
+	alloc_bootmem_cpumask_var(&cpu_callin_mask);
+	alloc_bootmem_cpumask_var(&cpu_callout_mask);
+	alloc_bootmem_cpumask_var(&cpu_sibling_setup_mask);
+}
+
+#else /* CONFIG_X86_32 */
+
+static inline void setup_cpu_local_masks(void)
+{
+}
+
+#endif /* CONFIG_X86_32 */
 
 /*
  * Great future plan:
@@ -187,6 +207,9 @@ void __init setup_per_cpu_areas(void)
 
 	/* Setup node to cpumask map */
 	setup_node_to_cpumask_map();
+
+	/* Setup cpu initialized, callin, callout masks */
+	setup_cpu_local_masks();
 }
 
 #endif
--- linux-2.6-for-ingo.orig/arch/x86/kernel/smp.c
+++ linux-2.6-for-ingo/arch/x86/kernel/smp.c
@@ -128,16 +128,23 @@ void native_send_call_func_single_ipi(in
 
 void native_send_call_func_ipi(const struct cpumask *mask)
 {
-	cpumask_t allbutself;
+	cpumask_var_t allbutself;
 
-	allbutself = cpu_online_map;
-	cpu_clear(smp_processor_id(), allbutself);
+	if (!alloc_cpumask_var(&allbutself, GFP_ATOMIC)) {
+		send_IPI_mask(mask, CALL_FUNCTION_VECTOR);
+		return;
+	}
 
-	if (cpus_equal(*mask, allbutself) &&
-	    cpus_equal(cpu_online_map, cpu_callout_map))
+	cpumask_copy(allbutself, cpu_online_mask);
+	cpumask_clear_cpu(smp_processor_id(), allbutself);
+
+	if (cpumask_equal(mask, allbutself) &&
+	    cpumask_equal(cpu_online_mask, cpu_callout_mask))
 		send_IPI_allbutself(CALL_FUNCTION_VECTOR);
 	else
 		send_IPI_mask(mask, CALL_FUNCTION_VECTOR);
+
+	free_cpumask_var(allbutself);
 }
 
 /*
--- linux-2.6-for-ingo.orig/arch/x86/kernel/smpboot.c
+++ linux-2.6-for-ingo/arch/x86/kernel/smpboot.c
@@ -102,9 +102,6 @@ EXPORT_SYMBOL(smp_num_siblings);
 /* Last level cache ID of each logical CPU */
 DEFINE_PER_CPU(u16, cpu_llc_id) = BAD_APICID;
 
-cpumask_t cpu_callin_map;
-cpumask_t cpu_callout_map;
-
 /* representing HT siblings of each logical CPU */
 DEFINE_PER_CPU(cpumask_t, cpu_sibling_map);
 EXPORT_PER_CPU_SYMBOL(cpu_sibling_map);
@@ -120,9 +117,6 @@ EXPORT_PER_CPU_SYMBOL(cpu_info);
 static atomic_t init_deasserted;
 
 
-/* representing cpus for which sibling maps can be computed */
-static cpumask_t cpu_sibling_setup_map;
-
 /* Set if we find a B stepping CPU */
 static int __cpuinitdata smp_b_stepping;
 
@@ -140,7 +134,7 @@ EXPORT_SYMBOL(cpu_to_node_map);
 static void map_cpu_to_node(int cpu, int node)
 {
 	printk(KERN_INFO "Mapping cpu %d to node %d\n", cpu, node);
-	cpu_set(cpu, node_to_cpumask_map[node]);
+	cpumask_set_cpu(cpu, &node_to_cpumask_map[node]);
 	cpu_to_node_map[cpu] = node;
 }
 
@@ -151,7 +145,7 @@ static void unmap_cpu_to_node(int cpu)
 
 	printk(KERN_INFO "Unmapping cpu %d from all nodes\n", cpu);
 	for (node = 0; node < MAX_NUMNODES; node++)
-		cpu_clear(cpu, node_to_cpumask_map[node]);
+		cpumask_clear_cpu(cpu, &node_to_cpumask_map[node]);
 	cpu_to_node_map[cpu] = 0;
 }
 #else /* !(CONFIG_NUMA && CONFIG_X86_32) */
@@ -209,7 +203,7 @@ static void __cpuinit smp_callin(void)
 	 */
 	phys_id = read_apic_id();
 	cpuid = smp_processor_id();
-	if (cpu_isset(cpuid, cpu_callin_map)) {
+	if (cpumask_test_cpu(cpuid, cpu_callin_mask)) {
 		panic("%s: phys CPU#%d, CPU#%d already present??\n", __func__,
 					phys_id, cpuid);
 	}
@@ -231,7 +225,7 @@ static void __cpuinit smp_callin(void)
 		/*
 		 * Has the boot CPU finished it's STARTUP sequence?
 		 */
-		if (cpu_isset(cpuid, cpu_callout_map))
+		if (cpumask_test_cpu(cpuid, cpu_callout_mask))
 			break;
 		cpu_relax();
 	}
@@ -274,7 +268,7 @@ static void __cpuinit smp_callin(void)
 	/*
 	 * Allow the master to continue.
 	 */
-	cpu_set(cpuid, cpu_callin_map);
+	cpumask_set_cpu(cpuid, cpu_callin_mask);
 }
 
 static int __cpuinitdata unsafe_smp;
@@ -332,7 +326,7 @@ notrace static void __cpuinit start_seco
 	ipi_call_lock();
 	lock_vector_lock();
 	__setup_vector_irq(smp_processor_id());
-	cpu_set(smp_processor_id(), cpu_online_map);
+	set_cpu_online(smp_processor_id(), true);
 	unlock_vector_lock();
 	ipi_call_unlock();
 	per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
@@ -438,50 +432,52 @@ void __cpuinit set_cpu_sibling_map(int c
 	int i;
 	struct cpuinfo_x86 *c = &cpu_data(cpu);
 
-	cpu_set(cpu, cpu_sibling_setup_map);
+	cpumask_set_cpu(cpu, cpu_sibling_setup_mask);
 
 	if (smp_num_siblings > 1) {
-		for_each_cpu_mask_nr(i, cpu_sibling_setup_map) {
-			if (c->phys_proc_id == cpu_data(i).phys_proc_id &&
-			    c->cpu_core_id == cpu_data(i).cpu_core_id) {
-				cpu_set(i, per_cpu(cpu_sibling_map, cpu));
-				cpu_set(cpu, per_cpu(cpu_sibling_map, i));
-				cpu_set(i, per_cpu(cpu_core_map, cpu));
-				cpu_set(cpu, per_cpu(cpu_core_map, i));
-				cpu_set(i, c->llc_shared_map);
-				cpu_set(cpu, cpu_data(i).llc_shared_map);
+		for_each_cpu(i, cpu_sibling_setup_mask) {
+			struct cpuinfo_x86 *o = &cpu_data(i);
+
+			if (c->phys_proc_id == o->phys_proc_id &&
+			    c->cpu_core_id == o->cpu_core_id) {
+				cpumask_set_cpu(i, cpu_sibling_mask(cpu));
+				cpumask_set_cpu(cpu, cpu_sibling_mask(i));
+				cpumask_set_cpu(i, cpu_core_mask(cpu));
+				cpumask_set_cpu(cpu, cpu_core_mask(i));
+				cpumask_set_cpu(i, &c->llc_shared_map);
+				cpumask_set_cpu(cpu, &o->llc_shared_map);
 			}
 		}
 	} else {
-		cpu_set(cpu, per_cpu(cpu_sibling_map, cpu));
+		cpumask_set_cpu(cpu, cpu_sibling_mask(cpu));
 	}
 
-	cpu_set(cpu, c->llc_shared_map);
+	cpumask_set_cpu(cpu, &c->llc_shared_map);
 
 	if (current_cpu_data.x86_max_cores == 1) {
-		per_cpu(cpu_core_map, cpu) = per_cpu(cpu_sibling_map, cpu);
+		cpumask_copy(cpu_core_mask(cpu), cpu_sibling_mask(cpu));
 		c->booted_cores = 1;
 		return;
 	}
 
-	for_each_cpu_mask_nr(i, cpu_sibling_setup_map) {
+	for_each_cpu(i, cpu_sibling_setup_mask) {
 		if (per_cpu(cpu_llc_id, cpu) != BAD_APICID &&
 		    per_cpu(cpu_llc_id, cpu) == per_cpu(cpu_llc_id, i)) {
-			cpu_set(i, c->llc_shared_map);
-			cpu_set(cpu, cpu_data(i).llc_shared_map);
+			cpumask_set_cpu(i, &c->llc_shared_map);
+			cpumask_set_cpu(cpu, &cpu_data(i).llc_shared_map);
 		}
 		if (c->phys_proc_id == cpu_data(i).phys_proc_id) {
-			cpu_set(i, per_cpu(cpu_core_map, cpu));
-			cpu_set(cpu, per_cpu(cpu_core_map, i));
+			cpumask_set_cpu(i, cpu_core_mask(cpu));
+			cpumask_set_cpu(cpu, cpu_core_mask(i));
 			/*
 			 *  Does this new cpu bringup a new core?
 			 */
-			if (cpus_weight(per_cpu(cpu_sibling_map, cpu)) == 1) {
+			if (cpumask_weight(cpu_sibling_mask(cpu)) == 1) {
 				/*
 				 * for each core in package, increment
 				 * the booted_cores for this new cpu
 				 */
-				if (first_cpu(per_cpu(cpu_sibling_map, i)) == i)
+				if (cpumask_first(cpu_sibling_mask(i)) == i)
 					c->booted_cores++;
 				/*
 				 * increment the core count for all
@@ -504,7 +500,7 @@ const struct cpumask *cpu_coregroup_mask
 	 * And for power savings, we return cpu_core_map
 	 */
 	if (sched_mc_power_savings || sched_smt_power_savings)
-		return &per_cpu(cpu_core_map, cpu);
+		return cpu_core_mask(cpu);
 	else
 		return &c->llc_shared_map;
 }
@@ -523,7 +519,7 @@ static void impress_friends(void)
 	 */
 	pr_debug("Before bogomips.\n");
 	for_each_possible_cpu(cpu)
-		if (cpu_isset(cpu, cpu_callout_map))
+		if (cpumask_test_cpu(cpu, cpu_callout_mask))
 			bogosum += cpu_data(cpu).loops_per_jiffy;
 	printk(KERN_INFO
 		"Total of %d processors activated (%lu.%02lu BogoMIPS).\n",
@@ -904,19 +900,19 @@ do_rest:
 		 * allow APs to start initializing.
 		 */
 		pr_debug("Before Callout %d.\n", cpu);
-		cpu_set(cpu, cpu_callout_map);
+		cpumask_set_cpu(cpu, cpu_callout_mask);
 		pr_debug("After Callout %d.\n", cpu);
 
 		/*
 		 * Wait 5s total for a response
 		 */
 		for (timeout = 0; timeout < 50000; timeout++) {
-			if (cpu_isset(cpu, cpu_callin_map))
+			if (cpumask_test_cpu(cpu, cpu_callin_mask))
 				break;	/* It has booted */
 			udelay(100);
 		}
 
-		if (cpu_isset(cpu, cpu_callin_map)) {
+		if (cpumask_test_cpu(cpu, cpu_callin_mask)) {
 			/* number CPUs logically, starting from 1 (BSP is 0) */
 			pr_debug("OK.\n");
 			printk(KERN_INFO "CPU%d: ", cpu);
@@ -941,9 +937,14 @@ restore_state:
 	if (boot_error) {
 		/* Try to put things back the way they were before ... */
 		numa_remove_cpu(cpu); /* was set by numa_add_cpu */
-		cpu_clear(cpu, cpu_callout_map); /* was set by do_boot_cpu() */
-		cpu_clear(cpu, cpu_initialized); /* was set by cpu_init() */
-		cpu_clear(cpu, cpu_present_map);
+
+		/* was set by do_boot_cpu() */
+		cpumask_clear_cpu(cpu, cpu_callout_mask);
+
+		/* was set by cpu_init() */
+		cpumask_clear_cpu(cpu, cpu_initialized_mask);
+
+		set_cpu_present(cpu, false);
 		per_cpu(x86_cpu_to_apicid, cpu) = BAD_APICID;
 	}
 
@@ -977,7 +978,7 @@ int __cpuinit native_cpu_up(unsigned int
 	/*
 	 * Already booted CPU?
 	 */
-	if (cpu_isset(cpu, cpu_callin_map)) {
+	if (cpumask_test_cpu(cpu, cpu_callin_mask)) {
 		pr_debug("do_boot_cpu %d Already started\n", cpu);
 		return -ENOSYS;
 	}
@@ -1032,8 +1033,9 @@ int __cpuinit native_cpu_up(unsigned int
  */
 static __init void disable_smp(void)
 {
-	cpu_present_map = cpumask_of_cpu(0);
-	cpu_possible_map = cpumask_of_cpu(0);
+	/* use the read/write pointers to the present and possible maps */
+	cpumask_copy(&cpu_present_map, cpumask_of(0));
+	cpumask_copy(&cpu_possible_map, cpumask_of(0));
 	smpboot_clear_io_apic_irqs();
 
 	if (smp_found_config)
@@ -1041,8 +1043,8 @@ static __init void disable_smp(void)
 	else
 		physid_set_mask_of_physid(0, &phys_cpu_present_map);
 	map_cpu_to_logical_apicid();
-	cpu_set(0, per_cpu(cpu_sibling_map, 0));
-	cpu_set(0, per_cpu(cpu_core_map, 0));
+	cpumask_set_cpu(0, cpu_sibling_mask(0));
+	cpumask_set_cpu(0, cpu_core_mask(0));
 }
 
 /*
@@ -1064,14 +1066,14 @@ static int __init smp_sanity_check(unsig
 		nr = 0;
 		for_each_present_cpu(cpu) {
 			if (nr >= 8)
-				cpu_clear(cpu, cpu_present_map);
+				set_cpu_present(cpu, false);
 			nr++;
 		}
 
 		nr = 0;
 		for_each_possible_cpu(cpu) {
 			if (nr >= 8)
-				cpu_clear(cpu, cpu_possible_map);
+				set_cpu_possible(cpu, false);
 			nr++;
 		}
 
@@ -1167,7 +1169,7 @@ void __init native_smp_prepare_cpus(unsi
 	preempt_disable();
 	smp_cpu_index_default();
 	current_cpu_data = boot_cpu_data;
-	cpu_callin_map = cpumask_of_cpu(0);
+	cpumask_copy(cpu_callin_mask, cpumask_of(0));
 	mb();
 	/*
 	 * Setup boot CPU information
@@ -1242,8 +1244,8 @@ void __init native_smp_prepare_boot_cpu(
 	init_gdt(me);
 #endif
 	switch_to_new_gdt();
-	/* already set me in cpu_online_map in boot_cpu_init() */
-	cpu_set(me, cpu_callout_map);
+	/* already set me in cpu_online_mask in boot_cpu_init() */
+	cpumask_set_cpu(me, cpu_callout_mask);
 	per_cpu(cpu_state, me) = CPU_ONLINE;
 }
 
@@ -1311,7 +1313,7 @@ __init void prefill_possible_map(void)
 		possible, max_t(int, possible - num_processors, 0));
 
 	for (i = 0; i < possible; i++)
-		cpu_set(i, cpu_possible_map);
+		set_cpu_possible(i, true);
 
 	nr_cpu_ids = possible;
 }
@@ -1323,31 +1325,31 @@ static void remove_siblinginfo(int cpu)
 	int sibling;
 	struct cpuinfo_x86 *c = &cpu_data(cpu);
 
-	for_each_cpu_mask_nr(sibling, per_cpu(cpu_core_map, cpu)) {
-		cpu_clear(cpu, per_cpu(cpu_core_map, sibling));
+	for_each_cpu(sibling, cpu_core_mask(cpu)) {
+		cpumask_clear_cpu(cpu, cpu_core_mask(sibling));
 		/*/
 		 * last thread sibling in this cpu core going down
 		 */
-		if (cpus_weight(per_cpu(cpu_sibling_map, cpu)) == 1)
+		if (cpumask_weight(cpu_sibling_mask(cpu)) == 1)
 			cpu_data(sibling).booted_cores--;
 	}
 
-	for_each_cpu_mask_nr(sibling, per_cpu(cpu_sibling_map, cpu))
-		cpu_clear(cpu, per_cpu(cpu_sibling_map, sibling));
-	cpus_clear(per_cpu(cpu_sibling_map, cpu));
-	cpus_clear(per_cpu(cpu_core_map, cpu));
+	for_each_cpu(sibling, cpu_sibling_mask(cpu))
+		cpumask_clear_cpu(cpu, cpu_sibling_mask(sibling));
+	cpumask_clear(cpu_sibling_mask(cpu));
+	cpumask_clear(cpu_core_mask(cpu));
 	c->phys_proc_id = 0;
 	c->cpu_core_id = 0;
-	cpu_clear(cpu, cpu_sibling_setup_map);
+	cpumask_clear_cpu(cpu, cpu_sibling_setup_mask);
 }
 
 static void __ref remove_cpu_from_maps(int cpu)
 {
-	cpu_clear(cpu, cpu_online_map);
-	cpu_clear(cpu, cpu_callout_map);
-	cpu_clear(cpu, cpu_callin_map);
+	set_cpu_online(cpu, false);
+	cpumask_clear_cpu(cpu, cpu_callout_mask);
+	cpumask_clear_cpu(cpu, cpu_callin_mask);
 	/* was set by cpu_init() */
-	cpu_clear(cpu, cpu_initialized);
+	cpumask_clear_cpu(cpu, cpu_initialized_mask);
 	numa_remove_cpu(cpu);
 }
 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 05/11] x86: clean up speedstep-centrino and reduce cpumask_t usage From: Rusty Russell <rusty@rustcorp.com.au>
  2009-01-04 13:17 [PATCH 00/11] x86: cpumask: some more cpumask cleanups Mike Travis
                   ` (3 preceding siblings ...)
  2009-01-04 13:18 ` [PATCH 04/11] x86: cleanup remaining cpumask_t ops in smpboot code Mike Travis
@ 2009-01-04 13:18 ` Mike Travis
  2009-01-04 13:18 ` [PATCH 06/11] cpumask: Replace CPUMASK_ALLOC etc with cpumask_var_t. " Mike Travis
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Mike Travis @ 2009-01-04 13:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rusty Russell, H. Peter Anvin, Thomas Gleixner, Linus Torvalds,
	Jack Steiner, linux-kernel, Dave Jones

[-- Attachment #1: x86:speedstep-centrino-cleanup.patch --]
[-- Type: text/plain, Size: 4093 bytes --]

Impact: cleanup

1) The #ifdef CONFIG_HOTPLUG_CPU seems unnecessary these days.
2) The loop can simply skip over offline cpus, rather than creating a tmp mask.
3) set_mask is set to either a single cpu or all online cpus in a policy.
   Since it's just used for set_cpus_allowed(), any offline cpus in a policy
   don't matter, so we can just use cpumask_of_cpu() or the policy->cpus.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Dave Jones <davej@redhat.com>
---
 arch/x86/kernel/cpu/cpufreq/speedstep-centrino.c |   51 ++++++++++-------------
 1 file changed, 24 insertions(+), 27 deletions(-)

--- linux-2.6-for-ingo.orig/arch/x86/kernel/cpu/cpufreq/speedstep-centrino.c
+++ linux-2.6-for-ingo/arch/x86/kernel/cpu/cpufreq/speedstep-centrino.c
@@ -459,9 +459,7 @@ static int centrino_verify (struct cpufr
  * Sets a new CPUFreq policy.
  */
 struct allmasks {
-	cpumask_t		online_policy_cpus;
 	cpumask_t		saved_mask;
-	cpumask_t		set_mask;
 	cpumask_t		covered_cpus;
 };
 
@@ -475,9 +473,7 @@ static int centrino_target (struct cpufr
 	int			retval = 0;
 	unsigned int		j, k, first_cpu, tmp;
 	CPUMASK_ALLOC(allmasks);
-	CPUMASK_PTR(online_policy_cpus, allmasks);
 	CPUMASK_PTR(saved_mask, allmasks);
-	CPUMASK_PTR(set_mask, allmasks);
 	CPUMASK_PTR(covered_cpus, allmasks);
 
 	if (unlikely(allmasks == NULL))
@@ -497,30 +493,28 @@ static int centrino_target (struct cpufr
 		goto out;
 	}
 
-#ifdef CONFIG_HOTPLUG_CPU
-	/* cpufreq holds the hotplug lock, so we are safe from here on */
-	cpus_and(*online_policy_cpus, cpu_online_map, policy->cpus);
-#else
-	*online_policy_cpus = policy->cpus;
-#endif
-
 	*saved_mask = current->cpus_allowed;
 	first_cpu = 1;
 	cpus_clear(*covered_cpus);
-	for_each_cpu_mask_nr(j, *online_policy_cpus) {
+	for_each_cpu_mask_nr(j, policy->cpus) {
+		const cpumask_t *mask;
+
+		/* cpufreq holds the hotplug lock, so we are safe here */
+		if (!cpu_online(j))
+			continue;
+
 		/*
 		 * Support for SMP systems.
 		 * Make sure we are running on CPU that wants to change freq
 		 */
-		cpus_clear(*set_mask);
 		if (policy->shared_type == CPUFREQ_SHARED_TYPE_ANY)
-			cpus_or(*set_mask, *set_mask, *online_policy_cpus);
+			mask = &policy->cpus;
 		else
-			cpu_set(j, *set_mask);
+			mask = &cpumask_of_cpu(j);
 
-		set_cpus_allowed_ptr(current, set_mask);
+		set_cpus_allowed_ptr(current, mask);
 		preempt_disable();
-		if (unlikely(!cpu_isset(smp_processor_id(), *set_mask))) {
+		if (unlikely(!cpu_isset(smp_processor_id(), *mask))) {
 			dprintk("couldn't limit to CPUs in this domain\n");
 			retval = -EAGAIN;
 			if (first_cpu) {
@@ -548,7 +542,9 @@ static int centrino_target (struct cpufr
 			dprintk("target=%dkHz old=%d new=%d msr=%04x\n",
 				target_freq, freqs.old, freqs.new, msr);
 
-			for_each_cpu_mask_nr(k, *online_policy_cpus) {
+			for_each_cpu_mask_nr(k, policy->cpus) {
+				if (!cpu_online(k))
+					continue;
 				freqs.cpu = k;
 				cpufreq_notify_transition(&freqs,
 					CPUFREQ_PRECHANGE);
@@ -571,7 +567,9 @@ static int centrino_target (struct cpufr
 		preempt_enable();
 	}
 
-	for_each_cpu_mask_nr(k, *online_policy_cpus) {
+	for_each_cpu_mask_nr(k, policy->cpus) {
+		if (!cpu_online(k))
+			continue;
 		freqs.cpu = k;
 		cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
 	}
@@ -584,18 +582,17 @@ static int centrino_target (struct cpufr
 		 * Best effort undo..
 		 */
 
-		if (!cpus_empty(*covered_cpus))
-			for_each_cpu_mask_nr(j, *covered_cpus) {
-				set_cpus_allowed_ptr(current,
-						     &cpumask_of_cpu(j));
-				wrmsr(MSR_IA32_PERF_CTL, oldmsr, h);
-			}
+		for_each_cpu_mask_nr(j, *covered_cpus) {
+			set_cpus_allowed_ptr(current, &cpumask_of_cpu(j));
+			wrmsr(MSR_IA32_PERF_CTL, oldmsr, h);
+		}
 
 		tmp = freqs.new;
 		freqs.new = freqs.old;
 		freqs.old = tmp;
-		for_each_cpu_mask_nr(j, *online_policy_cpus) {
-			freqs.cpu = j;
+		for_each_cpu_mask_nr(j, policy->cpus) {
+			if (!cpu_online(j))
+				continue;
 			cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
 			cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
 		}


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 06/11] cpumask: Replace CPUMASK_ALLOC etc with cpumask_var_t. From: Rusty Russell <rusty@rustcorp.com.au>
  2009-01-04 13:17 [PATCH 00/11] x86: cpumask: some more cpumask cleanups Mike Travis
                   ` (4 preceding siblings ...)
  2009-01-04 13:18 ` [PATCH 05/11] x86: clean up speedstep-centrino and reduce cpumask_t usage From: Rusty Russell <rusty@rustcorp.com.au> Mike Travis
@ 2009-01-04 13:18 ` Mike Travis
  2009-01-04 13:18 ` [PATCH 07/11] cpumask: convert struct cpufreq_policy to " Mike Travis
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Mike Travis @ 2009-01-04 13:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rusty Russell, H. Peter Anvin, Thomas Gleixner, Linus Torvalds,
	Jack Steiner, linux-kernel, Dave Jones

[-- Attachment #1: cpumask:get-rid-of-CPUMASK_ALLOC-x86.patch --]
[-- Type: text/plain, Size: 1912 bytes --]

Impact: cleanup

There's only one user, and it's a fairly easy conversion.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Dave Jones <davej@redhat.com>
---
 arch/x86/kernel/cpu/cpufreq/speedstep-centrino.c |   21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

--- linux-2.6-for-ingo.orig/arch/x86/kernel/cpu/cpufreq/speedstep-centrino.c
+++ linux-2.6-for-ingo/arch/x86/kernel/cpu/cpufreq/speedstep-centrino.c
@@ -458,11 +458,6 @@ static int centrino_verify (struct cpufr
  *
  * Sets a new CPUFreq policy.
  */
-struct allmasks {
-	cpumask_t		saved_mask;
-	cpumask_t		covered_cpus;
-};
-
 static int centrino_target (struct cpufreq_policy *policy,
 			    unsigned int target_freq,
 			    unsigned int relation)
@@ -472,12 +467,15 @@ static int centrino_target (struct cpufr
 	struct cpufreq_freqs	freqs;
 	int			retval = 0;
 	unsigned int		j, k, first_cpu, tmp;
-	CPUMASK_ALLOC(allmasks);
-	CPUMASK_PTR(saved_mask, allmasks);
-	CPUMASK_PTR(covered_cpus, allmasks);
+	cpumask_var_t saved_mask, covered_cpus;
 
-	if (unlikely(allmasks == NULL))
+	if (unlikely(!alloc_cpumask_var(&saved_mask, GFP_KERNEL)))
 		return -ENOMEM;
+	if (unlikely(!alloc_cpumask_var(&covered_cpus, GFP_KERNEL))) {
+		free_cpumask_var(saved_mask);
+		return -ENOMEM;
+	}
+	cpumask_copy(saved_mask, &current->cpus_allowed);
 
 	if (unlikely(per_cpu(centrino_model, cpu) == NULL)) {
 		retval = -ENODEV;
@@ -493,9 +491,7 @@ static int centrino_target (struct cpufr
 		goto out;
 	}
 
-	*saved_mask = current->cpus_allowed;
 	first_cpu = 1;
-	cpus_clear(*covered_cpus);
 	for_each_cpu_mask_nr(j, policy->cpus) {
 		const cpumask_t *mask;
 
@@ -605,7 +601,8 @@ migrate_end:
 	preempt_enable();
 	set_cpus_allowed_ptr(current, saved_mask);
 out:
-	CPUMASK_FREE(allmasks);
+	free_cpumask_var(saved_mask);
+	free_cpumask_var(covered_cpus);
 	return retval;
 }
 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 07/11] cpumask: convert struct cpufreq_policy to cpumask_var_t. From: Rusty Russell <rusty@rustcorp.com.au>
  2009-01-04 13:17 [PATCH 00/11] x86: cpumask: some more cpumask cleanups Mike Travis
                   ` (5 preceding siblings ...)
  2009-01-04 13:18 ` [PATCH 06/11] cpumask: Replace CPUMASK_ALLOC etc with cpumask_var_t. " Mike Travis
@ 2009-01-04 13:18 ` Mike Travis
  2009-01-04 13:18 ` [PATCH 08/11] cpumask: use work_on_cpu in acpi/cstate.c Mike Travis
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Mike Travis @ 2009-01-04 13:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rusty Russell, H. Peter Anvin, Thomas Gleixner, Linus Torvalds,
	Jack Steiner, linux-kernel, Andreas Herrmann, Dave Jones,
	Len Brown

[-- Attachment #1: cpumask:convert-drivers_acpi.patch --]
[-- Type: text/plain, Size: 14520 bytes --]

Impact: reduce memory usage, use new API.

This is part of an effort to reduce structure sizes for machines
configured with large NR_CPUS.  cpumask_t gets replaced by
cpumask_var_t, which is either struct cpumask[1] (small NR_CPUS) or
struct cpumask * (large NR_CPUS).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Len Brown <len.brown@intel.com>
---
 arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c       |   10 ++---
 arch/x86/kernel/cpu/cpufreq/p4-clockmod.c        |    8 ++--
 arch/x86/kernel/cpu/cpufreq/powernow-k8.c        |    6 +--
 arch/x86/kernel/cpu/cpufreq/powernow-k8.h        |    2 -
 arch/x86/kernel/cpu/cpufreq/speedstep-centrino.c |   14 +++----
 arch/x86/kernel/cpu/cpufreq/speedstep-ich.c      |   18 ++++-----
 drivers/cpufreq/cpufreq.c                        |   42 +++++++++++++++--------
 drivers/cpufreq/cpufreq_conservative.c           |    2 -
 drivers/cpufreq/cpufreq_ondemand.c               |    4 +-
 include/linux/cpufreq.h                          |    4 +-
 10 files changed, 62 insertions(+), 48 deletions(-)

--- linux-2.6-for-ingo.orig/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
+++ linux-2.6-for-ingo/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
@@ -411,7 +411,7 @@ static int acpi_cpufreq_target(struct cp
 
 #ifdef CONFIG_HOTPLUG_CPU
 	/* cpufreq holds the hotplug lock, so we are safe from here on */
-	cpus_and(online_policy_cpus, cpu_online_map, policy->cpus);
+	cpumask_and(&online_policy_cpus, cpu_online_mask, policy->cpus);
 #else
 	online_policy_cpus = policy->cpus;
 #endif
@@ -625,15 +625,15 @@ static int acpi_cpufreq_cpu_init(struct 
 	 */
 	if (policy->shared_type == CPUFREQ_SHARED_TYPE_ALL ||
 	    policy->shared_type == CPUFREQ_SHARED_TYPE_ANY) {
-		cpumask_copy(&policy->cpus, perf->shared_cpu_map);
+		cpumask_copy(policy->cpus, perf->shared_cpu_map);
 	}
-	cpumask_copy(&policy->related_cpus, perf->shared_cpu_map);
+	cpumask_copy(policy->related_cpus, perf->shared_cpu_map);
 
 #ifdef CONFIG_SMP
 	dmi_check_system(sw_any_bug_dmi_table);
-	if (bios_with_sw_any_bug && cpus_weight(policy->cpus) == 1) {
+	if (bios_with_sw_any_bug && cpumask_weight(policy->cpus) == 1) {
 		policy->shared_type = CPUFREQ_SHARED_TYPE_ALL;
-		policy->cpus = per_cpu(cpu_core_map, cpu);
+		cpumask_copy(policy->cpus, cpu_core_mask(cpu));
 	}
 #endif
 
--- linux-2.6-for-ingo.orig/arch/x86/kernel/cpu/cpufreq/p4-clockmod.c
+++ linux-2.6-for-ingo/arch/x86/kernel/cpu/cpufreq/p4-clockmod.c
@@ -122,7 +122,7 @@ static int cpufreq_p4_target(struct cpuf
 		return 0;
 
 	/* notifiers */
-	for_each_cpu_mask_nr(i, policy->cpus) {
+	for_each_cpu(i, policy->cpus) {
 		freqs.cpu = i;
 		cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
 	}
@@ -130,11 +130,11 @@ static int cpufreq_p4_target(struct cpuf
 	/* run on each logical CPU, see section 13.15.3 of IA32 Intel Architecture Software
 	 * Developer's Manual, Volume 3
 	 */
-	for_each_cpu_mask_nr(i, policy->cpus)
+	for_each_cpu(i, policy->cpus)
 		cpufreq_p4_setdc(i, p4clockmod_table[newstate].index);
 
 	/* notifiers */
-	for_each_cpu_mask_nr(i, policy->cpus) {
+	for_each_cpu(i, policy->cpus) {
 		freqs.cpu = i;
 		cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
 	}
@@ -200,7 +200,7 @@ static int cpufreq_p4_cpu_init(struct cp
 	unsigned int i;
 
 #ifdef CONFIG_SMP
-	policy->cpus = per_cpu(cpu_sibling_map, policy->cpu);
+	cpumask_copy(policy->cpus, &per_cpu(cpu_sibling_map, policy->cpu));
 #endif
 
 	/* Errata workaround */
--- linux-2.6-for-ingo.orig/arch/x86/kernel/cpu/cpufreq/powernow-k8.c
+++ linux-2.6-for-ingo/arch/x86/kernel/cpu/cpufreq/powernow-k8.c
@@ -1199,10 +1199,10 @@ static int __cpuinit powernowk8_cpu_init
 	set_cpus_allowed_ptr(current, &oldmask);
 
 	if (cpu_family == CPU_HW_PSTATE)
-		pol->cpus = cpumask_of_cpu(pol->cpu);
+		cpumask_copy(pol->cpus, cpumask_of(pol->cpu));
 	else
-		pol->cpus = per_cpu(cpu_core_map, pol->cpu);
-	data->available_cores = &(pol->cpus);
+		cpumask_copy(pol->cpus, &per_cpu(cpu_core_map, pol->cpu));
+	data->available_cores = pol->cpus;
 
 	/* Take a crude guess here.
 	 * That guess was in microseconds, so multiply with 1000 */
--- linux-2.6-for-ingo.orig/arch/x86/kernel/cpu/cpufreq/powernow-k8.h
+++ linux-2.6-for-ingo/arch/x86/kernel/cpu/cpufreq/powernow-k8.h
@@ -53,7 +53,7 @@ struct powernow_k8_data {
 	/* we need to keep track of associated cores, but let cpufreq
 	 * handle hotplug events - so just point at cpufreq pol->cpus
 	 * structure */
-	cpumask_t *available_cores;
+	struct cpumask *available_cores;
 };
 
 
--- linux-2.6-for-ingo.orig/arch/x86/kernel/cpu/cpufreq/speedstep-centrino.c
+++ linux-2.6-for-ingo/arch/x86/kernel/cpu/cpufreq/speedstep-centrino.c
@@ -492,8 +492,8 @@ static int centrino_target (struct cpufr
 	}
 
 	first_cpu = 1;
-	for_each_cpu_mask_nr(j, policy->cpus) {
-		const cpumask_t *mask;
+	for_each_cpu(j, policy->cpus) {
+		const struct cpumask *mask;
 
 		/* cpufreq holds the hotplug lock, so we are safe here */
 		if (!cpu_online(j))
@@ -504,9 +504,9 @@ static int centrino_target (struct cpufr
 		 * Make sure we are running on CPU that wants to change freq
 		 */
 		if (policy->shared_type == CPUFREQ_SHARED_TYPE_ANY)
-			mask = &policy->cpus;
+			mask = policy->cpus;
 		else
-			mask = &cpumask_of_cpu(j);
+			mask = cpumask_of(j);
 
 		set_cpus_allowed_ptr(current, mask);
 		preempt_disable();
@@ -538,7 +538,7 @@ static int centrino_target (struct cpufr
 			dprintk("target=%dkHz old=%d new=%d msr=%04x\n",
 				target_freq, freqs.old, freqs.new, msr);
 
-			for_each_cpu_mask_nr(k, policy->cpus) {
+			for_each_cpu(k, policy->cpus) {
 				if (!cpu_online(k))
 					continue;
 				freqs.cpu = k;
@@ -563,7 +563,7 @@ static int centrino_target (struct cpufr
 		preempt_enable();
 	}
 
-	for_each_cpu_mask_nr(k, policy->cpus) {
+	for_each_cpu(k, policy->cpus) {
 		if (!cpu_online(k))
 			continue;
 		freqs.cpu = k;
@@ -586,7 +586,7 @@ static int centrino_target (struct cpufr
 		tmp = freqs.new;
 		freqs.new = freqs.old;
 		freqs.old = tmp;
-		for_each_cpu_mask_nr(j, policy->cpus) {
+		for_each_cpu(j, policy->cpus) {
 			if (!cpu_online(j))
 				continue;
 			cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
--- linux-2.6-for-ingo.orig/arch/x86/kernel/cpu/cpufreq/speedstep-ich.c
+++ linux-2.6-for-ingo/arch/x86/kernel/cpu/cpufreq/speedstep-ich.c
@@ -229,7 +229,7 @@ static unsigned int speedstep_detect_chi
 	return 0;
 }
 
-static unsigned int _speedstep_get(const cpumask_t *cpus)
+static unsigned int _speedstep_get(const struct cpumask *cpus)
 {
 	unsigned int speed;
 	cpumask_t cpus_allowed;
@@ -244,7 +244,7 @@ static unsigned int _speedstep_get(const
 
 static unsigned int speedstep_get(unsigned int cpu)
 {
-	return _speedstep_get(&cpumask_of_cpu(cpu));
+	return _speedstep_get(cpumask_of(cpu));
 }
 
 /**
@@ -267,7 +267,7 @@ static int speedstep_target (struct cpuf
 	if (cpufreq_frequency_table_target(policy, &speedstep_freqs[0], target_freq, relation, &newstate))
 		return -EINVAL;
 
-	freqs.old = _speedstep_get(&policy->cpus);
+	freqs.old = _speedstep_get(policy->cpus);
 	freqs.new = speedstep_freqs[newstate].frequency;
 	freqs.cpu = policy->cpu;
 
@@ -279,20 +279,20 @@ static int speedstep_target (struct cpuf
 
 	cpus_allowed = current->cpus_allowed;
 
-	for_each_cpu_mask_nr(i, policy->cpus) {
+	for_each_cpu(i, policy->cpus) {
 		freqs.cpu = i;
 		cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
 	}
 
 	/* switch to physical CPU where state is to be changed */
-	set_cpus_allowed_ptr(current, &policy->cpus);
+	set_cpus_allowed_ptr(current, policy->cpus);
 
 	speedstep_set_state(newstate);
 
 	/* allow to be run on all CPUs */
 	set_cpus_allowed_ptr(current, &cpus_allowed);
 
-	for_each_cpu_mask_nr(i, policy->cpus) {
+	for_each_cpu(i, policy->cpus) {
 		freqs.cpu = i;
 		cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
 	}
@@ -322,11 +322,11 @@ static int speedstep_cpu_init(struct cpu
 
 	/* only run on CPU to be set, or on its sibling */
 #ifdef CONFIG_SMP
-	policy->cpus = per_cpu(cpu_sibling_map, policy->cpu);
+	cpumask_copy(policy->cpus, &per_cpu(cpu_sibling_map, policy->cpu));
 #endif
 
 	cpus_allowed = current->cpus_allowed;
-	set_cpus_allowed_ptr(current, &policy->cpus);
+	set_cpus_allowed_ptr(current, policy->cpus);
 
 	/* detect low and high frequency and transition latency */
 	result = speedstep_get_freqs(speedstep_processor,
@@ -339,7 +339,7 @@ static int speedstep_cpu_init(struct cpu
 		return result;
 
 	/* get current speed setting */
-	speed = _speedstep_get(&policy->cpus);
+	speed = _speedstep_get(policy->cpus);
 	if (!speed)
 		return -EIO;
 
--- linux-2.6-for-ingo.orig/drivers/cpufreq/cpufreq.c
+++ linux-2.6-for-ingo/drivers/cpufreq/cpufreq.c
@@ -584,12 +584,12 @@ out:
 	return i;
 }
 
-static ssize_t show_cpus(cpumask_t mask, char *buf)
+static ssize_t show_cpus(const struct cpumask *mask, char *buf)
 {
 	ssize_t i = 0;
 	unsigned int cpu;
 
-	for_each_cpu_mask_nr(cpu, mask) {
+	for_each_cpu(cpu, mask) {
 		if (i)
 			i += scnprintf(&buf[i], (PAGE_SIZE - i - 2), " ");
 		i += scnprintf(&buf[i], (PAGE_SIZE - i - 2), "%u", cpu);
@@ -606,7 +606,7 @@ static ssize_t show_cpus(cpumask_t mask,
  */
 static ssize_t show_related_cpus(struct cpufreq_policy *policy, char *buf)
 {
-	if (cpus_empty(policy->related_cpus))
+	if (cpumask_empty(policy->related_cpus))
 		return show_cpus(policy->cpus, buf);
 	return show_cpus(policy->related_cpus, buf);
 }
@@ -801,9 +801,20 @@ static int cpufreq_add_dev(struct sys_de
 		ret = -ENOMEM;
 		goto nomem_out;
 	}
+	if (!alloc_cpumask_var(&policy->cpus, GFP_KERNEL)) {
+		kfree(policy);
+		ret = -ENOMEM;
+		goto nomem_out;
+	}
+	if (!alloc_cpumask_var(&policy->related_cpus, GFP_KERNEL)) {
+		free_cpumask_var(policy->cpus);
+		kfree(policy);
+		ret = -ENOMEM;
+		goto nomem_out;
+	}
 
 	policy->cpu = cpu;
-	policy->cpus = cpumask_of_cpu(cpu);
+	cpumask_copy(policy->cpus, cpumask_of(cpu));
 
 	/* Initially set CPU itself as the policy_cpu */
 	per_cpu(policy_cpu, cpu) = cpu;
@@ -838,7 +849,7 @@ static int cpufreq_add_dev(struct sys_de
 	}
 #endif
 
-	for_each_cpu_mask_nr(j, policy->cpus) {
+	for_each_cpu(j, policy->cpus) {
 		if (cpu == j)
 			continue;
 
@@ -856,7 +867,7 @@ static int cpufreq_add_dev(struct sys_de
 				goto err_out_driver_exit;
 
 			spin_lock_irqsave(&cpufreq_driver_lock, flags);
-			managed_policy->cpus = policy->cpus;
+			cpumask_copy(managed_policy->cpus, policy->cpus);
 			per_cpu(cpufreq_cpu_data, cpu) = managed_policy;
 			spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
 
@@ -901,14 +912,14 @@ static int cpufreq_add_dev(struct sys_de
 	}
 
 	spin_lock_irqsave(&cpufreq_driver_lock, flags);
-	for_each_cpu_mask_nr(j, policy->cpus) {
+	for_each_cpu(j, policy->cpus) {
 		per_cpu(cpufreq_cpu_data, j) = policy;
 		per_cpu(policy_cpu, j) = policy->cpu;
 	}
 	spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
 
 	/* symlink affected CPUs */
-	for_each_cpu_mask_nr(j, policy->cpus) {
+	for_each_cpu(j, policy->cpus) {
 		if (j == cpu)
 			continue;
 		if (!cpu_online(j))
@@ -948,7 +959,7 @@ static int cpufreq_add_dev(struct sys_de
 
 err_out_unregister:
 	spin_lock_irqsave(&cpufreq_driver_lock, flags);
-	for_each_cpu_mask_nr(j, policy->cpus)
+	for_each_cpu(j, policy->cpus)
 		per_cpu(cpufreq_cpu_data, j) = NULL;
 	spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
 
@@ -1009,7 +1020,7 @@ static int __cpufreq_remove_dev(struct s
 	 */
 	if (unlikely(cpu != data->cpu)) {
 		dprintk("removing link\n");
-		cpu_clear(cpu, data->cpus);
+		cpumask_clear_cpu(cpu, data->cpus);
 		spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
 		sysfs_remove_link(&sys_dev->kobj, "cpufreq");
 		cpufreq_cpu_put(data);
@@ -1030,8 +1041,8 @@ static int __cpufreq_remove_dev(struct s
 	 * per_cpu(cpufreq_cpu_data) while holding the lock, and remove
 	 * the sysfs links afterwards.
 	 */
-	if (unlikely(cpus_weight(data->cpus) > 1)) {
-		for_each_cpu_mask_nr(j, data->cpus) {
+	if (unlikely(cpumask_weight(data->cpus) > 1)) {
+		for_each_cpu(j, data->cpus) {
 			if (j == cpu)
 				continue;
 			per_cpu(cpufreq_cpu_data, j) = NULL;
@@ -1040,8 +1051,8 @@ static int __cpufreq_remove_dev(struct s
 
 	spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
 
-	if (unlikely(cpus_weight(data->cpus) > 1)) {
-		for_each_cpu_mask_nr(j, data->cpus) {
+	if (unlikely(cpumask_weight(data->cpus) > 1)) {
+		for_each_cpu(j, data->cpus) {
 			if (j == cpu)
 				continue;
 			dprintk("removing link for cpu %u\n", j);
@@ -1075,7 +1086,10 @@ static int __cpufreq_remove_dev(struct s
 	if (cpufreq_driver->exit)
 		cpufreq_driver->exit(data);
 
+	free_cpumask_var(data->related_cpus);
+	free_cpumask_var(data->cpus);
 	kfree(data);
+	per_cpu(cpufreq_cpu_data, cpu) = NULL;
 
 	cpufreq_debug_enable_ratelimit();
 	return 0;
--- linux-2.6-for-ingo.orig/drivers/cpufreq/cpufreq_conservative.c
+++ linux-2.6-for-ingo/drivers/cpufreq/cpufreq_conservative.c
@@ -498,7 +498,7 @@ static int cpufreq_governor_dbs(struct c
 			return rc;
 		}
 
-		for_each_cpu_mask_nr(j, policy->cpus) {
+		for_each_cpu(j, policy->cpus) {
 			struct cpu_dbs_info_s *j_dbs_info;
 			j_dbs_info = &per_cpu(cpu_dbs_info, j);
 			j_dbs_info->cur_policy = policy;
--- linux-2.6-for-ingo.orig/drivers/cpufreq/cpufreq_ondemand.c
+++ linux-2.6-for-ingo/drivers/cpufreq/cpufreq_ondemand.c
@@ -400,7 +400,7 @@ static void dbs_check_cpu(struct cpu_dbs
 	/* Get Absolute Load - in terms of freq */
 	max_load_freq = 0;
 
-	for_each_cpu_mask_nr(j, policy->cpus) {
+	for_each_cpu(j, policy->cpus) {
 		struct cpu_dbs_info_s *j_dbs_info;
 		cputime64_t cur_wall_time, cur_idle_time;
 		unsigned int idle_time, wall_time;
@@ -568,7 +568,7 @@ static int cpufreq_governor_dbs(struct c
 			return rc;
 		}
 
-		for_each_cpu_mask_nr(j, policy->cpus) {
+		for_each_cpu(j, policy->cpus) {
 			struct cpu_dbs_info_s *j_dbs_info;
 			j_dbs_info = &per_cpu(cpu_dbs_info, j);
 			j_dbs_info->cur_policy = policy;
--- linux-2.6-for-ingo.orig/include/linux/cpufreq.h
+++ linux-2.6-for-ingo/include/linux/cpufreq.h
@@ -80,8 +80,8 @@ struct cpufreq_real_policy {
 };
 
 struct cpufreq_policy {
-	cpumask_t		cpus;	/* CPUs requiring sw coordination */
-	cpumask_t		related_cpus; /* CPUs with any coordination */
+	cpumask_var_t		cpus;	/* CPUs requiring sw coordination */
+	cpumask_var_t		related_cpus; /* CPUs with any coordination */
 	unsigned int		shared_type; /* ANY or ALL affected CPUs
 						should set cpufreq */
 	unsigned int		cpu;    /* cpu nr of registered CPU */


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 08/11] cpumask: use work_on_cpu in acpi/cstate.c
  2009-01-04 13:17 [PATCH 00/11] x86: cpumask: some more cpumask cleanups Mike Travis
                   ` (6 preceding siblings ...)
  2009-01-04 13:18 ` [PATCH 07/11] cpumask: convert struct cpufreq_policy to " Mike Travis
@ 2009-01-04 13:18 ` Mike Travis
  2009-01-04 13:18 ` [PATCH 09/11] cpumask: use cpumask_var_t in acpi-cpufreq.c Mike Travis
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Mike Travis @ 2009-01-04 13:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rusty Russell, H. Peter Anvin, Thomas Gleixner, Linus Torvalds,
	Jack Steiner, linux-kernel, Dave Jones

[-- Attachment #1: cpumask:use-work_on_cpu-in-acpi_cstate.c --]
[-- Type: text/plain, Size: 3633 bytes --]

Impact: cleanup, reduce stack usage, use new cpumask API.

Replace the saving of current->cpus_allowed and set_cpus_allowed_ptr() with
a work_on_cpu function for the acpi_processor_ffh_cstate_probe() function.

Basically splits acpi_processor_ffh_cstate_probe() into two functions, the
other being acpi_processor_ffh_cstate_probe_cpu which is the work function
run on the designated cpu.

Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dave Jones <davej@redhat.com>
---
 arch/x86/kernel/acpi/cstate.c |   70 ++++++++++++++++++++++--------------------
 1 file changed, 37 insertions(+), 33 deletions(-)

--- linux-2.6-for-ingo.orig/arch/x86/kernel/acpi/cstate.c
+++ linux-2.6-for-ingo/arch/x86/kernel/acpi/cstate.c
@@ -66,35 +66,15 @@ static short mwait_supported[ACPI_PROCES
 
 #define NATIVE_CSTATE_BEYOND_HALT	(2)
 
-int acpi_processor_ffh_cstate_probe(unsigned int cpu,
-		struct acpi_processor_cx *cx, struct acpi_power_register *reg)
+static long acpi_processor_ffh_cstate_probe_cpu(void *_cx)
 {
-	struct cstate_entry *percpu_entry;
-	struct cpuinfo_x86 *c = &cpu_data(cpu);
-
-	cpumask_t saved_mask;
-	int retval;
+	struct acpi_processor_cx *cx = _cx;
+	long retval;
 	unsigned int eax, ebx, ecx, edx;
 	unsigned int edx_part;
 	unsigned int cstate_type; /* C-state type and not ACPI C-state type */
 	unsigned int num_cstate_subtype;
 
-	if (!cpu_cstate_entry || c->cpuid_level < CPUID_MWAIT_LEAF )
-		return -1;
-
-	if (reg->bit_offset != NATIVE_CSTATE_BEYOND_HALT)
-		return -1;
-
-	percpu_entry = per_cpu_ptr(cpu_cstate_entry, cpu);
-	percpu_entry->states[cx->index].eax = 0;
-	percpu_entry->states[cx->index].ecx = 0;
-
-	/* Make sure we are running on right CPU */
-	saved_mask = current->cpus_allowed;
-	retval = set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
-	if (retval)
-		return -1;
-
 	cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &edx);
 
 	/* Check whether this particular cx_type (in CST) is supported or not */
@@ -114,21 +94,45 @@ int acpi_processor_ffh_cstate_probe(unsi
 		retval = -1;
 		goto out;
 	}
-	percpu_entry->states[cx->index].ecx = MWAIT_ECX_INTERRUPT_BREAK;
-
-	/* Use the hint in CST */
-	percpu_entry->states[cx->index].eax = cx->address;
 
 	if (!mwait_supported[cstate_type]) {
 		mwait_supported[cstate_type] = 1;
-		printk(KERN_DEBUG "Monitor-Mwait will be used to enter C-%d "
-		       "state\n", cx->type);
+		printk(KERN_DEBUG
+			"Monitor-Mwait will be used to enter C-%d "
+			"state\n", cx->type);
 	}
-	snprintf(cx->desc, ACPI_CX_DESC_LEN, "ACPI FFH INTEL MWAIT 0x%x",
-		 cx->address);
-
+	snprintf(cx->desc,
+			ACPI_CX_DESC_LEN, "ACPI FFH INTEL MWAIT 0x%x",
+			cx->address);
 out:
-	set_cpus_allowed_ptr(current, &saved_mask);
+	return retval;
+}
+
+int acpi_processor_ffh_cstate_probe(unsigned int cpu,
+		struct acpi_processor_cx *cx, struct acpi_power_register *reg)
+{
+	struct cstate_entry *percpu_entry;
+	struct cpuinfo_x86 *c = &cpu_data(cpu);
+	long retval;
+
+	if (!cpu_cstate_entry || c->cpuid_level < CPUID_MWAIT_LEAF)
+		return -1;
+
+	if (reg->bit_offset != NATIVE_CSTATE_BEYOND_HALT)
+		return -1;
+
+	percpu_entry = per_cpu_ptr(cpu_cstate_entry, cpu);
+	percpu_entry->states[cx->index].eax = 0;
+	percpu_entry->states[cx->index].ecx = 0;
+
+	/* Make sure we are running on right CPU */
+
+	retval = work_on_cpu(cpu, acpi_processor_ffh_cstate_probe_cpu, cx);
+	if (retval == 0) {
+		/* Use the hint in CST */
+		percpu_entry->states[cx->index].eax = cx->address;
+		percpu_entry->states[cx->index].ecx = MWAIT_ECX_INTERRUPT_BREAK;
+	}
 	return retval;
 }
 EXPORT_SYMBOL_GPL(acpi_processor_ffh_cstate_probe);


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 09/11] cpumask: use cpumask_var_t in acpi-cpufreq.c
  2009-01-04 13:17 [PATCH 00/11] x86: cpumask: some more cpumask cleanups Mike Travis
                   ` (7 preceding siblings ...)
  2009-01-04 13:18 ` [PATCH 08/11] cpumask: use work_on_cpu in acpi/cstate.c Mike Travis
@ 2009-01-04 13:18 ` Mike Travis
  2009-01-04 13:18 ` [PATCH 10/11] cpumask: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write Mike Travis
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Mike Travis @ 2009-01-04 13:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rusty Russell, H. Peter Anvin, Thomas Gleixner, Linus Torvalds,
	Jack Steiner, linux-kernel, Dave Jones

[-- Attachment #1: cpumask:use-cpumask_var_t-in-acpi-cpufreq_c --]
[-- Type: text/plain, Size: 5135 bytes --]

Impact: cleanup, reduce stack usage, use new cpumask API.

Replace the cpumask_t in struct drv_cmd with a cpumask_var_t.  Remove unneeded
online_policy_cpus cpumask_t in acpi_cpufreq_target.  Update refs to use
new cpumask API.

Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dave Jones <davej@redhat.com>
---
 arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c |   58 ++++++++++++++---------------
 1 file changed, 29 insertions(+), 29 deletions(-)

--- linux-2.6-for-ingo.orig/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
+++ linux-2.6-for-ingo/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
@@ -145,7 +145,7 @@ typedef union {
 
 struct drv_cmd {
 	unsigned int type;
-	cpumask_t mask;
+	cpumask_var_t mask;
 	drv_addr_union addr;
 	u32 val;
 };
@@ -193,7 +193,7 @@ static void drv_read(struct drv_cmd *cmd
 	cpumask_t saved_mask = current->cpus_allowed;
 	cmd->val = 0;
 
-	set_cpus_allowed_ptr(current, &cmd->mask);
+	set_cpus_allowed_ptr(current, cmd->mask);
 	do_drv_read(cmd);
 	set_cpus_allowed_ptr(current, &saved_mask);
 }
@@ -203,8 +203,8 @@ static void drv_write(struct drv_cmd *cm
 	cpumask_t saved_mask = current->cpus_allowed;
 	unsigned int i;
 
-	for_each_cpu_mask_nr(i, cmd->mask) {
-		set_cpus_allowed_ptr(current, &cpumask_of_cpu(i));
+	for_each_cpu(i, cmd->mask) {
+		set_cpus_allowed_ptr(current, cpumask_of(i));
 		do_drv_write(cmd);
 	}
 
@@ -212,22 +212,22 @@ static void drv_write(struct drv_cmd *cm
 	return;
 }
 
-static u32 get_cur_val(const cpumask_t *mask)
+static u32 get_cur_val(const struct cpumask *mask)
 {
 	struct acpi_processor_performance *perf;
 	struct drv_cmd cmd;
 
-	if (unlikely(cpus_empty(*mask)))
+	if (unlikely(cpumask_empty(mask)))
 		return 0;
 
-	switch (per_cpu(drv_data, first_cpu(*mask))->cpu_feature) {
+	switch (per_cpu(drv_data, cpumask_first(mask))->cpu_feature) {
 	case SYSTEM_INTEL_MSR_CAPABLE:
 		cmd.type = SYSTEM_INTEL_MSR_CAPABLE;
 		cmd.addr.msr.reg = MSR_IA32_PERF_STATUS;
 		break;
 	case SYSTEM_IO_CAPABLE:
 		cmd.type = SYSTEM_IO_CAPABLE;
-		perf = per_cpu(drv_data, first_cpu(*mask))->acpi_data;
+		perf = per_cpu(drv_data, cpumask_first(mask))->acpi_data;
 		cmd.addr.io.port = perf->control_register.address;
 		cmd.addr.io.bit_width = perf->control_register.bit_width;
 		break;
@@ -235,7 +235,7 @@ static u32 get_cur_val(const cpumask_t *
 		return 0;
 	}
 
-	cmd.mask = *mask;
+	cpumask_copy(cmd.mask, mask);
 
 	drv_read(&cmd);
 
@@ -386,7 +386,6 @@ static int acpi_cpufreq_target(struct cp
 	struct acpi_cpufreq_data *data = per_cpu(drv_data, policy->cpu);
 	struct acpi_processor_performance *perf;
 	struct cpufreq_freqs freqs;
-	cpumask_t online_policy_cpus;
 	struct drv_cmd cmd;
 	unsigned int next_state = 0; /* Index into freq_table */
 	unsigned int next_perf_state = 0; /* Index into perf table */
@@ -401,20 +400,18 @@ static int acpi_cpufreq_target(struct cp
 		return -ENODEV;
 	}
 
+	if (unlikely(!alloc_cpumask_var(&cmd.mask, GFP_KERNEL)))
+		return -ENOMEM;
+
 	perf = data->acpi_data;
 	result = cpufreq_frequency_table_target(policy,
 						data->freq_table,
 						target_freq,
 						relation, &next_state);
-	if (unlikely(result))
-		return -ENODEV;
-
-#ifdef CONFIG_HOTPLUG_CPU
-	/* cpufreq holds the hotplug lock, so we are safe from here on */
-	cpumask_and(&online_policy_cpus, cpu_online_mask, policy->cpus);
-#else
-	online_policy_cpus = policy->cpus;
-#endif
+	if (unlikely(result)) {
+		result = -ENODEV;
+		goto out;
+	}
 
 	next_perf_state = data->freq_table[next_state].index;
 	if (perf->state == next_perf_state) {
@@ -425,7 +422,7 @@ static int acpi_cpufreq_target(struct cp
 		} else {
 			dprintk("Already at target state (P%d)\n",
 				next_perf_state);
-			return 0;
+			goto out;
 		}
 	}
 
@@ -444,19 +441,19 @@ static int acpi_cpufreq_target(struct cp
 		cmd.val = (u32) perf->states[next_perf_state].control;
 		break;
 	default:
-		return -ENODEV;
+		result = -ENODEV;
+		goto out;
 	}
 
-	cpus_clear(cmd.mask);
-
+	/* cpufreq holds the hotplug lock, so we are safe from here on */
 	if (policy->shared_type != CPUFREQ_SHARED_TYPE_ANY)
-		cmd.mask = online_policy_cpus;
+		cpumask_and(cmd.mask, cpu_online_mask, policy->cpus);
 	else
-		cpu_set(policy->cpu, cmd.mask);
+		cpumask_copy(cmd.mask, cpumask_of(policy->cpu));
 
 	freqs.old = perf->states[perf->state].core_frequency * 1000;
 	freqs.new = data->freq_table[next_state].frequency;
-	for_each_cpu_mask_nr(i, cmd.mask) {
+	for_each_cpu(i, cmd.mask) {
 		freqs.cpu = i;
 		cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
 	}
@@ -464,19 +461,22 @@ static int acpi_cpufreq_target(struct cp
 	drv_write(&cmd);
 
 	if (acpi_pstate_strict) {
-		if (!check_freqs(&cmd.mask, freqs.new, data)) {
+		if (!check_freqs(cmd.mask, freqs.new, data)) {
 			dprintk("acpi_cpufreq_target failed (%d)\n",
 				policy->cpu);
-			return -EAGAIN;
+			result = -EAGAIN;
+			goto out;
 		}
 	}
 
-	for_each_cpu_mask_nr(i, cmd.mask) {
+	for_each_cpu(i, cmd.mask) {
 		freqs.cpu = i;
 		cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
 	}
 	perf->state = next_perf_state;
 
+out:
+	free_cpumask_var(cmd.mask);
 	return result;
 }
 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 10/11] cpumask: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write
  2009-01-04 13:17 [PATCH 00/11] x86: cpumask: some more cpumask cleanups Mike Travis
                   ` (8 preceding siblings ...)
  2009-01-04 13:18 ` [PATCH 09/11] cpumask: use cpumask_var_t in acpi-cpufreq.c Mike Travis
@ 2009-01-04 13:18 ` Mike Travis
  2009-01-04 13:18 ` [PATCH 11/11] cpumask: use work_on_cpu in acpi-cpufreq.c for read_measured_perf_ctrs Mike Travis
  2009-01-04 14:44 ` [PATCH 00/11] x86: cpumask: some more cpumask cleanups Ingo Molnar
  11 siblings, 0 replies; 24+ messages in thread
From: Mike Travis @ 2009-01-04 13:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rusty Russell, H. Peter Anvin, Thomas Gleixner, Linus Torvalds,
	Jack Steiner, linux-kernel, Dave Jones

[-- Attachment #1: cpumask:use-work_on_cpu-in-acpi-cpufreq_c-p1 --]
[-- Type: text/plain, Size: 2237 bytes --]

Impact: cleanup, reduce stack usage, use new cpumask API.

Replace the saving of current->cpus_allowed and set_cpus_allowed_ptr() with
a work_on_cpu function for drv_read() and drv_write().

Basically converts do_drv_{read,write} into "work_on_cpu" functions that
are now called by drv_read and drv_write.

Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dave Jones <davej@redhat.com>
---
 arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c |   25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

--- linux-2.6-for-ingo.orig/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
+++ linux-2.6-for-ingo/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
@@ -150,8 +150,9 @@ struct drv_cmd {
 	u32 val;
 };
 
-static void do_drv_read(struct drv_cmd *cmd)
+static long do_drv_read(void *_cmd)
 {
+	struct drv_cmd *cmd = _cmd;
 	u32 h;
 
 	switch (cmd->type) {
@@ -166,10 +167,12 @@ static void do_drv_read(struct drv_cmd *
 	default:
 		break;
 	}
+	return 0;
 }
 
-static void do_drv_write(struct drv_cmd *cmd)
+static long do_drv_write(void *_cmd)
 {
+	struct drv_cmd *cmd = _cmd;
 	u32 lo, hi;
 
 	switch (cmd->type) {
@@ -186,30 +189,23 @@ static void do_drv_write(struct drv_cmd 
 	default:
 		break;
 	}
+	return 0;
 }
 
 static void drv_read(struct drv_cmd *cmd)
 {
-	cpumask_t saved_mask = current->cpus_allowed;
 	cmd->val = 0;
 
-	set_cpus_allowed_ptr(current, cmd->mask);
-	do_drv_read(cmd);
-	set_cpus_allowed_ptr(current, &saved_mask);
+	work_on_cpu(cpumask_any(cmd->mask), do_drv_read, cmd);
 }
 
 static void drv_write(struct drv_cmd *cmd)
 {
-	cpumask_t saved_mask = current->cpus_allowed;
 	unsigned int i;
 
 	for_each_cpu(i, cmd->mask) {
-		set_cpus_allowed_ptr(current, cpumask_of(i));
-		do_drv_write(cmd);
+		work_on_cpu(i, do_drv_write, cmd);
 	}
-
-	set_cpus_allowed_ptr(current, &saved_mask);
-	return;
 }
 
 static u32 get_cur_val(const struct cpumask *mask)
@@ -235,10 +231,15 @@ static u32 get_cur_val(const struct cpum
 		return 0;
 	}
 
+	if (unlikely(!alloc_cpumask_var(&cmd.mask, GFP_KERNEL)))
+		return 0;
+
 	cpumask_copy(cmd.mask, mask);
 
 	drv_read(&cmd);
 
+	free_cpumask_var(cmd.mask);
+
 	dprintk("get_cur_val = %u\n", cmd.val);
 
 	return cmd.val;


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 11/11] cpumask: use work_on_cpu in acpi-cpufreq.c for read_measured_perf_ctrs
  2009-01-04 13:17 [PATCH 00/11] x86: cpumask: some more cpumask cleanups Mike Travis
                   ` (9 preceding siblings ...)
  2009-01-04 13:18 ` [PATCH 10/11] cpumask: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write Mike Travis
@ 2009-01-04 13:18 ` Mike Travis
  2009-01-04 14:44 ` [PATCH 00/11] x86: cpumask: some more cpumask cleanups Ingo Molnar
  11 siblings, 0 replies; 24+ messages in thread
From: Mike Travis @ 2009-01-04 13:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rusty Russell, H. Peter Anvin, Thomas Gleixner, Linus Torvalds,
	Jack Steiner, linux-kernel, Dave Jones

[-- Attachment #1: cpumask:use-work_on_cpu-in-acpi-cpufreq_c-p2 --]
[-- Type: text/plain, Size: 4792 bytes --]

Impact: cleanup, reduce stack usage, use new cpumask API

Replace the saving of current->cpus_allowed and set_cpus_allowed_ptr() with
a work_on_cpu function for read_measured_perf_ctrs().

Basically splits off the work function from get_measured_perf which is
run on the designated cpu.  Moves definition of struct perf_cur out of
function local namespace, and is used as the work function argument.
References in get_measured_perf use values in the perf_cur struct.

Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dave Jones <davej@redhat.com>
---
 arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c |   83 +++++++++++++++--------------
 1 file changed, 43 insertions(+), 40 deletions(-)

--- linux-2.6-for-ingo.orig/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
+++ linux-2.6-for-ingo/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
@@ -245,6 +245,30 @@ static u32 get_cur_val(const struct cpum
 	return cmd.val;
 }
 
+struct perf_cur {
+	union {
+		struct {
+			u32 lo;
+			u32 hi;
+		} split;
+		u64 whole;
+	} aperf_cur, mperf_cur;
+};
+
+
+static long read_measured_perf_ctrs(void *_cur)
+{
+	struct perf_cur *cur = _cur;
+
+	rdmsr(MSR_IA32_APERF, cur->aperf_cur.split.lo, cur->aperf_cur.split.hi);
+	rdmsr(MSR_IA32_MPERF, cur->mperf_cur.split.lo, cur->mperf_cur.split.hi);
+
+	wrmsr(MSR_IA32_APERF, 0, 0);
+	wrmsr(MSR_IA32_MPERF, 0, 0);
+
+	return 0;
+}
+
 /*
  * Return the measured active (C0) frequency on this CPU since last call
  * to this function.
@@ -261,31 +285,12 @@ static u32 get_cur_val(const struct cpum
 static unsigned int get_measured_perf(struct cpufreq_policy *policy,
 				      unsigned int cpu)
 {
-	union {
-		struct {
-			u32 lo;
-			u32 hi;
-		} split;
-		u64 whole;
-	} aperf_cur, mperf_cur;
-
-	cpumask_t saved_mask;
+	struct perf_cur cur;
 	unsigned int perf_percent;
 	unsigned int retval;
 
-	saved_mask = current->cpus_allowed;
-	set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
-	if (get_cpu() != cpu) {
-		/* We were not able to run on requested processor */
-		put_cpu();
+	if (!work_on_cpu(cpu, read_measured_perf_ctrs, &cur))
 		return 0;
-	}
-
-	rdmsr(MSR_IA32_APERF, aperf_cur.split.lo, aperf_cur.split.hi);
-	rdmsr(MSR_IA32_MPERF, mperf_cur.split.lo, mperf_cur.split.hi);
-
-	wrmsr(MSR_IA32_APERF, 0,0);
-	wrmsr(MSR_IA32_MPERF, 0,0);
 
 #ifdef __i386__
 	/*
@@ -293,37 +298,39 @@ static unsigned int get_measured_perf(st
 	 * Get an approximate value. Return failure in case we cannot get
 	 * an approximate value.
 	 */
-	if (unlikely(aperf_cur.split.hi || mperf_cur.split.hi)) {
+	if (unlikely(cur.aperf_cur.split.hi || cur.mperf_cur.split.hi)) {
 		int shift_count;
 		u32 h;
 
-		h = max_t(u32, aperf_cur.split.hi, mperf_cur.split.hi);
+		h = max_t(u32, cur.aperf_cur.split.hi, cur.mperf_cur.split.hi);
 		shift_count = fls(h);
 
-		aperf_cur.whole >>= shift_count;
-		mperf_cur.whole >>= shift_count;
+		cur.aperf_cur.whole >>= shift_count;
+		cur.mperf_cur.whole >>= shift_count;
 	}
 
-	if (((unsigned long)(-1) / 100) < aperf_cur.split.lo) {
+	if (((unsigned long)(-1) / 100) < cur.aperf_cur.split.lo) {
 		int shift_count = 7;
-		aperf_cur.split.lo >>= shift_count;
-		mperf_cur.split.lo >>= shift_count;
+		cur.aperf_cur.split.lo >>= shift_count;
+		cur.mperf_cur.split.lo >>= shift_count;
 	}
 
-	if (aperf_cur.split.lo && mperf_cur.split.lo)
-		perf_percent = (aperf_cur.split.lo * 100) / mperf_cur.split.lo;
+	if (cur.aperf_cur.split.lo && cur.mperf_cur.split.lo)
+		perf_percent = (cur.aperf_cur.split.lo * 100) /
+				cur.mperf_cur.split.lo;
 	else
 		perf_percent = 0;
 
 #else
-	if (unlikely(((unsigned long)(-1) / 100) < aperf_cur.whole)) {
+	if (unlikely(((unsigned long)(-1) / 100) < cur.aperf_cur.whole)) {
 		int shift_count = 7;
-		aperf_cur.whole >>= shift_count;
-		mperf_cur.whole >>= shift_count;
+		cur.aperf_cur.whole >>= shift_count;
+		cur.mperf_cur.whole >>= shift_count;
 	}
 
-	if (aperf_cur.whole && mperf_cur.whole)
-		perf_percent = (aperf_cur.whole * 100) / mperf_cur.whole;
+	if (cur.aperf_cur.whole && cur.mperf_cur.whole)
+		perf_percent = (cur.aperf_cur.whole * 100) /
+				cur.mperf_cur.whole;
 	else
 		perf_percent = 0;
 
@@ -331,10 +338,6 @@ static unsigned int get_measured_perf(st
 
 	retval = per_cpu(drv_data, policy->cpu)->max_freq * perf_percent / 100;
 
-	put_cpu();
-	set_cpus_allowed_ptr(current, &saved_mask);
-
-	dprintk("cpu %d: performance percent %d\n", cpu, perf_percent);
 	return retval;
 }
 
@@ -352,7 +355,7 @@ static unsigned int get_cur_freq_on_cpu(
 	}
 
 	cached_freq = data->freq_table[data->acpi_data->state].frequency;
-	freq = extract_freq(get_cur_val(&cpumask_of_cpu(cpu)), data);
+	freq = extract_freq(get_cur_val(cpumask_of(cpu)), data);
 	if (freq != cached_freq) {
 		/*
 		 * The dreaded BIOS frequency change behind our back.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 00/11] x86: cpumask: some more cpumask cleanups
  2009-01-04 13:17 [PATCH 00/11] x86: cpumask: some more cpumask cleanups Mike Travis
                   ` (10 preceding siblings ...)
  2009-01-04 13:18 ` [PATCH 11/11] cpumask: use work_on_cpu in acpi-cpufreq.c for read_measured_perf_ctrs Mike Travis
@ 2009-01-04 14:44 ` Ingo Molnar
  2009-01-05 18:28   ` Mike Travis
  2009-01-06  3:49   ` [PATCH 00/11] x86: cpumask: some more cpumask cleanups - flush_tlb_* Mike Travis
  11 siblings, 2 replies; 24+ messages in thread
From: Ingo Molnar @ 2009-01-04 14:44 UTC (permalink / raw)
  To: Mike Travis
  Cc: Ingo Molnar, Rusty Russell, H. Peter Anvin, Thomas Gleixner,
	Linus Torvalds, Jack Steiner, linux-kernel


* Mike Travis <travis@sgi.com> wrote:

> Here's some more cpumask cleanups.
> 
>     ia64: cpumask fix for is_affinity_mask_valid()
>     cpumask: update local_cpus_show to use new cpumask API
>     cpumask: update pci_bus_show_cpuaffinity to use new cpumask API
>     x86: cleanup remaining cpumask_t ops in smpboot code
>     x86: clean up speedstep-centrino and reduce cpumask_t usage
>     cpumask: Replace CPUMASK_ALLOC etc with cpumask_var_t.
>     cpumask: convert struct cpufreq_policy to cpumask_var_t.
>     cpumask: use work_on_cpu in acpi/cstate.c
>     cpumask: use cpumask_var_t in acpi-cpufreq.c
>     cpumask: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write
>     cpumask: use work_on_cpu in acpi-cpufreq.c for read_measured_perf_ctrs
> 
> This version basically splits out the changes to make it more 
> bisectable, and has been patch-wise compile/boot tested.  Updated stats 
> are below.

ok, i've picked them up into tip/cpus4096:

1d1a70e: cpumask: use work_on_cpu in acpi-cpufreq.c for read_measured_perf_ctrs
4d30e6b: cpumask: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write
0771cd4: cpumask: use cpumask_var_t in acpi-cpufreq.c
9fa9864: cpumask: use work_on_cpu in acpi/cstate.c
a2a8809: cpumask: convert struct cpufreq_policy to cpumask_var_t
ee557bd: cpumask: replace CPUMASK_ALLOC etc with cpumask_var_t
3744123: x86: clean up speedstep-centrino and reduce cpumask_t usage
c2d1cec: x86: cleanup remaining cpumask_t ops in smpboot code
588235b: cpumask: update pci_bus_show_cpuaffinity to use new cpumask API
3be8305: cpumask: update local_cpus_show to use new cpumask API
d3b66bf: ia64: cpumask fix for is_affinity_mask_valid()

( Sidenote, your mail scripts have a bug that do this to the Subject line:

    Subject: [PATCH 05/11] x86: clean up speedstep-centrino and reduce 
    cpumask_t usage From: Rusty Russell <rusty@rustcorp.com.au>

  i've fixed them up manually so that Rusty is in the Author field. )


> The number of stack hogs have been significantly reduced:
> 
> ====== Stack (-l 500)
>     1 - allyesconfig-128
>     2 - allyesconfig-4k
> 
>   .1.    .2.    ..final..
>     0  +1032   1032      .  flush_tlb_page
>     0  +1024   1024      .  kvm_reload_remote_mmus
>     0  +1024   1024      .  kvm_flush_remote_tlbs
>     0  +1024   1024      .  flush_tlb_mm
>     0  +1024   1024      .  flush_tlb_current_task

Quite good! Can we fix those TLB flush cpumask uses too?

> And the overall memory usage is becoming quite less affected by changing
> NR_CPUS from 128 to 4096:
[...]
>         .1.       .2.    ..final..
>    11436936  +4167424    15604360   +36%  .bss

.bss seems to account for ~80% of the increase. Are these static cpumasks, 
or do we still have NR_CPUS arrays around?

	Ingo

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 00/11] x86: cpumask: some more cpumask cleanups
  2009-01-04 14:44 ` [PATCH 00/11] x86: cpumask: some more cpumask cleanups Ingo Molnar
@ 2009-01-05 18:28   ` Mike Travis
  2009-01-06  3:49   ` [PATCH 00/11] x86: cpumask: some more cpumask cleanups - flush_tlb_* Mike Travis
  1 sibling, 0 replies; 24+ messages in thread
From: Mike Travis @ 2009-01-05 18:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ingo Molnar, Rusty Russell, H. Peter Anvin, Thomas Gleixner,
	Linus Torvalds, Jack Steiner, linux-kernel

Ingo Molnar wrote:
> * Mike Travis <travis@sgi.com> wrote:
> 
>> Here's some more cpumask cleanups.
>>
>>     ia64: cpumask fix for is_affinity_mask_valid()
>>     cpumask: update local_cpus_show to use new cpumask API
>>     cpumask: update pci_bus_show_cpuaffinity to use new cpumask API
>>     x86: cleanup remaining cpumask_t ops in smpboot code
>>     x86: clean up speedstep-centrino and reduce cpumask_t usage
>>     cpumask: Replace CPUMASK_ALLOC etc with cpumask_var_t.
>>     cpumask: convert struct cpufreq_policy to cpumask_var_t.
>>     cpumask: use work_on_cpu in acpi/cstate.c
>>     cpumask: use cpumask_var_t in acpi-cpufreq.c
>>     cpumask: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write
>>     cpumask: use work_on_cpu in acpi-cpufreq.c for read_measured_perf_ctrs
>>
>> This version basically splits out the changes to make it more 
>> bisectable, and has been patch-wise compile/boot tested.  Updated stats 
>> are below.
> 
> ok, i've picked them up into tip/cpus4096:

Thanks Ingo!

> 
> 1d1a70e: cpumask: use work_on_cpu in acpi-cpufreq.c for read_measured_perf_ctrs
> 4d30e6b: cpumask: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write
> 0771cd4: cpumask: use cpumask_var_t in acpi-cpufreq.c
> 9fa9864: cpumask: use work_on_cpu in acpi/cstate.c
> a2a8809: cpumask: convert struct cpufreq_policy to cpumask_var_t
> ee557bd: cpumask: replace CPUMASK_ALLOC etc with cpumask_var_t
> 3744123: x86: clean up speedstep-centrino and reduce cpumask_t usage
> c2d1cec: x86: cleanup remaining cpumask_t ops in smpboot code
> 588235b: cpumask: update pci_bus_show_cpuaffinity to use new cpumask API
> 3be8305: cpumask: update local_cpus_show to use new cpumask API
> d3b66bf: ia64: cpumask fix for is_affinity_mask_valid()
> 
> ( Sidenote, your mail scripts have a bug that do this to the Subject line:
> 
>     Subject: [PATCH 05/11] x86: clean up speedstep-centrino and reduce 
>     cpumask_t usage From: Rusty Russell <rusty@rustcorp.com.au>

It's in quilt mail (even the latest version), but since it's a script, I'll
see about fixing it manually.

> 
>   i've fixed them up manually so that Rusty is in the Author field. )
> 
> 
>> The number of stack hogs have been significantly reduced:
>>
>> ====== Stack (-l 500)
>>     1 - allyesconfig-128
>>     2 - allyesconfig-4k
>>
>>   .1.    .2.    ..final..
>>     0  +1032   1032      .  flush_tlb_page
>>     0  +1024   1024      .  kvm_reload_remote_mmus
>>     0  +1024   1024      .  kvm_flush_remote_tlbs
>>     0  +1024   1024      .  flush_tlb_mm
>>     0  +1024   1024      .  flush_tlb_current_task
> 
> Quite good! Can we fix those TLB flush cpumask uses too?

I've looked at the tlb ones and they are hairy.  But we now have a few more
facilities in place so I'll revisit them.

> 
>> And the overall memory usage is becoming quite less affected by changing
>> NR_CPUS from 128 to 4096:
> [...]
>>         .1.       .2.    ..final..
>>    11436936  +4167424    15604360   +36%  .bss
> 
> .bss seems to account for ~80% of the increase. Are these static cpumasks, 
> or do we still have NR_CPUS arrays around?

There are 72 arrays still using NR_CPUS (though some legitimately) and 14 static
cpumask_t's and 11 "DECLARE_BITMAP(..., NR_CPUS)".

There are also about 5 patches left in my queue that need further testing with
the latest tip code.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 00/11] x86: cpumask: some more cpumask cleanups - flush_tlb_*
  2009-01-04 14:44 ` [PATCH 00/11] x86: cpumask: some more cpumask cleanups Ingo Molnar
  2009-01-05 18:28   ` Mike Travis
@ 2009-01-06  3:49   ` Mike Travis
  2009-01-07  2:12     ` Rusty Russell
  1 sibling, 1 reply; 24+ messages in thread
From: Mike Travis @ 2009-01-06  3:49 UTC (permalink / raw)
  To: Ingo Molnar, Rusty Russell
  Cc: H. Peter Anvin, Thomas Gleixner, Linus Torvalds, Jack Steiner,
	Cliff Wickman, Nick Piggin, Jeremy Fitzhardinge,
	Christoph Lameter, Jes Sorensen, LKML

Ingo Molnar wrote:
...
>> ====== Stack (-l 500)
>>     1 - allyesconfig-128
>>     2 - allyesconfig-4k
>>
>>   .1.    .2.    ..final..
>>     0  +1032   1032      .  flush_tlb_page
>>     0  +1024   1024      .  kvm_reload_remote_mmus
>>     0  +1024   1024      .  kvm_flush_remote_tlbs
>>     0  +1024   1024      .  flush_tlb_mm
>>     0  +1024   1024      .  flush_tlb_current_task
> 
> Quite good! Can we fix those TLB flush cpumask uses too?

Here is one proposal.  I don't like increasing PER_CPU area but eventually we
should be able to per_cpu_alloc the cpumasks so they only take up enough room
as needed.  Also, it was unclear whether one scratch cpumask work suffice for
all three flush_tlb_* functions or a separate one for each was needed.  I
went for the safe route and used one for each function.

An alternate approach might be to add a scratch cpumask to the smp_flush_state
struct.  It already has one fixed cpumask_t (flush_cpumask) so this whole
struct should per_cpu_alloc'd.  (And since it's cacheline_aligned, then a
per_cpu_alloc_aligned() will probably be needed?)

(btw, only 64-bit shown here, 32-bit needs similar changes.)

Thanks,
Mike

---
Subject: cpumask: remove cpumask_t from stack for flush_tlb_xxx

Remove the scratch cpumask_t from the stack for the flush_tlb_*
functions.  Since we're disabling preemption, we can use a
one-per-cpu scratch cpumask.  "Un-const" the cpumask pointer arg
in native_flush_tlb_others as one of the flush_others functions
(uv) changed the cpumask.  I believe this is safe.

Signed-off-by: Mike Travis <travis@sgi.com>
---
 arch/x86/include/asm/paravirt.h  |    8 ++---
 arch/x86/include/asm/tlbflush.h  |    4 +-
 arch/x86/include/asm/uv/uv_bau.h |    3 +-
 arch/x86/kernel/tlb_64.c         |   53 +++++++++++++++++++++------------------
 arch/x86/kernel/tlb_uv.c         |   12 ++++----
 5 files changed, 43 insertions(+), 37 deletions(-)

--- linux-2.6-for-ingo.orig/arch/x86/include/asm/paravirt.h
+++ linux-2.6-for-ingo/arch/x86/include/asm/paravirt.h
@@ -244,7 +244,7 @@ struct pv_mmu_ops {
 	void (*flush_tlb_user)(void);
 	void (*flush_tlb_kernel)(void);
 	void (*flush_tlb_single)(unsigned long addr);
-	void (*flush_tlb_others)(const cpumask_t *cpus, struct mm_struct *mm,
+	void (*flush_tlb_others)(struct cpumask *cpus, struct mm_struct *mm,
 				 unsigned long va);
 
 	/* Hooks for allocating and freeing a pagetable top-level */
@@ -984,10 +984,10 @@ static inline void __flush_tlb_single(un
 	PVOP_VCALL1(pv_mmu_ops.flush_tlb_single, addr);
 }
 
-static inline void flush_tlb_others(cpumask_t cpumask, struct mm_struct *mm,
-				    unsigned long va)
+static inline void flush_tlb_others(struct cpumask *cpumask,
+				    struct mm_struct *mm, unsigned long va)
 {
-	PVOP_VCALL3(pv_mmu_ops.flush_tlb_others, &cpumask, mm, va);
+	PVOP_VCALL3(pv_mmu_ops.flush_tlb_others, cpumask, mm, va);
 }
 
 static inline int paravirt_pgd_alloc(struct mm_struct *mm)
--- linux-2.6-for-ingo.orig/arch/x86/include/asm/tlbflush.h
+++ linux-2.6-for-ingo/arch/x86/include/asm/tlbflush.h
@@ -142,7 +142,7 @@ static inline void flush_tlb_range(struc
 	flush_tlb_mm(vma->vm_mm);
 }
 
-void native_flush_tlb_others(const cpumask_t *cpumask, struct mm_struct *mm,
+void native_flush_tlb_others(struct cpumask *cpumask, struct mm_struct *mm,
 			     unsigned long va);
 
 #define TLBSTATE_OK	1
@@ -166,7 +166,7 @@ static inline void reset_lazy_tlbstate(v
 #endif	/* SMP */
 
 #ifndef CONFIG_PARAVIRT
-#define flush_tlb_others(mask, mm, va)	native_flush_tlb_others(&mask, mm, va)
+#define flush_tlb_others(mask, mm, va)	native_flush_tlb_others(mask, mm, va)
 #endif
 
 static inline void flush_tlb_kernel_range(unsigned long start,
--- linux-2.6-for-ingo.orig/arch/x86/include/asm/uv/uv_bau.h
+++ linux-2.6-for-ingo/arch/x86/include/asm/uv/uv_bau.h
@@ -325,7 +325,8 @@ static inline void bau_cpubits_clear(str
 #define cpubit_isset(cpu, bau_local_cpumask) \
 	test_bit((cpu), (bau_local_cpumask).bits)
 
-extern int uv_flush_tlb_others(cpumask_t *, struct mm_struct *, unsigned long);
+extern int uv_flush_tlb_others(struct cpumask *, struct mm_struct *,
+			       unsigned long);
 extern void uv_bau_message_intr1(void);
 extern void uv_bau_timeout_intr1(void);
 
--- linux-2.6-for-ingo.orig/arch/x86/kernel/tlb_64.c
+++ linux-2.6-for-ingo/arch/x86/kernel/tlb_64.c
@@ -157,14 +157,13 @@ out:
 	inc_irq_stat(irq_tlb_count);
 }
 
-void native_flush_tlb_others(const cpumask_t *cpumaskp, struct mm_struct *mm,
+void native_flush_tlb_others(struct cpumask *cpumask, struct mm_struct *mm,
 			     unsigned long va)
 {
 	int sender;
 	union smp_flush_state *f;
-	cpumask_t cpumask = *cpumaskp;
 
-	if (is_uv_system() && uv_flush_tlb_others(&cpumask, mm, va))
+	if (is_uv_system() && uv_flush_tlb_others(cpumask, mm, va))
 		return;
 
 	/* Caller has disabled preemption */
@@ -180,7 +179,7 @@ void native_flush_tlb_others(const cpuma
 
 	f->flush_mm = mm;
 	f->flush_va = va;
-	cpus_or(f->flush_cpumask, cpumask, f->flush_cpumask);
+	cpumask_or(&f->flush_cpumask, cpumask, &f->flush_cpumask);
 
 	/*
 	 * Make the above memory operations globally visible before
@@ -191,9 +190,9 @@ void native_flush_tlb_others(const cpuma
 	 * We have to send the IPI only to
 	 * CPUs affected.
 	 */
-	send_IPI_mask(&cpumask, INVALIDATE_TLB_VECTOR_START + sender);
+	send_IPI_mask(cpumask, INVALIDATE_TLB_VECTOR_START + sender);
 
-	while (!cpus_empty(f->flush_cpumask))
+	while (!cpumask_empty(&f->flush_cpumask))
 		cpu_relax();
 
 	f->flush_mm = NULL;
@@ -212,28 +211,32 @@ static int __cpuinit init_smp_flush(void
 }
 core_initcall(init_smp_flush);
 
+static DEFINE_PER_CPU(cpumask_t, flush_tlb_task_cpumask);
+
 void flush_tlb_current_task(void)
 {
 	struct mm_struct *mm = current->mm;
-	cpumask_t cpu_mask;
+	struct cpumask *cpu_mask;
 
-	preempt_disable();
-	cpu_mask = mm->cpu_vm_mask;
-	cpu_clear(smp_processor_id(), cpu_mask);
+	cpu_mask = &get_cpu_var(flush_tlb_task_cpumask);
+	cpumask_copy(cpu_mask, &mm->cpu_vm_mask);
+	cpumask_clear_cpu(smp_processor_id(), cpu_mask);
 
 	local_flush_tlb();
-	if (!cpus_empty(cpu_mask))
+	if (!cpumask_empty(cpu_mask))
 		flush_tlb_others(cpu_mask, mm, TLB_FLUSH_ALL);
-	preempt_enable();
+	put_cpu_var(flush_tlb_task_cpumask);
 }
 
+static DEFINE_PER_CPU(cpumask_t, flush_tlb_mm_cpumask);
+
 void flush_tlb_mm(struct mm_struct *mm)
 {
-	cpumask_t cpu_mask;
+	struct cpumask * cpu_mask;
 
-	preempt_disable();
-	cpu_mask = mm->cpu_vm_mask;
-	cpu_clear(smp_processor_id(), cpu_mask);
+	cpu_mask = &get_cpu_var(flush_tlb_mm_cpumask);
+	cpumask_copy(cpu_mask, &mm->cpu_vm_mask);
+	cpumask_clear_cpu(smp_processor_id(), cpu_mask);
 
 	if (current->active_mm == mm) {
 		if (current->mm)
@@ -241,20 +244,22 @@ void flush_tlb_mm(struct mm_struct *mm)
 		else
 			leave_mm(smp_processor_id());
 	}
-	if (!cpus_empty(cpu_mask))
+	if (!cpumask_empty(cpu_mask))
 		flush_tlb_others(cpu_mask, mm, TLB_FLUSH_ALL);
 
-	preempt_enable();
+	put_cpu_var(flush_tlb_mm_cpumask);
 }
 
+static DEFINE_PER_CPU(cpumask_t, flush_tlb_page_cpumask);
+
 void flush_tlb_page(struct vm_area_struct *vma, unsigned long va)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	cpumask_t cpu_mask;
+	struct cpumask *cpu_mask;
 
-	preempt_disable();
-	cpu_mask = mm->cpu_vm_mask;
-	cpu_clear(smp_processor_id(), cpu_mask);
+	cpu_mask = &get_cpu_var(flush_tlb_page_cpumask);
+	cpumask_copy(cpu_mask, &mm->cpu_vm_mask);
+	cpumask_clear_cpu(smp_processor_id(), cpu_mask);
 
 	if (current->active_mm == mm) {
 		if (current->mm)
@@ -263,10 +268,10 @@ void flush_tlb_page(struct vm_area_struc
 			leave_mm(smp_processor_id());
 	}
 
-	if (!cpus_empty(cpu_mask))
+	if (!cpumask_empty(cpu_mask))
 		flush_tlb_others(cpu_mask, mm, va);
 
-	preempt_enable();
+	put_cpu_var(flush_tlb_page_cpumask);
 }
 
 static void do_flush_tlb_all(void *info)
--- linux-2.6-for-ingo.orig/arch/x86/kernel/tlb_uv.c
+++ linux-2.6-for-ingo/arch/x86/kernel/tlb_uv.c
@@ -216,7 +216,7 @@ static int uv_wait_completion(struct bau
  * unchanged.
  */
 int uv_flush_send_and_wait(int cpu, int this_blade, struct bau_desc *bau_desc,
-			   cpumask_t *cpumaskp)
+			   struct cpumask *cpumaskp)
 {
 	int completion_status = 0;
 	int right_shift;
@@ -263,13 +263,13 @@ int uv_flush_send_and_wait(int cpu, int 
 	 * Success, so clear the remote cpu's from the mask so we don't
 	 * use the IPI method of shootdown on them.
 	 */
-	for_each_cpu_mask(bit, *cpumaskp) {
+	for_each_cpu(bit, cpumaskp) {
 		blade = uv_cpu_to_blade_id(bit);
 		if (blade == this_blade)
 			continue;
-		cpu_clear(bit, *cpumaskp);
+		cpumask_clear_cpu(bit, cpumaskp);
 	}
-	if (!cpus_empty(*cpumaskp))
+	if (!cpumask_empty(cpumaskp))
 		return 0;
 	return 1;
 }
@@ -296,7 +296,7 @@ int uv_flush_send_and_wait(int cpu, int 
  * Returns 1 if all remote flushing was done.
  * Returns 0 if some remote flushing remains to be done.
  */
-int uv_flush_tlb_others(cpumask_t *cpumaskp, struct mm_struct *mm,
+int uv_flush_tlb_others(struct cpumask *cpumaskp, struct mm_struct *mm,
 			unsigned long va)
 {
 	int i;
@@ -315,7 +315,7 @@ int uv_flush_tlb_others(cpumask_t *cpuma
 	bau_nodes_clear(&bau_desc->distribution, UV_DISTRIBUTION_SIZE);
 
 	i = 0;
-	for_each_cpu_mask(bit, *cpumaskp) {
+	for_each_cpu(bit, cpumaskp) {
 		blade = uv_cpu_to_blade_id(bit);
 		BUG_ON(blade > (UV_DISTRIBUTION_SIZE - 1));
 		if (blade == this_blade) {

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 00/11] x86: cpumask: some more cpumask cleanups - flush_tlb_*
  2009-01-06  3:49   ` [PATCH 00/11] x86: cpumask: some more cpumask cleanups - flush_tlb_* Mike Travis
@ 2009-01-07  2:12     ` Rusty Russell
  2009-01-07  2:50       ` Mike Travis
  0 siblings, 1 reply; 24+ messages in thread
From: Rusty Russell @ 2009-01-07  2:12 UTC (permalink / raw)
  To: Mike Travis
  Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, Linus Torvalds,
	Jack Steiner, Cliff Wickman, Nick Piggin, Jeremy Fitzhardinge,
	Christoph Lameter, Jes Sorensen, LKML

On Tuesday 06 January 2009 14:19:35 Mike Travis wrote:
> Ingo Molnar wrote:
> > Quite good! Can we fix those TLB flush cpumask uses too?
> 
> Here is one proposal.

Here's what I had.  It's untested though...

x86: change flush_tlb_others to take a const struct cpumask *. FIXME: REVIEW

This is made a little more tricky by uv_flush_tlb_others which
actually alters its argument, for an IPI to be sent to the remaining
cpus in the mask.

I solve this by allocating a cpumask_var_t for this case and falling back
to IPI should this fail.

To eliminate temporaries in the caller, all flush_tlb_others implementations
now do the this-cpu-elimination step themselves.

Note also the curious "cpus_or(f->flush_cpumask, cpumask, f->flush_cpumask)"
which has been there since pre-git and yet f->flush_cpumask is always zero
at this point.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 arch/x86/include/asm/paravirt.h  |    8 ++--
 arch/x86/include/asm/tlbflush.h  |    6 +--
 arch/x86/include/asm/uv/uv_bau.h |    3 +
 arch/x86/kernel/tlb_32.c         |   69 ++++++++++++++++-----------------------
 arch/x86/kernel/tlb_64.c         |   62 ++++++++++++++++++-----------------
 arch/x86/kernel/tlb_uv.c         |   16 ++++-----
 arch/x86/xen/enlighten.c         |   31 ++++++-----------
 7 files changed, 92 insertions(+), 103 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -244,7 +244,8 @@ struct pv_mmu_ops {
 	void (*flush_tlb_user)(void);
 	void (*flush_tlb_kernel)(void);
 	void (*flush_tlb_single)(unsigned long addr);
-	void (*flush_tlb_others)(const cpumask_t *cpus, struct mm_struct *mm,
+	void (*flush_tlb_others)(const struct cpumask *cpus,
+				 struct mm_struct *mm,
 				 unsigned long va);
 
 	/* Hooks for allocating and freeing a pagetable top-level */
@@ -984,10 +985,11 @@ static inline void __flush_tlb_single(un
 	PVOP_VCALL1(pv_mmu_ops.flush_tlb_single, addr);
 }
 
-static inline void flush_tlb_others(cpumask_t cpumask, struct mm_struct *mm,
+static inline void flush_tlb_others(const struct cpumask *cpumask,
+				    struct mm_struct *mm,
 				    unsigned long va)
 {
-	PVOP_VCALL3(pv_mmu_ops.flush_tlb_others, &cpumask, mm, va);
+	PVOP_VCALL3(pv_mmu_ops.flush_tlb_others, cpumask, mm, va);
 }
 
 static inline int paravirt_pgd_alloc(struct mm_struct *mm)
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -113,7 +113,7 @@ static inline void flush_tlb_range(struc
 		__flush_tlb();
 }
 
-static inline void native_flush_tlb_others(const cpumask_t *cpumask,
+static inline void native_flush_tlb_others(const struct cpumask *cpumask,
 					   struct mm_struct *mm,
 					   unsigned long va)
 {
@@ -142,8 +142,8 @@ static inline void flush_tlb_range(struc
 	flush_tlb_mm(vma->vm_mm);
 }
 
-void native_flush_tlb_others(const cpumask_t *cpumask, struct mm_struct *mm,
-			     unsigned long va);
+void native_flush_tlb_others(const struct cpumask *cpumask,
+			     struct mm_struct *mm, unsigned long va);
 
 #define TLBSTATE_OK	1
 #define TLBSTATE_LAZY	2
diff --git a/arch/x86/include/asm/uv/uv_bau.h b/arch/x86/include/asm/uv/uv_bau.h
--- a/arch/x86/include/asm/uv/uv_bau.h
+++ b/arch/x86/include/asm/uv/uv_bau.h
@@ -325,7 +325,8 @@ static inline void bau_cpubits_clear(str
 #define cpubit_isset(cpu, bau_local_cpumask) \
 	test_bit((cpu), (bau_local_cpumask).bits)
 
-extern int uv_flush_tlb_others(cpumask_t *, struct mm_struct *, unsigned long);
+extern int uv_flush_tlb_others(const struct cpumask *,
+			       struct mm_struct *, unsigned long);
 extern void uv_bau_message_intr1(void);
 extern void uv_bau_timeout_intr1(void);
 
diff --git a/arch/x86/kernel/tlb_32.c b/arch/x86/kernel/tlb_32.c
--- a/arch/x86/kernel/tlb_32.c
+++ b/arch/x86/kernel/tlb_32.c
@@ -20,7 +20,7 @@ DEFINE_PER_CPU(struct tlb_state, cpu_tlb
  *	Optimizations Manfred Spraul <manfred@colorfullife.com>
  */
 
-static cpumask_t flush_cpumask;
+static cpumask_var_t flush_cpumask;
 static struct mm_struct *flush_mm;
 static unsigned long flush_va;
 static DEFINE_SPINLOCK(tlbstate_lock);
@@ -93,7 +93,7 @@ void smp_invalidate_interrupt(struct pt_
 
 	cpu = get_cpu();
 
-	if (!cpu_isset(cpu, flush_cpumask))
+	if (!cpumask_test_cpu(cpu, flush_cpumask))
 		goto out;
 		/*
 		 * This was a BUG() but until someone can quote me the
@@ -115,34 +115,21 @@ void smp_invalidate_interrupt(struct pt_
 	}
 	ack_APIC_irq();
 	smp_mb__before_clear_bit();
-	cpu_clear(cpu, flush_cpumask);
+	cpumask_clear_cpu(cpu, flush_cpumask);
 	smp_mb__after_clear_bit();
 out:
 	put_cpu_no_resched();
 	inc_irq_stat(irq_tlb_count);
 }
 
-void native_flush_tlb_others(const cpumask_t *cpumaskp, struct mm_struct *mm,
-			     unsigned long va)
+void native_flush_tlb_others(const struct cpumask *cpumaskp,
+			     struct mm_struct *mm, unsigned long va)
 {
-	cpumask_t cpumask = *cpumaskp;
-
 	/*
-	 * A couple of (to be removed) sanity checks:
-	 *
-	 * - current CPU must not be in mask
 	 * - mask must exist :)
 	 */
-	BUG_ON(cpus_empty(cpumask));
-	BUG_ON(cpu_isset(smp_processor_id(), cpumask));
+	BUG_ON(cpumask_empty(cpumask));
 	BUG_ON(!mm);
-
-#ifdef CONFIG_HOTPLUG_CPU
-	/* If a CPU which we ran on has gone down, OK. */
-	cpus_and(cpumask, cpumask, cpu_online_map);
-	if (unlikely(cpus_empty(cpumask)))
-		return;
-#endif
 
 	/*
 	 * i'm not happy about this global shared spinlock in the
@@ -151,9 +138,17 @@ void native_flush_tlb_others(const cpuma
 	 */
 	spin_lock(&tlbstate_lock);
 
+	cpumask_andnot(flush_cpumask, cpumask, cpumask_of(smp_processor_id()));
+#ifdef CONFIG_HOTPLUG_CPU
+	/* If a CPU which we ran on has gone down, OK. */
+	cpumask_and(flush_cpumask, flush_cpumask, cpu_online_mask);
+	if (unlikely(cpumask_empty(flush_cpumask))) {
+		spin_unlock(&tlbstate_lock);
+		return;
+	}
+#endif
 	flush_mm = mm;
 	flush_va = va;
-	cpus_or(flush_cpumask, cpumask, flush_cpumask);
 
 	/*
 	 * Make the above memory operations globally visible before
@@ -164,9 +159,9 @@ void native_flush_tlb_others(const cpuma
 	 * We have to send the IPI only to
 	 * CPUs affected.
 	 */
-	send_IPI_mask(&cpumask, INVALIDATE_TLB_VECTOR);
+	send_IPI_mask(flush_cpumask, INVALIDATE_TLB_VECTOR);
 
-	while (!cpus_empty(flush_cpumask))
+	while (!cpumask_empty(flush_cpumask))
 		/* nothing. lockup detection does not belong here */
 		cpu_relax();
 
@@ -178,25 +173,18 @@ void flush_tlb_current_task(void)
 void flush_tlb_current_task(void)
 {
 	struct mm_struct *mm = current->mm;
-	cpumask_t cpu_mask;
 
 	preempt_disable();
-	cpu_mask = *mm->cpu_vm_mask;
-	cpu_clear(smp_processor_id(), cpu_mask);
 
 	local_flush_tlb();
-	if (!cpus_empty(cpu_mask))
-		flush_tlb_others(cpu_mask, mm, TLB_FLUSH_ALL);
+	if (cpumask_any_but(mm->cpu_vm_mask, smp_processor_id()) < nr_cpu_ids)
+		flush_tlb_others(mm->cpu_vm_mask, mm, TLB_FLUSH_ALL);
 	preempt_enable();
 }
 
 void flush_tlb_mm(struct mm_struct *mm)
 {
-	cpumask_t cpu_mask;
-
 	preempt_disable();
-	cpu_mask = *mm->cpu_vm_mask;
-	cpu_clear(smp_processor_id(), cpu_mask);
 
 	if (current->active_mm == mm) {
 		if (current->mm)
@@ -204,8 +192,8 @@ void flush_tlb_mm(struct mm_struct *mm)
 		else
 			leave_mm(smp_processor_id());
 	}
-	if (!cpus_empty(cpu_mask))
-		flush_tlb_others(cpu_mask, mm, TLB_FLUSH_ALL);
+	if (cpumask_any_but(mm->cpu_vm_mask, smp_processor_id()) < nr_cpu_ids)
+		flush_tlb_others(mm->cpu_vm_mask, mm, TLB_FLUSH_ALL);
 
 	preempt_enable();
 }
@@ -213,12 +201,8 @@ void flush_tlb_page(struct vm_area_struc
 void flush_tlb_page(struct vm_area_struct *vma, unsigned long va)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	cpumask_t cpu_mask;
 
 	preempt_disable();
-	cpu_mask = *mm->cpu_vm_mask;
-	cpu_clear(smp_processor_id(), cpu_mask);
-
 	if (current->active_mm == mm) {
 		if (current->mm)
 			__flush_tlb_one(va);
@@ -226,9 +210,8 @@ void flush_tlb_page(struct vm_area_struc
 			leave_mm(smp_processor_id());
 	}
 
-	if (!cpus_empty(cpu_mask))
-		flush_tlb_others(cpu_mask, mm, va);
-
+	if (cpumask_any_but(mm->cpu_vm_mask, smp_processor_id()) < nr_cpu_ids)
+		flush_tlb_others(mm->cpu_vm_mask, mm, va);
 	preempt_enable();
 }
 EXPORT_SYMBOL(flush_tlb_page);
@@ -255,3 +238,9 @@ void reset_lazy_tlbstate(void)
 	per_cpu(cpu_tlbstate, cpu).active_mm = &init_mm;
 }
 
+static int init_flush_cpumask(void)
+{
+	alloc_cpumask_var(&flush_cpumask, GFP_KERNEL);
+	return 0;
+}
+early_initcall(init_flush_cpumask);
diff --git a/arch/x86/kernel/tlb_64.c b/arch/x86/kernel/tlb_64.c
--- a/arch/x86/kernel/tlb_64.c
+++ b/arch/x86/kernel/tlb_64.c
@@ -43,10 +43,10 @@
 
 union smp_flush_state {
 	struct {
-		cpumask_t flush_cpumask;
 		struct mm_struct *flush_mm;
 		unsigned long flush_va;
 		spinlock_t tlbstate_lock;
+		DECLARE_BITMAP(flush_cpumask, NR_CPUS);
 	};
 	char pad[SMP_CACHE_BYTES];
 } ____cacheline_aligned;
@@ -131,7 +131,7 @@ asmlinkage void smp_invalidate_interrupt
 	sender = ~regs->orig_ax - INVALIDATE_TLB_VECTOR_START;
 	f = &per_cpu(flush_state, sender);
 
-	if (!cpu_isset(cpu, f->flush_cpumask))
+	if (!cpumask_test_cpu(cpu, to_cpumask(f->flush_cpumask)))
 		goto out;
 		/*
 		 * This was a BUG() but until someone can quote me the
@@ -153,19 +153,15 @@ asmlinkage void smp_invalidate_interrupt
 	}
 out:
 	ack_APIC_irq();
-	cpu_clear(cpu, f->flush_cpumask);
+	cpumask_clear_cpu(cpu, f->flush_cpumask);
 	inc_irq_stat(irq_tlb_count);
 }
 
-void native_flush_tlb_others(const cpumask_t *cpumaskp, struct mm_struct *mm,
-			     unsigned long va)
+static void flush_tlb_others_ipi(const struct cpumask *cpumaskp,
+				 struct mm_struct *mm, unsigned long va)
 {
 	int sender;
 	union smp_flush_state *f;
-	cpumask_t cpumask = *cpumaskp;
-
-	if (is_uv_system() && uv_flush_tlb_others(&cpumask, mm, va))
-		return;
 
 	/* Caller has disabled preemption */
 	sender = smp_processor_id() % NUM_INVALIDATE_TLB_VECTORS;
@@ -180,7 +176,8 @@ void native_flush_tlb_others(const cpuma
 
 	f->flush_mm = mm;
 	f->flush_va = va;
-	cpus_or(f->flush_cpumask, cpumask, f->flush_cpumask);
+	cpumask_andnot(to_cpumask(f->flush_cpumask),
+		       cpumask, cpumask_of(smp_processor_id));
 
 	/*
 	 * Make the above memory operations globally visible before
@@ -191,14 +188,32 @@ void native_flush_tlb_others(const cpuma
 	 * We have to send the IPI only to
 	 * CPUs affected.
 	 */
-	send_IPI_mask(&cpumask, INVALIDATE_TLB_VECTOR_START + sender);
+	send_IPI_mask(cpumask, INVALIDATE_TLB_VECTOR_START + sender);
 
-	while (!cpus_empty(f->flush_cpumask))
+	while (!cpumask_empty(to_cpumask(f->flush_cpumask)))
 		cpu_relax();
 
 	f->flush_mm = NULL;
 	f->flush_va = 0;
 	spin_unlock(&f->tlbstate_lock);
+
+
+void native_flush_tlb_others(const struct cpumask *cpumask,
+			     struct mm_struct *mm, unsigned long va)
+{
+	if (is_uv_system()) {
+		cpumask_var_t after_uv_flush;
+
+		if (alloc_cpumask_var(&after_uv_flush, GFP_ATOMIC)) {
+			cpumask_andnot(after_uv_flush,
+				       cpumask, cpumask_of(smp_processor_id()));
+			if (!uv_flush_tlb_others(after_uv_flush, mm, va))
+				flush_tlb_others_ipi(after_uv_flush, mm, va);
+			free_cpumask_var(after_uv_flush);
+			return;
+		}
+	}
+	flush_tlb_others_ipi(cpumask, mm, va);
 }
 
 static int __cpuinit init_smp_flush(void)
@@ -215,34 +230,26 @@ void flush_tlb_current_task(void)
 void flush_tlb_current_task(void)
 {
 	struct mm_struct *mm = current->mm;
-	cpumask_t cpu_mask;
 
 	preempt_disable();
-	cpu_mask = *mm->cpu_vm_mask;
-	cpu_clear(smp_processor_id(), cpu_mask);
-
 	local_flush_tlb();
-	if (!cpus_empty(cpu_mask))
-		flush_tlb_others(cpu_mask, mm, TLB_FLUSH_ALL);
+	if (cpumask_any_but(mm->cpu_vm_mask, smp_processor_id()) < nr_cpu_ids)
+		flush_tlb_others(mm->cpu_vm_mask, mm, TLB_FLUSH_ALL);
 	preempt_enable();
 }
 
 void flush_tlb_mm(struct mm_struct *mm)
 {
-	cpumask_t cpu_mask;
 
 	preempt_disable();
-	cpu_mask = *mm->cpu_vm_mask;
-	cpu_clear(smp_processor_id(), cpu_mask);
-
 	if (current->active_mm == mm) {
 		if (current->mm)
 			local_flush_tlb();
 		else
 			leave_mm(smp_processor_id());
 	}
-	if (!cpus_empty(cpu_mask))
-		flush_tlb_others(cpu_mask, mm, TLB_FLUSH_ALL);
+	if (cpumask_any_but(mm->cpu_vm_mask, smp_processor_id()) < nr_cpu_ids)
+		flush_tlb_others(mm->cpu_vm_mask, mm, TLB_FLUSH_ALL);
 
 	preempt_enable();
 }
@@ -250,11 +257,8 @@ void flush_tlb_page(struct vm_area_struc
 void flush_tlb_page(struct vm_area_struct *vma, unsigned long va)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	cpumask_t cpu_mask;
 
 	preempt_disable();
-	cpu_mask = *mm->cpu_vm_mask;
-	cpu_clear(smp_processor_id(), cpu_mask);
 
 	if (current->active_mm == mm) {
 		if (current->mm)
@@ -263,8 +267,8 @@ void flush_tlb_page(struct vm_area_struc
 			leave_mm(smp_processor_id());
 	}
 
-	if (!cpus_empty(cpu_mask))
-		flush_tlb_others(cpu_mask, mm, va);
+	if (cpumask_any_but(mm->cpu_vm_mask, smp_processor_id()) < nr_cpu_ids)
+		flush_tlb_others(mm->cpu_vm_mask, mm, va);
 
 	preempt_enable();
 }
diff --git a/arch/x86/kernel/tlb_uv.c b/arch/x86/kernel/tlb_uv.c
--- a/arch/x86/kernel/tlb_uv.c
+++ b/arch/x86/kernel/tlb_uv.c
@@ -212,11 +212,11 @@ static int uv_wait_completion(struct bau
  * The cpumaskp mask contains the cpus the broadcast was sent to.
  *
  * Returns 1 if all remote flushing was done. The mask is zeroed.
- * Returns 0 if some remote flushing remains to be done. The mask is left
- * unchanged.
+ * Returns 0 if some remote flushing remains to be done. The mask will have
+ * some bits still set.
  */
 int uv_flush_send_and_wait(int cpu, int this_blade, struct bau_desc *bau_desc,
-			   cpumask_t *cpumaskp)
+			   struct cpumask *cpumaskp)
 {
 	int completion_status = 0;
 	int right_shift;
@@ -263,13 +263,13 @@ int uv_flush_send_and_wait(int cpu, int 
 	 * Success, so clear the remote cpu's from the mask so we don't
 	 * use the IPI method of shootdown on them.
 	 */
-	for_each_cpu_mask(bit, *cpumaskp) {
+	for_each_cpu(bit, cpumaskp) {
 		blade = uv_cpu_to_blade_id(bit);
 		if (blade == this_blade)
 			continue;
-		cpu_clear(bit, *cpumaskp);
+		cpumask_clear_cpu(bit, cpumaskp);
 	}
-	if (!cpus_empty(*cpumaskp))
+	if (!cpumask_empty(cpumaskp))
 		return 0;
 	return 1;
 }
@@ -296,7 +296,7 @@ int uv_flush_send_and_wait(int cpu, int 
  * Returns 1 if all remote flushing was done.
  * Returns 0 if some remote flushing remains to be done.
  */
-int uv_flush_tlb_others(cpumask_t *cpumaskp, struct mm_struct *mm,
+int uv_flush_tlb_others(struct cpumask *cpumaskp, struct mm_struct *mm,
 			unsigned long va)
 {
 	int i;
@@ -315,7 +315,7 @@ int uv_flush_tlb_others(cpumask_t *cpuma
 	bau_nodes_clear(&bau_desc->distribution, UV_DISTRIBUTION_SIZE);
 
 	i = 0;
-	for_each_cpu_mask(bit, *cpumaskp) {
+	for_each_cpu(bit, cpumaskp) {
 		blade = uv_cpu_to_blade_id(bit);
 		BUG_ON(blade > (UV_DISTRIBUTION_SIZE - 1));
 		if (blade == this_blade) {
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -634,35 +634,27 @@ static void xen_flush_tlb_single(unsigne
 	preempt_enable();
 }
 
-static void xen_flush_tlb_others(const cpumask_t *cpus, struct mm_struct *mm,
-				 unsigned long va)
+static void xen_flush_tlb_others(const struct cpumask *cpus,
+				 struct mm_struct *mm, unsigned long va)
 {
 	struct {
 		struct mmuext_op op;
-		cpumask_t mask;
+		DECLARE_BITMAP(mask, NR_CPUS);
 	} *args;
-	cpumask_t cpumask = *cpus;
 	struct multicall_space mcs;
 
-	/*
-	 * A couple of (to be removed) sanity checks:
-	 *
-	 * - current CPU must not be in mask
-	 * - mask must exist :)
-	 */
-	BUG_ON(cpus_empty(cpumask));
-	BUG_ON(cpu_isset(smp_processor_id(), cpumask));
+	BUG_ON(cpumask_empty(cpus));
 	BUG_ON(!mm);
-
-	/* If a CPU which we ran on has gone down, OK. */
-	cpus_and(cpumask, cpumask, cpu_online_map);
-	if (cpus_empty(cpumask))
-		return;
 
 	mcs = xen_mc_entry(sizeof(*args));
 	args = mcs.args;
-	args->mask = cpumask;
-	args->op.arg2.vcpumask = &args->mask;
+	args->op.arg2.vcpumask = to_cpumask(args->mask);
+
+	/* Remove us, and any offline CPUS. */
+	cpumask_and(to_cpumask(args->mask), cpus, cpu_online_mask);
+	cpumask_clear_cpu(smp_processor_id(), to_cpumask(args->mask));
+	if (unlikely(cpumask_empty(to_cpumask(args->mask))))
+		goto issue;
 
 	if (va == TLB_FLUSH_ALL) {
 		args->op.cmd = MMUEXT_TLB_FLUSH_MULTI;
@@ -673,6 +665,7 @@ static void xen_flush_tlb_others(const c
 
 	MULTI_mmuext_op(mcs.mc, &args->op, 1, NULL, DOMID_SELF);
 
+issue:
 	xen_mc_issue(PARAVIRT_LAZY_MMU);
 }
 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 00/11] x86: cpumask: some more cpumask cleanups - flush_tlb_*
  2009-01-07  2:12     ` Rusty Russell
@ 2009-01-07  2:50       ` Mike Travis
  0 siblings, 0 replies; 24+ messages in thread
From: Mike Travis @ 2009-01-07  2:50 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, Linus Torvalds,
	Jack Steiner, Cliff Wickman, Nick Piggin, Jeremy Fitzhardinge,
	Christoph Lameter, Jes Sorensen, LKML

Rusty Russell wrote:
> On Tuesday 06 January 2009 14:19:35 Mike Travis wrote:
>> Ingo Molnar wrote:
>>> Quite good! Can we fix those TLB flush cpumask uses too?
>> Here is one proposal.
> 
> Here's what I had.  It's untested though...
> 
> x86: change flush_tlb_others to take a const struct cpumask *. FIXME: REVIEW
> 
> This is made a little more tricky by uv_flush_tlb_others which
> actually alters its argument, for an IPI to be sent to the remaining
> cpus in the mask.
> 
> I solve this by allocating a cpumask_var_t for this case and falling back
> to IPI should this fail.

I thought about this but I wondered if we wanted to add the overhead of a kmalloc
call for every tlb flush?  For a UV system, simultaneous flushes will be quite common,
so introducing two kmalloc's in the path could really hamper performance.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2009-01-07 17:09 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-01-04 13:17 [PATCH 00/11] x86: cpumask: some more cpumask cleanups Mike Travis
2009-01-04 13:18 ` [PATCH 01/11] [PATCH] ia64: cpumask fix for is_affinity_mask_valid() Mike Travis
2009-01-04 13:18 ` [PATCH 02/11] cpumask: update local_cpus_show to use new cpumask API Mike Travis
2009-01-04 13:18 ` [PATCH 03/11] cpumask: update pci_bus_show_cpuaffinity " Mike Travis
2009-01-05 19:27   ` Jesse Barnes
2009-01-05 19:31     ` Mike Travis
2009-01-05 19:59       ` Jesse Barnes
2009-01-07 15:19         ` Ingo Molnar
2009-01-07 16:59           ` Jesse Barnes
2009-01-05 19:44     ` Linus Torvalds
2009-01-05 19:49       ` Jesse Barnes
2009-01-04 13:18 ` [PATCH 04/11] x86: cleanup remaining cpumask_t ops in smpboot code Mike Travis
2009-01-04 13:18 ` [PATCH 05/11] x86: clean up speedstep-centrino and reduce cpumask_t usage From: Rusty Russell <rusty@rustcorp.com.au> Mike Travis
2009-01-04 13:18 ` [PATCH 06/11] cpumask: Replace CPUMASK_ALLOC etc with cpumask_var_t. " Mike Travis
2009-01-04 13:18 ` [PATCH 07/11] cpumask: convert struct cpufreq_policy to " Mike Travis
2009-01-04 13:18 ` [PATCH 08/11] cpumask: use work_on_cpu in acpi/cstate.c Mike Travis
2009-01-04 13:18 ` [PATCH 09/11] cpumask: use cpumask_var_t in acpi-cpufreq.c Mike Travis
2009-01-04 13:18 ` [PATCH 10/11] cpumask: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write Mike Travis
2009-01-04 13:18 ` [PATCH 11/11] cpumask: use work_on_cpu in acpi-cpufreq.c for read_measured_perf_ctrs Mike Travis
2009-01-04 14:44 ` [PATCH 00/11] x86: cpumask: some more cpumask cleanups Ingo Molnar
2009-01-05 18:28   ` Mike Travis
2009-01-06  3:49   ` [PATCH 00/11] x86: cpumask: some more cpumask cleanups - flush_tlb_* Mike Travis
2009-01-07  2:12     ` Rusty Russell
2009-01-07  2:50       ` Mike Travis

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox