* [PATCH 00/12] cpumask: reduce stack pressure from local/passed cpumask variables v3
From: Mike Travis @ 2008-04-05  1:11 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Thomas Gleixner, H. Peter Anvin, Andrew Morton, linux-kernel


Modify usage of cpumask_t variables to use pointers as much as possible.

Changes are:

	* Use an allocated array of cpumask_t's for cpumask_of_cpu().
	  This removes > 20,000 bytes of stack usage (see all changes
	  in the charts below) and reduces the code generated at each
	  use site.

	* Use set_cpus_allowed_ptr() to pass a pointer to the "newly
	  allowed" cpumask instead of the mask itself.  This removes
	  > 10,000 bytes of stack usage (see the sketch after this list).

	* Use node_to_cpumask_ptr(), which returns a pointer to the
	  cpumask for the specified node.  This removes > 10,000 bytes
	  of stack usage.

	* Modify build_sched_domains and related sub-functions to pass
	  pointers to cpumask temp variables.  This consolidates stack
	  space that was spread over various functions.

	* Remove large array from numa_initmem_init() [> 8,000 bytes].

	* Optimize usages of {CPU,NODE}_MASK_{NONE,ALL} [> 9,000 bytes].

	* Various other changes to reduce stack size and silence
	  checkpatch warnings [> 7,000 bytes].
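
For reference, here is a minimal sketch of the calling convention these
patches move to.  The helper below is hypothetical; set_cpus_allowed_ptr()
and the lvalue form of cpumask_of_cpu() are introduced by this series:

	/* Old style: with NR_CPUS=4096 a cpumask_t is 512 bytes, so this
	 * builds one copy in the local and a second in the by-value arg. */
	void pin_to_cpu_old(struct task_struct *p, int cpu)
	{
		cpumask_t mask = cpumask_of_cpu(cpu);

		set_cpus_allowed(p, mask);
	}

	/* New style: only a pointer crosses the call.  With
	 * HAVE_CPUMASK_OF_CPU_MAP no mask is built on the stack at all;
	 * the fallback still builds one temporary, but the by-value
	 * argument copy is gone either way. */
	void pin_to_cpu_new(struct task_struct *p, int cpu)
	{
		set_cpus_allowed_ptr(p, &cpumask_of_cpu(cpu));
	}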

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   x86/latest          .../x86/linux-2.6-x86.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git

Cc: Cliff Wickman <cpw@sgi.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: David S. Miller <davem@davemloft.net>
Cc: Greg Banks <gnb@melbourne.sgi.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Len Brown <len.brown@intel.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: William L. Irwin <wli@holomorphy.com>

Signed-off-by: Mike Travis <travis@sgi.com>
---
v3: rebased on x86/latest + sched-devel/latest
    collapsed many patches so the same files don't appear in different
    patches (kernel/sched.c unfortunately still appears in about 5
    or 6 patches, and include/asm-x86/topology.h is in 2.)

v2: resubmitted based on x86/latest.
--- ---------------------------------------------------------
* Memory Usages Changes

Per-patch summary of memory usage changes, using the akpm2 config
file with NR_CPUS=4096 and MAX_NUMNODES=512.

====== Data (-l 500)
... files 13 vars 1855 all 0 lim 500 unch 0

    1 - initial
    2 - cpumask_of_cpu
    3 - add-CPUMASK_ALL_PTR
    5 - nr_cpus-in-kernel_sched
    7 - generic-set_cpus_allowed
   12 - kernel_sched_c
   13 - use-CPUMASK_ALL_PTR

    .1.   .2.   .3.     .5.   .7.  .12.  .13.    ..final..
  32768     .     .  -32768     .     .     .    .  -100%  sched_group_nodes_bycpu(.bss)
  32768     .     .  -32768     .     .     .    .  -100%  init_sched_rt_entity_p(.bss)
  32768     .     .  -32768     .     .     .    .  -100%  init_rt_rq_p(.bss)
   3550     .     .      +9     .  -842     . 2717   -23%  build_sched_domains(.text)
    674   -95     .       .  -579     .     .    .  -100%  acpi_processor_set_throttling(.text)
    533  -533     .       .     .     .     .    .  -100%  hpet_enable(.init.text)
    512     .     .       .     .     .  -512    .  -100%  C(.rodata)
      0     .  +512       .     .     .     .  512      .  cpu_mask_all(.data.read_mostly)
 103573  -628  +512  -98295  -579  -842  -512 3229   -96%  Totals

====== Sections (-l 500)
... files 13 vars 37 all 0 lim 500 unch 0

    1 - initial
    2 - cpumask_of_cpu
    3 - add-CPUMASK_ALL_PTR
    5 - nr_cpus-in-kernel_sched
    6 - x86-set_cpus_allowed
    7 - generic-set_cpus_allowed
    8 - cpuset_cpus_allowed
    9 - cpumask_affinity
   10 - numa_initmem_init
   11 - node_to_cpumask_ptr
   12 - kernel_sched_c
   13 - use-CPUMASK_ALL_PTR

       .1.     .2.   .3.     .5.    .6.    .7.   .8.   .9.   .10.  .11.   .12.   .13.    ..final..
  75833950  -13010  +260  -98051  -2972  -2295  +644  +122  +8174  -465   -698  +1349 75727008    <1%  Total
  42810838   -7428   +39    +225   -757   -431  +215  +182     -1  -366   +945   +591 42804052    <1%  .debug_info
   6808830    -404  -211    -116   -316   -243   +42  +113    -22   +95  +1950    +14  6809732    <1%  .debug_loc
   4805176     +16  +512       .      .      .     .     .      .     .      .      .  4805704    <1%  .data.read_mostly
   3475017    -528   -16     +48   -128   -496  -256  -112      .  -496    -32    -80  3472921    <1%  .text
   2720422    -895   -16     +26   -192   -146   -23    -4     -1  -164    +26    +81  2719114    <1%  .debug_line
   1775040       .     .  -98176      .      .     .     .      .     .      .      .  1676864    -5%  .bss
   1395188    -245    -9       .   -131    -95   -19   -17      .     .    +14   +103  1394789    <1%  .debug_abbrev
   1141392   -3104   -64       .  -1408   -848  +752   -16      .  +480   +208   +640  1138032    <1%  .debug_ranges
   1021159     +32  -512       .      .  -1152  -640     .      .     .  -1377   -512  1016998    <1%  .rodata
    982688       .     .       .      .      .     .     .  +8208     .      .      .   990896    <1%  .init.data
      8080     -80  +464       .      .  +1152  +640     .      .     .  -2720   +512     8048    <1%  __param
142777780 -25646 +447 -196044 -5904 -4554 +1355 +268 +16358 -916 -1684 +2698 142564158   +0%  Totals

====== Text/Data ()
... files 13 vars 6 all 0 lim 0 unch 0

    1 - initial
    2 - cpumask_of_cpu
    5 - nr_cpus-in-kernel_sched
    7 - generic-set_cpus_allowed
   10 - numa_initmem_init
   12 - kernel_sched_c

       .1.    .2.     .5.    .7.   .10.   .12.    ..final..
   3475456  -2048       .      .      .      . 3473408    <1%  TextSize
   1738752  +2048       .  -4096      .  -4096 1732608    <1%  DataSize
   1775616      .  -98304      .      .      . 1677312    -5%  BssSize
   1220608      .       .      .  +8192      . 1228800    <1%  InitSize
  10399744      .       .  -4096      .  +4096 10399744      .  OtherSize
 18610176     . -98304 -8192 +8192     . 18511872   +0%  Totals

====== PerCPU ()
... files 13 vars 10 all 0 lim 0 unch 0

    1 - initial

  .1.    ..final..
   0 .   +0%  Totals

====== Stack (-l 500)
... files 13 vars 166 all 0 lim 500 unch 0

    1 - initial
    2 - cpumask_of_cpu
    3 - add-CPUMASK_ALL_PTR
    5 - nr_cpus-in-kernel_sched
    6 - x86-set_cpus_allowed
    7 - generic-set_cpus_allowed
    8 - cpuset_cpus_allowed
    9 - cpumask_affinity
   10 - numa_initmem_init
   11 - node_to_cpumask_ptr
   12 - kernel_sched_c
   13 - use-CPUMASK_ALL_PTR

    .1.    .2.    .3.    .5.   .6.    .7.   .8.    .9.   .10.   .11.   .12.  .13.    ..final..
  11080      .      .      .     .      .     .      .      .   -512  -8336     . 2232   -79%  build_sched_domains
   8248      .      .      .     .      .     .      .  -8248      .      .     .    .  -100%  numa_initmem_init
   4648      .      .  -3840     .      .     .      .      .      .   -808     .    .  -100%  cpu_attach_domain
   3176      .    -16    +16     .      .     .      .      .  -1552  -1024     .  600   -81%  sched_domain_node_span
   3176   -512      .      .  -512      .     .      .      .      .      .     . 2152   -32%  centrino_target
   2584  -1024      .      .     .   -512     .      .      .      .      .     . 1048   -59%  acpi_processor_set_throttling
   2104      .      .      .     .  -1024     .      .      .      .      .     . 1080   -48%  _cpu_down
   2088  -1024      .      .  -512      .     .      .      .      .      .     .  552   -73%  powernowk8_cpu_init
   2072   -512      .      .     .      .     .      .      .      .      .     . 1560   -24%  tick_notify
   2056      .      .      .     .      .     .  -2056      .      .      .     .    .  -100%  affinity_set
   1784  -1024      .      .     .      .     .      .      .      .      .     .  760   -57%  cpufreq_add_dev
   1704      .      .      .     .      .     .      .      .  -1704      .     .    .  -100%  kswapd
   1608   -512      .      .  -512      .     .      .      .      .      .     .  584   -63%  powernowk8_target
   1608  -1608      .      .     .      .     .      .      .      .      .     .    .  -100%  disable_smp
   1608   -512      .      .  -512      .     .      .      .      .      .     .  584   -63%  cache_add_dev
   1592      .      .      .     .      .     .      .      .      .  -1592     .    .  -100%  do_tune_cpucache
   1576      .      .      .     .      .     .      .      .      .  -1576     .    .  -100%  init_sched_build_groups
   1560      .      .      .     .  -1040     .      .      .      .      .     .  520   -66%  pci_device_probe
   1560   -512      .      .  -512      .     .      .      .      .      .     .  536   -65%  check_supported_cpu
   1544      .      .      .     .      .  -512   +512      .      .   -512     . 1032   -33%  sched_setaffinity
   1544   -512      .      .  -520      .     .      .      .      .      .     .  512   -66%  powernowk8_get
   1544  -1008      .      .     .      .     .      .      .      .      .     .  536   -65%  alloc_ldt
   1536   -504      .      .     .      .     .      .      .      .      .     . 1032   -32%  smp_call_function_single
   1536  -1024      .      .     .      .     .      .      .      .      .     .  512   -66%  native_smp_send_reschedule
   1536   -512      .      .  -504      .     .      .      .      .      .     .  520   -66%  get_cur_freq
   1536   -512      .      .     .   -512     .      .      .      .      .     .  512   -66%  acpi_processor_get_throttling
   1536   -512      .      .  -504      .     .      .      .      .      .     .  520   -66%  acpi_processor_ffh_cstate_probe
   1176      .      .      .     .      .     .      .      .      .   -512     .  664   -43%  thread_return
   1176      .      .      .     .      .     .      .      .      .   -512     .  664   -43%  schedule
   1160      .      .      .     .      .     .      .      .      .   -512     .  648   -44%  run_rebalance_domains
   1160      .      .      .     .      .     .      .      .  -1160      .     .    .  -100%  __build_all_zonelists
   1152      .      .      .     .      .  -504      .      .      .      .     .  648   -43%  cpuset_attach
   1072      .      .      .     .      .  -512      .      .      .      .     .  560   -47%  pdflush
   1064      .      .      .     .      .     .      .      .  -1064      .     .    .  -100%  cpuup_canceled
   1064      .      .      .     .      .     .      .      .      .  -1064     .    .  -100%  cpuup_callback
   1032  -1032      .      .     .      .     .      .      .      .      .     .    .  -100%  uv_target_cpus
   1032      .      .      .     .   -520     .      .      .      .      .     .  512   -50%  system_kthread_notifier
   1032  -1032      .      .     .      .     .      .      .      .      .     .    .  -100%  setup_pit_timer
   1032      .      .      .     .      .     .      .      .      .   -512     .  520   -49%  sched_init_smp
   1032      .      .      .     .      .     .      .      .      .      .  -520  512   -50%  physflat_vector_allocation_domain
   1032      .  -1032      .     .      .     .      .      .      .      .     .    .  -100%  kernel_init
   1032  -1032      .      .     .      .     .      .      .      .      .     .    .  -100%  init_workqueues
   1032  -1032      .      .     .      .     .      .      .      .      .     .    .  -100%  init_idle
   1032      .      .      .     .      .     .      .      .      .      .  -512  520   -49%  destroy_irq
   1032      .      .      .     .  -1032     .      .      .      .      .     .    .  -100%  ____call_usermodehelper
   1024      .      .      .     .      .     .   -512      .      .      .     .  512   -50%  sys_sched_setaffinity
   1024   -504      .      .     .   -520     .      .      .      .      .     .    .  -100%  stopmachine
   1024  -1024      .      .     .      .     .      .      .      .      .     .    .  -100%  setup_APIC_timer
   1024  -1024      .      .     .      .     .      .      .      .      .     .    .  -100%  native_smp_prepare_cpus
   1024   -504      .      .  -520      .     .      .      .      .      .     .    .  -100%  native_machine_shutdown
   1024      .      .      .     .  -1024     .      .      .      .      .     .    .  -100%  kthreadd
   1024  -1024      .      .     .      .     .      .      .      .      .     .    .  -100%  kthread_bind
   1024  -1024      .      .     .      .     .      .      .      .      .     .    .  -100%  hpet_enable
   1024      .      .      .     .      .     .   -512      .      .      .     .  512   -50%  compat_sys_sched_setaffinity
   1024      .      .      .     .      .     .      .      .      .      .  -512  512   -50%  __percpu_populate_mask
    576      .      .      .     .      .  -576      .      .      .      .     .    .  -100%  cpuset_init
    576      .      .      .     .      .  -576      .      .      .      .     .    .  -100%  cpuset_create
    552      .      .      .     .      .     .      .      .      .   -552     .    .  -100%  migration_call
    520      .      .      .     .      .     .      .      .   -520      .     .    .  -100%  node_read_cpumap
    520      .      .      .     .      .     .      .      .      .      .  -520    .  -100%  dynamic_irq_init
    520      .      .      .     .      .  -520      .      .      .      .     .    .  -100%  cpuset_cpus_allowed
    520      .      .      .     .      .  -520      .      .      .      .     .    .  -100%  cpuset_change_cpumask
    520      .      .      .     .      .     .      .      .      .   -520     .    .  -100%  cpu_to_phys_group
    520      .      .      .     .      .     .      .      .      .   -520     .    .  -100%  cpu_to_core_group
    520      .      .      .     .      .     .   -520      .      .      .     .    .  -100%  affinity_restore
    512      .      .      .     .      .  -512      .      .      .      .     .    .  -100%  cpuset_cpus_allowed_locked
      0      .      .      .     .      .     .      .      .      .   +760     .  760      .  sd_init_SIBLING
      0      .      .      .     .      .     .      .      .      .   +760     .  760      .  sd_init_NODE
      0      .      .      .     .      .     .      .      .      .   +752     .  752      .  sd_init_MC
      0      .      .      .     .      .     .      .      .      .   +752     .  752      .  sd_init_CPU
      0      .      .      .     .      .     .      .      .      .   +752     .  752      .  sd_init_ALLNODES
      0      .      .      .     .      .     .      .      .      .   +512     .  512      .  detach_destroy_domains
103584 -21056 -1048 -3824 -4608 -6184 -4232 -3088 -8248 -6512 -14264 -2064 28456  -72%  Totals


* [PATCH 01/12] x86: Convert cpumask_of_cpu macro to allocated array
From: Mike Travis @ 2008-04-05  1:11 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, H. Peter Anvin, Andrew Morton, linux-kernel,
	Ingo Molnar, Christoph Lameter


  * Here is a simple patch to use an allocated array of cpumasks to
    represent cpumask_of_cpu() instead of constructing one on the stack.
    It is gated on the Kconfig option "HAVE_CPUMASK_OF_CPU_MAP", which
    is currently only set for x86_64 SMP.  Otherwise the existing
    cpumask_of_cpu() is used, but it has been changed to produce an
    lvalue so that a pointer to it can be taken.
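
As a sketch of what call sites look like after this change (either
Kconfig variant; in the fallback case the pointer targets a temporary
that only lives for the full expression, so the address is taken at
the call site rather than stored):

	/* With HAVE_CPUMASK_OF_CPU_MAP this points into the boot-time
	 * allocated cpumask_of_cpu_map[]; otherwise it points at a
	 * temporary built by the statement expression. */
	set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));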

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   x86/latest          .../x86/linux-2.6-x86.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
---
 arch/x86/Kconfig        |    3 +++
 arch/x86/kernel/setup.c |   28 +++++++++++++++++++++++++++-
 include/linux/cpumask.h |   12 +++++++++---
 3 files changed, 39 insertions(+), 4 deletions(-)

--- linux-2.6.x86.orig/arch/x86/Kconfig
+++ linux-2.6.x86/arch/x86/Kconfig
@@ -124,6 +124,9 @@ config ARCH_HAS_CPU_RELAX
 config HAVE_SETUP_PER_CPU_AREA
 	def_bool X86_64 || (X86_SMP && !X86_VOYAGER)
 
+config HAVE_CPUMASK_OF_CPU_MAP
+	def_bool X86_64_SMP
+
 config ARCH_HIBERNATION_POSSIBLE
 	def_bool y
 	depends on !SMP || !X86_VOYAGER
--- linux-2.6.x86.orig/arch/x86/kernel/setup.c
+++ linux-2.6.x86/arch/x86/kernel/setup.c
@@ -38,6 +38,24 @@ static void __init setup_per_cpu_maps(vo
 #endif
 }
 
+#ifdef CONFIG_HAVE_CPUMASK_OF_CPU_MAP
+cpumask_t *cpumask_of_cpu_map __read_mostly;
+EXPORT_SYMBOL(cpumask_of_cpu_map);
+
+/* requires nr_cpu_ids to be initialized */
+static void __init setup_cpumask_of_cpu(void)
+{
+	int i;
+
+	/* alloc_bootmem zeroes memory */
+	cpumask_of_cpu_map = alloc_bootmem_low(sizeof(cpumask_t) * nr_cpu_ids);
+	for (i = 0; i < nr_cpu_ids; i++)
+		cpu_set(i, cpumask_of_cpu_map[i]);
+}
+#else
+static inline void setup_cpumask_of_cpu(void) { }
+#endif
+
 #ifdef CONFIG_X86_32
 /*
  * Great future not-so-futuristic plan: make i386 and x86_64 do it
@@ -54,7 +72,7 @@ EXPORT_SYMBOL(__per_cpu_offset);
  */
 void __init setup_per_cpu_areas(void)
 {
-	int i;
+	int i, highest_cpu = 0;
 	unsigned long size;
 
 #ifdef CONFIG_HOTPLUG_CPU
@@ -88,10 +106,18 @@ void __init setup_per_cpu_areas(void)
 		__per_cpu_offset[i] = ptr - __per_cpu_start;
 #endif
 		memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
+
+		highest_cpu = i;
 	}
 
+	nr_cpu_ids = highest_cpu + 1;
+	printk(KERN_DEBUG "NR_CPUS: %d, nr_cpu_ids: %d\n", NR_CPUS, nr_cpu_ids);
+
 	/* Setup percpu data maps */
 	setup_per_cpu_maps();
+
+	/* Setup cpumask_of_cpu map */
+	setup_cpumask_of_cpu();
 }
 
 #endif
--- linux-2.6.x86.orig/include/linux/cpumask.h
+++ linux-2.6.x86/include/linux/cpumask.h
@@ -222,8 +222,13 @@ int __next_cpu(int n, const cpumask_t *s
 #define next_cpu(n, src)	({ (void)(src); 1; })
 #endif
 
+#ifdef CONFIG_HAVE_CPUMASK_OF_CPU_MAP
+extern cpumask_t *cpumask_of_cpu_map;
+#define cpumask_of_cpu(cpu)    (cpumask_of_cpu_map[cpu])
+
+#else
 #define cpumask_of_cpu(cpu)						\
-({									\
+(*({									\
 	typeof(_unused_cpumask_arg_) m;					\
 	if (sizeof(m) == sizeof(unsigned long)) {			\
 		m.bits[0] = 1UL<<(cpu);					\
@@ -231,8 +236,9 @@ int __next_cpu(int n, const cpumask_t *s
 		cpus_clear(m);						\
 		cpu_set((cpu), m);					\
 	}								\
-	m;								\
-})
+	&m;								\
+}))
+#endif
 
 #define CPU_MASK_LAST_WORD BITMAP_LAST_WORD_MASK(NR_CPUS)
 


* [PATCH 02/12] cpumask: add CPU_MASK_ALL_PTR macro
From: Mike Travis @ 2008-04-05  1:11 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Thomas Gleixner, H. Peter Anvin, Andrew Morton, linux-kernel


  * Add a CPU_MASK_ALL_PTR macro that yields a pointer to CPU_MASK_ALL,
    backed by a static cpu_mask_all variable when NR_CPUS > BITS_PER_LONG.
    This removes, where possible, the instances in which CPU_MASK_ALL
    allocates and fills a large array on the stack.

  * Change init/main.c to use the new set_cpus_allowed_ptr() (see the
    sketch below).
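
A minimal before/after sketch of the intended use, mirroring the
init/main.c hunk below:

	/* old: copies a full NR_CPUS-bit mask into the by-value arg */
	set_cpus_allowed(current, CPU_MASK_ALL);

	/* new: passes a pointer to one shared all-set mask; this is
	 * &CPU_MASK_ALL itself when NR_CPUS <= BITS_PER_LONG, and the
	 * static cpu_mask_all variable otherwise */
	set_cpus_allowed_ptr(current, CPU_MASK_ALL_PTR);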

Depends on:
	[sched-devel]: sched: add new set_cpus_allowed_ptr function

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   x86/latest          .../x86/linux-2.6-x86.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git

# x86
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>

Signed-off-by: Mike Travis <travis@sgi.com>
---
 include/linux/cpumask.h |    6 ++++++
 init/main.c             |    7 ++++++-
 2 files changed, 12 insertions(+), 1 deletion(-)

--- linux-2.6.x86.orig/include/linux/cpumask.h
+++ linux-2.6.x86/include/linux/cpumask.h
@@ -249,6 +249,8 @@ extern cpumask_t *cpumask_of_cpu_map;
 	[BITS_TO_LONGS(NR_CPUS)-1] = CPU_MASK_LAST_WORD			\
 } }
 
+#define CPU_MASK_ALL_PTR	(&CPU_MASK_ALL)
+
 #else
 
 #define CPU_MASK_ALL							\
@@ -257,6 +259,10 @@ extern cpumask_t *cpumask_of_cpu_map;
 	[BITS_TO_LONGS(NR_CPUS)-1] = CPU_MASK_LAST_WORD			\
 } }
 
+/* cpu_mask_all is in init/main.c */
+extern cpumask_t cpu_mask_all;
+#define CPU_MASK_ALL_PTR	(&cpu_mask_all)
+
 #endif
 
 #define CPU_MASK_NONE							\
--- linux-2.6.x86.orig/init/main.c
+++ linux-2.6.x86/init/main.c
@@ -366,6 +366,11 @@ static inline void smp_prepare_cpus(unsi
 
 #else
 
+#if NR_CPUS > BITS_PER_LONG
+cpumask_t cpu_mask_all __read_mostly = CPU_MASK_ALL;
+EXPORT_SYMBOL(cpu_mask_all);
+#endif
+
 /* Setup number of possible processor ids */
 int nr_cpu_ids __read_mostly = NR_CPUS;
 EXPORT_SYMBOL(nr_cpu_ids);
@@ -837,7 +842,7 @@ static int __init kernel_init(void * unu
 	/*
 	 * init can run on any cpu.
 	 */
-	set_cpus_allowed(current, CPU_MASK_ALL);
+	set_cpus_allowed_ptr(current, CPU_MASK_ALL_PTR);
 	/*
 	 * Tell the world that we're going to be the grim
 	 * reaper of innocent orphaned children.


* [PATCH 03/12] cpumask: reduce stack pressure in cpu_coregroup_map
From: Mike Travis @ 2008-04-05  1:11 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, H. Peter Anvin, Andrew Morton, linux-kernel,
	David S. Miller, William L. Irwin


  * Return a pointer to the requested cpumask_t value from the
    cpu_coregroup_map() functions instead of returning the cpumask_t
    value on the stack.

    The only users of these functions are the sparc and x86
    architectures.
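
Call sites now choose between reading through the returned pointer and
taking an explicit copy, as the kernel/sched.c hunks below do; a short
sketch (cpu_map as in build_sched_domains()):

	/* read-only use: no copy of the mask is made */
	int group = first_cpu(*cpu_coregroup_map(cpu));

	/* callers that modify the mask copy it explicitly */
	cpumask_t mask = *cpu_coregroup_map(cpu);
	cpus_and(mask, mask, *cpu_map);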

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   x86/latest          .../x86/linux-2.6-x86.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git

# sparc
Cc: David S. Miller <davem@davemloft.net>
Cc: William L. Irwin <wli@holomorphy.com>

# x86
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>

Signed-off-by: Mike Travis <travis@sgi.com>
---
 arch/x86/kernel/smpboot.c      |    6 +++---
 include/asm-sparc64/topology.h |    2 +-
 include/asm-x86/topology.h     |    2 +-
 kernel/sched.c                 |    6 +++---
 4 files changed, 8 insertions(+), 8 deletions(-)

--- linux-2.6.x86.orig/arch/x86/kernel/smpboot.c
+++ linux-2.6.x86/arch/x86/kernel/smpboot.c
@@ -545,7 +545,7 @@ void __cpuinit set_cpu_sibling_map(int c
 }
 
 /* maps the cpu to the sched domain representing multi-core */
-cpumask_t cpu_coregroup_map(int cpu)
+const cpumask_t *cpu_coregroup_map(int cpu)
 {
 	struct cpuinfo_x86 *c = &cpu_data(cpu);
 	/*
@@ -553,9 +553,9 @@ cpumask_t cpu_coregroup_map(int cpu)
 	 * And for power savings, we return cpu_core_map
 	 */
 	if (sched_mc_power_savings || sched_smt_power_savings)
-		return per_cpu(cpu_core_map, cpu);
+		return &per_cpu(cpu_core_map, cpu);
 	else
-		return c->llc_shared_map;
+		return &c->llc_shared_map;
 }
 
 /*
--- linux-2.6.x86.orig/include/asm-sparc64/topology.h
+++ linux-2.6.x86/include/asm-sparc64/topology.h
@@ -12,6 +12,6 @@
 
 #include <asm-generic/topology.h>
 
-#define cpu_coregroup_map(cpu)			(cpu_core_map[cpu])
+#define cpu_coregroup_map(cpu)			(&cpu_core_map[cpu])
 
 #endif /* _ASM_SPARC64_TOPOLOGY_H */
--- linux-2.6.x86.orig/include/asm-x86/topology.h
+++ linux-2.6.x86/include/asm-x86/topology.h
@@ -201,7 +201,7 @@ static inline void set_mp_bus_to_node(in
 
 #include <asm-generic/topology.h>
 
-extern cpumask_t cpu_coregroup_map(int cpu);
+const cpumask_t *cpu_coregroup_map(int cpu);
 
 #ifdef ENABLE_TOPO_DEFINES
 #define topology_physical_package_id(cpu)	(cpu_data(cpu).phys_proc_id)
--- linux-2.6.x86.orig/kernel/sched.c
+++ linux-2.6.x86/kernel/sched.c
@@ -6615,7 +6615,7 @@ cpu_to_phys_group(int cpu, const cpumask
 {
 	int group;
 #ifdef CONFIG_SCHED_MC
-	cpumask_t mask = cpu_coregroup_map(cpu);
+	cpumask_t mask = *cpu_coregroup_map(cpu);
 	cpus_and(mask, mask, *cpu_map);
 	group = first_cpu(mask);
 #elif defined(CONFIG_SCHED_SMT)
@@ -6849,7 +6849,7 @@ static int build_sched_domains(const cpu
 		p = sd;
 		sd = &per_cpu(core_domains, i);
 		*sd = SD_MC_INIT;
-		sd->span = cpu_coregroup_map(i);
+		sd->span = *cpu_coregroup_map(i);
 		cpus_and(sd->span, sd->span, *cpu_map);
 		sd->parent = p;
 		p->child = sd;
@@ -6884,7 +6884,7 @@ static int build_sched_domains(const cpu
 #ifdef CONFIG_SCHED_MC
 	/* Set up multi-core groups */
 	for_each_cpu_mask(i, *cpu_map) {
-		cpumask_t this_core_map = cpu_coregroup_map(i);
+		cpumask_t this_core_map = *cpu_coregroup_map(i);
 		cpus_and(this_core_map, this_core_map, *cpu_map);
 		if (i != first_cpu(this_core_map))
 			continue;


* [PATCH 04/12] sched: Remove fixed NR_CPUS sized arrays in kernel_sched_c v2
From: Mike Travis @ 2008-04-05  1:11 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Thomas Gleixner, H. Peter Anvin, Andrew Morton, linux-kernel


 * Change fixed-size arrays to per_cpu variables or dynamically
   allocated arrays in sched_init() and sched_init_smp().

     (1) static struct sched_entity *init_sched_entity_p[NR_CPUS];
     (1) static struct cfs_rq *init_cfs_rq_p[NR_CPUS];
     (1) static struct sched_rt_entity *init_sched_rt_entity_p[NR_CPUS];
     (1) static struct rt_rq *init_rt_rq_p[NR_CPUS];
	 static struct sched_group **sched_group_nodes_bycpu[NR_CPUS];

     (1) - these arrays are allocated via alloc_bootmem_low()

 * Change sched_domain_debug_one() to use cpulist_scnprintf instead of
   cpumask_scnprintf.  This reduces the required output buffer size and
   improves readability as machines with large NR_CPUS counts arrive.

 * In sched_create_group() we allocate new arrays based on nr_cpu_ids.
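
To illustrate the buffer difference, assuming CPUs 0-7 are set in
sd->span (as in sched_domain_debug_one(); output formats are those of
the 2008 bitmap helpers):

	char str[256];

	/* old: prints the raw hex bitmap -- needs on the order of
	 * NR_CPUS/4 characters at NR_CPUS=4096, e.g. "...,000000ff" */
	cpumask_scnprintf(str, sizeof(str), sd->span);

	/* new: a compact range list regardless of NR_CPUS, e.g. "0-7" */
	cpulist_scnprintf(str, sizeof(str), sd->span);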

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   x86/latest          .../x86/linux-2.6-x86.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git

Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Travis <travis@sgi.com>
---
v2: Removed reference to cpumask_scnprintf_len().
---
 kernel/sched.c |   80 +++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 52 insertions(+), 28 deletions(-)

--- linux-2.6.x86.orig/kernel/sched.c
+++ linux-2.6.x86/kernel/sched.c
@@ -68,6 +68,7 @@
 #include <linux/hrtimer.h>
 #include <linux/ftrace.h>
 #include <linux/tick.h>
+#include <linux/bootmem.h>
 
 #include <asm/tlb.h>
 #include <asm/irq_regs.h>
@@ -278,17 +279,11 @@ struct task_group {
 static DEFINE_PER_CPU(struct sched_entity, init_sched_entity);
 /* Default task group's cfs_rq on each cpu */
 static DEFINE_PER_CPU(struct cfs_rq, init_cfs_rq) ____cacheline_aligned_in_smp;
-
-static struct sched_entity *init_sched_entity_p[NR_CPUS];
-static struct cfs_rq *init_cfs_rq_p[NR_CPUS];
 #endif
 
 #ifdef CONFIG_RT_GROUP_SCHED
 static DEFINE_PER_CPU(struct sched_rt_entity, init_sched_rt_entity);
 static DEFINE_PER_CPU(struct rt_rq, init_rt_rq) ____cacheline_aligned_in_smp;
-
-static struct sched_rt_entity *init_sched_rt_entity_p[NR_CPUS];
-static struct rt_rq *init_rt_rq_p[NR_CPUS];
 #endif
 
 /* task_group_lock serializes add/remove of task groups and also changes to
@@ -312,17 +307,7 @@ static int init_task_group_load = INIT_T
 /* Default task group.
  *	Every task in system belong to this group at bootup.
  */
-struct task_group init_task_group = {
-#ifdef CONFIG_FAIR_GROUP_SCHED
-	.se	= init_sched_entity_p,
-	.cfs_rq = init_cfs_rq_p,
-#endif
-
-#ifdef CONFIG_RT_GROUP_SCHED
-	.rt_se	= init_sched_rt_entity_p,
-	.rt_rq	= init_rt_rq_p,
-#endif
-};
+struct task_group init_task_group;
 
 /* return group to which a task belongs */
 static inline struct task_group *task_group(struct task_struct *p)
@@ -3754,7 +3739,7 @@ static inline void trigger_load_balance(
 			 */
 			int ilb = first_cpu(nohz.cpu_mask);
 
-			if (ilb != NR_CPUS)
+			if (ilb < nr_cpu_ids)
 				resched_cpu(ilb);
 		}
 	}
@@ -5729,11 +5714,11 @@ static void move_task_off_dead_cpu(int d
 		dest_cpu = any_online_cpu(mask);
 
 		/* On any allowed CPU? */
-		if (dest_cpu == NR_CPUS)
+		if (dest_cpu >= nr_cpu_ids)
 			dest_cpu = any_online_cpu(p->cpus_allowed);
 
 		/* No more Mr. Nice Guy. */
-		if (dest_cpu == NR_CPUS) {
+		if (dest_cpu >= nr_cpu_ids) {
 			cpumask_t cpus_allowed = cpuset_cpus_allowed_locked(p);
 			/*
 			 * Try to stay on the same cpuset, where the
@@ -6188,9 +6173,9 @@ static int sched_domain_debug_one(struct
 {
 	struct sched_group *group = sd->groups;
 	cpumask_t groupmask;
-	char str[NR_CPUS];
+	char str[256];
 
-	cpumask_scnprintf(str, NR_CPUS, sd->span);
+	cpulist_scnprintf(str, sizeof(str), sd->span);
 	cpus_clear(groupmask);
 
 	printk(KERN_DEBUG "%*s domain %d: ", level, "", level);
@@ -6243,7 +6228,7 @@ static int sched_domain_debug_one(struct
 
 		cpus_or(groupmask, groupmask, group->cpumask);
 
-		cpumask_scnprintf(str, NR_CPUS, group->cpumask);
+		cpulist_scnprintf(str, sizeof(str), group->cpumask);
 		printk(KERN_CONT " %s", str);
 
 		group = group->next;
@@ -6637,7 +6622,7 @@ cpu_to_phys_group(int cpu, const cpumask
  * gets dynamically allocated.
  */
 static DEFINE_PER_CPU(struct sched_domain, node_domains);
-static struct sched_group **sched_group_nodes_bycpu[NR_CPUS];
+static struct sched_group ***sched_group_nodes_bycpu;
 
 static DEFINE_PER_CPU(struct sched_domain, allnodes_domains);
 static DEFINE_PER_CPU(struct sched_group, sched_group_allnodes);
@@ -7280,6 +7265,11 @@ void __init sched_init_smp(void)
 {
 	cpumask_t non_isolated_cpus;
 
+#if defined(CONFIG_NUMA)
+	sched_group_nodes_bycpu = kzalloc(nr_cpu_ids * sizeof(void **),
+								GFP_KERNEL);
+	BUG_ON(sched_group_nodes_bycpu == NULL);
+#endif
 	get_online_cpus();
 	arch_init_sched_domains(&cpu_online_map);
 	non_isolated_cpus = cpu_possible_map;
@@ -7297,6 +7287,11 @@ void __init sched_init_smp(void)
 #else
 void __init sched_init_smp(void)
 {
+#if defined(CONFIG_NUMA)
+	sched_group_nodes_bycpu = kzalloc(nr_cpu_ids * sizeof(void **),
+								GFP_KERNEL);
+	BUG_ON(sched_group_nodes_bycpu == NULL);
+#endif
 	sched_init_granularity();
 }
 #endif /* CONFIG_SMP */
@@ -7393,6 +7388,35 @@ static void init_tg_rt_entry(struct rq *
 void __init sched_init(void)
 {
 	int i, j;
+	unsigned long alloc_size = 0, ptr;
+
+#ifdef CONFIG_FAIR_GROUP_SCHED
+	alloc_size += 2 * nr_cpu_ids * sizeof(void **);
+#endif
+#ifdef CONFIG_RT_GROUP_SCHED
+	alloc_size += 2 * nr_cpu_ids * sizeof(void **);
+#endif
+	/*
+	 * As sched_init() is called before page_alloc is setup,
+	 * we use alloc_bootmem().
+	 */
+	if (alloc_size) {
+		ptr = (unsigned long)alloc_bootmem_low(alloc_size);
+
+#ifdef CONFIG_FAIR_GROUP_SCHED
+		init_task_group.se = (struct sched_entity **)ptr;
+		ptr += nr_cpu_ids * sizeof(void **);
+
+		init_task_group.cfs_rq = (struct cfs_rq **)ptr;
+		ptr += nr_cpu_ids * sizeof(void **);
+#endif
+#ifdef CONFIG_RT_GROUP_SCHED
+		init_task_group.rt_se = (struct sched_rt_entity **)ptr;
+		ptr += nr_cpu_ids * sizeof(void **);
+
+		init_task_group.rt_rq = (struct rt_rq **)ptr;
+#endif
+	}
 
 #ifdef CONFIG_SMP
 	init_defrootdomain();
@@ -7643,10 +7667,10 @@ static int alloc_fair_sched_group(struct
 	struct rq *rq;
 	int i;
 
-	tg->cfs_rq = kzalloc(sizeof(cfs_rq) * NR_CPUS, GFP_KERNEL);
+	tg->cfs_rq = kzalloc(sizeof(cfs_rq) * nr_cpu_ids, GFP_KERNEL);
 	if (!tg->cfs_rq)
 		goto err;
-	tg->se = kzalloc(sizeof(se) * NR_CPUS, GFP_KERNEL);
+	tg->se = kzalloc(sizeof(se) * nr_cpu_ids, GFP_KERNEL);
 	if (!tg->se)
 		goto err;
 
@@ -7728,10 +7752,10 @@ static int alloc_rt_sched_group(struct t
 	struct rq *rq;
 	int i;
 
-	tg->rt_rq = kzalloc(sizeof(rt_rq) * NR_CPUS, GFP_KERNEL);
+	tg->rt_rq = kzalloc(sizeof(rt_rq) * nr_cpu_ids, GFP_KERNEL);
 	if (!tg->rt_rq)
 		goto err;
-	tg->rt_se = kzalloc(sizeof(rt_se) * NR_CPUS, GFP_KERNEL);
+	tg->rt_se = kzalloc(sizeof(rt_se) * nr_cpu_ids, GFP_KERNEL);
 	if (!tg->rt_se)
 		goto err;
 


* [PATCH 05/12] x86: use new set_cpus_allowed_ptr function
From: Mike Travis @ 2008-04-05  1:11 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, H. Peter Anvin, Andrew Morton, linux-kernel,
	Len Brown, Dave Jones


  * Use the new set_cpus_allowed_ptr() function added by the previous
    patch, which, instead of passing the "newly allowed cpus"
    cpumask_t arg by value, passes it by pointer:

    -int set_cpus_allowed(struct task_struct *p, cpumask_t new_mask)
    +int set_cpus_allowed_ptr(struct task_struct *p, const cpumask_t *new_mask)

  * Clean up uses of CPU_MASK_ALL.

  * Collapse other NR_CPUS changes into
    arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c and use pointers to
    cpumask_t arguments whenever possible.
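
Most of the hunks below follow the same save/migrate/restore pattern;
condensed (the per-cpu work in the middle varies per driver):

	cpumask_t saved_mask = current->cpus_allowed; /* one unavoidable copy */

	/* migrate to the target cpu; only a pointer crosses the call */
	set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));

	/* ... per-cpu work: MSR access, frequency transition, ... */

	set_cpus_allowed_ptr(current, &saved_mask);   /* migrate back */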

Depends on:
	[sched-devel]: sched: add new set_cpus_allowed_ptr function

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   x86/latest          .../x86/linux-2.6-x86.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git

# acpi/cpufreq
Cc: Len Brown <len.brown@intel.com>
Cc: Dave Jones <davej@codemonkey.org.uk>

Signed-off-by: Mike Travis <travis@sgi.com>
---
 arch/x86/kernel/acpi/cstate.c                    |    4 +-
 arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c       |   28 ++++++++++----------
 arch/x86/kernel/cpu/cpufreq/powernow-k8.c        |   32 ++++++++++++-----------
 arch/x86/kernel/cpu/cpufreq/speedstep-centrino.c |   13 +++++----
 arch/x86/kernel/cpu/cpufreq/speedstep-ich.c      |   20 +++++++-------
 arch/x86/kernel/cpu/intel_cacheinfo.c            |    4 +-
 arch/x86/kernel/microcode.c                      |   16 +++++------
 arch/x86/kernel/reboot.c                         |    2 -
 8 files changed, 61 insertions(+), 58 deletions(-)

--- linux-2.6.x86.orig/arch/x86/kernel/acpi/cstate.c
+++ linux-2.6.x86/arch/x86/kernel/acpi/cstate.c
@@ -91,7 +91,7 @@ int acpi_processor_ffh_cstate_probe(unsi
 
 	/* Make sure we are running on right CPU */
 	saved_mask = current->cpus_allowed;
-	retval = set_cpus_allowed(current, cpumask_of_cpu(cpu));
+	retval = set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
 	if (retval)
 		return -1;
 
@@ -128,7 +128,7 @@ int acpi_processor_ffh_cstate_probe(unsi
 		 cx->address);
 
 out:
-	set_cpus_allowed(current, saved_mask);
+	set_cpus_allowed_ptr(current, &saved_mask);
 	return retval;
 }
 EXPORT_SYMBOL_GPL(acpi_processor_ffh_cstate_probe);
--- linux-2.6.x86.orig/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
+++ linux-2.6.x86/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
@@ -192,9 +192,9 @@ static void drv_read(struct drv_cmd *cmd
 	cpumask_t saved_mask = current->cpus_allowed;
 	cmd->val = 0;
 
-	set_cpus_allowed(current, cmd->mask);
+	set_cpus_allowed_ptr(current, &cmd->mask);
 	do_drv_read(cmd);
-	set_cpus_allowed(current, saved_mask);
+	set_cpus_allowed_ptr(current, &saved_mask);
 }
 
 static void drv_write(struct drv_cmd *cmd)
@@ -203,30 +203,30 @@ static void drv_write(struct drv_cmd *cm
 	unsigned int i;
 
 	for_each_cpu_mask(i, cmd->mask) {
-		set_cpus_allowed(current, cpumask_of_cpu(i));
+		set_cpus_allowed_ptr(current, &cpumask_of_cpu(i));
 		do_drv_write(cmd);
 	}
 
-	set_cpus_allowed(current, saved_mask);
+	set_cpus_allowed_ptr(current, &saved_mask);
 	return;
 }
 
-static u32 get_cur_val(cpumask_t mask)
+static u32 get_cur_val(const cpumask_t *mask)
 {
 	struct acpi_processor_performance *perf;
 	struct drv_cmd cmd;
 
-	if (unlikely(cpus_empty(mask)))
+	if (unlikely(cpus_empty(*mask)))
 		return 0;
 
-	switch (per_cpu(drv_data, first_cpu(mask))->cpu_feature) {
+	switch (per_cpu(drv_data, first_cpu(*mask))->cpu_feature) {
 	case SYSTEM_INTEL_MSR_CAPABLE:
 		cmd.type = SYSTEM_INTEL_MSR_CAPABLE;
 		cmd.addr.msr.reg = MSR_IA32_PERF_STATUS;
 		break;
 	case SYSTEM_IO_CAPABLE:
 		cmd.type = SYSTEM_IO_CAPABLE;
-		perf = per_cpu(drv_data, first_cpu(mask))->acpi_data;
+		perf = per_cpu(drv_data, first_cpu(*mask))->acpi_data;
 		cmd.addr.io.port = perf->control_register.address;
 		cmd.addr.io.bit_width = perf->control_register.bit_width;
 		break;
@@ -234,7 +234,7 @@ static u32 get_cur_val(cpumask_t mask)
 		return 0;
 	}
 
-	cmd.mask = mask;
+	cmd.mask = *mask;
 
 	drv_read(&cmd);
 
@@ -271,7 +271,7 @@ static unsigned int get_measured_perf(un
 	unsigned int retval;
 
 	saved_mask = current->cpus_allowed;
-	set_cpus_allowed(current, cpumask_of_cpu(cpu));
+	set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
 	if (get_cpu() != cpu) {
 		/* We were not able to run on requested processor */
 		put_cpu();
@@ -329,7 +329,7 @@ static unsigned int get_measured_perf(un
 	retval = per_cpu(drv_data, cpu)->max_freq * perf_percent / 100;
 
 	put_cpu();
-	set_cpus_allowed(current, saved_mask);
+	set_cpus_allowed_ptr(current, &saved_mask);
 
 	dprintk("cpu %d: performance percent %d\n", cpu, perf_percent);
 	return retval;
@@ -347,13 +347,13 @@ static unsigned int get_cur_freq_on_cpu(
 		return 0;
 	}
 
-	freq = extract_freq(get_cur_val(cpumask_of_cpu(cpu)), data);
+	freq = extract_freq(get_cur_val(&cpumask_of_cpu(cpu)), data);
 	dprintk("cur freq = %u\n", freq);
 
 	return freq;
 }
 
-static unsigned int check_freqs(cpumask_t mask, unsigned int freq,
+static unsigned int check_freqs(const cpumask_t *mask, unsigned int freq,
 				struct acpi_cpufreq_data *data)
 {
 	unsigned int cur_freq;
@@ -449,7 +449,7 @@ static int acpi_cpufreq_target(struct cp
 	drv_write(&cmd);
 
 	if (acpi_pstate_strict) {
-		if (!check_freqs(cmd.mask, freqs.new, data)) {
+		if (!check_freqs(&cmd.mask, freqs.new, data)) {
 			dprintk("acpi_cpufreq_target failed (%d)\n",
 				policy->cpu);
 			return -EAGAIN;
--- linux-2.6.x86.orig/arch/x86/kernel/cpu/cpufreq/powernow-k8.c
+++ linux-2.6.x86/arch/x86/kernel/cpu/cpufreq/powernow-k8.c
@@ -478,12 +478,12 @@ static int core_voltage_post_transition(
 
 static int check_supported_cpu(unsigned int cpu)
 {
-	cpumask_t oldmask = CPU_MASK_ALL;
+	cpumask_t oldmask;
 	u32 eax, ebx, ecx, edx;
 	unsigned int rc = 0;
 
 	oldmask = current->cpus_allowed;
-	set_cpus_allowed(current, cpumask_of_cpu(cpu));
+	set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
 
 	if (smp_processor_id() != cpu) {
 		printk(KERN_ERR PFX "limiting to cpu %u failed\n", cpu);
@@ -528,7 +528,7 @@ static int check_supported_cpu(unsigned 
 	rc = 1;
 
 out:
-	set_cpus_allowed(current, oldmask);
+	set_cpus_allowed_ptr(current, &oldmask);
 	return rc;
 }
 
@@ -1015,7 +1015,7 @@ static int transition_frequency_pstate(s
 /* Driver entry point to switch to the target frequency */
 static int powernowk8_target(struct cpufreq_policy *pol, unsigned targfreq, unsigned relation)
 {
-	cpumask_t oldmask = CPU_MASK_ALL;
+	cpumask_t oldmask;
 	struct powernow_k8_data *data = per_cpu(powernow_data, pol->cpu);
 	u32 checkfid;
 	u32 checkvid;
@@ -1030,7 +1030,7 @@ static int powernowk8_target(struct cpuf
 
 	/* only run on specific CPU from here on */
 	oldmask = current->cpus_allowed;
-	set_cpus_allowed(current, cpumask_of_cpu(pol->cpu));
+	set_cpus_allowed_ptr(current, &cpumask_of_cpu(pol->cpu));
 
 	if (smp_processor_id() != pol->cpu) {
 		printk(KERN_ERR PFX "limiting to cpu %u failed\n", pol->cpu);
@@ -1085,7 +1085,7 @@ static int powernowk8_target(struct cpuf
 	ret = 0;
 
 err_out:
-	set_cpus_allowed(current, oldmask);
+	set_cpus_allowed_ptr(current, &oldmask);
 	return ret;
 }
 
@@ -1104,7 +1104,7 @@ static int powernowk8_verify(struct cpuf
 static int __cpuinit powernowk8_cpu_init(struct cpufreq_policy *pol)
 {
 	struct powernow_k8_data *data;
-	cpumask_t oldmask = CPU_MASK_ALL;
+	cpumask_t oldmask;
 	int rc;
 
 	if (!cpu_online(pol->cpu))
@@ -1145,7 +1145,7 @@ static int __cpuinit powernowk8_cpu_init
 
 	/* only run on specific CPU from here on */
 	oldmask = current->cpus_allowed;
-	set_cpus_allowed(current, cpumask_of_cpu(pol->cpu));
+	set_cpus_allowed_ptr(current, &cpumask_of_cpu(pol->cpu));
 
 	if (smp_processor_id() != pol->cpu) {
 		printk(KERN_ERR PFX "limiting to cpu %u failed\n", pol->cpu);
@@ -1164,7 +1164,7 @@ static int __cpuinit powernowk8_cpu_init
 		fidvid_msr_init();
 
 	/* run on any CPU again */
-	set_cpus_allowed(current, oldmask);
+	set_cpus_allowed_ptr(current, &oldmask);
 
 	if (cpu_family == CPU_HW_PSTATE)
 		pol->cpus = cpumask_of_cpu(pol->cpu);
@@ -1205,7 +1205,7 @@ static int __cpuinit powernowk8_cpu_init
 	return 0;
 
 err_out:
-	set_cpus_allowed(current, oldmask);
+	set_cpus_allowed_ptr(current, &oldmask);
 	powernow_k8_cpu_exit_acpi(data);
 
 	kfree(data);
@@ -1242,10 +1242,11 @@ static unsigned int powernowk8_get (unsi
 	if (!data)
 		return -EINVAL;
 
-	set_cpus_allowed(current, cpumask_of_cpu(cpu));
+	set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
 	if (smp_processor_id() != cpu) {
-		printk(KERN_ERR PFX "limiting to CPU %d failed in powernowk8_get\n", cpu);
-		set_cpus_allowed(current, oldmask);
+		printk(KERN_ERR PFX
+			"limiting to CPU %d failed in powernowk8_get\n", cpu);
+		set_cpus_allowed_ptr(current, &oldmask);
 		return 0;
 	}
 
@@ -1253,13 +1254,14 @@ static unsigned int powernowk8_get (unsi
 		goto out;
 
 	if (cpu_family == CPU_HW_PSTATE)
-		khz = find_khz_freq_from_pstate(data->powernow_table, data->currpstate);
+		khz = find_khz_freq_from_pstate(data->powernow_table,
+						data->currpstate);
 	else
 		khz = find_khz_freq_from_fid(data->currfid);
 
 
 out:
-	set_cpus_allowed(current, oldmask);
+	set_cpus_allowed_ptr(current, &oldmask);
 	return khz;
 }
 
--- linux-2.6.x86.orig/arch/x86/kernel/cpu/cpufreq/speedstep-centrino.c
+++ linux-2.6.x86/arch/x86/kernel/cpu/cpufreq/speedstep-centrino.c
@@ -315,7 +315,7 @@ static unsigned int get_cur_freq(unsigne
 	cpumask_t saved_mask;
 
 	saved_mask = current->cpus_allowed;
-	set_cpus_allowed(current, cpumask_of_cpu(cpu));
+	set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
 	if (smp_processor_id() != cpu)
 		return 0;
 
@@ -333,7 +333,7 @@ static unsigned int get_cur_freq(unsigne
 		clock_freq = extract_clock(l, cpu, 1);
 	}
 
-	set_cpus_allowed(current, saved_mask);
+	set_cpus_allowed_ptr(current, &saved_mask);
 	return clock_freq;
 }
 
@@ -487,7 +487,7 @@ static int centrino_target (struct cpufr
 		else
 			cpu_set(j, set_mask);
 
-		set_cpus_allowed(current, set_mask);
+		set_cpus_allowed_ptr(current, &set_mask);
 		preempt_disable();
 		if (unlikely(!cpu_isset(smp_processor_id(), set_mask))) {
 			dprintk("couldn't limit to CPUs in this domain\n");
@@ -555,7 +555,8 @@ static int centrino_target (struct cpufr
 
 		if (!cpus_empty(covered_cpus)) {
 			for_each_cpu_mask(j, covered_cpus) {
-				set_cpus_allowed(current, cpumask_of_cpu(j));
+				set_cpus_allowed_ptr(current,
+						     &cpumask_of_cpu(j));
 				wrmsr(MSR_IA32_PERF_CTL, oldmsr, h);
 			}
 		}
@@ -569,12 +570,12 @@ static int centrino_target (struct cpufr
 			cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
 		}
 	}
-	set_cpus_allowed(current, saved_mask);
+	set_cpus_allowed_ptr(current, &saved_mask);
 	return 0;
 
 migrate_end:
 	preempt_enable();
-	set_cpus_allowed(current, saved_mask);
+	set_cpus_allowed_ptr(current, &saved_mask);
 	return 0;
 }
 
--- linux-2.6.x86.orig/arch/x86/kernel/cpu/cpufreq/speedstep-ich.c
+++ linux-2.6.x86/arch/x86/kernel/cpu/cpufreq/speedstep-ich.c
@@ -229,22 +229,22 @@ static unsigned int speedstep_detect_chi
 	return 0;
 }
 
-static unsigned int _speedstep_get(cpumask_t cpus)
+static unsigned int _speedstep_get(const cpumask_t *cpus)
 {
 	unsigned int speed;
 	cpumask_t cpus_allowed;
 
 	cpus_allowed = current->cpus_allowed;
-	set_cpus_allowed(current, cpus);
+	set_cpus_allowed_ptr(current, cpus);
 	speed = speedstep_get_processor_frequency(speedstep_processor);
-	set_cpus_allowed(current, cpus_allowed);
+	set_cpus_allowed_ptr(current, &cpus_allowed);
 	dprintk("detected %u kHz as current frequency\n", speed);
 	return speed;
 }
 
 static unsigned int speedstep_get(unsigned int cpu)
 {
-	return _speedstep_get(cpumask_of_cpu(cpu));
+	return _speedstep_get(&cpumask_of_cpu(cpu));
 }
 
 /**
@@ -267,7 +267,7 @@ static int speedstep_target (struct cpuf
 	if (cpufreq_frequency_table_target(policy, &speedstep_freqs[0], target_freq, relation, &newstate))
 		return -EINVAL;
 
-	freqs.old = _speedstep_get(policy->cpus);
+	freqs.old = _speedstep_get(&policy->cpus);
 	freqs.new = speedstep_freqs[newstate].frequency;
 	freqs.cpu = policy->cpu;
 
@@ -285,12 +285,12 @@ static int speedstep_target (struct cpuf
 	}
 
 	/* switch to physical CPU where state is to be changed */
-	set_cpus_allowed(current, policy->cpus);
+	set_cpus_allowed_ptr(current, &policy->cpus);
 
 	speedstep_set_state(newstate);
 
 	/* allow to be run on all CPUs */
-	set_cpus_allowed(current, cpus_allowed);
+	set_cpus_allowed_ptr(current, &cpus_allowed);
 
 	for_each_cpu_mask(i, policy->cpus) {
 		freqs.cpu = i;
@@ -326,7 +326,7 @@ static int speedstep_cpu_init(struct cpu
 #endif
 
 	cpus_allowed = current->cpus_allowed;
-	set_cpus_allowed(current, policy->cpus);
+	set_cpus_allowed_ptr(current, &policy->cpus);
 
 	/* detect low and high frequency and transition latency */
 	result = speedstep_get_freqs(speedstep_processor,
@@ -334,12 +334,12 @@ static int speedstep_cpu_init(struct cpu
 				     &speedstep_freqs[SPEEDSTEP_HIGH].frequency,
 				     &policy->cpuinfo.transition_latency,
 				     &speedstep_set_state);
-	set_cpus_allowed(current, cpus_allowed);
+	set_cpus_allowed_ptr(current, &cpus_allowed);
 	if (result)
 		return result;
 
 	/* get current speed setting */
-	speed = _speedstep_get(policy->cpus);
+	speed = _speedstep_get(&policy->cpus);
 	if (!speed)
 		return -EIO;
 
--- linux-2.6.x86.orig/arch/x86/kernel/cpu/intel_cacheinfo.c
+++ linux-2.6.x86/arch/x86/kernel/cpu/intel_cacheinfo.c
@@ -525,7 +525,7 @@ static int __cpuinit detect_cache_attrib
 		return -ENOMEM;
 
 	oldmask = current->cpus_allowed;
-	retval = set_cpus_allowed(current, cpumask_of_cpu(cpu));
+	retval = set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
 	if (retval)
 		goto out;
 
@@ -542,7 +542,7 @@ static int __cpuinit detect_cache_attrib
 		}
 		cache_shared_cpu_map_setup(cpu, j);
 	}
-	set_cpus_allowed(current, oldmask);
+	set_cpus_allowed_ptr(current, &oldmask);
 
 out:
 	if (retval) {
--- linux-2.6.x86.orig/arch/x86/kernel/microcode.c
+++ linux-2.6.x86/arch/x86/kernel/microcode.c
@@ -402,7 +402,7 @@ static int do_microcode_update (void)
 
 			if (!uci->valid)
 				continue;
-			set_cpus_allowed(current, cpumask_of_cpu(cpu));
+			set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
 			error = get_maching_microcode(new_mc, cpu);
 			if (error < 0)
 				goto out;
@@ -416,7 +416,7 @@ out:
 		vfree(new_mc);
 	if (cursor < 0)
 		error = cursor;
-	set_cpus_allowed(current, old);
+	set_cpus_allowed_ptr(current, &old);
 	return error;
 }
 
@@ -579,7 +579,7 @@ static int apply_microcode_check_cpu(int
 		return 0;
 
 	old = current->cpus_allowed;
-	set_cpus_allowed(current, cpumask_of_cpu(cpu));
+	set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
 
 	/* Check if the microcode we have in memory matches the CPU */
 	if (c->x86_vendor != X86_VENDOR_INTEL || c->x86 < 6 ||
@@ -610,7 +610,7 @@ static int apply_microcode_check_cpu(int
 			" sig=0x%x, pf=0x%x, rev=0x%x\n",
 			cpu, uci->sig, uci->pf, uci->rev);
 
-	set_cpus_allowed(current, old);
+	set_cpus_allowed_ptr(current, &old);
 	return err;
 }
 
@@ -621,13 +621,13 @@ static void microcode_init_cpu(int cpu, 
 
 	old = current->cpus_allowed;
 
-	set_cpus_allowed(current, cpumask_of_cpu(cpu));
+	set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
 	mutex_lock(&microcode_mutex);
 	collect_cpu_info(cpu);
 	if (uci->valid && system_state == SYSTEM_RUNNING && !resume)
 		cpu_request_microcode(cpu);
 	mutex_unlock(&microcode_mutex);
-	set_cpus_allowed(current, old);
+	set_cpus_allowed_ptr(current, &old);
 }
 
 static void microcode_fini_cpu(int cpu)
@@ -657,14 +657,14 @@ static ssize_t reload_store(struct sys_d
 		old = current->cpus_allowed;
 
 		get_online_cpus();
-		set_cpus_allowed(current, cpumask_of_cpu(cpu));
+		set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
 
 		mutex_lock(&microcode_mutex);
 		if (uci->valid)
 			err = cpu_request_microcode(cpu);
 		mutex_unlock(&microcode_mutex);
 		put_online_cpus();
-		set_cpus_allowed(current, old);
+		set_cpus_allowed_ptr(current, &old);
 	}
 	if (err)
 		return err;
--- linux-2.6.x86.orig/arch/x86/kernel/reboot.c
+++ linux-2.6.x86/arch/x86/kernel/reboot.c
@@ -420,7 +420,7 @@ static void native_machine_shutdown(void
 		reboot_cpu_id = smp_processor_id();
 
 	/* Make certain I only run on the appropriate processor */
-	set_cpus_allowed(current, cpumask_of_cpu(reboot_cpu_id));
+	set_cpus_allowed_ptr(current, &cpumask_of_cpu(reboot_cpu_id));
 
 	/* O.K Now that I'm on the appropriate processor,
 	 * stop all of the others.


* [PATCH 06/12] generic: use new set_cpus_allowed_ptr function
From: Mike Travis @ 2008-04-05  1:11 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Thomas Gleixner, H. Peter Anvin, Andrew Morton, linux-kernel


  * Use the new set_cpus_allowed_ptr() function added by the previous
    patch, which, instead of passing the "newly allowed cpus"
    cpumask_t arg by value, passes it by pointer:

    -int set_cpus_allowed(struct task_struct *p, cpumask_t new_mask)
    +int set_cpus_allowed_ptr(struct task_struct *p, const cpumask_t *new_mask)

  * Replace CPU_MASK_ALL usage with cpus_setall() on a local mask, or
    with CPU_MASK_ALL_PTR where a pointer suffices.
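
Where a writable all-set mask is still needed (e.g. the _cpu_down()
hunk below, which clears one bit), a local is filled in place with
cpus_setall() rather than being copied from a CPU_MASK_ALL rvalue;
condensed sketch:

	cpumask_t tmp;

	cpus_setall(tmp);		/* was: cpumask_t tmp = CPU_MASK_ALL; */
	cpu_clear(cpu, tmp);
	set_cpus_allowed_ptr(current, &tmp);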

Depends on:
	[sched-devel]: sched: add new set_cpus_allowed_ptr function

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   x86/latest          .../x86/linux-2.6-x86.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git

Signed-off-by: Mike Travis <travis@sgi.com>
---
 drivers/acpi/processor_throttling.c |   10 +++++-----
 drivers/firmware/dcdbas.c           |    4 ++--
 drivers/pci/pci-driver.c            |    9 ++++++---
 kernel/cpu.c                        |    6 +++---
 kernel/kmod.c                       |    2 +-
 kernel/kthread.c                    |    6 +++---
 kernel/rcutorture.c                 |   15 +++++++++------
 kernel/stop_machine.c               |    2 +-
 kernel/trace/trace_sysprof.c        |    4 ++--
 9 files changed, 32 insertions(+), 26 deletions(-)

--- linux-2.6.x86.orig/drivers/acpi/processor_throttling.c
+++ linux-2.6.x86/drivers/acpi/processor_throttling.c
@@ -838,10 +838,10 @@ static int acpi_processor_get_throttling
 	 * Migrate task to the cpu pointed by pr.
 	 */
 	saved_mask = current->cpus_allowed;
-	set_cpus_allowed(current, cpumask_of_cpu(pr->id));
+	set_cpus_allowed_ptr(current, &cpumask_of_cpu(pr->id));
 	ret = pr->throttling.acpi_processor_get_throttling(pr);
 	/* restore the previous state */
-	set_cpus_allowed(current, saved_mask);
+	set_cpus_allowed_ptr(current, &saved_mask);
 
 	return ret;
 }
@@ -1025,7 +1025,7 @@ int acpi_processor_set_throttling(struct
 	 * it can be called only for the cpu pointed by pr.
 	 */
 	if (p_throttling->shared_type == DOMAIN_COORD_TYPE_SW_ANY) {
-		set_cpus_allowed(current, cpumask_of_cpu(pr->id));
+		set_cpus_allowed_ptr(current, &cpumask_of_cpu(pr->id));
 		ret = p_throttling->acpi_processor_set_throttling(pr,
 						t_state.target_state);
 	} else {
@@ -1056,7 +1056,7 @@ int acpi_processor_set_throttling(struct
 				continue;
 			}
 			t_state.cpu = i;
-			set_cpus_allowed(current, cpumask_of_cpu(i));
+			set_cpus_allowed_ptr(current, &cpumask_of_cpu(i));
 			ret = match_pr->throttling.
 				acpi_processor_set_throttling(
 				match_pr, t_state.target_state);
@@ -1074,7 +1074,7 @@ int acpi_processor_set_throttling(struct
 							&t_state);
 	}
 	/* restore the previous state */
-	set_cpus_allowed(current, saved_mask);
+	set_cpus_allowed_ptr(current, &saved_mask);
 	return ret;
 }
 
--- linux-2.6.x86.orig/drivers/firmware/dcdbas.c
+++ linux-2.6.x86/drivers/firmware/dcdbas.c
@@ -265,7 +265,7 @@ static int smi_request(struct smi_cmd *s
 
 	/* SMI requires CPU 0 */
 	old_mask = current->cpus_allowed;
-	set_cpus_allowed(current, cpumask_of_cpu(0));
+	set_cpus_allowed_ptr(current, &cpumask_of_cpu(0));
 	if (smp_processor_id() != 0) {
 		dev_dbg(&dcdbas_pdev->dev, "%s: failed to get CPU 0\n",
 			__FUNCTION__);
@@ -285,7 +285,7 @@ static int smi_request(struct smi_cmd *s
 	);
 
 out:
-	set_cpus_allowed(current, old_mask);
+	set_cpus_allowed_ptr(current, &old_mask);
 	return ret;
 }
 
--- linux-2.6.x86.orig/drivers/pci/pci-driver.c
+++ linux-2.6.x86/drivers/pci/pci-driver.c
@@ -182,15 +182,18 @@ static int pci_call_probe(struct pci_dri
 	struct mempolicy *oldpol;
 	cpumask_t oldmask = current->cpus_allowed;
 	int node = dev_to_node(&dev->dev);
-	if (node >= 0)
-	    set_cpus_allowed(current, node_to_cpumask(node));
+
+	if (node >= 0) {
+		node_to_cpumask_ptr(nodecpumask, node);
+		set_cpus_allowed_ptr(current, nodecpumask);
+	}
 	/* And set default memory allocation policy */
 	oldpol = current->mempolicy;
 	current->mempolicy = NULL;	/* fall back to system default policy */
 #endif
 	error = drv->probe(dev, id);
 #ifdef CONFIG_NUMA
-	set_cpus_allowed(current, oldmask);
+	set_cpus_allowed_ptr(current, &oldmask);
 	current->mempolicy = oldpol;
 #endif
 	return error;
--- linux-2.6.x86.orig/kernel/cpu.c
+++ linux-2.6.x86/kernel/cpu.c
@@ -232,9 +232,9 @@ static int _cpu_down(unsigned int cpu, i
 
 	/* Ensure that we are not runnable on dying cpu */
 	old_allowed = current->cpus_allowed;
-	tmp = CPU_MASK_ALL;
+	cpus_setall(tmp);
 	cpu_clear(cpu, tmp);
-	set_cpus_allowed(current, tmp);
+	set_cpus_allowed_ptr(current, &tmp);
 
 	p = __stop_machine_run(take_cpu_down, &tcd_param, cpu);
 
@@ -268,7 +268,7 @@ static int _cpu_down(unsigned int cpu, i
 out_thread:
 	err = kthread_stop(p);
 out_allowed:
-	set_cpus_allowed(current, old_allowed);
+	set_cpus_allowed_ptr(current, &old_allowed);
 out_release:
 	cpu_hotplug_done();
 	return err;
--- linux-2.6.x86.orig/kernel/kmod.c
+++ linux-2.6.x86/kernel/kmod.c
@@ -165,7 +165,7 @@ static int ____call_usermodehelper(void 
 	}
 
 	/* We can run anywhere, unlike our parent keventd(). */
-	set_cpus_allowed(current, CPU_MASK_ALL);
+	set_cpus_allowed_ptr(current, CPU_MASK_ALL_PTR);
 
 	/*
 	 * Our parent is keventd, which runs with elevated scheduling priority.
--- linux-2.6.x86.orig/kernel/kthread.c
+++ linux-2.6.x86/kernel/kthread.c
@@ -109,7 +109,7 @@ static void create_kthread(struct kthrea
 		 */
 		sched_setscheduler(create->result, SCHED_NORMAL, &param);
 		set_user_nice(create->result, KTHREAD_NICE_LEVEL);
-		set_cpus_allowed(create->result, cpu_system_map);
+		set_cpus_allowed_ptr(create->result, &cpu_system_map);
 	}
 	complete(&create->done);
 }
@@ -235,7 +235,7 @@ int kthreadd(void *unused)
 	set_task_comm(tsk, "kthreadd");
 	ignore_signals(tsk);
 	set_user_nice(tsk, KTHREAD_NICE_LEVEL);
-	set_cpus_allowed(tsk, cpu_system_map);
+	set_cpus_allowed_ptr(tsk, &cpu_system_map);
 
 	current->flags |= PF_NOFREEZE;
 
@@ -284,7 +284,7 @@ again:
 			 */
 			get_task_struct(t);
 			rcu_read_unlock();
-			set_cpus_allowed(t, *new_system_map);
+			set_cpus_allowed_ptr(t, new_system_map);
 			put_task_struct(t);
 			goto again;
 		}
--- linux-2.6.x86.orig/kernel/rcutorture.c
+++ linux-2.6.x86/kernel/rcutorture.c
@@ -723,9 +723,10 @@ static int rcu_idle_cpu;	/* Force all to
  */
 static void rcu_torture_shuffle_tasks(void)
 {
-	cpumask_t tmp_mask = CPU_MASK_ALL;
+	cpumask_t tmp_mask;
 	int i;
 
+	cpus_setall(tmp_mask);
 	get_online_cpus();
 
 	/* No point in shuffling if there is only one online CPU (ex: UP) */
@@ -737,25 +738,27 @@ static void rcu_torture_shuffle_tasks(vo
 	if (rcu_idle_cpu != -1)
 		cpu_clear(rcu_idle_cpu, tmp_mask);
 
-	set_cpus_allowed(current, tmp_mask);
+	set_cpus_allowed_ptr(current, &tmp_mask);
 
 	if (reader_tasks) {
 		for (i = 0; i < nrealreaders; i++)
 			if (reader_tasks[i])
-				set_cpus_allowed(reader_tasks[i], tmp_mask);
+				set_cpus_allowed_ptr(reader_tasks[i],
+						     &tmp_mask);
 	}
 
 	if (fakewriter_tasks) {
 		for (i = 0; i < nfakewriters; i++)
 			if (fakewriter_tasks[i])
-				set_cpus_allowed(fakewriter_tasks[i], tmp_mask);
+				set_cpus_allowed_ptr(fakewriter_tasks[i],
+						     &tmp_mask);
 	}
 
 	if (writer_task)
-		set_cpus_allowed(writer_task, tmp_mask);
+		set_cpus_allowed_ptr(writer_task, &tmp_mask);
 
 	if (stats_task)
-		set_cpus_allowed(stats_task, tmp_mask);
+		set_cpus_allowed_ptr(stats_task, &tmp_mask);
 
 	if (rcu_idle_cpu == -1)
 		rcu_idle_cpu = num_online_cpus() - 1;
--- linux-2.6.x86.orig/kernel/stop_machine.c
+++ linux-2.6.x86/kernel/stop_machine.c
@@ -35,7 +35,7 @@ static int stopmachine(void *cpu)
 	int irqs_disabled = 0;
 	int prepared = 0;
 
-	set_cpus_allowed(current, cpumask_of_cpu((int)(long)cpu));
+	set_cpus_allowed_ptr(current, &cpumask_of_cpu((int)(long)cpu));
 
 	/* Ack: we are alive */
 	smp_mb(); /* Theoretically the ack = 0 might not be on this CPU yet. */
--- linux-2.6.x86.orig/kernel/trace/trace_sysprof.c
+++ linux-2.6.x86/kernel/trace/trace_sysprof.c
@@ -205,10 +205,10 @@ static void start_stack_timers(void)
 	int cpu;
 
 	for_each_online_cpu(cpu) {
-		set_cpus_allowed(current, cpumask_of_cpu(cpu));
+		set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
 		start_stack_timer(cpu);
 	}
-	set_cpus_allowed(current, saved_mask);
+	set_cpus_allowed_ptr(current, &saved_mask);
 }
 
 static void stop_stack_timer(int cpu)

-- 


* [PATCH 07/12] cpuset: modify cpuset_set_cpus_allowed to use cpumask pointer
  2008-04-05  1:11 [PATCH 00/12] cpumask: reduce stack pressure from local/passed cpumask variables v3 Mike Travis
                   ` (5 preceding siblings ...)
  2008-04-05  1:11 ` [PATCH 06/12] generic: " Mike Travis
@ 2008-04-05  1:11 ` Mike Travis
  2008-04-05  1:11 ` [PATCH 08/12] generic: reduce stack pressure in sched_affinity Mike Travis
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Mike Travis @ 2008-04-05  1:11 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Thomas Gleixner, H. Peter Anvin, Andrew Morton, linux-kernel

[-- Attachment #1: cpuset_cpus_allowed --]
[-- Type: text/plain, Size: 6360 bytes --]

  * Modify cpuset_cpus_allowed to return the currently allowed cpumask
    via a pointer argument instead of as the function return value.

  * Use new set_cpus_allowed_ptr function.

  * Cleanup CPU_MASK_ALL and NODE_MASK_ALL uses.
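
The interface change is easiest to see side by side; a minimal
standalone sketch (userspace C, with a hypothetical mask_t in place of
cpumask_t) contrasting the old return-by-value convention with the new
fill-through-pointer one:

#include <string.h>

typedef struct { unsigned long bits[16]; } mask_t; /* models cpumask_t */

/* old style: the whole struct is returned (and copied) by value */
static mask_t cpus_allowed_old(void)
{
	mask_t m;

	memset(&m, 0xff, sizeof(m));	/* "all cpus allowed" */
	return m;
}

/* new style: the caller supplies storage, only a pointer is passed */
static void cpus_allowed_new(mask_t *pmask)
{
	memset(pmask, 0xff, sizeof(*pmask));
}

int main(void)
{
	mask_t m1, m2;

	m1 = cpus_allowed_old();	/* temporary plus copy on the stack */
	cpus_allowed_new(&m2);		/* fills caller storage in place */
	(void)m1; (void)m2;
	return 0;
}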

Depends on:
	[sched-devel]: sched: add new set_cpus_allowed_ptr function

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   x86/latest          .../x86/linux-2.6-x86.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git

Signed-off-by: Mike Travis <travis@sgi.com>
---
 include/linux/cpuset.h |   13 +++++++------
 kernel/cpuset.c        |   31 ++++++++++++-------------------
 kernel/sched.c         |    8 +++++---
 mm/pdflush.c           |    4 ++--
 4 files changed, 26 insertions(+), 30 deletions(-)

--- linux-2.6.x86.orig/include/linux/cpuset.h
+++ linux-2.6.x86/include/linux/cpuset.h
@@ -20,8 +20,8 @@ extern int number_of_cpusets;	/* How man
 extern int cpuset_init_early(void);
 extern int cpuset_init(void);
 extern void cpuset_init_smp(void);
-extern cpumask_t cpuset_cpus_allowed(struct task_struct *p);
-extern cpumask_t cpuset_cpus_allowed_locked(struct task_struct *p);
+extern void cpuset_cpus_allowed(struct task_struct *p, cpumask_t *mask);
+extern void cpuset_cpus_allowed_locked(struct task_struct *p, cpumask_t *mask);
 extern nodemask_t cpuset_mems_allowed(struct task_struct *p);
 #define cpuset_current_mems_allowed (current->mems_allowed)
 void cpuset_init_current_mems_allowed(void);
@@ -86,13 +86,14 @@ static inline int cpuset_init_early(void
 static inline int cpuset_init(void) { return 0; }
 static inline void cpuset_init_smp(void) {}
 
-static inline cpumask_t cpuset_cpus_allowed(struct task_struct *p)
+static inline void cpuset_cpus_allowed(struct task_struct *p, cpumask_t *mask)
 {
-	return cpu_possible_map;
+	*mask = cpu_possible_map;
 }
-static inline cpumask_t cpuset_cpus_allowed_locked(struct task_struct *p)
+static inline void cpuset_cpus_allowed_locked(struct task_struct *p,
+								cpumask_t *mask)
 {
-	return cpu_possible_map;
+	*mask = cpu_possible_map;
 }
 
 static inline nodemask_t cpuset_mems_allowed(struct task_struct *p)
--- linux-2.6.x86.orig/kernel/cpuset.c
+++ linux-2.6.x86/kernel/cpuset.c
@@ -741,7 +741,7 @@ int cpuset_test_cpumask(struct task_stru
  */
 void cpuset_change_cpumask(struct task_struct *tsk, struct cgroup_scanner *scan)
 {
-	set_cpus_allowed(tsk, (cgroup_cs(scan->cg))->cpus_allowed);
+	set_cpus_allowed_ptr(tsk, &((cgroup_cs(scan->cg))->cpus_allowed));
 }
 
 /**
@@ -1269,7 +1269,7 @@ static void cpuset_attach(struct cgroup_
 
 	mutex_lock(&callback_mutex);
 	guarantee_online_cpus(cs, &cpus);
-	set_cpus_allowed(tsk, cpus);
+	set_cpus_allowed_ptr(tsk, &cpus);
 	mutex_unlock(&callback_mutex);
 
 	from = oldcs->mems_allowed;
@@ -1663,8 +1663,8 @@ static struct cgroup_subsys_state *cpuse
 		set_bit(CS_SPREAD_SLAB, &cs->flags);
 	set_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
 	set_bit(CS_SYSTEM, &cs->flags);
-	cs->cpus_allowed = CPU_MASK_NONE;
-	cs->mems_allowed = NODE_MASK_NONE;
+	cpus_clear(cs->cpus_allowed);
+	nodes_clear(cs->mems_allowed);
 	cs->mems_generation = cpuset_mems_generation++;
 	fmeter_init(&cs->fmeter);
 
@@ -1737,8 +1737,8 @@ int __init cpuset_init(void)
 {
 	int err = 0;
 
-	top_cpuset.cpus_allowed = CPU_MASK_ALL;
-	top_cpuset.mems_allowed = NODE_MASK_ALL;
+	cpus_setall(top_cpuset.cpus_allowed);
+	nodes_setall(top_cpuset.mems_allowed);
 
 	fmeter_init(&top_cpuset.fmeter);
 	top_cpuset.mems_generation = cpuset_mems_generation++;
@@ -1957,6 +1957,7 @@ void __init cpuset_init_smp(void)
 
  * cpuset_cpus_allowed - return cpus_allowed mask from a tasks cpuset.
  * @tsk: pointer to task_struct from which to obtain cpuset->cpus_allowed.
+ * @pmask: pointer to cpumask_t variable to receive cpus_allowed set.
  *
  * Description: Returns the cpumask_t cpus_allowed of the cpuset
  * attached to the specified @tsk.  Guaranteed to return some non-empty
@@ -1964,35 +1965,27 @@ void __init cpuset_init_smp(void)
  * tasks cpuset.
  **/
 
-cpumask_t cpuset_cpus_allowed(struct task_struct *tsk)
+void cpuset_cpus_allowed(struct task_struct *tsk, cpumask_t *pmask)
 {
-	cpumask_t mask;
-
 	mutex_lock(&callback_mutex);
-	mask = cpuset_cpus_allowed_locked(tsk);
+	cpuset_cpus_allowed_locked(tsk, pmask);
 	mutex_unlock(&callback_mutex);
-
-	return mask;
 }
 
 /**
  * cpuset_cpus_allowed_locked - return cpus_allowed mask from a tasks cpuset.
  * Must be called with callback_mutex held.
  **/
-cpumask_t cpuset_cpus_allowed_locked(struct task_struct *tsk)
+void cpuset_cpus_allowed_locked(struct task_struct *tsk, cpumask_t *pmask)
 {
-	cpumask_t mask;
-
 	task_lock(tsk);
-	guarantee_online_cpus(task_cs(tsk), &mask);
+	guarantee_online_cpus(task_cs(tsk), pmask);
 	task_unlock(tsk);
-
-	return mask;
 }
 
 void cpuset_init_current_mems_allowed(void)
 {
-	current->mems_allowed = NODE_MASK_ALL;
+	nodes_setall(current->mems_allowed);
 }
 
 /**
--- linux-2.6.x86.orig/kernel/sched.c
+++ linux-2.6.x86/kernel/sched.c
@@ -4996,13 +4996,13 @@ long sched_setaffinity(pid_t pid, cpumas
 	if (retval)
 		goto out_unlock;
 
-	cpus_allowed = cpuset_cpus_allowed(p);
+	cpuset_cpus_allowed(p, &cpus_allowed);
 	cpus_and(new_mask, new_mask, cpus_allowed);
  again:
 	retval = set_cpus_allowed(p, new_mask);
 
 	if (!retval) {
-		cpus_allowed = cpuset_cpus_allowed(p);
+		cpuset_cpus_allowed(p, &cpus_allowed);
 		if (!cpus_subset(new_mask, cpus_allowed)) {
 			/*
 			 * We must have raced with a concurrent cpuset
@@ -5719,7 +5719,9 @@ static void move_task_off_dead_cpu(int d
 
 		/* No more Mr. Nice Guy. */
 		if (dest_cpu >= nr_cpu_ids) {
-			cpumask_t cpus_allowed = cpuset_cpus_allowed_locked(p);
+			cpumask_t cpus_allowed;
+
+			cpuset_cpus_allowed_locked(p, &cpus_allowed);
 			/*
 			 * Try to stay on the same cpuset, where the
 			 * current cpuset may be a subset of all cpus.
--- linux-2.6.x86.orig/mm/pdflush.c
+++ linux-2.6.x86/mm/pdflush.c
@@ -187,8 +187,8 @@ static int pdflush(void *dummy)
 	 * This is needed as pdflush's are dynamically created and destroyed.
 	 * The boottime pdflush's are easily placed w/o these 2 lines.
 	 */
-	cpus_allowed = cpuset_cpus_allowed(current);
-	set_cpus_allowed(current, cpus_allowed);
+	cpuset_cpus_allowed(current, &cpus_allowed);
+	set_cpus_allowed_ptr(current, &cpus_allowed);
 
 	return __pdflush(&my_work);
 }

-- 


* [PATCH 08/12] generic: reduce stack pressure in sched_affinity
  2008-04-05  1:11 [PATCH 00/12] cpumask: reduce stack pressure from local/passed cpumask variables v3 Mike Travis
                   ` (6 preceding siblings ...)
  2008-04-05  1:11 ` [PATCH 07/12] cpuset: modify cpuset_set_cpus_allowed to use cpumask pointer Mike Travis
@ 2008-04-05  1:11 ` Mike Travis
  2008-04-05  1:11 ` [PATCH 09/12] numa: move large array from stack to _initdata section Mike Travis
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Mike Travis @ 2008-04-05  1:11 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, H. Peter Anvin, Andrew Morton, linux-kernel,
	Paul Jackson, Cliff Wickman

[-- Attachment #1: cpumask_affinity --]
[-- Type: text/plain, Size: 6158 bytes --]

  * Modify the sched_setaffinity functions to pass cpumask_t variables
    by reference instead of by value.

  * Use new set_cpus_allowed_ptr function.
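
The saving per call is easy to quantify: with NR_CPUS=4096 a cpumask_t
is 4096/8 = 512 bytes, so every by-value cpumask argument or return
value costs half a kilobyte of the caller's stack, against 8 bytes for
a pointer on 64-bit.  A standalone sketch (mask_t is a hypothetical
stand-in sized like such a cpumask_t):

#include <stdio.h>

#define NR_CPUS	4096

/* hypothetical stand-in sized like an NR_CPUS=4096 cpumask_t */
typedef struct {
	unsigned long bits[NR_CPUS / (8 * sizeof(unsigned long))];
} mask_t;

int main(void)
{
	printf("by value:   %zu bytes of stack per argument\n",
	       sizeof(mask_t));		/* 512 on a 64-bit build */
	printf("by pointer: %zu bytes of stack per argument\n",
	       sizeof(mask_t *));	/* 8 on a 64-bit build */
	return 0;
}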

Depends on:
	[sched-devel]: sched: add new set_cpus_allowed_ptr function

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   x86/latest          .../x86/linux-2.6-x86.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Jackson <pj@sgi.com>
Cc: Cliff Wickman <cpw@sgi.com>

Signed-off-by: Mike Travis <travis@sgi.com>
---
 arch/x86/kernel/cpu/mcheck/mce_amd_64.c |   46 ++++++++++++++++----------------
 include/linux/sched.h                   |    2 -
 kernel/compat.c                         |    2 -
 kernel/rcupreempt.c                     |    4 +-
 kernel/sched.c                          |    5 ++-
 5 files changed, 30 insertions(+), 29 deletions(-)

--- linux-2.6.x86.orig/arch/x86/kernel/cpu/mcheck/mce_amd_64.c
+++ linux-2.6.x86/arch/x86/kernel/cpu/mcheck/mce_amd_64.c
@@ -251,18 +251,18 @@ struct threshold_attr {
 	ssize_t(*store) (struct threshold_block *, const char *, size_t count);
 };
 
-static cpumask_t affinity_set(unsigned int cpu)
+static void affinity_set(unsigned int cpu, cpumask_t *oldmask,
+					   cpumask_t *newmask)
 {
-	cpumask_t oldmask = current->cpus_allowed;
-	cpumask_t newmask = CPU_MASK_NONE;
-	cpu_set(cpu, newmask);
-	set_cpus_allowed(current, newmask);
-	return oldmask;
+	*oldmask = current->cpus_allowed;
+	cpus_clear(*newmask);
+	cpu_set(cpu, *newmask);
+	set_cpus_allowed_ptr(current, newmask);
 }
 
-static void affinity_restore(cpumask_t oldmask)
+static void affinity_restore(const cpumask_t *oldmask)
 {
-	set_cpus_allowed(current, oldmask);
+	set_cpus_allowed_ptr(current, oldmask);
 }
 
 #define SHOW_FIELDS(name)                                           \
@@ -277,15 +277,15 @@ static ssize_t store_interrupt_enable(st
 				      const char *buf, size_t count)
 {
 	char *end;
-	cpumask_t oldmask;
+	cpumask_t oldmask, newmask;
 	unsigned long new = simple_strtoul(buf, &end, 0);
 	if (end == buf)
 		return -EINVAL;
 	b->interrupt_enable = !!new;
 
-	oldmask = affinity_set(b->cpu);
+	affinity_set(b->cpu, &oldmask, &newmask);
 	threshold_restart_bank(b, 0, 0);
-	affinity_restore(oldmask);
+	affinity_restore(&oldmask);
 
 	return end - buf;
 }
@@ -294,7 +294,7 @@ static ssize_t store_threshold_limit(str
 				     const char *buf, size_t count)
 {
 	char *end;
-	cpumask_t oldmask;
+	cpumask_t oldmask, newmask;
 	u16 old;
 	unsigned long new = simple_strtoul(buf, &end, 0);
 	if (end == buf)
@@ -306,9 +306,9 @@ static ssize_t store_threshold_limit(str
 	old = b->threshold_limit;
 	b->threshold_limit = new;
 
-	oldmask = affinity_set(b->cpu);
+	affinity_set(b->cpu, &oldmask, &newmask);
 	threshold_restart_bank(b, 0, old);
-	affinity_restore(oldmask);
+	affinity_restore(&oldmask);
 
 	return end - buf;
 }
@@ -316,10 +316,10 @@ static ssize_t store_threshold_limit(str
 static ssize_t show_error_count(struct threshold_block *b, char *buf)
 {
 	u32 high, low;
-	cpumask_t oldmask;
-	oldmask = affinity_set(b->cpu);
+	cpumask_t oldmask, newmask;
+	affinity_set(b->cpu, &oldmask, &newmask);
 	rdmsr(b->address, low, high);
-	affinity_restore(oldmask);
+	affinity_restore(&oldmask);
 	return sprintf(buf, "%x\n",
 		       (high & 0xFFF) - (THRESHOLD_MAX - b->threshold_limit));
 }
@@ -327,10 +327,10 @@ static ssize_t show_error_count(struct t
 static ssize_t store_error_count(struct threshold_block *b,
 				 const char *buf, size_t count)
 {
-	cpumask_t oldmask;
-	oldmask = affinity_set(b->cpu);
+	cpumask_t oldmask, newmask;
+	affinity_set(b->cpu, &oldmask, &newmask);
 	threshold_restart_bank(b, 1, 0);
-	affinity_restore(oldmask);
+	affinity_restore(&oldmask);
 	return 1;
 }
 
@@ -468,7 +468,7 @@ static __cpuinit int threshold_create_ba
 {
 	int i, err = 0;
 	struct threshold_bank *b = NULL;
-	cpumask_t oldmask = CPU_MASK_NONE;
+	cpumask_t oldmask, newmask;
 	char name[32];
 
 	sprintf(name, "threshold_bank%i", bank);
@@ -519,10 +519,10 @@ static __cpuinit int threshold_create_ba
 
 	per_cpu(threshold_banks, cpu)[bank] = b;
 
-	oldmask = affinity_set(cpu);
+	affinity_set(cpu, &oldmask, &newmask);
 	err = allocate_threshold_blocks(cpu, bank, 0,
 					MSR_IA32_MC0_MISC + bank * 4);
-	affinity_restore(oldmask);
+	affinity_restore(&oldmask);
 
 	if (err)
 		goto out_free;
--- linux-2.6.x86.orig/include/linux/sched.h
+++ linux-2.6.x86/include/linux/sched.h
@@ -2086,7 +2086,7 @@ ftrace_special(unsigned long arg1, unsig
 }
 #endif
 
-extern long sched_setaffinity(pid_t pid, cpumask_t new_mask);
+extern long sched_setaffinity(pid_t pid, const cpumask_t *new_mask);
 extern long sched_getaffinity(pid_t pid, cpumask_t *mask);
 
 extern int sched_mc_power_savings, sched_smt_power_savings;
--- linux-2.6.x86.orig/kernel/compat.c
+++ linux-2.6.x86/kernel/compat.c
@@ -446,7 +446,7 @@ asmlinkage long compat_sys_sched_setaffi
 	if (retval)
 		return retval;
 
-	return sched_setaffinity(pid, new_mask);
+	return sched_setaffinity(pid, &new_mask);
 }
 
 asmlinkage long compat_sys_sched_getaffinity(compat_pid_t pid, unsigned int len,
--- linux-2.6.x86.orig/kernel/rcupreempt.c
+++ linux-2.6.x86/kernel/rcupreempt.c
@@ -1005,10 +1005,10 @@ void __synchronize_sched(void)
 	if (sched_getaffinity(0, &oldmask) < 0)
 		oldmask = cpu_possible_map;
 	for_each_online_cpu(cpu) {
-		sched_setaffinity(0, cpumask_of_cpu(cpu));
+		sched_setaffinity(0, &cpumask_of_cpu(cpu));
 		schedule();
 	}
-	sched_setaffinity(0, oldmask);
+	sched_setaffinity(0, &oldmask);
 }
 EXPORT_SYMBOL_GPL(__synchronize_sched);
 
--- linux-2.6.x86.orig/kernel/sched.c
+++ linux-2.6.x86/kernel/sched.c
@@ -4963,9 +4963,10 @@ out_unlock:
 	return retval;
 }
 
-long sched_setaffinity(pid_t pid, cpumask_t new_mask)
+long sched_setaffinity(pid_t pid, const cpumask_t *in_mask)
 {
 	cpumask_t cpus_allowed;
+	cpumask_t new_mask = *in_mask;
 	struct task_struct *p;
 	int retval;
 
@@ -5046,7 +5047,7 @@ asmlinkage long sys_sched_setaffinity(pi
 	if (retval)
 		return retval;
 
-	return sched_setaffinity(pid, new_mask);
+	return sched_setaffinity(pid, &new_mask);
 }
 
 /*

-- 


* [PATCH 09/12] numa: move large array from stack to _initdata section
  2008-04-05  1:11 [PATCH 00/12] cpumask: reduce stack pressure from local/passed cpumask variables v3 Mike Travis
                   ` (7 preceding siblings ...)
  2008-04-05  1:11 ` [PATCH 08/12] generic: reduce stack pressure in sched_affinity Mike Travis
@ 2008-04-05  1:11 ` Mike Travis
  2008-04-05  1:11 ` [PATCH 10/12] nodemask: use new node_to_cpumask_ptr function Mike Travis
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Mike Travis @ 2008-04-05  1:11 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Thomas Gleixner, H. Peter Anvin, Andrew Morton, linux-kernel

[-- Attachment #1: numa_initmem_init --]
[-- Type: text/plain, Size: 1153 bytes --]

  * Move the large array "struct bootnode nodes" off the stack and into
    the __initdata section to reduce the amount of stack space required.
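
The pattern is just hoisting the automatic array to file scope so it
lives in a data section instead of the caller's frame; in the kernel
the object additionally carries __initdata so the memory is reclaimed
after boot.  A standalone sketch (plain static storage modeling
__initdata; field types and sizes are illustrative, assuming a 64-bit
build):

#define MAX_NUMNODES	512

struct bootnode { unsigned long start, end; };

/* before: ~8 KB of automatic storage in the caller's frame */
static int setup_before(void)
{
	struct bootnode nodes[MAX_NUMNODES];	/* 512 * 16 = 8192 bytes */

	nodes[0].start = 0;
	return (int)sizeof(nodes);
}

/* after: file scope; in the kernel this also carries __initdata so
 * the memory is discarded once initialization is finished */
static struct bootnode nodes[MAX_NUMNODES];

static int setup_after(void)
{
	return (int)sizeof(nodes);	/* no cost in any stack frame */
}

int main(void)
{
	return setup_before() == setup_after() ? 0 : 1;
}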

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   x86/latest          .../x86/linux-2.6-x86.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>

Signed-off-by: Mike Travis <travis@sgi.com>
---
 arch/x86/mm/numa_64.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- linux-2.6.x86.orig/arch/x86/mm/numa_64.c
+++ linux-2.6.x86/arch/x86/mm/numa_64.c
@@ -411,9 +411,10 @@ static int __init split_nodes_by_size(st
  * Sets up the system RAM area from start_pfn to end_pfn according to the
  * numa=fake command-line option.
  */
+static struct bootnode nodes[MAX_NUMNODES] __initdata;
+
 static int __init numa_emulation(unsigned long start_pfn, unsigned long end_pfn)
 {
-	struct bootnode nodes[MAX_NUMNODES];
 	u64 size, addr = start_pfn << PAGE_SHIFT;
 	u64 max_addr = end_pfn << PAGE_SHIFT;
 	int num_nodes = 0, num = 0, coeff_flag, coeff = -1, i;

-- 


* [PATCH 10/12] nodemask: use new node_to_cpumask_ptr function
  2008-04-05  1:11 [PATCH 00/12] cpumask: reduce stack pressure from local/passed cpumask variables v3 Mike Travis
                   ` (8 preceding siblings ...)
  2008-04-05  1:11 ` [PATCH 09/12] numa: move large array from stack to _initdata section Mike Travis
@ 2008-04-05  1:11 ` Mike Travis
  2008-04-05  1:11 ` [PATCH 11/12] cpumask: reduce stack usage in SD_x_INIT initializers Mike Travis
  2008-04-05  1:11 ` [PATCH 12/12] cpumask: Cleanup more uses of CPU_MASK and NODE_MASK Mike Travis
  11 siblings, 0 replies; 17+ messages in thread
From: Mike Travis @ 2008-04-05  1:11 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, H. Peter Anvin, Andrew Morton, linux-kernel,
	Greg Kroah-Hartman, Greg Banks

[-- Attachment #1: node_to_cpumask_ptr --]
[-- Type: text/plain, Size: 8568 bytes --]

  * Use new node_to_cpumask_ptr.  This creates a pointer to the
    cpumask for a given node.  This definition is in mm patch:

	asm-generic-add-node_to_cpumask_ptr-macro.patch

  * Use new set_cpus_allowed_ptr function.
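
node_to_cpumask_ptr(v, node) declares a pointer v to the shared,
read-only cpumask of a node, where the old node_to_cpumask(node)
returned the whole mask by value.  The real macro comes from the mm
patch named above; here both styles are modeled in standalone C with a
made-up per-node table:

#include <stdio.h>

typedef struct { unsigned long bits[64]; } mask_t; /* ~NR_CPUS=4096 */

static mask_t node_masks[4];	/* models the per-node cpumask table */

/* old style: returns a 512-byte copy */
static mask_t node_to_mask(int node)
{
	return node_masks[node];
}

/* new style, modeled after node_to_cpumask_ptr(): point at the table */
#define node_to_mask_ptr(v, node) \
	const mask_t *v = &node_masks[node]

int main(void)
{
	mask_t copy = node_to_mask(1);	/* 512 bytes on the stack */
	node_to_mask_ptr(p, 1);		/* 8 bytes on the stack */

	printf("%lu %lu\n", copy.bits[0], p->bits[0]);
	return 0;
}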

Depends on:
	[mm-patch]: asm-generic-add-node_to_cpumask_ptr-macro.patch
	[sched-devel]: sched: add new set_cpus_allowed_ptr function
	[x86/latest]: x86: add cpus_scnprintf function

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   x86/latest          .../x86/linux-2.6-x86.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git

# pci
Cc: Greg Kroah-Hartman <gregkh@suse.de>

# sunrpc
Cc: Greg Banks <gnb@melbourne.sgi.com>

# x86
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>

Signed-off-by: Mike Travis <travis@sgi.com>
---
One checkpatch error that I don't think can be fixed (was already in source):

ERROR: Macros with complex values should be enclosed in parenthesis
#230: FILE: include/linux/topology.h:49:

#define for_each_node_with_cpus(node)			\
        for_each_online_node(node)                      \
		if (nr_cpus_node(node))
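
The reason the parentheses can't be added is that the macro's value is
a for loop with a trailing if, which must splice directly onto the
caller's statement.  A standalone model of the same shape (made-up
node table):

#include <stdio.h>

#define NODES	4
static int nr_cpus_on[NODES] = { 2, 0, 4, 0 };

#define for_each_online_node(n)		for ((n) = 0; (n) < NODES; (n)++)
/* same shape as the kernel macro: a for followed by a trailing if,
 * which is why it cannot be wrapped in parentheses */
#define for_each_node_with_cpus(n)	\
	for_each_online_node(n)		\
		if (nr_cpus_on[n])

int main(void)
{
	int node;

	for_each_node_with_cpus(node)
		printf("node %d has cpus\n", node);	/* prints 0 and 2 */
	return 0;
}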
---
 drivers/base/node.c |    7 ++++---
 kernel/sched.c      |   29 ++++++++++++++---------------
 mm/page_alloc.c     |    6 +++---
 mm/slab.c           |    5 ++---
 mm/vmscan.c         |   18 ++++++++----------
 net/sunrpc/svc.c    |   16 +++++++++++-----
 6 files changed, 42 insertions(+), 39 deletions(-)

--- linux-2.6.x86.orig/drivers/base/node.c
+++ linux-2.6.x86/drivers/base/node.c
@@ -22,14 +22,15 @@ static struct sysdev_class node_class = 
 static ssize_t node_read_cpumap(struct sys_device * dev, char * buf)
 {
 	struct node *node_dev = to_node(dev);
-	cpumask_t mask = node_to_cpumask(node_dev->sysdev.id);
+	node_to_cpumask_ptr(mask, node_dev->sysdev.id);
 	int len;
 
 	/* 2004/06/03: buf currently PAGE_SIZE, need > 1 char per 4 bits. */
 	BUILD_BUG_ON(MAX_NUMNODES/4 > PAGE_SIZE/2);
 
-	len = cpumask_scnprintf(buf, PAGE_SIZE-1, mask);
-	len += sprintf(buf + len, "\n");
+	len = cpumask_scnprintf(buf, PAGE_SIZE-2, *mask);
+	buf[len++] = '\n';
+	buf[len] = '\0';
 	return len;
 }
 
--- linux-2.6.x86.orig/kernel/sched.c
+++ linux-2.6.x86/kernel/sched.c
@@ -6484,7 +6484,7 @@ init_sched_build_groups(cpumask_t span, 
  *
  * Should use nodemask_t.
  */
-static int find_next_best_node(int node, unsigned long *used_nodes)
+static int find_next_best_node(int node, nodemask_t *used_nodes)
 {
 	int i, n, val, min_val, best_node = 0;
 
@@ -6498,7 +6498,7 @@ static int find_next_best_node(int node,
 			continue;
 
 		/* Skip already used nodes */
-		if (test_bit(n, used_nodes))
+		if (node_isset(n, *used_nodes))
 			continue;
 
 		/* Simple min distance search */
@@ -6510,14 +6510,13 @@ static int find_next_best_node(int node,
 		}
 	}
 
-	set_bit(best_node, used_nodes);
+	node_set(best_node, *used_nodes);
 	return best_node;
 }
 
 /**
  * sched_domain_node_span - get a cpumask for a node's sched_domain
  * @node: node whose cpumask we're constructing
- * @size: number of nodes to include in this span
  *
  * Given a node, construct a good cpumask for its sched_domain to span. It
  * should be one that prevents unnecessary balancing, but also spreads tasks
@@ -6525,22 +6524,22 @@ static int find_next_best_node(int node,
  */
 static cpumask_t sched_domain_node_span(int node)
 {
-	DECLARE_BITMAP(used_nodes, MAX_NUMNODES);
-	cpumask_t span, nodemask;
+	nodemask_t used_nodes;
+	cpumask_t span;
+	node_to_cpumask_ptr(nodemask, node);
 	int i;
 
 	cpus_clear(span);
-	bitmap_zero(used_nodes, MAX_NUMNODES);
+	nodes_clear(used_nodes);
 
-	nodemask = node_to_cpumask(node);
-	cpus_or(span, span, nodemask);
-	set_bit(node, used_nodes);
+	cpus_or(span, span, *nodemask);
+	node_set(node, used_nodes);
 
 	for (i = 1; i < SD_NODES_PER_DOMAIN; i++) {
-		int next_node = find_next_best_node(node, used_nodes);
+		int next_node = find_next_best_node(node, &used_nodes);
 
-		nodemask = node_to_cpumask(next_node);
-		cpus_or(span, span, nodemask);
+		node_to_cpumask_ptr_next(nodemask, next_node);
+		cpus_or(span, span, *nodemask);
 	}
 
 	return span;
@@ -6937,6 +6936,7 @@ static int build_sched_domains(const cpu
 		for (j = 0; j < MAX_NUMNODES; j++) {
 			cpumask_t tmp, notcovered;
 			int n = (i + j) % MAX_NUMNODES;
+			node_to_cpumask_ptr(pnodemask, n);
 
 			cpus_complement(notcovered, covered);
 			cpus_and(tmp, notcovered, *cpu_map);
@@ -6944,8 +6944,7 @@ static int build_sched_domains(const cpu
 			if (cpus_empty(tmp))
 				break;
 
-			nodemask = node_to_cpumask(n);
-			cpus_and(tmp, tmp, nodemask);
+			cpus_and(tmp, tmp, *pnodemask);
 			if (cpus_empty(tmp))
 				continue;
 
--- linux-2.6.x86.orig/mm/page_alloc.c
+++ linux-2.6.x86/mm/page_alloc.c
@@ -2029,6 +2029,7 @@ static int find_next_best_node(int node,
 	int n, val;
 	int min_val = INT_MAX;
 	int best_node = -1;
+	node_to_cpumask_ptr(tmp, 0);
 
 	/* Use the local node if we haven't already */
 	if (!node_isset(node, *used_node_mask)) {
@@ -2037,7 +2038,6 @@ static int find_next_best_node(int node,
 	}
 
 	for_each_node_state(n, N_HIGH_MEMORY) {
-		cpumask_t tmp;
 
 		/* Don't want a node to appear more than once */
 		if (node_isset(n, *used_node_mask))
@@ -2050,8 +2050,8 @@ static int find_next_best_node(int node,
 		val += (n < node);
 
 		/* Give preference to headless and unused nodes */
-		tmp = node_to_cpumask(n);
-		if (!cpus_empty(tmp))
+		node_to_cpumask_ptr_next(tmp, n);
+		if (!cpus_empty(*tmp))
 			val += PENALTY_FOR_NODE_WITH_CPUS;
 
 		/* Slight preference for less loaded node */
--- linux-2.6.x86.orig/mm/slab.c
+++ linux-2.6.x86/mm/slab.c
@@ -1160,14 +1160,13 @@ static void __cpuinit cpuup_canceled(lon
 	struct kmem_cache *cachep;
 	struct kmem_list3 *l3 = NULL;
 	int node = cpu_to_node(cpu);
+	node_to_cpumask_ptr(mask, node);
 
 	list_for_each_entry(cachep, &cache_chain, next) {
 		struct array_cache *nc;
 		struct array_cache *shared;
 		struct array_cache **alien;
-		cpumask_t mask;
 
-		mask = node_to_cpumask(node);
 		/* cpu is dead; no one can alloc from it. */
 		nc = cachep->array[cpu];
 		cachep->array[cpu] = NULL;
@@ -1183,7 +1182,7 @@ static void __cpuinit cpuup_canceled(lon
 		if (nc)
 			free_block(cachep, nc->entry, nc->avail, node);
 
-		if (!cpus_empty(mask)) {
+		if (!cpus_empty(*mask)) {
 			spin_unlock_irq(&l3->list_lock);
 			goto free_array_cache;
 		}
--- linux-2.6.x86.orig/mm/vmscan.c
+++ linux-2.6.x86/mm/vmscan.c
@@ -1647,11 +1647,10 @@ static int kswapd(void *p)
 	struct reclaim_state reclaim_state = {
 		.reclaimed_slab = 0,
 	};
-	cpumask_t cpumask;
+	node_to_cpumask_ptr(cpumask, pgdat->node_id);
 
-	cpumask = node_to_cpumask(pgdat->node_id);
-	if (!cpus_empty(cpumask))
-		set_cpus_allowed(tsk, cpumask);
+	if (!cpus_empty(*cpumask))
+		set_cpus_allowed_ptr(tsk, cpumask);
 	current->reclaim_state = &reclaim_state;
 
 	/*
@@ -1880,17 +1879,16 @@ out:
 static int __devinit cpu_callback(struct notifier_block *nfb,
 				  unsigned long action, void *hcpu)
 {
-	pg_data_t *pgdat;
-	cpumask_t mask;
 	int nid;
 
 	if (action == CPU_ONLINE || action == CPU_ONLINE_FROZEN) {
 		for_each_node_state(nid, N_HIGH_MEMORY) {
-			pgdat = NODE_DATA(nid);
-			mask = node_to_cpumask(pgdat->node_id);
-			if (any_online_cpu(mask) != NR_CPUS)
+			pg_data_t *pgdat = NODE_DATA(nid);
+			node_to_cpumask_ptr(mask, pgdat->node_id);
+
+			if (any_online_cpu(*mask) < nr_cpu_ids)
 				/* One of our CPUs online: restore mask */
-				set_cpus_allowed(pgdat->kswapd, mask);
+				set_cpus_allowed_ptr(pgdat->kswapd, mask);
 		}
 	}
 	return NOTIFY_OK;
--- linux-2.6.x86.orig/net/sunrpc/svc.c
+++ linux-2.6.x86/net/sunrpc/svc.c
@@ -301,7 +301,6 @@ static inline int
 svc_pool_map_set_cpumask(unsigned int pidx, cpumask_t *oldmask)
 {
 	struct svc_pool_map *m = &svc_pool_map;
-	unsigned int node; /* or cpu */
 
 	/*
 	 * The caller checks for sv_nrpools > 1, which
@@ -314,16 +313,23 @@ svc_pool_map_set_cpumask(unsigned int pi
 	default:
 		return 0;
 	case SVC_POOL_PERCPU:
-		node = m->pool_to[pidx];
+	{
+		unsigned int cpu = m->pool_to[pidx];
+
 		*oldmask = current->cpus_allowed;
-		set_cpus_allowed(current, cpumask_of_cpu(node));
+		set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
 		return 1;
+	}
 	case SVC_POOL_PERNODE:
-		node = m->pool_to[pidx];
+	{
+		unsigned int node = m->pool_to[pidx];
+		node_to_cpumask_ptr(nodecpumask, node);
+
 		*oldmask = current->cpus_allowed;
-		set_cpus_allowed(current, node_to_cpumask(node));
+		set_cpus_allowed_ptr(current, nodecpumask);
 		return 1;
 	}
+	}
 }
 
 /*

-- 


* [PATCH 11/12] cpumask: reduce stack usage in SD_x_INIT initializers
  2008-04-05  1:11 [PATCH 00/12] cpumask: reduce stack pressure from local/passed cpumask variables v3 Mike Travis
                   ` (9 preceding siblings ...)
  2008-04-05  1:11 ` [PATCH 10/12] nodemask: use new node_to_cpumask_ptr function Mike Travis
@ 2008-04-05  1:11 ` Mike Travis
  2008-04-05  1:11 ` [PATCH 12/12] cpumask: Cleanup more uses of CPU_MASK and NODE_MASK Mike Travis
  11 siblings, 0 replies; 17+ messages in thread
From: Mike Travis @ 2008-04-05  1:11 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Thomas Gleixner, H. Peter Anvin, Andrew Morton, linux-kernel

[-- Attachment #1: kernel_sched_c --]
[-- Type: text/plain, Size: 30788 bytes --]

  * Remove the empty cpumask_t (and all other zero/NULL) field
    initializers from the SD_*_INIT macros and use memset(0) to clear
    the structure instead.  Also, don't inline the initializer
    functions, to save stack space in build_sched_domains().

  * Merge into this patch the include/linux/topology.h change that
    uses the new node_to_cpumask_ptr function in the nr_cpus_node
    macro.
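
A standalone sketch of the initializer scheme (made-up struct and
field names; noinline spelled as a GCC attribute rather than the
kernel's noinline macro): zero the whole structure, assign only the
non-zero fields, and keep each helper out of line so its temporaries
don't accumulate in build_sched_domains()'s frame:

#include <string.h>

struct dom {
	unsigned int min_interval, max_interval, busy_factor, flags;
	/* ... many more fields, all zero by default ... */
};

#define DOM_CPU_INIT (struct dom) {	\
	.min_interval	= 1,		\
	.max_interval	= 4,		\
	.busy_factor	= 64,		\
}

/* same shape as SD_INIT_FUNC(): noinline keeps the temporary in the
 * helper's frame instead of accumulating in the one large caller */
#define DOM_INIT_FUNC(type)						\
static __attribute__((noinline)) void dom_init_##type(struct dom *d)	\
{									\
	memset(d, 0, sizeof(*d));					\
	*d = DOM_##type##_INIT;						\
}

DOM_INIT_FUNC(CPU)

int main(void)
{
	struct dom d;

	dom_init_CPU(&d);
	return d.busy_factor == 64 ? 0 : 1;
}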

Depends on:
	[mm-patch]: asm-generic-add-node_to_cpumask_ptr-macro.patch
	[sched-devel]: sched: add new set_cpus_allowed_ptr function

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   x86/latest          .../x86/linux-2.6-x86.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>

Signed-off-by: Mike Travis <travis@sgi.com>
---
 include/asm-x86/topology.h |    5 
 include/linux/topology.h   |   46 -----
 kernel/sched.c             |  368 ++++++++++++++++++++++++++++++---------------
 3 files changed, 256 insertions(+), 163 deletions(-)

--- linux-2.6.x86.orig/include/asm-x86/topology.h
+++ linux-2.6.x86/include/asm-x86/topology.h
@@ -155,10 +155,6 @@ extern unsigned long node_remap_size[];
 
 /* sched_domains SD_NODE_INIT for NUMAQ machines */
 #define SD_NODE_INIT (struct sched_domain) {		\
-	.span			= CPU_MASK_NONE,	\
-	.parent			= NULL,			\
-	.child			= NULL,			\
-	.groups			= NULL,			\
 	.min_interval		= 8,			\
 	.max_interval		= 32,			\
 	.busy_factor		= 32,			\
@@ -176,7 +172,6 @@ extern unsigned long node_remap_size[];
 				| SD_WAKE_BALANCE,	\
 	.last_balance		= jiffies,		\
 	.balance_interval	= 1,			\
-	.nr_balance_failed	= 0,			\
 }
 
 #ifdef CONFIG_X86_64_ACPI_NUMA
--- linux-2.6.x86.orig/include/linux/topology.h
+++ linux-2.6.x86/include/linux/topology.h
@@ -38,16 +38,15 @@
 #endif
 
 #ifndef nr_cpus_node
-#define nr_cpus_node(node)							\
-	({									\
-		cpumask_t __tmp__;						\
-		__tmp__ = node_to_cpumask(node);				\
-		cpus_weight(__tmp__);						\
+#define nr_cpus_node(node)				\
+	({						\
+		node_to_cpumask_ptr(__tmp__, node);	\
+		cpus_weight(*__tmp__);			\
 	})
 #endif
 
-#define for_each_node_with_cpus(node)						\
-	for_each_online_node(node)						\
+#define for_each_node_with_cpus(node)			\
+	for_each_online_node(node)			\
 		if (nr_cpus_node(node))
 
 void arch_update_cpu_topology(void);
@@ -80,7 +79,9 @@ void arch_update_cpu_topology(void);
  * by defining their own arch-specific initializer in include/asm/topology.h.
  * A definition there will automagically override these default initializers
  * and allow arch-specific performance tuning of sched_domains.
+ * (Only non-zero and non-null fields need be specified.)
  */
+
 #ifdef CONFIG_SCHED_SMT
 /* MCD - Do we really need this?  It is always on if CONFIG_SCHED_SMT is,
  * so can't we drop this in favor of CONFIG_SCHED_SMT?
@@ -89,20 +90,10 @@ void arch_update_cpu_topology(void);
 /* Common values for SMT siblings */
 #ifndef SD_SIBLING_INIT
 #define SD_SIBLING_INIT (struct sched_domain) {		\
-	.span			= CPU_MASK_NONE,	\
-	.parent			= NULL,			\
-	.child			= NULL,			\
-	.groups			= NULL,			\
 	.min_interval		= 1,			\
 	.max_interval		= 2,			\
 	.busy_factor		= 64,			\
 	.imbalance_pct		= 110,			\
-	.cache_nice_tries	= 0,			\
-	.busy_idx		= 0,			\
-	.idle_idx		= 0,			\
-	.newidle_idx		= 0,			\
-	.wake_idx		= 0,			\
-	.forkexec_idx		= 0,			\
 	.flags			= SD_LOAD_BALANCE	\
 				| SD_BALANCE_NEWIDLE	\
 				| SD_BALANCE_FORK	\
@@ -112,7 +103,6 @@ void arch_update_cpu_topology(void);
 				| SD_SHARE_CPUPOWER,	\
 	.last_balance		= jiffies,		\
 	.balance_interval	= 1,			\
-	.nr_balance_failed	= 0,			\
 }
 #endif
 #endif /* CONFIG_SCHED_SMT */
@@ -121,18 +111,12 @@ void arch_update_cpu_topology(void);
 /* Common values for MC siblings. for now mostly derived from SD_CPU_INIT */
 #ifndef SD_MC_INIT
 #define SD_MC_INIT (struct sched_domain) {		\
-	.span			= CPU_MASK_NONE,	\
-	.parent			= NULL,			\
-	.child			= NULL,			\
-	.groups			= NULL,			\
 	.min_interval		= 1,			\
 	.max_interval		= 4,			\
 	.busy_factor		= 64,			\
 	.imbalance_pct		= 125,			\
 	.cache_nice_tries	= 1,			\
 	.busy_idx		= 2,			\
-	.idle_idx		= 0,			\
-	.newidle_idx		= 0,			\
 	.wake_idx		= 1,			\
 	.forkexec_idx		= 1,			\
 	.flags			= SD_LOAD_BALANCE	\
@@ -144,7 +128,6 @@ void arch_update_cpu_topology(void);
 				| BALANCE_FOR_MC_POWER,	\
 	.last_balance		= jiffies,		\
 	.balance_interval	= 1,			\
-	.nr_balance_failed	= 0,			\
 }
 #endif
 #endif /* CONFIG_SCHED_MC */
@@ -152,10 +135,6 @@ void arch_update_cpu_topology(void);
 /* Common values for CPUs */
 #ifndef SD_CPU_INIT
 #define SD_CPU_INIT (struct sched_domain) {		\
-	.span			= CPU_MASK_NONE,	\
-	.parent			= NULL,			\
-	.child			= NULL,			\
-	.groups			= NULL,			\
 	.min_interval		= 1,			\
 	.max_interval		= 4,			\
 	.busy_factor		= 64,			\
@@ -174,16 +153,11 @@ void arch_update_cpu_topology(void);
 				| BALANCE_FOR_PKG_POWER,\
 	.last_balance		= jiffies,		\
 	.balance_interval	= 1,			\
-	.nr_balance_failed	= 0,			\
 }
 #endif
 
 /* sched_domains SD_ALLNODES_INIT for NUMA machines */
 #define SD_ALLNODES_INIT (struct sched_domain) {	\
-	.span			= CPU_MASK_NONE,	\
-	.parent			= NULL,			\
-	.child			= NULL,			\
-	.groups			= NULL,			\
 	.min_interval		= 64,			\
 	.max_interval		= 64*num_online_cpus(),	\
 	.busy_factor		= 128,			\
@@ -191,14 +165,10 @@ void arch_update_cpu_topology(void);
 	.cache_nice_tries	= 1,			\
 	.busy_idx		= 3,			\
 	.idle_idx		= 3,			\
-	.newidle_idx		= 0, /* unused */	\
-	.wake_idx		= 0, /* unused */	\
-	.forkexec_idx		= 0, /* unused */	\
 	.flags			= SD_LOAD_BALANCE	\
 				| SD_SERIALIZE,	\
 	.last_balance		= jiffies,		\
 	.balance_interval	= 64,			\
-	.nr_balance_failed	= 0,			\
 }
 
 #ifdef CONFIG_NUMA
--- linux-2.6.x86.orig/kernel/sched.c
+++ linux-2.6.x86/kernel/sched.c
@@ -1900,17 +1900,17 @@ find_idlest_group(struct sched_domain *s
  * find_idlest_cpu - find the idlest cpu among the cpus in group.
  */
 static int
-find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
+find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu,
+		cpumask_t *tmp)
 {
-	cpumask_t tmp;
 	unsigned long load, min_load = ULONG_MAX;
 	int idlest = -1;
 	int i;
 
 	/* Traverse only the allowed CPUs */
-	cpus_and(tmp, group->cpumask, p->cpus_allowed);
+	cpus_and(*tmp, group->cpumask, p->cpus_allowed);
 
-	for_each_cpu_mask(i, tmp) {
+	for_each_cpu_mask(i, *tmp) {
 		load = weighted_cpuload(i);
 
 		if (load < min_load || (load == min_load && i == this_cpu)) {
@@ -1949,7 +1949,7 @@ static int sched_balance_self(int cpu, i
 	}
 
 	while (sd) {
-		cpumask_t span;
+		cpumask_t span, tmpmask;
 		struct sched_group *group;
 		int new_cpu, weight;
 
@@ -1965,7 +1965,7 @@ static int sched_balance_self(int cpu, i
 			continue;
 		}
 
-		new_cpu = find_idlest_cpu(group, t, cpu);
+		new_cpu = find_idlest_cpu(group, t, cpu, &tmpmask);
 		if (new_cpu == -1 || new_cpu == cpu) {
 			/* Now try balancing at a lower domain level of cpu */
 			sd = sd->child;
@@ -2852,7 +2852,7 @@ static int move_one_task(struct rq *this
 static struct sched_group *
 find_busiest_group(struct sched_domain *sd, int this_cpu,
 		   unsigned long *imbalance, enum cpu_idle_type idle,
-		   int *sd_idle, cpumask_t *cpus, int *balance)
+		   int *sd_idle, const cpumask_t *cpus, int *balance)
 {
 	struct sched_group *busiest = NULL, *this = NULL, *group = sd->groups;
 	unsigned long max_load, avg_load, total_load, this_load, total_pwr;
@@ -3153,7 +3153,7 @@ ret:
  */
 static struct rq *
 find_busiest_queue(struct sched_group *group, enum cpu_idle_type idle,
-		   unsigned long imbalance, cpumask_t *cpus)
+		   unsigned long imbalance, const cpumask_t *cpus)
 {
 	struct rq *busiest = NULL, *rq;
 	unsigned long max_load = 0;
@@ -3192,15 +3192,16 @@ find_busiest_queue(struct sched_group *g
  */
 static int load_balance(int this_cpu, struct rq *this_rq,
 			struct sched_domain *sd, enum cpu_idle_type idle,
-			int *balance)
+			int *balance, cpumask_t *cpus)
 {
 	int ld_moved, all_pinned = 0, active_balance = 0, sd_idle = 0;
 	struct sched_group *group;
 	unsigned long imbalance;
 	struct rq *busiest;
-	cpumask_t cpus = CPU_MASK_ALL;
 	unsigned long flags;
 
+	cpus_setall(*cpus);
+
 	/*
 	 * When power savings policy is enabled for the parent domain, idle
 	 * sibling can pick up load irrespective of busy siblings. In this case,
@@ -3215,7 +3216,7 @@ static int load_balance(int this_cpu, st
 
 redo:
 	group = find_busiest_group(sd, this_cpu, &imbalance, idle, &sd_idle,
-				   &cpus, balance);
+				   cpus, balance);
 
 	if (*balance == 0)
 		goto out_balanced;
@@ -3225,7 +3226,7 @@ redo:
 		goto out_balanced;
 	}
 
-	busiest = find_busiest_queue(group, idle, imbalance, &cpus);
+	busiest = find_busiest_queue(group, idle, imbalance, cpus);
 	if (!busiest) {
 		schedstat_inc(sd, lb_nobusyq[idle]);
 		goto out_balanced;
@@ -3258,8 +3259,8 @@ redo:
 
 		/* All tasks on this runqueue were pinned by CPU affinity */
 		if (unlikely(all_pinned)) {
-			cpu_clear(cpu_of(busiest), cpus);
-			if (!cpus_empty(cpus))
+			cpu_clear(cpu_of(busiest), *cpus);
+			if (!cpus_empty(*cpus))
 				goto redo;
 			goto out_balanced;
 		}
@@ -3344,7 +3345,8 @@ out_one_pinned:
  * this_rq is locked.
  */
 static int
-load_balance_newidle(int this_cpu, struct rq *this_rq, struct sched_domain *sd)
+load_balance_newidle(int this_cpu, struct rq *this_rq, struct sched_domain *sd,
+			cpumask_t *cpus)
 {
 	struct sched_group *group;
 	struct rq *busiest = NULL;
@@ -3352,7 +3354,8 @@ load_balance_newidle(int this_cpu, struc
 	int ld_moved = 0;
 	int sd_idle = 0;
 	int all_pinned = 0;
-	cpumask_t cpus = CPU_MASK_ALL;
+
+	cpus_setall(*cpus);
 
 	/*
 	 * When power savings policy is enabled for the parent domain, idle
@@ -3367,14 +3370,13 @@ load_balance_newidle(int this_cpu, struc
 	schedstat_inc(sd, lb_count[CPU_NEWLY_IDLE]);
 redo:
 	group = find_busiest_group(sd, this_cpu, &imbalance, CPU_NEWLY_IDLE,
-				   &sd_idle, &cpus, NULL);
+				   &sd_idle, cpus, NULL);
 	if (!group) {
 		schedstat_inc(sd, lb_nobusyg[CPU_NEWLY_IDLE]);
 		goto out_balanced;
 	}
 
-	busiest = find_busiest_queue(group, CPU_NEWLY_IDLE, imbalance,
-				&cpus);
+	busiest = find_busiest_queue(group, CPU_NEWLY_IDLE, imbalance, cpus);
 	if (!busiest) {
 		schedstat_inc(sd, lb_nobusyq[CPU_NEWLY_IDLE]);
 		goto out_balanced;
@@ -3396,8 +3398,8 @@ redo:
 		spin_unlock(&busiest->lock);
 
 		if (unlikely(all_pinned)) {
-			cpu_clear(cpu_of(busiest), cpus);
-			if (!cpus_empty(cpus))
+			cpu_clear(cpu_of(busiest), *cpus);
+			if (!cpus_empty(*cpus))
 				goto redo;
 		}
 	}
@@ -3431,6 +3433,7 @@ static void idle_balance(int this_cpu, s
 	struct sched_domain *sd;
 	int pulled_task = -1;
 	unsigned long next_balance = jiffies + HZ;
+	cpumask_t tmpmask;
 
 	for_each_domain(this_cpu, sd) {
 		unsigned long interval;
@@ -3440,8 +3443,8 @@ static void idle_balance(int this_cpu, s
 
 		if (sd->flags & SD_BALANCE_NEWIDLE)
 			/* If we've pulled tasks over stop searching: */
-			pulled_task = load_balance_newidle(this_cpu,
-								this_rq, sd);
+			pulled_task = load_balance_newidle(this_cpu, this_rq,
+							   sd, &tmpmask);
 
 		interval = msecs_to_jiffies(sd->balance_interval);
 		if (time_after(next_balance, sd->last_balance + interval))
@@ -3600,6 +3603,7 @@ static void rebalance_domains(int cpu, e
 	/* Earliest time when we have to do rebalance again */
 	unsigned long next_balance = jiffies + 60*HZ;
 	int update_next_balance = 0;
+	cpumask_t tmp;
 
 	for_each_domain(cpu, sd) {
 		if (!(sd->flags & SD_LOAD_BALANCE))
@@ -3623,7 +3627,7 @@ static void rebalance_domains(int cpu, e
 		}
 
 		if (time_after_eq(jiffies, sd->last_balance + interval)) {
-			if (load_balance(cpu, rq, sd, idle, &balance)) {
+			if (load_balance(cpu, rq, sd, idle, &balance, &tmp)) {
 				/*
 				 * We've pulled tasks over so either we're no
 				 * longer idle, or one of our SMT siblings is
@@ -5000,7 +5004,7 @@ long sched_setaffinity(pid_t pid, const 
 	cpuset_cpus_allowed(p, &cpus_allowed);
 	cpus_and(new_mask, new_mask, cpus_allowed);
  again:
-	retval = set_cpus_allowed(p, new_mask);
+	retval = set_cpus_allowed_ptr(p, &new_mask);
 
 	if (!retval) {
 		cpuset_cpus_allowed(p, &cpus_allowed);
@@ -5758,7 +5762,7 @@ static void move_task_off_dead_cpu(int d
  */
 static void migrate_nr_uninterruptible(struct rq *rq_src)
 {
-	struct rq *rq_dest = cpu_rq(any_online_cpu(CPU_MASK_ALL));
+	struct rq *rq_dest = cpu_rq(any_online_cpu(*CPU_MASK_ALL_PTR));
 	unsigned long flags;
 
 	local_irq_save(flags);
@@ -6172,14 +6176,14 @@ void __init migration_init(void)
 
 #ifdef CONFIG_SCHED_DEBUG
 
-static int sched_domain_debug_one(struct sched_domain *sd, int cpu, int level)
+static int sched_domain_debug_one(struct sched_domain *sd, int cpu, int level,
+				  cpumask_t *groupmask)
 {
 	struct sched_group *group = sd->groups;
-	cpumask_t groupmask;
 	char str[256];
 
 	cpulist_scnprintf(str, sizeof(str), sd->span);
-	cpus_clear(groupmask);
+	cpus_clear(*groupmask);
 
 	printk(KERN_DEBUG "%*s domain %d: ", level, "", level);
 
@@ -6223,13 +6227,13 @@ static int sched_domain_debug_one(struct
 			break;
 		}
 
-		if (cpus_intersects(groupmask, group->cpumask)) {
+		if (cpus_intersects(*groupmask, group->cpumask)) {
 			printk(KERN_CONT "\n");
 			printk(KERN_ERR "ERROR: repeated CPUs\n");
 			break;
 		}
 
-		cpus_or(groupmask, groupmask, group->cpumask);
+		cpus_or(*groupmask, *groupmask, group->cpumask);
 
 		cpulist_scnprintf(str, sizeof(str), group->cpumask);
 		printk(KERN_CONT " %s", str);
@@ -6238,10 +6242,10 @@ static int sched_domain_debug_one(struct
 	} while (group != sd->groups);
 	printk(KERN_CONT "\n");
 
-	if (!cpus_equal(sd->span, groupmask))
+	if (!cpus_equal(sd->span, *groupmask))
 		printk(KERN_ERR "ERROR: groups don't span domain->span\n");
 
-	if (sd->parent && !cpus_subset(groupmask, sd->parent->span))
+	if (sd->parent && !cpus_subset(*groupmask, sd->parent->span))
 		printk(KERN_ERR "ERROR: parent span is not a superset "
 			"of domain->span\n");
 	return 0;
@@ -6249,6 +6253,7 @@ static int sched_domain_debug_one(struct
 
 static void sched_domain_debug(struct sched_domain *sd, int cpu)
 {
+	cpumask_t *groupmask;
 	int level = 0;
 
 	if (!sd) {
@@ -6258,14 +6263,21 @@ static void sched_domain_debug(struct sc
 
 	printk(KERN_DEBUG "CPU%d attaching sched-domain:\n", cpu);
 
+	groupmask = kmalloc(sizeof(cpumask_t), GFP_KERNEL);
+	if (!groupmask) {
+		printk(KERN_DEBUG "Cannot load-balance (out of memory)\n");
+		return;
+	}
+
 	for (;;) {
-		if (sched_domain_debug_one(sd, cpu, level))
+		if (sched_domain_debug_one(sd, cpu, level, groupmask))
 			break;
 		level++;
 		sd = sd->parent;
 		if (!sd)
 			break;
 	}
+	kfree(groupmask);
 }
 #else
 # define sched_domain_debug(sd, cpu) do { } while (0)
@@ -6435,30 +6447,33 @@ cpu_attach_domain(struct sched_domain *s
  * and ->cpu_power to 0.
  */
 static void
-init_sched_build_groups(cpumask_t span, const cpumask_t *cpu_map,
+init_sched_build_groups(const cpumask_t *span, const cpumask_t *cpu_map,
 			int (*group_fn)(int cpu, const cpumask_t *cpu_map,
-					struct sched_group **sg))
+					struct sched_group **sg,
+					cpumask_t *tmpmask),
+			cpumask_t *covered, cpumask_t *tmpmask)
 {
 	struct sched_group *first = NULL, *last = NULL;
-	cpumask_t covered = CPU_MASK_NONE;
 	int i;
 
-	for_each_cpu_mask(i, span) {
+	cpus_clear(*covered);
+
+	for_each_cpu_mask(i, *span) {
 		struct sched_group *sg;
-		int group = group_fn(i, cpu_map, &sg);
+		int group = group_fn(i, cpu_map, &sg, tmpmask);
 		int j;
 
-		if (cpu_isset(i, covered))
+		if (cpu_isset(i, *covered))
 			continue;
 
-		sg->cpumask = CPU_MASK_NONE;
+		cpus_clear(sg->cpumask);
 		sg->__cpu_power = 0;
 
-		for_each_cpu_mask(j, span) {
-			if (group_fn(j, cpu_map, NULL) != group)
+		for_each_cpu_mask(j, *span) {
+			if (group_fn(j, cpu_map, NULL, tmpmask) != group)
 				continue;
 
-			cpu_set(j, covered);
+			cpu_set(j, *covered);
 			cpu_set(j, sg->cpumask);
 		}
 		if (!first)
@@ -6556,7 +6571,8 @@ static DEFINE_PER_CPU(struct sched_domai
 static DEFINE_PER_CPU(struct sched_group, sched_group_cpus);
 
 static int
-cpu_to_cpu_group(int cpu, const cpumask_t *cpu_map, struct sched_group **sg)
+cpu_to_cpu_group(int cpu, const cpumask_t *cpu_map, struct sched_group **sg,
+		 cpumask_t *unused)
 {
 	if (sg)
 		*sg = &per_cpu(sched_group_cpus, cpu);
@@ -6574,19 +6590,22 @@ static DEFINE_PER_CPU(struct sched_group
 
 #if defined(CONFIG_SCHED_MC) && defined(CONFIG_SCHED_SMT)
 static int
-cpu_to_core_group(int cpu, const cpumask_t *cpu_map, struct sched_group **sg)
+cpu_to_core_group(int cpu, const cpumask_t *cpu_map, struct sched_group **sg,
+		  cpumask_t *mask)
 {
 	int group;
-	cpumask_t mask = per_cpu(cpu_sibling_map, cpu);
-	cpus_and(mask, mask, *cpu_map);
-	group = first_cpu(mask);
+
+	*mask = per_cpu(cpu_sibling_map, cpu);
+	cpus_and(*mask, *mask, *cpu_map);
+	group = first_cpu(*mask);
 	if (sg)
 		*sg = &per_cpu(sched_group_core, group);
 	return group;
 }
 #elif defined(CONFIG_SCHED_MC)
 static int
-cpu_to_core_group(int cpu, const cpumask_t *cpu_map, struct sched_group **sg)
+cpu_to_core_group(int cpu, const cpumask_t *cpu_map, struct sched_group **sg,
+		  cpumask_t *unused)
 {
 	if (sg)
 		*sg = &per_cpu(sched_group_core, cpu);
@@ -6598,17 +6617,18 @@ static DEFINE_PER_CPU(struct sched_domai
 static DEFINE_PER_CPU(struct sched_group, sched_group_phys);
 
 static int
-cpu_to_phys_group(int cpu, const cpumask_t *cpu_map, struct sched_group **sg)
+cpu_to_phys_group(int cpu, const cpumask_t *cpu_map, struct sched_group **sg,
+		  cpumask_t *mask)
 {
 	int group;
 #ifdef CONFIG_SCHED_MC
-	cpumask_t mask = *cpu_coregroup_map(cpu);
-	cpus_and(mask, mask, *cpu_map);
-	group = first_cpu(mask);
+	*mask = *cpu_coregroup_map(cpu);
+	cpus_and(*mask, *mask, *cpu_map);
+	group = first_cpu(*mask);
 #elif defined(CONFIG_SCHED_SMT)
-	cpumask_t mask = per_cpu(cpu_sibling_map, cpu);
-	cpus_and(mask, mask, *cpu_map);
-	group = first_cpu(mask);
+	*mask = per_cpu(cpu_sibling_map, cpu);
+	cpus_and(*mask, *mask, *cpu_map);
+	group = first_cpu(*mask);
 #else
 	group = cpu;
 #endif
@@ -6630,13 +6650,13 @@ static DEFINE_PER_CPU(struct sched_domai
 static DEFINE_PER_CPU(struct sched_group, sched_group_allnodes);
 
 static int cpu_to_allnodes_group(int cpu, const cpumask_t *cpu_map,
-				 struct sched_group **sg)
+				 struct sched_group **sg, cpumask_t *nodemask)
 {
-	cpumask_t nodemask = node_to_cpumask(cpu_to_node(cpu));
 	int group;
 
-	cpus_and(nodemask, nodemask, *cpu_map);
-	group = first_cpu(nodemask);
+	*nodemask = node_to_cpumask(cpu_to_node(cpu));
+	cpus_and(*nodemask, *nodemask, *cpu_map);
+	group = first_cpu(*nodemask);
 
 	if (sg)
 		*sg = &per_cpu(sched_group_allnodes, group);
@@ -6672,7 +6692,7 @@ static void init_numa_sched_groups_power
 
 #ifdef CONFIG_NUMA
 /* Free memory allocated for various sched_group structures */
-static void free_sched_groups(const cpumask_t *cpu_map)
+static void free_sched_groups(const cpumask_t *cpu_map, cpumask_t *nodemask)
 {
 	int cpu, i;
 
@@ -6684,11 +6704,11 @@ static void free_sched_groups(const cpum
 			continue;
 
 		for (i = 0; i < MAX_NUMNODES; i++) {
-			cpumask_t nodemask = node_to_cpumask(i);
 			struct sched_group *oldsg, *sg = sched_group_nodes[i];
 
-			cpus_and(nodemask, nodemask, *cpu_map);
-			if (cpus_empty(nodemask))
+			*nodemask = node_to_cpumask(i);
+			cpus_and(*nodemask, *nodemask, *cpu_map);
+			if (cpus_empty(*nodemask))
 				continue;
 
 			if (sg == NULL)
@@ -6706,7 +6726,7 @@ next_sg:
 	}
 }
 #else
-static void free_sched_groups(const cpumask_t *cpu_map)
+static void free_sched_groups(const cpumask_t *cpu_map, cpumask_t *nodemask)
 {
 }
 #endif
@@ -6764,6 +6784,65 @@ static void init_sched_groups_power(int 
 }
 
 /*
+ * Initializers for schedule domains
+ * Non-inlined to reduce accumulated stack pressure in build_sched_domains()
+ */
+
+#define	SD_INIT(sd, type)	sd_init_##type(sd)
+#define SD_INIT_FUNC(type)	\
+static noinline void sd_init_##type(struct sched_domain *sd)	\
+{								\
+	memset(sd, 0, sizeof(*sd));				\
+	*sd = SD_##type##_INIT;					\
+}
+
+SD_INIT_FUNC(CPU)
+#ifdef CONFIG_NUMA
+ SD_INIT_FUNC(ALLNODES)
+ SD_INIT_FUNC(NODE)
+#endif
+#ifdef CONFIG_SCHED_SMT
+ SD_INIT_FUNC(SIBLING)
+#endif
+#ifdef CONFIG_SCHED_MC
+ SD_INIT_FUNC(MC)
+#endif
+
+/*
+ * To minimize stack usage kmalloc room for cpumasks and share the
+ * space as the usage in build_sched_domains() dictates.  Used only
+ * if the amount of space is significant.
+ */
+struct allmasks {
+	cpumask_t tmpmask;			/* make this one first */
+	union {
+		cpumask_t nodemask;
+		cpumask_t this_sibling_map;
+		cpumask_t this_core_map;
+	};
+	cpumask_t send_covered;
+
+#ifdef CONFIG_NUMA
+	cpumask_t domainspan;
+	cpumask_t covered;
+	cpumask_t notcovered;
+#endif
+};
+
+#if	NR_CPUS > 128
+#define	SCHED_CPUMASK_ALLOC		1
+#define	SCHED_CPUMASK_FREE(v)		kfree(v)
+#define	SCHED_CPUMASK_DECLARE(v)	struct allmasks *v
+#else
+#define	SCHED_CPUMASK_ALLOC		0
+#define	SCHED_CPUMASK_FREE(v)
+#define	SCHED_CPUMASK_DECLARE(v)	struct allmasks _v, *v = &_v
+#endif
+
+#define	SCHED_CPUMASK_VAR(v, a) 	cpumask_t *v = (cpumask_t *) \
+			((unsigned long)(a) + offsetof(struct allmasks, v))
+
+/*
  * Build sched domains for a given set of cpus and attach the sched domains
  * to the individual cpus
  */
@@ -6771,6 +6850,8 @@ static int build_sched_domains(const cpu
 {
 	int i;
 	struct root_domain *rd;
+	SCHED_CPUMASK_DECLARE(allmasks);
+	cpumask_t *tmpmask;
 #ifdef CONFIG_NUMA
 	struct sched_group **sched_group_nodes = NULL;
 	int sd_allnodes = 0;
@@ -6784,38 +6865,60 @@ static int build_sched_domains(const cpu
 		printk(KERN_WARNING "Can not alloc sched group node list\n");
 		return -ENOMEM;
 	}
-	sched_group_nodes_bycpu[first_cpu(*cpu_map)] = sched_group_nodes;
 #endif
 
 	rd = alloc_rootdomain();
 	if (!rd) {
 		printk(KERN_WARNING "Cannot alloc root domain\n");
+#ifdef CONFIG_NUMA
+		kfree(sched_group_nodes);
+#endif
 		return -ENOMEM;
 	}
 
+#if SCHED_CPUMASK_ALLOC
+	/* get space for all scratch cpumask variables */
+	allmasks = kmalloc(sizeof(*allmasks), GFP_KERNEL);
+	if (!allmasks) {
+		printk(KERN_WARNING "Cannot alloc cpumask array\n");
+		kfree(rd);
+#ifdef CONFIG_NUMA
+		kfree(sched_group_nodes);
+#endif
+		return -ENOMEM;
+	}
+#endif
+	tmpmask = (cpumask_t *)allmasks;
+
+
+#ifdef CONFIG_NUMA
+	sched_group_nodes_bycpu[first_cpu(*cpu_map)] = sched_group_nodes;
+#endif
+
 	/*
 	 * Set up domains for cpus specified by the cpu_map.
 	 */
 	for_each_cpu_mask(i, *cpu_map) {
 		struct sched_domain *sd = NULL, *p;
-		cpumask_t nodemask = node_to_cpumask(cpu_to_node(i));
+		SCHED_CPUMASK_VAR(nodemask, allmasks);
 
-		cpus_and(nodemask, nodemask, *cpu_map);
+		*nodemask = node_to_cpumask(cpu_to_node(i));
+		cpus_and(*nodemask, *nodemask, *cpu_map);
 
 #ifdef CONFIG_NUMA
 		if (cpus_weight(*cpu_map) >
-				SD_NODES_PER_DOMAIN*cpus_weight(nodemask)) {
+				SD_NODES_PER_DOMAIN*cpus_weight(*nodemask)) {
 			sd = &per_cpu(allnodes_domains, i);
-			*sd = SD_ALLNODES_INIT;
+			SD_INIT(sd, ALLNODES);
 			sd->span = *cpu_map;
-			cpu_to_allnodes_group(i, cpu_map, &sd->groups);
+			cpu_to_allnodes_group(i, cpu_map, &sd->groups, tmpmask);
 			p = sd;
 			sd_allnodes = 1;
 		} else
 			p = NULL;
 
 		sd = &per_cpu(node_domains, i);
-		*sd = SD_NODE_INIT;
+		SD_INIT(sd, NODE);
 		sd->span = sched_domain_node_span(cpu_to_node(i));
 		sd->parent = p;
 		if (p)
@@ -6825,94 +6928,114 @@ static int build_sched_domains(const cpu
 
 		p = sd;
 		sd = &per_cpu(phys_domains, i);
-		*sd = SD_CPU_INIT;
-		sd->span = nodemask;
+		SD_INIT(sd, CPU);
+		sd->span = *nodemask;
 		sd->parent = p;
 		if (p)
 			p->child = sd;
-		cpu_to_phys_group(i, cpu_map, &sd->groups);
+		cpu_to_phys_group(i, cpu_map, &sd->groups, tmpmask);
 
 #ifdef CONFIG_SCHED_MC
 		p = sd;
 		sd = &per_cpu(core_domains, i);
-		*sd = SD_MC_INIT;
+		SD_INIT(sd, MC);
 		sd->span = *cpu_coregroup_map(i);
 		cpus_and(sd->span, sd->span, *cpu_map);
 		sd->parent = p;
 		p->child = sd;
-		cpu_to_core_group(i, cpu_map, &sd->groups);
+		cpu_to_core_group(i, cpu_map, &sd->groups, tmpmask);
 #endif
 
 #ifdef CONFIG_SCHED_SMT
 		p = sd;
 		sd = &per_cpu(cpu_domains, i);
-		*sd = SD_SIBLING_INIT;
+		SD_INIT(sd, SIBLING);
 		sd->span = per_cpu(cpu_sibling_map, i);
 		cpus_and(sd->span, sd->span, *cpu_map);
 		sd->parent = p;
 		p->child = sd;
-		cpu_to_cpu_group(i, cpu_map, &sd->groups);
+		cpu_to_cpu_group(i, cpu_map, &sd->groups, tmpmask);
 #endif
 	}
 
 #ifdef CONFIG_SCHED_SMT
 	/* Set up CPU (sibling) groups */
 	for_each_cpu_mask(i, *cpu_map) {
-		cpumask_t this_sibling_map = per_cpu(cpu_sibling_map, i);
-		cpus_and(this_sibling_map, this_sibling_map, *cpu_map);
-		if (i != first_cpu(this_sibling_map))
+		SCHED_CPUMASK_VAR(this_sibling_map, allmasks);
+		SCHED_CPUMASK_VAR(send_covered, allmasks);
+
+		*this_sibling_map = per_cpu(cpu_sibling_map, i);
+		cpus_and(*this_sibling_map, *this_sibling_map, *cpu_map);
+		if (i != first_cpu(*this_sibling_map))
 			continue;
 
 		init_sched_build_groups(this_sibling_map, cpu_map,
-					&cpu_to_cpu_group);
+					&cpu_to_cpu_group,
+					send_covered, tmpmask);
 	}
 #endif
 
 #ifdef CONFIG_SCHED_MC
 	/* Set up multi-core groups */
 	for_each_cpu_mask(i, *cpu_map) {
-		cpumask_t this_core_map = *cpu_coregroup_map(i);
-		cpus_and(this_core_map, this_core_map, *cpu_map);
-		if (i != first_cpu(this_core_map))
+		SCHED_CPUMASK_VAR(this_core_map, allmasks);
+		SCHED_CPUMASK_VAR(send_covered, allmasks);
+
+		*this_core_map = *cpu_coregroup_map(i);
+		cpus_and(*this_core_map, *this_core_map, *cpu_map);
+		if (i != first_cpu(*this_core_map))
 			continue;
+
 		init_sched_build_groups(this_core_map, cpu_map,
-					&cpu_to_core_group);
+					&cpu_to_core_group,
+					send_covered, tmpmask);
 	}
 #endif
 
 	/* Set up physical groups */
 	for (i = 0; i < MAX_NUMNODES; i++) {
-		cpumask_t nodemask = node_to_cpumask(i);
+		SCHED_CPUMASK_VAR(nodemask, allmasks);
+		SCHED_CPUMASK_VAR(send_covered, allmasks);
 
-		cpus_and(nodemask, nodemask, *cpu_map);
-		if (cpus_empty(nodemask))
+		*nodemask = node_to_cpumask(i);
+		cpus_and(*nodemask, *nodemask, *cpu_map);
+		if (cpus_empty(*nodemask))
 			continue;
 
-		init_sched_build_groups(nodemask, cpu_map, &cpu_to_phys_group);
+		init_sched_build_groups(nodemask, cpu_map,
+					&cpu_to_phys_group,
+					send_covered, tmpmask);
 	}
 
 #ifdef CONFIG_NUMA
 	/* Set up node groups */
-	if (sd_allnodes)
-		init_sched_build_groups(*cpu_map, cpu_map,
-					&cpu_to_allnodes_group);
+	if (sd_allnodes) {
+		SCHED_CPUMASK_VAR(send_covered, allmasks);
+
+		init_sched_build_groups(cpu_map, cpu_map,
+					&cpu_to_allnodes_group,
+					send_covered, tmpmask);
+	}
 
 	for (i = 0; i < MAX_NUMNODES; i++) {
 		/* Set up node groups */
 		struct sched_group *sg, *prev;
-		cpumask_t nodemask = node_to_cpumask(i);
-		cpumask_t domainspan;
-		cpumask_t covered = CPU_MASK_NONE;
+		SCHED_CPUMASK_VAR(nodemask, allmasks);
+		SCHED_CPUMASK_VAR(domainspan, allmasks);
+		SCHED_CPUMASK_VAR(covered, allmasks);
 		int j;
 
-		cpus_and(nodemask, nodemask, *cpu_map);
-		if (cpus_empty(nodemask)) {
+		*nodemask = node_to_cpumask(i);
+		cpus_clear(*covered);
+
+		cpus_and(*nodemask, *nodemask, *cpu_map);
+		if (cpus_empty(*nodemask)) {
 			sched_group_nodes[i] = NULL;
 			continue;
 		}
 
-		domainspan = sched_domain_node_span(i);
-		cpus_and(domainspan, domainspan, *cpu_map);
+		*domainspan = sched_domain_node_span(i);
+		cpus_and(*domainspan, *domainspan, *cpu_map);
 
 		sg = kmalloc_node(sizeof(struct sched_group), GFP_KERNEL, i);
 		if (!sg) {
@@ -6921,31 +7044,31 @@ static int build_sched_domains(const cpu
 			goto error;
 		}
 		sched_group_nodes[i] = sg;
-		for_each_cpu_mask(j, nodemask) {
+		for_each_cpu_mask(j, *nodemask) {
 			struct sched_domain *sd;
 
 			sd = &per_cpu(node_domains, j);
 			sd->groups = sg;
 		}
 		sg->__cpu_power = 0;
-		sg->cpumask = nodemask;
+		sg->cpumask = *nodemask;
 		sg->next = sg;
-		cpus_or(covered, covered, nodemask);
+		cpus_or(*covered, *covered, *nodemask);
 		prev = sg;
 
 		for (j = 0; j < MAX_NUMNODES; j++) {
-			cpumask_t tmp, notcovered;
+			SCHED_CPUMASK_VAR(notcovered, allmasks);
 			int n = (i + j) % MAX_NUMNODES;
 			node_to_cpumask_ptr(pnodemask, n);
 
-			cpus_complement(notcovered, covered);
-			cpus_and(tmp, notcovered, *cpu_map);
-			cpus_and(tmp, tmp, domainspan);
-			if (cpus_empty(tmp))
+			cpus_complement(*notcovered, *covered);
+			cpus_and(*tmpmask, *notcovered, *cpu_map);
+			cpus_and(*tmpmask, *tmpmask, *domainspan);
+			if (cpus_empty(*tmpmask))
 				break;
 
-			cpus_and(tmp, tmp, *pnodemask);
-			if (cpus_empty(tmp))
+			cpus_and(*tmpmask, *tmpmask, *pnodemask);
+			if (cpus_empty(*tmpmask))
 				continue;
 
 			sg = kmalloc_node(sizeof(struct sched_group),
@@ -6956,9 +7079,9 @@ static int build_sched_domains(const cpu
 				goto error;
 			}
 			sg->__cpu_power = 0;
-			sg->cpumask = tmp;
+			sg->cpumask = *tmpmask;
 			sg->next = prev->next;
-			cpus_or(covered, covered, tmp);
+			cpus_or(*covered, *covered, *tmpmask);
 			prev->next = sg;
 			prev = sg;
 		}
@@ -6994,7 +7117,8 @@ static int build_sched_domains(const cpu
 	if (sd_allnodes) {
 		struct sched_group *sg;
 
-		cpu_to_allnodes_group(first_cpu(*cpu_map), cpu_map, &sg);
+		cpu_to_allnodes_group(first_cpu(*cpu_map), cpu_map, &sg,
+								tmpmask);
 		init_numa_sched_groups_power(sg);
 	}
 #endif
@@ -7012,11 +7136,13 @@ static int build_sched_domains(const cpu
 		cpu_attach_domain(sd, rd, i);
 	}
 
+	SCHED_CPUMASK_FREE((void *)allmasks);
 	return 0;
 
 #ifdef CONFIG_NUMA
 error:
-	free_sched_groups(cpu_map);
+	free_sched_groups(cpu_map, tmpmask);
+	SCHED_CPUMASK_FREE((void *)allmasks);
 	return -ENOMEM;
 #endif
 }
@@ -7056,9 +7182,10 @@ static int arch_init_sched_domains(const
 	return err;
 }
 
-static void arch_destroy_sched_domains(const cpumask_t *cpu_map)
+static void arch_destroy_sched_domains(const cpumask_t *cpu_map,
+				       cpumask_t *tmpmask)
 {
-	free_sched_groups(cpu_map);
+	free_sched_groups(cpu_map, tmpmask);
 }
 
 /*
@@ -7067,6 +7194,7 @@ static void arch_destroy_sched_domains(c
  */
 static void detach_destroy_domains(const cpumask_t *cpu_map)
 {
+	cpumask_t tmpmask;
 	int i;
 
 	unregister_sched_domain_sysctl();
@@ -7074,7 +7202,7 @@ static void detach_destroy_domains(const
 	for_each_cpu_mask(i, *cpu_map)
 		cpu_attach_domain(NULL, &def_root_domain, i);
 	synchronize_sched();
-	arch_destroy_sched_domains(cpu_map);
+	arch_destroy_sched_domains(cpu_map, &tmpmask);
 }
 
 /*
@@ -7282,7 +7410,7 @@ void __init sched_init_smp(void)
 	hotcpu_notifier(update_sched_domains, 0);
 
 	/* Move init over to a non-isolated CPU */
-	if (set_cpus_allowed(current, non_isolated_cpus) < 0)
+	if (set_cpus_allowed_ptr(current, &non_isolated_cpus) < 0)
 		BUG();
 	sched_init_granularity();
 }
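
The SCHED_CPUMASK_VAR() and SCHED_CPUMASK_FREE() helpers used above are
presumably defined earlier in this patch (not visible in these hunks).
The pattern they implement is: collect every temporary cpumask_t into one
struct, allocate that struct once per build_sched_domains() call, and
resolve each named mask to a pointer into it.  A minimal sketch of that
pattern -- field names and macro bodies here are illustrative, not copied
from the patch:

	/* One block holding all the temporaries build_sched_domains()
	 * needs.  With NR_CPUS=4096 each cpumask_t is 512 bytes, so
	 * keeping eight of them off the stack saves 4K per call. */
	struct allmasks {
		cpumask_t tmpmask;
		cpumask_t nodemask;
		cpumask_t this_sibling_map;
		cpumask_t this_core_map;
		cpumask_t send_covered;
		cpumask_t domainspan;
		cpumask_t covered;
		cpumask_t notcovered;
	};

	/* declare a local pointer 'v' aimed at field 'v' of allocation 'a' */
	#define SCHED_CPUMASK_VAR(v, a)	cpumask_t *v = (cpumask_t *) \
		((unsigned long)(a) + offsetof(struct allmasks, v))

	/* release the block when the domains have been attached */
	#define SCHED_CPUMASK_FREE(v)	kfree(v)

Each SCHED_CPUMASK_VAR() line then behaves like a local cpumask_t in the
code above, but costs only one pointer of stack.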

-- 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 12/12] cpumask: Cleanup more uses of CPU_MASK and NODE_MASK
  2008-04-05  1:11 [PATCH 00/12] cpumask: reduce stack pressure from local/passed cpumask variables v3 Mike Travis
                   ` (10 preceding siblings ...)
  2008-04-05  1:11 ` [PATCH 11/12] cpumask: reduce stack usage in SD_x_INIT initializers Mike Travis
@ 2008-04-05  1:11 ` Mike Travis
  11 siblings, 0 replies; 17+ messages in thread
From: Mike Travis @ 2008-04-05  1:11 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Thomas Gleixner, H. Peter Anvin, Andrew Morton, linux-kernel

[-- Attachment #1: use-CPUMASK_ALL_PTR --]
[-- Type: text/plain, Size: 2339 bytes --]

 *  Replace usages of CPU_MASK_NONE, CPU_MASK_ALL, NODE_MASK_NONE and
    NODE_MASK_ALL with their runtime equivalents (cpus_clear(),
    cpus_setall(), etc.) to reduce stack and code-size requirements
    for large NR_CPUS and MAX_NUMNODES counts.

 *  In some cases the cpumask variable was initialized only to be
    overwritten with another value, so the initializer can simply
    be dropped:

    -       cpumask_t oldmask = CPU_MASK_ALL;
    +       cpumask_t oldmask;
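
At NR_CPUS=4096 a cpumask_t is 512 bytes, so each *_MASK_* initializer
makes the compiler either emit zeroing/filling code at the call site or
copy a 512-byte constant out of .rodata; assigning one to a structure
field can also materialize the whole literal as a temporary first.  A
user-space analog of the transformation -- the types and the cpus_clear()
helper are simplified here for illustration:

	#include <string.h>

	#define NR_CPUS 4096

	typedef struct {
		unsigned long bits[NR_CPUS / (8 * sizeof(unsigned long))];
	} cpumask_t;					/* 512 bytes */

	#define CPU_MASK_NONE	((cpumask_t){ { 0 } })
	#define cpus_clear(m)	memset(&(m), 0, sizeof(m))

	struct irq_cfg {
		cpumask_t domain;
	};

	/* before: the initializer becomes a 512-byte temporary (or a
	 * .rodata constant) plus a structure copy */
	void clear_domain_old(struct irq_cfg *cfg)
	{
		cfg->domain = CPU_MASK_NONE;
	}

	/* after: zero the bitmap in place, no temporary at all */
	void clear_domain_new(struct irq_cfg *cfg)
	{
		cpus_clear(cfg->domain);
	}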

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   x86/latest          .../x86/linux-2.6-x86.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git

Signed-off-by: Mike Travis <travis@sgi.com>
---
 arch/x86/kernel/genapic_flat_64.c |    4 +++-
 arch/x86/kernel/io_apic_64.c      |    2 +-
 kernel/irq/chip.c                 |    2 +-
 mm/allocpercpu.c                  |    3 ++-
 4 files changed, 7 insertions(+), 4 deletions(-)

--- linux-2.6.x86.orig/arch/x86/kernel/genapic_flat_64.c
+++ linux-2.6.x86/arch/x86/kernel/genapic_flat_64.c
@@ -138,7 +138,9 @@ static cpumask_t physflat_target_cpus(vo
 
 static cpumask_t physflat_vector_allocation_domain(int cpu)
 {
-	cpumask_t domain = CPU_MASK_NONE;
+	cpumask_t domain;
+
+	cpus_clear(domain);
 	cpu_set(cpu, domain);
 	return domain;
 }
--- linux-2.6.x86.orig/arch/x86/kernel/io_apic_64.c
+++ linux-2.6.x86/arch/x86/kernel/io_apic_64.c
@@ -772,7 +772,7 @@ static void __clear_irq_vector(int irq)
 		per_cpu(vector_irq, cpu)[vector] = -1;
 
 	cfg->vector = 0;
-	cfg->domain = CPU_MASK_NONE;
+	cpus_clear(cfg->domain);
 }
 
 void __setup_vector_irq(int cpu)
--- linux-2.6.x86.orig/kernel/irq/chip.c
+++ linux-2.6.x86/kernel/irq/chip.c
@@ -47,7 +47,7 @@ void dynamic_irq_init(unsigned int irq)
 	desc->irq_count = 0;
 	desc->irqs_unhandled = 0;
 #ifdef CONFIG_SMP
-	desc->affinity = CPU_MASK_ALL;
+	cpus_setall(desc->affinity);
 #endif
 	spin_unlock_irqrestore(&desc->lock, flags);
 }
--- linux-2.6.x86.orig/mm/allocpercpu.c
+++ linux-2.6.x86/mm/allocpercpu.c
@@ -82,9 +82,10 @@ EXPORT_SYMBOL_GPL(percpu_populate);
 int __percpu_populate_mask(void *__pdata, size_t size, gfp_t gfp,
 			   cpumask_t *mask)
 {
-	cpumask_t populated = CPU_MASK_NONE;
+	cpumask_t populated;
 	int cpu;
 
+	cpus_clear(populated);
 	for_each_cpu_mask(cpu, *mask)
 		if (unlikely(!percpu_populate(__pdata, size, gfp, cpu))) {
 			__percpu_depopulate_mask(__pdata, &populated);

-- 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 04/12] sched: Remove fixed NR_CPUS sized arrays in kernel_sched_c v2
  2008-04-05  1:11 ` [PATCH 04/12] sched: Remove fixed NR_CPUS sized arrays in kernel_sched_c v2 Mike Travis
@ 2008-04-22 16:16   ` Tony Luck
  2008-04-22 16:42     ` Mike Travis
  2008-04-22 17:04     ` [PATCH 1/1] sched: remove unnecessary kzalloc in sched_init_smp Mike Travis
  0 siblings, 2 replies; 17+ messages in thread
From: Tony Luck @ 2008-04-22 16:16 UTC (permalink / raw)
  To: Mike Travis
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	linux-kernel

On Fri, Apr 4, 2008 at 6:11 PM, Mike Travis <travis@sgi.com> wrote:
>  @@ -7297,6 +7287,11 @@ void __init sched_init_smp(void)
>   #else
>   void __init sched_init_smp(void)
>   {
>  +#if defined(CONFIG_NUMA)
>  +       sched_group_nodes_bycpu = kzalloc(nr_cpu_ids * sizeof(void **),
>  +                                                               GFP_KERNEL);
>  +       BUG_ON(sched_group_nodes_bycpu == NULL);
>  +#endif
>         sched_init_granularity();
>   }
>   #endif /* CONFIG_SMP */

This hunk is causing problems with one of my builds (generic, uniprocessor).
Note that the #else at the start of this hunk is from an #ifdef CONFIG_SMP ...
so I'm wondering why we need #if defined(CONFIG_NUMA) inside uniprocessor
code :-)  How can you have NUMA issues with only one cpu!!!

[I'm also wondering why the config that has the compile problem has
CONFIG_SMP=n and CONFIG_NUMA=y ... but that weirdness exposed this
silliness, so perhaps it isn't all bad]

Error message is:
kernel/sched.c: In function `sched_init_smp':
kernel/sched.c:7994: error: `sched_group_nodes_bycpu' undeclared
(first use in this function)
kernel/sched.c:7994: error: (Each undeclared identifier is reported only once
kernel/sched.c:7994: error: for each function it appears in.)

-Tony

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 04/12] sched: Remove fixed NR_CPUS sized arrays in kernel_sched_c v2
  2008-04-22 16:16   ` Tony Luck
@ 2008-04-22 16:42     ` Mike Travis
  2008-04-22 17:04     ` [PATCH 1/1] sched: remove unnecessary kzalloc in sched_init_smp Mike Travis
  1 sibling, 0 replies; 17+ messages in thread
From: Mike Travis @ 2008-04-22 16:42 UTC (permalink / raw)
  To: Tony Luck
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	linux-kernel

Tony Luck wrote:
> On Fri, Apr 4, 2008 at 6:11 PM, Mike Travis <travis@sgi.com> wrote:
>>  @@ -7297,6 +7287,11 @@ void __init sched_init_smp(void)
>>   #else
>>   void __init sched_init_smp(void)
>>   {
>>  +#if defined(CONFIG_NUMA)
>>  +       sched_group_nodes_bycpu = kzalloc(nr_cpu_ids * sizeof(void **),
>>  +                                                               GFP_KERNEL);
>>  +       BUG_ON(sched_group_nodes_bycpu == NULL);
>>  +#endif
>>         sched_init_granularity();
>>   }
>>   #endif /* CONFIG_SMP */
> 
> This hunk is causing problems with one of my builds (generic, uniprocessor).
> Note that the #else at the start of this hunk is from an #ifdef CONFIG_SMP ...
> so I'm wondering why we need #if defined(CONFIG_NUMA) inside uniprocessor
> code :-)  How can you have NUMA issues with only one cpu!!!
> 
> [I'm also wondering why the config that has the compile problem has
> CONFIG_SMP=n and CONFIG_NUMA=y ... but that weirdness exposed this
> silliness, so perhaps it isn't all bad]
> 
> Error message is:
> kernel/sched.c: In function `sched_init_smp':
> kernel/sched.c:7994: error: `sched_group_nodes_bycpu' undeclared
> (first use in this function)
> kernel/sched.c:7994: error: (Each undeclared identifier is reported only once
> kernel/sched.c:7994: error: for each function it appears in.)
> 
> -Tony

Hi Tony,

Hmm, yes, good point.  I guess you might hit that case when there's a single
processor running under a guest OS that's not on node 0, but I suspect there
are plenty of other problems with that scenario.

We could make CONFIG_NUMA dependent on CONFIG_SMP?  Or define
sched_group_nodes_bycpu for the non-SMP case?
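
The second option would amount to a stub along these lines (hypothetical --
the fix actually posted below removes the reference instead):

	#if defined(CONFIG_NUMA) && !defined(CONFIG_SMP)
	/* hypothetical stub so the !SMP reference would compile */
	static struct sched_group ***sched_group_nodes_bycpu;
	#endif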

[I'll have to go back to that patch and research why I added this.]

Thanks,
Mike

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/1] sched: remove unnecessary kzalloc in sched_init_smp
  2008-04-22 16:16   ` Tony Luck
  2008-04-22 16:42     ` Mike Travis
@ 2008-04-22 17:04     ` Mike Travis
  2008-04-22 17:28       ` Luck, Tony
  1 sibling, 1 reply; 17+ messages in thread
From: Mike Travis @ 2008-04-22 17:04 UTC (permalink / raw)
  To: Tony Luck, Ingo Molnar
  Cc: Thomas Gleixner, H. Peter Anvin, Andrew Morton, linux-kernel

Tony Luck wrote:
> On Fri, Apr 4, 2008 at 6:11 PM, Mike Travis <travis@sgi.com> wrote:
>>  @@ -7297,6 +7287,11 @@ void __init sched_init_smp(void)
>>   #else
>>   void __init sched_init_smp(void)
>>   {
>>  +#if defined(CONFIG_NUMA)
>>  +       sched_group_nodes_bycpu = kzalloc(nr_cpu_ids * sizeof(void **),
>>  +                                                               GFP_KERNEL);
>>  +       BUG_ON(sched_group_nodes_bycpu == NULL);
>>  +#endif
>>         sched_init_granularity();
>>   }
>>   #endif /* CONFIG_SMP */
> 
> This hunk is causing problems with one of my builds (generic, uniprocessor).
> Note that the #else at the start of this hunk is from an #ifdef CONFIG_SMP ...
> so I'm wondering why we need #if defined(CONFIG_NUMA) inside uniprocessor
> code :-)  How can you have NUMA issues with only one cpu!!!

CONFIG_NUMA is dependent on CONFIG_SMP for x86, so that caused me to miss
the error.  But here's the fix:

  * sched_group_nodes_bycpu is defined only for the SMP + NUMA case, so
    it should not be referenced in the !SMP + NUMA case.

For inclusion into sched-devel/latest tree.

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git

Signed-off-by: Mike Travis <travis@sgi.com>
---
 kernel/sched.c |    5 -----
 1 file changed, 5 deletions(-)

--- linux-2.6.sched.orig/kernel/sched.c
+++ linux-2.6.sched/kernel/sched.c
@@ -8028,11 +8028,6 @@ void __init sched_init_smp(void)
 #else
 void __init sched_init_smp(void)
 {
-#if defined(CONFIG_NUMA)
-	sched_group_nodes_bycpu = kzalloc(nr_cpu_ids * sizeof(void **),
-								GFP_KERNEL);
-	BUG_ON(sched_group_nodes_bycpu == NULL);
-#endif
 	sched_init_granularity();
 }
 #endif /* CONFIG_SMP */

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH 1/1] sched: remove unnecessary kzalloc in sched_init_smp
  2008-04-22 17:04     ` [PATCH 1/1] sched: remove unnecessary kzalloc in sched_init_smp Mike Travis
@ 2008-04-22 17:28       ` Luck, Tony
  0 siblings, 0 replies; 17+ messages in thread
From: Luck, Tony @ 2008-04-22 17:28 UTC (permalink / raw)
  To: Mike Travis, Ingo Molnar
  Cc: Thomas Gleixner, H. Peter Anvin, Andrew Morton, linux-kernel

Acked-by: Tony Luck <tony.luck@intel.com>

> For inclusion into sched-devel/latest tree.

Needs to head to Linus' tree ... the preceding patches have already
been merged there and are causing build errors for me.

Signed-off-by: Mike Travis <travis@sgi.com>
---
 kernel/sched.c |    5 -----
 1 file changed, 5 deletions(-)

--- linux-2.6.sched.orig/kernel/sched.c
+++ linux-2.6.sched/kernel/sched.c
@@ -8028,11 +8028,6 @@ void __init sched_init_smp(void)
 #else
 void __init sched_init_smp(void)
 {
-#if defined(CONFIG_NUMA)
-	sched_group_nodes_bycpu = kzalloc(nr_cpu_ids * sizeof(void **),
-								GFP_KERNEL);
-	BUG_ON(sched_group_nodes_bycpu == NULL);
-#endif
 	sched_init_granularity();
 }
 #endif /* CONFIG_SMP */

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2008-04-22 17:28 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-04-05  1:11 [PATCH 00/12] cpumask: reduce stack pressure from local/passed cpumask variables v3 Mike Travis
2008-04-05  1:11 ` [PATCH 01/12] x86: Convert cpumask_of_cpu macro to allocated array Mike Travis
2008-04-05  1:11 ` [PATCH 02/12] cpumask: add CPU_MASK_ALL_PTR macro Mike Travis
2008-04-05  1:11 ` [PATCH 03/12] cpumask: reduce stack pressure in cpu_coregroup_map Mike Travis
2008-04-05  1:11 ` [PATCH 04/12] sched: Remove fixed NR_CPUS sized arrays in kernel_sched_c v2 Mike Travis
2008-04-22 16:16   ` Tony Luck
2008-04-22 16:42     ` Mike Travis
2008-04-22 17:04     ` [PATCH 1/1] sched: remove unnecessary kzalloc in sched_init_smp Mike Travis
2008-04-22 17:28       ` Luck, Tony
2008-04-05  1:11 ` [PATCH 05/12] x86: use new set_cpus_allowed_ptr function Mike Travis
2008-04-05  1:11 ` [PATCH 06/12] generic: " Mike Travis
2008-04-05  1:11 ` [PATCH 07/12] cpuset: modify cpuset_set_cpus_allowed to use cpumask pointer Mike Travis
2008-04-05  1:11 ` [PATCH 08/12] generic: reduce stack pressure in sched_affinity Mike Travis
2008-04-05  1:11 ` [PATCH 09/12] numa: move large array from stack to _initdata section Mike Travis
2008-04-05  1:11 ` [PATCH 10/12] nodemask: use new node_to_cpumask_ptr function Mike Travis
2008-04-05  1:11 ` [PATCH 11/12] cpumask: reduce stack usage in SD_x_INIT initializers Mike Travis
2008-04-05  1:11 ` [PATCH 12/12] cpumask: Cleanup more uses of CPU_MASK and NODE_MASK Mike Travis

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox