LinuxPPC-Dev Archive on lore.kernel.org

* Re: Sampling instruction pointer on PPC
From: David Ahern @ 2012-03-01 18:00 UTC (permalink / raw)
  To: Victor Jimenez, linuxppc-dev; +Cc: linux-perf-users
In-Reply-To: <4F4FACFA.6080209@bsc.es>

[Added linuxppc-dev list.]

On 3/1/12 10:08 AM, Victor Jimenez wrote:
> I am trying to sample instruction pointer along time on a Power7 system.
> I know that there are accurate mechanisms to do so in Intel processors
> (e.g., PEBS and Branch Trace Store).
>
> Is it possible to do something similar in Power7? Will the samples be
> accurate? I am worried that significant delays (skids) may appear.
>
> Thank you,
> Victor

^ permalink raw reply

* Re: [PATCH 1/2] atomic: Allow atomic_inc_not_zero to be overridden
From: Mike Frysinger @ 2012-03-01 17:19 UTC (permalink / raw)
  To: Anton Blanchard
  Cc: eric.dumazet, linux-kernel, paulus, akpm, linuxppc-dev, asharma
In-Reply-To: <20120301180953.0f61576f@kryten>

On Thursday 01 March 2012 02:09:53 Anton Blanchard wrote:
> We want to implement a ppc64 specific version of atomic_inc_not_zero
> so wrap it in an ifdef to allow it to be overridden.

Acked-by: Mike Frysinger <vapier@gentoo.org>
-mike

^ permalink raw reply

* Re: [PATCH v2] bootmem/sparsemem: remove limit constraint in alloc_bootmem_section
From: Mel Gorman @ 2012-03-01 11:42 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: Anton Blanchard, Dave Hansen, linux-mm, Paul Mackerras,
	Johannes Weiner, Andrew Morton, Robert Jennings, linuxppc-dev
In-Reply-To: <20120229181233.GF5136@linux.vnet.ibm.com>

On Wed, Feb 29, 2012 at 10:12:33AM -0800, Nishanth Aravamudan wrote:
> <SNIP>
>
> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
> 

Acked-by: Mel Gorman <mgorman@suse.de>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply

* [PATCH v2] powerpc: document the FSL MPIC message register binding
From: Jia Hongtao @ 2012-03-01  9:32 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: b38951

This binding documents how the message register blocks found in some FSL
MPIC implementations shall be represented in a device tree.

Signed-off-by: Meador Inge <meador_inge@mentor.com>
Signed-off-by: Jia Hongtao <B38951@freescale.com>
---
Changes for v2:
 * Update compatible type from <string> to <string-list>.
 * Update interrupts description.
 * Update mpic-msgr-receive-mask description.

 .../devicetree/bindings/powerpc/fsl/mpic-msgr.txt  |   64 ++++++++++++++++++++
 1 files changed, 64 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/powerpc/fsl/mpic-msgr.txt

diff --git a/Documentation/devicetree/bindings/powerpc/fsl/mpic-msgr.txt b/Documentation/devicetree/bindings/powerpc/fsl/mpic-msgr.txt
new file mode 100644
index 0000000..d52ac48
--- /dev/null
+++ b/Documentation/devicetree/bindings/powerpc/fsl/mpic-msgr.txt
@@ -0,0 +1,64 @@
+* FSL MPIC Message Registers
+
+This binding specifies what properties must be available in the device tree
+representation of the message register blocks found in some FSL MPIC
+implementations.
+
+Required properties:
+
+    - compatible: Specifies the compatibility list for the message register
+      block.  The type shall be <string-list> and the value shall be of the form
+      "fsl,mpic-v<version>-msgr", where <version> is the version number of
+      the MPIC containing the message registers.
+
+    - reg: Specifies the base physical address(s) and size(s) of the
+      message register block's addressable register space.  The type shall be
+      <prop-encoded-array>.
+
+    - interrupts: Specifies a list of interrupt-specifiers which are available
+      for receiving interrupts. Interrupt-specifier consists of two cells: first
+      cell is interrupt-number and second cell is level-sense. The type shall be
+      <prop-encoded-array>.
+
+Optional properties:
+
+    - mpic-msgr-receive-mask: Specifies what registers in the containing block
+      are allowed to receive interrupts. The value is a bit mask where a set
+      bit at bit 'n' indicates that message register 'n' can receive interrupts.
+      Note that "bit 'n'" is numbered from LSB for PPC hardware. The type shall
+      be <u32>. If not present, then all of the message registers in the block
+      are available.
+
+Aliases:
+
+    An alias should be created for every message register block.  They are not
+    required, though.  However, a particular implementation of this binding
+    may require aliases to be present.  Aliases are of the form
+    'mpic-msgr-block<n>', where <n> is an integer specifying the block's number.
+    Numbers shall start at 0.
+
+Example:
+
+	aliases {
+		mpic-msgr-block0 = &mpic_msgr_block0;
+		mpic-msgr-block1 = &mpic_msgr_block1;
+	};
+
+	mpic_msgr_block0: mpic-msgr-block@41400 {
+		compatible = "fsl,mpic-v3.1-msgr";
+		reg = <0x41400 0x200>;
+		// Message registers 0 and 2 in this block can receive interrupts on
+		// sources 0xb0 and 0xb2, respectively.
+		interrupts = <0xb0 2 0xb2 2>;
+		mpic-msgr-receive-mask = <0x5>;
+	};
+
+	mpic_msgr_block1: mpic-msgr-block@42400 {
+		compatible = "fsl,mpic-v3.1-msgr";
+		reg = <0x42400 0x200>;
+		// Message registers 0 and 2 in this block can receive interrupts on
+		// sources 0xb4 and 0xb6, respectively.
+		interrupts = <0xb4 2 0xb6 2>;
+		mpic-msgr-receive-mask = <0x5>;
+	};
+
-- 
1.7.5.1
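
To illustrate the mpic-msgr-receive-mask semantics described in the binding text of this patch, here is a minimal userspace C sketch. The helper names msgr_can_receive() and msgr_count() are made up for illustration; they are not part of the binding or of any kernel API. The only facts taken from the binding are that the mask is a single <u32> and that bit 'n', counted from the LSB, enables message register 'n'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): returns true if message
 * register 'n' may receive interrupts according to the mask.
 * Bit 'n' is counted from the least significant bit, as the binding states.
 */
static bool msgr_can_receive(uint32_t receive_mask, unsigned int n)
{
	assert(n < 32);	/* the mask is a single <u32> cell */
	return (receive_mask >> n) & 1u;
}

/* Hypothetical helper: number of message registers enabled by the mask. */
static unsigned int msgr_count(uint32_t receive_mask)
{
	unsigned int count = 0;

	/* Clear the lowest set bit on each iteration. */
	for (; receive_mask; receive_mask &= receive_mask - 1)
		count++;
	return count;
}
```

With the mask <0x5> used in both example nodes, registers 0 and 2 are enabled and register 1 is not, which matches the two interrupt specifiers listed in each node.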

^ permalink raw reply related

* [PATCH 3/3] CPU hotplug, arch/sparc: Fix CPU hotplug callback registration
From: Srivatsa S. Bhat @ 2012-03-01  8:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Nick Piggin, Paul E. McKenney, Rusty Russell,
	linux-kernel, Rafael J. Wysocki, Paul Gortmaker, Alexander Viro,
	KOSAKI Motohiro, sparclinux, linux-fsdevel, Andrew Morton,
	Arjan van de Ven, ppc-dev, David S. Miller, Peter Zijlstra
In-Reply-To: <4F4F2F7F.5040207@linux.vnet.ibm.com>


Restructure CPU hotplug setup and callback registration in topology_init
so as to be race-free.

---

 arch/sparc/kernel/sysfs.c |    6 ++----
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/sparc/kernel/sysfs.c b/arch/sparc/kernel/sysfs.c
index 654e8aa..22cb881 100644
--- a/arch/sparc/kernel/sysfs.c
+++ b/arch/sparc/kernel/sysfs.c
@@ -300,16 +300,14 @@ static int __init topology_init(void)
 
 	check_mmu_stats();
 
-	register_cpu_notifier(&sysfs_cpu_nb);
-
 	for_each_possible_cpu(cpu) {
 		struct cpu *c = &per_cpu(cpu_devices, cpu);
 
 		register_cpu(c, cpu);
-		if (cpu_online(cpu))
-			register_cpu_online(cpu);
 	}
 
+	register_allcpu_notifier(&sysfs_cpu_nb, true, NULL);
+
 	return 0;
 }
 

^ permalink raw reply related

* [PATCH 2/3] CPU hotplug, arch/powerpc: Fix CPU hotplug callback registration
From: Srivatsa S. Bhat @ 2012-03-01  8:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Nick Piggin, Paul E. McKenney, Rusty Russell,
	linux-kernel, Rafael J. Wysocki, Paul Gortmaker, Alexander Viro,
	KOSAKI Motohiro, sparclinux, linux-fsdevel, Andrew Morton,
	Arjan van de Ven, ppc-dev, David S. Miller, Peter Zijlstra
In-Reply-To: <4F4F2F7F.5040207@linux.vnet.ibm.com>



Restructure CPU hotplug setup and callback registration in topology_init
so as to be race-free.

---

 arch/powerpc/kernel/sysfs.c |   44 +++++++++++++++++++++++++++++++++++--------
 arch/powerpc/mm/numa.c      |   11 ++++++++---
 2 files changed, 44 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index 883e74c..5838b33 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -496,6 +496,38 @@ ssize_t arch_cpu_release(const char *buf, size_t count)
 
 #endif /* CONFIG_HOTPLUG_CPU */
 
+static void cpu_register_helper(struct cpu *c, int cpu)
+{
+	register_cpu(c, cpu);
+	device_create_file(&c->dev, &dev_attr_physical_id);
+}
+
+static int __cpuinit sysfs_cpu_notify_first_time(struct notifier_block *self,
+				      unsigned long action, void *hcpu)
+{
+	unsigned int cpu = (unsigned int)(long)hcpu;
+	struct cpu *c = &per_cpu(cpu_devices, cpu);
+
+	if (action == CPU_ONLINE) {
+		if (!c->hotpluggable) /* Avoid duplicate registrations */
+			cpu_register_helper(c, cpu);
+		register_cpu_online(cpu);
+	}
+	return NOTIFY_OK;
+}
+static int __cpuinit sysfs_cpu_notify_setup(void)
+{
+	int cpu;
+
+	/*
+	 * We don't race with CPU hotplug because we are called from
+	 * the CPU hotplug callback registration function.
+	 */
+	for_each_online_cpu(cpu)
+		sysfs_cpu_notify_first_time(NULL, CPU_ONLINE, (void *)(long)cpu);
+
+	return 0;
+}
 static int __cpuinit sysfs_cpu_notify(struct notifier_block *self,
 				      unsigned long action, void *hcpu)
 {
@@ -637,7 +669,6 @@ static int __init topology_init(void)
 	int cpu;
 
 	register_nodes();
-	register_cpu_notifier(&sysfs_cpu_nb);
 
 	for_each_possible_cpu(cpu) {
 		struct cpu *c = &per_cpu(cpu_devices, cpu);
@@ -652,15 +683,12 @@ static int __init topology_init(void)
 		if (ppc_md.cpu_die)
 			c->hotpluggable = 1;
 
-		if (cpu_online(cpu) || c->hotpluggable) {
-			register_cpu(c, cpu);
+		if (c->hotpluggable)
+			cpu_register_helper(c, cpu);
+	}
 
-			device_create_file(&c->dev, &dev_attr_physical_id);
-		}
+	register_allcpu_notifier(&sysfs_cpu_nb, true, &sysfs_cpu_notify_setup);
 
-		if (cpu_online(cpu))
-			register_cpu_online(cpu);
-	}
 #ifdef CONFIG_PPC64
 	sysfs_create_dscr_default();
 #endif /* CONFIG_PPC64 */
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 3feefc3..e326455 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1014,6 +1014,13 @@ static void __init mark_reserved_regions_for_nid(int nid)
 	}
 }
 
+static int __cpuinit cpu_numa_callback_setup(void)
+{
+	cpu_numa_callback(&ppc64_numa_nb, CPU_UP_PREPARE,
+			(void *)(unsigned long)boot_cpuid);
+	return 0;
+}
+
 
 void __init do_init_bootmem(void)
 {
@@ -1088,9 +1095,7 @@ void __init do_init_bootmem(void)
 	 */
 	setup_node_to_cpumask_map();
 
-	register_cpu_notifier(&ppc64_numa_nb);
-	cpu_numa_callback(&ppc64_numa_nb, CPU_UP_PREPARE,
-			  (void *)(unsigned long)boot_cpuid);
+	register_allcpu_notifier(&ppc64_numa_nb, true, &cpu_numa_callback_setup);
 }
 
 void __init paging_init(void)

^ permalink raw reply related

* [PATCH 1/3] CPU hotplug: Fix issues with callback registration
From: Srivatsa S. Bhat @ 2012-03-01  8:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Nick Piggin, Paul E. McKenney, Rusty Russell,
	linux-kernel, Rafael J. Wysocki, Paul Gortmaker, Alexander Viro,
	KOSAKI Motohiro, sparclinux, linux-fsdevel, Andrew Morton,
	Arjan van de Ven, ppc-dev, David S. Miller, Peter Zijlstra
In-Reply-To: <4F4F2F7F.5040207@linux.vnet.ibm.com>


Currently, there are several intertwined problems with CPU hotplug callback
registration:

Code which needs to get notified of CPU hotplug events and additionally wants
to do something for each already online CPU, would typically do something like:

   register_cpu_notifier(&foobar_cpu_notifier);
				<============ "A"
   get_online_cpus();
   for_each_online_cpu(cpu) {
	/* Do something */
   }
   put_online_cpus();

At the point marked as "A", a CPU hotplug event could sneak in, leaving the
code confused. Moving the registration to after put_online_cpus() won't help
either, because we could be losing a CPU hotplug event between put_online_cpus()
and the callback registration. Also, doing the registration inside the
get/put_online_cpus() block is also not going to help, because it will lead to
ABBA deadlock with CPU hotplug, the 2 locks being cpu_add_remove_lock and
cpu_hotplug lock.

It is also to be noted that, at times, we might want to do different setups
or initializations depending on whether a CPU is coming online for the first
time (as part of booting) or whether it is being only soft-onlined at a later
point in time. To achieve this, doing something like the code shown above,
with the "Do something" being different than what the registered callback
does wouldn't work out, because of the race conditions mentioned above.

The solution to all this is to include "history replay upon request" within
the CPU hotplug callback registration code, while also providing an option
for a different callback to be invoked while replaying history.
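
The "history replay" idea can be sketched as a small single-threaded userspace model. Everything here (register_with_replay(), cpu_up(), the EV_* events, the fixed-size chain) is a simplified stand-in invented for illustration, not the kernel implementation; hotplug_lock()/hotplug_unlock() are empty placeholders that only mark where cpu_add_remove_lock would be held. The point it tries to show is that replay and registration happen in one critical section, so every CPU is seen exactly once by the notifier.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CPUS	8
#define MAX_NOTIFIERS	4

enum hotplug_event { EV_UP_PREPARE, EV_ONLINE };
typedef int (*notifier_fn)(enum hotplug_event ev, int cpu);

static bool cpu_is_online[MAX_CPUS];
static notifier_fn chain[MAX_NOTIFIERS];
static size_t chain_len;

/* Placeholders marking where cpu_add_remove_lock would be held. */
static void hotplug_lock(void) { }
static void hotplug_unlock(void) { }

/* Replay past bring-up events (if asked), then register, under one lock. */
static int register_with_replay(notifier_fn fn, bool replay_history,
				int (*history_setup)(void))
{
	int cpu, ret = 0;

	if (!replay_history && history_setup)
		return -EINVAL;

	hotplug_lock();
	if (replay_history) {
		if (history_setup) {
			/* Caller-supplied routine replays history its own way. */
			ret = history_setup();
		} else {
			for (cpu = 0; cpu < MAX_CPUS; cpu++) {
				if (!cpu_is_online[cpu])
					continue;
				fn(EV_UP_PREPARE, cpu);
				fn(EV_ONLINE, cpu);
			}
		}
	}
	if (!ret) {
		if (chain_len < MAX_NOTIFIERS)
			chain[chain_len++] = fn;
		else
			ret = -ENOMEM;
	}
	hotplug_unlock();
	return ret;
}

/* Model of a CPU coming online: notifies the chain under the same lock. */
static void cpu_up(int cpu)
{
	size_t i;

	assert(cpu >= 0 && cpu < MAX_CPUS);
	hotplug_lock();
	for (i = 0; i < chain_len; i++)
		chain[i](EV_UP_PREPARE, cpu);
	cpu_is_online[cpu] = true;
	for (i = 0; i < chain_len; i++)
		chain[i](EV_ONLINE, cpu);
	hotplug_unlock();
}

/* Example notifier: counts how often each CPU was reported online. */
static int online_seen[MAX_CPUS];

static int count_online(enum hotplug_event ev, int cpu)
{
	if (ev == EV_ONLINE)
		online_seen[cpu]++;
	return 0;
}
```

In this model, CPUs brought up before registration are reported through the replay, and CPUs brought up afterwards are reported through the chain, so no event is missed or delivered twice.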

Though the above mentioned race condition was mostly theoretical before, it
gets all real when things like asynchronous booting[1] come into the picture,
as shown by the PowerPC boot failure in [2]. So this fix is also a step forward
in getting cool things like asynchronous booting to work properly.

References:
[1]. https://lkml.org/lkml/2012/2/14/62
[2]. https://lkml.org/lkml/2012/2/13/383

---

 include/linux/cpu.h |   15 +++++++++++++++
 kernel/cpu.c        |   49 ++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 61 insertions(+), 3 deletions(-)

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 6e53b48..90a6d76 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -124,16 +124,25 @@ enum {
 #endif /* #else #if defined(CONFIG_HOTPLUG_CPU) || !defined(MODULE) */
 #ifdef CONFIG_HOTPLUG_CPU
 extern int register_cpu_notifier(struct notifier_block *nb);
+extern int register_allcpu_notifier(struct notifier_block *nb,
+			bool replay_history, int (*history_setup)(void));
 extern void unregister_cpu_notifier(struct notifier_block *nb);
 #else
 
 #ifndef MODULE
 extern int register_cpu_notifier(struct notifier_block *nb);
+extern int register_allcpu_notifier(struct notifier_block *nb,
+			bool replay_history, int (*history_setup)(void));
 #else
 static inline int register_cpu_notifier(struct notifier_block *nb)
 {
 	return 0;
 }
+static inline int register_allcpu_notifier(struct notifier_block *nb,
+			bool replay_history, int (*history_setup)(void))
+{
+	return 0;
+}
 #endif
 
 static inline void unregister_cpu_notifier(struct notifier_block *nb)
@@ -155,6 +164,12 @@ static inline int register_cpu_notifier(struct notifier_block *nb)
 	return 0;
 }
 
+static inline int register_allcpu_notifier(struct notifier_block *nb,
+			bool replay_history, int (*history_setup)(void))
+{
+	return 0;
+}
+
 static inline void unregister_cpu_notifier(struct notifier_block *nb)
 {
 }
diff --git a/kernel/cpu.c b/kernel/cpu.c
index d520d34..1564c1d 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -132,12 +132,56 @@ static void cpu_hotplug_done(void) {}
 /* Need to know about CPUs going up/down? */
 int __ref register_cpu_notifier(struct notifier_block *nb)
 {
-	int ret;
+	return register_allcpu_notifier(nb, false, NULL);
+}
+EXPORT_SYMBOL(register_cpu_notifier);
+
+int __ref register_allcpu_notifier(struct notifier_block *nb,
+			bool replay_history, int (*history_setup)(void))
+{
+	int cpu, ret = 0;
+
+	if (!replay_history && history_setup)
+		return -EINVAL;
+
 	cpu_maps_update_begin();
-	ret = raw_notifier_chain_register(&cpu_chain, nb);
+	/*
+	 * We don't race with CPU hotplug, because we just took the
+	 * cpu_add_remove_lock.
+	 */
+
+	if (!replay_history)
+		goto Register;
+
+	if (history_setup) {
+		/*
+		 * The caller has a special setup routine to rewrite
+		 * history as he desires. Just invoke it. Don't
+		 * proceed with callback registration if this setup is
+		 * unsuccessful.
+		 */
+		ret = history_setup();
+	} else {
+		/*
+		 * Fallback to the usual callback, if a special handler
+		 * for past CPU hotplug events is not specified.
+		 * In this case, we will replay only past CPU bring-up
+		 * events.
+		 */
+		for_each_online_cpu(cpu) {
+			nb->notifier_call(nb, CPU_UP_PREPARE, (void *)(long)cpu);
+			nb->notifier_call(nb, CPU_ONLINE, (void *)(long)cpu);
+		}
+	}
+
+ Register:
+	if (!ret)
+		ret = raw_notifier_chain_register(&cpu_chain, nb);
+
 	cpu_maps_update_done();
 	return ret;
 }
+EXPORT_SYMBOL(register_allcpu_notifier);
 
 static int __cpu_notify(unsigned long val, void *v, int nr_to_call,
 			int *nr_calls)
@@ -161,7 +205,6 @@ static void cpu_notify_nofail(unsigned long val, void *v)
 {
 	BUG_ON(cpu_notify(val, v));
 }
-EXPORT_SYMBOL(register_cpu_notifier);
 
 void __ref unregister_cpu_notifier(struct notifier_block *nb)
 {

^ permalink raw reply related

* Re: [PATCH] cpumask: fix lg_lock/br_lock.
From: Srivatsa S. Bhat @ 2012-03-01  8:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: sparclinux, Andi Kleen, Nick Piggin, KOSAKI Motohiro,
	Rusty Russell, linux-kernel, Rafael J. Wysocki, Paul Gortmaker,
	Alexander Viro, Arjan van de Ven, linux-fsdevel, Andrew Morton,
	Paul E. McKenney, ppc-dev, David S. Miller, Peter Zijlstra
In-Reply-To: <4F4E083A.2080304@linux.vnet.ibm.com>

On 02/29/2012 04:42 PM, Srivatsa S. Bhat wrote:

> On 02/29/2012 02:47 PM, Ingo Molnar wrote:
> 
>>
>> * Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>
>>> Hi Andrew,
>>>
>>> On 02/29/2012 02:57 AM, Andrew Morton wrote:
>>>
>>>> On Tue, 28 Feb 2012 09:43:59 +0100
>>>> Ingo Molnar <mingo@elte.hu> wrote:
>>>>
>>>>> This patch should also probably go upstream through the 
>>>>> locking/lockdep tree? Mind sending it us once you think it's 
>>>>> ready?
>>>>
>>>> Oh goody, that means you own
>>>> http://marc.info/?l=linux-kernel&m=131419353511653&w=2.
>>>>
>>>
>>>
>>> That bug got fixed sometime around Dec 2011. See commit e30e2fdf
>>> (VFS: Fix race between CPU hotplug and lglocks)
>>
>> The lglocks code is still CPU-hotplug racy AFAICS, despite the 
>> ->cpu_lock complication:
>>
>> Consider a taken global lock on a CPU:
>>
>> 	CPU#1
>> 	...
>> 	br_write_lock(vfsmount_lock);
>>
>> this takes the lock of all online CPUs: say CPU#1 and CPU#2. Now 
>> CPU#3 comes online and takes the read lock:
> 
> 
> CPU#3 cannot come online! :-)
> 
> No new CPU can come online until that corresponding br_write_unlock()
> is completed. That is because  br_write_lock acquires &name##_cpu_lock
> and only br_write_unlock will release it.
> And, CPU_UP_PREPARE callback tries to acquire that very same spinlock,
> and hence will keep spinning until br_write_unlock() is run. And hence,
> the CPU#3 or any new CPU online for that matter will not complete until
> br_write_unlock() is done.
> 
> It is of course debatable as to how good this design really is, but IMHO,
> the lglocks code is not CPU-hotplug racy now..
> 
> Here is the link to the original discussion during the development of
> that patch: thread.gmane.org/gmane.linux.file-systems/59750/
> 
>>
>> 			CPU#3
>> 			br_read_lock(vfsmount_lock);
>>
>> This will succeed while the br_write_lock() is still active, 
>> because CPU#1 has only taken the locks of CPU#1 and CPU#2. 
>>
>> Crash!
>>
>> The proper fix would be for CPU-online to serialize with all 
>> known lglocks, via the notifier callback, i.e. to do something 
>> like this:
>>
>>         case CPU_UP_PREPARE:                                            
>> 		for_each_online_cpu(cpu) {
>> 	                spin_lock(&name##_cpu_lock);                            
>> 	                spin_unlock(&name##_cpu_lock);
>> 		}
>> 	...
>>
>> I.e. in essence do:
>>
>>         case CPU_UP_PREPARE:                                            
>> 		name##_global_lock_online();
>> 		name##_global_unlock_online();
>>
>> Another detail I noticed, this bit:
>>
>>         register_hotcpu_notifier(&name##_lg_cpu_notifier);              \
>>         get_online_cpus();                                              \
>>         for_each_online_cpu(i)                                          \
>>                 cpu_set(i, name##_cpus);                                \
>>         put_online_cpus();                                              \
>>
>> could be something simpler and loop-less, like:
>>
>>         get_online_cpus();
>> 	cpumask_copy(name##_cpus, cpu_online_mask);
>> 	register_hotcpu_notifier(&name##_lg_cpu_notifier);
>> 	put_online_cpus();
>>
> 
> 
> While the cpumask_copy is definitely better, we can't put the
> register_hotcpu_notifier() within get/put_online_cpus() because it will
> lead to ABBA deadlock with a newly initiated CPU Hotplug operation, the
> 2 locks involved being the cpu_add_remove_lock and the cpu_hotplug lock.
> 
> IOW, at the moment there is no "absolutely race-free way" way to do
> CPU Hotplug callback registration. Some time ago, while going through the
> asynchronous booting patch by Arjan [1] I had written up a patch to fix
> that race because that race got transformed from "purely theoretical"
> to "very real" with the async boot patch, as shown by the powerpc boot
> failures [2].
> 
> But then I stopped short of posting that patch to the lists because I
> started wondering how important that race would actually turn out to be,
> in case the async booting design takes a totally different approach
> altogether.. [And the reason why I didn't post it is also because it
> would require lots of changes in many parts where CPU Hotplug registration
> is done, and that wouldn't probably be justified (I don't know..) if the
> race remained only theoretical, as it is now.]
> 
> [1]. http://thread.gmane.org/gmane.linux.kernel/1246209
> [2]. https://lkml.org/lkml/2012/2/13/383
>  


Ok, now that I have mentioned my patch, let me show it some daylight as well..
It is totally untested, incomplete and probably won't even compile.. (given
that I had abandoned working on it some time ago, since I was not sure in
what direction the async boot design was headed, which was the original
motivation for me to try to fix this race)

I really hate to post it when it is in such a state, but at least let me get
the idea out, now that the discussion is around it, if only to get some
thoughts about whether it is even worth pursuing!
(I'll post the patches as a reply to this mail.)

By the way, it should solve the powerpc boot failure, at least in principle,
considering what the root cause of the failure was..

Regards,
Srivatsa S. Bhat

^ permalink raw reply

* [PATCH 2/2] powerpc: atomic: Implement atomic*_inc_not_zero
From: Anton Blanchard @ 2012-03-01  7:12 UTC (permalink / raw)
  To: benh, paulus, akpm, asharma, vapier, eric.dumazet
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20120301180953.0f61576f@kryten>


Implement atomic_inc_not_zero and atomic64_inc_not_zero. At the
moment we use atomic*_add_unless which requires us to put 0 and
1 constants into registers. We can also avoid a subtract by
saving the original value in a second temporary.
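
For readers less familiar with the primitive, the semantics can be sketched portably with C11 atomics. This is only an illustration of "increment unless zero, and return the value that was observed"; it uses a compare-and-swap loop from <stdatomic.h> rather than the lwarx/stwcx. sequence in this patch, and the name inc_not_zero() is made up for the example. Overflow at INT_MAX is not handled.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Sketch of inc-not-zero semantics with C11 atomics.
 * Returns the value observed before the increment: non-zero means the
 * counter was incremented, zero means it was left untouched.
 */
static int inc_not_zero(atomic_int *v)
{
	int old;

	assert(v);
	old = atomic_load(v);
	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			break;
	}
	return old;
}
```

Returning the observed value directly is what lets the caller test for success without a separate subtract, which is the same observation the patch makes about keeping the original value in a second temporary.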

This removes 3 instructions from fget:

- c0000000001b63c0:       39 00 00 00     li      r8,0
- c0000000001b63c4:       39 40 00 01     li      r10,1
...
- c0000000001b63e8:       7c 0a 00 50     subf    r0,r10,r0

Signed-off-by: Anton Blanchard <anton@samba.org>
---

Index: linux-build/arch/powerpc/include/asm/atomic.h
===================================================================
--- linux-build.orig/arch/powerpc/include/asm/atomic.h	2012-02-11 21:42:36.101190317 +1100
+++ linux-build/arch/powerpc/include/asm/atomic.h	2012-02-11 21:58:46.102791345 +1100
@@ -212,6 +212,36 @@ static __inline__ int __atomic_add_unles
 	return t;
 }
 
+/**
+ * atomic_inc_not_zero - increment unless the number is zero
+ * @v: pointer of type atomic_t
+ *
+ * Atomically increments @v by 1, so long as @v is non-zero.
+ * Returns non-zero if @v was non-zero, and zero otherwise.
+ */
+static __inline__ int atomic_inc_not_zero(atomic_t *v)
+{
+	int t1, t2;
+
+	__asm__ __volatile__ (
+	PPC_ATOMIC_ENTRY_BARRIER
+"1:	lwarx	%0,0,%2		# atomic_inc_not_zero\n\
+	cmpwi	0,%0,0\n\
+	beq-	2f\n\
+	addic	%1,%0,1\n"
+	PPC405_ERR77(0,%2)
+"	stwcx.	%1,0,%2\n\
+	bne-	1b\n"
+	PPC_ATOMIC_EXIT_BARRIER
+	"\n\
+2:"
+	: "=&r" (t1), "=&r" (t2)
+	: "r" (&v->counter)
+	: "cc", "xer", "memory");
+
+	return t1;
+}
+#define atomic_inc_not_zero(v) atomic_inc_not_zero((v))
 
 #define atomic_sub_and_test(a, v)	(atomic_sub_return((a), (v)) == 0)
 #define atomic_dec_and_test(v)		(atomic_dec_return((v)) == 0)
@@ -467,7 +497,34 @@ static __inline__ int atomic64_add_unles
 	return t != u;
 }
 
-#define atomic64_inc_not_zero(v) atomic64_add_unless((v), 1, 0)
+/**
+ * atomic64_inc_not_zero - increment unless the number is zero
+ * @v: pointer of type atomic64_t
+ *
+ * Atomically increments @v by 1, so long as @v is non-zero.
+ * Returns non-zero if @v was non-zero, and zero otherwise.
+ */
+static __inline__ long atomic64_inc_not_zero(atomic64_t *v)
+{
+	long t1, t2;
+
+	__asm__ __volatile__ (
+	PPC_ATOMIC_ENTRY_BARRIER
+"1:	ldarx	%0,0,%2		# atomic64_inc_not_zero\n\
+	cmpdi	0,%0,0\n\
+	beq-	2f\n\
+	addic	%1,%0,1\n\
+	stdcx.	%1,0,%2\n\
+	bne-	1b\n"
+	PPC_ATOMIC_EXIT_BARRIER
+	"\n\
+2:"
+	: "=&r" (t1), "=&r" (t2)
+	: "r" (&v->counter)
+	: "cc", "xer", "memory");
+
+	return t1;
+}
 
 #endif /* __powerpc64__ */
 

^ permalink raw reply

* [PATCH 1/2] atomic: Allow atomic_inc_not_zero to be overridden
From: Anton Blanchard @ 2012-03-01  7:09 UTC (permalink / raw)
  To: benh, paulus, akpm, asharma, vapier, eric.dumazet, linuxppc-dev,
	linux-kernel


We want to implement a ppc64 specific version of atomic_inc_not_zero
so wrap it in an ifdef to allow it to be overridden.
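
The override pattern can be sketched in a single userspace file. The counter_* names and the arch_override_used flag are invented for the example, and plain ints are used instead of atomic_t, so this only illustrates the preprocessor mechanics, not atomicity. The "arch" part defines a function and then a macro of the same name; the "generic" part only supplies its fallback if that macro is not already defined.

```c
#include <assert.h>

/* ---- stand-in for an architecture header (hypothetical names) ---- */
static int arch_override_used;

static inline int counter_inc_not_zero(int *v)
{
	assert(v);
	arch_override_used = 1;
	if (*v == 0)
		return 0;
	(*v)++;
	return 1;
}
/*
 * A macro is not re-expanded inside its own expansion, so this
 * still calls the function above while making #ifndef see the name.
 */
#define counter_inc_not_zero(v) counter_inc_not_zero((v))

/* ---- stand-in for the generic header ---- */
static inline int counter_add_unless(int *v, int a, int u)
{
	if (*v == u)
		return 0;
	*v += a;
	return 1;
}

#ifndef counter_inc_not_zero
#define counter_inc_not_zero(v)	counter_add_unless((v), 1, 0)
#endif
```

Because the "arch" definition comes first, the generic fallback is skipped; removing the arch section would make the same call sites fall back to counter_add_unless().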

Signed-off-by: Anton Blanchard <anton@samba.org>
---

Index: linux-build/include/linux/atomic.h
===================================================================
--- linux-build.orig/include/linux/atomic.h	2012-02-11 14:59:23.284714257 +1100
+++ linux-build/include/linux/atomic.h	2012-02-11 15:01:14.894764555 +1100
@@ -24,7 +24,9 @@ static inline int atomic_add_unless(atom
  * Atomically increments @v by 1, so long as @v is non-zero.
  * Returns non-zero if @v was non-zero, and zero otherwise.
  */
+#ifndef atomic_inc_not_zero
 #define atomic_inc_not_zero(v)		atomic_add_unless((v), 1, 0)
+#endif
 
 /**
  * atomic_inc_not_zero_hint - increment if not null

^ permalink raw reply

* Re: [PATCH 2/2] powerpc/44x: Add more changes for APM821XX EMAC driver
From: Duc Dang @ 2012-03-01  5:05 UTC (permalink / raw)
  To: David Miller, jwboyer; +Cc: netdev, paulus, linuxppc-dev, linux-kernel
In-Reply-To: <20120229.132513.1607879808995168004.davem@davemloft.net>

Thanks, David and Josh.

Apart from the coding style problem that David mentioned, do you have any
other comments about my patch set?

Regards,
Duc Dang.

On Thu, Mar 1, 2012 at 1:25 AM, David Miller <davem@davemloft.net> wrote:

> From: Josh Boyer <jwboyer@gmail.com>
> Date: Wed, 29 Feb 2012 08:43:46 -0500
>
> > On Fri, Feb 17, 2012 at 3:07 AM, Duc Dang <dhdang@apm.com> wrote:
> >> This patch includes:
> >>
> >>  Configure EMAC PHY clock source (clock from PHY or internal clock).
> >>
> >>  Do not advertise PHY half duplex capability as APM821XX EMAC does not
> >> support half duplex mode.
> >>
> >>  Add changes to support configuring jumbo frame for APM821XX EMAC.
> >>
> >> Signed-off-by: Duc Dang <dhdang@apm.com>
> >
> > This should have been sent to netdev.  CC'ing them now.
> >
> > Ben and David, I can take this change through the 4xx tree if it looks OK to
> > both of you.  The pre-requisite DTS patch will go through my tree, so it might
> > make sense to keep them together.
>
> Well the patch has coding style problems, for one:
>
> >> +                    dev->features |= (EMAC_APM821XX_REQ_JUMBO_FRAME_SIZE
> >> +                                    | EMAC_FTR_APM821XX_NO_HALF_DUPLEX
> >> +                                    | EMAC_FTR_460EX_PHY_CLK_FIX);
>
> Should be:
>
> >> +                    dev->features |= (EMAC_APM821XX_REQ_JUMBO_FRAME_SIZE |
> >> +                                      EMAC_FTR_APM821XX_NO_HALF_DUPLEX |
> >> +                                      EMAC_FTR_460EX_PHY_CLK_FIX);
>
> And this:
>
> >> +            dev->phy_feat_exc = (SUPPORTED_1000baseT_Half
> >> +                                    | SUPPORTED_100baseT_Half
> >> +                                    | SUPPORTED_10baseT_Half);
>
> Should be:
>
> >> +            dev->phy_feat_exc = (SUPPORTED_1000baseT_Half |
> >> +                                 SUPPORTED_100baseT_Half |
> >> +                                 SUPPORTED_10baseT_Half);
>

^ permalink raw reply

* [PATCH 26/36] PCI, powerpc: Register busn_res for root buses
From: Yinghai Lu @ 2012-03-01  3:00 UTC (permalink / raw)
  To: Jesse Barnes, Benjamin Herrenschmidt, Tony Luck, David Miller,
	x86
  Cc: linux-arch, linux-pci, linuxppc-dev, linux-kernel,
	Dominik Brodowski, Paul Mackerras, Bjorn Helgaas, Yinghai Lu
In-Reply-To: <1330570837-26638-1-git-send-email-yinghai@kernel.org>

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/include/asm/pci-bridge.h |    1 +
 arch/powerpc/kernel/pci-common.c      |   10 +++++++++-
 2 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 5d48765..11cebf0 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -30,6 +30,7 @@ struct pci_controller {
 	int first_busno;
 	int last_busno;
 	int self_busno;
+	struct resource busn;
 
 	void __iomem *io_base_virt;
 #ifdef CONFIG_PPC64
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 910b9de..ee8c0c9 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1648,6 +1648,11 @@ void __devinit pcibios_scan_phb(struct pci_controller *hose)
 	/* Wire up PHB bus resources */
 	pcibios_setup_phb_resources(hose, &resources);
 
+	hose->busn.start = hose->first_busno;
+	hose->busn.end	 = hose->last_busno;
+	hose->busn.flags = IORESOURCE_BUS;
+	pci_add_resource(&resources, &hose->busn);
+
 	/* Create an empty bus for the toplevel */
 	bus = pci_create_root_bus(hose->parent, hose->first_busno,
 				  hose->ops, hose, &resources);
@@ -1670,8 +1675,11 @@ void __devinit pcibios_scan_phb(struct pci_controller *hose)
 		of_scan_bus(node, bus);
 	}
 
-	if (mode == PCI_PROBE_NORMAL)
+	if (mode == PCI_PROBE_NORMAL) {
+		pci_bus_update_busn_res_end(bus, 255);
 		hose->last_busno = bus->subordinate = pci_scan_child_bus(bus);
+		pci_bus_update_busn_res_end(bus, bus->subordinate);
+	}
 
 	/* Platform gets a chance to do some global fixups before
 	 * we proceed to resource allocation
-- 
1.7.7

^ permalink raw reply related

* RE: [PATCH 1/3] powerpc/44x: The bug fixed support for APM821xx SoC and Bluestone board
From: Vinh Huu Tuong Nguyen @ 2012-03-01  2:53 UTC (permalink / raw)
  To: Josh Boyer; +Cc: Paul Mackerras, linuxppc-dev, linux-kernel
In-Reply-To: <CA+5PVA5SZmeEdBF0XvNmvCwroCKs1EwdWNbd_nwyEDCoZGHo2Q@mail.gmail.com>

> -----Original Message-----
> From: Josh Boyer [mailto:jwboyer@gmail.com]
> Sent: Wednesday, February 29, 2012 8:54 PM
> To: Vinh Nguyen Huu Tuong
> Cc: Benjamin Herrenschmidt; Paul Mackerras; linuxppc-
> dev@lists.ozlabs.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 1/3] powerpc/44x: The bug fixed support for
> APM821xx SoC and Bluestone board
>
> On Tue, Dec 20, 2011 at 7:43 AM, Vinh Nguyen Huu Tuong
> <vhtnguyen@apm.com> wrote:
> > This patch consists of:
> > - Fix the pvr mask for checking pvr in cputable.c
> > - Fix the cpu name to be consistent with the cpu name described in the dts file
> >
> > Signed-off-by: Vinh Nguyen Huu Tuong <vhtnguyen@apm.com>
> > ---
>
> I was waiting to see if you would submit a new series with patch 3/3
> fixed for the comments I made.  Seems you haven't yet or I missed it
> entirely.  For now, I'll take this patch as it's stand-alone.  The DTS
> and PCI driver patches will need to be submitted together again.

I'm sorry for the delay. I was ready to submit the update two weeks
ago, but when I synced with the master branch again, the kernel had
moved to 3.2 and some updates from Marri broke my changes, so I have to
update and test again. I'll submit the new series this week, as you
recommended.

Best regards,
Vinh Nguyen.

^ permalink raw reply

* RE: [PATCH][v3] NAND Machine support for Integrated Flash Controller
From: Kushwaha Prabhakar-B32579 @ 2012-03-01  2:25 UTC (permalink / raw)
  To: Kumar Gala, dedekind1@gmail.com
  Cc: Wood Scott-B07421, Aggrwal Poonam-B10812, Li Yang-R58472,
	Liu Shuo-B35362, linux-mtd@lists.infradead.org, Dipen Dudhat,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <1327063925-3580-1-git-send-email-prabhakar@freescale.com>

Hi Kumar,

This patch is supposed to be pushed to mainline via the powerpc.git repository,
because it depends on a patch in powerpc/mpc85xx, "powerpc/fsl: Add support for Integrated Flash Controller support",
which you have already picked up.
	Commit ID: a20cbdeffce247a2b6fb83cd8d22433994068565

So, can you please pick this patch into powerpc.git for a future mainline pull request as early as possible?
It will avoid future rebasing of this :)

Regards,
Prabhakar

> -----Original Message-----
> From: Kushwaha Prabhakar-B32579
> Sent: Friday, January 20, 2012 6:22 PM
> To: linuxppc-dev@lists.ozlabs.org; linux-mtd@lists.infradead.org
> Cc: Kushwaha Prabhakar-B32579; Dipen Dudhat; Wood Scott-B07421; Li Yang-
> R58472; Liu Shuo-B35362; Aggrwal Poonam-B10812
> Subject: [PATCH][v3] NAND Machine support for Integrated Flash Controller
>
> Integrated Flash Controller(IFC) can be used to hook NAND Flash chips
> using NAND Flash Machine available on it.
>
> Signed-off-by: Dipen Dudhat <Dipen.Dudhat@freescale.com>
> Signed-off-by: Scott Wood <scottwood@freescale.com>
> Signed-off-by: Li Yang <leoli@freescale.com>
> Signed-off-by: Liu Shuo <b35362@freescale.com>
> Signed-off-by: Poonam Aggrwal <poonam.aggrwal@freescale.com>
> Signed-off-by: Prabhakar Kushwaha <prabhakar@freescale.com>
> ---
>  Based upon
> git://git.kernel.org/pub/scm/linux/kernel/git/galak/powerpc.git (branch
> next)
>
>  Tested on P1010RDB
>
>  Changes for v2: Ported IFC driver for linux-3.2.0-rc3
> 	- Use chip->bbt_options for BBT
> 	- Use mtd_device_parse_register instead of old parse_mtd_partitions
>
>  Changes for v3: Squashed the following patches to make a single NAND driver patch
> 	- mtd/nand:Fix wrong usage of is_blank() in fsl_ifc_run_command
> 		http://patchwork.ozlabs.org/patch/136547/
> 	- mtd/nand: Fix IFC driver to support 2K NAND page
> 		http://patchwork.ozlabs.org/patch/135010/
>

^ permalink raw reply

* Re: [PATCH 20/21] Introduce struct eeh_stats for EEH - Reworked
From: Gavin Shan @ 2012-03-01  1:47 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <1330409051-8941-21-git-send-email-shangw@linux.vnet.ibm.com>

With the original EEH implementation, the EEH global statistics
are maintained by individual global variables. That makes the
code a little hard to maintain.

The patch introduces struct eeh_stats for the EEH global
statistics so that they can be maintained collectively.

This is a rework of the corresponding v5 patch. Following the
comments from David Laight, the EEH global statistics now have
the fixed type "u64", and the format used to print them has been
changed to "%llu". Following Michael's suggestion, the output
format of the EEH global statistics is otherwise kept intact,
since there might be tools parsing it.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/eeh.c |   65 ++++++++++++++++++++--------------
 1 files changed, 38 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/eeh.c b/arch/powerpc/platforms/pseries/eeh.c
index 9b1fd0c..1d08cd7 100644
--- a/arch/powerpc/platforms/pseries/eeh.c
+++ b/arch/powerpc/platforms/pseries/eeh.c
@@ -102,14 +102,22 @@ static DEFINE_RAW_SPINLOCK(confirm_error_lock);
 #define EEH_PCI_REGS_LOG_LEN 4096
 static unsigned char pci_regs_buf[EEH_PCI_REGS_LOG_LEN];
 
-/* System monitoring statistics */
-static unsigned long no_device;
-static unsigned long no_dn;
-static unsigned long no_cfg_addr;
-static unsigned long ignored_check;
-static unsigned long total_mmio_ffs;
-static unsigned long false_positives;
-static unsigned long slot_resets;
+/*
+ * The struct is used to maintain the EEH global statistic
+ * information. Besides, the EEH global statistics will be
+ * exported to user space through procfs
+ */
+struct eeh_stats {
+	u64 no_device;		/* PCI device not found		*/
+	u64 no_dn;		/* OF node not found		*/
+	u64 no_cfg_addr;	/* Config address not found	*/
+	u64 ignored_check;	/* EEH check skipped		*/
+	u64 total_mmio_ffs;	/* Total EEH checks		*/
+	u64 false_positives;	/* Unnecessary EEH checks	*/
+	u64 slot_resets;	/* PE reset			*/
+};
+
+static struct eeh_stats eeh_stats;
 
 #define IS_BRIDGE(class_code) (((class_code)<<16) == PCI_BASE_CLASS_BRIDGE)
 
@@ -392,13 +400,13 @@ int eeh_dn_check_failure(struct device_node *dn, struct pci_dev *dev)
 	int rc = 0;
 	const char *location;
 
-	total_mmio_ffs++;
+	eeh_stats.total_mmio_ffs++;
 
 	if (!eeh_subsystem_enabled)
 		return 0;
 
 	if (!dn) {
-		no_dn++;
+		eeh_stats.no_dn++;
 		return 0;
 	}
 	dn = eeh_find_device_pe(dn);
@@ -407,14 +415,14 @@ int eeh_dn_check_failure(struct device_node *dn, struct pci_dev *dev)
 	/* Access to IO BARs might get this far and still not want checking. */
 	if (!(edev->mode & EEH_MODE_SUPPORTED) ||
 	    edev->mode & EEH_MODE_NOCHECK) {
-		ignored_check++;
+		eeh_stats.ignored_check++;
 		pr_debug("EEH: Ignored check (%x) for %s %s\n",
 			edev->mode, eeh_pci_name(dev), dn->full_name);
 		return 0;
 	}
 
 	if (!edev->config_addr && !edev->pe_config_addr) {
-		no_cfg_addr++;
+		eeh_stats.no_cfg_addr++;
 		return 0;
 	}
 
@@ -460,13 +468,13 @@ int eeh_dn_check_failure(struct device_node *dn, struct pci_dev *dev)
 	    (ret == EEH_STATE_NOT_SUPPORT) ||
 	    (ret & (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE)) ==
 	    (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE)) {
-		false_positives++;
+		eeh_stats.false_positives++;
 		edev->false_positives ++;
 		rc = 0;
 		goto dn_unlock;
 	}
 
-	slot_resets++;
+	eeh_stats.slot_resets++;
  
 	/* Avoid repeated reports of this failure, including problems
 	 * with other functions on this device, and functions under
@@ -513,7 +521,7 @@ unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned lon
 	addr = eeh_token_to_phys((unsigned long __force) token);
 	dev = pci_addr_cache_get_device(addr);
 	if (!dev) {
-		no_device++;
+		eeh_stats.no_device++;
 		return val;
 	}
 
@@ -1174,21 +1182,24 @@ static int proc_eeh_show(struct seq_file *m, void *v)
 {
 	if (0 == eeh_subsystem_enabled) {
 		seq_printf(m, "EEH Subsystem is globally disabled\n");
-		seq_printf(m, "eeh_total_mmio_ffs=%ld\n", total_mmio_ffs);
+		seq_printf(m, "eeh_total_mmio_ffs=%llu\n", eeh_stats.total_mmio_ffs);
 	} else {
 		seq_printf(m, "EEH Subsystem is enabled\n");
 		seq_printf(m,
-				"no device=%ld\n"
-				"no device node=%ld\n"
-				"no config address=%ld\n"
-				"check not wanted=%ld\n"
-				"eeh_total_mmio_ffs=%ld\n"
-				"eeh_false_positives=%ld\n"
-				"eeh_slot_resets=%ld\n",
-				no_device, no_dn, no_cfg_addr, 
-				ignored_check, total_mmio_ffs, 
-				false_positives,
-				slot_resets);
+				"no device=%llu\n"
+				"no device node=%llu\n"
+				"no config address=%llu\n"
+				"check not wanted=%llu\n"
+				"eeh_total_mmio_ffs=%llu\n"
+				"eeh_false_positives=%llu\n"
+				"eeh_slot_resets=%llu\n",
+				eeh_stats.no_device,
+				eeh_stats.no_dn,
+				eeh_stats.no_cfg_addr,
+				eeh_stats.ignored_check,
+				eeh_stats.total_mmio_ffs,
+				eeh_stats.false_positives,
+				eeh_stats.slot_resets);
 	}
 
 	return 0;
-- 
1.7.5.4
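
For readers who want to try the idea outside the kernel, the
struct-of-counters approach from the commit message can be sketched in
user space. This is only an illustration under assumptions: the u64
typedef and the eeh_stats_format() helper are hypothetical stand-ins
(the patch itself prints via seq_printf), and the casts to unsigned
long long are there because uint64_t is not guaranteed to match "%llu"
on every user-space ABI.

```c
#include <stdint.h>
#include <stdio.h>

/* User-space stand-in for the kernel's u64 type (assumption). */
typedef uint64_t u64;

/* Same field names as the struct introduced by the patch. */
struct eeh_stats {
	u64 no_device;
	u64 no_dn;
	u64 no_cfg_addr;
	u64 ignored_check;
	u64 total_mmio_ffs;
	u64 false_positives;
	u64 slot_resets;
};

/*
 * Hypothetical helper: format the counters into buf using the same
 * key names as the existing /proc output, so parsers keep working.
 * Returns the snprintf() result.
 */
static int eeh_stats_format(const struct eeh_stats *s, char *buf, size_t len)
{
	return snprintf(buf, len,
			"no device=%llu\n"
			"no device node=%llu\n"
			"no config address=%llu\n"
			"check not wanted=%llu\n"
			"eeh_total_mmio_ffs=%llu\n"
			"eeh_false_positives=%llu\n"
			"eeh_slot_resets=%llu\n",
			(unsigned long long)s->no_device,
			(unsigned long long)s->no_dn,
			(unsigned long long)s->no_cfg_addr,
			(unsigned long long)s->ignored_check,
			(unsigned long long)s->total_mmio_ffs,
			(unsigned long long)s->false_positives,
			(unsigned long long)s->slot_resets);
}
```

Keeping the "key=value" lines byte-for-byte identical apart from the
conversion specifier is what lets the counters widen to 64 bits
without disturbing any tool that parses the output.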

^ permalink raw reply related

* [PATCH v2 2/2] KVM: booke: Improve SPE switch
From: Olivia Yin @ 2012-03-01  1:20 UTC (permalink / raw)
  To: kvm-ppc, kvm, linuxppc-dev; +Cc: Liu Yu, Olivia Yin

From: Liu Yu <yu.liu@freescale.com>

Like book3s did for the fp switch,
instead of switching SPE between host and guest,
the patch switches SPE state between qemu and guest.
In this way, we can simulate a host loadup of SPE when loading guest SPE state,
and let the host decide when to give up SPE state.
Therefore it cooperates better with host SPE usage,
and so has some performance benefit on a UP host (lazy SPE).

Moreover, since the patch saves guest SPE state into the linux thread field,
it creates the condition to emulate guest SPE instructions in the host,
so that we can avoid injecting SPE exceptions into the guest.

The patch also turns all asm code into C code,
and adds SPE stat counts.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Olivia Yin <hong-hua.yin@freescale.com>
---
v2: 	
Keep shadow MSR[SPE] consistent with 
thread MSR[SPE] in kvmppc_core_vcpu_load

 arch/powerpc/include/asm/kvm_host.h |   11 +++++-
 arch/powerpc/kernel/asm-offsets.c   |    7 ----
 arch/powerpc/kvm/booke.c            |   63 +++++++++++++++++++++++++++++++----
 arch/powerpc/kvm/booke.h            |    8 +----
 arch/powerpc/kvm/booke_interrupts.S |   37 --------------------
 arch/powerpc/kvm/e500.c             |   13 ++++---
 arch/powerpc/kvm/timing.c           |    5 +++
 arch/powerpc/kvm/timing.h           |   11 ++++++
 8 files changed, 91 insertions(+), 64 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 1843d5d..6186d08 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -117,6 +117,11 @@ struct kvm_vcpu_stat {
 	u32 st;
 	u32 st_slow;
 #endif
+#ifdef CONFIG_SPE
+	u32 spe_unavail;
+	u32 spe_fp_data;
+	u32 spe_fp_round;
+#endif
 };
 
 enum kvm_exit_types {
@@ -147,6 +152,11 @@ enum kvm_exit_types {
 	FP_UNAVAIL,
 	DEBUG_EXITS,
 	TIMEINGUEST,
+#ifdef CONFIG_SPE
+	SPE_UNAVAIL,
+	SPE_FP_DATA,
+	SPE_FP_ROUND,
+#endif
 	__NUMBER_OF_KVM_EXIT_TYPES
 };
 
@@ -330,7 +340,6 @@ struct kvm_vcpu_arch {
 #ifdef CONFIG_SPE
 	ulong evr[32];
 	ulong spefscr;
-	ulong host_spefscr;
 	u64 acc;
 #endif
 #ifdef CONFIG_ALTIVEC
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 8e0db0b..ff68f71 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -604,13 +604,6 @@ int main(void)
 	DEFINE(TLBCAM_MAS7, offsetof(struct tlbcam, MAS7));
 #endif
 
-#if defined(CONFIG_KVM) && defined(CONFIG_SPE)
-	DEFINE(VCPU_EVR, offsetof(struct kvm_vcpu, arch.evr[0]));
-	DEFINE(VCPU_ACC, offsetof(struct kvm_vcpu, arch.acc));
-	DEFINE(VCPU_SPEFSCR, offsetof(struct kvm_vcpu, arch.spefscr));
-	DEFINE(VCPU_HOST_SPEFSCR, offsetof(struct kvm_vcpu, arch.host_spefscr));
-#endif
-
 #ifdef CONFIG_KVM_EXIT_TIMING
 	DEFINE(VCPU_TIMING_EXIT_TBU, offsetof(struct kvm_vcpu,
 						arch.timing_exit.tv32.tbu));
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index ee9e1ee..f20010b 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -55,6 +55,11 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	{ "dec",        VCPU_STAT(dec_exits) },
 	{ "ext_intr",   VCPU_STAT(ext_intr_exits) },
 	{ "halt_wakeup", VCPU_STAT(halt_wakeup) },
+#ifdef CONFIG_SPE
+	{ "spe_unavail", VCPU_STAT(spe_unavail) },
+	{ "spe_fp_data", VCPU_STAT(spe_fp_data) },
+	{ "spe_fp_round", VCPU_STAT(spe_fp_round) },
+#endif
 	{ NULL }
 };
 
@@ -80,11 +85,11 @@ void kvmppc_dump_vcpu(struct kvm_vcpu *vcpu)
 }
 
 #ifdef CONFIG_SPE
-void kvmppc_vcpu_disable_spe(struct kvm_vcpu *vcpu)
+static void kvmppc_vcpu_disable_spe(struct kvm_vcpu *vcpu)
 {
 	preempt_disable();
-	enable_kernel_spe();
-	kvmppc_save_guest_spe(vcpu);
+	if (current->thread.regs->msr & MSR_SPE)
+		giveup_spe(current);
 	vcpu->arch.shadow_msr &= ~MSR_SPE;
 	preempt_enable();
 }
@@ -92,8 +97,10 @@ void kvmppc_vcpu_disable_spe(struct kvm_vcpu *vcpu)
 static void kvmppc_vcpu_enable_spe(struct kvm_vcpu *vcpu)
 {
 	preempt_disable();
-	enable_kernel_spe();
-	kvmppc_load_guest_spe(vcpu);
+	if (!(current->thread.regs->msr & MSR_SPE)) {
+		load_up_spe(NULL);
+		current->thread.regs->msr |= MSR_SPE;
+	}
 	vcpu->arch.shadow_msr |= MSR_SPE;
 	preempt_enable();
 }
@@ -104,7 +111,7 @@ static void kvmppc_vcpu_sync_spe(struct kvm_vcpu *vcpu)
 		if (!(vcpu->arch.shadow_msr & MSR_SPE))
 			kvmppc_vcpu_enable_spe(vcpu);
 	} else if (vcpu->arch.shadow_msr & MSR_SPE) {
-		kvmppc_vcpu_disable_spe(vcpu);
+		vcpu->arch.shadow_msr &= ~MSR_SPE;
 	}
 }
 #else
@@ -124,7 +131,8 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u32 new_msr)
 	vcpu->arch.shared->msr = new_msr;
 
 	kvmppc_mmu_msr_notify(vcpu, old_msr);
-	kvmppc_vcpu_sync_spe(vcpu);
+ 	if ((old_msr ^ new_msr) & MSR_SPE)
+		kvmppc_vcpu_sync_spe(vcpu);
 }
 
 static void kvmppc_booke_queue_irqprio(struct kvm_vcpu *vcpu,
@@ -338,6 +346,11 @@ void kvmppc_core_prepare_to_enter(struct kvm_vcpu *vcpu)
 int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 {
 	int ret;
+#ifdef CONFIG_SPE
+	ulong evr[32];
+	ulong spefscr;
+	u64 acc;
+#endif
 
 	if (!vcpu->arch.sane) {
 		kvm_run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
@@ -355,7 +368,40 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 	}
 
 	kvm_guest_enter();
+#ifdef CONFIG_SPE
+	/* Save userspace SPE state in stack */
+	enable_kernel_spe();
+	memcpy(evr, current->thread.evr, sizeof(current->thread.evr));
+	acc = current->thread.acc;
+
+	/* Restore guest SPE state to thread */
+	memcpy(current->thread.evr, vcpu->arch.evr, sizeof(vcpu->arch.evr));
+	current->thread.acc = vcpu->arch.acc;
+
+	/* Switch SPEFSCR and load guest SPE state if needed */
+	spefscr = mfspr(SPRN_SPEFSCR);
+	kvmppc_vcpu_sync_spe(vcpu);
+	mtspr(SPRN_SPEFSCR, vcpu->arch.spefscr);
+#endif
+
 	ret = __kvmppc_vcpu_run(kvm_run, vcpu);
+
+#ifdef CONFIG_SPE
+	/* Switch SPEFSCR and save guest SPE state if needed */
+	vcpu->arch.spefscr = mfspr(SPRN_SPEFSCR);
+	kvmppc_vcpu_disable_spe(vcpu);
+	mtspr(SPRN_SPEFSCR, spefscr);
+
+	/* Save guest SPE state from thread */
+	memcpy(vcpu->arch.evr, current->thread.evr, sizeof(vcpu->arch.evr));
+	vcpu->arch.acc = current->thread.acc;
+
+	/* Restore userspace SPE state from stack */
+	memcpy(current->thread.evr, evr, sizeof(current->thread.evr));
+	current->thread.spefscr = spefscr;
+	current->thread.acc = acc;
+#endif
+
 	kvm_guest_exit();
 
 out:
@@ -457,17 +503,20 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 		else
 			kvmppc_booke_queue_irqprio(vcpu,
 						   BOOKE_IRQPRIO_SPE_UNAVAIL);
+		kvmppc_account_exit(vcpu, SPE_UNAVAIL);
 		r = RESUME_GUEST;
 		break;
 	}
 
 	case BOOKE_INTERRUPT_SPE_FP_DATA:
 		kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_SPE_FP_DATA);
+		kvmppc_account_exit(vcpu, SPE_FP_DATA);
 		r = RESUME_GUEST;
 		break;
 
 	case BOOKE_INTERRUPT_SPE_FP_ROUND:
 		kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_SPE_FP_ROUND);
+		kvmppc_account_exit(vcpu, SPE_FP_ROUND);
 		r = RESUME_GUEST;
 		break;
 #else
diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h
index 2fe2027..c02b8f9 100644
--- a/arch/powerpc/kvm/booke.h
+++ b/arch/powerpc/kvm/booke.h
@@ -22,6 +22,7 @@
 
 #include <linux/types.h>
 #include <linux/kvm_host.h>
+#include <asm/system.h>
 #include <asm/kvm_ppc.h>
 #include "timing.h"
 
@@ -64,11 +65,4 @@ int kvmppc_booke_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
 int kvmppc_booke_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, int rt);
 int kvmppc_booke_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, int rs);
 
-/* low-level asm code to transfer guest state */
-void kvmppc_load_guest_spe(struct kvm_vcpu *vcpu);
-void kvmppc_save_guest_spe(struct kvm_vcpu *vcpu);
-
-/* high-level function, manages flags, host state */
-void kvmppc_vcpu_disable_spe(struct kvm_vcpu *vcpu);
-
 #endif /* __KVM_BOOKE_H__ */
diff --git a/arch/powerpc/kvm/booke_interrupts.S b/arch/powerpc/kvm/booke_interrupts.S
index 10d8ef6..c44367d 100644
--- a/arch/powerpc/kvm/booke_interrupts.S
+++ b/arch/powerpc/kvm/booke_interrupts.S
@@ -245,15 +245,6 @@ _GLOBAL(kvmppc_resume_host)
 
 heavyweight_exit:
 	/* Not returning to guest. */
-
-#ifdef CONFIG_SPE
-	/* save guest SPEFSCR and load host SPEFSCR */
-	mfspr	r9, SPRN_SPEFSCR
-	stw	r9, VCPU_SPEFSCR(r4)
-	lwz	r9, VCPU_HOST_SPEFSCR(r4)
-	mtspr	SPRN_SPEFSCR, r9
-#endif
-
 	/* We already saved guest volatile register state; now save the
 	 * non-volatiles. */
 	stw	r15, VCPU_GPR(r15)(r4)
@@ -355,14 +346,6 @@ _GLOBAL(__kvmppc_vcpu_run)
 	lwz	r30, VCPU_GPR(r30)(r4)
 	lwz	r31, VCPU_GPR(r31)(r4)
 
-#ifdef CONFIG_SPE
-	/* save host SPEFSCR and load guest SPEFSCR */
-	mfspr	r3, SPRN_SPEFSCR
-	stw	r3, VCPU_HOST_SPEFSCR(r4)
-	lwz	r3, VCPU_SPEFSCR(r4)
-	mtspr	SPRN_SPEFSCR, r3
-#endif
-
 lightweight_exit:
 	stw	r2, HOST_R2(r1)
 
@@ -460,23 +443,3 @@ lightweight_exit:
 	lwz	r4, VCPU_GPR(r4)(r4)
 	rfi
 
-#ifdef CONFIG_SPE
-_GLOBAL(kvmppc_save_guest_spe)
-	cmpi	0,r3,0
-	beqlr-
-	SAVE_32EVRS(0, r4, r3, VCPU_EVR)
-	evxor   evr6, evr6, evr6
-	evmwumiaa evr6, evr6, evr6
-	li	r4,VCPU_ACC
-	evstddx evr6, r4, r3		/* save acc */
-	blr
-
-_GLOBAL(kvmppc_load_guest_spe)
-	cmpi	0,r3,0
-	beqlr-
-	li      r4,VCPU_ACC
-	evlddx  evr6,r4,r3
-	evmra   evr6,evr6		/* load acc */
-	REST_32EVRS(0, r4, r3, VCPU_EVR)
-	blr
-#endif
diff --git a/arch/powerpc/kvm/e500.c b/arch/powerpc/kvm/e500.c
index ddcd896..dfc516b 100644
--- a/arch/powerpc/kvm/e500.c
+++ b/arch/powerpc/kvm/e500.c
@@ -37,16 +37,19 @@ void kvmppc_core_load_guest_debugstate(struct kvm_vcpu *vcpu)
 void kvmppc_core_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	kvmppc_e500_tlb_load(vcpu, cpu);
+
+	/*
+	 * Keep shadow MSR[SPE] consistent with thread MSR[SPE].
+	 * If guest SPE state is saved by host, we just disable guest SPE.
+	 */
+	if ((current->flags & PF_VCPU) &&
+			!(current->thread.regs->msr & MSR_SPE))
+		vcpu->arch.shadow_msr &= ~MSR_SPE;
 }
 
 void kvmppc_core_vcpu_put(struct kvm_vcpu *vcpu)
 {
 	kvmppc_e500_tlb_put(vcpu);
-
-#ifdef CONFIG_SPE
-	if (vcpu->arch.shadow_msr & MSR_SPE)
-		kvmppc_vcpu_disable_spe(vcpu);
-#endif
 }
 
 int kvmppc_core_check_processor_compat(void)
diff --git a/arch/powerpc/kvm/timing.c b/arch/powerpc/kvm/timing.c
index 07b6110..c9ce332 100644
--- a/arch/powerpc/kvm/timing.c
+++ b/arch/powerpc/kvm/timing.c
@@ -135,6 +135,11 @@ static const char *kvm_exit_names[__NUMBER_OF_KVM_EXIT_TYPES] = {
 	[USR_PR_INST] =             "USR_PR_INST",
 	[FP_UNAVAIL] =              "FP_UNAVAIL",
 	[DEBUG_EXITS] =             "DEBUG",
+#ifdef CONFIG_SPE
+	[SPE_UNAVAIL] =              "SPE_UNAVAIL",
+	[SPE_FP_DATA] =              "SPE_FP_DATA",
+	[SPE_FP_ROUND] =             "SPE_FP_ROUND",
+#endif
 	[TIMEINGUEST] =             "TIMEINGUEST"
 };
 
diff --git a/arch/powerpc/kvm/timing.h b/arch/powerpc/kvm/timing.h
index 8167d42..712ab3a 100644
--- a/arch/powerpc/kvm/timing.h
+++ b/arch/powerpc/kvm/timing.h
@@ -93,6 +93,17 @@ static inline void kvmppc_account_exit_stat(struct kvm_vcpu *vcpu, int type)
 	case SIGNAL_EXITS:
 		vcpu->stat.signal_exits++;
 		break;
+#ifdef CONFIG_SPE
+	case SPE_UNAVAIL:
+		vcpu->stat.spe_unavail++;
+		break;
+	case SPE_FP_DATA:
+		vcpu->stat.spe_fp_data++;
+		break;
+	case SPE_FP_ROUND:
+		vcpu->stat.spe_fp_round++;
+		break;
+#endif
 	}
 }
 
-- 
1.6.4
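
The save/swap/restore sequence that the patch adds around
__kvmppc_vcpu_run() can be illustrated with a small user-space sketch.
Everything here is hypothetical and only models the ordering: struct
spe_state stands in for the evr/acc fields of the thread and vcpu
structures, run_with_guest_spe() for the code around the guest entry,
and fake_guest_run() for the guest itself. SPEFSCR handling and the
MSR[SPE] bookkeeping are left out.

```c
#include <stdint.h>

#define NR_EVR 32

/* Hypothetical stand-in for the SPE fields kept per thread / per vcpu. */
struct spe_state {
	unsigned long evr[NR_EVR];
	uint64_t acc;
};

/* Pretend guest: touches the state that is currently installed. */
static void fake_guest_run(struct spe_state *s)
{
	s->evr[0] += 1;
	s->acc += 10;
}

/*
 * Run fn() with the guest state installed in *thread, then copy the
 * (possibly modified) guest state back and restore the caller's state.
 */
static void run_with_guest_spe(struct spe_state *thread,
			       struct spe_state *guest,
			       void (*fn)(struct spe_state *))
{
	struct spe_state saved = *thread;	/* save userspace state on the stack */

	*thread = *guest;			/* install guest state in the thread */
	fn(thread);				/* guest runs and may modify it */
	*guest = *thread;			/* save guest state from the thread */
	*thread = saved;			/* restore userspace state */
}
```

After a call such as run_with_guest_spe(&thread, &guest,
fake_guest_run), the thread copy is unchanged and only the guest copy
carries the guest's modifications, which is the property the real code
relies on.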

^ permalink raw reply related

* [PATCH v2 1/2] powerpc/e500: make load_up_spe a normal function
From: Olivia Yin @ 2012-03-01  1:20 UTC (permalink / raw)
  To: kvm-ppc, kvm, linuxppc-dev; +Cc: Liu Yu, Olivia Yin

From: Liu Yu <yu.liu@freescale.com>

So that we can call it when improving SPE switch like book3e did for fp switch.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Olivia Yin <hong-hua.yin@freescale.com>
---
v2: 	add Signed-off-by

 arch/powerpc/kernel/head_fsl_booke.S |   23 ++++++-----------------
 1 files changed, 6 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S
index d5d78c4..c96e025 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -539,8 +539,10 @@ interrupt_base:
 	/* SPE Unavailable */
 	START_EXCEPTION(SPEUnavailable)
 	NORMAL_EXCEPTION_PROLOG
-	bne	load_up_spe
-	addi	r3,r1,STACK_FRAME_OVERHEAD
+	beq	1f
+	bl	load_up_spe
+	b	fast_exception_return
+1:	addi	r3,r1,STACK_FRAME_OVERHEAD
 	EXC_XFER_EE_LITE(0x2010, KernelSPE)
 #else
 	EXCEPTION(0x2020, SPEUnavailable, unknown_exception, EXC_XFER_EE)
@@ -743,7 +745,7 @@ tlb_write_entry:
 /* Note that the SPE support is closely modeled after the AltiVec
  * support.  Changes to one are likely to be applicable to the
  * other!  */
-load_up_spe:
+_GLOBAL(load_up_spe)
 /*
  * Disable SPE for the task which had SPE previously,
  * and save its SPE registers in its thread_struct.
@@ -791,20 +793,7 @@ load_up_spe:
 	subi	r4,r5,THREAD
 	stw	r4,last_task_used_spe@l(r3)
 #endif /* !CONFIG_SMP */
-	/* restore registers and return */
-2:	REST_4GPRS(3, r11)
-	lwz	r10,_CCR(r11)
-	REST_GPR(1, r11)
-	mtcr	r10
-	lwz	r10,_LINK(r11)
-	mtlr	r10
-	REST_GPR(10, r11)
-	mtspr	SPRN_SRR1,r9
-	mtspr	SPRN_SRR0,r12
-	REST_GPR(9, r11)
-	REST_GPR(12, r11)
-	lwz	r11,GPR11(r11)
-	rfi
+	blr
 
 /*
  * SPE unavailable trap from kernel - print a message, but let
-- 
1.6.4

^ permalink raw reply related

* Re: [PATCH 20/21] Introduce struct eeh_stats for EEH
From: Gavin Shan @ 2012-03-01  1:14 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev
In-Reply-To: <1330520204.15023.16.camel@concordia>

> > With the original EEH implementation, the EEH global statistics
> > are maintained by individual global variables. That makes the
> > code a little hard to maintain.
> 
> Hi Gavin,
> 
> > @@ -1174,21 +1182,24 @@ static int proc_eeh_show(struct seq_file *m, void *v)
> >  {
> >  	if (0 == eeh_subsystem_enabled) {
> >  		seq_printf(m, "EEH Subsystem is globally disabled\n");
> > -		seq_printf(m, "eeh_total_mmio_ffs=%ld\n", total_mmio_ffs);
> > +		seq_printf(m, "eeh_total_mmio_ffs=%d\n", eeh_stats.total_mmio_ffs);
> >  	} else {
> >  		seq_printf(m, "EEH Subsystem is enabled\n");
> >  		seq_printf(m,
> > -				"no device=%ld\n"
> > -				"no device node=%ld\n"
> > -				"no config address=%ld\n"
> > -				"check not wanted=%ld\n"
> > -				"eeh_total_mmio_ffs=%ld\n"
> > -				"eeh_false_positives=%ld\n"
> > -				"eeh_slot_resets=%ld\n",
> > -				no_device, no_dn, no_cfg_addr, 
> > -				ignored_check, total_mmio_ffs, 
> > -				false_positives,
> > -				slot_resets);
> > +				"no device           =%d\n"
> > +				"no device node      =%d\n"
> > +				"no config address   =%d\n"
> > +				"check not wanted    =%d\n"
> > +				"eeh_total_mmio_ffs  =%d\n"
> > +				"eeh_false_positives =%d\n"
> > +				"eeh_slot_resets     =%d\n",
> 
> There *might* be tools out there that parse this output, so I'd say
> don't change it unless you have to - and I don't think you have to?
> 

Thanks for catching the point, Michael. I will change it back soon ;-)

Thanks,
Gavin

^ permalink raw reply

* Re: [PATCH v2] bootmem/sparsemem: remove limit constraint in alloc_bootmem_section
From: Nishanth Aravamudan @ 2012-03-01  0:03 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Anton Blanchard, Dave Hansen, stable, linux-mm, Paul Mackerras,
	Mel Gorman, Johannes Weiner, Robert Jennings, linuxppc-dev
In-Reply-To: <20120229152830.22fc72a2.akpm@linux-foundation.org>

On 29.02.2012 [15:28:30 -0800], Andrew Morton wrote:
> On Wed, 29 Feb 2012 10:12:33 -0800
> Nishanth Aravamudan <nacc@linux.vnet.ibm.com> wrote:
> 
> > While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory
> > Overcommit) on powerpc, we tripped the following:
> > 
> > kernel BUG at mm/bootmem.c:483!
> >
> > ...
> > 
> > This is
> > 
> >         BUG_ON(limit && goal + size > limit);
> > 
> > and after some debugging, it seems that
> > 
> > 	goal = 0x7ffff000000
> > 	limit = 0x80000000000
> > 
> > and sparse_early_usemaps_alloc_node ->
> > sparse_early_usemaps_alloc_pgdat_section calls
> > 
> > 	return alloc_bootmem_section(usemap_size() * count, section_nr);
> > 
> > This is on a system with 8TB available via the AMS pool, and as a quirk
> > of AMS in firmware, all of that memory shows up in node 0. So, we end up
> > with an allocation that will fail the goal/limit constraints. In theory,
> > we could "fall-back" to alloc_bootmem_node() in
> > sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE
> > defined, we'll BUG_ON() instead. A simple solution appears to be to
> > unconditionally remove the limit condition in alloc_bootmem_section,
> > meaning allocations are allowed to cross section boundaries (necessary
> > for systems of this size).
> > 
> > Johannes Weiner pointed out that if alloc_bootmem_section() no longer
> > guarantees section-locality, we need check_usemap_section_nr() to print
> > possible cross-dependencies between node descriptors and the usemaps
> > allocated through it. That makes the two loops in
> > sparse_early_usemaps_alloc_node() identical, so re-factor the code a
> > bit.
> 
> The patch is a bit scary now, so I think we should merge it into
> 3.4-rc1 and then backport it into 3.3.1 if nothing blows up.

I think that's fair.

> Do you think it should be backported into 3.3.x?  Earlier kernels?

3.3.x seems reasonable. If I had to guess, I think this could be hit on
any kernels with this functionality -- that is, sparsemem in general?
Not sure how far back it's worth backporting.

> Also, this?

Urgh, yeah, that's way better.

Acked-by: Nishanth Aravamudan <nacc@us.ibm.com>

> --- a/mm/bootmem.c~bootmem-sparsemem-remove-limit-constraint-in-alloc_bootmem_section-fix
> +++ a/mm/bootmem.c
> @@ -766,14 +766,13 @@ void * __init alloc_bootmem_section(unsi
>  				    unsigned long section_nr)
>  {
>  	bootmem_data_t *bdata;
> -	unsigned long pfn, goal, limit;
> +	unsigned long pfn, goal;
> 
>  	pfn = section_nr_to_pfn(section_nr);
>  	goal = pfn << PAGE_SHIFT;
> -	limit = 0;
>  	bdata = &bootmem_node_data[early_pfn_to_nid(pfn)];
> 
> -	return alloc_bootmem_core(bdata, size, SMP_CACHE_BYTES, goal, limit);
> +	return alloc_bootmem_core(bdata, size, SMP_CACHE_BYTES, goal, 0);
>  }
>  #endif

Thanks for all the feedback!

-Nish

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center

^ permalink raw reply

* Re: [PATCH v2] bootmem/sparsemem: remove limit constraint in alloc_bootmem_section
From: Andrew Morton @ 2012-02-29 23:28 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: Anton Blanchard, Dave Hansen, stable, linux-mm, Paul Mackerras,
	Mel Gorman, Johannes Weiner, Robert Jennings, linuxppc-dev
In-Reply-To: <20120229181233.GF5136@linux.vnet.ibm.com>

On Wed, 29 Feb 2012 10:12:33 -0800
Nishanth Aravamudan <nacc@linux.vnet.ibm.com> wrote:

> While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory
> Overcommit) on powerpc, we tripped the following:
> 
> kernel BUG at mm/bootmem.c:483!
>
> ...
> 
> This is
> 
>         BUG_ON(limit && goal + size > limit);
> 
> and after some debugging, it seems that
> 
> 	goal = 0x7ffff000000
> 	limit = 0x80000000000
> 
> and sparse_early_usemaps_alloc_node ->
> sparse_early_usemaps_alloc_pgdat_section calls
> 
> 	return alloc_bootmem_section(usemap_size() * count, section_nr);
> 
> This is on a system with 8TB available via the AMS pool, and as a quirk
> of AMS in firmware, all of that memory shows up in node 0. So, we end up
> with an allocation that will fail the goal/limit constraints. In theory,
> we could "fall-back" to alloc_bootmem_node() in
> sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE
> defined, we'll BUG_ON() instead. A simple solution appears to be to
> unconditionally remove the limit condition in alloc_bootmem_section,
> meaning allocations are allowed to cross section boundaries (necessary
> for systems of this size).
> 
> Johannes Weiner pointed out that if alloc_bootmem_section() no longer
> guarantees section-locality, we need check_usemap_section_nr() to print
> possible cross-dependencies between node descriptors and the usemaps
> allocated through it. That makes the two loops in
> sparse_early_usemaps_alloc_node() identical, so re-factor the code a
> bit.

The patch is a bit scary now, so I think we should merge it into
3.4-rc1 and then backport it into 3.3.1 if nothing blows up.

Do you think it should be backported into 3.3.x?  Earlier kernels?

Also, this?

--- a/mm/bootmem.c~bootmem-sparsemem-remove-limit-constraint-in-alloc_bootmem_section-fix
+++ a/mm/bootmem.c
@@ -766,14 +766,13 @@ void * __init alloc_bootmem_section(unsi
 				    unsigned long section_nr)
 {
 	bootmem_data_t *bdata;
-	unsigned long pfn, goal, limit;
+	unsigned long pfn, goal;
 
 	pfn = section_nr_to_pfn(section_nr);
 	goal = pfn << PAGE_SHIFT;
-	limit = 0;
 	bdata = &bootmem_node_data[early_pfn_to_nid(pfn)];
 
-	return alloc_bootmem_core(bdata, size, SMP_CACHE_BYTES, goal, limit);
+	return alloc_bootmem_core(bdata, size, SMP_CACHE_BYTES, goal, 0);
 }
 #endif
 
_
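
To make the numbers quoted above concrete, the goal/limit condition
can be checked in user space. This is a minimal sketch:
bootmem_limit_exceeded() is a hypothetical helper that only mirrors
the BUG_ON() expression, and the request sizes used with it are
illustrative since the thread does not give the actual size.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Mirrors the condition in BUG_ON(limit && goal + size > limit).
 * A limit of 0 means "no limit", so the check can never fire.
 */
static bool bootmem_limit_exceeded(uint64_t goal, uint64_t size, uint64_t limit)
{
	return limit && goal + size > limit;
}
```

With goal = 0x7ffff000000 and limit = 0x80000000000 the window is
0x1000000 bytes (16 MiB), so any request larger than that trips the
check, while passing 0 as the limit disables it for every size.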

^ permalink raw reply

* [PATCH 28/39] PCI, powerpc: Register busn_res for root buses
From: Yinghai Lu @ 2012-02-29 23:07 UTC (permalink / raw)
  To: Jesse Barnes, Benjamin Herrenschmidt, Tony Luck, David Miller,
	x86
  Cc: linux-arch, linux-pci, linuxppc-dev, linux-kernel,
	Dominik Brodowski, Paul Mackerras, Bjorn Helgaas, Yinghai Lu
In-Reply-To: <1330556858-11768-1-git-send-email-yinghai@kernel.org>

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/include/asm/pci-bridge.h |    1 +
 arch/powerpc/kernel/pci-common.c      |   10 +++++++++-
 2 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 5d48765..11cebf0 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -30,6 +30,7 @@ struct pci_controller {
 	int first_busno;
 	int last_busno;
 	int self_busno;
+	struct resource busn;
 
 	void __iomem *io_base_virt;
 #ifdef CONFIG_PPC64
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 910b9de..ee8c0c9 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1648,6 +1648,11 @@ void __devinit pcibios_scan_phb(struct pci_controller *hose)
 	/* Wire up PHB bus resources */
 	pcibios_setup_phb_resources(hose, &resources);
 
+	hose->busn.start = hose->first_busno;
+	hose->busn.end	 = hose->last_busno;
+	hose->busn.flags = IORESOURCE_BUS;
+	pci_add_resource(&resources, &hose->busn);
+
 	/* Create an empty bus for the toplevel */
 	bus = pci_create_root_bus(hose->parent, hose->first_busno,
 				  hose->ops, hose, &resources);
@@ -1670,8 +1675,11 @@ void __devinit pcibios_scan_phb(struct pci_controller *hose)
 		of_scan_bus(node, bus);
 	}
 
-	if (mode == PCI_PROBE_NORMAL)
+	if (mode == PCI_PROBE_NORMAL) {
+		pci_bus_update_busn_res_end(bus, 255);
 		hose->last_busno = bus->subordinate = pci_scan_child_bus(bus);
+		pci_bus_update_busn_res_end(bus, bus->subordinate);
+	}
 
 	/* Platform gets a chance to do some global fixups before
 	 * we proceed to resource allocation
-- 
1.7.7

^ permalink raw reply related

* Re: [PATCH] KVM: PPC: Don't sync timebase when inside KVM
From: Scott Wood @ 2012-02-29 19:06 UTC (permalink / raw)
  To: Alexander Graf
  Cc: <linuxppc-dev@lists.ozlabs.org>,
	<kvm@vger.kernel.org>, <kvm-ppc@vger.kernel.org>
In-Reply-To: <39AA9511-4D56-4087-BC98-4BB32EF048AA@suse.de>

On 02/29/2012 12:28 PM, Alexander Graf wrote:
> 
> 
> On 29.02.2012, at 18:50, Scott Wood <scottwood@freescale.com> wrote:
> 
>> On 02/28/2012 08:16 PM, Alexander Graf wrote:
>>> When we know that we're running inside of a KVM guest, we don't have to
>>> worry about synchronizing timebases between different CPUs, since the
>>> host already took care of that.
>>>
>>> This fixes CPU overcommit scenarios where vCPUs could hang forever trying
>>> to sync each other while not being scheduled.
>>>
>>> Reported-by: Stuart Yoder <B08248@freescale.com>
>>> Signed-off-by: Alexander Graf <agraf@suse.de>
>>
>> This should apply to any hypervisor, not just KVM.  
> 
> Sure, but do you have a generic function to evaluate that? :)

The presence of a hypervisor node without testing compatible.  Might not
get them all, but at least it will cover more than just KVM.

>> Which platforms are you seeing this on?  If it's on Freescale chips,
>> U-Boot should be doing the sync and Linux should never do it, even in
>> the absence of a hypervisor.
> 
> This is on e500mc.

On e500mc Linux should never be trying to sync the timebase.  If it is,
let's fix that.

-Scott
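
[Editorial sketch, not part of the original message.] The check described
above ("the presence of a hypervisor node without testing compatible") can
be illustrated from userspace. This is only a sketch under assumptions: it
assumes the device tree is exported as a directory tree (on a running
powerpc system that would typically be /proc/device-tree) and that the
node is named "hypervisor" directly under the root. The function name
has_hypervisor_node() is hypothetical. In-kernel code would presumably use
the OF helpers rather than stat().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Returns true if a "hypervisor" node exists directly under the given
 * device-tree root, without looking at its "compatible" property.
 * dt_root is a parameter (rather than hard-coded) so the check can be
 * pointed at a test directory; "/proc/device-tree" is the assumed
 * location on a live system.
 */
static bool has_hypervisor_node(const char *dt_root)
{
	char path[4096];
	struct stat st;
	int n;

	assert(dt_root != NULL);
	n = snprintf(path, sizeof(path), "%s/hypervisor", dt_root);
	if (n < 0 || (size_t)n >= sizeof(path))
		return false;	/* path too long: treat as "not found" */

	/* Device-tree nodes show up as directories in the exported tree. */
	return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}
```

A caller would skip timebase synchronization when
has_hypervisor_node("/proc/device-tree") returns true. As noted above,
this might not catch every hypervisor, but it covers more than KVM alone.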

^ permalink raw reply

* Re: [PATCH v2] bootmem/sparsemem: remove limit constraint in alloc_bootmem_section
From: Johannes Weiner @ 2012-02-29 18:45 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: Anton Blanchard, Dave Hansen, linux-mm, Paul Mackerras,
	Mel Gorman, Andrew Morton, Robert Jennings, linuxppc-dev
In-Reply-To: <20120229181233.GF5136@linux.vnet.ibm.com>

On Wed, Feb 29, 2012 at 10:12:33AM -0800, Nishanth Aravamudan wrote:
> On 28.02.2012 [15:47:32 +0000], Mel Gorman wrote:
> > On Fri, Feb 24, 2012 at 11:33:58AM -0800, Nishanth Aravamudan wrote:
> > > While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory
> > > Overcommit) on powerpc, we tripped the following:
> > > 
> > > kernel BUG at mm/bootmem.c:483!
> > > cpu 0x0: Vector: 700 (Program Check) at [c000000000c03940]
> > >     pc: c000000000a62bd8: .alloc_bootmem_core+0x90/0x39c
> > >     lr: c000000000a64bcc: .sparse_early_usemaps_alloc_node+0x84/0x29c
> > >     sp: c000000000c03bc0
> > >    msr: 8000000000021032
> > >   current = 0xc000000000b0cce0
> > >   paca    = 0xc000000001d80000
> > >     pid   = 0, comm = swapper
> > > kernel BUG at mm/bootmem.c:483!
> > > enter ? for help
> > > [c000000000c03c80] c000000000a64bcc
> > > .sparse_early_usemaps_alloc_node+0x84/0x29c
> > > [c000000000c03d50] c000000000a64f10 .sparse_init+0x12c/0x28c
> > > [c000000000c03e20] c000000000a474f4 .setup_arch+0x20c/0x294
> > > [c000000000c03ee0] c000000000a4079c .start_kernel+0xb4/0x460
> > > [c000000000c03f90] c000000000009670 .start_here_common+0x1c/0x2c
> > > 
> > > This is
> > > 
> > >         BUG_ON(limit && goal + size > limit);
> > > 
> > > and after some debugging, it seems that
> > > 
> > > 	goal = 0x7ffff000000
> > > 	limit = 0x80000000000
> > > 
> > > and sparse_early_usemaps_alloc_node ->
> > > sparse_early_usemaps_alloc_pgdat_section -> alloc_bootmem_section calls
> > > 
> > > 	return alloc_bootmem_section(usemap_size() * count, section_nr);
> > > 
> > > This is on a system with 8TB available via the AMS pool, and as a quirk
> > > of AMS in firmware, all of that memory shows up in node 0. So, we end up
> > > with an allocation that will fail the goal/limit constraints. In theory,
> > > we could "fall-back" to alloc_bootmem_node() in
> > > sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE
> > > defined, we'll BUG_ON() instead. A simple solution appears to be to
> > > disable the limit check if the size of the allocation in
> > > alloc_bootmem_section exceeds the section size.
> > > 
> > > Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
> > > Cc: Dave Hansen <haveblue@us.ibm.com>
> > > Cc: Anton Blanchard <anton@au1.ibm.com>
> > > Cc: Paul Mackerras <paulus@samba.org>
> > > Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
> > > Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
> > > Cc: linux-mm@kvack.org
> > > Cc: linuxppc-dev@lists.ozlabs.org
> > > ---
> > >  include/linux/mmzone.h |    2 ++
> > >  mm/bootmem.c           |    5 ++++-
> > >  2 files changed, 6 insertions(+), 1 deletions(-)
> > > 
> > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > > index 650ba2f..4176834 100644
> > > --- a/include/linux/mmzone.h
> > > +++ b/include/linux/mmzone.h
> > > @@ -967,6 +967,8 @@ static inline unsigned long early_pfn_to_nid(unsigned long pfn)
> > >   * PA_SECTION_SHIFT		physical address to/from section number
> > >   * PFN_SECTION_SHIFT		pfn to/from section number
> > >   */
> > > +#define BYTES_PER_SECTION	(1UL << SECTION_SIZE_BITS)
> > > +
> > >  #define SECTIONS_SHIFT		(MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)
> > >  
> > >  #define PA_SECTION_SHIFT	(SECTION_SIZE_BITS)
> > > diff --git a/mm/bootmem.c b/mm/bootmem.c
> > > index 668e94d..5cbbc76 100644
> > > --- a/mm/bootmem.c
> > > +++ b/mm/bootmem.c
> > > @@ -770,7 +770,10 @@ void * __init alloc_bootmem_section(unsigned long size,
> > >  
> > >  	pfn = section_nr_to_pfn(section_nr);
> > >  	goal = pfn << PAGE_SHIFT;
> > > -	limit = section_nr_to_pfn(section_nr + 1) << PAGE_SHIFT;
> > > +	if (size > BYTES_PER_SECTION)
> > > +		limit = 0;
> > > +	else
> > > +		limit = section_nr_to_pfn(section_nr + 1) << PAGE_SHIFT;
> > 
> > As it's ok to spill the allocation over to an adjacent section, why not
> > just make limit==0 unconditionally. That would avoid defining
> > BYTES_PER_SECTION.
> 
> Something like this?
> 
> Andrew, presuming Mel & Johannes give their ack, this should supersede
> the patch you pulled into -mm.
> 
> Thanks,
> Nish
> 
> -------
> 
> While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory
> Overcommit) on powerpc, we tripped the following:
> 
> kernel BUG at mm/bootmem.c:483!
> cpu 0x0: Vector: 700 (Program Check) at [c000000000c03940]
>     pc: c000000000a62bd8: .alloc_bootmem_core+0x90/0x39c
>     lr: c000000000a64bcc: .sparse_early_usemaps_alloc_node+0x84/0x29c
>     sp: c000000000c03bc0
>    msr: 8000000000021032
>   current = 0xc000000000b0cce0
>   paca    = 0xc000000001d80000
>     pid   = 0, comm = swapper
> kernel BUG at mm/bootmem.c:483!
> enter ? for help
> [c000000000c03c80] c000000000a64bcc
> .sparse_early_usemaps_alloc_node+0x84/0x29c
> [c000000000c03d50] c000000000a64f10 .sparse_init+0x12c/0x28c
> [c000000000c03e20] c000000000a474f4 .setup_arch+0x20c/0x294
> [c000000000c03ee0] c000000000a4079c .start_kernel+0xb4/0x460
> [c000000000c03f90] c000000000009670 .start_here_common+0x1c/0x2c
> 
> This is
> 
>         BUG_ON(limit && goal + size > limit);
> 
> and after some debugging, it seems that
> 
> 	goal = 0x7ffff000000
> 	limit = 0x80000000000
> 
> and sparse_early_usemaps_alloc_node ->
> sparse_early_usemaps_alloc_pgdat_section calls
> 
> 	return alloc_bootmem_section(usemap_size() * count, section_nr);
> 
> This is on a system with 8TB available via the AMS pool, and as a quirk
> of AMS in firmware, all of that memory shows up in node 0. So, we end up
> with an allocation that will fail the goal/limit constraints. In theory,
> we could "fall-back" to alloc_bootmem_node() in
> sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE
> defined, we'll BUG_ON() instead. A simple solution appears to be to
> unconditionally remove the limit condition in alloc_bootmem_section,
> meaning allocations are allowed to cross section boundaries (necessary
> for systems of this size).
> 
> Johannes Weiner pointed out that if alloc_bootmem_section() no longer
> guarantees section-locality, we need check_usemap_section_nr() to print
> possible cross-dependencies between node descriptors and the usemaps
> allocated through it. That makes the two loops in
> sparse_early_usemaps_alloc_node() identical, so re-factor the code a
> bit.
> 
> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
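
[Editorial sketch, not part of the original message.] The arithmetic
behind the BUG_ON in the report can be checked with a small userspace
model. It assumes a 16 MB section size, which is inferred from the quoted
values (limit - goal = 0x80000000000 - 0x7ffff000000 = 0x1000000). The
names SECTION_BYTES, limit_check_trips() and section_limit() are
hypothetical and only model the condition; this is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed for illustration: 16 MB sections (limit - goal in the report). */
#define SECTION_BYTES (UINT64_C(1) << 24)

/* Models BUG_ON(limit && goal + size > limit): true means the BUG fires. */
static bool limit_check_trips(uint64_t goal, uint64_t size, uint64_t limit)
{
	assert(size > 0);
	return limit != 0 && goal + size > limit;
}

/* Hypothetical helper: the pre-patch limit, i.e. the end of the section. */
static uint64_t section_limit(uint64_t goal)
{
	return goal + SECTION_BYTES;
}
```

With goal = 0x7ffff000000, any request larger than one section trips the
check against the section-end limit, while limit = 0 (the approach taken
in v2) disables the check and lets the allocation spill into the adjacent
section.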

^ permalink raw reply

* Re: [PATCH] KVM: PPC: Don't sync timebase when inside KVM
From: Alexander Graf @ 2012-02-29 18:28 UTC (permalink / raw)
  To: Scott Wood
  Cc: <linuxppc-dev@lists.ozlabs.org>,
	<kvm@vger.kernel.org>, <kvm-ppc@vger.kernel.org>
In-Reply-To: <4F4E6574.5050604@freescale.com>



On 29.02.2012, at 18:50, Scott Wood <scottwood@freescale.com> wrote:

> On 02/28/2012 08:16 PM, Alexander Graf wrote:
>> When we know that we're running inside of a KVM guest, we don't have to
>> worry about synchronizing timebases between different CPUs, since the
>> host already took care of that.
>> 
>> This fixes CPU overcommit scenarios where vCPUs could hang forever trying
>> to sync each other while not being scheduled.
>> 
>> Reported-by: Stuart Yoder <B08248@freescale.com>
>> Signed-off-by: Alexander Graf <agraf@suse.de>
> 
> This should apply to any hypervisor, not just KVM.  

Sure, but do you have a generic function to evaluate that? :)

> On book3e, Power ISA
> says timebase is read-only on virtualized implementations.  My
> understanding is that book3s is paravirt-only (guest state is not
> considered an implementation of the Power ISA), and it says "Writing the
> Time Base is privileged, and can be done only in hypervisor state".

For PR non-PAPR KVM, we are non-paravirt, but ignore tb writes iirc.

> 
> Which platforms are you seeing this on?  If it's on Freescale chips,
> U-Boot should be doing the sync and Linux should never do it, even in
> the absence of a hypervisor.

This is on e500mc.

Alex

> 
> -Scott
> 

^ permalink raw reply

* Re: [PATCH 2/2] powerpc/44x: Add more changes for APM821XX EMAC driver
From: David Miller @ 2012-02-29 18:25 UTC (permalink / raw)
  To: jwboyer; +Cc: dhdang, linux-kernel, paulus, netdev, linuxppc-dev
In-Reply-To: <CA+5PVA5hciQSvfkodX-oP_kUZueiTp=0+t8X_0iHQ+ehU0ecOQ@mail.gmail.com>

From: Josh Boyer <jwboyer@gmail.com>
Date: Wed, 29 Feb 2012 08:43:46 -0500

> On Fri, Feb 17, 2012 at 3:07 AM, Duc Dang <dhdang@apm.com> wrote:
>> This patch includes:
>>
>>  Configure EMAC PHY clock source (clock from PHY or internal clock).
>>
>>  Do not advertise PHY half duplex capability as APM821XX EMAC does not
>> support half duplex mode.
>>
>>  Add changes to support configuring jumbo frame for APM821XX EMAC.
>>
>> Signed-off-by: Duc Dang <dhdang@apm.com>
>
> This should have been sent to netdev.  CC'ing them now.
>
> Ben and David, I can take this change through the 4xx tree if it looks OK to
> both of you.  The pre-requisite DTS patch will go through my tree, so it might
> make sense to keep them together.

Well the patch has coding style problems, for one:

>> +			dev->features |= (EMAC_APM821XX_REQ_JUMBO_FRAME_SIZE
>> +					| EMAC_FTR_APM821XX_NO_HALF_DUPLEX
>> +					| EMAC_FTR_460EX_PHY_CLK_FIX);

Should be:

>> +			dev->features |= (EMAC_APM821XX_REQ_JUMBO_FRAME_SIZE |
>> +					  EMAC_FTR_APM821XX_NO_HALF_DUPLEX |
>> +					  EMAC_FTR_460EX_PHY_CLK_FIX);

And this:

>> +		dev->phy_feat_exc = (SUPPORTED_1000baseT_Half
>> +					| SUPPORTED_100baseT_Half
>> +					| SUPPORTED_10baseT_Half);

Should be:

>> +		dev->phy_feat_exc = (SUPPORTED_1000baseT_Half |
>> +				     SUPPORTED_100baseT_Half |
>> +				     SUPPORTED_10baseT_Half);
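
[Editorial sketch, not part of the original message.] A standalone
illustration of the layout requested above (operators at the end of each
line, continuation lines aligned under the first operand), together with
what an exclusion mask of this kind might be used for. The EX_* bit values
and the function ex_advertise() are hypothetical stand-ins so the snippet
compiles on its own; driver code would use the SUPPORTED_* definitions and
its own feature handling instead.

```c
#include <stdint.h>

/* Hypothetical link-mode bits, for illustration only. */
#define EX_10_HALF	(1u << 0)
#define EX_10_FULL	(1u << 1)
#define EX_100_HALF	(1u << 2)
#define EX_100_FULL	(1u << 3)
#define EX_1000_HALF	(1u << 4)
#define EX_1000_FULL	(1u << 5)

/* Exclusion mask written with the '|' operators trailing each line. */
static const uint32_t ex_half_duplex_mask = (EX_1000_HALF |
					     EX_100_HALF |
					     EX_10_HALF);

/* Drop the excluded modes from the set of supported modes. */
static uint32_t ex_advertise(uint32_t supported, uint32_t exclude)
{
	return supported & ~exclude;
}
```

Applying ex_advertise() to a PHY that supports all six modes leaves only
the three full-duplex bits set, which is the effect the patch description
("do not advertise PHY half duplex capability") is after.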

^ permalink raw reply
