LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* RE: [PATCH] powerpc/8xx: tqm8xx: fix incorrect placement of __initdata tag
From: David Laight @ 2013-09-30 14:20 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz, Vitaly Bordug
  Cc: Marcelo Tosatti, Kyungmin Park, linuxppc-dev, linux-kernel
In-Reply-To: <1589399.KSlGuZSdd2@amdc1032>

> __initdata tag should be placed between the variable name and equal
> sign for the variable to be placed in the intended .init.data section.
...
> -static struct __initdata cpm_pin tqm8xx_pins[] =3D {
> +static struct cpm_pin tqm8xx_pins[] __initdata =3D {

As far as gcc is concerned it can go almost anywhere before the '=3D',
even before the 'static'.
Splitting 'struct cpm_pin' does seem an odd choice.

The Linux coding standards might suggest a location.
I'd have thought that either before or after the 'static' would be best
(ie as a storage class qualifier).

	David

^ permalink raw reply

* Re: [RFC PATCH 06/11] kvm: powerpc: book3s: Add is_hv_enabled to kvmppc_ops
From: Alexander Graf @ 2013-09-30 14:51 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: paulus, linuxppc-dev, kvm-ppc
In-Reply-To: <87pprqbh96.fsf@linux.vnet.ibm.com>

On 09/30/2013 02:56 PM, Aneesh Kumar K.V wrote:
> Alexander Graf<agraf@suse.de>  writes:
>
>> On 27.09.2013, at 15:03, Aneesh Kumar K.V wrote:
>>
>>> Alexander Graf<agraf@suse.de>  writes:
>>>
>>>
>>>>> diff --git a/arch/powerpc/kvm/book3s_segment.S b/arch/powerpc/kvm/book3s_segment.S
>>>>> index 1abe478..e0229dd 100644
>>>>> --- a/arch/powerpc/kvm/book3s_segment.S
>>>>> +++ b/arch/powerpc/kvm/book3s_segment.S
>>>>> @@ -161,9 +161,14 @@ kvmppc_handler_trampoline_enter_end:
>>>>> .global kvmppc_handler_trampoline_exit
>>>>> kvmppc_handler_trampoline_exit:
>>>>>
>>>>> +#if defined(CONFIG_KVM_BOOK3S_HV)
>>>>> +.global kvmppc_interrupt_pr
>>>>> +kvmppc_interrupt_pr:
>>>>> +	ld	r9, HSTATE_SCRATCH2(r13)
>>>>> +#else
>>>>> .global kvmppc_interrupt
>>>>> kvmppc_interrupt:
>>>> Just always call it kvmppc_interrupt_pr and thus share at least that
>>>> part of the code :).
>>> But if i don't have HV enabled, we don't compile book3s_hv_rmhandlers.S
>>> Hence don't have the kvmppc_interrupt symbol defined.
>> Ah, because we're always jumping to kvmppc_interrupt. Can we make this
>> slightly less magical? How about we always call kvmppc_interrupt_hv
>> when CONFIG_KVM_BOOK3S_HV_POSSIBLE and always kvmppc_interrupt_pr when
>> CONFIG_KVM_BOOK3S_PR_POSSIBLE and then branch to kvmppc_interrupt_pr
>> from kvmppc_interrupt_hv?
>>
>> IMHO that would make the code flow more obvious.
>
> To make sure I understand you correctly, what you are suggesting is
> to update __KVM_HANDLER to call kvmppc_interupt_pr when HV is not
> enabled ?

Yes, I think that makes the code flow more obvious. Every function 
always has the same name regardless of config options then.


Alex

^ permalink raw reply

* Re: [RFC PATCH 07/11] kvm: powerpc: book3s: pr: move PR related tracepoints to a separate header
From: Alexander Graf @ 2013-09-30 14:51 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: paulus, linuxppc-dev, kvm-ppc
In-Reply-To: <87mwmubh70.fsf@linux.vnet.ibm.com>

On 09/30/2013 02:57 PM, Aneesh Kumar K.V wrote:
> Alexander Graf<agraf@suse.de>  writes:
>
>> On 27.09.2013, at 15:06, Aneesh Kumar K.V wrote:
>>
>>> Alexander Graf<agraf@suse.de>  writes:
>>>
>>>> On 27.09.2013, at 12:03, Aneesh Kumar K.V wrote:
>>>>
>>>>> From: "Aneesh Kumar K.V"<aneesh.kumar@linux.vnet.ibm.com>
>>>>>
>>>>> This patch moves PR related tracepoints to a separate header. This
>>>>> enables in converting PR to a kernel module which will be done in
>>>>> later patches
>>>>>
>>>>> Signed-off-by: Aneesh Kumar K.V<aneesh.kumar@linux.vnet.ibm.com>
>>>>> ---
>>>>> arch/powerpc/kvm/book3s_64_mmu_host.c |   2 +-
>>>>> arch/powerpc/kvm/book3s_mmu_hpte.c    |   2 +-
>>>>> arch/powerpc/kvm/book3s_pr.c          |   3 +-
>>>>> arch/powerpc/kvm/trace.h              | 234 +--------------------------
>>>>> arch/powerpc/kvm/trace_pr.h           | 297 ++++++++++++++++++++++++++++++++++
>>>>> 5 files changed, 308 insertions(+), 230 deletions(-)
>>>>> create mode 100644 arch/powerpc/kvm/trace_pr.h
>>>>>
>>>>> diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c b/arch/powerpc/kvm/book3s_64_mmu_host.c
>>>>> index 329a978..fd5b393 100644
>>>>> --- a/arch/powerpc/kvm/book3s_64_mmu_host.c
>>>>> +++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
>>>>> @@ -27,7 +27,7 @@
>>>>> #include<asm/machdep.h>
>>>>> #include<asm/mmu_context.h>
>>>>> #include<asm/hw_irq.h>
>>>>> -#include "trace.h"
>>>>> +#include "trace_pr.h"
>>>>>
>>>>> #define PTE_SIZE 12
>>>>>
>>>>> diff --git a/arch/powerpc/kvm/book3s_mmu_hpte.c b/arch/powerpc/kvm/book3s_mmu_hpte.c
>>>>> index d2d280b..4556168 100644
>>>>> --- a/arch/powerpc/kvm/book3s_mmu_hpte.c
>>>>> +++ b/arch/powerpc/kvm/book3s_mmu_hpte.c
>>>>> @@ -28,7 +28,7 @@
>>>>> #include<asm/mmu_context.h>
>>>>> #include<asm/hw_irq.h>
>>>>>
>>>>> -#include "trace.h"
>>>>> +#include "trace_pr.h"
>>>>>
>>>>> #define PTE_SIZE	12
>>>>>
>>>>> diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
>>>>> index 2a97279..99d0839 100644
>>>>> --- a/arch/powerpc/kvm/book3s_pr.c
>>>>> +++ b/arch/powerpc/kvm/book3s_pr.c
>>>>> @@ -41,7 +41,8 @@
>>>>> #include<linux/vmalloc.h>
>>>>> #include<linux/highmem.h>
>>>>>
>>>>> -#include "trace.h"
>>>>> +#define CREATE_TRACE_POINTS
>>>>> +#include "trace_pr.h"
>>>>>
>>>>> /* #define EXIT_DEBUG */
>>>>> /* #define DEBUG_EXT */
>>>>> diff --git a/arch/powerpc/kvm/trace.h b/arch/powerpc/kvm/trace.h
>>>>> index a088e9a..7d5a136 100644
>>>>> --- a/arch/powerpc/kvm/trace.h
>>>>> +++ b/arch/powerpc/kvm/trace.h
>>>>> @@ -85,6 +85,12 @@ TRACE_EVENT(kvm_ppc_instr,
>>>>> 	{41, "HV_PRIV"}
>>>>> #endif
>>>>>
>>>>> +#ifndef CONFIG_KVM_BOOK3S_PR
>>>>> +/*
>>>>> + * For pr we define this in trace_pr.h since it pr can be built as
>>>>> + * a module
>>>> Not sure I understand the need. If the config option is available, so
>>>> should the struct field. Worst case that happens with HV is that we
>>>> get empty shadow_srr1 values in our trace, no?
>>> That is not the real reason. trace.h get built as part of kvm.ko or as
>>> part of kernel. These trace functions actually get called from
>>> kvm-pr.ko. To make they build i would either need EXPORT_SYMBOL or move
>>> the definition of them to kvm-pr.ko. I did the later and moved only pr
>>> related traces to kvm-pr.ko
>> I fail to see why we wouldn't have a trace_hv.h file then, as that can
>> also be built as a module, no? And at that point I don't see why we
>> would need any conditionals at all in trace.h anymore, as it would
>> only cover generic code.
> Currently HV module is not using any tracepoints. Once it start using
> tracepoints we would have trace_hv.h

So why would there be an #ifndef in trace.h?


Alex

^ permalink raw reply

* Re: [RFC PATCH 00/11 Allow PR and HV KVM to coexist in one kernel
From: Alexander Graf @ 2013-09-30 14:54 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: <kvm@vger.kernel.org> list, Gleb Natapov, kvm-ppc,
	Paul Mackerras, Paolo Bonzini, linuxppc-dev
In-Reply-To: <87had2bgme.fsf@linux.vnet.ibm.com>

On 09/30/2013 03:09 PM, Aneesh Kumar K.V wrote:
> Alexander Graf<agraf@suse.de>  writes:
>
>> On 27.09.2013, at 12:52, Aneesh Kumar K.V wrote:
>>
>>> "Aneesh Kumar K.V"<aneesh.kumar@linux.vnet.ibm.com>  writes:
>>>
>>>> Hi All,
>>>>
>>>> This patch series support enabling HV and PR KVM together in the same kernel. We
>>>> extend machine property with new property "kvm_type". A value of 1 will force HV
>>>> KVM and 2 PR KVM. The default value is 0 which will select the fastest KVM mode.
>>>> ie, HV if that is supported otherwise PR.
>>>>
>>>> With Qemu command line having
>>>>
>>>> -machine pseries,accel=kvm,kvm_type=1
>>>>
>>>> [root@llmp24l02 qemu]# bash ../qemu
>>>> failed to initialize KVM: Invalid argument
>>>> [root@llmp24l02 qemu]# modprobe kvm-pr
>>>> [root@llmp24l02 qemu]# bash ../qemu
>>>> failed to initialize KVM: Invalid argument
>>>> [root@llmp24l02 qemu]# modprobe  kvm-hv
>>>> [root@llmp24l02 qemu]# bash ../qemu
>>>>
>>>> now with
>>>>
>>>> -machine pseries,accel=kvm,kvm_type=2
>>>>
>>>> [root@llmp24l02 qemu]# rmmod kvm-pr
>>>> [root@llmp24l02 qemu]# bash ../qemu
>>>> failed to initialize KVM: Invalid argument
>>>> [root@llmp24l02 qemu]#
>>>> [root@llmp24l02 qemu]# modprobe kvm-pr
>>>> [root@llmp24l02 qemu]# bash ../qemu
>>>>
>>>> if don't specify kvm_type machine property, it will take a default value 0,
>>>> which means fastest supported.
>>> Related qemu patch
>>>
>>> commit 8d139053177d48a70cb710b211ea4c2843eccdfb
>>> Author: Aneesh Kumar K.V<aneesh.kumar@linux.vnet.ibm.com>
>>> Date:   Mon Sep 23 12:28:37 2013 +0530
>>>
>>>     kvm: Add a new machine property kvm_type
>>>
>>>     Targets like ppc64 support different type of KVM, one which use
>>>     hypervisor mode and the other which doesn't. Add a new machine
>>>     property kvm_type that helps in selecting the respective ones
>>>
>>>     Signed-off-by: Aneesh Kumar K.V<aneesh.kumar@linux.vnet.ibm.com>
>> This really is too early, as we can't possibly run in HV mode for
>> non-pseries machines, so the interpretation (or at least sanity
>> checking) of what values are reasonable should occur in the
>> machine. That's why it's a variable in the "machine opts".
> With the current code CREATE_VM will fail, because we won't have
> kvm-hv.ko loaded and trying to create a vm with type 1 will fail.
> Now the challenge related to moving that to machine_init or later is, we
> depend on HV or PR callback early in CREATE_VM. With the changes we have
>
> int kvmppc_core_init_vm(struct kvm *kvm)
> {
>
> #ifdef CONFIG_PPC64
> 	INIT_LIST_HEAD(&kvm->arch.spapr_tce_tables);
> 	INIT_LIST_HEAD(&kvm->arch.rtas_tokens);
> #endif
>
> 	return kvm->arch.kvm_ops->init_vm(kvm);
> }
>
> Also the mmu notifier callback do end up calling kvm_unmap_hva etc which
> are all HV/PR dependent.

Yes, so we should verify in the machine models that we're runnable with 
the currently selected type at least, to give the user a sensible error 
message.

>
>
>
>> Also, users don't want to say type=0. They want to say type=PR or
>> type=HV or type=HV,PR. In fact, can't you make this a property of
>> -accel? Then it's truly accel specific and everything should be well.
> If we are doing this as machine property, we can't specify string,
> because "HV"/"PR" are all powerpc dependent, so parsing that is not
> possible in kvm_init in qemu. But, yes ideally it would be nice to be

Well, we could do the "name to integer" conversion in an arch specific 
function, no?

> able to speicy the type using string. I thought accel is a machine
> property, hence was not sure whether I can have additional properties
> against that. I was using it as below.
>
>   -machine pseries,accel=kvm,kvm_type=1
>
> will look into more details to check whether this can be accel property.

Great :).


Alex

^ permalink raw reply

* Re: [PATCH] powerpc/8xx: tqm8xx: fix incorrect placement of __initdata tag
From: Bartlomiej Zolnierkiewicz @ 2013-09-30 15:05 UTC (permalink / raw)
  To: David Laight; +Cc: Marcelo Tosatti, Kyungmin Park, linuxppc-dev, linux-kernel
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B736B@saturn3.aculab.com>

On Monday, September 30, 2013 03:20:29 PM David Laight wrote:
> > __initdata tag should be placed between the variable name and equal
> > sign for the variable to be placed in the intended .init.data section.
> ...
> > -static struct __initdata cpm_pin tqm8xx_pins[] = {
> > +static struct cpm_pin tqm8xx_pins[] __initdata = {
> 
> As far as gcc is concerned it can go almost anywhere before the '=',
> even before the 'static'.
> Splitting 'struct cpm_pin' does seem an odd choice.

It is not only an odd choice, it just doesn't work as it should in
the practice (as tested with gcc-4.6.3 from Ubuntu 12.04).

> The Linux coding standards might suggest a location.
> I'd have thought that either before or after the 'static' would be best
> (ie as a storage class qualifier).

The majority of the kernel code uses __initdata before equal sign
and the __initdata documentation in <linux/init.h> recommends such
usage:

"
 * For initialized data:
 * You should insert __initdata or __initconst between the variable name
 * and equal sign followed by value, e.g.:
 *
 * static int init_variable __initdata = 0;
 * static const char linux_logo[] __initconst = { 0x32, 0x36, ... };
"

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics

^ permalink raw reply

* Re: [RFC PATCH 07/11] kvm: powerpc: book3s: pr: move PR related tracepoints to a separate header
From: Aneesh Kumar K.V @ 2013-09-30 15:53 UTC (permalink / raw)
  To: Alexander Graf; +Cc: paulus, linuxppc-dev, kvm-ppc
In-Reply-To: <52499003.1090405@suse.de>

Alexander Graf <agraf@suse.de> writes:

> On 09/30/2013 02:57 PM, Aneesh Kumar K.V wrote:
>> Alexander Graf<agraf@suse.de>  writes:
>>>>>> diff --git a/arch/powerpc/kvm/trace.h b/arch/powerpc/kvm/trace.h
>>>>>> index a088e9a..7d5a136 100644
>>>>>> --- a/arch/powerpc/kvm/trace.h
>>>>>> +++ b/arch/powerpc/kvm/trace.h
>>>>>> @@ -85,6 +85,12 @@ TRACE_EVENT(kvm_ppc_instr,
>>>>>> 	{41, "HV_PRIV"}
>>>>>> #endif
>>>>>>
>>>>>> +#ifndef CONFIG_KVM_BOOK3S_PR
>>>>>> +/*
>>>>>> + * For pr we define this in trace_pr.h since it pr can be built as
>>>>>> + * a module
>>>>> Not sure I understand the need. If the config option is available, so
>>>>> should the struct field. Worst case that happens with HV is that we
>>>>> get empty shadow_srr1 values in our trace, no?
>>>> That is not the real reason. trace.h get built as part of kvm.ko or as
>>>> part of kernel. These trace functions actually get called from
>>>> kvm-pr.ko. To make they build i would either need EXPORT_SYMBOL or move
>>>> the definition of them to kvm-pr.ko. I did the later and moved only pr
>>>> related traces to kvm-pr.ko
>>> I fail to see why we wouldn't have a trace_hv.h file then, as that can
>>> also be built as a module, no? And at that point I don't see why we
>>> would need any conditionals at all in trace.h anymore, as it would
>>> only cover generic code.
>> Currently HV module is not using any tracepoints. Once it start using
>> tracepoints we would have trace_hv.h
>
> So why would there be an #ifndef in trace.h?
>

to handle things like trace_kvm_exit in booke.c. For that we still don't
have kernel module, and booke.c include trace.h.

-aneesh

^ permalink raw reply

* Re: [RFC PATCH 07/11] kvm: powerpc: book3s: pr: move PR related tracepoints to a separate header
From: Alexander Graf @ 2013-09-30 15:55 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: paulus, linuxppc-dev, kvm-ppc
In-Reply-To: <87bo3ab920.fsf@linux.vnet.ibm.com>

On 09/30/2013 05:53 PM, Aneesh Kumar K.V wrote:
> Alexander Graf<agraf@suse.de>  writes:
>
>> On 09/30/2013 02:57 PM, Aneesh Kumar K.V wrote:
>>> Alexander Graf<agraf@suse.de>   writes:
>>>>>>> diff --git a/arch/powerpc/kvm/trace.h b/arch/powerpc/kvm/trace.h
>>>>>>> index a088e9a..7d5a136 100644
>>>>>>> --- a/arch/powerpc/kvm/trace.h
>>>>>>> +++ b/arch/powerpc/kvm/trace.h
>>>>>>> @@ -85,6 +85,12 @@ TRACE_EVENT(kvm_ppc_instr,
>>>>>>> 	{41, "HV_PRIV"}
>>>>>>> #endif
>>>>>>>
>>>>>>> +#ifndef CONFIG_KVM_BOOK3S_PR
>>>>>>> +/*
>>>>>>> + * For pr we define this in trace_pr.h since it pr can be built as
>>>>>>> + * a module
>>>>>> Not sure I understand the need. If the config option is available, so
>>>>>> should the struct field. Worst case that happens with HV is that we
>>>>>> get empty shadow_srr1 values in our trace, no?
>>>>> That is not the real reason. trace.h get built as part of kvm.ko or as
>>>>> part of kernel. These trace functions actually get called from
>>>>> kvm-pr.ko. To make they build i would either need EXPORT_SYMBOL or move
>>>>> the definition of them to kvm-pr.ko. I did the later and moved only pr
>>>>> related traces to kvm-pr.ko
>>>> I fail to see why we wouldn't have a trace_hv.h file then, as that can
>>>> also be built as a module, no? And at that point I don't see why we
>>>> would need any conditionals at all in trace.h anymore, as it would
>>>> only cover generic code.
>>> Currently HV module is not using any tracepoints. Once it start using
>>> tracepoints we would have trace_hv.h
>> So why would there be an #ifndef in trace.h?
>>
> to handle things like trace_kvm_exit in booke.c. For that we still don't
> have kernel module, and booke.c include trace.h.

Ah, so it's for booke! :)

Just move them to trace_booke.h then to clarify things.


Alex

^ permalink raw reply

* Re: [RFC PATCH 06/11] kvm: powerpc: book3s: Add is_hv_enabled to kvmppc_ops
From: Aneesh Kumar K.V @ 2013-09-30 16:20 UTC (permalink / raw)
  To: Alexander Graf; +Cc: paulus, linuxppc-dev, kvm-ppc
In-Reply-To: <52498FD9.6060809@suse.de>

Alexander Graf <agraf@suse.de> writes:

> On 09/30/2013 02:56 PM, Aneesh Kumar K.V wrote:
>> Alexander Graf<agraf@suse.de>  writes:
>>
>>> On 27.09.2013, at 15:03, Aneesh Kumar K.V wrote:
>>>
>>>> Alexander Graf<agraf@suse.de>  writes:
>>>>
>>>>
>>>>>> diff --git a/arch/powerpc/kvm/book3s_segment.S b/arch/powerpc/kvm/book3s_segment.S
>>>>>> index 1abe478..e0229dd 100644
>>>>>> --- a/arch/powerpc/kvm/book3s_segment.S
>>>>>> +++ b/arch/powerpc/kvm/book3s_segment.S
>>>>>> @@ -161,9 +161,14 @@ kvmppc_handler_trampoline_enter_end:
>>>>>> .global kvmppc_handler_trampoline_exit
>>>>>> kvmppc_handler_trampoline_exit:
>>>>>>
>>>>>> +#if defined(CONFIG_KVM_BOOK3S_HV)
>>>>>> +.global kvmppc_interrupt_pr
>>>>>> +kvmppc_interrupt_pr:
>>>>>> +	ld	r9, HSTATE_SCRATCH2(r13)
>>>>>> +#else
>>>>>> .global kvmppc_interrupt
>>>>>> kvmppc_interrupt:
>>>>> Just always call it kvmppc_interrupt_pr and thus share at least that
>>>>> part of the code :).
>>>> But if i don't have HV enabled, we don't compile book3s_hv_rmhandlers.S
>>>> Hence don't have the kvmppc_interrupt symbol defined.
>>> Ah, because we're always jumping to kvmppc_interrupt. Can we make this
>>> slightly less magical? How about we always call kvmppc_interrupt_hv
>>> when CONFIG_KVM_BOOK3S_HV_POSSIBLE and always kvmppc_interrupt_pr when
>>> CONFIG_KVM_BOOK3S_PR_POSSIBLE and then branch to kvmppc_interrupt_pr
>>> from kvmppc_interrupt_hv?
>>>
>>> IMHO that would make the code flow more obvious.
>>
>> To make sure I understand you correctly, what you are suggesting is
>> to update __KVM_HANDLER to call kvmppc_interupt_pr when HV is not
>> enabled ?
>
> Yes, I think that makes the code flow more obvious. Every function 
> always has the same name regardless of config options then.
>

Something like this ( btw non tested )

diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index cca12f0..0b798d4 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -198,6 +198,17 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 	cmpwi	r10,0;							\
 	bne	do_kvm_##n
 
+#ifdef CONFIG_KVM_BOOK3S_HV
+/*
+ * If hv is possible, interrupts come into to the hv version
+ * of the kvmppc_interrupt code, which then jumps to the PR handler,
+ * kvmppc_interrupt_pr, if the guest is a PR guest.
+ */
+#define kvmppc_interrupt kvmppc_interrupt_hv
+#else
+#define kvmppc_interrupt kvmppc_interrupt_pr
+#endif
+
 #define __KVM_HANDLER(area, h, n)					\
 do_kvm_##n:								\
 	BEGIN_FTR_SECTION_NESTED(947)					\
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 5ede7fc..2eb6622 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -563,8 +563,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
 /*
  * We come here from the first-level interrupt handlers.
  */
-	.globl	kvmppc_interrupt
-kvmppc_interrupt:
+	.globl	kvmppc_interrupt_hv
+kvmppc_interrupt_hv:
 	/*
 	 * Register contents:
 	 * R12		= interrupt vector
@@ -577,6 +577,11 @@ kvmppc_interrupt:
 	lbz	r9, HSTATE_IN_GUEST(r13)
 	cmpwi	r9, KVM_GUEST_MODE_HOST_HV
 	beq	kvmppc_bad_host_intr
+#ifdef CONFIG_KVM_BOOK3S_PR
+	cmpwi	r9, KVM_GUEST_MODE_GUEST
+	ld	r9, HSTATE_SCRATCH2(r13)
+	beq	kvmppc_interrupt_pr
+#endif
 	/* We're now back in the host but in guest MMU context */
 	li	r9, KVM_GUEST_MODE_HOST_HV
 	stb	r9, HSTATE_IN_GUEST(r13)
diff --git a/arch/powerpc/kvm/book3s_segment.S b/arch/powerpc/kvm/book3s_segment.S
index 1abe478..bc50c97 100644
--- a/arch/powerpc/kvm/book3s_segment.S
+++ b/arch/powerpc/kvm/book3s_segment.S
@@ -161,8 +161,8 @@ kvmppc_handler_trampoline_enter_end:
 .global kvmppc_handler_trampoline_exit
 kvmppc_handler_trampoline_exit:
 
-.global kvmppc_interrupt
-kvmppc_interrupt:
+.global kvmppc_interrupt_pr
+kvmppc_interrupt_pr:
 
 	/* Register usage at this point:
 	 *

^ permalink raw reply related

* Re: [RFC PATCH 06/11] kvm: powerpc: book3s: Add is_hv_enabled to kvmppc_ops
From: Alexander Graf @ 2013-09-30 16:36 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: paulus, linuxppc-dev, kvm-ppc
In-Reply-To: <878uyeb7s4.fsf@linux.vnet.ibm.com>

On 09/30/2013 06:20 PM, Aneesh Kumar K.V wrote:
> Alexander Graf<agraf@suse.de>  writes:
>
>> On 09/30/2013 02:56 PM, Aneesh Kumar K.V wrote:
>>> Alexander Graf<agraf@suse.de>   writes:
>>>
>>>> On 27.09.2013, at 15:03, Aneesh Kumar K.V wrote:
>>>>
>>>>> Alexander Graf<agraf@suse.de>   writes:
>>>>>
>>>>>
>>>>>>> diff --git a/arch/powerpc/kvm/book3s_segment.S b/arch/powerpc/kvm/book3s_segment.S
>>>>>>> index 1abe478..e0229dd 100644
>>>>>>> --- a/arch/powerpc/kvm/book3s_segment.S
>>>>>>> +++ b/arch/powerpc/kvm/book3s_segment.S
>>>>>>> @@ -161,9 +161,14 @@ kvmppc_handler_trampoline_enter_end:
>>>>>>> .global kvmppc_handler_trampoline_exit
>>>>>>> kvmppc_handler_trampoline_exit:
>>>>>>>
>>>>>>> +#if defined(CONFIG_KVM_BOOK3S_HV)
>>>>>>> +.global kvmppc_interrupt_pr
>>>>>>> +kvmppc_interrupt_pr:
>>>>>>> +	ld	r9, HSTATE_SCRATCH2(r13)
>>>>>>> +#else
>>>>>>> .global kvmppc_interrupt
>>>>>>> kvmppc_interrupt:
>>>>>> Just always call it kvmppc_interrupt_pr and thus share at least that
>>>>>> part of the code :).
>>>>> But if i don't have HV enabled, we don't compile book3s_hv_rmhandlers.S
>>>>> Hence don't have the kvmppc_interrupt symbol defined.
>>>> Ah, because we're always jumping to kvmppc_interrupt. Can we make this
>>>> slightly less magical? How about we always call kvmppc_interrupt_hv
>>>> when CONFIG_KVM_BOOK3S_HV_POSSIBLE and always kvmppc_interrupt_pr when
>>>> CONFIG_KVM_BOOK3S_PR_POSSIBLE and then branch to kvmppc_interrupt_pr
>>>> from kvmppc_interrupt_hv?
>>>>
>>>> IMHO that would make the code flow more obvious.
>>> To make sure I understand you correctly, what you are suggesting is
>>> to update __KVM_HANDLER to call kvmppc_interupt_pr when HV is not
>>> enabled ?
>> Yes, I think that makes the code flow more obvious. Every function
>> always has the same name regardless of config options then.
>>
> Something like this ( btw non tested )

Yes :).


Alex

^ permalink raw reply

* Re: [PATCH 36/39] powerpc/kvm/book3s_hv: Add little endian guest support
From: Alexander Graf @ 2013-09-30 18:40 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: Paul Mackerras, linuxppc-dev
In-Reply-To: <1379901913-5945-37-git-send-email-anton@samba.org>

On 09/23/2013 04:05 AM, Anton Blanchard wrote:
> Add support for the H_SET_MODE hcall so we can select the
> endianness of our exceptions.
>
> We create a guest MSR from scratch when delivering exceptions in
> a few places and instead of extracing the LPCR[ILE] and inserting
> it into MSR_LE each time simply create a new variable intr_msr which
> contains the entire MSR to use.
>
> Signed-off-by: Anton Blanchard<anton@samba.org>
> Cc: Alexander Graf<agraf@suse.de>
> ---
>   arch/powerpc/include/asm/kvm_host.h     |  1 +
>   arch/powerpc/kernel/asm-offsets.c       |  1 +
>   arch/powerpc/kvm/book3s_64_mmu_hv.c     |  2 +-
>   arch/powerpc/kvm/book3s_hv.c            | 44 +++++++++++++++++++++++++++++++++
>   arch/powerpc/kvm/book3s_hv_rmhandlers.S | 15 ++++-------
>   5 files changed, 52 insertions(+), 11 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index 3328353..50f204e 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -605,6 +605,7 @@ struct kvm_vcpu_arch {
>   	spinlock_t tbacct_lock;
>   	u64 busy_stolen;
>   	u64 busy_preempt;
> +	unsigned long intr_msr;
>   #endif
>   };
>
> diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
> index d8958be..3967b15 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -490,6 +490,7 @@ int main(void)
>   	DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
>   	DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr));
>   	DEFINE(VCPU_VPA_DIRTY, offsetof(struct kvm_vcpu, arch.vpa.dirty));
> +	DEFINE(VCPU_INTR_MSR, offsetof(struct kvm_vcpu, arch.intr_msr));
>   #endif
>   #ifdef CONFIG_PPC_BOOK3S
>   	DEFINE(VCPU_VCPUID, offsetof(struct kvm_vcpu, vcpu_id));
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 043eec8..30459bd 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -266,7 +266,7 @@ void kvmppc_mmu_destroy(struct kvm_vcpu *vcpu)
>
>   static void kvmppc_mmu_book3s_64_hv_reset_msr(struct kvm_vcpu *vcpu)
>   {
> -	kvmppc_set_msr(vcpu, MSR_SF | MSR_ME);
> +	kvmppc_set_msr(vcpu, vcpu->arch.intr_msr);
>   }
>
>   /*
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 62a2b5a..c9b90b8 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -503,6 +503,43 @@ static void kvmppc_create_dtl_entry(struct kvm_vcpu *vcpu,
>   	vcpu->arch.dtl.dirty = true;
>   }
>
> +static int kvmppc_h_set_mode(struct kvm_vcpu *vcpu, unsigned long mflags,
> +			     unsigned long resource, unsigned long value1,
> +			     unsigned long value2)
> +{
> +	struct kvm *kvm = vcpu->kvm;
> +	struct kvm_vcpu *v;
> +	int n;
> +
> +	if (resource == 4) {
> +		if (value1)
> +			return H_P3;
> +		if (value2)
> +			return H_P4;
> +
> +		switch (mflags) {
> +		case 0:
> +			kvm->arch.lpcr&= ~LPCR_ILE;

Can we live migrate this properly?


Alex

^ permalink raw reply

* Re: [PATCH v2] powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file
From: David Miller @ 2013-09-30 18:50 UTC (permalink / raw)
  To: aida.mynzhasova; +Cc: devicetree, richardcochran, linuxppc-dev, netdev
In-Reply-To: <1380093863-5388-1-git-send-email-aida.mynzhasova@skitlab.ru>

From: Aida Mynzhasova <aida.mynzhasova@skitlab.ru>
Date: Wed, 25 Sep 2013 11:24:23 +0400

> Currently IEEE 1588 timer reference clock source is determined through
> hard-coded value in gianfar_ptp driver. This patch allows to select ptp
> clock source by means of device tree file node.
> 
> For instance:
> 
> 	fsl,cksel = <0>;
> 
> for using external (TSEC_TMR_CLK input) high precision timer
> reference clock.
> 
> Other acceptable values:
> 
> 	<1> : eTSEC system clock
> 	<2> : eTSEC1 transmit clock
> 	<3> : RTC clock input
> 
> When this attribute isn't used, eTSEC system clock will serve as
> IEEE 1588 timer reference clock.
> 
> Signed-off-by: Aida Mynzhasova <aida.mynzhasova@skitlab.ru>

Applied, thanks.

^ permalink raw reply

* Avoiding the dentry d_lock on final dput(), part deux: transactional memory
From: Linus Torvalds @ 2013-09-30 19:29 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Waiman Long, Benjamin Herrenschmidt
  Cc: Norton, Scott J, Linux Kernel Mailing List,
	Chandramouleeswaran, Aswin, George Spelvin, linux-fsdevel,
	ppc-dev

[-- Attachment #1: Type: text/plain, Size: 3956 bytes --]

So with all the lockref work, we now avoid the dentry d_lock for
almost all normal cases.

There is one single remaining common case, though: the final dput()
when the dentry count goes down to zero, and we need to check if we
are supposed to get rid of the dentry (or at least put it on the LRU
lists etc).

And that's something lockref itself cannot really help us with unless
we start adding status bits to it (eg some kind of "enable slow-case"
bit in the lock part of the lockref). Which sounds like a clever but
very fragile approach.

However, I did get myself a i7-4770S exactly because I was
forward-thinking, and wanted to try using transactional memory for
these kinds of things.

Quite frankly, from all I've seen so far, the kernel is not going to
have very good luck with things like lock elision, because we're
really fine-grained already, and at least the Intel lock-elision
(don't know about POWER8) basically requires software to do prediction
on whether the transaction will succeed or not, dynamically based on
aborts etc. And quite frankly, by the time you have to do things like
that, you've already lost. We're better off just using our normal
locks.

So as far as I'm concerned, transactional memory is going to be useful
- *if* it is useful - only for specialized code. Some of that might be
architecture-internal lock implementations, other things might be
exactly the dput() kind of situation.

And the thing is, *normally* dput() doesn't need to do anything at
all, except decrement the dentry reference count. However, for that
normal case to be true, we need to atomically check:

 - that the dentry lock isn't held (same as lockref)
 - that we are already on the LRU list and don't need to add ourselves to it
 - that we already have the DCACHE_REFERENCED bit set and don't need to set it
 - that the dentry isn't unhashed and needs to be killed.

Additionally, we need to check that it's not a dentry that has a
"d_delete()" operation, but that's a static attribute of a dentry, so
that's not something that we need to check atomically wrt the other
things.

ANYWAY. With all that out of the way, the basic point is that this is
really simple, and fits very well with even very limited transactional
memory. We literally need to do just a single write, and something
like three reads from memory. And we already have a trivial fallback,
namely the old code using the lockrefs.  IOW, it's literally ten
straight-line instructions between the xbegin and the xend for me.

So here's a patch that works for me. It requires gcc to know "asm
goto", and it requires binutils that know about xbegin/xabort. And it
requires a CPU that supports the intel RTM extensions.

But I'm cc'ing the POWER people, because I don't know the POWER8
interfaces, and I don't want to necessarily call this "xbegin"/"xend"
when I actually wrap it in some helper functions.

Anyway, profiles with this look beautiful (I'm using "make -j" on a
fully built allmodconfig kernel tree as the source of profile data).
There's no spinlocks from dput at all, and the dput() profile itself
shows basically 1% in anything but the fastpath (probably the _very_
occasional first accesses where we need to add things to the LRU
lists).

And the patch is small, but is obviously totally lacking any test for
CPU support or anything like that. Or portability. But I thought I'd
get the ball rolling, because I doubt the intel TSX patches are going
to be useful (if they were, Intel would be crowing about performance
numbers now that the CPU's are out, and they aren't).

I don't know if the people doing HP performance testing have
TSX-enabled machines, but hey, maybe. So you guys are cc'd too.

I also didn't actually check if performance is affected. I doubt it is
measurable on this machine, especially on "make -j" that spends 90% of
its time in user space. But the profile comparison really does make it
look good..

Comments?

                         Linus

[-- Attachment #2: patch.diff --]
[-- Type: application/octet-stream, Size: 1614 bytes --]

 fs/dcache.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/fs/dcache.c b/fs/dcache.c
index 41000305d716..c988806b941e 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -603,6 +603,9 @@ relock:
  * Real recursion would eat up our stack space.
  */

+#define is_simple_dput(dentry) \
+	(((dentry)->d_flags & (DCACHE_REFERENCED |DCACHE_LRU_LIST)) == (DCACHE_REFERENCED |DCACHE_LRU_LIST))
+
 /*
  * dput - release a dentry
  * @dentry: dentry to release 
@@ -617,6 +620,35 @@ void dput(struct dentry *dentry)
 	if (unlikely(!dentry))
 		return;

+	/*
+	 * Try RTM for the trivial - and common - case.
+	 *
+	 * We don't do this for DCACHE_OP_DELETE (which is a static flag,
+	 * so check it outside the transaction), and we require that the
+	 * dentry is already marked referenced and on the LRU list.
+	 *
+	 * If that is true, and the dentry is not locked, we can just
+	 * decrement the usage count.
+	 *
+	 * This is kind of a special super-case of lockref_put(), but
+	 * atomically testing the dentry flags to make sure that there
+	 * is nothing else we need to look at.
+	 */
+	if (unlikely(dentry->d_flags & DCACHE_OP_DELETE))
+		goto repeat;
+	asm goto("xbegin %l[repeat]": : :"memory","ax":repeat);
+	if (unlikely(d_unhashed(dentry)))
+		goto xabort;
+	if (unlikely(!is_simple_dput(dentry)))
+		goto xabort;
+	if (unlikely(!arch_spin_value_unlocked(dentry->d_lock.rlock.raw_lock)))
+		goto xabort;
+	dentry->d_lockref.count--;
+	asm volatile("xend");
+	return;
+
+xabort:
+	asm volatile("xabort $0");
 repeat:
 	if (lockref_put_or_lock(&dentry->d_lockref))
 		return;

^ permalink raw reply related

* Re: [PATCH 1/2][v7] powerpc/mpc85xx:Add initial device tree support of T104x
From: Scott Wood @ 2013-09-30 19:47 UTC (permalink / raw)
  To: Prabhakar Kushwaha
  Cc: Varun Sethi, linuxppc-dev, Poonam Aggrwal, Priyanka Jain
In-Reply-To: <1380524042-13720-1-git-send-email-prabhakar@freescale.com>

On Mon, 2013-09-30 at 12:24 +0530, Prabhakar Kushwaha wrote:
>     - Removed l2switch. It will be added later

Why?

> +sata@220000 {
> +			fsl,iommu-parent = <&pamu0>;
> +			fsl,liodn-reg = <&guts 0x550>; /* SATA1LIODNR */
> +};
> +/include/ "qoriq-sata2-1.dtsi"
> +sata@221000 {
> +			fsl,iommu-parent = <&pamu0>;
> +			fsl,liodn-reg = <&guts 0x554>; /* SATA2LIODNR */
> +};

Whitespace

> +/include/ "qoriq-sec5.0-0.dtsi"
> +};
> diff --git a/arch/powerpc/boot/dts/fsl/t1042si-post.dtsi b/arch/powerpc/boot/dts/fsl/t1042si-post.dtsi
> new file mode 100644
> index 0000000..f286a50
> --- /dev/null
> +++ b/arch/powerpc/boot/dts/fsl/t1042si-post.dtsi
> @@ -0,0 +1,35 @@
> +/*
> + * T1042 Silicon/SoC Device Tree Source (post include)
> + *
> + * Copyright 2013 Freescale Semiconductor Inc.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions are met:
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in the
> + *       documentation and/or other materials provided with the distribution.
> + *     * Neither the name of Freescale Semiconductor nor the
> + *       names of its contributors may be used to endorse or promote products
> + *       derived from this software without specific prior written permission.
> + *
> + *
> + * ALTERNATIVELY, this software may be distributed under the terms of the
> + * GNU General Public License ("GPL") as published by the Free Software
> + * Foundation, either version 2 of that License or (at your option) any
> + * later version.
> + *
> + * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
> + * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
> + * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
> + * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
> + * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
> + * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
> + * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
> + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
> + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +/include/ "t1040si-post.dtsi"

Should at least have a comment indicating that eventually this should
hold the l2 switch node.

> +	aliases {
> +		ccsr = &soc;
> +		dcsr = &dcsr;
> +
> +		serial0 = &serial0;
> +		serial1 = &serial1;
> +		serial2 = &serial2;
> +		serial3 = &serial3;
> +		pci0 = &pci0;
> +		pci1 = &pci1;
> +		pci2 = &pci2;
> +		pci3 = &pci3;
> +		usb0 = &usb0;
> +		usb1 = &usb1;
> +		sdhc = &sdhc;
> +
> +		crypto = &crypto;
> +
> +	};
> +
> +	cpus {
> +		#address-cells = <1>;
> +		#size-cells = <0>;
[snip]
> +			L2_4: l2-cache {
> +				next-level-cache = <&cpc>;
> +			};
> +		};
> +
> +	};
> +};

Don't leave a blank line before a closing brace.

-Scott

^ permalink raw reply

* Re: Avoiding the dentry d_lock on final dput(), part deux: transactional memory
From: Linus Torvalds @ 2013-09-30 20:04 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, George Spelvin, Linux Kernel Mailing List,
	Chandramouleeswaran, Aswin, Norton, Scott J, linux-fsdevel,
	ppc-dev, Ingo Molnar
In-Reply-To: <5249D897.1010109@hp.com>

On Mon, Sep 30, 2013 at 1:01 PM, Waiman Long <waiman.long@hp.com> wrote:
>
> I think this patch is worth a trial if relevant hardware is more widely
> available. The TSX code certainly need to be moved to an architecture
> specific area and should be runtime enabled using a static key. We also need
> more TSX support infrastructure in place first.

I think we can pick that up from Andi's patches, he should have that.
Although that did have very x86-specific naming (ie "xbegin"). And I
don't think he used "asm goto" to quite the same advantage as this -
and I think we do want to make sure that the overhead is minimal.

                  Linus

^ permalink raw reply

* Re: Avoiding the dentry d_lock on final dput(), part deux: transactional memory
From: Waiman Long @ 2013-09-30 20:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, George Spelvin, Linux Kernel Mailing List,
	Chandramouleeswaran, Aswin, Norton, Scott J, linux-fsdevel,
	ppc-dev, Ingo Molnar
In-Reply-To: <CA+55aFwp-RAKEMVvbMG53C=kTMqS09NPXy1eP4xEUYZpvV8X5Q@mail.gmail.com>

On 09/30/2013 03:29 PM, Linus Torvalds wrote:
> So with all the lockref work, we now avoid the dentry d_lock for
> almost all normal cases.
>
> There is one single remaining common case, though: the final dput()
> when the dentry count goes down to zero, and we need to check if we
> are supposed to get rid of the dentry (or at least put it on the LRU
> lists etc).
>
> And that's something lockref itself cannot really help us with unless
> we start adding status bits to it (eg some kind of "enable slow-case"
> bit in the lock part of the lockref). Which sounds like a clever but
> very fragile approach.
>
> However, I did get myself a i7-4770S exactly because I was
> forward-thinking, and wanted to try using transactional memory for
> these kinds of things.
>
> Quite frankly, from all I've seen so far, the kernel is not going to
> have very good luck with things like lock elision, because we're
> really fine-grained already, and at least the Intel lock-elision
> (don't know about POWER8) basically requires software to do prediction
> on whether the transaction will succeed or not, dynamically based on
> aborts etc. And quite frankly, by the time you have to do things like
> that, you've already lost. We're better off just using our normal
> locks.
>
> So as far as I'm concerned, transactional memory is going to be useful
> - *if* it is useful - only for specialized code. Some of that might be
> architecture-internal lock implementations, other things might be
> exactly the dput() kind of situation.
>
> And the thing is, *normally* dput() doesn't need to do anything at
> all, except decrement the dentry reference count. However, for that
> normal case to be true, we need to atomically check:
>
>   - that the dentry lock isn't held (same as lockref)
>   - that we are already on the LRU list and don't need to add ourselves to it
>   - that we already have the DCACHE_REFERENCED bit set and don't need to set it
>   - that the dentry isn't unhashed and needs to be killed.
>
> Additionally, we need to check that it's not a dentry that has a
> "d_delete()" operation, but that's a static attribute of a dentry, so
> that's not something that we need to check atomically wrt the other
> things.
>
> ANYWAY. With all that out of the way, the basic point is that this is
> really simple, and fits very well with even very limited transactional
> memory. We literally need to do just a single write, and something
> like three reads from memory. And we already have a trivial fallback,
> namely the old code using the lockrefs.  IOW, it's literally ten
> straight-line instructions between the xbegin and the xend for me.
>
> So here's a patch that works for me. It requires gcc to know "asm
> goto", and it requires binutils that know about xbegin/xabort. And it
> requires a CPU that supports the intel RTM extensions.
>
> But I'm cc'ing the POWER people, because I don't know the POWER8
> interfaces, and I don't want to necessarily call this "xbegin"/"xend"
> when I actually wrap it in some helper functions.
>
> Anyway, profiles with this look beautiful (I'm using "make -j" on a
> fully built allmodconfig kernel tree as the source of profile data).
> There's no spinlocks from dput at all, and the dput() profile itself
> shows basically 1% in anything but the fastpath (probably the _very_
> occasional first accesses where we need to add things to the LRU
> lists).
>
> And the patch is small, but is obviously totally lacking any test for
> CPU support or anything like that. Or portability. But I thought I'd
> get the ball rolling, because I doubt the intel TSX patches are going
> to be useful (if they were, Intel would be crowing about performance
> numbers now that the CPU's are out, and they aren't).
>
> I don't know if the people doing HP performance testing have
> TSX-enabled machines, but hey, maybe. So you guys are cc'd too.

The Xeon class CPUs are typically about one year behind the consumer CPU 
chips. We are testing large NUMA machine with IvyBridge-EX CPUs right 
now. Haswell-EX has to be at least one year out. So we don't have the 
hardware to do the testing at the moment.

> I also didn't actually check if performance is affected. I doubt it is
> measurable on this machine, especially on "make -j" that spends 90% of
> its time in user space. But the profile comparison really does make it
> look good..
>
> Comments?
>
>                           Linus

I think this patch is worth a trial if relevant hardware is more widely 
available. The TSX code certainly need to be moved to an architecture 
specific area and should be runtime enabled using a static key. We also 
need more TSX support infrastructure in place first.

-Longman

^ permalink raw reply

* [PATCH v2 RESEND 3/3] cpufreq: pmac64: enable cpufreq on iMac G5 (iSight) model
From: Aaro Koskinen @ 2013-09-30 20:44 UTC (permalink / raw)
  To: Rafael J. Wysocki, Viresh Kumar, Benjamin Herrenschmidt, linux-pm,
	linuxppc-dev
  Cc: Aaro Koskinen
In-Reply-To: <1380573873-14448-1-git-send-email-aaro.koskinen@iki.fi>

Enable cpufreq on iMac G5 (iSight) model. Tested with the 2.1 GHz version.

Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/pmac64-cpufreq.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/pmac64-cpufreq.c b/drivers/cpufreq/pmac64-cpufreq.c
index 0196d60..6084f9b 100644
--- a/drivers/cpufreq/pmac64-cpufreq.c
+++ b/drivers/cpufreq/pmac64-cpufreq.c
@@ -397,7 +397,8 @@ static int __init g5_neo2_cpufreq_init(struct device_node *cpunode)
 	/* Check supported platforms */
 	if (of_machine_is_compatible("PowerMac8,1") ||
 	    of_machine_is_compatible("PowerMac8,2") ||
-	    of_machine_is_compatible("PowerMac9,1"))
+	    of_machine_is_compatible("PowerMac9,1") ||
+	    of_machine_is_compatible("PowerMac12,1"))
 		use_volts_smu = 1;
 	else if (of_machine_is_compatible("PowerMac11,2"))
 		use_volts_vdnap = 1;
-- 
1.8.4.rc3

^ permalink raw reply related

* [PATCH v2 RESEND 1/3] cpufreq: pmac64: speed up frequency switch
From: Aaro Koskinen @ 2013-09-30 20:44 UTC (permalink / raw)
  To: Rafael J. Wysocki, Viresh Kumar, Benjamin Herrenschmidt, linux-pm,
	linuxppc-dev
  Cc: Aaro Koskinen
In-Reply-To: <1380573873-14448-1-git-send-email-aaro.koskinen@iki.fi>

Some functions on switch path use msleep() which is inaccurate, and
depends on HZ. With HZ=100 msleep(1) takes actually over ten times longer.
Using usleep_range() we get more accurate sleeps.

I measured the "pfunc_slewing_done" polling to take 300us at max (on
2.3GHz dual-processor Xserve G5), so using 500us sleep there should
be fine.

With the patch, g5_switch_freq() duration drops from ~50ms to ~10ms on
Xserve with HZ=100.

Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/pmac64-cpufreq.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/cpufreq/pmac64-cpufreq.c b/drivers/cpufreq/pmac64-cpufreq.c
index 3a51ad7..46ae223 100644
--- a/drivers/cpufreq/pmac64-cpufreq.c
+++ b/drivers/cpufreq/pmac64-cpufreq.c
@@ -142,7 +142,7 @@ static void g5_vdnap_switch_volt(int speed_mode)
 		pmf_call_one(pfunc_vdnap0_complete, &args);
 		if (done)
 			break;
-		msleep(1);
+		usleep_range(1000, 1000);
 	}
 	if (done == 0)
 		printk(KERN_WARNING "cpufreq: Timeout in clock slewing !\n");
@@ -241,7 +241,7 @@ static void g5_pfunc_switch_volt(int speed_mode)
 		if (pfunc_cpu1_volt_low)
 			pmf_call_one(pfunc_cpu1_volt_low, NULL);
 	}
-	msleep(10); /* should be faster , to fix */
+	usleep_range(10000, 10000); /* should be faster , to fix */
 }
 
 /*
@@ -286,7 +286,7 @@ static int g5_pfunc_switch_freq(int speed_mode)
 		pmf_call_one(pfunc_slewing_done, &args);
 		if (done)
 			break;
-		msleep(1);
+		usleep_range(500, 500);
 	}
 	if (done == 0)
 		printk(KERN_WARNING "cpufreq: Timeout in clock slewing !\n");
-- 
1.8.4.rc3

^ permalink raw reply related

* [PATCH v2 RESEND 0/3] cpufreq/ondemand support for Xserve G5 & iMac G5 iSight
From: Aaro Koskinen @ 2013-09-30 20:44 UTC (permalink / raw)
  To: Rafael J. Wysocki, Viresh Kumar, Benjamin Herrenschmidt, linux-pm,
	linuxppc-dev
  Cc: Aaro Koskinen

Hi,

This is a resend of the v2 patchset:

	http://marc.info/?t=137797013200001&r=1&w=2

No changes except rebasing. Any chance to get these into v3.13?

Aaro Koskinen (3):
  cpufreq: pmac64: speed up frequency switch
  cpufreq: pmac64: provide cpufreq transition latency for older G5
    models
  cpufreq: pmac64: enable cpufreq on iMac G5 (iSight) model

 drivers/cpufreq/pmac64-cpufreq.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

-- 
1.8.4.rc3

^ permalink raw reply

* [PATCH v2 RESEND 2/3] cpufreq: pmac64: provide cpufreq transition latency for older G5 models
From: Aaro Koskinen @ 2013-09-30 20:44 UTC (permalink / raw)
  To: Rafael J. Wysocki, Viresh Kumar, Benjamin Herrenschmidt, linux-pm,
	linuxppc-dev
  Cc: Aaro Koskinen
In-Reply-To: <1380573873-14448-1-git-send-email-aaro.koskinen@iki.fi>

Currently cpufreq ondemand governor cannot used on older G5 models,
because the transition latency is set to CPUFREQ_ETERNAL. Provide a
value based on a measurement on Xserve G5, which happens to be also the
highest allowed latency.

Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
---
 drivers/cpufreq/pmac64-cpufreq.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/pmac64-cpufreq.c b/drivers/cpufreq/pmac64-cpufreq.c
index 46ae223..0196d60 100644
--- a/drivers/cpufreq/pmac64-cpufreq.c
+++ b/drivers/cpufreq/pmac64-cpufreq.c
@@ -647,8 +647,10 @@ static int __init g5_pm72_cpufreq_init(struct device_node *cpunode)
 	g5_cpu_freqs[0].frequency = max_freq;
 	g5_cpu_freqs[1].frequency = min_freq;
 
+	/* Based on a measurement on Xserve G5, rounded up. */
+	transition_latency = 10 * NSEC_PER_MSEC;
+
 	/* Set callbacks */
-	transition_latency = CPUFREQ_ETERNAL;
 	g5_switch_volt = g5_pfunc_switch_volt;
 	g5_switch_freq = g5_pfunc_switch_freq;
 	g5_query_freq = g5_pfunc_query_freq;
-- 
1.8.4.rc3

^ permalink raw reply related

* Re: [PATCH 2/2] pci: fsl: rework PCI driver compatible with Layerscape
From: Scott Wood @ 2013-09-30 22:41 UTC (permalink / raw)
  To: Lian Minghuan-b31939
  Cc: linux-pci, Zang Roy-R61911, Timur Tabi, Minghuan Lian,
	Bjorn Helgaas, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <5248D94E.7080904@freescale.com>

On Mon, 2013-09-30 at 09:52 +0800, Lian Minghuan-b31939 wrote:
> Hi Timur,
> 
> Thanks for your comments.
> 
> How about PCI_FSL_COMMON?

Why not just have one symbol, which is used in the makefile in both
drivers/pci and arch/<whatever>?

-Scott

^ permalink raw reply

* Re: [PATCH 2/2] pci: fsl: rework PCI driver compatible with Layerscape
From: Scott Wood @ 2013-09-30 22:50 UTC (permalink / raw)
  To: Lian Minghuan-b31939
  Cc: linux-pci, Zang Roy-R61911, Minghuan Lian, Bjorn Helgaas,
	linuxppc-dev
In-Reply-To: <5248144A.6070307@freescale.com>

On Sun, 2013-09-29 at 19:51 +0800, Lian Minghuan-b31939 wrote:
> >> +/*
> >> + * The fsl_arch_* functions are arch hooks. Those functions are
> >> + * implemented as weak symbols so that they can be overridden by
> >> + * architecture specific code if needed.
> >> + */
> >> +
> >> +/* Return PCI64 DMA offset */
> >> +u64 fsl_arch_pci64_dma_offset(void);
> > Is this always guaranteed to exist?
> [Minghuan] Yes. I define a __weak implementation in pci-fsl.c

I meant is it guaranteed that such a mapping exists?  The API doesn't
give a way to say "no".

Also you should describe what "PCI64 DMA offset" means.

> >> +/* Register PCI/PCIe controller to architecture system */
> >> +int __weak fsl_arch_pci_sys_register(struct fsl_pci *pci);
> >> +
> >> +/* Remove PCI/PCIe controller from architecture system */
> >> +void __weak fsl_arch_pci_sys_remove(struct fsl_pci *pci);
> > Why do these need to be weak?  Won't there be exactly one implementation
> > per supported arch?
> [Minghuan] I added __weak for compiling kernel when selecting pci-fsl 
> module but
> there is no related arch pci implementation.

The module should not be buildable on arches that don't have support.

-Scott

^ permalink raw reply

* Re: Avoiding the dentry d_lock on final dput(), part deux: transactional memory
From: Benjamin Herrenschmidt @ 2013-09-30 22:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Waiman Long, Peter Zijlstra, Norton, Scott J,
	Linux Kernel Mailing List, Chandramouleeswaran, Aswin,
	Michael Neuling, George Spelvin, linux-fsdevel, ppc-dev,
	Ingo Molnar
In-Reply-To: <CA+55aFwp-RAKEMVvbMG53C=kTMqS09NPXy1eP4xEUYZpvV8X5Q@mail.gmail.com>

On Mon, 2013-09-30 at 12:29 -0700, Linus Torvalds wrote:
> 
> But I'm cc'ing the POWER people, because I don't know the POWER8
> interfaces, and I don't want to necessarily call this "xbegin"/"xend"
> when I actually wrap it in some helper functions.

The main problem with powerpc TM is that we need to handle interrupts
happening while in transactional state. We currently only handle that
for userspace.

Mikey, how hard would it be to detect the in-kernel TM case and just
simulate an abort in the exception entry ? (return to tbegin basically)

Transactions in kernel make me nervous because of the PC jumping around
on aborts and how easy we can get that stuff wrong :-) They also have
interesting ordering semantics vs. locks, we need to be a tad careful
(as long as we don't access a lock variable transactionally we should be
ok. If we do, then spin_unlock needs a stronger barrier).

The basic semantic for us is tbegin. [...] tend instructions. If the
transaction fails, control returns to tbegin. (can happen at any point)
which returns a CC code indicating success or failure.

Things get really complicated if you take an interrupt however, the
transaction gets into some special "suspended" state, it doesn't just
die so we need to handle things in our interrupt entry (even if it's
just to make the transaction abort cleanly) and right now we don't when
coming from kernel space.

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH v4 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle
From: Scott Wood @ 2013-09-30 23:06 UTC (permalink / raw)
  To: Wang Dongsheng-B40534
  Cc: Wood Scott-B07421, linuxppc-dev@lists.ozlabs.org,
	Bhushan Bharat-R65777
In-Reply-To: <ABB05CD9C9F68C46A5CEDC7F15439259010920B0@039-SN2MPN1-021.039d.mgd.msft.net>

On Sun, 2013-09-29 at 01:57 -0500, Wang Dongsheng-B40534 wrote:
> 
> > -----Original Message-----
> > From: Wood Scott-B07421
> > Sent: Saturday, September 28, 2013 5:33 AM
> > To: Wang Dongsheng-B40534
> > Cc: Wood Scott-B07421; Bhushan Bharat-R65777; linuxppc-
> > dev@lists.ozlabs.org
> > Subject: Re: [PATCH v4 4/4] powerpc/85xx: add sysfs for pw20 state and
> > altivec idle
> > 
> > On Thu, 2013-09-26 at 22:34 -0500, Wang Dongsheng-B40534 wrote:
> > > cycle = div_u64(ns * tb_ticks_per_usec, 1000); It's look good.
> > > But why we need:
> > > 	if (ns >= 10000)
> > > 		cycle = ((ns + 500) / 1000) * tb_ticks_per_usec; ?
> > >
> > > I think "cycle = div_u64(ns * tb_ticks_per_usec, 1000)" is good
> > > enough. :)
> > 
> > It's to deal with overflow if a very large value of ns is provided
> > (and/or tb_ticks_per_usec is larger than we expect).
> > 
> :), as I think, it's to deal with overflow. But you version cannot do deal with it.
> Because ns is u64, if ns > 0xffffffff_fffffe0b, ns + 500 will overflow, And if tb_ticks_per_usec > 1000 and ns > (0xffffffff_ffffffff / 10) cycle also will overflow.

Sigh... It significantly increases the value of ns at which you'll get
overflow problems. :-)  I was more concerned with
large-but-somewhat-reasonable values of ns (e.g. than with trying to
handle every possible input.  Even without that test, getting overflow
is stretching the bounds of reasonableness (e.g. a 1 GHz timebase would
require a timeout of over 7 months to overflow), but it was low-hanging
fruit.  And the worst case is we go to pw20 sooner than the user wanted,
so let's not go too overboard.

I doubt you would see > 0xffffffff_fffffe0b except if someone is trying
to disable it by setting 0xffffffff_ffffffff even though a separate API
is provided to cleanly disable it.

> I think we need to do this:
> 
> #define U64_LOW_MASK            0xffffffffULL
> #define U64_MASK                0xffffffffffffffffULL
> 
>         u32 tmp_rem;
>         u64 ns_u_rem, ns_u, ns_l, ns_l_carry;
>         u64 cycle;
> 
>         ns_u = ns >> 32;
>         ns_l = ns & U64_LOW_MASK;
> 
>         ns_l *= tb_ticks_per_usec;
>         ns_l_carry = ns_l >> 32;
>         ns_u *= tb_ticks_per_usec;
>         ns_u += ns_l_carry;
> 
>         ns_u = div_u64_rem(ns_u, 1000, &tmp_rem);
>         ns_u_rem = tmp_rem;
>         ns_l = (ns_l & U64_LOW_MASK) | ((ns_u_rem) << 32);
>         ns_l = div_u64(ns_l, 1000);
> 
>         if (ns_u >> 32)
>                 cycle = U64_MASK;
>         else
>                 cycle = (ns_u << 32) | (ns_l & U64_LOW_MASK);
> 
> I has already tested this code, and works good. :)

Ugh.  I don't think we need to get this complicated (and I'd rather not
spend the time verifying the correctness of this).

If for some reason we did need something like this in some other context
(I don't want to see it just for pw20), I'd be more inclined to see
general 128-bit mult/divide support.

-Scott

^ permalink raw reply

* Re: Avoiding the dentry d_lock on final dput(), part deux: transactional memory
From: Michael Neuling @ 2013-10-01  0:36 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Waiman Long, Peter Zijlstra, Norton, Scott J, ppc-dev,
	Linux Kernel Mailing List, Chandramouleeswaran, Aswin,
	George Spelvin, linux-fsdevel, Linus Torvalds, Ingo Molnar
In-Reply-To: <1380581522.6396.20.camel@pasglop>

Ben,

> On Mon, 2013-09-30 at 12:29 -0700, Linus Torvalds wrote:
> > 
> > But I'm cc'ing the POWER people, because I don't know the POWER8
> > interfaces, and I don't want to necessarily call this "xbegin"/"xend"
> > when I actually wrap it in some helper functions.
> 
> The main problem with powerpc TM is that we need to handle interrupts
> happening while in transactional state. We currently only handle that
> for userspace.

Yep.  

> Mikey, how hard would it be to detect the in-kernel TM case and just
> simulate an abort in the exception entry ? (return to tbegin basically)

It's doable.

The scary part is that we to make all register volatile.  You were not
that keen on doing this as there are a lot of places in exception
entry/exit where we only save/restore a subset of the registers.  We'd
need to catch all these.

> Transactions in kernel make me nervous because of the PC jumping
> around on aborts and how easy we can get that stuff wrong :-) 

The same applies for userspace.  We are pretty careful not to screw that
up though.

It's also one of the reason we don't do software rollback of userspace
transactions even in transactional (non-suspended) mode.  We always save
and restore all state and let the HW deal with the PC and state jumping
around.

> They also have interesting ordering semantics vs. locks, we need to be
> a tad careful (as long as we don't access a lock variable
> transactionally we should be ok. If we do, then spin_unlock needs a
> stronger barrier).

Yep.

> The basic semantic for us is tbegin. [...] tend instructions. If the
> transaction fails, control returns to tbegin. (can happen at any
> point) which returns a CC code indicating success or failure.

FWIW eg.

     tbegin
     beq abort /* passes first time through */
     ....
     transactional stuff
     ....
     tend
     b pass
     
abort:

pass:

> Things get really complicated if you take an interrupt however, the
> transaction gets into some special "suspended" state, it doesn't just
> die so we need to handle things in our interrupt entry (even if it's
> just to make the transaction abort cleanly) and right now we don't
> when coming from kernel space.

Yep, but we could check easily enough.

Mikey

^ permalink raw reply

* Re: Avoiding the dentry d_lock on final dput(), part deux: transactional memory
From: Linus Torvalds @ 2013-10-01  0:56 UTC (permalink / raw)
  To: Michael Neuling
  Cc: Waiman Long, Peter Zijlstra, George Spelvin,
	Linux Kernel Mailing List, Chandramouleeswaran, Aswin,
	Norton, Scott J, linux-fsdevel, ppc-dev, Ingo Molnar
In-Reply-To: <23118.1380587771@ale.ozlabs.ibm.com>

On Mon, Sep 30, 2013 at 5:36 PM, Michael Neuling <mikey@neuling.org> wrote:
>
> The scary part is that we to make all register volatile.  You were not
> that keen on doing this as there are a lot of places in exception
> entry/exit where we only save/restore a subset of the registers.  We'd
> need to catch all these.

Ugh. It's very possible it's not worth using for the kernel then. The
example I posted is normally fine *without* any transactional support,
since it's a very local per-dentry lock, and since we only take that
lock when the last reference drops (so it's not some common directory
dentry, it's a end-point file dentry). In fact, on ARM they just made
the cmpxchg much faster by making it entirely non-serializing (since
it only updates a reference count, there is no locking involved apart
from checking that the lock state is unlocked)

So there is basically never any contention, and the transaction needs
to basically be pretty much the same cost as a "cmpxchg". It's not
clear if the intel TSX is good enough for that, and if you have to
save a lot of registers in order to use transactions on POWER8, I
doubt it's worthwhile.

We have very few - if any - locks where contention or even cache
bouncing is common or normal. Sure, we have a few particular loads
that can trigger it, but even that is becoming rare. So from a
performance standpoint, the target always needs to be "comparable to
hot spinlock in local cache".

>> They also have interesting ordering semantics vs. locks, we need to be
>> a tad careful (as long as we don't access a lock variable
>> transactionally we should be ok. If we do, then spin_unlock needs a
>> stronger barrier).
>
> Yep.

Well, just about any kernel transaction will at least read the state
of a lock. Without that, it's generally totally useless. My dput()
example sequence very much verified that the lock was not held, for
example.

I'm not sure how that affects anything. The actual transaction had
better not be visible inside the locked region (ie as far as any lock
users go, transactions better all happen fully before or after the
lock, if they read the lock and see it being unlocked).

That said, I cannot see how POWER8 could possibly violate that rule.
The whole "transactions are atomic" is kind of the whole and only
point of a transaction. So I'm not sure what odd lock restrictions
POWER8 could have.

> FWIW eg.
>
>      tbegin
>      beq abort /* passes first time through */
>      ....
>      transactional stuff
>      ....
>      tend
>      b pass
>
> abort:
>
> pass:

That's fine, and matches the x86 semantics fairly closely, except
"xbegin" kind of "contains" that "jump to abort address". But we could
definitely use the same models. Call it
"transaction_begin/abort/end()", and it should be architecture-neutral
naming-wise.

Of course, if tbegin then acts basically like some crazy
assembly-level setjmp (I'm guessing it does exactly, and presumably
precisely that kind of compiler support - ie a function with
"__attribute((returns_twice))" in gcc-speak), the overhead of doing it
may kill it.

            Linus

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox