LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH] iommu/fsl_pamu: use physical cpu index to find the matched cpu nodes
From: Scott Wood @ 2013-11-19  3:04 UTC (permalink / raw)
  To: Varun Sethi
  Cc: joro@8bytes.org, linuxppc-dev@lists.ozlabs.org,
	iommu@lists.linux-foundation.org
In-Reply-To: <C5ECD7A89D1DC44195F34B25E172658D0A5B966D@039-SN2MPN1-012.039d.mgd.msft.net>

On Mon, 2013-11-18 at 20:42 -0600, Varun Sethi wrote:
> For the DSP case again we have to set up the stash attribute. Are you saying that this should be a separate attribute?

Not necessarily a separate attribute, but there should be some way to
distinguish whether you're providing a Linux cpu number or some external
stash destination.

-Scott

^ permalink raw reply

* RE: [PATCH] iommu/fsl_pamu: use physical cpu index to find the matched cpu nodes
From: Varun Sethi @ 2013-11-19  2:42 UTC (permalink / raw)
  To: Scott Wood
  Cc: joro@8bytes.org, linuxppc-dev@lists.ozlabs.org,
	iommu@lists.linux-foundation.org
In-Reply-To: <1384803449.1403.313.camel@snotra.buserror.net>

Rm9yIHRoZSBEU1AgY2FzZSBhZ2FpbiB3ZSBoYXZlIHRvIHNldCB1cCB0aGUgc3Rhc2ggYXR0cmli
dXRlLiBBcmUgeW91IHNheWluZyB0aGF0IHRoaXMgc2hvdWxkIGJlIGEgc2VwYXJhdGUgYXR0cmli
dXRlPyANCg0KLVZhcnVuDQoNCj4gLS0tLS1PcmlnaW5hbCBNZXNzYWdlLS0tLS0NCj4gRnJvbTog
V29vZCBTY290dC1CMDc0MjENCj4gU2VudDogVHVlc2RheSwgTm92ZW1iZXIgMTksIDIwMTMgMTow
NyBBTQ0KPiBUbzogU2V0aGkgVmFydW4tQjE2Mzk1DQo+IENjOiBXYW5nIEhhaXlpbmctUjU0OTY0
OyBqb3JvQDhieXRlcy5vcmc7IGlvbW11QGxpc3RzLmxpbnV4LQ0KPiBmb3VuZGF0aW9uLm9yZzsg
bGludXhwcGMtZGV2QGxpc3RzLm96bGFicy5vcmcNCj4gU3ViamVjdDogUmU6IFtQQVRDSF0gaW9t
bXUvZnNsX3BhbXU6IHVzZSBwaHlzaWNhbCBjcHUgaW5kZXggdG8gZmluZCB0aGUNCj4gbWF0Y2hl
ZCBjcHUgbm9kZXMNCj4gDQo+IE9uIFRodSwgMjAxMy0xMS0xNCBhdCAyMToxNiAtMDYwMCwgU2V0
aGkgVmFydW4tQjE2Mzk1IHdyb3RlOg0KPiA+IEhhaXlpbmcvU2NvdHQsDQo+ID4gRm9yZ290IHRv
IG1lbnRpb24gdGhpcywgdGhlIFBBTVUgZHJpdmVyIGhhcyB0byBoYW5kbGUgc3Rhc2gNCj4gPiBk
ZXN0aW5hdGlvbiBzZXR0aW5ncyBib3RoIGZvciBwb3dlciBhbmQgZHNwIGNvcmVzIChvbiBCNCBw
bGF0Zm9ybSkuDQo+ID4gRm9yIHRoZSBkc3AgY29yZXMgd2Ugd291bGQgZXhwZWN0IHRoZSBwaHlz
aWNhbCBjb3JlIGlkIChub3QgY29udHJvbGxlZA0KPiBieSBMaW51eCkuDQo+ID4gVG8gbWFrZSB0
aGUgaW50ZXJmYWNlIGNvbnNpc3RlbnQsIEkgd291bGQgZXhwZWN0IHRoZSBjYWxsZXIgKGZvcg0K
PiA+IGlvbW11X3NldF9hdHRyKSB0byBwYXNzIHRoZSBwaHlzaWNhbCBjb3JlIGlkLg0KPiANCj4g
VGhhdCBzb3VuZHMgbGlrZSB5b3UgbmVlZCB0d28gZGlmZmVyZW50IGludGVyZmFjZXMuDQo+IA0K
PiAtU2NvdHQNCj4gDQoNCg==

^ permalink raw reply

* Re: [PATCH] powerpc: Don't use ELFv2 ABI to build the kernel
From: Alistair Popple @ 2013-11-19  0:46 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev
In-Reply-To: <1384802965.1403.304.camel@snotra.buserror.net>

[-- Attachment #1: Type: text/plain, Size: 733 bytes --]

On Mon, 18 Nov 2013 13:29:25 Scott Wood wrote:
> On Mon, 2013-11-18 at 17:21 +1100, Alistair Popple wrote:

[snip]

> 
> How hard would it be to get the kernel to work with the new ABI?

We are preparing patches at the moment to allow the kernel to support a 
userspace compiled with the new ABI, however the kernel itself still needs to 
be built using the old ABI.

I'm not sure how hard it would be to compile a kernel with the new ABI. It 
mostly compiles now however I'm guessing there's probably some code that would 
need porting to make it functional.

> Do you know where I can find a document for it?

The ABI is still under development so there isn't any documentation available 
for it yet.

> -Scott

Regards,

Alistair

[-- Attachment #2: Type: text/html, Size: 4307 bytes --]

^ permalink raw reply

* Re: [PATCH v4] powerpc: kvm: fix rare but potential deadlock scene
From: Alexander Graf @ 2013-11-18 21:43 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: linuxppc-dev, Liu Ping Fan, kvm-ppc,
	kvm@vger.kernel.org mailing list
In-Reply-To: <6E84B29F-6921-4309-9A73-3AD8D8C6872A@suse.de>


On 18.11.2013, at 16:32, Alexander Graf <agraf@suse.de> wrote:

>=20
> On 16.11.2013, at 01:55, Paul Mackerras <paulus@samba.org> wrote:
>=20
>> On Fri, Nov 15, 2013 at 04:35:00PM +0800, Liu Ping Fan wrote:
>>> Since kvmppc_hv_find_lock_hpte() is called from both virtmode and
>>> realmode, so it can trigger the deadlock.
>>>=20
>>> Suppose the following scene:
>>>=20
>>> Two physical cpuM, cpuN, two VM instances A, B, each VM has a group =
of
>>> vcpus.
>>>=20
>>> If on cpuM, vcpu_A_1 holds bitlock X (HPTE_V_HVLOCK), then is =
switched
>>> out, and on cpuN, vcpu_A_2 try to lock X in realmode, then cpuN will =
be
>>> caught in realmode for a long time.
>>>=20
>>> What makes things even worse if the following happens,
>>> On cpuM, bitlockX is hold, on cpuN, Y is hold.
>>> vcpu_B_2 try to lock Y on cpuM in realmode
>>> vcpu_A_2 try to lock X on cpuN in realmode
>>>=20
>>> Oops! deadlock happens
>>>=20
>>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>=20
>> Reviewed-by: Paul Mackerras <paulus@samba.org>
>=20
> Thanks, applied to kvm-ppc-queue.

Actually, I've changed my mind and moved the patch to the for-3.13 =
branch instead. Please make sure to CC kvm@vger on all patches you =
submit though.


Alex

^ permalink raw reply

* Re: [PATCH v4] powerpc: kvm: fix rare but potential deadlock scene
From: Alexander Graf @ 2013-11-18 21:32 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev, Liu Ping Fan, kvm-ppc
In-Reply-To: <20131116065517.GA18339@iris.ozlabs.ibm.com>


On 16.11.2013, at 01:55, Paul Mackerras <paulus@samba.org> wrote:

> On Fri, Nov 15, 2013 at 04:35:00PM +0800, Liu Ping Fan wrote:
>> Since kvmppc_hv_find_lock_hpte() is called from both virtmode and
>> realmode, so it can trigger the deadlock.
>> 
>> Suppose the following scene:
>> 
>> Two physical cpuM, cpuN, two VM instances A, B, each VM has a group of
>> vcpus.
>> 
>> If on cpuM, vcpu_A_1 holds bitlock X (HPTE_V_HVLOCK), then is switched
>> out, and on cpuN, vcpu_A_2 try to lock X in realmode, then cpuN will be
>> caught in realmode for a long time.
>> 
>> What makes things even worse if the following happens,
>>  On cpuM, bitlockX is hold, on cpuN, Y is hold.
>>  vcpu_B_2 try to lock Y on cpuM in realmode
>>  vcpu_A_2 try to lock X on cpuN in realmode
>> 
>> Oops! deadlock happens
>> 
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 
> Reviewed-by: Paul Mackerras <paulus@samba.org>

Thanks, applied to kvm-ppc-queue.


Alex

^ permalink raw reply

* Re: [PATCH v3] powerpc: kvm: optimize "sc 1" as fast return
From: Alexander Graf @ 2013-11-18 20:45 UTC (permalink / raw)
  To: Liu Ping Fan; +Cc: Paul Mackerras, linuxppc-dev, kvm-ppc
In-Reply-To: <1384736947-21437-1-git-send-email-pingfank@linux.vnet.ibm.com>


On 17.11.2013, at 20:09, Liu Ping Fan <qemulist@gmail.com> wrote:

> In some scene, e.g openstack CI, PR guest can trigger "sc 1" =
frequently,
> this patch optimizes the path by directly delivering =
BOOK3S_INTERRUPT_SYSCALL
> to HV guest, so powernv can return to HV guest without heavy exit, =
i.e,
> no need to swap TLB, HTAB,.. etc
>=20
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
> arch/powerpc/kvm/book3s_hv.c            |  6 ------
> arch/powerpc/kvm/book3s_hv_rmhandlers.S | 12 +++++++++++-
> 2 files changed, 11 insertions(+), 7 deletions(-)
>=20
> diff --git a/arch/powerpc/kvm/book3s_hv.c =
b/arch/powerpc/kvm/book3s_hv.c
> index 62a2b5a..73dc852 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -628,12 +628,6 @@ static int kvmppc_handle_exit(struct kvm_run =
*run, struct kvm_vcpu *vcpu,
> 		/* hcall - punt to userspace */
> 		int i;
>=20
> -		if (vcpu->arch.shregs.msr & MSR_PR) {
> -			/* sc 1 from userspace - reflect to guest =
syscall */
> -			kvmppc_book3s_queue_irqprio(vcpu, =
BOOK3S_INTERRUPT_SYSCALL);
> -			r =3D RESUME_GUEST;
> -			break;
> -		}

Please document the fact that we never get here for

 1) hypercall with MSR_PR
 2) hypercall that was handled by rm code

so that when anyone reads the C file he doesn't have to dig through =
piles of asm code to understand what's happening.

> 		run->papr_hcall.nr =3D kvmppc_get_gpr(vcpu, 3);
> 		for (i =3D 0; i < 9; ++i)
> 			run->papr_hcall.args[i] =3D kvmppc_get_gpr(vcpu, =
4 + i);
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S =
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index c71103b..0d1e2c2 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -1388,7 +1388,8 @@ kvmppc_hisi:
> hcall_try_real_mode:
> 	ld	r3,VCPU_GPR(R3)(r9)
> 	andi.	r0,r11,MSR_PR
> -	bne	guest_exit_cont
> +	/* sc 1 from userspace - reflect to guest syscall */
> +	bne	sc_1_fast_return
> 	clrrdi	r3,r3,2
> 	cmpldi	r3,hcall_real_table_end - hcall_real_table
> 	bge	guest_exit_cont
> @@ -1409,6 +1410,15 @@ hcall_try_real_mode:
> 	ld	r11,VCPU_MSR(r4)
> 	b	fast_guest_return
>=20
> +sc_1_fast_return:
> +	mtspr	SPRN_SRR0,r10
> +	mtspr	SPRN_SRR1,r11
> +	li	r10, BOOK3S_INTERRUPT_SYSCALL
> +	li	r11, (MSR_ME << 1) | 1  /* synthesize MSR_SF | MSR_ME */
> +	rotldi	r11, r11, 63
> +	mr	r4,r9
> +	b	fast_guest_return

While at it, please also document the input registers to =
fast_guest_return. As you've seen it's very easy to get them right when =
they're not properly documented in the header of the function.

Apart from these minor bits, the patch looks very sound and hopefully =
speeds up PR-in-HV by quite a bit.


Alex

^ permalink raw reply

* Re: [PATCH] iommu/fsl_pamu: use physical cpu index to find the matched cpu nodes
From: Scott Wood @ 2013-11-18 19:37 UTC (permalink / raw)
  To: Sethi Varun-B16395
  Cc: joro@8bytes.org, linuxppc-dev@lists.ozlabs.org,
	iommu@lists.linux-foundation.org
In-Reply-To: <C5ECD7A89D1DC44195F34B25E172658D0A5B0AE8@039-SN2MPN1-012.039d.mgd.msft.net>

On Thu, 2013-11-14 at 21:16 -0600, Sethi Varun-B16395 wrote:
> Haiying/Scott,
> Forgot to mention this, the PAMU driver has to handle stash destination
> settings both for power and dsp cores (on B4 platform). For the dsp
> cores we would expect the physical core id (not controlled by Linux).
> To make the interface consistent, I would expect the caller (for
> iommu_set_attr) to pass the physical core id.

That sounds like you need two different interfaces.

-Scott

^ permalink raw reply

* Re: [PATCH] powerpc: Don't use ELFv2 ABI to build the kernel
From: Scott Wood @ 2013-11-18 19:29 UTC (permalink / raw)
  To: Alistair Popple; +Cc: linuxppc-dev
In-Reply-To: <1384755694-30293-1-git-send-email-alistair@popple.id.au>

On Mon, 2013-11-18 at 17:21 +1100, Alistair Popple wrote:
> The kernel doesn't build correctly using the ELFv2 ABI.  This patch
> ensures that the ELFv1 ABI is used when building a kernel with an
> ELFv2 enabled compiler.
> 
> Signed-off-by: Alistair Popple <alistair@popple.id.au>
> ---
>  arch/powerpc/Makefile | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> index 607acf5..8a24636 100644
> --- a/arch/powerpc/Makefile
> +++ b/arch/powerpc/Makefile
> @@ -111,6 +111,7 @@ endif
>  endif
>  
>  CFLAGS-$(CONFIG_PPC64)	:= -mtraceback=no -mcall-aixdesc
> +CFLAGS-$(CONFIG_PPC64)	+= $(call cc-option,-mabi=elfv1)
>  CFLAGS-$(CONFIG_PPC64)	+= $(call cc-option,-mcmodel=medium,-mminimal-toc)
>  CFLAGS-$(CONFIG_PPC64)	+= $(call cc-option,-mno-pointers-to-nested-functions)
>  CFLAGS-$(CONFIG_PPC32)	:= -ffixed-r2 $(MULTIPLEWORD)

How hard would it be to get the kernel to work with the new ABI?

Do you know where I can find a document for it?

-Scott

^ permalink raw reply

* Re: Ping^2 Re: [PATCH 0/6] powerpc/math-emu: e500 SPE float emulation fixes
From: Scott Wood @ 2013-11-18 19:07 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: Liu Yu, linuxppc-dev, linux-kernel, Shan Hai
In-Reply-To: <Pine.LNX.4.64.1311181453560.32728@digraph.polyomino.org.uk>

On Mon, 2013-11-18 at 14:54 +0000, Joseph S. Myers wrote:
> Ping^2.  I still haven't seen any comments on any of these patches.
> 

Sorry for the delay -- I plan to take a look at them, but they came in
too late for the last pull request, especially since I'm not that
familiar with SPE and thus it'll take some time (and I've got other
time-consuming patches yet to review that came in even earlier).  As
long as the patches are still "action required" in patchwork, there's no
need to ping unless it's been a few months.

-Scott

^ permalink raw reply

* Re: [PATCH RFC v5 4/5] dma: mpc512x: register for device tree channel lookup
From: Mark Rutland @ 2013-11-18 15:23 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: devicetree@vger.kernel.org, Lars-Peter Clausen, Vinod Koul,
	Gerhard Sittig, Alexander Popov, Dan Williams, Anatolij Gustschin,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <201311181531.54818.arnd@arndb.de>

On Mon, Nov 18, 2013 at 02:31:54PM +0000, Arnd Bergmann wrote:
> On Monday 18 November 2013, Mark Rutland wrote:
> > > +++ b/Documentation/devicetree/bindings/dma/mpc512x-dma.txt
> > > @@ -0,0 +1,55 @@
> > > +* Freescale MPC512x DMA Controller
> > > +
> > > +The DMA controller in the Freescale MPC512x SoC can move blocks of
> > > +memory contents between memory and peripherals or memory to memory.
> > > +
> > > +Refer to the "Generic DMA Controller and DMA request bindings" description
> > > +in the dma.txt file for a more detailled discussion of the binding.  The
> > > +MPC512x DMA engine binding follows the common scheme, but doesn't provide
> > > +support for the optional channels and requests counters (those values are
> > > +derived from the detected hardware features) and has a fixed client
> > > +specifier length of 1 integer cell (the value is the DMA channel, since
> > > +the DMA controller uses a fixed assignment of request lines per channel).
> > 
> > The fact that #dma-cells must be <1> isn't a difference from the
> > standard binding, and needs not be described here. The meaning of the
> > value should be in your description of #dma-cells below.
> 
> I think the value it has to be in there, and I have in the past asked other
> people to add this. Note that in the generic binding, it says that it must
> be "at least 1". You can have controllers that require a larger number, or
> that can use 1 or 2 alternatively, depending on how the device is wired
> up, e.g. when a dma controller has two master ports you would need a
> second cell to specify the port number, but only if more than one port
> is actually connected to a slave.

The number of cells required should be described. My points were that it
should be described at the property description rather than in the introduction,
and that the fact that a specific value was required was not a
difference from the bindings as the paragraph implied.

> 
> > I'm not sure it's worth mentioning optional channels / request counters.
> > If anything, it would be better to update dma.txt to move the "Optional
> > properties" to something like "Suggested properties"...
> 
> These are less clearly defined. In the generic binding, it's mostly a matter
> of "if you need to pass this information, use these properties". The individual
> binding can then make them mandatory if needed.

Agreed. I'd prefer that bindings described the suggested properties they
used, rather than those they don't.

Thanks,
Mark.

^ permalink raw reply

* Ping^2 Re: [PATCH 0/6] powerpc/math-emu: e500 SPE float emulation fixes
From: Joseph S. Myers @ 2013-11-18 14:54 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Liu Yu, linux-kernel, Shan Hai
In-Reply-To: <Pine.LNX.4.64.1311111428160.18663@digraph.polyomino.org.uk>

Ping^2.  I still haven't seen any comments on any of these patches.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply

* Re: [PATCH RFC v5 4/5] dma: mpc512x: register for device tree channel lookup
From: Arnd Bergmann @ 2013-11-18 14:31 UTC (permalink / raw)
  To: Mark Rutland
  Cc: devicetree@vger.kernel.org, Lars-Peter Clausen, Vinod Koul,
	Gerhard Sittig, Alexander Popov, Dan Williams, Anatolij Gustschin,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <20131118120957.GH30853@e106331-lin.cambridge.arm.com>

On Monday 18 November 2013, Mark Rutland wrote:
> > +++ b/Documentation/devicetree/bindings/dma/mpc512x-dma.txt
> > @@ -0,0 +1,55 @@
> > +* Freescale MPC512x DMA Controller
> > +
> > +The DMA controller in the Freescale MPC512x SoC can move blocks of
> > +memory contents between memory and peripherals or memory to memory.
> > +
> > +Refer to the "Generic DMA Controller and DMA request bindings" description
> > +in the dma.txt file for a more detailled discussion of the binding.  The
> > +MPC512x DMA engine binding follows the common scheme, but doesn't provide
> > +support for the optional channels and requests counters (those values are
> > +derived from the detected hardware features) and has a fixed client
> > +specifier length of 1 integer cell (the value is the DMA channel, since
> > +the DMA controller uses a fixed assignment of request lines per channel).
> 
> The fact that #dma-cells must be <1> isn't a difference from the
> standard binding, and needs not be described here. The meaning of the
> value should be in your description of #dma-cells below.

I think the value it has to be in there, and I have in the past asked other
people to add this. Note that in the generic binding, it says that it must
be "at least 1". You can have controllers that require a larger number, or
that can use 1 or 2 alternatively, depending on how the device is wired
up, e.g. when a dma controller has two master ports you would need a
second cell to specify the port number, but only if more than one port
is actually connected to a slave.

> I'm not sure it's worth mentioning optional channels / request counters.
> If anything, it would be better to update dma.txt to move the "Optional
> properties" to something like "Suggested properties"...

These are less clearly defined. In the generic binding, it's mostly a matter
of "if you need to pass this information, use these properties". The individual
binding can then make them mandatory if needed.

	Arnd

^ permalink raw reply

* Re: [PATCH RFC v5 4/5] dma: mpc512x: register for device tree channel lookup
From: Mark Rutland @ 2013-11-18 12:09 UTC (permalink / raw)
  To: Alexander Popov
  Cc: devicetree@vger.kernel.org, Lars-Peter Clausen, Arnd Bergmann,
	Vinod Koul, Gerhard Sittig, Dan Williams, Anatolij Gustschin,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <1383290374-17484-5-git-send-email-a13xp0p0v88@gmail.com>

On Fri, Nov 01, 2013 at 07:19:33AM +0000, Alexander Popov wrote:
> From: Gerhard Sittig <gsi@denx.de>
> 
> register the controller for device tree based lookup of DMA channels
> (non-fatal for backwards compatibility with older device trees), provide
> the '#dma-cells' property in the shared mpc5121.dtsi file, and introduce
> a bindings document for the MPC512x DMA controller
> 
> Signed-off-by: Gerhard Sittig <gsi@denx.de>
> ---
>  .../devicetree/bindings/dma/mpc512x-dma.txt        | 55 ++++++++++++++++++++++
>  arch/powerpc/boot/dts/mpc5121.dtsi                 |  1 +
>  drivers/dma/mpc512x_dma.c                          | 21 +++++++--
>  3 files changed, 74 insertions(+), 3 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/dma/mpc512x-dma.txt
> 
> diff --git a/Documentation/devicetree/bindings/dma/mpc512x-dma.txt b/Documentation/devicetree/bindings/dma/mpc512x-dma.txt
> new file mode 100644
> index 0000000..a4867d5
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/dma/mpc512x-dma.txt
> @@ -0,0 +1,55 @@
> +* Freescale MPC512x DMA Controller
> +
> +The DMA controller in the Freescale MPC512x SoC can move blocks of
> +memory contents between memory and peripherals or memory to memory.
> +
> +Refer to the "Generic DMA Controller and DMA request bindings" description
> +in the dma.txt file for a more detailled discussion of the binding.  The
> +MPC512x DMA engine binding follows the common scheme, but doesn't provide
> +support for the optional channels and requests counters (those values are
> +derived from the detected hardware features) and has a fixed client
> +specifier length of 1 integer cell (the value is the DMA channel, since
> +the DMA controller uses a fixed assignment of request lines per channel).

The fact that #dma-cells must be <1> isn't a difference from the
standard binding, and needs not be described here. The meaning of the
value should be in your description of #dma-cells below.

I'm not sure it's worth mentioning optional channels / request counters.
If anything, it would be better to update dma.txt to move the "Optional
properties" to something like "Suggested properties"...

> +
> +
> +DMA controller node properties:
> +
> +Required properties:
> +- compatible:		should be "fsl,mpc5121-dma"
> +- reg:			address and size of the DMA controller's register set
> +- interrupts:		interrupt spec for the DMA controller
> +
> +Optional properties:
> +- #dma-cells:		must be <1>, describes the number of integer cells
> +			needed to specify the 'dmas' property in client nodes,
> +			strongly recommended since common client helper code
> +			uses this property

Rather than redundantly describing what the standard property means, it
would be better to describe what the cell actually means (i.e. which
value maps to which DMA channel in hardware, how many their are, etc).

> +
> +Example:
> +
> +	dma0: dma@14000 {
> +		compatible = "fsl,mpc5121-dma";
> +		reg = <0x14000 0x1800>;
> +		interrupts = <65 0x8>;
> +		#dma-cells = <1>;
> +	};
> +
> +
> +Client node properties:
> +
> +Required properties:
> +- dmas:			list of DMA specifiers, consisting each of a handle
> +			for the DMA controller and integer cells to specify
> +			the channel used within the DMA controller
> +- dma-names:		list of identifier strings for the DMA specifiers,
> +			client device driver code uses these strings to
> +			have DMA channels looked up at the controller
> +

I don't think its necessary or helpful to describe the client binding
here, it's in dma.txt

Thanks,
Mark.

^ permalink raw reply

* Re: [v6][PATCH 0/5] powerpc/book3e: powerpc/book3e: make kgdb to work well
From: "“tiejun.chen”" @ 2013-11-18  8:36 UTC (permalink / raw)
  To: scottwood; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1382520685-11609-1-git-send-email-tiejun.chen@windriver.com>

On 10/23/2013 05:31 PM, Tiejun Chen wrote:
> Scott,
>
> Tested on fsl-p5040 DS.

Scott,

Any comments to this version?

Tiejun

>
> v6:
>
> * rebase
> * change the C code to initialize the exception stack addresses in the PACA instead.
> * Clear the PACA_IRQ_HARD_DIS force to exit directly from this debug exception
>    without replaying interrupt.
> * so drop "book3e/kgdb: update thread's dbcr0".
>
> v5:
>
> * rebase on merge branch.
>
> Note the original patch, [ATCH 5/7] kgdb/kgdbts: support ppc64, is already merged
> by Jason.
>
> v4:
>
> * use DEFINE_PER_CPU to allocate kgdb's thread_info
> * add patch 7 to make usre copy thread_info only !__check_irq_replay
> * leave "andi.   r14,r11,MSR_PR" out of "#ifndef CONFIG_KGDB"
>    since cr0 is still used lately.
> * retest
>
> v3:
>
> * make work when enable CONFIG_RELOCATABLE
> * fix one typo in patch,
>    "powerpc/book3e: store critical/machine/debug exception thread info":
> 	ld	r1,PACAKSAVE(r13);
>      ->  ld	r14,PACAKSAVE(r13);
> * remove copying the thread_info since booke and book3e always copy
>    the thead_info now when we enter the debug exception, and so drop
>    the v2 patch, "book3e/kgdb: Fix a single stgep case of lazy IRQ"
>
> v2:
>
> * Make sure we cover CONFIG_PPC_BOOK3E_64 safely
> * Use LOAD_REG_IMMEDIATE() to load properly
> 	the value of the constant expression in load debug exception stack
> * Copy thread infor form the kernel stack coming from usr
> * Rebase latest powerpc git tree
>
> v1:
>
> * Copy thread info only when we are from !user mode since we'll get kernel stack
>    coming from usr directly.
> * remove save/restore EX_R14/EX_R15 since DBG_EXCEPTION_PROLOG already covered
>    this.
> * use CURRENT_THREAD_INFO() conveniently to get thread.
> * fix some typos
> * add a patch to make sure gdb can generate a single step properly to invoke a
>    kgdb state.
> * add a patch to if we need to replay an interrupt, we shouldn't restore that
>    previous backup thread info to make sure we can replay an interrupt lately
>    with a proper thread info.
> * rebase latest powerpc git tree
>
> v0:
>
> This patchset is used to support kgdb for book3e.
>
> ----------------------------------------------------------------
> Tiejun Chen (5):
>        powerpc/book3e: initialize crit/mc/dbg kernel stack pointers
>        powerpc/book3e: store crit/mc/dbg exception thread info
>        powerpc/book3e: support kgdb for kernel space
>        powerpc/kgdb: use DEFINE_PER_CPU to allocate kgdb's thread_info
>        powerpc/book3e/kgdb: Fix a single stgep case of lazy IRQ
>
>   arch/powerpc/kernel/exceptions-64e.S |   26 ++++++++++++++++++++++----
>   arch/powerpc/kernel/kgdb.c           |   13 ++++++++++---
>   arch/powerpc/kernel/setup_64.c       |   18 ++++++++++++------
>   3 files changed, 44 insertions(+), 13 deletions(-)
>
> Tiejun
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
>
>

^ permalink raw reply

* [PATCH] powerpc/powernv: Update dump README file
From: Vasant Hegde @ 2013-11-18 11:09 UTC (permalink / raw)
  To: linuxppc-dev

Update dump README file content.

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/opal-dump.c |   12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal-dump.c b/arch/powerpc/platforms/powernv/opal-dump.c
index e102a80..9bc8ad3 100644
--- a/arch/powerpc/platforms/powernv/opal-dump.c
+++ b/arch/powerpc/platforms/powernv/opal-dump.c
@@ -317,9 +317,17 @@ static struct notifier_block dump_nb = {
 };
 
 
-/* FIXME: debugfs README message */
+/* debugfs README message */
 static const char readme_msg[] =
-	"This file will be populated shortly..";
+	"Platform dump HOWTO:\n\n"
+	"files:\n"
+	"  dump                  - Binary file, contains actual dump data\n"
+	"  dump_available (r--)  - New dump available notification\n"
+	"                          0 : No dump present\n"
+	"                          1 : Dump present\n"
+	"  dump_control(-w-)     - Dump control file\n"
+	"                          1 : Send acknowledgement (dump copied)\n"
+	"                          2 : Initiate FipS dump\n";
 
 /* debugfs dump_control file operations */
 static ssize_t dump_control_write(struct file *file,

^ permalink raw reply related

* [PATCH] powerpc/powernv: Platform dump interface
From: Vasant Hegde @ 2013-11-18 11:09 UTC (permalink / raw)
  To: linuxppc-dev

This patch adds Platform dump retrieval interface.

Flow:
  - We register to OPAL notification event.
  - OPAL sends new dump available notification.
  - We retrieve the dump and send it to debugfs.
  - User copies the dump data and end ACKs via debugfs.
  - We send ACK to OPAL.

debugfs files:
  We create below dump related files under "fsp" directory.
  - dump		: Dump data
  - dump_available	: New dump available notification to userspace
  - dump_control	: ACK/initiate new dump
  - README		: README

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/opal.h                |   12 +
 arch/powerpc/platforms/powernv/Makefile        |    2 
 arch/powerpc/platforms/powernv/opal-dump.c     |  420 ++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/opal-wrappers.S |    4 
 arch/powerpc/platforms/powernv/opal.c          |    2 
 5 files changed, 438 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/opal-dump.c

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index d1af862..a1c5237 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -154,6 +154,10 @@ extern int opal_enter_rtas(struct rtas_args *args,
 #define OPAL_FLASH_VALIDATE			76
 #define OPAL_FLASH_MANAGE			77
 #define OPAL_FLASH_UPDATE			78
+#define OPAL_DUMP_INIT				81
+#define OPAL_DUMP_INFO				82
+#define OPAL_DUMP_READ				83
+#define OPAL_DUMP_ACK				84
 
 #ifndef __ASSEMBLY__
 
@@ -233,7 +237,8 @@ enum OpalPendingState {
 	OPAL_EVENT_ERROR_LOG		= 0x40,
 	OPAL_EVENT_EPOW			= 0x80,
 	OPAL_EVENT_LED_STATUS		= 0x100,
-	OPAL_EVENT_PCI_ERROR		= 0x200
+	OPAL_EVENT_PCI_ERROR		= 0x200,
+	OPAL_EVENT_DUMP_AVAIL		= 0x400,
 };
 
 /* Machine check related definitions */
@@ -752,6 +757,10 @@ int64_t opal_lpc_read(uint32_t chip_id, enum OpalLPCAddressType addr_type,
 int64_t opal_validate_flash(uint64_t buffer, uint32_t *size, uint32_t *result);
 int64_t opal_manage_flash(uint8_t op);
 int64_t opal_update_flash(uint64_t blk_list);
+int64_t opal_dump_init(uint8_t dump_type);
+int64_t opal_dump_info(uint32_t *dump_id, uint32_t *dump_size);
+int64_t opal_dump_read(uint32_t dump_id, uint64_t buffer);
+int64_t opal_dump_ack(uint32_t dump_id);
 
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname, int depth, void *data);
@@ -781,6 +790,7 @@ extern void opal_get_rtc_time(struct rtc_time *tm);
 extern unsigned long opal_get_boot_time(void);
 extern void opal_nvram_init(void);
 extern void opal_flash_init(void);
+extern void opal_platform_dump_init(void);
 
 extern int opal_machine_check(struct pt_regs *regs);
 
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 873fa13..379b215 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -1,6 +1,6 @@
 obj-y			+= setup.o opal-takeover.o opal-wrappers.o opal.o
 obj-y			+= opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
-obj-y			+= rng.o
+obj-y			+= rng.o opal-dump.o
 
 obj-$(CONFIG_SMP)	+= smp.o
 obj-$(CONFIG_PCI)	+= pci.o pci-p5ioc2.o pci-ioda.o
diff --git a/arch/powerpc/platforms/powernv/opal-dump.c b/arch/powerpc/platforms/powernv/opal-dump.c
new file mode 100644
index 0000000..e102a80
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/opal-dump.c
@@ -0,0 +1,420 @@
+/*
+ * PowerNV OPAL Dump Interface
+ *
+ * Copyright 2013 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/kobject.h>
+#include <linux/debugfs.h>
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+#include <linux/pagemap.h>
+#include <linux/delay.h>
+
+#include <asm/opal.h>
+
+/* Dump type */
+#define DUMP_TYPE_FSP	0x01
+
+/* Extract failed */
+#define DUMP_NACK_ID	0x00
+
+/* Dump record */
+struct dump_record {
+	uint8_t		type;
+	uint32_t	id;
+	uint32_t	size;
+	char		*buffer;
+};
+static struct dump_record dump_record;
+
+/* Dump available status */
+static u32 dump_avail;
+
+/* Binary blobs */
+static struct debugfs_blob_wrapper dump_blob;
+static struct debugfs_blob_wrapper readme_blob;
+
+/* Ignore dump notification, if we fail to create debugfs files */
+static bool dump_disarmed = false;
+
+
+static void free_dump_sg_list(struct opal_sg_list *list)
+{
+	struct opal_sg_list *sg1;
+	while (list) {
+		sg1 = list->next;
+		kfree(list);
+		list = sg1;
+	}
+	list = NULL;
+}
+
+/*
+ * Build dump buffer scatter gather list
+ */
+static struct opal_sg_list *dump_data_to_sglist(void)
+{
+	struct opal_sg_list *sg1, *list = NULL;
+	void *addr;
+	int64_t size;
+
+	addr = dump_record.buffer;
+	size = dump_record.size;
+
+	sg1 = kzalloc(PAGE_SIZE, GFP_KERNEL);
+	if (!sg1)
+		goto nomem;
+
+	list = sg1;
+	sg1->num_entries = 0;
+	while (size > 0) {
+		/* Translate virtual address to physical address */
+		sg1->entry[sg1->num_entries].data =
+			(void *)(vmalloc_to_pfn(addr) << PAGE_SHIFT);
+
+		if (size > PAGE_SIZE)
+			sg1->entry[sg1->num_entries].length = PAGE_SIZE;
+		else
+			sg1->entry[sg1->num_entries].length = size;
+
+		sg1->num_entries++;
+		if (sg1->num_entries >= SG_ENTRIES_PER_NODE) {
+			sg1->next = kzalloc(PAGE_SIZE, GFP_KERNEL);
+			if (!sg1->next)
+				goto nomem;
+
+			sg1 = sg1->next;
+			sg1->num_entries = 0;
+		}
+		addr += PAGE_SIZE;
+		size -= PAGE_SIZE;
+	}
+	return list;
+
+nomem:
+	pr_err("%s : Failed to allocate memory\n", __func__);
+	free_dump_sg_list(list);
+	return NULL;
+}
+
+/*
+ * Translate sg list address to absolute
+ */
+static void sglist_to_phy_addr(struct opal_sg_list *list)
+{
+	struct opal_sg_list *sg, *next;
+
+	for (sg = list; sg; sg = next) {
+		next = sg->next;
+		/* Don't translate NULL pointer for last entry */
+		if (sg->next)
+			sg->next = (struct opal_sg_list *)__pa(sg->next);
+		else
+			sg->next = NULL;
+
+		/* Convert num_entries to length */
+		sg->num_entries =
+			sg->num_entries * sizeof(struct opal_sg_entry) + 16;
+	}
+}
+
+static void free_dump_data_buf(void)
+{
+	vfree(dump_record.buffer);
+	dump_record.size = 0;
+}
+
+/*
+ * Allocate dump data buffer.
+ */
+static int alloc_dump_data_buf(void)
+{
+	dump_record.buffer = vzalloc(PAGE_ALIGN(dump_record.size));
+	if (!dump_record.buffer) {
+		pr_err("%s : Failed to allocate memory\n", __func__);
+		return -ENOMEM;
+	}
+	return 0;
+}
+
+/*
+ * Initiate FipS dump
+ */
+static int64_t dump_fips_init(uint8_t type)
+{
+	int rc;
+
+	rc = opal_dump_init(type);
+	if (rc)
+		pr_warn("%s: Failed to initiate FipS dump (%d)\n",
+			__func__, rc);
+	return rc;
+}
+
+/*
+ * Get dump ID and size.
+ */
+static int64_t dump_read_info(void)
+{
+	int rc;
+
+	rc = opal_dump_info(&dump_record.id, &dump_record.size);
+	if (rc)
+		pr_warn("%s: Failed to get dump info (%d)\n",
+			__func__, rc);
+	return rc;
+}
+
+/*
+ * Send acknoledgement to OPAL
+ */
+static int64_t dump_send_ack(uint32_t dump_id)
+{
+	int rc;
+
+	rc = opal_dump_ack(dump_id);
+	if (rc)
+		pr_warn("%s: Failed to send ack message to ID 0x%x (%d)\n",
+			__func__, dump_id, rc);
+	return rc;
+}
+
+/*
+ * Retrieve dump data
+ */
+static int64_t dump_read_data(void)
+{
+	struct opal_sg_list *list;
+	uint64_t addr;
+	int64_t rc;
+
+	/* Allocate memory */
+	rc = alloc_dump_data_buf();
+	if (rc)
+		goto out;
+
+	/* Generate SG list */
+	list = dump_data_to_sglist();
+	if (!list) {
+		rc = -ENOMEM;
+		goto out;
+	}
+
+	/* Translate sg list addr to real address */
+	sglist_to_phy_addr(list);
+
+	/* First entry address */
+	addr = __pa(list);
+
+	/* Fetch data */
+	rc = OPAL_BUSY;
+	while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
+		rc = opal_dump_read(dump_record.id, addr);
+		if (rc == OPAL_BUSY) {
+			opal_poll_events(NULL);
+			mdelay(10);
+		}
+	}
+
+	if (rc != OPAL_SUCCESS && rc != OPAL_PARTIAL)
+		pr_warn("%s: Extract dump failed for ID 0x%x\n",
+			__func__, dump_record.id);
+
+	/* Free SG list */
+	free_dump_sg_list(list);
+
+out:
+	return rc;
+}
+
+static int extract_dump(void)
+{
+	int rc;
+
+	/* Get dump ID, size */
+	rc = dump_read_info();
+	if (rc != OPAL_SUCCESS)
+		return rc;
+
+	/* Read dump data */
+	rc = dump_read_data();
+	if (rc != OPAL_SUCCESS && rc != OPAL_PARTIAL) {
+		/*
+		 * Failed to allocate memory to retrieve dump. Lets send
+		 * negative ack so that we get notification again.
+		 */
+		dump_send_ack(DUMP_NACK_ID);
+
+		/* Free dump buffer */
+		free_dump_data_buf();
+
+		return rc;
+	}
+	if (rc == OPAL_PARTIAL)
+		pr_info("%s: Partially read dump ID 0x%x\n",
+			__func__, dump_record.id);
+
+	pr_info("%s: New platform dump available. ID = 0x%x\n",
+		__func__, dump_record.id);
+
+	/* Update dump blob */
+	dump_blob.data = (void *)dump_record.buffer;
+	dump_blob.size = dump_record.size;
+
+	/* Update dump available status */
+	dump_avail = 1;
+
+	return rc;
+}
+
+static void dump_extract_fn(struct work_struct *work)
+{
+	extract_dump();
+}
+
+static DECLARE_WORK(dump_work, dump_extract_fn);
+
+/* Workqueue to extract dump */
+static void schedule_extract_dump(void)
+{
+	schedule_work(&dump_work);
+}
+
+/*
+ * New dump available notification
+ *
+ * Once we get notification, we extract dump via OPAL call
+ * and then write dump to file.
+ */
+static int dump_event(struct notifier_block *nb,
+		      unsigned long events, void *change)
+{
+	/*
+	 * Don't retrieve dump, if we don't have debugfs
+	 * interface to pass data to userspace.
+	 */
+	if (dump_disarmed)
+		return 0;
+
+	/* Check for dump available notification */
+	if (events & OPAL_EVENT_DUMP_AVAIL)
+		schedule_extract_dump();
+
+	return 0;
+}
+
+static struct notifier_block dump_nb = {
+	.notifier_call  = dump_event,
+	.next           = NULL,
+	.priority       = 0
+};
+
+
+/* FIXME: debugfs README message */
+static const char readme_msg[] =
+	"This file will be populated shortly..";
+
+/* debugfs dump_control file operations */
+static ssize_t dump_control_write(struct file *file,
+				  const char __user *user_buf,
+				  size_t count, loff_t *ppos)
+{
+	char buf[4];
+	size_t buf_size;
+
+	buf_size = min(count, (sizeof(buf) - 1));
+	if (copy_from_user(buf, user_buf, buf_size))
+		return -EFAULT;
+
+	switch (buf[0]) {
+	case '1':	/* Dump send ack */
+		if (dump_avail) {
+			dump_avail = 0;
+			free_dump_data_buf();
+			dump_send_ack(dump_record.id);
+		}
+		break;
+	case '2':	/* Initiate FipS dump */
+		dump_fips_init(DUMP_TYPE_FSP);
+		break;
+	default:
+		break;
+	}
+	return count;
+}
+
+static const struct file_operations dump_control_fops = {
+	.open	= simple_open,
+	.write	= dump_control_write,
+	.llseek	= default_llseek,
+};
+
+/*
+ * Create dump debugfs file
+ */
+static int debugfs_dump_init(void)
+{
+	struct dentry *dir, *file;
+
+	/* FSP dump directory */
+	dir = debugfs_create_dir("fsp", NULL);
+	if (!dir)
+		goto out;
+
+	/* README */
+	readme_blob.data = (void *)readme_msg;
+	readme_blob.size = strlen(readme_msg);
+	file = debugfs_create_blob("README", 0400, dir, &readme_blob);
+	if (!file)
+		goto remove_dir;
+
+	/* Dump available notification */
+	file = debugfs_create_u32("dump_avail", 0400, dir, &dump_avail);
+	if (!file)
+		goto remove_dir;
+
+	/* data file */
+	dump_blob.data = (void *)dump_record.buffer;
+	dump_blob.size = dump_record.size;
+	file = debugfs_create_blob("dump", 0400, dir, &dump_blob);
+	if (!file)
+		goto remove_dir;
+
+	/* Control file */
+	file = debugfs_create_file("dump_control", 0200, dir,
+				   NULL, &dump_control_fops);
+	if (!file)
+		goto remove_dir;
+
+	return 0;
+
+remove_dir:
+	debugfs_remove_recursive(dir);
+
+out:
+	dump_disarmed = true;
+	return -1;
+}
+
+void __init opal_platform_dump_init(void)
+{
+	int ret;
+
+	/* Register for opal notifier */
+	ret = opal_notifier_register(&dump_nb);
+	if (ret) {
+		pr_warn("%s: Can't register OPAL event notifier (%d)\n",
+			__func__, ret);
+		return;
+	}
+
+	/* debugfs interface */
+	ret = debugfs_dump_init();
+}
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index e780650..1485a09 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -126,3 +126,7 @@ OPAL_CALL(opal_return_cpu,			OPAL_RETURN_CPU);
 OPAL_CALL(opal_validate_flash,			OPAL_FLASH_VALIDATE);
 OPAL_CALL(opal_manage_flash,			OPAL_FLASH_MANAGE);
 OPAL_CALL(opal_update_flash,			OPAL_FLASH_UPDATE);
+OPAL_CALL(opal_dump_init,			OPAL_DUMP_INIT);
+OPAL_CALL(opal_dump_info,			OPAL_DUMP_INFO);
+OPAL_CALL(opal_dump_read,			OPAL_DUMP_READ);
+OPAL_CALL(opal_dump_ack,			OPAL_DUMP_ACK);
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index 1c798cd..7c7524c 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -442,6 +442,8 @@ static int __init opal_init(void)
 	if (rc == 0) {
 		/* Setup code update interface */
 		opal_flash_init();
+		/* Setup platform dump extract interface */
+		opal_platform_dump_init();
 	}
 
 	return 0;

^ permalink raw reply related

* [PATCH] powerpc/powernv: Move SG list structure to header file
From: Vasant Hegde @ 2013-11-18 11:09 UTC (permalink / raw)
  To: linuxppc-dev

Move SG list and entry structure to header file so that
it can be used in other places as well.

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/opal.h             |   22 +++++++++++++++++++
 arch/powerpc/platforms/powernv/opal-flash.c |   31 +++++----------------------
 2 files changed, 28 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 033c06b..d1af862 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -33,6 +33,28 @@ struct opal_takeover_args {
 	u64	rd_loc;			/* r11 */
 };
 
+/*
+ * SG entry
+ *
+ * WARNING: The current implementation requires each entry
+ * to represent a block that is 4k aligned *and* each block
+ * size except the last one in the list to be as well.
+ */
+struct opal_sg_entry {
+	void    *data;
+	long    length;
+};
+
+/* sg list */
+struct opal_sg_list {
+	unsigned long num_entries;
+	struct opal_sg_list *next;
+	struct opal_sg_entry entry[];
+};
+
+/* We calculate number of sg entries based on PAGE_SIZE */
+#define SG_ENTRIES_PER_NODE ((PAGE_SIZE - 16) / sizeof(struct opal_sg_entry))
+
 extern long opal_query_takeover(u64 *hal_size, u64 *hal_align);
 
 extern long opal_do_takeover(struct opal_takeover_args *args);
diff --git a/arch/powerpc/platforms/powernv/opal-flash.c b/arch/powerpc/platforms/powernv/opal-flash.c
index 6ffa6b1..4aeae4f 100644
--- a/arch/powerpc/platforms/powernv/opal-flash.c
+++ b/arch/powerpc/platforms/powernv/opal-flash.c
@@ -103,27 +103,6 @@ struct image_header_t {
 	uint32_t	size;
 };
 
-/* Scatter/gather entry */
-struct opal_sg_entry {
-	void	*data;
-	long	length;
-};
-
-/* We calculate number of entries based on PAGE_SIZE */
-#define SG_ENTRIES_PER_NODE ((PAGE_SIZE - 16) / sizeof(struct opal_sg_entry))
-
-/*
- * This struct is very similar but not identical to that
- * needed by the opal flash update. All we need to do for
- * opal is rewrite num_entries into a version/length and
- * translate the pointers to absolute.
- */
-struct opal_sg_list {
-	unsigned long num_entries;
-	struct opal_sg_list *next;
-	struct opal_sg_entry entry[SG_ENTRIES_PER_NODE];
-};
-
 struct validate_flash_t {
 	int		status;		/* Return status */
 	void		*buf;		/* Candiate image buffer */
@@ -333,7 +312,7 @@ static struct opal_sg_list *image_data_to_sglist(void)
 	addr = image_data.data;
 	size = image_data.size;
 
-	sg1 = kzalloc((sizeof(struct opal_sg_list)), GFP_KERNEL);
+	sg1 = kzalloc(PAGE_SIZE, GFP_KERNEL);
 	if (!sg1)
 		return NULL;
 
@@ -351,8 +330,7 @@ static struct opal_sg_list *image_data_to_sglist(void)
 
 		sg1->num_entries++;
 		if (sg1->num_entries >= SG_ENTRIES_PER_NODE) {
-			sg1->next = kzalloc((sizeof(struct opal_sg_list)),
-					    GFP_KERNEL);
+			sg1->next = kzalloc(PAGE_SIZE, GFP_KERNEL);
 			if (!sg1->next) {
 				pr_err("%s : Failed to allocate memory\n",
 				       __func__);
@@ -402,7 +380,10 @@ static int opal_flash_update(int op)
 		else
 			sg->next = NULL;
 
-		/* Make num_entries into the version/length field */
+		/*
+		 * Convert num_entries to version/length format
+		 * to satisfy OPAL.
+		 */
 		sg->num_entries = (SG_LIST_VERSION << 56) |
 			(sg->num_entries * sizeof(struct opal_sg_entry) + 16);
 	}

^ permalink raw reply related

* [PATCH] DMA-API: ppc: vio: use dma_coerce_mask_and_coherent()
From: Cédric Le Goater @ 2013-11-18 10:57 UTC (permalink / raw)
  To: benh; +Cc: Russell King, Cédric Le Goater, linuxppc-dev, Paul Mackerras
In-Reply-To: <20131116153244.GD25039@n2100.arm.linux.org.uk>

Commit 4886c399da70d5f8a4016c2213850dce6cac88c5 (DMA-API: ppc: vio.c: 
replace dma_set_mask()+dma_set_coherent_mask() with new helper) 
introduced the usage of the new helper routine dma_set_mask_and_coherent(). 
This breaks the initialization of the pseries vio devices which do not 
setup an initial dev->dma_mask for the device.

Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
---

 arch/powerpc/kernel/vio.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/vio.c b/arch/powerpc/kernel/vio.c
index e7d0c88f..76a6482 100644
--- a/arch/powerpc/kernel/vio.c
+++ b/arch/powerpc/kernel/vio.c
@@ -1419,7 +1419,7 @@ struct vio_dev *vio_register_device_node(struct device_node *of_node)
 
 		/* needed to ensure proper operation of coherent allocations
 		 * later, in case driver doesn't set it explicitly */
-		dma_set_mask_and_coherent(&viodev->dev, DMA_BIT_MASK(64));
+		dma_coerce_mask_and_coherent(&viodev->dev, DMA_BIT_MASK(64));
 	}
 
 	/* register with generic device framework */
-- 
1.7.10.4

^ permalink raw reply related

* Re: [PATCH 16/51] DMA-API: ppc: vio.c: replace dma_set_mask()+dma_set_coherent_mask() with new helper
From: Cedric Le Goater @ 2013-11-18 10:54 UTC (permalink / raw)
  To: Russell King - ARM Linux; +Cc: Paul Mackerras, linuxppc-dev
In-Reply-To: <20131116153244.GD25039@n2100.arm.linux.org.uk>

On 11/16/2013 04:32 PM, Russell King - ARM Linux wrote:
> On Fri, Nov 15, 2013 at 05:16:55PM +0100, Cedric Le Goater wrote:
>> The new helper routine dma_set_mask_and_coherent() breaks the 
>> initialization of the pseries vio devices which do not have an 
>> initial dev->dma_mask. I think we need to use dma_coerce_mask_and_coherent()
>> instead.
> 
> Who wants to handle this patch?
>
> Also, is it possible to fix it so that dev->dma_mask is correctly setup
> by the code which creates the device, as it should be in the first place?

The vio_dev should probably be improved to setup devices as the other 
drivers do but we might be short on time to do that for 3.13 and pseries 
is really broken ... For now, I will just resend the patch with some 
context in the changelog. 

Thanks,

C.

^ permalink raw reply

* [PATCH v3] powerpc/powernv: infrastructure to read opal messages in generic format.
From: Mahesh J Salgaonkar @ 2013-11-18 10:05 UTC (permalink / raw)
  To: linuxppc-dev, Benjamin Herrenschmidt

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Opal now has a new messaging infrastructure to push the messages to
linux in a generic format for different type of messages using only one
event bit. The format of the opal message is as below:

struct opal_msg {
        uint32_t msg_type;
	uint32_t reserved;
	uint64_t params[8];
};

This patch allows clients to subscribe for notification for specific
message type. It is upto the subscriber to decipher the messages who showed
interested in receiving specific message type.

The interface to subscribe for notification is:

	int opal_message_notifier_register(enum OpalMessageType msg_type,
                                        struct notifier_block *nb)


The notifier will fetch the opal message when available and notify the
subscriber with message type and the opal message. It is subscribers
responsibility to copy the message data before returning from notifier
callback.

I will post a seperate patch series for fsp memory handling which uses
this new messaging channel to pull fsp memory errors.

Changes in v3:
- Fixed event numbering and naming issue.
- Per Ben's comment, renamed opal_message_handle_event() to
  opal_handle_message()

Changes in v2:
- Fixed opal token numbers to match with latest changes in opal.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/opal.h                |   23 ++++++
 arch/powerpc/platforms/powernv/opal-wrappers.S |    2 +
 arch/powerpc/platforms/powernv/opal.c          |   90 ++++++++++++++++++++++++
 3 files changed, 114 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index c5cd728..eb0dfd4 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -129,6 +129,8 @@ extern int opal_enter_rtas(struct rtas_args *args,
 #define OPAL_LPC_READ				67
 #define OPAL_LPC_WRITE				68
 #define OPAL_RETURN_CPU				69
+#define OPAL_GET_MSG				85
+#define OPAL_CHECK_ASYNC_COMPLETION		86
 
 #ifndef __ASSEMBLY__
 
@@ -208,7 +210,16 @@ enum OpalPendingState {
 	OPAL_EVENT_ERROR_LOG		= 0x40,
 	OPAL_EVENT_EPOW			= 0x80,
 	OPAL_EVENT_LED_STATUS		= 0x100,
-	OPAL_EVENT_PCI_ERROR		= 0x200
+	OPAL_EVENT_PCI_ERROR		= 0x200,
+	OPAL_EVENT_MSG_PENDING		= 0x800,
+};
+
+enum OpalMessageType {
+	OPAL_MSG_ASYNC_COMP		= 0,
+	OPAL_MSG_MEM_ERR,
+	OPAL_MSG_EPOW,
+	OPAL_MSG_SHUTDOWN,
+	OPAL_MSG_TYPE_MAX,
 };
 
 /* Machine check related definitions */
@@ -353,6 +364,12 @@ enum OpalLPCAddressType {
 	OPAL_LPC_FW	= 2,
 };
 
+struct opal_msg {
+	uint32_t msg_type;
+	uint32_t reserved;
+	uint64_t params[8];
+};
+
 struct opal_machine_check_event {
 	enum OpalMCE_Version	version:8;	/* 0x00 */
 	uint8_t			in_use;		/* 0x01 */
@@ -656,6 +673,8 @@ int64_t opal_lpc_write(uint32_t chip_id, enum OpalLPCAddressType addr_type,
 		       uint32_t addr, uint32_t data, uint32_t sz);
 int64_t opal_lpc_read(uint32_t chip_id, enum OpalLPCAddressType addr_type,
 		      uint32_t addr, uint32_t *data, uint32_t sz);
+int64_t opal_get_msg(uint64_t buffer, size_t size);
+int64_t opal_check_completion(uint64_t buffer, size_t size, uint64_t token);
 
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname, int depth, void *data);
@@ -670,6 +689,8 @@ extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
 				   int depth, void *data);
 
 extern int opal_notifier_register(struct notifier_block *nb);
+extern int opal_message_notifier_register(enum OpalMessageType msg_type,
+						struct notifier_block *nb);
 extern void opal_notifier_enable(void);
 extern void opal_notifier_disable(void);
 extern void opal_notifier_update_evt(uint64_t evt_mask, uint64_t evt_val);
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index 8f38445..4c2e19a 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -116,3 +116,5 @@ OPAL_CALL(opal_xscom_write,			OPAL_XSCOM_WRITE);
 OPAL_CALL(opal_lpc_read,			OPAL_LPC_READ);
 OPAL_CALL(opal_lpc_write,			OPAL_LPC_WRITE);
 OPAL_CALL(opal_return_cpu,			OPAL_RETURN_CPU);
+OPAL_CALL(opal_get_msg,				OPAL_GET_MSG);
+OPAL_CALL(opal_check_completion,		OPAL_CHECK_ASYNC_COMPLETION);
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index 2911abe..dd5a8b3 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -33,6 +33,7 @@ extern u64 opal_mc_secondary_handler[];
 static unsigned int *opal_irqs;
 static unsigned int opal_irq_count;
 static ATOMIC_NOTIFIER_HEAD(opal_notifier_head);
+static struct atomic_notifier_head opal_msg_notifier_head[OPAL_MSG_TYPE_MAX];
 static DEFINE_SPINLOCK(opal_notifier_lock);
 static uint64_t last_notified_mask = 0x0ul;
 static atomic_t opal_notifier_hold = ATOMIC_INIT(0);
@@ -162,6 +163,95 @@ void opal_notifier_disable(void)
 	atomic_set(&opal_notifier_hold, 1);
 }
 
+/*
+ * Opal message notifier based on message type. Allow subscribers to get
+ * notified for specific messgae type.
+ */
+int opal_message_notifier_register(enum OpalMessageType msg_type,
+					struct notifier_block *nb)
+{
+	if (!nb) {
+		pr_warning("%s: Invalid argument (%p)\n",
+			   __func__, nb);
+		return -EINVAL;
+	}
+	if (msg_type > OPAL_MSG_TYPE_MAX) {
+		pr_warning("%s: Invalid message type argument (%d)\n",
+			   __func__, msg_type);
+		return -EINVAL;
+	}
+	return atomic_notifier_chain_register(
+				&opal_msg_notifier_head[msg_type], nb);
+}
+
+static void opal_message_do_notify(uint32_t msg_type, void *msg)
+{
+	/* notify subscribers */
+	atomic_notifier_call_chain(&opal_msg_notifier_head[msg_type],
+					msg_type, msg);
+}
+
+static void opal_handle_message(void)
+{
+	s64 ret;
+	/*
+	 * TODO: pre-allocate a message buffer depending on opal-msg-size
+	 * value in /proc/device-tree.
+	 */
+	static struct opal_msg msg;
+
+	ret = opal_get_msg(__pa(&msg), sizeof(msg));
+	/* No opal message pending. */
+	if (ret == OPAL_RESOURCE)
+		return;
+
+	/* check for errors. */
+	if (ret) {
+		pr_warning("%s: Failed to retrive opal message, err=%lld\n",
+				__func__, ret);
+		return;
+	}
+
+	/* Sanity check */
+	if (msg.msg_type > OPAL_MSG_TYPE_MAX) {
+		pr_warning("%s: Unknown message type: %u\n",
+				__func__, msg.msg_type);
+		return;
+	}
+	opal_message_do_notify(msg.msg_type, (void *)&msg);
+}
+
+static int opal_message_notify(struct notifier_block *nb,
+			  unsigned long events, void *change)
+{
+	if (events & OPAL_EVENT_MSG_PENDING)
+		opal_handle_message();
+	return 0;
+}
+
+static struct notifier_block opal_message_nb = {
+	.notifier_call	= opal_message_notify,
+	.next		= NULL,
+	.priority	= 0,
+};
+
+static int __init opal_message_init(void)
+{
+	int ret, i;
+
+	for (i = 0; i < OPAL_MSG_TYPE_MAX; i++)
+		ATOMIC_INIT_NOTIFIER_HEAD(&opal_msg_notifier_head[i]);
+
+	ret = opal_notifier_register(&opal_message_nb);
+	if (ret) {
+		pr_err("%s: Can't register OPAL event notifier (%d)\n",
+		       __func__, ret);
+		return ret;
+	}
+	return 0;
+}
+early_initcall(opal_message_init);
+
 int opal_get_chars(uint32_t vtermno, char *buf, int count)
 {
 	s64 len, rc;

^ permalink raw reply related

* [PATCH -V2 5/5] powerpc: mm: book3s: Enable _PAGE_NUMA for book3s
From: Aneesh Kumar K.V @ 2013-11-18  9:28 UTC (permalink / raw)
  To: benh, paulus, linux-mm; +Cc: linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1384766893-10189-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

We steal the _PAGE_COHERENCE bit and use that for indicating NUMA ptes.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pgtable.h     | 66 +++++++++++++++++++++++++++++++++-
 arch/powerpc/include/asm/pte-hash64.h  |  6 ++++
 arch/powerpc/platforms/Kconfig.cputype |  1 +
 3 files changed, 72 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index 7d6eacf249cf..b999ca318985 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -3,6 +3,7 @@
 #ifdef __KERNEL__
 
 #ifndef __ASSEMBLY__
+#include <linux/mmdebug.h>
 #include <asm/processor.h>		/* For TASK_SIZE */
 #include <asm/mmu.h>
 #include <asm/page.h>
@@ -33,10 +34,73 @@ static inline int pte_dirty(pte_t pte)		{ return pte_val(pte) & _PAGE_DIRTY; }
 static inline int pte_young(pte_t pte)		{ return pte_val(pte) & _PAGE_ACCESSED; }
 static inline int pte_file(pte_t pte)		{ return pte_val(pte) & _PAGE_FILE; }
 static inline int pte_special(pte_t pte)	{ return pte_val(pte) & _PAGE_SPECIAL; }
-static inline int pte_present(pte_t pte)	{ return pte_val(pte) & _PAGE_PRESENT; }
 static inline int pte_none(pte_t pte)		{ return (pte_val(pte) & ~_PTE_NONE_MASK) == 0; }
 static inline pgprot_t pte_pgprot(pte_t pte)	{ return __pgprot(pte_val(pte) & PAGE_PROT_BITS); }
 
+#ifdef CONFIG_NUMA_BALANCING
+
+static inline int pte_present(pte_t pte)
+{
+	return pte_val(pte) & (_PAGE_PRESENT | _PAGE_NUMA);
+}
+
+#define pte_numa pte_numa
+static inline int pte_numa(pte_t pte)
+{
+	return (pte_val(pte) &
+		(_PAGE_NUMA|_PAGE_PRESENT)) == _PAGE_NUMA;
+}
+
+#define pte_mknonnuma pte_mknonnuma
+static inline pte_t pte_mknonnuma(pte_t pte)
+{
+	pte_val(pte) &= ~_PAGE_NUMA;
+	pte_val(pte) |=  _PAGE_PRESENT | _PAGE_ACCESSED;
+	return pte;
+}
+
+#define pte_mknuma pte_mknuma
+static inline pte_t pte_mknuma(pte_t pte)
+{
+	/*
+	 * We should not set _PAGE_NUMA on non present ptes. Also clear the
+	 * present bit so that hash_page will return 1 and we collect this
+	 * as numa fault.
+	 */
+	if (pte_present(pte)) {
+		pte_val(pte) |= _PAGE_NUMA;
+		pte_val(pte) &= ~_PAGE_PRESENT;
+	} else
+		VM_BUG_ON(1);
+	return pte;
+}
+
+#define pmd_numa pmd_numa
+static inline int pmd_numa(pmd_t pmd)
+{
+	return pte_numa(pmd_pte(pmd));
+}
+
+#define pmd_mknonnuma pmd_mknonnuma
+static inline pmd_t pmd_mknonnuma(pmd_t pmd)
+{
+	return pte_pmd(pte_mknonnuma(pmd_pte(pmd)));
+}
+
+#define pmd_mknuma pmd_mknuma
+static inline pmd_t pmd_mknuma(pmd_t pmd)
+{
+	return pte_pmd(pte_mknuma(pmd_pte(pmd)));
+}
+
+# else
+
+static inline int pte_present(pte_t pte)
+{
+	return pte_val(pte) & _PAGE_PRESENT;
+}
+#endif /* CONFIG_NUMA_BALANCING */
+
 /* Conversion functions: convert a page and protection to a page entry,
  * and a page entry and page directory to the page they refer to.
  *
diff --git a/arch/powerpc/include/asm/pte-hash64.h b/arch/powerpc/include/asm/pte-hash64.h
index 55aea0caf95e..2505d8eab15c 100644
--- a/arch/powerpc/include/asm/pte-hash64.h
+++ b/arch/powerpc/include/asm/pte-hash64.h
@@ -27,6 +27,12 @@
 #define _PAGE_RW		0x0200 /* software: user write access allowed */
 #define _PAGE_BUSY		0x0800 /* software: PTE & hash are busy */
 
+/*
+ * Used for tracking numa faults
+ */
+#define _PAGE_NUMA	0x00000010 /* Gather numa placement stats */
+
+
 /* No separate kernel read-only */
 #define _PAGE_KERNEL_RW		(_PAGE_RW | _PAGE_DIRTY) /* user access blocked by key */
 #define _PAGE_KERNEL_RO		 _PAGE_KERNEL_RW
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index c2a566fb8bb8..2048655d8ec4 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -72,6 +72,7 @@ config PPC_BOOK3S_64
 	select PPC_HAVE_PMU_SUPPORT
 	select SYS_SUPPORTS_HUGETLBFS
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE if PPC_64K_PAGES
+	select ARCH_SUPPORTS_NUMA_BALANCING
 
 config PPC_BOOK3E_64
 	bool "Embedded processors"
-- 
1.8.3.2

^ permalink raw reply related

* [PATCH -V2 0/5] powerpc: mm: Numa faults support for ppc64
From: Aneesh Kumar K.V @ 2013-11-18  9:28 UTC (permalink / raw)
  To: benh, paulus, linux-mm; +Cc: linuxppc-dev

Hi,

This patch series add support for numa faults on ppc64 architecture. We steal the
_PAGE_COHERENCE bit and use that for indicating _PAGE_NUMA. We clear the _PAGE_PRESENT bit
and also invalidate the hpte entry on setting _PAGE_NUMA. The next fault on that
page will be considered a numa fault.

Changes from V1:
* Dropped few patches related pmd update because batch handling of pmd pages got dropped from core code
   0f19c17929c952c6f0966d93ab05558e7bf814cc "mm: numa: Do not batch handle PMD pages"
   This also avoided the large lock contention on page_table_lock that we observed with the previous series.

 -aneesh

^ permalink raw reply

* [PATCH -V2 2/5] powerpc: Free up _PAGE_COHERENCE for numa fault use later
From: Aneesh Kumar K.V @ 2013-11-18  9:28 UTC (permalink / raw)
  To: benh, paulus, linux-mm; +Cc: linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1384766893-10189-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

Set  memory coherence always on hash64 config. If
a platform cannot have memory coherence always set they
can infer that from _PAGE_NO_CACHE and _PAGE_WRITETHRU
like in lpar. So we dont' really need a separate bit
for tracking _PAGE_COHERENCE.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pte-hash64.h |  2 +-
 arch/powerpc/mm/hash_low_64.S         | 15 ++++++++++++---
 arch/powerpc/mm/hash_utils_64.c       |  7 ++++---
 arch/powerpc/mm/hugepage-hash64.c     |  6 +++++-
 arch/powerpc/mm/hugetlbpage-hash64.c  |  4 ++++
 5 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/pte-hash64.h b/arch/powerpc/include/asm/pte-hash64.h
index 0419eeb53274..55aea0caf95e 100644
--- a/arch/powerpc/include/asm/pte-hash64.h
+++ b/arch/powerpc/include/asm/pte-hash64.h
@@ -19,7 +19,7 @@
 #define _PAGE_FILE		0x0002 /* (!present only) software: pte holds file offset */
 #define _PAGE_EXEC		0x0004 /* No execute on POWER4 and newer (we invert) */
 #define _PAGE_GUARDED		0x0008
-#define _PAGE_COHERENT		0x0010 /* M: enforce memory coherence (SMP systems) */
+/* We can derive Memory coherence from _PAGE_NO_CACHE */
 #define _PAGE_NO_CACHE		0x0020 /* I: cache inhibit */
 #define _PAGE_WRITETHRU		0x0040 /* W: cache write-through */
 #define _PAGE_DIRTY		0x0080 /* C: page changed */
diff --git a/arch/powerpc/mm/hash_low_64.S b/arch/powerpc/mm/hash_low_64.S
index d3cbda62857b..1136d26a95ae 100644
--- a/arch/powerpc/mm/hash_low_64.S
+++ b/arch/powerpc/mm/hash_low_64.S
@@ -148,7 +148,10 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_1T_SEGMENT)
 	and	r0,r0,r4		/* _PAGE_RW & _PAGE_DIRTY ->r0 bit 30*/
 	andc	r0,r30,r0		/* r0 = pte & ~r0 */
 	rlwimi	r3,r0,32-1,31,31	/* Insert result into PP lsb */
-	ori	r3,r3,HPTE_R_C		/* Always add "C" bit for perf. */
+	/*
+	 * Always add "C" bit for perf. Memory coherence is always enabled
+	 */
+	ori	r3,r3,HPTE_R_C | HPTE_R_M
 
 	/* We eventually do the icache sync here (maybe inline that
 	 * code rather than call a C function...) 
@@ -457,7 +460,10 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_1T_SEGMENT)
 	and	r0,r0,r4		/* _PAGE_RW & _PAGE_DIRTY ->r0 bit 30*/
 	andc	r0,r3,r0		/* r0 = pte & ~r0 */
 	rlwimi	r3,r0,32-1,31,31	/* Insert result into PP lsb */
-	ori	r3,r3,HPTE_R_C		/* Always add "C" bit for perf. */
+	/*
+	 * Always add "C" bit for perf. Memory coherence is always enabled
+	 */
+	ori	r3,r3,HPTE_R_C | HPTE_R_M
 
 	/* We eventually do the icache sync here (maybe inline that
 	 * code rather than call a C function...)
@@ -795,7 +801,10 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_1T_SEGMENT)
 	and	r0,r0,r4		/* _PAGE_RW & _PAGE_DIRTY ->r0 bit 30*/
 	andc	r0,r30,r0		/* r0 = pte & ~r0 */
 	rlwimi	r3,r0,32-1,31,31	/* Insert result into PP lsb */
-	ori	r3,r3,HPTE_R_C		/* Always add "C" bit for perf. */
+	/*
+	 * Always add "C" bit for perf. Memory coherence is always enabled
+	 */
+	ori	r3,r3,HPTE_R_C | HPTE_R_M
 
 	/* We eventually do the icache sync here (maybe inline that
 	 * code rather than call a C function...)
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 6176b3cdf579..de6881259aef 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -169,9 +169,10 @@ static unsigned long htab_convert_pte_flags(unsigned long pteflags)
 	if ((pteflags & _PAGE_USER) && !((pteflags & _PAGE_RW) &&
 					 (pteflags & _PAGE_DIRTY)))
 		rflags |= 1;
-
-	/* Always add C */
-	return rflags | HPTE_R_C;
+	/*
+	 * Always add "C" bit for perf. Memory coherence is always enabled
+	 */
+	return rflags | HPTE_R_C | HPTE_R_M;
 }
 
 int htab_bolt_mapping(unsigned long vstart, unsigned long vend,
diff --git a/arch/powerpc/mm/hugepage-hash64.c b/arch/powerpc/mm/hugepage-hash64.c
index 34de9e0cdc34..826893fcb3a7 100644
--- a/arch/powerpc/mm/hugepage-hash64.c
+++ b/arch/powerpc/mm/hugepage-hash64.c
@@ -127,7 +127,11 @@ repeat:
 
 		/* Add in WIMG bits */
 		rflags |= (new_pmd & (_PAGE_WRITETHRU | _PAGE_NO_CACHE |
-				      _PAGE_COHERENT | _PAGE_GUARDED));
+				      _PAGE_GUARDED));
+		/*
+		 * enable the memory coherence always
+		 */
+		rflags |= HPTE_R_M;
 
 		/* Insert into the hash table, primary slot */
 		slot = ppc_md.hpte_insert(hpte_group, vpn, pa, rflags, 0,
diff --git a/arch/powerpc/mm/hugetlbpage-hash64.c b/arch/powerpc/mm/hugetlbpage-hash64.c
index 0b7fb6761015..a5bcf9301196 100644
--- a/arch/powerpc/mm/hugetlbpage-hash64.c
+++ b/arch/powerpc/mm/hugetlbpage-hash64.c
@@ -99,6 +99,10 @@ int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,
 		/* Add in WIMG bits */
 		rflags |= (new_pte & (_PAGE_WRITETHRU | _PAGE_NO_CACHE |
 				      _PAGE_COHERENT | _PAGE_GUARDED));
+		/*
+		 * enable the memory coherence always
+		 */
+		rflags |= HPTE_R_M;
 
 		slot = hpte_insert_repeating(hash, vpn, pa, rflags, 0,
 					     mmu_psize, ssize);
-- 
1.8.3.2

^ permalink raw reply related

* [PATCH -V2 3/5] mm: Move change_prot_numa outside CONFIG_ARCH_USES_NUMA_PROT_NONE
From: Aneesh Kumar K.V @ 2013-11-18  9:28 UTC (permalink / raw)
  To: benh, paulus, linux-mm; +Cc: linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1384766893-10189-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

change_prot_numa should work even if _PAGE_NUMA != _PAGE_PROTNONE.
On archs like ppc64 that don't use _PAGE_PROTNONE and also have
a separate page table outside linux pagetable, we just need to
make sure that when calling change_prot_numa we flush the
hardware page table entry so that next page access  result in a numa
fault.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 include/linux/mm.h | 3 ---
 mm/mempolicy.c     | 9 ---------
 2 files changed, 12 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0548eb201e05..51794c1a1d7e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1851,11 +1851,8 @@ static inline pgprot_t vm_get_page_prot(unsigned long vm_flags)
 }
 #endif
 
-#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE
 unsigned long change_prot_numa(struct vm_area_struct *vma,
 			unsigned long start, unsigned long end);
-#endif
-
 struct vm_area_struct *find_extend_vma(struct mm_struct *, unsigned long addr);
 int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
 			unsigned long pfn, unsigned long size, pgprot_t);
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index c4403cdf3433..cae10af4fdc4 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -613,7 +613,6 @@ static inline int queue_pages_pgd_range(struct vm_area_struct *vma,
 	return 0;
 }
 
-#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE
 /*
  * This is used to mark a range of virtual addresses to be inaccessible.
  * These are later cleared by a NUMA hinting fault. Depending on these
@@ -627,7 +626,6 @@ unsigned long change_prot_numa(struct vm_area_struct *vma,
 			unsigned long addr, unsigned long end)
 {
 	int nr_updated;
-	BUILD_BUG_ON(_PAGE_NUMA != _PAGE_PROTNONE);
 
 	nr_updated = change_protection(vma, addr, end, vma->vm_page_prot, 0, 1);
 	if (nr_updated)
@@ -635,13 +633,6 @@ unsigned long change_prot_numa(struct vm_area_struct *vma,
 
 	return nr_updated;
 }
-#else
-static unsigned long change_prot_numa(struct vm_area_struct *vma,
-			unsigned long addr, unsigned long end)
-{
-	return 0;
-}
-#endif /* CONFIG_ARCH_USES_NUMA_PROT_NONE */
 
 /*
  * Walk through page tables and collect pages to be migrated.
-- 
1.8.3.2

^ permalink raw reply related

* [PATCH -V2 4/5] powerpc: mm: Only check for _PAGE_PRESENT in set_pte/pmd functions
From: Aneesh Kumar K.V @ 2013-11-18  9:28 UTC (permalink / raw)
  To: benh, paulus, linux-mm; +Cc: linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1384766893-10189-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

We want to make sure we don't use these function when updating a pte
or pmd entry that have a valid hpte entry, because these functions
don't invalidate them. So limit the check to _PAGE_PRESENT bit.
Numafault core changes use these functions for updating _PAGE_NUMA bits.
That should be ok because when _PAGE_NUMA is set we can be sure that
hpte entries are not present.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/mm/pgtable.c    | 2 +-
 arch/powerpc/mm/pgtable_64.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index 841e0d00863c..ad90429bbd8b 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -174,7 +174,7 @@ void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
 		pte_t pte)
 {
 #ifdef CONFIG_DEBUG_VM
-	WARN_ON(pte_present(*ptep));
+	WARN_ON(pte_val(*ptep) & _PAGE_PRESENT);
 #endif
 	/* Note: mm->context.id might not yet have been assigned as
 	 * this context might not have been activated yet when this
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index 9d95786aa80f..02e8681fb865 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -687,7 +687,7 @@ void set_pmd_at(struct mm_struct *mm, unsigned long addr,
 		pmd_t *pmdp, pmd_t pmd)
 {
 #ifdef CONFIG_DEBUG_VM
-	WARN_ON(!pmd_none(*pmdp));
+	WARN_ON(pmd_val(*pmdp) & _PAGE_PRESENT);
 	assert_spin_locked(&mm->page_table_lock);
 	WARN_ON(!pmd_trans_huge(pmd));
 #endif
-- 
1.8.3.2

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox