LinuxPPC-Dev Archive on lore.kernel.org
* Re: [PATCH v2] powerpc/mpc85xx: Remove P1023 RDS support
From: Scott Wood @ 2014-05-05 19:04 UTC
  To: Lijun Pan; +Cc: linuxppc-dev, Emilian.Medve
In-Reply-To: <1399314195-28616-1-git-send-email-Lijun.Pan@freescale.com>

On Mon, 2014-05-05 at 13:23 -0500, Lijun Pan wrote:
> P1023RDS is no longer supported/manufactured by Freescale while P1023RDB is.
> 
> Signed-off-by: Lijun Pan <Lijun.Pan@freescale.com>
> ---
>  arch/powerpc/boot/dts/p1023rds.dts                 | 219 ---------------------
>  arch/powerpc/configs/mpc85xx_defconfig             |   1 -
>  arch/powerpc/configs/mpc85xx_smp_defconfig         |   1 -
>  arch/powerpc/platforms/85xx/Kconfig                |   6 +-
>  arch/powerpc/platforms/85xx/Makefile               |   2 +-
>  .../platforms/85xx/{p1023_rds.c => p1023_rdb.c}    |  36 +---
>  6 files changed, 10 insertions(+), 255 deletions(-)
>  delete mode 100644 arch/powerpc/boot/dts/p1023rds.dts
>  rename arch/powerpc/platforms/85xx/{p1023_rds.c => p1023_rdb.c} (75%)

What changed from v1?

If you want this patch merged, please respond to the comments on v1.

-Scott

* RE: [PATCH v2] powerpc/mpc85xx: Remove P1023 RDS support
From: Lijun Pan @ 2014-05-05 19:08 UTC
  To: Scott Wood; +Cc: linuxppc-dev@ozlabs.org, Emilian Medve
In-Reply-To: <1399316691.15726.93.camel@snotra.buserror.net>

> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Monday, May 05, 2014 2:05 PM
> To: Pan Lijun-B44306
> Cc: linuxppc-dev@ozlabs.org; Medve Emilian-EMMEDVE1
> Subject: Re: [PATCH v2] powerpc/mpc85xx: Remove P1023 RDS support
>
> On Mon, 2014-05-05 at 13:23 -0500, Lijun Pan wrote:
> > P1023RDS is no longer supported/manufactured by Freescale while
> P1023RDB is.
> >
> > Signed-off-by: Lijun Pan <Lijun.Pan@freescale.com>
> > ---
> >  arch/powerpc/boot/dts/p1023rds.dts                 | 219 -------------
> --------
> >  arch/powerpc/configs/mpc85xx_defconfig             |   1 -
> >  arch/powerpc/configs/mpc85xx_smp_defconfig         |   1 -
> >  arch/powerpc/platforms/85xx/Kconfig                |   6 +-
> >  arch/powerpc/platforms/85xx/Makefile               |   2 +-
> >  .../platforms/85xx/{p1023_rds.c => p1023_rdb.c}    |  36 +---
> >  6 files changed, 10 insertions(+), 255 deletions(-)
> >  delete mode 100644 arch/powerpc/boot/dts/p1023rds.dts
> >  rename arch/powerpc/platforms/85xx/{p1023_rds.c => p1023_rdb.c} (75%)
>
> What changed from v1?

"Please wrap changelogs at no more than 75 columns."


> If you want this patch merged, please respond to the comments on v1.
>
> -Scott
>

* Re: [PATCH v3] powerpc/fsl: Added binding for Freescale CoreNet coherency fabric (CCF)
From: Scott Wood @ 2014-05-05 19:31 UTC
  To: Diana Craciun; +Cc: devicetree, linuxppc-dev
In-Reply-To: <1399305499-6612-1-git-send-email-diana.craciun@freescale.com>

On Mon, 2014-05-05 at 18:58 +0300, Diana Craciun wrote:
> diff --git a/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt b/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt
> index 922c30a..09dbc5f 100644
> --- a/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt
> +++ b/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt
> @@ -20,3 +20,11 @@ PROPERTIES
>  	a property named fsl,eref-[CAT], where [CAT] is the abbreviated category
>  	name with all uppercase letters converted to lowercase, indicates that
>  	the category is supported by the implementation.
> +
> +	- fsl,portid-mapping : <u32>
> +	The Coherency Subdomain ID Port Mapping Registers and Snoop ID Port Mapping
> +	registers which are part of the CoreNet Coherency fabric (CCF) provide a
> +	CoreNet Coherency Subdomain ID/CoreNet Snoop ID to cpu mapping functions.
> +	Certain bits from these registers should be set if the corresponding CPU
> +	should be snooped. This property defines a bitmask which selects the bit that
> +	should be set if this cpu should be snooped.

Please follow existing formatting in this file.

> diff --git a/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt b/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt
> index 1f5e329..827c637 100644
> --- a/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt
> +++ b/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt
> @@ -26,6 +26,13 @@ Required properties:
>  		  A standard property.
>  - #size-cells	: <u32>
>  		  A standard property.
> +- fsl,portid-mapping : <u32>
> +	The Coherency Subdomain ID Port Mapping Registers and Snoop ID Port Mapping
> +	registers which are part of the CoreNet Coherency fabric (CCF) provide a
> +	CoreNet Coherency Subdomain ID/CoreNet Snoop ID to pamu mapping functions.
> +	Certain bits from these registers should be set if PAMUs should be snooped.
> +	This property defines a bitmask which selects the bits that should be set
> +	if PAMUs should be snooped.

This can't be a required property since existing trees don't have it --
in addition to allowing for the possibility of a PAMU where the snoop ID
is not known or where the snoop domain mechanism does not exist.

-Scott

* Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
From: Christian Zigotzky @ 2014-05-05 21:23 UTC
  To: Olof Johansson, Alexander Graf, Aneesh Kumar K.V
  Cc: Paul Mackerras, linuxppc-dev, kvm-ppc, kvm
In-Reply-To: <CALiw-2E222RXHK0drwaygb62hXT8F8KCG_AJ7OGKgNfL+Vj8dg@mail.gmail.com>

On 05.05.14 16:57, Olof Johansson wrote:
> [Now without HTML email -- it's what you get for cc:ing me at work
> instead of my upstream email :)]
>
> 2014-05-05 7:43 GMT-07:00 Alexander Graf <agraf@suse.de>:
>> On 05/05/2014 04:26 PM, Aneesh Kumar K.V wrote:
>>> Alexander Graf <agraf@suse.de> writes:
>>>
>>>> On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote:
>>>>> Although it's optional IBM POWER cpus always had DAR value set on
>>>>> alignment interrupt. So don't try to compute these values.
>>>>>
>>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>>>>> ---
>>>>> Changes from V3:
>>>>> * Use make_dsisr instead of checking feature flag to decide whether to use
>>>>>      saved dsisr or not
>>>>>
>>> ....
>>>
>>>>>     ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst)
>>>>>     {
>>>>> +#ifdef CONFIG_PPC_BOOK3S_64
>>>>> +       return vcpu->arch.fault_dar;
>>>> How about PA6T and G5s?
>>>>
>>>>
>>> Paul mentioned that BOOK3S always had DAR value set on alignment
>>> interrupt. And the patch is to enable/collect correct DAR value when
>>> running with Little Endian PR guest. Now to limit the impact and to
>>> enable Little Endian PR guest, I ended up doing the conditional code
>>> only for book3s 64 for which we know for sure that we set DAR value.
>>
>> Yes, and I'm asking whether we know that this statement holds true for PA6T and G5 chips which I wouldn't consider IBM POWER. Since the G5 is at least developed by IBM, I'd assume its semantics here are similar to POWER4, but for PA6T I wouldn't be so sure.
>>
> Thanks for looking out for us, obviously IBM doesn't (based on the
> reply a minute ago).
>
> In the end, since there's been no work to enable KVM on PA6T, I'm not
> too worried. I guess it's one more thing to sort out (and check for)
> whenever someone does that.
>
> I definitely don't have cycles to deal with that myself at this time.
> I can help find hardware for someone who wants to, but even then I'm
> guessing the interest is pretty limited.
>
>
> -Olof
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
Just for info: "PR" KVM works great on my PA6T machine. I booted the 
Lubuntu 14.04 PowerPC live DVD on a QEMU virtual machine with "PR" KVM 
successfully. But Mac OS X Jaguar, Panther, and Tiger don't boot with 
KVM on Mac-on-Linux and QEMU. See 
http://forum.hyperion-entertainment.biz/viewtopic.php?f=35&t=1747.

-- Christian

* Re: [PATCH 4/6] powerpc/corenet: Create the dts components for the DPAA FMan
From: Scott Wood @ 2014-05-05 23:25 UTC
  To: Emil Medve; +Cc: devicetree, Shruti Kanetkar, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <5364BEB3.6000904@Freescale.com>

On Sat, 2014-05-03 at 05:02 -0500, Emil Medve wrote:
> Hello Scott,
> 
> 
> On 04/21/2014 05:11 PM, Scott Wood wrote:
> > On Fri, 2014-04-18 at 07:21 -0500, Shruti Kanetkar wrote:
> >> +fman@400000 {
> >> +	mdio@f1000 {
> >> +		#address-cells = <1>;
> >> +		#size-cells = <0>;
> >> +		compatible = "fsl,fman-xmdio";
> >> +		reg = <0xf1000 0x1000>;
> >> +	};
> >> +};
> > 
> > I'd like to see a complete fman binding before we start adding pieces.
> 
> The driver for the FMan 10 Gb/s MDIO was upstreamed a couple of years
> ago: '9f35a73 net/fsl: introduce Freescale 10G MDIO driver', granted
> without a binding writeup.

Pushing driver code through the netdev tree does not establish device
tree ABI.  Binding documents and dts files do.

> This patch series should probably include a
> binding blurb. However, let's not gate this patchset on a complete
> binding for the FMan

I at least want to see enough of the FMan binding to have confidence
that what we're adding now is correct.

> As you know we don't own the FMan work and the FMan work is... not ready
> for upstreaming.

I'm not asking for a driver, just a binding that describes hardware.  Is
there any reason why the fman node needs to be anywhere near as
complicated as it is in the SDK, if we're limiting it to actual hardware
description?  Do we really need to have nodes for all the sub-blocks?

> In an attempt to make some sort of progress we've
> decided to upstream the pieces that are less controversial and MDIO is
> an obvious candidate
> 
> >> +fman@400000 {
> >> +	mdio0: mdio@e1120 {
> >> +		#address-cells = <1>;
> >> +		#size-cells = <0>;
> >> +		compatible = "fsl,fman-mdio";
> >> +		reg = <0xe1120 0xee0>;
> >> +	};
> >> +};
> > 
> > What is the difference between "fsl,fman-mdio" and "fsl,fman-xmdio"?  I
> > don't see the latter on the list of compatibles in patch 3/6.
> 
> 'fsl,fman-mdio' is the 1 Gb/s MDIO (Clause 22 only). 'fsl,fman-xmdio' is
> the 10 Gb/s MDIO (Clause 45 only). We can respin this patch wi
> 

"respin this patch wi..."?

> I believe 'fsl,fman-mdio' (and others on that list) was added
> gratuitously as the FMan MDIO is completely compatible with the
> eTSEC/gianfar MDIO driver, but we can deal with that later

It's still good to identify the specific device, even if it's believed
to be 100% compatible.  Plus, IIRC there's been enough badness in the
eTSEC MDIO binding that it'd be good to steer clear of it.

> > Within each category, is the exact fman version discoverable from the
> > mdio registers?
> 
> No, but that's irrelevant as that's not the difference between the two
> compatibles

It's relevant because it means the compatible string should have a block
version number in it, or at least some other way in the MDIO node to
indicate the block version.

> >> +fman@500000 {
> >> +	#address-cells = <1>;
> >> +	#size-cells = <1>;
> >> +	compatible = "simple-bus";
> > 
> > Why is this simple-bus?
> 
> Because that's the translation type for the FMan sub-nodes.

What do you mean by "translation type"?

> We need it now to get the MDIO nodes probed

No.  "simple-bus" is stating an attribute of the hardware, that the
child nodes represent simple memory-mapped devices that can be used
without special bus knowledge.  I don't think that applies here.

You can get the MDIO node probed without misusing simple-bus by adding
the fman node's compatible to the probe list in the kernel code.
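
To illustrate the probe-list idea (a user-space sketch, not kernel code
from this thread): the platform keeps a list of compatible strings whose
child nodes get probed, and the fman node's own compatible is added to
that list. The struct, the function name and the "fsl,fman" string are
assumptions made up for this example.

```c
#include <string.h>

/* Stand-in for the kernel's struct of_device_id; only the field used here. */
struct compat_entry {
	const char *compatible;
};

/*
 * Hypothetical platform probe list.  "fsl,fman" is an assumed compatible
 * string used only for illustration; the thread does not define it.
 */
static const struct compat_entry probe_list[] = {
	{ "simple-bus" },
	{ "fsl,fman" },
	{ NULL },
};

/* Return 1 if the children of a node with this compatible get probed. */
static int children_probed(const char *node_compat)
{
	const struct compat_entry *e;

	for (e = probe_list; e->compatible; e++)
		if (strcmp(e->compatible, node_compat) == 0)
			return 1;
	return 0;
}
```

With such an entry, children_probed("fsl,fman") returns 1 even though
the node never claims to be a simple-bus.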

This sort of thing is why I want to see what the rest of the fman
binding will look like.

>  and we'll need it later to probe other nodes/devices that will have
> standalone drivers: MAC, MURAM, etc.

How are they truly standalone?  They exist in service to the greater
entity that is fman.  They presumably work together in some fashion.

> >> +	/* mdio nodes for fman v3 @ 0x500000 */
> >> +	mdio@fc000 {
> >> +		#address-cells = <1>;
> >> +		#size-cells = <0>;
> >> +		reg = <0xfc000 0x1000>;
> >> +	};
> >> +
> >> +	mdio@fd000 {
> >> +		#address-cells = <1>;
> >> +		#size-cells = <0>;
> >> +		reg = <0xfd000 0x1000>;
> >> +	};
> >> +};
> > 
> > Where's the compatible?  Why is this file different from all the others?
> 
> The FMan v3 MDIO block (supports both Clause 22/45) is compatible with
> the FMan v2 10 Gb/s MDIO (the xgmac-mdio driver). However, the driver
> needs a small clean-up patch (still in internal review) that will get it
> working for FMan v3 MDIO.

This suggests that it is not 100% backwards compatible.

>  With that patch we'll add the compatible to these nodes. However, we
> need these nodes now for the board level MDIO bus muxing support
> (included in this patchset)

If you need these nodes now then add the compatible property now.

-Scott

* Re: [PATCH 5/6] powerpc/corenet: Add DPAA FMan support to the SoC device tree(s)
From: Scott Wood @ 2014-05-05 23:34 UTC
  To: Emil Medve
  Cc: devicetree, Kanetkar Shruti-B44454, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <53661DAB.10808@Freescale.com>

On Sun, 2014-05-04 at 05:59 -0500, Emil Medve wrote:
> Hello Scott,
> 
> 
> On 04/21/2014 05:14 PM, Scott Wood wrote:
> > On Fri, 2014-04-18 at 07:21 -0500, Shruti Kanetkar wrote:
> >> FMan 1 Gb/s MACs (dTSEC and mEMAC) have support for SGMII PHYs.
> >> Add support for the internal SerDes TBI PHYs
> >>
> >> Based on prior work by Andy Fleming <afleming@gmail.com>
> >>
> >> Signed-off-by: Shruti Kanetkar <Shruti@Freescale.com>
> >> ---
> >>  arch/powerpc/boot/dts/fsl/b4860si-post.dtsi |  28 +++++
> >>  arch/powerpc/boot/dts/fsl/b4si-post.dtsi    |  51 +++++++++
> >>  arch/powerpc/boot/dts/fsl/p1023si-post.dtsi |  14 +++
> >>  arch/powerpc/boot/dts/fsl/p2041si-post.dtsi |  64 ++++++++++++
> >>  arch/powerpc/boot/dts/fsl/p3041si-post.dtsi |  64 ++++++++++++
> >>  arch/powerpc/boot/dts/fsl/p4080si-post.dtsi | 104 +++++++++++++++++++
> >>  arch/powerpc/boot/dts/fsl/p5020si-post.dtsi |  64 ++++++++++++
> >>  arch/powerpc/boot/dts/fsl/p5040si-post.dtsi | 128 +++++++++++++++++++++++
> >>  arch/powerpc/boot/dts/fsl/t4240si-post.dtsi | 154 ++++++++++++++++++++++++++++
> >>  9 files changed, 671 insertions(+)
> >>
> >> diff --git a/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi b/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi
> >> index cbc354b..45b0ff5 100644
> >> --- a/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi
> >> +++ b/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi
> >> @@ -172,6 +172,34 @@
> >>  		compatible = "fsl,b4860-rcpm", "fsl,qoriq-rcpm-2.0";
> >>  	};
> >>  
> >> +/include/ "qoriq-fman3-0-1g-4.dtsi"
> >> +/include/ "qoriq-fman3-0-1g-5.dtsi"
> >> +/include/ "qoriq-fman3-0-10g-0.dtsi"
> >> +/include/ "qoriq-fman3-0-10g-1.dtsi"
> >> +	fman@400000 {
> >> +		ethernet@e8000 {
> >> +			tbi-handle = <&tbi4>;
> >> +		};
> > 
> > Binding needed
> > 
> > Where is the "reg" for these unit addresses?
> 
> As I said, the bulk of the FMan work comes from another team. Here we
> need just enough to hook up the MDIO and PHY nodes.

Unit addresses must match reg.  No reg, no unit address.
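
As a rough illustration of that rule (a stand-alone sketch, not code
from any tree), a checker could compare the hex unit address in a node
name against the first address cell of reg. The function name and its
single-cell view of reg are assumptions made for this example.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return 1 if a node name such as "mdio@e9000" is consistent with the
 * node's reg property.  reg_addr points at the first address cell of
 * reg, or is NULL when the node has no reg at all.
 */
static int unit_address_ok(const char *name, const uint32_t *reg_addr)
{
	const char *at = strchr(name, '@');
	char *end;
	unsigned long unit;

	if (!at)
		return reg_addr == NULL;  /* no unit address: must have no reg */
	if (!reg_addr)
		return 0;                 /* unit address without reg */

	unit = strtoul(at + 1, &end, 16);
	if (end == at + 1 || *end != '\0')
		return 0;                 /* unit address is not plain hex */

	return unit == *reg_addr;
}
```

Under this check, "ethernet@e8000" with no reg is rejected, which is the
problem with the nodes quoted above.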

> I'd really like to be able to make progress on this without waiting for that moment in time
> we can get the entire FMan binding in place

Why is the fman binding such a big deal?

> >> +		mdio@e9000 {
> >> +			tbi4: tbi-phy@8 {
> >> +				reg = <0x8>;
> >> +				device_type = "tbi-phy";
> >> +			};
> >> +		};
> > 
> > Binding needed for tbi-phy device_type
> 
> I guess that's fair (BTW, you accepted tbi-phy nodes/device-type before
> without a binding)

It's existing practice on eTSEC.  FMan seemed like an opportunity to
avoid carrying cruft forward.

> > Why are we using device_type at all for this?
> 
> That's what the upstream driver is looking for.

Drivers should look for what the binding says -- not the other way
around.

>  Anyway, most days PHYs can be discovered so they don't use/need
> compatible properties. That's I guess part of the reason we don't have
> bindings for PHY nodes

I don't see why there couldn't be a compatible that describes the
standard programming interface.

> However, what you can't discover is how they are wired to the MAC(s) so
> we still need some nodes in the device tree to convey that. Also, when
> looking for a specific kind of PHY, such as TBI, device_type works
> easier than parsing compatibles from various vendors or so

Don't you find the TBI by following the tbi-handle property?  That said,
I don't object to having a way to label a PHY as attached via TBI if
that's useful.  I'm giving a mild, non-nacking (given the history)
objection to using device_type for that (given other history).

-Scott

* Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
From: Benjamin Herrenschmidt @ 2014-05-06  0:04 UTC
  To: Aneesh Kumar K.V; +Cc: linuxppc-dev, paulus, Alexander Graf, kvm-ppc, kvm
In-Reply-To: <87tx949u9d.fsf@linux.vnet.ibm.com>

On Mon, 2014-05-05 at 19:56 +0530, Aneesh Kumar K.V wrote:
> 
> Paul mentioned that BOOK3S always had DAR value set on alignment
> interrupt. And the patch is to enable/collect correct DAR value when
> running with Little Endian PR guest. Now to limit the impact and to
> enable Little Endian PR guest, I ended up doing the conditional code
> only for book3s 64 for which we know for sure that we set DAR value.

Only Book3S? AFAIK, the kernel's align.c unconditionally uses DAR on
every processor type. It's DSISR that may or may not be populated,
but AFAIK DAR always is.

Cheers,
Ben.

* Re: [PATCH] KVM: PPC: BOOK3S: HV: Don't try to allocate from kernel page allocator for hash page table.
From: Benjamin Herrenschmidt @ 2014-05-06  0:06 UTC
  To: Alexander Graf
  Cc: linuxppc-dev@lists.ozlabs.org, paulus@samba.org, Aneesh Kumar K.V,
	kvm-ppc@vger.kernel.org, kvm@vger.kernel.org
In-Reply-To: <20FFDF8F-1A3D-4719-B492-1E4B70F9D1B4@suse.de>

On Mon, 2014-05-05 at 17:16 +0200, Alexander Graf wrote:
> Isn't this a greater problem? We should start swapping before we hit
> the point where non movable kernel allocation fails, no?

Possibly but the fact remains, this can be avoided by making sure that
if we create a CMA reserve for KVM, then it uses it rather than using
the rest of main memory for hash tables.

> The fact that KVM uses a good number of normal kernel pages is maybe
> suboptimal, but shouldn't be a critical problem.

The point is that we explicitly reserve those pages in CMA for use
by KVM for that specific purpose, but the current code tries first
to get them out of the normal pool.

This is not optimal behaviour and is what Aneesh's patches are
trying to fix.
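
A toy model of the two allocation orders may make the difference
concrete. This is only a sketch: the memory model, the function names
and the page counts are invented for illustration and are not the KVM
code.

```c
#include <stddef.h>

enum hpt_source { HPT_NONE, HPT_FROM_CMA, HPT_FROM_PAGE_ALLOCATOR };

/* Toy model of free memory, counted in pages (hypothetical). */
struct mem_model {
	size_t normal_free;  /* pages left in the normal kernel pool */
	size_t cma_free;     /* pages left in the CMA reserve set aside for KVM */
};

/* Take pages from a pool if it has enough; return 1 on success. */
static int take(size_t *pool, size_t pages)
{
	if (*pool < pages)
		return 0;
	*pool -= pages;
	return 1;
}

/* Order described for the current code: normal pool first, then CMA. */
static enum hpt_source alloc_hpt_old(struct mem_model *m, size_t pages)
{
	if (take(&m->normal_free, pages))
		return HPT_FROM_PAGE_ALLOCATOR;
	if (take(&m->cma_free, pages))
		return HPT_FROM_CMA;
	return HPT_NONE;
}

/* Order the patch aims for: use the CMA reserve, leave the normal pool alone. */
static enum hpt_source alloc_hpt_new(struct mem_model *m, size_t pages)
{
	if (take(&m->cma_free, pages))
		return HPT_FROM_CMA;
	return HPT_NONE;
}
```

In the old order the hash table eats into normal memory even while the
reserve sits unused; in the new order normal_free is left untouched.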

Cheers,
Ben.

* Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
From: Benjamin Herrenschmidt @ 2014-05-06  0:07 UTC
  To: Alexander Graf
  Cc: kvm, olofj, kvm-ppc, paulus, Aneesh Kumar K.V, linuxppc-dev
In-Reply-To: <5367A39D.9080709@suse.de>

On Mon, 2014-05-05 at 16:43 +0200, Alexander Graf wrote:
> > Paul mentioned that BOOK3S always had DAR value set on alignment
> > interrupt. And the patch is to enable/collect correct DAR value when
> > running with Little Endian PR guest. Now to limit the impact and to
> > enable Little Endian PR guest, I ended up doing the conditional code
> > only for book3s 64 for which we know for sure that we set DAR value.
> 
> Yes, and I'm asking whether we know that this statement holds true for 
> PA6T and G5 chips which I wouldn't consider IBM POWER. Since the G5 is 
> at least developed by IBM, I'd assume its semantics here are similar to 
> POWER4, but for PA6T I wouldn't be so sure.

I am not aware of any PowerPC processor that does not set DAR on
alignment interrupts. Paul, are you?

Cheers,
Ben.

* Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
From: Paul Mackerras @ 2014-05-06  0:41 UTC
  To: Alexander Graf; +Cc: linuxppc-dev, Aneesh Kumar K.V, kvm-ppc, kvm
In-Reply-To: <536773C2.1070502@suse.de>

On Mon, May 05, 2014 at 01:19:30PM +0200, Alexander Graf wrote:
> On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote:
> >+#ifdef CONFIG_PPC_BOOK3S_64
> >+	return vcpu->arch.fault_dar;
> 
> How about PA6T and G5s?

G5 sets DAR on an alignment interrupt.

As for PA6T, I don't know for sure, but if it doesn't, ordinary
alignment interrupts wouldn't be handled properly, since the code in
arch/powerpc/kernel/align.c assumes DAR contains the address being
accessed on all PowerPC CPUs.
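
For readers following along, here is a small user-space sketch of the
two ways to obtain the faulting address: taking the DAR saved at
interrupt time, or recomputing it from the instruction. The struct
layout and function names are assumptions for illustration, and only
D-form loads/stores are decoded.

```c
#include <stdint.h>

/* Minimal stand-in for the vcpu state discussed above (hypothetical layout). */
struct vcpu_model {
	uint64_t gpr[32];
	uint64_t fault_dar;   /* DAR value saved when the interrupt was taken */
};

/* Recompute the effective address of a D-form load/store: (RA|0) + d. */
static uint64_t dform_ea(const struct vcpu_model *v, uint32_t inst)
{
	unsigned int ra = (inst >> 16) & 0x1f;
	int16_t d = (int16_t)(inst & 0xffff);
	uint64_t base = ra ? v->gpr[ra] : 0;

	return base + (uint64_t)(int64_t)d;
}

/* Prefer the saved DAR when it is known to be valid, else decode. */
static uint64_t alignment_dar(const struct vcpu_model *v, uint32_t inst,
			      int dar_valid)
{
	return dar_valid ? v->fault_dar : dform_ea(v, inst);
}
```

If the hardware always sets DAR, the two paths agree and the decode
path is never needed, which is the assumption align.c already makes.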

Did PA Semi ever publish a user manual for the PA6T, I wonder?

Paul.

* [PATCH] powerpc/fsl-booke64: Set vmemmap_psize to 4K
From: Scott Wood @ 2014-05-06  0:46 UTC
  To: linuxppc-dev; +Cc: Scott Wood

The only way Freescale booke chips support mappings larger than 4K
is via TLB1.  The only way we support (direct) TLB1 entries is via
hugetlb, which is not what map_kernel_page() does when given a large
page size.

Without this, a kernel with CONFIG_SPARSEMEM_VMEMMAP enabled crashes on
boot with messages such as:

PID hash table entries: 4096 (order: 3, 32768 bytes)
Sorting __ex_table...
BUG: Bad page state in process swapper  pfn:00a2f
page:8000040000023a48 count:0 mapcount:0 mapping:0000040000ffce48 index:0x40000ffbe50
page flags: 0x40000ffda40(active|arch_1|private|private_2|head|tail|swapcache|mappedtodisk|reclaim|swapbacked|unevictable|mlocked)
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
bad because of flags:
page flags: 0x311840(active|private|private_2|swapcache|unevictable|mlocked)
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 3.15.0-rc1-00003-g7fa250c #299
Call Trace:
[c00000000098ba20] [c000000000008b3c] .show_stack+0x7c/0x1cc (unreliable)
[c00000000098baf0] [c00000000060aa50] .dump_stack+0x88/0xb4
[c00000000098bb70] [c0000000000c0468] .bad_page+0x144/0x1a0
[c00000000098bc10] [c0000000000c0628] .free_pages_prepare+0x164/0x17c
[c00000000098bcc0] [c0000000000c24cc] .free_hot_cold_page+0x48/0x214
[c00000000098bd60] [c00000000086c318] .free_all_bootmem+0x1fc/0x354
[c00000000098be70] [c00000000085da84] .mem_init+0xac/0xdc
[c00000000098bef0] [c0000000008547b0] .start_kernel+0x21c/0x4d4
[c00000000098bf90] [c000000000000448] .start_here_common+0x20/0x58

Signed-off-by: Scott Wood <scottwood@freescale.com>
---
 arch/powerpc/mm/tlb_nohash.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/tlb_nohash.c b/arch/powerpc/mm/tlb_nohash.c
index ae3d5b7..92cb18d 100644
--- a/arch/powerpc/mm/tlb_nohash.c
+++ b/arch/powerpc/mm/tlb_nohash.c
@@ -596,8 +596,13 @@ static void __early_init_mmu(int boot_cpu)
 	/* XXX This should be decided at runtime based on supported
 	 * page sizes in the TLB, but for now let's assume 16M is
 	 * always there and a good fit (which it probably is)
+	 *
+	 * Freescale booke only supports 4K pages in TLB0, so use that.
 	 */
-	mmu_vmemmap_psize = MMU_PAGE_16M;
+	if (mmu_has_feature(MMU_FTR_TYPE_FSL_E))
+		mmu_vmemmap_psize = MMU_PAGE_4K;
+	else
+		mmu_vmemmap_psize = MMU_PAGE_16M;
 
 	/* XXX This code only checks for TLB 0 capabilities and doesn't
 	 *     check what page size combos are supported by the HW. It
-- 
1.9.1

* [PATCH v4] powerpc/fsl: Add binding for Freescale CCF
From: Scott Wood @ 2014-05-06  1:17 UTC
  To: linuxppc-dev; +Cc: Diana Craciun, devicetree, Scott Wood

From: Diana Craciun <Diana.Craciun@freescale.com>

The CoreNet coherency fabric is a fabric-oriented, connectivity
infrastructure that enables the implementation of coherent, multicore
systems. The CCF acts as a central interconnect for cores,
platform-level caches, memory subsystem, peripheral devices and I/O host
bridges in the system.

Signed-off-by: Diana Craciun <Diana.Craciun@freescale.com>
[scottwood@freescale.com: formatting and minor changes]
Signed-off-by: Scott Wood <scottwood@freescale.com>
---
v4: Fixed various formatting issues, minor edits for clarity, and
made fsl,portid-mapping an optional property.

 .../devicetree/bindings/powerpc/fsl/ccf.txt        | 46 ++++++++++++++++++++++
 .../devicetree/bindings/powerpc/fsl/cpus.txt       | 11 ++++++
 .../devicetree/bindings/powerpc/fsl/pamu.txt       | 10 +++++
 3 files changed, 67 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/powerpc/fsl/ccf.txt

diff --git a/Documentation/devicetree/bindings/powerpc/fsl/ccf.txt b/Documentation/devicetree/bindings/powerpc/fsl/ccf.txt
new file mode 100644
index 0000000..454da7e
--- /dev/null
+++ b/Documentation/devicetree/bindings/powerpc/fsl/ccf.txt
@@ -0,0 +1,46 @@
+Freescale CoreNet Coherency Fabric (CCF) Device Tree Binding
+
+DESCRIPTION
+
+The CoreNet coherency fabric is a fabric-oriented, connectivity infrastructure
+that enables the implementation of coherent, multicore systems.
+
+Required properties:
+
+- compatible: <string list>
+		fsl,corenet1-cf - CoreNet coherency fabric version 1.
+		Example chips: P5040, P5020, P4080, P3041, P2041
+
+		fsl,corenet2-cf - CoreNet coherency fabric version 2.
+		Example chips: T4240, B4860
+
+		fsl,corenet-cf - Used to represent the common registers
+		between CCF version 1 and CCF version 2.  This compatible
+		is retained for compatibility reasons, as it was already
+		used for both CCF version 1 chips and CCF version 2
+		chips.  It should be specified after either
+		"fsl,corenet1-cf" or "fsl,corenet2-cf".
+
+- reg: <prop-encoded-array>
+		A standard property. Represents the CCF registers.
+
+- interrupts: <prop-encoded-array>
+		Interrupt mapping for CCF error interrupt.
+
+- fsl,ccf-num-csdids: <u32>
+		Specifies the number of Coherency Subdomain ID Port Mapping
+		Registers that are supported by the CCF.
+
+- fsl,ccf-num-snoopids: <u32>
+		Specifies the number of Snoop ID Port Mapping Registers that
+		are supported by CCF.
+
+Example:
+
+	corenet-cf@18000 {
+		compatible = "fsl,corenet2-cf", "fsl,corenet-cf";
+		reg = <0x18000 0x1000>;
+		interrupts = <16 2 1 31>;
+		fsl,ccf-num-csdids = <32>;
+		fsl,ccf-num-snoopids = <32>;
+	};
diff --git a/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt b/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt
index 922c30a..f8cd239 100644
--- a/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt
+++ b/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt
@@ -20,3 +20,14 @@ PROPERTIES
 	a property named fsl,eref-[CAT], where [CAT] is the abbreviated category
 	name with all uppercase letters converted to lowercase, indicates that
 	the category is supported by the implementation.
+
+    - fsl,portid-mapping
+	Usage: optional
+	Value type: <u32>
+	Definition: The Coherency Subdomain ID Port Mapping Registers and
+	Snoop ID Port Mapping registers, which are part of the CoreNet
+	Coherency fabric (CCF), provide a CoreNet Coherency Subdomain
+	ID/CoreNet Snoop ID to cpu mapping function.  Certain bits from
+	these registers should be set if the corresponding CPU should be
+	snooped.  This property defines a bitmask which selects the bit
+	that should be set if this cpu should be snooped.
diff --git a/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt b/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt
index 1f5e329..c2b2899 100644
--- a/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt
+++ b/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt
@@ -34,6 +34,15 @@ Optional properties:
 		  for legacy drivers.
 - interrupt-parent : <phandle>
 		  Phandle to interrupt controller
+- fsl,portid-mapping : <u32>
+		  The Coherency Subdomain ID Port Mapping Registers and
+		  Snoop ID Port Mapping registers, which are part of the
+		  CoreNet Coherency fabric (CCF), provide a CoreNet
+		  Coherency Subdomain ID/CoreNet Snoop ID to PAMU mapping
+		  function.  Certain bits from these registers should be
+		  set if PAMUs should be snooped.  This property defines
+		  a bitmask which selects the bits that should be set if
+		  PAMUs should be snooped.
 
 Child nodes:
 
@@ -88,6 +97,7 @@ Example:
 		compatible = "fsl,pamu-v1.0", "fsl,pamu";
 		reg = <0x20000 0x5000>;
 		ranges = <0 0x20000 0x5000>;
+		fsl,portid-mapping = <0xf80000>;
 		#address-cells = <1>;
 		#size-cells = <1>;
 		interrupts = <
-- 
1.9.1
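
As a hedged illustration of how the bitmask might be consumed (a
user-space sketch, not driver code): the per-CPU values below follow the
pattern in the P2041 device tree patch later in this thread, where cpu0
uses 0x80000000, cpu1 0x40000000 and so on. The function names are
invented, and the pattern is chip-specific, so real code should read
the property rather than compute it.

```c
#include <stdint.h>

/*
 * Per-CPU fsl,portid-mapping value following the P2041 pattern:
 * one bit per CPU, starting from the most significant bit.
 */
static uint32_t cpu_portid_mask(unsigned int cpu)
{
	return cpu < 32 ? UINT32_C(0x80000000) >> cpu : 0;
}

/* OR together the masks of every node that should be snooped. */
static uint32_t snoop_register_value(const uint32_t *portid_masks,
				     unsigned int n)
{
	uint32_t val = 0;
	unsigned int i;

	for (i = 0; i < n; i++)
		val |= portid_masks[i];
	return val;
}
```

The combined value is what would be written to a Snoop ID or Coherency
Subdomain ID port mapping register to select those CPUs or PAMUs.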

* [PATCH] powerpc/fsl: Add fsl,portid-mapping to corenet1-cf chips
From: Scott Wood @ 2014-05-06  1:37 UTC
  To: linuxppc-dev; +Cc: Scott Wood, devicetree, Diana Craciun

Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: Diana Craciun <diana.craciun@freescale.com>
---
 arch/powerpc/boot/dts/fsl/p2041si-post.dtsi | 1 +
 arch/powerpc/boot/dts/fsl/p2041si-pre.dtsi  | 4 ++++
 arch/powerpc/boot/dts/fsl/p3041si-post.dtsi | 1 +
 arch/powerpc/boot/dts/fsl/p3041si-pre.dtsi  | 4 ++++
 arch/powerpc/boot/dts/fsl/p4080si-post.dtsi | 1 +
 arch/powerpc/boot/dts/fsl/p4080si-pre.dtsi  | 8 ++++++++
 arch/powerpc/boot/dts/fsl/p5020si-post.dtsi | 1 +
 arch/powerpc/boot/dts/fsl/p5020si-pre.dtsi  | 2 ++
 arch/powerpc/boot/dts/fsl/p5040si-post.dtsi | 1 +
 arch/powerpc/boot/dts/fsl/p5040si-pre.dtsi  | 4 ++++
 10 files changed, 27 insertions(+)

diff --git a/arch/powerpc/boot/dts/fsl/p2041si-post.dtsi b/arch/powerpc/boot/dts/fsl/p2041si-post.dtsi
index b5daa4c..5290df8 100644
--- a/arch/powerpc/boot/dts/fsl/p2041si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/p2041si-post.dtsi
@@ -262,6 +262,7 @@
 		interrupts = <
 			24 2 0 0
 			16 2 1 30>;
+		fsl,portid-mapping = <0x0f000000>;
 
 		pamu0: pamu@0 {
 			reg = <0 0x1000>;
diff --git a/arch/powerpc/boot/dts/fsl/p2041si-pre.dtsi b/arch/powerpc/boot/dts/fsl/p2041si-pre.dtsi
index 22f3b14..b1ea147 100644
--- a/arch/powerpc/boot/dts/fsl/p2041si-pre.dtsi
+++ b/arch/powerpc/boot/dts/fsl/p2041si-pre.dtsi
@@ -83,6 +83,7 @@
 			reg = <0>;
 			clocks = <&mux0>;
 			next-level-cache = <&L2_0>;
+			fsl,portid-mapping = <0x80000000>;
 			L2_0: l2-cache {
 				next-level-cache = <&cpc>;
 			};
@@ -92,6 +93,7 @@
 			reg = <1>;
 			clocks = <&mux1>;
 			next-level-cache = <&L2_1>;
+			fsl,portid-mapping = <0x40000000>;
 			L2_1: l2-cache {
 				next-level-cache = <&cpc>;
 			};
@@ -101,6 +103,7 @@
 			reg = <2>;
 			clocks = <&mux2>;
 			next-level-cache = <&L2_2>;
+			fsl,portid-mapping = <0x20000000>;
 			L2_2: l2-cache {
 				next-level-cache = <&cpc>;
 			};
@@ -110,6 +113,7 @@
 			reg = <3>;
 			clocks = <&mux3>;
 			next-level-cache = <&L2_3>;
+			fsl,portid-mapping = <0x10000000>;
 			L2_3: l2-cache {
 				next-level-cache = <&cpc>;
 			};
diff --git a/arch/powerpc/boot/dts/fsl/p3041si-post.dtsi b/arch/powerpc/boot/dts/fsl/p3041si-post.dtsi
index 5abd1fc..cd63cb1 100644
--- a/arch/powerpc/boot/dts/fsl/p3041si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/p3041si-post.dtsi
@@ -289,6 +289,7 @@
 		interrupts = <
 			24 2 0 0
 			16 2 1 30>;
+		fsl,portid-mapping = <0x0f000000>;
 
 		pamu0: pamu@0 {
 			reg = <0 0x1000>;
diff --git a/arch/powerpc/boot/dts/fsl/p3041si-pre.dtsi b/arch/powerpc/boot/dts/fsl/p3041si-pre.dtsi
index 468e8be..dc5f4b3 100644
--- a/arch/powerpc/boot/dts/fsl/p3041si-pre.dtsi
+++ b/arch/powerpc/boot/dts/fsl/p3041si-pre.dtsi
@@ -84,6 +84,7 @@
 			reg = <0>;
 			clocks = <&mux0>;
 			next-level-cache = <&L2_0>;
+			fsl,portid-mapping = <0x80000000>;
 			L2_0: l2-cache {
 				next-level-cache = <&cpc>;
 			};
@@ -93,6 +94,7 @@
 			reg = <1>;
 			clocks = <&mux1>;
 			next-level-cache = <&L2_1>;
+			fsl,portid-mapping = <0x40000000>;
 			L2_1: l2-cache {
 				next-level-cache = <&cpc>;
 			};
@@ -102,6 +104,7 @@
 			reg = <2>;
 			clocks = <&mux2>;
 			next-level-cache = <&L2_2>;
+			fsl,portid-mapping = <0x20000000>;
 			L2_2: l2-cache {
 				next-level-cache = <&cpc>;
 			};
@@ -111,6 +114,7 @@
 			reg = <3>;
 			clocks = <&mux3>;
 			next-level-cache = <&L2_3>;
+			fsl,portid-mapping = <0x10000000>;
 			L2_3: l2-cache {
 				next-level-cache = <&cpc>;
 			};
diff --git a/arch/powerpc/boot/dts/fsl/p4080si-post.dtsi b/arch/powerpc/boot/dts/fsl/p4080si-post.dtsi
index bf0e7c9..12947cc 100644
--- a/arch/powerpc/boot/dts/fsl/p4080si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/p4080si-post.dtsi
@@ -297,6 +297,7 @@
 		interrupts = <
 			24 2 0 0
 			16 2 1 30>;
+		fsl,portid-mapping = <0x00f80000>;
 
 		pamu0: pamu@0 {
 			reg = <0 0x1000>;
diff --git a/arch/powerpc/boot/dts/fsl/p4080si-pre.dtsi b/arch/powerpc/boot/dts/fsl/p4080si-pre.dtsi
index 0040b5a..38bde09 100644
--- a/arch/powerpc/boot/dts/fsl/p4080si-pre.dtsi
+++ b/arch/powerpc/boot/dts/fsl/p4080si-pre.dtsi
@@ -83,6 +83,7 @@
 			reg = <0>;
 			clocks = <&mux0>;
 			next-level-cache = <&L2_0>;
+			fsl,portid-mapping = <0x80000000>;
 			L2_0: l2-cache {
 				next-level-cache = <&cpc>;
 			};
@@ -92,6 +93,7 @@
 			reg = <1>;
 			clocks = <&mux1>;
 			next-level-cache = <&L2_1>;
+			fsl,portid-mapping = <0x40000000>;
 			L2_1: l2-cache {
 				next-level-cache = <&cpc>;
 			};
@@ -101,6 +103,7 @@
 			reg = <2>;
 			clocks = <&mux2>;
 			next-level-cache = <&L2_2>;
+			fsl,portid-mapping = <0x20000000>;
 			L2_2: l2-cache {
 				next-level-cache = <&cpc>;
 			};
@@ -110,6 +113,7 @@
 			reg = <3>;
 			clocks = <&mux3>;
 			next-level-cache = <&L2_3>;
+			fsl,portid-mapping = <0x10000000>;
 			L2_3: l2-cache {
 				next-level-cache = <&cpc>;
 			};
@@ -119,6 +123,7 @@
 			reg = <4>;
 			clocks = <&mux4>;
 			next-level-cache = <&L2_4>;
+			fsl,portid-mapping = <0x08000000>;
 			L2_4: l2-cache {
 				next-level-cache = <&cpc>;
 			};
@@ -128,6 +133,7 @@
 			reg = <5>;
 			clocks = <&mux5>;
 			next-level-cache = <&L2_5>;
+			fsl,portid-mapping = <0x04000000>;
 			L2_5: l2-cache {
 				next-level-cache = <&cpc>;
 			};
@@ -137,6 +143,7 @@
 			reg = <6>;
 			clocks = <&mux6>;
 			next-level-cache = <&L2_6>;
+			fsl,portid-mapping = <0x02000000>;
 			L2_6: l2-cache {
 				next-level-cache = <&cpc>;
 			};
@@ -146,6 +153,7 @@
 			reg = <7>;
 			clocks = <&mux7>;
 			next-level-cache = <&L2_7>;
+			fsl,portid-mapping = <0x01000000>;
 			L2_7: l2-cache {
 				next-level-cache = <&cpc>;
 			};
diff --git a/arch/powerpc/boot/dts/fsl/p5020si-post.dtsi b/arch/powerpc/boot/dts/fsl/p5020si-post.dtsi
index f7ca9f4..4c4a2b0 100644
--- a/arch/powerpc/boot/dts/fsl/p5020si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/p5020si-post.dtsi
@@ -294,6 +294,7 @@
 		interrupts = <
 			24 2 0 0
 			16 2 1 30>;
+		fsl,portid-mapping = <0x3c000000>;
 
 		pamu0: pamu@0 {
 			reg = <0 0x1000>;
diff --git a/arch/powerpc/boot/dts/fsl/p5020si-pre.dtsi b/arch/powerpc/boot/dts/fsl/p5020si-pre.dtsi
index fe1a2e6..1cc61e1 100644
--- a/arch/powerpc/boot/dts/fsl/p5020si-pre.dtsi
+++ b/arch/powerpc/boot/dts/fsl/p5020si-pre.dtsi
@@ -90,6 +90,7 @@
 			reg = <0>;
 			clocks = <&mux0>;
 			next-level-cache = <&L2_0>;
+			fsl,portid-mapping = <0x80000000>;
 			L2_0: l2-cache {
 				next-level-cache = <&cpc>;
 			};
@@ -99,6 +100,7 @@
 			reg = <1>;
 			clocks = <&mux1>;
 			next-level-cache = <&L2_1>;
+			fsl,portid-mapping = <0x40000000>;
 			L2_1: l2-cache {
 				next-level-cache = <&cpc>;
 			};
diff --git a/arch/powerpc/boot/dts/fsl/p5040si-post.dtsi b/arch/powerpc/boot/dts/fsl/p5040si-post.dtsi
index 91477b5..67296fd 100644
--- a/arch/powerpc/boot/dts/fsl/p5040si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/p5040si-post.dtsi
@@ -248,6 +248,7 @@
 		#size-cells = <1>;
 		interrupts = <24 2 0 0
 			      16 2 1 30>;
+		fsl,portid-mapping = <0x0f800000>;
 
 		pamu0: pamu@0 {
 			reg = <0 0x1000>;
diff --git a/arch/powerpc/boot/dts/fsl/p5040si-pre.dtsi b/arch/powerpc/boot/dts/fsl/p5040si-pre.dtsi
index 3674686..b048a2b 100644
--- a/arch/powerpc/boot/dts/fsl/p5040si-pre.dtsi
+++ b/arch/powerpc/boot/dts/fsl/p5040si-pre.dtsi
@@ -83,6 +83,7 @@
 			reg = <0>;
 			clocks = <&mux0>;
 			next-level-cache = <&L2_0>;
+			fsl,portid-mapping = <0x80000000>;
 			L2_0: l2-cache {
 				next-level-cache = <&cpc>;
 			};
@@ -92,6 +93,7 @@
 			reg = <1>;
 			clocks = <&mux1>;
 			next-level-cache = <&L2_1>;
+			fsl,portid-mapping = <0x40000000>;
 			L2_1: l2-cache {
 				next-level-cache = <&cpc>;
 			};
@@ -101,6 +103,7 @@
 			reg = <2>;
 			clocks = <&mux2>;
 			next-level-cache = <&L2_2>;
+			fsl,portid-mapping = <0x20000000>;
 			L2_2: l2-cache {
 				next-level-cache = <&cpc>;
 			};
@@ -110,6 +113,7 @@
 			reg = <3>;
 			clocks = <&mux3>;
 			next-level-cache = <&L2_3>;
+			fsl,portid-mapping = <0x10000000>;
 			L2_3: l2-cache {
 				next-level-cache = <&cpc>;
 			};
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH v3] powerpc/fsl: Added binding for Freescale CoreNet coherency fabric (CCF)
From: Kumar Gala @ 2014-05-06  2:12 UTC (permalink / raw)
  To: Diana Craciun; +Cc: scottwood, devicetree, linuxppc-dev
In-Reply-To: <1399305499-6612-1-git-send-email-diana.craciun@freescale.com>


On May 5, 2014, at 10:58 AM, Diana Craciun <diana.craciun@freescale.com> wrote:

> From: Diana Craciun <Diana.Craciun@freescale.com>
>
> The CoreNet coherency fabric is a fabric-oriented, connectivity
> infrastructure that enables the implementation of coherent, multicore
> systems. The CCF acts as a central interconnect for cores,
> platform-level caches, memory subsystem, peripheral devices and I/O host
> bridges in the system.
>
> Signed-off-by: Diana Craciun <Diana.Craciun@freescale.com>
> ---
> v3:
> 	- added port ID mapping
> 	- removed fsl,corenetx-cf
>
> .../devicetree/bindings/powerpc/fsl/ccf.txt        | 42 ++++++++++++++++++++++
> .../devicetree/bindings/powerpc/fsl/cpus.txt       |  8 +++++
> .../devicetree/bindings/powerpc/fsl/pamu.txt       |  8 +++++
> 3 files changed, 58 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/powerpc/fsl/ccf.txt

[snip]

> --- a/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt
> +++ b/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt
> @@ -20,3 +20,11 @@ PROPERTIES
> 	a property named fsl,eref-[CAT], where [CAT] is the abbreviated category
> 	name with all uppercase letters converted to lowercase, indicates that
> 	the category is supported by the implementation.
> +
> +	- fsl,portid-mapping : <u32>
> +	The Coherency Subdomain ID Port Mapping Registers and Snoop ID Port Mapping
> +	registers which are part of the CoreNet Coherency fabric (CCF) provide a
> +	CoreNet Coherency Subdomain ID/CoreNet Snoop ID to cpu mapping functions.
> +	Certain bits from these registers should be set if the corresponding CPU
> +	should be snooped. This property defines a bitmask which selects the bit that
> +	should be set if this cpu should be snooped.

Under what cases can software not figure out how to set this based on the PAMUs in the DT?

> diff --git a/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt b/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt
> index 1f5e329..827c637 100644
> --- a/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt
> +++ b/Documentation/devicetree/bindings/powerpc/fsl/pamu.txt
> @@ -26,6 +26,13 @@ Required properties:
> 		  A standard property.
> - #size-cells	: <u32>
> 		  A standard property.
> +- fsl,portid-mapping : <u32>
> +	The Coherency Subdomain ID Port Mapping Registers and Snoop ID Port Mapping
> +	registers which are part of the CoreNet Coherency fabric (CCF) provide a
> +	CoreNet Coherency Subdomain ID/CoreNet Snoop ID to pamu mapping functions.
> +	Certain bits from these registers should be set if PAMUs should be snooped.
> +	This property defines a bitmask which selects the bits that should be set
> +	if PAMUs should be snooped.
>
> Optional properties:
> - reg		: <prop-encoded-array>
> @@ -88,6 +95,7 @@ Example:
> 		compatible = "fsl,pamu-v1.0", "fsl,pamu";
> 		reg = <0x20000 0x5000>;
> 		ranges = <0 0x20000 0x5000>;
> +		fsl,portid-mapping = <0xf80000>;
> 		#address-cells = <1>;
> 		#size-cells = <1>;
> 		interrupts = <
> -- 
> 1.7.11.7
>

^ permalink raw reply

* Re: [PATCH v3] powerpc/fsl: Added binding for Freescale CoreNet coherency fabric (CCF)
From: Scott Wood @ 2014-05-06  2:22 UTC (permalink / raw)
  To: Kumar Gala; +Cc: Diana Craciun, devicetree, linuxppc-dev
In-Reply-To: <1FE471CC-2024-4B39-89E2-FBACF8F18A9B@kernel.crashing.org>

On Mon, 2014-05-05 at 21:12 -0500, Kumar Gala wrote:
> On May 5, 2014, at 10:58 AM, Diana Craciun <diana.craciun@freescale.com> wrote:
> 
> > From: Diana Craciun <Diana.Craciun@freescale.com>
> > 
> > The CoreNet coherency fabric is a fabric-oriented, connectivity
> > infrastructure that enables the implementation of coherent, multicore
> > systems. The CCF acts as a central interconnect for cores,
> > platform-level caches, memory subsystem, peripheral devices and I/O host
> > bridges in the system.
> > 
> > Signed-off-by: Diana Craciun <Diana.Craciun@freescale.com>
> > ---
> > v3:
> > 	- added port ID mapping
> > 	- removed fsl,corenetx-cf
> > 
> > .../devicetree/bindings/powerpc/fsl/ccf.txt        | 42 ++++++++++++++++++++++
> > .../devicetree/bindings/powerpc/fsl/cpus.txt       |  8 +++++
> > .../devicetree/bindings/powerpc/fsl/pamu.txt       |  8 +++++
> > 3 files changed, 58 insertions(+)
> > create mode 100644 Documentation/devicetree/bindings/powerpc/fsl/ccf.txt
> 
> [snip]
> 
> > --- a/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt
> > +++ b/Documentation/devicetree/bindings/powerpc/fsl/cpus.txt
> > @@ -20,3 +20,11 @@ PROPERTIES
> > 	a property named fsl,eref-[CAT], where [CAT] is the abbreviated category
> > 	name with all uppercase letters converted to lowercase, indicates that
> > 	the category is supported by the implementation.
> > +
> > +	- fsl,portid-mapping : <u32>
> > +	The Coherency Subdomain ID Port Mapping Registers and Snoop ID Port Mapping
> > +	registers which are part of the CoreNet Coherency fabric (CCF) provide a
> > +	CoreNet Coherency Subdomain ID/CoreNet Snoop ID to cpu mapping functions.
> > +	Certain bits from these registers should be set if the corresponding CPU
> > +	should be snooped. This property defines a bitmask which selects the bit that
> > +	should be set if this cpu should be snooped.
> 
> Under what cases can software not figure out how to set this based on the PAMUs in the DT?

How would it go about doing that?

Besides the difference between corenet1-cf and corenet2-cf, on
corenet1-cf the position of the PAMU bits depends on the number of CPUs
that the chip was designed for.  This may be different from the number
of CPUs that are actually present (e.g. p4040, or AMP).  It's also a
complication that IMHO is asking for trouble, versus straightforwardly
recording information that is present in a table in the manual.
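
A rough sketch of the point above, not taken from any kernel code: the masks below are copied from the fsl,portid-mapping values in the patch quoted earlier in this thread, while the table layout and the helper names (snoop_mask, pamu_leading_zero_bits) are made up for illustration. It only shows that the PAMU bits sit at a chip-specific offset that happens to equal the CPU count in each .dtsi; how the mask is actually written into the CCF registers is not shown in this thread.

```python
# Values copied from the fsl,portid-mapping properties in the quoted patch.
# The dictionary layout and helper names are illustrative assumptions.
PORTID_MAPPING = {
    "p2041": {"cpus": [0x80000000 >> n for n in range(4)], "pamu": 0x0F000000},
    "p3041": {"cpus": [0x80000000 >> n for n in range(4)], "pamu": 0x0F000000},
    "p4080": {"cpus": [0x80000000 >> n for n in range(8)], "pamu": 0x00F80000},
    "p5020": {"cpus": [0x80000000 >> n for n in range(2)], "pamu": 0x3C000000},
    "p5040": {"cpus": [0x80000000 >> n for n in range(4)], "pamu": 0x0F800000},
}


def snoop_mask(chip, cpu_ids, include_pamu=True):
    """OR together the port-ID bits of the given CPUs and, optionally, the PAMUs."""
    entry = PORTID_MAPPING[chip]
    mask = 0
    for cpu in cpu_ids:
        mask |= entry["cpus"][cpu]
    if include_pamu:
        mask |= entry["pamu"]
    return mask


def pamu_leading_zero_bits(chip):
    """Number of bits in a 32-bit word that precede the first PAMU bit."""
    return 32 - PORTID_MAPPING[chip]["pamu"].bit_length()


if __name__ == "__main__":
    for chip, entry in PORTID_MAPPING.items():
        print(chip, hex(snoop_mask(chip, range(len(entry["cpus"])))),
              pamu_leading_zero_bits(chip))
```

For every chip in the table the PAMU field starts right after the CPU bits listed in that chip's .dtsi, so software that derived the offset by counting cpu nodes would presumably get it wrong on a part that exposes fewer CPUs than the design, such as the p4040 case mentioned above.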

-Scott

^ permalink raw reply

* Re: [RFC PATCH] KVM: PPC: BOOK3S: HV: THP support for guest
From: Paul Mackerras @ 2014-05-06  4:20 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linuxppc-dev, Alexander Graf, kvm-ppc, kvm
In-Reply-To: <87lhug9taz.fsf@linux.vnet.ibm.com>

On Mon, May 05, 2014 at 08:17:00PM +0530, Aneesh Kumar K.V wrote:
> Alexander Graf <agraf@suse.de> writes:
> 
> > On 05/04/2014 07:30 PM, Aneesh Kumar K.V wrote:
> >> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> >
> > No patch description, no proper explanations anywhere why you're doing 
> > what. All of that in a pretty sensitive piece of code. There's no way 
> > this patch can go upstream in its current form.
> >
> 
> Sorry about being vague. Will add a better commit message. The goal is
> to export MPSS support to the guest if the host supports it. MPSS
> support is exported via the penc encoding in "ibm,segment-page-sizes". The
> actual format can be found at htab_dt_scan_page_sizes. When the guest
> memory is backed by hugetlbfs, we expose the penc encodings the host
> supports to the guest via kvmppc_add_seg_page_size.

In a case like this it's good to assume the reader doesn't know very
much about Power CPUs, and probably isn't familiar with acronyms such
as MPSS.  The patch needs an introductory paragraph explaining that on
recent IBM Power CPUs, while the hashed page table is looked up using
the page size from the segmentation hardware (i.e. the SLB), it is
possible to have the HPT entry indicate a larger page size.  Thus for
example it is possible to put a 16MB page in a 64kB segment, but since
the hash lookup is done using a 64kB page size, it may be necessary to
put multiple entries in the HPT for a single 16MB page.  This
capability is called mixed page-size segment (MPSS).  With MPSS,
there are two relevant page sizes: the base page size, which is the
size used in searching the HPT, and the actual page size, which is the
size indicated in the HPT entry.  Note that the actual page size is
always >= base page size.
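
As a small worked example of the arithmetic above (a sketch only; the function name is hypothetical and nothing here comes from the KVM code): the number of base-page-sized pieces inside one actual page gives an upper bound on how many distinct HPT lookups can land in that page.

```python
KB = 1024
MB = 1024 * KB


def base_pages_per_actual_page(base_page_size, actual_page_size):
    """How many base-size pages fit in one actual-size page.

    With MPSS the actual page size must be >= the base page size; this
    sketch also assumes it is an exact multiple of it.
    """
    if actual_page_size < base_page_size:
        raise ValueError("actual page size must be >= base page size")
    if actual_page_size % base_page_size:
        raise ValueError("actual page size must be a multiple of base page size")
    return actual_page_size // base_page_size


if __name__ == "__main__":
    # A 16MB page placed in a segment with a 64kB base page size.
    print(base_pages_per_actual_page(64 * KB, 16 * MB))
```

For the 16MB-in-64kB case this comes to 256, which is why a single large page may need more than one HPT entry.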

> Now the challenge to THP support is to make sure that our henter,
> hremove etc. decode base page size and actual page size correctly
> from the hash table entry values. Most of the changes do that.
> The rest is already handled by KVM.
> 
> NOTE: It is much easier to read the code after applying the patch rather
> than reading the diff. I have added comments around each step in the
> code.

Paul.

^ permalink raw reply

* Re: [PATCH RFC 00/22] EEH Support for VFIO PCI devices on PowerKVM guest
From: Gavin Shan @ 2014-05-06  4:26 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm, aik, Alexander Graf, kvm-ppc, Gavin Shan, qiudayu,
	linuxppc-dev
In-Reply-To: <1399298412.24318.521.camel@ul30vt.home>

On Mon, May 05, 2014 at 08:00:12AM -0600, Alex Williamson wrote:
>On Mon, 2014-05-05 at 13:56 +0200, Alexander Graf wrote:
>> On 05/05/2014 03:27 AM, Gavin Shan wrote:
>> > The series of patches intends to support EEH for PCI devices, which have been
>> > passed through to PowerKVM based guest via VFIO. The implementation is
>> > straightforward based on the issues or problems we have to resolve to support
>> > EEH for PowerKVM based guest.
>> >
>> > - Emulation for EEH RTAS requests. Thankfully, we already have infrastructure
>> >    to emulate XICS. Without introducing a new mechanism, we just extend that
>> >    existing infrastructure to support EEH RTAS emulation. EEH RTAS requests
>> >    initiated from the guest are posted to the host, where the requests get handled or
>> >    delivered to the underlying firmware for further handling. For that, the host kernel
>> >    has to maintain the PCI address (host domain/bus/slot/function to guest's
>> >    PHB BUID/bus/slot/function) mapping via KVM VFIO device. The address mapping
>> >    will be built when initializing VFIO device in QEMU and destroyed when the
>> >    VFIO device in QEMU is going offline, or the VM is destroyed.
>> 
>> Do you also expose all those interfaces to user space? VFIO is as much 
>> about user space device drivers as it is about device assignment.
>> 

Yep, all the interfaces are exported to user space. 

>> I would like to first see an implementation that doesn't touch KVM 
>> emulation code at all but instead routes everything through QEMU. As a 
>> second step we can then accelerate performance critical paths inside of KVM.
>> 

Ok. I'll change the implementation. However, QEMU still has to
poll/push information from/to the host kernel. So the best place for that
would be tce_iommu_driver_ops::ioctl, as EEH is a Power-specific feature.

For the error injection, I guess I have to put the logic token management
into QEMU; the error injection request will be handled by QEMU and then
routed to the host kernel via an additional syscall, as we did for pSeries.

>> That way we ensure that user space device drivers have all the power 
>> over a device they need to drive it.
>
>+1
>

Thanks,
Gavin

^ permalink raw reply

* Re: [PATCH 4/6] powerpc/corenet: Create the dts components for the DPAA FMan
From: Emil Medve @ 2014-05-06  5:54 UTC (permalink / raw)
  To: Scott Wood; +Cc: devicetree, Shruti Kanetkar, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <1399332359.15726.154.camel@snotra.buserror.net>

Hello Scott,


On 05/05/2014 06:25 PM, Scott Wood wrote:
> On Sat, 2014-05-03 at 05:02 -0500, Emil Medve wrote:
>> Hello Scott,
>>
>>
>> On 04/21/2014 05:11 PM, Scott Wood wrote:
>>> On Fri, 2014-04-18 at 07:21 -0500, Shruti Kanetkar wrote:
>>>> +fman@400000 {
>>>> +	mdio@f1000 {
>>>> +		#address-cells = <1>;
>>>> +		#size-cells = <0>;
>>>> +		compatible = "fsl,fman-xmdio";
>>>> +		reg = <0xf1000 0x1000>;
>>>> +	};
>>>> +};
>>>
>>> I'd like to see a complete fman binding before we start adding pieces.
>>
>> The driver for the FMan 10 Gb/s MDIO was upstreamed a couple of years
>> ago: '9f35a73 net/fsl: introduce Freescale 10G MDIO driver', granted
>> without a binding writeup.
> 
> Pushing driver code through the netdev tree does not establish device
> tree ABI.  Binding documents and dts files do.

Sure, ideally and formally. But upstreaming a driver represents, if
nothing else, a statement of intent to observe a device tree ABI. Via
the SDK, FSL customers are using the device tree ABI the driver de facto
establishes. I guess a driver that makes it upstream can establish a
device tree ABI

We'll re-spin adding the binding document

>> This patch series should probably include a
>> binding blurb. However, let's not gate this patchset on a complete
>> binding for the FMan
> 
> I at least want to see enough of the FMan binding to have confidence
> that what we're adding now is correct.

I'm not sure what you're looking for. The nodes we're adding are
describing a very common CCSR space interface for quite common device blocks

>> As you know we don't own the FMan work and the FMan work is... not ready
>> for upstreaming.
> 
> I'm not asking for a driver, just a binding that describes hardware.  Is
> there any reason why the fman node needs to be anywhere near as
> complicated as it is in the SDK, if we're limiting it to actual hardware
> description?

Is this a trick question? :-) Of course it doesn't need to be more
complicated than actual hardware. But, to repeat myself, said
description is not... ready and I don't know when it will be. Somebody
else owns pushing the bulk of FMan upstream and I'd rather not step on
their turf quite like this

> Do we really need to have nodes for all the sub-blocks?

Definitely no, and internally I'm pushing to clean that up. However, you
surely remember we've been pushing from the early days of P4080 and it's
been, to put it optimistically, slow

>> In an attempt to make some sort of progress we've
>> decided to upstream the pieces that are less controversial and MDIO is
>> an obvious candidate
>>
>>>> +fman@400000 {
>>>> +	mdio0: mdio@e1120 {
>>>> +		#address-cells = <1>;
>>>> +		#size-cells = <0>;
>>>> +		compatible = "fsl,fman-mdio";
>>>> +		reg = <0xe1120 0xee0>;
>>>> +	};
>>>> +};
>>>
>>> What is the difference between "fsl,fman-mdio" and "fsl,fman-xmdio"?  I
>>> don't see the latter on the list of compatibles in patch 3/6.
>>
>> 'fsl,fman-mdio' is the 1 Gb/s MDIO (Clause 22 only). 'fsl,fman-xmdio' is
>> the 10 Gb/s MDIO (Clause 45 only). We can respin this patch wi
>>
> 
> "respin this patch wi..."?

Not sure where the end of that sentence went. I meant we'll re-spin with
a binding for the 10 Gb/s MDIO block

>> I believe 'fsl,fman-mdio' (and others on that list) was added
>> gratuitously as the FMan MDIO is completely compatible with the
>> eTSEC/gianfar MDIO driver, but we can deal with that later
> 
> It's still good to identify the specific device, even if it's believed
> to be 100% compatible.

Are you suggesting we create new compatibles for every instance/integration
of a hardware block even though it is identical to an earlier hardware
integration? Well, I guess that's been done, and now we have about 8
different compatibles that convey no real difference at all

> Plus, IIRC there's been enough badness in the
> eTSEC MDIO binding that it'd be good to steer clear of it.

Hmm... I guess we can leave things as they are. I wasn't going to touch
this just now anyway

>>> Within each category, is the exact fman version discoverable from the
>>> mdio registers?
>>
>> No, but that's irrelevant as that's not the difference between the two
>> compatibles
> 
> It's relevant because it means the compatible string should have a block
> version number in it, or at least some other way in the MDIO node to
> indicate the block version.

The 1 Gb/s MDIO block doesn't track a version of its own and from a
programming interface perspective it has no visible difference since
eTSEC. The 10 Gb/s MDIO doesn't track a version of its own either and
across the existing FMan versions is identical from a programming
interface perspective

I guess we can append a 'v1.0' to the MDIO compatible(s). However, given
the SDK we'll have to support the compatibles the (already upstream)
drivers support. Dealing with all that legacy is going to be so tedious

>>>> +fman@500000 {
>>>> +	#address-cells = <1>;
>>>> +	#size-cells = <1>;
>>>> +	compatible = "simple-bus";
>>>
>>> Why is this simple-bus?
>>
>> Because that's the translation type for the FMan sub-nodes.
> 
> What do you mean by "translation type"?

I mean address translation across buses
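
For readers less familiar with the term: address translation across buses is what a "ranges" property expresses. A minimal sketch, assuming one address cell and one size cell on both sides and using the only complete ranges example in this thread (the PAMU node quoted earlier, ranges = <0 0x20000 0x5000>, with the child pamu@0 at reg = <0 0x1000>); the function name is hypothetical and no FMan ranges values are assumed.

```python
def translate(child_addr, ranges):
    """Translate a child-bus address to the parent bus.

    `ranges` is a list of (child_base, parent_base, size) tuples, i.e. a
    ranges property with one address cell on each side and one size cell.
    Returns None when the address is not covered by any entry.
    """
    for child_base, parent_base, size in ranges:
        if child_base <= child_addr < child_base + size:
            return parent_base + (child_addr - child_base)
    return None


if __name__ == "__main__":
    pamu_ranges = [(0x0, 0x20000, 0x5000)]   # ranges = <0 0x20000 0x5000>
    print(hex(translate(0x0, pamu_ranges)))  # pamu@0, reg = <0 0x1000>
```

An address outside the window (0x5000 or above in this example) has no translation, which is the case a bus driver would have to treat as not memory-mapped through this parent.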

>> We need it now to get the MDIO nodes probed
> 
> No.  "simple-bus" is stating an attribute of the hardware, that the
> child nodes represent simple memory-mapped devices that can be used
> without special bus knowledge.  I don't think that applies here.

Yes it does. The FMan sub-nodes are "simple memory-mapped devices that
can be used without special bus knowledge". Perhaps you're thinking
about the PHY devices on the MDIO bus

> You can get the MDIO node probed without misusing simple-bus by adding
> the fman node's compatible to the probe list in the kernel code.

I think that's gratuitous and it's been done gratuitously in the past
for CCSR space (sub-)nodes

> This sort of thing is why I want to see what the rest of the fman
> binding will look like.
> 
>>  and we'll need it later to probe other nodes/devices that will have
>> standalone drivers: MAC, MURAM, etc.
> 
> How are they truly standalone?

I meant that they have individual drivers and they are not handled by
the high-level FMan driver

> They exist in service to the greater
> entity that is fman.  They presumably work together in some fashion.

Some blocks can work independently. The MURAM is an example and it seems
the existing CPM/QE MURAM code allows it to be used as regular memory.
The MDIO block could handle PHY(s) for other MACs in the system.

>>>> +	/* mdio nodes for fman v3 @ 0x500000 */
>>>> +	mdio@fc000 {
>>>> +		#address-cells = <1>;
>>>> +		#size-cells = <0>;
>>>> +		reg = <0xfc000 0x1000>;
>>>> +	};
>>>> +
>>>> +	mdio@fd000 {
>>>> +		#address-cells = <1>;
>>>> +		#size-cells = <0>;
>>>> +		reg = <0xfd000 0x1000>;
>>>> +	};
>>>> +};
>>>
>>> Where's the compatible?  Why is this file different from all the others?
>>
>> The FMan v3 MDIO block (supports both Clause 22/45) is compatible with
>> the FMan v2 10 Gb/s MDIO (the xgmac-mdio driver). However, the driver
>> needs a small clean-up patch (still in internal review) that will get it
>> working for FMan v3 MDIO.
> 
> This suggests that it is not 100% backwards compatible.

It is. The code is just not everything it should be


Cheers,


>>  With that patch we'll add the compatible to these nodes. However, we
>> need these nodes now for the board level MDIO bus muxing support
>> (included in this patchset)
> 
> If you need these nodes now then add the compatible property now.
> 
> -Scott

^ permalink raw reply

* Re: [PATCH 5/6] powerpc/corenet: Add DPAA FMan support to the SoC device tree(s)
From: Emil Medve @ 2014-05-06  6:28 UTC (permalink / raw)
  To: Scott Wood
  Cc: devicetree, Kanetkar Shruti-B44454, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <1399332886.15726.161.camel@snotra.buserror.net>

Hello Scott,


On 05/05/2014 06:34 PM, Scott Wood wrote:
> On Sun, 2014-05-04 at 05:59 -0500, Emil Medve wrote:
>> Hello Scott,
>>
>>
>> On 04/21/2014 05:14 PM, Scott Wood wrote:
>>> On Fri, 2014-04-18 at 07:21 -0500, Shruti Kanetkar wrote:
>>>> FMan 1 Gb/s MACs (dTSEC and mEMAC) have support for SGMII PHYs.
>>>> Add support for the internal SerDes TBI PHYs
>>>>
>>>> Based on prior work by Andy Fleming <afleming@gmail.com>
>>>>
>>>> Signed-off-by: Shruti Kanetkar <Shruti@Freescale.com>
>>>> ---
>>>>  arch/powerpc/boot/dts/fsl/b4860si-post.dtsi |  28 +++++
>>>>  arch/powerpc/boot/dts/fsl/b4si-post.dtsi    |  51 +++++++++
>>>>  arch/powerpc/boot/dts/fsl/p1023si-post.dtsi |  14 +++
>>>>  arch/powerpc/boot/dts/fsl/p2041si-post.dtsi |  64 ++++++++++++
>>>>  arch/powerpc/boot/dts/fsl/p3041si-post.dtsi |  64 ++++++++++++
>>>>  arch/powerpc/boot/dts/fsl/p4080si-post.dtsi | 104 +++++++++++++++++++
>>>>  arch/powerpc/boot/dts/fsl/p5020si-post.dtsi |  64 ++++++++++++
>>>>  arch/powerpc/boot/dts/fsl/p5040si-post.dtsi | 128 +++++++++++++++++++++++
>>>>  arch/powerpc/boot/dts/fsl/t4240si-post.dtsi | 154 ++++++++++++++++++++++++++++
>>>>  9 files changed, 671 insertions(+)
>>>>
>>>> diff --git a/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi b/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi
>>>> index cbc354b..45b0ff5 100644
>>>> --- a/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi
>>>> +++ b/arch/powerpc/boot/dts/fsl/b4860si-post.dtsi
>>>> @@ -172,6 +172,34 @@
>>>>  		compatible = "fsl,b4860-rcpm", "fsl,qoriq-rcpm-2.0";
>>>>  	};
>>>>  
>>>> +/include/ "qoriq-fman3-0-1g-4.dtsi"
>>>> +/include/ "qoriq-fman3-0-1g-5.dtsi"
>>>> +/include/ "qoriq-fman3-0-10g-0.dtsi"
>>>> +/include/ "qoriq-fman3-0-10g-1.dtsi"
>>>> +	fman@400000 {
>>>> +		ethernet@e8000 {
>>>> +			tbi-handle = <&tbi4>;
>>>> +		};
>>>
>>> Binding needed
>>>
>>> Where is the "reg" for these unit addresses?
>>
>> As I said, the bulk of the FMan work comes from another team. Here we
>> need just enough to hook up the MDIO and PHY nodes.
> 
> Unit addresses must match reg.  No reg, no unit address.

We can add a 'reg' property, but we really don't want to clash with the
team that is working on upstreaming the FMan/MAC bindings and drivers
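
The rule quoted above ("Unit addresses must match reg. No reg, no unit address.") is easy to check mechanically. A minimal sketch, assuming a single address cell as in the nodes posted in this series; the function name is hypothetical, and the treatment of a node that has reg but no unit address is an assumption, since the thread only states the other direction.

```python
def unit_address_matches_reg(node_name, reg_cells):
    """Check a node name such as 'mdio@e1120' against its reg property.

    `reg_cells` is the list of reg cells (address first), or None/empty
    when the node has no reg property.
    """
    _, sep, unit = node_name.partition("@")
    if not sep:
        # No unit address: accepted here only when there is no reg either.
        return not reg_cells
    if not reg_cells:
        # "No reg, no unit address."
        return False
    return int(unit, 16) == reg_cells[0]


if __name__ == "__main__":
    print(unit_address_matches_reg("mdio@f1000", [0xF1000, 0x1000]))
    print(unit_address_matches_reg("ethernet@e8000", None))
```

Applied to the nodes in this series, mdio@f1000 and mdio@e1120 pass, while ethernet@e8000 as posted (no reg) does not.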

>> I'd really like to be able to make progress on this without waiting for that moment in time
>> we can get the entire FMan binding in place
> 
> Why is the fman binding such a big deal?
> 
>>>> +		mdio@e9000 {
>>>> +			tbi4: tbi-phy@8 {
>>>> +				reg = <0x8>;
>>>> +				device_type = "tbi-phy";
>>>> +			};
>>>> +		};
>>>
>>> Binding needed for tbi-phy device_type
>>
>> I guess that's fair (BTW, you accepted tbi-phy nodes/device-type before
>> without a binding)
> 
> It's existing practice on eTSEC.  FMan seemed like an opportunity to
> avoid carrying cruft forward.

The 1 Gb/s MDIO block is not FMan specific. As I said, it is the same block
as in eTSEC. That's part of the reason we're trying to upstream this
independently of the FMan stuff. So, don't think FMan, think MDIO

>>> Why are we using device_type at all for this?
>>
>> That's what the upstream driver is looking for.
> 
> Drivers should look for what the binding says -- not the other way
> around.

Yeah yeah. Nobody likes it, but the driver is/describes the de facto binding

On a constructive note, the Ethernet PHY code doesn't do device tree
based probing so no compatibles are used at all. So device_type is used
to convey a TBI PHY

>>  Anyway, most days PHYs can be discovered so they don't use/need
>> compatible properties. That's I guess part of the reason we don't have
>> bindings for them PHY nodes
> 
> I don't see why there couldn't be a compatible that describes the
> standard programming interface.

Because it can be detected at runtime and I guess stuff like that should
stay out of the device tree. I'm using PCI as an analogy here

>> However, what you can't discover is how they are wired to the MAC(s) so
>> we still need some nodes in the device tree to convey that. Also, when
>> looking for a specific kind of PHY, such as TBI, device_type works
>> easier than parsing compatibles from various vendors or so
> 
> Don't you find the TBI by following the tbi-handle property?

When the MAC "attaches" to the PHY the tbi-handle is followed. But the
MDIO/PHY code/driver(s) doesn't quite "see" the tbi-handle as it's
outside the MDIO/PHY nodes

> That said,
> I don't object to having a way to label a PHY as attached via TBI if
> that's useful.  I'm giving a mild, non-nacking (given the history)
> objection to using device_type for that (given other history).

Personally, I think that TBI PHY support is a bit messy but I don't have
bandwidth to deal with that. The TBI PHY should be handled as a regular
PHY and right now it is a special case


Cheers,

^ permalink raw reply

* Re: [PATCH RFC 00/22] EEH Support for VFIO PCI devices on PowerKVM guest
From: Alexander Graf @ 2014-05-06  6:56 UTC (permalink / raw)
  To: Gavin Shan, Alex Williamson; +Cc: kvm, aik, kvm-ppc, qiudayu, linuxppc-dev
In-Reply-To: <20140506042622.GA24228@shangw>


On 06.05.14 06:26, Gavin Shan wrote:
> On Mon, May 05, 2014 at 08:00:12AM -0600, Alex Williamson wrote:
>> On Mon, 2014-05-05 at 13:56 +0200, Alexander Graf wrote:
>>> On 05/05/2014 03:27 AM, Gavin Shan wrote:
>>>> The series of patches intends to support EEH for PCI devices, which have been
>>>> passed through to PowerKVM based guest via VFIO. The implementation is
>>>> straightforward based on the issues or problems we have to resolve to support
>>>> EEH for PowerKVM based guest.
>>>>
>>>> - Emulation for EEH RTAS requests. Thankfully, we already have infrastructure
>>>>     to emulate XICS. Without introducing a new mechanism, we just extend that
>>>>     existing infrastructure to support EEH RTAS emulation. EEH RTAS requests
>>>>     initiated from the guest are posted to the host, where the requests get handled or
>>>>     delivered to the underlying firmware for further handling. For that, the host kernel
>>>>     has to maintain the PCI address (host domain/bus/slot/function to guest's
>>>>     PHB BUID/bus/slot/function) mapping via KVM VFIO device. The address mapping
>>>>     will be built when initializing VFIO device in QEMU and destroyed when the
>>>>     VFIO device in QEMU is going offline, or the VM is destroyed.
>>> Do you also expose all those interfaces to user space? VFIO is as much
>>> about user space device drivers as it is about device assignment.
>>>
> Yep, all the interfaces are exported to user space.
>
>>> I would like to first see an implementation that doesn't touch KVM
>>> emulation code at all but instead routes everything through QEMU. As a
>>> second step we can then accelerate performance critical paths inside of KVM.
>>>
> Ok. I'll change the implementation. However, QEMU still has to
> poll/push information from/to the host kernel. So the best place for that
> would be tce_iommu_driver_ops::ioctl, as EEH is a Power-specific feature.
>
> For the error injection, I guess I have to put the logic token management
> into QEMU; the error injection request will be handled by QEMU and then
> routed to the host kernel via an additional syscall, as we did for pSeries.

Yes, start off without in-kernel XICS so everything simply lives in 
QEMU. Then add callbacks into the in-kernel XICS to inject these 
interrupts if we don't have wide enough interfaces already.



Alex

^ permalink raw reply

* Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
From: Alexander Graf @ 2014-05-06  6:57 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev, Aneesh Kumar K.V, kvm-ppc, kvm
In-Reply-To: <20140506004133.GA12595@iris.ozlabs.ibm.com>


On 06.05.14 02:41, Paul Mackerras wrote:
> On Mon, May 05, 2014 at 01:19:30PM +0200, Alexander Graf wrote:
>> On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote:
>>> +#ifdef CONFIG_PPC_BOOK3S_64
>>> +	return vcpu->arch.fault_dar;
>> How about PA6T and G5s?
> G5 sets DAR on an alignment interrupt.
>
> As for PA6T, I don't know for sure, but if it doesn't, ordinary
> alignment interrupts wouldn't be handled properly, since the code in
> arch/powerpc/kernel/align.c assumes DAR contains the address being
> accessed on all PowerPC CPUs.

Now that's a good point. If we simply behave like Linux, I'm fine. This 
definitely deserves a comment on the #ifdef in the code.
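
A rough sketch of what such a comment might look like. This uses stand-in
types rather than the real kvm_vcpu structures: every name with a _sketch
suffix is made up for illustration, and the non-Book3S-64 fallback is an
assumption, not what the patch actually does.

```c
#include <stdint.h>

/* Defined here only so that this standalone sketch takes the 64-bit path. */
#define CONFIG_PPC_BOOK3S_64 1

/* Hypothetical stand-ins for the kernel's vcpu structures. */
struct vcpu_arch_sketch {
	uint64_t fault_dar;	/* DAR value saved when the interrupt was taken */
};

struct vcpu_sketch {
	struct vcpu_arch_sketch arch;
};

/*
 * Return the address to report for an alignment interrupt.
 * decoded_addr is a hypothetical fallback computed by the caller.
 */
static uint64_t alignment_dar_sketch(const struct vcpu_sketch *vcpu,
				     uint64_t decoded_addr)
{
#ifdef CONFIG_PPC_BOOK3S_64
	/*
	 * Behave like Linux itself: arch/powerpc/kernel/align.c assumes
	 * that DAR contains the address being accessed on all PowerPC
	 * CPUs, so the saved value can be used as is.
	 */
	(void)decoded_addr;
	return vcpu->arch.fault_dar;
#else
	/* Assumed fallback for this sketch only. */
	(void)vcpu;
	return decoded_addr;
#endif
}
```

With the define in place, the function returns whatever was saved in
fault_dar and ignores the fallback argument.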


Alex


* Re: [PATCH] KVM: PPC: BOOK3S: HV: Don't try to allocate from kernel page allocator for hash page table.
From: Alexander Graf @ 2014-05-06  7:05 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: linuxppc-dev@lists.ozlabs.org, paulus@samba.org, Aneesh Kumar K.V,
	kvm-ppc@vger.kernel.org, kvm@vger.kernel.org
In-Reply-To: <1399334797.20388.71.camel@pasglop>


On 06.05.14 02:06, Benjamin Herrenschmidt wrote:
> On Mon, 2014-05-05 at 17:16 +0200, Alexander Graf wrote:
>> Isn't this a greater problem? We should start swapping before we hit
>> the point where non movable kernel allocation fails, no?
> Possibly but the fact remains, this can be avoided by making sure that
> if we create a CMA reserve for KVM, then it uses it rather than using
> the rest of main memory for hash tables.

So why were we preferring non-CMA memory before? Considering that Aneesh 
introduced that logic in fa61a4e3 I suppose this was just a mistake?

>> The fact that KVM uses a good number of normal kernel pages is maybe
>> suboptimal, but shouldn't be a critical problem.
> The point is that we explicitly reserve those pages in CMA for use
> by KVM for that specific purpose, but the current code tries first
> to get them out of the normal pool.
>
> This is not an optimal behaviour and is what Aneesh patches are
> trying to fix.

I agree, and I agree that it's worth it to make better use of our 
resources. But we still shouldn't crash.

However, reading through this thread I think I've slowly grasped what 
the problem is. The hugetlbfs size calculation.

I guess something in your stack overreserves huge pages because it 
doesn't account for the fact that some part of system memory is already 
reserved for CMA.

So the underlying problem is something completely orthogonal. The patch 
body as is is fine, but the patch description should simply say that we 
should prefer the CMA region because it's already reserved for us for 
this purpose and we make better use of our available resources that way.

All the bits about pinning, numa, libvirt and whatnot don't really 
matter and are just details that led Aneesh to find this non-optimal 
allocation.


Alex


* Re: [PATCH RFC 00/22] EEH Support for VFIO PCI devices on PowerKVM guest
From: Benjamin Herrenschmidt @ 2014-05-06  7:14 UTC (permalink / raw)
  To: Alexander Graf
  Cc: kvm, aik, Gavin Shan, kvm-ppc, Alex Williamson, qiudayu,
	linuxppc-dev
In-Reply-To: <536887A3.30703@suse.de>

On Tue, 2014-05-06 at 08:56 +0200, Alexander Graf wrote:
> > For the error injection, I guess I have to put the logic token management
> > into QEMU and error injection request will be handled by QEMU and then
> > routed to host kernel via additional syscall as we did for pSeries.
> 
> Yes, start off without in-kernel XICS so everything simply lives in 
> QEMU. Then add callbacks into the in-kernel XICS to inject these 
> interrupts if we don't have wide enough interfaces already.

It's got nothing to do with XICS ... :-)

But yes, we can route everything via qemu for now. Later we'll need
at least one of the calls to have a "direct" path, and we should probably
strive to even make it real mode if that's possible: it's the one that
Linux will call whenever an MMIO returns all f's, to check if the
underlying PE is frozen.
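
To illustrate why that one call is the hot path, here is a minimal sketch
of the check. All names are hypothetical and the PE state query is mocked
with a callback; this is not the real EEH code, only the shape of it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PE states for this sketch. */
enum pe_state_sketch { PE_STATE_NORMAL, PE_STATE_FROZEN };

/* Mock PE: holds a state and counts how often it was queried. */
struct mock_pe {
	enum pe_state_sketch state;
	unsigned int queries;
};

/* Stand-in for the (expensive) state query that goes to the host. */
typedef enum pe_state_sketch (*pe_state_query_fn)(void *opaque);

static enum pe_state_sketch mock_query(void *opaque)
{
	struct mock_pe *pe = opaque;

	pe->queries++;
	return pe->state;
}

/*
 * Decide whether a 32-bit MMIO read hit a frozen PE. Only an all-f's
 * value triggers the query, so ordinary reads never pay for it.
 */
static bool mmio_read_failed_sketch(uint32_t value,
				    pe_state_query_fn query, void *opaque)
{
	if (value != UINT32_MAX)
		return false;	/* ordinary data, nothing to check */

	/* All f's may also be legitimate data, so ask for the PE state. */
	return query(opaque) == PE_STATE_FROZEN;
}
```

The query is on the critical path whenever a device legitimately returns
all f's, which is why a direct path for it matters.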

But we can do that as a second stage.

In fact, going via VFIO ioctls does make the whole security and
translation model much simpler initially.

Cheers,
Ben.


* Re: [PATCH] KVM: PPC: BOOK3S: HV: Don't try to allocate from kernel page allocator for hash page table.
From: Benjamin Herrenschmidt @ 2014-05-06  7:19 UTC (permalink / raw)
  To: Alexander Graf
  Cc: linuxppc-dev@lists.ozlabs.org, paulus@samba.org, Aneesh Kumar K.V,
	kvm-ppc@vger.kernel.org, kvm@vger.kernel.org
In-Reply-To: <536889C6.1050603@suse.de>

On Tue, 2014-05-06 at 09:05 +0200, Alexander Graf wrote:
> On 06.05.14 02:06, Benjamin Herrenschmidt wrote:
> > On Mon, 2014-05-05 at 17:16 +0200, Alexander Graf wrote:
> >> Isn't this a greater problem? We should start swapping before we hit
> >> the point where non movable kernel allocation fails, no?
> > Possibly but the fact remains, this can be avoided by making sure that
> > if we create a CMA reserve for KVM, then it uses it rather than using
> > the rest of main memory for hash tables.
> 
> So why were we preferring non-CMA memory before? Considering that Aneesh 
> introduced that logic in fa61a4e3 I suppose this was just a mistake?

I assume so.

> >> The fact that KVM uses a good number of normal kernel pages is maybe
> >> suboptimal, but shouldn't be a critical problem.
> > The point is that we explicitly reserve those pages in CMA for use
> > by KVM for that specific purpose, but the current code tries first
> > to get them out of the normal pool.
> >
> > This is not an optimal behaviour and is what Aneesh patches are
> > trying to fix.
> 
> I agree, and I agree that it's worth it to make better use of our 
> resources. But we still shouldn't crash.

Well, Linux hitting out of memory conditions has never been a happy
story :-)

> However, reading through this thread I think I've slowly grasped what 
> the problem is. The hugetlbfs size calculation.

Not really.

> I guess something in your stack overreserves huge pages because it 
> doesn't account for the fact that some part of system memory is already 
> reserved for CMA.

Either that or Linux simply runs out because we dirty too fast...
really, Linux has never been good at dealing with OOM situations,
especially when things like network drivers and filesystems try to do
ATOMIC or NOIO allocs...
 
> So the underlying problem is something completely orthogonal. The patch 
> body as is is fine, but the patch description should simply say that we 
> should prefer the CMA region because it's already reserved for us for 
> this purpose and we make better use of our available resources that way.

No.

We give a chunk of memory to hugetlbfs, it's all good and fine.

Whatever remains is split between CMA and the normal page allocator.

Without Aneesh's latest patch, when creating guests, KVM starts allocating
its hash tables from the latter instead of CMA (we never allocate from
hugetlb pool afaik, only guest pages do that, not hash tables).

So we exhaust the page allocator and get Linux into OOM conditions
while there's plenty of space in CMA. But the kernel cannot use CMA for
its own allocations, only to back user pages, which we don't care about
because our guest pages are covered by our hugetlb reserve :-)
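
A toy model of the two allocation orders, just to make the difference
concrete. The pools are plain page counters and every name here is made
up for illustration; the real allocators obviously do not work like this.

```c
#include <stdbool.h>
#include <stddef.h>

/* Free pages left after hugetlbfs took its share (toy model). */
struct pools_sketch {
	size_t normal_free;	/* normal page allocator */
	size_t cma_free;	/* CMA reserve set aside for KVM */
};

enum hpt_source { HPT_NONE, HPT_NORMAL, HPT_CMA };

/* Take pages from one pool if it has enough left. */
static bool take_pages(size_t *pool, size_t pages)
{
	if (*pool < pages)
		return false;
	*pool -= pages;
	return true;
}

/* Order described above as the old one: normal pool first, then CMA. */
static enum hpt_source hpt_alloc_normal_first(struct pools_sketch *p,
					      size_t pages)
{
	if (take_pages(&p->normal_free, pages))
		return HPT_NORMAL;
	if (take_pages(&p->cma_free, pages))
		return HPT_CMA;
	return HPT_NONE;
}

/* Order with the patch, as the subject line suggests: CMA only. */
static enum hpt_source hpt_alloc_cma_only(struct pools_sketch *p,
					  size_t pages)
{
	return take_pages(&p->cma_free, pages) ? HPT_CMA : HPT_NONE;
}
```

With the first order every new guest eats into normal_free while cma_free
sits untouched; with the second, the normal pool is left alone for the
kernel's own allocations.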

> All the bits about pinning, numa, libvirt and whatnot don't really 
> matter and are just details that led Aneesh to find this non-optimal 
> allocation.

Cheers,
Ben.


* Re: [PATCH] KVM: PPC: BOOK3S: HV: Don't try to allocate from kernel page allocator for hash page table.
From: Alexander Graf @ 2014-05-06  7:21 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: linuxppc-dev@lists.ozlabs.org, paulus@samba.org, Aneesh Kumar K.V,
	kvm-ppc@vger.kernel.org, kvm@vger.kernel.org
In-Reply-To: <1399360775.20388.112.camel@pasglop>


On 06.05.14 09:19, Benjamin Herrenschmidt wrote:
> On Tue, 2014-05-06 at 09:05 +0200, Alexander Graf wrote:
>> On 06.05.14 02:06, Benjamin Herrenschmidt wrote:
>>> On Mon, 2014-05-05 at 17:16 +0200, Alexander Graf wrote:
>>>> Isn't this a greater problem? We should start swapping before we hit
>>>> the point where non movable kernel allocation fails, no?
>>> Possibly but the fact remains, this can be avoided by making sure that
>>> if we create a CMA reserve for KVM, then it uses it rather than using
>>> the rest of main memory for hash tables.
>> So why were we preferring non-CMA memory before? Considering that Aneesh
>> introduced that logic in fa61a4e3 I suppose this was just a mistake?
> I assume so.
>
>>>> The fact that KVM uses a good number of normal kernel pages is maybe
>>>> suboptimal, but shouldn't be a critical problem.
>>> The point is that we explicitly reserve those pages in CMA for use
>>> by KVM for that specific purpose, but the current code tries first
>>> to get them out of the normal pool.
>>>
>>> This is not an optimal behaviour and is what Aneesh patches are
>>> trying to fix.
>> I agree, and I agree that it's worth it to make better use of our
>> resources. But we still shouldn't crash.
> Well, Linux hitting out of memory conditions has never been a happy
> story :-)
>
>> However, reading through this thread I think I've slowly grasped what
>> the problem is. The hugetlbfs size calculation.
> Not really.
>
>> I guess something in your stack overreserves huge pages because it
>> doesn't account for the fact that some part of system memory is already
>> reserved for CMA.
> Either that or Linux simply runs out because we dirty too fast...
> really, Linux has never been good at dealing with OOM situations,
> especially when things like network drivers and filesystems try to do
> ATOMIC or NOIO allocs...
>   
>> So the underlying problem is something completely orthogonal. The patch
>> body as is is fine, but the patch description should simply say that we
>> should prefer the CMA region because it's already reserved for us for
>> this purpose and we make better use of our available resources that way.
> No.
>
> We give a chunk of memory to hugetlbfs, it's all good and fine.
>
> Whatever remains is split between CMA and the normal page allocator.
>
> Without Aneesh's latest patch, when creating guests, KVM starts allocating
> its hash tables from the latter instead of CMA (we never allocate from
> hugetlb pool afaik, only guest pages do that, not hash tables).
>
> So we exhaust the page allocator and get Linux into OOM conditions
> while there's plenty of space in CMA. But the kernel cannot use CMA for
> its own allocations, only to back user pages, which we don't care about
> because our guest pages are covered by our hugetlb reserve :-)

Yes. Write that in the patch description and I'm happy ;).


Alex


