LinuxPPC-Dev Archive on lore.kernel.org
* Re: [PATCH] powerpc: add vr save/restore functions
From: Andreas Schwab @ 2014-01-08  9:54 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev
In-Reply-To: <1389154699.2076.6.camel@concordia>

Michael Ellerman <michael@ellerman.id.au> writes:

> On Mon, 2013-12-30 at 15:31 +0100, Andreas Schwab wrote:
>> GCC 4.8 now generates out-of-line vr save/restore functions when
>> optimizing for size.  They are needed for the raid6 altivec support.
>
> It looks like they're identical for 32 & 64-bit ?

They use different temporary registers and calling conventions (no .opd
for ppc64).
  
Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


* Re: [PATCH] slub: Don't throw away partial remote slabs if there is no local memory
From: Anton Blanchard @ 2014-01-08 14:03 UTC (permalink / raw)
  To: Andi Kleen; +Cc: cl, nacc, penberg, linux-mm, paulus, mpm, linuxppc-dev
In-Reply-To: <871u0k5lri.fsf@tassilo.jf.intel.com>


Hi Andi,

> > Thoughts? It seems like we could hit a similar situation if a
> > machine is balanced but we run out of memory on a single node.
> 
> Yes I agree, but your patch doesn't seem to attempt to handle this?

It doesn't. I was hoping someone with more mm knowledge than I could
suggest a lightweight way of doing this.

Anton


* Re: [PATCH] slub: Don't throw away partial remote slabs if there is no local memory
From: Anton Blanchard @ 2014-01-08 14:14 UTC (permalink / raw)
  To: David Laight
  Cc: cl@linux-foundation.org, nacc@linux.vnet.ibm.com,
	penberg@kernel.org, linux-mm@kvack.org, paulus@samba.org,
	mpm@selenic.com, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D453A4E@AcuExch.aculab.com>


Hi David,

> Why not just delete the entire test?
> Presumably some time a little earlier no local memory was available.
> Even if there is some available now, it is very likely that some won't
> be available again in the near future.

I agree, the current behaviour seems strange but it has been around
since the initial slub commit.

Anton


* Re: [PATCH] slub: Don't throw away partial remote slabs if there is no local memory
From: Anton Blanchard @ 2014-01-08 14:17 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: cl, nacc, penberg, linux-mm, paulus, mpm, linuxppc-dev
In-Reply-To: <20140107041939.GA20916@hacker.(null)>


Hi Wanpeng,

> >+		if (node_spanned_pages(node)) {
> 
> s/node_spanned_pages/node_present_pages 

Thanks, I hadn't come across node_present_pages() before.

Anton
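
The difference between the two counters can be sketched in user space.
This is only an illustration under stated assumptions: struct mock_node
and both helpers are hypothetical stand-ins, not the kernel's pg_data_t
or its node_spanned_pages()/node_present_pages() accessors. It assumes
the spanned count covers the whole page-frame range including holes,
while the present count covers only pages backed by memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two per-node counters discussed above. */
struct mock_node {
	unsigned long spanned_pages;	/* size of the PFN range, holes included */
	unsigned long present_pages;	/* pages actually backed by memory */
};

/* Check used in the original hunk: true even if the range is all holes. */
static bool mock_node_spans_range(const struct mock_node *n)
{
	assert(n != NULL);
	return n->spanned_pages != 0;
}

/* Check suggested in review: true only if the node has usable memory. */
static bool mock_node_has_memory(const struct mock_node *n)
{
	assert(n != NULL);
	return n->present_pages != 0;
}
```

For a node whose range consists only of holes, the first helper reports
true and the second reports false, which is the case the suggested
substitution is meant to catch.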


* RE: [PATCH] KVM: PPC: Add devname:kvm aliases for modules
From: mihai.caraman @ 2014-01-08 15:41 UTC (permalink / raw)
  To: agraf@suse.de, kvm-ppc@vger.kernel.org
  Cc: linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org mailing list

> -----Original Message-----
> From: kvm-ppc-owner@vger.kernel.org [mailto:kvm-ppc-
> owner@vger.kernel.org] On Behalf Of Alexander Graf
> Sent: Monday, December 09, 2013 5:02 PM
> To: "; " <kvm-ppc@vger.kernel.org>"@suse.de
> Cc: kvm@vger.kernel.org mailing list
> Subject: [PATCH] KVM: PPC: Add devname:kvm aliases for modules
>
> Systems that support automatic loading of kernel modules through
> device aliases should try and automatically load kvm when /dev/kvm
> gets opened.
>
> Add code to support that magic for all PPC kvm targets, even the
> ones that don't support modules yet.
>
> Signed-off-by: Alexander Graf <agraf@suse.de>

...

> --- a/arch/powerpc/kvm/e500mc.c
> +++ b/arch/powerpc/kvm/e500mc.c
> @@ -391,3 +391,6 @@ static void __exit kvmppc_e500mc_exit(void)
>
>  module_init(kvmppc_e500mc_init);
>  module_exit(kvmppc_e500mc_exit);
> +#include <linux/miscdevice.h>
> +MODULE_ALIAS_MISCDEV(KVM_MINOR);
> +MODULE_ALIAS("devname:kvm");
> --

This patch breaks the build on KVM Book3E; you need to include
<linux/module.h> too.

-Mike


* Re: [PATCH RFC v6 4/5] dma: mpc512x: register for device tree channel lookup
From: Gerhard Sittig @ 2014-01-08 16:47 UTC (permalink / raw)
  To: Alexander Popov
  Cc: Lars-Peter Clausen, Arnd Bergmann, Vinod Koul, dmaengine,
	Dan Williams, Anatolij Gustschin, linuxppc-dev
In-Reply-To: <CAF0T0X7wotm60Bq4tfYcRhkDwiCw2hh5R5m2WmVLmLGLZYpQsQ@mail.gmail.com>

[ dropping devicetree from the Cc: list ]

[ what are the semantics of the DMA_PRIVATE capability flag?
  is documentation available beyond the initial commit message?
  need individual channels be handled instead of controllers? ]

On Sat, Jan 04, 2014 at 00:54 +0400, Alexander Popov wrote:
> 
> Hello Gerhard.
> Thanks for your review.
> 
> 2013/12/26 Gerhard Sittig <gsi@denx.de>:
> > [ dropping devicetree, we're DMA specific here ]
> >
> > On Tue, Dec 24, 2013 at 16:06 +0400, Alexander Popov wrote:
> >>
> >> --- a/drivers/dma/mpc512x_dma.c
> >> +++ b/drivers/dma/mpc512x_dma.c
> >> [ ... ]
> >> @@ -950,6 +951,7 @@ static int mpc_dma_probe(struct platform_device *op)
> >>       INIT_LIST_HEAD(&dma->channels);
> >>       dma_cap_set(DMA_MEMCPY, dma->cap_mask);
> >>       dma_cap_set(DMA_SLAVE, dma->cap_mask);
> >> +     dma_cap_set(DMA_PRIVATE, dma->cap_mask);
> >>
> >>       for (i = 0; i < dma->chancnt; i++) {
> >>               mchan = &mdma->channels[i];
> >
> > What are the implications of this?  Is a comment due?
> 
> I've added the DMA_PRIVATE flag because the new of_dma_xlate_by_chan_id()
> uses dma_get_slave_channel() instead of dma_request_channel()
> (PATCH RFC v6 3/5). This flag is implicitly set in dma_request_channel(),
> but is not set in dma_get_slave_channel().
> 
> There are only two places in the mainline kernel, where
> dma_get_slave_channel() is used. I've picked up the idea
> at one of these places. Please look at this patch:
> http://www.spinics.net/lists/arm-kernel/msg268718.html

I agree that the change looks simple, and there is no doubt that
other drivers apply the flag.  None of this I questioned.  Yet
I'm afraid that the implications are rather huge.

Unless I miss something, I'd happily learn where I'm wrong.

> > I haven't found documentation about the DMA_PRIVATE flag, only
> > saw commit 59b5ec21446b9 "dmaengine: introduce
> > dma_request_channel and private channels".
> 
> Unfortunately I didn't find any description of the DMA_PRIVATE flag either.
> But the comment at the beginning of drivers/dma/dmaengine.c
> may give a clue. Quotation:
>   * subsystem can get access to a channel by calling dmaengine_get() followed
>   * by dma_find_channel(), or if it has need for an exclusive channel it can call
>   * dma_request_channel().  Once a channel is allocated a reference is taken
>   * against its corresponding driver to disable removal.
> 
> DMA_PRIVATE capability flag might indicate that the DMA controller
> can provide exclusive channels to its clients. Please correct me if I'm wrong.
> 
> > Alex, unless I'm
> > missing something this one-line change is quite a change in
> > semantics, and has dramatic influence on the code's behaviour
> > (ignores the DMA controller when looking for channels that can do
> > mem-to-mem transfers)
> 
> Excuse me, Gerhard, I don't see what you mean.
> Could you point to the corresponding code?

You did see `git show 59b5ec21446b9`, didn't you?  The commit
message strongly suggests that DMA_PRIVATE applies to the whole
DMA controller and excludes _all_ of its channels from the
general purpose allocator which mem-to-mem transfers appear to be
using.  It's not just a hint, but an active decision to reject
requests.

Not only checking code references, but doing a text search,
reveals one more comment on the DMA_PRIVATE flag in a crypto
related document, which supports my interpretation:
Documentation/crypto/async-tx-api.txt:203


Can somebody ACK or NAK my interpretation?  Dan, you committed
this change which introduced the DMA_PRIVATE logic.  What was the
motivation for it, or the goal to achieve?  Do other platforms
have several dedicated DMA controllers, some for peripherals and
some for memory transfers?  Should the "private" flag apply to
channels and not whole controllers?  Am I over-estimating the
benefit or importance of DMA supported memory transfers?


Still I see a difference in the lookup approaches:  Yours applies
DMA_PRIVATE globally and in advance, preventing _any_ use of DMA
for memory transfers.  While the __dma_request_channel() routine
only applies it _temporarily_ around a dma_chan_get() operation.
Allowing for use of DMA channels by both individual peripherals
as well as memory transfers.


> > Consider the fact that this driver
> > handles both MPC5121 as well as MPC8308 hardware.
> 
> Ah, yes, sorry. I should certainly fix this, if setting of DMA_PRIVATE flag
> is needed at all.

What I meant here is that implications for all affected platforms
should be considered.  There is one driver source, but the driver
applies to more than one platform (another issue of the driver is
that this is not apparent from the doc nor the compat strings).

MPC512x has one (GP) DMA controller, of which one channel is
dedicated to DDR, and all other channels can get used for memory
transfers as well.  In addition to most channels being connected
to a specific peripheral for flow control.  Which your patch set
introduces initial support for.

MPC8308 has _all_ channels for memory transfers exclusively (or
at least none of its channels supports flow control).

So blocking memory transfers in mpc512x_dma.c is a total breakage
for MPC8308 (removes the only previous feature and adds nothing),
and is a regression for MPC512x (removes the previously supported
memory transfers, while it may add peripheral supports with very
few users).


virtually yours
Gerhard Sittig
-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr. 5, D-82194 Groebenzell, Germany
Phone: +49-8142-66989-0 Fax: +49-8142-66989-80  Email: office@denx.de
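
The interpretation argued above can be modelled in a few lines of
user-space C. This is a sketch of that reading only, not dmaengine
code: the capability bits, struct mock_dma_device and
mock_find_memcpy_device() are hypothetical names, and the model assumes
that the general-purpose lookup skips a whole controller as soon as its
mask carries the private flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capability bits standing in for the dmaengine cap mask. */
enum {
	MOCK_DMA_MEMCPY  = 1 << 0,
	MOCK_DMA_SLAVE   = 1 << 1,
	MOCK_DMA_PRIVATE = 1 << 2,
};

struct mock_dma_device {
	const char *name;
	unsigned int cap_mask;
};

/*
 * Model of the general-purpose lookup: a controller flagged private is
 * skipped as a whole, whatever its individual channels could do.
 */
static const struct mock_dma_device *
mock_find_memcpy_device(const struct mock_dma_device *devs, size_t n)
{
	assert(devs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (devs[i].cap_mask & MOCK_DMA_PRIVATE)
			continue;	/* excluded from general-purpose use */
		if (devs[i].cap_mask & MOCK_DMA_MEMCPY)
			return &devs[i];
	}
	return NULL;
}
```

Under this model, a system whose only controller sets the private flag
at probe time has no device left for mem-to-mem requests, which is the
MPC8308 concern raised above.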


* Re: [PATCH] powerpc: Fix alignment of secondary cpu spin vars
From: Olof Johansson @ 2014-01-08 17:48 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Michael Ellerman, linuxppc-dev, linux-kernel@vger.kernel.org,
	Anton Blanchard, chzigotzky
In-Reply-To: <1389154706.4672.21.camel@pasglop>

On Wed, Jan 08, 2014 at 03:18:26PM +1100, Benjamin Herrenschmidt wrote:
> On Wed, 2014-01-08 at 15:09 +1100, Michael Ellerman wrote:
> > > > Of course, main worry is that this is just hiding some latent NULL deref in
> > > > the kernel now... :-/
> > > 
> > > Wow, that would have to come close to winning the grossest-hack-in-arch-powerpc
> > > award :)
> > > 
> > > Have you tried changing the value at 8 to point to a reserved page?
> > > 
> > > Some other possibilities:
> > > 
> > >  * Change the #define so FIXUP_ENDIAN is empty for PASEMI, that would mean
> > >    you'd only be able to boot pasemi_defconfig.

No thanks -- this went uncaught because that used to be all I booted
(and for some random reason it didn't trigger in that case).

> >  * Move the hack into FIXUP_ENDIAN
> 
> We actually found the root cause on irc the other day, I was waiting for
> Olof to send a fix :-)

Yeah, I'm low on spare time these days, in particular spare time to spend on
ppc stuff. :-(

> Olof: Can you try this totally untested patch ?

With one fixup below:

Tested-by: Olof Johansson <olof@lixom.net>

> --- a/arch/powerpc/kernel/prom_init.c
> +++ b/arch/powerpc/kernel/prom_init.c
> @@ -1986,8 +1986,6 @@ static void __init prom_init_stdout(void)
>         /* Get the full OF pathname of the stdout device */
>         memset(path, 0, 256);
>         call_prom("instance-to-path", 3, 1, prom.stdout, path, 255);
> -       stdout_node = call_prom("instance-to-package", 1, 1, prom.stdout);
> -       val = cpu_to_be32(stdout_node);
>         prom_setprop(prom.chosen, "/chosen", "linux,stdout-package",
>                      &val, sizeof(val));
>         prom_printf("OF stdout device is: %s\n", of_stdout_device);
> @@ -1995,10 +1993,14 @@ static void __init prom_init_stdout(void)
>                      path, strlen(path) + 1);
>  
>         /* If it's a display, note it */
> -       memset(type, 0, sizeof(type));
> -       prom_getprop(stdout_node, "device_type", type, sizeof(type));
> -       if (strcmp(type, "display") == 0)
> -               prom_setprop(stdout_node, path, "linux,boot-display", NULL, 0);
> +       stdout_node = call_prom("instance-to-package", 1, 1, prom.stdout);
> +       if (stdout_node != PROM_ERROR) {
> +               val = cpu_to_be32(stdout_node);
> +               memset(type, 0, sizeof(type));
> +               prom_getprop(stdout_node, "device_type", type, sizeof(type));
> +               if (strcmp(type, "display") == 0)
> +                       prom_setprop(stdout_node, path, "linux,boot-display", NU

Line is cut off, this needs "NULL, 0);" at the end.



-Olof


* Re: RE: [v7] clk: corenet: Adds the clock binding
From: Scott Wood @ 2014-01-08 18:43 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Yuantian Tang, linuxppc-dev@lists.ozlabs.org,
	devicetree@vger.kernel.org
In-Reply-To: <20140108093046.GB6701@e106331-lin.cambridge.arm.com>

On Wed, 2014-01-08 at 09:30 +0000, Mark Rutland wrote:
> On Wed, Jan 08, 2014 at 08:53:56AM +0000, Yuantian Tang wrote:
> > 
> > ________________________________________
> > From: Wood Scott-B07421
> > Sent: January 8, 2014 8:21
> > To: Tang Yuantian-B29983
> > Cc: galak@kernel.crashing.org; mark.rutland@arm.com; devicetree@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
> > Subject: Re: [v7] clk: corenet: Adds the clock binding
> > 
> > On Wed, Nov 20, 2013 at 05:04:49PM +0800, tang yuantian wrote:
> > > +Recommended properties:
> > > +- ranges: Allows valid translation between child's address space and
> > > +     parent's. Must be present if the device has sub-nodes.
> > > +- #address-cells: Specifies the number of cells used to represent
> > > +     physical base addresses.  Must be present if the device has
> > > +     sub-nodes and set to 1 if present
> > > +- #size-cells: Specifies the number of cells used to represent
> > > +     the size of an address. Must be present if the device has
> > > +     sub-nodes and set to 1 if present
> > 
> > Why are we specifying #address-cells/#size-cells here?
> > 
> > A: it has sub-nodes which have REG property, don't we need to 
> > specify #address-cells/#size-cells?
> 
> If a node has a reg entry, its parent should have #size-cells and
> #address-cells to allow it to be parsed properly.

Yes, but why do we need to specify in this binding how many cells there
will be, especially since this binding only addresses the clock provider
aspect of the clockgen nodes (e.g. it doesn't describe the reg)?  Or
rather, it's partially describing the non-clock aspect, and doesn't
address the clock aspect at all AFAICT.

Where does the actual input clock frequency go?  U-Boot puts it in the
clockgen node itself as clock-frequency, but that isn't described in the
binding.  How does that relate to the sysclk node?  If
"fsl,qoriq-sysclk-1.0" is supposed to indicate that clock-frequency can
be found in the parent node, that isn't specified by the binding, nor is
clock-frequency shown in the example.

What is the difference between "fsl,qoriq-sysclk-1.0" and
"fsl,qoriq-sysclk-2.0"?  How does the concept of a fixed input clock
change?

-Scott


* Re: [PATCH v5 1/1] powerpc/embedded6xx: Add support for Motorola/Emerson MVME5100
From: Stephen N Chivers @ 2014-01-08 19:20 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev, Stephen N Chivers
In-Reply-To: <1389138670.11795.208.camel@snotra.buserror.net>

Scott Wood <scottwood@freescale.com> wrote on 01/08/2014 10:51:10 AM:

> From: Scott Wood <scottwood@freescale.com>
> To: Stephen N Chivers/AUS/CSC@CSC
> Cc: <benh@kernel.crashing.org>, <linuxppc-dev@lists.ozlabs.org>
> Date: 01/08/2014 10:51 AM
> Subject: Re: [PATCH v5 1/1] powerpc/embedded6xx: Add support for 
> Motorola/Emerson MVME5100
> 
> On Mon, 2014-01-06 at 12:23 +1100, Stephen Chivers wrote:
> > Add support for the Motorola/Emerson MVME5100 Single Board Computer.
> > 
> > The MVME5100 is a 6U form factor VME64 computer with:
> > 
> >    - A single MPC7410 or MPC750 CPU
> >    - A HAWK Processor Host Bridge (CPU to PCI) and
> >      MultiProcessor Interrupt Controller (MPIC)
> >    - Up to 500Mb of onboard memory
> >    - A M48T37 Real Time Clock (RTC) and Non-Volatile Memory chip
> >    - Two 16550 compatible UARTS
> >    - Two Intel E100 Fast Ethernets
> >    - Two PCI Mezzanine Card (PMC) Slots
> >    - PPCBug Firmware
> > 
> > The HAWK PHB/MPIC is compatible with the MPC10x devices.
> > 
> > There is no onboard disk support. This is usually provided by installing a PMC
> > in first PMC slot.
> > 
> > This patch revives the board support, it was present in early 2.6
> > series kernels. The board support in those days was by Matt Porter of
> > MontaVista Software.
> > 
> > CSC Australia has around 31 of these boards in service. The kernel in use
> > for the boards is based on 2.6.31. The boards are operated without disks
> > from a file server.
> > 
> > This patch is based on linux-3.13-rc2 and has been boot tested.
> > 
> > Only boards with 512 Mb of memory are known to work.
> > 
> > V1->V2:
> >    Address comments by Kumar Gala and Scott Wood.
> >    Minor adjustment to platforms/embedded6xx/Kconfig to ensure
> >       correct indentation where possible.
> > 
> > V2->V3:
> >    Address comments by Scott Wood and Ben Herrenschmidt.
> >    Address errors reported by checkpatch.
> > 
> > V3->V4:
> >    Address comment by Geert Uytterhoeven
> >    Add tested by Alessio Bogani.
> > 
> > V4->V5:
> >    Correct horrible typo in patch history.
> >    Kular Gama is Kumar Gala.
> 
> The patch history should go below the --- line, as it's for reviewers
> who have seen previous versions rather than for the git history.
Ok.
> 
> -Scott
> 
> 


* Re: [PATCH RFC 1/3] drivers: base: support cpu cache information interface to userspace via sysfs
From: Greg Kroah-Hartman @ 2014-01-08 20:26 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: devicetree, Ashok Raj, Rob Herring, x86, linux-kernel,
	linuxppc-dev, linux-arm-kernel
In-Reply-To: <1389209168-17189-2-git-send-email-sudeep.holla@arm.com>

On Wed, Jan 08, 2014 at 07:26:06PM +0000, Sudeep Holla wrote:
> From: Sudeep Holla <sudeep.holla@arm.com>
> 
> This patch adds initial support for providing processor cache information
> to userspace through sysfs interface. This is based on x86 implementation
> and hence the interface is intended to be fully compatible.
> 
> A per-cpu array of cache information maintained is used mainly for
> sysfs-related book keeping.
> 
> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> ---
>  drivers/base/Makefile     |   2 +-
>  drivers/base/cacheinfo.c  | 296 ++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/cacheinfo.h |  43 +++++++
>  3 files changed, 340 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/base/cacheinfo.c
>  create mode 100644 include/linux/cacheinfo.h

You are creating sysfs files, yet you didn't add Documentation/ABI/
information, which is required.  Please fix that.

greg k-h


* Re: [PATCH RFC 1/3] drivers: base: support cpu cache information interface to userspace via sysfs
From: Greg Kroah-Hartman @ 2014-01-08 20:27 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: devicetree, Ashok Raj, Rob Herring, x86, linux-kernel,
	linuxppc-dev, linux-arm-kernel
In-Reply-To: <1389209168-17189-2-git-send-email-sudeep.holla@arm.com>

On Wed, Jan 08, 2014 at 07:26:06PM +0000, Sudeep Holla wrote:
> From: Sudeep Holla <sudeep.holla@arm.com>
> 
> This patch adds initial support for providing processor cache information
> to userspace through sysfs interface. This is based on x86 implementation
> and hence the interface is intended to be fully compatible.
> 
> A per-cpu array of cache information maintained is used mainly for
> sysfs-related book keeping.
> 
> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> ---
>  drivers/base/Makefile     |   2 +-
>  drivers/base/cacheinfo.c  | 296 ++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/cacheinfo.h |  43 +++++++
>  3 files changed, 340 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/base/cacheinfo.c
>  create mode 100644 include/linux/cacheinfo.h
> 
> diff --git a/drivers/base/Makefile b/drivers/base/Makefile
> index 94e8a80..76f07c8 100644
> --- a/drivers/base/Makefile
> +++ b/drivers/base/Makefile
> @@ -4,7 +4,7 @@ obj-y			:= core.o bus.o dd.o syscore.o \
>  			   driver.o class.o platform.o \
>  			   cpu.o firmware.o init.o map.o devres.o \
>  			   attribute_container.o transport_class.o \
> -			   topology.o
> +			   topology.o cacheinfo.o
>  obj-$(CONFIG_DEVTMPFS)	+= devtmpfs.o
>  obj-$(CONFIG_DMA_CMA) += dma-contiguous.o
>  obj-y			+= power/
> diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
> new file mode 100644
> index 0000000..f436c31
> --- /dev/null
> +++ b/drivers/base/cacheinfo.c
> @@ -0,0 +1,296 @@
> +/*
> + * cacheinfo support - processor cache information via sysfs
> + *
> + * Copyright (C) 2013 ARM Ltd.
> + * All Rights Reserved
> + *
> + * Author: Sudeep Holla <sudeep.holla@arm.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed "as is" WITHOUT ANY WARRANTY of any
> + * kind, whether express or implied; without even the implied warranty
> + * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + */
> +#include <linux/bitops.h>
> +#include <linux/cacheinfo.h>
> +#include <linux/compiler.h>
> +#include <linux/cpu.h>
> +#include <linux/device.h>
> +#include <linux/init.h>
> +#include <linux/kobject.h>
> +#include <linux/of.h>
> +#include <linux/sched.h>
> +#include <linux/slab.h>
> +#include <linux/smp.h>
> +#include <linux/sysfs.h>
> +
> +struct cache_attr {
> +	struct attribute attr;
> +	 ssize_t(*show) (unsigned int, unsigned short, char *);
> +	 ssize_t(*store) (unsigned int, unsigned short, const char *, size_t);
> +};
> +
> +/* pointer to kobject for cpuX/cache */
> +static DEFINE_PER_CPU(struct kobject *, ci_cache_kobject);
> +#define per_cpu_cache_kobject(cpu)     (per_cpu(ci_cache_kobject, cpu))
> +
> +struct index_kobject {
> +	struct kobject kobj;
> +	unsigned int cpu;
> +	unsigned short index;
> +};
> +
> +static cpumask_t cache_dev_map;
> +
> +/* pointer to array of kobjects for cpuX/cache/indexY */

Please don't use "raw" kobjects for this, use the device attribute
groups, that's what they are there for.  Bonus is that your code should
get a lot simpler when you do that.

thanks,

greg k-h


* Re: [PATCH RFC 1/3] drivers: base: support cpu cache information interface to userspace via sysfs
From: Greg Kroah-Hartman @ 2014-01-08 20:28 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: devicetree, Ashok Raj, Rob Herring, x86, linux-kernel,
	linuxppc-dev, linux-arm-kernel
In-Reply-To: <1389209168-17189-2-git-send-email-sudeep.holla@arm.com>

On Wed, Jan 08, 2014 at 07:26:06PM +0000, Sudeep Holla wrote:
> From: Sudeep Holla <sudeep.holla@arm.com>
> +#define define_one_ro(_name) \
> +static struct cache_attr _name = \
> +	__ATTR(_name, 0444, show_##_name, NULL)

In the future, we do have __ATTR_RO(), which should be used instead.
You should never use __ATTR() on its own, if at all possible.  I'm
sweeping the tree for all usages and fixing them slowly up over time.

thanks,

greg k-h
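
What a read-only attribute macro buys can be shown with a user-space
mock. This is a sketch under stated assumptions: struct mock_attribute,
struct mock_cache_attr and MOCK_ATTR_RO() are hypothetical stand-ins
modelled on the shape of the kernel's __ATTR_RO(), which fixes the mode
at 0444 and derives the callback name as <name>_show. The callbacks in
the quoted patch are named show_<name>, so they would presumably need
renaming to fit that convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>	/* ssize_t */

struct mock_attribute {
	const char *name;
	unsigned short mode;
};

struct mock_cache_attr {
	struct mock_attribute attr;
	ssize_t (*show)(unsigned int cpu, unsigned short index, char *buf);
};

/* Simplified stand-in: mode is always 0444, callback is <name>_show. */
#define MOCK_ATTR_RO(_name) {				\
	.attr = { .name = #_name, .mode = 0444 },	\
	.show = _name##_show,				\
}

/* Example callback; prints a made-up cache level derived from the index. */
static ssize_t level_show(unsigned int cpu, unsigned short index, char *buf)
{
	(void)cpu;
	assert(buf != NULL);
	return sprintf(buf, "%u\n", (unsigned int)index + 1);
}

static struct mock_cache_attr level = MOCK_ATTR_RO(level);
```

The caller states the attribute name once and can no longer pick a
wrong mode or forget to leave the store callback unset.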


* Re: [PATCH RFC 2/3] ARM: kernel: add support for cpu cache information
From: Russell King - ARM Linux @ 2014-01-08 20:57 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: devicetree, Ashok Raj, Rob Herring, x86, linux-kernel,
	Greg Kroah-Hartman, linuxppc-dev, linux-arm-kernel
In-Reply-To: <1389209168-17189-3-git-send-email-sudeep.holla@arm.com>

On Wed, Jan 08, 2014 at 07:26:07PM +0000, Sudeep Holla wrote:
> +#if __LINUX_ARM_ARCH__ < 7 /* pre ARMv7 */
> +
> +#define MAX_CACHE_LEVEL		1	/* Only 1 level supported */
> +#define CTR_CTYPE_SHIFT		24
> +#define CTR_CTYPE_MASK		(1 << CTR_CTYPE_SHIFT)
> +
> +static inline unsigned int get_ctr(void)
> +{
> +	unsigned int ctr;
> +	asm volatile ("mrc p15, 0, %0, c0, c0, 1" : "=r" (ctr));
> +	return ctr;
> +}
> +
> +static enum cache_type get_cache_type(int level)
> +{
> +	if (level > MAX_CACHE_LEVEL)
> +		return CACHE_TYPE_NOCACHE;
> +	return get_ctr() & CTR_CTYPE_MASK ?
> +		CACHE_TYPE_SEPARATE : CACHE_TYPE_UNIFIED;

So, what do we do for CPUs that don't implement the CTR?  Just return
random rubbish based on decoding the CPU Identity register as if it
were the cache type register?

-- 
FTTC broadband for 0.8mile line: 5.8Mbps down 500kbps up.  Estimation
in database were 13.1 to 19Mbit for a good line, about 7.5+ for a bad.
Estimate before purchase was "up to 13.2Mbit".
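
One way to guard against the case raised above can be sketched as a
pure function. This is an illustration, not the patch's code: the enum
and mock_get_cache_type() are hypothetical, and the sketch assumes, in
line with the question above, that a core without a cache type register
returns the main ID register value for that read, so the two values
compare equal.

```c
#include <assert.h>
#include <stdint.h>

enum mock_cache_type {
	MOCK_NOCACHE,
	MOCK_SEPARATE,
	MOCK_UNIFIED,
	MOCK_UNKNOWN,
};

#define CTR_CTYPE_SHIFT	24
#define CTR_CTYPE_MASK	(UINT32_C(1) << CTR_CTYPE_SHIFT)

/*
 * ctr:  value obtained when reading the cache type register
 * midr: value of the main ID register
 * Assumption: ctr == midr means the CTR is not implemented.
 */
static enum mock_cache_type
mock_get_cache_type(uint32_t ctr, uint32_t midr, int level)
{
	assert(level >= 1);
	if (level > 1)
		return MOCK_NOCACHE;	/* only one level modelled here */
	if (ctr == midr)
		return MOCK_UNKNOWN;	/* do not decode an ID value as a CTR */
	return (ctr & CTR_CTYPE_MASK) ? MOCK_SEPARATE : MOCK_UNIFIED;
}
```

Reporting "unknown" keeps the caller from exposing a cache type that
was decoded from unrelated bits.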


* Re: PCIE device errors after linux kernel upgrade
From: Scott Wood @ 2014-01-08 21:07 UTC (permalink / raw)
  To: ravich; +Cc: linuxppc-dev
In-Reply-To: <1389169499473-79160.post@n7.nabble.com>

On Wed, 2014-01-08 at 00:24 -0800, ravich wrote:
> Finally I found the problem causing the sudden system reset :
> 
> our setup :
> 
> P2020<====>PCI Bridge <=====> FPGA
> 
> The reset occurs when we allocate an skb and give the FPGA the DMA address
> of its skb->data; when the FPGA tries to access this address, we get a
> hardware reset.
> 
> To fix it, we used the GFP_DMA flag on skb allocations.
> 
> I would be more than happy if you could explain a few things:
> 1) How come this worked in the 2.6.32 kernel without this flag?

Maybe you got lucky, allocation patterns changed, etc?

> 2) OK, the device was given a bad DMA address, but why does the system reset without any warning?

If you write to random addresses arbitrary things can happen.

-Scott


* Re: [v3, 3/7] powerpc: enable the relocatable support for the fsl booke 32bit kernel
From: Scott Wood @ 2014-01-08 21:46 UTC (permalink / raw)
  To: Kevin Hao; +Cc: linuxppc
In-Reply-To: <20140108024235.GA20739@pek-khao-d1.corp.ad.wrs.com>

On Wed, 2014-01-08 at 10:42 +0800, Kevin Hao wrote:
> On Tue, Jan 07, 2014 at 05:46:04PM -0600, Scott Wood wrote:
> > On Sat, 2014-01-04 at 14:34 +0800, Kevin Hao wrote:
> > > > I'm having a hard time following the logic here.  What is PAGE_OFFSET -
> > > > offset supposed to be?  Why would we map anything below PAGE_OFFSET?
> > > 
> > > No, we don't map the address below PAGE_OFFSET.
> > >     memstart_addr is the physical start address of RAM.
> > >     start is the kernel running physical address aligned with 64M.
> > > 
> > >     offset = memstart_addr - start
> > > 
> > > So if memstart_addr < start, the offset is negative. The PAGE_OFFSET - offset
> > > is the virtual start address we should use for the init 64M map. It's above
> > > the PAGE_OFFSET instead of below.
> > 
> > Oh.  I think it'd be more readable to do "offset = start -
> > memstart_addr" and add offset instead of subtracting it.
> 
> Yes, I agree. The reason that I use "offset = memstart_addr - start" is that
> it seems "memstart_addr" is always greater than "start" when we are booting
> a kdump kernel with a kernel option like "crashkernel=64M@80M". :-)

...so there is a situation where you map below PAGE_OFFSET. :-)
 
> > Also, offset should be phys_addr_t -- even if you don't expect to
> > support offsets greater than 4G on 32-bit, it's semantically the right
> > type to use.  Plus, "int" would break if this code were ever used with
> > 64-bit.
> 
> I thought about using phys_addr_t for the "offset" originally but gave it up
> for the following reasons:
>   * It will not be greater than 4G.
>   * We have to use the ugly #ifdef CONFIG_PHYS_64BIT in restore_to_as0().
>   * Need more registers for arguments for restore_to_as0().
> 
> Of course you can change it to phys_addr_t if you prefer.

I'd at least like to make it "long".

-Scott
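
The two sign conventions discussed above give the same virtual start
address, which a small user-space calculation can confirm. This is a
sketch with assumed values: MOCK_PAGE_OFFSET, the helper names and the
sample addresses are illustrative, and int64_t stands in for the signed
offset type so the arithmetic is well defined on any host.

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_PAGE_OFFSET	UINT32_C(0xC0000000)	/* assumed 32-bit value */
#define MOCK_ALIGN_64M		UINT32_C(0x04000000)

/* Original convention: offset = memstart_addr - start, then subtract. */
static uint32_t mock_vaddr_sub(uint32_t memstart_addr, uint32_t start)
{
	int64_t offset = (int64_t)memstart_addr - (int64_t)start;

	assert(start % MOCK_ALIGN_64M == 0);
	return (uint32_t)((int64_t)MOCK_PAGE_OFFSET - offset);
}

/* Suggested convention: offset = start - memstart_addr, then add. */
static uint32_t mock_vaddr_add(uint32_t memstart_addr, uint32_t start)
{
	int64_t offset = (int64_t)start - (int64_t)memstart_addr;

	assert(start % MOCK_ALIGN_64M == 0);
	return (uint32_t)((int64_t)MOCK_PAGE_OFFSET + offset);
}
```

With memstart_addr = 0 and start = 64M, both helpers yield 0xC4000000,
above PAGE_OFFSET. With memstart_addr = 80M and start = 64M, both yield
0xBF000000, which is the below-PAGE_OFFSET case pointed out above.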


* Re: [PATCH 02/12][v4] pci: fsl: add structure fsl_pci
From: Scott Wood @ 2014-01-08 21:58 UTC (permalink / raw)
  To: Minghuan Lian; +Cc: Bjorn Helgaas, linux-pci, linuxppc-dev, Zang Roy-R61911
In-Reply-To: <1389157323-3088-2-git-send-email-Minghuan.Lian@freescale.com>

On Wed, 2014-01-08 at 13:01 +0800, Minghuan Lian wrote:
> PowerPC uses structure pci_controller to describe PCI controller,
> but ARM uses structure pci_sys_data. In order to support PowerPC
> and ARM simultaneously, the patch adds a structure fsl_pci that
> contains most of the members of the pci_controller and pci_sys_data.
> Meanwhile, it defines an interface fsl_arch_sys_to_pci() which should
> be implemented in architecture-specific PCI controller driver to
> convert pci_controller or pci_sys_data to fsl_pci.
> 
> Signed-off-by: Minghuan Lian <Minghuan.Lian@freescale.com>
> ---
> change log:
> v4:
> Added indirect type macro
> v1-v3:
> Derived from http://patchwork.ozlabs.org/patch/278965/
> 
> Based on upstream master.
> Based on the discussion of RFC version here
> http://patchwork.ozlabs.org/patch/274487/
> 
>  include/linux/fsl/pci-common.h | 48 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 48 insertions(+)

Same comments on this patchset as last time.

-Scott


* mpic build failure for 7447_hpc defconfig (bisected)
From: Paul Gortmaker @ 2014-01-08 22:45 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev

Commit 446f6d06fab0b49c61887ecbe8286d6aaa796637 ("powerpc/mpic: Properly
set default triggers") breaks the mpc7447_hpc_defconfig as follows:

  CC      arch/powerpc/sysdev/mpic.o
arch/powerpc/sysdev/mpic.c: In function 'mpic_set_irq_type':
arch/powerpc/sysdev/mpic.c:886:9: error: case label does not reduce to an integer constant
arch/powerpc/sysdev/mpic.c:890:9: error: case label does not reduce to an integer constant
arch/powerpc/sysdev/mpic.c:894:9: error: case label does not reduce to an integer constant
arch/powerpc/sysdev/mpic.c:898:9: error: case label does not reduce to an integer constant

Looking at the cpp output (gcc 4.7.3 from the kernel.org toolchains), I see:

   case mpic->hw_set[MPIC_IDX_VECPRI_SENSE_EDGE] |
        mpic->hw_set[MPIC_IDX_VECPRI_POLARITY_POSITIVE]:

The pointer into an array appears because CONFIG_MPIC_WEIRD=y is set for
this thing.

-------------------
  #ifdef CONFIG_MPIC_WEIRD
  static u32 mpic_infos[][MPIC_IDX_END] = {
        [0] = { /* Original OpenPIC compatible MPIC */

  [...]

  #define MPIC_INFO(name) mpic->hw_set[MPIC_IDX_##name]

  #else /* CONFIG_MPIC_WEIRD */

  #define MPIC_INFO(name) MPIC_##name

  #endif /* CONFIG_MPIC_WEIRD */
-------------------


Given it has been broken since 3.4-rc5, is it safe to say MPIC_WEIRD is dead
and unused?  Or should the case be converted to if/else or similar?  Or were
other versions of gcc actually able to "see" constant numbers?

Paul.
--
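
The if/else option mentioned above could look roughly like the
following user-space sketch. In C, case labels must be integer constant
expressions, so values read from a table at run time cannot appear in
them, while plain comparisons have no such restriction. All names here
(struct mock_mpic, MOCK_INFO(), the enums) are hypothetical, and the
mapping from bits to trigger types is an assumption made for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
	MOCK_IDX_SENSE_EDGE,
	MOCK_IDX_SENSE_LEVEL,
	MOCK_IDX_POL_POSITIVE,
	MOCK_IDX_POL_NEGATIVE,
	MOCK_IDX_END,
};

enum mock_trigger {
	MOCK_EDGE_RISING,
	MOCK_EDGE_FALLING,
	MOCK_LEVEL_HIGH,
	MOCK_LEVEL_LOW,
	MOCK_TRIGGER_NONE,
};

struct mock_mpic {
	const uint32_t *hw_set;	/* per-variant register encodings */
};

/* Run-time lookup, analogous to the CONFIG_MPIC_WEIRD=y flavour. */
#define MOCK_INFO(mpic, name)	((mpic)->hw_set[MOCK_IDX_##name])

/* vecpri is assumed to be already masked to the sense/polarity bits. */
static enum mock_trigger
mock_decode_trigger(const struct mock_mpic *mpic, uint32_t vecpri)
{
	assert(mpic != NULL && mpic->hw_set != NULL);

	const uint32_t edge  = MOCK_INFO(mpic, SENSE_EDGE);
	const uint32_t level = MOCK_INFO(mpic, SENSE_LEVEL);
	const uint32_t pos   = MOCK_INFO(mpic, POL_POSITIVE);
	const uint32_t neg   = MOCK_INFO(mpic, POL_NEGATIVE);

	/* Comparisons replace the case labels that failed to compile. */
	if (vecpri == (edge | pos))
		return MOCK_EDGE_RISING;
	if (vecpri == (edge | neg))
		return MOCK_EDGE_FALLING;
	if (vecpri == (level | pos))
		return MOCK_LEVEL_HIGH;
	if (vecpri == (level | neg))
		return MOCK_LEVEL_LOW;
	return MOCK_TRIGGER_NONE;
}
```

The same function body would also compile when the macro expands to
compile-time constants, so one code path could serve both
configurations.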


* Re: [v3, 3/7] powerpc: enable the relocatable support for the fsl booke 32bit kernel
From: Scott Wood @ 2014-01-09  0:02 UTC (permalink / raw)
  To: Kevin Hao; +Cc: linuxppc
In-Reply-To: <20140108024235.GA20739@pek-khao-d1.corp.ad.wrs.com>

On Wed, Jan 08, 2014 at 10:42:35AM +0800, Kevin Hao wrote:
> On Tue, Jan 07, 2014 at 05:46:04PM -0600, Scott Wood wrote:
> > Oh.  I think it'd be more readable to do "offset = start -
> > memstart_addr" and add offset instead of subtracting it.
> 
> Yes, I agree. The reason that I use "offset = memstart_addr - start" is that
> it seems "memstart_addr" is always greater than "start" when we are booting
> a kdump kernel with a kernel option like "crashkernel=64M@80M". :-)
> 
> > 
> > Also, offset should be phys_addr_t -- even if you don't expect to
> > support offsets greater than 4G on 32-bit, it's semantically the right
> > type to use.  Plus, "int" would break if this code were ever used with
> > 64-bit.
> 
> I thought about using phys_addr_t for the "offset" originally but gave it up
> for the following reasons:
>   * It will not be greater than 4G.
>   * We have to use the ugly #ifdef CONFIG_PHYS_64BIT in restore_to_as0().
>   * Need more registers for arguments for restore_to_as0().
> 
> Of course you can change it to phys_addr_t if you prefer.

Here's the diff I made when applying (also changed the subf in patch 9 to
add)

diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S
index 71e08df..b1f7edc 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -1251,7 +1251,7 @@ _GLOBAL(switch_to_as1)
  * Restore to the address space 0 and also invalidate the tlb entry created
  * by switch_to_as1.
  * r3 - the tlb entry which should be invalidated
- * r4 - __pa(PAGE_OFFSET in AS0) - __pa(PAGE_OFFSET in AS1)
+ * r4 - __pa(PAGE_OFFSET in AS1) - __pa(PAGE_OFFSET in AS0)
  * r5 - device tree virtual address. If r4 is 0, r5 is ignored.
 */
 _GLOBAL(restore_to_as0)
@@ -1266,8 +1266,8 @@ _GLOBAL(restore_to_as0)
 	 * so we need calculate the right jump and device tree address based
 	 * on the offset passed by r4.
 	 */
-	subf	r9,r4,r9
-	subf	r5,r4,r5
+	add	r9,r9,r4
+	add	r5,r5,r4
 
 2:	mfmsr	r7
 	li	r8,(MSR_IS | MSR_DS)
diff --git a/arch/powerpc/mm/fsl_booke_mmu.c b/arch/powerpc/mm/fsl_booke_mmu.c
index ce0c7d7..95deb9fd 100644
--- a/arch/powerpc/mm/fsl_booke_mmu.c
+++ b/arch/powerpc/mm/fsl_booke_mmu.c
@@ -291,7 +291,8 @@ notrace void __init relocate_init(u64 dt_ptr, phys_addr_t start)
 	 * and do a second relocation.
 	 */
 	if (start != memstart_addr) {
-		int n, offset = memstart_addr - start;
+		int n;
+		long offset = start - memstart_addr;
 
 		is_second_reloc = 1;
 		n = switch_to_as1();
@@ -299,7 +300,7 @@ notrace void __init relocate_init(u64 dt_ptr, phys_addr_t start)
 		if (memstart_addr > start)
 			map_mem_in_cams(0x4000000, CONFIG_LOWMEM_CAM_NUM);
 		else
-			map_mem_in_cams_addr(start, PAGE_OFFSET - offset,
+			map_mem_in_cams_addr(start, PAGE_OFFSET + offset,
 					0x4000000, CONFIG_LOWMEM_CAM_NUM);
 		restore_to_as0(n, offset, __va(dt_ptr));
 		/* We should never reach here */

-Scott
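
As a rough illustration of the sign convention in the diff above (offset =
start - memstart_addr, then add it everywhere), here is a small standalone
sketch. The helper names and the example addresses are made up for the
illustration, and it uses fixed-width types so that it behaves the same on
32-bit and 64-bit hosts:

```c
#include <stdint.h>

typedef uint64_t phys_addr_t;	/* assumption for this sketch only */

/* Signed distance from memstart_addr to start; negative if start is lower. */
static int64_t reloc_offset(phys_addr_t start, phys_addr_t memstart_addr)
{
	return (int64_t)start - (int64_t)memstart_addr;
}

/* Adding the signed offset moves an address in whichever direction applies. */
static uint64_t apply_offset(uint64_t addr, int64_t offset)
{
	return (uint64_t)((int64_t)addr + offset);
}
```

With the offset defined this way, callers never need to remember whether to
add or subtract: a negative offset simply moves the address down.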

^ permalink raw reply related

* Re: [PATCH] powerpc: add vr save/restore functions
From: Michael Ellerman @ 2014-01-09  0:09 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: linuxppc-dev
In-Reply-To: <87fvoypznh.fsf@igel.home>

On Wed, 2014-01-08 at 10:54 +0100, Andreas Schwab wrote:
> Michael Ellerman <michael@ellerman.id.au> writes:
> 
> > On Mon, 2013-12-30 at 15:31 +0100, Andreas Schwab wrote:
> >> GCC 4.8 now generates out-of-line vr save/restore functions when
> >> optimizing for size.  They are needed for the raid6 altivec support.
> >
> > It looks like they're identical for 32 & 64-bit ?
> 
> They use different temporary registers and calling conventions (no .opd
> for ppc64).

Yeah, sorry, you'd think I could spot the difference between r11 and r12.

cheers

^ permalink raw reply

* Re: [PATCH 02/12][v4] pci: fsl: add structure fsl_pci
From: Bjorn Helgaas @ 2014-01-09  0:12 UTC (permalink / raw)
  To: Scott Wood; +Cc: linux-pci, Minghuan Lian, linuxppc-dev, Zang Roy-R61911
In-Reply-To: <1389218288.25654.26.camel@snotra.buserror.net>

On Wed, Jan 08, 2014 at 03:58:08PM -0600, Scott Wood wrote:
> On Wed, 2014-01-08 at 13:01 +0800, Minghuan Lian wrote:
> > PowerPC uses structure pci_controller to describe PCI controller,
> > but ARM uses structure pci_sys_data. In order to support PowerPC
> > and ARM simultaneously, the patch adds a structure fsl_pci that
> > contains most of the members of the pci_controller and pci_sys_data.
> > Meanwhile, it defines an interface fsl_arch_sys_to_pci() which should
> > be implemented in architecture-specific PCI controller driver to
> > convert pci_controller or pci_sys_data to fsl_pci.
> > 
> > Signed-off-by: Minghuan Lian <Minghuan.Lian@freescale.com>
> > ---
> > change log:
> > v4:
> > Added indirect type macro
> > v1-v3:
> > Derived from http://patchwork.ozlabs.org/patch/278965/
> > 
> > Based on upstream master.
> > Based on the discussion of RFC version here
> > http://patchwork.ozlabs.org/patch/274487/
> > 
> >  include/linux/fsl/pci-common.h | 48 ++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 48 insertions(+)
> 
> Same comments on this patchset as last time.

Minghuan, when you address Scott's comments, can you also add a
MAINTAINERS update?

Bjorn

^ permalink raw reply

* Re: [PATCH] slub: Don't throw away partial remote slabs if there is no local memory
From: Joonsoo Kim @ 2014-01-09  0:20 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: cl, nacc, penberg, linux-mm, paulus, Anton Blanchard, mpm,
	linuxppc-dev
In-Reply-To: <52cbce84.aa71b60a.537c.ffffd9efSMTPIN_ADDED_BROKEN@mx.google.com>

On Tue, Jan 07, 2014 at 05:52:31PM +0800, Wanpeng Li wrote:
> On Tue, Jan 07, 2014 at 04:41:36PM +0900, Joonsoo Kim wrote:
> >On Tue, Jan 07, 2014 at 01:21:00PM +1100, Anton Blanchard wrote:

> >> Index: b/mm/slub.c
> >> ===================================================================
> >> --- a/mm/slub.c
> >> +++ b/mm/slub.c
> >> @@ -2278,10 +2278,17 @@ redo:
> >>  
> >>  	if (unlikely(!node_match(page, node))) {
> >>  		stat(s, ALLOC_NODE_MISMATCH);
> >> -		deactivate_slab(s, page, c->freelist);
> >> -		c->page = NULL;
> >> -		c->freelist = NULL;
> >> -		goto new_slab;
> >> +
> >> +		/*
> >> +		 * If the node contains no memory there is no point in trying
> >> +		 * to allocate a new node local slab
> >> +		 */
> >> +		if (node_spanned_pages(node)) {
> >> +			deactivate_slab(s, page, c->freelist);
> >> +			c->page = NULL;
> >> +			c->freelist = NULL;
> >> +			goto new_slab;
> >> +		}
> >>  	}
> >>  
> >>  	/*
> >
> >Hello,
> >
> >I think that we need more efforts to solve unbalanced node problem.
> >
> >With this patch, even if node of current cpu slab is not favorable to
> >unbalanced node, allocation would proceed and we would get the unintended memory.
> >
> >And there is one more problem. Even if we have some partial slabs on
> >compatible node, we would allocate new slab, because get_partial() cannot handle
> >this unbalance node case.
> >
> >To fix this correctly, how about following patch?
> >
> >Thanks.
> >
> >------------->8--------------------
> >diff --git a/mm/slub.c b/mm/slub.c
> >index c3eb3d3..a1f6dfa 100644
> >--- a/mm/slub.c
> >+++ b/mm/slub.c
> >@@ -1672,7 +1672,19 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
> > {
> >        void *object;
> >        int searchnode = (node == NUMA_NO_NODE) ? numa_node_id() : node;
> >+       struct zonelist *zonelist;
> >+       struct zoneref *z;
> >+       struct zone *zone;
> >+       enum zone_type high_zoneidx = gfp_zone(flags);
> >
> >+       if (!node_present_pages(searchnode)) {
> >+               zonelist = node_zonelist(searchnode, flags);
> >+               for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
> >+                       searchnode = zone_to_nid(zone);
> >+                       if (node_present_pages(searchnode))
> >+                               break;
> >+               }
> >+       }
> 
> Why change searchnode instead of depending on fallback zones/nodes in 
> get_any_partial() to allocate partial slabs?
> 

If node != NUMA_NO_NODE, get_any_partial() isn't called.
That's why I change searchnode here instead of get_any_partial().

Thanks.
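
The idea in the proposed get_partial() change can be modelled outside the
kernel roughly as follows. This is only a simplified sketch: the arrays
stand in for node_present_pages() and the zonelist walk, the function name
is invented, and unlike the quoted patch it keeps the original node when no
candidate has memory:

```c
#include <stddef.h>

#define NUMA_NO_NODE (-1)

/*
 * present_pages[nid] models node_present_pages(nid).
 * fallback[] models the zonelist order for the requested node and is
 * terminated by a negative entry.
 */
static int pick_search_node(int node, int local_node,
			    const unsigned long *present_pages,
			    const int *fallback)
{
	int searchnode = (node == NUMA_NO_NODE) ? local_node : node;
	size_t i;

	/* The requested node has memory: search it as usual. */
	if (present_pages[searchnode])
		return searchnode;

	/* Memoryless node: use the first fallback node that has memory. */
	for (i = 0; fallback[i] >= 0; i++) {
		if (present_pages[fallback[i]])
			return fallback[i];
	}

	return searchnode;
}
```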

^ permalink raw reply

* [PATCH v4 1/3] powerpc: add barrier after writing kernel PTE
From: Scott Wood @ 2014-01-09  1:32 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Scott Wood

There is no barrier between something like ioremap() writing to
a PTE, and returning the value to a caller that may then store the
pointer in a place that is visible to other CPUs.  Such callers
generally don't perform barriers of their own.

Even if callers of ioremap() and similar things did use barriers,
the most logical choice would be smp_wmb(), which is not
architecturally sufficient when BookE hardware tablewalk is used.  A
full sync is specified by the architecture.

For userspace mappings, OTOH, we generally already have an lwsync due
to locking, and if we occasionally take a spurious fault due to not
having a full sync with hardware tablewalk, it will not be fatal
because we will retry rather than oops.

Signed-off-by: Scott Wood <scottwood@freescale.com>
---
v4: no change

 arch/powerpc/mm/pgtable_32.c |  1 +
 arch/powerpc/mm/pgtable_64.c | 12 ++++++++++++
 2 files changed, 13 insertions(+)

diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 5b96017..343a87f 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -299,6 +299,7 @@ int map_page(unsigned long va, phys_addr_t pa, int flags)
 		set_pte_at(&init_mm, va, pg, pfn_pte(pa >> PAGE_SHIFT,
 						     __pgprot(flags)));
 	}
+	smp_wmb();
 	return err;
 }
 
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index 02e8681..7551382 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -153,6 +153,18 @@ int map_kernel_page(unsigned long ea, unsigned long pa, int flags)
 		}
 #endif /* !CONFIG_PPC_MMU_NOHASH */
 	}
+
+#ifdef CONFIG_PPC_BOOK3E_64
+	/*
+	 * With hardware tablewalk, a sync is needed to ensure that
+	 * subsequent accesses see the PTE we just wrote.  Unlike userspace
+	 * mappings, we can't tolerate spurious faults, so make sure
+	 * the new PTE will be seen the first time.
+	 */
+	mb();
+#else
+	smp_wmb();
+#endif
 	return 0;
 }
 
-- 
1.8.3.2
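
The ordering problem described in the commit message above (write the entry,
then hand out a pointer that other CPUs may follow) has a loose userspace
analogue in C11 atomics. The sketch below is only that analogue: it uses
C11 fences rather than the kernel's smp_wmb()/mb(), it says nothing about
hardware tablewalk, and all names in it are invented for the example:

```c
#include <stdatomic.h>
#include <stddef.h>

struct mapping {
	unsigned long pte;	/* stands in for the entry being written */
};

static struct mapping slot;
static _Atomic(struct mapping *) published;	/* NULL until published */

/* Writer: fill in the entry, then make the pointer visible. */
static void publish_mapping(unsigned long pte)
{
	slot.pte = pte;
	/* Without this fence the pointer store could become visible first. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&published, &slot, memory_order_relaxed);
}

/* Reader: returns 1 and fills *out if a mapping has been published. */
static int read_mapping(unsigned long *out)
{
	struct mapping *m;

	m = atomic_load_explicit(&published, memory_order_relaxed);
	if (!m)
		return 0;
	/* Pairs with the writer's release fence. */
	atomic_thread_fence(memory_order_acquire);
	*out = m->pte;
	return 1;
}
```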

^ permalink raw reply related

* [PATCH v4 2/3] powerpc/e6500: TLB miss handler with hardware tablewalk support
From: Scott Wood @ 2014-01-09  1:32 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Scott Wood, Mihai Caraman
In-Reply-To: <1389231163-11175-1-git-send-email-scottwood@freescale.com>

There are a few things that make the existing hw tablewalk handlers
unsuitable for e6500:

 - Indirect entries go in TLB1 (though the resulting direct entries go in
   TLB0).

 - It has threads, but no "tlbsrx." -- so we need a spinlock and
   a normal "tlbsx".  Because we need this lock, hardware tablewalk
   is mandatory on e6500 unless we want to add spinlock+tlbsx to
   the normal bolted TLB miss handler.

 - TLB1 has no HES (nor next-victim hint) so we need software round robin
   (TODO: integrate this round robin data with hugetlb/KVM)

 - The existing tablewalk handlers map half of a page table at a time,
   because IBM hardware has a fixed 1MiB indirect page size.  e6500
   has variable size indirect entries, with a minimum of 2MiB.
   So we can't do the half-page indirect mapping, and even if we
   could it would be less efficient than mapping the full page.

 - Like on e5500, the linear mapping is bolted, so we don't need the
   overhead of supporting nested tlb misses.

Note that hardware tablewalk does not work in rev1 of e6500.
We do not expect to support e6500 rev1 in mainline Linux.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: Mihai Caraman <mihai.caraman@freescale.com>
---
v4: Ensure that MAS2 is 2M-aligned when writing an indirect entry.
It doesn't matter on real e6500 hardware, but it was noticed when
running under KVM, and the architecture does say that
"software should set these bits to zero".

 arch/powerpc/include/asm/mmu-book3e.h |  13 +++
 arch/powerpc/include/asm/mmu.h        |  21 +++--
 arch/powerpc/include/asm/paca.h       |   6 ++
 arch/powerpc/kernel/asm-offsets.c     |   9 ++
 arch/powerpc/kernel/paca.c            |   5 +
 arch/powerpc/kernel/setup_64.c        |  31 ++++++
 arch/powerpc/mm/fsl_booke_mmu.c       |   7 ++
 arch/powerpc/mm/mem.c                 |   6 ++
 arch/powerpc/mm/tlb_low_64e.S         | 171 ++++++++++++++++++++++++++++++++++
 arch/powerpc/mm/tlb_nohash.c          |  93 ++++++++++++------
 10 files changed, 326 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-book3e.h b/arch/powerpc/include/asm/mmu-book3e.h
index 936db36..89b785d 100644
--- a/arch/powerpc/include/asm/mmu-book3e.h
+++ b/arch/powerpc/include/asm/mmu-book3e.h
@@ -286,8 +286,21 @@ static inline unsigned int mmu_psize_to_shift(unsigned int mmu_psize)
 extern int mmu_linear_psize;
 extern int mmu_vmemmap_psize;
 
+struct tlb_core_data {
+	/* For software way selection, as on Freescale TLB1 */
+	u8 esel_next, esel_max, esel_first;
+
+	/* Per-core spinlock for e6500 TLB handlers (no tlbsrx.) */
+	u8 lock;
+};
+
 #ifdef CONFIG_PPC64
 extern unsigned long linear_map_top;
+extern int book3e_htw_mode;
+
+#define PPC_HTW_NONE	0
+#define PPC_HTW_IBM	1
+#define PPC_HTW_E6500	2
 
 /*
  * 64-bit booke platforms don't load the tlb in the tlb miss handler code.
diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index 691fd8a..f8d1d6d 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -180,16 +180,17 @@ static inline void assert_pte_locked(struct mm_struct *mm, unsigned long addr)
 #define MMU_PAGE_64K_AP	3	/* "Admixed pages" (hash64 only) */
 #define MMU_PAGE_256K	4
 #define MMU_PAGE_1M	5
-#define MMU_PAGE_4M	6
-#define MMU_PAGE_8M	7
-#define MMU_PAGE_16M	8
-#define MMU_PAGE_64M	9
-#define MMU_PAGE_256M	10
-#define MMU_PAGE_1G	11
-#define MMU_PAGE_16G	12
-#define MMU_PAGE_64G	13
-
-#define MMU_PAGE_COUNT	14
+#define MMU_PAGE_2M	6
+#define MMU_PAGE_4M	7
+#define MMU_PAGE_8M	8
+#define MMU_PAGE_16M	9
+#define MMU_PAGE_64M	10
+#define MMU_PAGE_256M	11
+#define MMU_PAGE_1G	12
+#define MMU_PAGE_16G	13
+#define MMU_PAGE_64G	14
+
+#define MMU_PAGE_COUNT	15
 
 #if defined(CONFIG_PPC_STD_MMU_64)
 /* 64-bit classic hash table MMU */
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index c3523d1..e81731c 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -113,6 +113,10 @@ struct paca_struct {
 	/* Keep pgd in the same cacheline as the start of extlb */
 	pgd_t *pgd __attribute__((aligned(0x80))); /* Current PGD */
 	pgd_t *kernel_pgd;		/* Kernel PGD */
+
+	/* Shared by all threads of a core -- points to tcd of first thread */
+	struct tlb_core_data *tcd_ptr;
+
 	/* We can have up to 3 levels of reentrancy in the TLB miss handler */
 	u64 extlb[3][EX_TLB_SIZE / sizeof(u64)];
 	u64 exmc[8];		/* used for machine checks */
@@ -123,6 +127,8 @@ struct paca_struct {
 	void *mc_kstack;
 	void *crit_kstack;
 	void *dbg_kstack;
+
+	struct tlb_core_data tcd;
 #endif /* CONFIG_PPC_BOOK3E */
 
 	mm_context_t context;
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 41a2839..ed8d68c 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -203,6 +203,15 @@ int main(void)
 	DEFINE(PACA_MC_STACK, offsetof(struct paca_struct, mc_kstack));
 	DEFINE(PACA_CRIT_STACK, offsetof(struct paca_struct, crit_kstack));
 	DEFINE(PACA_DBG_STACK, offsetof(struct paca_struct, dbg_kstack));
+	DEFINE(PACA_TCD_PTR, offsetof(struct paca_struct, tcd_ptr));
+
+	DEFINE(TCD_ESEL_NEXT,
+		offsetof(struct tlb_core_data, esel_next));
+	DEFINE(TCD_ESEL_MAX,
+		offsetof(struct tlb_core_data, esel_max));
+	DEFINE(TCD_ESEL_FIRST,
+		offsetof(struct tlb_core_data, esel_first));
+	DEFINE(TCD_LOCK, offsetof(struct tlb_core_data, lock));
 #endif /* CONFIG_PPC_BOOK3E */
 
 #ifdef CONFIG_PPC_STD_MMU_64
diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 623c356..bf0aada 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -160,6 +160,11 @@ void __init initialise_paca(struct paca_struct *new_paca, int cpu)
 #ifdef CONFIG_PPC_STD_MMU_64
 	new_paca->slb_shadow_ptr = init_slb_shadow(cpu);
 #endif /* CONFIG_PPC_STD_MMU_64 */
+
+#ifdef CONFIG_PPC_BOOK3E
+	/* For now -- if we have threads this will be adjusted later */
+	new_paca->tcd_ptr = &new_paca->tcd;
+#endif
 }
 
 /* Put the paca pointer into r13 and SPRG_PACA */
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 2232aff..1ce9b87 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -97,6 +97,36 @@ int dcache_bsize;
 int icache_bsize;
 int ucache_bsize;
 
+#if defined(CONFIG_PPC_BOOK3E) && defined(CONFIG_SMP)
+static void setup_tlb_core_data(void)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		int first = cpu_first_thread_sibling(cpu);
+
+		paca[cpu].tcd_ptr = &paca[first].tcd;
+
+		/*
+		 * If we have threads, we need either tlbsrx.
+		 * or e6500 tablewalk mode, or else TLB handlers
+		 * will be racy and could produce duplicate entries.
+		 */
+		if (smt_enabled_at_boot >= 2 &&
+		    !mmu_has_feature(MMU_FTR_USE_TLBRSRV) &&
+		    book3e_htw_mode != PPC_HTW_E6500) {
+			/* Should we panic instead? */
+			WARN_ONCE(1, "%s: unsupported MMU configuration -- expect problems\n",
+				  __func__);
+		}
+	}
+}
+#else
+static void setup_tlb_core_data(void)
+{
+}
+#endif
+
 #ifdef CONFIG_SMP
 
 static char *smt_enabled_cmdline;
@@ -445,6 +475,7 @@ void __init setup_system(void)
 
 	smp_setup_cpu_maps();
 	check_smt_enabled();
+	setup_tlb_core_data();
 
 #ifdef CONFIG_SMP
 	/* Release secondary cpus out of their spinloops at 0x60 now that
diff --git a/arch/powerpc/mm/fsl_booke_mmu.c b/arch/powerpc/mm/fsl_booke_mmu.c
index a68671c..94cd728 100644
--- a/arch/powerpc/mm/fsl_booke_mmu.c
+++ b/arch/powerpc/mm/fsl_booke_mmu.c
@@ -52,6 +52,7 @@
 #include <asm/smp.h>
 #include <asm/machdep.h>
 #include <asm/setup.h>
+#include <asm/paca.h>
 
 #include "mmu_decl.h"
 
@@ -191,6 +192,12 @@ static unsigned long map_mem_in_cams_addr(phys_addr_t phys, unsigned long virt,
 	}
 	tlbcam_index = i;
 
+#ifdef CONFIG_PPC64
+	get_paca()->tcd.esel_next = i;
+	get_paca()->tcd.esel_max = mfspr(SPRN_TLB1CFG) & TLBnCFG_N_ENTRY;
+	get_paca()->tcd.esel_first = i;
+#endif
+
 	return amount_mapped;
 }
 
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 3fa93dc..94448cd 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -307,6 +307,12 @@ static void __init register_page_bootmem_info(void)
 
 void __init mem_init(void)
 {
+	/*
+	 * book3s is limited to 16 page sizes due to encoding this in
+	 * a 4-bit field for slices.
+	 */
+	BUILD_BUG_ON(MMU_PAGE_COUNT > 16);
+
 #ifdef CONFIG_SWIOTLB
 	swiotlb_init(0);
 #endif
diff --git a/arch/powerpc/mm/tlb_low_64e.S b/arch/powerpc/mm/tlb_low_64e.S
index b4113bf..75f5d27 100644
--- a/arch/powerpc/mm/tlb_low_64e.S
+++ b/arch/powerpc/mm/tlb_low_64e.S
@@ -239,6 +239,177 @@ itlb_miss_fault_bolted:
 	beq	tlb_miss_common_bolted
 	b	itlb_miss_kernel_bolted
 
+/*
+ * TLB miss handling for e6500 and derivatives, using hardware tablewalk.
+ *
+ * Linear mapping is bolted: no virtual page table or nested TLB misses
+ * Indirect entries in TLB1, hardware loads resulting direct entries
+ *    into TLB0
+ * No HES or NV hint on TLB1, so we need to do software round-robin
+ * No tlbsrx. so we need a spinlock, and we have to deal
+ *    with MAS-damage caused by tlbsx
+ * 4K pages only
+ */
+
+	START_EXCEPTION(instruction_tlb_miss_e6500)
+	tlb_prolog_bolted BOOKE_INTERRUPT_ITLB_MISS SPRN_SRR0
+
+	ld	r11,PACA_TCD_PTR(r13)
+	srdi.	r15,r16,60		/* get region */
+	ori	r16,r16,1
+
+	TLB_MISS_STATS_SAVE_INFO_BOLTED
+	bne	tlb_miss_kernel_e6500	/* user/kernel test */
+
+	b	tlb_miss_common_e6500
+
+	START_EXCEPTION(data_tlb_miss_e6500)
+	tlb_prolog_bolted BOOKE_INTERRUPT_DTLB_MISS SPRN_DEAR
+
+	ld	r11,PACA_TCD_PTR(r13)
+	srdi.	r15,r16,60		/* get region */
+	rldicr	r16,r16,0,62
+
+	TLB_MISS_STATS_SAVE_INFO_BOLTED
+	bne	tlb_miss_kernel_e6500	/* user vs kernel check */
+
+/*
+ * This is the guts of the TLB miss handler for e6500 and derivatives.
+ * We are entered with:
+ *
+ * r16 = page of faulting address (low bit 0 if data, 1 if instruction)
+ * r15 = crap (free to use)
+ * r14 = page table base
+ * r13 = PACA
+ * r11 = tlb_core_data ptr
+ * r10 = crap (free to use)
+ */
+tlb_miss_common_e6500:
+	/*
+	 * Search if we already have an indirect entry for that virtual
+	 * address, and if we do, bail out.
+	 *
+	 * MAS6:IND should be already set based on MAS4
+	 */
+	addi	r10,r11,TCD_LOCK
+1:	lbarx	r15,0,r10
+	cmpdi	r15,0
+	bne	2f
+	li	r15,1
+	stbcx.	r15,0,r10
+	bne	1b
+	.subsection 1
+2:	lbz	r15,0(r10)
+	cmpdi	r15,0
+	bne	2b
+	b	1b
+	.previous
+
+	mfspr	r15,SPRN_MAS2
+
+	tlbsx	0,r16
+	mfspr	r10,SPRN_MAS1
+	andis.	r10,r10,MAS1_VALID@h
+	bne	tlb_miss_done_e6500
+
+	/* Undo MAS-damage from the tlbsx */
+	mfspr	r10,SPRN_MAS1
+	oris	r10,r10,MAS1_VALID@h
+	mtspr	SPRN_MAS1,r10
+	mtspr	SPRN_MAS2,r15
+
+	/* Now, we need to walk the page tables. First check if we are in
+	 * range.
+	 */
+	rldicl.	r10,r16,64-PGTABLE_EADDR_SIZE,PGTABLE_EADDR_SIZE+4
+	bne-	tlb_miss_fault_e6500
+
+	rldicl	r15,r16,64-PGDIR_SHIFT+3,64-PGD_INDEX_SIZE-3
+	cmpldi	cr0,r14,0
+	clrrdi	r15,r15,3
+	beq-	tlb_miss_fault_e6500 /* No PGDIR, bail */
+	ldx	r14,r14,r15		/* grab pgd entry */
+
+	rldicl	r15,r16,64-PUD_SHIFT+3,64-PUD_INDEX_SIZE-3
+	clrrdi	r15,r15,3
+	cmpdi	cr0,r14,0
+	bge	tlb_miss_fault_e6500	/* Bad pgd entry or hugepage; bail */
+	ldx	r14,r14,r15		/* grab pud entry */
+
+	rldicl	r15,r16,64-PMD_SHIFT+3,64-PMD_INDEX_SIZE-3
+	clrrdi	r15,r15,3
+	cmpdi	cr0,r14,0
+	bge	tlb_miss_fault_e6500
+	ldx	r14,r14,r15		/* Grab pmd entry */
+
+	mfspr	r10,SPRN_MAS0
+	cmpdi	cr0,r14,0
+	bge	tlb_miss_fault_e6500
+
+	/* Now we build the MAS for a 2M indirect page:
+	 *
+	 * MAS 0   :	ESEL needs to be filled by software round-robin
+	 * MAS 1   :	Fully set up
+	 *               - PID already updated by caller if necessary
+	 *               - TSIZE for now is base ind page size always
+	 *               - TID already cleared if necessary
+	 * MAS 2   :	Default not 2M-aligned, need to be redone
+	 * MAS 3+7 :	Needs to be done
+	 */
+
+	ori	r14,r14,(BOOK3E_PAGESZ_4K << MAS3_SPSIZE_SHIFT)
+	mtspr	SPRN_MAS7_MAS3,r14
+
+	clrrdi	r15,r16,21		/* make EA 2M-aligned */
+	mtspr	SPRN_MAS2,r15
+
+	lbz	r15,TCD_ESEL_NEXT(r11)
+	lbz	r16,TCD_ESEL_MAX(r11)
+	lbz	r14,TCD_ESEL_FIRST(r11)
+	rlwimi	r10,r15,16,0x00ff0000	/* insert esel_next into MAS0 */
+	addi	r15,r15,1		/* increment esel_next */
+	mtspr	SPRN_MAS0,r10
+	cmpw	r15,r16
+	iseleq	r15,r14,r15		/* if next == last use first */
+	stb	r15,TCD_ESEL_NEXT(r11)
+
+	tlbwe
+
+tlb_miss_done_e6500:
+	.macro	tlb_unlock_e6500
+	li	r15,0
+	isync
+	stb	r15,TCD_LOCK(r11)
+	.endm
+
+	tlb_unlock_e6500
+	TLB_MISS_STATS_X(MMSTAT_TLB_MISS_NORM_OK)
+	tlb_epilog_bolted
+	rfi
+
+tlb_miss_kernel_e6500:
+	mfspr	r10,SPRN_MAS1
+	ld	r14,PACA_KERNELPGD(r13)
+	cmpldi	cr0,r15,8		/* Check for vmalloc region */
+	rlwinm	r10,r10,0,16,1		/* Clear TID */
+	mtspr	SPRN_MAS1,r10
+	beq+	tlb_miss_common_e6500
+
+tlb_miss_fault_e6500:
+	tlb_unlock_e6500
+	/* We need to check if it was an instruction miss */
+	andi.	r16,r16,1
+	bne	itlb_miss_fault_e6500
+dtlb_miss_fault_e6500:
+	TLB_MISS_STATS_D(MMSTAT_TLB_MISS_NORM_FAULT)
+	tlb_epilog_bolted
+	b	exc_data_storage_book3e
+itlb_miss_fault_e6500:
+	TLB_MISS_STATS_I(MMSTAT_TLB_MISS_NORM_FAULT)
+	tlb_epilog_bolted
+	b	exc_instruction_storage_book3e
+
+
 /**********************************************************************
  *                                                                    *
  * TLB miss handling for Book3E with TLB reservation and HES support  *
diff --git a/arch/powerpc/mm/tlb_nohash.c b/arch/powerpc/mm/tlb_nohash.c
index 8805b7b..735839b 100644
--- a/arch/powerpc/mm/tlb_nohash.c
+++ b/arch/powerpc/mm/tlb_nohash.c
@@ -43,6 +43,7 @@
 #include <asm/tlb.h>
 #include <asm/code-patching.h>
 #include <asm/hugetlb.h>
+#include <asm/paca.h>
 
 #include "mmu_decl.h"
 
@@ -58,6 +59,10 @@ struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT] = {
 		.shift	= 12,
 		.enc	= BOOK3E_PAGESZ_4K,
 	},
+	[MMU_PAGE_2M] = {
+		.shift	= 21,
+		.enc	= BOOK3E_PAGESZ_2M,
+	},
 	[MMU_PAGE_4M] = {
 		.shift	= 22,
 		.enc	= BOOK3E_PAGESZ_4M,
@@ -136,7 +141,7 @@ static inline int mmu_get_tsize(int psize)
 int mmu_linear_psize;		/* Page size used for the linear mapping */
 int mmu_pte_psize;		/* Page size used for PTE pages */
 int mmu_vmemmap_psize;		/* Page size used for the virtual mem map */
-int book3e_htw_enabled;		/* Is HW tablewalk enabled ? */
+int book3e_htw_mode;		/* HW tablewalk?  Value is PPC_HTW_* */
 unsigned long linear_map_top;	/* Top of linear mapping */
 
 #endif /* CONFIG_PPC64 */
@@ -377,7 +382,7 @@ void tlb_flush_pgtable(struct mmu_gather *tlb, unsigned long address)
 {
 	int tsize = mmu_psize_defs[mmu_pte_psize].enc;
 
-	if (book3e_htw_enabled) {
+	if (book3e_htw_mode != PPC_HTW_NONE) {
 		unsigned long start = address & PMD_MASK;
 		unsigned long end = address + PMD_SIZE;
 		unsigned long size = 1UL << mmu_psize_defs[mmu_pte_psize].shift;
@@ -430,7 +435,7 @@ static void setup_page_sizes(void)
 			def = &mmu_psize_defs[psize];
 			shift = def->shift;
 
-			if (shift == 0)
+			if (shift == 0 || shift & 1)
 				continue;
 
 			/* adjust to be in terms of 4^shift Kb */
@@ -440,21 +445,40 @@ static void setup_page_sizes(void)
 				def->flags |= MMU_PAGE_SIZE_DIRECT;
 		}
 
-		goto no_indirect;
+		goto out;
 	}
 
 	if (fsl_mmu && (mmucfg & MMUCFG_MAVN) == MMUCFG_MAVN_V2) {
-		u32 tlb1ps = mfspr(SPRN_TLB1PS);
+		u32 tlb1cfg, tlb1ps;
+
+		tlb0cfg = mfspr(SPRN_TLB0CFG);
+		tlb1cfg = mfspr(SPRN_TLB1CFG);
+		tlb1ps = mfspr(SPRN_TLB1PS);
+		eptcfg = mfspr(SPRN_EPTCFG);
+
+		if ((tlb1cfg & TLBnCFG_IND) && (tlb0cfg & TLBnCFG_PT))
+			book3e_htw_mode = PPC_HTW_E6500;
+
+		/*
+		 * We expect 4K subpage size and unrestricted indirect size.
+		 * The lack of a restriction on indirect size is a Freescale
+		 * extension, indicated by PSn = 0 but SPSn != 0.
+		 */
+		if (eptcfg != 2)
+			book3e_htw_mode = PPC_HTW_NONE;
 
 		for (psize = 0; psize < MMU_PAGE_COUNT; ++psize) {
 			struct mmu_psize_def *def = &mmu_psize_defs[psize];
 
 			if (tlb1ps & (1U << (def->shift - 10))) {
 				def->flags |= MMU_PAGE_SIZE_DIRECT;
+
+				if (book3e_htw_mode && psize == MMU_PAGE_2M)
+					def->flags |= MMU_PAGE_SIZE_INDIRECT;
 			}
 		}
 
-		goto no_indirect;
+		goto out;
 	}
 #endif
 
@@ -471,8 +495,11 @@ static void setup_page_sizes(void)
 	}
 
 	/* Indirect page sizes supported ? */
-	if ((tlb0cfg & TLBnCFG_IND) == 0)
-		goto no_indirect;
+	if ((tlb0cfg & TLBnCFG_IND) == 0 ||
+	    (tlb0cfg & TLBnCFG_PT) == 0)
+		goto out;
+
+	book3e_htw_mode = PPC_HTW_IBM;
 
 	/* Now, we only deal with one IND page size for each
 	 * direct size. Hopefully all implementations today are
@@ -497,8 +524,8 @@ static void setup_page_sizes(void)
 				def->ind = ps + 10;
 		}
 	}
- no_indirect:
 
+out:
 	/* Cleanup array and print summary */
 	pr_info("MMU: Supported page sizes\n");
 	for (psize = 0; psize < MMU_PAGE_COUNT; ++psize) {
@@ -520,23 +547,23 @@ static void setup_page_sizes(void)
 
 static void setup_mmu_htw(void)
 {
-	/* Check if HW tablewalk is present, and if yes, enable it by:
-	 *
-	 * - patching the TLB miss handlers to branch to the
-	 *   one dedicates to it
-	 *
-	 * - setting the global book3e_htw_enabled
-       	 */
-	unsigned int tlb0cfg = mfspr(SPRN_TLB0CFG);
+	/*
+	 * If we want to use HW tablewalk, enable it by patching the TLB miss
+	 * handlers to branch to the one dedicated to it.
+	 */
 
-	if ((tlb0cfg & TLBnCFG_IND) &&
-	    (tlb0cfg & TLBnCFG_PT)) {
+	switch (book3e_htw_mode) {
+	case PPC_HTW_IBM:
 		patch_exception(0x1c0, exc_data_tlb_miss_htw_book3e);
 		patch_exception(0x1e0, exc_instruction_tlb_miss_htw_book3e);
-		book3e_htw_enabled = 1;
+		break;
+	case PPC_HTW_E6500:
+		patch_exception(0x1c0, exc_data_tlb_miss_e6500_book3e);
+		patch_exception(0x1e0, exc_instruction_tlb_miss_e6500_book3e);
+		break;
 	}
 	pr_info("MMU: Book3E HW tablewalk %s\n",
-		book3e_htw_enabled ? "enabled" : "not supported");
+		book3e_htw_mode != PPC_HTW_NONE ? "enabled" : "not supported");
 }
 
 /*
@@ -576,8 +603,16 @@ static void __early_init_mmu(int boot_cpu)
 	/* Set MAS4 based on page table setting */
 
 	mas4 = 0x4 << MAS4_WIMGED_SHIFT;
-	if (book3e_htw_enabled) {
-		mas4 |= mas4 | MAS4_INDD;
+	switch (book3e_htw_mode) {
+	case PPC_HTW_E6500:
+		mas4 |= MAS4_INDD;
+		mas4 |= BOOK3E_PAGESZ_2M << MAS4_TSIZED_SHIFT;
+		mas4 |= MAS4_TLBSELD(1);
+		mmu_pte_psize = MMU_PAGE_2M;
+		break;
+
+	case PPC_HTW_IBM:
+		mas4 |= MAS4_INDD;
 #ifdef CONFIG_PPC_64K_PAGES
 		mas4 |=	BOOK3E_PAGESZ_256M << MAS4_TSIZED_SHIFT;
 		mmu_pte_psize = MMU_PAGE_256M;
@@ -585,13 +620,16 @@ static void __early_init_mmu(int boot_cpu)
 		mas4 |=	BOOK3E_PAGESZ_1M << MAS4_TSIZED_SHIFT;
 		mmu_pte_psize = MMU_PAGE_1M;
 #endif
-	} else {
+		break;
+
+	case PPC_HTW_NONE:
 #ifdef CONFIG_PPC_64K_PAGES
 		mas4 |=	BOOK3E_PAGESZ_64K << MAS4_TSIZED_SHIFT;
 #else
 		mas4 |=	BOOK3E_PAGESZ_4K << MAS4_TSIZED_SHIFT;
 #endif
 		mmu_pte_psize = mmu_virtual_psize;
+		break;
 	}
 	mtspr(SPRN_MAS4, mas4);
 
@@ -611,8 +649,11 @@ static void __early_init_mmu(int boot_cpu)
 		/* limit memory so we dont have linear faults */
 		memblock_enforce_memory_limit(linear_map_top);
 
-		patch_exception(0x1c0, exc_data_tlb_miss_bolted_book3e);
-		patch_exception(0x1e0, exc_instruction_tlb_miss_bolted_book3e);
+		if (book3e_htw_mode == PPC_HTW_NONE) {
+			patch_exception(0x1c0, exc_data_tlb_miss_bolted_book3e);
+			patch_exception(0x1e0,
+				exc_instruction_tlb_miss_bolted_book3e);
+		}
 	}
 #endif
 
-- 
1.8.3.2
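
For readers less familiar with the lbarx/stbcx. sequence in
tlb_miss_common_e6500 above, the per-core byte lock can be modelled in C11
roughly as below. This is a sketch under assumptions, not a translation of
the handler: the names are invented, and the acquire/release orderings are
the usual spinlock choices rather than something taken from the patch:

```c
#include <stdatomic.h>

struct tcd_lock_model {
	atomic_uchar lock;	/* 0 = free, 1 = held */
};

static void tcd_lock(struct tcd_lock_model *t)
{
	for (;;) {
		unsigned char expected = 0;

		/* Like lbarx/stbcx.: take the lock only if the byte was 0. */
		if (atomic_compare_exchange_weak_explicit(&t->lock, &expected, 1,
							  memory_order_acquire,
							  memory_order_relaxed))
			return;

		/* Like the out-of-line loop: spin on plain loads until free. */
		while (atomic_load_explicit(&t->lock, memory_order_relaxed) != 0)
			;
	}
}

static void tcd_unlock(struct tcd_lock_model *t)
{
	atomic_store_explicit(&t->lock, 0, memory_order_release);
}
```

Spinning on plain loads while the lock is held avoids repeatedly taking a
reservation on a byte that another thread of the same core still owns.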

^ permalink raw reply related

* [PATCH v4 3/3] powerpc/fsl-book3e-64: Use paca for hugetlb TLB1 entry selection
From: Scott Wood @ 2014-01-09  1:32 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Scott Wood
In-Reply-To: <1389231163-11175-1-git-send-email-scottwood@freescale.com>

This keeps usage coordinated for hugetlb and indirect entries, which
should make entry selection more predictable and probably improve overall
performance when mixing the two.

Signed-off-by: Scott Wood <scottwood@freescale.com>
---
v4: no change

 arch/powerpc/mm/hugetlbpage-book3e.c | 51 +++++++++++++++++++++++++++++-------
 1 file changed, 41 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/mm/hugetlbpage-book3e.c b/arch/powerpc/mm/hugetlbpage-book3e.c
index 646c4bf..5e4ee25 100644
--- a/arch/powerpc/mm/hugetlbpage-book3e.c
+++ b/arch/powerpc/mm/hugetlbpage-book3e.c
@@ -8,6 +8,44 @@
 #include <linux/mm.h>
 #include <linux/hugetlb.h>
 
+#ifdef CONFIG_PPC_FSL_BOOK3E
+#ifdef CONFIG_PPC64
+static inline int tlb1_next(void)
+{
+	struct paca_struct *paca = get_paca();
+	struct tlb_core_data *tcd;
+	int this, next;
+
+	tcd = paca->tcd_ptr;
+	this = tcd->esel_next;
+
+	next = this + 1;
+	if (next >= tcd->esel_max)
+		next = tcd->esel_first;
+
+	tcd->esel_next = next;
+	return this;
+}
+#else
+static inline int tlb1_next(void)
+{
+	int index, ncams;
+
+	ncams = mfspr(SPRN_TLB1CFG) & TLBnCFG_N_ENTRY;
+
+	index = __get_cpu_var(next_tlbcam_idx);
+
+	/* Just round-robin the entries and wrap when we hit the end */
+	if (unlikely(index == ncams - 1))
+		__get_cpu_var(next_tlbcam_idx) = tlbcam_index;
+	else
+		__get_cpu_var(next_tlbcam_idx)++;
+
+	return index;
+}
+#endif /* !PPC64 */
+#endif /* FSL */
+
 static inline int mmu_get_tsize(int psize)
 {
 	return mmu_psize_defs[psize].enc;
@@ -47,7 +85,7 @@ void book3e_hugetlb_preload(struct vm_area_struct *vma, unsigned long ea,
 	struct mm_struct *mm;
 
 #ifdef CONFIG_PPC_FSL_BOOK3E
-	int index, ncams;
+	int index;
 #endif
 
 	if (unlikely(is_kernel_addr(ea)))
@@ -77,18 +115,11 @@ void book3e_hugetlb_preload(struct vm_area_struct *vma, unsigned long ea,
 	}
 
 #ifdef CONFIG_PPC_FSL_BOOK3E
-	ncams = mfspr(SPRN_TLB1CFG) & TLBnCFG_N_ENTRY;
-
 	/* We have to use the CAM(TLB1) on FSL parts for hugepages */
-	index = __get_cpu_var(next_tlbcam_idx);
+	index = tlb1_next();
 	mtspr(SPRN_MAS0, MAS0_ESEL(index) | MAS0_TLBSEL(1));
-
-	/* Just round-robin the entries and wrap when we hit the end */
-	if (unlikely(index == ncams - 1))
-		__get_cpu_var(next_tlbcam_idx) = tlbcam_index;
-	else
-		__get_cpu_var(next_tlbcam_idx)++;
 #endif
+
 	mas1 = MAS1_VALID | MAS1_TID(mm->context.id) | MAS1_TSIZE(tsize);
 	mas2 = ea & ~((1UL << shift) - 1);
 	mas2 |= (pte_val(pte) >> PTE_WIMGE_SHIFT) & MAS2_WIMGE_MASK;
-- 
1.8.3.2
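
To make the wrap-around in the 64-bit tlb1_next() above concrete, here is a
standalone model of the same round-robin step. The struct and function names
carry a _model suffix to mark them as illustrative, and the example values
for esel_first and esel_max are arbitrary:

```c
#include <stdint.h>

struct tlb_core_data_model {
	uint8_t esel_next;	/* entry to hand out next */
	uint8_t esel_max;	/* one past the last usable entry */
	uint8_t esel_first;	/* first entry not used by bolted mappings */
};

/* Return the current entry and advance, wrapping back to esel_first. */
static int tlb1_next_model(struct tlb_core_data_model *tcd)
{
	int this = tcd->esel_next;
	int next = this + 1;

	if (next >= tcd->esel_max)
		next = tcd->esel_first;

	tcd->esel_next = (uint8_t)next;
	return this;
}
```

Starting from esel_first = 3 and esel_max = 6, successive calls hand out
entries 3, 4, 5 and then 3 again, so entries below esel_first are never
selected as victims.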

^ permalink raw reply related

* Re: [PATCH] powerpc: Fix alignment of secondary cpu spin vars
From: Benjamin Herrenschmidt @ 2014-01-09  1:36 UTC (permalink / raw)
  To: Olof Johansson
  Cc: Michael Ellerman, linuxppc-dev, linux-kernel@vger.kernel.org,
	Anton Blanchard, chzigotzky
In-Reply-To: <20140108174828.GA16830@quad.lixom.net>

On Wed, 2014-01-08 at 09:48 -0800, Olof Johansson wrote:

> >         /* If it's a display, note it */
> > -       memset(type, 0, sizeof(type));
> > -       prom_getprop(stdout_node, "device_type", type, sizeof(type));
> > -       if (strcmp(type, "display") == 0)
> > -               prom_setprop(stdout_node, path, "linux,boot-display", NULL, 0);
> > +       stdout_node = call_prom("instance-to-package", 1, 1, prom.stdout);
> > +       if (stdout_node != PROM_ERROR) {
> > +               val = cpu_to_be32(stdout_node);
> > +               memset(type, 0, sizeof(type));
> > +               prom_getprop(stdout_node, "device_type", type, sizeof(type));
> > +               if (strcmp(type, "display") == 0)
> > +                       prom_setprop(stdout_node, path, "linux,boot-display", NU
> 
> Line is cut off, this needs "NULL, 0);" at the end.

Right, copy/paste failure :-)

Thanks, I'll try to get that to Linus before he cuts .13, otherwise it
will be -stable.

Cheers,
Ben.

^ permalink raw reply

