LinuxPPC-Dev Archive on lore.kernel.org

* Re: [PATCH] arch: configuration, deleting 'CONFIG_BUG' since always need it.
From: Geert Uytterhoeven @ 2013-05-23 14:10 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Catalin Marinas, Linux-sh list, Chen Gang, Heiko Carstens,
	paulus@samba.org, H. Peter Anvin, Michel Lespinasse,
	Hans-Christian Egtvedt, Linux-Arch, linux-s390, uml-devel,
	Yoshinori Sato, Richard Weinberger, Helge Deller,
	the arch/x86 maintainers, James E.J. Bottomley, mingo@redhat.com,
	Frederic Weisbecker, Paul McKenney, Håvard Skinnemoen,
	Serge Hallyn, Mike Frysinger, Arnd Bergmann, Will Deacon,
	Jeff Dike, Akinobu Mita, uml-user,
	uclinux-dist-devel@blackfin.uclinux.org, Thomas Gleixner,
	linux-arm-kernel@lists.infradead.org, Parisc List,
	linux-kernel@vger.kernel.org, Richard Kuo, Paul Mundt,
	Eric W. Biederman, linux-hexagon, Martin Schwidefsky, linux390,
	Andrew Morton, linuxppc-dev@lists.ozlabs.org, David Miller
In-Reply-To: <20130523125033.GP18614@n2100.arm.linux.org.uk>

On Thu, May 23, 2013 at 2:50 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Thu, May 23, 2013 at 02:09:02PM +0200, Arnd Bergmann wrote:
>> On Thursday 23 May 2013, Russell King - ARM Linux wrote:
>> > This is the problem you guys are missing - unreachable() means "we lose
>> > control of the CPU at this point".
>>
>> I'm absolutely aware of this. Again, the current behaviour of doing nothing
>> at all isn't very different from the undefined behavior you get when you
>> reach the end of a function returning a pointer without a "return" statement,
>> or when you return from a function that has determined that it is not safe
>> to continue.
>
> Running off the end of a function like that is a different kettle of fish.
> The execution path is still as the compiler intends - what isn't is that
> the data returned is likely to be random trash.
>
> That's _quite_ different from the CPU starting to execute the contents
> of a literal data pool.

I agree it's best to e.g. trap and reboot.
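To make the distinction in this subthread concrete, here is a minimal user-space sketch, assuming the GCC/Clang builtins. bucket_size() and MY_BUG() are hypothetical names for illustration, not the kernel's BUG() implementation. If __builtin_unreachable() were used in place of the trap, the compiler would be free to emit nothing after the switch, so an out-of-range argument could let the CPU run into whatever follows the function; __builtin_trap() instead emits an instruction that stops execution at that point.

```c
#include <assert.h>

/* Hypothetical stand-in for a BUG()-like macro; not the kernel definition. */
#define MY_BUG() __builtin_trap()

/* Map an allocation order (0..2) to a bucket size in pages. */
static int bucket_size(unsigned int order)
{
	switch (order) {
	case 0:
		return 1;
	case 1:
		return 2;
	case 2:
		return 4;
	}
	/* Never reached for valid input; trap rather than fall off the end. */
	MY_BUG();
}

/* Sanity checks for valid inputs; the trap path is deliberately not exercised. */
static void bucket_size_selftest(void)
{
	assert(bucket_size(0) == 1);
	assert(bucket_size(2) == 4);
}
```

Calling bucket_size(3) would terminate the program at the trap, which is the user-space analogue of "trap and reboot".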

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

* Re: [PATCH 0/5 v2] VFIO PPC64: add VFIO support on POWERPC64
From: Alex Williamson @ 2013-05-23 14:56 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm, linux-kernel, Paul Mackerras, linuxppc-dev, David Gibson
In-Reply-To: <1369107191-28547-1-git-send-email-aik@ozlabs.ru>

On Tue, 2013-05-21 at 13:33 +1000, Alexey Kardashevskiy wrote:
> The series adds support for VFIO on POWERPC in user space (such as QEMU).
> The in-kernel real mode IOMMU support is added by another series posted
> separately.
> 
> As the first and main aim of this series is the POWERNV platform support,
> the "Enable on POWERNV platform" patch goes first and introduces an API
> to be used by the VFIO IOMMU driver. The "Enable on pSeries platform" patch
> simply registers PHBs in the IOMMU subsystem and expects the API to be present,
> it enables VFIO support in fully emulated QEMU guests.
> 
> The main change is that this series was changed and tested against v3.10-rc1.
> It also contains some bugfixes which are mentioned (if any) in the patch messages.
> 
> Alexey Kardashevskiy (3):
>   powerpc/vfio: Enable on POWERNV platform
>   powerpc/vfio: Implement IOMMU driver for VFIO
>   powerpc/vfio: Enable on pSeries platform
> 
>  Documentation/vfio.txt                      |   63 +++++
>  arch/powerpc/include/asm/iommu.h            |   26 ++
>  arch/powerpc/kernel/iommu.c                 |  323 +++++++++++++++++++++++
>  arch/powerpc/platforms/powernv/pci-ioda.c   |    1 +
>  arch/powerpc/platforms/powernv/pci-p5ioc2.c |    5 +-
>  arch/powerpc/platforms/powernv/pci.c        |    2 +
>  arch/powerpc/platforms/pseries/iommu.c      |    4 +
>  drivers/iommu/Kconfig                       |    8 +
>  drivers/vfio/Kconfig                        |    6 +
>  drivers/vfio/Makefile                       |    1 +
>  drivers/vfio/vfio.c                         |    1 +
>  drivers/vfio/vfio_iommu_spapr_tce.c         |  377 +++++++++++++++++++++++++++
>  include/uapi/linux/vfio.h                   |   34 +++
>  13 files changed, 850 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/vfio/vfio_iommu_spapr_tce.c
> 

These look OK to me. How do you want to integrate them?  Should I
provide Acks on patches 2 & 3 and let them get pushed through the ppc
tree, or should I wait for patch 1 and then push 2 & 3 through my tree?
Thanks,

Alex

* Re: SATA hang on 8315E triggered by heavy flash write?
From: Anthony Foiani @ 2013-05-23 15:10 UTC (permalink / raw)
  To: Xie Shaohui-B21989; +Cc: Wood Scott-B07421, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <ED492CCEAF882048BC2237DE806547C90B1E0FFA@039-SN2MPN1-013.039d.mgd.msft.net>

Shaohui --

Xie Shaohui-B21989 <B21989@freescale.com> writes:

> Thanks for the confirmation. 

You're very welcome.

> So it seems the NOR write breaks the signal integrity of SATA.
> I don't have the schematic or a board right now; could you please measure
> the signals related to the NOR write to see if anything is abnormal? Does
> the board use an FPGA or CPLD to control the signals?

I'll have to pass these questions on to my hardware vendor; I'm not
equipped to do this level of hardware debugging (neither hardware nor
knowledge!).

> If you stop the NOR write, can the SATA link recover and work?

Earlier in my development, I was seeing this error and it would
recover:

  ata2: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen
  ata2: PHY RDY changed
  ata2: hard resetting link
  ata2: Signature Update detected @ 0 msecs
  ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
  ata2.00: configured for UDMA/133
  ata2: EH complete

At the current time, however, it seems that it does not recover.

I don't know whether this is due to the speed limiting code, or if
it's because we are doing more disk accesses (when the actual product
is up and running).

I can re-do the tests with the speed limit disabled, but I won't be
able to get to that for a few hours yet.  You can read about the speed
limit issues in this thread:

  http://article.gmane.org/gmane.linux.ports.ppc.embedded/50652

And my final patch (yes, a year later):

  http://article.gmane.org/gmane.linux.ports.ppc.embedded/58969

Please don't laugh too hard when you read it.  :)

Thanks again for your help.  I'll try to get the results of testing
w/o speed limit to you within a day or two.

Best regards,
Anthony Foiani

* Re: SATA hang on 8315E triggered by heavy flash write?
From: Anthony Foiani @ 2013-05-23 15:49 UTC (permalink / raw)
  To: Xie Shaohui-B21989; +Cc: Wood Scott-B07421, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <g4ndt3f5q.fsf@dworkin.scrye.com>

Shaohui --

Apologies, a minor clarification is needed:

Anthony Foiani <tkil@scrye.com> writes:

> Shaohui --
>
> Xie Shaohui-B21989 <B21989@freescale.com> writes:
>
>> If you stop the NOR write, can the SATA link recover and work?
>
> Earlier in my development, I was seeing this error and it would
> recover:
>
>   ata2: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen
>   ata2: PHY RDY changed
>   ata2: hard resetting link
>   ata2: Signature Update detected @ 0 msecs
>   ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
>   ata2.00: configured for UDMA/133
>   ata2: EH complete

In this case, it would recover *even as the NOR write continued*.

Here's an example where it froze and recovered twice.  The application
starts about 12s after the kernel, so 945s for the kernel should be
933s for the application.

Also, note that this case already has the speed limit code included
(see message at 945.928702), so I don't think I need to do a separate
test.

  [console]
  [  945.902543] ata2: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen
  [  945.909584] ata2: PHY RDY changed
  [  945.913048] ata2: hard resetting link
  [  945.928702] ata2: setting speed (in hard reset)
  [  945.939864] ata2: Signature Update detected @ 0 msecs
  [  946.115888] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
  [  946.128548] ata2.00: configured for UDMA/133
  [  946.133021] ata2: EH complete
  [  952.537180] ata2: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen
  [  952.544208] ata2: PHY RDY changed
  [  952.547626] ata2: hard resetting link
  [  952.558319] ata2: setting speed (in hard reset)
  [  953.076730] ata2: Signature Update detected @ 508 msecs
  [  953.251866] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
  [  953.259389] ata2.00: configured for UDMA/133
  [  953.263892] ata2: EH complete

  [application]
  +924.152278 sw-upd.0 [29]: fm: nor0: write: writing 0x10000 @0x100000 from buf[0x0]; attempt 1/3
  +925.599739 sw-upd.0 [29]: fm: nor0: write: writing 0x10000 @0x110000 from buf[0x10000]; attempt 1/3
  +927.018239 sw-upd.0 [29]: fm: nor0: write: writing 0x10000 @0x120000 from buf[0x20000]; attempt 1/3
  +928.414069 sw-upd.0 [29]: fm: nor0: write: writing 0x10000 @0x130000 from buf[0x30000]; attempt 1/3
  +929.872850 sw-upd.0 [29]: fm: nor0: write: writing 0x10000 @0x140000 from buf[0x40000]; attempt 1/3
  +931.341634 sw-upd.0 [29]: fm: nor0: write: writing 0x10000 @0x150000 from buf[0x50000]; attempt 1/3
  +932.724024 sw-upd.0 [29]: fm: nor0: write: writing 0x10000 @0x160000 from buf[0x60000]; attempt 1/3
  [first freeze is about here]
  +934.146203 sw-upd.0 [29]: fm: nor0: write: writing 0x10000 @0x170000 from buf[0x70000]; attempt 1/3
  +935.569069 sw-upd.0 [29]: fm: nor0: write: writing 0x10000 @0x180000 from buf[0x80000]; attempt 1/3
  +936.875275 sw-upd.0 [29]: fm: nor0: write: writing 0x10000 @0x190000 from buf[0x90000]; attempt 1/3
  +938.205302 sw-upd.0 [29]: fm: nor0: write: writing 0x10000 @0x1a0000 from buf[0xa0000]; attempt 1/3
  +939.519662 sw-upd.0 [29]: fm: nor0: write: writing 0x10000 @0x1b0000 from buf[0xb0000]; attempt 1/3
  +940.873656 sw-upd.0 [29]: fm: nor0: write: writing 0x10000 @0x1c0000 from buf[0xc0000]; attempt 1/3
  [second freeze is about here]
  +942.230740 sw-upd.0 [29]: fm: nor0: write: writing 0x10000 @0x1d0000 from buf[0xd0000]; attempt 1/3
  +943.641994 sw-upd.0 [29]: fm: nor0: write: writing 0x10000 @0x1e0000 from buf[0xe0000]; attempt 1/3
  +944.938454 sw-upd.0 [29]: fm: nor0: write: writing 0x10000 @0x1f0000 from buf[0xf0000]; attempt 1/3
  +946.236491 sw-upd.0 [29]: fm: nor0: write: writing 0x10000 @0x200000 from buf[0x100000]; attempt 1/3
  +947.607673 sw-upd.0 [29]: fm: nor0: write: writing 0x10000 @0x210000 from buf[0x110000]; attempt 1/3
  +948.919213 sw-upd.0 [29]: fm: nor0: write: writing 0x10000 @0x220000 from buf[0x120000]; attempt 1/3
  +950.151386 sw-upd.0 [29]: fm: nor0: write: writing 0x10000 @0x230000 from buf[0x130000]; attempt 1/3
  +951.502522 sw-upd.0 [29]: fm: nor0: write: writing 0x10000 @0x240000 from buf[0x140000]; attempt 1/3
  +952.851177 sw-upd.0 [29]: fm: nor0: write: writing 0x10000 @0x250000 from buf[0x150000]; attempt 1/3
  +954.082897 sw-upd.0 [29]: fm: nor0: write: writing 0x10000 @0x260000 from buf[0x160000]; attempt 1/3
  +955.315338 sw-upd.0 [29]: fm: nor0: write: writing 0x10000 @0x270000 from buf[0x170000]; attempt 1/3
  +956.559639 sw-upd.0 [29]: fm: nor0: write: writing 0x10000 @0x280000 from buf[0x180000]; attempt 1/3
  +957.845503 sw-upd.0 [29]: fm: nor0: write: writing 0x10000 @0x290000 from buf[0x190000]; attempt 1/3
  +959.100007 sw-upd.0 [29]: fm: nor0: write: writing 0x10000 @0x2a0000 from buf[0x1a0000]; attempt 1/3
  +960.347982 sw-upd.0 [29]: fm: nor0: write: writing 0x10000 @0x2b0000 from buf[0x1b0000]; attempt 1/3
  +961.545344 sw-upd.0 [29]: fm: nor0: write: writing 0x10000 @0x2c0000 from buf[0x1c0000]; attempt 1/3

Best regards,
Anthony Foiani

* Re: [PATCH 2/2] net: mv643xx_eth: proper initialization for Kirkwood SoCs
From: Jason Cooper @ 2013-05-23 16:01 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Andrew Lunn, netdev, linux-kernel, linux-arm-kernel, linuxppc-dev,
	David Miller, Lennert Buytenhek, Sebastian Hesselbarth
In-Reply-To: <20130522201607.GA18823@obsidianresearch.com>

Sebastian,

On Wed, May 22, 2013 at 02:16:07PM -0600, Jason Gunthorpe wrote:
> On Wed, May 22, 2013 at 10:04:02PM +0200, Sebastian Hesselbarth wrote:
> 
> > > Ethernet controllers found on Kirkwood SoCs not only suffer from losing
> > MAC address register contents on clock gating but also some important
> > registers are reset to values that would break ethernet. This patch
> 
> FWIW, we found that the bootloader has to write to PSC1, the driver
> doesn't work with the power on/reset value of the register. So I think
> it is safe to assume that all kirkwood bootloaders alter the value.
> 
> Our systems write the value 0x00638488 to PSC1.
> 
> I looked at patching mv643xx_eth, but ran into the same complexity you
> did, it isn't clear what variants of this IP block have the
> register/etc.
> 
> > +	/* Kirkwood resets some registers on gated clocks. Especially
> > +	 * CLK125_BYPASS_EN must be cleared but is not available on
> > +	 * all other SoCs/System Controllers using this driver.
> > +	 */
> > +	if (of_machine_is_compatible("marvell,kirkwood"))
> > +		wrlp(mp, PORT_SERIAL_CONTROL1,
> > +		     rdlp(mp, PORT_SERIAL_CONTROL1) & ~CLK125_BYPASS_EN);
> 
> of_machine_is_compatible seems heavy handed, I would expect this to be
> based on the compatible string of the ethernet node itself, not the
> machine??

Is there a model number variation between IP that needs this and IP that
doesn't?  If not, I'm fine with of_machine_is_compatible().
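For readers following the quoted hunk, the read-modify-write it performs can be sketched in plain C. This is a sketch under stated assumptions: the bit position chosen for CLK125_BYPASS_EN is assumed for illustration, psc1_clear_clk125_bypass() is a hypothetical helper operating on a plain value rather than on the device register, and 0x00638488 is simply the PSC1 value quoted earlier in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit position, for illustration only. */
#define CLK125_BYPASS_EN (UINT32_C(1) << 4)

/* Return the PSC1 value with CLK125_BYPASS_EN cleared, other bits untouched. */
static uint32_t psc1_clear_clk125_bypass(uint32_t psc1)
{
	return psc1 & ~CLK125_BYPASS_EN;
}

static void psc1_selftest(void)
{
	/* PSC1 value quoted in this thread. */
	const uint32_t bootloader_value = UINT32_C(0x00638488);

	/* Under the assumed bit position the bit is already clear: no change. */
	assert(psc1_clear_clk125_bypass(bootloader_value) == bootloader_value);
	/* If clock gating had set the bit, clearing restores the quoted value. */
	assert(psc1_clear_clk125_bypass(bootloader_value | CLK125_BYPASS_EN)
	       == bootloader_value);
}
```

In the driver the same mask would be applied between a register read and a register write, as the rdlp()/wrlp() pair in the hunk does.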

thx,

Jason.

* Re: [PATCH 2/2] net: mv643xx_eth: proper initialization for Kirkwood SoCs
From: Jason Gunthorpe @ 2013-05-23 17:11 UTC (permalink / raw)
  To: Jason Cooper
  Cc: Andrew Lunn, linux-kernel, Lennert Buytenhek, netdev,
	linuxppc-dev, David Miller, linux-arm-kernel,
	Sebastian Hesselbarth
In-Reply-To: <20130523160111.GP31290@titan.lakedaemon.net>

On Thu, May 23, 2013 at 12:01:11PM -0400, Jason Cooper wrote:
> > > +	/* Kirkwood resets some registers on gated clocks. Especially
> > > +	 * CLK125_BYPASS_EN must be cleared but is not available on
> > > +	 * all other SoCs/System Controllers using this driver.
> > > +	 */
> > > +	if (of_machine_is_compatible("marvell,kirkwood"))
> > > +		wrlp(mp, PORT_SERIAL_CONTROL1,
> > > +		     rdlp(mp, PORT_SERIAL_CONTROL1) & ~CLK125_BYPASS_EN);
> > 
> > of_machine_is_compatible seems heavy handed, I would expect this to be
> > based on the compatible string of the ethernet node itself, not the
> > machine??
> 
> Is there a model number variation between IP that needs this and IP that
> doesn't?  If not, I'm fine with of_machine_is_compatible().

Well the name 'mv643xx' is a family of system controller SoCs
from ages ago, so it seems reasonable to continue the trend and label the
IP variations with the SoC name:

 compatible = "marvell,kirwood,ethernet", "marvell,mv643xx_eth"

Jason

* Re: [PATCH 2/2] net: mv643xx_eth: proper initialization for Kirkwood SoCs
From: Jason Cooper @ 2013-05-23 17:23 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Andrew Lunn, linux-kernel, Lennert Buytenhek, netdev,
	linuxppc-dev, David Miller, linux-arm-kernel,
	Sebastian Hesselbarth
In-Reply-To: <20130523171112.GB31281@obsidianresearch.com>

On Thu, May 23, 2013 at 11:11:12AM -0600, Jason Gunthorpe wrote:
> On Thu, May 23, 2013 at 12:01:11PM -0400, Jason Cooper wrote:
> > > > +	/* Kirkwood resets some registers on gated clocks. Especially
> > > > +	 * CLK125_BYPASS_EN must be cleared but is not available on
> > > > +	 * all other SoCs/System Controllers using this driver.
> > > > +	 */
> > > > +	if (of_machine_is_compatible("marvell,kirkwood"))
> > > > +		wrlp(mp, PORT_SERIAL_CONTROL1,
> > > > +		     rdlp(mp, PORT_SERIAL_CONTROL1) & ~CLK125_BYPASS_EN);
> > > 
> > > of_machine_is_compatible seems heavy handed, I would expect this to be
> > > based on the compatible string of the ethernet node itself, not the
> > > machine??
> > 
> > Is there a model number variation between IP that needs this and IP that
> > doesn't?  If not, I'm fine with of_machine_is_compatible().
> 
> Well the name 'mv643xx' is a family of system controller SoCs
> from ages ago, so it seems reasonable to continue the trend and label the
> IP variations with the SoC name:
> 
>  compatible = "marvell,kirwood,ethernet", "marvell,mv643xx_eth"

Shouldn't it rather be

	compatible = "marvell,kirkwood-eth", "marvell,orion-eth";

I'm inclined to go with of_machine_is_compatible() since the only
concrete difference we know is that the tweak is needed on kirkwood and
nowhere else.

If we had an errata, or a datasheet saying specifically flavor X needs
this and none other does, then we could trigger on the ethernet node
compatible string or a boolean in the node.  But we don't have that...

thx,

Jason.

* Re: [PATCH 2/2] net: mv643xx_eth: proper initialization for Kirkwood SoCs
From: Jason Gunthorpe @ 2013-05-23 17:53 UTC (permalink / raw)
  To: Jason Cooper
  Cc: Andrew Lunn, linux-kernel, Lennert Buytenhek, netdev,
	linuxppc-dev, David Miller, linux-arm-kernel,
	Sebastian Hesselbarth
In-Reply-To: <20130523172339.GQ31290@titan.lakedaemon.net>

On Thu, May 23, 2013 at 01:23:39PM -0400, Jason Cooper wrote:

> Shouldn't it rather be
> 
> 	compatible = "marvell,kirkwood-eth", "marvell,orion-eth";

Not sure about orion-eth?

> I'm inclined to go with of_machine_is_compatible() since the only
> concrete difference we know is that the tweak is needed on kirkwood and
> nowhere else.

But there is a larger problem here than just this one bit.

The PSC1 register must be set properly for the board layout, and today
we rely on the bootloader to set it. In fact, even with Sebastian's
change the ethernet port won't work without bootloader
intervention. The PortReset bit should also be cleared by the driver
(and it is only present on some variants of this IP block,
apparently).

We know that some Marvell SoCs whack the ethernet registers when they
clock gate, and the flip of Clk125Bypass is another symptom of this
general problem.

So, long term, the PSC1 must be fully set by the driver, based on DT
information describing the board (eg RGMII/MII/1000Base-X [SFP] Phy
type), and the layout of this register seems to vary on a SOC by SOC
basis.

Thus, I think it is appropriate to call this variant of the eth IP
'marvell,kirkwood-eth' which indicates that the register block follows
the kirkwood manual and the PSC1 register specifically has the
kirkwood layout.

The question is what other Marvell SOCs have the same PSC1 layout as
kirkwood?

Jason

* Re: [PATCH] powerpc/mpc85xx: fix non-bootcpu cannot up after hibernation resume
From: Anton Vorontsov @ 2013-05-23 17:33 UTC (permalink / raw)
  To: Wang Dongsheng-B40534
  Cc: Wood Scott-B07421, Li Yang-R58472, Zhao Chenhui-B35336,
	rjw@sisk.pl, paulus@samba.org, johannes@sipsolutions.net,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <ABB05CD9C9F68C46A5CEDC7F15439259F2AFC3@039-SN2MPN1-022.039d.mgd.msft.net>

Hi!

On Tue, May 14, 2013 at 08:59:13AM +0000, Wang Dongsheng-B40534 wrote:
> I sent this to the wrong email address, "Anton Vorontsov <avorontsov@ru.mvista.com>".
> 
> Adding Anton Vorontsov <anton.vorontsov@linaro.org> to this email.

I don't have any means to test it, but the patch itself looks good and the
description makes sense. So,

Reviewed-by: Anton Vorontsov <anton@enomsg.org>

Thanks!

> 
> Thanks all.
> 
> > -----Original Message-----
> > From: Wang Dongsheng-B40534
> > Sent: Tuesday, May 14, 2013 4:06 PM
> > To: avorontsov@ru.mvista.com
> > Cc: paulus@samba.org; rjw@sisk.pl; benh@kernel.crashing.org;
> > johannes@sipsolutions.net; Wood Scott-B07421; Li Yang-R58472; Zhao
> > Chenhui-B35336; linuxppc-dev@lists.ozlabs.org; Wang Dongsheng-B40534
> > Subject: [PATCH] powerpc/mpc85xx: fix non-bootcpu cannot up after
> > hibernation resume
> > 
> > This is a core synchronization problem: cpu1 has already updated the
> > spin_table values, but the boot core does not see them in time.
> > 
> > After the boot CPU restores the pages on hibernation resume, we are
> > running with the old kernel's data fully restored. Resetting the
> > non-boot CPUs also resets their TLBs, so they pick up the new
> > virtual-to-physical address mappings. The boot CPU's TLB, however,
> > still holds entries from the boot kernel, so we need to invalidate it
> > to make the boot CPU fetch the new data from main memory.
> > 
> > log:
> > Enabling non-boot CPUs ...
> > smp_85xx_kick_cpu: timeout waiting for core 1 to reset
> > smp: failed starting cpu 1 (rc -2)
> > Error taking CPU1 up: -2
> > 
> > Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
> > 
> > diff --git a/arch/powerpc/kernel/swsusp_booke.S
> > b/arch/powerpc/kernel/swsusp_booke.S
> > index 11a3930..9503249 100644
> > --- a/arch/powerpc/kernel/swsusp_booke.S
> > +++ b/arch/powerpc/kernel/swsusp_booke.S
> > @@ -141,6 +141,19 @@ _GLOBAL(swsusp_arch_resume)
> >  	lis	r11,swsusp_save_area@h
> >  	ori	r11,r11,swsusp_save_area@l
> > 
> > +	/*
> > +	 * The boot core get a virtual address, when the boot process,
> > +	 * the virtual address corresponds to a physical address. After
> > +	 * hibernation resume memory snapshots, The corresponding
> > +	 * relationship between the virtual memory and physical memory
> > +	 * might change again. We need to get a new page table. So we
> > +	 * need to invalidate TLB after resume pages.
> > +	 *
> > +	 * Invalidations TLB Using tlbilx/tlbivax/MMUCSR0.
> > +	 * tlbilx used here.
> > +	 */
> > +	bl	_tlbil_all
> > +
> >  	lwz	r4,SL_SPRG0(r11)
> >  	mtsprg	0,r4
> >  	lwz	r4,SL_SPRG1(r11)
> > --
> > 1.8.0
> 
> 

* Re: [PATCH 2/2] net: mv643xx_eth: proper initialization for Kirkwood SoCs
From: Jason Cooper @ 2013-05-23 18:40 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Andrew Lunn, linux-kernel, Lennert Buytenhek, netdev,
	linuxppc-dev, David Miller, linux-arm-kernel,
	Sebastian Hesselbarth
In-Reply-To: <20130523175357.GB2821@obsidianresearch.com>

On Thu, May 23, 2013 at 11:53:57AM -0600, Jason Gunthorpe wrote:
> On Thu, May 23, 2013 at 01:23:39PM -0400, Jason Cooper wrote:
> 
> > Shouldn't it rather be
> > 
> > 	compatible = "marvell,kirkwood-eth", "marvell,orion-eth";
> 
> Not sure about orion-eth?
>  
> > I'm inclined to go with of_machine_is_compatible() since the only
> > concrete difference we know is that the tweak is needed on kirkwood and
> > nowhere else.
> 
> But there is a larger problem here than just this one bit.
> 
> The PSC1 register must be set properly for the board layout, and today
> we rely on the bootloader to set it. In fact, even with Sebastian's
> change the ethernet port won't work without bootloader
> intervention. The PortReset bit should also be cleared by the driver
> (and it is only present on some variants of this IP block,
> apparently).
> 
> We know that some Marvell SoCs whack the ethernet registers when they
> clock gate, and the flip of Clk125Bypass is another symptom of this
> general problem.
> 
> So, long term, the PSC1 must be fully set by the driver, based on DT
> information describing the board (eg RGMII/MII/1000Base-X [SFP] Phy
> type), and the layout of this register seems to vary on a SOC by SOC
> basis.
> 
> Thus, I think it is appropriate to call this variant of the eth IP
> 'marvell,kirkwood-eth' which indicates that the register block follows
> the kirkwood manual and the PSC1 register specifically has the
> kirkwood layout.

Ok, so mv643xx_eth would match both "marvell,orion-eth" and
"marvell,kirkwood-eth", then write to PSC1 iff it sees a node matching
"marvell,kirkwood-eth".  I'm not too keen on that; however, matching on
the machine doesn't look too good, either.

Perhaps a better answer is to add a boolean, "marvell,kirkwood_psc1" and
check for that?

Or, marvell,psc1_reset = <0xWWXXYYZZ>;

> The question is what other Marvell SOCs have the same PSC1 layout as
> kirkwood?

I think marvell,psc1_reset = <>; gives us the most flexibility in
accurately describing the hardware.

thx,

Jason.

* Re: [PATCH 2/2] net: mv643xx_eth: proper initialization for Kirkwood SoCs
From: Jason Gunthorpe @ 2013-05-23 19:01 UTC (permalink / raw)
  To: Jason Cooper
  Cc: Andrew Lunn, linux-kernel, Lennert Buytenhek, netdev,
	linuxppc-dev, David Miller, linux-arm-kernel,
	Sebastian Hesselbarth
In-Reply-To: <20130523184028.GU31290@titan.lakedaemon.net>

On Thu, May 23, 2013 at 02:40:28PM -0400, Jason Cooper wrote:

> > But there is a larger problem here than just this one bit.
> > 
> > The PSC1 register must be set properly for the board layout, and today
> > we rely on the bootloader to set it. In fact, even with Sebastian's
> > change the ethernet port won't work without bootloader
> > intervention. The PortReset bit should also be cleared by the driver
> > (and it is only present on some variants of this IP block,
> > apparently).
> > 
> > We know that some Marvell SoCs whack the ethernet registers when they
> > clock gate, and the flip of Clk125Bypass is another symptom of this
> > general problem.
> > 
> > So, long term, the PSC1 must be fully set by the driver, based on DT
> > information describing the board (eg RGMII/MII/1000Base-X [SFP] Phy
> > type), and the layout of this register seems to vary on a SOC by SOC
> > basis.
> > 
> > Thus, I think it is appropriate to call this variant of the eth IP
> > 'marvell,kirkwood-eth' which indicates that the register block follows
> > the kirkwood manual and the PSC1 register specifically has the
> > kirkwood layout.
> 
> Ok, so mv643xx_eth would match both "marvell,orion-eth" and
> "marvell,kirkwood-eth", then write to PSC1 iff it sees a node matching
> "marvell,kirkwood-eth".  I'm not too keen on that; however, matching on
> the machine doesn't look too good, either.

Why are you not keen on this? It seems like normal device driver
practice, that is what the data field of of_device_id is typically
used for..

There are more compatible strings than just kirkwood and orion in this
driver; the whole TX_BW_CONTROL_OLD_LAYOUT/TX_BW_CONTROL_NEW_LAYOUT
business (affecting PPC/MIPS) should also someday be captured with
compatible strings rather than auto-detection.

> > The question is what other Marvell SOCs have the same PSC1 layout as
> > kirkwood?
> 
> I think marvell,psc1_reset = <>; gives us the most flexibility in
> accurately describing the hardware.

Agreed, providing a psc1_reset value is a good way to set up the phy
modes. If all 'orion' SoCs have the PSC1 value then we don't need the
kirkwood differentiators, especially if things like the reset bit are
in the same place.

The same trick Sebastian used to capture the mac address could be used
to capture the PSC1 value from the bootloader.

Basically, I think any IP variants that have identical register layouts
can share a compatible string; otherwise different layouts need
different compatible strings, so the general format:

 compatible = "marvell,SOCNAME-eth", "marvell,<something>-eth";

Seems very sane to me. At least this way if we discover more changes
then the driver can match on the SOCNAME compatible string to find
them.
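The matching scheme described above can be sketched in self-contained C, outside the kernel. This is only an illustration under assumptions: struct eth_variant, eth_match() and the ETH_QUIRK_CLEAR_CLK125_BYPASS flag are hypothetical names, and the lookup is a simplified stand-in for the kernel's device-tree matching, in which the quirks field plays the role of the data field of of_device_id. The node's compatible list is assumed to be ordered most specific first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-variant quirk flag; the name is illustrative only. */
#define ETH_QUIRK_CLEAR_CLK125_BYPASS 0x1u

struct eth_variant {
	const char *compatible;	/* string the driver matches on */
	unsigned int quirks;	/* plays the role of of_device_id's data field */
};

static const struct eth_variant eth_variants[] = {
	{ "marvell,kirkwood-eth", ETH_QUIRK_CLEAR_CLK125_BYPASS },
	{ "marvell,orion-eth",    0 },
};

/*
 * Walk the node's compatible strings in order (most specific first) and
 * return the first table entry that matches, or NULL if none does.
 */
static const struct eth_variant *
eth_match(const char *const *compat, size_t n)
{
	size_t i, j;

	assert(compat != NULL || n == 0);
	for (i = 0; i < n; i++)
		for (j = 0; j < sizeof(eth_variants) / sizeof(eth_variants[0]); j++)
			if (strcmp(compat[i], eth_variants[j].compatible) == 0)
				return &eth_variants[j];
	return NULL;
}
```

A node listing "marvell,kirkwood-eth" followed by "marvell,orion-eth" picks up the Kirkwood quirk, while a node for an SoC the table does not know by name still falls back to the generic "marvell,orion-eth" entry with no quirks.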

<something> = orion for TX_BW_CONTROL_NEW_LAYOUT variants also seems
reasonable..

No idea what to call TX_BW_CONTROL_OLD_LAYOUT variants, or the PPC
variants, not important right now it seems.

(BTW, I wonder if the driver should ideally toggle PSC1 reset at some
point????)

Jason

* Re: [PATCH 2/2] net: mv643xx_eth: proper initialization for Kirkwood SoCs
From: Sebastian Hesselbarth @ 2013-05-23 22:40 UTC (permalink / raw)
  To: Jason Cooper
  Cc: Andrew Lunn, linux-kernel, Jason Gunthorpe, Lennert Buytenhek,
	netdev, linuxppc-dev, David Miller, linux-arm-kernel
In-Reply-To: <20130523184028.GU31290@titan.lakedaemon.net>

On 05/23/2013 08:40 PM, Jason Cooper wrote:
> On Thu, May 23, 2013 at 11:53:57AM -0600, Jason Gunthorpe wrote:
>> On Thu, May 23, 2013 at 01:23:39PM -0400, Jason Cooper wrote:
>>> Shouldn't it rather be
>>>
>>> 	compatible = "marvell,kirkwood-eth", "marvell,orion-eth";
>>
>> Not sure about orion-eth?

Jason, Jason,

sorry I didn't come back to this conversation earlier. I already
reworked the patch to rely on
of_device_is_compatible(.."marvell,kirkwood-eth"..). This is a
Kirkwood-only thing, as other Orions cannot do clock gating or
retain critical register content (Dove). I will stick with orion-eth
for all others and maybe introduce new compatible strings (and new
fixes) as soon as issues surface.

>>> I'm inclined to go with of_machine_is_compatible() since the only
>>> concrete difference we know is that the tweak is needed on kirkwood and
>>> nowhere else.
>>
>> But there is a larger problem here than just this one bit.
>>
>> The PSC1 register must be set properly for the board layout, and today
>> we rely on the bootloader to set it. In fact, even with Sebastian's
>> change the ethernet port won't work without bootloader
>> intervention. The PortReset bit should also be cleared by the driver
>> (and it is only present on some variants of this IP block,
>> apparently).

Actually, fixing modular scenarios is only for the sake of multiarch
someday. I don't see the point in running current kernel without eth
compiled in _on a NAS SoC_ ;)

On Dockstar which I tested, clearing CLK125_BYPASS_EN to make it work
after clock gating, it might be a coincidence that bootloader's PSC1
setup matches reset value here - so please test the patch on other
Kirkwood boards also.

But, as long as no other issues arise, I will not start to modify
mv643xx_eth out of the blue. It has been working for ages and breaking
PPC is not my intention. There are other things David Miller already
requested to get fixed and honestly I even thought about a fresh start
for it. Maybe I'll come back to it when barebox gets its driver
someday.

>> We know that some Marvell SoCs whack the ethernet registers when they
>> clock gate, and the flip of Clk125Bypass is another symptom of this
>> general problem.

Which SoCs except Kirkwood? I cannot reproduce any of this behavior on
Dove - and from what I can see from the FS of Orion5x or MV78x00 there
are no clock gating registers.

>> So, long term, the PSC1 must be fully set by the driver, based on DT
>> information describing the board (eg RGMII/MII/1000Base-X [SFP] Phy
>> type), and the layout of this register seems to vary on a SOC by SOC
>> basis.

Agree, but I tend to not go at it now. mv643xx_eth has never set up those
registers and actually it never connects anything other than a GMII phy (or
no phy at all). The latter is easy, but for the other, I'd like to
give up that brain-dead multi-device driver and stick with one device
for both shared and up to three ports. From what I can see from e.g.
ixgbe or any other multi-port eth drivers they all attach the network
device to a single (pci) device.

>> Thus, I think it is appropriate to call this variant of the eth IP
>> 'marvell,kirkwood-eth' which indicates that the register block follows
>> the kirkwood manual and the PSC1 register specifically has the
>> kirkwood layout.
>
> Ok, so mv643xx_eth would match both "marvell,orion-eth" and
> "marvell,kirkwood-eth", then write to PSC1 iff it sees a node matching
> "marvell,kirkwood-eth".  I'm not too keen on that; however, matching on
> the machine doesn't look too good, either.

I didn't choose "marvell,mv643xx-eth" for two reasons:
(a) The DT layout is slightly different, with phy-handle instead of phy
and marvell-prefixed properties. Choosing a compatible string that
matches any PPC compatible string will cause the driver to race with sysdev
code to set up platform_data.

(b) I chose to name the controller "orion-eth" and the port
"orion-eth-port". PPC has "mv64360-eth" for the port and some
"mv64360-eth-block" or "-group" for the controller. IMHO not intuitive,
but it's just a name anyway.
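
The matching scheme proposed in the quoted text (the driver binds to both
compatible strings and touches PSC1 only for "marvell,kirkwood-eth") can be
sketched in plain userspace C. This is only an illustration under stated
assumptions: the struct, the table and the demo_* helpers are hypothetical
names, not code from mv643xx_eth. In the kernel, the same role would
typically be played by an of_device_id table with per-entry .data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-compatible description; not taken from the driver. */
struct demo_eth_match {
	const char *compatible;
	bool fixup_psc1;	/* write PSC1 only where the layout is known */
};

static const struct demo_eth_match demo_eth_matches[] = {
	{ "marvell,orion-eth",    false },
	{ "marvell,kirkwood-eth", true  },
};

/* Return the table entry for a compatible string, or NULL if none matches. */
static const struct demo_eth_match *demo_eth_find(const char *compatible)
{
	size_t i;

	assert(compatible != NULL);
	for (i = 0; i < sizeof(demo_eth_matches) / sizeof(demo_eth_matches[0]); i++)
		if (strcmp(demo_eth_matches[i].compatible, compatible) == 0)
			return &demo_eth_matches[i];
	return NULL;
}

/* True only for nodes whose compatible string requests the PSC1 fixup. */
static bool demo_eth_needs_psc1_fixup(const char *compatible)
{
	const struct demo_eth_match *m = demo_eth_find(compatible);

	return m != NULL && m->fixup_psc1;
}
```

With this table, a node carrying "marvell,orion-eth" still binds but never
triggers the PSC1 write, and a string that is not listed (such as the PPC
one) does not bind at all.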

> Perhaps a better answer is to add a boolean, "marvell,kirkwood_psc1" and
> check for that?
>
> Or, marvell,psc1_reset =<0xWWXXYYZZ>;

For the _long_ run: either exploit the already present phy properties for
MII and friends or introduce new marvell-prefixed ones, but do not misuse DT
for register values here. Each SoC should set up mv643xx_eth properly,
but that should be based on a clean approach _and_ enough people willing
to test that.

I only have a Dockstar and a Topkick, which is running 24/7. I didn't even
check whether only the 6281 suffers from it, or also the 6282, or maybe only
some revisions of the 6281. This patch is a fix, nothing more, nothing
less. If you have Kirkwoods around, please test whether they suffer from
losing the MAC address and whether it works after insmod with the fix
installed.

>> The question is what other Marvell SOCs have the same PSC1 layout as
>> kirkwood?
>
> I think marvell,psc1_reset =<>; gives us the most flexibility in
> accurately describing the hardware.

IMHO using that is just another workaround for a broken driver. We
could hack the whole register setup in DT as it would still accurately
describe HW. Don't get me wrong, but I don't like it.

Haven't checked how happy Linus Walleij is about pinctrl drivers with
reg values hacked in lately.

Sebastian

^ permalink raw reply

* [PATCH] KVM: PPC: Book3S: Add support for H_IPOLL and H_XIRR_X in XICS emulation
From: Paul Mackerras @ 2013-05-24  1:42 UTC (permalink / raw)
  To: kvm-ppc, kvm, linuxppc-dev; +Cc: Alexander Graf

This adds the remaining two hypercalls defined by PAPR for manipulating
the XICS interrupt controller, H_IPOLL and H_XIRR_X.  H_IPOLL returns
information about the priority and pending interrupts for a virtual
cpu, without changing any state.  H_XIRR_X is like H_XIRR in that it
reads and acknowledges the highest-priority pending interrupt, but it
also returns the timestamp (timebase register value) from when the
interrupt was first received by the hypervisor.  Currently we just
return the current time, since we don't do any software queueing of
virtual interrupts inside the XICS emulation code.

These hcalls are not currently used by Linux guests, but may be in
future.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
Unfortunately I missed these two hcalls in the previous submissions.
It would be good to get this patch into 3.10 so we don't have a
kernel version with these calls missing from the API, in case future
guest kernels want to use them.

Alex, given you're on vacation at the moment, are you OK with Ben
taking this through his tree?

 arch/powerpc/include/asm/hvcall.h |    1 +
 arch/powerpc/kvm/book3s_hv.c      |    2 ++
 arch/powerpc/kvm/book3s_pr_papr.c |    2 ++
 arch/powerpc/kvm/book3s_xics.c    |   29 +++++++++++++++++++++++++++++
 4 files changed, 34 insertions(+)

diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index cf4df8e..0c7f2bf 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -264,6 +264,7 @@
 #define H_GET_MPP		0x2D4
 #define H_HOME_NODE_ASSOCIATIVITY 0x2EC
 #define H_BEST_ENERGY		0x2F4
+#define H_XIRR_X		0x2FC
 #define H_RANDOM		0x300
 #define H_COP			0x304
 #define H_GET_MPP_X		0x314
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 065c9df..7e2059e 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -562,6 +562,8 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
 	case H_CPPR:
 	case H_EOI:
 	case H_IPI:
+	case H_IPOLL:
+	case H_XIRR_X:
 		if (kvmppc_xics_enabled(vcpu)) {
 			ret = kvmppc_xics_hcall(vcpu, req);
 			break;
diff --git a/arch/powerpc/kvm/book3s_pr_papr.c b/arch/powerpc/kvm/book3s_pr_papr.c
index 3750e3c..91d4b45 100644
--- a/arch/powerpc/kvm/book3s_pr_papr.c
+++ b/arch/powerpc/kvm/book3s_pr_papr.c
@@ -292,6 +292,8 @@ int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd)
 	case H_CPPR:
 	case H_EOI:
 	case H_IPI:
+	case H_IPOLL:
+	case H_XIRR_X:
 		if (kvmppc_xics_enabled(vcpu))
 			return kvmppc_h_pr_xics_hcall(vcpu, cmd);
 		break;
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index f7a1037..94c1dd4 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -650,6 +650,23 @@ static noinline int kvmppc_h_ipi(struct kvm_vcpu *vcpu, unsigned long server,
 	return H_SUCCESS;
 }
 
+static int kvmppc_h_ipoll(struct kvm_vcpu *vcpu, unsigned long server)
+{
+	union kvmppc_icp_state state;
+	struct kvmppc_icp *icp;
+
+	icp = vcpu->arch.icp;
+	if (icp->server_num != server) {
+		icp = kvmppc_xics_find_server(vcpu->kvm, server);
+		if (!icp)
+			return H_PARAMETER;
+	}
+	state = ACCESS_ONCE(icp->state);
+	kvmppc_set_gpr(vcpu, 4, ((u32)state.cppr << 24) | state.xisr);
+	kvmppc_set_gpr(vcpu, 5, state.mfrr);
+	return H_SUCCESS;
+}
+
 static noinline void kvmppc_h_cppr(struct kvm_vcpu *vcpu, unsigned long cppr)
 {
 	union kvmppc_icp_state old_state, new_state;
@@ -787,6 +804,18 @@ int kvmppc_xics_hcall(struct kvm_vcpu *vcpu, u32 req)
 	if (!xics || !vcpu->arch.icp)
 		return H_HARDWARE;
 
+	/* These requests don't have real-mode implementations at present */
+	switch (req) {
+	case H_XIRR_X:
+		res = kvmppc_h_xirr(vcpu);
+		kvmppc_set_gpr(vcpu, 4, res);
+		kvmppc_set_gpr(vcpu, 5, get_tb());
+		return rc;
+	case H_IPOLL:
+		rc = kvmppc_h_ipoll(vcpu, kvmppc_get_gpr(vcpu, 4));
+		return rc;
+	}
+
 	/* Check for real mode returning too hard */
 	if (xics->real_mode)
 		return kvmppc_xics_rm_complete(vcpu, req);
-- 
1.7.10.4

^ permalink raw reply related
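
The value that kvmppc_h_ipoll() in the patch above places in r4 combines the
CPPR and XISR fields of the ICP state with the expression
((u32)state.cppr << 24) | state.xisr. A minimal userspace sketch of that
packing follows. The demo_* names are hypothetical, and the 24-bit XISR width
is an assumption inferred from the shift in the patch rather than something
the patch states.

```c
#include <assert.h>
#include <stdint.h>

/* Pack CPPR into the top byte and XISR into the low 24 bits. */
static uint32_t demo_pack_xirr(uint8_t cppr, uint32_t xisr)
{
	assert((xisr & 0xff000000u) == 0);	/* assumed: XISR fits in 24 bits */
	return ((uint32_t)cppr << 24) | xisr;
}

/* Extract the CPPR field (top byte) from a packed value. */
static uint8_t demo_xirr_cppr(uint32_t xirr)
{
	return (uint8_t)(xirr >> 24);
}

/* Extract the XISR field (low 24 bits) from a packed value. */
static uint32_t demo_xirr_xisr(uint32_t xirr)
{
	return xirr & 0x00ffffffu;
}
```

A guest that reads r4 after H_IPOLL could split the value again with the two
accessor helpers.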

* Re: [PATCH] arch: configuration, deleting 'CONFIG_BUG' since always need it.
From: Chen Gang @ 2013-05-24  2:13 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Catalin Marinas, Linux-sh list, Heiko Carstens, paulus@samba.org,
	H. Peter Anvin, Michel Lespinasse, Hans-Christian Egtvedt,
	Linux-Arch, linux-s390, Russell King - ARM Linux, uml-devel,
	Yoshinori Sato, Richard Weinberger, Helge Deller,
	the arch/x86 maintainers, James E.J. Bottomley, mingo@redhat.com,
	Frederic Weisbecker, Paul McKenney, Håvard Skinnemoen,
	Serge Hallyn, Mike Frysinger, Arnd Bergmann, Will Deacon,
	Jeff Dike, Akinobu Mita, uml-user,
	uclinux-dist-devel@blackfin.uclinux.org, Thomas Gleixner,
	linux-arm-kernel@lists.infradead.org, Parisc List,
	linux-kernel@vger.kernel.org, Richard Kuo, Paul Mundt,
	Eric W. Biederman, linux-hexagon, Martin Schwidefsky, linux390,
	Andrew Morton, linuxppc-dev@lists.ozlabs.org, David Miller
In-Reply-To: <CAMuHMdUzcBuDupe4Fa6XieziGPMpPKC_k8X8gUUmDvLJ+Fe=Hg@mail.gmail.com>

On 05/23/2013 10:10 PM, Geert Uytterhoeven wrote:
> On Thu, May 23, 2013 at 2:50 PM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
>> > On Thu, May 23, 2013 at 02:09:02PM +0200, Arnd Bergmann wrote:
>>> >> On Thursday 23 May 2013, Russell King - ARM Linux wrote:
>>>> >> > This is the problem you guys are missing - unreachable() means "we lose
>>>> >> > control of the CPU at this point".
>>> >>
>>> >> I'm absolutely aware of this. Again, the current behaviour of doing nothing
>>> >> at all isn't very different from undefined behavior when you get when you
>>> >> get to the end of a function returning a pointer without a "return" statement,
>>> >> or when you return from a function that has determined that it is not safe
>>> >> to continue.
>> >
>> > Running off the end of a function like that is a different kettle of fish.
>> > The execution path is still as the compiler intends - what isn't is that
>> > the data returned is likely to be random trash.
>> >
>> > That's _quite_ different from the CPU starting to execute the contents
>> > of a literal data pool.
> I agree it's best to e.g. trap and reboot.

After reading arch/*/include/asm/bug.h:

It seems panic() is not suitable for NOMMU platforms (only m68k uses it,
and that also needs CONFIG_BUG and CONFIG_SUN3 enabled).

And unreachable() needs to be paired with an inline asm instruction (arm,
x86, powerpc, mips...).

And for __builtin_trap(), "the mechanism used may vary from release to
release so you should not rely on any particular implementation" (see
"http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html"; used by m68k,
sparc, ia64).

I cannot find any *trap*() or *unreachable*() in "include/asm-generic/".

I cannot find any suitable implementation that is 'generic' enough to add
to "include/asm-generic/" (and in fact, CONFIG_BUG itself is not
'generic' enough to be in "include/asm-generic/").


Finally, I still suggest deleting CONFIG_BUG, so that most architectures
can skip this issue for now.

Then the specific architectures also get 3 benefits:

a. the related maintainers can implement it as they wish (no
need to discuss it with other platform maintainers again);

b. the related maintainers can freely use platform-specific features
(which cannot be used in "include/asm-generic/");

c. the related maintainers are more familiar with their own architectures'
demands and requirements.



----------- arch/m68k/include/asm/bug.h --------------------------------

#ifndef _M68K_BUG_H
#define _M68K_BUG_H

#ifdef CONFIG_MMU
#ifdef CONFIG_BUG
#ifdef CONFIG_DEBUG_BUGVERBOSE
#ifndef CONFIG_SUN3
#define BUG() do { \
        printk("kernel BUG at %s:%d!\n", __FILE__, __LINE__); \
        __builtin_trap(); \
} while (0)
#else
#define BUG() do { \
        printk("kernel BUG at %s:%d!\n", __FILE__, __LINE__); \
        panic("BUG!"); \
} while (0)
#endif
#else
#define BUG() do { \
        __builtin_trap(); \
} while (0)
#endif

#define HAVE_ARCH_BUG
#endif
#endif /* CONFIG_MMU */

#include <asm-generic/bug.h>

#endif
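
Several of the BUG() variants in this header end in __builtin_trap(). The
practical point in this thread is that a trap stops execution at a defined
place instead of letting the CPU run on. The following is a small userspace
sketch of that behaviour on a POSIX system, assuming GCC or Clang. The demo_*
names are hypothetical, and nothing here is kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a trap-based BUG(); never returns. */
static void demo_bug(void)
{
	__builtin_trap();
}

/* A well-behaved function, for comparison. */
static void demo_noop(void)
{
}

/*
 * Run fn() in a child process and report how the child ended:
 * 1 if it was killed by a signal, 0 if it exited normally, -1 on error.
 */
static int demo_killed_by_signal(void (*fn)(void))
{
	int status = 0;
	pid_t pid;

	assert(fn != NULL);
	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		fn();
		_exit(0);	/* only reached if fn() returned */
	}
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFSIGNALED(status) ? 1 : 0;
}
```

Which signal the child receives depends on the target and compiler release,
which matches the GCC documentation's warning not to rely on a particular
mechanism, so the sketch only checks that the child was killed.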




Thanks.
-- 
Chen Gang

Asianux Corporation

^ permalink raw reply

* Re: [PATCH 0/5 v2] VFIO PPC64: add VFIO support on POWERPC64
From: Alexey Kardashevskiy @ 2013-05-24  3:14 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm, linux-kernel, Paul Mackerras, linuxppc-dev, David Gibson
In-Reply-To: <1369320984.2646.135.camel@ul30vt.home>

On 05/24/2013 12:56 AM, Alex Williamson wrote:
> On Tue, 2013-05-21 at 13:33 +1000, Alexey Kardashevskiy wrote:
>> The series adds support for VFIO on POWERPC in user space (such as QEMU).
>> The in-kernel real mode IOMMU support is added by another series posted
>> separately.
>>
>> As the first and main aim of this series is the POWERNV platform support,
>> the "Enable on POWERNV platform" patch goes first and introduces an API
>> to be used by the VFIO IOMMU driver. The "Enable on pSeries platform" patch
>> simply registers PHBs in the IOMMU subsystem and expects the API to be present,
>> it enables VFIO support in fully emulated QEMU guests.
>>
>> The main change is that this series was changed and tested against v3.10-rc1.
>> It also contains some bugfixes which are mentioned (if any) in the patch messages.
>>
>> Alexey Kardashevskiy (3):
>>   powerpc/vfio: Enable on POWERNV platform
>>   powerpc/vfio: Implement IOMMU driver for VFIO
>>   powerpc/vfio: Enable on pSeries platform
>>
>>  Documentation/vfio.txt                      |   63 +++++
>>  arch/powerpc/include/asm/iommu.h            |   26 ++
>>  arch/powerpc/kernel/iommu.c                 |  323 +++++++++++++++++++++++
>>  arch/powerpc/platforms/powernv/pci-ioda.c   |    1 +
>>  arch/powerpc/platforms/powernv/pci-p5ioc2.c |    5 +-
>>  arch/powerpc/platforms/powernv/pci.c        |    2 +
>>  arch/powerpc/platforms/pseries/iommu.c      |    4 +
>>  drivers/iommu/Kconfig                       |    8 +
>>  drivers/vfio/Kconfig                        |    6 +
>>  drivers/vfio/Makefile                       |    1 +
>>  drivers/vfio/vfio.c                         |    1 +
>>  drivers/vfio/vfio_iommu_spapr_tce.c         |  377 +++++++++++++++++++++++++++
>>  include/uapi/linux/vfio.h                   |   34 +++
>>  13 files changed, 850 insertions(+), 1 deletion(-)
>>  create mode 100644 drivers/vfio/vfio_iommu_spapr_tce.c
>>
> 
> These look ok to me, how do you want to integrate them?  Should I
> provide Acks on patches 2 & 3 and let them get pushed through the ppc
> tree or should I wait for patch 1 then push 2 & 3 through my tree?

Please ack on 2 & 3 and Ben will merge all three into his tree. Thanks!



-- 
Alexey

^ permalink raw reply

* Re: 3.9.2/3.9.3: stack overrun on s390x and ppc64 (WAS Re: 3.9.2: xfstests triggered panic)
From: CAI Qian @ 2013-05-24  3:33 UTC (permalink / raw)
  To: linux-s390, linuxppc-dev
  Cc: Dave Chinner, LKML, Steve Best, xfs, stable, Hendrik Brueckner
In-Reply-To: <1125086079.5019070.1369285040855.JavaMail.root@redhat.com>

OK, here is clearer stack output from the run.
CAI Qian

+ ./check
FSTYP         -- xfs (non-debug)
PLATFORM      -- Linux/s390x ibm-z10-23 3.9.3

001	 29s
002	 3s
003	 2s
004	 [not run] this test requires a valid $SCRATCH_DEV
005	 2s
006	 9s
007	 10s
008	 7s
009	 [not run] this test requires a valid $SCRATCH_DEV
010	 [not run] dbtest was not built for this platform
011	 9s
012	 10s
013	 35s
014	 5s
015	 [not run] this test requires a valid $SCRATCH_DEV
016	 [not run] this test requires a valid $SCRATCH_DEV
017	 [not run] this test requires a valid $SCRATCH_DEV
018	 [not run] this test requires a valid $SCRATCH_DEV
019	 [not run] this test requires a valid $SCRATCH_DEV
020	


[ 1316.571227] XFS (dm-0): Mounting Filesystem
[ 1316.697803] XFS (dm-0): Ending clean mount
[ 1318.080615] XFS (dm-0): Ending clean mount
[ 1348.791125] XFS (dm-0): Mounting Filesystem
[ 1348.989166] XFS (dm-0): Ending clean mount
[ 1353.335478] XFS (dm-0): Mounting Filesystem
[ 1353.496364] XFS (dm-0): Ending clean mount
[ 1357.495427] XFS (dm-0): Mounting Filesystem
[ 1357.676971] XFS (dm-0): Ending clean mount
[ 1361.646399] XFS (dm-0): Mounting Filesystem
[ 1361.890426] XFS (dm-0): Ending clean mount
[ 1371.798944] XFS (dm-0): Mounting Filesystem
[ 1371.976922] XFS (dm-0): Ending clean mount
[ 1384.559103] XFS (dm-0): Mounting Filesystem
[ 1384.725657] XFS (dm-0): Ending clean mount
[ 1393.131347] XFS (dm-0): Mounting Filesystem
[ 1393.357927] XFS (dm-0): Ending clean mount
[ 1407.282708] XFS (dm-0): Mounting Filesystem
[ 1407.745176] XFS (dm-0): Ending clean mount
[ 1422.927074] XFS (dm-0): Mounting Filesystem
[ 1423.136266] XFS (dm-0): Ending clean mount
[ 1425.500910] XFS (dm-0): Mounting Filesystem
[ 1425.608851] XFS (dm-0): Ending clean mount
[ 1450.978110] XFS (dm-0): Mounting Filesystem
[ 1451.255368] XFS (dm-0): Ending clean mount
[ 1453.603742] XFS (dm-0): Mounting Filesystem
[ 1453.680657] XFS (dm-0): Ending clean mount
[ 1456.262266] XFS (dm-0): Mounting Filesystem
[ 1456.330515] XFS (dm-0): Ending clean mount
[ 1457.053767] XFS (dm-0): Mounting Filesystem
[ 1457.107258] XFS (dm-0): Ending clean mount
[ 1462.049374] XFS (dm-0): Mounting Filesystem
[ 1462.111389] XFS (dm-0): Ending clean mount
[ 1471.109589] ODEBUG: deactivate not available (active state 0) object type: timer_list hint: process_timeout+0x0/0x8
[ 1471.109683] ------------[ cut here ]------------
[ 1471.109688] WARNING: at lib/debugobjects.c:260
[ 1471.109692] Modules linked in: lockd(F) sunrpc(F) nf_conntrack_netbios_ns(F) nf_conntrack_broadcast(F) ipt_MASQUERADE(F) ip6table_nat(F) nf_nat_ipv6(F) ip6table_mangle(F) ip6t_REJECT(F) nf_conntrack_ipv6(F) nf_defrag_ipv6(F) iptable_nat(F) nf_nat_ipv4(F) nf_nat(F) iptable_mangle(F) ipt_REJECT(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) xt_conntrack(F) nf_conntrack(F) ebtable_filter(F) ebtables(F) ip6table_filter(F) ip6_tables(F) iptable_filter(F) ip_tables(F) sg(F) qeth_l2(F) vmur(F) xfs(F) libcrc32c(F) dasd_fba_mod(F) dasd_eckd_mod(F) lcs(F) dasd_mod(F) ctcm(F) qeth(F) qdio(F) ccwgroup(F) fsm(F) dm_mirror(F) dm_region_hash(F) dm_log(F) dm_mod(F)
[ 1471.109848] CPU: 0 Tainted: GF            3.9.3 #2
[ 1471.109858] Process swapper/0 (pid: 0, task: 0000000000a2b4d0, ksp: 0000000000a17d28)
[ 1471.109868] Krnl PSW : 0404c00180000000 000000000046c84a (debug_print_object+0xca/0xd8)
[ 1471.114762]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
Krnl GPRS: 0000000000000000 0000000000a2b4d0 0000000000000067 000000000101f708
[ 1471.114769]            000000000046c846 0000000084a4d448 000000000086936a 0000000001040700
[ 1471.114773]            0000000001a0f290 0400000000000001 0000000000874cf8 0000000000a395d8
[ 1471.114777]            000000000195f820 000000000001bd20 000000000046c846 000000000001bc20
[ 1471.114792] Krnl Code: 000000000046c83a: e34410000004        lg      %r4,0(%r4,%r1)
           000000000046c840: c0e500139f88       brasl   %r14,6e0750
          #000000000046c846: a7f40001           brc     15,46c848
          >000000000046c84a: a7f4ffc2           brc     15,46c7ce
           000000000046c84e: a7290000           lghi    %r2,0
           000000000046c852: a7f4ffd7           brc     15,46c800
           000000000046c856: 0707               bcr     0,%r7
           000000000046c858: ebaff0680024       stmg    %r10,%r15,104(%r15)
[ 1471.114825] Call Trace:
[ 1471.114828] ([<000000000046c846>] debug_print_object+0xc6/0xd8)
[ 1471.114833]  [<000000000046d35c>] debug_object_deactivate+0x15c/0x160
[ 1471.114838]  [<0000000000148244>] run_timer_softirq+0x180/0x464
[ 1471.114843]  [<000000000013d8d6>] __do_softirq+0x112/0x42c
[ 1471.114847]  [<000000000013ddf8>] irq_exit+0xc8/0xe8
[ 1471.114851]  [<000000000010d55e>] do_extint+0x25e/0x318
[ 1471.114859]  [<00000000006f0d90>] ext_skip+0x40/0x44
[ 1471.114866]  [<00000000006f05d6>] vtime_stop_cpu+0x52/0xbc
[ 1471.114870] ([<00000000006f05b4>] vtime_stop_cpu+0x30/0xbc)
[ 1471.114874]  [<000000000010476e>] cpu_idle+0x112/0x1b8
[ 1471.114879]  [<0000000000aaf99a>] start_kernel+0x42e/0x43c
[ 1471.114885]  [<0000000000100020>] _stext+0x20/0x80
[ 1471.114891] 2 locks held by swapper/0/0:
[ 1471.114894]  #0:  (&(&base->lock)->rlock){-.-.-.}, at: [<0000000000148128>] run_timer_softirq+0x64/0x464
[ 1471.114908]  #1:  (&obj_hash[i].lock){-.-.-.}, at: [<000000000046d2ba>] debug_object_deactivate+0xba/0x160
[ 1471.114918] Last Breaking-Event-Address:
[ 1471.114921]  [<000000000046c846>] debug_print_object+0xc6/0xd8
[ 1471.114927] ---[ end trace dd87895f75677361 ]---
[ 1471.117683] ODEBUG: object is on stack, but not annotated
[ 1471.117723] ------------[ cut here ]------------
[ 1471.117726] WARNING: at lib/debugobjects.c:300
[ 1471.117729] Modules linked in: lockd(F) sunrpc(F) nf_conntrack_netbios_ns(F) nf_conntrack_broadcast(F) ipt_MASQUERADE(F) ip6table_nat(F) nf_nat_ipv6(F) ip6table_mangle(F) ip6t_REJECT(F) nf_conntrack_ipv6(F) nf_defrag_ipv6(F) iptable_nat(F) nf_nat_ipv4(F) nf_nat(F) iptable_mangle(F) ipt_REJECT(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) xt_conntrack(F) nf_conntrack(F) ebtable_filter(F) ebtables(F) ip6table_filter(F) ip6_tables(F) iptable_filter(F) ip_tables(F) sg(F) qeth_l2(F) vmur(F) xfs(F) libcrc32c(F) dasd_fba_mod(F) dasd_eckd_mod(F) lcs(F) dasd_mod(F) ctcm(F) qeth(F) qdio(F) ccwgroup(F) fsm(F) dm_mirror(F) dm_region_hash(F) dm_log(F) dm_mod(F)
[ 1471.117791] CPU: 0 Tainted: GF       W    3.9.3 #2
[ 1471.117794] Process rcu_sched (pid: 10, task: 00000000fffe8000, ksp: 00000000fffe7ae0)
[ 1471.117797] Krnl PSW : 0404c00180000000 000000000046ccce (__debug_object_init+0x35a/0x480)
[ 1471.117804]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
Krnl GPRS: 0000000000000000 00000000fffe8000 000000000000002d 000000000101f708
[ 1471.117811]            000000000046ccca 0000000000000001 0000000000a395d8 0700000000000000
[ 1471.117814]            0000000000a78750 0000000001a0f298 00000000fffe7cf0 00000000f8daa938
[ 1471.117818]            0000000001a0f290 00000000007434c0 000000000046ccca 0000000fffe7a40
[ 1471.117852] Krnl Code: 000000000046ccbe: c02000217617        larl    %r2,89b8ec
           000000000046ccc4: c0e500139d46       brasl   %r14,6e0750
          #000000000046ccca: a7f40001           brc     15,46cccc
          >000000000046ccce: a7f4ff03           brc     15,46cad4
           000000000046ccd2: a7380000           lhi     %r3,0
           000000000046ccd6: a7f4ffe9           brc     15,46cca8
           000000000046ccda: c02000217622       larl    %r2,89b91e
           000000000046cce0: c0e500139d38       brasl   %r14,6e0750
[ 1471.117884] Call Trace:
[ 1471.117887] ([<000000000046ccca>] __debug_object_init+0x356/0x480)
[ 1471.117891]  [<0000000000146f42>] init_timer_key+0x3a/0x164
[ 1471.117897]  [<0000000000147eac>] timer_fixup_assert_init+0x50/0xa8
[ 1471.117901]  [<000000000046d7a4>] debug_object_assert_init+0x124/0x154
[ 1471.117906]  [<0000000000148676>] try_to_del_timer_sync+0x2e/0x94
[ 1471.117910]  [<000000000014878e>] del_timer_sync+0xb2/0x104
[ 1471.117915]  [<00000000006ea950>] schedule_timeout+0x190/0x37c
[ 1471.117922]  [<00000000001f1118>] rcu_gp_kthread+0x320/0x59c
[ 1471.117927]  [<0000000000164176>] kthread+0xe6/0xec
[ 1471.117934]  [<00000000006f0906>] kernel_thread_starter+0x6/0xc
[ 1471.117938]  [<00000000006f0900>] kernel_thread_starter+0x0/0xc
[ 1471.117943] 1 lock held by rcu_sched/10:
[ 1471.117945]  #0:  (&obj_hash[i].lock){-.-.-.}, at: [<000000000046caba>] __debug_object_init+0x146/0x480
[ 1471.117955] Last Breaking-Event-Address:
[ 1471.117958]  [<000000000046ccca>] __debug_object_init+0x356/0x480
[ 1471.117962] ---[ end trace dd87895f75677362 ]---
[ 1471.117967] ODEBUG: assert_init not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x8
[ 1471.117984] ------------[ cut here ]------------
[ 1471.117987] WARNING: at lib/debugobjects.c:260
[ 1471.117990] Modules linked in: lockd(F) sunrpc(F) nf_conntrack_netbios_ns(F) nf_conntrack_broadcast(F) ipt_MASQUERADE(F) ip6table_nat(F) nf_nat_ipv6(F) ip6table_mangle(F) ip6t_REJECT(F) nf_conntrack_ipv6(F) nf_defrag_ipv6(F) iptable_nat(F) nf_nat_ipv4(F) nf_nat(F) iptable_mangle(F) ipt_REJECT(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) xt_conntrack(F) nf_conntrack(F) ebtable_filter(F) ebtables(F) ip6table_filter(F) ip6_tables(F) iptable_filter(F) ip_tables(F) sg(F) qeth_l2(F) vmur(F) xfs(F) libcrc32c(F) dasd_fba_mod(F) dasd_eckd_mod(F) lcs(F) dasd_mod(F) ctcm(F) qeth(F) qdio(F) ccwgroup(F) fsm(F) dm_mirror(F) dm_region_hash(F) dm_log(F) dm_mod(F)
[ 1471.118053] CPU: 0 Tainted: GF       W    3.9.3 #2
[ 1471.118075] Process rcu_sched (pid: 10, task: 00000000fffe8000, ksp: 00000000fffe7ae0)
[ 1471.118079] Krnl PSW : 0704e00180000000 000000000046c84a (debug_print_object+0xca/0xd8)
[ 1471.118085]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 EA:3
Krnl GPRS: 0000000000000000 00000000fffe8000 0000000000000063 0000000000000001
[ 1471.118092]            000000000046c846 0000000000000001 000000000086936a 0000000000a1d300
[ 1471.118095]            0000000001a0f290 07000000006ed3fc 000000000089ba14 0000000000a395d8
[ 1471.118099]            000000000195f820 00000000fffe7bc8 000000000046c846 00000000fffe7ac8
[ 1471.118109] Krnl Code: 000000000046c83a: e34410000004        lg      %r4,0(%r4,%r1)
           000000000046c840: c0e500139f88       brasl   %r14,6e0750
          #000000000046c846: a7f40001           brc     15,46c848
          >000000000046c84a: a7f4ffc2           brc     15,46c7ce
           000000000046c84e: a7290000           lghi    %r2,0
           000000000046c852: a7f4ffd7           brc     15,46c800
           000000000046c856: 0707               bcr     0,%r7
           000000000046c858: ebaff0680024       stmg    %r10,%r15,104(%r15)
[ 1471.118141] Call Trace:
[ 1471.118144] ([<000000000046c846>] debug_print_object+0xc6/0xd8)
[ 1471.118149]  [<000000000046d7ce>] debug_object_assert_init+0x14e/0x154
[ 1471.118153]  [<0000000000148676>] try_to_del_timer_sync+0x2e/0x94
[ 1471.118158]  [<000000000014878e>] del_timer_sync+0xb2/0x104
[ 1471.118163]  [<00000000006ea950>] schedule_timeout+0x190/0x37c
[ 1471.118167]  [<00000000001f1118>] rcu_gp_kthread+0x320/0x59c
[ 1471.118172]  [<0000000000164176>] kthread+0xe6/0xec
[ 1471.118176]  [<00000000006f0906>] kernel_thread_starter+0x6/0xc
[ 1471.118180]  [<00000000006f0900>] kernel_thread_starter+0x0/0xc
[ 1471.118185] no locks held by rcu_sched/10.
[ 1471.118188] Last Breaking-Event-Address:
[ 1471.118190]  [<000000000046c846>] debug_print_object+0xc6/0xd8
[ 1471.118195] ---[ end trace dd87895f75677363 ]---
[ 1471.121980] Unable to handle kernel pointer dereference at virtual kernel address 000003ff8072c000
[ 1471.123714] Oops: 0011 [#1] SMP
[ 1471.123720] Modules linked in: lockd(F) sunrpc(F) nf_conntrack_netbios_ns(F) nf_conntrack_broadcast(F) ipt_MASQUERADE(F) ip6table_nat(F) nf_nat_ipv6(F) ip6table_mangle(F) ip6t_REJECT(F) nf_conntrack_ipv6(F) nf_defrag_ipv6(F) iptable_nat(F) nf_nat_ipv4(F) nf_nat(F) iptable_mangle(F) ipt_REJECT(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) xt_conntrack(F) nf_conntrack(F) ebtable_filter(F) ebtables(F) ip6table_filter(F) ip6_tables(F) iptable_filter(F) ip_tables(F) sg(F) qeth_l2(F) vmur(F) xfs(F) libcrc32c(F) dasd_fba_mod(F) dasd_eckd_mod(F) lcs(F) dasd_mod(F) ctcm(F) qeth(F) qdio(F) ccwgroup(F) fsm(F) dm_mirror(F) dm_region_hash(F) dm_log(F) dm_mod(F)
[ 1471.123777] CPU: 0 Tainted: GF       W    3.9.3 #2
[ 1471.123781] Process systemd-journal (pid: 516, task: 00000000f5ad8000, ksp: 00000000f544fbd0)
[ 1471.123784] Krnl PSW : 0704e00180000000 000003ff80749cc2 (xfs_vm_page_mkwrite+0x12/0x1c [xfs])
[ 1471.154773]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 EA:3
Krnl GPRS: 00000000000015e8 000003ff8072cc30 0000000000000001 0000000000000000
[ 1471.154781]            00000000006ef176 00000000007036b8 00000000f97b7d98 000003fffb3e7000
[ 1471.154784]            000003fffb3e7000 00000000f8f65a00 00000000f8472738 00000000f8ffc2c0
[ 1471.154788]            00000000f8472738 000000000070d5e0 000003ff80749cb6 00000000f544fc28
[ 1471.154802] Krnl Code: 000003ff80749cb6: 1222                ltr     %r2,%r2
           000003ff80749cb8: a784ffd8           brc     8,3ff80749c68
          #000003ff80749cbc: c010ffff17ba       larl    %r1,3ff8072cc30
          >000003ff80749cc2: e31010000004       lg      %r1,0(%r1)
           000003ff80749cc8: e32010000012       lt      %r2,0(%r1)
           000003ff80749cce: a7740032           brc     7,3ff80749d32
           000003ff80749cd2: e31003180004       lg      %r1,792
           000003ff80749cd8: e330101c0012       lt      %r3,28(%r1)
[ 1471.154838] Call Trace:
[ 1471.154841] ([<0000000000261870>] do_wp_page+0x6f8/0xa4c)
[ 1471.154851]  [<0000000000262f2c>] handle_pte_fault+0x65c/0xa10
[ 1471.154856]  [<0000000000264a22>] handle_mm_fault+0x182/0x260
[ 1471.154860]  [<00000000006f21d0>] do_protection_exception+0x1b8/0x3cc
[ 1471.154867]  [<00000000006f0a44>] pgm_check_handler+0x138/0x13c
[ 1471.154871]  [<000000008001d60a>] 0x8001d60a
[ 1471.154894] INFO: lockdep is turned off.
[ 1471.154897] Last Breaking-Event-Address:
[ 1471.154899]  [<00000000001ed7ca>] rcu_lockdep_current_cpu_online+0x6e/0x84
[ 1471.154909]
[ 1471.154913] ---[ end trace dd87895f75677364 ]---
[ 1471.157438] ------------[ cut here ]------------
[ 1471.157446] Kernel BUG at 00000000006eb186 [verbose debug info unavailable]
[ 1471.157508] addressing exception: 0005 [#2] SMP
[ 1471.157514] Modules linked in: lockd(F) sunrpc(F) nf_conntrack_netbios_ns(F) nf_conntrack_broadcast(F) ipt_MASQUERADE(F) ip6table_nat(F) nf_nat_ipv6(F) ip6table_mangle(F) ip6t_REJECT(F) nf_conntrack_ipv6(F) nf_defrag_ipv6(F) iptable_nat(F) nf_nat_ipv4(F) nf_nat(F) iptable_mangle(F) ipt_REJECT(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) xt_conntrack(F) nf_conntrack(F) ebtable_filter(F) ebtables(F) ip6table_filter(F) ip6_tables(F) iptable_filter(F) ip_tables(F) sg(F) qeth_l2(F) vmur(F) xfs(F) libcrc32c(F) dasd_fba_mod(F) dasd_eckd_mod(F) lcs(F) dasd_mod(F) ctcm(F) qeth(F) qdio(F) ccwgroup(F) fsm(F) dm_mirror(F) dm_region_hash(F) dm_log(F) dm_mod(F)
[ 1471.157574] CPU: 0 Tainted: GF     D W    3.9.3 #2
[ 1471.157577] Process systemd-journal (pid: 516, task: 00000000f5ad8000, ksp: 00000000f544f578)
[ 1471.157580] Krnl PSW : 0404d00180000000 00000000006eb186 (mutex_lock_nested+0xde/0x36c)
[ 1471.157591]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3
Krnl GPRS: 000000000000166e 0000000000000000 00000000ffffffff 0000000000a78630
[ 1471.157597]            0000000000000000 0000000000000000 0000000000000002 00000000100073f8
[ 1471.157601]            00000000f5ad8000 000000000101f708 0000000000000000 0700000000000000
[ 1471.157604]            0000000010007388 0000000010007380 00000000006eb170 00000000f544f658
[ 1471.157616] Krnl Code: 00000000006eb178: a7180000            lhi     %r1,0
           00000000006eb17c: c027ffffffff       xilf    %r2,4294967295
          #00000000006eb182: ba12d008           cs      %r1,%r2,8(%r13)
          >00000000006eb186: 1211               ltr     %r1,%r1
           00000000006eb188: a774011b           brc     7,6eb3be
           00000000006eb18c: e31090000012       lt      %r1,0(%r9)
           00000000006eb192: a7740007           brc     7,6eb1a0
           00000000006eb196: e3d0d0700020       cg      %r13,112(%r13)
[ 1471.157650] Call Trace:
[ 1471.157652] ([<00000000006eb170>] mutex_lock_nested+0xc8/0x36c)
[ 1471.157657]  [<0000000000260e3e>] unmap_single_vma+0x786/0x7d0
[ 1471.157663]  [<000000000026219c>] unmap_vmas+0x50/0x74
[ 1471.157666]  [<000000000026b7aa>] exit_mmap+0x186/0x224
[ 1471.157671]  [<000000000012ef88>] mmput+0x84/0x110
[ 1471.157677]  [<0000000000139fd2>] do_exit+0x2c6/0xce8
[ 1471.157682]  [<0000000000100ef2>] die+0x13e/0x158
[ 1471.157687]  [<000000000011dd6e>] do_no_context+0xba/0xf0
[ 1471.157692]  [<00000000006f2714>] do_dat_exception+0x330/0x390
[ 1471.157697]  [<00000000006f0a44>] pgm_check_handler+0x138/0x13c
[ 1471.157701]  [<000003ff80749cc2>] xfs_vm_page_mkwrite+0x12/0x1c [xfs]
[ 1471.157800] ([<0000000000261870>] do_wp_page+0x6f8/0xa4c)
[ 1471.157805]  [<0000000000262f2c>] handle_pte_fault+0x65c/0xa10
[ 1471.157809]  [<0000000000264a22>] handle_mm_fault+0x182/0x260
[ 1471.157813]  [<00000000006f21d0>] do_protection_exception+0x1b8/0x3cc
[ 1471.157817]  [<00000000006f0a44>] pgm_check_handler+0x138/0x13c
[ 1471.157821]  [<000000008001d60a>] 0x8001d60a
[ 1471.157827] INFO: lockdep is turned off.
[ 1471.157829] Last Breaking-Event-Address:
[ 1471.157832]  [<00000000001af490>] trace_hardirqs_off_caller+0x7c/0xb0
[ 1471.157840]
[ 1471.157857] ---[ end trace dd87895f75677365 ]---
[ 1471.157863] Fixing recursive fault but reboot is needed!
[ 1471.157867] BUG: scheduling while atomic: systemd-journal/516/0x00000002
[ 1471.157870] INFO: lockdep is turned off.
[ 1471.157872] Modules linked in: lockd(F) sunrpc(F) nf_conntrack_netbios_ns(F) nf_conntrack_broadcast(F) ipt_MASQUERADE(F) ip6table_nat(F) nf_nat_ipv6(F) ip6table_mangle(F) ip6t_REJECT(F) nf_conntrack_ipv6(F) nf_defrag_ipv6(F) iptable_nat(F) nf_nat_ipv4(F) nf_nat(F) iptable_mangle(F) ipt_REJECT(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) xt_conntrack(F) nf_conntrack(F) ebtable_filter(F) ebtables(F) ip6table_filter(F) ip6_tables(F) iptable_filter(F) ip_tables(F) sg(F) qeth_l2(F) vmur(F) xfs(F) libcrc32c(F) dasd_fba_mod(F) dasd_eckd_mod(F) lcs(F) dasd_mod(F) ctcm(F) qeth(F) qdio(F) ccwgroup(F) fsm(F) dm_mirror(F) dm_region_hash(F) dm_log(F) dm_mod(F)
[ 1471.157928] CPU: 0 Tainted: GF     D W    3.9.3 #2
[ 1471.157931] Process systemd-journal (pid: 516, task: 00000000f5ad8000, ksp: 00000000f544f578)
[ 1471.157934]        00000000f544f168 00000000f544f178 0000000000000002 0000000000000000
       00000000f544f208 00000000f544f180 00000000f544f180 0000000000100a1e
       0000000000000000 0000000000ffb000 000000000000000a 000000000000000a
       00000000f544f1c8 00000000f544f168 0000000000000000 0000000000000000
       00000000f5ad8000 0000000000100a1e 00000000f544f168 00000000f544f1b8
[ 1471.157966] Call Trace:
[ 1471.157968] ([<0000000000100920>] show_trace+0xf0/0x148)
[ 1471.157973]  [<00000000006e0e88>] __schedule_bug+0x80/0x98
[ 1471.157977]  [<00000000006ed9ea>] __schedule+0xc6a/0xcc8
[ 1471.157981]  [<000000000013a99e>] do_exit+0xc92/0xce8
[ 1471.157985]  [<0000000000100ef2>] die+0x13e/0x158
[ 1471.157990]  [<00000000006f02ac>] do_per_trap+0x0/0xb4
[ 1471.157993]  [<00000000006f0a44>] pgm_check_handler+0x138/0x13c
[ 1471.157997]  [<00000000006eb186>] mutex_lock_nested+0xde/0x36c
[ 1471.158002] ([<00000000006eb170>] mutex_lock_nested+0xc8/0x36c)
[ 1471.158006]  [<0000000000260e3e>] unmap_single_vma+0x786/0x7d0
[ 1471.158010]  [<000000000026219c>] unmap_vmas+0x50/0x74
[ 1471.158014]  [<000000000026b7aa>] exit_mmap+0x186/0x224
[ 1471.158018]  [<000000000012ef88>] mmput+0x84/0x110
[ 1471.158021]  [<0000000000139fd2>] do_exit+0x2c6/0xce8
[ 1471.158025]  [<0000000000100ef2>] die+0x13e/0x158
[ 1471.158029]  [<000000000011dd6e>] do_no_context+0xba/0xf0
[ 1471.158033]  [<00000000006f2714>] do_dat_exception+0x330/0x390
[ 1471.158037]  [<00000000006f0a44>] pgm_check_handler+0x138/0x13c
[ 1471.158041]  [<000003ff80749cc2>] xfs_vm_page_mkwrite+0x12/0x1c [xfs]
[ 1471.158070] ([<0000000000261870>] do_wp_page+0x6f8/0xa4c)
[ 1471.158074]  [<0000000000262f2c>] handle_pte_fault+0x65c/0xa10
[ 1471.158078]  [<0000000000264a22>] handle_mm_fault+0x182/0x260
[ 1471.158082]  [<00000000006f21d0>] do_protection_exception+0x1b8/0x3cc
[ 1471.158086]  [<00000000006f0a44>] pgm_check_handler+0x138/0x13c
[ 1471.158090]  [<000000008001d60a>] 0x8001d60a
[ 1471.158094] INFO: lockdep is turned off.
[ 1471.158221] Unable to handle kernel paging request at virtual user address         (null)
[ 1471.158247] Oops: 0004 [#3] SMP
[ 1471.158251] Modules linked in: lockd(F) sunrpc(F) nf_conntrack_netbios_ns(F) nf_conntrack_broadcast(F) ipt_MASQUERADE(F) ip6table_nat(F) nf_nat_ipv6(F) ip6table_mangle(F) ip6t_REJECT(F) nf_conntrack_ipv6(F) nf_defrag_ipv6(F) iptable_nat(F) nf_nat_ipv4(F) nf_nat(F) iptable_mangle(F) ipt_REJECT(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) xt_conntrack(F) nf_conntrack(F) ebtable_filter(F) ebtables(F) ip6table_filter(F) ip6_tables(F) iptable_filter(F) ip_tables(F) sg(F) qeth_l2(F) vmur(F) xfs(F) libcrc32c(F) dasd_fba_mod(F) dasd_eckd_mod(F) lcs(F) dasd_mod(F) ctcm(F) qeth(F) qdio(F) ccwgroup(F) fsm(F) dm_mirror(F) dm_region_hash(F) dm_log(F) dm_mod(F)
[ 1471.158306] CPU: 0 Tainted: GF     D W    3.9.3 #2
[ 1471.158309] Process in:imklog (pid: 765, task: 00000000fc7f2430, ksp: 00000000fc36fd50)
[ 1471.158312] Krnl PSW : 0404e00180000000 0000000000183520 (tg_load_down+0x60/0xb0)
[ 1471.158320]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 EA:3
Krnl GPRS: 0000000000000040 0000000000000000 0000000000000000 0000000000000008
[ 1471.158327]            0000000000000000 0000000000000000 00000000f9253400 0000000000aa0854
[ 1471.158330]            0000000000aa0852 00000000001834c0 0000000000170edc 0000000000000000
[ 1471.158333]            00000000f9187e00 00000000fcd5c800 000000000017a330 00000000fc36fb48
[ 1471.158343] Krnl Code: 0000000000183512: a7290000            lghi    %r2,0
           0000000000183516: e31310000004       lg      %r1,0(%r3,%r1)
          #000000000018351c: b9040045           lgr     %r4,%r5
          >0000000000183520: e34010980024       stg     %r4,152(%r1)
           0000000000183526: ebbcf0700004       lmg     %r11,%r12,112(%r15)
           000000000018352c: 07fe               bcr     15,%r14
           000000000018352e: eb430003000d       sllg    %r4,%r3,3
           0000000000183534: c05000451512       larl    %r5,a25f58
[ 1471.158377] Call Trace:
[ 1471.158380] ([<000000000018bd96>] load_balance+0x236/0xc7c)
[ 1471.158385]  [<000000000018be16>] load_balance+0x2b6/0xc7c
[ 1471.158389]  [<000000000018d0a2>] idle_balance+0x212/0x494
[ 1471.158393]  [<00000000006ed9ae>] __schedule+0xc2e/0xcc8
[ 1471.158397]  [<00000000006f0c30>] io_reschedule+0xa/0x1a
[ 1471.158402]  [<000003fffd3bea22>] 0x3fffd3bea22
[ 1471.158406] INFO: lockdep is turned off.
[ 1471.158408] Last Breaking-Event-Address:
[ 1471.158411]  [<000000000017a32e>] walk_tg_tree_from+0x46/0x190
[ 1471.158416]
[ 1471.158418] ---[ end trace dd87895f75677366 ]---
[ 1471.158423] BUG: sleeping function called from invalid context at kernel/rwsem.c:20
[ 1471.158426] in_atomic(): 1, irqs_disabled(): 0, pid: 765, name: in:imklog
[ 1471.158429] INFO: lockdep is turned off.
[ 1471.158432] CPU: 0 Tainted: GF     D W    3.9.3 #2
[ 1471.158435] Process in:imklog (pid: 765, task: 00000000fc7f2430, ksp: 00000000fc36fd50)
[ 1471.158438]        00000000fc36f690 00000000fc36f6a0 0000000000000002 0000000000000000
       00000000fc36f730 00000000fc36f6a8 00000000fc36f6a8 0000000000100a1e
       0000000000000000 00000000fc7f2870 000000000000000a 0404e0010000000a
       00000000fc36f6f0 00000000fc36f690 0000000000000000 0000000000000000
       00000000006fdbc8 0000000000100a1e 00000000fc36f690 00000000fc36f6e0
[ 1471.158470] Call Trace:
[ 1471.158473] ([<0000000000100920>] show_trace+0xf0/0x148)
[ 1471.158477]  [<0000000000175c0a>] __might_sleep+0x17a/0x238
[ 1471.158482]  [<00000000006ec4de>] down_read+0x42/0xe0
[ 1471.158487]  [<000000000014fea0>] exit_signals+0x38/0x144
[ 1471.158492]  [<0000000000139dd6>] do_exit+0xca/0xce8
[ 1471.158496]  [<0000000000100ef2>] die+0x13e/0x158
[ 1471.158500]  [<000000000011dd6e>] do_no_context+0xba/0xf0
[ 1471.158504]  [<00000000006f23de>] do_protection_exception+0x3c6/0x3cc
[ 1471.158508]  [<00000000006f0a44>] pgm_check_handler+0x138/0x13c
[ 1471.158513]  [<0000000000183520>] tg_load_down+0x60/0xb0
[ 1471.158517] ([<000000000018bd96>] load_balance+0x236/0xc7c)
[ 1471.158521]  [<000000000018be16>] load_balance+0x2b6/0xc7c
[ 1471.158525]  [<000000000018d0a2>] idle_balance+0x212/0x494
[ 1471.158529]  [<00000000006ed9ae>] __schedule+0xc2e/0xcc8
[ 1471.158533]  [<00000000006f0c30>] io_reschedule+0xa/0x1a
[ 1471.158537]  [<000003fffd3bea22>] 0x3fffd3bea22
[ 1471.158541] INFO: lockdep is turned off.
[ 1471.158545] note: in:imklog[765] exited with preempt_count 3
[ 1573.704911] Unable to handle kernel pointer dereference at virtual kernel address 00c040077ff00000
[ 1573.705014] Oops: 0038 [#4] SMP
[ 1573.705029] Modules linked in: lockd(F) sunrpc(F) nf_conntrack_netbios_ns(F) nf_conntrack_broadcast(F) ipt_MASQUERADE(F) ip6table_nat(F) nf_nat_ipv6(F) ip6table_mangle(F) ip6t_REJECT(F) nf_conntrack_ipv6(F) nf_defrag_ipv6(F) iptable_nat(F) nf_nat_ipv4(F) nf_nat(F) iptable_mangle(F) ipt_REJECT(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) xt_conntrack(F) nf_conntrack(F) ebtable_filter(F) ebtables(F) ip6table_filter(F) ip6_tables(F) iptable_filter(F) ip_tables(F) sg(F) qeth_l2(F) vmur(F) xfs(F) libcrc32c(F) dasd_fba_mod(F) dasd_eckd_mod(F) lcs(F) dasd_mod(F) ctcm(F) qeth(F) qdio(F) ccwgroup(F) fsm(F) dm_mirror(F) dm_region_hash(F) dm_log(F) dm_mod(F)
00: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 01.
[ 1573.705173] CPU: 1 Tainted: GF     D W    3.9.3 #2
[ 1573.705177] Process attr (pid: 22262, task: 00000000b1190000, ksp: 00000000b0f43eb0)
[ 1573.705180] Krnl PSW : 0404e00180000000 000000000046c738 (lookup_object+0x20/0x68)
[ 1573.705193]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 EA:3
Krnl GPRS: 7440000000000000 00c040077ff000e8 00000000fadd1eb8 0000000000000003
[ 1573.705199]            000000000046d2ba 0000000000000000 0000000084c1a828 0000000084c1a868
[ 1573.705203]            0000000001964250 04000000001b5d64 0000000000a3bdf8 0000000001964258
[ 1573.705206]            00000000fadd1eb8 00000000007434f0 000000000046d2cc 00000000feb0b488
[ 1573.705221] Krnl Code: 000000000046c72c: a7380001            lhi     %r3,1
           000000000046c730: a7f40009           brc     15,46c742
          #000000000046c734: a73a0001           ahi     %r3,1
          >000000000046c738: e32010180020       cg      %r2,24(%r1)
           000000000046c73e: a7840016           brc     8,46c76a
           000000000046c742: e31010000002       ltg     %r1,0(%r1)
           000000000046c748: a774fff6           brc     7,46c734
           000000000046c74c: c010002dcfc0       larl    %r1,a266cc
[ 1573.705254] Call Trace:
[ 1573.705257] ([<000000000046d2ba>] debug_object_deactivate+0xba/0x160)
[ 1573.705263]  [<0000000000168890>] __run_hrtimer+0x58/0x474
[ 1573.705269]  [<0000000000169af2>] hrtimer_interrupt+0x116/0x2b0
[ 1573.705273]  [<0000000000104036>] clock_comparator_work+0x4a/0x54
[ 1573.705279]  [<000000000010d5b4>] do_extint+0x2b4/0x318
[ 1573.705285]  [<00000000006f0d90>] ext_skip+0x40/0x44
[ 1573.705292]  [<000003ff80108ba4>] ipt_do_table+0x298/0x8d0 [ip_tables]
[ 1573.705300] ([<000003ff801089d2>] ipt_do_table+0xc6/0x8d0 [ip_tables])
[ 1573.705305]  [<000003ff80165318>] nf_nat_ipv4_fn+0x1b4/0x354 [iptable_nat]
[ 1573.705310]  [<000003ff80165598>] nf_nat_ipv4_in+0x38/0x98 [iptable_nat]
[ 1573.705314]  [<00000000005d047c>] nf_iterate+0xd4/0x1b0
[ 1573.705903]  [<00000000005d0680>] nf_hook_slow+0x128/0x2cc
[ 1573.705909]  [<00000000005de956>] ip_rcv+0x272/0x360
[ 1573.705914]  [<0000000000592af6>] __netif_receive_skb_core+0xac2/0xe70
[ 1573.705919]  [<00000000005957e4>] netif_receive_skb+0x48/0x260
[ 1573.705924]  [<000003ff80095d16>] qeth_l2_poll+0x30a/0x540 [qeth_l2]
[ 1573.705938]  [<0000000000595f58>] net_rx_action+0x158/0x340
[ 1573.705942]  [<000000000013d8d6>] __do_softirq+0x112/0x42c
[ 1573.705947]  [<000000000013ddf8>] irq_exit+0xc8/0xe8
[ 1573.705951]  [<000000000010d55e>] do_extint+0x25e/0x318
[ 1573.705956]  [<00000000006f0d90>] ext_skip+0x40/0x44
[ 1573.705961]  [<000000000045a634>] memmove+0x14/0x5c
[ 1573.705968] ([<000003ff807d7a10>] __func__.41994+0x10/0xffffffffffffb600 [xfs])
[ 1573.706104]  [<000003ff8076e6d2>] xfs_attr_leaf_compact+0xee/0x24c [xfs]
[ 1573.706224]  [<000003ff80772036>] xfs_attr_leaf_add+0x10a/0x268 [xfs]
[ 1573.706272]  [<000003ff8076a814>] xfs_attr_leaf_addname+0x11c/0x5a4 [xfs]
[ 1573.706303]  [<000003ff8076b028>] xfs_attr_set_int+0x38c/0x4b8 [xfs]
[ 1573.706333]  [<000003ff8076c9aa>] xfs_attr_set+0xb2/0xb8 [xfs]
[ 1573.706363]  [<000003ff8075d17e>] xfs_xattr_set+0x66/0xa0 [xfs]
[ 1573.706393]  [<00000000002d0a36>] generic_setxattr+0x7e/0x98
[ 1573.706400]  [<00000000002d1430>] __vfs_setxattr_noperm+0x78/0x1dc
[ 1573.706404]  [<00000000002d1652>] vfs_setxattr+0xbe/0xc4
[ 1573.706408]  [<00000000002d17b8>] setxattr+0x160/0x1c4
[ 1573.706412]  [<00000000002d1ba0>] SyS_lsetxattr+0xac/0xe8
[ 1573.706416]  [<00000000006f08b4>] sysc_tracego+0x14/0x1a
[ 1573.706420]  [<000003fffd2c2d1a>] 0x3fffd2c2d1a
[ 1573.706427] INFO: lockdep is turned off.
[ 1573.706430] Last Breaking-Event-Address:
[ 1573.706433]  [<000000000046c748>] lookup_object+0x30/0x68
[ 1573.706439]
[ 1573.706443] Kernel panic - not syncing: Fatal exception in interrupt
01: HCPGIR450W CP entered; disabled wait PSW 00020001 80000000 00000000 0010F384

----- Original Message -----
> From: "CAI Qian" <caiqian@redhat.com>
> To: "linux-s390" <linux-s390@vger.kernel.org>, linuxppc-dev@lists.ozlabs.org
> Cc: "LKML" <linux-kernel@vger.kernel.org>, stable@vger.kernel.org, xfs@oss.sgi.com, "Steve Best" <sbest@redhat.com>,
> "Hendrik Brueckner" <bhendrik@redhat.com>, "Dave Chinner" <david@fromorbit.com>
> Sent: Thursday, May 23, 2013 12:57:20 PM
> Subject: 3.9.2/3.9.3: stack overrun on s390x and ppc64 (WAS Re: 3.9.2: xfstests triggered panic)
>
> Original report:
> http://oss.sgi.com/archives/xfs/2013-05/msg00683.html
>
> Also seen on Power7:
> http://marc.info/?l=linux-kernel&m=136927904900692&w=2
>
> CAI Qian
>
> ----- Original Message -----
> > From: "Dave Chinner" <david@fromorbit.com>
> > To: "CAI Qian" <caiqian@redhat.com>
> > Cc: "LKML" <linux-kernel@vger.kernel.org>, stable@vger.kernel.org,
> > xfs@oss.sgi.com
> > Sent: Thursday, May 23, 2013 11:46:11 AM
> > Subject: Re: 3.9.2: xfstests triggered panic
> >
> > On Wed, May 22, 2013 at 11:16:56PM -0400, CAI Qian wrote:
> > > ----- Original Message -----
> > > > From: "Dave Chinner" <david@fromorbit.com>
> > > > To: "CAI Qian" <caiqian@redhat.com>
> > > > Cc: "LKML" <linux-kernel@vger.kernel.org>, stable@vger.kernel.org,
> > > > xfs@oss.sgi.com
> > > > Sent: Wednesday, May 22, 2013 5:53:00 PM
> > > > Subject: Re: 3.9.2: xfstests triggered panic
> > > >
> > > > On Wed, May 22, 2013 at 04:39:58AM -0400, CAI Qian wrote:
> > > > > Reproduced on almost all s390x guests by running xfstests.
> > > > >
> > > > > [14634.396658] XFS (dm-1): Mounting Filesystem
> > > > > [14634.525522] XFS (dm-1): Ending clean mount
> > > > > [14640.413007]  [<000000000017c6d4>] idle_balance+0x1a0/0x340
> > > > > [14640.413010]  [<000000000063303e>] __schedule+0xa22/0xaf0
> > > > > [14640.428279]  [<0000000000630da6>] schedule_timeout+0x186/0x2c0
> > > > > [14640.428289]  [<00000000001cf864>] rcu_gp_kthread+0x1bc/0x298
> > > > > [14640.428300]  [<0000000000158c5a>] kthread+0xe6/0xec
> > > > > [14640.428304]  [<0000000000634de6>] kernel_thread_starter+0x6/0xc
> > > > > [14640.428308]  [<0000000000634de0>] kernel_thread_starter+0x0/0xc
> > > > > [14640.428311] Last Breaking-Event-Address:
> > > > > [14640.428314]  [<000000000016bd76>] walk_tg_tree_from+0x3a/0xf4
> > > > > [14640.428319]  list_add corruption. next->prev should be prev (0000000000000918), but was           (null). (next=          (null)).
> > > >
> > > > Where's XFS in this? walk_tg_tree_from() is part of the scheduler
> > > > code. This kind of implies a stack corruption....
> > > >
> > > > > Sometimes, this pops up,
> > > > > [16907.275002] WARNING: at kernel/rcutree.c:1960
> > > > >
> > > > > or this,
> > > > > [15316.154171] XFS (dm-1): Mounting Filesystem
> > > > > [15316.255796] XFS (dm-1): Ending clean mount
> > > > > [15320.364246]            00000000006367a2: e310b0080004        lg      %r1,8(%r11)
> > > > > [15320.364249]            00000000006367a8: 41101010            la      %r1,16(%r1)
> > > > > [15320.364251]            00000000006367ac: e33010000004        lg      %r3,0(%r1)
> > > > > [15320.364252] Call Trace:
> > > > > [15320.364252] Last Breaking-Event-Address:
> > > > > [15320.364253]  [<0000000000000000>] Kernel stack overflow.
> > > > > [15320.364308] CPU: 0 Tainted: GF       W    3.9.2 #1
> > > > > [15320.364309] Process rhts-test-runne (pid: 625, task: 000000003dccc890, ksp: 0
> > > >
> > > > .... and there you go - a stack overflow. Your kernel stack size is
> > > > too small.
> > > >
> > > > I'd suggest that you need 16k stacks on s390 - IIRC every function
> > > > call has 128 byte stack frame, and there are call chains 70-80
> > > > functions deep in the storage stack...
> > > Hmm, I am unsure how to set to 16k stack there
> >
> > Are you build a 64 bit s390 kernel or a 32 bit kernel? 32 bit
> > kernels only have an 8k stack size, 64 bit kernels are 16k (see
> > arch/s390/Makefile).
> >
> > $ git grep STACK_SIZE arch/s390 |head -2
> > arch/s390/Makefile:STACK_SIZE   := 8192
> > arch/s390/Makefile:STACK_SIZE   := 16384
> >
> > As it is, the stack frame usage is worse than I thought:
> >
> > $ git grep STACK_FRAME_OVERHEAD arch/s390 |head -2
> > arch/s390/include/uapi/asm/ptrace.h:#define STACK_FRAME_OVERHEAD 96      /* size of minimum stack frame */
> > arch/s390/include/uapi/asm/ptrace.h:#define STACK_FRAME_OVERHEAD 160     /* size of minimum stack frame */
> >
> > Overhead is 96 bytes for 32 bit and 160 bytes for 64 bit. So 16k
> > stack size is going to have big troubles with a 70-80 function deep
> > call chain.
> >
> > As for powerpc:
> >
> > arch/powerpc/include/asm/ppc_asm.h:#define STACKFRAMESIZE 256
> >
> > Yeah, same issue.
> >
> > But, seriously, these stack traces are meaningless to anyone not
> > familiar with s390 or power7 - they indicate a problem detected
> > in the idle loop, not where ever the stack overran.
> >
> > Can you please work with the s390/power7 people to obtain whatever
> > stack it was that overflowed, and we can go from there.
> >
> > Cheers,
> >
> > Dave.
> > --
> > Dave Chinner
> > david@fromorbit.com
> >
>
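
The stack arithmetic quoted above can be checked with a small userspace
sketch. It is only an illustration: the helper names min_stack_usage() and
locals_budget_per_frame() are hypothetical, the call depth is a parameter
rather than a measured value, and the per-frame overhead is treated as a
fixed minimum, so real usage will be higher.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Lower bound on the stack consumed by a call chain `depth` frames deep
 * when every frame costs at least `frame_overhead` bytes. Locals and
 * spilled registers come on top of this.
 */
static size_t min_stack_usage(size_t frame_overhead, size_t depth)
{
	return frame_overhead * depth;
}

/*
 * Average number of bytes each frame may spend on locals before the
 * chain overruns `stack_size`. Returns 0 when the fixed overhead alone
 * already fills or exceeds the stack.
 */
static size_t locals_budget_per_frame(size_t stack_size,
				      size_t frame_overhead, size_t depth)
{
	size_t fixed;

	assert(depth > 0);
	fixed = min_stack_usage(frame_overhead, depth);
	if (fixed >= stack_size)
		return 0;
	return (stack_size - fixed) / depth;
}
```

With the figures quoted above, 80 frames at 160 bytes already take 12800 of
the 16384 bytes on 64-bit s390, which leaves roughly 44 bytes per frame for
locals. If a 256-byte frame is combined with a 16k stack (the stack size
here is an assumption, it is not stated above for powerpc), a chain 70-80
frames deep exceeds the stack on frame overhead alone.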

^ permalink raw reply

* Re: [PATCH] arch: configuration, deleting 'CONFIG_BUG' since always need it.
From: Chen Gang @ 2013-05-24  4:17 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Catalin Marinas, Linux-sh list, Heiko Carstens, paulus@samba.org,
	H. Peter Anvin, Michel Lespinasse, Hans-Christian Egtvedt,
	Linux-Arch, linux-s390, Russell King - ARM Linux, uml-devel,
	Yoshinori Sato, Richard Weinberger, Helge Deller,
	the arch/x86 maintainers, James E.J. Bottomley, mingo@redhat.com,
	Frederic Weisbecker, Paul McKenney, Håvard Skinnemoen,
	Serge Hallyn, Mike Frysinger, Arnd Bergmann, Will Deacon,
	Jeff Dike, Akinobu Mita, uml-user,
	uclinux-dist-devel@blackfin.uclinux.org, Thomas Gleixner,
	linux-arm-kernel@lists.infradead.org, Parisc List,
	linux-kernel@vger.kernel.org, Richard Kuo, Paul Mundt,
	Eric W. Biederman, linux-hexagon, Martin Schwidefsky, linux390,
	Andrew Morton, linuxppc-dev@lists.ozlabs.org, David Miller
In-Reply-To: <519ECCCF.8090909@asianux.com>

On 05/24/2013 10:13 AM, Chen Gang wrote:
> On 05/23/2013 10:10 PM, Geert Uytterhoeven wrote:
>> On Thu, May 23, 2013 at 2:50 PM, Russell King - ARM Linux
>> <linux@arm.linux.org.uk> wrote:
>>>> On Thu, May 23, 2013 at 02:09:02PM +0200, Arnd Bergmann wrote:
>>>>>> On Thursday 23 May 2013, Russell King - ARM Linux wrote:
>>>>>>>> This is the problem you guys are missing - unreachable() means "we lose
>>>>>>>> control of the CPU at this point".
>>>>>>
>>>>>> I'm absolutely aware of this. Again, the current behaviour of doing nothing
>>>>>> at all isn't very different from undefined behavior when you get when you
>>>>>> get to the end of a function returning a pointer without a "return" statement,
>>>>>> or when you return from a function that has determined that it is not safe
>>>>>> to continue.
>>>>
>>>> Running off the end of a function like that is a different kettle of fish.
>>>> The execution path is still as the compiler intends - what isn't is that
>>>> the data returned is likely to be random trash.
>>>>
>>>> That's _quite_ different from the CPU starting to execute the contents
>>>> of a literal data pool.
>> I agree it's best to e.g. trap and reboot.
> 

In fact, if CONFIG_BUG is enabled but HAVE_ARCH_BUG is not, the
default implementation is:

#ifndef HAVE_ARCH_BUG
#define BUG() do { \
        printk("BUG: failure at %s:%d/%s()!\n", __FILE__, __LINE__, __func__); \
        panic("BUG!"); \
} while (0)
#endif

So if we delete CONFIG_BUG, the default implementation will be almost
like panic(), and panic() itself also calls printk()!

So...

:-)
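
To make the "print, then stop" behaviour of the default BUG() concrete,
here is a userspace sketch of the same pattern. It is an analogue only, not
kernel code: format_bug_report(), user_panic() and checked_div() are
hypothetical names, and abort() stands in for panic().

```c
#include <stdio.h>
#include <stdlib.h>

/* Format the report in the style of the generic BUG(); returns snprintf()'s result. */
static int format_bug_report(char *buf, size_t len, const char *file,
			     int line, const char *func)
{
	return snprintf(buf, len, "BUG: failure at %s:%d/%s()!\n",
			file, line, func);
}

/* Userspace stand-in for panic(): report and stop, never return. */
static void user_panic(const char *msg)
{
	fprintf(stderr, "panic: %s\n", msg);
	abort();
}

/* Same shape as the generic macro: print where we are, then stop. */
#define BUG() do { \
	char report_[256]; \
	format_bug_report(report_, sizeof(report_), \
			  __FILE__, __LINE__, __func__); \
	fputs(report_, stderr); \
	user_panic("BUG!"); \
} while (0)

/* Example caller: b == 0 is a state the code claims can never happen. */
static int checked_div(int a, int b)
{
	if (b == 0)
		BUG();
	return a / b;
}
```

The point of the sketch is that control never continues past BUG(): the
caller after the macro can rely on the "impossible" state not being
reached, which is the property the thread is arguing about.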



> After reading arch/*/include/asm/bug.h:
>
> It seems panic() is not suitable for NOMMU platforms (only m68k uses it,
> and that also needs CONFIG_BUG and CONFIG_SUN3 enabled).
>
> And unreachable() needs to be followed by an inline asm instruction (arm,
> x86, powerpc, mips...).
>
> And for __builtin_trap(), "the mechanism used may vary from release to
> release so you should not rely on any particular implementation" (see
> "http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html"; used by m68k,
> sparc, ia64).
>
> I cannot find any *trap*() or *unreachable*() in "include/asm-generic/".
>
> I cannot find any implementation that is 'generic' enough to add
> to "include/asm-generic/" (and in fact, CONFIG_BUG itself is not
> 'generic' enough to be in "include/asm-generic/").
>
>
> So I still suggest deleting CONFIG_BUG, so that most architectures
> can skip this issue for now.
>
> Specific architectures then also get 3 benefits:
>
> a. the related maintainers can implement it as they wish (no
> need to discuss it with other platform maintainers again);
>
> b. the related maintainers can freely use platform-specific features
> (which cannot be used in "include/asm-generic/");
>
> c. the related maintainers are more familiar with their own architectures'
> demands and requirements.
> 
> 
> ----------- arch/m68k/include/asm/bug.h --------------------------------
> 
> #ifndef _M68K_BUG_H
> #define _M68K_BUG_H
>
> #ifdef CONFIG_MMU
> #ifdef CONFIG_BUG
> #ifdef CONFIG_DEBUG_BUGVERBOSE
> #ifndef CONFIG_SUN3
> #define BUG() do { \
>         printk("kernel BUG at %s:%d!\n", __FILE__, __LINE__); \
>         __builtin_trap(); \
> } while (0)
> #else
> #define BUG() do { \
>         printk("kernel BUG at %s:%d!\n", __FILE__, __LINE__); \
>         panic("BUG!"); \
> } while (0)
> #endif
> #else
> #define BUG() do { \
>         __builtin_trap(); \
> } while (0)
> #endif
>
> #define HAVE_ARCH_BUG
> #endif
> #endif /* CONFIG_MMU */
>
> #include <asm-generic/bug.h>
>
> #endif
> 
> 
> 
> 
> Thanks.
> 
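
The nested #ifdef blocks in the m68k header quoted above are easy to
misread, so here is the same selection written as a plain decision
function. This is a sketch for illustration only: enum bug_impl and
m68k_bug_impl() are hypothetical names, and each boolean stands for the
corresponding CONFIG_* symbol being set.

```c
#include <stdbool.h>

/* Which BUG() body the quoted m68k header ends up selecting. */
enum bug_impl {
	BUG_GENERIC,		/* nothing defined here; <asm-generic/bug.h> decides */
	BUG_TRAP,		/* __builtin_trap() only */
	BUG_PRINTK_TRAP,	/* printk() then __builtin_trap() */
	BUG_PRINTK_PANIC,	/* printk() then panic("BUG!") */
};

/* Mirrors the #ifdef nesting: MMU, then BUG, then DEBUG_BUGVERBOSE, then SUN3. */
static enum bug_impl m68k_bug_impl(bool mmu, bool bug, bool verbose, bool sun3)
{
	if (!mmu || !bug)
		return BUG_GENERIC;
	if (!verbose)
		return BUG_TRAP;
	return sun3 ? BUG_PRINTK_PANIC : BUG_PRINTK_TRAP;
}
```

Read this way, panic() is reached only with CONFIG_MMU, CONFIG_BUG,
CONFIG_DEBUG_BUGVERBOSE and CONFIG_SUN3 all set, and a NOMMU build never
defines HAVE_ARCH_BUG in this header at all.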


-- 
Chen Gang

Asianux Corporation

^ permalink raw reply

* RE: [PATCH] powerpc/mpc85xx: fix non-bootcpu cannot up after hibernation resume
From: Wang Dongsheng-B40534 @ 2013-05-24  5:21 UTC (permalink / raw)
  To: Anton Vorontsov
  Cc: Wood Scott-B07421, Li Yang-R58472, Zhao Chenhui-B35336,
	rjw@sisk.pl, paulus@samba.org, johannes@sipsolutions.net,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <20130523173332.GC30160@teo>

Thanks anton.

> -----Original Message-----
> From: Anton Vorontsov [mailto:anton@scarybugs.org] On Behalf Of Anton
> Vorontsov
> Sent: Friday, May 24, 2013 1:34 AM
> To: Wang Dongsheng-B40534
> Cc: paulus@samba.org; rjw@sisk.pl; benh@kernel.crashing.org;
> johannes@sipsolutions.net; Wood Scott-B07421; Li Yang-R58472; Zhao
> Chenhui-B35336; linuxppc-dev@lists.ozlabs.org
> Subject: Re: [PATCH] powerpc/mpc85xx: fix non-bootcpu cannot up after
> hibernation resume
>
> Hi!
>
> On Tue, May 14, 2013 at 08:59:13AM +0000, Wang Dongsheng-B40534 wrote:
> > I send to a wrong email address "Anton Vorontsov
> <avorontsov@ru.mvista.com>"
> >
> > Add Anton Vorontsov <anton.vorontsov@linaro.org> to this email.
>
> I don't have any means to test it, but the patch itself looks good and
> the description makes sense. So,
>
> Reviewed-by: Anton Vorontsov <anton@enomsg.org>
>
> Thanks!
>
> >
> > Thanks all.
> >
> > > -----Original Message-----
> > > From: Wang Dongsheng-B40534
> > > Sent: Tuesday, May 14, 2013 4:06 PM
> > > To: avorontsov@ru.mvista.com
> > > Cc: paulus@samba.org; rjw@sisk.pl; benh@kernel.crashing.org;
> > > johannes@sipsolutions.net; Wood Scott-B07421; Li Yang-R58472; Zhao
> > > Chenhui-B35336; linuxppc-dev@lists.ozlabs.org; Wang Dongsheng-B40534
> > > Subject: [PATCH] powerpc/mpc85xx: fix non-bootcpu cannot up after
> > > hibernation resume
> > >
> > > This problem belongs to the core synchronization issues.
> > > The cpu1 already updated spin_table values, but bootcore cannot get
> > > this value in time.
> > >
> > > After bootcpu hibiernation restore the pages. we are now running
> > > with the kernel data of the old kernel fully restored. if we reset
> > > the non-bootcpus that will be reset cache(tlb), the non-bootcpus
> > > will get new address(map virtual and physical address spaces).
> > > but bootcpu tlb cache still use boot kernel data, so we need to
> > > invalidate the bootcpu tlb cache make it to get new main memory data.
> > >
> > > log:
> > > Enabling non-boot CPUs ...
> > > smp_85xx_kick_cpu: timeout waiting for core 1 to reset
> > > smp: failed starting cpu 1 (rc -2)
> > > Error taking CPU1 up: -2
> > >
> > > Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
> > >
> > > diff --git a/arch/powerpc/kernel/swsusp_booke.S
> > > b/arch/powerpc/kernel/swsusp_booke.S
> > > index 11a3930..9503249 100644
> > > --- a/arch/powerpc/kernel/swsusp_booke.S
> > > +++ b/arch/powerpc/kernel/swsusp_booke.S
> > > @@ -141,6 +141,19 @@ _GLOBAL(swsusp_arch_resume)
> > >  	lis	r11,swsusp_save_area@h
> > >  	ori	r11,r11,swsusp_save_area@l
> > >
> > > +	/*
> > > +	 * The boot core get a virtual address, when the boot process,
> > > +	 * the virtual address corresponds to a physical address. After
> > > +	 * hibernation resume memory snapshots, The corresponding
> > > +	 * relationship between the virtual memory and physical memory
> > > +	 * might change again. We need to get a new page table. So we
> > > +	 * need to invalidate TLB after resume pages.
> > > +	 *
> > > +	 * Invalidations TLB Using tlbilx/tlbivax/MMUCSR0.
> > > +	 * tlbilx used here.
> > > +	 */
> > > +	bl	_tlbil_all
> > > +
> > >  	lwz	r4,SL_SPRG0(r11)
> > >  	mtsprg	0,r4
> > >  	lwz	r4,SL_SPRG1(r11)
> > > --
> > > 1.8.0
> >
> >

^ permalink raw reply

* Re: [PATCH] arch: configuration, deleting 'CONFIG_BUG' since always need it.
From: Eric W. Biederman @ 2013-05-24  5:59 UTC (permalink / raw)
  Cc: Catalin Marinas, linux-sh, heiko.carstens, paulus@samba.org, hpa,
	walken, egtvedt, linux-arch, linux-s390, Russell King - ARM Linux,
	user-mode-linux-devel, ysato, richard, deller, x86, jejb,
	mingo@redhat.com, Geert Uytterhoeven, Frederic Weisbecker,
	paulmck, hskinnemoen, Serge Hallyn, Mike Frysinger, Arnd Bergmann,
	Will Deacon, jdike, akinobu.mita, user-mode-linux-user,
	uclinux-dist-devel, Thomas Gleixner,
	linux-arm-kernel@lists.infradead.org, linux-parisc,
	linux-kernel@vger.kernel.org, rkuo, lethal, linux-hexagon,
	schwidefsky, linux390, Andrew Morton,
	linuxppc-dev@lists.ozlabs.org, David Miller
In-Reply-To: <519DCBEF.3090208@asianux.com>

Chen Gang <gang.chen@asianux.com> writes:

> The crazy user can unset 'CONFIG_BUG' in menuconfig: "> General setup >
> Configure standard kernel features (expert users) > BUG() Support".
>
> But in fact, we always need it, and quite a few of architectures have
> already implemented it (e.g. alpha, arc, arm, avr32, blackfin, cris,
> frv, ia64, m68k, mips, mn10300, parisc, powerpc, s390, sh, sparc, x86).
>
> And kernel also already has prepared a default effective implementation
> for the architectures which is unwilling to implement it by themselves
> (e.g. arm64, c6x, h8300, hexagon, m32r, metag, microblaze, openrisc,
> score, tile, um, unicore32, xtensa).
>
> So need get rid of 'CONFIG_BUG', and let it always enabled everywhere.


This looks like the right way to handle this to me.  If the BUG
annotations are too big and not needed they should simply be deleted
from the code base.  Disabling CONFIG_BUG which removes the BUG
annotations from the binaries without modifying the source code seems
like the wrong approach.

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>

^ permalink raw reply

* [git pull] Please pull powerpc.git merge branch
From: Benjamin Herrenschmidt @ 2013-05-24  9:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linuxppc-dev, Linux Kernel list

Hi Linus !

Here are a few more powerpc fixes for 3.10. Some more P8 related
bits, a bunch of fixes for our P7+/P8 HW crypto drivers, some added
workarounds for those radeons that don't do proper 64-bit MSIs and
a couple of other trivialities by myself.

Cheers,
Ben.

The following changes since commit 519fe2ecb755b875d9814cdda19778c2e88c6901:

  Merge branch 'leds-fixes-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds (2013-05-21 11:41:07 -0700)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git merge

for you to fetch changes up to f1dd153121dcb872ae6cba8d52bec97519eb7d97:

  powerpc/pseries: Make 32-bit MSI quirk work on systems lacking firmware support (2013-05-24 18:16:54 +1000)

----------------------------------------------------------------
Benjamin Herrenschmidt (5):
      powerpc: Fix TLB cleanup at boot on POWER8
      powerpc/pci: Fix bogus message at boot about empty memory resources
      powerpc/powernv: Fix condition for when to invalidate the TCE cache
      powerpc: Make radeon 32-bit MSI quirk work on powernv
      powerpc/powernv: Build a zImage.epapr

Brian King (1):
      powerpc/pseries: Make 32-bit MSI quirk work on systems lacking firmware support

Kent Yoder (1):
      drivers/crypto/nx: Fixes for multiple races and issues

Michael Ellerman (1):
      powerpc: Context switch more PMU related SPRs

 arch/powerpc/include/asm/pci-bridge.h     |    2 +
 arch/powerpc/include/asm/processor.h      |    6 +++
 arch/powerpc/kernel/asm-offsets.c         |    6 +++
 arch/powerpc/kernel/cpu_setup_power.S     |    8 ++-
 arch/powerpc/kernel/entry_64.S            |   28 +++++++++++
 arch/powerpc/kernel/pci-common.c          |    7 +--
 arch/powerpc/kernel/pci_64.c              |   10 ++++
 arch/powerpc/kernel/pci_dn.c              |    8 +++
 arch/powerpc/platforms/powernv/Kconfig    |    1 +
 arch/powerpc/platforms/powernv/pci-ioda.c |   27 +++++------
 arch/powerpc/platforms/powernv/pci.c      |    6 ++-
 arch/powerpc/platforms/pseries/msi.c      |   75 +++++++++++++++--------------
 drivers/crypto/nx/nx-aes-cbc.c            |    1 +
 drivers/crypto/nx/nx-aes-ecb.c            |    1 +
 drivers/crypto/nx/nx-aes-gcm.c            |    2 +-
 drivers/crypto/nx/nx-sha256.c             |    8 +--
 drivers/crypto/nx/nx-sha512.c             |    7 +--
 drivers/crypto/nx/nx.c                    |   38 +++------------
 18 files changed, 146 insertions(+), 95 deletions(-)

^ permalink raw reply

* Re: [PATCH 2/2] net: mv643xx_eth: proper initialization for Kirkwood SoCs
From: Linus Walleij @ 2013-05-24 11:03 UTC (permalink / raw)
  To: Sebastian Hesselbarth, devicetree-discuss@lists.ozlabs.org,
	Grant Likely
  Cc: Andrew Lunn, Jason Cooper, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Jason Gunthorpe, Lennert Buytenhek,
	linuxppc-dev@lists.ozlabs.org list, David Miller,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <519E9ADA.3040204@gmail.com>

On Fri, May 24, 2013 at 12:40 AM, Sebastian Hesselbarth
<sebastian.hesselbarth@gmail.com> wrote:
> On 05/23/2013 08:40 PM, Jason Cooper wrote:

>> I think marvell,psc1_reset =<>; gives us the most flexibility in
>> accurately describing the hardware.
>
>
> IMHO using that is just another workaround for a broken driver. We
> could hack the whole register setup in DT as it would still accurately
> describe HW. Don't get me wrong, but I don't like it.
>
> Haven't checked how happy Linus Walleij is about pinctrl drivers with
> reg values hacked in lately.

One of the things I've been ranting about lately is that Linux
subsystem maintainers have become de-facto device tree
standard committee chairs. :-(

So to the actual question:

In general I think we need to draw a line and define what we
mean by "describing the hardware" in a device tree.

We have some consensus:
- <reg> properties to describe register BASE offset in physical
  memory and size.
- Resources like IRQ, DMA channel, regulator, GPIO and pin control
  handles are passed using <&ampersand> notation.

And so it goes on.

When it comes to defining different registers and their individual
bits and the meaning of these and/or default values, I personally
think that is making things harder for developers rather than
simplifying things. I know that pinctrl-single is anyway doing this
and I was talked into accepting it under circumstances where
developers are being passed opaque machine-generated
data that would otherwise be translated into unreadable header
files littering the kernel.

For a coder it is definitely better if the *driver* knows these
details, but whether that is possible seems to depend on things
like the hardware development process.

IMO: if you want to go down that road, what you really want is not
ever more expressive device trees, but real Open Firmware,
or ACPI or UEFI that can interpret and run bytecode as some
"BIOS" for you. With DT coming from OF maybe this is a natural
progression of things, but one has to realize when we reach the
point where what we really want is a BIOS. Then your time is
likely better spent with Tianocore or something than with the
kernel.

Yours,
Linus Walleij

* Re: [PATCH v2 07/10] powerpc: uaccess s/might_sleep/might_fault/
From: Michael S. Tsirkin @ 2013-05-24 13:00 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-m32r-ja, kvm, Peter Zijlstra, Catalin Marinas, Will Deacon,
	David Howells, linux-mm, Paul Mackerras, H. Peter Anvin,
	linux-arch, linux-am33-list, Hirokazu Takata, x86, Ingo Molnar,
	microblaze-uclinux, Chris Metcalf, Thomas Gleixner,
	linux-arm-kernel, Michal Simek, linux-m32r, linux-kernel,
	Koichi Yasutake, linuxppc-dev
In-Reply-To: <201305221559.01806.arnd@arndb.de>

On Wed, May 22, 2013 at 03:59:01PM +0200, Arnd Bergmann wrote:
> On Thursday 16 May 2013, Michael S. Tsirkin wrote:
> > @@ -178,7 +178,7 @@ do {                                                                \
> >         long __pu_err;                                          \
> >         __typeof__(*(ptr)) __user *__pu_addr = (ptr);           \
> >         if (!is_kernel_addr((unsigned long)__pu_addr))          \
> > -               might_sleep();                                  \
> > +               might_fault();                                  \
> >         __chk_user_ptr(ptr);                                    \
> >         __put_user_size((x), __pu_addr, (size), __pu_err);      \
> >         __pu_err;                                               \
> > 
> 
> Another observation:
> 
> 	if (!is_kernel_addr((unsigned long)__pu_addr))
> 		might_sleep();
> 
> is almost the same as
> 
> 	might_fault();
> 
> except that it does not call might_lock_read().
> 
> The version above may have been put there intentionally and correctly, but
> if you want to replace it with might_fault(), you should remove the
> "if ()" condition.
> 
> 	Arnd

Well not exactly. The non-inline might_fault checks the
current segment, not the address.
I'm guessing this is trying to do the same just without
pulling in segment_eq, but I'd like a confirmation
from more PPC maintainers.

Guys would you ack

- 	if (!is_kernel_addr((unsigned long)__pu_addr))
- 		might_fault();
+ 	might_fault();

on top of this patch?

Also, any volunteer to test this (not just test-build)?

-- 
MST

* Re: [PATCH v2 07/10] powerpc: uaccess s/might_sleep/might_fault/
From: Michael S. Tsirkin @ 2013-05-24 13:11 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-m32r-ja, kvm, Peter Zijlstra, Catalin Marinas, Will Deacon,
	David Howells, linux-mm, Paul Mackerras, H. Peter Anvin,
	linux-arch, linux-am33-list, Hirokazu Takata, x86, Ingo Molnar,
	microblaze-uclinux, Chris Metcalf, Thomas Gleixner,
	linux-arm-kernel, Michal Simek, linux-m32r, linux-kernel,
	Koichi Yasutake, linuxppc-dev
In-Reply-To: <20130524130032.GA10167@redhat.com>

On Fri, May 24, 2013 at 04:00:32PM +0300, Michael S. Tsirkin wrote:
> On Wed, May 22, 2013 at 03:59:01PM +0200, Arnd Bergmann wrote:
> > On Thursday 16 May 2013, Michael S. Tsirkin wrote:
> > > @@ -178,7 +178,7 @@ do {                                                                \
> > >         long __pu_err;                                          \
> > >         __typeof__(*(ptr)) __user *__pu_addr = (ptr);           \
> > >         if (!is_kernel_addr((unsigned long)__pu_addr))          \
> > > -               might_sleep();                                  \
> > > +               might_fault();                                  \
> > >         __chk_user_ptr(ptr);                                    \
> > >         __put_user_size((x), __pu_addr, (size), __pu_err);      \
> > >         __pu_err;                                               \
> > > 
> > 
> > Another observation:
> > 
> > 	if (!is_kernel_addr((unsigned long)__pu_addr))
> > 		might_sleep();
> > 
> > is almost the same as
> > 
> > 	might_fault();
> > 
> > except that it does not call might_lock_read().
> > 
> > The version above may have been put there intentionally and correctly, but
> > if you want to replace it with might_fault(), you should remove the
> > "if ()" condition.
> > 
> > 	Arnd
> 
> Well not exactly. The non-inline might_fault checks the
> current segment, not the address.
> I'm guessing this is trying to do the same just without
> pulling in segment_eq, but I'd like a confirmation
> from more PPC maintainers.
> 
> Guys would you ack
> 
> - 	if (!is_kernel_addr((unsigned long)__pu_addr))
> - 		might_fault();
> + 	might_fault();
> 
> on top of this patch?

OK I spoke too fast: I found this:

    powerpc: Fix incorrect might_sleep in __get_user/__put_user on kernel addresses
    
    We have a case where __get_user and __put_user can validly be used
    on kernel addresses in interrupt context - namely, the alignment
    exception handler, as our get/put_unaligned just do a single access
    and rely on the alignment exception handler to fix things up in the
    rare cases where the cpu can't handle it in hardware.  Thus we can
    get alignment exceptions in the network stack at interrupt level.
    The alignment exception handler does a __get_user to read the
    instruction and blows up in might_sleep().
    
    Since a __get_user on a kernel address won't actually ever sleep,
    this makes the might_sleep conditional on the address being less
    than PAGE_OFFSET.
    
    Signed-off-by: Paul Mackerras <paulus@samba.org>

So this won't work, unless we add the is_kernel_addr check
to might_fault. That will become possible on top of this patchset
but let's consider this carefully, and make this a separate
patchset, OK?
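
To make the difference concrete, here is a minimal userspace model of the two
checks, not kernel code. Every name prefixed with model_ is hypothetical, the
PAGE_OFFSET value is only an assumed 64-bit powerpc layout, and
model_might_sleep() merely counts the warnings that the real might_sleep()
would print in atomic context. It ignores the segment check and
might_lock_read() that the real might_fault() also performs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: kernel addresses start here (illustrative value only). */
#define MODEL_PAGE_OFFSET UINT64_C(0xc000000000000000)

static bool model_in_atomic;  /* true while "in interrupt context" */
static int model_warnings;    /* number of simulated might_sleep() splats */

/* Stand-in for is_kernel_addr(): address at or above PAGE_OFFSET. */
static bool model_is_kernel_addr(uint64_t addr)
{
	return addr >= MODEL_PAGE_OFFSET;
}

/* Stand-in for might_sleep(): complains only in atomic context. */
static void model_might_sleep(void)
{
	if (model_in_atomic)
		model_warnings++;
}

/* Behaviour described in Paul's commit: skip the check for kernel addresses. */
static void model_check_conditional(uint64_t addr)
{
	if (!model_is_kernel_addr(addr))
		model_might_sleep();
}

/* Behaviour if the "if ()" were simply dropped: always check. */
static void model_check_unconditional(uint64_t addr)
{
	(void)addr;
	model_might_sleep();
}
```

With model_in_atomic set, a kernel address passes through
model_check_conditional() silently but trips model_check_unconditional(),
which is the alignment-handler case the commit message describes.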

> Also, any volunteer to test this (not just test-build)?
> 
> -- 
> MST

* Re: [PATCH v2 07/10] powerpc: uaccess s/might_sleep/might_fault/
From: Arnd Bergmann @ 2013-05-24 13:30 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-m32r-ja, kvm, Peter Zijlstra, Catalin Marinas, Will Deacon,
	David Howells, linux-mm, Paul Mackerras, H. Peter Anvin,
	linux-arch, linux-am33-list, Hirokazu Takata, x86, Ingo Molnar,
	microblaze-uclinux, Chris Metcalf, Thomas Gleixner,
	linux-arm-kernel, Michal Simek, linux-m32r, linux-kernel,
	Koichi Yasutake, linuxppc-dev
In-Reply-To: <20130524131104.GA11462@redhat.com>

On Friday 24 May 2013, Michael S. Tsirkin wrote:
> So this won't work, unless we add the is_kernel_addr check
> to might_fault. That will become possible on top of this patchset
> but let's consider this carefully, and make this a separate
> patchset, OK?

Yes, makes sense.

	Arnd

* Re: [PATCH 2/3] powerpc/vfio: Implement IOMMU driver for VFIO
From: Alex Williamson @ 2013-05-24 14:03 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm, linux-kernel, Paul Mackerras, linuxppc-dev, David Gibson
In-Reply-To: <1369107191-28547-3-git-send-email-aik@ozlabs.ru>

On Tue, 2013-05-21 at 13:33 +1000, Alexey Kardashevskiy wrote:
> VFIO implements platform independent stuff such as
> a PCI driver, BAR access (via read/write on a file descriptor
> or direct mapping when possible) and IRQ signaling.
> 
> The platform dependent part includes IOMMU initialization
> and handling.  This implements an IOMMU driver for VFIO
> which does mapping/unmapping pages for the guest IO and
> provides information about DMA window (required by a POWER
> guest).
> 
> Cc: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> Signed-off-by: Paul Mackerras <paulus@samba.org>

Acked-by: Alex Williamson <alex.williamson@redhat.com>

> ---
>  Documentation/vfio.txt              |   63 ++++++
>  drivers/vfio/Kconfig                |    6 +
>  drivers/vfio/Makefile               |    1 +
>  drivers/vfio/vfio.c                 |    1 +
>  drivers/vfio/vfio_iommu_spapr_tce.c |  377 +++++++++++++++++++++++++++++++++++
>  include/uapi/linux/vfio.h           |   34 ++++
>  6 files changed, 482 insertions(+)
>  create mode 100644 drivers/vfio/vfio_iommu_spapr_tce.c
> 
> diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
> index 8eda363..c55533c 100644
> --- a/Documentation/vfio.txt
> +++ b/Documentation/vfio.txt
> @@ -283,6 +283,69 @@ a direct pass through for VFIO_DEVICE_* ioctls.  The read/write/mmap
>  interfaces implement the device region access defined by the device's
>  own VFIO_DEVICE_GET_REGION_INFO ioctl.
>  
> +
> +PPC64 sPAPR implementation note
> +-------------------------------------------------------------------------------
> +
> +This implementation has some specifics:
> +
> +1) Only one IOMMU group per container is supported as an IOMMU group
> +represents the minimal entity which isolation can be guaranteed for and
> +groups are allocated statically, one per a Partitionable Endpoint (PE)
> +(PE is often a PCI domain but not always).
> +
> +2) The hardware supports so called DMA windows - the PCI address range
> +within which DMA transfer is allowed, any attempt to access address space
> +out of the window leads to the whole PE isolation.
> +
> +3) PPC64 guests are paravirtualized but not fully emulated. There is an API
> +to map/unmap pages for DMA, and it normally maps 1..32 pages per call and
> +currently there is no way to reduce the number of calls. In order to make things
> +faster, the map/unmap handling has been implemented in real mode which provides
> +an excellent performance which has limitations such as inability to do
> +locked pages accounting in real time.
> +
> +So 3 additional ioctls have been added:
> +
> +	VFIO_IOMMU_SPAPR_TCE_GET_INFO - returns the size and the start
> +		of the DMA window on the PCI bus.
> +
> +	VFIO_IOMMU_ENABLE - enables the container. The locked pages accounting
> +		is done at this point. This lets user first to know what
> +		the DMA window is and adjust rlimit before doing any real job.
> +
> +	VFIO_IOMMU_DISABLE - disables the container.
> +
> +
> +The code flow from the example above should be slightly changed:
> +
> +	.....
> +	/* Add the group to the container */
> +	ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
> +
> +	/* Enable the IOMMU model we want */
> +	ioctl(container, VFIO_SET_IOMMU, VFIO_SPAPR_TCE_IOMMU)
> +
> +	/* Get addition sPAPR IOMMU info */
> +	vfio_iommu_spapr_tce_info spapr_iommu_info;
> +	ioctl(container, VFIO_IOMMU_SPAPR_TCE_GET_INFO, &spapr_iommu_info);
> +
> +	if (ioctl(container, VFIO_IOMMU_ENABLE))
> +		/* Cannot enable container, may be low rlimit */
> +
> +	/* Allocate some space and setup a DMA mapping */
> +	dma_map.vaddr = mmap(0, 1024 * 1024, PROT_READ | PROT_WRITE,
> +			     MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
> +
> +	dma_map.size = 1024 * 1024;
> +	dma_map.iova = 0; /* 1MB starting at 0x0 from device view */
> +	dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
> +
> +	/* Check here is .iova/.size are within DMA window from spapr_iommu_info */
> +
> +	ioctl(container, VFIO_IOMMU_MAP_DMA, &dma_map);
> +	.....
> +
>  -------------------------------------------------------------------------------
>  
>  [1] VFIO was originally an acronym for "Virtual Function I/O" in its
> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> index 7cd5dec..b464687 100644
> --- a/drivers/vfio/Kconfig
> +++ b/drivers/vfio/Kconfig
> @@ -3,10 +3,16 @@ config VFIO_IOMMU_TYPE1
>  	depends on VFIO
>  	default n
>  
> +config VFIO_IOMMU_SPAPR_TCE
> +	tristate
> +	depends on VFIO && SPAPR_TCE_IOMMU
> +	default n
> +
>  menuconfig VFIO
>  	tristate "VFIO Non-Privileged userspace driver framework"
>  	depends on IOMMU_API
>  	select VFIO_IOMMU_TYPE1 if X86
> +	select VFIO_IOMMU_SPAPR_TCE if PPC_POWERNV
>  	help
>  	  VFIO provides a framework for secure userspace device drivers.
>  	  See Documentation/vfio.txt for more details.
> diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
> index 2398d4a..72bfabc 100644
> --- a/drivers/vfio/Makefile
> +++ b/drivers/vfio/Makefile
> @@ -1,3 +1,4 @@
>  obj-$(CONFIG_VFIO) += vfio.o
>  obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
> +obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
>  obj-$(CONFIG_VFIO_PCI) += pci/
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 295c48f..a819604 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -1428,6 +1428,7 @@ static int __init vfio_init(void)
>  	 * drivers.
>  	 */
>  	request_module_nowait("vfio_iommu_type1");
> +	request_module_nowait("vfio_iommu_spapr_tce");
>  
>  	return 0;
>  
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> new file mode 100644
> index 0000000..bdae7a0
> --- /dev/null
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -0,0 +1,377 @@
> +/*
> + * VFIO: IOMMU DMA mapping support for TCE on POWER
> + *
> + * Copyright (C) 2013 IBM Corp.  All rights reserved.
> + *     Author: Alexey Kardashevskiy <aik@ozlabs.ru>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * Derived from original vfio_iommu_type1.c:
> + * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
> + *     Author: Alex Williamson <alex.williamson@redhat.com>
> + */
> +
> +#include <linux/module.h>
> +#include <linux/pci.h>
> +#include <linux/slab.h>
> +#include <linux/uaccess.h>
> +#include <linux/err.h>
> +#include <linux/vfio.h>
> +#include <asm/iommu.h>
> +#include <asm/tce.h>
> +
> +#define DRIVER_VERSION  "0.1"
> +#define DRIVER_AUTHOR   "aik@ozlabs.ru"
> +#define DRIVER_DESC     "VFIO IOMMU SPAPR TCE"
> +
> +static void tce_iommu_detach_group(void *iommu_data,
> +		struct iommu_group *iommu_group);
> +
> +/*
> + * VFIO IOMMU fd for SPAPR_TCE IOMMU implementation
> + *
> + * This code handles mapping and unmapping of user data buffers
> + * into DMA'ble space using the IOMMU
> + */
> +
> +/*
> + * The container descriptor supports only a single group per container.
> + * Required by the API as the container is not supplied with the IOMMU group
> + * at the moment of initialization.
> + */
> +struct tce_container {
> +	struct mutex lock;
> +	struct iommu_table *tbl;
> +	bool enabled;
> +};
> +
> +static int tce_iommu_enable(struct tce_container *container)
> +{
> +	int ret = 0;
> +	unsigned long locked, lock_limit, npages;
> +	struct iommu_table *tbl = container->tbl;
> +
> +	if (!container->tbl)
> +		return -ENXIO;
> +
> +	if (!current->mm)
> +		return -ESRCH; /* process exited */
> +
> +	if (container->enabled)
> +		return -EBUSY;
> +
> +	/*
> +	 * When userspace pages are mapped into the IOMMU, they are effectively
> +	 * locked memory, so, theoretically, we need to update the accounting
> +	 * of locked pages on each map and unmap.  For powerpc, the map unmap
> +	 * paths can be very hot, though, and the accounting would kill
> +	 * performance, especially since it would be difficult to impossible
> +	 * to handle the accounting in real mode only.
> +	 *
> +	 * To address that, rather than precisely accounting every page, we
> +	 * instead account for a worst case on locked memory when the iommu is
> +	 * enabled and disabled.  The worst case upper bound on locked memory
> +	 * is the size of the whole iommu window, which is usually relatively
> +	 * small (compared to total memory sizes) on POWER hardware.
> +	 *
> +	 * Also we don't have a nice way to fail on H_PUT_TCE due to ulimits,
> +	 * that would effectively kill the guest at random points, much better
> +	 * enforcing the limit based on the max that the guest can map.
> +	 */
> +	down_write(&current->mm->mmap_sem);
> +	npages = (tbl->it_size << IOMMU_PAGE_SHIFT) >> PAGE_SHIFT;
> +	locked = current->mm->locked_vm + npages;
> +	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> +	if (locked > lock_limit && !capable(CAP_IPC_LOCK)) {
> +		pr_warn("RLIMIT_MEMLOCK (%ld) exceeded\n",
> +				rlimit(RLIMIT_MEMLOCK));
> +		ret = -ENOMEM;
> +	} else {
> +
> +		current->mm->locked_vm += npages;
> +		container->enabled = true;
> +	}
> +	up_write(&current->mm->mmap_sem);
> +
> +	return ret;
> +}
> +
> +static void tce_iommu_disable(struct tce_container *container)
> +{
> +	if (!container->enabled)
> +		return;
> +
> +	container->enabled = false;
> +
> +	if (!container->tbl || !current->mm)
> +		return;
> +
> +	down_write(&current->mm->mmap_sem);
> +	current->mm->locked_vm -= (container->tbl->it_size <<
> +			IOMMU_PAGE_SHIFT) >> PAGE_SHIFT;
> +	up_write(&current->mm->mmap_sem);
> +}
> +
> +static void *tce_iommu_open(unsigned long arg)
> +{
> +	struct tce_container *container;
> +
> +	if (arg != VFIO_SPAPR_TCE_IOMMU) {
> +		pr_err("tce_vfio: Wrong IOMMU type\n");
> +		return ERR_PTR(-EINVAL);
> +	}
> +
> +	container = kzalloc(sizeof(*container), GFP_KERNEL);
> +	if (!container)
> +		return ERR_PTR(-ENOMEM);
> +
> +	mutex_init(&container->lock);
> +
> +	return container;
> +}
> +
> +static void tce_iommu_release(void *iommu_data)
> +{
> +	struct tce_container *container = iommu_data;
> +
> +	WARN_ON(container->tbl && !container->tbl->it_group);
> +	tce_iommu_disable(container);
> +
> +	if (container->tbl && container->tbl->it_group)
> +		tce_iommu_detach_group(iommu_data, container->tbl->it_group);
> +
> +	mutex_destroy(&container->lock);
> +
> +	kfree(container);
> +}
> +
> +static long tce_iommu_ioctl(void *iommu_data,
> +				 unsigned int cmd, unsigned long arg)
> +{
> +	struct tce_container *container = iommu_data;
> +	unsigned long minsz;
> +	long ret;
> +
> +	switch (cmd) {
> +	case VFIO_CHECK_EXTENSION:
> +		return (arg == VFIO_SPAPR_TCE_IOMMU) ? 1 : 0;
> +
> +	case VFIO_IOMMU_SPAPR_TCE_GET_INFO: {
> +		struct vfio_iommu_spapr_tce_info info;
> +		struct iommu_table *tbl = container->tbl;
> +
> +		if (WARN_ON(!tbl))
> +			return -ENXIO;
> +
> +		minsz = offsetofend(struct vfio_iommu_spapr_tce_info,
> +				dma32_window_size);
> +
> +		if (copy_from_user(&info, (void __user *)arg, minsz))
> +			return -EFAULT;
> +
> +		if (info.argsz < minsz)
> +			return -EINVAL;
> +
> +		info.dma32_window_start = tbl->it_offset << IOMMU_PAGE_SHIFT;
> +		info.dma32_window_size = tbl->it_size << IOMMU_PAGE_SHIFT;
> +		info.flags = 0;
> +
> +		if (copy_to_user((void __user *)arg, &info, minsz))
> +			return -EFAULT;
> +
> +		return 0;
> +	}
> +	case VFIO_IOMMU_MAP_DMA: {
> +		struct vfio_iommu_type1_dma_map param;
> +		struct iommu_table *tbl = container->tbl;
> +		unsigned long tce, i;
> +
> +		if (!tbl)
> +			return -ENXIO;
> +
> +		BUG_ON(!tbl->it_group);
> +
> +		minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
> +
> +		if (copy_from_user(&param, (void __user *)arg, minsz))
> +			return -EFAULT;
> +
> +		if (param.argsz < minsz)
> +			return -EINVAL;
> +
> +		if (param.flags & ~(VFIO_DMA_MAP_FLAG_READ |
> +				VFIO_DMA_MAP_FLAG_WRITE))
> +			return -EINVAL;
> +
> +		if ((param.size & ~IOMMU_PAGE_MASK) ||
> +				(param.vaddr & ~IOMMU_PAGE_MASK))
> +			return -EINVAL;
> +
> +		/* iova is checked by the IOMMU API */
> +		tce = param.vaddr;
> +		if (param.flags & VFIO_DMA_MAP_FLAG_READ)
> +			tce |= TCE_PCI_READ;
> +		if (param.flags & VFIO_DMA_MAP_FLAG_WRITE)
> +			tce |= TCE_PCI_WRITE;
> +
> +		ret = iommu_tce_put_param_check(tbl, param.iova, tce);
> +		if (ret)
> +			return ret;
> +
> +		for (i = 0; i < (param.size >> IOMMU_PAGE_SHIFT); ++i) {
> +			ret = iommu_put_tce_user_mode(tbl,
> +					(param.iova >> IOMMU_PAGE_SHIFT) + i,
> +					tce);
> +			if (ret)
> +				break;
> +			tce += IOMMU_PAGE_SIZE;
> +		}
> +		if (ret)
> +			iommu_clear_tces_and_put_pages(tbl,
> +					param.iova >> IOMMU_PAGE_SHIFT,	i);
> +
> +		iommu_flush_tce(tbl);
> +
> +		return ret;
> +	}
> +	case VFIO_IOMMU_UNMAP_DMA: {
> +		struct vfio_iommu_type1_dma_unmap param;
> +		struct iommu_table *tbl = container->tbl;
> +
> +		if (WARN_ON(!tbl))
> +			return -ENXIO;
> +
> +		minsz = offsetofend(struct vfio_iommu_type1_dma_unmap,
> +				size);
> +
> +		if (copy_from_user(&param, (void __user *)arg, minsz))
> +			return -EFAULT;
> +
> +		if (param.argsz < minsz)
> +			return -EINVAL;
> +
> +		/* No flag is supported now */
> +		if (param.flags)
> +			return -EINVAL;
> +
> +		if (param.size & ~IOMMU_PAGE_MASK)
> +			return -EINVAL;
> +
> +		ret = iommu_tce_clear_param_check(tbl, param.iova, 0,
> +				param.size >> IOMMU_PAGE_SHIFT);
> +		if (ret)
> +			return ret;
> +
> +		ret = iommu_clear_tces_and_put_pages(tbl,
> +				param.iova >> IOMMU_PAGE_SHIFT,
> +				param.size >> IOMMU_PAGE_SHIFT);
> +		iommu_flush_tce(tbl);
> +
> +		return ret;
> +	}
> +	case VFIO_IOMMU_ENABLE:
> +		mutex_lock(&container->lock);
> +		ret = tce_iommu_enable(container);
> +		mutex_unlock(&container->lock);
> +		return ret;
> +
> +
> +	case VFIO_IOMMU_DISABLE:
> +		mutex_lock(&container->lock);
> +		tce_iommu_disable(container);
> +		mutex_unlock(&container->lock);
> +		return 0;
> +	}
> +
> +	return -ENOTTY;
> +}
> +
> +static int tce_iommu_attach_group(void *iommu_data,
> +		struct iommu_group *iommu_group)
> +{
> +	int ret;
> +	struct tce_container *container = iommu_data;
> +	struct iommu_table *tbl = iommu_group_get_iommudata(iommu_group);
> +
> +	BUG_ON(!tbl);
> +	mutex_lock(&container->lock);
> +
> +	/* pr_debug("tce_vfio: Attaching group #%u to iommu %p\n",
> +			iommu_group_id(iommu_group), iommu_group); */
> +	if (container->tbl) {
> +		pr_warn("tce_vfio: Only one group per IOMMU container is allowed, existing id=%d, attaching id=%d\n",
> +				iommu_group_id(container->tbl->it_group),
> +				iommu_group_id(iommu_group));
> +		ret = -EBUSY;
> +	} else if (container->enabled) {
> +		pr_err("tce_vfio: attaching group #%u to enabled container\n",
> +				iommu_group_id(iommu_group));
> +		ret = -EBUSY;
> +	} else {
> +		ret = iommu_take_ownership(tbl);
> +		if (!ret)
> +			container->tbl = tbl;
> +	}
> +
> +	mutex_unlock(&container->lock);
> +
> +	return ret;
> +}
> +
> +static void tce_iommu_detach_group(void *iommu_data,
> +		struct iommu_group *iommu_group)
> +{
> +	struct tce_container *container = iommu_data;
> +	struct iommu_table *tbl = iommu_group_get_iommudata(iommu_group);
> +
> +	BUG_ON(!tbl);
> +	mutex_lock(&container->lock);
> +	if (tbl != container->tbl) {
> +		pr_warn("tce_vfio: detaching group #%u, expected group is #%u\n",
> +				iommu_group_id(iommu_group),
> +				iommu_group_id(tbl->it_group));
> +	} else {
> +		if (container->enabled) {
> +			pr_warn("tce_vfio: detaching group #%u from enabled container, forcing disable\n",
> +					iommu_group_id(tbl->it_group));
> +			tce_iommu_disable(container);
> +		}
> +
> +		/* pr_debug("tce_vfio: detaching group #%u from iommu %p\n",
> +				iommu_group_id(iommu_group), iommu_group); */
> +		container->tbl = NULL;
> +		iommu_release_ownership(tbl);
> +	}
> +	mutex_unlock(&container->lock);
> +}
> +
> +const struct vfio_iommu_driver_ops tce_iommu_driver_ops = {
> +	.name		= "iommu-vfio-powerpc",
> +	.owner		= THIS_MODULE,
> +	.open		= tce_iommu_open,
> +	.release	= tce_iommu_release,
> +	.ioctl		= tce_iommu_ioctl,
> +	.attach_group	= tce_iommu_attach_group,
> +	.detach_group	= tce_iommu_detach_group,
> +};
> +
> +static int __init tce_iommu_init(void)
> +{
> +	return vfio_register_iommu_driver(&tce_iommu_driver_ops);
> +}
> +
> +static void __exit tce_iommu_cleanup(void)
> +{
> +	vfio_unregister_iommu_driver(&tce_iommu_driver_ops);
> +}
> +
> +module_init(tce_iommu_init);
> +module_exit(tce_iommu_cleanup);
> +
> +MODULE_VERSION(DRIVER_VERSION);
> +MODULE_LICENSE("GPL v2");
> +MODULE_AUTHOR(DRIVER_AUTHOR);
> +MODULE_DESCRIPTION(DRIVER_DESC);
> +
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 284ff24..87ee4f4 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -22,6 +22,7 @@
>  /* Extensions */
>  
>  #define VFIO_TYPE1_IOMMU		1
> +#define VFIO_SPAPR_TCE_IOMMU		2
>  
>  /*
>   * The IOCTL interface is designed for extensibility by embedding the
> @@ -375,4 +376,37 @@ struct vfio_iommu_type1_dma_unmap {
>  
>  #define VFIO_IOMMU_UNMAP_DMA _IO(VFIO_TYPE, VFIO_BASE + 14)
>  
> +/*
> + * IOCTLs to enable/disable IOMMU container usage.
> + * No parameters are supported.
> + */
> +#define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
> +#define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
> +
> +/* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
> +
> +/*
> + * The SPAPR TCE info struct provides the information about the PCI bus
> + * address ranges available for DMA, these values are programmed into
> + * the hardware so the guest has to know that information.
> + *
> + * The DMA 32 bit window start is an absolute PCI bus address.
> + * The IOVA address passed via map/unmap ioctls are absolute PCI bus
> + * addresses too so the window works as a filter rather than an offset
> + * for IOVA addresses.
> + *
> + * A flag will need to be added if other page sizes are supported,
> + * so as defined here, it is always 4k.
> + */
> +struct vfio_iommu_spapr_tce_info {
> +	__u32 argsz;
> +	__u32 flags;			/* reserved for future use */
> +	__u32 dma32_window_start;	/* 32 bit window start (bytes) */
> +	__u32 dma32_window_size;	/* 32 bit window size (bytes) */
> +};
> +
> +#define VFIO_IOMMU_SPAPR_TCE_GET_INFO	_IO(VFIO_TYPE, VFIO_BASE + 12)
> +
> +/* ***************************************************************** */
> +
>  #endif /* _UAPIVFIO_H */
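
For readers following the accounting in tce_iommu_enable() and the "check
.iova/.size against the DMA window" step in the documentation hunk, the
arithmetic can be sketched in plain userspace C. This is only an illustration
under stated assumptions: all sketch_ names are hypothetical, the IOMMU page
shift of 12 follows the "always 4k" comment in the uapi header, and the system
page shift of 16 (64K pages) is an assumption that depends on the kernel
configuration.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_IOMMU_PAGE_SHIFT 12 /* 4K IOMMU pages, per the uapi comment */
#define SKETCH_PAGE_SHIFT       16 /* assumption: 64K system pages */

/* Worst-case locked pages: the whole DMA window, in system pages. */
static uint64_t sketch_window_pages(uint64_t it_size)
{
	return (it_size << SKETCH_IOMMU_PAGE_SHIFT) >> SKETCH_PAGE_SHIFT;
}

/*
 * Mirrors the enable-time test in the patch: refuse only if the new total
 * exceeds the limit and the caller lacks CAP_IPC_LOCK.
 */
static bool sketch_may_enable(uint64_t locked_vm, uint64_t it_size,
			      uint64_t memlock_bytes, bool cap_ipc_lock)
{
	uint64_t locked = locked_vm + sketch_window_pages(it_size);
	uint64_t limit = memlock_bytes >> SKETCH_PAGE_SHIFT;

	return locked <= limit || cap_ipc_lock;
}

/* Userspace-side check that [iova, iova + size) lies inside the window. */
static bool sketch_in_window(uint64_t iova, uint64_t size,
			     uint32_t win_start, uint32_t win_size)
{
	uint64_t end = (uint64_t)win_start + win_size;

	return size != 0 && iova >= win_start && iova <= end &&
	       size <= end - iova;
}
```

A caller would feed sketch_in_window() the dma32_window_start and
dma32_window_size values returned by VFIO_IOMMU_SPAPR_TCE_GET_INFO before
issuing VFIO_IOMMU_MAP_DMA, and could use sketch_window_pages() to work out
how far RLIMIT_MEMLOCK must be raised before VFIO_IOMMU_ENABLE.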
