LinuxPPC-Dev Archive on lore.kernel.org
* Re: [PATCH] pci-scan: Fix setting the limit
From: Alexey Kardashevskiy @ 2014-05-13 11:16 UTC
  To: linuxppc-dev; +Cc: Paul Mackerras, Thomas Huth, Nikunj A Dadhania
In-Reply-To: <1399978118-10298-1-git-send-email-aik@ozlabs.ru>

On 05/13/2014 08:48 PM, Alexey Kardashevskiy wrote:
> The PCI spec says that the lower 20 bits of the memory limit are assumed
> to be 0xFFFFF. The existing code seems to get this right in
> pci-bridge-set-mem-limit.
> 
> However, pci-bridge-set-mem-base does not account for the implied 0xFFFFF
> and corrupts the limit. Since SLOF stores the limit nowhere except in
> config space, it remains broken.
> 
> This fixes pci-bridge-set-mem-base.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> 
> I have doubts this is the right fix as I tried to "fix"
> pci-bridge-set-mmio-base (while I am here) and it broke the guest.
> 
> The problem I am fixing with this is that QEMU, started as below, is
> unable to initialize the virtio-net device because virtio's BAR
> overlaps the bridge's "ranges" property. Note that virtio-net is
> attached to the PHB, not to the additional bridge.
> 
> /home/aik/qemu-system-ppc64 \
> -enable-kvm \
> -m 1024 \
> -machine pseries \
> -nographic \
> -vga none \
> -device pci-bridge,id=id0,bus=pci.0,addr=5.0,chassis_nr=7 \
> -netdev tap,id=id1,ifname=tap1,script=ifup.sh,downscript=ifdown.sh \
> -device virtio-net-pci,id=id2,netdev=id1 \
> -initrd 1.cpio \
> -kernel vml315rc3 \
> 
> This is from the guest:
> 
> PCI host bridge /pci@800000020000000  ranges:
>   IO 0x0000010080000000..0x000001008000ffff -> 0x0000000000000000
>  MEM 0x00000100a0000000..0x00000100bfffffff -> 0x0000000080000000
> 
> PCI:0000:00:00.0 Resource 0 0000000000010020-000000000001003f [40101]
> PCI:0000:00:00.0 Resource 1 00000100b0000000-00000100b0000fff [40200]
> PCI:0000:00:00.0 Resource 6 00000100b0040000-00000100b007ffff [4c200]
> 
> PCI:0000:00:05.0 Bus rsrc 1 0000000090100000-00000000a00fffff [40200]
> PCI:0000:00:05.0 Bus rsrc 2 00000100a0000000-00000100b00fffff [42208]
> PCI: PHB (bus 0) bridge rsrc 4: 0000000000010000-000000000001ffff [0x100], parent c000000000f765b8 (PCI IO)
> PCI: PHB (bus 0) bridge rsrc 5: 00000100a0000000-00000100bfffffff [0x200], parent c000000000f76580 (PCI mem)
> PCI: Allocating 0000:00:00.0: Resource 0: 0000000000010020..000000000001003f [40101]
> PCI: Allocating 0000:00:00.0: Resource 1: 00000100b0000000..00000100b0000fff [40200]
> PCI: Cannot allocate resource region 1 of device 0000:00:00.0, will remap
> 
> These are the PHB and bridge "ranges":
> [root@erif_root ~]# hexdump -e '7/4 "%08x "' -e '"\n"' /proc/device-tree/pci@800000020000000/ranges
> 01000000 00000000 00000000 00000100 80000000 00000000 00010000
> 02000000 00000000 80000000 00000100 a0000000 00000000 20000000
> [root@erif_root ~]# hexdump -e '8/4 "%08x "' -e '"\n"' /proc/device-tree/pci@800000020000000/pci@5/ranges
> 02000000 00000000 90100000 02000000 00000000 90100000 00000000 10000000
> 42000000 00000000 80000000 42000000 00000000 80000000 00000000 10100000
> 
> And virtio-net BARs:
> [root@erif_root ~]# hexdump -e '5/4 "%08x "' -e '"\n"' /proc/device-tree/pci@800000020000000/ethernet@0/reg
> 00000000 00000000 00000000 00000000 00000000
> 01000010 00000000 00000000 00000000 00000020
> 02000014 00000000 00000000 00000000 00001000
> 02000030 00000000 00000000 00000000 00040000
> [root@erif_root ~]# hexdump -e '5/4 "%08x "' -e '"\n"' /proc/device-tree/pci@800000020000000/ethernet@0/assigned-addresses
> 82000030 00000000 90040000 00000000 00040000
> 81000010 00000000 00000020 00000000 00000020
> 82000014 00000000 90000000 00000000 00001000
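The dumps above can be read with the address encoding of the IEEE 1275 PCI bus binding, where the first cell (phys.hi) packs the n (non-relocatable) and p (prefetchable) flags, the space code and bus/device/function/register. A minimal sketch, assuming (as the hexdump formats suggest) that each pci@5/ranges row is 3 child cells, 3 parent cells and 2 size cells, and each assigned-addresses row is 3 address cells plus 2 size cells; the helper names are hypothetical. Plugging in the numbers, the bridge's prefetchable window 0x80000000..0x900fffff covers the virtio BARs at 0x90000000 and 0x90040000, which would be consistent with the overlap described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* phys.hi cell of a PCI address: npt000ss bbbbbbbb dddddfff rrrrrrrr */
struct pci_phys_hi {
	unsigned non_reloc;	/* n, bit 31 */
	unsigned prefetch;	/* p, bit 30 */
	unsigned space;		/* ss, bits 25:24: 0 config, 1 I/O, 2 mem32, 3 mem64 */
	unsigned bus;		/* bits 23:16 */
	unsigned dev;		/* bits 15:11 */
	unsigned fn;		/* bits 10:8 */
	unsigned reg;		/* bits 7:0, config-space offset of the BAR */
};

static struct pci_phys_hi decode_phys_hi(uint32_t hi)
{
	struct pci_phys_hi p;

	p.non_reloc = (hi >> 31) & 1;
	p.prefetch  = (hi >> 30) & 1;
	p.space     = (hi >> 24) & 3;
	p.bus       = (hi >> 16) & 0xff;
	p.dev       = (hi >> 11) & 0x1f;
	p.fn        = (hi >> 8) & 0x7;
	p.reg       = hi & 0xff;
	return p;
}

/* Join two 32-bit cells (high, low) into one 64-bit number. */
static uint64_t cells64(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}

/* Do [a, a + alen) and [b, b + blen) intersect? Assumes no wrap-around. */
static bool ranges_overlap(uint64_t a, uint64_t alen, uint64_t b, uint64_t blen)
{
	return alen != 0 && blen != 0 && a < b + blen && b < a + alen;
}
```

For example, the second pci@5/ranges row decodes to a prefetchable 32-bit memory window (phys.hi 0x42000000) at 0x80000000 of size 0x10100000, and the virtio row 82000014 is a non-relocatable 32-bit BAR at config offset 0x14.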

Hm. The patch seems to remove the non-prefetchable range from the PCI bridge
(pci@5/ranges) and produces a weird start address. So it is wrong. Oh...

[root@erif_root ~]# hexdump -e '7/4 "%08x "' -e '"\n"' /proc/device-tree/pci@800000020000000/ranges
01000000 00000000 00000000 00000100 80000000 00000000 00010000
02000000 00000000 80000000 00000100 a0000000 00000000 20000000
[root@erif_root ~]# hexdump -e '8/4 "%08x "' -e '"\n"' /proc/device-tree/pci@800000020000000/pci@5/ranges
42000000 00000000 7ff00000 42000000 00000000 7ff00000 00000000 00100000
[root@erif_root ~]# hexdump -e '5/4 "%08x "' -e '"\n"' /proc/device-tree/pci@800000020000000/ethernet@0/reg
00000000 00000000 00000000 00000000 00000000
01000010 00000000 00000000 00000000 00000020
02000014 00000000 00000000 00000000 00001000
02000030 00000000 00000000 00000000 00040000
[root@erif_root ~]# hexdump -e '5/4 "%08x "' -e '"\n"' /proc/device-tree/pci@800000020000000/ethernet@0/assigned-addresses
82000030 00000000 90040000 00000000 00040000
81000010 00000000 00000020 00000000 00000020
82000014 00000000 90000000 00000000 00001000



> ---
>  slof/fs/pci-scan.fs | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/slof/fs/pci-scan.fs b/slof/fs/pci-scan.fs
> index ec9bd27..53b3b2c 100644
> --- a/slof/fs/pci-scan.fs
> +++ b/slof/fs/pci-scan.fs
> @@ -115,6 +115,7 @@ here 100 allot CONSTANT pci-device-vec
>          THEN                                    \ FI
>          10 rshift                               \ keep upper 16 bits
>          pci-max-mem @ FFFF0000 and or           \ and Insert mmem Limit (set it to max)
> +        1-
>          swap 24 + rtas-config-l!                \ and write it into the bridge
>  ;
>  
> 
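Reading the hunk above, the added `1-` appears to act on the packed 32-bit value that holds both halves of the bridge's base/limit pair (written at offset 0x24, which the PCI-to-PCI bridge register layout assigns to the prefetchable memory base/limit), rather than on the limit address alone. A sketch of that register encoding in C, assuming a 32-bit window; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One 32-bit config dword holds a bridge memory window:
 *   bits 15:0  = base half,  bits 31:16 = limit half.
 * In each half, bits 15:4 carry address bits 31:20. The low 20 address
 * bits read as 0x00000 for the base and 0xFFFFF for the limit, so windows
 * are 1 MiB granular. Bits 3:0 of each half (reserved, or the 32/64-bit
 * type for the prefetchable pair) are ignored here. */
static uint32_t pack_mem_window(uint32_t base, uint32_t limit)
{
	uint32_t base16 = (base >> 16) & 0xfff0;
	uint32_t limit16 = (limit >> 16) & 0xfff0;

	return (limit16 << 16) | base16;
}

/* Decode the window start from a packed dword. */
static uint32_t window_base(uint32_t dword)
{
	return (dword & 0xfff0) << 16;
}

/* Decode the window end, filling in the implied 0xFFFFF. */
static uint32_t window_limit(uint32_t dword)
{
	return (((dword >> 16) & 0xfff0) << 16) | 0xfffff;
}
```

With base 0x90100000 and limit 0xa00fffff the packed value is 0xa0009010. Subtracting 1 gives 0xa000900f, whose base decodes to 0x90000000 while the limit is unchanged, so decrementing the packed word moves the window base down by 1 MiB instead of touching the limit. For a base of 0x80000000 that would give 0x7ff00000, the start address seen in the second dump, although this sketch does not prove that is the path SLOF took.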


-- 
Alexey

* Boot problems with a PA6T board
From: Christian Zigotzky @ 2014-05-13 12:06 UTC
  To: Michael Ellerman, linuxppc-dev
In-Reply-To: <1399268891.4600.3.camel@concordia>

On 05.05.2014 07:48, Michael Ellerman wrote:
> On Sun, 2014-05-04 at 18:02 +0200, Christian Zigotzky wrote:
>> Hi All,
>>
>> The RC 1, 2, and 3 of the kernel 3.15 don't boot on my PA6T board with a
>> Radeon HD 6870 graphics card.
>>
>> Screenshot:
>> http://forum.hyperion-entertainment.biz/download/file.php?id=1060&mode=view
>>
>> The kernel 3.14 starts without any problems. Does anyone have a tip for me,
>> please?
> The line that says "starting cpu hw idx 0... failed" looks a little worrying.
> Do you see that on 3.14 as well?
>
> Otherwise bisection is probably your best bet.
>
> cheers
Hi All,

I have found out which patch is responsible for the boot problems. It's 
patch 9000c17dc0f9c910267d2661225c9d33a227b27e. Link: 
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=9000c17dc0f9c910267d2661225c9d33a227b27e

Experimental protocol:

git checkout -f 01d8885785a60ae8f4c37b0ed75bdc96d0fc6a44; git clean -fdx 
(from 02/04/14) -> Kernel boots
git checkout -f f1553174a207f68a4ec19d436003097e0a4dc405; git clean -fdx 
(from 03/04/14) -> Kernel boots
git checkout -f d40326f4b9f9617cdfd30f83a2db57d47e9c5bac; git clean -fdx 
(from 04/04/14) -> Kernel boots
git checkout -f 930b440cd8256f3861bdb0a59d26efaadac7941a; git clean -fdx 
(from 05/04/14) -> doesn't boot (rtc error)
git checkout -f 2b3a8fd735f86ebeb2b9d061054003000c36b654; git clean -fdx 
(from 06/04/14) -> doesn't boot (rtc error)
git checkout -f 26c12d93348f0bda0756aff83f4867d9ae58a5a6; git clean -fdx 
(from 07/04/14) -> doesn't boot (rtc error)
git checkout -f a6c8aff022d4d06e4b41455ae9b2a5d3d503bf76; git clean -fdx 
(from 08/04/14) -> Kernel boots
git checkout -f 035328c202d26a824b8632fd3b00635db5aee5a2; git clean -fdx 
(from 08/04/14) -> Kernel boots
git checkout -f 9000c17dc0f9c910267d2661225c9d33a227b27e; git clean -fdx 
(from 08/04/14, "powerpc/powernv: Fix endian issues with sensor code:
One OPAL call and one device tree property needed byte swapping.")
-> doesn't boot (prom_init)
git checkout -f d3d35d957a9d0733dc51f14b5abc0bff5d3c5f3a; git clean -fdx 
(from 08/04/14) -> doesn't boot (prom_init)
git checkout -f c4586256f0c440bc2bdb29d2cbb915f0ca785d26; git clean -fdx 
(from 09/04/14) -> doesn't boot (prom_init)
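What `git bisect` automates is conceptually a binary search between a known-good and a known-bad commit. A minimal sketch over a linear list, assuming exactly one good-to-bad transition; first_bad() and is_bad_from_table() are hypothetical names. Note that the log above seems to show two distinct failures (an rtc error for the 05/04 to 07/04 snapshots, then prom_init from 9000c17 onward, with booting builds in between), so a single search over the whole range could land on either transition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef bool (*bad_fn)(size_t idx, void *ctx);

/* Find the first bad commit in commits[0..n), n >= 2, given that index 0
 * is known good, index n-1 is known bad and the state flips exactly once.
 * Each is_bad() call stands for "check out, build, boot, look". */
static size_t first_bad(size_t n, bad_fn is_bad, void *ctx)
{
	size_t good = 0, bad = n - 1;	/* invariant: good is good, bad is bad */

	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (is_bad(mid, ctx))
			bad = mid;
		else
			good = mid;
	}
	return bad;
}

/* Example oracle: results recorded in a table (true = does not boot). */
static bool is_bad_from_table(size_t idx, void *ctx)
{
	const bool *table = ctx;

	return table[idx];
}
```

Each step halves the remaining range, so roughly log2(N) test boots are needed instead of N.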

I'm not a programmer, but what can I do to solve this boot problem?

Cheers,

Christian

* Re: powerpc/ppc64: Allow allmodconfig to build (finally !)
From: Guenter Roeck @ 2014-05-13 12:35 UTC
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <1399972601.17624.169.camel@pasglop>

On 05/13/2014 02:16 AM, Benjamin Herrenschmidt wrote:
> On Mon, 2014-05-12 at 17:28 -0700, Guenter Roeck wrote:
>
>> After applying this patch, I get
>>
>> arch/powerpc/kernel/exceptions-64s.S:269: Error: operand out of range
>> (0x000000000000814c is not between 0xffffffffffff8000 and 0x0000000000007ffc)
>> arch/powerpc/kernel/exceptions-64s.S:729: Error: operand out of range
>> (0x000000000000814c is not between 0xffffffffffff8000 and 0x0000000000007ffc)
>>
>> with powerpc:defconfig, powerpc:allmodconfig, powerpc:cell_defconfig, and
>> powerpc:maple_defconfig.
>>
>> This is on top of v3.15-rc5. Any idea what is going on ?
>>
>> Compiler is powerpc64-poky-linux-gcc (GCC) 4.7.2 (from poky 1.4.0-1).
>
> Interesting... works with all my test configs using 4.7.3...
>
> I don't have my tree at hand right now; I'll check what that means
> tomorrow and see if I can find a workaround.
>
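The limits in the assembler message (0xffffffffffff8000 to 0x0000000000007ffc) describe a signed 16-bit displacement that must also be a multiple of 4. That is the shape of a PowerPC conditional branch's BD field (14 bits, shifted left by 2) or of a DS-form load/store displacement; which instruction sits at lines 269 and 729 is not shown in this thread, so treat that as an assumption. A small sketch of the range check; the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a byte offset fits a signed 16-bit displacement that must also
 * be word aligned: -0x8000 .. 0x7ffc, the range gas reports above. */
static bool fits_disp16_word_aligned(int64_t off)
{
	return off >= -0x8000 && off <= 0x7ffc && off % 4 == 0;
}
```

The reported value 0x814c is aligned but 0x150 (336) bytes beyond 0x7ffc, so the target has drifted just out of reach.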

Maybe something is wrong with my toolchain. I'll try to find a more recent one.

Guenter

* Re: [PATCH 1/1] powerpc/perf: Adjust callchain based on DWARF debug info
From: Maynard Johnson @ 2014-05-13 16:15 UTC
  To: Sukadev Bhattiprolu
  Cc: maynardj, Michael Ellerman, Anton Blanchard, linux-kernel,
	Ulrich.Weigand, Arnaldo Carvalho de Melo, linuxppc-dev
In-Reply-To: <20140510024638.GA27540@us.ibm.com>

Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> wrote on 05/09/2014
09:46:38 PM:

> From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
> To: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>,
> Cc: linux-kernel@vger.kernel.org, Anton Blanchard
> <anton@au1.ibm.com>, Ulrich.Weigand@de.ibm.com, Michael Ellerman
> <michaele@au1.ibm.com>, Maynard Johnson/Rochester/IBM@IBMUS,
> linuxppc-dev@lists.ozlabs.org
> Date: 05/09/2014 09:46 PM
> Subject: [PATCH 1/1] powerpc/perf: Adjust callchain based on DWARF debug info
>
> [PATCH 1/1] powerpc/perf: Adjust callchain based on DWARF debug info

Acked-by: Maynard Johnson <maynardj@us.ibm.com>

Reviewed and tested.  Thanks, Suka.

-Maynard

>
> When saving the callchain on Power, the kernel conservatively saves excess
> entries in the callchain. A few of these entries are needed in some cases
> but not others.
>
> E.g.: the value in the link register (LR) is needed only when it holds the
> return address of a function. At other times it must be ignored.
>
> If the unnecessary entries are not ignored, we end up with duplicate arcs
> in the call-graphs.
>
> Use DWARF debug information to ignore the unnecessary entries.
>
> Callgraph before the patch:
>
>     14.67%          2234  sprintft  libc-2.18.so       [.] __random
>             |
>             --- __random
>                |
>                |--61.12%-- __random
>                |          |
>                |          |--97.15%-- rand
>                |          |          do_my_sprintf
>                |          |          main
>                |          |          generic_start_main.isra.0
>                |          |          __libc_start_main
>                |          |          0x0
>                |          |
>                |           --2.85%-- do_my_sprintf
>                |                     main
>                |                     generic_start_main.isra.0
>                |                     __libc_start_main
>                |                     0x0
>                |
>                 --38.88%-- rand
>                           |
>                           |--94.01%-- rand
>                           |          do_my_sprintf
>                           |          main
>                           |          generic_start_main.isra.0
>                           |          __libc_start_main
>                           |          0x0
>                           |
>                            --5.99%-- do_my_sprintf
>                                      main
>                                      generic_start_main.isra.0
>                                      __libc_start_main
>                                      0x0
>
> Callgraph after the patch:
>
>     14.67%          2234  sprintft  libc-2.18.so       [.] __random
>             |
>             --- __random
>                |
>                |--95.93%-- rand
>                |          do_my_sprintf
>                |          main
>                |          generic_start_main.isra.0
>                |          __libc_start_main
>                |          0x0
>                |
>                 --4.07%-- do_my_sprintf
>                           main
>                           generic_start_main.isra.0
>                           __libc_start_main
>                           0x0
>
> TODO:   For split-debug info objects like glibc, we can determine
>    the call-frame-address only when both .eh_frame and .debug_info
>    sections are available. We should be able to determine the CFA
>    even without the .eh_frame section.
>
> Thanks to Ulrich Weigand for help with DWARF debug information.
>
> Fix suggested by Anton Blanchard.
>
> Reported-by: Maynard Johnson <maynard@us.ibm.com>
> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
> ---
>  tools/perf/arch/powerpc/Makefile                |   1 +
>  tools/perf/arch/powerpc/util/adjust-callchain.c | 278 ++++++++++++++++++++++++
>  tools/perf/config/Makefile                      |   5 +
>  tools/perf/util/callchain.h                     |  12 +
>  tools/perf/util/machine.c                       |  16 +-
>  5 files changed, 310 insertions(+), 2 deletions(-)
>  create mode 100644 tools/perf/arch/powerpc/util/adjust-callchain.c
>
[snip]

* Re: [PATCH 1/1] powerpc/perf: Adjust callchain based on DWARF debug info
From: Sukadev Bhattiprolu @ 2014-05-13 16:49 UTC
  To: Maynard Johnson
  Cc: maynardj, Michael Ellerman, Anton Blanchard, linux-kernel,
	Ulrich.Weigand, Arnaldo Carvalho de Melo, linuxppc-dev
In-Reply-To: <OF1EBFFD42.19D89CD2-ON86257CD7.00591CE2-86257CD7.00594B52@us.ibm.com>

Maynard Johnson [mpjohn@us.ibm.com] wrote:
| > [PATCH 1/1] powerpc/perf: Adjust callchain based on DWARF debug info
| 
| Acked-by: Maynard Johnson <maynardj@us.ibm.com>
| 
| Reviewed and tested.  Thanks, Suka.

Thanks Maynard.  This updated patch also fixes whitespace damage.

From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Date: Fri, 9 May 2014 19:00:35 -0500
Subject: [PATCH 1/1] powerpc/perf: Adjust callchain based on DWARF debug info

When saving the callchain on Power, the kernel conservatively saves excess
entries in the callchain. A few of these entries are needed in some cases
but not others. We should use the DWARF debug information to determine
when the entries are needed.

E.g.: the value in the link register (LR) is needed only when it holds the
return address of a function. At other times it must be ignored.

If the unnecessary entries are not ignored, we end up with duplicate arcs
in the call-graphs.

Use the DWARF debug information to determine if any callchain entries
should be ignored when building call-graphs.
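The decision the patch makes can be modeled as two small pure functions: classify where the return address lives, then map that to a callchain slot to drop. A sketch assuming the slot layout described in the patch (0 PERF_CONTEXT_USER, 1 the PC, 2 the LR value, 3 the caller's caller); the enum and function names are hypothetical and only mirror the return codes documented for check_return_addr().

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the return codes documented for check_return_addr(). */
enum ra_where {
	RA_ERROR = -1,
	RA_ON_STACK = 0,	/* return address saved on the stack */
	RA_IN_LR_NO_FRAME = 1,	/* still in LR, CFA is r1 + 0 */
	RA_IN_LR_NEW_FRAME = 2,	/* still in LR, frame allocated but LR not saved */
};

/* ra_in_lr:  CFI rule for the return-address register is "same value".
 * cfa_is_r1: CFI says the CFA is r1 + 0, i.e. no new frame yet. */
static enum ra_where classify_return_addr(bool ra_in_lr, bool cfa_is_r1)
{
	if (!ra_in_lr)
		return RA_ON_STACK;
	return cfa_is_r1 ? RA_IN_LR_NO_FRAME : RA_IN_LR_NEW_FRAME;
}

/* Index of the callchain entry to ignore, or -1 for none. */
static int callchain_slot_to_skip(enum ra_where where)
{
	switch (where) {
	case RA_ON_STACK:
		return 2;	/* the LR value is stale */
	case RA_IN_LR_NEW_FRAME:
		return 3;	/* slot 3 holds nothing meaningful */
	default:
		return -1;	/* nothing to skip, or lookup failed */
	}
}
```

Keeping the classification separate from the slot mapping makes the error path explicit: any failure falls through to -1 and the callchain is used unchanged.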

Callgraph before the patch:

    14.67%          2234  sprintft  libc-2.18.so       [.] __random
            |
            --- __random
               |
               |--61.12%-- __random
               |          |
               |          |--97.15%-- rand
               |          |          do_my_sprintf
               |          |          main
               |          |          generic_start_main.isra.0
               |          |          __libc_start_main
               |          |          0x0
               |          |
               |           --2.85%-- do_my_sprintf
               |                     main
               |                     generic_start_main.isra.0
               |                     __libc_start_main
               |                     0x0
               |
                --38.88%-- rand
                          |
                          |--94.01%-- rand
                          |          do_my_sprintf
                          |          main
                          |          generic_start_main.isra.0
                          |          __libc_start_main
                          |          0x0
                          |
                           --5.99%-- do_my_sprintf
                                     main
                                     generic_start_main.isra.0
                                     __libc_start_main
                                     0x0

Callgraph after the patch:

    14.67%          2234  sprintft  libc-2.18.so       [.] __random
            |
            --- __random
               |
               |--95.93%-- rand
               |          do_my_sprintf
               |          main
               |          generic_start_main.isra.0
               |          __libc_start_main
               |          0x0
               |
                --4.07%-- do_my_sprintf
                          main
                          generic_start_main.isra.0
                          __libc_start_main
                          0x0

TODO:	For split-debug info objects like glibc, we can determine
	the call-frame-address only when both .eh_frame and .debug_info
	sections are available. We should be able to determine the CFA
	even without the .eh_frame section.

Fix suggested by Anton Blanchard.

Thanks to valuable input on DWARF debug information from Ulrich Weigand.

Reported-by: Maynard Johnson <maynard@us.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Tested-by: Maynard Johnson <maynardj@us.ibm.com>
Acked-by: Maynard Johnson <maynardj@us.ibm.com>
---
 tools/perf/arch/powerpc/Makefile                |   1 +
 tools/perf/arch/powerpc/util/adjust-callchain.c | 276 ++++++++++++++++++++++++
 tools/perf/config/Makefile                      |   5 +
 tools/perf/util/callchain.h                     |  12 ++
 tools/perf/util/machine.c                       |  16 +-
 5 files changed, 308 insertions(+), 2 deletions(-)
 create mode 100644 tools/perf/arch/powerpc/util/adjust-callchain.c

diff --git a/tools/perf/arch/powerpc/Makefile b/tools/perf/arch/powerpc/Makefile
index 744e629..512cc8d 100644
--- a/tools/perf/arch/powerpc/Makefile
+++ b/tools/perf/arch/powerpc/Makefile
@@ -3,3 +3,4 @@ PERF_HAVE_DWARF_REGS := 1
 LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/dwarf-regs.o
 endif
 LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/header.o
+LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/adjust-callchain.o
diff --git a/tools/perf/arch/powerpc/util/adjust-callchain.c b/tools/perf/arch/powerpc/util/adjust-callchain.c
new file mode 100644
index 0000000..0689dd8
--- /dev/null
+++ b/tools/perf/arch/powerpc/util/adjust-callchain.c
@@ -0,0 +1,276 @@
+/*
+ * Use DWARF Debug information to skip unnecessary callchain entries.
+ *
+ * Copyright (C) 2014 Sukadev Bhattiprolu, IBM Corporation.
+ * Copyright (C) 2014 Ulrich Weigand, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#include <inttypes.h>
+#include <dwarf.h>
+#include <elfutils/libdwfl.h>
+
+#include "util/thread.h"
+#include "util/callchain.h"
+
+/*
+ * When saving the callchain on Power, the kernel conservatively saves
+ * excess entries in the callchain. A few of these entries are needed
+ * in some cases but not others. If the unnecessary entries are not
+ * ignored, we end up with duplicate arcs in the call-graphs. Use
+ * DWARF debug information to skip over any unnecessary callchain
+ * entries.
+ *
+ * See function header for arch_adjust_callchain() below for more details.
+ *
+ * The libdwfl code in this file is based on code from elfutils
+ * (libdwfl/argp-std.c, libdwfl/tests/addrcfi.c, etc).
+ */
+static char *debuginfo_path;
+
+static const Dwfl_Callbacks offline_callbacks = {
+	.debuginfo_path = &debuginfo_path,
+	.find_debuginfo = dwfl_standard_find_debuginfo,
+	.section_address = dwfl_offline_section_address,
+};
+
+
+/*
+ * Use the DWARF expression for the Call-frame-address and determine
+ * if return address is in LR and if a new frame was allocated.
+ */
+static int check_return_reg(int ra_regno, Dwarf_Frame *frame)
+{
+	Dwarf_Op ops_mem[2];
+	Dwarf_Op dummy;
+	Dwarf_Op *ops = &dummy;
+	size_t nops;
+	int result;
+
+	result = dwarf_frame_register(frame, ra_regno, ops_mem, &ops, &nops);
+	if (result < 0) {
+		pr_debug("dwarf_frame_register() %s\n", dwarf_errmsg(-1));
+		return -1;
+	}
+
+	/*
+	 * Check if return address is on the stack.
+	 */
+	if (nops != 0 || ops != NULL)
+		return 0;
+
+	/*
+	 * Return address is in LR. Check if a frame was allocated
+	 * but not yet used.
+	 */
+	result = dwarf_frame_cfa(frame, &ops, &nops);
+	if (result < 0) {
+		pr_debug("dwarf_frame_cfa() returns %d, %s\n", result,
+					dwarf_errmsg(-1));
+		return -1;
+	}
+
+	/*
+	 * If call frame address is in r1, no new frame was allocated.
+	 */
+	if (nops == 1 && ops[0].atom == DW_OP_bregx && ops[0].number == 1 &&
+				ops[0].number2 == 0)
+		return 1;
+
+	/*
+	 * A new frame was allocated but has not yet been used.
+	 */
+	return 2;
+}
+
+/*
+ * Get the DWARF frame from the .eh_frame section.
+ */
+static Dwarf_Frame *get_eh_frame(Dwfl_Module *mod, Dwarf_Addr pc)
+{
+	int		result;
+	Dwarf_Addr	bias;
+	Dwarf_CFI	*cfi;
+	Dwarf_Frame	*frame;
+
+	cfi = dwfl_module_eh_cfi(mod, &bias);
+	if (!cfi) {
+		pr_debug("%s(): no CFI - %s\n", __func__, dwfl_errmsg(-1));
+		return NULL;
+	}
+
+	result = dwarf_cfi_addrframe(cfi, pc, &frame);
+	if (result) {
+		pr_debug("%s(): %s\n", __func__, dwfl_errmsg(-1));
+		return NULL;
+	}
+
+	return frame;
+}
+
+/*
+ * Get the DWARF frame from the .debug_frame section.
+ */
+static Dwarf_Frame *get_dwarf_frame(Dwfl_Module *mod, Dwarf_Addr pc)
+{
+	Dwarf_CFI       *cfi;
+	Dwarf_Addr      bias;
+	Dwarf_Frame     *frame;
+	int             result;
+
+	cfi = dwfl_module_dwarf_cfi(mod, &bias);
+	if (!cfi) {
+		pr_debug("%s(): no CFI - %s\n", __func__, dwfl_errmsg(-1));
+		return NULL;
+	}
+
+	result = dwarf_cfi_addrframe(cfi, pc, &frame);
+	if (result) {
+		pr_debug("%s(): %s\n", __func__, dwfl_errmsg(-1));
+		return NULL;
+	}
+
+	return frame;
+}
+
+/*
+ * Return:
+ *	0 if return address for the program counter @pc is on stack
+ *	1 if return address is in LR and no new stack frame was allocated
+ *	2 if return address is in LR and a new frame was allocated (but not
+ *		yet used)
+ *	-1 in case of errors
+ */
+static int check_return_addr(const char *exec_file, Dwarf_Addr pc)
+{
+	Dwfl		*dwfl;
+	Dwfl_Module	*mod;
+	Dwarf_Frame	*frame;
+	int		ra_regno;
+	Dwarf_Addr	start = pc;
+	Dwarf_Addr	end = pc;
+	bool		signalp;
+
+	dwfl = dwfl_begin(&offline_callbacks);
+	if (!dwfl) {
+		pr_debug("dwfl_begin() failed: %s\n", dwarf_errmsg(-1));
+		return -1;
+	}
+
+	if (dwfl_report_offline(dwfl, "",  exec_file, -1) == NULL) {
+		pr_debug("dwfl_report_offline() failed %s\n", dwarf_errmsg(-1));
+		return -1;
+	}
+
+	mod = dwfl_addrmodule(dwfl, pc);
+	if (!mod) {
+		pr_debug("dwfl_addrmodule() failed, %s\n", dwarf_errmsg(-1));
+		return -1;
+	}
+
+	/*
+	 * To work with split debug info files (eg: glibc), check both
+	 * .eh_frame and .debug_frame sections of the ELF file.
+	 */
+	frame = get_eh_frame(mod, pc);
+	if (!frame) {
+		frame = get_dwarf_frame(mod, pc);
+		if (!frame)
+			return -1;
+	}
+
+	ra_regno = dwarf_frame_info(frame, &start, &end, &signalp);
+	if (ra_regno < 0) {
+		pr_debug("Return address register unavailable: %s\n",
+				dwarf_errmsg(-1));
+		return -1;
+	}
+
+	return check_return_reg(ra_regno, frame);
+}
+
+/*
+ * The callchain saved by the kernel always includes the link register (LR).
+ *
+ *	0:	PERF_CONTEXT_USER
+ *	1:	Program counter (Next instruction pointer)
+ *	2:	LR value
+ *	3:	Caller's caller
+ *	4:	...
+ *
+ * The value in LR is only needed when it holds a return address. If the
+ * return address is on the stack, we should ignore the LR value.
+ *
+ * Further, when the return address is in the LR, if a new frame was just
+ * allocated but the LR was not saved into it, then the LR contains the
+ * caller, slot 4: contains the caller's caller and the contents of slot 3:
+ * (chain->ips[3]) is undefined and must be ignored.
+ *
+ * Use DWARF debug information to determine if any entries need to be skipped.
+ *
+ * Return:
+ *	index:	of callchain entry that needs to be ignored (if any)
+ *	-1	if no entry needs to be ignored or in case of errors
+ *
+ * TODO:
+ *	Rather than returning an index into the callchain and have the
+ *	caller skip that entry, we could modify the callchain in-place
+ *	by putting a PERF_CONTEXT_IGNORE marker in the affected entry.
+ *
+	 *	But @chain points into a read-only mmap, so the caller needs to
+ *	duplicate the callchain to modify in-place - something like:
+ *
+ *		new_callchain = arch_duplicate_callchain()
+ *		arch_adjust_callchain(new_callchain)
+ *		arch_free_callchain(new_callchain)
+ *
+ *	Since we only expect to adjust <= 1 entry for now, just return
+ *	the index.
+ */
+int arch_adjust_callchain(struct machine *machine, struct thread *thread,
+				struct ip_callchain *chain)
+{
+	struct addr_location al;
+	struct dso *dso = NULL;
+	int rc;
+	u64 ip;
+	u64 skip_slot = -1;
+
+	if (chain->nr < 3)
+		return skip_slot;
+
+	ip = chain->ips[2];
+
+	thread__find_addr_location(thread, machine, PERF_RECORD_MISC_USER,
+			MAP__FUNCTION, ip, &al);
+
+	if (al.map)
+		dso = al.map->dso;
+
+	if (!dso) {
+		pr_debug("%" PRIx64 " dso is NULL\n", ip);
+		return skip_slot;
+	}
+
+	rc = check_return_addr(dso->long_name, ip);
+
+	pr_debug("DSO %s, nr %" PRIx64 ", ip 0x%" PRIx64 ", rc %d\n",
+				dso->long_name, chain->nr, ip, rc);
+
+	if (rc == 0) {
+		/*
+		 * Return address on stack. Ignore LR value in callchain
+		 */
+		skip_slot = 2;
+	} else if (rc == 2) {
+		/*
+		 * New frame allocated but return address still in LR.
+		 * Ignore the caller's caller entry in callchain.
+		 */
+		skip_slot = 3;
+	}
+	return skip_slot;
+}
diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
index 5a3c452..7e93877 100644
--- a/tools/perf/config/Makefile
+++ b/tools/perf/config/Makefile
@@ -29,11 +29,16 @@ ifeq ($(ARCH),x86)
   endif
   NO_PERF_REGS := 0
 endif
+
 ifeq ($(ARCH),arm)
   NO_PERF_REGS := 0
   LIBUNWIND_LIBS = -lunwind -lunwind-arm
 endif
 
+ifeq ($(ARCH),powerpc)
+  CFLAGS += -DHAVE_ADJUST_CALLCHAIN
+endif
+
 ifeq ($(LIBUNWIND_LIBS),)
   NO_LIBUNWIND := 1
 else
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 8ad97e9..81ecb90 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -157,4 +157,16 @@ int sample__resolve_callchain(struct perf_sample *sample, struct symbol **parent
 int hist_entry__append_callchain(struct hist_entry *he, struct perf_sample *sample);
 
 extern const char record_callchain_help[];
+
+#ifdef HAVE_ADJUST_CALLCHAIN
+extern int arch_adjust_callchain(struct machine *machine,
+			struct thread *thread, struct ip_callchain *chain);
+#else
+static inline int arch_adjust_callchain(struct machine *machine,
+			struct thread *thread, struct ip_callchain *chain)
+{
+	return -1;
+}
+#endif
+
 #endif	/* __PERF_CALLCHAIN_H */
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index a53cd0b..dce3bf0 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1271,6 +1271,7 @@ static int machine__resolve_callchain_sample(struct machine *machine,
 	int chain_nr = min(max_stack, (int)chain->nr);
 	int i;
 	int err;
+	int skip_slot;
 
 	callchain_cursor_reset(&callchain_cursor);
 
@@ -1279,14 +1280,25 @@ static int machine__resolve_callchain_sample(struct machine *machine,
 		return 0;
 	}
 
+	/*
+	 * Based on DWARF debug information, some architectures skip
+	 * some of the callchain entries saved by the kernel.
+	 */
+	skip_slot = arch_adjust_callchain(machine, thread, chain);
+
 	for (i = 0; i < chain_nr; i++) {
 		u64 ip;
 		struct addr_location al;
 
-		if (callchain_param.order == ORDER_CALLEE)
+		if (callchain_param.order == ORDER_CALLEE) {
+			if (i == skip_slot)
+				continue;
 			ip = chain->ips[i];
-		else
+		} else {
+			if ((int)(chain->nr - i - 1) == skip_slot)
+				continue;
 			ip = chain->ips[chain->nr - i - 1];
+		}
 
 		if (ip >= PERF_CONTEXT_MAX) {
 			switch (ip) {
-- 
1.8.4.2
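The machine.c hunk skips one index of the kernel's callchain in either traversal order. A simplified model of that loop, assuming a plain array of addresses; walk_callchain() is hypothetical and ignores max_stack and the PERF_CONTEXT_* markers that the real loop handles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Walk ips[0..nr) callee-first (index order) or caller-first (reverse),
 * dropping the entry whose index equals skip_slot (-1 = keep all).
 * Kept entries are copied to out[]; returns how many were kept. */
static size_t walk_callchain(const uint64_t *ips, size_t nr, int callee_first,
			     int skip_slot, uint64_t *out)
{
	size_t kept = 0;

	for (size_t i = 0; i < nr; i++) {
		size_t idx = callee_first ? i : nr - i - 1;

		/* compare kernel-order indices, as the patch does */
		if ((long)idx == skip_slot)
			continue;
		out[kept++] = ips[idx];
	}
	return kept;
}
```

The skip test is applied to the kernel-order index in both directions, which is why the patch compares `chain->nr - i - 1` in the caller-first branch. The model keeps skip_slot as a plain int; the patch's arch_adjust_callchain() stores it in a u64 and relies on the conversion back to int yielding -1.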

* Re: [PATCH V2 2/3] powerpc, ptrace: Enable support for transactional memory register sets
From: Pedro Alves @ 2014-05-13 17:13 UTC
  To: Anshuman Khandual, linuxppc-dev, linux-kernel
  Cc: michael, mikey, avagin, oleg
In-Reply-To: <1399276469-13541-3-git-send-email-khandual@linux.vnet.ibm.com>

On 05/05/14 08:54, Anshuman Khandual wrote:
> This patch enables get and set of transactional memory related register
> sets through the PTRACE_GETREGSET/PTRACE_SETREGSET interface by implementing
> four new powerpc-specific register sets, i.e. REGSET_TM_SPR, REGSET_TM_CGPR,
> REGSET_TM_CFPR and REGSET_CVMX, corresponding to the following new
> ELF core note types added previously.
> 
> 	(1) NT_PPC_TM_SPR
> 	(2) NT_PPC_TM_CGPR
> 	(3) NT_PPC_TM_CFPR
> 	(4) NT_PPC_TM_CVMX
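For context, the interface these register sets plug into is the generic PTRACE_GETREGSET request: the ELF note type goes in the addr argument and a struct iovec describing the buffer goes in data, and the kernel updates iov_len to the number of bytes it wrote. A minimal sketch, using NT_PRSTATUS as a stand-in because the NT_PPC_TM_* values are defined by this patch series and are not reproduced here; read_regset() is a hypothetical helper, and the tracee must already be stopped under ptrace.

```c
#include <assert.h>
#include <elf.h>	/* NT_PRSTATUS */
#include <stddef.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>	/* struct iovec */

/* Read one register set of a stopped tracee into buf. note_type selects
 * the set (NT_PRSTATUS here; the TM sets would use NT_PPC_TM_*). On
 * success *len is updated to the size the kernel filled in and 0 is
 * returned; on failure -1 is returned with errno set by ptrace(). */
static int read_regset(pid_t pid, unsigned int note_type, void *buf, size_t *len)
{
	struct iovec iov = { .iov_base = buf, .iov_len = *len };

	if (ptrace(PTRACE_GETREGSET, pid, (void *)(uintptr_t)note_type, &iov) == -1)
		return -1;
	*len = iov.iov_len;
	return 0;
}
```

Called on a process that is not a stopped tracee of the caller, the request fails and *len is left untouched.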

Sorry that I couldn't tell this from the code, but what does the
kernel return when the ptracer requests these registers and the
program is not in a transaction?  Specifically, I'm wondering whether
this follows the same semantics as the s390 port.

-- 
Pedro Alves

* Re: powerpc/ppc64: Allow allmodconfig to build (finally !)
From: Guenter Roeck @ 2014-05-13 17:17 UTC
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <1399972601.17624.169.camel@pasglop>

On Tue, May 13, 2014 at 07:16:41PM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2014-05-12 at 17:28 -0700, Guenter Roeck wrote:
> 
> > After applying this patch, I get
> > 
> > arch/powerpc/kernel/exceptions-64s.S:269: Error: operand out of range
> > (0x000000000000814c is not between 0xffffffffffff8000 and 0x0000000000007ffc)
> > arch/powerpc/kernel/exceptions-64s.S:729: Error: operand out of range
> > (0x000000000000814c is not between 0xffffffffffff8000 and 0x0000000000007ffc)
> > 
> > with powerpc:defconfig, powerpc:allmodconfig, powerpc:cell_defconfig, and
> > powerpc:maple_defconfig.
> > 
> > This is on top of v3.15-rc5. Any idea what is going on ?
> > 
> > Compiler is powerpc64-poky-linux-gcc (GCC) 4.7.2 (from poky 1.4.0-1).
> 
> Interesting... works with all my test configs using 4.7.3...
> 
> I don't have my tree at hand right now; I'll check what that means
> tomorrow and see if I can find a workaround.
> 
It works for me with gcc 4.8.2 (built from yocto 1.6.0).

Is asking people to use gcc 4.7.3 or later acceptable?

Guenter

* Re: [PATCH V2 2/3] powerpc, ptrace: Enable support for transactional memory register sets
From: Pedro Alves @ 2014-05-13 17:21 UTC
  To: Anshuman Khandual, linuxppc-dev, linux-kernel
  Cc: michael, mikey, Roland McGrath, avagin, oleg
In-Reply-To: <1399276469-13541-3-git-send-email-khandual@linux.vnet.ibm.com>

I wonder where people are getting Roland's address from.

Ptrace-related patches frequently end up CCed to
roland@redhat.com, but he has not been at Red Hat for a few years
now.  Roland, do you still want to be CCed on ptrace-related
issues?  If so, there's probably a script somewhere in the
kernel that needs updating.  If not, well, it'd be good
if it were updated anyway.  :-)

It's a little annoying, as Red Hat's servers outright reject
email sent from a @redhat.com address if it includes a CC/From
to a user that no longer exists in the @redhat.com domain.

-- 
Pedro Alves

On 05/05/14 08:54, Anshuman Khandual wrote:
> This patch enables getting and setting the transactional memory related
> register sets through the PTRACE_GETREGSET/PTRACE_SETREGSET interface by
> implementing four new powerpc-specific register sets, i.e. REGSET_TM_SPR,
> REGSET_TM_CGPR, REGSET_TM_CFPR and REGSET_TM_CVMX, corresponding to the
> following new ELF core note types added previously:
> 
> 	(1) NT_PPC_TM_SPR
> 	(2) NT_PPC_TM_CGPR
> 	(3) NT_PPC_TM_CFPR
> 	(4) NT_PPC_TM_CVMX
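The NT_PPC_TM_SPR payload can be read off the struct comment above tm_spr_get() in the patch below and the offsets it passes to user_regset_copyout(). It consists of seven consecutive 8-byte slots on a 64-bit kernel, matching `.n = 7, .size = sizeof(u64)` in the regset table. Here is a minimal user-space mirror of that layout, assuming LP64 (unsigned long is 64 bits). `struct tm_spr_regs` is a hypothetical name, not a UAPI type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical mirror of the NT_PPC_TM_SPR payload produced by
 * tm_spr_get(): three u64 SPRs followed by four unsigned longs.
 */
struct tm_spr_regs {
	uint64_t      tfhar;     /* byte 0 */
	uint64_t      texasr;    /* byte 8 */
	uint64_t      tfiar;     /* byte 16 */
	unsigned long orig_msr;  /* byte 24 = 3 * sizeof(u64) */
	unsigned long tar;       /* byte 32 */
	unsigned long ppr;       /* byte 40 */
	unsigned long dscr;      /* byte 48 */
};

/* The sketch only holds on LP64, as on a ppc64 kernel. */
static_assert(sizeof(unsigned long) == 8, "LP64 assumed");

/* Offsets used by the copyout calls in tm_spr_get(). */
static_assert(offsetof(struct tm_spr_regs, orig_msr) == 3 * sizeof(uint64_t),
	      "orig_msr follows the three u64 SPRs");
static_assert(offsetof(struct tm_spr_regs, dscr) ==
	      3 * sizeof(uint64_t) + 3 * sizeof(unsigned long),
	      "dscr is the last slot");

/* Seven 8-byte slots, matching .n = 7 and .size = sizeof(u64). */
static_assert(sizeof(struct tm_spr_regs) == 7 * sizeof(uint64_t),
	      "no padding");
```

On the kernel side these fields are not contiguous: the BUILD_BUG_ON in tm_spr_get() expects ckpt_regs between tm_orig_msr and tm_tar. That is why each SPR is copied with its own user_regset_copyout() call. A debugger would presumably pass a buffer of this shape to PTRACE_GETREGSET through a struct iovec, with NT_PPC_TM_SPR as the regset type.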
> 
> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/switch_to.h |   8 +
>  arch/powerpc/kernel/process.c        |  24 ++
>  arch/powerpc/kernel/ptrace.c         | 683 +++++++++++++++++++++++++++++++++--
>  3 files changed, 687 insertions(+), 28 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
> index 0e83e7d..2737f46 100644
> --- a/arch/powerpc/include/asm/switch_to.h
> +++ b/arch/powerpc/include/asm/switch_to.h
> @@ -80,6 +80,14 @@ static inline void flush_spe_to_thread(struct task_struct *t)
>  }
>  #endif
>  
> +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> +extern void flush_tmregs_to_thread(struct task_struct *);
> +#else
> +static inline void flush_tmregs_to_thread(struct task_struct *t)
> +{
> +}
> +#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
> +
>  static inline void clear_task_ebb(struct task_struct *t)
>  {
>  #ifdef CONFIG_PPC_BOOK3S_64
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 31d0215..e247898 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -695,6 +695,30 @@ static inline void __switch_to_tm(struct task_struct *prev)
>  	}
>  }
>  
> +void flush_tmregs_to_thread(struct task_struct *tsk)
> +{
> +	/*
> +	 * If task is not current, it should have been flushed
> +	 * already to its thread_struct during __switch_to().
> +	 */
> +	if (tsk != current)
> +		return;
> +
> +	preempt_disable();
> +	if (tsk->thread.regs) {
> +		/*
> +		 * If we are still current, the TM state needs to
> +		 * be flushed to thread_struct as it will still be
> +		 * present on the current CPU.
> +		 */
> +		if (MSR_TM_ACTIVE(tsk->thread.regs->msr)) {
> +			__switch_to_tm(tsk);
> +			tm_recheckpoint_new_task(tsk);
> +		}
> +	}
> +	preempt_enable();
> +}
> +
>  /*
>   * This is called if we are on the way out to userspace and the
>   * TIF_RESTORE_TM flag is set.  It checks if we need to reload
> diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
> index 2e3d2bf..92faded 100644
> --- a/arch/powerpc/kernel/ptrace.c
> +++ b/arch/powerpc/kernel/ptrace.c
> @@ -357,6 +357,17 @@ static int gpr_set(struct task_struct *target, const struct user_regset *regset,
>  	return ret;
>  }
>  
> +/*
> + * When any transaction is active, "thread_struct->transact_fp" holds
> + * the current running value of all FPR registers and "thread_struct->
> + * fp_state" holds the last checkpointed FPR registers state for the
> + * current transaction.
> + *
> + * struct data {
> + * 	u64	fpr[32];
> + * 	u64	fpscr;
> + * };
> + */
>  static int fpr_get(struct task_struct *target, const struct user_regset *regset,
>  		   unsigned int pos, unsigned int count,
>  		   void *kbuf, void __user *ubuf)
> @@ -365,21 +376,41 @@ static int fpr_get(struct task_struct *target, const struct user_regset *regset,
>  	u64 buf[33];
>  	int i;
>  #endif
> -	flush_fp_to_thread(target);
> +	if (MSR_TM_ACTIVE(target->thread.regs->msr)) {
> +		flush_fp_to_thread(target);
> +		flush_altivec_to_thread(target);
> +		flush_tmregs_to_thread(target);
> +	} else {
> +		flush_fp_to_thread(target);
> +	}
>  
>  #ifdef CONFIG_VSX
>  	/* copy to local buffer then write that out */
> -	for (i = 0; i < 32 ; i++)
> -		buf[i] = target->thread.TS_FPR(i);
> -	buf[32] = target->thread.fp_state.fpscr;
> +	if (MSR_TM_ACTIVE(target->thread.regs->msr)) {
> +		for (i = 0; i < 32 ; i++)
> +			buf[i] = target->thread.TS_TRANS_FPR(i);
> +		buf[32] = target->thread.transact_fp.fpscr;
> +	} else {
> +		for (i = 0; i < 32 ; i++)
> +			buf[i] = target->thread.TS_FPR(i);
> +		buf[32] = target->thread.fp_state.fpscr;
> +	}
>  	return user_regset_copyout(&pos, &count, &kbuf, &ubuf, buf, 0, -1);
>  
>  #else
> -	BUILD_BUG_ON(offsetof(struct thread_fp_state, fpscr) !=
> -		     offsetof(struct thread_fp_state, fpr[32][0]));
> +	if (MSR_TM_ACTIVE(target->thread.regs->msr)) {
> +		BUILD_BUG_ON(offsetof(struct thread_fp_state, fpscr) !=
> +				offsetof(struct thread_fp_state, fpr[32][0]));
>  
> -	return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
> +		return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
> +				   &target->thread.transact_fp, 0, -1);
> +	} else {
> +		BUILD_BUG_ON(offsetof(struct thread_fp_state, fpscr) !=
> +			     offsetof(struct thread_fp_state, fpr[32][0]));
> +
> +		return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
>  				   &target->thread.fp_state, 0, -1);
> +	}
>  #endif
>  }
>  
> @@ -391,23 +422,44 @@ static int fpr_set(struct task_struct *target, const struct user_regset *regset,
>  	u64 buf[33];
>  	int i;
>  #endif
> -	flush_fp_to_thread(target);
> +	if (MSR_TM_ACTIVE(target->thread.regs->msr)) {
> +		flush_fp_to_thread(target);
> +		flush_altivec_to_thread(target);
> +		flush_tmregs_to_thread(target);
> +	} else {
> +		flush_fp_to_thread(target);
> +	}
>  
>  #ifdef CONFIG_VSX
>  	/* copy to local buffer then write that out */
>  	i = user_regset_copyin(&pos, &count, &kbuf, &ubuf, buf, 0, -1);
>  	if (i)
>  		return i;
> -	for (i = 0; i < 32 ; i++)
> -		target->thread.TS_FPR(i) = buf[i];
> -	target->thread.fp_state.fpscr = buf[32];
> +	for (i = 0; i < 32 ; i++) {
> +		if (MSR_TM_ACTIVE(target->thread.regs->msr))
> +			target->thread.TS_TRANS_FPR(i) = buf[i];
> +		else
> +			target->thread.TS_FPR(i) = buf[i];
> +	}
> +	if (MSR_TM_ACTIVE(target->thread.regs->msr))
> +		target->thread.transact_fp.fpscr = buf[32];
> +	else
> +		target->thread.fp_state.fpscr = buf[32];
>  	return 0;
>  #else
> -	BUILD_BUG_ON(offsetof(struct thread_fp_state, fpscr) !=
> -		     offsetof(struct thread_fp_state, fpr[32][0]));
> +	if (MSR_TM_ACTIVE(target->thread.regs->msr)) {
> +		BUILD_BUG_ON(offsetof(struct thread_fp_state, fpscr) !=
> +			     offsetof(struct thread_fp_state, fpr[32][0]));
>  
> -	return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
> -				  &target->thread.fp_state, 0, -1);
> +		return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
> +					  &target->thread.transact_fp, 0, -1);
> +	} else {
> +		BUILD_BUG_ON(offsetof(struct thread_fp_state, fpscr) !=
> +			     offsetof(struct thread_fp_state, fpr[32][0]));
> +
> +		return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
> +				&target->thread.fp_state, 0, -1);
> +	}
>  #endif
>  }
>  
> @@ -432,20 +484,44 @@ static int vr_active(struct task_struct *target,
>  	return target->thread.used_vr ? regset->n : 0;
>  }
>  
> +/*
> + * When any transaction is active, "thread_struct->transact_vr" holds
> + * the current running value of all VMX registers and "thread_struct->
> + * vr_state" holds the last checkpointed value of VMX registers for the
> + * current transaction.
> + *
> + * struct data {
> + * 	vector128	vr[32];
> + * 	vector128	vscr;
> + * 	vector128	vrsave;
> + * };
> + */
>  static int vr_get(struct task_struct *target, const struct user_regset *regset,
>  		  unsigned int pos, unsigned int count,
>  		  void *kbuf, void __user *ubuf)
>  {
>  	int ret;
> +	struct thread_vr_state *addr;
>  
> -	flush_altivec_to_thread(target);
> +	if (MSR_TM_ACTIVE(target->thread.regs->msr)) {
> +		flush_fp_to_thread(target);
> +		flush_altivec_to_thread(target);
> +		flush_tmregs_to_thread(target);
> +	} else {
> +		flush_altivec_to_thread(target);
> +	}
>  
>  	BUILD_BUG_ON(offsetof(struct thread_vr_state, vscr) !=
>  		     offsetof(struct thread_vr_state, vr[32]));
>  
> +	if (MSR_TM_ACTIVE(target->thread.regs->msr))
> +		addr = &target->thread.transact_vr;
> +	else
> +		addr = &target->thread.vr_state;
> +
>  	ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
> -				  &target->thread.vr_state, 0,
> -				  33 * sizeof(vector128));
> +				addr, 0, 33 * sizeof(vector128));
> +
>  	if (!ret) {
>  		/*
>  		 * Copy out only the low-order word of vrsave.
> @@ -455,11 +531,14 @@ static int vr_get(struct task_struct *target, const struct user_regset *regset,
>  			u32 word;
>  		} vrsave;
>  		memset(&vrsave, 0, sizeof(vrsave));
> -		vrsave.word = target->thread.vrsave;
> +		if (MSR_TM_ACTIVE(target->thread.regs->msr))
> +			vrsave.word = target->thread.transact_vrsave;
> +		else
> +			vrsave.word = target->thread.vrsave;
> +
>  		ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, &vrsave,
>  					  33 * sizeof(vector128), -1);
>  	}
> -
>  	return ret;
>  }
>  
> @@ -467,16 +546,27 @@ static int vr_set(struct task_struct *target, const struct user_regset *regset,
>  		  unsigned int pos, unsigned int count,
>  		  const void *kbuf, const void __user *ubuf)
>  {
> +	struct thread_vr_state *addr;
>  	int ret;
>  
> -	flush_altivec_to_thread(target);
> +	if (MSR_TM_ACTIVE(target->thread.regs->msr)) {
> +		flush_fp_to_thread(target);
> +		flush_altivec_to_thread(target);
> +		flush_tmregs_to_thread(target);
> +	} else {
> +		flush_altivec_to_thread(target);
> +	}
>  
>  	BUILD_BUG_ON(offsetof(struct thread_vr_state, vscr) !=
>  		     offsetof(struct thread_vr_state, vr[32]));
>  
> +	if (MSR_TM_ACTIVE(target->thread.regs->msr))
> +		addr = &target->thread.transact_vr;
> +	else
> +		addr = &target->thread.vr_state;
>  	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
> -				 &target->thread.vr_state, 0,
> -				 33 * sizeof(vector128));
> +			addr, 0, 33 * sizeof(vector128));
> +
>  	if (!ret && count > 0) {
>  		/*
>  		 * We use only the first word of vrsave.
> @@ -486,13 +576,21 @@ static int vr_set(struct task_struct *target, const struct user_regset *regset,
>  			u32 word;
>  		} vrsave;
>  		memset(&vrsave, 0, sizeof(vrsave));
> -		vrsave.word = target->thread.vrsave;
> +
> +		if (MSR_TM_ACTIVE(target->thread.regs->msr))
> +			vrsave.word = target->thread.transact_vrsave;
> +		else
> +			vrsave.word = target->thread.vrsave;
> +
>  		ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &vrsave,
>  					 33 * sizeof(vector128), -1);
> -		if (!ret)
> -			target->thread.vrsave = vrsave.word;
> +		if (!ret) {
> +			if (MSR_TM_ACTIVE(target->thread.regs->msr))
> +				target->thread.transact_vrsave = vrsave.word;
> +			else
> +				target->thread.vrsave = vrsave.word;
> +		}
>  	}
> -
>  	return ret;
>  }
>  #endif /* CONFIG_ALTIVEC */
> @@ -613,6 +711,347 @@ static int evr_set(struct task_struct *target, const struct user_regset *regset,
>  }
>  #endif /* CONFIG_SPE */
>  
> +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> +
> +/*
> + *  Transactional memory SPR
> + *
> + * struct {
> + * 	u64		tm_tfhar;
> + *	u64		tm_texasr;
> + *	u64		tm_tfiar;
> + *	unsigned long	tm_orig_msr;
> + * 	unsigned long	tm_tar;
> + *	unsigned long	tm_ppr;
> + *	unsigned long	tm_dscr;
> + * };
> + */
> +static int tm_spr_get(struct task_struct *target, const struct user_regset *regset,
> +		   unsigned int pos, unsigned int count,
> +		   void *kbuf, void __user *ubuf)
> +{
> +	int ret;
> +
> +	flush_fp_to_thread(target);
> +	flush_altivec_to_thread(target);
> +	flush_tmregs_to_thread(target);
> +
> +	/* TFHAR register */
> +	ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
> +			&target->thread.tm_tfhar, 0, sizeof(u64));
> +
> +	BUILD_BUG_ON(offsetof(struct thread_struct, tm_tfhar) +
> +			sizeof(u64) != offsetof(struct thread_struct, tm_texasr));
> +
> +	/* TEXASR register */
> +	if (!ret)
> +		ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
> +			&target->thread.tm_texasr, sizeof(u64), 2 * sizeof(u64));
> +
> +	BUILD_BUG_ON(offsetof(struct thread_struct, tm_texasr) +
> +			sizeof(u64) != offsetof(struct thread_struct, tm_tfiar));
> +
> +	/* TFIAR register */
> +	if (!ret)
> +		ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
> +			&target->thread.tm_tfiar, 2 * sizeof(u64), 3 * sizeof(u64));
> +
> +	BUILD_BUG_ON(offsetof(struct thread_struct, tm_tfiar) +
> +			sizeof(u64) != offsetof(struct thread_struct, tm_orig_msr));
> +
> +	/* TM checkpointed original MSR */
> +	if (!ret)
> +		ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
> +			&target->thread.tm_orig_msr, 3 * sizeof(u64),
> +				3 * sizeof(u64) + sizeof(unsigned long));
> +
> +	BUILD_BUG_ON(offsetof(struct thread_struct, tm_orig_msr) +
> +			sizeof(unsigned long) + sizeof(struct pt_regs)
> +				!= offsetof(struct thread_struct, tm_tar));
> +
> +	/* TM checkpointed TAR register */
> +	if (!ret)
> +		ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
> +			&target->thread.tm_tar, 3 * sizeof(u64) +
> +				sizeof(unsigned long) , 3 * sizeof(u64) +
> +					2 * sizeof(unsigned long));
> +
> +	BUILD_BUG_ON(offsetof(struct thread_struct, tm_tar)
> +			+ sizeof(unsigned long) !=
> +				offsetof(struct thread_struct, tm_ppr));
> +
> +	/* TM checkpointed PPR register */
> +	if (!ret)
> +		ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
> +				&target->thread.tm_ppr, 3 * sizeof(u64) +
> +					2 * sizeof(unsigned long), 3 * sizeof(u64) +
> +						3 * sizeof(unsigned long));
> +
> +	BUILD_BUG_ON(offsetof(struct thread_struct, tm_ppr) +
> +			sizeof(unsigned long) !=
> +				offsetof(struct thread_struct, tm_dscr));
> +
> +	/* TM checkpointed DSCR register */
> +	if (!ret)
> +		ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
> +			&target->thread.tm_dscr, 3 * sizeof(u64)
> +				+ 3 * sizeof(unsigned long), 3 * sizeof(u64)
> +						+ 4 * sizeof(unsigned long));
> +	return ret;
> +}
> +
> +static int tm_spr_set(struct task_struct *target, const struct user_regset *regset,
> +		   unsigned int pos, unsigned int count,
> +		   const void *kbuf, const void __user *ubuf)
> +{
> +	int ret;
> +
> +	flush_fp_to_thread(target);
> +	flush_altivec_to_thread(target);
> +	flush_tmregs_to_thread(target);
> +
> +	/* TFHAR register */
> +	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
> +				&target->thread.tm_tfhar, 0, sizeof(u64));
> +
> +	BUILD_BUG_ON(offsetof(struct thread_struct, tm_tfhar)
> +		+ sizeof(u64) != offsetof(struct thread_struct, tm_texasr));
> +
> +	/* TEXASR register */
> +	if (!ret)
> +		ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
> +			&target->thread.tm_texasr, sizeof(u64), 2 * sizeof(u64));
> +
> +	BUILD_BUG_ON(offsetof(struct thread_struct, tm_texasr)
> +		+ sizeof(u64) != offsetof(struct thread_struct, tm_tfiar));
> +
> +	/* TFIAR register */
> +	if (!ret)
> +		ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
> +			&target->thread.tm_tfiar, 2 * sizeof(u64), 3 * sizeof(u64));
> +
> +	BUILD_BUG_ON(offsetof(struct thread_struct, tm_tfiar)
> +		+ sizeof(u64) != offsetof(struct thread_struct, tm_orig_msr));
> +
> +	/* TM checkpointed orig MSR */
> +	if (!ret)
> +		ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
> +			&target->thread.tm_orig_msr, 3 * sizeof(u64),
> +				3 * sizeof(u64) + sizeof(unsigned long));
> +
> +	BUILD_BUG_ON(offsetof(struct thread_struct, tm_orig_msr)
> +		+ sizeof(unsigned long) + sizeof(struct pt_regs) !=
> +			offsetof(struct thread_struct, tm_tar));
> +
> +	/* TM checkpointed TAR register */
> +	if (!ret)
> +		ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
> +			&target->thread.tm_tar, 3 * sizeof(u64) +
> +				sizeof(unsigned long), 3 * sizeof(u64) +
> +					2 * sizeof(unsigned long));
> +
> +	BUILD_BUG_ON(offsetof(struct thread_struct, tm_tar)
> +			+ sizeof(unsigned long) != offsetof(struct thread_struct, tm_ppr));
> +
> +	/* TM checkpointed PPR register */
> +	if (!ret)
> +		ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
> +				&target->thread.tm_ppr, 3 * sizeof(u64)
> +					+ 2 * sizeof(unsigned long), 3 * sizeof(u64)
> +					+ 3 * sizeof(unsigned long));
> +
> +	BUILD_BUG_ON(offsetof(struct thread_struct, tm_ppr) +
> +			sizeof(unsigned long) !=
> +				offsetof(struct thread_struct, tm_dscr));
> +
> +	/* TM checkpointed DSCR register */
> +	if (!ret)
> +		ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
> +				&target->thread.tm_dscr,
> +					3 * sizeof(u64) + 3 * sizeof(unsigned long),
> +					3 * sizeof(u64) + 4 * sizeof(unsigned long));
> +
> +	return ret;
> +}
> +
> +/*
> + * TM Checkpointed GPR
> + *
> + * struct data {
> + * 	struct pt_regs ckpt_regs;
> + * };
> + */
> +static int tm_cgpr_get(struct task_struct *target, const struct user_regset *regset,
> +		   unsigned int pos, unsigned int count,
> +		   void *kbuf, void __user *ubuf)
> +{
> +	int ret;
> +
> +	flush_fp_to_thread(target);
> +	flush_altivec_to_thread(target);
> +	flush_tmregs_to_thread(target);
> +	ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
> +			&target->thread.ckpt_regs, 0,
> +				sizeof(struct pt_regs));
> +	return ret;
> +}
> +
> +static int tm_cgpr_set(struct task_struct *target, const struct user_regset *regset,
> +		   unsigned int pos, unsigned int count,
> +		   const void *kbuf, const void __user *ubuf)
> +{
> +	int ret;
> +
> +	flush_fp_to_thread(target);
> +	flush_altivec_to_thread(target);
> +	flush_tmregs_to_thread(target);
> +	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
> +					&target->thread.ckpt_regs, 0,
> +						sizeof(struct pt_regs));
> +	return ret;
> +}
> +
> +/*
> + * TM Checkpointed FPR
> + *
> + * struct data {
> + * 	u64	fpr[32];
> + * 	u64	fpscr;
> + * };
> + */
> +static int tm_cfpr_get(struct task_struct *target, const struct user_regset *regset,
> +		   unsigned int pos, unsigned int count,
> +		   void *kbuf, void __user *ubuf)
> +{
> +#ifdef CONFIG_VSX
> +	u64 buf[33];
> +	int i;
> +#endif
> +	flush_fp_to_thread(target);
> +	flush_altivec_to_thread(target);
> +	flush_tmregs_to_thread(target);
> +
> +#ifdef CONFIG_VSX
> +	/* copy to local buffer then write that out */
> +	for (i = 0; i < 32 ; i++)
> +		buf[i] = target->thread.TS_FPR(i);
> +	buf[32] = target->thread.fp_state.fpscr;
> +	return user_regset_copyout(&pos, &count, &kbuf, &ubuf, buf, 0, -1);
> +
> +#else
> +	BUILD_BUG_ON(offsetof(struct thread_fp_state, fpscr) !=
> +		offsetof(struct thread_fp_state, fpr[32][0]));
> +
> +	return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
> +			&target->thread.fp_state, 0, -1);
> +#endif
> +}
> +
> +static int tm_cfpr_set(struct task_struct *target, const struct user_regset *regset,
> +		   unsigned int pos, unsigned int count,
> +		   const void *kbuf, const void __user *ubuf)
> +{
> +#ifdef CONFIG_VSX
> +	u64 buf[33];
> +	int i;
> +#endif
> +	flush_fp_to_thread(target);
> +	flush_altivec_to_thread(target);
> +	flush_tmregs_to_thread(target);
> +
> +#ifdef CONFIG_VSX
> +	/* copy to local buffer then write that out */
> +	i = user_regset_copyin(&pos, &count, &kbuf, &ubuf, buf, 0, -1);
> +	if (i)
> +		return i;
> +	for (i = 0; i < 32 ; i++)
> +		target->thread.TS_FPR(i) = buf[i];
> +	target->thread.fp_state.fpscr = buf[32];
> +	return 0;
> +#else
> +	BUILD_BUG_ON(offsetof(struct thread_fp_state, fpscr) !=
> +		      offsetof(struct thread_fp_state, fpr[32][0]));
> +
> +	return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
> +				&target->thread.fp_state, 0, -1);
> +#endif
> +}
> +
> +/*
> + * TM Checkpointed VMX
> + *
> + * struct data {
> + * 	vector128	vr[32];
> + * 	vector128	vscr;
> + * 	vector128	vrsave;
> + *};
> + */
> +static int tm_cvmx_get(struct task_struct *target, const struct user_regset *regset,
> +		   unsigned int pos, unsigned int count,
> +		   void *kbuf, void __user *ubuf)
> +{
> +	int ret;
> +
> +	flush_fp_to_thread(target);
> +	flush_altivec_to_thread(target);
> +	flush_tmregs_to_thread(target);
> +
> +	BUILD_BUG_ON(offsetof(struct thread_vr_state, vscr) !=
> +		     offsetof(struct thread_vr_state, vr[32]));
> +
> +	ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
> +				  &target->thread.vr_state, 0,
> +				  33 * sizeof(vector128));
> +	if (!ret) {
> +		/*
> +		 * Copy out only the low-order word of vrsave.
> +		 */
> +		union {
> +			elf_vrreg_t reg;
> +			u32 word;
> +		} vrsave;
> +		memset(&vrsave, 0, sizeof(vrsave));
> +		vrsave.word = target->thread.vrsave;
> +		ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, &vrsave,
> +					  33 * sizeof(vector128), -1);
> +	}
> +	return ret;
> +}
> +
> +static int tm_cvmx_set(struct task_struct *target, const struct user_regset *regset,
> +		   unsigned int pos, unsigned int count,
> +		   const void *kbuf, const void __user *ubuf)
> +{
> +	int ret;
> +
> +	flush_fp_to_thread(target);
> +	flush_altivec_to_thread(target);
> +	flush_tmregs_to_thread(target);
> +
> +	BUILD_BUG_ON(offsetof(struct thread_vr_state, vscr) !=
> +		offsetof(struct thread_vr_state, vr[32]));
> +
> +	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
> +				 &target->thread.vr_state, 0,
> +				 33 * sizeof(vector128));
> +	if (!ret && count > 0) {
> +		/*
> +		 * We use only the first word of vrsave.
> +		 */
> +		union {
> +			elf_vrreg_t reg;
> +			u32 word;
> +		} vrsave;
> +		memset(&vrsave, 0, sizeof(vrsave));
> +		vrsave.word = target->thread.vrsave;
> +		ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &vrsave,
> +					 33 * sizeof(vector128), -1);
> +		if (!ret)
> +			target->thread.vrsave = vrsave.word;
> +	}
> +	return ret;
> +}
> +#endif	/* CONFIG_PPC_TRANSACTIONAL_MEM */
>  
>  /*
>   * These are our native regset flavors.
> @@ -629,6 +1068,12 @@ enum powerpc_regset {
>  #ifdef CONFIG_SPE
>  	REGSET_SPE,
>  #endif
> +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> +	REGSET_TM_SPR,		/* TM specific SPR */
> +	REGSET_TM_CGPR,		/* TM checkpointed GPR */
> +	REGSET_TM_CFPR,		/* TM checkpointed FPR */
> +	REGSET_TM_CVMX,		/* TM checkpointed VMX */
> +#endif
>  };
>  
>  static const struct user_regset native_regsets[] = {
> @@ -663,6 +1108,28 @@ static const struct user_regset native_regsets[] = {
>  		.active = evr_active, .get = evr_get, .set = evr_set
>  	},
>  #endif
> +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> +	[REGSET_TM_SPR] = {
> +		.core_note_type = NT_PPC_TM_SPR, .n = 7,
> +		.size = sizeof(u64), .align = sizeof(u64),
> +		.get = tm_spr_get, .set = tm_spr_set
> +	},
> +	[REGSET_TM_CGPR] = {
> +		.core_note_type = NT_PPC_TM_CGPR, .n = ELF_NGREG,
> +		.size = sizeof(long), .align = sizeof(long),
> +		.get = tm_cgpr_get, .set = tm_cgpr_set
> +	},
> +	[REGSET_TM_CFPR] = {
> +		.core_note_type = NT_PPC_TM_CFPR, .n = ELF_NFPREG,
> +		.size = sizeof(double), .align = sizeof(double),
> +		.get = tm_cfpr_get, .set = tm_cfpr_set
> +	},
> +	[REGSET_TM_CVMX] = {
> +		.core_note_type = NT_PPC_TM_CVMX, .n = 34,
> +		.size = sizeof(vector128), .align = sizeof(vector128),
> +		.get = tm_cvmx_get, .set = tm_cvmx_set
> +	},
> +#endif
>  };
>  
>  static const struct user_regset_view user_ppc_native_view = {
> @@ -803,6 +1270,145 @@ static int gpr32_set(struct task_struct *target,
>  					 (PT_TRAP + 1) * sizeof(reg), -1);
>  }
>  
> +static int tm_cgpr32_get(struct task_struct *target,
> +		     const struct user_regset *regset,
> +		     unsigned int pos, unsigned int count,
> +		     void *kbuf, void __user *ubuf)
> +{
> +	const unsigned long *regs = &target->thread.ckpt_regs.gpr[0];
> +	compat_ulong_t *k = kbuf;
> +	compat_ulong_t __user *u = ubuf;
> +	compat_ulong_t reg;
> +	int i;
> +
> +	flush_fp_to_thread(target);
> +	flush_altivec_to_thread(target);
> +	flush_tmregs_to_thread(target);
> +
> +	if (target->thread.regs == NULL)
> +		return -EIO;
> +
> +	if (!FULL_REGS(target->thread.regs)) {
> +		/* We have a partial register set.  Fill 14-31 with bogus values */
> +		for (i = 14; i < 32; i++)
> +			target->thread.regs->gpr[i] = NV_REG_POISON;
> +	}
> +
> +	pos /= sizeof(reg);
> +	count /= sizeof(reg);
> +
> +	if (kbuf)
> +		for (; count > 0 && pos < PT_MSR; --count)
> +			*k++ = regs[pos++];
> +	else
> +		for (; count > 0 && pos < PT_MSR; --count)
> +			if (__put_user((compat_ulong_t) regs[pos++], u++))
> +				return -EFAULT;
> +
> +	if (count > 0 && pos == PT_MSR) {
> +		reg = get_user_msr(target);
> +		if (kbuf)
> +			*k++ = reg;
> +		else if (__put_user(reg, u++))
> +			return -EFAULT;
> +		++pos;
> +		--count;
> +	}
> +
> +	if (kbuf)
> +		for (; count > 0 && pos < PT_REGS_COUNT; --count)
> +			*k++ = regs[pos++];
> +	else
> +		for (; count > 0 && pos < PT_REGS_COUNT; --count)
> +			if (__put_user((compat_ulong_t) regs[pos++], u++))
> +				return -EFAULT;
> +
> +	kbuf = k;
> +	ubuf = u;
> +	pos *= sizeof(reg);
> +	count *= sizeof(reg);
> +	return user_regset_copyout_zero(&pos, &count, &kbuf, &ubuf,
> +					PT_REGS_COUNT * sizeof(reg), -1);
> +}
> +
> +static int tm_cgpr32_set(struct task_struct *target,
> +		     const struct user_regset *regset,
> +		     unsigned int pos, unsigned int count,
> +		     const void *kbuf, const void __user *ubuf)
> +{
> +	unsigned long *regs = &target->thread.ckpt_regs.gpr[0];
> +	const compat_ulong_t *k = kbuf;
> +	const compat_ulong_t __user *u = ubuf;
> +	compat_ulong_t reg;
> +
> +	flush_fp_to_thread(target);
> +	flush_altivec_to_thread(target);
> +	flush_tmregs_to_thread(target);
> +
> +	if (target->thread.regs == NULL)
> +		return -EIO;
> +
> +	CHECK_FULL_REGS(target->thread.regs);
> +
> +	pos /= sizeof(reg);
> +	count /= sizeof(reg);
> +
> +	if (kbuf)
> +		for (; count > 0 && pos < PT_MSR; --count)
> +			regs[pos++] = *k++;
> +	else
> +		for (; count > 0 && pos < PT_MSR; --count) {
> +			if (__get_user(reg, u++))
> +				return -EFAULT;
> +			regs[pos++] = reg;
> +		}
> +
> +
> +	if (count > 0 && pos == PT_MSR) {
> +		if (kbuf)
> +			reg = *k++;
> +		else if (__get_user(reg, u++))
> +			return -EFAULT;
> +		set_user_msr(target, reg);
> +		++pos;
> +		--count;
> +	}
> +
> +	if (kbuf) {
> +		for (; count > 0 && pos <= PT_MAX_PUT_REG; --count)
> +			regs[pos++] = *k++;
> +		for (; count > 0 && pos < PT_TRAP; --count, ++pos)
> +			++k;
> +	} else {
> +		for (; count > 0 && pos <= PT_MAX_PUT_REG; --count) {
> +			if (__get_user(reg, u++))
> +				return -EFAULT;
> +			regs[pos++] = reg;
> +		}
> +		for (; count > 0 && pos < PT_TRAP; --count, ++pos)
> +			if (__get_user(reg, u++))
> +				return -EFAULT;
> +	}
> +
> +	if (count > 0 && pos == PT_TRAP) {
> +		if (kbuf)
> +			reg = *k++;
> +		else if (__get_user(reg, u++))
> +			return -EFAULT;
> +		set_user_trap(target, reg);
> +		++pos;
> +		--count;
> +	}
> +
> +	kbuf = k;
> +	ubuf = u;
> +	pos *= sizeof(reg);
> +	count *= sizeof(reg);
> +	return user_regset_copyin_ignore(&pos, &count, &kbuf, &ubuf,
> +					 (PT_TRAP + 1) * sizeof(reg), -1);
> +}
> +
> +
>  /*
>   * These are the regset flavors matching the CONFIG_PPC32 native set.
>   */
> @@ -831,6 +1437,28 @@ static const struct user_regset compat_regsets[] = {
>  		.active = evr_active, .get = evr_get, .set = evr_set
>  	},
>  #endif
> +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> +	[REGSET_TM_SPR] = {
> +		.core_note_type = NT_PPC_TM_SPR, .n = 7,
> +		.size = sizeof(u64), .align = sizeof(u64),
> +		.get = tm_spr_get, .set = tm_spr_set
> +	},
> +	[REGSET_TM_CGPR] = {
> +		.core_note_type = NT_PPC_TM_CGPR, .n = ELF_NGREG,
> +		.size = sizeof(long), .align = sizeof(long),
> +		.get = tm_cgpr32_get, .set = tm_cgpr32_set
> +	},
> +	[REGSET_TM_CFPR] = {
> +		.core_note_type = NT_PPC_TM_CFPR, .n = ELF_NFPREG,
> +		.size = sizeof(double), .align = sizeof(double),
> +		.get = tm_cfpr_get, .set = tm_cfpr_set
> +	},
> +	[REGSET_TM_CVMX] = {
> +		.core_note_type = NT_PPC_TM_CVMX, .n = 34,
> +		.size = sizeof(vector128), .align = sizeof(vector128),
> +		.get = tm_cvmx_get, .set = tm_cvmx_set
> +	},
> +#endif
>  };
>  
>  static const struct user_regset_view user_ppc_compat_view = {
> @@ -1754,7 +2382,6 @@ long arch_ptrace(struct task_struct *child, long request,
>  					     REGSET_SPE, 0, 35 * sizeof(u32),
>  					     datavp);
>  #endif
> -
>  	default:
>  		ret = ptrace_request(child, request, addr, data);
>  		break;
> 
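The fpr_get()/fpr_set() and vr_get()/vr_set() changes in the patch above apply one rule. While a transaction is active, the plain FPR/VMX regsets read and write the running copies (transact_fp, transact_vr), and the new TM_CFPR/TM_CVMX regsets expose the checkpointed copies (fp_state, vr_state). The following is a hedged user-space model of that selection for the FP case. The mock types are hypothetical stand-ins for thread_struct. The MSR[TS] bit positions are an assumption based on asm/reg.h, not something stated in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the two MSR[TS] bits sit at bits 33-34, as in asm/reg.h. */
#define MOCK_MSR_TS_MASK (3ULL << 33)

struct mock_fp_state {
	uint64_t fpr[32];
	uint64_t fpscr;
};

/* Hypothetical stand-in for the relevant parts of thread_struct. */
struct mock_thread {
	uint64_t msr;
	struct mock_fp_state fp_state;     /* checkpointed copy during a transaction */
	struct mock_fp_state transact_fp;  /* running copy during a transaction */
};

static bool mock_tm_active(uint64_t msr)
{
	return (msr & MOCK_MSR_TS_MASK) != 0;
}

/* Which copy the plain FPR regset uses, per the patched fpr_get(). */
static struct mock_fp_state *fpr_regset_source(struct mock_thread *t)
{
	return mock_tm_active(t->msr) ? &t->transact_fp : &t->fp_state;
}

/* The TM_CFPR regset always reports the checkpointed copy. */
static struct mock_fp_state *tm_cfpr_regset_source(struct mock_thread *t)
{
	return &t->fp_state;
}

static void check_examples(void)
{
	struct mock_thread t = { .msr = 0 };   /* no transaction */

	assert(fpr_regset_source(&t) == &t.fp_state);
	t.msr |= 1ULL << 33;                   /* set one TS bit */
	assert(fpr_regset_source(&t) == &t.transact_fp);
	assert(tm_cfpr_regset_source(&t) == &t.fp_state);
}
```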


* Re: powerpc/ppc64: Allow allmodconfig to build (finally !)
From: Guenter Roeck @ 2014-05-13 19:41 UTC
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <1399972601.17624.169.camel@pasglop>

On Tue, May 13, 2014 at 07:16:41PM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2014-05-12 at 17:28 -0700, Guenter Roeck wrote:
> 
> > After applying this patch, I get
> > 
> > arch/powerpc/kernel/exceptions-64s.S:269: Error: operand out of range
> > (0x000000000000814c is not between 0xffffffffffff8000 and 0x0000000000007ffc)
> > arch/powerpc/kernel/exceptions-64s.S:729: Error: operand out of range
> > (0x000000000000814c is not between 0xffffffffffff8000 and 0x0000000000007ffc)
> > 
> > with powerpc:defconfig, powerpc:allmodconfig, powerpc:cell_defconfig, and
> > powerpc:maple_defconfig.
> > 
> > This is on top of v3.15-rc5. Any idea what is going on ?
> > 
> > Compiler is powerpc64-poky-linux-gcc (GCC) 4.7.2 (from poky 1.4.0-1).
> 
> Interesting... works with all my test configs using 4.7.3...
> 
> I don't have my tree at hand right now, I'll check what that means
> tomorrow see if I can find a workaround.
> 

Drives me crazy. With gcc 4.8.2, powerpc:allmodconfig builds, but now I get
failures with ppc64e_defconfig and chroma_defconfig:

arch/powerpc/kernel/built-in.o: In function `exc_debug_crit_book3e':
(.text+0x165ee): relocation truncated to fit: R_PPC64_ADDR16_HI against symbol
`interrupt_base_book3e' defined in .text section in
arch/powerpc/kernel/built-in.o
arch/powerpc/kernel/built-in.o: In function `exc_debug_crit_book3e':
(.text+0x16602): relocation truncated to fit: R_PPC64_ADDR16_HI against symbol
`interrupt_end_book3e' defined in .text section in
arch/powerpc/kernel/built-in.o
arch/powerpc/kernel/built-in.o: In function `exc_debug_debug_book3e':
(.text+0x1679e): relocation truncated to fit: R_PPC64_ADDR16_HI against symbol
`interrupt_base_book3e' defined in .text section in
arch/powerpc/kernel/built-in.o
arch/powerpc/kernel/built-in.o: In function `exc_debug_debug_book3e':
(.text+0x167b2): relocation truncated to fit: R_PPC64_ADDR16_HI against symbol
`interrupt_end_book3e' defined in .text section in
arch/powerpc/kernel/built-in.o
arch/powerpc/kernel/built-in.o: In function `skpinv':
arch/powerpc/kernel/exceptions-64e.o:(.text+0x178c6): relocation truncated to
fit: R_PPC64_ADDR16_HI against `.text'+178e0
arch/powerpc/kernel/built-in.o: In function `a2_tlbinit_after_linear_map':
(.text+0x17966): relocation truncated to fit: R_PPC64_ADDR16_HI against
`.text'+17974
arch/powerpc/kernel/built-in.o: In function `.init_core_book3e':
arch/powerpc/kernel/exceptions-64e.o:(.text+0x17a7e): relocation truncated to
fit: R_PPC64_ADDR16_HI against symbol `interrupt_base_book3e' defined in .text
section in arch/powerpc/kernel/built-in.o

Worse, that happens even without your patch applied, and the patch does not
make a difference :-(.
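One possible explanation follows, offered as an assumption rather than a diagnosis. The failures appear with and without the patch, which points at the toolchain. Newer binutils are understood to check R_PPC64_ADDR16_HI (the relocation behind `@h`) for overflow, requiring the whole value to lie in the signed 32-bit range. Under that rule, a kernel text address such as the hypothetical 0xc000000000016500 below fails the check even though its bits 16..31 fit in the instruction field. The helper names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits 16..31 of an address: what a "#hi" style operand puts in the insn. */
static uint16_t hi16(uint64_t value)
{
	return (uint16_t)(value >> 16);
}

/*
 * Assumed overflow rule for a checked HI relocation: the whole 64-bit
 * value must be representable as a signed 32-bit number, i.e. bits
 * 31..63 must be all zeros or all ones.
 */
static bool hi16_reloc_fits(uint64_t value)
{
	uint64_t top = value >> 31;   /* bits 31..63, 33 bits wide */

	return top == 0 || top == (UINT64_MAX >> 31);
}

static void check_examples(void)
{
	uint64_t kernel_text = 0xc000000000016500ULL;  /* hypothetical symbol */

	assert(hi16(kernel_text) == 0x0001);     /* the field value itself is fine */
	assert(!hi16_reloc_fits(kernel_text));   /* but a checked reloc overflows */
	assert(hi16_reloc_fits(0x0000000012345678ULL));
	assert(hi16_reloc_fits(0xffffffff80000000ULL));
}
```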

Guenter


* Re: [PATCH] [resend] net: get rid of SET_ETHTOOL_OPS
From: David Miller @ 2014-05-13 21:43 UTC
  To: w-lkml
  Cc: devel, linux-s390, b.a.t.m.a.n, trivial, xen-devel, linux-rdma,
	netdev, linux-usb, linux-wireless, linux-kernel, virtualization,
	e1000-devel, bridge, devel, nios2-dev, dev, linuxppc-dev,
	linux-acenic
In-Reply-To: <20140511001231.GC7875@kaos.lebenslange-mailadresse.de>

From: Wilfried Klaebe <w-lkml@lebenslange-mailadresse.de>
Date: Sun, 11 May 2014 00:12:32 +0000

> net: get rid of SET_ETHTOOL_OPS
> 
> Dave Miller mentioned he'd like to see SET_ETHTOOL_OPS gone.
> This does that.
> 
> Mostly done via coccinelle script:
> @@
> struct ethtool_ops *ops;
> struct net_device *dev;
> @@
> -       SET_ETHTOOL_OPS(dev, ops);
> +       dev->ethtool_ops = ops;
> 
> Compile tested only, but I'd seriously wonder if this broke anything.
> 
> Suggested-by: Dave Miller <davem@davemloft.net>
> Signed-off-by: Wilfried Klaebe <w-lkml@lebenslange-mailadresse.de>

Applied to net-next, thanks.
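The Coccinelle rule in the quoted patch is a mechanical substitution, since the macro only wrapped a pointer assignment. The following self-contained sketch uses mock types (the real struct net_device and struct ethtool_ops are far larger). MOCK_SET_ETHTOOL_OPS is reconstructed from that rule, not copied from netdevice.h.

```c
#include <assert.h>

/* Mock types; only the field touched by the conversion is modelled. */
struct mock_ethtool_ops { int id; };
struct mock_net_device { const struct mock_ethtool_ops *ethtool_ops; };

/* Old style: hypothetical reconstruction of the removed helper macro. */
#define MOCK_SET_ETHTOOL_OPS(netdev, ops) ((netdev)->ethtool_ops = (ops))

static const struct mock_ethtool_ops mock_ops = { .id = 1 };

static void check_examples(void)
{
	struct mock_net_device a = { 0 }, b = { 0 };

	MOCK_SET_ETHTOOL_OPS(&a, &mock_ops);  /* before the conversion */
	b.ethtool_ops = &mock_ops;            /* after: plain assignment */
	assert(a.ethtool_ops == b.ethtool_ops);
}
```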


* Re: [PATCH] printk/of_serial: fix serial console cessation part way through boot.
From: Stephen Chivers @ 2014-05-13 22:04 UTC
  To: geert, jslaby, gregkh
  Cc: devicetree, linux-serial, linux-kernel, rob+dt, schivers,
	grant.likely, linuxppc-dev, cproctor

Commit 5f5c9ae56c38942623f69c3e6dc6ec78e4da2076
"serial_core: Unregister console in uart_remove_one_port()"
fixed a crash where a serial port was removed but
not deregistered as a console.

There is a side effect of that commit for platforms having serial consoles
and of_serial configured (CONFIG_SERIAL_OF_PLATFORM). The serial console
is disabled midway through the boot process.

This cessation of the serial console affects PowerPC computers
such as the MVME5100 and SAM440EP.

The sequence is:

	bootconsole [udbg0] enabled
	....
	serial8250/16550 driver initialises and registers its UARTs,
	one of these is the serial console.
	console [ttyS0] enabled
	....
	of_serial probes "platform" devices, registering them as it goes.
	One of these is the serial console.
	console [ttyS0] disabled.

The disabling of the serial console is due to:

	a.  unregister_console in printk not clearing the
	    CON_ENABLED bit in the console flags,
	    even though it has announced that the console is disabled; and

	b.  of_platform_serial_probe in of_serial not setting the port type
	    before it registers with serial8250_register_8250_port.

This patch ensures that the serial console is re-enabled when of_serial
registers a serial port that corresponds to the designated console.

Signed-off-by: Stephen Chivers <schivers@csc.com>
Tested-by: Stephen Chivers <schivers@csc.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [unregister_console]

===
The above failure was identified in Linux-3.15-rc2.

Tested using MVME5100 and SAM440EP PowerPC computers with
kernels built from Linux-3.15-rc5 and tty-next.

The continued operation of the serial console is vital for computers
such as the MVME5100, as that Single Board Computer does not
have any graphical/display hardware.

---
 drivers/tty/serial/of_serial.c |    1 +
 kernel/printk/printk.c         |    1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/tty/serial/of_serial.c b/drivers/tty/serial/of_serial.c
index 9924660..27981e2 100644
--- a/drivers/tty/serial/of_serial.c
+++ b/drivers/tty/serial/of_serial.c
@@ -173,6 +173,7 @@ static int of_platform_serial_probe(struct platform_device *ofdev)
 	{
 		struct uart_8250_port port8250;
 		memset(&port8250, 0, sizeof(port8250));
+		port.type = port_type;
 		port8250.port = port;
 
 		if (port.fifosize)
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 7228258..221229c 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2413,6 +2413,7 @@ int unregister_console(struct console *console)
 	if (console_drivers != NULL && console->flags & CON_CONSDEV)
 		console_drivers->flags |= CON_CONSDEV;
 
+	console->flags &= ~CON_ENABLED;
 	console_unlock();
 	console_sysfs_notify();
 	return res;


* Re: [PATCH] powerpc/pseries: relocate "config DTL" so KConfig nests properly
From: Michael Neuling @ 2014-05-13 23:04 UTC (permalink / raw)
  To: Cody P Schafer
  Cc: Deepthi Dharwar, Paul Bolle, Gavin Shan, Li Zhong, linux-kernel,
	Paul Mackerras, Srivatsa S. Bhat, linuxppc-dev
In-Reply-To: <5371C5E5.6070701@linux.vnet.ibm.com>

On Tue, 2014-05-13 at 00:12 -0700, Cody P Schafer wrote:
> On 05/12/2014 11:23 PM, Michael Neuling wrote:
> >> powerpc/pseries: relocate "config DTL" so KConfig nests properly
> >
> > I don't know what that means.  Can you describe it in more detail?
> >
>
> So the "config DTL" refers to the configuration entry.
>
> The "nests properly" refers to the indent that 'make menuconfig' shows
> when a config-option depends on the config-option preceding it.
>
> In this case, moving config DTL up so it is below config PPC_SPLPAR
> means that menuconfig will show config DTL nicely indented right below
> config PPC_SPLPAR when PPC_SPLPAR is enabled.
>
> To contrast that, right now if I enable PPC_SPLPAR in menuconfig, all I
> can immediately tell is that "something showed up further down the list
> where I wasn't looking", and I end up having to toggle the option a few
> times to figure out what showed up, or look at the KConfig to find out
> that config DTL depends on config PPC_SPLPAR.
>
> Essentially, this enables menuconfig to provide a visual hint about the
> dependencies between options.

Sounds like a good idea.  Can you repost the patch with that same info
in the commit log?

Mikey


> > Mikey
> >
> >
> > On Mon, 2014-05-12 at 20:09 -0700, Cody P Schafer wrote:
> >> Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
> >> ---
> >>   arch/powerpc/platforms/pseries/Kconfig | 20 ++++++++++----------
> >>   1 file changed, 10 insertions(+), 10 deletions(-)
> >>
> > >> diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
> >> index 2cb8b77..e00dd4d 100644
> >> --- a/arch/powerpc/platforms/pseries/Kconfig
> >> +++ b/arch/powerpc/platforms/pseries/Kconfig
> >> @@ -33,6 +33,16 @@ config PPC_SPLPAR
> >>   	  processors, that is, which share physical processors between
> >>   	  two or more partitions.
> >>
> >> +config DTL
> >> +	bool "Dispatch Trace Log"
> >> +	depends on PPC_SPLPAR && DEBUG_FS
> >> +	help
> >> +	  SPLPAR machines can log hypervisor preempt & dispatch events to a
> >> +	  kernel buffer. Saying Y here will enable logging these events,
> >> +	  which are accessible through a debugfs file.
> >> +
> >> +	  Say N if you are unsure.
> >> +
> >>   config PSERIES_MSI
> >>          bool
> >>          depends on PCI_MSI && PPC_PSERIES && EEH
> >> @@ -122,13 +132,3 @@ config HV_PERF_CTRS
> >>   	  systems. 24x7 is available on Power 8 systems.
> >>
> >>             If unsure, select Y.
> >> -
> >> -config DTL
> >> -	bool "Dispatch Trace Log"
> >> -	depends on PPC_SPLPAR && DEBUG_FS
> >> -	help
> >> -	  SPLPAR machines can log hypervisor preempt & dispatch events to a
> >> -	  kernel buffer. Saying Y here will enable logging these events,
> >> -	  which are accessible through a debugfs file.
> >> -
> >> -	  Say N if you are unsure.
> >
>


* Re: powerpc/ppc64: Allow allmodconfig to build (finally !)
From: Stephen Rothwell @ 2014-05-14  3:34 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: linuxppc-dev, Alan Modra
In-Reply-To: <20140513171749.GA9440@roeck-us.net>


Hi Guenter,

On Tue, 13 May 2014 10:17:49 -0700 Guenter Roeck <linux@roeck-us.net> wrote:
>
> On Tue, May 13, 2014 at 07:16:41PM +1000, Benjamin Herrenschmidt wrote:
> > On Mon, 2014-05-12 at 17:28 -0700, Guenter Roeck wrote:
> > 
> > > After applying this patch, I get
> > > 
> > > arch/powerpc/kernel/exceptions-64s.S:269: Error: operand out of range
> > > (0x000000000000814c is not between 0xffffffffffff8000 and 0x0000000000007ffc)
> > > arch/powerpc/kernel/exceptions-64s.S:729: Error: operand out of range
> > > (0x000000000000814c is not between 0xffffffffffff8000 and 0x0000000000007ffc)
> > > 
> > > with powerpc:defconfig, powerpc:allmodconfig, powerpc:cell_defconfig, and
> > > powerpc:maple_defconfig.
> > > 
> > > This is on top of v3.15-rc5. Any idea what is going on ?
> > > 
> > > Compiler is powerpc64-poky-linux-gcc (GCC) 4.7.2 (from poky 1.4.0-1).
> > 
> > Interesting... works with all my test configs using 4.7.3...
> > 
> > I don't have my tree at hand right now, I'll check what that means
> > tomorrow see if I can find a workaround.
> > 
> It works for me with gcc 4.8.2 (build from yocto 1.6.0).
> 
> Is asking people to use gcc 4.7.3 or later acceptable ?

OK, this appears to be an assembler bug.

$ cat test.s
	.text
x:
	.pushsection	b, "a"
	beq y
	.popsection
	.=0x80000
y:
$ /opt/cross/gcc-4.6.3-nolibc/powerpc64-linux/bin/powerpc64-linux-as --version
GNU assembler (GNU Binutils) 2.22
This assembler was configured for a target of `powerpc64-linux'.
$ /opt/cross/gcc-4.6.3-nolibc/powerpc64-linux/bin/powerpc64-linux-as -o test.o test.s 
test.s: Assembler messages:
test.s:4: Error: operand out of range (0x0000000000080000 is not between 0xffffffffffff8000 and 0x0000000000007ffc)
$ /opt/cross/gcc-4.8.1-nolibc/powerpc64-linux/bin/powerpc64-linux-as --version
GNU assembler (GNU Binutils) 2.23.52.20130512
This assembler was configured for a target of `powerpc64-linux'.
$ /opt/cross/gcc-4.8.1-nolibc/powerpc64-linux/bin/powerpc64-linux-as -o test.o test.s 
(no error)

Alan, can you shed light on when it was fixed?

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au



* [PATCH RFC v3 0/8] EEH Support for VFIO PCI device
From: Gavin Shan @ 2014-05-14  4:11 UTC (permalink / raw)
  To: kvm-ppc; +Cc: aik, agraf, Gavin Shan, alex.williamson, qiudayu, linuxppc-dev

The series of patches intends to support EEH for PCI devices that are
passed through to a PowerKVM-based guest via VFIO. The implementation
is structured around the problems we have to solve to support EEH for
a PowerKVM-based guest:

- Emulation of EEH RTAS requests. All EEH RTAS requests go to QEMU first.
  If QEMU can't handle a request, it is sent to the host via the newly introduced
  VFIO container IOCTL command (VFIO_EEH_INFO) and handled in the host kernel.

- The error injection infrastructure needs to support requests from both the
  userland utility "errinjct" and the PowerKVM-based guest. On the pSeries
  platform, "errinjct" already works through a dedicated syscall, which invokes
  the RTAS service to perform error injection in the kernel. It is therefore
  reasonable to extend that syscall to the PowerNV platform so that an OPAL call
  can be made in the host kernel to inject errors. The data transported between
  userland and kernel still follows "struct rtas_args" in both the PowerNV (OPAL)
  and pSeries (RTAS) cases.

The series of patches requires corresponding firmware changes from Mike Qiu to
support error injection, and QEMU changes to support EEH for the guest. The QEMU
patchset will be sent separately.

Change log
==========
v1 -> v2:
	* EEH RTAS requests are routed to QEMU, and then possibly to the host kernel.
	  The KVM in-kernel handling mechanism is dropped.
	* Error injection is reimplemented based on a syscall, instead of KVM in-kernel
	  handling. The logic for error injection token management is moved to
	  QEMU. The error injection request is routed to QEMU and then possibly
	  to the host kernel.
v2 -> v3:
	* Adjust the fields in struct eeh_vfio_pci_addr and struct vfio_eeh_info
	  based on the comments from Alexey.
	* Define macros for EEH VFIO operations (Alexey).
	* Clear frozen state after successful PE reset.
	* Merge original [PATCH 1/2/3] into one.

Testing on P7
=============

- Emulex adapter

Testing on P8
=============

- More testing is needed once the design is finalized.

-----

Gavin Shan (8):
  drivers/vfio: Introduce CONFIG_VFIO_EEH
  powerpc/eeh: Info to trace passed devices
  drivers/vfio: New IOCTL command VFIO_EEH_INFO
  powerpc/eeh: Avoid event on passed PE
  powerpc/powernv: Sync OPAL header file with firmware
  powerpc: Extend syscall ppc_rtas()
  powerpc/powernv: Implement ppc_call_opal()
  powerpc/powernv: Error injection infrastructure

 arch/powerpc/include/asm/eeh.h                 |  52 +++
 arch/powerpc/include/asm/opal.h                |  74 ++-
 arch/powerpc/include/asm/rtas.h                |  10 +-
 arch/powerpc/include/asm/syscalls.h            |   2 +-
 arch/powerpc/include/asm/systbl.h              |   2 +-
 arch/powerpc/include/uapi/asm/unistd.h         |   2 +-
 arch/powerpc/kernel/eeh.c                      |   8 +
 arch/powerpc/kernel/eeh_pe.c                   |  80 ++++
 arch/powerpc/kernel/rtas.c                     |  57 +--
 arch/powerpc/kernel/syscalls.c                 |  50 +++
 arch/powerpc/platforms/powernv/Makefile        |   3 +-
 arch/powerpc/platforms/powernv/eeh-ioda.c      |   3 +-
 arch/powerpc/platforms/powernv/eeh-vfio.c      | 593 +++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/errinject.c     | 224 ++++++++++
 arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
 arch/powerpc/platforms/powernv/opal.c          |  93 ++++
 drivers/vfio/Kconfig                           |   6 +
 drivers/vfio/vfio_iommu_spapr_tce.c            |  12 +
 include/uapi/linux/vfio.h                      |  57 +++
 kernel/sys_ni.c                                |   2 +-
 20 files changed, 1278 insertions(+), 53 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/eeh-vfio.c
 create mode 100644 arch/powerpc/platforms/powernv/errinject.c

Thanks,
Gavin


* [PATCH 1/8] drivers/vfio: Introduce CONFIG_VFIO_EEH
From: Gavin Shan @ 2014-05-14  4:11 UTC (permalink / raw)
  To: kvm-ppc; +Cc: aik, agraf, Gavin Shan, alex.williamson, qiudayu, linuxppc-dev
In-Reply-To: <1400040722-29608-1-git-send-email-gwshan@linux.vnet.ibm.com>

The patch introduces CONFIG_VFIO_EEH to enable additional IOCTL commands
in tce_iommu_driver_ops that support EEH functionality for PCI devices
passed through from host to guest.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 drivers/vfio/Kconfig | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index af7b204..4f3293b 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -8,11 +8,17 @@ config VFIO_IOMMU_SPAPR_TCE
 	depends on VFIO && SPAPR_TCE_IOMMU
 	default n
 
+config VFIO_EEH
+	tristate
+	depends on EEH && VFIO_IOMMU_SPAPR_TCE
+	default n
+
 menuconfig VFIO
 	tristate "VFIO Non-Privileged userspace driver framework"
 	depends on IOMMU_API
 	select VFIO_IOMMU_TYPE1 if X86
 	select VFIO_IOMMU_SPAPR_TCE if (PPC_POWERNV || PPC_PSERIES)
+	select VFIO_EEH if PPC_POWERNV
 	select ANON_INODES
 	help
 	  VFIO provides a framework for secure userspace device drivers.
-- 
1.8.3.2


* [PATCH 5/8] powerpc/powernv: Sync OPAL header file with firmware
From: Gavin Shan @ 2014-05-14  4:11 UTC (permalink / raw)
  To: kvm-ppc; +Cc: aik, agraf, Gavin Shan, alex.williamson, qiudayu, linuxppc-dev
In-Reply-To: <1400040722-29608-1-git-send-email-gwshan@linux.vnet.ibm.com>

The patch synchronizes the OPAL header file with firmware so that the
host kernel can make an OPAL call to do error injection.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/opal.h                | 65 ++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/opal-wrappers.S |  1 +
 2 files changed, 66 insertions(+)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 66ad7a7..ca55d9c 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -175,6 +175,7 @@ extern int opal_enter_rtas(struct rtas_args *args,
 #define OPAL_SET_PARAM				90
 #define OPAL_DUMP_RESEND			91
 #define OPAL_DUMP_INFO2				94
+#define OPAL_ERR_INJECT				96
 
 #ifndef __ASSEMBLY__
 
@@ -219,6 +220,69 @@ enum OpalPciErrorSeverity {
 	OPAL_EEH_SEV_INF	= 5
 };
 
+enum OpalErrinjctType {
+	OpalErrinjctTypeFirst			= 0,
+	OpalErrinjctTypeFatal			= 1,
+	OpalErrinjctTypeRecoverRandomEvent	= 2,
+	OpalErrinjctTypeRecoverSpecialEvent	= 3,
+	OpalErrinjctTypeCorruptedPage		= 4,
+	OpalErrinjctTypeCorruptedSlb		= 5,
+	OpalErrinjctTypeTranslatorFailure	= 6,
+	OpalErrinjctTypeIoaBusError		= 7,
+	OpalErrinjctTypeIoaBusError64		= 8,
+	OpalErrinjctTypePlatformSpecific	= 9,
+	OpalErrinjctTypeDcacheStart		= 10,
+	OpalErrinjctTypeDcacheEnd		= 11,
+	OpalErrinjctTypeIcacheStart		= 12,
+	OpalErrinjctTypeIcacheEnd		= 13,
+	OpalErrinjctTypeTlbStart		= 14,
+	OpalErrinjctTypeTlbEnd			= 15,
+	OpalErrinjctTypeUpstreamIoError		= 16,
+	OpalErrinjctTypeLast			= 17,
+
+	/* IoaBusError & IoaBusError64 */
+	OpalEjtIoaLoadMemAddr			= 0,
+	OpalEjtIoaLoadMemData			= 1,
+	OpalEjtIoaLoadIoAddr			= 2,
+	OpalEjtIoaLoadIoData			= 3,
+	OpalEjtIoaLoadConfigAddr		= 4,
+	OpalEjtIoaLoadConfigData		= 5,
+	OpalEjtIoaStoreMemAddr			= 6,
+	OpalEjtIoaStoreMemData			= 7,
+	OpalEjtIoaStoreIoAddr			= 8,
+	OpalEjtIoaStoreIoData			= 9,
+	OpalEjtIoaStoreConfigAddr		= 10,
+	OpalEjtIoaStoreConfigData		= 11,
+	OpalEjtIoaDmaReadMemAddr		= 12,
+	OpalEjtIoaDmaReadMemData		= 13,
+	OpalEjtIoaDmaReadMemMaster		= 14,
+	OpalEjtIoaDmaReadMemTarget		= 15,
+	OpalEjtIoaDmaWriteMemAddr		= 16,
+	OpalEjtIoaDmaWriteMemData		= 17,
+	OpalEjtIoaDmaWriteMemMaster		= 18,
+	OpalEjtIoaDmaWriteMemTarget		= 19,
+};
+
+struct OpalErrinjct {
+	int32_t type;
+	union {
+		struct {
+			uint32_t addr;
+			uint32_t mask;
+			uint64_t phb_id;
+			uint32_t pe;
+			uint32_t function;
+		}ioa;
+		struct {
+			uint64_t addr;
+			uint64_t mask;
+			uint64_t phb_id;
+			uint32_t pe;
+			uint32_t function;
+		}ioa64;
+	};
+};
+
 enum OpalShpcAction {
 	OPAL_SHPC_GET_LINK_STATE = 0,
 	OPAL_SHPC_GET_SLOT_STATE = 1
@@ -839,6 +903,7 @@ int64_t opal_pci_get_phb_diag_data(uint64_t phb_id, void *diag_buffer,
 				   uint64_t diag_buffer_len);
 int64_t opal_pci_get_phb_diag_data2(uint64_t phb_id, void *diag_buffer,
 				    uint64_t diag_buffer_len);
+int64_t opal_err_injct(void *data);
 int64_t opal_pci_fence_phb(uint64_t phb_id);
 int64_t opal_pci_reinit(uint64_t phb_id, uint64_t reinit_scope, uint64_t data);
 int64_t opal_pci_mask_pe_error(uint64_t phb_id, uint16_t pe_number, uint8_t error_type, uint8_t mask_action);
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index f531ffe..46265de 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -119,6 +119,7 @@ OPAL_CALL(opal_pci_next_error,			OPAL_PCI_NEXT_ERROR);
 OPAL_CALL(opal_pci_poll,			OPAL_PCI_POLL);
 OPAL_CALL(opal_pci_msi_eoi,			OPAL_PCI_MSI_EOI);
 OPAL_CALL(opal_pci_get_phb_diag_data2,		OPAL_PCI_GET_PHB_DIAG_DATA2);
+OPAL_CALL(opal_err_injct,			OPAL_ERR_INJECT);
 OPAL_CALL(opal_xscom_read,			OPAL_XSCOM_READ);
 OPAL_CALL(opal_xscom_write,			OPAL_XSCOM_WRITE);
 OPAL_CALL(opal_lpc_read,			OPAL_LPC_READ);
-- 
1.8.3.2


* [PATCH 2/8] powerpc/eeh: Info to trace passed devices
From: Gavin Shan @ 2014-05-14  4:11 UTC (permalink / raw)
  To: kvm-ppc; +Cc: aik, agraf, Gavin Shan, alex.williamson, qiudayu, linuxppc-dev
In-Reply-To: <1400040722-29608-1-git-send-email-gwshan@linux.vnet.ibm.com>

The address of a passed-through PCI device (domain:bus:slot:func) may
be quite different from the host's and the guest's perspectives. We
have to track the address mapping so that we can emulate EEH RTAS
requests from the guest. The patch introduces additional fields to
eeh_pe and eeh_dev for that purpose.

Also, the patch adds the functions eeh_vfio_pe_get() and
eeh_vfio_dev_get() to look up an EEH PE or device by the given guest
address. Both of them will be used by subsequent patches.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h | 52 +++++++++++++++++++++++++++
 arch/powerpc/kernel/eeh_pe.c   | 80 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 132 insertions(+)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 7782056..96dabfc 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -48,6 +48,14 @@ struct device_node;
 #define EEH_PE_RST_HOLD_TIME		250
 #define EEH_PE_RST_SETTLE_TIME		1800
 
+#ifdef CONFIG_VFIO_EEH
+struct eeh_vfio_pci_addr {
+	uint64_t	buid;		/* PHB BUID			*/
+	uint16_t	config_addr;	/* Bus/Device/Function number	*/
+	uint32_t	pe_addr;	/* PE configuration address	*/
+};
+#endif /* CONFIG_VFIO_EEH */
+
 /*
  * The struct is used to trace PE related EEH functionality.
  * In theory, there will have one instance of the struct to
@@ -72,6 +80,7 @@ struct device_node;
 #define EEH_PE_RESET		(1 << 2)	/* PE reset in progress	*/
 
 #define EEH_PE_KEEP		(1 << 8)	/* Keep PE on hotplug	*/
+#define EEH_PE_PASSTHROUGH	(1 << 9)	/* PE owned by guest	*/
 
 struct eeh_pe {
 	int type;			/* PE type: PHB/Bus/Device	*/
@@ -85,6 +94,9 @@ struct eeh_pe {
 	struct timeval tstamp;		/* Time on first-time freeze	*/
 	int false_positives;		/* Times of reported #ff's	*/
 	struct eeh_pe *parent;		/* Parent PE			*/
+#ifdef CONFIG_VFIO_EEH
+	struct eeh_vfio_pci_addr guest_addr;
+#endif
 	struct list_head child_list;	/* Link PE to the child list	*/
 	struct list_head edevs;		/* Link list of EEH devices	*/
 	struct list_head child;		/* Child PEs			*/
@@ -93,6 +105,21 @@ struct eeh_pe {
 #define eeh_pe_for_each_dev(pe, edev, tmp) \
 		list_for_each_entry_safe(edev, tmp, &pe->edevs, list)
 
+static inline bool eeh_pe_passed(struct eeh_pe *pe)
+{
+	return pe ? !!(pe->state & EEH_PE_PASSTHROUGH) : false;
+}
+
+static inline void eeh_pe_set_passed(struct eeh_pe *pe, bool passed)
+{
+	if (pe) {
+		if (passed)
+			pe->state |= EEH_PE_PASSTHROUGH;
+		else
+			pe->state &= ~EEH_PE_PASSTHROUGH;
+	}
+}
+
 /*
  * The struct is used to trace EEH state for the associated
  * PCI device node or PCI device. In future, it might
@@ -110,6 +137,7 @@ struct eeh_pe {
 #define EEH_DEV_SYSFS		(1 << 9)	/* Sysfs created	*/
 #define EEH_DEV_REMOVED		(1 << 10)	/* Removed permanently	*/
 #define EEH_DEV_FRESET		(1 << 11)	/* Fundamental reset	*/
+#define EEH_DEV_PASSTHROUGH	(1 << 12)	/* Owned by guest	*/
 
 struct eeh_dev {
 	int mode;			/* EEH mode			*/
@@ -126,6 +154,9 @@ struct eeh_dev {
 	struct device_node *dn;		/* Associated device node	*/
 	struct pci_dev *pdev;		/* Associated PCI device	*/
 	struct pci_bus *bus;		/* PCI bus for partial hotplug	*/
+#ifdef CONFIG_VFIO_EEH
+	struct eeh_vfio_pci_addr guest_addr;
+#endif
 };
 
 static inline struct device_node *eeh_dev_to_of_node(struct eeh_dev *edev)
@@ -138,6 +169,21 @@ static inline struct pci_dev *eeh_dev_to_pci_dev(struct eeh_dev *edev)
 	return edev ? edev->pdev : NULL;
 }
 
+static inline bool eeh_dev_passed(struct eeh_dev *dev)
+{
+	return dev ? !!(dev->mode & EEH_DEV_PASSTHROUGH) : false;
+}
+
+static inline void eeh_dev_set_passed(struct eeh_dev *dev, bool passed)
+{
+	if (dev) {
+		if (passed)
+			dev->mode |= EEH_DEV_PASSTHROUGH;
+		else
+			dev->mode &= ~EEH_DEV_PASSTHROUGH;
+	}
+}
+
 /* Return values from eeh_ops::next_error */
 enum {
 	EEH_NEXT_ERR_NONE = 0,
@@ -335,6 +381,12 @@ static inline void eeh_remove_device(struct pci_dev *dev) { }
 #define EEH_IO_ERROR_VALUE(size) (-1UL)
 #endif /* CONFIG_EEH */
 
+
+#ifdef CONFIG_VFIO_EEH
+struct eeh_dev *eeh_vfio_dev_get(struct eeh_vfio_pci_addr *addr);
+struct eeh_pe *eeh_vfio_pe_get(struct eeh_vfio_pci_addr *addr);
+#endif /* CONFIG_VFIO_EEH */
+
 #ifdef CONFIG_PPC64
 /*
  * MMIO read/write operations with EEH support.
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index fbd01eb..0e7f7af 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -248,6 +248,86 @@ struct eeh_pe *eeh_pe_get(struct eeh_dev *edev)
 	return pe;
 }
 
+#ifdef CONFIG_VFIO_EEH
+static void *__eeh_vfio_dev_get(void *data, void *flag)
+{
+	struct eeh_pe *pe = (struct eeh_pe *)data;
+	struct eeh_vfio_pci_addr *addr = (struct eeh_vfio_pci_addr *)flag;
+	struct eeh_dev *edev, *tmp;
+
+	eeh_pe_for_each_dev(pe, edev, tmp) {
+		if (!eeh_dev_passed(edev))
+			continue;
+
+		/* Comparing the address in the guest */
+		if (edev->guest_addr.buid == addr->buid &&
+		    edev->guest_addr.config_addr == addr->config_addr)
+			return edev;
+	}
+
+	return NULL;
+}
+
+/**
+ * eeh_vfio_dev_get - Search EEH device based on guest's address
+ * @addr: EEH device guest address
+ *
+ * Search the EEH device according to its guest's address, which
+ * is made up of PHB BUID, and PCI config address.
+ */
+struct eeh_dev *eeh_vfio_dev_get(struct eeh_vfio_pci_addr *addr)
+{
+	struct eeh_pe *root;
+	struct eeh_dev *edev;
+
+	list_for_each_entry(root, &eeh_phb_pe, child) {
+		edev = eeh_pe_traverse(root, __eeh_vfio_dev_get, addr);
+		if (edev)
+			return edev;
+	}
+
+	return NULL;
+}
+
+static void *__eeh_vfio_pe_get(void *data, void *flag)
+{
+	struct eeh_pe *pe = (struct eeh_pe *)data;
+	struct eeh_vfio_pci_addr *addr = (struct eeh_vfio_pci_addr *)flag;
+
+	if (!eeh_pe_passed(pe))
+		return NULL;
+
+	/* Comparing the address */
+	if (pe->guest_addr.buid == addr->buid &&
+	    pe->guest_addr.pe_addr == addr->pe_addr)
+		return pe;
+
+	return NULL;
+}
+
+/**
+ * eeh_vfio_pe_get - Search EEH PE based on guest's address
+ * @addr: EEH PE guest address
+ *
+ * Search the EEH PE according to the guest address, which
+ * is made up of the PHB BUID and the PE configuration
+ * address.
+ */
+struct eeh_pe *eeh_vfio_pe_get(struct eeh_vfio_pci_addr *addr)
+{
+	struct eeh_pe *root;
+	struct eeh_pe *pe;
+
+	list_for_each_entry(root, &eeh_phb_pe, child) {
+		pe = eeh_pe_traverse(root, __eeh_vfio_pe_get, addr);
+		if (pe)
+			return pe;
+	}
+
+	return NULL;
+}
+#endif /* CONFIG_VFIO_EEH */
+
 /**
  * eeh_pe_get_parent - Retrieve the parent PE
  * @edev: EEH device
-- 
1.8.3.2


* [PATCH 8/8] powerpc/powernv: Error injection infrastructure
From: Gavin Shan @ 2014-05-14  4:12 UTC (permalink / raw)
  To: kvm-ppc; +Cc: aik, agraf, Gavin Shan, alex.williamson, qiudayu, linuxppc-dev
In-Reply-To: <1400040722-29608-1-git-send-email-gwshan@linux.vnet.ibm.com>

The patch implements the error injection infrastructure for the
PowerNV platform. A predetermined handler is called according to
the type of injected error (e.g. OpalErrinjctTypeIoaBusError).
For now, only PCI error injection is supported. Injecting other
types of errors needs to be supported in the future.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/opal.h            |   6 +
 arch/powerpc/platforms/powernv/Makefile    |   2 +-
 arch/powerpc/platforms/powernv/errinject.c | 224 +++++++++++++++++++++++++++++
 3 files changed, 231 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/platforms/powernv/errinject.c

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 7c4ffd0..7bf86ba 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -794,6 +794,12 @@ typedef struct oppanel_line {
 	uint64_t 	line_len;
 } oppanel_line_t;
 
+enum OpalCallToken{
+	OPAL_CALL_TOKEN_MIN = 0,
+	OPAL_CALL_TOKEN_ERRINJCT,
+	OPAL_CALL_TOKEN_MAX
+};
+
 /* /sys/firmware/opal */
 extern struct kobject *opal_kobj;
 
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 2b15a03..5ae8257 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -1,7 +1,7 @@
 obj-y			+= setup.o opal-takeover.o opal-wrappers.o opal.o opal-async.o
 obj-y			+= opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
 obj-y			+= rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
-obj-y			+= opal-msglog.o
+obj-y			+= opal-msglog.o errinject.o
 
 obj-$(CONFIG_SMP)	+= smp.o
 obj-$(CONFIG_PCI)	+= pci.o pci-p5ioc2.o pci-ioda.o
diff --git a/arch/powerpc/platforms/powernv/errinject.c b/arch/powerpc/platforms/powernv/errinject.c
new file mode 100644
index 0000000..aa892d4
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/errinject.c
@@ -0,0 +1,224 @@
+/*
+ * The file intends to support error injection requests from host OS
+ * owned utility (e.g. errinjct) or VM. We need to parse the information
+ * passed from user space and call to appropriate OPAL API accordingly.
+ *
+ * Copyright Benjamin Herrenschmidt & Gavin Shan, IBM Corporation 2014.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/io.h>
+#include <linux/irq.h>
+#include <linux/kernel.h>
+#include <linux/msi.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+
+#include <asm/eeh.h>
+#include <asm/eeh_event.h>
+#include <asm/io.h>
+#include <asm/iommu.h>
+#include <asm/msi_bitmap.h>
+#include <asm/opal.h>
+#include <asm/pci-bridge.h>
+#include <asm/ppc-pci.h>
+#include <asm/rtas.h>
+#include <asm/tce.h>
+#include <asm/uaccess.h>
+
+#include "powernv.h"
+#include "pci.h"
+
+static int powernv_errinjct_ioa(struct rtas_args *args)
+{
+	return -ENXIO;
+}
+
+static int powernv_errinjct_ioa64(struct rtas_args *args)
+{
+	return -ENXIO;
+}
+
+#ifdef CONFIG_VFIO_EEH
+static int powernv_errinjct_ioa_virt(struct rtas_args *args)
+{
+	uint32_t addr, mask, cfg_addr;
+	uint32_t buid_hi, buid_lo, op;
+	uint64_t buf_addr = ((uint64_t)(args->args[3])) << 32 |
+			    args->args[4];
+	void __user *buf = (void __user *)buf_addr;
+	struct eeh_vfio_pci_addr vfio_addr;
+	struct pnv_phb *phb;
+	struct eeh_pe *pe;
+	struct OpalErrinjct ej;
+
+	/* Extract parameters */
+	if (get_user(addr, (uint32_t __user *)buf) ||
+	    get_user(mask, (uint32_t __user *)(buf + 4)) ||
+	    get_user(cfg_addr, (uint32_t __user *)(buf + 8)) ||
+	    get_user(buid_hi, (uint32_t __user *)(buf + 12)) ||
+	    get_user(buid_lo, (uint32_t __user *)(buf + 16)) ||
+	    get_user(op, (uint32_t __user *)(buf + 20)))
+		return -EFAULT;
+
+	/* Check opcode */
+	if (op < OpalEjtIoaLoadMemAddr ||
+	    op > OpalEjtIoaDmaWriteMemTarget)
+		return -EINVAL;
+
+	/* Find PE */
+	vfio_addr.buid = ((((uint64_t)buid_hi) << 32) | buid_lo);
+	vfio_addr.pe_addr = cfg_addr;
+	pe = eeh_vfio_pe_get(&vfio_addr);
+	if (!pe)
+		return -ENODEV;
+	phb = pe->phb->private_data;
+
+	/* OPAL call */
+	ej.type = OpalErrinjctTypeIoaBusError;
+	ej.ioa.addr = addr;
+	ej.ioa.mask = mask;
+	ej.ioa.phb_id = phb->opal_id;
+	ej.ioa.pe = pe->addr;
+	ej.ioa.function = op;
+	if (opal_err_injct(&ej) != OPAL_SUCCESS)
+		return -EIO;
+
+	return 0;
+}
+
+static int powernv_errinjct_ioa64_virt(struct rtas_args *args)
+{
+	uint32_t addr_hi, addr_lo, mask_hi, mask_lo;
+	uint32_t cfg_addr, buid_hi, buid_lo, op;
+	uint64_t buf_addr = ((uint64_t)(args->args[3])) << 32 |
+			    args->args[4];
+	void __user *buf = (void __user *)buf_addr;
+	struct eeh_vfio_pci_addr vfio_addr;
+	struct pnv_phb *phb;
+	struct eeh_pe *pe;
+	struct OpalErrinjct ej;
+
+	/* Extract parameters */
+	if (get_user(addr_hi, (uint32_t __user *)buf) ||
+	    get_user(addr_lo, (uint32_t __user *)(buf + 4)) ||
+	    get_user(mask_hi, (uint32_t __user *)(buf + 8)) ||
+	    get_user(mask_lo, (uint32_t __user *)(buf + 12)) ||
+	    get_user(cfg_addr, (uint32_t __user *)(buf + 16)) ||
+	    get_user(buid_hi, (uint32_t __user *)(buf + 20)) ||
+	    get_user(buid_lo, (uint32_t __user *)(buf + 24)) ||
+	    get_user(op, (uint32_t __user *)(buf + 28)))
+		return -EFAULT;
+
+	/* Check opcode */
+	if (op < OpalEjtIoaLoadMemAddr ||
+	    op > OpalEjtIoaDmaWriteMemTarget)
+		return -EINVAL;
+
+	/* Find PE */
+	vfio_addr.buid = ((((uint64_t)buid_hi) << 32) | buid_lo);
+	vfio_addr.pe_addr = (cfg_addr >> 8) & 0xffff;
+	pe = eeh_vfio_pe_get(&vfio_addr);
+	if (!pe)
+		return -ENODEV;
+	phb = pe->phb->private_data;
+
+	/* OPAL call */
+	ej.type = OpalErrinjctTypeIoaBusError64;
+	ej.ioa64.addr = (((uint64_t)addr_hi) << 32) | addr_lo;
+	ej.ioa64.mask = (((uint64_t)mask_hi) << 32) | mask_lo;
+	ej.ioa64.phb_id = phb->opal_id;
+	ej.ioa64.pe = pe->addr;
+	ej.ioa64.function = op;
+	if (opal_err_injct(&ej) != OPAL_SUCCESS)
+		return -EIO;
+
+	return 0;
+}
+#endif /* CONFIG_VFIO_EEH */
+
+struct errinjct_handler {
+	bool virt;
+	int token;
+	int (*fn)(struct rtas_args *arg);
+};
+
+static struct errinjct_handler handlers[] = {
+#ifdef CONFIG_EEH
+	{ false,
+	  OpalErrinjctTypeIoaBusError,
+	  powernv_errinjct_ioa
+	},
+	{ false,
+	  OpalErrinjctTypeIoaBusError64,
+	  powernv_errinjct_ioa64
+	},
+#endif
+#ifdef CONFIG_VFIO_EEH
+	{ true,
+	  OpalErrinjctTypeIoaBusError,
+	  powernv_errinjct_ioa_virt
+	},
+	{ true,
+	  OpalErrinjctTypeIoaBusError64,
+	  powernv_errinjct_ioa64_virt
+	},
+#endif
+};
+
+static int powernv_errinjct(struct rtas_args *args)
+{
+	struct errinjct_handler *h;
+	int token, ej_token, i;
+	bool virt;
+
+	/* Sanity check */
+	if (args->nargs != 5 || args->nret != 1)
+		return -EINVAL;
+
+	token = args->token;
+	virt = !!args->args[0];
+	if (!virt || token != OPAL_CALL_TOKEN_ERRINJCT)
+		return -EINVAL;
+
+	/* Call into specific handler */
+	ej_token = args->args[1];
+	for (i = 0; i < ARRAY_SIZE(handlers); i++) {
+		h = &handlers[i];
+		if (h->virt == virt &&
+		    h->token == ej_token &&
+		    h->fn)
+			return h->fn(args);
+	}
+
+	return -ENXIO;
+}
+
+static int __init powernv_errinjct_init(void)
+{
+	int ret;
+
+	ret = opal_call_handler_register(false, OPAL_CALL_TOKEN_ERRINJCT,
+					 powernv_errinjct);
+	if (ret) {
+		pr_warn("%s: Cannot register errinjct handler\n",
+			__func__);
+		return ret;
+	}
+
+	ret = opal_call_handler_register(true, OPAL_CALL_TOKEN_ERRINJCT,
+					 powernv_errinjct);
+	if (ret) {
+		pr_warn("%s: Cannot register errinjct virtual handler\n",
+			__func__);
+		return ret;
+	}
+
+	return 0;
+}
+
+module_init(powernv_errinjct_init);
-- 
1.8.3.2
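The sketch below shows how the 32-byte argument buffer read by the 64-bit VFIO error-injection handler above could be decoded in user space. It assumes the eight 32-bit words arrive in the order of the get_user() calls (addr_hi, addr_lo, mask_hi, mask_lo, cfg_addr, buid_hi, buid_lo, op) and are already in host byte order. The struct ioa64_req type and the helper names are hypothetical; only the shifts and masks come from the patch.

```c
#include <stdint.h>

/* Hypothetical decoded form of an IOA64 error-injection request. */
struct ioa64_req {
	uint64_t addr;     /* 64-bit address to match */
	uint64_t mask;     /* 64-bit address mask */
	uint64_t buid;     /* PHB BUID as seen by the caller */
	uint32_t cfg_addr; /* config address as passed in */
	uint32_t pe_addr;  /* PE address derived from cfg_addr */
	uint32_t op;       /* error-injection function (opcode) */
};

/* Join a high and a low 32-bit word into one 64-bit value. */
static uint64_t join64(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}

/* Decode the eight words in the order the handler reads them. */
static void decode_ioa64_req(const uint32_t w[8], struct ioa64_req *req)
{
	req->addr     = join64(w[0], w[1]);
	req->mask     = join64(w[2], w[3]);
	req->cfg_addr = w[4];
	req->pe_addr  = (w[4] >> 8) & 0xffff; /* same shift/mask as the patch */
	req->buid     = join64(w[5], w[6]);
	req->op       = w[7];
}
```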


* [PATCH 3/8] drivers/vfio: New IOCTL command VFIO_EEH_INFO
From: Gavin Shan @ 2014-05-14  4:11 UTC
  To: kvm-ppc; +Cc: aik, agraf, Gavin Shan, alex.williamson, qiudayu, linuxppc-dev
In-Reply-To: <1400040722-29608-1-git-send-email-gwshan@linux.vnet.ibm.com>

This patch adds a new ioctl command, VFIO_EEH_INFO, to the VFIO
container to support EEH functionality for PCI devices that have
been passed through from host to guest via VFIO.
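For illustration, a user-space caller of the proposed ioctl could prepare a VFIO_EEH_OP_GET_STATE request roughly as below. This is a sketch: it mirrors the struct vfio_eeh_info layout proposed in include/uapi/linux/vfio.h by this patch instead of including the header, and the helper name vfio_eeh_prep_get_state() is hypothetical. argsz must equal sizeof(struct vfio_eeh_info), because eeh_vfio_ioctl() rejects any other size.

```c
#include <stdint.h>
#include <string.h>

#define VFIO_EEH_OP_GET_STATE	4	/* value proposed in this patch */

/* Local mirror of the proposed uapi struct (assumption, not a header). */
struct vfio_eeh_info {
	uint32_t argsz;
	uint32_t op;
	union {
		struct {
			uint32_t host_domain;
			uint16_t host_cfg_addr;
			uint64_t guest_buid;
			uint16_t guest_cfg_addr;
		} map;
		struct { uint64_t buid; uint16_t cfg_addr; } unmap;
		struct { uint64_t buid; uint32_t addr; uint32_t option; } option;
		struct {
			uint64_t buid;
			uint32_t cfg_addr;
			uint32_t option;
			uint32_t ret;
		} addr;
		struct { uint64_t buid; uint32_t pe_addr; uint32_t state; } state;
		struct { uint64_t buid; uint32_t pe_addr; uint32_t option; } reset;
		struct { uint64_t buid; uint32_t pe_addr; } config;
	};
};

/* Hypothetical helper: prepare a GET_STATE request for one PE. */
static void vfio_eeh_prep_get_state(struct vfio_eeh_info *info,
				    uint64_t buid, uint32_t pe_addr)
{
	memset(info, 0, sizeof(*info));
	info->argsz = sizeof(*info);	/* the kernel checks this exactly */
	info->op = VFIO_EEH_OP_GET_STATE;
	info->state.buid = buid;
	info->state.pe_addr = pe_addr;
}
```

The prepared struct would then be passed as ioctl(container_fd, VFIO_EEH_INFO, &info) on the VFIO container file descriptor (container_fd is a placeholder); on success the kernel copies the struct back and info.state.state holds the translated state code.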

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/Makefile   |   1 +
 arch/powerpc/platforms/powernv/eeh-vfio.c | 593 ++++++++++++++++++++++++++++++
 drivers/vfio/vfio_iommu_spapr_tce.c       |  12 +
 include/uapi/linux/vfio.h                 |  57 +++
 4 files changed, 663 insertions(+)
 create mode 100644 arch/powerpc/platforms/powernv/eeh-vfio.c

diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 63cebb9..2b15a03 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -6,5 +6,6 @@ obj-y			+= opal-msglog.o
 obj-$(CONFIG_SMP)	+= smp.o
 obj-$(CONFIG_PCI)	+= pci.o pci-p5ioc2.o pci-ioda.o
 obj-$(CONFIG_EEH)	+= eeh-ioda.o eeh-powernv.o
+obj-$(CONFIG_VFIO_EEH)	+= eeh-vfio.o
 obj-$(CONFIG_PPC_SCOM)	+= opal-xscom.o
 obj-$(CONFIG_MEMORY_FAILURE)	+= opal-memory-errors.o
diff --git a/arch/powerpc/platforms/powernv/eeh-vfio.c b/arch/powerpc/platforms/powernv/eeh-vfio.c
new file mode 100644
index 0000000..69d5f2d
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/eeh-vfio.c
@@ -0,0 +1,593 @@
+/*
+  * This file supports EEH functionality for PCI devices that
+  * have been passed through from host to guest via VFIO, so it
+  * is naturally part of the VFIO implementation on PowerNV.
+  *
+  * Copyright Benjamin Herrenschmidt & Gavin Shan, IBM Corporation 2014.
+  *
+  * This program is free software; you can redistribute it and/or modify
+  * it under the terms of the GNU General Public License as published by
+  * the Free Software Foundation; either version 2 of the License, or
+  * (at your option) any later version.
+  */
+
+#include <linux/init.h>
+#include <linux/io.h>
+#include <linux/irq.h>
+#include <linux/kernel.h>
+#include <linux/kvm_host.h>
+#include <linux/msi.h>
+#include <linux/pci.h>
+#include <linux/string.h>
+#include <linux/vfio.h>
+
+#include <asm/eeh.h>
+#include <asm/eeh_event.h>
+#include <asm/io.h>
+#include <asm/iommu.h>
+#include <asm/opal.h>
+#include <asm/msi_bitmap.h>
+#include <asm/pci-bridge.h>
+#include <asm/ppc-pci.h>
+#include <asm/tce.h>
+#include <asm/uaccess.h>
+
+#include "powernv.h"
+#include "pci.h"
+
+static int powernv_eeh_vfio_map(struct vfio_eeh_info *info)
+{
+	struct pci_bus *bus, *pe_bus;
+	struct pci_dev *pdev;
+	struct eeh_dev *edev;
+	struct eeh_pe *pe;
+	int domain, bus_no, devfn;
+
+	/* Host address */
+	domain = info->map.host_domain;
+	bus_no = (info->map.host_cfg_addr >> 8) & 0xff;
+	devfn = info->map.host_cfg_addr & 0xff;
+
+	/* Find PCI bus */
+	bus = pci_find_bus(domain, bus_no);
+	if (!bus) {
+		pr_warn("%s: PCI bus %04x:%02x not found\n",
+			__func__, domain, bus_no);
+		return -ENODEV;
+	}
+
+	/* Find PCI device */
+	pdev = pci_get_slot(bus, devfn);
+	if (!pdev) {
+		pr_warn("%s: PCI device %04x:%02x:%02x.%01x not found\n",
+			__func__, domain, bus_no,
+			PCI_SLOT(devfn), PCI_FUNC(devfn));
+		return -ENODEV;
+	}
+
+	/* No EEH device - almost impossible */
+	edev = pci_dev_to_eeh_dev(pdev);
+	if (unlikely(!edev)) {
+		pr_warn("%s: No EEH dev for PCI device %s\n",
+			__func__, pci_name(pdev));
+		pci_dev_put(pdev);
+		return -ENODEV;
+	}
+
+	/* Doesn't support PE migration between different PHBs */
+	pe = edev->pe;
+	if (!eeh_pe_passed(pe)) {
+		pe_bus = eeh_pe_bus_get(pe);
+		BUG_ON(!pe_bus);
+
+		/* PE# has format 00BBSS00 */
+		pe->guest_addr.buid    = info->map.guest_buid;
+		pe->guest_addr.pe_addr = pe_bus->number << 16;
+		eeh_pe_set_passed(pe, true);
+	} else if (pe->guest_addr.buid != info->map.guest_buid) {
+		pci_dev_put(pdev);
+		pr_warn("%s: Mismatched PHB BUID (0x%llx, 0x%llx)\n",
+			__func__, pe->guest_addr.buid, info->map.guest_buid);
+		return -EINVAL;
+	}
+
+	edev->guest_addr.buid = info->map.guest_buid;
+	edev->guest_addr.config_addr = info->map.guest_cfg_addr;
+	eeh_dev_set_passed(edev, true);
+
+	pr_debug("EEH: Host PCI dev %s to %llx-%02x:%02x.%01x\n",
+		 pci_name(pdev), info->map.guest_buid,
+		 (info->map.guest_cfg_addr >> 8) & 0xFF,
+		 PCI_SLOT(info->map.guest_cfg_addr & 0xFF),
+		 PCI_FUNC(info->map.guest_cfg_addr & 0xFF));
+
+	pci_dev_put(pdev);
+	return 0;
+}
+
+static int powernv_eeh_vfio_unmap(struct vfio_eeh_info *info)
+{
+	struct eeh_vfio_pci_addr addr;
+	struct pci_dev *pdev;
+	struct eeh_dev *edev, *tmp;
+	struct eeh_pe *pe;
+	bool passed;
+
+	/* Get EEH device */
+	addr.buid = info->unmap.buid;
+	addr.config_addr = info->unmap.cfg_addr;
+	edev = eeh_vfio_dev_get(&addr);
+	if (!edev) {
+		pr_warn("%s: Cannot find %llx:%02x:%02x.%01x\n",
+			__func__, info->unmap.buid,
+			(info->unmap.cfg_addr >> 8) & 0xFF,
+			PCI_SLOT(info->unmap.cfg_addr & 0xFF),
+			PCI_FUNC(info->unmap.cfg_addr & 0xFF));
+		return -ENODEV;
+	}
+
+	/* Return EEH device */
+	memset(&edev->guest_addr, 0, sizeof(edev->guest_addr));
+	eeh_dev_set_passed(edev, false);
+	pdev = eeh_dev_to_pci_dev(edev);
+	pr_debug("EEH: Host PCI dev %s returned\n",
+		 pdev ? pci_name(pdev) : "NULL");
+
+	/* Return PE if no EEH device is owned by guest */
+	pe = edev->pe;
+	passed = false;
+	eeh_pe_for_each_dev(pe, edev, tmp) {
+		pdev = eeh_dev_to_pci_dev(edev);
+		if (pdev && pdev->subordinate)
+			continue;
+
+		if (eeh_dev_passed(edev)) {
+			passed = true;
+			break;
+		}
+	}
+
+	if (!passed) {
+		memset(&pe->guest_addr, 0, sizeof(pe->guest_addr));
+		eeh_pe_set_passed(pe, false);
+		pr_debug("EEH: PHB#%x-PE#%x returned to host\n",
+			 pe->phb->global_number, pe->addr);
+	}
+
+	return 0;
+}
+
+static int powernv_eeh_vfio_set_option(struct vfio_eeh_info *info)
+{
+	struct pnv_phb *phb;
+	struct eeh_dev *edev;
+	struct eeh_pe *pe;
+	struct eeh_vfio_pci_addr addr;
+	int opcode = info->option.option;
+	int ret = 0;
+
+	/* Check opcode */
+	if (opcode < EEH_OPT_DISABLE || opcode > EEH_OPT_THAW_DMA) {
+		pr_warn("%s: opcode %d out of range (%d, %d)\n",
+			__func__, opcode, EEH_OPT_DISABLE, EEH_OPT_THAW_DMA);
+		ret = 3;
+		goto out;
+	}
+
+	/* Option "enable" uses PCI config address */
+	if (opcode == EEH_OPT_ENABLE) {
+		addr.buid = info->option.buid;
+		addr.config_addr = (info->option.addr >> 8) & 0xFFFF;
+		edev = eeh_vfio_dev_get(&addr);
+		if (!edev) {
+			pr_warn("%s: Cannot find %llx:%02x:%02x.%01x\n",
+				__func__, addr.buid,
+				(addr.config_addr >> 8) & 0xFF,
+				PCI_SLOT(addr.config_addr & 0xFF),
+				PCI_FUNC(addr.config_addr & 0xFF));
+			ret = 7;
+			goto out;
+		}
+		phb = edev->phb->private_data;
+	} else {
+		addr.buid    = info->option.buid;
+		addr.pe_addr = info->option.addr;
+		pe = eeh_vfio_pe_get(&addr);
+		if (!pe) {
+			pr_warn("%s: Cannot find PE %llx:%x\n",
+				__func__, addr.buid, addr.pe_addr);
+			ret = 7;
+			goto out;
+		}
+		phb = pe->phb->private_data;
+	}
+
+	/* Ensure that EEH has been initialized on the PHB */
+	if (!(phb->flags & PNV_PHB_FLAG_EEH)) {
+		pr_warn("%s: EEH disabled on PHB#%d\n",
+			__func__, phb->hose->global_number);
+		ret = 7;
+		goto out;
+	}
+
+	/*
+	 * EEH functionality is enabled on all PEs by default, so
+	 * "enable" simply returns success. "Disable" is handled the
+	 * same way and also returns success without taking any
+	 * action, since the guest is not expected to disable EEH
+	 * at all.
+	 */
+	if (opcode == EEH_OPT_DISABLE ||
+	    opcode == EEH_OPT_ENABLE) {
+		ret = 0;
+		goto out;
+	}
+
+	/*
+	 * Call into the IODA dependent backend in order
+	 * to enable DMA or MMIO for the indicated PE.
+	 */
+	if (phb->eeh_ops && phb->eeh_ops->set_option) {
+		if (phb->eeh_ops->set_option(pe, opcode)) {
+			pr_warn("%s: Failure from backend\n",
+				__func__);
+			ret = 1;
+		}
+	} else {
+		pr_warn("%s: Unsupported request\n",
+			__func__);
+		ret = 7;
+	}
+
+out:
+	return ret;
+}
+
+static int powernv_eeh_vfio_get_addr(struct vfio_eeh_info *info)
+{
+	struct pnv_phb *phb;
+	struct eeh_dev *edev;
+	struct eeh_vfio_pci_addr addr;
+	int opcode = info->addr.option;
+	int ret = 0;
+
+	/* Check opcode */
+	if (opcode != 0 && opcode != 1) {
+		pr_warn("%s: opcode %d out of range (0, 1)\n",
+			__func__, opcode);
+		ret = 3;
+		goto out;
+	}
+
+	/* Find EEH device */
+	addr.buid = info->addr.buid;
+	addr.config_addr = (info->addr.cfg_addr >> 8) & 0xFFFF;
+	edev = eeh_vfio_dev_get(&addr);
+	if (!edev) {
+		pr_warn("%s: Cannot find %llx:%02x:%02x.%01x\n",
+			__func__, addr.buid,
+			(addr.config_addr >> 8) & 0xFF,
+			PCI_SLOT(addr.config_addr & 0xFF),
+			PCI_FUNC(addr.config_addr & 0xFF));
+		ret = 7;
+		goto out;
+	}
+	phb = edev->phb->private_data;
+
+	/* EEH enabled ? */
+	if (!(phb->flags & PNV_PHB_FLAG_EEH)) {
+		pr_warn("%s: EEH disabled on PHB#%d\n",
+			__func__, phb->hose->global_number);
+		ret = 3;
+		goto out;
+	}
+
+	/* EEH device passed ? */
+	if (!eeh_dev_passed(edev)) {
+		pr_warn("%s: EEH dev %llx:%02x:%02x.%01x owned by host\n",
+			__func__, addr.buid,
+			(addr.config_addr >> 8) & 0xFF,
+			PCI_SLOT(addr.config_addr & 0xFF),
+			PCI_FUNC(addr.config_addr & 0xFF));
+		ret = 3;
+		goto out;
+	}
+
+	/*
+	 * Fill result according to opcode. We don't differentiate
+	 * PCI bus and device sensitive PE here.
+	 */
+	if (opcode == 0)
+		info->addr.ret = edev->pe->guest_addr.pe_addr;
+	else
+		info->addr.ret = 1;
+out:
+	return ret;
+}
+
+static int powernv_eeh_vfio_get_state(struct vfio_eeh_info *info)
+{
+	struct pnv_phb *phb;
+	struct eeh_pe *pe;
+	struct eeh_vfio_pci_addr addr;
+	int result, ret = 0;
+
+	/* Locate the PE */
+	addr.buid    = info->state.buid;
+	addr.pe_addr = info->state.pe_addr;
+	pe = eeh_vfio_pe_get(&addr);
+	if (!pe) {
+		pr_warn("%s: Cannot locate %llx:%x\n",
+			__func__, addr.buid, addr.pe_addr);
+		ret = 3;
+		goto out;
+	}
+	phb = pe->phb->private_data;
+
+	/* EEH enabled ? */
+	if (!(phb->flags & PNV_PHB_FLAG_EEH)) {
+		pr_warn("%s: EEH disabled on PHB#%d\n",
+			__func__, phb->hose->global_number);
+		ret = 3;
+		goto out;
+	}
+
+	/* Call to the IOC dependent function */
+	if (phb->eeh_ops && phb->eeh_ops->get_state) {
+		result = phb->eeh_ops->get_state(pe);
+
+		if (!(result & EEH_STATE_RESET_ACTIVE) &&
+		     (result & EEH_STATE_DMA_ENABLED) &&
+		     (result & EEH_STATE_MMIO_ENABLED))
+			info->state.state = 0;
+		else if (result & EEH_STATE_RESET_ACTIVE)
+			info->state.state = 1;
+		else if (!(result & EEH_STATE_RESET_ACTIVE) &&
+			 !(result & EEH_STATE_DMA_ENABLED) &&
+			 !(result & EEH_STATE_MMIO_ENABLED))
+			info->state.state = 2;
+		else if (!(result & EEH_STATE_RESET_ACTIVE) &&
+			 (result & EEH_STATE_DMA_ENABLED) &&
+			 !(result & EEH_STATE_MMIO_ENABLED))
+			info->state.state = 4;
+		else
+			info->state.state = 5;
+
+		ret = 0;
+	} else {
+		pr_warn("%s: Unsupported request\n", __func__);
+		ret = 3;
+	}
+
+out:
+	return ret;
+}
+
+static int powernv_eeh_vfio_pe_reset(struct vfio_eeh_info *info)
+{
+	struct pnv_phb *phb;
+	struct eeh_pe *pe;
+	struct eeh_vfio_pci_addr addr;
+	int opcode = info->reset.option;
+	int ret = 0;
+
+	/* Check opcode */
+	if (opcode != EEH_RESET_DEACTIVATE &&
+	    opcode != EEH_RESET_HOT &&
+	    opcode != EEH_RESET_FUNDAMENTAL) {
+		pr_warn("%s: Unsupported opcode %d\n",
+			__func__, opcode);
+		ret = 3;
+		goto out;
+	}
+
+	/* Locate the PE */
+	addr.buid    = info->reset.buid;
+	addr.pe_addr = info->reset.pe_addr;
+	pe = eeh_vfio_pe_get(&addr);
+	if (!pe) {
+		pr_warn("%s: Cannot locate %llx:%x\n",
+			__func__, addr.buid, addr.pe_addr);
+		ret = 3;
+		goto out;
+	}
+	phb = pe->phb->private_data;
+
+	/* EEH enabled ? */
+	if (!(phb->flags & PNV_PHB_FLAG_EEH)) {
+		pr_warn("%s: EEH disabled on PHB#%d\n",
+			__func__, phb->hose->global_number);
+		ret = 3;
+		goto out;
+	}
+
+	/* Call into the IODA dependent backend to do the reset */
+	if (!phb->eeh_ops ||
+	    !phb->eeh_ops->set_option ||
+	    !phb->eeh_ops->reset) {
+		pr_warn("%s: Unsupported request\n",
+			__func__);
+		ret = 7;
+	} else {
+		/*
+		 * The frozen PE might be caused by PAPR error injection,
+		 * which the spec states is one-shot, without a "sticky"
+		 * bit. In reality that isn't the case, at least on
+		 * P7IOC, so we have to clear the error here to avoid a
+		 * recursive error, which would eventually make the
+		 * recovery fail.
+		 */
+		if (opcode == EEH_RESET_DEACTIVATE)
+			opal_pci_reset(phb->opal_id,
+				       OPAL_PHB_ERROR,
+				       OPAL_ASSERT_RESET);
+
+		if (phb->eeh_ops->reset(pe, opcode)) {
+			pr_warn("%s: Failure from backend\n", __func__);
+			ret = 1;
+			goto out;
+		}
+
+		/*
+		 * The PE is still frozen and we need to clear that. It's
+		 * best to clear the frozen state after deassert, to avoid
+		 * messy IO accesses during reset, which might cause a
+		 * recursive frozen PE.
+		 */
+		if (opcode == EEH_RESET_DEACTIVATE) {
+			if (phb->eeh_ops->set_option(pe, EEH_OPT_THAW_MMIO) ||
+			    phb->eeh_ops->set_option(pe, EEH_OPT_THAW_DMA)) {
+				pr_warn("%s: Cannot clear frozen state\n",
+					__func__);
+				ret = 1;
+			}
+
+			eeh_pe_state_clear(pe, EEH_PE_ISOLATED);
+		}
+	}
+
+out:
+	return ret;
+}
+
+static int powernv_eeh_vfio_pe_config(struct vfio_eeh_info *info)
+{
+	struct pnv_phb *phb;
+	struct eeh_pe *pe;
+	struct eeh_vfio_pci_addr addr;
+	int ret = 0;
+
+	/* Locate the PE */
+	addr.buid    = info->config.buid;
+	addr.pe_addr = info->config.pe_addr;
+	pe = eeh_vfio_pe_get(&addr);
+	if (!pe) {
+		pr_warn("%s: Cannot locate %llx:%x\n",
+			__func__, addr.buid, addr.pe_addr);
+		ret = 3;
+		goto out;
+	}
+	phb = pe->phb->private_data;
+
+	/* EEH enabled ? */
+	if (!(phb->flags & PNV_PHB_FLAG_EEH)) {
+		pr_warn("%s: EEH disabled on PHB#%d\n",
+			__func__, phb->hose->global_number);
+		ret = 3;
+		goto out;
+	}
+
+	/*
+	 * Access to the PCI config space of a VFIO device has some
+	 * limitations: parts of it, including the BAR registers, are
+	 * not readable or writable. The guest may therefore hold
+	 * stale values for those registers, so we have to restore
+	 * them on the host side.
+	 */
+	eeh_pe_restore_bars(pe);
+out:
+	return ret;
+}
+
+void eeh_vfio_release(struct iommu_table *tbl)
+{
+	struct pnv_ioda_pe *pnv_pe = container_of(tbl, struct pnv_ioda_pe,
+						  tce32_table);
+	struct pnv_phb *phb = pnv_pe->phb;
+	struct eeh_pe *phb_pe, *pe;
+	struct eeh_dev dev, *edev, *tmp;
+
+	/* Find PHB PE */
+	phb_pe = eeh_phb_pe_get(phb->hose);
+	if (unlikely(!phb_pe)) {
+		pr_warn("%s: Cannot find PHB#%d PE\n",
+			__func__, phb->hose->global_number);
+		return;
+	}
+
+	/* Find PE */
+	memset(&dev, 0, sizeof(struct eeh_dev));
+	dev.phb = phb->hose;
+	dev.pe_config_addr = pnv_pe->pe_number;
+	pe = eeh_pe_get(&dev);
+	if (unlikely(!pe)) {
+		pr_warn("%s: Cannot find PE instance for PHB#%d-PE#%d\n",
+			__func__, phb->hose->global_number,
+			pnv_pe->pe_number);
+		return;
+	}
+
+	/* Release it to host */
+	if (!eeh_pe_passed(pe))
+		return;
+
+	eeh_pe_for_each_dev(pe, edev, tmp) {
+		if (!eeh_dev_passed(edev))
+			continue;
+
+		memset(&edev->guest_addr, 0, sizeof(edev->guest_addr));
+		eeh_dev_set_passed(edev, false);
+	}
+
+	memset(&pe->guest_addr, 0, sizeof(pe->guest_addr));
+	eeh_pe_set_passed(pe, false);
+}
+EXPORT_SYMBOL(eeh_vfio_release);
+
+int eeh_vfio_ioctl(unsigned long arg)
+{
+	struct vfio_eeh_info info;
+	int ret = -EINVAL;
+
+	/* Copy over user argument */
+	if (copy_from_user(&info, (void __user *)arg, sizeof(info))) {
+		pr_warn("%s: Cannot copy user argument 0x%lx\n",
+			__func__, arg);
+		return -EFAULT;
+	}
+
+	/* Sanity check */
+	if (info.argsz != sizeof(info)) {
+		pr_warn("%s: Invalid argument size (%u, %zu)\n",
+			__func__, info.argsz, sizeof(info));
+		return -EINVAL;
+	}
+
+	/* Route according to operation */
+	switch (info.op) {
+	case VFIO_EEH_OP_MAP:
+		ret = powernv_eeh_vfio_map(&info);
+		break;
+	case VFIO_EEH_OP_UNMAP:
+		ret = powernv_eeh_vfio_unmap(&info);
+		break;
+	case VFIO_EEH_OP_SET_OPTION:
+		ret = powernv_eeh_vfio_set_option(&info);
+		break;
+	case VFIO_EEH_OP_GET_ADDR:
+		ret = powernv_eeh_vfio_get_addr(&info);
+		break;
+	case VFIO_EEH_OP_GET_STATE:
+		ret = powernv_eeh_vfio_get_state(&info);
+		break;
+	case VFIO_EEH_OP_PE_RESET:
+		ret = powernv_eeh_vfio_pe_reset(&info);
+		break;
+	case VFIO_EEH_OP_PE_CONFIG:
+		ret = powernv_eeh_vfio_pe_config(&info);
+		break;
+	default:
+		pr_info("%s: Cannot handle op#%d\n",
+			__func__, info.op);
+	}
+
+	/* Copy data back */
+	if (!ret && copy_to_user((void __user *)arg, &info, sizeof(info))) {
+		pr_warn("%s: Cannot copy to user 0x%lx\n",
+			__func__, arg);
+		return -EFAULT;
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(eeh_vfio_ioctl);
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index a84788b..c45dece 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -26,6 +26,11 @@
 #define DRIVER_AUTHOR   "aik@ozlabs.ru"
 #define DRIVER_DESC     "VFIO IOMMU SPAPR TCE"
 
+#ifdef CONFIG_VFIO_EEH
+extern void eeh_vfio_release(struct iommu_table *tbl);
+extern int eeh_vfio_ioctl(unsigned long arg);
+#endif
+
 static void tce_iommu_detach_group(void *iommu_data,
 		struct iommu_group *iommu_group);
 
@@ -283,6 +288,10 @@ static long tce_iommu_ioctl(void *iommu_data,
 		tce_iommu_disable(container);
 		mutex_unlock(&container->lock);
 		return 0;
+#ifdef CONFIG_VFIO_EEH
+	case VFIO_EEH_INFO:
+		return eeh_vfio_ioctl(arg);
+#endif
 	}
 
 	return -ENOTTY;
@@ -342,6 +351,9 @@ static void tce_iommu_detach_group(void *iommu_data,
 		/* pr_debug("tce_vfio: detaching group #%u from iommu %p\n",
 				iommu_group_id(iommu_group), iommu_group); */
 		container->tbl = NULL;
+#ifdef CONFIG_VFIO_EEH
+		eeh_vfio_release(tbl);
+#endif
 		iommu_release_ownership(tbl);
 	}
 	mutex_unlock(&container->lock);
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index cb9023d..1fd1bfb 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -455,6 +455,63 @@ struct vfio_iommu_spapr_tce_info {
 
 #define VFIO_IOMMU_SPAPR_TCE_GET_INFO	_IO(VFIO_TYPE, VFIO_BASE + 12)
 
+/*
+ * The VFIO EEH info struct provides a way to support EEH functionality
+ * for PCI devices that are passed through from host to guest via VFIO.
+ */
+#define VFIO_EEH_OP_MAP		0
+#define VFIO_EEH_OP_UNMAP	1
+#define VFIO_EEH_OP_SET_OPTION	2
+#define VFIO_EEH_OP_GET_ADDR	3
+#define VFIO_EEH_OP_GET_STATE	4
+#define VFIO_EEH_OP_PE_RESET	5
+#define VFIO_EEH_OP_PE_CONFIG	6
+
+struct vfio_eeh_info {
+	__u32 argsz;
+	__u32 op;
+
+	union {
+		struct vfio_eeh_map {
+			__u32 host_domain;
+			__u16 host_cfg_addr;
+			__u64 guest_buid;
+			__u16 guest_cfg_addr;
+		} map;
+		struct vfio_eeh_unmap {
+			__u64 buid;
+			__u16 cfg_addr;
+		} unmap;
+		struct vfio_eeh_set_option {
+			__u64 buid;
+			__u32 addr;
+			__u32 option;
+		} option;
+		struct vfio_eeh_pe_addr {
+			__u64 buid;
+			__u32 cfg_addr;
+			__u32 option;
+			__u32 ret;
+		} addr;
+		struct vfio_eeh_state {
+			__u64 buid;
+			__u32 pe_addr;
+			__u32 state;
+		} state;
+		struct vfio_eeh_reset {
+			__u64 buid;
+			__u32 pe_addr;
+			__u32 option;
+		} reset;
+		struct vfio_eeh_config {
+			__u64 buid;
+			__u32 pe_addr;
+		} config;
+	};
+};
+
+#define VFIO_EEH_INFO	_IO(VFIO_TYPE, VFIO_BASE + 21)
+
 /* ***************************************************************** */
 
 #endif /* _UAPIVFIO_H */
-- 
1.8.3.2
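The if/else chain in powernv_eeh_vfio_get_state() above reduces the backend's flag word to a single state code for the guest. A standalone sketch of that translation follows; the flag values are placeholders (the real EEH_STATE_* bits come from asm/eeh.h), and eeh_state_to_code() is a hypothetical name.

```c
#include <stdint.h>

/* Placeholder flag bits; the real values live in asm/eeh.h. */
#define STATE_RESET_ACTIVE	(1u << 0)
#define STATE_MMIO_ENABLED	(1u << 1)
#define STATE_DMA_ENABLED	(1u << 2)

/* Same decision order as powernv_eeh_vfio_get_state(). */
static uint32_t eeh_state_to_code(uint32_t result)
{
	int reset = !!(result & STATE_RESET_ACTIVE);
	int mmio  = !!(result & STATE_MMIO_ENABLED);
	int dma   = !!(result & STATE_DMA_ENABLED);

	if (!reset && dma && mmio)
		return 0;	/* no reset, MMIO and DMA enabled */
	if (reset)
		return 1;	/* reset active */
	if (!dma && !mmio)
		return 2;	/* MMIO and DMA both disabled */
	if (dma && !mmio)
		return 4;	/* DMA enabled, MMIO disabled */
	return 5;		/* remaining case: MMIO enabled, DMA disabled */
}
```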


* [PATCH 4/8] powerpc/eeh: Avoid event on passed PE
From: Gavin Shan @ 2014-05-14  4:11 UTC
  To: kvm-ppc; +Cc: aik, agraf, Gavin Shan, alex.williamson, qiudayu, linuxppc-dev
In-Reply-To: <1400040722-29608-1-git-send-email-gwshan@linux.vnet.ibm.com>

If we detect a frozen state on a PE that has been passed to a guest,
we needn't handle it on the host. Instead, we rely on the guest to
detect and recover it. The patch avoids raising an EEH event for a
frozen passed-through PE so that the guest has a chance to handle it.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/eeh.c                 | 8 ++++++++
 arch/powerpc/platforms/powernv/eeh-ioda.c | 3 ++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 9c6b899..6543f05 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -400,6 +400,14 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 	if (ret > 0)
 		return ret;
 
+	/*
+	 * If the PE has been passed through to a guest, we won't
+	 * check its state. Instead, let the guest handle the PE if
+	 * it has been frozen.
+	 */
+	if (eeh_pe_passed(pe))
+		return 0;
+
 	/* If we already have a pending isolation event for this
 	 * slot, we know it's bad already, we don't need to check.
 	 * Do this checking under a lock; as multiple PCI devices
diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
index 1b5982f..03a3ed2 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -890,7 +890,8 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
 				opal_pci_eeh_freeze_clear(phb->opal_id, frozen_pe_no,
 					OPAL_EEH_ACTION_CLEAR_FREEZE_ALL);
 				ret = EEH_NEXT_ERR_NONE;
-			} else if ((*pe)->state & EEH_PE_ISOLATED) {
+			} else if ((*pe)->state & EEH_PE_ISOLATED ||
+				   eeh_pe_passed(*pe)) {
 				ret = EEH_NEXT_ERR_NONE;
 			} else {
 				pr_err("EEH: Frozen PHB#%x-PE#%x (%s) detected\n",
-- 
1.8.3.2
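In the branch of ioda_eeh_next_error() changed by this patch, a frozen PE is only reported on the host if it is neither already isolated nor passed through to a guest. The sketch below reduces that to a single predicate. It models eeh_pe_passed() as a state bit, which is an assumption since its implementation is not shown here; the bit values and names are hypothetical.

```c
#include <stdbool.h>

#define PE_ISOLATED	(1u << 0)	/* hypothetical bit */
#define PE_PASSED	(1u << 1)	/* hypothetical bit */

struct pe_sketch {
	unsigned int state;
};

/*
 * Return true if the host should raise an EEH event for a frozen PE.
 * Isolated PEs are already being handled; passed-through PEs are left
 * for the guest to detect and recover.
 */
static bool host_should_report_frozen(const struct pe_sketch *pe)
{
	return !(pe->state & (PE_ISOLATED | PE_PASSED));
}
```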


* [PATCH 6/8] powerpc: Extend syscall ppc_rtas()
From: Gavin Shan @ 2014-05-14  4:12 UTC
  To: kvm-ppc; +Cc: aik, agraf, Gavin Shan, alex.williamson, qiudayu, linuxppc-dev
In-Reply-To: <1400040722-29608-1-git-send-email-gwshan@linux.vnet.ibm.com>

Currently, the ppc_rtas() syscall can be used to invoke RTAS calls from
user space. The "errinjct" utility uses it to inject various errors
into the system for testing purposes. This patch extends the syscall
to support both the pSeries and PowerNV platforms, so that RTAS and
OPAL calls can be invoked from user space. In turn, the "errinjct"
utility can be supported on pSeries and PowerNV at the same time.

The original syscall handler ppc_rtas() is renamed to ppc_firmware(),
which calls ppc_call_rtas() or ppc_call_opal() depending on the
running platform. Data is transported between userland and the kernel
in a "struct rtas_args"; how the data is used is platform specific.

Signed-off-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/rtas.h        | 10 +++++-
 arch/powerpc/include/asm/syscalls.h    |  2 +-
 arch/powerpc/include/asm/systbl.h      |  2 +-
 arch/powerpc/include/uapi/asm/unistd.h |  2 +-
 arch/powerpc/kernel/rtas.c             | 57 +++++++---------------------------
 arch/powerpc/kernel/syscalls.c         | 50 +++++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/opal.c  |  7 +++++
 kernel/sys_ni.c                        |  2 +-
 8 files changed, 82 insertions(+), 50 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index b390f55..3428524 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -20,7 +20,7 @@
 #define RTAS_UNKNOWN_SERVICE (-1)
 #define RTAS_INSTANTIATE_MAX (1ULL<<30) /* Don't instantiate rtas at/above this value */
 
-/* Buffer size for ppc_rtas system call. */
+/* Buffer size for ppc_firmware system call. */
 #define RTAS_RMOBUF_MAX (64 * 1024)
 
 /* RTAS return status codes */
@@ -427,9 +427,17 @@ static inline int page_is_rtas_user_buf(unsigned long pfn)
 /* Not the best place to put pSeries_coalesce_init, will be fixed when we
  * move some of the rtas suspend-me stuff to pseries */
 extern void pSeries_coalesce_init(void);
+extern int ppc_call_rtas(struct rtas_args *args);
 #else
 static inline int page_is_rtas_user_buf(unsigned long pfn) { return 0;}
 static inline void pSeries_coalesce_init(void) { }
+static inline int ppc_call_rtas(struct rtas_args *args) { return -ENXIO; }
+#endif
+
+#ifdef CONFIG_PPC_POWERNV
+extern int ppc_call_opal(struct rtas_args *args);
+#else
+static inline int ppc_call_opal(struct rtas_args *args) { return -ENXIO; }
 #endif
 
 extern int call_rtas(const char *, int, int, unsigned long *, ...);
diff --git a/arch/powerpc/include/asm/syscalls.h b/arch/powerpc/include/asm/syscalls.h
index 23be8f1..3383e50 100644
--- a/arch/powerpc/include/asm/syscalls.h
+++ b/arch/powerpc/include/asm/syscalls.h
@@ -15,7 +15,7 @@ asmlinkage unsigned long sys_mmap2(unsigned long addr, size_t len,
 		unsigned long prot, unsigned long flags,
 		unsigned long fd, unsigned long pgoff);
 asmlinkage long ppc64_personality(unsigned long personality);
-asmlinkage int ppc_rtas(struct rtas_args __user *uargs);
+asmlinkage int ppc_firmware(struct rtas_args __user *uargs);
 
 #endif /* __KERNEL__ */
 #endif /* __ASM_POWERPC_SYSCALLS_H */
diff --git a/arch/powerpc/include/asm/systbl.h b/arch/powerpc/include/asm/systbl.h
index 3ddf702..00f8bb2 100644
--- a/arch/powerpc/include/asm/systbl.h
+++ b/arch/powerpc/include/asm/systbl.h
@@ -259,7 +259,7 @@ COMPAT_SYS_SPU(utimes)
 COMPAT_SYS_SPU(statfs64)
 COMPAT_SYS_SPU(fstatfs64)
 SYSX(sys_ni_syscall, ppc_fadvise64_64, ppc_fadvise64_64)
-PPC_SYS_SPU(rtas)
+PPC_SYS_SPU(firmware)
 OLDSYS(debug_setcontext)
 SYSCALL(ni_syscall)
 COMPAT_SYS(migrate_pages)
diff --git a/arch/powerpc/include/uapi/asm/unistd.h b/arch/powerpc/include/uapi/asm/unistd.h
index 881bf2e..3aee765 100644
--- a/arch/powerpc/include/uapi/asm/unistd.h
+++ b/arch/powerpc/include/uapi/asm/unistd.h
@@ -273,7 +273,7 @@
 #ifndef __powerpc64__
 #define __NR_fadvise64_64	254
 #endif
-#define __NR_rtas		255
+#define __NR_firmware		255
 #define __NR_sys_debug_setcontext 256
 /* Number 257 is reserved for vserver */
 #define __NR_migrate_pages	258
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 8cd5ed0..5d829a72 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -1017,59 +1017,32 @@ struct pseries_errorlog *get_pseries_errorlog(struct rtas_error_log *log,
 }
 
 /* We assume to be passed big endian arguments */
-asmlinkage int ppc_rtas(struct rtas_args __user *uargs)
+int ppc_call_rtas(struct rtas_args *args)
 {
-	struct rtas_args args;
 	unsigned long flags;
 	char *buff_copy, *errbuf = NULL;
-	int nargs, nret, token;
 	int rc;
 
-	if (!capable(CAP_SYS_ADMIN))
-		return -EPERM;
-
-	if (copy_from_user(&args, uargs, 3 * sizeof(u32)) != 0)
-		return -EFAULT;
-
-	nargs = be32_to_cpu(args.nargs);
-	nret  = be32_to_cpu(args.nret);
-	token = be32_to_cpu(args.token);
-
-	if (nargs > ARRAY_SIZE(args.args)
-	    || nret > ARRAY_SIZE(args.args)
-	    || nargs + nret > ARRAY_SIZE(args.args))
-		return -EINVAL;
-
-	/* Copy in args. */
-	if (copy_from_user(args.args, uargs->args,
-			   nargs * sizeof(rtas_arg_t)) != 0)
-		return -EFAULT;
-
-	if (token == RTAS_UNKNOWN_SERVICE)
-		return -EINVAL;
-
-	args.rets = &args.args[nargs];
-	memset(args.rets, 0, nret * sizeof(rtas_arg_t));
-
 	/* Need to handle ibm,suspend_me call specially */
-	if (token == ibm_suspend_me_token) {
-		rc = rtas_ibm_suspend_me(&args);
+	if (be32_to_cpu(args->token) == ibm_suspend_me_token) {
+		rc = rtas_ibm_suspend_me(args);
 		if (rc)
 			return rc;
-		goto copy_return;
+		goto out;
 	}
 
 	buff_copy = get_errorlog_buffer();
 
 	flags = lock_rtas();
-
-	rtas.args = args;
+	rtas.args = *args;
 	enter_rtas(__pa(&rtas.args));
-	args = rtas.args;
+	*args = rtas.args;
 
-	/* A -1 return code indicates that the last command couldn't
-	   be completed due to a hardware error. */
-	if (be32_to_cpu(args.rets[0]) == -1)
+	/*
+	 * A -1 return code indicates that the last command couldn't
+	 * be completed due to a hardware error.
+	 */
+	if (be32_to_cpu(args->rets[0]) == -1)
 		errbuf = __fetch_rtas_last_error(buff_copy);
 
 	unlock_rtas(flags);
@@ -1080,13 +1053,7 @@ asmlinkage int ppc_rtas(struct rtas_args __user *uargs)
 		kfree(buff_copy);
 	}
 
- copy_return:
-	/* Copy out args. */
-	if (copy_to_user(uargs->args + nargs,
-			 args.args + nargs,
-			 nret * sizeof(rtas_arg_t)) != 0)
-		return -EFAULT;
-
+out:
 	return 0;
 }
 
diff --git a/arch/powerpc/kernel/syscalls.c b/arch/powerpc/kernel/syscalls.c
index cd9be9a..bcb7483 100644
--- a/arch/powerpc/kernel/syscalls.c
+++ b/arch/powerpc/kernel/syscalls.c
@@ -40,6 +40,56 @@
 #include <asm/syscalls.h>
 #include <asm/time.h>
 #include <asm/unistd.h>
+#include <asm/machdep.h>
+#include <asm/rtas.h>
+
+asmlinkage int ppc_firmware(struct rtas_args __user *uargs)
+{
+	int rc, nargs, nret, token;
+	struct rtas_args args;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+	if (copy_from_user(&args, uargs, 3 * sizeof(u32)))
+		return -EFAULT;
+	nargs = be32_to_cpu(args.nargs);
+	nret  = be32_to_cpu(args.nret);
+	token = be32_to_cpu(args.token);
+
+	/* Parameter overflow ? */
+	if (nargs > ARRAY_SIZE(args.args)
+	    || nret > ARRAY_SIZE(args.args)
+	    || nargs + nret > ARRAY_SIZE(args.args))
+		return -EINVAL;
+
+	/* Copy over all arguments */
+	if (copy_from_user(args.args, uargs->args,
+			   nargs * sizeof(rtas_arg_t)))
+		return -EFAULT;
+
+	/* Invalid token ? */
+	if (token == RTAS_UNKNOWN_SERVICE)
+		return -EINVAL;
+
+	/* Clean out return values */
+	args.rets = &args.args[nargs];
+	memset(args.rets, 0, nret * sizeof(rtas_arg_t));
+
+	/* Route to correct platform */
+	if (machine_is(pseries))
+		rc = ppc_call_rtas(&args);
+	else if (machine_is(powernv))
+		rc = ppc_call_opal(&args);
+	else
+		return -ENXIO;
+
+	/* Copy result to user space */
+	if (copy_to_user(uargs->args + nargs, args.args + nargs,
+			 nret * sizeof(rtas_arg_t)))
+		return -EFAULT;
+
+	return rc;
+}
 
 static inline unsigned long do_mmap2(unsigned long addr, size_t len,
 			unsigned long prot, unsigned long flags,
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index 360ad80c..ad33c2b 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -25,6 +25,7 @@
 #include <asm/opal.h>
 #include <asm/firmware.h>
 #include <asm/mce.h>
+#include <asm/rtas.h>
 
 #include "powernv.h"
 
@@ -701,3 +702,9 @@ void opal_free_sg_list(struct opal_sg_list *sg)
 			sg = NULL;
 	}
 }
+
+/* Extend it later */
+int ppc_call_opal(struct rtas_args *args)
+{
+	return 0;
+}
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index bc8d1b7..2c5b3fa 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -159,7 +159,7 @@ cond_syscall(sys_pciconfig_read);
 cond_syscall(sys_pciconfig_write);
 cond_syscall(sys_pciconfig_iobase);
 cond_syscall(compat_sys_s390_ipc);
-cond_syscall(ppc_rtas);
+cond_syscall(ppc_firmware);
 cond_syscall(sys_spu_run);
 cond_syscall(sys_spu_create);
 cond_syscall(sys_subpage_prot);
-- 
1.8.3.2
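The argument handling that ppc_firmware() performs before routing the call can be sketched in user space as follows. This assumes a 16-entry argument array, as in struct rtas_args, and host byte order, whereas the kernel converts each header field with be32_to_cpu(); struct fw_args and fw_args_prepare() are hypothetical. Checking nargs and nret individually before their sum keeps the addition from overflowing.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define NARGS_MAX 16	/* assumed size of rtas_args.args[] */

/* Hypothetical host-order mirror of struct rtas_args. */
struct fw_args {
	uint32_t token;
	uint32_t nargs;
	uint32_t nret;
	uint32_t args[NARGS_MAX];
	uint32_t *rets;
};

/* Validate the header and point rets just past the input arguments. */
static int fw_args_prepare(struct fw_args *a)
{
	if (a->nargs > NARGS_MAX || a->nret > NARGS_MAX ||
	    a->nargs + a->nret > NARGS_MAX)
		return -EINVAL;
	if (a->token == (uint32_t)-1)	/* RTAS_UNKNOWN_SERVICE */
		return -EINVAL;

	/* Return values share the array, right after the inputs. */
	a->rets = &a->args[a->nargs];
	memset(a->rets, 0, a->nret * sizeof(a->args[0]));
	return 0;
}
```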


* [PATCH 7/8] powerpc/powernv: Implement ppc_call_opal()
From: Gavin Shan @ 2014-05-14  4:12 UTC
  To: kvm-ppc; +Cc: aik, agraf, Gavin Shan, alex.williamson, qiudayu, linuxppc-dev
In-Reply-To: <1400040722-29608-1-git-send-email-gwshan@linux.vnet.ibm.com>

When running on the PowerNV platform, the ppc_firmware() syscall is
directed to ppc_call_opal(), which calls the appropriate OPAL API.
ppc_call_opal() parses the input arguments and dispatches the request
to the matching OPAL API handler. Each request passed to the function
is identified by a token. Because the function can be reached either
from a host-owned application (e.g. errinjct) or from a VM, the first
argument (the so-called "virtual" flag) is always present to
distinguish the two cases.

This patch implements the above logic together with a dynamic
registration mechanism for OPAL call handlers, so that the handlers
can be distributed rather than hard-coded in one place.
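The registration and lookup scheme can be sketched in plain C. This is a single-threaded userspace model, not the patch itself: `fw_handler_register()`, `fw_dispatch()`, `struct fw_call_args` and the example handlers are hypothetical names. The patch uses a `list_head` protected by `opal_call_lock` and allocates the entry before taking the spinlock, since `kzalloc(GFP_KERNEL)` may sleep; with no lock here, the sketch checks for duplicates first.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct fw_call_args;	/* opaque stand-in for struct rtas_args */
typedef int (*fw_handler_fn)(struct fw_call_args *);

struct fw_handler {
	bool virt;		/* true: request issued on behalf of a VM */
	int token;		/* request identifier */
	fw_handler_fn fn;
	struct fw_handler *next;
};

static struct fw_handler *fw_handlers;	/* singly linked registry */

/* Register fn for the (virt, token) pair; duplicates are rejected. */
static int fw_handler_register(bool virt, int token, fw_handler_fn fn)
{
	struct fw_handler *h;

	if (!token || !fn)
		return -EINVAL;

	for (h = fw_handlers; h; h = h->next)
		if (h->token == token && h->virt == virt)
			return -EEXIST;

	h = calloc(1, sizeof(*h));
	if (!h)
		return -ENOMEM;
	h->virt = virt;
	h->token = token;
	h->fn = fn;
	h->next = fw_handlers;
	fw_handlers = h;
	return 0;
}

/* Look up the handler for (virt, token) and call it; -ERANGE if none. */
static int fw_dispatch(bool virt, int token, struct fw_call_args *args)
{
	struct fw_handler *h;

	for (h = fw_handlers; h; h = h->next)
		if (h->token == token && h->virt == virt)
			return h->fn(args);
	return -ERANGE;
}

/* Example handlers, purely illustrative. */
static int host_handler(struct fw_call_args *a) { (void)a; return 1; }
static int guest_handler(struct fw_call_args *a) { (void)a; return 2; }
```

Because the key is the (virt, token) pair, one token can have separate host and guest handlers, which is how the "virtual" flag in the first argument selects between the two callers.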

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/opal.h       |  3 +-
 arch/powerpc/platforms/powernv/opal.c | 90 ++++++++++++++++++++++++++++++++++-
 2 files changed, 90 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index ca55d9c..7c4ffd0 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -997,7 +997,8 @@ extern void opal_lpc_init(void);
 struct opal_sg_list *opal_vmalloc_to_sg_list(void *vmalloc_addr,
 					     unsigned long vmalloc_size);
 void opal_free_sg_list(struct opal_sg_list *sg);
-
+int opal_call_handler_register(bool virt, int token,
+			       int (*fn)(struct rtas_args *));
 #endif /* __ASSEMBLY__ */
 
 #endif /* __OPAL_H */
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index ad33c2b..c84823c 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -38,6 +38,13 @@ struct opal {
 	u64 size;
 } opal;
 
+struct opal_call_handler {
+	bool virt;
+	int token;
+	int (*fn)(struct rtas_args *args);
+	struct list_head list;
+};
+
 struct mcheck_recoverable_range {
 	u64 start_addr;
 	u64 end_addr;
@@ -47,6 +54,10 @@ struct mcheck_recoverable_range {
 static struct mcheck_recoverable_range *mc_recoverable_range;
 static int mc_recoverable_range_len;
 
+/* OPAL call handler */
+static LIST_HEAD(opal_call_handler_list);
+static DEFINE_SPINLOCK(opal_call_lock);
+
 struct device_node *opal_node;
 static DEFINE_SPINLOCK(opal_write_lock);
 extern u64 opal_mc_secondary_handler[];
@@ -703,8 +714,83 @@ void opal_free_sg_list(struct opal_sg_list *sg)
 	}
 }
 
-/* Extend it later */
-int ppc_call_opal(struct rtas_args *args)
+int opal_call_handler_register(bool virt, int token,
+			       int (*fn)(struct rtas_args *))
 {
+	struct opal_call_handler *h, *handler;
+
+	if (!token || !fn) {
+		pr_warn("%s: Invalid parameters\n",
+			__func__);
+		return -EINVAL;
+	}
+
+	handler = kzalloc(sizeof(*handler), GFP_KERNEL);
+	if (!handler) {
+		pr_warn("%s: Out of memory\n",
+			__func__);
+		return -ENOMEM;
+	}
+	handler->token = token;
+	handler->virt = virt;
+	handler->fn = fn;
+	INIT_LIST_HEAD(&handler->list);
+
+	spin_lock(&opal_call_lock);
+	list_for_each_entry(h, &opal_call_handler_list, list) {
+		if (h->token == token &&
+		    h->virt  == virt) {
+			spin_unlock(&opal_call_lock);
+			pr_warn("%s: Handler existing (%s, %x)\n",
+				__func__, virt ? "T" : "F", token);
+			kfree(handler);
+			return -EEXIST;
+		}
+	}
+
+	list_add_tail(&handler->list, &opal_call_handler_list);
+	spin_unlock(&opal_call_lock);
+
 	return 0;
 }
+
+/*
+ * Usually invoked from the ppc_firmware() syscall, either by a
+ * host-owned application or on behalf of a VM. The two callers
+ * pass differently structured arguments, so the first argument
+ * is always used to tell them apart.
+ *
+ * Also, 32-bit addresses have to be extended to 64 bits, so each
+ * address-sensitive field takes 8 bytes.
+ */
+int ppc_call_opal(struct rtas_args *args)
+{
+	bool virt, found;
+	int token;
+	struct opal_call_handler *h;
+
+	/* We should have "virt" at least */
+	if (args->nargs < 1)
+		return -EINVAL;
+	virt = !!args->args[0];
+	token = args->token;
+
+	/* Do we have a handler? */
+	found = false;
+	spin_lock(&opal_call_lock);
+	list_for_each_entry(h, &opal_call_handler_list, list) {
+		if (h->token == token &&
+		    h->virt == virt) {
+			found = true;
+			break;
+		}
+	}
+	spin_unlock(&opal_call_lock);
+
+	/* Call to handler */
+	if (!found)
+		return -ERANGE;
+
+	return h->fn(args);
+}
-- 
1.8.3.2

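The comment in ppc_call_opal() notes that each address-sensitive field needs 8 bytes once addresses are extended to 64 bits, i.e. two 32-bit argument slots. A minimal sketch of packing and unpacking such a field follows. It assumes, without confirmation from the patch, that the high word comes first; `fw_arg_addr64()` and `fw_arg_set_addr64()` are hypothetical helpers, and `rtas_arg_t` is modelled as a host-order `uint32_t` rather than the kernel's big-endian type.

```c
#include <stdint.h>

typedef uint32_t rtas_arg_t;	/* host-order stand-in for the kernel type */

/* Assumed layout: high 32 bits in args[i], low 32 bits in args[i + 1]. */
static uint64_t fw_arg_addr64(const rtas_arg_t *args, unsigned int i)
{
	return ((uint64_t)args[i] << 32) | args[i + 1];
}

/* Inverse: split a 64-bit address across two consecutive slots. */
static void fw_arg_set_addr64(rtas_arg_t *args, unsigned int i, uint64_t addr)
{
	args[i] = (rtas_arg_t)(addr >> 32);
	args[i + 1] = (rtas_arg_t)addr;	/* truncates to the low 32 bits */
}
```

Whatever order is chosen, caller and handler have to agree on it, and the slot count checked against `nargs` must include both halves of every such field.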

* Re: powerpc/ppc64: Allow allmodconfig to build (finally !)
From: Guenter Roeck @ 2014-05-14  5:16 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: linuxppc-dev, Alan Modra
In-Reply-To: <20140514133434.13b66009@canb.auug.org.au>

On 05/13/2014 08:34 PM, Stephen Rothwell wrote:
> Hi Guenter,
>
> On Tue, 13 May 2014 10:17:49 -0700 Guenter Roeck <linux@roeck-us.net> wrote:
>>
>> On Tue, May 13, 2014 at 07:16:41PM +1000, Benjamin Herrenschmidt wrote:
>>> On Mon, 2014-05-12 at 17:28 -0700, Guenter Roeck wrote:
>>>
>>>> After applying this patch, I get
>>>>
>>>> arch/powerpc/kernel/exceptions-64s.S:269: Error: operand out of range
>>>> (0x000000000000814c is not between 0xffffffffffff8000 and 0x0000000000007ffc)
>>>> arch/powerpc/kernel/exceptions-64s.S:729: Error: operand out of range
>>>> (0x000000000000814c is not between 0xffffffffffff8000 and 0x0000000000007ffc)
>>>>
>>>> with powerpc:defconfig, powerpc:allmodconfig, powerpc:cell_defconfig, and
>>>> powerpc:maple_defconfig.
>>>>
>>>> This is on top of v3.15-rc5. Any idea what is going on ?
>>>>
>>>> Compiler is powerpc64-poky-linux-gcc (GCC) 4.7.2 (from poky 1.4.0-1).
>>>
>>> Interesting... works with all my test configs using 4.7.3...
>>>
>>> I don't have my tree at hand right now, I'll check what that means
>>> tomorrow see if I can find a workaround.
>>>
>> It works for me with gcc 4.8.2 (build from yocto 1.6.0).
>>
>> Is asking people to use gcc 4.7.3 or later acceptable ?
>
> OK, this appears to be an assembler bug.
>
> $ cat test.s
> 	.text
> x:
> 	.pushsection	b, "a"
> 	beq y
> 	.popsection
> 	.=0x80000
> y:
> $ /opt/cross/gcc-4.6.3-nolibc/powerpc64-linux/bin/powerpc64-linux-as --version
> GNU assembler (GNU Binutils) 2.22
> This assembler was configured for a target of `powerpc64-linux'.
> $ /opt/cross/gcc-4.6.3-nolibc/powerpc64-linux/bin/powerpc64-linux-as -o test.o test.s
> test.s: Assembler messages:
> test.s:4: Error: operand out of range (0x0000000000080000 is not between 0xffffffffffff8000 and 0x0000000000007ffc)
> $ /opt/cross/gcc-4.8.1-nolibc/powerpc64-linux/bin/powerpc64-linux-as --version
> GNU assembler (GNU Binutils) 2.23.52.20130512
> This assembler was configured for a target of `powerpc64-linux'.
> $ /opt/cross/gcc-4.8.1-nolibc/powerpc64-linux/bin/powerpc64-linux-as -o test.o test.s
> (no error)
>
> Alan, can you shed light on when it was fixed?
>

Hi Stephen,

any idea what might cause this one, by any chance ?

arch/powerpc/kernel/built-in.o: In function `exc_debug_crit_book3e':
(.text+0x165ee): relocation truncated to fit: R_PPC64_ADDR16_HI against symbol `interrupt_base_book3e' defined in .text section in arch/powerpc/kernel/built-in.o
arch/powerpc/kernel/built-in.o: In function `exc_debug_crit_book3e':
(.text+0x16602): relocation truncated to fit: R_PPC64_ADDR16_HI against symbol `interrupt_end_book3e' defined in .text section in arch/powerpc/kernel/built-in.o
arch/powerpc/kernel/built-in.o: In function `exc_debug_debug_book3e':

I see this if I try to build powerpc:ppc64e_defconfig or powerpc:chroma_defconfig
with gcc 4.8.2 and binutils 2.24.

Thanks,
Guenter


* [git pull] Please pull powerpc.git merge branch
From: Benjamin Herrenschmidt @ 2014-05-14  5:19 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linuxppc-dev, Linux Kernel list

Hi Linus !

Here are a couple of fixes for 3.15. One from Anton fixes a nasty regression
I introduced when trying to fix a loss of irq_work, whose consequence is
that we can completely lose timer interrupts on a CPU... not pretty.

The other one is a change to our PCIe reset hook to use a firmware call
instead of direct config space accesses to trigger a fundamental reset
on the root port. This is necessary so that the FW gets a chance to
disable the link down error monitoring, which would otherwise trip
and cause a subsequent fatal EEH error.

Cheers,
Ben.

The following changes since commit e4565362c7adc31201135c4b6d649fc1bdc3bf20:

  powerpc/4xx: Fix section mismatch in ppc4xx_pci.c (2014-04-28 16:32:53 +1000)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git merge

for you to fetch changes up to 8050936caf125fbe54111ba5e696b68a360556ba:

  powerpc: irq work racing with timer interrupt can result in timer interrupt hang (2014-05-12 14:29:28 +1000)

----------------------------------------------------------------
Anton Blanchard (1):
      powerpc: irq work racing with timer interrupt can result in timer interrupt hang

Gavin Shan (1):
      powerpc/powernv: Reset root port in firmware

 arch/powerpc/kernel/time.c                | 3 ---
 arch/powerpc/platforms/powernv/eeh-ioda.c | 3 ++-
 2 files changed, 2 insertions(+), 4 deletions(-)


* Re: powerpc/ppc64: Allow allmodconfig to build (finally !)
From: Alan Modra @ 2014-05-14  5:42 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: linuxppc-dev, Guenter Roeck
In-Reply-To: <20140514133434.13b66009@canb.auug.org.au>

On Wed, May 14, 2014 at 01:34:34PM +1000, Stephen Rothwell wrote:
> OK, this appears to be an assembler bug.

Agreed.  Upgrade binutils!

> $ cat test.s
> 	.text
> x:
> 	.pushsection	b, "a"
> 	beq y
> 	.popsection
> 	.=0x80000
> y:
> $ /opt/cross/gcc-4.6.3-nolibc/powerpc64-linux/bin/powerpc64-linux-as --version
> GNU assembler (GNU Binutils) 2.22
> This assembler was configured for a target of `powerpc64-linux'.
> $ /opt/cross/gcc-4.6.3-nolibc/powerpc64-linux/bin/powerpc64-linux-as -o test.o test.s 
> test.s: Assembler messages:
> test.s:4: Error: operand out of range (0x0000000000080000 is not between 0xffffffffffff8000 and 0x0000000000007ffc)
> $ /opt/cross/gcc-4.8.1-nolibc/powerpc64-linux/bin/powerpc64-linux-as --version
> GNU assembler (GNU Binutils) 2.23.52.20130512
> This assembler was configured for a target of `powerpc64-linux'.
> $ /opt/cross/gcc-4.8.1-nolibc/powerpc64-linux/bin/powerpc64-linux-as -o test.o test.s 
> (no error)
> 
> Alan, can you shed light on when it was fixed?

2012-11-05
https://sourceware.org/ml/binutils/2012-11/msg00043.html
git show 3b8b57a9495016b2b02fbc2612dd1607d4b6f9ba

The part that actually fixes this problem is "Leave insn field zero...".
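The range quoted in the assembler errors (0xffffffffffff8000 to 0x0000000000007ffc) is the reach of a B-form conditional branch such as `beq`: the BD field holds a 14-bit signed word offset, and with two implied zero bits appended it gives a byte displacement between -0x8000 and 0x7ffc, always a multiple of 4. In Stephen's test case the target `y` is in a different section from the `beq`, so the real displacement is only known after linking, yet the 2.22 assembler still range-checked 0x80000 against these limits. A minimal sketch of the check and of the field encoding (`bd_fits()` and `bd_field()` are illustrative names):

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a byte displacement can be encoded in a bc instruction's BD field. */
static bool bd_fits(int64_t disp)
{
	return disp >= -0x8000 && disp <= 0x7ffc && (disp & 3) == 0;
}

/* BD field bits as placed in the instruction word (bits 2..15, low two zero). */
static uint32_t bd_field(int64_t disp)
{
	return (uint32_t)disp & 0xfffc;
}
```

Both 0x814c (from Guenter's build) and 0x80000 (from the test case) fall outside this window, which is why the assembler rejected them instead of leaving the field zero for a relocation to fill in.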

-- 
Alan Modra
Australia Development Lab, IBM


