LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 4/4] powerpc: Don't return to BE mode when we are already there
From: Alexander Graf @ 2013-12-23  1:24 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Dinar Valeev, Anton Blanchard
In-Reply-To: <1387761885-39599-1-git-send-email-agraf@suse.de>

Our Little Endian kernels can now live in a world where they are
running with Big Endian interrupts enabled. That is great for kexec,
because now we don't have to switch back to Big Endian mode.

Indicate this in the code. Only try to go into Big Endian mode when
we're not already there yet.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/platforms/pseries/setup.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index fb5d98c..7db1cf1 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -446,10 +446,15 @@ static void pSeries_machine_kexec(struct kimage *image)
 #endif
 
 #ifdef __LITTLE_ENDIAN__
+static bool ile_enabled;
+
 long pseries_big_endian_exceptions(void)
 {
 	long rc;
 
+	if (!ile_enabled)
+		return H_SUCCESS;
+
 	while (1) {
 		rc = enable_big_endian_exceptions();
 		if (!H_IS_LONG_BUSY(rc))
@@ -498,8 +503,12 @@ static long pseries_little_endian_exceptions(void)
 
 	while (1) {
 		rc = enable_little_endian_exceptions();
-		if (!H_IS_LONG_BUSY(rc))
+
+		if (!H_IS_LONG_BUSY(rc)) {
+			ile_enabled = true;
 			return rc;
+		}
+
 		mdelay(get_longbusy_msecs(rc));
 	}
 }
-- 
1.8.1.4

^ permalink raw reply related

* [PATCH 2/4] powerpc: Add relocation code for fixups
From: Alexander Graf @ 2013-12-23  1:24 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Dinar Valeev, Anton Blanchard
In-Reply-To: <1387761885-39599-1-git-send-email-agraf@suse.de>

We need to patch an instruction that is covered by the fixup
framework. If we don't do anything about it we end up getting
our own patched instruction unpatched by nops by the fixups.

So add an export to the fixup code that allows us to tell it
that an instruction moved location in memory. This works because
we move the instruction into a different location, but still
execute it.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/include/asm/cputable.h |  2 ++
 arch/powerpc/lib/feature-fixups.c   | 22 ++++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
index 0d4939b..c981f99 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -99,6 +99,8 @@ extern unsigned int __start___ftr_fixup, __stop___ftr_fixup;
 extern struct cpu_spec *identify_cpu(unsigned long offset, unsigned int pvr);
 extern void do_feature_fixups(unsigned long value, void *fixup_start,
 			      void *fixup_end);
+extern void relocate_fixup_entry(void *fixup_start, void *fixup_end,
+				 void *old_addr, void *new_addr);
 
 extern const char *powerpc_base_platform;
 
diff --git a/arch/powerpc/lib/feature-fixups.c b/arch/powerpc/lib/feature-fixups.c
index 7a8a748..33996a4 100644
--- a/arch/powerpc/lib/feature-fixups.c
+++ b/arch/powerpc/lib/feature-fixups.c
@@ -151,6 +151,28 @@ void do_final_fixups(void)
 #endif
 }
 
+/*
+ * This changes the internal fixup location of a code block from
+ * old_addr to new_addr.
+ */
+void relocate_fixup_entry(void *fixup_start, void *fixup_end,
+			  void *old_addr, void *new_addr)
+{
+	struct fixup_entry *fcur, *fend;
+
+	fcur = fixup_start;
+	fend = fixup_end;
+
+	for (; fcur < fend; fcur++) {
+		long diff = (long)new_addr -
+			    (long)calc_addr(fcur, fcur->start_off);
+		if (calc_addr(fcur, fcur->start_off) == old_addr) {
+			fcur->start_off += diff;
+			fcur->end_off += diff;
+		}
+	}
+}
+
 #ifdef CONFIG_FTR_FIXUP_SELFTEST
 
 #define check(x)	\
-- 
1.8.1.4

^ permalink raw reply related

* [PATCH 0/4] powerpc: Enable ILE on pSeries without H_MODE_SET
From: Alexander Graf @ 2013-12-23  1:24 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Dinar Valeev, Anton Blanchard

Howdy,

There are a few machines out there that would be pretty ppc64le capable
if only it wasn't for the hypervisor that's running on them.

The problem is that we need to run in ILE (interrupts delivered in little
endian) mode to run our normal interrupt vectors. The hypercall to enable
this mode is not available on PowerVM.

However, we can try to be smart here. Instead of configuring the hypervisor
to run interrupts in little endian mode, we can just bring ourselves back
to little endian on every interrupt.

We do this by patching every interrupt handler with a "branch" instruction
at the beginning which jumps into a tiny helper that turns on MSR.LE and
resumes the interrupt handler.

This is not the most smart thing to do performance wise, but it does have
a really nice side effect: We do not impact hypervisors that do support
H_SET_MODE. This is important, as we want to be as fast as possible on machines
that are supposed to run Little Endian guests.

As a tiny side effect we also clobber CFAR in every interrupt, rendering
the whole thing pretty useless. So we don't get nice and shiny perf and
debugging helps. Oh well.

Given that the alternatives to all of this would either be

  a) Not run at all or
  b) Incur performance penalties for "good" systems or
  c) Add significant maintenance burden on us

I'm quite convinced this is the way we want to go.


Enjoy,

Alex

Alexander Graf (4):
  powerpc: Add global exports for all interrupt vectors
  powerpc: Add relocation code for fixups
  powerpc: Add hack to make ppc64le work on hosts without ILE
  powerpc: Don't return to BE mode when we are already there

 arch/powerpc/include/asm/cputable.h    |   2 +
 arch/powerpc/kernel/Makefile           |   3 +
 arch/powerpc/kernel/exceptions-64s.S   |   5 ++
 arch/powerpc/kernel/fake_ile.S         | 101 +++++++++++++++++++++++++++++++++
 arch/powerpc/kernel/vmlinux.lds.S      |  14 +++++
 arch/powerpc/lib/feature-fixups.c      |  22 +++++++
 arch/powerpc/platforms/pseries/setup.c |  61 +++++++++++++++++++-
 7 files changed, 207 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/kernel/fake_ile.S

-- 
1.8.1.4

^ permalink raw reply

* [PATCH 1/4] powerpc: Add global exports for all interrupt vectors
From: Alexander Graf @ 2013-12-23  1:24 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Dinar Valeev, Anton Blanchard
In-Reply-To: <1387761885-39599-1-git-send-email-agraf@suse.de>

We need to access every interrupt vector we can find soon, so let's
make them all visible through names to outside code.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/kernel/exceptions-64s.S | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 9f905e4..fcd039f 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -148,6 +148,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
 				 NOTEST, 0x100)
 
 	. = 0x200
+	.globl machine_check_pSeries_1
 machine_check_pSeries_1:
 	/* This is moved out of line as it can be patched by FW, but
 	 * some code path might still want to branch into the original
@@ -328,24 +329,28 @@ hv_doorbell_trampoline:
 	 * trickery is thus necessary
 	 */
 	. = 0xf00
+	.global performance_monitor_pseries_trampoline
 performance_monitor_pseries_trampoline:
 	SET_SCRATCH0(r13)
 	EXCEPTION_PROLOG_0(PACA_EXGEN)
 	b	performance_monitor_pSeries
 
 	. = 0xf20
+	.global altivec_unavailable_pseries_trampoline
 altivec_unavailable_pseries_trampoline:
 	SET_SCRATCH0(r13)
 	EXCEPTION_PROLOG_0(PACA_EXGEN)
 	b	altivec_unavailable_pSeries
 
 	. = 0xf40
+	.global vsx_unavailable_pseries_trampoline
 vsx_unavailable_pseries_trampoline:
 	SET_SCRATCH0(r13)
 	EXCEPTION_PROLOG_0(PACA_EXGEN)
 	b	vsx_unavailable_pSeries
 
 	. = 0xf60
+	.global facility_unavailable_trampoline
 facility_unavailable_trampoline:
 	SET_SCRATCH0(r13)
 	EXCEPTION_PROLOG_0(PACA_EXGEN)
-- 
1.8.1.4

^ permalink raw reply related

* [PATCH 3/4] powerpc: Add hack to make ppc64le work on hosts without ILE
From: Alexander Graf @ 2013-12-23  1:24 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Dinar Valeev, Anton Blanchard
In-Reply-To: <1387761885-39599-1-git-send-email-agraf@suse.de>

Some hypervisors don't implement the H_SET_MODE hypercall that we
need to set the ILE bit in LPCR which allows us to execute interrupts
in little endian mode.

However otherwise we would be able to run on those hypervisors just
fine.

So let's be creative. This patch creates a few small helpers that
work without register clobbering that execute in big endian mode.

Every interrupt gets patched with a branch instruction as the first
instruction which jumps into their respective helper. That helper
enables MSR.LE and returns back to the original interrupt.

That way we can keep our interrupt handlers in little endian code
while the hypervisor is unaware that we need them to be in little
endian.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/kernel/Makefile           |   3 +
 arch/powerpc/kernel/fake_ile.S         | 101 +++++++++++++++++++++++++++++++++
 arch/powerpc/kernel/vmlinux.lds.S      |  14 +++++
 arch/powerpc/platforms/pseries/setup.c |  50 ++++++++++++++++
 4 files changed, 168 insertions(+)
 create mode 100644 arch/powerpc/kernel/fake_ile.S

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 445cb6e..31842e6 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -131,6 +131,9 @@ endif
 
 obj-$(CONFIG_EPAPR_PARAVIRT)	+= epapr_paravirt.o epapr_hcalls.o
 obj-$(CONFIG_KVM_GUEST)		+= kvm.o kvm_emul.o
+ifeq ($(CONFIG_CPU_LITTLE_ENDIAN),y)
+obj-$(CONFIG_PPC_BOOK3S_64)	+= fake_ile.o
+endif
 
 # Disable GCOV in odd or sensitive code
 GCOV_PROFILE_prom_init.o := n
diff --git a/arch/powerpc/kernel/fake_ile.S b/arch/powerpc/kernel/fake_ile.S
new file mode 100644
index 0000000..21e9bd7
--- /dev/null
+++ b/arch/powerpc/kernel/fake_ile.S
@@ -0,0 +1,101 @@
+/*
+ * PowerPC helpers for hypervisors without ILE implementation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2013
+ *
+ * Authors: Alexander Graf <agraf@suse.de>
+ */
+
+#include <asm/reg.h>
+#include <asm/ppc_asm.h>
+#include <asm/asm-offsets.h>
+#include <asm/exception-64s.h>
+
+/* Little Endian fixups for hosts that don't support Little Endian */
+
+#define FAKE_ILE_HANDLER(handler, area) 			 \
+								 \
+/* This runs in BE mode */					 \
+fake_ile_##handler:						 \
+	.section __be_patch,"a"					;\
+	.llong fake_ile_##handler				;\
+	.previous						;\
+	SET_SCRATCH0(r13)					;\
+	GET_PACA(r13)						;\
+	std     r9, area + EX_R9(r13)				;\
+	std     r10, area + EX_R10(r13)				;\
+	mfsrr0	r9						;\
+	mfsrr1	r10						;\
+	std	r9, area + EX_SRR0(r13)				;\
+	std	r10, area + EX_R11(r13)				;\
+	mflr	r9						;\
+	bl	1f						;\
+	1:							;\
+	mflr	r10						;\
+	mtlr	r9						;\
+	addi	r9, r10, back_to_interrupt_##handler - 1b	;\
+	mfmsr	r10						;\
+	ori	r10, r10, MSR_LE				;\
+	mtsrr0	r9						;\
+	mtsrr1	r10						;\
+	ld	r9, area + EX_SRR0(r13)				;\
+	ld	r10, area + EX_R11(r13)				;\
+	RFI							;\
+	end_fake_ile_##handler:					;\
+	.section __be_patch,"a"					;\
+	.llong end_fake_ile_##handler				;\
+	.previous						;\
+								;\
+/* This runs in LE mode */					 \
+back_to_interrupt_##handler:					;\
+	mtsrr0	r9						;\
+	mtsrr1	r10						;\
+	li	r9, area + EX_R9				;\
+	li	r10, area + EX_R10				;\
+	ldbrx	r9, r13, r9					;\
+	ldbrx	r10, r13, r10					;\
+	GET_SCRATCH0(r13)					;\
+	/* This becomes the instruction we patched away */	 \
+	patched_insn_##handler:					;\
+	.long 0							;\
+	b 	handler + 4					;\
+								 \
+	.section __fake_ile,"a"					;\
+	.llong handler						;\
+	.llong patched_insn_##handler				;\
+	.llong fake_ile_##handler				;\
+	.previous						;\
+
+FAKE_ILE_HANDLER(system_reset_pSeries, PACA_EXMC)
+FAKE_ILE_HANDLER(machine_check_pSeries_1, PACA_EXMC)
+FAKE_ILE_HANDLER(data_access_pSeries, PACA_EXGEN)
+FAKE_ILE_HANDLER(data_access_slb_pSeries, PACA_EXSLB)
+FAKE_ILE_HANDLER(instruction_access_pSeries, PACA_EXGEN)
+FAKE_ILE_HANDLER(instruction_access_slb_pSeries, PACA_EXSLB)
+FAKE_ILE_HANDLER(hardware_interrupt_pSeries, PACA_EXGEN)
+FAKE_ILE_HANDLER(alignment_pSeries, PACA_EXGEN)
+FAKE_ILE_HANDLER(program_check_pSeries, PACA_EXGEN)
+FAKE_ILE_HANDLER(fp_unavailable_pSeries, PACA_EXGEN)
+FAKE_ILE_HANDLER(decrementer_pSeries, PACA_EXGEN)
+FAKE_ILE_HANDLER(doorbell_super_pSeries, PACA_EXGEN)
+FAKE_ILE_HANDLER(trap_0b_pSeries, PACA_EXGEN)
+FAKE_ILE_HANDLER(system_call_pSeries, PACA_EXGEN)
+FAKE_ILE_HANDLER(performance_monitor_pseries_trampoline, PACA_EXGEN)
+FAKE_ILE_HANDLER(altivec_unavailable_pseries_trampoline, PACA_EXGEN)
+FAKE_ILE_HANDLER(vsx_unavailable_pseries_trampoline, PACA_EXGEN)
+FAKE_ILE_HANDLER(facility_unavailable_trampoline, PACA_EXGEN)
+FAKE_ILE_HANDLER(instruction_breakpoint_pSeries, PACA_EXGEN)
+FAKE_ILE_HANDLER(altivec_assist_pSeries, PACA_EXGEN)
diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S
index f096e72..7fdb7c9 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -147,6 +147,20 @@ SECTIONS
 		*(__fw_ftr_fixup)
 		__stop___fw_ftr_fixup = .;
 	}
+
+	. = ALIGN(8);
+	__fake_ile : AT(ADDR(__fake_ile) - LOAD_OFFSET) {
+		__start___fake_ile = .;
+		*(__fake_ile)
+		__stop___fake_ile = .;
+	}
+
+	. = ALIGN(8);
+	__be_patch : AT(ADDR(__be_patch) - LOAD_OFFSET) {
+		__start___be_patch = .;
+		*(__be_patch)
+		__stop___be_patch = .;
+	}
 #endif
 	.init.ramfs : AT(ADDR(.init.ramfs) - LOAD_OFFSET) {
 		INIT_RAM_FS
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index c1f1908..fb5d98c 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -67,6 +67,9 @@
 #include <asm/eeh.h>
 #include <asm/reg.h>
 #include <asm/plpar_wrappers.h>
+#include <asm/cacheflush.h>
+#include <asm/cputable.h>
+#include <asm/code-patching.h>
 
 #include "pseries.h"
 
@@ -455,6 +458,40 @@ long pseries_big_endian_exceptions(void)
 	}
 }
 
+static void swizzle_endian(u32 *start, u32 *end)
+{
+	for (; (long)start < (long)end; start++)
+		patch_instruction(start, swab32(*start));
+}
+
+static void fixup_missing_little_endian_exceptions(void)
+{
+	extern u32 *__start___fake_ile, *__stop___fake_ile;
+	extern u32 *__start___be_patch, *__stop___be_patch;
+	u32 **be_table = &__start___be_patch;
+	u32 **fake_table = &__start___fake_ile;
+
+	/* Make our big endian code look like big endian code */
+	for (; (long)be_table < (long)&__stop___be_patch; be_table += 2)
+		swizzle_endian(be_table[0], be_table[1]);
+
+	/* Now patch the interrupt handlers to branch to our BE code */
+	for (; (long)fake_table < (long)&__stop___fake_ile; fake_table += 3) {
+		u32 *le_handler = fake_table[0];
+		u32 *patched_insn = fake_table[1];
+		u32 *be_handler = fake_table[2];
+		u32 le_be_diff = (long)be_handler - (long)le_handler;
+		patch_instruction(patched_insn, *le_handler);
+		/* This patches the interrupt handler's first instruction into
+		   a branch that jumps to our BE handler that enables MSR_LE */
+		patch_instruction(le_handler, swab32(0x48000000 | le_be_diff));
+		/* Make sure that feature fixups use the new address for its
+		   code patching */
+		relocate_fixup_entry(&__start___ftr_fixup, &__stop___ftr_fixup,
+				     le_handler, patched_insn);
+	}
+}
+
 static long pseries_little_endian_exceptions(void)
 {
 	long rc;
@@ -737,6 +774,19 @@ static int __init pSeries_probe(void)
 			ppc_md.progress("H_SET_MODE LE exception fail", 0);
 			panic("Could not enable little endian exceptions");
 		}
+	} else {
+		/*
+		 * The hypervisor we're running on does not know how to
+		 * configure us to run interrupts in little endian mode,
+		 * so we have to cheat a bit.
+		 *
+		 * This call reprograms all interrupt handlers' first
+		 * instruction into a branch to a big endian fixup section
+		 * which only transitions us into little endian mode, then
+		 * returns back to the normal little endian interrupt
+		 * handler.
+		 */
+		fixup_missing_little_endian_exceptions();
 	}
 #endif
 
-- 
1.8.1.4

^ permalink raw reply related

* [PATCH] ibmveth: Fix more little endian issues
From: Alexander Graf @ 2013-12-23  1:29 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Santiago Leon, Dinar Valeev, Anton Blanchard, netdev

The ibmveth driver is memcpy()'ing the mac address between a variable
(register) and memory. This assumes a certain endianness of the
system, so let's make that implicit assumption work again.

This patch adds be64_to_cpu() calls to all places where the mac address
gets memcpy()'ed into a local variable.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 drivers/net/ethernet/ibm/ibmveth.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
index 952d795..97f4ee96 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -581,7 +581,7 @@ static int ibmveth_open(struct net_device *netdev)
 	adapter->rx_queue.toggle = 1;
 
 	memcpy(&mac_address, netdev->dev_addr, netdev->addr_len);
-	mac_address = mac_address >> 16;
+	mac_address = be64_to_cpu(mac_address) >> 16;
 
 	rxq_desc.fields.flags_len = IBMVETH_BUF_VALID |
 					adapter->rx_queue.queue_len;
@@ -1186,6 +1186,7 @@ static void ibmveth_set_multicast_list(struct net_device *netdev)
 			/* add the multicast address to the filter table */
 			unsigned long mcast_addr = 0;
 			memcpy(((char *)&mcast_addr)+2, ha->addr, ETH_ALEN);
+			mcast_addr = cpu_to_be64(mcast_addr);
 			lpar_rc = h_multicast_ctrl(adapter->vdev->unit_address,
 						   IbmVethMcastAddFilter,
 						   mcast_addr);
-- 
1.8.1.4

^ permalink raw reply related

* Re: [PATCH v2] powernv: eeh: add buffer for P7IOC hub error data
From: Gavin Shan @ 2013-12-23  1:33 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <20131220190601.GC5480@oc3347516403.ibm.com>

On Fri, Dec 20, 2013 at 01:06:01PM -0600, Brian W Hart wrote:
>V2: Replace driver-global 'hub_diag' with a per-PHB hub diag structure.
>
>Prevent ioda_eeh_hub_diag() from clobbering itself when called by supplying
>a per-PHB buffer for P7IOC hub diagnostic data.  Take care to inform OPAL of
>the correct size for the buffer.
>
>Signed-off-by: Brian W Hart <hartb@linux.vnet.ibm.com>

Acked-by: Gavin Shan <shangw@linux.vnet.ibm.com>

We also need backport it to stable kernel 3.11+

>---
> arch/powerpc/platforms/powernv/eeh-ioda.c | 15 ++-------------
> arch/powerpc/platforms/powernv/pci.h      |  4 +++-
> 2 files changed, 5 insertions(+), 14 deletions(-)
>
>diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
>index 8184ef5..4790275 100644
>--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
>+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
>@@ -36,7 +36,6 @@
> #include "powernv.h"
> #include "pci.h"
>
>-static char *hub_diag = NULL;
> static int ioda_eeh_nb_init = 0;
>
> static int ioda_eeh_event(struct notifier_block *nb,
>@@ -140,15 +139,6 @@ static int ioda_eeh_post_init(struct pci_controller *hose)
> 		ioda_eeh_nb_init = 1;
> 	}
>
>-	/* We needn't HUB diag-data on PHB3 */
>-	if (phb->type == PNV_PHB_IODA1 && !hub_diag) {
>-		hub_diag = (char *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
>-		if (!hub_diag) {
>-			pr_err("%s: Out of memory !\n", __func__);
>-			return -ENOMEM;
>-		}
>-	}
>-
> #ifdef CONFIG_DEBUG_FS
> 	if (phb->dbgfs) {
> 		debugfs_create_file("err_injct_outbound", 0600,
>@@ -633,11 +623,10 @@ static void ioda_eeh_hub_diag_common(struct OpalIoP7IOCErrorData *data)
> static void ioda_eeh_hub_diag(struct pci_controller *hose)
> {
> 	struct pnv_phb *phb = hose->private_data;
>-	struct OpalIoP7IOCErrorData *data;
>+	struct OpalIoP7IOCErrorData *data = &phb->diag.hub_diag;
> 	long rc;
>
>-	data = (struct OpalIoP7IOCErrorData *)ioda_eeh_hub_diag;
>-	rc = opal_pci_get_hub_diag_data(phb->hub_id, data, PAGE_SIZE);
>+	rc = opal_pci_get_hub_diag_data(phb->hub_id, data, sizeof *data);
> 	if (rc != OPAL_SUCCESS) {
> 		pr_warning("%s: Failed to get HUB#%llx diag-data (%ld)\n",
> 			   __func__, phb->hub_id, rc);
>diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>index 911c24e..1ed8d5f 100644
>--- a/arch/powerpc/platforms/powernv/pci.h
>+++ b/arch/powerpc/platforms/powernv/pci.h
>@@ -172,11 +172,13 @@ struct pnv_phb {
> 		} ioda;
> 	};
>
>-	/* PHB status structure */
>+	/* PHB and hub status structure */
> 	union {
> 		unsigned char			blob[PNV_PCI_DIAG_BUF_SIZE];
> 		struct OpalIoP7IOCPhbErrorData	p7ioc;
>+		struct OpalIoP7IOCErrorData 	hub_diag;
> 	} diag;
>+
> };
>
> extern struct pci_ops pnv_pci_ops;

Thanks,
Gavin

^ permalink raw reply

* [PATCH] powerpc/mpic_timer: fix the time calculation is not accurate
From: Dongsheng Wang @ 2013-12-23  2:33 UTC (permalink / raw)
  To: scottwood, galak; +Cc: linuxppc-dev, Wang Dongsheng

From: Wang Dongsheng <dongsheng.wang@freescale.com>

When the timer GTCCR toggle bit is inverted, we calculated the rest
of the time is not accurate. So we need to ignore this bit.

Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>

diff --git a/arch/powerpc/sysdev/mpic_timer.c b/arch/powerpc/sysdev/mpic_timer.c
index 22d7d57..0fb70c9 100644
--- a/arch/powerpc/sysdev/mpic_timer.c
+++ b/arch/powerpc/sysdev/mpic_timer.c
@@ -41,6 +41,7 @@
 #define MPIC_TIMER_TCR_ROVR_OFFSET	24
 
 #define TIMER_STOP			0x80000000
+#define GTCCR_TOG			0x80000000
 #define TIMERS_PER_GROUP		4
 #define MAX_TICKS			(~0U >> 1)
 #define MAX_TICKS_CASCADE		(~0U)
@@ -96,8 +97,15 @@ static void convert_ticks_to_time(struct timer_group_priv *priv,
 	time->tv_sec = (__kernel_time_t)div_u64(ticks, priv->timerfreq);
 	tmp_sec = (u64)time->tv_sec * (u64)priv->timerfreq;
 
-	time->tv_usec = (__kernel_suseconds_t)
-		div_u64((ticks - tmp_sec) * 1000000, priv->timerfreq);
+	time->tv_usec = 0;
+
+	/*
+	 * In some cases tmp_sec may be greater than ticks, because in the
+	 * process of calculation ticks and tmp_sec will be rounded.
+	 */
+	if (tmp_sec <= ticks)
+		time->tv_usec = (__kernel_suseconds_t)
+			div_u64((ticks - tmp_sec) * 1000000, priv->timerfreq);
 
 	return;
 }
@@ -327,11 +335,13 @@ void mpic_get_remain_time(struct mpic_timer *handle, struct timeval *time)
 	casc_priv = priv->timer[handle->num].cascade_handle;
 	if (casc_priv) {
 		tmp_ticks = in_be32(&priv->regs[handle->num].gtccr);
+		tmp_ticks &= ~GTCCR_TOG;
 		ticks = ((u64)tmp_ticks & UINT_MAX) * (u64)MAX_TICKS_CASCADE;
 		tmp_ticks = in_be32(&priv->regs[handle->num - 1].gtccr);
 		ticks += tmp_ticks;
 	} else {
 		ticks = in_be32(&priv->regs[handle->num].gtccr);
+		ticks &= ~GTCCR_TOG;
 	}
 
 	convert_ticks_to_time(priv, ticks, time);
-- 
1.8.5

^ permalink raw reply related

* [PATCH 02/21] net: freescale: slight optimization of addr compare
From: Ding Tianhong @ 2013-12-23  5:09 UTC (permalink / raw)
  To: Li Yang, Netdev, linux-kernel@vger.kernel.org, linuxppc-dev

Use the recently added and possibly more efficient
ether_addr_equal_unaligned to instead of memcmp.

Cc: Li Yang <leoli@freescale.com>
Cc: netdev@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
 drivers/net/ethernet/freescale/ucc_geth.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/ucc_geth.c b/drivers/net/ethernet/freescale/ucc_geth.c
index 5548b6d..88a1525 100644
--- a/drivers/net/ethernet/freescale/ucc_geth.c
+++ b/drivers/net/ethernet/freescale/ucc_geth.c
@@ -437,7 +437,7 @@ static void hw_add_addr_in_hash(struct ucc_geth_private *ugeth,
 
 static inline int compare_addr(u8 **addr1, u8 **addr2)
 {
-	return memcmp(addr1, addr2, ETH_ALEN);
+	return !ether_addr_equal_unaligned(addr1, addr2);
 }
 
 #ifdef DEBUG
-- 
1.8.0

^ permalink raw reply related

* Re: [PATCH] ibmveth: Fix more little endian issues
From: Anton Blanchard @ 2013-12-23  6:38 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Dinar Valeev, Santiago Leon, linuxppc-dev, netdev
In-Reply-To: <1387762163-39662-1-git-send-email-agraf@suse.de>


Hi Alex,

> The ibmveth driver is memcpy()'ing the mac address between a variable
> (register) and memory. This assumes a certain endianness of the
> system, so let's make that implicit assumption work again.

Nice catch! I don't like how the driver has two different methods
for creating these MAC addresses, both without comments. How does
this look?

Anton
--

The hypervisor expects MAC addresses passed in registers to be big
endian u64. Create a helper function called ibmveth_encode_mac_addr
which does the right thing in both big and little endian.

We were storing the MAC address in a long in struct ibmveth_adapter.
It's never used so remove it - we don't need another place in the
driver where we create endian issues with MAC addresses.

Reported-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Anton Blanchard <anton@samba.org>
---

diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
index 952d795..044178b 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -523,6 +523,17 @@ retry:
 	return rc;
 }
 
+/*
+ * The hypervisor expects MAC addresses passed in registers to be
+ * big endian u64.
+ */
+static unsigned long ibmveth_encode_mac_addr(char *mac)
+{
+	unsigned long encoded = 0;
+	memcpy(((char *)&encoded) + 2, mac, ETH_ALEN);
+	return cpu_to_be64(encoded);
+}
+
 static int ibmveth_open(struct net_device *netdev)
 {
 	struct ibmveth_adapter *adapter = netdev_priv(netdev);
@@ -580,8 +591,7 @@ static int ibmveth_open(struct net_device *netdev)
 	adapter->rx_queue.num_slots = rxq_entries;
 	adapter->rx_queue.toggle = 1;
 
-	memcpy(&mac_address, netdev->dev_addr, netdev->addr_len);
-	mac_address = mac_address >> 16;
+	mac_address = ibmveth_encode_mac_addr(netdev->dev_addr);
 
 	rxq_desc.fields.flags_len = IBMVETH_BUF_VALID |
 					adapter->rx_queue.queue_len;
@@ -1184,8 +1194,8 @@ static void ibmveth_set_multicast_list(struct net_device *netdev)
 		/* add the addresses to the filter table */
 		netdev_for_each_mc_addr(ha, netdev) {
 			/* add the multicast address to the filter table */
-			unsigned long mcast_addr = 0;
-			memcpy(((char *)&mcast_addr)+2, ha->addr, ETH_ALEN);
+			unsigned long mcast_addr;
+			mcast_addr = ibmveth_encode_mac_addr(ha->addr);
 			lpar_rc = h_multicast_ctrl(adapter->vdev->unit_address,
 						   IbmVethMcastAddFilter,
 						   mcast_addr);
@@ -1369,9 +1379,6 @@ static int ibmveth_probe(struct vio_dev *dev, const struct vio_device_id *id)
 
 	netif_napi_add(netdev, &adapter->napi, ibmveth_poll, 16);
 
-	adapter->mac_addr = 0;
-	memcpy(&adapter->mac_addr, mac_addr_p, ETH_ALEN);
-
 	netdev->irq = dev->irq;
 	netdev->netdev_ops = &ibmveth_netdev_ops;
 	netdev->ethtool_ops = &netdev_ethtool_ops;
@@ -1380,7 +1387,7 @@ static int ibmveth_probe(struct vio_dev *dev, const struct vio_device_id *id)
 		NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM;
 	netdev->features |= netdev->hw_features;
 
-	memcpy(netdev->dev_addr, &adapter->mac_addr, netdev->addr_len);
+	memcpy(netdev->dev_addr, mac_addr_p, ETH_ALEN);
 
 	for (i = 0; i < IBMVETH_NUM_BUFF_POOLS; i++) {
 		struct kobject *kobj = &adapter->rx_buff_pool[i].kobj;
diff --git a/drivers/net/ethernet/ibm/ibmveth.h b/drivers/net/ethernet/ibm/ibmveth.h
index 84066ba..2c636cb 100644
--- a/drivers/net/ethernet/ibm/ibmveth.h
+++ b/drivers/net/ethernet/ibm/ibmveth.h
@@ -139,7 +139,6 @@ struct ibmveth_adapter {
     struct napi_struct napi;
     struct net_device_stats stats;
     unsigned int mcastFilterSize;
-    unsigned long mac_addr;
     void * buffer_list_addr;
     void * filter_list_addr;
     dma_addr_t buffer_list_dma;

^ permalink raw reply related

* Re: [PATCH] ibmveth: Fix more little endian issues
From: Alexander Graf @ 2013-12-23 10:17 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: Dinar Valeev, Santiago Leon, linuxppc-dev, netdev
In-Reply-To: <20131223173833.0a9a6705@kryten>


On 23.12.2013, at 07:38, Anton Blanchard <anton@samba.org> wrote:

>=20
> Hi Alex,
>=20
>> The ibmveth driver is memcpy()'ing the mac address between a variable
>> (register) and memory. This assumes a certain endianness of the
>> system, so let's make that implicit assumption work again.
>=20
> Nice catch! I don't like how the driver has two different methods
> for creating these MAC addresses, both without comments. How does
> this look?

Heh - I didn't even realize those two places were doing the same thing.

Obviously your patch is by far nicer.

Reviewed-by: Alexander Graf <agraf@suse.de>


Alex

>=20
> Anton
> --
>=20
> The hypervisor expects MAC addresses passed in registers to be big
> endian u64. Create a helper function called ibmveth_encode_mac_addr
> which does the right thing in both big and little endian.
>=20
> We were storing the MAC address in a long in struct ibmveth_adapter.
> It's never used so remove it - we don't need another place in the
> driver where we create endian issues with MAC addresses.
>=20
> Reported-by: Alexander Graf <agraf@suse.de>
> Signed-off-by: Anton Blanchard <anton@samba.org>
> ---
>=20
> diff --git a/drivers/net/ethernet/ibm/ibmveth.c =
b/drivers/net/ethernet/ibm/ibmveth.c
> index 952d795..044178b 100644
> --- a/drivers/net/ethernet/ibm/ibmveth.c
> +++ b/drivers/net/ethernet/ibm/ibmveth.c
> @@ -523,6 +523,17 @@ retry:
> 	return rc;
> }
>=20
> +/*
> + * The hypervisor expects MAC addresses passed in registers to be
> + * big endian u64.
> + */
> +static unsigned long ibmveth_encode_mac_addr(char *mac)
> +{
> +	unsigned long encoded =3D 0;
> +	memcpy(((char *)&encoded) + 2, mac, ETH_ALEN);
> +	return cpu_to_be64(encoded);
> +}
> +
> static int ibmveth_open(struct net_device *netdev)
> {
> 	struct ibmveth_adapter *adapter =3D netdev_priv(netdev);
> @@ -580,8 +591,7 @@ static int ibmveth_open(struct net_device *netdev)
> 	adapter->rx_queue.num_slots =3D rxq_entries;
> 	adapter->rx_queue.toggle =3D 1;
>=20
> -	memcpy(&mac_address, netdev->dev_addr, netdev->addr_len);
> -	mac_address =3D mac_address >> 16;
> +	mac_address =3D ibmveth_encode_mac_addr(netdev->dev_addr);
>=20
> 	rxq_desc.fields.flags_len =3D IBMVETH_BUF_VALID |
> 					adapter->rx_queue.queue_len;
> @@ -1184,8 +1194,8 @@ static void ibmveth_set_multicast_list(struct =
net_device *netdev)
> 		/* add the addresses to the filter table */
> 		netdev_for_each_mc_addr(ha, netdev) {
> 			/* add the multicast address to the filter table =
*/
> -			unsigned long mcast_addr =3D 0;
> -			memcpy(((char *)&mcast_addr)+2, ha->addr, =
ETH_ALEN);
> +			unsigned long mcast_addr;
> +			mcast_addr =3D =
ibmveth_encode_mac_addr(ha->addr);
> 			lpar_rc =3D =
h_multicast_ctrl(adapter->vdev->unit_address,
> 						   =
IbmVethMcastAddFilter,
> 						   mcast_addr);
> @@ -1369,9 +1379,6 @@ static int ibmveth_probe(struct vio_dev *dev, =
const struct vio_device_id *id)
>=20
> 	netif_napi_add(netdev, &adapter->napi, ibmveth_poll, 16);
>=20
> -	adapter->mac_addr =3D 0;
> -	memcpy(&adapter->mac_addr, mac_addr_p, ETH_ALEN);
> -
> 	netdev->irq =3D dev->irq;
> 	netdev->netdev_ops =3D &ibmveth_netdev_ops;
> 	netdev->ethtool_ops =3D &netdev_ethtool_ops;
> @@ -1380,7 +1387,7 @@ static int ibmveth_probe(struct vio_dev *dev, =
const struct vio_device_id *id)
> 		NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM;
> 	netdev->features |=3D netdev->hw_features;
>=20
> -	memcpy(netdev->dev_addr, &adapter->mac_addr, netdev->addr_len);
> +	memcpy(netdev->dev_addr, mac_addr_p, ETH_ALEN);
>=20
> 	for (i =3D 0; i < IBMVETH_NUM_BUFF_POOLS; i++) {
> 		struct kobject *kobj =3D &adapter->rx_buff_pool[i].kobj;
> diff --git a/drivers/net/ethernet/ibm/ibmveth.h =
b/drivers/net/ethernet/ibm/ibmveth.h
> index 84066ba..2c636cb 100644
> --- a/drivers/net/ethernet/ibm/ibmveth.h
> +++ b/drivers/net/ethernet/ibm/ibmveth.h
> @@ -139,7 +139,6 @@ struct ibmveth_adapter {
>     struct napi_struct napi;
>     struct net_device_stats stats;
>     unsigned int mcastFilterSize;
> -    unsigned long mac_addr;
>     void * buffer_list_addr;
>     void * filter_list_addr;
>     dma_addr_t buffer_list_dma;

^ permalink raw reply

* [PATCH 1/3] powerpc/xmon: Don't loop forever in get_output_lock()
From: Michael Ellerman @ 2013-12-23 12:46 UTC (permalink / raw)
  To: linuxppc-dev

From: Michael Ellerman <michael@ellerman.id.au>

If we enter with xmon_speaker != 0 we skip the first cmpxchg(), we also
skip the while loop because xmon_speaker != last_speaker (0) - meaning we
skip the second cmpxchg() also.

Following that code path the compiler sees no memory barriers and so is
within its rights to never reload xmon_speaker. The end result is we loop
forever.

This manifests as all cpus being in xmon ('c' command), but they refuse
to take control when you switch to them ('c x' for cpu # x).

I have seen this deadlock in practice and also checked the generated code to
confirm this is what's happening.

The simplest fix is just to always try the cmpxchg().

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
---
 arch/powerpc/xmon/xmon.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index af9d346..500105c 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -309,12 +309,12 @@ static void get_output_lock(void)
 
 	if (xmon_speaker == me)
 		return;
+
 	for (;;) {
-		if (xmon_speaker == 0) {
-			last_speaker = cmpxchg(&xmon_speaker, 0, me);
-			if (last_speaker == 0)
-				return;
-		}
+		last_speaker = cmpxchg(&xmon_speaker, 0, me);
+		if (last_speaker == 0)
+			return;
+
 		timeout = 10000000;
 		while (xmon_speaker == last_speaker) {
 			if (--timeout > 0)
-- 
1.8.3.2

^ permalink raw reply related

* [PATCH 2/3] powerpc/xmon: Fix timeout loop in get_output_lock()
From: Michael Ellerman @ 2013-12-23 12:46 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <1387802766-7199-1-git-send-email-mpe@ellerman.id.au>

As far as I can tell, our 70s era timeout loop in get_output_lock() is
generating no code.

This leads to the hostile takeover happening more or less simultaneously
on all cpus. The result is "interesting", some example output that is
more readable than most:

    cpu 0x1: Vector: 100 (Scypsut e0mx bR:e setV)e catto xc0p:u[ c 00
    c0:0  000t0o0V0erc0td:o5 rfc28050000]0c00 0 0  0 6t(pSrycsV1ppuot
    uxe 1m 2 0Rx21e3:0s0ce000c00000t00)00 60602oV2SerucSayt0y 0p 1sxs

Fix it by using udelay() in the timeout loop. The wait time and check
frequency are arbitrary, but seem to work OK. We already rely on
udelay() working so this is not a new dependency.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/powerpc/xmon/xmon.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 500105c..051037e 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -315,10 +315,17 @@ static void get_output_lock(void)
 		if (last_speaker == 0)
 			return;
 
-		timeout = 10000000;
+		/*
+		 * Wait a full second for the lock, we might be on a slow
+		 * console, but check every 100us.
+		 */
+		timeout = 10000;
 		while (xmon_speaker == last_speaker) {
-			if (--timeout > 0)
+			if (--timeout > 0) {
+				udelay(100);
 				continue;
+			}
+
 			/* hostile takeover */
 			prev = cmpxchg(&xmon_speaker, last_speaker, me);
 			if (prev == last_speaker)
-- 
1.8.3.2

^ permalink raw reply related

* [PATCH 3/3] powerpc/xmon: Don't signal we've entered until we're finished printing
From: Michael Ellerman @ 2013-12-23 12:46 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <1387802766-7199-1-git-send-email-mpe@ellerman.id.au>

Currently we set our cpu's bit in cpus_in_xmon, and then we take the
output lock and print the exception information.

This can race with the master cpu entering the command loop and printing
the backtrace. The result is that the backtrace gets garbled with
another cpu's exception print out.

Fix it by delaying the set of cpus_in_xmon until we are finished
printing.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/powerpc/xmon/xmon.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 051037e..b59f44f 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -404,7 +404,6 @@ static int xmon_core(struct pt_regs *regs, int fromipi)
 	}
 
 	xmon_fault_jmp[cpu] = recurse_jmp;
-	cpumask_set_cpu(cpu, &cpus_in_xmon);
 
 	bp = NULL;
 	if ((regs->msr & (MSR_IR|MSR_PR|MSR_64BIT)) == (MSR_IR|MSR_64BIT))
@@ -426,6 +425,8 @@ static int xmon_core(struct pt_regs *regs, int fromipi)
 		release_output_lock();
 	}
 
+	cpumask_set_cpu(cpu, &cpus_in_xmon);
+
  waiting:
 	secondary = 1;
 	while (secondary && !xmon_gate) {
-- 
1.8.3.2

^ permalink raw reply related

* Re: [PATCH] ibmveth: Fix more little endian issues
From: Joe Perches @ 2013-12-23 14:52 UTC (permalink / raw)
  To: Anton Blanchard
  Cc: Dinar Valeev, linuxppc-dev, Alexander Graf, netdev, Santiago Leon
In-Reply-To: <20131223173833.0a9a6705@kryten>

On Mon, 2013-12-23 at 17:38 +1100, Anton Blanchard wrote:
> The hypervisor expects MAC addresses passed in registers to be big
> endian u64.

So maybe use __be64 declarations?

> +static unsigned long ibmveth_encode_mac_addr(char *mac)

static __be64 ibmveth_encode_mac_addr(const char *mac)

?

etc...

^ permalink raw reply

* Re: [PATCH] powerpc: Make 64-bit non-VMX __copy_tofrom_user bi-endian
From: Anton Blanchard @ 2013-12-24  1:02 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: paulmck, paulus, linuxppc-dev
In-Reply-To: <1387459057.1305.1.camel@concordia>

Hi Michael,

> > To try and catch any screw ups in our ppc64 memcpy and
> > copy_tofrom_user loops, I wrote a quick test:
> > 
> > http://ozlabs.org/~anton/junkcode/validate_kernel_copyloops.tar.gz
> 
> Nice! How's this look?

Love it!

At the moment my other copy_to/from_user tests run against the kernel
(testing we copy all data right up to a page fault and that we return
the correct number of bytes not copied etc). A small signal handler
that walks the exception entries and branches to the handler should be
all it takes to do it completely in userspace.

Anton

> 
> cheers
> 
> 
> selftests: Import Anton's memcpy / copy_tofrom_user tests
> 
> Turn Anton's memcpy / copy_tofrom_user test into something that can
> live in tools/testing/selftests.
> 
> It requires one turd in arch/powerpc/lib/memcpy_64.S, but it's pretty
> harmless IMHO.
> 
> We are sailing very close to the wind with the feature macros. We
> define them to nothing, which currently means we get a few extra nops
> and include the unaligned calls.
> 
> ---
>  arch/powerpc/lib/memcpy_64.S                       |  2 +
>  tools/testing/selftests/powerpc/Makefile           |  2 +-
>  tools/testing/selftests/powerpc/copyloops/Makefile | 29 +++++++
>  .../selftests/powerpc/copyloops/asm/ppc_asm.h      | 86
> +++++++++++++++++++ .../selftests/powerpc/copyloops/asm/processor.h
> |  0 .../selftests/powerpc/copyloops/copyuser_64.S      |  1 +
>  .../selftests/powerpc/copyloops/copyuser_power7.S  |  1 +
>  .../selftests/powerpc/copyloops/memcpy_64.S        |  1 +
>  .../selftests/powerpc/copyloops/memcpy_power7.S    |  1 +
>  .../testing/selftests/powerpc/copyloops/validate.c | 99
> ++++++++++++++++++++++
> tools/testing/selftests/powerpc/utils.h            |  3 + 11 files
> changed, 224 insertions(+), 1 deletion(-) create mode 100644
> tools/testing/selftests/powerpc/copyloops/Makefile create mode 100644
> tools/testing/selftests/powerpc/copyloops/asm/ppc_asm.h create mode
> 100644 tools/testing/selftests/powerpc/copyloops/asm/processor.h
> create mode 120000
> tools/testing/selftests/powerpc/copyloops/copyuser_64.S create mode
> 120000 tools/testing/selftests/powerpc/copyloops/copyuser_power7.S
> create mode 120000
> tools/testing/selftests/powerpc/copyloops/memcpy_64.S create mode
> 120000 tools/testing/selftests/powerpc/copyloops/memcpy_power7.S
> create mode 100644
> tools/testing/selftests/powerpc/copyloops/validate.c
> 
> diff --git a/arch/powerpc/lib/memcpy_64.S
> b/arch/powerpc/lib/memcpy_64.S index d2bbbc8..72ad055 100644
> --- a/arch/powerpc/lib/memcpy_64.S
> +++ b/arch/powerpc/lib/memcpy_64.S
> @@ -14,7 +14,9 @@ _GLOBAL(memcpy)
>  BEGIN_FTR_SECTION
>  	std	r3,48(r1)	/* save destination pointer for
> return value */ FTR_SECTION_ELSE
> +#ifndef SELFTEST
>  	b	memcpy_power7
> +#endif
>  ALT_FTR_SECTION_END_IFCLR(CPU_FTR_VMX_COPY)
>  	PPC_MTOCRF(0x01,r5)
>  	cmpldi	cr1,r5,16
> diff --git a/tools/testing/selftests/powerpc/Makefile
> b/tools/testing/selftests/powerpc/Makefile index bd24ae5..316194f
> 100644 --- a/tools/testing/selftests/powerpc/Makefile
> +++ b/tools/testing/selftests/powerpc/Makefile
> @@ -13,7 +13,7 @@ CFLAGS := -Wall -O2 -flto -Wall -Werror
> -DGIT_VERSION='"$(GIT_VERSION)"' -I$(CUR 
>  export CC CFLAGS
>  
> -TARGETS = pmu
> +TARGETS = pmu copyloops
>  
>  endif
>  
> diff --git a/tools/testing/selftests/powerpc/copyloops/Makefile
> b/tools/testing/selftests/powerpc/copyloops/Makefile new file mode
> 100644 index 0000000..6f2d3be
> --- /dev/null
> +++ b/tools/testing/selftests/powerpc/copyloops/Makefile
> @@ -0,0 +1,29 @@
> +# The loops are all 64-bit code
> +CFLAGS += -m64
> +CFLAGS += -I$(CURDIR)
> +CFLAGS += -D SELFTEST
> +
> +# Use our CFLAGS for the implicit .S rule
> +ASFLAGS = $(CFLAGS)
> +
> +PROGS := copyuser_64 copyuser_power7 memcpy_64 memcpy_power7
> +EXTRA_SOURCES := validate.c ../harness.c
> +
> +all: $(PROGS)
> +
> +copyuser_64:     CPPFLAGS += -D
> COPY_LOOP=test___copy_tofrom_user_base +copyuser_power7: CPPFLAGS +=
> -D COPY_LOOP=test___copy_tofrom_user_power7 +memcpy_64:
> CPPFLAGS += -D COPY_LOOP=test_memcpy +memcpy_power7:   CPPFLAGS += -D
> COPY_LOOP=test_memcpy_power7 +
> +$(PROGS): $(EXTRA_SOURCES)
> +
> +run_tests: all
> +	@-for PROG in $(PROGS); do \
> +		./$$PROG; \
> +	done;
> +
> +clean:
> +	rm -f $(PROGS) *.o
> +
> +.PHONY: all run_tests clean
> diff --git a/tools/testing/selftests/powerpc/copyloops/asm/ppc_asm.h
> b/tools/testing/selftests/powerpc/copyloops/asm/ppc_asm.h new file
> mode 100644 index 0000000..ccd9c84
> --- /dev/null
> +++ b/tools/testing/selftests/powerpc/copyloops/asm/ppc_asm.h
> @@ -0,0 +1,86 @@
> +#include <ppc-asm.h>
> +
> +#define CONFIG_ALTIVEC
> +
> +#define r1	1
> +
> +#define vr0     0
> +#define vr1     1
> +#define vr2     2
> +#define vr3     3
> +#define vr4     4
> +#define vr5     5
> +#define vr6     6
> +#define vr7     7
> +#define vr8     8
> +#define vr9     9
> +#define vr10    10
> +#define vr11    11
> +#define vr12    12
> +#define vr13    13
> +#define vr14    14
> +#define vr15    15
> +#define vr16    16
> +#define vr17    17
> +#define vr18    18
> +#define vr19    19
> +#define vr20    20
> +#define vr21    21
> +#define vr22    22
> +#define vr23    23
> +#define vr24    24
> +#define vr25    25
> +#define vr26    26
> +#define vr27    27
> +#define vr28    28
> +#define vr29    29
> +#define vr30    30
> +#define vr31    31
> +
> +#define R14 r14
> +#define R15 r15
> +#define R16 r16
> +#define R17 r17
> +#define R18 r18
> +#define R19 r19
> +#define R20 r20
> +#define R21 r21
> +#define R22 r22
> +
> +#define STACKFRAMESIZE	256
> +#define STK_PARAM(i)	(48 + ((i)-3)*8)
> +#define STK_REG(i)	(112 + ((i)-14)*8)
> +
> +#define _GLOBAL(A) FUNC_START(test_ ## A)
> +
> +#define PPC_MTOCRF(A, B)	mtocrf A, B
> +
> +FUNC_START(enter_vmx_usercopy)
> +	li	r3,1
> +	blr
> +
> +FUNC_START(exit_vmx_usercopy)
> +	li	r3,0
> +	blr
> +
> +FUNC_START(enter_vmx_copy)
> +	li	r3,1
> +	blr
> +
> +FUNC_START(exit_vmx_copy)
> +	blr
> +
> +FUNC_START(memcpy_power7)
> +	blr
> +
> +FUNC_START(__copy_tofrom_user_power7)
> +	blr
> +
> +FUNC_START(__copy_tofrom_user_base)
> +	blr
> +
> +#define BEGIN_FTR_SECTION
> +#define FTR_SECTION_ELSE
> +#define ALT_FTR_SECTION_END_IFCLR(x)
> +#define ALT_FTR_SECTION_END(x, y)
> +#define END_FTR_SECTION_IFCLR(x)
> diff --git
> a/tools/testing/selftests/powerpc/copyloops/asm/processor.h
> b/tools/testing/selftests/powerpc/copyloops/asm/processor.h new file
> mode 100644 index 0000000..e69de29 diff --git
> a/tools/testing/selftests/powerpc/copyloops/copyuser_64.S
> b/tools/testing/selftests/powerpc/copyloops/copyuser_64.S new file
> mode 120000 index 0000000..f1c418a --- /dev/null
> +++ b/tools/testing/selftests/powerpc/copyloops/copyuser_64.S
> @@ -0,0 +1 @@
> +../../../../../arch/powerpc/lib/copyuser_64.S
> \ No newline at end of file
> diff --git
> a/tools/testing/selftests/powerpc/copyloops/copyuser_power7.S
> b/tools/testing/selftests/powerpc/copyloops/copyuser_power7.S new
> file mode 120000 index 0000000..4786895 --- /dev/null
> +++ b/tools/testing/selftests/powerpc/copyloops/copyuser_power7.S
> @@ -0,0 +1 @@
> +../../../../../arch/powerpc/lib/copyuser_power7.S
> \ No newline at end of file
> diff --git a/tools/testing/selftests/powerpc/copyloops/memcpy_64.S
> b/tools/testing/selftests/powerpc/copyloops/memcpy_64.S new file mode
> 120000 index 0000000..cce33fb
> --- /dev/null
> +++ b/tools/testing/selftests/powerpc/copyloops/memcpy_64.S
> @@ -0,0 +1 @@
> +../../../../../arch/powerpc/lib/memcpy_64.S
> \ No newline at end of file
> diff --git
> a/tools/testing/selftests/powerpc/copyloops/memcpy_power7.S
> b/tools/testing/selftests/powerpc/copyloops/memcpy_power7.S new file
> mode 120000 index 0000000..0d6fbfa --- /dev/null
> +++ b/tools/testing/selftests/powerpc/copyloops/memcpy_power7.S
> @@ -0,0 +1 @@
> +../../../../../arch/powerpc/lib/memcpy_power7.S
> \ No newline at end of file
> diff --git a/tools/testing/selftests/powerpc/copyloops/validate.c
> b/tools/testing/selftests/powerpc/copyloops/validate.c new file mode
> 100644 index 0000000..1750ff5
> --- /dev/null
> +++ b/tools/testing/selftests/powerpc/copyloops/validate.c
> @@ -0,0 +1,99 @@
> +#include <malloc.h>
> +#include <string.h>
> +#include <stdlib.h>
> +#include <stdbool.h>
> +
> +#include "../utils.h"
> +
> +#define MAX_LEN 8192
> +#define MAX_OFFSET 16
> +#define MIN_REDZONE 128
> +#define BUFLEN (MAX_LEN+MAX_OFFSET+2*MIN_REDZONE)
> +#define POISON 0xa5
> +
> +unsigned long COPY_LOOP(void *to, const void *from, unsigned long
> size); +
> +static void do_one(char *src, char *dst, unsigned long src_off,
> +		   unsigned long dst_off, unsigned long len, void
> *redzone,
> +		   void *fill)
> +{
> +	char *srcp, *dstp;
> +	unsigned long ret;
> +	unsigned long i;
> +
> +	srcp = src + MIN_REDZONE + src_off;
> +	dstp = dst + MIN_REDZONE + dst_off;
> +
> +	memset(src, POISON, BUFLEN);
> +	memset(dst, POISON, BUFLEN);
> +	memcpy(srcp, fill, len);
> +
> +	ret = COPY_LOOP(dstp, srcp, len);
> +	if (ret && ret != (unsigned long)dstp) {
> +		printf("(%p,%p,%ld) returned %ld\n", dstp, srcp,
> len, ret);
> +		abort();
> +	}
> +
> +	if (memcmp(dstp, srcp, len)) {
> +		printf("(%p,%p,%ld) miscompare\n", dstp, srcp, len);
> +		printf("src: ");
> +		for (i = 0; i < len; i++)
> +			printf("%02x ", srcp[i]);
> +		printf("\ndst: ");
> +		for (i = 0; i < len; i++)
> +			printf("%02x ", dstp[i]);
> +		printf("\n");
> +		abort();
> +	}
> +
> +	if (memcmp(dst, redzone, dstp - dst)) {
> +		printf("(%p,%p,%ld) redzone before corrupted\n",
> +		       dstp, srcp, len);
> +		abort();
> +	}
> +
> +	if (memcmp(dstp+len, redzone, dst+BUFLEN-(dstp+len))) {
> +		printf("(%p,%p,%ld) redzone after corrupted\n",
> +		       dstp, srcp, len);
> +		abort();
> +	}
> +}
> +
> +int test_copy_loop(void)
> +{
> +	char *src, *dst, *redzone, *fill;
> +	unsigned long len, src_off, dst_off;
> +	unsigned long i;
> +
> +	src = memalign(BUFLEN, BUFLEN);
> +	dst = memalign(BUFLEN, BUFLEN);
> +	redzone = malloc(BUFLEN);
> +	fill = malloc(BUFLEN);
> +
> +	if (!src || !dst || !redzone || !fill) {
> +		fprintf(stderr, "malloc failed\n");
> +		exit(1);
> +	}
> +
> +	memset(redzone, POISON, BUFLEN);
> +
> +	/* Fill with sequential bytes */
> +	for (i = 0; i < BUFLEN; i++)
> +		fill[i] = i & 0xff;
> +
> +	for (len = 1; len < MAX_LEN; len++) {
> +		for (src_off = 0; src_off < MAX_OFFSET; src_off++) {
> +			for (dst_off = 0; dst_off < MAX_OFFSET;
> dst_off++) {
> +				do_one(src, dst, src_off, dst_off,
> len,
> +				       redzone, fill);
> +			}
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +int main(void)
> +{
> +	return test_harness(test_copy_loop, str(COPY_LOOP));
> +}
> diff --git a/tools/testing/selftests/powerpc/utils.h
> b/tools/testing/selftests/powerpc/utils.h index 5851c4b..0de0644
> 100644 --- a/tools/testing/selftests/powerpc/utils.h
> +++ b/tools/testing/selftests/powerpc/utils.h
> @@ -31,4 +31,7 @@ do
> {
> \ }							\ } while
> (0) 
> +#define _str(s) #s
> +#define str(s) _str(s)
> +
>  #endif /* _SELFTESTS_POWERPC_UTILS_H */

^ permalink raw reply

* [PATCH] ibmveth: Fix more little endian issues
From: Anton Blanchard @ 2013-12-24  1:55 UTC (permalink / raw)
  To: Joe Perches
  Cc: Dinar Valeev, linuxppc-dev, Alexander Graf, netdev, Santiago Leon
In-Reply-To: <1387810329.22671.66.camel@joe-AO722>


The hypervisor expects MAC addresses passed in registers to be big
endian u64. Create a helper function called ibmveth_encode_mac_addr
which does the right thing in both big and little endian.

We were storing the MAC address in a long in struct ibmveth_adapter.
It's never used so remove it - we don't need another place in the
driver where we create endian issues with MAC addresses.

Signed-off-by: Anton Blanchard <anton@samba.org>
Reviewed-by: Alexander Graf <agraf@suse.de>
---

v2: annotate with __be64 as suggested by Joe

diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
index 952d795..bb9a631 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -497,7 +497,7 @@ static void ibmveth_cleanup(struct ibmveth_adapter *adapter)
 }
 
 static int ibmveth_register_logical_lan(struct ibmveth_adapter *adapter,
-        union ibmveth_buf_desc rxq_desc, u64 mac_address)
+			union ibmveth_buf_desc rxq_desc, __be64 mac_address)
 {
 	int rc, try_again = 1;
 
@@ -523,10 +523,20 @@ retry:
 	return rc;
 }
 
+/* The hypervisor expects MAC addresses passed in registers to be
+ * big endian u64.
+ */
+static __be64 ibmveth_encode_mac_addr(char *mac)
+{
+	unsigned long encoded = 0;
+	memcpy(((char *)&encoded) + 2, mac, ETH_ALEN);
+	return cpu_to_be64(encoded);
+}
+
 static int ibmveth_open(struct net_device *netdev)
 {
 	struct ibmveth_adapter *adapter = netdev_priv(netdev);
-	u64 mac_address = 0;
+	__be64 mac_address = 0;
 	int rxq_entries = 1;
 	unsigned long lpar_rc;
 	int rc;
@@ -580,8 +590,7 @@ static int ibmveth_open(struct net_device *netdev)
 	adapter->rx_queue.num_slots = rxq_entries;
 	adapter->rx_queue.toggle = 1;
 
-	memcpy(&mac_address, netdev->dev_addr, netdev->addr_len);
-	mac_address = mac_address >> 16;
+	mac_address = ibmveth_encode_mac_addr(netdev->dev_addr);
 
 	rxq_desc.fields.flags_len = IBMVETH_BUF_VALID |
 					adapter->rx_queue.queue_len;
@@ -1184,8 +1193,8 @@ static void ibmveth_set_multicast_list(struct net_device *netdev)
 		/* add the addresses to the filter table */
 		netdev_for_each_mc_addr(ha, netdev) {
 			/* add the multicast address to the filter table */
-			unsigned long mcast_addr = 0;
-			memcpy(((char *)&mcast_addr)+2, ha->addr, ETH_ALEN);
+			__be64 mcast_addr;
+			mcast_addr = ibmveth_encode_mac_addr(ha->addr);
 			lpar_rc = h_multicast_ctrl(adapter->vdev->unit_address,
 						   IbmVethMcastAddFilter,
 						   mcast_addr);
@@ -1369,9 +1378,6 @@ static int ibmveth_probe(struct vio_dev *dev, const struct vio_device_id *id)
 
 	netif_napi_add(netdev, &adapter->napi, ibmveth_poll, 16);
 
-	adapter->mac_addr = 0;
-	memcpy(&adapter->mac_addr, mac_addr_p, ETH_ALEN);
-
 	netdev->irq = dev->irq;
 	netdev->netdev_ops = &ibmveth_netdev_ops;
 	netdev->ethtool_ops = &netdev_ethtool_ops;
@@ -1380,7 +1386,7 @@ static int ibmveth_probe(struct vio_dev *dev, const struct vio_device_id *id)
 		NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM;
 	netdev->features |= netdev->hw_features;
 
-	memcpy(netdev->dev_addr, &adapter->mac_addr, netdev->addr_len);
+	memcpy(netdev->dev_addr, mac_addr_p, ETH_ALEN);
 
 	for (i = 0; i < IBMVETH_NUM_BUFF_POOLS; i++) {
 		struct kobject *kobj = &adapter->rx_buff_pool[i].kobj;
diff --git a/drivers/net/ethernet/ibm/ibmveth.h b/drivers/net/ethernet/ibm/ibmveth.h
index 84066ba..2c636cb 100644
--- a/drivers/net/ethernet/ibm/ibmveth.h
+++ b/drivers/net/ethernet/ibm/ibmveth.h
@@ -139,7 +139,6 @@ struct ibmveth_adapter {
     struct napi_struct napi;
     struct net_device_stats stats;
     unsigned int mcastFilterSize;
-    unsigned long mac_addr;
     void * buffer_list_addr;
     void * filter_list_addr;
     dma_addr_t buffer_list_dma;

^ permalink raw reply related

* Re: [PATCH V4 08/10] powerpc, perf: Enable SW filtering in branch stack sampling framework
From: Michael Ellerman @ 2013-12-24  3:29 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: mikey, ak, linux-kernel, eranian, linuxppc-dev, acme, sukadev,
	mingo
In-Reply-To: <52B42394.4060705@linux.vnet.ibm.com>

On Fri, 2013-12-20 at 16:31 +0530, Anshuman Khandual wrote:
> On 12/09/2013 11:51 AM, Michael Ellerman wrote:
> > On Wed, 2013-04-12 at 10:32:40 UTC, Anshuman Khandual wrote:
> >> +
> >> +	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_IND_CALL) {
> >> +		/* XL-form instruction */
> >> +		if (instr_is_branch_xlform(*addr)) {
> >> +
> >> +			/* LR should be set */
> >> +			if (is_branch_link_set(*addr)) {
> >> +				/*
> >> +			 	 * Conditional and unconditional
> >> +			 	 * branch to CTR.
> >> +			 	 */
> >> +				if (is_xlform_ctr(*addr))
> >> +					result = true;
> >> +
> >> +				/*
> >> +			 	 * Conditional and unconditional
> >> +			 	 * branch to LR.
> >> +			 	 */
> >> +				if (is_xlform_lr(*addr))
> >> +					result = true;
> >> +
> >> +				/*
> >> +			 	 * Conditional and unconditional
> >> +			 	 * branch to TAR.
> >> +			 	 */
> >> +				if (is_xlform_tar(*addr))
> >> +					result = true;
> > 
> > What other kind of XL-Form branch is there?
> 
> I am not sure. Do you know of any ?

That was my point. There are no other types, so you can just do:

	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_IND_CALL)
		if (instr_is_branch_xlform(*addr) && is_branch_link_set(*addr))
			return true;

> >> +	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_COND) {
> >> +
> >> +		/* I-form instruction - excluded */
> >> +		if (instr_is_branch_iform(*addr))
> >> +			goto out;
> >> +
> >> +		/* B-form or XL-form instruction */
> >> +		if (instr_is_branch_bform(*addr) || instr_is_branch_xlform(*addr))  {
> >> +
> >> +			/* Not branch always  */
> >> +			if (!is_bo_always(*addr)) {
> >> +
> >> +				/* Conditional branch to CTR register */
> >> +				if (is_bo_ctr(*addr))
> >> +					goto out;
> > 
> > We might have discussed this but why not?
> 
> Did not get that, discuss what ?

Why are we saying a conditional branch to the CTR is not a conditional branch?

It is conditional, so I think it should be included.

> >> +
> >> +				/* CR[BI] conditional branch with static hint */
> > 
> > A conditional branch with a static hint is still a conditional branch?
> 
> No its not. 

Yes it is?

In fact they could be very interesting branches. Because the compiler or
programmer has statically hinted them, if the hint is wrong they may be a major
source of branch midpredicts.


> >> +				if (is_bo_crbi_off(*addr) || is_bo_crbi_on(*addr)) {
> >> +					if (is_bo_crbi_hint(*addr))
> >> +						goto out;
> >> +				}
> >> +
> >> +				result = true;
> >> +			}
> >> +		}
> >> +	}
> >> +out:
> >> +	return result;
> >> +}
 
> >> +	} else {
> >> +		/*
> >> +		 * Userspace address needs to be
> >> +		 * copied first before analysis.
> >> +		 */
> >> +		pagefault_disable();
> >> +		ret =  __get_user_inatomic(instr, (unsigned int __user *)addr);
> > 
> > I suspect you borrowed this incantation from the callchain code. Unlike that
> > code you don't fallback to reading the page tables directly.
> > 
> > I'd rather see the accessor in the callchain code made generic and have you
> > call it here.
> 
> You have mentioned to take care of this issue yourself.

Yes I will.

cheers

^ permalink raw reply

* Re: [PATCH] powerpc: Make 64-bit non-VMX __copy_tofrom_user bi-endian
From: Michael Ellerman @ 2013-12-24  3:34 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: paulmck, paulus, linuxppc-dev
In-Reply-To: <20131224120259.454bc44c@kryten>

On Tue, 2013-12-24 at 12:02 +1100, Anton Blanchard wrote:
> Hi Michael,
> 
> > > To try and catch any screw ups in our ppc64 memcpy and
> > > copy_tofrom_user loops, I wrote a quick test:
> > > 
> > > http://ozlabs.org/~anton/junkcode/validate_kernel_copyloops.tar.gz
> > 
> > Nice! How's this look?
> 
> Love it!

Cool, I'll add your Signed-off-by and resubmit.

> At the moment my other copy_to/from_user tests run against the kernel
> (testing we copy all data right up to a page fault and that we return
> the correct number of bytes not copied etc). A small signal handler
> that walks the exception entries and branches to the handler should be
> all it takes to do it completely in userspace.

That'd be nice. Are they in your junkcode? I couldn't spot them at a glance.

cheers

^ permalink raw reply

* Re: [PATCH V4 08/10] powerpc, perf: Enable SW filtering in branch stack sampling framework
From: Anshuman Khandual @ 2013-12-24  3:50 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: mikey, ak, linux-kernel, eranian, linuxppc-dev, acme, sukadev,
	mingo
In-Reply-To: <1387855790.15093.1.camel@concordia>

On 12/24/2013 08:59 AM, Michael Ellerman wrote:
> On Fri, 2013-12-20 at 16:31 +0530, Anshuman Khandual wrote:
>> On 12/09/2013 11:51 AM, Michael Ellerman wrote:
>>> On Wed, 2013-04-12 at 10:32:40 UTC, Anshuman Khandual wrote:
>>>> +
>>>> +	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_IND_CALL) {
>>>> +		/* XL-form instruction */
>>>> +		if (instr_is_branch_xlform(*addr)) {
>>>> +
>>>> +			/* LR should be set */
>>>> +			if (is_branch_link_set(*addr)) {
>>>> +				/*
>>>> +			 	 * Conditional and unconditional
>>>> +			 	 * branch to CTR.
>>>> +			 	 */
>>>> +				if (is_xlform_ctr(*addr))
>>>> +					result = true;
>>>> +
>>>> +				/*
>>>> +			 	 * Conditional and unconditional
>>>> +			 	 * branch to LR.
>>>> +			 	 */
>>>> +				if (is_xlform_lr(*addr))
>>>> +					result = true;
>>>> +
>>>> +				/*
>>>> +			 	 * Conditional and unconditional
>>>> +			 	 * branch to TAR.
>>>> +			 	 */
>>>> +				if (is_xlform_tar(*addr))
>>>> +					result = true;
>>>
>>> What other kind of XL-Form branch is there?
>>
>> I am not sure. Do you know of any ?
> 
> That was my point. There are no other types, so you can just do:
> 
> 	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_IND_CALL)
> 		if (instr_is_branch_xlform(*addr) && is_branch_link_set(*addr))
> 			return true;
> 

Done

>>>> +	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_COND) {
>>>> +
>>>> +		/* I-form instruction - excluded */
>>>> +		if (instr_is_branch_iform(*addr))
>>>> +			goto out;
>>>> +
>>>> +		/* B-form or XL-form instruction */
>>>> +		if (instr_is_branch_bform(*addr) || instr_is_branch_xlform(*addr))  {
>>>> +
>>>> +			/* Not branch always  */
>>>> +			if (!is_bo_always(*addr)) {
>>>> +
>>>> +				/* Conditional branch to CTR register */
>>>> +				if (is_bo_ctr(*addr))
>>>> +					goto out;
>>>
>>> We might have discussed this but why not?
>>
>> Did not get that, discuss what ?
> 
> Why are we saying a conditional branch to the CTR is not a conditional branch?
> 
> It is conditional, so I think it should be included.
> 

I believe conditional branch to CTR register and the below conditional branch
with static hint are excluded when processed with BHRB PMU based filter IFM3,
Here the SW implemented filter try to match those exclusions, so that a user
should not see any difference in results whether the filter is processed
either in PMU or in SW.

>>>> +
>>>> +				/* CR[BI] conditional branch with static hint */
>>>
>>> A conditional branch with a static hint is still a conditional branch?
>>
>> No its not. 
> 
> Yes it is?
> 
> In fact they could be very interesting branches. Because the compiler or
> programmer has statically hinted them, if the hint is wrong they may be a major
> source of branch midpredicts.
> 
> 
>>>> +				if (is_bo_crbi_off(*addr) || is_bo_crbi_on(*addr)) {
>>>> +					if (is_bo_crbi_hint(*addr))
>>>> +						goto out;
>>>> +				}
>>>> +
>>>> +				result = true;
>>>> +			}
>>>> +		}
>>>> +	}
>>>> +out:
>>>> +	return result;
>>>> +}
> 
>>>> +	} else {
>>>> +		/*
>>>> +		 * Userspace address needs to be
>>>> +		 * copied first before analysis.
>>>> +		 */
>>>> +		pagefault_disable();
>>>> +		ret =  __get_user_inatomic(instr, (unsigned int __user *)addr);
>>>
>>> I suspect you borrowed this incantation from the callchain code. Unlike that
>>> code you don't fallback to reading the page tables directly.
>>>
>>> I'd rather see the accessor in the callchain code made generic and have you
>>> call it here.
>>
>> You have mentioned to take care of this issue yourself.
> 
> Yes I will.

Thanks !!

^ permalink raw reply

* Re: [PATCH V4 08/10] powerpc, perf: Enable SW filtering in branch stack sampling framework
From: Michael Ellerman @ 2013-12-24  4:35 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: mikey, ak, linux-kernel, eranian, linuxppc-dev, acme, sukadev,
	mingo
In-Reply-To: <52B9049D.4020403@linux.vnet.ibm.com>

On Tue, 2013-12-24 at 09:20 +0530, Anshuman Khandual wrote:
> On 12/24/2013 08:59 AM, Michael Ellerman wrote:
> > On Fri, 2013-12-20 at 16:31 +0530, Anshuman Khandual wrote:
> >> On 12/09/2013 11:51 AM, Michael Ellerman wrote:
> >>> On Wed, 2013-04-12 at 10:32:40 UTC, Anshuman Khandual wrote:
> >>>> +
> 
> >>>> +	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_COND) {
> >>>> +
> >>>> +		/* I-form instruction - excluded */
> >>>> +		if (instr_is_branch_iform(*addr))
> >>>> +			goto out;
> >>>> +
> >>>> +		/* B-form or XL-form instruction */
> >>>> +		if (instr_is_branch_bform(*addr) || instr_is_branch_xlform(*addr))  {
> >>>> +
> >>>> +			/* Not branch always  */
> >>>> +			if (!is_bo_always(*addr)) {
> >>>> +
> >>>> +				/* Conditional branch to CTR register */
> >>>> +				if (is_bo_ctr(*addr))
> >>>> +					goto out;
> >>>
> >>> We might have discussed this but why not?
> >>
> >> Did not get that, discuss what ?
> > 
> > Why are we saying a conditional branch to the CTR is not a conditional branch?
> > 
> > It is conditional, so I think it should be included.

> I believe conditional branch to CTR register and the below conditional branch
> with static hint are excluded when processed with BHRB PMU based filter IFM3,
> Here the SW implemented filter try to match those exclusions, so that a user
> should not see any difference in results whether the filter is processed
> either in PMU or in SW.

OK. That's what I meant by "we might have discussed this".

So you need to make it very clear in the code that we are implementing the IFM3
semantics, with a comment. Otherwise it's not obviously clear why those
semantics make sense.

And we need to make extra sure we implement the same semantics as IFM3, which I
don't think you do at the moment.

The description for IFM3 is:

   Do not record:
    * b and bl instructions, 
    * bc and bcl instructions for which the BO field indicates “Branch always.”
   
   For bclr, bclrl, bctr, bctrl, bctar, and bctarl instructions for which
   the BO field indicates “Branch always,” record only one entry
   containing the Branch target address.

So I don't think your SW filter implements that part correctly. You are
discarding all branches with "branch always" set.


   Do not record:
    * Branch instructions for which BO[0]=1, 

This is what excludes branches to CTR. But, it's only branches to CTR that
don't also depend on CR[BI] - we need to make that clear in the code.

    * Branch instructions for which the “a” bit in the BO field is set to 1.

So that's the is_bo_crbi_hint() check and rejection, but it's not related to
CR[BI] at all.

There's a note about CR[BI]:

    Do not record instructions that do not depend on the value of CR[BI].

But I think you've misinterpreted that. 

    Do not record instructions that do not depend on the value of CR[BI].

    Do     record instructions that        depend on the value of CR[BI].


In fact the only branches that don't depend on CR[BI] are "branch always"
branches, and branches with BO[0]=1, both of which were handled above.

cheers

^ permalink raw reply

* Re: [PATCH] ibmveth: Fix more little endian issues
From: Benjamin Herrenschmidt @ 2013-12-24  4:37 UTC (permalink / raw)
  To: Joe Perches
  Cc: netdev, Dinar Valeev, linuxppc-dev, Alexander Graf,
	Anton Blanchard, Santiago Leon
In-Reply-To: <1387810329.22671.66.camel@joe-AO722>

On Mon, 2013-12-23 at 06:52 -0800, Joe Perches wrote:
> On Mon, 2013-12-23 at 17:38 +1100, Anton Blanchard wrote:
> > The hypervisor expects MAC addresses passed in registers to be big
> > endian u64.
> 
> So maybe use __be64 declarations?
> 
> > +static unsigned long ibmveth_encode_mac_addr(char *mac)
> 
> static __be64 ibmveth_encode_mac_addr(const char *mac)

A register value has no endianness. Only memory content does. Especially
talking of a MAC address which is really a byte stream.... (Yes, our
__beXX types used without a * are borderline, but we've got used to it).

In fact I find the use of 

    memcpy(((char *)&encoded) + 2, mac, ETH_ALEN);

Really gross :-) Yes it works with the added cpu_to_be64() but in that
specific case, I think it would be nicer to simply load & shift into
position the 6 bytes and avoid the endianness issue completely.

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH] iommu: Add empty stub for iommu_group_get_by_id()
From: Alexey Kardashevskiy @ 2013-12-24  5:10 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alexey Kardashevskiy, Joerg Roedel, linux-kernel
In-Reply-To: <1385016074-17026-1-git-send-email-aik@ozlabs.ru>

On 11/21/2013 05:41 PM, Alexey Kardashevskiy wrote:
> Almost every function in include/linux/iommu.h has an empty stub
> but the iommu_group_get_by_id() did not get one by mistake.
> 
> This adds an empty stub for iommu_group_get_by_id() for IOMMU_API
> disabled config.

Ping?


> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  include/linux/iommu.h | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 7ea319e..3c7903d 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -245,6 +245,11 @@ static inline struct iommu_domain *iommu_domain_alloc(struct bus_type *bus)
>  	return NULL;
>  }
>  
> +static inline struct iommu_group *iommu_group_get_by_id(int id)
> +{
> +	return NULL;
> +}
> +
>  static inline void iommu_domain_free(struct iommu_domain *domain)
>  {
>  }
> 


-- 
Alexey

^ permalink raw reply

* [PATCH v4 00/10] powerpc: enable the relocatable support for fsl booke 32bit kernel
From: Kevin Hao @ 2013-12-24  7:12 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc

v4:
  - Fix the bug when booting above 64M.
  - Rebase onto v3.13-rc5
  - Pass the following test on a p5020ds board:
       boot kernel at 0x5000000 and 0x9000000
       kdump test with kernel option "crashkernel=64M@80M"

v3:
The main changes include:
  * Drop the patch 5 in v2 (memblock: introduce the memblock_reinit function)
  * Change to use the 64M boot init tlb.

Please refer to the comment section of each patch for more detail.

This patch series passed the kdump test with kernel option "crashkernel=64M@32M"
and "crashkernel=64M@80M" on a p2020rdb board.

v2:
These patches are based on the Ben's next branch. In this version we choose
to do a second relocation if the PAGE_OFFSET is not mapped to the memstart_addr
and we also choose to set the tlb1 entries for the kernel space in address
space 1. With this implementation:
  * We can load the kernel at any place between
     memstart_addr ~ memstart_addr + 768M
  * We can reserve any memory between memstart_addr ~ memstart_addr + 768M
    for a kdump kernel.

I have done a kdump boot on a p2020rdb kernel with the memory reserved by
'crashkernel=32M@320M'.


v1:
Currently the fsl booke 32bit kernel is using the DYNAMIC_MEMSTART relocation
method. But the RELOCATABLE method is more flexible and has less alignment
restriction. So enable this feature on this platform and use it by
default for the kdump kernel.

These patches have passed the kdump boot test on a p2020rdb board.
---
Kevin Hao (10):
  powerpc/fsl_booke: protect the access to MAS7
  powerpc/fsl_booke: introduce get_phys_addr function
  powerpc: introduce macro LOAD_REG_ADDR_PIC
  powerpc: enable the relocatable support for the fsl booke 32bit kernel
  powerpc/fsl_booke: set the tlb entry for the kernel address in AS1
  powerpc: introduce early_get_first_memblock_info
  powerpc/fsl_booke: introduce map_mem_in_cams_addr
  powerpc/fsl_booke: make sure PAGE_OFFSET map to memstart_addr for
    relocatable kernel
  powerpc/fsl_booke: smp support for booting a relocatable kernel above
    64M
  powerpc/fsl_booke: enable the relocatable for the kdump kernel

 arch/powerpc/Kconfig                          |   5 +-
 arch/powerpc/include/asm/ppc_asm.h            |  13 ++
 arch/powerpc/kernel/fsl_booke_entry_mapping.S |   2 +
 arch/powerpc/kernel/head_fsl_booke.S          | 266 +++++++++++++++++++++++---
 arch/powerpc/kernel/prom.c                    |  41 +++-
 arch/powerpc/mm/fsl_booke_mmu.c               |  72 ++++++-
 arch/powerpc/mm/hugetlbpage-book3e.c          |   3 +-
 arch/powerpc/mm/mmu_decl.h                    |   2 +
 arch/powerpc/mm/tlb_nohash_low.S              |   4 +-
 include/linux/of_fdt.h                        |   1 +
 10 files changed, 370 insertions(+), 39 deletions(-)

-- 
1.8.3.1

^ permalink raw reply

* [PATCH v4 01/10] powerpc/fsl_booke: protect the access to MAS7
From: Kevin Hao @ 2013-12-24  7:12 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc
In-Reply-To: <1387869132-12650-1-git-send-email-haokexin@gmail.com>

The e500v1 doesn't implement the MAS7, so we should avoid to access
this register on that implementations. In the current kernel, the
access to MAS7 are protected by either CONFIG_PHYS_64BIT or
MMU_FTR_BIG_PHYS. Since some code are executed before the code
patching, we have to use CONFIG_PHYS_64BIT in these cases.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
---
v4: No change.

v3: Use ifdef CONFIG_PHYS_64BIT for the code running before code patching.

v2: A new patch in v2.

 arch/powerpc/kernel/head_fsl_booke.S | 2 ++
 arch/powerpc/mm/hugetlbpage-book3e.c | 3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S
index f45726a1d963..09921a5197c6 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -82,7 +82,9 @@ _ENTRY(_start);
 	and	r19,r3,r18		/* r19 = page offset */
 	andc	r31,r20,r18		/* r31 = page base */
 	or	r31,r31,r19		/* r31 = devtree phys addr */
+#ifdef CONFIG_PHYS_64BIT
 	mfspr	r30,SPRN_MAS7
+#endif
 
 	li	r25,0			/* phys kernel start (low) */
 	li	r24,0			/* CPU number */
diff --git a/arch/powerpc/mm/hugetlbpage-book3e.c b/arch/powerpc/mm/hugetlbpage-book3e.c
index 74551b5e41e5..646c4bffaeba 100644
--- a/arch/powerpc/mm/hugetlbpage-book3e.c
+++ b/arch/powerpc/mm/hugetlbpage-book3e.c
@@ -103,7 +103,8 @@ void book3e_hugetlb_preload(struct vm_area_struct *vma, unsigned long ea,
 	if (mmu_has_feature(MMU_FTR_USE_PAIRED_MAS)) {
 		mtspr(SPRN_MAS7_MAS3, mas7_3);
 	} else {
-		mtspr(SPRN_MAS7, upper_32_bits(mas7_3));
+		if (mmu_has_feature(MMU_FTR_BIG_PHYS))
+			mtspr(SPRN_MAS7, upper_32_bits(mas7_3));
 		mtspr(SPRN_MAS3, lower_32_bits(mas7_3));
 	}
 
-- 
1.8.3.1

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox