LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] powerpc/pseries: Rename RAS_VECTOR_OFFSET to RTAS_VECTOR_EXTERNAL_INTERRUPT and move to rtas.h
From: Mark Nelson @ 2010-05-27  7:40 UTC (permalink / raw)
  To: linuxppc-dev

The RAS code has a #define, RAS_VECTOR_OFFSET, that's used in the
check-exception RTAS call for the vector offset of the exception.

We'll be using this same vector offset for the upcoming IO Event interrupts
code (0x500) so let's move it to include/asm/rtas.h and call it
RTAS_VECTOR_EXTERNAL_INTERRUPT.

Signed-off-by: Mark Nelson <markn@au1.ibm.com>
---
 arch/powerpc/include/asm/rtas.h      |    3 +++
 arch/powerpc/platforms/pseries/ras.c |    5 ++---
 2 files changed, 5 insertions(+), 3 deletions(-)

Index: upstream/arch/powerpc/include/asm/rtas.h
===================================================================
--- upstream.orig/arch/powerpc/include/asm/rtas.h
+++ upstream/arch/powerpc/include/asm/rtas.h
@@ -137,6 +137,9 @@ struct rtas_t {
 #define RTAS_TYPE_PMGM_CONFIG_CHANGE	0x70
 #define RTAS_TYPE_PMGM_SERVICE_PROC	0x71
 
+/* RTAS check-exception vector offset */
+#define RTAS_VECTOR_EXTERNAL_INTERRUPT	0x500
+
 struct rtas_error_log {
 	unsigned long version:8;		/* Architectural version */
 	unsigned long severity:3;		/* Severity level of error */
Index: upstream/arch/powerpc/platforms/pseries/ras.c
===================================================================
--- upstream.orig/arch/powerpc/platforms/pseries/ras.c
+++ upstream/arch/powerpc/platforms/pseries/ras.c
@@ -61,7 +61,6 @@ static int ras_check_exception_token;
 
 #define EPOW_SENSOR_TOKEN	9
 #define EPOW_SENSOR_INDEX	0
-#define RAS_VECTOR_OFFSET	0x500
 
 static irqreturn_t ras_epow_interrupt(int irq, void *dev_id);
 static irqreturn_t ras_error_interrupt(int irq, void *dev_id);
@@ -121,7 +120,7 @@ static irqreturn_t ras_epow_interrupt(in
 	spin_lock(&ras_log_buf_lock);
 
 	status = rtas_call(ras_check_exception_token, 6, 1, NULL,
-			   RAS_VECTOR_OFFSET,
+			   RTAS_VECTOR_EXTERNAL_INTERRUPT,
 			   irq_map[irq].hwirq,
 			   RTAS_EPOW_WARNING | RTAS_POWERMGM_EVENTS,
 			   critical, __pa(&ras_log_buf),
@@ -156,7 +155,7 @@ static irqreturn_t ras_error_interrupt(i
 	spin_lock(&ras_log_buf_lock);
 
 	status = rtas_call(ras_check_exception_token, 6, 1, NULL,
-			   RAS_VECTOR_OFFSET,
+			   RTAS_VECTOR_EXTERNAL_INTERRUPT,
 			   irq_map[irq].hwirq,
 			   RTAS_INTERNAL_ERROR, 1 /*Time Critical */,
 			   __pa(&ras_log_buf),

^ permalink raw reply

* halt the system without any broadcast message on the console
From: Bosi Daniele @ 2010-05-27  8:24 UTC (permalink / raw)
  To: linuxppc-dev@lists.ozlabs.org

[-- Attachment #1: Type: text/plain, Size: 1083 bytes --]

Hi all,
on the embedded system I'm working (mpc5121e, linux kernel 2.6.24.6) when I type the command "halt" on the console I see a broadcast message sent to all the users like this:

" Broadcast message from root (console) (Thu May 27 09:04:41 2010):

The system is going down for system halt NOW!"

I'd like to avoid this message on the console I'm using because at production time this has to be a simple serial port for external peripheral connection. (to do this on the command line we'll leave empty the "console" field).

The only way I found to avoid the broadcast message is the command "telinit 0" . In this way no one calls the "shutdown" command that is the one which send the broadcast message.
I did also some tests with the shutdown parameters but none of them erase the broadcast message.

So my questions are:
Does the command  "telinit 0" is safe in order to halt the machine?
Does the command  "telinit 6" is safe in order to reboot the machine?
Are there others commands to do the same work without the broadcast message?


Thanks
Daniele Bosi

[-- Attachment #2: Type: text/html, Size: 4054 bytes --]

^ permalink raw reply

* RE: [PATCH 2/2] net: ll_temac: fix checksum offload logic
From: John Linn @ 2010-05-27 13:26 UTC (permalink / raw)
  To: David Miller
  Cc: linuxppc-dev, netdev, michal.simek, Brian Hill, john.williams
In-Reply-To: <20100526.204517.35032653.davem@davemloft.net>

> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Wednesday, May 26, 2010 9:45 PM
> To: John Linn
> Cc: netdev@vger.kernel.org; linuxppc-dev@ozlabs.org;
grant.likely@secretlab.ca;
> jwboyer@linux.vnet.ibm.com; john.williams@petalogix.com;
michal.simek@petalogix.com; Brian Hill
> Subject: Re: [PATCH 2/2] net: ll_temac: fix checksum offload logic
> =

> aFrom: John Linn <john.linn@xilinx.com>
> Date: Wed, 26 May 2010 11:29:19 -0600
> =

> > The current checksum offload code does not work and this corrects
> > that functionality. It also updates the interrupt coallescing
> > initialization so than there are fewer interrupts and performance
> > is increased.
> >
> > Signed-off-by: Brian Hill <brian.hill@xilinx.com>
> > Signed-off-by: John Linn <john.linn@xilinx.com>
> =

> Applied, although I changed "temac_features" to be explicitly
> "unsigned int" instead of just plain "unsigned"

Awesome! Appreciate the help with both patches David.

> =



This email and any attachments are intended for the sole use of the named r=
ecipient(s) and contain(s) confidential information that may be proprietary=
, privileged or copyrighted under applicable law. If you are not the intend=
ed recipient, do not read, copy, or forward this email message or any attac=
hments. Delete this email message and any attachments immediately.

^ permalink raw reply

* [RFC PATCH] powerpc: Emulate nop too
From: Ananth N Mavinakayanahalli @ 2010-05-27 14:12 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev, K.Prasad, Srikar Dronamraju
In-Reply-To: <20100520124955.GA29903@brick.ozlabs.ibm.com>

Hi Paul,

While we are at it, can we also add nop to the list of emulated
instructions?

Ananth
---
From: Ananth N Mavinakayanahalli <ananth@in.ibm.com>

Emulate ori 0,0,0 (nop).

The long winded way is to do:

        case 24:
                rd = (instr >> 21) & 0x1f;
                if (rd != 0)
                        break;
                rb = (instr >> 11) & 0x1f;
                if (rb != 0)
                        break;
                imm = (instr & 0xffff);
                if (imm == 0) {
                        regs->nip += 4;
                        return 1;
                }
                break;

But, the following is more straightforward for this case at least.

Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 arch/powerpc/lib/sstep.c |    6 ++++++
 1 file changed, 6 insertions(+)

Index: linux-27may/arch/powerpc/lib/sstep.c
===================================================================
--- linux-27may.orig/arch/powerpc/lib/sstep.c
+++ linux-27may/arch/powerpc/lib/sstep.c
@@ -412,6 +412,12 @@ int __kprobes emulate_step(struct pt_reg
 	int err;
 	mm_segment_t oldfs;
 
+	/* ori 0,0,0 is a nop. Emulate that too */
+	if (instr == 0x60000000) {
+		regs->nip += 4;
+		return 1;
+	}
+
 	opcode = instr >> 26;
 	switch (opcode) {
 	case 16:	/* bc */

^ permalink raw reply

* [PATCH] powerpc/cell: Fix integer constant warning
From: Denis Kirjanov @ 2010-05-27 14:19 UTC (permalink / raw)
  To: arnd; +Cc: paulus, adobriyan, linuxppc-dev

Fix smatch warning:  warning: constant 0x800000000 is so big it is long

Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org>
---
arch/powerpc/platforms/cell/iommu.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/cell/iommu.c b/arch/powerpc/platforms/cell/iommu.c
index e3ec497..0a33aed 100644
--- a/arch/powerpc/platforms/cell/iommu.c
+++ b/arch/powerpc/platforms/cell/iommu.c
@@ -1067,7 +1067,7 @@ static int __init cell_iommu_fixed_mapping_init(void)
 	fbase = _ALIGN_UP(fbase, 1 << IO_SEGMENT_SHIFT);
 	fsize = lmb_phys_mem_size();
 
-	if ((fbase + fsize) <= 0x800000000)
+	if ((fbase + fsize) <= 0x800000000ul)
 		hbase = 0; /* use the device tree window */
 	else {
 		/* If we're over 32 GB we need to cheat. We can't map all of

^ permalink raw reply related

* halt the system without any broadcast message on the console
From: Bosi Daniele @ 2010-05-27 15:39 UTC (permalink / raw)
  To: 'linuxppc-dev@lists.ozlabs.org'

Hi all,
on the embedded system I'm working (mpc5121e, linux kernel 2.6.24.6) when I=
 type the command "halt" on the console I see a broadcast message sent to a=
ll the users like this:

" Broadcast message from root (console) (Thu May 27 09:04:41 2010):

The system is going down for system halt NOW!"

I'd like to avoid this message on the console I'm using because at producti=
on time this has to be a simple serial port for external peripheral connect=
ion. (to do this on the command line we'll leave empty the "console" field)=
.

The only way I found to avoid the broadcast message is the command "telinit=
 0" . In this way no one calls the "shutdown" command that is the one which=
 send the broadcast message.
I did also some tests with the shutdown parameters but none of them erase t=
he broadcast message.

So my questions are:
Does the command  "telinit 0" is safe in order to halt the machine?
Does the command  "telinit 6" is safe in order to reboot the machine?
Are there others commands to do the same work without the broadcast message=
?


Thanks
Daniele Bosi

^ permalink raw reply

* Re: [RFC PATCH] powerpc: Emulate nop too
From: Kumar Gala @ 2010-05-27 20:22 UTC (permalink / raw)
  To: ananth; +Cc: linuxppc-dev, Paul Mackerras, Srikar Dronamraju, K.Prasad
In-Reply-To: <20100527141203.GA20770@in.ibm.com>


On May 27, 2010, at 9:12 AM, Ananth N Mavinakayanahalli wrote:

> Hi Paul,
> 
> While we are at it, can we also add nop to the list of emulated
> instructions?

Dare I ask why we need to emulate nop?

- k

> 
> Ananth
> ---
> From: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
> 
> Emulate ori 0,0,0 (nop).
> 
> The long winded way is to do:
> 
>        case 24:
>                rd = (instr >> 21) & 0x1f;
>                if (rd != 0)
>                        break;
>                rb = (instr >> 11) & 0x1f;
>                if (rb != 0)
>                        break;
>                imm = (instr & 0xffff);
>                if (imm == 0) {
>                        regs->nip += 4;
>                        return 1;
>                }
>                break;
> 
> But, the following is more straightforward for this case at least.
> 
> Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> ---
> arch/powerpc/lib/sstep.c |    6 ++++++
> 1 file changed, 6 insertions(+)
> 
> Index: linux-27may/arch/powerpc/lib/sstep.c
> ===================================================================
> --- linux-27may.orig/arch/powerpc/lib/sstep.c
> +++ linux-27may/arch/powerpc/lib/sstep.c
> @@ -412,6 +412,12 @@ int __kprobes emulate_step(struct pt_reg
> 	int err;
> 	mm_segment_t oldfs;
> 
> +	/* ori 0,0,0 is a nop. Emulate that too */
> +	if (instr == 0x60000000) {
> +		regs->nip += 4;
> +		return 1;
> +	}
> +
> 	opcode = instr >> 26;
> 	switch (opcode) {
> 	case 16:	/* bc */
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply

* [PATCH 1/2] powerpc: Migration code reorganization / hibernation prep
From: Brian King @ 2010-05-27 21:21 UTC (permalink / raw)
  To: benh; +Cc: brking, linuxppc-dev


Partition hibernation will use some of the same code as is
currently used for Live Partition Migration. This function
further abstracts this code such that code outside of rtas.c
can utilize it. It also changes the error field in the suspend
me data structure to be an atomic type, since it is set and
checked on different cpus without any barriers or locking.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
---

 arch/powerpc/include/asm/rtas.h |   10 +++
 arch/powerpc/kernel/rtas.c      |  106 ++++++++++++++++++++++++++--------------
 2 files changed, 81 insertions(+), 35 deletions(-)

diff -puN arch/powerpc/include/asm/rtas.h~powerpc_rtas_hibernate_prep arch/powerpc/include/asm/rtas.h
--- linux-2.6/arch/powerpc/include/asm/rtas.h~powerpc_rtas_hibernate_prep	2010-05-25 13:05:48.000000000 -0500
+++ linux-2.6-bjking1/arch/powerpc/include/asm/rtas.h	2010-05-25 13:05:48.000000000 -0500
@@ -63,6 +63,14 @@ struct rtas_t {
 	struct device_node *dev;	/* virtual address pointer */
 };
 
+struct rtas_suspend_me_data {
+	atomic_t working; /* number of cpus accessing this struct */
+	atomic_t done;
+	int token; /* ibm,suspend-me */
+	atomic_t error;
+	struct completion *complete; /* wait on this until working == 0 */
+};
+
 /* RTAS event classes */
 #define RTAS_INTERNAL_ERROR		0x80000000 /* set bit 0 */
 #define RTAS_EPOW_WARNING		0x40000000 /* set bit 1 */
@@ -174,6 +182,8 @@ extern int rtas_set_indicator(int indica
 extern int rtas_set_indicator_fast(int indicator, int index, int new_value);
 extern void rtas_progress(char *s, unsigned short hex);
 extern void rtas_initialize(void);
+extern int rtas_suspend_cpu(struct rtas_suspend_me_data *data);
+extern int rtas_suspend_last_cpu(struct rtas_suspend_me_data *data);
 
 struct rtc_time;
 extern unsigned long rtas_get_boot_time(void);
diff -puN arch/powerpc/kernel/rtas.c~powerpc_rtas_hibernate_prep arch/powerpc/kernel/rtas.c
--- linux-2.6/arch/powerpc/kernel/rtas.c~powerpc_rtas_hibernate_prep	2010-05-25 13:05:48.000000000 -0500
+++ linux-2.6-bjking1/arch/powerpc/kernel/rtas.c	2010-05-25 13:05:48.000000000 -0500
@@ -47,14 +47,6 @@ struct rtas_t rtas = {
 };
 EXPORT_SYMBOL(rtas);
 
-struct rtas_suspend_me_data {
-	atomic_t working; /* number of cpus accessing this struct */
-	atomic_t done;
-	int token; /* ibm,suspend-me */
-	int error;
-	struct completion *complete; /* wait on this until working == 0 */
-};
-
 DEFINE_SPINLOCK(rtas_data_buf_lock);
 EXPORT_SYMBOL(rtas_data_buf_lock);
 
@@ -714,14 +706,54 @@ void rtas_os_term(char *str)
 
 static int ibm_suspend_me_token = RTAS_UNKNOWN_SERVICE;
 #ifdef CONFIG_PPC_PSERIES
-static void rtas_percpu_suspend_me(void *info)
+static int __rtas_suspend_last_cpu(struct rtas_suspend_me_data *data, int wake_when_done)
+{
+	u16 slb_size = mmu_slb_size;
+	int rc = H_MULTI_THREADS_ACTIVE;
+	int cpu;
+
+	atomic_inc(&data->working);
+
+	slb_set_size(SLB_MIN_SIZE);
+	printk(KERN_DEBUG "calling ibm,suspend-me on cpu %i\n", smp_processor_id());
+
+	while (rc == H_MULTI_THREADS_ACTIVE && !atomic_read(&data->done) &&
+	       !atomic_read(&data->error))
+		rc = rtas_call(data->token, 0, 1, NULL);
+
+	if (rc || atomic_read(&data->error)) {
+		printk(KERN_DEBUG "ibm,suspend-me returned %d\n", rc);
+		slb_set_size(slb_size);
+	}
+
+	if (atomic_read(&data->error))
+		rc = atomic_read(&data->error);
+
+	atomic_set(&data->error, rc);
+
+	if (wake_when_done) {
+		atomic_set(&data->done, 1);
+
+		for_each_online_cpu(cpu)
+			plpar_hcall_norets(H_PROD, get_hard_smp_processor_id(cpu));
+	}
+
+	if (atomic_dec_return(&data->working) == 0)
+		complete(data->complete);
+
+	return rc;
+}
+
+int rtas_suspend_last_cpu(struct rtas_suspend_me_data *data)
+{
+	return __rtas_suspend_last_cpu(data, 0);
+}
+
+static int __rtas_suspend_cpu(struct rtas_suspend_me_data *data, int wake_when_done)
 {
 	long rc = H_SUCCESS;
 	unsigned long msr_save;
-	u16 slb_size = mmu_slb_size;
 	int cpu;
-	struct rtas_suspend_me_data *data =
-		(struct rtas_suspend_me_data *)info;
 
 	atomic_inc(&data->working);
 
@@ -729,7 +761,7 @@ static void rtas_percpu_suspend_me(void
 	msr_save = mfmsr();
 	mtmsr(msr_save & ~(MSR_EE));
 
-	while (rc == H_SUCCESS && !atomic_read(&data->done))
+	while (rc == H_SUCCESS && !atomic_read(&data->done) && !atomic_read(&data->error))
 		rc = plpar_hcall_norets(H_JOIN);
 
 	mtmsr(msr_save);
@@ -741,33 +773,37 @@ static void rtas_percpu_suspend_me(void
 		/* All other cpus are in H_JOIN, this cpu does
 		 * the suspend.
 		 */
-		slb_set_size(SLB_MIN_SIZE);
-		printk(KERN_DEBUG "calling ibm,suspend-me on cpu %i\n",
-		       smp_processor_id());
-		data->error = rtas_call(data->token, 0, 1, NULL);
-
-		if (data->error) {
-			printk(KERN_DEBUG "ibm,suspend-me returned %d\n",
-			       data->error);
-			slb_set_size(slb_size);
-		}
+		return __rtas_suspend_last_cpu(data, wake_when_done);
 	} else {
 		printk(KERN_ERR "H_JOIN on cpu %i failed with rc = %ld\n",
 		       smp_processor_id(), rc);
-		data->error = rc;
+		atomic_set(&data->error, rc);
 	}
 
-	atomic_set(&data->done, 1);
+	if (wake_when_done) {
+		atomic_set(&data->done, 1);
 
-	/* This cpu did the suspend or got an error; in either case,
-	 * we need to prod all other other cpus out of join state.
-	 * Extra prods are harmless.
-	 */
-	for_each_online_cpu(cpu)
-		plpar_hcall_norets(H_PROD, get_hard_smp_processor_id(cpu));
+		/* This cpu did the suspend or got an error; in either case,
+		 * we need to prod all other other cpus out of join state.
+		 * Extra prods are harmless.
+		 */
+		for_each_online_cpu(cpu)
+			plpar_hcall_norets(H_PROD, get_hard_smp_processor_id(cpu));
+	}
 out:
 	if (atomic_dec_return(&data->working) == 0)
 		complete(data->complete);
+	return rc;
+}
+
+int rtas_suspend_cpu(struct rtas_suspend_me_data *data)
+{
+	return __rtas_suspend_cpu(data, 0);
+}
+
+static void rtas_percpu_suspend_me(void *info)
+{
+	__rtas_suspend_cpu((struct rtas_suspend_me_data *)info, 1);
 }
 
 static int rtas_ibm_suspend_me(struct rtas_args *args)
@@ -802,22 +838,22 @@ static int rtas_ibm_suspend_me(struct rt
 
 	atomic_set(&data.working, 0);
 	atomic_set(&data.done, 0);
+	atomic_set(&data.error, 0);
 	data.token = rtas_token("ibm,suspend-me");
-	data.error = 0;
 	data.complete = &done;
 
 	/* Call function on all CPUs.  One of us will make the
 	 * rtas call
 	 */
 	if (on_each_cpu(rtas_percpu_suspend_me, &data, 0))
-		data.error = -EINVAL;
+		atomic_set(&data.error, -EINVAL);
 
 	wait_for_completion(&done);
 
-	if (data.error != 0)
+	if (atomic_read(&data.error) != 0)
 		printk(KERN_ERR "Error doing global join\n");
 
-	return data.error;
+	return atomic_read(&data.error);
 }
 #else /* CONFIG_PPC_PSERIES */
 static int rtas_ibm_suspend_me(struct rtas_args *args)
_

^ permalink raw reply

* [PATCH 2/2] powerpc: Partition hibernation support
From: Brian King @ 2010-05-27 21:21 UTC (permalink / raw)
  To: benh; +Cc: brking, linuxppc-dev


Enables support for HMC initiated partition hibernation. This is
a firmware assisted hibernation, since the firmware handles writing
the memory out to disk, along with other partition information,
so we just mimic suspend to ram.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
---

 arch/powerpc/Kconfig                         |    2 
 arch/powerpc/include/asm/hvcall.h            |    1 
 arch/powerpc/include/asm/machdep.h           |    1 
 arch/powerpc/platforms/pseries/Makefile      |    4 
 arch/powerpc/platforms/pseries/hotplug-cpu.c |    3 
 arch/powerpc/platforms/pseries/suspend.c     |  214 +++++++++++++++++++++++++++
 6 files changed, 224 insertions(+), 1 deletion(-)

diff -puN /dev/null arch/powerpc/platforms/pseries/suspend.c
--- /dev/null	2009-12-15 17:58:07.000000000 -0600
+++ linux-2.6-bjking1/arch/powerpc/platforms/pseries/suspend.c	2010-05-27 10:10:28.000000000 -0500
@@ -0,0 +1,214 @@
+/*
+  * Copyright (C) 2010 Brian King IBM Corporation
+  *
+  * This program is free software; you can redistribute it and/or modify
+  * it under the terms of the GNU General Public License as published by
+  * the Free Software Foundation; either version 2 of the License, or
+  * (at your option) any later version.
+  *
+  * This program is distributed in the hope that it will be useful,
+  * but WITHOUT ANY WARRANTY; without even the implied warranty of
+  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+  * GNU General Public License for more details.
+  *
+  * You should have received a copy of the GNU General Public License
+  * along with this program; if not, write to the Free Software
+  * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
+  */
+
+#include <linux/delay.h>
+#include <linux/suspend.h>
+#include <asm/firmware.h>
+#include <asm/hvcall.h>
+#include <asm/machdep.h>
+#include <asm/mmu.h>
+#include <asm/rtas.h>
+
+static u64 stream_id;
+static struct sys_device suspend_sysdev;
+static DECLARE_COMPLETION(suspend_work);
+static struct rtas_suspend_me_data suspend_data;
+static atomic_t suspending;
+
+/**
+ * pseries_suspend_begin - First phase of hibernation
+ *
+ * Check to ensure we are in a valid state to hibernate
+ *
+ * Return value:
+ * 	0 on success / other on failure
+ **/
+static int pseries_suspend_begin(suspend_state_t state)
+{
+	long vasi_state, rc;
+	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
+
+	/* Make sure the state is valid */
+	rc = plpar_hcall(H_VASI_STATE, retbuf, stream_id);
+
+	vasi_state = retbuf[0];
+
+	if (rc) {
+		pr_err("pseries_suspend_begin: vasi_state returned %ld\n",rc);
+		return rc;
+	} else if (vasi_state == H_VASI_ENABLED) {
+		return -EAGAIN;
+	} else if (vasi_state != H_VASI_SUSPENDING) {
+		pr_err("pseries_suspend_begin: vasi_state returned state %ld\n",
+		       vasi_state);
+		return -EIO;
+	}
+
+	return 0;
+}
+
+/**
+ * pseries_suspend_cpu - Suspend a single CPU
+ *
+ * Makes the H_JOIN call to suspend the CPU
+ *
+ **/
+static int pseries_suspend_cpu(void)
+{
+	if (atomic_read(&suspending))
+		return rtas_suspend_cpu(&suspend_data);
+	return 0;
+}
+
+/**
+ * pseries_suspend_enter - Final phase of hibernation
+ *
+ * Return value:
+ * 	0 on success / other on failure
+ **/
+static int pseries_suspend_enter(suspend_state_t state)
+{
+	int rc = rtas_suspend_last_cpu(&suspend_data);
+
+	atomic_set(&suspending, 0);
+	atomic_set(&suspend_data.done, 1);
+	return rc;
+}
+
+/**
+ * pseries_prepare_late - Prepare to suspend all other CPUs
+ *
+ * Return value:
+ * 	0 on success / other on failure
+ **/
+static int pseries_prepare_late(void)
+{
+	atomic_set(&suspending, 1);
+	atomic_set(&suspend_data.working, 0);
+	atomic_set(&suspend_data.done, 0);
+	atomic_set(&suspend_data.error, 0);
+	suspend_data.complete = &suspend_work;
+	INIT_COMPLETION(suspend_work);
+	return 0;
+}
+
+/**
+ * store_hibernate - Initiate partition hibernation
+ * @classdev:	sysdev class struct
+ * @attr:		class device attribute struct
+ * @buf:		buffer
+ * @count:		buffer size
+ *
+ * Write the stream ID received from the HMC to this file
+ * to trigger hibernating the partition
+ *
+ * Return value:
+ * 	number of bytes printed to buffer / other on failure
+ **/
+static ssize_t store_hibernate(struct sysdev_class *classdev,
+			       struct sysdev_class_attribute *attr,
+			       const char *buf, size_t count)
+{
+	int rc;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	stream_id = simple_strtoul(buf, NULL, 16);
+
+	do {
+		rc = pseries_suspend_begin(PM_SUSPEND_MEM);
+		if (rc == -EAGAIN)
+			ssleep(1);
+	} while (rc == -EAGAIN);
+
+	if (!rc)
+		rc = pm_suspend(PM_SUSPEND_MEM);
+
+	stream_id = 0;
+
+	if (!rc)
+		rc = count;
+	return rc;
+}
+
+static SYSDEV_CLASS_ATTR(hibernate, S_IWUSR, NULL, store_hibernate);
+
+static struct sysdev_class suspend_sysdev_class = {
+	.name = "power",
+};
+
+static struct platform_suspend_ops pseries_suspend_ops = {
+	.valid		= suspend_valid_only_mem,
+	.begin		= pseries_suspend_begin,
+	.prepare_late	= pseries_prepare_late,
+	.enter		= pseries_suspend_enter,
+};
+
+/**
+ * pseries_suspend_sysfs_register - Register with sysfs
+ *
+ * Return value:
+ * 	0 on success / other on failure
+ **/
+static int pseries_suspend_sysfs_register(struct sys_device *sysdev)
+{
+	int rc;
+
+	if ((rc = sysdev_class_register(&suspend_sysdev_class)))
+		return rc;
+
+	sysdev->id = 0;
+	sysdev->cls = &suspend_sysdev_class;
+
+	if ((rc = sysdev_class_create_file(&suspend_sysdev_class, &attr_hibernate)))
+		goto class_unregister;
+
+	return 0;
+
+class_unregister:
+	sysdev_class_unregister(&suspend_sysdev_class);
+	return rc;
+}
+
+/**
+ * pseries_suspend_init - initcall for pSeries suspend
+ *
+ * Return value:
+ * 	0 on success / other on failure
+ **/
+static int __init pseries_suspend_init(void)
+{
+	int rc;
+
+	if (!machine_is(pseries) || !firmware_has_feature(FW_FEATURE_LPAR))
+		return 0;
+
+	suspend_data.token = rtas_token("ibm,suspend-me");
+	if (suspend_data.token == RTAS_UNKNOWN_SERVICE)
+		return 0;
+
+	if ((rc = pseries_suspend_sysfs_register(&suspend_sysdev)))
+		return rc;
+
+	ppc_md.suspend_disable_cpu = pseries_suspend_cpu;
+	suspend_set_ops(&pseries_suspend_ops);
+	return 0;
+}
+
+__initcall(pseries_suspend_init);
diff -puN arch/powerpc/Kconfig~powerpc_allarch_pseries_hibernation arch/powerpc/Kconfig
--- linux-2.6/arch/powerpc/Kconfig~powerpc_allarch_pseries_hibernation	2010-05-27 10:10:28.000000000 -0500
+++ linux-2.6-bjking1/arch/powerpc/Kconfig	2010-05-27 10:10:28.000000000 -0500
@@ -218,7 +218,7 @@ config ARCH_HIBERNATION_POSSIBLE
 config ARCH_SUSPEND_POSSIBLE
 	def_bool y
 	depends on ADB_PMU || PPC_EFIKA || PPC_LITE5200 || PPC_83xx || \
-		   PPC_85xx || PPC_86xx
+		   PPC_85xx || PPC_86xx || PPC_PSERIES
 
 config PPC_DCR_NATIVE
 	bool
diff -puN arch/powerpc/platforms/pseries/Makefile~powerpc_allarch_pseries_hibernation arch/powerpc/platforms/pseries/Makefile
--- linux-2.6/arch/powerpc/platforms/pseries/Makefile~powerpc_allarch_pseries_hibernation	2010-05-27 10:10:28.000000000 -0500
+++ linux-2.6-bjking1/arch/powerpc/platforms/pseries/Makefile	2010-05-27 10:10:28.000000000 -0500
@@ -26,3 +26,7 @@ obj-$(CONFIG_HCALL_STATS)	+= hvCall_inst
 obj-$(CONFIG_PHYP_DUMP)	+= phyp_dump.o
 obj-$(CONFIG_CMM)		+= cmm.o
 obj-$(CONFIG_DTL)		+= dtl.o
+
+ifeq ($(CONFIG_PPC_PSERIES),y)
+obj-$(CONFIG_SUSPEND)		+= suspend.o
+endif
diff -puN arch/powerpc/include/asm/hvcall.h~powerpc_allarch_pseries_hibernation arch/powerpc/include/asm/hvcall.h
--- linux-2.6/arch/powerpc/include/asm/hvcall.h~powerpc_allarch_pseries_hibernation	2010-05-27 10:10:28.000000000 -0500
+++ linux-2.6-bjking1/arch/powerpc/include/asm/hvcall.h	2010-05-27 10:10:28.000000000 -0500
@@ -74,6 +74,7 @@
 #define H_NOT_ENOUGH_RESOURCES -44
 #define H_R_STATE       -45
 #define H_RESCINDEND    -46
+#define H_MULTI_THREADS_ACTIVE -9005
 
 
 /* Long Busy is a condition that can be returned by the firmware
diff -puN arch/powerpc/include/asm/machdep.h~powerpc_allarch_pseries_hibernation arch/powerpc/include/asm/machdep.h
--- linux-2.6/arch/powerpc/include/asm/machdep.h~powerpc_allarch_pseries_hibernation	2010-05-27 10:10:28.000000000 -0500
+++ linux-2.6-bjking1/arch/powerpc/include/asm/machdep.h	2010-05-27 10:10:28.000000000 -0500
@@ -266,6 +266,7 @@ struct machdep_calls {
 	void (*suspend_disable_irqs)(void);
 	void (*suspend_enable_irqs)(void);
 #endif
+	int (*suspend_disable_cpu)(void);
 
 #ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
 	ssize_t (*cpu_probe)(const char *, size_t);
diff -puN arch/powerpc/platforms/pseries/hotplug-cpu.c~powerpc_allarch_pseries_hibernation arch/powerpc/platforms/pseries/hotplug-cpu.c
--- linux-2.6/arch/powerpc/platforms/pseries/hotplug-cpu.c~powerpc_allarch_pseries_hibernation	2010-05-27 10:10:28.000000000 -0500
+++ linux-2.6-bjking1/arch/powerpc/platforms/pseries/hotplug-cpu.c	2010-05-27 10:10:28.000000000 -0500
@@ -116,6 +116,9 @@ static void pseries_mach_cpu_die(void)
 
 	if (get_preferred_offline_state(cpu) == CPU_STATE_INACTIVE) {
 		set_cpu_current_state(cpu, CPU_STATE_INACTIVE);
+		if (ppc_md.suspend_disable_cpu)
+			ppc_md.suspend_disable_cpu();
+
 		cede_latency_hint = 2;
 
 		get_lppaca()->idle = 1;
_

^ permalink raw reply

* Re: [RFC PATCH] powerpc: Emulate nop too
From: Paul Mackerras @ 2010-05-28  2:05 UTC (permalink / raw)
  To: Ananth N Mavinakayanahalli; +Cc: linuxppc-dev, K.Prasad, Srikar Dronamraju
In-Reply-To: <20100527141203.GA20770@in.ibm.com>

On Thu, May 27, 2010 at 07:42:03PM +0530, Ananth N Mavinakayanahalli wrote:

> While we are at it, can we also add nop to the list of emulated
> instructions?

I have a patch in development that emulates most of the arithmetic,
logical and shift/rotate instructions, including ori.

While you're here (in a virtual sense at least :), could you explain
what's going on with the emulate_step() call in resume_execution() in
arch/powerpc/kernel/kprobes.c?  It looks like, having decided that
emulate_step() can't handle the instruction, you single-step the
instruction out of line and then call emulate_step again on the same
instruction, in resume_execution().  Why on earth is it trying to
emulate the instruction when it has already been executed at this
point?  Is there any reason why we can't just remove the emulate_step
call from resume_execution()?

Paul.

^ permalink raw reply

* Re: [RFC PATCH] powerpc: Emulate nop too
From: Michael Neuling @ 2010-05-28  2:28 UTC (permalink / raw)
  To: ananth; +Cc: linuxppc-dev, Paul Mackerras, Srikar Dronamraju, K.Prasad
In-Reply-To: <20100527141203.GA20770@in.ibm.com>



In message <20100527141203.GA20770@in.ibm.com> you wrote:
> Hi Paul,
> 
> While we are at it, can we also add nop to the list of emulated
> instructions?
> 
> Ananth
> ---
> From: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
> 
> Emulate ori 0,0,0 (nop).
> 
> The long winded way is to do:
> 
>         case 24:
>                 rd = (instr >> 21) & 0x1f;
>                 if (rd != 0)
>                         break;
>                 rb = (instr >> 11) & 0x1f;
>                 if (rb != 0)
>                         break;

Don't we just need rb == rd?

>                 imm = (instr & 0xffff);
>                 if (imm == 0) {
>                         regs->nip += 4;
>                         return 1;
>                 }
>                 break;
> 
> But, the following is more straightforward for this case at least.
> 
> Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> ---
>  arch/powerpc/lib/sstep.c |    6 ++++++
>  1 file changed, 6 insertions(+)
> 
> Index: linux-27may/arch/powerpc/lib/sstep.c
> ===================================================================
> --- linux-27may.orig/arch/powerpc/lib/sstep.c
> +++ linux-27may/arch/powerpc/lib/sstep.c
> @@ -412,6 +412,12 @@ int __kprobes emulate_step(struct pt_reg
>  	int err;
>  	mm_segment_t oldfs;
>  
> +	/* ori 0,0,0 is a nop. Emulate that too */
> +	if (instr == 0x60000000) {
> +		regs->nip += 4;
> +		return 1;
> +	}
> +
>  	opcode = instr >> 26;
>  	switch (opcode) {
>  	case 16:	/* bc */
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
> 

^ permalink raw reply

* Re: [RFC PATCH] powerpc: Emulate nop too
From: Ananth N Mavinakayanahalli @ 2010-05-28  3:52 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev, Paul Mackerras, Srikar Dronamraju, K.Prasad
In-Reply-To: <666A1FDB-C8EC-493A-957E-3C4291CCB439@kernel.crashing.org>

On Thu, May 27, 2010 at 03:22:45PM -0500, Kumar Gala wrote:
> 
> On May 27, 2010, at 9:12 AM, Ananth N Mavinakayanahalli wrote:
> 
> > Hi Paul,
> > 
> > While we are at it, can we also add nop to the list of emulated
> > instructions?
> 
> Dare I ask why we need to emulate nop?

We are close to getting userspace probes done. Some userspace apps
that've gained 'markers' from DTrace or elsewhere, are being enabled to
expose those in through debuginfo on Linux and the underlying instruction
there is a nop.

Ananth

^ permalink raw reply

* Re: [RFC PATCH] powerpc: Emulate nop too
From: Ananth N Mavinakayanahalli @ 2010-05-28  4:16 UTC (permalink / raw)
  To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras, Srikar Dronamraju, K.Prasad
In-Reply-To: <23979.1275013723@neuling.org>

On Fri, May 28, 2010 at 12:28:43PM +1000, Michael Neuling wrote:
> 
> 
> In message <20100527141203.GA20770@in.ibm.com> you wrote:
> > Hi Paul,
> > 
> > While we are at it, can we also add nop to the list of emulated
> > instructions?
> > 
> > Ananth
> > ---
> > From: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
> > 
> > Emulate ori 0,0,0 (nop).
> > 
> > The long winded way is to do:
> > 
> >         case 24:
> >                 rd = (instr >> 21) & 0x1f;
> >                 if (rd != 0)
> >                         break;
> >                 rb = (instr >> 11) & 0x1f;
> >                 if (rb != 0)
> >                         break;
> 
> Don't we just need rb == rd?

Sure. But for this case, just checking against the opcode seems simple
enough.

Ananth

^ permalink raw reply

* Re: [RFC PATCH] powerpc: Emulate nop too
From: Michael Neuling @ 2010-05-28  4:23 UTC (permalink / raw)
  To: ananth; +Cc: linuxppc-dev, Paul Mackerras, Srikar Dronamraju, K.Prasad
In-Reply-To: <20100528041645.GB25946@in.ibm.com>



In message <20100528041645.GB25946@in.ibm.com> you wrote:
> On Fri, May 28, 2010 at 12:28:43PM +1000, Michael Neuling wrote:
> > 
> > 
> > In message <20100527141203.GA20770@in.ibm.com> you wrote:
> > > Hi Paul,
> > > 
> > > While we are at it, can we also add nop to the list of emulated
> > > instructions?
> > > 
> > > Ananth
> > > ---
> > > From: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
> > > 
> > > Emulate ori 0,0,0 (nop).
> > > 
> > > The long winded way is to do:
> > > 
> > >         case 24:
> > >                 rd = (instr >> 21) & 0x1f;
> > >                 if (rd != 0)
> > >                         break;
> > >                 rb = (instr >> 11) & 0x1f;
> > >                 if (rb != 0)
> > >                         break;
> > 
> > Don't we just need rb == rd?
> 
> Sure. But for this case, just checking against the opcode seems simple
> enough.

Simple, sure.  You could not emulate anything and remove the code
altogether.  That would be truly simple. :-P

Why not eliminate as much as possible?

Anyway, sounds like paulus as a better solution.

Mikey

^ permalink raw reply

* linux-next: build failure after merge of the final tree (linus' tree related)
From: Stephen Rothwell @ 2010-05-28  5:08 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, ppc-dev, linux-kernel, linux-next, paulus,
	Anton Blanchard ZZ, H. Peter Anvin, Thomas Gleixner, Linus,
	Ingo Molnar

Hi Steven,

After merging the final tree, today's linux-next build (powerpc allyesconfig) failed like this:

arch/powerpc/platforms/pseries/hvCall_inst.c: In function 'hcall_inst_init':
arch/powerpc/platforms/pseries/hvCall_inst.c:143: warning: passing argument 1 of 'register_trace_hcall_entry' from incompatible pointer type
arch/powerpc/include/asm/trace.h:100: note: expected 'void (*)(void *, long unsigned int,  long unsigned int *)' but argument is of type 'void (*)(long unsigned int,  long unsigned int *)'
arch/powerpc/platforms/pseries/hvCall_inst.c:143: error: too few arguments to function 'register_trace_hcall_entry'
arch/powerpc/platforms/pseries/hvCall_inst.c:146: warning: passing argument 1 of 'register_trace_hcall_exit' from incompatible pointer type
arch/powerpc/include/asm/trace.h:122: note: expected 'void (*)(void *, long unsigned int,  long unsigned int,  long unsigned int *)' but argument is of type 'void (*)(long unsigned int,  long unsigned int,  long unsigned int *)'
arch/powerpc/platforms/pseries/hvCall_inst.c:146: error: too few arguments to function 'register_trace_hcall_exit'
arch/powerpc/platforms/pseries/hvCall_inst.c:147: warning: passing argument 1 of 'unregister_trace_hcall_entry' from incompatible pointer type
arch/powerpc/include/asm/trace.h:100: note: expected 'void (*)(void *, long unsigned int,  long unsigned int *)' but argument is of type 'void (*)(long unsigned int,  long unsigned int *)'
arch/powerpc/platforms/pseries/hvCall_inst.c:147: error: too few arguments to function 'unregister_trace_hcall_entry'

Probably caused by commit 38516ab59fbc5b3bb278cf5e1fe2867c70cff32e
("tracing: Let tracepoints have data passed to tracepoint callbacks").

I applied this patch which builds (but may be incorrect).

From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Fri, 28 May 2010 15:05:00 +1000
Subject: [PATCH] tracing: fix for tracepoint API change

Commit 38516ab59fbc5b3bb278cf5e1fe2867c70cff32e "tracing: Let tracepoints
have data passed to tracepoint callbacks" requires this fixup to the
powerpc code.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
---
 arch/powerpc/platforms/pseries/hvCall_inst.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hvCall_inst.c b/arch/powerpc/platforms/pseries/hvCall_inst.c
index 1fefae7..e19ff02 100644
--- a/arch/powerpc/platforms/pseries/hvCall_inst.c
+++ b/arch/powerpc/platforms/pseries/hvCall_inst.c
@@ -102,7 +102,7 @@ static const struct file_operations hcall_inst_seq_fops = {
 #define CPU_NAME_BUF_SIZE	32
 
 
-static void probe_hcall_entry(unsigned long opcode, unsigned long *args)
+static void probe_hcall_entry(void *ignored, unsigned long opcode, unsigned long *args)
 {
 	struct hcall_stats *h;
 
@@ -114,7 +114,7 @@ static void probe_hcall_entry(unsigned long opcode, unsigned long *args)
 	h->purr_start = mfspr(SPRN_PURR);
 }
 
-static void probe_hcall_exit(unsigned long opcode, unsigned long retval,
+static void probe_hcall_exit(void *ignored, unsigned long opcode, unsigned long retval,
 			     unsigned long *retbuf)
 {
 	struct hcall_stats *h;
@@ -140,11 +140,11 @@ static int __init hcall_inst_init(void)
 	if (!firmware_has_feature(FW_FEATURE_LPAR))
 		return 0;
 
-	if (register_trace_hcall_entry(probe_hcall_entry))
+	if (register_trace_hcall_entry(probe_hcall_entry, NULL))
 		return -EINVAL;
 
-	if (register_trace_hcall_exit(probe_hcall_exit)) {
-		unregister_trace_hcall_entry(probe_hcall_entry);
+	if (register_trace_hcall_exit(probe_hcall_exit, NULL)) {
+		unregister_trace_hcall_entry(probe_hcall_entry, NULL);
 		return -EINVAL;
 	}
 
-- 
1.7.1


-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

^ permalink raw reply related

* powerpc: remove resume_execution() in kprobes
From: Ananth N Mavinakayanahalli @ 2010-05-28  5:19 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev, K.Prasad, Srikar Dronamraju
In-Reply-To: <20100528020556.GA10586@brick.ozlabs.ibm.com>

On Fri, May 28, 2010 at 12:05:56PM +1000, Paul Mackerras wrote:
> On Thu, May 27, 2010 at 07:42:03PM +0530, Ananth N Mavinakayanahalli wrote:
> 
> > While we are at it, can we also add nop to the list of emulated
> > instructions?
> 
> I have a patch in development that emulates most of the arithmetic,
> logical and shift/rotate instructions, including ori.

OK.

> While you're here (in a virtual sense at least :), could you explain
> what's going on with the emulate_step() call in resume_execution() in
> arch/powerpc/kernel/kprobes.c?  It looks like, having decided that
> emulate_step() can't handle the instruction, you single-step the
> instruction out of line and then call emulate_step again on the same
> instruction, in resume_execution().  Why on earth is it trying to
> emulate the instruction when it has already been executed at this
> point?  Is there any reason why we can't just remove the emulate_step
> call from resume_execution()?

You are right. We needed emulate_step() in resume_execution() before we
had the code to handle the emulation in kprobe_handler() at the time of
the breakpoint it. At the time we needed it mainly to ensure branch
targets are reflected correctly in regs->nip if the stepped instruction
was a branch.

However, we now don't get to post_kprobe_handler() at all if
emulate_step() returned 1 at the time of the breakpoint hit. It suffices
if we just fixup the nip here. Patch below. Tested for various
instructions that can and can't be emulated...

---
From: Ananth N Mavinakayanahalli <ananth@in.ibm.com>

emulate_step() in kprobe_handler() would've already determined if the
probed instruction can be emulated. We single-step in hardware only if
the instruction couldn't be emulated. resume_execution() therefore is
superfluous -- all we need is to fix up the instruction pointer after
single-stepping.

Thanks to Paul Mackerras for catching this.

Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
---
 arch/powerpc/kernel/kprobes.c |   14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

Index: linux-2.6.34/arch/powerpc/kernel/kprobes.c
===================================================================
--- linux-2.6.34.orig/arch/powerpc/kernel/kprobes.c
+++ linux-2.6.34/arch/powerpc/kernel/kprobes.c
@@ -375,17 +375,6 @@ static int __kprobes trampoline_probe_ha
  * single-stepped a copy of the instruction.  The address of this
  * copy is p->ainsn.insn.
  */
-static void __kprobes resume_execution(struct kprobe *p, struct pt_regs *regs)
-{
-	int ret;
-	unsigned int insn = *p->ainsn.insn;
-
-	regs->nip = (unsigned long)p->addr;
-	ret = emulate_step(regs, insn);
-	if (ret == 0)
-		regs->nip = (unsigned long)p->addr + 4;
-}
-
 static int __kprobes post_kprobe_handler(struct pt_regs *regs)
 {
 	struct kprobe *cur = kprobe_running();
@@ -403,7 +392,8 @@ static int __kprobes post_kprobe_handler
 		cur->post_handler(cur, regs, 0);
 	}
 
-	resume_execution(cur, regs);
+	/* Adjust nip to after the single-stepped instruction */
+	regs->nip = (unsigned long)cur->addr + 4;
 	regs->msr |= kcb->kprobe_saved_msr;
 
 	/*Restore back the original saved kprobes variables and continue. */

^ permalink raw reply

* Re: [RFC PATCH] powerpc: Emulate nop too
From: Ananth N Mavinakayanahalli @ 2010-05-28  5:54 UTC (permalink / raw)
  To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras, Srikar Dronamraju, K.Prasad
In-Reply-To: <7278.1275020610@neuling.org>

On Fri, May 28, 2010 at 02:23:30PM +1000, Michael Neuling wrote:
> 
> 
> In message <20100528041645.GB25946@in.ibm.com> you wrote:
> > On Fri, May 28, 2010 at 12:28:43PM +1000, Michael Neuling wrote:
> > > 
> > > 
> > > In message <20100527141203.GA20770@in.ibm.com> you wrote:
> > > > Hi Paul,
> > > > 
> > > > While we are at it, can we also add nop to the list of emulated
> > > > instructions?
> > > > 
> > > > Ananth
> > > > ---
> > > > From: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
> > > > 
> > > > Emulate ori 0,0,0 (nop).
> > > > 
> > > > The long winded way is to do:
> > > > 
> > > >         case 24:
> > > >                 rd = (instr >> 21) & 0x1f;
> > > >                 if (rd != 0)
> > > >                         break;
> > > >                 rb = (instr >> 11) & 0x1f;
> > > >                 if (rb != 0)
> > > >                         break;
> > > 
> > > Don't we just need rb == rd?
> > 
> > Sure. But for this case, just checking against the opcode seems simple
> > enough.
> 
> Simple, sure.  You could not emulate anything and remove the code
> altogether.  That would be truly simple. :-P

Ah.. I misunderstood your initial point about rb == rd with imm = 0.

> Why not eliminate as much as possible?
> 
> Anyway, sounds like paulus as a better solution.

Right.

Ananth

^ permalink raw reply

* [Patch 1/5] Allow arch-specific cleanup before breakpoint unregistration
From: K.Prasad @ 2010-05-28  6:39 UTC (permalink / raw)
  To: linuxppc-dev@ozlabs.org, Paul Mackerras
  Cc: Michael Neuling, Benjamin Herrenschmidt, shaggy,
	Frederic Weisbecker, David Gibson, Alan Stern, K.Prasad,
	Roland McGrath
In-Reply-To: <20100528061928.677651410@linux.vnet.ibm.com>

Certain architectures (such as PowerPC Book III S) have a need to cleanup
data-structures before the breakpoint is unregistered. This patch introduces
an arch-specific hook in release_bp_slot() along with a weak definition in
the form of a stub funciton.

Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/hw_breakpoint.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

Index: linux-2.6.ppc64_test/kernel/hw_breakpoint.c
===================================================================
--- linux-2.6.ppc64_test.orig/kernel/hw_breakpoint.c
+++ linux-2.6.ppc64_test/kernel/hw_breakpoint.c
@@ -242,6 +242,17 @@ toggle_bp_slot(struct perf_event *bp, bo
 }
 
 /*
+ * Function to perform processor-specific cleanup during unregistration
+ */
+__weak void arch_unregister_hw_breakpoint(struct perf_event *bp)
+{
+	/*
+	 * A weak stub function here for those archs that don't define
+	 * it inside arch/.../kernel/hw_breakpoint.c
+	 */
+}
+
+/*
  * Contraints to check before allowing this new breakpoint counter:
  *
  *  == Non-pinned counter == (Considered as pinned for now)
@@ -339,6 +350,7 @@ void release_bp_slot(struct perf_event *
 {
 	mutex_lock(&nr_bp_mutex);
 
+	arch_unregister_hw_breakpoint(bp);
 	__release_bp_slot(bp);
 
 	mutex_unlock(&nr_bp_mutex);

^ permalink raw reply

* [Patch 0/5] PPC64-HWBKPT: Hardware Breakpoint interfaces - ver XXII
From: K.Prasad @ 2010-05-28  6:39 UTC (permalink / raw)
  To: linuxppc-dev@ozlabs.org, Paul Mackerras
  Cc: Michael Neuling, Benjamin Herrenschmidt, shaggy,
	Frederic Weisbecker, David Gibson, Alan Stern, Roland McGrath

Hi All,
	Please find a new set of patches that have the following changes.

Changelog - ver XXII
--------------------
(Version XXI: linuxppc-dev ref:20100525091314.GA29003@in.ibm.com)
- Extraneous breakpoint exceptions are now properly handled; causative
  instruction will be single-stepped and debug register values restored.
- Restoration of breakpoints during signal handling through thread_change_pc()
  had defects which are now fixed.
- Breakpoints are flushed through flush_ptrace_hw_breakpoint() call in both
  flush_thread() and prepare_to_copy() functions. 'ptrace_bps[]' and
  'last_hit_ubp' members are now promptly cleaned-up.
- Single-step exception is now conditionally emulated upon hitting
  alignment_exception.
- Rebased to commit 31f46717997a83bdf6db0dd04810c0a329eb3148 of linux-2.6 tree.

Kindly let me know your comments for the same.

Thanks,
K.Prasad

Changelog - ver XXI
--------------------
(Version XX: linuxppc-dev ref:20100524103136.GA8131@in.ibm.com)
- Decision to emulate an instruction is now based on whether the causative
  instruction is in user_mode() or not.
- Breakpoints don't have to be cleared during sigreturn. A 'double-hit' on
  hw_breakpoint_handler() is harmless for non-ptrace instructions.
- Minor changes to aid code brevity.

Changelog - ver XX
--------------------
(Version XIX: linuxppc-dev ref: 20100524040137.GA20873@in.ibm.com)
- Non task-bound breakpoints will only be emulated. Breakpoint will be
  unregistered with a warning if emulation fails.

Changelog - ver XIX
--------------------
(Version XVIII: linuxppc-dev ref: 20100512033055.GA6384@in.ibm.com)
- Increased coverage of breakpoints during concurrent alignment_exception
  and signal handling (which previously had 'blind-spots').
- Support for kernel-thread breakpoints and kernel-space breakpoints inside the
  context of a user-space process.
- Patches re-based to commit f4b87dee923342505e1ddba8d34ce9de33e75050, thereby
  necessitating minor changes to arch_validate_hwbkpt_settings().

Changelog - ver XVIII
--------------------
(Version XVII: linuxppc-dev ref: 20100414034340.GA6571@in.ibm.com)
- Slight code restructuring for brevity and coding-style corrections.
- emulate_single_step() now notifies DIE_SSTEP to registered handlers;
  causes single_step_dabr_instruction() to be invoked after alignment_exception.
- hw-breakpoint restoration variables are cleaned-up before unregistration
  through arch_unregister_hw_breakpoint().
- SIGTRAP is no longer generated for non-ptrace user-space breakpoints.

Changelog - ver XVII
--------------------
(Version XVI: linuxppc-dev ref: 20100330095809.GA14403@in.ibm.com)
- CONFIG_HAVE_HW_BREAKPOINT is now used to define the scope of the new code
  (in lieu of CONFIG_PPC_BOOK3S_64).
- CONFIG_HAVE_HW_BREAKPOINT is now dependant upon CONFIG_PERF_EVENTS and
  CONFIG_PPC_BOOK3S_64 (to overcome build failures under certain configs).
- Included a target in arch/powerpc/lib/Makefile to build sstep.o when
  HAVE_HW_BREAKPOINT.
- Added a dummy definition for hw_breakpoint_disable() when !HAVE_HW_BREAKPOINT.
- Tested builds under defconfigs for ppc64, cell and g5 (found no patch induced
  failures).

Changelog - ver XVI
--------------------
(Version XV: linuxppc-dev ref: 20100323140639.GA21836@in.ibm.com)
- Used a new config option CONFIG_PPC_BOOK3S_64 (in lieu of
  CONFIG_PPC64/CPU_FTR_HAS_DABR) to limit the scope of the new code.
- Disabled breakpoints before kexec of the machine using hw_breakpoint_disable().
- Minor optimisation in exception-64s.S to check for data breakpoint exceptions
  in DSISR finally (after check for other causes) + changes in code comments and 
  representation of DSISR_DABRMATCH constant.
- Rebased to commit ae6be51ed01d6c4aaf249a207b4434bc7785853b of linux-2.6.

Changelog - ver XV
--------------------
(Version XIV: linuxppc-dev ref: 20100308181232.GA3406@in.ibm.com)

- Additional patch to disable interrupts during data breakpoint exception
  handling.
- Moved HBP_NUM definition to cputable.h under a new CPU_FTR definition
  (CPU_FTR_HAS_DABR).
- Filtering of extraneous exceptions (due to accesses outside symbol length) is
  by-passed for ptrace requests.
- Removed flush_ptrace_hw_breakpoint() from __switch_to() due to incorrect
  coding placement.
- Changes to code comments as per community reviews for previous version.
- Minor coding-style changes in hw_breakpoint.c as per review comments.
- Re-based to commit ae6be51ed01d6c4aaf249a207b4434bc7785853b of linux-2.6

Changelog - ver XIV
--------------------
(Version XIII: linuxppc-dev ref: 20100215055605.GB3670@in.ibm.com)

- Removed the 'name' field from 'struct arch_hw_breakpoint'.
- All callback invocations through bp->overflow_handler() are replaced with
  perf_bp_event().
- Removed the check for pre-existing single-stepping mode in
  hw_breakpoint_handler() as this check is unreliable while in kernel-space.
  Side effect of this change is the non-triggering of hw-breakpoints while
  single-stepping kernel through KGDB or Xmon.
- Minor code-cleanups and addition of comments in hw_breakpoint_handler() and
  single_step_dabr_instruction().
- Re-based to commit 25cf84cf377c0aae5dbcf937ea89bc7893db5176 of linux-2.6

Changelog - ver XIII
--------------------
(Version XII: linuxppc-dev ref: 20100121084640.GA3252@in.ibm.com)

- Fixed a bug for user-space breakpoints (triggered following the failure of a
  breakpoint request).
- Re-based on commit 724e6d3fe8003c3f60bf404bf22e4e331327c596 of linux-2.6
  
Changelog - ver XII
--------------------
(Version XI: linuxppc-dev ref: 20100119091234.GA9971@in.ibm.com)

- Unset MSR_SE only if kernel was not previously in single-step mode.
- Pre-emption is now enabled before returning from the hw-breakpoint exception
  handler.
- Variables to track the source of single-step exception (breakpoint from kernel,
  user-space vs single-stepping due to other requests) are added.
- Extraneous hw-breakpoint exceptions (due to memory accesses lying outside
  monitored symbol length) is now done for both kernel and user-space
  (previously only user-space).
- single_step_dabr_instruction() now returns NOTIFY_DONE if kernel was in
  single-step mode even before the hw-breakpoint. This enables other users of
  single-step mode to be notified of the exception.
- User-space instructions are not emulated from kernel-space, they are instead
  single-stepped.
  
Changelog - ver XI
------------------
(Version X: linuxppc-dev ref: 20091211160144.GA23156@in.ibm.com)
- Conditionally unset MSR_SE in the single-step handler
- Added comments to explain the duration and need for pre-emption
disable following hw-breakpoint exception.

Changelog - ver X
------------------
- Re-write the PPC64 patches for the new implementation of hw-breakpoints that
  uses the perf-layer.
- Rebased to commit 7622fc234190a37d4e9fe3ed944a2b61a63fca03 of -tip.

Changelog - ver IX
-------------------
- Invocation of user-defined callback will be 'trigger-after-execute' (except
  for ptrace).
- Creation of a new global per-CPU breakpoint structure to help invocation of
  user-defined callback from single-step handler.
(Changes made now)
- Validation before registration will fail only if the address does not match
  the kernel symbol's (if specified) resolved address
  (through kallsyms_lookup_name()).
- 'symbolsize' value is expected to within the range contained by the symbol's
  starting address and the end of a double-word boundary (8 Bytes).
- PPC64's arch-dependant code is now aware of 'cpumask' in 'struct hw_breakpoint'
  and can accomodate requests for a subset of CPUs in the system.
- Introduced arch_disable_hw_breakpoint() required for
  <enable><disable>_hw_breakpoint() APIs.

Changelog - ver VIII
-------------------
- Reverting changes to allow one-shot breakpoints only for ptrace requests.
- Minor changes in sanity checking in arch_validate_hwbkpt_settings().
- put_cpu_no_resched() is no longer available. Converted to put_cpu().

Changelog - ver VII
-------------------
- Allow the one-shot behaviour for exception handlers to be defined by the user.
  A new 'is_one_shot' flag is added to 'struct arch_hw_breakpoint'.

Changelog - ver VI
------------------
The task of identifying 'genuine' breakpoint exceptions from those caused by
'out-of-range' accesses turned out to be more tricky than originally thought.
Some changes to this effect were made in version IV of this patchset, but they
were not sufficient for user-space. Basically the breakpoint address received
through ptrace is always aligned to 8-bytes since ptrace receives an encoded
'data' (consisting of address | translation_enable | bkpt_type), and the size of
the symbol is not known. However for kernel-space addresses, the symbol-size can
be determined using kallsyms_lookup_size_offset() and this is used to check if
DAR (in the exception context) is
'bkpt_address <= DAR <= (bkpt_address + symbol_size)', failing which we conclude
it as a stray exception.

The following changes are made to enable check:
- Addition of a symbolsize field in 'struct arch_hw_breakpoint' field.
- Store the size of the 'watched' kernel symbol into 'symbolsize' field in
  arch_store_info(0 routine.
- Verify if the above described condition is true when is_one_shot is FALSE in
  hw_breakpoint_handler().

Changelog - ver V
------------------
- Breakpoint requests from ptrace (for user-space) are designed to be one-shot
in PPC64. The patch contains changes to retain this behaviour by returning early
in hw_breakpoint_handler() [without re-initialising DABR] and unregistering the
user-space request in ptrace_triggered(). It is safe to make a
unregister_user_hw_breakpoint() call from the breakpoint exception context
[through ptrace_triggered()] without giving rise to circular locking-dependancy.
This is because there can be no kernel code running on the CPU (which received
the exception) with the same spinlock held.

- Minor change in 'type' member of 'struct arch_hw_breakpoint' from u8 to 'int'.

Changelog - ver IV
------------------
- While DABR register requires double-word (8 bytes) aligned addresses, i.e.
the breakpoint is active over a range of 8 bytes, PPC64 allows byte-level
addressability. This may lead to stray exceptions which have to be ignored in
hw_breakpoint_handler(), when DAR != (Breakpoint request address). However DABR
will be populated with the requested breakpoint address aligned to the previous
double-word address. The code is now modified to store user-requested address
in 'bp->info.address' but update the DABR with a double-word aligned address.
- Please note that the Data Breakpoint facility in Xmon is broken as of 2.6.29
and the same has not been integrated into this facility as described in Ver I.

Changelog - ver III
------------------
- Patches are based on commit 08f16e060bf54bdc34f800ed8b5362cdeda75d8b of -tip
tree.
- The declarations in arch/powerpc/include/asm/hw_breakpoint.h are done only if
CONFIG_PPC64 is defined. This eliminates the need to conditionally include this
header file.
- load_debug_registers() is done in start_secondary() i.e. during CPU
initialisation.
- arch_check_va_<> routines in hw_breakpoint.c are now replaced with a much
simpler is_kernel_addr() check in arch_validate_hwbkpt_settings()
- Return code of hw_breakpoint_handler() when triggered due to Lazy debug
register switching is now changed to NOTIFY_STOP.
- The ptrace code no longer sets the TIF_DEBUG task flag as it is proposed to
be done in register_user_hw_breakpoint() routine.
- hw_breakpoint_handler() is now modified to use hbp_kernel_pos value to
  determine if the trigger was a user/kernel space address. The DAR register
  value is checked with the address stored in 'struct hw_breakpoint' to avoid
  handling of exceptions that belong to kprobe/Xmon.


Changelog - ver II
------------------
- Split the monolithic patch into six logical patches
- Changed the signature of arch_check_va_in_<user><kernel>space functions. They
  are now marked static.
- HB_NUM is now called as HBP_NUM (to preserve a consistent short-name
  convention)
- Introduced hw_breakpoint_disable() and changes to kexec code to disable
  breakpoints before a reboot.
- Minor changes in ptrace code to use macro-defined constants instead of
  numbers.
- Introduced a new constant definition INSTRUCTION_LEN in reg.h

^ permalink raw reply

* [Patch 2/5] PPC64-HWBKPT: Implement hw-breakpoints for PowerPC BookIII S
From: K.Prasad @ 2010-05-28  6:40 UTC (permalink / raw)
  To: linuxppc-dev@ozlabs.org, Paul Mackerras
  Cc: Michael Neuling, Benjamin Herrenschmidt, shaggy,
	Frederic Weisbecker, David Gibson, Alan Stern, K.Prasad,
	Roland McGrath
In-Reply-To: <20100528061928.677651410@linux.vnet.ibm.com>

Implement perf-events based hw-breakpoint interfaces for PowerPC Book III S
processors. These interfaces help arbitrate requests from various users and
schedules them as appropriate.

Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com>
---
 arch/powerpc/Kconfig                     |    1 
 arch/powerpc/include/asm/cputable.h      |    4 
 arch/powerpc/include/asm/hw_breakpoint.h |   73 +++++++
 arch/powerpc/include/asm/processor.h     |    8 
 arch/powerpc/kernel/Makefile             |    1 
 arch/powerpc/kernel/hw_breakpoint.c      |  322 +++++++++++++++++++++++++++++++
 arch/powerpc/kernel/machine_kexec_64.c   |    3 
 arch/powerpc/kernel/process.c            |   14 +
 arch/powerpc/kernel/ptrace.c             |   64 ++++++
 arch/powerpc/lib/Makefile                |    1 
 10 files changed, 491 insertions(+)

Index: linux-2.6.ppc64_test/arch/powerpc/include/asm/hw_breakpoint.h
===================================================================
--- /dev/null
+++ linux-2.6.ppc64_test/arch/powerpc/include/asm/hw_breakpoint.h
@@ -0,0 +1,73 @@
+/*
+ * PowerPC BookIII S hardware breakpoint definitions
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright 2010, IBM Corporation.
+ * Author: K.Prasad <prasad@linux.vnet.ibm.com>
+ *
+ */
+
+#ifndef _PPC_BOOK3S_64_HW_BREAKPOINT_H
+#define _PPC_BOOK3S_64_HW_BREAKPOINT_H
+
+#ifdef	__KERNEL__
+#ifdef CONFIG_HAVE_HW_BREAKPOINT
+
+struct arch_hw_breakpoint {
+	u8		len; /* length of the target data symbol */
+	int		type;
+	unsigned long	address;
+};
+
+#include <linux/kdebug.h>
+#include <asm/reg.h>
+#include <asm/system.h>
+
+static inline int hw_breakpoint_slots(int type)
+{
+	return HBP_NUM;
+}
+struct perf_event;
+struct pmu;
+struct perf_sample_data;
+
+#define HW_BREAKPOINT_ALIGN 0x7
+/* Maximum permissible length of any HW Breakpoint */
+#define HW_BREAKPOINT_LEN 0x8
+
+extern int arch_bp_generic_fields(int type, int *gen_bp_type);
+extern int arch_check_bp_in_kernelspace(struct perf_event *bp);
+extern int arch_validate_hwbkpt_settings(struct perf_event *bp);
+extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
+						unsigned long val, void *data);
+int arch_install_hw_breakpoint(struct perf_event *bp);
+void arch_uninstall_hw_breakpoint(struct perf_event *bp);
+void hw_breakpoint_pmu_read(struct perf_event *bp);
+extern void flush_ptrace_hw_breakpoint(struct task_struct *tsk);
+
+extern struct pmu perf_ops_bp;
+extern void ptrace_triggered(struct perf_event *bp, int nmi,
+			struct perf_sample_data *data, struct pt_regs *regs);
+static inline void hw_breakpoint_disable(void)
+{
+	set_dabr(0);
+}
+
+#else	/* CONFIG_HAVE_HW_BREAKPOINT */
+static inline void hw_breakpoint_disable(void) { }
+#endif	/* CONFIG_HAVE_HW_BREAKPOINT */
+#endif	/* __KERNEL__ */
+#endif	/* _PPC_BOOK3S_64_HW_BREAKPOINT_H */
Index: linux-2.6.ppc64_test/arch/powerpc/kernel/hw_breakpoint.c
===================================================================
--- /dev/null
+++ linux-2.6.ppc64_test/arch/powerpc/kernel/hw_breakpoint.c
@@ -0,0 +1,322 @@
+/*
+ * HW_breakpoint: a unified kernel/user-space hardware breakpoint facility,
+ * using the CPU's debug registers. Derived from
+ * "arch/x86/kernel/hw_breakpoint.c"
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright 2010 IBM Corporation
+ * Author: K.Prasad <prasad@linux.vnet.ibm.com>
+ *
+ */
+
+#include <linux/hw_breakpoint.h>
+#include <linux/notifier.h>
+#include <linux/kprobes.h>
+#include <linux/percpu.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/init.h>
+#include <linux/smp.h>
+
+#include <asm/hw_breakpoint.h>
+#include <asm/processor.h>
+#include <asm/sstep.h>
+
+/*
+ * Stores the breakpoints currently in use on each breakpoint address
+ * register for every cpu
+ */
+static DEFINE_PER_CPU(struct perf_event *, bp_per_reg);
+
+/*
+ * Install a perf counter breakpoint.
+ *
+ * We seek a free debug address register and use it for this
+ * breakpoint.
+ *
+ * Atomic: we hold the counter->ctx->lock and we only handle variables
+ * and registers local to this cpu.
+ */
+int arch_install_hw_breakpoint(struct perf_event *bp)
+{
+	struct arch_hw_breakpoint *info = counter_arch_bp(bp);
+	struct perf_event **slot = &__get_cpu_var(bp_per_reg);
+
+	*slot = bp;
+
+	/*
+	 * Do not install DABR values if the instruction must be single-stepped.
+	 * If so, DABR will be populated in single_step_dabr_instruction().
+	 */
+	if (current->thread.last_hit_ubp != bp)
+		set_dabr(info->address | info->type | DABR_TRANSLATION);
+
+	return 0;
+}
+
+/*
+ * Uninstall the breakpoint contained in the given counter.
+ *
+ * First we search the debug address register it uses and then we disable
+ * it.
+ *
+ * Atomic: we hold the counter->ctx->lock and we only handle variables
+ * and registers local to this cpu.
+ */
+void arch_uninstall_hw_breakpoint(struct perf_event *bp)
+{
+	struct perf_event **slot = &__get_cpu_var(bp_per_reg);
+
+	if (*slot != bp) {
+		WARN_ONCE(1, "Can't find the breakpoint");
+		return;
+	}
+
+	*slot = NULL;
+	set_dabr(0);
+}
+
+/*
+ * Perform cleanup of arch-specific counters during unregistration
+ * of the perf-event
+ */
+void arch_unregister_hw_breakpoint(struct perf_event *bp)
+{
+	/*
+	 * If the breakpoint is unregistered between a hw_breakpoint_handler()
+	 * and the single_step_dabr_instruction(), then cleanup the breakpoint
+	 * restoration variables to prevent dangling pointers.
+	 */
+	if (bp->ctx->task)
+		bp->ctx->task->thread.last_hit_ubp = NULL;
+
+	put_cpu();
+}
+
+/*
+ * Check for virtual address in kernel space.
+ */
+int arch_check_bp_in_kernelspace(struct perf_event *bp)
+{
+	struct arch_hw_breakpoint *info = counter_arch_bp(bp);
+
+	return is_kernel_addr(info->address);
+}
+
+int arch_bp_generic_fields(int type, int *gen_bp_type)
+{
+	switch (type) {
+	case DABR_DATA_READ:
+		*gen_bp_type = HW_BREAKPOINT_R;
+		break;
+	case DABR_DATA_WRITE:
+		*gen_bp_type = HW_BREAKPOINT_W;
+		break;
+	case (DABR_DATA_WRITE | DABR_DATA_READ):
+		*gen_bp_type = (HW_BREAKPOINT_W | HW_BREAKPOINT_R);
+		break;
+	default:
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/*
+ * Validate the arch-specific HW Breakpoint register settings
+ */
+int arch_validate_hwbkpt_settings(struct perf_event *bp)
+{
+	int ret = -EINVAL;
+	struct arch_hw_breakpoint *info = counter_arch_bp(bp);
+
+	if (!bp)
+		return ret;
+
+	switch (bp->attr.bp_type) {
+	case HW_BREAKPOINT_R:
+		info->type = DABR_DATA_READ;
+		break;
+	case HW_BREAKPOINT_W:
+		info->type = DABR_DATA_WRITE;
+		break;
+	case HW_BREAKPOINT_R | HW_BREAKPOINT_W:
+		info->type = (DABR_DATA_READ | DABR_DATA_WRITE);
+		break;
+	default:
+		return ret;
+	}
+
+	info->address = bp->attr.bp_addr;
+	info->len = bp->attr.bp_len;
+
+	/*
+	 * Since breakpoint length can be a maximum of HW_BREAKPOINT_LEN(8)
+	 * and breakpoint addresses are aligned to nearest double-word
+	 * HW_BREAKPOINT_ALIGN by rounding off to the lower address, the
+	 * 'symbolsize' should satisfy the check below.
+	 */
+	if (info->len >
+	    (HW_BREAKPOINT_LEN - (info->address & HW_BREAKPOINT_ALIGN)))
+		return -EINVAL;
+	return 0;
+}
+
+/*
+ * Handle debug exception notifications.
+ */
+int __kprobes hw_breakpoint_handler(struct die_args *args)
+{
+	bool is_ptrace_bp = false;
+	int rc = NOTIFY_STOP;
+	struct perf_event *bp;
+	struct pt_regs *regs = args->regs;
+	int stepped = 1;
+	struct arch_hw_breakpoint *info;
+
+	/* Disable breakpoints during exception handling */
+	set_dabr(0);
+	/*
+	 * The counter may be concurrently released but that can only
+	 * occur from a call_rcu() path. We can then safely fetch
+	 * the breakpoint, use its callback, touch its counter
+	 * while we are in an rcu_read_lock() path.
+	 */
+	rcu_read_lock();
+
+	bp = __get_cpu_var(bp_per_reg);
+	if (!bp)
+		goto out;
+	info = counter_arch_bp(bp);
+	is_ptrace_bp = (bp->overflow_handler == ptrace_triggered) ?
+			true : false;
+
+	/*
+	 * Return early after invoking user-callback function without restoring
+	 * DABR if the breakpoint is from ptrace which always operates in
+	 * one-shot mode. The ptrace-ed process will receive the SIGTRAP signal
+	 * generated in do_dabr().
+	 */
+	if (is_ptrace_bp) {
+		perf_bp_event(bp, regs);
+		rc = NOTIFY_DONE;
+		goto out;
+	}
+
+	/* Do not emulate user-space instructions, instead single-step them */
+	if (user_mode(regs)) {
+		bp->ctx->task->thread.last_hit_ubp = bp;
+		regs->msr |= MSR_SE;
+		goto out;
+	}
+
+	stepped = emulate_step(regs, regs->nip);
+	/*
+	 * emulate_step() could not execute it. We've failed in reliably
+	 * handling the hw-breakpoint. Unregister it and throw a warning
+	 * message to let the user know about it.
+	 */
+	if (!stepped) {
+		WARN(1, "Unable to handle hardware breakpoint. Breakpoint at "
+			"0x%lx will be uninstalled.", info->address);
+		unregister_hw_breakpoint(bp);
+		goto out;
+	}
+	/*
+	 * As a policy, the callback is invoked in a 'trigger-after-execute'
+	 * fashion
+	 */
+	perf_bp_event(bp, regs);
+
+	set_dabr(info->address | info->type | DABR_TRANSLATION);
+out:
+	rcu_read_unlock();
+	return rc;
+}
+
+/*
+ * Handle single-step exceptions following a DABR hit.
+ */
+int __kprobes single_step_dabr_instruction(struct die_args *args)
+{
+	struct pt_regs *regs = args->regs;
+	struct perf_event *bp = NULL;
+	struct arch_hw_breakpoint *bp_info;
+
+	bp = current->thread.last_hit_ubp;
+	/*
+	 * Check if we are single-stepping as a result of a
+	 * previous HW Breakpoint exception
+	 */
+	if (!bp)
+		return NOTIFY_DONE;
+
+	bp_info = counter_arch_bp(bp);
+
+	/*
+	 * We shall invoke the user-defined callback function in the single
+	 * stepping handler to confirm to 'trigger-after-execute' semantics
+	 */
+	perf_bp_event(bp, regs);
+
+	/*
+	 * Do not disable MSR_SE if the process was already in
+	 * single-stepping mode.
+	 */
+	if (!test_thread_flag(TIF_SINGLESTEP))
+		regs->msr &= ~MSR_SE;
+
+	set_dabr(bp_info->address | bp_info->type | DABR_TRANSLATION);
+	current->thread.last_hit_ubp = NULL;
+	return NOTIFY_STOP;
+}
+
+/*
+ * Handle debug exception notifications.
+ */
+int __kprobes hw_breakpoint_exceptions_notify(
+		struct notifier_block *unused, unsigned long val, void *data)
+{
+	int ret = NOTIFY_DONE;
+
+	switch (val) {
+	case DIE_DABR_MATCH:
+		ret = hw_breakpoint_handler(data);
+		break;
+	case DIE_SSTEP:
+		ret = single_step_dabr_instruction(data);
+		break;
+	}
+
+	return ret;
+}
+
+/*
+ * Release the user breakpoints used by ptrace
+ */
+void flush_ptrace_hw_breakpoint(struct task_struct *tsk)
+{
+	struct thread_struct *t = &tsk->thread;
+
+	unregister_hw_breakpoint(t->ptrace_bps[0]);
+	t->ptrace_bps[0] = NULL;
+}
+
+void hw_breakpoint_pmu_read(struct perf_event *bp)
+{
+	/* TODO */
+}
+
Index: linux-2.6.ppc64_test/arch/powerpc/kernel/Makefile
===================================================================
--- linux-2.6.ppc64_test.orig/arch/powerpc/kernel/Makefile
+++ linux-2.6.ppc64_test/arch/powerpc/kernel/Makefile
@@ -34,6 +34,7 @@ obj-y				+= vdso32/
 obj-$(CONFIG_PPC64)		+= setup_64.o sys_ppc32.o \
 				   signal_64.o ptrace32.o \
 				   paca.o nvram_64.o firmware.o
+obj-$(CONFIG_HAVE_HW_BREAKPOINT)	+= hw_breakpoint.o
 obj-$(CONFIG_PPC_BOOK3S_64)	+= cpu_setup_ppc970.o cpu_setup_pa6t.o
 obj64-$(CONFIG_RELOCATABLE)	+= reloc_64.o
 obj-$(CONFIG_PPC_BOOK3E_64)	+= exceptions-64e.o
Index: linux-2.6.ppc64_test/arch/powerpc/include/asm/processor.h
===================================================================
--- linux-2.6.ppc64_test.orig/arch/powerpc/include/asm/processor.h
+++ linux-2.6.ppc64_test/arch/powerpc/include/asm/processor.h
@@ -209,6 +209,14 @@ struct thread_struct {
 #ifdef CONFIG_PPC64
 	unsigned long	start_tb;	/* Start purr when proc switched in */
 	unsigned long	accum_tb;	/* Total accumilated purr for process */
+#ifdef CONFIG_HAVE_HW_BREAKPOINT
+	struct perf_event *ptrace_bps[HBP_NUM];
+	/*
+	 * Helps identify source of single-step exception and subsequent
+	 * hw-breakpoint enablement
+	 */
+	struct perf_event *last_hit_ubp;
+#endif /* CONFIG_HAVE_HW_BREAKPOINT */
 #endif
 	unsigned long	dabr;		/* Data address breakpoint register */
 #ifdef CONFIG_ALTIVEC
Index: linux-2.6.ppc64_test/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6.ppc64_test.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6.ppc64_test/arch/powerpc/kernel/ptrace.c
@@ -32,6 +32,8 @@
 #ifdef CONFIG_PPC32
 #include <linux/module.h>
 #endif
+#include <linux/hw_breakpoint.h>
+#include <linux/perf_event.h>
 
 #include <asm/uaccess.h>
 #include <asm/page.h>
@@ -866,9 +868,34 @@ void user_disable_single_step(struct tas
 	clear_tsk_thread_flag(task, TIF_SINGLESTEP);
 }
 
+#ifdef CONFIG_HAVE_HW_BREAKPOINT
+void ptrace_triggered(struct perf_event *bp, int nmi,
+		      struct perf_sample_data *data, struct pt_regs *regs)
+{
+	struct perf_event_attr attr;
+
+	/*
+	 * Disable the breakpoint request here since ptrace has defined a
+	 * one-shot behaviour for breakpoint exceptions in PPC64.
+	 * The SIGTRAP signal is generated automatically for us in do_dabr().
+	 * We don't have to do anything about that here
+	 */
+	attr = bp->attr;
+	attr.disabled = true;
+	modify_user_hw_breakpoint(bp, &attr);
+}
+#endif /* CONFIG_HAVE_HW_BREAKPOINT */
+
 int ptrace_set_debugreg(struct task_struct *task, unsigned long addr,
 			       unsigned long data)
 {
+#ifdef CONFIG_HAVE_HW_BREAKPOINT
+	int ret;
+	struct thread_struct *thread = &(task->thread);
+	struct perf_event *bp;
+	struct perf_event_attr attr;
+#endif /* CONFIG_HAVE_HW_BREAKPOINT */
+
 	/* For ppc64 we support one DABR and no IABR's at the moment (ppc64).
 	 *  For embedded processors we support one DAC and no IAC's at the
 	 *  moment.
@@ -896,6 +923,43 @@ int ptrace_set_debugreg(struct task_stru
 	/* Ensure breakpoint translation bit is set */
 	if (data && !(data & DABR_TRANSLATION))
 		return -EIO;
+#ifdef CONFIG_HAVE_HW_BREAKPOINT
+	bp = thread->ptrace_bps[0];
+	if ((!data) || !(data & (DABR_DATA_WRITE | DABR_DATA_READ))) {
+		if (bp) {
+			unregister_hw_breakpoint(bp);
+			thread->ptrace_bps[0] = NULL;
+		}
+		return 0;
+	}
+	if (bp) {
+		attr = bp->attr;
+		attr.bp_addr = data & ~HW_BREAKPOINT_ALIGN;
+		arch_bp_generic_fields(data &
+					(DABR_DATA_WRITE | DABR_DATA_READ),
+							&attr.bp_type);
+		ret =  modify_user_hw_breakpoint(bp, &attr);
+		if (ret)
+			return ret;
+		thread->ptrace_bps[0] = bp;
+		thread->dabr = data;
+		return 0;
+	}
+
+	/* Create a new breakpoint request if one doesn't exist already */
+	hw_breakpoint_init(&attr);
+	attr.bp_addr = data & ~HW_BREAKPOINT_ALIGN;
+	arch_bp_generic_fields(data & (DABR_DATA_WRITE | DABR_DATA_READ),
+								&attr.bp_type);
+
+	thread->ptrace_bps[0] = bp = register_user_hw_breakpoint(&attr,
+							ptrace_triggered, task);
+	if (IS_ERR(bp)) {
+		thread->ptrace_bps[0] = NULL;
+		return PTR_ERR(bp);
+	}
+
+#endif /* CONFIG_HAVE_HW_BREAKPOINT */
 
 	/* Move contents to the DABR register */
 	task->thread.dabr = data;
Index: linux-2.6.ppc64_test/arch/powerpc/kernel/process.c
===================================================================
--- linux-2.6.ppc64_test.orig/arch/powerpc/kernel/process.c
+++ linux-2.6.ppc64_test/arch/powerpc/kernel/process.c
@@ -37,6 +37,7 @@
 #include <linux/kernel_stat.h>
 #include <linux/personality.h>
 #include <linux/random.h>
+#include <linux/hw_breakpoint.h>
 
 #include <asm/pgtable.h>
 #include <asm/uaccess.h>
@@ -462,8 +463,14 @@ struct task_struct *__switch_to(struct t
 #ifdef CONFIG_PPC_ADV_DEBUG_REGS
 	switch_booke_debug_regs(&new->thread);
 #else
+/*
+ * For PPC_BOOK3S_64, we use the hw-breakpoint interfaces that would
+ * schedule DABR
+ */
+#ifndef CONFIG_HAVE_HW_BREAKPOINT
 	if (unlikely(__get_cpu_var(current_dabr) != new->thread.dabr))
 		set_dabr(new->thread.dabr);
+#endif /* CONFIG_HAVE_HW_BREAKPOINT */
 #endif
 
 
@@ -642,7 +649,11 @@ void flush_thread(void)
 {
 	discard_lazy_cpu_state();
 
+#ifdef CONFIG_HAVE_HW_BREAKPOINTS
+	flush_ptrace_hw_breakpoint(current);
+#else /* CONFIG_HAVE_HW_BREAKPOINTS */
 	set_debug_reg_defaults(&current->thread);
+#endif /* CONFIG_HAVE_HW_BREAKPOINTS */
 }
 
 void
@@ -660,6 +671,9 @@ void prepare_to_copy(struct task_struct 
 	flush_altivec_to_thread(current);
 	flush_vsx_to_thread(current);
 	flush_spe_to_thread(current);
+#ifdef CONFIG_HAVE_HW_BREAKPOINT
+	flush_ptrace_hw_breakpoint(tsk);
+#endif /* CONFIG_HAVE_HW_BREAKPOINT */
 }
 
 /*
Index: linux-2.6.ppc64_test/arch/powerpc/include/asm/cputable.h
===================================================================
--- linux-2.6.ppc64_test.orig/arch/powerpc/include/asm/cputable.h
+++ linux-2.6.ppc64_test/arch/powerpc/include/asm/cputable.h
@@ -516,6 +516,10 @@ static inline int cpu_has_feature(unsign
 		& feature);
 }
 
+#ifdef CONFIG_HAVE_HW_BREAKPOINT
+#define HBP_NUM 1
+#endif /* CONFIG_HAVE_HW_BREAKPOINT */
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __KERNEL__ */
Index: linux-2.6.ppc64_test/arch/powerpc/kernel/machine_kexec_64.c
===================================================================
--- linux-2.6.ppc64_test.orig/arch/powerpc/kernel/machine_kexec_64.c
+++ linux-2.6.ppc64_test/arch/powerpc/kernel/machine_kexec_64.c
@@ -25,6 +25,7 @@
 #include <asm/sections.h>	/* _end */
 #include <asm/prom.h>
 #include <asm/smp.h>
+#include <asm/hw_breakpoint.h>
 
 int default_machine_kexec_prepare(struct kimage *image)
 {
@@ -165,6 +166,7 @@ static void kexec_smp_down(void *arg)
 	while(kexec_all_irq_disabled == 0)
 		cpu_relax();
 	mb(); /* make sure all irqs are disabled before this */
+	hw_breakpoint_disable();
 	/*
 	 * Now every CPU has IRQs off, we can clear out any pending
 	 * IPIs and be sure that no more will come in after this.
@@ -180,6 +182,7 @@ static void kexec_prepare_cpus_wait(int 
 {
 	int my_cpu, i, notified=-1;
 
+	hw_breakpoint_disable();
 	my_cpu = get_cpu();
 	/* Make sure each CPU has atleast made it to the state we need */
 	for (i=0; i < NR_CPUS; i++) {
Index: linux-2.6.ppc64_test/arch/powerpc/lib/Makefile
===================================================================
--- linux-2.6.ppc64_test.orig/arch/powerpc/lib/Makefile
+++ linux-2.6.ppc64_test/arch/powerpc/lib/Makefile
@@ -20,6 +20,7 @@ obj-$(CONFIG_PPC64)	+= copypage_64.o cop
 			   memcpy_64.o usercopy_64.o mem_64.o string.o
 obj-$(CONFIG_XMON)	+= sstep.o
 obj-$(CONFIG_KPROBES)	+= sstep.o
+obj-$(CONFIG_HAVE_HW_BREAKPOINT)	+= sstep.o
 
 ifeq ($(CONFIG_PPC64),y)
 obj-$(CONFIG_SMP)	+= locks.o
Index: linux-2.6.ppc64_test/arch/powerpc/Kconfig
===================================================================
--- linux-2.6.ppc64_test.orig/arch/powerpc/Kconfig
+++ linux-2.6.ppc64_test/arch/powerpc/Kconfig
@@ -141,6 +141,7 @@ config PPC
 	select GENERIC_ATOMIC64 if PPC32
 	select HAVE_PERF_EVENTS
 	select HAVE_REGS_AND_STACK_ACCESS_API
+	select HAVE_HW_BREAKPOINT if PERF_EVENTS && PPC_BOOK3S_64
 
 config EARLY_PRINTK
 	bool

^ permalink raw reply

* [Patch 3/5] PPC64-HWBKPT: Handle concurrent alignment interrupts
From: K.Prasad @ 2010-05-28  6:40 UTC (permalink / raw)
  To: linuxppc-dev@ozlabs.org, Paul Mackerras
  Cc: Michael Neuling, Benjamin Herrenschmidt, shaggy,
	Frederic Weisbecker, David Gibson, Alan Stern, K.Prasad,
	Roland McGrath
In-Reply-To: <20100528061928.677651410@linux.vnet.ibm.com>

An alignment interrupt may intervene between a DSI/hw-breakpoint exception
and the single-step exception. Enable the alignment interrupt (through
modifications to emulate_single_step()) to notify the single-step exception
handler for proper restoration of hw-breakpoints.

Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/traps.c |    8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

Index: linux-2.6.ppc64_test/arch/powerpc/kernel/traps.c
===================================================================
--- linux-2.6.ppc64_test.orig/arch/powerpc/kernel/traps.c
+++ linux-2.6.ppc64_test/arch/powerpc/kernel/traps.c
@@ -602,7 +602,7 @@ void RunModeException(struct pt_regs *re
 
 void __kprobes single_step_exception(struct pt_regs *regs)
 {
-	regs->msr &= ~(MSR_SE | MSR_BE);  /* Turn off 'trace' bits */
+	clear_single_step(regs);
 
 	if (notify_die(DIE_SSTEP, "single_step", regs, 5,
 					5, SIGTRAP) == NOTIFY_STOP)
@@ -621,10 +621,8 @@ void __kprobes single_step_exception(str
  */
 static void emulate_single_step(struct pt_regs *regs)
 {
-	if (single_stepping(regs)) {
-		clear_single_step(regs);
-		_exception(SIGTRAP, regs, TRAP_TRACE, 0);
-	}
+	if (single_stepping(regs))
+		single_step_exception(regs);
 }
 
 static inline int __parse_fpscr(unsigned long fpscr)

^ permalink raw reply

* [Patch 4/5] PPC64-HWBKPT: Enable hw-breakpoints while handling intervening signals
From: K.Prasad @ 2010-05-28  6:40 UTC (permalink / raw)
  To: linuxppc-dev@ozlabs.org, Paul Mackerras
  Cc: Michael Neuling, Benjamin Herrenschmidt, shaggy,
	Frederic Weisbecker, David Gibson, Alan Stern, K.Prasad,
	Roland McGrath
In-Reply-To: <20100528061928.677651410@linux.vnet.ibm.com>

A signal delivered between a hw_breakpoint_handler() and the
single_step_dabr_instruction() will not have the breakpoint active during
signal handling (since breakpoint will not be restored through single-stepping
due to absence of MSR_SE bit on the signal frame). Enable breakpoints before
signal delivery.

Restore hw-breakpoints if the user-context is altered in the signal handler.

Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/hw_breakpoint.h |    3 +++
 arch/powerpc/kernel/hw_breakpoint.c      |   18 ++++++++++++++++++
 arch/powerpc/kernel/signal.c             |    3 +++
 3 files changed, 24 insertions(+)

Index: linux-2.6.ppc64_test/arch/powerpc/include/asm/hw_breakpoint.h
===================================================================
--- linux-2.6.ppc64_test.orig/arch/powerpc/include/asm/hw_breakpoint.h
+++ linux-2.6.ppc64_test/arch/powerpc/include/asm/hw_breakpoint.h
@@ -65,9 +65,12 @@ static inline void hw_breakpoint_disable
 {
 	set_dabr(0);
 }
+extern void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs);
 
 #else	/* CONFIG_HAVE_HW_BREAKPOINT */
 static inline void hw_breakpoint_disable(void) { }
+static inline void thread_change_pc(struct task_struct *tsk,
+					struct pt_regs *regs) { }
 #endif	/* CONFIG_HAVE_HW_BREAKPOINT */
 #endif	/* __KERNEL__ */
 #endif	/* _PPC_BOOK3S_64_HW_BREAKPOINT_H */
Index: linux-2.6.ppc64_test/arch/powerpc/kernel/hw_breakpoint.c
===================================================================
--- linux-2.6.ppc64_test.orig/arch/powerpc/kernel/hw_breakpoint.c
+++ linux-2.6.ppc64_test/arch/powerpc/kernel/hw_breakpoint.c
@@ -176,6 +176,24 @@ int arch_validate_hwbkpt_settings(struct
 }
 
 /*
+ * Restores the breakpoint on the debug registers.
+ * Invoke this function if it is known that the execution context is about to
+ * change to cause loss of MSR_SE settings.
+ */
+void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs)
+{
+	struct arch_hw_breakpoint *info;
+
+	if (likely(!tsk->thread.last_hit_ubp))
+		return;
+
+	info = counter_arch_bp(tsk->thread.last_hit_ubp);
+	regs->msr &= ~MSR_SE;
+	set_dabr(info->address | info->type | DABR_TRANSLATION);
+	tsk->thread.last_hit_ubp = NULL;
+}
+
+/*
  * Handle debug exception notifications.
  */
 int __kprobes hw_breakpoint_handler(struct die_args *args)
Index: linux-2.6.ppc64_test/arch/powerpc/kernel/signal.c
===================================================================
--- linux-2.6.ppc64_test.orig/arch/powerpc/kernel/signal.c
+++ linux-2.6.ppc64_test/arch/powerpc/kernel/signal.c
@@ -11,6 +11,7 @@
 
 #include <linux/tracehook.h>
 #include <linux/signal.h>
+#include <asm/hw_breakpoint.h>
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
 
@@ -149,6 +150,8 @@ static int do_signal_pending(sigset_t *o
 	if (current->thread.dabr)
 		set_dabr(current->thread.dabr);
 #endif
+	/* Re-enable the breakpoints for the signal stack */
+	thread_change_pc(current, regs);
 
 	if (is32) {
         	if (ka.sa.sa_flags & SA_SIGINFO)

^ permalink raw reply

* [Patch 5/5] PPC64-HWBKPT: Discard extraneous interrupt due to accesses outside symbol length
From: K.Prasad @ 2010-05-28  6:41 UTC (permalink / raw)
  To: linuxppc-dev@ozlabs.org, Paul Mackerras
  Cc: Michael Neuling, Benjamin Herrenschmidt, shaggy,
	Frederic Weisbecker, David Gibson, Alan Stern, K.Prasad,
	Roland McGrath
In-Reply-To: <20100528061928.677651410@linux.vnet.ibm.com>

Many a times, the requested breakpoint length can be less than the fixed
breakpoint length i.e. 8 bytes supported by PowerPC BookIII S. This could lead
to extraneous interrupts resulting in false breakpoint notifications. The patch
below detects and discards such interrupts for non-ptrace requests (we don't
want to change ptrace behaviour for fear of breaking compatability).

Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/hw_breakpoint.c |   39 ++++++++++++++++++++++++++++++++----
 1 file changed, 35 insertions(+), 4 deletions(-)

Index: linux-2.6.ppc64_test/arch/powerpc/kernel/hw_breakpoint.c
===================================================================
--- linux-2.6.ppc64_test.orig/arch/powerpc/kernel/hw_breakpoint.c
+++ linux-2.6.ppc64_test/arch/powerpc/kernel/hw_breakpoint.c
@@ -198,12 +198,13 @@ void thread_change_pc(struct task_struct
  */
 int __kprobes hw_breakpoint_handler(struct die_args *args)
 {
-	bool is_ptrace_bp = false;
+	bool is_extraneous_interrupt = false, is_ptrace_bp = false;
 	int rc = NOTIFY_STOP;
 	struct perf_event *bp;
 	struct pt_regs *regs = args->regs;
 	int stepped = 1;
 	struct arch_hw_breakpoint *info;
+	unsigned long dar = regs->dar;
 
 	/* Disable breakpoints during exception handling */
 	set_dabr(0);
@@ -234,9 +235,33 @@ int __kprobes hw_breakpoint_handler(stru
 		goto out;
 	}
 
+	/*
+	 * Verify if dar lies within the address range occupied by the symbol
+	 * being watched to filter extraneous exceptions.
+	 */
+	if (!((bp->attr.bp_addr <= dar) &&
+	     (dar <= (bp->attr.bp_addr + bp->attr.bp_len)))) {
+		/*
+		 * This exception is triggered not because of a memory access
+		 * on the monitored variable but in the double-word address
+		 * range in which it is contained. We will consume this
+		 * exception, considering it as 'noise'.
+		 */
+		is_extraneous_interrupt = true;
+	}
+
 	/* Do not emulate user-space instructions, instead single-step them */
 	if (user_mode(regs)) {
-		bp->ctx->task->thread.last_hit_ubp = bp;
+		/*
+		 * To prevent invocation of perf_event_bp(), we shall overload
+		 * thread.ptrace_bps[] pointer (unused for non-ptrace
+		 * exceptions) to flag an extraneous interrupt which must be
+		 * skipped.
+		 */
+		if (is_extraneous_interrupt)
+			bp->ctx->task->thread.ptrace_bps[0] = bp;
+		else
+			bp->ctx->task->thread.last_hit_ubp = bp;
 		regs->msr |= MSR_SE;
 		goto out;
 	}
@@ -274,7 +299,12 @@ int __kprobes single_step_dabr_instructi
 	struct perf_event *bp = NULL;
 	struct arch_hw_breakpoint *bp_info;
 
-	bp = current->thread.last_hit_ubp;
+	if (current->thread.last_hit_ubp)
+		bp = current->thread.last_hit_ubp;
+	else {
+		bp = current->thread.ptrace_bps[0];
+		current->thread.ptrace_bps[0] = NULL;
+	}
 	/*
 	 * Check if we are single-stepping as a result of a
 	 * previous HW Breakpoint exception
@@ -288,7 +318,8 @@ int __kprobes single_step_dabr_instructi
 	 * We shall invoke the user-defined callback function in the single
 	 * stepping handler to confirm to 'trigger-after-execute' semantics
 	 */
-	perf_bp_event(bp, regs);
+	if (bp == current->thread.last_hit_ubp)
+		perf_bp_event(bp, regs);
 
 	/*
 	 * Do not disable MSR_SE if the process was already in

^ permalink raw reply

* Re: [Patch 2/4] PPC64-HWBKPT: Implement hw-breakpoints for PowerPC BookIII S
From: K.Prasad @ 2010-05-28  7:39 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Michael Neuling, Benjamin Herrenschmidt, shaggy,
	Frederic Weisbecker, David Gibson, linuxppc-dev@ozlabs.org,
	Alan Stern, Roland McGrath
In-Reply-To: <20100527061940.GA4105@drongo>

On Thu, May 27, 2010 at 04:19:40PM +1000, Paul Mackerras wrote:
> On Tue, May 25, 2010 at 02:44:20PM +0530, K.Prasad wrote:
> 
> > Implement perf-events based hw-breakpoint interfaces for PowerPC Book III S
> > processors. These interfaces help arbitrate requests from various users and
> > schedules them as appropriate.
> 
> A few comments on the code below...
> 

I've posted a new patchset that addresses almost all of your comments.
Please find them here: linuxppc-dev message-id:
20100528063924.GA8679@in.ibm.com

> > +int __kprobes hw_breakpoint_handler(struct die_args *args)
> > +{
> > +	bool is_ptrace_bp = false;
> > +	int rc = NOTIFY_STOP;
> > +	struct perf_event *bp;
> > +	struct pt_regs *regs = args->regs;
> > +	unsigned long dar = regs->dar;
> > +	int stepped = 1;
> > +	struct arch_hw_breakpoint *info;
> > +
> > +	/* Disable breakpoints during exception handling */
> > +	set_dabr(0);
> > +	/*
> > +	 * The counter may be concurrently released but that can only
> > +	 * occur from a call_rcu() path. We can then safely fetch
> > +	 * the breakpoint, use its callback, touch its counter
> > +	 * while we are in an rcu_read_lock() path.
> > +	 */
> > +	rcu_read_lock();
> > +
> > +	bp = __get_cpu_var(bp_per_reg);
> > +	if (!bp)
> > +		goto out;
> > +	info = counter_arch_bp(bp);
> > +	is_ptrace_bp = (bp->overflow_handler == ptrace_triggered) ?
> > +			true : false;
> > +
> > +	/*
> > +	 * Verify if dar lies within the address range occupied by the symbol
> > +	 * being watched to filter extraneous exceptions.
> > +	 */
> > +	if (!((bp->attr.bp_addr <= dar) &&
> > +	    (dar <= (bp->attr.bp_addr + bp->attr.bp_len))) &&
> > +	    (!is_ptrace_bp))
> > +		/*
> > +		 * This exception is triggered not because of a memory access on
> > +		 * the monitored variable but in the double-word address range
> > +		 * in which it is contained. We will consume this exception,
> > +		 * considering it as 'noise'.
> > +		 */
> > +		goto restore_bp;
> 
> At this point we have to do the single-stepping, because the NIP is
> still pointing at the instruction that caused the exception, and if we
> just return to it with DABR set we won't make any progress, we'll just
> take the same exception again immediately.
> 

I don't know how I convinced myself earlier that this would work :-)

Given that the instructions needs to be emulated in the same manner as
others, I've re-used the ptrace_bps[] member in 'thread_struct' as a
flag to indicate such breakpoints. This will be later checked in
single_step_dabr_instruction() to prevent invocation of
perf_event_bp().

> > +/*
> > + * Handle single-step exceptions following a DABR hit.
> > + */
> > +int __kprobes single_step_dabr_instruction(struct die_args *args)
> > +{
> > +	struct pt_regs *regs = args->regs;
> > +	struct perf_event *bp = NULL;
> > +	struct arch_hw_breakpoint *bp_info;
> > +
> > +	bp = current->thread.last_hit_ubp;
> > +	/*
> > +	 * Check if we are single-stepping as a result of a
> > +	 * previous HW Breakpoint exception
> > +	 */
> > +	if (!bp)
> > +		return NOTIFY_DONE;
> > +
> > +	bp_info = counter_arch_bp(bp);
> > +
> > +	/*
> > +	 * We shall invoke the user-defined callback function in the single
> > +	 * stepping handler to confirm to 'trigger-after-execute' semantics
> > +	 */
> > +	perf_bp_event(bp, regs);
> > +
> > +	/*
> > +	 * Do not disable MSR_SE if the process was already in
> > +	 * single-stepping mode.
> > +	 */
> > +	if (!test_thread_flag(TIF_SINGLESTEP))
> > +		regs->msr &= ~MSR_SE;
> > +
> > +	set_dabr(bp_info->address | bp_info->type | DABR_TRANSLATION);
> > +	return NOTIFY_STOP;
> > +}
> 
> Nowhere in here do we reset current->thread.last_hit_ubp, yet other
> parts of the code assume that .last_hit_ubp != NULL means that we are
> currently single-stepping.  I think we need to clear .last_hit_ubp
> here.
> 

True, made the change.

> > Index: linux-2.6.ppc64_test/arch/powerpc/kernel/process.c
> > ===================================================================
> > --- linux-2.6.ppc64_test.orig/arch/powerpc/kernel/process.c
> > +++ linux-2.6.ppc64_test/arch/powerpc/kernel/process.c
> > @@ -462,8 +462,14 @@ struct task_struct *__switch_to(struct t
> >  #ifdef CONFIG_PPC_ADV_DEBUG_REGS
> >  	switch_booke_debug_regs(&new->thread);
> >  #else
> > +/*
> > + * For PPC_BOOK3S_64, we use the hw-breakpoint interfaces that would
> > + * schedule DABR
> > + */
> > +#ifndef CONFIG_HAVE_HW_BREAKPOINT
> >  	if (unlikely(__get_cpu_var(current_dabr) != new->thread.dabr))
> >  		set_dabr(new->thread.dabr);
> > +#endif /* CONFIG_HAVE_HW_BREAKPOINT */
> >  #endif
> 
> Have you checked all the places that set_dabr is called to see whether
> they are still needed with CONFIG_HAVE_HW_BREAKPOINT?
> 

Yes. All invocations of set_dabr() or their caller functions are enclosed
within #ifdef CONFIG_HAVE_HW_BREAKPOINT.

> > Index: linux-2.6.ppc64_test/arch/powerpc/include/asm/cputable.h
> > ===================================================================
> > --- linux-2.6.ppc64_test.orig/arch/powerpc/include/asm/cputable.h
> > +++ linux-2.6.ppc64_test/arch/powerpc/include/asm/cputable.h
> > @@ -516,6 +516,10 @@ static inline int cpu_has_feature(unsign
> >  		& feature);
> >  }
> >  
> > +#ifdef CONFIG_HAVE_HW_BREAKPOINT
> > +#define HBP_NUM 1
> > +#endif /* CONFIG_HAVE_HW_BREAKPOINT */
> 
> Why is this defined here, not in <asm/hw_breakpoint.h> ?
> 

We need HBP_NUM value for arch/powerpc/include/asm/processor.h
and if asm/hw_breakpoint.h is included there, it caused recursive
dependancies. After more discussions with the community (as in
linuxppc-dev: message-id: 20100330101234.GA14734@in.ibm.com) we finally
decided to put it in cputable.h

We will have more related definitions to accompany this when support for
BookIII E is brought in.

Thanks,
K.Prasad

^ permalink raw reply

* Re: [Patch 3/4] PPC64-HWBKPT: Handle concurrent alignment interrupts
From: K.Prasad @ 2010-05-28  7:41 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Michael Neuling, Benjamin Herrenschmidt, shaggy,
	Frederic Weisbecker, David Gibson, linuxppc-dev@ozlabs.org,
	Alan Stern, Roland McGrath
In-Reply-To: <20100527062044.GB4105@drongo>

On Thu, May 27, 2010 at 04:20:44PM +1000, Paul Mackerras wrote:
> On Tue, May 25, 2010 at 02:44:35PM +0530, K.Prasad wrote:
> 
> > An alignment interrupt may intervene between a DSI/hw-breakpoint exception
> > and the single-step exception. Enable the alignment interrupt (through
> > modifications to emulate_single_step()) to notify the single-step exception
> > handler for proper restoration of hw-breakpoints.
> > 
> > Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com>
> > ---
> >  arch/powerpc/kernel/traps.c |    7 ++-----
> >  1 file changed, 2 insertions(+), 5 deletions(-)
> > 
> > Index: linux-2.6.ppc64_test/arch/powerpc/kernel/traps.c
> > ===================================================================
> > --- linux-2.6.ppc64_test.orig/arch/powerpc/kernel/traps.c
> > +++ linux-2.6.ppc64_test/arch/powerpc/kernel/traps.c
> > @@ -602,7 +602,7 @@ void RunModeException(struct pt_regs *re
> >  
> >  void __kprobes single_step_exception(struct pt_regs *regs)
> >  {
> > -	regs->msr &= ~(MSR_SE | MSR_BE);  /* Turn off 'trace' bits */
> > +	clear_single_step(regs);
> >  
> >  	if (notify_die(DIE_SSTEP, "single_step", regs, 5,
> >  					5, SIGTRAP) == NOTIFY_STOP)
> > @@ -621,10 +621,7 @@ void __kprobes single_step_exception(str
> >   */
> >  static void emulate_single_step(struct pt_regs *regs)
> >  {
> > -	if (single_stepping(regs)) {
> > -		clear_single_step(regs);
> > -		_exception(SIGTRAP, regs, TRAP_TRACE, 0);
> > -	}
> > +	single_step_exception(regs);
> >  }
> 
> We still need the if (single_stepping(regs)) in emulate_single_step.
> We don't want to send the process a SIGTRAP every time it gets an
> alignment interrupt. :)
> 
> Paul.

Agreed, and made changes to that effect in version XXII (as seen in
patch linuxppc-dev message-id: 20100528064017.GD8679@in.ibm.com).

Thanks,
K.Prasad

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox