LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH 1/5] ptp: Added a brand new class driver for ptp clocks.
From: Arnd Bergmann @ 2010-08-27 12:03 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Rodolfo Giometti, john stultz, devicetree-discuss, linux-kernel,
	netdev, linuxppc-dev, linux-arm-kernel, Krzysztof Halasa
In-Reply-To: <20100827110855.GA11657@riccoc20.at.omicron.at>

On Friday 27 August 2010, Richard Cochran wrote:
> On Mon, Aug 23, 2010 at 01:21:39PM -0700, john stultz wrote:
> > On Thu, 2010-08-19 at 17:38 +0200, Richard Cochran wrote:
> > > On Thu, Aug 19, 2010 at 02:28:04PM +0200, Arnd Bergmann wrote:
> > > > Have you considered passing a struct timex instead of ppb and ts?
> > > 
> > > Yes, but the timex is not suitable, IMHO.
> > 
> > Could you expand on this?
> 
> We need to able to specify that the call is for a PTP clock. We could
> add that to the modes flag, like this:
> 
> /*timex.h*/
> #define ADJ_PTP_0 0x10000
> #define ADJ_PTP_1 0x20000
> #define ADJ_PTP_2 0x30000
> #define ADJ_PTP_3 0x40000
>
> I can live with this, if everyone else can, too.

My suggestion was actually to have a new syscall with the existing
structure, and pass a clockid_t value to it, similar to your
sys_clock_adjtime(), not change the actual sys_adjtime syscall.
  
> > Could we not add a adjustment mode ADJ_SETOFFSET or something that would
> > provide the instantaneous offset correction?
> 
> Yes, but we would also need to add a struct timespec to the struct
> timex, in order to get nanosecond resolution. I think it would be
> possible to do in the padding at the end?

Yes, that's exactly what the padding is for. Instead of timespec, you can
probably have a extra values for replacing the existing ppm values with
ppb values.

	Arnd

^ permalink raw reply

* Re: [PATCH 1/5] ptp: Added a brand new class driver for ptp clocks.
From: Richard Cochran @ 2010-08-27 11:08 UTC (permalink / raw)
  To: john stultz
  Cc: Rodolfo Giometti, Arnd Bergmann, netdev, devicetree-discuss,
	linux-kernel, linuxppc-dev, linux-arm-kernel, Krzysztof Halasa
In-Reply-To: <1282594899.3111.358.camel@localhost.localdomain>

On Mon, Aug 23, 2010 at 01:21:39PM -0700, john stultz wrote:
> On Thu, 2010-08-19 at 17:38 +0200, Richard Cochran wrote:
> > On Thu, Aug 19, 2010 at 02:28:04PM +0200, Arnd Bergmann wrote:
> > > My point was that a syscall is better than an ioctl based interface here,
> > > which I definitely still think. Given that John knows much more about
> > > clocks than I do, we still need to get agreement on the question that
> > > he raised, which is whether we actually need to expose this clock to the
> > > user or not.
> > > 
> > > If we can find a way to sync system time accurate enough with PTP and
> > > PPS, user applications may not need to see two separate clocks at all.
> > 
> > At the very least, one user application (the PTPd) needs to see the
> > PTP clock.
> > 
> > > > SYSCALL_DEFINE3(clock_adjtime, const clockid_t, clkid,
> > > > 		int, ppb, struct timespec __user *, ts)
> > > > 
> > > > ppb - desired frequency adjustment in parts per billion
> > > > ts  - desired time step (or jump) in <sec,nsec> to correct
> > > >       a measured offset
> > > > 
> > > > Arguably, this syscall might be useful for other clocks, too.
> > > 
> > > This is a mix of adjtime and adjtimex with the addition of
> > > the clkid parameter, right?
> > 
> > Sort of, but not really. ADJTIME(3) takes an offset and slowly
> > corrects the clock using a servo in the kernel, over hours.
> > 
> > For this function, the offset passed in the 'ts' parameter will be
> > immediately corrected, by jumping to the new time. This reflects the
> > way that PTP works. After the first few samples, the PTPd has an
> > estimate of the offset to the master and the rate difference. The PTPd
> > can proceed in one of two ways.
> > 
> > 1. If resetting the clock is not desired, then the clock is set to the
> >    maximum adjustment (in the right direction) until the clock time is
> >    close to the master's time.
> > 
> > 2. The estimated offset is added to the current time, resulting in a
> >    jump in time.
> > 
> > We need clock_adjtime(id, 0, ts) for the second case.
> >
> > > Have you considered passing a struct timex instead of ppb and ts?
> > 
> > Yes, but the timex is not suitable, IMHO.
> 
> Could you expand on this?

We need to able to specify that the call is for a PTP clock. We could
add that to the modes flag, like this:

/*timex.h*/
#define ADJ_PTP_0 0x10000
#define ADJ_PTP_1 0x20000
#define ADJ_PTP_2 0x30000
#define ADJ_PTP_3 0x40000

I can live with this, if everyone else can, too.
 
> Could we not add a adjustment mode ADJ_SETOFFSET or something that would
> provide the instantaneous offset correction?

Yes, but we would also need to add a struct timespec to the struct
timex, in order to get nanosecond resolution. I think it would be
possible to do in the padding at the end?

> You're right that the timex is a little crufty. But its legacy that we
> will support indefinitely. So following the established interface helps
> maintainability.

We can use it for PTP, with the modifications suggested above. Or we
can just introduce the clock_adjtime method, instead.
 
> So if the clock_adjtime interface is needed, it would seem best for it
> to be generic enough to support not only PTP, but also the NTP kernel
> PLL.

For the proposed clock_adjime, what else is needed to support clock
adjustment in general?

I don't mind making the interface generic enough to support any
(realistic) conceivable clock adjustment scheme, but beyond the
present PTP hardware clocks, I don't know what else might be needed.

Richard

^ permalink raw reply

* [PATCH] mpc8308: fix USB DR controller initialization
From: Ilya Yanok @ 2010-08-27 10:57 UTC (permalink / raw)
  To: linuxppc-dev, wd, dzu, vlad; +Cc: Ilya Yanok

MPC8308 has ULPI pin muxing settings in SICRH register, bits 17-18
which is different from both MPC8313 and MPC8315.
Also MPC8308 doesn't have REFSEL, UTMI_PHY_EN and OTG_PORT fields
in the USB DR controller CONTROL register.

Signed-off-by: Ilya Yanok <yanok@emcraft.com>
---
 arch/powerpc/boot/dts/mpc8308rdb.dts  |    2 +-
 arch/powerpc/platforms/83xx/mpc83xx.h |    2 ++
 arch/powerpc/platforms/83xx/usb.c     |   21 ++++++++++++++++-----
 3 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/boot/dts/mpc8308rdb.dts b/arch/powerpc/boot/dts/mpc8308rdb.dts
index a97eb2d..1e2b888 100644
--- a/arch/powerpc/boot/dts/mpc8308rdb.dts
+++ b/arch/powerpc/boot/dts/mpc8308rdb.dts
@@ -109,7 +109,7 @@
 		#address-cells = <1>;
 		#size-cells = <1>;
 		device_type = "soc";
-		compatible = "fsl,mpc8315-immr", "simple-bus";
+		compatible = "fsl,mpc8308-immr", "simple-bus";
 		ranges = <0 0xe0000000 0x00100000>;
 		reg = <0xe0000000 0x00000200>;
 		bus-frequency = <0>;
diff --git a/arch/powerpc/platforms/83xx/mpc83xx.h b/arch/powerpc/platforms/83xx/mpc83xx.h
index 0fea881..82a4345 100644
--- a/arch/powerpc/platforms/83xx/mpc83xx.h
+++ b/arch/powerpc/platforms/83xx/mpc83xx.h
@@ -35,6 +35,8 @@
 
 /* system i/o configuration register high */
 #define MPC83XX_SICRH_OFFS         0x118
+#define MPC8308_SICRH_USB_MASK     0x000c0000
+#define MPC8308_SICRH_USB_ULPI     0x00040000
 #define MPC834X_SICRH_USB_UTMI     0x00020000
 #define MPC831X_SICRH_USB_MASK     0x000000e0
 #define MPC831X_SICRH_USB_ULPI     0x000000a0
diff --git a/arch/powerpc/platforms/83xx/usb.c b/arch/powerpc/platforms/83xx/usb.c
index 3ba4bb7..2c64164 100644
--- a/arch/powerpc/platforms/83xx/usb.c
+++ b/arch/powerpc/platforms/83xx/usb.c
@@ -127,7 +127,8 @@ int mpc831x_usb_cfg(void)
 
 	/* Configure clock */
 	immr_node = of_get_parent(np);
-	if (immr_node && of_device_is_compatible(immr_node, "fsl,mpc8315-immr"))
+	if (immr_node && (of_device_is_compatible(immr_node, "fsl,mpc8315-immr") ||
+			of_device_is_compatible(immr_node, "fsl,mpc8308-immr")))
 		clrsetbits_be32(immap + MPC83XX_SCCR_OFFS,
 		                MPC8315_SCCR_USB_MASK,
 		                MPC8315_SCCR_USB_DRCM_01);
@@ -138,7 +139,11 @@ int mpc831x_usb_cfg(void)
 
 	/* Configure pin mux for ULPI.  There is no pin mux for UTMI */
 	if (prop && !strcmp(prop, "ulpi")) {
-		if (of_device_is_compatible(immr_node, "fsl,mpc8315-immr")) {
+		if (of_device_is_compatible(immr_node, "fsl,mpc8308-immr")) {
+			clrsetbits_be32(immap + MPC83XX_SICRH_OFFS,
+					MPC8308_SICRH_USB_MASK,
+					MPC8308_SICRH_USB_ULPI);
+		} else if (of_device_is_compatible(immr_node, "fsl,mpc8315-immr")) {
 			clrsetbits_be32(immap + MPC83XX_SICRL_OFFS,
 					MPC8315_SICRL_USB_MASK,
 					MPC8315_SICRL_USB_ULPI);
@@ -173,6 +178,9 @@ int mpc831x_usb_cfg(void)
 		     !strcmp(prop, "utmi"))) {
 		u32 refsel;
 
+		if (of_device_is_compatible(immr_node, "fsl,mpc8308-immr"))
+			goto out;
+
 		if (of_device_is_compatible(immr_node, "fsl,mpc8315-immr"))
 			refsel = CONTROL_REFSEL_24MHZ;
 		else
@@ -186,9 +194,11 @@ int mpc831x_usb_cfg(void)
 		temp = CONTROL_PHY_CLK_SEL_ULPI;
 #ifdef CONFIG_USB_OTG
 		/* Set OTG_PORT */
-		dr_mode = of_get_property(np, "dr_mode", NULL);
-		if (dr_mode && !strcmp(dr_mode, "otg"))
-			temp |= CONTROL_OTG_PORT;
+		if (!of_device_is_compatible(immr_node, "fsl,mpc8308-immr")) {
+			dr_mode = of_get_property(np, "dr_mode", NULL);
+			if (dr_mode && !strcmp(dr_mode, "otg"))
+				temp |= CONTROL_OTG_PORT;
+		}
 #endif /* CONFIG_USB_OTG */
 		out_be32(usb_regs + FSL_USB2_CONTROL_OFFS, temp);
 	} else {
@@ -196,6 +206,7 @@ int mpc831x_usb_cfg(void)
 		ret = -EINVAL;
 	}
 
+out:
 	iounmap(usb_regs);
 	of_node_put(np);
 	return ret;
-- 
1.6.2.5

^ permalink raw reply related

* Re: [PATCH 1/5] ptp: Added a brand new class driver for ptp clocks.
From: Richard Cochran @ 2010-08-27  7:57 UTC (permalink / raw)
  To: john stultz
  Cc: Rodolfo Giometti, Arnd Bergmann, netdev, devicetree-discuss,
	linux-kernel, Christian Riesch, linuxppc-dev, linux-arm-kernel,
	Krzysztof Halasa
In-Reply-To: <1282874269.4371.74.camel@localhost.localdomain>

On Thu, Aug 26, 2010 at 06:57:49PM -0700, john stultz wrote:
> On Wed, 2010-08-25 at 11:40 +0200, Christian Riesch wrote:
> > 2) Master clock:
> > We have one or more network ports. Our system has a really good clock
> > (ovenized quartz crystal, an atomic clock, a GPS timing receiver...)
> > and it distributes this time on the network. In such a case we do not
> > steer our clock based on the (packet) timestamps we get from our
> > timestamping unit. Instead, we directly drive our clock hardware with
> > a very stable frequency that we get from the OCXO or the atomic
> > clock...
>
> Ok. Following you here...
>
> > or we use one of the ancillary features of the PTP clock that
> > Richard mentioned to timestamp not network packets but a 1pps signal
> > and use these timestamps to steer the clock.
>
> Wait.. I thought we weren't using PTP to steer the clock? But now we're
> using the pps signal from it to do so? Do I misunderstand you? Or did
> you just not mean this?

The master node in a PTP network probably takes its time from a
precise external time source, like GPS. The GPS provides a 1 PPS
directly to the PTP clock hardware, which latches the PTP hardware
clock time on the PPS edge. This provides one sample as input to a
clock servo (in the PTPd) that, in turn, regulates the PTP clock
hardware.

> > Packet time stamping is
> > used to distribute the time to the slaves, but it is not part of the
> > control loop in this case.
>
> I assume here you mean PTPd is steering the PTP clock according to the
> system time (which is NTP/GPS/whatever sourced)? And then the PTP clock
> distributes that time through the network?

Yes, but in this case, "system time" has nothing to do with the Linux
system time. For a PTP master clock, it really doesn't matter whether
the Linux time is correct, or not. It doesn't hurt either (but see
below about chaining servos).

> So first of all, thanks for the extra explanation and context here! I
> really appreciate it, as I'm not familiar with all the hardware details
> and possible use cases, but I'm trying to learn.
>
> So in the two cases you mention, the time "flow" is something like:
>
> #1) [Master Clock on Network1] => [PTP Clock] => [PTPd] =>
> 	[PTP Clock] => [PTP Clients on Network2]

I would only draw the PTP clock once, perhaps like this:

                                +------------------------------+
                                |                              ^
                                V                              |
[Master Clock on NW 1]--->[HW timestamp]--->[PTPd]--adj-->[PTP Clock]
[Slaves on NW 2,3,...]<---[HW timestamp]<---[    ]

>
> #2) [GPS] => [NTPd] => [System Time] => [PTPd] => [PTP clock] =>
> 	[PTP clients on Network]

Nope. More like this:

                                      +------------------------------+
                                      |                              ^
                                      V                              |
[GPS]----------PPS--------->[Latch  timestamp]--->[PTPd]--adj-->[PTP Clock]
[Slaves on NW 1,2,3,...]<---[HW pkt timestamp]<---[    ]

> And the original case:
> #3) [Master Clock on Network] => [PTP clock] => [PTPd] => [PTP clock]

More like:

[Master Clock on NW 1]--->[HW timestamp]--->[PTPd]--adj-->[PTP Clock]

> With a secondary control flow:
> 	[PPS signal from PTP clock] => [NTPd] => [System Time]

Yes.


> So, just brainstorming here, I guess the question I'm trying to figure
> out here, is can the "System Time" and "PTP clock" be merged/globbed
> into a single "Time" interface from the userspace point of view?

This is the core issue and source of misunderstanding, in my view. The
fact of the matter is, the current generation of computers has
multiple clocks, and these are usually unsynchronized. I think we
should not try too hard to cover up or work around this. It is a fact
of life.

It would be nice if there were only one clock, and that clock could do
everything that we need. Indeed, the next generation of SoC computers may
all have PTP build in to the main CPU. Well, one can always wish.

If we can make it appear that multiple clocks are just one clock, then
I agree that we should do it. But I would not want to sacrifice
synchronization accuracy or application features just to keep that
illusion.

> In other words, if internal to the kernel, the PTP clock was always
> synced to the system time, couldn't the flow look something like:
>
> #3') [Master clock on network] => [PTP clock] => [PTPd] =>
> 	 [System Time] => [in-kernel sync thread] => [PTP clock]
>
> So PTPd sees the offset adjustment from the PTP clock, and then feeds
> that offset correction right into (a possibly enhanced) adjtimex. The
> kernel would then immediately steer the PTP clock by the same amount to
> keep it in sync with system time (along with a periodic offset/freq
> correction step to deal with crystal drift).
>
> Similarly:
>
> #2') [GPS] => [NTPd] => [System Time] => [in-kernel sync thread] =>
> 		[PTP clock] => [PTP clients on Network]
>
> and
>
> #1') [Master Clock on Network1] => [PTP Clock] => [PTPd] =>
> 	[System Time] => [in-kernel sync thread] => [PTP Clock] =>
> 	[PTP Clients on Network2]
>
> Now, I realize PTP purists probably won't like this, because it
> effectively makes the in-kernel sync thread similar to a PTP boundary
> clock (or worse, since the control loop isn't exactly direct).

I don't like it. The experience with PTP boundary clocks already shows
that the errors in servo loops cascade. It worsens the PTP performance.

However, I think it would fine just to synch the Linux system time to
the PTP clock using the PPS interface, which is what my patch set
implements. Linux user applications would not be able to detect the
difference.

Richard

^ permalink raw reply

* [PATCH 2/2] powerpc, pseries: Re-enable dispatch trace log userspace interface
From: Paul Mackerras @ 2010-08-27  5:57 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <20100827055643.GA19033@brick.ozlabs.ibm.com>

Since the cpu accounting code uses the hypervisor dispatch trace log
now when CONFIG_VIRT_CPU_ACCOUNTING = y, the previous commit disabled
access to it via files in the /sys/kernel/debug/powerpc/dtl/ directory
in that case.  This restores those files.

To do this, we now have a hook that the cpu accounting code will call
as it processes each entry from the hypervisor dispatch trace log.
The code in dtl.c now uses that to fill up its ring buffer, rather
than having the hypervisor fill the ring buffer directly.

This also fixes dtl_file_read() to handle overflow conditions a bit
better and adds a spinlock to ensure that race conditions (multiple
processes opening or reading the file concurrently) are handled
correctly.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/lppaca.h    |    8 ++
 arch/powerpc/kernel/time.c           |    6 +-
 arch/powerpc/platforms/pseries/dtl.c |  207 +++++++++++++++++++++++++++-------
 3 files changed, 180 insertions(+), 41 deletions(-)

diff --git a/arch/powerpc/include/asm/lppaca.h b/arch/powerpc/include/asm/lppaca.h
index cfb85ec..7f5e0fe 100644
--- a/arch/powerpc/include/asm/lppaca.h
+++ b/arch/powerpc/include/asm/lppaca.h
@@ -191,6 +191,14 @@ struct dtl_entry {
 #define DISPATCH_LOG_BYTES	4096	/* bytes per cpu */
 #define N_DISPATCH_LOG		(DISPATCH_LOG_BYTES / sizeof(struct dtl_entry))
 
+/*
+ * When CONFIG_VIRT_CPU_ACCOUNTING = y, the cpu accounting code controls
+ * reading from the dispatch trace log.  If other code wants to consume
+ * DTL entries, it can set this pointer to a function that will get
+ * called once for each DTL entry that gets processed.
+ */
+extern void (*dtl_consumer)(struct dtl_entry *entry, u64 index);
+
 #endif /* CONFIG_PPC_BOOK3S */
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_LPPACA_H */
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index fca2064..bcb738b 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -183,6 +183,8 @@ DEFINE_PER_CPU(unsigned long, cputime_scaled_last_delta);
 
 cputime_t cputime_one_jiffy;
 
+void (*dtl_consumer)(struct dtl_entry *, u64);
+
 static void calc_cputime_factors(void)
 {
 	struct div_result res;
@@ -218,7 +220,7 @@ static u64 read_spurr(u64 tb)
  */
 static u64 scan_dispatch_log(u64 stop_tb)
 {
-	unsigned long i = local_paca->dtl_ridx;
+	u64 i = local_paca->dtl_ridx;
 	struct dtl_entry *dtl = local_paca->dtl_curr;
 	struct dtl_entry *dtl_end = local_paca->dispatch_log_end;
 	struct lppaca *vpa = local_paca->lppaca_ptr;
@@ -229,6 +231,8 @@ static u64 scan_dispatch_log(u64 stop_tb)
 	if (i == vpa->dtl_idx)
 		return 0;
 	while (i < vpa->dtl_idx) {
+		if (dtl_consumer)
+			dtl_consumer(dtl, i);
 		dtb = dtl->timebase;
 		tb_delta = dtl->enqueue_to_dispatch_time +
 			dtl->ready_to_enqueue_time;
diff --git a/arch/powerpc/platforms/pseries/dtl.c b/arch/powerpc/platforms/pseries/dtl.c
index 0357655..68cb2f2 100644
--- a/arch/powerpc/platforms/pseries/dtl.c
+++ b/arch/powerpc/platforms/pseries/dtl.c
@@ -23,6 +23,7 @@
 #include <linux/init.h>
 #include <linux/slab.h>
 #include <linux/debugfs.h>
+#include <linux/spinlock.h>
 #include <asm/smp.h>
 #include <asm/system.h>
 #include <asm/uaccess.h>
@@ -37,6 +38,7 @@ struct dtl {
 	int			cpu;
 	int			buf_entries;
 	u64			last_idx;
+	spinlock_t		lock;
 };
 static DEFINE_PER_CPU(struct dtl, cpu_dtl);
 
@@ -55,25 +57,97 @@ static u8 dtl_event_mask = 0x7;
 static int dtl_buf_entries = (16 * 85);
 
 
-static int dtl_enable(struct dtl *dtl)
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING
+struct dtl_ring {
+	u64	write_index;
+	struct dtl_entry *write_ptr;
+	struct dtl_entry *buf;
+	struct dtl_entry *buf_end;
+	u8	saved_dtl_mask;
+};
+
+static DEFINE_PER_CPU(struct dtl_ring, dtl_rings);
+
+static atomic_t dtl_count;
+
+/*
+ * The cpu accounting code controls the DTL ring buffer, and we get
+ * given entries as they are processed.
+ */
+static void consume_dtle(struct dtl_entry *dtle, u64 index)
 {
-	unsigned long addr;
-	int ret, hwcpu;
+	struct dtl_ring *dtlr = &__get_cpu_var(dtl_rings);
+	struct dtl_entry *wp = dtlr->write_ptr;
+	struct lppaca *vpa = local_paca->lppaca_ptr;
 
-	/* only allow one reader */
-	if (dtl->buf)
-		return -EBUSY;
+	if (!wp)
+		return;
 
-	/* we need to store the original allocation size for use during read */
-	dtl->buf_entries = dtl_buf_entries;
+	*wp = *dtle;
+	barrier();
 
-	dtl->buf = kmalloc_node(dtl->buf_entries * sizeof(struct dtl_entry),
-			GFP_KERNEL, cpu_to_node(dtl->cpu));
-	if (!dtl->buf) {
-		printk(KERN_WARNING "%s: buffer alloc failed for cpu %d\n",
-				__func__, dtl->cpu);
-		return -ENOMEM;
-	}
+	/* check for hypervisor ring buffer overflow, ignore this entry if so */
+	if (index + N_DISPATCH_LOG < vpa->dtl_idx)
+		return;
+
+	++wp;
+	if (wp == dtlr->buf_end)
+		wp = dtlr->buf;
+	dtlr->write_ptr = wp;
+
+	/* incrementing write_index makes the new entry visible */
+	smp_wmb();
+	++dtlr->write_index;
+}
+
+static int dtl_start(struct dtl *dtl)
+{
+	struct dtl_ring *dtlr = &per_cpu(dtl_rings, dtl->cpu);
+
+	dtlr->buf = dtl->buf;
+	dtlr->buf_end = dtl->buf + dtl->buf_entries;
+	dtlr->write_index = 0;
+
+	/* setting write_ptr enables logging into our buffer */
+	smp_wmb();
+	dtlr->write_ptr = dtl->buf;
+
+	/* enable event logging */
+	dtlr->saved_dtl_mask = lppaca_of(dtl->cpu).dtl_enable_mask;
+	lppaca_of(dtl->cpu).dtl_enable_mask |= dtl_event_mask;
+
+	dtl_consumer = consume_dtle;
+	atomic_inc(&dtl_count);
+	return 0;
+}
+
+static void dtl_stop(struct dtl *dtl)
+{
+	struct dtl_ring *dtlr = &per_cpu(dtl_rings, dtl->cpu);
+
+	dtlr->write_ptr = NULL;
+	smp_wmb();
+
+	dtlr->buf = NULL;
+
+	/* restore dtl_enable_mask */
+	lppaca_of(dtl->cpu).dtl_enable_mask = dtlr->saved_dtl_mask;
+
+	if (atomic_dec_and_test(&dtl_count))
+		dtl_consumer = NULL;
+}
+
+static u64 dtl_current_index(struct dtl *dtl)
+{
+	return per_cpu(dtl_rings, dtl->cpu).write_index;
+}
+
+#else /* CONFIG_VIRT_CPU_ACCOUNTING */
+
+static int dtl_start(struct dtl *dtl)
+{
+	unsigned long addr;
+	int ret, hwcpu;
 
 	/* Register our dtl buffer with the hypervisor. The HV expects the
 	 * buffer size to be passed in the second word of the buffer */
@@ -85,12 +159,11 @@ static int dtl_enable(struct dtl *dtl)
 	if (ret) {
 		printk(KERN_WARNING "%s: DTL registration for cpu %d (hw %d) "
 		       "failed with %d\n", __func__, dtl->cpu, hwcpu, ret);
-		kfree(dtl->buf);
 		return -EIO;
 	}
 
 	/* set our initial buffer indices */
-	dtl->last_idx = lppaca_of(dtl->cpu).dtl_idx = 0;
+	lppaca_of(dtl->cpu).dtl_idx = 0;
 
 	/* ensure that our updates to the lppaca fields have occurred before
 	 * we actually enable the logging */
@@ -102,17 +175,66 @@ static int dtl_enable(struct dtl *dtl)
 	return 0;
 }
 
-static void dtl_disable(struct dtl *dtl)
+static void dtl_stop(struct dtl *dtl)
 {
 	int hwcpu = get_hard_smp_processor_id(dtl->cpu);
 
 	lppaca_of(dtl->cpu).dtl_enable_mask = 0x0;
 
 	unregister_dtl(hwcpu, __pa(dtl->buf));
+}
+
+static u64 dtl_current_index(struct dtl *dtl)
+{
+	return lppaca_of(dtl->cpu).dtl_idx;
+}
+#endif /* CONFIG_VIRT_CPU_ACCOUNTING */
 
+static int dtl_enable(struct dtl *dtl)
+{
+	long int n_entries;
+	long int rc;
+	struct dtl_entry *buf = NULL;
+
+	/* only allow one reader */
+	if (dtl->buf)
+		return -EBUSY;
+
+	n_entries = dtl_buf_entries;
+	buf = kmalloc_node(n_entries * sizeof(struct dtl_entry),
+			GFP_KERNEL, cpu_to_node(dtl->cpu));
+	if (!buf) {
+		printk(KERN_WARNING "%s: buffer alloc failed for cpu %d\n",
+				__func__, dtl->cpu);
+		return -ENOMEM;
+	}
+
+	spin_lock(&dtl->lock);
+	rc = -EBUSY;
+	if (!dtl->buf) {
+		/* store the original allocation size for use during read */
+		dtl->buf_entries = n_entries;
+		dtl->buf = buf;
+		dtl->last_idx = 0;
+		rc = dtl_start(dtl);
+		if (rc)
+			dtl->buf = NULL;
+	}
+	spin_unlock(&dtl->lock);
+
+	if (rc)
+		kfree(buf);
+	return rc;
+}
+
+static void dtl_disable(struct dtl *dtl)
+{
+	spin_lock(&dtl->lock);
+	dtl_stop(dtl);
 	kfree(dtl->buf);
 	dtl->buf = NULL;
 	dtl->buf_entries = 0;
+	spin_unlock(&dtl->lock);
 }
 
 /* file interface */
@@ -140,8 +262,9 @@ static int dtl_file_release(struct inode *inode, struct file *filp)
 static ssize_t dtl_file_read(struct file *filp, char __user *buf, size_t len,
 		loff_t *pos)
 {
-	int rc, cur_idx, last_idx, n_read, n_req, read_size;
+	long int rc, n_read, n_req, read_size;
 	struct dtl *dtl;
+	u64 cur_idx, last_idx, i;
 
 	if ((len % sizeof(struct dtl_entry)) != 0)
 		return -EINVAL;
@@ -154,41 +277,49 @@ static ssize_t dtl_file_read(struct file *filp, char __user *buf, size_t len,
 	/* actual number of entries read */
 	n_read = 0;
 
-	cur_idx = lppaca_of(dtl->cpu).dtl_idx;
+	spin_lock(&dtl->lock);
+
+	cur_idx = dtl_current_index(dtl);
 	last_idx = dtl->last_idx;
 
-	if (cur_idx - last_idx > dtl->buf_entries) {
-		pr_debug("%s: hv buffer overflow for cpu %d, samples lost\n",
-				__func__, dtl->cpu);
-	}
+	if (last_idx + dtl->buf_entries <= cur_idx)
+		last_idx = cur_idx - dtl->buf_entries + 1;
+
+	if (last_idx + n_req > cur_idx)
+		n_req = cur_idx - last_idx;
 
-	cur_idx  %= dtl->buf_entries;
-	last_idx %= dtl->buf_entries;
+	if (n_req > 0)
+		dtl->last_idx = last_idx + n_req;
+
+	spin_unlock(&dtl->lock);
+
+	if (n_req <= 0)
+		return 0;
+
+	i = last_idx % dtl->buf_entries;
 
 	/* read the tail of the buffer if we've wrapped */
-	if (last_idx > cur_idx) {
-		read_size = min(n_req, dtl->buf_entries - last_idx);
+	if (i + n_req > dtl->buf_entries) {
+		read_size = dtl->buf_entries - i;
 
-		rc = copy_to_user(buf, &dtl->buf[last_idx],
+		rc = copy_to_user(buf, &dtl->buf[i],
 				read_size * sizeof(struct dtl_entry));
 		if (rc)
 			return -EFAULT;
 
-		last_idx = 0;
+		i = 0;
 		n_req -= read_size;
 		n_read += read_size;
 		buf += read_size * sizeof(struct dtl_entry);
 	}
 
 	/* .. and now the head */
-	read_size = min(n_req, cur_idx - last_idx);
-	rc = copy_to_user(buf, &dtl->buf[last_idx],
-			read_size * sizeof(struct dtl_entry));
+	rc = copy_to_user(buf, &dtl->buf[i], n_req * sizeof(struct dtl_entry));
 	if (rc)
 		return -EFAULT;
 
-	n_read += read_size;
-	dtl->last_idx += n_read;
+	n_read += n_req;
+	dtl->last_idx = last_idx + n_read;
 
 	return n_read * sizeof(struct dtl_entry);
 }
@@ -220,11 +351,6 @@ static int dtl_init(void)
 	struct dentry *event_mask_file, *buf_entries_file;
 	int rc, i;
 
-#ifdef CONFIG_VIRT_CPU_ACCOUNTING
-	/* disable this for now */
-	return -ENODEV;
-#endif
-
 	if (!firmware_has_feature(FW_FEATURE_SPLPAR))
 		return -ENODEV;
 
@@ -251,6 +377,7 @@ static int dtl_init(void)
 	/* set up the per-cpu log structures */
 	for_each_possible_cpu(i) {
 		struct dtl *dtl = &per_cpu(cpu_dtl, i);
+		spin_lock_init(&dtl->lock);
 		dtl->cpu = i;
 
 		rc = dtl_setup_file(dtl);
-- 
1.7.1

^ permalink raw reply related

* [PATCH 1/2] powerpc: Account time using timebase rather than PURR
From: Paul Mackerras @ 2010-08-27  5:56 UTC (permalink / raw)
  To: linuxppc-dev

Currently, when CONFIG_VIRT_CPU_ACCOUNTING is enabled, we use the
PURR register for measuring the user and system time used by
processes, as well as other related times such as hardirq and
softirq times.  This turns out to be quite confusing for users
because it means that a program will often be measured as taking
less time when run on a multi-threaded processor (SMT2 or SMT4 mode)
than it does when run on a single-threaded processor (ST mode), even
though the program takes longer to finish.  The discrepancy is
accounted for as stolen time, which is also confusing, particularly
when there are no other partitions running.

This changes the accounting to use the timebase instead, meaning that
the reported user and system times are the actual number of real-time
seconds that the program was executing on the processor thread,
regardless of which SMT mode the processor is in.  Thus a program will
generally show greater user and system times when run on a
multi-threaded processor than on a single-threaded processor.

On pSeries systems on POWER5 or later processors, we measure the
stolen time (time when this partition wasn't running) using the
hypervisor dispatch trace log.  We check for new entries in the
log on every entry from user mode and on every transition from
kernel process context to soft or hard IRQ context (i.e. when
account_system_vtime() gets called).  So that we can correctly
distinguish time stolen from user time and time stolen from system
time, without having to check the log on every exit to user mode,
we store separate timestamps for exit to user mode and entry from
user mode.

On systems that have a SPURR (POWER6 and POWER7), we read the SPURR
in account_system_vtime() (as before), and then apportion the SPURR
ticks since the last time we read it between scaled user time and
scaled system time according to the relative proportions of user
time and system time over the same interval.  This avoids having to
read the SPURR on every kernel entry and exit.  On systems that have
PURR but not SPURR (i.e., POWER5), we do the same using the PURR
rather than the SPURR.

This disables the DTL user interface in /sys/debug/kernel/powerpc/dtl
for now since it conflicts with the use of the dispatch trace log
by the time accounting code.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
This series goes on top of my "powerpc: Dynamically allocate most
lppaca structs" patch.

 arch/powerpc/include/asm/exception-64s.h |    3 +-
 arch/powerpc/include/asm/lppaca.h        |   19 ++
 arch/powerpc/include/asm/paca.h          |   10 +-
 arch/powerpc/include/asm/ppc_asm.h       |   50 ++++--
 arch/powerpc/include/asm/time.h          |    5 -
 arch/powerpc/kernel/asm-offsets.c        |    8 +-
 arch/powerpc/kernel/entry_64.S           |   18 ++
 arch/powerpc/kernel/process.c            |    1 -
 arch/powerpc/kernel/smp.c                |    5 -
 arch/powerpc/kernel/time.c               |  268 ++++++++++++++----------------
 arch/powerpc/platforms/pseries/dtl.c     |   24 +--
 arch/powerpc/platforms/pseries/lpar.c    |   21 +++
 arch/powerpc/platforms/pseries/setup.c   |   52 ++++++
 13 files changed, 290 insertions(+), 194 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index 57c4000..7778d6f 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -137,7 +137,8 @@
 	li	r10,0;							   \
 	ld	r11,exception_marker@toc(r2);				   \
 	std	r10,RESULT(r1);		/* clear regs->result		*/ \
-	std	r11,STACK_FRAME_OVERHEAD-16(r1); /* mark the frame	*/
+	std	r11,STACK_FRAME_OVERHEAD-16(r1); /* mark the frame	*/ \
+	ACCOUNT_STOLEN_TIME
 
 /*
  * Exception vectors.
diff --git a/arch/powerpc/include/asm/lppaca.h b/arch/powerpc/include/asm/lppaca.h
index 6d02624..cfb85ec 100644
--- a/arch/powerpc/include/asm/lppaca.h
+++ b/arch/powerpc/include/asm/lppaca.h
@@ -172,6 +172,25 @@ struct slb_shadow {
 
 extern struct slb_shadow slb_shadow[];
 
+/*
+ * Layout of entries in the hypervisor's dispatch trace log buffer.
+ */
+struct dtl_entry {
+	u8	dispatch_reason;
+	u8	preempt_reason;
+	u16	processor_id;
+	u32	enqueue_to_dispatch_time;
+	u32	ready_to_enqueue_time;
+	u32	waiting_to_ready_time;
+	u64	timebase;
+	u64	fault_addr;
+	u64	srr0;
+	u64	srr1;
+};
+
+#define DISPATCH_LOG_BYTES	4096	/* bytes per cpu */
+#define N_DISPATCH_LOG		(DISPATCH_LOG_BYTES / sizeof(struct dtl_entry))
+
 #endif /* CONFIG_PPC_BOOK3S */
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_LPPACA_H */
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 1ff6662..6af6c16 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -85,6 +85,8 @@ struct paca_struct {
 	u8 kexec_state;		/* set when kexec down has irqs off */
 #ifdef CONFIG_PPC_STD_MMU_64
 	struct slb_shadow *slb_shadow_ptr;
+	struct dtl_entry *dispatch_log;
+	struct dtl_entry *dispatch_log_end;
 
 	/*
 	 * Now, starting in cacheline 2, the exception save areas
@@ -134,8 +136,14 @@ struct paca_struct {
 	/* Stuff for accurate time accounting */
 	u64 user_time;			/* accumulated usermode TB ticks */
 	u64 system_time;		/* accumulated system TB ticks */
-	u64 startpurr;			/* PURR/TB value snapshot */
+	u64 user_time_scaled;		/* accumulated usermode SPURR ticks */
+	u64 starttime;			/* TB value snapshot */
+	u64 starttime_user;		/* TB value on exit to usermode */
 	u64 startspurr;			/* SPURR value snapshot */
+	u64 utime_sspurr;		/* ->user_time when ->startspurr set */
+	u64 stolen_time;		/* TB ticks taken by hypervisor */
+	u64 dtl_ridx;			/* read index in dispatch log */
+	struct dtl_entry *dtl_curr;	/* pointer corresponding to dtl_ridx */
 
 #ifdef CONFIG_KVM_BOOK3S_HANDLER
 	/* We use this to store guest state in */
diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h
index 498fe09..9821006 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -9,6 +9,7 @@
 #include <asm/asm-compat.h>
 #include <asm/processor.h>
 #include <asm/ppc-opcode.h>
+#include <asm/firmware.h>
 
 #ifndef __ASSEMBLY__
 #error __FILE__ should only be used in assembler files
@@ -26,17 +27,13 @@
 #ifndef CONFIG_VIRT_CPU_ACCOUNTING
 #define ACCOUNT_CPU_USER_ENTRY(ra, rb)
 #define ACCOUNT_CPU_USER_EXIT(ra, rb)
+#define ACCOUNT_STOLEN_TIME
 #else
 #define ACCOUNT_CPU_USER_ENTRY(ra, rb)					\
 	beq	2f;			/* if from kernel mode */	\
-BEGIN_FTR_SECTION;							\
-	mfspr	ra,SPRN_PURR;		/* get processor util. reg */	\
-END_FTR_SECTION_IFSET(CPU_FTR_PURR);					\
-BEGIN_FTR_SECTION;							\
-	MFTB(ra);			/* or get TB if no PURR */	\
-END_FTR_SECTION_IFCLR(CPU_FTR_PURR);					\
-	ld	rb,PACA_STARTPURR(r13);					\
-	std	ra,PACA_STARTPURR(r13);					\
+	MFTB(ra);			/* get timebase */		\
+	ld	rb,PACA_STARTTIME_USER(r13);				\
+	std	ra,PACA_STARTTIME(r13);					\
 	subf	rb,rb,ra;		/* subtract start value */	\
 	ld	ra,PACA_USER_TIME(r13);					\
 	add	ra,ra,rb;		/* add on to user time */	\
@@ -44,19 +41,34 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);					\
 2:
 
 #define ACCOUNT_CPU_USER_EXIT(ra, rb)					\
-BEGIN_FTR_SECTION;							\
-	mfspr	ra,SPRN_PURR;		/* get processor util. reg */	\
-END_FTR_SECTION_IFSET(CPU_FTR_PURR);					\
-BEGIN_FTR_SECTION;							\
-	MFTB(ra);			/* or get TB if no PURR */	\
-END_FTR_SECTION_IFCLR(CPU_FTR_PURR);					\
-	ld	rb,PACA_STARTPURR(r13);					\
-	std	ra,PACA_STARTPURR(r13);					\
+	MFTB(ra);			/* get timebase */		\
+	ld	rb,PACA_STARTTIME(r13);					\
+	std	ra,PACA_STARTTIME_USER(r13);				\
 	subf	rb,rb,ra;		/* subtract start value */	\
 	ld	ra,PACA_SYSTEM_TIME(r13);				\
-	add	ra,ra,rb;		/* add on to user time */	\
-	std	ra,PACA_SYSTEM_TIME(r13);
-#endif
+	add	ra,ra,rb;		/* add on to system time */	\
+	std	ra,PACA_SYSTEM_TIME(r13)
+
+#ifdef CONFIG_PPC_SPLPAR
+#define ACCOUNT_STOLEN_TIME						\
+BEGIN_FW_FTR_SECTION;							\
+	beq	33f;							\
+	/* from user - see if there are any DTL entries to process */	\
+	ld	r10,PACALPPACAPTR(r13);	/* get ptr to VPA */		\
+	ld	r11,PACA_DTL_RIDX(r13);	/* get log read index */	\
+	ld	r10,LPPACA_DTLIDX(r10);	/* get log write index */	\
+	cmpd	cr1,r11,r10;						\
+	beq+	cr1,33f;						\
+	bl	.accumulate_stolen_time;				\
+33:									\
+END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
+
+#else  /* CONFIG_PPC_SPLPAR */
+#define ACCOUNT_STOLEN_TIME
+
+#endif /* CONFIG_PPC_SPLPAR */
+
+#endif /* CONFIG_VIRT_CPU_ACCOUNTING */
 
 /*
  * Macros for storing registers into and loading registers from
diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index dc779df..fe6f7c2 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -34,7 +34,6 @@ extern void to_tm(int tim, struct rtc_time * tm);
 extern void GregorianDay(struct rtc_time *tm);
 
 extern void generic_calibrate_decr(void);
-extern void snapshot_timebase(void);
 
 extern void set_dec_cpu6(unsigned int val);
 
@@ -212,12 +211,8 @@ struct cpu_usage {
 DECLARE_PER_CPU(struct cpu_usage, cpu_usage_array);
 
 #if defined(CONFIG_VIRT_CPU_ACCOUNTING)
-extern void calculate_steal_time(void);
-extern void snapshot_timebases(void);
 #define account_process_vtime(tsk)		account_process_tick(tsk, 0)
 #else
-#define calculate_steal_time()			do { } while (0)
-#define snapshot_timebases()			do { } while (0)
 #define account_process_vtime(tsk)		do { } while (0)
 #endif
 
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 1c0607d..c634940 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -181,17 +181,19 @@ int main(void)
 	       offsetof(struct slb_shadow, save_area[SLB_NUM_BOLTED - 1].vsid));
 	DEFINE(SLBSHADOW_STACKESID,
 	       offsetof(struct slb_shadow, save_area[SLB_NUM_BOLTED - 1].esid));
+	DEFINE(SLBSHADOW_SAVEAREA, offsetof(struct slb_shadow, save_area));
 	DEFINE(LPPACASRR0, offsetof(struct lppaca, saved_srr0));
 	DEFINE(LPPACASRR1, offsetof(struct lppaca, saved_srr1));
 	DEFINE(LPPACAANYINT, offsetof(struct lppaca, int_dword.any_int));
 	DEFINE(LPPACADECRINT, offsetof(struct lppaca, int_dword.fields.decr_int));
-	DEFINE(SLBSHADOW_SAVEAREA, offsetof(struct slb_shadow, save_area));
+	DEFINE(LPPACA_DTLIDX, offsetof(struct lppaca, dtl_idx));
+	DEFINE(PACA_DTL_RIDX, offsetof(struct paca_struct, dtl_ridx));
 #endif /* CONFIG_PPC_STD_MMU_64 */
 	DEFINE(PACAEMERGSP, offsetof(struct paca_struct, emergency_sp));
 	DEFINE(PACAHWCPUID, offsetof(struct paca_struct, hw_cpu_id));
 	DEFINE(PACAKEXECSTATE, offsetof(struct paca_struct, kexec_state));
-	DEFINE(PACA_STARTPURR, offsetof(struct paca_struct, startpurr));
-	DEFINE(PACA_STARTSPURR, offsetof(struct paca_struct, startspurr));
+	DEFINE(PACA_STARTTIME, offsetof(struct paca_struct, starttime));
+	DEFINE(PACA_STARTTIME_USER, offsetof(struct paca_struct, starttime_user));
 	DEFINE(PACA_USER_TIME, offsetof(struct paca_struct, user_time));
 	DEFINE(PACA_SYSTEM_TIME, offsetof(struct paca_struct, system_time));
 	DEFINE(PACA_TRAP_SAVE, offsetof(struct paca_struct, trap_save));
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 2f1a6be..eb02f24 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -97,6 +97,24 @@ system_call_common:
 	addi	r9,r1,STACK_FRAME_OVERHEAD
 	ld	r11,exception_marker@toc(r2)
 	std	r11,-16(r9)		/* "regshere" marker */
+#if defined(CONFIG_VIRT_CPU_ACCOUNTING) && defined(CONFIG_PPC_SPLPAR)
+BEGIN_FW_FTR_SECTION
+	beq	33f
+	/* if from user, see if there are any DTL entries to process */
+	ld	r10,PACALPPACAPTR(r13)	/* get ptr to VPA */
+	ld	r11,PACA_DTL_RIDX(r13)	/* get log read index */
+	ld	r10,LPPACA_DTLIDX(r10)	/* get log write index */
+	cmpd	cr1,r11,r10
+	beq+	cr1,33f
+	bl	.accumulate_stolen_time
+	REST_GPR(0,r1)
+	REST_4GPRS(3,r1)
+	REST_2GPRS(7,r1)
+	addi	r9,r1,STACK_FRAME_OVERHEAD
+33:
+END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
+#endif /* CONFIG_VIRT_CPU_ACCOUNTING && CONFIG_PPC_SPLPAR */
+
 #ifdef CONFIG_TRACE_IRQFLAGS
 	bl	.trace_hardirqs_on
 	REST_GPR(0,r1)
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index feacfb7..806fa8a 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -517,7 +517,6 @@ struct task_struct *__switch_to(struct task_struct *prev,
 
 	account_system_vtime(current);
 	account_process_vtime(current);
-	calculate_steal_time();
 
 	/*
 	 * We can't take a PMU exception inside _switch() since there is a
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index a61b3dd..af31c3f 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -508,9 +508,6 @@ int __devinit start_secondary(void *unused)
 	if (smp_ops->take_timebase)
 		smp_ops->take_timebase();
 
-	if (system_state > SYSTEM_BOOTING)
-		snapshot_timebase();
-
 	secondary_cpu_time_init();
 
 	ipi_call_lock();
@@ -575,8 +572,6 @@ void __init smp_cpus_done(unsigned int max_cpus)
 
 	free_cpumask_var(old_mask);
 
-	snapshot_timebases();
-
 	dump_numa_cpu_topology();
 }
 
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 8533b3b..fca2064 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -164,8 +164,6 @@ unsigned long ppc_proc_freq;
 EXPORT_SYMBOL(ppc_proc_freq);
 unsigned long ppc_tb_freq;
 
-static DEFINE_PER_CPU(u64, last_jiffy);
-
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING
 /*
  * Factors for converting from cputime_t (timebase ticks) to
@@ -200,62 +198,151 @@ static void calc_cputime_factors(void)
 }
 
 /*
- * Read the PURR on systems that have it, otherwise the timebase.
+ * Read the SPURR on systems that have it, otherwise the PURR,
+ * or if that doesn't exist return the timebase value passed in.
  */
-static u64 read_purr(void)
+static u64 read_spurr(u64 tb)
 {
+	if (cpu_has_feature(CPU_FTR_SPURR))
+		return mfspr(SPRN_SPURR);
 	if (cpu_has_feature(CPU_FTR_PURR))
 		return mfspr(SPRN_PURR);
-	return mftb();
+	return tb;
 }
 
+#ifdef CONFIG_PPC_SPLPAR
+
 /*
- * Read the SPURR on systems that have it, otherwise the purr
+ * Scan the dispatch trace log and count up the stolen time.
+ * Should be called with interrupts disabled.
  */
-static u64 read_spurr(u64 purr)
+static u64 scan_dispatch_log(u64 stop_tb)
 {
-	/*
-	 * cpus without PURR won't have a SPURR
-	 * We already know the former when we use this, so tell gcc
-	 */
-	if (cpu_has_feature(CPU_FTR_PURR) && cpu_has_feature(CPU_FTR_SPURR))
-		return mfspr(SPRN_SPURR);
-	return purr;
+	unsigned long i = local_paca->dtl_ridx;
+	struct dtl_entry *dtl = local_paca->dtl_curr;
+	struct dtl_entry *dtl_end = local_paca->dispatch_log_end;
+	struct lppaca *vpa = local_paca->lppaca_ptr;
+	u64 tb_delta;
+	u64 stolen = 0;
+	u64 dtb;
+
+	if (i == vpa->dtl_idx)
+		return 0;
+	while (i < vpa->dtl_idx) {
+		dtb = dtl->timebase;
+		tb_delta = dtl->enqueue_to_dispatch_time +
+			dtl->ready_to_enqueue_time;
+		barrier();
+		if (i + N_DISPATCH_LOG < vpa->dtl_idx) {
+			/* buffer has overflowed */
+			i = vpa->dtl_idx - N_DISPATCH_LOG;
+			dtl = local_paca->dispatch_log + (i % N_DISPATCH_LOG);
+			continue;
+		}
+		if (dtb > stop_tb)
+			break;
+		stolen += tb_delta;
+		++i;
+		++dtl;
+		if (dtl == dtl_end)
+			dtl = local_paca->dispatch_log;
+	}
+	local_paca->dtl_ridx = i;
+	local_paca->dtl_curr = dtl;
+	return stolen;
 }
 
 /*
+ * Accumulate stolen time by scanning the dispatch trace log.
+ * Called on entry from user mode.
+ */
+void accumulate_stolen_time(void)
+{
+	u64 sst, ust;
+
+	sst = scan_dispatch_log(get_paca()->starttime_user);
+	ust = scan_dispatch_log(get_paca()->starttime);
+	get_paca()->system_time -= sst;
+	get_paca()->user_time -= ust;
+	get_paca()->stolen_time += ust + sst;
+}
+
+static inline u64 calculate_stolen_time(u64 stop_tb)
+{
+	u64 stolen = 0;
+
+	if (get_paca()->dtl_ridx != get_paca()->lppaca_ptr->dtl_idx) {
+		stolen = scan_dispatch_log(stop_tb);
+		get_paca()->system_time -= stolen;
+	}
+
+	stolen += get_paca()->stolen_time;
+	get_paca()->stolen_time = 0;
+	return stolen;
+}
+
+#else /* CONFIG_PPC_SPLPAR */
+static inline u64 calculate_stolen_time(u64 stop_tb)
+{
+	return 0;
+}
+
+#endif /* CONFIG_PPC_SPLPAR */
+
+/*
  * Account time for a transition between system, hard irq
  * or soft irq state.
  */
 void account_system_vtime(struct task_struct *tsk)
 {
-	u64 now, nowscaled, delta, deltascaled, sys_time;
+	u64 now, nowscaled, delta, deltascaled;
 	unsigned long flags;
+	u64 stolen, udelta, sys_scaled, user_scaled;
 
 	local_irq_save(flags);
-	now = read_purr();
+	now = mftb();
 	nowscaled = read_spurr(now);
-	delta = now - get_paca()->startpurr;
+	get_paca()->system_time += now - get_paca()->starttime;
+	get_paca()->starttime = now;
 	deltascaled = nowscaled - get_paca()->startspurr;
-	get_paca()->startpurr = now;
 	get_paca()->startspurr = nowscaled;
-	if (!in_interrupt()) {
-		/* deltascaled includes both user and system time.
-		 * Hence scale it based on the purr ratio to estimate
-		 * the system time */
-		sys_time = get_paca()->system_time;
-		if (get_paca()->user_time)
-			deltascaled = deltascaled * sys_time /
-			     (sys_time + get_paca()->user_time);
-		delta += sys_time;
-		get_paca()->system_time = 0;
+
+	stolen = calculate_stolen_time(now);
+
+	delta = get_paca()->system_time;
+	get_paca()->system_time = 0;
+	udelta = get_paca()->user_time - get_paca()->utime_sspurr;
+	get_paca()->utime_sspurr = get_paca()->user_time;
+
+	/*
+	 * Because we don't read the SPURR on every kernel entry/exit,
+	 * deltascaled includes both user and system SPURR ticks.
+	 * Apportion these ticks to system SPURR ticks and user
+	 * SPURR ticks in the same ratio as the system time (delta)
+	 * and user time (udelta) values obtained from the timebase
+	 * over the same interval.  The system ticks get accounted here;
+	 * the user ticks get saved up in paca->user_time_scaled to be
+	 * used by account_process_tick.
+	 */
+	sys_scaled = delta;
+	user_scaled = udelta;
+	if (deltascaled != delta + udelta) {
+		if (udelta) {
+			sys_scaled = deltascaled * delta / (delta + udelta);
+			user_scaled = deltascaled - sys_scaled;
+		} else {
+			sys_scaled = deltascaled;
+		}
+	}
+	get_paca()->user_time_scaled += user_scaled;
+
+	if (in_irq() || idle_task(smp_processor_id()) != tsk) {
+		account_system_time(tsk, 0, delta, sys_scaled);
+		if (stolen)
+			account_steal_time(stolen);
+	} else {
+		account_idle_time(delta + stolen);
 	}
-	if (in_irq() || idle_task(smp_processor_id()) != tsk)
-		account_system_time(tsk, 0, delta, deltascaled);
-	else
-		account_idle_time(delta);
-	__get_cpu_var(cputime_last_delta) = delta;
-	__get_cpu_var(cputime_scaled_last_delta) = deltascaled;
 	local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(account_system_vtime);
@@ -265,125 +352,26 @@ EXPORT_SYMBOL_GPL(account_system_vtime);
  * by the exception entry and exit code to the generic process
  * user and system time records.
  * Must be called with interrupts disabled.
+ * Assumes that account_system_vtime() has been called recently
+ * (i.e. since the last entry from usermode) so that
+ * get_paca()->user_time_scaled is up to date.
  */
 void account_process_tick(struct task_struct *tsk, int user_tick)
 {
 	cputime_t utime, utimescaled;
 
 	utime = get_paca()->user_time;
+	utimescaled = get_paca()->user_time_scaled;
 	get_paca()->user_time = 0;
-	utimescaled = cputime_to_scaled(utime);
+	get_paca()->user_time_scaled = 0;
+	get_paca()->utime_sspurr = 0;
 	account_user_time(tsk, utime, utimescaled);
 }
 
-/*
- * Stuff for accounting stolen time.
- */
-struct cpu_purr_data {
-	int	initialized;			/* thread is running */
-	u64	tb;			/* last TB value read */
-	u64	purr;			/* last PURR value read */
-	u64	spurr;			/* last SPURR value read */
-};
-
-/*
- * Each entry in the cpu_purr_data array is manipulated only by its
- * "owner" cpu -- usually in the timer interrupt but also occasionally
- * in process context for cpu online.  As long as cpus do not touch
- * each others' cpu_purr_data, disabling local interrupts is
- * sufficient to serialize accesses.
- */
-static DEFINE_PER_CPU(struct cpu_purr_data, cpu_purr_data);
-
-static void snapshot_tb_and_purr(void *data)
-{
-	unsigned long flags;
-	struct cpu_purr_data *p = &__get_cpu_var(cpu_purr_data);
-
-	local_irq_save(flags);
-	p->tb = get_tb_or_rtc();
-	p->purr = mfspr(SPRN_PURR);
-	wmb();
-	p->initialized = 1;
-	local_irq_restore(flags);
-}
-
-/*
- * Called during boot when all cpus have come up.
- */
-void snapshot_timebases(void)
-{
-	if (!cpu_has_feature(CPU_FTR_PURR))
-		return;
-	on_each_cpu(snapshot_tb_and_purr, NULL, 1);
-}
-
-/*
- * Must be called with interrupts disabled.
- */
-void calculate_steal_time(void)
-{
-	u64 tb, purr;
-	s64 stolen;
-	struct cpu_purr_data *pme;
-
-	pme = &__get_cpu_var(cpu_purr_data);
-	if (!pme->initialized)
-		return;		/* !CPU_FTR_PURR or early in early boot */
-	tb = mftb();
-	purr = mfspr(SPRN_PURR);
-	stolen = (tb - pme->tb) - (purr - pme->purr);
-	if (stolen > 0) {
-		if (idle_task(smp_processor_id()) != current)
-			account_steal_time(stolen);
-		else
-			account_idle_time(stolen);
-	}
-	pme->tb = tb;
-	pme->purr = purr;
-}
-
-#ifdef CONFIG_PPC_SPLPAR
-/*
- * Must be called before the cpu is added to the online map when
- * a cpu is being brought up at runtime.
- */
-static void snapshot_purr(void)
-{
-	struct cpu_purr_data *pme;
-	unsigned long flags;
-
-	if (!cpu_has_feature(CPU_FTR_PURR))
-		return;
-	local_irq_save(flags);
-	pme = &__get_cpu_var(cpu_purr_data);
-	pme->tb = mftb();
-	pme->purr = mfspr(SPRN_PURR);
-	pme->initialized = 1;
-	local_irq_restore(flags);
-}
-
-#endif /* CONFIG_PPC_SPLPAR */
-
 #else /* ! CONFIG_VIRT_CPU_ACCOUNTING */
 #define calc_cputime_factors()
-#define calculate_steal_time()		do { } while (0)
 #endif
 
-#if !(defined(CONFIG_VIRT_CPU_ACCOUNTING) && defined(CONFIG_PPC_SPLPAR))
-#define snapshot_purr()			do { } while (0)
-#endif
-
-/*
- * Called when a cpu comes up after the system has finished booting,
- * i.e. as a result of a hotplug cpu action.
- */
-void snapshot_timebase(void)
-{
-	__get_cpu_var(last_jiffy) = get_tb_or_rtc();
-	snapshot_purr();
-}
-
 void __delay(unsigned long loops)
 {
 	unsigned long start;
@@ -585,8 +573,6 @@ void timer_interrupt(struct pt_regs * regs)
 	old_regs = set_irq_regs(regs);
 	irq_enter();
 
-	calculate_steal_time();
-
 	if (test_perf_event_pending()) {
 		clear_perf_event_pending();
 		perf_event_do_pending();
diff --git a/arch/powerpc/platforms/pseries/dtl.c b/arch/powerpc/platforms/pseries/dtl.c
index adfd544..0357655 100644
--- a/arch/powerpc/platforms/pseries/dtl.c
+++ b/arch/powerpc/platforms/pseries/dtl.c
@@ -27,27 +27,10 @@
 #include <asm/system.h>
 #include <asm/uaccess.h>
 #include <asm/firmware.h>
+#include <asm/lppaca.h>
 
 #include "plpar_wrappers.h"
 
-/*
- * Layout of entries in the hypervisor's DTL buffer. Although we don't
- * actually access the internals of an entry (we only need to know the size),
- * we might as well define it here for reference.
- */
-struct dtl_entry {
-	u8	dispatch_reason;
-	u8	preempt_reason;
-	u16	processor_id;
-	u32	enqueue_to_dispatch_time;
-	u32	ready_to_enqueue_time;
-	u32	waiting_to_ready_time;
-	u64	timebase;
-	u64	fault_addr;
-	u64	srr0;
-	u64	srr1;
-};
-
 struct dtl {
 	struct dtl_entry	*buf;
 	struct dentry		*file;
@@ -237,6 +220,11 @@ static int dtl_init(void)
 	struct dentry *event_mask_file, *buf_entries_file;
 	int rc, i;
 
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING
+	/* disable this for now */
+	return -ENODEV;
+#endif
+
 	if (!firmware_has_feature(FW_FEATURE_SPLPAR))
 		return -ENODEV;
 
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index a17fe4a..f129040 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -248,6 +248,8 @@ void vpa_init(int cpu)
 	int hwcpu = get_hard_smp_processor_id(cpu);
 	unsigned long addr;
 	long ret;
+	struct paca_struct *pp;
+	struct dtl_entry *dtl;
 
 	if (cpu_has_feature(CPU_FTR_ALTIVEC))
 		lppaca_of(cpu).vmxregs_in_use = 1;
@@ -274,6 +276,25 @@ void vpa_init(int cpu)
 			       "registration for cpu %d (hw %d) of area %lx "
 			       "returns %ld\n", cpu, hwcpu, addr, ret);
 	}
+
+	/*
+	 * Register dispatch trace log, if one has been allocated.
+	 */
+	pp = &paca[cpu];
+	dtl = pp->dispatch_log;
+	if (dtl) {
+		pp->dtl_ridx = 0;
+		pp->dtl_curr = dtl;
+		lppaca_of(cpu).dtl_idx = 0;
+
+		/* hypervisor reads buffer length from this field */
+		dtl->enqueue_to_dispatch_time = DISPATCH_LOG_BYTES;
+		ret = register_dtl(hwcpu, __pa(dtl));
+		if (ret)
+			pr_warn("DTL registration failed for cpu %d (%ld)\n",
+				cpu, ret);
+		lppaca_of(cpu).dtl_enable_mask = 2;
+	}
 }
 
 static long pSeries_lpar_hpte_insert(unsigned long hpte_group,
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index a6d19e3..d345bfd 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -273,6 +273,58 @@ static struct notifier_block pci_dn_reconfig_nb = {
 	.notifier_call = pci_dn_reconfig_notifier,
 };
 
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING
+/*
+ * Allocate space for the dispatch trace log for all possible cpus
+ * and register the buffers with the hypervisor.  This is used for
+ * computing time stolen by the hypervisor.
+ */
+static int alloc_dispatch_logs(void)
+{
+	int cpu, ret;
+	struct paca_struct *pp;
+	struct dtl_entry *dtl;
+
+	if (!firmware_has_feature(FW_FEATURE_SPLPAR))
+		return 0;
+
+	for_each_possible_cpu(cpu) {
+		pp = &paca[cpu];
+		dtl = kmalloc_node(DISPATCH_LOG_BYTES, GFP_KERNEL,
+				   cpu_to_node(cpu));
+		if (!dtl) {
+			pr_warn("Failed to allocate dispatch trace log for cpu %d\n",
+				cpu);
+			pr_warn("Stolen time statistics will be unreliable\n");
+			break;
+		}
+
+		pp->dtl_ridx = 0;
+		pp->dispatch_log = dtl;
+		pp->dispatch_log_end = dtl + N_DISPATCH_LOG;
+		pp->dtl_curr = dtl;
+	}
+
+	/* Register the DTL for the current (boot) cpu */
+	dtl = get_paca()->dispatch_log;
+	get_paca()->dtl_ridx = 0;
+	get_paca()->dtl_curr = dtl;
+	get_paca()->lppaca_ptr->dtl_idx = 0;
+
+	/* hypervisor reads buffer length from this field */
+	dtl->enqueue_to_dispatch_time = DISPATCH_LOG_BYTES;
+	ret = register_dtl(hard_smp_processor_id(), __pa(dtl));
+	if (ret)
+		pr_warn("DTL registration failed for boot cpu %d (%d)\n",
+			smp_processor_id(), ret);
+	get_paca()->lppaca_ptr->dtl_enable_mask = 2;
+
+	return 0;
+}
+
+early_initcall(alloc_dispatch_logs);
+#endif /* CONFIG_VIRT_CPU_ACCOUNTING */
+
 static void __init pSeries_setup_arch(void)
 {
 	/* Discover PIC type and setup ppc_md accordingly */
-- 
1.7.1

^ permalink raw reply related

* Re: Power machines fail to boot after build being successful
From: Stephen Rothwell @ 2010-08-27  2:01 UTC (permalink / raw)
  To: Michael Neuling; +Cc: linuxppc-dev, linux-next, LKML, divya
In-Reply-To: <2311.1282871746@neuling.org>

[-- Attachment #1: Type: text/plain, Size: 725 bytes --]

Hi Mikey,

On Fri, 27 Aug 2010 11:15:46 +1000 Michael Neuling <mikey@neuling.org> wrote:
>
> > After successfully building the kernel version
> > 2.6.36-rc2-git4(commitid d4348c678977c) with the config file
> > attached(used make oldconfig), P5 and P6 power machines fails to
> > reboot with the following logs
> > 
> > Logs collected while rebooting into today next, same occurs with the
> > upstream kernel too.
> 
> This will fix your problem:
> http://patchwork.ozlabs.org/patch/62757/ 

I have seen something similar in my linux-next boot tests, so I will add
this patch to linux-next for today.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

[-- Attachment #2: Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply

* Re: [PATCH 1/5] ptp: Added a brand new class driver for ptp clocks.
From: john stultz @ 2010-08-27  1:57 UTC (permalink / raw)
  To: Christian Riesch
  Cc: Rodolfo Giometti, Arnd Bergmann, netdev, devicetree-discuss,
	linux-kernel, linuxppc-dev, Richard Cochran, linux-arm-kernel,
	Krzysztof Halasa
In-Reply-To: <AANLkTi=nKrRHH6+9mvny718gov8Q+u7DHFikdgNY8YdX@mail.gmail.com>

On Wed, 2010-08-25 at 11:40 +0200, Christian Riesch wrote:
> What you describe here is only one of the use cases. If the hardware
> has a single network port and operates as a PTP slave, it timestamps
> the PTP packets that are sent and received and subsequently uses these
> timestamps and the information it received from the master in the
> packets to steer its clock to align it with the master clock. In such
> a case the timestamping hardware and the clock hardware work together
> closely and it seems to be okay to use the same interface to control
> both the timestamping and the PTP clock.
> 
> But we have to consider other use cases, e.g.,
> 
> 1) Boundary clocks:
> We have more than one network port. One port operates as a slave
> clock, our system gets time information via this port and steers its
> PTP clock to align with the master clock. The other network ports of
> our system operate as master clocks and redistribute the time
> information we got from the master to other clocks on these networks.
> In such a case we do timestamping on each of the network ports, but we
> only have a single PTP clock. Each network port's timestamping
> hardware uses the same hardware clock to generate time stamps.
> 
> 2) Master clock:
> We have one or more network ports. Our system has a really good clock
> (ovenized quartz crystal, an atomic clock, a GPS timing receiver...)
> and it distributes this time on the network. In such a case we do not
> steer our clock based on the (packet) timestamps we get from our
> timestamping unit. Instead, we directly drive our clock hardware with
> a very stable frequency that we get from the OCXO or the atomic
> clock... 

Ok. Following you here...

> or we use one of the ancillary features of the PTP clock that
> Richard mentioned to timestamp not network packets but a 1pps signal
> and use these timestamps to steer the clock. 

Wait.. I thought we weren't using PTP to steer the clock? But now we're
using the pps signal from it to do so? Do I misunderstand you? Or did
you just not mean this?

> Packet time stamping is
> used to distribute the time to the slaves, but it is not part of the
> control loop in this case.

I assume here you mean PTPd is steering the PTP clock according to the
system time (which is NTP/GPS/whatever sourced)? And then the PTP clock
distributes that time through the network?

> So in the first case we have one PTP clock but several network packet
> timestamping units, whereas in the second case the packet timestamping
> is done but it is not part of the control loop that steers the clock.
> Of course in most hardware implementations both the PTP clock and the
> timestamping unit sit on the same chip and often use the same means of
> communication to the cpu, e.g., the MDIO bus, but I think we need some
> logical separation here.


So first of all, thanks for the extra explanation and context here! I
really appreciate it, as I'm not familiar with all the hardware details
and possible use cases, but I'm trying to learn.

So in the two cases you mention, the time "flow" is something like:

#1) [Master Clock on Network1] => [PTP Clock] => [PTPd] =>
	[PTP Clock] => [PTP Clients on Network2]

#2) [GPS] => [NTPd] => [System Time] => [PTPd] => [PTP clock] =>
	[PTP clients on Network]

And the original case:
#3) [Master Clock on Network] => [PTP clock] => [PTPd] => [PTP clock]

With a secondary control flow:
	[PPS signal from PTP clock] => [NTPd] => [System Time]


Right?


So, just brainstorming here, I guess the question I'm trying to figure
out here, is can the "System Time" and "PTP clock" be merged/globbed
into a single "Time" interface from the userspace point of view?

In other words, if internal to the kernel, the PTP clock was always
synced to the system time, couldn't the flow look something like:

#3') [Master clock on network] => [PTP clock] => [PTPd] =>
	 [System Time] => [in-kernel sync thread] => [PTP clock]

So PTPd sees the offset adjustment from the PTP clock, and then feeds
that offset correction right into (a possibly enhanced) adjtimex. The
kernel would then immediately steer the PTP clock by the same amount to
keep it in sync with system time (along with a periodic offset/freq
correction step to deal with crystal drift).

Similarly:

#2') [GPS] => [NTPd] => [System Time] => [in-kernel sync thread] => 
		[PTP clock] => [PTP clients on Network]

and 

#1') [Master Clock on Network1] => [PTP Clock] => [PTPd] =>
	[System Time] => [in-kernel sync thread] => [PTP Clock] => 
	[PTP Clients on Network2]

Now, I realize PTP purists probably won't like this, because it
effectively makes the in-kernel sync thread similar to a PTP boundary
clock (or worse, since the control loop isn't exactly direct).

But considering that the kernel (internally) allows for *very*
fine-grained adjustments (we keep our long-term offset error in
(nanoseconds << 32)  ie: ~quarter-*billion*ths of a nanosecond - I think
that's sub-attosecond, if I recall the unit). And even the existing
external adjtimex interface allows for adjustments of 1ppm<<16 which is
a granularity of ~15 parts-per-trillion (assuming i'm doing the math
right).

These are all much greater then the parts-per-billion adjustment
granularity proposed for the direct PTP clock steering, so I suspect any
error caused by the indirection in the control flow could be minimized
significantly.

Additionally my suggestion here has the benefit of:
A: Avoiding the fragmented time domains (ie CLOCK_REALTIME vs CLOCK_PTP)
caused by adding a new clock_id.

B: Avoiding the indirect system time sync through the PPS interface,
which isn't completely terrible, but just feels a little ugly
configuration wise from a users-perspective.

I'm sure I still have lots to learn about PTP, so please let me know
where I'm off-base.

thanks
-john

^ permalink raw reply

* Re: Power machines fail to boot after build being successful
From: Michael Neuling @ 2010-08-27  1:15 UTC (permalink / raw)
  To: divya; +Cc: LKML, linuxppc-dev
In-Reply-To: <4C764F52.5050707@linux.vnet.ibm.com>

> After successfully building the kernel version
> 2.6.36-rc2-git4(commitid d4348c678977c) with the config file
> attached(used make oldconfig), P5 and P6 power machines fails to
> reboot with the following logs
> 
> Logs collected while rebooting into today next, same occurs with the
> upstream kernel too.

This will fix your problem:
http://patchwork.ozlabs.org/patch/62757/ 

Please CC linuxppc-dev@ozlabs.org for pwoerpc bug reports.

Mikey


> Please wait, loading kernel...
> Allocated 01a00000 bytes for kernel @ 03500000
>     Elf64 kernel loaded...
> Loading ramdisk...
> ramdisk loaded 00841ae7 @ 04f00000
> OF stdout device is: /vdevice/vty@30000000
> Preparing to boot Linux version 2.6.36-rc2-autotest-next-20100826 (root@llm62
) (gcc version 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #1 SMP Thu
 Aug 26 10:55:01 IST 2010
> Max number of cores passed to firmware: 512 (NR_CPUS = 1024)
> Calling ibm,client-architecture-support... done
> command line: root=/dev/sda5 IDENT=1282801372 xmon=early
> memory layout at init:
>    memory_limit : 0000000000000000 (16 MB aligned)
>    alloc_bottom : 0000000005750000
>    alloc_top    : 0000000008000000
>    alloc_top_hi : 0000000008000000
>    rmo_top      : 0000000008000000
>    ram_top      : 0000000008000000
> instantiating rtas at 0x00000000074e0000... done
> boot cpu hw idx 0
> starting cpu hw idx 2... done
> copying OF device tree...
> Building dt strings...
> Building dt structure...
> Device tree strings 0x0000000005760000 ->  0x00000000057615fa
> Device tree struct  0x0000000005770000 ->  0x0000000005790000
> Calling quiesce...
> returning from prom_init
> Using pSeries machine description
> Using 1TB segments
> Found initrd at 0xc000000004f00000:0xc000000005741ae7
> Partition configured for 4 cpus.
> CPU maps initialized for 2 threads per core
> Starting Linux PPC64 #1 SMP Thu Aug 26 10:55:01 IST 2010
> -----------------------------------------------------
> ppc64_pft_size                = 0x1a
> physicalMemorySize            = 0x100000000
> htab_hash_mask                = 0x7ffff
> -----------------------------------------------------
> Initializing cgroup subsys cpuset
> Initializing cgroup subsys cpu
> Linux version 2.6.36-rc2-autotest-next-20100826 (root@llm62) (gcc version 4.3
.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #1 SMP Thu Aug 26 10:55:01 I
ST 2010
> [boot]0012 Setup Arch
> EEH: No capable adapters found
> PPC64 nvram contains 15360 bytes
> Zone PFN ranges:
>    DMA      0x00000000 ->  0x00010000
>    Normal   empty
> Movable zone start PFN for each node
> early_node_map[1] active PFN ranges
>      1: 0x00000000 ->  0x00010000
> Could not find start_pfn for node 0
> [boot]0015 Setup Done
> PERCPU: Embedded 29 pages/cpu @c000000002100000 s1859840 r0 d40704 u2097152
> pcpu-alloc: s1859840 r0 d40704 u2097152 alloc=2*1048576
> pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3
> Built 2 zonelists in Node order, mobility grouping on.  Total pages: 65480
> Policy zone: DMA
> Kernel command line: root=/dev/sda5 IDENT=1282801372 xmon=early
> PID hash table entries: 4096 (order: -1, 32768 bytes)
> freeing bootmem node 1
> Memory: 4124352k/4194304k available (10880k kernel code, 69952k reserved, 288
0k data, 10947k bss, 2432k init)
> Hierarchical RCU implementation.
>          Verbose stalled-CPUs detection is disabled.
> NR_IRQS:512
> [boot]0020 XICS Init
> [boot]0021 XICS Done
> clocksource: timebase mult[7d0000] shift[22] registered
> Console: colour dummy device 80x25
> console [hvc0] enabled, bootconsole disabled
> console [hvc0] enabled, bootconsole disabled
> Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
> ... MAX_LOCKDEP_SUBCLASSES:  8
> ... MAX_LOCK_DEPTH:          48
> ... MAX_LOCKDEP_KEYS:        8191
> ... CLASSHASH_SIZE:          4096
> ... MAX_LOCKDEP_ENTRIES:     16384
> ... MAX_LOCKDEP_CHAINS:      32768
> ... CHAINHASH_SIZE:          16384
>   memory used by lock dependency info: 6335 kB
>   per task-struct memory footprint: 2688 bytes
> allocated 2621440 bytes of page_cgroup
> please try 'cgroup_disable=memory' option if you don't want memory cgroups
> pid_max: default: 32768 minimum: 301
> Security Framework initialized
> SELinux:  Disabled at boot.
> AppArmor: AppArmor disabled by boot time parameter
> Dentry cache hash table entries: 524288 (order: 6, 4194304 bytes)
> Inode-cache hash table entries: 262144 (order: 5, 2097152 bytes)
> Mount-cache hash table entries: 4096
> Initializing cgroup subsys ns
> Initializing cgroup subsys cpuacct
> Initializing cgroup subsys memory
> Initializing cgroup subsys devices
> Initializing cgroup subsys freezer
> cpu 0x1: Vector: 300 (Data Access) at [c0000000fedcbd10]
>      pc: 0000000000017f00
>      lr: 0000000000008290
>      sp: c0000000fedcbf90
>     msr: 8000000000001000
>     dar: c0000000fedcbfa0
>   dsisr: 42000000
>    current = 0xc0000000fedb4e70
>    paca    = 0xc000000007440280
>      pid   = 0, comm = swapper
> WARNING: exception is not recoverable, can't continue
> enter ? for help
> 1:mon>  Unable to handle kernel paging request for data at address 0xc0000000
fedcbfa0
> Faulting instruction address: 0x00017f00
> Processor 1 is stuck.
> cpu 0x2: Vector: 300 (Data Access) at [c0000000fedcfd10]
>      pc: 0000000000017f00
>      lr: 0000000000008290
>      sp: c0000000fedcff90
>     msr: 8000000000001002
>     dar: c0000000fedcffa0
>   dsisr: 42000000
>    current = 0xc0000000fedb7580
>    paca    = 0xc000000007440500
>      pid   = 0, comm = swapper
> Unable to handle kernel paging request for data at address 0xc0000000fedcffa0
> Faulting instruction address: 0x00017f00
> Processor 2 is stuck.
> cpu 0x3: Vector: 300 (Data Access) at [c0000000fee33d10]
>      pc: 0000000000017f00
>      lr: 0000000000008290
>      sp: c0000000fee33f90
>     msr: 8000000000001000
>     dar: c0000000fee33fa0
>   dsisr: 42000000
>    current = 0xc0000000fedb9c90
>    paca    = 0xc000000007440780
>      pid   = 0, comm = swapper
> WARNING: exception is not recoverable, can't continue
> 
> Thanks
> Divya
> 
> 
> --------------060402090502050601040804
> Content-Type: text/plain;
>  name="config_ppc_slub"
> Content-Transfer-Encoding: 7bit
> Content-Disposition: attachment;
>  filename="config_ppc_slub"
> 
> CONFIG_ALTIVEC=y
> # CONFIG_WIRELESS is not set
> CONFIG_VSX=y
> CONFIG_TASKSTATS=y
> CONFIG_TASK_DELAY_ACCT=y
> CONFIG_TASK_XACCT=y
> CONFIG_TASK_IO_ACCOUNTING=y
> CONFIG_HAVE_KPROBES=y
> CONFIG_HAVE_KRETPROBES=y
> CONFIG_TREE_RCU=y
> CONFIG_RCU_FANOUT=64
> CONFIG_CGROUP_SCHED=y
> CONFIG_CGROUPS=y
> CONFIG_CGROUP_NS=y
> CONFIG_CGROUP_FREEZER=y
> CONFIG_CGROUP_DEVICE=y
> CONFIG_CPUSETS=y
> CONFIG_PROC_PID_CPUSET=y
> CONFIG_CGROUP_CPUACCT=y
> CONFIG_RESOURCE_COUNTERS=y
> CONFIG_CGROUP_MEM_RES_CTLR=y
> CONFIG_CGROUP_MEM_RES_CTLR_SWAP=y
> CONFIG_NAMESPACES=y
> CONFIG_UTS_NS=y
> CONFIG_IPC_NS=y
> CONFIG_USER_NS=y
> CONFIG_PID_NS=y
> CONFIG_NET_NS=y
> CONFIG_KALLSYMS=y
> CONFIG_KALLSYMS_ALL=y
> CONFIG_KPROBES=y
> CONFIG_CMM=y
> CONFIG_PPC_HAS_HASH_64K=y
> CONFIG_PPC_64K_PAGES=y
> CONFIG_FORCE_MAX_ZONEORDER=9
> CONFIG_PPC_SUBPAGE_PROT=y
> CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
> CONFIG_ARCH_HAS_WALK_MEMORY=y
> CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE=y
> CONFIG_KEXEC=y
> CONFIG_MEMORY_HOTPLUG=y
> CONFIG_MEMORY_HOTPLUG_SPARSE=y
> CONFIG_MEMORY_HOTREMOVE=y
> CONFIG_IBMEBUS=y
> CONFIG_EHEA=y
> CONFIG_SCSI_IBMVSCSI=y
> CONFIG_SCSI_IBMVFC=m
> CONFIG_SCSI_IPR=y
> CONFIG_DEBUG_KERNEL=y
> CONFIG_DETECT_SOFTLOCKUP=y
> CONFIG_DETECT_HUNG_TASK=y
> CONFIG_DEBUG_SPINLOCK=y
> CONFIG_DEBUG_SPINLOCK_SLEEP=y
> CONFIG_DEBUG_BUGVERBOSE=y
> CONFIG_DEBUG_INFO=y
> CONFIG_KPROBES_SANITY_TEST=y
> CONFIG_LATENCYTOP=y
> CONFIG_SYSCTL_SYSCALL_CHECK=y
> CONFIG_DEBUG_PAGEALLOC=y
> CONFIG_NOP_TRACER=y
> CONFIG_HAVE_FUNCTION_TRACER=y
> CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
> CONFIG_HAVE_DYNAMIC_FTRACE=y
> CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
> CONFIG_RING_BUFFER=y
> CONFIG_TRACING=y
> CONFIG_TRACING_SUPPORT=y
> ONFIG_BLK_DEV_IO_TRACE=y
> CONFIG_FTRACE_SELFTEST=y
> CONFIG_FTRACE_STARTUP_TEST=y
> CONFIG_KEYS=y
> CONFIG_KEYS_DEBUG_PROC_KEYS=y
> CONFIG_SECURITY=y
> CONFIG_SECURITYFS=y
> CONFIG_SECURITY_NETWORK=y
> CONFIG_RCU_CPU_STALL_DETECTOR=y
> CONFIG_EXT2_FS=y
> CONFIG_EXT2_FS_XATTR=y
> CONFIG_EXT2_FS_POSIX_ACL=y
> CONFIG_EXT2_FS_SECURITY=y
> CONFIG_EXT3_FS=y
> CONFIG_EXT3_FS_XATTR=y
> CONFIG_EXT3_FS_POSIX_ACL=y
> CONFIG_EXT3_FS_SECURITY=y
> CONFIG_EXT4_FS=y
> CONFIG_EXT4DEV_COMPAT=y
> CONFIG_EXT4_FS_XATTR=y
> CONFIG_EXT4_FS_POSIX_ACL=y
> CONFIG_EXT4_FS_SECURITY=y
> CONFIG_JBD=y
> CONFIG_JBD2=y
> CONFIG_REISERFS_PROC_INFO=y
> CONFIG_REISERFS_FS_XATTR=y
> CONFIG_REISERFS_FS_POSIX_ACL=y
> CONFIG_REISERFS_FS_SECURITY=y
> CONFIG_JFS_FS=m
> CONFIG_JFS_POSIX_ACL=y
> CONFIG_JFS_SECURITY=y
> CONFIG_JFS_STATISTICS=y
> CONFIG_FS_POSIX_ACL=y
> CONFIG_FILE_LOCKING=y
> CONFIG_XFS_FS=m
> CONFIG_XFS_QUOTA=y
> CONFIG_XFS_POSIX_ACL=y
> CONFIG_XFS_RT=y
> CONFIG_GFS2_FS=m
> CONFIG_GFS2_FS_LOCKING_DLM=m
> CONFIG_OCFS2_FS=m
> CONFIG_OCFS2_FS_O2CB=m
> CONFIG_OCFS2_FS_USERSPACE_CLUSTER=m
> CONFIG_OCFS2_FS_STATS=y
> CONFIG_BTRFS_FS=y
> CONFIG_BTRFS_FS_POSIX_ACL=y
> CONFIG_LOCALVERSION_AUTO=n
> CONFIG_LOCALVERSION=""
> CONFIG_SLUB_DEBUG=y
> CONFIG_SLUB=y
> CONFIG_DEVPTS_MULTIPLE_INSTANCES=y
> CONFIG_HAVE_PERF_COUNTERS=y
> CONFIG_PERF_COUNTERS=y
> CONFIG_DEFAULT_MMAP_MIN_ADDR=65536
> CONFIG_FSNOTIFY=y
> CONFIG_RCU_TORTURE_TEST=y
> CONFIG_RCU_TORTURE_TEST_RUNNABLE=y
> CONFIG_RCU_CPU_STALL_DETECTOR=y
> CONFIG_PERF_EVENTS=y
> CONFIG_PPC_OF_BOOT_TRAMPOLINE=y
> CONFIG_BLOCK=y
> CONFIG_BLK_DEV_BSG=y
> CONFIG_BLK_DEV_INTEGRITY=y
> CONFIG_BLK_CGROUP=y
> CONFIG_DEBUG_BLK_CGROUP=y
> CONFIG_BLOCK_COMPAT=y
> CONFIG_IOSCHED_NOOP=y
> CONFIG_IOSCHED_CFQ=y
> CONFIG_CFQ_GROUP_IOSCHED=y
> CONFIG_DEBUG_CFQ_IOSCHED=y
> CONFIG_DEFAULT_CFQ=y
> CONFIG_DEFAULT_IOSCHED="cfq"
> CONFIG_VIRTUALIZATION=y
> CONFIG_VIRTIO_PCI=y
> CONFIG_VIRTIO_BALLOON=y
> CONFIG_NR_IRQS=512
> CONFIG_MIGRATION=y
> CONFIG_KSM=y
> CONFIG_PM=y
> CONFIG_SUSPEND=y
> CONFIG_SUSPEND_FREEZER=y
> CONFIG_HIBERNATION=y
> CONFIG_SYSFS_DEPRECATED_V2=y
> CONFIG_LOCK_STAT=y
> CONFIG_LOCKDEP_SUPPORT=y
> CONFIG_LOCKDEP=y
> CONFIG_IXGBE_DCB=y
> CONFIG_DCB=y
> 
> 
> --------------060402090502050601040804--
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

^ permalink raw reply

* Re: Power machines fail to boot after build being successful
From: Tony Breeds @ 2010-08-27  0:46 UTC (permalink / raw)
  To: divya; +Cc: linux-next, LinuxPPC-dev, LKML
In-Reply-To: <4C764F52.5050707@linux.vnet.ibm.com>

On Thu, Aug 26, 2010 at 04:56:10PM +0530, divya wrote:
> Hi,
> 
> After successfully building the kernel version 2.6.36-rc2-git4(commitid
> d4348c678977c) with the config file attached(used make oldconfig),
> P5 and P6 power machines fails to reboot with the following logs

<snip>

> Preparing to boot Linux version 2.6.36-rc2-autotest-next-20100826 (root@llm62) (gcc version 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #1 SMP Thu Aug 26 10:55:01 IST 2010

You say you're booting -git4 but you're running kernel is Linux next from Aug.
26th  Can you confirm you see this boot failure with Linus's tree NOT linux
next?

Adding linux-ppc and linux-next lists.

Yours Tony

^ permalink raw reply

* SPI bitbanging driver
From: Ravi Gupta @ 2010-08-26  9:39 UTC (permalink / raw)
  To: linuxppc-dev, MJ embd, Ira W. Snyder, Anton Vorontsov

[-- Attachment #1: Type: text/plain, Size: 632 bytes --]

Hi,

I am new to linux device driver development. I have to develop a spi
bitbanging driver for MPC837xERDB board. So far I have found that linux
kernel already has support for bitbanging over gpio(I am using linux kernel
2.6.35). Now I am confused that how I am going to configure SPI pins as GPIO
and what API to use to write my driver. I searched over internet a lot for a
sample SPI bitbang driver but unable to do so. Can anyone provide me some
sample bitbang driver explaning how to use the spi_bitbang.c and spi_gpio.c.
Do I need to change the mpc8377_rdb.dts file to register spi pins as gpio?

Thanks in advance
Ravi Gupta

[-- Attachment #2: Type: text/html, Size: 655 bytes --]

^ permalink raw reply

* [PATCH] powerpc: don't use kernel stack with translation off
From: Michael Neuling @ 2010-08-26  7:04 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev, stable, Matt Evans

In f761622e59433130bc33ad086ce219feee9eb961 we changed
early_setup_secondary so it's called using the proper kernel stack
rather than the emergency one.

Unfortunately, this stack pointer can't be used when translation is off
on PHYP as this stack pointer might be outside the RMO.  This results in
the following on all non zero cpus:
  cpu 0x1: Vector: 300 (Data Access) at [c00000001639fd10]
      pc: 000000000001c50c
      lr: 000000000000821c
      sp: c00000001639ff90
     msr: 8000000000001000
     dar: c00000001639ffa0
   dsisr: 42000000
    current = 0xc000000016393540
    paca    = 0xc000000006e00200
      pid   = 0, comm = swapper

The original patch was only tested on bare metal system, so it never
caught this problem.

This changes __secondary_start so that we calculate the new stack
pointer but only start using it after we've called early_setup_secondary.

With this patch, the above problem goes away.

Signed-off-by: Michael Neuling <mikey@neuling.org>
---
benh: can you pick this up for 2.6.36 as we are bust currently in Linus' tree.

diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index 4d6681d..c571cd3 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -575,13 +575,19 @@ __secondary_start:
 	/* Initialize the kernel stack.  Just a repeat for iSeries.	 */
 	LOAD_REG_ADDR(r3, current_set)
 	sldi	r28,r24,3		/* get current_set[cpu#]	 */
-	ldx	r1,r3,r28
-	addi	r1,r1,THREAD_SIZE-STACK_FRAME_OVERHEAD
-	std	r1,PACAKSAVE(r13)
+	ldx	r14,r3,r28
+	addi	r14,r14,THREAD_SIZE-STACK_FRAME_OVERHEAD
+	std	r14,PACAKSAVE(r13)
 
 	/* Do early setup for that CPU (stab, slb, hash table pointer) */
 	bl	.early_setup_secondary
 
+	/*
+	 * setup the new stack pointer, but *don't* use this until
+	 * translation is on.
+	 */
+	mr	r1, r14
+
 	/* Clear backchain so we get nice backtraces */
 	li	r7,0
 	mtlr	r7

^ permalink raw reply related

* Re: [PATCH] powerpc: Wire up direct socket system calls
From: Ian Munsie @ 2010-08-26  5:56 UTC (permalink / raw)
  To: linux-kernel, Benjamin Herrenschmidt, Paul Mackerras,
	Andrew Morton, Andreas Schwab, Christoph Hellwig,
	Arjan van de Ven, Jesper Nilsson, linuxppc-dev
In-Reply-To: <1282798228-340-1-git-send-email-imunsie@au1.ibm.com>

Excerpts from Ian Munsie's message of Thu Aug 26 14:50:28 +1000 2010:
> This patch wires up the various socket system calls on PowerPC so that
> userspace can call them directly, rather than by going through the
> multiplexed socketcall system call.

I should have mentioned that the base is ppc/next

Also, I've included a simple library below suitable for use with
LD_PRELOAD to allow this to be tested on existing programs.


Cheers,
-Ian


#include <unistd.h>
#include <sys/syscall.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>

/* PPC syscall numbers from /arch/powerpc/include/asm/unistd.h */
#define __NR_socket		326
#define __NR_bind		327
#define __NR_connect		328
#define __NR_listen		329
#define __NR_accept		330
#define __NR_getsockname	331
#define __NR_getpeername	332
#define __NR_socketpair		333
#define __NR_send		334
#define __NR_sendto		335
#define __NR_recv		336
#define __NR_recvfrom		337
#define __NR_shutdown		338
#define __NR_setsockopt		339
#define __NR_getsockopt		340
#define __NR_sendmsg		341
#define __NR_recvmsg		342
#define __NR_recvmmsg		343
#define __NR_accept4		344


#define DEBUG 0

#if DEBUG
#define DEBUGsyscall(name, ...)						\
	int __ret;							\
	__ret = syscall(__NR_##name, __VA_ARGS__);			\
	fprintf(stderr, "--"#name": %i", __ret);			\
	if (__ret == -1) {						\
		fprintf(stderr, ", %s (%i)", strerror(errno), errno);	\
	}								\
	fprintf(stderr, "--\n");					\
	return __ret;
#else
#define DEBUGsyscall(name, ...)						\
	return syscall(__NR_##name, __VA_ARGS__);
#endif


int socket(int domain, int type, int protocol)
{
	DEBUGsyscall(socket, domain, type, protocol);
}

int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen)
{
	DEBUGsyscall(bind, sockfd, addr, addrlen);
}

int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen)
{
	DEBUGsyscall(connect, sockfd, addr, addrlen);
}

int listen(int sockfd, int backlog)
{
	DEBUGsyscall(listen, sockfd, backlog);
}

int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen)
{
	DEBUGsyscall(accept, sockfd, addr, addrlen);
}

int getsockname(int sockfd, struct sockaddr *addr, socklen_t *addrlen)
{
	DEBUGsyscall(getsockname, sockfd, addr, addrlen);
}

int getpeername(int sockfd, struct sockaddr *addr, socklen_t *addrlen)
{
	DEBUGsyscall(getpeername, sockfd, addr, addrlen);
}

int socketpair(int domain, int type, int protocol, int sv[2])
{
	DEBUGsyscall(socketpair, domain, type, protocol, sv);
}

int send(int sockfd, const void *buf, size_t len, int flags)
{
	DEBUGsyscall(send, sockfd, buf, len, flags);
}

int sendto(int sockfd, const void *buf, size_t len, int flags, const struct sockaddr *dest_addr, socklen_t addrlen)
{
	DEBUGsyscall(sendto, sockfd, buf, len, flags, dest_addr, addrlen);
}

int recv(int sockfd, void *buf, size_t len, int flags)
{
	DEBUGsyscall(recv, sockfd, buf, len, flags);
}

int recvfrom(int sockfd, void *buf, size_t len, int flags, struct sockaddr *src_addr, socklen_t *addrlen)
{
	DEBUGsyscall(recvfrom, sockfd, buf, len, flags, src_addr, addrlen);
}

int shutdown(int sockfd, int how)
{
	DEBUGsyscall(shutdown, sockfd, how);
}

int setsockopt(int sockfd, int level, int optname, const void *optval, socklen_t optlen)
{
	DEBUGsyscall(setsockopt, sockfd, level, optname, optval, optlen);
}

int getsockopt(int sockfd, int level, int optname, void *optval, socklen_t *optlen)
{
	DEBUGsyscall(getsockopt, sockfd, level, optname, optval, optlen);
}

int sendmsg(int sockfd, const struct msghdr *msg, int flags)
{
	DEBUGsyscall(sendmsg, sockfd, msg, flags);
}

int recvmsg(int sockfd, struct msghdr *msg, int flags)
{
	DEBUGsyscall(recvmsg, sockfd, msg, flags);
}

/* Debian squeeze libc doesn't support recvmmsg yet, not much point intercepting it */

int accept4(int sockfd, struct sockaddr *addr, socklen_t *addrlen, int flags)
{
	DEBUGsyscall(accept4, sockfd, addr, addrlen, flags);
}

^ permalink raw reply

* [PATCH] powerpc: Wire up direct socket system calls
From: Ian Munsie @ 2010-08-26  4:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jesper Nilsson, linuxppc-dev, Andreas Schwab, Ian Munsie,
	Paul Mackerras, Andrew Morton, Arjan van de Ven,
	Christoph Hellwig

From: Ian Munsie <imunsie@au1.ibm.com>

This patch wires up the various socket system calls on PowerPC so that
userspace can call them directly, rather than by going through the
multiplexed socketcall system call.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
---
 arch/powerpc/include/asm/systbl.h |   19 +++++++++++++++++++
 arch/powerpc/include/asm/unistd.h |   21 ++++++++++++++++++++-
 2 files changed, 39 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/systbl.h b/arch/powerpc/include/asm/systbl.h
index 3d21266..aa0f1eb 100644
--- a/arch/powerpc/include/asm/systbl.h
+++ b/arch/powerpc/include/asm/systbl.h
@@ -329,3 +329,22 @@ COMPAT_SYS(rt_tgsigqueueinfo)
 SYSCALL(fanotify_init)
 COMPAT_SYS(fanotify_mark)
 SYSCALL_SPU(prlimit64)
+SYSCALL_SPU(socket)
+SYSCALL_SPU(bind)
+SYSCALL_SPU(connect)
+SYSCALL_SPU(listen)
+SYSCALL_SPU(accept)
+SYSCALL_SPU(getsockname)
+SYSCALL_SPU(getpeername)
+SYSCALL_SPU(socketpair)
+SYSCALL_SPU(send)
+SYSCALL_SPU(sendto)
+COMPAT_SYS_SPU(recv)
+COMPAT_SYS_SPU(recvfrom)
+SYSCALL_SPU(shutdown)
+COMPAT_SYS_SPU(setsockopt)
+COMPAT_SYS_SPU(getsockopt)
+COMPAT_SYS_SPU(sendmsg)
+COMPAT_SYS_SPU(recvmsg)
+COMPAT_SYS_SPU(recvmmsg)
+SYSCALL_SPU(accept4)
diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h
index 597e6f9..6151937 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -348,10 +348,29 @@
 #define __NR_fanotify_init	323
 #define __NR_fanotify_mark	324
 #define __NR_prlimit64		325
+#define __NR_socket		326
+#define __NR_bind		327
+#define __NR_connect		328
+#define __NR_listen		329
+#define __NR_accept		330
+#define __NR_getsockname	331
+#define __NR_getpeername	332
+#define __NR_socketpair		333
+#define __NR_send		334
+#define __NR_sendto		335
+#define __NR_recv		336
+#define __NR_recvfrom		337
+#define __NR_shutdown		338
+#define __NR_setsockopt		339
+#define __NR_getsockopt		340
+#define __NR_sendmsg		341
+#define __NR_recvmsg		342
+#define __NR_recvmmsg		343
+#define __NR_accept4		344
 
 #ifdef __KERNEL__
 
-#define __NR_syscalls		326
+#define __NR_syscalls		345
 
 #define __NR__exit __NR_exit
 #define NR_syscalls	__NR_syscalls
-- 
1.7.1

^ permalink raw reply related

* Re: [PATCH 1/5] ptp: Added a brand new class driver for ptp clocks.
From: Christian Riesch @ 2010-08-25  9:40 UTC (permalink / raw)
  To: john stultz
  Cc: Rodolfo Giometti, Arnd Bergmann, netdev, devicetree-discuss,
	linux-kernel, linuxppc-dev, Richard Cochran, linux-arm-kernel,
	Krzysztof Halasa
In-Reply-To: <1282594125.3111.344.camel@localhost.localdomain>

On Mon, Aug 23, 2010 at 10:08 PM, john stultz <johnstul@us.ibm.com> wrote:
> On Thu, 2010-08-19 at 07:55 +0200, Richard Cochran wrote:
>> On Wed, Aug 18, 2010 at 05:12:56PM -0700, john stultz wrote:
>> > Again, my knowledge in the networking stack is pretty limited. But it
>> > would seem that having an interface that does something to the effect =
of
>> > "adjust the timestamp clock on the hardware that generated it from thi=
s
>> > packet by Xppb" would feel like the right level of abstraction. Its
>> > closely related to SO_TIMESTAMP, option right? Would something like
>> > using the setsockopt/getsockopt interface with
>> > SO_TIMESTAMP_ADJUST/OFFSET/SET/etc be reasonable?
>>
>> The clock and its adjustment have nothing to do with a network
>> socket. The current PTP hacks floating around all add private ioctls
>> to the MAC driver. That is the *wrong* way to do it.
>
> Could you clarify on *why* that is the wrong approach?
>
> Maybe this is where some of the confusion is coming from? The subtleties
> of the more generic PTP algorithm and how the existence of PTP hardware
> clocks change things are not clear to me. My understanding of ptp and
> the networking details around it is limited, so your expertise is
> appreciated. =C2=A0Might you consider covering some of this via a
> Documentation/ptp/overview.txt file in a future version of your patch?
>
> Here's a summary of what I understand:
> So from:
> http://en.wikipedia.org/wiki/Precision_Time_Protocol#Synchronization
>
> We see the message exchange of Sync/Delay_Req/Delay_Resp, and the
> calculation of the local offset from the server (and then a frequency
> adjustment over time as offsets values are accumulated).
>
> Without the hardware clock, this all of these messages and their
> corresponding timestamps are likely created by PTPd, using clock_gettime
> and then adjtimex() to correct for the calculated offset or freq
> adjustment. No extra interfaces are necessary, and PTPd is syncing the
> system time as accurately as it can. This is how the existing ptpd
> projects on the web seem to function.
>
> Now, with PTP hardware on the system, my understanding of what you're
> trying to enable with your patches is that the PTP hardware does the
> timestamping on both incoming and outgoing messages. PTPd then reads the
> pre-timestamped messages, calculates the offset and freq correction, and
> then feeds that back into the PTP hardware via your interface. No time
> correction is done at all by PTPd.

John,
What you describe here is only one of the use cases. If the hardware
has a single network port and operates as a PTP slave, it timestamps
the PTP packets that are sent and received and subsequently uses these
timestamps and the information it received from the master in the
packets to steer its clock to align it with the master clock. In such
a case the timestamping hardware and the clock hardware work together
closely and it seems to be okay to use the same interface to control
both the timestamping and the PTP clock.

But we have to consider other use cases, e.g.,

1) Boundary clocks:
We have more than one network port. One port operates as a slave
clock, our system gets time information via this port and steers its
PTP clock to align with the master clock. The other network ports of
our system operate as master clocks and redistribute the time
information we got from the master to other clocks on these networks.
In such a case we do timestamping on each of the network ports, but we
only have a single PTP clock. Each network port's timestamping
hardware uses the same hardware clock to generate time stamps.

2) Master clock:
We have one or more network ports. Our system has a really good clock
(ovenized quartz crystal, an atomic clock, a GPS timing receiver...)
and it distributes this time on the network. In such a case we do not
steer our clock based on the (packet) timestamps we get from our
timestamping unit. Instead, we directly drive our clock hardware with
a very stable frequency that we get from the OCXO or the atomic
clock... or we use one of the ancillary features of the PTP clock that
Richard mentioned to timestamp not network packets but a 1pps signal
and use these timestamps to steer the clock. Packet time stamping is
used to distribute the time to the slaves, but it is not part of the
control loop in this case.

So in the first case we have one PTP clock but several network packet
timestamping units, whereas in the second case the packet timestamping
is done but it is not part of the control loop that steers the clock.
Of course in most hardware implementations both the PTP clock and the
timestamping unit sit on the same chip and often use the same means of
communication to the cpu, e.g., the MDIO bus, but I think we need some
logical separation here.

Christian

^ permalink raw reply

* Re: [PATCH 1/2] kdump: Allow shrinking of kdump region to be overridden
From: Eric W. Biederman @ 2010-08-25  0:37 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: akpm, kexec, linux-kernel, linuxppc-dev
In-Reply-To: <20100825002258.GD28360@kryten>

Anton Blanchard <anton@samba.org> writes:

> On ppc64 the crashkernel region almost always overlaps an area of firmware.
> This works fine except when using the sysfs interface to reduce the kdump
> region. If we free the firmware area we are guaranteed to crash.

That is ppc64 bug.  firmware should not be in the reserved region.  Any
random kernel like thing can be put in to that region at any valid
address and the fact that shrinking the region frees your firmware means
that using that region could also stomp your firmware (which I assume
would be a bad thing).

So please fix the ppc64 reservation.

Eric

^ permalink raw reply

* Re: [PATCH] powerpc: Check end of stack canary at oops time
From: Anton Blanchard @ 2010-08-25  1:51 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev
In-Reply-To: <1282699778.21145.54.camel@concordia>

 
Hi,

> The check for init is just because we haven't set the magic value for
> init's stack right? But we could.

Yeah, it's similar to what x86 are doing now:


commit 0e7810be30f66e9f430c4ce2cd3b14634211690f
Author: Jan Beulich <JBeulich@novell.com>
Date:   Fri Nov 20 14:00:14 2009 +0000

    x86: Suppress stack overrun message for init_task
    
    init_task doesn't get its stack end location set to
    STACK_END_MAGIC, and hence the message is confusing
    rather than helpful in this case.


Adding it directly to init_task would be nice but I suspect we'd
either have to make assumptions about end_of_stack in our code or move the
canary into the thread_info (so we can statically allocate it via
INIT_THREAD_INFO()) or do it at runtime somewhere, hopefully early enough that
we couldn't take an oops.

Anton

^ permalink raw reply

* Re: [PATCH] powerpc: remove fpscr use from [kvm_]cvt_{fd,df}
From: Michael Neuling @ 2010-08-25  1:34 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: linuxppc-dev, Andreas Schwab, kvm-ppc, Paul Mackerras
In-Reply-To: <1282699836.22370.566.camel@pasglop>

In message <1282699836.22370.566.camel@pasglop> you wrote:
> On Tue, 2010-08-24 at 15:15 +1000, Michael Neuling wrote:
> > > > Do some 32 bit processors need this? 
> > > > 
> > > > In 32 bit before the merge, we use to have code that did:
> > > > 
> > > >   #if defined(CONFIG_4xx) || defined(CONFIG_E500)
> > > >    #define cvt_fd without save/restore fpscr
> > > >   #else
> > > >    #define cvt_fd with save/restore fpscr
> > > >   #end if
> > > > 
> > > > Kumar; does this ring any bells?
> > > 
> > > I don't see anything in the various 440 docs I have at hand that would
> > > hint at lfd/stfs adffecting FPSCR.
> > 
> > The way the ifdefs are, it's the other way around.  4xx procs don't need
> > to save/restore fpscr and others do.
> 
> Right, my bad. In any case, Paulus reckons it's all his mistake and we
> really never need to save/restore fpscr.

ACK :-P

Mikey

^ permalink raw reply

* Re: [PATCH] powerpc: remove fpscr use from [kvm_]cvt_{fd,df}
From: Benjamin Herrenschmidt @ 2010-08-25  1:30 UTC (permalink / raw)
  To: Michael Neuling; +Cc: linuxppc-dev, Andreas Schwab, kvm-ppc, Paul Mackerras
In-Reply-To: <32250.1282626925@neuling.org>

On Tue, 2010-08-24 at 15:15 +1000, Michael Neuling wrote:
> > > Do some 32 bit processors need this? 
> > > 
> > > In 32 bit before the merge, we use to have code that did:
> > > 
> > >   #if defined(CONFIG_4xx) || defined(CONFIG_E500)
> > >    #define cvt_fd without save/restore fpscr
> > >   #else
> > >    #define cvt_fd with save/restore fpscr
> > >   #end if
> > > 
> > > Kumar; does this ring any bells?
> > 
> > I don't see anything in the various 440 docs I have at hand that would
> > hint at lfd/stfs adffecting FPSCR.
> 
> The way the ifdefs are, it's the other way around.  4xx procs don't need
> to save/restore fpscr and others do.

Right, my bad. In any case, Paulus reckons it's all his mistake and we
really never need to save/restore fpscr.

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH] powerpc: Check end of stack canary at oops time
From: Michael Ellerman @ 2010-08-25  1:29 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: linuxppc-dev
In-Reply-To: <20100824231528.GC28360@kryten>

[-- Attachment #1: Type: text/plain, Size: 1678 bytes --]

On Wed, 2010-08-25 at 09:15 +1000, Anton Blanchard wrote:
> Add a check for the stack canary when we oops, similar to x86. This should make
> it clear that we overran our stack:
> 
> Unable to handle kernel paging request for data at address 0x24652f63700ac689
> Faulting instruction address: 0xc000000000063d24
> Thread overran stack, or stack corrupted
> 
> Signed-off-by: Anton Blanchard <anton@samba.org>
> ---
> 
> Index: powerpc.git/arch/powerpc/mm/fault.c
> ===================================================================
> --- powerpc.git.orig/arch/powerpc/mm/fault.c	2010-08-25 08:41:08.230086186 +1000
> +++ powerpc.git/arch/powerpc/mm/fault.c	2010-08-25 09:12:38.276553103 +1000
> @@ -30,6 +30,7 @@
>  #include <linux/kprobes.h>
>  #include <linux/kdebug.h>
>  #include <linux/perf_event.h>
> +#include <linux/magic.h>
>  
>  #include <asm/firmware.h>
>  #include <asm/page.h>
> @@ -385,6 +386,7 @@ do_sigbus:
>  void bad_page_fault(struct pt_regs *regs, unsigned long address, int sig)
>  {
>  	const struct exception_table_entry *entry;
> +	unsigned long *stackend;
>  
>  	/* Are we prepared to handle this fault?  */
>  	if ((entry = search_exception_tables(regs->nip)) != NULL) {
> @@ -413,5 +415,9 @@ void bad_page_fault(struct pt_regs *regs
>  	printk(KERN_ALERT "Faulting instruction address: 0x%08lx\n",
>  		regs->nip);
>  
> +	stackend = end_of_stack(current);
> +	if (current != &init_task && *stackend != STACK_END_MAGIC)
> +		printk(KERN_ALERT "Thread overran stack, or stack corrupted\n");

The check for init is just because we haven't set the magic value for
init's stack right? But we could.

cheers


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply

* [PATCH 2/2] powerpc: kdump: Override crash_free_reserved_phys_range to avoid freeing RTAS
From: Anton Blanchard @ 2010-08-25  0:23 UTC (permalink / raw)
  To: benh, ebiederm, akpm; +Cc: linuxppc-dev, kexec, linux-kernel
In-Reply-To: <20100825002258.GD28360@kryten>


The crashkernel region will almost always overlap RTAS. If we free the
crashkernel region via "echo 0 > /sys/kernel/kexec_crash_size" then we will
free RTAS and the machine will crash in confusing and exciting ways.

Override crash_free_reserved_phys_range and check for overlap with RTAS.

Signed-off-by: Anton Blanchard <anton@samba.org>
---

Index: powerpc.git/arch/powerpc/kernel/crash_dump.c
===================================================================
--- powerpc.git.orig/arch/powerpc/kernel/crash_dump.c	2010-08-24 09:24:32.219643256 +1000
+++ powerpc.git/arch/powerpc/kernel/crash_dump.c	2010-08-24 09:46:22.826868330 +1000
@@ -19,6 +19,7 @@
 #include <asm/prom.h>
 #include <asm/firmware.h>
 #include <asm/uaccess.h>
+#include <asm/rtas.h>
 
 #ifdef DEBUG
 #include <asm/udbg.h>
@@ -141,3 +142,35 @@ ssize_t copy_oldmem_page(unsigned long p
 
 	return csize;
 }
+
+#ifdef CONFIG_PPC_RTAS
+/*
+ * The crashkernel region will almost always overlap the RTAS region, so
+ * we have to be careful when shrinking the crashkernel region.
+ */
+void crash_free_reserved_phys_range(unsigned long begin, unsigned long end)
+{
+	unsigned long addr;
+	const u32 *basep, *sizep;
+	unsigned int rtas_start = 0, rtas_end = 0;
+
+	basep = of_get_property(rtas.dev, "linux,rtas-base", NULL);
+	sizep = of_get_property(rtas.dev, "rtas-size", NULL);
+
+	if (basep && sizep) {
+		rtas_start = *basep;
+		rtas_end = *basep + *sizep;
+	}
+
+	for (addr = begin; addr < end; addr += PAGE_SIZE) {
+		/* Does this page overlap with the RTAS region? */
+		if (addr <= rtas_end && ((addr + PAGE_SIZE) > rtas_start))
+			continue;
+
+		ClearPageReserved(pfn_to_page(addr >> PAGE_SHIFT));
+		init_page_count(pfn_to_page(addr >> PAGE_SHIFT));
+		free_page((unsigned long)__va(addr));
+		totalram_pages++;
+	}
+}
+#endif

^ permalink raw reply

* [PATCH 1/2] kdump: Allow shrinking of kdump region to be overridden
From: Anton Blanchard @ 2010-08-25  0:22 UTC (permalink / raw)
  To: benh, ebiederm, akpm; +Cc: linuxppc-dev, kexec, linux-kernel


On ppc64 the crashkernel region almost always overlaps an area of firmware.
This works fine except when using the sysfs interface to reduce the kdump
region. If we free the firmware area we are guaranteed to crash.

Rename free_reserved_phys_range to crash_free_reserved_phys_range and make
it a weak function so we can override it.

Signed-off-by: Anton Blanchard <anton@samba.org>
---

Index: powerpc.git/kernel/kexec.c
===================================================================
--- powerpc.git.orig/kernel/kexec.c	2010-08-25 10:12:11.493863566 +1000
+++ powerpc.git/kernel/kexec.c	2010-08-25 10:12:35.907264327 +1000
@@ -1097,7 +1097,8 @@ size_t crash_get_memory_size(void)
 	return size;
 }
 
-static void free_reserved_phys_range(unsigned long begin, unsigned long end)
+void __weak crash_free_reserved_phys_range(unsigned long begin,
+					   unsigned long end)
 {
 	unsigned long addr;
 
@@ -1133,7 +1134,7 @@ int crash_shrink_memory(unsigned long ne
 	start = roundup(start, PAGE_SIZE);
 	end = roundup(start + new_size, PAGE_SIZE);
 
-	free_reserved_phys_range(end, crashk_res.end);
+	crash_free_reserved_phys_range(end, crashk_res.end);
 
 	if ((start == end) && (crashk_res.parent != NULL))
 		release_resource(&crashk_res);
Index: powerpc.git/include/linux/kexec.h
===================================================================
--- powerpc.git.orig/include/linux/kexec.h	2010-08-25 10:12:11.483862172 +1000
+++ powerpc.git/include/linux/kexec.h	2010-08-25 10:12:13.664166570 +1000
@@ -208,6 +208,7 @@ int __init parse_crashkernel(char *cmdli
 		unsigned long long *crash_size, unsigned long long *crash_base);
 int crash_shrink_memory(unsigned long new_size);
 size_t crash_get_memory_size(void);
+void crash_free_reserved_phys_range(unsigned long begin, unsigned long end);
 
 #else /* !CONFIG_KEXEC */
 struct pt_regs;

^ permalink raw reply

* [PATCH] powerpc: Check end of stack canary at oops time
From: Anton Blanchard @ 2010-08-24 23:15 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev


Add a check for the stack canary when we oops, similar to x86. This should make
it clear that we overran our stack:

Unable to handle kernel paging request for data at address 0x24652f63700ac689
Faulting instruction address: 0xc000000000063d24
Thread overran stack, or stack corrupted

Signed-off-by: Anton Blanchard <anton@samba.org>
---

Index: powerpc.git/arch/powerpc/mm/fault.c
===================================================================
--- powerpc.git.orig/arch/powerpc/mm/fault.c	2010-08-25 08:41:08.230086186 +1000
+++ powerpc.git/arch/powerpc/mm/fault.c	2010-08-25 09:12:38.276553103 +1000
@@ -30,6 +30,7 @@
 #include <linux/kprobes.h>
 #include <linux/kdebug.h>
 #include <linux/perf_event.h>
+#include <linux/magic.h>
 
 #include <asm/firmware.h>
 #include <asm/page.h>
@@ -385,6 +386,7 @@ do_sigbus:
 void bad_page_fault(struct pt_regs *regs, unsigned long address, int sig)
 {
 	const struct exception_table_entry *entry;
+	unsigned long *stackend;
 
 	/* Are we prepared to handle this fault?  */
 	if ((entry = search_exception_tables(regs->nip)) != NULL) {
@@ -413,5 +415,9 @@ void bad_page_fault(struct pt_regs *regs
 	printk(KERN_ALERT "Faulting instruction address: 0x%08lx\n",
 		regs->nip);
 
+	stackend = end_of_stack(current);
+	if (current != &init_task && *stackend != STACK_END_MAGIC)
+		printk(KERN_ALERT "Thread overran stack, or stack corrupted\n");
+
 	die("Kernel access of bad area", regs, sig);
 }

^ permalink raw reply

* Re: [PATCH 1/1] powerpc: Clear cpu_sibling_map in cpu_die
From: Brian King @ 2010-08-24 21:40 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <1282627487.22370.508.camel@pasglop>

On 08/24/2010 12:24 AM, Benjamin Herrenschmidt wrote:
> On Wed, 2010-08-11 at 15:34 -0500, Brian King wrote:
>> While testing CPU DLPAR, the following problem was discovered.
>> We were DLPAR removing the first CPU, which in this case was
>> logical CPUs 0-3. CPUs 0-2 were already marked offline and
>> we were in the process of offlining CPU 3. After marking
>> the CPU inactive and offline in cpu_disable, but before the
>> cpu was completely idle (cpu_die), we ended up in __make_request
>> on CPU 3. There we looked at the topology map to see which CPU
>> to complete the I/O on and found no CPUs in the cpu_sibling_map.
>> This resulted in the block layer setting the completion cpu
>> to be NR_CPUS, which then caused an oops when we tried to
>> complete the I/O.
>>
>> Fix this by delaying clearing the sibling map of the cpu we
>> are offlining for the cpu we are offlining until cpu_die.
> 
> So I'm not getting a clear mental picture of the situation, sorry about
> that.
> 
> We are offlining CPU 3, and we have already marked it inactive and
> online, so how come we end up in __make_request() on it at this stage

I'm not sure about that. My thought was that until we get into cpu_die,
the cpu could still be executing code.

> and shouldn't it be the block layer that notices that it's targeting an
> offlined CPU ?

It could be easily fixed in blk_cpu_to_group as well. I'll look into
this.

> IE. I have doubts about leaving a CPU in the sibling map which isn't
> online... Wouldn't we end up "scheduling" things to it after it's
> supposed to have freed itself of everything (timers, workqueues,
> etc...) ?

I was assuming this wouldn't happen since the cpu is no longer online.


Thanks,

Brian

> 
> As I said, I'm probably missing a part of the puzzle ..
> 
> Ben.
> 
>> Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
>> ---
>>
>>  arch/powerpc/kernel/smp.c |    9 +++++++--
>>  1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff -puN arch/powerpc/kernel/smp.c~powerpc_sibling_map_offline arch/powerpc/kernel/smp.c
>> --- linux-2.6/arch/powerpc/kernel/smp.c~powerpc_sibling_map_offline	2010-08-09 16:49:47.000000000 -0500
>> +++ linux-2.6-bjking1/arch/powerpc/kernel/smp.c	2010-08-09 16:49:47.000000000 -0500
>> @@ -598,8 +598,11 @@ int __cpu_disable(void)
>>  	/* Update sibling maps */
>>  	base = cpu_first_thread_in_core(cpu);
>>  	for (i = 0; i < threads_per_core; i++) {
>> -		cpumask_clear_cpu(cpu, cpu_sibling_mask(base + i));
>> -		cpumask_clear_cpu(base + i, cpu_sibling_mask(cpu));
>> +		if ((base + i) != cpu) {
>> +			cpumask_clear_cpu(cpu, cpu_sibling_mask(base + i));
>> +			cpumask_clear_cpu(base + i, cpu_sibling_mask(cpu));
>> +		}
>> +
>>  		cpumask_clear_cpu(cpu, cpu_core_mask(base + i));
>>  		cpumask_clear_cpu(base + i, cpu_core_mask(cpu));
>>  	}
>> @@ -641,6 +644,8 @@ void cpu_hotplug_driver_unlock()
>>  
>>  void cpu_die(void)
>>  {
>> +	cpumask_clear_cpu(smp_processor_id(), cpu_sibling_mask(smp_processor_id()));
>> +
>>  	if (ppc_md.cpu_die)
>>  		ppc_md.cpu_die();
>>  }
>> _
> 
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev


-- 
Brian King
Linux on Power Virtualization
IBM Linux Technology Center

^ permalink raw reply

* Re: [PATCH 1/5] ptp: Added a brand new class driver for ptp clocks.
From: Stephan Gatzka @ 2010-08-24 18:30 UTC (permalink / raw)
  To: john stultz
  Cc: Arnd Bergmann, netdev, Richard Cochran, linuxppc-dev,
	linux-kernel, Rodolfo Giometti, devicetree-discuss,
	linux-arm-kernel, Krzysztof Halasa
In-Reply-To: <1282594125.3111.344.camel@localhost.localdomain>

[-- Attachment #1: Type: text/plain, Size: 746 bytes --]

Hello!

> I'm curious if its possible to do the PTP hardware offset/adjustment
> calculation in a module internally to the kernel? That would allow the
> PPS interface to still be used to sync the system time, and not expose
> additional interfaces.
Just my 2 cents on this issue. PTP is very often used in embedded 
systems, where the PTP timestamps will go into some dedicated hardware, 
for instance an FPGA.

I'm currently working on a project where it is not necessary to adjust 
the Linux system time via PTP (although it would be a nice to have), but 
we only need the timestamps from the PHY to put them into our FPGA 
device. So we need some kind of API to access the PTP timestamp registers.

Kind regards,

Stephan


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 5136 bytes --]

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox