LinuxPPC-Dev Archive on lore.kernel.org


* Re: [PATCH 1/5] ptp: Added a brand new class driver for ptp clocks.
From: john stultz @ 2010-08-16 19:24 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Rodolfo Giometti, netdev, devicetree-discuss, linux-kernel,
	linuxppc-dev, linux-arm-kernel, Krzysztof Halasa
In-Reply-To: <363bd749a38d0b785d8431e591bf54c38db4c2d7.1281956490.git.richard.cochran@omicron.at>

On Mon, Aug 16, 2010 at 4:17 AM, Richard Cochran
<richardcochran@gmail.com> wrote:
> This patch adds an infrastructure for hardware clocks that implement
> IEEE 1588, the Precision Time Protocol (PTP). A class driver offers a
> registration method to particular hardware clock drivers. Each clock is
> exposed to user space as a character device with ioctls that allow tuning
> of the PTP clock.
>
> Signed-off-by: Richard Cochran <richard.cochran@omicron.at>

Hey Richard!
   It's very cool to see this work on lkml! I'm excited to see more
work done on ptp.  We had a short private thread discussion earlier (I
got busy and never replied to your last message, my apologies!), but I
wanted to bring up the concerns I have here as well.

A few comments below....

> +** PTP user space API
> +
> +   The class driver creates a character device for each registered PTP
> +   clock. User space programs may control the clock using standardized
> +   ioctls. A program may query, enable, configure, and disable the
> +   ancillary clock features. User space can receive time stamped
> +   events via blocking read() and poll(). One shot and periodic
> +   signals may be configured via an ioctl API with semantics similar
> +   to the POSIX timer_settime() system call.

As I mentioned earlier, I'm not a huge fan of the char device
interface for abstracted PTP clocks.
If it was just the direct hardware access, similar to RTC, which user
apps then use as a timesource, I'd not have much of a problem. But as
I mentioned in an earlier private mail, the abstraction level concerns
me.

1) The driver-like model exposes a char dev for each clock, which
allows for poorly-written userland applications to hit portability
issues  (ie: /dev/hpet vs /dev/rtc). Granted this isn't a huge flaw,
but good APIs should be hard to get wrong.

2) As Arnd already mentioned, the chardev interface seems to duplicate
the clock_gettime/settime() and adjtimex() interfaces.

3) I'm not sure I see the benefit of being able to have multiple
frequency corrected time domains.  In other words, what benefit would
you get from adjusting a PTP clock's frequency instead of just
adjusting the system's time freq? Having the PTP time as a reference
to correct the system time seems reasonable, but I'm not sure I see
why userland would want to adjust the PTP clock's freq.

thanks
-john


* Re: [PATCH 1/5] ptp: Added a brand new class driver for ptp clocks.
From: john stultz @ 2010-08-16 19:38 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Rodolfo Giometti, netdev, devicetree-discuss, linux-kernel,
	linuxppc-dev, linux-arm-kernel, Krzysztof Halasa
In-Reply-To: <AANLkTik_2MKMhOuDGOmu8Kzyq-ipLe+Bxrb3FaD+Tv4U@mail.gmail.com>

On Mon, Aug 16, 2010 at 12:24 PM, john stultz <johnstul@us.ibm.com> wrote:
> On Mon, Aug 16, 2010 at 4:17 AM, Richard Cochran
> A few comments below....
>
>> +** PTP user space API
>> +
>> +   The class driver creates a character device for each registered PTP
>> +   clock. User space programs may control the clock using standardized
>> +   ioctls. A program may query, enable, configure, and disable the
>> +   ancillary clock features. User space can receive time stamped
>> +   events via blocking read() and poll(). One shot and periodic
>> +   signals may be configured via an ioctl API with semantics similar
>> +   to the POSIX timer_settime() system call.
>
> As I mentioned earlier, I'm not a huge fan of the char device
> interface for abstracted PTP clocks.
> If it was just the direct hardware access, similar to RTC, which user
> apps then use as a timesource, I'd not have much of a problem. But as
> I mentioned in an earlier private mail, the abstraction level concerns
> me.

[snip]

> 2) As Arnd already mentioned, the chardev interface seems to duplicate
> the clock_gettime/settime() and adjtimex() interfaces.

And maybe just to clarify, as I saw your response to Arnd, I'm not
suggesting using PTP clocks as clocksources for the internal
timekeeping core. Instead I'm trying to understand why PTP clocks need
the equivalent of the existing posix clocks/timer interface. Why would
only having a read-time interface not suffice?

thanks
-john


* Re: [PATCH 1/5] ptp: Added a brand new class driver for ptp clocks.
From: Arnd Bergmann @ 2010-08-16 19:59 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: netdev, Richard Cochran, linux-kernel, Rodolfo Giometti,
	devicetree-discuss, linux-arm-kernel, Krzysztof Halasa
In-Reply-To: <20100816190003.GB4166@riccoc20.at.omicron.at>

On Monday 16 August 2010 21:00:03 Richard Cochran wrote:
> 
> On Mon, Aug 16, 2010 at 04:26:23PM +0200, Arnd Bergmann wrote:
> > Have you considered integrating the subsystem into the Posix clock/timer
> > framework?
> 
> Yes, but see below.
>  
> > I can't really tell from reading the source if this is possible or
> > not, but my feeling is that if it can be done, that would be a much
> > nicer interface. We already have clock_gettime/clock_settime/
> > timer_settime/... system calls, and while you'd need to add another
> > clockid and some syscalls, my feeling is that it will be more
> > usable in the end.
> 
> You are not the first person to ask about this. See this link for
> longer explanation of why I did not go that way:
> 
>   http://marc.info/?l=linux-netdev&m=127669810232201&w=2
> 
> You could offer the PTP clock as a Linux clock source/event device,
> and I agree that it would be nicer. However, the problem is, what do
> you do with the PHY based clocks?  Just one 16 bit read from a PHY
> clock can take 40 usec, and you need four such read operations just to
> get the current time value.

Why does it matter how long it takes to read the clock? I wasn't thinking
of replacing the system clock with this, just exposing the additional
clock as a new clockid_t value that can be accessed using the existing
syscalls.

> Also, I really did not want to add or change any syscalls. I could not
> see a practical way to extend the existing syscalls to accommodate PTP
> clocks.

Why did you not want to add syscalls? Adding ioctls instead of syscalls
does not make the interface better, just less visible.

Out of the ioctl commands you define, we already seem to have half or more:

PTP_CLOCK_APIVERS -> not needed
PTP_CLOCK_ADJFREQ -> new clock_adjfreq
PTP_CLOCK_ADJTIME -> new clock_adjtime
PTP_CLOCK_GETTIME -> clock_gettime
PTP_CLOCK_SETTIME -> clock_settime
PTP_CLOCK_GETCAPS -> new clock_getcaps
PTP_CLOCK_GETTIMER -> timer_gettime
PTP_CLOCK_SETTIMER -> timer_create/timer_settime
PTP_FEATURE_REQUEST -> possibly clock_feature

	Arnd


* Re: [PATCH] booting-without-of: Remove nonexistent chapters from TOC, fix numbering
From: Grant Likely @ 2010-08-16 21:09 UTC (permalink / raw)
  To: Anton Vorontsov; +Cc: linuxppc-dev
In-Reply-To: <20100811165603.GA22708@oksana.dev.rtsoft.ru>

On Wed, Aug 11, 2010 at 08:56:03PM +0400, Anton Vorontsov wrote:
> Marvell and GPIO bindings live in their own files, so the TOC should not
> mention them.
> 
> Also fix chapters numbering.
> 
> Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>

Applied, thanks.

g.

> ---
>  Documentation/powerpc/booting-without-of.txt |   31 +------------------------
>  1 files changed, 2 insertions(+), 29 deletions(-)
> 
> diff --git a/Documentation/powerpc/booting-without-of.txt b/Documentation/powerpc/booting-without-of.txt
> index 46d2210..3f454b7 100644
> --- a/Documentation/powerpc/booting-without-of.txt
> +++ b/Documentation/powerpc/booting-without-of.txt
> @@ -49,40 +49,13 @@ Table of Contents
>        f) MDIO on GPIOs
>        g) SPI busses
>  
> -  VII - Marvell Discovery mv64[345]6x System Controller chips
> -    1) The /system-controller node
> -    2) Child nodes of /system-controller
> -      a) Marvell Discovery MDIO bus
> -      b) Marvell Discovery ethernet controller
> -      c) Marvell Discovery PHY nodes
> -      d) Marvell Discovery SDMA nodes
> -      e) Marvell Discovery BRG nodes
> -      f) Marvell Discovery CUNIT nodes
> -      g) Marvell Discovery MPSCROUTING nodes
> -      h) Marvell Discovery MPSCINTR nodes
> -      i) Marvell Discovery MPSC nodes
> -      j) Marvell Discovery Watch Dog Timer nodes
> -      k) Marvell Discovery I2C nodes
> -      l) Marvell Discovery PIC (Programmable Interrupt Controller) nodes
> -      m) Marvell Discovery MPP (Multipurpose Pins) multiplexing nodes
> -      n) Marvell Discovery GPP (General Purpose Pins) nodes
> -      o) Marvell Discovery PCI host bridge node
> -      p) Marvell Discovery CPU Error nodes
> -      q) Marvell Discovery SRAM Controller nodes
> -      r) Marvell Discovery PCI Error Handler nodes
> -      s) Marvell Discovery Memory Controller nodes
> -
> -  VIII - Specifying interrupt information for devices
> +  VII - Specifying interrupt information for devices
>      1) interrupts property
>      2) interrupt-parent property
>      3) OpenPIC Interrupt Controllers
>      4) ISA Interrupt Controllers
>  
> -  IX - Specifying GPIO information for devices
> -    1) gpios property
> -    2) gpio-controller nodes
> -
> -  X - Specifying device power management information (sleep property)
> +  VIII - Specifying device power management information (sleep property)
>  
>    Appendix A - Sample SOC node for MPC8540
>  
> -- 
> 1.7.0.5


* Re: [Resend][PATCHv3] Xilinx Virtex 4 FX Soft FPU support
From: Grant Likely @ 2010-08-16 21:56 UTC (permalink / raw)
  To: Sergey Temerkhanov; +Cc: linuxppc-dev, John Linn
In-Reply-To: <201008140809.54230.temerkhanov@cifronik.ru>

Josh,

This one looks okay to me, but I've left it for you to comment on.  I
haven't seen any comments, so should I go ahead and pick it up for my
2.6.37 -next branch?

Cheers,
g.

On Fri, Aug 13, 2010 at 10:09 PM, Sergey Temerkhanov
<temerkhanov@cifronik.ru> wrote:
> This patch enables support for Xilinx Virtex 4 FX single-float FPU.
>
> Changelog v2->v3:
>         -Fixed whitespaces for SAVE_FPR/REST_FPR.
>         -Changed description of MSR_AP bit.
>         -Removed the stub for APU unavailable exception.
>
> Changelog v1->v2:
>         -Added MSR_AP bit definition
>         -Renamed CONFIG_XILINX_FPU to CONFIG_XILINX_SOFTFPU, moved it to
>          'Platform support' and made it Virtex4-FX-only.
>         -Changed SAVE_FPR/REST_FPR definition style.
>
> Caveats:
>         - Hard-float binaries which rely on in-kernel math emulation will
>         give wrong results since they expect 64-bit double-precision instead of
>         32-bit single-precision numbers which Xilinx V4-FX Soft FPU produces.
>
> Signed-off-by: Sergey Temerkhanov<temerkhanov@cifronik.ru>
>
> diff -r 626de0d94469 arch/powerpc/include/asm/ppc_asm.h
> --- a/arch/powerpc/include/asm/ppc_asm.h        Wed May 26 15:33:32 2010 +0400
> +++ b/arch/powerpc/include/asm/ppc_asm.h        Wed May 26 20:30:43 2010 +0400
> @@ -85,13 +85,21 @@
>  #define REST_8GPRS(n, base)    REST_4GPRS(n, base); REST_4GPRS(n+4, base)
>  #define REST_10GPRS(n, base)   REST_8GPRS(n, base); REST_2GPRS(n+8, base)
>
> +
> +#ifdef CONFIG_XILINX_SOFTFPU
> +#define SAVE_FPR(n, base)      stfs    n,THREAD_FPR0+8*TS_FPRWIDTH*(n)(base)
> +#define REST_FPR(n, base)      lfs     n,THREAD_FPR0+8*TS_FPRWIDTH*(n)(base)
> +#else
>  #define SAVE_FPR(n, base)      stfd    n,THREAD_FPR0+8*TS_FPRWIDTH*(n)(base)
> +#define REST_FPR(n, base)      lfd     n,THREAD_FPR0+8*TS_FPRWIDTH*(n)(base)
> +#endif
> +
>  #define SAVE_2FPRS(n, base)    SAVE_FPR(n, base); SAVE_FPR(n+1, base)
>  #define SAVE_4FPRS(n, base)    SAVE_2FPRS(n, base); SAVE_2FPRS(n+2, base)
>  #define SAVE_8FPRS(n, base)    SAVE_4FPRS(n, base); SAVE_4FPRS(n+4, base)
>  #define SAVE_16FPRS(n, base)   SAVE_8FPRS(n, base); SAVE_8FPRS(n+8, base)
>  #define SAVE_32FPRS(n, base)   SAVE_16FPRS(n, base); SAVE_16FPRS(n+16, base)
> -#define REST_FPR(n, base)      lfd     n,THREAD_FPR0+8*TS_FPRWIDTH*(n)(base)
> +
>  #define REST_2FPRS(n, base)    REST_FPR(n, base); REST_FPR(n+1, base)
>  #define REST_4FPRS(n, base)    REST_2FPRS(n, base); REST_2FPRS(n+2, base)
>  #define REST_8FPRS(n, base)    REST_4FPRS(n, base); REST_4FPRS(n+4, base)
> diff -r 626de0d94469 arch/powerpc/include/asm/reg.h
> --- a/arch/powerpc/include/asm/reg.h    Wed May 26 15:33:32 2010 +0400
> +++ b/arch/powerpc/include/asm/reg.h    Wed May 26 20:30:43 2010 +0400
> @@ -30,6 +30,7 @@
>  #define MSR_ISF_LG     61              /* Interrupt 64b mode valid on 630 */
>  #define MSR_HV_LG      60              /* Hypervisor state */
>  #define MSR_VEC_LG     25              /* Enable AltiVec */
> +#define MSR_AP_LG      25              /* Enable PPC405 APU */
>  #define MSR_VSX_LG     23              /* Enable VSX */
>  #define MSR_POW_LG     18              /* Enable Power Management */
>  #define MSR_WE_LG      18              /* Wait State Enable */
> @@ -71,6 +72,7 @@
>  #define MSR_HV         0
>  #endif
>
> +#define MSR_AP         __MASK(MSR_AP_LG)       /* Enable PPC405 APU */
>  #define MSR_VEC                __MASK(MSR_VEC_LG)      /* Enable AltiVec */
>  #define MSR_VSX                __MASK(MSR_VSX_LG)      /* Enable VSX */
>  #define MSR_POW                __MASK(MSR_POW_LG)      /* Enable Power Management */
> diff -r 626de0d94469 arch/powerpc/kernel/fpu.S
> --- a/arch/powerpc/kernel/fpu.S Wed May 26 15:33:32 2010 +0400
> +++ b/arch/powerpc/kernel/fpu.S Wed May 26 20:30:43 2010 +0400
> @@ -57,6 +57,9 @@
>  _GLOBAL(load_up_fpu)
>         mfmsr   r5
>         ori     r5,r5,MSR_FP
> +#ifdef CONFIG_XILINX_SOFTFPU
> +       oris    r5,r5,MSR_AP@h
> +#endif
>  #ifdef CONFIG_VSX
>  BEGIN_FTR_SECTION
>         oris    r5,r5,MSR_VSX@h
> @@ -85,6 +88,9 @@
>         toreal(r5)
>         PPC_LL  r4,_MSR-STACK_FRAME_OVERHEAD(r5)
>         li      r10,MSR_FP|MSR_FE0|MSR_FE1
> +#ifdef CONFIG_XILINX_SOFTFPU
> +       oris    r10,r10,MSR_AP@h
> +#endif
>         andc    r4,r4,r10               /* disable FP for previous task */
>         PPC_STL r4,_MSR-STACK_FRAME_OVERHEAD(r5)
>  1:
> @@ -94,6 +100,9 @@
>         mfspr   r5,SPRN_SPRG3           /* current task's THREAD (phys) */
>         lwz     r4,THREAD_FPEXC_MODE(r5)
>         ori     r9,r9,MSR_FP            /* enable FP for current */
> +#ifdef CONFIG_XILINX_SOFTFPU
> +       oris    r9,r9,MSR_AP@h
> +#endif
>         or      r9,r9,r4
>  #else
>         ld      r4,PACACURRENT(r13)
> @@ -124,6 +133,9 @@
>  _GLOBAL(giveup_fpu)
>         mfmsr   r5
>         ori     r5,r5,MSR_FP
> +#ifdef CONFIG_XILINX_SOFTFPU
> +       oris    r5,r5,MSR_AP@h
> +#endif
>  #ifdef CONFIG_VSX
>  BEGIN_FTR_SECTION
>         oris    r5,r5,MSR_VSX@h
> @@ -145,6 +157,9 @@
>         beq     1f
>         PPC_LL  r4,_MSR-STACK_FRAME_OVERHEAD(r5)
>         li      r3,MSR_FP|MSR_FE0|MSR_FE1
> +#ifdef CONFIG_XILINX_SOFTFPU
> +       oris    r3,r3,MSR_AP@h
> +#endif
>  #ifdef CONFIG_VSX
>  BEGIN_FTR_SECTION
>         oris    r3,r3,MSR_VSX@h
> diff -r 626de0d94469 arch/powerpc/kernel/head_40x.S
> --- a/arch/powerpc/kernel/head_40x.S    Wed May 26 15:33:32 2010 +0400
> +++ b/arch/powerpc/kernel/head_40x.S    Wed May 26 20:30:43 2010 +0400
> @@ -420,7 +420,19 @@
>         addi    r3,r1,STACK_FRAME_OVERHEAD
>         EXC_XFER_STD(0x700, program_check_exception)
>
> +/* 0x0800 - FPU unavailable Exception */
> +#ifdef CONFIG_PPC_FPU
> +       START_EXCEPTION(0x0800, FloatingPointUnavailable)
> +       NORMAL_EXCEPTION_PROLOG
> +       beq     1f;                                                     \
> +       bl      load_up_fpu;            /* if from user, just load it up */     \
> +       b       fast_exception_return;                                  \
> +1:     addi    r3,r1,STACK_FRAME_OVERHEAD;                             \
> +       EXC_XFER_EE_LITE(0x800, kernel_fp_unavailable_exception)
> +#else
>         EXCEPTION(0x0800, Trap_08, unknown_exception, EXC_XFER_EE)
> +#endif
> +
>         EXCEPTION(0x0900, Trap_09, unknown_exception, EXC_XFER_EE)
>         EXCEPTION(0x0A00, Trap_0A, unknown_exception, EXC_XFER_EE)
>         EXCEPTION(0x0B00, Trap_0B, unknown_exception, EXC_XFER_EE)
> @@ -821,8 +833,10 @@
>   * The PowerPC 4xx family of processors do not have an FPU, so this just
>   * returns.
>   */
> +#ifndef CONFIG_PPC_FPU
>  _ENTRY(giveup_fpu)
>         blr
> +#endif
>
>  /* This is where the main kernel code starts.
>   */
> diff -r 626de0d94469 arch/powerpc/platforms/Kconfig
> --- a/arch/powerpc/platforms/Kconfig    Wed May 26 15:33:32 2010 +0400
> +++ b/arch/powerpc/platforms/Kconfig    Wed May 26 20:30:43 2010 +0400
> @@ -333,4 +333,9 @@
>         bool "Xilinx PCI host bridge support"
>         depends on PCI && XILINX_VIRTEX
>
> +config XILINX_SOFTFPU
> +       bool "Xilinx Soft FPU"
> +       select PPC_FPU
> +       depends on XILINX_VIRTEX_4_FX && !PPC40x_SIMPLE && !405GP && !405GPR
> +
>  endmenu
>
> --
> Regards, Sergey Temerkhanov
>



-- 
Grant Likely, B.Sc., P.Eng.
Secret Lab Technologies Ltd.


* Re: Query regarding 2.6.335 RT[Ingo's] and Non-RT performance
From: Manikandan Ramachandran @ 2010-08-17  5:26 UTC (permalink / raw)
  To: linuxppc-dev

> ------------------------------------------------------
> > Date: Thu, 12 Aug 2010 13:53:51 -0400
> > From: Jeff Angielski <jeff@theptrgroup.com>
> > To: linuxppc-dev@lists.ozlabs.org
> > Subject: Re: Query regarding 2.6.335 RT[Ingo's] and Non-RT performance
> > Message-ID: <4C64352F.4090005@theptrgroup.com>
> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> >
> > On 08/11/2010 06:18 PM, Manikandan Ramachandran wrote:
> > > Hello All,
> > >      I created a very simple program which has higher priority than
> > > normal tasks and runs a tight loop. Under same test environment I ran
> > > this program on both non-rt and rt 2.6.33.5 kernel.  To my surprise I see
> > > that performance of non-RT kernel is better than RT. non-RT kernel took
> > > 3 sec and 366156 usec while RT kernel took about 3 sec and 418011
> > > usec. Can someone please explain why the performance of non-rt kernel is
> > > better than rt kernel? From the face of the test result, I feel RT has
> > > more overhead. Is there any configuration that I could do to bring down
> > > the overhead?
> >
> > Your "surprise" is due to your definition of "performance".
> >
> > The purpose of the -rt kernels is to reduce the kernel latency.  This is
> > important for servicing hardware.  Normal users find the -rt useful for
> > audio/video applications.  Engineering and scientific users find the -rt
> > beneficial for servicing hardware like sensors or control systems.
> >
> > If you are just trying to run calculations as fast as you can in user
> > space, you'd be better off using the non-rt variants.
> >
> >
> > --
> > Jeff Angielski
> > The PTR Group
> > www.theptrgroup.com



 Thanks for your response.

On one hand I hear that the RT kernel is meant for reducing kernel latency;
on the other hand I see that there is RT-kernel overhead. So what does the
RT kernel really bring to system performance?

Actually I see that latency for the higher priority task is more or less
the same on the non-rt system.

One more thing: since irqs are threaded in RT, and with the CFS scheduler in
2.6.33, wouldn't we bring down system performance, as CFS is O(log(n)) in
nature?
 --
 Thanks,
 Manik
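
The distinction drawn in the quoted reply, wakeup latency rather than
throughput, can be measured with a sketch like the following. It is an
illustration only: the function name and its parameters are hypothetical,
and this is not the test program used in this thread. It sleeps until an
absolute deadline and records how late each wakeup was.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* clock_nanosleep() under strict -std= modes */
#endif
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/*
 * Hypothetical helper: sleep until an absolute deadline `iterations` times
 * and return the worst overshoot in nanoseconds, or -1 on error.
 * interval_ns must be in the range 1 .. NSEC_PER_SEC - 1.
 */
int64_t max_wakeup_latency_ns(int iterations, long interval_ns)
{
	struct timespec next, now;
	int64_t worst = 0;

	if (interval_ns <= 0 || interval_ns >= NSEC_PER_SEC)
		return -1;
	if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
		return -1;

	for (int i = 0; i < iterations; i++) {
		/* Advance the absolute deadline by one interval. */
		next.tv_nsec += interval_ns;
		if (next.tv_nsec >= NSEC_PER_SEC) {
			next.tv_nsec -= NSEC_PER_SEC;
			next.tv_sec++;
		}
		/* clock_nanosleep() returns an error number, not -1/errno. */
		if (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) != 0)
			return -1;
		if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
			return -1;

		/* How long after the deadline did we actually wake up? */
		int64_t late = (int64_t)(now.tv_sec - next.tv_sec) * NSEC_PER_SEC
			     + (now.tv_nsec - next.tv_nsec);
		if (late > worst)
			worst = late;
	}
	return worst;
}
```

Run at a real-time priority while the system is under load, the worst-case
value returned here, not the total runtime of a tight loop, is the figure an
-rt kernel aims to lower.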


* RE: [PATCH 1/2] mmc: change ACMD12 to AUTO_CMD12 for more clear
From: Zang Roy-R61911 @ 2010-08-17  6:48 UTC (permalink / raw)
  To: Zang Roy-R61911, akpm, linux-mmc; +Cc: linuxppc-dev, mirqus
In-Reply-To: <3850A844E6A3854C827AC5C0BEC7B60A056400@zch01exm23.fsl.freescale.net>

> -----Original Message-----
> From: Zang Roy-R61911
> Sent: Wednesday, August 11, 2010 12:47 PM
> To: Zang Roy-R61911; akpm@linux-foundation.org;
> linux-mmc@vger.kernel.org
> Cc: linuxppc-dev@ozlabs.org; mirqus@gmail.com;
> cbouatmailru@gmail.com; grant.likely@secretlab.ca
> Subject: RE: [PATCH 1/2] mmc: change ACMD12 to AUTO_CMD12 for
> more clear
>
> > -----Original Message-----
> > From: Zang Roy-R61911
> > Sent: Tuesday, August 10, 2010 17:47 PM
> > To: akpm@linux-foundation.org; linux-mmc@vger.kernel.org
> > Cc: linuxppc-dev@ozlabs.org; mirqus@gmail.com;
> > cbouatmailru@gmail.com; grant.likely@secretlab.ca
> > Subject: [PATCH 1/2] mmc: change ACMD12 to AUTO_CMD12 for more clear
> >
> > Change ACMD12 to AUTO_CMD12 to reduce the confusion.
> > ACMD12 might be confused with MMC/SD App CMD 12 (CMD55+CMD12 combo).
> >
> > Signed-off-by: Roy Zang <tie-fei.zang@freescale.com>
> > ---
> >  drivers/mmc/host/sdhci-of-core.c |    2 +-
> >  drivers/mmc/host/sdhci.c         |    8 ++++----
> >  drivers/mmc/host/sdhci.h         |   10 +++++-----
> >  3 files changed, 10 insertions(+), 10 deletions(-)
> Andrew,
> Could you help to pick up this minor fix?
> Thanks.
> Roy
Any update on these two patches?
Thanks.
Roy


* Re: [PATCH 4/9] RapidIO: Add relation links between RIO device structures
From: Micha Nelissen @ 2010-08-17  7:08 UTC (permalink / raw)
  To: Bounine, Alexandre; +Cc: linuxppc-dev, akpm, linux-kernel
In-Reply-To: <0CE8B6BE3C4AD74AB97D9D29BD24E552011D5F1F@CORPEXCH1.na.ads.idt.com>

Bounine, Alexandre wrote:
>> As RapidIO is a switched network, the concept of 'previous' and 'next'
>> devices is invalid. Perhaps it's just the way they were
>> discovered/enumerated, but that does not matter any more at runtime.
>> Or at least, should not matter.
>>
> 
> Yes, the "previous" and "next" have to be considered in context of
> enumeration/discovery.
> At runtime, it does not matter for data traffic, but is valuable
> information for error recovery

I agree it's desirable to have this information. Notes:
1) Is rio_dev->prev used anywhere? (Maybe I missed it.)
2) Is the nextdev[port] list complete? I mean, are all connected switches
in the list? My guess is that multiply connected switches are enumerated
only once and therefore appear in the nextdev of only one switch,
instead of all.
3) It would be nice to have the connection information for all switches.

If the network is ever rerouted, this information will become useful,
instead of having only a tree representation of the network.

Micha


* Re: [PATCH 2/9] RapidIO, powerpc/85xx: modify RIO port-write interrupt handler
From: Micha Nelissen @ 2010-08-17  7:12 UTC (permalink / raw)
  To: Bounine, Alexandre; +Cc: linuxppc-dev, akpm, linux-kernel
In-Reply-To: <0CE8B6BE3C4AD74AB97D9D29BD24E552011D5F94@CORPEXCH1.na.ads.idt.com>

Bounine, Alexandre wrote:
> capable to receive and keep only one PW message. Therefore, I copy it
> into the driver's FIFO and re-enable HW Rx queue (it is called queue but
> can accept only one entry) ASAP. I have a test setup that is capable
> generate multiple PW messages and see many messages discarded by PW
> controller because of this single-entry HW queue.

Primarily due to the single entry queue, the order of checking is 
probably insignificant? :-) Anyway, I don't mind changing the order.

Micha


* Re: [PATCH 6/9] RapidIO: Add switch-specific sysfs initialization callback
From: Micha Nelissen @ 2010-08-17  7:18 UTC (permalink / raw)
  To: Bounine, Alexandre; +Cc: linuxppc-dev, akpm, linux-kernel
In-Reply-To: <0CE8B6BE3C4AD74AB97D9D29BD24E552011D5FFC@CORPEXCH1.na.ads.idt.com>

Bounine, Alexandre wrote:
>> Why not make a sw_sysfs_create and sw_sysfs_remove? Is better for
>> readability. Now you call 'sw_sysfs(dev, 0)' or 'sw_sysfs(dev, 1)';
> 
> I just do not want to have an extra member here. Not every switch will
> require own sysfs attributes, but every switch will be presented by a
> data structure. Based on its intended use I do not see any problem here.

It's not problematic, but personally I find function calls that pass 0 
or 1 as an argument hard to read. Likewise for boolean parameters. An 
alternative would be to have defines SW_SYSFS_CREATE etc. It's a minor item.

Micha
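
The readability point above can be illustrated with a short sketch. All
names in it (`sw_sysfs_action`, `SW_SYSFS_REMOVE`, `rio_dev_stub`,
`sw_sysfs_op`) are hypothetical apart from `SW_SYSFS_CREATE`, which is the
define suggested above; the real callback in the patch takes a device pointer
and an integer flag, and the stub structure here only stands in for the real
device.

```c
/* Named operations instead of a bare 0/1 flag (hypothetical names). */
enum sw_sysfs_action {
	SW_SYSFS_CREATE,
	SW_SYSFS_REMOVE,
};

/* Stand-in for the real device structure, for illustration only. */
struct rio_dev_stub {
	int attrs_present;
};

/*
 * Still a single callback, so no extra structure member is needed,
 * but every call site now states what it does.
 */
int sw_sysfs_op(struct rio_dev_stub *dev, enum sw_sysfs_action action)
{
	switch (action) {
	case SW_SYSFS_CREATE:
		dev->attrs_present = 1;	/* real code would create attributes */
		return 0;
	case SW_SYSFS_REMOVE:
		dev->attrs_present = 0;	/* real code would remove attributes */
		return 0;
	}
	return -1;	/* unknown action */
}
```

A call then reads `sw_sysfs_op(dev, SW_SYSFS_CREATE)` rather than passing a
literal 0 or 1, while keeping the one-member design preferred in the quoted
reply.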


* Re: [PATCH 7/9] RapidIO: Add handling for PW message from a lost device
From: Micha Nelissen @ 2010-08-17  7:22 UTC (permalink / raw)
  To: Bounine, Alexandre; +Cc: linuxppc-dev, akpm, linux-kernel
In-Reply-To: <0CE8B6BE3C4AD74AB97D9D29BD24E552011D6040@CORPEXCH1.na.ads.idt.com>

Bounine, Alexandre wrote:
> That "real" PW message may be dropped by the controller (85xx is good
> example). Everything depends on number of PW messages directed to the
> host/controller. I am trying to use the first available notification to
> service device removal. If the "real" PW message is received it should
> be processed without any further action. 

Perhaps an idea is to use the repeated port-write sending feature so 
that dropped port-writes are not a problem anymore.

Micha


* Re: [PATCH 0/9] RapidIO: Set of patches to add Gen2 switches
From: Micha Nelissen @ 2010-08-17  7:31 UTC (permalink / raw)
  To: Bounine, Alexandre; +Cc: linuxppc-dev, akpm, linux-kernel, Thomas Moll
In-Reply-To: <0CE8B6BE3C4AD74AB97D9D29BD24E552011D6061@CORPEXCH1.na.ads.idt.com>

Bounine, Alexandre wrote:
> Micha Nelissen wrote:
>> This is not 'Gen2' specific, as these error management extensions also
>> exist in v1.2/1.3 (?) of the specification? E.g. tsi56x and tsi57x
> could
>> support this functionality?
> 
> Correct. EM extensions exist since v1.3. But implementation before Gen2
> switches relied
> on proprietary device specific mechanism (tsi57x). Addition of Gen2

Do you mean the tsi56x here?

Can you explain the difference, i.e. what you mean by relying on a
proprietary vs. a standard mechanism? E.g. setting the port-write
destination ID register is standardized? And so is the format of the
port-write message itself.

Micha


* Re: [PATCH 1/5] ptp: Added a brand new class driver for ptp clocks.
From: Richard Cochran @ 2010-08-17  8:32 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linuxppc-dev, devicetree-discuss, linux-kernel, netdev,
	Rodolfo Giometti, linux-arm-kernel, Krzysztof Halasa
In-Reply-To: <201008162159.39734.arnd@arndb.de>

On Mon, Aug 16, 2010 at 09:59:39PM +0200, Arnd Bergmann wrote:
> Why does it matter how long it takes to read the clock? I wasn't thinking
> of replacing the system clock with this, just exposing the additional
> clock as a new clockid_t value that can be accessed using the existing
> syscalls.

Okay, now I see. You are suggesting this:

      clock_gettime(CLOCK_PTP, &ts);
      clock_settime(CLOCK_PTP, &ts);

I like this. If there is agreement about it, I am happy to implement
the PTP stuff that way.

> Why did you not want to add syscalls? Adding ioctls instead of syscalls
> does not make the interface better, just less visible.

I bet that, had I posted a patch set with new syscalls, someone would
have said, "You are adding new syscalls. Can't you just use a char
device instead!"

If you add syscalls and introduce CLOCK_PTP, then you add it to
everyone's kernel, even those people who never heard of PTP. A char
device has the advantage that it can simply be ignored. Also, a
syscall has got to have the right form from the very beginning. If the
next generation of PTP hardware looks very different, then it is not
that much of a crime to change an ioctl interface, provided it has
versioning.

Richard
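
The calling pattern discussed above can be sketched in C. This is a minimal
sketch, not code from the patch set: `CLOCK_PTP` is only a proposal in this
thread and is not defined by any header, so the helper below takes the clock
id as a parameter and can be exercised with an existing id such as
`CLOCK_MONOTONIC`. The helper name `read_clock_ns` is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* clock_gettime() under strict -std= modes */
#endif
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper: read any POSIX clock through the existing
 * clock_gettime() call and return the value in nanoseconds.
 * A PTP clock exposed as a clockid_t would be read the same way.
 * Returns 0 on success, -1 on failure (errno is set by clock_gettime()).
 */
int read_clock_ns(clockid_t id, int64_t *ns)
{
	struct timespec ts;

	if (clock_gettime(id, &ts) != 0)
		return -1;
	*ns = (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
	return 0;
}
```

The caller does not need to know which hardware backs the clock id, which is
the property argued for in the quoted text.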


* Re: [PATCH 1/5] ptp: Added a brand new class driver for ptp clocks.
From: Richard Cochran @ 2010-08-17  8:53 UTC (permalink / raw)
  To: john stultz
  Cc: Rodolfo Giometti, netdev, devicetree-discuss, linux-kernel,
	linuxppc-dev, linux-arm-kernel, Krzysztof Halasa
In-Reply-To: <AANLkTik_2MKMhOuDGOmu8Kzyq-ipLe+Bxrb3FaD+Tv4U@mail.gmail.com>

On Mon, Aug 16, 2010 at 12:24:48PM -0700, john stultz wrote:
> 3) I'm not sure I see the benefit of being able to have multiple
> frequency corrected time domains.  In other words, what benefit would
> you get from adjusting a PTP clock's frequency instead of just
> adjusting the system's time freq? Having the PTP time as a reference
> to correct the system time seems reasonable, but I'm not sure I see
> why userland would want to adjust the PTP clock's freq.

For PTP enabled hardware, the timestamp on the network packet comes
from the PTP clock, not from the system time.

Of course, you can always just leave the PTP clock alone, figure the
needed correction, and apply it whenever needed. But this has some
disadvantages. First of all, the (one and only) open source PTPd does
not do it that way. Also, only one program (the PTPd or equivalent)
will know the correct time. Other programs will not be able to ask the
operating system for time services. Instead, they would need to use
IPC to the PTPd.

The PTP protocol (and some PTP hardware) offers a "one step" method,
where the timestamps are inserted by the hardware on the fly. Here you
really do need the PTP clock to be correctly adjusted.

All of the PTP hardware that I am familiar with provides a clock
adjustment method, so it is simpler and cleaner just to use this facility
to tune the PTP clock.

Richard
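
The alternative mentioned above, leaving the PTP clock alone and applying the
correction in software, could look roughly like this. It is a sketch under
stated assumptions: the structure, its fields, and the function name are
hypothetical, and the linear offset-plus-drift model is chosen for
illustration, not taken from PTPd or from the patch set.

```c
#include <stdint.h>

/* Hypothetical correction state, known only to the program that computed it. */
struct ptp_correction {
	int64_t ref_raw_ns;	/* raw clock reading when the correction was computed */
	int64_t offset_ns;	/* amount to add to the raw clock at ref_raw_ns */
	int64_t freq_ppb;	/* frequency error to compensate, parts per billion */
};

/* Translate a raw (unadjusted) hardware timestamp into corrected time. */
int64_t ptp_correct(const struct ptp_correction *c, int64_t raw_ns)
{
	int64_t elapsed = raw_ns - c->ref_raw_ns;

	/*
	 * Drift accumulated since the reference point: elapsed * ppb / 1e9,
	 * split into whole seconds and remainder to avoid overflow.
	 */
	int64_t drift = elapsed / 1000000000LL * c->freq_ppb
		      + elapsed % 1000000000LL * c->freq_ppb / 1000000000LL;

	return raw_ns + c->offset_ns + drift;
}
```

Every consumer of timestamps would need access to this state, which is the
IPC burden described above; adjusting the hardware clock itself removes that
step.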


* Re: [PATCH 1/5] ptp: Added a brand new class driver for ptp clocks.
From: Arnd Bergmann @ 2010-08-17  9:25 UTC (permalink / raw)
  To: Richard Cochran
  Cc: linuxppc-dev, devicetree-discuss, linux-kernel, netdev,
	Rodolfo Giometti, linux-arm-kernel, Krzysztof Halasa
In-Reply-To: <20100817083216.GA3330@riccoc20.at.omicron.at>

On Tuesday 17 August 2010, Richard Cochran wrote:
> > Why did you not want to add syscalls? Adding ioctls instead of syscalls
> > does not make the interface better, just less visible.
> 
> I bet that, had I posted a patch set with new syscalls, someone would
> have said, "You are adding new syscalls. Can't you just use a char
> device instead!"

Very possible, but after considering both options, I think we would
still end up with the same conclusion.

> If you add syscalls and introduce CLOCK_PTP, then you add it to
> everyone's kernel, even those people who never heard of PTP. A char
> device has the advantage that it can simply be ignored.

It's a matter of perspective whether you consider this an advantage
or disadvantage. I would expect that since you are trying to get support
for PTP into the kernel, you'd be happy for people to know about it and
use your code ;-).

> Also, a syscall has got to have the right form from the very beginning.
> If the next generation of PTP hardware looks very different, then it is not
> that much of a crime to change an ioctl interface, provided it has
> versioning.

No, that's just a myth. The rules for ABI stability are pretty much the
same. We try hard to avoid ever changing an existing ABI, for both
syscalls and ioctl. In either case, if you get it wrong, you have to support
the old interface and create a new syscall or ioctl command.
As mentioned, versioning does not solve this, it just adds another
indirection (which we try to avoid).
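
The point about versioning can be illustrated with a minimal user-space
sketch. It is not kernel code, and every name in it (demo_clock_ioctl,
DEMO_CLK_ADJ_V1, DEMO_CLK_ADJ_V2 and the structs) is hypothetical. It
only shows that once a first command has shipped, a handler has to keep
accepting it next to any newer command, whether or not a version field
exists:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command numbers, invented for this sketch. */
enum {
	DEMO_CLK_ADJ_V1 = 1,	/* original command, must keep working */
	DEMO_CLK_ADJ_V2 = 2,	/* later command carrying an extra field */
};

struct demo_adj_v1 {
	int32_t ppb;		/* frequency adjustment, parts per billion */
};

struct demo_adj_v2 {
	int32_t ppb;
	int32_t phase_ns;	/* field the first version did not foresee */
};

struct demo_clock {
	int32_t ppb;
	int32_t phase_ns;
};

/* Models an ioctl handler: old callers still work, new ones get more. */
static int demo_clock_ioctl(struct demo_clock *clk, unsigned int cmd,
			    const void *arg)
{
	assert(clk != NULL);
	if (arg == NULL)
		return -EFAULT;

	switch (cmd) {
	case DEMO_CLK_ADJ_V1: {
		const struct demo_adj_v1 *a = arg;

		clk->ppb = a->ppb;
		return 0;
	}
	case DEMO_CLK_ADJ_V2: {
		const struct demo_adj_v2 *a = arg;

		clk->ppb = a->ppb;
		clk->phase_ns = a->phase_ns;
		return 0;
	}
	default:
		return -ENOTTY;	/* unknown command */
	}
}
```

Both cases stay in the switch forever; the second command is the "new
ioctl command" mentioned above, and the first cannot be dropped.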

One difference is that more people take a look at your code when you suggest
a new syscall, so the chances of getting it right in the first upstream
version are higher.
Another difference is that we generally use ioctl for devices that can
be enumerated, while syscalls are for system services that are not tied to
a specific device. This argument works both ways for PTP IMHO: On the one
hand you want to have a reliable clock that you can use without knowing
where it comes from, on the other you might have multiple PTP sources that
you need to differentiate.

	Arnd

^ permalink raw reply

* Re: [PATCH 1/5] ptp: Added a brand new class driver for ptp clocks.
From: Richard Cochran @ 2010-08-17 10:52 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linuxppc-dev, devicetree-discuss, linux-kernel, netdev,
	Rodolfo Giometti, linux-arm-kernel, Krzysztof Halasa
In-Reply-To: <201008170925.55592.arnd@arndb.de>

On Tue, Aug 17, 2010 at 09:25:55AM +0000, Arnd Bergmann wrote:
> Another difference is that we generally use ioctl for devices that can
> be enumerated, while syscalls are for system services that are not tied to
> a specific device. This argument works both ways for PTP IMHO: On the one
> hand you want to have a reliable clock that you can use without knowing
> where it comes from, on the other you might have multiple PTP sources that
> you need to differentiate.

Yes, I agree. In normal use, there will be only one PTP clock in a
system. However, for research purposes, it would be nice to have more
than one.

I've been looking at offering the PTP clock as a posix clock, and it
is not as hard as I first thought. The PTP clock or clocks just have
to be registered as one of the posix_clocks[MAX_CLOCKS] in
posix-timers.c.

My suggestion would be to reserve three clock ids in time.h,
CLOCK_PTP0, CLOCK_PTP1, and CLOCK_PTP2. The first one would be the
same as CLOCK_REALTIME, for SW timestamping, and the other two would
allow two different PTP clocks at the same time, for the research use
case.

Using the clock id will bring another advantage, since it will then be
possible for user space to specify the desired timestamp source for
SO_TIMESTAMPING.
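
From user space, the clock id approach might look roughly like the
sketch below. CLOCK_PTP0 does not exist in time.h; since the suggestion
is that the first id would be the same as CLOCK_REALTIME, the sketch
aliases a hypothetical DEMO_CLOCK_PTP0 to it. demo_read_clock is an
invented helper around the standard clock_gettime() call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/*
 * Hypothetical id: no CLOCK_PTP0 is defined by time.h. The proposal says
 * the first id would behave like CLOCK_REALTIME, so it is aliased here.
 */
#define DEMO_CLOCK_PTP0 CLOCK_REALTIME

/* Read the given clock; returns 0 on success, -1 on failure (errno set). */
static int demo_read_clock(clockid_t id, struct timespec *ts)
{
	assert(ts != NULL);
	return clock_gettime(id, ts);
}
```

A caller would pass DEMO_CLOCK_PTP0 exactly as it passes CLOCK_REALTIME
today, which is the attraction of reusing the existing syscall.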

Richard

^ permalink raw reply

* Re: [PATCH 1/5] ptp: Added a brand new class driver for ptp clocks.
From: Arnd Bergmann @ 2010-08-17 11:36 UTC (permalink / raw)
  To: Richard Cochran, john stultz
  Cc: linuxppc-dev, devicetree-discuss, linux-kernel, netdev,
	Rodolfo Giometti, linux-arm-kernel, Krzysztof Halasa
In-Reply-To: <20100817105232.GA9079@riccoc20.at.omicron.at>

On Tuesday 17 August 2010, Richard Cochran wrote:
> On Tue, Aug 17, 2010 at 09:25:55AM +0000, Arnd Bergmann wrote:
> > Another difference is that we generally use ioctl for devices that can
> > be enumerated, while syscalls are for system services that are not tied to
> > a specific device. This argument works both ways for PTP IMHO: On the one
> > hand you want to have a reliable clock that you can use without knowing
> > where it comes from, on the other you might have multiple PTP sources that
> > you need to differentiate.
> 
> Yes, I agree. In normal use, there will be only one PTP clock in a
> system. However, for research purposes, it would be nice to have more
> than one.
> 
> I've been looking at offering the PTP clock as a posix clock, and it
> is not as hard as I first thought. The PTP clock or clocks just have
> to be registered as one of the posix_clocks[MAX_CLOCKS] in
> posix-timers.c.

Ok sounds good.

> My suggestion would be to reserve three clock ids in time.h,
> CLOCK_PTP0, CLOCK_PTP1, and CLOCK_PTP2. The first one would be the
> same as CLOCK_REALTIME, for SW timestamping, and the other two would
> allow two different PTP clocks at the same time, for the research use
> case.

I don't think there is a point in making exactly two independent sources
available. The clockid_t space is not really limited, so we could define
an arbitrary range of ids for ptp sources that could be used simultaneously,
as long as we have space for more ids with a fixed number.

Would it be reasonable to assume that on a machine with a large number
of NICs, you'd want a separate ptp source for each of them for timestamping?
Or would you preferably define just one source in such a setup?

I think both could be done with the use of class device attributes in
sysfs for configuration. Maybe you can have one CLOCK_PTP value for one
global PTP source and use sysfs to configure which device that is.

If you also need simultaneous access to the specific clocks, you could
have run-time configured clockid numbers in a sysfs attribute for each
ptp class device.
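
A rough sketch of the user-space side of that idea follows, assuming the
attribute holds a decimal number followed by a newline. The attribute
format and the helper name demo_parse_clockid are assumptions made for
illustration; no such attribute is defined in this thread. The id is
kept as a plain int, and the sketch rejects negative values.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse the text of a hypothetical sysfs attribute (e.g. "12\n") into a
 * clock id. Returns 0 on success, -EINVAL on malformed input.
 */
static int demo_parse_clockid(const char *text, int *id_out)
{
	char *end;
	long val;

	assert(text != NULL && id_out != NULL);

	errno = 0;
	val = strtol(text, &end, 10);
	if (end == text || errno == ERANGE || val < 0 || val > INT_MAX)
		return -EINVAL;

	/* sysfs attributes conventionally end with a newline */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	*id_out = (int)val;
	return 0;
}
```

The parsed value would then be handed to clock_gettime() and friends in
place of a compile-time constant.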

> Using the clock id will bring another advantage, since it will then be
> possible for user space to specify the desired timestamp source for
> SO_TIMESTAMPING.

Good point.

	Arnd

^ permalink raw reply

* RE: [PATCH 7/9] RapidIO: Add handling for PW message from a lost device
From: Bounine, Alexandre @ 2010-08-17 12:44 UTC (permalink / raw)
  To: Micha Nelissen; +Cc: linuxppc-dev, akpm, linux-kernel, Bounine, Alexandre
In-Reply-To: <4C6A38A4.6070505@neli.hopto.org>

Micha Nelissen wrote:
>
> Perhaps an idea is to use the repeated port-write sending feature so
> that dropped port-writes are not a problem anymore.
>
Unfortunately, this feature is not defined by the RIO spec. It is a
proprietary function, so we cannot rely on it. Yes, this is a nice
feature of Tsi57x switches and may be used if you have a closed system -
just enable it in em_init. The RIO spec part 8 is quite open about
port-write generation and we cannot expect the same behavior from
different devices.

^ permalink raw reply

* [PATCH 03/26] KVM: PPC: Add tracepoint for generic mmu map
From: Alexander Graf @ 2010-08-17 13:57 UTC (permalink / raw)
  To: kvm-ppc; +Cc: linuxppc-dev, KVM list
In-Reply-To: <1282053481-18787-1-git-send-email-agraf@suse.de>

This patch moves the generic mmu map debugging over to tracepoints.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/kvm/book3s_mmu_hpte.c |    3 +++
 arch/powerpc/kvm/trace.h           |   29 +++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_mmu_hpte.c b/arch/powerpc/kvm/book3s_mmu_hpte.c
index 02c64ab..ac94bd9 100644
--- a/arch/powerpc/kvm/book3s_mmu_hpte.c
+++ b/arch/powerpc/kvm/book3s_mmu_hpte.c
@@ -21,6 +21,7 @@
 #include <linux/kvm_host.h>
 #include <linux/hash.h>
 #include <linux/slab.h>
+#include "trace.h"
 
 #include <asm/kvm_ppc.h>
 #include <asm/kvm_book3s.h>
@@ -66,6 +67,8 @@ void kvmppc_mmu_hpte_cache_map(struct kvm_vcpu *vcpu, struct hpte_cache *pte)
 {
 	u64 index;
 
+	trace_kvm_book3s_mmu_map(pte);
+
 	spin_lock(&vcpu->arch.mmu_lock);
 
 	/* Add to ePTE list */
diff --git a/arch/powerpc/kvm/trace.h b/arch/powerpc/kvm/trace.h
index 3b9169c..ee6ac88 100644
--- a/arch/powerpc/kvm/trace.h
+++ b/arch/powerpc/kvm/trace.h
@@ -174,6 +174,35 @@ TRACE_EVENT(kvm_book3s_64_mmu_map,
 
 #endif
 
+TRACE_EVENT(kvm_book3s_mmu_map,
+	TP_PROTO(struct hpte_cache *pte),
+	TP_ARGS(pte),
+
+	TP_STRUCT__entry(
+		__field(	u64,		host_va		)
+		__field(	u64,		pfn		)
+		__field(	ulong,		eaddr		)
+		__field(	u64,		vpage		)
+		__field(	ulong,		raddr		)
+		__field(	int,		flags		)
+	),
+
+	TP_fast_assign(
+		__entry->host_va	= pte->host_va;
+		__entry->pfn		= pte->pfn;
+		__entry->eaddr		= pte->pte.eaddr;
+		__entry->vpage		= pte->pte.vpage;
+		__entry->raddr		= pte->pte.raddr;
+		__entry->flags		= (pte->pte.may_read ? 0x4 : 0) |
+					  (pte->pte.may_write ? 0x2 : 0) |
+					  (pte->pte.may_execute ? 0x1 : 0);
+	),
+
+	TP_printk("Map: hva=%llx pfn=%llx ea=%lx vp=%llx ra=%lx [%x]",
+		  __entry->host_va, __entry->pfn, __entry->eaddr,
+		  __entry->vpage, __entry->raddr, __entry->flags)
+);
+
 #endif /* _TRACE_KVM_H */
 
 /* This part must be outside protection */
-- 
1.6.0.2

^ permalink raw reply related

* [PATCH 12/26] KVM: PPC: Remove unused define
From: Alexander Graf @ 2010-08-17 13:57 UTC (permalink / raw)
  To: kvm-ppc; +Cc: linuxppc-dev, KVM list
In-Reply-To: <1282053481-18787-1-git-send-email-agraf@suse.de>

The define VSID_ALL is unused. Let's remove it.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/kvm/book3s_64_mmu_host.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c b/arch/powerpc/kvm/book3s_64_mmu_host.c
index e7c4d00..4040c8d 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -30,7 +30,6 @@
 #include "trace.h"
 
 #define PTE_SIZE 12
-#define VSID_ALL 0
 
 void kvmppc_mmu_invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte)
 {
-- 
1.6.0.2

^ permalink raw reply related

* [PATCH 00/26] KVM: PPC: Mid-August patch queue
From: Alexander Graf @ 2010-08-17 13:57 UTC (permalink / raw)
  To: kvm-ppc; +Cc: linuxppc-dev, KVM list

Howdy,

This is my local patch queue with stuff that has accumulated over the last
weeks on KVM for PPC with some last minute fixes, speedups and debugging help
that I needed for the KVM Forum ;-).

The highlights of this set are:

  - Converted most important debug points to tracepoints
  - Flush less PTEs (speedup)
  - Go back to our own hash (less duplicates)
  - Make SRs guest settable (speedup for 32 bit guests)
  - Remove r30/r31 restrictions from PV hooks (speedup!)
  - Fix random breakages
  - Fix random guest stalls
  - 440GP host support (Thanks Hollis!)

Keep in mind that this is the first version that is stable on PPC32 hosts.
All versions prior to this could occupy otherwise used segment entries and
thus crash your machine :-).

After finally meeting Avi again, we also agreed to give pulls a try. So
here we go - this is my tree online:

    git://github.com/agraf/linux-2.6.git kvm-ppc-next


Have fun with more accurate, faster and less buggy KVM on PowerPC!


Alexander Graf (23):
  KVM: PPC: Move EXIT_DEBUG partially to tracepoints
  KVM: PPC: Move book3s_64 mmu map debug print to trace point
  KVM: PPC: Add tracepoint for generic mmu map
  KVM: PPC: Move pte invalidate debug code to tracepoint
  KVM: PPC: Fix sid map search after flush
  KVM: PPC: Add tracepoints for generic spte flushes
  KVM: PPC: Preload magic page when in kernel mode
  KVM: PPC: Don't flush PTEs on NX/RO hit
  KVM: PPC: Make invalidation code more reliable
  KVM: PPC: Move slb debugging to tracepoints
  KVM: PPC: Revert "KVM: PPC: Use kernel hash function"
  KVM: PPC: Remove unused define
  KVM: PPC: Add feature bitmap for magic page
  KVM: PPC: Move BAT handling code into spr handler
  KVM: PPC: Interpret SR registers on demand
  KVM: PPC: Put segment registers in shared page
  KVM: PPC: Add mtsrin PV code
  KVM: PPC: Make PV mtmsr work with r30 and r31
  KVM: PPC: Update int_pending also on dequeue
  KVM: PPC: Make PV mtmsrd L=1 work with r30 and r31
  KVM: PPC: Force enable nap on KVM
  KVM: PPC: Implement correct SID mapping on Book3s_32
  KVM: PPC: Don't put MSR_POW in MSR

Hollis Blanchard (3):
  KVM: PPC: initialize IVORs in addition to IVPR
  KVM: PPC: fix compilation of "dump tlbs" debug function
  KVM: PPC: allow ppc440gp to pass the compatibility check

 arch/powerpc/include/asm/kvm_book3s.h |   25 ++--
 arch/powerpc/include/asm/kvm_para.h   |    3 +
 arch/powerpc/kernel/asm-offsets.c     |    1 +
 arch/powerpc/kernel/kvm.c             |  144 ++++++++++++++++++---
 arch/powerpc/kernel/kvm_emul.S        |   75 +++++++++--
 arch/powerpc/kvm/44x.c                |    3 +-
 arch/powerpc/kvm/44x_tlb.c            |    1 +
 arch/powerpc/kvm/book3s.c             |   54 ++++----
 arch/powerpc/kvm/book3s_32_mmu.c      |   83 +++++++------
 arch/powerpc/kvm/book3s_32_mmu_host.c |   67 ++++++----
 arch/powerpc/kvm/book3s_64_mmu_host.c |   59 +++------
 arch/powerpc/kvm/book3s_emulate.c     |   48 +++-----
 arch/powerpc/kvm/book3s_mmu_hpte.c    |   38 ++----
 arch/powerpc/kvm/booke.c              |    8 +-
 arch/powerpc/kvm/powerpc.c            |    5 +-
 arch/powerpc/kvm/trace.h              |  230 +++++++++++++++++++++++++++++++++
 16 files changed, 614 insertions(+), 230 deletions(-)

^ permalink raw reply

* [PATCH 01/26] KVM: PPC: Move EXIT_DEBUG partially to tracepoints
From: Alexander Graf @ 2010-08-17 13:57 UTC (permalink / raw)
  To: kvm-ppc; +Cc: linuxppc-dev, KVM list
In-Reply-To: <1282053481-18787-1-git-send-email-agraf@suse.de>

We have a debug printk on every exit that is usually #ifdef'ed out. Using
tracepoints makes a lot more sense here though, as they can be dynamically
enabled.

This patch converts the most commonly used debug printks of EXIT_DEBUG to
tracepoints.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/kvm/book3s.c |   26 ++++----------------------
 arch/powerpc/kvm/trace.h  |   42 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index eee97b5..f8b9aab 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -17,6 +17,7 @@
 #include <linux/kvm_host.h>
 #include <linux/err.h>
 #include <linux/slab.h>
+#include "trace.h"
 
 #include <asm/reg.h>
 #include <asm/cputable.h>
@@ -35,7 +36,6 @@
 #define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU
 
 /* #define EXIT_DEBUG */
-/* #define EXIT_DEBUG_SIMPLE */
 /* #define DEBUG_EXT */
 
 static int kvmppc_handle_ext(struct kvm_vcpu *vcpu, unsigned int exit_nr,
@@ -105,14 +105,6 @@ void kvmppc_core_vcpu_put(struct kvm_vcpu *vcpu)
 	kvmppc_giveup_ext(vcpu, MSR_VSX);
 }
 
-#if defined(EXIT_DEBUG)
-static u32 kvmppc_get_dec(struct kvm_vcpu *vcpu)
-{
-	u64 jd = mftb() - vcpu->arch.dec_jiffies;
-	return vcpu->arch.dec - jd;
-}
-#endif
-
 static void kvmppc_recalc_shadow_msr(struct kvm_vcpu *vcpu)
 {
 	ulong smsr = vcpu->arch.shared->msr;
@@ -848,16 +840,8 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 
 	run->exit_reason = KVM_EXIT_UNKNOWN;
 	run->ready_for_interrupt_injection = 1;
-#ifdef EXIT_DEBUG
-	printk(KERN_EMERG "exit_nr=0x%x | pc=0x%lx | dar=0x%lx | dec=0x%x | msr=0x%lx\n",
-		exit_nr, kvmppc_get_pc(vcpu), kvmppc_get_fault_dar(vcpu),
-		kvmppc_get_dec(vcpu), to_svcpu(vcpu)->shadow_srr1);
-#elif defined (EXIT_DEBUG_SIMPLE)
-	if ((exit_nr != 0x900) && (exit_nr != 0x500))
-		printk(KERN_EMERG "exit_nr=0x%x | pc=0x%lx | dar=0x%lx | msr=0x%lx\n",
-			exit_nr, kvmppc_get_pc(vcpu), kvmppc_get_fault_dar(vcpu),
-			vcpu->arch.shared->msr);
-#endif
+
+	trace_kvm_book3s_exit(exit_nr, vcpu);
 	kvm_resched(vcpu);
 	switch (exit_nr) {
 	case BOOK3S_INTERRUPT_INST_STORAGE:
@@ -1089,9 +1073,7 @@ program_interrupt:
 		}
 	}
 
-#ifdef EXIT_DEBUG
-	printk(KERN_EMERG "KVM exit: vcpu=0x%p pc=0x%lx r=0x%x\n", vcpu, kvmppc_get_pc(vcpu), r);
-#endif
+	trace_kvm_book3s_reenter(r, vcpu);
 
 	return r;
 }
diff --git a/arch/powerpc/kvm/trace.h b/arch/powerpc/kvm/trace.h
index a8e8400..56cd162 100644
--- a/arch/powerpc/kvm/trace.h
+++ b/arch/powerpc/kvm/trace.h
@@ -98,6 +98,48 @@ TRACE_EVENT(kvm_gtlb_write,
 		__entry->word1, __entry->word2)
 );
 
+TRACE_EVENT(kvm_book3s_exit,
+	TP_PROTO(unsigned int exit_nr, struct kvm_vcpu *vcpu),
+	TP_ARGS(exit_nr, vcpu),
+
+	TP_STRUCT__entry(
+		__field(	unsigned int,	exit_nr		)
+		__field(	unsigned long,	pc		)
+		__field(	unsigned long,	msr		)
+		__field(	unsigned long,	dar		)
+		__field(	unsigned long,	srr1		)
+	),
+
+	TP_fast_assign(
+		__entry->exit_nr	= exit_nr;
+		__entry->pc		= kvmppc_get_pc(vcpu);
+		__entry->dar		= kvmppc_get_fault_dar(vcpu);
+		__entry->msr		= vcpu->arch.shared->msr;
+		__entry->srr1		= to_svcpu(vcpu)->shadow_srr1;
+	),
+
+	TP_printk("exit=0x%x | pc=0x%lx | msr=0x%lx | dar=0x%lx | srr1=0x%lx",
+		  __entry->exit_nr, __entry->pc, __entry->msr, __entry->dar,
+		  __entry->srr1)
+);
+
+TRACE_EVENT(kvm_book3s_reenter,
+	TP_PROTO(int r, struct kvm_vcpu *vcpu),
+	TP_ARGS(r, vcpu),
+
+	TP_STRUCT__entry(
+		__field(	unsigned int,	r		)
+		__field(	unsigned long,	pc		)
+	),
+
+	TP_fast_assign(
+		__entry->r		= r;
+		__entry->pc		= kvmppc_get_pc(vcpu);
+	),
+
+	TP_printk("reentry r=%d | pc=0x%lx", __entry->r, __entry->pc)
+);
+
 #endif /* _TRACE_KVM_H */
 
 /* This part must be outside protection */
-- 
1.6.0.2

^ permalink raw reply related

* [PATCH 06/26] KVM: PPC: Add tracepoints for generic spte flushes
From: Alexander Graf @ 2010-08-17 13:57 UTC (permalink / raw)
  To: kvm-ppc; +Cc: linuxppc-dev, KVM list
In-Reply-To: <1282053481-18787-1-git-send-email-agraf@suse.de>

The different ways of flushing shadow ptes have their own debug prints
which use stupid old printk.

Let's move them to tracepoints, making them more easily available, faster
and possible to activate on demand.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/kvm/book3s_mmu_hpte.c |   18 +++---------------
 arch/powerpc/kvm/trace.h           |   23 +++++++++++++++++++++++
 2 files changed, 26 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_mmu_hpte.c b/arch/powerpc/kvm/book3s_mmu_hpte.c
index 3397152..bd6a767 100644
--- a/arch/powerpc/kvm/book3s_mmu_hpte.c
+++ b/arch/powerpc/kvm/book3s_mmu_hpte.c
@@ -31,14 +31,6 @@
 
 #define PTE_SIZE	12
 
-/* #define DEBUG_MMU */
-
-#ifdef DEBUG_MMU
-#define dprintk_mmu(a, ...) printk(KERN_INFO a, __VA_ARGS__)
-#else
-#define dprintk_mmu(a, ...) do { } while(0)
-#endif
-
 static struct kmem_cache *hpte_cache;
 
 static inline u64 kvmppc_mmu_hash_pte(u64 eaddr)
@@ -186,9 +178,7 @@ static void kvmppc_mmu_pte_flush_long(struct kvm_vcpu *vcpu, ulong guest_ea)
 
 void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, ulong guest_ea, ulong ea_mask)
 {
-	dprintk_mmu("KVM: Flushing %d Shadow PTEs: 0x%lx & 0x%lx\n",
-		    vcpu->arch.hpte_cache_count, guest_ea, ea_mask);
-
+	trace_kvm_book3s_mmu_flush("", vcpu, guest_ea, ea_mask);
 	guest_ea &= ea_mask;
 
 	switch (ea_mask) {
@@ -251,8 +241,7 @@ static void kvmppc_mmu_pte_vflush_long(struct kvm_vcpu *vcpu, u64 guest_vp)
 
 void kvmppc_mmu_pte_vflush(struct kvm_vcpu *vcpu, u64 guest_vp, u64 vp_mask)
 {
-	dprintk_mmu("KVM: Flushing %d Shadow vPTEs: 0x%llx & 0x%llx\n",
-		    vcpu->arch.hpte_cache_count, guest_vp, vp_mask);
+	trace_kvm_book3s_mmu_flush("v", vcpu, guest_vp, vp_mask);
 	guest_vp &= vp_mask;
 
 	switch(vp_mask) {
@@ -274,8 +263,7 @@ void kvmppc_mmu_pte_pflush(struct kvm_vcpu *vcpu, ulong pa_start, ulong pa_end)
 	struct hpte_cache *pte;
 	int i;
 
-	dprintk_mmu("KVM: Flushing %d Shadow pPTEs: 0x%lx - 0x%lx\n",
-		    vcpu->arch.hpte_cache_count, pa_start, pa_end);
+	trace_kvm_book3s_mmu_flush("p", vcpu, pa_start, pa_end);
 
 	rcu_read_lock();
 
diff --git a/arch/powerpc/kvm/trace.h b/arch/powerpc/kvm/trace.h
index 4ab1c72..df15d02 100644
--- a/arch/powerpc/kvm/trace.h
+++ b/arch/powerpc/kvm/trace.h
@@ -232,6 +232,29 @@ TRACE_EVENT(kvm_book3s_mmu_invalidate,
 		  __entry->vpage, __entry->raddr, __entry->flags)
 );
 
+TRACE_EVENT(kvm_book3s_mmu_flush,
+	TP_PROTO(const char *type, struct kvm_vcpu *vcpu, unsigned long long p1,
+		 unsigned long long p2),
+	TP_ARGS(type, vcpu, p1, p2),
+
+	TP_STRUCT__entry(
+		__field(	int,			count		)
+		__field(	unsigned long long,	p1		)
+		__field(	unsigned long long,	p2		)
+		__field(	const char *,		type		)
+	),
+
+	TP_fast_assign(
+		__entry->count		= vcpu->arch.hpte_cache_count;
+		__entry->p1		= p1;
+		__entry->p2		= p2;
+		__entry->type		= type;
+	),
+
+	TP_printk("Flush %d %sPTEs: %llx - %llx",
+		  __entry->count, __entry->type, __entry->p1, __entry->p2)
+);
+
 #endif /* _TRACE_KVM_H */
 
 /* This part must be outside protection */
-- 
1.6.0.2

^ permalink raw reply related

* [PATCH 19/26] KVM: PPC: Update int_pending also on dequeue
From: Alexander Graf @ 2010-08-17 13:57 UTC (permalink / raw)
  To: kvm-ppc; +Cc: linuxppc-dev, KVM list
In-Reply-To: <1282053481-18787-1-git-send-email-agraf@suse.de>

When a decrementer interrupt is pending, the dequeuing happens manually
through an mtdec instruction. This instruction simply calls dequeue on that
interrupt, so the int_pending hint doesn't get updated.

This patch enables updating the int_pending hint also on dequeue, thus
correctly enabling guests to stay in guest contexts more often.
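
The effect can be modelled in plain user-space C as a rough sketch.
struct demo_vcpu and both helpers are stand-ins invented here, not the
kernel's types, and setting the hint inside the queue helper is a
simplification of this sketch rather than what book3s.c does:

```c
#include <assert.h>

/* Stand-in for the two fields the patch touches; not the kernel structs. */
struct demo_vcpu {
	unsigned long pending_exceptions;	/* one bit per queued interrupt */
	unsigned int int_pending;		/* hint read by the guest */
};

static void demo_queue_irqprio(struct demo_vcpu *vcpu, unsigned int prio)
{
	assert(prio < 8 * sizeof(vcpu->pending_exceptions));
	vcpu->pending_exceptions |= 1UL << prio;
	vcpu->int_pending = 1;	/* simplification, see the text above */
}

static void demo_dequeue_irqprio(struct demo_vcpu *vcpu, unsigned int prio)
{
	assert(prio < 8 * sizeof(vcpu->pending_exceptions));
	vcpu->pending_exceptions &= ~(1UL << prio);

	/* The step the patch adds: drop the hint once nothing is queued. */
	if (!vcpu->pending_exceptions)
		vcpu->int_pending = 0;
}
```

Without the final check, the hint would stay set after the last
interrupt is dequeued.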

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/kvm/book3s.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 5fbe949..8138d31 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -201,6 +201,9 @@ static void kvmppc_book3s_dequeue_irqprio(struct kvm_vcpu *vcpu,
 {
 	clear_bit(kvmppc_book3s_vec2irqprio(vec),
 		  &vcpu->arch.pending_exceptions);
+
+	if (!vcpu->arch.pending_exceptions)
+		vcpu->arch.shared->int_pending = 0;
 }
 
 void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec)
-- 
1.6.0.2

^ permalink raw reply related

* [PATCH 13/26] KVM: PPC: Add feature bitmap for magic page
From: Alexander Graf @ 2010-08-17 13:57 UTC (permalink / raw)
  To: kvm-ppc; +Cc: linuxppc-dev, KVM list
In-Reply-To: <1282053481-18787-1-git-send-email-agraf@suse.de>

We will soon add SR PV support to the shared page, so we need some
infrastructure that allows the guest to query for features KVM exports.

This patch adds a second return value to the magic mapping that
indicates to the guest which features are available.
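
As a minimal sketch of how a guest could consume such a bitmap: the
KVM_MAGIC_FEAT_SR value is copied from the diff in this mail, while
demo_magic_has_feature is a hypothetical helper that does not appear in
the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Value copied from the kvm_para.h hunk of this patch. */
#define KVM_MAGIC_FEAT_SR	(1 << 0)

/* Hypothetical helper: does the host advertise all bits in mask? */
static int demo_magic_has_feature(uint32_t features, uint32_t mask)
{
	assert(mask != 0);
	return (features & mask) == mask;
}
```

With this patch alone the host returns 0, so a check for
KVM_MAGIC_FEAT_SR would fail until a host starts setting that bit.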

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/include/asm/kvm_para.h |    2 ++
 arch/powerpc/kernel/kvm.c           |   21 +++++++++++++++------
 arch/powerpc/kvm/powerpc.c          |    5 ++++-
 3 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h
index 7438ab3..43c1b22 100644
--- a/arch/powerpc/include/asm/kvm_para.h
+++ b/arch/powerpc/include/asm/kvm_para.h
@@ -47,6 +47,8 @@ struct kvm_vcpu_arch_shared {
 
 #define KVM_FEATURE_MAGIC_PAGE	1
 
+#define KVM_MAGIC_FEAT_SR	(1 << 0)
+
 #ifdef __KERNEL__
 
 #ifdef CONFIG_KVM_GUEST
diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index e936817..f48144f 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -267,12 +267,20 @@ static void kvm_patch_ins_wrteei(u32 *inst)
 
 static void kvm_map_magic_page(void *data)
 {
-	kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE,
-		       KVM_MAGIC_PAGE,  /* Physical Address */
-		       KVM_MAGIC_PAGE); /* Effective Address */
+	u32 *features = data;
+
+	ulong in[8];
+	ulong out[8];
+
+	in[0] = KVM_MAGIC_PAGE;
+	in[1] = KVM_MAGIC_PAGE;
+
+	kvm_hypercall(in, out, HC_VENDOR_KVM | KVM_HC_PPC_MAP_MAGIC_PAGE);
+
+	*features = out[0];
 }
 
-static void kvm_check_ins(u32 *inst)
+static void kvm_check_ins(u32 *inst, u32 features)
 {
 	u32 _inst = *inst;
 	u32 inst_no_rt = _inst & ~KVM_MASK_RT;
@@ -368,9 +376,10 @@ static void kvm_use_magic_page(void)
 	u32 *p;
 	u32 *start, *end;
 	u32 tmp;
+	u32 features;
 
 	/* Tell the host to map the magic page to -4096 on all CPUs */
-	on_each_cpu(kvm_map_magic_page, NULL, 1);
+	on_each_cpu(kvm_map_magic_page, &features, 1);
 
 	/* Quick self-test to see if the mapping works */
 	if (__get_user(tmp, (u32*)KVM_MAGIC_PAGE)) {
@@ -383,7 +392,7 @@ static void kvm_use_magic_page(void)
 	end = (void*)_etext;
 
 	for (p = start; p < end; p++)
-		kvm_check_ins(p);
+		kvm_check_ins(p, features);
 
 	printk(KERN_INFO "KVM: Live patching for a fast VM %s\n",
 			 kvm_patching_worked ? "worked" : "failed");
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 6a53a3f..496d7a5 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -66,6 +66,8 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu)
 		vcpu->arch.magic_page_pa = param1;
 		vcpu->arch.magic_page_ea = param2;
 
+		r2 = 0;
+
 		r = HC_EV_SUCCESS;
 		break;
 	}
@@ -76,13 +78,14 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu)
 #endif
 
 		/* Second return value is in r4 */
-		kvmppc_set_gpr(vcpu, 4, r2);
 		break;
 	default:
 		r = HC_EV_UNIMPLEMENTED;
 		break;
 	}
 
+	kvmppc_set_gpr(vcpu, 4, r2);
+
 	return r;
 }
 
-- 
1.6.0.2

^ permalink raw reply related
