All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Nicolas Pitre <nico@cam.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	torvalds@linux-foundation.org, dhowells@redhat.com,
	mingo@elte.hu, a.p.zijlstra@chello.nl, linux-kernel@vger.k
Subject: Re: [PATCH] convert cnt32_to_63 to inline
Date: Tue, 11 Nov 2008 15:11:30 -0500	[thread overview]
Message-ID: <20081111201130.GA10801@Krystal> (raw)
In-Reply-To: <20081111191355.GA13724@flint.arm.linux.org.uk>

* Russell King (rmk+lkml@arm.linux.org.uk) wrote:
> On Tue, Nov 11, 2008 at 01:28:00PM -0500, Mathieu Desnoyers wrote:
> > Let's see what it gives once implemented. Only compile-tested. Assumes
> > pxa, sa110 and mn10300 are all UP-only. Correct smp_rmb() are used for
> > arm versatile.
> 
> Versatile is also UP only.
> 
> The following are results from PXA built with gcc 3.4.3:
> 
> 1. two additional registers used in sched_clock()
> 2. 8 additional bytes of code (which are needless if gcc was more inteligent)
> 
> both of these I put down to inefficiencies in gcc's register allocation.
> 
> 3. worse instruction scheduling - two inter-dependent loads next to each
>    other causing a pipeline stall
> 
> Actual reading of variables/hardware is unaffected by this patch.
> 
> Old code:
> 
>    c:   e59f3050        ldr     r3, [pc, #80]   ; load address of oscr2ns_scale
>   10:   e59fc050        ldr     ip, [pc, #80]   ; load address of __m_cnt_hi
>   14:   e5932000        ldr     r2, [r3]	; read oscr2ns_scale
>   18:   e59f304c        ldr     r3, [pc, #76]   ; load address of OSCR
>   1c:   e59c1000        ldr     r1, [ip]	; read __m_cnt_hi
>   20:   e1a07002        mov     r7, r2
>   24:   e3a08000        mov     r8, #0  ; 0x0
>   28:   e5933000        ldr     r3, [r3]	; read OSCR register
> ...
>   58:   e1820b04        orr     r0, r2, r4, lsl #22
>   5c:   e1a01524        lsr     r1, r4, #10
>   60:   e89da9f0        ldm     sp, {r4, r5, r6, r7, r8, fp, sp, pc}
> 
> 
> New code:
> 
>    c:   e59f0058        ldr     r0, [pc, #88]   ; load address of oscr2ns_scale
>   10:   e5901000        ldr     r1, [r0]	; read oscr2ns_scale <= pipeline stall
>   14:   e59f3054        ldr     r3, [pc, #84]   ; load address of __m_cnt_hi
>   18:   e1a08001        mov     r8, r1
>   1c:   e5932000        ldr     r2, [r3]	; read __m_cnt_hi
>   20:   e59f304c        ldr     r3, [pc, #76]   ; load address of OSCR
>   24:   e1a09002        mov     r9, r2
>   28:   e3a0a000        mov     sl, #0  ; 0x0
>   2c:   e5933000        ldr     r3, [r3]	; read OSCR
> ...
>   58:   e1825b04        orr     r5, r2, r4, lsl #22
>   5c:   e1a06524        lsr     r6, r4, #10
>   60:   e1a01006        mov     r1, r6
>   64:   e1a00005        mov     r0, r5
>   68:   e89daff0        ldm     sp, {r4, r5, r6, r7, r8, r9, sl, fp, sp, pc}
> 
> Versatile:
> 
> 1. 12 additional bytes of code
> 2. same number of registers
> 3. worse instruction scheduling causing pipeline stall
> 
> Actual reading of variables/hardware is unaffected by this patch.
> 
> So, we have two platforms where this patch makes things visibly worse
> with no material benefit.
> 

I think the added barrier() are causing these pipeline stalls. They
don't allow the compiler to read variables such as oscr2ns_scale before
the barrier because gcc cannot assume it won't be modified. However, to
insure that OSCR read is done after __m_cnt_hi read, this barrier seems
required to be safe against gcc optimizations.

Have you compared my patch to Nicolas'patch, which adds a smp_rmb() in
the macro or to a vanilla tree ?

I wonder if reading those values sooner would help gcc ?

Mathieu


-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

WARNING: multiple messages have this Message-ID (diff)
From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Nicolas Pitre <nico@cam.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	torvalds@linux-foundation.org, dhowells@redhat.com,
	mingo@elte.hu, a.p.zijlstra@chello.nl,
	linux-kernel@vger.kernel.org, ralf@linux-mips.org,
	benh@kernel.crashing.org, paulus@samba.org, davem@davemloft.net,
	mingo@redhat.com, tglx@linutronix.de, rostedt@goodmis.org,
	linux-arch@vger.kernel.org
Subject: Re: [PATCH] convert cnt32_to_63 to inline
Date: Tue, 11 Nov 2008 15:11:30 -0500	[thread overview]
Message-ID: <20081111201130.GA10801@Krystal> (raw)
Message-ID: <20081111201130.l9pQwlJxzckCMgXycxwShmRz1B5iZMYEH2eOmODF8rc@z> (raw)
In-Reply-To: <20081111191355.GA13724@flint.arm.linux.org.uk>

* Russell King (rmk+lkml@arm.linux.org.uk) wrote:
> On Tue, Nov 11, 2008 at 01:28:00PM -0500, Mathieu Desnoyers wrote:
> > Let's see what it gives once implemented. Only compile-tested. Assumes
> > pxa, sa110 and mn10300 are all UP-only. Correct smp_rmb() are used for
> > arm versatile.
> 
> Versatile is also UP only.
> 
> The following are results from PXA built with gcc 3.4.3:
> 
> 1. two additional registers used in sched_clock()
> 2. 8 additional bytes of code (which are needless if gcc was more inteligent)
> 
> both of these I put down to inefficiencies in gcc's register allocation.
> 
> 3. worse instruction scheduling - two inter-dependent loads next to each
>    other causing a pipeline stall
> 
> Actual reading of variables/hardware is unaffected by this patch.
> 
> Old code:
> 
>    c:   e59f3050        ldr     r3, [pc, #80]   ; load address of oscr2ns_scale
>   10:   e59fc050        ldr     ip, [pc, #80]   ; load address of __m_cnt_hi
>   14:   e5932000        ldr     r2, [r3]	; read oscr2ns_scale
>   18:   e59f304c        ldr     r3, [pc, #76]   ; load address of OSCR
>   1c:   e59c1000        ldr     r1, [ip]	; read __m_cnt_hi
>   20:   e1a07002        mov     r7, r2
>   24:   e3a08000        mov     r8, #0  ; 0x0
>   28:   e5933000        ldr     r3, [r3]	; read OSCR register
> ...
>   58:   e1820b04        orr     r0, r2, r4, lsl #22
>   5c:   e1a01524        lsr     r1, r4, #10
>   60:   e89da9f0        ldm     sp, {r4, r5, r6, r7, r8, fp, sp, pc}
> 
> 
> New code:
> 
>    c:   e59f0058        ldr     r0, [pc, #88]   ; load address of oscr2ns_scale
>   10:   e5901000        ldr     r1, [r0]	; read oscr2ns_scale <= pipeline stall
>   14:   e59f3054        ldr     r3, [pc, #84]   ; load address of __m_cnt_hi
>   18:   e1a08001        mov     r8, r1
>   1c:   e5932000        ldr     r2, [r3]	; read __m_cnt_hi
>   20:   e59f304c        ldr     r3, [pc, #76]   ; load address of OSCR
>   24:   e1a09002        mov     r9, r2
>   28:   e3a0a000        mov     sl, #0  ; 0x0
>   2c:   e5933000        ldr     r3, [r3]	; read OSCR
> ...
>   58:   e1825b04        orr     r5, r2, r4, lsl #22
>   5c:   e1a06524        lsr     r6, r4, #10
>   60:   e1a01006        mov     r1, r6
>   64:   e1a00005        mov     r0, r5
>   68:   e89daff0        ldm     sp, {r4, r5, r6, r7, r8, r9, sl, fp, sp, pc}
> 
> Versatile:
> 
> 1. 12 additional bytes of code
> 2. same number of registers
> 3. worse instruction scheduling causing pipeline stall
> 
> Actual reading of variables/hardware is unaffected by this patch.
> 
> So, we have two platforms where this patch makes things visibly worse
> with no material benefit.
> 

I think the added barrier() are causing these pipeline stalls. They
don't allow the compiler to read variables such as oscr2ns_scale before
the barrier because gcc cannot assume it won't be modified. However, to
insure that OSCR read is done after __m_cnt_hi read, this barrier seems
required to be safe against gcc optimizations.

Have you compared my patch to Nicolas'patch, which adds a smp_rmb() in
the macro or to a vanilla tree ?

I wonder if reading those values sooner would help gcc ?

Mathieu


-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

  reply	other threads:[~2008-11-11 20:11 UTC|newest]

Thread overview: 123+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-11-07  5:23 [RFC patch 00/18] Trace Clock v2 Mathieu Desnoyers
2008-11-07  5:23 ` [RFC patch 01/18] get_cycles() : kconfig HAVE_GET_CYCLES Mathieu Desnoyers
2008-11-07  5:23 ` [RFC patch 02/18] get_cycles() : x86 HAVE_GET_CYCLES Mathieu Desnoyers
2008-11-07  5:23 ` [RFC patch 03/18] get_cycles() : sparc64 HAVE_GET_CYCLES Mathieu Desnoyers
2008-11-07  5:23 ` [RFC patch 04/18] get_cycles() : powerpc64 HAVE_GET_CYCLES Mathieu Desnoyers
2008-11-07 14:56   ` Josh Boyer
2008-11-07 18:14     ` Mathieu Desnoyers
2008-11-07  5:23 ` [RFC patch 05/18] get_cycles() : MIPS HAVE_GET_CYCLES_32 Mathieu Desnoyers
2008-11-07  5:23 ` [RFC patch 06/18] Trace clock generic Mathieu Desnoyers
2008-11-07  5:23 ` [RFC patch 07/18] Trace clock core Mathieu Desnoyers
2008-11-07  5:52   ` Andrew Morton
2008-11-07  6:16     ` Mathieu Desnoyers
2008-11-07  6:26       ` Andrew Morton
2008-11-07 16:12         ` Mathieu Desnoyers
2008-11-07 16:19           ` Andrew Morton
2008-11-07 18:16             ` Mathieu Desnoyers
2008-11-07  5:23 ` [RFC patch 08/18] cnt32_to_63 should use smp_rmb() Mathieu Desnoyers
2008-11-07  6:05   ` Andrew Morton
2008-11-07  8:12     ` Nicolas Pitre
2008-11-07  8:38       ` Andrew Morton
2008-11-07 11:20         ` David Howells
2008-11-07 15:01           ` Nicolas Pitre
2008-11-07 15:50             ` Andrew Morton
2008-11-07 16:21               ` David Howells
2008-11-07 16:29                 ` Andrew Morton
2008-11-07 17:10                   ` David Howells
2008-11-07 17:26                     ` Andrew Morton
2008-11-07 18:00                       ` Mathieu Desnoyers
2008-11-07 18:21                         ` Andrew Morton
2008-11-07 18:30                           ` Harvey Harrison
2008-11-07 18:42                             ` Mathieu Desnoyers
2008-11-07 18:33                           ` Mathieu Desnoyers
2008-11-07 18:36                           ` Linus Torvalds
2008-11-07 18:45                             ` Andrew Morton
2008-11-07 16:47               ` Nicolas Pitre
2008-11-07 16:55                 ` David Howells
2008-11-07 17:21                 ` Andrew Morton
2008-11-07 20:03                   ` Nicolas Pitre
2008-11-07 16:07             ` David Howells
2008-11-07 16:47           ` Mathieu Desnoyers
2008-11-07 17:04             ` David Howells
2008-11-07 17:17               ` Mathieu Desnoyers
2008-11-07 23:27                 ` David Howells
2008-11-07 20:11             ` Russell King
2008-11-07 21:36               ` Mathieu Desnoyers
2008-11-07 21:36                 ` Mathieu Desnoyers
2008-11-07 22:18                 ` Russell King
2008-11-07 22:36                   ` Mathieu Desnoyers
2008-11-07 22:36                     ` Mathieu Desnoyers
2008-11-07 23:41                   ` David Howells
2008-11-08  0:15                     ` Russell King
2008-11-08  0:45                       ` David Howells
2008-11-08 15:24                       ` Nicolas Pitre
2008-11-08 23:20                         ` [PATCH] clarify usage expectations for cnt32_to_63() Nicolas Pitre
2008-11-09  2:25                           ` Mathieu Desnoyers
2008-11-09  2:54                             ` Nicolas Pitre
2008-11-09  5:06                               ` Nicolas Pitre
2008-11-09  5:27                                 ` [PATCH v2] " Nicolas Pitre
2008-11-09  6:48                                   ` Mathieu Desnoyers
2008-11-09 13:34                                     ` Nicolas Pitre
2008-11-09 13:43                                       ` Russell King
2008-11-09 16:22                                       ` Mathieu Desnoyers
2008-11-10  4:20                                         ` Nicolas Pitre
2008-11-10  4:42                                           ` Andrew Morton
2008-11-10 21:34                                             ` Nicolas Pitre
2008-11-10 21:58                                               ` Andrew Morton
2008-11-10 23:15                                                 ` Nicolas Pitre
2008-11-10 23:22                                                   ` Andrew Morton
2008-11-10 23:38                                                     ` Steven Rostedt
2008-11-11  0:26                                                     ` Nicolas Pitre
2008-11-11 18:28                                                       ` [PATCH] convert cnt32_to_63 to inline Mathieu Desnoyers
2008-11-11 19:13                                                         ` Russell King
2008-11-11 20:11                                                           ` Mathieu Desnoyers [this message]
2008-11-11 20:11                                                             ` Mathieu Desnoyers
2008-11-11 21:51                                                             ` Russell King
2008-11-12  3:48                                                               ` Mathieu Desnoyers
2008-11-12  3:48                                                                 ` Mathieu Desnoyers
2008-11-11 21:00                                                         ` Nicolas Pitre
2008-11-11 21:13                                                           ` Russell King
2008-11-11 22:31                                                         ` David Howells
2008-11-11 22:37                                                           ` Peter Zijlstra
2008-11-12  1:13                                                             ` Steven Rostedt
2008-11-07 11:03       ` [RFC patch 08/18] cnt32_to_63 should use smp_rmb() David Howells
2008-11-07 16:51         ` Mathieu Desnoyers
2008-11-07 20:18           ` Nicolas Pitre
2008-11-07 23:55             ` David Howells
2008-11-07 10:59     ` David Howells
2008-11-07 10:55   ` David Howells
2008-11-07 17:09     ` Mathieu Desnoyers
2008-11-07 17:33       ` Steven Rostedt
2008-11-07 19:18         ` Mathieu Desnoyers
2008-11-07 19:32           ` Peter Zijlstra
2008-11-07 20:02             ` Mathieu Desnoyers
2008-11-07 20:45               ` Mathieu Desnoyers
2008-11-07 20:54                 ` Paul E. McKenney
2008-11-07 21:04                   ` Steven Rostedt
2008-11-08  0:34                     ` Paul E. McKenney
2008-11-07 21:16                   ` Mathieu Desnoyers
2008-11-07 23:50         ` David Howells
2008-11-08  0:55           ` Steven Rostedt
2008-11-09 11:51             ` David Howells
2008-11-09 14:31               ` Steven Rostedt
2008-11-09 16:18               ` Mathieu Desnoyers
2008-11-07 20:08       ` Steven Rostedt
2008-11-07 20:55         ` Paul E. McKenney
2008-11-07 21:27         ` Mathieu Desnoyers
2008-11-07 20:36       ` Nicolas Pitre
2008-11-07 20:55         ` Mathieu Desnoyers
2008-11-07 21:22           ` Nicolas Pitre
2008-11-07  5:23 ` [RFC patch 09/18] Powerpc : Trace clock Mathieu Desnoyers
2008-11-07  5:23 ` [RFC patch 10/18] Sparc64 " Mathieu Desnoyers
2008-11-07  5:45   ` David Miller
2008-11-07  5:23 ` [RFC patch 11/18] LTTng timestamp sh Mathieu Desnoyers
2008-11-07  5:23   ` Mathieu Desnoyers
2008-11-07  5:23 ` [RFC patch 12/18] LTTng - TSC synchronicity test Mathieu Desnoyers
2008-11-07  5:23 ` [RFC patch 13/18] x86 : remove arch-specific tsc_sync.c Mathieu Desnoyers
2008-11-07  5:23 ` [RFC patch 14/18] MIPS use tsc_sync.c Mathieu Desnoyers
2008-11-07  5:23 ` [RFC patch 15/18] MIPS : export hpt frequency for trace_clock Mathieu Desnoyers
2008-11-07  5:23 ` [RFC patch 16/18] MIPS create empty sync_core() Mathieu Desnoyers
2008-11-07  5:23 ` [RFC patch 17/18] MIPS : Trace clock Mathieu Desnoyers
2008-11-07 11:53   ` Peter Zijlstra
2008-11-07 17:44     ` Mathieu Desnoyers
2008-11-07  5:23 ` [RFC patch 18/18] x86 trace clock Mathieu Desnoyers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20081111201130.GA10801@Krystal \
    --to=mathieu.desnoyers@polymtl.ca \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=dhowells@redhat.com \
    --cc=linux-kernel@vger.k \
    --cc=mingo@elte.hu \
    --cc=nico@cam.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.