public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Nicolas Pitre <nico@cam.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	torvalds@linux-foundation.org, dhowells@redhat.com,
	mingo@elte.hu, a.p.zijlstra@chello.nl,
	linux-kernel@vger.kernel.org, ralf@linux-mips.org,
	benh@kernel.crashing.org, paulus@samba.org, davem@davemloft.net,
	mingo@redhat.com, tglx@linutronix.de, rostedt@goodmis.org,
	linux-arch@vger.kernel.org
Subject: Re: [PATCH] convert cnt32_to_63 to inline
Date: Tue, 11 Nov 2008 15:11:30 -0500	[thread overview]
Message-ID: <20081111201130.GA10801@Krystal> (raw)
In-Reply-To: <20081111191355.GA13724@flint.arm.linux.org.uk>

* Russell King (rmk+lkml@arm.linux.org.uk) wrote:
> On Tue, Nov 11, 2008 at 01:28:00PM -0500, Mathieu Desnoyers wrote:
> > Let's see what it gives once implemented. Only compile-tested. Assumes
> > pxa, sa110 and mn10300 are all UP-only. Correct smp_rmb() are used for
> > arm versatile.
> 
> Versatile is also UP only.
> 
> The following are results from PXA built with gcc 3.4.3:
> 
> 1. two additional registers used in sched_clock()
> 2. 8 additional bytes of code (which are needless if gcc was more inteligent)
> 
> both of these I put down to inefficiencies in gcc's register allocation.
> 
> 3. worse instruction scheduling - two inter-dependent loads next to each
>    other causing a pipeline stall
> 
> Actual reading of variables/hardware is unaffected by this patch.
> 
> Old code:
> 
>    c:   e59f3050        ldr     r3, [pc, #80]   ; load address of oscr2ns_scale
>   10:   e59fc050        ldr     ip, [pc, #80]   ; load address of __m_cnt_hi
>   14:   e5932000        ldr     r2, [r3]	; read oscr2ns_scale
>   18:   e59f304c        ldr     r3, [pc, #76]   ; load address of OSCR
>   1c:   e59c1000        ldr     r1, [ip]	; read __m_cnt_hi
>   20:   e1a07002        mov     r7, r2
>   24:   e3a08000        mov     r8, #0  ; 0x0
>   28:   e5933000        ldr     r3, [r3]	; read OSCR register
> ...
>   58:   e1820b04        orr     r0, r2, r4, lsl #22
>   5c:   e1a01524        lsr     r1, r4, #10
>   60:   e89da9f0        ldm     sp, {r4, r5, r6, r7, r8, fp, sp, pc}
> 
> 
> New code:
> 
>    c:   e59f0058        ldr     r0, [pc, #88]   ; load address of oscr2ns_scale
>   10:   e5901000        ldr     r1, [r0]	; read oscr2ns_scale <= pipeline stall
>   14:   e59f3054        ldr     r3, [pc, #84]   ; load address of __m_cnt_hi
>   18:   e1a08001        mov     r8, r1
>   1c:   e5932000        ldr     r2, [r3]	; read __m_cnt_hi
>   20:   e59f304c        ldr     r3, [pc, #76]   ; load address of OSCR
>   24:   e1a09002        mov     r9, r2
>   28:   e3a0a000        mov     sl, #0  ; 0x0
>   2c:   e5933000        ldr     r3, [r3]	; read OSCR
> ...
>   58:   e1825b04        orr     r5, r2, r4, lsl #22
>   5c:   e1a06524        lsr     r6, r4, #10
>   60:   e1a01006        mov     r1, r6
>   64:   e1a00005        mov     r0, r5
>   68:   e89daff0        ldm     sp, {r4, r5, r6, r7, r8, r9, sl, fp, sp, pc}
> 
> Versatile:
> 
> 1. 12 additional bytes of code
> 2. same number of registers
> 3. worse instruction scheduling causing pipeline stall
> 
> Actual reading of variables/hardware is unaffected by this patch.
> 
> So, we have two platforms where this patch makes things visibly worse
> with no material benefit.
> 

I think the added barrier() are causing these pipeline stalls. They
don't allow the compiler to read variables such as oscr2ns_scale before
the barrier because gcc cannot assume it won't be modified. However, to
insure that OSCR read is done after __m_cnt_hi read, this barrier seems
required to be safe against gcc optimizations.

Have you compared my patch to Nicolas'patch, which adds a smp_rmb() in
the macro or to a vanilla tree ?

I wonder if reading those values sooner would help gcc ?

Mathieu


-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

  reply	other threads:[~2008-11-11 20:11 UTC|newest]

Thread overview: 118+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-11-07  5:23 [RFC patch 00/18] Trace Clock v2 Mathieu Desnoyers
2008-11-07  5:23 ` [RFC patch 01/18] get_cycles() : kconfig HAVE_GET_CYCLES Mathieu Desnoyers
2008-11-07  5:23 ` [RFC patch 02/18] get_cycles() : x86 HAVE_GET_CYCLES Mathieu Desnoyers
2008-11-07  5:23 ` [RFC patch 03/18] get_cycles() : sparc64 HAVE_GET_CYCLES Mathieu Desnoyers
2008-11-07  5:23 ` [RFC patch 04/18] get_cycles() : powerpc64 HAVE_GET_CYCLES Mathieu Desnoyers
2008-11-07 14:56   ` Josh Boyer
2008-11-07 18:14     ` Mathieu Desnoyers
2008-11-07  5:23 ` [RFC patch 05/18] get_cycles() : MIPS HAVE_GET_CYCLES_32 Mathieu Desnoyers
2008-11-07  5:23 ` [RFC patch 06/18] Trace clock generic Mathieu Desnoyers
2008-11-07  5:23 ` [RFC patch 07/18] Trace clock core Mathieu Desnoyers
2008-11-07  5:52   ` Andrew Morton
2008-11-07  6:16     ` Mathieu Desnoyers
2008-11-07  6:26       ` Andrew Morton
2008-11-07 16:12         ` Mathieu Desnoyers
2008-11-07 16:19           ` Andrew Morton
2008-11-07 18:16             ` Mathieu Desnoyers
2008-11-07  5:23 ` [RFC patch 08/18] cnt32_to_63 should use smp_rmb() Mathieu Desnoyers
2008-11-07  6:05   ` Andrew Morton
2008-11-07  8:12     ` Nicolas Pitre
2008-11-07  8:38       ` Andrew Morton
2008-11-07 11:20         ` David Howells
2008-11-07 15:01           ` Nicolas Pitre
2008-11-07 15:50             ` Andrew Morton
2008-11-07 16:21               ` David Howells
2008-11-07 16:29                 ` Andrew Morton
2008-11-07 17:10                   ` David Howells
2008-11-07 17:26                     ` Andrew Morton
2008-11-07 18:00                       ` Mathieu Desnoyers
2008-11-07 18:21                         ` Andrew Morton
2008-11-07 18:30                           ` Harvey Harrison
2008-11-07 18:42                             ` Mathieu Desnoyers
2008-11-07 18:33                           ` Mathieu Desnoyers
2008-11-07 18:36                           ` Linus Torvalds
2008-11-07 18:45                             ` Andrew Morton
2008-11-07 16:47               ` Nicolas Pitre
2008-11-07 16:55                 ` David Howells
2008-11-07 17:21                 ` Andrew Morton
2008-11-07 20:03                   ` Nicolas Pitre
2008-11-07 16:07             ` David Howells
2008-11-07 16:47           ` Mathieu Desnoyers
2008-11-07 17:04             ` David Howells
2008-11-07 17:17               ` Mathieu Desnoyers
2008-11-07 23:27                 ` David Howells
2008-11-07 20:11             ` Russell King
2008-11-07 21:36               ` Mathieu Desnoyers
2008-11-07 22:18                 ` Russell King
2008-11-07 22:36                   ` Mathieu Desnoyers
2008-11-07 23:41                   ` David Howells
2008-11-08  0:15                     ` Russell King
2008-11-08  0:45                       ` David Howells
2008-11-08 15:24                       ` Nicolas Pitre
2008-11-08 23:20                         ` [PATCH] clarify usage expectations for cnt32_to_63() Nicolas Pitre
2008-11-09  2:25                           ` Mathieu Desnoyers
2008-11-09  2:54                             ` Nicolas Pitre
2008-11-09  5:06                               ` Nicolas Pitre
2008-11-09  5:27                                 ` [PATCH v2] " Nicolas Pitre
2008-11-09  6:48                                   ` Mathieu Desnoyers
2008-11-09 13:34                                     ` Nicolas Pitre
2008-11-09 13:43                                       ` Russell King
2008-11-09 16:22                                       ` Mathieu Desnoyers
2008-11-10  4:20                                         ` Nicolas Pitre
2008-11-10  4:42                                           ` Andrew Morton
2008-11-10 21:34                                             ` Nicolas Pitre
2008-11-10 21:58                                               ` Andrew Morton
2008-11-10 23:15                                                 ` Nicolas Pitre
2008-11-10 23:22                                                   ` Andrew Morton
2008-11-10 23:38                                                     ` Steven Rostedt
2008-11-11  0:26                                                     ` Nicolas Pitre
2008-11-11 18:28                                                       ` [PATCH] convert cnt32_to_63 to inline Mathieu Desnoyers
2008-11-11 19:13                                                         ` Russell King
2008-11-11 20:11                                                           ` Mathieu Desnoyers [this message]
2008-11-11 21:51                                                             ` Russell King
2008-11-12  3:48                                                               ` Mathieu Desnoyers
2008-11-11 21:00                                                         ` Nicolas Pitre
2008-11-11 21:13                                                           ` Russell King
2008-11-11 22:31                                                         ` David Howells
2008-11-11 22:37                                                           ` Peter Zijlstra
2008-11-12  1:13                                                             ` Steven Rostedt
2008-11-07 11:03       ` [RFC patch 08/18] cnt32_to_63 should use smp_rmb() David Howells
2008-11-07 16:51         ` Mathieu Desnoyers
2008-11-07 20:18           ` Nicolas Pitre
2008-11-07 23:55             ` David Howells
2008-11-07 10:59     ` David Howells
2008-11-07 10:55   ` David Howells
2008-11-07 17:09     ` Mathieu Desnoyers
2008-11-07 17:33       ` Steven Rostedt
2008-11-07 19:18         ` Mathieu Desnoyers
2008-11-07 19:32           ` Peter Zijlstra
2008-11-07 20:02             ` Mathieu Desnoyers
2008-11-07 20:45               ` Mathieu Desnoyers
2008-11-07 20:54                 ` Paul E. McKenney
2008-11-07 21:04                   ` Steven Rostedt
2008-11-08  0:34                     ` Paul E. McKenney
2008-11-07 21:16                   ` Mathieu Desnoyers
2008-11-07 23:50         ` David Howells
2008-11-08  0:55           ` Steven Rostedt
2008-11-09 11:51             ` David Howells
2008-11-09 14:31               ` Steven Rostedt
2008-11-09 16:18               ` Mathieu Desnoyers
2008-11-07 20:08       ` Steven Rostedt
2008-11-07 20:55         ` Paul E. McKenney
2008-11-07 21:27         ` Mathieu Desnoyers
2008-11-07 20:36       ` Nicolas Pitre
2008-11-07 20:55         ` Mathieu Desnoyers
2008-11-07 21:22           ` Nicolas Pitre
2008-11-07  5:23 ` [RFC patch 09/18] Powerpc : Trace clock Mathieu Desnoyers
2008-11-07  5:23 ` [RFC patch 10/18] Sparc64 " Mathieu Desnoyers
2008-11-07  5:45   ` David Miller
2008-11-07  5:23 ` [RFC patch 11/18] LTTng timestamp sh Mathieu Desnoyers
2008-11-07  5:23 ` [RFC patch 12/18] LTTng - TSC synchronicity test Mathieu Desnoyers
2008-11-07  5:23 ` [RFC patch 13/18] x86 : remove arch-specific tsc_sync.c Mathieu Desnoyers
2008-11-07  5:23 ` [RFC patch 14/18] MIPS use tsc_sync.c Mathieu Desnoyers
2008-11-07  5:23 ` [RFC patch 15/18] MIPS : export hpt frequency for trace_clock Mathieu Desnoyers
2008-11-07  5:23 ` [RFC patch 16/18] MIPS create empty sync_core() Mathieu Desnoyers
2008-11-07  5:23 ` [RFC patch 17/18] MIPS : Trace clock Mathieu Desnoyers
2008-11-07 11:53   ` Peter Zijlstra
2008-11-07 17:44     ` Mathieu Desnoyers
2008-11-07  5:23 ` [RFC patch 18/18] x86 trace clock Mathieu Desnoyers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20081111201130.GA10801@Krystal \
    --to=mathieu.desnoyers@polymtl.ca \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=benh@kernel.crashing.org \
    --cc=davem@davemloft.net \
    --cc=dhowells@redhat.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=mingo@redhat.com \
    --cc=nico@cam.org \
    --cc=paulus@samba.org \
    --cc=ralf@linux-mips.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox