Re: [RFC] unify semaphore implementations

public inbox for linux-arch@vger.kernel.org
 help / color / mirror / Atom feed

From: Russell King <rmk+lkml@arm.linux.org.uk>
To: Benjamin LaHaise <bcrl@kvack.org>
Cc: linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC] unify semaphore implementations
Date: Thu, 28 Apr 2005 23:40:05 +0100	[thread overview]
Message-ID: <20050428234005.A19778@flint.arm.linux.org.uk> (raw)
In-Reply-To: <20050428182926.GC16545@kvack.org>; from bcrl@kvack.org on Thu, Apr 28, 2005 at 02:29:26PM -0400

On Thu, Apr 28, 2005 at 02:29:26PM -0400, Benjamin LaHaise wrote:
> Please review the following series of patches for unifying the 
> semaphore implementation across all architectures (not posted as 
> they're about 350K), as they have only been tested on x86-64.  The 
> code generated is functionally identical to the earlier i386 
> variant, but since gcc has no way of taking condition codes as 
> results, there are two additional instructions inserted from the 
> use of generic atomic operations.  All told the >6000 lines of code 
> deleted makes for a much easier job for subsequent patches changing 
> semaphore functionality.  Cheers,

I'm not sure why we're doing this, apart from a desire to unify stuff.
What happened to efficiency and performance?

It is my understanding that the inline part of the semaphore
implementation was one of the critical areas - critical enough to
warrant coding it in assembly for some people.

So, I set about testing this implementation against the ARM
implementation.  For this, I used this code:

void it(struct semaphore *sem, int *val)
{
        down(sem);
        *val = *val + 1;
        up(sem);
}

which is a relatively simple test case.

The ARM assembly implementation for this gave:

        str     lr, [sp, #-4]!
        @ down_op
        mrs     ip, cpsr		@ local_irq_save
        orr     lr, ip, #128
        msr     cpsr_c, lr
        ldr     lr, [r0]
        subs    lr, lr, #1
        str     lr, [r0]
        msr     cpsr_c, ip		@ local_irq_restore
        movmi   ip, r0
        blmi    __down_failed
        ldr     r3, [r1, #0]
        add     r3, r3, #1
        str     r3, [r1, #0]
        @ up_op
        mrs     ip, cpsr		@ local_irq_save
        orr     lr, ip, #128
        msr     cpsr_c, lr
        ldr     lr, [r0]
        adds    lr, lr, #1
        str     lr, [r0]
        msr     cpsr_c, ip		@ local_irq_restore
        movle   ip, r0
        blle    __up_wakeup
        ldr     pc, [sp], #4

Stack: 1 location
Registers: 3 on top of the two arguments
Instructions: 23, 23 in the common execution path.

The atomic-op based implementation gave:

        stmfd   sp!, {r4, r5, lr}
        mov     r5, r1
        mov     r4, r0
        mrs     r2, cpsr                @ local_irq_save
        orr     r1, r2, #128
        msr     cpsr_c, r1
        ldr     r3, [r0, #0]
        sub     r3, r3, #1
        str     r3, [r0, #0]
        msr     cpsr_c, r2              @ local_irq_restore
        cmp     r3, #0
        blt     .L8
.L4:    ldr     r3, [r5, #0]
        add     r3, r3, #1
        str     r3, [r5, #0]
        mrs     r2, cpsr                @ local_irq_save
        orr     r1, r2, #128
        msr     cpsr_c, r1
        ldr     r3, [r4, #0]
        add     r3, r3, #1
        str     r3, [r4, #0]
        msr     cpsr_c, r2              @ local_irq_restore
        cmp     r3, #0
        mov     r0, r4
        ldmgtfd sp!, {r4, r5, pc}
        ldmfd   sp!, {r4, r5, lr}
        b       up_wakeup
.L8:    bl      down_failed
        b       .L4

Stack: 3 locations
Registers: 5 on top of the two arguments
Instructions: 29, 27 in the common execution path.

So, Ben's implementation is more expensive in terms of stack usage,
register usage, and number of instructions.

Why is this?  The answer is that gcc is unable to properly optimise
for ARM - I've no idea why.  One such simple thing is that the
sequence:

	blt	.L8
.L4:
...
.L8:	bl	down_failed
	b	.L4

can be easily rewritten as:

	bllt	down_failed

Another reason for the extra code bloat is that GCC can't propagate
PSR flags between the sub/add instructions across a memory barrier
(which local_irq_restore is.)

This makes the ARM assembly code implementation superior to any unified
version, no matter how you look at it from efficiency or performance
points of view.  The _only_ win is that of unification.

However, given that the kernel has bloated itself by 300K per stable
kernel series on ARM for the same functionality, I'd prefer to keep
my more efficient higher performance smaller code size semaphores
please.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core

next prev parent reply	other threads:[~2005-04-28 22:40 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-04-28 18:29 [RFC] unify semaphore implementations Benjamin LaHaise
2005-04-28 18:48 ` James Bottomley
2005-04-28 18:59   ` Benjamin LaHaise
2005-04-28 18:53     ` David S. Miller
2005-04-28 22:40 ` Russell King [this message]
2005-04-29  0:42   ` Trond Myklebust
2005-04-29  1:26     ` Paul Mackerras
2005-04-28 22:54 ` David Howells
2005-04-29  0:44 ` Paul Mackerras
2005-04-29  5:33   ` Richard Henderson
2005-04-29 14:14   ` Benjamin LaHaise
2005-04-29 15:42     ` Trond Myklebust
2005-04-30  1:45       ` Paul Mackerras
2005-04-30  5:13         ` Paul Mackerras
2005-04-30 16:40         ` Trond Myklebust
2005-04-30  1:49     ` Paul Mackerras
2005-04-30 16:50       ` Trond Myklebust

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20050428234005.A19778@flint.arm.linux.org.uk \
    --to=rmk+lkml@arm.linux.org.uk \
    --cc=bcrl@kvack.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox