Re: rwsem benchmark [was Re: [PATCH] rw_semaphores, optimisations try #3]

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: David Howells <dhowells@warthog.cambridge.redhat.com>
To: Andrea Arcangeli <andrea@suse.de>
Cc: torvalds@transmeta.com, linux-kernel@vger.kernel.org, davem@redhat.com
Subject: Re: rwsem benchmark [was Re: [PATCH] rw_semaphores, optimisations try #3]
Date: Tue, 24 Apr 2001 09:56:11 +0100	[thread overview]
Message-ID: <5927.988102571@warthog.cambridge.redhat.com> (raw)
In-Reply-To: Your message of "Tue, 24 Apr 2001 06:56:33 +0200." <20010424065633.A16845@athlon.random>

> Ok I finished now my asm optimized rwsemaphores and I improved a little my
> spinlock based one but without touching the icache usage.

And I can break it. There's a very good reason the I changed __up_write() to
use CMPXCHG instead of SUBL. I found a sequence of operations that locked up
on this.

Unfortunately, I can't remember what it was. I do have it written down at
home, I think, so I see about sending it to you later.

>
> The main advantage of my rewrite are:
>
> -	my x86 version is visibly faster than yours in the write fast path and
> 	it also saves icache because it's smaller

That's because your fast write path is wrong. You can get away without
clobbering EDX and ECX, which I can't.

> (read fast path is reproducibly a bit faster too)

| +static inline void __down_read(struct rw_semaphore *sem)
...
| +			     : "+m" (sem->count), "+a" (sem)

>From what I've been told, you're lucky here... you avoid a pipeline stall
between whatever loads sem into EAX and the INCL that increments what it
points to because the "+a" constraint says that EAX will be changed. This
means that the compiler saves EAX before entering the inline asm, thus
delaying the INCL and thus avoiding the stall. However, you generally seem to
gain this at the expense of clobbering another register, say EDX.

So, for a function that just calls down_read twice, I get:
  20:   8b 44 24 04             mov    0x4(%esp,1),%eax
  24:   f0 ff 00                lock incl (%eax)
  27:   0f 88 22 00 00 00       js     4f <dr2+0x2f>
  2d:   f0 ff 00                lock incl (%eax)
  30:   0f 88 30 00 00 00       js     66 <dr2+0x46>
  36:   c3                      ret

And you get:
   0:   83 ec 08                sub    $0x8,%esp
   3:   8b 54 24 0c             mov    0xc(%esp,1),%edx
   7:   89 d0                   mov    %edx,%eax
   9:   f0 ff 02                lock incl (%edx)
   c:   0f 88 fc ff ff ff       js     e <dr+0xe>
  12:   89 d0                   mov    %edx,%eax
  14:   f0 ff 02                lock incl (%edx)
  17:   0f 88 0f 00 00 00       js     2c <dr2+0xc>
  1d:   58                      pop    %eax
  1e:   5a                      pop    %edx
  1f:   c3                      ret

Note also your asm constraints cause the compiler to eat an extra 8 bytes of
stack and then to pop it into registers to get rid of it. This is a gcc bug,
and will only hurt if the up_read and down_read are done in separate
subroutines.

| +static inline void __up_read(struct rw_semaphore *sem)
...
| +			     "jnz 1b\n\t"
| +			     "pushl %%ecx\n\t"
| +			     "call rwsem_wake\n\t"

Putting a PUSHL or two there hurt performance when I tried it, because, I'm
told, it introduces a pipeline stall.

> -	the common code automatically extends itself to support 2^32
>	concurrent sleepers on 64bit archs

You shouldn't do this in the XADD case, since you are explicitly using 32-bit
registers and instructions.

Actually, both of our generic cases allow 2^31 sleepers on a 32-bit arch, and
by changing mine to a long I can make it so we both support up to 2^63 on a
64-bit arch. However, I suspect that is overkill...

> -	there is no code duplication at all in supporting xchgadd common code
>       logic for other archs (and I prepared a skeleton to fill for the alpha)

Why not make it shareable? It doesn't have to be mandatory...

> So I'd suggest Linus to apply one of my above -8 patches for pre7. (I hope I
> won't need any secondary try)

I recommend against this at the moment (there's a bug in __up_write in his X86
optimised code).

David

next prev parent reply	other threads:[~2001-04-24  8:56 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2001-04-23 20:35 [PATCH] rw_semaphores, optimisations try #3 D.W.Howells
2001-04-23 21:34 ` Andrea Arcangeli
2001-04-24  4:56   ` rwsem benchmark [was Re: [PATCH] rw_semaphores, optimisations try #3] Andrea Arcangeli
2001-04-24  8:56     ` David Howells [this message]
2001-04-24  9:49       ` Andrea Arcangeli
2001-04-24 10:25         ` David Howells
2001-04-24 10:44           ` Andrea Arcangeli
2001-04-24 13:07             ` David Howells
2001-04-24 13:59               ` Andrea Arcangeli
2001-04-24 15:49             ` Linus Torvalds
2001-04-24 10:17       ` Andrea Arcangeli
2001-04-24 10:33         ` David Howells
2001-04-24 10:46           ` Andrea Arcangeli
2001-04-24 12:19             ` Andrea Arcangeli
2001-04-24 13:10               ` Andrea Arcangeli
2001-04-23 22:23 ` [PATCH] rw_semaphores, optimisations try #3 Linus Torvalds
2001-04-24 10:05   ` David Howells
2001-04-24 15:40     ` Linus Torvalds
2001-04-24 16:37       ` David Howells

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5927.988102571@warthog.cambridge.redhat.com \
    --to=dhowells@warthog.cambridge.redhat.com \
    --cc=andrea@suse.de \
    --cc=davem@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@transmeta.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox