Re: [PATCH] rw_semaphores, optimisations try #4

From: David Howells <dhowells@warthog.cambridge.redhat.com>
To: unlisted-recipients:; (no To-header on input)@localhost.localdomain
Cc: linux-kernel@vger.kernel.org
Subject: Re: [PATCH] rw_semaphores, optimisations try #4
Date: Thu, 26 Apr 2001 08:39:16 +0100
Message-ID: <8005.988270756@warthog.cambridge.redhat.com>
In-Reply-To: Your message of "Wed, 25 Apr 2001 22:56:21 +0200." <20010425225621.B13531@athlon.random>

Andrea Arcangeli <andrea@suse.de> wrote:
> It seems more similar to my code btw (you finally killed the useless
> cmpxchg ;).

CMPXCHG ought to make things better by avoiding the XADD(+1)/XADD(-1) loop;
however, I tried various combinations and XADD beats CMPXCHG significantly.
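
For readers who want the trade-off spelled out, here is a minimal sketch in
C11 atomics. It is not the code from the patch: toy_rwsem, TOY_WRITE_BIAS and
the toy_* functions are hypothetical names, and the counter layout (zero or
positive means that many readers, negative means a writer is present) is an
assumption made for illustration. On x86, atomic_fetch_add typically compiles
to LOCK XADD and atomic_compare_exchange_weak to LOCK CMPXCHG.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy counter, NOT the kernel's rw_semaphore layout:
 * count >= 0 is the number of readers, count < 0 means a writer is present. */
typedef struct {
    atomic_int count;
} toy_rwsem;

#define TOY_WRITE_BIAS (-0x10000)

/* XADD style: increment unconditionally, then undo if a writer was seen.
 * The success path is a single atomic add; the failure path needs a second
 * atomic operation to back the increment out again. */
static bool toy_read_trylock_xadd(toy_rwsem *s)
{
    int old = atomic_fetch_add(&s->count, 1);   /* the "XADD(+1)" */
    if (old >= 0)
        return true;
    atomic_fetch_sub(&s->count, 1);             /* the "XADD(-1)" undo */
    return false;
}

/* CMPXCHG style: only commit the increment if no writer is present, so
 * nothing ever has to be undone, at the cost of a load plus a retry loop. */
static bool toy_read_trylock_cmpxchg(toy_rwsem *s)
{
    int old = atomic_load(&s->count);
    while (old >= 0) {
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&s->count, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop one reader reference taken by either trylock variant. */
static void toy_read_unlock(toy_rwsem *s)
{
    int old = atomic_fetch_sub(&s->count, 1);
    assert(old > 0);   /* caller must actually hold a read lock */
}
```

Both variants leave the counter in the same state; they differ only in which
instruction the uncontended path executes and in how much work a failed
attempt costs, which is what the timings below are about.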

Here's a quote from a Borland assembler manual I managed to dig out, giving
i486 timings on memory access:

       ADDL/SUBL	3 cycles
       XADDL		4 cycles
       CMPXCHG		8 cycles (success) / 10 cycles (failure)
       LOCK		+1 cycle minimum on this CPU

In practice, though, XADDL gives at least as good a result as ADDL/SUBL, maybe
even slightly better, but it's hard to say. However, the penalty imposed on
the other CPU (when it has to flush its cache) probably more than makes up
for the difference.

> I only had a short look at your attached patch, but the results are quite
> suspect to my eyes because we should still be equally fast in the fast
> path and I should still beat you on the write fast path because I do a
> much faster subl; js while you do movl -1; xadd ; js, while according to
> your results you beat me on both. Do you have an explanation or do you
> not know the reason either?

	MOVL $1,EDX
	SUBL EDX,(EAX)

Works out faster than:

	SUBL $1,(EAX)

as well... probably due to an avoided stall when the instruction before the
snippet loads EAX from memory. Oh yes... "STC, SUBL" may also be faster.

> I will re-benchmark the whole thing shortly. But before I re-benchmark, if
> you have time, could you fix the benchmark to use the variable pointer and
> send me a new tarball?  For your code it probably doesn't matter because you
> dereference the pointer by hand anyway, but it matters for mine and we want
> to benchmark the real-world fast path of course.

No, not till this evening now, I'm afraid.

As for real-world benchmarks, I suspect the fastpath is going to be
sufficiently few cycles that it's drowned out by whatever bit of code is
actually using it, like my Wine server module, which is where all this started
for me.

David
