From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Stephan Diestelhorst <stephan.diestelhorst@amd.com>
Cc: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
"H. Peter Anvin" <hpa@zytor.com>,
Marcelo Tosatti <mtosatti@redhat.com>,
Nick Piggin <npiggin@kernel.dk>, KVM <kvm@vger.kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
the arch/x86 maintainers <x86@kernel.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Andi Kleen <andi@firstfloor.org>, Avi Kivity <avi@redhat.com>,
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>,
Ingo Molnar <mingo@elte.hu>,
Linus Torvalds <torvalds@linux-foundation.org>,
Jan Beulich <JBeulich@suse.com>
Subject: Re: [Xen-devel] [PATCH 00/10] [PATCH RFC V2] Paravirtualized ticketlocks
Date: Wed, 28 Sep 2011 09:44:25 -0700
Message-ID: <4E834EE9.70102@goop.org>
In-Reply-To: <33782147.oLTY4kzH1r@chlor>
On 09/28/2011 06:58 AM, Stephan Diestelhorst wrote:
> I have tested this and have not seen it fail on publicly released AMD
> systems. But as I have tried to point out, this does not mean it is
> safe to do in software, because future microarchitectures may have more
> capable forwarding engines.
Sure.
>> Have you tested this, or is this just from code analysis (which I
>> agree with after reviewing the ordering rules in the Intel manual)?
> We have found a similar issue in Novell's PV ticket lock implementation
> during internal product testing.
Jan may have picked it up from an earlier set of my patches.
>>> Since you want to get that addb out to global memory before the second
>>> read, either use a LOCK prefix for it, add an MFENCE between addb and
>>> movzwl, or use a LOCKed instruction that will have a fencing effect
>>> (e.g., to top-of-stack) between addb and movzwl.
>> Hm. I don't really want to do any of those because it will probably
>> have a significant effect on the unlock performance; I was really trying
>> to avoid adding any more locked instructions. A previous version of the
>> code had an mfence in here, but I hit on the idea of using aliasing to
>> get the ordering I want - but overlooked the possible effect of store
>> forwarding.
> Well, I'd be curious about the actual performance impact. If the store
> needs to commit to memory due to aliasing anyways, this would slow down
> execution, too. After all it is better to write working than fast code,
> no? ;-)
Rule of thumb is that AMD tends to do things like lock and fence more
efficiently than Intel - at least historically. I don't know if that's
still true for current Intel microarchitectures.
>> I guess it comes down to throwing myself on the efficiency of some kind
>> of fence instruction. I guess an lfence would be sufficient; is that
>> any more efficient than a full mfence?
> An lfence should not be sufficient, since that essentially is a NOP on
> WB memory. You really want a full fence here, since the store needs to
> be published before reading the lock with the next load.
The Intel manual reads:
    Reads cannot pass earlier LFENCE and MFENCE instructions.
    Writes cannot pass earlier LFENCE, SFENCE, and MFENCE instructions.
    LFENCE instructions cannot pass earlier reads.
Which I interpreted as meaning that an lfence would prevent forwarding.
But I guess it doesn't say "lfence instructions cannot pass earlier
writes", which means that the lfence could logically happen before the
write, thereby allowing forwarding? Or should I be reading this some
other way?
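To make the ordering requirement concrete, here is a minimal sketch in C11 atomics. It is not the code from the patch series: the struct layout, the names TICKET_INC and TICKET_SLOWPATH_FLAG, and the function unlock_needs_kick() are all assumptions made for illustration, and head and tail are modelled as two separate bytes rather than the aliased byte/word accesses discussed above. It only shows where a full fence would sit between the store that releases the lock and the load that checks for a blocked waiter.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical constants: tickets advance by 2, bit 0 of tail is a flag. */
#define TICKET_INC           2u
#define TICKET_SLOWPATH_FLAG 1u

/* Hypothetical lock layout, for illustration only. */
struct ticketlock {
    _Atomic uint8_t head;   /* ticket currently being served */
    _Atomic uint8_t tail;   /* next ticket to hand out, plus flag bit */
};

/*
 * Release the lock, then report whether a waiter has set the slowpath
 * flag and therefore needs to be kicked. Returns 1 if so, 0 otherwise.
 */
static int unlock_needs_kick(struct ticketlock *lock)
{
    /* Only the lock holder writes head, so a relaxed read is enough. */
    uint8_t head = atomic_load_explicit(&lock->head, memory_order_relaxed);

    /* The store that releases the lock (the "addb" in the discussion). */
    atomic_store_explicit(&lock->head, (uint8_t)(head + TICKET_INC),
                          memory_order_release);

    /*
     * Full fence: orders the store above before the load below.
     * A load-only (acquire) fence would not order an earlier store
     * against a later load, which is the case that matters here.
     */
    atomic_thread_fence(memory_order_seq_cst);

    /* The load that checks for a blocked waiter. */
    uint8_t tail = atomic_load_explicit(&lock->tail, memory_order_relaxed);

    return (tail & TICKET_SLOWPATH_FLAG) != 0;
}
```

On x86, compilers typically implement the seq_cst fence with an mfence or a locked instruction, so this sketch pays the cost being debated above. It does not answer whether a cheaper sequence could be made safe.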
>> Could you give me a pointer to AMD's description of the ordering rules?
> They should be in "AMD64 Architecture Programmer's Manual Volume 2:
> System Programming", Section 7.2 Multiprocessor Memory Access Ordering.
>
> http://developer.amd.com/documentation/guides/pages/default.aspx#manuals
>
> Let me know if you have some clarifying suggestions. We are currently
> revising these documents...
I find the English descriptions of these kinds of things frustrating to
read because of ambiguities in the precise meaning of words like "pass",
"ahead", "behind" in these contexts. I find the prose useful to get an
overview, but when I have a specific question I wonder if something more
formal would be useful.
I guess it's implied that anything that is not prohibited by the
ordering rules is allowed, but it wouldn't hurt to say it explicitly.
That said, the AMD description seems clearer and more explicit than the
Intel manual (esp since it specifically discusses the problem here).
Thanks,
J