All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Stephan Diestelhorst <stephan.diestelhorst@amd.com>
Cc: xen-devel@lists.xensource.com, "H. Peter Anvin" <hpa@zytor.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	Nick Piggin <npiggin@kernel.dk>, KVM <kvm@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	the arch/x86 maintainers <x86@kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Andi Kleen <andi@firstfloor.org>, Avi Kivity <avi@redhat.com>,
	Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>,
	Ingo Molnar <mingo@elte.hu>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [Xen-devel] [PATCH 00/10] [PATCH RFC V2] Paravirtualized ticketlocks
Date: Tue, 27 Sep 2011 09:44:02 -0700	[thread overview]
Message-ID: <4E81FD52.50106@goop.org> (raw)
In-Reply-To: <3300108.XQUp9Wrktc@chlor>

On 09/27/2011 02:34 AM, Stephan Diestelhorst wrote:
> On Wednesday 14 September 2011, 17:31:32 Jeremy Fitzhardinge wrote:
>> This series replaces the existing paravirtualized spinlock mechanism
>> with a paravirtualized ticketlock mechanism.
> [...] 
>> The unlock code is very straightforward:
>> 	prev = *lock;
>> 	__ticket_unlock_release(lock);
>> 	if (unlikely(__ticket_in_slowpath(lock)))
>> 		__ticket_unlock_slowpath(lock, prev);
>>
>> which generates:
>> 	push   %rbp
>> 	mov    %rsp,%rbp
>>
>>     movzwl (%rdi),%esi
>> 	addb   $0x2,(%rdi)
>>     movzwl (%rdi),%eax
>> 	testb  $0x1,%ah
>> 	jne    1f
>>
>> 	pop    %rbp
>> 	retq   
>>
>> 	### SLOWPATH START
>> 1:	movzwl (%rdi),%edx
>> 	movzbl %dh,%ecx
>> 	mov    %edx,%eax
>> 	and    $-2,%ecx	# clear TICKET_SLOWPATH_FLAG
>> 	mov    %cl,%dh
>> 	cmp    %dl,%cl	# test to see if lock is uncontended
>> 	je     3f
>>
>> 2:	movzbl %dl,%esi
>> 	callq  *__ticket_unlock_kick	# kick anyone waiting
>> 	pop    %rbp
>> 	retq   
>>
>> 3:	lock cmpxchg %dx,(%rdi)	# use cmpxchg to safely write back flag
>> 	jmp    2b
>> 	### SLOWPATH END
> [...]
>> Thoughts? Comments? Suggestions?
> You have a nasty data race in your code that can cause a losing
> acquirer to sleep forever, because its setting the TICKET_SLOWPATH flag
> can race with the lock holder releasing the lock.
>
> I used the code for the slow path from the GIT repo.
>
> Let me try to point out an interleaving:
>
> Lock is held by one thread, contains 0x0200.
>
> _Lock holder_                   _Acquirer_
>                                 mov    $0x200,%eax
>                                 lock xadd %ax,(%rdi)
>                                 // ax:= 0x0200, lock:= 0x0400
>                                 ...
>                                 // this guy spins for a while, reading
>                                 // the lock
>                                 ...
> //trying to free the lock
> movzwl (%rdi),%esi (esi:=0x0400)
> addb   $0x2,(%rdi) (LOCAL copy of lock is now: 0x0402)
> movzwl (%rdi),%eax (local forwarding from previous store: eax := 0x0402)
> testb  $0x1,%ah    (no wakeup of anybody)
> jne    1f
>
>                                 callq  *__ticket_lock_spinning
>                                   ...
>                                   // __ticket_enter_slowpath(lock)
>                                   lock or (%rdi), $0x100
>                                   // (global view of lock := 0x0500)
> 						...
>                                   ACCESS_ONCE(lock->tickets.head) == want
>                                   // (reads 0x00)
> 						...
>                                   xen_poll_irq(irq); // goes to sleep
> ...
> [addb   $0x2,(%rdi)]
> // (becomes globally visible only now! global view of lock := 0x0502)
> ...
>
> Your code is reusing the (just about) safe version of unlocking a
> spinlock without understanding the effect that close has on later
> memory ordering. It may work on CPUs that cannot do narrow -> wide
> store to load forwarding and have to make the addb store visible
> globally. This is an implementation artifact of specific uarches, and
> you mustn't rely on it, since our specified memory model allows looser
> behaviour.

Ah, thanks for this observation.  I've seen this bug before when I
didn't pay attention to the unlock W vs flag R ordering at all, and I
was hoping the aliasing would be sufficient - and certainly this seems
to have been OK on my Intel systems.  But you're saying that it will
fail on current AMD systems?  Have you tested this, or is this just from
code analysis (which I agree with after reviewing the ordering rules in
the Intel manual).

> Since you want to get that addb out to global memory before the second
> read, either use a LOCK prefix for it, add an MFENCE between addb and
> movzwl, or use a LOCKed instruction that will have a fencing effect
> (e.g., to top-of-stack)between addb and movzwl.

Hm.  I don't really want to do any of those because it will probably
have a significant effect on the unlock performance; I was really trying
to avoid adding any more locked instructions.  A previous version of the
code had an mfence in here, but I hit on the idea of using aliasing to
get the ordering I want - but overlooked the possible effect of store
forwarding.

I guess it comes down to throwing myself on the efficiency of some kind
of fence instruction.  I guess an lfence would be sufficient; is that
any more efficient than a full mfence?  At least I can make it so that
its only present when pv ticket locks are actually in use, so it won't
affect the native case.

Could you give me a pointer to AMD's description of the ordering rules?

Thanks,
    J

WARNING: multiple messages have this Message-ID (diff)
From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Stephan Diestelhorst <stephan.diestelhorst@amd.com>
Cc: xen-devel@lists.xensource.com, Nick Piggin <npiggin@kernel.dk>,
	KVM <kvm@vger.kernel.org>, Peter Zijlstra <peterz@infradead.org>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	the arch/x86 maintainers <x86@kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Andi Kleen <andi@firstfloor.org>, Avi Kivity <avi@redhat.com>,
	Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>,
	"H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@elte.hu>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH 00/10] [PATCH RFC V2] Paravirtualized ticketlocks
Date: Tue, 27 Sep 2011 09:44:02 -0700	[thread overview]
Message-ID: <4E81FD52.50106@goop.org> (raw)
In-Reply-To: <3300108.XQUp9Wrktc@chlor>

On 09/27/2011 02:34 AM, Stephan Diestelhorst wrote:
> On Wednesday 14 September 2011, 17:31:32 Jeremy Fitzhardinge wrote:
>> This series replaces the existing paravirtualized spinlock mechanism
>> with a paravirtualized ticketlock mechanism.
> [...] 
>> The unlock code is very straightforward:
>> 	prev = *lock;
>> 	__ticket_unlock_release(lock);
>> 	if (unlikely(__ticket_in_slowpath(lock)))
>> 		__ticket_unlock_slowpath(lock, prev);
>>
>> which generates:
>> 	push   %rbp
>> 	mov    %rsp,%rbp
>>
>>     movzwl (%rdi),%esi
>> 	addb   $0x2,(%rdi)
>>     movzwl (%rdi),%eax
>> 	testb  $0x1,%ah
>> 	jne    1f
>>
>> 	pop    %rbp
>> 	retq   
>>
>> 	### SLOWPATH START
>> 1:	movzwl (%rdi),%edx
>> 	movzbl %dh,%ecx
>> 	mov    %edx,%eax
>> 	and    $-2,%ecx	# clear TICKET_SLOWPATH_FLAG
>> 	mov    %cl,%dh
>> 	cmp    %dl,%cl	# test to see if lock is uncontended
>> 	je     3f
>>
>> 2:	movzbl %dl,%esi
>> 	callq  *__ticket_unlock_kick	# kick anyone waiting
>> 	pop    %rbp
>> 	retq   
>>
>> 3:	lock cmpxchg %dx,(%rdi)	# use cmpxchg to safely write back flag
>> 	jmp    2b
>> 	### SLOWPATH END
> [...]
>> Thoughts? Comments? Suggestions?
> You have a nasty data race in your code that can cause a losing
> acquirer to sleep forever, because its setting the TICKET_SLOWPATH flag
> can race with the lock holder releasing the lock.
>
> I used the code for the slow path from the GIT repo.
>
> Let me try to point out an interleaving:
>
> Lock is held by one thread, contains 0x0200.
>
> _Lock holder_                   _Acquirer_
>                                 mov    $0x200,%eax
>                                 lock xadd %ax,(%rdi)
>                                 // ax:= 0x0200, lock:= 0x0400
>                                 ...
>                                 // this guy spins for a while, reading
>                                 // the lock
>                                 ...
> //trying to free the lock
> movzwl (%rdi),%esi (esi:=0x0400)
> addb   $0x2,(%rdi) (LOCAL copy of lock is now: 0x0402)
> movzwl (%rdi),%eax (local forwarding from previous store: eax := 0x0402)
> testb  $0x1,%ah    (no wakeup of anybody)
> jne    1f
>
>                                 callq  *__ticket_lock_spinning
>                                   ...
>                                   // __ticket_enter_slowpath(lock)
>                                   lock or (%rdi), $0x100
>                                   // (global view of lock := 0x0500)
> 						...
>                                   ACCESS_ONCE(lock->tickets.head) == want
>                                   // (reads 0x00)
> 						...
>                                   xen_poll_irq(irq); // goes to sleep
> ...
> [addb   $0x2,(%rdi)]
> // (becomes globally visible only now! global view of lock := 0x0502)
> ...
>
> Your code is reusing the (just about) safe version of unlocking a
> spinlock without understanding the effect that close has on later
> memory ordering. It may work on CPUs that cannot do narrow -> wide
> store to load forwarding and have to make the addb store visible
> globally. This is an implementation artifact of specific uarches, and
> you mustn't rely on it, since our specified memory model allows looser
> behaviour.

Ah, thanks for this observation.  I've seen this bug before when I
didn't pay attention to the unlock W vs flag R ordering at all, and I
was hoping the aliasing would be sufficient - and certainly this seems
to have been OK on my Intel systems.  But you're saying that it will
fail on current AMD systems?  Have you tested this, or is this just from
code analysis (which I agree with after reviewing the ordering rules in
the Intel manual).

> Since you want to get that addb out to global memory before the second
> read, either use a LOCK prefix for it, add an MFENCE between addb and
> movzwl, or use a LOCKed instruction that will have a fencing effect
> (e.g., to top-of-stack)between addb and movzwl.

Hm.  I don't really want to do any of those because it will probably
have a significant effect on the unlock performance; I was really trying
to avoid adding any more locked instructions.  A previous version of the
code had an mfence in here, but I hit on the idea of using aliasing to
get the ordering I want - but overlooked the possible effect of store
forwarding.

I guess it comes down to throwing myself on the efficiency of some kind
of fence instruction.  I guess an lfence would be sufficient; is that
any more efficient than a full mfence?  At least I can make it so that
its only present when pv ticket locks are actually in use, so it won't
affect the native case.

Could you give me a pointer to AMD's description of the ordering rules?

Thanks,
    J

  reply	other threads:[~2011-09-27 16:44 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-09-15  0:31 [PATCH 00/10] [PATCH RFC V2] Paravirtualized ticketlocks Jeremy Fitzhardinge
2011-09-15  0:31 ` [PATCH 01/10] x86/ticketlocks: remove obsolete comment Jeremy Fitzhardinge
2011-09-15  0:31 ` [PATCH 02/10] x86/spinlocks: replace pv spinlocks with pv ticketlocks Jeremy Fitzhardinge
2011-09-15  0:31 ` [PATCH 03/10] x86/ticketlock: don't inline _spin_unlock when using paravirt spinlocks Jeremy Fitzhardinge
2011-09-15  0:31 ` [PATCH 04/10] x86/ticketlock: collapse a layer of functions Jeremy Fitzhardinge
2011-09-15  0:31 ` [PATCH 05/10] xen/pvticketlock: Xen implementation for PV ticket locks Jeremy Fitzhardinge
2011-09-15  0:31 ` [PATCH 06/10] x86/pvticketlock: use callee-save for lock_spinning Jeremy Fitzhardinge
2011-09-15  0:31 ` [PATCH 07/10] x86/ticketlocks: when paravirtualizing ticket locks, increment by 2 Jeremy Fitzhardinge
2011-09-15  0:31   ` Jeremy Fitzhardinge
2011-09-15  0:31 ` [PATCH 08/10] x86/ticketlock: add slowpath logic Jeremy Fitzhardinge
2011-09-15  0:31 ` [PATCH 09/10] xen/pvticketlock: allow interrupts to be enabled while blocking Jeremy Fitzhardinge
2011-09-15  0:31 ` [PATCH 10/10] xen: enable PV ticketlocks on HVM Xen Jeremy Fitzhardinge
2011-09-27  9:34 ` [Xen-devel] [PATCH 00/10] [PATCH RFC V2] Paravirtualized ticketlocks Stephan Diestelhorst
2011-09-27  9:34   ` Stephan Diestelhorst
2011-09-27  9:34   ` Stephan Diestelhorst
2011-09-27 16:44   ` Jeremy Fitzhardinge [this message]
2011-09-27 16:44     ` Jeremy Fitzhardinge
2011-09-28 13:58     ` [Xen-devel] " Stephan Diestelhorst
2011-09-28 16:44       ` Jeremy Fitzhardinge
2011-09-28 18:13         ` Stephan Diestelhorst
2011-09-28 15:38     ` Linus Torvalds
2011-09-28 15:55       ` Jan Beulich
2011-09-28 15:55         ` Jan Beulich
2011-09-28 16:10         ` Linus Torvalds
2011-09-28 16:47           ` Jeremy Fitzhardinge
2011-09-28 17:22             ` Linus Torvalds
2011-09-28 17:24               ` H. Peter Anvin
2011-09-28 17:50                 ` Jeremy Fitzhardinge
2011-09-28 18:08                   ` Stephan Diestelhorst
2011-09-28 18:27                     ` Jeremy Fitzhardinge
2011-09-28 18:49                     ` Linus Torvalds
2011-09-28 19:06                       ` Jeremy Fitzhardinge
2011-10-06 14:04                       ` Stephan Diestelhorst
2011-10-06 17:40                         ` Jeremy Fitzhardinge
2011-10-06 18:09                           ` Jeremy Fitzhardinge
2011-10-10  7:32                             ` Ingo Molnar
2011-10-10 19:51                               ` Jeremy Fitzhardinge
2011-10-10 11:00                           ` Stephan Diestelhorst
2011-10-10 11:00                             ` Stephan Diestelhorst
2011-10-10 14:01                             ` Stephan Diestelhorst
2011-10-10 14:01                               ` Stephan Diestelhorst
2011-10-10 19:44                               ` Jeremy Fitzhardinge

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4E81FD52.50106@goop.org \
    --to=jeremy@goop.org \
    --cc=andi@firstfloor.org \
    --cc=avi@redhat.com \
    --cc=hpa@zytor.com \
    --cc=jeremy.fitzhardinge@citrix.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=mtosatti@redhat.com \
    --cc=npiggin@kernel.dk \
    --cc=peterz@infradead.org \
    --cc=stephan.diestelhorst@amd.com \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    --cc=xen-devel@lists.xensource.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.