All of lore.kernel.org
 help / color / mirror / Atom feed
From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
To: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: tglx@linutronix.de, mingo@redhat.com, hpa@zytor.com,
	peterz@infradead.org, torvalds@linux-foundation.org,
	konrad.wilk@oracle.com, pbonzini@redhat.com,
	paulmck@linux.vnet.ibm.com, waiman.long@hp.com, davej@redhat.com,
	oleg@redhat.com, x86@kernel.org, paul.gortmaker@windriver.com,
	ak@linux.intel.com, jasowang@redhat.com,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	xen-devel@lists.xenproject.org, riel@redhat.com,
	borntraeger@de.ibm.com, akpm@linux-foundation.org,
	a.ryabinin@samsung.com, sasha.levin@oracle.com
Subject: Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions
Date: Mon, 09 Feb 2015 15:04:22 +0530	[thread overview]
Message-ID: <54D87F1E.9060307@linux.vnet.ibm.com> (raw)
In-Reply-To: <54D7D19B.1000103@goop.org>

On 02/09/2015 02:44 AM, Jeremy Fitzhardinge wrote:
> On 02/06/2015 06:49 AM, Raghavendra K T wrote:
[...]
>
>> Linus suggested that we should not do any writes to lock after unlock(),
>> and we can move slowpath clearing to fastpath lock.
>
> Yep, that seems like a sound approach.

Current approach seem to be working now. (though we could not avoid read).
Related question: Do you think we could avoid SLOWPATH_FLAG itself by
checking head and tail difference. or is it costly because it may
result in unnecessary unlock_kicks?

>> However it brings additional case to be handled, viz., slowpath still
>> could be set when somebody does arch_trylock. Handle that too by ignoring
>> slowpath flag during lock availability check.
>>
>> Reported-by: Sasha Levin <sasha.levin@oracle.com>
>> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
>> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
>> ---
>>   arch/x86/include/asm/spinlock.h | 70 ++++++++++++++++++++---------------------
>>   1 file changed, 34 insertions(+), 36 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
>> index 625660f..0829f86 100644
>> --- a/arch/x86/include/asm/spinlock.h
>> +++ b/arch/x86/include/asm/spinlock.h
>> @@ -49,6 +49,23 @@ static inline void __ticket_enter_slowpath(arch_spinlock_t *lock)
>>   	set_bit(0, (volatile unsigned long *)&lock->tickets.tail);
>>   }
>>
>> +static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock)
>> +{
>> +	arch_spinlock_t old, new;
>> +	__ticket_t diff;
>> +
>> +	old.tickets = READ_ONCE(lock->tickets);
>
> Couldn't the caller pass in the lock state that it read rather than
> re-reading it?
>

Yes we could. do you mean we could pass additional read value apart from 
lock, (because lock will be anyway needed for cmpxchg).

>>
>> +static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock)
>> +{
>> +}
>> +
>>   #endif /* CONFIG_PARAVIRT_SPINLOCKS */
>>
>>   static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
>> @@ -84,7 +105,7 @@ static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
>>   	register struct __raw_tickets inc = { .tail = TICKET_LOCK_INC };
>>
>>   	inc = xadd(&lock->tickets, inc);
>> -	if (likely(inc.head == inc.tail))
>> +	if (likely(inc.head == (inc.tail & ~TICKET_SLOWPATH_FLAG)))
>

good point, we can get rid of this as well.

> The intent of this conditional was to be the quickest possible path when
> taking a fastpath lock, with the code below being used for all slowpath
> locks (free or taken). So I don't think masking out SLOWPATH_FLAG is
> necessary here.
>
>>   		goto out;
>>
>>   	inc.tail &= ~TICKET_SLOWPATH_FLAG;
>> @@ -98,7 +119,10 @@ static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
>>   		} while (--count);
>>   		__ticket_lock_spinning(lock, inc.tail);
>>   	}
>> -out:	barrier();	/* make sure nothing creeps before the lock is taken */
>> +out:
>> +	__ticket_check_and_clear_slowpath(lock);
>> +
>> +	barrier();	/* make sure nothing creeps before the lock is taken */
>
> Which means that if "goto out" path is only ever used for fastpath
> locks, you can limit calling __ticket_check_and_clear_slowpath() to the
> slowpath case.
>

Yes, I ll move that call up.

>>   }
>>
>>   static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
>> @@ -115,47 +139,21 @@ static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
>>   	return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail;
>>   }
>>
>> -static inline void __ticket_unlock_slowpath(arch_spinlock_t *lock,
>> -					    arch_spinlock_t old)
>> -{
>> -	arch_spinlock_t new;
>> -
>> -	BUILD_BUG_ON(((__ticket_t)NR_CPUS) != NR_CPUS);
>> -
>> -	/* Perform the unlock on the "before" copy */
>> -	old.tickets.head += TICKET_LOCK_INC;
>
> NB (see below)

Thanks for pointing, this solved the hang issue. I
missed this exact addition.

>
>> -
>> -	/* Clear the slowpath flag */
>> -	new.head_tail = old.head_tail & ~(TICKET_SLOWPATH_FLAG << TICKET_SHIFT);
>> -
>> -	/*
>> -	 * If the lock is uncontended, clear the flag - use cmpxchg in
>> -	 * case it changes behind our back though.
>> -	 */
>> -	if (new.tickets.head != new.tickets.tail ||
>> -	    cmpxchg(&lock->head_tail, old.head_tail,
>> -					new.head_tail) != old.head_tail) {
>> -		/*
>> -		 * Lock still has someone queued for it, so wake up an
>> -		 * appropriate waiter.
>> -		 */
>> -		__ticket_unlock_kick(lock, old.tickets.head);
>> -	}
>> -}
>> -
>>   static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
>>   {
>>   	if (TICKET_SLOWPATH_FLAG &&
>> -	    static_key_false(&paravirt_ticketlocks_enabled)) {
>> -		arch_spinlock_t prev;
>> +		static_key_false(&paravirt_ticketlocks_enabled)) {
>> +		__ticket_t prev_head;
>>
>> -		prev = *lock;
>> +		prev_head = lock->tickets.head;
>>   		add_smp(&lock->tickets.head, TICKET_LOCK_INC);
>>
>>   		/* add_smp() is a full mb() */
>>
>> -		if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
>> -			__ticket_unlock_slowpath(lock, prev);
>> +		if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG)) {
>
> So we're OK with still having a ("speculative"?) read-after-unlock here?
> I guess the only way to avoid it is to make the add_smp an xadd, but
> that's pretty expensive even compared to a locked add (at least last
> time I checked, which was at least a couple of microarchitectures ago).
> An unlocked add followed by lfence should also do the trick, but that
> was also much worse in practice.

So we have 3 choices,
1. xadd
2. continue with current approach.
3. a read before unlock and also after that.

>
>> +			BUILD_BUG_ON(((__ticket_t)NR_CPUS) != NR_CPUS);
>> +			__ticket_unlock_kick(lock, prev_head);
>
> Should be "prev_head + TICKET_LOCK_INC" to match the previous code,
> otherwise it won't find the CPU waiting for the new head.

Yes it is :)

  parent reply	other threads:[~2015-02-09  9:34 UTC|newest]

Thread overview: 111+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-02-06 14:49 [PATCH] x86 spinlock: Fix memory corruption on completing completions Raghavendra K T
2015-02-06 15:20 ` Sasha Levin
2015-02-06 15:20 ` Sasha Levin
2015-02-06 15:20   ` Sasha Levin
2015-02-06 16:15   ` Linus Torvalds
2015-02-06 16:15     ` Linus Torvalds
2015-02-06 16:15     ` Linus Torvalds
2015-02-06 17:03     ` Andrey Ryabinin
2015-02-06 17:03       ` Andrey Ryabinin
2015-02-06 17:03     ` Andrey Ryabinin
2015-02-06 17:03     ` Andrey Ryabinin
2015-02-06 16:15   ` Linus Torvalds
2015-02-08 17:14   ` Oleg Nesterov
2015-02-08 17:14     ` Oleg Nesterov
2015-02-08 17:14   ` Oleg Nesterov
2015-02-06 16:25 ` Linus Torvalds
2015-02-06 16:25 ` Linus Torvalds
2015-02-06 16:25   ` Linus Torvalds
2015-02-06 16:25   ` Linus Torvalds
2015-02-06 19:42   ` Davidlohr Bueso
2015-02-06 19:42   ` Davidlohr Bueso
2015-02-06 19:42   ` Davidlohr Bueso
2015-02-06 19:42     ` Davidlohr Bueso
2015-02-06 21:15     ` Sasha Levin
2015-02-06 21:15     ` Sasha Levin
2015-02-06 21:15       ` Sasha Levin
2015-02-06 21:15       ` Sasha Levin
2015-02-06 23:24       ` Davidlohr Bueso
2015-02-06 23:24         ` Davidlohr Bueso
2015-02-06 23:24         ` Davidlohr Bueso
2015-02-06 23:24       ` Davidlohr Bueso
2015-02-08 17:49   ` Raghavendra K T
2015-02-08 17:49     ` Raghavendra K T
2015-02-08 17:49     ` Raghavendra K T
2015-02-08 17:49   ` Raghavendra K T
2015-02-06 18:57 ` Sasha Levin
2015-02-06 18:57 ` Sasha Levin
2015-02-06 18:57   ` Sasha Levin
2015-02-08 17:57   ` Raghavendra K T
2015-02-08 17:57   ` Raghavendra K T
2015-02-08 17:57   ` Raghavendra K T
2015-02-08 21:14 ` Jeremy Fitzhardinge
2015-02-08 21:14 ` Jeremy Fitzhardinge
2015-02-08 21:14   ` Jeremy Fitzhardinge
2015-02-09  9:34   ` Raghavendra K T
2015-02-09  9:34   ` Raghavendra K T
2015-02-09  9:34   ` Raghavendra K T [this message]
2015-02-09 12:02     ` Peter Zijlstra
2015-02-09 12:02     ` Peter Zijlstra
2015-02-09 12:02       ` Peter Zijlstra
2015-02-09 12:52       ` Raghavendra K T
2015-02-09 12:52         ` Raghavendra K T
2015-02-09 12:52       ` Raghavendra K T
2015-02-10  0:53       ` Linus Torvalds
2015-02-10  0:53       ` Linus Torvalds
2015-02-10  0:53         ` Linus Torvalds
2015-02-10  0:53         ` Linus Torvalds
2015-02-10  9:30         ` Raghavendra K T
2015-02-10  9:30         ` Raghavendra K T
2015-02-10  9:30           ` Raghavendra K T
2015-02-10 13:18           ` Denys Vlasenko
2015-02-10 13:18           ` Denys Vlasenko
2015-02-10 13:18           ` Denys Vlasenko
2015-02-10 13:18             ` Denys Vlasenko
2015-02-10 13:20             ` Denys Vlasenko
2015-02-10 13:20             ` Denys Vlasenko
2015-02-10 13:20               ` Denys Vlasenko
2015-02-10 13:20               ` Denys Vlasenko
2015-02-10 14:24             ` Oleg Nesterov
2015-02-10 14:24               ` Oleg Nesterov
2015-02-10 14:24               ` Oleg Nesterov
2015-02-10 14:24             ` Oleg Nesterov
2015-02-10 13:23           ` Sasha Levin
2015-02-10 13:23           ` Sasha Levin
2015-02-10 13:23             ` Sasha Levin
2015-02-10 13:26           ` Oleg Nesterov
2015-02-10 13:26           ` Oleg Nesterov
2015-02-10 13:26             ` Oleg Nesterov
2015-02-11  1:18             ` Jeremy Fitzhardinge
2015-02-11  1:18               ` Jeremy Fitzhardinge
2015-02-11  1:18               ` Jeremy Fitzhardinge
2015-02-11 17:24               ` Oleg Nesterov
2015-02-11 17:24               ` Oleg Nesterov
2015-02-11 17:24                 ` Oleg Nesterov
2015-02-11 17:24                 ` Oleg Nesterov
2015-02-11 23:15                 ` Jeremy Fitzhardinge
2015-02-11 23:15                 ` Jeremy Fitzhardinge
2015-02-11 23:15                   ` Jeremy Fitzhardinge
2015-02-11 23:28                   ` Linus Torvalds
2015-02-11 23:28                   ` Linus Torvalds
2015-02-12  7:08                     ` Jeremy Fitzhardinge
2015-02-12  7:08                     ` Jeremy Fitzhardinge
2015-02-12 14:18                   ` Oleg Nesterov
2015-02-12 14:18                   ` Oleg Nesterov
2015-02-12 14:18                     ` Oleg Nesterov
2015-02-12 14:18                     ` Oleg Nesterov
2015-02-11 23:15                 ` Jeremy Fitzhardinge
2015-02-11  1:18             ` Jeremy Fitzhardinge
2015-02-11 11:08             ` Raghavendra K T
2015-02-11 11:08             ` Raghavendra K T
2015-02-11 11:08               ` Raghavendra K T
2015-02-11 17:38               ` Oleg Nesterov
2015-02-11 17:38                 ` Oleg Nesterov
2015-02-11 18:38                 ` Raghavendra K T
2015-02-11 18:38                 ` Raghavendra K T
2015-02-11 18:38                   ` Raghavendra K T
2015-02-11 18:38                 ` Raghavendra K T
2015-02-11 17:38               ` Oleg Nesterov
2015-02-11 11:08             ` Raghavendra K T
  -- strict thread matches above, loose matches on Subject: below --
2015-02-06 14:49 Raghavendra K T
2015-02-06 14:49 Raghavendra K T

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54D87F1E.9060307@linux.vnet.ibm.com \
    --to=raghavendra.kt@linux.vnet.ibm.com \
    --cc=a.ryabinin@samsung.com \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=borntraeger@de.ibm.com \
    --cc=davej@redhat.com \
    --cc=hpa@zytor.com \
    --cc=jasowang@redhat.com \
    --cc=jeremy@goop.org \
    --cc=konrad.wilk@oracle.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=oleg@redhat.com \
    --cc=paul.gortmaker@windriver.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    --cc=sasha.levin@oracle.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=waiman.long@hp.com \
    --cc=x86@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.