From: Alexander Fyodorov <halcy@yandex.ru>
To: Waiman Long <waiman.long@hp.com>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
"Chandramouleeswaran, Aswin" <aswin@hp.com>,
"Norton, Scott J" <scott.norton@hp.com>,
Peter Zijlstra <peterz@infradead.org>,
Steven Rostedt <rostedt@goodmis.org>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>
Subject: Re: [PATCH RFC v2 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation
Date: Wed, 21 Aug 2013 19:51:29 +0400 [thread overview]
Message-ID: <336901377100289@web16f.yandex.ru> (raw)
In-Reply-To: <52142D6C.6000400@hp.com>
21.08.2013, 07:01, "Waiman Long" <waiman.long@hp.com>:
> On 08/20/2013 11:31 AM, Alexander Fyodorov wrote:
>> Isn't a race possible if another thread acquires the spinlock in the window
>> between setting lock->locked to 0 and issuing smp_wmb()? Writes from
>> the critical section from our thread might be delayed behind the write to
>> lock->locked if the corresponding cache bank is busy.
>
> The purpose of smp_wmb() is to make sure that content in the cache will
> be flushed out to the memory in case the cache coherency protocol cannot
> guarantee a single view of memory content on all processor.
Linux kernel does not support architectures without cache coherency, and while using memory barriers just for flushing write buffers ASAP on cache-coherent processors might benefit performance on some architectures it will hurt performance on others. So it must not be done in architecture-independent code.
> In other
> word, smp_wmb() is used to make other processors see that the lock has
> been freed ASAP. If another processor see that before smp_wmb(), it will
> be even better as the latency is reduced. As the lock holder is the only
> one that can release the lock, there is no race condition here.
No, I was talking about the window between freeing lock and issuing smp_wmb(). What I meant is:
1) A = 0
2) CPU0 locks the spinlock protecting A.
3) CPU0 writes 1 to A, but the write gets stalled because the corresponding cache bank is busy.
4) CPU0 calls spin_unlock() and sets lock->locked to 0.
If CPU1 does a spin_lock() right now, it will succeed (since lock->locked == 0). But the write to A hasn't reached cache yet, so CPU1 will see A == 0.
More examples on this are in Documentation/memory-barriers.txt
> That is a legitimate question. I don't think it is a problem on x86 as
> the x86 spinlock code doesn't do a full mb() in the lock and unlock
> paths.
It does because "lock" prefix implies a full memory barrier.
> The smp_mb() will be conditionalized depending on the ARCH_QSPINLOCK
> setting. The smp_wmb() may not be needed, but a compiler barrier should
> still be there.
Do you mean because of inline? That shouldn't be a problem because smp_mb() prohibits compiler from doing any optimizations across the barrier (thanks to the "volatile" keyword).
More on this in Documentation/memory-barriers.txt
>> Also I don't understand why there are so many uses of ACCESS_ONCE()
>> macro. It does not guarantee memory ordering with regard to other CPUs,
>> so probably most of the uses can be removed (with exception of
>> lock_is_contended(), where it prohibits gcc from optimizing the loop away).
>
> All the lock/unlock code can be inlined and we don't know what the
> compiler will do to optimize code. The ACCESS_ONCE() macro is used to
> make sure that the compiler won't optimize away the actual fetch or
> write of the memory. Even if the compiler won't optimize away the memory
> access, adding the ACCESS_ONCE() macro won't have any extra overhead. So
> a more liberal use of it won't hurt performance.
If compiler optimized memory access away it would be a bug. And I'm not so sure about overhead... For example, on some VLIW architectures ACCESS_ONCE() might prohibit compiler from mixing other instructions to the unlock.
next prev parent reply other threads:[~2013-08-21 15:59 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <15321377012704@web8h.yandex.ru>
2013-08-21 3:01 ` [PATCH RFC v2 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation Waiman Long
2013-08-21 15:51 ` Alexander Fyodorov [this message]
2013-08-22 1:04 ` Waiman Long
2013-08-22 13:28 ` Alexander Fyodorov
2013-08-26 20:14 ` Waiman Long
2013-08-27 12:09 ` Alexander Fyodorov
[not found] ` <20130827091436.3d5971a0@gandalf.local.home>
2013-08-27 13:53 ` Peter Zijlstra
2013-08-28 1:21 ` Paul E. McKenney
2013-08-28 8:19 ` Peter Zijlstra
2013-08-28 12:59 ` Steven Rostedt
2013-08-28 13:05 ` Peter Zijlstra
2013-08-28 13:15 ` Steven Rostedt
2013-08-28 13:37 ` Peter Zijlstra
2013-08-29 15:24 ` Waiman Long
2013-08-29 17:03 ` Alexander Fyodorov
2013-08-30 3:16 ` Waiman Long
2013-08-30 8:15 ` Alexander Fyodorov
2013-08-13 18:41 [PATCH RFC v2 0/2] qspinlock: Introducing a 4-byte queue spinlock Waiman Long
2013-08-13 18:41 ` [PATCH RFC v2 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation Waiman Long
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=336901377100289@web16f.yandex.ru \
--to=halcy@yandex.ru \
--cc=aswin@hp.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=scott.norton@hp.com \
--cc=tglx@linutronix.de \
--cc=waiman.long@hp.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.