From: Waiman Long <waiman.long@hp.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
Jeff Layton <jlayton@redhat.com>,
Miklos Szeredi <mszeredi@suse.cz>, Ingo Molnar <mingo@redhat.com>,
linux-fsdevel@vger.kernel.org,
LKML <linux-kernel@vger.kernel.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Andi Kleen <andi@firstfloor.org>,
"Chandramouleeswaran, Aswin" <aswin@hp.com>,
"Norton, Scott J" <scott.norton@hp.com>,
Peter Zijlstra <peterz@infradead.org>,
Steven Rostedt <rostedt@goodmis.org>
Subject: Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
Date: Sat, 29 Jun 2013 17:03:59 -0400 [thread overview]
Message-ID: <51CF4BBF.30608@hp.com> (raw)
In-Reply-To: <alpine.DEB.2.02.1306271501220.4013@ionos.tec.linutronix.de>
On 06/27/2013 10:44 AM, Thomas Gleixner wrote:
> On Wed, 26 Jun 2013, Waiman Long wrote:
>> On 06/26/2013 07:06 PM, Thomas Gleixner wrote:
>>> On Wed, 26 Jun 2013, Waiman Long wrote:
>>> This is a complete design disaster. You are violating every single
>>> rule of proper layering. The differentiation of spinlock, raw_spinlock
>>> and arch_spinlock is there for a reason and you just take the current
>>> implementation and map your desired function to it. If we take this,
>>> then we fundamentally ruled out a change to the mappings of spinlock,
>>> raw_spinlock and arch_spinlock. This layering is on purpose and it
>>> took a lot of effort to make it as clean as it is now. Your proposed
>>> change undoes that and completely wreckages preempt-rt.
>> Would you mind enlighten me how this change will break preempt-rt?
> The whole spinlock layering was done for RT. In mainline spinlock is
> mapped to raw_spinlock. On RT spinlock is mapped to a PI aware
> rtmutex.
>
> So your mixing of the various layers and the assumption of lock plus
> count being adjacent, does not work on RT at all.
Thank for the explanation. I had downloaded and looked at the RT patch.
My code won't work for the full RT kernel. I guess that is no way to
work around this and only logical choice is to disable it for the full
RT kernel.
>> The only architecture that will break, according to data in the
>> respectively arch/*/include/asm/spinlock_types.h files is PA-RISC
>> 1.x (it is OK in PA-RISC 2.0) whose arch_spinlock type has a size of
>> 16 bytes. I am not sure if 32-bit PA-RISC 1.x is still supported or
>> not, but we can always disable the optimization for this case.
> You have to do that right from the beginning with a proper data
> structure and proper accessor functions. Introducing wreckage and then
> retroactivly fixing it, is not a really good idea.
For architecture that needs a larger than 32-bit arch_spin_lock type,
the optimization code will be disabled just like the non-SMP and full RT
cases.
>>> So what you really want is a new data structure, e.g. something like
>>> struct spinlock_refcount() and a proper set of new accessor functions
>>> instead of an adhoc interface which is designed solely for the needs
>>> of dcache improvements.
>> I had thought about that. The only downside is we need more code changes to
>> make existing code to use the new infrastructure. One of the drivers in my
> That's not a downside. It makes the usage of the infrastructure more
> obvious and not hidden behind macro magic. And the changes are trivial
> to script.
Making the code from d_lock to lockref.lock, for example, is trivial.
However, there are about 40 different files that need to be changed with
different maintainers. Do I need to get an ACK from all of them or can I
just go ahead with such trivial changes without their ACK? You know it
can be hard to get responses from so many maintainers in a timely
manner. This is actually my main concern.
>> design is to minimize change to existing code. Personally, I have no
>> objection of doing that if others think this is the right way to go.
> Definitely. Something like this would be ok:
>
> struct lock_refcnt {
> int refcnt;
> struct spinlock lock;
> };
>
> It does not require a gazillion of ifdefs and works for
> UP,SMP,DEBUG....
>
> extern bool lock_refcnt_mod(struct lock_refcnt *lr, int mod, int cond);
>
> You also want something like:
>
> extern void lock_refcnt_disable(void);
>
> So we can runtime disable it e.g. when lock elision is detected and
> active. So you can force lock_refcnt_mod() to return false.
>
> static inline bool lock_refcnt_inc(struct lock_refcnt *sr)
> {
> #ifdef CONFIG_HAVE_LOCK_REFCNT
> return lock_refcnt_mod(sr, 1, INTMAX);
> #else
> return false;
> #endif
> }
>
> That does not break any code as long as CONFIG_HAVE_SPINLOCK_REFCNT=n.
>
> So we can enable it per architecture and make it depend on SMP. For RT
> we simply can force this switch to n.
>
> The other question is the semantic of these refcount functions. From
> your description the usage pattern is:
>
> if (refcnt_xxx())
> return;
> /* Slow path */
> spin_lock();
> ...
> spin_unlock();
>
> So it might be sensible to make this explicit:
>
> static inline bool refcnt_inc_or_lock(struct lock_refcnt *lr))
> {
> #ifdef CONFIG_HAVE_SPINLOCK_REFCNT
> if (lock_refcnt_mod(lr, 1, INTMAX))
> return true;
> #endif
> spin_lock(&lr->lock);
> return false;
> }
Yes, it is a good idea to have a config variable to enable/disable it as
long as the default is "y". Of course, an full RT kernel or an non-SMP
kernel will have it disabled.
Regards,
Longman
next prev parent reply other threads:[~2013-06-29 21:04 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-06-26 17:43 [PATCH v2 0/2] Lockless update of reference count protected by spinlock Waiman Long
2013-06-26 17:43 ` [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount Waiman Long
2013-06-26 20:17 ` Andi Kleen
2013-06-26 21:07 ` Waiman Long
2013-06-26 21:22 ` Andi Kleen
2013-06-26 23:26 ` Waiman Long
2013-06-27 1:06 ` Andi Kleen
2013-06-27 1:15 ` Waiman Long
2013-06-27 1:24 ` Waiman Long
2013-06-27 1:37 ` Andi Kleen
2013-06-27 14:56 ` Waiman Long
2013-06-28 13:46 ` Thomas Gleixner
2013-06-29 20:30 ` Waiman Long
2013-06-26 23:27 ` Thomas Gleixner
2013-06-26 23:06 ` Thomas Gleixner
2013-06-27 0:16 ` Waiman Long
2013-06-27 14:44 ` Thomas Gleixner
2013-06-29 21:03 ` Waiman Long [this message]
2013-06-27 0:26 ` Waiman Long
2013-06-29 17:45 ` Linus Torvalds
2013-06-29 20:23 ` Waiman Long
2013-06-29 21:34 ` Waiman Long
2013-06-29 22:11 ` Linus Torvalds
2013-06-29 22:34 ` Waiman Long
2013-06-29 21:58 ` Linus Torvalds
2013-06-29 22:47 ` Linus Torvalds
2013-07-01 13:40 ` Waiman Long
2013-06-26 17:43 ` [PATCH v2 2/2] dcache: Locklessly update d_count whenever possible Waiman Long
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=51CF4BBF.30608@hp.com \
--to=waiman.long@hp.com \
--cc=andi@firstfloor.org \
--cc=aswin@hp.com \
--cc=benh@kernel.crashing.org \
--cc=jlayton@redhat.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=mszeredi@suse.cz \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=scott.norton@hp.com \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox