From: Waiman Long <waiman.long@hp.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
Jeff Layton <jlayton@redhat.com>,
Miklos Szeredi <mszeredi@suse.cz>, Ingo Molnar <mingo@redhat.com>,
linux-fsdevel@vger.kernel.org,
LKML <linux-kernel@vger.kernel.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Andi Kleen <andi@firstfloor.org>,
"Chandramouleeswaran, Aswin" <aswin@hp.com>,
"Norton, Scott J" <scott.norton@hp.com>,
Peter Zijlstra <peterz@infradead.org>,
Steven Rostedt <rostedt@goodmis.org>
Subject: Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
Date: Sat, 29 Jun 2013 17:03:59 -0400
Message-ID: <51CF4BBF.30608@hp.com>
In-Reply-To: <alpine.DEB.2.02.1306271501220.4013@ionos.tec.linutronix.de>
On 06/27/2013 10:44 AM, Thomas Gleixner wrote:
> On Wed, 26 Jun 2013, Waiman Long wrote:
>> On 06/26/2013 07:06 PM, Thomas Gleixner wrote:
>>> On Wed, 26 Jun 2013, Waiman Long wrote:
>>> This is a complete design disaster. You are violating every single
>>> rule of proper layering. The differentiation of spinlock, raw_spinlock
>>> and arch_spinlock is there for a reason and you just take the current
>>> implementation and map your desired function to it. If we take this,
>>> then we fundamentally rule out a change to the mappings of spinlock,
>>> raw_spinlock and arch_spinlock. This layering is on purpose and it
>>> took a lot of effort to make it as clean as it is now. Your proposed
>>> change undoes that and completely wrecks preempt-rt.
>> Would you mind enlightening me on how this change will break preempt-rt?
> The whole spinlock layering was done for RT. In mainline spinlock is
> mapped to raw_spinlock. On RT spinlock is mapped to a PI aware
> rtmutex.
>
> So your mixing of the various layers and the assumption of lock plus
> count being adjacent, does not work on RT at all.
Thanks for the explanation. I had downloaded and looked at the RT patch.
My code won't work for the full RT kernel. I guess there is no way to
work around this, and the only logical choice is to disable it for the
full RT kernel.
>> The only architecture that will break, according to data in the
>> respective arch/*/include/asm/spinlock_types.h files, is PA-RISC
>> 1.x (it is OK in PA-RISC 2.0) whose arch_spinlock type has a size of
>> 16 bytes. I am not sure if 32-bit PA-RISC 1.x is still supported or
>> not, but we can always disable the optimization for this case.
> You have to do that right from the beginning with a proper data
> structure and proper accessor functions. Introducing wreckage and then
> retroactively fixing it is not a really good idea.
For architectures that need an arch_spinlock type larger than 32 bits,
the optimization code will be disabled, just like in the non-SMP and
full RT cases.
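To make the size restriction concrete, here is a minimal user-space sketch, not the patch itself. It assumes a 32-bit lock word and a 32-bit count packed into one 64-bit word and updated with C11 atomics instead of the kernel's cmpxchg. All names (sketch_arch_lock_t, lockref_sketch, sketch_try_inc) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an architecture's lock word. */
typedef uint32_t sketch_arch_lock_t;

/* The combined update only works if the lock fits in half of a 64-bit word. */
static_assert(sizeof(sketch_arch_lock_t) <= sizeof(uint32_t),
              "lock too large: the lockless path must be disabled");

struct lockref_sketch {
        /* Upper 32 bits: lock word (0 means unlocked); lower 32 bits: count. */
        _Atomic uint64_t lock_count;
};

static inline uint32_t sketch_lock_word(uint64_t v) { return (uint32_t)(v >> 32); }
static inline uint32_t sketch_count(uint64_t v) { return (uint32_t)v; }

/*
 * Try to increment the count without taking the lock.
 * Returns true on success, false if the lock is held (or the count is
 * saturated), in which case the caller has to fall back to the lock.
 */
static bool sketch_try_inc(struct lockref_sketch *lr)
{
        uint64_t old = atomic_load(&lr->lock_count);

        while (sketch_lock_word(old) == 0 && sketch_count(old) != UINT32_MAX) {
                if (atomic_compare_exchange_weak(&lr->lock_count, &old, old + 1))
                        return true;
                /* A failed exchange reloads 'old'; the loop rechecks the lock. */
        }
        return false;
}
```

Because the lock state and the count are compared and swapped as one unit, an increment can never slip in while somebody holds the lock. A lock type wider than 32 bits would not fit this layout, which is why the fast path has to be compiled out in that case.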
>>> So what you really want is a new data structure, e.g. something like
>>> struct spinlock_refcount() and a proper set of new accessor functions
>>> instead of an adhoc interface which is designed solely for the needs
>>> of dcache improvements.
>> I had thought about that. The only downside is we need more code changes to
>> make existing code to use the new infrastructure. One of the drivers in my
> That's not a downside. It makes the usage of the infrastructure more
> obvious and not hidden behind macro magic. And the changes are trivial
> to script.
Changing the code from d_lock to lockref.lock, for example, is trivial.
However, about 40 different files, with different maintainers, need to
be changed. Do I need to get an ACK from all of them, or can I just go
ahead with such trivial changes without their ACKs? As you know, it can
be hard to get responses from so many maintainers in a timely manner.
This is actually my main concern.
>> design is to minimize changes to existing code. Personally, I have no
>> objection to doing that if others think this is the right way to go.
> Definitely. Something like this would be ok:
>
> struct lock_refcnt {
>         int refcnt;
>         struct spinlock lock;
> };
>
> It does not require a gazillion of ifdefs and works for
> UP,SMP,DEBUG....
>
> extern bool lock_refcnt_mod(struct lock_refcnt *lr, int mod, int cond);
>
> You also want something like:
>
> extern void lock_refcnt_disable(void);
>
> So we can runtime disable it e.g. when lock elision is detected and
> active. So you can force lock_refcnt_mod() to return false.
>
> static inline bool lock_refcnt_inc(struct lock_refcnt *sr)
> {
> #ifdef CONFIG_HAVE_SPINLOCK_REFCNT
>         return lock_refcnt_mod(sr, 1, INT_MAX);
> #else
>         return false;
> #endif
> }
>
> That does not break any code as long as CONFIG_HAVE_SPINLOCK_REFCNT=n.
>
> So we can enable it per architecture and make it depend on SMP. For RT
> we simply can force this switch to n.
>
> The other question is the semantic of these refcount functions. From
> your description the usage pattern is:
>
> if (refcnt_xxx())
>         return;
> /* Slow path */
> spin_lock();
> ...
> spin_unlock();
>
> So it might be sensible to make this explicit:
>
> static inline bool refcnt_inc_or_lock(struct lock_refcnt *lr)
> {
> #ifdef CONFIG_HAVE_SPINLOCK_REFCNT
>         if (lock_refcnt_mod(lr, 1, INT_MAX))
>                 return true;
> #endif
>         spin_lock(&lr->lock);
>         return false;
> }
Yes, it is a good idea to have a config variable to enable/disable it, as
long as the default is "y". Of course, a full RT kernel or a non-SMP
kernel will have it disabled.
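For reference, the inc-or-lock pattern with both a build-time and a run-time switch could look roughly like the user-space sketch below. This is only an illustration under my own assumptions: a toy spinlock bit sits above a 32-bit count in one atomic word, SKETCH_HAVE_LOCK_REFCNT stands in for the config variable, and every sketch_* name is hypothetical rather than part of any posted patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_HAVE_LOCK_REFCNT 1           /* hypothetical config switch */
#define SKETCH_LOCKED ((uint64_t)1 << 32)   /* lock bit above the 32-bit count */

struct lock_refcnt_sketch {
        _Atomic uint64_t word;              /* bit 32: lock held, bits 0-31: count */
};

static atomic_bool sketch_fastpath_off;     /* run-time switch */

static void sketch_disable(void)
{
        atomic_store(&sketch_fastpath_off, true);
}

static void sketch_lock(struct lock_refcnt_sketch *lr)
{
        uint64_t old = atomic_load(&lr->word);

        for (;;) {
                old &= ~SKETCH_LOCKED;      /* only succeed from the unlocked state */
                if (atomic_compare_exchange_weak(&lr->word, &old, old | SKETCH_LOCKED))
                        return;
        }
}

static void sketch_unlock(struct lock_refcnt_sketch *lr)
{
        assert(atomic_load(&lr->word) & SKETCH_LOCKED);
        atomic_fetch_and(&lr->word, ~SKETCH_LOCKED);
}

/* Returns true if the count was bumped locklessly, false with the lock held. */
static bool sketch_inc_or_lock(struct lock_refcnt_sketch *lr)
{
#if SKETCH_HAVE_LOCK_REFCNT
        uint64_t old = atomic_load(&lr->word);

        while (!atomic_load(&sketch_fastpath_off) && !(old & SKETCH_LOCKED) &&
               (uint32_t)old != UINT32_MAX) {
                if (atomic_compare_exchange_weak(&lr->word, &old, old + 1))
                        return true;
        }
#endif
        sketch_lock(lr);
        return false;
}

/* Caller pattern: fast path first, locked slow path otherwise. */
static void sketch_get(struct lock_refcnt_sketch *lr)
{
        if (sketch_inc_or_lock(lr))
                return;
        /* Slow path: lock is held; overflow handling is omitted in this sketch. */
        atomic_fetch_add(&lr->word, 1);
        sketch_unlock(lr);
}
```

With the switch set to 0, or after sketch_disable() has been called, every caller simply takes the lock, so existing code behaves as before.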
Regards,
Longman