Message-ID: <51CF4BBF.30608@hp.com>
Date: Sat, 29 Jun 2013 17:03:59 -0400
From: Waiman Long
To: Thomas Gleixner
CC: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
 linux-fsdevel@vger.kernel.org, LKML, Linus Torvalds,
 Benjamin Herrenschmidt, Andi Kleen, "Chandramouleeswaran, Aswin",
 "Norton, Scott J", Peter Zijlstra, Steven Rostedt
Subject: Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
References: <1372268603-46748-1-git-send-email-Waiman.Long@hp.com> <1372268603-46748-2-git-send-email-Waiman.Long@hp.com> <51CB8472.1040502@hp.com>

On 06/27/2013 10:44 AM, Thomas Gleixner wrote:
> On Wed, 26 Jun 2013, Waiman Long wrote:
>> On 06/26/2013 07:06 PM, Thomas Gleixner wrote:
>>> On Wed, 26 Jun 2013, Waiman Long wrote:
>>> This is a complete design disaster. You are violating every single
>>> rule of proper layering. The differentiation of spinlock, raw_spinlock
>>> and arch_spinlock is there for a reason and you just take the current
>>> implementation and map your desired function to it. If we take this,
>>> then we fundamentally ruled out a change to the mappings of spinlock,
>>> raw_spinlock and arch_spinlock. This layering is on purpose and it
>>> took a lot of effort to make it as clean as it is now.
>>> Your proposed change undoes that and completely wreckages preempt-rt.

>> Would you mind enlightening me how this change will break preempt-rt?

> The whole spinlock layering was done for RT. In mainline spinlock is
> mapped to raw_spinlock. On RT spinlock is mapped to a PI aware
> rtmutex.
>
> So your mixing of the various layers and the assumption of lock plus
> count being adjacent, does not work on RT at all.

Thanks for the explanation. I downloaded and looked at the RT patch, and
my code won't work on a full RT kernel. I guess there is no way to work
around this, and the only logical choice is to disable the optimization
for the full RT kernel.

>> The only architecture that will break, according to data in the
>> respective arch/*/include/asm/spinlock_types.h files, is PA-RISC
>> 1.x (it is OK in PA-RISC 2.0), whose arch_spinlock type has a size of
>> 16 bytes. I am not sure if 32-bit PA-RISC 1.x is still supported or
>> not, but we can always disable the optimization for this case.

> You have to do that right from the beginning with a proper data
> structure and proper accessor functions. Introducing wreckage and then
> retroactively fixing it is not a really good idea.

For architectures that need an arch_spinlock type larger than 32 bits,
the optimization code will be disabled, just as in the non-SMP and full
RT cases.

>>> So what you really want is a new data structure, e.g. something like
>>> struct spinlock_refcount() and a proper set of new accessor functions
>>> instead of an ad hoc interface which is designed solely for the needs
>>> of dcache improvements.

>> I had thought about that. The only downside is we need more code changes
>> to make existing code use the new infrastructure. One of the drivers in my

> That's not a downside. It makes the usage of the infrastructure more
> obvious and not hidden behind macro magic. And the changes are trivial
> to script.

Changing d_lock to lockref.lock, for example, is trivial.
However, there are about 40 different files, owned by different
maintainers, that would need to be changed. Do I need to get an ACK from
all of them, or can I just go ahead with such trivial changes without
their ACKs? It can be hard to get responses from so many maintainers in
a timely manner, and this is actually my main concern.

>> design is to minimize change to existing code. Personally, I have no
>> objection to doing that if others think this is the right way to go.

> Definitely. Something like this would be ok:
>
> struct lock_refcnt {
>         int refcnt;
>         struct spinlock lock;
> };
>
> It does not require a gazillion of ifdefs and works for
> UP, SMP, DEBUG....
>
> extern bool lock_refcnt_mod(struct lock_refcnt *lr, int mod, int cond);
>
> You also want something like:
>
> extern void lock_refcnt_disable(void);
>
> So we can runtime disable it, e.g. when lock elision is detected and
> active. So you can force lock_refcnt_mod() to return false.
>
> static inline bool lock_refcnt_inc(struct lock_refcnt *sr)
> {
> #ifdef CONFIG_HAVE_LOCK_REFCNT
>         return lock_refcnt_mod(sr, 1, INTMAX);
> #else
>         return false;
> #endif
> }
>
> That does not break any code as long as CONFIG_HAVE_SPINLOCK_REFCNT=n.
>
> So we can enable it per architecture and make it depend on SMP. For RT
> we simply can force this switch to n.
>
> The other question is the semantic of these refcount functions. From
> your description the usage pattern is:
>
> if (refcnt_xxx())
>         return;
> /* Slow path */
> spin_lock();
> ...
> spin_unlock();
>
> So it might be sensible to make this explicit:
>
> static inline bool refcnt_inc_or_lock(struct lock_refcnt *lr)
> {
> #ifdef CONFIG_HAVE_SPINLOCK_REFCNT
>         if (lock_refcnt_mod(lr, 1, INTMAX))
>                 return true;
> #endif
>         spin_lock(&lr->lock);
>         return false;
> }

Yes, it is a good idea to have a config variable to enable/disable it,
as long as the default is "y". Of course, a full RT kernel or a non-SMP
kernel will have it disabled.

Regards,
Longman