Message-ID: <51CF4BBF.30608@hp.com>
Date: Sat, 29 Jun 2013 17:03:59 -0400
From: Waiman Long
To: Thomas Gleixner
CC: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
 linux-fsdevel@vger.kernel.org, LKML, Linus Torvalds,
 Benjamin Herrenschmidt, Andi Kleen, "Chandramouleeswaran, Aswin",
 "Norton, Scott J", Peter Zijlstra, Steven Rostedt
Subject: Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
References: <1372268603-46748-1-git-send-email-Waiman.Long@hp.com> <1372268603-46748-2-git-send-email-Waiman.Long@hp.com> <51CB8472.1040502@hp.com>

On 06/27/2013 10:44 AM, Thomas Gleixner wrote:
> On Wed, 26 Jun 2013, Waiman Long wrote:
>> On 06/26/2013 07:06 PM, Thomas Gleixner wrote:
>>> On Wed, 26 Jun 2013, Waiman Long wrote:
>>> This is a complete design disaster. You are violating every single
>>> rule of proper layering. The differentiation of spinlock, raw_spinlock
>>> and arch_spinlock is there for a reason and you just take the current
>>> implementation and map your desired function to it. If we take this,
>>> then we fundamentally ruled out a change to the mappings of spinlock,
>>> raw_spinlock and arch_spinlock. This layering is on purpose and it
>>> took a lot of effort to make it as clean as it is now.
>>> Your proposed change undoes that and completely wreckages preempt-rt.

>> Would you mind enlightening me how this change will break preempt-rt?

> The whole spinlock layering was done for RT. In mainline spinlock is
> mapped to raw_spinlock. On RT spinlock is mapped to a PI aware
> rtmutex.
>
> So your mixing of the various layers and the assumption of lock plus
> count being adjacent, does not work on RT at all.

Thanks for the explanation. I downloaded and looked at the RT patch, and
my code won't work on a full RT kernel. I guess there is no way to work
around this, and the only logical choice is to disable the optimization
for the full RT kernel.

>> The only architecture that will break, according to data in the
>> respective arch/*/include/asm/spinlock_types.h files, is PA-RISC
>> 1.x (it is OK in PA-RISC 2.0), whose arch_spinlock type has a size of
>> 16 bytes. I am not sure if 32-bit PA-RISC 1.x is still supported or
>> not, but we can always disable the optimization for this case.

> You have to do that right from the beginning with a proper data
> structure and proper accessor functions. Introducing wreckage and then
> retroactively fixing it is not a really good idea.

For architectures that need an arch_spinlock type larger than 32 bits,
the optimization code will be disabled, just as in the non-SMP and full
RT cases.

>>> So what you really want is a new data structure, e.g. something like
>>> struct spinlock_refcount() and a proper set of new accessor functions
>>> instead of an ad hoc interface which is designed solely for the needs
>>> of dcache improvements.

>> I had thought about that. The only downside is we need more code changes
>> to make existing code use the new infrastructure. One of the drivers in my

> That's not a downside. It makes the usage of the infrastructure more
> obvious and not hidden behind macro magic. And the changes are trivial
> to script.

Changing d_lock to lockref.lock, for example, is trivial.
However, there are about 40 different files, owned by different
maintainers, that would need to be changed. Do I need to get an ACK from
all of them, or can I just go ahead with such trivial changes without
their ACKs? It can be hard to get responses from so many maintainers in
a timely manner, and this is actually my main concern.

>> design is to minimize change to existing code. Personally, I have no
>> objection to doing that if others think this is the right way to go.

> Definitely. Something like this would be ok:
>
> struct lock_refcnt {
>         int refcnt;
>         struct spinlock lock;
> };
>
> It does not require a gazillion of ifdefs and works for
> UP, SMP, DEBUG....
>
> extern bool lock_refcnt_mod(struct lock_refcnt *lr, int mod, int cond);
>
> You also want something like:
>
> extern void lock_refcnt_disable(void);
>
> So we can runtime disable it, e.g. when lock elision is detected and
> active. So you can force lock_refcnt_mod() to return false.
>
> static inline bool lock_refcnt_inc(struct lock_refcnt *sr)
> {
> #ifdef CONFIG_HAVE_LOCK_REFCNT
>         return lock_refcnt_mod(sr, 1, INTMAX);
> #else
>         return false;
> #endif
> }
>
> That does not break any code as long as CONFIG_HAVE_SPINLOCK_REFCNT=n.
>
> So we can enable it per architecture and make it depend on SMP. For RT
> we simply can force this switch to n.
>
> The other question is the semantic of these refcount functions. From
> your description the usage pattern is:
>
> if (refcnt_xxx())
>         return;
> /* Slow path */
> spin_lock();
> ...
> spin_unlock();
>
> So it might be sensible to make this explicit:
>
> static inline bool refcnt_inc_or_lock(struct lock_refcnt *lr)
> {
> #ifdef CONFIG_HAVE_SPINLOCK_REFCNT
>         if (lock_refcnt_mod(lr, 1, INTMAX))
>                 return true;
> #endif
>         spin_lock(&lr->lock);
>         return false;
> }

Yes, it is a good idea to have a config variable to enable/disable it,
as long as the default is "y". Of course, a full RT kernel or a non-SMP
kernel will have it disabled.

Regards,
Longman