Date: Fri, 2 Nov 2012 17:18:15 +0100
From: Oleg Nesterov
To: "Paul E. McKenney"
Cc: Mikulas Patocka, Peter Zijlstra, Linus Torvalds, Ingo Molnar,
    Srikar Dronamraju, Ananth N Mavinakayanahalli, Anton Arapov,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/1] percpu_rw_semaphore: reimplement to not block the readers unnecessarily
Message-ID: <20121102161815.GA24256@redhat.com>
In-Reply-To: <20121101183324.GA30442@redhat.com>

On 11/01, Oleg Nesterov wrote:
>
> On 11/01, Paul E. McKenney wrote:
> >
> > OK, so it looks to me that this code relies on synchronize_sched()
> > forcing a memory barrier on each CPU executing in the kernel.
>
> No, the patch tries to avoid this assumption, but probably I missed
> something.
>
> > 1. A task running on CPU 0 currently write-holds the lock.
> >
> > 2. CPU 1 is running in the kernel, executing a longer-than-average
> >    loop of normal instructions (no atomic instructions or memory
> >    barriers).
> >
> > 3. CPU 0 invokes percpu_up_write(), calling up_write(),
> >    synchronize_sched(), and finally mutex_unlock().
>
> And my expectation was, this should be enough because ...
>
> > 4. CPU 1 executes percpu_down_read(), which calls update_fast_ctr(),
>
> since update_fast_ctr() does preempt_disable/enable it should see all
> modifications done by CPU 0.
>
> IOW. Suppose that the writer (CPU 0) does
>
>	percpu_down_write();
>	STORE;
>	percpu_up_write();
>
> This means
>
>	STORE;
>	synchronize_sched();
>	mutex_unlock();
>
> Now. Do you mean that the next preempt_disable/enable can see the
> result of mutex_unlock() but not STORE?

So far I think this is not possible, so the code doesn't need the
additional wstate/barriers.

> > +static bool update_fast_ctr(struct percpu_rw_semaphore *brw, int val)
> > +{
> > +	bool success = false;
>
>	int state;
>
> > +
> > +	preempt_disable();
> > +	if (likely(!mutex_is_locked(&brw->writer_mutex))) {
>
>	state = ACCESS_ONCE(brw->wstate);
>	if (likely(!state)) {
>
> > +		__this_cpu_add(*brw->fast_read_ctr, val);
> > +		success = true;
>
>	} else if (state & WSTATE_NEED_MB) {
>		__this_cpu_add(*brw->fast_read_ctr, val);
>		smp_mb(); /* Order increment against critical section. */
>		success = true;
>	}

...

> > +void percpu_up_write(struct percpu_rw_semaphore *brw)
> > +{
> > +	/* allow the new readers, but only the slow-path */
> > +	up_write(&brw->rw_sem);
>
>	ACCESS_ONCE(brw->wstate) = WSTATE_NEED_MB;
>
> > +
> > +	/* insert the barrier before the next fast-path in down_read */
> > +	synchronize_sched();

But update_fast_ctr() should see mutex_is_locked(); obviously
down_write() must ensure this. So update_fast_ctr() can execute the
WSTATE_NEED_MB code only if it races with

>	ACCESS_ONCE(brw->wstate) = 0;
>
> > +	mutex_unlock(&brw->writer_mutex);

these 2 stores and sees them in reverse order.
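Just to have it in one place: with those changes folded in, the reader
fast path would look roughly like this. A sketch only; ->wstate and
WSTATE_NEED_MB are the proposed additions, not what the current patch
has, and the preempt_enable()/return tail is filled in from the patch:

	static bool update_fast_ctr(struct percpu_rw_semaphore *brw, int val)
	{
		bool success = false;
		int state;

		preempt_disable();
		state = ACCESS_ONCE(brw->wstate);
		if (likely(!state)) {
			/* no writer in flight, the plain per-cpu increment is enough */
			__this_cpu_add(*brw->fast_read_ctr, val);
			success = true;
		} else if (state & WSTATE_NEED_MB) {
			/* the writer is between up_write() and the end of the GP */
			__this_cpu_add(*brw->fast_read_ctr, val);
			smp_mb(); /* order the increment against the critical section */
			success = true;
		}
		preempt_enable();

		return success;
	}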
I guess that mutex_is_locked() in update_fast_ctr() looks a bit
confusing. It means no-fast-path for the reader; we could use ->state
instead.

And even ->writer_mutex should go away if we want to optimize the
write-contended case, but I think this needs another patch on top of
this initial implementation.

Oleg.
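P.S. For reference, the writer side of the same variant, again only a
sketch assembled from the fragments above (->wstate / WSTATE_NEED_MB
do not exist in the current patch):

	void percpu_up_write(struct percpu_rw_semaphore *brw)
	{
		/* allow the new readers, but only the slow-path */
		up_write(&brw->rw_sem);

		/* until the grace period ends, the fast path must use smp_mb() */
		ACCESS_ONCE(brw->wstate) = WSTATE_NEED_MB;

		/* insert the barrier before the next fast-path in down_read */
		synchronize_sched();

		/* the grace period is over, re-enable the plain fast path */
		ACCESS_ONCE(brw->wstate) = 0;
		mutex_unlock(&brw->writer_mutex);
	}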