From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: from mail-bn1bon0134.outbound.protection.outlook.com ([157.56.111.134]:51256 "EHLO na01-bn1-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1030261AbcBQSpp (ORCPT ); Wed, 17 Feb 2016 13:45:45 -0500
Message-ID: <56C4BFCF.30100@hpe.com>
Date: Wed, 17 Feb 2016 13:45:35 -0500
From: Waiman Long
MIME-Version: 1.0
To: Peter Zijlstra
CC: Christoph Lameter , Dave Chinner , Alexander Viro , Jan Kara , Jeff Layton , "J. Bruce Fields" , Tejun Heo , , , Ingo Molnar , Andi Kleen , Dave Chinner , Scott J Norton , Douglas Hatch
Subject: Re: [RFC PATCH 1/2] lib/percpu-list: Per-cpu list with associated per-cpu locks
References: <1455672680-7153-1-git-send-email-Waiman.Long@hpe.com> <1455672680-7153-2-git-send-email-Waiman.Long@hpe.com> <20160217095318.GO14668@dastard> <20160217110040.GB6357@twins.programming.kicks-ass.net> <20160217110520.GN6375@twins.programming.kicks-ass.net> <56C49CCA.7090805@hpe.com> <56C4AA19.1080907@hpe.com> <20160217171845.GK6357@twins.programming.kicks-ass.net> <56C4B0E1.4090902@hpe.com> <20160217182212.GL6357@twins.programming.kicks-ass.net>
In-Reply-To: <20160217182212.GL6357@twins.programming.kicks-ass.net>
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: 

On 02/17/2016 01:22 PM, Peter Zijlstra wrote:
> On Wed, Feb 17, 2016 at 12:41:53PM -0500, Waiman Long wrote:
>> On 02/17/2016 12:18 PM, Peter Zijlstra wrote:
>>> On Wed, Feb 17, 2016 at 12:12:57PM -0500, Waiman Long wrote:
>>>> On 02/17/2016 11:27 AM, Christoph Lameter wrote:
>>>>> On Wed, 17 Feb 2016, Waiman Long wrote:
>>>>>
>>>>>> I know we can use RCU for singly linked list, but I don't think we can use
>>>>>> that for doubly linked list as there is no easy way to make atomic changes to
>>>>>> both prev and next pointers simultaneously unless you are taking about 16b
>>>>>> cmpxchg which is only supported in some architecture.
>>>>> But its supported in the most important architecutes. You can fall back to
>>>>> spinlocks on the ones that do not support it.
>>>>>
>>>> I guess with some limitations on how the lists can be traversed, we may be
>>>> able to do that with RCU without lock. However, that will make the code more
>>>> complex and harder to verify. Given that in both my and Dave's testing that
>>>> contentions with list insertion and deletion are almost gone from the perf
>>>> profile when they used to be a bottleneck, is it really worth the effort to
>>>> do such a conversion?
>>> My initial concern was the preempt disable delay introduced by holding
>>> the spinlock over the entire iteration.
>>>
>>> There is no saying how many elements are on that list and there is no
>>> lock break.
>> But preempt_disable() is called at the beginning of the spin_lock() call. So
>> the additional preempt_disable() in percpu_list_add() is just to cover the
>> this_cpu_ptr() call to make sure that the cpu number doesn't change. So we
>> are talking about a few ns at most here.
>>
> I'm talking about the list iteration, there is no preempt_disable() in
> there, just the spin_lock, which you hold over the entire list, which
> can be many, many element.

Sorry for the misunderstanding. The original code has one global lock and
one single list that covers all the inodes in the filesystem. This patch
essentially breaks it up into multiple smaller lists with one lock for
each. So the lock hold time should be greatly reduced unless we are
unfortunate enough that most of the inodes end up on one single list.

If lock hold time is a concern, I think in some cases we can set an upper
limit on how many inodes we want to process, release the lock, reacquire
it and continue. I am just worried that using RCU and 16b cmpxchg will
introduce too much complexity with no performance gain to show.

Cheers,
Longman
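[Editor's sketch] The add path under discussion — one list head and one lock per CPU, with preempt_disable() covering only the this_cpu_ptr() lookup before the per-CPU spinlock is taken — can be illustrated in user space. This is not the kernel API from the patch: the names (pcpu_list, pcpu_list_add), the explicit cpu argument, and the use of pthread mutexes in place of kernel spinlocks are all stand-ins for illustration.

```c
/* User-space sketch of a per-CPU list: one list head and one lock per
 * "CPU", so insertions on different CPUs never contend.  pthread
 * mutexes stand in for kernel spinlocks; there is no real
 * preempt_disable() in user space. */
#include <pthread.h>

struct pcpu_node {
    struct pcpu_node *next, *prev;
};

struct pcpu_list {
    pthread_mutex_t lock;   /* per-"CPU" lock */
    struct pcpu_node head;  /* circular doubly linked list */
};

static void pcpu_list_init(struct pcpu_list *pl, int ncpus)
{
    for (int i = 0; i < ncpus; i++) {
        pthread_mutex_init(&pl[i].lock, NULL);
        pl[i].head.next = pl[i].head.prev = &pl[i].head;
    }
}

/* Add to the list of the "CPU" the caller runs on.  In the kernel,
 * preempt_disable() pins the task just long enough for the
 * this_cpu_ptr() lookup; holding the spinlock then keeps preemption
 * off for the (short) insertion itself.  Here the cpu index is simply
 * passed in. */
static void pcpu_list_add(struct pcpu_node *n, struct pcpu_list *pl, int cpu)
{
    struct pcpu_list *l = &pl[cpu];  /* stands in for this_cpu_ptr() */

    pthread_mutex_lock(&l->lock);
    n->next = l->head.next;          /* insert at head */
    n->prev = &l->head;
    l->head.next->prev = n;
    l->head.next = n;
    pthread_mutex_unlock(&l->lock);
}
```

The point of the structure is the one made in the mail: each insertion holds only its own CPU's lock, and only for the few instructions of the splice, so the preempt-off window on the add side is tiny.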
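[Editor's sketch] The lock-break idea mentioned at the end — process an upper-limited batch, release the lock, reacquire, continue — can be made safe by detaching each batch onto a private list under the lock and processing it outside the lock. This is a self-contained user-space illustration, not code from the patch: the names (locked_list, process_with_lock_break), the singly linked layout, and BATCH are all invented here, and summing an integer stands in for whatever per-inode work the walk would do.

```c
/* Sketch of a list walk with a lock break: hold the lock only long
 * enough to detach up to BATCH nodes onto a private batch list, then
 * drop it and process the batch.  Detached nodes are invisible to
 * other threads, so processing without the lock is safe. */
#include <pthread.h>
#include <stddef.h>

#define BATCH 16  /* illustrative cap on nodes moved per lock hold */

struct node {
    struct node *next;
    int val;
};

struct locked_list {
    pthread_mutex_t lock;
    struct node *head;  /* singly linked, NULL-terminated */
};

/* Consume the whole list in BATCH-sized chunks, returning the sum of
 * vals as a stand-in for real per-node processing. */
static long process_with_lock_break(struct locked_list *ll)
{
    long sum = 0;

    for (;;) {
        struct node *batch = NULL, **tail = &batch;
        int n = 0;

        pthread_mutex_lock(&ll->lock);
        while (ll->head && n < BATCH) {
            *tail = ll->head;            /* detach one node */
            ll->head = ll->head->next;
            tail = &(*tail)->next;
            n++;
        }
        *tail = NULL;                    /* terminate the batch */
        pthread_mutex_unlock(&ll->lock); /* the lock break */

        if (!batch)
            return sum;
        for (struct node *p = batch; p; p = p->next)
            sum += p->val;               /* work done lock-free */
    }
}
```

With per-CPU lists this bounds each lock hold to BATCH detachments regardless of list length, which addresses the "no lock break" concern without resorting to RCU or 16b cmpxchg.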