From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: from mail-bn1bon0134.outbound.protection.outlook.com ([157.56.111.134]:51256 "EHLO na01-bn1-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1030261AbcBQSpp (ORCPT ); Wed, 17 Feb 2016 13:45:45 -0500
Message-ID: <56C4BFCF.30100@hpe.com>
Date: Wed, 17 Feb 2016 13:45:35 -0500
From: Waiman Long
MIME-Version: 1.0
To: Peter Zijlstra
CC: Christoph Lameter , Dave Chinner , Alexander Viro , Jan Kara , Jeff Layton , "J. Bruce Fields" , Tejun Heo , , , Ingo Molnar , Andi Kleen , Dave Chinner , Scott J Norton , Douglas Hatch
Subject: Re: [RFC PATCH 1/2] lib/percpu-list: Per-cpu list with associated per-cpu locks
References: <1455672680-7153-1-git-send-email-Waiman.Long@hpe.com> <1455672680-7153-2-git-send-email-Waiman.Long@hpe.com> <20160217095318.GO14668@dastard> <20160217110040.GB6357@twins.programming.kicks-ass.net> <20160217110520.GN6375@twins.programming.kicks-ass.net> <56C49CCA.7090805@hpe.com> <56C4AA19.1080907@hpe.com> <20160217171845.GK6357@twins.programming.kicks-ass.net> <56C4B0E1.4090902@hpe.com> <20160217182212.GL6357@twins.programming.kicks-ass.net>
In-Reply-To: <20160217182212.GL6357@twins.programming.kicks-ass.net>
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: 

On 02/17/2016 01:22 PM, Peter Zijlstra wrote:
> On Wed, Feb 17, 2016 at 12:41:53PM -0500, Waiman Long wrote:
>> On 02/17/2016 12:18 PM, Peter Zijlstra wrote:
>>> On Wed, Feb 17, 2016 at 12:12:57PM -0500, Waiman Long wrote:
>>>> On 02/17/2016 11:27 AM, Christoph Lameter wrote:
>>>>> On Wed, 17 Feb 2016, Waiman Long wrote:
>>>>>
>>>>>> I know we can use RCU for singly linked list, but I don't think we can use
>>>>>> that for doubly linked list as there is no easy way to make atomic changes to
>>>>>> both prev and next pointers simultaneously unless you are taking about 16b
>>>>>> cmpxchg which is only supported in some architecture.
>>>>> But its supported in the most important architecutes. You can fall back to
>>>>> spinlocks on the ones that do not support it.
>>>>>
>>>> I guess with some limitations on how the lists can be traversed, we may be
>>>> able to do that with RCU without lock. However, that will make the code more
>>>> complex and harder to verify. Given that in both my and Dave's testing that
>>>> contentions with list insertion and deletion are almost gone from the perf
>>>> profile when they used to be a bottleneck, is it really worth the effort to
>>>> do such a conversion?
>>> My initial concern was the preempt disable delay introduced by holding
>>> the spinlock over the entire iteration.
>>>
>>> There is no saying how many elements are on that list and there is no
>>> lock break.
>> But preempt_disable() is called at the beginning of the spin_lock() call. So
>> the additional preempt_disable() in percpu_list_add() is just to cover the
>> this_cpu_ptr() call to make sure that the cpu number doesn't change. So we
>> are talking about a few ns at most here.
>>
> I'm talking about the list iteration, there is no preempt_disable() in
> there, just the spin_lock, which you hold over the entire list, which
> can be many, many element.

Sorry for the misunderstanding. The original code has one global lock and
one single list that covers all the inodes in the filesystem. This patch
essentially breaks it up into multiple smaller lists with one lock for
each. So the lock hold time should be greatly reduced unless we are
unfortunate enough that most of the inodes end up on one single list.

If lock hold time is a concern, I think in some cases we can set an upper
limit on how many inodes we want to process, release the lock, reacquire
it and continue. I am just worried that using RCU and 16b cmpxchg will
introduce too much complexity with no performance gain to show.

Cheers,
Longman
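[Editor's sketch] The add path under discussion — one list head and one lock per CPU, with preempt_disable() covering only the this_cpu_ptr() lookup before the per-CPU spinlock is taken — can be illustrated in user space. This is not the kernel API from the patch: the names (pcpu_list, pcpu_list_add), the explicit cpu argument, and the use of pthread mutexes in place of kernel spinlocks are all stand-ins for illustration.

```c
/* User-space sketch of a per-CPU list: one list head and one lock per
 * "CPU", so insertions on different CPUs never contend.  pthread
 * mutexes stand in for kernel spinlocks; there is no real
 * preempt_disable() in user space. */
#include <pthread.h>

struct pcpu_node {
    struct pcpu_node *next, *prev;
};

struct pcpu_list {
    pthread_mutex_t lock;   /* per-"CPU" lock */
    struct pcpu_node head;  /* circular doubly linked list */
};

static void pcpu_list_init(struct pcpu_list *pl, int ncpus)
{
    for (int i = 0; i < ncpus; i++) {
        pthread_mutex_init(&pl[i].lock, NULL);
        pl[i].head.next = pl[i].head.prev = &pl[i].head;
    }
}

/* Add to the list of the "CPU" the caller runs on.  In the kernel,
 * preempt_disable() pins the task just long enough for the
 * this_cpu_ptr() lookup; holding the spinlock then keeps preemption
 * off for the (short) insertion itself.  Here the cpu index is simply
 * passed in. */
static void pcpu_list_add(struct pcpu_node *n, struct pcpu_list *pl, int cpu)
{
    struct pcpu_list *l = &pl[cpu];  /* stands in for this_cpu_ptr() */

    pthread_mutex_lock(&l->lock);
    n->next = l->head.next;          /* insert at head */
    n->prev = &l->head;
    l->head.next->prev = n;
    l->head.next = n;
    pthread_mutex_unlock(&l->lock);
}
```

The point of the structure is the one made in the mail: each insertion holds only its own CPU's lock, and only for the few instructions of the splice, so the preempt-off window on the add side is tiny.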
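[Editor's sketch] The lock-break idea mentioned at the end — process an upper-limited batch, release the lock, reacquire, continue — can be made safe by detaching each batch onto a private list under the lock and processing it outside the lock. This is a self-contained user-space illustration, not code from the patch: the names (locked_list, process_with_lock_break), the singly linked layout, and BATCH are all invented here, and summing an integer stands in for whatever per-inode work the walk would do.

```c
/* Sketch of a list walk with a lock break: hold the lock only long
 * enough to detach up to BATCH nodes onto a private batch list, then
 * drop it and process the batch.  Detached nodes are invisible to
 * other threads, so processing without the lock is safe. */
#include <pthread.h>
#include <stddef.h>

#define BATCH 16  /* illustrative cap on nodes moved per lock hold */

struct node {
    struct node *next;
    int val;
};

struct locked_list {
    pthread_mutex_t lock;
    struct node *head;  /* singly linked, NULL-terminated */
};

/* Consume the whole list in BATCH-sized chunks, returning the sum of
 * vals as a stand-in for real per-node processing. */
static long process_with_lock_break(struct locked_list *ll)
{
    long sum = 0;

    for (;;) {
        struct node *batch = NULL, **tail = &batch;
        int n = 0;

        pthread_mutex_lock(&ll->lock);
        while (ll->head && n < BATCH) {
            *tail = ll->head;            /* detach one node */
            ll->head = ll->head->next;
            tail = &(*tail)->next;
            n++;
        }
        *tail = NULL;                    /* terminate the batch */
        pthread_mutex_unlock(&ll->lock); /* the lock break */

        if (!batch)
            return sum;
        for (struct node *p = batch; p; p = p->next)
            sum += p->val;               /* work done lock-free */
    }
}
```

With per-CPU lists this bounds each lock hold to BATCH detachments regardless of list length, which addresses the "no lock break" concern without resorting to RCU or 16b cmpxchg.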