Date: Thu, 25 Feb 2016 09:50:10 -0500
From: Waiman Long
To: Jan Kara
Cc: Alexander Viro, Jan Kara, Jeff Layton, "J. Bruce Fields", Tejun Heo,
 Christoph Lameter, Ingo Molnar, Peter Zijlstra, Andi Kleen, Dave Chinner,
 Scott J Norton, Douglas Hatch
Subject: Re: [PATCH v3 3/3] vfs: Use per-cpu list for superblock's inode list
Message-ID: <56CF14A2.3070107@hpe.com>
In-Reply-To: <56CE1124.7060208@hpe.com>
References: <1456254272-42313-1-git-send-email-Waiman.Long@hpe.com>
 <1456254272-42313-4-git-send-email-Waiman.Long@hpe.com>
 <20160224082840.GB10096@quack.suse.cz>
 <56CE1124.7060208@hpe.com>

On 02/24/2016 03:23 PM, Waiman Long wrote:
> On 02/24/2016 03:28 AM, Jan Kara wrote:
>> On Tue 23-02-16 14:04:32, Waiman Long wrote:
>>> When many threads are trying to add inodes to or delete them from
>>> a superblock's s_inodes list, spinlock contention on the list can
>>> become a performance bottleneck.
>>>
>>> This patch changes the s_inodes field to a per-cpu list with
>>> per-cpu spinlocks. As a result, the following superblock inode list
>>> (sb->s_inodes) iteration functions in vfs are also modified:
>>>
>>> 1. iterate_bdevs()
>>> 2. drop_pagecache_sb()
>>> 3. wait_sb_inodes()
>>> 4. evict_inodes()
>>> 5. invalidate_inodes()
>>> 6. fsnotify_unmount_inodes()
>>> 7. add_dquot_ref()
>>> 8. remove_dquot_ref()
>>>
>>> The numbers below are from an exit microbenchmark that creates a
>>> large number of threads, attaches many inodes to them, and then
>>> exits. The runtimes of that microbenchmark with 1000 threads before
>>> and after the patch on a 4-socket Intel E7-4820 v3 system (40 cores,
>>> 80 threads) were as follows:
>>>
>>>   Kernel            Elapsed Time    System Time
>>>   ------            ------------    -----------
>>>   Vanilla 4.5-rc4      65.29s          82m14s
>>>   Patched 4.5-rc4      22.81s          23m03s
>>>
>>> Before the patch, spinlock contention in the inode_sb_list_add()
>>> function at the startup phase and in the inode_sb_list_del()
>>> function at the exit phase consumed about 79% and 93% of total CPU
>>> time respectively (as measured by perf). After the patch, the
>>> percpu_list_add() function consumed only about 0.04% of CPU time at
>>> the startup phase, and the percpu_list_del() function about 0.4% at
>>> the exit phase. There was still some spinlock contention, but it
>>> happened elsewhere.
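
To make the scheme concrete, here is a simplified, compilable userspace
sketch of the idea, with pthread spinlocks standing in for kernel
spinlocks. The layouts and locking protocol here are illustrative only,
not the actual kernel code in the patch:

#include <pthread.h>
#include <stddef.h>

#define NR_CPUS 4    /* assumption: fixed CPU count for this sketch */

struct percpu_list_node {
    struct percpu_list_node *next, *prev;
    pthread_spinlock_t *lockp;    /* lock of the list this node is on */
};

struct percpu_list_head {
    struct percpu_list_node head;
    pthread_spinlock_t lock;
};

static struct percpu_list_head pcpu_list[NR_CPUS];

static void percpu_list_init(void)
{
    for (int i = 0; i < NR_CPUS; i++) {
        struct percpu_list_head *h = &pcpu_list[i];

        h->head.next = h->head.prev = &h->head;
        pthread_spin_init(&h->lock, PTHREAD_PROCESS_PRIVATE);
    }
}

/* Add to the given CPU's list; only that CPU's lock is taken, so
 * adds running on different CPUs never contend with each other. */
static void percpu_list_add(struct percpu_list_node *node, int cpu)
{
    struct percpu_list_head *h = &pcpu_list[cpu];

    pthread_spin_lock(&h->lock);
    node->next = h->head.next;
    node->prev = &h->head;
    h->head.next->prev = node;
    h->head.next = node;
    node->lockp = &h->lock;    /* remember which lock protects this node */
    pthread_spin_unlock(&h->lock);
}

/* Delete: the node itself records which per-cpu lock to take. */
static void percpu_list_del(struct percpu_list_node *node)
{
    pthread_spinlock_t *lockp = node->lockp;

    pthread_spin_lock(lockp);
    node->prev->next = node->next;
    node->next->prev = node->prev;
    node->next = node->prev = NULL;
    pthread_spin_unlock(lockp);
}

int main(void)
{
    struct percpu_list_node a, b;

    percpu_list_init();
    percpu_list_add(&a, 0);    /* goes on CPU 0's list */
    percpu_list_add(&b, 1);    /* goes on CPU 1's list, different lock */
    percpu_list_del(&a);       /* finds its lock via a.lockp */
    percpu_list_del(&b);
    return 0;
}

The lockp back-pointer is what lets percpu_list_del() work without
being told which list the node is on; it is also the extra pointer per
inode that Jan notes below.
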
>> While looking through this patch, I have noticed that the
>> list_for_each_entry_safe() iterations in evict_inodes() and
>> invalidate_inodes() are actually unnecessary. So if you first apply
>> the attached patch, you don't have to implement safe iteration
>> variants at all.
>
> Thanks for the patch. I will apply it in my next update. As for the
> safe iteration variant, I think I will keep it, since I have already
> implemented it and it may be needed in some other places.
>
>> As a second comment, I'd note that this patch grows struct inode by
>> one pointer. It is probably acceptable for large machines given the
>> speedup, but it should be noted in the changelog. Furthermore, for
>> UP or even small SMP systems this is IMHO undesired bloat, since the
>> speedup won't be noticeable.
>>
>> So for these small systems it would be good if the per-cpu list
>> magic would just fall back to a single linked list with a spinlock.
>> Do you think that is reasonably doable?
>
> I already have a somewhat separate code path for UP, so I can remove
> the lock pointer there. For small SMP systems, however, the only way
> to avoid the extra pointer is to add a config parameter to turn this
> feature off. That can be added as a separate patch, if necessary.

I am sorry, but I have to retreat from this promise for UP. Removing
the lock pointer would require a change in the list deletion API to
pass in the lock information, so I am not going to change it for the
time being.

Cheers,
Longman
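
P.S. To illustrate the deletion API issue: without the per-node lock
pointer, the delete side can no longer find its lock from the node
alone, so every deletion site would have to pass in the list head. A
hypothetical sketch of what that variant would look like (not code
from the patch):

#include <stddef.h>

/* Node without the back-pointer to its lock: one pointer smaller
 * per inode, but unable to locate its own list on deletion. */
struct slim_list_node {
    struct slim_list_node *next, *prev;
};

struct slim_list_head {
    struct slim_list_node head;
    /* on SMP a spinlock would live here; on UP it compiles away */
};

/* The head (and hence the lock) must now come from the caller --
 * this is the list deletion API change referred to above. */
static void slim_list_del(struct slim_list_head *h,
                          struct slim_list_node *node)
{
    (void)h;    /* on SMP, h's lock would be taken around the unlink */
    node->prev->next = node->next;
    node->next->prev = node->prev;
    node->next = node->prev = NULL;
}

int main(void)
{
    struct slim_list_head h;
    struct slim_list_node n;

    /* empty circular list, then insert n at the front */
    h.head.next = h.head.prev = &h.head;
    n.next = &h.head;
    n.prev = &h.head;
    h.head.next = &n;
    h.head.prev = &n;

    slim_list_del(&h, &n);    /* caller must supply the head */
    return 0;
}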