Message-ID: <56CE1124.7060208@hpe.com>
Date: Wed, 24 Feb 2016 15:23:00 -0500
From: Waiman Long
To: Jan Kara
CC: Alexander Viro, Jan Kara, Jeff Layton, "J. Bruce Fields", Tejun Heo,
    Christoph Lameter, Ingo Molnar, Peter Zijlstra, Andi Kleen,
    Dave Chinner, Scott J Norton, Douglas Hatch
Subject: Re: [PATCH v3 3/3] vfs: Use per-cpu list for superblock's inode list
References: <1456254272-42313-1-git-send-email-Waiman.Long@hpe.com>
 <1456254272-42313-4-git-send-email-Waiman.Long@hpe.com>
 <20160224082840.GB10096@quack.suse.cz>
In-Reply-To: <20160224082840.GB10096@quack.suse.cz>

On 02/24/2016 03:28 AM, Jan Kara wrote:
> On Tue 23-02-16 14:04:32, Waiman Long wrote:
>> When many threads are trying to add or delete inodes to or from a
>> superblock's s_inodes list, spinlock contention on the list can
>> become a performance bottleneck.
>>
>> This patch changes the s_inodes field to become a per-cpu list with
>> per-cpu spinlocks. As a result, the following superblock inode list
>> (sb->s_inodes) iteration functions in vfs are also being modified:
>>
>> 1. iterate_bdevs()
>> 2. drop_pagecache_sb()
>> 3. wait_sb_inodes()
>> 4. evict_inodes()
>> 5. invalidate_inodes()
>> 6. fsnotify_unmount_inodes()
>> 7. add_dquot_ref()
>> 8. remove_dquot_ref()
>>
>> The test is an exit microbenchmark that creates a large number of
>> threads, attaches many inodes to them, and then exits. The runtimes
>> of that microbenchmark with 1000 threads before and after the patch
>> on a 4-socket Intel E7-4820 v3 system (40 cores, 80 threads) were as
>> follows:
>>
>>   Kernel            Elapsed Time    System Time
>>   ------            ------------    -----------
>>   Vanilla 4.5-rc4      65.29s         82m14s
>>   Patched 4.5-rc4      22.81s         23m03s
>>
>> Before the patch, spinlock contention in the inode_sb_list_add()
>> function during the startup phase and in the inode_sb_list_del()
>> function during the exit phase accounted for about 79% and 93% of
>> total CPU time respectively (as measured by perf). After the patch,
>> the percpu_list_add() function consumed only about 0.04% of CPU time
>> in the startup phase, and the percpu_list_del() function consumed
>> about 0.4% of CPU time in the exit phase. There was still some
>> spinlock contention, but it happened elsewhere.
>
> While looking through this patch, I have noticed that the
> list_for_each_entry_safe() iterations in evict_inodes() and
> invalidate_inodes() are actually unnecessary. So if you first apply the
> attached patch, you don't have to implement safe iteration variants at all.

Thanks for the patch. I will apply it in my next update. As for the safe
iteration variant, I think I will keep it since I have already implemented
it, just in case it is needed in some other places.

> As a second comment, I'd note that this patch grows struct inode by 1
> pointer. It is probably acceptable for large machines given the speedup,
> but it should be noted in the changelog. Furthermore, for UP or even small
> SMP systems this is IMHO undesired bloat since the speedup won't be
> noticeable.
>
> So for these small systems it would be good if the per-cpu list magic
> would just fall back to a single linked list with a spinlock. Do you think
> that is reasonably doable?

I already have a somewhat separate code path for UP, so I can remove the
lock pointer for that (see the rough sketch below).
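For reference, the kind of structure being discussed looks roughly like the
sketch below. The names (percpu_list, percpu_list_node, lockptr) are just
placeholders loosely following the function names quoted in the changelog,
not necessarily the exact code in the series. The point is that each node
carries a back-pointer to the lock of the per-cpu sublist it sits on; that
back-pointer is the one extra pointer per inode Jan is referring to, and
the one a UP build can drop.

#include <linux/list.h>
#include <linux/percpu.h>
#include <linux/spinlock.h>

/* One sublist per CPU (illustrative layout only). */
struct percpu_list {
	struct list_head list;		/* this CPU's sublist */
	spinlock_t lock;		/* protects this CPU's sublist */
};

/* Node embedded in struct inode in place of the plain list_head. */
struct percpu_list_node {
	struct list_head list;
	spinlock_t *lockptr;		/* lock of the sublist the node is on */
};

/*
 * Add a node to the current CPU's sublist. Only that CPU's lock is taken,
 * so concurrent adds on different CPUs do not contend with each other.
 */
static inline void percpu_list_add(struct percpu_list_node *node,
				   struct percpu_list __percpu *head)
{
	struct percpu_list *myhead = get_cpu_ptr(head);	/* pins this CPU */

	spin_lock(&myhead->lock);
	node->lockptr = &myhead->lock;
	list_add(&node->list, &myhead->list);
	spin_unlock(&myhead->lock);
	put_cpu_ptr(head);
}

/*
 * Remove a node using the saved lockptr to find the right sublist lock.
 * A real implementation would have to recheck lockptr under the lock if
 * concurrent deleters of the same node were possible.
 */
static inline void percpu_list_del(struct percpu_list_node *node)
{
	spinlock_t *lock = node->lockptr;

	spin_lock(lock);
	list_del_init(&node->list);
	node->lockptr = NULL;
	spin_unlock(lock);
}

Iteration over the whole list then has to walk every CPU's sublist, which
is why the sb->s_inodes iterators listed in the changelog need to be
touched.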
For small SMP systems, however, the only way to avoid the extra pointer is
to add a config parameter to turn this feature off (rough sketch at the end
of this mail). That can be added as a separate patch, if necessary. BTW, I
think the current inode structure is already pretty big, so adding one more
pointer shouldn't have too much impact on its overall size.

Cheers,
Longman
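The config-gated fallback mentioned above would be roughly the following.
CONFIG_SB_PERCPU_LIST is a made-up symbol used purely for illustration, not
an option in the posted series; with the option off, the node degenerates
to a plain list_head protected by a single per-superblock spinlock, so
struct inode does not grow at all.

#include <linux/list.h>
#include <linux/spinlock.h>

/* CONFIG_SB_PERCPU_LIST is a hypothetical Kconfig symbol for illustration. */
#if defined(CONFIG_SMP) && defined(CONFIG_SB_PERCPU_LIST)

/* Per-cpu mode: one sublist per CPU; each node remembers its sublist lock. */
struct percpu_list_node {
	struct list_head list;
	spinlock_t *lockptr;		/* the extra pointer per inode */
};

#else

/*
 * UP or small SMP fallback: a single list under a single spinlock, so the
 * node is just a list_head and struct inode keeps its current size.
 */
struct percpu_list_node {
	struct list_head list;
};

#endif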