From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1423450AbcBQQIn (ORCPT ); Wed, 17 Feb 2016 11:08:43 -0500 Received: from mail-bl2on0135.outbound.protection.outlook.com ([65.55.169.135]:61696 "EHLO na01-bl2-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1030275AbcBQQIk (ORCPT ); Wed, 17 Feb 2016 11:08:40 -0500 X-Greylist: delayed 143884 seconds by postgrey-1.27 at vger.kernel.org; Wed, 17 Feb 2016 11:08:40 EST Authentication-Results: fromorbit.com; dkim=none (message not signed) header.d=none;fromorbit.com; dmarc=none action=none header.from=hpe.com; Message-ID: <56C49AFF.1090103@hpe.com> Date: Wed, 17 Feb 2016 11:08:31 -0500 From: Waiman Long User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.12) Gecko/20130109 Thunderbird/10.0.12 MIME-Version: 1.0 To: Dave Chinner CC: Alexander Viro , Jan Kara , Jeff Layton , "J. Bruce Fields" , Tejun Heo , Christoph Lameter , , , Ingo Molnar , Peter Zijlstra , Andi Kleen , Dave Chinner , Scott J Norton , Douglas Hatch Subject: Re: [RRC PATCH 2/2] vfs: Use per-cpu list for superblock's inode list References: <1455672680-7153-1-git-send-email-Waiman.Long@hpe.com> <1455672680-7153-3-git-send-email-Waiman.Long@hpe.com> <20160217103739.GP14668@dastard> In-Reply-To: <20160217103739.GP14668@dastard> Content-Type: text/plain; charset="ISO-8859-1"; format=flowed Content-Transfer-Encoding: 7bit X-Originating-IP: [71.168.64.11] X-ClientProxiedBy: BN3PR09CA0015.namprd09.prod.outlook.com (25.160.111.153) To AT5PR84MB0130.NAMPRD84.PROD.OUTLOOK.COM (25.162.137.24) X-Microsoft-Exchange-Diagnostics: 1;AT5PR84MB0130;2:IAmgR6UCRURwSHIE6SHqEzCIjFGN/pWDVXr9ruach2lUEUxo5fPeL6OEKEChYXCrweuTMEniQ/z/dDCzy8OH+z9tQ7vN7NqhnfZy/D/JiMH4XGD2gnv7qd4uix+QmiukVUuO91ekmGErfr9QBj5MtQ==;3:CXAiBKMvYQy2ZV5IVYNcQYkA8Uzuy75sb4zy+r8BAxMa9dTb8Uep16uhR4DI+dkE8e5g0pi8R7Ea9WOE5nr1V3nMEuP2JoKFiLBN7yJoiFzWp6PZBop8xKBMHO1gq3DA;25:aqjnky3ej0TRGJSLhd/XQ23lGN5kpZMVOlyRuJDh35UeTawo6gZMREs3JraNo5C+oQcofZpQ7AqMd5C3x4Gw0iJMEogcEk1IpgSqUyMQRCYLkFp6aG7IkNuZJEvbv7lmMcLORLFGp9eh693nNNE22mybsAkFgdRIHmZ5tGt+tnVgnjpWYTLaCH+WREDCUsaCv9vHNdYphvQYMBbWZgBNICweAGCRUQZZgLgMAymw7mg9MJdacpb6lrZcqdyes/+EjLCcUYiACHnVP8DRVmBxNBCzsrR+q8qYFU5ReRhAm6zKWTHj7C/tD5kK2wdY3RbS;20:lt4rZNIgYQERFhUx7cTiy3aZiZdXClsPaWtq7MhyJAcLzl9WAfH2R+s8S++S3laXuUYTCuv5Jqgn4k+55QnGGcKNzOX4AHckB+gSBLyXbFACuyqei2oTg9canwWFw4ZZsD+1WDqLwyGiguVqondmbeWYYb2iUR0Q6WEqRbmtMUFeTQvlr5TYsyp4VXmNHI//PiSKSCcHDX0uSHLqArpp2tKOjwTnhCOQvKtykdtAOpYvRHgWyq/sGeGkoAo5gRBj X-Microsoft-Antispam: UriScan:;BCL:0;PCL:0;RULEID:;SRVR:AT5PR84MB0130; X-MS-Office365-Filtering-Correlation-Id: 120557f4-d7e5-4b67-5b47-08d337b4986e X-Microsoft-Antispam-PRVS: X-Exchange-Antispam-Report-Test: UriScan:; X-Exchange-Antispam-Report-CFA-Test: BCL:0;PCL:0;RULEID:(601004)(2401047)(8121501046)(5005006)(3002001)(10201501046);SRVR:AT5PR84MB0130;BCL:0;PCL:0;RULEID:;SRVR:AT5PR84MB0130; X-Microsoft-Exchange-Diagnostics: 1;AT5PR84MB0130;4:GKIhYPg6ihemp4sp8weBYmhEEjBiGYJLc1kCQutJ6nwj1++NQWXsJ9Z9U8m6blvc7skWdOR1BUTNDOqN0YBCyFzczZoUGJ52zo2iHQXBVICXN3aYCGUYNQptxreuC0AF74+bBmE69j9TsnnnAbIxnQkdGt0hhAC/CFYkshiI5PfdnXXKTq1uVBToQFWYzvrVVLQ/2tHqhfBVx3h0eq3PxjvuAnxGYnRI/ThYNsTshweWWUEI94XkTaKxYjM9VFxT8JvGo/pq4uVvm2zzOa6tHtRdA3oNztx575dgglRau2wrkfop0s7T2zt+KL8O876smjZAf1+3upVZm1/HLXdrYw6bqIwYzD2+03+XM9MctFVE4/Zv5Qh59wRp51q1c6D/ X-Forefront-PRVS: 085551F5A8 X-Forefront-Antispam-Report: SFV:NSPM;SFS:(10019020)(4630300001)(979002)(6009001)(6049001)(24454002)(479174004)(51914003)(377454003)(81156007)(50466002)(5890100001)(4001350100001)(86362001)(80316001)(6116002)(110136002)(189998001)(5001960100002)(42186005)(83506001)(5004730100002)(36756003)(47776003)(230700001)(4326007)(23756003)(2950100001)(77096005)(5008740100001)(2906002)(3846002)(87266999)(76176999)(64126003)(54356999)(19580395003)(65816999)(117156001)(50986999)(40100003)(87976001)(66066001)(1096002)(586003)(92566002)(65806001)(59896002)(65956001)(33656002)(7059030)(217873001)(969003)(989001)(999001)(1009001)(1019001);DIR:OUT;SFP:1102;SCL:1;SRVR:AT5PR84MB0130;H:[192.168.142.188];FPR:;SPF:None;MLV:ovrnspm;PTR:InfoNoRecords;LANG:en; X-Microsoft-Exchange-Diagnostics: =?iso-8859-1?Q?1;AT5PR84MB0130;23:zLcBqARHyGNUL+CCbajuIwWUgOIffO860vhtoxS?= =?iso-8859-1?Q?9DfsdmMAaCM0vfAEOGCzDppzXOrCd1K83LjI85SwOX8yVliH5iyM6tiPY0?= =?iso-8859-1?Q?RA3sFXVdHBg0Z59FV23QLp1r9R+zFI/Hf9UgNGNzdyrFyK5fy9xZqQPFCh?= =?iso-8859-1?Q?CnJF94QRFVUtq45TAs7MtuzP9DZbddhFeM6ynBOciZzVXomTKD7Cc/4EpI?= =?iso-8859-1?Q?2kwkC/aDy33EfS6LB6M8LoeJo9qR528ZOq+UQ0Hv1/2rGgukWqZ9TZB2Wx?= =?iso-8859-1?Q?C3/tiS5JiaOtkHydZsY2LFXiaMiFmot35AJMX5S3/bL2rZMhT2o3WDtX2K?= =?iso-8859-1?Q?cZBX3P6xd1z0L0Req98dxP+4YO/ufOVaH1C4dMcHk/Ts0R7OhboD7oAWJJ?= =?iso-8859-1?Q?hbkd9YBGlggNfnp3fH3nVflGm/81+680oSfnFp/cQZOIoYoCCRh375OgsA?= =?iso-8859-1?Q?ooJHzt+EuB1N+Yvmb9H4PPJRAE+danlP4vJG6R2maodLuN7gB7w9zldI/1?= =?iso-8859-1?Q?EUcSzztNdTDQ8E+Tk8WPWtxi5aL4ruhKqDdCEyOAhAOtNKndVSrUJXEE+m?= =?iso-8859-1?Q?n0H8itpDiDkxo8+Z2lDnk/Zg4cu+0LPd0OdRtp4A4i5XRhERALPG0alLaZ?= =?iso-8859-1?Q?20Wztej19p2RehrFCals6GKN+5szA9djgKOY+/btaZZk9ypj5iXw4ECm/F?= =?iso-8859-1?Q?V9ZrfjGsry8EoZpv8zxnXUzOiEbliVoVaJszUorFvC5heWaKtWTY5pyCnF?= =?iso-8859-1?Q?kmfMj7AzRWJDtW/uTTYhnafEToAoSHwZt6ZM4cty3yDMjhmrnL8BkMqgei?= =?iso-8859-1?Q?VL5zUlnPeD7I9RYaOyyYdCvbRtVX2Ei9oj1P4NK4LUS3mHwqlxp5GDZIKd?= =?iso-8859-1?Q?WyBAHNyaWg26EFdI7kYBwOKxmkVewZ4opo0IMs+CumAPn3xn01yapgCVIQ?= =?iso-8859-1?Q?7EqZl/YvkLXh1M9Nulqp/VcPVmqCtXTrYbDqunMVw4v2Pa2Si+ErDz7BDD?= =?iso-8859-1?Q?esIE3E5PiesAaQkszdrxcEnAN2THft4sgMOGUweJNxlqcwezHuMI6aRi2u?= =?iso-8859-1?Q?+0ByU1UYfih8AG+yztYoiR4mVoP5o+JfbmJTGKd5t8fCndG6uoN64oyhsR?= =?iso-8859-1?Q?OPPWcBFm0n6N9L8VagMFDOHUt4/GIOhJixrWgJI4jGzfhWV3kr06P7lHFo?= =?iso-8859-1?Q?PsG2Am/b0HQnzv4Q1sD+o1CsopsWogCibL74Io3w27njJ9m6Um9Xyf9Axb?= =?iso-8859-1?Q?qJ1UVRB15T7ulJNGKIoaeM51QOAGbBHfIC9PrgrSFbeoUumu8jF92AZ2dy?= =?iso-8859-1?Q?H8mFwgMViM+38imcp3BVAmH2sXq7YPrQDaxyALumm/M1fqWzIhHiETYSse?= =?iso-8859-1?Q?tMrsYMnnKW1O+NmTW30THOeC1BLeJZZxsBcGJRSCM4DcV2+Sq3KM9PU+zN?= =?iso-8859-1?Q?Fk3VI1oYR4egRpMBFtQH4TsvCrbOlkAvOYN?= X-Microsoft-Exchange-Diagnostics: 1;AT5PR84MB0130;5:5wvVMe60SyKMGeAIJM3hwXb6S56LzjIeutEzYjfvXKaPEwCLHTie4zPUqnarkZMPHELfMjgRcmwEDkcs/K1ojYgo2YPdQejsgUZom8dIgK1BqXaulIfPlgZtE/K6m/k9hEyM65c1/b0C+vQSyZUBLQ==;24:TKUvLYqNErHzLZxAyle3GmwBzTPsCWC8SGLeaXqcWFU5y7VaxFJNigINzFl+0Ur5Bsax8LHb2nLwdhWIUA6dx3rm8eFl21K/ipa81NjFFdM= SpamDiagnosticOutput: 1:23 SpamDiagnosticMetadata: NSPM X-OriginatorOrg: hpe.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 17 Feb 2016 16:08:37.2100 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-Transport-CrossTenantHeadersStamped: AT5PR84MB0130 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 02/17/2016 05:37 AM, Dave Chinner wrote: > On Tue, Feb 16, 2016 at 08:31:20PM -0500, Waiman Long wrote: >> When many threads are trying to add or delete inode to or from >> a superblock's s_inodes list, spinlock contention on the list can >> become a performance bottleneck. >> >> This patch changes the s_inodes field to become a per-cpu list with >> per-cpu spinlocks. >> >> With an exit microbenchmark that creates a large number of threads, >> attachs many inodes to them and then exits. The runtimes of that >> microbenchmark with 1000 threads before and after the patch on a >> 4-socket Intel E7-4820 v3 system (40 cores, 80 threads) were as >> follows: >> >> Kernel Elapsed Time System Time >> ------ ------------ ----------- >> Vanilla 4.5-rc4 65.29s 82m14s >> Patched 4.5-rc4 22.81s 23m03s > Pretty good :) > > My fsmark tests usually show up a fair bit of contention - moving > 250k inodes through the cache every second over 16p does generate a > bit of load on the list. The patch makes the inode list add/del > operations disappear completely from the perf profiles, and there's > a marginal decrease in runtime (~4m40s vs 4m30s). I think the global > lock is right on the edge of breakdown under this load, though, so > if I was testing on a larger system I think the difference would be > much bigger. > > I'll run some more testing on it, see if anything breaks. > > A few comments on the code follow. > >> @@ -1866,8 +1866,8 @@ void iterate_bdevs(void (*func)(struct block_device *, void *), void *arg) >> { >> struct inode *inode, *old_inode = NULL; >> >> - spin_lock(&blockdev_superblock->s_inode_list_lock); >> - list_for_each_entry(inode,&blockdev_superblock->s_inodes, i_sb_list) { >> + for_all_percpu_list_entries_simple(inode, percpu_lock, >> + blockdev_superblock->s_inodes_cpu, i_sb_list) { > This is kind what I meant about names getting way too long. How > about something like: > > #define walk_sb_inodes(inode, sb, pcpu_lock) \ > for_all_percpu_list_entries_simple(inode, pcpu_lock, \ > sb->s_inodes_list, i_sb_list) > > #define walk_sb_inodes_end(pcpu_lock) end_all_percpu_list_entries(pcpu_lock) > > for brevity? Yes, I think adding some inode specific macros in fs.h will help to make the patch easier to read. >> @@ -189,7 +190,7 @@ void fsnotify_unmount_inodes(struct super_block *sb) >> spin_unlock(&inode->i_lock); >> >> /* In case the dropping of a reference would nuke next_i. */ >> - while (&next_i->i_sb_list !=&sb->s_inodes) { >> + while (&next_i->i_sb_list.list != percpu_head) { >> spin_lock(&next_i->i_lock); >> if (!(next_i->i_state& (I_FREEING | I_WILL_FREE))&& >> atomic_read(&next_i->i_count)) { >> @@ -199,16 +200,16 @@ void fsnotify_unmount_inodes(struct super_block *sb) >> break; >> } >> spin_unlock(&next_i->i_lock); >> - next_i = list_next_entry(next_i, i_sb_list); >> + next_i = list_next_entry(next_i, i_sb_list.list); > pcpu_list_next_entry(next_i, i_sb_list)? Will add that. >> @@ -1397,9 +1398,8 @@ struct super_block { >> */ >> int s_stack_depth; >> >> - /* s_inode_list_lock protects s_inodes */ >> - spinlock_t s_inode_list_lock ____cacheline_aligned_in_smp; >> - struct list_head s_inodes; /* all inodes */ >> + /* The percpu locks protect s_inodes_cpu */ >> + PERCPU_LIST_HEAD(s_inodes_cpu); /* all inodes */ > There is no need to encode the type of list into the name. > i.e. drop the "_cpu" suffix - we can see it's a percpu list from the > declaration. Will remove that macro. Thanks for the review. Cheers, Longman