From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1760822AbcBYOu0 (ORCPT ); Thu, 25 Feb 2016 09:50:26 -0500
Received: from mail-by2on0122.outbound.protection.outlook.com ([207.46.100.122]:14304
	"EHLO na01-by2-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL)
	by vger.kernel.org with ESMTP id S1760202AbcBYOuV (ORCPT );
	Thu, 25 Feb 2016 09:50:21 -0500
Authentication-Results: suse.cz; dkim=none (message not signed)
	header.d=none;suse.cz; dmarc=none action=none header.from=hpe.com;
Message-ID: <56CF14A2.3070107@hpe.com>
Date: Thu, 25 Feb 2016 09:50:10 -0500
From: Waiman Long
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.12) Gecko/20130109 Thunderbird/10.0.12
MIME-Version: 1.0
To: Jan Kara
CC: Alexander Viro , Jan Kara , Jeff Layton , "J. Bruce Fields" ,
	Tejun Heo , Christoph Lameter , , , Ingo Molnar , Peter Zijlstra ,
	Andi Kleen , Dave Chinner , Scott J Norton , Douglas Hatch
Subject: Re: [PATCH v3 3/3] vfs: Use per-cpu list for superblock's inode list
References: <1456254272-42313-1-git-send-email-Waiman.Long@hpe.com>
	<1456254272-42313-4-git-send-email-Waiman.Long@hpe.com>
	<20160224082840.GB10096@quack.suse.cz>
	<56CE1124.7060208@hpe.com>
In-Reply-To: <56CE1124.7060208@hpe.com>
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 02/24/2016 03:23 PM, Waiman Long wrote:
> On 02/24/2016 03:28 AM, Jan Kara wrote:
>> On Tue 23-02-16 14:04:32, Waiman Long wrote:
>>> When many threads are trying to add or delete inode to or from
>>> a superblock's s_inodes list, spinlock contention on the list can
>>> become a performance bottleneck.
>>>
>>> This patch changes the s_inodes field to become a per-cpu list with
>>> per-cpu spinlocks. As a result, the following superblock inode list
>>> (sb->s_inodes) iteration functions in vfs are also being modified:
>>>
>>> 1. iterate_bdevs()
>>> 2. drop_pagecache_sb()
>>> 3. wait_sb_inodes()
>>> 4. evict_inodes()
>>> 5. invalidate_inodes()
>>> 6. fsnotify_unmount_inodes()
>>> 7. add_dquot_ref()
>>> 8. remove_dquot_ref()
>>>
>>> With an exit microbenchmark that creates a large number of threads,
>>> attaches many inodes to them and then exits.
>>> The runtimes of that microbenchmark with 1000 threads before and
>>> after the patch on a 4-socket Intel E7-4820 v3 system (40 cores,
>>> 80 threads) were as follows:
>>>
>>>   Kernel             Elapsed Time    System Time
>>>   ------             ------------    -----------
>>>   Vanilla 4.5-rc4       65.29s          82m14s
>>>   Patched 4.5-rc4       22.81s          23m03s
>>>
>>> Before the patch, spinlock contention at the inode_sb_list_add()
>>> function at the startup phase and the inode_sb_list_del() function at
>>> the exit phase were about 79% and 93% of total CPU time respectively
>>> (as measured by perf). After the patch, the percpu_list_add()
>>> function consumed only about 0.04% of CPU time at startup phase. The
>>> percpu_list_del() function consumed about 0.4% of CPU time at exit
>>> phase. There was still some spinlock contention, but it happened
>>> elsewhere.
>> While looking through this patch, I have noticed that the
>> list_for_each_entry_safe() iterations in evict_inodes() and
>> invalidate_inodes() are actually unnecessary. So if you first apply the
>> attached patch, you don't have to implement safe iteration variants
>> at all.
>
> Thanks for the patch. I will apply that in my next update. As for the
> safe iteration variant, I think I will keep it since I had implemented
> that already just in case it may be needed in some other places.
>
>> As a second comment, I'd note that this patch grows struct inode by 1
>> pointer. It is probably acceptable for large machines given the
>> speedup but it should be noted in the changelog. Furthermore for UP
>> or even small SMP systems this is IMHO undesired bloat since the
>> speedup won't be noticeable.
>>
>> So for these small systems it would be good if per-cpu list magic
>> would just fall back to single linked list with a spinlock. Do you
>> think that is reasonably doable?
>>
>
> I already have a somewhat separate code path for UP. So I can remove
> the lock pointer for that.
> For small SMP systems, however, the only way
> to avoid the extra pointer is to add a config parameter to turn this
> feature off. That can be added as a separate patch, if necessary.

I am sorry that I need to retreat from this promise for UP. Removing the
lock pointer will require a change in the list deletion API to pass in
the lock information. So I am not going to change it for the time being.

Cheers,
Longman
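
The trade-off discussed in this thread (each list node carries a pointer
to the lock of the sublist it sits on, so deletion needs nothing but the
node) can be illustrated with a small userspace sketch. Everything in it
is an assumption made for illustration: the sketch_* names, the fixed
NR_SUBLISTS standing in for the number of CPUs, and the C11 atomic_flag
spinlock standing in for a kernel spinlock. It is not the code from the
patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_SUBLISTS 4	/* hypothetical stand-in for the number of CPUs */

struct sketch_node {
	struct sketch_node *prev, *next;
	atomic_flag *lock;	/* the extra pointer: lock of our sublist */
};

struct sketch_head {
	struct sketch_node list;	/* sentinel of a circular list */
	atomic_flag lock;		/* toy spinlock for this sublist */
};

static struct sketch_head heads[NR_SUBLISTS];

static void sketch_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void sketch_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

static void sketch_init(void)
{
	for (int i = 0; i < NR_SUBLISTS; i++) {
		heads[i].list.prev = heads[i].list.next = &heads[i].list;
		heads[i].list.lock = NULL;
		atomic_flag_clear(&heads[i].lock);
	}
}

/* Add to the sublist chosen by "cpu" and remember which lock guards it. */
static void sketch_add(struct sketch_node *node, int cpu)
{
	assert(cpu >= 0);
	struct sketch_head *h = &heads[cpu % NR_SUBLISTS];

	sketch_lock(&h->lock);
	node->lock = &h->lock;
	node->next = h->list.next;
	node->prev = &h->list;
	h->list.next->prev = node;
	h->list.next = node;
	sketch_unlock(&h->lock);
}

/*
 * Delete takes only the node; the lock is found through node->lock.
 * A real implementation would also have to cope with a concurrent
 * deleter changing node->lock; this sketch ignores that race.
 */
static void sketch_del(struct sketch_node *node)
{
	atomic_flag *lock = node->lock;

	if (!lock)
		return;		/* not on any sublist */
	sketch_lock(lock);
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->prev = node->next = node;
	node->lock = NULL;
	sketch_unlock(lock);
}

/* Count the entries on one sublist while holding its lock. */
static int sketch_count(int cpu)
{
	struct sketch_head *h = &heads[cpu % NR_SUBLISTS];
	int n = 0;

	sketch_lock(&h->lock);
	for (struct sketch_node *p = h->list.next; p != &h->list; p = p->next)
		n++;
	sketch_unlock(&h->lock);
	return n;
}
```

Without the lock member in the node, sketch_del() would need the sublist
(or its lock) as a second argument, which is the kind of deletion API
change the reply above refers to.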