From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <56CE1124.7060208@hpe.com>
Date: Wed, 24 Feb 2016 15:23:00 -0500
From: Waiman Long
To: Jan Kara
CC: Alexander Viro, Jan Kara, Jeff Layton, "J. Bruce Fields", Tejun Heo,
 Christoph Lameter, Ingo Molnar, Peter Zijlstra, Andi Kleen, Dave Chinner,
 Scott J Norton, Douglas Hatch
Subject: Re: [PATCH v3 3/3] vfs: Use per-cpu list for superblock's inode list
References: <1456254272-42313-1-git-send-email-Waiman.Long@hpe.com>
 <1456254272-42313-4-git-send-email-Waiman.Long@hpe.com>
 <20160224082840.GB10096@quack.suse.cz>
In-Reply-To: <20160224082840.GB10096@quack.suse.cz>
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On 02/24/2016 03:28 AM, Jan Kara wrote:
> On Tue 23-02-16 14:04:32, Waiman Long wrote:
>> When many threads are trying to add or delete inode to or from
>> a superblock's s_inodes list, spinlock contention on the list can
>> become a performance bottleneck.
>>
>> This patch changes the s_inodes field to become a per-cpu list with
>> per-cpu spinlocks. As a result, the following superblock inode list
>> (sb->s_inodes) iteration functions in vfs are also being modified:
>>
>>   1. iterate_bdevs()
>>   2. drop_pagecache_sb()
>>   3. wait_sb_inodes()
>>   4. evict_inodes()
>>   5. invalidate_inodes()
>>   6. fsnotify_unmount_inodes()
>>   7. add_dquot_ref()
>>   8. remove_dquot_ref()
>>
>> With an exit microbenchmark that creates a large number of threads,
>> attachs many inodes to them and then exits.
>> The runtimes of that
>> microbenchmark with 1000 threads before and after the patch on a
>> 4-socket Intel E7-4820 v3 system (40 cores, 80 threads) were as
>> follows:
>>
>>   Kernel             Elapsed Time    System Time
>>   ------             ------------    -----------
>>   Vanilla 4.5-rc4       65.29s         82m14s
>>   Patched 4.5-rc4       22.81s         23m03s
>>
>> Before the patch, spinlock contention at the inode_sb_list_add()
>> function at the startup phase and the inode_sb_list_del() function at
>> the exit phase were about 79% and 93% of total CPU time respectively
>> (as measured by perf). After the patch, the percpu_list_add()
>> function consumed only about 0.04% of CPU time at startup phase. The
>> percpu_list_del() function consumed about 0.4% of CPU time at exit
>> phase. There were still some spinlock contention, but they happened
>> elsewhere.
> While looking through this patch, I have noticed that the
> list_for_each_entry_safe() iterations in evict_inodes() and
> invalidate_inodes() are actually unnecessary. So if you first apply the
> attached patch, you don't have to implement safe iteration variants at all.

Thanks for the patch. I will apply it in my next update. As for the
safe iteration variant, I think I will keep it, since I have already
implemented it, in case it is needed in some other places.

> As a second comment, I'd note that this patch grows struct inode by 1
> pointer. It is probably acceptable for large machines given the speedup but
> it should be noted in the changelog. Furthermore for UP or even small SMP
> systems this is IMHO undesired bloat since the speedup won't be noticeable.
>
> So for these small systems it would be good if per-cpu list magic would just
> fall back to single linked list with a spinlock. Do you think that is
> reasonably doable?
>

I already have a somewhat separate code path for UP, so I can remove
the lock pointer there. For small SMP systems, however, the only way
to avoid the extra pointer is to add a config parameter to turn this
feature off.
That can be added as a separate patch, if necessary.

BTW, I think the current inode structure is already pretty big, so
adding one more pointer will not have too much impact on its overall
size.

Cheers,
Longman
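
P.S. To make the "extra pointer" under discussion concrete, here is a
rough user-space sketch of the idea. It is not the code from the patch.
pthread mutexes stand in for spinlocks, NR_SLOTS stands in for the
number of possible CPUs, and every name in it (pcpu_node, pcpu_head,
pcpu_add, pcpu_del, pcpu_count, pcpu_demo) is made up for illustration.
Each node remembers which lock guards the list it was put on, and that
remembered lock is the one pointer by which the node grows.

```c
/* User-space sketch only; all names are hypothetical, not from the patch. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NR_SLOTS 4	/* stand-in for the number of possible CPUs */

struct pcpu_node {
	struct pcpu_node *next, *prev;
	pthread_mutex_t *lockptr;	/* the extra pointer: lock of our list */
};

struct pcpu_head {
	struct pcpu_node list;		/* sentinel of a circular list */
	pthread_mutex_t lock;		/* one lock per slot, not one global */
};

static void pcpu_init(struct pcpu_head heads[NR_SLOTS])
{
	for (int i = 0; i < NR_SLOTS; i++) {
		heads[i].list.next = heads[i].list.prev = &heads[i].list;
		heads[i].list.lockptr = NULL;
		pthread_mutex_init(&heads[i].lock, NULL);
	}
}

/* Add a node to one slot's list; only that slot's lock is taken. */
static void pcpu_add(struct pcpu_head heads[NR_SLOTS],
		     struct pcpu_node *n, unsigned slot)
{
	struct pcpu_head *h = &heads[slot % NR_SLOTS];

	pthread_mutex_lock(&h->lock);
	n->lockptr = &h->lock;
	n->next = h->list.next;
	n->prev = &h->list;
	h->list.next->prev = n;
	h->list.next = n;
	pthread_mutex_unlock(&h->lock);
}

/*
 * Delete a node from whichever list holds it. The caller may be running
 * on a different slot than the one that added the node, hence lockptr.
 * Assumes a given node is never deleted by two threads at once.
 */
static void pcpu_del(struct pcpu_node *n)
{
	pthread_mutex_t *lock = n->lockptr;

	if (!lock)
		return;			/* not on any list */
	pthread_mutex_lock(lock);
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = n;
	n->lockptr = NULL;
	pthread_mutex_unlock(lock);
}

/* Walk every slot's list in turn, holding one lock at a time. */
static size_t pcpu_count(struct pcpu_head heads[NR_SLOTS])
{
	size_t total = 0;

	for (int i = 0; i < NR_SLOTS; i++) {
		struct pcpu_node *p;

		pthread_mutex_lock(&heads[i].lock);
		for (p = heads[i].list.next; p != &heads[i].list; p = p->next)
			total++;
		pthread_mutex_unlock(&heads[i].lock);
	}
	return total;
}

/* Small demo: add six nodes across the slots, delete two, count the rest. */
static size_t pcpu_demo(void)
{
	struct pcpu_head heads[NR_SLOTS];
	struct pcpu_node nodes[6];

	pcpu_init(heads);
	for (unsigned i = 0; i < 6; i++)
		pcpu_add(heads, &nodes[i], i);
	assert(pcpu_count(heads) == 6);
	pcpu_del(&nodes[2]);
	pcpu_del(&nodes[5]);
	return pcpu_count(heads);
}
```

In this sketch, setting NR_SLOTS to 1 gives a single list with a single
lock, but the node still carries lockptr. Getting rid of that field
would need a separate build-time path in which the delete routine is
handed the list head, which is roughly what a config option to turn the
feature off would amount to.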