From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1760815AbcBYOoD (ORCPT ); Thu, 25 Feb 2016 09:44:03 -0500
Received: from mail-bn1bon0133.outbound.protection.outlook.com ([157.56.111.133]:17680
	"EHLO na01-bn1-obe.outbound.protection.outlook.com"
	rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
	id S1760610AbcBYOoA (ORCPT ); Thu, 25 Feb 2016 09:44:00 -0500
Authentication-Results: kernel.org; dkim=none (message not signed)
	header.d=none;kernel.org; dmarc=none action=none header.from=hpe.com;
Message-ID: <56CF1322.2040609@hpe.com>
Date: Thu, 25 Feb 2016 09:43:46 -0500
From: Waiman Long
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.12) Gecko/20130109
	Thunderbird/10.0.12
MIME-Version: 1.0
To: Ingo Molnar
CC: Jan Kara , Alexander Viro , Jan Kara , Jeff Layton ,
	"J. Bruce Fields" , Tejun Heo , Christoph Lameter , , ,
	Ingo Molnar , Peter Zijlstra , Andi Kleen , Dave Chinner ,
	Scott J Norton , Douglas Hatch
Subject: Re: [PATCH v3 3/3] vfs: Use per-cpu list for superblock's inode list
References: <1456254272-42313-1-git-send-email-Waiman.Long@hpe.com>
	<1456254272-42313-4-git-send-email-Waiman.Long@hpe.com>
	<20160224082840.GB10096@quack.suse.cz>
	<20160224083630.GA22868@gmail.com>
	<20160224085858.GE10096@quack.suse.cz>
	<20160225080635.GB10611@gmail.com>
In-Reply-To: <20160225080635.GB10611@gmail.com>
Content-Type: multipart/mixed;
	boundary="------------040703040006030204020606"
X-Originating-IP: [72.71.243.170]
X-ClientProxiedBy: SN1PR12CA0014.namprd12.prod.outlook.com (25.162.96.152)
	To TU4PR84MB0319.NAMPRD84.PROD.OUTLOOK.COM (25.162.186.29)
X-OriginatorOrg: hpe.com
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

--------------040703040006030204020606
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Content-Transfer-Encoding: 7bit

On 02/25/2016 03:06 AM, Ingo Molnar wrote:
> * Jan Kara wrote:
>
>>>>> With an exit microbenchmark that creates a large number of threads,
>>>>> attachs many inodes to them and then exits. The runtimes of that
>>>>> microbenchmark with 1000 threads before and after the patch on a 4-socket
>>>>> Intel E7-4820 v3 system (40 cores, 80 threads) were as follows:
>>>>>
>>>>>   Kernel             Elapsed Time    System Time
>>>>>   ------             ------------    -----------
>>>>>   Vanilla 4.5-rc4       65.29s          82m14s
>>>>>   Patched 4.5-rc4       22.81s          23m03s
>>>>>
>>>>> Before the patch, spinlock contention at the inode_sb_list_add() function
>>>>> at the startup phase and the inode_sb_list_del() function at the exit
>>>>> phase were about 79% and 93% of total CPU time respectively (as measured
>>>>> by perf).
>>>>> After the patch, the percpu_list_add() function consumed only
>>>>> about 0.04% of CPU time at startup phase. The percpu_list_del() function
>>>>> consumed about 0.4% of CPU time at exit phase. There were still some
>>>>> spinlock contention, but they happened elsewhere.
>>>> While looking through this patch, I have noticed that the
>>>> list_for_each_entry_safe() iterations in evict_inodes() and
>>>> invalidate_inodes() are actually unnecessary. So if you first apply the
>>>> attached patch, you don't have to implement safe iteration variants at all.
>>>>
>>>> As a second comment, I'd note that this patch grows struct inode by 1
>>>> pointer. It is probably acceptable for large machines given the speedup but
>>>> it should be noted in the changelog. Furthermore for UP or even small SMP
>>>> systems this is IMHO undesired bloat since the speedup won't be noticeable.
>>>>
>>>> So for these small systems it would be good if per-cpu list magic would just
>>>> fall back to single linked list with a spinlock. Do you think that is
>>>> reasonably doable?
>>> Even many 'small' systems tend to be SMP these days.
>> Yes, I know. But my tablet with 4 ARM cores is unlikely to benefit from this
>> change either. [...]
> I'm not sure about that at all, the above numbers are showing a 3x-4x speedup in
> system time, which ought to be noticeable on smaller SMP systems as well.
>
> Waiman, could you please post the microbenchmark?
>
> Thanks,
>
> 	Ingo

The microbenchmark that I used is attached.

I do agree that the performance benefit will decrease as the number of
CPUs gets smaller. The system that I used for testing has 4 sockets with
40 cores (80 threads). Dave Chinner ran his fstests on a 16-core system
(probably 2-socket), which showed a modest improvement in performance
(~4m40s vs 4m30s in runtime).

This patch enables parallel insertion and deletion to/from the inode
list, which used to be a serialized operation.
So if that list operation is a bottleneck, you will see a significant
improvement. If it is not, you may not notice much of a difference. For a
single-socket 4-core system, I agree that the performance benefit, if
any, will be limited.

Cheers,
Longman

--------------040703040006030204020606
Content-Type: text/plain; name="exit_test.c"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="exit_test.c"

/*
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * Authors: Waiman Long
 */
/*
 * This is an exit test
 */
/* Headers required by the calls used below */
#include <ctype.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/types.h>

#define do_exit()	syscall(SYS_exit, 0)
#define gettid()	syscall(SYS_gettid)
#define MAX_THREADS	2048

static inline void cpu_relax(void)
{
	__asm__ __volatile__("rep;nop": : :"memory");
}

static inline void atomic_inc(volatile int *v)
{
	__asm__ __volatile__("lock incl %0": "+m" (*v));
}

static volatile int exit_now  = 0;
static volatile int threadcnt = 0;

/*
 * Walk the calling thread's /proc entries to fill the dentry cache
 */
static void walk_procfs(void)
{
	char cmdbuf[256];
	pid_t tid = gettid();

	snprintf(cmdbuf, sizeof(cmdbuf), "find /proc/%d > /dev/null 2>&1", tid);
	if (system(cmdbuf) < 0)
		perror("system() failed!");
}

static void *exit_thread(void *dummy)
{
	long tid = (long)dummy;

	(void)tid;
	walk_procfs();
	atomic_inc(&threadcnt);
	/*
	 * Sleep until the exit_now flag is set and then call exit.
	 * Nothing in this test sets the flag; the threads actually die
	 * from the kill(0, SIGKILL) in exit_test().
	 */
	while (!exit_now)
		sleep(1);
	do_exit();
	return NULL;	/* Not reachable */
}

static void exit_test(int threads)
{
	pthread_t thread[threads];
	long i = 0;
	time_t start = time(NULL);

	while (i++ < threads) {
		if (pthread_create(thread + i - 1, NULL, exit_thread,
				  (void *)i)) {
			perror("pthread_create");
			exit(1);
		}
#if 0
		/*
		 * Pipelining to reduce contention & improve speed
		 */
		if ((i & 0xf) == 0)
			while (i - threadcnt > 12)
				usleep(1);
#endif
	}
	while (threadcnt != threads)
		usleep(1);
	walk_procfs();
	printf("Setup time = %lus\n", (unsigned long)(time(NULL) - start));
	printf("Process ready to exit!\n");
	kill(0, SIGKILL);
	exit(0);
}

int main(int argc, char *argv[])
{
	int   tcnt;	/* Thread count */
	char *cmd = argv[0];

	if ((argc != 2) || !isdigit((unsigned char)argv[1][0])) {
		fprintf(stderr, "Usage: %s \n", cmd);
		exit(1);
	}
	tcnt = strtoul(argv[1], NULL, 10);
	if (tcnt > MAX_THREADS) {
		fprintf(stderr, "Error: thread count should be <= %d\n",
			MAX_THREADS);
		exit(1);
	}
	exit_test(tcnt);
	return 0;	/* Not reachable */
}

--------------040703040006030204020606--
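
The remark above about parallel insertion and deletion can be illustrated
with a small userspace sketch. This is not the code from the patch: it
assumes a fixed number of sub-lists (NR_LISTS, standing in for the number
of CPUs), pthread mutexes in place of kernel spinlocks, and hypothetical
helper names (sharded_add(), sharded_del(), sharded_demo()). Each insert
or delete takes only the lock of one sub-list, so threads working on
different sub-lists do not serialize on a single global lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NR_LISTS 4	/* assumption: stand-in for the number of CPUs */

struct node {
	struct node *prev, *next;
	int list;		/* index of the sub-list holding this node */
};

struct sub_list {
	pthread_mutex_t lock;	/* userspace stand-in for a spinlock */
	struct node head;	/* circular doubly linked list head */
};

static struct sub_list lists[NR_LISTS];

/* Insert at the head of the caller's own sub-list; one lock is taken. */
static void sharded_add(struct node *n, int self)
{
	struct sub_list *l = &lists[self % NR_LISTS];

	n->list = self % NR_LISTS;
	pthread_mutex_lock(&l->lock);
	n->next = l->head.next;
	n->prev = &l->head;
	l->head.next->prev = n;
	l->head.next = n;
	pthread_mutex_unlock(&l->lock);
}

/* Remove a node from whichever sub-list it was added to. */
static void sharded_del(struct node *n)
{
	struct sub_list *l = &lists[n->list];

	pthread_mutex_lock(&l->lock);
	n->prev->next = n->next;
	n->next->prev = n->prev;
	pthread_mutex_unlock(&l->lock);
}

struct worker_arg {
	int id, n;
	struct node *nodes;
};

/* Each worker adds n nodes, then deletes the even-indexed ones. */
static void *worker(void *p)
{
	struct worker_arg *a = p;
	int i;

	for (i = 0; i < a->n; i++)
		sharded_add(&a->nodes[i], a->id);
	for (i = 0; i < a->n; i += 2)
		sharded_del(&a->nodes[i]);
	return NULL;
}

/* Run the workers and return how many nodes remain on all sub-lists. */
long sharded_demo(int nthreads, int per_thread)
{
	pthread_t tid[nthreads];
	struct worker_arg arg[nthreads];
	struct node *n;
	long left = 0;
	int i;

	for (i = 0; i < NR_LISTS; i++) {
		pthread_mutex_init(&lists[i].lock, NULL);
		lists[i].head.prev = lists[i].head.next = &lists[i].head;
	}
	for (i = 0; i < nthreads; i++) {
		arg[i].id = i;
		arg[i].n = per_thread;
		arg[i].nodes = calloc(per_thread, sizeof(struct node));
		assert(arg[i].nodes != NULL);
		assert(pthread_create(&tid[i], NULL, worker, &arg[i]) == 0);
	}
	for (i = 0; i < nthreads; i++)
		pthread_join(tid[i], NULL);
	/* All workers are done, so the walk below needs no locking. */
	for (i = 0; i < NR_LISTS; i++)
		for (n = lists[i].head.next; n != &lists[i].head; n = n->next)
			left++;
	for (i = 0; i < nthreads; i++)
		free(arg[i].nodes);
	for (i = 0; i < NR_LISTS; i++)
		pthread_mutex_destroy(&lists[i].lock);
	return left;
}
```

Calling sharded_demo(8, 1000) from a test program (built with -pthread)
leaves half of each thread's nodes on the lists. With NR_LISTS set to 1
the same code degenerates to the single-lock list, which is a rough way
to compare the two arrangements under contention.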