From mboxrd@z Thu Jan 1 00:00:00 1970
From: tim
Subject: Re: [patch 1/2] fs: cleanup files_lock
Date: Thu, 01 Apr 2010 19:24:40 -0700
Message-ID: <1270175080.15151.34.camel@mudge.jf.intel.com>
References: <20090904065142.114706411@nick.local0.net> <20090904065534.303326352@nick.local0.net> <1254144248.15795.6.camel@laptop> <20091001021657.GO6327@wotan.suse.de>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Cc: ak@linux.intel.com, yanmin_zhang@linux.intel.com, gregkh@suse.de
To: npiggin@suse.de, linux-fsdevel@vger.kernel.org
Return-path:
Received: from mga09.intel.com ([134.134.136.24]:9272 "EHLO mga09.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1756044Ab0DBCYe (ORCPT ); Thu, 1 Apr 2010 22:24:34 -0400
In-Reply-To:
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

Nick Piggin wrote:
> I would like to start sending bits of vfs scalability improvements to
> be reviewed and hopefully merged.
>
> I will start with files_lock. Last time this one came up, it was
> criticised because some hoped to get rid of files list, and because
> it didn't have enough justification of the scalability improvement.
>
> For the first criticism, it isn't any more difficult to rip out if
> we are ever able to remove files list. For the second, I have gathered
> some statistics and written better justification. Andi I believe is
> finding kbuild is becoming limited by files lock on larger systems.

I did some testing with a multi-threaded kernel compile (skipping the
link stage) on a 4-socket Nehalem-EX class machine with 8 cores per
socket. In the past we have found heavy contention on the files_lock.
Nick's patch to reorganize files_lock is beneficial. With a 64-thread
compile, the distribution of user, system, idle, and IO wait time
improved; the details are as follows:

                 us      sys     idle    IO wait  (in %)
2.6.34-rc2       51.25   28.25   17.25   3.25
+nick's patch    53.75   18.5    19      8.75

We spend about 10 percentage points less CPU in system time (28.25%
down to 18.5%) contending for files_lock.
Contention on files_lock is gone from our profiling data. The
throughput with Nick's patch is a bit better (24.9 times faster than a
single-threaded compile, vs 24.5 times without the patch).

Tim
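
For anyone wanting to re-check the arithmetic, here is a minimal sketch that recomputes the deltas from the numbers quoted in this mail. The variable and function names are made up for illustration and are not part of any tooling used for the measurement; only the figures come from the message.

```python
# CPU time distribution (in %) for the 64-thread compile, copied from the mail.
# The dict keys and helper names below are illustrative assumptions.
baseline = {"us": 51.25, "sys": 28.25, "idle": 17.25, "iowait": 3.25}  # 2.6.34-rc2
patched = {"us": 53.75, "sys": 18.5, "idle": 19.0, "iowait": 8.75}     # + Nick's patch


def deltas(before, after):
    """Per-column change in percentage points (after minus before)."""
    return {key: after[key] - before[key] for key in before}


def relative_change(before, after):
    """Relative change in percent, measured against the 'before' value."""
    return (after - before) / before * 100.0


d = deltas(baseline, patched)

# System time: absolute drop in percentage points and relative drop in percent.
sys_rel = relative_change(baseline["sys"], patched["sys"])

# Speedup over the single-threaded compile: 24.5x without the patch, 24.9x with it.
speedup_gain = relative_change(24.5, 24.9)

for key, value in d.items():
    print(f"{key:7s} {value:+.2f} points")
print(f"sys time relative change: {sys_rel:.1f}%")
print(f"throughput relative change: {speedup_gain:.2f}%")
```

This makes the distinction explicit: the system-time drop is 9.75 percentage points in absolute terms, which is a much larger fraction of the original system time, while the throughput gain is small.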