From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S1758145AbXKGRFy (ORCPT ); Wed, 7 Nov 2007 12:05:54 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org
    id S1753805AbXKGRFq (ORCPT ); Wed, 7 Nov 2007 12:05:46 -0500
Received: from smtp2.linux-foundation.org ([207.189.120.14]:37585
    "EHLO smtp2.linux-foundation.org" rhost-flags-OK-OK-OK-OK)
    by vger.kernel.org with ESMTP id S1750882AbXKGRFp (ORCPT );
    Wed, 7 Nov 2007 12:05:45 -0500
Date: Wed, 7 Nov 2007 09:05:29 -0800
From: Andrew Morton
To: Al Boldi
Cc: neilb@suse.de, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: Massive slowdown when re-querying large nfs dir
Message-Id: <20071107090529.f45626de.akpm@linux-foundation.org>
In-Reply-To: <200711071236.26780.a1426z@gawab.com>
References: <200711050758.38090.a1426z@gawab.com>
    <20071106221939.cfa79f9e.akpm@linux-foundation.org>
    <18225.26935.146395.366451@notabene.brown>
    <200711071236.26780.a1426z@gawab.com>
X-Mailer: Sylpheed version 2.2.4 (GTK+ 2.8.19; i686-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

> On Wed, 7 Nov 2007 12:36:26 +0300 Al Boldi wrote:
> Neil Brown wrote:
> > On Tuesday November 6, akpm@linux-foundation.org wrote:
> > > > On Tue, 6 Nov 2007 14:28:11 +0300 Al Boldi wrote:
> > > > Al Boldi wrote:
> > > > > There is a massive (3-18x) slowdown when re-querying a large nfs dir
> > > > > (2k+ entries) using a simple ls -l.
> > > > >
> > > > > On 2.6.23 client and server running userland rpc.nfs.V2:
> > > > > first try: time -p ls -l <2k+ entry dir> in ~2.5sec
> > > > > more tries: time -p ls -l <2k+ entry dir> in ~8sec
> > > > >
> > > > > first try: time -p ls -l <5k+ entry dir> in ~9sec
> > > > > more tries: time -p ls -l <5k+ entry dir> in ~180sec
> > > > >
> > > > > On 2.6.23 client and 2.4.31 server running userland rpc.nfs.V2:
> > > > > first try: time -p ls -l <2k+ entry dir> in ~2.5sec
> > > > > more tries: time -p ls -l <2k+ entry dir> in ~7sec
> > > > >
> > > > > first try: time -p ls -l <5k+ entry dir> in ~8sec
> > > > > more tries: time -p ls -l <5k+ entry dir> in ~43sec
> > > > >
> > > > > Remounting the nfs-dir on the client resets the problem.
> > > > >
> > > > > Any ideas?
> > > >
> > > > Ok, I played some more with this, and it turns out that nfsV3 is a lot
> > > > faster. But, this does not explain why the 2.4.31 kernel is still
> > > > over 4-times faster than 2.6.23.
> > > >
> > > > Can anybody explain what's going on?
> > >
> > > Sure, Neil can! ;)
> > Thanks Andrew!
>
> > Nuh.
> > He said "userland rpc.nfs.Vx". I only do "kernel-land NFS". In these
> > days of high specialisation, each line of code is owned by a different
> > person, and finding the right person is hard....
> >
> > I would suggest getting a 'tcpdump -s0' trace and seeing (with
> > wireshark) what is different between the various cases.
>
> Thanks Neil for looking into this. Your suggestion has already been answered
> in a previous post, where the difference has been attributed to "ls -l"
> inducing lookup for the first try, which is fast, and getattr for later
> tries, which is super-slow.
>
> Now it's easy to blame the userland rpc.nfs.V2 server for this, but what's
> not clear is how come 2.4.31 handles getattr faster than 2.6.23?
>

We broke 2.6? It'd be interesting to run the ls in an infinite loop on the
client, then start poking at the server. Is the 2.6 server doing physical IO?
Is the 2.6 server consuming more system time? etc. A basic `vmstat 1'
trace for both 2.4 and 2.6 would be a starting point.

Could be that there's some additional latency caused by networking changes,
too. I expect the tcpdump/wireshark/etc traces would have sufficient
resolution for us to be able to see that.
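
For the client side of that experiment, something along these lines might do as a rough stand-in for running `ls -l` in a loop. It is only a sketch: it lists the directory and lstat()s every entry, which approximates the readdir-plus-stat pattern of `ls -l` but skips the sorting and owner-name lookups. The function names `time_listing` and `repeat_listing` and the `passes`/`pause` parameters are made up for illustration.

```python
import os
import time


def time_listing(path):
    """Time one 'ls -l'-style pass: list the directory, then lstat each entry.

    Returns (elapsed_seconds, entry_count).
    """
    start = time.monotonic()
    names = os.listdir(path)
    for name in names:
        # One stat call per entry, as 'ls -l' would issue.
        os.lstat(os.path.join(path, name))
    return time.monotonic() - start, len(names)


def repeat_listing(path, passes=5, pause=0.0):
    """Run several passes back to back and return the elapsed time of each.

    'passes' and 'pause' are illustrative knobs, not taken from the thread.
    """
    timings = []
    for i in range(passes):
        elapsed, count = time_listing(path)
        print(f"pass {i + 1}: {count} entries in {elapsed:.2f}s")
        timings.append(elapsed)
        time.sleep(pause)
    return timings
```

Calling `repeat_listing("/mnt/nfs/bigdir", passes=10)` (the mount point is hypothetical) while `vmstat 1` runs on the server should show whether the later passes line up with extra IO or system time there.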