From: Steve Kann
Date: Fri, 01 Oct 2010 09:44:43 -0400
To: linux-nfs@vger.kernel.org
Subject: question about serialization/queuing behavior in linux nfs client

Hi, List,

I'm trying to debug some performance problems I've been seeing in a particular application. My main problem is the simple case of an overloaded server, but there's one aspect of the behavior I'm seeing in benchmarks that I don't quite understand.

Basics: I'm running benchmarks from a CentOS 4 client (kernel 2.6.9-78.0.13), using NFSv3 over TCP to connect to a NetApp filer. My benchmark application is a simple Perl script that times directory operations (stat, mkdir, rmdir), and I typically run between 20 and 200 parallel copies.

What I don't quite understand is this: on the wire, I see "worst case" operation times of up to about 10 seconds, but the application reports worst-case times in the 30-60 second range (or higher!).

At first, I thought that perhaps the system calls in the application were being mapped into multiple NFS operations, but that does not appear to be the case. My second thought was that the kernel is somehow limiting the number of outstanding requests it issues to the server.
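The second hypothesis can be made concrete with a toy model. Suppose, hypothetically, that the client let at most a fixed number of requests be in flight and queued the rest in FIFO order. A caller's latency would then be its queue wait plus its wire time, so the worst case seen by the application could be several times the worst case seen on the wire. The sketch assumes a single burst in which every benchmark copy issues one call at t=0; `simulate`, `Result`, and the cap values are illustrative inventions, not kernel names or measured numbers.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Result:
    wire_max: float  # worst time any request spent on the wire (send -> reply)
    app_max: float   # worst time any caller waited (call issued -> reply)


def simulate(wire_times: list[float], max_in_flight: int) -> Result:
    """Toy model (hypothetical): every request is issued at t=0, at most
    `max_in_flight` may be on the wire at once, and the rest wait in FIFO
    order for a free slot. Each request holds its slot for its wire time."""
    if max_in_flight < 1:
        raise ValueError("max_in_flight must be at least 1")
    if not wire_times:
        return Result(wire_max=0.0, app_max=0.0)
    # Min-heap of the times at which each in-flight slot becomes free.
    slot_free_at = [0.0] * min(max_in_flight, len(wire_times))
    heapq.heapify(slot_free_at)
    app_latencies = []
    for wire in wire_times:
        start = heapq.heappop(slot_free_at)  # earliest moment a slot is free
        finish = start + wire                # queue wait + wire time
        heapq.heappush(slot_free_at, finish)
        app_latencies.append(finish)         # issued at t=0, so latency == finish
    return Result(wire_max=max(wire_times), app_max=max(app_latencies))


if __name__ == "__main__":
    print(simulate([10.0] * 200, max_in_flight=16))   # capped: requests queue
    print(simulate([10.0] * 200, max_in_flight=200))  # uncapped: no queueing
```

With 200 copies, a 10 s wire time for every request, and an illustrative cap of 16, the requests go out in 13 waves, so the first call prints `Result(wire_max=10.0, app_max=130.0)`, while the uncapped call prints `Result(wire_max=10.0, app_max=10.0)`. The model only shows that such a cap could explain the gap; it says nothing about whether the 2.6.9 client actually has one.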
It seems that back in kernel 2.4 there was a hard limit of 256 outstanding requests (as per the nfs.sourceforge.net FAQ, item B7), but that limit was removed in 2.5 by this patch from Trond (http://lwn.net/Articles/15074/) and replaced with other mechanisms for limiting memory usage. The machine I'm testing from has 4 GB of RAM and a fairly low application memory footprint (there's not much else running on it other than my tests).

Any idea what causes the disparity between what I'm seeing on the wire and what my test application is seeing?

Thanks for helping me understand,
-SteveK