From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from rcsinet10.oracle.com ([148.87.113.121]:60581 "EHLO
 rcsinet10.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1751754Ab0EJRgk (ORCPT );
 Mon, 10 May 2010 13:36:40 -0400
Message-ID: <4BE84419.6010306@oracle.com>
Date: Mon, 10 May 2010 13:36:25 -0400
From: Chuck Lever
To: Beast in Black
CC: linux-nfs@vger.kernel.org
Subject: Re: NFS hang when writing to loopback file from VMWare ESX (kernel 2.6.30)
References:
In-Reply-To:
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Sender: linux-nfs-owner@vger.kernel.org
List-ID:
MIME-Version: 1.0

On 05/10/2010 05:20 AM, Beast in Black wrote:
> Greetings.
>
> Every so often, when i'm writing via NFS to a loopback-mounted file, i
> find that about 10-15 nfsd threads (out of a total of 64) go into D
> state, along with the loop file, and never recover from the D state.
> My setup is as follows:
>
> 1. sparse file is created via dd and loopback-mounted onto a
> /dev/loopX device (where 0<= X<= 100)
> 2. sparse file is mke2fs'd and mounted on mount point "/volumes/localvol"
> 3. "/volumes/localvol" is exported with options
> *(rw,no_root_squash,no_subtree_check,async,insecure,nohide,no_wdelay).
> 4. /volumes/localvol is set as a network datastore (NFS mount) in ESX
> 5. Virtual machine files for an ESX VM are copied into the NFS mount on ESX
> 6. Virtual machine is powered on and I do some activity in it...write files etc.
>
> At this point, the VM is running fine in ESX. After a while, however,
> I notice that the VM freezes and that ESX reports the NFS mounted
> datastore as unreachable. When I check the NFS server machine, I find
> that 10-15 NFS threads are in D state, along with the associated
> loopback-mounted file. The D states are never recovered from, and the
> only way out is to reboot the NFS server machine.
>
> I have also tried with specifying the export as "sync" instead of
> "async" (and removing no_wdelay) but I still see the same behavior.
>
> The NFS server is running the vanilla 2.6.30 kernel on Ubuntu 8.10.
> The NFS exports are all NFSv3.
>
> Does anyone have an idea of why this may be occurring? I would be glad
> to provide any additional info required.

There may be a deadlock due to memory pressure on the server. When the
server gets into the hung state, you might get some information by
running "echo t | sudo tee /proc/sysrq-trigger" and then looking at the
task dump in your syslog.

-- 
chuck[dot]lever[at]oracle[dot]com
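
Before triggering the task dump, it can help to confirm which tasks are
actually stuck. The following is a minimal sketch, assuming a Linux /proc
layout where /proc/<pid>/stat holds the pid, the command name in
parentheses, and then the one-letter task state. The function names
parse_stat and blocked_tasks are hypothetical, chosen only for this
illustration.

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) parsed from the text of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    lparen = stat_line.index("(")
    rparen = stat_line.rindex(")")
    pid = int(stat_line[:lparen].strip())
    comm = stat_line[lparen + 1:rparen]
    state = stat_line[rparen + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc"):
    """Yield (pid, comm) for each process currently in D state."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited or its stat file was unreadable
        if state == "D":
            yield pid, comm


if __name__ == "__main__":
    for pid, comm in blocked_tasks():
        print(pid, comm)
```

Run on the NFS server while it is hung, this should list the nfsd threads
and the loop device thread if they are the tasks in D state. It only
reports which tasks are blocked; the sysrq task dump is still what shows
where in the kernel they are waiting.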