Re: NFS hang when writing to loopback file from VMWare ESX (kernel 2.6.30)

linux-nfs.vger.kernel.org archive mirror

From: Chuck Lever <chuck.lever@oracle.com>
To: Beast in Black <beast.in.black@gmail.com>
Cc: linux-nfs@vger.kernel.org
Subject: Re: NFS hang when writing to loopback file from VMWare ESX (kernel 2.6.30)
Date: Mon, 10 May 2010 13:36:25 -0400
Message-ID: <4BE84419.6010306@oracle.com>
In-Reply-To: <AANLkTinyu2l0ykkWfqjHQV2HW9g5rPvWIj-DX6dnDBbI@mail.gmail.com>

On 05/10/2010 05:20 AM, Beast in Black wrote:
> Greetings.
>
> Every so often, when I'm writing via NFS to a loopback-mounted file, I
> find that about 10-15 nfsd threads (out of a total of 64) go into D
> state, along with the loop file, and never recover from the D state.
> My setup is as follows:
>
> 1. sparse file is created via dd and loopback-mounted onto a
> /dev/loopX device (where 0 <= X <= 100)
> 2. sparse file is mke2fs'd and mounted on mount point "/volumes/localvol"
> 3. "/volumes/localvol" is exported with options
> *(rw,no_root_squash,no_subtree_check,async,insecure,nohide,no_wdelay).
> 4. /volumes/localvol is set as a network datastore (NFS mount) in ESX
> 5. Virtual machine files for an ESX VM are copied into the NFS mount on ESX
> 6. Virtual machine is powered on and I do some activity in it...write files etc.
>
> At this point, the VM is running fine in ESX. After a while, however,
> I notice that the VM freezes and that ESX reports the NFS mounted
> datastore as unreachable. When I check the NFS server machine, I find
> that 10-15 NFS threads are in D state, along with the associated
> loopback-mounted file. The D states are never recovered from, and the
> only way out is to reboot the NFS server machine.
>
> I have also tried with specifying the export as "sync" instead of
> "async" (and removing no_wdelay) but I still see the same behavior.
>
> The NFS server is running the vanilla 2.6.30 kernel on Ubuntu 8.10.
> The NFS exports are all NFSv3.
>
> Does anyone have an idea of why this may be occurring? I would be glad
> to provide any additional info required.

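There may be a deadlock due to memory pressure on the server.  You might
get some information by running "echo t | sudo tee /proc/sysrq-trigger"
when the server gets into the hung state, then looking in your syslog.
(The redirection has to be performed as root, so "sudo echo ... > file"
will not work from an unprivileged shell.)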
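
A minimal sketch of how that check might be scripted, assuming a procps-style "ps" and sudo access on the NFS server. The function names (filter_d_state, list_d_state, dump_task_stacks) are made up for illustration and are not part of any tool. The first two list tasks stuck in uninterruptible sleep (state D) with the kernel function they are waiting in; the third asks the kernel to dump every task's stack to the kernel log.

```shell
#!/bin/sh

# Keep the header line plus any row whose STAT column (field 2) starts
# with "D", i.e. tasks in uninterruptible sleep.
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# List D-state tasks together with the kernel wait channel of each one.
list_d_state() {
    ps -eo pid,stat,wchan:32,comm | filter_d_state
}

# Ask the kernel to dump all task stacks to the kernel log (needs root).
# "tee" runs under sudo, so the write to /proc happens with root rights.
dump_task_stacks() {
    echo t | sudo tee /proc/sysrq-trigger > /dev/null
}
```

When the hang occurs, running list_d_state shows which nfsd and loop threads are blocked; running dump_task_stacks and then reading the kernel log (for example with dmesg) should show the stack trace of each blocked thread.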

-- 
chuck[dot]lever[at]oracle[dot]com
