Re: NFS hang when writing to loopback file from VMWare ESX (kernel 2.6.30)

From: Chuck Lever <chuck.lever@oracle.com>
To: Beast in Black <beast.in.black@gmail.com>
Cc: linux-nfs@vger.kernel.org
Subject: Re: NFS hang when writing to loopback file from VMWare ESX (kernel 2.6.30)
Date: Mon, 10 May 2010 13:36:25 -0400
Message-ID: <4BE84419.6010306@oracle.com>
In-Reply-To: <AANLkTinyu2l0ykkWfqjHQV2HW9g5rPvWIj-DX6dnDBbI@mail.gmail.com>

On 05/10/2010 05:20 AM, Beast in Black wrote:
> Greetings.
>
> Every so often, when I'm writing via NFS to a loopback-mounted file, I
> find that about 10-15 nfsd threads (out of a total of 64) go into D
> state, along with the loop file, and never recover from the D state.
> My setup is as follows:
>
> 1. sparse file is created via dd and loopback-mounted onto a
> /dev/loopX device (where 0 <= X <= 100)
> 2. sparse file is mke2fs'd and mounted on mount point "/volumes/localvol"
> 3. "/volumes/localvol" is exported with options
> *(rw,no_root_squash,no_subtree_check,async,insecure,nohide,no_wdelay).
> 4. /volumes/localvol is set as a network datastore (NFS mount) in ESX
> 5. Virtual machine files for an ESX VM are copied into the NFS mount on ESX
> 6. Virtual machine is powered on and I do some activity in it...write files etc.
>
> At this point, the VM is running fine in ESX. After a while, however,
> I notice that the VM freezes and that ESX reports the NFS mounted
> datastore as unreachable. When I check the NFS server machine, I find
> that 10-15 NFS threads are in D state, along with the associated
> loopback-mounted file. The D states are never recovered from, and the
> only way out is to reboot the NFS server machine.
>
> I have also tried with specifying the export as "sync" instead of
> "async" (and removing no_wdelay) but I still see the same behavior.
>
> The NFS server is running the vanilla 2.6.30 kernel on Ubuntu 8.10.
> The NFS exports are all NFSv3.
>
> Does anyone have an idea of why this may be occurring? I would be glad
> to provide any additional info required.
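
For anyone trying to reproduce this, steps 1-3 of the setup quoted above
can be sketched roughly as follows. This is only an illustration: the image
path, the 10 GiB size, and the choice of /dev/loop0 are assumptions that do
not appear in the report, and the function name is hypothetical. Only the
mount point and export options are taken from the original message. The
script is a dry run by default and prints the commands instead of running
them.

```shell
# Dry run by default: commands are printed, not executed.
# Set RUN=sudo in the environment to actually run them.
RUN=${RUN:-echo}

# Hypothetical values; the report does not give a path, size, or loop device.
IMG=${IMG:-/srv/localvol.img}
LOOP=${LOOP:-/dev/loop0}
# Mount point and export options as given in the report.
MNT=/volumes/localvol
OPTS=rw,no_root_squash,no_subtree_check,async,insecure,nohide,no_wdelay

setup_loop_export() {
    # Step 1: create a sparse file (seek past the end, write no data blocks)
    # and attach it to a loop device.
    $RUN dd if=/dev/zero of="$IMG" bs=1M count=0 seek=10240
    $RUN losetup "$LOOP" "$IMG"
    # Step 2: make an ext2 filesystem on it and mount it.
    $RUN mke2fs -q "$LOOP"
    $RUN mkdir -p "$MNT"
    $RUN mount "$LOOP" "$MNT"
    # Step 3: export the mount point to all clients.
    $RUN exportfs -o "$OPTS" "*:$MNT"
}
```

Calling setup_loop_export with RUN unset lists the six commands, which is a
cheap way to review them before running anything as root.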

There may be a deadlock due to memory pressure on the server.  You might
get some information by running "echo t | sudo tee /proc/sysrq-trigger"
when the server gets into the hung state, then looking in your syslog.
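
A minimal sketch of that check, assuming a procps-style ps, sudo access, and
sysrq enabled on the server. The two function names are hypothetical helpers
for illustration. Note that the trigger file is /proc/sysrq-trigger, and that
the write goes through tee because a redirection after "sudo echo" would be
performed by the unprivileged calling shell.

```shell
# Keep only tasks in uninterruptible sleep (state D) from the output of
# `ps -eo state,pid,comm`, printing "PID COMMAND" for each.
# Only the first word of the command name is kept.
d_state_tasks() {
    awk '$1 ~ /^D/ { print $2, $3 }'
}

# Ask the kernel to log a stack trace for every task (SysRq 't').
# Needs root; the traces end up in the kernel log / syslog.
dump_all_task_stacks() {
    echo t | sudo tee /proc/sysrq-trigger > /dev/null
}

# Example use on the hung server (not run here):
#   ps -eo state,pid,comm | d_state_tasks
#   dump_all_task_stacks && dmesg | tail -n 200
```

The traces for the stuck nfsd threads and the loop thread are the useful
part of the log; they show where each task is blocked.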

-- 
chuck[dot]lever[at]oracle[dot]com
