Re: nfs lockup - Benjamin Coddington

Linux NFS development

From: Benjamin Coddington <bcodding@redhat.com>
To: krichy@tvnetwork.hu
Cc: linux-nfs@vger.kernel.org
Subject: Re: nfs lockup
Date: Thu, 22 Oct 2015 07:17:57 -0400 (EDT)
Message-ID: <alpine.OSX.2.19.9992.1510220601461.6711@planck.local>
In-Reply-To: <alpine.DEB.2.20.1510212206120.16145@krichy.tvnetwork.hu>

It looks like a lot of processes are waiting on i_mutex in
generic_file_write_iter().  Is it possible that you're hitting
particularly bad contention on that mutex?

You could use 'perf top' to dig into what the system is doing when
this happens.
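
One way to put a number on "a lot of processes" the next time it happens
is to count how many blocked tasks share the same kernel frame.  Below is
a minimal Python sketch, not something taken from the bug report.  It
assumes a Linux /proc where /proc/<pid>/task/<tid>/stack is readable
(normally root only, and only on kernels built with stack trace support).
The helper names task_state, frames, summarize and blocked_stacks are
made up for illustration.

```python
#!/usr/bin/env python3
"""Count how many blocked (state D) tasks have each kernel function on their stack."""
import collections
import glob
import re


def task_state(status_text):
    """Return the one-letter task state from the text of a /proc status file."""
    match = re.search(r"^State:\s+(\S)", status_text, re.MULTILINE)
    return match.group(1) if match else None


def frames(stack_text):
    """Return the function names found in the text of a /proc stack file."""
    names = re.findall(r"^\[<[^>]*>\]\s+([\w.]+)", stack_text, re.MULTILINE)
    # Drop unresolved entries such as the trailing 0xffffffffffffffff line.
    return [name for name in names if not name.startswith("0x")]


def summarize(stacks):
    """Count, per function, how many of the given stacks contain it."""
    counts = collections.Counter()
    for text in stacks:
        counts.update(set(frames(text)))  # count each function once per task
    return counts


def blocked_stacks():
    """Yield the stack text of every task currently in uninterruptible sleep."""
    for status_path in glob.glob("/proc/[0-9]*/task/[0-9]*/status"):
        try:
            with open(status_path) as fh:
                if task_state(fh.read()) != "D":
                    continue
            with open(status_path[: -len("status")] + "stack") as fh:
                yield fh.read()
        except OSError:
            continue  # task exited in the meantime, or no permission


if __name__ == "__main__":
    for func, count in summarize(blocked_stacks()).most_common(15):
        print(f"{count:5d}  {func}")
```

If generic_file_write_iter shows up with a count close to the number of
blocked tasks, that would be consistent with contention on i_mutex.  It
would not say which task is holding the mutex, so it complements
'perf top' rather than replacing it.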

On Wed, 21 Oct 2015, krichy@tvnetwork.hu wrote:

>
> No, the lockup has nothing to do with drbd. In the ganeti cluster some vms use
> drbd mirrored disks, but others use images in a shared folder on nfs. That is
> what locks up sometimes. The drbd devices work well, and all network
> connectivity works well.
>
> Please advise me what to check next time. Unfortunately I cannot
> reproduce the problem.
>
> Could the 9000 MTU setting affect NFS somehow? Does it matter that we are
> using xen, and thus a hypervisor is involved (regarding drbd it does)?
>
> Thanks,
>
>
> Kojedzinszky Richard
> Euronet Magyarorszag Informatika Zrt.
>
> On Wed, 21 Oct 2015, Benjamin Coddington wrote:
>
> > Date: Wed, 21 Oct 2015 15:05:24 -0400 (EDT)
> > From: Benjamin Coddington <bcodding@redhat.com>
> > To: krichy@tvnetwork.hu
> > Cc: linux-nfs@vger.kernel.org
> > Subject: Re: nfs lockup
> >
> > On Wed, 21 Oct 2015, krichy@tvnetwork.hu wrote:
> >
> > > Dear devs,
> > >
> > > We have an nfs lockup issue. We run a ganeti cluster consisting of 7 debian
> > > linux nodes and 1 freenas for hosting the vm images. The images are exported
> > > via nfsv3. The problem is that randomly we end in a livelock on one of our
> > > nodes.
> > >
> > > That means the nfs share is alive, we can list directories, files, even can
> > > read files (very slow, see later). And even can write to files, but the file
> > > close operation does not return, it gets blocked.
> > >
> > > The read is slow in the sense that, while copying a file from the share to
> > > /tmp, the data arrives at the node very fast, but accumulates in /tmp slowly.
> > >
> > > I've also opened a debian bug report on it, but I think it is not related to
> > > debian (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=801924).
> > >
> > > The only way out is to reboot the machine, interrupting all the vms running
> > > on it.
> > >
> > > I've captured each task's stack trace; hopefully it helps someone find the
> > > issue.
> > >
> > > Meanwhile the other 6 nodes can access the nfs share fine, so I think this is
> > > not a networking or server issue. Restarting the nfs server on the server side
> > > has no effect either; the node does not recover. The nfs tcp connection is
> > > established and listing files works again, but writes do not.
> > >
> > > Some information of the nodes:
> > > # uname -a
> > > Linux host 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1+deb8u4 (2015-09-19) x86_64 GNU/Linux
> > >
> > > They have 1.5G ram allocated to dom0, that should be enough.
> > >
> > > I know this is little information; please advise what to look for next time.
> > > Unfortunately I don't know how to reproduce it.
> > >
> > > Thanks in advance,
> > >
> > > Kojedzinszky Richard
> > > Euronet Magyarorszag Informatika Zrt.
> >
> > I took a look at your debian bug report.. what's up with those drbd procs?
> > Are you writing to drbd-backed devs, and have you made sure that's not
> > involved in any way?
> >
> > Ben
> >
>
