From: Shawn Starr <shawn.starr@rogers.com>
To: Benjamin Coddington <bcodding@redhat.com>
Cc: linux-nfs@vger.kernel.org
Subject: Re: [NFSv4.1] Deadlock on writes - RHEL 7.1 kernel - nfs_pageio_doio?
Date: Thu, 03 Sep 2015 11:54:40 -0400
Message-ID: <2932091.5g0LhYZZI5@segfault>
In-Reply-To: <alpine.OSX.2.19.9992.1509030643490.71803@planck.local>
On Thursday, September 03, 2015 06:47:26 AM Benjamin Coddington wrote:
> Hi Shawn,
>
> This doesn't look like a deadlock to me, just processes waiting for their
> writes to complete. They've been waiting for a long time, so the hung task
> warning is triggered.
>
> There might be a network problem that's preventing that NFS client from
> communicating with the server, or the server is taking a very long time
> to complete the operation. A network capture between the client and server
> might show what's actually happening.
>
> Ben
>
Hi Ben,
While that might be the case, this does not happen on our RHEL6 and CentOS 6.x VMs,
so I'm hesitant to say it's entirely network related.
If EL7 changed some NFS timeouts, that might explain the hung task warning. However,
if I leave the VMs stuck they never recover. They appear deadlocked in the VFS subsystem, in that I can't log in to them
via SSH or from the KVM console session itself. So all writes that are not on local disk are deadlocked;
writes to remote syslog appear fine, as these don't go through the VFS.
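To compare the effective NFS timeouts between the EL6 and EL7 guests, a minimal sketch along these lines might help. It assumes the mounts show up in /proc/mounts with the usual timeo= and retrans= options described in nfs(5). The sample line, including the server name, export path and values, is made up for illustration and is not taken from our VMs:

```python
def nfs_mount_options(mounts_text):
    """Return {mount_point: {option: value}} for NFS entries in /proc/mounts-style text."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        opts = {}
        for item in fields[3].split(","):
            # Flag-style options such as "hard" have no "=", so their value is "".
            key, _, value = item.partition("=")
            opts[key] = value
        result[fields[1]] = opts
    return result


# Hypothetical sample line: server name, export path and values are invented.
SAMPLE = (
    "filer:/export /mnt/data nfs4 "
    "rw,relatime,vers=4.1,hard,proto=tcp,timeo=600,retrans=2 0 0\n"
    "/dev/vda1 / xfs rw,relatime 0 0\n"
)

for mount_point, opts in nfs_mount_options(SAMPLE).items():
    print(mount_point, "timeo=" + opts.get("timeo", "?"), "retrans=" + opts.get("retrans", "?"))
```

On a real guest the text would come from reading /proc/mounts instead of SAMPLE; running it on an EL6 and an EL7 VM and comparing the output would show whether the defaults differ.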
Thanks,
Shawn
> On Wed, 2 Sep 2015, Shawn Starr wrote:
> > Hello NFS devs,
> >
> > While this is a CentOS/RHEL kernel: 3.10.0-229.4.2.el7.x86_64 (and old)
> >
> > I was wondering your take on this deadlock, I cannot reproduce this and it
> > seems to happen in our KVM VM images randomly so far only once. When we
> > configure a VM it does two reboots, first sets up things then a final
> > reboot where we have a fresh bootup with settings in place.
> >
> >
> > This could be from a cron thats running, but the VMs in question is pretty
> > much idle, CPU skyrockets and they deadlock, can't ssh into them to
> > examine why. We have remote syslog capturing, so I would never see this
> > otherwise.
> >
> > If anyone has ideas on how I can test this? This has been reported in the
> > CentOS bugtracker by someone else also, I couldn't use their methods for
> > reproduction.
> >
> > below is the trace from kernel:
> >
<snip>