linux-xfs.vger.kernel.org archive mirror
From: Gareth Clay <gclay@pivotal.io>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: Many D state processes on XFS, kernel 4.4
Date: Thu, 27 Apr 2017 17:01:17 +0100	[thread overview]
Message-ID: <CAPeSaCAPKicLN=QF=gLHUeV5KPQP7f3DbT9QPGjzyECGqArzRQ@mail.gmail.com> (raw)
In-Reply-To: <20170426203451.GA44531@bfoster.bfoster>

Hi Brian,

Thanks very much for the response. Unfortunately we don't have logs
going back that far, so all I can say at the moment is that we're not
seeing any 'metadata I/O error' lines in the logs that we have whilst
the problem has been occurring. We're going to recreate the affected
VM and see if the problem recurs - if it does then we'll be sure to
grab the logs immediately and check.
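When we do recreate the VM, we plan to check along these lines (the log
paths are Ubuntu defaults and are an assumption, not taken from the
affected machine):

```shell
# Look for XFS metadata I/O errors in the kernel ring buffer and in any
# persisted kernel logs. "|| true" keeps the check from aborting a
# script when nothing matches or a path is absent.
dmesg 2>/dev/null | grep -i 'metadata I/O error' || true
grep -ih 'metadata I/O error' /var/log/kern.log* 2>/dev/null || true
```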

What we can say is that this problem seems to have recurred 3 times
already on fresh VMs and disks. We initially wondered if it could be
due to a bad EBS volume or something similar, but this seems less
likely given the recurrence.
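To confirm a recurrence quickly, something like the following sketch
could enumerate D state processes and their kernel stacks, so we can
check whether they are parked in xlog_grant_head_wait (this is our own
monitoring idea, not a tool from this thread; reading /proc/<pid>/stack
needs root):

```shell
#!/bin/sh
# List processes in uninterruptible sleep (state D) together with
# their kernel stack traces from /proc.
for pid in /proc/[0-9]*; do
    # Field 2 of the "State:" line in /proc/<pid>/status is the
    # single-letter state code; /status is safer to parse than /stat,
    # whose comm field may contain spaces.
    state=$(awk '/^State:/ {print $2}' "$pid/status" 2>/dev/null)
    if [ "$state" = "D" ]; then
        echo "=== PID ${pid#/proc/} ($(cat "$pid/comm" 2>/dev/null)) ==="
        cat "$pid/stack" 2>/dev/null
    fi
done
```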

Regarding the other possible cause you mentioned, an I/O that never
completes: could excessive load cause this, or would it be more
indicative of a concurrency issue at the filesystem / kernel level? One
quirk of the workload on this machine is that we
have a lot of XFS project quotas which we're frequently checking to
report disk usage... Could it be that we're causing a starvation
problem?
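For context, the usage reporting we run frequently looks roughly like
this (the mount point is a placeholder, not our actual path):

```shell
# Illustrative form of our per-project quota usage query.
# /var/lib/containers stands in for the real XFS mount point.
MNT=/var/lib/containers
if command -v xfs_quota >/dev/null 2>&1; then
    # report -p: project quotas; -h: human-readable sizes
    xfs_quota -x -c 'report -p -h' "$MNT" || true
else
    echo "xfs_quota not installed" >&2
fi
```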

Thanks again,
Gareth

On Wed, Apr 26, 2017 at 9:34 PM Brian Foster <bfoster@redhat.com> wrote:
>
> On Wed, Apr 26, 2017 at 05:47:15PM +0100, Gareth Clay wrote:
> > Hi,
> >
> > We're trying to diagnose a problem on an AWS virtual machine with two
> > XFS filesystems, each on loop devices. The loop files are sitting on
> > an EXT4 filesystem on Amazon EBS. The VM is running lots of Linux
> > containers - we're using Overlay FS on XFS to provide the root
> > filesystems for these containers.
> >
> > The problem we're seeing is a lot of processes entering D state, stuck
> > in the xlog_grant_head_wait function. We're also seeing xfsaild/loop0
> > stuck in D state. We're not able to write to the filesystem at all on
> > this device, it seems, without the process hitting D state. Once the
> > processes enter D state they never recover, and the list of D state
> > processes seems to be growing slowly over time.
> >
> > The filesystem on loop1 seems fine (we can run ls, touch etc)
> >
> > Would anyone be able to help us to diagnose the underlying problem please?
> >
> > Following the problem reporting FAQ we've collected the following
> > details from the VM:
> >
> > uname -a:
> > Linux 8dd9526f-00ba-4f7b-aa59-a62ec661c060 4.4.0-72-generic
> > #93~14.04.1-Ubuntu SMP Fri Mar 31 15:05:15 UTC 2017 x86_64 x86_64
> > x86_64 GNU/Linux
> >
> > xfs_repair version 3.1.9
> >
> > AWS VM with 8 CPU cores and EBS storage
> >
> > And we've also collected output from /proc, xfs_info, dmesg and the
> > XFS trace tool in the following files:
> >
> > https://s3.amazonaws.com/grootfs-logs/dmesg
> > https://s3.amazonaws.com/grootfs-logs/meminfo
> > https://s3.amazonaws.com/grootfs-logs/mounts
> > https://s3.amazonaws.com/grootfs-logs/partitions
> > https://s3.amazonaws.com/grootfs-logs/trace_report.txt
> > https://s3.amazonaws.com/grootfs-logs/xfs_info
> >
>
> It looks like everything is pretty much backed up on the log and the
> tail of the log is pinned by some dquot items. The trace output shows
> that xfsaild is spinning on flush locked dquots:
>
> <...>-2737622 [001] 33449671.892834: xfs_ail_flushing:     dev 7:0 lip 0x0xffff88012e655e30 lsn 191/61681 type XFS_LI_DQUOT flags IN_AIL
> <...>-2737622 [001] 33449671.892868: xfs_ail_flushing:     dev 7:0 lip 0x0xffff8800110d7bb0 lsn 191/61681 type XFS_LI_DQUOT flags IN_AIL
> <...>-2737622 [001] 33449671.892869: xfs_ail_flushing:     dev 7:0 lip 0x0xffff88012e655a80 lsn 191/67083 type XFS_LI_DQUOT flags IN_AIL
> <...>-2737622 [001] 33449671.892869: xfs_ail_flushing:     dev 7:0 lip 0x0xffff8800110d4810 lsn 191/67296 type XFS_LI_DQUOT flags IN_AIL
> <...>-2737622 [001] 33449671.892869: xfs_ail_flushing:     dev 7:0 lip 0x0xffff880122210460 lsn 191/67310 type XFS_LI_DQUOT flags IN_AIL
>
> The cause of that is not immediately clear. One possible reason is it
> could be due to I/O failure. Do you have any I/O error messages (i.e.,
> "metadata I/O error: block ...") in your logs from before you ended up
> in this state?
>
> If not, I'm wondering if another possibility is an I/O that just never
> completes.. is this something you can reliably reproduce?
>
> Brian
>
> > Thanks for any help or advice you can offer!
> >
> > Claudia and Gareth
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html


Thread overview: 6+ messages
2017-04-26 16:47 Many D state processes on XFS, kernel 4.4 Gareth Clay
2017-04-26 20:34 ` Brian Foster
2017-04-27 16:01   ` Gareth Clay [this message]
2017-04-27 17:57     ` Brian Foster
2017-05-03 12:07       ` Gareth Clay
2017-05-03 14:24         ` Brian Foster
