From: "Luis R. Rodriguez" <mcgrof@kernel.org>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>,
	Carlos Maiolino <cmaiolino@redhat.com>,
	linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/2] xfs: Properly retry failed inode items in case of error during buffer writeback
Date: Fri, 16 Jun 2017 21:37:55 +0200
Message-ID: <20170616193755.GD21846@wotan.suse.de>
In-Reply-To: <20170616192445.GG5421@birch.djwong.org>

On Fri, Jun 16, 2017 at 12:24:45PM -0700, Darrick J. Wong wrote:
> On Fri, Jun 16, 2017 at 08:35:10PM +0200, Luis R. Rodriguez wrote:
> > On Fri, Jun 16, 2017 at 12:54:45PM +0200, Carlos Maiolino wrote:
> > > When a buffer fails during writeback, the inode items in it are kept
> > > flush locked and are never resubmitted because of the flush lock, so
> > > if any buffer fails to be written, the items in the AIL are never
> > > written to disk and never unlocked.
> > > 
> > > This causes the unmount operation to hang because these items stay flush locked in the AIL,
> > 
> > What type of hang? If it has occurred in production, is there a trace somewhere?
> > What does it look like?
> > 
> > You said you would work on an xfstest for this; how's that going? Otherwise,
> > a commit log description of how to reproduce would be useful.
> 
> I'm curious for an xfstest too, but I think Carlos /did/ tell us how to
> reproduce -- create a thinp device, format XFS, fill it up, and try to
> unmount.

Well, he did mention creating a DM-thin device with a filesystem larger than
the real device. You seem to be saying to just fill up a filesystem?

Do both cases trigger the issue?

> I don't think there /is/ much of a trace, just xfsaild doing
> nothing and a bunch of ail items that are flush locked and stuck that way.

OK so no hung task seek, no crash, just a system call that never ends?

Is the issue recoverable? So unmount just never completes? Can we CTRL-C
(SIGINT) out of it?

> > > but this also causes the items in AIL to never be written back, even when
> > > the IO device comes back to normal.
> > > 
> > > I've been testing this patch with a DM-thin device, creating a
> > > filesystem larger than the real device.
> > > 
> > > When writing enough data to fill the DM-thin device, XFS receives ENOSPC
> > > errors from the device and keeps spinning in xfsaild (when the 'retry
> > > forever' configuration is set).
> > > 
> > > At this point, the filesystem cannot be unmounted because of the flush-locked
> > > items in the AIL, but worse, the items in the AIL are never retried at all
> > > (since xfs_inode_item_push() skips items that are flush locked),
> > > even if the underlying DM-thin device is expanded to the proper size.
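
To make the failure mode described above concrete, here is a toy model in
Python. It is not kernel code and not how XFS is implemented: the names
ToyAIL, InodeItem, push(), device_full and can_unmount() are all hypothetical,
and the only behavior modeled is the one stated in the commit log, namely that
a push pass skips flush-locked items and a failed buffer write leaves the
flush lock held.

```python
from dataclasses import dataclass, field


@dataclass
class InodeItem:
    ino: int
    flush_locked: bool = False  # taken at flush time, dropped on I/O completion
    on_disk: bool = False


@dataclass
class ToyAIL:
    items: list = field(default_factory=list)
    device_full: bool = True  # stands in for the DM-thin device returning ENOSPC

    def push(self):
        """One push pass; returns the inode numbers actually submitted."""
        submitted = []
        for item in self.items:
            if item.on_disk:
                continue
            if item.flush_locked:
                # Assumption taken from the commit log: flush-locked
                # items are skipped rather than retried.
                continue
            item.flush_locked = True
            submitted.append(item.ino)
            if not self.device_full:
                # Successful write: completion drops the flush lock.
                item.on_disk = True
                item.flush_locked = False
            # Failed write: the flush lock stays held and nothing resubmits.
        return submitted

    def can_unmount(self):
        # Unmount needs every item written back.
        return all(item.on_disk for item in self.items)


ail = ToyAIL(items=[InodeItem(1), InodeItem(2)])
first = ail.push()        # both items submitted, both writes fail
ail.device_full = False   # the thin device is expanded
second = ail.push()       # nothing is retried: items are still flush locked
```

In this sketch the second pass submits nothing even though the device has
recovered, so can_unmount() stays false, which mirrors the stuck state the
patch description reports.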
> > 
> > Jeesh.
> > 
> > If the above issue is a real hang, should we not consider a sensible stable fix
> > to start off with?
> 
> Huh?  I thought this series is supposed to be the fix.

It seems like a rather large set of changes; if the issue was severe, I was hoping
for a stable candidate fix first. If it's not fixing a severe issue, then sure.

  Luis
