From: Shyam Kaushik <shyam@zadarastorage.com>
To: xfs@oss.sgi.com
Subject: RE: XFS hung task in xfs_ail_push_all_sync() when unmounting FS after disk failure/recovery
Date: Tue, 12 Apr 2016 10:50:06 +0530
Message-ID: <695a5148d01ba43e96cb89736da6c1ad@mail.gmail.com>
In-Reply-To: 1aa4e955d78e6932260fdfc55c83bb8e@mail.gmail.com
Hi Carlos,
The xfs.stap script that I have been using is here
(https://www.dropbox.com/s/dluse4s3a1c7dj3/xfs.stap?dl=0). Its line
numbers match my XFS source with the printks I added, so you will have to
tweak it a bit.
Thanks.
--Shyam
-----Original Message-----
From: Carlos Maiolino
Sent: Fri Apr 8 09:31:31 CDT 2016
> Hi Shyam,
>
> would you mind sharing your systemtap script with us?
>
> I'd like to take a look at it, and Brian will probably be interested too.
-----Original Message-----
From: Dave Chinner [mailto:david@fromorbit.com]
Sent: 11 April 2016 06:51
To: Alex Lyakas
Cc: Shyam Kaushik; Brian Foster; xfs@oss.sgi.com
Subject: Re: XFS hung task in xfs_ail_push_all_sync() when unmounting FS
after disk failure/recovery
On Sun, Apr 10, 2016 at 09:40:29PM +0300, Alex Lyakas wrote:
> Hello Dave,
>
> On Sat, Apr 9, 2016 at 1:46 AM, Dave Chinner <david@fromorbit.com> wrote:
> > On Fri, Apr 08, 2016 at 04:21:02PM +0530, Shyam Kaushik wrote:
> >> Hi Dave, Brian, Carlos,
> >>
> >> While trying to reproduce this issue I have been running into
> >> different issues that are similar. The underlying issue remains the
> >> same: when the backend to XFS has failed and we unmount XFS, we run
> >> into the hung-task timeout (180 secs) with a stack like
> >>
> >> kernel: [14952.671131] [<ffffffffc06a5f59>] xfs_ail_push_all_sync+0xa9/0xe0 [xfs]
> >> kernel: [14952.671139] [<ffffffff810b26b0>] ? prepare_to_wait_event+0x110/0x110
> >> kernel: [14952.671181] [<ffffffffc0690111>] xfs_unmountfs+0x61/0x1a0 [xfs]
> >>
> >> while running trace-events, XFS ail push keeps looping around
> >>
> >> xfsaild/dm-10-21143 [001] ...2 17878.555133: xfs_ilock_nowait: dev 253:10 ino 0x0 flags ILOCK_SHARED caller xfs_inode_item_push [xfs]
> >
> > Looks like either a stale inode (which should never reach the AIL)
> > or it's an inode that's been reclaimed and this is a use after free
> > situation. Given that we are failing IOs here, I'd suggest it's more
> > likely to be an IO failure that's caused a writeback problem, not an
> > interaction with stale inodes.
> >
> > So, look at xfs_iflush. If an IO fails, it is supposed to unlock the
> > inode by calling xfs_iflush_abort(), which will also remove it from
> > the AIL. This can also happen on reclaim of a dirty inode, and if so
> > we'll still reclaim the inode because reclaim assumes xfs_iflush()
> > cleans up properly.
> >
> > Which, apparently, it doesn't:
> >
> > 	/*
> > 	 * Get the buffer containing the on-disk inode.
> > 	 */
> > 	error = xfs_imap_to_bp(mp, NULL, &ip->i_imap, &dip, &bp, XBF_TRYLOCK, 0);
> > 	if (error || !bp) {
> > 		xfs_ifunlock(ip);
> > 		return error;
> > 	}
> >
> > This looks like a bug - xfs_iflush hasn't aborted the inode
> > writeback on failure - it's just unlocked the flush lock. Hence it
> > has left the inode dirty in the AIL, and the inode has probably
> > then been reclaimed, setting the inode number to zero.
> In our case, we do not reach this call, because XFS is already marked
> as "shutdown", so in our case we do:
> 	/*
> 	 * This may have been unpinned because the filesystem is shutting
> 	 * down forcibly. If that's the case we must not write this inode
> 	 * to disk, because the log record didn't make it to disk.
> 	 *
> 	 * We also have to remove the log item from the AIL in this case,
> 	 * as we wait for an empty AIL as part of the unmount process.
> 	 */
> 	if (XFS_FORCED_SHUTDOWN(mp)) {
> 		error = -EIO;
> 		goto abort_out;
> 	}
>
> So we call xfs_iflush_abort, but due to "iip" being NULL (as Shyam
> mentioned earlier in this thread), we proceed directly to
> xfs_ifunlock(ip), which now becomes the same situation as you
> described above.
If you are seeing this occur, something else has already gone
wrong, as you can't have a dirty inode without a log item attached to
it. So it appears to me that you are reporting a symptom of a
problem after it has occurred, not the root cause. Maybe it is the
same root cause, maybe not. Either way, it doesn't help us solve any
problem.
> The comment clearly says "We also have to remove the log item from the
> AIL in this case, as we wait for an empty AIL as part of the unmount
> process." Could you perhaps point us at the code that is supposed to
> remove the log item from the AIL? Apparently this is not happening.
xfs_iflush_abort or xfs_iflush_done does that work.
-Dave.
--
Dave Chinner
david@fromorbit.com
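
[Illustrative note, not part of the original exchange.] The difference
between "only drop the flush lock" and "abort the flush" discussed above
can be sketched with a toy userspace model. This is only a sketch under
loud assumptions: every name below (toy_ail, toy_inode, toy_log_item,
toy_unlock_only, toy_flush_abort, toy_items_left) is hypothetical and is
not the XFS code or its data structures. It only shows why an error path
that skips AIL removal, or an abort that finds no log item attached to
the inode, leaves an entry behind for an unmount that waits for an empty
AIL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins only; none of these are the real XFS structures. */
struct toy_log_item {
	bool in_ail;			/* is this item tracked in the toy AIL? */
};

struct toy_inode {
	bool flush_locked;
	struct toy_log_item *item;	/* may be NULL, as reported in the thread */
};

struct toy_ail {
	int nr_items;			/* unmount waits for this to reach zero */
};

/* Error path that only drops the flush lock: the AIL is not touched. */
void toy_unlock_only(struct toy_inode *ip)
{
	ip->flush_locked = false;
}

/*
 * Error path that aborts the flush: remove the attached item from the
 * toy AIL if there is one, then drop the flush lock. With no item
 * attached it degenerates to toy_unlock_only().
 */
void toy_flush_abort(struct toy_ail *ail, struct toy_inode *ip)
{
	struct toy_log_item *item = ip->item;

	if (item != NULL && item->in_ail) {
		assert(ail->nr_items > 0);
		item->in_ail = false;
		ail->nr_items--;
	}
	ip->flush_locked = false;
}

/*
 * Start with one item in the toy AIL, run one of the two error paths,
 * and return how many items are left afterwards.
 */
int toy_items_left(bool use_abort, bool inode_has_item)
{
	struct toy_log_item item = { .in_ail = true };
	struct toy_ail ail = { .nr_items = 1 };
	struct toy_inode ip = {
		.flush_locked = true,
		.item = inode_has_item ? &item : NULL,
	};

	if (use_abort)
		toy_flush_abort(&ail, &ip);
	else
		toy_unlock_only(&ip);
	return ail.nr_items;
}
```

In this model toy_items_left(true, true) returns 0, while both
toy_items_left(false, true) and toy_items_left(true, false) return 1, so
a loop that waits for the toy AIL to drain would never finish in the
latter two cases. Whether the real hang has this cause is exactly what
the thread is still trying to establish.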