From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/2] xfs: Properly retry failed inode items in case of error during buffer writeback
Date: Sat, 20 May 2017 09:39:00 +1000 [thread overview]
Message-ID: <20170519233900.GT17542@dastard> (raw)
In-Reply-To: <20170519112659.GB64091@bfoster.bfoster>
On Fri, May 19, 2017 at 07:27:01AM -0400, Brian Foster wrote:
> On Fri, May 19, 2017 at 10:22:27AM +1000, Dave Chinner wrote:
> > On Wed, May 17, 2017 at 12:41:11PM +0200, Carlos Maiolino wrote:
> > > Hi Dave,
> > >
> > > On Wed, May 17, 2017 at 10:57:49AM +1000, Dave Chinner wrote:
> > > > On Thu, May 11, 2017 at 03:57:33PM +0200, Carlos Maiolino wrote:
> > > > > +xfs_inode_item_error(
> > > > > + struct xfs_log_item *lip,
> > > > > + unsigned int bflags)
> > > > > +{
> > > > > +
> > > > > + /*
> > > > > + * The buffer writeback containing this inode has been failed
> > > > > + * mark it as failed and unlock the flush lock, so it can be retried
> > > > > + * again
> > > > > + */
> > > > > + if (bflags & XBF_WRITE_FAIL)
> > > > > + lip->li_flags |= XFS_LI_FAILED;
> > > > > +}
> > > >
> > > > I'm pretty sure we can't modify the log item state like this in
> > > > IO context because the parent object is not locked by this context.
> > > > We can modify the state during IO submission because we hold the
> > > > parent object lock, but we don't hold it here.
> > > >
> > > > i.e. we can race with other log item state changes done during a
> > > > concurrent transaction (e.g. marking the log item dirty in
> > > > xfs_trans_log_inode()).
> > > >
> > > > > +
> > > > > + if (lip->li_flags & XFS_LI_FAILED)
> > > > > + lip->li_flags &= XFS_LI_FAILED;
> > > > > + lip = next;
> > > > > + }
> > > >
> > > > Same race here - we hold the lock for this inode, but not all the
> > > > other inodes linked to the buffer.
> > > >
> > > > This implies that lip->li_flags needs to use atomic bit updates so
> > > > there are no RMW update races like this.
> > > >
> > >
> > > I believe refers to both cases above, while setting and removing the flag?
> > >
> > > I can simply use set_bit() and clear_bit() here, if this is enough, although, to
> > > use them I'll need to change lip->li_flags size to `unsigned long` if people are
> > > ok with that.
> >
> > Yup, and you need to change every other flag in li_flags to use
> > atomic bitops, too. That also means using test_bit() for value
> > checks, test_and_clear_bit() instead of "if (bit) {clear bit}" and
> > so on.
> >
> > i.e. it's no good just making one flag atomic and all the other
> > updates using RMW accesses to change the state because that doesn't
> > close the update race conditions.
> >
>
> Am I following this correctly that the only potential race here looks
> like it is with XFS_LI_ABORTED (the dirty transaction cancel case,
> specifically)?
XFS_LI_ABORTED is used during a shutdown (dirty trans cancel causes
a shutdown) so races don't really matter here.
However, the XFS_LI_IN_AIL flag is the problem because of it's
serialisation requirements. That is, log IO completion can be
running asynchronously to metadata object IO completion and the
manipulation of the XFS_LI_IN_AIL flag and state is protected by the
ailp->xa_lock.
Adding new flags to the same field that can be asynchronously
updated by RMW operations outside the ailp->xa_lock will cause
problems in future. There *may not* be a problem right now, but it
is poor programming practice to have different coherency processes
for the same variable that is updated via RMW operations. In these
situations, the only safe way to update the variable is to use an
atomic operation.
i.e. I'm not looking for a specific bug in the code - I'm pointing
out a concurrent programming error that is being made.
> But otherwise given the above, that this is primarily I/O error path
> code, and that we presumably already have serialization mechanisms in
> place in the form of parent object locks, why not just grab the inode
> lock in the appropriate places?
Because we can't guarantee that we can lock the parent object in IO
completion. Deadlock is trivial: process A grabs parent object,
blocks on flush lock. IO completion holds flush lock, blocks trying
to get parent object lock.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2017-05-19 23:39 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-05-11 13:57 [PATCH 0/2] Resubmit items failed during writeback Carlos Maiolino
2017-05-11 13:57 ` [PATCH 1/2] xfs: Add infrastructure needed for error propagation during buffer IO failure Carlos Maiolino
2017-05-11 16:51 ` Brian Foster
2017-05-12 8:41 ` Carlos Maiolino
2017-05-12 11:37 ` Brian Foster
2017-05-11 13:57 ` [PATCH 2/2] xfs: Properly retry failed inode items in case of error during buffer writeback Carlos Maiolino
2017-05-11 15:32 ` Eric Sandeen
2017-05-12 8:19 ` Carlos Maiolino
2017-05-11 17:08 ` Brian Foster
2017-05-12 8:21 ` Carlos Maiolino
2017-05-12 11:37 ` Brian Foster
2017-05-17 11:47 ` Carlos Maiolino
2017-05-17 0:57 ` Dave Chinner
2017-05-17 10:41 ` Carlos Maiolino
2017-05-19 0:22 ` Dave Chinner
2017-05-19 11:27 ` Brian Foster
2017-05-19 23:39 ` Dave Chinner [this message]
2017-05-20 11:46 ` Brian Foster
2017-05-21 23:19 ` Dave Chinner
2017-05-22 12:51 ` Brian Foster
2017-05-23 11:23 ` Dave Chinner
2017-05-23 16:22 ` Brian Foster
2017-05-24 1:06 ` Dave Chinner
2017-05-24 12:42 ` Brian Foster
2017-05-24 13:26 ` Carlos Maiolino
2017-05-24 17:08 ` Brian Foster
-- strict thread matches above, loose matches on Subject: below --
2017-06-16 10:54 [PATCH 0/2 V4] Resubmit items failed during writeback Carlos Maiolino
2017-06-16 10:54 ` [PATCH 2/2] xfs: Properly retry failed inode items in case of error during buffer writeback Carlos Maiolino
2017-06-16 11:06 ` Carlos Maiolino
2017-06-16 18:35 ` Luis R. Rodriguez
2017-06-16 19:24 ` Darrick J. Wong
2017-06-16 19:37 ` Luis R. Rodriguez
2017-06-16 19:45 ` Eric Sandeen
2017-06-19 10:59 ` Brian Foster
2017-06-20 16:52 ` Luis R. Rodriguez
2017-06-20 17:20 ` Brian Foster
2017-06-20 18:05 ` Luis R. Rodriguez
2017-06-21 10:10 ` Brian Foster
2017-06-21 15:25 ` Luis R. Rodriguez
2017-06-20 18:38 ` Luis R. Rodriguez
2017-06-20 7:01 ` Carlos Maiolino
2017-06-20 16:24 ` Luis R. Rodriguez
2017-06-21 11:51 ` Carlos Maiolino
2017-06-19 13:49 ` Brian Foster
2017-06-19 15:09 ` Brian Foster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170519233900.GT17542@dastard \
--to=david@fromorbit.com \
--cc=bfoster@redhat.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).