From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/2] xfs: Properly retry failed inode items in case of error during buffer writeback
Date: Mon, 22 May 2017 08:51:13 -0400
Message-ID: <20170522125110.GA11578@bfoster.bfoster>
In-Reply-To: <20170521231906.GU17542@dastard>
On Mon, May 22, 2017 at 09:19:06AM +1000, Dave Chinner wrote:
> On Sat, May 20, 2017 at 07:46:56AM -0400, Brian Foster wrote:
> > On Sat, May 20, 2017 at 09:39:00AM +1000, Dave Chinner wrote:
> > > Adding new flags to the same field that can be asynchronously
> > > updated by RMW operations outside the ailp->xa_lock will cause
> > > problems in future. There *may not* be a problem right now, but it
> > > is poor programming practice to have different coherency processes
> > > for the same variable that is updated via RMW operations. In these
> > > situations, the only safe way to update the variable is to use an
> > > atomic operation.
> > >
> >
> > So is there a reason why we couldn't acquire ->xa_lock to fail the log
> > items, as we would have done anyway if the metadata writeback had
> > succeeded and we were removing the log items from the AIL?
>
> Yes. The ailp->xa_lock, which protects AIL state, is a highly
> contended lock. It should not be used for things that aren't AIL
> related because that would have performance and scalability
> implications.
>
The purpose of this flag is to control AIL retry processing, so how is it
not AIL related? If the I/O succeeds, we grab ->xa_lock to remove the
items from the AIL. If the I/O fails, we can't grab ->xa_lock to tag the
items as failed (so the AIL knows if/how to retry them) because of
performance and scalability concerns?
In fact, having taken another look at the code, you could argue that
failing to take ->xa_lock is a landmine should the AIL task ever expect
to observe consistent state across the affected set of items
(independent of the atomic li_flags updates).
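To make that concrete, here's a rough userspace sketch of what I'm
suggesting (hypothetical names, a pthread mutex standing in for the
xa_lock spinlock): on buffer I/O failure, tag the attached items as
failed under the same lock the AIL already takes to remove them on I/O
success.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures under discussion. */
#define LI_FAILED (1u << 0)

struct log_item {
	unsigned int li_flags;
	struct log_item *li_next;
};

struct ail {
	pthread_mutex_t xa_lock;	/* stands in for ailp->xa_lock */
};

/*
 * On buffer I/O failure, walk the attached items and tag them failed
 * under the AIL lock, mirroring how removal takes the same lock on
 * I/O success.
 */
void ail_fail_items(struct ail *ailp, struct log_item *list)
{
	pthread_mutex_lock(&ailp->xa_lock);
	for (struct log_item *lip = list; lip; lip = lip->li_next)
		lip->li_flags |= LI_FAILED;
	pthread_mutex_unlock(&ailp->xa_lock);
}
```

The point being that the failure path would then follow the exact same
serialization rules as the success path, rather than introducing a new
one.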
> > (e.g., why
> > introduce new serialization rules if we don't need to)?
>
> Because we don't tie independent object operational state to a
> single subsystem's internal locking requirements. If the log item
> does not have its own context lock (which it doesn't), then we need
> to use atomic operations to serialise flag updates across different
> subsystems so they can safely maintain their own independent
> internal locking schemes.
>
> An example is the inode flags - they are serialised by the
> i_flags_lock spinlock because there are multiple state changes that
> need to be done atomically through different subsystems (including
> RCU freeing state). We serialise these updates by an object lock
> rather than a subsystem lock because it's the only way we can share
> the flag field across different subsystems safely...
>
This makes sense in principle, and from the perspective that it might
matter for some future changes/flags, but I just don't think this patch
really departs from our existing serialization model wrt the AIL. We
aren't adding a serialization requirement to any new context where it
doesn't already exist. Therefore, (IMO) it's mostly an overcomplication
to rework serialization as a dependency for this fix.
All that said, the bitops change is harmless and there are only a few
flags to deal with, so I don't think it really matters much. I just
think it would be nice to avoid an artificial backport dependency. IOW,
I think this patch should use ->xa_lock as is and can be immediately
followed by a patch to convert the li_flags to bit ops and remove the
->xa_lock from contexts where it is no longer necessary (with documented
justification).
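For reference, the follow-on bitops conversion would look something
like the sketch below, using GCC's atomic builtins in place of the
kernel's set_bit()/test_and_clear_bit() so it can run in userspace
(names are illustrative, not the actual patch):

```c
#include <stdbool.h>

/* Illustrative flag bit; the kernel would use set_bit() et al. on a
 * designated bit number rather than these GCC builtins. */
#define LI_FAILED_BIT (1u << 0)

struct log_item {
	unsigned int li_flags;
};

/* Atomic RMW set: safe against concurrent updaters without taking
 * any subsystem lock such as ->xa_lock. */
static inline void li_set_failed(struct log_item *lip)
{
	__atomic_fetch_or(&lip->li_flags, LI_FAILED_BIT, __ATOMIC_RELAXED);
}

/* Atomic test-and-clear, returning whether the flag was set. */
static inline bool li_test_and_clear_failed(struct log_item *lip)
{
	unsigned int old = __atomic_fetch_and(&lip->li_flags,
					      ~LI_FAILED_BIT,
					      __ATOMIC_RELAXED);
	return (old & LI_FAILED_BIT) != 0;
}
```

Once every li_flags update goes through atomic ops like these, dropping
->xa_lock from the contexts that only touch the flags becomes a
mechanical change.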
> > > > But otherwise given the above, that this is primarily I/O error path
> > > > code, and that we presumably already have serialization mechanisms in
> > > > place in the form of parent object locks, why not just grab the inode
> > > > lock in the appropriate places?
> > >
> > > Because we can't guarantee that we can lock the parent object in IO
> > > completion. Deadlock is trivial: process A grabs parent object,
> > > blocks on flush lock. IO completion holds flush lock, blocks trying
> > > to get parent object lock.
> > >
> >
> > Ah, right. So once we acquire the flush lock and drop the ilock, we
> > can no longer block on the ilock from a context responsible for
> > releasing the flush lock.
> >
> > That makes sense, but raises a larger question... if "process A" can
> > cause this kind of locking problem in I/O completion context, what
> > prevents it from interfering with (re)submission context just the same?
> > For example, the AIL pushes an inode, locks, flushes and submits the
> > underlying buffer. The buffer I/O fails and is ultimately released back
> > to the AIL. Process A comes in, locks the inode and blocks on the flush
> > lock. Subsequent AIL pushes and retry attempts fail to lock the inode,
> > never even get to the point of checking for LI_FAILED and thus we're
> > back to the same problem we have today.
> >
> > IOW, doesn't this mean we need to check and handle LI_FAILED first off
> > in ->iop_push() and not just in response to flush lock failure?
>
> It's entirely possible that we need to do that. This whole "don't
> endlessly retry failed buffer writes" thing is a substantial change
> in behaviour, so there's bound to be interactions that we don't get
> right the first time...
>
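FWIW, the kind of early LI_FAILED check in ->iop_push() I'm raising
above might be sketched like this (simplified userspace illustration;
the return codes are hypothetical stand-ins for the XFS_ITEM_* values):

```c
#include <stdbool.h>

/* Simplified push return codes, standing in for XFS_ITEM_* values. */
enum push_result { PUSH_RESUBMIT, PUSH_FLUSHING, PUSH_NORMAL };

#define LI_FAILED (1u << 0)

struct log_item {
	unsigned int li_flags;
	bool flush_locked;	/* stand-in for the inode flush lock */
};

/*
 * Sketch of a push handler that checks for a prior I/O failure before
 * attempting the flush lock, so a third party blocked on the flush
 * lock (while holding the inode lock) can't starve the retry.
 */
enum push_result item_push(struct log_item *lip)
{
	/*
	 * Check LI_FAILED first: a previously failed item gets
	 * resubmitted even if the flush lock is currently held by
	 * some other context.
	 */
	if (lip->li_flags & LI_FAILED)
		return PUSH_RESUBMIT;
	if (lip->flush_locked)
		return PUSH_FLUSHING;	/* prior flush still in progress */
	return PUSH_NORMAL;		/* lock, flush and submit as usual */
}
```

i.e., the ordering of the checks is the interesting part, not the
mechanics.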
I'm not sure if you're referring to this patch or the broader error
configuration stuff here... Note that this patch doesn't change the
fundamental behavior that the AIL retries failed buffers (subject to the
error config). I tend to get this mixed up, but IIUC this has been the
traditional behavior for things like buffers for quite some time. The
problem here is that the AIL doesn't correctly issue retries for certain
items (i.e., those with flush locks held), so we're just unraveling a
livelock in the log subsystem (as opposed to changing behavior).
Brian
> Cheers,
>
> Dave.
>
> >
> > Brian
> >
> > > Cheers,
> > >
> > > Dave.
> > > --
> > > Dave Chinner
> > > david@fromorbit.com
> >
>
> --
> Dave Chinner
> david@fromorbit.com