public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 4/4] xfs: fix AGF vs inode cluster buffer deadlock
Date: Wed, 17 May 2023 11:47:55 +1000	[thread overview]
Message-ID: <ZGQySyiTJZzmB5N3@dread.disaster.area> (raw)
In-Reply-To: <20230517012629.GP858799@frogsfrogsfrogs>

On Tue, May 16, 2023 at 06:26:29PM -0700, Darrick J. Wong wrote:
> On Wed, May 17, 2023 at 10:04:49AM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Lock order in XFS is AGI -> AGF, hence for operations involving
> > inode unlinked list operations we always lock the AGI first. Inode
> > unlinked list operations operate on the inode cluster buffer,
> > so the lock order there is AGI -> inode cluster buffer.
> > 
> > For O_TMPFILE operations, this now means the lock order set down in
> > xfs_rename and xfs_link is AGI -> inode cluster buffer -> AGF as the
> > unlinked ops are done before the directory modifications that may
> > allocate space and lock the AGF.
> > 
> > Unfortunately, we also now lock the inode cluster buffer when
> > logging an inode so that we can attach the inode to the cluster
> > buffer and pin it in memory. This creates a lock order of AGF ->
> > inode cluster buffer in directory operations as we have to log the
> > inode after we've allocated new space for it.
> > 
> > This creates a lock inversion between the AGF and the inode cluster
> > buffer. Because the inode cluster buffer is shared across multiple
> > inodes, the inversion is not specific to individual inodes but can
> > occur when inodes in the same cluster buffer are accessed in
> > different orders.
> > 
> > To fix this we need move all the inode log item cluster buffer
> > interactions to the end of the current transaction. Unfortunately,
> > xfs_trans_log_inode() calls are littered throughout the transactions
> > with no thought to ordering against other items or locking. This
> > makes it difficult to do anything that involves changing the call
> > sites of xfs_trans_log_inode() to change locking orders.
> > 
> > However, we do now have a mechanism that allows is to postpone dirty
> > item processing to just before we commit the transaction: the
> > ->iop_precommit method. This will be called after all the
> > modifications are done and high level objects like AGI and AGF
> > buffers have been locked and modified, thereby providing a mechanism
> > that guarantees we don't lock the inode cluster buffer before those
> > high level objects are locked.
> > 
> > This change is largely moving the guts of xfs_trans_log_inode() to
> > xfs_inode_item_precommit() and providing an extra flag context in
> > the inode log item to track the dirty state of the inode in the
> > current transaction. This also means we do a lot less repeated work
> > in xfs_trans_log_inode() by only doing it once per transaction when
> > all the work is done.
> 
> Aha, and that's why you moved all the "opportunistically tweak inode
> metadata while we're already logging it" bits to the precommit hook.

Yes. It didn't make sense to move just some of the "only need to do
once per transaction while the inode is locked" stuff from one place
to the other. I figured it's better to have it all in one place and
do it all just once...

....
> > -	/*
> > -	 * Always OR in the bits from the ili_last_fields field.  This is to
> > -	 * coordinate with the xfs_iflush() and xfs_buf_inode_iodone() routines
> > -	 * in the eventual clearing of the ili_fields bits.  See the big comment
> > -	 * in xfs_iflush() for an explanation of this coordination mechanism.
> > -	 */
> > -	iip->ili_fields |= (flags | iip->ili_last_fields | iversion_flags);
> > -	spin_unlock(&iip->ili_lock);
> > +	iip->ili_dirty_flags |= flags;
> > +	trace_printk("ino 0x%llx, flags 0x%x, dflags 0x%x",
> > +		ip->i_ino, flags, iip->ili_dirty_flags);
> 
> Urk, leftover debugging info?

Yes, I just realised I'd left it there when writing the last email.
Ignoring those two trace-printk() statements, the rest of the code
should be ok to review...

-Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2023-05-17  1:48 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-17  0:04 xfs: bug fixes for 6.4-rcX Dave Chinner
2023-05-17  0:04 ` [PATCH 1/4] xfs: buffer pins need to hold a buffer reference Dave Chinner
2023-05-17  1:26   ` Darrick J. Wong
2023-05-17 12:58   ` Christoph Hellwig
2023-05-17 22:24     ` Dave Chinner
2023-05-17  0:04 ` [PATCH 2/4] xfs: restore allocation trylock iteration Dave Chinner
2023-05-17  1:11   ` Darrick J. Wong
2023-05-17 12:59   ` Christoph Hellwig
2023-05-17  0:04 ` [PATCH 3/4] xfs: defered work could create precommits Dave Chinner
2023-05-17  1:20   ` Darrick J. Wong
2023-05-17  1:44     ` Dave Chinner
2023-06-01 15:01   ` Christoph Hellwig
2023-05-17  0:04 ` [PATCH 4/4] xfs: fix AGF vs inode cluster buffer deadlock Dave Chinner
2023-05-17  1:26   ` Darrick J. Wong
2023-05-17  1:47     ` Dave Chinner [this message]
2023-06-01  1:51     ` Dave Chinner
2023-06-01 14:38       ` Darrick J. Wong
2023-06-01 15:12   ` Christoph Hellwig
2023-06-25  2:58   ` Matthew Wilcox
2023-06-25 22:34     ` Dave Chinner
2023-05-17  7:07 ` xfs: bug fixes for 6.4-rcX Amir Goldstein
2023-05-17 11:34   ` Dave Chinner
2023-05-17 12:48     ` Amir Goldstein

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZGQySyiTJZzmB5N3@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox