From: Dave Chinner <david@fromorbit.com>
To: Ben Myers <bpm@sgi.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 01/15] xfs: xfs_remove deadlocks due to inverted AGF vs AGI lock ordering
Date: Thu, 31 Oct 2013 10:15:57 +1100
Message-ID: <20131030231557.GJ6188@dastard>
In-Reply-To: <20131030223904.GM1935@sgi.com>

On Wed, Oct 30, 2013 at 05:39:04PM -0500, Ben Myers wrote:
> On Tue, Oct 29, 2013 at 10:11:44PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Removing an inode from the namespace involves removing the directory
> > entry and dropping the link count on the inode. Removing the
> > directory entry can result in locking an AGF (directory blocks were
> > freed) and removing a link count can result in placing the inode on
> > an unlinked list which results in locking an AGI.
> > 
> > The big problem here is that we have an ordering constraint on AGF
> > and AGI locking - inode allocation locks the AGI, then can allocate
> > a new extent for new inodes, locking the AGF after the AGI.
> > Similarly, freeing the inode removes the inode from the unlinked
> > list, requiring that we lock the AGI first, and then freeing the
> > inode can result in an inode chunk being freed and hence freeing
> > disk space requiring that we lock an AGF.
> > 
> > Hence the ordering that is imposed by other parts of the code is AGI
> > before AGF. This means we cannot remove the directory entry before
> > we drop the inode reference count and put it on the unlinked list as
> > this results in a lock order of AGF then AGI, and this can deadlock
> > against inode allocation and freeing. Therefore we must drop the
> > link counts before we remove the directory entry.
> > 
> > This is still safe from a transactional point of view - it is not
> > until we get to xfs_bmap_finish() that we have the possibility of
> > multiple transactions in this operation. Hence as long as we remove
> > the directory entry and drop the link count in the first transaction
> > of the remove operation, there are no transactional constraints on
> > the ordering here.
> > 
> > Change the ordering of the operations in the xfs_remove() function
> > to align the ordering of AGI and AGF locking to match that of the
> > rest of the code.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> 
> These two codepaths look plausible for the deadlock you described:
> 
> inode allocation locking:
> xfs_create
>   xfs_dir_ialloc
>     xfs_ialloc
>       xfs_dialloc
>         xfs_ialloc_read_agi              * takes agi
>         xfs_ialloc_ag_alloc
>           xfs_alloc_vextent
>             xfs_alloc_fix_freelist
>               xfs_alloc_read_agf         * takes agf
> 
> vs
> 
> xfs_remove
>   xfs_dir_removename
>     xfs_dir2_node_removename
>       xfs_dir2_leafn_remove
>         xfs_dir2_shrink_inode
>           xfs_bunmapi
>           . xfs_bmap_del_extent
>           .   xfs_btree_delete
>           .     xfs_btree_delrec
>           .       .free_block
>           .         xfs_bmbt_free_block
>           .           xfs_bmap_add_free  * adds to free list, doesn't take agf
>             xfs_bmap_extents_to_btree
>               xfs_alloc_vextent          * takes agf

Yeah, that's not the obvious or common path, but the allocation has
the same cause - it's a bmbt block that gets allocated. i.e.
removing a block from the middle of a contiguous extent splits it in
two, so the extent tree can grow and hence need a block allocated
for the new entry. This is the path I was hitting:

....
        xfs_dir2_shrink_inode
          xfs_bunmapi
            xfs_bmap_del_extent
              case 0: /* delete middle of extent */
              xfs_btree_update
              xfs_btree_increment
              xfs_btree_insert
                xfs_btree_insrec
                  xfs_btree_make_block_unfull
                    xfs_btree_split
                      .alloc_block
                        xfs_bmbt_alloc_block
                          xfs_alloc_vextent    * takes agf


> I was thinking I'd find something in .free_block, but I didn't.

Right, freed data extents are only added to the free list there.
That list is walked and the extents are actually freed later by
xfs_bmap_finish(), after it logs an EFI covering the free list in
the transaction the list belongs to and commits that transaction.
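
The deferral described above can be illustrated with a small userspace
toy. This is only a sketch under stated assumptions, not the kernel
code: every toy_* name and field is hypothetical, the AGF "lock" is
just a counter, and the EFI logging and transaction commit are not
modelled at all. It shows only that queueing an extent does not touch
the AGF, while the later walk does.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_FREE 8  /* arbitrary capacity for the toy list */

/* Hypothetical extent record: start block and length in blocks. */
struct toy_extent {
    unsigned long start;
    unsigned long len;
};

/* Hypothetical stand-in for the per-operation free list. */
struct toy_free_list {
    struct toy_extent items[TOY_MAX_FREE];
    size_t count;               /* extents queued, not yet freed */
    unsigned int agf_locks;     /* times the (pretend) AGF was taken */
    unsigned long freed_blocks; /* blocks actually freed so far */
};

/* Queue an extent for later freeing. Note: does not take the AGF. */
static int toy_add_free(struct toy_free_list *fl,
                        unsigned long start, unsigned long len)
{
    assert(fl != NULL);
    if (fl->count == TOY_MAX_FREE)
        return -1;
    fl->items[fl->count].start = start;
    fl->items[fl->count].len = len;
    fl->count++;
    return 0;
}

/*
 * Walk the list and free each extent. This is the only place the toy
 * "takes" the AGF. Returns the number of extents processed.
 */
static size_t toy_finish(struct toy_free_list *fl)
{
    size_t done;

    assert(fl != NULL);
    done = fl->count;
    for (size_t i = 0; i < done; i++) {
        fl->agf_locks++;                     /* pretend AGF lock */
        fl->freed_blocks += fl->items[i].len;
    }
    fl->count = 0;
    return done;
}
```

After toy_add_free() the agf_locks counter is still zero; it only
becomes non-zero once toy_finish() has walked the list.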

> But it does
> look like we'll take the agf if we have to convert between directory formats in
> xfs_dir2_leafn_remove, and it looks like there are a few more opportunities to
> take the agf in xfs_bunmapi...

Yup, but with the above call chain, any random block removal can
cause a bmbt allocation to occur, so we don't really need to look
any further. Indeed, you should just assume that any call to
xfs_bunmapi() to free an extent will require block allocation....
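
The ordering rule from the patch description (AGI before AGF) and the
assumption above that removing the directory entry may take the AGF
can be sketched as a userspace toy. This is a minimal illustration
under stated assumptions, not the actual xfs_remove() change: all
toy_* names are hypothetical, locks are plain flags, and an inversion
is reported as -1 rather than deadlocking.

```c
#include <assert.h>
#include <stdbool.h>

/* Lock classes ranked in the order the rest of the code takes them. */
enum ag_lock { AG_LOCK_AGI = 0, AG_LOCK_AGF = 1, AG_LOCK_COUNT };

/* Hypothetical record of which AG header locks an operation holds. */
struct toy_trans {
    bool held[AG_LOCK_COUNT];
};

/*
 * Take a lock class. Returns 0 on success, or -1 if a higher-ranked
 * lock is already held (e.g. asking for the AGI while holding the
 * AGF), which is the inversion that can deadlock.
 */
static int toy_lock(struct toy_trans *tp, enum ag_lock which)
{
    assert(which < AG_LOCK_COUNT);
    for (int i = which + 1; i < AG_LOCK_COUNT; i++)
        if (tp->held[i])
            return -1;
    tp->held[which] = true;
    return 0;
}

/* Toy directory entry removal: assume it always needs the AGF. */
static int toy_dir_removename(struct toy_trans *tp)
{
    return toy_lock(tp, AG_LOCK_AGF);
}

/* Toy link count drop: assume it puts the inode on the unlinked
 * list and so needs the AGI. */
static int toy_droplink(struct toy_trans *tp)
{
    return toy_lock(tp, AG_LOCK_AGI);
}

/* Old ordering: directory entry first, then link count (AGF, AGI). */
static int toy_remove_old(void)
{
    struct toy_trans tp = { { false, false } };

    if (toy_dir_removename(&tp))
        return -1;
    return toy_droplink(&tp);
}

/* New ordering: link count first, then directory entry (AGI, AGF). */
static int toy_remove_new(void)
{
    struct toy_trans tp = { { false, false } };

    if (toy_droplink(&tp))
        return -1;
    return toy_dir_removename(&tp);
}
```

In this toy, toy_remove_old() returns -1 because it asks for the AGI
while already holding the AGF, and toy_remove_new() returns 0.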

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
