From: Ben Myers <bpm@sgi.com>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 01/15] xfs: xfs_remove deadlocks due to inverted AGF vs AGI lock ordering
Date: Mon, 4 Nov 2013 17:10:58 -0600
Message-ID: <20131104231058.GS1935@sgi.com>
In-Reply-To: <20131030231557.GJ6188@dastard>

On Thu, Oct 31, 2013 at 10:15:57AM +1100, Dave Chinner wrote:
> On Wed, Oct 30, 2013 at 05:39:04PM -0500, Ben Myers wrote:
> > On Tue, Oct 29, 2013 at 10:11:44PM +1100, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > Removing an inode from the namespace involves removing the directory
> > > entry and dropping the link count on the inode. Removing the
> > > directory entry can result in locking an AGF (directory blocks were
> > > freed) and removing a link count can result in placing the inode on
> > > an unlinked list which results in locking an AGI.
> > > 
> > > The big problem here is that we have an ordering constraint on AGF
> > > and AGI locking - inode allocation locks the AGI, then can allocate
> > > a new extent for new inodes, locking the AGF after the AGI.
> > > Similarly, freeing the inode removes the inode from the unlinked
> > > list, requiring that we lock the AGI first, and then freeing the
> > > inode can result in an inode chunk being freed and hence freeing
> > > disk space requiring that we lock an AGF.
> > > 
> > > Hence the ordering that is imposed by other parts of the code is AGI
> > > before AGF. This means we cannot remove the directory entry before
> > > we drop the inode reference count and put it on the unlinked list as
> > > this results in a lock order of AGF then AGI, and this can deadlock
> > > against inode allocation and freeing. Therefore we must drop the
> > > link counts before we remove the directory entry.
> > > 
> > > This is still safe from a transactional point of view - it is not
> > > until we get to xfs_bmap_finish() that we have the possibility of
> > > multiple transactions in this operation. Hence as long as we remove
> > > the directory entry and drop the link count in the first transaction
> > > of the remove operation, there are no transactional constraints on
> > > the ordering here.
> > > 
> > > Change the ordering of the operations in the xfs_remove() function
> > > to align the ordering of AGI and AGF locking to match that of the
> > > rest of the code.
> > > 
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > 
> > These two codepaths look plausible for the deadlock you described:
> > 
> > inode allocation locking:
> > xfs_create
> >   xfs_dir_ialloc
> >     xfs_ialloc
> >       xfs_dialloc
> >         xfs_ialloc_read_agi              * takes agi
> >         xfs_ialloc_ag_alloc
> >           xfs_alloc_vextent
> >             xfs_alloc_fix_freelist
> >               xfs_alloc_read_agf         * takes agf
> > 
> > vs
> > 
> > xfs_remove
> >   xfs_dir_removename
> >     xfs_dir2_node_removename
> >       xfs_dir2_leafn_remove
> >         xfs_dir2_shrink_inode
> >           xfs_bunmapi
> >           . xfs_bmap_del_extent
> >           .   xfs_btree_delete
> >           .     xfs_btree_delrec
> >           .       .free_block
> >           .         xfs_bmbt_free_block
> >           .           xfs_bmap_add_free  * adds to free list, doesn't take agf
> >             xfs_bmap_extents_to_btree
> >               xfs_alloc_vextent          * takes agf
> 
> Yeah, that's not the obvious or common path, but it has the same
> cause of allocation - it's a bmbt block that gets allocated. i.e.
> removing a block from the middle of a contiguous extent can result
> in the extent tree growing, and hence needing allocation of a block
> for the new entry. This is the path I was hitting:
> 
> ....
>         xfs_dir2_shrink_inode
>           xfs_bunmapi
>             xfs_bmap_del_extent
>               case 0: /* delete middle of extent */
>               xfs_btree_update
>               xfs_btree_increment
>               xfs_btree_insert
>                 xfs_btree_insrec
>                   xfs_btree_make_block_unfull
>                     xfs_btree_split
>                       .alloc_block
>                         xfs_bmbt_alloc_block
>                           xfs_alloc_vextent      * takes agf
> 
> 
> > I was thinking I'd find something in .free_block, but I didn't.
> 
> Right, data extents are added to the free list that is later walked
> and freed via xfs_bmap_finish() after it adds an EFI to match the
> free list to the current transaction the free list belongs to and
> commits it.
> 
> > But it does
> > look like we'll take the agf if we have to convert between directory formats in
> > xfs_dir2_leafn_remove, and it looks like there are a few more opportunities to
> > take the agf in xfs_bunmapi...
> 
> Yup, but with the above call chain, any random block removal can
> cause a bmbt allocation to occur, so we don't really need to look
> any further. Indeed, you should just assume that any call to
> xfs_bunmapi() to free an extent will require block allocation....

Applied this.  Thanks Dave.

-Ben
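
The ordering rule from the commit message (AGI before AGF) can be
illustrated with a small user-space sketch. This is not XFS code: the
lock_tracker structure and the tracker_* and remove_*_order helpers are
hypothetical names invented for this illustration, and each function only
models the order in which the two lock classes might be taken, assuming
the directory entry removal needs the AGF and the link count drop needs
the AGI.

```c
#include <assert.h>
#include <stdbool.h>

/* Lock classes ranked by the required acquisition order: AGI before AGF. */
enum lock_class { LOCK_AGI = 0, LOCK_AGF = 1, LOCK_CLASS_COUNT };

/* Hypothetical per-transaction record of which lock classes are held. */
struct lock_tracker {
	bool held[LOCK_CLASS_COUNT];
	int inversions;		/* out-of-order acquisitions seen so far */
};

static void tracker_init(struct lock_tracker *t)
{
	for (int i = 0; i < LOCK_CLASS_COUNT; i++)
		t->held[i] = false;
	t->inversions = 0;
}

/*
 * Record an acquisition. If a class that must be taken later is already
 * held, count it as a lock order inversion.
 */
static void tracker_acquire(struct lock_tracker *t, enum lock_class c)
{
	assert((int)c < LOCK_CLASS_COUNT);
	for (int i = (int)c + 1; i < LOCK_CLASS_COUNT; i++)
		if (t->held[i])
			t->inversions++;
	t->held[c] = true;
}

/*
 * Models the old ordering: remove the directory entry first (assumed to
 * take the AGF), then drop the link count (assumed to take the AGI).
 */
static int remove_old_order(void)
{
	struct lock_tracker t;

	tracker_init(&t);
	tracker_acquire(&t, LOCK_AGF);	/* directory entry removal */
	tracker_acquire(&t, LOCK_AGI);	/* link count drop, unlinked list */
	return t.inversions;
}

/*
 * Models the new ordering: drop the link count first, then remove the
 * directory entry, matching inode allocation and freeing.
 */
static int remove_new_order(void)
{
	struct lock_tracker t;

	tracker_init(&t);
	tracker_acquire(&t, LOCK_AGI);	/* link count drop, unlinked list */
	tracker_acquire(&t, LOCK_AGF);	/* directory entry removal */
	return t.inversions;
}
```

Under these assumptions, remove_old_order() counts one inversion and
remove_new_order() counts none, which mirrors why the patch drops the
link count before removing the directory entry.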