From: Dave Chinner <david@fromorbit.com>
To: "J. Bruce Fields" <bfields@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>,
	linux-nfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	"Theodore Ts'o" <tytso@mit.edu>,
	Andreas Dilger <adilger.kernel@dilger.ca>
Subject: Re: [PATCH 01/12] vfs: pull ext4's double-i_mutex-locking into common code
Date: Wed, 10 Jul 2013 13:38:53 +1000
Message-ID: <20130710033853.GP3438@dastard>
In-Reply-To: <20130710024059.GN32574@pad.fieldses.org>

On Tue, Jul 09, 2013 at 10:40:59PM -0400, J. Bruce Fields wrote:
> On Wed, Jul 10, 2013 at 12:09:21PM +1000, Dave Chinner wrote:
> > On Tue, Jul 09, 2013 at 08:21:20PM -0400, J. Bruce Fields wrote:
> > > On Wed, Jul 10, 2013 at 08:04:11AM +1000, Dave Chinner wrote:
> ...
> > > > Just to throw a spanner in the works - have you considered that
> > > > other filesystems might have different inode lock ordering rules?
> > > > 
> > > > For example, XFS locks multiple inodes in ascending inode number
> > > > order, not ordered by pointer address. Hence we end up with different
> > > > inode lock ordering at different layers of the stack and I can't see
> > > > that ending well....
> > > 
> > > What lock(s) is it taking exactly, where?
> > 
> > xfs_lock_two_inodes() locks two XFS inodes and doesn't require
> > i_mutex on the inodes to be held first.
> > 
> > Then there's xfs_lock_inodes() which can lock an arbitrary number of
> > inodes and has some special casing to avoid transaction subsystem
> > deadlocks. That's used by rename so typically is used for 4 inodes
> > maximum, and the ordering is set up via xfs_sort_for_rename(). The
> > VFS typically already holds the i_mutex on these inodes first, so
> > I'm not so concerned about this case.
> > 
> > I'm not sure that there is actually a deadlock, but given that XFS can
> > lock multiple inodes independently of the VFS (e.g. through ioctl
> > interfaces) I'm extremely wary of differences in lock ordering on
> > the same structure....
> 
> OK.
> 
> > > If there's a possible
> > > deadlock, can we come up with a compatible ordering?
> > 
> > Sure. I'd prefer ordering by inode number, because then ordering is
> > deterministic rather than being dependent on memory allocation
> > results.  It makes forensic analysis of deadlocks and corruptions
> > easier because you can look at on-disk structures and accurately
> > predict locking behaviour and therefore determine the order of
> > operations that should occur. With lock ordering determined by
> > memory addresses, you can't easily predict the lock ordering two
> > particular inodes might take from one operation to another.
> 
> Hm, OK, not having done this I don't have a good feeling for how
> important that is, but I can take your word for it.
> 
> But the ext4 code actually originally used i_ino order and was changed
> by 03bd8b9b896c8e "ext4: move_extent code cleanup", possibly on Linus's
> suggestion?:
> 
> 	http://mid.gmane.org/<CA+55aFwdh_QWG-R2FQ71kDXiNYZ04qPANBsY_PssVUwEBH4uSw@mail.gmail.com>
> 
> 	"And the only sane order is comparing inode pointers, not inode
> 	numbers like ext4 apparently does."

Interesting. What has worked for the last 20 years must be wrong if
Linus says so ;)

> 
> (Uh, I thought I also remembered some rationale but can't dig up the
> email now.)

Probably duplicate inode numbers on inodes in different filesystems.
But rename doesn't allow that, and I don't think we ever want to allow
arbitrary nested inode locking across superblocks. Hence I can't
think of a reason why it's a problem...
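
To make the two candidate orderings concrete, here is a minimal
userspace sketch. It is not kernel code: struct toy_inode and both
helper names are made up for illustration, and the assert only models
the assumption that both inodes live on the same superblock.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernel inode; not the real struct inode. */
struct toy_inode {
	uint64_t ino;     /* inode number, unique only within one filesystem */
	const void *sb;   /* identifies the owning superblock */
};

/* Inode-number ordering: the result can be predicted from on-disk data. */
static struct toy_inode *first_by_ino(struct toy_inode *a, struct toy_inode *b)
{
	/* Numbers from different superblocks are not comparable. */
	assert(a->sb == b->sb);
	return a->ino <= b->ino ? a : b;
}

/* Pointer ordering: the result depends on where each inode was allocated. */
static struct toy_inode *first_by_addr(struct toy_inode *a, struct toy_inode *b)
{
	return (uintptr_t)a <= (uintptr_t)b ? a : b;
}
```

Either helper gives a consistent "lock this one first" answer as long
as every caller uses the same one. Only the inode-number variant gives
the same answer across reboots and cache evictions.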

FWIW - gfs2 does multiple glock locking similar to XFS inode locking
- it sorts the locks in lock number order and then locks them all one
at a time...
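
The sort-then-lock pattern looks roughly like the sketch below. This
is a userspace illustration only, not the gfs2 or XFS implementation:
struct toy_lock, the spinning atomic_flag and the function names are
all assumptions made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical lock handle; the number is the only sort key. */
struct toy_lock {
	uint64_t number;   /* lock number used for ordering */
	atomic_flag held;  /* trivial spinlock standing in for the real lock */
};

/* qsort() comparator over an array of pointers: ascending lock number. */
static int cmp_lock_number(const void *pa, const void *pb)
{
	const struct toy_lock *a = *(const struct toy_lock *const *)pa;
	const struct toy_lock *b = *(const struct toy_lock *const *)pb;

	return (a->number > b->number) - (a->number < b->number);
}

/* Sort the handles, then acquire them one at a time in that order. */
static void lock_all_sorted(struct toy_lock **locks, size_t n)
{
	qsort(locks, n, sizeof(*locks), cmp_lock_number);
	for (size_t i = 0; i < n; i++) {
		/* Duplicate numbers would make the order ambiguous. */
		assert(i == 0 || locks[i - 1]->number < locks[i]->number);
		while (atomic_flag_test_and_set(&locks[i]->held))
			; /* spin until the current holder releases it */
	}
}

/* Release every lock in the set; release order does not matter here. */
static void unlock_all(struct toy_lock **locks, size_t n)
{
	for (size_t i = 0; i < n; i++)
		atomic_flag_clear(&locks[i]->held);
}
```

Because every caller sorts on the same key before taking anything, two
callers locking overlapping sets always take the shared locks in the
same relative order.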

> > > > > +EXPORT_SYMBOL(lock_two_nondirectories);
> > > > 
> > > > What makes this specific to non-directories?
> > > 
> > > See 
> > > 
> > > 	http://mid.gmane.org/<1372882356-14168-5-git-send-email-bfields@redhat.com>
> > > 
> > > The only caller outside ext4 is vfs_rename_other.
> > 
> > Ah, so we now mix two different lock ordering models for directories
> > vs non-directories.  i.e. lock_rename() enforces parent/child
> > relationships on the two directories being locked, but if there is
> > no ancestry, it doesn't order the inode locking at all.
> > 
> > So it seems that we can make up whatever ordering we want here,
> > as long as we use it everywhere for locking multiple inodes. What
> > other code locks multiple inodes?
> 
> The ext4 code is the only code I know of--but only I think because Al
> pointed it out.  And obviously I overlooked the xfs case.  I'll try looking
> harder....

A quick grep shows lock_2_inodes() in fs/ubifs/dir.c. I don't see
any other obvious ones.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
