All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ernesto A. Fernández" <ernesto.mnd.fernandez@gmail.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	"Zorro Lang" <zlang@redhat.com>,
	"Ernesto A. Fernández" <ernesto.mnd.fernandez@gmail.com>
Subject: Re: [PATCH] xfs: preserve i_mode if __xfs_set_acl() fails
Date: Tue, 15 Aug 2017 16:29:39 -0300	[thread overview]
Message-ID: <20170815192938.GA1247@debian.home> (raw)
In-Reply-To: <20170815084430.GK21024@dastard>

On Tue, Aug 15, 2017 at 06:44:30PM +1000, Dave Chinner wrote:
> On Tue, Aug 15, 2017 at 03:18:58AM -0300, Ernesto A. Fernández wrote:
> > Hi Dave, thanks for the review!
> > 
> > On Tue, Aug 15, 2017 at 12:00:18PM +1000, Dave Chinner wrote:
> > > On Mon, Aug 14, 2017 at 05:18:10AM -0300, Ernesto A. Fernández wrote:
> > > > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > > > index de7b9bd..c3193e1 100644
> > > > --- a/fs/xfs/libxfs/xfs_attr.c
> > > > +++ b/fs/xfs/libxfs/xfs_attr.c
> > > > @@ -395,6 +395,7 @@ xfs_attr_remove(
> > > >  	struct xfs_defer_ops	dfops;
> > > >  	xfs_fsblock_t		firstblock;
> > > >  	int			error;
> > > > +	int			noattr = 0;
> > > 
> > > bool.
> > > 
> > > And make it "bool has_attr = true" to avoid the nasty double
> > > negative of "!noattr" later on.
> > 
> > I called it noattr because the error was -ENOATTR, but you are right,
> > has_attr is better. I assume you would prefer mode_changed to be a bool
> > as well?
> > 
> > > 
> > > >  
> > > >  	XFS_STATS_INC(mp, xs_attr_remove);
> > > >  
> > > > @@ -438,7 +439,12 @@ xfs_attr_remove(
> > > >  	xfs_trans_ijoin(args.trans, dp, 0);
> > > >  
> > > >  	if (!xfs_inode_hasattr(dp)) {
> > > > -		error = -ENOATTR;
> > > > +		/*
> > > > +		 * Commit even if there is no attr to remove, because when
> > > > +		 * setting ACLs the caller may be changing the mode in the
> > > > +		 * same transaction.
> > > > +		 */
> > > > +		noattr = 1;
> > > >  	} else if (dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) {
> > > >  		ASSERT(dp->i_afp->if_flags & XFS_IFINLINE);
> > > >  		error = xfs_attr_shortform_remove(&args);
> > > > @@ -458,7 +464,7 @@ xfs_attr_remove(
> > > >  	if (mp->m_flags & XFS_MOUNT_WSYNC)
> > > >  		xfs_trans_set_sync(args.trans);
> > > >  
> > > > -	if ((flags & ATTR_KERNOTIME) == 0)
> > > > +	if ((flags & ATTR_KERNOTIME) == 0 && !noattr)
> > > >  		xfs_trans_ichgtime(args.trans, dp, XFS_ICHGTIME_CHG);
> > > 
> > > Why?
> > 
> > Because we don't want the ctime to change when you try to delete an
> > attribute that didn't exist in the first place. At least that's how
> > it went before, so I tried to keep it the same.
> 
> Which means the logic is convoluted and no longer obvious as to what
> it does. i.e. the noattr case is running a transaction that dirties
> an inode that no modifications were made to. Alarm bells right
> there.
> 
> > > > @@ -257,16 +242,20 @@ xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
> > > >  		return error;
> > > >  
> > > >  	if (type == ACL_TYPE_ACCESS) {
> > > > -		umode_t mode;
> > > > -
> > > > -		error = posix_acl_update_mode(inode, &mode, &acl);
> > > > -		if (error)
> > > > -			return error;
> > > > -		error = xfs_set_mode(inode, mode);
> > > > +		error = posix_acl_update_mode(inode, &inode->i_mode, &acl);
> > > >  		if (error)
> > > >  			return error;
> > > > +		inode->i_ctime = current_time(inode);
> > > > +		XFS_STATS_INC(XFS_I(inode)->i_mount, xs_ig_attrchg);
> > > > +		mode_changed = 1;
> > > >  	}
> > > 
> > > And this doesn't.
> > > 
> > > So this breaks all our rules for inode metadata updates, both for
> > > runtime access serialisation and for crash consistency.
> > 
> > I'm only making changes to the vfs inode, and xfs_set_acl() should always
> > be called with i_rwsem held for writing. I think that should be enough to
> > serialize access to the vfs inode fields.
> 
> It's not. The i_rwsem does not protect XFS inode metadata, which is
> what needs changing here. i.e. we change both the VFS inode and
> the corresponding XFS inode metadata at the same time, under the
> same XFS_ILOCK_EXCL lock, only inside a transaction context.
> 
> That's the rules, no exceptions.
> 
> > I'm sorry, I'm not sure I follow the crash consistency part. This is not
> > writing anything to disk.
> 
> xfs_set_mode() runs a transaction that is journalled to change the
> inode mode. You break in-memory consistency by modifying the VFS
> inode state directly without holding the correct locks (see
> xfs_setattr_mode()), and you break the transactional change model
> (i.e. on-disk journal consistency and on-disk metadata change
> ordering) by making metadata changes this outside transaction
> contexts and without holding the correct locks.
> 
> > My expectation was that the changes I made to
> > the vfs inode would be attached to the transaction later, when calling
> > xfs_trans_log_inode() in xfs_attr_set() or xfs_attr_remove(). Would that
> > not work as intended?
> 
> No, it wouldn't.  You have to follow the filesystem specific update
> rules to change inode metadata, not make up your own....
> 
> > I was mostly imitating how btrfs sets an ACL, so I
> > could be misunderstanding how xfs does journaling.
> 
> ... and btrfs has a completely different transaction model and
> implementation to XFS. IOWs, trying to do what works for btrfs in
> XFS code will break XFS. And vice versa.
> 
> > > >   set_acl:
> > > > -	return __xfs_set_acl(inode, acl, type);
> > > > +	error = __xfs_set_acl(inode, acl, type);
> > > > +	if (error && mode_changed) {
> > > > +		inode->i_mode = old_mode;
> > > > +		inode->i_ctime = old_ctime;
> > > > +		XFS_STATS_DEC(XFS_I(inode)->i_mount, xs_ig_attrchg);
> > > > +	}
> > > 
> > > Same here.
> > 
> > i_rwsem would still be held here. My reasoning is that, since there was
> > an error, no changes were actually committed; so there would be no need
> > to commit the restoration either, just restore the old in-memory vfs inode
> > fields. Once again, this is inspired by btrfs, so I suppose it may fail
> > here in some way I can't see.
> > 
> > (I should probably clarify I did run tests before submitting)
> 
> When you treat the code as a black box for testing, then end result
> might appear to be what you want. That doesn't mean the
> implementation is correct or that hidden transient states within the
> block box are incorrect or invalid.
> 
> > > IOWs, you can't just remove a transactional modification of inode
> > > state and replace it with in-memory VFS inode level changes. You
> > > need to do the changes in transaction context and with the correct
> > > locks held.
> > >
> > > IOWs, if you need to undo the mode changes xfs_set_mode() did, you
> > > need to call xfs_set_mode() again. And because modifications have
> > 
> > But that could fail as well, and how would we handle that?
> 
> If the xattr change was not a fatal error (which will shut down the
> filesystem), then calling xfs_set_mode() should work just fine as it
> doesn't require any new space to be allocated (XFS != BTRFS - we can
> run overwriting transactions even at ENOSPC).
> 
> And you need to run the mode undo as a separate transaction
> because....
> 
> > If this bug
> > is to be fixed the whole thing needs to be a single transaction.
> 
> .... this is simply not possible. Why? Because __xfs_set_acl() can
> run up three separate transactions internally in adding the new
> xattr. That means your unlogged vfs indoe change has already been
> logged to the journal and so we *require* a transaction to overwrite
> the new mode that has already been committed to the journal.

It's clear that I don't understand xfs, and I'm causing more trouble than
I'm helping. I should leave this issue to you guys.
 
> But I have to ask - why do we even need to modify the mode first?
> Why not change the ACL first, then modify the mode+timestamp? If
> setting the ACL fails, then we don't have anything to undo and all
> is good....

I intended for the mode to be committed as part of the same transaction
that sets or removes the ACL. In my mind making the changes later, as part
of a separate transaction, would have meant that a crash between the two
left the filesystem in an inconsistent state, with a new ACL but without
the corresponding mode bits.

Of course that was based on a poor understanding of xfs journaling, so now
I don't know.

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

Thanks,
Ernest

  reply	other threads:[~2017-08-15 19:29 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-14  8:18 [PATCH] xfs: preserve i_mode if __xfs_set_acl() fails Ernesto A. Fernández
2017-08-14 22:28 ` Darrick J. Wong
2017-08-15  0:58   ` Ernesto A. Fernández
2017-08-15  2:00 ` Dave Chinner
2017-08-15  6:18   ` Ernesto A. Fernández
2017-08-15  8:44     ` Dave Chinner
2017-08-15 19:29       ` Ernesto A. Fernández [this message]
2017-08-16  0:25         ` Dave Chinner
2017-08-16  7:16           ` Ernesto A. Fernández
2017-08-16 13:35             ` Dave Chinner
2017-08-16 19:31               ` Ernesto A. Fernández
2017-08-17  0:00                 ` Dave Chinner
2017-08-17  5:34                   ` Ernesto A. Fernández
2017-08-17  6:34                     ` Dave Chinner
2017-12-07 17:31 ` Bill O'Donnell
2017-12-07 17:38   ` Bill O'Donnell
2017-12-07 17:43   ` Darrick J. Wong
2017-12-07 17:51     ` Bill O'Donnell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170815192938.GA1247@debian.home \
    --to=ernesto.mnd.fernandez@gmail.com \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    --cc=zlang@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.