Re: [PATCH] btrfs: fix fallocate to use file_modified to update permissions consistently

public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed

From: "Darrick J. Wong" <djwong@kernel.org>
To: Filipe Manana <fdmanana@kernel.org>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs: fix fallocate to use file_modified to update permissions consistently
Date: Mon, 14 Mar 2022 10:40:55 -0700	[thread overview]
Message-ID: <20220314174055.GE8241@magnolia> (raw)
In-Reply-To: <Yi8ky7zU4L6Kk+eo@debian9.Home>

On Mon, Mar 14, 2022 at 11:19:39AM +0000, Filipe Manana wrote:
> On Fri, Mar 11, 2022 at 03:51:10PM -0800, Darrick J. Wong wrote:
> > On Fri, Mar 11, 2022 at 12:28:58PM +0000, Filipe Manana wrote:
> > > On Thu, Mar 10, 2022 at 11:22:45AM -0800, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > 
> > > > Since the initial introduction of (posix) fallocate back at the turn of
> > > > the century, it has been possible to use this syscall to change the
> > > > user-visible contents of files.  This can happen by extending the file
> > > > size during a preallocation, or through any of the newer modes (punch,
> > > > zero range).  Because the call can be used to change file contents, we
> > > > should treat it like we do any other modification to a file -- update
> > > > the mtime, and drop set[ug]id privileges/capabilities.
> > > > 
> > > > The VFS function file_modified() does all this for us if pass it a
> > > > locked inode, so let's make fallocate drop permissions correctly.
> > > > 
> > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > > ---
> > > > Note: I plan to add fstests to test this behavior, but after the
> > > > generic/673 mess, I'm holding back on them until I can fix the three
> > > > major filesystems and clean up the xfs setattr_copy code.
> > > > 
> > > > https://lore.kernel.org/linux-ext4/20220310174410.GB8172@magnolia/T/#u
> > > > ---
> > > >  fs/btrfs/file.c |   23 +++++++++++++++++++----
> > > >  1 file changed, 19 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> > > > index a0179cc62913..79e61c88b9e7 100644
> > > > --- a/fs/btrfs/file.c
> > > > +++ b/fs/btrfs/file.c
> > > > @@ -2918,8 +2918,9 @@ int btrfs_replace_file_extents(struct btrfs_inode *inode,
> > > >  	return ret;
> > > >  }
> > > >  
> > > > -static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
> > > > +static int btrfs_punch_hole(struct file *file, loff_t offset, loff_t len)
> > > >  {
> > > > +	struct inode *inode = file_inode(file);
> > > >  	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> > > >  	struct btrfs_root *root = BTRFS_I(inode)->root;
> > > >  	struct extent_state *cached_state = NULL;
> > > > @@ -2951,6 +2952,10 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
> > > >  		goto out_only_mutex;
> > > >  	}
> > > >  
> > > > +	ret = file_modified(file);
> > > > +	if (ret)
> > > > +		goto out_only_mutex;
> > > > +
> > > >  	lockstart = round_up(offset, btrfs_inode_sectorsize(BTRFS_I(inode)));
> > > >  	lockend = round_down(offset + len,
> > > >  			     btrfs_inode_sectorsize(BTRFS_I(inode))) - 1;
> > > > @@ -3177,11 +3182,12 @@ static int btrfs_zero_range_check_range_boundary(struct btrfs_inode *inode,
> > > >  	return ret;
> > > >  }
> > > >  
> > > > -static int btrfs_zero_range(struct inode *inode,
> > > > +static int btrfs_zero_range(struct file *file,
> > > >  			    loff_t offset,
> > > >  			    loff_t len,
> > > >  			    const int mode)
> > > >  {
> > > > +	struct inode *inode = file_inode(file);
> > > >  	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
> > > >  	struct extent_map *em;
> > > >  	struct extent_changeset *data_reserved = NULL;
> > > > @@ -3202,6 +3208,12 @@ static int btrfs_zero_range(struct inode *inode,
> > > >  		goto out;
> > > >  	}
> > > >  
> > > > +	ret = file_modified(file);
> > > > +	if (ret) {
> > > > +		free_extent_map(em);
> > > > +		goto out;
> > > > +	}
> > > > +
> > > 
> > > Could be done before getting the extent map, to make the code a bit shorter, or
> > > see the comment below.
> > 
> > The trouble is, if getting the extent map fails, we didn't change the
> > file, so there's no reason to bump the timestamps and whatnot...
> 
> Right, I figured you had that sort of intention.
> 
> However after the call to file_modified(), we may actually have not change the
> file at all, like when trying to zero a range that is already fully covered by a
> preallocated extent.

<nod>  At least in XFSland, there's no good way to check for a
pre-existing prealloc extent without holding the ILOCK, which makes
things complicated as I'll explain below. :)

> > 
> > > 
> > > >  	/*
> > > >  	 * Avoid hole punching and extent allocation for some cases. More cases
> > > >  	 * could be considered, but these are unlikely common and we keep things
> > > > @@ -3391,7 +3403,7 @@ static long btrfs_fallocate(struct file *file, int mode,
> > > >  		return -EOPNOTSUPP;
> > > >  
> > > >  	if (mode & FALLOC_FL_PUNCH_HOLE)
> > > > -		return btrfs_punch_hole(inode, offset, len);
> > > > +		return btrfs_punch_hole(file, offset, len);
> > > >  
> > > >  	/*
> > > >  	 * Only trigger disk allocation, don't trigger qgroup reserve
> > > > @@ -3446,7 +3458,7 @@ static long btrfs_fallocate(struct file *file, int mode,
> > > >  		goto out;
> > > >  
> > > >  	if (mode & FALLOC_FL_ZERO_RANGE) {
> > > > -		ret = btrfs_zero_range(inode, offset, len, mode);
> > > > +		ret = btrfs_zero_range(file, offset, len, mode);
> > > >  		btrfs_inode_unlock(inode, BTRFS_ILOCK_MMAP);
> > > >  		return ret;
> > > >  	}
> > > > @@ -3528,6 +3540,9 @@ static long btrfs_fallocate(struct file *file, int mode,
> > > >  		cur_offset = last_byte;
> > > >  	}
> > > >  
> > > > +	if (!ret)
> > > > +		ret = file_modified(file);
> > > 
> > > If call file_modified() before the if that checks for the zero range case,
> > > then we avoid having to call file_modified() at btrfs_zero_range() too,
> > > and get the behaviour for both plain fallocate and zero range.
> > 
> > ...and the reason I put it here is to make sure the ordered IO finishes
> > ok and that we pass the quota limit checks before we start modifying
> > things.
> 
> Ok, but before that point we may already have modified the file, through
> a call to either btrfs_cont_expand() or btrfs_truncate_block() above, to
> zero out part of the content of a page.

Ahh, ok.  My goal was to eliminate the places where we don't call
file_modified, even if that comes at the cost of occasionally doing it
when it wasn't strictly necessary.

> So if we did that, and we got an error when waiting for ordered extents
> or from the qgroup reservation, we end up leaving fallocate without
> calling file_modified(), despite having modified the file.
> 
> > 
> > That said -- you all know btrfs far better than I do, so if you'd rather
> > I put these calls further up (probably right after the inode_newsize_ok
> > check?) then I'm happy to do that. :)
> 
> Technically I suppose we should only call file_modified() if we actually
> change anything in the file, but as it is, we are calling it even when we
> don't end up modifying it or in some cases not calling it when we modify
> it.

Yep.

> How does xfs behaves in this respect? Does it call file_modified() only
> if something in the file actually changed?

file_modified calls back into the filesystem to run transactions to
update metadata, which means that XFS can't call it if it's already
gotten a transaction and an inode ILOCK.  Unfortunately, we also can't
check to see if the file actually requires modifications (zeroing
contents, extending i_size) until we've taken the ILOCK.

If we really wanted to be strict about only stripping permissions if the
file *actually* changed, we'd either have to re-design our setattr
implementation to notice a running transaction and use it; or do a weird
little dance where we lock the inode, check it, undo all that to call
file_modified if we decide it's necessary, and then create a new
transaction and re-lock it.  We'd also have to keep doing that until the
file stabilizes, which seemed like a lot of work to handle something
that's mostly a corner case, so XFS always calls file_modified after
successfully flushing data to disk.

At that point XFS haven't even gotten to checking quota yet, so
technically that's also a gap where we could drop privs but then fail
the fallocate with EDQUOT.

--D

> 
> Thanks.
> 
> > 
> > --D
> > 
> > > 
> > > Otherwise, it looks good.
> > > 
> > > Thanks for doing this, I had it on my todo list since I noticed the generic/673
> > > failure with reflinks and the suid/sgid bits.
> > > 
> > > > +
> > > >  	/*
> > > >  	 * If ret is still 0, means we're OK to fallocate.
> > > >  	 * Or just cleanup the list and exit.

next prev parent reply	other threads:[~2022-03-14 17:41 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-10 19:22 [PATCH] btrfs: fix fallocate to use file_modified to update permissions consistently Darrick J. Wong
2022-03-11 12:28 ` Filipe Manana
2022-03-11 23:51   ` Darrick J. Wong
2022-03-14 11:19     ` Filipe Manana
2022-03-14 17:40       ` Darrick J. Wong [this message]
2022-03-15 11:02         ` Filipe Manana
2022-03-15 16:40           ` Darrick J. Wong
2022-03-15 17:30             ` Filipe Manana

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220314174055.GE8241@magnolia \
    --to=djwong@kernel.org \
    --cc=fdmanana@kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox