public inbox for linux-nfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Jeff Layton <jlayton@kernel.org>
To: Olga Kornievskaia <olga.kornievskaia@gmail.com>
Cc: Trond Myklebust <trondmy@hammerspace.com>,
	Anna Schumaker <anna.schumaker@netapp.com>,
	linux-nfs <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH v1 01/11] fs: Don't copy beyond the end of the file
Date: Mon, 22 Oct 2018 19:23:31 -0400	[thread overview]
Message-ID: <1c269f7a09c24638ed3105b1fc934ca2c25383c9.camel@kernel.org> (raw)
In-Reply-To: <CAN-5tyEbGmujqD5Q1FBH9uBWR0YYbW_ojBhdbcD_7TYV9dqMag@mail.gmail.com>

On Mon, 2018-10-22 at 14:32 -0400, Olga Kornievskaia wrote:
> On Sun, Oct 21, 2018 at 10:29 AM Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > On Fri, 2018-10-19 at 11:29 -0400, Olga Kornievskaia wrote:
> > > From: Anna Schumaker <Anna.Schumaker@Netapp.com>
> > > 
> > > Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
> > > ---
> > >  fs/read_write.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/fs/read_write.c b/fs/read_write.c
> > > index 39b4a21..c60790f 100644
> > > --- a/fs/read_write.c
> > > +++ b/fs/read_write.c
> > > @@ -1570,6 +1570,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> > >       if (unlikely(ret))
> > >               return ret;
> > > 
> > > +     if (pos_in >= i_size_read(inode_in))
> > > +             return -EINVAL;
> > > +
> > >       if (!(file_in->f_mode & FMODE_READ) ||
> > >           !(file_out->f_mode & FMODE_WRITE) ||
> > >           (file_out->f_flags & O_APPEND))
> > 
> > The patch description could use a bit more fleshing-out. The
> > copy_file_range manpage says:
> > 
> >        EINVAL Requested range extends beyond the end of the source file; or the flags argument is not 0.
> > 
> > So I guess this is intended to satisfy that requirement?
> 
> I agree the description of the patch is poor. It sort of falls under
> the the man page's description of range beyond the end of the source
> file. But in NFSv4.2, there is an explicit wording for the validity of
> the input parameters and having an input source offset that's beyond
> the end of the file is what this patch is attempting to check.
> 

Side note:

One has to wonder why they decided to make that an -EINVAL condition?
The system call returns ssize_t. Why not just return a fewer number of
bytes in that situation?

In fact, the RETURN VALUE section of the manpage says:

       Upon successful completion, copy_file_range() will return the number of
       bytes copied between files.  This could be less than the length  origi‐
       nally requested.

Under what conditions would that occur that do not include the file
being shorter than the range you wanted to copy?

I wonder if we ought to lobby to get that changed.

> > If so,
> > i_size_read is just going to return whatever is in inode->isize.
> > Could a copy_file_range call end up getting issued to copy from a file
> > that is already open on a range that it doesn't know about yet? i.e.
> > where the inode cache has not yet been updated.
> 
> I thought that with NFSv4 cache consistency, the inode->isize is
> accurate. If previous open had a read delegation, any modification on
> a server would trigger a CB_RECALL and the open for the copy offload
> would retrieve the latest size. In case of no delegations, the open
> retrieves the latest size and the call to copy_file_range() would have
> an update size.
> 

> It seems like that could on network filesystems (like NFS). Would this
> > be better handled in ->copy_file_range instead, where the driver could
> > make a better determination of the file size?
> 
> I'm not opposed to moving the size check into the NFS's copy_file_size
> (again in my opinion NFS attribute cache has the same file size as the
> inode's size). I think the thought was that such check should be done
> at the VFS layer as oppose to doing it by each of the file systems.
> 
> 

The attribute cache is not revalidated before the i_size is fetched with
i_size_read. You're just reading what happens to be in the in-memory
inode structure. So both clients have the file open already, and neither
has a delegation:

client 1: fetches attributes from file and sees a size of 1000
client 2: writes 20 bytes at offset 1000
client 1: calls copy file range to copy 1020 bytes starting at offset 0

If client1 didn't get an attribute update before the copy_file_range
call came in, then it would still think the size was 1000 and fail the
operation. It may even be many seconds before client1 sees the updated
size.

You could argue that we're not using locking here so you're just subject
to normal open-to-close cache coherency behavior, but that's rather "not
nice".

I think we probably ought to also push this check down into the
filesystem operations as well, and have copy_file_range ensure that the
attribute cache is updated. We're dealing with copy offload here so
doing an extra GETATTR beforehand shouldn't be terribly costly.
-- 
Jeff Layton <jlayton@kernel.org>


  reply	other threads:[~2018-10-22 23:23 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-19 15:29 [PATCH v1 00/11] client-side support for "inter" SSC copy Olga Kornievskaia
2018-10-19 15:29 ` [PATCH v1 01/11] fs: Don't copy beyond the end of the file Olga Kornievskaia
2018-10-19 16:13   ` Trond Myklebust
2018-10-19 16:13     ` Trond Myklebust
2018-10-21 14:29   ` Jeff Layton
2018-10-22 18:32     ` Olga Kornievskaia
2018-10-22 23:23       ` Jeff Layton [this message]
2018-10-23 16:50         ` Olga Kornievskaia
2018-10-24 11:09           ` Jeff Layton
2018-10-24 15:59             ` Olga Kornievskaia
2018-10-24 18:53               ` Olga Kornievskaia
2018-10-24 19:21                 ` Olga Kornievskaia
2018-10-19 15:29 ` [PATCH v1 02/11] VFS permit cross device vfs_copy_file_range Olga Kornievskaia
2018-10-19 16:14   ` Trond Myklebust
2018-10-19 16:14     ` Trond Myklebust
2018-10-19 16:26     ` Olga Kornievskaia
2018-10-19 15:29 ` [PATCH v1 03/11] NFS test for intra vs inter COPY Olga Kornievskaia
2018-10-21 14:44   ` Jeff Layton
2018-10-22 17:48     ` Olga Kornievskaia
2018-10-19 15:29 ` [PATCH v1 04/11] NFS NFSD defining nl4_servers structure needed by both Olga Kornievskaia
2018-10-19 15:29 ` [PATCH v1 05/11] NFS add COPY_NOTIFY operation Olga Kornievskaia
2018-10-23 15:50   ` Schumaker, Anna
2018-10-24  1:16     ` Olga Kornievskaia
2018-10-19 15:29 ` [PATCH v1 06/11] NFS add ca_source_server<> to COPY Olga Kornievskaia
2018-10-19 15:29 ` [PATCH v1 07/11] NFS also send OFFLOAD_CANCEL to source server Olga Kornievskaia
2018-10-19 15:29 ` [PATCH v1 08/11] NFS inter ssc open Olga Kornievskaia
2018-10-23 20:23   ` Schumaker, Anna
2018-10-24  1:16     ` Olga Kornievskaia
2018-10-19 15:29 ` [PATCH v1 09/11] NFS skip recovery of copy open on dest server Olga Kornievskaia
2018-10-19 15:29 ` [PATCH v1 10/11] NFS for "inter" copy treat ESTALE as ENOTSUPP Olga Kornievskaia
2018-10-19 15:29 ` [PATCH v1 11/11] NFS COPY handle ERR_OFFLOAD_DENIED Olga Kornievskaia
  -- strict thread matches above, loose matches on Subject: below --
2018-10-19 15:30 [PATCH v1 01/11] fs: Don't copy beyond the end of the file Olga Kornievskaia

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1c269f7a09c24638ed3105b1fc934ca2c25383c9.camel@kernel.org \
    --to=jlayton@kernel.org \
    --cc=anna.schumaker@netapp.com \
    --cc=linux-nfs@vger.kernel.org \
    --cc=olga.kornievskaia@gmail.com \
    --cc=trondmy@hammerspace.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox