All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Liu Bo <bo.liu@linux.alibaba.com>
Cc: virtio-fs@redhat.com
Subject: Re: [Virtio-fs] [PATCH v2 2/2] virtiofsd: fix mmap write under nondax mode
Date: Tue, 24 Mar 2020 20:09:40 +0000	[thread overview]
Message-ID: <20200324200940.GI2645@work-vm> (raw)
In-Reply-To: <20200320223252.snlelgr2ps5syfnh@rsjd01523.et2sqa>

* Liu Bo (bo.liu@linux.alibaba.com) wrote:
> On Fri, Mar 20, 2020 at 08:16:15PM +0000, Dr. David Alan Gilbert wrote:
> > * Liu Bo (bo.liu@linux.alibaba.com) wrote:
> > > When a file size is not aligned to PAGE_SIZE, a mmap write on it may
> > > encounter -EIO (can be observed from virtiofsd's log) due to the difference
> > > between the buf size and the size recorded in struct fuse_write_in.  The
> > > difference comes from the fact that for mmap, writeback IO is used and
> > > guest kernel sets fuse_write_in's size to inode size if EOF, while the buf
> > > len still remains PAGE_SIZE aligned.
> > > 
> > > This handles the above special mmap case by truncating the last buf'size.
> > 
> > Thanks,
> > 
> > > Fixes: Commit 469f9d2f ("virtiofsd: Plumb fuse_bufvec through do_write_buf")
> > > Reported-by: Yiqun Leng <yqleng@linux.alibaba.com>
> > > Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
> > > ---
> > >  tools/virtiofsd/fuse_lowlevel.c | 17 +++++++++++++++++
> > >  1 file changed, 17 insertions(+)
> > > 
> > > diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> > > index ca2056f..4f8bfb6 100644
> > > --- a/tools/virtiofsd/fuse_lowlevel.c
> > > +++ b/tools/virtiofsd/fuse_lowlevel.c
> > > @@ -1221,6 +1221,23 @@ static void do_write_buf(fuse_req_t req, fuse_ino_t nodeid,
> > >           * and the data in the rest, we need to skip that first element
> > >           */
> > >          ibufv->buf[0].size = 0;
> > > +
> > > +        /*
> > > +         * In case of mmap, fuse_buf_size(pbufv) may need to truncate if
> > > +         * arg->size has been cropped by inode size inside guest.  The
> > > +         * diff can only be (0, PAGE_SIZE) because inode size must be
> > > +         * overlapped with the last buf.
> > > +         */
> > > +        if (arg->write_flags & FUSE_WRITE_CACHE) {
> > 
> > Does this need to only do it in the WRITE_CACHE case - or should we just
> > always truncate the write to arg->size?
> > Or is this just simpler?
> 
> For non-mmap IO, AFAICS, it's all synchronous IO (not using
> writepages) where the data part's length should be equal to arg->size
> here.  So I think it's no harm to do it for both.

OK, good that would simplify it.

> > 
> > > +                size_t total = fuse_buf_size(pbufv);
> > > +                int last = ibufv->count - 1;
> > > +
> > > +                if (total > arg->size) {
> > > +                        size_t diff = total - arg->size;
> > > +                        if (diff < ibufv->buf[last].size)
> > > +                                ibufv->buf[last].size -= diff;
> > 
> > I think that needs to modify pbufv->buf[last].size not ibufv
> > because the two are only the same in some cases (although it's possible
> > in this case the guest we try at the moment always falls in this side).
> >
> 
> OK.
> 
> > We should also do something in the else case - probably fail?
> >
> 
> If it fails, it then gets to the following check and report -EIO.

Yes, but that error doesn't tell us why; so we'll see that error and not
realise that it was because we failed to do this truncation.

> > > +                }
> > > +        }
> > >      }
> > >  
> > >      if (fuse_buf_size(pbufv) != arg->size) {
> > 
> > If we now know that pbufv is now always shrung to size,
> > then we only now need to check for the case where pbufv is too small.
> >
> 
> From my understanding about both mmap IO and nonmmap IO, I think it's
> arg->size that is always <= pbufv size.

It should do, but this is a sanity check - remember we don't trust the
guest.

Dave

> thanks,
> -liubo
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


  reply	other threads:[~2020-03-24 20:09 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-20 18:34 [Virtio-fs] [PATCH v2 1/2] virtiofsd: print more verbose information when bailing out Liu Bo
2020-03-20 18:34 ` [Virtio-fs] [PATCH v2 2/2] virtiofsd: fix mmap write under nondax mode Liu Bo
2020-03-20 20:16   ` Dr. David Alan Gilbert
2020-03-20 22:33     ` Liu Bo
2020-03-24 20:09       ` Dr. David Alan Gilbert [this message]
2020-03-20 18:59 ` [Virtio-fs] [PATCH v2 1/2] virtiofsd: print more verbose information when bailing out Dr. David Alan Gilbert

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200324200940.GI2645@work-vm \
    --to=dgilbert@redhat.com \
    --cc=bo.liu@linux.alibaba.com \
    --cc=virtio-fs@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.