All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vivek Goyal <vgoyal@redhat.com>
To: Miklos Szeredi <mszeredi@redhat.com>
Cc: virtio-fs@redhat.com
Subject: Re: [Virtio-fs] [PATCH 6/9] virtio-fs: let dax style override directIO style when dax+cache=none
Date: Mon, 22 Apr 2019 14:14:36 -0400	[thread overview]
Message-ID: <20190422181436.GC7852@redhat.com> (raw)
In-Reply-To: <CAOssrKcnCzt9+om9=5pFqGBAeNds7U3FWpXWHLp6KNgHmVZ=Pg@mail.gmail.com>

On Wed, Apr 17, 2019 at 10:25:53AM +0200, Miklos Szeredi wrote:
> On Tue, Apr 16, 2019 at 9:38 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > On Wed, Apr 17, 2019 at 02:03:19AM +0800, Liu Bo wrote:
> > > In case of dax+cache=none, mmap uses dax style prior to directIO style,
> > > while read/write don't, but it seems that there is no reason not to do so.
> > >
> > > Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
> > > Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
> >
> > This is interesting. I was thinking about it today itself. I noticed
> > that ext4 and xfs also check for DAX inode first and use dax path
> > if dax is enabled.
> >
> > cache=never sets FOPEN_DIRECT_IO (even if application never asked for
> > direct IO). If dax is enabled, for data its equivalent to doing direct
> > IO. And for mmap() we are already checking for DAX first. So it makes
> > sense to do same thing for read/write path as well.
> >
> > CCing Miklos as well. He might have some thougts on this. I am curios
> > that initially whey did he make this change only for mmap() and not
> > for read/write paths.
> 
> AFAIR the main reason was that we had performance issues with size
> extending writes with dax.

Hi Miklos,

How about falling back to sending fuse WRITE message to fuse daemon
in case of extending writes (and not use DAX). That will solve the issue
of write and i_size modification not being atomic also will improve
performance of file extending writes. 

I wrote following hack patch and this seems to work.

Thanks
Vivek


Subject: fuse, dax: Do not use dax for file extnding writes

Right not we use dax for file extending writes and use fallocate() first
to make sure file on host is extended so that later write through mmap()
will not result in SIGBUS.

This approach has two problems. First of all write and i_size are not atomic.
And that means if guest crashes after fallocate(), it can expose trailing
zeros in file and give the apperance as if user data was lost. Secondly,
calling falloate() for every extending write is very slow.  Apart from
communication overhead, fallocate() on host is much slower than calling
write on host.

So do not use dax for file extending writes, instead just send WRITE message
to daemon (like we do for direct I/O path) and this should solve botht the
above problems.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/fuse/file.c |   67 ++++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 50 insertions(+), 17 deletions(-)

Index: rhvgoyal-linux-fuse/fs/fuse/file.c
===================================================================
--- rhvgoyal-linux-fuse.orig/fs/fuse/file.c	2019-04-18 16:08:19.048407845 -0400
+++ rhvgoyal-linux-fuse/fs/fuse/file.c	2019-04-22 13:22:20.939679793 -0400
@@ -1809,6 +1809,13 @@ static int fuse_iomap_begin(struct inode
 	pr_debug("fuse_iomap_begin() called. pos=0x%llx length=0x%llx\n",
 			pos, length);
 
+	/*
+	 * Writes beyond end of file are not handled using dax path. Instead
+	 * a fuse write message is sent to daemon
+	 */
+	if (flags & IOMAP_WRITE && pos >= i_size_read(inode))
+		return -EIO;
+
 	iomap->offset = pos;
 	iomap->flags = 0;
 	iomap->bdev = NULL;
@@ -1929,11 +1936,34 @@ static ssize_t fuse_dax_read_iter(struct
 	return ret;
 }
 
-static ssize_t fuse_dax_write_iter(struct kiocb *iocb, struct iov_iter *from)
+static bool file_extending_write(struct kiocb *iocb, struct iov_iter *from)
+{
+	struct inode *inode = file_inode(iocb->ki_filp);
+
+	return (iov_iter_rw(from) == WRITE &&
+	    	((iocb->ki_pos) >= i_size_read(inode)));
+}
+
+static ssize_t fuse_dax_direct_write(struct kiocb *iocb, struct iov_iter *from)
 {
 	struct inode *inode = file_inode(iocb->ki_filp);
+	struct fuse_io_priv io = FUSE_IO_PRIV_SYNC(iocb);
 	ssize_t ret;
 
+	ret = fuse_direct_io(&io, from, &iocb->ki_pos, FUSE_DIO_WRITE);
+	if (ret < 0)
+		return ret;
+
+	fuse_invalidate_attr(inode);
+	fuse_write_update_size(inode, iocb->ki_pos);
+	return ret;
+}
+
+static ssize_t fuse_dax_write_iter(struct kiocb *iocb, struct iov_iter *from)
+{
+	struct inode *inode = file_inode(iocb->ki_filp);
+	ssize_t ret, count;
+
 	if (iocb->ki_flags & IOCB_NOWAIT) {
 		if (!inode_trylock(inode))
 			return -EAGAIN;
@@ -1950,26 +1980,29 @@ static ssize_t fuse_dax_write_iter(struc
 		goto out;
 	/* TODO file_update_time() but we don't want metadata I/O */
 
-	/* TODO handle growing the file */
-	/* Grow file here if need be. iomap_begin() does not have access
-	 * to file pointer
-	 */
-	if (iov_iter_rw(from) == WRITE &&
-	    ((iocb->ki_pos + iov_iter_count(from)) > i_size_read(inode))) {
-		ret = __fuse_file_fallocate(iocb->ki_filp, 0, iocb->ki_pos,
-						iov_iter_count(from));
-		if (ret < 0) {
-			printk("fallocate(offset=0x%llx length=0x%zx)"
-			" failed. err=%zd\n", iocb->ki_pos,
-			iov_iter_count(from), ret);
-			goto out;
-		}
-		pr_debug("fallocate(offset=0x%llx length=0x%zx)"
-		" succeed. ret=%zd\n", iocb->ki_pos, iov_iter_count(from), ret);
+	/* Do not use dax for file extending writes as its an mmap and
+	 * trying to write beyong end of existing page will generate
+	 * SIGBUS.
+	 */
+	if (file_extending_write(iocb, from)) {
+		ret = fuse_dax_direct_write(iocb, from);
+		goto out;
 	}
 
 	ret = dax_iomap_rw(iocb, from, &fuse_iomap_ops);
+	if (ret < 0)
+		goto out;
 
+	/*
+	 * If part of the write was file extending, fuse dax path will not
+	 * take care of that. Do direct write instead.
+	 */
+	if (iov_iter_count(from) && file_extending_write(iocb, from)) {
+		count = fuse_dax_direct_write(iocb, from);
+		if (count < 0)
+			goto out;
+		ret += count;
+	}
 out:
 	inode_unlock(inode);
 


  parent reply	other threads:[~2019-04-22 18:14 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-16 18:03 [Virtio-fs] [PATCH 0/9] virtio-fs fixes Liu Bo
2019-04-16 18:03 ` [Virtio-fs] [PATCH 1/9] virtio-fs: fix multiple tag support Liu Bo
2019-04-16 18:09   ` Vivek Goyal
2019-04-16 18:03 ` [Virtio-fs] [PATCH 2/9] virtio-fs: clean up dax mapping before aborting connection Liu Bo
2019-04-16 18:18   ` Vivek Goyal
2019-04-16 18:40     ` Liu Bo
2019-04-16 18:03 ` [Virtio-fs] [PATCH 3/9] fuse: export fuse_drop_waiting() Liu Bo
2019-04-16 18:28   ` Vivek Goyal
2019-04-16 18:43     ` Liu Bo
2019-04-16 18:03 ` [Virtio-fs] [PATCH 4/9] virtio-fs: fix use-after-free against virtio_fs_vq's fuse_dev info Liu Bo
2019-04-23 19:51   ` Vivek Goyal
2019-04-25  0:16     ` Liu Bo
2019-04-16 18:03 ` [Virtio-fs] [PATCH 5/9] fuse: do not write whole page while page straddles i_size Liu Bo
2019-04-16 20:16   ` Vivek Goyal
2019-04-17  0:12     ` Liu Bo
2019-04-16 18:03 ` [Virtio-fs] [PATCH 6/9] virtio-fs: let dax style override directIO style when dax+cache=none Liu Bo
2019-04-16 19:38   ` Vivek Goyal
2019-04-17  8:25     ` Miklos Szeredi
2019-04-17 19:35       ` Liu Bo
2019-04-25 18:35         ` Vivek Goyal
2019-04-17 20:56       ` Vivek Goyal
2019-04-22 18:55         ` Liu Bo
2019-04-22 18:14       ` Vivek Goyal [this message]
2019-04-16 18:03 ` [Virtio-fs] [PATCH 7/9] fuse: return early if punch_hole fails Liu Bo
2019-04-16 19:51   ` Vivek Goyal
2019-04-16 18:03 ` [Virtio-fs] [PATCH 8/9] virtio-fs: honor RLIMIT_FSIZE in fuse_file_fallocate Liu Bo
2019-04-16 19:57   ` Vivek Goyal
2019-04-16 18:03 ` [Virtio-fs] [PATCH 9/9] fuse: fix deadlock in __fuse_file_fallocate() Liu Bo
2019-04-16 18:07   ` Vivek Goyal
2019-05-02 22:10     ` Liu Bo
2019-05-03 15:22       ` Vivek Goyal
2019-04-24 18:41 ` [Virtio-fs] [PATCH 0/9] virtio-fs fixes Vivek Goyal
2019-04-24 23:12   ` Liu Bo
2019-04-25 14:59     ` Vivek Goyal
2019-04-25 18:10       ` Liu Bo
2019-04-27  0:58         ` Liu Bo
2019-04-29 13:18           ` Vivek Goyal
2019-04-30  1:38             ` Liu Bo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190422181436.GC7852@redhat.com \
    --to=vgoyal@redhat.com \
    --cc=mszeredi@redhat.com \
    --cc=virtio-fs@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.