From: Vivek Goyal <vgoyal@redhat.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Christoph Hellwig <hch@lst.de>,
Vishal Verma <vishal.l.verma@intel.com>,
Dave Jiang <dave.jiang@intel.com>,
Alasdair Kergon <agk@redhat.com>,
Mike Snitzer <snitzer@redhat.com>,
Ira Weiny <ira.weiny@intel.com>,
Heiko Carstens <hca@linux.ibm.com>,
Vasily Gorbik <gor@linux.ibm.com>,
Christian Borntraeger <borntraeger@de.ibm.com>,
Stefan Hajnoczi <stefanha@redhat.com>,
Miklos Szeredi <miklos@szeredi.hu>,
Matthew Wilcox <willy@infradead.org>,
device-mapper development <dm-devel@redhat.com>,
Linux NVDIMM <nvdimm@lists.linux.dev>,
linux-s390 <linux-s390@vger.kernel.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
virtualization@lists.linux-foundation.org,
"Dr. David Alan Gilbert" <dgilbert@redhat.com>
Subject: Re: [PATCH 4/5] dax: remove the copy_from_iter and copy_to_iter methods
Date: Mon, 13 Dec 2021 11:17:49 -0500
Message-ID: <YbdyLc+V1xyp8sc5@redhat.com>
In-Reply-To: <CAPcyv4g4_yFqDeS+pnAZOxcB=Ua+iArK5mqn0iMG4PX6oL=F_A@mail.gmail.com>

On Sun, Dec 12, 2021 at 06:44:26AM -0800, Dan Williams wrote:
> On Fri, Dec 10, 2021 at 6:17 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > On Thu, Dec 09, 2021 at 07:38:27AM +0100, Christoph Hellwig wrote:
> > > These methods indirect the actual DAX read/write path. In the end pmem
> > > uses magic flush and mc safe variants and fuse and dcssblk use plain ones
> > > while device mapper redirects to the underlying device.
> > >
> > > Add set_dax_virtual() and set_dax_nomcsafe() APIs for fuse to skip these
> > > special variants, then use them everywhere as they fall back to the plain
> > > ones on s390 anyway and remove an indirect call from the read/write path
> > > as well as a lot of boilerplate code.
> > >
> > > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > > ---
> > > drivers/dax/super.c | 36 ++++++++++++++--
> > > drivers/md/dm-linear.c | 20 ---------
> > > drivers/md/dm-log-writes.c | 80 -----------------------------------
> > > drivers/md/dm-stripe.c | 20 ---------
> > > drivers/md/dm.c | 50 ----------------------
> > > drivers/nvdimm/pmem.c | 20 ---------
> > > drivers/s390/block/dcssblk.c | 14 ------
> > > fs/dax.c | 5 ---
> > > fs/fuse/virtio_fs.c | 19 +--------
> > > include/linux/dax.h | 9 ++--
> > > include/linux/device-mapper.h | 4 --
> > > 11 files changed, 37 insertions(+), 240 deletions(-)
> > >
> >
> > [..]
> > > diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> > > index 5c03a0364a9bb..754319ce2a29b 100644
> > > --- a/fs/fuse/virtio_fs.c
> > > +++ b/fs/fuse/virtio_fs.c
> > > @@ -753,20 +753,6 @@ static long virtio_fs_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
> > > return nr_pages > max_nr_pages ? max_nr_pages : nr_pages;
> > > }
> > >
> > > -static size_t virtio_fs_copy_from_iter(struct dax_device *dax_dev,
> > > - pgoff_t pgoff, void *addr,
> > > - size_t bytes, struct iov_iter *i)
> > > -{
> > > - return copy_from_iter(addr, bytes, i);
> > > -}
> > > -
> > > -static size_t virtio_fs_copy_to_iter(struct dax_device *dax_dev,
> > > - pgoff_t pgoff, void *addr,
> > > - size_t bytes, struct iov_iter *i)
> > > -{
> > > - return copy_to_iter(addr, bytes, i);
> > > -}
> > > -
> > > static int virtio_fs_zero_page_range(struct dax_device *dax_dev,
> > > pgoff_t pgoff, size_t nr_pages)
> > > {
> > > @@ -783,8 +769,6 @@ static int virtio_fs_zero_page_range(struct dax_device *dax_dev,
> > >
> > > static const struct dax_operations virtio_fs_dax_ops = {
> > > .direct_access = virtio_fs_direct_access,
> > > - .copy_from_iter = virtio_fs_copy_from_iter,
> > > - .copy_to_iter = virtio_fs_copy_to_iter,
> > > .zero_page_range = virtio_fs_zero_page_range,
> > > };
> > >
> > > @@ -853,7 +837,8 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs)
> > > fs->dax_dev = alloc_dax(fs, &virtio_fs_dax_ops);
> > > if (IS_ERR(fs->dax_dev))
> > > return PTR_ERR(fs->dax_dev);
> > > -
> > > + set_dax_cached(fs->dax_dev);
> >
> > Looks good to me from virtiofs point of view.
> >
> > Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
> >
> > Going forward, I am wondering whether virtiofs should use the flushcache
> > version as well. What if the host filesystem is using DAX and mapping
> > persistent memory pfns directly into the qemu address space? I have never
> > tested that.
> >
> > Right now we are relying on applications to do fsync/msync on virtiofs
> > for data persistence.
>
> This sounds like it would need coordination with a paravirtualized
> driver that can indicate whether the host side is pmem or not, like
> the virtio_pmem driver.

Agreed. Let me check the details of the virtio_pmem driver.

> However, if the guest sends any fsync/msync
> you would still need to go explicitly cache flush any dirty page
> because you can't necessarily trust that the guest did that already.

So the host dax functionality will already take care of that, IIUC, right?
I see a dax_flush() call in dax_writeback_one(). I am assuming that is
what takes care of flushing dirty pages when the guest issues
fsync()/msync(). So we probably don't have to do anything extra here.
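
For reference, the guest-side contract relied on today can be sketched in
userspace C: store through a shared mapping, then call msync(MS_SYNC) (and
fsync() for the size change) so the request reaches the host, where, per the
discussion above, dax_writeback_one() is expected to do the flushing. This is
a minimal sketch assuming a writable file path; the helper name
write_and_msync() is hypothetical and nothing in it is virtiofs-specific.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: resize path to len bytes, store buf at offset 0
 * through a shared mapping, then ask the kernel to write it back. On
 * virtiofs with DAX, the msync()/fsync() requests are what is expected to
 * make the host write back and flush the dirty pages.
 * Returns 0 on success, -1 on error.
 */
static int write_and_msync(const char *path, const void *buf, size_t len)
{
    if (len == 0)
        return -1;                    /* mmap() rejects zero-length maps */

    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memcpy(p, buf, len);              /* plain stores into the mapping */
    int ret = msync(p, len, MS_SYNC); /* write back the dirty range */
    munmap(p, len);
    if (ret == 0)
        ret = fsync(fd);              /* also persist metadata (file size) */
    close(fd);
    return ret;
}
```

Without this explicit request, nothing in the sketch guarantees the stores
have left the CPU caches or the page cache.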

I think qemu should map files using MAP_SYNC in this case, though.
Any reads/writes to virtiofs files turn into load/store operations on the
host file. So flushcache in the guest makes more sense with MAP_SYNC,
which should ensure that any filesystem metadata has already persisted by
the time the fault completes. Later, the guest can do writes followed by
a flush and ensure that the data persists too.
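
A minimal userspace sketch of the host-side mapping choice, assuming Linux
4.15+ semantics: MAP_SYNC is only accepted together with MAP_SHARED_VALIDATE,
and mmap() fails with EOPNOTSUPP when the file is not on a DAX-capable
filesystem. The helper name map_guest_window() is hypothetical; this is not
qemu's actual code.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Fallbacks for C libraries that predate these flags (asm-generic values,
 * as used on x86 and arm64; some other architectures differ). */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x080000
#endif

/*
 * Hypothetical helper: map len bytes of path read/write, preferring MAP_SYNC.
 * With MAP_SYNC, the kernel makes the file metadata for a written block
 * durable before the write fault completes; flushing the data itself out of
 * the CPU caches is still up to whoever does the stores. If the file or
 * kernel cannot honour MAP_SYNC (EOPNOTSUPP, or EINVAL on kernels without
 * MAP_SHARED_VALIDATE), fall back to a plain shared mapping, which needs
 * msync()/fsync() for durability. *is_sync reports which one was obtained.
 */
static void *map_guest_window(const char *path, size_t len, int *is_sync)
{
    *is_sync = 0;
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return MAP_FAILED;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (p != MAP_FAILED)
        *is_sync = 1;
    else if (errno == EOPNOTSUPP || errno == EINVAL)
        p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    int saved_errno = errno;
    close(fd);              /* the mapping holds its own file reference */
    errno = saved_errno;
    return p;
}
```

When is_sync comes back 0, the window behaves like an ordinary shared
mapping, and guest-side flushcache copies alone would not make data durable.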

IOW, I probably only need to do the following.
- In the virtiofs virtual device, add a notion of the kind of dax window or
  memory it supports, so maybe some kind of "writethrough" property of the
  virtiofs dax cache.
- Use this property in the virtiofs driver to decide whether to use plain
  copy_from_iter() or _copy_from_iter_flushcache().
- qemu should use mmap(MAP_SYNC) if the host filesystem is on persistent
  memory.
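
The driver-side decision in the second bullet can be modelled in userspace.
This is a sketch under stated assumptions: struct virtiofs_dax_window, its
writethrough field and dax_window_write() are hypothetical. The real driver
would work on an iov_iter and call copy_from_iter() or
_copy_from_iter_flushcache(); here copy_flushcache() stands in by doing
memcpy() followed by CLFLUSH of each touched 64-byte line on x86-64, and no
flush at all on other architectures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__x86_64__)
#include <emmintrin.h>   /* _mm_clflush (SSE2, baseline on x86-64) */
#endif

#define CACHELINE 64     /* assumed cache-line size */

/* Hypothetical view of a DAX window plus the proposed device property. */
struct virtiofs_dax_window {
    void *base;
    size_t size;
    bool writethrough;   /* host backs the window with pmem + MAP_SYNC */
};

/* Plain copy: data may stay in CPU caches until a later fsync()/msync(). */
static size_t copy_plain(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
    return len;
}

/* Stand-in for a flushcache copy: copy, then write back every cache line
 * the destination touches (x86-64 only; a plain copy elsewhere). */
static size_t copy_flushcache(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
#if defined(__x86_64__)
    uintptr_t start = (uintptr_t)dst & ~(uintptr_t)(CACHELINE - 1);
    for (uintptr_t p = start; p < (uintptr_t)dst + len; p += CACHELINE)
        _mm_clflush((const void *)p);
#endif
    return len;
}

/* Copy len bytes to offset off of the window, picking the routine from the
 * writethrough property. Returns bytes copied, or 0 if out of bounds. */
static size_t dax_window_write(struct virtiofs_dax_window *w, size_t off,
                               const void *src, size_t len)
{
    if (off > w->size || len > w->size - off)
        return 0;
    char *dst = (char *)w->base + off;
    return w->writethrough ? copy_flushcache(dst, src, len)
                           : copy_plain(dst, src, len);
}
```

Selecting the routine from a per-device flag, rather than through an
indirect method, mirrors the direction of the patch under review.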

Thanks
Vivek