public inbox for kvm@vger.kernel.org
From: Pankaj Gupta <pagupta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Haozhong Zhang <haozhong.zhang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Cc: Kevin Wolf <kwolf-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Rik van Riel <riel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	xiaoguangrong eric
	<xiaoguangrong.eric-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	kvm-devel <kvm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org"
	<linux-nvdimm-y27Ovi1pjclAfugRpC6u6w@public.gmane.org>,
	Qemu Developers
	<qemu-devel-qX2TKyscuCcdnm+yROfE0A@public.gmane.org>,
	Stefan Hajnoczi
	<stefanha-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Paolo Bonzini <pbonzini-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Nitesh Narayan Lal
	<nilal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Subject: Re: KVM "fake DAX" flushing interface - discussion
Date: Fri, 21 Jul 2017 06:21:39 -0400 (EDT)
Message-ID: <813318776.33377694.1500632499830.JavaMail.zimbra@redhat.com>
In-Reply-To: <20170721095131.ule4owoayuqwh6d3@hz-desktop>


> > 
> > Hello,
> > 
> > We shared a proposal for 'KVM fake DAX flushing interface'.
> > 
> > https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg02478.html
> >
> 
> In above link,
>   "Overall goal of project
>    is to increase the number of virtual machines that can be
>    run on a physical machine, in order to *increase the density*
>    of customer virtual machines"
> 
> Is the fake persistent memory used as normal RAM in guest? If no, how
> is it expected to be used in guest?

Yes, the guest will have an NVDIMM DAX device and will not use the page
cache for most operations. The host will manage the memory requirements of
all the guests.
  
> 
> > We did an initial POC in which we used a 'virtio-blk' device to perform
> > a device flush on pmem fsync on an ext4 filesystem. There are a few hacks
> > to make things work. We need suggestions on the points below before we
> > start the actual implementation.
> >
> > A] Problems to solve:
> > ------------------
> > 
> > 1] We are considering two approaches for 'fake DAX flushing interface'.
> >     
> >  1.1] fake dax with NVDIMM flush hints & KVM async page fault
> > 
> >      - Existing interface.
> > 
> >      - The approach to use flush hint address is already nacked upstream.
> > 
> >      - The flush hint is not a queued interface for flushing. Applications
> >        might avoid using it.
> > 
> >      - A write to the flush hint address traps from guest to host and does
> >        an entire fsync on the backing file, which is itself costly.
> > 
> >      - Can be used to flush specific pages on host backing disk. We can
> >        send data(pages information) equal to cache-line size(limitation)
> >        and tell host to sync corresponding pages instead of entire disk
> >        sync.
> > 
> >      - This will be an asynchronous operation and vCPU control is returned
> >        quickly.
> > 
> > 
> >  1.2] Using additional para virt device in addition to pmem device
> >       (fake dax with device flush)
> > 
> >      - New interface
> > 
> >      - Guest maintains information of DAX dirty pages as exceptional
> >        entries in radix tree.
> > 
> >      - If we want to flush specific pages from guest to host, we need to
> >        send list of the dirty pages corresponding to file on which we are
> >        doing fsync.
> > 
> >      - This will require implementation of new interface, a new paravirt
> >        device for sending flush requests.
> > 
> >      - Host side will perform fsync/fdatasync on list of dirty pages or
> >        entire block device backed file.
> > 
> > 2] Questions:
> > -----------
> > 
> >  2.1] Not sure why WPQ flush is not a queued interface? We can force
> >       applications to call this? device DAX neither calls fsync/msync?
> > 
> >  2.2] Depending upon interface we decide, we need optimal solution to sync
> >       range of pages?
> > 
> >      - Send range of pages from guest to host to sync asynchronously
> >        instead of syncing entire block device?
> 
> e.g. a new virtio device to deliver sync requests to host?
> 
> > 
> >      - Other option is to sync entire disk backing file to make sure all
> >        the writes are persistent. In our case, backing file is a regular
> >        file on non NVDIMM device so host page cache has list of dirty pages
> >        which can be used either with fsync or similar interface.
> 
> As the number of dirty pages can vary, the latency of each host
> fsync is likely to vary over a large range.
> 
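To make the "send range of pages from guest to host" option in 2.2 concrete, here is a minimal sketch of how such a flush request could be encoded and decoded. The wire format, the names REQ_FLUSH_RANGES, encode_flush_request and decode_flush_request, and the convention that an empty range list means "sync everything" are all assumptions made for illustration; none of them comes from this thread or from any virtio specification.

```python
import struct

# Hypothetical wire format (an assumption, not from any virtio spec):
# little-endian header (u32 request type, u32 range count), followed by
# one (u64 byte offset, u64 byte length) pair per dirty range.
REQ_FLUSH_RANGES = 1
_HEADER = struct.Struct("<II")
_RANGE = struct.Struct("<QQ")


def encode_flush_request(ranges):
    """Guest side: pack (offset, length) ranges into one request buffer.

    An empty list is taken to mean "sync the whole backing file".
    """
    parts = [_HEADER.pack(REQ_FLUSH_RANGES, len(ranges))]
    parts += [_RANGE.pack(offset, length) for offset, length in ranges]
    return b"".join(parts)


def decode_flush_request(buf):
    """Host side: validate the buffer and return the (offset, length) list."""
    if len(buf) < _HEADER.size:
        raise ValueError("request shorter than header")
    req_type, count = _HEADER.unpack_from(buf, 0)
    if req_type != REQ_FLUSH_RANGES:
        raise ValueError("unknown request type")
    if len(buf) != _HEADER.size + count * _RANGE.size:
        raise ValueError("request length does not match range count")
    return [
        _RANGE.unpack_from(buf, _HEADER.size + i * _RANGE.size)
        for i in range(count)
    ]
```

A request carrying two ranges, for example [(0, 4096), (8192, 8192)], occupies 8 bytes of header plus 16 bytes per range. Whether a per-range list is worth the extra interface complexity over a bare "flush" command is exactly the open question above.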
> > 
> >  2.3] If we do host fsync on entire disk we will be flushing all the dirty
> >       data to backend file. Just thinking what would be better approach,
> >       flushing pages on corresponding guest file fsync or entire block
> >       device?
> > 
> >  2.4] If we decide to choose one of the above approaches, we need to
> >       consider all DAX supporting filesystems(ext4/xfs). Would hooking code
> >       to corresponding fsync code of fs seems reasonable? Just thinking for
> >       flush hint address use-case? Or how flush hint addresses would be
> >       invoked with fsync or similar api?
> > 
> >  2.5] Also with filesystem journalling and other mount options like
> >       barriers, ordered etc, how we decide to use page flush hint or
> >       regular fsync on file?
> >  
> >  2.6] If at guest side we have the PFNs of all the dirty pages in the
> >       radix tree and we send these to the host, would the host side be
> >       able to find the corresponding pages and flush them all?
> 
> That may require the host file system to provide an API to flush specified
> blocks/extents and their metadata in the file system. I'm not
> familiar with this part and don't know whether such an API exists.
> 
> Haozhong
> 
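As a rough illustration of the difference between flushing specific pages and syncing the entire backing file on the host, the sketch below maps a scratch file, dirties a few pages, and flushes only the runs that cover them with mmap.flush(), which issues msync() on the given range. The helper names merge_pages, flush_pages and demo are made up for this example, and the scratch file merely stands in for a guest's backing file. It does not answer the metadata question raised above: a range msync() only covers data pages in the mapping.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def merge_pages(pages):
    """Coalesce dirty page indices into sorted (first_page, page_count) runs."""
    runs = []
    for page in sorted(set(pages)):
        if runs and runs[-1][0] + runs[-1][1] == page:
            runs[-1][1] += 1          # page extends the previous run
        else:
            runs.append([page, 1])    # page starts a new run
    return [tuple(run) for run in runs]


def flush_pages(mapping, pages):
    """msync() only the runs that cover the given dirty pages."""
    runs = merge_pages(pages)
    for first, count in runs:
        mapping.flush(first * PAGE, count * PAGE)
    return runs


def demo():
    """Dirty three pages of an 8-page scratch file and flush just those."""
    with tempfile.TemporaryFile() as f:
        f.truncate(8 * PAGE)
        with mmap.mmap(f.fileno(), 8 * PAGE) as m:
            for page in (5, 1, 2):
                m[page * PAGE] = 0xAB
            runs = flush_pages(m, [5, 1, 2])
            # Whole-file alternative, shown for comparison: one call, but
            # its cost depends on every dirty page of the file.
            os.fsync(f.fileno())
        return runs
```

Coalescing adjacent pages before flushing keeps the number of msync() calls proportional to the number of contiguous dirty runs rather than to the number of dirty pages.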
