From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Ian Jackson <Ian.Jackson@eu.citrix.com>,
	Jun Nakajima <jun.nakajima@intel.com>,
	Kevin Tian <kevin.tian@intel.com>, Wei Liu <wei.liu2@citrix.com>,
	Ian Campbell <ian.campbell@citrix.com>,
	Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
	George Dunlap <George.Dunlap@eu.citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Juergen Gross <JGross@suse.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
	Jan Beulich <JBeulich@suse.com>,
	Xiao Guangrong <guangrong.xiao@linux.intel.com>,
	Keir Fraser <keir@xen.org>
Subject: Re: [RFC Design Doc] Add vNVDIMM support for Xen
Date: Mon, 7 Mar 2016 15:53:02 -0500
Message-ID: <20160307205302.GA21364@char.us.oracle.com>
In-Reply-To: <20160302071452.GB4064@hz-desktop.sh.intel.com>

On Wed, Mar 02, 2016 at 03:14:52PM +0800, Haozhong Zhang wrote:
> On 03/01/16 13:49, Konrad Rzeszutek Wilk wrote:
> > On Tue, Mar 01, 2016 at 06:33:32PM +0000, Ian Jackson wrote:
> > > Haozhong Zhang writes ("Re: [Xen-devel] [RFC Design Doc] Add vNVDIMM support for Xen"):
> > > > On 02/18/16 21:14, Konrad Rzeszutek Wilk wrote:
> > > > > [someone:]
> > > > > > (2) For XENMAPSPACE_gmfn, _gmfn_range and _gmfn_foreign,
> > > > > >    (a) never map idx in them to GFNs occupied by vNVDIMM, and
> > > > > >    (b) never map idx corresponding to GFNs occupied by vNVDIMM
> > > > > 
> > > > > Would that mean that guest xen-blkback or xen-netback wouldn't
> > > > > be able to fetch data from the GFNs? As in, what if the HVM guest
> > > > > that has the NVDIMM also serves as a device domain - that is it
> > > > > has xen-blkback running to service other guests?
> > > > 
> > > > > I'm not familiar with xen-blkback and xen-netback, so the following
> > > > > statements may be wrong.
> > > > 
> > > > In my understanding, xen-blkback/-netback in a device domain maps the
> > > > pages from other domains into its own domain, and copies data between
> > > > > those pages and vNVDIMM. Access to vNVDIMM is performed by the NVDIMM
> > > > > driver in the device domain. In which step of this procedure does
> > > > > xen-blkback/-netback need to map into the GFNs of vNVDIMM?
> > > 
> > > I think I agree with what you are saying.  I don't understand exactly
> > > what you are proposing above in XENMAPSPACE_gmfn but I don't see how
> > > anything about this would interfere with blkback.
> > > 
> > > blkback when talking to an nvdimm will just go through the block layer
> > > front door, and do a copy, I presume.
> > 
> > I believe you are right. The block layer, and then the fs would copy in.
> > > 
> > > I don't see how netback comes into it at all.
> > > 
> > > But maybe I am just confused or ignorant!  Please do explain :-).
> > 
> > s/back/frontend/  
> > 
> > My fear was refcounting.
> > 
> > Specifically where we do not do copying. For example, you could
> > be sending data from the NVDIMM GFNs (scp?) to some other location
> > (another host?). It would go over the xen-netback (in the dom0)
> > - which would then grant map it (dom0 would).
> >
> 
> Thanks for the explanation!
> 
> It means NVDIMM is very likely mapped at page granularity, and the
> hypervisor needs per-page data structures like page_info (rather than the
> range-set style nvdimm_pages) to manage those mappings.

I do not know. I figured you need some accounting in the hypervisor
as the pages can be grant mapped but I don't know the intricate details
of the P2M code to tell you for certain.

[edit: Your later email seems to imply that you do not need all this
information? Just ranges?]
> 
> Then we will face the problem that the potentially huge number of
> per-page data structures may not fit in normal RAM. Linux kernel
> developers came across the same problem, and their solution is to
> reserve an area of NVDIMM and put the page structures in the reserved
> area (https://lwn.net/Articles/672457/). I think we may take a similar
> approach:
> (1) Dom0 Linux kernel reserves an area on each NVDIMM for Xen usage
>     (besides the one used by Linux kernel itself) and reports the address
>     and size to Xen hypervisor.
> 
>     Reasons to choose Linux kernel to make the reservation include:
>     (a) only Dom0 Linux kernel has the NVDIMM driver,
>     (b) make it flexible for Dom0 Linux kernel to handle all
>         reservations (for itself and Xen).
> 
> (2) Then Xen hypervisor builds the page structures for NVDIMM pages and
>     stores them in above reserved areas.
> 
> (3) The reserved area is treated as volatile, i.e. the above two steps
>     must be done on every host boot.
> 
> > In effect, in Xen there are two guests (dom0 and domU) pointing in the
> > P2M to the same GPFN. And that would mean:
> > 
> > > > > >    (b) never map idx corresponding to GFNs occupied by vNVDIMM
> > 
> > Granted the XENMAPSPACE_gmfn happens _before_ the grant mapping is done
> > so perhaps this is not an issue?
> > 
> > The other situation I was envisioning - where the driver domain has
> > the NVDIMM passed in, as well as an SR-IOV network card, and functions
> > as an iSCSI target. That should work OK as we just need the IOMMU
> > to have the NVDIMM GPFNs programmed in.
> >
> 
> For this IOMMU usage example and the granted pages example above, there
> remains one question: who is responsible for performing the NVDIMM flush
> (clwb/clflushopt/pcommit)?


> 
> For the granted page example, if an NVDIMM page is granted to
> xen-netback, does the hypervisor need to tell xen-netback it's an NVDIMM
> page so that xen-netback can perform a proper flush when it writes to that
> page? Or may we keep the NVDIMM transparent to xen-netback, and let Xen
> perform the flush when xen-netback gives up the granted NVDIMM page?
> 
> For the IOMMU example, my understanding is that there is a piece of
> software in the driver domain that handles SCSI commands received from
> the network card and drives the network card to read/write certain areas
> of NVDIMM. That software should then be aware of the existence of NVDIMM
> and perform the flush properly. Is that right?

I would imagine it is the same as any write on NVDIMM: the "owner"
of the NVDIMM would perform the pcommit?
> 
> Haozhong
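
To put a rough number on the sizing concern quoted above (per-page
structures for NVDIMM possibly not fitting in normal RAM), here is a
back-of-the-envelope sketch. The 4 KiB page size, the 32- and 64-byte
structure sizes, and the 1 TiB capacity are all illustrative assumptions,
not figures taken from this thread; page_struct_overhead() is a
hypothetical helper name.

```python
# Rough sizing sketch. Every constant here is an assumption chosen for
# illustration; none of these figures comes from the thread itself.
PAGE_SIZE = 4096  # assumed base page size in bytes (4 KiB)
GIB = 1 << 30
TIB = 1 << 40


def page_struct_overhead(nvdimm_bytes, struct_bytes, page_size=PAGE_SIZE):
    """Bytes needed to keep one per-page structure for every NVDIMM page."""
    pages = -(-nvdimm_bytes // page_size)  # ceiling division
    return pages * struct_bytes


if __name__ == "__main__":
    # Assumed per-page structure sizes of 32 and 64 bytes.
    for struct_bytes in (32, 64):
        overhead = page_struct_overhead(1 * TIB, struct_bytes)
        print(f"{struct_bytes}-byte struct, 1 TiB NVDIMM: "
              f"{overhead / GIB:.0f} GiB of page structures")
```

Under these assumptions the metadata is a fixed fraction of the device
(struct size divided by page size, i.e. under 1% for 32 bytes per 4 KiB
page), which is small relative to the NVDIMM but can be large relative to
the host's DRAM. That is the motivation given above for storing the
structures in a reserved area on the NVDIMM itself.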
