Linux-NVDIMM Archive on lore.kernel.org
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Juergen Gross <JGross@suse.com>,
	Xiao Guangrong <guangrong.xiao@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Arnd Bergmann <arnd@arndb.de>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@ml01.01.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Stefano Stabellini <stefano@aporeto.com>,
	David Vrabel <david.vrabel@citrix.com>,
	Jan Beulich <jbeulich@suse.com>,
	xen-devel@lists.xenproject.org,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>
Subject: Re: [Xen-devel] [RFC KERNEL PATCH 0/2] Add Dom0 NVDIMM support for Xen
Date: Tue, 11 Oct 2016 14:42:16 -0400
Message-ID: <20161011184215.GB23193@localhost.localdomain>
In-Reply-To: <76b01877-40e4-1107-5c13-da98c8e1152e@citrix.com>

On Tue, Oct 11, 2016 at 07:15:42PM +0100, Andrew Cooper wrote:
> On 11/10/16 18:51, Dan Williams wrote:
> > On Tue, Oct 11, 2016 at 9:58 AM, Konrad Rzeszutek Wilk
> > <konrad.wilk@oracle.com> wrote:
> >> On Tue, Oct 11, 2016 at 08:53:33AM -0700, Dan Williams wrote:
> >>> On Tue, Oct 11, 2016 at 6:08 AM, Jan Beulich <jbeulich@suse.com> wrote:
> >>>>>>> Andrew Cooper <andrew.cooper3@citrix.com> 10/10/16 6:44 PM >>>
> >>>>> On 10/10/16 01:35, Haozhong Zhang wrote:
> >>>>>> Xen hypervisor needs assistance from Dom0 Linux kernel for following tasks:
> >>>>>> 1) Reserve an area on NVDIMM devices for Xen hypervisor to place
> >>>>>>    memory management data structures, i.e. frame table and M2P table.
> >>>>>> 2) Report SPA ranges of NVDIMM devices and the reserved area to Xen
> >>>>>>    hypervisor.
> >>>>> However, I can't see any justification for 1).  Dom0 should not be
> >>>>> involved in Xen's management of its own frame table and m2p.  The mfns
> >>>>> making up the pmem/pblk regions should be treated just like any other
> >>>>> MMIO regions, and be handed wholesale to dom0 by default.
> >>>> That precludes the use as RAM extension, and I thought earlier rounds of
> >>>> discussion had got everyone in agreement that at least for the pmem case
> >>>> we will need some control data in Xen.
> >>> The missing piece for me is why this reservation for control data
> >>> needs to be done in the libnvdimm core?  I would expect that any dax
> >> Isn't it done this way with Linux? That is say if the machine has
> >> 4GB of RAM and the NVDIMM is in TB range. You want to put the 'struct page'
> >> for the NVDIMM ranges somewhere. That place can be in regions on the
> >> NVDIMM that ndctl can reserve.
> > Yes.
> 
> I do not see any sensible usecase for Xen to use NVDIMMs as plain RAM;

I just gave you one. This is the 'usecase' that Linux has to deal with
now that the core kernel folks have pointed out that they don't want
'struct page' for the MMIO regions. This mechanism came about because
of that, and because of the need to find a place _somewhere_ to hold
the 'struct page' entries for the SPA ranges of the NVDIMM.

> NVDIMMs are far more valuable for higher level management in dom0.

Andrew, why are you providing input to this so late?

Haozhong provided a nice design document outlining the problem and
the solution he suggested.

> 
> I certainly think that such a usecase should be out-of-scope for initial
> Xen/NVDIMM support, even if only to reduce the complexity to start with.
> 
> A repeated complaint I have of large feature submissions like this is
> that, by trying to solve all potential usecases at once, they end up
> being overly complicated to develop, understand and review.

On the other hand - if you don't take on these complicated issues from
the start, then you may have to redesign and redevelop this after the
first version has been set in stone and committed.

> 
> >
> >>> capable file could be mapped and made available to a guest.  This
> >>> includes /dev/ramX devices that are dax capable, but are external to
> >>> the libnvdimm sub-system.
> >> This is more of just keeping track of the ranges if say the DAX file is
> >> extremely fragmented and requires a lot of 'struct pages' to keep track of
> >> when stitching up the VMA.
> > Right, but why does the libnvdimm core need to know about this
> > specific Xen reservation?  For example, if Xen wants some in-kernel
> > driver to own a pmem region and place its own metadata on the device I
> > would recommend something like:
> >
> >     bdev = blkdev_get_by_path("/dev/pmemX",  FMODE_EXCL...);
> >     bdev_direct_access(bdev, ...);
> >
> > ...in other words, I don't think we want libnvdimm to grow new device
> > types for every possible in-kernel user, Xen, MD, DM, etc. Instead,
> > just claim the resulting device.
> 
> I completely agree.
> 
> Whatever ends up happening between Xen and dom0, there should be no
> modifications like this to the nvdimm driver.  I will go so far as to
> say that there shouldn't be any modifications to the nvdimm driver
> (other than perhaps new query hooks so the Xen subsystem in Linux can
> query information to then pass up to Xen, if the existing queryability
> is insufficient).

Haozhong and Jan had been chatting about this in terms of how to keep
track of non-contiguous NVDIMM SPA ranges stitched into a guest.

The initial idea was to treat it as MMIO, but of course if you have
single-page ranges spread over, say, 1TB you end up consuming tons of
memory to keep track of this (the same way Linux would if you wanted to
mmap a file from a DAX fs).

Another solution considered was a bitmap, but that can also be
cumbersome to deal with. In the end the suggestion that was proposed was
the one that Linux chose - stash the 'struct page' in the NVDIMM.
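
To put rough numbers on the tracking cost, here is a back-of-the-envelope
sketch. It assumes 4 KiB base pages and a 64-byte per-page descriptor
(a typical size for 'struct page' on x86-64; the entry size Xen would
need for its frame table and M2P is not stated in this thread). The
helper names are made up for illustration and are not kernel or Xen APIs.

```c
#include <stdint.h>

#define PAGE_SIZE_BYTES  4096ULL  /* assumption: 4 KiB base pages */
#define ENTRY_SIZE_BYTES 64ULL    /* assumption: one per-page descriptor */

/* Number of base pages needed to cover a region (rounded up). */
static uint64_t pages_in(uint64_t region_bytes)
{
    return (region_bytes + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;
}

/* Bytes of per-page descriptors needed to track every page of the region. */
static uint64_t descriptor_bytes(uint64_t region_bytes)
{
    return pages_in(region_bytes) * ENTRY_SIZE_BYTES;
}

/* Bytes needed for a one-bit-per-page bitmap over the same region. */
static uint64_t bitmap_bytes(uint64_t region_bytes)
{
    return (pages_in(region_bytes) + 7) / 8;
}
```

Under those assumptions a 1 TiB NVDIMM has 268,435,456 pages, so
descriptor_bytes(1ULL << 40) comes to 16 GiB, which is four times the
4GB of RAM in the example machine, while bitmap_bytes(1ULL << 40) is
only 32 MiB. That gap is why the descriptors have to live somewhere
other than ordinary RAM if every page is to be tracked individually.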