From: Haozhong Zhang <haozhong.zhang@intel.com>
To: Dan Williams <dan.j.williams@intel.com>,
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
Jan Beulich <jbeulich@suse.com>,
Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Juergen Gross <JGross@suse.com>,
Xiao Guangrong <guangrong.xiao@linux.intel.com>,
Arnd Bergmann <arnd@arndb.de>,
"linux-nvdimm@lists.01.org" <linux-nvdimm@ml01.01.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Stefano Stabellini <stefano@aporeto.com>,
David Vrabel <david.vrabel@citrix.com>,
xen-devel@lists.xenproject.org,
Boris Ostrovsky <boris.ostrovsky@oracle.com>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [Xen-devel] [RFC KERNEL PATCH 0/2] Add Dom0 NVDIMM support for Xen
Date: Wed, 12 Oct 2016 18:33:18 +0800
Message-ID: <20161012103318.vq36ed5ebb5xxcom@hz-desktop>
In-Reply-To: <CAPcyv4hBzGJeR=6bxa+DEwPGQFSCBGsQ1S18GR+y5_MMq6+f4g@mail.gmail.com>
On 10/11/16 13:17 -0700, Dan Williams wrote:
>On Tue, Oct 11, 2016 at 12:48 PM, Konrad Rzeszutek Wilk
><konrad.wilk@oracle.com> wrote:
>> On Tue, Oct 11, 2016 at 12:28:56PM -0700, Dan Williams wrote:
>>> On Tue, Oct 11, 2016 at 11:33 AM, Konrad Rzeszutek Wilk
>>> <konrad.wilk@oracle.com> wrote:
>>> > On Tue, Oct 11, 2016 at 10:51:19AM -0700, Dan Williams wrote:
>>> [..]
>>> >> Right, but why does the libnvdimm core need to know about this
>>> >> specific Xen reservation? For example, if Xen wants some in-kernel
>>> >
>>> > Let me turn this around - why does the libnvdimm core need to know about
>>> > Linux specific parts? Shouldn't this be OS agnostic, so that FreeBSD
>>> > for example can also poke a hole in this and fill it with its
>>> > OS-management meta-data?
>>>
>>> Specifically the core needs to know so that it can answer the Linux
>>> specific question of whether the pfn returned by ->direct_access() has
>>> a corresponding struct page or not. It's tied to the lifetime of the
>>> device and the usage of the reservation needs to be coordinated
>>> against the references of those pages. If FreeBSD decides it needs to
>>> reserve "struct page" capacity at the start of the device, I would
>>> hope that it reuses the same on-device info block that Linux is using
>>> and not create a new "FreeBSD-mode" device type.
>>
>> The issue here (as I understand, I may be missing something new)
>> is that the size of this special namespace may be different. That is
>> the 'struct page' on FreeBSD could be 256 bytes while on Linux it is
>> 64 bytes (numbers pulled out of the sky).
>>
>> Hence one would have to expand or such to re-use this.
>
>Sure, but we could support that today. If FreeBSD lays down the info
>block it is free to make a bigger reservation and Linux would be happy
>to use a smaller subset. If we, as an industry, want this "struct
>page" reservation to be common we can take it to a standards body to
>make as a cross-OS guarantee... but I think this is separate from the
>Xen reservation.
>
>>> To be honest I do not yet understand what metadata Xen wants to store
>>> in the device, but it seems the producer and consumer of that metadata
>>> is Xen itself and not the wider Linux kernel as is the case with
>>> struct page. Can you fill me in on what problem Xen solves with this
>>
>> Exactly!
>>> reservation?
>>
>> The same as Linux - its variant of 'struct page'. Which I think is
>> smaller than the Linux one, but perhaps it is not?
>>
>
>If the hypervisor needs to know where it can store some metadata, can
>that be satisfied with userspace tooling in Dom0? Something like,
>"/dev/pmem0p1 == Xen metadata" and "/dev/pmem0p2 == DAX filesystem
>with files to hand to guests". So my question is not about the
>rationale for having metadata, it's why does the Linux kernel need to
>know about the Xen reservation? As far as I can see it is independent
>/ opaque to the kernel.
Thanks, everyone, for all these comments!

How about doing the reservation in the following way:
1. Create partition(s) on /dev/pmemX and make sure that the space
   before the first partition, not counting the partition table and
   any padding, is large enough to hold Xen's management structures
   and the super block introduced in step 2. Whatever is left of that
   space after the partition table, padding and super block is used as
   the reserved area.
2. Write a super block in front of the reserved area. The super block
   records the base address and the size of the reserved area. It also
   contains a signature and a checksum so that it can be identified.
The layout is shown in the following diagram:
+---------------+-----------+-------+----------+--------------+
| whatever used | Partition | Super | Reserved | /dev/pmem0p1 |
|   by kernel   |   Table   | Block | for Xen  |              |
+---------------+-----------+-------+----------+--------------+
                \_____________________ _______________________/
                                      V
                                 /dev/pmem0
The two steps above can be done by a userspace program and do not need
the Xen hypervisor to be running. The partitions on the device remain
usable whether or not the Xen hypervisor is present.
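
To make the sizing in step 1 concrete, here is a rough userspace
sketch of how the gap before the first partition could be estimated.
Everything numeric in it is an assumption for illustration and not
something settled in this thread: 4 KiB pages, one fixed-size Xen
management record per page of the pmem region, a one-page super block,
and a first partition aligned to 1 MiB. The function names are
hypothetical too.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def required_gap(region_bytes, per_page_bytes, table_bytes,
                 super_block_bytes=PAGE_SIZE, alignment=1 << 20):
    """Estimate the layout in front of the first partition.

    All offsets are relative to the start of /dev/pmemX.
    Returns (super_block_off, reserved_off, reserved_bytes,
    first_partition_off).
    """
    # Assumption: one management record per page of the whole region.
    pages = align_up(region_bytes, PAGE_SIZE) // PAGE_SIZE
    reserved_bytes = align_up(pages * per_page_bytes, PAGE_SIZE)

    # The super block follows the partition table (plus padding),
    # and the reserved area follows the super block.
    super_block_off = align_up(table_bytes, PAGE_SIZE)
    reserved_off = super_block_off + super_block_bytes

    # The first partition may start only after the reserved area.
    first_partition_off = align_up(reserved_off + reserved_bytes, alignment)
    return super_block_off, reserved_off, reserved_bytes, first_partition_off


if __name__ == "__main__":
    # Hypothetical inputs: a 16 GiB region, 32 bytes of metadata per
    # page, and a partition table occupying the first 34 512-byte sectors.
    print(required_gap(16 << 30, 32, 34 * 512))
```

With these made-up inputs the reserved area comes to 128 MiB and the
first partition would start at 129 MiB. The real per-page size has to
come from Xen.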
3. When Xen is running, implement a function in the Dom0 Linux Xen
   driver (drivers/xen/) that responds to the udevd events announcing
   the detection of pmem regions.
   This function searches the pmem region for the super block created
   in step 2. If one is found, the function knows that this pmem
   region has been prepared for Xen usage.
   It then gets the base address and size of the reserved area (from
   the super block) and the entire address range of the pmem region
   (from the pmem driver), and reports them to the Xen hypervisor.
   This step can be implemented entirely within the kernel Xen driver.
   (It may also be implemented as a udevd service in userspace, if
   that is not considered unsafe.)
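
As an illustration of steps 2 and 3, the following userspace sketch
packs a super block and then scans a buffer for it. The on-media
format is purely hypothetical: the signature string, the field order,
the use of CRC32 as the checksum, the page-aligned scan, and treating
the base address as a byte offset into the device are all my
assumptions, and none of the names below exist in the kernel or in
Xen. Reporting to the hypervisor is left out on purpose.

```python
import struct
import zlib

SIGNATURE = b"XENPMEM\0"   # hypothetical 8-byte signature
BODY_FORMAT = "<8sQQ"      # signature, reserved base, reserved size
SB_FORMAT = BODY_FORMAT + "I"  # ... followed by a CRC32 of the body
SB_SIZE = struct.calcsize(SB_FORMAT)
SCAN_STEP = 4096           # assumption: super block is page aligned


def pack_super_block(base, size):
    """Step 2: build a super block describing the reserved area."""
    body = struct.pack(BODY_FORMAT, SIGNATURE, base, size)
    return body + struct.pack("<I", zlib.crc32(body))


def parse_super_block(raw):
    """Return (base, size) if raw starts with a valid super block."""
    if len(raw) < SB_SIZE:
        return None
    signature, base, size, crc = struct.unpack_from(SB_FORMAT, raw)
    if signature != SIGNATURE:
        return None
    if zlib.crc32(raw[:SB_SIZE - 4]) != crc:
        return None  # checksum mismatch: not prepared for Xen, or corrupted
    return base, size


def find_reservation(image, scan_limit):
    """Step 3: search the start of a pmem region for the super block.

    Returns (super_block_off, reserved_base, reserved_size) or None.
    """
    for off in range(0, min(scan_limit, len(image)), SCAN_STEP):
        found = parse_super_block(image[off:off + SB_SIZE])
        if found is not None:
            return off, found[0], found[1]
    return None


if __name__ == "__main__":
    # Stand-in for the first 64 KiB of /dev/pmem0.
    image = bytearray(64 * 1024)
    sb = pack_super_block(base=24576, size=8192)
    image[20480:20480 + len(sb)] = sb
    print(find_reservation(bytes(image), scan_limit=len(image)))
```

A region without a valid super block simply yields None, so it is
left alone, which matches the idea that the partitions stay usable
without Xen.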
Thanks,
Haozhong