From: Juergen Gross
Subject: Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
Date: Tue, 26 Jan 2016 13:54:12 +0100
Message-ID: <56A76C74.5010506@suse.com>
In-Reply-To: <56A7785802000078000CB0CD@prv-mh.provo.novell.com>
References: <20160120110449.GD4939@hz-desktop.sh.intel.com>
 <569F7B8302000078000C8FF8@prv-mh.provo.novell.com>
 <569FA7F3.8080506@linux.intel.com>
 <569FCCED02000078000C94BA@prv-mh.provo.novell.com>
 <569FC112.9060309@linux.intel.com>
 <56A0A25002000078000C971B@prv-mh.provo.novell.com>
 <56A095E3.5060507@linux.intel.com>
 <56A0AA8A02000078000C977D@prv-mh.provo.novell.com>
 <56A0A09A.2050101@linux.intel.com>
 <56A0C02A02000078000C9823@prv-mh.provo.novell.com>
 <20160121140103.GB6362@hz-desktop.sh.intel.com>
 <56A0FEA102000078000C9A44@prv-mh.provo.novell.com>
 <56A7785802000078000CB0CD@prv-mh.provo.novell.com>
To: Jan Beulich, George Dunlap
Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
 Stefano Stabellini, Andrew Cooper, Ian Jackson,
 "xen-devel@lists.xen.org", Jun Nakajima, Xiao Guangrong, Keir Fraser
List-Id: xen-devel@lists.xenproject.org

On 26/01/16 13:44, Jan Beulich wrote:
>>>> On 26.01.16 at 12:44, wrote:
>> On Thu, Jan 21, 2016 at 2:52 PM, Jan Beulich wrote:
>>>>>> On 21.01.16 at 15:01, wrote:
>>>> On 01/21/16 03:25, Jan Beulich wrote:
>>>>>>>> On 21.01.16 at 10:10, wrote:
>>>>>> c) hypervisor should manage PMEM resource pool and partition it
>>>>>> to multiple VMs.
>>>>>
>>>>> Yes.
>>>>>
>>>>
>>>> But I still do not quite understand this part: why must PMEM
>>>> resource management and partitioning be done in the hypervisor?
>>>
>>> Because that's where memory management belongs. And PMEM, unlike
>>> PBLK, is just another form of RAM.
>>
>> I haven't looked more deeply into the details of this, but this
>> argument doesn't seem right to me.
>>
>> Normal RAM in Xen is what might be called "fungible" -- at boot, all
>> RAM is zeroed, and it basically doesn't matter at all what RAM is
>> given to what guest. (There are restrictions of course: lowmem for
>> DMA, contiguous superpages, &c; but within those groups, it doesn't
>> matter *which* bit of lowmem you get, as long as you get enough to
>> do your job.) If you reboot your guest or hand RAM back to the
>> hypervisor, you assume that everything in it will disappear. When
>> you ask for RAM, you can request some parameters that it will have
>> (lowmem, on a specific node, &c), but you can't request a specific
>> page that you had before.
>>
>> This is not the case for PMEM. The whole point of PMEM (correct me
>> if I'm wrong) is to be used for long-term storage that survives
>> reboots. It matters very much that a guest be given the same PRAM
>> after a host reboot that it was given before. It doesn't make any
>> sense to manage it the way Xen currently manages RAM (i.e., that
>> you request a page and get whatever Xen happens to give you).
>
> Interesting. This isn't the usage model I have been thinking about
> so far. Having just gone back to the original 0/4 mail, I'm afraid
> we're really left guessing, and you guessed differently than I did.
> My understanding of the intentions of PMEM so far was that it is a
> high-capacity alternative to normal RAM -- slower than DRAM, but
> much faster than e.g. swapping to disk. I.e. the persistent aspect
> of it wouldn't matter at all in this case (other than for PBLK,
> obviously).
>
> However, thinking through your usage model I have problems seeing
> it work in a reasonable way even with virtualization left aside: To
> my knowledge there's no established protocol on how multiple
> parties (different versions of the same OS, or even completely
> different OSes) would arbitrate the use of such memory ranges. And
> even for a single OS it is, unlike for disks (and hence PBLK), not
> immediately clear how it would communicate from one boot to another
> what information got stored where, or how it would react to some or
> all of this storage having disappeared (just like a disk which got
> removed, which - unless it held the boot partition - would normally
> have very little effect on the OS coming back up).

Last year at Linux Plumbers Conference I attended a session dedicated
to NVDIMM support. I asked the very same question, and the Intel guy
there told me there is indeed something like a partition table meant
to describe the layout of the memory areas and their contents.

It would be nice to have a pointer to such information. Without
anything like this it might be rather difficult to find the best way
to implement NVDIMM support in Xen or any other product.
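
For illustration, I'd expect something along the lines of the sketch
below: a per-DIMM label area holding one record per namespace, similar
in spirit to a partition table. This is purely illustrative -- the
struct and field names are made up and do not reflect the actual
on-media layout, which as far as I know is defined in Intel's NVDIMM
Namespace Specification:

    #include <stdint.h>

    /*
     * Purely illustrative sketch -- NOT the real on-media format.
     * Think of the label storage area on each NVDIMM as holding an
     * array of records like this, one per namespace ("partition"):
     */
    struct nvdimm_namespace_label {
        uint8_t  uuid[16];   /* stable identity, survives reboots */
        char     name[64];   /* human-readable namespace name */
        uint32_t flags;      /* e.g. read-only, local vs. interleaved */
        uint64_t dpa;        /* start offset within this DIMM */
        uint64_t rawsize;    /* length of the range in bytes */
        uint32_t slot;       /* position of this label in the area */
        uint64_t checksum;   /* integrity check over the record */
    };

Whoever ends up owning that label area -- Xen, dom0 or firmware --
could then use the stable UUIDs to hand a guest the same PMEM ranges
across host reboots, which would address George's concern above.
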
Juergen