From: Andre Przywara
Subject: Re: [vNUMA v2][PATCH 2/8] public interface
Date: Tue, 3 Aug 2010 23:21:53 +0200
Message-ID: <4C588871.8030907@amd.com>
To: Dulloor
Cc: xen-devel@lists.xensource.com

Dulloor wrote:
> On Tue, Aug 3, 2010 at 6:37 AM, Andre Przywara wrote:
>> Dulloor wrote:
>>> Interface definition. Structure that will be shared with hvmloader
>>> (with HVMs) and directly with the VMs (with PV).
>>>
>>> -dulloor
>>>
>>> Signed-off-by : Dulloor
>>>
>>> +/* vnodes are 1GB-aligned */
>>> +#define XEN_MIN_VNODE_SHIFT (30)
>> Why that? Do you mean guest memory here? Isn't that a bit restrictive?
>> What if the remaining system resources do not allow this?
>> What about a 5GB guest on 2 nodes?
>> In AMD hardware there is a minimum shift of 16MB, so I think 24 bits
>> would be better.
> Linux has stricter restrictions on the minimum vnode shift (256MB, afair).
> And I remember one of the emails from Jan Beulich where the minimum node
> size was discussed (but in another context). I will verify my facts and
> reply on this.

OK. I was just asking because I wondered how the PCI hole issue is solved
(I haven't managed to review these patches today).
256 MB looks OK to me.

>
>> +struct xen_vnode_info {
>> +    uint8_t mnode_id;   /* physical node vnode is allocated from */
>> +    uint32_t start;     /* start of the vnode range (in pages) */
>> +    uint32_t end;       /* end of the vnode range (in pages) */
>> +};
>> +
>>> +struct xen_domain_numa_info {
>>> +    uint8_t version;    /* Interface version */
>>> +    uint8_t type;       /* VM memory allocation scheme (see above) */
>>> +
>>> +    uint8_t nr_vcpus;
>> Isn't that redundant with info stored somewhere else (for instance
>> in the hvm_info table)?
> But, this being a dynamic structure, nr_vcpus and nr_vnodes determine
> the actual size of the populated structure. It's just easier to use in
> the above helper macros.

Right. That is better. My concern was how to deal with possible
inconsistencies. But the number of VCPUs shouldn't be a problem.

>>> +    uint8_t nr_vnodes;
>>> +    /* data[] has the following entries :
>>> +     * // Only (nr_vnodes) entries are filled, each sizeof(struct xen_vnode_info)
>>> +     * struct xen_vnode_info vnode_info[nr_vnodes];
>> Why would the guest need that info (physical node, start and end) here?
>> Wouldn't just the size of the node's memory be sufficient?
> I changed that from size to (start, end) on the last review. size should
> be sufficient since all nodes are contiguous. Will revert this back to
> use size.

start and end look fine at first glance, but you gain nothing by using
them if you only allow one entry per node. Take the simple example of
4GB in 2 nodes; the SRAT looks like this:
node0: 0 - 640K
node0: 1MB - 2GB
node1: 2GB - 3.5GB
node1: 4GB - 4.5GB
In my patches I did this hole-punching in hvmloader and only sent 2G/2G
via hvm_info. From an architectural point of view the Xen tools code
shouldn't deal with these internals if this can be hidden in hvmloader.

Regards,
Andre.
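
P.S.: to make the hole-punching more concrete, here is a minimal sketch
of the kind of layout code I mean. This is not the actual code from my
patches; the helper names and the 3.5GB hole base are only assumptions
for illustration. It takes nothing but a per-node size (as suggested
above) and prints the per-node ranges with the low hole (640K-1MB) and
the PCI hole below 4GB punched out:

#include <stdint.h>
#include <stdio.h>

#define LOWMEM_END      0xA0000ULL      /* 640K */
#define HIGHMEM_START   0x100000ULL     /* 1MB  */
#define PCI_HOLE_START  0xE0000000ULL   /* 3.5GB, assumed hole base */
#define PCI_HOLE_END    0x100000000ULL  /* 4GB  */

static void emit_range(int node, uint64_t start, uint64_t end)
{
    printf("node%d: %#llx - %#llx\n", node,
           (unsigned long long)start, (unsigned long long)end);
}

/* Lay out 'size' bytes for 'node' starting at *cursor, skipping holes. */
static void emit_node(int node, uint64_t size, uint64_t *cursor)
{
    while (size) {
        uint64_t start = *cursor, end;

        /* If the cursor sits inside a hole, jump past it. */
        if (start >= LOWMEM_END && start < HIGHMEM_START)
            start = HIGHMEM_START;
        if (start >= PCI_HOLE_START && start < PCI_HOLE_END)
            start = PCI_HOLE_END;

        end = start + size;
        /* Clip the range at the next hole boundary, if it crosses one. */
        if (start < LOWMEM_END && end > LOWMEM_END)
            end = LOWMEM_END;
        else if (start < PCI_HOLE_START && end > PCI_HOLE_START)
            end = PCI_HOLE_START;

        emit_range(node, start, end);
        size -= end - start;
        *cursor = end;
    }
}

int main(void)
{
    /* The 4GB / 2-node example from above: 2GB per node. */
    uint64_t node_size[2] = { 2ULL << 30, 2ULL << 30 };
    uint64_t cursor = 0;
    int i;

    for (i = 0; i < 2; i++)
        emit_node(i, node_size[i], &cursor);
    return 0;
}

For the 2G/2G case this prints essentially the four ranges listed above
(shifted up slightly because the skipped low hole is compensated), while
the tools side only ever sees the plain per-node sizes.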
--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448-3567-12