From: Juergen Gross
Subject: Re: PV-vNUMA issue: topology is misinterpreted by the guest
Date: Mon, 27 Jul 2015 16:02:39 +0200
Message-ID: <55B639FF.40609@suse.com>
References: <55AFAC34.1060606@oracle.com> <55B070ED.2040200@suse.com>
 <1437660433.5036.96.camel@citrix.com> <55B21364.5040906@suse.com>
 <1437749076.4682.47.camel@citrix.com> <55B25650.4030402@suse.com>
 <55B258C9.4040400@suse.com> <1437753509.4682.78.camel@citrix.com>
 <20150724160948.GA2067@l.oracle.com> <55B26570.1060008@suse.com>
 <20150724162911.GC2220@l.oracle.com> <55B26A45.2050402@suse.com>
 <55B26B84.1000101@oracle.com> <55B5B504.2030504@suse.com>
 <55B60DE7.1020300@suse.com> <55B611F1.80508@citrix.com>
 <55B61DA5.5030903@suse.com> <1438003395.5036.122.camel@citrix.com>
In-Reply-To: <1438003395.5036.122.camel@citrix.com>
To: Dario Faggioli
Cc: Elena Ufimtseva, Wei Liu, George Dunlap, Andrew Cooper, George Dunlap,
 David Vrabel, Jan Beulich, xen-devel@lists.xenproject.org, Boris Ostrovsky
List-Id: xen-devel@lists.xenproject.org

On 07/27/2015 03:23 PM, Dario Faggioli wrote:
> On Mon, 2015-07-27 at 14:01 +0200, Juergen Gross wrote:
>> On 07/27/2015 01:11 PM, George Dunlap wrote:
>
>>> Or alternately, if the user wants to give up on the "consolidation"
>>> aspect of virtualization, they can pin vcpus to pcpus and then pass in
>>> the actual host topology (hyperthreads and all).
>>
>> There would be another solution, of course:
>>
>> Support hyperthreads in the Xen scheduler via gang scheduling. While
>> this is not a simple solution, it is a fair one. Hyperthreads on one
>> core can influence each other quite a lot. With both threads always
>> running vcpus of the same guest, the penalty/advantage would stay
>> within the same domain. The guest could make really sensible
>> scheduling decisions and the licensing would still work as desired.
>>
> This is interesting indeed, but I'd much rather see it as something
> orthogonal, which may indeed bring benefits in some of the scenarios
> described here, but should not be considered *the* solution.

Correct. I still think it should be done.

> Implementing, enabling and asking users to use something like this will
> impact the system behavior and performance, in ways that may not be
> desirable for all use cases.

I'd make it a scheduler parameter, so you could enable it for a specific
cpupool where you want it to be active.

> So, while I do think that this may be something nice to have and offer,
> trying to use it for solving the problem we're debating here would make
> things even more complex to configure.
>
> Also, this would take care of HT related issues, but what about cores
> (as in 'should vcpus be cores of sockets or full sockets') and !HT boxes
> (like AMD)?

!HT boxes will have no problem: we won't have to hide hyperthreads as
cores...

Regarding many sockets with 1 core each vs. 1 socket with many cores:
I think 1 socket is okay for the non-NUMA case; we'll want multiple
sockets for NUMA.

> Not to mention, as you say yourself, that it's not easy to implement.
Yeah, but it will be fun. ;-)

>
>> Just an idea, but maybe worth exploring further instead of tweaking
>> more and more bits to make the virtual system somehow act sane.
>>
> Sure, and it's interesting indeed, for a bunch of reasons and
> purposes (as Tim is also noting). Not so much --or at least not
> necessarily-- for this one, IMO.

It's especially interesting regarding accounting: a vcpu running for
1 second can do much more work if no other vcpu is running on the same
core. That would then be the guest's problem, just like on bare metal.

For real-time purposes it might even be interesting to schedule only
1 vcpu per core, so that vcpu runs at a reliably high speed.


Juergen
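
PS: To make the gang-scheduling constraint above a bit more concrete,
here is a minimal, stand-alone sketch (plain C, nothing taken from the
Xen tree; all type and function names are made up for illustration).
It only models the rule that a vcpu may be placed on a hyperthread if
the sibling thread is idle or already runs a vcpu of the same domain:

#include <stdbool.h>
#include <stdio.h>

/* Illustrative only -- not Xen scheduler code. */
struct vcpu_info {
    int domid;      /* owning domain; -1 means the thread is idle */
    int vcpu_id;
};

/*
 * Gang constraint: 'candidate' may run on a hyperthread whose sibling
 * currently runs 'sibling' only if the sibling is idle or belongs to
 * the same domain.
 */
static bool gang_constraint_ok(const struct vcpu_info *candidate,
                               const struct vcpu_info *sibling)
{
    if (sibling->domid < 0)
        return true;                            /* sibling thread is idle */
    return candidate->domid == sibling->domid;  /* same guest only */
}

int main(void)
{
    struct vcpu_info running = { .domid = 3, .vcpu_id = 0 };  /* on sibling */
    struct vcpu_info same    = { .domid = 3, .vcpu_id = 1 };
    struct vcpu_info other   = { .domid = 7, .vcpu_id = 0 };

    printf("vcpu of same domain:  %s\n",
           gang_constraint_ok(&same, &running) ? "allowed" : "blocked");
    printf("vcpu of other domain: %s\n",
           gang_constraint_ok(&other, &running) ? "allowed" : "blocked");
    return 0;
}

A real implementation would of course have to enforce this at every
scheduling decision (and preempt both siblings together), which is where
the complexity Dario mentions comes in.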