From mboxrd@z Thu Jan 1 00:00:00 1970
From: Juergen Gross
Subject: Re: PV-vNUMA issue: topology is misinterpreted by the guest
Date: Tue, 28 Jul 2015 05:52:50 +0200
Message-ID: <55B6FC92.2000501@suse.com>
References: <1437042762.28251.18.camel@citrix.com>
 <55B64A8A.7040200@citrix.com>
 <1438012950.5036.215.camel@citrix.com>
 <55B65CD6.7000607@citrix.com>
 <55B65D77.1050202@citrix.com>
 <1438018925.5036.242.camel@citrix.com>
 <55B6BC6D.8020808@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Received: from mail6.bemta3.messagelabs.com ([195.245.230.39])
 by lists.xen.org with esmtp (Exim 4.72) (envelope-from )
 id 1ZJvwk-0001kV-9l for xen-devel@lists.xenproject.org;
 Tue, 28 Jul 2015 03:52:54 +0000
In-Reply-To: <55B6BC6D.8020808@citrix.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Andrew Cooper , Dario Faggioli
Cc: Elena Ufimtseva , Wei Liu , David Vrabel , Jan Beulich ,
 "xen-devel@lists.xenproject.org" , Boris Ostrovsky
List-Id: xen-devel@lists.xenproject.org

On 07/28/2015 01:19 AM, Andrew Cooper wrote:
> On 27/07/2015 18:42, Dario Faggioli wrote:
>> On Mon, 2015-07-27 at 17:33 +0100, Andrew Cooper wrote:
>>> On 27/07/15 17:31, David Vrabel wrote:
>>>>
>>>>> Yeah, indeed.
>>>>> That's the downside of Juergen's "Linux scheduler
>>>>> approach". But the issue is there, even without taking vNUMA into
>>>>> account, and I think something like that would really help (only for
>>>>> Dom0, and Linux guests, of course).
>>>> I disagree. Whether we're using vNUMA or not, Xen should still ensure
>>>> that the guest kernel and userspace see a consistent and correct
>>>> topology using the native mechanisms.
>>>
>>> +1
>>>
>> +1 from me as well. In fact, a mechanism for making exactly such thing
>> happen, was what I was after when starting the thread.
>>
>> Then it came up that CPUID needs to be used for at least two different
>> and potentially conflicting purposes, that we want to support both and
>> that, whether and for whatever reason it's used, Linux configures its
>> scheduler after it, potentially resulting in rather pathological setups.
>>
> I don't see what the problem is here. Fundamentally, "NUMA optimise" vs
> "comply with licence" is a user/admin decision at boot time, and we need
> not cater to both halves at the same time.
>
> Supporting either, as chosen by the admin, is worthwhile.

Wrong assumption again. *It's not only about NUMA*! The choice is:
"comply with license" against "sane scheduling". NUMA just makes it
more obvious, that the data the guest's scheduling decisions are based
on is garbage as soon as you tell the guest there are hyperthreads
without pinning the vcpus. Right now the sibling information is more or
less random leading eventually to some vcpus thinking they have no
sibling at all. As soon as you deliver sibling information based on
vcpu number you might end up with a deterministic bad scheduling
behaviour.

Juergen
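
To make the sibling argument above concrete, here is a rough sketch of
how guest-visible siblings derived purely from the vcpu number can
disagree with where unpinned vcpus actually run. Everything in it is an
assumption for illustration only: two threads per core, adjacent
numbering of the threads of a core, and the hypothetical helpers
guest_core(), host_core() and topology_mismatches(). None of this is
Xen or Linux code.

```python
from itertools import combinations

THREADS_PER_CORE = 2  # assumption: two hyperthreads per core


def guest_core(vcpu):
    """Core the guest infers if siblings are derived from the vcpu number."""
    return vcpu // THREADS_PER_CORE


def host_core(pcpu):
    """Core of a physical CPU, assuming sibling threads are numbered adjacently."""
    return pcpu // THREADS_PER_CORE


def topology_mismatches(placement):
    """Compare guest-visible siblings with the actual host placement.

    placement maps vcpu -> pcpu the vcpu currently runs on (hypothetical input).
    Returns (false_siblings, hidden_siblings):
      false_siblings:  pairs the guest treats as siblings that run on different cores
      hidden_siblings: pairs the guest treats as independent that share a core
    """
    false_siblings, hidden_siblings = [], []
    for a, b in combinations(sorted(placement), 2):
        guest_says = guest_core(a) == guest_core(b)
        host_says = host_core(placement[a]) == host_core(placement[b])
        if guest_says and not host_says:
            false_siblings.append((a, b))
        elif host_says and not guest_says:
            hidden_siblings.append((a, b))
    return false_siblings, hidden_siblings


# Pinned 1:1, the guest's view matches the host.
pinned = {0: 0, 1: 1, 2: 2, 3: 3}
# Unpinned, the hypervisor scheduler happened to place vcpus like this.
unpinned = {0: 0, 1: 2, 2: 1, 3: 3}

if __name__ == "__main__":
    print(topology_mismatches(pinned))
    print(topology_mismatches(unpinned))
```

In the unpinned case the guest scheduler would treat vcpu 0 and vcpu 1
as cheap-to-share siblings although they sit on different cores, and
would treat vcpu 0 and vcpu 2 as independent although they compete for
one core. Since the mapping from vcpu number to sibling never changes,
the wrong decisions are repeated consistently rather than at random.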