From: Juergen Gross
Subject: Re: PV-vNUMA issue: topology is misinterpreted by the guest
Date: Mon, 27 Jul 2015 16:43:15 +0200
Message-ID: <55B64383.1000902@suse.com>
In-Reply-To: <55B64193.9030400@oracle.com>
To: Boris Ostrovsky, Dario Faggioli
Cc: Elena Ufimtseva, Wei Liu, Andrew Cooper, David Vrabel, Jan Beulich, "xen-devel@lists.xenproject.org"
List-Id: xen-devel@lists.xenproject.org

On 07/27/2015 04:34 PM, Boris Ostrovsky wrote:
> On 07/27/2015 10:09 AM, Dario Faggioli wrote:
>>
>> On Fri, 2015-07-24 at 18:10 +0200, Juergen Gross wrote:
>>> On 07/24/2015 05:58 PM, Dario Faggioli wrote:
>>>> So, just to check if my understanding is correct: you'd like to add an
>>>> abstraction layer, in Linux, like in generic (or, perhaps, scheduling)
>>>> code, to hide the direct interaction with CPUID.
>>>> Such a layer, on baremetal, would just read CPUID while, on PV-ops, it'd
>>>> check with Xen/match vNUMA/whatever... Is this what you are saying?
>>> Sort of, yes.
>>>
>>> I just wouldn't add it, as it is already existing (more or less). It
>>> can deal right now with AMD and Intel, we would "just" have to add Xen.
>>>
>> So, having gone through the rest of the thread (so far), and having
>> given a fair amount of thinking to this, I really think that something
>> like this would be a good thing to have in Linux.
>>
>> Of course, it's not that my opinion on what should be in Linux counts
>> that much! :-D Nevertheless, I wanted to make it clear that, while
>> skeptical at the beginning, I now think this is (part of) the way to go,
>> as I said and explained in my reply to George.
>
> And I continue to believe that a kernel solution does not address the
> userland problem, which is no less important than making the kernel do
> proper scheduling decisions (and I suspect when this patch goes for review
> that's what the scheduling people are going to say).
>
> Remember the original problem that started this thread was that the kernel
> complained that the topology didn't make sense and it turned off all
> topology-related decisions. Which means that the kernel already has a
> solution for weird topology. Some enumeration doesn't trigger this
> warning, but we can come up with one that does. Or we can indeed have a
> patch in the kernel that will, possibly silently, fail topology_sane() when
> virtualized and not pinned.

How would you come up with a topology that the kernel complains about
but that user mode scheduling can still use for sane decisions?

> (This is what I assume the kernel does when topology_sane() fails. And if it
> doesn't, that's a bug IMO)
>
> The licensing problem that Juergen described can be solved by pinning
> vcpus and exposing the HT bit. Besides, creating a guest with 24 VCPUs and

Hmm, yes. This way you sacrifice most of the virtualization advantages.

> hoping that 16-core licensing will work I think is pushing it a bit when
> you know that VCPUs will jump around cores (i.e. "on average" you are
> running on more than 16 cores -- multi-threaded or not -- which arguably
> is what licensing is trying to prevent)

On a machine with only 16 cores, running on more than 16 cores? I find
that hard to believe. The point was: if the license is happy on bare
metal, it should also be happy when running on the same hardware as a
guest.

Juergen