From mboxrd@z Thu Jan 1 00:00:00 1970
From: Boris Ostrovsky
Subject: Re: PV-vNUMA issue: topology is misinterpreted by the guest
Date: Wed, 22 Jul 2015 10:44:04 -0400
Message-ID: <55AFAC34.1060606@oracle.com>
References: <1437042762.28251.18.camel@citrix.com>
 <55A7A7F40200007800091D60@mail.emea.novell.com>
 <55A78DF2.1060709@citrix.com>
 <20150716152513.GU12455@zion.uk.xensource.com>
 <55A7D17C.5060602@citrix.com>
 <55A7D2CC.1050708@oracle.com>
 <55A7F7F40200007800092152@mail.emea.novell.com>
 <55A7DE45.4040804@citrix.com>
 <55A7E2D8.3040203@oracle.com>
 <55A8B83802000078000924AE@mail.emea.novell.com>
 <1437118075.23656.25.camel@citrix.com>
 <55A946C6.8000002@oracle.com>
 <1437401354.5036.19.camel@citrix.com>
 <55AD08F7.7020105@oracle.com>
 <55AEA4DD.7080406@oracle.com>
 <1437572160.5036.39.camel@citrix.com>
 <55AF9F8F.7030200@suse.com>
 <55AFA16B.3070103@oracle.com>
 <55AFA41E.1080101@suse.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Received: from mail6.bemta5.messagelabs.com ([195.245.231.135])
 by lists.xen.org with esmtp (Exim 4.72) id 1ZHvGB-0003wV-Ab
 for xen-devel@lists.xenproject.org; Wed, 22 Jul 2015 14:44:39 +0000
In-Reply-To: <55AFA41E.1080101@suse.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Juergen Gross, Dario Faggioli
Cc: Elena Ufimtseva, Wei Liu, Andrew Cooper, David Vrabel, Jan Beulich,
 "xen-devel@lists.xenproject.org"
List-Id: xen-devel@lists.xenproject.org

On 07/22/2015 10:09 AM, Juergen Gross wrote:
> On 07/22/2015 03:58 PM, Boris Ostrovsky wrote:
>> On 07/22/2015 09:50 AM, Juergen Gross wrote:
>>> On 07/22/2015 03:36 PM, Dario Faggioli wrote:
>>>> On Tue, 2015-07-21 at 16:00 -0400, Boris Ostrovsky wrote:
>>>>> On 07/20/2015 10:43 AM, Boris Ostrovsky wrote:
>>>>>> On 07/20/2015 10:09 AM, Dario Faggioli wrote:
>>>>
>>>>>> I'll need to see how LLC IDs
are calculated, probably also from some
>>>>>> CPUID bits.
>>>>>
>>>>>
>>>>> No, can't do this: LLC is calculated from CPUID leaf 4 (on Intel),
>>>>> which uses indexes in the ECX register, and xl syntax doesn't allow
>>>>> you to override CPUIDs for such leaves.
>>>>>
>>>> Right. Which leaves us with the question of what should we do and/or
>>>> recommend users to do?
>>>>
>>>> If there were a workaround that we could put in place, and document
>>>> somewhere, however tricky it was, I'd say to go for it, and call it
>>>> acceptable for now.
>>>>
>>>> But, if there isn't, should we disable PV vnuma, or warn the user that
>>>> he may see issues? Can we identify, in Xen or in toolstack, whether a
>>>> host topology will be problematic, and disable/warn in those cases
>>>> too?
>>>>
>>>> I'm not sure, honestly. Disabling looks too aggressive, but it's an
>>>> issue I wouldn't like a user to be facing, without at least being
>>>> informed of the possibility... so, perhaps a (set of) warning(s)?
>>>> Thoughts?
>>>
>>> I think we have 2 possible solutions:
>>>
>>> 1. Try to handle this all in the hypervisor via CPUID mangling.
>>>
>>> 2. Add PV-topology support to the guest and indicate this capability
>>>    via elfnote; only enable PV-numa if this note is present.
>>>
>>> I'd prefer the second solution. If you are okay with this, I'd try
>>> to do some patches for the pvops kernel.

Why do you think that kernel patches are preferable to CPUID management?
This would be all in tools, I'd think. (Well, one problem that I can
think of is that AMD sometimes pokes at MSRs and/or Northbridge's PCI
registers to figure out nodeID --- that we may need to address in the
hypervisor)

And those patches won't help HVM guests, will they? How would they be
useful to user processes?

-boris

>>
>> What if I configure a guest to follow HW topology? I.e. I pin VCPUs to
>> appropriate cores/threads? With elfnote I am stuck with disabled
>> topology.
>
> Add an option to do exactly that: follow HW topology (pin vcpus,
> configure vnuma)?
>
> Add a force flag to the vnuma configuration to ignore the elfnote?
>
>> Besides, this is not necessarily a NUMA-only issue, it's a scheduling
>> one (inside the guest) as well.
>
> Sure. That's what Jan said regarding SUSE's xen-kernel. No topology info
> (or a trivial one) might be better than the wrong one...
>
> This patch for pvops should be written in any case. I'll do this, but it
> would be nice to know whether PV-numa should be considered or not.
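
For reference, a minimal sketch of how an LLC ID can be derived from
CPUID leaf 4 on Intel, which is the dependency discussed above. The bit
layout of EAX (cache type in bits 4:0, level in bits 7:5, "logical
processors sharing this cache, minus 1" in bits 25:14) follows the
documented leaf 4 format. Everything else is an assumption made for
illustration: the function names, the sample EAX values, and the choice
to mask the APIC ID are hypothetical and are not copied from Xen or
Linux.

```python
def decode_leaf4_eax(eax):
    """Decode the fields of CPUID.(EAX=4, ECX=n):EAX that matter here."""
    return {
        "type": eax & 0x1F,                          # 0 means "no more caches"
        "level": (eax >> 5) & 0x7,                   # 1 = L1, 2 = L2, 3 = L3
        "threads_sharing": ((eax >> 14) & 0xFFF) + 1,
    }


def llc_id(apic_id, leaf4_eax_by_subleaf):
    """Return an LLC ID for apic_id (hypothetical helper).

    leaf4_eax_by_subleaf holds the EAX value read for ECX = 0, 1, 2, ...
    The highest cache level found is treated as the last-level cache.
    """
    llc = None
    for eax in leaf4_eax_by_subleaf:
        info = decode_leaf4_eax(eax)
        if info["type"] == 0:        # terminator subleaf
            break
        if llc is None or info["level"] > llc["level"]:
            llc = info
    if llc is None:
        return apic_id               # no cache info: each CPU is its own domain
    # Round the sharing count up to a power of two and clear those low bits.
    shift = (llc["threads_sharing"] - 1).bit_length()
    return apic_id & ~((1 << shift) - 1)


# Illustrative (made-up) subleaf values: L1d and L2 shared by 2 threads,
# L3 shared by 16 threads, then the terminator.
SAMPLE_LEAF4_EAX = [0x4021, 0x4043, 0x3C063, 0x0]

# vCPUs with APIC IDs 0 and 1 land in the same LLC domain, whichever
# virtual NUMA node the toolstack placed them on.
same_llc = llc_id(0, SAMPLE_LEAF4_EAX) == llc_id(1, SAMPLE_LEAF4_EAX)
```

Because the result depends only on the APIC ID and on what leaf 4
reports, a guest that sees the host's leaf 4 values can group vCPUs
under one LLC even when they were assigned to different virtual nodes.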