From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Cooper
Subject: Re: PV-vNUMA issue: topology is misinterpreted by the guest
Date: Thu, 23 Jul 2015 10:42:57 +0100
Message-ID: <55B0B721.7010209@citrix.com>
References: <1437042762.28251.18.camel@citrix.com>
 <55A7A7F40200007800091D60@mail.emea.novell.com>
 <55A78DF2.1060709@citrix.com>
 <20150716152513.GU12455@zion.uk.xensource.com>
 <55A7D17C.5060602@citrix.com>
 <55A7D2CC.1050708@oracle.com>
 <55A7F7F40200007800092152@mail.emea.novell.com>
 <55A7DE45.4040804@citrix.com>
 <55A7E2D8.3040203@oracle.com>
 <55A8B83802000078000924AE@mail.emea.novell.com>
 <1437118075.23656.25.camel@citrix.com>
 <55A946C6.8000002@oracle.com>
 <1437401354.5036.19.camel@citrix.com>
 <55AD08F7.7020105@oracle.com>
 <55AEA4DD.7080406@oracle.com>
 <1437572160.5036.39.camel@citrix.com>
 <55AF9F8F.7030200@suse.com>
 <55AFA16B.3070103@oracle.com>
 <55AFA41E.1080101@suse.com>
 <55AFAC34.1060606@oracle.com>
 <55B070ED.2040200@suse.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Received: from mail6.bemta3.messagelabs.com ([195.245.230.39])
 by lists.xen.org with esmtp (Exim 4.72) (envelope-from )
 id 1ZID2H-0000ml-OV for xen-devel@lists.xenproject.org;
 Thu, 23 Jul 2015 09:43:29 +0000
In-Reply-To: <55B070ED.2040200@suse.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Juergen Gross, Boris Ostrovsky, Dario Faggioli
Cc: Elena Ufimtseva, "xen-devel@lists.xenproject.org", Wei Liu,
 David Vrabel, Jan Beulich
List-Id: xen-devel@lists.xenproject.org

On 23/07/15 05:43, Juergen Gross wrote:
> On 07/22/2015 04:44 PM, Boris Ostrovsky wrote:
>> On 07/22/2015 10:09 AM, Juergen Gross wrote:
>>> On 07/22/2015 03:58 PM, Boris Ostrovsky wrote:
>>>> On 07/22/2015 09:50 AM, Juergen Gross wrote:
>>>>> On 07/22/2015 03:36 PM, Dario Faggioli wrote:
>>>>>> On Tue, 2015-07-21 at 16:00 -0400, Boris Ostrovsky wrote:
>>>>>>> On 07/20/2015 10:43 AM, Boris Ostrovsky wrote:
>>>>>>>> On 07/20/2015 10:09 AM, Dario Faggioli wrote:
>>>>>>
>>>>>>>> I'll need to see how LLC IDs are calculated, probably also
>>>>>>>> from some CPUID bits.
>>>>>>>
>>>>>>>
>>>>>>> No, can't do this: LLC is calculated from CPUID leaf 4 (on
>>>>>>> Intel) which use indexes in ECX register and xl syntax doesn't
>>>>>>> allow you to override CPUIDs for such leaves.
>>>>>>>
>>>>>> Right. Which leaves us with the question of what should we do
>>>>>> and/or recommend users to do?
>>>>>>
>>>>>> If there were a workaround that we could put in place, and
>>>>>> document somewhere, however tricky it was, I'd say to go for it,
>>>>>> and call it acceptable for now.
>>>>>>
>>>>>> But, if there isn't, should we disable PV vnuma, or warn the
>>>>>> user that he may see issues? Can we identify, in Xen or in
>>>>>> toolstack, whether an host topology will be problematic, and
>>>>>> disable/warn in those cases too?
>>>>>>
>>>>>> I'm not sure, honestly. Disabling looks too aggressive, but it's
>>>>>> an issue I wouldn't like an user to be facing, without at least
>>>>>> being informed of the possibility... so, perhaps a (set of)
>>>>>> warning(s)? Thoughts?
>>>>>
>>>>> I think we have 2 possible solutions:
>>>>>
>>>>> 1. Try to handle this all in the hypervisor via CPUID mangling.
>>>>>
>>>>> 2. Add PV-topology support to the guest and indicate this
>>>>>    capability via elfnote; only enable PV-numa if this note is
>>>>>    present.
>>>>>
>>>>> I'd prefer the second solution. If you are okay with this, I'd
>>>>> try to do some patches for the pvops kernel.
>>
>> Why do you think that kernel patches are preferable to CPUID management?
>> This would be all in tools, I'd think. (Well, one problem that I can
>> think of is that AMD sometimes pokes at MSRs and/or Northbridge's PCI
>> registers to figure out nodeID --- that we may need to have to address
>> in the hypervisor)
>
> Doing it via CPUID is more HW specific. Trying to fake a topology for
> the guest from outside might lead to weird decisions in the guest e.g.
> regarding licenses based on socket counts.
>
> If you are doing it in the guest itself you are able to address the
> different problems (scheduling, licensing) in different ways.
>
>> And those patches won't help HVM guests, will they? How would they be
>> useful by user processes?
>
> HVM can use pv interfaces as well. It's called pv-NUMA :-)
>
> Hmm, I didn't think of user processes. Are you aware of cases where they
> are to be considered? The only case where user processes are involved I
> could think of is licensing again. Depending on the licensing model
> playing with CPUID is either good or bad. I can even imagine the CPUID
> configuration capabilities in xl are in use today for exactly this
> purpose. Using them for pv-NUMA as well will make this feature unusable
> for those users.

Userspace can also use things like hwloc which use cpuid to calculate
efficient allocation of resources, rather than for licensing purposes.

~Andrew
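
For reference, here is a rough sketch of the kind of calculation the
quoted discussion refers to: decoding one subleaf of CPUID leaf 4 (the
subleaf index is what goes in ECX on input) and deriving an LLC ID from
the APIC ID. It is illustrative only. The function names and the
register values in the usage note are made up, the bit layout follows
the documented Intel leaf 4 format, and the masking step only
approximates what a guest kernel or a library such as hwloc might do;
it is not copied from either.

```python
from typing import NamedTuple


class CacheLeaf(NamedTuple):
    cache_type: int       # 0 = no more caches, 1 = data, 2 = instruction, 3 = unified
    level: int            # cache level (1, 2, 3, ...)
    threads_sharing: int  # max number of addressable logical CPU IDs sharing the cache
    size_bytes: int       # total cache size in bytes


def decode_leaf4(eax: int, ebx: int, ecx: int) -> CacheLeaf:
    """Decode the output registers of one CPUID leaf 4 subleaf.

    The caller is assumed to have executed CPUID with EAX=4 and the
    subleaf index in ECX; this function only interprets the result.
    """
    cache_type = eax & 0x1F
    level = (eax >> 5) & 0x7
    threads_sharing = ((eax >> 14) & 0xFFF) + 1
    ways = ((ebx >> 22) & 0x3FF) + 1
    partitions = ((ebx >> 12) & 0x3FF) + 1
    line_size = (ebx & 0xFFF) + 1
    sets = ecx + 1
    size_bytes = ways * partitions * line_size * sets
    return CacheLeaf(cache_type, level, threads_sharing, size_bytes)


def llc_id(apic_id: int, threads_sharing: int) -> int:
    """Hypothetical helper: derive a last-level-cache ID from an APIC ID.

    Clears the low APIC ID bits that distinguish the CPUs sharing the
    cache, with the sharing count rounded up to a power of two.
    """
    shift = (threads_sharing - 1).bit_length()  # ceil(log2(threads_sharing))
    return apic_id & ~((1 << shift) - 1)
```

With made-up register values describing a unified level 3 cache shared
by 16 logical CPU IDs, decode_leaf4(0x3C063, 0x03C0003F, 0x1FFF)
reports 8 MiB, and llc_id(0x23, 16) and llc_id(0x2F, 16) both give
0x20, so the two CPUs look like cache siblings. A result of that kind
is derived purely from CPUID and APIC ID values, which is why it can
disagree with the vNUMA layout the guest was given.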