From mboxrd@z Thu Jan 1 00:00:00 1970
From: Juergen Gross
Subject: Re: PV-vNUMA issue: topology is misinterpreted by the guest
Date: Mon, 27 Jul 2015 13:13:25 +0200
Message-ID: <55B61255.6050904@suse.com>
References: <55AFAC34.1060606@oracle.com> <55B070ED.2040200@suse.com>
 <1437660433.5036.96.camel@citrix.com> <55B21364.5040906@suse.com>
 <1437749076.4682.47.camel@citrix.com> <55B25650.4030402@suse.com>
 <55B258C9.4040400@suse.com> <1437753509.4682.78.camel@citrix.com>
 <20150724160948.GA2067@l.oracle.com> <55B26570.1060008@suse.com>
 <20150724162911.GC2220@l.oracle.com> <55B26A45.2050402@suse.com>
 <55B26B84.1000101@oracle.com> <55B5B504.2030504@suse.com>
 <55B60DE3.2050209@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Received: from mail6.bemta3.messagelabs.com ([195.245.230.39])
 by lists.xen.org with esmtp (Exim 4.72) (envelope-from )
 id 1ZJgLZ-0003zT-Js for xen-devel@lists.xenproject.org;
 Mon, 27 Jul 2015 11:13:29 +0000
In-Reply-To: <55B60DE3.2050209@citrix.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Andrew Cooper, George Dunlap
Cc: Elena Ufimtseva, Wei Liu, Dario Faggioli, David Vrabel, Jan Beulich,
 "xen-devel@lists.xenproject.org", Boris Ostrovsky
List-Id: xen-devel@lists.xenproject.org

On 07/27/2015 12:54 PM, Andrew Cooper wrote:
> On 27/07/15 11:43, George Dunlap wrote:
>> On Mon, Jul 27, 2015 at 5:35 AM, Juergen Gross wrote:
>>> On 07/24/2015 06:44 PM, Boris Ostrovsky wrote:
>>>> On 07/24/2015 12:39 PM, Juergen Gross wrote:
>>>>>
>>>>>
>>>>> I don't say mangling cpuids can't solve the scheduling problem. It
>>>>> surely can. But it can't solve the scheduling problem without hiding
>>>>> information like number of sockets or cores which might be required
>>>>> for license purposes. If we don't care, fine.
>>>>>
>>>> (this is somewhat repeating the email I just sent)
>>>>
>>>> Why can't we construct socket/core info with CPUID (and *possibly* ACPI
>>>> changes) so that we present a reasonable (licensing-wise) picture?
>>>>
>>>> Can you suggest an example where it will not work and then maybe we can
>>>> figure something out?
>>>
>>> Let's assume a software with license based on core count. You have a
>>> system with two 8-core processors and hyperthreads enabled, summing up
>>> to 32 logical processors. Your license is valid for up to 16 cores, so
>>> running the software on bare metal on your system is fine.
>>>
>>> Now you are running the software inside a virtual machine with 24 vcpus
>>> in a cpupool with 24 logical cpus limited to 12 cores (6 cores of each
>>> processor). As we have to hide hyperthreading in order not to have
>>> to pin each vcpu to just a single logical processor, the topology
>>> resulting from this picture will have to present 24 cores. The license
>>> will not cover this hardware.
>> But how does doing a PV topology help this situation? Because we're
>> telling one thing to the OS (via our PV interface) and another thing
>> to applications (via direct CPUID access)?
>
> I expressed exactly these concerns right back at the start of the vnuma
> work.
>
> The OS and its userspace can and will use cpuid. Most examples will
> only use cpuid. The only thing worse than providing no NUMA information
> at all is providing conflicting information between cpuid and vnuma.
>
> IMO, HVM guests should get all their NUMA information from the same
> sources as native hardware would provide. PV guests are admittedly
> harder as in general we cannot hide the real topology information in
> cpuid.

Are you aware the same is true currently even without vNUMA? The Linux
kernel (and other OSes as well) will make scheduling decisions based on
cpuid data obtained during boot.
The information will be correct only by chance and the real relation
between vcpus and pcpus is changing all the time. So without adapting the
kernel to that scenario it won't run optimally.

You can either change the data to let the kernel make some sane decisions
(cpuid mangling) or you can adapt the kernel somehow, e.g. by modifying
the kernel internal tables used for making scheduling decisions (my
proposal). Something should be done regardless of the vNUMA support.


Juergen
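
For reference, the core-count licensing example quoted earlier in this mail can be written out as a small arithmetic sketch. This is only an illustration: the `Topology` class, `license_ok()` and `LICENSED_CORES` are hypothetical names, not anything in Xen or Linux, and splitting the guest's 24 presented cores into 2 sockets of 12 is an assumption (the thread only says 24 cores are presented).

```python
from dataclasses import dataclass

# Hypothetical license limit taken from the example: valid for up to 16 cores.
LICENSED_CORES = 16


@dataclass(frozen=True)
class Topology:
    """A simple sockets x cores x threads description (illustrative only)."""
    sockets: int
    cores_per_socket: int
    threads_per_core: int

    @property
    def cores(self) -> int:
        return self.sockets * self.cores_per_socket

    @property
    def logical_cpus(self) -> int:
        return self.cores * self.threads_per_core


def license_ok(topo: Topology) -> bool:
    """A core-count license check as software reading the topology would do it."""
    return topo.cores <= LICENSED_CORES


# Bare metal: two 8-core processors with hyperthreading enabled.
host = Topology(sockets=2, cores_per_socket=8, threads_per_core=2)

# The cpupool: 6 cores of each processor, hyperthreads included.
cpupool = Topology(sockets=2, cores_per_socket=6, threads_per_core=2)

# What the guest sees once hyperthreading is hidden: every one of the
# 24 vcpus looks like a full core. The 2 x 12 split is an assumption.
guest_view = Topology(sockets=2, cores_per_socket=12, threads_per_core=1)

for name, topo in (("host", host), ("cpupool", cpupool), ("guest view", guest_view)):
    print(f"{name}: {topo.logical_cpus} logical cpus, {topo.cores} cores, "
          f"license ok: {license_ok(topo)}")
```

The guest is backed by only 12 physical cores, well under the limit, but the topology it is shown counts 24 cores and so fails the check.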