From mboxrd@z Thu Jan  1 00:00:00 1970
From: Andrew Cooper
Subject: Re: PV-vNUMA issue: topology is misinterpreted by the guest
Date: Thu, 16 Jul 2015 11:56:50 +0100
Message-ID: <55A78DF2.1060709@citrix.com>
References: <1437042762.28251.18.camel@citrix.com> <55A7A7F40200007800091D60@mail.emea.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <55A7A7F40200007800091D60@mail.emea.novell.com>
To: Jan Beulich , Dario Faggioli
Cc: Elena Ufimtseva , Wei Liu , David Vrabel , "xen-devel@lists.xenproject.org" , Boris Ostrovsky
List-Id: xen-devel@lists.xenproject.org

On 16/07/15 11:47, Jan Beulich wrote:
>>>> On 16.07.15 at 12:32, wrote:
>> root@test:~# numactl --hardware
>> available: 2 nodes (0-1)
>> node 0 cpus: 0 1
>> node 0 size: 475 MB
>> node 0 free: 382 MB
>> node 1 cpus: 2 3
>> node 1 size: 495 MB
>> node 1 free: 475 MB
>> node distances:
>> node   0   1
>>   0:  10  10
>>   1:  20  10
>>
>> root@test:~# cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
>> 0-1
>> root@test:~# cat /sys/devices/system/cpu/cpu0/topology/core_siblings_list
>> 0-3
>> root@test:~# cat /sys/devices/system/cpu/cpu2/topology/thread_siblings_list
>> 2-3
>> root@test:~# cat /sys/devices/system/cpu/cpu2/topology/core_siblings_list
>> 0-3
>>
>> So the complaint during boot seems to be against 'core_siblings' (which
>> was not what I expected, but perhaps I misremember the meaning of
>> "core_siblings" vs. "thread_siblings" vs. smt-siblings in Linux; I'll
>> double-check).
>>
>> Anyway, is there anything we can do to fix or work around things?
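[Editorial note: the ranges printed by those *_siblings_list files use the kernel's "cpulist" format ("0-1", "0,2-3", and so on). A minimal parser sketch, not part of the original thread, showing what those strings encode:]

```python
def parse_cpulist(s):
    """Parse a Linux cpulist string such as "0-3" or "0,2-3" (the format
    used by thread_siblings_list / core_siblings_list) into a set of CPU ids."""
    cpus = set()
    for part in s.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

# Matches the guest's reported topology above: cpu0's thread siblings
# are {0, 1}, while its core_siblings span all four vCPUs.
print(parse_cpulist("0-1"))   # {0, 1}
print(parse_cpulist("0-3"))   # {0, 1, 2, 3}
```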
> Make the guest honor topology also at the CPUID layer. Whether
> that's by not wrongly consuming the respective CPUID bits (i.e. a
> guest side change) or reflecting PV state in what the hypervisor
> returns I'm not sure about. While the latter might be more clean,
> I'd be afraid this might get in the way of what the tool stack wants
> to see.

Xen's CPUID handling currently has no concept of per-core or per-package
data in the CPUID policy.  The guest sees the information from whichever
pcpu happened to be running libxc at the time the policy was decided,
with a Xen-level fudge factor applied.

There is also the long-standing problem of (failing to) trap cpuid
instructions in PV guests, so any NUMA-aware software which doesn't use
the force override prefix will still fail in the way shown above.

Many of the issues in this area are discussed in
http://xenbits.xen.org/people/andrewcoop/feature-levelling/feature-levelling-E.pdf,
especially the extra work section at the end.

~Andrew
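[Editorial note: the CPUID topology data discussed above is what the guest kernel decodes to build its sibling maps. The sketch below, not from the thread, decodes hypothetical raw register values for CPUID leaf 0xB (x2APIC topology enumeration) per the published leaf semantics; the register values are invented to match the guest's reported 2-thread, 4-logical-CPU layout.]

```python
# Level-type encodings found in ECX[15:8] of CPUID leaf 0xB subleaves.
SMT, CORE = 1, 2

def decode_leaf_b(subleaves):
    """Derive (threads per core, logical CPUs per package) from a list of
    (eax, ebx, ecx) register tuples, one per subleaf of CPUID leaf 0xB.
    EAX[4:0] is the APIC-ID shift for the next topology level."""
    threads_per_core = 1
    logical_per_pkg = 1
    for eax, ebx, ecx in subleaves:
        level_type = (ecx >> 8) & 0xFF
        shift = eax & 0x1F
        if level_type == SMT:
            threads_per_core = 1 << shift
        elif level_type == CORE:
            logical_per_pkg = 1 << shift
    return threads_per_core, logical_per_pkg

# Hypothetical values: SMT subleaf with shift 1, core subleaf with shift 2,
# i.e. 2 threads per core and 4 logical CPUs per package.
print(decode_leaf_b([(1, 2, 0x0100), (2, 4, 0x0201)]))  # (2, 4)
```

If the hypervisor hands the guest values like these from whichever pcpu built the policy, rather than values consistent with the vNUMA layout, the guest's derived sibling maps will disagree with the NUMA tables, which is exactly the mismatch reported above.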