From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:49287) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a6sKF-0004x2-2R for qemu-devel@nongnu.org; Wed, 09 Dec 2015 22:55:29 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1a6sKA-0002C6-Oq for qemu-devel@nongnu.org; Wed, 09 Dec 2015 22:55:27 -0500
Date: Thu, 10 Dec 2015 14:55:56 +1100
From: David Gibson
Message-ID: <20151210035556.GZ20139@voom.fritz.box>
References: <20151109045812.GE18558@voom.redhat.com>
 <20151110042232.GB20030@us.ibm.com>
 <20151111001758.GK18558@voom.redhat.com>
 <20151111005638.GB4644@linux.vnet.ibm.com>
 <20151111014126.GD5852@voom.redhat.com>
 <20151111221048.GF4644@linux.vnet.ibm.com>
 <20151112044715.GB4886@voom.redhat.com>
 <20151112164627.GC34348@linux.vnet.ibm.com>
 <20151201034125.GO31343@voom.redhat.com>
 <20151205010416.GF10583@linux.vnet.ibm.com>
In-Reply-To: <20151205010416.GF10583@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] [PATCH v2 1/1] target-ppc: Implement rtas_get_sysparm(PROCESSOR_MODULE_INFO)
To: Nishanth Aravamudan
Cc: stewart@linux.vnet.ibm.com, benh@au1.ibm.com, aik@ozlabs.ru, agraf@suse.de, qemu-devel@nongnu.org, qemu-ppc@nongnu.org, paulus@au1.ibm.com, Sukadev Bhattiprolu

On Fri, Dec 04, 2015 at 05:04:16PM -0800, Nishanth Aravamudan wrote:
> On 01.12.2015 [14:41:25 +1100], David Gibson wrote:
> > On Thu, Nov 12, 2015 at 08:46:27AM -0800, Nishanth Aravamudan wrote:
> > > On 12.11.2015 [15:47:15 +1100], David Gibson wrote:
> > > > On Wed, Nov 11, 2015 at 02:10:48PM -0800, Nishanth Aravamudan wrote:
> > > > > On 11.11.2015 [12:41:26 +1100], David Gibson wrote:
> > > > > > On Tue, Nov 10, 2015 at 04:56:38PM -0800, Nishanth Aravamudan wrote:
> > > > > > > On 11.11.2015 [11:17:58 +1100], David Gibson wrote:
> > > > > > > > On Mon, Nov 09, 2015 at 08:22:32PM -0800, Sukadev Bhattiprolu wrote:
> > > > >
> > > > > > > > > >
> > > > > > > > The trouble with xscom is that it's extremely specific to the way the
> > > > > > > > current IBM servers present things. It won't work on other types of
> > > > > > > > host machine (which could happen with PR KVM), and could even break if
> > > > > > > > IBM changes the way it organizes the SCOMs in a future machine.
> > > > > > > >
> > > > > > > > Working from the nodes in /cpus still has some dependencies on IBM
> > > > > > > > specific properties, but it's at least partially based on OF
> > > > > > > > standards.
> > > > > > > >
> > > > > > > > There's also another possible approach here, though I don't know if it
> > > > > > > > will work. Instead of looking directly in the device tree, try to get
> > > > > > > > the information from lscpu, or libosinfo. That would at least give
> > > > > > > > you some hope of providing meaningful information on other host types.
> > > > > > >
> > > > > > > Heh, the issue that is underlying all of this, is that `lscpu` itself is
> > > > > > > quite wrong.
> > > > > > >
> > > > > > > On PAPR-compliant hypervisors (well, PowerVM, at least), the only
> > > > > > > supported means of determining the underlying hardware CPU information
> > > > > > > (which is what licensing models want in the end), is to use this RTAS
> > > > > > > call in an LPAR. `lscpu` is explicitly incorrect in these environments
> > > > > > > (its values are "derived" from sysfs and some are adjusted to ensure
> > > > > > > the division of values works out).
> > > > > >
> > > > > > So.. I'm not sure if you're just saying that lscpu is wrong because it
> > > > > > gives the guest information, or because of other problems.
> > > > >
> > > > > `lscpu`'s man-page specifically says that on virtualized platforms, the
> > > > > output may be inaccurate. And, in fact, on Power, in a KVM guest (and
> > > > > in a LPAR), `lscpu` is outputting the guest CPU information, which is
> > > > > completely fake. This is true on x86 KVM guests too, afaict.
> > > >
> > > > Um.. yes, I was assuming lscpu reporting information about virtual
> > > > cpus and sockets was intended and correct behaviour.
> > >
> > > "lscpu - display information about the CPU architecture"
> >
> > Right, without qualification I'd take that as virtual architecture.
>
> Ok, I did not. I suppose the manpage could be reviewed and/or updated.
>
> Regardless, Suka has gotten a patch merged which at least for power, can
> display the physical information via `lscpu`, when the RTAS call is
> available.

Ok.

> > > but at the same time "lscpu gathers CPU architecture information
> > > from sysfs and /proc/cpuinfo" which is explicitly logical (or
> > > virtual).
> > >
> > > but at the same time "There is also information about the CPU caches and
> > > cache sharing, family, model, bogoMIPS, byte order, and stepping." which
> > > seems rather physical to me.
> >
> > bogomips and byte order are absolutely properties of a virtual cpu.
>
> I think the distinction to be made is they are *also* properties of a
> virtual CPU. That is, they are properties of both, and where they vary,
> it might be relevant to know both cases.

No, they're really not. Byte order isn't even really a property of the
cpu at all, but of a particular piece of software running on it.
Bogomips is an internal parameter of the Linux kernel which depends
(somewhat) on the CPU. To the guest kernel, the host bogomips is
absolutely irrelevant.
> > As are family and model, really, since they're generally at least
> > partially visible to a guest, and there may be some capacity for
> > faking them (x86 is more flexible in this regard than Power).
> > Stepping might be depending on exactly what level the system is
> > virtualized at (it's not for the case of PAPR).
> >
> > Cache info is probably purely physical but amongst everything else
> > that's a property of the virtual cpu, I don't think that's an argument
> > that lscpu should return host cpu information in general.
> >
> >
> > > So perhaps, as I kind of stumbled upon myself in my last reply, we
> > > should explicitly indicate the physical vs. virtual information.
> > >
> > > I will raise this with the lscpu maintainer.
> > >
> > > > > *If* we have a valid RTAS implementation on PowerKVM (or under qemu
> > > > > generally), I think we can modify `lscpu` to do the right thing in at
> > > > > least those two environments.
> > > > >
> > > > > > What I was suggesting is implementing the RTAS call so that it
> > > > > > effectively lets the guest get lscpu information from the host.
> > > > >
> > > > > A bit of a chicken & egg problem, I'd say. The `lscpu` output in PowerNV
> > > > > is also wrong :)
> > > >
> > > > Ok.. why is it wrong in PowerNV? This sounds like something you'd
> > > > want to fix anyway.
> > >
> > > Yes, I never said we wouldn't? It's wrong on PowerNV because chips are
> > > being counted as sockets, i.e. a 2 DCM system is being counted as a 4
> > > socket system, rather than a 2 socket system.
> >
> > Well, sure, but the fact that the tool for the job has a bug doesn't
> > seem like a great reason to re-implement that tool directly in qemu.
> >
> > I don't see any chicken and egg problem here the *powernv* lscpu has
> > no dependency on the cpu hypercall information.
>
> Maybe I'm mistaken, but I think we're simply talking at odds.
>
> There are three cases to consider, which I feel like I've stated before:
>
> 1) PowerVM provides this information (# of physical sockets, etc) via an
> RTAS call.
>
> 2) PowerKVM does not currently provide it at all.
>
> 3) PowerNV provides it via `lscpu` and outputs incorrect information.
>
> Suka is fixing 2) by adding the RTAS call to qemu. What the RTAS call
> uses to populate the data seems to be the root of the question. I guess
> you are suggesting use `lscpu` output from the PowerNV environment, even
> if it's wrong, and once it's right, it's all done for PowerKVM.

More or less, yes. It just seems more sensible to fix a mostly-right
source of the information in the host, rather than to write a
completely new implementation to gather the information.

> Suka is fixing 3) as well, via patches to lscpu, but as you've pointed
> out separately even in the qemu-side, it's not quite clear how to
> proceed in the best generic way.
>
> > > > > > > So, we are trying to at least resolve what PowerKVM guest can see by
> > > > > > > supporting this RTAS call there. We should report *something* to the
> > > > > > > guest, if possible, and we can adjust what is reported to the guests as
> > > > > > > we go, from the host perspective.
> > > > > > >
> > > > > > > I haven't followed along too closely in this thread, but would it be
> > > > > > > reasonable to only report this RTAS call as being supported under
> > > > > > > KVM?
> > > > > >
> > > > > > Possibly, yes.
> > > > >
> > > > > At least, as a first step, I guess.
> > > > >
> > > > > > > How are other RTAS calls dealt with for PR and non-IBM models
> > > > > > > currently?
> > > > > >
> > > > > > Most of them still make sense in PR or TCG. A few do look in the host
> > > > > > device tree, in which case they're likely to fail on non-KVM.
> > > > >
> > > > > Got it, thanks.
> > > > >
> > > > > So my investigation overall led me to this set of conclusions:
> > > > >
> > > > > 1) Under PowerVM, we do not use this RTAS call, which is the only (as
> > > > > asserted by pHyp developers) valid way to get hardware information about
> > > > > the machine. Therefore, the PowerVM `lscpu` output is the "virtual" CPU
> > > > > information -- where cores are as defined by sharing of the L2-cache.
> > > > >
> > > > > 2) Under PowerKVM, we do not use this RTAS call, because it's not
> > > > > supported, and just spit out whatever the qemu topology is (which has no
> > > > > connection to the host (physical) CPU information).
> > > >
> > > > Right.. so does that mean nothing is using this call yet?
> > >
> > > Correct.
> > >
> > > > > --> so if we implement the RTAS call of some sort under PowerKVM, then
> > > > > we can update `lscpu` to use that RTAS call.
> > > >
> > > > Yeah, I'm not convinced that's correct. Shouldn't lscpu return the
> > > > virtual cpu information, at least by default.
> > >
> > > I think it should return both. *cough* this is a request from your
> > > employer, actually *cough* :) For billing purposes, physical topology is
> > > apparently relevant, not virtual (which makes sense, I can make a KVM
> > > guest with 100 sockets, but I definitely shouldn't be billed for 100
> > > sockets worth of RH seats, if the physical system only has 2 sockets).
> >
> > Well, ok. Do you have any contact information so I can find out
> > internally what it is they actually need?
>
> Yes, will send off-list.
>
> -Nish
>

--
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you. NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson