Re: [Qemu-devel] [PATCH 8/9] spapr: Implement processor compatibility in ibm, client-architecture-support

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Alexander Graf <agraf@suse.de>
To: Alexey Kardashevskiy <aik@ozlabs.ru>, qemu-devel@nongnu.org
Cc: qemu-ppc@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 8/9] spapr: Implement processor compatibility in ibm, client-architecture-support
Date: Mon, 19 May 2014 11:01:09 +0200	[thread overview]
Message-ID: <5379C855.9050405@suse.de> (raw)
In-Reply-To: <5376BF3E.5060706@ozlabs.ru>


On 17.05.14 03:45, Alexey Kardashevskiy wrote:
> On 05/17/2014 06:46 AM, Alexander Graf wrote:
>> On 16.05.14 17:57, Alexey Kardashevskiy wrote:
>>> On 05/17/2014 12:16 AM, Alexander Graf wrote:
>>>> On 15.05.14 13:28, Alexey Kardashevskiy wrote:
>>>>> Modern Linux kernels support last POWERPC CPUs so when a kernel boots,
>>>>> in most cases it can find a matching cpu_spec in the kernel's cpu_specs
>>>>> list. However if the kernel is quite old, it may be missing a definition
>>>>> of the actual CPU. To provide an ability for old kernels to work on modern
>>>>> hardware, a Processor Compatibility Mode has been introduced
>>>>> by the PowerISA specification.
>>>>>
>>>>>    From the hardware prospective, it is supported by the Processor
>>>>> Compatibility Register (PCR) which is defined in PowerISA. The register
>>>>> enables one of the compatibility modes (2.05/2.06/2.07).
>>>>> Since PCR is a hypervisor privileged register and cannot be
>>>>> accessed from the guest, the mode selection is done via
>>>>> ibm,client-architecture-support (CAS) RTAS call using which the guest
>>>>> specifies what "raw" and "architected" CPU versions it supports.
>>>>> QEMU works out the best match, changes a "cpu-version" property of
>>>>> every CPU and notifies the guest about the change by setting these
>>>>> properties in the buffer passed as a response on a custom H_CAS hypercall.
>>>>>
>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>> ---
>>>>>     hw/ppc/spapr.c       | 40 +++++++++++++++++++++++++
>>>>>     hw/ppc/spapr_hcall.c | 83
>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>     trace-events         |  5 ++++
>>>>>     3 files changed, 128 insertions(+)
>>>>>
>>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>>> index a2c9106..a0882a1 100644
>>>>> --- a/hw/ppc/spapr.c
>>>>> +++ b/hw/ppc/spapr.c
>>>>> @@ -35,6 +35,7 @@
>>>>>     #include "kvm_ppc.h"
>>>>>     #include "mmu-hash64.h"
>>>>>     #include "cpu-models.h"
>>>>> +#include "qom/cpu.h"
>>>>>       #include "hw/boards.h"
>>>>>     #include "hw/ppc/ppc.h"
>>>>> @@ -592,11 +593,50 @@ int spapr_h_cas_compose_response(target_ulong addr,
>>>>> target_ulong size)
>>>>>     {
>>>>>         void *fdt;
>>>>>         sPAPRDeviceTreeUpdateHeader hdr = { .version_id = 1 };
>>>>> +    CPUState *cs;
>>>>> +    int smt = kvmppc_smt_threads();
>>>>>           size -= sizeof(hdr);
>>>>>           fdt = g_malloc0(size);
>>>>>         _FDT((fdt_create(fdt, size)));
>>>>> +    _FDT((fdt_begin_node(fdt, "cpus")));
>>>>> +
>>>>> +    CPU_FOREACH(cs) {
>>>>> +        PowerPCCPU *cpu = POWERPC_CPU(cs);
>>>>> +        DeviceClass *dc = DEVICE_GET_CLASS(cpu);
>>>>> +        int smpt = spapr_get_compat_smp_threads(cpu);
>>>>> +        int i, index = ppc_get_vcpu_dt_id(cpu);
>>>>> +        uint32_t servers_prop[smpt];
>>>>> +        uint32_t gservers_prop[smpt * 2];
>>>>> +        char tmp[32];
>>>>> +
>>>>> +        if ((index % smt) != 0) {
>>>>> +            continue;
>>>>> +        }
>>>>> +
>>>>> +        snprintf(tmp, 32, "%s@%x", dc->fw_name, index);
>>>>> +        trace_spapr_cas_add_subnode(tmp);
>>>>> +
>>>>> +        _FDT((fdt_begin_node(fdt, tmp)));
>>>>> +        _FDT((fdt_property_cell(fdt, "cpu-version", cpu->cpu_version)));
>>>>> +
>>>>> +        /* Build interrupt servers and gservers properties */
>>>>> +        for (i = 0; i < smpt; i++) {
>>>>> +            servers_prop[i] = cpu_to_be32(index + i);
>>>>> +            /* Hack, direct the group queues back to cpu 0 */
>>>>> +            gservers_prop[i*2] = cpu_to_be32(index + i);
>>>>> +            gservers_prop[i*2 + 1] = 0;
>>>>> +        }
>>>>> +        _FDT((fdt_property(fdt, "ibm,ppc-interrupt-server#s",
>>>>> +                           servers_prop, sizeof(servers_prop))));
>>>>> +        _FDT((fdt_property(fdt, "ibm,ppc-interrupt-gserver#s",
>>>>> +                           gservers_prop, sizeof(gservers_prop))));
>>>>> +
>>>>> +        _FDT((fdt_end_node(fdt)));
>>>>> +    }
>>>>> +
>>>>> +    _FDT((fdt_end_node(fdt)));
>>>> Why is this so much code? Can we only replace full nodes?
>>> It is a diff tree, in only has properties to change.
>> So why do we change so much then? :)
>
> 3 (three) properties are "much" now? :)
>
>>>> Then please
>>>> extract the original CPU node creation into a function and just call it
>>>> from the original generation and from here.
>>>>
>>>> If we can also replace single properties I'd say we only need to override
>>>> cpu-version.
>>> Mmm. The user could run qemu with threads=8 on POWER8 with SLES11 which
>>> must not see 8 threads but it can update itself and reboot with the kernel
>>> which is aware of POWER8. You are saying we do not want to support that?
>> First off, I think that practically speaking people will want to use
>> 2-thread granularity on POWER8 because that gives them the best load
>> exploitation on the system.
> Practically speaking, people will want to use kernels which can work on
> POWER8 in raw mode, no? Here we are trying to fix older guests.
>
>> So if we just disable CPU nodes that would not be visible usually (there
>> was a disabled flag in cpu nodes, right?) for threads that are incompatible
>> with a new CAS mode, we can then remove the disabled flag when the guest
>> accepts more threads again.
>>
>> That means we could expose 8 threads (POWER8), guest only supports 2
>> threads (POWER6), so we waste 6 threads. But the remaining 2 will be fast
>> ;). We basically degrade the system to SMT2 mode.
> We do not want to show more threads (i.e. bigger ibm,ppc-interrupt-server#s
> properties) if we do not support more threads as we do not really know what
> guest kernel (including AIX) or some ppc64_cpu/drmgr/whatever tool will do
> if they find more.
>
>> What does pHyp do here?
> I wish it was easy to verify :)
>
> I fail to see what exactly your objections are all about, I definitely miss
> something big here. Is it a minor code duplication or some dangerous bug?
> If it is just a code duplication, I could do something about it.

My main concern is that this is a big chunk of magic code that swizzles 
around random device tree properties without an easy to follow 
explanation that would show people what exactly the code does.

I think you can overcome all of this by combining the function you 
introduce here with the one that we already have for cpu node 
generation. You would of course need to parameterize it.

While you're at the process of combining the functions, you will realize 
that some things are not exactly obvious to every reader. For example

+        if ((index % smt) != 0) {
+            continue;
+        }


is based on the fact that only cores are listed in /cpus. But the code 
doesn't tell you, so if anyone who is not Alexey wants to debug this 
code and find out what's going wrong, he has to sit down for a few 
minutes are read up on that fact. It'd be a lot nicer to hint him 
towards the correct direction. This could either be a comment or reader 
friendly code such as

   for_each_core(...) {
   }

at which point the reader would immediately know what we're trying to do.

The same goes for the interrupt servers and gservers properties. If we 
don't explain the format at least by variable / function names in the 
code, you either have to know the format off the top of your head or 
constantly have the sPAPR spec (which is not public) lying next to you. 
This increases the barrier to understanding the code.

For example, if you would just extract the interrupt server generation 
into a function and give it parameters such as "num_threads" and give 
every value a good name before you assign it to the property array, the 
code suddenly becomes self-documenting. If you then share that function 
with the normal cpu node generation, anyone who fixes a bug in it also 
only has to look at a single spot.


Alex

next prev parent reply	other threads:[~2014-05-19  9:01 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-05-15 11:28 [Qemu-devel] [PATCH 0/9] spapr: Enable ibm, client-architecture-support Alexey Kardashevskiy
2014-05-15 11:28 ` [Qemu-devel] [PATCH 1/9] kvm: add set_one_reg/get_one_reg helpers Alexey Kardashevskiy
2014-05-15 11:28 ` [Qemu-devel] [PATCH 2/9] target-ppc: Add "compat" CPU option Alexey Kardashevskiy
2014-05-15 11:28 ` [Qemu-devel] [PATCH 3/9] spapr: Move server# property out of skeleton fdt Alexey Kardashevskiy
2014-05-15 11:28 ` [Qemu-devel] [PATCH 4/9] target-ppc: Implement "compat" CPU option Alexey Kardashevskiy
2014-05-16 14:05   ` Alexander Graf
2014-05-16 15:17     ` Alexey Kardashevskiy
2014-05-16 20:47       ` Alexander Graf
2014-05-21  6:57         ` Alexey Kardashevskiy
2014-05-21  7:29           ` Alexander Graf
2014-05-21  7:59             ` Alexey Kardashevskiy
2014-05-21  8:00               ` Alexander Graf
2014-05-21  8:08                 ` Alexey Kardashevskiy
2014-05-15 11:28 ` [Qemu-devel] [PATCH 5/9] target-ppc: Define Processor Compatibility Masks Alexey Kardashevskiy
2014-05-15 11:28 ` [Qemu-devel] [PATCH 6/9] spapr: Add ibm, client-architecture-support call Alexey Kardashevskiy
2014-05-19  3:47   ` [Qemu-devel] [Qemu-ppc] " Nikunj A Dadhania
2014-05-15 11:28 ` [Qemu-devel] [PATCH 7/9] spapr: Limit threads per core according to current compatibility mode Alexey Kardashevskiy
2014-05-16 14:09   ` Alexander Graf
2014-05-15 11:28 ` [Qemu-devel] [PATCH 8/9] spapr: Implement processor compatibility in ibm, client-architecture-support Alexey Kardashevskiy
2014-05-16 14:16   ` Alexander Graf
2014-05-16 15:57     ` Alexey Kardashevskiy
2014-05-16 20:46       ` Alexander Graf
2014-05-17  1:45         ` Alexey Kardashevskiy
2014-05-19  3:09           ` Alexey Kardashevskiy
2014-05-19  9:01           ` Alexander Graf [this message]
2014-05-15 11:28 ` [Qemu-devel] [PATCH 9/9] KVM: PPC: Enable compatibility mode Alexey Kardashevskiy
2014-05-16 14:18   ` Alexander Graf
2014-05-16 14:27     ` Alexey Kardashevskiy
2014-05-16 14:35       ` Alexander Graf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5379C855.9050405@suse.de \
    --to=agraf@suse.de \
    --cc=aik@ozlabs.ru \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).