Re: [PATCH] spapr: Rework hash<->radix transitions at CAS

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: David Gibson <david@gibson.dropbear.id.au>
To: Greg Kurz <groug@kaod.org>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>,
	Laurent Vivier <lvivier@redhat.com>,
	qemu-ppc@nongnu.org, qemu-devel@nongnu.org
Subject: Re: [PATCH] spapr: Rework hash<->radix transitions at CAS
Date: Wed, 19 Feb 2020 10:21:05 +1100	[thread overview]
Message-ID: <20200218232105.GG1764@umbus.fritz.box> (raw)
In-Reply-To: <20200214191900.7e4f91fc@bahia.lan>

[-- Attachment #1: Type: text/plain, Size: 11716 bytes --]

On Fri, Feb 14, 2020 at 07:19:00PM +0100, Greg Kurz wrote:
> On Fri, 14 Feb 2020 09:28:35 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Thu, Feb 13, 2020 at 04:38:38PM +0100, Greg Kurz wrote:
> > > Until the CAS negotiation is over, an HPT can be allocated on three
> > > different paths:
> > > 
> > > 1) during machine reset if the host doesn't support radix,
> > > 
> > > 2) during CAS if the guest wants hash and doesn't support HPT resizing,
> > >    in which case we pre-emptively resize the HPT to accomodate maxram,
> > > 
> > > 3) during CAS if no CAS reboot was requested, the guest wants hash but
> > >    we're currently configured for radix.
> > > 
> > > Depending on the various combinations of host or guest MMU support,
> > > HPT resizing guest support and the possibility of a CAS reboot, it
> > > is quite hard to know which of these allocates the HPT that will
> > > be ultimately used by the guest that wants to do hash. Also, some of
> > > them have bugs:
> > > 
> > > - 2) calls spapr_reallocate_hpt() instead of spapr_setup_hpt_and_vrma()
> > >   and thus doesn't update the VRMA size, even though we've just extended
> > >   the HPT. Not sure what issues this can cause,
> > > 
> > > - 3) doesn't check for HPT resizing support and will always allocate a
> > >   small HPT based on the initial RAM size. This caps the total amount of
> > >   RAM the guest can see, especially if maxram is much higher than the
> > >   initial ram.
> > > 
> > > We only support guests that do CAS and we already assume that the HPT
> > > isn't being used when we do the pre-emptive resizing at CAS. It thus
> > > seems reasonable to only allocate the HPT at the end of CAS, when no
> > > CAS reboot was requested.
> > > 
> > > Consolidate the logic so that we only create the HPT during 3), ie.
> > > when we're done with the CAS reboot cycles, and ensure HPT resizing
> > > is taken into account. This fixes the radix->hash transition for
> > > all cases.
> > 
> > Uh.. I'm pretty sure this can't work for KVM on a POWER8 host.  We
> > need the HPT at all times there, or there's nowhere to put VRMA
> > entries, so we can't run even in real mode.
> > 
> 
> Well it happens to be working anyway because KVM automatically
> creates an HPT (default size 16MB) in kvmppc_hv_setup_htab_rma()
> if QEMU didn't do so already... Would a comment to emphasize this
> be enough or do you prefer I don't drop the HPT allocation currently
> performed at machine reset ?

Relying on the automatic allocation is not a good idea.  With host
kernels before HPT resizing, once that automatic allocation happens,
we can't change the HPT size *at all*, even with a reset or CAS.

So, yes, the current code is annoyingly complex, but it's that way for
a reason.

> > > The guest can theoretically call CAS several times, without a CAS
> > > reboot in between. Linux guests don't do that, but better safe than
> > > sorry, let's ensure we can also handle the symmetrical hash->radix
> > > transition correctly: free the HPT and set the GR bit in PATE.
> > > An helper is introduced for the latter since this is already what
> > > we do during machine reset when going for radix.
> > > 
> > > As a bonus, this removes one user of spapr->cas_reboot, which we
> > > want to get rid of in the future.
> > > 
> > > Signed-off-by: Greg Kurz <groug@kaod.org>
> > > ---
> > >  hw/ppc/spapr.c         |   25 +++++++++++++++-----
> > >  hw/ppc/spapr_hcall.c   |   59 ++++++++++++++++++++----------------------------
> > >  include/hw/ppc/spapr.h |    1 +
> > >  3 files changed, 44 insertions(+), 41 deletions(-)
> > > 
> > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > index 828e2cc1359a..88bc0e4e3ca1 100644
> > > --- a/hw/ppc/spapr.c
> > > +++ b/hw/ppc/spapr.c
> > > @@ -1573,9 +1573,19 @@ void spapr_setup_hpt_and_vrma(SpaprMachineState *spapr)
> > >  {
> > >      int hpt_shift;
> > >  
> > > +    /*
> > > +     * HPT resizing is a bit of a special case, because when enabled
> > > +     * we assume an HPT guest will support it until it says it
> > > +     * doesn't, instead of assuming it won't support it until it says
> > > +     * it does.  Strictly speaking that approach could break for
> > > +     * guests which don't make a CAS call, but those are so old we
> > > +     * don't care about them.  Without that assumption we'd have to
> > > +     * make at least a temporary allocation of an HPT sized for max
> > > +     * memory, which could be impossibly difficult under KVM HV if
> > > +     * maxram is large.
> > > +     */
> > >      if ((spapr->resize_hpt == SPAPR_RESIZE_HPT_DISABLED)
> > > -        || (spapr->cas_reboot
> > > -            && !spapr_ovec_test(spapr->ov5_cas, OV5_HPT_RESIZE))) {
> > > +        || !spapr_ovec_test(spapr->ov5_cas, OV5_HPT_RESIZE)) {
> > >          hpt_shift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
> > >      } else {
> > >          uint64_t current_ram_size;
> > > @@ -1604,6 +1614,12 @@ static int spapr_reset_drcs(Object *child, void *opaque)
> > >      return 0;
> > >  }
> > >  
> > > +void spapr_reset_patb_entry(SpaprMachineState *spapr)
> > > +{
> > > +    spapr->patb_entry = PATE1_GR;
> > > +    spapr_set_all_lpcrs(LPCR_HR | LPCR_UPRT, LPCR_HR | LPCR_UPRT);
> > > +}
> > > +
> > >  static void spapr_machine_reset(MachineState *machine)
> > >  {
> > >      SpaprMachineState *spapr = SPAPR_MACHINE(machine);
> > > @@ -1624,10 +1640,7 @@ static void spapr_machine_reset(MachineState *machine)
> > >           * without a HPT because KVM will start them in radix mode.
> > >           * Set the GR bit in PATE so that we know there is no HPT.
> > >           */
> > > -        spapr->patb_entry = PATE1_GR;
> > > -        spapr_set_all_lpcrs(LPCR_HR | LPCR_UPRT, LPCR_HR | LPCR_UPRT);
> > > -    } else {
> > > -        spapr_setup_hpt_and_vrma(spapr);
> > > +        spapr_reset_patb_entry(spapr);
> > >      }
> > >  
> > >      qemu_devices_reset();
> > > diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> > > index b8bb66b5c0d4..57ddf0fa6d05 100644
> > > --- a/hw/ppc/spapr_hcall.c
> > > +++ b/hw/ppc/spapr_hcall.c
> > > @@ -1677,6 +1677,7 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
> > >      bool raw_mode_supported = false;
> > >      bool guest_xive;
> > >      CPUState *cs;
> > > +    int maxshift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
> > >  
> > >      /* CAS is supposed to be called early when only the boot vCPU is active. */
> > >      CPU_FOREACH(cs) {
> > > @@ -1739,36 +1740,6 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
> > >  
> > >      guest_xive = spapr_ovec_test(ov5_guest, OV5_XIVE_EXPLOIT);
> > >  
> > > -    /*
> > > -     * HPT resizing is a bit of a special case, because when enabled
> > > -     * we assume an HPT guest will support it until it says it
> > > -     * doesn't, instead of assuming it won't support it until it says
> > > -     * it does.  Strictly speaking that approach could break for
> > > -     * guests which don't make a CAS call, but those are so old we
> > > -     * don't care about them.  Without that assumption we'd have to
> > > -     * make at least a temporary allocation of an HPT sized for max
> > > -     * memory, which could be impossibly difficult under KVM HV if
> > > -     * maxram is large.
> > > -     */
> > > -    if (!guest_radix && !spapr_ovec_test(ov5_guest, OV5_HPT_RESIZE)) {
> > > -        int maxshift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
> > > -
> > > -        if (spapr->resize_hpt == SPAPR_RESIZE_HPT_REQUIRED) {
> > > -            error_report(
> > > -                "h_client_architecture_support: Guest doesn't support HPT resizing, but resize-hpt=required");
> > > -            exit(1);
> > > -        }
> > > -
> > > -        if (spapr->htab_shift < maxshift) {
> > > -            /* Guest doesn't know about HPT resizing, so we
> > > -             * pre-emptively resize for the maximum permitted RAM.  At
> > > -             * the point this is called, nothing should have been
> > > -             * entered into the existing HPT */
> > > -            spapr_reallocate_hpt(spapr, maxshift, &error_fatal);
> > > -            push_sregs_to_kvm_pr(spapr);
> > > -        }
> > > -    }
> > > -
> > >      /* NOTE: there are actually a number of ov5 bits where input from the
> > >       * guest is always zero, and the platform/QEMU enables them independently
> > >       * of guest input. To model these properly we'd want some sort of mask,
> > > @@ -1806,6 +1777,12 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
> > >              error_report("Guest requested unavailable MMU mode (hash).");
> > >              exit(EXIT_FAILURE);
> > >          }
> > > +        if (spapr->resize_hpt == SPAPR_RESIZE_HPT_REQUIRED &&
> > > +            !spapr_ovec_test(ov5_guest, OV5_HPT_RESIZE)) {
> > > +            error_report(
> > > +                "h_client_architecture_support: Guest doesn't support HPT resizing, but resize-hpt=required");
> > > +            exit(1);
> > > +        }
> > >      }
> > >      spapr->cas_pre_isa3_guest = !spapr_ovec_test(ov1_guest, OV1_PPC_3_00);
> > >      spapr_ovec_cleanup(ov1_guest);
> > > @@ -1838,11 +1815,23 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
> > >          void *fdt;
> > >          SpaprDeviceTreeUpdateHeader hdr = { .version_id = 1 };
> > >  
> > > -        /* If spapr_machine_reset() did not set up a HPT but one is necessary
> > > -         * (because the guest isn't going to use radix) then set it up here. */
> > > -        if ((spapr->patb_entry & PATE1_GR) && !guest_radix) {
> > > -            /* legacy hash or new hash: */
> > > -            spapr_setup_hpt_and_vrma(spapr);
> > > +        if (!guest_radix) {
> > > +            /*
> > > +             * Either spapr_machine_reset() did not set up a HPT but one
> > > +             * is necessary (because the guest isn't going to use radix),
> > > +             * or the guest doesn't know about HPT resizing and we need to
> > > +             * pre-emptively resize for the maximum permitted RAM. Set it
> > > +             * up here. At the point this is called, nothing should have
> > > +             * been entered into the existing HPT.
> > > +             */
> > > +            if (spapr->patb_entry & PATE1_GR || spapr->htab_shift < maxshift) {
> > > +                /* legacy hash or new hash: */
> > > +                spapr_setup_hpt_and_vrma(spapr);
> > > +                push_sregs_to_kvm_pr(spapr);
> > > +            }
> > > +        } else {
> > > +            spapr_free_hpt(spapr);
> > > +            spapr_reset_patb_entry(spapr);
> > >          }
> > >  
> > >          if (fdt_bufsize < sizeof(hdr)) {
> > > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > > index 09110961a589..9d88b5596481 100644
> > > --- a/include/hw/ppc/spapr.h
> > > +++ b/include/hw/ppc/spapr.h
> > > @@ -919,4 +919,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
> > >  
> > >  void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
> > >  hwaddr spapr_get_rtas_addr(void);
> > > +void spapr_reset_patb_entry(SpaprMachineState *spapr);
> > >  #endif /* HW_SPAPR_H */
> > > 
> > 
> 



-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

next prev parent reply	other threads:[~2020-02-18 23:22 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-13 15:38 [PATCH] spapr: Rework hash<->radix transitions at CAS Greg Kurz
2020-02-13 15:56 ` [PATCH] spapr: Make spapr_reallocate_hpt() static Greg Kurz
2020-02-13 22:28 ` [PATCH] spapr: Rework hash<->radix transitions at CAS David Gibson
2020-02-14 18:19   ` Greg Kurz
2020-02-18 23:21     ` David Gibson [this message]
2020-02-24 11:18       ` Greg Kurz
2020-02-24 23:44         ` David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200218232105.GG1764@umbus.fritz.box \
    --to=david@gibson.dropbear.id.au \
    --cc=aik@ozlabs.ru \
    --cc=groug@kaod.org \
    --cc=lvivier@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).