From: Michael Roth <mdroth@linux.vnet.ibm.com>
To: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: qemu-devel@nongnu.org, david@gibson.dropbear.id.au,
	nfont@linux.vnet.ibm.com, aik@ozlabs.ru, qemu-ppc@nongnu.org
Subject: Re: [Qemu-devel] [PATCH v3] spapr: Ensure all LMBs are represented in ibm,dynamic-memory
Date: Wed, 08 Jun 2016 11:09:11 -0500
Message-ID: <20160608160911.665.29938@loki>
In-Reply-To: <20160608155002.GD8861@in.ibm.com>

Quoting Bharata B Rao (2016-06-08 10:50:03)
> On Wed, Jun 08, 2016 at 10:05:12AM -0500, Michael Roth wrote:
> > Quoting Bharata B Rao (2016-06-07 21:35:54)
> > > On Tue, Jun 07, 2016 at 06:37:28PM -0500, Michael Roth wrote:
> > > > Quoting Bharata B Rao (2016-06-07 00:19:03)
> > > > > Memory hotplug can fail for some combinations of RAM and maxmem when
> > > > > DDW is enabled in the presence of devices like nec-usb-xhci. DDW depends
> > > > > on maximum addressable memory returned by guest and this value is currently
> > > > > being calculated wrongly by the guest kernel routine memory_hotplug_max().
> > > > > While there is an attempt to fix the guest kernel, this patch works
> > > > > around the problem within QEMU itself.
> > > > > 
> > > > > memory_hotplug_max() routine in the guest kernel arrives at max
> > > > > addressable memory by multiplying lmb-size with the lmb-count obtained
> > > > > from ibm,dynamic-memory property. There are two assumptions here:
> > > > > 
> > > > > - All LMBs are part of ibm,dynamic-memory: This is not true for PowerKVM
> > > > >   where only hot-pluggable LMBs are present in this property.
> > > > > - The memory area comprising RAM and hotplug region is contiguous: This
> > > > >   needn't always be true for PowerKVM as there can be a gap between
> > > > >   boot time RAM and hotplug region.
> > > > > 
> > > > > To work around this guest kernel bug, ensure that ibm,dynamic-memory
> > > > > has information about all the LMBs (RMA, boot-time LMBs, future
> > > > > hotpluggable LMBs, and dummy LMBs to cover the gap between RAM and
> > > > > hotpluggable region).
> > > > > 
> > > > > RMA is represented separately by memory@0 node. Hence mark RMA LMBs
> > > > > and also the LMBs for the gap b/n RAM and hotpluggable region as
> > > > > reserved so that these LMBs are not recounted/counted by guest.
> > > > > 
> > > > > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > > > > ---
> > > > > Changes in v3:
> > > > > 
> > > > > - Not touching spapr_create_lmb_dr_connectors() so that we continue
> > > > >   to have DRC objects for only hotpluggable LMBs.
> > > > > - Simplified the logic of creating dynamic-memory node based on comments
> > > > >   from Michael Roth and David Gibson.
> > > > > 
> > > > > v2: https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg01316.html
> > > > > 
> > > > >  hw/ppc/spapr.c         | 51 ++++++++++++++++++++++++++++++++------------------
> > > > >  include/hw/ppc/spapr.h |  5 +++--
> > > > >  2 files changed, 36 insertions(+), 20 deletions(-)
> > > > > 
> > > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > > > index 0636642..9d1d43d 100644
> > > > > --- a/hw/ppc/spapr.c
> > > > > +++ b/hw/ppc/spapr.c
> > > > > @@ -762,14 +762,17 @@ static int spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt)
> > > > >      int ret, i, offset;
> > > > >      uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
> > > > >      uint32_t prop_lmb_size[] = {0, cpu_to_be32(lmb_size)};
> > > > > -    uint32_t nr_lmbs = (machine->maxram_size - machine->ram_size)/lmb_size;
> > > > > +    uint32_t hotplug_lmb_start = spapr->hotplug_memory.base / lmb_size;
> > > > > +    uint32_t nr_lmbs = (spapr->hotplug_memory.base +
> > > > > +                       memory_region_size(&spapr->hotplug_memory.mr)) /
> > > > > +                       lmb_size;
> > > > >      uint32_t *int_buf, *cur_index, buf_len;
> > > > >      int nr_nodes = nb_numa_nodes ? nb_numa_nodes : 1;
> > > > > 
> > > > >      /*
> > > > > -     * Don't create the node if there are no DR LMBs.
> > > > > +     * Don't create the node if there is no hotpluggable memory
> > > > >       */
> > > > > -    if (!nr_lmbs) {
> > > > > +    if (machine->ram_size == machine->maxram_size) {
> > > > >          return 0;
> > > > >      }
> > > > > 
> > > > > @@ -805,24 +808,36 @@ static int spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt)
> > > > >      for (i = 0; i < nr_lmbs; i++) {
> > > > >          sPAPRDRConnector *drc;
> > > > >          sPAPRDRConnectorClass *drck;
> > > > 
> > > > Since these ^ are only used if (i >= hotplug_lmb_start), it might be
> > > > clearer to move them there now.
> > > 
> > > Yes.
> > > 
> > > > 
> > > > > -        uint64_t addr = i * lmb_size + spapr->hotplug_memory.base;;
> > > > > +        uint64_t addr = i * lmb_size;
> > > > >          uint32_t *dynamic_memory = cur_index;
> > > > > 
> > > > > -        drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> > > > > -                                       addr/lmb_size);
> > > > > -        g_assert(drc);
> > > > > -        drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > > > > -
> > > > > -        dynamic_memory[0] = cpu_to_be32(addr >> 32);
> > > > > -        dynamic_memory[1] = cpu_to_be32(addr & 0xffffffff);
> > > > > -        dynamic_memory[2] = cpu_to_be32(drck->get_index(drc));
> > > > > -        dynamic_memory[3] = cpu_to_be32(0); /* reserved */
> > > > > -        dynamic_memory[4] = cpu_to_be32(numa_get_node(addr, NULL));
> > > > > -        if (addr < machine->ram_size ||
> > > > > -                    memory_region_present(get_system_memory(), addr)) {
> > > > > -            dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_ASSIGNED);
> > > > > +        if (i >= hotplug_lmb_start) {
> > > > > +            drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> > > > > +                                           addr / lmb_size);
> > > > 
> > > > Could just be i
> > > 
> > > Hmm I thought I got all such occurrences covered :(
> > > 
> > > > 
> > > > > +            g_assert(drc);
> > > > > +            drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > > > > +
> > > > > +            dynamic_memory[0] = cpu_to_be32(addr >> 32);
> > > > > +            dynamic_memory[1] = cpu_to_be32(addr & 0xffffffff);
> > > > > +            dynamic_memory[2] = cpu_to_be32(drck->get_index(drc));
> > > > > +            dynamic_memory[3] = cpu_to_be32(0); /* reserved */
> > > > > +            dynamic_memory[4] = cpu_to_be32(numa_get_node(addr, NULL));
> > > > > +            if (memory_region_present(get_system_memory(), addr)) {
> > > > > +                dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_ASSIGNED);
> > > > > +            } else {
> > > > > +                dynamic_memory[5] = cpu_to_be32(0);
> > > > > +            }
> > > > >          } else {
> > > > > -            dynamic_memory[5] = cpu_to_be32(0);
> > > > > +            /*
> > > > > +             * LMB information for RMA, boot time RAM and gap b/n RAM and
> > > > > +             * hotplug memory region -- all these are marked as reserved.
> > > > > +             */
> > > > > +            dynamic_memory[0] = cpu_to_be32(0);
> > > > > +            dynamic_memory[1] = cpu_to_be32(0);
> > > > 
> > > > Are we sure we shouldn't still encode the addr here?
> > > 
> > > Since kernel won't look at reserved LMBs, I thought it should be fine.
> > > We could populate the addr, but we don't have DRC for them and hence
> > > no DRC index, so anyway we will not have full information.
> > 
> > The 'DRC invalid' bit seems to suggest it's a perfectly valid scenario
> > to have LMBs with no backing drc, so I would think the other fields
> > should still be filled out appropriately, even if it might not get
> > used by guest either way.
> 
> addr can be populated, yes, but since there are no backing DRCs for
> RMA and rest of the boot-time RAM, we can't be populating DRC index.
> 
> 'DRC Invalid' in Table 248 of LoPAPR reads: If b '0/1', the DRC field of
> "ibm,dynamic-memory" property is valid/invalid.
> 
> In ibm,dynamic-memory, anything sounding close to "DRC field" is DRC index,
> so it should be ok to have 0 for DRC index for such LMBs I suppose.

Agreed, for the drc index case I think leaving it 0/unset is the only
sensible approach, but if there's a flag to denote this situation we
should probably make use of it.
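
To make that concrete, here is a minimal sketch of how an entry for an
LMB without a backing DRC could be filled in: real address, DRC index
left at 0, and the flag set. It is not the patch itself. to_be32() is a
local stand-in for QEMU's cpu_to_be32(), fill_unbacked_lmb() is a
hypothetical helper name, and the 0x00000080 value for the 'reserved'
flag is an assumption; only the 0x00000020 'DRC invalid' bit comes from
LoPAPR Table 248 as cited in this thread.

```c
#include <stdint.h>
#include <string.h>

/* 'DRC invalid' is the 0x00000020 bit cited from LoPAPR Table 248.
 * The value used for 'reserved' here is an assumption. */
#define LMB_FLAGS_DRC_INVALID 0x00000020u
#define LMB_FLAGS_RESERVED    0x00000080u

/* Stand-in for cpu_to_be32(): the returned value's in-memory bytes are
 * big-endian regardless of host byte order. */
static uint32_t to_be32(uint32_t v)
{
    uint8_t b[4] = {
        (uint8_t)(v >> 24), (uint8_t)(v >> 16), (uint8_t)(v >> 8), (uint8_t)v
    };
    uint32_t out;

    memcpy(&out, b, sizeof(out));
    return out;
}

/* Hypothetical helper: fill one 6-cell ibm,dynamic-memory entry for an
 * LMB that has no backing DRC (RMA, boot-time RAM, or gap padding). */
static void fill_unbacked_lmb(uint32_t cells[6], uint64_t addr)
{
    cells[0] = to_be32((uint32_t)(addr >> 32));         /* address, high word */
    cells[1] = to_be32((uint32_t)(addr & 0xffffffffu)); /* address, low word */
    cells[2] = to_be32(0);                              /* no DRC, index unset */
    cells[3] = to_be32(0);                              /* reserved field */
    cells[4] = to_be32(0xffffffffu);                    /* -1, as in the patch */
    cells[5] = to_be32(LMB_FLAGS_RESERVED | LMB_FLAGS_DRC_INVALID);
}
```

Whether gap-padding LMBs should carry their real address is a separate
question; the sketch simply encodes whatever addr the caller passes.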

> 
> > 
> > I think an argument could be made for not setting addr for LMBs used
> > to cover the RAM->hotplug gap, since that's padding rather than real
> > memory. But effectively re-using addr 0 seems like more of a liability
> > than a safeguard. If 'reserved' flag is doing what we expect, then it's
> > probably best not to deviate from normal values any more than necessary,
> > IMO.
> 
> Fine.
> 
> > 
> > > 
> > > > 
> > > > > +            dynamic_memory[2] = cpu_to_be32(0);
> > > > > +            dynamic_memory[3] = cpu_to_be32(0); /* reserved */
> > > > > +            dynamic_memory[4] = cpu_to_be32(-1);
> > > > > +            dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_RESERVED);
> > > > 
> > > > LoPAPR Table 248 defines a "DRC invalid" bit at 0x00000020, which I
> > > > think is what we'll want for cases where there's no backing DRC.
> > > 
> > > Seems to work. I started with reserved based on Nathan's original
> > > suggestion.
> > > 
> > > So Nathan, which would be more appropriate here from kernel point of view ?
> > > Reserved or 'DRC Invalid' ?
> > 
> > Wouldn't we want an OR of both?
> 
> To be fully sure that these LMBs would never be enumerated by the guest ?

I think the 'reserved' flag covers that aspect. I'm really only
suggesting that the 'drc invalid' flag also be included, for the unlikely
case that some drmgr-like tool out there would otherwise attempt to
interpret the DRC index field as a real index.
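
As an illustration of the consumer side, a rough sketch of how a tool
might decode one entry and decide whether the DRC index is meaningful.
This is not taken from drmgr or the kernel. struct lmb_entry and the
helper names are hypothetical, the 24-byte layout follows the six cells
written by the patch, and the 0x00000080 'reserved' value is an
assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define LMB_FLAGS_DRC_INVALID 0x00000020u /* LoPAPR Table 248, per this thread */
#define LMB_FLAGS_RESERVED    0x00000080u /* assumed value */

/* Hypothetical decoded view of one ibm,dynamic-memory entry. */
struct lmb_entry {
    uint64_t addr;
    uint32_t drc_index;
    uint32_t flags;
};

/* Read one big-endian 32-bit cell from raw property bytes. */
static uint32_t be32_load(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode the 6-cell entry: addr hi, addr lo, DRC index, reserved,
 * associativity index, flags. */
static struct lmb_entry lmb_decode(const uint8_t raw[24])
{
    struct lmb_entry e;

    e.addr = ((uint64_t)be32_load(raw) << 32) | be32_load(raw + 4);
    e.drc_index = be32_load(raw + 8);
    e.flags = be32_load(raw + 20);
    return e;
}

/* A reserved LMB should not be enumerated as usable memory. */
static bool lmb_is_reserved(const struct lmb_entry *e)
{
    return (e->flags & LMB_FLAGS_RESERVED) != 0;
}

/* Only trust drc_index when the 'DRC invalid' bit is clear. */
static bool lmb_drc_index_valid(const struct lmb_entry *e)
{
    return (e->flags & LMB_FLAGS_DRC_INVALID) == 0;
}
```

With both bits set, a tool that checks only one of them still ends up
ignoring the entry's DRC index of 0.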

> 
> Regards,
> Bharata.
> 
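
For reference, the LMB counting described in the quoted commit message
can be sketched as below. This is only an illustration of the
arithmetic, not the QEMU or kernel code. The 256 MiB LMB size is an
assumption about SPAPR_MEMORY_BLOCK_SIZE, the function names are
hypothetical, and guest_hotplug_max() merely mirrors the "lmb-size times
lmb-count" description of memory_hotplug_max().

```c
#include <stdint.h>

/* Assumed LMB size (256 MiB). */
#define LMB_SIZE (256ULL << 20)

/* Old count: only the hotpluggable LMBs appear in the property. */
static uint32_t old_nr_lmbs(uint64_t ram_size, uint64_t maxram_size)
{
    return (uint32_t)((maxram_size - ram_size) / LMB_SIZE);
}

/* New count from the patch: every LMB from address 0 up to the end of
 * the hotplug region, including RMA, boot-time RAM and any gap. */
static uint32_t new_nr_lmbs(uint64_t hotplug_base, uint64_t hotplug_size)
{
    return (uint32_t)((hotplug_base + hotplug_size) / LMB_SIZE);
}

/* Index of the first LMB that has a backing DRC. */
static uint32_t hotplug_lmb_start(uint64_t hotplug_base)
{
    return (uint32_t)(hotplug_base / LMB_SIZE);
}

/* What the guest derives: LMB size multiplied by the LMB count. */
static uint64_t guest_hotplug_max(uint32_t nr_lmbs)
{
    return (uint64_t)nr_lmbs * LMB_SIZE;
}
```

With hypothetical numbers (4 GiB RAM, 8 GiB maxmem, hotplug region
placed at 8 GiB), the old count gives 16 LMBs and a guest-side maximum
of 4 GiB, well below the 12 GiB end of the hotplug region. The new
count gives 48 LMBs, the first 32 of them without DRCs, and a maximum of
12 GiB.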
