qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kurz <groug@kaod.org>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: mdroth@linux.vnet.ibm.com, danielhb@linux.vnet.ibm.com,
	qemu-ppc@nongnu.org, qemu-devel@nongnu.org, lvivier@redhat.com,
	sjitindarsingh@gmail.com, bharata@linux.vnet.ibm.com,
	Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] [PATCHv2 2/8] spapr: Remove 'awaiting_allocation' DRC flag
Date: Wed, 12 Jul 2017 19:04:49 +0200	[thread overview]
Message-ID: <20170712190449.3f260b2a@bahia.lan> (raw)
In-Reply-To: <20170712132742.368db8d8@bahia.lan>

[-- Attachment #1: Type: text/plain, Size: 6992 bytes --]

On Wed, 12 Jul 2017 13:27:42 +0200
Greg Kurz <groug@kaod.org> wrote:

> On Wed, 12 Jul 2017 21:05:45 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Wed, Jul 12, 2017 at 12:00:01PM +0200, Greg Kurz wrote:  
> > > On Wed, 12 Jul 2017 15:53:11 +1000
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >     
> > > > The awaiting_allocation flag in the DRC was introduced by aab9913
> > > > "spapr_drc: Prevent detach racing against attach for CPU DR", allegedly to
> > > > prevent a guest crash on racing attach and detach.  Except.. information
> > > > from the BZ actually suggests a qemu crash, not a guest crash.  And there    
> > > 
> > > Can you share an url to the BZ ?    
> > 
> > Alas, no.  I have that information second hand from.. Bharata,
> > think.. and I believe the BZ is an IBM internal one.
> >   
> 
> I did some digging and I could find it in our internal bugzilla. And indeed,
> it mentions QEMU crashing because of a SEGV...
> 
> > >     
> > > > shouldn't be a problem here anyway: if the guest has already moved the DRC
> > > > away from UNUSABLE state, the detach would already be deferred, and if it
> > > > hadn't it should be safe to detach it (the guest should fail gracefully
> > > > when it attempts to change the allocation state).
> > > > 
> > > > I think this was probably just a bandaid for some other problem in the
> > > > state management.  So, remove awaiting_allocation and associated code.
> > > >   
> 
> ... so I tend to agree this was a bandaid.
> 
> Reviewed-by: Greg Kurz <groug@kaod.org>
> 
> I'll re-try the scenario from the BZ with this patch applied to see if we
> hit the crash again.
> 

The crashing scenario was basically start a guest using libvirt, to add
a bunch of vCPUs and then remove them.

I've been running the addition/removal in a loop (using 'virsh setvcpus') and
no QEMU crash is observed, with or without this patch.

But, without this patch I would quickly (ie, 8 invocations of setvcpus) reach
a situation where:
- 'virsh setvcpus' would time out and fail forever from now on
- 'lscpu' in the guest and 'virsh vcpuinfo' show different number of vCPUs

With this patch applied, I could run the loop much much longer, and only
observe a 'virsh setvcpus' timeout from time to time. But at some point,
'virsh setvcpus' would fail consecutively 3 or 4 times with:

error: internal error: qemu didn't report thread id for vcpu '8'

and then fail with the following error forever:

error: unsupported configuration: failed to find appropriate hotpluggable
 vcpus to reach the desired target vcpu count

'lscpu' and 'virsh vcpuinfo' also produce inconsistent output.

I could continue to hotplug/unplug vCPUs with the QEMU monitor though,
so I guess there's something wrong in libvirt. I've asked Shiva (on Cc)
to help on the investigation.

Anyway, this patch seems to improve things and doesn't cause QEMU
to crash. I thus maintain my R-b and you can add:

Tested-by: Greg Kurz <groug@kaod.org>

> > > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > > ---
> > > >  hw/ppc/spapr_drc.c         | 25 +++----------------------
> > > >  include/hw/ppc/spapr_drc.h |  1 -
> > > >  2 files changed, 3 insertions(+), 23 deletions(-)
> > > > 
> > > > diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> > > > index 9b07f80..89ba3d6 100644
> > > > --- a/hw/ppc/spapr_drc.c
> > > > +++ b/hw/ppc/spapr_drc.c
> > > > @@ -170,19 +170,13 @@ static uint32_t drc_set_usable(sPAPRDRConnector *drc)
> > > >      if (!drc->dev) {
> > > >          return RTAS_OUT_NO_SUCH_INDICATOR;
> > > >      }
> > > > -    if (drc->awaiting_release && drc->awaiting_allocation) {
> > > > -        /* kernel is acknowledging a previous hotplug event
> > > > -         * while we are already removing it.
> > > > -         * it's safe to ignore awaiting_allocation here since we know the
> > > > -         * situation is predicated on the guest either already having done
> > > > -         * so (boot-time hotplug), or never being able to acquire in the
> > > > -         * first place (hotplug followed by immediate unplug).
> > > > -         */
> > > > +    if (drc->awaiting_release) {
> > > > +        /* Don't allow the guest to move a device away from UNUSABLE
> > > > +         * state when we want to unplug it */
> > > >          return RTAS_OUT_NO_SUCH_INDICATOR;
> > > >      }
> > > >  
> > > >      drc->allocation_state = SPAPR_DR_ALLOCATION_STATE_USABLE;
> > > > -    drc->awaiting_allocation = false;
> > > >  
> > > >      return RTAS_OUT_SUCCESS;
> > > >  }
> > > > @@ -357,10 +351,6 @@ void spapr_drc_attach(sPAPRDRConnector *drc, DeviceState *d, void *fdt,
> > > >      drc->fdt = fdt;
> > > >      drc->fdt_start_offset = fdt_start_offset;
> > > >  
> > > > -    if (spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PCI) {
> > > > -        drc->awaiting_allocation = true;
> > > > -    }
> > > > -
> > > >      object_property_add_link(OBJECT(drc), "device",
> > > >                               object_get_typename(OBJECT(drc->dev)),
> > > >                               (Object **)(&drc->dev),
> > > > @@ -398,12 +388,6 @@ void spapr_drc_detach(sPAPRDRConnector *drc, DeviceState *d, Error **errp)
> > > >          return;
> > > >      }
> > > >  
> > > > -    if (drc->awaiting_allocation) {
> > > > -        drc->awaiting_release = true;
> > > > -        trace_spapr_drc_awaiting_allocation(spapr_drc_index(drc));
> > > > -        return;
> > > > -    }
> > > > -
> > > >      spapr_drc_release(drc);
> > > >  }
> > > >  
> > > > @@ -426,8 +410,6 @@ void spapr_drc_reset(sPAPRDRConnector *drc)
> > > >          spapr_drc_release(drc);
> > > >      }
> > > >  
> > > > -    drc->awaiting_allocation = false;
> > > > -
> > > >      if (drc->dev) {
> > > >          /* A device present at reset is coldplugged */
> > > >          drc->isolation_state = SPAPR_DR_ISOLATION_STATE_UNISOLATED;
> > > > @@ -493,7 +475,6 @@ static const VMStateDescription vmstate_spapr_drc = {
> > > >          VMSTATE_UINT32(dr_indicator, sPAPRDRConnector),
> > > >          VMSTATE_BOOL(configured, sPAPRDRConnector),
> > > >          VMSTATE_BOOL(awaiting_release, sPAPRDRConnector),
> > > > -        VMSTATE_BOOL(awaiting_allocation, sPAPRDRConnector),
> > > >          VMSTATE_END_OF_LIST()
> > > >      }
> > > >  };
> > > > diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> > > > index 715016b..18a196e 100644
> > > > --- a/include/hw/ppc/spapr_drc.h
> > > > +++ b/include/hw/ppc/spapr_drc.h
> > > > @@ -200,7 +200,6 @@ typedef struct sPAPRDRConnector {
> > > >      sPAPRConfigureConnectorState *ccs;
> > > >  
> > > >      bool awaiting_release;
> > > > -    bool awaiting_allocation;
> > > >  
> > > >      /* device pointer, via link property */
> > > >      DeviceState *dev;    
> > >     
> > 
> > 
> >   
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

  reply	other threads:[~2017-07-12 17:05 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-12  5:53 [Qemu-devel] [PATCHv2 0/8] spapr: DRC cleanups (part VI) David Gibson
2017-07-12  5:53 ` [Qemu-devel] [PATCHv2 1/8] spapr: Treat devices added before inbound migration as coldplugged David Gibson
2017-07-12  8:41   ` Greg Kurz
2017-07-12  5:53 ` [Qemu-devel] [PATCHv2 2/8] spapr: Remove 'awaiting_allocation' DRC flag David Gibson
2017-07-12  9:38   ` Laurent Vivier
2017-07-12 10:00   ` Greg Kurz
2017-07-12 11:05     ` David Gibson
2017-07-12 11:27       ` Greg Kurz
2017-07-12 17:04         ` Greg Kurz [this message]
2017-07-12  5:53 ` [Qemu-devel] [PATCHv2 3/8] spapr: Simplify unplug path David Gibson
2017-07-12 10:04   ` Greg Kurz
2017-07-12 10:31     ` Greg Kurz
2017-07-13  0:30       ` David Gibson
2017-07-12  5:53 ` [Qemu-devel] [PATCHv2 4/8] spapr: Refactor spapr_drc_detach() David Gibson
2017-07-12 11:47   ` Greg Kurz
2017-07-13  0:53     ` David Gibson
2017-07-12  5:53 ` [Qemu-devel] [PATCHv2 5/8] spapr: Cleanups relating to DRC awaiting_release field David Gibson
2017-07-12  5:53 ` [Qemu-devel] [PATCHv2 6/8] spapr: Consolidate DRC state variables David Gibson
2017-07-12 17:36   ` [Qemu-devel] [Qemu-ppc] " Daniel Henrique Barboza
2017-07-12  5:53 ` [Qemu-devel] [PATCHv2 7/8] spapr: Remove sPAPRConfigureConnectorState sub-structure David Gibson
2017-07-12 17:40   ` [Qemu-devel] [Qemu-ppc] " Daniel Henrique Barboza
2017-07-12  5:53 ` [Qemu-devel] [PATCHv2 8/8] spapr: Implement DR-indicator for physical DRCs only David Gibson
2017-07-12 17:44   ` [Qemu-devel] [Qemu-ppc] " Daniel Henrique Barboza
2017-07-12 13:48 ` [Qemu-devel] [Qemu-ppc] [PATCHv2 0/8] spapr: DRC cleanups (part VI) Daniel Henrique Barboza
2017-07-13  0:57   ` David Gibson
2017-07-13 10:13     ` Daniel Henrique Barboza
2017-07-14  6:53       ` David Gibson
2017-07-14 13:50         ` Daniel Henrique Barboza
2017-07-15  2:42           ` David Gibson
2017-07-13  1:09 ` [Qemu-devel] " David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170712190449.3f260b2a@bahia.lan \
    --to=groug@kaod.org \
    --cc=bharata@linux.vnet.ibm.com \
    --cc=danielhb@linux.vnet.ibm.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=lvivier@redhat.com \
    --cc=mdroth@linux.vnet.ibm.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    --cc=sbhat@linux.vnet.ibm.com \
    --cc=sjitindarsingh@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).