From: David Gibson <david@gibson.dropbear.id.au>
To: "Cédric Le Goater" <clg@kaod.org>
Cc: Greg Kurz <groug@kaod.org>, Laurent Vivier <lvivier@redhat.com>,
	mdroth@linux.vnet.ibm.com, aik@ozlabs.ru, qemu-devel@nongnu.org,
	agraf@suse.de, qemu-ppc@nongnu.org,
	Bharata B Rao <bharata@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] [PULL 19/48] spapr: allocate the ICPState object from under sPAPRCPUCore
Date: Wed, 17 May 2017 16:37:52 +1000
Message-ID: <20170517063752.GI15596@umbus.fritz.box>
In-Reply-To: <d8354004-1a76-bc67-cd9c-55a2b95ae142@kaod.org>

On Wed, May 17, 2017 at 07:50:42AM +0200, Cédric Le Goater wrote:
> On 05/16/2017 06:10 PM, Greg Kurz wrote:
> > On Tue, 16 May 2017 17:18:27 +0200
> > Cédric Le Goater <clg@kaod.org> wrote:
> > 
> >> On 05/16/2017 02:55 PM, Laurent Vivier wrote:
> >>> On 16/05/2017 14:50, Cédric Le Goater wrote:  
> >>>> On 05/16/2017 02:03 PM, Laurent Vivier wrote:  
> >>>>> On 26/04/2017 09:00, David Gibson wrote:  
> >>>>>> From: Cédric Le Goater <clg@kaod.org>
> >>>>>>
> >>>>>> Today, all the ICPs are created before the CPUs, stored in an array
> >>>>>> under the sPAPR machine and linked to the CPU when the core threads
> >>>>>> are realized. This modeling brings some complexity when a lookup in
> >>>>>> the array is required and it can be simplified by allocating the ICPs
> >>>>>> when the CPUs are.
> >>>>>>
> >>>>>> This is the purpose of this proposal which introduces a new 'icp_type'
> >>>>>> field under the machine and creates the ICP objects of the right type
> >>>>>> (KVM or not) before the PowerPCCPU object are.
> >>>>>>
> >>>>>> This change allows more cleanups : the removal of the icps array under
> >>>>>> the sPAPR machine and the removal of the xics_get_cpu_index_by_dt_id()
> >>>>>> helper.
> >>>>>>
> >>>>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >>>>>> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> >>>>>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> >>>>>> ---
> >>>>>>  hw/intc/xics.c          | 11 -----------
> >>>>>>  hw/ppc/spapr.c          | 47 ++++++++++++++---------------------------------
> >>>>>>  hw/ppc/spapr_cpu_core.c | 18 ++++++++++++++----
> >>>>>>  include/hw/ppc/spapr.h  |  2 +-
> >>>>>>  include/hw/ppc/xics.h   |  2 --
> >>>>>>  5 files changed, 29 insertions(+), 51 deletions(-)
> >>>>>>  
> >>>>>
> >>>>> This commit breaks CPU re-hotplugging with KVM
> >>>>>
> >>>>> the sequence "device_add, device_del, device_add" brings to the
> >>>>> following error message:
> >>>>>
> >>>>>     Unable to connect CPUx to kernel XICS: Device or resource busy
> >>>>>
> >>>>> It comes from icp_kvm_cpu_setup():
> >>>>>
> >>>>> ...
> >>>>>     ret = kvm_vcpu_enable_cap(cs, KVM_CAP_IRQ_XICS, 0, kernel_xics_fd,
> >>>>>                               kvm_arch_vcpu_id(cs));
> >>>>>     if (ret < 0) {
> >>>>>         error_report("Unable to connect CPU%ld to kernel XICS: %s",
> >>>>>                      kvm_arch_vcpu_id(cs), strerror(errno));
> >>>>>         exit(1);
> >>>>>     }
> >>>>> ..
> >>>>>
> >>>>> It should be protected by cap_irq_xics_enabled:
> >>>>>
> >>>>> ...
> >>>>>     /*
> >>>>>      * If we are reusing a parked vCPU fd corresponding to the CPU
> >>>>>      * which was hot-removed earlier we don't have to renable
> >>>>>      * KVM_CAP_IRQ_XICS capability again.
> >>>>>      */
> >>>>>     if (icp->cap_irq_xics_enabled) {
> >>>>>         return;
> >>>>>     }
> >>>>>
> >>>>> ...
> >>>>>     ret = kvm_vcpu_enable_cap(...);
> >>>>> ...
> >>>>>     icp->cap_irq_xics_enabled = true;
> >>>>> ...
> >>>>>
> >>>>> But since this commit, "icp" is a new object on each call:
> >>>>>
> >>>>> spapr_cpu_core_realize_child()
> >>>>> ...
> >>>>>     obj = object_new(spapr->icp_type);
> >>>>> ...
> >>>>>     xics_cpu_setup(XICS_FABRIC(spapr), cpu, ICP(obj));
> >>>>>     ...
> >>>>>             icpc->cpu_setup(icp, cpu); -> icp_kvm_cpu_setup()
> >>>>>     ...
> >>>>> ...
> >>>>>
> >>>>> and "cap_irq_xics_enabled" is reinitialized.
> >>>>>
> >>>>> Any idea how to fix that?  
> >>>>
> >>>> it seems that a cleanup is not done in the kernel. We are missing
> >>>> a way to call kvmppc_xics_free_icp() from QEMU. Today the only
> >>>> way is to destroy the vcpu.   
> >>>
> >>> The commit introducing this hack, for reference:
> >>>
> >>> commit a45863bda90daa8ec39e5a312b9734fd4665b016
> >>> Author: Bharata B Rao <bharata@linux.vnet.ibm.com>
> >>> Date:   Thu Jul 2 16:23:20 2015 +1000
> >>>
> >>>     xics_kvm: Don't enable KVM_CAP_IRQ_XICS if already enabled
> >>>     
> >>>     When supporting CPU hot removal by parking the vCPU fd and reusing
> >>>     it during hotplug again, there can be cases where we try to reenable
> >>>     KVM_CAP_IRQ_XICS CAP for the vCPU for which it was already enabled.
> >>>     Introduce a boolean member in ICPState to track this and don't
> >>>     reenable the CAP if it was already enabled earlier.
> >>>     
> >>>     Re-enabling this CAP should ideally work, but currently it results in
> >>>     kernel trying to create and associate ICP with this vCPU and that
> >>>     fails since there is already an ICP associated with it. Hence this
> >>>     patch is needed to work around this problem in the kernel.
> >>>     
> >>>     This change allows CPU hot removal to work for sPAPR.
> >>>     
> >>>     Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> >>>     Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> >>>     Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> >>>     Signed-off-by: Alexander Graf <agraf@suse.de>  
> >>
> >> OK. 
> >>
> >> Greg is looking at re-adding the ICPState array because of a 
> >> migration issue with older machines. We might need to do so 
> >> unconditionally ...
> >>
> > 
> > That would be a pity to carry on with the pre-allocated ICPStates for
> > new machine types just because of that... What about keeping track
> > of all the cap_irq_xics_enabled flags in a separate max_cpus sized
> > static array ?
> 
> Could we use 'cpu->unplug' instead ?

I've only half followed this discussion, but fwiw I prefer the idea of
"parking" in-kernel ICP objects, similarly to the way we do for
removed VCPUs, rather than going back to keeping ICP objects around
indefinitely and unconditionally.
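
To make the parking idea concrete, here is a rough, standalone C sketch.
It is not QEMU code: every name in it (SketchICP, sketch_icp_setup(),
sketch_icp_park(), fake_enable_xics_cap(), MAX_VCPUS) is made up for
illustration, and the stub only imitates the kernel refusing a second
KVM_CAP_IRQ_XICS enable on the same vCPU. The assumption is that QEMU
records, per vCPU id, that an in-kernel ICP has outlived its QEMU object,
and consults that record when a fresh ICP object is set up.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAX_VCPUS 8 /* hypothetical limit, for this sketch only */

/* Stand-in for kernel state: one in-kernel ICP per vCPU id. */
static bool kernel_icp_present[MAX_VCPUS];

/*
 * Stand-in for enabling KVM_CAP_IRQ_XICS on a vCPU. Like the behaviour
 * reported in this thread, it fails with -EBUSY when the vCPU already
 * has an in-kernel ICP attached.
 */
static int fake_enable_xics_cap(int vcpu_id)
{
    assert(vcpu_id >= 0 && vcpu_id < MAX_VCPUS);
    if (kernel_icp_present[vcpu_id]) {
        return -EBUSY;
    }
    kernel_icp_present[vcpu_id] = true;
    return 0;
}

/* QEMU-side record of in-kernel ICPs whose QEMU object was destroyed. */
static bool icp_parked[MAX_VCPUS];

/* Hypothetical, cut-down ICP object. */
typedef struct SketchICP {
    int vcpu_id;
    bool cap_irq_xics_enabled;
} SketchICP;

/* Called when a fresh ICP object is set up for a (re)plugged CPU. */
static int sketch_icp_setup(SketchICP *icp, int vcpu_id)
{
    int ret;

    assert(vcpu_id >= 0 && vcpu_id < MAX_VCPUS);
    icp->vcpu_id = vcpu_id;
    icp->cap_irq_xics_enabled = false;

    if (icp_parked[vcpu_id]) {
        /* Reuse the parked in-kernel ICP; do not enable the cap again. */
        icp_parked[vcpu_id] = false;
        icp->cap_irq_xics_enabled = true;
        return 0;
    }

    ret = fake_enable_xics_cap(vcpu_id);
    if (ret == 0) {
        icp->cap_irq_xics_enabled = true;
    }
    return ret;
}

/* Called on hot-unplug, just before the ICP object is freed. */
static void sketch_icp_park(SketchICP *icp)
{
    if (icp->cap_irq_xics_enabled) {
        icp_parked[icp->vcpu_id] = true;
        icp->cap_irq_xics_enabled = false;
    }
}
```

With something along these lines, "device_add, device_del, device_add"
would call setup, park, then setup again on a new object, and the second
setup would pick up the parked entry rather than hitting the busy error.
Where such a record should really live, and how it interacts with
migration, is left open here.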

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson