From: Greg Kurz <groug@kaod.org>
To: "Cédric Le Goater" <clg@kaod.org>
Cc: Laurent Vivier <lvivier@redhat.com>,
	David Gibson <david@gibson.dropbear.id.au>,
	mdroth@linux.vnet.ibm.com, aik@ozlabs.ru, qemu-devel@nongnu.org,
	agraf@suse.de, qemu-ppc@nongnu.org
Subject: Re: [Qemu-devel] [PULL 19/48] spapr: allocate the ICPState object from under sPAPRCPUCore
Date: Tue, 16 May 2017 18:10:04 +0200
Message-ID: <20170516181004.624cf441@bahia.lan>
In-Reply-To: <45db5bad-e1f3-d2d0-7014-878391638f6d@kaod.org>

On Tue, 16 May 2017 17:18:27 +0200
Cédric Le Goater <clg@kaod.org> wrote:

> On 05/16/2017 02:55 PM, Laurent Vivier wrote:
> > On 16/05/2017 14:50, Cédric Le Goater wrote:  
> >> On 05/16/2017 02:03 PM, Laurent Vivier wrote:  
> >>> On 26/04/2017 09:00, David Gibson wrote:  
> >>>> From: Cédric Le Goater <clg@kaod.org>
> >>>>
> >>>> Today, all the ICPs are created before the CPUs, stored in an array
> >>>> under the sPAPR machine and linked to the CPU when the core threads
> >>>> are realized. This modeling brings some complexity when a lookup in
> >>>> the array is required and it can be simplified by allocating the ICPs
> >>>> when the CPUs are.
> >>>>
> >>>> This is the purpose of this proposal, which introduces a new 'icp_type'
> >>>> field under the machine and creates the ICP objects of the right type
> >>>> (KVM or not) before the PowerPCCPU objects are.
> >>>>
> >>>> This change allows more cleanups: the removal of the icps array under
> >>>> the sPAPR machine and the removal of the xics_get_cpu_index_by_dt_id()
> >>>> helper.
> >>>>
> >>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >>>> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> >>>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> >>>> ---
> >>>>  hw/intc/xics.c          | 11 -----------
> >>>>  hw/ppc/spapr.c          | 47 ++++++++++++++---------------------------------
> >>>>  hw/ppc/spapr_cpu_core.c | 18 ++++++++++++++----
> >>>>  include/hw/ppc/spapr.h  |  2 +-
> >>>>  include/hw/ppc/xics.h   |  2 --
> >>>>  5 files changed, 29 insertions(+), 51 deletions(-)
> >>>>  
> >>>
> >>> This commit breaks CPU re-hotplugging with KVM.
> >>>
> >>> The sequence "device_add, device_del, device_add" leads to the
> >>> following error message:
> >>>
> >>>     Unable to connect CPUx to kernel XICS: Device or resource busy
> >>>
> >>> It comes from icp_kvm_cpu_setup():
> >>>
> >>> ...
> >>>     ret = kvm_vcpu_enable_cap(cs, KVM_CAP_IRQ_XICS, 0, kernel_xics_fd,
> >>>                               kvm_arch_vcpu_id(cs));
> >>>     if (ret < 0) {
> >>>         error_report("Unable to connect CPU%ld to kernel XICS: %s",
> >>>                      kvm_arch_vcpu_id(cs), strerror(errno));
> >>>         exit(1);
> >>>     }
> >>> ...
> >>>
> >>> It should be protected by cap_irq_xics_enabled:
> >>>
> >>> ...
> >>>     /*
> >>>      * If we are reusing a parked vCPU fd corresponding to the CPU
> >>>      * which was hot-removed earlier we don't have to re-enable
> >>>      * KVM_CAP_IRQ_XICS capability again.
> >>>      */
> >>>     if (icp->cap_irq_xics_enabled) {
> >>>         return;
> >>>     }
> >>>
> >>> ...
> >>>     ret = kvm_vcpu_enable_cap(...);
> >>> ...
> >>>     icp->cap_irq_xics_enabled = true;
> >>> ...
> >>>
> >>> But since this commit, "icp" is a new object on each call:
> >>>
> >>> spapr_cpu_core_realize_child()
> >>> ...
> >>>     obj = object_new(spapr->icp_type);
> >>> ...
> >>>     xics_cpu_setup(XICS_FABRIC(spapr), cpu, ICP(obj));
> >>>     ...
> >>>             icpc->cpu_setup(icp, cpu); -> icp_kvm_cpu_setup()
> >>>     ...
> >>> ...
> >>>
> >>> and "cap_irq_xics_enabled" is reinitialized.
> >>>
> >>> Any idea how to fix that?  
> >>
> >> It seems that a cleanup is not done in the kernel. We are missing
> >> a way to call kvmppc_xics_free_icp() from QEMU. Today the only
> >> way is to destroy the vcpu.
> > 
> > The commit introducing this hack, for reference:
> > 
> > commit a45863bda90daa8ec39e5a312b9734fd4665b016
> > Author: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > Date:   Thu Jul 2 16:23:20 2015 +1000
> > 
> >     xics_kvm: Don't enable KVM_CAP_IRQ_XICS if already enabled
> >     
> >     When supporting CPU hot removal by parking the vCPU fd and reusing
> >     it during hotplug again, there can be cases where we try to reenable
> >     KVM_CAP_IRQ_XICS CAP for the vCPU for which it was already enabled.
> >     Introduce a boolean member in ICPState to track this and don't
> >     reenable the CAP if it was already enabled earlier.
> >     
> >     Re-enabling this CAP should ideally work, but currently it results in
> >     kernel trying to create and associate ICP with this vCPU and that
> >     fails since there is already an ICP associated with it. Hence this
> >     patch is needed to work around this problem in the kernel.
> >     
> >     This change allows CPU hot removal to work for sPAPR.
> >     
> >     Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> >     Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> >     Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> >     Signed-off-by: Alexander Graf <agraf@suse.de>  
> 
> OK. 
> 
> Greg is looking at re-adding the ICPState array because of a 
> migration issue with older machines. We might need to do so 
> unconditionally ...
> 

It would be a pity to keep the pre-allocated ICPStates for new machine
types just because of that... What about keeping track of all the
cap_irq_xics_enabled flags in a separate static array, sized by
max_cpus and indexed by vCPU id?
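
To make the idea concrete, here is a rough, self-contained sketch. It is
not QEMU code: ToyICP, fake_enable_cap_irq_xics(), toy_icp_cpu_setup()
and toy_plug_then_unplug() are hypothetical names, MAX_CPUS stands in
for max_cpus, and the kernel is modeled by a plain array that returns
-EBUSY on a second enable, which is the failure Laurent reported. The
point is only that the flag lives outside the ICP object, so it survives
the object being freed on unplug.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_CPUS 8 /* assumption: stand-in for the machine's max_cpus */

/*
 * Toy model of the kernel side: one in-kernel ICP per vCPU id. It
 * outlives hot-unplug because the vCPU fd is parked, not destroyed.
 */
static bool kernel_icp_present[MAX_CPUS];

/* Hypothetical stand-in for enabling KVM_CAP_IRQ_XICS on a vCPU. */
static int fake_enable_cap_irq_xics(int vcpu_id)
{
    if (kernel_icp_present[vcpu_id]) {
        return -EBUSY; /* an ICP is already associated with this vCPU */
    }
    kernel_icp_present[vcpu_id] = true;
    return 0;
}

/* The proposed tracking: flags kept outside the per-CPU ICP objects. */
static bool cap_irq_xics_enabled[MAX_CPUS];

typedef struct ToyICP {
    int vcpu_id;
} ToyICP;

/* Runs on every hotplug, each time with a freshly allocated ICP. */
static int toy_icp_cpu_setup(ToyICP *icp)
{
    int ret;

    assert(icp->vcpu_id >= 0 && icp->vcpu_id < MAX_CPUS);

    if (cap_irq_xics_enabled[icp->vcpu_id]) {
        return 0; /* parked vCPU fd reused: nothing to enable again */
    }

    ret = fake_enable_cap_irq_xics(icp->vcpu_id);
    if (ret < 0) {
        return ret;
    }
    cap_irq_xics_enabled[icp->vcpu_id] = true;
    return 0;
}

/* Models device_add followed by device_del for one vCPU. */
static int toy_plug_then_unplug(int vcpu_id)
{
    ToyICP *icp = calloc(1, sizeof(*icp));
    int ret;

    if (!icp) {
        return -ENOMEM;
    }
    icp->vcpu_id = vcpu_id;
    ret = toy_icp_cpu_setup(icp);
    free(icp); /* the ICP object goes away, the flag array does not */
    return ret;
}
```

Calling toy_plug_then_unplug() twice for the same vCPU id succeeds both
times in this model, whereas a flag stored inside ToyICP would be lost
with the object and the second call would hit -EBUSY.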

> But for that specific issue, I think it would have been better 
> to clean up the kernel state. Is that possible ? 
> 

Commit 4c055ab54fae ("cpu: Reclaim vCPU objects") gives some more details
on why we don't destroy the vCPU in KVM on unplug, but rather park the vCPU
fd for later use... so I'm not sure we can clean up the kernel state.

But since the vCPU is still present, maybe we can find a way to tell KVM
that we want to reuse an already present ICP ?

> Thanks,
> 
> C.
>  
> 
> >> Else we need to reintroduce the array of icps (again) to keep some 
> >> xics state ... but that just sucks :/ Let me think about it. 
> >>  
> > 
> > Thanks,
> > Laurent  
> >> C.
> >>  
> >   
> 



