From: David Gibson <david@gibson.dropbear.id.au>
To: Daniel Henrique Barboza <danielhb413@gmail.com>
Cc: qemu-ppc@nongnu.org, qemu-devel@nongnu.org, groug@kaod.org
Subject: Re: [PATCH v3 7/7] spapr_drc.c: use DRC reconfiguration to cleanup DIMM unplug state
Date: Wed, 17 Feb 2021 13:31:29 +1100
Message-ID: <YCyAAe4dJzpsgQ0x@yekko.fritz.box>
In-Reply-To: <20210211225246.17315-8-danielhb413@gmail.com>

On Thu, Feb 11, 2021 at 07:52:46PM -0300, Daniel Henrique Barboza wrote:
> Handling errors in memory hotunplug in the pSeries machine is more complex
> than for any other device type, because it has all the complications that
> other devices have, and more.
>
> For instance, determining a timeout for a DIMM hotunplug must consider whether
> it's a Hash-MMU or a Radix-MMU guest, because Hash guests take longer to
> hotunplug DIMMs. The size of the DIMM is also a factor, given that larger DIMMs
> naturally take longer to be hotunplugged by the kernel. And there's also the
> guest memory usage to be considered: if a process is consuming memory that
> would be lost by the DIMM unplug, the kernel will postpone the unplug until
> that process finishes, and then initiate the regular hotunplug process. The
> first two considerations are manageable, but the last one is a deal breaker.
>
> There is no sane way for the pSeries machine to determine the memory load in the
> guest when attempting a DIMM hotunplug, and even if there was a way, the guest
> can start using all the RAM in the middle of the unplug process and invalidate
> our previous assumptions. As a result, we can't even begin to calculate a
> timeout for the operation. This means that we can't implement a viable timeout
> mechanism for memory unplug in pSeries.
>
> The reason we would consider an unplug timeout in the first place is that we
> can't know whether the kernel has given up on the unplug. It turns out that,
> sometimes, we can. Consider a failed memory hotunplug attempt where the kernel
> errors out with the following message:
>
> 'pseries-hotplug-mem: Memory indexed-count-remove failed, adding any removed LMBs'
>
> This happens when there is an LMB that the kernel gave up removing, and the
> LMBs of the same DIMM that were marked for removal are now being added back.
> This process happens

We need to be a little careful about terminology here. From the
guest's point of view, there's no such thing as a DIMM, only LMBs.
What the guest is doing here is essentially rejecting a single "index
+ number" DRC unplug request, which corresponds to one DIMM on the
QEMU side.

> in the pseries kernel code in [1], going from dlpar_memory_remove_by_ic() into
> dlpar_add_lmb() and then update_lmb_associativity_index(). In this function,
> the kernel configures the LMB DRC connector again. Note that this is a valid
> usage in LOPAR, as stated in section "ibm,configure-connector RTAS Call":
>
> 'A subsequent sequence of calls to ibm,configure-connector with the same entry from
> the “ibm,drc-indexes” or “ibm,drc-info” property will restart the configuration of
> devices which were not completely configured.'
>
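
To make the "index + number" terminology concrete: the guest only ever sees LMB DRCs, and the patch derives each LMB's DRC id as addr / SPAPR_MEMORY_BLOCK_SIZE. The following is a standalone sketch, not QEMU code. LMB_SIZE assumes QEMU's 256 MiB SPAPR_MEMORY_BLOCK_SIZE, DIMMs are assumed to be LMB-aligned in base and size, and LmbRange, dimm_to_lmb_range() and lmb_in_range() are hypothetical names. It computes the contiguous range of LMB ids that one DIMM occupies, i.e. the range a single DIMM unplug request covers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: matches QEMU's SPAPR_MEMORY_BLOCK_SIZE (256 MiB). */
#define LMB_SIZE (UINT64_C(1) << 28)

/* Hypothetical: the "index + number" range of LMB ids one DIMM covers. */
typedef struct {
    uint64_t first_id; /* id of the first LMB: DIMM base / LMB_SIZE */
    uint32_t count;    /* number of LMBs: DIMM size / LMB_SIZE */
} LmbRange;

static LmbRange dimm_to_lmb_range(uint64_t dimm_addr, uint64_t dimm_size)
{
    LmbRange r;

    /* DIMM base and size are assumed to be multiples of the LMB size. */
    assert(dimm_addr % LMB_SIZE == 0);
    assert(dimm_size % LMB_SIZE == 0);

    r.first_id = dimm_addr / LMB_SIZE;
    r.count = (uint32_t)(dimm_size / LMB_SIZE);
    return r;
}

/* True if the LMB with the given id belongs to the DIMM's range. */
static bool lmb_in_range(LmbRange r, uint64_t lmb_id)
{
    return lmb_id >= r.first_id && lmb_id < r.first_id + r.count;
}
```

For example, a 1 GiB DIMM at guest physical address 4 GiB covers LMB ids 16 through 19, so a rejection of any one of those LMBs is a rejection of that DIMM's unplug.
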
> We can use this kernel behavior to our advantage. If a DRC connector
> reconfiguration happens for an LMB that we marked as unplug pending, this
> indicates that the kernel changed its mind about the unplug and is reasserting
> that it will keep using the DIMM. In this case, it's safe to assume that the
> whole DIMM unplug was cancelled.
>
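
The rule described above can be modelled in a few lines. This is a simplified, self-contained sketch and not the patch itself: Dimm and Lmb are hypothetical stand-ins for the pending DIMM state and SpaprDrc, and the DIMM's LMBs are found through a back pointer rather than by walking addresses as spapr_clear_pending_dimm_unplug_state() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a DIMM with a pending unplug. */
typedef struct {
    bool unplug_pending;
} Dimm;

/* Hypothetical stand-in for one LMB DRC. */
typedef struct {
    bool is_lmb;           /* models spapr_drc_type() == ..._TYPE_LMB */
    bool unplug_requested; /* models drc->unplug_requested */
    Dimm *dimm;            /* models drc->dev; may be NULL */
} Lmb;

/* Drop the pending unplug for a DIMM and for every LMB it owns. */
static void cancel_dimm_unplug(Lmb *lmbs, size_t n, Dimm *dimm)
{
    dimm->unplug_pending = false;
    for (size_t i = 0; i < n; i++) {
        if (lmbs[i].dimm == dimm) {
            lmbs[i].unplug_requested = false;
        }
    }
}

/*
 * Called for a configure-connector on lmbs[idx]. Reconfiguring an LMB
 * that has an unplug pending means the guest kept it, so the unplug of
 * the whole DIMM is treated as cancelled. Returns true if it was.
 */
static bool on_configure_connector(Lmb *lmbs, size_t n, size_t idx)
{
    Lmb *lmb;

    assert(idx < n);
    lmb = &lmbs[idx];
    if (!lmb->is_lmb || !lmb->unplug_requested || lmb->dimm == NULL) {
        return false;
    }
    cancel_dimm_unplug(lmbs, n, lmb->dimm);
    return true;
}
```

A configure-connector on any one LMB cancels the unplug for all LMBs of the same DIMM while leaving other DIMMs' pending unplugs untouched; a repeat on the same DIMM is then a no-op.
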
> This patch hooks into rtas_ibm_configure_connector() and, in the scenario
> described above, clears the unplug state for the DIMM device. This will not
> solve all the problems we still have with memory unplug, but it will cover the
> case where the kernel reconfigures LMBs after a failed unplug. This makes us a
> bit more resilient without relying on an unreliable timeout, and it doesn't
> make the remaining error cases any worse.

I wonder if we could use this as the beginning of a hotplug failure
reporting mechanism. As noted, this is explicitly allowed by PAPR, and
I think in general it makes sense that a configure-connector would
re-assert that the guest is using the resource and we can't unplug it.
Could we extend guests to do an indicative configure-connector on any
unplug they know they can't complete? Or, if configure-connector is too
disruptive, could we use an (extra) RTAS set-indicator call to set the
"UNISOLATE" state? If I'm reading right, that should be both permitted
and a no-op for existing PAPR implementations, so it should be a pretty
safe way to add that indication.
>
> [1] arch/powerpc/platforms/pseries/hotplug-memory.c
>
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> ---
>  hw/ppc/spapr.c         | 30 ++++++++++++++++++++++++++++++
>  hw/ppc/spapr_drc.c     | 14 ++++++++++++++
>  include/hw/ppc/spapr.h |  2 ++
>  3 files changed, 46 insertions(+)
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index ecce8abf14..4bcded4a1a 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3575,6 +3575,36 @@ static SpaprDimmState *spapr_recover_pending_dimm_state(SpaprMachineState *ms,
>      return spapr_pending_dimm_unplugs_add(ms, avail_lmbs, dimm);
>  }
>
> +void spapr_clear_pending_dimm_unplug_state(SpaprMachineState *spapr,
> +                                           PCDIMMDevice *dimm)
> +{
> +    SpaprDimmState *ds = spapr_pending_dimm_unplugs_find(spapr, dimm);
> +    SpaprDrc *drc;
> +    uint32_t nr_lmbs;
> +    uint64_t size, addr_start, addr;
> +    int i;
> +
> +    if (ds) {
> +        spapr_pending_dimm_unplugs_remove(spapr, ds);
> +    }

Hrm... how would !ds arise? Could this just be an assert?

> +
> +    size = memory_device_get_region_size(MEMORY_DEVICE(dimm), &error_abort);
> +    nr_lmbs = size / SPAPR_MEMORY_BLOCK_SIZE;
> +
> +    addr_start = object_property_get_uint(OBJECT(dimm), PC_DIMM_ADDR_PROP,
> +                                          &error_abort);
> +
> +    addr = addr_start;
> +    for (i = 0; i < nr_lmbs; i++) {
> +        drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB,
> +                              addr / SPAPR_MEMORY_BLOCK_SIZE);
> +        g_assert(drc);
> +
> +        drc->unplug_requested = false;
> +        addr += SPAPR_MEMORY_BLOCK_SIZE;
> +    }
> +}
> +
>  /* Callback to be called during DRC release. */
>  void spapr_lmb_release(DeviceState *dev)
>  {
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index c143bfb6d3..eae941233a 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -1230,6 +1230,20 @@ static void rtas_ibm_configure_connector(PowerPCCPU *cpu,
>
>      drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
>
> +    /*
> +     * This indicates that the kernel is reconfiguring a LMB due to
> +     * a failed hotunplug. Clear the pending unplug state for the whole
> +     * DIMM.
> +     */
> +    if (spapr_drc_type(drc) == SPAPR_DR_CONNECTOR_TYPE_LMB &&
> +        drc->unplug_requested) {
> +
> +        /* This really shouldn't happen in this point, but ... */
> +        g_assert(drc->dev);

I'm a little worried that a buggy or malicious guest could trigger
this assert.

> +
> +        spapr_clear_pending_dimm_unplug_state(spapr, PC_DIMM(drc->dev));
> +    }
> +
>      if (!drc->fdt) {
>          void *fdt;
>          int fdt_size;
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index ccbeeca1de..5bcc8f3bb8 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -847,6 +847,8 @@ int spapr_hpt_shift_for_ramsize(uint64_t ramsize);
>  int spapr_reallocate_hpt(SpaprMachineState *spapr, int shift, Error **errp);
>  void spapr_clear_pending_events(SpaprMachineState *spapr);
>  void spapr_clear_pending_hotplug_events(SpaprMachineState *spapr);
> +void spapr_clear_pending_dimm_unplug_state(SpaprMachineState *spapr,
> +                                           PCDIMMDevice *dimm);
>  int spapr_max_server_number(SpaprMachineState *spapr);
>  void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
>                        uint64_t pte0, uint64_t pte1);
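
On the assert concern: since a buggy or malicious guest may be able to reach the configure-connector path with drc->dev unset, one option is to treat a missing device as a guest error and fail the RTAS call instead of aborting QEMU, keeping asserts only for invariants QEMU itself guarantees. The sketch below models that with hypothetical types and names (DrcModel, configure_connector_hook(), clear_dimm_unplug_state()). CC_PARAM_ERROR assumes -3 is the RTAS parameter-error status, and inside QEMU the message would more likely go through qemu_log_mask(LOG_GUEST_ERROR, ...) than stderr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical status codes; -3 assumed to be RTAS "Parameter Error". */
enum { CC_SUCCESS = 0, CC_PARAM_ERROR = -3 };

/* Hypothetical stand-in for the SpaprDrc fields used here. */
typedef struct {
    bool is_lmb;
    bool unplug_requested;
    void *dev; /* models drc->dev, which the guest may leave unset */
} DrcModel;

static unsigned cleared_dimms; /* counts cleanups, for illustration only */

/* Hypothetical stand-in for spapr_clear_pending_dimm_unplug_state(). */
static void clear_dimm_unplug_state(void *dimm)
{
    /* Internal invariant: the caller has already checked dimm. */
    assert(dimm != NULL);
    cleared_dimms++;
}

static int configure_connector_hook(DrcModel *drc)
{
    if (drc->is_lmb && drc->unplug_requested) {
        if (drc->dev == NULL) {
            /* Guest-reachable: fail the call instead of aborting QEMU. */
            fprintf(stderr, "configure-connector: LMB DRC has no device\n");
            return CC_PARAM_ERROR;
        }
        clear_dimm_unplug_state(drc->dev);
        drc->unplug_requested = false;
    }
    return CC_SUCCESS;
}
```

Whether to fail the call or just skip the cleanup silently is a judgement call; failing it at least makes the guest bug visible without taking down the VM.
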
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson