qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Daniel Henrique Barboza <danielhb413@gmail.com>
Cc: qemu-ppc@nongnu.org, qemu-devel@nongnu.org, groug@kaod.org
Subject: Re: [PATCH v3 7/7] spapr_drc.c: use DRC reconfiguration to cleanup DIMM unplug state
Date: Mon, 22 Feb 2021 16:53:32 +1100	[thread overview]
Message-ID: <YDNG3L0YlUhmfeFc@yekko.fritz.box> (raw)
In-Reply-To: <4c6d20f5-4a49-dd0e-8a3a-449e2d61f542@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 7022 bytes --]

On Fri, Feb 19, 2021 at 05:04:23PM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 2/16/21 11:31 PM, David Gibson wrote:
> > On Thu, Feb 11, 2021 at 07:52:46PM -0300, Daniel Henrique Barboza wrote:
> > > Handling errors in memory hotunplug in the pSeries machine is more complex
> > > than any other device type, because there are all the complications that other
> > > devices has, and more.
> > > 
> > > For instance, determining a timeout for a DIMM hotunplug must consider if it's a
> > > Hash-MMU or a Radix-MMU guest, because Hash guests takes longer to hotunplug DIMMs.
> > > The size of the DIMM is also a factor, given that longer DIMMs naturally takes
> > > longer to be hotunplugged from the kernel. And there's also the guest memory usage to
> > > be considered: if there's a process that is consuming memory that would be lost by
> > > the DIMM unplug, the kernel will postpone the unplug process until the process
> > > finishes, and then initiate the regular hotunplug process. The first two
> > > considerations are manageable, but the last one is a deal breaker.
> > > 
> > > There is no sane way for the pSeries machine to determine the memory load in the guest
> > > when attempting a DIMM hotunplug - and even if there was a way, the guest can start
> > > using all the RAM in the middle of the unplug process and invalidate our previous
> > > assumptions - and in result we can't even begin to calculate a timeout for the
> > > operation. This means that we can't implement a viable timeout mechanism for memory
> > > unplug in pSeries.
> > > 
> > > Going back to why we would consider an unplug timeout, the reason is that we can't
> > > know if the kernel is giving up the unplug. Turns out that, sometimes, we can.
> > > Consider a failed memory hotunplug attempt where the kernel will error out with
> > > the following message:
> > > 
> > > 'pseries-hotplug-mem: Memory indexed-count-remove failed, adding any removed LMBs'
> > > 
> > > This happens when there is a LMB that the kernel gave up in removing, and the LMBs
> > > marked for removal of the same DIMM are now being added back. This process happens
> > 
> > We need to be a little careful about terminology here.  From the
> > guest's point of view, there's no such thing as a DIMM, only LMBs.
> > What the guest is doing here is essentially rejecting a single "index
> > + number" DRC unplug request, which corresponds to one DIMM on the
> > qemu side.
> 
> I'll reword this paragraph to avoid using "DIMM" in the context of the guest
> kernel.
> 
> > 
> > > in the pseries kernel in [1], dlpar_memory_remove_by_ic() into dlpar_add_lmb(), and
> > > after that update_lmb_associativity_index(). In this function, the kernel is configuring
> > > the LMB DRC connector again. Note that this is a valid usage in LOPAR, as stated in
> > > section "ibm,configure-connector RTAS Call":
> > > 
> > > 'A subsequent sequence of calls to ibm,configure-connector with the same entry from
> > > the “ibm,drc-indexes” or “ibm,drc-info” property will restart the configuration of
> > > devices which were not completely configured.'
> > > 
> > > We can use this kernel behavior in our favor. If a DRC connector reconfiguration
> > > for a LMB that we marked as unplug pending happens, this indicates that the kernel
> > > changed its mind about the unplug and is reasserting that it will keep using the
> > > DIMM. In this case, it's safe to assume that the whole DIMM unplug was cancelled.
> > > 
> > > This patch hops into rtas_ibm_configure_connector() and, in the scenario described
> > > above, clear the unplug state for the DIMM device. This will not solve all the
> > > problems we still have with memory unplug, but it will cover this case where the
> > > kernel reconfigures LMBs after a failed unplug. We are a bit more resilient,
> > > without using an unreliable timeout, and we didn't make the remaining error cases
> > > any worse.
> > 
> > I wonder if we could use this as a beginning of a hotplug failure
> > reporting mechanism.  As noted, this is explicitly allowed by PAPR and
> > I think in general it makes sense that a configure-connector would
> > re-assert that the guest is using the resource and we can't unplug it.
> > 
> 
> I think it's worth looking into it. The kernel already does that in case of hotunplug
> failure of LMBs (at least in this particular case), so it's a matter of evaluating
> how hard it is to do the same for e.g. CPUs.
> 
> 
> > Could we extend guests to do an indicative configure-connector on any
> > unplug it knows it can't complete?  Or if configure-connector is too
> > disruptive could we use an (extra) H_SET_INDICATOR to "UNISOLATE"
> > state? If I'm reading right, that should be both permitted and a no-op
> > for existing PAPR implementations, so it should be a pretty safe way
> > to add that indication.
> 
> A quick look in LOPAR shows that set_indicator can be used to report
> hotplug failures (which is a surprise to me, I wasn't aware of it):
> 
> -----
> (Table 13.7, R1-13.5.3.4-4.)
> 
> For all DR options: If this is a DR operation that involves the user insert-
> ing a DR entity, then if the firmware can determine that the inserted entity
> would cause a system disturbance, then the set-indicator RTAS call must
> not unisolate the entity and must return an error status which is unique to the
> particular error.
> -----
> 
> The wording 'would cause a system disturbance' seems generic on purpose, giving
> the firmware/platform the prerrogative to refuse a hotplug for any given
> reason.

Right.  This is about firmware/platform detected errors, which is of
less interest to us at the moment than OS detected errors.

> I also read that there is no rule against using set_indicator with "unisolate" to
> an already unisolated resource. It would be a no-op.
> 
> I don't think we would find fierce opposition if we propose an addendum such as:
> 
> "For all DR options: If this is a DR operation that involves the user removing
> ing a DR entity, then if the partition operational system can determine that
> removing the entity would cause a system disturbance, then the set-indicator RTAS
> call can be used with the 'unisolate' value to inform the platform that the entity
> can not be removed at that time."
> 
> This would be enough to accomplish what we're aiming for using set_indicator and
> unisolate. Then we have 2 options to signal a failed unplug - configure-connector
> and unisolating the device. The guest can use whichever is easier or makes more
> sense.

Sounds good, let's do it.  Because it's a no-op currently, I also
think we can go ahead with an implementation even without waiting for
a PAPR change to be approved.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2021-02-22  6:01 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-11 22:52 [PATCH v3 0/7] CPU unplug timeout/LMB unplug cleanup in DRC reconfiguration Daniel Henrique Barboza
2021-02-11 22:52 ` [PATCH v3 1/7] spapr_drc.c: do not call spapr_drc_detach() in drc_isolate_logical() Daniel Henrique Barboza
2021-02-15 10:40   ` Greg Kurz
2021-02-17  0:51     ` David Gibson
2021-02-11 22:52 ` [PATCH v3 2/7] spapr_pci.c: simplify spapr_pci_unplug_request() function handling Daniel Henrique Barboza
2021-02-16 15:50   ` Greg Kurz
2021-02-16 16:09     ` Daniel Henrique Barboza
2021-02-16 17:16       ` Greg Kurz
2021-02-16 17:44         ` Daniel Henrique Barboza
2021-02-17  0:54           ` David Gibson
2021-02-11 22:52 ` [PATCH v3 3/7] spapr_drc.c: use spapr_drc_release() in isolate_physical/set_unusable Daniel Henrique Barboza
2021-02-17  0:57   ` David Gibson
2021-02-17 10:58   ` Greg Kurz
2021-02-11 22:52 ` [PATCH v3 4/7] spapr: rename spapr_drc_detach() to spapr_drc_unplug_request() Daniel Henrique Barboza
2021-02-17  0:58   ` David Gibson
2021-02-17 11:01   ` Greg Kurz
2021-02-11 22:52 ` [PATCH v3 5/7] spapr_drc.c: introduce unplug_timeout_timer Daniel Henrique Barboza
2021-02-17  1:14   ` David Gibson
2021-02-17  1:20   ` David Gibson
2021-02-11 22:52 ` [PATCH v3 6/7] spapr_drc.c: add hotunplug timeout for CPUs Daniel Henrique Barboza
2021-02-17  1:23   ` David Gibson
2021-02-11 22:52 ` [PATCH v3 7/7] spapr_drc.c: use DRC reconfiguration to cleanup DIMM unplug state Daniel Henrique Barboza
2021-02-17  2:31   ` David Gibson
2021-02-19 20:04     ` Daniel Henrique Barboza
2021-02-22  5:53       ` David Gibson [this message]
2021-02-19 21:31     ` Daniel Henrique Barboza
2021-02-22  5:54       ` David Gibson
2021-02-17  2:33 ` [PATCH v3 0/7] CPU unplug timeout/LMB unplug cleanup in DRC reconfiguration David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YDNG3L0YlUhmfeFc@yekko.fritz.box \
    --to=david@gibson.dropbear.id.au \
    --cc=danielhb413@gmail.com \
    --cc=groug@kaod.org \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).