From: Juliet Kim <julietk@linux.vnet.ibm.com>
To: Michael Ellerman <mpe@ellerman.id.au>,
Nathan Lynch <nathanl@linux.ibm.com>
Cc: ego@linux.vnet.ibm.com, mmc <mmc@linux.vnet.ibm.com>,
linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH] powerpc/rtas: retry when cpu offline races with suspend/migration
Date: Thu, 27 Jun 2019 16:59:26 -0500
Message-ID: <a079dc5c-9d48-370d-baed-d74eb9c4fd92@linux.vnet.ibm.com>
In-Reply-To: <875zortydg.fsf@concordia.ellerman.id.au>
On 6/27/19 12:01 AM, Michael Ellerman wrote:
> Juliet Kim <julietk@linux.vnet.ibm.com> writes:
>> On 6/25/19 1:51 PM, Nathan Lynch wrote:
>>> Juliet Kim <julietk@linux.vnet.ibm.com> writes:
>>>
>>>> There's some concern this could retry forever, resulting in live lock.
>>> First of all the system will make progress in other areas even if there
>>> are repeated retries; we're not indefinitely holding locks or anything
>>> like that.
>> For instance, system admin runs a script that picks and offlines CPUs in a
>> loop to keep a certain rate of onlined CPUs for energy saving. If LPM keeps
>> putting CPUs back online, that would never finish, and would keep generating
>> new offline requests.
>>
>>> Second, Linux checks the H_VASI_STATE result on every retry. If the
>>> platform wants to terminate the migration (say, if it imposes a
>>> timeout), Linux will abandon it when H_VASI_STATE fails to return
>>> H_VASI_SUSPENDING. And it seems incorrect to bail out before that
>>> happens, absent hard errors on the Linux side such as allocation
>>> failures.
>> I confirmed with the PHYP and HMC folks that they wouldn't time out the LPM
>> request including H_VASI_STATE, so if the LPM retries were unlucky enough to
>> encounter repeated CPU offline attempts (maybe some customer code retrying
>> that), then the retries could continue indefinitely, or until some manual
>> intervention. And in the meantime, the LPM delay here would cause PHYP to
>> block other operations.
> That sounds like a PHYP bug to me.
>
> cheers
PHYP doesn’t time out because it has no idea how long the OS will take to
reach the point where it suspends. Other OSes allow applications to prepare for
LPM. PHYP cannot predict a length of time that is appropriate in all cases, and
in any case, it’s unlikely they’d make a change that would come in time to help
with the current issue.
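
To make the two positions in this thread concrete, here is a minimal
user-space sketch of the retry loop under discussion. It is not the patch
and not kernel code: every name in it (struct sim, query_vasi_state,
try_suspend_once, suspend_with_retry, the enum values) is a hypothetical
stand-in, and the platform state check and the CPU offline race are
simulated with counters. Passing a negative max_retries models retrying
until the platform stops reporting "suspending"; a non-negative value
models an OS-side bound on retries.

```c
/* Hypothetical stand-ins; these are not the kernel's real definitions. */
enum vasi_state { VASI_SUSPENDING, VASI_ABORTED };

enum suspend_result {
	SUSPEND_OK,        /* suspend attempt went through */
	SUSPEND_RACED,     /* a CPU offline raced with the attempt */
	SUSPEND_ABANDONED, /* platform no longer reports "suspending" */
	SUSPEND_GAVE_UP    /* OS-side retry bound was exhausted */
};

/* Simulated environment for one migration request. */
struct sim {
	int offline_races_left; /* attempts that will still hit a race */
	int abort_after;        /* platform aborts after this many polls; <0 = never */
	int polls;              /* number of state checks made so far */
};

/* Simulated platform state check, made once per attempt. */
static enum vasi_state query_vasi_state(struct sim *s)
{
	s->polls++;
	if (s->abort_after >= 0 && s->polls > s->abort_after)
		return VASI_ABORTED;
	return VASI_SUSPENDING;
}

/* Simulated single suspend attempt that may race with a CPU offline. */
static enum suspend_result try_suspend_once(struct sim *s)
{
	if (s->offline_races_left > 0) {
		s->offline_races_left--;
		return SUSPEND_RACED;
	}
	return SUSPEND_OK;
}

/*
 * Retry the suspend while the platform still reports "suspending".
 * max_retries < 0 means no OS-side bound: only the platform check or a
 * successful attempt ends the loop.
 */
static enum suspend_result suspend_with_retry(struct sim *s, int max_retries)
{
	for (int attempt = 0; max_retries < 0 || attempt <= max_retries; attempt++) {
		if (query_vasi_state(s) != VASI_SUSPENDING)
			return SUSPEND_ABANDONED;
		if (try_suspend_once(s) == SUSPEND_OK)
			return SUSPEND_OK;
	}
	return SUSPEND_GAVE_UP;
}
```

In this model, if abort_after is negative (the platform never times out)
and max_retries is also negative, the loop ends only when the offline
races stop, which is the indefinite-retry case described above.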