From: Nathan Lynch <nathanl@linux.ibm.com>
To: Michael Ellerman <mpe@ellerman.id.au>
Cc: Greg Kurz <groug@kaod.org>,
linuxppc-dev@lists.ozlabs.org,
Michael Roth <mdroth@linux.vnet.ibm.com>,
Thiago Jung Bauermann <bauerman@linux.ibm.com>,
Cedric Le Goater <clg@kaod.org>
Subject: Re: [PATCH] powerpc/pseries/hotplug-cpu: increase wait time for vCPU death
Date: Fri, 07 Aug 2020 02:05:09 -0500 [thread overview]
Message-ID: <87mu37ylzu.fsf@linux.ibm.com> (raw)
In-Reply-To: <87zh79yen7.fsf@mpe.ellerman.id.au>

Hi everyone,
Michael Ellerman <mpe@ellerman.id.au> writes:
> Greg Kurz <groug@kaod.org> writes:
>> On Tue, 04 Aug 2020 23:35:10 +1000
>> Michael Ellerman <mpe@ellerman.id.au> wrote:
>>> Spinning forever seems like a bad idea, but as has been demonstrated at
>>> least twice now, continuing when we don't know the state of the other
>>> CPU can lead to straight up crashes.
>>>
>>> So I think I'm persuaded that it's preferable to have the kernel stuck
>>> spinning rather than oopsing.
>>>
>>
>> +1
>>
>>> I'm 50/50 on whether we should have a cond_resched() in the loop. My
>>> first instinct is no, if we're stuck here for 20s a stack trace would be
>>> good. But then we will probably hit that on some big and/or heavily
>>> loaded machine.
>>>
>>> So possibly we should call cond_resched() but have some custom logic in
>>> the loop to print a warning if we are stuck for more than some
>>> sufficiently long amount of time.
>>
>> How long should that be ?
>
> Yeah good question.
>
> I guess step one would be seeing how long it can take on the 384 vcpu
> machine. And we can probably test on some other big machines.
>
> Hopefully Nathan can give us some idea of how long he's seen it take on
> large systems? I know he was concerned about the 20s timeout of the
> softlockup detector.

Maybe I'm not clear on what this refers to, but I don't think
stop-self/query-stopped-state latency increases with processor count, at
least not on PowerVM. And IIRC my concern was with the earlier patch's
potential to make the softlockup watchdog rightly complain, since it
polled the stopped state without ever scheduling away.

smp_query_cpu_stopped() conflates two distinct results (the RTAS call's
return code and the queried thread's CPU_status) into a single return
value, which makes it harder than necessary to reason about the
questions around cond_resched() and whether to warn.

Sorry to pull this stunt, but I have some code sitting in a neglected
branch that I think gets the logic around this right.

What we should have is a simple C wrapper for the RTAS call that
reflects the architected inputs and outputs:
================================================================
(-- rtas.c --)
/**
 * rtas_query_cpu_stopped_state() - Call RTAS query-cpu-stopped-state.
 * @hwcpu: Identifies the processor thread to be queried.
 * @status: Pointer to the returned status; valid only on success.
 *
 * Determine whether the given processor thread is in the stopped
 * state. If successful and @status is non-NULL, the thread's status
 * is stored to @status.
 *
 * Return:
 * * 0  - Success
 * * -1 - Hardware error
 * * -2 - Busy, try again later
 */
int rtas_query_cpu_stopped_state(unsigned int hwcpu, unsigned int *status)
{
	unsigned int cpu_status;
	int token;
	int fwrc;

	token = rtas_token("query-cpu-stopped-state");

	fwrc = rtas_call(token, 1, 2, &cpu_status, hwcpu);
	if (fwrc != 0)
		goto out;

	if (status != NULL)
		*status = cpu_status;
out:
	return fwrc;
}
================================================================

And then a utility function that waits for the remote thread to enter
the stopped state, with the higher-level logic for rescheduling and
warning. The fact that smp_query_cpu_stopped() currently does not
handle a -2/busy status is a bug; it's fixed below by using
rtas_busy_delay(). Note the comment justifying the explicit
cond_resched() in the outer loop:
================================================================
(-- rtas.h --)
/* query-cpu-stopped-state CPU_status */
#define RTAS_QCSS_STATUS_STOPPED 0
#define RTAS_QCSS_STATUS_IN_PROGRESS 1
#define RTAS_QCSS_STATUS_NOT_STOPPED 2
(-- pseries/hotplug-cpu.c --)
/**
 * wait_for_cpu_stopped() - Wait for a CPU to enter the RTAS stopped state.
 * @cpu: Logical CPU number of the thread to wait for.
 */
static void wait_for_cpu_stopped(unsigned int cpu)
{
	unsigned int status;
	unsigned int hwcpu;

	hwcpu = get_hard_smp_processor_id(cpu);

	do {
		int fwrc;

		/*
		 * rtas_busy_delay() will yield only if RTAS returns a
		 * busy status. Since query-cpu-stopped-state can
		 * yield RTAS_QCSS_STATUS_IN_PROGRESS or
		 * RTAS_QCSS_STATUS_NOT_STOPPED for an unbounded
		 * period before the target thread stops, we must take
		 * care to explicitly reschedule while polling.
		 */
		cond_resched();

		do {
			fwrc = rtas_query_cpu_stopped_state(hwcpu, &status);
		} while (rtas_busy_delay(fwrc));

		if (fwrc == 0)
			continue;

		pr_err_ratelimited("query-cpu-stopped-state for thread 0x%x returned %d\n",
				   hwcpu, fwrc);
		goto out;
	} while (status == RTAS_QCSS_STATUS_NOT_STOPPED ||
		 status == RTAS_QCSS_STATUS_IN_PROGRESS);

	if (status != RTAS_QCSS_STATUS_STOPPED)
		pr_err_ratelimited("query-cpu-stopped-state yielded unknown status %d for thread 0x%x\n",
				   status, hwcpu);
out:
	return;
}
[...]
static void pseries_cpu_die(unsigned int cpu)
{
	wait_for_cpu_stopped(cpu);

	paca_ptrs[cpu]->cpu_start = 0;
}
================================================================

wait_for_cpu_stopped() should be able to accommodate a time-based
warning if necessary, but speaking as a likely recipient of any bug
reports in this area, I'm not convinced of the need, and I don't know
what a good threshold would be. It's relatively easy to sample the
stack of a task that's apparently failing to make progress, and I would
probably use 'perf probe' or similar to report the inputs and outputs
of the RTAS call.

I'm happy to turn this into a proper submission once I've cleaned it up
and retested it, or Michael R. is welcome to appropriate it, assuming
it's acceptable.