* [PATCH] user-exec: Do not filter the signal on si_code
@ 2019-09-30 19:29 Richard Henderson
2019-09-30 19:40 ` no-reply
2019-09-30 21:01 ` Richard Henderson
0 siblings, 2 replies; 8+ messages in thread
From: Richard Henderson @ 2019-09-30 19:29 UTC (permalink / raw)
To: qemu-devel; +Cc: laurent, david
This is a workaround for a ppc64le host kernel bug.
For the test case linux-test, we have an instruction trace
IN: sig_alarm
...
IN:
0x400080ed28: 380000ac li r0, 0xac
0x400080ed2c: 44000002 sc
IN: __libc_nanosleep
0x1003bb4c: 7c0802a6 mflr r0
0x1003bb50: f8010010 std r0, 0x10(r1)
Our signal return trampoline has, rightly, made the guest
stack page read-only, which, rightly, causes a fault on the store
of a return address into a stack frame.
Checking the host /proc/pid/maps, we see the expected state:
4000800000-4000810000 r--p 00000000 00:00 0
However, the host kernel has supplied si_code == SEGV_MAPERR,
which is obviously incorrect.
By dropping this check, we may have an extra walk of the page
tables, but this should be inexpensive.
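For anyone who wants to check a host independently of QEMU, here is a minimal sketch of a probe, assuming a Linux host. The names probe_write_fault_si_code, on_segv, probe_page and probe_len are made up for this illustration and do not exist in QEMU. It maps one read-only anonymous page, stores to it, and records the si_code the kernel hands to the SIGSEGV handler. The handler then makes the page writable so that the retried store succeeds, which loosely mirrors what page_unprotect() does for the guest.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper state for this sketch; not QEMU code. */
static volatile sig_atomic_t seen_si_code = -1;
static char *probe_page;
static size_t probe_len;

static void on_segv(int sig, siginfo_t *info, void *uc)
{
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)probe_page;

    (void)sig;
    (void)uc;

    /* Any fault outside the probe page is unexpected: bail out
     * rather than loop forever retrying the access. */
    if (addr < base || addr >= base + probe_len) {
        _exit(2);
    }
    seen_si_code = info->si_code;

    /* Make the page writable so the faulting store is retried and
     * succeeds. mprotect is a plain syscall on Linux, although POSIX
     * does not list it as async-signal-safe. */
    if (mprotect(probe_page, probe_len, PROT_READ | PROT_WRITE) != 0) {
        _exit(3);
    }
}

/* Returns the si_code reported for a store to a read-only page,
 * or -1 if setup failed or no fault was observed. */
int probe_write_fault_si_code(void)
{
    struct sigaction sa, old;
    void *p;

    probe_len = (size_t)sysconf(_SC_PAGESIZE);
    p = mmap(NULL, probe_len, PROT_READ,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return -1;
    }
    probe_page = p;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0) {
        munmap(probe_page, probe_len);
        return -1;
    }

    seen_si_code = -1;
    *(volatile char *)probe_page = 1;   /* faults once, then succeeds */

    sigaction(SIGSEGV, &old, NULL);
    munmap(probe_page, probe_len);
    return seen_si_code;
}
```

A kernel that behaves as expected should report SEGV_ACCERR here. A result of SEGV_MAPERR would be consistent with the behaviour described above, though this sketch has not been run on the affected host.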
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
FWIW, filed as
https://bugzilla.redhat.com/show_bug.cgi?id=1757189
out of habit and then
https://bugs.centos.org/view.php?id=16499
when I remembered that the system is running CentOS, not RHEL.
---
accel/tcg/user-exec.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index 71c4bf6477..31ef091a70 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -143,9 +143,12 @@ static inline int handle_cpu_signal(uintptr_t pc, siginfo_t *info,
* for some other kind of fault that should really be passed to the
* guest, we'd end up in an infinite loop of retrying the faulting
* access.
+ *
+ * XXX: At least one host kernel, ppc64le w/Centos 7 4.14.0-115.6.1,
+ * incorrectly reports SEGV_MAPERR for a STDX write to a read-only page.
+ * Therefore, do not test info->si_code.
*/
- if (is_write && info->si_signo == SIGSEGV && info->si_code == SEGV_ACCERR &&
- h2g_valid(address)) {
+ if (is_write && info->si_signo == SIGSEGV && h2g_valid(address)) {
switch (page_unprotect(h2g(address), pc)) {
case 0:
/* Fault not caused by a page marked unwritable to protect
--
2.17.1
* Re: [PATCH] user-exec: Do not filter the signal on si_code
2019-09-30 19:29 [PATCH] user-exec: Do not filter the signal on si_code Richard Henderson
@ 2019-09-30 19:40 ` no-reply
2019-09-30 21:01 ` Richard Henderson
1 sibling, 0 replies; 8+ messages in thread
From: no-reply @ 2019-09-30 19:40 UTC (permalink / raw)
To: richard.henderson; +Cc: david, qemu-devel, laurent
Patchew URL: https://patchew.org/QEMU/20190930192931.20509-1-richard.henderson@linaro.org/
Hi,
This series failed the asan build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.
=== TEST SCRIPT BEGIN ===
#!/bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-debug@fedora TARGET_LIST=x86_64-softmmu J=14 NETWORK=1
=== TEST SCRIPT END ===
The full log is available at
http://patchew.org/logs/20190930192931.20509-1-richard.henderson@linaro.org/testing.asan/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
* Re: [PATCH] user-exec: Do not filter the signal on si_code
2019-09-30 19:29 [PATCH] user-exec: Do not filter the signal on si_code Richard Henderson
2019-09-30 19:40 ` no-reply
@ 2019-09-30 21:01 ` Richard Henderson
2019-10-01 10:34 ` Peter Maydell
1 sibling, 1 reply; 8+ messages in thread
From: Richard Henderson @ 2019-09-30 21:01 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Maydell, Paolo Bonzini, laurent, david
On 9/30/19 12:29 PM, Richard Henderson wrote:
> This is a workaround for a ppc64le host kernel bug.
>
> For the test case linux-test, we have an instruction trace
>
> IN: sig_alarm
> ...
>
> IN:
> 0x400080ed28: 380000ac li r0, 0xac
> 0x400080ed2c: 44000002 sc
>
> IN: __libc_nanosleep
> 0x1003bb4c: 7c0802a6 mflr r0
> 0x1003bb50: f8010010 std r0, 0x10(r1)
>
> Our signal return trampoline has, rightly, made the guest
> stack page read-only, which, rightly, causes a fault on the store
> of a return address into a stack frame.
>
> Checking the host /proc/pid/maps, we see the expected state:
>
> 4000800000-4000810000 r--p 00000000 00:00 0
>
> However, the host kernel has supplied si_code == SEGV_MAPERR,
> which is obviously incorrect.
>
> By dropping this check, we may have an extra walk of the page
> tables, but this should be inexpensive.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>
> FWIW, filed as
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1757189
>
> out of habit and then
>
> https://bugs.centos.org/view.php?id=16499
>
> when I remembered that the system is running Centos not RHEL.
>
> ---
> accel/tcg/user-exec.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
> index 71c4bf6477..31ef091a70 100644
> --- a/accel/tcg/user-exec.c
> +++ b/accel/tcg/user-exec.c
> @@ -143,9 +143,12 @@ static inline int handle_cpu_signal(uintptr_t pc, siginfo_t *info,
> * for some other kind of fault that should really be passed to the
> * guest, we'd end up in an infinite loop of retrying the faulting
> * access.
> + *
> + * XXX: At least one host kernel, ppc64le w/Centos 7 4.14.0-115.6.1,
> + * incorrectly reports SEGV_MAPERR for a STDX write to a read-only page.
> + * Therefore, do not test info->si_code.
> */
> - if (is_write && info->si_signo == SIGSEGV && info->si_code == SEGV_ACCERR &&
> - h2g_valid(address)) {
> + if (is_write && info->si_signo == SIGSEGV && h2g_valid(address)) {
Ho hum. This change is in conflict with Peter's long comment; I should have
read the context more thoroughly. There is an even longer explanation in the
description of commit 9c4bbee9e3b83544257e82566342c29e15a88637.
The SEGV_ACCERR check here is to prevent a loop by which page_unprotect races
with itself and, from Peter's analysis,
> * ...but when B gets the mmap lock it finds that the page is already
> PAGE_WRITE, and so it exits page_unprotect() via the "not due to
> protected translation" code path, and wrongly delivers the signal
> to the guest rather than just retrying the access
This bug was fixed in the referenced patch. But the description then continues:
> Since this would cause an infinite loop if we ever called
> page_unprotect() for some other kind of fault than "write failed due
> to bad access permissions", tighten the condition in
> handle_cpu_signal() to check the signal number and si_code, and add a
> comment so that if somebody does ever find themselves debugging an
> infinite loop of faults they have some clue about why.
>
> (The trick for identifying the correct setting for
> current_tb_invalidated for thread B (needed to handle the precise-SMC
> case) is due to Richard Henderson. Paolo Bonzini suggested just
> relying on si_code rather than trying anything more complicated.)
The kernel bug is disappointing. But since it affects CentOS 7, which is
what *all* of the gcc compile farm ppc64 machines use, I think we need
to work around it somehow.
Should we simply add SEGV_MAPERR to the set of allowed si_code, to directly
work around the bug? If we got that code from a kernel without the bug, then
page_find should fail to find an entry, and we should then indicate that the
signal should be passed to the guest.
Thoughts?
r~
* Re: [PATCH] user-exec: Do not filter the signal on si_code
2019-09-30 21:01 ` Richard Henderson
@ 2019-10-01 10:34 ` Peter Maydell
2019-10-01 11:19 ` Laurent Vivier
0 siblings, 1 reply; 8+ messages in thread
From: Peter Maydell @ 2019-10-01 10:34 UTC (permalink / raw)
To: Richard Henderson
Cc: Paolo Bonzini, David Gibson, QEMU Developers, Laurent Vivier
On Mon, 30 Sep 2019 at 22:01, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 9/30/19 12:29 PM, Richard Henderson wrote:
> > This is a workaround for a ppc64le host kernel bug.
> > However, the host kernel has supplied si_code == SEGV_MAPERR,
> > which is obviously incorrect.
> It is disappointing about the kernel bug. But since this affects Centos 7,
> which is what *all* of the gcc compile farm ppc64 machines use, I think we need
> to work around it somehow.
We knew about the ppc kernel bug in 2018:
https://lists.gnu.org/archive/html/qemu-devel/2018-03/msg06049.html
and the decision at that time was to say "kernel bug, update your
kernel" :-)
thanks
-- PMM
* Re: [PATCH] user-exec: Do not filter the signal on si_code
2019-10-01 10:34 ` Peter Maydell
@ 2019-10-01 11:19 ` Laurent Vivier
2019-10-01 11:46 ` Peter Maydell
0 siblings, 1 reply; 8+ messages in thread
From: Laurent Vivier @ 2019-10-01 11:19 UTC (permalink / raw)
To: Peter Maydell, Richard Henderson
Cc: Paolo Bonzini, QEMU Developers, David Gibson
On 01/10/2019 at 12:34, Peter Maydell wrote:
> On Mon, 30 Sep 2019 at 22:01, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> On 9/30/19 12:29 PM, Richard Henderson wrote:
>>> This is a workaround for a ppc64le host kernel bug.
>
>>> However, the host kernel has supplied si_code == SEGV_MAPERR,
>>> which is obviously incorrect.
>
>> It is disappointing about the kernel bug. But since this affects Centos 7,
>> which is what *all* of the gcc compile farm ppc64 machines use, I think we need
>> to work around it somehow.
>
> We knew about the ppc kernel bug in 2018:
> https://lists.gnu.org/archive/html/qemu-devel/2018-03/msg06049.html
> and the decision at that time was to say "kernel bug, update your
> kernel" :-)
Is it possible to update the farm to Centos 8?
Or as the kernel involved is specifically for POWER9, is it possible to
use only POWER8?
Thanks,
Laurent
* Re: [PATCH] user-exec: Do not filter the signal on si_code
2019-10-01 11:19 ` Laurent Vivier
@ 2019-10-01 11:46 ` Peter Maydell
2019-10-01 13:15 ` Laurent Vivier
0 siblings, 1 reply; 8+ messages in thread
From: Peter Maydell @ 2019-10-01 11:46 UTC (permalink / raw)
To: Laurent Vivier
Cc: Paolo Bonzini, Richard Henderson, QEMU Developers, David Gibson
On Tue, 1 Oct 2019 at 12:19, Laurent Vivier <laurent@vivier.eu> wrote:
> Is it possible to update the farm to Centos 8?
>
> Or as the kernel involved is specifically for POWER9, is it possible to
> use only POWER8?
My experience is that the gcc cfarm admins aren't in
principle against the idea of upgrading farm machines,
but in practice they tend to have a shortage of effort.
If there's a centos-7-kernel-update package that could
be installed without doing a full distro upgrade that
would probably be pretty easy to ask them to arrange.
thanks
-- PMM
* Re: [PATCH] user-exec: Do not filter the signal on si_code
2019-10-01 11:46 ` Peter Maydell
@ 2019-10-01 13:15 ` Laurent Vivier
2019-10-01 14:58 ` Richard Henderson
0 siblings, 1 reply; 8+ messages in thread
From: Laurent Vivier @ 2019-10-01 13:15 UTC (permalink / raw)
To: Peter Maydell
Cc: Paolo Bonzini, Richard Henderson, QEMU Developers, David Gibson
On 01/10/2019 at 13:46, Peter Maydell wrote:
> On Tue, 1 Oct 2019 at 12:19, Laurent Vivier <laurent@vivier.eu> wrote:
>> Is it possible to update the farm to Centos 8?
>>
>> Or as the kernel involved is specifically for POWER9, is it possible to
>> use only POWER8?
>
> My experience is that the gcc cfarm admins aren't in
> principle against the idea of upgrading farm machines,
> but in practice they tend to have a shortage of effort.
> If there's a centos-7-kernel-update package that could
> be installed without doing a full distro upgrade that
> would probably be pretty easy to ask them to arrange.
It seems Centos provides a 4.18 kernel for POWER9 on Centos 7:
http://mirror.centos.org/altarch/7/os/power9/Packages/kernel-4.18.0-80.7.2.el7.ppc64le.rpm
Thanks,
Laurent
* Re: [PATCH] user-exec: Do not filter the signal on si_code
2019-10-01 13:15 ` Laurent Vivier
@ 2019-10-01 14:58 ` Richard Henderson
0 siblings, 0 replies; 8+ messages in thread
From: Richard Henderson @ 2019-10-01 14:58 UTC (permalink / raw)
To: Laurent Vivier, Peter Maydell
Cc: Paolo Bonzini, QEMU Developers, David Gibson
On 10/1/19 6:15 AM, Laurent Vivier wrote:
> On 01/10/2019 at 13:46, Peter Maydell wrote:
>> On Tue, 1 Oct 2019 at 12:19, Laurent Vivier <laurent@vivier.eu> wrote:
>>> Is it possible to update the farm to Centos 8?
>>>
>>> Or as the kernel involved is specifically for POWER9, is it possible to
>>> use only POWER8?
>>
>> My experience is that the gcc cfarm admins aren't in
>> principle against the idea of upgrading farm machines,
>> but in practice they tend to have a shortage of effort.
>> If there's a centos-7-kernel-update package that could
>> be installed without doing a full distro upgrade that
>> would probably be pretty easy to ask them to arrange.
>
> It seems Centos provides a 4.18 kernel for POWER9 on Centos 7:
>
> http://mirror.centos.org/altarch/7/os/power9/Packages/kernel-4.18.0-80.7.2.el7.ppc64le.rpm
Thanks guys. I've sent a message to the admins asking for an update on gcc135.
r~