From: Bodo Stroesser <bstroesser@fujitsu-siemens.com>
To: Ulrich Weigand <uweigand@de.ibm.com>
Cc: schwidefsky@de.ibm.com, linux-kernel@vger.kernel.org,
Bodo Stroesser <bstroesser@fujitsu-siemens.com>
Subject: Re: Again: UML on s390 (31Bit)
Date: Fri, 13 May 2005 17:07:33 +0200
Message-ID: <4284C2B5.5040604@fujitsu-siemens.com>
In-Reply-To: <427B6B6D.5080609@fujitsu-siemens.com>
[-- Attachment #1: Type: text/plain, Size: 3582 bytes --]
Bodo Stroesser wrote:
> Ulrich Weigand wrote:
>
>> Bodo Stroesser wrote:
>>
>>
>>> Unfortunately, I guess this will not help. But maybe I'm missing
>>> something, as I don't even understand what the effect of the
>>> attached patch should be.
>>
>>
>> Have you tried it?
Meanwhile I've tried it.
Your patch does not change the host's behavior at all in the situation
that is relevant to UML.
I've prepared and attached a small program that can easily reproduce
the problem. I hope this will help to find a viable solution.
Regards
Bodo
>>
>>
>>> AFAICS, after each call to do_signal(),
>>> entry.S will return to user without regs->trap being checked again.
>>> do_signal() is the only place where regs->trap is checked, and
>>> it will be called on return to user exactly once.
>>
>>
>> It will be called multiple times if *multiple* signals are pending,
>> and this is exactly the situation in your problem case (some other
>> signal is pending after the ptrace intercept SIGTRAP was delivered).
>
> No, that's not the situation we are talking about.
>
> UML runs its child with ptrace(PTRACE_SYSCALL).
> The syscall interceptions do not use *real* signals. Instead, before
> and after it calls the syscall handler, entry.S calls syscall_trace(),
> which in turn uses ptrace_notify() to inform the father.
> The father will see an event similar to the child receiving SIGTRAP or
> (SIGTRAP|0x80), but there will be no signal queued and do_signal() will
> not be called.
>
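To illustrate how the father can tell the two kinds of stops apart, here
is a minimal sketch in C. The enum and the function name classify_stop()
are made up for this example and are not taken from UML. Note that
(SIGTRAP|0x80) is only reported if PTRACE_O_TRACESYSGOOD is set, and that
a plain SIGTRAP stop could also stem from a real SIGTRAP, so this is only
a rough classification.

```c
#include <signal.h>
#include <sys/wait.h>

/* Hypothetical labels for this sketch; not taken from UML or the kernel. */
enum stop_kind {
	STOP_NOT_STOPPED,	/* child exited or was killed */
	STOP_SYSCALL,		/* notification from a syscall interception */
	STOP_SIGNAL		/* some other signal is about to be delivered */
};

/* Classify a status value as returned by waitpid(). */
static enum stop_kind classify_stop(int status)
{
	int sig;

	if (!WIFSTOPPED(status))
		return STOP_NOT_STOPPED;

	sig = WSTOPSIG(status);

	/* Assumption: a SIGTRAP stop under PTRACE_SYSCALL is treated as a
	 * syscall interception; a real SIGTRAP would look the same. */
	if (sig == SIGTRAP || sig == (SIGTRAP | 0x80))
		return STOP_SYSCALL;

	return STOP_SIGNAL;
}
```

The attached test program does the same checks inline with WIFSTOPPED()
and WSTOPSIG() after each waitpid().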
> UML makes all changes to its child at these two interceptions. It reads
> the syscall number and register contents at the first syscall
> interception, writes a dummy number as the syscall number, restarts the
> child with ptrace(PTRACE_SYSCALL) and waits until the second
> interception for the syscall happens. Next it internally executes its
> syscall handler for the original syscall number and writes the resulting
> register contents to the child. Now syscall handling in UML is finished
> and the child is resumed with ptrace(PTRACE_SYSCALL). The host's
> do_signal() is not called while doing all this.
>
> UML does not know whether a signal is pending or not. It would not
> even help if there were a way to retrieve this information: a
> signal could still come in between retrieving the info and the child
> being scheduled after ptrace(PTRACE_SYSCALL).
>
> If there is a signal pending for the child, entry.S now jumps to
> sysc_return, which in turn jumps to sysc_work, which calls do_signal()
> exactly once. As trap still indicates a syscall, do_signal() may
> modify psw and gpr2, which makes UML fail.
>
> The signal is not related to the syscall. UML does not know whether it
> is delivered while returning from the syscall, with do_signal() changing
> registers, or later, without changes from do_signal(). So UML can't
> undo the changes done by do_signal().
>
> To UML the signal is an interrupt, and normally when returning from
> interrupt it doesn't want to modify child's psw or gprs. So UML
> normally does not modify psw or gprs on signal interceptions.
>
> Having said all this, unfortunately I don't see a way to satisfy UML's
> needs with the current host implementation.
>
> Regards,
> Bodo
>
>>
>>
>>> So a practical solution should allow regs->trap to be reset while the
>>> child is on the first or second syscall interception.
>>
>>
>> This is exactly what this patch is supposed to do: whenever during
>> a ptrace intercept the PSW is changed (as it presumably is by your
>> sigreturn implementation), regs->trap is automatically reset.
>>
>> Bye,
>> Ulrich
>>
>
>
[-- Attachment #2: check_restart_skip.c --]
[-- Type: text/plain, Size: 6548 bytes --]
/*
 * This is a tool to test syscall invalidation via ptrace on s390.
 * It is based on arch/um/os-Linux/start_up.c
 */
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sched.h>
#include <errno.h>
#include <stdarg.h>
#include <stdlib.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <sys/mman.h>
#include <asm/unistd.h>
#include <asm/page.h>
#include <linux/ptrace.h>
#include <stddef.h>
#include <string.h>
#include <fcntl.h>
#include <sys/types.h>
#define ERESTARTNOINTR 513

static int ptrace_child(void *arg)
{
	int ret;
	int pid = getpid();
	int sc_result;

	/* Child wants to be ptraced */
	if(ptrace(PTRACE_TRACEME, 0, 0, 0) < 0){
		perror("ptrace");
		kill(pid, SIGKILL);
	}

	/* Child stops itself */
	kill(pid, SIGSTOP);

	/* The following part is run under PTRACE_SYSCALL */

	/* Here we have "svc __NR_getpid" twice. Father will invalidate the
	 * first and skip the second by adding 6 to PSWADDR.
	 * If the host does the unwanted syscall restarting, the second svc
	 * will be done and the result will be child's pid instead of
	 * -ERESTARTNOINTR
	 */
	__asm__ __volatile__ (
		" svc %b1\n"
		" .long 0\n"
		" svc %b1\n"
		" lr %0,2"
		: "=d" (sc_result)
		: "i" (__NR_getpid)
		: "2" );

	/* Here we are back running PTRACE_CONT */

	/* Now we check the result of the syscall */
	if (sc_result == -ERESTARTNOINTR)
		ret = 0; /* Expected result: syscall was invalidated, no
			    syscall restart is done */
	else if (sc_result == pid)
		ret = 1; /* This is wrong, as it is the normal result of
			    getpid(). Probably host did a syscall restart! */
	else
		ret = 2; /* We don't know what happened. There may be a bug in
			    this test tool */

	/* Give father a status indicating success or failure */
	exit(ret);
}

static void errout(char *str, int error)
{
	printf(str, error);
	putchar('\n');
	exit(1);
}

static int start_ptraced_child(void **stack_out)
{
	void *stack;
	unsigned long sp;
	int pid, n, status;

	stack = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE | PROT_EXEC,
		     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if(stack == MAP_FAILED)
		errout("start_ptraced_child : mmap failed, errno = %d", errno);

	sp = (unsigned long) stack + PAGE_SIZE - sizeof(void *);
	pid = __clone(ptrace_child, (void *) sp, SIGCHLD, NULL);
	if(pid < 0)
		errout("start_ptraced_child : clone failed, errno = %d", errno);

	n = waitpid(pid, &status, WUNTRACED);
	if(n < 0)
		errout("start_ptraced_child : wait failed, errno = %d", errno);
	if(!WIFSTOPPED(status) || (WSTOPSIG(status) != SIGSTOP))
		errout("start_ptraced_child : expected SIGSTOP, "
		       "got status = 0x%x", status);

	*stack_out = stack;
	return(pid);
}

static int stop_ptraced_child(int pid, void *stack)
{
	int status, n;

	/* We resume our child and let it check its result */
	if(ptrace(PTRACE_CONT, pid, 0, 0) < 0)
		errout("stop_ptraced_child : ptrace failed, errno = %d", errno);

	/* Now, we wait for the child to exit */
	n = waitpid(pid, &status, 0);
	if(n < 0)
		errout("stop_ptraced_child : wait failed, errno = %d", errno);
	if(!WIFEXITED(status))
		errout("\nstop_ptraced_child: error: child didn't exit,"
		       " status 0x%x\n", status);

	if(munmap(stack, PAGE_SIZE) < 0)
		errout("stop_ptraced_child : munmap failed, errno = %d", errno);

	/* Return child's exit status */
	return WEXITSTATUS(status);
}

int main(void)
{
	void *stack;
	int pid, syscall, n, status;
	unsigned long addr;

	printf("Checking if syscall restart handling in host can be skipped...");
	fflush(stdout);

	/* First create a child and wait until it stops itself */
	pid = start_ptraced_child(&stack);

	/* Now resume the child */
	if(ptrace(PTRACE_SYSCALL, pid, 0, 0) < 0)
		errout("check_restart_skip : ptrace failed, "
		       "errno = %d", errno);

	/* Wait until child does a syscall */
	n = waitpid(pid, &status, WUNTRACED);
	if(n < 0)
		errout("check_restart_skip : wait failed, "
		       "errno = %d", errno);
	if(!WIFSTOPPED(status) || (WSTOPSIG(status) != SIGTRAP))
		errout("check_restart_skip : expected "
		       "SIGTRAP, got status = %d", status);

	/* Check if syscall is __NR_getpid */
	syscall = ptrace(PTRACE_PEEKUSR, pid, PT_GPR2, 0);
	if(syscall != __NR_getpid)
		errout("check_restart_skip: unexpected syscall %d\n", syscall);

	/* Modify syscall number to -1 */
	n = ptrace(PTRACE_POKEUSR, pid, PT_GPR2, -1);
	if(n < 0)
		errout("check_restart_skip : failed to "
		       "modify system call, errno = %d", errno);

	/* Resume child and wait for second syscall interception */
	if(ptrace(PTRACE_SYSCALL, pid, 0, 0) < 0)
		errout("check_restart_skip : ptrace failed, "
		       "errno = %d", errno);
	n = waitpid(pid, &status, WUNTRACED);
	if(n < 0)
		errout("check_restart_skip : wait failed, "
		       "errno = %d", errno);
	if(!WIFSTOPPED(status) || (WSTOPSIG(status) != SIGTRAP))
		errout("check_restart_skip : expected "
		       "SIGTRAP, got status = %d", status);

	/* Now, modify PSW_ADDR to skip second syscall */
	addr = ptrace(PTRACE_PEEKUSR, pid, PT_PSWADDR, 0);
	n = ptrace(PTRACE_POKEUSR, pid, PT_PSWADDR, addr+6);
	if(n < 0)
		errout("check_restart_skip : failed to modify PSWADDR,"
		       " errno = %d", errno);

	/* Set syscall result to -ERESTARTNOINTR */
	n = ptrace(PTRACE_POKEUSR, pid, PT_GPR2, -ERESTARTNOINTR);
	if(n < 0)
		errout("check_restart_skip : failed to modify system "
		       "call result, errno = %d", errno);

	/* Here "accidentally" a signal is queued for the child */
	kill(pid, SIGALRM);

	/* We resume the child again and wait for next interception */
	if(ptrace(PTRACE_SYSCALL, pid, 0, 0) < 0)
		errout("check_restart_skip : ptrace failed, "
		       "errno = %d", errno);
	n = waitpid(pid, &status, WUNTRACED);
	if(n < 0)
		errout("check_restart_skip : wait failed, "
		       "errno = %d", errno);

	/* The interception must be for the signal, not for a syscall.
	 * Here, UML would do some interrupt processing.
	 */
	if(!WIFSTOPPED(status) || (WSTOPSIG(status) != SIGALRM))
		errout("check_restart_skip : expected "
		       "SIGALRM, got status = %d", status);

	/* At the end of interrupt processing, UML would resume the child
	 * doing ptrace(PTRACE_SYSCALL), but without modifying the regs.
	 * Here we call stop_ptraced_child, which will resume the child
	 * with ptrace(PTRACE_CONT). Then the child will check the "result"
	 * and will exit with
	 *   0 if the result is -ERESTARTNOINTR
	 *   1 if the result is child's pid (host did syscall restart)
	 *   2 if we have an unexpected result
	 */
	n = stop_ptraced_child(pid, stack);
	if (n)
		printf("failed, result = %d\n", n);
	else
		printf("OK\n");

	return n;
}