* Bug 16061 - single stepping in a signal handler can cause the single step flag to get "stuck"
@ 2010-06-02 19:23 Oleg Nesterov
2010-06-02 20:07 ` Oleg Nesterov
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Oleg Nesterov @ 2010-06-02 19:23 UTC (permalink / raw)
To: Roland McGrath; +Cc: Andrew Morton, Evan Teran, Jan Kratochvil, linux-kernel
(see https://bugzilla.kernel.org/show_bug.cgi?id=16061)
Roland McGrath wrote:
>
> Oleg, please get an appropriate test case for this into the ptrace-tests suite.
The first thing I did, I created the test-case ;)
#include <signal.h>
#include <unistd.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <sys/user.h>
#include <assert.h>
#include <stddef.h>
void handler(int n)
{
asm("nop; nop; nop");
}
int child(void)
{
assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
signal(SIGALRM, handler);
kill(getpid(), SIGALRM);
return 0x23;
}
void *getip(int pid)
{
return (void*)ptrace(PTRACE_PEEKUSER, pid,
offsetof(struct user, regs.rip), 0);
}
int main(void)
{
int pid, status;
pid = fork();
if (!pid)
return child();
assert(wait(&status) == pid);
assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGALRM);
assert(ptrace(PTRACE_SINGLESTEP, pid, 0, SIGALRM) == 0);
assert(wait(&status) == pid);
assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP);
assert((getip(pid) - (void*)handler) == 0);
assert(ptrace(PTRACE_SINGLESTEP, pid, 0, SIGALRM) == 0);
assert(wait(&status) == pid);
assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP);
assert((getip(pid) - (void*)handler) == 1);
assert(ptrace(PTRACE_CONT, pid, 0,0) == 0);
assert(wait(&status) == pid);
assert(WIFEXITED(status) && WEXITSTATUS(status) == 0x23);
return 0;
}
It is x86 specific and needs -O2. Probably I can just remove getip()
and related asserts and send it to Jan.
> That change might be the right one, but we should discuss it more in email, and
> look at the situation on other machines.
Yes. And I think it is better to discuss this on lkml.
I do not know what is the right fix. I do not like the fix in
https://bugzilla.kernel.org/attachment.cgi?id=26587 even if it is correct.
I can't explain this, but I think that tracehook.h is not the right place
to call disable_step(). And note that handle_signal() plays with TF anyway.
I am starting to think we should fix this per arch. As for x86, perhaps
we should start with this one-liner
spin_unlock_irq(¤t->sighand->siglock);
tracehook_signal_handler(sig, info, ka, regs,
- test_thread_flag(TIF_SINGLESTEP));
+ test_and_clear_thread_flag(TIF_SINGLESTEP));
return 0;
}
then do other changes.
However, what I am thinking about is the more "clever" change (it
passed ptrace-tests).
Do you think it can be correct? I am asking because I never understood
the TIF_SINGLESTEP/TIF_FORCED_TF interaction. But otoh, shouldn't
TIF_FORCED_TF + X86_EFLAGS_TF always imply TIF_SINGLESTEP? at least
in handle_signal().
IOW, help! To me, the patch below is also cleanup. But only if you think
it can fly ;)
Oleg.
--- 34-rc1/arch/x86/kernel/signal.c~BZ16061_MAYBE_FIX 2010-06-02 21:11:07.000000000 +0200
+++ 34-rc1/arch/x86/kernel/signal.c 2010-06-02 21:11:48.000000000 +0200
@@ -682,6 +682,7 @@ static int
handle_signal(unsigned long sig, siginfo_t *info, struct k_sigaction *ka,
sigset_t *oldset, struct pt_regs *regs)
{
+ bool stepping;
int ret;
/* Are we from a system call? */
@@ -706,13 +707,10 @@ handle_signal(unsigned long sig, siginfo
}
}
- /*
- * If TF is set due to a debugger (TIF_FORCED_TF), clear the TF
- * flag so that register information in the sigcontext is correct.
- */
- if (unlikely(regs->flags & X86_EFLAGS_TF) &&
- likely(test_and_clear_thread_flag(TIF_FORCED_TF)))
- regs->flags &= ~X86_EFLAGS_TF;
+ stepping = test_thread_flag(TIF_SINGLESTEP);
+ if (stepping)
+ // do this before setup_sigcontext()
+ user_disable_single_step(current);
ret = setup_rt_frame(sig, ka, info, oldset, regs);
@@ -748,8 +746,7 @@ handle_signal(unsigned long sig, siginfo
recalc_sigpending();
spin_unlock_irq(¤t->sighand->siglock);
- tracehook_signal_handler(sig, info, ka, regs,
- test_thread_flag(TIF_SINGLESTEP));
+ tracehook_signal_handler(sig, info, ka, regs, stepping);
return 0;
}
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: Bug 16061 - single stepping in a signal handler can cause the single step flag to get "stuck" 2010-06-02 19:23 Bug 16061 - single stepping in a signal handler can cause the single step flag to get "stuck" Oleg Nesterov @ 2010-06-02 20:07 ` Oleg Nesterov 2010-06-06 16:38 ` [PATCH 0/1] ptrace: x86: stepping in a signal handler leaks X86_EFLAGS_TF Oleg Nesterov 2010-06-16 2:31 ` Bug 16061 - single stepping in a signal handler can cause the single step flag to get "stuck" Roland McGrath 2 siblings, 0 replies; 7+ messages in thread From: Oleg Nesterov @ 2010-06-02 20:07 UTC (permalink / raw) To: Roland McGrath; +Cc: Andrew Morton, Evan Teran, Jan Kratochvil, linux-kernel sorry for noise, forgot to mention... On 06/02, Oleg Nesterov wrote: > > However, what I am thinking about is the more "clever" change (it > passed ptrace-tests). > > Do you think it can be correct? I am asking because I never understood > the TIF_SINGLESTEP/TIF_FORCED_TF interaction. But otoh, shouldn't > TIF_FORCED_TF + X86_EFLAGS_TF always imply TIF_SINGLESTEP? at least > in handle_signal(). > > IOW, help! To me, the patch below is also cleanup. But only if you think > it can fly ;) and it is not clear to me if we should keep this code /* * Clear TF when entering the signal handler, but * notify any tracer that was single-stepping it. * The tracer may want to single-step inside the * handler too. */ regs->flags &= ~X86_EFLAGS_TF; in handle_signal(). If TF was set by us, it was cleared by user_disable_single_step(). Otherwise, why should we clear it if the tracer did set_flags(X86_EFLAGS_TF) ? Oleg. > --- 34-rc1/arch/x86/kernel/signal.c~BZ16061_MAYBE_FIX 2010-06-02 21:11:07.000000000 +0200 > +++ 34-rc1/arch/x86/kernel/signal.c 2010-06-02 21:11:48.000000000 +0200 > @@ -682,6 +682,7 @@ static int > handle_signal(unsigned long sig, siginfo_t *info, struct k_sigaction *ka, > sigset_t *oldset, struct pt_regs *regs) > { > + bool stepping; > int ret; > > /* Are we from a system call? */ > @@ -706,13 +707,10 @@ handle_signal(unsigned long sig, siginfo > } > } > > - /* > - * If TF is set due to a debugger (TIF_FORCED_TF), clear the TF > - * flag so that register information in the sigcontext is correct. > - */ > - if (unlikely(regs->flags & X86_EFLAGS_TF) && > - likely(test_and_clear_thread_flag(TIF_FORCED_TF))) > - regs->flags &= ~X86_EFLAGS_TF; > + stepping = test_thread_flag(TIF_SINGLESTEP); > + if (stepping) > + // do this before setup_sigcontext() > + user_disable_single_step(current); > > ret = setup_rt_frame(sig, ka, info, oldset, regs); > > @@ -748,8 +746,7 @@ handle_signal(unsigned long sig, siginfo > recalc_sigpending(); > spin_unlock_irq(¤t->sighand->siglock); > > - tracehook_signal_handler(sig, info, ka, regs, > - test_thread_flag(TIF_SINGLESTEP)); > + tracehook_signal_handler(sig, info, ka, regs, stepping); > > return 0; > } ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 0/1] ptrace: x86: stepping in a signal handler leaks X86_EFLAGS_TF 2010-06-02 19:23 Bug 16061 - single stepping in a signal handler can cause the single step flag to get "stuck" Oleg Nesterov 2010-06-02 20:07 ` Oleg Nesterov @ 2010-06-06 16:38 ` Oleg Nesterov 2010-06-06 16:39 ` [PATCH 1/1] " Oleg Nesterov 2010-06-16 2:31 ` Bug 16061 - single stepping in a signal handler can cause the single step flag to get "stuck" Roland McGrath 2 siblings, 1 reply; 7+ messages in thread From: Oleg Nesterov @ 2010-06-06 16:38 UTC (permalink / raw) To: Roland McGrath, Andrew Morton; +Cc: Evan Teran, Jan Kratochvil, linux-kernel On 06/02, Oleg Nesterov wrote: > > I am starting to think we should fix this per arch. As for x86, perhaps > we should start with this one-liner > > spin_unlock_irq(¤t->sighand->siglock); > > tracehook_signal_handler(sig, info, ka, regs, > - test_thread_flag(TIF_SINGLESTEP)); > + test_and_clear_thread_flag(TIF_SINGLESTEP)); > > return 0; > } > > then do other changes. I am sending this patch. It is still not clear to me what is the "right" fix, we need more discussion. Let's fix the bug first. Oleg. ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/1] ptrace: x86: stepping in a signal handler leaks X86_EFLAGS_TF 2010-06-06 16:38 ` [PATCH 0/1] ptrace: x86: stepping in a signal handler leaks X86_EFLAGS_TF Oleg Nesterov @ 2010-06-06 16:39 ` Oleg Nesterov 0 siblings, 0 replies; 7+ messages in thread From: Oleg Nesterov @ 2010-06-06 16:39 UTC (permalink / raw) To: Roland McGrath, Andrew Morton; +Cc: Evan Teran, Jan Kratochvil, linux-kernel See https://bugzilla.kernel.org/show_bug.cgi?id=16061 When the TIF_SINGLESTEP tracee dequeues a signal, handle_signal() clears TIF_FORCED_TF and X86_EFLAGS_TF but leaves TIF_SINGLESTEP set. If the tracer does PTRACE_SINGLESTEP again, enable_single_step() sets X86_EFLAGS_TF but not TIF_FORCED_TF. This means that the subsequent PTRACE_CONT doesn't not clear X86_EFLAGS_TF, and the tracee gets the wrong SIGTRAP. Test-case (based on the excellent report from Evan): #include <unistd.h> #include <stdio.h> #include <sys/ptrace.h> #include <sys/wait.h> #include <sys/user.h> #include <assert.h> #include <stddef.h> void handler(int n) { asm("nop"); } int child(void) { assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0); signal(SIGALRM, handler); kill(getpid(), SIGALRM); return 0x23; } void *getip(int pid) { return (void*)ptrace(PTRACE_PEEKUSER, pid, offsetof(struct user, regs.rip), 0); } int main(void) { int pid, status; pid = fork(); if (!pid) return child(); assert(wait(&status) == pid); assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGALRM); assert(ptrace(PTRACE_SINGLESTEP, pid, 0, SIGALRM) == 0); assert(wait(&status) == pid); assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP); assert((getip(pid) - (void*)handler) == 0); assert(ptrace(PTRACE_SINGLESTEP, pid, 0, SIGALRM) == 0); assert(wait(&status) == pid); assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP); assert((getip(pid) - (void*)handler) == 1); assert(ptrace(PTRACE_CONT, pid, 0,0) == 0); assert(wait(&status) == pid); assert(WIFEXITED(status) && WEXITSTATUS(status) == 0x23); return 0; } Change handle_signal() to clear TIF_SINGLESTEP as well. We cleared X86_EFLAGS_TF and TIF_FORCED_TF, we are not going to return to user-mode if TIF_SINGLESTEP was set, and we already passed syscall_trace_leave(). We are going to sleep until the next ptrace_resume() which should set these flags correctly if needed. Note: this is the most simple fix now, most probably we need more changes/cleanups on top of this patch. Reported-by: Evan Teran <eteran@alum.rit.edu> Signed-off-by: Oleg Nesterov <oleg@redhat.com> --- arch/x86/kernel/signal.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- 34-rc1/arch/x86/kernel/signal.c~BZ16061_TEMPORARY_FIX 2010-06-06 16:22:55.000000000 +0200 +++ 34-rc1/arch/x86/kernel/signal.c 2010-06-06 16:47:55.000000000 +0200 @@ -749,7 +749,7 @@ handle_signal(unsigned long sig, siginfo spin_unlock_irq(¤t->sighand->siglock); tracehook_signal_handler(sig, info, ka, regs, - test_thread_flag(TIF_SINGLESTEP)); + test_and_clear_thread_flag(TIF_SINGLESTEP)); return 0; } ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Bug 16061 - single stepping in a signal handler can cause the single step flag to get "stuck" 2010-06-02 19:23 Bug 16061 - single stepping in a signal handler can cause the single step flag to get "stuck" Oleg Nesterov 2010-06-02 20:07 ` Oleg Nesterov 2010-06-06 16:38 ` [PATCH 0/1] ptrace: x86: stepping in a signal handler leaks X86_EFLAGS_TF Oleg Nesterov @ 2010-06-16 2:31 ` Roland McGrath 2010-06-16 19:56 ` Oleg Nesterov 2 siblings, 1 reply; 7+ messages in thread From: Roland McGrath @ 2010-06-16 2:31 UTC (permalink / raw) To: Oleg Nesterov; +Cc: Jan Kratochvil, linux-kernel, x86 Sorry I'm slow in getting back to this. I was sick last week and I'm still rather foggy-headed. I haven't actually tried today to think through all the corners you've talked about. I'm hoping a couple of comments might clarify things for you enough that you'll restate it all in ways that don't require me to get less foggy and follow every detail of what you've said so far. :-} x86's TIF_SINGLESTEP is slightly confusingly overloaded for two things that are sort of different, but also perhaps are entirely redundant. (I'm the one who did that to begin with when originally fixing the syscall-step and signal-step corners years ago, but I still get to complain. ;-) Its original use is for do_debug: if ((dr6 & DR_STEP) && !user_mode(regs)) { This is for non-interrupt kernel entries where the hardware does not clear TF from the incoming user-mode eflags. I believe that's only the sysenter instruction, but I'm not entirely positive off hand. (This is helpfully not true of the syscall instruction that x86-64 has and that AMD CPUs use instead of sysenter for 32-bit processes on x86-64--grep MSR_SYSCALL_MASK.) Here the original idea was that the user registers have TF set, but it took effect in kernel mode. So it has to be cleared to stop interfering with the normal kernel asm paths. But it wants to be set again before we return to user mode. This is simplistic thinking just trying to mimic the hardware logic for interrupt entries (including 'int $0x80' syscalls), where the hardware stores TF in the trap frame with the rest of the user-mode state, and clears TF before executing the first kernel-mode instruction. (Since we set the MSR right, that's what 'syscall' instructions do in hardware too.) In the problem cases (such as ia32_sysenter_target), the hardware just goes to kernel mode and only changes cs and ip so the kernel code has to manually store all the user-mode state, including its eflags. Since eflags is left as it was, TF there causes the trap to do_debug after the first kernel-mode instruction--a few instructions before the kernel asm has stored the user eflags anywhere. So the original meaning of TIF_SINGLESTEP was simply to turn TF back on before going to user mode (I think it might originally have been handled in do_notify_resume). That was before I got involved. Perhaps some ancient LKML archives can reveal the actual history, but I've never looked. I suspect the way it went was that people noticed an unhandled trap in kernel mode after sysenter support was added, and just scratched that itch by clearing it in do_debug. Then it would have been noticed that a sysenter (vs an 'int $0x80') caused TF state just to be lost, so that both user-mode setting TF itself, and PTRACE_SINGLESTEP as it existed at the time, stopped working as they had. That's solved by adding TIF_SINGLESTEP as it originally was, just to get TF set again in user mode. Then it was cited as a problem that single-step over a system call instruction (of whichever flavor) didn't actually single-step, but double-stepped. This is when I first got involved, and this was the beginning of my understanding of the kernel entry asm paths--I was probably quite vague then on how the do_debug trap in kernel mode case might happen. For 'int $0x80' system calls, this is because the user-mode TF is just restored by the hardware in 'iret' (which is what transitions the hardware back from kernel to user mode), so it fires when the next user instruction completes--but from the user perspective, the last instruction was the 'int $0x80' itself (done with TF set), so by the hardware rules as they apply to all unprivileged instructions, the single-step trap should be right before the following instruction, not after it. For sysenter or syscall, it's similar: the sysexit (or sysret) instruction restores the user-mode eflags at the same time it transitions to user mode, so that TF applies to the following user instruction rather than the sysenter (syscall) that has just completed from the user perspective. To address that, we added the TIF_SINGLESTEP path to syscall_trace_leave (to further confuse things, it was a unified syscall_trace at the time). That originally just treated TIF_SINGLESTEP the same as it did TIF_SYSCALL_TRACE for exit tracing, since basic ptrace stops look the same either way. (We later cleaned that up since PTRACE_O_TRACESYSGOOD makes a syscall-exit stop look different from a single-step trap, and PTRACE_SINGLESTEP should look like a step, not like PTRACE_SYSCALL when you didn't use that. I also tried to clean it up more with thoughts of utrace, where syscall-tracing and single-step are independent knobs rather than being mutually exclusive as the old ptrace interface has it. And, of course, there was TIF_SYSCALL_EMU thrown in there somewhere to really complicate the possible control flow.) Much later, the TIF_SINGLESTEP path to syscall_trace_enter was added. (I'm skipping ahead in the chronology here to stay closer on topic.) This is for the (original) scenario of the DR_STEP trap in kernel mode, i.e. the sysenter entry path. This is to notice that the user-mode TF was artificially cleared by do_debug, and restore it into task_pt_regs()->flags as early as possible. (This would have been a good way to do the original TIF_SINGLESTEP back in prehistory, but nobody thought of doing it that way.) This is so that an examination of task_pt_regs() sees the "real" user-mode state, e.g. ptrace peeking at it during the syscall-entry tracing stop. Back to prehistory. It's always been possible in the hardware for user-mode to set TF itself, i.e.: pushf orw $0x200, (%esp) popf Weird applications actually do this, for strange ideas about code that checks who calls it, pure user self-debugging, and other weirdness. (That all may be ill-advised to do, but it's always been well-defined.) When user code does this, what happens is a natural analogue of what the hardware does: you get a SIGTRAP signal (presumably having installed a handler), with the whole register state (including TF set in eflags) in the struct sigcontext for the handler, but TF is cleared before running the signal handler, so it doesn't recurse. This is why signal.c clears TF in the task_pt_regs() (same pointer as regs arg to handle_signal) when setting up a signal handler. Around the same time as the syscall-step "double-step" issue was noticed, someone noticed that PTRACE_SINGLESTEP,signr for a handled signal had a similar issue: it gives you a SIGTRAP after executing the first instruction of the signal handler. But a person using a debugger would like to see a "stepi" operation go from one known user instruction that was about to be executed to the very next user-mode instruction that will be executed--i.e., stop before the first instruction of the signal handler and have the chance to look at registers and memory before executing it. Since in the signal-handling scenario, there is no state at which you were before one user instruction with TF set, and at the completion of that instruction you are at the first user instruction of the signal handler, there is no way to have an actual trap (do_debug) at the user register state we want to see. So instead we added a "fake trap" for PTRACE_SINGLESTEP, in the form of a ptrace_notify() call. Later that became tracehook_signal_handler(). Not long after, I wanted to clean things up, and added TIF_FORCED_TF. This is to keep debugger single-stepping coherent when user mode sets TF itself, and to only ever show the "real" user mode eflags bits even when debugger single-stepping is in use. This is for user_regset (e.g. for ptrace peeking at registers) and for struct sigcontext when the debugger single-steps into a signal handler. Linus had one of the aforementioned weird applications in binary form and wanted to debug it, so he went whole-hog and also added the is_setting_trap_flag() code that is now in step.c--so single-stepping that "popf" clears TIF_FORCED_TF, and we won't later confuse the user-chosen "real" TF (or lack thereof) with one induced by user_enable_single_step(). (That detection is imperfect in various ways, but solved the problem Linus had.) To complete the background, there is one more set of wrinkles. There are the various potential ptrace stops that take place "inside" some system call (execve, clone, fork, or vfork). In these cases, from the time a debugger could do PTRACE_SINGLESTEP with the user-mode PC about to do the system call instruction (whichever one) until the next actual execution of a user-mode instruction (i.e. the following one or a signal handler preempting it, or the very first one at the entry point for an exec), there is only a single "step" to make from one instruction to the next. But there might be three places for ptrace stops before you get to where resuming would actually execute that next instruction: syscall-entry tracing, in-syscall tracing (i.e. PTRACE_EVENT_*), and syscall-exit tracing. Then there might be a signal stop. At all these places, the debugger can decide to PTRACE_SINGLESTEP or not, after either it or user-mode originally was single-stepping into the system call instruction. So what does that all mean? Historically, we've decided that if the last kind of resumption before signal handling was single-step (or now, block-step) then we act as if you'd single-stepped into the system call instruction, i.e. stop before the next user-mode instruction. So, if you had not been single-stepping and the first time you looked at the user registers was in a PTRACE_EVENT_* stop, and then you do PTRACE_SINGLESTEP, you'd next see a SIGTRAP (now sent inside tracehook_report_syscall_exit(,1)) with the same PC that you just saw (the one following the system call instruction). One final note about the current behavior, for user-mode setting TF itself. Because the trap in kernel mode (sysenter instruction) makes do_debug set TIF_SINGLESTEP, this triggers the syscall_trace_leave path. So a user that does sysenter with TF set will get his SIGTRAP immediately after the sysenter instruction. Conversely, a syscall entry with TF set via 'int $0x80' or via x86-64's 'syscall' just stores the user's full eflags into task_pt_regs()->flags and never notices whether it contains TF, never sets TIF_SINGLESTEP. Hence, one of these entries (i.e. all syscalls from 64-bit user mode, and non-sysenter syscalls from 32-bit user mode on both 32-bit and 64-bit kernels) will "double-step" over the system call instruction and the one after it. Unless I'm mistaken, this is how it's always been for 'int $0x80' with TF set, and was originally that way when there was first sysenter support too, and from the time x86-64 came along, 64-bit system calls have behaved that way too. In the now several years since we overloaded TIF_SINGLESTEP for PTRACE_SINGLESTEP over system calls not to double-step, user-mode setting TF executing sysenter has behaved differently from 'int $0x80'. If it has ever been done in actual application code, that is. It's likely that all the users of TF in pure user mode either don't care about system calls at all, or use TF only on code that only uses 'int $0x80' system calls. (But who knows.) So the status quo for user-mode setting TF and doing system calls of various sorts is of dubious correctness in the abstract, but apparently good enough for the music of the people--and if the double-step is what it is, then the sysenter variation from that is dubious; but since nobody complains, apparently if anyone has ever actually cared they already catered to the quirk. So there is not much motivation to have any of the behavior seen by pure user TF machinations change at all, even if that would be in the direction of "obviously right in the abstract" (i.e. single-step means single-step, not double-step if the first instruction was too magical). So there you have it. What does it all mean for how it ought to be today? TIF_SINGLESTEP today means either or both of two things: TF was in force when user-mode executed sysenter (whether that TF was set by user-mode or by the debugger); or user_enable_single_step() was used more recently than user_disable_single_step(), i.e. if we're in a system call now, we're meant to act as if we'd single-stepped into the system call instruction. TIF_FORCED_TF today means that the user-mode eflags has TF clear, whether or not TF is actually set in task_pt_regs()->flags at the moment. Various things (both hardware and kernel code) can clear TF from the user-mode eflags but fail to clear TIF_FORCED_TF. It may make sense to have signal.c just notice TIF_FORCED_TF and mask TF out of what it stores in the signal frame rather than actually clearing it in the registers beforehand. (Note you must change ia32_signal.c too. If you use a helper for that, perhaps just globalize and rename get_flags() from ptrace.c, since we have it there too.) But it still needs to clear TF for the handler entry if TIF_SINGLESTEP is not set. And notice also restore_sigcontext (and ia32_restore_sigcontext), which also lets userland choose the TF value without regard to TIF_FORCED_TF. That should clear TIF_FORCED_TF if the new user value has TF, or else keep TF set if TIF_FORCED_TF, like set_flags() from ptrace.c does. (ptrace.c, signal.c, and ia32_signal.c all have their own ideas about the mask of eflags that userland (i.e.) sigreturn-self or ptrace-other is allowed to set or clear. That's another dubious situation of its own, but if we harmonize them then both directions of sigcontext handling can just share the ptrace.c code.) The change to the tracehook_signal_handler() call seems quite wrong to me. I'm not sure I could even define exactly what TIF_SINGLESTEP means after that. I think the problem it tries to address won't exist if all the signal paths that touch TF maintain TIF_FORCED_TF judiciously (e.g. by sharing the ptrace.c code), as I just described. It might make sense to replace these two TIF flags with two flags that have different meanings than these, and adjust everything accordingly to wind up with a situation that's easier to follow. But I don't have a clear idea of what the definitions of those flags would be. Thanks, Roland ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Bug 16061 - single stepping in a signal handler can cause the single step flag to get "stuck" 2010-06-16 2:31 ` Bug 16061 - single stepping in a signal handler can cause the single step flag to get "stuck" Roland McGrath @ 2010-06-16 19:56 ` Oleg Nesterov 2010-06-16 20:53 ` Roland McGrath 0 siblings, 1 reply; 7+ messages in thread From: Oleg Nesterov @ 2010-06-16 19:56 UTC (permalink / raw) To: Roland McGrath; +Cc: Jan Kratochvil, linux-kernel, x86 Roland, thanks a lot for this (long!) explanation. Another email from you which I should save in ~/DOCS. I don't know how many time you spent writing this message, but I bet I spent more trying to understand it ;) And still I need to read it again later. On 06/15, Roland McGrath wrote: > > x86's TIF_SINGLESTEP is slightly confusingly overloaded for two things > that are sort of different, but also perhaps are entirely redundant. Btw yes, I had the same feeling when I did ptrace-utrace. > (I'm the one who did that to begin with when originally fixing the > syscall-step and signal-step corners years ago, but I still get to > complain. ;-) Its original use is for do_debug: > > if ((dr6 & DR_STEP) && !user_mode(regs)) { > > This is for non-interrupt kernel entries where the hardware does not > clear TF from the incoming user-mode eflags. Aha. Damn. Can't resists. And when I read to this point, I decided to learn what exactly DR_STEP and db6 (I looked at native_get_debugreg) mean. More than hour I was trying to google and search in the intel pdf's I have. No lack. Nothing about db[0-6]! However, I discovered that volume 3b system programming guide describes DR6 register (not db!) and it has BS=14 bit (matches DR_STEP == (1 << 14)) which merely means: "the debug exception was triggered by the single-step execution mode". Heh. Finally I can understand this code, more or less. > [... snip a lot of good-to-know details ...] Thanks. > hardware stores TF in the trap frame > with the rest of the user-mode state, and clears TF before executing the > first kernel-mode instruction. (Since we set the MSR right, that's what > 'syscall' instructions do in hardware too.) grep, grep, grep, grep... syscall_init()->wrmsrl(X86_EFLAGS_TF), right? I didn't know. OK. And now I spent a couple more hours. Because I decided to learn how actually syscalls work on x86_64. I read the sysenter code for i386 a long ago, and I found my understanding doesn't match the current reality. grep * a_lot. So, the user-space doesn't use vdso/vsyscall, it merely calls the "syscall" insn. But what about vdso and vsyscall? Looks like, they both do (almost) the same, but using different the methods. IIUC, we put the context of vsyscall_64.o into the gate_vma address (VSYSCALL_START), and then the user-space can call vgetcpu() directly if the application knows about the VSYSCALL_ADDR() magic. But, otoh, we also have vdso. And this is what ldd shows as linux-vdso.so.1. And it also has the "getcpu" helper, __vdso_getcpu(). But, unlike vgetcpu(), this function is "visible" to dynamic linker and thus looks like a normal function. Correct? But why does it have both vdso and vsyscall? I guess, one of them (probably vsyscall) is more or less historical? OK, after this grepping I see it was off-topic. And I do not even remember my concerns which forced me to study this magic now. > So the original meaning of TIF_SINGLESTEP was > simply to turn TF back on before going to user mode > ... > Much later, the TIF_SINGLESTEP path to syscall_trace_enter was added. > This is to notice that the user-mode TF > was artificially cleared by do_debug, and restore it into > task_pt_regs()->flags as early as possible. > This is so that an examination of > task_pt_regs() sees the "real" user-mode state, e.g. ptrace peeking at > it during the syscall-entry tracing stop. Aha. So, in short: we restore TF in syscall_trace_enter() to expose this flag to debugger, right? This was my (vague) understanding, but I wasn't sure. > Back to prehistory. It's always been possible in the hardware for > user-mode to set TF itself, i.e.: > > pushf > orw $0x200, (%esp) > popf Quick question: is it connected to is_setting_trap_flag() ? IOW, what is_setting_trap_flag() actually tells us? Looks like, it should return true if the next insn changes the flags, correct? > but TF is cleared before running > the signal handler, so it doesn't recurse. This is why signal.c clears > TF in the task_pt_regs() (same pointer as regs arg to handle_signal) > when setting up a signal handler. This is not clear to me. Why should we clear TF? The text above says "before running the signal handler". But - if it was set by us: TIF_SINGLESTEP should be true, and we are not going to run the signal handler, we are going to ptrace_notify(SIGTRAP). - else: it was set by the app or debugger. Why should we clear TF? OK, probably we need this if the app sets TF for the self-debugging and has a handerl for SIGTRAP... If it was debugger, then I am not sure. OTOH, I do not know why debugger may want to do set_flags(X86_EFLAGS_TF). > Around the same time as the syscall-step "double-step" issue was > noticed, someone noticed that PTRACE_SINGLESTEP,signr for a handled > signal had a similar issue: it gives you a SIGTRAP after executing the > first instruction of the signal handler. > So instead we added a "fake > trap" for PTRACE_SINGLESTEP Yes, this is clear. Thanks. > Not long after, I wanted to clean things up, and added TIF_FORCED_TF. In short, TIF_FORCED_TF means: we believe we "own" X86_EFLAGS_TF. > and also added the is_setting_trap_flag() code that is now in > step.c--so single-stepping that "popf" clears TIF_FORCED_TF, and we > won't later confuse the user-chosen "real" TF (or lack thereof) with one > induced by user_enable_single_step(). OK, this probably answers my question about is_setting_trap_flag(). But I need to re-read the code and your explanation again, I am starting to lose the concentration. > (That detection is imperfect in > various ways Ah, good. I thought that my misunderstanding is the only problem. > To complete the background, there is one more set of wrinkles. There > are the various potential ptrace stops that take place "inside" some > system call (execve, clone, fork, or vfork). Oh, thanks. I didn't think about this. Again, will re-read your explanation tomorrow. (Probably this doesn't matter in the context of this particular bug, but I am not sure...) > One final note about the current behavior, for user-mode setting TF > itself. Because the trap in kernel mode (sysenter instruction) makes > do_debug set TIF_SINGLESTEP, this triggers the syscall_trace_leave path. Another thing I didn't think about. > So a user that does sysenter with TF set will get his SIGTRAP > immediately after the sysenter instruction. Conversely, a syscall entry > with TF set via 'int $0x80' or via x86-64's 'syscall' just stores the > user's full eflags into task_pt_regs()->flags and never notices whether > it contains TF, never sets TIF_SINGLESTEP. Heh. > Hence, one of these entries > (i.e. all syscalls from 64-bit user mode, and non-sysenter syscalls from > 32-bit user mode on both 32-bit and 64-bit kernels) will "double-step" > over the system call instruction and the one after it. Confused... syscall insn doesn't lead to TIF_SINGLESTEP, so syscall_trace_leave() should return to user-space, and then we should have a single step after the next instruction? (assuming TIF_SYSCALL_TRACE is not set). > It may make sense to have signal.c just notice TIF_FORCED_TF and mask TF > out of what it stores in the signal frame rather than actually clearing > it in the registers beforehand. Yes, the patch I sent you privately does exactly this. > (Note you must change ia32_signal.c OOPS, indeed. > If you use a helper for that, perhaps just globalize and rename > get_flags() from ptrace.c, since we have it there too.) Well, this is minor, but get_flags() from ptrace.c takes the task_struct pointer, while we already have pt_regs in setup_sigcontext... OK, I agree, but then we need a good name for this helper. ptrace_get_task_flags? > But it still > needs to clear TF for the handler entry if TIF_SINGLESTEP is not set. Aha. This partly answers my "Why should we clear TF" question above. At least you agree that if TIF_SINGLESTEP is set, this is not needed. I still can't fully understand why should we clear it otherwise. But, in any case, I guess we should do this because we do not want the user visible changes. > And notice also restore_sigcontext (and ia32_restore_sigcontext), which > also lets userland choose the TF value without regard to TIF_FORCED_TF. Oh! thanks... Again, again, again, will reread tomorrow. But at first glance, doesn't this mean another problem/patch? > The change to the tracehook_signal_handler() call seems quite wrong to > me. Yes, yes, this is clear. See you tomorrow. Thanks! Oleg. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Bug 16061 - single stepping in a signal handler can cause the single step flag to get "stuck" 2010-06-16 19:56 ` Oleg Nesterov @ 2010-06-16 20:53 ` Roland McGrath 0 siblings, 0 replies; 7+ messages in thread From: Roland McGrath @ 2010-06-16 20:53 UTC (permalink / raw) To: Oleg Nesterov; +Cc: Jan Kratochvil, linux-kernel, x86 > Roland, thanks a lot for this (long!) explanation. Another email from > you which I should save in ~/DOCS. I don't know how many time you spent > writing this message, but I bet I spent more trying to understand it ;) That was my plan. ;-) > Damn. Can't resists. And when I read to this point, I decided to > learn what exactly DR_STEP and db6 (I looked at native_get_debugreg) > mean. More than hour I was trying to google and search in the intel > pdf's I have. No lack. Nothing about db[0-6]! However, I discovered > that volume 3b system programming guide describes DR6 register (not db!) Yes, the assembly syntax is %dbN, some comments and docs say DBn, and others say DRn. They're called the "debug registers", abbreviate randomly. Go team. > and it has BS=14 bit (matches DR_STEP == (1 << 14)) which merely means: > "the debug exception was triggered by the single-step execution mode". > Heh. Finally I can understand this code, more or less. Nobody understands it more than more or less. ;-) To get more confused, try to figure out when the hardware (or kernel) does or doesn't clear the DR_STEP bit (or others) in %db6. (Actually, don't. You can find some old LKML thread about hw_breakpoint where we figured that out, but my brain hurts just recalling that I once almost knew.) > grep, grep, grep, grep... syscall_init()->wrmsrl(X86_EFLAGS_TF), right? Right. > OK. And now I spent a couple more hours. Because I decided to learn > how actually syscalls work on x86_64. I read the sysenter code for > i386 a long ago, and I found my understanding doesn't match the current > reality. grep * a_lot. Muwhahaha. Indeed, this is not directly relevant and I judiciously said "the next instruction" for sysenter without mentioning the vDSO magic that is actually where that PC always is. You don't really need to know about that stuff, for this subject. But go ahead and learn about it, and then you'll get stuck fixing it next time there is a problem. ;-) > So, the user-space doesn't use vdso/vsyscall, it merely calls the > "syscall" insn. Right. x86-64 started out with a single, sensible and optimal mechanism for native system calls. The only reason i386 has __kernel_vsyscall is that there are multiple variants best for different hardware, and you don't know which flavor you want until you boot and see the actual CPU. > But what about vdso and vsyscall? Looks like, they both do (almost) > the same, but using different the methods. Yes, the non-DSO vsyscall page is an older hack and now we are stuck with that as a permanent part of the x86-64 user/kernel ABI. The vDSO is the "modern" method, and is treated similarly on i386 and several other machines (e.g. powerpc has exported calls there too). > OK, after this grepping I see it was off-topic. And I do not even > remember my concerns which forced me to study this magic now. Too late! Now we know that you know, and you will be the expert when I am hit by the bus. > Aha. So, in short: we restore TF in syscall_trace_enter() to expose > this flag to debugger, right? This was my (vague) understanding, but > I wasn't sure. Right. It also exposes it correctly in struct sigcontext if user-mode had set TF itself. This fix came long after most of the rest of the logic. So some other existing code might be redundant or strange-looking now that we have this. > > Back to prehistory. It's always been possible in the hardware for > > user-mode to set TF itself, i.e.: > > > > pushf > > orw $0x200, (%esp) > > popf > > Quick question: is it connected to is_setting_trap_flag() ? IOW, > what is_setting_trap_flag() actually tells us? Looks like, it should > return true if the next insn changes the flags, correct? Correct. is_setting_trap_flag() returns true when the PC is pointing at that "popf", for example. > - if it was set by us: TIF_SINGLESTEP should be true, and > we are not going to run the signal handler, we are going > to ptrace_notify(SIGTRAP). Right. > - else: it was set by the app or debugger. Why should we > clear TF? > > OK, probably we need this if the app sets TF for the > self-debugging and has a handerl for SIGTRAP... Correct. That's always been the only way to usefully set TF yourself in user mode (otherwise you'd just get infinite SIGTRAPs). It corresponds to how the hardware trap behavior works, e.g. if you were using signal handlers to execute old DOS code in emulation and had a DOS debugger in there. > If it was debugger, then I am not sure. OTOH, I do not > know why debugger may want to do set_flags(X86_EFLAGS_TF). It's just the general pedanticism that the debugger should be able to set up any state that user-mode code could have created itself. > > Not long after, I wanted to clean things up, and added TIF_FORCED_TF. > > In short, TIF_FORCED_TF means: we believe we "own" X86_EFLAGS_TF. Well, only sort of. It means we "own" it being on. If it's off, that doesn't mean we're choosing to turn it off when the userland state is on, for example. That might be a nice and clear thing for a flag like that to mean. But there are so many ways that it can be implicitly cleared by hardware and such that it would take more work to make it mean that. > > (That detection is imperfect in various ways > > Ah, good. I thought that my misunderstanding is the only problem. I could tell you some ways it could be wrong. But that's left as an exercise for the reader, and you really just don't care. > > To complete the background, there is one more set of wrinkles. There > > are the various potential ptrace stops that take place "inside" some > > system call (execve, clone, fork, or vfork). > > Oh, thanks. I didn't think about this. Again, will re-read your explanation > tomorrow. (Probably this doesn't matter in the context of this particular > bug, but I am not sure...) I don't think it directly influences anything we're talking about changing. But it is a nonobvious thing I wanted to put in the record. > Confused... syscall insn doesn't lead to TIF_SINGLESTEP, so > syscall_trace_leave() should return to user-space, and then we should > have a single step after the next instruction? (assuming TIF_SYSCALL_TRACE > is not set). You're not confused! That's exactly how it works. >From the userland perspective this is a "double-step": 0000000000400078 <_start>: 400078: 9c pushfq 400079: 66 81 0c 24 00 02 orw $0x200,(%rsp) 40007f: 9d popfq 400080: 0f 05 syscall 400082: f4 hlt This doesn't get a SIGTRAP at 400082 as it would for any normal instruction instead of "syscall". Instead it gets SIGSEGV because "hlt" is executed. > > It may make sense to have signal.c just notice TIF_FORCED_TF and mask TF > > out of what it stores in the signal frame rather than actually clearing > > it in the registers beforehand. > > Yes, the patch I sent you privately does exactly this. Right. I think that direction is where the right fixes lie. > Well, this is minor, but get_flags() from ptrace.c takes the task_struct > pointer, while we already have pt_regs in setup_sigcontext... OK, I agree, > but then we need a good name for this helper. ptrace_get_task_flags? I'd call it user_[gs]et_eflags, but names don't matter much. Feel free to change the signatures to take the pt_regs pointer too. Half the ptrace.c callers have the pointer on hand already too. And it wouldn't hurt to change the signature of the local {get,put}reg{,32} calls to take the argument and have their callers use task_pt_regs() only once, too. > > But it still > > needs to clear TF for the handler entry if TIF_SINGLESTEP is not set. > > Aha. This partly answers my "Why should we clear TF" question above. > At least you agree that if TIF_SINGLESTEP is set, this is not needed. Right. > I still can't fully understand why should we clear it otherwise. But, > in any case, I guess we should do this because we do not want the user > visible changes. It's the only way that user-mode being able to set TF on itself can ever make any kind of sense. > > And notice also restore_sigcontext (and ia32_restore_sigcontext), which > > also lets userland choose the TF value without regard to TIF_FORCED_TF. > > Oh! thanks... Again, again, again, will reread tomorrow. But at first > glance, doesn't this mean another problem/patch? Yes, those should use user_set_eflags. With today's kernel I think you could write another test case where a signal handler returns after setting TF in the sigcontext, so that PTRACE_SINGLESTEP over the sigreturn syscall breaks the untraced userland behavior by the next PTRACE_CONT clearing TF when userland wants it set. Thanks, Roland ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2010-06-16 20:53 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-06-02 19:23 Bug 16061 - single stepping in a signal handler can cause the single step flag to get "stuck" Oleg Nesterov 2010-06-02 20:07 ` Oleg Nesterov 2010-06-06 16:38 ` [PATCH 0/1] ptrace: x86: stepping in a signal handler leaks X86_EFLAGS_TF Oleg Nesterov 2010-06-06 16:39 ` [PATCH 1/1] " Oleg Nesterov 2010-06-16 2:31 ` Bug 16061 - single stepping in a signal handler can cause the single step flag to get "stuck" Roland McGrath 2010-06-16 19:56 ` Oleg Nesterov 2010-06-16 20:53 ` Roland McGrath
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).