public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH] x86 single-step (TF) vs system calls & traps
@ 2004-06-29  1:55 Roland McGrath
  2004-06-29  2:05 ` Davide Libenzi
  2004-06-29  3:42 ` Linus Torvalds
  0 siblings, 2 replies; 18+ messages in thread
From: Roland McGrath @ 2004-06-29  1:55 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton
  Cc: Ingo Molnar, Andrew Cagney, Linux Kernel Mailing List

Andrew Cagney discovered this problem while working on GDB.  I suspect this
bug has always been there, but I've only actually tested current 2.6 kernels.

When you single-step into a trap instruction, you actually don't get a
SIGTRAP until the instruction after the trap instruction has also executed.
I have demonstrated this in three cases: `into' generating a SIGSEGV that
is suppressed via ptrace; an `int $0x80' system call entry; and a
`sysenter' system call entry via the vsyscall entry point.

In https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=126699 you can find
a working test program and full details on reproducing the problem using
gdb.  I can post all that here if people prefer.  If there were a program
that set the TF bit in the processor flags itself and handled the signals,
I'm sure that would see the same problem, without ptrace being involved.

>From reading the code and the x86 specs on traps, it makes sense why it
happens.  The trap flag causes a single-step trap after the execution of
the instruction that sets the trap flag.  For instructions that generate
their own traps, TF is cleared on the way into the kernel, and it's the
normal iret that is restoring the flag on the way back to user mode.  As
advertised, that executes the next instruction, i.e. whatever the restored
user PC is at, and then traps.  But from the userland perspective, this is
highly unexpected: the user executed one instruction like 'into' or 'int'
or `sysenter', and expects execution to stop after that "instruction" is
done.  To the user, everything the kernel does in response to the trap is
part of the execution of that one instruction.

So, I submit that the sensible behavior in these cases, when TF is set, is
to stop immediately with a simulated single-step trap, before running the
next user instruction.  Below is a patch that implements this on x86, and I
believe it covers all the cases.  I would like some verification from
others on my reading of which cases there are--that is, cases where a
kernel entry is with the saved PC sitting after an instruction that's
already executed to cause that entry.  From the x86 specs, I see only the
following exception of "trap" type: 

#1: some kind of debug traps.  These don't need any help because this is
    the same trap as the single-step trap anyway and the kernel behavior
    (SIGTRAP and its details) are identical already.

#3: int3.  Probably the same story here, but my fix does treat it like `into'.
    Since gdb uses this for its own purposes, you can't demonstrate the
    same anomaly that's easy to see for `into'.

#4: into.  This is a problem case I've demonstrated and that my fix addresses.

Software exceptions like the `int' instruction generates are not classified
the same in that table, but `int $0x80' system call entry behaves the same
way in effect.  My patch handles the system call entry case.

sysenter is not a trap, but the way it and sysexit work seems to have the
same effect.  

Are there any cases of this nature that I have overlooked?

Unfortunately I just don't see a way to address this without inserting into
the trap and system call paths checks for TF being set in the user-mode
flags word.  

The good news is that the fast `sysenter' path does not need any such
check, because of the way `sysenter' interacts with TF: instead the
single-step trap handler that already detects entry in kernel mode via the
`sysenter' case is where the checking happens, so the new overhead is only
in the case where TF was in use.  Something else I would like verification
on is that the `sysenter' entry point is the only case where the
single-step trap will be seen in kernel mode.  My patch assumes this.  I
can't see how else it could come up (except for the KGDB patch, which also
seems to make the same assumption).

I have demonstrated the problem behavior in 32-bit x86 code (see the
bugzilla reference above) both on x86 and on x86-64.  I haven't yet
attacked the x86-64 kernel's 32-bit support to make the same change.

However, in a native 64-bit system call on x86-64, I do not see the same
problem.  This bewilders me.  From the AMD64 docs I have, I would expect
syscall/sysret to behave the same way as we observe sysenter/sysexit doing
in 32-bit mode.  I don't see any difference in the behavior of TF specified
for x86-64 from x86.  Can anyone help me understand what's going on there?


I have only investigated x86 and x86-64 in detail (those are the boxes I
have handy for kernel hacking, and the machines I know well enough off hand
to suggest specific fixes like those below).  Andrew has tested for the
same issue on other architectures, and I believe the same or similar
problem hits powerpc, ia64, and perhaps s390.  I'm not sure if any other
machines have been tested and seen not to have any such isse.


Comments?


Thanks,
Roland


Index: linux-2.6/arch/i386/kernel/entry.S
===================================================================
RCS file: /home/roland/redhat/bkcvs/linux-2.5/arch/i386/kernel/entry.S,v
retrieving revision 1.86
diff -b -p -u -r1.86 entry.S
--- linux-2.6/arch/i386/kernel/entry.S 23 May 2004 05:03:15 -0000 1.86
+++ linux-2.6/arch/i386/kernel/entry.S 24 Jun 2004 06:53:37 -0000
@@ -255,6 +255,13 @@ sysenter_past_esp:
 	pushl %eax
 	SAVE_ALL
 	GET_THREAD_INFO(%ebp)
+	/*
+	 * We don't need to check for TF here as in system_call, below.
+	 * Kernel entry by `sysenter' doesn't clear TF and so we just took
+	 * the single-step trap in kernel mode and set TIF_SINGLESTEP to
+	 * restore TF when we return to user mode.  That will also handle
+	 * simulating the single-step trap as syscall_singlestep arranges.
+	 */
 	cmpl $(nr_syscalls), %eax
 	jae syscall_badsys
 
@@ -278,6 +285,9 @@ ENTRY(system_call)
 	pushl %eax			# save orig_eax
 	SAVE_ALL
 	GET_THREAD_INFO(%ebp)
+	testb $(TF_MASK >> 8),EFLAGS+1(%esp)
+	jnz syscall_singlestep
+syscall_check_nr:
 	cmpl $(nr_syscalls), %eax
 	jae syscall_badsys
 					# system call tracing in operation
@@ -370,6 +380,14 @@ syscall_badsys:
 	movl $-ENOSYS,EAX(%esp)
 	jmp resume_userspace
 
+	ALIGN
+syscall_singlestep:
+#ifdef CONFIG_SMP
+	lock
+#endif
+	btsl $TIF_SINGLESTEP_TRAP,TI_flags(%ebp)
+	jmp syscall_check_nr
+
 /*
  * Build the entry stubs and pointer table with
  * some assembler magic.
Index: linux-2.6/arch/i386/kernel/signal.c
===================================================================
RCS file: /home/roland/redhat/bkcvs/linux-2.5/arch/i386/kernel/signal.c,v
retrieving revision 1.40
diff -b -p -u -r1.40 signal.c
--- linux-2.6/arch/i386/kernel/signal.c 18 Jun 2004 20:35:31 -0000 1.40
+++ linux-2.6/arch/i386/kernel/signal.c 24 Jun 2004 06:54:51 -0000
@@ -613,10 +613,47 @@ void do_notify_resume(struct pt_regs *re
 	if (thread_info_flags & _TIF_SINGLESTEP) {
 		regs->eflags |= TF_MASK;
 		clear_thread_flag(TIF_SINGLESTEP);
+		/*
+		 * TIF_SINGLESTEP is only ever set for a `sysenter' entry
+		 * to the kernel when TF was set in the user-mode flags.
+		 */
+		thread_info_flags |= _TIF_SINGLESTEP_TRAP;
 	}
 	/* deal with pending signal delivery */
 	if (thread_info_flags & _TIF_SIGPENDING)
 		do_signal(regs,oldset);
 	
+	/*
+	 * Check if we should simulate a single-step trap.
+	 * Note this happens after handling other pending signals,
+	 * including the one that set the flag for us.
+	 */
+	if (thread_info_flags & _TIF_SINGLESTEP_TRAP) {
+		clear_thread_flag(TIF_SINGLESTEP_TRAP);
+		if (regs->eflags & X86_EFLAGS_TF) {
+			/*
+			 * We single-stepped into an instruction that
+			 * caused a trap, leaving the PC past that instruction.
+			 * If we return now with TF set, the next instruction
+			 * will be executed before we get a single-step trap.
+			 * But the user just single-stepped and expects a
+			 * stop after the trapping instruction, not the next.
+			 * So fake a single-step trap.
+			 */
+			siginfo_t info;
+			info.si_signo = SIGTRAP;
+			info.si_errno = 0;
+			info.si_code = TRAP_BRKPT;
+			info.si_addr = (void *)regs->eip;
+			force_sig_info(SIGTRAP, &info, current);
+			/*
+			 * Now handle that signal immediately,
+			 * since on return to assembly code it
+			 * won't check _TIF_WORK_MASK again.
+			 */
+			do_signal(regs,oldset);
+		}
+	}
+
 	clear_thread_flag(TIF_IRET);
 }
Index: linux-2.6/arch/i386/kernel/traps.c
===================================================================
RCS file: /home/roland/redhat/bkcvs/linux-2.5/arch/i386/kernel/traps.c,v
retrieving revision 1.74
diff -b -p -u -r1.74 traps.c
--- linux-2.6/arch/i386/kernel/traps.c 18 Jun 2004 15:16:11 -0000 1.74
+++ linux-2.6/arch/i386/kernel/traps.c 24 Jun 2004 06:33:13 -0000
@@ -348,7 +348,8 @@ static inline unsigned long get_cr2(void
 }
 
 static inline void do_trap(int trapnr, int signr, char *str, int vm86,
-			   struct pt_regs * regs, long error_code, siginfo_t *info)
+			   struct pt_regs * regs, long error_code,
+			   siginfo_t *info, int flag_if_tf)
 {
 	if (regs->eflags & VM_MASK) {
 		if (vm86)
@@ -367,6 +368,8 @@ static inline void do_trap(int trapnr, i
 			force_sig_info(signr, info, tsk);
 		else
 			force_sig(signr, tsk);
+		if (flag_if_tf && (regs->eflags & X86_EFLAGS_TF))
+			set_thread_flag(flag_if_tf);
 		return;
 	}
 
@@ -386,7 +389,7 @@ static inline void do_trap(int trapnr, i
 #define DO_ERROR(trapnr, signr, str, name) \
 asmlinkage void do_##name(struct pt_regs * regs, long error_code) \
 { \
-	do_trap(trapnr, signr, str, 0, regs, error_code, NULL); \
+	do_trap(trapnr, signr, str, 0, regs, error_code, NULL, 0); \
 }
 
 #define DO_ERROR_INFO(trapnr, signr, str, name, sicode, siaddr) \
@@ -397,13 +400,14 @@ asmlinkage void do_##name(struct pt_regs
 	info.si_errno = 0; \
 	info.si_code = sicode; \
 	info.si_addr = (void *)siaddr; \
-	do_trap(trapnr, signr, str, 0, regs, error_code, &info); \
+	do_trap(trapnr, signr, str, 0, regs, error_code, &info, 0); \
 }
 
 #define DO_VM86_ERROR(trapnr, signr, str, name) \
 asmlinkage void do_##name(struct pt_regs * regs, long error_code) \
 { \
-	do_trap(trapnr, signr, str, 1, regs, error_code, NULL); \
+	do_trap(trapnr, signr, str, 1, regs, error_code, NULL, \
+		TIF_SINGLESTEP_TRAP); \
 }
 
 #define DO_VM86_ERROR_INFO(trapnr, signr, str, name, sicode, siaddr) \
@@ -414,7 +418,7 @@ asmlinkage void do_##name(struct pt_regs
 	info.si_errno = 0; \
 	info.si_code = sicode; \
 	info.si_addr = (void *)siaddr; \
-	do_trap(trapnr, signr, str, 1, regs, error_code, &info); \
+	do_trap(trapnr, signr, str, 1, regs, error_code, &info, 0); \
 }
 
 DO_VM86_ERROR_INFO( 0, SIGFPE,  "divide error", divide_error, FPE_INTDIV, regs->eip)
Index: linux-2.6/include/asm-i386/thread_info.h
===================================================================
RCS file: /home/roland/redhat/bkcvs/linux-2.5/include/asm-i386/thread_info.h,v
retrieving revision 1.20
diff -b -p -u -r1.20 thread_info.h
--- linux-2.6/include/asm-i386/thread_info.h 19 May 2004 23:34:45 -0000 1.20
+++ linux-2.6/include/asm-i386/thread_info.h 23 Jun 2004 03:08:57 -0000
@@ -143,6 +143,7 @@ static inline unsigned long current_stac
 #define TIF_NEED_RESCHED	3	/* rescheduling necessary */
 #define TIF_SINGLESTEP		4	/* restore singlestep on return to user mode */
 #define TIF_IRET		5	/* return with iret */
+#define TIF_SINGLESTEP_TRAP	6	/* fake singlestep on return to user */
 #define TIF_SYSCALL_AUDIT	7	/* syscall auditing active */
 #define TIF_POLLING_NRFLAG	16	/* true if poll_idle() is polling TIF_NEED_RESCHED */
 
@@ -152,6 +153,7 @@ static inline unsigned long current_stac
 #define _TIF_NEED_RESCHED	(1<<TIF_NEED_RESCHED)
 #define _TIF_SINGLESTEP		(1<<TIF_SINGLESTEP)
 #define _TIF_IRET		(1<<TIF_IRET)
+#define _TIF_SINGLESTEP_TRAP	(1<<TIF_SINGLESTEP_TRAP)
 #define _TIF_SYSCALL_AUDIT	(1<<TIF_SYSCALL_AUDIT)
 #define _TIF_POLLING_NRFLAG	(1<<TIF_POLLING_NRFLAG)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH] x86 single-step (TF) vs system calls & traps
  2004-06-29  1:55 [RFC PATCH] x86 single-step (TF) vs system calls & traps Roland McGrath
@ 2004-06-29  2:05 ` Davide Libenzi
  2004-06-29  3:42 ` Linus Torvalds
  1 sibling, 0 replies; 18+ messages in thread
From: Davide Libenzi @ 2004-06-29  2:05 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Linus Torvalds, Andrew Morton, Ingo Molnar, Andrew Cagney,
	Linux Kernel Mailing List

On Mon, 28 Jun 2004, Roland McGrath wrote:

> Andrew Cagney discovered this problem while working on GDB.  I suspect this
> bug has always been there, but I've only actually tested current 2.6 kernels.

Roland, did you try with -mm kernels?



- Davide


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH] x86 single-step (TF) vs system calls & traps
  2004-06-29  1:55 [RFC PATCH] x86 single-step (TF) vs system calls & traps Roland McGrath
  2004-06-29  2:05 ` Davide Libenzi
@ 2004-06-29  3:42 ` Linus Torvalds
  2004-06-29  3:46   ` Roland McGrath
  1 sibling, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2004-06-29  3:42 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Andrew Morton, Ingo Molnar, Andrew Cagney,
	Linux Kernel Mailing List



On Mon, 28 Jun 2004, Roland McGrath wrote:
> 
> When you single-step into a trap instruction, you actually don't get a
> SIGTRAP until the instruction after the trap instruction has also executed.

Yes. This is documented Intel behaviour. It also guarantees that there is 
forward progress in some strange circumstances, if I remember correctly.

And I refuse to make the fast-path slower just because of this. Not only 
has Linux always worked like this, as far as I know all other x86 OS's 
also tend to just do the Intel behaviour thing.

		Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH] x86 single-step (TF) vs system calls & traps
  2004-06-29  3:42 ` Linus Torvalds
@ 2004-06-29  3:46   ` Roland McGrath
  2004-06-29  3:55     ` Linus Torvalds
  0 siblings, 1 reply; 18+ messages in thread
From: Roland McGrath @ 2004-06-29  3:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Ingo Molnar, Andrew Cagney,
	Linux Kernel Mailing List

> And I refuse to make the fast-path slower just because of this. 

You are talking about the int $0x80 system call path here?
That is the only non-exception path touched by my changes.

> Not only has Linux always worked like this, as far as I know all other
> x86 OS's also tend to just do the Intel behaviour thing.

The only other one I have at hand to test is NetBSD 1.6.1, which does
indeed behave the same way for its int $0x80 system calls.


Thanks,
Roland

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH] x86 single-step (TF) vs system calls & traps
  2004-06-29  3:46   ` Roland McGrath
@ 2004-06-29  3:55     ` Linus Torvalds
  2004-06-29  4:15       ` Andrew Morton
  2004-06-29  4:32       ` Roland McGrath
  0 siblings, 2 replies; 18+ messages in thread
From: Linus Torvalds @ 2004-06-29  3:55 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Andrew Morton, Ingo Molnar, Andrew Cagney,
	Linux Kernel Mailing List



On Mon, 28 Jun 2004, Roland McGrath wrote:
>
> > And I refuse to make the fast-path slower just because of this. 
> 
> You are talking about the int $0x80 system call path here?
> That is the only non-exception path touched by my changes.

That's still the fast path on any machine where this matters.

If the system uses sysenter, it won't matter because we reload TF by hand. 
And if it uses "int 0x80" through the trampoline, it won't matter, because 
the "lost" instruction is part of the trampoline, and not "important" for 
the debuggee. I think it's a "ret" instruction that gets lost, so the only 
thing that happens is that the person "magically" returns to the caller. 

If that's a problem (for "finish" or other logic), I'd be ok with adding a 
"nop" to the vsyscall thing. That would have less of an impact than adding 
the test to the kernel path..

So the _only_ case your patch matters is for old-style binaries that use 
"int 0x80" inline, and there that path is indeed the hot-path.
 
> > Not only has Linux always worked like this, as far as I know all other
> > x86 OS's also tend to just do the Intel behaviour thing.
> 
> The only other one I have at hand to test is NetBSD 1.6.1, which does
> indeed behave the same way for its int $0x80 system calls.

I bet that if you really search, you cna probably find _some_ OS out there
that considered the Intel behaviour a bug, and fixed it with something
like your patch. But I bet it's not just Linux and BSD that use the Intel
behaviour, just because it's such a pain _not_ to.

			Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH] x86 single-step (TF) vs system calls & traps
  2004-06-29  3:55     ` Linus Torvalds
@ 2004-06-29  4:15       ` Andrew Morton
  2004-06-29  4:37         ` Roland McGrath
  2004-06-29  4:32       ` Roland McGrath
  1 sibling, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2004-06-29  4:15 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: roland, mingo, cagney, linux-kernel

Linus Torvalds <torvalds@osdl.org> wrote:
>
>  On Mon, 28 Jun 2004, Roland McGrath wrote:
>  >
>  > > And I refuse to make the fast-path slower just because of this. 
>  > 
>  > You are talking about the int $0x80 system call path here?
>  > That is the only non-exception path touched by my changes.
> 
>  That's still the fast path on any machine where this matters.

Davide's patch (which has been in -mm for 6-7 weeks) doesn't add
fastpath overhead.


diff -puN arch/i386/kernel/entry.S~really-ptrace-single-step-2 arch/i386/kernel/entry.S
--- 25/arch/i386/kernel/entry.S~really-ptrace-single-step-2	2004-06-24 13:19:51.721958784 -0700
+++ 25-akpm/arch/i386/kernel/entry.S	2004-06-24 13:19:51.728957720 -0700
@@ -375,7 +375,7 @@ syscall_trace_entry:
 	# perform syscall exit tracing
 	ALIGN
 syscall_exit_work:
-	testb $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT), %cl
+	testb $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|_TIF_SINGLESTEP), %cl
 	jz work_pending
 	sti				# could let do_syscall_trace() call
 					# schedule() instead
diff -puN arch/i386/kernel/ptrace.c~really-ptrace-single-step-2 arch/i386/kernel/ptrace.c
--- 25/arch/i386/kernel/ptrace.c~really-ptrace-single-step-2	2004-06-24 13:19:51.723958480 -0700
+++ 25-akpm/arch/i386/kernel/ptrace.c	2004-06-24 13:19:51.729957568 -0700
@@ -147,6 +147,7 @@ void ptrace_disable(struct task_struct *
 { 
 	long tmp;
 
+	clear_tsk_thread_flag(child, TIF_SINGLESTEP);
 	tmp = get_stack_long(child, EFL_OFFSET) & ~TRAP_FLAG;
 	put_stack_long(child, EFL_OFFSET, tmp);
 }
@@ -370,6 +371,7 @@ asmlinkage int sys_ptrace(long request, 
 		else {
 			clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
 		}
+		clear_tsk_thread_flag(child, TIF_SINGLESTEP);
 		child->exit_code = data;
 	/* make sure the single step bit is not set. */
 		tmp = get_stack_long(child, EFL_OFFSET) & ~TRAP_FLAG;
@@ -391,6 +393,7 @@ asmlinkage int sys_ptrace(long request, 
 		if (child->state == TASK_ZOMBIE)	/* already dead */
 			break;
 		child->exit_code = SIGKILL;
+		clear_tsk_thread_flag(child, TIF_SINGLESTEP);
 		/* make sure the single step bit is not set. */
 		tmp = get_stack_long(child, EFL_OFFSET) & ~TRAP_FLAG;
 		put_stack_long(child, EFL_OFFSET, tmp);
@@ -411,6 +414,7 @@ asmlinkage int sys_ptrace(long request, 
 		}
 		tmp = get_stack_long(child, EFL_OFFSET) | TRAP_FLAG;
 		put_stack_long(child, EFL_OFFSET, tmp);
+		set_tsk_thread_flag(child, TIF_SINGLESTEP);
 		child->exit_code = data;
 		/* give it a chance to run. */
 		wake_up_process(child);
@@ -535,7 +539,8 @@ void do_syscall_trace(struct pt_regs *re
 			audit_syscall_exit(current, regs->eax);
 	}
 
-	if (!test_thread_flag(TIF_SYSCALL_TRACE))
+	if (!test_thread_flag(TIF_SYSCALL_TRACE) &&
+	    !test_thread_flag(TIF_SINGLESTEP))
 		return;
 	if (!(current->ptrace & PT_PTRACED))
 		return;
diff -puN include/asm-i386/thread_info.h~really-ptrace-single-step-2 include/asm-i386/thread_info.h
--- 25/include/asm-i386/thread_info.h~really-ptrace-single-step-2	2004-06-24 13:19:51.724958328 -0700
+++ 25-akpm/include/asm-i386/thread_info.h	2004-06-24 13:19:51.729957568 -0700
@@ -157,7 +157,7 @@ static inline unsigned long current_stac
 
 /* work to do on interrupt/exception return */
 #define _TIF_WORK_MASK \
-  (0x0000FFFF & ~(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT))
+  (0x0000FFFF & ~(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|_TIF_SINGLESTEP))
 #define _TIF_ALLWORK_MASK	0x0000FFFF	/* work to do on any return to u-space */
 
 /*
_


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH] x86 single-step (TF) vs system calls & traps
  2004-06-29  3:55     ` Linus Torvalds
  2004-06-29  4:15       ` Andrew Morton
@ 2004-06-29  4:32       ` Roland McGrath
  2004-06-29  5:15         ` Linus Torvalds
  1 sibling, 1 reply; 18+ messages in thread
From: Roland McGrath @ 2004-06-29  4:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Ingo Molnar, Andrew Cagney,
	Linux Kernel Mailing List

> > You are talking about the int $0x80 system call path here?
> > That is the only non-exception path touched by my changes.
> 
> That's still the fast path on any machine where this matters.

My question was whether this is the only path that you were referring to
when you said you refused to add any more instructions there.  I will seek
alternate solutions that avoid introducing any new instructions into fast
paths, but I want to be quite sure which you insist not to be changed.

> So the _only_ case your patch matters is for old-style binaries that use 
> "int 0x80" inline, and there that path is indeed the hot-path.

Hmm, it really sounds like you didn't read my explanation about the
sysenter case, or read the patch where I added code to handle it.

> If the system uses sysenter, it won't matter because we reload TF by hand. 

The issue does indeed arise using sysenter, as I explained and is easily
demonstrated by trying it.  I'm not sure what you mean here when you say,
"by hand".  The TF trap taken in kernel mode upon sysenter entry causes the
kernel to return using iret, which restores the TF flag in exactly the same
way as returning from other kinds of traps, and likewise executes the
following user-mode instruction.  As I've mentioned, there is no change in
the main path for sysenter entries required.  My patch changes the trap
handler for the single-step trap (in arch/i386/kernel/traps.c:do_debug) to
set the flag requesting a simulated trap before returning to user mode.

> And if it uses "int 0x80" through the trampoline, it won't matter, because 
> the "lost" instruction is part of the trampoline, and not "important" for 
> the debuggee. I think it's a "ret" instruction that gets lost, so the only 
> thing that happens is that the person "magically" returns to the caller. 

Are you referring to the signal trampoline for returning from signal
handlers?  If you are referring to some other trampoline, please clear up
my confusion.  The signal trampoline calls sigreturn, and the PC after that
system call instruction is then abandoned entirely.  What happens in the
case of single-stepping into the sigreturn system call is that (at least)
the first instruction of the restored state where the signal hit is
executed.  In practice, the sigreturn call also clears the TF bit in the
restored state unless the saved flags word has been diddled on the stack.
I submit that when using PTRACE_SINGLESTEP to enter the sigreturn system
call, the most useful thing is to get the next SIGTRAP with exactly the
state restored by the sigreturn call and no more instructions executed yet.
My patch accomplishes that, since sigreturn's clearing of TF in the
restored state doesn't clear the TIF_SINGLESTEP_TRAP kernel flag set on the
way in.

> But I bet it's not just Linux and BSD that use the Intel
> behaviour, just because it's such a pain _not_ to.

I don't think it's really so hard to make this work sanely.  It's just an
arcane thing that's easy to overlook in the initial implementation.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH] x86 single-step (TF) vs system calls & traps
  2004-06-29  4:15       ` Andrew Morton
@ 2004-06-29  4:37         ` Roland McGrath
  2004-06-29  7:00           ` Davide Libenzi
  0 siblings, 1 reply; 18+ messages in thread
From: Roland McGrath @ 2004-06-29  4:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linus Torvalds, mingo, cagney, linux-kernel

> Davide's patch (which has been in -mm for 6-7 weeks) doesn't add
> fastpath overhead.

I am also dubious about exactly what it does.  That patch seems a bizarre
obfuscation of the code to me.  TIF_SINGLESTEP is really there to handle
the lazy TF clearing for sysenter entry, and that's all.  I don't think
that patch handles user-mode setting TF properly, unusual though that case
is.  How does that patch interact with PT_TRACESYSGOOD?  It appears to me
that PTRACE_SINGLESTEP will now generate a syscall trap instead of a
single-step trap, which is an undesireable change in behavior I would say.

I don't really care about user-mode setting of TF before executing int
$0x80.  If poeple have programs that use TF in user mode, they have never
complained about the issue before.  For PTRACE_SINGLESTEP, Davide's
approach of setting the kernel-work flag directly when PTRACE_SINGLESTEP
sets TF in the user flags word is the obvious way to avoid the test in the
fast path.  I am inclined to combine that approeach with what my patch
does, i.e. just take out the system call fast-path test and set
TIF_SINGLESTEP_TRAP in PTRACE_SINGLESTEP.  I think the way Davide's patch
uses TIF_SINGLESTEP is pretty questionable.


Thanks,
Roland


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH] x86 single-step (TF) vs system calls & traps
  2004-06-29  4:32       ` Roland McGrath
@ 2004-06-29  5:15         ` Linus Torvalds
  2004-07-01  8:09           ` Roland McGrath
  0 siblings, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2004-06-29  5:15 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Andrew Morton, Ingo Molnar, Andrew Cagney,
	Linux Kernel Mailing List



On Mon, 28 Jun 2004, Roland McGrath wrote:
> 
> The issue does indeed arise using sysenter, as I explained and is easily
> demonstrated by trying it.  I'm not sure what you mean here when you say,
> "by hand".  The TF trap taken in kernel mode upon sysenter entry causes the
> kernel to return using iret, which restores the TF flag in exactly the same
> way as returning from other kinds of traps, and likewise executes the
> following user-mode instruction.

They are "user-mode" only in theory.

They are really kernel instructions set up by the kernel, and user-mode 
only in the sense that yes, they run in ring3. No actual user-compiled 
code executed anywhere.

At least to me, that vsyscall trampoline is all kernel code. But yes,
you're right, we no longer do the eflags save/restore in user mode, that
was too slow (some of my first versions just cleared TF unconditionally,
and the trampoline was responsible for re-enabling it).

> Are you referring to the signal trampoline for returning from signal
> handlers?  If you are referring to some other trampoline, please clear up
> my confusion.

I was talking about the vsyscall code. 

		Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH] x86 single-step (TF) vs system calls & traps
  2004-06-29  4:37         ` Roland McGrath
@ 2004-06-29  7:00           ` Davide Libenzi
  2004-07-01  7:47             ` Roland McGrath
  0 siblings, 1 reply; 18+ messages in thread
From: Davide Libenzi @ 2004-06-29  7:00 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Andrew Morton, Linus Torvalds, mingo, cagney,
	Linux Kernel Mailing List

On Mon, 28 Jun 2004, Roland McGrath wrote:

> > Davide's patch (which has been in -mm for 6-7 weeks) doesn't add
> > fastpath overhead.
> 
> I am also dubious about exactly what it does.  That patch seems a bizarre
> obfuscation of the code to me.  TIF_SINGLESTEP is really there to handle
> the lazy TF clearing for sysenter entry, and that's all.  I don't think
> that patch handles user-mode setting TF properly, unusual though that case
> is.  How does that patch interact with PT_TRACESYSGOOD?  It appears to me
> that PTRACE_SINGLESTEP will now generate a syscall trap instead of a
> single-step trap, which is an undesireable change in behavior I would say.
> 
> I don't really care about user-mode setting of TF before executing int
> $0x80.  If poeple have programs that use TF in user mode, they have never
> complained about the issue before.  For PTRACE_SINGLESTEP, Davide's
> approach of setting the kernel-work flag directly when PTRACE_SINGLESTEP
> sets TF in the user flags word is the obvious way to avoid the test in the
> fast path.  I am inclined to combine that approeach with what my patch
> does, i.e. just take out the system call fast-path test and set
> TIF_SINGLESTEP_TRAP in PTRACE_SINGLESTEP.  I think the way Davide's patch
> uses TIF_SINGLESTEP is pretty questionable.

Roland, I don't think (pretty sure actually ;) we can handle the case 
where TF is set from userspace and, at the same time, the user uses 
PTRACE_SINGLESTEP. The ptrace infrastructure uses the hw TF flag to work. 
The PTRACE_SINGLESTEP gives you the SYSGOOD behaviour, if you set it. And 
sends a SIGTRAP notification to the ptrace'ing parent process.


- Davide


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH] x86 single-step (TF) vs system calls & traps
  2004-06-29  7:00           ` Davide Libenzi
@ 2004-07-01  7:47             ` Roland McGrath
  2004-07-01 15:14               ` Davide Libenzi
  2004-07-01 20:34               ` Daniel Jacobowitz
  0 siblings, 2 replies; 18+ messages in thread
From: Roland McGrath @ 2004-07-01  7:47 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Andrew Morton, Linus Torvalds, mingo, cagney, Daniel Jacobowitz,
	Linux Kernel Mailing List

> Roland, I don't think (pretty sure actually ;) we can handle the case 
> where TF is set from userspace and, at the same time, the user uses 
> PTRACE_SINGLESTEP. 

I don't know where you pulled the notion of that case from.  I certainly
never mentioned it.  When I raised the case of user-mode setting of TF, I
was quite clear that it's a case when ptrace is not involved.

> The PTRACE_SINGLESTEP gives you the SYSGOOD behaviour, if you set it. And 
> sends a SIGTRAP notification to the ptrace'ing parent process.

Like I said before, that is a change in the behavior.  Since its inception,
SYSGOOD has meant exactly and only that when you use PTRACE_SYSCALL you
will get a different notification for a syscall-tracing stop than other
sources of SIGTRAP that may arise during execution of user code between
system calls.  At no time ever before, has it been possible to get the
SIGTRAP|0x80 wait result when you had not just called PTRACE_SYSCALL.
After your change, calling PTRACE_SINGLESTEP can now produce that result.
I don't think that change is a good thing.  

As the originator of the SYSGOOD option, Dan might have a comment about this.



Thanks,
Roland

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH] x86 single-step (TF) vs system calls & traps
  2004-06-29  5:15         ` Linus Torvalds
@ 2004-07-01  8:09           ` Roland McGrath
  0 siblings, 0 replies; 18+ messages in thread
From: Roland McGrath @ 2004-07-01  8:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Ingo Molnar, Andrew Cagney,
	Linux Kernel Mailing List

> They are "user-mode" only in theory.

And in every practical sense that userland ever contemplates code, except
its supplier.  When you take an asynchronous signal, your PC value might be
anywhere in the middle of the code; even when you take a synchronous signal
produced inside the system call, your PC value will be somewhere in the
middle of the code.  Hence the need for the vsyscall DSO with unwind
information.  The code does memory accesses via your stack pointer to your
normal memory, and these can fault or overflow in all the normal ways.  I
just can't see the basis for an argument that it "makes more sense" for
single-stepping these instructions to work differently than others.

Now, if single-stepping the call to the vsyscall entry point treated the
whole thing as an atomic unit and came immediately out the other side to
the return-address of that call, then we would be on the same page.  That's
actually pretty damn easy to implement with a special case for the vsyscall
PC address in do_debug.  But it would be inconsistent with the fact that
signals don't also automagically roll your PC back or forward out of the
vsyscall code section.  If we made the vsyscall entry point code truly
"virtually atomic", then I would be 100% with you on calling it "kernel code".
I somehow doubt that is what you want to do.  

That said, I don't really know why you would object to the change I've
suggested to do_debug.  It only adds code when the case of single-stepping
into sysenter actually happens.


Thanks,
Roland


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH] x86 single-step (TF) vs system calls & traps
  2004-07-01  7:47             ` Roland McGrath
@ 2004-07-01 15:14               ` Davide Libenzi
  2004-07-01 20:24                 ` Roland McGrath
  2004-07-01 20:34               ` Daniel Jacobowitz
  1 sibling, 1 reply; 18+ messages in thread
From: Davide Libenzi @ 2004-07-01 15:14 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Andrew Morton, Linus Torvalds, mingo, cagney, Daniel Jacobowitz,
	Linux Kernel Mailing List

On Thu, 1 Jul 2004, Roland McGrath wrote:

> > Roland, I don't think (pretty sure actually ;) we can handle the case 
> > where TF is set from userspace and, at the same time, the user uses 
> > PTRACE_SINGLESTEP. 
> 
> I don't know where you pulled the notion of that case from.  I certainly
> never mentioned it.  When I raised the case of user-mode setting of TF, I
> was quite clear that it's a case when ptrace is not involved.

Sorry, since it was obvious that if the user sets TF in userspace, he 
won't see the instruction following the int80, I thought you meant the 
interaction between userspace-TF and ptrace. That's documented in x86 
manuals and I don't think we should even try to have the userspace-TF to 
have a different behaviour from what x86 describe.



> > The PTRACE_SINGLESTEP gives you the SYSGOOD behaviour, if you set it. And 
> > sends a SIGTRAP notification to the ptrace'ing parent process.
> 
> Like I said before, that is a change in the behavior.  Since its inception,
> SYSGOOD has meant exactly and only that when you use PTRACE_SYSCALL you
> will get a different notification for a syscall-tracing stop than other
> sources of SIGTRAP that may arise during execution of user code between
> system calls.  At no time ever before, has it been possible to get the
> SIGTRAP|0x80 wait result when you had not just called PTRACE_SYSCALL.
> After your change, calling PTRACE_SINGLESTEP can now produce that result.
> I don't think that change is a good thing.  
> 
> As the originator of the SYSGOOD option, Dan might have a comment about this.

Here I meant that if you set SINGLESTEP|SYSGOOD, the patch will give you 
SIGTRAP|0x80, while if you set only SINGLESTEP the patch will give you 
SIGTRAP. Enforcing the SINGLESTEP|SYSGOOD is invalid or only gives SIGTRAP 
should be no more that three lines of code out of the fast path.



- Davide


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH] x86 single-step (TF) vs system calls & traps
  2004-07-01 15:14               ` Davide Libenzi
@ 2004-07-01 20:24                 ` Roland McGrath
  2004-07-01 21:47                   ` Davide Libenzi
  0 siblings, 1 reply; 18+ messages in thread
From: Roland McGrath @ 2004-07-01 20:24 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Andrew Morton, Linus Torvalds, mingo, cagney, Daniel Jacobowitz,
	Linux Kernel Mailing List

> That's documented in x86 manuals and I don't think we should even try to
> have the userspace-TF to have a different behaviour from what x86 describe.

The behavior described for the chip is that it traps after executing one
instruction.  From the user-mode perspective, the system call instruction
is one instruction that does not normally generate any kind of exception.
It just does one magic thing, that being having the kernel do something for
you.  If someone sets TF in user mode then they probably expect that they
will get a trap after executing one instruction, even that one.

> Here I meant that if you set SINGLESTEP|SYSGOOD, the patch will give you 
> SIGTRAP|0x80, while if you set only SINGLESTEP the patch will give you 
> SIGTRAP. Enforcing the SINGLESTEP|SYSGOOD is invalid or only gives SIGTRAP 
> should be no more that three lines of code out of the fast path.

There is no "set SINGLESTEP|SYSGOOD".  PTRACE_SINGLESTEP is a one-time
operation.  PTRACE_O_TRACESYSGOOD is a persistent flag you set when you
intend to at some point use the PTRACE_SYSCALL operation.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH] x86 single-step (TF) vs system calls & traps
  2004-07-01  7:47             ` Roland McGrath
  2004-07-01 15:14               ` Davide Libenzi
@ 2004-07-01 20:34               ` Daniel Jacobowitz
  2004-07-01 21:59                 ` Roland McGrath
  1 sibling, 1 reply; 18+ messages in thread
From: Daniel Jacobowitz @ 2004-07-01 20:34 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Davide Libenzi, Andrew Morton, Linus Torvalds, mingo, cagney,
	Linux Kernel Mailing List

On Thu, Jul 01, 2004 at 12:47:59AM -0700, Roland McGrath wrote:
> > Roland, I don't think (pretty sure actually ;) we can handle the case 
> > where TF is set from userspace and, at the same time, the user uses 
> > PTRACE_SINGLESTEP. 
> 
> I don't know where you pulled the notion of that case from.  I certainly
> never mentioned it.  When I raised the case of user-mode setting of TF, I
> was quite clear that it's a case when ptrace is not involved.
> 
> > The PTRACE_SINGLESTEP gives you the SYSGOOD behaviour, if you set it. And 
> > sends a SIGTRAP notification to the ptrace'ing parent process.
> 
> Like I said before, that is a change in the behavior.  Since its inception,
> SYSGOOD has meant exactly and only that when you use PTRACE_SYSCALL you
> will get a different notification for a syscall-tracing stop than other
> sources of SIGTRAP that may arise during execution of user code between
> system calls.  At no time ever before, has it been possible to get the
> SIGTRAP|0x80 wait result when you had not just called PTRACE_SYSCALL.
> After your change, calling PTRACE_SINGLESTEP can now produce that result.
> I don't think that change is a good thing.  
> 
> As the originator of the SYSGOOD option, Dan might have a comment about this.

I am not the originator of PTRACE_O_TRACESYSGOOD, I just had the bad
luck to touch it.

I think reporting the system call using 0x80|SIGTRAP when you
PTRACE_SINGLESTEP over the trap instruction makes excellent good sense.

-- 
Daniel Jacobowitz

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH] x86 single-step (TF) vs system calls & traps
  2004-07-01 20:24                 ` Roland McGrath
@ 2004-07-01 21:47                   ` Davide Libenzi
  0 siblings, 0 replies; 18+ messages in thread
From: Davide Libenzi @ 2004-07-01 21:47 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Andrew Morton, Linus Torvalds, mingo, cagney, Daniel Jacobowitz,
	Linux Kernel Mailing List

On Thu, 1 Jul 2004, Roland McGrath wrote:

> > Here I meant that if you set SINGLESTEP|SYSGOOD, the patch will give you 
> > SIGTRAP|0x80, while if you set only SINGLESTEP the patch will give you 
> > SIGTRAP. Enforcing the SINGLESTEP|SYSGOOD is invalid or only gives SIGTRAP 
> > should be no more that three lines of code out of the fast path.
> 
> There is no "set SINGLESTEP|SYSGOOD".  PTRACE_SINGLESTEP is a one-time
> operation.  PTRACE_O_TRACESYSGOOD is a persistent flag you set when you
> intend to at some point use the PTRACE_SYSCALL operation.

Allrighty ...

Andrew, per Roland suggestion, can you please add the patch below to the 
bits you already have in -mm?


Signed-off-by: Davide Libenzi <davidel@xmailserver.org>



- Davide



--- a/arch/i386/kernel/ptrace.c	2004-07-01 14:30:38.000000000 -0700
+++ b/arch/i386/kernel/ptrace.c	2004-07-01 14:32:44.000000000 -0700
@@ -546,8 +546,8 @@
 		return;
 	/* the 0x80 provides a way for the tracing parent to distinguish
 	   between a syscall stop and SIGTRAP delivery */
-	ptrace_notify(SIGTRAP | ((current->ptrace & PT_TRACESYSGOOD)
-				 ? 0x80 : 0));
+	ptrace_notify(SIGTRAP | ((current->ptrace & PT_TRACESYSGOOD) &&
+				 !test_thread_flag(TIF_SINGLESTEP) ? 0x80 : 0));
 
 	/*
 	 * this isn't the same as continuing with a signal, but it will do

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH] x86 single-step (TF) vs system calls & traps
  2004-07-01 20:34               ` Daniel Jacobowitz
@ 2004-07-01 21:59                 ` Roland McGrath
  2004-07-02  4:22                   ` Daniel Jacobowitz
  0 siblings, 1 reply; 18+ messages in thread
From: Roland McGrath @ 2004-07-01 21:59 UTC (permalink / raw)
  To: Daniel Jacobowitz
  Cc: Davide Libenzi, Andrew Morton, Linus Torvalds, mingo, cagney,
	Linux Kernel Mailing List

> I am not the originator of PTRACE_O_TRACESYSGOOD, I just had the bad
> luck to touch it.

My apologies.

> I think reporting the system call using 0x80|SIGTRAP when you
> PTRACE_SINGLESTEP over the trap instruction makes excellent good sense.

If you are not concerned about existing users of PTRACE_O_TRACESYSGOOD
calling PTRACE_SINGLESTEP and then being confused, then I have no objection.
I consider you to be the authority on any such users there might be.

In that case, I'm happy to endorse Davide's original patch.
I will look into extending it to cover x86-64's ia32 support as well.

I still wonder if anyone has any insight into why this issue does not arise
for native x86-64's syscall/sysret.  From my reading of the AMD64 manual, I
would expect it to happen there as well.  That is, sysret is the instruction
that sets TF, and the manual says that the instruction after the one that
sets TF gets executed before the trap.  It would be convenient if sysret
were a special case for this rule, since it makes it do what is best for the
system call case.  But I haven't found a mention of that in the manual.



Thanks,
Roland

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH] x86 single-step (TF) vs system calls & traps
  2004-07-01 21:59                 ` Roland McGrath
@ 2004-07-02  4:22                   ` Daniel Jacobowitz
  0 siblings, 0 replies; 18+ messages in thread
From: Daniel Jacobowitz @ 2004-07-02  4:22 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Davide Libenzi, Andrew Morton, Linus Torvalds, mingo, cagney,
	Linux Kernel Mailing List

On Thu, Jul 01, 2004 at 02:59:20PM -0700, Roland McGrath wrote:
> > I am not the originator of PTRACE_O_TRACESYSGOOD, I just had the bad
> > luck to touch it.
> 
> My apologies.
> 
> > I think reporting the system call using 0x80|SIGTRAP when you
> > PTRACE_SINGLESTEP over the trap instruction makes excellent good sense.
> 
> If you are not concerned about existing users of PTRACE_O_TRACESYSGOOD
> calling PTRACE_SINGLESTEP and then being confused, then I have no objection.
> I consider you to be the authority on any such users there might be.
> 
> In that case, I'm happy to endorse Davide's original patch.
> I will look into extending it to cover x86-64's ia32 support as well.

I don't know of any example users.  I'm sure there are a couple
somewhere, though.  The new behavior seems intuitively useful to me, so
I'd prefer Davide's original patch.

-- 
Daniel Jacobowitz

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2004-07-02  4:22 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-06-29  1:55 [RFC PATCH] x86 single-step (TF) vs system calls & traps Roland McGrath
2004-06-29  2:05 ` Davide Libenzi
2004-06-29  3:42 ` Linus Torvalds
2004-06-29  3:46   ` Roland McGrath
2004-06-29  3:55     ` Linus Torvalds
2004-06-29  4:15       ` Andrew Morton
2004-06-29  4:37         ` Roland McGrath
2004-06-29  7:00           ` Davide Libenzi
2004-07-01  7:47             ` Roland McGrath
2004-07-01 15:14               ` Davide Libenzi
2004-07-01 20:24                 ` Roland McGrath
2004-07-01 21:47                   ` Davide Libenzi
2004-07-01 20:34               ` Daniel Jacobowitz
2004-07-01 21:59                 ` Roland McGrath
2004-07-02  4:22                   ` Daniel Jacobowitz
2004-06-29  4:32       ` Roland McGrath
2004-06-29  5:15         ` Linus Torvalds
2004-07-01  8:09           ` Roland McGrath

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox