* [PATCH] x86_64,entry: Fix RCX for traced syscalls
@ 2014-06-26 19:08 Andy Lutomirski
2014-06-26 19:56 ` Andi Kleen
2014-11-04 23:43 ` Andy Lutomirski
0 siblings, 2 replies; 9+ messages in thread
From: Andy Lutomirski @ 2014-06-26 19:08 UTC (permalink / raw)
To: x86, H. Peter Anvin, Linux Kernel
Cc: Borislav Petkov, Andi Kleen, Andy Lutomirski
The int_ret_from_sys_call and syscall tracing code disagrees with
the sysret path as to the value of RCX.
The Intel SDM, the AMD APM, and my laptop all agree that sysret
returns with RCX == RIP. The syscall tracing code does not respect
this property.
For example, this program:
int main()
{
extern const char syscall_rip[];
unsigned long rcx = 1;
unsigned long orig_rcx = rcx;
asm ("mov $-1, %%eax\n\t"
"syscall\n\t"
"syscall_rip:"
: "+c" (rcx) : : "r11");
printf("syscall: RCX = %lX RIP = %lX orig RCX = %lx\n",
rcx, (unsigned long)syscall_rip, orig_rcx);
return 0;
}
prints:
syscall: RCX = 400556 RIP = 400556 orig RCX = 1
Running it under strace gives this instead:
syscall: RCX = FFFFFFFFFFFFFFFF RIP = 400556 orig RCX = 1
This changes FIXUP_TOP_OF_STACK to match sysret, causing the test to
show RCX == RIP even under strace.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
arch/x86/kernel/entry_64.S | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index b25ca96..6624e18 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -143,7 +143,8 @@ ENDPROC(native_usergs_sysret64)
movq \tmp,RSP+\offset(%rsp)
movq $__USER_DS,SS+\offset(%rsp)
movq $__USER_CS,CS+\offset(%rsp)
- movq $-1,RCX+\offset(%rsp)
+ movq RIP+\offset(%rsp),\tmp /* get rip */
+ movq \tmp,RCX+\offset(%rsp) /* copy it to rcx as sysret would do */
movq R11+\offset(%rsp),\tmp /* get eflags */
movq \tmp,EFLAGS+\offset(%rsp)
.endm
--
1.9.3
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH] x86_64,entry: Fix RCX for traced syscalls
2014-06-26 19:08 [PATCH] x86_64,entry: Fix RCX for traced syscalls Andy Lutomirski
@ 2014-06-26 19:56 ` Andi Kleen
2014-06-26 19:59 ` Andy Lutomirski
2014-11-04 23:43 ` Andy Lutomirski
1 sibling, 1 reply; 9+ messages in thread
From: Andi Kleen @ 2014-06-26 19:56 UTC (permalink / raw)
To: Andy Lutomirski
Cc: x86, H. Peter Anvin, Linux Kernel, Borislav Petkov, Andi Kleen
> show RCX == RIP even under strace.
If you think it's really worth the extra instruction?
It's not wrong, but it's not clear if it's useful.
-Andi
>
> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
> ---
> arch/x86/kernel/entry_64.S | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index b25ca96..6624e18 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -143,7 +143,8 @@ ENDPROC(native_usergs_sysret64)
> movq \tmp,RSP+\offset(%rsp)
> movq $__USER_DS,SS+\offset(%rsp)
> movq $__USER_CS,CS+\offset(%rsp)
> - movq $-1,RCX+\offset(%rsp)
> + movq RIP+\offset(%rsp),\tmp /* get rip */
> + movq \tmp,RCX+\offset(%rsp) /* copy it to rcx as sysret would do */
> movq R11+\offset(%rsp),\tmp /* get eflags */
> movq \tmp,EFLAGS+\offset(%rsp)
> .endm
> --
> 1.9.3
>
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] x86_64,entry: Fix RCX for traced syscalls
2014-06-26 19:56 ` Andi Kleen
@ 2014-06-26 19:59 ` Andy Lutomirski
2014-06-26 20:00 ` Andy Lutomirski
0 siblings, 1 reply; 9+ messages in thread
From: Andy Lutomirski @ 2014-06-26 19:59 UTC (permalink / raw)
To: Andi Kleen; +Cc: X86 ML, H. Peter Anvin, Linux Kernel, Borislav Petkov
On Thu, Jun 26, 2014 at 12:56 PM, Andi Kleen <andi@firstfloor.org> wrote:
>> show RCX == RIP even under strace.
>
> If you think it's really worth the extra instruction?
Hard to say. That extra instruction only happens on slow paths, so I
suspect the slowdown is negligible. On the other hand, having syscall
show a blatant difference in behavior between traced and untraced
processes seems unfortunate.
>
> It's not wrong, but it's not clear if it's useful.
--Andy
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] x86_64,entry: Fix RCX for traced syscalls
2014-06-26 19:59 ` Andy Lutomirski
@ 2014-06-26 20:00 ` Andy Lutomirski
2014-06-26 20:12 ` H. Peter Anvin
0 siblings, 1 reply; 9+ messages in thread
From: Andy Lutomirski @ 2014-06-26 20:00 UTC (permalink / raw)
To: Andi Kleen; +Cc: X86 ML, H. Peter Anvin, Linux Kernel, Borislav Petkov
On Thu, Jun 26, 2014 at 12:59 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Thu, Jun 26, 2014 at 12:56 PM, Andi Kleen <andi@firstfloor.org> wrote:
>>> show RCX == RIP even under strace.
>>
>> If you think it's really worth the extra instruction?
>
> Hard to say. That extra instruction only happens on slow paths, so I
> suspect the slowdown is negligible. On the other hand, having syscall
> show a blatant difference in behavior between traced and untraced
> processes seems unfortunate.
>
>>
>> It's not wrong, but it's not clear if it's useful.
Also, if anyone ever wants to add some code to switch back from iret
to sysret when sysret will work, this is a prerequisite. Otherwise
sysret will never match iret. (I'm not immediately planning on doing
this, but I can imagine workloads (e.g. UML) for which it would be a
big improvement.)
--Andy
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] x86_64,entry: Fix RCX for traced syscalls
2014-06-26 20:00 ` Andy Lutomirski
@ 2014-06-26 20:12 ` H. Peter Anvin
2014-06-26 20:47 ` Andy Lutomirski
0 siblings, 1 reply; 9+ messages in thread
From: H. Peter Anvin @ 2014-06-26 20:12 UTC (permalink / raw)
To: Andy Lutomirski, Andi Kleen; +Cc: X86 ML, Linux Kernel, Borislav Petkov
The real question is if we care that sysret and iter don't match. On 32 bits the situation is even more complex.
On June 26, 2014 1:00:22 PM PDT, Andy Lutomirski <luto@amacapital.net> wrote:
>On Thu, Jun 26, 2014 at 12:59 PM, Andy Lutomirski <luto@amacapital.net>
>wrote:
>> On Thu, Jun 26, 2014 at 12:56 PM, Andi Kleen <andi@firstfloor.org>
>wrote:
>>>> show RCX == RIP even under strace.
>>>
>>> If you think it's really worth the extra instruction?
>>
>> Hard to say. That extra instruction only happens on slow paths, so I
>> suspect the slowdown is negligible. On the other hand, having
>syscall
>> show a blatant difference in behavior between traced and untraced
>> processes seems unfortunate.
>>
>>>
>>> It's not wrong, but it's not clear if it's useful.
>
>Also, if anyone ever wants to add some code to switch back from iret
>to sysret when sysret will work, this is a prerequisite. Otherwise
>sysret will never match iret. (I'm not immediately planning on doing
>this, but I can imagine workloads (e.g. UML) for which it would be a
>big improvement.)
>
>--Andy
--
Sent from my mobile phone. Please pardon brevity and lack of formatting.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] x86_64,entry: Fix RCX for traced syscalls
2014-06-26 20:12 ` H. Peter Anvin
@ 2014-06-26 20:47 ` Andy Lutomirski
2014-06-28 17:07 ` Pavel Machek
0 siblings, 1 reply; 9+ messages in thread
From: Andy Lutomirski @ 2014-06-26 20:47 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Andi Kleen, X86 ML, Linux Kernel, Borislav Petkov
On Thu, Jun 26, 2014 at 1:12 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> The real question is if we care that sysret and iter don't match. On 32 bits the situation is even more complex.
At least for 64 bits, iret vs sysret is purely a kernel implementation
detail (except where a tracer modifies things that are inaccessible to
sysret), so ISTM it's worth one instruction to make them match.
I noticed this thing while fiddling with moving some of the syscall
tracing logic to C. This isn't a real problem, but it at least made
me scratch my head.
--Andy
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] x86_64,entry: Fix RCX for traced syscalls
2014-06-26 20:47 ` Andy Lutomirski
@ 2014-06-28 17:07 ` Pavel Machek
2014-06-30 15:12 ` Andy Lutomirski
0 siblings, 1 reply; 9+ messages in thread
From: Pavel Machek @ 2014-06-28 17:07 UTC (permalink / raw)
To: Andy Lutomirski
Cc: H. Peter Anvin, Andi Kleen, X86 ML, Linux Kernel, Borislav Petkov
On Thu 2014-06-26 13:47:32, Andy Lutomirski wrote:
> On Thu, Jun 26, 2014 at 1:12 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> > The real question is if we care that sysret and iter don't match. On 32 bits the situation is even more complex.
>
> At least for 64 bits, iret vs sysret is purely a kernel implementation
> detail (except where a tracer modifies things that are inaccessible to
> sysret), so ISTM it's worth one instruction to make them match.
>
> I noticed this thing while fiddling with moving some of the syscall
> tracing logic to C. This isn't a real problem, but it at least made
> me scratch my head.
If possible, we'd like to trace programs without programs being noticed they are
being traced. See subterfugue utility, for example.
It is certainly worth one extra instruction.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] x86_64,entry: Fix RCX for traced syscalls
2014-06-28 17:07 ` Pavel Machek
@ 2014-06-30 15:12 ` Andy Lutomirski
0 siblings, 0 replies; 9+ messages in thread
From: Andy Lutomirski @ 2014-06-30 15:12 UTC (permalink / raw)
To: Pavel Machek
Cc: H. Peter Anvin, Andi Kleen, X86 ML, Linux Kernel, Borislav Petkov
On Sat, Jun 28, 2014 at 10:07 AM, Pavel Machek <pavel@ucw.cz> wrote:
> On Thu 2014-06-26 13:47:32, Andy Lutomirski wrote:
>> On Thu, Jun 26, 2014 at 1:12 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> > The real question is if we care that sysret and iter don't match. On 32 bits the situation is even more complex.
>>
>> At least for 64 bits, iret vs sysret is purely a kernel implementation
>> detail (except where a tracer modifies things that are inaccessible to
>> sysret), so ISTM it's worth one instruction to make them match.
>>
>> I noticed this thing while fiddling with moving some of the syscall
>> tracing logic to C. This isn't a real problem, but it at least made
>> me scratch my head.
>
> If possible, we'd like to trace programs without programs being noticed they are
> being traced. See subterfugue utility, for example.
>
> It is certainly worth one extra instruction.
I tend to agree.
FWIW, I haven't looked at the ia32 stuff, but it should be possible to
do something similar if it's not there already. The iret path can set
any user state it wants.
--Andy
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] x86_64,entry: Fix RCX for traced syscalls
2014-06-26 19:08 [PATCH] x86_64,entry: Fix RCX for traced syscalls Andy Lutomirski
2014-06-26 19:56 ` Andi Kleen
@ 2014-11-04 23:43 ` Andy Lutomirski
1 sibling, 0 replies; 9+ messages in thread
From: Andy Lutomirski @ 2014-11-04 23:43 UTC (permalink / raw)
To: X86 ML, H. Peter Anvin, Linux Kernel
Cc: Borislav Petkov, Andi Kleen, Andy Lutomirski
On Thu, Jun 26, 2014 at 12:08 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> The int_ret_from_sys_call and syscall tracing code disagrees with
> the sysret path as to the value of RCX.
>
> The Intel SDM, the AMD APM, and my laptop all agree that sysret
> returns with RCX == RIP. The syscall tracing code does not respect
> this property.
>
> For example, this program:
>
> int main()
> {
> extern const char syscall_rip[];
> unsigned long rcx = 1;
> unsigned long orig_rcx = rcx;
> asm ("mov $-1, %%eax\n\t"
> "syscall\n\t"
> "syscall_rip:"
> : "+c" (rcx) : : "r11");
> printf("syscall: RCX = %lX RIP = %lX orig RCX = %lx\n",
> rcx, (unsigned long)syscall_rip, orig_rcx);
> return 0;
> }
>
> prints:
> syscall: RCX = 400556 RIP = 400556 orig RCX = 1
>
> Running it under strace gives this instead:
> syscall: RCX = FFFFFFFFFFFFFFFF RIP = 400556 orig RCX = 1
>
> This changes FIXUP_TOP_OF_STACK to match sysret, causing the test to
> show RCX == RIP even under strace.
>
> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
> ---
> arch/x86/kernel/entry_64.S | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index b25ca96..6624e18 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -143,7 +143,8 @@ ENDPROC(native_usergs_sysret64)
> movq \tmp,RSP+\offset(%rsp)
> movq $__USER_DS,SS+\offset(%rsp)
> movq $__USER_CS,CS+\offset(%rsp)
> - movq $-1,RCX+\offset(%rsp)
> + movq RIP+\offset(%rsp),\tmp /* get rip */
> + movq \tmp,RCX+\offset(%rsp) /* copy it to rcx as sysret would do */
> movq R11+\offset(%rsp),\tmp /* get eflags */
> movq \tmp,EFLAGS+\offset(%rsp)
> .endm
> --
> 1.9.3
>
Hi all-
I think this got lost. No one acked it, but no one nacked it either.
Summary of arguments in favor of applying:
- It's arguably a bugfix.
- Processes shouldn't be able to detect that they're being traced.
There are probably any number of ways that tracing is visible, but
this fixes one of them.
- The assembly code is complex enough anyway without requiring people
to wonder why setting the saved RCX to -1 is a reasonable thing to do.
- The performance hit is probably negligible. It's a single
instruction, it only happens during a slow path, and it's unlikely to
cause a cache miss.
Summary of arguments against:
- It adds one instruction.
- The bug it fixes isn't really entirely obviously a bug in the first place.
I'm still in favor.
--Andy
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2014-11-04 23:44 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-06-26 19:08 [PATCH] x86_64,entry: Fix RCX for traced syscalls Andy Lutomirski
2014-06-26 19:56 ` Andi Kleen
2014-06-26 19:59 ` Andy Lutomirski
2014-06-26 20:00 ` Andy Lutomirski
2014-06-26 20:12 ` H. Peter Anvin
2014-06-26 20:47 ` Andy Lutomirski
2014-06-28 17:07 ` Pavel Machek
2014-06-30 15:12 ` Andy Lutomirski
2014-11-04 23:43 ` Andy Lutomirski
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.