* [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80
@ 2013-02-14 13:17 Denys Vlasenko
2013-02-14 15:00 ` Oleg Nesterov
0 siblings, 1 reply; 12+ messages in thread
From: Denys Vlasenko @ 2013-02-14 13:17 UTC (permalink / raw)
To: linux-kernel, Oleg Nesterov
Determining personality of a ptraced process is a murky area.
On x86, for years strace was looking at segment selectors,
which is conceptually wrong: see, for example,
https://lkml.org/lkml/2012/1/18/320
strace recently changed detection method and current git code
(not released yet) does the following: it reads registers
with PTRACE_GETREGSET, and looks at returned regset size.
It is different for 64-bit and 32-bit processes,
and appears to be a reliable way to determine personality:
No need to check segment selectors for magic values.
This works for well-behaving processes.
But the hole described in the aforementioned lkml thread
still remains: 64-bit processes can perform 32-bit syscalls
using "int 80" entry method, and in this case, kernel returns
64-bit regset. For example, this:
asm("int $0x80": :"a" (29)); /* 32-bit sys_pause */
will be decoded by strace as a (64-bit) shmget syscall.
This patch makes it so that in syscall-entry-stop caused by
"int 80" instruction, PTRACE_GETREGSET returns 32-bit regset.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
--- linux-3.7.7/arch/x86/include/asm/uaccess.h
+++ linux-3.7.7_regset/arch/x86/include/asm/uaccess.h
@@ -594,5 +594,7 @@
# include <asm/uaccess_64.h>
#endif
+#define ARCH_HAS_SYSCALL_USER_REGSET_VIEW 1
+
#endif /* _ASM_X86_UACCESS_H */
--- linux-3.7.7/arch/x86/kernel/ptrace.c
+++ linux-3.7.7_regset/arch/x86/kernel/ptrace.c
@@ -1443,6 +1443,22 @@
#endif
}
+const struct user_regset_view *syscall_user_regset_view(struct task_struct *task)
+{
+#ifdef CONFIG_IA32_EMULATION
+ /* Did task make 32-bit syscall just now?
+ * Task can still be 64-bit: think "int 0x80 on x86_64".
+ */
+ if (task_thread_info(task)->status & TS_COMPAT)
+#endif
+#if defined CONFIG_X86_32 || defined CONFIG_IA32_EMULATION
+ return &user_x86_32_view;
+#endif
+#ifdef CONFIG_X86_64
+ return &user_x86_64_view;
+#endif
+}
+
static void fill_sigtrap_info(struct task_struct *tsk,
struct pt_regs *regs,
int error_code, int si_code,
--- linux-3.7.7/include/linux/regset.h
+++ linux-3.7.7_regset/include/linux/regset.h
@@ -204,6 +204,12 @@
*/
const struct user_regset_view *task_user_regset_view(struct task_struct *tsk);
+#ifdef ARCH_HAS_SYSCALL_USER_REGSET_VIEW
+const struct user_regset_view *syscall_user_regset_view(struct task_struct *tsk);
+#else
+# define syscall_user_regset_view task_user_regset_view
+#endif
+
/*
* These are helpers for writing regset get/set functions in arch code.
--- linux-3.7.7/kernel/ptrace.c
+++ linux-3.7.7_regset/kernel/ptrace.c
@@ -684,7 +684,7 @@
static int ptrace_regset(struct task_struct *task, int req, unsigned int type,
struct iovec *kiov)
{
- const struct user_regset_view *view = task_user_regset_view(task);
+ const struct user_regset_view *view = syscall_user_regset_view(task);
const struct user_regset *regset = find_regset(view, type);
int regset_no;
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80
2013-02-14 13:17 [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80 Denys Vlasenko
@ 2013-02-14 15:00 ` Oleg Nesterov
2013-02-14 16:26 ` Denys Vlasenko
2013-02-14 18:05 ` H. Peter Anvin
0 siblings, 2 replies; 12+ messages in thread
From: Oleg Nesterov @ 2013-02-14 15:00 UTC (permalink / raw)
To: Denys Vlasenko; +Cc: linux-kernel, Andi Kleen, H. Peter Anvin
On 02/14, Denys Vlasenko wrote:
>
> Determining personality of a ptraced process is a murky area.
> On x86, for years strace was looking at segment selectors,
> which is conceptually wrong: see, for example,
> https://lkml.org/lkml/2012/1/18/320
>
> strace recently changed detection method and current git code
> (not released yet) does the following: it reads registers
> with PTRACE_GETREGSET, and looks at returned regset size.
> It is different for 64-bit and 32-bit processes,
> and appears to be a reliable way to determine personality:
> No need to check segment selectors for magic values.
>
> This works for well-behaving processes.
>
> But the hole described in the aforementioned lkml thread
> still remains: 64-bit processes can perform 32-bit syscalls
> using "int 80" entry method, and in this case, kernel returns
> 64-bit regset. For example, this:
>
> asm("int $0x80": :"a" (29)); /* 32-bit sys_pause */
>
> will be decoded by strace as a (64-bit) shmget syscall.
>
> This patch makes it so that in syscall-entry-stop caused by
> "int 80" instruction, PTRACE_GETREGSET returns 32-bit regset.
Not sure...
First of all, this is incompatible change. And to me, it doesn't look
correct anyway. Say, why the debugger can't modify r15 if a 64bit tracee
does int80 ? Or think about PTRACE_EVENT_FORK which can be reported with
TS_COMPAT set.
Probably is_ia32_task() should be reported "explicitely" as we already
discussed, and afaik you have other ideas.
Oleg.
> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
>
> --- linux-3.7.7/arch/x86/include/asm/uaccess.h
> +++ linux-3.7.7_regset/arch/x86/include/asm/uaccess.h
> @@ -594,5 +594,7 @@
> # include <asm/uaccess_64.h>
> #endif
>
> +#define ARCH_HAS_SYSCALL_USER_REGSET_VIEW 1
> +
> #endif /* _ASM_X86_UACCESS_H */
>
> --- linux-3.7.7/arch/x86/kernel/ptrace.c
> +++ linux-3.7.7_regset/arch/x86/kernel/ptrace.c
> @@ -1443,6 +1443,22 @@
> #endif
> }
>
> +const struct user_regset_view *syscall_user_regset_view(struct task_struct *task)
> +{
> +#ifdef CONFIG_IA32_EMULATION
> + /* Did task make 32-bit syscall just now?
> + * Task can still be 64-bit: think "int 0x80 on x86_64".
> + */
> + if (task_thread_info(task)->status & TS_COMPAT)
> +#endif
> +#if defined CONFIG_X86_32 || defined CONFIG_IA32_EMULATION
> + return &user_x86_32_view;
> +#endif
> +#ifdef CONFIG_X86_64
> + return &user_x86_64_view;
> +#endif
> +}
> +
> static void fill_sigtrap_info(struct task_struct *tsk,
> struct pt_regs *regs,
> int error_code, int si_code,
> --- linux-3.7.7/include/linux/regset.h
> +++ linux-3.7.7_regset/include/linux/regset.h
> @@ -204,6 +204,12 @@
> */
> const struct user_regset_view *task_user_regset_view(struct task_struct *tsk);
>
> +#ifdef ARCH_HAS_SYSCALL_USER_REGSET_VIEW
> +const struct user_regset_view *syscall_user_regset_view(struct task_struct *tsk);
> +#else
> +# define syscall_user_regset_view task_user_regset_view
> +#endif
> +
>
> /*
> * These are helpers for writing regset get/set functions in arch code.
> --- linux-3.7.7/kernel/ptrace.c
> +++ linux-3.7.7_regset/kernel/ptrace.c
> @@ -684,7 +684,7 @@
> static int ptrace_regset(struct task_struct *task, int req, unsigned int type,
> struct iovec *kiov)
> {
> - const struct user_regset_view *view = task_user_regset_view(task);
> + const struct user_regset_view *view = syscall_user_regset_view(task);
> const struct user_regset *regset = find_regset(view, type);
> int regset_no;
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80
2013-02-14 15:00 ` Oleg Nesterov
@ 2013-02-14 16:26 ` Denys Vlasenko
2013-02-14 18:05 ` H. Peter Anvin
1 sibling, 0 replies; 12+ messages in thread
From: Denys Vlasenko @ 2013-02-14 16:26 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: linux-kernel, Andi Kleen, H. Peter Anvin
On 02/14/2013 04:00 PM, Oleg Nesterov wrote:
> On 02/14, Denys Vlasenko wrote:
>> This patch makes it so that in syscall-entry-stop caused by
>> "int 80" instruction, PTRACE_GETREGSET returns 32-bit regset.
>
> Not sure...
>
> First of all, this is incompatible change. And to me, it doesn't look
> correct anyway. Say, why the debugger can't modify r15 if a 64bit tracee
> does int80?
On x86_64, PTRACE_GET/SETREGS can be used for this: they always operate
on 64-bit registers.
> Or think about PTRACE_EVENT_FORK which can be reported with
> TS_COMPAT set.
I don't see a problem. Yes, PTRACE_GETREGSET will return
32-bit regset in this ptrace-stop, which is a problem... why?
> Probably is_ia32_task() should be reported "explicitely" as we already
> discussed, and afaik you have other ideas.
Yes, there are a few ideas. Say, new ptrace op:
can be introduced to return a vector of longs.
To make it easily parsable, how about (type,len,data...)
records? This also may allow tracer to indicate which records
it wants: for example, not everyone wants to read syscall params.
Maybe something like
ptrace(PTRACE_GETSYSCALL, pid, list_of_elements_I_want, &iov)
where list_of_elements_I_want is long[], 0-terminated,
iov points to a buffer, and on return iov.len is updated
(a-la GETREGSET).
What problems can be solved here?
(1) syscall entry/exit discrimination etc. Say, a record
can contain bit flags, such as "it's a syscall entry",
"it's a syscall exit", "it's a group-stop" etc.
Currently, it is impossible for tracer to distinguish
syscall entry from syscall exit.
(2) a record can supply arch-specific data, such as the x86-specific
problem I tried to address: "was it a int 80 syscall?". Variable-length
record format makes it easy to adapt to different archs' needs.
Alternatively, we can set aside a few bits in "bit flags record"
as arch dependent bits. Most arches need just a few bits.
(3) on syscall entry, a record can contain (up to) 7 words: syscall_no
and 0-6 params, making tracer's code less architecture dependent.
Today in strace, *every* architecture needs to have arch-dependent
regs-to-params conversion code. I would like to be able
to code it in C with the same code for most arches.
(4) We can read structs/data pointed by syscall params, such as
struct stat returned by fstat, without needing additional
round-trip to kernel, *and* with kernel-supplied information
on structure's size. Currently, strace has to know the size correctly.
There were, and will be, bugs in strace where we mishandle
structures because we mis-detect process' bitness, and use wrong
struct stat layouts. If kernel would be able to tell us:
"I returned 78 byte structure in memory pointed to by arg1",
it would help a lot. Even if it wouldn't return the result
structure itself (I imagine it's a lot of work in kernel
to access it in other process vm), knowing its size
will still be a big help.
(5) We can read several regsets: gps, SSE regs, etc.
Maybe someone would find it useful? (strace doesn't need this).
How does this look?
I propose to start small, by implementing just 1; 2 in a form of arch bits
as part of 1; and 3. It will satisfy the needs I tried to address
in my patch.
--
vda
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80
2013-02-14 15:00 ` Oleg Nesterov
2013-02-14 16:26 ` Denys Vlasenko
@ 2013-02-14 18:05 ` H. Peter Anvin
2013-02-14 19:18 ` Oleg Nesterov
1 sibling, 1 reply; 12+ messages in thread
From: H. Peter Anvin @ 2013-02-14 18:05 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: Denys Vlasenko, linux-kernel, Andi Kleen
On 02/14/2013 07:00 AM, Oleg Nesterov wrote:
> On 02/14, Denys Vlasenko wrote:
>>
>> Determining personality of a ptraced process is a murky area.
>> On x86, for years strace was looking at segment selectors,
>> which is conceptually wrong: see, for example,
>> https://lkml.org/lkml/2012/1/18/320
>>
One proposal that keeps being on the table is to export a regset with
metadatam, including process mode at launch (i386, x86-64, x32).
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80
2013-02-14 18:05 ` H. Peter Anvin
@ 2013-02-14 19:18 ` Oleg Nesterov
2013-02-14 19:21 ` H. Peter Anvin
0 siblings, 1 reply; 12+ messages in thread
From: Oleg Nesterov @ 2013-02-14 19:18 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Denys Vlasenko, linux-kernel, Andi Kleen
On 02/14, H. Peter Anvin wrote:
>
> On 02/14/2013 07:00 AM, Oleg Nesterov wrote:
>> On 02/14, Denys Vlasenko wrote:
>>>
>>> Determining personality of a ptraced process is a murky area.
>>> On x86, for years strace was looking at segment selectors,
>>> which is conceptually wrong: see, for example,
>>> https://lkml.org/lkml/2012/1/18/320
>>>
>
> One proposal that keeps being on the table is to export a regset with
> metadatam, including process mode at launch (i386, x86-64, x32).
Yes... but if this metadata includes TS_COMPAT-is-set, then strace should
do PTRACE_GETREGSET(REGSET_META) + PTRACE_GETREGSET(REGSET_GENERAL) every
time. Or REGSET_META should include META+GENERAL.
IOW, it is not clear to me what this "meta" should actually report.
Oleg.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80
2013-02-14 19:18 ` Oleg Nesterov
@ 2013-02-14 19:21 ` H. Peter Anvin
2013-02-14 20:55 ` Cyrill Gorcunov
2013-02-15 15:42 ` Denys Vlasenko
0 siblings, 2 replies; 12+ messages in thread
From: H. Peter Anvin @ 2013-02-14 19:21 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: Denys Vlasenko, linux-kernel, Andi Kleen
On 02/14/2013 11:18 AM, Oleg Nesterov wrote:
> On 02/14, H. Peter Anvin wrote:
>>
>> On 02/14/2013 07:00 AM, Oleg Nesterov wrote:
>>> On 02/14, Denys Vlasenko wrote:
>>>>
>>>> Determining personality of a ptraced process is a murky area.
>>>> On x86, for years strace was looking at segment selectors,
>>>> which is conceptually wrong: see, for example,
>>>> https://lkml.org/lkml/2012/1/18/320
>>>>
>>
>> One proposal that keeps being on the table is to export a regset with
>> metadatam, including process mode at launch (i386, x86-64, x32).
>
> Yes... but if this metadata includes TS_COMPAT-is-set, then strace should
> do PTRACE_GETREGSET(REGSET_META) + PTRACE_GETREGSET(REGSET_GENERAL) every
> time. Or REGSET_META should include META+GENERAL.
>
> IOW, it is not clear to me what this "meta" should actually report.
>
That is one of the things that needs to be nailed down. In particular,
what are the things people need.
-hpa
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80
2013-02-14 19:21 ` H. Peter Anvin
@ 2013-02-14 20:55 ` Cyrill Gorcunov
2013-02-15 14:50 ` Denys Vlasenko
2013-02-15 15:42 ` Denys Vlasenko
1 sibling, 1 reply; 12+ messages in thread
From: Cyrill Gorcunov @ 2013-02-14 20:55 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Oleg Nesterov, Denys Vlasenko, linux-kernel, Andi Kleen,
Pavel Emelyanov
On Thu, Feb 14, 2013 at 11:21:12AM -0800, H. Peter Anvin wrote:
> On 02/14/2013 11:18 AM, Oleg Nesterov wrote:
> > On 02/14, H. Peter Anvin wrote:
> >>
> >> On 02/14/2013 07:00 AM, Oleg Nesterov wrote:
> >>> On 02/14, Denys Vlasenko wrote:
> >>>>
> >>>> Determining personality of a ptraced process is a murky area.
> >>>> On x86, for years strace was looking at segment selectors,
> >>>> which is conceptually wrong: see, for example,
> >>>> https://lkml.org/lkml/2012/1/18/320
> >>>>
> >>
> >> One proposal that keeps being on the table is to export a regset with
> >> metadatam, including process mode at launch (i386, x86-64, x32).
> >
> > Yes... but if this metadata includes TS_COMPAT-is-set, then strace should
> > do PTRACE_GETREGSET(REGSET_META) + PTRACE_GETREGSET(REGSET_GENERAL) every
> > time. Or REGSET_META should include META+GENERAL.
> >
> > IOW, it is not clear to me what this "meta" should actually report.
>
> That is one of the things that needs to be nailed down. In particular,
> what are the things people need.
Indeed, having some "official" way for compat bit retrieval would be
a great thing for us (c/r camp) since we've just met the same problem.
And at moment I sticked for the same trick as gdb does (cs/ds test).
But, guys, if only I'm not missing something completely obvious,
can't we simply provide task-compat bit to userspace in say /proc/pid/stat
or something? Then the strace/gdb would be able to always know if
the tracee is actually in compat mode. Or I miss something fundamental
here?
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80
2013-02-14 20:55 ` Cyrill Gorcunov
@ 2013-02-15 14:50 ` Denys Vlasenko
2013-02-15 14:56 ` Cyrill Gorcunov
0 siblings, 1 reply; 12+ messages in thread
From: Denys Vlasenko @ 2013-02-15 14:50 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: H. Peter Anvin, Oleg Nesterov, linux-kernel, Andi Kleen,
Pavel Emelyanov
On 02/14/2013 09:55 PM, Cyrill Gorcunov wrote:
> On Thu, Feb 14, 2013 at 11:21:12AM -0800, H. Peter Anvin wrote:
>> On 02/14/2013 11:18 AM, Oleg Nesterov wrote:
>>> On 02/14, H. Peter Anvin wrote:
>>>>
>>>> On 02/14/2013 07:00 AM, Oleg Nesterov wrote:
>>>>> On 02/14, Denys Vlasenko wrote:
>>>>>>
>>>>>> Determining personality of a ptraced process is a murky area.
>>>>>> On x86, for years strace was looking at segment selectors,
>>>>>> which is conceptually wrong: see, for example,
>>>>>> https://lkml.org/lkml/2012/1/18/320
>>>>>>
>>>>
>>>> One proposal that keeps being on the table is to export a regset with
>>>> metadatam, including process mode at launch (i386, x86-64, x32).
>>>
>>> Yes... but if this metadata includes TS_COMPAT-is-set, then strace should
>>> do PTRACE_GETREGSET(REGSET_META) + PTRACE_GETREGSET(REGSET_GENERAL) every
>>> time. Or REGSET_META should include META+GENERAL.
>>>
>>> IOW, it is not clear to me what this "meta" should actually report.
>>
>> That is one of the things that needs to be nailed down. In particular,
>> what are the things people need.
>
> Indeed, having some "official" way for compat bit retrieval would be
> a great thing for us (c/r camp) since we've just met the same problem.
> And at moment I sticked for the same trick as gdb does (cs/ds test).
>
> But, guys, if only I'm not missing something completely obvious,
> can't we simply provide task-compat bit to userspace in say /proc/pid/stat
> or something? Then the strace/gdb would be able to always know if
> the tracee is actually in compat mode. Or I miss something fundamental
> here?
strace needs to get that data on every syscall entry in the traced process.
Doing open/read/close on every syscall entry is going to slow it down a lot.
--
vda
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80
2013-02-15 14:50 ` Denys Vlasenko
@ 2013-02-15 14:56 ` Cyrill Gorcunov
2013-02-15 15:09 ` Oleg Nesterov
0 siblings, 1 reply; 12+ messages in thread
From: Cyrill Gorcunov @ 2013-02-15 14:56 UTC (permalink / raw)
To: Denys Vlasenko
Cc: H. Peter Anvin, Oleg Nesterov, linux-kernel, Andi Kleen,
Pavel Emelyanov
On Fri, Feb 15, 2013 at 03:50:43PM +0100, Denys Vlasenko wrote:
> On 02/14/2013 09:55 PM, Cyrill Gorcunov wrote:
> > On Thu, Feb 14, 2013 at 11:21:12AM -0800, H. Peter Anvin wrote:
> >> On 02/14/2013 11:18 AM, Oleg Nesterov wrote:
> >>> On 02/14, H. Peter Anvin wrote:
> >>>>
> >>>> On 02/14/2013 07:00 AM, Oleg Nesterov wrote:
> >>>>> On 02/14, Denys Vlasenko wrote:
> >>>>>>
> >>>>>> Determining personality of a ptraced process is a murky area.
> >>>>>> On x86, for years strace was looking at segment selectors,
> >>>>>> which is conceptually wrong: see, for example,
> >>>>>> https://lkml.org/lkml/2012/1/18/320
> >>>>>>
> >>>>
> >>>> One proposal that keeps being on the table is to export a regset with
> >>>> metadatam, including process mode at launch (i386, x86-64, x32).
> >>>
> >>> Yes... but if this metadata includes TS_COMPAT-is-set, then strace should
> >>> do PTRACE_GETREGSET(REGSET_META) + PTRACE_GETREGSET(REGSET_GENERAL) every
> >>> time. Or REGSET_META should include META+GENERAL.
> >>>
> >>> IOW, it is not clear to me what this "meta" should actually report.
> >>
> >> That is one of the things that needs to be nailed down. In particular,
> >> what are the things people need.
> >
> > Indeed, having some "official" way for compat bit retrieval would be
> > a great thing for us (c/r camp) since we've just met the same problem.
> > And at moment I sticked for the same trick as gdb does (cs/ds test).
> >
> > But, guys, if only I'm not missing something completely obvious,
> > can't we simply provide task-compat bit to userspace in say /proc/pid/stat
> > or something? Then the strace/gdb would be able to always know if
> > the tracee is actually in compat mode. Or I miss something fundamental
> > here?
>
> strace needs to get that data on every syscall entry in the traced process.
> Doing open/read/close on every syscall entry is going to slow it down a lot.
Don't you need to read it only once when strace is attached? Compat flag can't
be arbitrary dropped when program executes, no?
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80
2013-02-15 14:56 ` Cyrill Gorcunov
@ 2013-02-15 15:09 ` Oleg Nesterov
2013-02-15 15:16 ` Cyrill Gorcunov
0 siblings, 1 reply; 12+ messages in thread
From: Oleg Nesterov @ 2013-02-15 15:09 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: Denys Vlasenko, H. Peter Anvin, linux-kernel, Andi Kleen,
Pavel Emelyanov
On 02/15, Cyrill Gorcunov wrote:
>
> > strace needs to get that data on every syscall entry in the traced process.
> > Doing open/read/close on every syscall entry is going to slow it down a lot.
>
> Don't you need to read it only once when strace is attached? Compat flag can't
> be arbitrary dropped when program executes, no?
TS_COMPAT is set/cleared every time a 64bit task does int80.
I guess you need TIF_IA32, not TS_COMPAT, for c/r.
Oleg.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80
2013-02-15 15:09 ` Oleg Nesterov
@ 2013-02-15 15:16 ` Cyrill Gorcunov
0 siblings, 0 replies; 12+ messages in thread
From: Cyrill Gorcunov @ 2013-02-15 15:16 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Denys Vlasenko, H. Peter Anvin, linux-kernel, Andi Kleen,
Pavel Emelyanov
On Fri, Feb 15, 2013 at 04:09:40PM +0100, Oleg Nesterov wrote:
> On 02/15, Cyrill Gorcunov wrote:
> >
> > > strace needs to get that data on every syscall entry in the traced process.
> > > Doing open/read/close on every syscall entry is going to slow it down a lot.
> >
> > Don't you need to read it only once when strace is attached? Compat flag can't
> > be arbitrary dropped when program executes, no?
>
> TS_COMPAT is set/cleared every time a 64bit task does int80.
>
> I guess you need TIF_IA32, not TS_COMPAT, for c/r.
Yeah, indeed. Still if there will be a metadata in registers set, this
will be even better I think.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80
2013-02-14 19:21 ` H. Peter Anvin
2013-02-14 20:55 ` Cyrill Gorcunov
@ 2013-02-15 15:42 ` Denys Vlasenko
1 sibling, 0 replies; 12+ messages in thread
From: Denys Vlasenko @ 2013-02-15 15:42 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Oleg Nesterov, linux-kernel, Andi Kleen, jan.kratochvil
On 02/14/2013 08:21 PM, H. Peter Anvin wrote:
> On 02/14/2013 11:18 AM, Oleg Nesterov wrote:
>> On 02/14, H. Peter Anvin wrote:
>>>
>>> On 02/14/2013 07:00 AM, Oleg Nesterov wrote:
>>>> On 02/14, Denys Vlasenko wrote:
>>>>>
>>>>> Determining personality of a ptraced process is a murky area.
>>>>> On x86, for years strace was looking at segment selectors,
>>>>> which is conceptually wrong: see, for example,
>>>>> https://lkml.org/lkml/2012/1/18/320
>>>>>
>>>
>>> One proposal that keeps being on the table is to export a regset with
>>> metadatam, including process mode at launch (i386, x86-64, x32).
>>
>> Yes... but if this metadata includes TS_COMPAT-is-set, then strace should
>> do PTRACE_GETREGSET(REGSET_META) + PTRACE_GETREGSET(REGSET_GENERAL) every
>> time. Or REGSET_META should include META+GENERAL.
>>
>> IOW, it is not clear to me what this "meta" should actually report.
>
> That is one of the things that needs to be nailed down. In particular,
> what are the things people need.
Let's see what strace needs, by examining its source for various arches.
Ow. Six instances of PTRACE_PEEKTEXT (i.e. attempts to read tracee's
code - inherently unsafe operation) in syscall.c, affected arches:
S390: for syscall# fetch, thankfully only needed before 2.5.44;
ARM: for syscall# fetch. Looks like only needed for non-EABI?
SPARC: for personality detection.
Examples of personality detection:
POWERPC64: by examining registers (MSR)
X86: by looking at GETREGSET size
IA64: by examining registers (CR_IPSR)
ARM: by checking syscall no (scno & 0x0f0000)
SPARC: by looking at trap instruction
Syscall entry versus exit detection (i.e. a sanity check):
ALPHA, MIPS: registers (if a3 is 0 or -1, it's exit)
S390: registers (messy code)
X86: registers (eax must be -ENOSYS on entry)
In general, it is not reliable: eax must be -ENOSYS on entry,
but it can be -ENOSYS on exit too. IOW: if we see eax == -ENOSYS,
we have noi idea whether it's entry or exit.
Syscall parameters fetching. Some architectures
need to use nontrivial code. Look at this:
#elif defined(IA64)
if (!ia32) {
unsigned long *out0, cfm, sof, sol;
long rbs_end;
/* be backwards compatible with kernel < 2.4.4... */
# ifndef PT_RBS_END
# define PT_RBS_END PT_AR_BSP
# endif
if (upeek(tcp, PT_RBS_END, &rbs_end) < 0)
return -1;
if (upeek(tcp, PT_CFM, (long *) &cfm) < 0)
return -1;
sof = (cfm >> 0) & 0x7f;
sol = (cfm >> 7) & 0x7f;
out0 = ia64_rse_skip_regs((unsigned long *) rbs_end, -sof + sol);
for (i = 0; i < nargs; ++i) {
if (umoven(tcp, (unsigned long) ia64_rse_skip_regs(out0, i),
sizeof(long), (char *) &tcp->u_arg[i]) < 0)
return -1;
}
or this:
#elif defined(MIPS)
if (nargs > 4) {
long sp;
if (upeek(tcp, REG_SP, &sp) < 0)
return -1;
for (i = 0; i < 4; ++i)
if (upeek(tcp, REG_A0 + i, &tcp->u_arg[i]) < 0)
return -1;
umoven(tcp, sp + 16, (nargs - 4) * sizeof(tcp->u_arg[0]),
(char *)(tcp->u_arg + 4));
} else {
for (i = 0; i < nargs; ++i)
if (upeek(tcp, REG_A0 + i, &tcp->u_arg[i]) < 0)
return -1;
}
Detecting error exits from syscalls. Most arches use the -errno
convention, others (IA64, SPARC, MIPS) have dedicated register
or bit in a status register to indicate error. Some syscalls
"never fail" (e.g. getgid), and strace needs to know which syscalls
never fail.
If you want to take a look yourself, for your convenience I attached
larger excerpts from strace's syscall.c source file.
To summarize:
Looks like this particular ptrace user would benefit from
the following data:
* is it a syscall entry, exit, or something else.
* for syscall entry:
- parameters width (32/64/etc) and personality data
(if arch has personality data more fine-grained than "32/64 bits")
- syscall no
- parameters
* for syscall exit:
- parameters width (32/64/etc) and personality data
- error indicator (errno)?
- syscall result
Does this look as a good format?
--
vda
static int
get_scno(struct tcb *tcp)
{
long scno = 0;
#if defined(S390) || defined(S390X)
if (upeek(tcp, PT_GPR2, &syscall_mode) < 0)
return -1;
if (syscall_mode != -ENOSYS) {
/*
* Since kernel version 2.5.44 the scno gets passed in gpr2.
*/
scno = syscall_mode;
} else {
/*
* Old style of "passing" the scno via the SVC instruction.
*/
long psw;
long opcode, offset_reg, tmp;
void *svc_addr;
static const int gpr_offset[16] = {
PT_GPR0, PT_GPR1, PT_ORIGGPR2, PT_GPR3,
PT_GPR4, PT_GPR5, PT_GPR6, PT_GPR7,
PT_GPR8, PT_GPR9, PT_GPR10, PT_GPR11,
PT_GPR12, PT_GPR13, PT_GPR14, PT_GPR15
};
if (upeek(tcp, PT_PSWADDR, &psw) < 0)
return -1;
errno = 0;
opcode = ptrace(PTRACE_PEEKTEXT, tcp->pid, (char *)(psw - sizeof(long)), 0);
if (errno) {
perror_msg("%s", "peektext(psw-oneword)");
return -1;
}
/*
* We have to check if the SVC got executed directly or via an
* EXECUTE instruction. In case of EXECUTE it is necessary to do
* instruction decoding to derive the system call number.
* Unfortunately the opcode sizes of EXECUTE and SVC are differently,
* so that this doesn't work if a SVC opcode is part of an EXECUTE
* opcode. Since there is no way to find out the opcode size this
* is the best we can do...
*/
if ((opcode & 0xff00) == 0x0a00) {
/* SVC opcode */
scno = opcode & 0xff;
}
else {
/* SVC got executed by EXECUTE instruction */
/*
* Do instruction decoding of EXECUTE. If you really want to
* understand this, read the Principles of Operations.
*/
svc_addr = (void *) (opcode & 0xfff);
tmp = 0;
offset_reg = (opcode & 0x000f0000) >> 16;
if (offset_reg && (upeek(tcp, gpr_offset[offset_reg], &tmp) < 0))
return -1;
svc_addr += tmp;
tmp = 0;
offset_reg = (opcode & 0x0000f000) >> 12;
if (offset_reg && (upeek(tcp, gpr_offset[offset_reg], &tmp) < 0))
return -1;
svc_addr += tmp;
scno = ptrace(PTRACE_PEEKTEXT, tcp->pid, svc_addr, 0);
if (errno)
return -1;
# if defined(S390X)
scno >>= 48;
# else
scno >>= 16;
# endif
tmp = 0;
offset_reg = (opcode & 0x00f00000) >> 20;
if (offset_reg && (upeek(tcp, gpr_offset[offset_reg], &tmp) < 0))
return -1;
scno = (scno | tmp) & 0xff;
}
}
#elif defined(POWERPC)
if (upeek(tcp, sizeof(unsigned long)*PT_R0, &scno) < 0)
return -1;
# ifdef POWERPC64
/* TODO: speed up strace by not doing this at every syscall.
* We only need to do it after execve.
*/
int currpers;
long val;
/* Check for 64/32 bit mode. */
if (upeek(tcp, sizeof(unsigned long)*PT_MSR, &val) < 0)
return -1;
/* SF is bit 0 of MSR */
if (val < 0)
currpers = 0;
else
currpers = 1;
update_personality(tcp, currpers);
# endif
#elif defined(X86_64) || defined(X32)
int currpers;
/* GETREGSET of NT_PRSTATUS tells us regset size,
* which unambiguously detects i386.
*
* Linux kernel distinguishes x86-64 and x32 processes
* solely by looking at __X32_SYSCALL_BIT:
* arch/x86/include/asm/compat.h::is_x32_task():
* if (task_pt_regs(current)->orig_ax & __X32_SYSCALL_BIT)
* return true;
*/
if (x86_io.iov_len == sizeof(i386_regs)) {
scno = i386_regs.orig_eax;
currpers = 1;
} else {
scno = x86_64_regs.orig_rax;
currpers = 0;
if (scno & __X32_SYSCALL_BIT) {
scno -= __X32_SYSCALL_BIT;
currpers = 2;
}
}
update_personality(tcp, currpers);
#elif defined(IA64)
long psr;
if (upeek(tcp, PT_CR_IPSR, &psr) >= 0)
ia32 = (psr & IA64_PSR_IS) != 0;
if (ia32) {
if (upeek(tcp, PT_R1, &scno) < 0)
return -1;
} else {
if (upeek(tcp, PT_R15, &scno) < 0)
return -1;
}
#elif defined(AARCH64)
switch (aarch64_io.iov_len) {
case sizeof(aarch64_regs):
/* We are in 64-bit mode */
scno = aarch64_regs.regs[8];
update_personality(tcp, 1);
break;
case sizeof(arm_regs):
/* We are in 32-bit mode */
scno = arm_regs.ARM_r7;
update_personality(tcp, 0);
break;
}
#elif defined(ARM)
/*
* We only need to grab the syscall number on syscall entry.
*/
if (arm_regs.ARM_ip == 0) {
/*
* Note: we only deal with 32-bit CPUs here
*/
if (arm_regs.ARM_cpsr & 0x20) {
/*
* Get the Thumb-mode system call number
*/
scno = arm_regs.ARM_r7;
} else {
/*
* Get the ARM-mode system call number
*/
errno = 0;
scno = ptrace(PTRACE_PEEKTEXT, tcp->pid, (void *)(arm_regs.ARM_pc - 4), NULL);
if (errno)
return -1;
/* Handle the EABI syscall convention. We do not
bother converting structures between the two
ABIs, but basic functionality should work even
if strace and the traced program have different
ABIs. */
if (scno == 0xef000000) {
scno = arm_regs.ARM_r7;
} else {
if ((scno & 0x0ff00000) != 0x0f900000) {
fprintf(stderr, "syscall: unknown syscall trap 0x%08lx\n",
scno);
return -1;
}
/*
* Fixup the syscall number
*/
scno &= 0x000fffff;
}
}
if (scno & 0x0f0000) {
/*
* Handle ARM specific syscall
*/
update_personality(tcp, 1);
scno &= 0x0000ffff;
} else
update_personality(tcp, 0);
} else {
fprintf(stderr, "pid %d stray syscall entry\n", tcp->pid);
tcp->flags |= TCB_INSYSCALL;
}
#elif defined(LINUX_MIPSN32)
unsigned long long regs[38];
if (ptrace(PTRACE_GETREGS, tcp->pid, NULL, (long) ®s) < 0)
return -1;
mips_a3 = regs[REG_A3];
mips_r2 = regs[REG_V0];
scno = mips_r2;
if (!SCNO_IN_RANGE(scno)) {
if (mips_a3 == 0 || mips_a3 == -1) {
if (debug_flag)
fprintf(stderr, "stray syscall exit: v0 = %ld\n", scno);
return 0;
}
}
#elif defined(MIPS)
if (upeek(tcp, REG_A3, &mips_a3) < 0)
return -1;
if (upeek(tcp, REG_V0, &scno) < 0)
return -1;
if (!SCNO_IN_RANGE(scno)) {
if (mips_a3 == 0 || mips_a3 == -1) {
if (debug_flag)
fprintf(stderr, "stray syscall exit: v0 = %ld\n", scno);
return 0;
}
}
#elif defined(ALPHA)
if (upeek(tcp, REG_A3, &alpha_a3) < 0)
return -1;
if (upeek(tcp, REG_R0, &scno) < 0)
return -1;
/*
* Do some sanity checks to figure out if it's
* really a syscall entry
*/
if (!SCNO_IN_RANGE(scno)) {
if (alpha_a3 == 0 || alpha_a3 == -1) {
if (debug_flag)
fprintf(stderr, "stray syscall exit: r0 = %ld\n", scno);
return 0;
}
}
#elif defined(SPARC) || defined(SPARC64)
/* Disassemble the syscall trap. */
/* Retrieve the syscall trap instruction. */
unsigned long trap;
errno = 0;
# if defined(SPARC64)
trap = ptrace(PTRACE_PEEKTEXT, tcp->pid, (char *)regs.tpc, 0);
trap >>= 32;
# else
trap = ptrace(PTRACE_PEEKTEXT, tcp->pid, (char *)regs.pc, 0);
# endif
if (errno)
return -1;
/* Disassemble the trap to see what personality to use. */
switch (trap) {
case 0x91d02010:
/* Linux/SPARC syscall trap. */
update_personality(tcp, 0);
break;
case 0x91d0206d:
/* Linux/SPARC64 syscall trap. */
update_personality(tcp, 2);
break;
case 0x91d02000:
/* SunOS syscall trap. (pers 1) */
fprintf(stderr, "syscall: SunOS no support\n");
return -1;
case 0x91d02008:
/* Solaris 2.x syscall trap. (per 2) */
update_personality(tcp, 1);
break;
case 0x91d02009:
/* NetBSD/FreeBSD syscall trap. */
fprintf(stderr, "syscall: NetBSD/FreeBSD not supported\n");
return -1;
case 0x91d02027:
/* Solaris 2.x gettimeofday */
update_personality(tcp, 1);
break;
default:
# if defined(SPARC64)
fprintf(stderr, "syscall: unknown syscall trap %08lx %016lx\n", trap, regs.tpc);
# else
fprintf(stderr, "syscall: unknown syscall trap %08lx %08lx\n", trap, regs.pc);
# endif
return -1;
}
/* Extract the system call number from the registers. */
if (trap == 0x91d02027)
scno = 156;
else
scno = regs.u_regs[U_REG_G1];
if (scno == 0) {
scno = regs.u_regs[U_REG_O0];
memmove(®s.u_regs[U_REG_O0], ®s.u_regs[U_REG_O1], 7*sizeof(regs.u_regs[0]));
}
#elif defined(TILE)
int currpers;
scno = tile_regs.regs[10];
# ifdef __tilepro__
currpers = 1;
# else
# ifndef PT_FLAGS_COMPAT
# define PT_FLAGS_COMPAT 0x10000 /* from Linux 3.8 on */
# endif
if (tile_regs.flags & PT_FLAGS_COMPAT)
currpers = 1;
else
currpers = 0;
# endif
update_personality(tcp, currpers);
#endif
tcp->scno = scno;
return 1;
}
/* Called at each syscall entry.
* Returns:
* 0: "ignore this ptrace stop", bail out of trace_syscall_entering() silently.
* 1: ok, continue in trace_syscall_entering().
* other: error, trace_syscall_entering() should print error indicator
* ("????" etc) and bail out.
*/
static int
syscall_fixup_on_sysenter(struct tcb *tcp)
{
/* A common case of "not a syscall entry" is post-execve SIGTRAP */
#if defined(I386)
if (i386_regs.eax != -ENOSYS) {
if (debug_flag)
fprintf(stderr, "not a syscall entry (eax = %ld)\n", i386_regs.eax);
return 0;
}
#elif defined(X86_64) || defined(X32)
{
long rax;
if (x86_io.iov_len == sizeof(i386_regs)) {
/* Sign extend from 32 bits */
rax = (int32_t)i386_regs.eax;
} else {
/* Note: in X32 build, this truncates 64 to 32 bits */
rax = x86_64_regs.rax;
}
if (rax != -ENOSYS) {
if (debug_flag)
fprintf(stderr, "not a syscall entry (rax = %ld)\n", rax);
return 0;
}
}
#elif defined(S390) || defined(S390X)
/* TODO: we already fetched PT_GPR2 in get_scno
* and stored it in syscall_mode, reuse it here
* instead of re-fetching?
*/
if (upeek(tcp, PT_GPR2, &gpr2) < 0)
return -1;
if (syscall_mode != -ENOSYS)
syscall_mode = tcp->scno;
if (gpr2 != syscall_mode) {
if (debug_flag)
fprintf(stderr, "not a syscall entry (gpr2 = %ld)\n", gpr2);
return 0;
}
#elif defined(M68K)
if (upeek(tcp, 4*PT_D0, &m68k_d0) < 0)
return -1;
if (m68k_d0 != -ENOSYS) {
if (debug_flag)
fprintf(stderr, "not a syscall entry (d0 = %ld)\n", m68k_d0);
return 0;
}
#elif defined(IA64)
if (upeek(tcp, PT_R10, &ia64_r10) < 0)
return -1;
if (upeek(tcp, PT_R8, &ia64_r8) < 0)
return -1;
if (ia32 && ia64_r8 != -ENOSYS) {
if (debug_flag)
fprintf(stderr, "not a syscall entry (r8 = %ld)\n", ia64_r8);
return 0;
}
#elif defined(CRISV10) || defined(CRISV32)
if (upeek(tcp, 4*PT_R10, &cris_r10) < 0)
return -1;
if (cris_r10 != -ENOSYS) {
if (debug_flag)
fprintf(stderr, "not a syscall entry (r10 = %ld)\n", cris_r10);
return 0;
}
#elif defined(MICROBLAZE)
if (upeek(tcp, 3 * 4, µblaze_r3) < 0)
return -1;
if (microblaze_r3 != -ENOSYS) {
if (debug_flag)
fprintf(stderr, "not a syscall entry (r3 = %ld)\n", microblaze_r3);
return 0;
}
#endif
return 1;
}
/* Return -1 on error or 1 on success (never 0!) */
static int
get_syscall_args(struct tcb *tcp)
{
int i, nargs;
if (SCNO_IN_RANGE(tcp->scno))
nargs = tcp->u_nargs = sysent[tcp->scno].nargs;
else
nargs = tcp->u_nargs = MAX_ARGS;
#if defined(S390) || defined(S390X)
for (i = 0; i < nargs; ++i)
if (upeek(tcp, i==0 ? PT_ORIGGPR2 : PT_GPR2 + i*sizeof(long), &tcp->u_arg[i]) < 0)
return -1;
#elif defined(ALPHA)
for (i = 0; i < nargs; ++i)
if (upeek(tcp, REG_A0+i, &tcp->u_arg[i]) < 0)
return -1;
#elif defined(IA64)
if (!ia32) {
unsigned long *out0, cfm, sof, sol;
long rbs_end;
/* be backwards compatible with kernel < 2.4.4... */
# ifndef PT_RBS_END
# define PT_RBS_END PT_AR_BSP
# endif
if (upeek(tcp, PT_RBS_END, &rbs_end) < 0)
return -1;
if (upeek(tcp, PT_CFM, (long *) &cfm) < 0)
return -1;
sof = (cfm >> 0) & 0x7f;
sol = (cfm >> 7) & 0x7f;
out0 = ia64_rse_skip_regs((unsigned long *) rbs_end, -sof + sol);
for (i = 0; i < nargs; ++i) {
if (umoven(tcp, (unsigned long) ia64_rse_skip_regs(out0, i),
sizeof(long), (char *) &tcp->u_arg[i]) < 0)
return -1;
}
} else {
static const int argreg[MAX_ARGS] = { PT_R11 /* EBX = out0 */,
PT_R9 /* ECX = out1 */,
PT_R10 /* EDX = out2 */,
PT_R14 /* ESI = out3 */,
PT_R15 /* EDI = out4 */,
PT_R13 /* EBP = out5 */};
for (i = 0; i < nargs; ++i) {
if (upeek(tcp, argreg[i], &tcp->u_arg[i]) < 0)
return -1;
/* truncate away IVE sign-extension */
tcp->u_arg[i] &= 0xffffffff;
}
}
#elif defined(MIPS)
if (nargs > 4) {
long sp;
if (upeek(tcp, REG_SP, &sp) < 0)
return -1;
for (i = 0; i < 4; ++i)
if (upeek(tcp, REG_A0 + i, &tcp->u_arg[i]) < 0)
return -1;
umoven(tcp, sp + 16, (nargs - 4) * sizeof(tcp->u_arg[0]),
(char *)(tcp->u_arg + 4));
} else {
for (i = 0; i < nargs; ++i)
if (upeek(tcp, REG_A0 + i, &tcp->u_arg[i]) < 0)
return -1;
}
#elif defined(M68K)
for (i = 0; i < nargs; ++i)
if (upeek(tcp, (i < 5 ? i : i + 2)*4, &tcp->u_arg[i]) < 0)
return -1;
#else /* Other architecture (32bits specific) */
for (i = 0; i < nargs; ++i)
if (upeek(tcp, i*4, &tcp->u_arg[i]) < 0)
return -1;
#endif
return 1;
}
/* Returns:
* 1: ok, continue in trace_syscall_exiting().
* -1: error, trace_syscall_exiting() should print error indicator
* ("????" etc) and bail out.
*/
static int
get_syscall_result(struct tcb *tcp)
{
#if defined(S390) || defined(S390X)
if (upeek(tcp, PT_GPR2, &gpr2) < 0)
return -1;
#elif defined(POWERPC)
# define SO_MASK 0x10000000
{
long flags;
if (upeek(tcp, sizeof(unsigned long)*PT_CCR, &flags) < 0)
return -1;
if (upeek(tcp, sizeof(unsigned long)*PT_R3, &ppc_result) < 0)
return -1;
if (flags & SO_MASK)
ppc_result = -ppc_result;
}
#elif defined(AVR32)
/* already done by get_regs */
#elif defined(BFIN)
if (upeek(tcp, PT_R0, &bfin_r0) < 0)
return -1;
#elif defined(I386)
/* already done by get_regs */
#elif defined(X86_64) || defined(X32)
/* already done by get_regs */
#elif defined(IA64)
# define IA64_PSR_IS ((long)1 << 34)
long psr;
if (upeek(tcp, PT_CR_IPSR, &psr) >= 0)
ia32 = (psr & IA64_PSR_IS) != 0;
if (upeek(tcp, PT_R8, &ia64_r8) < 0)
return -1;
if (upeek(tcp, PT_R10, &ia64_r10) < 0)
return -1;
#elif defined(ARM)
/* already done by get_regs */
#elif defined(AARCH64)
/* register reading already done by get_regs */
/* Used to do this, but we did it on syscall entry already: */
/* We are in 64-bit mode (personality 1) if register struct is aarch64_regs,
* else it's personality 0.
*/
/*update_personality(tcp, aarch64_io.iov_len == sizeof(aarch64_regs));*/
#elif defined(M68K)
if (upeek(tcp, 4*PT_D0, &m68k_d0) < 0)
return -1;
#elif defined(LINUX_MIPSN32)
unsigned long long regs[38];
if (ptrace(PTRACE_GETREGS, tcp->pid, NULL, (long) ®s) < 0)
return -1;
mips_a3 = regs[REG_A3];
mips_r2 = regs[REG_V0];
#elif defined(MIPS)
if (upeek(tcp, REG_A3, &mips_a3) < 0)
return -1;
if (upeek(tcp, REG_V0, &mips_r2) < 0)
return -1;
#elif defined(ALPHA)
if (upeek(tcp, REG_A3, &alpha_a3) < 0)
return -1;
if (upeek(tcp, REG_R0, &alpha_r0) < 0)
return -1;
#elif defined(SPARC) || defined(SPARC64)
/* already done by get_regs */
#elif defined(HPPA)
if (upeek(tcp, PT_GR28, &hppa_r28) < 0)
return -1;
#elif defined(SH)
/* new syscall ABI returns result in R0 */
if (upeek(tcp, 4*REG_REG0, (long *)&sh_r0) < 0)
return -1;
#elif defined(SH64)
/* ABI defines result returned in r9 */
if (upeek(tcp, REG_GENERAL(9), (long *)&sh64_r9) < 0)
return -1;
#elif defined(CRISV10) || defined(CRISV32)
if (upeek(tcp, 4*PT_R10, &cris_r10) < 0)
return -1;
#elif defined(TILE)
/* already done by get_regs */
#elif defined(MICROBLAZE)
if (upeek(tcp, 3 * 4, µblaze_r3) < 0)
return -1;
#elif defined(OR1K)
/* already done by get_regs */
#endif
return 1;
}
/* Called at each syscall exit */
static void
syscall_fixup_on_sysexit(struct tcb *tcp)
{
#if defined(S390) || defined(S390X)
if (syscall_mode != -ENOSYS)
syscall_mode = tcp->scno;
if ((tcp->flags & TCB_WAITEXECVE)
&& (gpr2 == -ENOSYS || gpr2 == tcp->scno)) {
/*
* Return from execve.
* Fake a return value of zero. We leave the TCB_WAITEXECVE
* flag set for the post-execve SIGTRAP to see and reset.
*/
gpr2 = 0;
}
#endif
}
/* Returns:
* 1: ok, continue in trace_syscall_exiting().
* -1: error, trace_syscall_exiting() should print error indicator
* ("????" etc) and bail out.
*/
static int
get_error(struct tcb *tcp)
{
int u_error = 0;
int check_errno = 1;
if (SCNO_IN_RANGE(tcp->scno) &&
sysent[tcp->scno].sys_flags & SYSCALL_NEVER_FAILS) {
check_errno = 0;
}
#if defined(S390) || defined(S390X)
if (check_errno && is_negated_errno(gpr2)) {
tcp->u_rval = -1;
u_error = -gpr2;
}
else {
tcp->u_rval = gpr2;
}
#elif defined(I386)
if (check_errno && is_negated_errno(i386_regs.eax)) {
tcp->u_rval = -1;
u_error = -i386_regs.eax;
}
else {
tcp->u_rval = i386_regs.eax;
}
#elif defined(X86_64)
long rax;
if (x86_io.iov_len == sizeof(i386_regs)) {
/* Sign extend from 32 bits */
rax = (int32_t)i386_regs.eax;
} else {
rax = x86_64_regs.rax;
}
if (check_errno && is_negated_errno(rax)) {
tcp->u_rval = -1;
u_error = -rax;
}
else {
tcp->u_rval = rax;
}
#elif defined(X32)
/* In X32, return value is 64-bit (llseek uses one).
* Using merely "long rax" would not work.
*/
long long rax;
if (x86_io.iov_len == sizeof(i386_regs)) {
/* Sign extend from 32 bits */
rax = (int32_t)i386_regs.eax;
} else {
rax = x86_64_regs.rax;
}
/* Careful: is_negated_errno() works only on longs */
if (check_errno && is_negated_errno_x32(rax)) {
tcp->u_rval = -1;
u_error = -rax;
}
else {
tcp->u_rval = rax; /* truncating */
tcp->u_lrval = rax;
}
#elif defined(IA64)
if (ia32) {
int err;
err = (int)ia64_r8;
if (check_errno && is_negated_errno(err)) {
tcp->u_rval = -1;
u_error = -err;
}
else {
tcp->u_rval = err;
}
} else {
if (check_errno && ia64_r10) {
tcp->u_rval = -1;
u_error = ia64_r8;
} else {
tcp->u_rval = ia64_r8;
}
}
#elif defined(MIPS)
if (check_errno && mips_a3) {
tcp->u_rval = -1;
u_error = mips_r2;
} else {
tcp->u_rval = mips_r2;
# if defined(LINUX_MIPSN32)
tcp->u_lrval = mips_r2;
# endif
}
#elif defined(POWERPC)
if (check_errno && is_negated_errno(ppc_result)) {
tcp->u_rval = -1;
u_error = -ppc_result;
}
else {
tcp->u_rval = ppc_result;
}
#elif defined(M68K)
if (check_errno && is_negated_errno(m68k_d0)) {
tcp->u_rval = -1;
u_error = -m68k_d0;
}
else {
tcp->u_rval = m68k_d0;
}
#elif defined(ARM) || defined(AARCH64)
# if defined(AARCH64)
if (tcp->currpers == 1) {
if (check_errno && is_negated_errno(aarch64_regs.regs[0])) {
tcp->u_rval = -1;
u_error = -aarch64_regs.regs[0];
}
else {
tcp->u_rval = aarch64_regs.regs[0];
}
}
else
# endif
{
if (check_errno && is_negated_errno(arm_regs.ARM_r0)) {
tcp->u_rval = -1;
u_error = -arm_regs.ARM_r0;
}
else {
tcp->u_rval = arm_regs.ARM_r0;
}
}
#elif defined(AVR32)
if (check_errno && regs.r12 && (unsigned) -regs.r12 < nerrnos) {
tcp->u_rval = -1;
u_error = -regs.r12;
}
else {
tcp->u_rval = regs.r12;
}
#elif defined(BFIN)
if (check_errno && is_negated_errno(bfin_r0)) {
tcp->u_rval = -1;
u_error = -bfin_r0;
} else {
tcp->u_rval = bfin_r0;
}
#elif defined(ALPHA)
if (check_errno && alpha_a3) {
tcp->u_rval = -1;
u_error = alpha_r0;
}
else {
tcp->u_rval = alpha_r0;
}
#elif defined(SPARC)
if (check_errno && regs.psr & PSR_C) {
tcp->u_rval = -1;
u_error = regs.u_regs[U_REG_O0];
}
else {
tcp->u_rval = regs.u_regs[U_REG_O0];
}
#elif defined(SPARC64)
if (check_errno && regs.tstate & 0x1100000000UL) {
tcp->u_rval = -1;
u_error = regs.u_regs[U_REG_O0];
}
else {
tcp->u_rval = regs.u_regs[U_REG_O0];
}
#elif defined(HPPA)
if (check_errno && is_negated_errno(hppa_r28)) {
tcp->u_rval = -1;
u_error = -hppa_r28;
}
else {
tcp->u_rval = hppa_r28;
}
#elif defined(SH)
if (check_errno && is_negated_errno(sh_r0)) {
tcp->u_rval = -1;
u_error = -sh_r0;
}
else {
tcp->u_rval = sh_r0;
}
#elif defined(SH64)
if (check_errno && is_negated_errno(sh64_r9)) {
tcp->u_rval = -1;
u_error = -sh64_r9;
}
else {
tcp->u_rval = sh64_r9;
}
#elif defined(CRISV10) || defined(CRISV32)
if (check_errno && cris_r10 && (unsigned) -cris_r10 < nerrnos) {
tcp->u_rval = -1;
u_error = -cris_r10;
}
else {
tcp->u_rval = cris_r10;
}
#elif defined(TILE)
/*
* The standard tile calling convention returns the value (or negative
* errno) in r0, and zero (or positive errno) in r1.
* Until at least kernel 3.8, however, the r1 value is not reflected
* in ptregs at this point, so we use r0 here.
*/
if (check_errno && is_negated_errno(tile_regs.regs[0])) {
tcp->u_rval = -1;
u_error = -tile_regs.regs[0];
} else {
tcp->u_rval = tile_regs.regs[0];
}
#elif defined(MICROBLAZE)
if (check_errno && is_negated_errno(microblaze_r3)) {
tcp->u_rval = -1;
u_error = -microblaze_r3;
}
else {
tcp->u_rval = microblaze_r3;
}
#elif defined(OR1K)
if (check_errno && is_negated_errno(or1k_regs.gpr[11])) {
tcp->u_rval = -1;
u_error = -or1k_regs.gpr[11];
}
else {
tcp->u_rval = or1k_regs.gpr[11];
}
#endif
tcp->u_error = u_error;
return 1;
}
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2013-02-15 15:42 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-02-14 13:17 [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80 Denys Vlasenko
2013-02-14 15:00 ` Oleg Nesterov
2013-02-14 16:26 ` Denys Vlasenko
2013-02-14 18:05 ` H. Peter Anvin
2013-02-14 19:18 ` Oleg Nesterov
2013-02-14 19:21 ` H. Peter Anvin
2013-02-14 20:55 ` Cyrill Gorcunov
2013-02-15 14:50 ` Denys Vlasenko
2013-02-15 14:56 ` Cyrill Gorcunov
2013-02-15 15:09 ` Oleg Nesterov
2013-02-15 15:16 ` Cyrill Gorcunov
2013-02-15 15:42 ` Denys Vlasenko
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox