* [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80
@ 2013-02-14 13:17 Denys Vlasenko
2013-02-14 15:00 ` Oleg Nesterov
0 siblings, 1 reply; 12+ messages in thread
From: Denys Vlasenko @ 2013-02-14 13:17 UTC (permalink / raw)
To: linux-kernel, Oleg Nesterov
Determining the personality of a ptraced process is a murky area.
On x86, for years strace was looking at segment selectors,
which is conceptually wrong: see, for example,
https://lkml.org/lkml/2012/1/18/320

strace recently changed its detection method, and the current git code
(not released yet) does the following: it reads registers
with PTRACE_GETREGSET and looks at the returned regset size.
The size differs between 64-bit and 32-bit processes,
and appears to be a reliable way to determine personality:
there is no need to check segment selectors for magic values.

This works for well-behaving processes.
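
The size check described above can be sketched as follows. This is a minimal sketch, not strace's code: it assumes the x86 layouts of struct user_regs_struct (17 four-byte slots for i386, 27 eight-byte slots for x86-64), and the enum, macro and function names are hypothetical. In a real tracer the two inputs would come from ptrace(PTRACE_GETREGSET, pid, NT_PRSTATUS, &iov); the x32 test mirrors the __X32_SYSCALL_BIT check that strace applies to orig_rax.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical personality codes, following strace's currpers
 * numbering on x86_64 (0 = x86-64, 1 = i386, 2 = x32). */
enum personality {
	PERS_UNKNOWN = -1,
	PERS_X86_64 = 0,
	PERS_I386 = 1,
	PERS_X32 = 2
};

/* Assumed NT_PRSTATUS regset sizes on x86. */
#define I386_REGSET_SIZE   (17 * 4)	/* 68 bytes */
#define X86_64_REGSET_SIZE (27 * 8)	/* 216 bytes */

/* Value of __X32_SYSCALL_BIT in the kernel headers. */
#define X32_SYSCALL_BIT 0x40000000UL

/* Classify a syscall-entry stop from the iov_len that GETREGSET
 * reported and, for the 64-bit layout, the orig_rax it returned. */
static enum personality classify_stop(size_t iov_len, uint64_t orig_rax)
{
	if (iov_len == I386_REGSET_SIZE)
		return PERS_I386;
	if (iov_len == X86_64_REGSET_SIZE)
		return (orig_rax & X32_SYSCALL_BIT) ? PERS_X32 : PERS_X86_64;
	return PERS_UNKNOWN;	/* unexpected size: do not guess */
}
```

Without the patch, an int $0x80 call made by a 64-bit process still yields the 216-byte layout, so this check alone classifies it as x86-64.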

But the hole described in the aforementioned lkml thread
still remains: 64-bit processes can perform 32-bit syscalls
using the "int 0x80" entry method, and in this case the kernel
returns a 64-bit regset. For example, this:

asm("int $0x80": :"a" (29)); /* 32-bit sys_pause */
will be decoded by strace as a (64-bit) shmget syscall.
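
The misdecoding comes from looking the number up in the wrong syscall table: the same number means different things under each personality. Below is a minimal sketch with heavily truncated tables (only a few entries are shown; the enum and function names are hypothetical).

```c
#include <stddef.h>

/* Hypothetical personality tags for this example. */
enum personality { PERS_X86_64, PERS_I386 };

/* Tiny excerpts of the two syscall tables; the real tables have
 * hundreds of entries. Number 29 is pause on i386 but shmget on
 * x86-64, and getpid is 20 on i386 but 39 on x86-64. */
static const char *syscall_name(long scno, enum personality pers)
{
	switch (pers) {
	case PERS_I386:
		if (scno == 29) return "pause";
		if (scno == 20) return "getpid";
		break;
	case PERS_X86_64:
		if (scno == 29) return "shmget";
		if (scno == 39) return "getpid";
		break;
	}
	return NULL;	/* not in this excerpt */
}
```

Given a 64-bit regset, a tracer looks up 29 in the x86-64 table and prints shmget, although the kernel dispatches an int $0x80 call through the i386 table, where 29 is pause.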

This patch makes it so that, in a syscall-entry-stop caused by
the "int 0x80" instruction, PTRACE_GETREGSET returns the 32-bit regset.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
--- linux-3.7.7/arch/x86/include/asm/uaccess.h
+++ linux-3.7.7_regset/arch/x86/include/asm/uaccess.h
@@ -594,5 +594,7 @@
# include <asm/uaccess_64.h>
#endif
+#define ARCH_HAS_SYSCALL_USER_REGSET_VIEW 1
+
#endif /* _ASM_X86_UACCESS_H */
--- linux-3.7.7/arch/x86/kernel/ptrace.c
+++ linux-3.7.7_regset/arch/x86/kernel/ptrace.c
@@ -1443,6 +1443,22 @@
#endif
}
+const struct user_regset_view *syscall_user_regset_view(struct task_struct *task)
+{
+#ifdef CONFIG_IA32_EMULATION
+ /* Did task make 32-bit syscall just now?
+ * Task can still be 64-bit: think "int 0x80 on x86_64".
+ */
+ if (task_thread_info(task)->status & TS_COMPAT)
+#endif
+#if defined CONFIG_X86_32 || defined CONFIG_IA32_EMULATION
+ return &user_x86_32_view;
+#endif
+#ifdef CONFIG_X86_64
+ return &user_x86_64_view;
+#endif
+}
+
static void fill_sigtrap_info(struct task_struct *tsk,
struct pt_regs *regs,
int error_code, int si_code,
--- linux-3.7.7/include/linux/regset.h
+++ linux-3.7.7_regset/include/linux/regset.h
@@ -204,6 +204,12 @@
*/
const struct user_regset_view *task_user_regset_view(struct task_struct *tsk);
+#ifdef ARCH_HAS_SYSCALL_USER_REGSET_VIEW
+const struct user_regset_view *syscall_user_regset_view(struct task_struct *tsk);
+#else
+# define syscall_user_regset_view task_user_regset_view
+#endif
+
/*
* These are helpers for writing regset get/set functions in arch code.
--- linux-3.7.7/kernel/ptrace.c
+++ linux-3.7.7_regset/kernel/ptrace.c
@@ -684,7 +684,7 @@
static int ptrace_regset(struct task_struct *task, int req, unsigned int type,
struct iovec *kiov)
{
- const struct user_regset_view *view = task_user_regset_view(task);
+ const struct user_regset_view *view = syscall_user_regset_view(task);
const struct user_regset *regset = find_regset(view, type);
int regset_no;

* Re: [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80
From: Oleg Nesterov @ 2013-02-14 15:00 UTC
To: Denys Vlasenko; +Cc: linux-kernel, Andi Kleen, H. Peter Anvin

On 02/14, Denys Vlasenko wrote:
>
> Determining personality of a ptraced process is a murky area.
> On x86, for years strace was looking at segment selectors,
> which is conceptually wrong: see, for example,
> https://lkml.org/lkml/2012/1/18/320
>
> strace recently changed detection method and current git code
> (not released yet) does the following: it reads registers
> with PTRACE_GETREGSET, and looks at returned regset size.
> It is different for 64-bit and 32-bit processes,
> and appears to be a reliable way to determine personality:
> No need to check segment selectors for magic values.
>
> This works for well-behaving processes.
>
> But the hole described in the aforementioned lkml thread
> still remains: 64-bit processes can perform 32-bit syscalls
> using "int 80" entry method, and in this case, kernel returns
> 64-bit regset. For example, this:
>
> asm("int $0x80": :"a" (29)); /* 32-bit sys_pause */
>
> will be decoded by strace as a (64-bit) shmget syscall.
>
> This patch makes it so that in syscall-entry-stop caused by
> "int 80" instruction, PTRACE_GETREGSET returns 32-bit regset.

Not sure...

First of all, this is incompatible change. And to me, it doesn't look
correct anyway. Say, why the debugger can't modify r15 if a 64bit tracee
does int80 ? Or think about PTRACE_EVENT_FORK which can be reported with
TS_COMPAT set.

Probably is_ia32_task() should be reported "explicitly" as we already
discussed, and afaik you have other ideas.

Oleg.

* Re: [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80
From: Denys Vlasenko @ 2013-02-14 16:26 UTC
To: Oleg Nesterov; +Cc: linux-kernel, Andi Kleen, H. Peter Anvin

On 02/14/2013 04:00 PM, Oleg Nesterov wrote:
> On 02/14, Denys Vlasenko wrote:
>> This patch makes it so that in syscall-entry-stop caused by
>> "int 80" instruction, PTRACE_GETREGSET returns 32-bit regset.
>
> Not sure...
>
> First of all, this is incompatible change. And to me, it doesn't look
> correct anyway. Say, why the debugger can't modify r15 if a 64bit tracee
> does int80?

On x86_64, PTRACE_GET/SETREGS can be used for this:
they always operate on 64-bit registers.

> Or think about PTRACE_EVENT_FORK which can be reported with
> TS_COMPAT set.

I don't see a problem. Yes, PTRACE_GETREGSET will return
32-bit regset in this ptrace-stop, which is a problem... why?

> Probably is_ia32_task() should be reported "explicitly" as we already
> discussed, and afaik you have other ideas.

Yes, there are a few ideas.

Say, a new ptrace op can be introduced to return a vector of longs.
To make it easily parsable, how about (type,len,data...) records?
This also may allow the tracer to indicate which records it wants:
for example, not everyone wants to read syscall params.

Maybe something like

ptrace(PTRACE_GETSYSCALL, pid, list_of_elements_I_want, &iov)

where list_of_elements_I_want is long[], 0-terminated,
iov points to a buffer, and on return iov.iov_len is updated
(as GETREGSET does).

What problems can be solved here?

(1) syscall entry/exit discrimination etc. Say, a record
can contain bit flags, such as "it's a syscall entry",
"it's a syscall exit", "it's a group-stop" etc.
Currently, it is impossible for the tracer to distinguish syscall entry
from syscall exit.

(2) a record can supply arch-specific data, such as the x86-specific
problem I tried to address: "was it an int 0x80 syscall?".
A variable-length record format makes it easy to adapt to different
archs' needs. Alternatively, we can set aside a few bits in the
"bit flags record" as arch-dependent bits. Most arches need just
a few bits.

(3) on syscall entry, a record can contain (up to) 7 words:
syscall_no and 0-6 params, making the tracer's code less architecture
dependent. Today in strace, *every* architecture needs to have
arch-dependent regs-to-params conversion code. I would like to be able
to code it in C with the same code for most arches.

(4) We can read structs/data pointed to by syscall params, such as
struct stat returned by fstat, without needing an additional
round-trip to the kernel, *and* with kernel-supplied information on
the structure's size. Currently, strace has to know the size correctly.
There were, and will be, bugs in strace where we mishandle structures
because we mis-detect the process' bitness and use the wrong struct stat
layout. If the kernel could tell us: "I returned a 78-byte structure
in memory pointed to by arg1", it would help a lot. Even if it didn't
return the result structure itself (I imagine it's a lot of work in
the kernel to access it in another process' vm), knowing its size
would still be a big help.

(5) We can read several regsets: GPRs, SSE regs, etc.
Maybe someone would find it useful? (strace doesn't need this).

How does this look? I propose to start small, by implementing
just 1; 2 in the form of arch bits as part of 1; and 3.
It will satisfy the needs I tried to address in my patch.

--
vda
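
To make the (type,len,data...) idea concrete, here is a sketch of how a tracer might walk such a buffer. Everything in it is hypothetical: PTRACE_GETSYSCALL was only proposed, and the record types, flag bits and layout (a type word, a length counted in words, then the data words) are assumptions chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical record types; none of these exist in the kernel. */
enum rec_type { REC_END = 0, REC_FLAGS = 1, REC_SYSCALL = 2 };

/* Hypothetical bits carried in a REC_FLAGS record. */
#define FLAG_SYSCALL_ENTRY 0x1L
#define FLAG_SYSCALL_EXIT  0x2L
#define FLAG_ARCH_INT80    0x100L	/* example arch-specific bit */

/* Walk a buffer of (type, len, data[len]) records, where nwords is
 * iov_len / sizeof(long). Returns a pointer to the data words of the
 * first record of the requested type and stores its length in *len,
 * or NULL if the record is absent or the buffer is malformed. */
static const long *find_record(const long *buf, size_t nwords,
			       long type, size_t *len)
{
	size_t i = 0;

	while (i + 2 <= nwords) {
		long t = buf[i];
		long n = buf[i + 1];

		if (t == REC_END)
			break;
		/* Reject negative lengths and records running past the end. */
		if (n < 0 || (size_t)n > nwords - i - 2)
			return NULL;
		if (t == type) {
			*len = (size_t)n;
			return &buf[i + 2];
		}
		i += 2 + (size_t)n;	/* skip header and data words */
	}
	return NULL;
}
```

Checking each length before skipping it keeps a malformed or truncated buffer from walking the tracer past the end of the returned data.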

* Re: [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80
From: H. Peter Anvin @ 2013-02-14 18:05 UTC
To: Oleg Nesterov; +Cc: Denys Vlasenko, linux-kernel, Andi Kleen

On 02/14/2013 07:00 AM, Oleg Nesterov wrote:
> On 02/14, Denys Vlasenko wrote:
>>
>> Determining personality of a ptraced process is a murky area.
>> On x86, for years strace was looking at segment selectors,
>> which is conceptually wrong: see, for example,
>> https://lkml.org/lkml/2012/1/18/320
>>

One proposal that keeps being on the table is to export a regset with
metadata, including process mode at launch (i386, x86-64, x32).

	-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

* Re: [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80
From: Oleg Nesterov @ 2013-02-14 19:18 UTC
To: H. Peter Anvin; +Cc: Denys Vlasenko, linux-kernel, Andi Kleen

On 02/14, H. Peter Anvin wrote:
>
> On 02/14/2013 07:00 AM, Oleg Nesterov wrote:
>> On 02/14, Denys Vlasenko wrote:
>>>
>>> Determining personality of a ptraced process is a murky area.
>>> On x86, for years strace was looking at segment selectors,
>>> which is conceptually wrong: see, for example,
>>> https://lkml.org/lkml/2012/1/18/320
>>>
>
> One proposal that keeps being on the table is to export a regset with
> metadata, including process mode at launch (i386, x86-64, x32).

Yes... but if this metadata includes TS_COMPAT-is-set, then strace should
do PTRACE_GETREGSET(REGSET_META) + PTRACE_GETREGSET(REGSET_GENERAL) every
time. Or REGSET_META should include META+GENERAL.

IOW, it is not clear to me what this "meta" should actually report.

Oleg.

* Re: [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80
From: H. Peter Anvin @ 2013-02-14 19:21 UTC
To: Oleg Nesterov; +Cc: Denys Vlasenko, linux-kernel, Andi Kleen

On 02/14/2013 11:18 AM, Oleg Nesterov wrote:
> On 02/14, H. Peter Anvin wrote:
>>
>> On 02/14/2013 07:00 AM, Oleg Nesterov wrote:
>>> On 02/14, Denys Vlasenko wrote:
>>>>
>>>> Determining personality of a ptraced process is a murky area.
>>>> On x86, for years strace was looking at segment selectors,
>>>> which is conceptually wrong: see, for example,
>>>> https://lkml.org/lkml/2012/1/18/320
>>>>
>>
>> One proposal that keeps being on the table is to export a regset with
>> metadata, including process mode at launch (i386, x86-64, x32).
>
> Yes... but if this metadata includes TS_COMPAT-is-set, then strace should
> do PTRACE_GETREGSET(REGSET_META) + PTRACE_GETREGSET(REGSET_GENERAL) every
> time. Or REGSET_META should include META+GENERAL.
>
> IOW, it is not clear to me what this "meta" should actually report.
>

That is one of the things that needs to be nailed down.  In particular,
what are the things people need.

	-hpa

* Re: [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80
From: Cyrill Gorcunov @ 2013-02-14 20:55 UTC
To: H. Peter Anvin
Cc: Oleg Nesterov, Denys Vlasenko, linux-kernel, Andi Kleen, Pavel Emelyanov

On Thu, Feb 14, 2013 at 11:21:12AM -0800, H. Peter Anvin wrote:
> On 02/14/2013 11:18 AM, Oleg Nesterov wrote:
> > On 02/14, H. Peter Anvin wrote:
> >>
> >> On 02/14/2013 07:00 AM, Oleg Nesterov wrote:
> >>> On 02/14, Denys Vlasenko wrote:
> >>>>
> >>>> Determining personality of a ptraced process is a murky area.
> >>>> On x86, for years strace was looking at segment selectors,
> >>>> which is conceptually wrong: see, for example,
> >>>> https://lkml.org/lkml/2012/1/18/320
> >>>>
> >>
> >> One proposal that keeps being on the table is to export a regset with
> >> metadata, including process mode at launch (i386, x86-64, x32).
> >
> > Yes... but if this metadata includes TS_COMPAT-is-set, then strace should
> > do PTRACE_GETREGSET(REGSET_META) + PTRACE_GETREGSET(REGSET_GENERAL) every
> > time. Or REGSET_META should include META+GENERAL.
> >
> > IOW, it is not clear to me what this "meta" should actually report.
>
> That is one of the things that needs to be nailed down.  In particular,
> what are the things people need.

Indeed, having some "official" way for compat bit retrieval would be
a great thing for us (c/r camp) since we've just met the same problem.
And at the moment I've stuck with the same trick as gdb does (cs/ds test).

But, guys, if only I'm not missing something completely obvious,
can't we simply provide task-compat bit to userspace in say /proc/pid/stat
or something? Then the strace/gdb would be able to always know if
the tracee is actually in compat mode. Or I miss something fundamental
here?

* Re: [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80
From: Denys Vlasenko @ 2013-02-15 14:50 UTC
To: Cyrill Gorcunov
Cc: H. Peter Anvin, Oleg Nesterov, linux-kernel, Andi Kleen, Pavel Emelyanov

On 02/14/2013 09:55 PM, Cyrill Gorcunov wrote:
> On Thu, Feb 14, 2013 at 11:21:12AM -0800, H. Peter Anvin wrote:
>> On 02/14/2013 11:18 AM, Oleg Nesterov wrote:
>>> On 02/14, H. Peter Anvin wrote:
>>>>
>>>> On 02/14/2013 07:00 AM, Oleg Nesterov wrote:
>>>>> On 02/14, Denys Vlasenko wrote:
>>>>>>
>>>>>> Determining personality of a ptraced process is a murky area.
>>>>>> On x86, for years strace was looking at segment selectors,
>>>>>> which is conceptually wrong: see, for example,
>>>>>> https://lkml.org/lkml/2012/1/18/320
>>>>>>
>>>>
>>>> One proposal that keeps being on the table is to export a regset with
>>>> metadata, including process mode at launch (i386, x86-64, x32).
>>>
>>> Yes... but if this metadata includes TS_COMPAT-is-set, then strace should
>>> do PTRACE_GETREGSET(REGSET_META) + PTRACE_GETREGSET(REGSET_GENERAL) every
>>> time. Or REGSET_META should include META+GENERAL.
>>>
>>> IOW, it is not clear to me what this "meta" should actually report.
>>
>> That is one of the things that needs to be nailed down.  In particular,
>> what are the things people need.
>
> Indeed, having some "official" way for compat bit retrieval would be
> a great thing for us (c/r camp) since we've just met the same problem.
> And at the moment I've stuck with the same trick as gdb does (cs/ds test).
>
> But, guys, if only I'm not missing something completely obvious,
> can't we simply provide task-compat bit to userspace in say /proc/pid/stat
> or something? Then the strace/gdb would be able to always know if
> the tracee is actually in compat mode. Or I miss something fundamental
> here?

strace needs to get that data on every syscall entry in the traced process.
Doing open/read/close on every syscall entry is going to slow it down a lot.

--
vda

* Re: [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80
From: Cyrill Gorcunov @ 2013-02-15 14:56 UTC
To: Denys Vlasenko
Cc: H. Peter Anvin, Oleg Nesterov, linux-kernel, Andi Kleen, Pavel Emelyanov

On Fri, Feb 15, 2013 at 03:50:43PM +0100, Denys Vlasenko wrote:
> On 02/14/2013 09:55 PM, Cyrill Gorcunov wrote:
> > On Thu, Feb 14, 2013 at 11:21:12AM -0800, H. Peter Anvin wrote:
> >> On 02/14/2013 11:18 AM, Oleg Nesterov wrote:
> >>> On 02/14, H. Peter Anvin wrote:
> >>>>
> >>>> On 02/14/2013 07:00 AM, Oleg Nesterov wrote:
> >>>>> On 02/14, Denys Vlasenko wrote:
> >>>>>>
> >>>>>> Determining personality of a ptraced process is a murky area.
> >>>>>> On x86, for years strace was looking at segment selectors,
> >>>>>> which is conceptually wrong: see, for example,
> >>>>>> https://lkml.org/lkml/2012/1/18/320
> >>>>>>
> >>>>
> >>>> One proposal that keeps being on the table is to export a regset with
> >>>> metadata, including process mode at launch (i386, x86-64, x32).
> >>>
> >>> Yes... but if this metadata includes TS_COMPAT-is-set, then strace should
> >>> do PTRACE_GETREGSET(REGSET_META) + PTRACE_GETREGSET(REGSET_GENERAL) every
> >>> time. Or REGSET_META should include META+GENERAL.
> >>>
> >>> IOW, it is not clear to me what this "meta" should actually report.
> >>
> >> That is one of the things that needs to be nailed down.  In particular,
> >> what are the things people need.
> >
> > Indeed, having some "official" way for compat bit retrieval would be
> > a great thing for us (c/r camp) since we've just met the same problem.
> > And at the moment I've stuck with the same trick as gdb does (cs/ds test).
> >
> > But, guys, if only I'm not missing something completely obvious,
> > can't we simply provide task-compat bit to userspace in say /proc/pid/stat
> > or something? Then the strace/gdb would be able to always know if
> > the tracee is actually in compat mode. Or I miss something fundamental
> > here?
>
> strace needs to get that data on every syscall entry in the traced process.
> Doing open/read/close on every syscall entry is going to slow it down a lot.

Don't you need to read it only once when strace is attached? Compat flag can't
be arbitrarily dropped when program executes, no?

* Re: [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80
From: Oleg Nesterov @ 2013-02-15 15:09 UTC
To: Cyrill Gorcunov
Cc: Denys Vlasenko, H. Peter Anvin, linux-kernel, Andi Kleen, Pavel Emelyanov

On 02/15, Cyrill Gorcunov wrote:
>
> > strace needs to get that data on every syscall entry in the traced process.
> > Doing open/read/close on every syscall entry is going to slow it down a lot.
>
> Don't you need to read it only once when strace is attached? Compat flag can't
> be arbitrarily dropped when program executes, no?

TS_COMPAT is set/cleared every time a 64bit task does int80.

I guess you need TIF_IA32, not TS_COMPAT, for c/r.

Oleg.

* Re: [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80
From: Cyrill Gorcunov @ 2013-02-15 15:16 UTC
To: Oleg Nesterov
Cc: Denys Vlasenko, H. Peter Anvin, linux-kernel, Andi Kleen, Pavel Emelyanov

On Fri, Feb 15, 2013 at 04:09:40PM +0100, Oleg Nesterov wrote:
> On 02/15, Cyrill Gorcunov wrote:
> >
> > > strace needs to get that data on every syscall entry in the traced process.
> > > Doing open/read/close on every syscall entry is going to slow it down a lot.
> >
> > Don't you need to read it only once when strace is attached? Compat flag can't
> > be arbitrarily dropped when program executes, no?
>
> TS_COMPAT is set/cleared every time a 64bit task does int80.
>
> I guess you need TIF_IA32, not TS_COMPAT, for c/r.

Yeah, indeed. Still, if there were metadata in the register set,
that would be even better, I think.

* Re: [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80
From: Denys Vlasenko @ 2013-02-15 15:42 UTC
To: H. Peter Anvin; +Cc: Oleg Nesterov, linux-kernel, Andi Kleen, jan.kratochvil

On 02/14/2013 08:21 PM, H. Peter Anvin wrote:
> On 02/14/2013 11:18 AM, Oleg Nesterov wrote:
>> On 02/14, H. Peter Anvin wrote:
>>>
>>> On 02/14/2013 07:00 AM, Oleg Nesterov wrote:
>>>> On 02/14, Denys Vlasenko wrote:
>>>>>
>>>>> Determining personality of a ptraced process is a murky area.
>>>>> On x86, for years strace was looking at segment selectors,
>>>>> which is conceptually wrong: see, for example,
>>>>> https://lkml.org/lkml/2012/1/18/320
>>>>>
>>>
>>> One proposal that keeps being on the table is to export a regset with
>>> metadata, including process mode at launch (i386, x86-64, x32).
>>
>> Yes... but if this metadata includes TS_COMPAT-is-set, then strace should
>> do PTRACE_GETREGSET(REGSET_META) + PTRACE_GETREGSET(REGSET_GENERAL) every
>> time. Or REGSET_META should include META+GENERAL.
>>
>> IOW, it is not clear to me what this "meta" should actually report.
>
> That is one of the things that needs to be nailed down.  In particular,
> what are the things people need.

Let's see what strace needs, by examining its source for various arches.

Ow. Six instances of PTRACE_PEEKTEXT (i.e. attempts to read the tracee's
code, an inherently unsafe operation) in syscall.c, affected arches:

S390: for syscall# fetch, thankfully only needed before 2.5.44;
ARM: for syscall# fetch. Looks like only needed for non-EABI?
SPARC: for personality detection.

Examples of personality detection:

POWERPC64: by examining registers (MSR)
X86: by looking at GETREGSET size
IA64: by examining registers (CR_IPSR)
ARM: by checking syscall no (scno & 0x0f0000)
SPARC: by looking at trap instruction

Syscall entry versus exit detection (i.e. a sanity check):

ALPHA, MIPS: registers (if a3 is 0 or -1, it's exit)
S390: registers (messy code)
X86: registers (eax must be -ENOSYS on entry)

In general, it is not reliable: eax must be -ENOSYS on entry,
but it can be -ENOSYS on exit too. IOW: if we see eax == -ENOSYS,
we have no idea whether it's entry or exit.

Syscall parameters fetching. Some architectures need to use
nontrivial code. Look at this:

#elif defined(IA64)
	if (!ia32) {
		unsigned long *out0, cfm, sof, sol;
		long rbs_end;
		/* be backwards compatible with kernel < 2.4.4... */
# ifndef PT_RBS_END
#  define PT_RBS_END PT_AR_BSP
# endif

		if (upeek(tcp, PT_RBS_END, &rbs_end) < 0)
			return -1;
		if (upeek(tcp, PT_CFM, (long *) &cfm) < 0)
			return -1;

		sof = (cfm >> 0) & 0x7f;
		sol = (cfm >> 7) & 0x7f;
		out0 = ia64_rse_skip_regs((unsigned long *) rbs_end, -sof + sol);

		for (i = 0; i < nargs; ++i) {
			if (umoven(tcp, (unsigned long) ia64_rse_skip_regs(out0, i),
				   sizeof(long), (char *) &tcp->u_arg[i]) < 0)
				return -1;
		}

or this:

#elif defined(MIPS)
	if (nargs > 4) {
		long sp;

		if (upeek(tcp, REG_SP, &sp) < 0)
			return -1;
		for (i = 0; i < 4; ++i)
			if (upeek(tcp, REG_A0 + i, &tcp->u_arg[i]) < 0)
				return -1;
		umoven(tcp, sp + 16, (nargs - 4) * sizeof(tcp->u_arg[0]),
		       (char *)(tcp->u_arg + 4));
	} else {
		for (i = 0; i < nargs; ++i)
			if (upeek(tcp, REG_A0 + i, &tcp->u_arg[i]) < 0)
				return -1;
	}

Detecting error exits from syscalls. Most arches use the -errno
convention; others (IA64, SPARC, MIPS) have a dedicated register, or a bit
in a status register, to indicate an error.

Some syscalls "never fail" (e.g. getgid), and strace needs to know
which syscalls never fail.
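
On x86 the two heuristics discussed above can be sketched as follows: a return value in the range [-4095, -1] is an error (4095 is the kernel's MAX_ERRNO), and since a syscall-stop does not say whether it is an entry or an exit, the tracer toggles its own per-tracee flag and can use rax == -ENOSYS only as a sanity check at entry. This is a minimal sketch; the struct and function names are hypothetical, although strace keeps a similar flag (TCB_INSYSCALL, visible in the excerpt).

```c
#include <errno.h>
#include <stdbool.h>

/* The kernel never returns an errno value above 4095 (MAX_ERRNO),
 * so on x86 a return value in [-4095, -1] means failure. */
#define KERNEL_MAX_ERRNO 4095L

/* Decode an x86 syscall-exit value: the positive errno on failure,
 * or 0 on success. */
static long syscall_errno(long ret)
{
	return (ret < 0 && ret >= -KERNEL_MAX_ERRNO) ? -ret : 0;
}

/* Hypothetical per-tracee state: flipped on every syscall-stop,
 * because the stop itself does not say entry or exit. */
struct tracee {
	bool in_syscall;
};

/* Returns true if this stop is treated as a syscall entry. The
 * -ENOSYS test only flags suspicious entries: rax must be -ENOSYS at
 * entry, but an exit can legitimately carry -ENOSYS as well. */
static bool on_syscall_stop(struct tracee *t, long rax, bool *suspicious)
{
	bool entering = !t->in_syscall;

	*suspicious = entering && rax != -ENOSYS;
	t->in_syscall = !t->in_syscall;
	return entering;
}
```

The toggle goes out of step if the tracer ever misses or misclassifies a stop, which is what an explicit entry/exit flag from the kernel would avoid.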
If you want to take a look yourself, for your convenience I attached larger excerpts from strace's syscall.c source file. To summarize: Looks like this particular ptrace user would benefit from the following data: * is it a syscall entry, exit, or something else. * for syscall entry: - parameters width (32/64/etc) and personality data (if arch has personality data more fine-grained than "32/64 bits") - syscall no - parameters * for syscall exit: - parameters width (32/64/etc) and personality data - error indicator (errno)? - syscall result Does this look as a good format? -- vda static int get_scno(struct tcb *tcp) { long scno = 0; #if defined(S390) || defined(S390X) if (upeek(tcp, PT_GPR2, &syscall_mode) < 0) return -1; if (syscall_mode != -ENOSYS) { /* * Since kernel version 2.5.44 the scno gets passed in gpr2. */ scno = syscall_mode; } else { /* * Old style of "passing" the scno via the SVC instruction. */ long psw; long opcode, offset_reg, tmp; void *svc_addr; static const int gpr_offset[16] = { PT_GPR0, PT_GPR1, PT_ORIGGPR2, PT_GPR3, PT_GPR4, PT_GPR5, PT_GPR6, PT_GPR7, PT_GPR8, PT_GPR9, PT_GPR10, PT_GPR11, PT_GPR12, PT_GPR13, PT_GPR14, PT_GPR15 }; if (upeek(tcp, PT_PSWADDR, &psw) < 0) return -1; errno = 0; opcode = ptrace(PTRACE_PEEKTEXT, tcp->pid, (char *)(psw - sizeof(long)), 0); if (errno) { perror_msg("%s", "peektext(psw-oneword)"); return -1; } /* * We have to check if the SVC got executed directly or via an * EXECUTE instruction. In case of EXECUTE it is necessary to do * instruction decoding to derive the system call number. * Unfortunately the opcode sizes of EXECUTE and SVC are differently, * so that this doesn't work if a SVC opcode is part of an EXECUTE * opcode. Since there is no way to find out the opcode size this * is the best we can do... */ if ((opcode & 0xff00) == 0x0a00) { /* SVC opcode */ scno = opcode & 0xff; } else { /* SVC got executed by EXECUTE instruction */ /* * Do instruction decoding of EXECUTE. 
If you really want to * understand this, read the Principles of Operations. */ svc_addr = (void *) (opcode & 0xfff); tmp = 0; offset_reg = (opcode & 0x000f0000) >> 16; if (offset_reg && (upeek(tcp, gpr_offset[offset_reg], &tmp) < 0)) return -1; svc_addr += tmp; tmp = 0; offset_reg = (opcode & 0x0000f000) >> 12; if (offset_reg && (upeek(tcp, gpr_offset[offset_reg], &tmp) < 0)) return -1; svc_addr += tmp; scno = ptrace(PTRACE_PEEKTEXT, tcp->pid, svc_addr, 0); if (errno) return -1; # if defined(S390X) scno >>= 48; # else scno >>= 16; # endif tmp = 0; offset_reg = (opcode & 0x00f00000) >> 20; if (offset_reg && (upeek(tcp, gpr_offset[offset_reg], &tmp) < 0)) return -1; scno = (scno | tmp) & 0xff; } } #elif defined(POWERPC) if (upeek(tcp, sizeof(unsigned long)*PT_R0, &scno) < 0) return -1; # ifdef POWERPC64 /* TODO: speed up strace by not doing this at every syscall. * We only need to do it after execve. */ int currpers; long val; /* Check for 64/32 bit mode. */ if (upeek(tcp, sizeof(unsigned long)*PT_MSR, &val) < 0) return -1; /* SF is bit 0 of MSR */ if (val < 0) currpers = 0; else currpers = 1; update_personality(tcp, currpers); # endif #elif defined(X86_64) || defined(X32) int currpers; /* GETREGSET of NT_PRSTATUS tells us regset size, * which unambiguously detects i386. 
* * Linux kernel distinguishes x86-64 and x32 processes * solely by looking at __X32_SYSCALL_BIT: * arch/x86/include/asm/compat.h::is_x32_task(): * if (task_pt_regs(current)->orig_ax & __X32_SYSCALL_BIT) * return true; */ if (x86_io.iov_len == sizeof(i386_regs)) { scno = i386_regs.orig_eax; currpers = 1; } else { scno = x86_64_regs.orig_rax; currpers = 0; if (scno & __X32_SYSCALL_BIT) { scno -= __X32_SYSCALL_BIT; currpers = 2; } } update_personality(tcp, currpers); #elif defined(IA64) long psr; if (upeek(tcp, PT_CR_IPSR, &psr) >= 0) ia32 = (psr & IA64_PSR_IS) != 0; if (ia32) { if (upeek(tcp, PT_R1, &scno) < 0) return -1; } else { if (upeek(tcp, PT_R15, &scno) < 0) return -1; } #elif defined(AARCH64) switch (aarch64_io.iov_len) { case sizeof(aarch64_regs): /* We are in 64-bit mode */ scno = aarch64_regs.regs[8]; update_personality(tcp, 1); break; case sizeof(arm_regs): /* We are in 32-bit mode */ scno = arm_regs.ARM_r7; update_personality(tcp, 0); break; } #elif defined(ARM) /* * We only need to grab the syscall number on syscall entry. */ if (arm_regs.ARM_ip == 0) { /* * Note: we only deal with 32-bit CPUs here */ if (arm_regs.ARM_cpsr & 0x20) { /* * Get the Thumb-mode system call number */ scno = arm_regs.ARM_r7; } else { /* * Get the ARM-mode system call number */ errno = 0; scno = ptrace(PTRACE_PEEKTEXT, tcp->pid, (void *)(arm_regs.ARM_pc - 4), NULL); if (errno) return -1; /* Handle the EABI syscall convention. We do not bother converting structures between the two ABIs, but basic functionality should work even if strace and the traced program have different ABIs. 
*/ if (scno == 0xef000000) { scno = arm_regs.ARM_r7; } else { if ((scno & 0x0ff00000) != 0x0f900000) { fprintf(stderr, "syscall: unknown syscall trap 0x%08lx\n", scno); return -1; } /* * Fixup the syscall number */ scno &= 0x000fffff; } } if (scno & 0x0f0000) { /* * Handle ARM specific syscall */ update_personality(tcp, 1); scno &= 0x0000ffff; } else update_personality(tcp, 0); } else { fprintf(stderr, "pid %d stray syscall entry\n", tcp->pid); tcp->flags |= TCB_INSYSCALL; } #elif defined(LINUX_MIPSN32) unsigned long long regs[38]; if (ptrace(PTRACE_GETREGS, tcp->pid, NULL, (long) ®s) < 0) return -1; mips_a3 = regs[REG_A3]; mips_r2 = regs[REG_V0]; scno = mips_r2; if (!SCNO_IN_RANGE(scno)) { if (mips_a3 == 0 || mips_a3 == -1) { if (debug_flag) fprintf(stderr, "stray syscall exit: v0 = %ld\n", scno); return 0; } } #elif defined(MIPS) if (upeek(tcp, REG_A3, &mips_a3) < 0) return -1; if (upeek(tcp, REG_V0, &scno) < 0) return -1; if (!SCNO_IN_RANGE(scno)) { if (mips_a3 == 0 || mips_a3 == -1) { if (debug_flag) fprintf(stderr, "stray syscall exit: v0 = %ld\n", scno); return 0; } } #elif defined(ALPHA) if (upeek(tcp, REG_A3, &alpha_a3) < 0) return -1; if (upeek(tcp, REG_R0, &scno) < 0) return -1; /* * Do some sanity checks to figure out if it's * really a syscall entry */ if (!SCNO_IN_RANGE(scno)) { if (alpha_a3 == 0 || alpha_a3 == -1) { if (debug_flag) fprintf(stderr, "stray syscall exit: r0 = %ld\n", scno); return 0; } } #elif defined(SPARC) || defined(SPARC64) /* Disassemble the syscall trap. */ /* Retrieve the syscall trap instruction. */ unsigned long trap; errno = 0; # if defined(SPARC64) trap = ptrace(PTRACE_PEEKTEXT, tcp->pid, (char *)regs.tpc, 0); trap >>= 32; # else trap = ptrace(PTRACE_PEEKTEXT, tcp->pid, (char *)regs.pc, 0); # endif if (errno) return -1; /* Disassemble the trap to see what personality to use. */ switch (trap) { case 0x91d02010: /* Linux/SPARC syscall trap. */ update_personality(tcp, 0); break; case 0x91d0206d: /* Linux/SPARC64 syscall trap. 
		 */
		update_personality(tcp, 2);
		break;
	case 0x91d02000:
		/* SunOS syscall trap. (pers 1) */
		fprintf(stderr, "syscall: SunOS no support\n");
		return -1;
	case 0x91d02008:
		/* Solaris 2.x syscall trap. (per 2) */
		update_personality(tcp, 1);
		break;
	case 0x91d02009:
		/* NetBSD/FreeBSD syscall trap. */
		fprintf(stderr, "syscall: NetBSD/FreeBSD not supported\n");
		return -1;
	case 0x91d02027:
		/* Solaris 2.x gettimeofday */
		update_personality(tcp, 1);
		break;
	default:
# if defined(SPARC64)
		fprintf(stderr, "syscall: unknown syscall trap %08lx %016lx\n", trap, regs.tpc);
# else
		fprintf(stderr, "syscall: unknown syscall trap %08lx %08lx\n", trap, regs.pc);
# endif
		return -1;
	}

	/* Extract the system call number from the registers. */
	if (trap == 0x91d02027)
		scno = 156;
	else
		scno = regs.u_regs[U_REG_G1];
	if (scno == 0) {
		scno = regs.u_regs[U_REG_O0];
		memmove(&regs.u_regs[U_REG_O0], &regs.u_regs[U_REG_O1], 7*sizeof(regs.u_regs[0]));
	}
#elif defined(TILE)
	int currpers;
	scno = tile_regs.regs[10];
# ifdef __tilepro__
	currpers = 1;
# else
#  ifndef PT_FLAGS_COMPAT
#   define PT_FLAGS_COMPAT 0x10000  /* from Linux 3.8 on */
#  endif
	if (tile_regs.flags & PT_FLAGS_COMPAT)
		currpers = 1;
	else
		currpers = 0;
# endif
	update_personality(tcp, currpers);
#endif

	tcp->scno = scno;
	return 1;
}

/* Called at each syscall entry.
 * Returns:
 * 0: "ignore this ptrace stop", bail out of trace_syscall_entering() silently.
 * 1: ok, continue in trace_syscall_entering().
 * other: error, trace_syscall_entering() should print error indicator
 *    ("????" etc) and bail out.
 */
static int syscall_fixup_on_sysenter(struct tcb *tcp)
{
	/* A common case of "not a syscall entry" is post-execve SIGTRAP */
#if defined(I386)
	if (i386_regs.eax != -ENOSYS) {
		if (debug_flag)
			fprintf(stderr, "not a syscall entry (eax = %ld)\n", i386_regs.eax);
		return 0;
	}
#elif defined(X86_64) || defined(X32)
	{
		long rax;
		if (x86_io.iov_len == sizeof(i386_regs)) {
			/* Sign extend from 32 bits */
			rax = (int32_t)i386_regs.eax;
		} else {
			/* Note: in X32 build, this truncates 64 to 32 bits */
			rax = x86_64_regs.rax;
		}
		if (rax != -ENOSYS) {
			if (debug_flag)
				fprintf(stderr, "not a syscall entry (rax = %ld)\n", rax);
			return 0;
		}
	}
#elif defined(S390) || defined(S390X)
	/* TODO: we already fetched PT_GPR2 in get_scno
	 * and stored it in syscall_mode, reuse it here
	 * instead of re-fetching?
	 */
	if (upeek(tcp, PT_GPR2, &gpr2) < 0)
		return -1;
	if (syscall_mode != -ENOSYS)
		syscall_mode = tcp->scno;
	if (gpr2 != syscall_mode) {
		if (debug_flag)
			fprintf(stderr, "not a syscall entry (gpr2 = %ld)\n", gpr2);
		return 0;
	}
#elif defined(M68K)
	if (upeek(tcp, 4*PT_D0, &m68k_d0) < 0)
		return -1;
	if (m68k_d0 != -ENOSYS) {
		if (debug_flag)
			fprintf(stderr, "not a syscall entry (d0 = %ld)\n", m68k_d0);
		return 0;
	}
#elif defined(IA64)
	if (upeek(tcp, PT_R10, &ia64_r10) < 0)
		return -1;
	if (upeek(tcp, PT_R8, &ia64_r8) < 0)
		return -1;
	if (ia32 && ia64_r8 != -ENOSYS) {
		if (debug_flag)
			fprintf(stderr, "not a syscall entry (r8 = %ld)\n", ia64_r8);
		return 0;
	}
#elif defined(CRISV10) || defined(CRISV32)
	if (upeek(tcp, 4*PT_R10, &cris_r10) < 0)
		return -1;
	if (cris_r10 != -ENOSYS) {
		if (debug_flag)
			fprintf(stderr, "not a syscall entry (r10 = %ld)\n", cris_r10);
		return 0;
	}
#elif defined(MICROBLAZE)
	if (upeek(tcp, 3 * 4, &microblaze_r3) < 0)
		return -1;
	if (microblaze_r3 != -ENOSYS) {
		if (debug_flag)
			fprintf(stderr, "not a syscall entry (r3 = %ld)\n", microblaze_r3);
		return 0;
	}
#endif
	return 1;
}

/* Return -1 on error or 1 on success (never 0!)
 */
static int get_syscall_args(struct tcb *tcp)
{
	int i, nargs;

	if (SCNO_IN_RANGE(tcp->scno))
		nargs = tcp->u_nargs = sysent[tcp->scno].nargs;
	else
		nargs = tcp->u_nargs = MAX_ARGS;

#if defined(S390) || defined(S390X)
	for (i = 0; i < nargs; ++i)
		if (upeek(tcp, i==0 ? PT_ORIGGPR2 : PT_GPR2 + i*sizeof(long), &tcp->u_arg[i]) < 0)
			return -1;
#elif defined(ALPHA)
	for (i = 0; i < nargs; ++i)
		if (upeek(tcp, REG_A0+i, &tcp->u_arg[i]) < 0)
			return -1;
#elif defined(IA64)
	if (!ia32) {
		unsigned long *out0, cfm, sof, sol;
		long rbs_end;
		/* be backwards compatible with kernel < 2.4.4... */
# ifndef PT_RBS_END
#  define PT_RBS_END PT_AR_BSP
# endif

		if (upeek(tcp, PT_RBS_END, &rbs_end) < 0)
			return -1;
		if (upeek(tcp, PT_CFM, (long *) &cfm) < 0)
			return -1;

		sof = (cfm >> 0) & 0x7f;
		sol = (cfm >> 7) & 0x7f;
		out0 = ia64_rse_skip_regs((unsigned long *) rbs_end, -sof + sol);

		for (i = 0; i < nargs; ++i) {
			if (umoven(tcp, (unsigned long) ia64_rse_skip_regs(out0, i),
				   sizeof(long), (char *) &tcp->u_arg[i]) < 0)
				return -1;
		}
	} else {
		static const int argreg[MAX_ARGS] = { PT_R11 /* EBX = out0 */,
						      PT_R9  /* ECX = out1 */,
						      PT_R10 /* EDX = out2 */,
						      PT_R14 /* ESI = out3 */,
						      PT_R15 /* EDI = out4 */,
						      PT_R13 /* EBP = out5 */};

		for (i = 0; i < nargs; ++i) {
			if (upeek(tcp, argreg[i], &tcp->u_arg[i]) < 0)
				return -1;
			/* truncate away IVE sign-extension */
			tcp->u_arg[i] &= 0xffffffff;
		}
	}
#elif defined(MIPS)
	if (nargs > 4) {
		long sp;

		if (upeek(tcp, REG_SP, &sp) < 0)
			return -1;
		for (i = 0; i < 4; ++i)
			if (upeek(tcp, REG_A0 + i, &tcp->u_arg[i]) < 0)
				return -1;
		umoven(tcp, sp + 16, (nargs - 4) * sizeof(tcp->u_arg[0]),
		       (char *)(tcp->u_arg + 4));
	} else {
		for (i = 0; i < nargs; ++i)
			if (upeek(tcp, REG_A0 + i, &tcp->u_arg[i]) < 0)
				return -1;
	}
#elif defined(M68K)
	for (i = 0; i < nargs; ++i)
		if (upeek(tcp, (i < 5 ?
				i : i + 2)*4, &tcp->u_arg[i]) < 0)
			return -1;
#else /* Other architecture (32bits specific) */
	for (i = 0; i < nargs; ++i)
		if (upeek(tcp, i*4, &tcp->u_arg[i]) < 0)
			return -1;
#endif
	return 1;
}

/* Returns:
 * 1: ok, continue in trace_syscall_exiting().
 * -1: error, trace_syscall_exiting() should print error indicator
 *    ("????" etc) and bail out.
 */
static int get_syscall_result(struct tcb *tcp)
{
#if defined(S390) || defined(S390X)
	if (upeek(tcp, PT_GPR2, &gpr2) < 0)
		return -1;
#elif defined(POWERPC)
# define SO_MASK 0x10000000
	{
		long flags;
		if (upeek(tcp, sizeof(unsigned long)*PT_CCR, &flags) < 0)
			return -1;
		if (upeek(tcp, sizeof(unsigned long)*PT_R3, &ppc_result) < 0)
			return -1;
		if (flags & SO_MASK)
			ppc_result = -ppc_result;
	}
#elif defined(AVR32)
	/* already done by get_regs */
#elif defined(BFIN)
	if (upeek(tcp, PT_R0, &bfin_r0) < 0)
		return -1;
#elif defined(I386)
	/* already done by get_regs */
#elif defined(X86_64) || defined(X32)
	/* already done by get_regs */
#elif defined(IA64)
# define IA64_PSR_IS ((long)1 << 34)
	long psr;
	if (upeek(tcp, PT_CR_IPSR, &psr) >= 0)
		ia32 = (psr & IA64_PSR_IS) != 0;
	if (upeek(tcp, PT_R8, &ia64_r8) < 0)
		return -1;
	if (upeek(tcp, PT_R10, &ia64_r10) < 0)
		return -1;
#elif defined(ARM)
	/* already done by get_regs */
#elif defined(AARCH64)
	/* register reading already done by get_regs */

	/* Used to do this, but we did it on syscall entry already: */
	/* We are in 64-bit mode (personality 1) if register struct is aarch64_regs,
	 * else it's personality 0.
	 */
	/*update_personality(tcp, aarch64_io.iov_len == sizeof(aarch64_regs));*/
#elif defined(M68K)
	if (upeek(tcp, 4*PT_D0, &m68k_d0) < 0)
		return -1;
#elif defined(LINUX_MIPSN32)
	unsigned long long regs[38];

	if (ptrace(PTRACE_GETREGS, tcp->pid, NULL, (long) &regs) < 0)
		return -1;
	mips_a3 = regs[REG_A3];
	mips_r2 = regs[REG_V0];
#elif defined(MIPS)
	if (upeek(tcp, REG_A3, &mips_a3) < 0)
		return -1;
	if (upeek(tcp, REG_V0, &mips_r2) < 0)
		return -1;
#elif defined(ALPHA)
	if (upeek(tcp, REG_A3, &alpha_a3) < 0)
		return -1;
	if (upeek(tcp, REG_R0, &alpha_r0) < 0)
		return -1;
#elif defined(SPARC) || defined(SPARC64)
	/* already done by get_regs */
#elif defined(HPPA)
	if (upeek(tcp, PT_GR28, &hppa_r28) < 0)
		return -1;
#elif defined(SH)
	/* new syscall ABI returns result in R0 */
	if (upeek(tcp, 4*REG_REG0, (long *)&sh_r0) < 0)
		return -1;
#elif defined(SH64)
	/* ABI defines result returned in r9 */
	if (upeek(tcp, REG_GENERAL(9), (long *)&sh64_r9) < 0)
		return -1;
#elif defined(CRISV10) || defined(CRISV32)
	if (upeek(tcp, 4*PT_R10, &cris_r10) < 0)
		return -1;
#elif defined(TILE)
	/* already done by get_regs */
#elif defined(MICROBLAZE)
	if (upeek(tcp, 3 * 4, &microblaze_r3) < 0)
		return -1;
#elif defined(OR1K)
	/* already done by get_regs */
#endif
	return 1;
}

/* Called at each syscall exit */
static void syscall_fixup_on_sysexit(struct tcb *tcp)
{
#if defined(S390) || defined(S390X)
	if (syscall_mode != -ENOSYS)
		syscall_mode = tcp->scno;
	if ((tcp->flags & TCB_WAITEXECVE)
		 && (gpr2 == -ENOSYS || gpr2 == tcp->scno)) {
		/*
		 * Return from execve.
		 * Fake a return value of zero. We leave the TCB_WAITEXECVE
		 * flag set for the post-execve SIGTRAP to see and reset.
		 */
		gpr2 = 0;
	}
#endif
}

/* Returns:
 * 1: ok, continue in trace_syscall_exiting().
 * -1: error, trace_syscall_exiting() should print error indicator
 *    ("????" etc) and bail out.
 */
static int get_error(struct tcb *tcp)
{
	int u_error = 0;
	int check_errno = 1;
	if (SCNO_IN_RANGE(tcp->scno) &&
	    sysent[tcp->scno].sys_flags & SYSCALL_NEVER_FAILS) {
		check_errno = 0;
	}
#if defined(S390) || defined(S390X)
	if (check_errno && is_negated_errno(gpr2)) {
		tcp->u_rval = -1;
		u_error = -gpr2;
	} else {
		tcp->u_rval = gpr2;
	}
#elif defined(I386)
	if (check_errno && is_negated_errno(i386_regs.eax)) {
		tcp->u_rval = -1;
		u_error = -i386_regs.eax;
	} else {
		tcp->u_rval = i386_regs.eax;
	}
#elif defined(X86_64)
	long rax;
	if (x86_io.iov_len == sizeof(i386_regs)) {
		/* Sign extend from 32 bits */
		rax = (int32_t)i386_regs.eax;
	} else {
		rax = x86_64_regs.rax;
	}
	if (check_errno && is_negated_errno(rax)) {
		tcp->u_rval = -1;
		u_error = -rax;
	} else {
		tcp->u_rval = rax;
	}
#elif defined(X32)
	/* In X32, return value is 64-bit (llseek uses one).
	 * Using merely "long rax" would not work.
	 */
	long long rax;
	if (x86_io.iov_len == sizeof(i386_regs)) {
		/* Sign extend from 32 bits */
		rax = (int32_t)i386_regs.eax;
	} else {
		rax = x86_64_regs.rax;
	}
	/* Careful: is_negated_errno() works only on longs */
	if (check_errno && is_negated_errno_x32(rax)) {
		tcp->u_rval = -1;
		u_error = -rax;
	} else {
		tcp->u_rval = rax; /* truncating */
		tcp->u_lrval = rax;
	}
#elif defined(IA64)
	if (ia32) {
		int err;

		err = (int)ia64_r8;
		if (check_errno && is_negated_errno(err)) {
			tcp->u_rval = -1;
			u_error = -err;
		} else {
			tcp->u_rval = err;
		}
	} else {
		if (check_errno && ia64_r10) {
			tcp->u_rval = -1;
			u_error = ia64_r8;
		} else {
			tcp->u_rval = ia64_r8;
		}
	}
#elif defined(MIPS)
	if (check_errno && mips_a3) {
		tcp->u_rval = -1;
		u_error = mips_r2;
	} else {
		tcp->u_rval = mips_r2;
# if defined(LINUX_MIPSN32)
		tcp->u_lrval = mips_r2;
# endif
	}
#elif defined(POWERPC)
	if (check_errno && is_negated_errno(ppc_result)) {
		tcp->u_rval = -1;
		u_error = -ppc_result;
	} else {
		tcp->u_rval = ppc_result;
	}
#elif defined(M68K)
	if (check_errno && is_negated_errno(m68k_d0)) {
		tcp->u_rval = -1;
		u_error = -m68k_d0;
	} else {
		tcp->u_rval = m68k_d0;
	}
#elif defined(ARM) || defined(AARCH64)
# if defined(AARCH64)
	if (tcp->currpers == 1) {
		if (check_errno && is_negated_errno(aarch64_regs.regs[0])) {
			tcp->u_rval = -1;
			u_error = -aarch64_regs.regs[0];
		} else {
			tcp->u_rval = aarch64_regs.regs[0];
		}
	}
	else
# endif
	{
		if (check_errno && is_negated_errno(arm_regs.ARM_r0)) {
			tcp->u_rval = -1;
			u_error = -arm_regs.ARM_r0;
		} else {
			tcp->u_rval = arm_regs.ARM_r0;
		}
	}
#elif defined(AVR32)
	if (check_errno && regs.r12 && (unsigned) -regs.r12 < nerrnos) {
		tcp->u_rval = -1;
		u_error = -regs.r12;
	} else {
		tcp->u_rval = regs.r12;
	}
#elif defined(BFIN)
	if (check_errno && is_negated_errno(bfin_r0)) {
		tcp->u_rval = -1;
		u_error = -bfin_r0;
	} else {
		tcp->u_rval = bfin_r0;
	}
#elif defined(ALPHA)
	if (check_errno && alpha_a3) {
		tcp->u_rval = -1;
		u_error = alpha_r0;
	} else {
		tcp->u_rval = alpha_r0;
	}
#elif defined(SPARC)
	if (check_errno && regs.psr & PSR_C) {
		tcp->u_rval = -1;
		u_error = regs.u_regs[U_REG_O0];
	} else {
		tcp->u_rval = regs.u_regs[U_REG_O0];
	}
#elif defined(SPARC64)
	if (check_errno && regs.tstate & 0x1100000000UL) {
		tcp->u_rval = -1;
		u_error = regs.u_regs[U_REG_O0];
	} else {
		tcp->u_rval = regs.u_regs[U_REG_O0];
	}
#elif defined(HPPA)
	if (check_errno && is_negated_errno(hppa_r28)) {
		tcp->u_rval = -1;
		u_error = -hppa_r28;
	} else {
		tcp->u_rval = hppa_r28;
	}
#elif defined(SH)
	if (check_errno && is_negated_errno(sh_r0)) {
		tcp->u_rval = -1;
		u_error = -sh_r0;
	} else {
		tcp->u_rval = sh_r0;
	}
#elif defined(SH64)
	if (check_errno && is_negated_errno(sh64_r9)) {
		tcp->u_rval = -1;
		u_error = -sh64_r9;
	} else {
		tcp->u_rval = sh64_r9;
	}
#elif defined(CRISV10) || defined(CRISV32)
	if (check_errno && cris_r10 && (unsigned) -cris_r10 < nerrnos) {
		tcp->u_rval = -1;
		u_error = -cris_r10;
	} else {
		tcp->u_rval = cris_r10;
	}
#elif defined(TILE)
	/*
	 * The standard tile calling convention returns the value (or negative
	 * errno) in r0, and zero (or positive errno) in r1.
	 * Until at least kernel 3.8, however, the r1 value is not reflected
	 * in ptregs at this point, so we use r0 here.
	 */
	if (check_errno && is_negated_errno(tile_regs.regs[0])) {
		tcp->u_rval = -1;
		u_error = -tile_regs.regs[0];
	} else {
		tcp->u_rval = tile_regs.regs[0];
	}
#elif defined(MICROBLAZE)
	if (check_errno && is_negated_errno(microblaze_r3)) {
		tcp->u_rval = -1;
		u_error = -microblaze_r3;
	} else {
		tcp->u_rval = microblaze_r3;
	}
#elif defined(OR1K)
	if (check_errno && is_negated_errno(or1k_regs.gpr[11])) {
		tcp->u_rval = -1;
		u_error = -or1k_regs.gpr[11];
	} else {
		tcp->u_rval = or1k_regs.gpr[11];
	}
#endif
	tcp->u_error = u_error;
	return 1;
}
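A minimal, self-contained sketch of the x86 decoding done above in get_scno(), syscall_fixup_on_sysenter() and get_error(), pulled out of strace for illustration. The regset size the kernel reports back selects the i386 or x86-64 layout. __X32_SYSCALL_BIT in orig_rax marks an x32 call. A 32-bit eax has to be sign-extended before it can be recognised as a negated errno. The sketch does not call ptrace(); it only interprets a buffer that PTRACE_GETREGSET with NT_PRSTATUS would have filled. Assumptions: the two structs are hand-written mirrors of the kernel's i386 and x86-64 user_regs_struct layouts (68 and 216 bytes), and all type and function names are hypothetical. The errno range uses the kernel's MAX_ERRNO of 4095. strace's own is_negated_errno() is not part of this excerpt and may use a different bound; the AVR32 and CRIS branches above compare against nerrnos.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirrors of the kernel's user_regs_struct layouts as
 * PTRACE_GETREGSET(NT_PRSTATUS) would fill them (assumed layouts):
 * 17 x 32-bit slots for the i386 view, 27 x 64-bit slots for x86-64. */
struct i386_regs_mirror {
	uint32_t ebx, ecx, edx, esi, edi, ebp, eax;
	uint32_t xds, xes, xfs, xgs, orig_eax, eip, xcs, eflags, esp, xss;
};

struct x86_64_regs_mirror {
	uint64_t r15, r14, r13, r12, rbp, rbx, r11, r10, r9, r8;
	uint64_t rax, rcx, rdx, rsi, rdi, orig_rax, rip, cs, eflags, rsp, ss;
	uint64_t fs_base, gs_base, ds, es, fs, gs;
};

static_assert(sizeof(struct i386_regs_mirror) == 68, "i386 view is 68 bytes");
static_assert(sizeof(struct x86_64_regs_mirror) == 216, "x86-64 view is 216 bytes");

/* One buffer for both views; the larger member comes first so that
 * "= {0}" zeroes all of it. */
union x86_regs_buf {
	struct x86_64_regs_mirror r64;
	struct i386_regs_mirror r32;
};

#define X32_SYSCALL_BIT 0x40000000UL	/* same value as __X32_SYSCALL_BIT */
#define MAX_ERRNO 4095			/* assumption: kernel's errno bound */

/* Personality numbers as used by get_scno() above. */
enum x86_pers { PERS_UNKNOWN = -1, PERS_X86_64 = 0, PERS_I386 = 1, PERS_X32 = 2 };

/* Pick the view from the iov_len the kernel wrote back and extract
 * the syscall number; returns the personality. */
static int decode_x86_scno(const union x86_regs_buf *regs, size_t iov_len,
			   int64_t *scno)
{
	if (iov_len == sizeof(regs->r32)) {
		/* Sign extend, so that orig_eax == -1 stays -1 */
		*scno = (int32_t)regs->r32.orig_eax;
		return PERS_I386;
	}
	if (iov_len == sizeof(regs->r64)) {
		uint64_t nr = regs->r64.orig_rax;

		if (nr & X32_SYSCALL_BIT) {
			*scno = (int64_t)(nr - X32_SYSCALL_BIT);
			return PERS_X32;
		}
		*scno = (int64_t)nr;
		return PERS_X86_64;
	}
	return PERS_UNKNOWN;
}

/* Return-value register; the 32-bit eax is sign-extended so that
 * 0xffffffda reads as -38 (-ENOSYS) rather than 4294967258. */
static int64_t x86_retval(const union x86_regs_buf *regs, size_t iov_len)
{
	if (iov_len == sizeof(regs->r32))
		return (int32_t)regs->r32.eax;
	return (int64_t)regs->r64.rax;
}

/* Hypothetical helper: treat -MAX_ERRNO..-1 as a failed syscall. */
static int negated_errno_sketch(int64_t val)
{
	return val < 0 && val >= -MAX_ERRNO;
}
```

With a real tracee, iov.iov_base would point at the union and iov.iov_len would start as sizeof(union x86_regs_buf). The call ptrace(PTRACE_GETREGSET, pid, (void *)NT_PRSTATUS, &iov) would then shrink iov_len to the size of the view the kernel chose. If the patch in this thread were applied, a 64-bit process stopped at syscall entry after "int $0x80" would be reported with the 68-byte view. decode_x86_scno() would then return the i386 personality, so the sys_pause example would decode as 32-bit call 29 rather than as a 64-bit shmget.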
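The ARM-mode branch of get_scno() above decodes the trap instruction fetched from pc - 4. The sketch below isolates just that bit arithmetic as a hedged, standalone illustration. The function and enum names are hypothetical, and the instruction word is passed in rather than read with PTRACE_PEEKTEXT. It mirrors the logic above. 0xef000000 ("swi 0") is the EABI form, with the number in r7. A trap word matching 0x0f900000 under the 0x0ff00000 mask is the old-ABI form, with the number encoded in the instruction. A number with bits set in 0x0f0000 is an ARM-private syscall and is reported as personality 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Personality numbers as used by the ARM branch of get_scno() above. */
enum arm_pers { ARM_PERS_GENERIC = 0, ARM_PERS_PRIVATE = 1 };

/* Hypothetical helper: decode a syscall trap word; r7 is consulted only
 * for the EABI form. Returns 0 on success, -1 for an unrecognised trap. */
static int decode_arm_trap(uint32_t insn, uint32_t r7,
			   uint32_t *scno, int *pers)
{
	uint32_t nr;

	assert(scno != NULL && pers != NULL);

	if (insn == 0xef000000u) {
		/* EABI: "swi 0", syscall number passed in r7 */
		nr = r7;
	} else if ((insn & 0x0ff00000u) == 0x0f900000u) {
		/* Old ABI: number is the swi immediate minus 0x900000 */
		nr = insn & 0x000fffffu;
	} else {
		return -1;
	}

	if (nr & 0x0f0000u) {
		/* ARM-private syscall range */
		*pers = ARM_PERS_PRIVATE;
		nr &= 0x0000ffffu;
	} else {
		*pers = ARM_PERS_GENERIC;
	}
	*scno = nr;
	return 0;
}
```

The Thumb-mode path above skips the instruction fetch and always takes r7. That corresponds to calling this helper with insn set to 0xef000000, including the same ARM-private range check afterwards.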
end of thread

Thread overview: 12+ messages
2013-02-14 13:17 [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80 Denys Vlasenko
2013-02-14 15:00 ` Oleg Nesterov
2013-02-14 16:26 ` Denys Vlasenko
2013-02-14 18:05 ` H. Peter Anvin
2013-02-14 19:18 ` Oleg Nesterov
2013-02-14 19:21 ` H. Peter Anvin
2013-02-14 20:55 ` Cyrill Gorcunov
2013-02-15 14:50 ` Denys Vlasenko
2013-02-15 14:56 ` Cyrill Gorcunov
2013-02-15 15:09 ` Oleg Nesterov
2013-02-15 15:16 ` Cyrill Gorcunov
2013-02-15 15:42 ` Denys Vlasenko