[PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80

public inbox for linux-kernel@vger.kernel.org

* [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80
@ 2013-02-14 13:17 Denys Vlasenko
  2013-02-14 15:00 ` Oleg Nesterov
  0 siblings, 1 reply; 12+ messages in thread
From: Denys Vlasenko @ 2013-02-14 13:17 UTC (permalink / raw)
  To: linux-kernel, Oleg Nesterov

Determining personality of a ptraced process is a murky area.
On x86, for years strace was looking at segment selectors,
which is conceptually wrong: see, for example,
https://lkml.org/lkml/2012/1/18/320

strace recently changed detection method and current git code
(not released yet) does the following: it reads registers
with PTRACE_GETREGSET, and looks at returned regset size.
It is different for 64-bit and 32-bit processes,
and appears to be a reliable way to determine personality:
No need to check segment selectors for magic values.

This works for well-behaving processes.
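
A minimal sketch of that size check, for illustration only. The struct
layouts are hand-written stand-ins (17 32-bit words for i386, 27 64-bit
words for x86-64), and the function name and personality numbering are
assumptions, not strace's actual code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hand-written stand-ins (an assumption, so the sketch builds on any host):
 * the i386 general regset is 17 32-bit words, the x86-64 one 27 64-bit words. */
struct i386_gregs   { uint32_t r[17]; };
struct x86_64_gregs { uint64_t r[27]; };

/* Hypothetical personality numbering for this sketch. */
enum { PERS_UNKNOWN = -1, PERS_X86_64 = 0, PERS_I386 = 1 };

/* Sanity-check the stand-in layouts against the expected byte sizes. */
void check_layouts(void)
{
	assert(sizeof(struct i386_gregs) == 68);
	assert(sizeof(struct x86_64_gregs) == 216);
}

/* iov_len is the length that PTRACE_GETREGSET (NT_PRSTATUS) wrote back
 * into the iovec; only the size is inspected, never the register values. */
int personality_from_regset_len(size_t iov_len)
{
	if (iov_len == sizeof(struct i386_gregs))
		return PERS_I386;
	if (iov_len == sizeof(struct x86_64_gregs))
		return PERS_X86_64;
	return PERS_UNKNOWN;
}
```

A tracer would pass in the iov_len value left behind by the ptrace call;
no segment selector is consulted.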

But the hole described in the aforementioned lkml thread
still remains: 64-bit processes can perform 32-bit syscalls
using "int 80" entry method, and in this case, kernel returns
64-bit regset. For example, this:

asm("int $0x80": :"a" (29)); /* 32-bit sys_pause */

will be decoded by strace as a (64-bit) shmget syscall.
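
To make the mis-decoding concrete, here is a small sketch with tiny
excerpts of the two syscall tables. The lookup helper and its name are
hypothetical; only the table entries shown are taken from the i386 and
x86-64 syscall numbering:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct scno_name { long scno; const char *name; };

/* Tiny excerpts of the two tables; the real ones have hundreds of entries. */
static const struct scno_name i386_names[] = {
	{ 20, "getpid" }, { 29, "pause" },
};
static const struct scno_name x86_64_names[] = {
	{ 29, "shmget" }, { 34, "pause" }, { 39, "getpid" },
};

/* Look up scno in the table for the given ABI; NULL if it is not listed. */
const char *syscall_name(int is_i386_abi, long scno)
{
	const struct scno_name *tab = is_i386_abi ? i386_names : x86_64_names;
	size_t n = is_i386_abi
		? sizeof(i386_names) / sizeof(i386_names[0])
		: sizeof(x86_64_names) / sizeof(x86_64_names[0]);

	for (size_t i = 0; i < n; i++)
		if (tab[i].scno == scno)
			return tab[i].name;
	return NULL;
}
```

With the same number 29, syscall_name(1, 29) yields "pause" while
syscall_name(0, 29) yields "shmget", which is exactly the confusion
a tracer runs into if it picks the wrong table.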

This patch makes it so that in syscall-entry-stop caused by
"int 80" instruction, PTRACE_GETREGSET returns 32-bit regset.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>

--- linux-3.7.7/arch/x86/include/asm/uaccess.h
+++ linux-3.7.7_regset/arch/x86/include/asm/uaccess.h
@@ -594,5 +594,7 @@
 # include <asm/uaccess_64.h>
 #endif

+#define ARCH_HAS_SYSCALL_USER_REGSET_VIEW 1
+
 #endif /* _ASM_X86_UACCESS_H */

--- linux-3.7.7/arch/x86/kernel/ptrace.c
+++ linux-3.7.7_regset/arch/x86/kernel/ptrace.c
@@ -1443,6 +1443,22 @@
 #endif
 }

+const struct user_regset_view *syscall_user_regset_view(struct task_struct *task)
+{
+#ifdef CONFIG_IA32_EMULATION
+	/* Did task make 32-bit syscall just now?
+	 * Task can still be 64-bit: think "int 0x80 on x86_64".
+	 */
+	if (task_thread_info(task)->status & TS_COMPAT)
+#endif
+#if defined CONFIG_X86_32 || defined CONFIG_IA32_EMULATION
+		return &user_x86_32_view;
+#endif
+#ifdef CONFIG_X86_64
+	return &user_x86_64_view;
+#endif
+}
+
 static void fill_sigtrap_info(struct task_struct *tsk,
 				struct pt_regs *regs,
 				int error_code, int si_code,
--- linux-3.7.7/include/linux/regset.h
+++ linux-3.7.7_regset/include/linux/regset.h
@@ -204,6 +204,12 @@
  */
 const struct user_regset_view *task_user_regset_view(struct task_struct *tsk);

+#ifdef ARCH_HAS_SYSCALL_USER_REGSET_VIEW
+const struct user_regset_view *syscall_user_regset_view(struct task_struct *tsk);
+#else
+# define syscall_user_regset_view task_user_regset_view
+#endif
+

 /*
  * These are helpers for writing regset get/set functions in arch code.
--- linux-3.7.7/kernel/ptrace.c
+++ linux-3.7.7_regset/kernel/ptrace.c
@@ -684,7 +684,7 @@
 static int ptrace_regset(struct task_struct *task, int req, unsigned int type,
 			 struct iovec *kiov)
 {
-	const struct user_regset_view *view = task_user_regset_view(task);
+	const struct user_regset_view *view = syscall_user_regset_view(task);
 	const struct user_regset *regset = find_regset(view, type);
 	int regset_no;



* Re: [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80
  2013-02-14 13:17 [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80 Denys Vlasenko
@ 2013-02-14 15:00 ` Oleg Nesterov
  2013-02-14 16:26   ` Denys Vlasenko
  2013-02-14 18:05   ` H. Peter Anvin
  0 siblings, 2 replies; 12+ messages in thread
From: Oleg Nesterov @ 2013-02-14 15:00 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: linux-kernel, Andi Kleen, H. Peter Anvin

On 02/14, Denys Vlasenko wrote:
>
> Determining personality of a ptraced process is a murky area.
> On x86, for years strace was looking at segment selectors,
> which is conceptually wrong: see, for example,
> https://lkml.org/lkml/2012/1/18/320
>
> strace recently changed detection method and current git code
> (not released yet) does the following: it reads registers
> with PTRACE_GETREGSET, and looks at returned regset size.
> It is different for 64-bit and 32-bit processes,
> and appears to be a reliable way to determine personality:
> No need to check segment selectors for magic values.
>
> This works for well-behaving processes.
>
> But the hole described in the aforementioned lkml thread
> still remains: 64-bit processes can perform 32-bit syscalls
> using "int 80" entry method, and in this case, kernel returns
> 64-bit regset. For example, this:
>
> asm("int $0x80": :"a" (29)); /* 32-bit sys_pause */
>
> will be decoded by strace as a (64-bit) shmget syscall.
>
> This patch makes it so that in syscall-entry-stop caused by
> "int 80" instruction, PTRACE_GETREGSET returns 32-bit regset.

Not sure...

First of all, this is an incompatible change. And to me, it doesn't look
correct anyway. Say, why the debugger can't modify r15 if a 64bit tracee
does int80 ? Or think about PTRACE_EVENT_FORK which can be reported with
TS_COMPAT set.

Probably is_ia32_task() should be reported "explicitly" as we already
discussed, and afaik you have other ideas.

Oleg.

> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
>
> --- linux-3.7.7/arch/x86/include/asm/uaccess.h
> +++ linux-3.7.7_regset/arch/x86/include/asm/uaccess.h
> @@ -594,5 +594,7 @@
>  # include <asm/uaccess_64.h>
>  #endif
>
> +#define ARCH_HAS_SYSCALL_USER_REGSET_VIEW 1
> +
>  #endif /* _ASM_X86_UACCESS_H */
>
> --- linux-3.7.7/arch/x86/kernel/ptrace.c
> +++ linux-3.7.7_regset/arch/x86/kernel/ptrace.c
> @@ -1443,6 +1443,22 @@
>  #endif
>  }
>
> +const struct user_regset_view *syscall_user_regset_view(struct task_struct *task)
> +{
> +#ifdef CONFIG_IA32_EMULATION
> +	/* Did task make 32-bit syscall just now?
> +	 * Task can still be 64-bit: think "int 0x80 on x86_64".
> +	 */
> +	if (task_thread_info(task)->status & TS_COMPAT)
> +#endif
> +#if defined CONFIG_X86_32 || defined CONFIG_IA32_EMULATION
> +		return &user_x86_32_view;
> +#endif
> +#ifdef CONFIG_X86_64
> +	return &user_x86_64_view;
> +#endif
> +}
> +
>  static void fill_sigtrap_info(struct task_struct *tsk,
>  				struct pt_regs *regs,
>  				int error_code, int si_code,
> --- linux-3.7.7/include/linux/regset.h
> +++ linux-3.7.7_regset/include/linux/regset.h
> @@ -204,6 +204,12 @@
>   */
>  const struct user_regset_view *task_user_regset_view(struct task_struct *tsk);
>
> +#ifdef ARCH_HAS_SYSCALL_USER_REGSET_VIEW
> +const struct user_regset_view *syscall_user_regset_view(struct task_struct *tsk);
> +#else
> +# define syscall_user_regset_view task_user_regset_view
> +#endif
> +
>
>  /*
>   * These are helpers for writing regset get/set functions in arch code.
> --- linux-3.7.7/kernel/ptrace.c
> +++ linux-3.7.7_regset/kernel/ptrace.c
> @@ -684,7 +684,7 @@
>  static int ptrace_regset(struct task_struct *task, int req, unsigned int type,
>  			 struct iovec *kiov)
>  {
> -	const struct user_regset_view *view = task_user_regset_view(task);
> +	const struct user_regset_view *view = syscall_user_regset_view(task);
>  	const struct user_regset *regset = find_regset(view, type);
>  	int regset_no;
>



* Re: [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80
  2013-02-14 15:00 ` Oleg Nesterov
@ 2013-02-14 16:26   ` Denys Vlasenko
  2013-02-14 18:05   ` H. Peter Anvin
  1 sibling, 0 replies; 12+ messages in thread
From: Denys Vlasenko @ 2013-02-14 16:26 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: linux-kernel, Andi Kleen, H. Peter Anvin

On 02/14/2013 04:00 PM, Oleg Nesterov wrote:
> On 02/14, Denys Vlasenko wrote:
>> This patch makes it so that in syscall-entry-stop caused by
>> "int 80" instruction, PTRACE_GETREGSET returns 32-bit regset.
> 
> Not sure...
> 
> First of all, this is an incompatible change. And to me, it doesn't look
> correct anyway. Say, why the debugger can't modify r15 if a 64bit tracee
> does int80?

On x86_64, PTRACE_GET/SETREGS can be used for this: they always operate
on 64-bit registers.

> Or think about PTRACE_EVENT_FORK which can be reported with
> TS_COMPAT set.

I don't see a problem. Yes, PTRACE_GETREGSET will return
32-bit regset in this ptrace-stop, which is a problem... why?

> Probably is_ia32_task() should be reported "explicitly" as we already
> discussed, and afaik you have other ideas.

Yes, there are a few ideas. Say, a new ptrace op
can be introduced to return a vector of longs.
To make it easily parsable, how about (type,len,data...)
records? This also may allow tracer to indicate which records
it wants: for example, not everyone wants to read syscall params.
Maybe something like

ptrace(PTRACE_GETSYSCALL, pid, list_of_elements_I_want, &iov)

where list_of_elements_I_want is long[], 0-terminated,
iov points to a buffer, and on return iov.len is updated
(a-la GETREGSET).
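
A rough sketch of how a tracer might walk such (type,len,data...)
records once the buffer is filled. Nothing here exists in the kernel:
the record type values and the helper name are hypothetical, and len is
assumed to count payload words, not bytes:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record types; PTRACE_GETSYSCALL is only a proposal. */
enum { REC_FLAGS = 1, REC_SCNO = 2, REC_ARGS = 3 };

/*
 * Scan nwords longs laid out as (type, len, data[len]) records and
 * return a pointer to the payload of the first record of type 'want',
 * storing its length (in longs) in *len_out. Returns NULL if the
 * record is absent or the buffer is truncated.
 */
const long *find_record(const long *buf, size_t nwords, long want,
			size_t *len_out)
{
	size_t i = 0;

	assert(buf != NULL && len_out != NULL);
	while (i + 2 <= nwords) {
		long type = buf[i];
		long len = buf[i + 1];

		/* Payload must fit in what remains after the header. */
		if (len < 0 || (size_t)len > nwords - (i + 2))
			return NULL;
		if (type == want) {
			*len_out = (size_t)len;
			return &buf[i + 2];
		}
		i += 2 + (size_t)len;
	}
	return NULL;
}
```

The variable-length framing lets a tracer skip record types it does not
know, which is what would make arch-specific records cheap to add.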

What problems can be solved here?

(1) syscall entry/exit discrimination etc. Say, a record
can contain bit flags, such as "it's a syscall entry",
"it's a syscall exit", "it's a group-stop" etc.
Currently, it is impossible for tracer to distinguish
syscall entry from syscall exit.

(2) a record can supply arch-specific data, such as the x86-specific
problem I tried to address: "was it a int 80 syscall?". Variable-length
record format makes it easy to adapt to different archs' needs.
Alternatively, we can set aside a few bits in "bit flags record"
as arch dependent bits. Most arches need just a few bits.

(3) on syscall entry, a record can contain (up to) 7 words: syscall_no
and 0-6 params, making tracer's code less architecture dependent.
Today in strace, *every* architecture needs to have arch-dependent
regs-to-params conversion code. I would like to be able
to code it in C with the same code for most arches.

(4) We can read structs/data pointed by syscall params, such as
struct stat returned by fstat, without needing additional
round-trip to kernel, *and* with kernel-supplied information
on structure's size. Currently, strace has to know the size correctly.
There were, and will be, bugs in strace where we mishandle
structures because we mis-detect process' bitness, and use wrong
struct stat layouts. If kernel would be able to tell us:
"I returned 78 byte structure in memory pointed to by arg1",
it would help a lot. Even if it wouldn't return the result
structure itself (I imagine it's a lot of work in kernel
to access it in other process vm), knowing its size
will still be a big help.

(5) We can read several regsets: GP regs, SSE regs, etc.
Maybe someone would find it useful? (strace doesn't need this).

How does this look?

I propose to start small, by implementing just (1); (2) in the form of
arch bits as part of (1); and (3). That will satisfy the needs I tried
to address in my patch.

-- 
vda


* Re: [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80
  2013-02-14 15:00 ` Oleg Nesterov
  2013-02-14 16:26   ` Denys Vlasenko
@ 2013-02-14 18:05   ` H. Peter Anvin
  2013-02-14 19:18     ` Oleg Nesterov
  1 sibling, 1 reply; 12+ messages in thread
From: H. Peter Anvin @ 2013-02-14 18:05 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Denys Vlasenko, linux-kernel, Andi Kleen

On 02/14/2013 07:00 AM, Oleg Nesterov wrote:
> On 02/14, Denys Vlasenko wrote:
>>
>> Determining personality of a ptraced process is a murky area.
>> On x86, for years strace was looking at segment selectors,
>> which is conceptually wrong: see, for example,
>> https://lkml.org/lkml/2012/1/18/320
>>

One proposal that keeps being on the table is to export a regset with 
metadata, including process mode at launch (i386, x86-64, x32).

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



* Re: [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80
  2013-02-14 18:05   ` H. Peter Anvin
@ 2013-02-14 19:18     ` Oleg Nesterov
  2013-02-14 19:21       ` H. Peter Anvin
  0 siblings, 1 reply; 12+ messages in thread
From: Oleg Nesterov @ 2013-02-14 19:18 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Denys Vlasenko, linux-kernel, Andi Kleen

On 02/14, H. Peter Anvin wrote:
>
> On 02/14/2013 07:00 AM, Oleg Nesterov wrote:
>> On 02/14, Denys Vlasenko wrote:
>>>
>>> Determining personality of a ptraced process is a murky area.
>>> On x86, for years strace was looking at segment selectors,
>>> which is conceptually wrong: see, for example,
>>> https://lkml.org/lkml/2012/1/18/320
>>>
>
> One proposal that keeps being on the table is to export a regset with
> metadata, including process mode at launch (i386, x86-64, x32).

Yes... but if this metadata includes TS_COMPAT-is-set, then strace should
do PTRACE_GETREGSET(REGSET_META) + PTRACE_GETREGSET(REGSET_GENERAL) every
time. Or REGSET_META should include META+GENERAL.

IOW, it is not clear to me what this "meta" should actually report.

Oleg.




* Re: [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80
  2013-02-14 19:18     ` Oleg Nesterov
@ 2013-02-14 19:21       ` H. Peter Anvin
  2013-02-14 20:55         ` Cyrill Gorcunov
  2013-02-15 15:42         ` Denys Vlasenko
  0 siblings, 2 replies; 12+ messages in thread
From: H. Peter Anvin @ 2013-02-14 19:21 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Denys Vlasenko, linux-kernel, Andi Kleen

On 02/14/2013 11:18 AM, Oleg Nesterov wrote:
> On 02/14, H. Peter Anvin wrote:
>>
>> On 02/14/2013 07:00 AM, Oleg Nesterov wrote:
>>> On 02/14, Denys Vlasenko wrote:
>>>>
>>>> Determining personality of a ptraced process is a murky area.
>>>> On x86, for years strace was looking at segment selectors,
>>>> which is conceptually wrong: see, for example,
>>>> https://lkml.org/lkml/2012/1/18/320
>>>>
>>
>> One proposal that keeps being on the table is to export a regset with
>> metadata, including process mode at launch (i386, x86-64, x32).
> 
> Yes... but if this metadata includes TS_COMPAT-is-set, then strace should
> do PTRACE_GETREGSET(REGSET_META) + PTRACE_GETREGSET(REGSET_GENERAL) every
> time. Or REGSET_META should include META+GENERAL.
> 
> IOW, it is not clear to me what this "meta" should actually report.
> 

That is one of the things that needs to be nailed down.  In particular,
what are the things people need.

	-hpa




* Re: [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80
  2013-02-14 19:21       ` H. Peter Anvin
@ 2013-02-14 20:55         ` Cyrill Gorcunov
  2013-02-15 14:50           ` Denys Vlasenko
  2013-02-15 15:42         ` Denys Vlasenko
  1 sibling, 1 reply; 12+ messages in thread
From: Cyrill Gorcunov @ 2013-02-14 20:55 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Oleg Nesterov, Denys Vlasenko, linux-kernel, Andi Kleen,
	Pavel Emelyanov

On Thu, Feb 14, 2013 at 11:21:12AM -0800, H. Peter Anvin wrote:
> On 02/14/2013 11:18 AM, Oleg Nesterov wrote:
> > On 02/14, H. Peter Anvin wrote:
> >>
> >> On 02/14/2013 07:00 AM, Oleg Nesterov wrote:
> >>> On 02/14, Denys Vlasenko wrote:
> >>>>
> >>>> Determining personality of a ptraced process is a murky area.
> >>>> On x86, for years strace was looking at segment selectors,
> >>>> which is conceptually wrong: see, for example,
> >>>> https://lkml.org/lkml/2012/1/18/320
> >>>>
> >>
> >> One proposal that keeps being on the table is to export a regset with
> >> metadata, including process mode at launch (i386, x86-64, x32).
> > 
> > Yes... but if this metadata includes TS_COMPAT-is-set, then strace should
> > do PTRACE_GETREGSET(REGSET_META) + PTRACE_GETREGSET(REGSET_GENERAL) every
> > time. Or REGSET_META should include META+GENERAL.
> > 
> > IOW, it is not clear to me what this "meta" should actually report.
> 
> That is one of the things that needs to be nailed down.  In particular,
> what are the things people need.

Indeed, having some "official" way for compat bit retrieval would be
a great thing for us (c/r camp) since we've just met the same problem.
And at the moment I've stuck with the same trick as gdb does (cs/ds test).

But, guys, if only I'm not missing something completely obvious,
can't we simply provide task-compat bit to userspace in say /proc/pid/stat
or something? Then the strace/gdb would be able to always know if
the tracee is actually in compat mode. Or I miss something fundamental
here?


* Re: [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80
  2013-02-14 20:55         ` Cyrill Gorcunov
@ 2013-02-15 14:50           ` Denys Vlasenko
  2013-02-15 14:56             ` Cyrill Gorcunov
  0 siblings, 1 reply; 12+ messages in thread
From: Denys Vlasenko @ 2013-02-15 14:50 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: H. Peter Anvin, Oleg Nesterov, linux-kernel, Andi Kleen,
	Pavel Emelyanov

On 02/14/2013 09:55 PM, Cyrill Gorcunov wrote:
> On Thu, Feb 14, 2013 at 11:21:12AM -0800, H. Peter Anvin wrote:
>> On 02/14/2013 11:18 AM, Oleg Nesterov wrote:
>>> On 02/14, H. Peter Anvin wrote:
>>>>
>>>> On 02/14/2013 07:00 AM, Oleg Nesterov wrote:
>>>>> On 02/14, Denys Vlasenko wrote:
>>>>>>
>>>>>> Determining personality of a ptraced process is a murky area.
>>>>>> On x86, for years strace was looking at segment selectors,
>>>>>> which is conceptually wrong: see, for example,
>>>>>> https://lkml.org/lkml/2012/1/18/320
>>>>>>
>>>>
>>>> One proposal that keeps being on the table is to export a regset with
>>>> metadata, including process mode at launch (i386, x86-64, x32).
>>>
>>> Yes... but if this metadata includes TS_COMPAT-is-set, then strace should
>>> do PTRACE_GETREGSET(REGSET_META) + PTRACE_GETREGSET(REGSET_GENERAL) every
>>> time. Or REGSET_META should include META+GENERAL.
>>>
>>> IOW, it is not clear to me what this "meta" should actually report.
>>
>> That is one of the things that needs to be nailed down.  In particular,
>> what are the things people need.
> 
> Indeed, having some "official" way for compat bit retrieval would be
> a great thing for us (c/r camp) since we've just met the same problem.
> And at the moment I've stuck with the same trick as gdb does (cs/ds test).
> 
> But, guys, if only I'm not missing something completely obvious,
> can't we simply provide task-compat bit to userspace in say /proc/pid/stat
> or something? Then the strace/gdb would be able to always know if
> the tracee is actually in compat mode. Or I miss something fundamental
> here?

strace needs to get that data on every syscall entry in the traced process.
Doing open/read/close on every syscall entry is going to slow it down a lot.

-- 
vda




* Re: [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80
  2013-02-15 14:50           ` Denys Vlasenko
@ 2013-02-15 14:56             ` Cyrill Gorcunov
  2013-02-15 15:09               ` Oleg Nesterov
  0 siblings, 1 reply; 12+ messages in thread
From: Cyrill Gorcunov @ 2013-02-15 14:56 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: H. Peter Anvin, Oleg Nesterov, linux-kernel, Andi Kleen,
	Pavel Emelyanov

On Fri, Feb 15, 2013 at 03:50:43PM +0100, Denys Vlasenko wrote:
> On 02/14/2013 09:55 PM, Cyrill Gorcunov wrote:
> > On Thu, Feb 14, 2013 at 11:21:12AM -0800, H. Peter Anvin wrote:
> >> On 02/14/2013 11:18 AM, Oleg Nesterov wrote:
> >>> On 02/14, H. Peter Anvin wrote:
> >>>>
> >>>> On 02/14/2013 07:00 AM, Oleg Nesterov wrote:
> >>>>> On 02/14, Denys Vlasenko wrote:
> >>>>>>
> >>>>>> Determining personality of a ptraced process is a murky area.
> >>>>>> On x86, for years strace was looking at segment selectors,
> >>>>>> which is conceptually wrong: see, for example,
> >>>>>> https://lkml.org/lkml/2012/1/18/320
> >>>>>>
> >>>>
> >>>> One proposal that keeps being on the table is to export a regset with
> >>>> metadata, including process mode at launch (i386, x86-64, x32).
> >>>
> >>> Yes... but if this metadata includes TS_COMPAT-is-set, then strace should
> >>> do PTRACE_GETREGSET(REGSET_META) + PTRACE_GETREGSET(REGSET_GENERAL) every
> >>> time. Or REGSET_META should include META+GENERAL.
> >>>
> >>> IOW, it is not clear to me what this "meta" should actually report.
> >>
> >> That is one of the things that needs to be nailed down.  In particular,
> >> what are the things people need.
> > 
> > Indeed, having some "official" way for compat bit retrieval would be
> > a great thing for us (c/r camp) since we've just met the same problem.
> > And at the moment I've stuck with the same trick as gdb does (cs/ds test).
> > 
> > But, guys, if only I'm not missing something completely obvious,
> > can't we simply provide task-compat bit to userspace in say /proc/pid/stat
> > or something? Then the strace/gdb would be able to always know if
> > the tracee is actually in compat mode. Or I miss something fundamental
> > here?
> 
> strace needs to get that data on every syscall entry in the traced process.
> Doing open/read/close on every syscall entry is going to slow it down a lot.

Don't you need to read it only once when strace is attached? Compat flag can't
be arbitrarily dropped when program executes, no?


* Re: [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80
  2013-02-15 14:56             ` Cyrill Gorcunov
@ 2013-02-15 15:09               ` Oleg Nesterov
  2013-02-15 15:16                 ` Cyrill Gorcunov
  0 siblings, 1 reply; 12+ messages in thread
From: Oleg Nesterov @ 2013-02-15 15:09 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Denys Vlasenko, H. Peter Anvin, linux-kernel, Andi Kleen,
	Pavel Emelyanov

On 02/15, Cyrill Gorcunov wrote:
>
> > strace needs to get that data on every syscall entry in the traced process.
> > Doing open/read/close on every syscall entry is going to slow it down a lot.
>
> Don't you need to read it only once when strace is attached? Compat flag can't
> be arbitrarily dropped when program executes, no?

TS_COMPAT is set/cleared every time a 64bit task does int80.

I guess you need TIF_IA32, not TS_COMPAT, for c/r.

Oleg.



* Re: [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80
  2013-02-15 15:09               ` Oleg Nesterov
@ 2013-02-15 15:16                 ` Cyrill Gorcunov
  0 siblings, 0 replies; 12+ messages in thread
From: Cyrill Gorcunov @ 2013-02-15 15:16 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Denys Vlasenko, H. Peter Anvin, linux-kernel, Andi Kleen,
	Pavel Emelyanov

On Fri, Feb 15, 2013 at 04:09:40PM +0100, Oleg Nesterov wrote:
> On 02/15, Cyrill Gorcunov wrote:
> >
> > > strace needs to get that data on every syscall entry in the traced process.
> > > Doing open/read/close on every syscall entry is going to slow it down a lot.
> >
> > Don't you need to read it only once when strace is attached? Compat flag can't
> > be arbitrarily dropped when program executes, no?
> 
> TS_COMPAT is set/cleared every time a 64bit task does int80.
> 
> I guess you need TIF_IA32, not TS_COMPAT, for c/r.

Yeah, indeed. Still, if there is metadata in the register set, that
would be even better I think.


* Re: [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80
  2013-02-14 19:21       ` H. Peter Anvin
  2013-02-14 20:55         ` Cyrill Gorcunov
@ 2013-02-15 15:42         ` Denys Vlasenko
  1 sibling, 0 replies; 12+ messages in thread
From: Denys Vlasenko @ 2013-02-15 15:42 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Oleg Nesterov, linux-kernel, Andi Kleen, jan.kratochvil

On 02/14/2013 08:21 PM, H. Peter Anvin wrote:
> On 02/14/2013 11:18 AM, Oleg Nesterov wrote:
>> On 02/14, H. Peter Anvin wrote:
>>>
>>> On 02/14/2013 07:00 AM, Oleg Nesterov wrote:
>>>> On 02/14, Denys Vlasenko wrote:
>>>>>
>>>>> Determining personality of a ptraced process is a murky area.
>>>>> On x86, for years strace was looking at segment selectors,
>>>>> which is conceptually wrong: see, for example,
>>>>> https://lkml.org/lkml/2012/1/18/320
>>>>>
>>>
>>> One proposal that keeps being on the table is to export a regset with
>>> metadata, including process mode at launch (i386, x86-64, x32).
>>
>> Yes... but if this metadata includes TS_COMPAT-is-set, then strace should
>> do PTRACE_GETREGSET(REGSET_META) + PTRACE_GETREGSET(REGSET_GENERAL) every
>> time. Or REGSET_META should include META+GENERAL.
>>
>> IOW, it is not clear to me what this "meta" should actually report.
> 
> That is one of the things that needs to be nailed down.  In particular,
> what are the things people need.

Let's see what strace needs, by examining its source for various arches.

Ow. Six instances of PTRACE_PEEKTEXT (i.e. attempts to read tracee's
code - inherently unsafe operation) in syscall.c, affected arches:
S390: for syscall# fetch, thankfully only needed before 2.5.44;
ARM: for syscall# fetch. Looks like only needed for non-EABI?
SPARC: for personality detection.

Examples of personality detection:
POWERPC64: by examining registers (MSR)
X86: by looking at GETREGSET size
IA64: by examining registers (CR_IPSR)
ARM: by checking syscall no (scno & 0x0f0000)
SPARC: by looking at trap instruction

Syscall entry versus exit detection (i.e. a sanity check):
ALPHA, MIPS: registers (if a3 is 0 or -1, it's exit)
S390: registers (messy code)
X86: registers (eax must be -ENOSYS on entry)

In general, it is not reliable: eax must be -ENOSYS on entry,
but it can be -ENOSYS on exit too. IOW: if we see eax == -ENOSYS,
we have no idea whether it's entry or exit.
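
A sketch of the ambiguity, and of the tracer-side bookkeeping that is
the usual workaround. The names are made up for illustration, and the
toggle assumes the tracer sees every syscall-stop and that the first
one after attach is an entry, neither of which is guaranteed:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum stop_guess { STOP_ENTRY, STOP_EXIT, STOP_AMBIGUOUS };

/* Register-only check: a value other than -ENOSYS cannot be an entry,
 * but -ENOSYS may be an entry or the exit of a syscall that failed
 * with ENOSYS, so the registers alone cannot decide. */
enum stop_guess guess_from_rax(long rax)
{
	return rax == -ENOSYS ? STOP_AMBIGUOUS : STOP_EXIT;
}

/* Tracer-side state: flip a flag on every syscall-stop. */
struct tracee_state { bool in_syscall; };

enum stop_guess next_syscall_stop(struct tracee_state *t)
{
	t->in_syscall = !t->in_syscall;
	return t->in_syscall ? STOP_ENTRY : STOP_EXIT;
}
```

The toggle goes out of sync as soon as one stop is missed, which is why
an explicit entry/exit flag from the kernel would be more robust.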

Syscall parameters fetching. Some architectures
need to use nontrivial code. Look at this:

#elif defined(IA64)
	if (!ia32) {
		unsigned long *out0, cfm, sof, sol;
		long rbs_end;
		/* be backwards compatible with kernel < 2.4.4... */
#		ifndef PT_RBS_END
#		  define PT_RBS_END	PT_AR_BSP
#		endif

		if (upeek(tcp, PT_RBS_END, &rbs_end) < 0)
			return -1;
		if (upeek(tcp, PT_CFM, (long *) &cfm) < 0)
			return -1;

		sof = (cfm >> 0) & 0x7f;
		sol = (cfm >> 7) & 0x7f;
		out0 = ia64_rse_skip_regs((unsigned long *) rbs_end, -sof + sol);

		for (i = 0; i < nargs; ++i) {
			if (umoven(tcp, (unsigned long) ia64_rse_skip_regs(out0, i),
				   sizeof(long), (char *) &tcp->u_arg[i]) < 0)
				return -1;
		}

or this:

#elif defined(MIPS)
	if (nargs > 4) {
		long sp;

		if (upeek(tcp, REG_SP, &sp) < 0)
			return -1;
		for (i = 0; i < 4; ++i)
			if (upeek(tcp, REG_A0 + i, &tcp->u_arg[i]) < 0)
				return -1;
		umoven(tcp, sp + 16, (nargs - 4) * sizeof(tcp->u_arg[0]),
		       (char *)(tcp->u_arg + 4));
	} else {
		for (i = 0; i < nargs; ++i)
			if (upeek(tcp, REG_A0 + i, &tcp->u_arg[i]) < 0)
				return -1;
	}

Detecting error exits from syscalls. Most arches use the -errno
convention, others (IA64, SPARC, MIPS) have dedicated register
or bit in a status register to indicate error. Some syscalls
"never fail" (e.g. getgid), and strace needs to know which syscalls
never fail.
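
For the arches that use the -errno convention, the check can be sketched
as below. The function name and the never_fails flag are hypothetical;
the 4095 limit is the range Linux reserves for error codes in the return
register. This does not apply to IA64, SPARC or MIPS, which signal
errors separately:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* The top 4095 values of the return register are reserved for -errno. */
#define MAX_ERRNO 4095UL

/*
 * ret is the raw return register at syscall exit. never_fails marks
 * syscalls such as getgid, where every value is a valid result.
 */
bool syscall_failed(long ret, bool never_fails)
{
	if (never_fails)
		return false;
	/* True exactly for ret in [-4095, -1]. */
	return (unsigned long)ret >= -MAX_ERRNO;
}
```

The never_fails flag has to come from a per-syscall table in the tracer,
since the register value alone cannot tell a large valid result from an
error code.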

If you want to take a look yourself, for your convenience I attached
larger excerpts from strace's syscall.c source file.


To summarize:

Looks like this particular ptrace user would benefit from
the following data:

* is it a syscall entry, exit, or something else.
* for syscall entry:
  - parameters width (32/64/etc) and personality data
    (if arch has personality data more fine-grained than "32/64 bits")
  - syscall no
  - parameters
* for syscall exit:
  - parameters width (32/64/etc) and personality data
  - error indicator (errno)?
  - syscall result

Does this look like a good format?
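
One possible shape for such a record, purely as a sketch to make the
list above concrete. The struct, its field names and widths, and the
formatting helper are all hypothetical and do not describe any existing
kernel interface:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum stop_kind { STOP_OTHER = 0, STOP_SYSCALL_ENTRY, STOP_SYSCALL_EXIT };

/* Hypothetical layout covering the items in the summary list. */
struct syscall_stop_info {
	uint8_t  kind;         /* enum stop_kind */
	uint8_t  param_bits;   /* parameter width: 32 or 64 */
	uint16_t personality;  /* arch-specific, finer than 32/64 */
	union {
		struct { uint64_t scno; uint64_t args[6]; } entry;
		struct { int64_t result; uint8_t is_error; } exit;
	} u;
};

/* Render a one-line description into buf; returns snprintf's result. */
int format_stop(const struct syscall_stop_info *si, char *buf, size_t size)
{
	switch (si->kind) {
	case STOP_SYSCALL_ENTRY:
		return snprintf(buf, size, "entry nr=%" PRIu64 " bits=%u",
				si->u.entry.scno, (unsigned)si->param_bits);
	case STOP_SYSCALL_EXIT:
		return snprintf(buf, size, "exit result=%" PRId64 "%s",
				si->u.exit.result,
				si->u.exit.is_error ? " (error)" : "");
	default:
		return snprintf(buf, size, "other");
	}
}
```

With a record like this, the tracer's decoding code would no longer need
per-architecture register knowledge for the common case.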

-- 
vda




static int
get_scno(struct tcb *tcp)
{
	long scno = 0;

#if defined(S390) || defined(S390X)
	if (upeek(tcp, PT_GPR2, &syscall_mode) < 0)
		return -1;

	if (syscall_mode != -ENOSYS) {
		/*
		 * Since kernel version 2.5.44 the scno gets passed in gpr2.
		 */
		scno = syscall_mode;
	} else {
		/*
		 * Old style of "passing" the scno via the SVC instruction.
		 */
		long psw;
		long opcode, offset_reg, tmp;
		void *svc_addr;
		static const int gpr_offset[16] = {
				PT_GPR0,  PT_GPR1,  PT_ORIGGPR2, PT_GPR3,
				PT_GPR4,  PT_GPR5,  PT_GPR6,     PT_GPR7,
				PT_GPR8,  PT_GPR9,  PT_GPR10,    PT_GPR11,
				PT_GPR12, PT_GPR13, PT_GPR14,    PT_GPR15
		};

		if (upeek(tcp, PT_PSWADDR, &psw) < 0)
			return -1;
		errno = 0;
		opcode = ptrace(PTRACE_PEEKTEXT, tcp->pid, (char *)(psw - sizeof(long)), 0);
		if (errno) {
			perror_msg("%s", "peektext(psw-oneword)");
			return -1;
		}

		/*
		 *  We have to check if the SVC got executed directly or via an
		 *  EXECUTE instruction. In case of EXECUTE it is necessary to do
		 *  instruction decoding to derive the system call number.
		 *  Unfortunately the opcode sizes of EXECUTE and SVC are different,
		 *  so that this doesn't work if a SVC opcode is part of an EXECUTE
		 *  opcode. Since there is no way to find out the opcode size this
		 *  is the best we can do...
		 */
		if ((opcode & 0xff00) == 0x0a00) {
			/* SVC opcode */
			scno = opcode & 0xff;
		}
		else {
			/* SVC got executed by EXECUTE instruction */

			/*
			 *  Do instruction decoding of EXECUTE. If you really want to
			 *  understand this, read the Principles of Operations.
			 */
			svc_addr = (void *) (opcode & 0xfff);

			tmp = 0;
			offset_reg = (opcode & 0x000f0000) >> 16;
			if (offset_reg && (upeek(tcp, gpr_offset[offset_reg], &tmp) < 0))
				return -1;
			svc_addr += tmp;

			tmp = 0;
			offset_reg = (opcode & 0x0000f000) >> 12;
			if (offset_reg && (upeek(tcp, gpr_offset[offset_reg], &tmp) < 0))
				return -1;
			svc_addr += tmp;

			scno = ptrace(PTRACE_PEEKTEXT, tcp->pid, svc_addr, 0);
			if (errno)
				return -1;
# if defined(S390X)
			scno >>= 48;
# else
			scno >>= 16;
# endif
			tmp = 0;
			offset_reg = (opcode & 0x00f00000) >> 20;
			if (offset_reg && (upeek(tcp, gpr_offset[offset_reg], &tmp) < 0))
				return -1;

			scno = (scno | tmp) & 0xff;
		}
	}
#elif defined(POWERPC)
	if (upeek(tcp, sizeof(unsigned long)*PT_R0, &scno) < 0)
		return -1;
# ifdef POWERPC64
	/* TODO: speed up strace by not doing this at every syscall.
	 * We only need to do it after execve.
	 */
	int currpers;
	long val;

	/* Check for 64/32 bit mode. */
	if (upeek(tcp, sizeof(unsigned long)*PT_MSR, &val) < 0)
		return -1;
	/* SF is bit 0 of MSR */
	if (val < 0)
		currpers = 0;
	else
		currpers = 1;
	update_personality(tcp, currpers);
# endif
#elif defined(X86_64) || defined(X32)
	int currpers;
	/* GETREGSET of NT_PRSTATUS tells us regset size,
	 * which unambiguously detects i386.
	 *
	 * Linux kernel distinguishes x86-64 and x32 processes
	 * solely by looking at __X32_SYSCALL_BIT:
	 * arch/x86/include/asm/compat.h::is_x32_task():
	 * if (task_pt_regs(current)->orig_ax & __X32_SYSCALL_BIT)
	 *         return true;
	 */
	if (x86_io.iov_len == sizeof(i386_regs)) {
		scno = i386_regs.orig_eax;
		currpers = 1;
	} else {
		scno = x86_64_regs.orig_rax;
		currpers = 0;
		if (scno & __X32_SYSCALL_BIT) {
			scno -= __X32_SYSCALL_BIT;
			currpers = 2;
		}
	}
	update_personality(tcp, currpers);
#elif defined(IA64)
	long psr;
	if (upeek(tcp, PT_CR_IPSR, &psr) >= 0)
		ia32 = (psr & IA64_PSR_IS) != 0;
	if (ia32) {
		if (upeek(tcp, PT_R1, &scno) < 0)
			return -1;
	} else {
		if (upeek(tcp, PT_R15, &scno) < 0)
			return -1;
	}
#elif defined(AARCH64)
	switch (aarch64_io.iov_len) {
		case sizeof(aarch64_regs):
			/* We are in 64-bit mode */
			scno = aarch64_regs.regs[8];
			update_personality(tcp, 1);
			break;
		case sizeof(arm_regs):
			/* We are in 32-bit mode */
			scno = arm_regs.ARM_r7;
			update_personality(tcp, 0);
			break;
	}
#elif defined(ARM)
	/*
	 * We only need to grab the syscall number on syscall entry.
	 */
	if (arm_regs.ARM_ip == 0) {
		/*
		 * Note: we only deal with 32-bit CPUs here
		 */
		if (arm_regs.ARM_cpsr & 0x20) {
			/*
			 * Get the Thumb-mode system call number
			 */
			scno = arm_regs.ARM_r7;
		} else {
			/*
			 * Get the ARM-mode system call number
			 */
			errno = 0;
			scno = ptrace(PTRACE_PEEKTEXT, tcp->pid, (void *)(arm_regs.ARM_pc - 4), NULL);
			if (errno)
				return -1;

			/* Handle the EABI syscall convention.  We do not
			   bother converting structures between the two
			   ABIs, but basic functionality should work even
			   if strace and the traced program have different
			   ABIs.  */
			if (scno == 0xef000000) {
				scno = arm_regs.ARM_r7;
			} else {
				if ((scno & 0x0ff00000) != 0x0f900000) {
					fprintf(stderr, "syscall: unknown syscall trap 0x%08lx\n",
						scno);
					return -1;
				}

				/*
				 * Fixup the syscall number
				 */
				scno &= 0x000fffff;
			}
		}
		if (scno & 0x0f0000) {
			/*
			 * Handle ARM specific syscall
			 */
			update_personality(tcp, 1);
			scno &= 0x0000ffff;
		} else
			update_personality(tcp, 0);

	} else {
		fprintf(stderr, "pid %d stray syscall entry\n", tcp->pid);
		tcp->flags |= TCB_INSYSCALL;
	}
#elif defined(LINUX_MIPSN32)
	unsigned long long regs[38];

	if (ptrace(PTRACE_GETREGS, tcp->pid, NULL, (long) &regs) < 0)
		return -1;
	mips_a3 = regs[REG_A3];
	mips_r2 = regs[REG_V0];

	scno = mips_r2;
	if (!SCNO_IN_RANGE(scno)) {
		if (mips_a3 == 0 || mips_a3 == -1) {
			if (debug_flag)
				fprintf(stderr, "stray syscall exit: v0 = %ld\n", scno);
			return 0;
		}
	}
#elif defined(MIPS)
	if (upeek(tcp, REG_A3, &mips_a3) < 0)
		return -1;
	if (upeek(tcp, REG_V0, &scno) < 0)
		return -1;

	if (!SCNO_IN_RANGE(scno)) {
		if (mips_a3 == 0 || mips_a3 == -1) {
			if (debug_flag)
				fprintf(stderr, "stray syscall exit: v0 = %ld\n", scno);
			return 0;
		}
	}
#elif defined(ALPHA)
	if (upeek(tcp, REG_A3, &alpha_a3) < 0)
		return -1;
	if (upeek(tcp, REG_R0, &scno) < 0)
		return -1;

	/*
	 * Do some sanity checks to figure out if it's
	 * really a syscall entry
	 */
	if (!SCNO_IN_RANGE(scno)) {
		if (alpha_a3 == 0 || alpha_a3 == -1) {
			if (debug_flag)
				fprintf(stderr, "stray syscall exit: r0 = %ld\n", scno);
			return 0;
		}
	}
#elif defined(SPARC) || defined(SPARC64)
	/* Disassemble the syscall trap. */
	/* Retrieve the syscall trap instruction. */
	unsigned long trap;
	errno = 0;
# if defined(SPARC64)
	trap = ptrace(PTRACE_PEEKTEXT, tcp->pid, (char *)regs.tpc, 0);
	trap >>= 32;
# else
	trap = ptrace(PTRACE_PEEKTEXT, tcp->pid, (char *)regs.pc, 0);
# endif
	if (errno)
		return -1;

	/* Disassemble the trap to see what personality to use. */
	switch (trap) {
	case 0x91d02010:
		/* Linux/SPARC syscall trap. */
		update_personality(tcp, 0);
		break;
	case 0x91d0206d:
		/* Linux/SPARC64 syscall trap. */
		update_personality(tcp, 2);
		break;
	case 0x91d02000:
		/* SunOS syscall trap. (pers 1) */
		fprintf(stderr, "syscall: SunOS not supported\n");
		return -1;
	case 0x91d02008:
		/* Solaris 2.x syscall trap. (per 2) */
		update_personality(tcp, 1);
		break;
	case 0x91d02009:
		/* NetBSD/FreeBSD syscall trap. */
		fprintf(stderr, "syscall: NetBSD/FreeBSD not supported\n");
		return -1;
	case 0x91d02027:
		/* Solaris 2.x gettimeofday */
		update_personality(tcp, 1);
		break;
	default:
# if defined(SPARC64)
		fprintf(stderr, "syscall: unknown syscall trap %08lx %016lx\n", trap, regs.tpc);
# else
		fprintf(stderr, "syscall: unknown syscall trap %08lx %08lx\n", trap, regs.pc);
# endif
		return -1;
	}

	/* Extract the system call number from the registers. */
	if (trap == 0x91d02027)
		scno = 156;
	else
		scno = regs.u_regs[U_REG_G1];
	if (scno == 0) {
		scno = regs.u_regs[U_REG_O0];
		memmove(&regs.u_regs[U_REG_O0], &regs.u_regs[U_REG_O1], 7*sizeof(regs.u_regs[0]));
	}
#elif defined(TILE)
	int currpers;
	scno = tile_regs.regs[10];
# ifdef __tilepro__
	currpers = 1;
# else
#  ifndef PT_FLAGS_COMPAT
#   define PT_FLAGS_COMPAT 0x10000  /* from Linux 3.8 on */
#  endif
	if (tile_regs.flags & PT_FLAGS_COMPAT)
		currpers = 1;
	else
		currpers = 0;
# endif
	update_personality(tcp, currpers);
#endif
	tcp->scno = scno;
	return 1;
}

/* Called at each syscall entry.
 * Returns:
 * 0: "ignore this ptrace stop", bail out of trace_syscall_entering() silently.
 * 1: ok, continue in trace_syscall_entering().
 * other: error, trace_syscall_entering() should print error indicator
 *    ("????" etc) and bail out.
 */
static int
syscall_fixup_on_sysenter(struct tcb *tcp)
{
	/* A common case of "not a syscall entry" is post-execve SIGTRAP */
#if defined(I386)
	if (i386_regs.eax != -ENOSYS) {
		if (debug_flag)
			fprintf(stderr, "not a syscall entry (eax = %ld)\n", i386_regs.eax);
		return 0;
	}
#elif defined(X86_64) || defined(X32)
	{
		long rax;
		if (x86_io.iov_len == sizeof(i386_regs)) {
			/* Sign extend from 32 bits */
			rax = (int32_t)i386_regs.eax;
		} else {
			/* Note: in X32 build, this truncates 64 to 32 bits */
			rax = x86_64_regs.rax;
		}
		if (rax != -ENOSYS) {
			if (debug_flag)
				fprintf(stderr, "not a syscall entry (rax = %ld)\n", rax);
			return 0;
		}
	}
#elif defined(S390) || defined(S390X)
	/* TODO: we already fetched PT_GPR2 in get_scno
	 * and stored it in syscall_mode, reuse it here
	 * instead of re-fetching?
	 */
	if (upeek(tcp, PT_GPR2, &gpr2) < 0)
		return -1;
	if (syscall_mode != -ENOSYS)
		syscall_mode = tcp->scno;
	if (gpr2 != syscall_mode) {
		if (debug_flag)
			fprintf(stderr, "not a syscall entry (gpr2 = %ld)\n", gpr2);
		return 0;
	}
#elif defined(M68K)
	if (upeek(tcp, 4*PT_D0, &m68k_d0) < 0)
		return -1;
	if (m68k_d0 != -ENOSYS) {
		if (debug_flag)
			fprintf(stderr, "not a syscall entry (d0 = %ld)\n", m68k_d0);
		return 0;
	}
#elif defined(IA64)
	if (upeek(tcp, PT_R10, &ia64_r10) < 0)
		return -1;
	if (upeek(tcp, PT_R8, &ia64_r8) < 0)
		return -1;
	if (ia32 && ia64_r8 != -ENOSYS) {
		if (debug_flag)
			fprintf(stderr, "not a syscall entry (r8 = %ld)\n", ia64_r8);
		return 0;
	}
#elif defined(CRISV10) || defined(CRISV32)
	if (upeek(tcp, 4*PT_R10, &cris_r10) < 0)
		return -1;
	if (cris_r10 != -ENOSYS) {
		if (debug_flag)
			fprintf(stderr, "not a syscall entry (r10 = %ld)\n", cris_r10);
		return 0;
	}
#elif defined(MICROBLAZE)
	if (upeek(tcp, 3 * 4, &microblaze_r3) < 0)
		return -1;
	if (microblaze_r3 != -ENOSYS) {
		if (debug_flag)
			fprintf(stderr, "not a syscall entry (r3 = %ld)\n", microblaze_r3);
		return 0;
	}
#endif
	return 1;
}

/* Return -1 on error or 1 on success (never 0!) */
static int
get_syscall_args(struct tcb *tcp)
{
	int i, nargs;

	if (SCNO_IN_RANGE(tcp->scno))
		nargs = tcp->u_nargs = sysent[tcp->scno].nargs;
	else
		nargs = tcp->u_nargs = MAX_ARGS;

#if defined(S390) || defined(S390X)
	for (i = 0; i < nargs; ++i)
		if (upeek(tcp, i==0 ? PT_ORIGGPR2 : PT_GPR2 + i*sizeof(long), &tcp->u_arg[i]) < 0)
			return -1;
#elif defined(ALPHA)
	for (i = 0; i < nargs; ++i)
		if (upeek(tcp, REG_A0+i, &tcp->u_arg[i]) < 0)
			return -1;
#elif defined(IA64)
	if (!ia32) {
		unsigned long *out0, cfm, sof, sol;
		long rbs_end;
		/* be backwards compatible with kernel < 2.4.4... */
#		ifndef PT_RBS_END
#		  define PT_RBS_END	PT_AR_BSP
#		endif

		if (upeek(tcp, PT_RBS_END, &rbs_end) < 0)
			return -1;
		if (upeek(tcp, PT_CFM, (long *) &cfm) < 0)
			return -1;

		sof = (cfm >> 0) & 0x7f;
		sol = (cfm >> 7) & 0x7f;
		out0 = ia64_rse_skip_regs((unsigned long *) rbs_end, -sof + sol);

		for (i = 0; i < nargs; ++i) {
			if (umoven(tcp, (unsigned long) ia64_rse_skip_regs(out0, i),
				   sizeof(long), (char *) &tcp->u_arg[i]) < 0)
				return -1;
		}
	} else {
		static const int argreg[MAX_ARGS] = { PT_R11 /* EBX = out0 */,
						      PT_R9  /* ECX = out1 */,
						      PT_R10 /* EDX = out2 */,
						      PT_R14 /* ESI = out3 */,
						      PT_R15 /* EDI = out4 */,
						      PT_R13 /* EBP = out5 */};

		for (i = 0; i < nargs; ++i) {
			if (upeek(tcp, argreg[i], &tcp->u_arg[i]) < 0)
				return -1;
			/* truncate away IVE sign-extension */
			tcp->u_arg[i] &= 0xffffffff;
		}
	}
#elif defined(MIPS)
	if (nargs > 4) {
		long sp;

		if (upeek(tcp, REG_SP, &sp) < 0)
			return -1;
		for (i = 0; i < 4; ++i)
			if (upeek(tcp, REG_A0 + i, &tcp->u_arg[i]) < 0)
				return -1;
		umoven(tcp, sp + 16, (nargs - 4) * sizeof(tcp->u_arg[0]),
		       (char *)(tcp->u_arg + 4));
	} else {
		for (i = 0; i < nargs; ++i)
			if (upeek(tcp, REG_A0 + i, &tcp->u_arg[i]) < 0)
				return -1;
	}
#elif defined(M68K)
	for (i = 0; i < nargs; ++i)
		if (upeek(tcp, (i < 5 ? i : i + 2)*4, &tcp->u_arg[i]) < 0)
			return -1;
#else /* Other architectures (32-bit specific) */
	for (i = 0; i < nargs; ++i)
		if (upeek(tcp, i*4, &tcp->u_arg[i]) < 0)
			return -1;
#endif
	return 1;
}

/* Returns:
 * 1: ok, continue in trace_syscall_exiting().
 * -1: error, trace_syscall_exiting() should print error indicator
 *    ("????" etc) and bail out.
 */
static int
get_syscall_result(struct tcb *tcp)
{
#if defined(S390) || defined(S390X)
	if (upeek(tcp, PT_GPR2, &gpr2) < 0)
		return -1;
#elif defined(POWERPC)
# define SO_MASK 0x10000000
	{
		long flags;
		if (upeek(tcp, sizeof(unsigned long)*PT_CCR, &flags) < 0)
			return -1;
		if (upeek(tcp, sizeof(unsigned long)*PT_R3, &ppc_result) < 0)
			return -1;
		if (flags & SO_MASK)
			ppc_result = -ppc_result;
	}
#elif defined(AVR32)
	/* already done by get_regs */
#elif defined(BFIN)
	if (upeek(tcp, PT_R0, &bfin_r0) < 0)
		return -1;
#elif defined(I386)
	/* already done by get_regs */
#elif defined(X86_64) || defined(X32)
	/* already done by get_regs */
#elif defined(IA64)
#	define IA64_PSR_IS	((long)1 << 34)
	long psr;
	if (upeek(tcp, PT_CR_IPSR, &psr) >= 0)
		ia32 = (psr & IA64_PSR_IS) != 0;
	if (upeek(tcp, PT_R8, &ia64_r8) < 0)
		return -1;
	if (upeek(tcp, PT_R10, &ia64_r10) < 0)
		return -1;
#elif defined(ARM)
	/* already done by get_regs */
#elif defined(AARCH64)
	/* register reading already done by get_regs */

	/* Used to do this, but we did it on syscall entry already: */
	/* We are in 64-bit mode (personality 1) if register struct is aarch64_regs,
	 * else it's personality 0.
	 */
	/*update_personality(tcp, aarch64_io.iov_len == sizeof(aarch64_regs));*/
#elif defined(M68K)
	if (upeek(tcp, 4*PT_D0, &m68k_d0) < 0)
		return -1;
#elif defined(LINUX_MIPSN32)
	unsigned long long regs[38];

	if (ptrace(PTRACE_GETREGS, tcp->pid, NULL, (long) &regs) < 0)
		return -1;
	mips_a3 = regs[REG_A3];
	mips_r2 = regs[REG_V0];
#elif defined(MIPS)
	if (upeek(tcp, REG_A3, &mips_a3) < 0)
		return -1;
	if (upeek(tcp, REG_V0, &mips_r2) < 0)
		return -1;
#elif defined(ALPHA)
	if (upeek(tcp, REG_A3, &alpha_a3) < 0)
		return -1;
	if (upeek(tcp, REG_R0, &alpha_r0) < 0)
		return -1;
#elif defined(SPARC) || defined(SPARC64)
	/* already done by get_regs */
#elif defined(HPPA)
	if (upeek(tcp, PT_GR28, &hppa_r28) < 0)
		return -1;
#elif defined(SH)
	/* new syscall ABI returns result in R0 */
	if (upeek(tcp, 4*REG_REG0, (long *)&sh_r0) < 0)
		return -1;
#elif defined(SH64)
	/* ABI defines result returned in r9 */
	if (upeek(tcp, REG_GENERAL(9), (long *)&sh64_r9) < 0)
		return -1;
#elif defined(CRISV10) || defined(CRISV32)
	if (upeek(tcp, 4*PT_R10, &cris_r10) < 0)
		return -1;
#elif defined(TILE)
	/* already done by get_regs */
#elif defined(MICROBLAZE)
	if (upeek(tcp, 3 * 4, &microblaze_r3) < 0)
		return -1;
#elif defined(OR1K)
	/* already done by get_regs */
#endif
	return 1;
}

/* Called at each syscall exit */
static void
syscall_fixup_on_sysexit(struct tcb *tcp)
{
#if defined(S390) || defined(S390X)
	if (syscall_mode != -ENOSYS)
		syscall_mode = tcp->scno;
	if ((tcp->flags & TCB_WAITEXECVE)
		 && (gpr2 == -ENOSYS || gpr2 == tcp->scno)) {
		/*
		 * Return from execve.
		 * Fake a return value of zero.  We leave the TCB_WAITEXECVE
		 * flag set for the post-execve SIGTRAP to see and reset.
		 */
		gpr2 = 0;
	}
#endif
}

/* Returns:
 * 1: ok, continue in trace_syscall_exiting().
 * -1: error, trace_syscall_exiting() should print error indicator
 *    ("????" etc) and bail out.
 */
static int
get_error(struct tcb *tcp)
{
	int u_error = 0;
	int check_errno = 1;
	if (SCNO_IN_RANGE(tcp->scno) &&
	    sysent[tcp->scno].sys_flags & SYSCALL_NEVER_FAILS) {
		check_errno = 0;
	}
#if defined(S390) || defined(S390X)
	if (check_errno && is_negated_errno(gpr2)) {
		tcp->u_rval = -1;
		u_error = -gpr2;
	}
	else {
		tcp->u_rval = gpr2;
	}
#elif defined(I386)
	if (check_errno && is_negated_errno(i386_regs.eax)) {
		tcp->u_rval = -1;
		u_error = -i386_regs.eax;
	}
	else {
		tcp->u_rval = i386_regs.eax;
	}
#elif defined(X86_64)
	long rax;
	if (x86_io.iov_len == sizeof(i386_regs)) {
		/* Sign extend from 32 bits */
		rax = (int32_t)i386_regs.eax;
	} else {
		rax = x86_64_regs.rax;
	}
	if (check_errno && is_negated_errno(rax)) {
		tcp->u_rval = -1;
		u_error = -rax;
	}
	else {
		tcp->u_rval = rax;
	}
#elif defined(X32)
	/* In X32, return value is 64-bit (llseek uses one).
	 * Using merely "long rax" would not work.
	 */
	long long rax;
	if (x86_io.iov_len == sizeof(i386_regs)) {
		/* Sign extend from 32 bits */
		rax = (int32_t)i386_regs.eax;
	} else {
		rax = x86_64_regs.rax;
	}
	/* Careful: is_negated_errno() works only on longs */
	if (check_errno && is_negated_errno_x32(rax)) {
		tcp->u_rval = -1;
		u_error = -rax;
	}
	else {
		tcp->u_rval = rax; /* truncating */
		tcp->u_lrval = rax;
	}
#elif defined(IA64)
	if (ia32) {
		int err;

		err = (int)ia64_r8;
		if (check_errno && is_negated_errno(err)) {
			tcp->u_rval = -1;
			u_error = -err;
		}
		else {
			tcp->u_rval = err;
		}
	} else {
		if (check_errno && ia64_r10) {
			tcp->u_rval = -1;
			u_error = ia64_r8;
		} else {
			tcp->u_rval = ia64_r8;
		}
	}
#elif defined(MIPS)
	if (check_errno && mips_a3) {
		tcp->u_rval = -1;
		u_error = mips_r2;
	} else {
		tcp->u_rval = mips_r2;
# if defined(LINUX_MIPSN32)
		tcp->u_lrval = mips_r2;
# endif
	}
#elif defined(POWERPC)
	if (check_errno && is_negated_errno(ppc_result)) {
		tcp->u_rval = -1;
		u_error = -ppc_result;
	}
	else {
		tcp->u_rval = ppc_result;
	}
#elif defined(M68K)
	if (check_errno && is_negated_errno(m68k_d0)) {
		tcp->u_rval = -1;
		u_error = -m68k_d0;
	}
	else {
		tcp->u_rval = m68k_d0;
	}
#elif defined(ARM) || defined(AARCH64)
# if defined(AARCH64)
	if (tcp->currpers == 1) {
		if (check_errno && is_negated_errno(aarch64_regs.regs[0])) {
			tcp->u_rval = -1;
			u_error = -aarch64_regs.regs[0];
		}
		else {
			tcp->u_rval = aarch64_regs.regs[0];
		}
	}
	else
# endif
	{
		if (check_errno && is_negated_errno(arm_regs.ARM_r0)) {
			tcp->u_rval = -1;
			u_error = -arm_regs.ARM_r0;
		}
		else {
			tcp->u_rval = arm_regs.ARM_r0;
		}
	}
#elif defined(AVR32)
	if (check_errno && regs.r12 && (unsigned) -regs.r12 < nerrnos) {
		tcp->u_rval = -1;
		u_error = -regs.r12;
	}
	else {
		tcp->u_rval = regs.r12;
	}
#elif defined(BFIN)
	if (check_errno && is_negated_errno(bfin_r0)) {
		tcp->u_rval = -1;
		u_error = -bfin_r0;
	} else {
		tcp->u_rval = bfin_r0;
	}
#elif defined(ALPHA)
	if (check_errno && alpha_a3) {
		tcp->u_rval = -1;
		u_error = alpha_r0;
	}
	else {
		tcp->u_rval = alpha_r0;
	}
#elif defined(SPARC)
	if (check_errno && regs.psr & PSR_C) {
		tcp->u_rval = -1;
		u_error = regs.u_regs[U_REG_O0];
	}
	else {
		tcp->u_rval = regs.u_regs[U_REG_O0];
	}
#elif defined(SPARC64)
	if (check_errno && regs.tstate & 0x1100000000UL) {
		tcp->u_rval = -1;
		u_error = regs.u_regs[U_REG_O0];
	}
	else {
		tcp->u_rval = regs.u_regs[U_REG_O0];
	}
#elif defined(HPPA)
	if (check_errno && is_negated_errno(hppa_r28)) {
		tcp->u_rval = -1;
		u_error = -hppa_r28;
	}
	else {
		tcp->u_rval = hppa_r28;
	}
#elif defined(SH)
	if (check_errno && is_negated_errno(sh_r0)) {
		tcp->u_rval = -1;
		u_error = -sh_r0;
	}
	else {
		tcp->u_rval = sh_r0;
	}
#elif defined(SH64)
	if (check_errno && is_negated_errno(sh64_r9)) {
		tcp->u_rval = -1;
		u_error = -sh64_r9;
	}
	else {
		tcp->u_rval = sh64_r9;
	}
#elif defined(CRISV10) || defined(CRISV32)
	if (check_errno && cris_r10 && (unsigned) -cris_r10 < nerrnos) {
		tcp->u_rval = -1;
		u_error = -cris_r10;
	}
	else {
		tcp->u_rval = cris_r10;
	}
#elif defined(TILE)
	/*
	 * The standard tile calling convention returns the value (or negative
	 * errno) in r0, and zero (or positive errno) in r1.
	 * Until at least kernel 3.8, however, the r1 value is not reflected
	 * in ptregs at this point, so we use r0 here.
	 */
	if (check_errno && is_negated_errno(tile_regs.regs[0])) {
		tcp->u_rval = -1;
		u_error = -tile_regs.regs[0];
	} else {
		tcp->u_rval = tile_regs.regs[0];
	}
#elif defined(MICROBLAZE)
	if (check_errno && is_negated_errno(microblaze_r3)) {
		tcp->u_rval = -1;
		u_error = -microblaze_r3;
	}
	else {
		tcp->u_rval = microblaze_r3;
	}
#elif defined(OR1K)
	if (check_errno && is_negated_errno(or1k_regs.gpr[11])) {
		tcp->u_rval = -1;
		u_error = -or1k_regs.gpr[11];
	}
	else {
		tcp->u_rval = or1k_regs.gpr[11];
	}
#endif
	tcp->u_error = u_error;
	return 1;
}
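
The X86_64/X32 branch of the syscall-number code above decides the personality from two inputs only: the regset size reported by PTRACE_GETREGSET and the x32 bit in orig_rax. That decision can be looked at in isolation from ptrace. What follows is a minimal sketch, not strace's code: `classify_syscall`, `struct decoded`, the `PERS_*` names and the size macros are hypothetical, and the sizes assume the usual layouts (17 32-bit slots for i386, 27 64-bit slots for x86-64).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed regset sizes: 17 x 32-bit slots (i386), 27 x 64-bit slots (x86-64). */
#define I386_REGSET_SIZE   (17 * sizeof(uint32_t))
#define X86_64_REGSET_SIZE (27 * sizeof(uint64_t))
/* Same value as the kernel's __X32_SYSCALL_BIT. */
#define X32_SYSCALL_BIT    UINT64_C(0x40000000)

/* Numbering follows the currpers values used in the code above. */
enum pers { PERS_X86_64 = 0, PERS_I386 = 1, PERS_X32 = 2 };

struct decoded {
	enum pers personality;
	uint64_t scno;
};

/*
 * regset_len: the iov_len written back by PTRACE_GETREGSET.
 * orig_ax:    orig_eax or orig_rax, read from the matching layout.
 */
struct decoded
classify_syscall(size_t regset_len, uint64_t orig_ax)
{
	struct decoded d;

	if (regset_len == I386_REGSET_SIZE) {
		d.personality = PERS_I386;
		d.scno = (uint32_t)orig_ax;
	} else if (orig_ax & X32_SYSCALL_BIT) {
		d.personality = PERS_X32;
		d.scno = orig_ax - X32_SYSCALL_BIT;
	} else {
		d.personality = PERS_X86_64;
		d.scno = orig_ax;
	}
	return d;
}

/* Small self-check that doubles as a usage example. */
void
check_classify_examples(void)
{
	struct decoded d;

	d = classify_syscall(I386_REGSET_SIZE, 29);
	assert(d.personality == PERS_I386 && d.scno == 29);

	d = classify_syscall(X86_64_REGSET_SIZE, 29);
	assert(d.personality == PERS_X86_64 && d.scno == 29);

	d = classify_syscall(X86_64_REGSET_SIZE, X32_SYSCALL_BIT + 1);
	assert(d.personality == PERS_X32 && d.scno == 1);
}
```

The second case in `check_classify_examples` is the one the patch description is about: if the kernel hands back the 64-bit regset for an "int 80" entry, number 29 is classified as an x86-64 syscall even though the process asked for the 32-bit one.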

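get_error() above uses two different conventions for reporting failure: most architectures return a negated errno in the result register, while MIPS, Alpha and native IA64 set a separate flag register and leave a positive errno in the result register. A rough sketch of both follows. All names are hypothetical, and the bound of 4095 is an assumption that mirrors the kernel's MAX_ERRNO; strace's own is_negated_errno() bounds the range by the size of its errno table instead.

```c
#include <assert.h>

/* Assumption: mirrors the kernel's MAX_ERRNO, not strace's nerrnos. */
#define MAX_ERRNO_ASSUMED 4095L

struct sys_result {
	long rval;	/* value to print, -1 on failure */
	int error;	/* positive errno, 0 on success */
};

/* True if val, viewed as signed, lies in [-MAX_ERRNO_ASSUMED, -1]. */
int
is_negated_errno_sketch(unsigned long val)
{
	return val >= (unsigned long)-MAX_ERRNO_ASSUMED;
}

/* i386/x86-64/ARM style: failure is a small negative result. */
struct sys_result
decode_negated(long raw, int check_errno)
{
	struct sys_result r = { raw, 0 };

	if (check_errno && is_negated_errno_sketch((unsigned long)raw)) {
		r.rval = -1;
		r.error = (int)-raw;
	}
	return r;
}

/* MIPS/Alpha style: a nonzero flag register marks failure. */
struct sys_result
decode_flagged(long value, long flag, int check_errno)
{
	struct sys_result r = { value, 0 };

	if (check_errno && flag) {
		r.rval = -1;
		r.error = (int)value;
	}
	return r;
}

/* Small self-check that doubles as a usage example. */
void
check_result_examples(void)
{
	struct sys_result r;

	r = decode_negated(-2, 1);		/* -ENOENT */
	assert(r.rval == -1 && r.error == 2);

	r = decode_negated(-2, 0);		/* syscall that never fails */
	assert(r.rval == -2 && r.error == 0);

	r = decode_flagged(2, 1, 1);		/* flag set, errno 2 */
	assert(r.rval == -1 && r.error == 2);
}
```

Passing `check_errno` as 0 corresponds to the SYSCALL_NEVER_FAILS case in get_error(): the raw value is reported as is, even if it happens to look like a negated errno.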

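The ARM-mode path above recovers the syscall number by decoding the trap instruction fetched from pc - 4. As an illustration only, the same bit tests can be written as a pure function of the instruction word and r7. `decode_arm_swi` and `struct arm_decoded` are hypothetical names, and the sketch leaves out the Thumb case, where the code above simply takes the number from r7.

```c
#include <assert.h>
#include <stdint.h>

struct arm_decoded {
	int ok;			/* 0 if the word is not a recognized trap */
	int personality;	/* 1 for ARM-specific syscalls, else 0 */
	uint32_t scno;
};

struct arm_decoded
decode_arm_swi(uint32_t insn, uint32_t r7)
{
	struct arm_decoded d = { 1, 0, 0 };

	if (insn == 0xef000000) {
		/* EABI convention: the number is passed in r7. */
		d.scno = r7;
	} else if ((insn & 0x0ff00000) == 0x0f900000) {
		/* Old ABI: the number is embedded in the instruction. */
		d.scno = insn & 0x000fffff;
	} else {
		d.ok = 0;
		return d;
	}

	if (d.scno & 0x0f0000) {
		/* ARM-specific syscall range. */
		d.personality = 1;
		d.scno &= 0x0000ffff;
	}
	return d;
}

/* Small self-check that doubles as a usage example. */
void
check_arm_examples(void)
{
	struct arm_decoded d;

	d = decode_arm_swi(0xef000000, 4);	/* EABI, r7 = 4 */
	assert(d.ok && d.personality == 0 && d.scno == 4);

	d = decode_arm_swi(0xef900004, 0);	/* old ABI, number 4 */
	assert(d.ok && d.personality == 0 && d.scno == 4);

	d = decode_arm_swi(0xef9f0002, 0);	/* ARM-specific, number 2 */
	assert(d.ok && d.personality == 1 && d.scno == 2);

	d = decode_arm_swi(0xe1a00000, 0);	/* not a trap instruction */
	assert(!d.ok);
}
```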
end of thread; newest message ~2013-02-15 15:42 UTC

Thread overview: 12+ messages
2013-02-14 13:17 [PATCH] x86: make PTRACE_GETREGSET return 32-bit regs if 64-bit process entered kernel with int 80 Denys Vlasenko
2013-02-14 15:00 ` Oleg Nesterov
2013-02-14 16:26   ` Denys Vlasenko
2013-02-14 18:05   ` H. Peter Anvin
2013-02-14 19:18     ` Oleg Nesterov
2013-02-14 19:21       ` H. Peter Anvin
2013-02-14 20:55         ` Cyrill Gorcunov
2013-02-15 14:50           ` Denys Vlasenko
2013-02-15 14:56             ` Cyrill Gorcunov
2013-02-15 15:09               ` Oleg Nesterov
2013-02-15 15:16                 ` Cyrill Gorcunov
2013-02-15 15:42         ` Denys Vlasenko
