* CONFIG_HAVE_ARCH_TRACEHOOK and you
@ 2008-09-12 2:57 Roland McGrath
2008-09-12 2:57 ` Roland McGrath
` (3 more replies)
0 siblings, 4 replies; 19+ messages in thread
From: Roland McGrath @ 2008-09-12 2:57 UTC (permalink / raw)
To: linux-arch; +Cc: utrace-devel, linux-kernel
Hi, linux-arch! I'd like to tell you about CONFIG_HAVE_ARCH_TRACEHOOK
and what each arch's maintainers will want to do about it.
In the current tree (since 2.6.27-rc1), arch/Kconfig defines the
internal config item HAVE_ARCH_TRACEHOOK. The idea is to set a new
baseline of what an arch has to implement to support tracing/debugging
of user tasks. After you do these things, then new kinds of tracing
features, new replacements for ptrace, or drastic changes to the
implementation details of ptrace, can come along later and just work on
your arch without every arch maintainer having to worry about the
details of new features or new non-arch implementation details. When
your arch is ready, adding "select HAVE_ARCH_TRACEHOOK" somewhere in an
arch-specific Kconfig file makes known that your arch is compatible with
new code. New features and options will "require HAVE_ARCH_TRACEHOOK".
In 2.6.27 (as of rc6), powerpc{,64} and sparc{,64} have complete support
and do "select HAVE_ARCH_TRACEHOOK". In 2.6.28, x86 will have it (that
work is already done and waiting in the tip/x86/tracehook branch), and I
believe s390 is also already on track to have it done in 2.6.28. Since
2.6.27-rc1, everything mentioned here has been in place in the generic
code, so you can start the changes for your arch as soon as you like.
If there is interest, I can run an informal session at the Kernel Summit
and/or Linux Plumbers Conference next week to walk arch hackers through
each work item. (I don't have any presentation to give or anything, all
the info is in this posting and/or documented in the source code. But
we can work through it all in detail in person for as long as you'd like.)
Let me know ASAP if folks want this, so we can get it scheduled.
It has been suggested that we set a deadline to require every arch to be
compatible and set HAVE_ARCH_TRACEHOOK. If your arch hadn't been
updated after that deadline, then your kernel build might break, or it
might still build but suddenly fail to support the ptrace system call.
I'm not especially suggesting this myself, and I really don't know when
that deadline might be set. But it's possible that in a few releases,
you'd be sorry if you just ignore this stuff now and let it slide.
Of course more arch-specific work is liable to come along in the future
for some fancy new tracing features. I won't try to promise that by
doing this now you won't ever have to think about ptrace and its ilk
again. But it should be enough for a good number of possible new
features and reorganizations to come along and work on your arch without
you having to think about them, or at least not too much.
I did most of this work myself for x86 and powerpc. I found it
advantageous to do it in a large number of tiny incremental patches.
There are many little items, but most are very quick to do (trivial).
With a few exceptions, this is pretty much just renaming and juggling code
you already have. (The user_regset parts are most of the work.) I'll be
glad to help with any specific questions about your arch code, or to
review your arch patches if you CC me.
All of the interfaces are subject to change if the current definitions
make life really hard for some arch. So don't worry about bending over
backward, just pipe up if you see a problem.
The comment in arch/Kconfig lists the items:
# task_pt_regs() in asm/processor.h or asm/ptrace.h
# arch_has_single_step() if there is hardware single-step support
# arch_has_block_step() if there is hardware block-step support
# arch_ptrace() and not #define __ARCH_SYS_PTRACE
# compat_arch_ptrace() and #define __ARCH_WANT_COMPAT_SYS_PTRACE
# asm/syscall.h supplying asm-generic/syscall.h interface
# linux/regset.h user_regset interfaces
# CORE_DUMP_USE_REGSET #define'd in linux/elf.h
# TIF_SYSCALL_TRACE calls tracehook_report_syscall_{entry,exit}
# TIF_NOTIFY_RESUME calls tracehook_notify_resume()
# signal delivery calls tracehook_signal_handler()
The various kerneldoc comments in linux/ptrace.h, linux/tracehook.h,
asm-generic/syscall.h, and linux/regset.h give full details about the
new calls (some calls to make, some calls to define).
There is a little more verbiage you can find from:
http://sourceware.org/systemtap/wiki/utrace/arch
Feel free to edit that wiki, or move the contents someplace else you
like better. (It mentions utrace, but all the substance there is about
HAVE_ARCH_TRACEHOOK and not dependent on utrace, which is not in the kernel.)
Maybe it's useful for more text about this to go into some Documentation/
file (please go right ahead and write one).
Here's the text that's on that wiki:
1. task_pt_regs()
* Define this inline function in asm/processor.h or asm/ptrace.h.
2. arch_has_single_step(), arch_has_block_step()
* If your hardware has single-step and/or block-step support, then
define these macros and related functions.
See the kerneldoc comments in linux/ptrace.h for details.
3. arch_ptrace()
* You must define arch_ptrace() and not #define __ARCH_SYS_PTRACE.
4. compat_arch_ptrace()
* If your arch uses CONFIG_COMPAT, you must define
compat_arch_ptrace() and #define __ARCH_WANT_COMPAT_SYS_PTRACE.
5. linux/regset.h
* You must define user_regset structures and calls for your
machine, and define task_user_regset_view(). The formats must
match those used for core dumps, and have appropriate
.core_note_type fields. See linux/regset.h for details.
6. CORE_DUMP_USE_REGSET
* You must #define CORE_DUMP_USE_REGSET in asm/elf.h and test that
core dumps work via the user_regset interfaces and produce
correct results.
7. asm/syscall.h
* You must supply asm/syscall.h for your arch, with all the
functions (usually inlines) described in asm-generic/syscall.h.
8. TIF_SYSCALL_TRACE
* Setting TIF_SYSCALL_TRACE must cause calls from arch code to
tracehook_report_syscall_entry() and
tracehook_report_syscall_exit() instead of the old ptrace
behavior. Note that the calling arch code should handle the
return value from tracehook_report_syscall_entry(), which is
behavior that was not required for the old ptrace
support. This needs to implement some form of safe abort of
the syscall. See the kerneldoc comments for the exact details.
9. TIF_NOTIFY_RESUME
* You must define the TIF_NOTIFY_RESUME bit. This should behave in
the arch code like TIF_SIGPENDING, i.e. checked when returning to
user mode so you can never miss one. But when TIF_NOTIFY_RESUME
is set, the arch code must do:
clear_thread_flag(TIF_NOTIFY_RESUME);
tracehook_notify_resume(regs);
where regs is the same as task_pt_regs(current). (That is the
only effect of TIF_NOTIFY_RESUME, and it does not affect waits et
al like TIF_SIGPENDING does.) This code path should not
unconditionally go into the signals code, i.e. at some point you
should check TIF_SIGPENDING independently and not enter a
do_signal() path when only TIF_NOTIFY_RESUME is set; this avoids
debugged threads serializing on their shared siglock.
10. tracehook_signal_handler()
* Your signal handling code should call tracehook_signal_handler()
after doing handler setup. This happens after all the signal
magic (sa_mask handling et al), usually the last thing before
returning from do_signal() or a similar function in the arch
code. See linux/tracehook.h for the parameters to pass it.
Thanks,
Roland
^ permalink raw reply [flat|nested] 19+ messages in thread
* CONFIG_HAVE_ARCH_TRACEHOOK and you
2008-09-12 2:57 Roland McGrath
@ 2008-09-12 2:57 ` Roland McGrath
2008-09-12 8:23 ` Christoph Hellwig
` (2 subsequent siblings)
3 siblings, 0 replies; 19+ messages in thread
From: Roland McGrath @ 2008-09-12 2:57 UTC (permalink / raw)
To: linux-arch; +Cc: utrace-devel, linux-kernel
Hi, linux-arch! I'd like to tell you about CONFIG_HAVE_ARCH_TRACEHOOK
and what each arch's maintainers will want to do about it.
In the current tree (since 2.6.27-rc1), arch/Kconfig defines the
internal config item HAVE_ARCH_TRACEHOOK. The idea is to set a new
baseline of what an arch has to implement to support tracing/debugging
of user tasks. After you do these things, then new kinds of tracing
features, new replacements for ptrace, or drastic changes to the
implementation details of ptrace, can come along later and just work on
your arch without every arch maintainer having to worry about the
details of new features or new non-arch implementation details. When
your arch is ready, adding "select HAVE_ARCH_TRACEHOOK" somewhere in an
arch-specific Kconfig file makes known that your arch is compatible with
new code. New features and options will "require HAVE_ARCH_TRACEHOOK".
In 2.6.27 (as of rc6), powerpc{,64} and sparc{,64} have complete support
and do "select HAVE_ARCH_TRACEHOOK". In 2.6.28, x86 will have it (that
work is already done and waiting in the tip/x86/tracehook branch), and I
believe s390 is also already on track to have it done in 2.6.28. Since
2.6.27-rc1, everything mentioned here has been in place in the generic
code, so you can start the changes for your arch as soon as you like.
If there is interest, I can run an informal session at the Kernel Summit
and/or Linux Plumbers Conference next week to walk arch hackers through
each work item. (I don't have any presentation to give or anything, all
the info is in this posting and/or documented in the source code. But
we can work through it all in detail in person for as long as you'd like.)
Let me know ASAP if folks want this, so we can get it scheduled.
It has been suggested that we set a deadline to require every arch to be
compatible and set HAVE_ARCH_TRACEHOOK. If your arch hadn't been
updated after that deadline, then your kernel build might break, or it
might still build but suddenly fail to support the ptrace system call.
I'm not especially suggesting this myself, and I really don't know when
that deadline might be set. But it's possible that in a few releases,
you'd be sorry if you just ignore this stuff now and let it slide.
Of course more arch-specific work is liable to come along in the future
for some fancy new tracing features. I won't try to promise that by
doing this now you won't ever have to think about ptrace and its ilk
again. But it should be enough for a good number of possible new
features and reorganizations to come along and work on your arch without
you having to think about them, or at least not too much.
I did most of this work myself for x86 and powerpc. I found it
advantageous to do it in a large number of tiny incremental patches.
There are many little items, but most are very quick to do (trivial).
With a few exceptions, this is pretty much just renaming and juggling code
you already have. (The user_regset parts are most of the work.) I'll be
glad to help with any specific questions about your arch code, or to
review your arch patches if you CC me.
All of the interfaces are subject to change if the current definitions
make life really hard for some arch. So don't worry about bending over
backward, just pipe up if you see a problem.
The comment in arch/Kconfig lists the items:
# task_pt_regs() in asm/processor.h or asm/ptrace.h
# arch_has_single_step() if there is hardware single-step support
# arch_has_block_step() if there is hardware block-step support
# arch_ptrace() and not #define __ARCH_SYS_PTRACE
# compat_arch_ptrace() and #define __ARCH_WANT_COMPAT_SYS_PTRACE
# asm/syscall.h supplying asm-generic/syscall.h interface
# linux/regset.h user_regset interfaces
# CORE_DUMP_USE_REGSET #define'd in linux/elf.h
# TIF_SYSCALL_TRACE calls tracehook_report_syscall_{entry,exit}
# TIF_NOTIFY_RESUME calls tracehook_notify_resume()
# signal delivery calls tracehook_signal_handler()
The various kerneldoc comments in linux/ptrace.h, linux/tracehook.h,
asm-generic/syscall.h, and linux/regset.h give full details about the
new calls (some calls to make, some calls to define).
There is a little more verbiage you can find from:
http://sourceware.org/systemtap/wiki/utrace/arch
Feel free to edit that wiki, or move the contents someplace else you
like better. (It mentions utrace, but all the substance there is about
HAVE_ARCH_TRACEHOOK and not dependent on utrace, which is not in the kernel.)
Maybe it's useful for more text about this to go into some Documentation/
file (please go right ahead and write one).
Here's the text that's on that wiki:
1. task_pt_regs()
* Define this inline function in asm/processor.h or asm/ptrace.h.
2. arch_has_single_step(), arch_has_block_step()
* If your hardware has single-step and/or block-step support, then
define these macros and related functions.
See the kerneldoc comments in linux/ptrace.h for details.
3. arch_ptrace()
* You must define arch_ptrace() and not #define __ARCH_SYS_PTRACE.
4. compat_arch_ptrace()
* If your arch uses CONFIG_COMPAT, you must define
compat_arch_ptrace() and #define __ARCH_WANT_COMPAT_SYS_PTRACE.
5. linux/regset.h
* You must define user_regset structures and calls for your
machine, and define task_user_regset_view(). The formats must
match those used for core dumps, and have appropriate
.core_note_type fields. See linux/regset.h for details.
6. CORE_DUMP_USE_REGSET
* You must #define CORE_DUMP_USE_REGSET in asm/elf.h and test that
core dumps work via the user_regset interfaces and produce
correct results.
7. asm/syscall.h
* You must supply asm/syscall.h for your arch, with all the
functions (usually inlines) described in asm-generic/syscall.h.
8. TIF_SYSCALL_TRACE
* Setting TIF_SYSCALL_TRACE must cause calls from arch code to
tracehook_report_syscall_entry() and
tracehook_report_syscall_exit() instead of the old ptrace
behavior. Note that the calling arch code should handle the
return value from tracehook_report_syscall_entry(), which is
behavior that was not required for the old ptrace
support. This needs to implement some form of safe abort of
the syscall. See the kerneldoc comments for the exact details.
9. TIF_NOTIFY_RESUME
* You must define the TIF_NOTIFY_RESUME bit. This should behave in
the arch code like TIF_SIGPENDING, i.e. checked when returning to
user mode so you can never miss one. But when TIF_NOTIFY_RESUME
is set, the arch code must do:
clear_thread_flag(TIF_NOTIFY_RESUME);
tracehook_notify_resume(regs);
where regs is the same as task_pt_regs(current). (That is the
only effect of TIF_NOTIFY_RESUME, and it does not affect waits et
al like TIF_SIGPENDING does.) This code path should not
unconditionally go into the signals code, i.e. at some point you
should check TIF_SIGPENDING independently and not enter a
do_signal() path when only TIF_NOTIFY_RESUME is set; this avoids
debugged threads serializing on their shared siglock.
10. tracehook_signal_handler()
* Your signal handling code should call tracehook_signal_handler()
after doing handler setup. This happens after all the signal
magic (sa_mask handling et al), usually the last thing before
returning from do_signal() or a similar function in the arch
code. See linux/tracehook.h for the parameters to pass it.
Thanks,
Roland
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: CONFIG_HAVE_ARCH_TRACEHOOK and you
2008-09-12 2:57 Roland McGrath
2008-09-12 2:57 ` Roland McGrath
@ 2008-09-12 8:23 ` Christoph Hellwig
2008-09-12 13:05 ` Paul Mundt
2008-09-12 13:13 ` Russell King
3 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2008-09-12 8:23 UTC (permalink / raw)
To: Roland McGrath; +Cc: linux-arch, utrace-devel, linux-kernel
On Thu, Sep 11, 2008 at 07:57:33PM -0700, Roland McGrath wrote:
> 3. arch_ptrace()
>
> * You must define arch_ptrace() and not #define __ARCH_SYS_PTRACE.
__ARCH_SYS_PTRACE is already gone :)
>
> 4. compat_arch_ptrace()
>
> * If your arch uses CONFIG_COMPAT, you must define
> compat_arch_ptrace() and #define __ARCH_WANT_COMPAT_SYS_PTRACE.
This should be gone for 2.6.28, too. (no ACK from the ia64 people yet,
but mips and parisc are doen for sure)
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: CONFIG_HAVE_ARCH_TRACEHOOK and you
2008-09-12 2:57 Roland McGrath
2008-09-12 2:57 ` Roland McGrath
2008-09-12 8:23 ` Christoph Hellwig
@ 2008-09-12 13:05 ` Paul Mundt
2008-09-15 20:38 ` Roland McGrath
2008-09-12 13:13 ` Russell King
3 siblings, 1 reply; 19+ messages in thread
From: Paul Mundt @ 2008-09-12 13:05 UTC (permalink / raw)
To: Roland McGrath; +Cc: linux-arch, utrace-devel, linux-kernel
On Thu, Sep 11, 2008 at 07:57:33PM -0700, Roland McGrath wrote:
> The comment in arch/Kconfig lists the items:
>
> # task_pt_regs() in asm/processor.h or asm/ptrace.h
> # arch_has_single_step() if there is hardware single-step support
> # arch_has_block_step() if there is hardware block-step support
> # arch_ptrace() and not #define __ARCH_SYS_PTRACE
> # compat_arch_ptrace() and #define __ARCH_WANT_COMPAT_SYS_PTRACE
> # asm/syscall.h supplying asm-generic/syscall.h interface
> # linux/regset.h user_regset interfaces
> # CORE_DUMP_USE_REGSET #define'd in linux/elf.h
> # TIF_SYSCALL_TRACE calls tracehook_report_syscall_{entry,exit}
> # TIF_NOTIFY_RESUME calls tracehook_notify_resume()
> # signal delivery calls tracehook_signal_handler()
>
[snip]
> Here's the text that's on that wiki:
>
> 1. task_pt_regs()
>
> * Define this inline function in asm/processor.h or asm/ptrace.h.
>
user_stack_pointer() is apparently a requirement, too. Although given
that you already have a task_struct pointer the only place you currently
use it (lib/syscall.c), it makes more sense to use KSTK_ESP/KSTK_EIP
which is provided by almost everyone already.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
---
lib/syscall.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/lib/syscall.c b/lib/syscall.c
index a4f7067..888c36a 100644
--- a/lib/syscall.c
+++ b/lib/syscall.c
@@ -11,8 +11,8 @@ static int collect_syscall(struct task_struct *target, long *callno,
if (unlikely(!regs))
return -EAGAIN;
- *sp = user_stack_pointer(regs);
- *pc = instruction_pointer(regs);
+ *sp = KSTK_ESP(target);
+ *pc = KSTK_EIP(target);
*callno = syscall_get_nr(target, regs);
if (*callno != -1L && maxargs > 0)
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: CONFIG_HAVE_ARCH_TRACEHOOK and you
2008-09-12 2:57 Roland McGrath
` (2 preceding siblings ...)
2008-09-12 13:05 ` Paul Mundt
@ 2008-09-12 13:13 ` Russell King
2008-09-12 21:57 ` David Miller
2008-09-12 22:31 ` Roland McGrath
3 siblings, 2 replies; 19+ messages in thread
From: Russell King @ 2008-09-12 13:13 UTC (permalink / raw)
To: Roland McGrath; +Cc: linux-arch, utrace-devel, linux-kernel
Okay, let's comment on each bit separately.
Regsets
-------
These don't appear to be a problem for ARM, and turn out to be relatively
clean. The only thing I did do was invent some alternative simpler
helper functions rather than using the user_regset_copy* functions (to
avoid taking the address of function arguments, which needlessly forces
them onto the stack.)
However, in looking at other architectures, I notice that sparc does this
when initializing its regsets:
.n = 38 * sizeof(u32),
.size = sizeof(u32), .align = sizeof(u32),
and sparc64:
.n = 36 * sizeof(u64),
.size = sizeof(u64), .align = sizeof(u64),
which, given that fs/binfmt_elf.c does this:
size_t size = regset->n * regset->size;
void *data = kmalloc(size, GFP_KERNEL);
if (unlikely(!data))
return 0;
means sparc ends up allocating 38 * sizeof(u32) * sizeof(u32), and
sparc64 ends up with 36 * sizeof(u64) * sizeof(u64), which must surely
be wrong?
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: CONFIG_HAVE_ARCH_TRACEHOOK and you
@ 2008-09-12 13:40 Russell King
2008-09-12 13:40 ` Russell King
2008-09-12 23:57 ` Roland McGrath
0 siblings, 2 replies; 19+ messages in thread
From: Russell King @ 2008-09-12 13:40 UTC (permalink / raw)
To: Roland McGrath; +Cc: linux-arch, utrace-devel, linux-kernel
ptrace_report_syscall
---------------------
What's the purpose of the PT_PTRACED check in there? The current code
is setup such that you can only start tracing syscalls if the
__ptrace_may_access test succeeds, which will only succeed if PT_PTRACED
is already set. TIF_SYSCALL_TRACE will be cleared before PT_PTRACED is
cleared when stopping ptracing. So, TIF_SYSCALL_TRACE can only ever be
set if PT_PTRACED is already set, and hopefully arch code never tries
to call the syscall tracing hooks if TIF_SYSCALL_TRACE isn't set. That
all means that the PT_PTRACED test seems redundant.
Also, shouldn't ptrace_report_syscall() be in linux/ptrace.h, since it's
just the guts of the existing syscall tracing hook irrespective of the
tracehook stuff?
syscall
-------
* syscall_get_arguments - extract system call parameter values
* @task: task of interest, must be blocked
* @regs: task_pt_regs() of @task
* @i: argument index [0,5]
* @n: number of arguments; n+i must be [1,6].
* @args: array filled with argument values
This is all very well, but adds a lot of complexity for architectures
which isn't currently required. This is fine if you have a sane ABI
where you can just pick the arguments straight out of the registers one
by one.
However, with ARM EABI, we have the situation where, for a syscall such
as:
long sys_foo(int a, long long b, long long c, int d);
the register allocation ends up as:
a => r0
b => r2, r3
c => r4, r5
d => r6
whereas:
long sys_foo(long long a, long long b, int c, int d);
the register allocation ends up as:
a => r0, r1
b => r2, r3
c => r4
d => r5
So, in order to know which register argument N is in, you need to know
all the types of the arguments which come before it. That means
creating and maintaining a table of syscall arguments which sounds
really sucky.
I can think of why you want these interfaces - because you see it as
necessary for things like strace. However, strace needs to know the
type information for each syscall argument in any case, so by putting
the requirement for arg N access into the kernel, you're causing that
type information to be placed not only in strace but also into the
kernel.
It also means that you can't randomly change the syscall number, since
changing the syscall number between those two examples means you need
to reshuffle the registers to make the arguments fit.
To me, this looks like a source of a large can of worms. Is this really
necessary, or can we keep this functionality pushed to where it belongs -
in the very few userspace applications which require it?
I think the rest of syscall.h shouldn't be that much of a problem for ARM.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: CONFIG_HAVE_ARCH_TRACEHOOK and you
2008-09-12 13:40 CONFIG_HAVE_ARCH_TRACEHOOK and you Russell King
@ 2008-09-12 13:40 ` Russell King
2008-09-12 23:57 ` Roland McGrath
1 sibling, 0 replies; 19+ messages in thread
From: Russell King @ 2008-09-12 13:40 UTC (permalink / raw)
To: Roland McGrath; +Cc: linux-arch, utrace-devel, linux-kernel
ptrace_report_syscall
---------------------
What's the purpose of the PT_PTRACED check in there? The current code
is setup such that you can only start tracing syscalls if the
__ptrace_may_access test succeeds, which will only succeed if PT_PTRACED
is already set. TIF_SYSCALL_TRACE will be cleared before PT_PTRACED is
cleared when stopping ptracing. So, TIF_SYSCALL_TRACE can only ever be
set if PT_PTRACED is already set, and hopefully arch code never tries
to call the syscall tracing hooks if TIF_SYSCALL_TRACE isn't set. That
all means that the PT_PTRACED test seems redundant.
Also, shouldn't ptrace_report_syscall() be in linux/ptrace.h, since it's
just the guts of the existing syscall tracing hook irrespective of the
tracehook stuff?
syscall
-------
* syscall_get_arguments - extract system call parameter values
* @task: task of interest, must be blocked
* @regs: task_pt_regs() of @task
* @i: argument index [0,5]
* @n: number of arguments; n+i must be [1,6].
* @args: array filled with argument values
This is all very well, but adds a lot of complexity for architectures
which isn't currently required. This is fine if you have a sane ABI
where you can just pick the arguments straight out of the registers one
by one.
However, with ARM EABI, we have the situation where, for a syscall such
as:
long sys_foo(int a, long long b, long long c, int d);
the register allocation ends up as:
a => r0
b => r2, r3
c => r4, r5
d => r6
whereas:
long sys_foo(long long a, long long b, int c, int d);
the register allocation ends up as:
a => r0, r1
b => r2, r3
c => r4
d => r5
So, in order to know which register argument N is in, you need to know
all the types of the arguments which come before it. That means
creating and maintaining a table of syscall arguments which sounds
really sucky.
I can think of why you want these interfaces - because you see it as
necessary for things like strace. However, strace needs to know the
type information for each syscall argument in any case, so by putting
the requirement for arg N access into the kernel, you're causing that
type information to be placed not only in strace but also into the
kernel.
It also means that you can't randomly change the syscall number, since
changing the syscall number between those two examples means you need
to reshuffle the registers to make the arguments fit.
To me, this looks like a source of a large can of worms. Is this really
necessary, or can we keep this functionality pushed to where it belongs -
in the very few userspace applications which require it?
I think the rest of syscall.h shouldn't be that much of a problem for ARM.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: CONFIG_HAVE_ARCH_TRACEHOOK and you
2008-09-12 13:13 ` Russell King
@ 2008-09-12 21:57 ` David Miller
2008-09-12 22:31 ` Roland McGrath
1 sibling, 0 replies; 19+ messages in thread
From: David Miller @ 2008-09-12 21:57 UTC (permalink / raw)
To: rmk+lkml; +Cc: roland, linux-arch, utrace-devel, linux-kernel
From: Russell King <rmk+lkml@arm.linux.org.uk>
Date: Fri, 12 Sep 2008 14:13:51 +0100
> However, in looking at other architectures, I notice that sparc does this
> when initializing its regsets:
>
> .n = 38 * sizeof(u32),
> .size = sizeof(u32), .align = sizeof(u32),
>
> and sparc64:
>
> .n = 36 * sizeof(u64),
> .size = sizeof(u64), .align = sizeof(u64),
>
> which, given that fs/binfmt_elf.c does this:
>
> size_t size = regset->n * regset->size;
> void *data = kmalloc(size, GFP_KERNEL);
> if (unlikely(!data))
> return 0;
>
> means sparc ends up allocating 38 * sizeof(u32) * sizeof(u32), and
> sparc64 ends up with 36 * sizeof(u64) * sizeof(u64), which must surely
> be wrong?
Yep, definitely a bug, good catch. I guess, better to allocate
too much by accident rather than too little in this case :-)
I'll fix this up, thanks!
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: CONFIG_HAVE_ARCH_TRACEHOOK and you
2008-09-12 13:13 ` Russell King
2008-09-12 21:57 ` David Miller
@ 2008-09-12 22:31 ` Roland McGrath
2008-09-12 22:31 ` Roland McGrath
2008-09-12 22:37 ` David Miller
1 sibling, 2 replies; 19+ messages in thread
From: Roland McGrath @ 2008-09-12 22:31 UTC (permalink / raw)
To: Russell King; +Cc: linux-arch, utrace-devel, linux-kernel
> clean. The only thing I did do was invent some alternative simpler
> helper functions rather than using the user_regset_copy* functions (to
> avoid taking the address of function arguments, which needlessly forces
> them onto the stack.)
Ideally those would get inlined so that doesn't happen. Their only purpose
is to make it easier to write get/set functions in arch code. Whatever you
find preferable in your arch code is certainly fine with me. If you have
something different, or changes to those regset.h functions, that would be
of use for other arch code too, then bring them on (under a new subject
line if you don't mind).
> means sparc ends up allocating 38 * sizeof(u32) * sizeof(u32), and
> sparc64 ends up with 36 * sizeof(u64) * sizeof(u64), which must surely
> be wrong?
Yup! Sure looks like Dave skipped the step where it says "test that core
dumps work ... and produce correct results". :-)
Thanks,
Roland
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: CONFIG_HAVE_ARCH_TRACEHOOK and you
2008-09-12 22:31 ` Roland McGrath
@ 2008-09-12 22:31 ` Roland McGrath
2008-09-12 22:37 ` David Miller
1 sibling, 0 replies; 19+ messages in thread
From: Roland McGrath @ 2008-09-12 22:31 UTC (permalink / raw)
To: Russell King; +Cc: linux-arch, utrace-devel, linux-kernel
> clean. The only thing I did do was invent some alternative simpler
> helper functions rather than using the user_regset_copy* functions (to
> avoid taking the address of function arguments, which needlessly forces
> them onto the stack.)
Ideally those would get inlined so that doesn't happen. Their only purpose
is to make it easier to write get/set functions in arch code. Whatever you
find preferable in your arch code is certainly fine with me. If you have
something different, or changes to those regset.h functions, that would be
of use for other arch code too, then bring them on (under a new subject
line if you don't mind).
> means sparc ends up allocating 38 * sizeof(u32) * sizeof(u32), and
> sparc64 ends up with 36 * sizeof(u64) * sizeof(u64), which must surely
> be wrong?
Yup! Sure looks like Dave skipped the step where it says "test that core
dumps work ... and produce correct results". :-)
Thanks,
Roland
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: CONFIG_HAVE_ARCH_TRACEHOOK and you
2008-09-12 22:31 ` Roland McGrath
2008-09-12 22:31 ` Roland McGrath
@ 2008-09-12 22:37 ` David Miller
2008-09-12 22:39 ` Roland McGrath
1 sibling, 1 reply; 19+ messages in thread
From: David Miller @ 2008-09-12 22:37 UTC (permalink / raw)
To: roland; +Cc: rmk+lkml, linux-arch, utrace-devel, linux-kernel
From: Roland McGrath <roland@redhat.com>
Date: Fri, 12 Sep 2008 15:31:56 -0700 (PDT)
> > means sparc ends up allocating 38 * sizeof(u32) * sizeof(u32), and
> > sparc64 ends up with 36 * sizeof(u64) * sizeof(u64), which must surely
> > be wrong?
>
> Yup! Sure looks like Dave skipped the step where it says "test that core
> dumps work ... and produce correct results". :-)
It was allocating too much memory, which is harmless.
So it did work, and I did test it.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: CONFIG_HAVE_ARCH_TRACEHOOK and you
2008-09-12 22:37 ` David Miller
@ 2008-09-12 22:39 ` Roland McGrath
2008-09-12 22:39 ` Roland McGrath
2008-09-12 22:40 ` David Miller
0 siblings, 2 replies; 19+ messages in thread
From: Roland McGrath @ 2008-09-12 22:39 UTC (permalink / raw)
To: David Miller; +Cc: linux-arch, utrace-devel, rmk+lkml, linux-kernel
> It was allocating too much memory, which is harmless.
Didn't it also write NT_PRFPREG notes of the wrong size?
Thanks,
Roland
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: CONFIG_HAVE_ARCH_TRACEHOOK and you
2008-09-12 22:39 ` Roland McGrath
@ 2008-09-12 22:39 ` Roland McGrath
2008-09-12 22:40 ` David Miller
1 sibling, 0 replies; 19+ messages in thread
From: Roland McGrath @ 2008-09-12 22:39 UTC (permalink / raw)
To: David Miller; +Cc: rmk+lkml, linux-arch, utrace-devel, linux-kernel
> It was allocating too much memory, which is harmless.
Didn't it also write NT_PRFPREG notes of the wrong size?
Thanks,
Roland
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: CONFIG_HAVE_ARCH_TRACEHOOK and you
2008-09-12 22:39 ` Roland McGrath
2008-09-12 22:39 ` Roland McGrath
@ 2008-09-12 22:40 ` David Miller
2008-09-12 22:45 ` Roland McGrath
1 sibling, 1 reply; 19+ messages in thread
From: David Miller @ 2008-09-12 22:40 UTC (permalink / raw)
To: roland; +Cc: rmk+lkml, linux-arch, utrace-devel, linux-kernel
From: Roland McGrath <roland@redhat.com>
Date: Fri, 12 Sep 2008 15:39:26 -0700 (PDT)
> > It was allocating too much memory, which is harmless.
>
> Didn't it also write NT_PRFPREG notes of the wrong size?
Yep, but gdb was "generous in what it received" and happily
read the contents.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: CONFIG_HAVE_ARCH_TRACEHOOK and you
2008-09-12 22:40 ` David Miller
@ 2008-09-12 22:45 ` Roland McGrath
2008-09-12 22:45 ` Roland McGrath
0 siblings, 1 reply; 19+ messages in thread
From: Roland McGrath @ 2008-09-12 22:45 UTC (permalink / raw)
To: David Miller; +Cc: linux-arch, utrace-devel, rmk+lkml, linux-kernel
> > Didn't it also write NT_PRFPREG notes of the wrong size?
>
> Yep, but gdb was "generous in what it received" and happily
> read the contents.
Ah. I've always done core sanity checks with:
1. generate core1 on old kernel
2. generate core2 on new kernel (identical userland scenario)
3. eu-readelf -nl core1 > a
4. eu-readelf -nl core2 > b
5. diff -u a b
Then you can eyeball any expected drift like SP address randomization,
and be suspicious of all other differences. (Of course, I also test
that gdb still likes it.)
Thanks,
Roland
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: CONFIG_HAVE_ARCH_TRACEHOOK and you
2008-09-12 22:45 ` Roland McGrath
@ 2008-09-12 22:45 ` Roland McGrath
0 siblings, 0 replies; 19+ messages in thread
From: Roland McGrath @ 2008-09-12 22:45 UTC (permalink / raw)
To: David Miller; +Cc: rmk+lkml, linux-arch, utrace-devel, linux-kernel
> > Didn't it also write NT_PRFPREG notes of the wrong size?
>
> Yep, but gdb was "generous in what it received" and happily
> read the contents.
Ah. I've always done core sanity checks with:
1. generate core1 on old kernel
2. generate core2 on new kernel (identical userland scenario)
3. eu-readelf -nl core1 > a
4. eu-readelf -nl core2 > b
5. diff -u a b
Then you can eyeball any expected drift like SP address randomization,
and be suspicious of all other differences. (Of course, I also test
that gdb still likes it.)
Thanks,
Roland
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: CONFIG_HAVE_ARCH_TRACEHOOK and you
2008-09-12 13:40 CONFIG_HAVE_ARCH_TRACEHOOK and you Russell King
2008-09-12 13:40 ` Russell King
@ 2008-09-12 23:57 ` Roland McGrath
2008-09-12 23:57 ` Roland McGrath
1 sibling, 1 reply; 19+ messages in thread
From: Roland McGrath @ 2008-09-12 23:57 UTC (permalink / raw)
To: Russell King; +Cc: linux-arch, utrace-devel, linux-kernel
> ptrace_report_syscall
> ---------------------
>
> What's the purpose of the PT_PTRACED check in there? [...]
The short answer is that tracehook_report_syscall_* is just about moving
the current ptrace logic into the shared inline, not changing that logic.
So it's a separate issue. The current logic (in arch/arm/kernel/ptrace.c
too) has the check, I didn't change that.
In fact, I think it's not entirely redundant. When the tracer exits
suddenly without doing PTRACE_DETACH (kill -9 gdb or whatnot), then
->ptrace gets cleared but TIF_SYSCALL_TRACE never gets reset.
(That very arguably should be fixed, but it's not anything new.)
But all that's entirely orthogonal to consolidating each arch's copy of the
code into ptrace_report_syscall, which is the only change to this area of
ptrace that the tracehook migration itself does.
> Also, shouldn't ptrace_report_syscall() be in linux/ptrace.h, since it's
> just the guts of the existing syscall tracing hook irrespective of the
> tracehook stuff?
I have no objection.
> So, in order to know which register argument N is in, you need to know
> all the types of the arguments which come before it. That means
> creating and maintaining a table of syscall arguments which sounds
> really sucky.
I certainly don't propose that you do any of that. I think this is just a
documentation issue. Instead of "syscall arguments", read "syscall
argument registers, in the most naturally useful order for your arch".
Everything about these values, and the syscall # value (when != -1L), is
entirely arch-specific. The asm/syscall.h interfaces aren't meant to make
the meaning of those bits generic.
For example, on all 32-bit machines, __NR_pread64 logically takes 4
arguments but that's in 5 argument registers. In the syscall.h calls,
args[3] means only the 4th argument register, not the whole "4th argument"
(which is (args[3] | (args[4] << 32)) on LE-32).
Indeed, we do expect userland to know exactly which registers, in which
order, we mean by "the 6 syscall arg registers" on the particular arch/ABI.
It may be handy for userland that on some arch's or for many syscalls on
most arch's, this happens to correspond with one argument per register in
the natural order of syscall arguments at source level. But it's
userland's problem to grok the ABI enough to know what these 6 registers
plus the syscall # mean.
The reason we need syscall_get_arguments is that it can be used in places
that the full user_regset calls can't be, such as in task_current_syscall.
It's deep knowledge of the internals of the kernel's entry paths to know
which user registers can and can't be found correctly in pt_regs when
inside a fast-path system call entry. Nothing in userland knows that, nor
should it. (And anyway, the user_regset calls are only kosher to call at
"safe spots" like ptrace/signal stops--so that the arch's regset code is
not obliged to carefully slice and dice for partially-available regsets.)
When inside a syscall, the argument registers were certainly copied in, so
we specify syscall_get_arguments assuming that these can always be got.
Thanks,
Roland
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: CONFIG_HAVE_ARCH_TRACEHOOK and you
2008-09-12 23:57 ` Roland McGrath
@ 2008-09-12 23:57 ` Roland McGrath
0 siblings, 0 replies; 19+ messages in thread
From: Roland McGrath @ 2008-09-12 23:57 UTC (permalink / raw)
To: Russell King; +Cc: linux-arch, utrace-devel, linux-kernel
> ptrace_report_syscall
> ---------------------
>
> What's the purpose of the PT_PTRACED check in there? [...]
The short answer is that tracehook_report_syscall_* is just about moving
the current ptrace logic into the shared inline, not changing that logic.
So it's a separate issue. The current logic (in arch/arm/kernel/ptrace.c
too) has the check, I didn't change that.
In fact, I think it's not entirely redundant. When the tracer exits
suddenly without doing PTRACE_DETACH (kill -9 gdb or whatnot), then
->ptrace gets cleared but TIF_SYSCALL_TRACE never gets reset.
(That very arguably should be fixed, but it's not anything new.)
But all that's entirely orthogonal to consolidating each arch's copy of the
code into ptrace_report_syscall, which is the only change to this area of
ptrace that the tracehook migration itself does.
> Also, shouldn't ptrace_report_syscall() be in linux/ptrace.h, since it's
> just the guts of the existing syscall tracing hook irrespective of the
> tracehook stuff?
I have no objection.
> So, in order to know which register argument N is in, you need to know
> all the types of the arguments which come before it. That means
> creating and maintaining a table of syscall arguments which sounds
> really sucky.
I certainly don't propose that you do any of that. I think this is just a
documentation issue. Instead of "syscall arguments", read "syscall
argument registers, in the most naturally useful order for your arch".
Everything about these values, and the syscall # value (when != -1L), is
entirely arch-specific. The asm/syscall.h interfaces aren't meant to make
the meaning of those bits generic.
For example, on all 32-bit machines, __NR_pread64 logically takes 4
arguments but that's in 5 argument registers. In the syscall.h calls,
args[3] means only the 4th argument register, not the whole "4th argument"
(which is (args[3] | (args[4] << 32)) on LE-32).
Indeed, we do expect userland to know exactly which registers, in which
order, we mean by "the 6 syscall arg registers" on the particular arch/ABI.
It may be handy for userland that on some arch's or for many syscalls on
most arch's, this happens to correspond with one argument per register in
the natural order of syscall arguments at source level. But it's
userland's problem to grok the ABI enough to know what these 6 registers
plus the syscall # mean.
The reason we need syscall_get_arguments is that it can be used in places
that the full user_regset calls can't be, such as in task_current_syscall.
It's deep knowledge of the internals of the kernel's entry paths to know
which user registers can and can't be found correctly in pt_regs when
inside a fast-path system call entry. Nothing in userland knows that, nor
should it. (And anyway, the user_regset calls are only kosher to call at
"safe spots" like ptrace/signal stops--so that the arch's regset code is
not obliged to carefully slice and dice for partially-available regsets.)
When inside a syscall, the argument registers were certainly copied in, so
we specify syscall_get_arguments assuming that these can always be got.
Thanks,
Roland
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: CONFIG_HAVE_ARCH_TRACEHOOK and you
2008-09-12 13:05 ` Paul Mundt
@ 2008-09-15 20:38 ` Roland McGrath
0 siblings, 0 replies; 19+ messages in thread
From: Roland McGrath @ 2008-09-15 20:38 UTC (permalink / raw)
To: Paul Mundt; +Cc: linux-arch, utrace-devel, linux-kernel
> user_stack_pointer() is apparently a requirement, too.
Ah, yes! I'd forgotten about user_stack_pointer() and instruction_pointer().
Those never got properly documented either.
> Although given that you already have a task_struct pointer the only place
> you currently use it (lib/syscall.c), it makes more sense to use
> KSTK_ESP/KSTK_EIP which is provided by almost everyone already.
Almost? It must be everyone, right?
It's used unconditionally in fs/proc/array.c (the only use).
I hadn't noticed these before. They aren't commented anywhere.
I've got to say, too, these are truly dismal names!
Also, I've just noticed that x86-64's user_stack_pointer() is wrong for the
case when a fast-path 64-bit syscall is in progress. To get it right, it
needs a bit from the struct thread_info, so a call that takes task_struct
instead of (or in addition to) pt_regs is certainly right.
These are defined in asm/processor.h, as macros. It would be better if
they could be inlines, but they really can't because asm/processor.h is
before struct task_struct is defined, etc. I wonder if we could move these
to another header where they can be clean inlines. I'd sure like to change
those names while we're at it.
Thoughts?
Thanks,
Roland
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2008-09-15 20:38 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-09-12 13:40 CONFIG_HAVE_ARCH_TRACEHOOK and you Russell King
2008-09-12 13:40 ` Russell King
2008-09-12 23:57 ` Roland McGrath
2008-09-12 23:57 ` Roland McGrath
-- strict thread matches above, loose matches on Subject: below --
2008-09-12 2:57 Roland McGrath
2008-09-12 2:57 ` Roland McGrath
2008-09-12 8:23 ` Christoph Hellwig
2008-09-12 13:05 ` Paul Mundt
2008-09-15 20:38 ` Roland McGrath
2008-09-12 13:13 ` Russell King
2008-09-12 21:57 ` David Miller
2008-09-12 22:31 ` Roland McGrath
2008-09-12 22:31 ` Roland McGrath
2008-09-12 22:37 ` David Miller
2008-09-12 22:39 ` Roland McGrath
2008-09-12 22:39 ` Roland McGrath
2008-09-12 22:40 ` David Miller
2008-09-12 22:45 ` Roland McGrath
2008-09-12 22:45 ` Roland McGrath
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox