From: Frederic Weisbecker <fweisbec@gmail.com>
To: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>,
	Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>,
	Hendrik Brueckner <brueckner@linux.vnet.ibm.com>,
	Jason Baron <jbaron@redhat.com>,
	linux-kernel@vger.kernel.org, mingo@elte.hu,
	laijs@cn.fujitsu.com, rostedt@goodmis.org, peterz@infradead.org,
	jiayingz@google.com, mbligh@google.com, lizf@cn.fujitsu.com
Subject: Re: [PATCH 08/12] add trace events for each syscall entry/exit
Date: Wed, 26 Aug 2009 14:32:32 +0200
Message-ID: <20090826123229.GC6009@nowhere>
In-Reply-To: <20090826073819.GA4749@osiris.boeblingen.de.ibm.com>

On Wed, Aug 26, 2009 at 09:38:20AM +0200, Heiko Carstens wrote:
> On Wed, Aug 26, 2009 at 12:04:26AM +0200, Martin Schwidefsky wrote:
> > On Tue, 25 Aug 2009 14:31:19 -0400
> > Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> > > The design proposal for this kthread behavior wrt syscalls is based on a
> > > very specific and current kernel behavior, that may happen to change and
> > > that I have actually seen proven incorrect. For instance, some
> > > proprietary Linux driver does very odd things with system calls within
> > > kernel threads, like invoking them with int 0x80.
> 
> That's broken.. some proprietary drivers even change the system call table.
> Do you want to be able to deal with that as well?
> 
> > > Yes, this is odd, but do we really want to tie the tracer that much to
> > > the actual OS implementation specificities ?
> > > 
> > > That sounds like a recipe for endless breakages and missing bits of
> > > instrumentation.
> > > 
> > > So my advice would be: if we want to trace the syscall entry/exit paths,
> > > let's trace them for the _whole_ system, and find ways to make it work
> > > for corner-cases rather than finding clever ways to diminish
> > > instrumentation coverage.
> > 
> > I guess that the real reason for the crash is hidden in the initialization
> > of the pt_regs structure of the kernel thread.
> 
> On s390 the reason is that the svcnr in the pt_regs structure of the initial
> kernel thread is initialized to 0. svcnr contains the system call number
> and system call number 0 does not exist.
> That's why we have
> 
> static inline long syscall_get_nr(struct task_struct *task,
> 				  struct pt_regs *regs)
> {
> 	return regs->svcnr ? regs->svcnr : -1;
> }
> 
> Now, if you fork a kernel thread from the initial task the pt_regs structure
> gets copied. Upon ret_from_fork the trace exit path will get -1 for
> syscall_get_nr().
>  
> > > Given the ret from fork example happens to be the first event fired
> > > after the thread is created, we should be able to deal with this problem
> > > by initializing the thread structure used by syscall exit tracing to an
> > > initial "ret from fork" value.
> > 
> > That is my best guess as well.
> 
> What would that value be? __NR_fork?
> 
> Syscall tracing of kernel threads seems to be wrong. If somebody would do
> a "modprobe" and the init function of the module would create a kernel thread
> then syscall_get_nr() at the ret_from_fork path of the kernel thread would
> return __NR_init_module. That is of course only true if the old kernel_thread()
> API would be used. For kthread_create() it would return the syscall of the
> thread from which the kthread daemon was forked (the initial process I would
> guess, which was initialized to 0).
> 
> So skipping kernel threads at the exit path seems to be the best fix, IMHO ;)


Yeah, we can decide to trace syscalls issued from kernel threads, but doing
so through the current syscall tracepoints is broken.


 
> ---
>  kernel/trace/trace_syscalls.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> Index: linux-next/kernel/trace/trace_syscalls.c
> ===================================================================
> --- linux-next.orig/kernel/trace/trace_syscalls.c
> +++ linux-next/kernel/trace/trace_syscalls.c
> @@ -253,6 +253,8 @@ void ftrace_syscall_exit(struct pt_regs 
>  	struct ring_buffer_event *event;
>  	int syscall_nr;
>  
> +	if (!current->mm)
> +		return;


Hendrik Brueckner already beat you to it and sent
a patch that ignores the TIF_SYSCALL_TRACEPOINT setting for
kernel threads.

I'll add your Acked-by to it, thanks!


>  	syscall_nr = syscall_get_nr(current, regs);
>  	if (!test_bit(syscall_nr, enabled_exit_syscalls))
>  		return;
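
For readers following along, here is a minimal user-space sketch of the
failure mode discussed above and of the guard in the quoted hunk. It is
not the kernel code: the struct layouts, NR_SYSCALLS, the 64-bit bitmap
and the helper names enabled_bit() and syscall_exit_would_trace() are
all assumptions made for illustration. Only the body of
syscall_get_nr() mirrors the s390 helper quoted earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_SYSCALLS 64 /* hypothetical table size, chosen to fit one word */

/* Simplified stand-ins for the kernel structures (assumptions). */
struct pt_regs { unsigned int svcnr; };
struct mm_struct { int unused; };
struct task_struct {
	struct mm_struct *mm;	/* NULL models a kernel thread */
	struct pt_regs regs;
};

/* One bit per syscall, modelling enabled_exit_syscalls. */
static unsigned long long enabled_exit_syscalls;

/* Same logic as the s390 helper quoted above: 0 means "no syscall". */
static long syscall_get_nr(const struct task_struct *task,
			   const struct pt_regs *regs)
{
	(void)task;
	return regs->svcnr ? (long)regs->svcnr : -1;
}

/* Bitmap lookup; an out-of-range index must never reach this point. */
static bool enabled_bit(long nr)
{
	assert(nr >= 0 && nr < NR_SYSCALLS);
	return ((enabled_exit_syscalls >> nr) & 1ULL) != 0;
}

/* Exit-path decision with both guards applied before the lookup. */
static bool syscall_exit_would_trace(const struct task_struct *task)
{
	long nr;

	if (!task->mm)		/* kernel thread: skip, as in the hunk */
		return false;

	nr = syscall_get_nr(task, &task->regs);
	if (nr < 0 || nr >= NR_SYSCALLS)	/* invalid syscall number */
		return false;

	return enabled_bit(nr);
}
```

Without the two early returns, a task copied from the initial kernel
thread (svcnr == 0) would hand -1 to the bitmap lookup, which is the
situation described for the ret_from_fork path.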

