public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>,
	Jason Baron <jbaron@redhat.com>,
	linux-kernel@vger.kernel.org, mingo@elte.hu,
	laijs@cn.fujitsu.com, rostedt@goodmis.org, peterz@infradead.org,
	jiayingz@google.com, mbligh@google.com, lizf@cn.fujitsu.com,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>
Subject: Re: [PATCH 08/12] add trace events for each syscall entry/exit
Date: Tue, 25 Aug 2009 20:42:48 -0400	[thread overview]
Message-ID: <20090826004248.GA5793@Krystal> (raw)
In-Reply-To: <20090826001902.GC9953@nowhere>

* Frederic Weisbecker (fweisbec@gmail.com) wrote:
> On Tue, Aug 25, 2009 at 03:51:11PM -0400, Mathieu Desnoyers wrote:
> > * Frederic Weisbecker (fweisbec@gmail.com) wrote:
> > > On Tue, Aug 25, 2009 at 02:31:19PM -0400, Mathieu Desnoyers wrote:
> > > > (Well, I do not have time currently to look into the gory details
> > > > (sorry), but let's try to take a step back from the problem.)
> > > > 
> > > > The design proposal for this kthread behavior wrt syscalls is based on a
> > > > very specific and current kernel behavior, that may happen to change and
> > > > that I have actually seen proven incorrect. For instance, some
> > > > proprietary Linux driver does very odd things with system calls within
> > > > kernel threads, like invoking them with int 0x80.
> > > > 
> > > > Yes, this is odd, but do we really want to tie the tracer that much to
> > > > the actual OS implementation specificities ?
> > > 
> > > 
> > > I really can't see the point in doing this. I don't expect the kernel
> > > behaviour to change soon and have explicit syscalls interrupts done
> > > from it. It's not about a current kernel implementation fashion,
> > > it's about kernel design sanity that is not likely to go backward.
> > > 
> > > Is it worth it to trace kernel threads, maintain their tracing
> > > specificities (such as workarounds with ret_from_fork that implies)
> > > just because we want to support tracing on some silly proprietary drivers?
> > > 
> > > 
> > > > 
> > > > That sounds like a recipe for endless breakages and missing bits of
> > > > instrumentation.
> > > >
> > > > So my advice would be: if we want to trace the syscall entry/exit paths,
> > > > let's trace them for the _whole_ system, and find ways to make it work
> > > > for corner-cases rather than finding clever ways to diminish
> > > > instrumentation coverage.
> > > 
> > > 
> > > If developers of out of tree drivers want to implement buggy things
> > > that would never be accepted after a minimal review here, and then instrument
> > > their bugs, then I would suggest them to implement their own ad hoc instrumentation,
> > > really :-/
> > > 
> > > What's the point in supporting out of tree bugs?
> > > 
> > > Well, the only advantage of doing this would be to support reverse engineering
> > > in tiny and rare corner cases. Not that worth the effort.
> > > 
> > >  
> > > > Given the ret from fork example happens to be the first event fired
> > > > after the thread is created, we should be able to deal with this problem
> > > > by initializing the thread structure used by syscall exit tracing to an
> > > > initial "ret from fork" value.
> > > > 
> > > > Mathieu
> > > 
> > > 
> > > It means we have to support and check this corner case in every archs
> > > that support syscall tracing, deal with crashes because we omitted it, etc...
> > > 
> > > For all the things I've explained above I don't think it's worth the effort.
> > > 
> > > But it's just my opinion...
> > > 
> > 
> > Then we might want to explicitly require that calls to sys_*() system
> > calls made from within the kernel pass through another instrumentation
> > mechanism. IMHO, that would make sense. It would cover both system calls
> > made from kernel threads and system calls made from within a system call
> > or trap.
> > 
> > Mathieu
> 
> 
> Well, we can't really set a tracepoint per sys_*() function. Or more
> precisely we already have them, automagically generated and relying on
> sysenter ptrace path.
> 
> But if we want to check which syscalls are called from kernel threads, we have:
> 
> - kthread() -> do_exit()
> 
> 
> The entry point of every kernel threads (except "kthreadd") is
> kthread(). It calls do_exit() in the end.
> 
> If we want to trace the exit of a kernel thread, we can put
> a tracepoint there instead of do_exit() which results would
> be intermixed with sys_exit() tracing.
> 
> 
> - kthreadd :: create_kthread() -> kernel_thread() -> do_fork()
> 
> 
> A creation of a thread is the result of the kthreadd thread fork().
> If we want to trace the creation of kernel threads, we can again do that
> in the upper level: kernel_thread().
> 
> But does that inform us about who created the thread? All we would see
> is kthreadd that forks. This is a very poor information compared
> to a userspace fork() that tells us who really created the new process.
> 
> Instead what we want is probably to trace kthread_create() which inserts the
> job of a thread creation in the kthreadd thread, so that we know
> _who_ asked for this thread creation (process that requested it and callsite).
> And that's much more rich in information.
> 
> Well, you can even climb in an upper layer and look if this is a workqueue,
> a kernel/async.c thread, a slow work, etc...
> 
> 
> - kernel_execve() -> sys_execve()
> 
> We can execute user apps from kernel through call_usermodehelper().
> And we can trace kernel_execve() or again in an upper layer
> like call_usermodehelper()
> 
> - ... I guess there are other examples
> 
> The kernel calls syscalls through wrappers, and tracing these wrappers,
> depending of the desired level of informations we want (choose your layer),
> are much more verbose / rich in informations.
> 

What you describe looks a lot like the approach I use in the LTTng tree.
Actually, the main point I am trying to make here is: if we rely only on
tracing at the syscall entry/exit level for, say, monitoring all uses of
e.g. sys_open(), we might be caught offguard by internal sys_open() uses
within the kernel.

sys_open if just an example (and possibly a bad one), but I am just
saying that syscall entry/exit tracing should not be seen as a complete
replacement of tracepoints added within the most important system call
sites if we plan to keep track of the overall kernel activity.

But we can do that incrementally, and it's only partially related to
syscall entry/exit instrumentation. Actually, if we find out that we
have to add instrumentation within the kernel code for a relatively
large quantity of system calls, going through the current effort to
extract the system call arguments might be unnecessary if we eventually
end up extracting those arguments from tracepoints placed in the sys_*()
implementation.

Mathieu



-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

  reply	other threads:[~2009-08-26  0:42 UTC|newest]

Thread overview: 88+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-08-10 20:52 [PATCH 00/12] add syscall tracepoints V3 Jason Baron
2009-08-10 20:52 ` [PATCH 01/12] map syscall name to number Jason Baron
2009-08-10 20:52 ` [PATCH 02/12] call arch_init_ftrace_syscalls at boot Jason Baron
2009-08-10 20:52 ` [PATCH 03/12] add DECLARE_TRACE_WITH_CALLBACK() macro Jason Baron
2009-08-10 20:52 ` [PATCH 04/12] add syscall tracepoints Jason Baron
2009-08-10 20:52 ` [PATCH 05/12] update FTRACE_SYSCALL_MAX Jason Baron
2009-08-11 11:00   ` Frederic Weisbecker
2009-08-11 19:39     ` Matt Fleming
2009-08-24 13:41     ` Paul Mundt
2009-08-24 14:06       ` Jason Baron
2009-08-24 14:15         ` Paul Mundt
2009-08-24 14:34           ` Frederic Weisbecker
2009-08-24 14:37             ` Paul Mundt
2009-08-24 14:42           ` Jason Baron
2009-08-24 14:50             ` Paul Mundt
2009-08-24 18:34               ` Ingo Molnar
2009-08-10 20:52 ` [PATCH 06/12] trace_event - raw_init bailout Jason Baron
2009-08-10 20:52 ` [PATCH 07/12] add ftrace_event_call void * 'data' field Jason Baron
2009-08-11 10:09   ` Frederic Weisbecker
2009-08-17 22:19     ` Steven Rostedt
2009-08-17 23:09       ` Frederic Weisbecker
2009-08-18  0:06         ` Steven Rostedt
2009-08-10 20:52 ` [PATCH 08/12] add trace events for each syscall entry/exit Jason Baron
2009-08-11 10:50   ` Frederic Weisbecker
2009-08-11 11:45     ` Ingo Molnar
2009-08-11 12:01       ` Frederic Weisbecker
2009-08-25 12:50   ` Hendrik Brueckner
2009-08-25 14:15     ` Frederic Weisbecker
2009-08-25 16:02       ` Hendrik Brueckner
2009-08-25 16:20         ` Mathieu Desnoyers
2009-08-25 16:59           ` Frederic Weisbecker
2009-08-25 17:31             ` Frederic Weisbecker
2009-08-25 18:31               ` Mathieu Desnoyers
2009-08-25 19:42                 ` Frederic Weisbecker
2009-08-25 19:51                   ` Mathieu Desnoyers
2009-08-26  0:19                     ` Frederic Weisbecker
2009-08-26  0:42                       ` Mathieu Desnoyers [this message]
2009-08-26  7:28                         ` Ingo Molnar
2009-08-26 17:11                           ` Mathieu Desnoyers
2009-08-26  6:48                   ` Peter Zijlstra
2009-08-25 22:04                 ` Martin Schwidefsky
2009-08-26  7:38                   ` Heiko Carstens
2009-08-26 12:32                     ` Frederic Weisbecker
2009-08-26  6:21                 ` Peter Zijlstra
2009-08-26 17:08                   ` Mathieu Desnoyers
2009-08-26 18:41                     ` Christoph Hellwig
2009-08-26 18:42                       ` Christoph Hellwig
2009-08-26 19:01                         ` Mathieu Desnoyers
2009-08-26  7:10                 ` Peter Zijlstra
2009-08-26 17:10                   ` Mathieu Desnoyers
2009-08-26 17:24                   ` H. Peter Anvin
2009-08-25 17:04           ` Jason Baron
2009-08-25 18:15             ` Mathieu Desnoyers
2009-08-26 12:35         ` Frederic Weisbecker
2009-08-26 12:59           ` Heiko Carstens
2009-08-26 13:30             ` Frederic Weisbecker
2009-08-26 13:48               ` Steven Rostedt
2009-08-26 13:53                 ` Frederic Weisbecker
2009-08-26 14:44                   ` Steven Rostedt
2009-08-26 13:56                 ` Peter Zijlstra
2009-08-26 14:41                   ` Steven Rostedt
2009-08-26 14:10               ` Heiko Carstens
2009-08-26 14:27                 ` Frederic Weisbecker
2009-08-26 14:43                   ` Steven Rostedt
2009-08-26 16:14                     ` Frederic Weisbecker
2009-08-26 14:43                 ` Steven Rostedt
2009-08-26 14:41           ` Hendrik Brueckner
2009-08-28 12:28         ` [tip:tracing/core] tracing: Don't trace kernel thread syscalls tip-bot for Hendrik Brueckner
2009-08-25 21:40     ` [PATCH 08/12] add trace events for each syscall entry/exit Frederic Weisbecker
2009-08-25 22:09       ` Frederic Weisbecker
2009-08-26  7:47         ` Heiko Carstens
2009-08-28 12:27     ` [tip:tracing/core] tracing: Check invalid syscall nr while tracing syscalls tip-bot for Hendrik Brueckner
2009-08-10 20:52 ` [PATCH 09/12] add support traceopint ids Jason Baron
2009-08-11 11:28   ` Frederic Weisbecker
2009-08-10 20:53 ` [PATCH 10/12] add perf counter support Jason Baron
2009-08-11 12:12   ` Frederic Weisbecker
2009-08-11 12:17     ` Ingo Molnar
2009-08-11 12:25       ` Frederic Weisbecker
2009-08-10 20:53 ` [PATCH 11/12] add more namespace area to 'perf list' output Jason Baron
2009-08-10 20:53 ` [PATCH 12/12] convert x86_64 mmap and uname to use DEFINE_SYSCALL Jason Baron
2009-08-25 12:31 ` [PATCH 00/12] add syscall tracepoints V3 - s390 arch update Hendrik Brueckner
2009-08-25 13:52   ` Frederic Weisbecker
2009-08-25 14:39     ` Heiko Carstens
2009-08-25 19:52       ` Frederic Weisbecker
2009-08-25 15:38     ` Hendrik Brueckner
2009-08-26 16:53   ` Frederic Weisbecker
2009-08-27  7:27     ` [PATCH]: tracing: s390 arch updates for tracing syscalls Hendrik Brueckner
2009-08-28 12:27   ` [tip:tracing/core] tracing: Add syscall tracepoints - s390 arch update tip-bot for Hendrik Brueckner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090826004248.GA5793@Krystal \
    --to=mathieu.desnoyers@polymtl.ca \
    --cc=brueckner@linux.vnet.ibm.com \
    --cc=fweisbec@gmail.com \
    --cc=heiko.carstens@de.ibm.com \
    --cc=jbaron@redhat.com \
    --cc=jiayingz@google.com \
    --cc=laijs@cn.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lizf@cn.fujitsu.com \
    --cc=mbligh@google.com \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=schwidefsky@de.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox