public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Frederic Weisbecker <fweisbec@gmail.com>
To: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>,
	Masami Hiramatsu <mhiramat@redhat.com>,
	Roland McGrath <roland@redhat.com>, Ingo Molnar <mingo@elte.hu>,
	LKML <linux-kernel@vger.kernel.org>,
	Lai Jiangshan <laijs@cn.fujitsu.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Jiaying Zhang <jiayingz@google.com>,
	Martin Bligh <mbligh@google.com>,
	utrace-devel@redhat.com
Subject: Re: [RFC][PATCH 1/2] tracing/ftrace: syscall tracing infrastructure
Date: Tue, 17 Mar 2009 00:46:58 +0100	[thread overview]
Message-ID: <20090316234657.GC6150@nowhere> (raw)
In-Reply-To: <20090316221800.GE12974@redhat.com>

On Mon, Mar 16, 2009 at 06:18:00PM -0400, Frank Ch. Eigler wrote:
> Hi -
> 
> 
> On Mon, Mar 16, 2009 at 05:45:26PM -0400, Mathieu Desnoyers wrote:
> 
> > [...]
> > > As far as I know, utrace supports multiple trace-engines on a process.
> > > Since ptrace is just an engine of utrace, you can add another engine on utrace.
> > > 
> > > utrace-+-ptrace_engine---owner_process
> > >        |
> > >        +-systemtap_module
> > >        |
> > >        +-ftrace_plugin
> 
> Right.  In this way, utrace is simply a multiplexing intermediary.
> 
> 
> > > Here, Frank had posted an example of utrace->ftrace engine.
> > > http://lkml.org/lkml/2009/1/27/294
> > > 
> > > And here is the latest his patch(which seems to support syscall tracing...)
> > > http://git.kernel.org/?p=linux/kernel/git/frob/linux-2.6-utrace.git;a=blob;f=kernel/trace/trace_process.c;h=619815f6c2543d0d82824139773deb4ca460a280;hb=ab20efa8d8b5ded96e8f8c3663dda3b4cb532124
> > > 
> > 
> > Reminder : we are looking at system-wide tracing here. Here are some
> > comments about the current utrace implementation.
> > 
> > Looking at include/linux/utrace.h from the tree
> > 
> > 17  * A tracing engine starts by calling utrace_attach_task() or
> > 18  * utrace_attach_pid() on the chosen thread, passing in a set of hooks
> > 19  * (&struct utrace_engine_ops), and some associated data.  This produces a
> > 20  * &struct utrace_engine, which is the handle used for all other
> > 21  * operations.  An attached engine has its ops vector, its data, and an
> > 22  * event mask controlled by utrace_set_events().
> > 
> > So if the system has, say 3000 threads, then we have 3000 struct
> > utrace_engine created ? I wonder what effet this could have one
> > cachelines if this is used to trace hot paths like system call
> > entry/exit. Have you benchmarked this kind of scenario under tbench ?
> 
> It has not been a problem, since utrace_engines are designed to be
> lightweight.  Starting or stopping a systemtap script of the form
> 
>     probe process.syscall {}
> 
> appears to have no noticable impact on a tbench suite.
> 
> 
> > 24  * For each event bit that is set, that engine will get the
> > 25  * appropriate ops->report_*() callback when the event occurs.  The
> > 26  * &struct utrace_engine_ops need not provide callbacks for an event
> > 27  * unless the engine sets one of the associated event bits.
> > 
> > Looking at utrace_set_events(), we seem to be limited to 32 events on a
> > 32-bits architectures because it uses a bitmask ? Isn't it a bit small?
> 
> There are only a few types of thread events that involve different
> classes of treatment, or different degrees of freedom in terms of
> interference with the uninstrumented fast path of the threads.
> 
> For example, it does not make sense to have different flag bits for
> different system calls, since choosing to trace *any* system call
> involves taking the thread off of the fast path with the TIF_ flag.
> Once it's off the fast path, it doesn't matter whether the utrace core
> or some client performs syscall discrimination, so it is left to the
> client.
> 
> 
> > 682 /**
> > 683  * utrace_set_events_pid - choose which event reports a tracing engine gets
> > 684  * @pid:                thread to affect
> > 685  * @engine:             attached engine to affect
> > 686  * @eventmask:          new event mask
> > 687  *
> > 688  * This is the same as utrace_set_events(), but takes a &struct pid
> > 689  * pointer rather than a &struct task_struct pointer.  The caller must
> > 690  * hold a ref on @pid, but does not need to worry about the task
> > 691  * staying valid.  If it's been reaped so that @pid points nowhere,
> > 692  * then this call returns -%ESRCH.
> > 
> > 
> > Comments like "but does not need to worry about the task staying valid"
> > does not make me feel safe and comfortable at all, could you explain
> > how you can assume that derefencing an "invalid" pointer will return
> > NULL ?
> 
> (We're doing a final round of "internal" (pre-LKML) reviews of the
> utrace implementation right now on utrace-devel@redhat.com, where such
> comments get fastest attention from the experts.)
> 
> For this particular issue, the utrace documentation file explains the
> liveness rules for the various pointers that can be fed to or received
> from utrace functions.  This is not about "feeling" safe, it's about
> what the mechanism is deliberately designed to permit.
> 
> 
> > About the utrace_attach_task() :
> > 
> > 244         if (unlikely(target->flags & PF_KTHREAD))
> > 245                 /*
> > 246                  * Silly kernel, utrace is for users!
> > 247                  */
> > 248                 return ERR_PTR(-EPERM);
> > 
> > So we cannot trace kernel threads ?
> 
> I'm not quite sure about all the reasons for this, but I believe that
> kernel threads don't tend to engage in job control / signal /
> system-call activities the same way as normal user threads do.
> 


Some of them use some syscalls, but it doesn't involve a user/kernel switch.
So it's not tracable by hooking syscall_entry/exit or using tracehooks.
It would require specific hooks on sys_* functions for that.

So this check is right (writing on each thread info seems somewhat costly so
it's better if it is avoided like here).

Frederic.

 
> > 118 /*
> > 119  * Called without locks, when we might be the first utrace engine to attach.
> > 120  * If this is a newborn thread and we are not the creator, we have to wait
> > 121  * for it.  The creator gets the first chance to attach.  The PF_STARTING
> > 122  * flag is cleared after its report_clone hook has had a chance to run.
> > 123  */
> > 124 static inline int utrace_attach_delay(struct task_struct *target)
> > 125 {
> > 126         if ((target->flags & PF_STARTING) && target->real_parent != current)
> > 127                 do {
> > 128                         schedule_timeout_interruptible(1);
> > 129                         if (signal_pending(current))
> > 130                                 return -ERESTARTNOINTR;
> > 131                 } while (target->flags & PF_STARTING);
> > 132
> > 133         return 0;
> > 134 }
> > 
> > Why do we absolutely have to poll until the thread has started ?
> 
> (I don't know off the top of my head - Roland?)
> 
> 
> > utrace_add_engine()
> >   set_notify_resume(target);
> > 
> > ok, so this is where the TIF_NOTIFY_RESUME thread flag is set. I notice
> > that it is set asynchronously with the execution of the target thread
> > (as I do with my TIF_KERNEL_TRACE thread flag).
> > 
> > However, on x86_64, _TIF_DO_NOTIFY_MASK is only tested in 
> > entry_64.S
> > 
> > int_signal:
> > and
> > retint_signal:
> > 
> > code paths. However, if there is no syscall tracing to do upon syscall
> > entry, the thread flags are not re-read at syscall exit and you will
> > miss the syscall exit returning from your target thread if this thread
> > was blocked while you set its TIF_NOTIFY_RESUME. Or is it handled in
> > some subtle way I did not figure out ? BTW re-reading the TIF flags from
> > the thread_info at syscall exit on the fast path is out of question
> > because it considerably degrades the kernel performances. entry_*.S is
> > a very, very critical path.
> 
> (I don't know off the top of my head - Roland?)
> 
> 
> - FChE


  reply	other threads:[~2009-03-16 23:47 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-03-07  4:52 [RFC][PATCH 0/2] Syscalls tracing Frederic Weisbecker
2009-03-07  4:52 ` [RFC][PATCH 1/2] tracing/ftrace: syscall tracing infrastructure Frederic Weisbecker
2009-03-07  7:49   ` Frederic Weisbecker
2009-03-09 13:52     ` Steven Rostedt
2009-03-09 20:47       ` Frederic Weisbecker
2009-03-10  0:52         ` Steven Rostedt
2009-03-10 10:08           ` Frederic Weisbecker
2009-03-08 16:24   ` Ingo Molnar
2009-03-09 20:40     ` Frederic Weisbecker
2009-03-09 13:44   ` Steven Rostedt
2009-03-09 20:42     ` Frederic Weisbecker
2009-03-13  4:27   ` [tip:tracing/syscalls] tracing/ftrace: syscall tracing infrastructure, basics Frederic Weisbecker
2009-03-16 19:36   ` [RFC][PATCH 1/2] tracing/ftrace: syscall tracing infrastructure Masami Hiramatsu
2009-03-16 20:15     ` Frederic Weisbecker
2009-03-16 20:38       ` Masami Hiramatsu
2009-03-16 21:45         ` Mathieu Desnoyers
2009-03-16 22:18           ` Frank Ch. Eigler
2009-03-16 23:46             ` Frederic Weisbecker [this message]
2009-03-23 16:42             ` Mathieu Desnoyers
2009-03-23 16:52               ` Frank Ch. Eigler
2009-03-23 17:03                 ` Mathieu Desnoyers
2009-03-17  5:24           ` Oleg Nesterov
2009-03-17 16:00             ` Mathieu Desnoyers
2009-03-19 10:34               ` Roland McGrath
2009-03-23 17:33                 ` Mathieu Desnoyers
2009-03-07  4:53 ` [RFC][PATCH 2/2] tracing/x86: basic implementation of syscall tracing for x86-64 Frederic Weisbecker
2009-03-13  4:27   ` [tip:tracing/syscalls] tracing/x86: basic implementation of syscall tracing for x86 Frederic Weisbecker
2009-03-13 16:09   ` [tip:tracing/syscalls] tracing/syscalls: support for syscalls tracing on x86, fix Ingo Molnar
2009-03-13 16:32   ` [RFC][PATCH 2/2] tracing/x86: basic implementation of syscall tracing for x86-64 Mathieu Desnoyers
2009-03-13 16:37     ` Ingo Molnar
2009-03-13 16:50       ` Mathieu Desnoyers
2009-03-15  4:44         ` Ingo Molnar
2009-03-15 15:16           ` Frederic Weisbecker
2009-03-15 19:22           ` Mathieu Desnoyers
2009-03-07 12:15 ` [RFC][PATCH 0/2] Syscalls tracing Peter Zijlstra
2009-03-07 16:02   ` Frederic Weisbecker
2009-03-08 11:24     ` Frederic Weisbecker
2009-03-08 11:28       ` Frederic Weisbecker
2009-03-07 20:26 ` Frank Ch. Eigler
2009-03-07 21:53   ` Frederic Weisbecker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090316234657.GC6150@nowhere \
    --to=fweisbec@gmail.com \
    --cc=fche@redhat.com \
    --cc=jiayingz@google.com \
    --cc=laijs@cn.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@polymtl.ca \
    --cc=mbligh@google.com \
    --cc=mhiramat@redhat.com \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    --cc=roland@redhat.com \
    --cc=rostedt@goodmis.org \
    --cc=utrace-devel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox