From: Jason Baron <jbaron@redhat.com>
To: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>, "H. Peter Anvin" <hpa@zytor.com>,
linux-kernel@vger.kernel.org, laijs@cn.fujitsu.com,
rostedt@goodmis.org, peterz@infradead.org,
mathieu.desnoyers@polymtl.ca, jiayingz@google.com,
mbligh@google.com, roland@redhat.com, fche@redhat.com
Subject: Re: [PATCH 2/2] convert to syscall tracepoints
Date: Tue, 9 Jun 2009 15:17:46 -0400 [thread overview]
Message-ID: <20090609191746.GB3111@redhat.com> (raw)
In-Reply-To: <20090609185320.GC6057@nowhere>
On Tue, Jun 09, 2009 at 08:53:21PM +0200, Frederic Weisbecker wrote:
> On Tue, Jun 09, 2009 at 10:13:09AM -0400, Jason Baron wrote:
> > On Tue, Jun 09, 2009 at 01:02:35AM +0200, Frederic Weisbecker wrote:
> > > On Mon, Jun 08, 2009 at 05:38:33PM -0400, Jason Baron wrote:
> > > > On Mon, Jun 08, 2009 at 11:25:26PM +0200, Ingo Molnar wrote:
> > > > > > On Mon, Jun 08, 2009 at 10:40:56PM +0200, Ingo Molnar wrote:
> > > > > > > * Jason Baron <jbaron@redhat.com> wrote:
> > > > > > >
> > > > > > > > +#ifdef __NR_time
> > > > > > > > +trace_event_syscall(1, time, time_t __user *, tloc);
> > > > > > > > +#endif
> > > > > > > > +
> > > > > > > > +#ifdef __NR_stime
> > > > > > > > +trace_event_syscall(1, stime, time_t __user *, tptr);
> > > > > > > > +#endif
> > > > > > > > +
> > > > > > > > +#ifdef __NR_gettimeofday
> > > > > > > > +trace_event_syscall(2, gettimeofday, struct timeval __user *, tv, struct timezone __user *, tz);
> > > > > > > > +#endif
> > > > > > >
> > > > > > > This could be reduced to a single line: just add a Kconfig entry
> > > > > > > (say TRACE_SYSCALL_TRACEPOINTS) wether an arch supports syscall
> > > > > > > tracepoints, enable it on a sane arch, make sure it has all the
> > > > > > > syscalls and list them ...
> > > > > > >
> > > > > > > As more architectures turn on SYSCALL_TRACEPOINTS, they'll have to
> > > > > > > resolve any deviations in syscall entry points. Ideally we'd have
> > > > > > > one generic table that covers 95% of all syscalls, and the remaining
> > > > > > > 5% in some architecture specific #ifdef section.
> > > > > > >
> > > > > >
> > > > > > true, but this implementation works for all arches now, why would
> > > > > > want to slowly add this over time? [...]
> > > > >
> > > > > Because the current solution is butt-ugly ...
> > > > >
> > > > > > [...] I think its unnecessary work that could be error prone.
> > > > >
> > > > > This area needs cleanups - making it messier doesnt help. (I've
> > > > > Cc:-ed hpa - he has expressed interest in auto-generating all the
> > > > > syscall related details from another angle ...)
> > > > >
> > > > > > > But, more generally, i'm not at all convinced that we need _any_
> > > > > > > of this enumeration. Look how much the above lines duplicate
> > > > > > > DEFINE_SYSCALL macros. Why arent those macros re-used?
> > > > > >
> > > > > > The DEFINE_SYSCALL() are located all over the code in various .c files.
> > > > >
> > > > > yes, and that's good.
> > > > >
> > > > > > Thus, if we define the tracpoints via the DEFINE_SYSCALL() macros
> > > > > > we are going to have 'static inline functions' (which is how
> > > > > > tracepoints are implemented) defined in all these .c files. Now, I
> > > > > > need to call all these 'static inline functions' from ptrace.c.
> > > > > > How do I do that? [...]
> > > > >
> > > > > And that's bad.
> > > > >
> > > > > We dont want a per syscall tracepoint call site. AT ALL.
> > > > >
> > > > > We want to collect the record information, we want to construct
> > > > > /debug/tracing/events/syscalls/ directories with all the proper
> > > > > tracepoint-lookalike entries, and then we want to use the
> > > > > _existing_, _zero overhead_ method implemented by Frederic to get
> > > > > per syscall functionality.
> > > > >
> > > >
> > > > Yes, this can easily be done....but that wasn't the problem I was
> > > > interested in solving. I wanted a per syscall tracepoint site. I thought
> > > > I had been making that clear all along...Please notice that the
> > > > implementation I've proposed obtains the syscall number, and then jumps
> > > > to the appropriate tracepoint and then exits. Its quite efficient. In
> > > > fact, I've enabled all of the syscalls using my proposed method and
> > > > running tbench I'm able to get more throughput then using the current
> > > > syscall method. I've also done 'getpid()' loops and seen no performance
> > > > difference between the approaches. I'm happy to run any other
> > > > benchmarks...
> > > >
> > > > > Have you looked at how the syscall attributes information is
> > > > > constructed by using .section tricks? See:
> > > > > kernel/trace/trace_syscalls.c.
> > > > >
> > > >
> > > > yes, I believe I understand the problem space. I had been talking about
> > > > a per-syscall tracepoint all along...maybe I wasn't clear...
> > > >
> > > > thanks,
> > > >
> > > > -Jason
> > >
> > >
> > > Ok, I understand the problem.
> > > Well, the fact is that we can use the artifact from the current syscall tracer
> > > to solve a part of this problem.
> > >
> > > Currently the syscall tracer does the following:
> > >
> > > - create a section with all syscalls informations, provided by DEFINE_SYSCALL()
> > > That includes the size, type, name of parameters.
> > >
> > > - map a table during the boot which resolves a syscall number to its information
> > > in the syscall metadata section
> > >
> > > - uses a generic "trace_syscall()" (or something like that) in ptrace.c (x86)
> > > which gather informations from the current syscalls (get from the mapped table)
> > > and then send the trace to the ring buffer.
> > >
> > > - have a pretty printing (well, not that much actually) callback which, again,
> > > retrieve the syscall information from its number after getting the trace from
> > > the ring buffer. And then the raw field values aree printed, with the field
> > > names, and their types, optionally.
> > >
> > > Now what I would suggest to avoid this whole listing of syscalls in your patch
> > > is to avoid the use of hardcoded tracepoints.
> > >
> > > We can't really use TRACE_EVENT() here without using the listing you did.
> > > Instead, you could define a custom struct ftrace_event_call from DEFINE_SYSCALL().
> > >
> > > In regfunc() you can turn on TIF_FTRACE (using a refcounter).
> > >
> > > The struct trace_event ftrace_event_type can reuse the existing output callback
> > > for syscall tracing which retrieve the syscall informations.
> > >
> > > void ftrace_raw_event_##call() can be replaced by calling directly the existing
> > > generic callback for syscall tracing trace insertion.
> > >
> > > And the arch mapping table can resolve a syscall number to its matching
> > > event.
> > >
> >
> > hmmm..so I presume this would layer on 2 tracepoints? One in syscall
> > entry and one in exit, presumably passing a 'struct pt_regs'?
>
>
>
> Exactly, instead of having two tracepoints per syscalls, we would have only
> two generic and smart enough to handle all syscalls thanks to the syscalls
> metadata.
>
>
>
> > I think
> > the refcounter would also have to be deeper in the tracepoint
> > infrastructure since the event tracing wouldn't be the only potential
> > user of these tracepoints.
>
>
>
> Yeah. Ie we would have two layers of syscalls tracing (de)activation:
> the TIF flags to enable the whole syscall tracing, and also a state bit
> in syscalls metadata. So that we can activate/deactivate each of them
> independently, like any "normal" tracepoint.
>
> Frederic.
>
ok, this is what I was thinking here as well...I'll code this
implementation up.
thanks,
-Jason
next prev parent reply other threads:[~2009-06-09 19:19 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-06-05 18:07 [PATCH 0/2] convert ftrace syscalls to TRACE_EVENT Jason Baron
2009-06-05 18:08 ` [PATCH 1/2] allow TP_printk() to have no args Jason Baron
2009-06-05 18:08 ` [PATCH 2/2] convert to syscall tracepoints Jason Baron
2009-06-07 13:29 ` Ingo Molnar
2009-06-08 20:24 ` Jason Baron
2009-06-08 20:40 ` Ingo Molnar
2009-06-08 21:11 ` Jason Baron
2009-06-08 21:25 ` Ingo Molnar
2009-06-08 21:38 ` Jason Baron
2009-06-08 22:00 ` Ingo Molnar
2009-06-08 23:02 ` Frederic Weisbecker
2009-06-09 14:13 ` Jason Baron
2009-06-09 18:53 ` Frederic Weisbecker
2009-06-09 19:17 ` Jason Baron [this message]
2009-06-07 19:19 ` Frederic Weisbecker
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090609191746.GB3111@redhat.com \
--to=jbaron@redhat.com \
--cc=fche@redhat.com \
--cc=fweisbec@gmail.com \
--cc=hpa@zytor.com \
--cc=jiayingz@google.com \
--cc=laijs@cn.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@polymtl.ca \
--cc=mbligh@google.com \
--cc=mingo@elte.hu \
--cc=peterz@infradead.org \
--cc=roland@redhat.com \
--cc=rostedt@goodmis.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.