public inbox for linux-kernel@vger.kernel.org
From: Frederic Weisbecker <fweisbec@gmail.com>
To: "H. Peter Anvin" <hpa@zytor.com>
Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>,
	Ingo Molnar <mingo@kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, David Sharp <dhsharp@google.com>,
	Justin Teravest <teravest@google.com>,
	Laurent Chavey <chavey@google.com>,
	Michael Davidson <md@google.com>,
	x86@kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 4/6] trace: trace syscall in its handler not from ptrace handler
Date: Fri, 30 Mar 2012 13:57:17 +0200
Message-ID: <20120330115715.GB13022@somewhere.redhat.com>
In-Reply-To: <4F74C0B2.1010100@zytor.com>

On Thu, Mar 29, 2012 at 01:06:10PM -0700, H. Peter Anvin wrote:
> On 03/29/2012 12:43 PM, Vaibhav Nagarnaik wrote:
> > 
> > However, we agree that the syscall tracing as implemented currently is
> > a bit unwieldy. We would want to be a part of the re-designing effort
> > if there is a momentum in the community towards that goal. We would be
> > happy to contribute towards this effort.
> > 
> 
> I had a long discussion with Frederic over IRC earlier today.  We came
> up with the following strawman:
> 
> 1. A system call thunk (which could be enabled/disabled by patching the
> syscall table.)  This provides an entry and exit hook, and also sets a
> per-thread flag to capture userspace traffic.
> 
> 2. Instrumenting get_user/put_user/copy_from_user/copy_to_user to
> capture traffic to userspace.  This captures the *full* set of system
> call arguments, including things addressed via pointers.  Furthermore,
> it captures the exact versions fed to or returned from the kernel, and
> deals with data-dependent collection like ioctl().
> 
> This has to be done with extreme care to avoid introducing overhead in
> the no-tracing case, however, as these functions are extraordinarily
> performance sensitive.  This probably will require careful patching in
> the first enable/last disable case.
> 
> 3. There will need to be userspace tools written to decode the resulting
> trace buffer.  This is pretty much needed anyway, but once you throw in
> complex data structures it becomes even more so.  A trace will basically
> consist of:
> 
> SYSCALL_ENTRY <syscall number> <arg1..6>
> COPY_FROM_USER <address> <data>
>   ...
> COPY_TO_USER <address> <data>
>   ...
> SYSCALL_EXIT <return value>
> 
> Outputting this in human-readable format requires some reasonably
> sophisticated logic, but the *HUGE* advantage is that not only is all
> the information there, it is *correct by construction*.
> 
> 	-hpa
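The first two points of the strawman can be modelled in a few dozen lines of userspace C. This is a minimal sketch, not kernel code: the one-entry table, user_mem, capture_user, trace_thunk and the model_copy_*_user() helpers are all hypothetical stand-ins, and errno values are hard-coded. Tracing is switched on per syscall by swapping a table slot for a thunk, and the thunk sets a per-thread flag so that only copies made on behalf of a traced syscall are recorded, in the shape of the trace layout above.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Userspace model of the strawman; every name here is hypothetical, not a kernel API. */
typedef long (*syscall_fn)(long nr, const long args[6]);

static unsigned char user_mem[256];    /* stand-in for the user address space */
static _Thread_local int capture_user; /* per-thread "capture userspace traffic" flag */
static char trace_buf[4096];           /* stand-in for the trace ring buffer */
static size_t trace_len;

/* Append formatted text to trace_buf, truncating rather than overflowing. */
static void emit(const char *fmt, ...)
{
	size_t room = sizeof(trace_buf) - trace_len;
	va_list ap;
	va_start(ap, fmt);
	int n = vsnprintf(trace_buf + trace_len, room, fmt, ap);
	va_end(ap);
	if (n > 0)
		trace_len += ((size_t)n < room) ? (size_t)n : room - 1;
}

/* Hooked copies below: a bounds-checked memcpy, plus a record of the exact
 * bytes that crossed the boundary, made only inside a traced syscall. */
static void log_copy(const char *tag, unsigned long uaddr, const unsigned char *p, size_t n)
{
	if (!capture_user)
		return;
	emit("%s %#lx", tag, uaddr);
	for (size_t i = 0; i < n; i++)
		emit(" %02x", p[i]);
	emit("\n");
}

static int model_copy_from_user(void *dst, unsigned long uaddr, size_t n)
{
	if (uaddr > sizeof(user_mem) || n > sizeof(user_mem) - uaddr)
		return -14; /* -EFAULT */
	memcpy(dst, user_mem + uaddr, n);
	log_copy("COPY_FROM_USER", uaddr, user_mem + uaddr, n);
	return 0;
}

static int model_copy_to_user(unsigned long uaddr, const void *src, size_t n)
{
	if (uaddr > sizeof(user_mem) || n > sizeof(user_mem) - uaddr)
		return -14; /* -EFAULT */
	memcpy(user_mem + uaddr, src, n);
	log_copy("COPY_TO_USER", uaddr, user_mem + uaddr, n);
	return 0;
}

/* Example handler: sum args[1] bytes at user address args[0], store the low byte at args[2]. */
static long sys_model_sum(long nr, const long args[6])
{
	unsigned char in[16], out;
	size_t n = (size_t)args[1];
	long sum = 0;
	(void)nr;
	if (n > sizeof(in) || model_copy_from_user(in, (unsigned long)args[0], n))
		return -14;
	for (size_t i = 0; i < n; i++)
		sum += in[i];
	out = (unsigned char)sum;
	if (model_copy_to_user((unsigned long)args[2], &out, 1))
		return -14;
	return sum;
}

static const syscall_fn real_table[] = { sys_model_sum };
static syscall_fn sys_call_table[] = { sys_model_sum };

/* The thunk: entry record, flag on, real handler, flag off, exit record. */
static long trace_thunk(long nr, const long args[6])
{
	emit("SYSCALL_ENTRY %ld %ld %ld %ld %ld %ld %ld\n", nr,
	     args[0], args[1], args[2], args[3], args[4], args[5]);
	capture_user = 1;
	long ret = real_table[nr](nr, args);
	capture_user = 0;
	emit("SYSCALL_EXIT %ld\n", ret);
	return ret;
}

/* Enabling or disabling tracing of a syscall only patches its table slot. */
static void trace_syscall(long nr, int on)
{
	sys_call_table[nr] = on ? trace_thunk : real_table[nr];
}

static long do_syscall(long nr, const long args[6])
{
	assert(nr >= 0 && (size_t)nr < sizeof(real_table) / sizeof(real_table[0]));
	return sys_call_table[nr](nr, args);
}
```

Calling trace_syscall(0, 1) and then do_syscall(0, args) with args = { 0x10, 3, 0x20 } leaves a SYSCALL_ENTRY, COPY_FROM_USER, COPY_TO_USER, SYSCALL_EXIT sequence in trace_buf. Note that the model's copy helpers still test the flag on every call; in the kernel that test is exactly the no-tracing overhead warned about above, hence the idea of patching the hooks in on first enable and out on last disable.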


Note that we already have the relevant tracepoints in place in the "raw_syscalls"
event subsystem. They are generic: there are only two of them, sys_enter and
sys_exit, and they blindly dump the syscall number along with either the six
raw arguments or the return value:

$ cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/format 
name: sys_enter
ID: 53
format:
        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;
        field:int common_padding;       offset:8;       size:4; signed:1;

        field:long id;  offset:16;      size:8; signed:1;
        field:unsigned long args[6];    offset:24;      size:48;        signed:0;

print fmt: "NR %ld (%lx, %lx, %lx, %lx, %lx, %lx)", REC->id, REC->args[0], REC->args[1], REC->args[2], REC->args[3], REC->args[4], REC->args[5]

$ cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_exit/format 
name: sys_exit
ID: 52
format:
        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;
        field:int common_padding;       offset:8;       size:4; signed:1;

        field:long id;  offset:16;      size:8; signed:1;
        field:long ret; offset:24;      size:8; signed:1;

print fmt: "NR %ld = %ld", REC->id, REC->ret
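On the userspace side, the offsets reported by these format files map directly onto C structs. The sketch below assumes an LP64 target such as x86_64 (which the 8-byte long fields above imply) and assumes `raw` points at a single event payload beginning with common_type; the struct and function names are hypothetical. The static_asserts make the build fail if the compiler's layout ever disagrees with the format files.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace mirrors of the raw_syscalls records (LP64 layout). */
struct raw_sys_enter {
	unsigned short common_type;         /* offset 0: event ID, 53 above */
	unsigned char common_flags;         /* offset 2 */
	unsigned char common_preempt_count; /* offset 3 */
	int common_pid;                     /* offset 4 */
	int common_padding;                 /* offset 8, then 4 bytes of alignment */
	long id;                            /* offset 16: syscall number */
	unsigned long args[6];              /* offset 24, 48 bytes */
};

struct raw_sys_exit {
	unsigned short common_type;         /* offset 0: event ID, 52 above */
	unsigned char common_flags;
	unsigned char common_preempt_count;
	int common_pid;
	int common_padding;
	long id;                            /* offset 16: syscall number */
	long ret;                           /* offset 24: return value */
};

/* Fail the build if the layout disagrees with the format files. */
static_assert(offsetof(struct raw_sys_enter, id) == 16, "sys_enter id offset");
static_assert(offsetof(struct raw_sys_enter, args) == 24, "sys_enter args offset");
static_assert(sizeof(((struct raw_sys_enter *)0)->args) == 48, "sys_enter args size");
static_assert(offsetof(struct raw_sys_exit, ret) == 24, "sys_exit ret offset");

/* Copy one event payload out of a byte buffer; returns -1 if it is too short. */
static int decode_sys_enter(const void *raw, size_t len, struct raw_sys_enter *out)
{
	if (len < sizeof(*out))
		return -1;
	memcpy(out, raw, sizeof(*out));
	return 0;
}

/* Render records the way the "print fmt" lines describe. */
static int format_sys_enter(const struct raw_sys_enter *r, char *buf, size_t len)
{
	return snprintf(buf, len, "NR %ld (%lx, %lx, %lx, %lx, %lx, %lx)", r->id,
			r->args[0], r->args[1], r->args[2], r->args[3], r->args[4], r->args[5]);
}

static int format_sys_exit(const struct raw_sys_exit *r, char *buf, size_t len)
{
	return snprintf(buf, len, "NR %ld = %ld", r->id, r->ret);
}
```

The format_*() helpers simply reproduce the "print fmt" strings, so a sys_enter record with id 0 and arguments 3, 0x1000, 0x40 renders as "NR 0 (3, 1000, 40, 0, 0, 0)".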

What remains to be done on the kernel side is the syscall table patching and the
copy_*_user() tracepoints. Beyond these details, the bulk of the remaining work is in userspace.
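One piece of that userspace work is pairing each sys_enter with the sys_exit of the same task before anything strace-like can be printed. A minimal sketch, assuming events have already been decoded into the hypothetical struct sc_event below and arrive in time order: since a task is inside at most one syscall at a time, one pending slot per pid is enough, and a fresh sys_enter simply replaces a stale one whose exit never arrived (for instance if events were lost).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical pre-decoded event; fields follow the raw_syscalls formats. */
struct sc_event {
	int pid;
	int is_exit;           /* 0 = sys_enter, 1 = sys_exit */
	long id;               /* syscall number */
	unsigned long args[6]; /* sys_enter only */
	long ret;              /* sys_exit only */
};

#define MAX_TASKS 64

/* Pair events by pid and append one "[pid] NR id (args) = ret" line per
 * completed call to out (truncating if out is too small). Unmatched exits
 * are dropped. Returns the number of completed calls. */
static int pair_events(const struct sc_event *ev, size_t n, char *out, size_t outlen)
{
	struct sc_event pending[MAX_TASKS];
	int used[MAX_TASKS] = { 0 };
	int calls = 0;

	assert(outlen > 0);
	out[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		const struct sc_event *e = &ev[i];
		int slot = -1, free_slot = -1;

		for (int j = 0; j < MAX_TASKS; j++) {
			if (used[j] && pending[j].pid == e->pid)
				slot = j;
			else if (!used[j] && free_slot < 0)
				free_slot = j;
		}
		if (!e->is_exit) {
			if (slot < 0)
				slot = free_slot;
			if (slot < 0)
				continue; /* too many tasks in flight: drop the event */
			pending[slot] = *e;
			used[slot] = 1;
		} else if (slot >= 0 && pending[slot].id == e->id) {
			const unsigned long *a = pending[slot].args;
			size_t len = strlen(out);

			snprintf(out + len, outlen - len,
				 "[%d] NR %ld (%lx, %lx, %lx, %lx, %lx, %lx) = %ld\n",
				 e->pid, e->id, a[0], a[1], a[2], a[3], a[4], a[5], e->ret);
			used[slot] = 0;
			calls++;
		}
	}
	return calls;
}
```

Exits with no recorded entry, such as a syscall already in flight when tracing started, are skipped rather than printed with unknown arguments.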


Thread overview: 29+ messages
2012-03-26 18:39 [PATCH 0/6] Enhance and speed up syscall tracing Vaibhav Nagarnaik
2012-03-26 18:39 ` [PATCH 1/6] trace: syscalls.h - cleanup and simplify SYSCALL_METADATA() Vaibhav Nagarnaik
2012-03-26 18:39 ` [PATCH 2/6] trace: add support for 32 bit compat syscalls on x86_64 Vaibhav Nagarnaik
2012-03-27  4:49   ` H. Peter Anvin
2012-03-28 21:10     ` Vaibhav Nagarnaik
2012-03-28 21:11       ` Vaibhav Nagarnaik
2012-03-28 23:00         ` Vaibhav Nagarnaik
2012-03-26 18:39 ` [PATCH 3/6] trace: Refactor ftrace syscall macros to make them more readable Vaibhav Nagarnaik
2012-03-26 18:39 ` [PATCH 4/6] trace: trace syscall in its handler not from ptrace handler Vaibhav Nagarnaik
2012-03-27  5:00   ` H. Peter Anvin
2012-03-28 18:23     ` Vaibhav Nagarnaik
2012-03-29  2:43       ` H. Peter Anvin
2012-03-29  2:59         ` Steven Rostedt
2012-03-29  3:15           ` H. Peter Anvin
2012-03-29  3:02         ` Vaibhav Nagarnaik
2012-03-29  3:16           ` H. Peter Anvin
2012-03-29  6:20           ` Ingo Molnar
2012-03-29 19:02             ` Vaibhav Nagarnaik
2012-03-29 19:12               ` H. Peter Anvin
2012-03-29 19:43                 ` Vaibhav Nagarnaik
2012-03-29 20:06                   ` H. Peter Anvin
2012-03-29 22:40                     ` David Sharp
2012-03-29 22:44                       ` H. Peter Anvin
2012-03-30 12:06                       ` Frederic Weisbecker
2012-03-30 11:57                     ` Frederic Weisbecker [this message]
2012-03-29 22:44                 ` David Sharp
2012-03-29 22:48                   ` H. Peter Anvin
2012-03-26 18:39 ` [PATCH 5/6] trace: raw_syscalls: Mark compat syscalls in the MSB of the syscall number Vaibhav Nagarnaik
2012-03-26 18:39 ` [PATCH 6/6] trace: get rid of the enabled_*_syscalls bitmaps Vaibhav Nagarnaik
