Date: Sun, 4 May 2008 09:48:39 -0400
From: Mathieu Desnoyers
To: Ingo Molnar
Cc: linux-kernel@vger.kernel.org, systemtap@sources.redhat.com, "Frank Ch. Eigler"
Subject: System call instrumentation

Hi Ingo,

I have recently been looking at the system call instrumentation in LTTng. I tried different solutions, e.g. hooking a kernel-wide syscall trace into do_syscall_trace. However, I ended up re-creating a second syscall table made of specialized functions that extract the string and data-structure parameters from user-space. Code duplication is not exactly wanted, so I think the approach originally taken in my patchset is still the best way to go. That approach instruments the kernel code at the sys_* level (e.g. sys_open), which is the earliest level where the parameter information is made available to the kernel.
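The duplication argument can be made concrete with a small user-space model (plain C, not kernel code). It assumes a hypothetical `user_copy_string()` that stands in for a copy-from-user string helper, and a hypothetical one-entry `trace_table[]` that plays the role of the trace-only syscall table. The model counts how often the filename is copied in from user-space under each strategy.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_MAX_MODEL 64

/* Counts how many times a "user" string is copied into kernel memory. */
static int nr_user_copies;

/* Hypothetical stand-in for a copy-from-user string helper. */
static int user_copy_string(char *dst, const char *user_src, size_t len)
{
    size_t n;

    assert(user_src != NULL);
    n = strlen(user_src);
    nr_user_copies++;
    if (n >= len)
        return -1;                      /* name too long */
    memcpy(dst, user_src, n + 1);
    return 0;
}

/* Last traced filename, filled in by either tracing strategy. */
static char traced_name[NAME_MAX_MODEL];

/* Approach A (set aside): a second, trace-only syscall table whose entries
 * re-extract the arguments at syscall entry. Only open() is modelled, at
 * index 0 (an illustrative index, not a real syscall number). */
typedef void (*arg_extractor)(const void *user_arg);

static void extract_open_args(const void *user_arg)
{
    user_copy_string(traced_name, user_arg, sizeof(traced_name));
}

static const arg_extractor trace_table[] = { extract_open_args };

/* Model of the open path: the kernel copies the filename here anyway. */
static int do_sys_open_model(const char *user_name, int trace_here)
{
    char kname[NAME_MAX_MODEL];

    if (user_copy_string(kname, user_name, sizeof(kname)) < 0)
        return -1;
    /* Approach B (proposed): trace where the kernel copy already exists. */
    if (trace_here)
        snprintf(traced_name, sizeof(traced_name), "%s", kname);
    return 3;                           /* pretend file descriptor */
}

/* Approach A: run the extractor at entry, then the real open path. */
static int open_with_entry_table(const char *user_name)
{
    trace_table[0](user_name);
    return do_sys_open_model(user_name, 0);
}

/* Approach B: instrument inside the open path itself. */
static int open_with_inline_trace(const char *user_name)
{
    return do_sys_open_model(user_name, 1);
}
```

In this model, the entry-table approach copies the filename twice (once for the trace, once for the real open). Tracing inside the open path reuses the single copy the kernel already made. Beyond the extra copy, every extractor is code that has to mirror its sys_* counterpart.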
I would still identify execution mode changes the way I do currently, by instrumenting do_syscall_trace. This lets us know as early as possible that the mode has changed from user-space to kernel-space, so we can do time accounting more accurately. I already have the patchset that adds the KERNEL_TRACE thread flag to every architecture. The flag is tested in assembly the same way SYSCALL_TRACE is tested, but it is activated globally by iterating over all threads.

So, for the open() example, the currently proposed scheme for a system call would be:

kernel stack          trace: event name (parameters)
do_syscall_trace()    trace: kernel_arch_syscall_entry (syscall id, instruction pointer)
do_sys_open()         trace: fs_open (fd, filename)
do_syscall_trace()    trace: kernel_arch_syscall_exit (return value)

In this open() example, the filename is only available in do_sys_open, which is called by both sys_open and sys_openat. So the logical instrumentation site is really do_sys_open(). The information about which system call was made is available in the kernel_arch_syscall_entry event. It is no longer present at the do_sys_open level, because this execution path can be reached from more than one syscall.

What do you think of this approach?

Mathieu

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
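The proposed scheme can likewise be sketched as a user-space model. Everything here is illustrative and is not a kernel definition:

* the flag bits
* `syscall_model()`, which stands in for the entry/exit path through do_syscall_trace
* `do_sys_open_model()`, which stands in for the shared open path
* the syscall ids

The instruction pointer parameter is omitted. As a simplification, the fs_open event is gated by the same thread flag. The model shows the global flag activation and the resulting event sequence when both open() and openat() reach the same open path.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative per-thread flag bits, loosely modelled on thread flags. */
#define FLAG_SYSCALL_TRACE (1u << 0)
#define FLAG_KERNEL_TRACE  (1u << 1)

#define NR_THREADS 4
#define MAX_EVENTS 16
#define EVENT_LEN  64

/* Illustrative syscall ids for this model only (not real numbers). */
enum { SYSCALL_ID_OPEN = 1, SYSCALL_ID_OPENAT = 2 };

struct thread_model {
    unsigned int flags;
};

static struct thread_model threads[NR_THREADS];

static char events[MAX_EVENTS][EVENT_LEN];
static int nr_events;
static int next_fd = 3;

static void log_event(const char *text)
{
    assert(nr_events < MAX_EVENTS);
    snprintf(events[nr_events++], EVENT_LEN, "%s", text);
}

/* Count logged events whose text starts with the given prefix. */
static int count_events(const char *prefix)
{
    int count = 0;

    for (int i = 0; i < nr_events; i++)
        if (strncmp(events[i], prefix, strlen(prefix)) == 0)
            count++;
    return count;
}

/* Global activation: set the flag on every thread, so each one takes the
 * traced entry/exit path checked in syscall_model(). */
static void kernel_trace_enable(void)
{
    for (int i = 0; i < NR_THREADS; i++)
        threads[i].flags |= FLAG_KERNEL_TRACE;
}

/* Shared open path: knows fd and filename, not which syscall got here. */
static int do_sys_open_model(struct thread_model *t, const char *filename)
{
    char buf[EVENT_LEN];
    int fd = next_fd++;

    if (t->flags & FLAG_KERNEL_TRACE) {
        snprintf(buf, sizeof(buf), "fs_open fd=%d filename=%s", fd, filename);
        log_event(buf);
    }
    return fd;
}

/* Entry/exit wrapper; open() and openat() both use the same open path. */
static long syscall_model(struct thread_model *t, int id, const char *filename)
{
    int traced = (t->flags & (FLAG_SYSCALL_TRACE | FLAG_KERNEL_TRACE)) != 0;
    char buf[EVENT_LEN];
    long ret;

    if (traced) {
        snprintf(buf, sizeof(buf), "kernel_arch_syscall_entry id=%d", id);
        log_event(buf);
    }
    ret = do_sys_open_model(t, filename);
    if (traced) {
        snprintf(buf, sizeof(buf), "kernel_arch_syscall_exit ret=%ld", ret);
        log_event(buf);
    }
    return ret;
}
```

In this sketch, a reader of the trace recovers which syscall led to a given fs_open event from the kernel_arch_syscall_entry event that precedes it on the same thread.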