Date: Tue, 25 Aug 2009 12:20:04 -0400
From: Mathieu Desnoyers
To: Hendrik Brueckner, Frederic Weisbecker, Jason Baron, linux-kernel@vger.kernel.org, mingo@elte.hu, laijs@cn.fujitsu.com, rostedt@goodmis.org, peterz@infradead.org, jiayingz@google.com, mbligh@google.com, lizf@cn.fujitsu.com, Heiko Carstens, Martin Schwidefsky
Subject: Re: [PATCH 08/12] add trace events for each syscall entry/exit

* Hendrik Brueckner (brueckner@linux.vnet.ibm.com) wrote:
> On Tue, Aug 25, 2009 at 04:15:49PM +0200, Frederic Weisbecker wrote:
> > On Tue, Aug 25, 2009 at 02:50:27PM +0200, Hendrik Brueckner wrote:
> > > There are at least two scenarios where syscall_get_nr() can return -1:
> > >
> > > 1. For example, ptrace stores an invalid syscall number, and thus,
> > >    tracing code resets it.
> > >    (see do_syscall_trace_enter in arch/s390/kernel/ptrace.c)
> > >
> > > 2. The syscall_regfunc() (kernel/tracepoint.c) sets the TIF_SYSCALL_FTRACE
> > >    (now: TIF_SYSCALL_TRACEPOINT) flag for all threads, which includes
> > >    kernel threads.
> > >    However, the ftrace selftest triggers a kernel oops when testing syscall
> > >    trace points:
> > >    - The kernel thread is started as usual (do_fork()),
> > >    - tracing code sets TIF_SYSCALL_FTRACE,
> > >    - the ret_from_fork() function is triggered and starts
> > >      ftrace_syscall_exit() with an invalid syscall number.
> > >
> >
> > I wonder if there is any way to identify such a situation...?
>
> For the second case, it might be an option to avoid setting the
> TIF_SYSCALL_FTRACE flag for kernel threads.
>
> Kernel threads have task_struct->mm set to NULL.
> (Thanks to Heiko for that hint ;-)
>
> The idea is then to check the mm field in syscall_regfunc() and
> set the flag accordingly.
>
> However, I think the patch is an optional add-on because checking
> the syscall number is still required for case 1).
>
> ---
>  kernel/tracepoint.c |    4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> --- a/kernel/tracepoint.c
> +++ b/kernel/tracepoint.c
> @@ -593,7 +593,9 @@ void syscall_regfunc(void)
>  	if (!sys_tracepoint_refcount) {
>  		read_lock_irqsave(&tasklist_lock, flags);
>  		do_each_thread(g, t) {
> -			set_tsk_thread_flag(t, TIF_SYSCALL_FTRACE);
> +			/* Skip kernel threads. */
> +			if (t->mm)
> +				set_tsk_thread_flag(t, TIF_SYSCALL_FTRACE);

Uh? Kernel threads can invoke system calls. There are rare places where
kernel code actually invokes system calls, and I don't see why we should
not deal with them.
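For case 1), where syscall_get_nr() returns -1 no matter what the mm test does, the exit tracer still has to validate the number before using it as an index. Below is a minimal userspace sketch of that check, not the kernel code: NR_SYSCALLS_HYPOTHETICAL, enabled_exit_events and trace_exit_hypothetical() are hypothetical names standing in for the per-syscall metadata lookup.

```c
#include <stdbool.h>

/* Hypothetical size of the syscall metadata table; the real value
 * is architecture-specific. */
#define NR_SYSCALLS_HYPOTHETICAL 512

/* Hypothetical per-syscall enable table, indexed by syscall number. */
static bool enabled_exit_events[NR_SYSCALLS_HYPOTHETICAL];

/*
 * Hypothetical exit hook. syscall_nr is whatever syscall_get_nr()
 * reported; it may be -1 (ptrace reset it, or a kernel thread reached
 * ret_from_fork() with the tracing flag already set).
 * Returns true only if an event would be emitted.
 */
static bool trace_exit_hypothetical(int syscall_nr)
{
	/* Reject negative and out-of-range numbers before indexing. */
	if (syscall_nr < 0 || syscall_nr >= NR_SYSCALLS_HYPOTHETICAL)
		return false;

	/* Only syscalls whose exit event is enabled are traced. */
	if (!enabled_exit_events[syscall_nr])
		return false;

	/* ... record the exit event here ... */
	return true;
}
```

Skipping kernel threads in syscall_regfunc() would only make the -1 path rarer; it would not remove the need for this range check.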
Moreover, the problem you face is more general: if we set the
TIF_SYSCALL_FTRACE flag of a standard thread right in the middle of its
system call, x86_64 re-reads the thread flags on the syscall exit path and
will therefore run the syscall exit tracepoint for a system call whose
entry was never traced.

We could simply initialize the "saved system call id" to something like
-1, so that if we happen to return from a system call whose id was not
recorded at syscall entry, we can tell, because the value is still
uninitialized. We would also need to carefully put back the -1 value
after clearing the thread flag when we stop tracing (while still holding
a mutex).

Mathieu

>  		} while_each_thread(g, t);
>  		read_unlock_irqrestore(&tasklist_lock, flags);
>  	}
>

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
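The -1 sentinel scheme proposed above can be modelled in a few lines of userspace C. This is a sketch under stated assumptions: struct task_hyp, its fields, and the task_init(), on_syscall_entry(), on_syscall_exit() and stop_tracing() helpers are all hypothetical. In the kernel the id would live in per-task state, and stop_tracing() would run with the mutex held, as the mail suggests.

```c
#include <stdbool.h>

/* Sentinel meaning "no syscall id was recorded at entry". */
#define NO_SYSCALL_RECORDED (-1L)

/* Hypothetical per-task state; not real task_struct fields. */
struct task_hyp {
	bool trace_flag;       /* models TIF_SYSCALL_FTRACE */
	long saved_syscall_nr; /* id recorded by the entry tracepoint */
};

/* Start every task with the sentinel, not with a stale id. */
static void task_init(struct task_hyp *t)
{
	t->trace_flag = false;
	t->saved_syscall_nr = NO_SYSCALL_RECORDED;
}

/* Entry tracepoint: records the id only if tracing was on at entry. */
static void on_syscall_entry(struct task_hyp *t, long nr)
{
	if (t->trace_flag)
		t->saved_syscall_nr = nr;
}

/*
 * Exit path: models re-reading the flag on the way out. Returns the id
 * to report, or NO_SYSCALL_RECORDED when the flag was set in the middle
 * of the system call and its entry was therefore never recorded.
 */
static long on_syscall_exit(const struct task_hyp *t)
{
	if (!t->trace_flag)
		return NO_SYSCALL_RECORDED;
	return t->saved_syscall_nr;
}

/* Stop tracing: clear the flag first, then put back the sentinel
 * (in the kernel, while still holding the mutex). */
static void stop_tracing(struct task_hyp *t)
{
	t->trace_flag = false;
	t->saved_syscall_nr = NO_SYSCALL_RECORDED;
}
```

Resetting the sentinel in stop_tracing() is what protects the next tracing session: if the flag is set again in the middle of a system call, the exit path finds -1 rather than an id left over from the previous session.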