Date: Wed, 26 Aug 2009 14:32:32 +0200
From: Frederic Weisbecker
To: Heiko Carstens
Cc: Martin Schwidefsky, Mathieu Desnoyers, Hendrik Brueckner, Jason Baron,
    linux-kernel@vger.kernel.org, mingo@elte.hu, laijs@cn.fujitsu.com,
    rostedt@goodmis.org, peterz@infradead.org, jiayingz@google.com,
    mbligh@google.com, lizf@cn.fujitsu.com
Subject: Re: [PATCH 08/12] add trace events for each syscall entry/exit
Message-ID: <20090826123229.GC6009@nowhere>
References: <20090825141547.GE6114@nowhere>
    <20090825160237.GG4639@cetus.boeblingen.de.ibm.com>
    <20090825162004.GA25058@Krystal>
    <20090825165912.GI6114@nowhere>
    <20090825173107.GJ6114@nowhere>
    <20090825183119.GC2448@Krystal>
    <20090826000426.0332cbf4@skybase>
    <20090826073819.GA4749@osiris.boeblingen.de.ibm.com>
In-Reply-To: <20090826073819.GA4749@osiris.boeblingen.de.ibm.com>

On Wed, Aug 26, 2009 at 09:38:20AM +0200, Heiko Carstens wrote:
> On Wed, Aug 26, 2009 at 12:04:26AM +0200, Martin Schwidefsky wrote:
> > On Tue, 25 Aug 2009 14:31:19 -0400
> > Mathieu Desnoyers wrote:
> > > The design proposal for this kthread behavior wrt syscalls is based on a
> > > very specific and current kernel behavior, that may happen to change and
> > > that I have actually seen proven incorrect. For instance, some
> > > proprietary Linux driver does very odd things with system calls within
> > > kernel threads, like invoking them with int 0x80.
>
> That's broken.. some proprietary drivers even change the system call table.
> Do you want to be able to deal with that as well?
>
> > > Yes, this is odd, but do we really want to tie the tracer that much to
> > > the actual OS implementation specificities ?
> > >
> > > That sounds like a recipe for endless breakages and missing bits of
> > > instrumentation.
> > >
> > > So my advice would be: if we want to trace the syscall entry/exit paths,
> > > let's trace them for the _whole_ system, and find ways to make it work
> > > for corner-cases rather than finding clever ways to diminish
> > > instrumentation coverage.
> >
> > I guess that the real reason for the crash is hidden in the initialization
> > of the pt_regs structure of the kernel thread.
>
> On s390 the reason is that the svcnr in the pt_regs structure of the initial
> kernel thread is initialized to 0. svcnr contains the system call number
> and system call number 0 does not exist.
> That's why we have
>
> static inline long syscall_get_nr(struct task_struct *task,
> 				  struct pt_regs *regs)
> {
> 	return regs->svcnr ? regs->svcnr : -1;
> }
>
> Now, if you fork a kernel thread from the initial task the pt_regs structure
> gets copied. Upon ret_from_fork the trace exit path will get -1 for
> syscall_get_nr().
>
> > > Given the ret from fork example happens to be the first event fired
> > > after the thread is created, we should be able to deal with this problem
> > > by initializing the thread structure used by syscall exit tracing to an
> > > initial "ret from fork" value.
> >
> > That is my best guess as well.
>
> What would that value be? __NR_fork?
>
> Syscall tracing of kernel threads seems to be wrong. If somebody would do
> a "modprobe" and the init function of the module would create a kernel thread
> then syscall_get_nr() at the ret_from_fork path of the kernel thread would
> return __NR_init_module. That is of course only true if the old kernel_thread()
> API would be used. For kthread_create() it would return the syscall of the
> thread from which the kthread daemon was forked (the initial process I would
> guess, which was initialized to 0).
>
> So skipping kernel threads at the exit path seems to be the best fix, IMHO ;)

Yeah, we can decide to trace syscalls from kernel, but doing so through
the current syscalls tracepoints is broken.

> ---
>  kernel/trace/trace_syscalls.c |    2 ++
>  1 file changed, 2 insertions(+)
>
> Index: linux-next/kernel/trace/trace_syscalls.c
> ===================================================================
> --- linux-next.orig/kernel/trace/trace_syscalls.c
> +++ linux-next/kernel/trace/trace_syscalls.c
> @@ -253,6 +253,8 @@ void ftrace_syscall_exit(struct pt_regs
>  	struct ring_buffer_event *event;
>  	int syscall_nr;
>
> +	if (!current->mm)
> +		return;

Hendrik Brueckner already beat you at it and sent a patch that ignores
the TIF_SYSCALL_TRACEPOINT setting for the kernel threads.

I'll add your Acked-by on it, thanks!

>  	syscall_nr = syscall_get_nr(current, regs);
>  	if (!test_bit(syscall_nr, enabled_exit_syscalls))
>  		return;
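
To make the failure mode discussed above concrete, here is a small
userspace model in C. It is only a sketch under stated assumptions, not
the kernel code: struct pt_regs_model, struct task_model,
NR_SYSCALLS_MODEL and all *_model helpers are hypothetical stand-ins,
and the bitmap is a plain array rather than the kernel's bitmap API. It
shows why an svcnr of 0 turns into -1, and how checking for a missing mm
before the bitmap lookup keeps a kernel thread from ever reaching that
lookup. The extra range check on the syscall number is a defensive
addition of this sketch and is not part of the quoted patch.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_SYSCALLS_MODEL 64	/* hypothetical table size for this model */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS ((NR_SYSCALLS_MODEL + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Userspace stand-ins: names echo the kernel's, layouts do not. */
struct pt_regs_model {
	unsigned long svcnr;	/* system call number, 0 means "none" */
};

struct task_model {
	void *mm;		/* NULL models a kernel thread */
	struct pt_regs_model regs;
};

/* Model of the enabled_exit_syscalls bitmap. */
static unsigned long enabled_exit_syscalls_model[BITMAP_WORDS];

/* Same logic as the s390 helper quoted above. */
static long syscall_get_nr_model(const struct pt_regs_model *regs)
{
	return regs->svcnr ? (long)regs->svcnr : -1;
}

static void enable_exit_syscall_model(unsigned int nr)
{
	if (nr < NR_SYSCALLS_MODEL)
		enabled_exit_syscalls_model[nr / BITS_PER_WORD] |=
			1UL << (nr % BITS_PER_WORD);
}

static bool test_bit_model(unsigned int nr)
{
	return (enabled_exit_syscalls_model[nr / BITS_PER_WORD] >>
		(nr % BITS_PER_WORD)) & 1UL;
}

/* Returns true if an exit event would be recorded for this task. */
static bool syscall_exit_model(const struct task_model *task)
{
	long nr;

	/* Kernel threads have no mm: skip them, as in the quoted patch. */
	if (!task->mm)
		return false;

	nr = syscall_get_nr_model(&task->regs);

	/* Assumption of this sketch: never index the bitmap with -1. */
	if (nr < 0 || nr >= NR_SYSCALLS_MODEL)
		return false;

	return test_bit_model((unsigned int)nr);
}
```

With this model, a task whose mm is NULL and whose svcnr is 0 returns
early from syscall_exit_model(), while a task with an mm and an enabled
syscall number is reported as traced.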