From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754897AbYIWD1T (ORCPT ); Mon, 22 Sep 2008 23:27:19 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1754022AbYIWD1J (ORCPT ); Mon, 22 Sep 2008 23:27:09 -0400 Received: from tomts22-srv.bellnexxia.net ([209.226.175.184]:35877 "EHLO tomts22-srv.bellnexxia.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754151AbYIWD1I (ORCPT ); Mon, 22 Sep 2008 23:27:08 -0400 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqEEAPT910hMQWq+/2dsb2JhbACBXbU2gWY Date: Mon, 22 Sep 2008 23:27:07 -0400 From: Mathieu Desnoyers To: Steven Rostedt Cc: "Frank Ch. Eigler" , Martin Bligh , Linux Kernel Mailing List , Linus Torvalds , Thomas Gleixner , od@novell.com Subject: Re: Unified tracing buffer Message-ID: <20080923032707.GJ24937@Krystal> References: <33307c790809191433w246c0283l55a57c196664ce77@mail.gmail.com> <20080919231841.GB3456@redhat.com> <20080920133752.GA23215@Krystal> <20080922184538.GB6349@Krystal> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Content-Disposition: inline In-Reply-To: X-Editor: vi X-Info: http://krystal.dyndns.org:8080 X-Operating-System: Linux/2.6.21.3-grsec (i686) X-Uptime: 23:14:33 up 110 days, 7:54, 7 users, load average: 0.62, 0.59, 0.58 User-Agent: Mutt/1.5.16 (2007-06-11) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org * Steven Rostedt (rostedt@goodmis.org) wrote: > > On Mon, 22 Sep 2008, Mathieu Desnoyers wrote: > > * Steven Rostedt (rostedt@goodmis.org) wrote: [...] > > > and then to turn on function tracing, I need to hook into this marker. I'd > > > rather just push the data right into the buffer here without having to > > > make another function call to hook into this. > > > > > > > The scheme you propose here is based on a few inherent assumptions : > > > > - You assume ring_buffer_reserve() and ring_buffer_commit() are static > > inline and thus does not turn into function calls. > > - You assume these are small enough so they can be inlined without > > causing L1 insn cache trashing when tracing is activated. > > - You therefore assume they use a locking scheme that lets them be > > really really compact (e.g. interrupt disable and spin lock). > > - You assume that the performance impact of doing a function call is > > bigger than the impact of locking, which is false by at least a factor > > 10. > > I don't assume anything. I will have the requirement that reserve and > commit must be paired, and for the first version, hold locks. > By saying you don't want to do any function call, the only technical reason I see for you wanting that is performance, and thus you would assume the above. If not, why don't you want to make another function call ? This all I mean by "assumption" here. > Maybe I should rename it to: ring_buffer_lock_reserve and > ring_buffer_unlock_commit. To show this. > > [...] > > > kernel/trace/ftrace.c : > > > > /* > > * the following macro would only do the "declaration" part of the > > * markers, without doing all the function call stuff. > > */ > > DECLARE_MARKER(function_entry, > > "pid %d pc %d flags %lu func 0x%lX parent 0x%lX"); > > > > void ftrace_mcount(unsigned long ip, unsigned long parent_ip) > > { > > size_t ev_size = 0; > > char *buffer; > > > > /* > > * We assume event payload aligned on sizeof(void *). > > * Event size calculated statically. > > */ > > ev_size += sizeof(int); > > ev_size += var_align(ev_size, sizeof(int)); > > ev_size += sizeof(int); > > ev_size += var_align(ev_size, sizeof(unsigned long)); > > ev_size += sizeof(unsigned long); > > ev_size += var_align(ev_size, sizeof(unsigned long)); > > ev_size += sizeof(unsigned long); > > ev_size += var_align(ev_size, sizeof(unsigned long)); > > ev_size += sizeof(unsigned long); > > > > /* > > * Now reserve space and copy data. > > */ > > buffer = ring_buffer_reserve(func_event_id, ev_size); > > /* Write pid */ > > *(int *)buffer = current->pid; > > buffer += sizeof(int); > > > > /* Write pc */ > > buffer += var_align(buffer, sizeof(int)); > > *(int *)buffer = preempt_count(); > > buffer += sizeof(int); > > > > /* Write flags */ > > buffer += var_align(buffer, sizeof(unsigned long)); > > *(unsigned long *)buffer = local_irq_flags(); > > buffer += sizeof(unsigned long); > > > > /* Write func */ > > buffer += var_align(buffer, sizeof(unsigned long)); > > *(unsigned long *)buffer = func; > > buffer += sizeof(unsigned long); > > > > /* Write parent */ > > buffer += var_align(buffer, sizeof(unsigned long)); > > *(unsigned long *)buffer = parent; > > buffer += sizeof(unsigned long); > > > > ring_buffer_commit(buffer, ev_size); > > } > > > > > > Would that be suitable for you ? > > YUCK YUCK YUCK!!!! > > Mathieu, > > Do I have to bring up the argument of simplicity again? I will never use > such an API. Mine was very simple, I have to spend 10 minutes trying to > figure out what the above is. I only spent 5 so I'm still at a lost. > I was actually waiting for you to propose an alternative, but I fear you already did without me noticing :) How do you deal with exporting data across kernel/user boundary in your proposal exactly ? How does this work on architecture with 64-bits kernel and 32-bits userland... ? A simple C structure copy might be simple to _code_, but hellish to export to userspace and lead to hard to debug binary incompatibilities (different gcc flags, 32/64 bits user/kernel). And this is without telling about the non-portability of the exported data. If gcc/icc-knowledgeful people can reassure me by certifying it won't generate a mess, fine, but until then, I stay very doubtful about solutions involving to imply binary compability between kernel and userland. And common.. 10 minutes to understand the above code. Your _are_ kidding me right ? Would that help if I create a small 4 lineish wrapper around the buffer write ? Mathieu -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68