From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1161176AbcHEKwX (ORCPT ); Fri, 5 Aug 2016 06:52:23 -0400 Received: from bombadil.infradead.org ([198.137.202.9]:41278 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1759563AbcHEKwV (ORCPT ); Fri, 5 Aug 2016 06:52:21 -0400 Date: Fri, 5 Aug 2016 12:52:09 +0200 From: Peter Zijlstra To: Alexei Starovoitov Cc: Brendan Gregg , Ingo Molnar , Arnaldo Carvalho de Melo , Alexander Shishkin , linux-kernel@vger.kernel.org, Alexei Starovoitov , Wang Nan Subject: Re: [PATCH v2 1/3] perf/core: Add a tracepoint for perf sampling Message-ID: <20160805105209.GR6879@twins.programming.kicks-ass.net> References: <1470192469-11910-1-git-send-email-bgregg@netflix.com> <1470192469-11910-2-git-send-email-bgregg@netflix.com> <20160803094805.GD6879@twins.programming.kicks-ass.net> <20160804142853.GO6862@twins.programming.kicks-ass.net> <20160805014341.GB52225@ast-mbp.thefacebook.com> <20160805052404.GA57782@ast-mbp.thefacebook.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160805052404.GA57782@ast-mbp.thefacebook.com> User-Agent: Mutt/1.5.23.1 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Aug 04, 2016 at 10:24:06PM -0700, Alexei Starovoitov wrote: > tracepoints are actually zero overhead already via static-key mechanism. > I don't think Peter's objection for the tracepoint was due to overhead. Almost 0, they still have some I$ footprint, but yes. My main worry is that we can feed tracepoints into perf, so having tracepoints in perf is tricky. I also don't much like this tracepoint being specific to the hrtimer bits, I can well imagine people wanting to do the same thing for hardware based samples or whatnot. > > The perf:perf_hrtimer probe point is also reading state mid-way > > through a function, so it's not quite as simple as wrapping the > > function pointer. I do like that idea, though, but for things like > > struct file_operations. So what additional state to you need? > > > Currently overflow_handler is set at event alloc time. If we start > > > changing it on the fly with atomic xchg(), afaik things shouldn't > > > break, since each overflow_handler is run to completion and doesn't > > > change global state, right? Yes, or even a simple WRITE_ONCE() to replace it, as long as we make sure to use a READ_ONCE() to load the pointer. As long as we're sure to limit this poking to a single user its fairly simple to get right. The moment there can be concurrency a lot of fail can happen. > instead of adding a tracepoint to perf_swevent_hrtimer we can replace > overflow_handler for that particular event with some form of bpf wrapper. > (probably new bpf program type). Then not only periodic events > will be triggering bpf prog, but pmu events as well. Exactly. > So instead of normal __perf_event_output() writing into ringbuffer, > a bpf prog will be called that can optionally write into different > rb via bpf_perf_event_output. It could even chain and call into the original function once its done and have both outputs. > The question is what to pass into the > program to make the most use out of it. 'struct pt_regs' is done deal. > but perf_sample_data we cannot pass as-is, since it's kernel internal. Urgh, does it have to be stable API? Can't we simply rely on the kernel headers to provide the right structure definition?