From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1161176AbcHEKwX (ORCPT <rfc822;w@1wt.eu>);
	Fri, 5 Aug 2016 06:52:23 -0400
Received: from bombadil.infradead.org ([198.137.202.9]:41278 "EHLO
	bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1759563AbcHEKwV (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Fri, 5 Aug 2016 06:52:21 -0400
Date: Fri, 5 Aug 2016 12:52:09 +0200
From: Peter Zijlstra <peterz@infradead.org>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Brendan Gregg <bgregg@netflix.com>, Ingo Molnar <mingo@redhat.com>,
        Arnaldo Carvalho de Melo <acme@kernel.org>,
        Alexander Shishkin <alexander.shishkin@linux.intel.com>,
        linux-kernel@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
        Wang Nan <wangnan0@huawei.com>
Subject: Re: [PATCH v2 1/3] perf/core: Add a tracepoint for perf sampling
Message-ID: <20160805105209.GR6879@twins.programming.kicks-ass.net>
References: <1470192469-11910-1-git-send-email-bgregg@netflix.com>
 <1470192469-11910-2-git-send-email-bgregg@netflix.com>
 <20160803094805.GD6879@twins.programming.kicks-ass.net>
 <CAJN39oiS1NiPsAZnvvOrO8rLC=k4nVofJ-AkyAQq9Y2PGvSsVw@mail.gmail.com>
 <20160804142853.GO6862@twins.programming.kicks-ass.net>
 <20160805014341.GB52225@ast-mbp.thefacebook.com>
 <CAJN39ogVeq4S6Hhc_qcJQb66gmV85TEvZFB=YWHA0j+V62bHCQ@mail.gmail.com>
 <20160805052404.GA57782@ast-mbp.thefacebook.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160805052404.GA57782@ast-mbp.thefacebook.com>
User-Agent: Mutt/1.5.23.1 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Aug 04, 2016 at 10:24:06PM -0700, Alexei Starovoitov wrote:
> tracepoints are actually zero overhead already via static-key mechanism.
> I don't think Peter's objection for the tracepoint was due to overhead.

Almost 0, they still have some I$ footprint, but yes. My main worry is
that we can feed tracepoints into perf, so having tracepoints in perf is
tricky.

I also don't much like this tracepoint being specific to the hrtimer
bits, I can well imagine people wanting to do the same thing for
hardware based samples or whatnot.

> > The perf:perf_hrtimer probe point is also reading state mid-way
> > through a function, so it's not quite as simple as wrapping the
> > function pointer. I do like that idea, though, but for things like
> > struct file_operations.

So what additional state to you need?

> > > Currently overflow_handler is set at event alloc time. If we start
> > > changing it on the fly with atomic xchg(), afaik things shouldn't
> > > break, since each overflow_handler is run to completion and doesn't
> > > change global state, right?

Yes, or even a simple WRITE_ONCE() to replace it, as long as we make
sure to use a READ_ONCE() to load the pointer.

As long as we're sure to limit this poking to a single user its fairly
simple to get right. The moment there can be concurrency a lot of fail
can happen.

> instead of adding a tracepoint to perf_swevent_hrtimer we can replace
> overflow_handler for that particular event with some form of bpf wrapper.
> (probably new bpf program type). Then not only periodic events
> will be triggering bpf prog, but pmu events as well.

Exactly.

> So instead of normal __perf_event_output() writing into ringbuffer,
> a bpf prog will be called that can optionally write into different
> rb via bpf_perf_event_output. 

It could even chain and call into the original function once its done
and have both outputs.

> The question is what to pass into the
> program to make the most use out of it. 'struct pt_regs' is done deal.
> but perf_sample_data we cannot pass as-is, since it's kernel internal.

Urgh, does it have to be stable API? Can't we simply rely on the kernel
headers to provide the right structure definition?