From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758083AbcATIcf (ORCPT ); Wed, 20 Jan 2016 03:32:35 -0500 Received: from casper.infradead.org ([85.118.1.10]:51145 "EHLO casper.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753283AbcATIcZ (ORCPT ); Wed, 20 Jan 2016 03:32:25 -0500 Date: Wed, 20 Jan 2016 09:32:22 +0100 From: Peter Zijlstra To: Alexei Starovoitov Cc: Alexander Shishkin , Ingo Molnar , linux-kernel@vger.kernel.org, vince@deater.net, eranian@google.com, Arnaldo Carvalho de Melo , Jiri Olsa Subject: Re: [PATCH v2] perf: Synchronously cleanup child events Message-ID: <20160120083222.GF6357@twins.programming.kicks-ass.net> References: <20160118144410.GS6357@twins.programming.kicks-ass.net> <1453216354-9282-1-git-send-email-alexander.shishkin@linux.intel.com> <20160119200558.GC6357@twins.programming.kicks-ass.net> <20160119215818.GA87036@ast-mbp.thefacebook.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160119215818.GA87036@ast-mbp.thefacebook.com> User-Agent: Mutt/1.5.21 (2012-12-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Jan 19, 2016 at 01:58:19PM -0800, Alexei Starovoitov wrote: > On Tue, Jan 19, 2016 at 09:05:58PM +0100, Peter Zijlstra wrote: > > The most obvious place that generates such magical references would be > > the bpf arraymap doing perf_event_get() on things. There are a few other > > places that take temp references (perf_mmap_close), but those are > > 'short' lived and while ugly will not cause massive grief. The BPF one > > OTOH is a real problem here. > > > > And looking at the BPF stuff, that code seems to assume > > perf_event_kernel_release() := put_event(), so this patch breaks that > > too. > > > > > > Alexei, is there a reason the arraymap stuff needs a perf event ref as > > opposed to a file ref? I'm forever a little confused on how perf<->bpf > > works. > > A file ref will not work, since user space could have closed that > perf_event file to avoid unnecessary FDs. So I'm (possibly again) confused on how BPF works. I thought the reason you handed in perf events from userspace; as opposed to creating your own with perf_event_create_kernel_counter(); was because userspace was interested in the output. Also, BPF should not be a way to get around the filedesc resource limit. > Program only need the stable pointer to 'struct perf_event' which > it will use while running. > At the end it will call perf_event_kernel_release() which > is == put_event(). > It was the case that 'perf_events' were normal refcnt-ed structures > and the last guy frees it. Sort-of, but user events are (or should be, rather) tied to the filedesc to account the resources used. There is also the event->owner field, we track the task that created the event, with your current scheme that is left dangling once userspace closes the last filedesc and you still have a ref open. > This put_event_last() logic definitely looks problematic. > There are no ordering guarantees. > User space may close FD, while struct perf_event is still alive. > The loop around perf_event_last() looks buggy. > I'm obviously missing the main goal of this patch. Right, so the patch in question tries to synchronously clean up everything related to the counter when we close the file. Such that the file better reflects the actual resource usage. Currently we do this async (and with holes). In short, user created event really should be filedesc based, yes we have event references, but those 'should' be short lived.