From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752539AbbGOQWJ (ORCPT ); Wed, 15 Jul 2015 12:22:09 -0400 Received: from e17.ny.us.ibm.com ([129.33.205.207]:47222 "EHLO e17.ny.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751877AbbGOQWH (ORCPT ); Wed, 15 Jul 2015 12:22:07 -0400 X-Helo: d01dlp02.pok.ibm.com X-MailFrom: aravinda@linux.vnet.ibm.com X-RcptTo: linux-kernel@vger.kernel.org Message-ID: <55A688A0.7070507@linux.vnet.ibm.com> Date: Wed, 15 Jul 2015 21:51:52 +0530 From: Aravinda Prasad User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130510 Thunderbird/17.0.6 MIME-Version: 1.0 To: Peter Zijlstra CC: linux-kernel@vger.kernel.org, rostedt@goodmis.org, mingo@redhat.com, paulus@samba.org, acme@kernel.org, hbathini@linux.vnet.ibm.com, ananth@in.ibm.com Subject: Re: [RFC PATCH] perf: Container-aware tracing support References: <20150715090836.16376.80760.stgit@aravindap> <20150715124741.GL2859@worktop.programming.kicks-ass.net> In-Reply-To: <20150715124741.GL2859@worktop.programming.kicks-ass.net> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 15071516-0041-0000-0000-000000D1A9AF Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wednesday 15 July 2015 06:17 PM, Peter Zijlstra wrote: > On Wed, Jul 15, 2015 at 02:38:36PM +0530, Aravinda Prasad wrote: >> Current tracing infrastructure such as perf and ftrace reports system >> wide data when invoked inside a container. It is required to restrict >> events specific to a container context when such tools are invoked >> inside a container. >> >> This RFC patch supports filtering container specific events, without >> any change in the user interface, when invoked within a container for >> the perf utility; such support needs to be extended to ftrace. This >> patch assumes that the debugfs is available within the container and >> all the processes running inside a container are grouped into a single >> perf_event subsystem of cgroups. This patch piggybacks on the existing >> support available for tracing with cgroups [1] by setting the cgrp >> member of the event structure to the cgroup of the context perf tool >> is invoked from. >> >> However, this patch is not complete and requires more work to fully >> support tracing inside a container. This patch is intended to initiate >> the discussion on having container-aware tracing support. A detailed >> explanation on what is supported and pending issues are mentioned >> below. > > tracing is outside the scope of perf; I suspect you want tracefs to be > sensitive to filesystem namespaces and all that that entails. Yes, tracefs needs to be sensitive to filesystem namespace. I wanted to put together points required for supporting perf/trace inside containers. > >> Cc: Hari Bathini >> Signed-off-by: Aravinda Prasad >> --- >> kernel/events/core.c | 49 +++++++++++++++++++++++++++++++++++-------------- >> 1 file changed, 35 insertions(+), 14 deletions(-) >> >> diff --git a/kernel/events/core.c b/kernel/events/core.c >> index 81aa3a4..f6a1f89 100644 >> --- a/kernel/events/core.c >> +++ b/kernel/events/core.c >> @@ -589,17 +589,38 @@ static inline int perf_cgroup_connect(int fd, struct perf_event *event, >> { >> struct perf_cgroup *cgrp; >> struct cgroup_subsys_state *css; >> - struct fd f = fdget(fd); >> + struct fd f; >> int ret = 0; >> >> - if (!f.file) >> - return -EBADF; >> + if (fd != -1) { >> + f = fdget(fd); >> + if (!f.file) >> + return -EBADF; >> >> - css = css_tryget_online_from_dir(f.file->f_path.dentry, >> + css = css_tryget_online_from_dir(f.file->f_path.dentry, >> &perf_event_cgrp_subsys); >> - if (IS_ERR(css)) { >> - ret = PTR_ERR(css); >> - goto out; >> + if (IS_ERR(css)) { >> + ret = PTR_ERR(css); >> + fdput(f); >> + return ret; >> + } >> + } else if (event->attach_state == PERF_ATTACH_TASK) { >> + /* Tracing on a PID. No need to set event->cgrp */ >> + return ret; >> + } else if (task_active_pid_ns(current) != &init_pid_ns) { > > Why the pid namespace? This comes from my understanding of container -- having at least a separate PID namespace with processes inside a container grouped into a single perf_event cgroups subsystem. I know there are other ways to define a container, however, I thought I start with the above one. > >> + /* Don't set event->cgrp if task belongs to root cgroup */ >> + if (task_css_is_root(current, perf_event_cgrp_id)) >> + return ret; > > So if you have the root perf_cgroup inside your container you can > escape? If we have root perf_cgroup inside the container then even if we set event->cgrp we will be including all processes in the system. Regards, Aravinda > >> + >> + css = task_css(current, perf_event_cgrp_id); >> + if (!css || !css_tryget_online(css)) >> + return -ENOENT; >> + } else { >> + /* >> + * perf invoked from global context and hence don't set >> + * event->cgrp as all the events should be included >> + */ >> + return ret; >> } >> >> cgrp = container_of(css, struct perf_cgroup, css); > -- Regards, Aravinda