From: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
To: a.p.zijlstra@chello.nl, linux-kernel@vger.kernel.org,
rostedt@goodmis.org, mingo@redhat.com, paulus@samba.org,
acme@kernel.org
Cc: hbathini@linux.vnet.ibm.com, ananth@in.ibm.com
Subject: [RFC PATCH] perf: Container-aware tracing support
Date: Wed, 15 Jul 2015 14:38:36 +0530
Message-ID: <20150715090836.16376.80760.stgit@aravindap>
Current tracing infrastructure such as perf and ftrace reports
system-wide data even when invoked inside a container. When such tools
are invoked inside a container, the reported events should be
restricted to that container's context.
This RFC patch adds filtering of container-specific events to the perf
utility when it is invoked within a container, without any change to
the user interface; such support still needs to be extended to ftrace.
The patch assumes that debugfs is available within the container and
that all processes running inside a container are grouped into a
single cgroup of the perf_event subsystem. It piggybacks on the
existing support for tracing with cgroups [1] by setting the cgrp
member of the event structure to the cgroup of the context from which
the perf tool is invoked.
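One way to check the grouping assumption from user space is to look at
the perf_event entry in /proc/<pid>/cgroup. The following is only a
minimal sketch, assuming the cgroup v1 line format
"hierarchy-ID:controller-list:path"; the helper name perf_cgroup_path()
is hypothetical and not part of this patch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): given one line from
 * /proc/<pid>/cgroup in the cgroup v1 format "ID:controllers:path",
 * copy the path into out if the controller list contains perf_event.
 * Returns 0 on success, -1 if the line does not match.
 */
static int perf_cgroup_path(const char *line, char *out, size_t outlen)
{
	const char *ctrl, *path, *p;
	size_t len;

	ctrl = strchr(line, ':');
	if (!ctrl)
		return -1;
	ctrl++;
	path = strchr(ctrl, ':');
	if (!path)
		return -1;

	/* Walk the comma-separated controller list. */
	for (p = ctrl; p < path; p += len + 1) {
		len = strcspn(p, ",:");
		if (len == strlen("perf_event") &&
		    !strncmp(p, "perf_event", len))
			break;
	}
	if (p >= path)
		return -1;

	/* Copy the cgroup path without the trailing newline. */
	path++;
	len = strcspn(path, "\n");
	if (len >= outlen)
		return -1;
	memcpy(out, path, len);
	out[len] = '\0';
	return 0;
}
```

If every process of a container yields the same non-root path here,
the container matches the assumption made above.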
However, this patch is not complete and more work is required to fully
support tracing inside a container. It is intended to initiate the
discussion on container-aware tracing support. What is supported and
which issues are still pending is explained in detail below.
Suggestions, feedback, flames are welcome.
[1] https://lkml.org/lkml/2011/2/14/40
--------------------------------------------------------------------
Details:
With this patch, perf-stat, perf-record (tracepoints, [ku]probes) and
perf-top, when executed within a container, report only events that
are triggered in that container's context. However, there are a couple
of limitations on how this works for kprobes/uprobes and for the ftrace
infrastructure in general.
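The choice of which cgroup, if any, ends up filtering an event can be
summarized as a small decision function. The following is a user-space
model for illustration only, not kernel code; the enum, the struct and
the function name pick_cgrp_source() are all hypothetical, and the
boolean fields stand in for the kernel-side checks made in
perf_cgroup_connect() in the diff below.

```c
#include <stdbool.h>

/* Where the event's cgroup filter comes from; names are illustrative. */
enum cgrp_source {
	CGRP_FROM_FD,		/* caller passed a cgroup directory fd */
	CGRP_FROM_CURRENT,	/* use the cgroup of the invoking task */
	CGRP_NONE,		/* leave event->cgrp unset */
};

/* Inputs to the decision, modeled as plain values (assumption). */
struct connect_ctx {
	int cgroup_fd;		/* -1 when no cgroup fd was passed */
	bool attach_task;	/* event is attached to a specific PID */
	bool in_init_pid_ns;	/* caller is in the initial PID namespace */
	bool in_root_cgroup;	/* caller is in the root perf_event cgroup */
};

static enum cgrp_source pick_cgrp_source(const struct connect_ctx *c)
{
	/* An explicit cgroup fd keeps the existing behaviour. */
	if (c->cgroup_fd != -1)
		return CGRP_FROM_FD;

	/* Tracing a PID: no cgroup filter is needed. */
	if (c->attach_task)
		return CGRP_NONE;

	/* Inside a container and not in the root cgroup: filter. */
	if (!c->in_init_pid_ns && !c->in_root_cgroup)
		return CGRP_FROM_CURRENT;

	/* Global context, or root cgroup: include all events. */
	return CGRP_NONE;
}
```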
The problem arises from the use of the files
/sys/kernel/debug/tracing/[uk]probe_events. The perf utility inserts a
probe by writing into the [uk]probe_events file, which the kernel
parses to register an event. When debugfs is mounted inside
containers, the contents of these files are visible to all containers.
This implies that a user within a container can list or delete probes
registered by other containers, leading to security issues and/or
denial of service (e.g., by deleting a probe from another container
every time it is registered). This could be undesirable depending on
the way containers are used (e.g., in a multi-tenant setup with each
user assigned a container).
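To make the interference concrete, here is a minimal sketch of the
command strings involved, assuming the documented kprobe_events syntax
("p:GROUP/EVENT SYMBOL" to add a probe, "-:GROUP/EVENT" to remove one).
The helper names, and any group and event names passed to them, are
hypothetical and only for illustration.

```c
#include <stdio.h>

/* Format the command that adds a kprobe: "p:GROUP/EVENT SYMBOL". */
static int fmt_add_kprobe(char *buf, size_t len, const char *group,
			  const char *event, const char *symbol)
{
	return snprintf(buf, len, "p:%s/%s %s", group, event, symbol);
}

/* Format the command that removes a probe: "-:GROUP/EVENT". */
static int fmt_del_probe(char *buf, size_t len, const char *group,
			 const char *event)
{
	return snprintf(buf, len, "-:%s/%s", group, event);
}
```

Both strings are written to the same kprobe_events file, and nothing in
the removal command identifies who registered the probe. Any task that
can write the file can therefore remove a probe added from another
container.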
The issues mentioned above exist for any tracing infrastructure that
uses the ftrace interface. One approach is to provide a
container-specific view of the files under /sys/kernel/debug/tracing.
At the moment, this seems to require a significant rework of ftrace.
We are looking for feedback on the assumption that the processes
running inside a container are grouped into a single perf_event
cgroup, and for any thoughts on extending such support to ftrace.
Regards,
Aravinda
Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
---
kernel/events/core.c | 49 +++++++++++++++++++++++++++++++++++--------------
1 file changed, 35 insertions(+), 14 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 81aa3a4..f6a1f89 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -589,17 +589,38 @@ static inline int perf_cgroup_connect(int fd, struct perf_event *event,
 {
 	struct perf_cgroup *cgrp;
 	struct cgroup_subsys_state *css;
-	struct fd f = fdget(fd);
+	struct fd f;
 	int ret = 0;

-	if (!f.file)
-		return -EBADF;
+	if (fd != -1) {
+		f = fdget(fd);
+		if (!f.file)
+			return -EBADF;

-	css = css_tryget_online_from_dir(f.file->f_path.dentry,
+		css = css_tryget_online_from_dir(f.file->f_path.dentry,
 					 &perf_event_cgrp_subsys);
-	if (IS_ERR(css)) {
-		ret = PTR_ERR(css);
-		goto out;
+		if (IS_ERR(css)) {
+			ret = PTR_ERR(css);
+			fdput(f);
+			return ret;
+		}
+	} else if (event->attach_state == PERF_ATTACH_TASK) {
+		/* Tracing on a PID. No need to set event->cgrp */
+		return ret;
+	} else if (task_active_pid_ns(current) != &init_pid_ns) {
+		/* Don't set event->cgrp if task belongs to root cgroup */
+		if (task_css_is_root(current, perf_event_cgrp_id))
+			return ret;
+
+		css = task_css(current, perf_event_cgrp_id);
+		if (!css || !css_tryget_online(css))
+			return -ENOENT;
+	} else {
+		/*
+		 * perf invoked from global context and hence don't set
+		 * event->cgrp as all the events should be included
+		 */
+		return ret;
 	}

 	cgrp = container_of(css, struct perf_cgroup, css);
@@ -614,8 +635,10 @@ static inline int perf_cgroup_connect(int fd, struct perf_event *event,
 		perf_detach_cgroup(event);
 		ret = -EINVAL;
 	}
-out:
-	fdput(f);
+
+	if (fd != -1)
+		fdput(f);
+
 	return ret;
 }

@@ -7554,11 +7577,9 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 	if (!has_branch_stack(event))
 		event->attr.branch_sample_type = 0;

-	if (cgroup_fd != -1) {
-		err = perf_cgroup_connect(cgroup_fd, event, attr, group_leader);
-		if (err)
-			goto err_ns;
-	}
+	err = perf_cgroup_connect(cgroup_fd, event, attr, group_leader);
+	if (err)
+		goto err_ns;

 	pmu = perf_init_event(event);
 	if (!pmu)