From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Leo Yan <leo.yan@linaro.org>
Cc: James Clark <james.clark@arm.com>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Mark Rutland <mark.rutland@arm.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Jiri Olsa <jolsa@redhat.com>, Namhyung Kim <namhyung@kernel.org>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
Yonghong Song <yhs@fb.com>,
John Fastabend <john.fastabend@gmail.com>,
KP Singh <kpsingh@kernel.org>, Ian Rogers <irogers@google.com>,
linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1] perf synthetic-events: Return error if procfs isn't mounted for PID namespaces
Date: Sun, 6 Feb 2022 08:46:37 -0300 [thread overview]
Message-ID: <Yf+1HQ9+7TPmSSr7@kernel.org> (raw)
In-Reply-To: <20220204124849.GB2040169@leoy-ThinkPad-X240s>
Em Fri, Feb 04, 2022 at 08:48:49PM +0800, Leo Yan escreveu:
> On Fri, Jan 07, 2022 at 02:27:57PM +0000, James Clark wrote:
> > On 24/12/2021 12:40, Leo Yan wrote:
> > > For perf recording, it retrieves process info by iterating nodes in proc
> > > fs. If we run perf in a non-root PID namespace with command:
> > >
> > > # unshare --fork --pid perf record -e cycles -a -- test_program
> > >
> > > ... in this case, unshare command creates a child PID namespace and
> > > launches perf tool in it, but the issue is the proc fs is not mounted
> > > for the non-root PID namespace, this leads to the perf tool gathering
> > > process info from its parent PID namespace.
> > >
> >
> > I had some concerns that this would prevent use of perf in docker, but docker
> > does mount /proc so it is still working. And you've added a warning about how
> > to fix the unshare command so I think this is ok.
> >
> > Reviewed-by: James Clark <james.clark@arm.com>
>
> Thanks a lot for review, James.
>
> Arnaldo, Jiri, could you take a look for this patch? Thanks!
Looks sane, there was discussion, a detailed set of steps to reproduce
the problem was provided, thanks, applied.
- Arnaldo
> Leo
>
> > > We can use below command to observe the process nodes under proc fs:
> > >
> > > # unshare --pid --fork ls /proc
> > > 1 137 1968 2128 3 342 48 62 78 crypto kcore net uptime
> > > 10 138 2 2142 30 35 49 63 8 devices keys pagetypeinfo version
> > > 11 139 20 2143 304 36 50 64 82 device-tree key-users partitions vmallocinfo
> > > 12 14 2011 22 305 37 51 65 83 diskstats kmsg self vmstat
> > > 128 140 2038 23 307 39 52 656 84 driver kpagecgroup slabinfo zoneinfo
> > > 129 15 2074 24 309 4 53 67 9 execdomains kpagecount softirqs
> > > 13 16 2094 241 31 40 54 68 asound fb kpageflags stat
> > > 130 164 2096 242 310 41 55 69 buddyinfo filesystems loadavg swaps
> > > 131 17 2098 25 317 42 56 70 bus fs locks sys
> > > 132 175 21 26 32 43 57 71 cgroups interrupts meminfo sysrq-trigger
> > > 133 179 2102 263 329 44 58 75 cmdline iomem misc sysvipc
> > > 134 1875 2103 27 330 45 59 76 config.gz ioports modules thread-self
> > > 135 19 2117 29 333 46 6 77 consoles irq mounts timer_list
> > > 136 1941 2121 298 34 47 60 773 cpuinfo kallsyms mtd tty
> > >
> > > So it shows many existed tasks, since unshared command has not mounted
> > > the proc fs for the new created PID namespace, it still accesses the
> > > proc fs of the root PID namespace. This leads to two prominent issues:
> > >
> > > - Firstly, PID values are mismatched between thread info and samples.
> > > The gathered thread info are coming from the proc fs of the root PID
> > > namespace, but samples record its PID from the child PID namespace.
> > >
> > > - The second issue is profiled program 'test_program' returns its forked
> > > PID number from the child PID namespace, perf tool wrongly uses this
> > > PID number to retrieve the process info via the proc fs of the root
> > > PID namespace.
> > >
> > > To avoid issues, we need to mount proc fs for the child PID namespace
> > > with the option '--mount-proc' when use unshare command:
> > >
> > > # unshare --fork --pid --mount-proc perf record -e cycles -a -- test_program
> > >
> > > Conversely, when the proc fs of the root PID namespace is used by child
> > > namespace, perf tool can detect the multiple PID levels and
> > > nsinfo__is_in_root_namespace() returns false, this patch reports error
> > > for this case:
> > >
> > > # unshare --fork --pid perf record -e cycles -a -- test_program
> > > Couldn't synthesize bpf events.
> > > Perf runs in non-root PID namespace but it tries to gather process info from its parent PID namespace.
> > > Please mount the proc file system properly, e.g. add the option '--mount-proc' for unshare command.
> > >
> > > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > > ---
> > > tools/perf/util/synthetic-events.c | 19 +++++++++++++++++++
> > > 1 file changed, 19 insertions(+)
> > >
> > > diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
> > > index 198982109f0f..cf7800347f77 100644
> > > --- a/tools/perf/util/synthetic-events.c
> > > +++ b/tools/perf/util/synthetic-events.c
> > > @@ -1784,6 +1784,25 @@ int __machine__synthesize_threads(struct machine *machine, struct perf_tool *too
> > > perf_event__handler_t process, bool needs_mmap,
> > > bool data_mmap, unsigned int nr_threads_synthesize)
> > > {
> > > + /*
> > > + * When perf runs in non-root PID namespace, and the namespace's proc FS
> > > + * is not mounted, nsinfo__is_in_root_namespace() returns false.
> > > + * In this case, the proc FS is coming for the parent namespace, thus
> > > + * perf tool will wrongly gather process info from its parent PID
> > > + * namespace.
> > > + *
> > > + * To avoid the confusion that the perf tool runs in a child PID
> > > + * namespace but it synthesizes thread info from its parent PID
> > > + * namespace, returns failure with warning.
> > > + */
> > > + if (!nsinfo__is_in_root_namespace()) {
> > > + pr_err("Perf runs in non-root PID namespace but it tries to ");
> > > + pr_err("gather process info from its parent PID namespace.\n");
> > > + pr_err("Please mount the proc file system properly, e.g. ");
> > > + pr_err("add the option '--mount-proc' for unshare command.\n");
> > > + return -EPERM;
> > > + }
> > > +
> > > if (target__has_task(target))
> > > return perf_event__synthesize_thread_map(tool, threads, process, machine,
> > > needs_mmap, data_mmap);
> > >
--
- Arnaldo
next prev parent reply other threads:[~2022-02-06 11:52 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-12-24 12:40 [PATCH v1] perf synthetic-events: Return error if procfs isn't mounted for PID namespaces Leo Yan
2021-12-24 12:43 ` Leo Yan
2022-01-07 14:27 ` James Clark
2022-02-04 12:48 ` Leo Yan
2022-02-06 11:46 ` Arnaldo Carvalho de Melo [this message]
2022-02-07 3:31 ` Leo Yan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Yf+1HQ9+7TPmSSr7@kernel.org \
--to=acme@kernel.org \
--cc=alexander.shishkin@linux.intel.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=daniel@iogearbox.net \
--cc=irogers@google.com \
--cc=james.clark@arm.com \
--cc=john.fastabend@gmail.com \
--cc=jolsa@redhat.com \
--cc=kafai@fb.com \
--cc=kpsingh@kernel.org \
--cc=leo.yan@linaro.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=mingo@redhat.com \
--cc=namhyung@kernel.org \
--cc=peterz@infradead.org \
--cc=songliubraving@fb.com \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).