From: Oleg Nesterov <oleg@redhat.com>
To: Namhyung Kim <namhyung@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>,
Namhyung Kim <namhyung.kim@lge.com>,
Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>,
Hyeoncheol Lee <cheol.lee@lge.com>,
Hemant Kumar <hkshaw@linux.vnet.ibm.com>,
LKML <linux-kernel@vger.kernel.org>,
Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
"zhangwei(Jovi)" <jovi.zhangwei@huawei.com>,
Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Subject: Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
Date: Mon, 4 Nov 2013 16:01:12 +0100 [thread overview]
Message-ID: <20131104150112.GC4440@redhat.com> (raw)
In-Reply-To: <87ob60366m.fsf@sejong.aot.lge.com>
On 11/04, Namhyung Kim wrote:
>
> On Sat, 2 Nov 2013 16:54:58 +0100, Oleg Nesterov wrote:
> >
> > This does not look right to me.
> >
> > - get_user_vaddr() is costly, it does vma_interval_tree_foreach() under
> > ->i_mmap_mutex.
>
> Hmm.. yes, I think this is not needed. I guess it should lookup a
> proper vma in current->mm with mmap_sem read-locked.
>
> >
> > - this only allows to read the data from the same binary.
>
> Right. This is also an unnecessary restriction. We should be able to
> access data in other binary.
Yes... but this needs another discussion. In general, we simply can not
do this with the suggested syntax.
Say you want to probe this "foo" binary and dump "stdin" from libc.so.
You can't do this. You simply can't know where libc.so will be mmaped.
But: if we attach the event to the already running process, or if we
disable the randomization, then we can probably do this, see below.
Or the syntax should be "name=probe @file/addr" or something like this.
> > - in particular, you can't read the data from bss
>
> I can't understand why.. The bss region should also be in a same vma of
> normal data, no?
No, no. bss is mmaped anonymously, at least in general. See set_brk() in
load_elf().
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <unistd.h>
> >
> > unsigned int global = 0x1234;
> >
> > void func(void)
> > {
> > }
> >
> > int main(void)
> > {
> > char cmd[64];
> >
> > global = 0x4321;
> > func();
> >
> > printf("addr = %p\n", &global);
> >
> > sprintf(cmd, "cat /proc/%d/maps", getpid());
> > system(cmd);
> >
> > return 0;
> > }
> >
> > # nm foo | grep -w global
> > 0000000000600a04 D global
> >
> > # perf probe -x ./foo -a "func var=@0xa04:u32"
> > # perf record -e probe_foo:func ./foo
> > addr = 0x600a04
> > 00400000-00401000 r-xp 00000000 fe:01 20958 /root/foo
> > 00600000-00601000 rw-p 00000000 fe:01 20958 /root/foo
> > ...
> >
> > # perf script | tail -1
> > foo 555 [000] 1302.345642: probe_foo:func: (40059c) var=1234
> >
> > Note that it reports "1234", not "4321". This is because
> > get_user_vaddr() finds another (1st) read-only mapping, and
> > prints the initial value of "global".
> >
> > IOW, it reads the memory from 0x400a04, not from 0x600a04.
>
> Argh.. This is a problem.
>
> I thought the gcc somehow aligns data to next page boundary.
And perhaps it even should, my system is old. But this doesn't really
matter, the process itself can create another mapping.
> But if
> it's not the case, we need to recognize which is the proper one..
>
> Simply preferring a writable vma to a read-only vma is what's came to my
> head now. Do you have an idea?
So far I think that trace_uprobes.c should not play games with vma. At all.
> > -------------------------------------------------------------------------------
> > Can't we simply implement get_user_vaddr() as
> >
> > static void __user *get_user_vaddr(unsigned long addr, struct trace_uprobe *tu)
> > {
> > void __user *vaddr = (void __force __user *)addr;
> >
> > /* A NULL tu means that we already got the vaddr */
> > if (tu)
> > vaddr += (current->mm->start_data & PAGE_MASK);
> >
> > return vaddr;
> > }
> >
> > ?
> >
> > I did this change, and now the test-case above works. And it also works
> > with "cc -pie -fPIC",
> >
> > # nm foo | grep -w global
> > 0000000000200c9c D global
> >
> > # perf probe -x ./foo -a "func var=@0xc9c:u32"
> > # perf record -e probe_foo:func ./foo
> > ...
> > # perf script | tail -1
> > foo 576 [001] 475.519940: probe_foo:func: (7ffe95ca3814) var=4321
> >
> > What do you think?
>
> This can only work with the probes fetching data from the executable,
> right? But as I said it should support any other binaries too.
See above, we can't in general read other binaries.
But: if we know know where it is mmapped we can do this, just we need
to calculate the right addr passed to trace_uprobes.
Or: we should support both absolute and relative addresses, this is what
I was going to discuss later.
> static void __user *get_user_vaddr(unsigned long addr, struct trace_uprobe *tu)
> {
> unsigned long pgoff = addr >> PAGE_SHIFT;
> struct vm_area_struct *vma, *orig_vma = NULL;
> unsigned long vaddr = 0;
>
> if (tu == NULL) {
> /* A NULL tu means that we already got the vaddr */
> return (void __force __user *) addr;
> }
>
> down_read(¤t->mm->mmap_sem);
>
> vma = current->mm->mmap;
Cough, it can be null if another thread does munmap(0, TASK_SIZE) ;)
But this doesn't matter.
> do {
> if (!vma->vm_file || vma->vm_file->f_inode != tu->inode) {
> /*
> * We found read-only mapping for this inode.
> * (provided that all mappings for this inode
> * have consecutive addresses)
> */
> if (orig_vma)
> break;
> continue;
> }
>
> if (vma->vm_pgoff > pgoff ||
> (vma->vm_pgoff + vma_pages(vma) <= pgoff))
> continue;
>
> orig_vma = vma;
>
> /*
> * We prefer writable mapping over read-only since
> * data is usually in read/write memory region. But
> * in case of read-only data, it only can be found in
> * read-only mapping so we save orig_vma and check
> * whether it also has writable mapping.
> */
> if (vma->vm_flags & VM_WRITE)
> break;
> } while ((vma = vma->vm_next) != NULL);
>
> if (orig_vma)
> vaddr = offset_to_vaddr(orig_vma, addr);
>
> up_read(¤t->mm->mmap_sem);
>
> return (void __force __user *) vaddr;
> }
For what? Why it is better then my suggestion?
How it can read bss? How it can read the data from other binaries?
How we can trust the result? This code relies on some guesses and
none of them are "strict".
If nothing else, elf can have the arbitrary number of mmaped sections,
this can't work in general?
Oleg.
next prev parent reply other threads:[~2013-11-04 15:07 UTC|newest]
Thread overview: 74+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-10-29 6:53 [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6) Namhyung Kim
2013-10-29 6:53 ` [PATCH 01/13] tracing/uprobes: Fix documentation of uprobe registration syntax Namhyung Kim
2013-10-29 6:53 ` [PATCH 02/13] tracing/probes: Fix basic print type functions Namhyung Kim
2013-10-29 6:53 ` [PATCH 03/13] tracing/kprobes: Move fetch functions to trace_kprobe.c Namhyung Kim
2013-10-29 6:53 ` [PATCH 04/13] tracing/kprobes: Add fetch{,_size} member into deref fetch method Namhyung Kim
2013-10-29 6:53 ` [PATCH 05/13] tracing/kprobes: Staticize stack and memory fetch functions Namhyung Kim
2013-10-29 6:53 ` [PATCH 06/13] tracing/kprobes: Factor out struct trace_probe Namhyung Kim
2013-10-29 6:53 ` [PATCH 07/13] tracing/uprobes: Convert to " Namhyung Kim
2013-10-29 6:53 ` [PATCH 08/13] tracing/kprobes: Move common functions to trace_probe.h Namhyung Kim
2013-10-29 6:53 ` [PATCH 09/13] tracing/kprobes: Integrate duplicate set_print_fmt() Namhyung Kim
2013-10-29 6:53 ` [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer Namhyung Kim
2013-10-31 18:16 ` Oleg Nesterov
2013-11-01 9:00 ` Namhyung Kim
2013-11-04 8:06 ` Namhyung Kim
2013-11-04 14:35 ` Oleg Nesterov
2013-11-05 1:12 ` Namhyung Kim
2013-11-01 15:09 ` Oleg Nesterov
2013-11-01 15:22 ` Oleg Nesterov
2013-11-03 20:20 ` Oleg Nesterov
2013-11-04 8:11 ` Namhyung Kim
2013-11-04 14:38 ` Oleg Nesterov
2013-11-05 1:17 ` Namhyung Kim
2013-10-29 6:53 ` [PATCH 11/13] tracing/kprobes: Add priv argument to fetch functions Namhyung Kim
2013-11-04 16:09 ` Oleg Nesterov
2013-11-05 2:10 ` Namhyung Kim
2013-10-29 6:53 ` [PATCH 12/13] tracing/uprobes: Add more " Namhyung Kim
2013-10-31 18:22 ` Oleg Nesterov
2013-11-04 8:50 ` Namhyung Kim
2013-11-04 16:44 ` Oleg Nesterov
2013-11-04 17:17 ` Steven Rostedt
2013-11-05 2:19 ` Namhyung Kim
2013-11-05 2:17 ` Namhyung Kim
2013-11-01 17:53 ` Oleg Nesterov
2013-10-29 6:53 ` [PATCH 13/13] tracing/uprobes: Add support for full argument access methods Namhyung Kim
2013-10-30 10:36 ` [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6) Masami Hiramatsu
2013-11-02 15:54 ` Oleg Nesterov
2013-11-04 8:46 ` Namhyung Kim
2013-11-04 8:59 ` Namhyung Kim
2013-11-04 15:51 ` Oleg Nesterov
2013-11-04 16:22 ` Oleg Nesterov
2013-11-04 18:47 ` Oleg Nesterov
2013-11-04 18:57 ` Oleg Nesterov
2013-11-05 2:51 ` Namhyung Kim
2013-11-05 16:41 ` Oleg Nesterov
2013-11-06 8:37 ` Namhyung Kim
2013-11-05 2:49 ` Namhyung Kim
2013-11-05 6:58 ` Namhyung Kim
2013-11-05 17:45 ` Oleg Nesterov
2013-11-05 19:24 ` Oleg Nesterov
2013-11-06 8:57 ` Namhyung Kim
2013-11-06 17:37 ` Oleg Nesterov
2013-11-06 18:24 ` Oleg Nesterov
2013-11-07 9:00 ` Namhyung Kim
2013-11-08 17:00 ` Oleg Nesterov
2013-11-12 7:49 ` Namhyung Kim
2013-11-07 8:48 ` Namhyung Kim
2013-11-09 3:18 ` Masami Hiramatsu
2013-11-09 15:23 ` Oleg Nesterov
2013-11-12 8:00 ` Namhyung Kim
2013-11-12 18:44 ` Oleg Nesterov
2013-11-25 6:59 ` Namhyung Kim
2013-11-25 14:12 ` [PATCH] uprobes: Allocate ->utask before handler_chain() for tracing handlers Oleg Nesterov
2013-11-06 8:48 ` [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6) Namhyung Kim
2013-11-06 16:28 ` Oleg Nesterov
2013-11-07 7:33 ` Namhyung Kim
2013-11-08 16:52 ` Oleg Nesterov
2013-11-05 2:15 ` Namhyung Kim
2013-11-05 16:33 ` Oleg Nesterov
2013-11-06 8:34 ` Namhyung Kim
2013-11-05 1:59 ` Namhyung Kim
2013-11-04 15:01 ` Oleg Nesterov [this message]
2013-11-05 1:53 ` Namhyung Kim
2013-11-05 16:28 ` Oleg Nesterov
2013-11-06 8:31 ` Namhyung Kim
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20131104150112.GC4440@redhat.com \
--to=oleg@redhat.com \
--cc=acme@ghostprotocols.net \
--cc=cheol.lee@lge.com \
--cc=hkshaw@linux.vnet.ibm.com \
--cc=jovi.zhangwei@huawei.com \
--cc=linux-kernel@vger.kernel.org \
--cc=masami.hiramatsu.pt@hitachi.com \
--cc=namhyung.kim@lge.com \
--cc=namhyung@kernel.org \
--cc=rostedt@goodmis.org \
--cc=srikar@linux.vnet.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).