public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: Ananth N Mavinakayanahalli <ananth@in.ibm.com>,
	Anton Arapov <anton@redhat.com>, David Smith <dsmith@redhat.com>,
	"Frank Ch. Eigler" <fche@redhat.com>, Ingo Molnar <mingo@elte.hu>,
	Josh Stone <jistone@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	"Suzuki K. Poulose" <suzuki@in.ibm.com>
Cc: linux-kernel@vger.kernel.org
Subject: uprobes && pre-filtering
Date: Mon, 5 Nov 2012 20:04:46 +0100	[thread overview]
Message-ID: <20121105190446.GA6188@redhat.com> (raw)

Hello.

There is a known (and by design) problem with uprobes. They act
systemwide, there is no pre-filtering. Just some random thoughts
to provoke the discussion.

- I think that the current uprobe_consumer->filter(task) should die.

  It buys nothing. It is called right before ->handler() and this is
  pointless, ->handler() can call it itself.

  And more importantly, I do not think this hook (with the same
  semantics) can/should be used by both register and handler_chain().

- We should change register_ and and mmap to filter out the tasks
  which we do not want to trace. To simplify, lets forget about
  multiple consumers first.

  Everything is simple, install_breakpoint() callers should simply
  call consumer->filter(args) and do nothing if it returns false.

  The main problem is, what should be passed as "args". I think it is
  pointless to use task_struct as an argument. And not because there
  is no simple way to find all tasks which use this particular mm in
  register_for_each_vma(), even if it was possible I think this makes
  no sense.

  If 2 tasks/threads share the same mm they will share int3 as well,
  so I think we need

  	enum filter_mode {
  		UPROBE_FILTER_REGISTER,
  		UPROBE_FILTER_MMAP,
  		/* more */
  	};

  	consumer->filter(enum filter_mode mode, struct mm_struct *mm);

  Sure, this does not allow to, say, probe the tasks with the given uid.
  But once again, currently we do not have for_each_mm_user(task, mm)
  and even if we implement it

  	a) ->filter(mm) can use it itself)

  	b) I do not think register_for_each_vma() should call it.

  	   Suppose that a consumer wants to track the task with the
  	   given pid PID. In this case ->filter() can simply check
  	   find_task_by_vpid(PID)->mm = mm. This is fast and simple.

  	   Or you want to probe all instances of /bin/ls. In this case
  	   filter() can check mm->exe_file->f_path == saved_path, this
  	   is very cheap.

  	   But if we add for_each_mm_user() into register_for_each_vma()
  	   it will be called even if the filtering is simple.

  So. If filter(UPROBE_FILTER_REGISTER) needs to check task_uid(task) == UID
  it has to do for_each_process() until we have (if ever) for_each_mm_user().

  Or it should always return true and remove the unnecessary breakpoints
  later, or use another API (see below).

  Also. Who needs the "nontrivial" filtering? I do not see any potential
  in-kernel user. And the tools like systemtap can take another approach
  (perhaps).

- Perhaps we should extend the API. We can add

	uprobe_apply(consumer, task, bool add_remove);

  which adds/removes breakpoints to task->mm.

  This way consumer can probe every task it wants to trace after
  uprobe_register().

  Its ->filter(UPROBE_FILTER_REGISTER) should simply return false. Or,
  better, we can split uprobe_register() into 2 functions,
  __uprobe_register() and uprobe_apply_all() which actually does
  register_for_each_vma().

  ***** QUESTION *****: perhaps this is all systemtap needs? ignoring
  UPROBE_FILTER_MMAP.

- Multiple consumers. uprobe_register/uprobe_unregister should be
  modified so that register_for_each_vma() is called every time when
  the new consumer comes or consumer goes away.

  uprobe_apply(add_remove => false) should obviously consult other
  consumers too.

- Perhaps we should teach handle_swbp() to remove the unwanted breakpoints.

  If every ->handler() returns, say, UPROBE_GO_AWAY handle_swbp() should
  remove this breakpoint. Of course, a consumer must be sure that if it
  returns UPROBE_GO_AWAY this task can't share ->mm with another task it
  wants to trace.

  Or consumer->handler() can do uprobe_apply(add_remove => false) itself,
  but this needs more discussion.

  The point is that if the filtering at UPROBE_FILTER_REGISTER time is
  not possible, ->filter(UPROBE_FILTER_REGISTER) can return true. A
  "wrong" int3 doesn't hurt until the task actually hits the breakpoint,
  and I think that a single bp-hit is fine performance-wise.

- fork(). The child inherits all breakpoints from parent, and uprobes
  can't control this. What can we do?

	* We can add another uprobe hook which does something like

	  	for_each_uprobe_in_each_vma(child_mm) {
	  		if (filter(UPROBE_FILTER_FORK))
	  			install_breakoint();
	  		else
	  			remove_breakpoint();
	  	}


	  But is is not clear where can we add this hook. dup_mmap()
	  looks appealing, but at this time the child is still under
	  construction, consumer->filter() can't look at task_struct.

	  And of course, it is not nice to slow down fork().

	* If we only care about the unwanted breakpoints, perhaps it
	  would be better to rely on UPROBE_GO_AWAY above?

	* Finally, do we care at all? Again, who can ever need to
	  re-install breakpoints after fork?

	  systemtap (iiuc) doesn't need this. And even if it does
	  or will need, I guess it can hook fork itself and use
	  uprobe_apply() ?

Please comment.

Oleg.


             reply	other threads:[~2012-11-05 19:04 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-11-05 19:04 Oleg Nesterov [this message]
2012-11-06  9:05 ` uprobes && pre-filtering Srikar Dronamraju
2012-11-06 17:02   ` Oleg Nesterov
2012-11-06 17:41     ` Josh Stone
2012-11-06 18:20       ` Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20121105190446.GA6188@redhat.com \
    --to=oleg@redhat.com \
    --cc=ananth@in.ibm.com \
    --cc=anton@redhat.com \
    --cc=dsmith@redhat.com \
    --cc=fche@redhat.com \
    --cc=jistone@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    --cc=srikar@linux.vnet.ibm.com \
    --cc=suzuki@in.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox