All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: Ananth N Mavinakayanahalli <ananth@in.ibm.com>,
	Anton Arapov <anton@redhat.com>, David Smith <dsmith@redhat.com>,
	"Frank Ch. Eigler" <fche@redhat.com>, Ingo Molnar <mingo@elte.hu>,
	Josh Stone <jistone@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	"Suzuki K. Poulose" <suzuki@in.ibm.com>
Cc: linux-kernel@vger.kernel.org
Subject: uprobes && pre-filtering
Date: Mon, 5 Nov 2012 20:04:46 +0100	[thread overview]
Message-ID: <20121105190446.GA6188@redhat.com> (raw)

Hello.

There is a known (and by design) problem with uprobes. They act
systemwide, there is no pre-filtering. Just some random thoughts
to provoke the discussion.

- I think that the current uprobe_consumer->filter(task) should die.

  It buys nothing. It is called right before ->handler() and this is
  pointless, ->handler() can call it itself.

  And more importantly, I do not think this hook (with the same
  semantics) can/should be used by both register and handler_chain().

- We should change register_ and and mmap to filter out the tasks
  which we do not want to trace. To simplify, lets forget about
  multiple consumers first.

  Everything is simple, install_breakpoint() callers should simply
  call consumer->filter(args) and do nothing if it returns false.

  The main problem is, what should be passed as "args". I think it is
  pointless to use task_struct as an argument. And not because there
  is no simple way to find all tasks which use this particular mm in
  register_for_each_vma(), even if it was possible I think this makes
  no sense.

  If 2 tasks/threads share the same mm they will share int3 as well,
  so I think we need

  	enum filter_mode {
  		UPROBE_FILTER_REGISTER,
  		UPROBE_FILTER_MMAP,
  		/* more */
  	};

  	consumer->filter(enum filter_mode mode, struct mm_struct *mm);

  Sure, this does not allow to, say, probe the tasks with the given uid.
  But once again, currently we do not have for_each_mm_user(task, mm)
  and even if we implement it

  	a) ->filter(mm) can use it itself)

  	b) I do not think register_for_each_vma() should call it.

  	   Suppose that a consumer wants to track the task with the
  	   given pid PID. In this case ->filter() can simply check
  	   find_task_by_vpid(PID)->mm = mm. This is fast and simple.

  	   Or you want to probe all instances of /bin/ls. In this case
  	   filter() can check mm->exe_file->f_path == saved_path, this
  	   is very cheap.

  	   But if we add for_each_mm_user() into register_for_each_vma()
  	   it will be called even if the filtering is simple.

  So. If filter(UPROBE_FILTER_REGISTER) needs to check task_uid(task) == UID
  it has to do for_each_process() until we have (if ever) for_each_mm_user().

  Or it should always return true and remove the unnecessary breakpoints
  later, or use another API (see below).

  Also. Who needs the "nontrivial" filtering? I do not see any potential
  in-kernel user. And the tools like systemtap can take another approach
  (perhaps).

- Perhaps we should extend the API. We can add

	uprobe_apply(consumer, task, bool add_remove);

  which adds/removes breakpoints to task->mm.

  This way consumer can probe every task it wants to trace after
  uprobe_register().

  Its ->filter(UPROBE_FILTER_REGISTER) should simply return false. Or,
  better, we can split uprobe_register() into 2 functions,
  __uprobe_register() and uprobe_apply_all() which actually does
  register_for_each_vma().

  ***** QUESTION *****: perhaps this is all systemtap needs? ignoring
  UPROBE_FILTER_MMAP.

- Multiple consumers. uprobe_register/uprobe_unregister should be
  modified so that register_for_each_vma() is called every time when
  the new consumer comes or consumer goes away.

  uprobe_apply(add_remove => false) should obviously consult other
  consumers too.

- Perhaps we should teach handle_swbp() to remove the unwanted breakpoints.

  If every ->handler() returns, say, UPROBE_GO_AWAY handle_swbp() should
  remove this breakpoint. Of course, a consumer must be sure that if it
  returns UPROBE_GO_AWAY this task can't share ->mm with another task it
  wants to trace.

  Or consumer->handler() can do uprobe_apply(add_remove => false) itself,
  but this needs more discussion.

  The point is that if the filtering at UPROBE_FILTER_REGISTER time is
  not possible, ->filter(UPROBE_FILTER_REGISTER) can return true. A
  "wrong" int3 doesn't hurt until the task actually hits the breakpoint,
  and I think that a single bp-hit is fine performance-wise.

- fork(). The child inherits all breakpoints from parent, and uprobes
  can't control this. What can we do?

	* We can add another uprobe hook which does something like

	  	for_each_uprobe_in_each_vma(child_mm) {
	  		if (filter(UPROBE_FILTER_FORK))
	  			install_breakoint();
	  		else
	  			remove_breakpoint();
	  	}


	  But is is not clear where can we add this hook. dup_mmap()
	  looks appealing, but at this time the child is still under
	  construction, consumer->filter() can't look at task_struct.

	  And of course, it is not nice to slow down fork().

	* If we only care about the unwanted breakpoints, perhaps it
	  would be better to rely on UPROBE_GO_AWAY above?

	* Finally, do we care at all? Again, who can ever need to
	  re-install breakpoints after fork?

	  systemtap (iiuc) doesn't need this. And even if it does
	  or will need, I guess it can hook fork itself and use
	  uprobe_apply() ?

Please comment.

Oleg.


             reply	other threads:[~2012-11-05 19:04 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-11-05 19:04 Oleg Nesterov [this message]
2012-11-06  9:05 ` uprobes && pre-filtering Srikar Dronamraju
2012-11-06 17:02   ` Oleg Nesterov
2012-11-06 17:41     ` Josh Stone
2012-11-06 18:20       ` Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20121105190446.GA6188@redhat.com \
    --to=oleg@redhat.com \
    --cc=ananth@in.ibm.com \
    --cc=anton@redhat.com \
    --cc=dsmith@redhat.com \
    --cc=fche@redhat.com \
    --cc=jistone@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    --cc=srikar@linux.vnet.ibm.com \
    --cc=suzuki@in.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.