Re: [PATCH 1/3] ftrace: add function tracing to single thread

Linux Container Development
 help / color / mirror / Atom feed

* Re: [PATCH 1/3] ftrace: add function tracing to single thread
       [not found]       ` <20081125153104.ecdceed4.akpm@linux-foundation.org>
@ 2008-11-26  0:11         ` Steven Rostedt
  2008-11-26  0:53           ` Dave Hansen
  0 siblings, 1 reply; 11+ messages in thread
From: Steven Rostedt @ 2008-11-26  0:11 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, Ingo Molnar, fweisbec, srostedt, containers




On Tue, 25 Nov 2008, Andrew Morton wrote:

> On Tue, 25 Nov 2008 18:01:06 -0500 (EST)
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > 
> > On Tue, 25 Nov 2008, Andrew Morton wrote:
> > 
> > > On Tue, 25 Nov 2008 17:34:22 -0500
> > > Steven Rostedt <rostedt@goodmis.org> wrote:
> > > 
> > > > This patch adds the ability to function trace a single thread.
> > > > The file:
> > > > 
> > > >   /debugfs/tracing/set_ftrace_pid
> > > > 
> > > > contains the pid to trace.
> > > 
> > > What happens if the same pid exists in two or more pid namespaces?
> > 
> > I think we had this discussion before.
> 
> Oh.  What did we end up concluding?
> 
> > It tests current->pid, would that 
> > be different among the name spaces?
> 
> I think those are non-unique.  containers@lists.osdl.org would have
> better ideas..

Added list in CC.

I think the end result was, if this file can only be changed by root, then 
we do not need to worry about namespaces. This file is a privileged file 
that can only be modified by root.

If someday we decide to let non admin users touch this file, then we would
need to care about this.  This file may actually be modified in the future 
by users, so this may become an issue.

-- Steve

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] ftrace: add function tracing to single thread
  2008-11-26  0:11         ` [PATCH 1/3] ftrace: add function tracing to single thread Steven Rostedt
@ 2008-11-26  0:53           ` Dave Hansen
  2008-11-26  1:01             ` Steven Rostedt
  0 siblings, 1 reply; 11+ messages in thread
From: Dave Hansen @ 2008-11-26  0:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andrew Morton, containers, fweisbec, Ingo Molnar, LKML, srostedt,
	Eric Biederman, Sukadev Bhattiprolu, Serge E. Hallyn

On Tue, 2008-11-25 at 19:11 -0500, Steven Rostedt wrote:
> > > > Steven Rostedt <rostedt@goodmis.org> wrote:
> > > > 
> > > > > This patch adds the ability to function trace a single thread.
> > > > > The file:
> > > > > 
> > > > >   /debugfs/tracing/set_ftrace_pid
> > > > > 
> > > > > contains the pid to trace.
> > > > 
> > > > What happens if the same pid exists in two or more pid namespaces?
> > > 
> > > I think we had this discussion before.
> > 
> > Oh.  What did we end up concluding?
> > 
> > > It tests current->pid, would that 
> > > be different among the name spaces?
> > 
> > I think those are non-unique.  containers@lists.osdl.org would have
> > better ideas..
> 
> Added list in CC.
> 
> I think the end result was, if this file can only be changed by root, then 
> we do not need to worry about namespaces. This file is a privileged file 
> that can only be modified by root.
> 
> If someday we decide to let non admin users touch this file, then we would
> need to care about this.  This file may actually be modified in the future 
> by users, so this may become an issue.

This really has very little to do with root vs non-root users.  In fact,
we're working towards having cases where we have many "root" users, even
those inside namespaces.  It is also quite possible for a normal root
user to fork into a new pid namespace.  In that case, root simply won't
be able to use this feature because something like:

	echo $$ /debugfs/tracing/set_ftrace_pid

just won't work.  Let's look at a bit of the code.

+static void ftrace_pid_func(unsigned long ip, unsigned long parent_ip)
+{
+       if (current->pid != ftrace_pid_trace)
+               return;
+
+       ftrace_pid_function(ip, parent_ip);
+}

One thing this doesn't deal with is pid wraparound.  Does that matter?

If you want to fix this a bit, instead of saving off the pid_t in
ftrace_pid_trace, you should save a 'struct pid'.  You can get the
'struct pid' for a particular task by doing a find_get_pid(pid_t).  You
can then compare that pid_t to current by doing a
pid_task(struct_pid_that_i_saved, PIDTYPE_PID).  That will also protect
against pid wraparound.

The find_get_pid() is handy because it will do the pid_t lookup in the
context of the current task's pid namespace, which is what you want, I
think.

-- Dave

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] ftrace: add function tracing to single thread
  2008-11-26  0:53           ` Dave Hansen
@ 2008-11-26  1:01             ` Steven Rostedt
       [not found]               ` <m1zljndrb6.fsf@frodo.ebiederm.org>
  2008-11-26  5:20               ` Eric W. Biederman
  0 siblings, 2 replies; 11+ messages in thread
From: Steven Rostedt @ 2008-11-26  1:01 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andrew Morton, containers, fweisbec, Ingo Molnar, LKML, srostedt,
	Eric Biederman, Sukadev Bhattiprolu, Serge E. Hallyn


On Tue, 25 Nov 2008, Dave Hansen wrote:
> > 
> > I think the end result was, if this file can only be changed by root, then 
> > we do not need to worry about namespaces. This file is a privileged file 
> > that can only be modified by root.
> > 
> > If someday we decide to let non admin users touch this file, then we would
> > need to care about this.  This file may actually be modified in the future 
> > by users, so this may become an issue.
> 
> This really has very little to do with root vs non-root users.  In fact,
> we're working towards having cases where we have many "root" users, even
> those inside namespaces.  It is also quite possible for a normal root
> user to fork into a new pid namespace.  In that case, root simply won't
> be able to use this feature because something like:
> 	
> 	echo $$ /debugfs/tracing/set_ftrace_pid
> 	
> just won't work.  Let's look at a bit of the code.
> 
> +static void ftrace_pid_func(unsigned long ip, unsigned long parent_ip)
> +{
> +       if (current->pid != ftrace_pid_trace)
> +               return;
> +
> +       ftrace_pid_function(ip, parent_ip);
> +}
> 
> One thing this doesn't deal with is pid wraparound.  Does that matter?

Should not.  This is just a way to trace a particular process. Currently 
it traces all processes. If we wrap, then we trace the process with the 
new pid. This should not be an issue.

> 
> If you want to fix this a bit, instead of saving off the pid_t in
> ftrace_pid_trace, you should save a 'struct pid'.  You can get the
> 'struct pid' for a particular task by doing a find_get_pid(pid_t).  You
> can then compare that pid_t to current by doing a
> pid_task(struct_pid_that_i_saved, PIDTYPE_PID).  That will also protect
> against pid wraparound.
> 
> The find_get_pid() is handy because it will do the pid_t lookup in the
> context of the current task's pid namespace, which is what you want, I
> think.

Nope, we can not call that in this context. ftrace_pid_func is called 
directly from mcount, without any protection.

struct pid *find_get_pid(pid_t nr)
{
	struct pid *pid;

	rcu_read_lock();
	pid = get_pid(find_vpid(nr));
	rcu_read_unlock();

	return pid;
}
EXPORT_SYMBOL_GPL(find_get_pid);

This means find_get_pid will call mcount which will call ftrace_pid_func, 
and back again. This can also happen with rcu_read_{un}lock() and 
get_pid() and find_vpid().

We can not do anything special here.

-- Steve

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] ftrace: add function tracing to single thread
       [not found]               ` <m1zljndrb6.fsf@frodo.ebiederm.org>
@ 2008-11-26  5:01                 ` Steven Rostedt
  2008-11-26  6:23                   ` Eric W. Biederman
  0 siblings, 1 reply; 11+ messages in thread
From: Steven Rostedt @ 2008-11-26  5:01 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Dave Hansen, Andrew Morton, containers, fweisbec, Ingo Molnar,
	LKML, srostedt, Sukadev Bhattiprolu, Serge E. Hallyn



On Tue, 25 Nov 2008, Eric W. Biederman wrote:

> 
> 

I'm speechless too.

-- Steve

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] ftrace: add function tracing to single thread
  2008-11-26  1:01             ` Steven Rostedt
       [not found]               ` <m1zljndrb6.fsf@frodo.ebiederm.org>
@ 2008-11-26  5:20               ` Eric W. Biederman
  2008-11-26 16:36                 ` Steven Rostedt
  1 sibling, 1 reply; 11+ messages in thread
From: Eric W. Biederman @ 2008-11-26  5:20 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Dave Hansen, Andrew Morton, containers, fweisbec, Ingo Molnar,
	LKML, srostedt, Sukadev Bhattiprolu, Serge E. Hallyn

Steven Rostedt <rostedt@goodmis.org> writes:

> On Tue, 25 Nov 2008, Dave Hansen wrote:
>> > 
>> > I think the end result was, if this file can only be changed by root, then 
>> > we do not need to worry about namespaces. This file is a privileged file 
>> > that can only be modified by root.
>> > 
>> > If someday we decide to let non admin users touch this file, then we would
>> > need to care about this.  This file may actually be modified in the future 
>> > by users, so this may become an issue.
>> 
>> This really has very little to do with root vs non-root users.  In fact,
>> we're working towards having cases where we have many "root" users, even
>> those inside namespaces.  It is also quite possible for a normal root
>> user to fork into a new pid namespace.  In that case, root simply won't
>> be able to use this feature because something like:
>> 	
>> 	echo $$ /debugfs/tracing/set_ftrace_pid
>> 	
>> just won't work.  Let's look at a bit of the code.
>> 
>> +static void ftrace_pid_func(unsigned long ip, unsigned long parent_ip)
>> +{
>> +       if (current->pid != ftrace_pid_trace)
>> +               return;
>> +
>> +       ftrace_pid_function(ip, parent_ip);
>> +}
>> 
>> One thing this doesn't deal with is pid wraparound.  Does that matter?
>
> Should not.  This is just a way to trace a particular process. Currently 
> it traces all processes. If we wrap, then we trace the process with the 
> new pid. This should not be an issue.

So.  Using raw pid numbers in the kernel is bad form.  The internal
representation should be struct pid pointers as much as we can make
them.

I would 100% prefer it if ftrace_pid_func was written in terms of struct
pid.  That does guarantee you don't have pid wrap around issues.
It almost makes it clear 

+static void ftrace_pid_func(unsigned long ip, unsigned long parent_ip)
+{
+	if (task_pid(current) == ftrace_pid_trace)
+               return;
+
+       ftrace_pid_function(ip, parent_ip);
+}

We don't need locks to access the pid of current.


>> If you want to fix this a bit, instead of saving off the pid_t in
>> ftrace_pid_trace, you should save a 'struct pid'.  You can get the
>> 'struct pid' for a particular task by doing a find_get_pid(pid_t).  You
>> can then compare that pid_t to current by doing a
>> pid_task(struct_pid_that_i_saved, PIDTYPE_PID).  That will also protect
>> against pid wraparound.
>> 
>> The find_get_pid() is handy because it will do the pid_t lookup in the
>> context of the current task's pid namespace, which is what you want, I
>> think.
>
> Nope, we can not call that in this context. ftrace_pid_func is called 
> directly from mcount, without any protection.

Of course you can't.  But at the same time find_get_pid() is always
supposed to be called on the user space pid ingress path.

> struct pid *find_get_pid(pid_t nr)
> {
> 	struct pid *pid;
>
> 	rcu_read_lock();
> 	pid = get_pid(find_vpid(nr));
> 	rcu_read_unlock();
>
> 	return pid;
> }
> EXPORT_SYMBOL_GPL(find_get_pid);
>
> This means find_get_pid will call mcount which will call ftrace_pid_func, 
> and back again. This can also happen with rcu_read_{un}lock() and 
> get_pid() and find_vpid().
>
> We can not do anything special here.

I don't see the whole path.  But here is the deal.
1) Using struct pid and the proper find_get_pid() means that a user with
   the proper capabilities/permissions who happens to be running in a pid
   namespace can call this and it will just work.

2) The current best practices in the current are to:
   - call find_get_pid() when you capture a user space pid.
   - call put_pid() when you are done with it.

   Perhaps that is just:
   put_pid(ftrace_pid_trace);
   ftrace_pid_trace = find_get_pid(user_provided_pid_value);

3) If you ultimately want to support the full gamut:
   thread/process/process group/session.  You will need
   to use struct pid pointer comparisons.

4) When I looked at the place you were concerned about races 
   a) you were concerned about the wrong race.
   b) I don't see a race.
   c) You were filtering for the tid of a linux task not
      the tgid of a process.  So the code didn't seem to
      be doing what you thought it was doing.

5) I keep thinking current->pid should be removed some day.

So let's do this properly if we can.  This is a privileged operation
so we don't need to support people without the proper capabilities
doing this.  Or multiple comparisons or anything silly like that.  But
doing this with struct pid comparisons seems cleaner and more maintainable.  And that
really should matter.

Eric

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] ftrace: add function tracing to single thread
  2008-11-26  5:01                 ` Steven Rostedt
@ 2008-11-26  6:23                   ` Eric W. Biederman
  2008-11-26  6:32                     ` Ingo Molnar
  0 siblings, 1 reply; 11+ messages in thread
From: Eric W. Biederman @ 2008-11-26  6:23 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Dave Hansen, Andrew Morton, containers, fweisbec, Ingo Molnar,
	LKML, srostedt, Sukadev Bhattiprolu, Serge E. Hallyn

Steven Rostedt <rostedt@goodmis.org> writes:

> On Tue, 25 Nov 2008, Eric W. Biederman wrote:
>
>> 
>> 
>
> I'm speechless too.

I'm a bit tired so probably am pushing to hard.

At the same time I don't see a single reason not to use
struct pid for what it was designed for.  Identifying tasks.
pid_t's really only belong at the border.

I can see in the tracer when grabbing numbers you might
not be able to follow pointers.  For that I see justification
for using task->pid.  For the comparison I just don't see it.

Eric

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] ftrace: add function tracing to single thread
  2008-11-26  6:23                   ` Eric W. Biederman
@ 2008-11-26  6:32                     ` Ingo Molnar
  2008-11-26  7:02                       ` Eric W. Biederman
  0 siblings, 1 reply; 11+ messages in thread
From: Ingo Molnar @ 2008-11-26  6:32 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Steven Rostedt, Dave Hansen, Andrew Morton, containers, fweisbec,
	LKML, srostedt, Sukadev Bhattiprolu, Serge E. Hallyn


* Eric W. Biederman <ebiederm@xmission.com> wrote:

> Steven Rostedt <rostedt@goodmis.org> writes:
> 
> > On Tue, 25 Nov 2008, Eric W. Biederman wrote:
> >
> > I'm speechless too.
> 
> I'm a bit tired so probably am pushing to hard.
> 
> At the same time I don't see a single reason not to use struct pid 
> for what it was designed for.  Identifying tasks. pid_t's really 
> only belong at the border.
> 
> I can see in the tracer when grabbing numbers you might not be able 
> to follow pointers.  For that I see justification for using 
> task->pid.  For the comparison I just don't see it.

i dont see the point of the complexity you are advocating. 99.9% of 
the users run a unique PID space.

Tracing is about keeping stuff simple. On containers we could also 
trace the namespace ID (is there an easy ID for the namespace, as an 
easy extension to the very nice PID concept that Unix introduced 
decades ago?) and be done with it.

	Ingo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] ftrace: add function tracing to single thread
  2008-11-26  6:32                     ` Ingo Molnar
@ 2008-11-26  7:02                       ` Eric W. Biederman
  2008-11-26  7:18                         ` Ingo Molnar
  0 siblings, 1 reply; 11+ messages in thread
From: Eric W. Biederman @ 2008-11-26  7:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, Dave Hansen, Andrew Morton, containers, fweisbec,
	LKML, srostedt, Sukadev Bhattiprolu, Serge E. Hallyn

Ingo Molnar <mingo@elte.hu> writes:

> i dont see the point of the complexity you are advocating. 99.9% of 
> the users run a unique PID space.

I'm not advocating complexity.  I'm advocating using the same APIs as
the rest of the kernel, for doing the same functions.

> Tracing is about keeping stuff simple. On containers we could also 
> trace the namespace ID (is there an easy ID for the namespace, as an 
> easy extension to the very nice PID concept that Unix introduced 
> decades ago?) and be done with it.

I don't really care about the pid namespace in this context.

I am just asking that we compare a different field in the task struct.

I am asking that we don't accumulate new users of an old crufty bug prone
API, for no good reason.

I'm asking that we don't be different for no good reason.

Eric

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] ftrace: add function tracing to single thread
  2008-11-26  7:02                       ` Eric W. Biederman
@ 2008-11-26  7:18                         ` Ingo Molnar
  2008-11-26  8:48                           ` Eric W. Biederman
  0 siblings, 1 reply; 11+ messages in thread
From: Ingo Molnar @ 2008-11-26  7:18 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Steven Rostedt, Dave Hansen, Andrew Morton, containers, fweisbec,
	LKML, srostedt, Sukadev Bhattiprolu, Serge E. Hallyn


* Eric W. Biederman <ebiederm@xmission.com> wrote:

> Ingo Molnar <mingo@elte.hu> writes:
> 
> > i dont see the point of the complexity you are advocating. 99.9% of 
> > the users run a unique PID space.
> 
> I'm not advocating complexity.  I'm advocating using the same APIs as
> the rest of the kernel, for doing the same functions.
> 
> > Tracing is about keeping stuff simple. On containers we could also 
> > trace the namespace ID (is there an easy ID for the namespace, as an 
> > easy extension to the very nice PID concept that Unix introduced 
> > decades ago?) and be done with it.
> 
> I don't really care about the pid namespace in this context.
> 
> I am just asking that we compare a different field in the task 
> struct.
> 
> I am asking that we don't accumulate new users of an old crufty bug 
> prone API, for no good reason.

i dont disagree about the change, but i'm curious, what's bug-prone 
about current->pid? It certainly worked quite well for the first 15 
years.

	Ingo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] ftrace: add function tracing to single thread
  2008-11-26  7:18                         ` Ingo Molnar
@ 2008-11-26  8:48                           ` Eric W. Biederman
  0 siblings, 0 replies; 11+ messages in thread
From: Eric W. Biederman @ 2008-11-26  8:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, Dave Hansen, Andrew Morton, containers, fweisbec,
	LKML, srostedt, Sukadev Bhattiprolu, Serge E. Hallyn

Ingo Molnar <mingo@elte.hu> writes:

> * Eric W. Biederman <ebiederm@xmission.com> wrote:
>
>> Ingo Molnar <mingo@elte.hu> writes:
>> 
>> > i dont see the point of the complexity you are advocating. 99.9% of 
>> > the users run a unique PID space.
>> 
>> I'm not advocating complexity.  I'm advocating using the same APIs as
>> the rest of the kernel, for doing the same functions.
>> 
>> > Tracing is about keeping stuff simple. On containers we could also 
>> > trace the namespace ID (is there an easy ID for the namespace, as an 
>> > easy extension to the very nice PID concept that Unix introduced 
>> > decades ago?) and be done with it.
>> 
>> I don't really care about the pid namespace in this context.
>> 
>> I am just asking that we compare a different field in the task 
>> struct.
>> 
>> I am asking that we don't accumulate new users of an old crufty bug 
>> prone API, for no good reason.
>
> i dont disagree about the change, but i'm curious, what's bug-prone 
> about current->pid? It certainly worked quite well for the first 15 
> years.

Nothing especially serious.

- Just plain general sloppiness that you can get into with using numeric
  pids because you can get away with that you can't get away with if you
  are dealing with a real data structure.

  One of which is classic data type confusion.  Is a pid stored in an
  int a pid_t a short or something else.  Which leads to the difficult
  question if you need to change something how do you find and update
  all of the users.

  Other cases are the horrible to convert cases where we in practice
  leaked pids all over the place, got the locking totally all confused
  simply because we never noticed the races.  Code like that is a
  nightmare to convert to using struct pid *.  Even if struct pid pointers
  are faster to use.

- There is the interesting case that the original unix usage of pids
  and process and session groups with ttys, did not have any problems
  with pid wrap around.  But as pids became more common we added the
  SIGIO support which theoretically at least could allow send a signal
  to the wrong process due to pid wrap around.

  I'm not especially security paranoid but I suspect someone intent
  on such things could likely find a way to send a signal to a process they usually
  could not using pid wrap around.

- As for current->pid itself it is on my hit list for a different reason.

  It is one of the very few left overs from the old pid API, where we
  assume pids numbers are global. Being able to successfully remove it
  would greatly increase the confidence that we haven't missed
  something in the pid namespace implementation.

  The __pgrp and the __session fields in signal_struct are much higher
  on my hit list.  Darn I thought I had already removed them.  But unfortunately
  the final finishing touches on the pid namespace got stalled.

  Currently the preferred patterns are struct pid pointers internal to
  the kernel ( any place we are likely to save them ) and pid_t values
  right on the edge of user space. 

  current->pid is very handy for debugging.  Certainly global pid numbers
  aka pid numbers in the initial pid namespace are the pids we want
  to print with printk, and in any very light weight tracing.  As long
  as they don't creep into APIs where people turn around and use those
  those pids I don't have a problem with privileged users seeing them.

Eric

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] ftrace: add function tracing to single thread
  2008-11-26  5:20               ` Eric W. Biederman
@ 2008-11-26 16:36                 ` Steven Rostedt
  0 siblings, 0 replies; 11+ messages in thread
From: Steven Rostedt @ 2008-11-26 16:36 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Dave Hansen, Andrew Morton, containers, fweisbec, Ingo Molnar,
	LKML, srostedt, Sukadev Bhattiprolu, Serge E. Hallyn


On Tue, 25 Nov 2008, Eric W. Biederman wrote:

> Steven Rostedt <rostedt@goodmis.org> writes:
> 
> > On Tue, 25 Nov 2008, Dave Hansen wrote:
> >> > 
> >> > I think the end result was, if this file can only be changed by root, then 
> >> > we do not need to worry about namespaces. This file is a privileged file 
> >> > that can only be modified by root.
> >> > 
> >> > If someday we decide to let non admin users touch this file, then we would
> >> > need to care about this.  This file may actually be modified in the future 
> >> > by users, so this may become an issue.
> >> 
> >> This really has very little to do with root vs non-root users.  In fact,
> >> we're working towards having cases where we have many "root" users, even
> >> those inside namespaces.  It is also quite possible for a normal root
> >> user to fork into a new pid namespace.  In that case, root simply won't
> >> be able to use this feature because something like:
> >> 	
> >> 	echo $$ /debugfs/tracing/set_ftrace_pid
> >> 	
> >> just won't work.  Let's look at a bit of the code.
> >> 
> >> +static void ftrace_pid_func(unsigned long ip, unsigned long parent_ip)
> >> +{
> >> +       if (current->pid != ftrace_pid_trace)
> >> +               return;
> >> +
> >> +       ftrace_pid_function(ip, parent_ip);
> >> +}
> >> 
> >> One thing this doesn't deal with is pid wraparound.  Does that matter?
> >
> > Should not.  This is just a way to trace a particular process. Currently 
> > it traces all processes. If we wrap, then we trace the process with the 
> > new pid. This should not be an issue.
> 
> So.  Using raw pid numbers in the kernel is bad form.  The internal
> representation should be struct pid pointers as much as we can make
> them.
> 
> I would 100% prefer it if ftrace_pid_func was written in terms of struct
> pid.  That does guarantee you don't have pid wrap around issues.
> It almost makes it clear 
> 
> +static void ftrace_pid_func(unsigned long ip, unsigned long parent_ip)
> +{
> +	if (task_pid(current) == ftrace_pid_trace)
> +               return;
> +
> +       ftrace_pid_function(ip, parent_ip);
> +}
> 
> We don't need locks to access the pid of current.

That version does not bother me. I'm not worried about locks as much as
I am about recursion. If that "task_pid()" ever became a function that is
traced by mcount, then it will end up in a recursive loop, and will crash 
the system.

> 
> 
> >> If you want to fix this a bit, instead of saving off the pid_t in
> >> ftrace_pid_trace, you should save a 'struct pid'.  You can get the
> >> 'struct pid' for a particular task by doing a find_get_pid(pid_t).  You
> >> can then compare that pid_t to current by doing a
> >> pid_task(struct_pid_that_i_saved, PIDTYPE_PID).  That will also protect
> >> against pid wraparound.
> >> 
> >> The find_get_pid() is handy because it will do the pid_t lookup in the
> >> context of the current task's pid namespace, which is what you want, I
> >> think.
> >
> > Nope, we can not call that in this context. ftrace_pid_func is called 
> > directly from mcount, without any protection.
> 
> Of course you can't.  But at the same time find_get_pid() is always
> supposed to be called on the user space pid ingress path.
> 
> > struct pid *find_get_pid(pid_t nr)
> > {
> > 	struct pid *pid;
> >
> > 	rcu_read_lock();
> > 	pid = get_pid(find_vpid(nr));
> > 	rcu_read_unlock();
> >
> > 	return pid;
> > }
> > EXPORT_SYMBOL_GPL(find_get_pid);
> >
> > This means find_get_pid will call mcount which will call ftrace_pid_func, 
> > and back again. This can also happen with rcu_read_{un}lock() and 
> > get_pid() and find_vpid().
> >
> > We can not do anything special here.
> 
> I don't see the whole path.  But here is the deal.
> 1) Using struct pid and the proper find_get_pid() means that a user with
>    the proper capabilities/permissions who happens to be running in a pid
>    namespace can call this and it will just work.
> 
> 2) The current best practices in the current are to:
>    - call find_get_pid() when you capture a user space pid.
>    - call put_pid() when you are done with it.
> 
>    Perhaps that is just:
>    put_pid(ftrace_pid_trace);
>    ftrace_pid_trace = find_get_pid(user_provided_pid_value);

This may be fine.

> 
> 3) If you ultimately want to support the full gamut:
>    thread/process/process group/session.  You will need
>    to use struct pid pointer comparisons.
> 
> 4) When I looked at the place you were concerned about races 
>    a) you were concerned about the wrong race.
>    b) I don't see a race.
>    c) You were filtering for the tid of a linux task not
>       the tgid of a process.  So the code didn't seem to
>       be doing what you thought it was doing.
> 
> 5) I keep thinking current->pid should be removed some day.
> 
> So let's do this properly if we can.  This is a privileged operation
> so we don't need to support people without the proper capabilities
> doing this.  Or multiple comparisons or anything silly like that.  But
> doing this with struct pid comparisons seems cleaner and more maintainable.  And that
> really should matter.

Just so you understand what I'm concerned about:

$ objdump -dr kernel/pid.o
[...]
0000025f <find_get_pid>:
 25f:   55                      push   %ebp
 260:   89 e5                   mov    %esp,%ebp
 262:   53                      push   %ebx
 263:   e8 fc ff ff ff          call   264 <find_get_pid+0x5>
                        264: R_386_PC32 mcount
 268:   89 c3                   mov    %eax,%ebx
 26a:   b8 01 00 00 00          mov    $0x1,%eax


looking in arch/x86/kernel/entry_32.S:

ENTRY(mcount)
        cmpl $0, function_trace_stop
        jne  ftrace_stub

        cmpl $ftrace_stub, ftrace_trace_function
        jnz trace
[...]
trace:
        pushl %eax
        pushl %ecx
        pushl %edx
        movl 0xc(%esp), %eax
        movl 0x4(%ebp), %edx
        subl $MCOUNT_INSN_SIZE, %eax

        call *ftrace_trace_function



looking in kerne/trace/ftrace.c:

                        if (ftrace_pid_trace >= 0) {
                                set_ftrace_pid_function(func);
                                func = ftrace_pid_func;
                        }
                        ftrace_trace_function = func;

And we then have

static void ftrace_pid_func(unsigned long ip, unsigned long parent_ip)
{
        if (current->pid != ftrace_pid_trace)
                return;

        ftrace_pid_function(ip, parent_ip);
}


Now by having ftrace_pid_func call find_get_pid, we have the function flow 
of...


 schedule() /* any traced function */
  --> mcount
        --> *ftrace_trace_function == ftrace_pid_func
          ftrace_pid_func
               --> find_get_pid
                    --> mcount
                         --> *ftrace_trace_function == ftrace_pid_func
                             ftrace_pid_func
                                  --> find_get_pid
                                       [ ad infinitum ]


The comparison must be very careful not to call anything that will also 
trace. I can add code to catch this recursion, but this is overhead I do 
not want to add. Remember, this is called on every function call.

If we do the work at the time we set ftrace_pid_trace and we can do simple 
pointer comparisons in the ftrace_pid_func, I will be happy with that. I'm 
still learning about this pid namespace, so I'll probably screw it up a 
few more times ;-)

-- Steve

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2008-11-26 16:36 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20081125223421.795046329@goodmis.org>
     [not found] ` <20081125223456.976670734@goodmis.org>
     [not found]   ` <20081125144221.66eb99ad.akpm@linux-foundation.org>
     [not found]     ` <alpine.DEB.1.10.0811251800130.26424@gandalf.stny.rr.com>
     [not found]       ` <20081125153104.ecdceed4.akpm@linux-foundation.org>
2008-11-26  0:11         ` [PATCH 1/3] ftrace: add function tracing to single thread Steven Rostedt
2008-11-26  0:53           ` Dave Hansen
2008-11-26  1:01             ` Steven Rostedt
     [not found]               ` <m1zljndrb6.fsf@frodo.ebiederm.org>
2008-11-26  5:01                 ` Steven Rostedt
2008-11-26  6:23                   ` Eric W. Biederman
2008-11-26  6:32                     ` Ingo Molnar
2008-11-26  7:02                       ` Eric W. Biederman
2008-11-26  7:18                         ` Ingo Molnar
2008-11-26  8:48                           ` Eric W. Biederman
2008-11-26  5:20               ` Eric W. Biederman
2008-11-26 16:36                 ` Steven Rostedt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox