[rfd] function-graph augmentation

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [rfd] function-graph augmentation
@ 2009-02-19 21:01 Peter Zijlstra
  2009-02-19 21:28 ` Frederic Weisbecker
  2009-02-20  8:39 ` Ingo Molnar
  0 siblings, 2 replies; 11+ messages in thread
From: Peter Zijlstra @ 2009-02-19 21:01 UTC (permalink / raw)
  To: Frederic Weisbecker, Jason Baron, Ingo Molnar, Steven Rostedt; +Cc: lkml

Hi,

I was thinking how best to augment the function graph tracer with
various information. It seemed useful to add argument/return tracer
entries which, when found after a function entry, or function exit entry
would be rendered in the trace.

So supposing something like;

 3)               |  handle_mm_fault() {                                        
 3)               |    count_vm_event() {                                       
 3)   0.243 us    |      test_ti_thread_flag();                                 
 3)   0.754 us    |    }                                                        
 3)   0.249 us    |    pud_alloc();                                             
 3)   0.251 us    |    pmd_alloc();                                             
 3)               |    __do_fault() {                                           
 3)               |      filemap_fault() {                                      
 3)               |        find_lock_page() {                                   
 3)               |          find_get_page() {                                  
 3)   0.248 us    |            test_ti_thread_flag();                           
 3)   0.844 us    |          }                                                  
 3)   1.341 us    |        }                                                    
 3)   1.837 us    |      }                                                      
 3)   0.275 us    |      _spin_lock();                                          
 3)   0.257 us    |      page_add_file_rmap();                                  
 3)   0.233 us    |      native_set_pte_at();                                   
 3)               |      _spin_unlock() {                                       
 3)   0.248 us    |        test_ti_thread_flag();                               
 3)   0.742 us    |      }                                                      
 3)               |      unlock_page() {                                        
 3)   0.243 us    |        page_waitqueue();                                    
 3)   0.237 us    |        __wake_up_bit();                                     
 3)   1.209 us    |      }                                                      
 3)   6.274 us    |    }                                                        
 3)   8.806 us    |  } 

Say we found:

trace_graph_entry -- handle_mm_fault()
trace_func_arg -- address:0xffffffff
trace_func_arg -- write_access:1

We'd render:

 3)               |  handle_mm_fault(.address=0xffffffff, .write_access=1) {

trace_graph_return -- handle_mm_fault()
trace_func_ret -- 2

We'd render:

 3)   8.806 us    |  } = 2

Then we can register with tracepoints inside functions to add these
generic trace_func_arg/_ret entries to augment the graph (and or
function) tracer.


Does that make sense?


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [rfd] function-graph augmentation
  2009-02-19 21:01 [rfd] function-graph augmentation Peter Zijlstra
@ 2009-02-19 21:28 ` Frederic Weisbecker
  2009-02-19 21:38   ` Frederic Weisbecker
  2009-02-20  8:56   ` Ingo Molnar
  2009-02-20  8:39 ` Ingo Molnar
  1 sibling, 2 replies; 11+ messages in thread
From: Frederic Weisbecker @ 2009-02-19 21:28 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Jason Baron, Ingo Molnar, Steven Rostedt, lkml

On Thu, Feb 19, 2009 at 10:01:44PM +0100, Peter Zijlstra wrote:
> Hi,
> 
> I was thinking how best to augment the function graph tracer with
> various information. It seemed useful to add argument/return tracer
> entries which, when found after a function entry, or function exit entry
> would be rendered in the trace.
> 
> So supposing something like;
> 
>  3)               |  handle_mm_fault() {                                        
>  3)               |    count_vm_event() {                                       
>  3)   0.243 us    |      test_ti_thread_flag();                                 
>  3)   0.754 us    |    }                                                        
>  3)   0.249 us    |    pud_alloc();                                             
>  3)   0.251 us    |    pmd_alloc();                                             
>  3)               |    __do_fault() {                                           
>  3)               |      filemap_fault() {                                      
>  3)               |        find_lock_page() {                                   
>  3)               |          find_get_page() {                                  
>  3)   0.248 us    |            test_ti_thread_flag();                           
>  3)   0.844 us    |          }                                                  
>  3)   1.341 us    |        }                                                    
>  3)   1.837 us    |      }                                                      
>  3)   0.275 us    |      _spin_lock();                                          
>  3)   0.257 us    |      page_add_file_rmap();                                  
>  3)   0.233 us    |      native_set_pte_at();                                   
>  3)               |      _spin_unlock() {                                       
>  3)   0.248 us    |        test_ti_thread_flag();                               
>  3)   0.742 us    |      }                                                      
>  3)               |      unlock_page() {                                        
>  3)   0.243 us    |        page_waitqueue();                                    
>  3)   0.237 us    |        __wake_up_bit();                                     
>  3)   1.209 us    |      }                                                      
>  3)   6.274 us    |    }                                                        
>  3)   8.806 us    |  } 
> 
> Say we found:
> 
> trace_graph_entry -- handle_mm_fault()
> trace_func_arg -- address:0xffffffff
> trace_func_arg -- write_access:1
> 
> We'd render:
> 
>  3)               |  handle_mm_fault(.address=0xffffffff, .write_access=1) {


Good solution, except that I wonder about preemption races.
Imagine the following scenario:

CPU#0
trace_graph_entry -> commit to ring_buffer CPU#0

//entering a the function
//task is scheduled out and reschduled later on CPU#1

CPU#1
trace_func_arg -> commit to ring_buffer CPU#1

Later on the graph output process:

print("func(")
search_for_args on buffer but don't find them because they are another
cpu ring_buffer.

Well I guess it should be rare, but it can happen.
Another race will be interrupts. And interrupt can fill a lot of entries
between a trace entry and the parameters.

And yet another thing: the ring buffer does not allow yet to peek more than one entry
further. But, I guess it doesn't require a lot of change.

The other solution would be to have a hashtable of functions for which we want to store the
parameters where we can find the number of parameters to store, and their type.
This way we could atomically submit those parameter entries and be more sure they will follow
the current one.

 
> trace_graph_return -- handle_mm_fault()
> trace_func_ret -- 2
> 
> We'd render:
> 
>  3)   8.806 us    |  } = 2
> 
> Then we can register with tracepoints inside functions to add these
> generic trace_func_arg/_ret entries to augment the graph (and or
> function) tracer.


We have some vague discussions about he return value.
The most simple would be to pick it on the return value register and send it the rest of
the trace. By default we can just print a hex value of an unsigned long size.
That would be generic...
 
> 
> Does that make sense?
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [rfd] function-graph augmentation
  2009-02-19 21:28 ` Frederic Weisbecker
@ 2009-02-19 21:38   ` Frederic Weisbecker
  2009-02-20  8:56   ` Ingo Molnar
  1 sibling, 0 replies; 11+ messages in thread
From: Frederic Weisbecker @ 2009-02-19 21:38 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Jason Baron, Ingo Molnar, Steven Rostedt, lkml

On Thu, Feb 19, 2009 at 10:28:08PM +0100, Frederic Weisbecker wrote:
> On Thu, Feb 19, 2009 at 10:01:44PM +0100, Peter Zijlstra wrote:
> > Hi,
> > 
> > I was thinking how best to augment the function graph tracer with
> > various information. It seemed useful to add argument/return tracer
> > entries which, when found after a function entry, or function exit entry
> > would be rendered in the trace.
> > 
> > So supposing something like;
> > 
> >  3)               |  handle_mm_fault() {                                        
> >  3)               |    count_vm_event() {                                       
> >  3)   0.243 us    |      test_ti_thread_flag();                                 
> >  3)   0.754 us    |    }                                                        
> >  3)   0.249 us    |    pud_alloc();                                             
> >  3)   0.251 us    |    pmd_alloc();                                             
> >  3)               |    __do_fault() {                                           
> >  3)               |      filemap_fault() {                                      
> >  3)               |        find_lock_page() {                                   
> >  3)               |          find_get_page() {                                  
> >  3)   0.248 us    |            test_ti_thread_flag();                           
> >  3)   0.844 us    |          }                                                  
> >  3)   1.341 us    |        }                                                    
> >  3)   1.837 us    |      }                                                      
> >  3)   0.275 us    |      _spin_lock();                                          
> >  3)   0.257 us    |      page_add_file_rmap();                                  
> >  3)   0.233 us    |      native_set_pte_at();                                   
> >  3)               |      _spin_unlock() {                                       
> >  3)   0.248 us    |        test_ti_thread_flag();                               
> >  3)   0.742 us    |      }                                                      
> >  3)               |      unlock_page() {                                        
> >  3)   0.243 us    |        page_waitqueue();                                    
> >  3)   0.237 us    |        __wake_up_bit();                                     
> >  3)   1.209 us    |      }                                                      
> >  3)   6.274 us    |    }                                                        
> >  3)   8.806 us    |  } 
> > 
> > Say we found:
> > 
> > trace_graph_entry -- handle_mm_fault()
> > trace_func_arg -- address:0xffffffff
> > trace_func_arg -- write_access:1
> > 
> > We'd render:
> > 
> >  3)               |  handle_mm_fault(.address=0xffffffff, .write_access=1) {
> 
> 
> Good solution, except that I wonder about preemption races.
> Imagine the following scenario:
> 
> CPU#0
> trace_graph_entry -> commit to ring_buffer CPU#0
> 
> //entering a the function
> //task is scheduled out and reschduled later on CPU#1
> 
> CPU#1
> trace_func_arg -> commit to ring_buffer CPU#1
> 
> Later on the graph output process:
> 
> print("func(")
> search_for_args on buffer but don't find them because they are another
> cpu ring_buffer.
> 
> Well I guess it should be rare, but it can happen.
> Another race will be interrupts. And interrupt can fill a lot of entries
> between a trace entry and the parameters.
> 
> And yet another thing: the ring buffer does not allow yet to peek more than one entry
> further. But, I guess it doesn't require a lot of change.
> 
> The other solution would be to have a hashtable of functions for which we want to store the
> parameters where we can find the number of parameters to store, and their type.
> This way we could atomically submit those parameter entries and be more sure they will follow
> the current one.
> 
>  
> > trace_graph_return -- handle_mm_fault()
> > trace_func_ret -- 2
> > 
> > We'd render:
> > 
> >  3)   8.806 us    |  } = 2
> > 
> > Then we can register with tracepoints inside functions to add these
> > generic trace_func_arg/_ret entries to augment the graph (and or
> > function) tracer.
> 
> 
> We have some vague discussions about he return value.


Grr, s/have/had

Well, storing the return value as an unsigned long is generic.
It will probably match most of the return types...

Or we can follow the functions hashtable idea for return types: for most of the functions
we store an unsigned long that we will output in hex, for some other we can store special
values depending of the type given for the specific entry (or why not with a callback called
on output time).


> The most simple would be to pick it on the return value register and send it the rest of
> the trace. By default we can just print a hex value of an unsigned long size.
> That would be generic...
>  
> > 
> > Does that make sense?
> > 
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [rfd] function-graph augmentation
  2009-02-19 21:01 [rfd] function-graph augmentation Peter Zijlstra
  2009-02-19 21:28 ` Frederic Weisbecker
@ 2009-02-20  8:39 ` Ingo Molnar
  1 sibling, 0 replies; 11+ messages in thread
From: Ingo Molnar @ 2009-02-20  8:39 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Frederic Weisbecker, Jason Baron, Steven Rostedt, lkml


* Peter Zijlstra <peterz@infradead.org> wrote:

> Hi,
> 
> I was thinking how best to augment the function graph tracer 
> with various information. It seemed useful to add 
> argument/return tracer entries which, when found after a 
> function entry, or function exit entry would be rendered in 
> the trace.
> 
> So supposing something like;
> 
>  3)               |  handle_mm_fault() {                                        
>  3)               |    count_vm_event() {                                       
>  3)   0.243 us    |      test_ti_thread_flag();                                 
>  3)   0.754 us    |    }                                                        
>  3)   0.249 us    |    pud_alloc();                                             
>  3)   0.251 us    |    pmd_alloc();                                             
>  3)               |    __do_fault() {                                           
>  3)               |      filemap_fault() {                                      
>  3)               |        find_lock_page() {                                   
>  3)               |          find_get_page() {                                  
>  3)   0.248 us    |            test_ti_thread_flag();                           
>  3)   0.844 us    |          }                                                  
>  3)   1.341 us    |        }                                                    
>  3)   1.837 us    |      }                                                      
>  3)   0.275 us    |      _spin_lock();                                          
>  3)   0.257 us    |      page_add_file_rmap();                                  
>  3)   0.233 us    |      native_set_pte_at();                                   
>  3)               |      _spin_unlock() {                                       
>  3)   0.248 us    |        test_ti_thread_flag();                               
>  3)   0.742 us    |      }                                                      
>  3)               |      unlock_page() {                                        
>  3)   0.243 us    |        page_waitqueue();                                    
>  3)   0.237 us    |        __wake_up_bit();                                     
>  3)   1.209 us    |      }                                                      
>  3)   6.274 us    |    }                                                        
>  3)   8.806 us    |  } 
> 
> Say we found:
> 
> trace_graph_entry -- handle_mm_fault()
> trace_func_arg -- address:0xffffffff
> trace_func_arg -- write_access:1
> 
> We'd render:
> 
>  3)               |  handle_mm_fault(.address=0xffffffff, .write_access=1) {
> 
> trace_graph_return -- handle_mm_fault()
> trace_func_ret -- 2
> 
> We'd render:
> 
>  3)   8.806 us    |  } = 2
> 
> Then we can register with tracepoints inside functions to add 
> these generic trace_func_arg/_ret entries to augment the graph 
> (and or function) tracer.
> 
> Does that make sense?

Yeah, it looks really nice that way.

	Ingo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [rfd] function-graph augmentation
  2009-02-19 21:28 ` Frederic Weisbecker
  2009-02-19 21:38   ` Frederic Weisbecker
@ 2009-02-20  8:56   ` Ingo Molnar
  2009-02-20 13:30     ` Arnaldo Carvalho de Melo
  2009-02-20 14:36     ` Frederic Weisbecker
  1 sibling, 2 replies; 11+ messages in thread
From: Ingo Molnar @ 2009-02-20  8:56 UTC (permalink / raw)
  To: Frederic Weisbecker, Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Jason Baron, Steven Rostedt, lkml

* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> On Thu, Feb 19, 2009 at 10:01:44PM +0100, Peter Zijlstra wrote:
> > Hi,
> > 
> > I was thinking how best to augment the function graph tracer with
> > various information. It seemed useful to add argument/return tracer
> > entries which, when found after a function entry, or function exit entry
> > would be rendered in the trace.
> > 
> > So supposing something like;
> > 
> >  3)               |  handle_mm_fault() {                                        
> >  3)               |    count_vm_event() {                                       
> >  3)   0.243 us    |      test_ti_thread_flag();                                 
> >  3)   0.754 us    |    }                                                        
> >  3)   0.249 us    |    pud_alloc();                                             
> >  3)   0.251 us    |    pmd_alloc();                                             
> >  3)               |    __do_fault() {                                           
> >  3)               |      filemap_fault() {                                      
> >  3)               |        find_lock_page() {                                   
> >  3)               |          find_get_page() {                                  
> >  3)   0.248 us    |            test_ti_thread_flag();                           
> >  3)   0.844 us    |          }                                                  
> >  3)   1.341 us    |        }                                                    
> >  3)   1.837 us    |      }                                                      
> >  3)   0.275 us    |      _spin_lock();                                          
> >  3)   0.257 us    |      page_add_file_rmap();                                  
> >  3)   0.233 us    |      native_set_pte_at();                                   
> >  3)               |      _spin_unlock() {                                       
> >  3)   0.248 us    |        test_ti_thread_flag();                               
> >  3)   0.742 us    |      }                                                      
> >  3)               |      unlock_page() {                                        
> >  3)   0.243 us    |        page_waitqueue();                                    
> >  3)   0.237 us    |        __wake_up_bit();                                     
> >  3)   1.209 us    |      }                                                      
> >  3)   6.274 us    |    }                                                        
> >  3)   8.806 us    |  } 
> > 
> > Say we found:
> > 
> > trace_graph_entry -- handle_mm_fault()
> > trace_func_arg -- address:0xffffffff
> > trace_func_arg -- write_access:1
> > 
> > We'd render:
> > 
> >  3)               |  handle_mm_fault(.address=0xffffffff, .write_access=1) {
> 
> 
> Good solution, except that I wonder about preemption races.
> Imagine the following scenario:
> 
> CPU#0
> trace_graph_entry -> commit to ring_buffer CPU#0
> 
> //entering a the function
> //task is scheduled out and reschduled later on CPU#1
> 
> CPU#1
> trace_func_arg -> commit to ring_buffer CPU#1
> 
> Later on the graph output process:
> 
> print("func(")
> search_for_args on buffer but don't find them because they are another
> cpu ring_buffer.
> 
> Well I guess it should be rare, but it can happen.
> Another race will be interrupts. And interrupt can fill a lot 
> of entries between a trace entry and the parameters.
> 
> And yet another thing: the ring buffer does not allow yet to 
> peek more than one entry further. But, I guess it doesn't 
> require a lot of change.
> 
> The other solution would be to have a hashtable of functions 
> for which we want to store the parameters where we can find 
> the number of parameters to store, and their type. This way we 
> could atomically submit those parameter entries and be more 
> sure they will follow the current one.

Hm, i dont really like this approach of doing it via the 
ring-buffer - it's fundamentally fragile. There's two solutions 
i can think of - one is to use the return stack and the other 
one is to also use dwarf2 data.

1)

Regarding return values there's not a lot of problems (except 
the bit size of the return value). No preemption and no IRQ 
troubles either since it's all atomic and we augment the entry 
that we emit - right?

Regarding entry parameters augmentation, it should be solved 
differently i believe.

When say handle_mm_fault() is augmented, we already create a 
function-return-stack entry for it. So when the tracepoint is 
executed, it should not try to find things in the ring buffer 
(which as you point out is racy and fragile:

 - preemption moves us to the wrong ringbuffer
 - IRQs can interact
 - it can take a long time to scan the ring-buffer
 - entries might be consumed already, etc., etc.

but we could delay the injection of the function entry event, up 
until we reach the tracepoint. We can do that by annotating 
handle_mm_fault() with a trace_arguments attribute for example.

In such a solution there's no 'distance' between the entry event 
and the extra-parameters event, so it's all atomic and neither 
preemption nor IRQs cause problems.

2)

Another, entirely different, and i think complementary approach, 
which exciting new possibilities would be to (also) 
automatically pick up arguments from the stack, on function 
entry.

If there's a (read-mostly, lockless-to-read and scalable) 
function attributes hash, then we could encode the parameters 
signature of functions (or at least, of certain functions) in 
the attributes hash. Then the tracer will know how many 
arguments to pick up from the stack.

This approach has the advantage that we could reconstruct the 
parameters of _arbitrary_ functions, without having to touch 
those functions. We already enumerate all functions during build 
time, it would take some more dwarf2 magic to recover the 
call/parameter signature. Oh, and at that time we could also 
record the _return type_ - easing the return value.

Note that it does not take a full, bloated DEBUG_INFO build - we 
can build a -g object to get the dwarf2 data and then strip out 
the dwarf2 data.

Arnaldo, what do you think about this, how feasible would it be 
to put dwarf2 magic into scripts/recordmcount.pl?

	Ingo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [rfd] function-graph augmentation
  2009-02-20  8:56   ` Ingo Molnar
@ 2009-02-20 13:30     ` Arnaldo Carvalho de Melo
  2009-02-20 14:04       ` Arnaldo Carvalho de Melo
  2009-02-20 14:36     ` Frederic Weisbecker
  1 sibling, 1 reply; 11+ messages in thread
From: Arnaldo Carvalho de Melo @ 2009-02-20 13:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, Peter Zijlstra, Jason Baron, Steven Rostedt,
	lkml

Em Fri, Feb 20, 2009 at 09:56:27AM +0100, Ingo Molnar escreveu:
> 2)
> 
> Another, entirely different, and i think complementary approach, 
> which exciting new possibilities would be to (also) 
> automatically pick up arguments from the stack, on function 
> entry.
> 
> If there's a (read-mostly, lockless-to-read and scalable) 
> function attributes hash, then we could encode the parameters 
> signature of functions (or at least, of certain functions) in 
> the attributes hash. Then the tracer will know how many 
> arguments to pick up from the stack.
> 
> This approach has the advantage that we could reconstruct the 
> parameters of _arbitrary_ functions, without having to touch 
> those functions. We already enumerate all functions during build 
> time, it would take some more dwarf2 magic to recover the 
> call/parameter signature. Oh, and at that time we could also 
> record the _return type_ - easing the return value.
> 
> Note that it does not take a full, bloated DEBUG_INFO build - we 
> can build a -g object to get the dwarf2 data and then strip out 
> the dwarf2 data.
> 
> Arnaldo, what do you think about this, how feasible would it be 
> to put dwarf2 magic into scripts/recordmcount.pl?

/me reading scripts/recordmcount.pl...

- Arnaldo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [rfd] function-graph augmentation
  2009-02-20 13:30     ` Arnaldo Carvalho de Melo
@ 2009-02-20 14:04       ` Arnaldo Carvalho de Melo
  2009-02-20 14:17         ` Ingo Molnar
  2009-02-20 14:18         ` Peter Zijlstra
  0 siblings, 2 replies; 11+ messages in thread
From: Arnaldo Carvalho de Melo @ 2009-02-20 14:04 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, Peter Zijlstra, Jason Baron, Steven Rostedt,
	lkml

Em Fri, Feb 20, 2009 at 10:30:11AM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Fri, Feb 20, 2009 at 09:56:27AM +0100, Ingo Molnar escreveu:
> > 2)
> > 
> > Another, entirely different, and i think complementary approach, 
> > which exciting new possibilities would be to (also) 
> > automatically pick up arguments from the stack, on function 
> > entry.
> > 
> > If there's a (read-mostly, lockless-to-read and scalable) 
> > function attributes hash, then we could encode the parameters 
> > signature of functions (or at least, of certain functions) in 
> > the attributes hash. Then the tracer will know how many 
> > arguments to pick up from the stack.
> > 
> > This approach has the advantage that we could reconstruct the 
> > parameters of _arbitrary_ functions, without having to touch 
> > those functions. We already enumerate all functions during build 
> > time, it would take some more dwarf2 magic to recover the 
> > call/parameter signature. Oh, and at that time we could also 
> > record the _return type_ - easing the return value.
> > 
> > Note that it does not take a full, bloated DEBUG_INFO build - we 
> > can build a -g object to get the dwarf2 data and then strip out 
> > the dwarf2 data.
> > 
> > Arnaldo, what do you think about this, how feasible would it be 
> > to put dwarf2 magic into scripts/recordmcount.pl?
> 
> /me reading scripts/recordmcount.pl...

So you want to:

1. build object with -g
2. just after it is built, get what we want from the DWARF sections,
   then strip it, stash what we collected
3. what we want is:
	- parameter names
	- _where_ each parameter is (DWARF location expressions)
	- type signature (CTF like stuff)
4. allow users to ask for parameters (all? just some?) for certain functions
   to be collected at function entry
5. at function entry check if parameters should be collected, go over
   each parameter DWARF location expression and collect the values,
   encoding them into the ring buffer
6. at cat /d/tracing/trace time pretty print the parameters collected,
   i.e. name=value-formatting-according-to-its-type

Ok, base types are easy, enums are easy, what about structs? forget
about them and just print the pointer? i.e.:

	_spin_lock(.lock=0xabcdef)

versus:

	_spin_lock(.lock={.raw_lock={.slock=0}})

All members should be collected? Just some? User decides?

- Arnaldo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [rfd] function-graph augmentation
  2009-02-20 14:04       ` Arnaldo Carvalho de Melo
@ 2009-02-20 14:17         ` Ingo Molnar
  2009-02-20 14:18         ` Peter Zijlstra
  1 sibling, 0 replies; 11+ messages in thread
From: Ingo Molnar @ 2009-02-20 14:17 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Frederic Weisbecker, Peter Zijlstra, Jason Baron, Steven Rostedt,
	lkml


* Arnaldo Carvalho de Melo <acme@redhat.com> wrote:

> Em Fri, Feb 20, 2009 at 10:30:11AM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Fri, Feb 20, 2009 at 09:56:27AM +0100, Ingo Molnar escreveu:
> > > 2)
> > > 
> > > Another, entirely different, and i think complementary approach, 
> > > which exciting new possibilities would be to (also) 
> > > automatically pick up arguments from the stack, on function 
> > > entry.
> > > 
> > > If there's a (read-mostly, lockless-to-read and scalable) 
> > > function attributes hash, then we could encode the parameters 
> > > signature of functions (or at least, of certain functions) in 
> > > the attributes hash. Then the tracer will know how many 
> > > arguments to pick up from the stack.
> > > 
> > > This approach has the advantage that we could reconstruct the 
> > > parameters of _arbitrary_ functions, without having to touch 
> > > those functions. We already enumerate all functions during build 
> > > time, it would take some more dwarf2 magic to recover the 
> > > call/parameter signature. Oh, and at that time we could also 
> > > record the _return type_ - easing the return value.
> > > 
> > > Note that it does not take a full, bloated DEBUG_INFO build - we 
> > > can build a -g object to get the dwarf2 data and then strip out 
> > > the dwarf2 data.
> > > 
> > > Arnaldo, what do you think about this, how feasible would it be 
> > > to put dwarf2 magic into scripts/recordmcount.pl?
> > 
> > /me reading scripts/recordmcount.pl...
> 
> So you want to:
> 
> 1. build object with -g
> 2. just after it is built, get what we want from the DWARF sections,
>    then strip it, stash what we collected
> 3. what we want is:
> 	- parameter names
> 	- _where_ each parameter is (DWARF location expressions)
> 	- type signature (CTF like stuff)
> 4. allow users to ask for parameters (all? just some?) for certain functions
>    to be collected at function entry
> 5. at function entry check if parameters should be collected, go over
>    each parameter DWARF location expression and collect the values,
>    encoding them into the ring buffer
> 6. at cat /d/tracing/trace time pretty print the parameters collected,
>    i.e. name=value-formatting-according-to-its-type

yeah.

> Ok, base types are easy, enums are easy, what about structs? forget
> about them and just print the pointer? i.e.:
> 
> 	_spin_lock(.lock=0xabcdef)

yeah.

> versus:
> 
> 	_spin_lock(.lock={.raw_lock={.slock=0}})
> 
> All members should be collected? Just some? User decides?

I think we should concentrate on the simplest, most obvious use, 
and iterate from there, gradually.

There's a lot of unknowns here - how reliable is the dwarf2 data 
in practice: we _really_ dont want to trust dwarf2 data by 
default becaus it can crash the kernel - so it's best to put it 
in some trusted format controlled by us - just like recordmcount 
works as a post-processor.

So if we have something simple and obvious and robust to start 
from we'll know a lot more once we started using it.

	Ingo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [rfd] function-graph augmentation
  2009-02-20 14:04       ` Arnaldo Carvalho de Melo
  2009-02-20 14:17         ` Ingo Molnar
@ 2009-02-20 14:18         ` Peter Zijlstra
  2009-02-20 14:32           ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2009-02-20 14:18 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Frederic Weisbecker, Jason Baron, Steven Rostedt,
	lkml

On Fri, 2009-02-20 at 11:04 -0300, Arnaldo Carvalho de Melo wrote:
> Em Fri, Feb 20, 2009 at 10:30:11AM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Fri, Feb 20, 2009 at 09:56:27AM +0100, Ingo Molnar escreveu:
> > > 2)
> > > 
> > > Another, entirely different, and i think complementary approach, 
> > > which exciting new possibilities would be to (also) 
> > > automatically pick up arguments from the stack, on function 
> > > entry.
> > > 
> > > If there's a (read-mostly, lockless-to-read and scalable) 
> > > function attributes hash, then we could encode the parameters 
> > > signature of functions (or at least, of certain functions) in 
> > > the attributes hash. Then the tracer will know how many 
> > > arguments to pick up from the stack.
> > > 
> > > This approach has the advantage that we could reconstruct the 
> > > parameters of _arbitrary_ functions, without having to touch 
> > > those functions. We already enumerate all functions during build 
> > > time, it would take some more dwarf2 magic to recover the 
> > > call/parameter signature. Oh, and at that time we could also 
> > > record the _return type_ - easing the return value.
> > > 
> > > Note that it does not take a full, bloated DEBUG_INFO build - we 
> > > can build a -g object to get the dwarf2 data and then strip out 
> > > the dwarf2 data.
> > > 
> > > Arnaldo, what do you think about this, how feasible would it be 
> > > to put dwarf2 magic into scripts/recordmcount.pl?
> > 
> > /me reading scripts/recordmcount.pl...
> 
> So you want to:
> 
> 1. build object with -g
> 2. just after it is built, get what we want from the DWARF sections,
>    then strip it, stash what we collected
> 3. what we want is:
> 	- parameter names
> 	- _where_ each parameter is (DWARF location expressions)
> 	- type signature (CTF like stuff)
> 4. allow users to ask for parameters (all? just some?) for certain functions
>    to be collected at function entry
> 5. at function entry check if parameters should be collected, go over
>    each parameter DWARF location expression and collect the values,
>    encoding them into the ring buffer
> 6. at cat /d/tracing/trace time pretty print the parameters collected,
>    i.e. name=value-formatting-according-to-its-type
> 
> Ok, base types are easy, enums are easy, what about structs? forget
> about them and just print the pointer? i.e.:
> 
> 	_spin_lock(.lock=0xabcdef)
> 
> versus:
> 
> 	_spin_lock(.lock={.raw_lock={.slock=0}})
> 
> All members should be collected? Just some? User decides?

I would say some, with a code annotation. 

Specifically for your example, I wanted to use lockdep/stat tracepoints
to collect the lockdep lock class name to print in locking primitives,
those are much more descriptive than either a lock pointer or a dump of
the full structure.

So perhaps we can do something like:

diff --git a/mm/memory.c b/mm/memory.c
index 36faa3a..20efad1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2869,6 +2869,8 @@ EXPORT_SYMBOL(pagefault_enable);
 int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		unsigned long address, int write_access)
 {
+	__trace_function_arg(address);
+
 	pgd_t *pgd;
 	pud_t *pud;
 	pmd_t *pmd;


Or possibly go through tracepoint probes to collect them.

Anyway, it would be grand if we can have some GCC people participate
because things like recordmount.pl, while highly creative, utterly tank
build times.





^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [rfd] function-graph augmentation
  2009-02-20 14:18         ` Peter Zijlstra
@ 2009-02-20 14:32           ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 11+ messages in thread
From: Arnaldo Carvalho de Melo @ 2009-02-20 14:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Frederic Weisbecker, Jason Baron, Steven Rostedt,
	lkml

Em Fri, Feb 20, 2009 at 03:18:02PM +0100, Peter Zijlstra escreveu:
> On Fri, 2009-02-20 at 11:04 -0300, Arnaldo Carvalho de Melo wrote:
> > Em Fri, Feb 20, 2009 at 10:30:11AM -0300, Arnaldo Carvalho de Melo escreveu:
> > > Em Fri, Feb 20, 2009 at 09:56:27AM +0100, Ingo Molnar escreveu:
> > > > 2)
> > > > 
> > > > Another, entirely different, and i think complementary approach, 
> > > > which exciting new possibilities would be to (also) 
> > > > automatically pick up arguments from the stack, on function 
> > > > entry.
> > > > 
> > > > If there's a (read-mostly, lockless-to-read and scalable) 
> > > > function attributes hash, then we could encode the parameters 
> > > > signature of functions (or at least, of certain functions) in 
> > > > the attributes hash. Then the tracer will know how many 
> > > > arguments to pick up from the stack.
> > > > 
> > > > This approach has the advantage that we could reconstruct the 
> > > > parameters of _arbitrary_ functions, without having to touch 
> > > > those functions. We already enumerate all functions during build 
> > > > time, it would take some more dwarf2 magic to recover the 
> > > > call/parameter signature. Oh, and at that time we could also 
> > > > record the _return type_ - easing the return value.
> > > > 
> > > > Note that it does not take a full, bloated DEBUG_INFO build - we 
> > > > can build a -g object to get the dwarf2 data and then strip out 
> > > > the dwarf2 data.
> > > > 
> > > > Arnaldo, what do you think about this, how feasible would it be 
> > > > to put dwarf2 magic into scripts/recordmcount.pl?
> > > 
> > > /me reading scripts/recordmcount.pl...
> > 
> > So you want to:
> > 
> > 1. build object with -g
> > 2. just after it is built, get what we want from the DWARF sections,
> >    then strip it, stash what we collected
> > 3. what we want is:
> > 	- parameter names
> > 	- _where_ each parameter is (DWARF location expressions)
> > 	- type signature (CTF like stuff)
> > 4. allow users to ask for parameters (all? just some?) for certain functions
> >    to be collected at function entry
> > 5. at function entry check if parameters should be collected, go over
> >    each parameter DWARF location expression and collect the values,
> >    encoding them into the ring buffer
> > 6. at cat /d/tracing/trace time pretty print the parameters collected,
> >    i.e. name=value-formatting-according-to-its-type
> > 
> > Ok, base types are easy, enums are easy, what about structs? forget
> > about them and just print the pointer? i.e.:
> > 
> > 	_spin_lock(.lock=0xabcdef)
> > 
> > versus:
> > 
> > 	_spin_lock(.lock={.raw_lock={.slock=0}})
> > 
> > All members should be collected? Just some? User decides?
> 
> I would say some, with a code annotation. 
> 
> Specifically for your example, I wanted to use lockdep/stat tracepoints
> to collect the lockdep lock class name to print in locking primitives,

I was going to ask about that ;-)
 
> those are much more descriptive than either a lock pointer or a dump of
> the full structure.

Agreed, specific pretty-printers should be used when available in a lib.

> So perhaps we can do something like:
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 36faa3a..20efad1 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2869,6 +2869,8 @@ EXPORT_SYMBOL(pagefault_enable);
>  int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  		unsigned long address, int write_access)
>  {
> +	__trace_function_arg(address);
> +
>  	pgd_t *pgd;
>  	pud_t *pud;
>  	pmd_t *pmd;
> 
> 
> Or possibly go through tracepoint probes to collect them.

Code gets polluted, no? I.e. only before hand annotated arguments will
be collected.

Well, perhaps for really really really interesting functions this will
speed up things by almost unconditionally collecting the explicitely
annotated traceparms.
 
> Anyway, it would be grand if we can have some GCC people participate
> because things like recordmount.pl, while highly creative, utterly tank
> build times.

yes, but don't require a new compiler.

If these use cases were considered as working proofs of concept, then
hopefully in the future we would have them built into the compiler.

- Arnaldo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [rfd] function-graph augmentation
  2009-02-20  8:56   ` Ingo Molnar
  2009-02-20 13:30     ` Arnaldo Carvalho de Melo
@ 2009-02-20 14:36     ` Frederic Weisbecker
  1 sibling, 0 replies; 11+ messages in thread
From: Frederic Weisbecker @ 2009-02-20 14:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Jason Baron,
	Steven Rostedt, lkml

On Fri, Feb 20, 2009 at 09:56:27AM +0100, Ingo Molnar wrote:
> 
> * Frederic Weisbecker <fweisbec@gmail.com> wrote:
> 
> > On Thu, Feb 19, 2009 at 10:01:44PM +0100, Peter Zijlstra wrote:
> > > Hi,
> > > 
> > > I was thinking how best to augment the function graph tracer with
> > > various information. It seemed useful to add argument/return tracer
> > > entries which, when found after a function entry, or function exit entry
> > > would be rendered in the trace.
> > > 
> > > So supposing something like;
> > > 
> > >  3)               |  handle_mm_fault() {                                        
> > >  3)               |    count_vm_event() {                                       
> > >  3)   0.243 us    |      test_ti_thread_flag();                                 
> > >  3)   0.754 us    |    }                                                        
> > >  3)   0.249 us    |    pud_alloc();                                             
> > >  3)   0.251 us    |    pmd_alloc();                                             
> > >  3)               |    __do_fault() {                                           
> > >  3)               |      filemap_fault() {                                      
> > >  3)               |        find_lock_page() {                                   
> > >  3)               |          find_get_page() {                                  
> > >  3)   0.248 us    |            test_ti_thread_flag();                           
> > >  3)   0.844 us    |          }                                                  
> > >  3)   1.341 us    |        }                                                    
> > >  3)   1.837 us    |      }                                                      
> > >  3)   0.275 us    |      _spin_lock();                                          
> > >  3)   0.257 us    |      page_add_file_rmap();                                  
> > >  3)   0.233 us    |      native_set_pte_at();                                   
> > >  3)               |      _spin_unlock() {                                       
> > >  3)   0.248 us    |        test_ti_thread_flag();                               
> > >  3)   0.742 us    |      }                                                      
> > >  3)               |      unlock_page() {                                        
> > >  3)   0.243 us    |        page_waitqueue();                                    
> > >  3)   0.237 us    |        __wake_up_bit();                                     
> > >  3)   1.209 us    |      }                                                      
> > >  3)   6.274 us    |    }                                                        
> > >  3)   8.806 us    |  } 
> > > 
> > > Say we found:
> > > 
> > > trace_graph_entry -- handle_mm_fault()
> > > trace_func_arg -- address:0xffffffff
> > > trace_func_arg -- write_access:1
> > > 
> > > We'd render:
> > > 
> > >  3)               |  handle_mm_fault(.address=0xffffffff, .write_access=1) {
> > 
> > 
> > Good solution, except that I wonder about preemption races.
> > Imagine the following scenario:
> > 
> > CPU#0
> > trace_graph_entry -> commit to ring_buffer CPU#0
> > 
> > //entering a the function
> > //task is scheduled out and reschduled later on CPU#1
> > 
> > CPU#1
> > trace_func_arg -> commit to ring_buffer CPU#1
> > 
> > Later on the graph output process:
> > 
> > print("func(")
> > search_for_args on buffer but don't find them because they are another
> > cpu ring_buffer.
> > 
> > Well I guess it should be rare, but it can happen.
> > Another race will be interrupts. And interrupt can fill a lot 
> > of entries between a trace entry and the parameters.
> > 
> > And yet another thing: the ring buffer does not allow yet to 
> > peek more than one entry further. But, I guess it doesn't 
> > require a lot of change.
> > 
> > The other solution would be to have a hashtable of functions 
> > for which we want to store the parameters where we can find 
> > the number of parameters to store, and their type. This way we 
> > could atomically submit those parameter entries and be more 
> > sure they will follow the current one.
> 
> Hm, i dont really like this approach of doing it via the 
> ring-buffer - it's fundamentally fragile. There's two solutions 
> i can think of - one is to use the return stack and the other 
> one is to also use dwarf2 data.
> 
> 1)
> 
> Regarding return values there's not a lot of problems (except 
> the bit size of the return value). No preemption and no IRQ 
> troubles either since it's all atomic and we augment the entry 
> that we emit - right?
> 
> Regarding entry parameters augmentation, it should be solved 
> differently i believe.
> 
> When say handle_mm_fault() is augmented, we already create a 
> function-return-stack entry for it. So when the tracepoint is 
> executed, it should not try to find things in the ring buffer 
> (which as you point out is racy and fragile:
> 
>  - preemption moves us to the wrong ringbuffer
>  - IRQs can interact
>  - it can take a long time to scan the ring-buffer
>  - entries might be consumed already, etc., etc.
> 
> but we could delay the injection of the function entry event, up 
> until we reach the tracepoint. We can do that by annotating 
> handle_mm_fault() with a trace_arguments attribute for example.
> 
> In such a solution there's no 'distance' between the entry event 
> and the extra-parameters event, so it's all atomic and neither 
> preemption nor IRQs cause problems.
> 
> 2)
> 
> Another, entirely different, and i think complementary approach, 
> which exciting new possibilities would be to (also) 
> automatically pick up arguments from the stack, on function 
> entry.
> 
> If there's a (read-mostly, lockless-to-read and scalable) 
> function attributes hash, then we could encode the parameters 
> signature of functions (or at least, of certain functions) in 
> the attributes hash. Then the tracer will know how many 
> arguments to pick up from the stack.
> 
> This approach has the advantage that we could reconstruct the 
> parameters of _arbitrary_ functions, without having to touch 
> those functions. We already enumerate all functions during build 
> time, it would take some more dwarf2 magic to recover the 
> call/parameter signature. Oh, and at that time we could also 
> record the _return type_ - easing the return value.
> 
> Note that it does not take a full, bloated DEBUG_INFO build - we 
> can build a -g object to get the dwarf2 data and then strip out 
> the dwarf2 data.
> 
> Arnaldo, what do you think about this, how feasible would it be 
> to put dwarf2 magic into scripts/recordmcount.pl?
> 
> 	Ingo

Those two approach will be each differently powerful and scalable.
The first will help to eventually output complex parameters such as
structures pointers. But it will act only on functions which will have those tracepoints.

The second will let most of the kernel functions having their parameters output
but with less precision, since only very simple types such as int,long,char etc...will
have an interesting output.

It would be good if both can be complementary.


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2009-02-20 14:36 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-02-19 21:01 [rfd] function-graph augmentation Peter Zijlstra
2009-02-19 21:28 ` Frederic Weisbecker
2009-02-19 21:38   ` Frederic Weisbecker
2009-02-20  8:56   ` Ingo Molnar
2009-02-20 13:30     ` Arnaldo Carvalho de Melo
2009-02-20 14:04       ` Arnaldo Carvalho de Melo
2009-02-20 14:17         ` Ingo Molnar
2009-02-20 14:18         ` Peter Zijlstra
2009-02-20 14:32           ` Arnaldo Carvalho de Melo
2009-02-20 14:36     ` Frederic Weisbecker
2009-02-20  8:39 ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).