* [PATCH 07/15] mm: Add ___GFP_NOTRACE
2015-03-02 16:00 [PATCH v2 00/15] tracing: 'hist' triggers Tom Zanussi
@ 2015-03-02 16:01 ` Tom Zanussi
2015-03-02 16:37 ` Steven Rostedt
0 siblings, 1 reply; 16+ messages in thread
From: Tom Zanussi @ 2015-03-02 16:01 UTC (permalink / raw)
To: rostedt; +Cc: masami.hiramatsu.pt, namhyung, andi, ast, linux-kernel, Tom Zanussi
Add a gfp flag that allows kmalloc() et al to be used in tracing
functions.
The problem with using kmalloc for tracing is that the tracing
subsystem should be able to trace kmalloc itself, which it can't do
directly because of paths like kmalloc()->trace_kmalloc()->kmalloc()
or kmalloc()->trace_mm_page_alloc()->kmalloc().
With this flag, tracing code could use a special version of kmalloc()
that sets __GFP_NOTRACE on every allocation it does, while leaving the
normal kmalloc() path untouched.
This would allow any tracepoint in the kmalloc path to be avoided via
DEFINE_EVENT_CONDITION() redefinitions of those events, which check
for ___GFP_NOTRACE immediately in their execution and break if set,
thereby avoiding the recursion.
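For illustration, the suppression this description relies on can be modeled in plain C. In this sketch only the flag value comes from the patch; the functions are invented stand-ins for the kernel's kmalloc/tracepoint paths, not real kernel code:

```c
#include <assert.h>
#include <stdint.h>

#define ___GFP_NOTRACE 0x2000000u   /* value added by the patch below */

static int trace_hits;

/* Models the TP_CONDITION() check a DEFINE_EVENT_CONDITION()
 * redefinition of trace_kmalloc() would perform. */
static void trace_kmalloc(uint32_t gfp_flags)
{
	if (gfp_flags & ___GFP_NOTRACE)
		return;		/* suppressed: no recursion back into kmalloc */
	trace_hits++;
}

/* Models the normal allocation path: every kmalloc() fires the event. */
static void kmalloc_model(uint32_t gfp_flags)
{
	trace_kmalloc(gfp_flags);
}

/* The special version tracing code would use: same path, flag OR'd in. */
static void kmalloc_notrace_model(uint32_t gfp_flags)
{
	kmalloc_model(gfp_flags | ___GFP_NOTRACE);
}
```

A normal allocation fires the event; an allocation made from inside tracing code carries the flag and is skipped, breaking the kmalloc()->trace_kmalloc()->kmalloc() loop.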
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
---
include/linux/gfp.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index b840e3b..acbbc7c 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -34,6 +34,7 @@ struct vm_area_struct;
#define ___GFP_NO_KSWAPD 0x400000u
#define ___GFP_OTHER_NODE 0x800000u
#define ___GFP_WRITE 0x1000000u
+#define ___GFP_NOTRACE 0x2000000u
/* If the above are modified, __GFP_BITS_SHIFT may need updating */
/*
@@ -97,7 +98,7 @@ struct vm_area_struct;
*/
#define __GFP_NOTRACK_FALSE_POSITIVE (__GFP_NOTRACK)
-#define __GFP_BITS_SHIFT 25 /* Room for N __GFP_FOO bits */
+#define __GFP_BITS_SHIFT 26 /* Room for N __GFP_FOO bits */
#define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
/* This equals 0, but use constants in case they ever change */
--
1.9.3
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 07/15] mm: Add ___GFP_NOTRACE
2015-03-02 16:01 ` [PATCH 07/15] mm: Add ___GFP_NOTRACE Tom Zanussi
@ 2015-03-02 16:37 ` Steven Rostedt
2015-03-02 16:46 ` Tom Zanussi
0 siblings, 1 reply; 16+ messages in thread
From: Steven Rostedt @ 2015-03-02 16:37 UTC (permalink / raw)
To: Tom Zanussi; +Cc: masami.hiramatsu.pt, namhyung, andi, ast, linux-kernel
On Mon, 2 Mar 2015 10:01:00 -0600
Tom Zanussi <tom.zanussi@linux.intel.com> wrote:
> Add a gfp flag that allows kmalloc() et al to be used in tracing
> functions.
>
> The problem with using kmalloc for tracing is that the tracing
> subsystem should be able to trace kmalloc itself, which it can't do
> directly because of paths like kmalloc()->trace_kmalloc()->kmalloc()
> or kmalloc()->trace_mm_page_alloc()->kmalloc().
This part I don't like at all. Why can't the memory be preallocated
when the hist is created (the echo 'hist:...')?
kmalloc must never be called from any tracepoint callback.
This change is currently a showstopper.
-- Steve
>
> With this flag, tracing code could use a special version of kmalloc()
> that sets __GFP_NOTRACE on every allocation it does, while leaving the
> normal kmalloc() path untouched.
>
> This would allow any tracepoint in the kmalloc path to be avoided via
> DEFINE_EVENT_CONDITION() redefinitions of those events, which check
> for ___GFP_NOTRACE immediately in their execution and break if set,
> thereby avoiding the recursion.
>
* Re: [PATCH 07/15] mm: Add ___GFP_NOTRACE
2015-03-02 16:37 ` Steven Rostedt
@ 2015-03-02 16:46 ` Tom Zanussi
2015-03-02 17:58 ` Alexei Starovoitov
0 siblings, 1 reply; 16+ messages in thread
From: Tom Zanussi @ 2015-03-02 16:46 UTC (permalink / raw)
To: Steven Rostedt; +Cc: masami.hiramatsu.pt, namhyung, andi, ast, linux-kernel
On Mon, 2015-03-02 at 11:37 -0500, Steven Rostedt wrote:
> On Mon, 2 Mar 2015 10:01:00 -0600
> Tom Zanussi <tom.zanussi@linux.intel.com> wrote:
>
> > Add a gfp flag that allows kmalloc() et al to be used in tracing
> > functions.
> >
> > The problem with using kmalloc for tracing is that the tracing
> > subsystem should be able to trace kmalloc itself, which it can't do
> > directly because of paths like kmalloc()->trace_kmalloc()->kmalloc()
> > or kmalloc()->trace_mm_page_alloc()->kmalloc().
>
> This part I don't like at all. Why can't the memory be preallocated
> when the hist is created (the echo 'hist:...')?
>
Yeah, I didn't like it either. My original version did exactly what you
suggest and preallocated an array of entries to 'allocate' from in order
to avoid the problem.
But I wanted to attempt to use the bpf_map directly, which already uses
kmalloc internally. My fallback in case this wouldn't fly, which it
obviously won't, would be to add an option to have the bpf_map code
preallocate a maximum number of entries or pass in a client-owned array
for the purpose. I'll do that if I don't hear any better ideas..
Tom
> kmalloc must never be called from any tracepoint callback.
>
> This change is currently a showstopper.
>
> -- Steve
>
>
> >
> > With this flag, tracing code could use a special version of kmalloc()
> > that sets __GFP_NOTRACE on every allocation it does, while leaving the
> > normal kmalloc() path untouched.
> >
> > This would allow any tracepoint in the kmalloc path to be avoided via
> > DEFINE_EVENT_CONDITION() redefinitions of those events, which check
> > for ___GFP_NOTRACE immediately in their execution and break if set,
> > thereby avoiding the recursion.
> >
* Re: [PATCH 07/15] mm: Add ___GFP_NOTRACE
2015-03-02 16:46 ` Tom Zanussi
@ 2015-03-02 17:58 ` Alexei Starovoitov
2015-03-02 18:03 ` Tom Zanussi
0 siblings, 1 reply; 16+ messages in thread
From: Alexei Starovoitov @ 2015-03-02 17:58 UTC (permalink / raw)
To: Tom Zanussi
Cc: Steven Rostedt, Masami Hiramatsu, Namhyung Kim, Andi Kleen, LKML
On Mon, Mar 2, 2015 at 8:46 AM, Tom Zanussi <tom.zanussi@linux.intel.com> wrote:
> On Mon, 2015-03-02 at 11:37 -0500, Steven Rostedt wrote:
>> On Mon, 2 Mar 2015 10:01:00 -0600
>> Tom Zanussi <tom.zanussi@linux.intel.com> wrote:
>>
>> > Add a gfp flag that allows kmalloc() et al to be used in tracing
>> > functions.
>> >
>> > The problem with using kmalloc for tracing is that the tracing
>> > subsystem should be able to trace kmalloc itself, which it can't do
>> > directly because of paths like kmalloc()->trace_kmalloc()->kmalloc()
>> > or kmalloc()->trace_mm_page_alloc()->kmalloc().
>>
>> This part I don't like at all. Why can't the memory be preallocated
>> when the hist is created (the echo 'hist:...')?
>>
>
> Yeah, I didn't like it either. My original version did exactly what you
> suggest and preallocated an array of entries to 'allocate' from in order
> to avoid the problem.
>
> But I wanted to attempt to use the bpf_map directly, which already uses
> kmalloc internally. My fallback in case this wouldn't fly, which it
> obviously won't, would be to add an option to have the bpf_map code
> preallocate a maximum number of entries or pass in a client-owned array
> for the purpose. I'll do that if I don't hear any better ideas..
Tom, I'm still reading through the patch set.
Quick comment for the above.
Currently there are two map types: array and hash.
array type is pre-allocating all memory at map creation time.
hash is allocating on demand.
* Re: [PATCH 07/15] mm: Add ___GFP_NOTRACE
2015-03-02 17:58 ` Alexei Starovoitov
@ 2015-03-02 18:03 ` Tom Zanussi
2015-03-02 18:12 ` Alexei Starovoitov
0 siblings, 1 reply; 16+ messages in thread
From: Tom Zanussi @ 2015-03-02 18:03 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Steven Rostedt, Masami Hiramatsu, Namhyung Kim, Andi Kleen, LKML
On Mon, 2015-03-02 at 09:58 -0800, Alexei Starovoitov wrote:
> On Mon, Mar 2, 2015 at 8:46 AM, Tom Zanussi <tom.zanussi@linux.intel.com> wrote:
> > On Mon, 2015-03-02 at 11:37 -0500, Steven Rostedt wrote:
> >> On Mon, 2 Mar 2015 10:01:00 -0600
> >> Tom Zanussi <tom.zanussi@linux.intel.com> wrote:
> >>
> >> > Add a gfp flag that allows kmalloc() et al to be used in tracing
> >> > functions.
> >> >
> >> > The problem with using kmalloc for tracing is that the tracing
> >> > subsystem should be able to trace kmalloc itself, which it can't do
> >> > directly because of paths like kmalloc()->trace_kmalloc()->kmalloc()
> >> > or kmalloc()->trace_mm_page_alloc()->kmalloc().
> >>
> >> This part I don't like at all. Why can't the memory be preallocated
> >> when the hist is created (the echo 'hist:...')?
> >>
> >
> > Yeah, I didn't like it either. My original version did exactly what you
> > suggest and preallocated an array of entries to 'allocate' from in order
> > to avoid the problem.
> >
> > But I wanted to attempt to use the bpf_map directly, which already uses
> > kmalloc internally. My fallback in case this wouldn't fly, which it
> > obviously won't, would be to add an option to have the bpf_map code
> > preallocate a maximum number of entries or pass in a client-owned array
> > for the purpose. I'll do that if I don't hear any better ideas..
>
> Tom, I'm still reading through the patch set.
> Quick comment for the above.
> Currently there are two map types: array and hash.
> array type is pre-allocating all memory at map creation time.
> hash is allocating on demand.
OK, so would it make sense to do the same for the hash type, or at least
add an option that does that?
Tom
* Re: [PATCH 07/15] mm: Add ___GFP_NOTRACE
2015-03-02 18:03 ` Tom Zanussi
@ 2015-03-02 18:12 ` Alexei Starovoitov
2015-03-02 18:25 ` Tom Zanussi
2015-03-02 18:43 ` Steven Rostedt
0 siblings, 2 replies; 16+ messages in thread
From: Alexei Starovoitov @ 2015-03-02 18:12 UTC (permalink / raw)
To: Tom Zanussi
Cc: Steven Rostedt, Masami Hiramatsu, Namhyung Kim, Andi Kleen, LKML
On Mon, Mar 2, 2015 at 10:03 AM, Tom Zanussi
<tom.zanussi@linux.intel.com> wrote:
> On Mon, 2015-03-02 at 09:58 -0800, Alexei Starovoitov wrote:
>> On Mon, Mar 2, 2015 at 8:46 AM, Tom Zanussi <tom.zanussi@linux.intel.com> wrote:
>> > On Mon, 2015-03-02 at 11:37 -0500, Steven Rostedt wrote:
>> >> On Mon, 2 Mar 2015 10:01:00 -0600
>> >> Tom Zanussi <tom.zanussi@linux.intel.com> wrote:
>> >>
>> >> > Add a gfp flag that allows kmalloc() et al to be used in tracing
>> >> > functions.
>> >> >
>> >> > The problem with using kmalloc for tracing is that the tracing
>> >> > subsystem should be able to trace kmalloc itself, which it can't do
>> >> > directly because of paths like kmalloc()->trace_kmalloc()->kmalloc()
>> >> > or kmalloc()->trace_mm_page_alloc()->kmalloc().
>> >>
>> >> This part I don't like at all. Why can't the memory be preallocated
>> >> when the hist is created (the echo 'hist:...')?
>> >>
>> >
>> > Yeah, I didn't like it either. My original version did exactly what you
>> > suggest and preallocated an array of entries to 'allocate' from in order
>> > to avoid the problem.
>> >
>> > But I wanted to attempt to use the bpf_map directly, which already uses
>> > kmalloc internally. My fallback in case this wouldn't fly, which it
>> > obviously won't, would be to add an option to have the bpf_map code
>> > preallocate a maximum number of entries or pass in a client-owned array
>> > for the purpose. I'll do that if I don't hear any better ideas..
>>
>> Tom, I'm still reading through the patch set.
>> Quick comment for the above.
>> Currently there are two map types: array and hash.
>> array type is pre-allocating all memory at map creation time.
>> hash is allocating on demand.
>
> OK, so would it make sense to do the same for the hash type, or at least
> add an option that does that?
I'm not sure what would be the meaning of hash map that has all
elements pre-allocated...
As I'm reading your cover letter, I agree, we need to find a way
to call kmalloc_notrace-like from tracepoints.
Not sure that patch 8 style of duplicating the functions is clean.
Can we keep kmalloc/kfree as-is and do something like
if (in_tracepoint()) check inside ftrace_raw_kmalloc* ?
so that kmalloc will be traced but calls to kmalloc from inside
tracepoints will be automatically suppressed ?
* Re: [PATCH 07/15] mm: Add ___GFP_NOTRACE
2015-03-02 18:12 ` Alexei Starovoitov
@ 2015-03-02 18:25 ` Tom Zanussi
2015-03-02 18:43 ` Steven Rostedt
1 sibling, 0 replies; 16+ messages in thread
From: Tom Zanussi @ 2015-03-02 18:25 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Steven Rostedt, Masami Hiramatsu, Namhyung Kim, Andi Kleen, LKML
On Mon, 2015-03-02 at 10:12 -0800, Alexei Starovoitov wrote:
> On Mon, Mar 2, 2015 at 10:03 AM, Tom Zanussi
> <tom.zanussi@linux.intel.com> wrote:
> > On Mon, 2015-03-02 at 09:58 -0800, Alexei Starovoitov wrote:
> >> On Mon, Mar 2, 2015 at 8:46 AM, Tom Zanussi <tom.zanussi@linux.intel.com> wrote:
> >> > On Mon, 2015-03-02 at 11:37 -0500, Steven Rostedt wrote:
> >> >> On Mon, 2 Mar 2015 10:01:00 -0600
> >> >> Tom Zanussi <tom.zanussi@linux.intel.com> wrote:
> >> >>
> >> >> > Add a gfp flag that allows kmalloc() et al to be used in tracing
> >> >> > functions.
> >> >> >
> >> >> > The problem with using kmalloc for tracing is that the tracing
> >> >> > subsystem should be able to trace kmalloc itself, which it can't do
> >> >> > directly because of paths like kmalloc()->trace_kmalloc()->kmalloc()
> >> >> > or kmalloc()->trace_mm_page_alloc()->kmalloc().
> >> >>
> >> >> This part I don't like at all. Why can't the memory be preallocated
> >> >> when the hist is created (the echo 'hist:...')?
> >> >>
> >> >
> >> > Yeah, I didn't like it either. My original version did exactly what you
> >> > suggest and preallocated an array of entries to 'allocate' from in order
> >> > to avoid the problem.
> >> >
> >> > But I wanted to attempt to use the bpf_map directly, which already uses
> >> > kmalloc internally. My fallback in case this wouldn't fly, which it
> >> > obviously won't, would be to add an option to have the bpf_map code
> >> > preallocate a maximum number of entries or pass in a client-owned array
> >> > for the purpose. I'll do that if I don't hear any better ideas..
> >>
> >> Tom, I'm still reading through the patch set.
> >> Quick comment for the above.
> >> Currently there are two map types: array and hash.
> >> array type is pre-allocating all memory at map creation time.
> >> hash is allocating on demand.
> >
> > OK, so would it make sense to do the same for the hash type, or at least
> > add an option that does that?
>
> I'm not sure what would be the meaning of hash map that has all
> elements pre-allocated...
The idea would be that instead of getting your individually kmalloc'ed
elements on-demand from kmalloc while in the handler, you'd get them
from a pool you've pre-allocated when you set up the table. This could
be from a list of individual entries you've already kmalloc'ed ahead of
time, or from an array of n * sizeof(entry).
This would also allow you to avoid GFP_ATOMIC for those.
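A pool like the one described here can be sketched as a free list carved from a single up-front allocation. This is purely illustrative (the names are made up, and a real kernel version would need locking or per-CPU lists), but it shows the shape: all allocation happens at setup time, and the handler only pops entries:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct entry {
	struct entry *next;	/* free-list link, reused as payload space */
	unsigned long key;
	unsigned long count;
};

struct entry_pool {
	struct entry *slab;	/* one array of n * sizeof(entry) */
	struct entry *free;	/* head of the free list */
};

/* Allocate everything when the map/hist is set up, not in the handler. */
static int pool_init(struct entry_pool *p, size_t n)
{
	size_t i;

	p->slab = calloc(n, sizeof(*p->slab));
	if (!p->slab)
		return -1;
	p->free = NULL;
	for (i = 0; i < n; i++) {
		p->slab[i].next = p->free;
		p->free = &p->slab[i];
	}
	return 0;
}

/* Handler-side 'allocation': pop from the free list, never kmalloc. */
static struct entry *pool_get(struct entry_pool *p)
{
	struct entry *e = p->free;

	if (e)
		p->free = e->next;
	return e;
}

static void pool_put(struct entry_pool *p, struct entry *e)
{
	e->next = p->free;
	p->free = e;
}
```

When the pool is exhausted, pool_get() returns NULL and the handler drops the entry, which bounds memory use at the user-defined map limit.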
> As I'm reading your cover letter, I agree, we need to find a way
> to call kmalloc_notrace-like from tracepoints.
> Not sure that patch 8 style of duplicating the functions is clean.
No, it's horrible, but it does the job without changing the normal path
at all.
> Can we keep kmalloc/kfree as-is and do something like
> if (in_tracepoint()) check inside ftrace_raw_kmalloc* ?
Yeah, that's essentially what TP_CONDITION() in patch 8 (Make kmem
memory allocation tracepoints conditional) does.
Tom
> so that kmalloc will be traced but calls to kmalloc from inside
> tracepoints will be automatically suppressed ?
* Re: [PATCH 07/15] mm: Add ___GFP_NOTRACE
@ 2015-03-02 18:40 Alexei Starovoitov
2015-03-02 18:54 ` Tom Zanussi
0 siblings, 1 reply; 16+ messages in thread
From: Alexei Starovoitov @ 2015-03-02 18:40 UTC (permalink / raw)
To: Tom Zanussi
Cc: Steven Rostedt, Masami Hiramatsu, Namhyung Kim, Andi Kleen, LKML
On Mon, Mar 2, 2015 at 10:25 AM, Tom Zanussi
<tom.zanussi@linux.intel.com> wrote:
> On Mon, 2015-03-02 at 10:12 -0800, Alexei Starovoitov wrote:
>> On Mon, Mar 2, 2015 at 10:03 AM, Tom Zanussi
>> <tom.zanussi@linux.intel.com> wrote:
>> > On Mon, 2015-03-02 at 09:58 -0800, Alexei Starovoitov wrote:
>> >> On Mon, Mar 2, 2015 at 8:46 AM, Tom Zanussi <tom.zanussi@linux.intel.com> wrote:
>> >> > On Mon, 2015-03-02 at 11:37 -0500, Steven Rostedt wrote:
>> >> >> On Mon, 2 Mar 2015 10:01:00 -0600
>> >> >> Tom Zanussi <tom.zanussi@linux.intel.com> wrote:
>> >> >>
>> >> >> > Add a gfp flag that allows kmalloc() et al to be used in tracing
>> >> >> > functions.
>> >> >> >
>> >> >> > The problem with using kmalloc for tracing is that the tracing
>> >> >> > subsystem should be able to trace kmalloc itself, which it can't do
>> >> >> > directly because of paths like kmalloc()->trace_kmalloc()->kmalloc()
>> >> >> > or kmalloc()->trace_mm_page_alloc()->kmalloc().
>> >> >>
>> >> >> This part I don't like at all. Why can't the memory be preallocated
>> >> >> when the hist is created (the echo 'hist:...')?
>> >> >>
>> >> >
>> >> > Yeah, I didn't like it either. My original version did exactly what you
>> >> > suggest and preallocated an array of entries to 'allocate' from in order
>> >> > to avoid the problem.
>> >> >
>> >> > But I wanted to attempt to use the bpf_map directly, which already uses
>> >> > kmalloc internally. My fallback in case this wouldn't fly, which it
>> >> > obviously won't, would be to add an option to have the bpf_map code
>> >> > preallocate a maximum number of entries or pass in a client-owned array
>> >> > for the purpose. I'll do that if I don't hear any better ideas..
>> >>
>> >> Tom, I'm still reading through the patch set.
>> >> Quick comment for the above.
>> >> Currently there are two map types: array and hash.
>> >> array type is pre-allocating all memory at map creation time.
>> >> hash is allocating on demand.
>> >
>> > OK, so would it make sense to do the same for the hash type, or at least
>> > add an option that does that?
>>
>> I'm not sure what would be the meaning of hash map that has all
>> elements pre-allocated...
>
> The idea would be that instead of getting your individually kmalloc'ed
> elements on-demand from kmalloc while in the handler, you'd get them
> from a pool you've pre-allocated when you set up the table. This could
> be from a list of individual entries you've already kmalloc'ed ahead of
> time, or from an array of n * sizeof(entry).
would work, but kinda ugly, since we will pre-allocate a lot
and may not be using it at all.
> This would also allow you to avoid GFP_ATOMIC for those.
>
>> As I'm reading your cover letter, I agree, we need to find a way
>> to call kmalloc_notrace-like from tracepoints.
>> Not sure that patch 8 style of duplicating the functions is clean.
>
> No, it's horrible, but it does the job without changing the normal path
> at all.
>
>> Can we keep kmalloc/kfree as-is and do something like
>> if (in_tracepoint()) check inside ftrace_raw_kmalloc* ?
>
> Yeah, that's essentially what TP_CONDITION() in patch 8 (Make kmem
> memory allocation tracepoints conditional) does.
not quite, I mean something like a global kmalloc recursion flag,
so kmalloc itself doesn't need to change.
ftrace_raw_event_kmem_alloc() would use a flag
inside the ftrace_event_call struct, or a global recursion flag.
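The recursion-flag idea might look roughly like this sketch. It is illustrative only: in the kernel the flag would be per-CPU (or live in ftrace_event_call), and the function names here are invented models of the real paths:

```c
#include <assert.h>

static int in_kmem_trace;	/* kernel version: per-CPU, not a global */
static int events_logged;

static void kmalloc_model(void);

/* The tracepoint callback sets the flag around its body, so any
 * kmalloc it performs sees the flag and skips re-entering tracing. */
static void ftrace_raw_event_kmem_alloc(void)
{
	if (in_kmem_trace)
		return;		/* recursive call from inside the callback */
	in_kmem_trace = 1;
	events_logged++;
	kmalloc_model();	/* e.g. a map insert that allocates */
	in_kmem_trace = 0;
}

/* Models kmalloc(): unchanged, always fires its tracepoint. */
static void kmalloc_model(void)
{
	ftrace_raw_event_kmem_alloc();
}
```

The outer allocation is traced once; the allocation made from inside the callback is suppressed by the flag rather than by a gfp bit.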
* Re: [PATCH 07/15] mm: Add ___GFP_NOTRACE
2015-03-02 18:12 ` Alexei Starovoitov
2015-03-02 18:25 ` Tom Zanussi
@ 2015-03-02 18:43 ` Steven Rostedt
1 sibling, 0 replies; 16+ messages in thread
From: Steven Rostedt @ 2015-03-02 18:43 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Tom Zanussi, Masami Hiramatsu, Namhyung Kim, Andi Kleen, LKML
On Mon, 2 Mar 2015 10:12:32 -0800
Alexei Starovoitov <ast@plumgrid.com> wrote:
> I'm not sure what would be the meaning of hash map that has all
> elements pre-allocated...
> As I'm reading your cover letter, I agree, we need to find a way
> to call kmalloc_notrace-like from tracepoints.
> Not sure that patch 8 style of duplicating the functions is clean.
> Can we keep kmalloc/kfree as-is and do something like
> if (in_tracepoint()) check inside ftrace_raw_kmalloc* ?
I'm strongly against this. You should not be doing anything in a
tracepoint that you can't do from NMI context. And calling kmalloc
happens to be one of them.
Not to mention, kmalloc is a hot path, and tracing must not have any
impact on it (no extra if statements).
-- Steve
> so that kmalloc will be traced but calls to kmalloc from inside
> tracepoints will be automatically suppressed ?
* Re: [PATCH 07/15] mm: Add ___GFP_NOTRACE
2015-03-02 18:40 Alexei Starovoitov
@ 2015-03-02 18:54 ` Tom Zanussi
0 siblings, 0 replies; 16+ messages in thread
From: Tom Zanussi @ 2015-03-02 18:54 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Steven Rostedt, Masami Hiramatsu, Namhyung Kim, Andi Kleen, LKML
On Mon, 2015-03-02 at 10:40 -0800, Alexei Starovoitov wrote:
> On Mon, Mar 2, 2015 at 10:25 AM, Tom Zanussi
> <tom.zanussi@linux.intel.com> wrote:
> > On Mon, 2015-03-02 at 10:12 -0800, Alexei Starovoitov wrote:
> >> On Mon, Mar 2, 2015 at 10:03 AM, Tom Zanussi
> >> <tom.zanussi@linux.intel.com> wrote:
> >> > On Mon, 2015-03-02 at 09:58 -0800, Alexei Starovoitov wrote:
> >> >> On Mon, Mar 2, 2015 at 8:46 AM, Tom Zanussi <tom.zanussi@linux.intel.com> wrote:
> >> >> > On Mon, 2015-03-02 at 11:37 -0500, Steven Rostedt wrote:
> >> >> >> On Mon, 2 Mar 2015 10:01:00 -0600
> >> >> >> Tom Zanussi <tom.zanussi@linux.intel.com> wrote:
> >> >> >>
> >> >> >> > Add a gfp flag that allows kmalloc() et al to be used in tracing
> >> >> >> > functions.
> >> >> >> >
> >> >> >> > The problem with using kmalloc for tracing is that the tracing
> >> >> >> > subsystem should be able to trace kmalloc itself, which it can't do
> >> >> >> > directly because of paths like kmalloc()->trace_kmalloc()->kmalloc()
> >> >> >> > or kmalloc()->trace_mm_page_alloc()->kmalloc().
> >> >> >>
> >> >> >> This part I don't like at all. Why can't the memory be preallocated
> >> >> >> when the hist is created (the echo 'hist:...')?
> >> >> >>
> >> >> >
> >> >> > Yeah, I didn't like it either. My original version did exactly what you
> >> >> > suggest and preallocated an array of entries to 'allocate' from in order
> >> >> > to avoid the problem.
> >> >> >
> >> >> > But I wanted to attempt to use the bpf_map directly, which already uses
> >> >> > kmalloc internally. My fallback in case this wouldn't fly, which it
> >> >> > obviously won't, would be to add an option to have the bpf_map code
> >> >> > preallocate a maximum number of entries or pass in a client-owned array
> >> >> > for the purpose. I'll do that if I don't hear any better ideas..
> >> >>
> >> >> Tom, I'm still reading through the patch set.
> >> >> Quick comment for the above.
> >> >> Currently there are two map types: array and hash.
> >> >> array type is pre-allocating all memory at map creation time.
> >> >> hash is allocating on demand.
> >> >
> >> > OK, so would it make sense to do the same for the hash type, or at least
> >> > add an option that does that?
> >>
> >> I'm not sure what would be the meaning of hash map that has all
> >> elements pre-allocated...
> >
> > The idea would be that instead of getting your individually kmalloc'ed
> > elements on-demand from kmalloc while in the handler, you'd get them
> > from a pool you've pre-allocated when you set up the table. This could
> > be from a list of individual entries you've already kmalloc'ed ahead of
> > time, or from an array of n * sizeof(entry).
>
> would work, but kinda ugly, since we will pre-allocate a lot
> and may not be using it at all.
>
That's true but you have a user-defined map limit anyway, which you can
adjust to minimize wastage. And allocating ahead of time also means you
perturb the system less while actually tracing.
Tom
* Re: [PATCH 07/15] mm: Add ___GFP_NOTRACE
@ 2015-03-02 19:24 Alexei Starovoitov
2015-03-02 19:33 ` Steven Rostedt
0 siblings, 1 reply; 16+ messages in thread
From: Alexei Starovoitov @ 2015-03-02 19:24 UTC (permalink / raw)
To: Steven Rostedt
Cc: Tom Zanussi, Masami Hiramatsu, Namhyung Kim, Andi Kleen, LKML
On Mon, Mar 2, 2015 at 10:43 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Mon, 2 Mar 2015 10:12:32 -0800
> Alexei Starovoitov <ast@plumgrid.com> wrote:
>
>> I'm not sure what would be the meaning of hash map that has all
>> elements pre-allocated...
>> As I'm reading your cover letter, I agree, we need to find a way
>> to call kmalloc_notrace-like from tracepoints.
>> Not sure that patch 8 style of duplicating the functions is clean.
>> Can we keep kmalloc/kfree as-is and do something like
>> if (in_tracepoint()) check inside ftrace_raw_kmalloc* ?
>
> I'm strongly against this. You should not be doing anything in a
> tracepoint that you can't do from NMI context. And calling kmalloc
> happens to be one of them.
well, percentage of tracepoints called from NMI is tiny
compared to the rest, so assuming NMI context
everywhere is very inefficient.
Like we can use pre-allocated pool of map entries when
tracepoint is called from NMI, but we shouldn't be using
it in other cases. Just like ring buffers and other things
have nmi and non-nmi pools and code paths, it doesn't
make sense to disallow kmalloc all together.
btw, calling kmalloc is _faster_ than taking
objects from cache-cold special nmi only pool.
* Re: [PATCH 07/15] mm: Add ___GFP_NOTRACE
@ 2015-03-02 19:29 Alexei Starovoitov
2015-03-02 22:04 ` Tom Zanussi
0 siblings, 1 reply; 16+ messages in thread
From: Alexei Starovoitov @ 2015-03-02 19:29 UTC (permalink / raw)
To: Tom Zanussi
Cc: Steven Rostedt, Masami Hiramatsu, Namhyung Kim, Andi Kleen, LKML
On Mon, Mar 2, 2015 at 10:54 AM, Tom Zanussi
<tom.zanussi@linux.intel.com> wrote:
>> >
>> > The idea would be that instead of getting your individually kmalloc'ed
>> > elements on-demand from kmalloc while in the handler, you'd get them
>> > from a pool you've pre-allocated when you set up the table. This could
>> > be from a list of individual entries you've already kmalloc'ed ahead of
>> > time, or from an array of n * sizeof(entry).
>>
>> would work, but kinda ugly, since we will pre-allocate a lot
>> and may not be using it at all.
>>
>
> That's true but you have a user-defined map limit anyway, which you can
> adjust to minimize wastage. And allocating ahead of time also means you
> perturb the system less while actually tracing.
nope. it's the other way around.
using kmalloc is faster and has less overhead on the whole system
than grabbing cache-cold objects from a special pool.
* Re: [PATCH 07/15] mm: Add ___GFP_NOTRACE
2015-03-02 19:24 Alexei Starovoitov
@ 2015-03-02 19:33 ` Steven Rostedt
0 siblings, 0 replies; 16+ messages in thread
From: Steven Rostedt @ 2015-03-02 19:33 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Tom Zanussi, Masami Hiramatsu, Namhyung Kim, Andi Kleen, LKML
On Mon, 2 Mar 2015 11:24:04 -0800
Alexei Starovoitov <ast@plumgrid.com> wrote:
> well, percentage of tracepoints called from NMI is tiny
> compared to the rest, so assuming NMI context
> everywhere is very inefficient.
> Like we can use pre-allocated pool of map entries when
> tracepoint is called from NMI, but we shouldn't be using
> it in other cases. Just like ring buffers and other things
> have nmi and non-nmi pools and code paths, it doesn't
> make sense to disallow kmalloc all together.
> btw, calling kmalloc is _faster_ than taking
> objects from cache-cold special nmi only pool.
Please show the numbers and post the tests when stating something like
that.
-- Steve
* Re: [PATCH 07/15] mm: Add ___GFP_NOTRACE
@ 2015-03-02 19:52 Alexei Starovoitov
0 siblings, 0 replies; 16+ messages in thread
From: Alexei Starovoitov @ 2015-03-02 19:52 UTC (permalink / raw)
To: Steven Rostedt
Cc: Tom Zanussi, Masami Hiramatsu, Namhyung Kim, Andi Kleen, LKML
On Mon, Mar 2, 2015 at 11:33 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Mon, 2 Mar 2015 11:24:04 -0800
> Alexei Starovoitov <ast@plumgrid.com> wrote:
>
>> well, percentage of tracepoints called from NMI is tiny
>> compared to the rest, so assuming NMI context
>> everywhere is very inefficient.
>> Like we can use pre-allocated pool of map entries when
>> tracepoint is called from NMI, but we shouldn't be using
>> it in other cases. Just like ring buffers and other things
>> have nmi and non-nmi pools and code paths, it doesn't
>> make sense to disallow kmalloc all together.
>> btw, calling kmalloc is _faster_ than taking
>> objects from cache-cold special nmi only pool.
>
> Please show the numbers and post the tests when stating something like
> that.
sure. here is the work that Jesper is doing to accelerate slub:
http://thread.gmane.org/gmane.linux.kernel.mm/126138
he's measuring a kmalloc/kfree pair at 19ns,
which is way less than the cost of a cache miss on the cold buffer
we'll get from a custom pool. We want to minimize cache
misses, not the absolute number of instructions.
yes, from a custom pool one can receive a buffer a few cycles
faster than from kmalloc, but it will be cold.
the 'prefetch of next' trick that slub is using won't work
for a custom pool. That is the main problem with it.
By using the main allocator the buffers come in hot.
* Re: [PATCH 07/15] mm: Add ___GFP_NOTRACE
2015-03-02 19:29 [PATCH 07/15] mm: Add ___GFP_NOTRACE Alexei Starovoitov
@ 2015-03-02 22:04 ` Tom Zanussi
2015-03-02 23:21 ` Alexei Starovoitov
0 siblings, 1 reply; 16+ messages in thread
From: Tom Zanussi @ 2015-03-02 22:04 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Steven Rostedt, Masami Hiramatsu, Namhyung Kim, Andi Kleen, LKML
On Mon, 2015-03-02 at 11:29 -0800, Alexei Starovoitov wrote:
> On Mon, Mar 2, 2015 at 10:54 AM, Tom Zanussi
> <tom.zanussi@linux.intel.com> wrote:
> >> >
> >> > The idea would be that instead of getting your individually kmalloc'ed
> >> > elements on-demand from kmalloc while in the handler, you'd get them
> >> > from a pool you've pre-allocated when you set up the table. This could
> >> > be from a list of individual entries you've already kmalloc'ed ahead of
> >> > time, or from an array of n * sizeof(entry).
> >>
> >> would work, but kinda ugly, since we will pre-allocate a lot
> >> and may not be using it at all.
> >>
> >
> > That's true but you have a user-defined map limit anyway, which you can
> > adjust to minimize wastage. And allocating ahead of time also means you
> > perturb the system less while actually tracing.
>
> nope. it's the other way around.
> using kmalloc is faster and has less overhead on the whole system
> than grabbing cache-cold objects from a special pool.
Until you start causing GFP_ATOMIC failures for what you're tracing
because your map has grabbed them all...
Tom
* Re: [PATCH 07/15] mm: Add ___GFP_NOTRACE
2015-03-02 22:04 ` Tom Zanussi
@ 2015-03-02 23:21 ` Alexei Starovoitov
0 siblings, 0 replies; 16+ messages in thread
From: Alexei Starovoitov @ 2015-03-02 23:21 UTC (permalink / raw)
To: Tom Zanussi
Cc: Steven Rostedt, Masami Hiramatsu, Namhyung Kim, Andi Kleen, LKML
On Mon, Mar 2, 2015 at 2:04 PM, Tom Zanussi <tom.zanussi@linux.intel.com> wrote:
> On Mon, 2015-03-02 at 11:29 -0800, Alexei Starovoitov wrote:
>> On Mon, Mar 2, 2015 at 10:54 AM, Tom Zanussi
>> <tom.zanussi@linux.intel.com> wrote:
>> >> >
>> >> > The idea would be that instead of getting your individually kmalloc'ed
>> >> > elements on-demand from kmalloc while in the handler, you'd get them
>> >> > from a pool you've pre-allocated when you set up the table. This could
>> >> > be from a list of individual entries you've already kmalloc'ed ahead of
>> >> > time, or from an array of n * sizeof(entry).
>> >>
>> >> would work, but kinda ugly, since we will pre-allocate a lot
>> >> and may not be using it at all.
>> >>
>> >
>> > That's true but you have a user-defined map limit anyway, which you can
>> > adjust to minimize wastage. And allocating ahead of time also means you
>> > perturb the system less while actually tracing.
>>
>> nope. it's the other way around.
>> using kmalloc is faster and has less overhead on the whole system
>> than grabbing cache-cold objects from a special pool.
>
> Until you start causing GFP_ATOMIC failures for what you're tracing
> because your map has grabbed them all...
hmm, you mean kmalloc runs out of memory since we're
trying to trace on a system that has already exhausted all memory?
so then pre-allocating beforehand will help?
yep, that's a good use case.
I'm not against a pre-allocation feature for maps.
My point was that it's not necessary most of the time.
We can add a flag to map_create to pre-allocate. Easily.
It just needs to be controlled by the user.