* tracing vs. string formatting
From: Johannes Berg @ 2011-11-18 9:59 UTC (permalink / raw)
To: Steven Rostedt; +Cc: LKML, Frederic Weisbecker
Hi,
I'm thinking about adding tracing to all debug messages in our driver
(iwlwifi) so we can ask people to record all debug messages (there are
way too many, and some can be removed), and then post-process the log to
show only the ones we care about.
Converting all the messages and/or adding tracing at all the points
where we have messages now would be a huge effort, and sometimes the
messages are all that we can get from users, so we wouldn't want to get
rid of them either.
Therefore I was thinking we should just trace the messages.
The issue I ran into is that obviously almost all messages need string
formatting. But to put them into the trace, we have two choices that I
can see:
1) format the messages into a temporary buffer, then copy into the trace
2) format the messages into the ringbuffer, reserving the max length we
think we might need
Both have obvious problems: the trivial implementation of 1) does the
formatting work even when tracing is disabled, and at best still incurs
an extra copy. 2) means I'd need to reserve say 200 bytes for each entry
and then end up not using a large chunk of it in most cases.
Is there any way to reserve those 200 bytes, but then when we commit
update the trace entry length and also un-reserve whatever size we
didn't need? Then we could directly vsnprintf into the ring-buffer and
use the return value to update from 200 bytes reserved to say 20.
The API I'm envisioning would look a bit like this:
TRACE_EVENT(foo_dbg,
	TP_PROTO(const char *fmt, va_list args),
	TP_ARGS(fmt, args),
	TP_STRUCT__entry(
		/* 200 is the max length we expect/reserve */
		__fmt_string(msg, 200)
	),
	TP_fast_assign(
		__vprint_str(msg, fmt, args);
	),
	TP_printk("%s", __get_str(msg))
);
Behind the scenes somehow the 200 would get reserved, propagated,
__vprint_str would call something like
r = vsnprintf(__get_str(msg), 200, fmt, args);
and use r to shrink the entry if possible.
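[Editor's note: in userspace terms, the reserve-then-shrink scheme can be sketched roughly like this. This is a toy model only, not the real ftrace ring-buffer API; `trace_fmt`, `buf`, `head`, and `MAX_MSG` are made-up names standing in for per-CPU buffer state.]

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Toy flat buffer standing in for one CPU's trace page. */
static char buf[4096];
static size_t head;                    /* next free byte */

#define MAX_MSG 200

/* Reserve MAX_MSG bytes pessimistically, vsnprintf directly into the
 * reserved region, then shrink the reservation to the bytes actually
 * used -- the "commit with smaller length" idea from above. */
static size_t trace_fmt(const char *fmt, ...)
{
	va_list args;
	size_t used;
	char *entry;
	int r;

	if (head + MAX_MSG > sizeof(buf))
		return 0;              /* "page" full; real code moves on */

	entry = buf + head;
	head += MAX_MSG;               /* pessimistic reservation */

	va_start(args, fmt);
	r = vsnprintf(entry, MAX_MSG, fmt, args);
	va_end(args);

	/* Keep r+1 bytes (message + NUL), capped at the reservation. */
	used = (r < 0 || (size_t)r + 1 > MAX_MSG) ? MAX_MSG : (size_t)r + 1;
	head -= MAX_MSG - used;        /* give back the unused tail */
	return used;
}
```

In the real ring buffer the shrink step is only safe if no interrupt wrote an event after the reservation, which is exactly the complication discussed below.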
Now obviously I have no idea if it's even possible to shrink an entry
afterwards :-) It seems like it should be since the local CPU buffer is
locked in some way, but ... what do I know. Interrupts coming in would
probably have to make it not shrink the entry, but that seems feasible
too -- in that case you'd just have the 200 bytes used but not filled
except for the message (though then maybe you have to memset the
remainder to 0).
Does this sound completely stupid? Maybe adding proper tracepoints (is
there a way to cause certain tracepoints to be printk'ed?) would be
better, but converting all the messages into a few hundred tracepoints
would be a huge waste as well.
If you made it this far -- thanks! :-)
johannes
* Re: tracing vs. string formatting
From: Steven Rostedt @ 2011-11-18 13:50 UTC (permalink / raw)
To: Johannes Berg; +Cc: LKML, Frederic Weisbecker
On Fri, 2011-11-18 at 10:59 +0100, Johannes Berg wrote:
> Hi,
>
> I'm thinking about adding tracing to all debug messages in our driver
> (iwlwifi) so we can ask people to record all debug messages (which are
> way too many, some can be removed), and then we could post-process it
> and only show selectively the ones we care about.
>
> Converting all the messages and/or adding tracing at all the points
> where we have messages now would be a huge effort, and sometimes the
> messages are all that we can get from users, so we wouldn't want to get
> rid of them either.
>
> Therefore I was thinking we should just trace the messages.
>
> The issue I ran into is that obviously almost all messages need string
> formatting. But to put them into the trace, we have two choices now that
> I can see:
>
> 1) format the messages into a temporary buffer, then copy into the trace
> 2) format the messages into the ringbuffer, reserving the max length we
> think we might need
I'm confused. Is this a different topic than printk tracing? As with
printk, if you put the tracepoint where I mentioned before, you let
printk figure out the size already, and all it takes is a string and a
length; the ring buffer will allocate just enough to place it there.
But if you are talking about something else, then sure, this is an
issue. Currently trace_printk() uses a temporary buffer. Why waste 200
bytes in the ring buffer that aren't needed?
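[Editor's note: for contrast, option 1) above, the temporary-buffer approach that trace_printk() takes, looks roughly like this in a userspace toy model. Again a sketch only, not kernel code; `ring`, `ring_head`, and `trace_via_scratch` are invented names.]

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Toy flat buffer standing in for the trace ring buffer. */
static char ring[4096];
static size_t ring_head;

/* Option 1: format into a scratch buffer first, then copy exactly the
 * bytes that were produced. Never over-reserves ring-buffer space, but
 * always pays one extra copy. */
static size_t trace_via_scratch(const char *fmt, ...)
{
	char scratch[200];
	va_list args;
	size_t len;
	int r;

	va_start(args, fmt);
	r = vsnprintf(scratch, sizeof(scratch), fmt, args);
	va_end(args);
	if (r < 0)
		return 0;

	/* Message + NUL, capped at the scratch size on truncation. */
	len = (size_t)r + 1 < sizeof(scratch) ? (size_t)r + 1
					      : sizeof(scratch);
	if (ring_head + len > sizeof(ring))
		return 0;              /* "page" full */

	memcpy(ring + ring_head, scratch, len);  /* the extra copy */
	ring_head += len;
	return len;
}
```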
>
> Both have obvious problems: The trivial implementation of 1) is stupid
> if tracing is disabled and at least incurs an extra copy. 2) means that
> I need to reserve say 200 bytes for each entry and then will end up not
> using a large chunk of it in most cases.
>
> Is there any way to reserve those 200 bytes, but then when we commit
> update the trace entry length and also un-reserve whatever size we
> didn't need? Then we could directly vsnprintf into the ring-buffer and
> use the return value to update from 200 bytes reserved to say 20.
>
> The API I'm envisioning would look a bit like this:
>
> TRACE_EVENT(foo_dbg,
> 	TP_PROTO(const char *fmt, va_list args),
> 	TP_ARGS(fmt, args),
> 	TP_STRUCT__entry(
> 		/* 200 is the max length we expect/reserve */
> 		__fmt_string(msg, 200)
> 	),
> 	TP_fast_assign(
> 		__vprint_str(msg, fmt, args);
> 	),
> 	TP_printk("%s", __get_str(msg))
> );
>
>
> Behind the scenes somehow the 200 would get reserved, propagated,
> __vprint_str would call something like
> r = vsnprintf(__get_str(msg), 200, fmt, args);
>
> and use r to shrink the entry if possible.
This is interesting. Now we could add an API to ftrace (and maybe even
perf?) that can shrink the buffer if the space isn't needed. We already
do something like that when we decide to filter out the trace point: the
allocated buffer is freed if no interrupt came in and added data after
it. If one did, then we just convert the event into dead space.
But another issue with this comes up when you get to the end of a buffer
page (the internal buffers are currently just page-sized). If you don't
have enough space, the event goes to the next buffer page, and the space
at the end of the previous page is wasted and can't be reclaimed.
>
>
>
> Now obviously I have no idea if it's even possible to shrink an entry
> afterwards :-) It seems like it should be since the local CPU buffer is
> locked in some way, but ... what do I know. Interrupts coming in would
> probably have to make it not shrink the entry, but that seems feasible
> too -- in that case you'd just have the 200 bytes used but not filled
> except for the message (though then maybe you have to memset the
> remainder to 0).
>
>
> Does this sound completely stupid? Yes maybe adding tracing (is there a
> way to cause certain tracepoints to be printk'ed?) would be better but
> converting all the messages to like a few hundred tracepoints would be a
> huge waste as well.
Wait? I'm confused again. Maybe I read you wrong. Do you mean to make
trace points printk'd? I think just make the current printk into a
tracepoint and be done with it. But it would be all printks or none. We
even still have ftrace_dump_on_oops if we need this information on a
system crash.
>
>
> If you made it this far -- thanks! :-)
I did, but I'm still on meds ;)
-- Steve
* Re: tracing vs. string formatting
From: Johannes Berg @ 2011-11-18 14:05 UTC (permalink / raw)
To: Steven Rostedt; +Cc: LKML, Frederic Weisbecker
On Fri, 2011-11-18 at 08:50 -0500, Steven Rostedt wrote:
> > The issue I ran into is that obviously almost all messages need string
> > formatting. But to put them into the trace, we have two choices now that
> > I can see:
> >
> > 1) format the messages into a temporary buffer, then copy into the trace
> > 2) format the messages into the ringbuffer, reserving the max length we
> > think we might need
>
> I'm confused? Is this a different topic than printk tracing? As with
> printk, if you put the trace point where I mentioned before, you let
> printk figure out the size already and all it takes is a string and
> length, and it will allocate just enough for the ring buffer to place it
> there.
Yes, it's a different topic. With printk tracing I can already get all
those messages today (using the trace console I posted initially, which
can even be done out of tree). But if at the same time you have a slow
serial console connected, printk'ing all the messages to the console can
be expensive.
> But if you are talking about something else, then sure this is an issue.
> Currently trace_printk() uses a temporary buffer. Why waste 200 bytes in
> the ring buffer that's not needed.
I hadn't even thought about trace_printk, but yeah, d'oh, it obviously
has the same issue.
> > TRACE_EVENT(foo_dbg,
> > 	TP_PROTO(const char *fmt, va_list args),
> > 	TP_ARGS(fmt, args),
> > 	TP_STRUCT__entry(
> > 		/* 200 is the max length we expect/reserve */
> > 		__fmt_string(msg, 200)
> > 	),
> > 	TP_fast_assign(
> > 		__vprint_str(msg, fmt, args);
> > 	),
> > 	TP_printk("%s", __get_str(msg))
> > );
> >
> >
> > Behind the scenes somehow the 200 would get reserved, propagated,
> > __vprint_str would call something like
> > r = vsnprintf(__get_str(msg), 200, fmt, args);
> >
> > and use r to shrink the entry if possible.
>
> This is interesting. Now we could add an API to ftrace (and maybe even
> perf?) that can shrink the buffer if the space isn't needed. We do that
> already if we decide to filter out the trace point, the allocated buffer
> is freed if no interrupt came in and added data after it. If that
> happens then we just convert the event into dead space.
>
> But another issue with this is that when you get to an end of a buffer
> page (internal buffers which currently are just page sizes). If you
> don't have enough space, it will go to the next buffer page and the
> space on the previous page is now wasted and can't be reclaimed.
Right. That's not *too* bad though, because you're (hopefully!) not
going to use something like 1024 here for the message size.
> > Does this sound completely stupid? Yes maybe adding tracing (is there a
> > way to cause certain tracepoints to be printk'ed?) would be better but
> > converting all the messages to like a few hundred tracepoints would be a
> > huge waste as well.
>
> Wait? I'm confused again. Maybe I read you wrong. Do you mean to make
> trace points printk'd? I think just make the current printk into a
> tracepoint and be done with it. But it would be all printks or none. We
> even still have ftrace_dump_on_oops if we need this information on a
> system crash.
No I'm just thinking into too many different directions at once.
If you assume my mail is about optimising trace_printk() to directly
format into the ring buffer, you'll be much happier ;-)
What my last paragraph was trying to say was: in theory I wouldn't need
to do

	trace_fmt_msg(fmt, va_args);

because I'd have a

	trace_foo(arg1, arg2, arg3);

with the right format in TP_printk(), but I'm trying to get existing
debug infrastructure into the trace *without* getting it into the kernel
messages (!)
johannes