netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
@ 2014-12-17 17:14 Alexei Starovoitov
  2014-12-17 19:51 ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 29+ messages in thread
From: Alexei Starovoitov @ 2014-12-17 17:14 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Martin KaFai Lau, netdev@vger.kernel.org, David S. Miller,
	Hannes Frederic Sowa, Steven Rostedt, Lawrence Brakmo,
	Josef Bacik, Kernel Team

On Wed, Dec 17, 2014 at 7:07 AM, Arnaldo Carvalho de Melo
<arnaldo.melo@gmail.com> wrote:
>
> I guess even just using 'perf probe' to set those wannabe tracepoints
> should be enough, no? Then he can refer to those in his perf record
> call, etc and process it just like with the real tracepoints.

it's far from ideal for two reasons.
- they have different kernels and dragging along vmlinux
with debug info or multiple 'perf list' data is too cumbersome
operationally. Permanent tracepoints solve this problem.
- the action upon hitting tracepoint is non-trivial.
perf probe style of unconditionally walking pointer chains
will be tripping over wrong pointers.
Plus they already need to do aggregation for high
frequency events.
As part of acting on trace_transmit_skb() event:
if (before(tcb->seq, tcp_sk(sk)->snd_nxt)) {
  tcp_trace_stats_add(...)
}
if (jiffies_to_msecs(jiffies - sktr->last_ts) ..) {
  tcp_trace_stats_add(...)
}

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
@ 2014-12-17 20:42 Alexei Starovoitov
  2014-12-17 20:56 ` David Ahern
  2014-12-17 21:19 ` Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 29+ messages in thread
From: Alexei Starovoitov @ 2014-12-17 20:42 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Martin KaFai Lau, netdev@vger.kernel.org, David S. Miller,
	Hannes Frederic Sowa, Steven Rostedt, Lawrence Brakmo,
	Josef Bacik, Kernel Team

On Wed, Dec 17, 2014 at 11:51 AM, Arnaldo Carvalho de Melo
<arnaldo.melo@gmail.com> wrote:
> Em Wed, Dec 17, 2014 at 09:14:02AM -0800, Alexei Starovoitov escreveu:
>> On Wed, Dec 17, 2014 at 7:07 AM, Arnaldo Carvalho de Melo
>> <arnaldo.melo@gmail.com> wrote:
>> > I guess even just using 'perf probe' to set those wannabe tracepoints
>> > should be enough, no? Then he can refer to those in his perf record
>> > call, etc and process it just like with the real tracepoints.
>
>> it's far from ideal for two reasons.
>> - they have different kernels and dragging along vmlinux
>> with debug info or multiple 'perf list' data is too cumbersome
>
> It is not strictly necessary to carry vmlinux, that is just a probe
> point resolution time problem, solvable when generating a shell script,
> on the development machine, to insert the probes.

on N development machines with kernels that
would match worker machines...
I'm not saying it's impossible, just operationally difficult.
This is my understanding of Martin's use case.

>> operationally. Permanent tracepoints solve this problem.
>
> Sure, and when available, use them, my suggestion wasn't to use
> exclusively any mechanism, but to initially use what is available to
> create the tools, then find places that could be improved (if that
> proves to be the case) by using a higher performance mechanism.

agree. I think if kprobe approach was usable, it would have
been used already and yet here you have these patches
that add tracepoints in few strategic places of tcp stack.

>> - the action upon hitting tracepoint is non-trivial.
>> perf probe style of unconditionally walking pointer chains
>> will be tripping over wrong pointers.
>
> Huh? Care to elaborate on this one?

if perf probe does 'result->name' as in your example
then it would work, but patch 5 does conditional
walking of pointers, so you cannot just add
a perf probe that does print(ptr1->value1, ptr2->value2)
It won't crash, but will be collecting wrong stats.
(likely counting zeros)

>> Plus they already need to do aggregation for high
>> frequency events.
>
>> As part of acting on trace_transmit_skb() event:
>> if (before(tcb->seq, tcp_sk(sk)->snd_nxt)) {
>>   tcp_trace_stats_add(...)
>> }
>> if (jiffies_to_msecs(jiffies - sktr->last_ts) ..) {
>>   tcp_trace_stats_add(...)
>> }
>
> But aren't these stats TCP already keeps or could be made to?

that's the whole discussion about.
tcp_info has some of them.
Though it's difficult to claim that, say, tcp_info->tcpi_lost is
the same as loss_segs_retrans from patch 5.

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
@ 2014-12-17  3:06 Alexei Starovoitov
  2014-12-17 21:42 ` Josef Bacik
  0 siblings, 1 reply; 29+ messages in thread
From: Alexei Starovoitov @ 2014-12-17  3:06 UTC (permalink / raw)
  To: Martin Lau
  Cc: Eric Dumazet, Blake Matheny, Laurent Chavey, Yuchung Cheng,
	netdev@vger.kernel.org, David S. Miller, Hannes Frederic Sowa,
	Steven Rostedt, Lawrence Brakmo, Josef Bacik, Kernel Team

On Tue, Dec 16, 2014 at 5:30 PM, Martin Lau <kafai@fb.com> wrote:
>> >> >> I think systemtap like scripting on top of patches 1 and 3
>> >> >> should solve your use case ?
>> > We have quite a few different versions running in the production.  It may not
>> > be operationally easy.
>>
>> different versions of kernel or different versions of tcp_tracer ?
> Former and we are releasing new kernel pretty often.

I see. So for dynamic tracer to be useful in such environment,
the scripts should be compatible across different kernel version
without recompilation. All makes sense.

> How does the current TRACE_EVENT do it when it wants to printf more data?

tracepoints, like any other user interface, shouldn't
break compatibility. With printf it's practically impossible.
Some subsystems may be breaking this rule arguing that
tracepoints is a debug facility, but networking tracepoints don't change.

>> It feels that for stats collection only, tracepoints+tcp_trace
>> do not add much additional value vs extending tcp_info
>> and using ss.
> I think we are on the same page. Once 'this should cost nothing if not
> activated' proposition was cleared out.  It was what I meant that doing the
> collection part in the TCP itself (instead of tracepoints) would be nice.

agree.

> I think going forward, as others have suggested, it may be better to come
> together and reach a common ground on what to collect first before I re-work
> patch 1 to 3 and repost.

I think as a minimum it will be discussed at netdev01 in Feb,
but I suspect not everyone on this list can(want) go to Ottawa,
so would be nice to have a meetup for bay area folks to
discuss this sooner with public g+ hangout.
Thoughts?

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
@ 2014-12-17  0:15 Alexei Starovoitov
  2014-12-17  1:30 ` Martin Lau
  0 siblings, 1 reply; 29+ messages in thread
From: Alexei Starovoitov @ 2014-12-17  0:15 UTC (permalink / raw)
  To: Martin Lau
  Cc: Eric Dumazet, Blake Matheny, Laurent Chavey, Yuchung Cheng,
	netdev@vger.kernel.org, David S. Miller, Hannes Frederic Sowa,
	Steven Rostedt, Lawrence Brakmo, Josef Bacik, Kernel Team

On Tue, Dec 16, 2014 at 10:28 AM, Martin Lau <kafai@fb.com> wrote:
> We can consider to reuse the events's format (tracing/events/*/format). I think
> blktrace.c is using similar approach in trace-cmd.

yes. tcp_trace is a carbon copy of blktrace applied to tcp.

>> >> I think systemtap like scripting on top of patches 1 and 3
>> >> should solve your use case ?
> We have quite a few different versions running in the production.  It may not
> be operationally easy.

different versions of kernel or different versions of tcp_tracer ?

> Having a getsockopt will be useful for the new application/library to take
> advantage of.
>
> For the continuous monitoring/logging purpose, ftrace can provide event
> triggered tracing instead of periodically consulting ss.

so both getsockopt tcp_info approach and ftrace+tcp_trace
approach can provide the same set of stats per flow, right?
And the only difference is 'ss' needs polling and ftrace
collects all events?
Since they're stats anyway, the polling interval
shouldn't matter. Just like lost trace events?

from patch 5 commit log:
"Define probes and register them to the TCP tracepoints.  The probes
collect the data defined in struct tcp_sk_trace and record them to
the tracing's ring_buffer.
"
so two trace_seq_printf() from patch 5
and two new 'struct tcp_trace_stats' and 'tcp_trace_basic'
from patch 4 will become permanent user api.

At the same time the commit log is saying:
"It is still missing a few things that
we currently have, like:
- why the sender is blocked? and how long for each reason?
- some TCP Congestion Control data"

Does it mean that these printf and structs would have
to change?

Can 'struct tcp_info' be extended instead of
adding 'struct tcp_trace_stats' ?

Then getsockopt and ftrace+tcp_trace will be returning
the same structs.

It feels that for stats collection only, tracepoints+tcp_trace
do not add much additional value vs extending tcp_info
and using ss.

I see the value in tracepoints on its own, since we'll
be able to use dynamic tracing to do event aggregation,
filtering, etc. That was my alternative suggestion to
add only tracepoints from patches 1 and 3.

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
@ 2014-12-15  6:55 Alexei Starovoitov
  2014-12-15 16:03 ` Eric Dumazet
  2014-12-17 15:07 ` Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 29+ messages in thread
From: Alexei Starovoitov @ 2014-12-15  6:55 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: netdev@vger.kernel.org, David S. Miller, Hannes Frederic Sowa,
	Steven Rostedt, Lawrence Brakmo, Josef Bacik, Kernel Team

On Sun, Dec 14, 2014 at 5:56 PM, Martin KaFai Lau <kafai@fb.com> wrote:
> Hi,
>
> We have been using the kernel ftrace infra to collect TCP per-flow statistics.
> The following patch set is a first slim-down version of our
> existing implementation. We would like to get some early feedback
> and make it useful for others.
>
> [RFC PATCH net-next 1/5] tcp: Add TCP TRACE_EVENTs:
> Defines some basic tracepoints (by TRACE_EVENT).
>
> [RFC PATCH net-next 2/5] tcp: A perf script for TCP tracepoints:
> A sample perf script with simple ip/port filtering and summary output.
>
> [RFC PATCH net-next 3/5] tcp: Add a few more tracepoints for tcp tracer:
> Declares a few more tracepoints (by DECLARE_TRACE) which are
> used by the tcp_tracer.  The tcp_tracer is in the patch 5/5.
>
> [RFC PATCH net-next 4/5] tcp: Introduce tcp_sk_trace and related structs:
> Defines a few tcp_trace structs which are used to collect statistics
> on each tcp_sock.
>
> [RFC PATCH net-next 5/5] tcp: Add TCP tracer:
> It introduces a tcp_tracer which hooks onto the tracepoints defined in the
> patch 1/5 and 3/5.  It collects data defined in patch 4/5. We currently
> use this tracer to collect per-flow statistics.  The commit log has
> some more details.

I think patches 1 and 3 are good additions, since they establish
few permanent points of instrumentation in tcp stack.
Patches 4-5 look more like use cases of tracepoints established
before. They may feel like simple additions and, no doubt,
they are useful, but since they expose things via tracing
infra they become part of api and cannot be changed later,
when more stats would be needed.
I think systemtap like scripting on top of patches 1 and 3
should solve your use case ?
Also, have you looked at recent eBPF work?
Though it's not completely ready yet, soon it should
be able to do the same stats collection as you have
in 4/5 without adding permanent pieces to the kernel.

^ permalink raw reply	[flat|nested] 29+ messages in thread
* [RFC PATCH net-next 0/5] tcp: TCP tracer
@ 2014-12-15  1:56 Martin KaFai Lau
  0 siblings, 0 replies; 29+ messages in thread
From: Martin KaFai Lau @ 2014-12-15  1:56 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Hannes Frederic Sowa, Steven Rostedt,
	Lawrence Brakmo, Josef Bacik, Kernel Team

Hi,

We have been using the kernel ftrace infra to collect TCP per-flow statistics.
The following patch set is a first slim-down version of our
existing implementation. We would like to get some early feedback
and make it useful for others.

[RFC PATCH net-next 1/5] tcp: Add TCP TRACE_EVENTs:
Defines some basic tracepoints (by TRACE_EVENT).

[RFC PATCH net-next 2/5] tcp: A perf script for TCP tracepoints:
A sample perf script with simple ip/port filtering and summary output.

[RFC PATCH net-next 3/5] tcp: Add a few more tracepoints for tcp tracer:
Declares a few more tracepoints (by DECLARE_TRACE) which are
used by the tcp_tracer.  The tcp_tracer is in the patch 5/5.

[RFC PATCH net-next 4/5] tcp: Introduce tcp_sk_trace and related structs:
Defines a few tcp_trace structs which are used to collect statistics
on each tcp_sock.

[RFC PATCH net-next 5/5] tcp: Add TCP tracer:
It introduces a tcp_tracer which hooks onto the tracepoints defined in the
patch 1/5 and 3/5.  It collects data defined in patch 4/5. We currently
use this tracer to collect per-flow statistics.  The commit log has
some more details.

Thanks,
--Martin

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2014-12-19  1:43 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-12-17 17:14 [RFC PATCH net-next 0/5] tcp: TCP tracer Alexei Starovoitov
2014-12-17 19:51 ` Arnaldo Carvalho de Melo
  -- strict thread matches above, loose matches on Subject: below --
2014-12-17 20:42 Alexei Starovoitov
2014-12-17 20:56 ` David Ahern
2014-12-17 21:24   ` Arnaldo Carvalho de Melo
2014-12-17 21:19 ` Arnaldo Carvalho de Melo
2014-12-17  3:06 Alexei Starovoitov
2014-12-17 21:42 ` Josef Bacik
2014-12-18 23:43   ` Lawrence Brakmo
2014-12-19  1:42     ` Yuchung Cheng
2014-12-17  0:15 Alexei Starovoitov
2014-12-17  1:30 ` Martin Lau
2014-12-15  6:55 Alexei Starovoitov
2014-12-15 16:03 ` Eric Dumazet
2014-12-15 16:08   ` Blake Matheny
2014-12-15 19:56     ` Yuchung Cheng
2014-12-17 20:45       ` rapier
2014-12-16 18:28     ` Martin Lau
2014-12-15 16:42   ` Josef Bacik
2014-12-15 22:01     ` Tom Herbert
2014-12-15 22:17       ` rapier
2014-12-15 22:29       ` Steven Rostedt
2014-12-15 23:28       ` Jamal Hadi Salim
2014-12-15 23:40         ` Eric Dumazet
2014-12-16 22:40     ` Jason Baron
2014-12-16 22:45       ` David Miller
2014-12-16 22:50         ` Hannes Frederic Sowa
2014-12-17 15:07 ` Arnaldo Carvalho de Melo
2014-12-15  1:56 Martin KaFai Lau

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).