* TCP event tracking via netlink...
@ 2007-12-05 13:30 David Miller
2007-12-05 14:11 ` John Heffner
` (2 more replies)
0 siblings, 3 replies; 21+ messages in thread
From: David Miller @ 2007-12-05 13:30 UTC (permalink / raw)
To: ilpo.jarvinen; +Cc: netdev
Ilpo, I was pondering the kind of debugging one does to find
congestion control issues and even SACK bugs and it's currently too
painful because there is no standard way to track state changes.
I assume you're using something like carefully crafted printk's,
kprobes, or even ad-hoc statistic counters. That's what I used to do
:-)
With that in mind it occurred to me that we might want to do something
like a state change event generator.
Basically some application or even a daemon listens on this generic
netlink socket family we create. The header of each event packet
indicates what socket the event is for and then there is some state
information.
Then you can look at a tcpdump and this state dump side by side and
see what the kernel decided to do.
Now there is the question of granularity.
A very important consideration in this is that we want this thing to
be enabled in the distributions, therefore it must be cheap. Perhaps
one test at the end of the packet input processing.
So I say we pick some state to track (perhaps start with tcp_info)
and just push that at the end of every packet input run. Also,
we add some minimal filtering capability (match on specific IP
address and/or port, for example).
Maybe if we want to get really fancy we can have some more-expensive
debug mode where detailed specific events get generated via some
macros we can scatter all over the place. This won't be useful
for general user problem analysis, but it will be excellent for
developers.
Let me know if you think this is useful enough and I'll work on
an implementation we can start playing with.
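To make the "cheap when disabled" requirement above concrete, the hook
could be one predictable test in the input path plus the minimal filter
just described.  A sketch follows; everything in it is illustrative
rather than an existing kernel API (the inet field names follow the
2.6.24-era struct inet_sock, and tcp_log_push_info() is a hypothetical
helper that would push a tcp_info snapshot over netlink):

struct tcp_log_filter {
	__be32	addr;	/* zero matches any address */
	__be16	port;	/* zero matches any port */
};

static struct tcp_log_filter tcp_log_filter;
static int tcp_log_enabled __read_mostly;

static inline void tcp_log_event(struct sock *sk)
{
	const struct inet_sock *inet = inet_sk(sk);

	/* One predictable branch in the fast path when logging is off. */
	if (likely(!tcp_log_enabled))
		return;

	/* Minimal filtering: match on remote address and/or port. */
	if (tcp_log_filter.addr && tcp_log_filter.addr != inet->daddr)
		return;
	if (tcp_log_filter.port && tcp_log_filter.port != inet->dport)
		return;

	tcp_log_push_info(sk);	/* hypothetical: push a tcp_info snapshot */
}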
* Re: TCP event tracking via netlink...
  2007-12-05 13:30 TCP event tracking via netlink David Miller
@ 2007-12-05 14:11 ` John Heffner
  2007-12-05 14:48   ` Evgeniy Polyakov
  2007-12-06  5:00   ` David Miller
  2007-12-05 16:53 ` Joe Perches
  2007-12-05 23:18 ` Ilpo Järvinen
  2 siblings, 2 replies; 21+ messages in thread
From: John Heffner @ 2007-12-05 14:11 UTC (permalink / raw)
To: David Miller; +Cc: ilpo.jarvinen, netdev

David Miller wrote:
> Ilpo, I was pondering the kind of debugging one does to find
> congestion control issues and even SACK bugs and it's currently too
> painful because there is no standard way to track state changes.
>
> I assume you're using something like carefully crafted printk's,
> kprobes, or even ad-hoc statistic counters.  That's what I used to do
> :-)
>
> With that in mind it occurred to me that we might want to do something
> like a state change event generator.
>
> Basically some application or even a daemon listens on this generic
> netlink socket family we create.  The header of each event packet
> indicates what socket the event is for and then there is some state
> information.
>
> Then you can look at a tcpdump and this state dump side by side and
> see what the kernel decided to do.
>
> Now there is the question of granularity.
>
> A very important consideration in this is that we want this thing to
> be enabled in the distributions, therefore it must be cheap.  Perhaps
> one test at the end of the packet input processing.
>
> So I say we pick some state to track (perhaps start with tcp_info)
> and just push that at the end of every packet input run.  Also,
> we add some minimal filtering capability (match on specific IP
> address and/or port, for example).
>
> Maybe if we want to get really fancy we can have some more-expensive
> debug mode where detailed specific events get generated via some
> macros we can scatter all over the place.  This won't be useful
> for general user problem analysis, but it will be excellent for
> developers.
>
> Let me know if you think this is useful enough and I'll work on
> an implementation we can start playing with.

FWIW, sounds similar to what these guys are doing with SIFTR for FreeBSD:
http://caia.swin.edu.au/urp/newtcp/tools.html
http://caia.swin.edu.au/reports/070824A/CAIA-TR-070824A.pdf

  -John
* Re: TCP event tracking via netlink...
  2007-12-05 14:11 ` John Heffner
@ 2007-12-05 14:48 ` Evgeniy Polyakov
  2007-12-05 15:12   ` Samir Bellabes
  2007-12-06  5:03   ` David Miller
  2007-12-06  5:00 ` David Miller
  1 sibling, 2 replies; 21+ messages in thread
From: Evgeniy Polyakov @ 2007-12-05 14:48 UTC (permalink / raw)
To: John Heffner; +Cc: David Miller, ilpo.jarvinen, netdev

Hi.

On Wed, Dec 05, 2007 at 09:11:01AM -0500, John Heffner (jheffner@psc.edu) wrote:
> >Maybe if we want to get really fancy we can have some more-expensive
> >debug mode where detailed specific events get generated via some
> >macros we can scatter all over the place.  This won't be useful
> >for general user problem analysis, but it will be excellent for
> >developers.
> >
> >Let me know if you think this is useful enough and I'll work on
> >an implementation we can start playing with.
>
> FWIW, sounds similar to what these guys are doing with SIFTR for FreeBSD:
> http://caia.swin.edu.au/urp/newtcp/tools.html
> http://caia.swin.edu.au/reports/070824A/CAIA-TR-070824A.pdf

And even more similar to this patch from Samir Bellabes of Mandriva:
http://lwn.net/Articles/202255/

--
	Evgeniy Polyakov
* Re: TCP event tracking via netlink...
  2007-12-05 14:48 ` Evgeniy Polyakov
@ 2007-12-05 15:12 ` Samir Bellabes
  2007-12-06  5:03 ` David Miller
  1 sibling, 0 replies; 21+ messages in thread
From: Samir Bellabes @ 2007-12-05 15:12 UTC (permalink / raw)
To: Evgeniy Polyakov; +Cc: John Heffner, David Miller, ilpo.jarvinen, netdev

Evgeniy Polyakov <johnpol@2ka.mipt.ru> writes:

> Hi.
>
> On Wed, Dec 05, 2007 at 09:11:01AM -0500, John Heffner (jheffner@psc.edu) wrote:
>> >Maybe if we want to get really fancy we can have some more-expensive
>> >debug mode where detailed specific events get generated via some
>> >macros we can scatter all over the place.  This won't be useful
>> >for general user problem analysis, but it will be excellent for
>> >developers.
>> >
>> >Let me know if you think this is useful enough and I'll work on
>> >an implementation we can start playing with.
>>
>> FWIW, sounds similar to what these guys are doing with SIFTR for FreeBSD:
>> http://caia.swin.edu.au/urp/newtcp/tools.html
>> http://caia.swin.edu.au/reports/070824A/CAIA-TR-070824A.pdf
>
> And even more similar to this patch from Samir Bellabes of Mandriva:
> http://lwn.net/Articles/202255/

Indeed, I was thinking about this idea.  But my goal is not to deal
with specific protocols like TCP, just with the LSM hooks.  Anyway, the
idea is the same: having a daemon in userspace to catch the
information.  So why not an extension?

Lately, I've been moving the code from connector to generic netlink.

regards,
sam
* Re: TCP event tracking via netlink...
  2007-12-05 14:48 ` Evgeniy Polyakov
  2007-12-05 15:12   ` Samir Bellabes
@ 2007-12-06  5:03 ` David Miller
  2007-12-06 10:58   ` Evgeniy Polyakov
  1 sibling, 1 reply; 21+ messages in thread
From: David Miller @ 2007-12-06 5:03 UTC (permalink / raw)
To: johnpol; +Cc: jheffner, ilpo.jarvinen, netdev

From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Date: Wed, 5 Dec 2007 17:48:43 +0300

> On Wed, Dec 05, 2007 at 09:11:01AM -0500, John Heffner (jheffner@psc.edu) wrote:
> > >Maybe if we want to get really fancy we can have some more-expensive
> > >debug mode where detailed specific events get generated via some
> > >macros we can scatter all over the place.  This won't be useful
> > >for general user problem analysis, but it will be excellent for
> > >developers.
> > >
> > >Let me know if you think this is useful enough and I'll work on
> > >an implementation we can start playing with.
> >
> > FWIW, sounds similar to what these guys are doing with SIFTR for FreeBSD:
> > http://caia.swin.edu.au/urp/newtcp/tools.html
> > http://caia.swin.edu.au/reports/070824A/CAIA-TR-070824A.pdf
>
> And even more similar to this patch from Samir Bellabes of Mandriva:
> http://lwn.net/Articles/202255/

I think this work is very different.

When I say "state" I mean something more significant than
CLOSE, ESTABLISHED, etc., which is what Samir's patches are
tracking.

I'm talking about all of the sequence numbers, SACK information,
congestion control knobs, etc., whose values are nearly impossible to
track on a packet-to-packet basis in order to diagnose problems.

Web100 provided facilities along these lines as well.
* Re: TCP event tracking via netlink...
  2007-12-06  5:03 ` David Miller
@ 2007-12-06 10:58 ` Evgeniy Polyakov
  0 siblings, 0 replies; 21+ messages in thread
From: Evgeniy Polyakov @ 2007-12-06 10:58 UTC (permalink / raw)
To: David Miller; +Cc: jheffner, ilpo.jarvinen, netdev

On Wed, Dec 05, 2007 at 09:03:43PM -0800, David Miller (davem@davemloft.net) wrote:
> I think this work is very different.
>
> When I say "state" I mean something more significant than
> CLOSE, ESTABLISHED, etc., which is what Samir's patches are
> tracking.
>
> I'm talking about all of the sequence numbers, SACK information,
> congestion control knobs, etc., whose values are nearly impossible to
> track on a packet-to-packet basis in order to diagnose problems.

I pointed to that work as a possible basis for collecting more info if
you need it, including sequence numbers, window sizes and so on.  It
just requires a useful structure layout in place, so that one would not
have to recreate the same bits again and it could be called from any
place inside the stack.

--
	Evgeniy Polyakov
* Re: TCP event tracking via netlink...
  2007-12-05 14:11 ` John Heffner
  2007-12-05 14:48   ` Evgeniy Polyakov
@ 2007-12-06  5:00 ` David Miller
  1 sibling, 0 replies; 21+ messages in thread
From: David Miller @ 2007-12-06 5:00 UTC (permalink / raw)
To: jheffner; +Cc: ilpo.jarvinen, netdev

From: John Heffner <jheffner@psc.edu>
Date: Wed, 05 Dec 2007 09:11:01 -0500

> FWIW, sounds similar to what these guys are doing with SIFTR for FreeBSD:
> http://caia.swin.edu.au/urp/newtcp/tools.html
> http://caia.swin.edu.au/reports/070824A/CAIA-TR-070824A.pdf

Yes, my proposal is very similar to this SIFTR work.

In their work they tap into the stack using the packet filtering
hooks.  In this way they avoid having to make TCP stack modifications:
they just look up the PCB and dump state, whereas we have more liberty
to do more serious surgery :-)
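Translated to Linux, the SIFTR approach would look roughly like the
sketch below: a netfilter hook that looks up the PCB for every TCP
segment and can then snapshot its state, with no TCP stack
modifications.  The hook and lookup signatures here are the 2.6.24-era
ones, and error handling is omitted; this is an illustration, not a
working SIFTR port.

#include <linux/netfilter.h>
#include <linux/netfilter_ipv4.h>
#include <net/inet_hashtables.h>
#include <net/inet_timewait_sock.h>
#include <net/tcp.h>

static unsigned int siftr_hook(unsigned int hooknum,
			       struct sk_buff *skb,
			       const struct net_device *in,
			       const struct net_device *out,
			       int (*okfn)(struct sk_buff *))
{
	const struct iphdr *iph = ip_hdr(skb);
	const struct tcphdr *th;
	struct sock *sk;

	if (iph->protocol != IPPROTO_TCP)
		return NF_ACCEPT;
	th = (const struct tcphdr *)((const u8 *)iph + iph->ihl * 4);

	/* Look up the connection, as SIFTR looks up the PCB. */
	sk = inet_lookup(&tcp_hashinfo, iph->saddr, th->source,
			 iph->daddr, th->dest, inet_iif(skb));
	if (sk) {
		if (sk->sk_state != TCP_TIME_WAIT) {
			/* ... snapshot tcp_sk(sk) state here ... */
			sock_put(sk);
		} else
			inet_twsk_put(inet_twsk(sk));
	}
	return NF_ACCEPT;
}

static struct nf_hook_ops siftr_ops = {
	.hook		= siftr_hook,
	.pf		= PF_INET,
	.hooknum	= NF_IP_LOCAL_IN,
	.priority	= NF_IP_PRI_FIRST,
};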
* Re: TCP event tracking via netlink...
  2007-12-05 13:30 TCP event tracking via netlink David Miller
  2007-12-05 14:11 ` John Heffner
@ 2007-12-05 16:53 ` Joe Perches
  2007-12-05 21:33   ` Stephen Hemminger
  2007-12-05 23:18 ` Ilpo Järvinen
  2 siblings, 1 reply; 21+ messages in thread
From: Joe Perches @ 2007-12-05 16:53 UTC (permalink / raw)
To: David Miller; +Cc: ilpo.jarvinen, netdev

> it occurred to me that we might want to do something
> like a state change event generator.

This could be a basis for an interesting TCP
performance tester.
* Re: TCP event tracking via netlink...
  2007-12-05 16:53 ` Joe Perches
@ 2007-12-05 21:33 ` Stephen Hemminger
  2007-12-05 22:15   ` Ilpo Järvinen
  2007-12-06 10:20   ` David Miller
  0 siblings, 2 replies; 21+ messages in thread
From: Stephen Hemminger @ 2007-12-05 21:33 UTC (permalink / raw)
To: Joe Perches; +Cc: David Miller, ilpo.jarvinen, netdev

On Wed, 05 Dec 2007 08:53:07 -0800
Joe Perches <joe@perches.com> wrote:

> > it occurred to me that we might want to do something
> > like a state change event generator.
>
> This could be a basis for an interesting TCP
> performance tester.

That is what tcpprobe does but it isn't detailed enough to address SACK
issues.
* Re: TCP event tracking via netlink...
  2007-12-05 21:33 ` Stephen Hemminger
@ 2007-12-05 22:15 ` Ilpo Järvinen
  2007-12-06  4:06   ` Stephen Hemminger
  1 sibling, 1 reply; 21+ messages in thread
From: Ilpo Järvinen @ 2007-12-05 22:15 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Joe Perches, David Miller, Netdev

On Wed, 5 Dec 2007, Stephen Hemminger wrote:

> On Wed, 05 Dec 2007 08:53:07 -0800
> Joe Perches <joe@perches.com> wrote:
>
> > > it occurred to me that we might want to do something
> > > like a state change event generator.
> >
> > This could be a basis for an interesting TCP
> > performance tester.
>
> That is what tcpprobe does but it isn't detailed enough to address SACK
> issues.

...It would be nice if that could be generalized so that the probe could
be attached to functions other than tcp_rcv_established.

If we convert the remaining functions that don't have sk or tp as first
argument so that sk is listed first (there shouldn't be many with wrong
ordering, if any), then maybe a generic handler could be of type:

  jtcp_entry(struct sock *sk, ...)

or, when available:

  jtcp_entry(struct sock *sk, struct sk_buff *ack, ...)

--
 i.
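A generic handler of that type could be hooked up as a jprobe module in
the style of net/ipv4/tcp_probe.c.  Below is a minimal sketch against
the 2.6.24-era jprobe API, attaching to tcp_ack() instead; note that
tcp_ack() is static, so this relies on kallsyms resolving the symbol,
and the handler prototype must match the probed function exactly.

#include <linux/module.h>
#include <linux/kprobes.h>
#include <net/tcp.h>

/* Must mirror the signature of the probed function. */
static int jtcp_ack(struct sock *sk, struct sk_buff *skb, int flag)
{
	const struct tcp_sock *tp = tcp_sk(sk);

	printk(KERN_DEBUG "tcp_ack: snd_una %u cwnd %u ssthresh %u\n",
	       tp->snd_una, tp->snd_cwnd, tp->snd_ssthresh);

	jprobe_return();	/* mandatory for jprobe handlers */
	return 0;		/* never reached */
}

static struct jprobe tcp_ack_probe = {
	.kp	= { .symbol_name = "tcp_ack" },
	.entry	= JPROBE_ENTRY(jtcp_ack),
};

static int __init jtcp_init(void)
{
	return register_jprobe(&tcp_ack_probe);
}

static void __exit jtcp_exit(void)
{
	unregister_jprobe(&tcp_ack_probe);
}

module_init(jtcp_init);
module_exit(jtcp_exit);
MODULE_LICENSE("GPL");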
* Re: TCP event tracking via netlink...
  2007-12-05 22:15 ` Ilpo Järvinen
@ 2007-12-06  4:06 ` Stephen Hemminger
  0 siblings, 0 replies; 21+ messages in thread
From: Stephen Hemminger @ 2007-12-06 4:06 UTC (permalink / raw)
To: Ilpo Järvinen; +Cc: Joe Perches, David Miller, Netdev

On Thu, 6 Dec 2007 00:15:49 +0200 (EET)
"Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote:

> On Wed, 5 Dec 2007, Stephen Hemminger wrote:
>
> > On Wed, 05 Dec 2007 08:53:07 -0800
> > Joe Perches <joe@perches.com> wrote:
> >
> > > > it occurred to me that we might want to do something
> > > > like a state change event generator.
> > >
> > > This could be a basis for an interesting TCP
> > > performance tester.
> >
> > That is what tcpprobe does but it isn't detailed enough to address SACK
> > issues.
>
> ...It would be nice if that could be generalized so that the probe could
> be attached to functions other than tcp_rcv_established.
>
> If we convert the remaining functions that don't have sk or tp as first
> argument so that sk is listed first (there shouldn't be many with wrong
> ordering, if any), then maybe a generic handler could be of type:
>
>   jtcp_entry(struct sock *sk, ...)
>
> or, when available:
>
>   jtcp_entry(struct sock *sk, struct sk_buff *ack, ...)
>
> --
>  i.

An earlier version had hooks in send as well; it is trivial to extend.
As long as the prototypes match, any function arg ordering is okay.
* Re: TCP event tracking via netlink...
  2007-12-05 21:33 ` Stephen Hemminger
  2007-12-05 22:15   ` Ilpo Järvinen
@ 2007-12-06 10:20 ` David Miller
  2007-12-06 13:28   ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 21+ messages in thread
From: David Miller @ 2007-12-06 10:20 UTC (permalink / raw)
To: shemminger; +Cc: joe, ilpo.jarvinen, netdev

From: Stephen Hemminger <shemminger@linux-foundation.org>
Date: Wed, 5 Dec 2007 16:33:38 -0500

> On Wed, 05 Dec 2007 08:53:07 -0800
> Joe Perches <joe@perches.com> wrote:
>
> > > it occurred to me that we might want to do something
> > > like a state change event generator.
> >
> > This could be a basis for an interesting TCP
> > performance tester.
>
> That is what tcpprobe does but it isn't detailed enough to address SACK
> issues.

Indeed, this could be done via the jprobe there.

Silly me, I didn't do this in the implementation I whipped
up, which I'll likely correct.
* Re: TCP event tracking via netlink...
  2007-12-06 10:20 ` David Miller
@ 2007-12-06 13:28 ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 21+ messages in thread
From: Arnaldo Carvalho de Melo @ 2007-12-06 13:28 UTC (permalink / raw)
To: David Miller; +Cc: shemminger, joe, ilpo.jarvinen, netdev

On Thu, Dec 06, 2007 at 02:20:58AM -0800, David Miller wrote:
> From: Stephen Hemminger <shemminger@linux-foundation.org>
> Date: Wed, 5 Dec 2007 16:33:38 -0500
>
> > On Wed, 05 Dec 2007 08:53:07 -0800
> > Joe Perches <joe@perches.com> wrote:
> >
> > > > it occurred to me that we might want to do something
> > > > like a state change event generator.
> > >
> > > This could be a basis for an interesting TCP
> > > performance tester.
> >
> > That is what tcpprobe does but it isn't detailed enough to address SACK
> > issues.
>
> Indeed, this could be done via the jprobe there.
>
> Silly me, I didn't do this in the implementation I whipped
> up, which I'll likely correct.

I have some experiments from the past in this area:

This is what is produced by ctracer + the ostra callgrapher when
tracking many sk_buff objects, tracing sk_buff routines as well as all
other structs that have a pointer to a sk_buff, i.e. where the sk_buff
can be got from the struct that has a pointer to it; tcp_sock is an
"alias" to struct inet_sock, which is an "alias" to struct sock, etc,
so when tracing tcp_sock you also trace inet_connection_sock,
inet_sock and sock methods:

http://oops.ghostprotocols.net:81/acme/dwarves/callgraphs/sk_buff/many_objects/

With just one object (that is reused, so appears many times):

http://oops.ghostprotocols.net:81/acme/dwarves/callgraphs/sk_buff/0xffff8101013130e8/

Following struct sock methods:

http://oops.ghostprotocols.net:81/acme/dwarves/callgraphs/sock/many_objects/
http://oops.ghostprotocols.net:81/acme/dwarves/callgraphs/sock/0xf61bf500/

struct socket:

http://oops.ghostprotocols.net:81/acme/dwarves/callgraphs/socket/many_objects/

It works by using the DWARF information to generate a systemtap module
that in turn will create a relayfs channel where we store the traces,
and an automatically reorganized struct with just the base types (int,
char, long, etc) and typedefs that end up being base types.

For an example of the mini struct recreated from the debugging
information and reorganized using the algorithms in pahole to save
space, go to the bottom of this file, where you'll find struct
ctracer__mini_sock and the collector that creates the mini struct from
a full sized object:

http://oops.ghostprotocols.net:81/acme/dwarves/callgraphs/ctracer_collector.struct.sock.c

And the systemtap module (the tcpprobe on steroids) automatically
generated:

http://oops.ghostprotocols.net:81/acme/dwarves/callgraphs/ctracer_methods.struct.sock.stp

This requires more work to:

 . reduce the overhead
 . filter out undesired functions, creating a "project" with the
   desired functions using some GUI editor
 . specify lists of fields to put on the internal state to be
   collected, again using a GUI or plain ctracer-edit using vi,
   instead of getting just base types
 . be able to say: collect just the fields on the second and fourth
   cacheline
 . collectors for complex objects such as spinlocks, socket lock,
   mutexes

But since people are wanting to work on tools to watch state
transitions, fields changing, etc, I thought I should dust off the
ostra experiments and the more recent dwarves ctracer work I'm doing
in my copious spare time 8)

In the callgrapher there is some more interesting stuff:

Interface to see where fields changed:

http://oops.ghostprotocols.net:81/acme/dwarves/callgraphs/sock/0xf61bf500/changes.html

On this page, clicking on a field name gives you graphs over time,
such as:

http://oops.ghostprotocols.net:81/acme/dwarves/callgraphs/sock/0xf61bf500/sk_forward_alloc.png

Code is in the dwarves repo at:

http://master.kernel.org/git/?p=linux/kernel/git/acme/pahole.git;a=summary

Thanks,

- Arnaldo
* Re: TCP event tracking via netlink...
  2007-12-05 13:30 TCP event tracking via netlink David Miller
  2007-12-05 14:11 ` John Heffner
  2007-12-05 16:53 ` Joe Perches
@ 2007-12-05 23:18 ` Ilpo Järvinen
  2007-12-06 10:33   ` David Miller
  2 siblings, 1 reply; 21+ messages in thread
From: Ilpo Järvinen @ 2007-12-05 23:18 UTC (permalink / raw)
To: David Miller; +Cc: Netdev

On Wed, 5 Dec 2007, David Miller wrote:

> Ilpo, I was pondering the kind of debugging one does to find
> congestion control issues and even SACK bugs and it's currently too
> painful because there is no standard way to track state changes.

That's definitely true.

> I assume you're using something like carefully crafted printk's,
> kprobes, or even ad-hoc statistic counters.  That's what I used to do
> :-)

No, that's not at all what I do :-). I usually look at time-seq graphs
except for the cases when I just find things out by reading code (or
by just thinking of it).  I'm so used to all things in the graphs that
I can quite easily spot any inconsistencies & TCP events and then look
at the interesting parts in greater detail; very rarely does something
remain uncertain...

However, instead of directly going to printks, etc., I almost always
read the code first (usually it's not just a couple of lines but tens
of potential TCP execution paths involving more than a handful of
functions to check what the end result would be).  This has a nice
side-effect that other things tend to show up as well.  Only when
things get nasty and I cannot figure out what it does wrong do I add
specially placed ad-hoc printks.

One trick I also use is to get the vars of the relevant flow from
/proc/net/tcp in a while loop, but it only works in my case because I
use links that are slow (even a small-value sleep in the loop does not
hide much).

For other people's reports, I occasionally have to write validator
patches, as you might have noticed, because in a typical miscount case
our BUG_TRAPs are too late: they trigger only after the outstanding
window becomes zero, which might already be a very distant point in
time from the cause.

Also, I'm planning an experiment with the markers to see if they are
of any use when trying to gather some latency data about SACK
processing, because they seem lightweight enough not to be disturbing.

> With that in mind it occurred to me that we might want to do something
> like a state change event generator.
>
> Basically some application or even a daemon listens on this generic
> netlink socket family we create.  The header of each event packet
> indicates what socket the event is for and then there is some state
> information.
>
> Then you can look at a tcpdump and this state dump side by side and
> see what the kernel decided to do.

Much of the info is available in tcpdump already, it's just hard to read
without graphing it first because there are so many overlapping things
to track in two-dimensional space.

...But yes, I have to admit that a couple of problems come to mind
where having some variable from tcp_sock would have made the problem
more obvious.

> Now there is the question of granularity.
>
> A very important consideration in this is that we want this thing to
> be enabled in the distributions, therefore it must be cheap.  Perhaps
> one test at the end of the packet input processing.

Not sure what the benefit of having it in distributions is because
those people hardly ever report problems here, they're just too
happy with TCP performance unless we print something to their logs,
which implies that we must set up a *_ON() condition :-(.

Yes, an often neglected problem is that most people are just too happy
even with something as prehistoric as TCP Tahoe.  I've been surprised
how badly TCP can break without anybody complaining, as long as it
doesn't crash (not even any of the devs).  Two key things seem to
surface most of the TCP-related bugs: research people really staring
at strange packet patterns (or code), and reports triggered by
automatic WARN/BUG_ON checks.  The latter reports also include corner
cases which nobody would otherwise ever have noticed (or at least not
before Linus releases 3.0 :-/).  IMHO, those invariant WARN/BUG_ONs
are the only alternative that scales to normal users well enough.  The
checks are simple enough that they can be always on; then we just
happen to print something to their log, and that's offensive enough
for somebody to come up with a report... ;-)

> So I say we pick some state to track (perhaps start with tcp_info)
> and just push that at the end of every packet input run.  Also,
> we add some minimal filtering capability (match on specific IP
> address and/or port, for example).
>
> Maybe if we want to get really fancy we can have some more-expensive
> debug mode where detailed specific events get generated via some
> macros we can scatter all over the place.
>
> This won't be useful for general user problem analysis, but it will be
> excellent for developers.

I would say that for it to be generic enough, most function entries and
exits would have to be covered because the need varies a lot; the
processing in general is so complex that things would too easily get
shadowed otherwise!  In addition we need expensive mode++, which goes
all the way down to the dirty details of the write queue; they're now
dirtier than ever because of the queue split I dared to do.

Some problems are simply such that things cannot be accurately verified
without high processing overhead until it's far too late (e.g. skb bits vs
*_out counters). Maybe we should start to build an expensive state
validator as well which would automatically check invariants of the write
queue and tcp_sock in a straightforward, unoptimized manner? That would
definitely do a lot of work for us, just ask people to turn it on and it
spits out everything that went wrong :-) (unless they really depend on
very high-speed things and are therefore unhappy if we scan thousands of
packets unnecessarily per ACK :-)). ...Early enough! ...That would work
also for distros but there's always human judgement needed to decide
whether the bug reporter will be happy when his TCP processing no
longer scales ;-).

For the simpler thing, why not just take all TCP functions and build
some automated tool using kprobes to collect the information we need
through the sk/tp available on almost every function call?  Some
TCP-specific code could then easily produce what we want from it.  Ah,
this is almost done already, as noted by Stephen; it would just need
some generalization to be pluggable into other functions as well, and
more variables.

> Let me know if you think this is useful enough and I'll work on
> an implementation we can start playing with.

...Hopefully you found some of my comments useful.

--
 i.
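A first cut at the kind of straightforward, unoptimized validator
described above could simply walk the whole write queue and cross-check
the skb SACK bits against the *_out counters.  The sketch below assumes
the 2.6.24-era write-queue helpers; as written, the sacked_out check
only holds for SACK-enabled flows, since for newreno sacked_out counts
duplicate ACKs instead.

/* Expensive invariant check: O(packets in queue) per call. */
static void tcp_validate_write_queue(struct sock *sk)
{
	const struct tcp_sock *tp = tcp_sk(sk);
	struct sk_buff *skb;
	u32 packets = 0, sacked = 0, lost = 0, retrans = 0;

	tcp_for_write_queue(skb, sk) {
		if (skb == tcp_send_head(sk))
			break;

		packets += tcp_skb_pcount(skb);
		if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED)
			sacked += tcp_skb_pcount(skb);
		if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST)
			lost += tcp_skb_pcount(skb);
		if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_RETRANS)
			retrans += tcp_skb_pcount(skb);
	}

	/* Catch a miscount immediately, not when the outstanding
	 * window finally hits zero much later. */
	WARN_ON(packets != tp->packets_out);
	WARN_ON(sacked != tp->sacked_out);	/* SACK-enabled flows only */
	WARN_ON(lost != tp->lost_out);
	WARN_ON(retrans != tp->retrans_out);
}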
* Re: TCP event tracking via netlink...
  2007-12-05 23:18 ` Ilpo Järvinen
@ 2007-12-06 10:33 ` David Miller
  2007-12-06 17:23   ` Stephen Hemminger
  2007-12-07 16:43   ` Ilpo Järvinen
  0 siblings, 2 replies; 21+ messages in thread
From: David Miller @ 2007-12-06 10:33 UTC (permalink / raw)
To: ilpo.jarvinen; +Cc: netdev

From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Thu, 6 Dec 2007 01:18:28 +0200 (EET)

> On Wed, 5 Dec 2007, David Miller wrote:
>
> > I assume you're using something like carefully crafted printk's,
> > kprobes, or even ad-hoc statistic counters.  That's what I used to do
> > :-)
>
> No, that's not at all what I do :-). I usually look at time-seq graphs
> except for the cases when I just find things out by reading code (or
> by just thinking of it).

Can you briefly detail what graph tools and command lines
you are using?

The last time I did graphing to analyze things, the tools
were hit-or-miss.

> Much of the info is available in tcpdump already, it's just hard to read
> without graphing it first because there are so many overlapping things
> to track in two-dimensional space.
>
> ...But yes, I have to admit that a couple of problems come to mind
> where having some variable from tcp_sock would have made the problem
> more obvious.

The most important are the cwnd and ssthresh, which you could guess
using graphs but it is important to know on a packet-to-packet
basis why we might have sent a packet or not because this has
rippling effects down the rest of the RTT.

> Not sure what the benefit of having it in distributions is because
> those people hardly ever report problems here, they're just too
> happy with TCP performance unless we print something to their logs,
> which implies that we must set up a *_ON() condition :-(.

That may be true, but if we could integrate the information with
tcpdumps, we could gather internal state using tools the user
already has available.

Imagine if tcpdump printed out:

02:26:14.865805 IP $SRC > $DEST: . 11226:12686(1460) ack 0 win 108
	ss_thresh: 129 cwnd: 133 packets_out: 132

or something like that.

> Some problems are simply such that things cannot be accurately verified
> without high processing overhead until it's far too late (e.g. skb bits vs
> *_out counters). Maybe we should start to build an expensive state
> validator as well which would automatically check invariants of the write
> queue and tcp_sock in a straightforward, unoptimized manner? That would
> definitely do a lot of work for us, just ask people to turn it on and it
> spits out everything that went wrong :-) (unless they really depend on
> very high-speed things and are therefore unhappy if we scan thousands of
> packets unnecessarily per ACK :-)). ...Early enough! ...That would work
> also for distros but there's always human judgement needed to decide
> whether the bug reporter will be happy when his TCP processing no
> longer scales ;-).

I think it's useful as a TCP_DEBUG config option or similar, sure.

But sometimes the algorithms are working as designed; it's just that
they provide poor pipe utilization, and CWND analysis embedded inside
of a tcpdump would be one way to see that as well as determine the
flaw in the algorithm.

> ...Hopefully you found some of my comments useful.

Very much so, thanks.

I put together a sample implementation anyways just to show the idea,
against net-2.6.25 below.

It is untested since I didn't write the userland app yet to see that
proper things get logged.  Basically you could run a daemon that
writes per-connection traces into files based upon the incoming
netlink events.  Later, using the binary pcap file and these traces,
you can piece together traces like the above using the timestamps
etc. to match up pcap packets to ones from the TCP logger.

The userland tools could do analysis and print pre-cooked state diff
logs, like "this ACK raised CWND by one" or whatever else you wanted
to know.

It's nice that an expert like you can look at graphs and understand,
but we'd like to create more experts and besides reading code one
way to become an expert is to be able to extract live real data
from the kernel's working state and try to understand how things
got that way.  This information is permanently lost currently.

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 56342c3..c0e61d0 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -170,6 +170,47 @@ struct tcp_md5sig {
 	__u8	tcpm_key[TCP_MD5SIG_MAXKEYLEN];	/* key (binary) */
 };
 
+/* TCP netlink event logger. */
+struct tcp_log_key {
+	union {
+		__be32	a4;
+		__be32	a6[4];
+	} saddr, daddr;
+	__be16		sport;
+	__be16		dport;
+	unsigned short	family;
+	unsigned short	__pad;
+};
+
+struct tcp_log_stamp {
+	__u32	tv_sec;
+	__u32	tv_usec;
+};
+
+struct tcp_log_payload {
+	struct tcp_log_key	key;
+	struct tcp_log_stamp	stamp;
+	struct tcp_info		info;
+};
+
+enum {
+	TCP_LOG_A_UNSPEC = 0,
+	__TCP_LOG_A_MAX,
+};
+#define TCP_LOG_A_MAX	(__TCP_LOG_A_MAX - 1)
+
+#define TCP_LOG_GENL_NAME	"tcp_log"
+#define TCP_LOG_GENL_VERSION	1
+
+enum {
+	TCP_LOG_CMD_UNSPEC = 0,
+	TCP_LOG_CMD_HELLO,
+	TCP_LOG_CMD_GOODBYE,
+	TCP_LOG_CMD_EVENT,
+	__TCP_LOG_CMD_MAX,
+};
+#define TCP_LOG_CMD_MAX	(__TCP_LOG_CMD_MAX - 1)
+
 #ifdef __KERNEL__
 
 #include <linux/skbuff.h>
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 9dbed0b..5ac82ea 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1730,6 +1730,19 @@ struct tcp_request_sock_ops {
 #endif
 };
 
+#define TCP_LOG_PID_INACTIVE	-1
+extern int tcp_log_pid;
+
+extern void tcp_do_log(struct sock *sk, ktime_t stamp);
+
+static inline void tcp_log(struct sock *sk, ktime_t stamp)
+{
+	if (likely(tcp_log_pid == TCP_LOG_PID_INACTIVE))
+		return;
+
+	tcp_do_log(sk, stamp);
+}
+
 extern void tcp_v4_init(struct net_proto_family *ops);
 extern void tcp_init(void);
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index ad40ef3..fa0cc1d 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -7,7 +7,7 @@ obj-y := route.o inetpeer.o protocol.o \
 	     ip_output.o ip_sockglue.o inet_hashtables.o \
 	     inet_timewait_sock.o inet_connection_sock.o \
 	     tcp.o tcp_input.o tcp_output.o tcp_timer.o tcp_ipv4.o \
-	     tcp_minisocks.o tcp_cong.o \
+	     tcp_minisocks.o tcp_cong.o tcp_log.o \
 	     datagram.o raw.o udp.o udplite.o \
 	     arp.o icmp.o devinet.o af_inet.o igmp.o \
 	     fib_frontend.o fib_semantics.o \
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c5fba12..a51cbd2 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4577,6 +4577,7 @@ int tcp_rcv_established(struct sock *sk, struct sk_buff *skb,
 			struct tcphdr *th, unsigned len)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
+	ktime_t stamp = skb->tstamp;
 
 	/*
 	 *	Header prediction.
@@ -4657,6 +4658,7 @@ int tcp_rcv_established(struct sock *sk, struct sk_buff *skb,
 				tcp_ack(sk, skb, 0);
 				__kfree_skb(skb);
 				tcp_data_snd_check(sk);
+				tcp_log(sk, stamp);
 				return 0;
 			} else { /* Header too small */
 				TCP_INC_STATS_BH(TCP_MIB_INERRS);
@@ -4748,6 +4750,7 @@ no_ack:
 			__kfree_skb(skb);
 		else
 			sk->sk_data_ready(sk, 0);
+		tcp_log(sk, stamp);
 		return 0;
 	}
 }
@@ -4800,6 +4803,7 @@ slow_path:
 		TCP_INC_STATS_BH(TCP_MIB_INERRS);
 		NET_INC_STATS_BH(LINUX_MIB_TCPABORTONSYN);
 		tcp_reset(sk);
+		tcp_log(sk, stamp);
 		return 1;
 	}
 
@@ -4817,6 +4821,7 @@ step5:
 	tcp_data_snd_check(sk);
 	tcp_ack_snd_check(sk);
 
+	tcp_log(sk, stamp);
 	return 0;
 
 csum_error:
@@ -4824,6 +4829,7 @@ csum_error:
 
 discard:
 	__kfree_skb(skb);
+	tcp_log(sk, stamp);
 	return 0;
 }
--- a/net/ipv4/tcp_log.c	2007-10-24 01:07:28.000000000 -0700
+++ b/net/ipv4/tcp_log.c	2007-12-06 01:06:26.000000000 -0800
@@ -0,0 +1,150 @@
+/* tcp_log.c: Netlink based TCP state change logger.
+ *
+ * Copyright (C) 2007 David S. Miller <davem@davemloft.net>
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/time.h>
+#include <linux/ipv6.h>
+#include <linux/tcp.h>
+
+#include <net/genetlink.h>
+#include <net/inet_sock.h>
+#include <net/tcp.h>
+
+static struct genl_family tcp_log_family = {
+	.id		= GENL_ID_GENERATE,
+	.name		= TCP_LOG_GENL_NAME,
+	.version	= TCP_LOG_GENL_VERSION,
+	.hdrsize	= sizeof(struct tcp_log_payload),
+	.maxattr	= TCP_LOG_A_MAX,
+};
+
+static unsigned int tcp_log_seqnum;
+
+int tcp_log_pid = TCP_LOG_PID_INACTIVE;
+EXPORT_SYMBOL(tcp_log_pid);
+
+static int tcp_log_hello(struct sk_buff *skb, struct genl_info *info)
+{
+	tcp_log_pid = info->snd_pid;
+	return 0;
+}
+
+static int tcp_log_goodbye(struct sk_buff *skb, struct genl_info *info)
+{
+	tcp_log_pid = TCP_LOG_PID_INACTIVE;
+	return 0;
+}
+
+static struct genl_ops tcp_log_hello_ops = {
+	.cmd	= TCP_LOG_CMD_HELLO,
+	.doit	= tcp_log_hello,
+};
+
+static struct genl_ops tcp_log_goodbye_ops = {
+	.cmd	= TCP_LOG_CMD_GOODBYE,
+	.doit	= tcp_log_goodbye,
+};
+
+static void fill_key(struct tcp_log_key *key, struct sock *sk)
+{
+	struct inet_sock *inet = inet_sk(sk);
+	struct ipv6_pinfo *np = inet6_sk(sk);
+
+	switch (sk->sk_family) {
+	case AF_INET:
+		key->saddr.a4 = inet->saddr;
+		key->daddr.a4 = inet->daddr;
+		break;
+	case AF_INET6:
+		memcpy(&key->saddr.a6, &np->saddr, sizeof(key->saddr.a6));
+		memcpy(&key->daddr.a6, &np->daddr, sizeof(key->daddr.a6));
+		break;
+	default:
+		BUG();
+		break;
+	}
+	key->family = sk->sk_family;
+	key->sport = inet->sport;
+	key->dport = inet->dport;
+}
+
+void tcp_do_log(struct sock *sk, ktime_t stamp)
+{
+	struct tcp_log_payload *p;
+	struct sk_buff *skb;
+	struct timeval tv;
+	void *data;
+	int size;
+
+	size = nla_total_size(sizeof(struct tcp_log_payload));
+	skb = genlmsg_new(size, GFP_ATOMIC);
+	if (!skb)
+		return;
+
+	data = genlmsg_put(skb, 0, tcp_log_seqnum++,
+			   &tcp_log_family, 0, TCP_LOG_CMD_EVENT);
+	if (!data) {
+		nlmsg_free(skb);
+		return;
+	}
+	p = data;
+
+	fill_key(&p->key, sk);
+
+	if (stamp.tv64)
+		tv = ktime_to_timeval(stamp);
+	else
+		do_gettimeofday(&tv);
+
+	p->stamp.tv_sec = tv.tv_sec;
+	p->stamp.tv_usec = tv.tv_usec;
+
+	tcp_get_info(sk, &p->info);
+
+	if (genlmsg_end(skb, data) < 0) {
+		nlmsg_free(skb);
+		return;
+	}
+
+	genlmsg_unicast(skb, tcp_log_pid);
+}
+EXPORT_SYMBOL(tcp_do_log);
+
+static int __init tcp_log_init(void)
+{
+	int err = genl_register_family(&tcp_log_family);
+
+	if (err)
+		return err;
+
+	err = genl_register_ops(&tcp_log_family, &tcp_log_hello_ops);
+	if (err)
+		goto out_unregister_family;
+
+	err = genl_register_ops(&tcp_log_family, &tcp_log_goodbye_ops);
+	if (err)
+		goto out_unregister_hello;
+
+	return 0;
+
+out_unregister_hello:
+	genl_unregister_ops(&tcp_log_family, &tcp_log_hello_ops);
+
+out_unregister_family:
+	genl_unregister_family(&tcp_log_family);
+
+	return err;
+}
+
+static void __exit tcp_log_exit(void)
+{
+	genl_unregister_ops(&tcp_log_family, &tcp_log_goodbye_ops);
+	genl_unregister_ops(&tcp_log_family, &tcp_log_hello_ops);
+	genl_unregister_family(&tcp_log_family);
+}
+
+module_init(tcp_log_init);
+module_exit(tcp_log_exit);
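The missing userland side could start out as small as the sketch below:
say hello, then dump each event.  It is written against the later
libnl-3 API purely for brevity, and it assumes the patched <linux/tcp.h>
above is on the include path so that struct tcp_log_payload and the
TCP_LOG_* constants are visible; none of this is an existing interface.

#include <stdio.h>
#include <arpa/inet.h>
#include <linux/tcp.h>			/* the patched header above */
#include <netlink/netlink.h>
#include <netlink/genl/genl.h>
#include <netlink/genl/ctrl.h>

static int event_cb(struct nl_msg *msg, void *arg)
{
	struct genlmsghdr *ghdr = nlmsg_data(nlmsg_hdr(msg));
	struct tcp_log_payload *p = genlmsg_user_hdr(ghdr);

	if (ghdr->cmd != TCP_LOG_CMD_EVENT)
		return NL_SKIP;

	/* Timestamp plus ports is enough to line an event up with
	 * the matching record in a pcap file. */
	printf("%u.%06u %u -> %u snd_cwnd %u\n",
	       p->stamp.tv_sec, p->stamp.tv_usec,
	       ntohs(p->key.sport), ntohs(p->key.dport),
	       p->info.tcpi_snd_cwnd);
	return NL_OK;
}

int main(void)
{
	struct nl_sock *sk = nl_socket_alloc();
	int family;

	genl_connect(sk);
	family = genl_ctrl_resolve(sk, TCP_LOG_GENL_NAME);

	/* Register this pid as the destination for event unicasts. */
	genl_send_simple(sk, family, TCP_LOG_CMD_HELLO,
			 TCP_LOG_GENL_VERSION, 0);

	/* Events are unsolicited; don't enforce sequence numbers. */
	nl_socket_disable_seq_check(sk);
	nl_socket_modify_cb(sk, NL_CB_VALID, NL_CB_CUSTOM, event_cb, NULL);

	for (;;)
		nl_recvmsgs_default(sk);
}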
* Re: TCP event tracking via netlink...
  2007-12-06 10:33 ` David Miller
@ 2007-12-06 17:23 ` Stephen Hemminger
  2007-12-07  6:51   ` David Miller
  2008-01-02  8:22   ` David Miller
  1 sibling, 2 replies; 21+ messages in thread
From: Stephen Hemminger @ 2007-12-06 17:23 UTC (permalink / raw)
To: David Miller; +Cc: ilpo.jarvinen, netdev

On Thu, 06 Dec 2007 02:33:46 -0800 (PST)
David Miller <davem@davemloft.net> wrote:

> From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
> Date: Thu, 6 Dec 2007 01:18:28 +0200 (EET)
>
> > On Wed, 5 Dec 2007, David Miller wrote:
> >
> > > I assume you're using something like carefully crafted printk's,
> > > kprobes, or even ad-hoc statistic counters.  That's what I used to do
> > > :-)
> >
> > No, that's not at all what I do :-). I usually look at time-seq graphs
> > except for the cases when I just find things out by reading code (or
> > by just thinking of it).
>
> Can you briefly detail what graph tools and command lines
> you are using?
>
> The last time I did graphing to analyze things, the tools
> were hit-or-miss.
>
> > Much of the info is available in tcpdump already, it's just hard to read
> > without graphing it first because there are so many overlapping things
> > to track in two-dimensional space.
> >
> > ...But yes, I have to admit that a couple of problems come to mind
> > where having some variable from tcp_sock would have made the problem
> > more obvious.
>
> The most important are the cwnd and ssthresh, which you could guess
> using graphs but it is important to know on a packet-to-packet
> basis why we might have sent a packet or not because this has
> rippling effects down the rest of the RTT.
>
> > Not sure what the benefit of having it in distributions is because
> > those people hardly ever report problems here, they're just too
> > happy with TCP performance unless we print something to their logs,
> > which implies that we must set up a *_ON() condition :-(.
>
> That may be true, but if we could integrate the information with
> tcpdumps, we could gather internal state using tools the user
> already has available.
>
> Imagine if tcpdump printed out:
>
> 02:26:14.865805 IP $SRC > $DEST: . 11226:12686(1460) ack 0 win 108
> 	ss_thresh: 129 cwnd: 133 packets_out: 132
>
> or something like that.
>
> > Some problems are simply such that things cannot be accurately verified
> > without high processing overhead until it's far too late (e.g. skb bits vs
> > *_out counters). Maybe we should start to build an expensive state
> > validator as well which would automatically check invariants of the write
> > queue and tcp_sock in a straightforward, unoptimized manner? That would
> > definitely do a lot of work for us, just ask people to turn it on and it
> > spits out everything that went wrong :-) (unless they really depend on
> > very high-speed things and are therefore unhappy if we scan thousands of
> > packets unnecessarily per ACK :-)). ...Early enough! ...That would work
> > also for distros but there's always human judgement needed to decide
> > whether the bug reporter will be happy when his TCP processing no
> > longer scales ;-).
>
> I think it's useful as a TCP_DEBUG config option or similar, sure.
>
> But sometimes the algorithms are working as designed; it's just that
> they provide poor pipe utilization, and CWND analysis embedded inside
> of a tcpdump would be one way to see that as well as determine the
> flaw in the algorithm.
>
> > ...Hopefully you found some of my comments useful.
>
> Very much so, thanks.
>
> I put together a sample implementation anyways just to show the idea,
> against net-2.6.25 below.
>
> It is untested since I didn't write the userland app yet to see that
> proper things get logged.  Basically you could run a daemon that
> writes per-connection traces into files based upon the incoming
> netlink events.  Later, using the binary pcap file and these traces,
> you can piece together traces like the above using the timestamps
> etc. to match up pcap packets to ones from the TCP logger.
>
> The userland tools could do analysis and print pre-cooked state diff
> logs, like "this ACK raised CWND by one" or whatever else you wanted
> to know.
>
> It's nice that an expert like you can look at graphs and understand,
> but we'd like to create more experts and besides reading code one
> way to become an expert is to be able to extract live real data
> from the kernel's working state and try to understand how things
> got that way.  This information is permanently lost currently.

Tools and scripts for testing that generate graphs are at:
git://git.kernel.org/pub/scm/tcptest/tcptest
* Re: TCP event tracking via netlink...
  2007-12-06 17:23 ` Stephen Hemminger
@ 2007-12-07  6:51 ` David Miller
  2008-01-02  8:22 ` David Miller
  1 sibling, 0 replies; 21+ messages in thread
From: David Miller @ 2007-12-07 6:51 UTC (permalink / raw)
To: shemminger; +Cc: ilpo.jarvinen, netdev

From: Stephen Hemminger <shemminger@linux-foundation.org>
Date: Thu, 6 Dec 2007 09:23:12 -0800

> Tools and scripts for testing that generate graphs are at:
> git://git.kernel.org/pub/scm/tcptest/tcptest

I know about this, I'm just curious what exactly Ilpo is using :-)
* Re: TCP event tracking via netlink...
  2007-12-06 17:23 ` Stephen Hemminger
  2007-12-07  6:51   ` David Miller
@ 2008-01-02  8:22 ` David Miller
  2008-01-02 11:05   ` Ilpo Järvinen
  1 sibling, 1 reply; 21+ messages in thread
From: David Miller @ 2008-01-02 8:22 UTC (permalink / raw)
To: shemminger; +Cc: ilpo.jarvinen, netdev

From: Stephen Hemminger <shemminger@linux-foundation.org>
Date: Thu, 6 Dec 2007 09:23:12 -0800

> Tools and scripts for testing that generate graphs are at:
> git://git.kernel.org/pub/scm/tcptest/tcptest

Did you move it somewhere else?

davem@sunset:~/src/GIT$ git clone git://git.kernel.org/pub/scm/tcptest/tcptest
Initialized empty Git repository in /home/davem/src/GIT/tcptest/.git/
fatal: The remote end hung up unexpectedly
fetch-pack from 'git://git.kernel.org/pub/scm/tcptest/tcptest' failed.
* Re: TCP event tracking via netlink...
  2008-01-02  8:22 ` David Miller
@ 2008-01-02 11:05 ` Ilpo Järvinen
  2008-01-03  9:26   ` David Miller
  0 siblings, 1 reply; 21+ messages in thread
From: Ilpo Järvinen @ 2008-01-02 11:05 UTC (permalink / raw)
To: David Miller; +Cc: Stephen Hemminger, Netdev

On Wed, 2 Jan 2008, David Miller wrote:

> From: Stephen Hemminger <shemminger@linux-foundation.org>
> Date: Thu, 6 Dec 2007 09:23:12 -0800
>
> > Tools and scripts for testing that generate graphs are at:
> > git://git.kernel.org/pub/scm/tcptest/tcptest
>
> Did you move it somewhere else?
>
> davem@sunset:~/src/GIT$ git clone git://git.kernel.org/pub/scm/tcptest/tcptest
> Initialized empty Git repository in /home/davem/src/GIT/tcptest/.git/
> fatal: The remote end hung up unexpectedly
> fetch-pack from 'git://git.kernel.org/pub/scm/tcptest/tcptest' failed.

.../network/ was missing from the path :-).

$ git-remote show origin
* remote origin
  URL: git://git.kernel.org/pub/scm/network/tcptest/tcptest.git
  Remote branch(es) merged with 'git pull' while on branch master
    master
  Tracked remote branches
    master

--
 i.
* Re: TCP event tracking via netlink...
  2008-01-02 11:05 ` Ilpo Järvinen
@ 2008-01-03  9:26 ` David Miller
  0 siblings, 0 replies; 21+ messages in thread
From: David Miller @ 2008-01-03 9:26 UTC (permalink / raw)
To: ilpo.jarvinen; +Cc: shemminger, netdev

From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Wed, 2 Jan 2008 13:05:17 +0200 (EET)

> git://git.kernel.org/pub/scm/network/tcptest/tcptest.git

Thanks a lot Ilpo.
* Re: TCP event tracking via netlink...
  2007-12-06 10:33 ` David Miller
  2007-12-06 17:23   ` Stephen Hemminger
@ 2007-12-07 16:43 ` Ilpo Järvinen
  1 sibling, 0 replies; 21+ messages in thread
From: Ilpo Järvinen @ 2007-12-07 16:43 UTC (permalink / raw)
To: David Miller; +Cc: Netdev

On Thu, 6 Dec 2007, David Miller wrote:

> From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
> Date: Thu, 6 Dec 2007 01:18:28 +0200 (EET)
>
> > On Wed, 5 Dec 2007, David Miller wrote:
> >
> > > I assume you're using something like carefully crafted printk's,
> > > kprobes, or even ad-hoc statistic counters.  That's what I used to do
> > > :-)
> >
> > No, that's not at all what I do :-). I usually look at time-seq graphs
> > except for the cases when I just find things out by reading code (or
> > by just thinking of it).
>
> Can you briefly detail what graph tools and command lines
> you are using?

I have a tool called Sealion but it's behind NDA (making it open source
has been talked about for long but I have no idea why it hasn't happened
yet).  It's mostly tcl/tk code, by no means nice or clean in design or
quality (I'll leave the details of why I think it's that way out of this
discussion :-)).  Produces svgs.  Usually I have the things I need in
the standard sent+ACK+SACKs(+win) graph it produces.  The result is
quite similar to what tcptrace+xplot produces, but xplot's UI is really
horrible, IMHO.

If I have to deal with tcpdump output only, it takes a considerable
amount of time to do computations with bc to come up with the same
understanding by just reading tcpdumps.

> The last time I did graphing to analyze things, the tools
> were hit-or-miss.

Yeah, this is definitely true.  The open source graphing tools I know
are really not that astonishing :-(.  I've tried to look for better
tools as well, but with little success.

> > Much of the info is available in tcpdump already, it's just hard to read
> > without graphing it first because there are so many overlapping things
> > to track in two-dimensional space.
> >
> > ...But yes, I have to admit that a couple of problems come to mind
> > where having some variable from tcp_sock would have made the problem
> > more obvious.
>
> The most important are the cwnd and ssthresh, which you could guess
> using graphs but it is important to know on a packet-to-packet
> basis why we might have sent a packet or not because this has
> rippling effects down the rest of the RTT.

A couple of points: in order to evaluate the validity of some action,
one might need more than one packet from the history.  The answer to
why we have sent a packet is rather simple (excluding RTOs):
cwnd > packets_in_flight and data was available (see the sketch after
this message).  No, it's not at all complicated.  Though I might be too
biased toward non-application-limited cases, which make the formula
even simpler because everything is basically ACK clocked.

To really tell what caused changes between cwnd and/or
packets_in_flight, one usually needs some history or a more
fine-grained approach; once per packet is way too wide a gap.  It tells
just what happened, not why, unless you're really familiar with the
state machine and can make the right guess.

> > Not sure what the benefit of having it in distributions is because
> > those people hardly ever report problems here, they're just too
> > happy with TCP performance unless we print something to their logs,
> > which implies that we must set up a *_ON() condition :-(.
>
> That may be true, but if we could integrate the information with
> tcpdumps, we could gather internal state using tools the user
> already has available.

It would definitely help if we could, but that of course depends on
getting the reports in the first place.

> Imagine if tcpdump printed out:
>
> 02:26:14.865805 IP $SRC > $DEST: . 11226:12686(1460) ack 0 win 108
> 	ss_thresh: 129 cwnd: 133 packets_out: 132
>
> or something like that.

How about this:

02:26:14.865805 IP $SRC > $DEST: . ack 11226 win 108 <...sack 1 {15606:18526}
17066:18526 0->S sacktag_one l0 s1 r0 f4 pc1
...
11226:12686 ---- clean_rtx_queue
...
11226:12686 0->L mark_head_lost l1 s1 r0 f4 pc1
...
12686:14146 0->L mark_head_lost l2 s1 r0 f4 pc1
...
11226:12686 L->LRe retransmit_skb l2 s1 r1 f4 pc1
...

...would make the bug in SACK processing relatively obvious (yes, it
has an intentional flaw in it, points for finding it :-))...  That
would be something I'd like to have right now.

> But sometimes the algorithms are working as designed; it's just that
> they provide poor pipe utilization, and CWND analysis embedded inside
> of a tcpdump would be one way to see that as well as determine the
> flaw in the algorithm.

Fair enough.

> It is untested since I didn't write the userland app yet to see that
> proper things get logged.  Basically you could run a daemon that
> writes per-connection traces into files based upon the incoming
> netlink events.  Later, using the binary pcap file and these traces,
> you can piece together traces like the above using the timestamps
> etc. to match up pcap packets to ones from the TCP logger.
>
> The userland tools could do analysis and print pre-cooked state diff
> logs, like "this ACK raised CWND by one" or whatever else you wanted
> to know.

Obviously a collection of useful userland tools seems at least as
important here as the existence of the interface.

> It's nice that an expert like you can look at graphs and understand,
> but we'd like to create more experts and besides reading code one
> way to become an expert is to be able to extract live real data
> from the kernel's working state and try to understand how things
> got that way.  This information is permanently lost currently.

IMHO this problem is of such caliber that no human can efficiently
track more than a couple of packets of a TCP flow from a text-only view
(without headaches, I mean), or are you able to do that with ease?  And
we're talking here about people who have just begun to deal with TCP.
...For me especially, those nearly identical seqnos all around are too
overwhelming to track in any sane way, and I'd expect that most feel
the same way.

I'd state my point the other way around (with the terms you chose): it
is very difficult to become an expert without looking at some graphs;
we may disagree and that's fine :-).  I think it's because one would
then have no idea about the larger picture (and about the very
_relevant_ past/future) when looking at just a single line at a time
from the tcpdump (or equivalent).  The sad thing is that a good tool to
do the visualization might not exist.

--
 i.
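The send decision referred to above is essentially the
congestion-window test TCP output already makes; condensed into a
sketch modeled on the 2.6.24-era tcp_cwnd_test(), ignoring the
TCP_CLOSE and RTO special cases:

/* A new segment may go out iff data is queued and
 * packets_in_flight < snd_cwnd; this returns how many more
 * segments the congestion window currently allows. */
static inline unsigned int cwnd_quota(const struct tcp_sock *tp)
{
	u32 in_flight = tcp_packets_in_flight(tp);

	if (in_flight >= tp->snd_cwnd)
		return 0;

	return tp->snd_cwnd - in_flight;
}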