All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stephen Hemminger <shemminger@vyatta.com>
To: Jerry Chu <hkchu@google.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>, netdev@vger.kernel.org
Subject: Re: [PATCH] Add useful per-connection TCP stats for diagnosis purpose.
Date: Thu, 17 Mar 2011 14:20:30 -0700	[thread overview]
Message-ID: <20110317142030.62af785a@nehalam> (raw)
In-Reply-To: <AANLkTi=sLkXreC3LNsDuFDSeNMebr+ScMc12=gngd2cc@mail.gmail.com>

On Thu, 17 Mar 2011 13:16:15 -0700
Jerry Chu <hkchu@google.com> wrote:

> Eric, thanks for the prompt feedback.
> 
> On Thu, Mar 17, 2011 at 1:42 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Le jeudi 17 mars 2011 à 01:06 -0700, H.K. Jerry Chu a écrit :
> >> From: Jerry Chu <hkchu@google.com>
> >>
> >> This patch add a number of very useful counters/stats (defined in
> >> tcp_stats.h) to help diagnosing TCP related problems.
> >>
> >> create_time     - when the connection was created (in jiffies)
> >> total_inbytes   - total inbytes as consumed by the receiving apps.
> >> total_outbytes  - total outbytes sent down from the transmitting apps.
> >>
> >> total_outdatasegs - total data carrying segments sent so far, including
> >>               retransmitted ones.
> >>
> >> total_xmit      - total accumulated time (usecs) when the connection
> >>               has something to send.
> >>
> >> total_retrans_time - total time (usecs, accumulated) the connection
> >>               spends trying to recover lost packets. For each
> >>               loss event the time is measured from the lost packet
> >>               was first sent till the retransmitted packet was
> >>               eventually ack'ed.
> >>
> >> total_cwnd_limit - total time (usecs, excluding time spent on loss
> >>               recovery) the xmit is stopped due to cwnd limited
> >>
> >> total_swnd_limit - total time (usecs) theconnection is swnd limited
> >>
> >> The following two counters are for listeners only:
> >>
> >> accepted_reqs   - total # of accepted connection requests.
> >> listen_drops    - total # of dropped SYN reqs (SYN cookies excluded) due
> >>               to listener's queue overflow.
> >>
> >> total_retrans_time/total_retrans ratio gives a rough picture of how
> >> quickly in average the connection can recover from a pkt loss. E.g.,
> >> when the network is more congested, or the traffic contains mainly
> >> smaller RPC where tail drop often requires RTO to recover,
> >> the total_retrans_time/total_retrans ratio tends to be higher.
> >>
> >> Currently the new counters/stats are exported through /proc/net/tcp.
> >
> > Please dont. Use iproute2 instead.
> >
> >> Some simple, abbreviated field names have been added to the output of
> >> /proc/net/tcp in order to allow backward/forward compatibility in the
> >> future. Obviously the new counters/stats can also be easily exported
> >> through other APIs.
> >>
> >
> > /proc/net/tcp is legacy. You should touch it eventually, but after
> > "other APIS" are done. It was the old way (quick but a bit ugly)
> 
> Understood. /proc/net/tcp is a much more expedient way of exporting these
> counters because it doesn't requires any additional, special tool to read it,
> unless other APIs (e.g., netlink). Note that backward compatibility to
> /proc/net/tcp has been ensured by adding field names in the heading.
> 
> >
> >
> >
> >> Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
> >> ---
> >>  include/linux/ktime.h    |    3 ++
> >>  include/linux/tcp.h      |    1 +
> >>  include/net/tcp_stats.h  |   65 ++++++++++++++++++++++++++++++++++++++++++++++
> >>  net/ipv4/tcp.c           |   30 ++++++++++++++++++---
> >>  net/ipv4/tcp_input.c     |   13 +++++++++
> >>  net/ipv4/tcp_ipv4.c      |   41 ++++++++++++++++++++++++++---
> >>  net/ipv4/tcp_minisocks.c |    9 ++++++
> >>  net/ipv4/tcp_output.c    |   47 +++++++++++++++++++++++++++++++--
> >>  net/ipv6/tcp_ipv6.c      |    8 +++++
> >>  9 files changed, 206 insertions(+), 11 deletions(-)
> >>  create mode 100644 include/net/tcp_stats.h
> >>
> >> diff --git a/include/linux/ktime.h b/include/linux/ktime.h
> >> index e1ceaa9..e60e758 100644
> >> --- a/include/linux/ktime.h
> >> +++ b/include/linux/ktime.h
> >> @@ -333,6 +333,9 @@ extern void ktime_get_ts(struct timespec *ts);
> >>  /* Get the real (wall-) time in timespec format: */
> >>  #define ktime_get_real_ts(ts)        getnstimeofday(ts)
> >
> > Hmm, this kind of changes are out of netdev scope and should be avoided
> 
> Ok. (It was moved out of tcp_stats.h only at the last minute.)
> 
> >
> >>
> >> +#define ktime_since(a)               ktime_to_us(ktime_sub(ktime_get(), (a)))
> >
> > us are implied in ktime_since() ? thats strange.
> 
> Ok.
> 
> >
> >> +#define ktime_zero(a)                ktime_equal((a), ktime_set(0, 0))
> >
> > ktime_zero() sounds like : "give me zero time" or "clear the ktime
> > field".
> 
> Yes I actually have been flip-flopping on the name...
> 
> >
> >> +
> >>  static inline ktime_t ns_to_ktime(u64 ns)
> >>  {
> >>       static const ktime_t ktime_zero = { .tv64 = 0 };
> >> diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> >> index e64f4c6..ea5cb5d 100644
> >> --- a/include/linux/tcp.h
> >> +++ b/include/linux/tcp.h
> >> @@ -460,6 +460,7 @@ struct tcp_sock {
> >>        * contains related tcp_cookie_transactions fields.
> >>        */
> >>       struct tcp_cookie_values  *cookie_values;
> >> +     struct tcp_stats        *conn_stats;
> >>  };
> >
> > Really, using separate cache lines to store some stats is expensive.
> > You should add counters in existing structure, to avoid additional cache
> > line dirties. Carefully placing stats in already dirtied cache lines.
> 
> This was how it was done initially but then we wanted to allow future
> extension to include possibly a lot more counters, something like Web100
> (RFC4898). For the latter the memory/performance hit will likely require
> a config option, and a separate structure will make this easier. Does it
> make sense?
> 
> >
> > You also should use native ktime_t infrastructure, to make the maths
> > really fast in fast path.
> >
> > Only when stats are to be returned to user, you'll have to convert the
> > native timestamps to user exportable ones.
> 
> Good point! Will do. (I mistakenly thought ktime_t is a larger structure.)
> 
> >
> > Quite frankly, using u64 fields allow nanosec resolution.
> 
> I wish to use less bits because the final report only needs ms or even
> sec resolution but the intermediate computation needs to capture usec
> resolution.
> 
> >
> > BTW, we probably could 'export' sk->sk_drops for TCP, like we do for
> > UDP.
> 
> There are many other potentially useful counters/stats (spurious_retrans,
> min_rtt, total_rto,...) but there is a tradeoff against memory/performance hit
> so for the first round I'm focusing on what i feel is the most useful set.
> 
> Thanks,
> 
> Jerry

These stats are best added via netlink, the tool ss already prints lots
of similar stats. Look at INET_DIAG_INFO and the output of:
 ss -i -t






-- 

  reply	other threads:[~2011-03-17 21:20 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-03-17  8:06 [PATCH] Add useful per-connection TCP stats for diagnosis purpose H.K. Jerry Chu
2011-03-17  8:42 ` Eric Dumazet
2011-03-17 20:16   ` Jerry Chu
2011-03-17 21:20     ` Stephen Hemminger [this message]
2011-03-18  4:33       ` Jerry Chu
2011-03-18  4:51         ` David Miller
2011-03-18  6:05         ` Eric Dumazet
2011-03-19  4:33           ` Jerry Chu
2011-03-19  5:02             ` Eric Dumazet
2011-03-19  5:38               ` Jerry Chu
2011-03-19  6:03                 ` Eric Dumazet
2011-03-20  7:30                   ` Jerry Chu
2011-03-20  9:18                     ` Eric Dumazet
2011-03-21 21:18             ` Stephen Hemminger
2011-03-21 22:19               ` Ben Greear

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110317142030.62af785a@nehalam \
    --to=shemminger@vyatta.com \
    --cc=eric.dumazet@gmail.com \
    --cc=hkchu@google.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.