Re: [RFC PATCH v6 net-next 3/4] tcp: add in_flight to tcp_skb_cb

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Lawrence Brakmo <brakmo@fb.com>
To: Yuchung Cheng <ycheng@google.com>
Cc: netdev <netdev@vger.kernel.org>, Kernel Team <Kernel-team@fb.com>,
	"Neal Cardwell" <ncardwell@google.com>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Stephen Hemminger <stephen@networkplumber.org>,
	Kenneth Klette Jonassen <kennetkl@ifi.uio.no>
Subject: Re: [RFC PATCH v6 net-next 3/4] tcp: add in_flight to tcp_skb_cb
Date: Thu, 27 Aug 2015 23:27:54 +0000	[thread overview]
Message-ID: <D204E9B0.7469%brakmo@fb.com> (raw)
In-Reply-To: <CAK6E8=fNk-x6=QLHT0rjFRh3bARfkErA+VW_PgxRUcu-3fm9pQ@mail.gmail.com>



On 8/27/15, 3:57 PM, "Yuchung Cheng" <ycheng@google.com> wrote:

>On Thu, Aug 27, 2015 at 3:54 PM, Yuchung Cheng <ycheng@google.com> wrote:
>> On Thu, Aug 27, 2015 at 3:44 PM, Lawrence Brakmo <brakmo@fb.com> wrote:
>>> Yuchung, thank you for reviewing these patches. Response inline below.
>>>
>>> On 8/27/15, 3:00 PM, "Yuchung Cheng" <ycheng@google.com> wrote:
>>>
>>>>On Tue, Aug 25, 2015 at 4:33 PM, Lawrence Brakmo <brakmo@fb.com> wrote:
>>>>> Add in_flight (bytes in flight when packet was sent) field
>>>>> to tx component of tcp_skb_cb and make it available to
>>>>> congestion modules' pkts_acked() function through the
>>>>> ack_sample function argument.
>>>>>
>>>>> Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
>> Acked-by: Yuchung Cheng <ycheng@google.com>
>>
>>>>> ---
>>>>>  include/net/tcp.h     | 2 ++
>>>>>  net/ipv4/tcp_input.c  | 5 ++++-
>>>>>  net/ipv4/tcp_output.c | 4 +++-
>>>>>  3 files changed, 9 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/include/net/tcp.h b/include/net/tcp.h
>>>>> index a086a98..cdd93e5 100644
>>>>> --- a/include/net/tcp.h
>>>>> +++ b/include/net/tcp.h
>>>>> @@ -757,6 +757,7 @@ struct tcp_skb_cb {
>>>>>         union {
>>>>>                 struct {
>>>>>                         /* There is space for up to 20 bytes */
>>>>> +                       __u32 in_flight;/* Bytes in flight when
>>>>>packet
>>>>>sent */
>>>>>                 } tx;   /* only used for outgoing skbs */
>>>>>                 union {
>>>>>                         struct inet_skb_parm    h4;
>>>>> @@ -842,6 +843,7 @@ union tcp_cc_info;
>>>>>  struct ack_sample {
>>>>>         u32 pkts_acked;
>>>>>         s32 rtt_us;
>>>>> +       u32 in_flight;
>>>>>  };
>>>>>
>>>>>  struct tcp_congestion_ops {
>>>>> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
>>>>> index f506a0a..338e6bb 100644
>>>>> --- a/net/ipv4/tcp_input.c
>>>>> +++ b/net/ipv4/tcp_input.c
>>>>> @@ -3069,6 +3069,7 @@ static int tcp_clean_rtx_queue(struct sock *sk,
>>>>>int prior_fackets,
>>>>>         long ca_rtt_us = -1L;
>>>>>         struct sk_buff *skb;
>>>>>         u32 pkts_acked = 0;
>>>>> +       u32 last_in_flight = 0;
>>>>>         bool rtt_update;
>>>>>         int flag = 0;
>>>>>
>>>>> @@ -3108,6 +3109,7 @@ static int tcp_clean_rtx_queue(struct sock *sk,
>>>>>int prior_fackets,
>>>>>                         if (!first_ackt.v64)
>>>>>                                 first_ackt = last_ackt;
>>>>>
>>>>> +                       last_in_flight =
>>>>>TCP_SKB_CB(skb)->tx.in_flight;
>>>>>                         reord = min(pkts_acked, reord);
>>>>>                         if (!after(scb->end_seq, tp->high_seq))
>>>>>                                 flag |= FLAG_ORIG_SACK_ACKED;
>>>>> @@ -3197,7 +3199,8 @@ static int tcp_clean_rtx_queue(struct sock *sk,
>>>>>int prior_fackets,
>>>>>         }
>>>>>
>>>>>         if (icsk->icsk_ca_ops->pkts_acked) {
>>>>> -               struct ack_sample sample = {pkts_acked, ca_rtt_us};
>>>>> +               struct ack_sample sample = {pkts_acked, ca_rtt_us,
>>>>> +                                           last_in_flight};
>>>>>
>>>>>                 icsk->icsk_ca_ops->pkts_acked(sk, &sample);
>>>>>         }
>>>>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>>>>> index 444ab5b..244d201 100644
>>>>> --- a/net/ipv4/tcp_output.c
>>>>> +++ b/net/ipv4/tcp_output.c
>>>>> @@ -920,9 +920,12 @@ static int tcp_transmit_skb(struct sock *sk,
>>>>>struct sk_buff *skb, int clone_it,
>>>>>         int err;
>>>>>
>>>>>         BUG_ON(!skb || !tcp_skb_pcount(skb));
>>>>> +       tp = tcp_sk(sk);
>>>>>
>>>>>         if (clone_it) {
>>>>>                 skb_mstamp_get(&skb->skb_mstamp);
>>>>> +               TCP_SKB_CB(skb)->tx.in_flight =
>>>>>TCP_SKB_CB(skb)->end_seq
>>>>> +                       - tp->snd_una;
>>>>what if skb is a retransmitted packet? e.g. the first retransmission
>>>>in fast recovery would always record an inflight of 1 packet?
>>>
>>> Yes.
>>> This does not affect NV for 2 reasons: 1) NV does not use ACKs when
>>> ca_state is not Open or Disorder to determine congestion state, 2)
>>>even if
>>> we used it, the small inflight means that the computed throughput will
>>>be
>>> small so it will not cause a non-congestion signal, but will not cause
>>>a
>>> congestion signal either because NV needs many (~60) measurements
>>>before
>>> determining there is congestion.
>>>
>>> However, other consumers may prefer a different value. From a
>>>congestion
>>> avoidance perspective, it is unclear we will be able to compute an
>>> accurate throughput when retransmitting, so we may as well give a lower
>>> bound.
>> I see. Then this is OK for now since only NV uses it. We can enhance
>> and track tput even during other CA states later. Would that be a
>> useful feature for NV as well?
>For example, we (at Google servers) have seen some flows staying in
>very long CA_Recovery due to rate limiter or CA_Disorder state due to
>high path reordering. It'd be beneficial to have CC continue to
>operate in these circumstances in the future.

Hopefully congestion avoidance in NV would adapt to a rate limiter and
prevent
losses or large queues.
However, for the time being NV is only recommended for data center traffic
since 1) I¹ve only tested in small RTT environments and 2) cannot compete
fairly with Cubic (at least in small RTT environments).

>
>>
>>>
>>> What do you think?
>>>
>>>>
>>>>>
>>>>>                 if (unlikely(skb_cloned(skb)))
>>>>>                         skb = pskb_copy(skb, gfp_mask);
>>>>> @@ -933,7 +936,6 @@ static int tcp_transmit_skb(struct sock *sk,
>>>>>struct
>>>>>sk_buff *skb, int clone_it,
>>>>>         }
>>>>>
>>>>>         inet = inet_sk(sk);
>>>>> -       tp = tcp_sk(sk);
>>>>>         tcb = TCP_SKB_CB(skb);
>>>>>         memset(&opts, 0, sizeof(opts));
>>>>>
>>>>> --
>>>>> 1.8.1
>>>>>
>>>

next prev parent reply	other threads:[~2015-08-27 23:28 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-08-25 23:33 tcp: add NV congestion control Lawrence Brakmo
2015-08-25 23:33 ` [RFC PATCH v6 net-next 1/4] tcp: replace cnt & rtt with struct in pkts_acked() Lawrence Brakmo
2015-08-27 21:42   ` Yuchung Cheng
2015-08-25 23:33 ` [RFC PATCH v6 net-next 2/4] tcp: refactor struct tcp_skb_cb Lawrence Brakmo
2015-08-27 21:53   ` Yuchung Cheng
2015-08-25 23:33 ` [RFC PATCH v6 net-next 3/4] tcp: add in_flight to tcp_skb_cb Lawrence Brakmo
2015-08-27 22:00   ` Yuchung Cheng
2015-08-27 22:44     ` Lawrence Brakmo
2015-08-27 22:54       ` Yuchung Cheng
2015-08-27 22:57         ` Yuchung Cheng
2015-08-27 23:27           ` Lawrence Brakmo [this message]
2015-08-25 23:33 ` [RFC PATCH v6 net-next 4/4] tcp: add NV congestion control Lawrence Brakmo
2015-08-26  0:19 ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=D204E9B0.7469%brakmo@fb.com \
    --to=brakmo@fb.com \
    --cc=Kernel-team@fb.com \
    --cc=eric.dumazet@gmail.com \
    --cc=kennetkl@ifi.uio.no \
    --cc=ncardwell@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=stephen@networkplumber.org \
    --cc=ycheng@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.