netdev.vger.kernel.org archive mirror
From: "Bendik Rønning Opstad" <bro.devel@gmail.com>
To: "Bendik Rønning Opstad" <bro.devel@gmail.com>,
	"Eric Dumazet" <eric.dumazet@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>,
	Netdev <netdev@vger.kernel.org>,
	"Yuchung Cheng" <ycheng@google.com>,
	"Neal Cardwell" <ncardwell@google.com>,
	"Andreas Petlund" <apetlund@simula.no>,
	"Carsten Griwodz" <griff@simula.no>,
	"Pål Halvorsen" <paalh@simula.no>,
	"Jonas Markussen" <jonassm@ifi.uio.no>,
	"Kristian Evensen" <kristian.evensen@gmail.com>,
	"Kenneth Klette Jonassen" <kennetkl@ifi.uio.no>
Subject: Re: [PATCH v3 net-next 2/2] tcp: Add Redundant Data Bundling (RDB)
Date: Mon, 8 Feb 2016 18:30:49 +0100
Message-ID: <56B8D0C9.9010509@gmail.com>
In-Reply-To: <CAF8eE=VOuoNLQHtkRwM9ZG+vJ-uH2ufVW5y_pS24rGqWh4Qa2g@mail.gmail.com>

Sorry guys, I messed up that email by including HTML, and it got
rejected by netdev@vger.kernel.org. I'll resend it properly formatted.

Bendik

On 08/02/16 18:17, Bendik Rønning Opstad wrote:
> Eric, thank you for the feedback!
> 
> On Wed, Feb 3, 2016 at 8:34 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Wed, 2016-02-03 at 19:17 +0100, Bendik Rønning Opstad wrote:
>>> On Tue, Feb 2, 2016 at 9:35 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>> Really this looks very complicated.
>>>
>>> Can you be more specific?
>>
>> A lot of code added, needing maintenance cost for years to come.
> 
> Yes, that is understandable.
> 
>>>> Why not simply append the new skb content to prior one ?
>>>
>>> It's not clear to me what you mean. At what stage in the output engine
>>> do you refer to?
>>>
>>> We want to avoid modifying the data of the SKBs in the output queue,
>>
>> Why ? We already do that, as I pointed out.
> 
> I suspect that we might be talking past each other. It wasn't clear to
> me that we were discussing how to implement this in a different way.
> 
> The current retrans collapse functionality only merges SKBs that
> contain data that has already been sent and is about to be
> retransmitted.
> 
> This differs significantly from RDB, which combines both already
> transmitted data and unsent data in the same packet without changing
> how the data is stored (and the state tracked) in the output queue.
> Another difference is that RDB includes un-ACKed data that is not
> considered lost.
> 
>>> therefore we allocate a new SKB (named rdb_skb in the code). The
>>> header and payload of the first SKB containing data we want to
>>> redundantly transmit are then copied. Then the payloads of the SKBs
>>> that follow in the output queue are appended to the rdb_skb. The
>>> last payload appended is from the first SKB with unsent data, i.e. the
>>> sk_send_head.
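
A minimal sketch of the copy-based construction described above, assuming a toy queue of fixed-size buffers. The names toy_qskb and toy_build_rdb are hypothetical stand-ins, not the patch's rdb_build_skb() or the kernel's struct sk_buff. The point it illustrates is that the queue is only read, never modified:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an SKB in the output queue (payload only). */
struct toy_qskb {
	size_t len;              /* payload bytes used in data[] */
	unsigned char data[32];
};

/*
 * Copy the payloads of queue[first..send_head] into a separate buffer,
 * oldest first. The queue is const: nothing in it is changed.
 * Returns the number of bytes copied, or 0 if they do not all fit.
 */
static size_t toy_build_rdb(const struct toy_qskb *queue, size_t first,
			    size_t send_head, unsigned char *out,
			    size_t out_size)
{
	size_t used = 0;

	assert(first <= send_head);
	for (size_t i = first; i <= send_head; i++) {
		if (queue[i].len > out_size - used)
			return 0;
		memcpy(out + used, queue[i].data, queue[i].len);
		used += queue[i].len;
	}
	return used;
}
```

Here queue[send_head] plays the role of the first SKB with unsent data, and the earlier entries hold sent but un-ACKed data.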
>>>
>>> Would you suggest a different approach?
>>>
>>>> skb_still_in_host_queue(sk, prior_skb) would also tell you if the skb
>>>> is really available (i.e. its clone is not sitting/waiting in a qdisc
>>>> on the host)
>>>
>>> Where do you suggest this should be used?
>>
>> To detect if appending data to prior skb is possible.
> 
> I see. As the implementation intentionally avoids modifying SKBs in
> the output queue, this was not obvious.
> 
>> If the prior packet is still in qdisc, no change is allowed,
>> and that is fine: RDB should not trigger anyway.
> 
> Actually, whether the data in the prior SKB is on the wire or is still
> on the host (in qdisc/driver queue) is not relevant. RDB always wants
> to redundantly resend the data if there is room in the packet, because
> the previous packet may be lost.
> 
>>>> Note: select_size() always allocates an skb with
>>>> SKB_WITH_OVERHEAD(2048 - MAX_TCP_HEADER) available bytes in skb->data.
>>>
>>> Sure, rdb_build_skb() could use this instead of the calculated
>>> bytes_in_rdb_skb.
>>
>> Point is : small packets already have tail room in skb->head
> 
> Yes, I'm aware of that. But we do not allocate new SKBs because we
> think the existing SKBs do not have enough space available. We do it
> to avoid modifications to the SKBs in the output queue.
> 
>> When RDB decides a packet should be merged into the prior one, you can
>> simply copy payload into the tailroom, then free the skb.
>>
>> No skb allocations are needed, only freeing.
> 
> It wasn't clear to me that you suggest a completely different
> implementation approach altogether.
> 
> As I understand you, the approach you suggest is as follows:
> 
> 1. An SKB containing unsent data is processed for transmission (let's
>    call it T_SKB)
> 2. Check if the previous SKB (let's call it P_SKB) (containing sent but
>    un-ACKed data) has available (tail) room for the payload contained
>    in T_SKB.
> 3. If room in P_SKB:
>   * Copy the unsent data from T_SKB to P_SKB by appending it to the
>     linear data and update sequence numbers.
>   * Remove T_SKB (which contains only the new and unsent data) from
>     the output queue.
>   * Transmit P_SKB, which now contains some already sent data and some
>     unsent data.
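
A minimal sketch of steps 2 and 3 above, assuming a toy SKB with a fixed linear buffer. The type toy_skb and the helpers toy_tailroom and toy_merge_into_prev are hypothetical and only model the idea; they are not kernel APIs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BUF_SIZE 64

/* Hypothetical stand-in for an SKB: linear data plus its sequence range. */
struct toy_skb {
	uint32_t seq;      /* first sequence number covered */
	uint32_t end_seq;  /* one past the last sequence number covered */
	size_t len;        /* payload bytes used in data[] */
	unsigned char data[TOY_BUF_SIZE];
	bool in_queue;     /* still linked in the output queue? */
};

static size_t toy_tailroom(const struct toy_skb *skb)
{
	return TOY_BUF_SIZE - skb->len;
}

/*
 * If P_SKB has tailroom for T_SKB's payload, append it, extend P_SKB's
 * sequence range and drop T_SKB from the queue. Returns true on merge.
 */
static bool toy_merge_into_prev(struct toy_skb *p_skb, struct toy_skb *t_skb)
{
	if (p_skb->end_seq != t_skb->seq)
		return false;	/* not contiguous in sequence space */
	if (toy_tailroom(p_skb) < t_skb->len)
		return false;	/* step 2: no room in P_SKB */

	memcpy(p_skb->data + p_skb->len, t_skb->data, t_skb->len);
	p_skb->len += t_skb->len;
	p_skb->end_seq = t_skb->end_seq;
	t_skb->in_queue = false;
	return true;
}
```

After a successful merge, P_SKB covers both the already sent range and the new range, which is what the concerns listed below are about.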
> 
> 
> If I have misunderstood, can you please elaborate in detail what you
> mean?
> 
> If this is the approach you suggest, I can think of some potential
> downsides that require further considerations:
> 
> 
> 1) ACK-accounting will work differently
> 
> When the previous SKB (P_SKB) is modified by appending the data of the
> next SKB (T_SKB), what should happen when an incoming ACK
> acknowledges the data that was sent in the original transmission
> (before the SKB was modified), but not the data that was appended
> later? tcp_clean_rtx_queue currently handles partially ACKed SKBs due
> to TSO, in which case the tcp_skb_pcount(skb) > 1. So this function
> would need to be modified to handle this for RDB-modified SKBs in the
> queue, where all the data is located in the linear data buffer (no GSO
> segs).
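
To make the partial-ACK case concrete, here is a small sketch that classifies an ACK against an SKB's sequence range. The enum and function names are hypothetical; toy_before mirrors the idea of the kernel's before() helper, but this is not the logic of tcp_clean_rtx_queue:

```c
#include <assert.h>
#include <stdint.h>

enum toy_ack_state {
	TOY_NOT_ACKED,
	TOY_PARTIALLY_ACKED,
	TOY_FULLY_ACKED,
};

/* Wraparound-safe "a is before b" in 32-bit sequence space. */
static int toy_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Classify cumulative ACK number `ack` against the range [seq, end_seq). */
static enum toy_ack_state toy_classify_ack(uint32_t seq, uint32_t end_seq,
					   uint32_t ack)
{
	if (!toy_before(seq, ack))
		return TOY_NOT_ACKED;		/* ack covers nothing here */
	if (toy_before(ack, end_seq))
		return TOY_PARTIALLY_ACKED;	/* ack falls inside the range */
	return TOY_FULLY_ACKED;
}
```

With an SKB that originally covered 100..103 and was later extended to 100..105, an ACK of 103 was a full ACK for the original transmission but is only a partial ACK for the modified SKB.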
> 
> How should SACK and retrans flags be handled when one SKB in the
> output queue can represent multiple transmitted packets?
> 
> 
> 2) Timestamps and RTT measurements
> 
> How should RTT measurements work when you don't have a timestamp for
> the data that was newly appended to the existing SKB containing sent
> but un-ACKed data? Or should the skb->skb_mstamp be updated when the
> SKB with newly appended data is sent again? That would make any RTT
> measurements based on ACKs on the originally sent packet unusable.
> 
> 
> 3) Retransmit and lost SKB hints
> 
> Appending unsent data to SKBs with sent data will affect the usage of
> tp->retransmit_skb_hint and tp->lost_skb_hint. As these variables
> contain pointers to SKBs in the output queue, it is implied that all
> the data in an SKB has the same state, such as retransmitted or lost.
> 
> 
> 4) RDB's loss accounting
> 
> RDB detects loss by looking at how many segments are ACKed. If an
> incoming ACK acknowledges data in multiple SKBs, we can infer that
> loss has occurred (ignoring the possibility of reordering). With the
> approach you suggest, we lose the information about how many packets
> we originally had, and how much of the payload was redundant
> (considering SKBs are updated with new data and sent out again). We
> would need additional variables in order to keep track of this.
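
A rough sketch of the inference described in point 4, assuming one SKB per originally sent packet. The names toy_seg, toy_count_acked and toy_inferred_lost are hypothetical, and the rule "one ACK covering N SKBs implies N - 1 lost packets" is a simplification for illustration, not necessarily the patch's exact accounting:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one SKB's sequence range in the output queue. */
struct toy_seg {
	uint32_t seq;
	uint32_t end_seq;
};

/* Wraparound-safe "a is after b" in 32-bit sequence space. */
static int toy_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Count the SKBs whose whole range is covered by the cumulative ACK. */
static unsigned int toy_count_acked(const struct toy_seg *q, size_t n,
				    uint32_t ack)
{
	unsigned int count = 0;

	for (size_t i = 0; i < n; i++)
		if (!toy_after(q[i].end_seq, ack))
			count++;
	return count;
}

/* More than one SKB newly ACKed at once: assume the earlier ones were lost. */
static unsigned int toy_inferred_lost(unsigned int acked_skbs)
{
	return acked_skbs > 1 ? acked_skbs - 1 : 0;
}
```

This only works while each queue entry still corresponds to one original packet; once entries are merged in place, the per-packet boundaries needed for the count are gone.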
> 
> 
> 5) Forced bundling on retransmissions
> 
> Since the SKBs in the output queue are modified to contain redundant
> data, retransmissions of the SKBs will necessarily include the
> redundant data unless the SKBs are modified before the retransmission.
> 
> 
> 6) Configuring how much is bundled becomes complex
> 
> When previous SKBs are to be used by appending the new data to be
> sent, it is no longer possible to configure the amount of data to
> bundle. We are forced to bundle all the data in the previous SKB.
> 
> Say we have 3 SKBs in the queue, with unsent segments 1, 2, 3:
> [1] [2] [3]
> 
> Send 1:
> [1] ->
> Try to send 2, but first merge 2 with 1:
> [1,2] [3]
> Send merged SKB:
> [1,2] ->
> 
> When we want to send segment 3, we are forced to bundle both 1 and 2.
> Try to send 3, but first merge 3 with 1,2:
> [1,2,3]
> Send merged SKB:
> [1,2,3] ->
> 
> Transmitting only 2,3 in a packet then becomes difficult without
> additional logic for RDB record keeping.
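
The example above can be sketched by comparing which segments end up in each packet under the two approaches. Everything here is a toy model: toy_range, both functions and the max_bundled limit are hypothetical names, and segments are identified by the numbers 1, 2, 3 used in the example:

```c
#include <assert.h>

/* Inclusive range of segment numbers carried by one packet. */
struct toy_range {
	int first;
	int last;
};

/*
 * In-place merge: the new segment is appended to the SKB that already
 * holds everything since the first un-ACKed segment, so all of it is sent.
 */
static struct toy_range toy_inplace_packet(int first_unacked, int seg)
{
	struct toy_range r = { first_unacked, seg };

	assert(first_unacked <= seg);
	return r;
}

/*
 * Copy-based bundling: the queue keeps one SKB per segment, so the packet
 * can carry at most max_bundled earlier segments in front of `seg`.
 */
static struct toy_range toy_copy_packet(int first_unacked, int seg,
					int max_bundled)
{
	struct toy_range r;

	assert(first_unacked <= seg && max_bundled >= 0);
	r.first = seg - max_bundled;
	if (r.first < first_unacked)
		r.first = first_unacked;
	r.last = seg;
	return r;
}
```

With segment 1 still un-ACKed, toy_inplace_packet(1, 3) yields the packet [1,2,3], while toy_copy_packet(1, 3, 1) yields [2,3], the case that is hard to express once the SKBs have been merged.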
> 
> 
>> RDB could be implemented in a more concise way.
> 
> I'm open to suggestions for improvements. However, I can't see how the
> suggested approach (as I've understood it) can be implemented without
> making extensive modifications to the current TCP engine. Having one
> SKB represent multiple packets, where each packet contains different data
> and possibly in different states (retransmitted/lost), seems very complex.
> 
> By avoiding any modifications to the output queue we ensure the
> default code branch is completely unaffected, avoiding any special
> handling in multiple locations in the codebase.
> 
> 
> Regards,
> 
> Bendik
> 
