From: "Bendik Rønning Opstad" <bro.devel@gmail.com>
To: "Bendik Rønning Opstad" <bro.devel@gmail.com>,
"Eric Dumazet" <eric.dumazet@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>,
Netdev <netdev@vger.kernel.org>,
"Yuchung Cheng" <ycheng@google.com>,
"Neal Cardwell" <ncardwell@google.com>,
"Andreas Petlund" <apetlund@simula.no>,
"Carsten Griwodz" <griff@simula.no>,
"Pål Halvorsen" <paalh@simula.no>,
"Jonas Markussen" <jonassm@ifi.uio.no>,
"Kristian Evensen" <kristian.evensen@gmail.com>,
"Kenneth Klette Jonassen" <kennetkl@ifi.uio.no>
Subject: Re: [PATCH v3 net-next 2/2] tcp: Add Redundant Data Bundling (RDB)
Date: Mon, 8 Feb 2016 18:30:49 +0100 [thread overview]
Message-ID: <56B8D0C9.9010509@gmail.com> (raw)
In-Reply-To: <CAF8eE=VOuoNLQHtkRwM9ZG+vJ-uH2ufVW5y_pS24rGqWh4Qa2g@mail.gmail.com>
Sorry guys, I messed up that email by including HTML, and it got
rejected by netdev@vger.kernel.org. I'll resend it properly formatted.
Bendik
On 08/02/16 18:17, Bendik Rønning Opstad wrote:
> Eric, thank you for the feedback!
>
> On Wed, Feb 3, 2016 at 8:34 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Wed, 2016-02-03 at 19:17 +0100, Bendik Rønning Opstad wrote:
>>> On Tue, Feb 2, 2016 at 9:35 PM, Eric Dumazet <eric.dumazet@gmail.com>
> wrote:
>>>> Really this looks very complicated.
>>>
>>> Can you be more specific?
>>
>> A lot of code added, needing maintenance cost for years to come.
>
> Yes, that is understandable.
>
>>>> Why not simply append the new skb content to prior one ?
>>>
>>> It's not clear to me what you mean. At what stage in the output engine
>>> do you refer to?
>>>
>>> We want to avoid modifying the data of the SKBs in the output queue,
>>
>> Why ? We already do that, as I pointed out.
>
> I suspect that we might be talking past each other. It wasn't clear to
> me that we were discussing how to implement this in a different way.
>
> The current retrans collapse functionality only merges SKBs that
> contain data that has already been sent and is about to be
> retransmitted.
>
> This differs significantly from RDB, which combines both already
> transmitted data and unsent data in the same packet without changing
> how the data is stored (and the state tracked) in the output queue.
> Another difference is that RDB includes un-ACKed data that is not
> considered lost.
>
>>> therefore we allocate a new SKB (This SKB is named rdb_skb in the code).
>>> The header and payload of the first SKB containing data we want to
>>> redundantly transmit is then copied. Then the payload of the SKBs
> following
>>> next in the output queue is appended onto the rdb_skb. The last payload
>>> that is appended is from the first SKB with unsent data, i.e. the
>>> sk_send_head.
>>>
>>> Would you suggest a different approach?
>>>
>>>> skb_still_in_host_queue(sk, prior_skb) would also tell you if the skb
> is
>>>> really available (ie its clone not sitting/waiting in a qdisc on the
>>>> host)
>>>
>>> Where do you suggest this should be used?
>>
>> To detect if appending data to prior skb is possible.
>
> I see. As the implementation intentionally avoids modifying SKBs in
> the output queue, this was not obvious.
>
>> If the prior packet is still in qdisc, no change is allowed,
>> and it is fine : DRB should not trigger anyway.
>
> Actually, whether the data in the prior SKB is on the wire or is still
> on the host (in qdisc/driver queue) is not relevant. RDB always wants
> to redundantly resend the data if there is room in the packet, because
> the previous packet may become lost.
>
>>>> Note : select_size() always allocate skb with SKB_WITH_OVERHEAD(2048 -
>>>> MAX_TCP_HEADER) available bytes in skb->data.
>>>
>>> Sure, rdb_build_skb() could use this instead of the calculated
>>> bytes_in_rdb_skb.
>>
>> Point is : small packets already have tail room in skb->head
>
> Yes, I'm aware of that. But we do not allocate new SKBs because we
> think the existing SKBs do not have enough space available. We do it
> to avoid modifications to the SKBs in the output queue.
>
>> When RDB decides a packet should be merged into the prior one, you can
>> simply copy payload into the tailroom, then free the skb.
>>
>> No skb allocations are needed, only freeing.
>
> It wasn't clear to me that you suggest a completely different
> implementation approach altogether.
>
> As I understand you, the approach you suggest is as follows:
>
> 1. An SKB containing unsent data is processed for transmission (lets
> call it T_SKB)
> 2. Check if the previous SKB (lets call it P_SKB) (containing sent but
> un-ACKed data) has available (tail) room for the payload contained
> in T_SKB.
> 3. If room in P_SKB:
> * Copy the unsent data from T_SKB to P_SKB by appending it to the
> linear data and update sequence numbers.
> * Remove T_SKB (which contains only the new and unsent data) from
> the output queue.
> * Transmit P_SKB, which now contains some already sent data and some
> unsent data.
>
>
> If I have misunderstood, can you please elaborate in detail what you
> mean?
>
> If this is the approach you suggest, I can think of some potential
> downsides that require further considerations:
>
>
> 1) ACK-accounting will work differently
>
> When the previous SKB (P_SKB) is modified by appending the data of the
> next SKB (T_SKB), what should happen when an incoming ACK
> acknowledges the data that was sent in the original transmission
> (before the SKB was modified), but not the data that was appended
> later? tcp_clean_rtx_queue currently handles partially ACKed SKBs due
> to TSO, in which case the tcp_skb_pcount(skb) > 1. So this function
> would need to be modified to handle this for RDB modified SKBs in the
> queue, where all the data is located in the linear data buffer (no GSO
> segs).
>
> How should SACK and retrans flags be handled when one SKB in the
> output queue can represent multiple transmitted packets?
>
>
> 2) Timestamps and RTT measurements
>
> How should RTT measurements work when you don't have a timestamp for
> the data that was newly appended to the existing SKB containing sent
> but un-ACKed data? Or should the skb->skb_mstamp be updated when the
> SKB with newly appended data is sent again? That would make any RTT
> measurements based on ACKs on the originally sent packet unusable.
>
>
> 3) Retransmit and lost SKB hints
>
> Appending unsent data to SKBs with sent data will affect the usage of
> tp->retransmit_skb_hint and tp->lost_skb_hint. As these variables
> contain pointers to SKBs in the output queue, it is implied that all
> the data in an SKB has the same state, such as retransmitted or lost.
>
>
> 4) RDB's loss accounting
>
> RDB detects loss by looking at how many segments that are ACKed. If an
> incoming ACK acknowledges data in multiples SKBs, we can infer that
> loss has occurred (ignoring the possibility of reordering). With the
> approach you suggest, we lose the information about how many packets
> we originally had, and how much of the payload was redundant
> (considering SKBs are updated with new data and sent out again). We
> would need additional variables in order to keep track of this.
>
>
> 5) Forced bundling on retransmissions
>
> Since the SKBs in the output queue are modified to contain redundant
> data, retransmissions of the SKBs will necessarily only contain the
> redundant data unless the SKBs are modified before the retransmission.
>
>
> 6) Configuring how much is bundled becomes complex
>
> When previous SKBs are to be used by appending the new data to be
> sent, it is no longer possible to configure the amount of data to
> bundle. We are forced to bundle all the data in the previous SKB.
>
> Say we have 3 SKBs in the queue, with unsent segments 1, 2, 3:
> [1] [2] [3]
>
> Send 1:
> [1] ->
> Try to send 2, but first merge 2 with 1:
> [1,2] [3]
> Send merged SKB:
> [1,2] ->
>
> When we want to send segment 3, we are forced to bundle both 1 and 2.
> Try to send 3, but first merge 3 with 1,2.
> [1,2,3]
> Send merged SKB:
> [1,2,3] ->
>
> Transmitting only 2,3 in a packet then becomes difficult without
> additional logic for RDB record keeping.
>
>
>> RDB could be implemented in a more concise way.
>
> I'm open for suggestions to improvements. However, I can't see how the
> suggested approach (as I've understood it) can be implemented without
> making extensive modifications to the current TCP engine. Having one
> SKB represent multiple packets, where each packet contains different data
> and possibly in different states (retransmitted/lost), seems very complex.
>
> By avoiding any modifications to the output queue we ensure the
> default code branch is completely unaffected, avoiding any special
> handling in multiple locations in the codebase.
>
>
> Regards,
>
> Bendik
>
next prev parent reply other threads:[~2016-02-08 17:31 UTC|newest]
Thread overview: 63+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-10-23 20:50 [PATCH RFC net-next 0/2] tcp: Redundant Data Bundling (RDB) Bendik Rønning Opstad
2015-10-23 20:50 ` [PATCH RFC net-next 1/2] tcp: Add DPIFL thin stream detection mechanism Bendik Rønning Opstad
2015-10-23 21:44 ` Eric Dumazet
[not found] ` <1445636654.22974.193.camel-XN9IlZ5yJG9HTL0Zs8A6p/gx64E7kk8eUsxypvmhUTTZJqsBc5GL+g@public.gmane.org>
2015-10-25 5:56 ` Bendik Rønning Opstad
2015-10-23 20:50 ` [PATCH RFC net-next 2/2] tcp: Add Redundant Data Bundling (RDB) Bendik Rønning Opstad
2015-10-26 14:50 ` Neal Cardwell
2015-10-26 21:35 ` Andreas Petlund
2015-10-26 21:58 ` Yuchung Cheng
2015-10-27 19:15 ` Jonas Markussen
2015-10-29 22:53 ` Bendik Rønning Opstad
2015-11-02 9:18 ` David Laight
2015-11-02 9:37 ` David Laight
2015-11-05 2:06 ` Bendik Rønning Opstad
2015-10-24 6:11 ` [PATCH RFC net-next 0/2] tcp: " Yuchung Cheng
2015-10-24 8:00 ` Jonas Markussen
[not found] ` <61F74109-9FDC-485A-978B-714B7AA27445-6miFZF/5cTBuMpJDpNschA@public.gmane.org>
2015-10-24 12:57 ` Eric Dumazet
2015-11-09 19:40 ` Bendik Rønning Opstad
2015-11-23 16:26 ` [PATCH RFC v2 " Bendik Rønning Opstad
2015-11-23 16:26 ` [PATCH RFC v2 net-next 1/2] tcp: Add DPIFL thin stream detection mechanism Bendik Rønning Opstad
2015-11-23 16:26 ` [PATCH RFC v2 net-next 2/2] tcp: Add Redundant Data Bundling (RDB) Bendik Rønning Opstad
2015-11-23 17:43 ` Eric Dumazet
2015-11-23 20:05 ` Bendik Rønning Opstad
2016-02-02 19:23 ` [PATCH v3 net-next 0/2] tcp: " Bendik Rønning Opstad
2016-02-02 19:23 ` [PATCH v3 net-next 1/2] tcp: Add DPIFL thin stream detection mechanism Bendik Rønning Opstad
2016-02-02 19:23 ` [PATCH v3 net-next 2/2] tcp: Add Redundant Data Bundling (RDB) Bendik Rønning Opstad
2016-02-02 20:35 ` Eric Dumazet
2016-02-03 18:17 ` Bendik Rønning Opstad
2016-02-03 19:34 ` Eric Dumazet
[not found] ` <CAF8eE=VOuoNLQHtkRwM9ZG+vJ-uH2ufVW5y_pS24rGqWh4Qa2g@mail.gmail.com>
2016-02-08 17:30 ` Bendik Rønning Opstad [this message]
2016-02-08 17:38 ` Bendik Rønning Opstad
2016-02-16 13:51 ` [PATCH v4 net-next 0/2] tcp: " Bendik Rønning Opstad
2016-02-16 13:51 ` [PATCH v4 net-next 1/2] tcp: Add DPIFL thin stream detection mechanism Bendik Rønning Opstad
2016-02-16 13:51 ` [PATCH v4 net-next 2/2] tcp: Add Redundant Data Bundling (RDB) Bendik Rønning Opstad
2016-02-18 15:18 ` Eric Dumazet
2016-02-19 14:12 ` Bendik Rønning Opstad
2016-02-24 21:12 ` [PATCH v5 net-next 0/2] tcp: " Bendik Rønning Opstad
2016-02-24 21:12 ` [PATCH v5 net-next 1/2] tcp: Add DPIFL thin stream detection mechanism Bendik Rønning Opstad
2016-02-24 21:12 ` [PATCH v5 net-next 2/2] tcp: Add Redundant Data Bundling (RDB) Bendik Rønning Opstad
2016-03-02 19:52 ` David Miller
2016-03-02 22:33 ` Bendik Rønning Opstad
2016-03-03 18:06 ` [PATCH v6 net-next 0/2] tcp: " Bendik Rønning Opstad
2016-03-07 19:36 ` David Miller
2016-03-10 0:20 ` Yuchung Cheng
2016-03-10 1:45 ` Jonas Markussen
2016-03-10 2:27 ` Yuchung Cheng
2016-03-12 9:23 ` Jonas Markussen
2016-03-13 23:18 ` Bendik Rønning Opstad
2016-03-14 21:59 ` Yuchung Cheng
2016-03-18 14:25 ` Bendik Rønning Opstad
2016-03-03 18:06 ` [PATCH v6 net-next 1/2] tcp: Add DPIFL thin stream detection mechanism Bendik Rønning Opstad
2016-03-03 18:06 ` [PATCH v6 net-next 2/2] tcp: Add Redundant Data Bundling (RDB) Bendik Rønning Opstad
2016-03-14 21:15 ` Eric Dumazet
2016-03-15 1:04 ` Rick Jones
2016-03-15 18:09 ` Yuchung Cheng
2016-03-18 17:58 ` Bendik Rønning Opstad
2016-03-14 21:54 ` Yuchung Cheng
2016-03-15 0:40 ` Bill Fink
2016-03-17 23:26 ` Bendik Rønning Opstad
2016-03-21 18:54 ` Yuchung Cheng
2016-06-16 17:12 ` Bendik Rønning Opstad
2016-06-22 14:56 ` [PATCH v7 net-next 0/2] tcp: " Bendik Rønning Opstad
2016-06-22 14:56 ` [PATCH v7 net-next 1/2] tcp: Add DPIFL thin stream detection mechanism Bendik Rønning Opstad
2016-06-22 14:56 ` [PATCH v7 net-next 2/2] tcp: Add Redundant Data Bundling (RDB) Bendik Rønning Opstad
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=56B8D0C9.9010509@gmail.com \
--to=bro.devel@gmail.com \
--cc=apetlund@simula.no \
--cc=davem@davemloft.net \
--cc=eric.dumazet@gmail.com \
--cc=griff@simula.no \
--cc=jonassm@ifi.uio.no \
--cc=kennetkl@ifi.uio.no \
--cc=kristian.evensen@gmail.com \
--cc=ncardwell@google.com \
--cc=netdev@vger.kernel.org \
--cc=paalh@simula.no \
--cc=ycheng@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).