Why tcp_sacktag_walk specially process next

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Why tcp_sacktag_walk specially process next_dup?
@ 2012-12-25 10:35 Yi Li
  2012-12-25 20:57 ` Ilpo Järvinen
  0 siblings, 1 reply; 2+ messages in thread
From: Yi Li @ 2012-12-25 10:35 UTC (permalink / raw)
  To: ilpo.jarvinen; +Cc: netdev

Hi Ilpo,
I am a kernel newbie, maybe this question is simple.
If you have some free time, could you help me ?

I am reading your commit 
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=68f8353b480e5f2e136c38a511abdbb88eaa8ce2, 
through this code path:

tcp_sacktag_write_queue() {
     if (tcp_sack_cache_ok(tp, cache) && !dup_sack &&
             after(end_seq, cache->start_seq)) {

             /* Head todo? */
             if (before(start_seq, cache->start_seq)) {
                         skb = tcp_sacktag_skip(skb, sk, &state,
                                start_seq);
                 skb = tcp_sacktag_walk(skb, sk, next_dup,
                                &state,
                                start_seq,
                                cache->start_seq,
                                dup_sack);
             }

}

and when we come to tcp_sacktag_walk(), comparing the current processing 
sack block
with cache, we have:  start_seq < cache->start_seq, and we now need to 
process the
bytes between (start_seq, cache->start_seq) in tcp_write_queue.

But in tcp_sacktag_walk(), why we first check the seqence space in 
next_dup ?
I know this is about D-SACK, and I have read the rfc2883, but I am still 
confused.
I have some questions:
1. Why we introduce a next_dup variable in SACK processing, is it better 
for performance optimization?
      As there is dup_sack variable, will this pre-processing of sack 
block be mixed with dup_sack ?
2. What does this test statement means in tcp_sacktag_walk:
      if ((next_dup != NULL) &&
             before(TCP_SKB_CB(skb)->seq, next_dup->end_seq)) { 
---------------------> A
             in_sack = tcp_match_skb_to_sack(sk, skb,
                             next_dup->start_seq,
                             next_dup->end_seq);
             if (in_sack > 0)
                 dup_sack = true;
         }
as far as i know, if tcp_skb_pcout(skb)>1, this condition maybe exist:
        skb->seq   < current_sack_block.start_seq < 
current_sack_block.end_seq < next_dup->start_seq < next_dup->end_seq.
So, I do not understand what the code A really does.

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: Why tcp_sacktag_walk specially process next_dup?
  2012-12-25 10:35 Why tcp_sacktag_walk specially process next_dup? Yi Li
@ 2012-12-25 20:57 ` Ilpo Järvinen
  0 siblings, 0 replies; 2+ messages in thread
From: Ilpo Järvinen @ 2012-12-25 20:57 UTC (permalink / raw)
  To: Yi Li; +Cc: Netdev

On Tue, 25 Dec 2012, Yi Li wrote:

> Hi Ilpo,
> I am a kernel newbie, maybe this question is simple.
> If you have some free time, could you help me ?
>
> I am reading your commit
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=68f8353b480e5f2e136c38a511abdbb88eaa8ce2,
> through this code path:
> 
> tcp_sacktag_write_queue() {
>     if (tcp_sack_cache_ok(tp, cache) && !dup_sack &&
>             after(end_seq, cache->start_seq)) {
> 
>             /* Head todo? */
>             if (before(start_seq, cache->start_seq)) {
>                         skb = tcp_sacktag_skip(skb, sk, &state,
>                                start_seq);
>                 skb = tcp_sacktag_walk(skb, sk, next_dup,
>                                &state,
>                                start_seq,
>                                cache->start_seq,
>                                dup_sack);
>             }
> 
> }
> 
> and when we come to tcp_sacktag_walk(), comparing the current processing sack
> block
> with cache, we have:  start_seq < cache->start_seq, and we now need to
> process the
> bytes between (start_seq, cache->start_seq) in tcp_write_queue.
> 
> But in tcp_sacktag_walk(), why we first check the seqence space in next_dup ?
> I know this is about D-SACK, and I have read the rfc2883, but I am still
> confused.
> I have some questions:
> 1. Why we introduce a next_dup variable in SACK processing, is it better for
> performance optimization?

IIRC, it is optimization, probably to avoid need to do non-in-order ops 
(it's few years since I wrote that already :)). But I'll take a look later 
again.

>      As there is dup_sack variable, will this pre-processing of sack block be
> mixed with dup_sack ?

IIRC, dup_sack tells that we're processing DSACK currently.

> 2. What does this test statement means in tcp_sacktag_walk:
>      if ((next_dup != NULL) &&
>             before(TCP_SKB_CB(skb)->seq, next_dup->end_seq)) {
> ---------------------> A
>             in_sack = tcp_match_skb_to_sack(sk, skb,
>                             next_dup->start_seq,
>                             next_dup->end_seq);
>             if (in_sack > 0)
>                 dup_sack = true;
>         }
> as far as i know, if tcp_skb_pcout(skb)>1, this condition maybe exist:
>        skb->seq   < 
> current_sack_block.start_seq < current_sack_block.end_seq <
> next_dup->start_seq < next_dup->end_seq.
>
> So, I do not understand what the code A really does.

Then tcp_match_skb_to_sack won't match this skb but just splits first if 
necessary. It only does work if there's something to do already there 
(but I cannot fully check the code easily now so you might have some 
point I cannot follow based on this email alone).

BTW, these two conditions will always hold nowadays due to SACK block 
validation:

> current_sack_block.start_seq < current_sack_block.end_seq
> next_dup->start_seq < next_dup->end_seq.


-- 
 i.

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2012-12-25 21:02 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-12-25 10:35 Why tcp_sacktag_walk specially process next_dup? Yi Li
2012-12-25 20:57 ` Ilpo Järvinen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).