* [PATCH 1/3] Advance fast path pointer for first block only
From: Baruch Even @ 2007-01-27 16:48 UTC
To: David S. Miller; +Cc: netdev
Only advance the SACK fast-path pointer for the first block; the fast path
assumes that only the first block advances next time, so we should not move
the skb for the subsequent SACK blocks.
Signed-Off-By: Baruch Even <baruch@ev-en.org>
---
I'm not sure about the fack_count part; this patch changes the value that is
maintained in this case.
Index: 2.6-rc6/net/ipv4/tcp_input.c
===================================================================
--- 2.6-rc6.orig/net/ipv4/tcp_input.c 2007-01-27 14:53:27.000000000 +0200
+++ 2.6-rc6/net/ipv4/tcp_input.c 2007-01-27 15:59:22.000000000 +0200
@@ -1048,8 +1048,13 @@
int in_sack, pcount;
u8 sacked;
- tp->fastpath_skb_hint = skb;
- tp->fastpath_cnt_hint = fack_count;
+ if (i == 0) {
+ /* Only advance the hint for the first SACK
+ * block, the hint is for quickly handling the
+ * advancing of the first SACK blocks only. */
+ tp->fastpath_skb_hint = skb;
+ tp->fastpath_cnt_hint = fack_count;
+ }
/* The retransmission queue is always in order, so
* we can short-circuit the walk early.
* Re: [PATCH 1/3] Advance fast path pointer for first block only
From: David Miller @ 2007-01-31 20:48 UTC
To: baruch; +Cc: netdev
From: Baruch Even <baruch@ev-en.org>
Date: Mon, 29 Jan 2007 09:13:39 +0200
> Only advance the SACK fast-path pointer for the first block; the fast path
> assumes that only the first block advances next time, so we should not move
> the skb for the subsequent SACK blocks.
>
> Signed-Off-By: Baruch Even <baruch@ev-en.org>
>
> ---
>
> I'm not sure about the fack_count part; this patch changes the value that is
> maintained in this case.
I'm not sure about this patch :-)
The fastpath is being used for two things here, I believe.
First, it's being used to monitor the expanding of the initial
SACK block, as you note. This is clear from the fact that we
only use the fastpath cache on future calls when this code path
triggers:
	if (flag)
		num_sacks = 1;
because otherwise we clear the fastpath cache to NULL.
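For context, the check that sets this flag looks roughly like this (my
paraphrase of the 2.6.20-era tcp_sacktag_write_queue(), not the verbatim
source):

	/* Sketch: the fast path is considered valid only when the sole
	 * change since the last ACK is the growth of end_seq in the
	 * first SACK block; any other difference clears flag and
	 * forces the slow path. */
	flag = 1;
	for (i = 0; i < num_sacks; i++) {
		__u32 start_seq = ntohl(sp[i].start_seq);
		__u32 end_seq = ntohl(sp[i].end_seq);

		if (i == 0) {
			if (tp->recv_sack_cache[i].start_seq != start_seq)
				flag = 0;
		} else {
			if ((tp->recv_sack_cache[i].start_seq != start_seq) ||
			    (tp->recv_sack_cache[i].end_seq != end_seq))
				flag = 0;
		}
		tp->recv_sack_cache[i].start_seq = start_seq;
		tp->recv_sack_cache[i].end_seq = end_seq;
	}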
But it's also being used in this loop you are editing to remember
where we stopped in the previous iteration of the loop. With your
change it will always walk the whole retransmit queue from the end of
the first SACK block, whereas before it would iterate starting at the
last SKB visited at the end of the previous SACK block processed by
the loop.
This sounds trivial, but on deep pipes it could bring back some of
the performance problems the hinting was meant to cure.
The fack_count issue should be OK, since we're using the value
properly calculated at the end of the first SACK block as we
iterate to find the SKBs within the SACK block. Another way
of saying this is that if the algorithm is correct when the
fastpath cache is missed, it should be correct if we only cache
for the first SACK block.
I'll use this opportunity to say I'm rather unhappy with all of
these fastpath hinting schemes. They chew up a ton of space in
the tcp_sock because we have several pointers for each of the
different queue states we can cache, and those eat 8 bytes apiece
on 64-bit.
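To make that concrete, these are the hint fields I mean (a sketch
paraphrased from struct tcp_sock around this time; treat the exact
field list as illustrative):

	struct sk_buff *lost_skb_hint;
	struct sk_buff *scoreboard_skb_hint;
	struct sk_buff *retransmit_skb_hint;
	struct sk_buff *forward_skb_hint;
	struct sk_buff *fastpath_skb_hint;	/* SACK fast path */
	int	lost_cnt_hint;
	int	retransmit_cnt_hint;
	int	fastpath_cnt_hint;

Five sk_buff pointers at 8 bytes apiece on 64-bit, before counting the
integer counters that go with them.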
We can probably fix the bug and preserve the inter-iteration
end-SKB caching by having local variable SKB/fack_count caches
in this function, and only updating the tcp_sock cache values
for the first SACK block.
Maybe something like this?
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c26076f..b470a85 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -936,12 +936,14 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
struct tcp_sock *tp = tcp_sk(sk);
unsigned char *ptr = ack_skb->h.raw + TCP_SKB_CB(ack_skb)->sacked;
struct tcp_sack_block_wire *sp = (struct tcp_sack_block_wire *)(ptr+2);
+ struct sk_buff *cached_skb;
int num_sacks = (ptr[1] - TCPOLEN_SACK_BASE)>>3;
int reord = tp->packets_out;
int prior_fackets;
u32 lost_retrans = 0;
int flag = 0;
int dup_sack = 0;
+ int cached_fack_count;
int i;
if (!tp->sacked_out)
@@ -1025,20 +1027,22 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
/* clear flag as used for different purpose in following code */
flag = 0;
+ /* Use SACK fastpath hint if valid */
+ cached_skb = tp->fastpath_skb_hint;
+ cached_fack_count = tp->fastpath_cnt_hint;
+ if (!cached_skb) {
+ cached_skb = sk->sk_write_queue.next;
+ cached_fack_count = 0;
+ }
+
for (i=0; i<num_sacks; i++, sp++) {
struct sk_buff *skb;
__u32 start_seq = ntohl(sp->start_seq);
__u32 end_seq = ntohl(sp->end_seq);
int fack_count;
- /* Use SACK fastpath hint if valid */
- if (tp->fastpath_skb_hint) {
- skb = tp->fastpath_skb_hint;
- fack_count = tp->fastpath_cnt_hint;
- } else {
- skb = sk->sk_write_queue.next;
- fack_count = 0;
- }
+ skb = cached_skb;
+ fack_count = cached_fack_count;
/* Event "B" in the comment above. */
if (after(end_seq, tp->high_seq))
@@ -1048,8 +1052,12 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
int in_sack, pcount;
u8 sacked;
- tp->fastpath_skb_hint = skb;
- tp->fastpath_cnt_hint = fack_count;
+ cached_skb = skb;
+ cached_fack_count = fack_count;
+ if (i == 0) {
+ tp->fastpath_skb_hint = skb;
+ tp->fastpath_cnt_hint = fack_count;
+ }
/* The retransmission queue is always in order, so
* we can short-circuit the walk early.
* Re: [PATCH 2/3] Separate DSACK from SACK fast path
From: David Miller @ 2007-01-31 20:49 UTC
To: baruch; +Cc: netdev
From: Baruch Even <baruch@ev-en.org>
Date: Mon, 29 Jan 2007 09:13:44 +0200
> + for (i = 0; i < num_sacks; i++) {
> + __be32 start_seq = sp[i].start_seq;
> + __be32 end_seq = sp[i].end_seq;
This is not sufficient; you also have to fix up the type
of the recv_sack_cache[] array entries in struct tcp_sock
to be struct tcp_sack_block_wire.
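The distinction, roughly (a sketch of the two definitions from the
headers; the point is the endianness annotation that sparse checks):

	/* Blocks as they appear on the wire carry big-endian sequence
	 * numbers, so a cache that stores them unconverted must use
	 * the __be32 variant. */
	struct tcp_sack_block_wire {
		__be32	start_seq;
		__be32	end_seq;
	};

	struct tcp_sack_block {
		u32	start_seq;
		u32	end_seq;
	};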
* Re: [PATCH 3/3] Check num sacks in SACK fast path
From: David Miller @ 2007-01-31 20:52 UTC
To: baruch; +Cc: netdev
From: Baruch Even <baruch@ev-en.org>
Date: Mon, 29 Jan 2007 09:13:49 +0200
> When we check for the SACK fast path, make sure that we also have the same
> number of SACK blocks in the cache and in the new SACK data. This prevents
> us from mistakenly taking the cached data when the old data in the SACK
> cache happens to match the new SACK blocks.
>
> Signed-Off-By: Baruch Even <baruch@ev-en.org>
We could implement this without extra state, for example by
clearing out the rest of the recv_sack_cache entries.
We should never see a SACK block from sequence zero to zero,
which would be an empty SACK block.
Something like the following?
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c26076f..84cd722 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -999,6 +1001,10 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
return 0;
}
}
+ for (; i <= 4; i++) {
+ tp->recv_sack_cache[i].start_seq = 0;
+ tp->recv_sack_cache[i].end_seq = 0;
+ }
if (flag)
num_sacks = 1;
* Re: [PATCH 3/3] Check num sacks in SACK fast path
From: Baruch Even @ 2007-02-01 7:22 UTC
To: David Miller; +Cc: netdev
* David Miller <davem@davemloft.net> [070131 22:52]:
> From: Baruch Even <baruch@ev-en.org>
> Date: Mon, 29 Jan 2007 09:13:49 +0200
>
> > When we check for the SACK fast path, make sure that we also have the same
> > number of SACK blocks in the cache and in the new SACK data. This prevents
> > us from mistakenly taking the cached data when the old data in the SACK
> > cache happens to match the new SACK blocks.
> >
> > Signed-Off-By: Baruch Even <baruch@ev-en.org>
>
> We could implement this without extra state, for example by
> clearing out the rest of the recv_sack_cache entries.
>
> We should never see a SACK block from sequence zero to zero,
> which would be an empty SACK block.
That would work as well, at the cost of extra writes to memory for each
ACK packet. Though I won't guess which is worse: the extra memory used or
the extra writing.
> Something like the following?
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index c26076f..84cd722 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -999,6 +1001,10 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
> return 0;
> }
> }
> + for (; i <= 4; i++) {
That won't work though: the <= should be <. I've actually used
ARRAY_SIZE just to be on the safe side.
> + tp->recv_sack_cache[i].start_seq = 0;
> + tp->recv_sack_cache[i].end_seq = 0;
> + }
>
> if (flag)
> num_sacks = 1;
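For clarity, the loop I have in mind looks roughly like this (a sketch,
not the exact patch I posted):

	/* Clear the unused tail of recv_sack_cache; ARRAY_SIZE avoids
	 * hard-coding the bound and can never overrun the array. */
	for (; i < ARRAY_SIZE(tp->recv_sack_cache); i++) {
		tp->recv_sack_cache[i].start_seq = 0;
		tp->recv_sack_cache[i].end_seq = 0;
	}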
Baruch
* Re: [PATCH 1/3] Advance fast path pointer for first block only
From: Baruch Even @ 2007-02-01 7:38 UTC
To: David Miller; +Cc: netdev
* David Miller <davem@davemloft.net> [070131 22:48]:
> From: Baruch Even <baruch@ev-en.org>
> Date: Mon, 29 Jan 2007 09:13:39 +0200
>
> > Only advance the SACK fast-path pointer for the first block; the fast path
> > assumes that only the first block advances next time, so we should not move
> > the skb for the subsequent SACK blocks.
> >
> > Signed-Off-By: Baruch Even <baruch@ev-en.org>
> >
> > ---
> >
> > I'm not sure about the fack_count part; this patch changes the value that is
> > maintained in this case.
>
> I'm not sure about this patch :-)
That's what we have a gatekeeper for: to keep us on our toes.
> The fastpath is being used for two things here, I believe.
>
> First, it's being used to monitor the expanding of the initial
> SACK block, as you note. This is clear from the fact that we
> only use the fastpath cache on future calls when this code path
> triggers:
>
>	if (flag)
>		num_sacks = 1;
>
> because otherwise we clear the fastpath cache to NULL.
True.
> But it's also being used in this loop you are editing to remember
> where we stopped in the previous iteration of the loop. With your
> change it will always walk the whole retransmit queue from the end of
> the first SACK block, whereas before it would iterate starting at the
> last SKB visited at the end of the previous sack block processed by
> the loop.
OK. When I did the patch I forgot that by this stage we have already sorted
the SACK blocks, so we can't rely on the expanding block being the first.
> I'll use this opportunity to say I'm rather unhappy with all of
> these fastpath hinting schemes. They chew up a ton of space in
> the tcp_sock because we have several pointers for each of the
> different queue states we can cache and those eat 8 bytes a
> piece on 64-bit.
Understood. I do intend to look at different ways to organise the data
and do the work, but I wouldn't hold my breath for it.

One option to limit the damage of slow SACK processing is to cap the
amount of work we are willing to do per SACK: if we limit the SACK walk
to 1000 packets we prevent the worst case, and I believe we would still
not cause much harm to TCP recovery. That belief needs to be checked,
and the value 1000 also needs to be checked and maybe made configurable.
This is obviously a hack, but it will guarantee that we never cause the
ACK clock to die like it can now.
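As a rough sketch of what I mean (the constant, its name and the exact
placement are all hypothetical):

#define TCP_MAX_SACK_WALK	1000

	/* Hypothetical cap on the per-ACK SACK walk; 1000 is a guess
	 * and would need measurement or a sysctl. */
	int walked = 0;

	sk_stream_for_retrans_queue_from(skb, sk) {
		if (++walked > TCP_MAX_SACK_WALK)
			break;	/* bail out rather than stall the ACK clock */
		/* ... normal per-skb SACK tagging ... */
	}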
> We can probably fix the bug and preserve the inter-iteration
> end-SKB caching by having local variable SKB/fack_count caches
> in this function, and only updating the tcp_sock cache values
> for the first SACK block.
>
> Maybe something like this?
Ah, now I see what you meant, and it is indeed another issue with my
patch. I'll update my patch with this fix, and also fix the issue of
i == 0 being the wrong way to cache only the originally first SACK block.
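Concretely, I'm thinking of something along these lines (a sketch of
the plan, not the final patch): remember which block was first on the
wire, follow it through the sort, and test against that index instead
of i == 0.

	int first_sack_index = 0;

	/* inside the bubble sort of the SACK blocks, when entries
	 * j and j+1 are swapped: */
	if (j == first_sack_index)
		first_sack_index++;
	else if (j + 1 == first_sack_index)
		first_sack_index--;

	/* later, in the tagging loop: */
	if (i == first_sack_index) {
		tp->fastpath_skb_hint = skb;
		tp->fastpath_cnt_hint = fack_count;
	}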
Baruch
* Re: [PATCH 3/3] Check num sacks in SACK fast path
From: David Miller @ 2007-02-01 22:38 UTC
To: baruch; +Cc: netdev
From: Baruch Even <baruch@ev-en.org>
Date: Thu, 1 Feb 2007 09:22:52 +0200
> * David Miller <davem@davemloft.net> [070131 22:52]:
> > We should never see a SACK block from sequence zero to zero,
> > which would be an empty SACK block.
>
> That would work as well, at the cost of extra writes to memory for each
> ACK packet. Though I won't guess which is worse: the extra memory used or
> the extra writing.
Good point. The recv_sack_cache is 32 bytes, so it sits in at most
2 cache lines. The writes are consecutive and in order, so they would
compress into the CPU store buffers.
But it's definitely not free. :-)
> > Something like the following?
> >
> > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > index c26076f..84cd722 100644
> > --- a/net/ipv4/tcp_input.c
> > +++ b/net/ipv4/tcp_input.c
> > @@ -999,6 +1001,10 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
> > return 0;
> > }
> > }
> > + for (; i <= 4; i++) {
>
> That won't work though: the <= should be <. I've actually used
> ARRAY_SIZE just to be on the safe side.
Thanks for catching that.
* Re: [PATCH 1/3] Advance fast path pointer for first block only
From: David Miller @ 2007-02-05 7:36 UTC
To: baruch; +Cc: netdev
From: Baruch Even <baruch@ev-en.org>
Date: Fri, 02 Feb 2007 16:41:16 +0200
> Only advance the SACK fast-path pointer for the first block; the fast path
> assumes that only the first block advances next time, so we should not move
> the cached skb for the subsequent SACK blocks.
>
> Signed-Off-By: Baruch Even <baruch@ev-en.org>
Looks great, patch applied, thanks!