* [PATCH 0/9]: TCP: The Road to Super TSO
From: David S. Miller @ 2005-06-07 4:08 UTC
To: netdev; +Cc: herbert, jheffner
Some folks, notably the S2IO guys, see performance degradation
with the Super TSO v2 patch (they see it with the first version
as well).  It's a real pain to spot what causes such things
in such a huge patch... so I started splitting things up in
a very fine-grained manner so we can catch regressions more
precisely.
This first set of 9 patches fixes several bugs, and I'd really
appreciate good high-quality testing reports.
Please do not mail such reports privately to me, as some have
done; always include netdev@oss.sgi.com.  Thanks a lot.
Herbert, I'm CC:'ing you because one of the bugs fixed here
has to do with the TSO header COW'ing stuff you did.  You
missed one case where a skb_header_release() call was needed,
namely tcp_fragment(), where it does its __skb_append().
John, I'm CC:'ing you because there are several cwnd handling
related cures in here. I did _not_ fix the TSO cwnd growth
bug yet in these patches, but it is at the very top of my
TODO list for my next batch of work on this stuff. The most
notable fix here is the bogus extra cwnd validation done by
__tcp_push_pending_frames(). That validation should only
occur if we _do_ send some packets, and tcp_write_xmit() takes
care of that just fine. The other one is that the 'nonagle'
argument to __tcp_push_pending_frames() is clobbered by its
tcp_skb_is_last() logic, causing TCP_NAGLE_PUSH to be used for
all packets processed by tcp_write_xmit(), whoops...
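To make the 'nonagle' clobbering concrete, here is a toy userspace
model of the buggy pattern; the flag value and queue length are
invented for illustration, this is not the kernel code itself:

#include <stdio.h>

#define TCP_NAGLE_PUSH 4        /* assumed flag value, illustration only */

static void snd_test(int nonagle)
{
        printf("tcp_snd_test() saw nonagle=%d\n", nonagle);
}

int main(void)
{
        int nonagle = 0;        /* caller's policy: Nagle enabled */
        int queue_len = 3;      /* pretend three skbs are queued */

        /* Buggy pattern: the head-of-queue check overwrites the
         * function's own 'nonagle' argument, so the transmit loop
         * called afterwards sees TCP_NAGLE_PUSH for every packet,
         * including the final one, which should have used the
         * caller's policy. */
        if (queue_len > 1)
                nonagle = TCP_NAGLE_PUSH;
        for (int i = 0; i < queue_len; i++)
                snd_test(nonagle);      /* prints 4 three times, never 0 */
        return 0;
}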
Please help me review this stuff, thanks.
The patches will show up as followups to this email.

* [PATCH 1/9]: TCP: The Road to Super TSO
From: David S. Miller @ 2005-06-07 4:16 UTC
To: netdev; +Cc: herbert, jheffner

[TCP]: Simplify SKB data portion allocation with NETIF_F_SG.

The ideal layout for an SKB when doing scatter-gather is to put
all the headers at skb->data, and all the user data in the page
array.

This makes SKB splitting and combining extremely simple,
especially before a packet goes onto the wire the first time.

So, when sk_stream_alloc_pskb() is given a zero size, make sure
there is no skb_tailroom().  This is achieved by applying
SKB_DATA_ALIGN() to the header length used here.

Next, make select_size() in the TCP output segmentation code use
a length of zero when NETIF_F_SG is true on the outgoing
interface.

Signed-off-by: David S. Miller <davem@davemloft.net>

28f78ef8dcc90a2a26499dab76678bd6813d7793 (from 3f5948fa2cbbda1261eec9a39ef3004b3caf73fb)

diff --git a/include/net/sock.h b/include/net/sock.h
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1130,13 +1130,16 @@ static inline void sk_stream_moderate_sn
 static inline struct sk_buff *sk_stream_alloc_pskb(struct sock *sk,
                                                    int size, int mem, int gfp)
 {
-       struct sk_buff *skb = alloc_skb(size + sk->sk_prot->max_header, gfp);
+       struct sk_buff *skb;
+       int hdr_len;

+       hdr_len = SKB_DATA_ALIGN(sk->sk_prot->max_header);
+       skb = alloc_skb(size + hdr_len, gfp);
        if (skb) {
                skb->truesize += mem;
                if (sk->sk_forward_alloc >= (int)skb->truesize ||
                    sk_stream_mem_schedule(sk, skb->truesize, 0)) {
-                       skb_reserve(skb, sk->sk_prot->max_header);
+                       skb_reserve(skb, hdr_len);
                        return skb;
                }
                __kfree_skb(skb);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -775,13 +775,9 @@ static inline int select_size(struct soc
 {
        int tmp = tp->mss_cache_std;

-       if (sk->sk_route_caps & NETIF_F_SG) {
-               int pgbreak = SKB_MAX_HEAD(MAX_TCP_HEADER);
+       if (sk->sk_route_caps & NETIF_F_SG)
+               tmp = 0;

-               if (tmp >= pgbreak &&
-                   tmp <= pgbreak + (MAX_SKB_FRAGS - 1) * PAGE_SIZE)
-                       tmp = pgbreak;
-       }
        return tmp;
 }

@@ -891,11 +887,6 @@ new_segment:
                                tcp_mark_push(tp, skb);
                                goto new_segment;
                        } else if (page) {
-                               /* If page is cached, align
-                                * offset to L1 cache boundary
-                                */
-                               off = (off + L1_CACHE_BYTES - 1) &
-                                     ~(L1_CACHE_BYTES - 1);
                                if (off == PAGE_SIZE) {
                                        put_page(page);
                                        TCP_PAGE(sk) = page = NULL;
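As a sanity check on the arithmetic above, here is a minimal
userspace sketch of the zero-tailroom allocation; the cache-line
size and max_header value are assumptions for illustration, not
the real per-protocol numbers:

#include <stdio.h>

/* Mirrors the 2.6-era macro: round the header reservation up to a
 * cache-line boundary (assume 64-byte lines for this sketch). */
#define SMP_CACHE_BYTES 64
#define SKB_DATA_ALIGN(x) \
        (((x) + (SMP_CACHE_BYTES - 1)) & ~(SMP_CACHE_BYTES - 1))

int main(void)
{
        int max_header = 160;   /* hypothetical sk->sk_prot->max_header */
        int size = 0;           /* select_size() result with NETIF_F_SG */
        int hdr_len = SKB_DATA_ALIGN(max_header);

        /* alloc_skb(size + hdr_len) followed by skb_reserve(hdr_len)
         * leaves tailroom == size; with size == 0 there is no tailroom,
         * so all user data must land in the page array. */
        printf("alloc %d bytes, reserve %d -> tailroom %d\n",
               size + hdr_len, hdr_len, size);
        return 0;
}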

* [PATCH 2/9]: TCP: The Road to Super TSO
From: David S. Miller @ 2005-06-07 4:17 UTC
To: netdev; +Cc: herbert, jheffner

[TCP]: Fix quick-ack decrementing with TSO.

On each packet output, we call tcp_dec_quickack_mode() if the ACK
flag is set.  It drops tp->ack.quick until it hits zero, at which
time we deflate the ATO value.

When doing TSO, we are emitting multiple packets with ACK set, so
we should decrement tp->ack.quick by that many segments.

Note that, unlike this case, tcp_enter_cwr() should not take
tcp_skb_pcount(skb) into consideration.  That function readjusts
tp->snd_cwnd once and moves into the TCP_CA_CWR state.

Signed-off-by: David S. Miller <davem@davemloft.net>

00cb08b2ec091f4b461210026392edeaccf31d9c (from 28f78ef8dcc90a2a26499dab76678bd6813d7793)

diff --git a/include/net/tcp.h b/include/net/tcp.h
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -817,11 +817,16 @@ static inline int tcp_ack_scheduled(stru
        return tp->ack.pending&TCP_ACK_SCHED;
 }

-static __inline__ void tcp_dec_quickack_mode(struct tcp_sock *tp)
+static __inline__ void tcp_dec_quickack_mode(struct tcp_sock *tp, unsigned int pkts)
 {
-       if (tp->ack.quick && --tp->ack.quick == 0) {
-               /* Leaving quickack mode we deflate ATO. */
-               tp->ack.ato = TCP_ATO_MIN;
+       if (tp->ack.quick) {
+               if (pkts >= tp->ack.quick) {
+                       tp->ack.quick = 0;
+
+                       /* Leaving quickack mode we deflate ATO. */
+                       tp->ack.ato = TCP_ATO_MIN;
+               } else
+                       tp->ack.quick -= pkts;
        }
 }

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -141,11 +141,11 @@ static inline void tcp_event_data_sent(s
                tp->ack.pingpong = 1;
 }

-static __inline__ void tcp_event_ack_sent(struct sock *sk)
+static __inline__ void tcp_event_ack_sent(struct sock *sk, unsigned int pkts)
 {
        struct tcp_sock *tp = tcp_sk(sk);

-       tcp_dec_quickack_mode(tp);
+       tcp_dec_quickack_mode(tp, pkts);
        tcp_clear_xmit_timer(sk, TCP_TIME_DACK);
 }

@@ -361,7 +361,7 @@ static int tcp_transmit_skb(struct sock
                tp->af_specific->send_check(sk, th, skb->len, skb);

                if (tcb->flags & TCPCB_FLAG_ACK)
-                       tcp_event_ack_sent(sk);
+                       tcp_event_ack_sent(sk, tcp_skb_pcount(skb));

                if (skb->len != tcp_header_size)
                        tcp_event_data_sent(tp, skb, sk);
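For illustration, a small userspace model of the fixed decrement
logic (the TCP_ATO_MIN value is an assumption here), showing both
the partial decrement and the overshoot case:

#include <stdio.h>

#define TCP_ATO_MIN 40          /* assumed value, illustration only */

struct ack_state { unsigned int quick; unsigned int ato; };

/* A TSO frame carrying 'pkts' segments must consume that many
 * quick-ack credits, not just one. */
static void dec_quickack_mode(struct ack_state *ack, unsigned int pkts)
{
        if (!ack->quick)
                return;
        if (pkts >= ack->quick) {
                ack->quick = 0;
                ack->ato = TCP_ATO_MIN; /* leaving quickack deflates ATO */
        } else {
                ack->quick -= pkts;
        }
}

int main(void)
{
        struct ack_state ack = { .quick = 8, .ato = 200 };

        dec_quickack_mode(&ack, 3);     /* one TSO frame, three segments */
        printf("quick=%u ato=%u\n", ack.quick, ack.ato);   /* 5, 200 */
        dec_quickack_mode(&ack, 6);     /* overshoot: quickack mode ends */
        printf("quick=%u ato=%u\n", ack.quick, ack.ato);   /* 0, 40 */
        return 0;
}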

* [PATCH 3/9]: TCP: The Road to Super TSO
From: David S. Miller @ 2005-06-07 4:17 UTC
To: netdev; +Cc: herbert, jheffner

[TCP]: Move send test logic out of net/tcp.h

This just moves the code into tcp_output.c; no code logic changes
are made by this patch.

Using this as a baseline, we can begin to untangle the mess of
comparisons for the Nagle test et al.  We will also be able to
reduce all of the redundant computation that occurs when
outputting data packets.

Signed-off-by: David S. Miller <davem@davemloft.net>

cba5d690f46699d37df7dc087247d1f7c7155692 (from 00cb08b2ec091f4b461210026392edeaccf31d9c)

diff --git a/include/net/tcp.h b/include/net/tcp.h
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -945,6 +945,9 @@ extern __u32 cookie_v4_init_sequence(str
 /* tcp_output.c */

 extern int tcp_write_xmit(struct sock *, int nonagle);
+extern void __tcp_push_pending_frames(struct sock *sk, struct tcp_sock *tp,
+                                     unsigned cur_mss, int nonagle);
+extern int tcp_may_send_now(struct sock *sk, struct tcp_sock *tp);
 extern int tcp_retransmit_skb(struct sock *, struct sk_buff *);
 extern void tcp_xmit_retransmit_queue(struct sock *);
 extern void tcp_simple_retransmit(struct sock *);
@@ -1389,12 +1392,6 @@ static __inline__ __u32 tcp_max_burst(co
        return 3;
 }

-static __inline__ int tcp_minshall_check(const struct tcp_sock *tp)
-{
-       return after(tp->snd_sml,tp->snd_una) &&
-               !after(tp->snd_sml, tp->snd_nxt);
-}
-
 static __inline__ void tcp_minshall_update(struct tcp_sock *tp, int mss,
                                           const struct sk_buff *skb)
 {
@@ -1402,122 +1399,18 @@ static __inline__ void tcp_minshall_upda
                tp->snd_sml = TCP_SKB_CB(skb)->end_seq;
 }

-/* Return 0, if packet can be sent now without violation Nagle's rules:
-   1. It is full sized.
-   2. Or it contains FIN.
-   3. Or TCP_NODELAY was set.
-   4. Or TCP_CORK is not set, and all sent packets are ACKed.
-      With Minshall's modification: all sent small packets are ACKed.
- */
-
-static __inline__ int
-tcp_nagle_check(const struct tcp_sock *tp, const struct sk_buff *skb,
-               unsigned mss_now, int nonagle)
-{
-       return (skb->len < mss_now &&
-               !(TCP_SKB_CB(skb)->flags & TCPCB_FLAG_FIN) &&
-               ((nonagle&TCP_NAGLE_CORK) ||
-                (!nonagle &&
-                 tp->packets_out &&
-                 tcp_minshall_check(tp))));
-}
-
-extern void tcp_set_skb_tso_segs(struct sock *, struct sk_buff *);
-
-/* This checks if the data bearing packet SKB (usually sk->sk_send_head)
- * should be put on the wire right now.
- */
-static __inline__ int tcp_snd_test(struct sock *sk,
-                                  struct sk_buff *skb,
-                                  unsigned cur_mss, int nonagle)
-{
-       struct tcp_sock *tp = tcp_sk(sk);
-       int pkts = tcp_skb_pcount(skb);
-
-       if (!pkts) {
-               tcp_set_skb_tso_segs(sk, skb);
-               pkts = tcp_skb_pcount(skb);
-       }
-
-       /* RFC 1122 - section 4.2.3.4
-        *
-        * We must queue if
-        *
-        * a) The right edge of this frame exceeds the window
-        * b) There are packets in flight and we have a small segment
-        *    [SWS avoidance and Nagle algorithm]
-        *    (part of SWS is done on packetization)
-        *    Minshall version sounds: there are no _small_
-        *    segments in flight. (tcp_nagle_check)
-        * c) We have too many packets 'in flight'
-        *
-        * Don't use the nagle rule for urgent data (or
-        * for the final FIN -DaveM).
-        *
-        * Also, Nagle rule does not apply to frames, which
-        * sit in the middle of queue (they have no chances
-        * to get new data) and if room at tail of skb is
-        * not enough to save something seriously (<32 for now).
-        */
-
-       /* Don't be strict about the congestion window for the
-        * final FIN frame.  -DaveM
-        */
-       return (((nonagle&TCP_NAGLE_PUSH) || tp->urg_mode
-                || !tcp_nagle_check(tp, skb, cur_mss, nonagle)) &&
-               (((tcp_packets_in_flight(tp) + (pkts-1)) < tp->snd_cwnd) ||
-                (TCP_SKB_CB(skb)->flags & TCPCB_FLAG_FIN)) &&
-               !after(TCP_SKB_CB(skb)->end_seq, tp->snd_una + tp->snd_wnd));
-}
-
 static __inline__ void tcp_check_probe_timer(struct sock *sk, struct tcp_sock *tp)
 {
        if (!tp->packets_out && !tp->pending)
                tcp_reset_xmit_timer(sk, TCP_TIME_PROBE0, tp->rto);
 }

-static __inline__ int tcp_skb_is_last(const struct sock *sk,
-                                     const struct sk_buff *skb)
-{
-       return skb->next == (struct sk_buff *)&sk->sk_write_queue;
-}
-
-/* Push out any pending frames which were held back due to
- * TCP_CORK or attempt at coalescing tiny packets.
- * The socket must be locked by the caller.
- */
-static __inline__ void __tcp_push_pending_frames(struct sock *sk,
-                                                struct tcp_sock *tp,
-                                                unsigned cur_mss,
-                                                int nonagle)
-{
-       struct sk_buff *skb = sk->sk_send_head;
-
-       if (skb) {
-               if (!tcp_skb_is_last(sk, skb))
-                       nonagle = TCP_NAGLE_PUSH;
-               if (!tcp_snd_test(sk, skb, cur_mss, nonagle) ||
-                   tcp_write_xmit(sk, nonagle))
-                       tcp_check_probe_timer(sk, tp);
-       }
-       tcp_cwnd_validate(sk, tp);
-}
-
 static __inline__ void tcp_push_pending_frames(struct sock *sk,
                                               struct tcp_sock *tp)
 {
        __tcp_push_pending_frames(sk, tp, tcp_current_mss(sk, 1), tp->nonagle);
 }

-static __inline__ int tcp_may_send_now(struct sock *sk, struct tcp_sock *tp)
-{
-       struct sk_buff *skb = sk->sk_send_head;
-
-       return (skb &&
-               tcp_snd_test(sk, skb, tcp_current_mss(sk, 1),
-                            tcp_skb_is_last(sk, skb) ? TCP_NAGLE_PUSH : tp->nonagle));
-}
-
 static __inline__ void tcp_init_wl(struct tcp_sock *tp, u32 ack, u32 seq)
 {
        tp->snd_wl1 = seq;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -419,6 +419,135 @@ static inline void tcp_tso_set_push(stru
                TCP_SKB_CB(skb)->flags |= TCPCB_FLAG_PSH;
 }

+static void tcp_set_skb_tso_segs(struct sock *sk, struct sk_buff *skb)
+{
+       struct tcp_sock *tp = tcp_sk(sk);
+
+       if (skb->len <= tp->mss_cache_std ||
+           !(sk->sk_route_caps & NETIF_F_TSO)) {
+               /* Avoid the costly divide in the normal
+                * non-TSO case.
+                */
+               skb_shinfo(skb)->tso_segs = 1;
+               skb_shinfo(skb)->tso_size = 0;
+       } else {
+               unsigned int factor;
+
+               factor = skb->len + (tp->mss_cache_std - 1);
+               factor /= tp->mss_cache_std;
+               skb_shinfo(skb)->tso_segs = factor;
+               skb_shinfo(skb)->tso_size = tp->mss_cache_std;
+       }
+}
+
+static inline int tcp_minshall_check(const struct tcp_sock *tp)
+{
+       return after(tp->snd_sml,tp->snd_una) &&
+               !after(tp->snd_sml, tp->snd_nxt);
+}
+
+/* Return 0, if packet can be sent now without violation Nagle's rules:
+ * 1. It is full sized.
+ * 2. Or it contains FIN.
+ * 3. Or TCP_NODELAY was set.
+ * 4. Or TCP_CORK is not set, and all sent packets are ACKed.
+ *    With Minshall's modification: all sent small packets are ACKed.
+ */
+
+static inline int tcp_nagle_check(const struct tcp_sock *tp,
+                                 const struct sk_buff *skb,
+                                 unsigned mss_now, int nonagle)
+{
+       return (skb->len < mss_now &&
+               !(TCP_SKB_CB(skb)->flags & TCPCB_FLAG_FIN) &&
+               ((nonagle&TCP_NAGLE_CORK) ||
+                (!nonagle &&
+                 tp->packets_out &&
+                 tcp_minshall_check(tp))));
+}
+
+/* This checks if the data bearing packet SKB (usually sk->sk_send_head)
+ * should be put on the wire right now.
+ */
+static int tcp_snd_test(struct sock *sk, struct sk_buff *skb,
+                       unsigned cur_mss, int nonagle)
+{
+       struct tcp_sock *tp = tcp_sk(sk);
+       int pkts = tcp_skb_pcount(skb);
+
+       if (!pkts) {
+               tcp_set_skb_tso_segs(sk, skb);
+               pkts = tcp_skb_pcount(skb);
+       }
+
+       /* RFC 1122 - section 4.2.3.4
+        *
+        * We must queue if
+        *
+        * a) The right edge of this frame exceeds the window
+        * b) There are packets in flight and we have a small segment
+        *    [SWS avoidance and Nagle algorithm]
+        *    (part of SWS is done on packetization)
+        *    Minshall version sounds: there are no _small_
+        *    segments in flight. (tcp_nagle_check)
+        * c) We have too many packets 'in flight'
+        *
+        * Don't use the nagle rule for urgent data (or
+        * for the final FIN -DaveM).
+        *
+        * Also, Nagle rule does not apply to frames, which
+        * sit in the middle of queue (they have no chances
+        * to get new data) and if room at tail of skb is
+        * not enough to save something seriously (<32 for now).
+        */
+
+       /* Don't be strict about the congestion window for the
+        * final FIN frame.  -DaveM
+        */
+       return (((nonagle&TCP_NAGLE_PUSH) || tp->urg_mode
+                || !tcp_nagle_check(tp, skb, cur_mss, nonagle)) &&
+               (((tcp_packets_in_flight(tp) + (pkts-1)) < tp->snd_cwnd) ||
+                (TCP_SKB_CB(skb)->flags & TCPCB_FLAG_FIN)) &&
+               !after(TCP_SKB_CB(skb)->end_seq, tp->snd_una + tp->snd_wnd));
+}
+
+static inline int tcp_skb_is_last(const struct sock *sk,
+                                 const struct sk_buff *skb)
+{
+       return skb->next == (struct sk_buff *)&sk->sk_write_queue;
+}
+
+/* Push out any pending frames which were held back due to
+ * TCP_CORK or attempt at coalescing tiny packets.
+ * The socket must be locked by the caller.
+ */
+void __tcp_push_pending_frames(struct sock *sk, struct tcp_sock *tp,
+                              unsigned cur_mss, int nonagle)
+{
+       struct sk_buff *skb = sk->sk_send_head;
+
+       if (skb) {
+               if (!tcp_skb_is_last(sk, skb))
+                       nonagle = TCP_NAGLE_PUSH;
+               if (!tcp_snd_test(sk, skb, cur_mss, nonagle) ||
+                   tcp_write_xmit(sk, nonagle))
+                       tcp_check_probe_timer(sk, tp);
+       }
+       tcp_cwnd_validate(sk, tp);
+}
+
+int tcp_may_send_now(struct sock *sk, struct tcp_sock *tp)
+{
+       struct sk_buff *skb = sk->sk_send_head;
+
+       return (skb &&
+               tcp_snd_test(sk, skb, tcp_current_mss(sk, 1),
+                            (tcp_skb_is_last(sk, skb) ?
+                             TCP_NAGLE_PUSH :
+                             tp->nonagle)));
+}
+
+
 /* Send _single_ skb sitting at the send head.  This function requires
  * true push pending frames to setup probe timer etc.
  */
@@ -440,27 +569,6 @@ void tcp_push_one(struct sock *sk, unsig
        }
 }

-void tcp_set_skb_tso_segs(struct sock *sk, struct sk_buff *skb)
-{
-       struct tcp_sock *tp = tcp_sk(sk);
-
-       if (skb->len <= tp->mss_cache_std ||
-           !(sk->sk_route_caps & NETIF_F_TSO)) {
-               /* Avoid the costly divide in the normal
-                * non-TSO case.
-                */
-               skb_shinfo(skb)->tso_segs = 1;
-               skb_shinfo(skb)->tso_size = 0;
-       } else {
-               unsigned int factor;
-
-               factor = skb->len + (tp->mss_cache_std - 1);
-               factor /= tp->mss_cache_std;
-               skb_shinfo(skb)->tso_segs = factor;
-               skb_shinfo(skb)->tso_size = tp->mss_cache_std;
-       }
-}
-
 /* Function to create two new TCP segments.  Shrinks the given segment
  * to the specified size and appends a new segment with the rest of the
  * packet to the list.  This won't be called frequently, I hope.
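As an aside, the tso_segs "factor" computation being moved here is
just a ceiling divide; a trivial standalone sketch (the lengths and
MSS are made-up example values):

#include <stdio.h>

/* How many MSS-sized segments a TSO skb represents. */
static unsigned int tso_segs(unsigned int len, unsigned int mss)
{
        return (len + mss - 1) / mss;
}

int main(void)
{
        printf("%u\n", tso_segs(4000, 1448));   /* 3 segments */
        printf("%u\n", tso_segs(1448, 1448));   /* exactly one */
        return 0;
}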

* [PATCH 4/9]: TCP: The Road to Super TSO
From: David S. Miller @ 2005-06-07 4:18 UTC
To: netdev; +Cc: herbert, jheffner

[TCP]: Move __tcp_data_snd_check into tcp_output.c

It reimplements portions of tcp_snd_check(), so if we move it to
tcp_output.c we can consolidate its logic much more easily in a
later change.

Signed-off-by: David S. Miller <davem@davemloft.net>

bdbf09522de5be3ada129dceaa3ad9da9be078bc (from cba5d690f46699d37df7dc087247d1f7c7155692)

diff --git a/include/net/tcp.h b/include/net/tcp.h
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -945,6 +945,7 @@ extern __u32 cookie_v4_init_sequence(str
 /* tcp_output.c */

 extern int tcp_write_xmit(struct sock *, int nonagle);
+extern void __tcp_data_snd_check(struct sock *sk, struct sk_buff *skb);
 extern void __tcp_push_pending_frames(struct sock *sk, struct tcp_sock *tp,
                                      unsigned cur_mss, int nonagle);
 extern int tcp_may_send_now(struct sock *sk, struct tcp_sock *tp);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3975,16 +3975,6 @@ static inline void tcp_check_space(struc
        }
 }

-static void __tcp_data_snd_check(struct sock *sk, struct sk_buff *skb)
-{
-       struct tcp_sock *tp = tcp_sk(sk);
-
-       if (after(TCP_SKB_CB(skb)->end_seq, tp->snd_una + tp->snd_wnd) ||
-           tcp_packets_in_flight(tp) >= tp->snd_cwnd ||
-           tcp_write_xmit(sk, tp->nonagle))
-               tcp_check_probe_timer(sk, tp);
-}
-
 static __inline__ void tcp_data_snd_check(struct sock *sk)
 {
        struct sk_buff *skb = sk->sk_send_head;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -536,6 +536,16 @@ void __tcp_push_pending_frames(struct so
        tcp_cwnd_validate(sk, tp);
 }

+void __tcp_data_snd_check(struct sock *sk, struct sk_buff *skb)
+{
+       struct tcp_sock *tp = tcp_sk(sk);
+
+       if (after(TCP_SKB_CB(skb)->end_seq, tp->snd_una + tp->snd_wnd) ||
+           tcp_packets_in_flight(tp) >= tp->snd_cwnd ||
+           tcp_write_xmit(sk, tp->nonagle))
+               tcp_check_probe_timer(sk, tp);
+}
+
 int tcp_may_send_now(struct sock *sk, struct tcp_sock *tp)
 {
        struct sk_buff *skb = sk->sk_send_head;

* [PATCH 5/9]: TCP: The Road to Super TSO
From: David S. Miller @ 2005-06-07 4:19 UTC
To: netdev; +Cc: herbert, jheffner

[TCP]: Add missing skb_header_release() call to tcp_fragment().

When we add any new packet to the TCP socket write queue, we must
call skb_header_release() on it in order for the TSO sharing
checks in the drivers to work.

Signed-off-by: David S. Miller <davem@davemloft.net>

79eb6b25499ed5470cb7b20428c435288fcb3502 (from bdbf09522de5be3ada129dceaa3ad9da9be078bc)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -660,6 +660,7 @@ static int tcp_fragment(struct sock *sk,
        }

        /* Link BUFF into the send queue. */
+       skb_header_release(buff);
        __skb_append(skb, buff);

        return 0;
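For background, a userspace model of why the release matters; the
split dataref encoding below is assumed from the 2.6-era
skb_header_release()/skb_header_cloned() helpers, so treat it as a
sketch rather than the kernel code itself:

#include <stdio.h>

/* The high 16 bits of dataref count payload-only references, so the
 * number of header references is (low half) - (high half). */
#define SKB_DATAREF_SHIFT 16
#define SKB_DATAREF_MASK  ((1 << SKB_DATAREF_SHIFT) - 1)

static int header_refs(unsigned int dataref)
{
        return (dataref & SKB_DATAREF_MASK) - (dataref >> SKB_DATAREF_SHIFT);
}

int main(void)
{
        unsigned int dataref = 1;       /* fresh skb from tcp_fragment() */

        /* Without skb_header_release(), the transmit-time skb_clone()
         * leaves two header references -> skb_header_cloned() is true
         * and any TSO header modification forces a COW. */
        printf("no release: header refs after clone = %d\n",
               header_refs(dataref + 1));               /* 2 */

        dataref += 1 << SKB_DATAREF_SHIFT;      /* skb_header_release() */
        dataref += 1;                           /* skb_clone() for xmit */
        printf("with release: header refs after clone = %d\n",
               header_refs(dataref));   /* 1 -> header stays private */
        return 0;
}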

* [PATCH 6/9]: TCP: The Road to Super TSO
From: David S. Miller @ 2005-06-07 4:20 UTC
To: netdev; +Cc: herbert, jheffner

[TCP]: Kill extra cwnd validate in __tcp_push_pending_frames().

The tcp_cwnd_validate() function should only be invoked if we
actually send some frames, yet __tcp_push_pending_frames() will
always invoke it.  tcp_write_xmit() does the call for us, so the
call here can simply be removed.

Also, tcp_write_xmit() can be marked static.

Signed-off-by: David S. Miller <davem@davemloft.net>

ae083bd3447865cbaf0996a69ba03807fd9fce01 (from 79eb6b25499ed5470cb7b20428c435288fcb3502)

diff --git a/include/net/tcp.h b/include/net/tcp.h
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -944,7 +944,6 @@ extern __u32 cookie_v4_init_sequence(str
 /* tcp_output.c */

-extern int tcp_write_xmit(struct sock *, int nonagle);
 extern void __tcp_data_snd_check(struct sock *sk, struct sk_buff *skb);
 extern void __tcp_push_pending_frames(struct sock *sk, struct tcp_sock *tp,
                                      unsigned cur_mss, int nonagle);
@@ -964,6 +963,9 @@ extern void tcp_push_one(struct sock *,
 extern void tcp_send_ack(struct sock *sk);
 extern void tcp_send_delayed_ack(struct sock *sk);

+/* tcp_input.c */
+extern void tcp_cwnd_application_limited(struct sock *sk);
+
 /* tcp_timer.c */
 extern void tcp_init_xmit_timers(struct sock *);
 extern void tcp_clear_xmit_timers(struct sock *);
@@ -1339,28 +1341,6 @@ static inline void tcp_sync_left_out(str
        tp->left_out = tp->sacked_out + tp->lost_out;
 }

-extern void tcp_cwnd_application_limited(struct sock *sk);
-
-/* Congestion window validation. (RFC2861) */
-
-static inline void tcp_cwnd_validate(struct sock *sk, struct tcp_sock *tp)
-{
-       __u32 packets_out = tp->packets_out;
-
-       if (packets_out >= tp->snd_cwnd) {
-               /* Network is feed fully. */
-               tp->snd_cwnd_used = 0;
-               tp->snd_cwnd_stamp = tcp_time_stamp;
-       } else {
-               /* Network starves. */
-               if (tp->packets_out > tp->snd_cwnd_used)
-                       tp->snd_cwnd_used = tp->packets_out;
-
-               if ((s32)(tcp_time_stamp - tp->snd_cwnd_stamp) >= tp->rto)
-                       tcp_cwnd_application_limited(sk);
-       }
-}
-
 /* Set slow start threshould and cwnd not falling to slow start */
 static inline void __tcp_enter_cwr(struct tcp_sock *tp)
 {
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -517,35 +517,6 @@ static inline int tcp_skb_is_last(const
        return skb->next == (struct sk_buff *)&sk->sk_write_queue;
 }

-/* Push out any pending frames which were held back due to
- * TCP_CORK or attempt at coalescing tiny packets.
- * The socket must be locked by the caller.
- */
-void __tcp_push_pending_frames(struct sock *sk, struct tcp_sock *tp,
-                              unsigned cur_mss, int nonagle)
-{
-       struct sk_buff *skb = sk->sk_send_head;
-
-       if (skb) {
-               if (!tcp_skb_is_last(sk, skb))
-                       nonagle = TCP_NAGLE_PUSH;
-               if (!tcp_snd_test(sk, skb, cur_mss, nonagle) ||
-                   tcp_write_xmit(sk, nonagle))
-                       tcp_check_probe_timer(sk, tp);
-       }
-       tcp_cwnd_validate(sk, tp);
-}
-
-void __tcp_data_snd_check(struct sock *sk, struct sk_buff *skb)
-{
-       struct tcp_sock *tp = tcp_sk(sk);
-
-       if (after(TCP_SKB_CB(skb)->end_seq, tp->snd_una + tp->snd_wnd) ||
-           tcp_packets_in_flight(tp) >= tp->snd_cwnd ||
-           tcp_write_xmit(sk, tp->nonagle))
-               tcp_check_probe_timer(sk, tp);
-}
-
 int tcp_may_send_now(struct sock *sk, struct tcp_sock *tp)
 {
        struct sk_buff *skb = sk->sk_send_head;
@@ -846,6 +817,26 @@ unsigned int tcp_current_mss(struct sock
        return mss_now;
 }

+/* Congestion window validation. (RFC2861) */
+
+static inline void tcp_cwnd_validate(struct sock *sk, struct tcp_sock *tp)
+{
+       __u32 packets_out = tp->packets_out;
+
+       if (packets_out >= tp->snd_cwnd) {
+               /* Network is feed fully. */
+               tp->snd_cwnd_used = 0;
+               tp->snd_cwnd_stamp = tcp_time_stamp;
+       } else {
+               /* Network starves. */
+               if (tp->packets_out > tp->snd_cwnd_used)
+                       tp->snd_cwnd_used = tp->packets_out;
+
+               if ((s32)(tcp_time_stamp - tp->snd_cwnd_stamp) >= tp->rto)
+                       tcp_cwnd_application_limited(sk);
+       }
+}
+
 /* This routine writes packets to the network.  It advances the
  * send_head.  This happens as incoming acks open up the remote
  * window for us.
@@ -853,7 +844,7 @@ unsigned int tcp_current_mss(struct sock
  * Returns 1, if no segments are in flight and we have queued segments, but
  * cannot send anything now because of SWS or another problem.
  */
-int tcp_write_xmit(struct sock *sk, int nonagle)
+static int tcp_write_xmit(struct sock *sk, int nonagle)
 {
        struct tcp_sock *tp = tcp_sk(sk);
        unsigned int mss_now;
@@ -906,6 +897,34 @@ int tcp_write_xmit(struct sock *sk, int
        return 0;
 }

+/* Push out any pending frames which were held back due to
+ * TCP_CORK or attempt at coalescing tiny packets.
+ * The socket must be locked by the caller.
+ */
+void __tcp_push_pending_frames(struct sock *sk, struct tcp_sock *tp,
+                              unsigned cur_mss, int nonagle)
+{
+       struct sk_buff *skb = sk->sk_send_head;
+
+       if (skb) {
+               if (!tcp_skb_is_last(sk, skb))
+                       nonagle = TCP_NAGLE_PUSH;
+               if (!tcp_snd_test(sk, skb, cur_mss, nonagle) ||
+                   tcp_write_xmit(sk, nonagle))
+                       tcp_check_probe_timer(sk, tp);
+       }
+}
+
+void __tcp_data_snd_check(struct sock *sk, struct sk_buff *skb)
+{
+       struct tcp_sock *tp = tcp_sk(sk);
+
+       if (after(TCP_SKB_CB(skb)->end_seq, tp->snd_una + tp->snd_wnd) ||
+           tcp_packets_in_flight(tp) >= tp->snd_cwnd ||
+           tcp_write_xmit(sk, tp->nonagle))
+               tcp_check_probe_timer(sk, tp);
+}
+
 /* This function returns the amount that we can raise the
  * usable window based on the following constraints
  *

* [PATCH 7/9]: TCP: The Road to Super TSO
From: David S. Miller @ 2005-06-07 4:21 UTC
To: netdev; +Cc: herbert, jheffner

[TCP]: tcp_write_xmit() tabbing cleanup

Put the main basic block of work at the top level of tabbing, and
mark the TCP_CLOSE test with unlikely().

Signed-off-by: David S. Miller <davem@davemloft.net>

b8d892e4dc753d796e80da6e17f2a88aede0695e (from ae083bd3447865cbaf0996a69ba03807fd9fce01)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -847,54 +847,54 @@ static inline void tcp_cwnd_validate(str
 static int tcp_write_xmit(struct sock *sk, int nonagle)
 {
        struct tcp_sock *tp = tcp_sk(sk);
+       struct sk_buff *skb;
        unsigned int mss_now;
+       int sent_pkts;

        /* If we are closed, the bytes will have to remain here.
         * In time closedown will finish, we empty the write queue and all
         * will be happy.
         */
-       if (sk->sk_state != TCP_CLOSE) {
-               struct sk_buff *skb;
-               int sent_pkts = 0;
+       if (unlikely(sk->sk_state == TCP_CLOSE))
+               return 0;

-               /* Account for SACKS, we may need to fragment due to this.
-                * It is just like the real MSS changing on us midstream.
-                * We also handle things correctly when the user adds some
-                * IP options mid-stream.  Silly to do, but cover it.
-                */
-               mss_now = tcp_current_mss(sk, 1);
-
-               while ((skb = sk->sk_send_head) &&
-                      tcp_snd_test(sk, skb, mss_now,
-                                   tcp_skb_is_last(sk, skb) ? nonagle :
-                                                              TCP_NAGLE_PUSH)) {
-                       if (skb->len > mss_now) {
-                               if (tcp_fragment(sk, skb, mss_now))
-                                       break;
-                       }
-
-                       TCP_SKB_CB(skb)->when = tcp_time_stamp;
-                       tcp_tso_set_push(skb);
-                       if (tcp_transmit_skb(sk, skb_clone(skb, GFP_ATOMIC)))
+       /* Account for SACKS, we may need to fragment due to this.
+        * It is just like the real MSS changing on us midstream.
+        * We also handle things correctly when the user adds some
+        * IP options mid-stream.  Silly to do, but cover it.
+        */
+       mss_now = tcp_current_mss(sk, 1);
+       sent_pkts = 0;
+       while ((skb = sk->sk_send_head) &&
+              tcp_snd_test(sk, skb, mss_now,
+                           tcp_skb_is_last(sk, skb) ? nonagle :
+                                                      TCP_NAGLE_PUSH)) {
+               if (skb->len > mss_now) {
+                       if (tcp_fragment(sk, skb, mss_now))
                                break;
+               }

-                       /* Advance the send_head.  This one is sent out.
-                        * This call will increment packets_out.
-                        */
-                       update_send_head(sk, tp, skb);
+               TCP_SKB_CB(skb)->when = tcp_time_stamp;
+               tcp_tso_set_push(skb);
+               if (tcp_transmit_skb(sk, skb_clone(skb, GFP_ATOMIC)))
+                       break;

-                       tcp_minshall_update(tp, mss_now, skb);
-                       sent_pkts = 1;
-               }
+               /* Advance the send_head.  This one is sent out.
+                * This call will increment packets_out.
+                */
+               update_send_head(sk, tp, skb);

-               if (sent_pkts) {
-                       tcp_cwnd_validate(sk, tp);
-                       return 0;
-               }
+               tcp_minshall_update(tp, mss_now, skb);
+               sent_pkts = 1;
+       }

-               return !tp->packets_out && sk->sk_send_head;
+       if (sent_pkts) {
+               tcp_cwnd_validate(sk, tp);
+               return 0;
        }
-       return 0;
+
+       return !tp->packets_out && sk->sk_send_head;
 }

* [PATCH 8/9]: TCP: The Road to Super TSO
From: David S. Miller @ 2005-06-07 4:22 UTC
To: netdev; +Cc: herbert, jheffner

[TCP]: Fix redundant calculations of tcp_current_mss()

tcp_write_xmit() uses tcp_current_mss(), but some of its callers,
namely __tcp_push_pending_frames(), already have this value
available.

While we're here, fix the "cur_mss" argument to be "unsigned int"
instead of plain "unsigned".

Signed-off-by: David S. Miller <davem@davemloft.net>

f22c7890049ef8c51b0cdcc5d7e0cd06333de6b0 (from b8d892e4dc753d796e80da6e17f2a88aede0695e)

diff --git a/include/net/tcp.h b/include/net/tcp.h
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -946,7 +946,7 @@ extern __u32 cookie_v4_init_sequence(str
 extern void __tcp_data_snd_check(struct sock *sk, struct sk_buff *skb);
 extern void __tcp_push_pending_frames(struct sock *sk, struct tcp_sock *tp,
-                                     unsigned cur_mss, int nonagle);
+                                     unsigned int cur_mss, int nonagle);
 extern int tcp_may_send_now(struct sock *sk, struct tcp_sock *tp);
 extern int tcp_retransmit_skb(struct sock *, struct sk_buff *);
 extern void tcp_xmit_retransmit_queue(struct sock *);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -844,11 +844,10 @@ static inline void tcp_cwnd_validate(str
  * Returns 1, if no segments are in flight and we have queued segments, but
  * cannot send anything now because of SWS or another problem.
  */
-static int tcp_write_xmit(struct sock *sk, int nonagle)
+static int tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle)
 {
        struct tcp_sock *tp = tcp_sk(sk);
        struct sk_buff *skb;
-       unsigned int mss_now;
        int sent_pkts;

        /* If we are closed, the bytes will have to remain here.
@@ -858,13 +857,6 @@ static int tcp_write_xmit(struct sock *s
        if (unlikely(sk->sk_state == TCP_CLOSE))
                return 0;

-
-       /* Account for SACKS, we may need to fragment due to this.
-        * It is just like the real MSS changing on us midstream.
-        * We also handle things correctly when the user adds some
-        * IP options mid-stream.  Silly to do, but cover it.
-        */
-       mss_now = tcp_current_mss(sk, 1);
        sent_pkts = 0;
        while ((skb = sk->sk_send_head) &&
               tcp_snd_test(sk, skb, mss_now,
@@ -902,7 +894,7 @@ static int tcp_write_xmit(struct sock *s
  * The socket must be locked by the caller.
  */
 void __tcp_push_pending_frames(struct sock *sk, struct tcp_sock *tp,
-                              unsigned cur_mss, int nonagle)
+                              unsigned int cur_mss, int nonagle)
 {
        struct sk_buff *skb = sk->sk_send_head;

@@ -910,7 +902,7 @@ void __tcp_push_pending_frames(struct so
                if (!tcp_skb_is_last(sk, skb))
                        nonagle = TCP_NAGLE_PUSH;
                if (!tcp_snd_test(sk, skb, cur_mss, nonagle) ||
-                   tcp_write_xmit(sk, nonagle))
+                   tcp_write_xmit(sk, cur_mss, nonagle))
                        tcp_check_probe_timer(sk, tp);
        }
 }
@@ -921,7 +913,7 @@ void __tcp_data_snd_check(struct sock *s
        if (after(TCP_SKB_CB(skb)->end_seq, tp->snd_una + tp->snd_wnd) ||
            tcp_packets_in_flight(tp) >= tp->snd_cwnd ||
-           tcp_write_xmit(sk, tp->nonagle))
+           tcp_write_xmit(sk, tcp_current_mss(sk, 1), tp->nonagle))
                tcp_check_probe_timer(sk, tp);
 }

* [PATCH 9/9]: TCP: The Road to Super TSO
From: David S. Miller @ 2005-06-07 4:23 UTC
To: netdev; +Cc: herbert, jheffner

[TCP]: Fix __tcp_push_pending_frames() 'nonagle' handling.

'nonagle' should be passed to the tcp_snd_test() function as
'TCP_NAGLE_PUSH' if we are checking an SKB not at the tail of the
write_queue.  This is because Nagle does not apply to such frames,
since we cannot possibly tack more data onto them.

However, __tcp_push_pending_frames() currently makes all of the
packets in the write_queue use this modified 'nonagle' value.

Fix the bug and simplify this function by just calling
tcp_write_xmit() directly if sk_send_head is non-NULL.

As a result, we can now make tcp_data_snd_check() just call
tcp_push_pending_frames() instead of the specialized
__tcp_data_snd_check().

Signed-off-by: David S. Miller <davem@davemloft.net>

45d0377c7d18e1a036b0a1f96788a998dccf73cf (from f22c7890049ef8c51b0cdcc5d7e0cd06333de6b0)

diff --git a/include/net/tcp.h b/include/net/tcp.h
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -944,7 +944,6 @@ extern __u32 cookie_v4_init_sequence(str
 /* tcp_output.c */

-extern void __tcp_data_snd_check(struct sock *sk, struct sk_buff *skb);
 extern void __tcp_push_pending_frames(struct sock *sk, struct tcp_sock *tp,
                                      unsigned int cur_mss, int nonagle);
 extern int tcp_may_send_now(struct sock *sk, struct tcp_sock *tp);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3975,12 +3975,9 @@ static inline void tcp_check_space(struc
        }
 }

-static __inline__ void tcp_data_snd_check(struct sock *sk)
+static __inline__ void tcp_data_snd_check(struct sock *sk, struct tcp_sock *tp)
 {
-       struct sk_buff *skb = sk->sk_send_head;
-
-       if (skb != NULL)
-               __tcp_data_snd_check(sk, skb);
+       tcp_push_pending_frames(sk, tp);
        tcp_check_space(sk);
 }

@@ -4274,7 +4271,7 @@ int tcp_rcv_established(struct sock *sk,
                         */
                        tcp_ack(sk, skb, 0);
                        __kfree_skb(skb);
-                       tcp_data_snd_check(sk);
+                       tcp_data_snd_check(sk, tp);
                        return 0;
                } else { /* Header too small */
                        TCP_INC_STATS_BH(TCP_MIB_INERRS);
@@ -4340,7 +4337,7 @@ int tcp_rcv_established(struct sock *sk,
                                if (TCP_SKB_CB(skb)->ack_seq != tp->snd_una) {
                                        /* Well, only one small jumplet in fast path... */
                                        tcp_ack(sk, skb, FLAG_DATA);
-                                       tcp_data_snd_check(sk);
+                                       tcp_data_snd_check(sk, tp);
                                        if (!tcp_ack_scheduled(tp))
                                                goto no_ack;
                                }
@@ -4418,7 +4415,7 @@ step5:
        /* step 7: process the segment text */
        tcp_data_queue(sk, skb);

-       tcp_data_snd_check(sk);
+       tcp_data_snd_check(sk, tp);
        tcp_ack_snd_check(sk);
        return 0;

@@ -4732,7 +4729,7 @@ int tcp_rcv_state_process(struct sock *s
                                /* Do step6 onward by hand. */
                                tcp_urg(sk, skb, th);
                                __kfree_skb(skb);
-                               tcp_data_snd_check(sk);
+                               tcp_data_snd_check(sk, tp);
                                return 0;
                        }

@@ -4921,7 +4918,7 @@ int tcp_rcv_state_process(struct sock *s
        /* tcp_data could move socket to TIME-WAIT */
        if (sk->sk_state != TCP_CLOSE) {
-               tcp_data_snd_check(sk);
+               tcp_data_snd_check(sk, tp);
                tcp_ack_snd_check(sk);
        }

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -899,24 +899,11 @@ void __tcp_push_pending_frames(struct so
        struct sk_buff *skb = sk->sk_send_head;

        if (skb) {
-               if (!tcp_skb_is_last(sk, skb))
-                       nonagle = TCP_NAGLE_PUSH;
-               if (!tcp_snd_test(sk, skb, cur_mss, nonagle) ||
-                   tcp_write_xmit(sk, cur_mss, nonagle))
+               if (tcp_write_xmit(sk, cur_mss, nonagle))
                        tcp_check_probe_timer(sk, tp);
        }
 }

-void __tcp_data_snd_check(struct sock *sk, struct sk_buff *skb)
-{
-       struct tcp_sock *tp = tcp_sk(sk);
-
-       if (after(TCP_SKB_CB(skb)->end_seq, tp->snd_una + tp->snd_wnd) ||
-           tcp_packets_in_flight(tp) >= tp->snd_cwnd ||
-           tcp_write_xmit(sk, tcp_current_mss(sk, 1), tp->nonagle))
-               tcp_check_probe_timer(sk, tp);
-}
-
 /* This function returns the amount that we can raise the
  * usable window based on the following constraints
  *

* Re: [PATCH 0/9]: TCP: The Road to Super TSO
From: Stephen Hemminger @ 2005-06-07 4:56 UTC
To: David S. Miller; +Cc: netdev, herbert, jheffner

I'll merge these with the TCP infrastructure stuff and send it off
to Andrew.  Actually, it is more of "fix the TCP infrastructure to
match TSO + rc6", but you get the idea.

* Re: [PATCH 0/9]: TCP: The Road to Super TSO
From: David S. Miller @ 2005-06-07 5:51 UTC
To: shemminger; +Cc: netdev, herbert, jheffner

From: Stephen Hemminger <shemminger@osdl.org>
Date: Mon, 06 Jun 2005 21:56:16 -0700

> I'll merge these with the TCP infrastructure stuff and
> send it off to Andrew.  Actually, it is more of "fix the TCP
> infrastructure to match TSO + rc6", but you get the idea.

Probably not a good idea; it's 75% of the implementation of Super
TSO and totally conflicts with the Super TSO patch.

Probably best to keep the existing Super TSO stuff in there until
I'm done with this work. :)

* Re: [PATCH 0/9]: TCP: The Road to Super TSO
From: John Heffner @ 2005-06-08 21:40 UTC
To: David S. Miller; +Cc: netdev, herbert

On Tuesday 07 June 2005 12:08 am, David S. Miller wrote:
> Some folks, notably the S2IO guys, see performance degradation
> with the Super TSO v2 patch (they see it with the first version
> as well).  It's a real pain to spot what causes such things
> in such a huge patch... so I started splitting things up in
> a very fine-grained manner so we can catch regressions more
> precisely.

I'm curious about the details of this.  Is there decreased
performance relative to current TSO?  Relative to no TSO?  Sending
to just one receiver or many, and is it receiver limited?

  -John

* Re: [PATCH 0/9]: TCP: The Road to Super TSO
From: David S. Miller @ 2005-06-08 21:49 UTC
To: jheffner; +Cc: netdev, herbert

From: John Heffner <jheffner@psc.edu>
Subject: Re: [PATCH 0/9]: TCP: The Road to Super TSO
Date: Wed, 8 Jun 2005 17:40:10 -0400

> On Tuesday 07 June 2005 12:08 am, David S. Miller wrote:
> > Some folks, notably the S2IO guys, see performance degradation
> > with the Super TSO v2 patch (they see it with the first version
> > as well).  It's a real pain to spot what causes such things
> > in such a huge patch... so I started splitting things up in
> > a very fine-grained manner so we can catch regressions more
> > precisely.
>
> I'm curious about the details of this.  Is there decreased
> performance relative to current TSO?  Relative to no TSO?  Sending
> to just one receiver or many, and is it receiver limited?

The receiver is limited in their tests.  No current generation
systems can fill a 10gbit pipe fully, especially at 1500 byte MTU.

Performance went down, with both TSO enabled and disabled, compared
to not having the patches applied.

That's why I'm going through this entire exercise of doing things
one piece at a time.

* Re: [PATCH 0/9]: TCP: The Road to Super TSO
From: Herbert Xu @ 2005-06-08 22:10 UTC
To: David S. Miller; +Cc: jheffner, netdev

On Wed, Jun 08, 2005 at 02:49:06PM -0700, David S. Miller wrote:
>
> Performance went down, with both TSO enabled and disabled, compared to
> not having the patches applied.

What was the receiver running?  Was the performance degradation
more pronounced with TSO enabled?
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* RE: [PATCH 0/9]: TCP: The Road to Super TSO
From: Leonid Grossman @ 2005-06-09 4:55 UTC
To: 'Herbert Xu', 'David S. Miller'; +Cc: jheffner, netdev

Some of the original data that we got a couple of weeks ago is
attached.  On the questions from Herbert and others:

- The performance drop from the "super-TSO" patch with TSO OFF is
  marginal; with TSO ON it is quite noticeable.
- The numbers are similar in back-to-back and switch-based (sender
  vs. two receivers) tests.
- The numbers are relative; we tested in PCI-X 1.0 slots, where
  ~7.5 Gbps is a practical bus limit for TCP traffic.  In PCI-X 2.0
  slots the numbers are ~10 Gbps with either jumbo frames or with
  1500 MTU + TSO (against two 1500 MTU receivers), at a fraction of
  a single Opteron's CPU.
- David is correct: with 1500 MTU the single receiver's CPU becomes
  the bottleneck; the best throughput with 1500 MTU I've seen was
  ~5 Gbps.  So, in a back-to-back setup with 1500 MTU the advantages
  of TSO are mostly wasted, since there is no TSO counterpart on the
  receive side.  Receive-side stateless offloads fix this, but we
  did not get around to deploying these ASIC capabilities in Linux
  yet.

Anyway, here it goes:
----------------------------------------------------------

Configuration: dual Opteron system .243 as Rx, dual Opteron system
.117 as Rx, four-way Opteron system .247 as Tx, connected via a
Cisco switch.  The .243 and .117 kernel sources are patched with
tcp_ack26.diff; the .247 kernel source is patched with
tcp_super_tso.diff.

Run 8 nttcp connections from the Tx system to each Rx system.
Use packet size 65535 for MTU 1500, and packet size 300000 for
MTU 9000.

Tx throughput on the four-way Opteron system .247:

2.6.12-rc4    Tx-1500    CPU usage    Tx-9000    CPU usage
----------------------------------------------------------
TSO off       2.5 Gb/s   55% (1)      5.3        40% (3)
TSO on        4.0        47% (2)      6.1        35% (4)

2.6.12-rc4 with tcp_super_tso.diff patch:

              Tx-1500    CPU usage    Tx-9000    CPU usage
----------------------------------------------------------
TSO off       2.4 Gb/s   60% (5)      5.0        41% (7)
TSO on        3.5        45% (6)      5.7        35% (8)

Note (1): 1500 tso off
top - 08:45:41 up 13 min, 2 users, load average: 2.03, 1.01, 0.54
Tasks: 90 total, 3 running, 87 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.0% us,  0.0% sy, 0.0% ni,  0.0% id, 0.0% wa, 50.7% hi, 49.3% s
Cpu1 : 0.3% us, 29.2% sy, 0.0% ni, 53.2% id, 0.0% wa,  0.0% hi, 17.3% s
Cpu2 : 0.3% us, 27.9% sy, 0.0% ni, 53.2% id, 0.0% wa,  0.0% hi, 18.6% s
Cpu3 : 0.3% us, 23.6% sy, 0.0% ni, 59.5% id, 0.0% wa,  0.0% hi, 16.6% s
Mem: 2055724k total, 203172k used, 1852552k free, 24112k buffers
Swap: 2040244k total, 0k used, 2040244k free, 79384k cached

Note (2): 1500 tso on
top - 08:48:19 up 16 min, 2 users, load average: 0.74, 0.71, 0.49
Tasks: 90 total, 4 running, 86 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.3% us, 1.1% sy, 0.0% ni, 71.9% id, 0.6% wa, 12.2% hi, 13.8% s
Cpu1 : 0.5% us, 7.8% sy, 0.0% ni, 88.2% id, 0.5% wa,  0.0% hi,  3.0% s
Cpu2 : 0.4% us, 8.1% sy, 0.0% ni, 88.2% id, 0.5% wa,  0.0% hi,  2.9% s
Cpu3 : 0.3% us, 6.6% sy, 0.0% ni, 90.3% id, 0.1% wa,  0.0% hi,  2.7% s
Mem: 2055724k total, 203652k used, 1852072k free, 25308k buffers
Swap: 2040244k total, 0k used, 2040244k free, 79412k cached

Note (3): 9000 off
top - 08:58:19 up 6 min, 2 users, load average: 0.88, 0.47, 0.21
Tasks: 90 total, 2 running, 88 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.8% us, 8.8% sy, 0.0% ni, 79.1% id, 1.4% wa, 3.5% hi, 6.4% si
Cpu1 : 0.7% us, 7.3% sy, 0.0% ni, 90.8% id, 0.4% wa, 0.0% hi, 0.8% si
Cpu2 : 0.7% us, 6.9% sy, 0.0% ni, 90.8% id, 1.0% wa, 0.1% hi, 0.5% si
Cpu3 : 0.5% us, 5.1% sy, 0.0% ni, 93.9% id, 0.3% wa, 0.0% hi, 0.2% si
Mem: 2055724k total, 378620k used, 1677104k free, 18400k buffers
Swap: 2040244k total, 0k used, 2040244k free, 72788k cached

Note (4): 9000 on
top - 08:55:55 up 4 min, 2 users, load average: 0.53, 0.26, 0.12
Tasks: 90 total, 2 running, 88 sleeping, 0 stopped, 0 zombie
Cpu0 : 1.1% us, 4.4% sy, 0.0% ni, 89.2% id, 2.2% wa, 1.2% hi, 1.9% si
Cpu1 : 1.0% us, 3.5% sy, 0.0% ni, 94.3% id, 0.6% wa, 0.0% hi, 0.5% si
Cpu2 : 1.1% us, 6.4% sy, 0.0% ni, 90.7% id, 1.6% wa, 0.1% hi, 0.2% si
Cpu3 : 0.8% us, 5.3% sy, 0.0% ni, 93.5% id, 0.4% wa, 0.0% hi, 0.1% si
Mem: 2055724k total, 375892k used, 1679832k free, 17424k buffers
Swap: 2040244k total, 0k used, 2040244k free, 72676k cached

Note (5): 1500 tso off
top - 05:54:20 up 10 min, 2 users, load average: 1.48, 0.62, 0.29
Tasks: 91 total, 3 running, 88 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.5% us, 0.5% sy, 0.0% ni, 81.3% id, 0.9% wa, 7.6% hi, 9.1%
Cpu1 : 0.7% us, 5.4% sy, 0.0% ni, 91.5% id, 0.7% wa, 0.0% hi, 1.8%
Cpu2 : 0.6% us, 6.5% sy, 0.0% ni, 90.2% id, 0.7% wa, 0.0% hi, 2.0%
Cpu3 : 0.4% us, 5.5% sy, 0.0% ni, 92.1% id, 0.2% wa, 0.0% hi, 1.8%
Mem: 2055724k total, 204100k used, 1851624k free, 24056k buffers
Swap: 2040244k total, 0k used, 2040244k free, 79440k cached

Note (6): 1500 tso on
top - 05:49:36 up 6 min, 2 users, load average: 1.28, 0.45, 0.18
Tasks: 91 total, 6 running, 85 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.0% us,  0.0% sy, 0.0% ni,  0.0% id, 0.0% wa, 41.5% hi, 58.5%
Cpu1 : 0.0% us, 26.4% sy, 0.0% ni, 69.9% id, 0.0% wa,  0.0% hi,  3.7%
Cpu2 : 0.3% us, 24.3% sy, 0.0% ni, 71.3% id, 0.0% wa,  0.0% hi,  4.0%
Cpu3 : 0.0% us, 19.1% sy, 0.0% ni, 77.6% id, 0.0% wa,  0.0% hi,  3.3%
Mem: 2055724k total, 200496k used, 1855228k free, 22644k buffers
Swap: 2040244k total, 0k used, 2040244k free, 79288k cached

Note (7): 9000 off
top - 06:03:13 up 19 min, 2 users, load average: 0.52, 0.27, 0.23
Tasks: 91 total, 3 running, 88 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.3% us, 1.0% sy, 0.0% ni, 86.0% id, 0.5% wa, 5.3% hi, 6.8%
Cpu1 : 0.4% us, 4.3% sy, 0.0% ni, 93.7% id, 0.4% wa, 0.0% hi, 1.3%
Cpu2 : 0.3% us, 4.5% sy, 0.0% ni, 93.2% id, 0.4% wa, 0.0% hi, 1.5%
Cpu3 : 0.2% us, 3.8% sy, 0.0% ni, 94.7% id, 0.1% wa, 0.0% hi, 1.2%
Mem: 2055724k total, 399540k used, 1656184k free, 25816k buffers
Swap: 2040244k total, 0k used, 2040244k free, 79516k cached

Note (8): 9000 on
top - 06:05:16 up 21 min, 2 users, load average: 0.79, 0.42, 0.29
Tasks: 91 total, 1 running, 90 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.3% us, 2.5% sy, 0.0% ni, 83.5% id, 0.5% wa, 5.6% hi, 7.7%
Cpu1 : 0.4% us, 5.1% sy, 0.0% ni, 92.9% id, 0.3% wa, 0.0% hi, 1.3%
Cpu2 : 0.3% us, 4.9% sy, 0.0% ni, 92.9% id, 0.4% wa, 0.0% hi, 1.4%
Cpu3 : 0.2% us, 3.9% sy, 0.0% ni, 94.7% id, 0.1% wa, 0.0% hi, 1.2%
Mem: 2055724k total, 397784k used, 1657940k free, 26892k buffers
Swap: 2040244k total, 0k used, 2040244k free, 79528k cached

> -----Original Message-----
> From: netdev-bounce@oss.sgi.com
> [mailto:netdev-bounce@oss.sgi.com] On Behalf Of Herbert Xu
> Sent: Wednesday, June 08, 2005 3:11 PM
> To: David S. Miller
> Cc: jheffner@psc.edu; netdev@oss.sgi.com
> Subject: Re: [PATCH 0/9]: TCP: The Road to Super TSO
>
> On Wed, Jun 08, 2005 at 02:49:06PM -0700, David S. Miller wrote:
> >
> > Performance went down, with both TSO enabled and disabled,
> > compared to not having the patches applied.
>
> What was the receiver running?  Was the performance
> degradation more pronounced with TSO enabled?
> --
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* RE: [PATCH 0/9]: TCP: The Road to Super TSO
From: Leonid Grossman @ 2005-06-08 22:19 UTC
To: 'David S. Miller', jheffner; +Cc: netdev, herbert

> -----Original Message-----
> From: netdev-bounce@oss.sgi.com
> [mailto:netdev-bounce@oss.sgi.com] On Behalf Of David S. Miller
> Sent: Wednesday, June 08, 2005 2:49 PM
> To: jheffner@psc.edu
> Cc: netdev@oss.sgi.com; herbert@gondor.apana.org.au
> Subject: Re: [PATCH 0/9]: TCP: The Road to Super TSO
>
> From: John Heffner <jheffner@psc.edu>
> Subject: Re: [PATCH 0/9]: TCP: The Road to Super TSO
> Date: Wed, 8 Jun 2005 17:40:10 -0400
>
> > On Tuesday 07 June 2005 12:08 am, David S. Miller wrote:
> > > Some folks, notably the S2IO guys, see performance
> > > degradation with the Super TSO v2 patch (they see it with
> > > the first version as well).  It's a real pain to spot what
> > > causes such things in such a huge patch... so I started
> > > splitting things up in a very fine-grained manner so we can
> > > catch regressions more precisely.
> >
> > I'm curious about the details of this.  Is there decreased
> > performance relative to current TSO?  Relative to no TSO?
> > Sending to just one receiver or many, and is it receiver
> > limited?
>
> The receiver is limited in their tests.  No current
> generation systems can fill a 10gbit pipe fully, especially
> at 1500 byte MTU.

With jumbo frames, a single receiver can handle 10GbE line rate.
With 1500 MTU, a single receiver becomes a bottleneck.
I will forward the numbers later today.

> Performance went down, with both TSO enabled and disabled,
> compared to not having the patches applied.
>
> That's why I'm going through this entire exercise of doing
> things one piece at a time.

* Re: [PATCH 0/9]: TCP: The Road to Super TSO
From: Leonid Grossman @ 2005-06-09 4:30 UTC
To: 'David S. Miller'; +Cc: netdev

FYI, looks like the code in the nine patches is not responsible
for the performance drop; the problem is elsewhere in the Super
TSO code.

-----Original Message-----
From: kshaw [mailto:kim.shaw@neterion.com]
Sent: Wednesday, June 08, 2005 8:34 PM
To: 'David S. Miller'
Cc: ravinandan.arakali@neterion.com; leonid.grossman@neterion.com
Subject: RE: test Super TSO

David,

I have applied all 9 patches (patches 6-9 by editing the source
files), and I don't see a Tx performance drop from any patch; Tx
throughput remains at 6.17-6.18 Gb/s.

The configuration is as follows: four-way Opteron system .247 with
shipping kernel 2.6.12-rc5 as the Tx system, four-way Opteron
system .226 with kernel 2.6.11.5 as the Rx system, NIC driver
REL_1-7-7-7_LX installed on both systems, MTU set to 9000 on both
systems.  The systems are connected back to back.  Run 8 nttcp
connections from the Tx system to the Rx system for 60 seconds.
TSO is set to the default (on) on both systems.

I also re-tested the original TSO patch which I used weeks ago.
With the same hardware, and kernel 2.6.12-rc4 with the original
TSO patch applied on the Tx system, Tx throughput drops to
5.28 Gb/s.