* [RFC PATCH v3 net-next 0/3] tcp: Make use of MSG_EOR in tcp_sendmsg
@ 2016-04-20 6:24 Martin KaFai Lau
From: Martin KaFai Lau @ 2016-04-20 6:24 UTC
To: netdev
Cc: Eric Dumazet, Neal Cardwell, Soheil Hassas Yeganeh,
Willem de Bruijn, Yuchung Cheng, Kernel Team

v3:
~ Separate EOR marking from the SKBTX_ANY_TSTAMP logic.
~ Move the eor bit test back into the loop in tcp_sendmsg and
tcp_sendpage, because more than one thread could be doing
sendmsg.
~ Thanks to Eric Dumazet for his suggestions on v2.
~ The TCP timestamp bug fixes are split out into separate threads.
v2:
~ Rework based on the recent work
"add TX timestamping via cmsg" by
Soheil Hassas Yeganeh <soheil.kdev@gmail.com>
~ This version takes the MSG_EOR bit as a signal of
end-of-response-message and leaves the selective
timestamping job to the cmsg.
~ Changes based on the v1 feedback (such as avoiding an
unlikely() check in a loop, and adding tcp_sendpage
support).
~ The first 3 patches are bug fixes. The fixes in this
series depend on the newly introduced txstamp_ack in
net-next. I will make relevant patches against net after
getting some feedback.
~ The test results are based on the recently posted net fix:
"tcp: Fix SOF_TIMESTAMPING_TX_ACK when handling dup acks"
~ Due to the lack of cmsg support in packetdrill (or maybe
I just could not find it), a BPF prog is used to kprobe
sock_queue_err_skb() and print out the value of
serr->ee.ee_data. The BPF prog (runnable from bcc) is
attached at the end.

One potential use case is to use MSG_EOR with
SOF_TIMESTAMPING_TX_ACK to get more accurate
TCP ACK timestamps for application protocols with
multiple outgoing response messages (e.g. HTTP2).
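A minimal user-space sketch of the use case above follows. It assumes Linux with the UAPI header <linux/net_tstamp.h> and a kernel that honours MSG_EOR on TCP as this series proposes; the helper names tx_ack_tstamp_flags(), enable_tx_ack_tstamp() and send_response() are hypothetical and not part of the series.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/net_tstamp.h>

/* Hypothetical helper: request a software timestamp when data is ACKed
 * (TX_ACK), tag each report with a key derived from the byte offset of
 * the timestamped byte (OPT_ID), and skip the payload copy on the error
 * queue (OPT_TSONLY). */
static unsigned int tx_ack_tstamp_flags(void)
{
	return SOF_TIMESTAMPING_TX_ACK |
	       SOF_TIMESTAMPING_SOFTWARE |
	       SOF_TIMESTAMPING_OPT_ID |
	       SOF_TIMESTAMPING_OPT_TSONLY;
}

/* Hypothetical helper: enable ACK timestamps on a TCP socket. Linux
 * rejects OPT_ID with EINVAL while a TCP socket is closed or listening,
 * so call this after connect()/accept(). Returns 0, or -1 with errno. */
static int enable_tx_ack_tstamp(int fd)
{
	unsigned int val = tx_ack_tstamp_flags();

	return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));
}

/* Hypothetical helper: write the last chunk of one response. With this
 * series, MSG_EOR keeps later writes from being appended to the skb that
 * holds this chunk's last byte (and its timestamp request). */
static ssize_t send_response(int fd, const void *buf, size_t len)
{
	return send(fd, buf, len, MSG_EOR);
}
```

A server would call enable_tx_ack_tstamp() once per accepted connection and use send_response() for the final write of each response message.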

One of our use cases is at the web server. The web server tracks
the HTTP2 response latency by measuring from when it writes the
first byte of a response to the socket until the TCP ACK for the
last byte is received. When no client-side measurement is
available, measuring from the server side is the only option.
When client-side measurement is available, the server-side data
can also be used to cross-check it.

* [RFC PATCH v3 net-next 1/3] tcp: Make use of MSG_EOR in tcp_sendmsg and tcp_sendpage
@ 2016-04-20 6:24 Martin KaFai Lau
From: Martin KaFai Lau @ 2016-04-20 6:24 UTC
To: netdev
Cc: Eric Dumazet, Neal Cardwell, Soheil Hassas Yeganeh,
Willem de Bruijn, Yuchung Cheng, Kernel Team

This patch adds an eor bit to the TCP_SKB_CB. When MSG_EOR
is passed to tcp_sendmsg/tcp_sendpage, the eor bit will
be set on the skb containing the last byte of the userland
msg. The eor bit will prevent data from being appended to that
skb in the future.

This patch handles the tcp_sendmsg and tcp_sendpage cases.

The followup patches will handle the other skb coalescing
and fragmenting cases.

One potential use case is to use MSG_EOR with
SOF_TIMESTAMPING_TX_ACK to get more accurate
TCP ACK timestamps for application protocols with
multiple outgoing response messages (e.g. HTTP2).

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
---
 include/net/tcp.h | 3 ++-
 net/ipv4/tcp.c    | 7 +++++--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index c0ef054..ac31798 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -762,7 +762,8 @@ struct tcp_skb_cb {
 
 	__u8		ip_dsfield;	/* IPv4 tos or IPv6 dsfield	*/
 	__u8		txstamp_ack:1,	/* Record TX timestamp for ack? */
-			unused:7;
+			eor:1,		/* Is skb MSG_EOR marked */
+			unused:6;
 	__u32		ack_seq;	/* Sequence number ACK'd	*/
 	union {
 		struct inet_skb_parm	h4;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 4d73858..7df0c1a88 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -908,7 +908,8 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
 		int copy, i;
 		bool can_coalesce;
 
-		if (!tcp_send_head(sk) || (copy = size_goal - skb->len) <= 0) {
+		if (!tcp_send_head(sk) || (copy = size_goal - skb->len) <= 0 ||
+		    TCP_SKB_CB(skb)->eor) {
 new_segment:
 			if (!sk_stream_memory_free(sk))
 				goto wait_for_sndbuf;
@@ -960,6 +961,7 @@ new_segment:
 		size -= copy;
 		if (!size) {
 			tcp_tx_timestamp(sk, sk->sk_tsflags, skb);
+			TCP_SKB_CB(skb)->eor = !!(flags & MSG_EOR);
 			goto out;
 		}
 
@@ -1156,7 +1158,7 @@ int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 			copy = max - skb->len;
 		}
 
-		if (copy <= 0) {
+		if (copy <= 0 || TCP_SKB_CB(skb)->eor) {
 new_segment:
 			/* Allocate new segment. If the interface is SG,
 			 * allocate skb fitting to single page.
@@ -1250,6 +1252,7 @@ new_segment:
 		copied += copy;
 		if (!msg_data_left(msg)) {
 			tcp_tx_timestamp(sk, sockc.tsflags, skb);
+			TCP_SKB_CB(skb)->eor = !!(flags & MSG_EOR);
 			goto out;
 		}
 
-- 
2.5.1
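To make the tcp_sendmsg() change concrete, here is a user-space toy model (not kernel code) of the write-queue tail: each write either appends to the tail segment or starts a new one, and a tail whose eor bit is set is treated as full. struct toy_seg, struct toy_queue and toy_sendmsg() are hypothetical names; the real code works on struct sk_buff, size_goal and TCP_SKB_CB().

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>   /* MSG_EOR */

#define TOY_MAX_SEGS 16

/* Toy stand-in for an skb on the write queue. */
struct toy_seg {
	size_t len;
	bool eor;             /* models TCP_SKB_CB(skb)->eor */
};

struct toy_queue {
	struct toy_seg seg[TOY_MAX_SEGS];
	size_t nsegs;
	size_t size_goal;     /* max bytes per segment */
};

/* Model of the v3 tcp_sendmsg() loop: append to the tail while it has room
 * and is not EOR-marked, otherwise start a new segment. Returns the bytes
 * queued (short if the toy queue fills up). */
static size_t toy_sendmsg(struct toy_queue *q, size_t size, int flags)
{
	size_t copied = 0;

	while (size > 0) {
		struct toy_seg *tail = q->nsegs ? &q->seg[q->nsegs - 1] : NULL;
		size_t copy = 0;

		if (tail && tail->len < q->size_goal)
			copy = q->size_goal - tail->len;

		if (copy == 0 || tail->eor) {   /* "copy <= 0 || eor" */
			if (q->nsegs == TOY_MAX_SEGS)
				break;
			tail = &q->seg[q->nsegs++];
			tail->len = 0;
			tail->eor = false;
			copy = q->size_goal;
		}
		if (copy > size)
			copy = size;
		tail->len += copy;
		size -= copy;
		copied += copy;
		/* v3 assigns !!(flags & MSG_EOR); the tail's eor is known to
		 * be 0 here, so a conditional store is equivalent. */
		if (size == 0 && (flags & MSG_EOR))
			tail->eor = true;
	}
	return copied;
}
```

Two 30-byte writes without MSG_EOR end up in one 60-byte segment; with MSG_EOR on the first write they stay in separate segments.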

* Re: [RFC PATCH v3 net-next 1/3] tcp: Make use of MSG_EOR in tcp_sendmsg and tcp_sendpage
@ 2016-04-20 9:21 Eric Dumazet
From: Eric Dumazet @ 2016-04-20 9:21 UTC
To: Martin KaFai Lau
Cc: netdev, Eric Dumazet, Neal Cardwell, Soheil Hassas Yeganeh,
Willem de Bruijn, Yuchung Cheng, Kernel Team

On Tue, 2016-04-19 at 23:24 -0700, Martin KaFai Lau wrote:
> This patch adds an eor bit to the TCP_SKB_CB. When MSG_EOR
> is passed to tcp_sendmsg/tcp_sendpage, the eor bit will
> be set at the skb containing the last byte of the userland's
> msg. The eor bit will prevent data from appending to that
> skb in the future.
[...]
> @@ -908,7 +908,8 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
>  		int copy, i;
>  		bool can_coalesce;
>  
> -		if (!tcp_send_head(sk) || (copy = size_goal - skb->len) <= 0) {
> +		if (!tcp_send_head(sk) || (copy = size_goal - skb->len) <= 0 ||
> +		    TCP_SKB_CB(skb)->eor) {
>  new_segment:
>  			if (!sk_stream_memory_free(sk))
>  				goto wait_for_sndbuf;
> @@ -960,6 +961,7 @@ new_segment:
>  		size -= copy;
>  		if (!size) {
>  			tcp_tx_timestamp(sk, sk->sk_tsflags, skb);
> +			TCP_SKB_CB(skb)->eor = !!(flags & MSG_EOR);

I am not sure you understood how do_tcp_sendpages() was working.

1) It is called one page at a time, so size would reach zero for every
sent page, and we would have at most 4096 bytes (on x86) per skb, even
if a sendfile() or splice() or vmsplice() is requesting a large size.

2) @flags here does not contain typical MSG_... values, but a
combination of MSG_MORE and MSG_SENDPAGE_NOTLAST.

Since there is no way to add a MSG_EOR yet in the sendfile() and related
functions, you should remove the above line and not claim sendpage()
support in the patch changelog/title, since it is not true.

We only support not aggregating data to the last skb in the write queue
if the eor bit is set on it, thus not breaking prior sendmsg(... MSG_EOR)
uses.

>  			goto out;
>  		}
>  
[...]
> @@ -1250,6 +1252,7 @@ new_segment:
>  		copied += copy;
>  		if (!msg_data_left(msg)) {
>  			tcp_tx_timestamp(sk, sockc.tsflags, skb);
> +			TCP_SKB_CB(skb)->eor = !!(flags & MSG_EOR);

Since this is a rmw (read-modify-write), it is probably better to avoid
it in the common case, since we know the prior value of eor must be 0 at
this point.

	if (unlikely(flags & MSG_EOR))
		TCP_SKB_CB(skb)->eor = 1;

>  			goto out;
>  		}
>  

Thanks.
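Both of Eric's points can be illustrated with a small user-space toy model (hypothetical names, not kernel code). If a splice()/sendfile() of N bytes reaches the sendpage path one page per call, each call ends with its local size at zero, so a per-call eor assignment would fire once per page-sized skb rather than once per message. The sketch also uses the conditional store he suggests instead of the bitfield read-modify-write. The 4096-byte page size is an assumption (x86), and the toy takes want_eor as a plain bool because, as Eric notes, the real @flags carries MSG_MORE/MSG_SENDPAGE_NOTLAST rather than MSG_EOR.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u   /* assumption: x86 page size */

struct toy_cb {
	bool eor;             /* models TCP_SKB_CB(skb)->eor */
};

/* Eric's suggestion: eor is known to be 0 at this point, so only store
 * when it must become 1 and skip the read-modify-write otherwise. */
static void toy_mark_eor(struct toy_cb *cb, bool want_eor)
{
	if (__builtin_expect(want_eor, 0))   /* unlikely() in kernel terms */
		cb->eor = true;
}

/* Toy model of a splice()/sendfile() of `total` bytes reaching the
 * sendpage path one page per call, each call filling its own segment
 * (at most one page per skb). Returns how many segments end up marked. */
static size_t toy_splice_eor_marks(struct toy_cb *segs, size_t nsegs,
				   size_t total, bool want_eor)
{
	size_t i, marks = 0;

	for (i = 0; i < nsegs && total > 0; i++) {
		size_t chunk = total < TOY_PAGE_SIZE ? total : TOY_PAGE_SIZE;

		/* One call copies `chunk` bytes; its local size then reaches
		 * zero, which is where v3 would set eor. */
		total -= chunk;
		toy_mark_eor(&segs[i], want_eor);
		marks += segs[i].eor;
	}
	return marks;
}
```

For a 3 * 4096 + 10 byte transfer the model marks four segments, not one, which is why the sendpage hunk cannot express a message boundary.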

* [RFC PATCH v3 net-next 2/3] tcp: Handle eor bit when coalescing skb
@ 2016-04-20 6:24 Martin KaFai Lau
From: Martin KaFai Lau @ 2016-04-20 6:24 UTC
To: netdev
Cc: Eric Dumazet, Neal Cardwell, Soheil Hassas Yeganeh,
Willem de Bruijn, Yuchung Cheng, Kernel Team

This patch:
1. Prevents next_skb from being coalesced into prev_skb if
TCP_SKB_CB(prev_skb)->eor is set
2. Updates TCP_SKB_CB(prev_skb)->eor if coalescing is
allowed

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
---
 net/ipv4/tcp_input.c  | 4 ++++
 net/ipv4/tcp_output.c | 4 ++++
 2 files changed, 8 insertions(+)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 75e8336..68c55e5 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1303,6 +1303,7 @@ static bool tcp_shifted_skb(struct sock *sk, struct sk_buff *skb,
 	}
 
 	TCP_SKB_CB(prev)->tcp_flags |= TCP_SKB_CB(skb)->tcp_flags;
+	TCP_SKB_CB(prev)->eor = TCP_SKB_CB(skb)->eor;
 	if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
 		TCP_SKB_CB(prev)->end_seq++;
 
@@ -1368,6 +1369,9 @@ static struct sk_buff *tcp_shift_skb_data(struct sock *sk, struct sk_buff *skb,
 	if ((TCP_SKB_CB(prev)->sacked & TCPCB_TAGBITS) != TCPCB_SACKED_ACKED)
 		goto fallback;
 
+	if (TCP_SKB_CB(prev)->eor)
+		goto fallback;
+
 	in_sack = !after(start_seq, TCP_SKB_CB(skb)->seq) &&
 		  !before(end_seq, TCP_SKB_CB(skb)->end_seq);
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index a6e4a83..96bdf98 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2494,6 +2494,7 @@ static void tcp_collapse_retrans(struct sock *sk, struct sk_buff *skb)
 	 * packet counting does not break.
 	 */
 	TCP_SKB_CB(skb)->sacked |= TCP_SKB_CB(next_skb)->sacked & TCPCB_EVER_RETRANS;
+	TCP_SKB_CB(skb)->eor = TCP_SKB_CB(next_skb)->eor;
 
 	/* changed transmit queue under us so clear hints */
 	tcp_clear_retrans_hints_partial(tp);
@@ -2545,6 +2546,9 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *to,
 		if (!tcp_can_collapse(sk, skb))
 			break;
 
+		if (TCP_SKB_CB(to)->eor)
+			break;
+
 		space -= skb->len;
 
 		if (first) {
-- 
2.5.1
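The retransmit-collapse rule in this patch can be modelled in user space as follows. The types and toy_try_collapse() are hypothetical, and only a simple byte budget stands in for the real space, window and tcp_can_collapse() checks. Segments after `to` are merged into it one at a time; each merge copies the merged segment's eor into `to` (as tcp_collapse_retrans() does), so the loop stops as soon as `to` carries an EOR mark.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_seg {
	size_t len;
	bool eor;             /* models TCP_SKB_CB(skb)->eor */
};

/* Toy version of tcp_retrans_try_collapse(): merge q[1], q[2], ... into
 * q[0] ("to") while the total fits in `space` bytes. Merged entries are
 * left in the array; a real queue would unlink them. Returns how many
 * segments were merged into q[0]. */
static size_t toy_try_collapse(struct toy_seg *q, size_t n, size_t space)
{
	struct toy_seg *to = &q[0];
	size_t i, merged = 0;

	if (n == 0 || to->len > space)
		return 0;
	space -= to->len;

	for (i = 1; i < n; i++) {
		if (to->eor)               /* never append past a record end */
			break;
		if (q[i].len > space)
			break;
		space -= q[i].len;
		to->len += q[i].len;
		to->eor = q[i].eor;        /* inherit next_skb's eor */
		merged++;
	}
	return merged;
}
```

With segments {100, no eor}, {100, eor}, {100, no eor} the model merges the second into the first and then stops, so the first segment ends exactly at the record boundary.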

* Re: [RFC PATCH v3 net-next 2/3] tcp: Handle eor bit when coalescing skb
@ 2016-04-20 20:04 Soheil Hassas Yeganeh
From: Soheil Hassas Yeganeh @ 2016-04-20 20:04 UTC
To: Martin KaFai Lau
Cc: netdev, Eric Dumazet, Neal Cardwell, Willem de Bruijn,
Yuchung Cheng, Kernel Team

On Wed, Apr 20, 2016 at 2:24 AM, Martin KaFai Lau <kafai@fb.com> wrote:
> This patch:
> 1. Prevent next_skb from coalescing to the prev_skb if
> TCP_SKB_CB(prev_skb)->eor is set
> 2. Update the TCP_SKB_CB(prev_skb)->eor if coalescing is
> allowed
[...]
> @@ -1368,6 +1369,9 @@ static struct sk_buff *tcp_shift_skb_data(struct sock *sk, struct sk_buff *skb,
>  	if ((TCP_SKB_CB(prev)->sacked & TCPCB_TAGBITS) != TCPCB_SACKED_ACKED)
>  		goto fallback;
>  
> +	if (TCP_SKB_CB(prev)->eor)
> +		goto fallback;
> +

nit: You might want to add unlikely() around all checks of
"tcp_skb_cb->eor".

>  	in_sack = !after(start_seq, TCP_SKB_CB(skb)->seq) &&
>  		  !before(end_seq, TCP_SKB_CB(skb)->end_seq);
[...]
> @@ -2545,6 +2546,9 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *to,
>  		if (!tcp_can_collapse(sk, skb))
>  			break;
>  
> +		if (TCP_SKB_CB(to)->eor)
> +			break;
> +

nit: Perhaps a better place to check for eor is right after entering
the loop, to skip a few instructions and tcp_can_collapse() in the
unlikely case that eor is set.

>  		space -= skb->len;
>  
>  		if (first) {
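Soheil's first nit, wrapping the eor tests in unlikely(), applies equally to the SACK-shifting path in tcp_input.c. A toy user-space model of that path might look like this; toy_unlikely() stands in for the kernel's unlikely() (both expand to __builtin_expect on GCC/Clang), the names are hypothetical, and only the case where skb is absorbed completely is modelled.

```c
#include <stdbool.h>
#include <stddef.h>

#define toy_unlikely(x) __builtin_expect(!!(x), 0)

struct toy_seg {
	size_t len;
	bool eor;             /* models TCP_SKB_CB(skb)->eor */
};

/* Toy model of shifting SACKed data from skb into prev: refuse (the
 * "goto fallback" in tcp_shift_skb_data()) if prev already ends a record;
 * otherwise move all of skb's bytes and let prev take over skb's eor, as
 * tcp_shifted_skb() does once skb has been fully eaten. Returns true if
 * the shift happened. */
static bool toy_shift(struct toy_seg *prev, struct toy_seg *skb)
{
	if (toy_unlikely(prev->eor))
		return false;

	prev->len += skb->len;
	skb->len = 0;          /* skb is now empty and would be freed */
	prev->eor = skb->eor;
	return true;
}
```

Once prev inherits an EOR mark, later shift attempts into it fall back, mirroring the first hunk of the patch.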

* Re: [RFC PATCH v3 net-next 2/3] tcp: Handle eor bit when coalescing skb
@ 2016-04-21 16:56 Martin KaFai Lau
From: Martin KaFai Lau @ 2016-04-21 16:56 UTC
To: Soheil Hassas Yeganeh
Cc: netdev, Eric Dumazet, Neal Cardwell, Willem de Bruijn,
Yuchung Cheng, Kernel Team

On Wed, Apr 20, 2016 at 04:04:54PM -0400, Soheil Hassas Yeganeh wrote:
[...]
> > @@ -2545,6 +2546,9 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *to,
> >  		if (!tcp_can_collapse(sk, skb))
> >  			break;
> >  
> > +		if (TCP_SKB_CB(to)->eor)
> > +			break;
> > +
>
> nit: Perhaps a better place to check for eor is right after entering
> the loop? to skip a few instructions and tcp_can_collapse, in an
> unlikely case eor is set.
Hmm... I'm not sure I understand.
Do you mean moving the unlikely check before (or after?) the more
likely checks, which may have a better chance of breaking the loop
sooner?

* Re: [RFC PATCH v3 net-next 2/3] tcp: Handle eor bit when coalescing skb
@ 2016-04-21 21:14 Soheil Hassas Yeganeh
From: Soheil Hassas Yeganeh @ 2016-04-21 21:14 UTC
To: Martin KaFai Lau
Cc: netdev, Eric Dumazet, Neal Cardwell, Willem de Bruijn,
Yuchung Cheng, Kernel Team

On Thu, Apr 21, 2016 at 12:56 PM, Martin KaFai Lau <kafai@fb.com> wrote:
> On Wed, Apr 20, 2016 at 04:04:54PM -0400, Soheil Hassas Yeganeh wrote:
[...]
>> nit: Perhaps a better place to check for eor is right after entering
>> the loop? to skip a few instructions and tcp_can_collapse, in an
>> unlikely case eor is set.
> hmm... Not sure I understand it.
> You meant moving the unlikely case before (or after?) the more likely
> cases which may have a better chance to break the loop sooner?

Well, I don't have a strong preference here, so feel free to ignore it.
Though I'm not sure how "likely" the checks in tcp_can_collapse are.

On another note, do you think putting this in a self-documenting
helper function, say tcp_can_collapse_to(), would help readability?

Thanks.

* Re: [RFC PATCH v3 net-next 2/3] tcp: Handle eor bit when coalescing skb
@ 2016-04-22 4:30 Martin KaFai Lau
From: Martin KaFai Lau @ 2016-04-22 4:30 UTC
To: Soheil Hassas Yeganeh
Cc: netdev, Eric Dumazet, Neal Cardwell, Willem de Bruijn,
Yuchung Cheng, Kernel Team

On Thu, Apr 21, 2016 at 05:14:37PM -0400, Soheil Hassas Yeganeh wrote:
> On another note, do you think putting this is a self-documenting
> helper function, say tcp_can_collapse_to(), would help readability?
Sure. I will move unlikely(TCP_SKB_CB(to)->eor) to a new helper
function tcp_skb_can_collapse_to() in the next spin.
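The helper agreed on above could take roughly the following shape. This is a user-space sketch with a toy struct: the name tcp_skb_can_collapse_to() comes from the message, but the body here (toy_skb_can_collapse_to()) and the call-site helper toy_can_merge() are assumptions, not the code of the next spin. The call site checks the helper first, following Soheil's suggested ordering.

```c
#include <stdbool.h>
#include <stddef.h>

#define toy_likely(x) __builtin_expect(!!(x), 1)

struct toy_seg {
	size_t len;
	bool eor;             /* models TCP_SKB_CB(skb)->eor */
};

/* Sketch of the proposed helper: data may be appended to `to` only if it
 * does not end an EOR-marked message, which is the common case. */
static inline bool toy_skb_can_collapse_to(const struct toy_seg *to)
{
	return toy_likely(!to->eor);
}

/* Hypothetical call site: the cheap eor test runs before the (here
 * simplified) size constraint. */
static bool toy_can_merge(const struct toy_seg *to, const struct toy_seg *next,
			  size_t space)
{
	if (!toy_skb_can_collapse_to(to))
		return false;
	return next->len <= space;
}
```

Folding the unlikely() hint into the helper keeps the hint in one place, so every coalescing site that calls it gets the same branch layout.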

* [RFC PATCH v3 net-next 3/3] tcp: Handle eor bit when fragmenting a skb
@ 2016-04-20 6:24 Martin KaFai Lau
From: Martin KaFai Lau @ 2016-04-20 6:24 UTC
To: netdev
Cc: Eric Dumazet, Neal Cardwell, Soheil Hassas Yeganeh,
Willem de Bruijn, Yuchung Cheng, Kernel Team

When fragmenting a skb, the next_skb should carry the eor from
prev_skb. The eor of prev_skb should also be reset.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
---
 net/ipv4/tcp_output.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 96bdf98..95f419b 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1128,6 +1128,12 @@ static void tcp_fragment_tstamp(struct sk_buff *skb, struct sk_buff *skb2)
 	}
 }
 
+static void tcp_skb_fragment_eor(struct sk_buff *skb, struct sk_buff *skb2)
+{
+	TCP_SKB_CB(skb2)->eor = TCP_SKB_CB(skb)->eor;
+	TCP_SKB_CB(skb)->eor = 0;
+}
+
 /* Function to create two new TCP segments. Shrinks the given segment
  * to the specified size and appends a new segment with the rest of the
  * packet to the list. This won't be called frequently, I hope.
@@ -1173,6 +1179,7 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len,
 	TCP_SKB_CB(skb)->tcp_flags = flags & ~(TCPHDR_FIN | TCPHDR_PSH);
 	TCP_SKB_CB(buff)->tcp_flags = flags;
 	TCP_SKB_CB(buff)->sacked = TCP_SKB_CB(skb)->sacked;
+	tcp_skb_fragment_eor(skb, buff);
 
 	if (!skb_shinfo(skb)->nr_frags && skb->ip_summed != CHECKSUM_PARTIAL) {
 		/* Copy and checksum data tail into the new buffer. */
@@ -1733,6 +1740,8 @@ static int tso_fragment(struct sock *sk, struct sk_buff *skb, unsigned int len,
 	/* This packet was never sent out yet, so no SACK bits. */
 	TCP_SKB_CB(buff)->sacked = 0;
 
+	tcp_skb_fragment_eor(skb, buff);
+
 	buff->ip_summed = skb->ip_summed = CHECKSUM_PARTIAL;
 	skb_split(skb, buff, len);
 	tcp_fragment_tstamp(skb, buff);
-- 
2.5.1
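The rule in patch 3/3 (the tail piece of a split segment inherits the eor mark while the head piece loses it) can be checked with a small user-space model. toy_fragment() is a hypothetical stand-in for tcp_fragment()/tso_fragment() that tracks only lengths and the eor bit; toy_fragment_eor() performs the same two assignments as tcp_skb_fragment_eor() in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_seg {
	size_t len;
	bool eor;             /* models TCP_SKB_CB(skb)->eor */
};

/* Same two assignments as tcp_skb_fragment_eor() in the patch. */
static void toy_fragment_eor(struct toy_seg *skb, struct toy_seg *skb2)
{
	skb2->eor = skb->eor;
	skb->eor = false;
}

/* Split `skb` so it keeps the first `len` bytes and `buff` receives the
 * rest. Returns false (changing nothing) unless 0 < len < skb->len. */
static bool toy_fragment(struct toy_seg *skb, struct toy_seg *buff, size_t len)
{
	if (len == 0 || len >= skb->len)
		return false;

	buff->len = skb->len - len;
	skb->len = len;
	toy_fragment_eor(skb, buff);
	return true;
}
```

Because the record's last byte always lands in the second piece, the eor mark has to move with it; leaving it on the head piece would block appends at the wrong boundary.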