* [PATCH net] tcp: Always cleanup skb before sending
@ 2017-11-01 21:10 Christoph Paasch
2017-11-01 21:32 ` Eric Dumazet
0 siblings, 1 reply; 13+ messages in thread
From: Christoph Paasch @ 2017-11-01 21:10 UTC (permalink / raw)
To: David Miller; +Cc: netdev, Eric Dumazet
Since commit 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache
line misses") IPCB/IP6CB is no more at the beginning of the tcp_skb_cb.
This means that on the output path, we need to make sure that it has
been correctly initialized to 0, as is done in tcp_transmit_skb.
However, when going through the other code-path in TCP that can send an
skb (e.g., through tcp_v6_send_synack), we end up in a situation where
IP6CB has some of its fields set to unexpected values. Depending on the
layout of tcp_skb_cb across the different kernel-versions this can be
lastopt, flags,...
This patch makes sure that IPCB/IP6CB is always set to 0 before sending.
Cc: Eric Dumazet <edumazet@google.com>
Fixes: 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
---
include/net/tcp.h | 2 ++
net/ipv4/tcp_ipv4.c | 6 ++++++
net/ipv4/tcp_output.c | 20 ++++++++++++--------
3 files changed, 20 insertions(+), 8 deletions(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index e6d0002a1b0b..a375ee8fc534 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2032,6 +2032,8 @@ static inline void tcp_listendrop(const struct sock *sk)
enum hrtimer_restart tcp_pace_kick(struct hrtimer *timer);
+void tcp_skb_cleanup(struct sk_buff *skb);
+
/*
* Interface for adding Upper Level Protocols over TCP
*/
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 5b027c69cbc5..db7dd65b1f19 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -709,6 +709,9 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb)
arg.tos = ip_hdr(skb)->tos;
arg.uid = sock_net_uid(net, sk && sk_fullsock(sk) ? sk : NULL);
+
+ tcp_skb_cleanup(skb);
+
local_bh_disable();
ip_send_unicast_reply(*this_cpu_ptr(net->ipv4.tcp_sk),
skb, &TCP_SKB_CB(skb)->header.h4.opt,
@@ -795,6 +798,9 @@ static void tcp_v4_send_ack(const struct sock *sk,
arg.bound_dev_if = oif;
arg.tos = tos;
arg.uid = sock_net_uid(net, sk_fullsock(sk) ? sk : NULL);
+
+ tcp_skb_cleanup(skb);
+
local_bh_disable();
ip_send_unicast_reply(*this_cpu_ptr(net->ipv4.tcp_sk),
skb, &TCP_SKB_CB(skb)->header.h4.opt,
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 823003eef3a2..6935a68d449b 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -973,6 +973,16 @@ static void tcp_internal_pacing(struct sock *sk, const struct sk_buff *skb)
HRTIMER_MODE_ABS_PINNED);
}
+void tcp_skb_cleanup(struct sk_buff *skb)
+{
+ /* Our usage of tstamp should remain private */
+ skb->tstamp = 0;
+
+ /* Cleanup our debris for IP stacks */
+ memset(skb->cb, 0, max(sizeof(struct inet_skb_parm),
+ sizeof(struct inet6_skb_parm)));
+}
+
/* This routine actually transmits TCP packets queued in by
* tcp_do_sendmsg(). This is used by both the initial
* transmission and possible later retransmissions.
@@ -1115,12 +1125,7 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
skb_shinfo(skb)->gso_segs = tcp_skb_pcount(skb);
skb_shinfo(skb)->gso_size = tcp_skb_mss(skb);
- /* Our usage of tstamp should remain private */
- skb->tstamp = 0;
-
- /* Cleanup our debris for IP stacks */
- memset(skb->cb, 0, max(sizeof(struct inet_skb_parm),
- sizeof(struct inet6_skb_parm)));
+ tcp_skb_cleanup(skb);
err = icsk->icsk_af_ops->queue_xmit(sk, skb, &inet->cork.fl);
@@ -3204,8 +3209,7 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
rcu_read_unlock();
#endif
- /* Do not fool tcpdump (if any), clean our debris */
- skb->tstamp = 0;
+ tcp_skb_cleanup(skb);
return skb;
}
EXPORT_SYMBOL(tcp_make_synack);
--
2.14.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH net] tcp: Always cleanup skb before sending
2017-11-01 21:10 [PATCH net] tcp: Always cleanup skb before sending Christoph Paasch
@ 2017-11-01 21:32 ` Eric Dumazet
2017-11-01 21:53 ` Eric Dumazet
0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2017-11-01 21:32 UTC (permalink / raw)
To: Christoph Paasch; +Cc: David Miller, netdev
On Wed, Nov 1, 2017 at 2:10 PM, Christoph Paasch <cpaasch@apple.com> wrote:
> Since commit 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache
> line misses") IPCB/IP6CB is no more at the beginning of the tcp_skb_cb.
> This means that on the output path, we need to make sure that it has
> been correctly initialized to 0, as is done in tcp_transmit_skb.
>
> However, when going through the other code-path in TCP that can send an
> skb (e.g., through tcp_v6_send_synack), we end up in a situation where
> IP6CB has some of its fields set to unexpected values. Depending on the
> layout of tcp_skb_cb across the different kernel-versions this can be
> lastopt, flags,...
Or not use tcp_init_nondata_skb() on non fast clones, since it adds
unnecessary writes and clears.
tcp_make_synack() really has no business using tcp_init_nondata_skb()
and could simply set th->seq = htonl(tcp_rsk(req)->snt_isn);
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net] tcp: Always cleanup skb before sending
2017-11-01 21:32 ` Eric Dumazet
@ 2017-11-01 21:53 ` Eric Dumazet
2017-11-02 0:10 ` Christoph Paasch
0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2017-11-01 21:53 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Christoph Paasch, David Miller, netdev
On Wed, 2017-11-01 at 14:32 -0700, Eric Dumazet wrote:
> On Wed, Nov 1, 2017 at 2:10 PM, Christoph Paasch <cpaasch@apple.com> wrote:
> > Since commit 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache
> > line misses") IPCB/IP6CB is no more at the beginning of the tcp_skb_cb.
> > This means that on the output path, we need to make sure that it has
> > been correctly initialized to 0, as is done in tcp_transmit_skb.
> >
> > However, when going through the other code-path in TCP that can send an
> > skb (e.g., through tcp_v6_send_synack), we end up in a situation where
> > IP6CB has some of its fields set to unexpected values. Depending on the
> > layout of tcp_skb_cb across the different kernel-versions this can be
> > lastopt, flags,...
>
> Or not use tcp_init_nondata_skb() on non fast clones, since it adds
> unnecessary writes and clears.
>
> tcp_make_synack() really has no business using tcp_init_nondata_skb()
> and could simply set th->seq = htonl(tcp_rsk(req)->snt_isn);
Something like :
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 69cfdead0cb49e4365158048a0d1a9bbdd55fa83..5502abc5307f0ce1de610d4b70f3a59c4d5383c5 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3399,13 +3399,8 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
tcp_ecn_make_synack(req, th);
th->source = htons(ireq->ir_num);
th->dest = ireq->ir_rmt_port;
- /* Setting of flags are superfluous here for callers (and ECE is
- * not even correctly set)
- */
- tcp_init_nondata_skb(skb, tcp_rsk(req)->snt_isn,
- TCPHDR_SYN | TCPHDR_ACK);
-
- th->seq = htonl(TCP_SKB_CB(skb)->seq);
+ skb->ip_summed = CHECKSUM_PARTIAL;
+ th->seq = htonl(tcp_rsk(req)->snt_isn);
/* XXX data is queued and acked as is. No buffer/window check */
th->ack_seq = htonl(tcp_rsk(req)->rcv_nxt);
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH net] tcp: Always cleanup skb before sending
2017-11-01 21:53 ` Eric Dumazet
@ 2017-11-02 0:10 ` Christoph Paasch
2017-11-02 1:00 ` Eric Dumazet
0 siblings, 1 reply; 13+ messages in thread
From: Christoph Paasch @ 2017-11-02 0:10 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Eric Dumazet, David Miller, netdev
On 01/11/17 - 14:53:38, Eric Dumazet wrote:
> On Wed, 2017-11-01 at 14:32 -0700, Eric Dumazet wrote:
> > On Wed, Nov 1, 2017 at 2:10 PM, Christoph Paasch <cpaasch@apple.com> wrote:
> > > Since commit 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache
> > > line misses") IPCB/IP6CB is no more at the beginning of the tcp_skb_cb.
> > > This means that on the output path, we need to make sure that it has
> > > been correctly initialized to 0, as is done in tcp_transmit_skb.
> > >
> > > However, when going through the other code-path in TCP that can send an
> > > skb (e.g., through tcp_v6_send_synack), we end up in a situation where
> > > IP6CB has some of its fields set to unexpected values. Depending on the
> > > layout of tcp_skb_cb across the different kernel-versions this can be
> > > lastopt, flags,...
> >
> > Or not use tcp_init_nondata_skb() on non fast clones, since it adds
> > unnecessary writes and clears.
> >
> > tcp_make_synack() really has no business using tcp_init_nondata_skb()
> > and could simply set th->seq = htonl(tcp_rsk(req)->snt_isn);
>
> Something like :
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 69cfdead0cb49e4365158048a0d1a9bbdd55fa83..5502abc5307f0ce1de610d4b70f3a59c4d5383c5 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -3399,13 +3399,8 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
> tcp_ecn_make_synack(req, th);
> th->source = htons(ireq->ir_num);
> th->dest = ireq->ir_rmt_port;
> - /* Setting of flags are superfluous here for callers (and ECE is
> - * not even correctly set)
> - */
> - tcp_init_nondata_skb(skb, tcp_rsk(req)->snt_isn,
> - TCPHDR_SYN | TCPHDR_ACK);
> -
> - th->seq = htonl(TCP_SKB_CB(skb)->seq);
> + skb->ip_summed = CHECKSUM_PARTIAL;
> + th->seq = htonl(tcp_rsk(req)->snt_isn);
> /* XXX data is queued and acked as is. No buffer/window check */
> th->ack_seq = htonl(tcp_rsk(req)->rcv_nxt);
Yes, that looks good to me. Thanks!
But we still need to clean up the skb in tcp_v4_send_reset and
tcp_v4_send_ack, as the fields (e.g., tcp_skb_cb->seq) have been set when
coming from tcp_v4_rcv.
Christoph
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net] tcp: Always cleanup skb before sending
2017-11-02 0:10 ` Christoph Paasch
@ 2017-11-02 1:00 ` Eric Dumazet
2017-11-02 2:21 ` Eric Dumazet
2017-11-02 18:16 ` [PATCH net] tcp: Always cleanup skb before sending Christoph Paasch
0 siblings, 2 replies; 13+ messages in thread
From: Eric Dumazet @ 2017-11-02 1:00 UTC (permalink / raw)
To: Christoph Paasch; +Cc: Eric Dumazet, David Miller, netdev
On Wed, 2017-11-01 at 17:10 -0700, Christoph Paasch wrote:
> Yes, that looks good to me. Thanks!
>
> But we still need to clean up the skb in tcp_v4_send_reset and
> tcp_v4_send_ack, as the fields (e.g., tcp_skb_cb->seq) have been set when
> coming from tcp_v4_rcv.
You might be confused : ip_send_unicast_reply() does not send back the
incoming skb.
A fresh skb is allocated, then appended/sent.
And commit 24a2d43d8886f5a29c did the changes to provide to
__ip_options_echo() the proper IPCB header location.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net] tcp: Always cleanup skb before sending
2017-11-02 1:00 ` Eric Dumazet
@ 2017-11-02 2:21 ` Eric Dumazet
2017-11-02 18:24 ` Christoph Paasch
2017-11-02 19:30 ` [PATCH net] tcp: do not mangle skb->cb[] in tcp_make_synack() Eric Dumazet
2017-11-02 18:16 ` [PATCH net] tcp: Always cleanup skb before sending Christoph Paasch
1 sibling, 2 replies; 13+ messages in thread
From: Eric Dumazet @ 2017-11-02 2:21 UTC (permalink / raw)
To: Christoph Paasch; +Cc: Eric Dumazet, David Miller, netdev
On Wed, 2017-11-01 at 18:00 -0700, Eric Dumazet wrote:
> On Wed, 2017-11-01 at 17:10 -0700, Christoph Paasch wrote:
>
> > Yes, that looks good to me. Thanks!
> >
> > But we still need to clean up the skb in tcp_v4_send_reset and
> > tcp_v4_send_ack, as the fields (e.g., tcp_skb_cb->seq) have been set when
> > coming from tcp_v4_rcv.
>
> You might be confused : ip_send_unicast_reply() does not send back the
> incoming skb.
>
> A fresh skb is allocated, then appended/sent.
>
> And commit 24a2d43d8886f5a29c did the changes to provide to
> __ip_options_echo() the proper IPCB header location.
>
More details :
Fields written by tcp_init_nondata_skb() on the synack packet :
->seq (32bits) at offset 0 of skb->cb[]
->end_seq (32bits) at offset 4 of skb->cb[]
->tcp_gso_segs (16bits) at offset 8
->tcp_flags (8bits) at offset 12 value (TCPHDR_SYN | TCPHDR_ACK ->
0x12)
->sacked (8bits) at offset 13
IPCB fields sharing these 14 bytes :
iif /* 32bits, offset 0 */
opt.faddr (32bits) offset 4
opt.nexthop (32bits) offset 8 /* value 1 */
opt.optlen (8bits) offset 12 /* value 0x12 */
opt.srr (8bits) offset 13
IP6CB fields sharing these 14 bytes :
iif /* 32bits, offset 0 */
ra /* 16 bits, offset 4 */
dst0 /* 16 bits offset 6 */
srcrt /* 16 bits offset 8 */ -> 0x0001
dst1 /* 16 bits offset 10 */ (not mangled -> 0)
lastopt /* 16 bits offset 12 */ -> 0x12
At xmit :
IPV4 uses ip_build_and_send_pkt() to transmit the SYNACK, so skb->cb[]
is not used.
IPv6 uses other fields.
So I really wonder what exact issue you observed, please share your
drugs ;)
Thanks !
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net] tcp: Always cleanup skb before sending
2017-11-02 1:00 ` Eric Dumazet
2017-11-02 2:21 ` Eric Dumazet
@ 2017-11-02 18:16 ` Christoph Paasch
1 sibling, 0 replies; 13+ messages in thread
From: Christoph Paasch @ 2017-11-02 18:16 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Eric Dumazet, David Miller, netdev
On 01/11/17 - 18:00:10, Eric Dumazet wrote:
> On Wed, 2017-11-01 at 17:10 -0700, Christoph Paasch wrote:
>
> > Yes, that looks good to me. Thanks!
> >
> > But we still need to clean up the skb in tcp_v4_send_reset and
> > tcp_v4_send_ack, as the fields (e.g., tcp_skb_cb->seq) have been set when
> > coming from tcp_v4_rcv.
>
> You might be confused : ip_send_unicast_reply() does not send back the
> incoming skb.
>
> A fresh skb is allocated, then appended/sent.
>
> And commit 24a2d43d8886f5a29c did the changes to provide to
> __ip_options_echo() the proper IPCB header location.
Yes, sorry I misunderstood ip_send_unicast_reply(). Didn't see the
allocation of the new skb.
Christoph
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net] tcp: Always cleanup skb before sending
2017-11-02 2:21 ` Eric Dumazet
@ 2017-11-02 18:24 ` Christoph Paasch
2017-11-02 18:26 ` Eric Dumazet
2017-11-02 19:30 ` [PATCH net] tcp: do not mangle skb->cb[] in tcp_make_synack() Eric Dumazet
1 sibling, 1 reply; 13+ messages in thread
From: Christoph Paasch @ 2017-11-02 18:24 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Eric Dumazet, David Miller, netdev
Hi Eric,
On 01/11/17 - 19:21:31, Eric Dumazet wrote:
> On Wed, 2017-11-01 at 18:00 -0700, Eric Dumazet wrote:
> > On Wed, 2017-11-01 at 17:10 -0700, Christoph Paasch wrote:
> >
> > > Yes, that looks good to me. Thanks!
> > >
> > > But we still need to clean up the skb in tcp_v4_send_reset and
> > > tcp_v4_send_ack, as the fields (e.g., tcp_skb_cb->seq) have been set when
> > > coming from tcp_v4_rcv.
> >
> > You might be confused : ip_send_unicast_reply() does not send back the
> > incoming skb.
> >
> > A fresh skb is allocated, then appended/sent.
> >
> > And commit 24a2d43d8886f5a29c did the changes to provide to
> > __ip_options_echo() the proper IPCB header location.
> >
>
> More details :
>
> Fields written by tcp_init_nondata_skb() on the synack packet :
>
> ->seq (32bits) at offset 0 of skb->cb[]
> ->end_seq (32bits) at offset 4 of skb->cb[]
> ->tcp_gso_segs (16bits) at offset 8
> ->tcp_flags (8bits) at offset 12 value (TCPHDR_SYN | TCPHDR_ACK ->
> 0x12)
In my build I see tcp_flags at offset 16, because ktime_t is 64 bits.
> ->sacked (8bits) at offset 13
>
> IPCB fields sharing these 14 bytes :
>
> iif /* 32bits, offset 0 */
> opt.faddr (32bits) offset 4
> opt.nexthop (32bits) offset 8 /* value 1 */
> opt.optlen (8bits) offset 12 /* value 0x12 */
> opt.srr (8bits) offset 13
>
> IP6CB fields sharing these 14 bytes :
>
> iif /* 32bits, offset 0 */
> ra /* 16 bits, offset 4 */
> dst0 /* 16 bits offset 6 */
> srcrt /* 16 bits offset 8 */ -> 0x0001
> dst1 /* 16 bits offset 10 */ (not mangled -> 0)
> lastopt /* 16 bits offset 12 */ -> 0x12
Thus, because what I mention above, we are writing here into flags which sits
at offset 16.
So, IP6SKB_FORWARDED and IP6SKB_FRAGMENTED will be set on the outgoing skb.
Especially for IP6SKB_FORWARDED, which is checked at ip6_finish_output2() I
am concerned. Even in general, having these fields set looks brittle to me.
What do you think?
> At xmit :
>
> IPV4 uses ip_build_and_send_pkt() to transmit the SYNACK, so skb->cb[]
> is not used.
>
> IPv6 uses other fields.
>
> So I really wonder what exact issue you observed, please share your
> drugs ;)
No drugs, only belgian beer :)
Christoph
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net] tcp: Always cleanup skb before sending
2017-11-02 18:24 ` Christoph Paasch
@ 2017-11-02 18:26 ` Eric Dumazet
2017-11-02 18:31 ` Christoph Paasch
0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2017-11-02 18:26 UTC (permalink / raw)
To: Christoph Paasch; +Cc: Eric Dumazet, David Miller, netdev
On Thu, Nov 2, 2017 at 11:24 AM, Christoph Paasch <cpaasch@apple.com> wrote:
> Hi Eric,
>
> On 01/11/17 - 19:21:31, Eric Dumazet wrote:
>> On Wed, 2017-11-01 at 18:00 -0700, Eric Dumazet wrote:
>> > On Wed, 2017-11-01 at 17:10 -0700, Christoph Paasch wrote:
>> >
>> > > Yes, that looks good to me. Thanks!
>> > >
>> > > But we still need to clean up the skb in tcp_v4_send_reset and
>> > > tcp_v4_send_ack, as the fields (e.g., tcp_skb_cb->seq) have been set when
>> > > coming from tcp_v4_rcv.
>> >
>> > You might be confused : ip_send_unicast_reply() does not send back the
>> > incoming skb.
>> >
>> > A fresh skb is allocated, then appended/sent.
>> >
>> > And commit 24a2d43d8886f5a29c did the changes to provide to
>> > __ip_options_echo() the proper IPCB header location.
>> >
>>
>> More details :
>>
>> Fields written by tcp_init_nondata_skb() on the synack packet :
>>
>> ->seq (32bits) at offset 0 of skb->cb[]
>> ->end_seq (32bits) at offset 4 of skb->cb[]
>> ->tcp_gso_segs (16bits) at offset 8
>> ->tcp_flags (8bits) at offset 12 value (TCPHDR_SYN | TCPHDR_ACK ->
>> 0x12)
>
> In my build I see tcp_flags at offset 16, because ktime_t is 64 bits.
Yes, but this ktime was removed in net-next for my rb-tree work.
>
>> ->sacked (8bits) at offset 13
>>
>> IPCB fields sharing these 14 bytes :
>>
>> iif /* 32bits, offset 0 */
>> opt.faddr (32bits) offset 4
>> opt.nexthop (32bits) offset 8 /* value 1 */
>> opt.optlen (8bits) offset 12 /* value 0x12 */
>> opt.srr (8bits) offset 13
>>
>> IP6CB fields sharing these 14 bytes :
>>
>> iif /* 32bits, offset 0 */
>> ra /* 16 bits, offset 4 */
>> dst0 /* 16 bits offset 6 */
>> srcrt /* 16 bits offset 8 */ -> 0x0001
>> dst1 /* 16 bits offset 10 */ (not mangled -> 0)
>> lastopt /* 16 bits offset 12 */ -> 0x12
>
> Thus, because what I mention above, we are writing here into flags which sits
> at offset 16.
>
> So, IP6SKB_FORWARDED and IP6SKB_FRAGMENTED will be set on the outgoing skb.
> Especially for IP6SKB_FORWARDED, which is checked at ip6_finish_output2() I
> am concerned. Even in general, having these fields set looks brittle to me.
>
> What do you think?
I think I will submit my patch, which should clear the issue, without
breaking IPv4 options handling as your patch did ;)
Thanks !
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net] tcp: Always cleanup skb before sending
2017-11-02 18:26 ` Eric Dumazet
@ 2017-11-02 18:31 ` Christoph Paasch
0 siblings, 0 replies; 13+ messages in thread
From: Christoph Paasch @ 2017-11-02 18:31 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Eric Dumazet, David Miller, netdev
On 02/11/17 - 11:26:21, Eric Dumazet wrote:
> On Thu, Nov 2, 2017 at 11:24 AM, Christoph Paasch <cpaasch@apple.com> wrote:
> > Hi Eric,
> >
> > On 01/11/17 - 19:21:31, Eric Dumazet wrote:
> >> On Wed, 2017-11-01 at 18:00 -0700, Eric Dumazet wrote:
> >> > On Wed, 2017-11-01 at 17:10 -0700, Christoph Paasch wrote:
> >> >
> >> > > Yes, that looks good to me. Thanks!
> >> > >
> >> > > But we still need to clean up the skb in tcp_v4_send_reset and
> >> > > tcp_v4_send_ack, as the fields (e.g., tcp_skb_cb->seq) have been set when
> >> > > coming from tcp_v4_rcv.
> >> >
> >> > You might be confused : ip_send_unicast_reply() does not send back the
> >> > incoming skb.
> >> >
> >> > A fresh skb is allocated, then appended/sent.
> >> >
> >> > And commit 24a2d43d8886f5a29c did the changes to provide to
> >> > __ip_options_echo() the proper IPCB header location.
> >> >
> >>
> >> More details :
> >>
> >> Fields written by tcp_init_nondata_skb() on the synack packet :
> >>
> >> ->seq (32bits) at offset 0 of skb->cb[]
> >> ->end_seq (32bits) at offset 4 of skb->cb[]
> >> ->tcp_gso_segs (16bits) at offset 8
> >> ->tcp_flags (8bits) at offset 12 value (TCPHDR_SYN | TCPHDR_ACK ->
> >> 0x12)
> >
> > In my build I see tcp_flags at offset 16, because ktime_t is 64 bits.
>
> Yes, but this ktime was removed in net-next for my rb-tree work.
>
> >
> >> ->sacked (8bits) at offset 13
> >>
> >> IPCB fields sharing these 14 bytes :
> >>
> >> iif /* 32bits, offset 0 */
> >> opt.faddr (32bits) offset 4
> >> opt.nexthop (32bits) offset 8 /* value 1 */
> >> opt.optlen (8bits) offset 12 /* value 0x12 */
> >> opt.srr (8bits) offset 13
> >>
> >> IP6CB fields sharing these 14 bytes :
> >>
> >> iif /* 32bits, offset 0 */
> >> ra /* 16 bits, offset 4 */
> >> dst0 /* 16 bits offset 6 */
> >> srcrt /* 16 bits offset 8 */ -> 0x0001
> >> dst1 /* 16 bits offset 10 */ (not mangled -> 0)
> >> lastopt /* 16 bits offset 12 */ -> 0x12
> >
> > Thus, because what I mention above, we are writing here into flags which sits
> > at offset 16.
> >
> > So, IP6SKB_FORWARDED and IP6SKB_FRAGMENTED will be set on the outgoing skb.
> > Especially for IP6SKB_FORWARDED, which is checked at ip6_finish_output2() I
> > am concerned. Even in general, having these fields set looks brittle to me.
> >
> > What do you think?
>
> I think I will submit my patch, which should clear the issue, without
> breaking IPv4 options handling as your patch did ;)
Sounds good! :)
Thanks for your feedback.
Christoph
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH net] tcp: do not mangle skb->cb[] in tcp_make_synack()
2017-11-02 2:21 ` Eric Dumazet
2017-11-02 18:24 ` Christoph Paasch
@ 2017-11-02 19:30 ` Eric Dumazet
2017-11-02 19:49 ` Christoph Paasch
2017-11-03 5:32 ` David Miller
1 sibling, 2 replies; 13+ messages in thread
From: Eric Dumazet @ 2017-11-02 19:30 UTC (permalink / raw)
To: Christoph Paasch; +Cc: Eric Dumazet, David Miller, netdev
From: Eric Dumazet <edumazet@google.com>
Christoph Paasch sent a patch to address the following issue :
tcp_make_synack() is leaving some TCP private info in skb->cb[],
then send the packet by other means than tcp_transmit_skb()
tcp_transmit_skb() makes sure to clear skb->cb[] to not confuse
IPv4/IPV6 stacks, but we have no such cleanup for SYNACK.
tcp_make_synack() should not use tcp_init_nondata_skb() :
tcp_init_nondata_skb() really should be limited to skbs put in write/rtx
queues (the ones that are only sent via tcp_transmit_skb())
This patch fixes the issue and should even save few cpu cycles ;)
Fixes: 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Christoph Paasch <cpaasch@apple.com>
---
net/ipv4/tcp_output.c | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 823003eef3a21a5cc5c27e0be9f46159afa060df..478909f4694d00076c96b7a3be1eda62b6be8bef 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3180,13 +3180,8 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
th->source = htons(ireq->ir_num);
th->dest = ireq->ir_rmt_port;
skb->mark = ireq->ir_mark;
- /* Setting of flags are superfluous here for callers (and ECE is
- * not even correctly set)
- */
- tcp_init_nondata_skb(skb, tcp_rsk(req)->snt_isn,
- TCPHDR_SYN | TCPHDR_ACK);
-
- th->seq = htonl(TCP_SKB_CB(skb)->seq);
+ skb->ip_summed = CHECKSUM_PARTIAL;
+ th->seq = htonl(tcp_rsk(req)->snt_isn);
/* XXX data is queued and acked as is. No buffer/window check */
th->ack_seq = htonl(tcp_rsk(req)->rcv_nxt);
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH net] tcp: do not mangle skb->cb[] in tcp_make_synack()
2017-11-02 19:30 ` [PATCH net] tcp: do not mangle skb->cb[] in tcp_make_synack() Eric Dumazet
@ 2017-11-02 19:49 ` Christoph Paasch
2017-11-03 5:32 ` David Miller
1 sibling, 0 replies; 13+ messages in thread
From: Christoph Paasch @ 2017-11-02 19:49 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Eric Dumazet, David Miller, netdev
On 02/11/17 - 12:30:25, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> Christoph Paasch sent a patch to address the following issue :
>
> tcp_make_synack() is leaving some TCP private info in skb->cb[],
> then send the packet by other means than tcp_transmit_skb()
>
> tcp_transmit_skb() makes sure to clear skb->cb[] to not confuse
> IPv4/IPV6 stacks, but we have no such cleanup for SYNACK.
>
> tcp_make_synack() should not use tcp_init_nondata_skb() :
>
> tcp_init_nondata_skb() really should be limited to skbs put in write/rtx
> queues (the ones that are only sent via tcp_transmit_skb())
>
> This patch fixes the issue and should even save few cpu cycles ;)
>
> Fixes: 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Christoph Paasch <cpaasch@apple.com>
> ---
> net/ipv4/tcp_output.c | 9 ++-------
> 1 file changed, 2 insertions(+), 7 deletions(-)
Reviewed-by: Christoph Paasch <cpaasch@apple.com>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net] tcp: do not mangle skb->cb[] in tcp_make_synack()
2017-11-02 19:30 ` [PATCH net] tcp: do not mangle skb->cb[] in tcp_make_synack() Eric Dumazet
2017-11-02 19:49 ` Christoph Paasch
@ 2017-11-03 5:32 ` David Miller
1 sibling, 0 replies; 13+ messages in thread
From: David Miller @ 2017-11-03 5:32 UTC (permalink / raw)
To: eric.dumazet; +Cc: cpaasch, edumazet, netdev
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 02 Nov 2017 12:30:25 -0700
> From: Eric Dumazet <edumazet@google.com>
>
> Christoph Paasch sent a patch to address the following issue :
>
> tcp_make_synack() is leaving some TCP private info in skb->cb[],
> then send the packet by other means than tcp_transmit_skb()
>
> tcp_transmit_skb() makes sure to clear skb->cb[] to not confuse
> IPv4/IPV6 stacks, but we have no such cleanup for SYNACK.
>
> tcp_make_synack() should not use tcp_init_nondata_skb() :
>
> tcp_init_nondata_skb() really should be limited to skbs put in write/rtx
> queues (the ones that are only sent via tcp_transmit_skb())
>
> This patch fixes the issue and should even save few cpu cycles ;)
>
> Fixes: 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Christoph Paasch <cpaasch@apple.com>
Applied and queued up for -stable.
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2017-11-03 5:32 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-11-01 21:10 [PATCH net] tcp: Always cleanup skb before sending Christoph Paasch
2017-11-01 21:32 ` Eric Dumazet
2017-11-01 21:53 ` Eric Dumazet
2017-11-02 0:10 ` Christoph Paasch
2017-11-02 1:00 ` Eric Dumazet
2017-11-02 2:21 ` Eric Dumazet
2017-11-02 18:24 ` Christoph Paasch
2017-11-02 18:26 ` Eric Dumazet
2017-11-02 18:31 ` Christoph Paasch
2017-11-02 19:30 ` [PATCH net] tcp: do not mangle skb->cb[] in tcp_make_synack() Eric Dumazet
2017-11-02 19:49 ` Christoph Paasch
2017-11-03 5:32 ` David Miller
2017-11-02 18:16 ` [PATCH net] tcp: Always cleanup skb before sending Christoph Paasch
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).