* [PATCH net-next 0/4] IP: cleanup LSRR option processing
@ 2017-08-03 16:07 Paolo Abeni
2017-08-03 16:07 ` [PATCH net-next 1/4] IP: do not modify ingress packet IP option in ip_options_echo() Paolo Abeni
` (4 more replies)
0 siblings, 5 replies; 10+ messages in thread
From: Paolo Abeni @ 2017-08-03 16:07 UTC (permalink / raw)
To: netdev; +Cc: David S. Miller, Eric Dumazet, Hannes Frederic Sowa
The __ip_options_echo() function expect a valid dst entry in skb->dst;
as result we sometimes need to preserve the dst entry for the whole IP
RX path.
The current usage of skb->dst looks more a relic from ancient past that
a real functional constraint. This patchset tries to remove such usage,
and than drops some hacks currently in place in the IP code to keep
skb->dst around.
__ip_options_echo() uses of skb->dst for two different purposes: retrieving
the netns assicated with the skb, and modify the ingress packet LSRR address
list.
The first patch removes the code modifying the ingress packet, and the second
one provides an explicit netns argument to __ip_options_echo(). The following
patches cleanup the current code keeping arund skb->dst for __ip_options_echo's
sake.
Updating the __ip_options_echo() function has been previously discussed here:
http://marc.info/?l=linux-netdev&m=150064533516348&w=2
Paolo Abeni (4):
IP: do not modify ingress packet IP option in ip_options_echo()
ip/options: explicitly provide net ns to __ip_options_echo()
Revert "ipv4: keep skb->dst around in presence of IP options"
udp: no need to preserve skb->dst
include/net/ip.h | 9 +++++----
include/net/tcp.h | 5 +++--
net/ipv4/icmp.c | 4 ++--
net/ipv4/ip_options.c | 9 +++------
net/ipv4/ip_output.c | 2 +-
net/ipv4/ip_sockglue.c | 16 +++++-----------
net/ipv4/syncookies.c | 2 +-
net/ipv4/tcp_ipv4.c | 2 +-
net/ipv4/udp.c | 13 +++++--------
9 files changed, 26 insertions(+), 36 deletions(-)
--
2.13.3
^ permalink raw reply [flat|nested] 10+ messages in thread* [PATCH net-next 1/4] IP: do not modify ingress packet IP option in ip_options_echo() 2017-08-03 16:07 [PATCH net-next 0/4] IP: cleanup LSRR option processing Paolo Abeni @ 2017-08-03 16:07 ` Paolo Abeni 2017-08-03 16:07 ` [PATCH net-next 2/4] ip/options: explicitly provide net ns to __ip_options_echo() Paolo Abeni ` (3 subsequent siblings) 4 siblings, 0 replies; 10+ messages in thread From: Paolo Abeni @ 2017-08-03 16:07 UTC (permalink / raw) To: netdev; +Cc: David S. Miller, Eric Dumazet, Hannes Frederic Sowa While computing the response option set for LSRR, ip_options_echo() also changes the ingress packet LSRR addresses list, setting the last one to the dst specific address for the ingress packet - via memset(start[ ... The only visible effect of such change - beyond possibly damaging shared/cloned skbs - is modifying the data carried by ICMP replies changing the header information for reported the ingress packet, which violates RFC1122 3.2.2.6. All the others call sites just ignore the ingress packet IP options after calling ip_options_echo() Note that the last element in the LSRR option address list for the reply packet will be properly set later in the ip output path via ip_options_build(). This buggy memset() predates git history and apparently was present into the initial ip_options_echo() implementation in linux 1.3.30 but still looks wrong. The removal of the fib_compute_spec_dst() call will help completely dropping the skb->dst usage by __ip_options_echo() with a later patch. Signed-off-by: Paolo Abeni <pabeni@redhat.com> --- net/ipv4/ip_options.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/net/ipv4/ip_options.c b/net/ipv4/ip_options.c index 93157f2f4758..fdda97308c0b 100644 --- a/net/ipv4/ip_options.c +++ b/net/ipv4/ip_options.c @@ -174,9 +174,6 @@ int __ip_options_echo(struct ip_options *dopt, struct sk_buff *skb, doffset -= 4; } if (doffset > 3) { - __be32 daddr = fib_compute_spec_dst(skb); - - memcpy(&start[doffset-1], &daddr, 4); dopt->faddr = faddr; dptr[0] = start[0]; dptr[1] = doffset+3; -- 2.13.3 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH net-next 2/4] ip/options: explicitly provide net ns to __ip_options_echo() 2017-08-03 16:07 [PATCH net-next 0/4] IP: cleanup LSRR option processing Paolo Abeni 2017-08-03 16:07 ` [PATCH net-next 1/4] IP: do not modify ingress packet IP option in ip_options_echo() Paolo Abeni @ 2017-08-03 16:07 ` Paolo Abeni 2017-09-05 17:18 ` Eric Dumazet 2017-08-03 16:07 ` [PATCH net-next 3/4] Revert "ipv4: keep skb->dst around in presence of IP options" Paolo Abeni ` (2 subsequent siblings) 4 siblings, 1 reply; 10+ messages in thread From: Paolo Abeni @ 2017-08-03 16:07 UTC (permalink / raw) To: netdev; +Cc: David S. Miller, Eric Dumazet, Hannes Frederic Sowa __ip_options_echo() uses the current network namespace, and currently retrives it via skb->dst->dev. This commit adds an explicit 'net' argument to __ip_options_echo() and update all the call sites to provide it, usually via a simpler sock_net(). After this change, __ip_options_echo() no more needs to access skb->dst and we can drop a couple of hack to preserve such info in the rx path. Signed-off-by: Paolo Abeni <pabeni@redhat.com> --- include/net/ip.h | 9 +++++---- include/net/tcp.h | 5 +++-- net/ipv4/icmp.c | 4 ++-- net/ipv4/ip_options.c | 6 +++--- net/ipv4/ip_output.c | 2 +- net/ipv4/ip_sockglue.c | 7 ++++--- net/ipv4/syncookies.c | 2 +- net/ipv4/tcp_ipv4.c | 2 +- 8 files changed, 20 insertions(+), 17 deletions(-) diff --git a/include/net/ip.h b/include/net/ip.h index 821cedcc8e73..9e59dcf1787a 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -567,11 +567,12 @@ int ip_forward(struct sk_buff *skb); void ip_options_build(struct sk_buff *skb, struct ip_options *opt, __be32 daddr, struct rtable *rt, int is_frag); -int __ip_options_echo(struct ip_options *dopt, struct sk_buff *skb, - const struct ip_options *sopt); -static inline int ip_options_echo(struct ip_options *dopt, struct sk_buff *skb) +int __ip_options_echo(struct net *net, struct ip_options *dopt, + struct sk_buff *skb, const struct ip_options *sopt); +static inline int ip_options_echo(struct net *net, struct ip_options *dopt, + struct sk_buff *skb) { - return __ip_options_echo(dopt, skb, &IPCB(skb)->opt); + return __ip_options_echo(net, dopt, skb, &IPCB(skb)->opt); } void ip_options_fragment(struct sk_buff *skb); diff --git a/include/net/tcp.h b/include/net/tcp.h index bb1881b4ce48..5173fecde495 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1885,7 +1885,8 @@ extern void tcp_rack_reo_timeout(struct sock *sk); /* * Save and compile IPv4 options, return a pointer to it */ -static inline struct ip_options_rcu *tcp_v4_save_options(struct sk_buff *skb) +static inline struct ip_options_rcu *tcp_v4_save_options(struct net *net, + struct sk_buff *skb) { const struct ip_options *opt = &TCP_SKB_CB(skb)->header.h4.opt; struct ip_options_rcu *dopt = NULL; @@ -1894,7 +1895,7 @@ static inline struct ip_options_rcu *tcp_v4_save_options(struct sk_buff *skb) int opt_size = sizeof(*dopt) + opt->optlen; dopt = kmalloc(opt_size, GFP_ATOMIC); - if (dopt && __ip_options_echo(&dopt->opt, skb, opt)) { + if (dopt && __ip_options_echo(net, &dopt->opt, skb, opt)) { kfree(dopt); dopt = NULL; } diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c index c2be26b98b5f..681e33998e03 100644 --- a/net/ipv4/icmp.c +++ b/net/ipv4/icmp.c @@ -412,7 +412,7 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb) int type = icmp_param->data.icmph.type; int code = icmp_param->data.icmph.code; - if (ip_options_echo(&icmp_param->replyopts.opt.opt, skb)) + if (ip_options_echo(net, &icmp_param->replyopts.opt.opt, skb)) return; /* Needed by both icmp_global_allow and icmp_xmit_lock */ @@ -694,7 +694,7 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info) iph->tos; mark = IP4_REPLY_MARK(net, skb_in->mark); - if (ip_options_echo(&icmp_param.replyopts.opt.opt, skb_in)) + if (ip_options_echo(net, &icmp_param.replyopts.opt.opt, skb_in)) goto out_unlock; diff --git a/net/ipv4/ip_options.c b/net/ipv4/ip_options.c index fdda97308c0b..525ae88d1e58 100644 --- a/net/ipv4/ip_options.c +++ b/net/ipv4/ip_options.c @@ -86,8 +86,8 @@ void ip_options_build(struct sk_buff *skb, struct ip_options *opt, * NOTE: dopt cannot point to skb. */ -int __ip_options_echo(struct ip_options *dopt, struct sk_buff *skb, - const struct ip_options *sopt) +int __ip_options_echo(struct net *net, struct ip_options *dopt, + struct sk_buff *skb, const struct ip_options *sopt) { unsigned char *sptr, *dptr; int soffset, doffset; @@ -140,7 +140,7 @@ int __ip_options_echo(struct ip_options *dopt, struct sk_buff *skb, __be32 addr; memcpy(&addr, dptr+soffset-1, 4); - if (inet_addr_type(dev_net(skb_dst(skb)->dev), addr) != RTN_UNICAST) { + if (inet_addr_type(net, addr) != RTN_UNICAST) { dopt->ts_needtime = 1; soffset += 8; } diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index b631ec685d77..73b0b15245b6 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -1525,7 +1525,7 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb, int err; int oif; - if (__ip_options_echo(&replyopts.opt.opt, skb, sopt)) + if (__ip_options_echo(net, &replyopts.opt.opt, skb, sopt)) return; ipc.addr = daddr; diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c index ecc4b4a2413e..1c3354d028a4 100644 --- a/net/ipv4/ip_sockglue.c +++ b/net/ipv4/ip_sockglue.c @@ -80,7 +80,8 @@ static void ip_cmsg_recv_opts(struct msghdr *msg, struct sk_buff *skb) } -static void ip_cmsg_recv_retopts(struct msghdr *msg, struct sk_buff *skb) +static void ip_cmsg_recv_retopts(struct net *net, struct msghdr *msg, + struct sk_buff *skb) { unsigned char optbuf[sizeof(struct ip_options) + 40]; struct ip_options *opt = (struct ip_options *)optbuf; @@ -88,7 +89,7 @@ static void ip_cmsg_recv_retopts(struct msghdr *msg, struct sk_buff *skb) if (IPCB(skb)->opt.optlen == 0) return; - if (ip_options_echo(opt, skb)) { + if (ip_options_echo(net, opt, skb)) { msg->msg_flags |= MSG_CTRUNC; return; } @@ -204,7 +205,7 @@ void ip_cmsg_recv_offset(struct msghdr *msg, struct sock *sk, } if (flags & IP_CMSG_RETOPTS) { - ip_cmsg_recv_retopts(msg, skb); + ip_cmsg_recv_retopts(sock_net(sk), msg, skb); flags &= ~IP_CMSG_RETOPTS; if (!flags) diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c index 03ad8778c395..b1bb1b3a1082 100644 --- a/net/ipv4/syncookies.c +++ b/net/ipv4/syncookies.c @@ -355,7 +355,7 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb) /* We throwed the options of the initial SYN away, so we hope * the ACK carries the same options again (see RFC1122 4.2.3.8) */ - ireq->opt = tcp_v4_save_options(skb); + ireq->opt = tcp_v4_save_options(sock_net(sk), skb); if (security_inet_conn_request(sk, skb, req)) { reqsk_free(req); diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 9b51663cd5a4..5f708c85110e 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1267,7 +1267,7 @@ static void tcp_v4_init_req(struct request_sock *req, sk_rcv_saddr_set(req_to_sk(req), ip_hdr(skb)->daddr); sk_daddr_set(req_to_sk(req), ip_hdr(skb)->saddr); - ireq->opt = tcp_v4_save_options(skb); + ireq->opt = tcp_v4_save_options(sock_net(sk_listener), skb); } static struct dst_entry *tcp_v4_route_req(const struct sock *sk, -- 2.13.3 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH net-next 2/4] ip/options: explicitly provide net ns to __ip_options_echo() 2017-08-03 16:07 ` [PATCH net-next 2/4] ip/options: explicitly provide net ns to __ip_options_echo() Paolo Abeni @ 2017-09-05 17:18 ` Eric Dumazet 2017-09-05 21:03 ` Paolo Abeni 2017-09-06 12:44 ` [PATCH net] udp: drop head states only when all skb references are gone Paolo Abeni 0 siblings, 2 replies; 10+ messages in thread From: Eric Dumazet @ 2017-09-05 17:18 UTC (permalink / raw) To: Paolo Abeni, David S. Miller; +Cc: netdev, Eric Dumazet, Hannes Frederic Sowa On Thu, 2017-08-03 at 18:07 +0200, Paolo Abeni wrote: > __ip_options_echo() uses the current network namespace, and > currently retrives it via skb->dst->dev. > > This commit adds an explicit 'net' argument to __ip_options_echo() > and update all the call sites to provide it, usually via a simpler > sock_net(). > > After this change, __ip_options_echo() no more needs to access > skb->dst and we can drop a couple of hack to preserve such > info in the rx path. > > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > --- David, Paolo This commit (91ed1e666a4ea2e260452a7d7d311ac5ae852cba "ip/options: explicitly provide net ns to __ip_options_echo()") needs to be backported to linux-4.13 stable version to avoid these kind of crashes [1] This is because of MSG_PEEK operation, hitting skb_consume_udp() while skb is still in receive queue. Next read() finding again the skb then can see a NULL skb->dst Thanks ! [1] general protection fault: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 0 PID: 3017 Comm: syzkaller446772 Not tainted 4.13.0+ #68 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 task: ffff8801cd0a4380 task.stack: ffff8801cc498000 RIP: 0010:__ip_options_echo+0xea8/0x1430 net/ipv4/ip_options.c:143 RSP: 0018:ffff8801cc49f628 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: ffff8801cc49f928 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000004 RBP: ffff8801cc49f6b8 R08: ffff8801cc49f936 R09: ffffed0039893f28 R10: 0000000000000003 R11: ffffed0039893f27 R12: ffff8801cc49f918 R13: ffff8801ccbcf36c R14: 000000000000000f R15: 0000000000000018 FS: 0000000000979880(0000) GS:ffff8801db200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000200c0ff0 CR3: 00000001cc4ed000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ip_options_echo include/net/ip.h:574 [inline] ip_cmsg_recv_retopts net/ipv4/ip_sockglue.c:91 [inline] ip_cmsg_recv_offset+0xa17/0x1280 net/ipv4/ip_sockglue.c:207 udp_recvmsg+0xe0b/0x1260 net/ipv4/udp.c:1641 inet_recvmsg+0x14c/0x5f0 net/ipv4/af_inet.c:793 sock_recvmsg_nosec net/socket.c:792 [inline] sock_recvmsg+0xc9/0x110 net/socket.c:799 SYSC_recvfrom+0x2dc/0x570 net/socket.c:1788 SyS_recvfrom+0x40/0x50 net/socket.c:1760 entry_SYSCALL_64_fastpath+0x1f/0xbe RIP: 0033:0x444c89 RSP: 002b:00007ffd80c788e8 EFLAGS: 00000286 ORIG_RAX: 000000000000002d RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 0000000000444c89 RDX: 0000000000000000 RSI: 0000000020bc0000 RDI: 0000000000000004 RBP: 0000000000000082 R08: 00000000200c0ff0 R09: 0000000000000010 R10: 0000000000000000 R11: 0000000000000286 R12: 0000000000402390 R13: 0000000000402420 R14: 0000000000000000 R15: 0000000000000000 Code: f6 c1 01 0f 85 a5 01 00 00 48 89 4d b8 e8 31 e9 6b fd 48 8b 4d b8 48 b8 00 00 00 00 00 fc ff df 48 83 e1 fe 48 89 ca 48 c1 ea 03 <80> 3c 02 00 0f 85 41 02 00 00 48 8b 09 48 b8 00 00 00 00 00 fc RIP: __ip_options_echo+0xea8/0x1430 net/ipv4/ip_options.c:143 RSP: ffff8801cc49f628 ---[ end trace b30d95b284222843 ]--- Kernel panic - not syncing: Fatal exception ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH net-next 2/4] ip/options: explicitly provide net ns to __ip_options_echo() 2017-09-05 17:18 ` Eric Dumazet @ 2017-09-05 21:03 ` Paolo Abeni 2017-09-06 12:44 ` [PATCH net] udp: drop head states only when all skb references are gone Paolo Abeni 1 sibling, 0 replies; 10+ messages in thread From: Paolo Abeni @ 2017-09-05 21:03 UTC (permalink / raw) To: Eric Dumazet, David S. Miller; +Cc: netdev, Eric Dumazet, Hannes Frederic Sowa On Tue, 2017-09-05 at 10:18 -0700, Eric Dumazet wrote: > On Thu, 2017-08-03 at 18:07 +0200, Paolo Abeni wrote: > > __ip_options_echo() uses the current network namespace, and > > currently retrives it via skb->dst->dev. > > > > This commit adds an explicit 'net' argument to __ip_options_echo() > > and update all the call sites to provide it, usually via a simpler > > sock_net(). > > > > After this change, __ip_options_echo() no more needs to access > > skb->dst and we can drop a couple of hack to preserve such > > info in the rx path. > > > > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > > --- > > David, Paolo > > This commit (91ed1e666a4ea2e260452a7d7d311ac5ae852cba "ip/options: > explicitly provide net ns to __ip_options_echo()") > > needs to be backported to linux-4.13 stable version to avoid these kind > of crashes [1] > > This is because of MSG_PEEK operation, hitting skb_consume_udp() while > skb is still in receive queue. > > Next read() finding again the skb then can see a NULL skb->dst > > Thanks ! > > [1] > general protection fault: 0000 [#1] SMP KASAN > Dumping ftrace buffer: > (ftrace buffer empty) > Modules linked in: > CPU: 0 PID: 3017 Comm: syzkaller446772 Not tainted 4.13.0+ #68 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS > Google 01/01/2011 > task: ffff8801cd0a4380 task.stack: ffff8801cc498000 > RIP: 0010:__ip_options_echo+0xea8/0x1430 net/ipv4/ip_options.c:143 > RSP: 0018:ffff8801cc49f628 EFLAGS: 00010246 > RAX: dffffc0000000000 RBX: ffff8801cc49f928 RCX: 0000000000000000 > RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000004 > RBP: ffff8801cc49f6b8 R08: ffff8801cc49f936 R09: ffffed0039893f28 > R10: 0000000000000003 R11: ffffed0039893f27 R12: ffff8801cc49f918 > R13: ffff8801ccbcf36c R14: 000000000000000f R15: 0000000000000018 > FS: 0000000000979880(0000) GS:ffff8801db200000(0000) > knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00000000200c0ff0 CR3: 00000001cc4ed000 CR4: 00000000001406f0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > Call Trace: > ip_options_echo include/net/ip.h:574 [inline] > ip_cmsg_recv_retopts net/ipv4/ip_sockglue.c:91 [inline] > ip_cmsg_recv_offset+0xa17/0x1280 net/ipv4/ip_sockglue.c:207 > udp_recvmsg+0xe0b/0x1260 net/ipv4/udp.c:1641 > inet_recvmsg+0x14c/0x5f0 net/ipv4/af_inet.c:793 > sock_recvmsg_nosec net/socket.c:792 [inline] > sock_recvmsg+0xc9/0x110 net/socket.c:799 > SYSC_recvfrom+0x2dc/0x570 net/socket.c:1788 > SyS_recvfrom+0x40/0x50 net/socket.c:1760 > entry_SYSCALL_64_fastpath+0x1f/0xbe > RIP: 0033:0x444c89 > RSP: 002b:00007ffd80c788e8 EFLAGS: 00000286 ORIG_RAX: 000000000000002d > RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 0000000000444c89 > RDX: 0000000000000000 RSI: 0000000020bc0000 RDI: 0000000000000004 > RBP: 0000000000000082 R08: 00000000200c0ff0 R09: 0000000000000010 > R10: 0000000000000000 R11: 0000000000000286 R12: 0000000000402390 > R13: 0000000000402420 R14: 0000000000000000 R15: 0000000000000000 > Code: f6 c1 01 0f 85 a5 01 00 00 48 89 4d b8 e8 31 e9 6b fd 48 8b 4d b8 > 48 b8 00 00 00 00 00 fc ff df 48 83 e1 fe 48 89 ca 48 c1 ea 03 <80> 3c > 02 00 0f 85 41 02 00 00 48 8b 09 48 b8 00 00 00 00 00 fc > RIP: __ip_options_echo+0xea8/0x1430 net/ipv4/ip_options.c:143 RSP: > ffff8801cc49f628 > ---[ end trace b30d95b284222843 ]--- > Kernel panic - not syncing: Fatal exception Thank you Eric for the report! Darn me, I seriously messed-up with the stateless consume. I think we can have similar issues pith ipsec/secpath and MSG_PEEK, even if with less catastropthic outcome. What about the following, which should cover both cases? (only compile tested, I'll test it tomorrow morning my time) --- diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index d67a8182e5eb..63df75ae70ee 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -885,7 +885,7 @@ void kfree_skb(struct sk_buff *skb); void kfree_skb_list(struct sk_buff *segs); void skb_tx_error(struct sk_buff *skb); void consume_skb(struct sk_buff *skb); -void consume_stateless_skb(struct sk_buff *skb); +void __consume_stateless_skb(struct sk_buff *skb); void __kfree_skb(struct sk_buff *skb); extern struct kmem_cache *skbuff_head_cache; diff --git a/net/core/skbuff.c b/net/core/skbuff.c index e07556606284..f2411a8744d7 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -753,14 +753,11 @@ EXPORT_SYMBOL(consume_skb); * consume_stateless_skb - free an skbuff, assuming it is stateless * @skb: buffer to free * - * Works like consume_skb(), but this variant assumes that all the head - * states have been already dropped. + * Alike consume_skb(), but this variant assumes that all the head + * states have been already dropped and usage count is one */ -void consume_stateless_skb(struct sk_buff *skb) +void __consume_stateless_skb(struct sk_buff *skb) { - if (!skb_unref(skb)) - return; - trace_consume_skb(skb); if (likely(skb->head)) skb_release_data(skb); diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 62344804baae..979e4d8526ba 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1386,12 +1386,15 @@ void skb_consume_udp(struct sock *sk, struct sk_buff *skb, int len) unlock_sock_fast(sk, slow); } + if (!skb_unref(skb)) + return; + /* In the more common cases we cleared the head states previously, * see __udp_queue_rcv_skb(). */ if (unlikely(udp_skb_has_head_state(skb))) skb_release_head_state(skb); - consume_stateless_skb(skb); + __consume_stateless_skb(skb); } EXPORT_SYMBOL_GPL(skb_consume_udp); ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH net] udp: drop head states only when all skb references are gone 2017-09-05 17:18 ` Eric Dumazet 2017-09-05 21:03 ` Paolo Abeni @ 2017-09-06 12:44 ` Paolo Abeni 2017-09-08 3:03 ` David Miller 1 sibling, 1 reply; 10+ messages in thread From: Paolo Abeni @ 2017-09-06 12:44 UTC (permalink / raw) To: netdev; +Cc: David S. Miller, Eric Dumazet, Hannes Frederic Sowa After commit 0ddf3fb2c43d ("udp: preserve skb->dst if required for IP options processing") we clear the skb head state as soon as the skb carrying them is first processed. Since the same skb can be processed several times when MSG_PEEK is used, we can end up lacking the required head states, and eventually oopsing. Fix this clearing the skb head state only when processing the last skb reference. Reported-by: Eric Dumazet <edumazet@google.com> Fixes: 0ddf3fb2c43d ("udp: preserve skb->dst if required for IP options processing") Signed-off-by: Paolo Abeni <pabeni@redhat.com> --- include/linux/skbuff.h | 2 +- net/core/skbuff.c | 9 +++------ net/ipv4/udp.c | 5 ++++- 3 files changed, 8 insertions(+), 8 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index d67a8182e5eb..63df75ae70ee 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -885,7 +885,7 @@ void kfree_skb(struct sk_buff *skb); void kfree_skb_list(struct sk_buff *segs); void skb_tx_error(struct sk_buff *skb); void consume_skb(struct sk_buff *skb); -void consume_stateless_skb(struct sk_buff *skb); +void __consume_stateless_skb(struct sk_buff *skb); void __kfree_skb(struct sk_buff *skb); extern struct kmem_cache *skbuff_head_cache; diff --git a/net/core/skbuff.c b/net/core/skbuff.c index e07556606284..72eb23d2426f 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -753,14 +753,11 @@ EXPORT_SYMBOL(consume_skb); * consume_stateless_skb - free an skbuff, assuming it is stateless * @skb: buffer to free * - * Works like consume_skb(), but this variant assumes that all the head - * states have been already dropped. + * Alike consume_skb(), but this variant assumes that this is the last + * skb reference and all the head states have been already dropped */ -void consume_stateless_skb(struct sk_buff *skb) +void __consume_stateless_skb(struct sk_buff *skb) { - if (!skb_unref(skb)) - return; - trace_consume_skb(skb); if (likely(skb->head)) skb_release_data(skb); diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 62344804baae..979e4d8526ba 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1386,12 +1386,15 @@ void skb_consume_udp(struct sock *sk, struct sk_buff *skb, int len) unlock_sock_fast(sk, slow); } + if (!skb_unref(skb)) + return; + /* In the more common cases we cleared the head states previously, * see __udp_queue_rcv_skb(). */ if (unlikely(udp_skb_has_head_state(skb))) skb_release_head_state(skb); - consume_stateless_skb(skb); + __consume_stateless_skb(skb); } EXPORT_SYMBOL_GPL(skb_consume_udp); -- 2.13.5 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH net] udp: drop head states only when all skb references are gone 2017-09-06 12:44 ` [PATCH net] udp: drop head states only when all skb references are gone Paolo Abeni @ 2017-09-08 3:03 ` David Miller 0 siblings, 0 replies; 10+ messages in thread From: David Miller @ 2017-09-08 3:03 UTC (permalink / raw) To: pabeni; +Cc: netdev, edumazet, hannes From: Paolo Abeni <pabeni@redhat.com> Date: Wed, 6 Sep 2017 14:44:36 +0200 > After commit 0ddf3fb2c43d ("udp: preserve skb->dst if required > for IP options processing") we clear the skb head state as soon > as the skb carrying them is first processed. > > Since the same skb can be processed several times when MSG_PEEK > is used, we can end up lacking the required head states, and > eventually oopsing. > > Fix this clearing the skb head state only when processing the > last skb reference. > > Reported-by: Eric Dumazet <edumazet@google.com> > Fixes: 0ddf3fb2c43d ("udp: preserve skb->dst if required for IP options processing") > Signed-off-by: Paolo Abeni <pabeni@redhat.com> Applied and queued up for -stable, thanks. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH net-next 3/4] Revert "ipv4: keep skb->dst around in presence of IP options" 2017-08-03 16:07 [PATCH net-next 0/4] IP: cleanup LSRR option processing Paolo Abeni 2017-08-03 16:07 ` [PATCH net-next 1/4] IP: do not modify ingress packet IP option in ip_options_echo() Paolo Abeni 2017-08-03 16:07 ` [PATCH net-next 2/4] ip/options: explicitly provide net ns to __ip_options_echo() Paolo Abeni @ 2017-08-03 16:07 ` Paolo Abeni 2017-08-03 16:07 ` [PATCH net-next 4/4] udp: no need to preserve skb->dst Paolo Abeni 2017-08-07 3:51 ` [PATCH net-next 0/4] IP: cleanup LSRR option processing David Miller 4 siblings, 0 replies; 10+ messages in thread From: Paolo Abeni @ 2017-08-03 16:07 UTC (permalink / raw) To: netdev; +Cc: David S. Miller, Eric Dumazet, Hannes Frederic Sowa ip_options_echo() does not use anymore the skb->dst and don't need to keep the dst around for options's sake only. This reverts commit 34b2cef20f19c87999fff3da4071e66937db9644. Signed-off-by: Paolo Abeni <pabeni@redhat.com> --- net/ipv4/ip_sockglue.c | 9 +-------- 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c index 1c3354d028a4..dd68a9ed5e40 100644 --- a/net/ipv4/ip_sockglue.c +++ b/net/ipv4/ip_sockglue.c @@ -1228,14 +1228,7 @@ void ipv4_pktinfo_prepare(const struct sock *sk, struct sk_buff *skb) pktinfo->ipi_ifindex = 0; pktinfo->ipi_spec_dst.s_addr = 0; } - /* We need to keep the dst for __ip_options_echo() - * We could restrict the test to opt.ts_needtime || opt.srr, - * but the following is good enough as IP options are not often used. - */ - if (unlikely(IPCB(skb)->opt.optlen)) - skb_dst_force(skb); - else - skb_dst_drop(skb); + skb_dst_drop(skb); } int ip_setsockopt(struct sock *sk, int level, -- 2.13.3 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH net-next 4/4] udp: no need to preserve skb->dst 2017-08-03 16:07 [PATCH net-next 0/4] IP: cleanup LSRR option processing Paolo Abeni ` (2 preceding siblings ...) 2017-08-03 16:07 ` [PATCH net-next 3/4] Revert "ipv4: keep skb->dst around in presence of IP options" Paolo Abeni @ 2017-08-03 16:07 ` Paolo Abeni 2017-08-07 3:51 ` [PATCH net-next 0/4] IP: cleanup LSRR option processing David Miller 4 siblings, 0 replies; 10+ messages in thread From: Paolo Abeni @ 2017-08-03 16:07 UTC (permalink / raw) To: netdev; +Cc: David S. Miller, Eric Dumazet, Hannes Frederic Sowa __ip_options_echo() does not need anymore skb->dst, so we can avoid explicitly preserving it for its own sake. This is almost a revert of commit 0ddf3fb2c43d ("udp: preserve skb->dst if required for IP options processing") plus some lifting to fit later changes. Signed-off-by: Paolo Abeni <pabeni@redhat.com> --- net/ipv4/udp.c | 13 +++++-------- 1 file changed, 5 insertions(+), 8 deletions(-) diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index e6276fa3750b..38bca2c4897d 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1176,7 +1176,11 @@ static void udp_set_dev_scratch(struct sk_buff *skb) scratch->csum_unnecessary = !!skb_csum_unnecessary(skb); scratch->is_linear = !skb_is_nonlinear(skb); #endif - if (likely(!skb->_skb_refdst)) + /* all head states execept sp (dst, sk, nf) are always cleared by + * udp_rcv() and we need to preserve secpath, if present, to eventually + * process IP_CMSG_PASSSEC at recvmsg() time + */ + if (likely(!skb_sec_path(skb))) scratch->_tsize_state |= UDP_SKB_IS_STATELESS; } @@ -1782,13 +1786,6 @@ static int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb) sk_mark_napi_id_once(sk, skb); } - /* At recvmsg() time we may access skb->dst or skb->sp depending on - * the IP options and the cmsg flags, elsewhere can we clear all - * pending head states while they are hot in the cache - */ - if (likely(IPCB(skb)->opt.optlen == 0 && !skb_sec_path(skb))) - skb_release_head_state(skb); - rc = __udp_enqueue_schedule_skb(sk, skb); if (rc < 0) { int is_udplite = IS_UDPLITE(sk); -- 2.13.3 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH net-next 0/4] IP: cleanup LSRR option processing 2017-08-03 16:07 [PATCH net-next 0/4] IP: cleanup LSRR option processing Paolo Abeni ` (3 preceding siblings ...) 2017-08-03 16:07 ` [PATCH net-next 4/4] udp: no need to preserve skb->dst Paolo Abeni @ 2017-08-07 3:51 ` David Miller 4 siblings, 0 replies; 10+ messages in thread From: David Miller @ 2017-08-07 3:51 UTC (permalink / raw) To: pabeni; +Cc: netdev, edumazet, hannes From: Paolo Abeni <pabeni@redhat.com> Date: Thu, 3 Aug 2017 18:07:04 +0200 > The __ip_options_echo() function expect a valid dst entry in skb->dst; > as result we sometimes need to preserve the dst entry for the whole IP > RX path. > > The current usage of skb->dst looks more a relic from ancient past that > a real functional constraint. This patchset tries to remove such usage, > and than drops some hacks currently in place in the IP code to keep > skb->dst around. > > __ip_options_echo() uses of skb->dst for two different purposes: retrieving > the netns assicated with the skb, and modify the ingress packet LSRR address > list. > > The first patch removes the code modifying the ingress packet, and the second > one provides an explicit netns argument to __ip_options_echo(). The following > patches cleanup the current code keeping arund skb->dst for __ip_options_echo's > sake. > > Updating the __ip_options_echo() function has been previously discussed here: > > http://marc.info/?l=linux-netdev&m=150064533516348&w=2 Series applied, thanks. ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2017-09-08 3:03 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-08-03 16:07 [PATCH net-next 0/4] IP: cleanup LSRR option processing Paolo Abeni 2017-08-03 16:07 ` [PATCH net-next 1/4] IP: do not modify ingress packet IP option in ip_options_echo() Paolo Abeni 2017-08-03 16:07 ` [PATCH net-next 2/4] ip/options: explicitly provide net ns to __ip_options_echo() Paolo Abeni 2017-09-05 17:18 ` Eric Dumazet 2017-09-05 21:03 ` Paolo Abeni 2017-09-06 12:44 ` [PATCH net] udp: drop head states only when all skb references are gone Paolo Abeni 2017-09-08 3:03 ` David Miller 2017-08-03 16:07 ` [PATCH net-next 3/4] Revert "ipv4: keep skb->dst around in presence of IP options" Paolo Abeni 2017-08-03 16:07 ` [PATCH net-next 4/4] udp: no need to preserve skb->dst Paolo Abeni 2017-08-07 3:51 ` [PATCH net-next 0/4] IP: cleanup LSRR option processing David Miller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).