netdev.vger.kernel.org archive mirror
From: Paolo Abeni <pabeni@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>,
	"David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org, Eric Dumazet <edumazet@google.com>,
	Hannes Frederic Sowa <hannes@stressinduktion.org>
Subject: Re: [PATCH net-next 2/4] ip/options: explicitly provide net ns to __ip_options_echo()
Date: Tue, 05 Sep 2017 23:03:01 +0200
Message-ID: <1504645381.3841.6.camel@redhat.com>
In-Reply-To: <1504631927.15310.47.camel@edumazet-glaptop3.roam.corp.google.com>

On Tue, 2017-09-05 at 10:18 -0700, Eric Dumazet wrote:
> On Thu, 2017-08-03 at 18:07 +0200, Paolo Abeni wrote:
> > __ip_options_echo() uses the current network namespace, and
> > currently retrieves it via skb->dst->dev.
> > 
> > This commit adds an explicit 'net' argument to __ip_options_echo()
> > and updates all the call sites to provide it, usually via a simpler
> > sock_net().
> > 
> > After this change, __ip_options_echo() no longer needs to access
> > skb->dst and we can drop a couple of hacks to preserve such
> > info in the rx path.
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> 
> David, Paolo
> 
> This commit (91ed1e666a4ea2e260452a7d7d311ac5ae852cba "ip/options:
> explicitly provide net ns to __ip_options_echo()")
> 
> needs to be backported to the linux-4.13 stable version to avoid this
> kind of crash [1]
> 
> This is because of a MSG_PEEK operation hitting skb_consume_udp() while
> the skb is still in the receive queue.
> 
> The next read() then finds the skb again and can see a NULL skb->dst
> 
> Thanks !
> 
> [1]
> general protection fault: 0000 [#1] SMP KASAN
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 3017 Comm: syzkaller446772 Not tainted 4.13.0+ #68
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> task: ffff8801cd0a4380 task.stack: ffff8801cc498000
> RIP: 0010:__ip_options_echo+0xea8/0x1430 net/ipv4/ip_options.c:143
> RSP: 0018:ffff8801cc49f628 EFLAGS: 00010246
> RAX: dffffc0000000000 RBX: ffff8801cc49f928 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000004
> RBP: ffff8801cc49f6b8 R08: ffff8801cc49f936 R09: ffffed0039893f28
> R10: 0000000000000003 R11: ffffed0039893f27 R12: ffff8801cc49f918
> R13: ffff8801ccbcf36c R14: 000000000000000f R15: 0000000000000018
> FS:  0000000000979880(0000) GS:ffff8801db200000(0000)
> knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000200c0ff0 CR3: 00000001cc4ed000 CR4: 00000000001406f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  ip_options_echo include/net/ip.h:574 [inline]
>  ip_cmsg_recv_retopts net/ipv4/ip_sockglue.c:91 [inline]
>  ip_cmsg_recv_offset+0xa17/0x1280 net/ipv4/ip_sockglue.c:207
>  udp_recvmsg+0xe0b/0x1260 net/ipv4/udp.c:1641
>  inet_recvmsg+0x14c/0x5f0 net/ipv4/af_inet.c:793
>  sock_recvmsg_nosec net/socket.c:792 [inline]
>  sock_recvmsg+0xc9/0x110 net/socket.c:799
>  SYSC_recvfrom+0x2dc/0x570 net/socket.c:1788
>  SyS_recvfrom+0x40/0x50 net/socket.c:1760
>  entry_SYSCALL_64_fastpath+0x1f/0xbe
> RIP: 0033:0x444c89
> RSP: 002b:00007ffd80c788e8 EFLAGS: 00000286 ORIG_RAX: 000000000000002d
> RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 0000000000444c89
> RDX: 0000000000000000 RSI: 0000000020bc0000 RDI: 0000000000000004
> RBP: 0000000000000082 R08: 00000000200c0ff0 R09: 0000000000000010
> R10: 0000000000000000 R11: 0000000000000286 R12: 0000000000402390
> R13: 0000000000402420 R14: 0000000000000000 R15: 0000000000000000
> Code: f6 c1 01 0f 85 a5 01 00 00 48 89 4d b8 e8 31 e9 6b fd 48 8b 4d b8
> 48 b8 00 00 00 00 00 fc ff df 48 83 e1 fe 48 89 ca 48 c1 ea 03 <80> 3c
> 02 00 0f 85 41 02 00 00 48 8b 09 48 b8 00 00 00 00 00 fc 
> RIP: __ip_options_echo+0xea8/0x1430 net/ipv4/ip_options.c:143 RSP:
> ffff8801cc49f628
> ---[ end trace b30d95b284222843 ]---
> Kernel panic - not syncing: Fatal exception

Thank you Eric for the report! 

Darn me, I seriously messed up with the stateless consume.

I think we can have similar issues with ipsec/secpath and MSG_PEEK,
even if with a less catastrophic outcome.

What about the following, which should cover both cases? (only
compile-tested; I'll test it tomorrow morning my time)
---
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index d67a8182e5eb..63df75ae70ee 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -885,7 +885,7 @@ void kfree_skb(struct sk_buff *skb);
 void kfree_skb_list(struct sk_buff *segs);
 void skb_tx_error(struct sk_buff *skb);
 void consume_skb(struct sk_buff *skb);
-void consume_stateless_skb(struct sk_buff *skb);
+void __consume_stateless_skb(struct sk_buff *skb);
 void  __kfree_skb(struct sk_buff *skb);
 extern struct kmem_cache *skbuff_head_cache;
 
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index e07556606284..f2411a8744d7 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -753,14 +753,11 @@ EXPORT_SYMBOL(consume_skb);
  *	consume_stateless_skb - free an skbuff, assuming it is stateless
  *	@skb: buffer to free
  *
- *	Works like consume_skb(), but this variant assumes that all the head
- *	states have been already dropped.
+ *	Alike consume_skb(), but this variant assumes that all the head
+ *	states have been already dropped and usage count is one
  */
-void consume_stateless_skb(struct sk_buff *skb)
+void __consume_stateless_skb(struct sk_buff *skb)
 {
-	if (!skb_unref(skb))
-		return;
-
 	trace_consume_skb(skb);
 	if (likely(skb->head))
 		skb_release_data(skb);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 62344804baae..979e4d8526ba 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1386,12 +1386,15 @@ void skb_consume_udp(struct sock *sk, struct sk_buff *skb, int len)
 		unlock_sock_fast(sk, slow);
 	}
 
+	if (!skb_unref(skb))
+		return;
+
 	/* In the more common cases we cleared the head states previously,
 	 * see __udp_queue_rcv_skb().
 	 */
 	if (unlikely(udp_skb_has_head_state(skb)))
 		skb_release_head_state(skb);
-	consume_stateless_skb(skb);
+	__consume_stateless_skb(skb);
 }
 EXPORT_SYMBOL_GPL(skb_consume_udp);
 

Thread overview: 10+ messages
2017-08-03 16:07 [PATCH net-next 0/4] IP: cleanup LSRR option processing Paolo Abeni
2017-08-03 16:07 ` [PATCH net-next 1/4] IP: do not modify ingress packet IP option in ip_options_echo() Paolo Abeni
2017-08-03 16:07 ` [PATCH net-next 2/4] ip/options: explicitly provide net ns to __ip_options_echo() Paolo Abeni
2017-09-05 17:18   ` Eric Dumazet
2017-09-05 21:03     ` Paolo Abeni [this message]
2017-09-06 12:44     ` [PATCH net] udp: drop head states only when all skb references are gone Paolo Abeni
2017-09-08  3:03       ` David Miller
2017-08-03 16:07 ` [PATCH net-next 3/4] Revert "ipv4: keep skb->dst around in presence of IP options" Paolo Abeni
2017-08-03 16:07 ` [PATCH net-next 4/4] udp: no need to preserve skb->dst Paolo Abeni
2017-08-07  3:51 ` [PATCH net-next 0/4] IP: cleanup LSRR option processing David Miller
