All of lore.kernel.org
 help / color / mirror / Atom feed
From: Paolo Abeni <pabeni@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>,
	"David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org, Eric Dumazet <edumazet@google.com>,
	Hannes Frederic Sowa <hannes@stressinduktion.org>
Subject: Re: [PATCH net-next 2/4] ip/options: explicitly provide net ns to __ip_options_echo()
Date: Tue, 05 Sep 2017 23:03:01 +0200	[thread overview]
Message-ID: <1504645381.3841.6.camel@redhat.com> (raw)
In-Reply-To: <1504631927.15310.47.camel@edumazet-glaptop3.roam.corp.google.com>

On Tue, 2017-09-05 at 10:18 -0700, Eric Dumazet wrote:
> On Thu, 2017-08-03 at 18:07 +0200, Paolo Abeni wrote:
> > __ip_options_echo() uses the current network namespace, and
> > currently retrives it via skb->dst->dev.
> > 
> > This commit adds an explicit 'net' argument to __ip_options_echo()
> > and update all the call sites to provide it, usually via a simpler
> > sock_net().
> > 
> > After this change, __ip_options_echo() no more needs to access
> > skb->dst and we can drop a couple of hack to preserve such
> > info in the rx path.
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> 
> David, Paolo
> 
> This commit (91ed1e666a4ea2e260452a7d7d311ac5ae852cba "ip/options:
> explicitly provide net ns to __ip_options_echo()")
> 
> needs to be backported to linux-4.13 stable version to avoid these kind
> of crashes [1]
> 
> This is because of MSG_PEEK operation, hitting skb_consume_udp() while
> skb is still in receive queue.
> 
> Next read() finding again the skb then can see a NULL skb->dst
> 
> Thanks !
> 
> [1]
> general protection fault: 0000 [#1] SMP KASAN
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 3017 Comm: syzkaller446772 Not tainted 4.13.0+ #68
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> task: ffff8801cd0a4380 task.stack: ffff8801cc498000
> RIP: 0010:__ip_options_echo+0xea8/0x1430 net/ipv4/ip_options.c:143
> RSP: 0018:ffff8801cc49f628 EFLAGS: 00010246
> RAX: dffffc0000000000 RBX: ffff8801cc49f928 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000004
> RBP: ffff8801cc49f6b8 R08: ffff8801cc49f936 R09: ffffed0039893f28
> R10: 0000000000000003 R11: ffffed0039893f27 R12: ffff8801cc49f918
> R13: ffff8801ccbcf36c R14: 000000000000000f R15: 0000000000000018
> FS:  0000000000979880(0000) GS:ffff8801db200000(0000)
> knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000200c0ff0 CR3: 00000001cc4ed000 CR4: 00000000001406f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  ip_options_echo include/net/ip.h:574 [inline]
>  ip_cmsg_recv_retopts net/ipv4/ip_sockglue.c:91 [inline]
>  ip_cmsg_recv_offset+0xa17/0x1280 net/ipv4/ip_sockglue.c:207
>  udp_recvmsg+0xe0b/0x1260 net/ipv4/udp.c:1641
>  inet_recvmsg+0x14c/0x5f0 net/ipv4/af_inet.c:793
>  sock_recvmsg_nosec net/socket.c:792 [inline]
>  sock_recvmsg+0xc9/0x110 net/socket.c:799
>  SYSC_recvfrom+0x2dc/0x570 net/socket.c:1788
>  SyS_recvfrom+0x40/0x50 net/socket.c:1760
>  entry_SYSCALL_64_fastpath+0x1f/0xbe
> RIP: 0033:0x444c89
> RSP: 002b:00007ffd80c788e8 EFLAGS: 00000286 ORIG_RAX: 000000000000002d
> RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 0000000000444c89
> RDX: 0000000000000000 RSI: 0000000020bc0000 RDI: 0000000000000004
> RBP: 0000000000000082 R08: 00000000200c0ff0 R09: 0000000000000010
> R10: 0000000000000000 R11: 0000000000000286 R12: 0000000000402390
> R13: 0000000000402420 R14: 0000000000000000 R15: 0000000000000000
> Code: f6 c1 01 0f 85 a5 01 00 00 48 89 4d b8 e8 31 e9 6b fd 48 8b 4d b8
> 48 b8 00 00 00 00 00 fc ff df 48 83 e1 fe 48 89 ca 48 c1 ea 03 <80> 3c
> 02 00 0f 85 41 02 00 00 48 8b 09 48 b8 00 00 00 00 00 fc 
> RIP: __ip_options_echo+0xea8/0x1430 net/ipv4/ip_options.c:143 RSP:
> ffff8801cc49f628
> ---[ end trace b30d95b284222843 ]---
> Kernel panic - not syncing: Fatal exception

Thank you Eric for the report! 

Darn me, I seriously messed-up with the stateless consume.

I think we can have similar issues pith ipsec/secpath and MSG_PEEK,
even if with less catastropthic outcome.

What about the following, which should cover both cases? (only compile
tested, I'll test it tomorrow morning my time)
---
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index d67a8182e5eb..63df75ae70ee 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -885,7 +885,7 @@ void kfree_skb(struct sk_buff *skb);
 void kfree_skb_list(struct sk_buff *segs);
 void skb_tx_error(struct sk_buff *skb);
 void consume_skb(struct sk_buff *skb);
-void consume_stateless_skb(struct sk_buff *skb);
+void __consume_stateless_skb(struct sk_buff *skb);
 void  __kfree_skb(struct sk_buff *skb);
 extern struct kmem_cache *skbuff_head_cache;
 
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index e07556606284..f2411a8744d7 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -753,14 +753,11 @@ EXPORT_SYMBOL(consume_skb);
  *	consume_stateless_skb - free an skbuff, assuming it is stateless
  *	@skb: buffer to free
  *
- *	Works like consume_skb(), but this variant assumes that all the head
- *	states have been already dropped.
+ *	Alike consume_skb(), but this variant assumes that all the head
+ *	states have been already dropped and usage count is one
  */
-void consume_stateless_skb(struct sk_buff *skb)
+void __consume_stateless_skb(struct sk_buff *skb)
 {
-	if (!skb_unref(skb))
-		return;
-
 	trace_consume_skb(skb);
 	if (likely(skb->head))
 		skb_release_data(skb);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 62344804baae..979e4d8526ba 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1386,12 +1386,15 @@ void skb_consume_udp(struct sock *sk, struct sk_buff *skb, int len)
 		unlock_sock_fast(sk, slow);
 	}
 
+	if (!skb_unref(skb))
+		return;
+
 	/* In the more common cases we cleared the head states previously,
 	 * see __udp_queue_rcv_skb().
 	 */
 	if (unlikely(udp_skb_has_head_state(skb)))
 		skb_release_head_state(skb);
-	consume_stateless_skb(skb);
+	__consume_stateless_skb(skb);
 }
 EXPORT_SYMBOL_GPL(skb_consume_udp);
 

  reply	other threads:[~2017-09-05 21:03 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-03 16:07 [PATCH net-next 0/4] IP: cleanup LSRR option processing Paolo Abeni
2017-08-03 16:07 ` [PATCH net-next 1/4] IP: do not modify ingress packet IP option in ip_options_echo() Paolo Abeni
2017-08-03 16:07 ` [PATCH net-next 2/4] ip/options: explicitly provide net ns to __ip_options_echo() Paolo Abeni
2017-09-05 17:18   ` Eric Dumazet
2017-09-05 21:03     ` Paolo Abeni [this message]
2017-09-06 12:44     ` [PATCH net] udp: drop head states only when all skb references are gone Paolo Abeni
2017-09-08  3:03       ` David Miller
2017-08-03 16:07 ` [PATCH net-next 3/4] Revert "ipv4: keep skb->dst around in presence of IP options" Paolo Abeni
2017-08-03 16:07 ` [PATCH net-next 4/4] udp: no need to preserve skb->dst Paolo Abeni
2017-08-07  3:51 ` [PATCH net-next 0/4] IP: cleanup LSRR option processing David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1504645381.3841.6.camel@redhat.com \
    --to=pabeni@redhat.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=eric.dumazet@gmail.com \
    --cc=hannes@stressinduktion.org \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.