From: dormando <dormando@rydia.net>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "Alexey Preobrazhensky" <preobr@google.com>,
"Steffen Klassert" <steffen.klassert@secunet.com>,
"David Miller" <davem@davemloft.net>,
paulmck@linux.vnet.ibm.com, netdev@vger.kernel.org,
"Kostya Serebryany" <kcc@google.com>,
"Dmitry Vyukov" <dvyukov@google.com>,
"Lars Bull" <larsbull@google.com>,
"Eric Dumazet" <edumazet@google.com>,
"Bruce Curtis" <brutus@google.com>,
"Maciej Żenczykowski" <maze@google.com>,
"Alexei Starovoitov" <alexei.starovoitov@gmail.com>
Subject: Re: [PATCH] ipv4: fix a race in ip4_datagram_release_cb()
Date: Tue, 10 Jun 2014 17:32:22 -0700 (PDT) [thread overview]
Message-ID: <alpine.DEB.2.02.1406101726390.11647@dtop> (raw)
In-Reply-To: <1402407781.3645.426.camel@edumazet-glaptop2.roam.corp.google.com>
On Tue, 10 Jun 2014, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> Alexey gave a AddressSanitizer[1] report that finally gave a good hint
> at where was the origin of various problems already reported by Dormando
> in the past [2]
>
> Problem comes from the fact that UDP can have a lockless TX path, and
> concurrent threads can manipulate sk_dst_cache, while another thread,
> is holding socket lock and calls __sk_dst_set() in
> ip4_datagram_release_cb() (this was added in linux-3.8)
>
> It seems that all we need to do is to use sk_dst_check() and
> sk_dst_set() so that all the writers hold same spinlock
> (sk->sk_dst_lock) to prevent corruptions.
>
> TCP stack do not need this protection, as all sk_dst_cache writers hold
> the socket lock.
>
> [1]
> https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel
>
> AddressSanitizer: heap-use-after-free in ipv4_dst_check
> Read of size 2 by thread T15453:
> [<ffffffff817daa3a>] ipv4_dst_check+0x1a/0x90 ./net/ipv4/route.c:1116
> [<ffffffff8175b789>] __sk_dst_check+0x89/0xe0 ./net/core/sock.c:531
> [<ffffffff81830a36>] ip4_datagram_release_cb+0x46/0x390 ??:0
> [<ffffffff8175eaea>] release_sock+0x17a/0x230 ./net/core/sock.c:2413
> [<ffffffff81830882>] ip4_datagram_connect+0x462/0x5d0 ??:0
> [<ffffffff81846d06>] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
> [<ffffffff817580ac>] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
> [<ffffffff817596ce>] SyS_connect+0xe/0x10 ./net/socket.c:1682
> [<ffffffff818b0a29>] system_call_fastpath+0x16/0x1b
> ./arch/x86/kernel/entry_64.S:629
>
> Freed by thread T15455:
> [<ffffffff8178d9b8>] dst_destroy+0xa8/0x160 ./net/core/dst.c:251
> [<ffffffff8178de25>] dst_release+0x45/0x80 ./net/core/dst.c:280
> [<ffffffff818304c1>] ip4_datagram_connect+0xa1/0x5d0 ??:0
> [<ffffffff81846d06>] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
> [<ffffffff817580ac>] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
> [<ffffffff817596ce>] SyS_connect+0xe/0x10 ./net/socket.c:1682
> [<ffffffff818b0a29>] system_call_fastpath+0x16/0x1b
> ./arch/x86/kernel/entry_64.S:629
>
> Allocated by thread T15453:
> [<ffffffff8178d291>] dst_alloc+0x81/0x2b0 ./net/core/dst.c:171
> [<ffffffff817db3b7>] rt_dst_alloc+0x47/0x50 ./net/ipv4/route.c:1406
> [< inlined >] __ip_route_output_key+0x3e8/0xf70
> __mkroute_output ./net/ipv4/route.c:1939
> [<ffffffff817dde08>] __ip_route_output_key+0x3e8/0xf70 ./net/ipv4/route.c:2161
> [<ffffffff817deb34>] ip_route_output_flow+0x14/0x30 ./net/ipv4/route.c:2249
> [<ffffffff81830737>] ip4_datagram_connect+0x317/0x5d0 ??:0
> [<ffffffff81846d06>] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
> [<ffffffff817580ac>] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
> [<ffffffff817596ce>] SyS_connect+0xe/0x10 ./net/socket.c:1682
> [<ffffffff818b0a29>] system_call_fastpath+0x16/0x1b
> ./arch/x86/kernel/entry_64.S:629
>
> [2]
> <4>[196727.311203] general protection fault: 0000 [#1] SMP
> <4>[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp pps_core mdio
> <4>[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 #1
> <4>[196727.311344] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013
> <4>[196727.311364] task: ffff885e6f069700 ti: ffff885e6f072000 task.ti: ffff885e6f072000
> <4>[196727.311377] RIP: 0010:[<ffffffff815f8c7f>] [<ffffffff815f8c7f>] ipv4_dst_destroy+0x4f/0x80
> <4>[196727.311399] RSP: 0018:ffff885effd23a70 EFLAGS: 00010282
> <4>[196727.311409] RAX: dead000000200200 RBX: ffff8854c398ecc0 RCX: 0000000000000040
> <4>[196727.311423] RDX: dead000000100100 RSI: dead000000100100 RDI: dead000000200200
> <4>[196727.311437] RBP: ffff885effd23a80 R08: ffffffff815fd9e0 R09: ffff885d5a590800
> <4>[196727.311451] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> <4>[196727.311464] R13: ffffffff81c8c280 R14: 0000000000000000 R15: ffff880e85ee16ce
> <4>[196727.311510] FS: 0000000000000000(0000) GS:ffff885effd20000(0000) knlGS:0000000000000000
> <4>[196727.311554] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4>[196727.311581] CR2: 00007a46751eb000 CR3: 0000005e65688000 CR4: 00000000000407e0
> <4>[196727.311625] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> <4>[196727.311669] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> <4>[196727.311713] Stack:
> <4>[196727.311733] ffff8854c398ecc0 ffff8854c398ecc0 ffff885effd23ab0 ffffffff815b7f42
> <4>[196727.311784] ffff88be6595bc00 ffff8854c398ecc0 0000000000000000 ffff8854c398ecc0
> <4>[196727.311834] ffff885effd23ad0 ffffffff815b86c6 ffff885d5a590800 ffff8816827821c0
> <4>[196727.311885] Call Trace:
> <4>[196727.311907] <IRQ>
> <4>[196727.311912] [<ffffffff815b7f42>] dst_destroy+0x32/0xe0
> <4>[196727.311959] [<ffffffff815b86c6>] dst_release+0x56/0x80
> <4>[196727.311986] [<ffffffff81620bd5>] tcp_v4_do_rcv+0x2a5/0x4a0
> <4>[196727.312013] [<ffffffff81622b5a>] tcp_v4_rcv+0x7da/0x820
> <4>[196727.312041] [<ffffffff815fd9e0>] ? ip_rcv_finish+0x360/0x360
> <4>[196727.312070] [<ffffffff815de02d>] ? nf_hook_slow+0x7d/0x150
> <4>[196727.312097] [<ffffffff815fd9e0>] ? ip_rcv_finish+0x360/0x360
> <4>[196727.312125] [<ffffffff815fda92>] ip_local_deliver_finish+0xb2/0x230
> <4>[196727.312154] [<ffffffff815fdd9a>] ip_local_deliver+0x4a/0x90
> <4>[196727.312183] [<ffffffff815fd799>] ip_rcv_finish+0x119/0x360
> <4>[196727.312212] [<ffffffff815fe00b>] ip_rcv+0x22b/0x340
> <4>[196727.312242] [<ffffffffa0339680>] ? macvlan_broadcast+0x160/0x160 [macvlan]
> <4>[196727.312275] [<ffffffff815b0c62>] __netif_receive_skb_core+0x512/0x640
> <4>[196727.312308] [<ffffffff811427fb>] ? kmem_cache_alloc+0x13b/0x150
> <4>[196727.312338] [<ffffffff815b0db1>] __netif_receive_skb+0x21/0x70
> <4>[196727.312368] [<ffffffff815b0fa1>] netif_receive_skb+0x31/0xa0
> <4>[196727.312397] [<ffffffff815b1ae8>] napi_gro_receive+0xe8/0x140
> <4>[196727.312433] [<ffffffffa00274f1>] ixgbe_poll+0x551/0x11f0 [ixgbe]
> <4>[196727.312463] [<ffffffff815fe00b>] ? ip_rcv+0x22b/0x340
> <4>[196727.312491] [<ffffffff815b1691>] net_rx_action+0x111/0x210
> <4>[196727.312521] [<ffffffff815b0db1>] ? __netif_receive_skb+0x21/0x70
> <4>[196727.312552] [<ffffffff810519d0>] __do_softirq+0xd0/0x270
> <4>[196727.312583] [<ffffffff816cef3c>] call_softirq+0x1c/0x30
> <4>[196727.312613] [<ffffffff81004205>] do_softirq+0x55/0x90
> <4>[196727.312640] [<ffffffff81051c85>] irq_exit+0x55/0x60
> <4>[196727.312668] [<ffffffff816cf5c3>] do_IRQ+0x63/0xe0
> <4>[196727.312696] [<ffffffff816c5aaa>] common_interrupt+0x6a/0x6a
> <4>[196727.312722] <EOI>
> <1>[196727.313071] RIP [<ffffffff815f8c7f>] ipv4_dst_destroy+0x4f/0x80
> <4>[196727.313100] RSP <ffff885effd23a70>
> <4>[196727.313377] ---[ end trace 64b3f14fae0f2e29 ]---
> <0>[196727.380908] Kernel panic - not syncing: Fatal exception in interrupt
>
> Reported-by: Alexey Preobrazhensky <preobr@google.com>
> Reported-by: dormando <dormando@rydia.ne>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Fixes: 8141ed9fcedb2 ("ipv4: Add a socket release callback for datagram sockets")
> Cc: Steffen Klassert <steffen.klassert@secunet.com>
> ---
> net/ipv4/datagram.c | 20 +++++++++++++++-----
> 1 file changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c
> index 8b5134c582f1..a3095fdefbed 100644
> --- a/net/ipv4/datagram.c
> +++ b/net/ipv4/datagram.c
> @@ -86,18 +86,26 @@ out:
> }
> EXPORT_SYMBOL(ip4_datagram_connect);
>
> +/* Because UDP xmit path can manipulate sk_dst_cache without holding
> + * socket lock, we need to use sk_dst_set() here,
> + * even if we own the socket lock.
> + */
> void ip4_datagram_release_cb(struct sock *sk)
> {
> const struct inet_sock *inet = inet_sk(sk);
> const struct ip_options_rcu *inet_opt;
> __be32 daddr = inet->inet_daddr;
> + struct dst_entry *dst;
> struct flowi4 fl4;
> struct rtable *rt;
>
> - if (! __sk_dst_get(sk) || __sk_dst_check(sk, 0))
> - return;
> -
> rcu_read_lock();
> +
> + dst = __sk_dst_get(sk);
> + if (!dst || !dst->obsolete || dst->ops->check(dst, 0)) {
> + rcu_read_unlock();
> + return;
> + }
> inet_opt = rcu_dereference(inet->inet_opt);
> if (inet_opt && inet_opt->opt.srr)
> daddr = inet_opt->opt.faddr;
> @@ -105,8 +113,10 @@ void ip4_datagram_release_cb(struct sock *sk)
> inet->inet_saddr, inet->inet_dport,
> inet->inet_sport, sk->sk_protocol,
> RT_CONN_FLAGS(sk), sk->sk_bound_dev_if);
> - if (!IS_ERR(rt))
> - __sk_dst_set(sk, &rt->dst);
> +
> + dst = !IS_ERR(rt) ? &rt->dst : NULL;
> + sk_dst_set(sk, dst);
> +
> rcu_read_unlock();
> }
> EXPORT_SYMBOL_GPL(ip4_datagram_release_cb);
>
>
>
This is neat. So just confirming that I understand (thanks for re-cc'ing
me btw <3):
Because of the UDP glitch, it can cause either the kernel panic in the UDP
code, but also elsewhere in the TCP stack based on if it's fiddling with
the same DST being hosed by a UDP packet?
Unfortunately we're observing a pretty severe sintr cpu increase under
3.14, so I'm going to try this on 3.10.latest. Any potential issues there?
Thankfully it seems like our udpkill utility will reproduce it quickly.
Lets see how it goes!
next prev parent reply other threads:[~2014-06-11 0:32 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-06-06 11:29 Potential race in ip4_datagram_release_cb Alexey Preobrazhensky
2014-06-06 12:56 ` Eric Dumazet
2014-06-06 15:59 ` Alexei Starovoitov
2014-06-06 16:16 ` Eric Dumazet
2014-06-06 17:44 ` Alexei Starovoitov
2014-06-06 17:56 ` Eric Dumazet
2014-06-06 18:13 ` Alexei Starovoitov
2014-06-10 13:43 ` [PATCH] ipv4: fix a race in ip4_datagram_release_cb() Eric Dumazet
2014-06-11 0:32 ` dormando [this message]
2014-06-11 0:55 ` Eric Dumazet
2014-06-11 1:12 ` Eric Dumazet
2014-06-11 1:26 ` Eric Dumazet
2014-06-11 4:16 ` dormando
2014-06-11 5:54 ` Eric Dumazet
2014-06-11 7:20 ` dormando
2014-06-11 7:26 ` dormando
2014-06-11 7:38 ` dormando
2014-06-11 12:41 ` Eric Dumazet
2014-06-11 13:12 ` Eric Dumazet
2014-06-12 1:55 ` dormando
2014-06-12 3:43 ` Eric Dumazet
2014-06-12 4:05 ` dormando
2014-06-22 19:07 ` dormando
2014-06-23 8:33 ` Eric Dumazet
2014-06-23 8:55 ` dormando
2014-06-23 16:57 ` Dmitry Vyukov
2014-06-24 17:05 ` [PATCH net] ipv4: fix dst race in sk_dst_get() Eric Dumazet
2014-06-26 0:42 ` David Miller
2014-06-11 13:38 ` [PATCH] ipv4: fix a race in ip4_datagram_release_cb() Kostya Serebryany
2014-06-29 0:25 ` dormando
2014-06-30 6:38 ` Eric Dumazet
2014-06-30 8:15 ` dormando
2014-06-30 8:30 ` Eric Dumazet
2014-07-08 1:41 ` dormando
2014-07-08 6:47 ` Eric Dumazet
2014-07-08 7:01 ` dormando
2014-07-16 21:03 ` dormando
2014-07-25 8:11 ` dormando
2014-06-30 8:26 ` [PATCH] ipv4: irq safe sk_dst_[re]set() and ipv4_sk_update_pmtu() fix Eric Dumazet
2014-07-01 6:43 ` David Miller
2014-06-11 22:39 ` [PATCH] ipv4: fix a race in ip4_datagram_release_cb() David Miller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.DEB.2.02.1406101726390.11647@dtop \
--to=dormando@rydia.net \
--cc=alexei.starovoitov@gmail.com \
--cc=brutus@google.com \
--cc=davem@davemloft.net \
--cc=dvyukov@google.com \
--cc=edumazet@google.com \
--cc=eric.dumazet@gmail.com \
--cc=kcc@google.com \
--cc=larsbull@google.com \
--cc=maze@google.com \
--cc=netdev@vger.kernel.org \
--cc=paulmck@linux.vnet.ibm.com \
--cc=preobr@google.com \
--cc=steffen.klassert@secunet.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox