From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Florian Westphal <fw@strlen.de>,
Jesper Dangaard Brouer <brouer@redhat.com>,
"David S. Miller" <davem@davemloft.net>,
Nikolay Aleksandrov <nikolay@redhat.com>
Subject: [PATCH 3.10 07/41] net: fix for a race condition in the inet frag code
Date: Fri, 11 Apr 2014 09:09:38 -0700
Message-ID: <20140411160933.378896169@linuxfoundation.org>
In-Reply-To: <20140411160932.865173041@linuxfoundation.org>
3.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nikolay Aleksandrov <nikolay@redhat.com>
[ Upstream commit 24b9bf43e93e0edd89072da51cf1fab95fc69dec ]
I stumbled upon this very serious bug while hunting for another one.
It is a very subtle race condition between inet_frag_evictor,
inet_frag_intern and the IPv4/6 frag_queue and expire functions
(basically the users of inet_frag_kill/inet_frag_put).
What happens is that after a fragment has been added to the hash chain
in inet_frag_intern, but before it has been added to the lru_list
(inet_frag_lru_add), it may get deleted, either by an expired timer (if
the system load is high or the timeout sufficiently low) or by the
frag_queue function for different reasons. Once the late lru_list add
happens, it is only a matter of time before the evictor reaches a piece
of memory which has already been freed, leading to a number of
different bugs depending on what is left there.
I've been able to trigger this on both IPv4 and IPv6 (which is expected,
as the frag code is the same), but it's been much more difficult to
trigger on IPv4 due to the protocol differences in how fragments
are treated.
The setup I used to reproduce this is: 2 machines with 4 x 10G cards
bonded in a round-robin (RR) bond, so the same flow can be seen on
multiple cards at the same time. Then I used multiple instances of
ping/ping6 to generate fragmented packets and flood the machines with
them while running other processes to load the attacked machine.
It is very important to have the _same flow_ coming in on multiple CPUs
concurrently. Usually the attacked machine would die in less than 30
minutes. If configured properly to have many evictor calls and timeouts,
it could happen in 10 minutes or so.
An important point to make is that any caller (frag_queue or timer) of
inet_frag_kill will remove both the timer refcount and the
original/guarding refcount thus removing everything that's keeping the
frag from being freed at the next inet_frag_put. All of this could
happen before the frag was ever added to the LRU list, then it gets
added and the evictor uses a freed fragment.
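
The reference counting described above can be modelled in a few lines
of userspace C. This is only a sketch of the bookkeeping, not kernel
code: struct frag_model and the model_* helpers are hypothetical names,
and the freed flag stands in for actually releasing the memory.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a frag queue; not the kernel struct. */
struct frag_model {
	int  refcnt;       /* stands in for qp->refcnt */
	bool timer_armed;  /* a pending timer holds one reference */
	bool complete;     /* set once the queue has been killed */
	bool freed;        /* set instead of really freeing memory */
};

/* A new queue holds the original/guarding reference plus the timer's. */
static void model_init(struct frag_model *q)
{
	q->refcnt = 2;
	q->timer_armed = true;
	q->complete = false;
	q->freed = false;
}

/* A lookup that finds the queue in the hash takes a reference. */
static void model_find(struct frag_model *q)
{
	q->refcnt++;
}

/* Models the kill step: drop the timer and the guarding reference. */
static void model_kill(struct frag_model *q)
{
	if (q->timer_armed) {
		q->timer_armed = false;
		q->refcnt--;
	}
	if (!q->complete) {
		q->complete = true;
		q->refcnt--;
	}
}

/* Models the put step: drop the caller's reference, free on zero. */
static void model_put(struct frag_model *q)
{
	assert(q->refcnt > 0);
	if (--q->refcnt == 0)
		q->freed = true;
}
```

Calling model_init, model_find, model_kill and model_put in that order
leaves only the finder's reference after the kill, so the put that
follows is the one that frees the queue, whether or not the queue ever
reached the LRU list.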
An example for IPv6: a fragment is being added and has just been
inserted in the hash, the hash lock has been released, but
inet_frag_lru_add has not executed yet (or has not been able to obtain
the lru lock). Another, overlapping fragment for the same flow arrives
at a different CPU, which finds the entry in the hash. Since the new
fragment is overlapping, that CPU drops the entry by invoking
inet_frag_kill, thus removing all guarding refcounts, and afterwards
frees it by invoking inet_frag_put, which removes the last refcount
added previously by inet_frag_find. Then inet_frag_lru_add gets executed
by inet_frag_intern and we have a freed fragment in the lru_list.
The fix is simple: move the lru_add under the hash chain locked region,
so that when a removing function is called it has to wait for the
fragment to be added to the lru_list, and only then removes it. This
works because the hash chain removal is done before the lru_list one,
and there is no longer a window between the two list adds in which the
frag can get dropped. With this fix applied I couldn't kill the same
machine in 24 hours with the same setup.
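
The effect of the reordering can be illustrated with a deterministic,
single-threaded model. It does not exercise real concurrency: the
"remover" is called by hand at the point where the race window sits,
and struct order_model and the model_* functions are hypothetical
names used only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of where one frag queue is linked. */
struct order_model {
	bool in_hash;
	bool on_lru;
	bool freed;   /* set instead of really freeing memory */
};

/*
 * Remover (timer or frag_queue path): it can only act on a queue it
 * finds in the hash. It unlinks from the hash, then from the LRU
 * (a no-op if the queue is not there yet), then frees.
 */
static bool model_remove(struct order_model *q)
{
	if (!q->in_hash)
		return false;
	q->in_hash = false;
	q->on_lru = false;
	q->freed = true;
	return true;
}

/* Old order: hash add, unlock, [window], LRU add. */
static void model_intern_old(struct order_model *q, bool remover_in_window)
{
	assert(!q->in_hash);
	q->in_hash = true;      /* done under the chain lock */
	/* chain lock released here: a remover may now run */
	if (remover_in_window)
		model_remove(q);
	q->on_lru = true;       /* late LRU add, after the unlock */
}

/* Fixed order: both adds happen before the chain lock is released. */
static void model_intern_fixed(struct order_model *q, bool remover_after)
{
	assert(!q->in_hash);
	q->in_hash = true;
	q->on_lru = true;
	/* chain lock released here: the remover had to wait until now */
	if (remover_after)
		model_remove(q);
}

/* True if the LRU still points at a queue that was already freed. */
static bool model_lru_has_freed_entry(const struct order_model *q)
{
	return q->on_lru && q->freed;
}
```

With the old order and a remover in the window, the queue ends up both
freed and on the LRU. With the fixed order the remover always sees the
queue on both lists and takes it off both.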
Fixes: 3ef0eb0db4bf ("net: frag, move LRU list maintenance outside of rwlock")
CC: Florian Westphal <fw@strlen.de>
CC: Jesper Dangaard Brouer <brouer@redhat.com>
CC: David S. Miller <davem@davemloft.net>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/ipv4/inet_fragment.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -283,9 +283,10 @@ static struct inet_frag_queue *inet_frag
 	atomic_inc(&qp->refcnt);
 	hlist_add_head(&qp->list, &hb->chain);
+	inet_frag_lru_add(nf, qp);
 	spin_unlock(&hb->chain_lock);
 	read_unlock(&f->lock);
-	inet_frag_lru_add(nf, qp);
+
 	return qp;
 }
Thread overview: 45+ messages
2014-04-11 16:09 [PATCH 3.10 00/41] 3.10.37-stable review Greg Kroah-Hartman
2014-04-11 16:09 ` [PATCH 3.10 01/41] selinux: correctly label /proc inodes in use before the policy is loaded Greg Kroah-Hartman
2014-04-11 16:09 ` [PATCH 3.10 02/41] powernow-k6: disable cache when changing frequency Greg Kroah-Hartman
2014-04-11 16:09 ` [PATCH 3.10 03/41] powernow-k6: correctly initialize default parameters Greg Kroah-Hartman
2014-04-11 16:09 ` [PATCH 3.10 04/41] powernow-k6: reorder frequencies Greg Kroah-Hartman
2014-04-11 16:09 ` [PATCH 3.10 05/41] kbuild: fix make headers_install when path is too long Greg Kroah-Hartman
2014-04-11 16:09 ` [PATCH 3.10 06/41] cpuidle: Check the result of cpuidle_get_driver() against NULL Greg Kroah-Hartman
2014-04-11 16:09 ` Greg Kroah-Hartman [this message]
2014-04-11 16:09 ` [PATCH 3.10 08/41] net: sctp: fix skb leakage in COOKIE ECHO path of chunk->auth_chunk Greg Kroah-Hartman
2014-04-11 16:09 ` [PATCH 3.10 09/41] bridge: multicast: add sanity check for query source addresses Greg Kroah-Hartman
2014-04-11 16:09 ` [PATCH 3.10 10/41] inet: frag: make sure forced eviction removes all frags Greg Kroah-Hartman
2014-04-11 16:09 ` [PATCH 3.10 11/41] net: unix: non blocking recvmsg() should not return -EINTR Greg Kroah-Hartman
2014-04-11 16:21 ` Rainer Weikusat
2014-04-11 16:09 ` [PATCH 3.10 12/41] ipv6: Fix exthdrs offload registration Greg Kroah-Hartman
2014-04-11 16:09 ` [PATCH 3.10 13/41] ipv6: dont set DST_NOCOUNT for remotely added routes Greg Kroah-Hartman
2014-04-11 16:09 ` [PATCH 3.10 14/41] vlan: Set correct source MAC address with TX VLAN offload enabled Greg Kroah-Hartman
2014-04-11 16:09 ` [PATCH 3.10 15/41] tcp: tcp_release_cb() should release socket ownership Greg Kroah-Hartman
2014-04-11 16:09 ` [PATCH 3.10 16/41] net: socket: error on a negative msg_namelen Greg Kroah-Hartman
2014-04-11 16:09 ` [PATCH 3.10 17/41] ipv6: Avoid unnecessary temporary addresses being generated Greg Kroah-Hartman
2014-04-11 16:09 ` [PATCH 3.10 18/41] ipv6: ip6_append_data_mtu do not handle the mtu of the second fragment properly Greg Kroah-Hartman
2014-04-11 16:09 ` [PATCH 3.10 19/41] vxlan: fix potential NULL dereference in arp_reduce() Greg Kroah-Hartman
2014-04-11 16:09 ` [PATCH 3.10 20/41] rtnetlink: fix fdb notification flags Greg Kroah-Hartman
2014-04-11 16:09 ` [PATCH 3.10 21/41] ipmr: fix mfc " Greg Kroah-Hartman
2014-04-11 16:09 ` [PATCH 3.10 22/41] ip6mr: " Greg Kroah-Hartman
2014-04-11 16:09 ` [PATCH 3.10 23/41] netpoll: fix the skb check in pkt_is_ns Greg Kroah-Hartman
2014-04-11 16:09 ` [PATCH 3.10 24/41] tg3: Do not include vlan acceleration features in vlan_features Greg Kroah-Hartman
2014-04-11 16:09 ` [PATCH 3.10 25/41] usbnet: include wait queue head in device structure Greg Kroah-Hartman
2014-04-11 16:09 ` [PATCH 3.10 26/41] vlan: Set hard_header_len according to available acceleration Greg Kroah-Hartman
2014-04-11 16:09 ` [PATCH 3.10 27/41] vhost: fix total length when packets are too short Greg Kroah-Hartman
2014-04-11 16:09 ` [PATCH 3.10 28/41] vhost: validate vhost_get_vq_desc return value Greg Kroah-Hartman
2014-04-11 16:10 ` [PATCH 3.10 29/41] xen-netback: remove pointless clause from if statement Greg Kroah-Hartman
2014-04-11 16:10 ` [PATCH 3.10 30/41] ipv6: some ipv6 statistic counters failed to disable bh Greg Kroah-Hartman
2014-04-11 16:10 ` [PATCH 3.10 31/41] netlink: dont compare the nul-termination in nla_strcmp Greg Kroah-Hartman
2014-04-11 16:10 ` [PATCH 3.10 32/41] isdnloop: Validate NUL-terminated strings from user Greg Kroah-Hartman
2014-04-11 16:10 ` [PATCH 3.10 33/41] isdnloop: several buffer overflows Greg Kroah-Hartman
2014-04-11 16:10 ` [PATCH 3.10 34/41] rds: prevent dereference of a NULL device in rds_iw_laddr_check Greg Kroah-Hartman
2014-04-11 16:10 ` [PATCH 3.10 35/41] ARC: [nsimosci] Change .dts to use generic 8250 UART Greg Kroah-Hartman
2014-04-11 16:10 ` [PATCH 3.10 36/41] ARC: [nsimosci] Unbork console Greg Kroah-Hartman
2014-04-11 16:10 ` [PATCH 3.10 37/41] futex: Allow architectures to skip futex_atomic_cmpxchg_inatomic() test Greg Kroah-Hartman
2014-04-11 16:10 ` [PATCH 3.10 38/41] m68k: Skip " Greg Kroah-Hartman
2014-04-11 16:10 ` [PATCH 3.10 39/41] crypto: ghash-clmulni-intel - use C implementation for setkey() Greg Kroah-Hartman
2014-04-11 16:10 ` [PATCH 3.10 40/41] cpufreq: Fix governor start/stop race condition Greg Kroah-Hartman
2014-04-11 16:10 ` [PATCH 3.10 41/41] cpufreq: Fix timer/workqueue corruption due to double queueing Greg Kroah-Hartman
2014-04-11 21:44 ` [PATCH 3.10 00/41] 3.10.37-stable review Guenter Roeck
2014-04-11 23:45 ` Shuah Khan