* [PATCH V2,nf 1/3] netfilter: conntrack: fix race between nf_conntrack proc read and hash resize
2016-07-03 5:18 [PATCH V2,nf 0/3] netfilter: conntrack: fix race condition associated with hash resize Liping Zhang
@ 2016-07-03 5:18 ` Liping Zhang
2016-07-03 5:18 ` [PATCH V2,nf 2/3] netfilter: cttimeout: unlink timeout obj again when hash resize happen Liping Zhang
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Liping Zhang @ 2016-07-03 5:18 UTC (permalink / raw)
To: pablo; +Cc: fw, netfilter-devel, Liping Zhang
From: Liping Zhang <liping.zhang@spreadtrum.com>
When we do "cat /proc/net/nf_conntrack", and meanwhile resize the conntrack
hash table via /sys/module/nf_conntrack/parameters/hashsize, race will
happen, because reader can observe a newly allocated hash but the old size
(or vice versa). So oops will happen like follows:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000017
IP: [<ffffffffa0418e21>] seq_print_acct+0x11/0x50 [nf_conntrack]
Call Trace:
[<ffffffffa0412f4e>] ? ct_seq_show+0x14e/0x340 [nf_conntrack]
[<ffffffff81261a1c>] seq_read+0x2cc/0x390
[<ffffffff812a8d62>] proc_reg_read+0x42/0x70
[<ffffffff8123bee7>] __vfs_read+0x37/0x130
[<ffffffff81347980>] ? security_file_permission+0xa0/0xc0
[<ffffffff8123cf75>] vfs_read+0x95/0x140
[<ffffffff8123e475>] SyS_read+0x55/0xc0
[<ffffffff817c2572>] entry_SYSCALL_64_fastpath+0x1a/0xa4
It is very easy to reproduce this kernel crash.
1. open one shell and input the following cmds:
while : ; do
echo $RANDOM > /sys/module/nf_conntrack/parameters/hashsize
done
2. open more shells and input the following cmds:
while : ; do
cat /proc/net/nf_conntrack
done
3. just wait a monent, oops will happen soon.
The solution in this patch is based on Florian's Commit 5e3c61f98175
("netfilter: conntrack: fix lookup race during hash resize"). And
add a wrapper function nf_conntrack_get_ht to get hash and hsize
suggested by Florian Westphal.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
---
V2: Add nf_conntrack_get_ht wrapper function. I think in ____nf_conntrack_find
and nf_conntrack_tuple_taken, we can also use nf_conntrack_get_ht to simplify
the source code, but it is unrelated to this bug fix. So I'd rather take
another patch to do this thing.
include/net/netfilter/nf_conntrack_core.h | 2 ++
net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c | 14 ++++++++++----
net/netfilter/nf_conntrack_core.c | 17 +++++++++++++++++
net/netfilter/nf_conntrack_standalone.c | 14 +++++++++-----
4 files changed, 38 insertions(+), 9 deletions(-)
diff --git a/include/net/netfilter/nf_conntrack_core.h b/include/net/netfilter/nf_conntrack_core.h
index 3e2f332..79d7ac5 100644
--- a/include/net/netfilter/nf_conntrack_core.h
+++ b/include/net/netfilter/nf_conntrack_core.h
@@ -51,6 +51,8 @@ bool nf_ct_invert_tuple(struct nf_conntrack_tuple *inverse,
const struct nf_conntrack_l3proto *l3proto,
const struct nf_conntrack_l4proto *l4proto);
+void nf_conntrack_get_ht(struct hlist_nulls_head **hash, unsigned int *hsize);
+
/* Find a connection corresponding to a tuple. */
struct nf_conntrack_tuple_hash *
nf_conntrack_find_get(struct net *net,
diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
index c6f3c40..6392371 100644
--- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
+++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
@@ -26,6 +26,8 @@
struct ct_iter_state {
struct seq_net_private p;
+ struct hlist_nulls_head *hash;
+ unsigned int htable_size;
unsigned int bucket;
};
@@ -35,10 +37,10 @@ static struct hlist_nulls_node *ct_get_first(struct seq_file *seq)
struct hlist_nulls_node *n;
for (st->bucket = 0;
- st->bucket < nf_conntrack_htable_size;
+ st->bucket < st->htable_size;
st->bucket++) {
n = rcu_dereference(
- hlist_nulls_first_rcu(&nf_conntrack_hash[st->bucket]));
+ hlist_nulls_first_rcu(&st->hash[st->bucket]));
if (!is_a_nulls(n))
return n;
}
@@ -53,11 +55,11 @@ static struct hlist_nulls_node *ct_get_next(struct seq_file *seq,
head = rcu_dereference(hlist_nulls_next_rcu(head));
while (is_a_nulls(head)) {
if (likely(get_nulls_value(head) == st->bucket)) {
- if (++st->bucket >= nf_conntrack_htable_size)
+ if (++st->bucket >= st->htable_size)
return NULL;
}
head = rcu_dereference(
- hlist_nulls_first_rcu(&nf_conntrack_hash[st->bucket]));
+ hlist_nulls_first_rcu(&st->hash[st->bucket]));
}
return head;
}
@@ -75,7 +77,11 @@ static struct hlist_nulls_node *ct_get_idx(struct seq_file *seq, loff_t pos)
static void *ct_seq_start(struct seq_file *seq, loff_t *pos)
__acquires(RCU)
{
+ struct ct_iter_state *st = seq->private;
+
rcu_read_lock();
+
+ nf_conntrack_get_ht(&st->hash, &st->htable_size);
return ct_get_idx(seq, *pos);
}
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index f204274..2d2472b 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -466,6 +466,23 @@ nf_ct_key_equal(struct nf_conntrack_tuple_hash *h,
net_eq(net, nf_ct_net(ct));
}
+/* must be called with rcu read lock held */
+void nf_conntrack_get_ht(struct hlist_nulls_head **hash, unsigned int *hsize)
+{
+ struct hlist_nulls_head *hptr;
+ unsigned int sequence, hsz;
+
+ do {
+ sequence = read_seqcount_begin(&nf_conntrack_generation);
+ hsz = nf_conntrack_htable_size;
+ hptr = nf_conntrack_hash;
+ } while (read_seqcount_retry(&nf_conntrack_generation, sequence));
+
+ *hash = hptr;
+ *hsize = hsz;
+}
+EXPORT_SYMBOL_GPL(nf_conntrack_get_ht);
+
/*
* Warning :
* - Caller must take a reference on returned object
diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
index c026c47..f430b2f 100644
--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack_standalone.c
@@ -48,6 +48,8 @@ EXPORT_SYMBOL_GPL(print_tuple);
struct ct_iter_state {
struct seq_net_private p;
+ struct hlist_nulls_head *hash;
+ unsigned int htable_size;
unsigned int bucket;
u_int64_t time_now;
};
@@ -58,9 +60,10 @@ static struct hlist_nulls_node *ct_get_first(struct seq_file *seq)
struct hlist_nulls_node *n;
for (st->bucket = 0;
- st->bucket < nf_conntrack_htable_size;
+ st->bucket < st->htable_size;
st->bucket++) {
- n = rcu_dereference(hlist_nulls_first_rcu(&nf_conntrack_hash[st->bucket]));
+ n = rcu_dereference(
+ hlist_nulls_first_rcu(&st->hash[st->bucket]));
if (!is_a_nulls(n))
return n;
}
@@ -75,12 +78,11 @@ static struct hlist_nulls_node *ct_get_next(struct seq_file *seq,
head = rcu_dereference(hlist_nulls_next_rcu(head));
while (is_a_nulls(head)) {
if (likely(get_nulls_value(head) == st->bucket)) {
- if (++st->bucket >= nf_conntrack_htable_size)
+ if (++st->bucket >= st->htable_size)
return NULL;
}
head = rcu_dereference(
- hlist_nulls_first_rcu(
- &nf_conntrack_hash[st->bucket]));
+ hlist_nulls_first_rcu(&st->hash[st->bucket]));
}
return head;
}
@@ -102,6 +104,8 @@ static void *ct_seq_start(struct seq_file *seq, loff_t *pos)
st->time_now = ktime_get_real_ns();
rcu_read_lock();
+
+ nf_conntrack_get_ht(&st->hash, &st->htable_size);
return ct_get_idx(seq, *pos);
}
--
2.5.5
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related [flat|nested] 5+ messages in thread* [PATCH V2,nf 2/3] netfilter: cttimeout: unlink timeout obj again when hash resize happen
2016-07-03 5:18 [PATCH V2,nf 0/3] netfilter: conntrack: fix race condition associated with hash resize Liping Zhang
2016-07-03 5:18 ` [PATCH V2,nf 1/3] netfilter: conntrack: fix race between nf_conntrack proc read and " Liping Zhang
@ 2016-07-03 5:18 ` Liping Zhang
2016-07-03 5:18 ` [PATCH V2,nf 3/3] netfilter: nf_ct_helper: unlink helper " Liping Zhang
2016-07-11 10:09 ` [PATCH V2,nf 0/3] netfilter: conntrack: fix race condition associated with hash resize Pablo Neira Ayuso
3 siblings, 0 replies; 5+ messages in thread
From: Liping Zhang @ 2016-07-03 5:18 UTC (permalink / raw)
To: pablo; +Cc: fw, netfilter-devel, Liping Zhang
From: Liping Zhang <liping.zhang@spreadtrum.com>
Imagine such situation, nf_conntrack_htable_size now is 4096, we are doing
ctnl_untimeout, and iterate on 3000# bucket.
Meanwhile, another user try to reduce hash size to 2048, then all nf_conn
are removed to the new hashtable. When this hash resize operation finished,
we still try to itreate ct begin from 3000# bucket, find nothing to do and
just return.
We may miss unlinking some timeout objects. And later we will end up with
invalid references to timeout object that are already gone.
So when we find that hash resize happened, try to unlink timeout objects
from the 0# bucket again.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
---
V2: no need to use nf_conntrack_generation to check hash resize happen.
net/netfilter/nfnetlink_cttimeout.c | 20 ++++++++++++++------
1 file changed, 14 insertions(+), 6 deletions(-)
diff --git a/net/netfilter/nfnetlink_cttimeout.c b/net/netfilter/nfnetlink_cttimeout.c
index 3c84f14..4cdcd96 100644
--- a/net/netfilter/nfnetlink_cttimeout.c
+++ b/net/netfilter/nfnetlink_cttimeout.c
@@ -303,16 +303,24 @@ static void ctnl_untimeout(struct net *net, struct ctnl_timeout *timeout)
{
struct nf_conntrack_tuple_hash *h;
const struct hlist_nulls_node *nn;
+ unsigned int last_hsize;
+ spinlock_t *lock;
int i;
local_bh_disable();
- for (i = 0; i < nf_conntrack_htable_size; i++) {
- nf_conntrack_lock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
- if (i < nf_conntrack_htable_size) {
- hlist_nulls_for_each_entry(h, nn, &nf_conntrack_hash[i], hnnode)
- untimeout(h, timeout);
+restart:
+ last_hsize = nf_conntrack_htable_size;
+ for (i = 0; i < last_hsize; i++) {
+ lock = &nf_conntrack_locks[i % CONNTRACK_LOCKS];
+ nf_conntrack_lock(lock);
+ if (last_hsize != nf_conntrack_htable_size) {
+ spin_unlock(lock);
+ goto restart;
}
- spin_unlock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
+
+ hlist_nulls_for_each_entry(h, nn, &nf_conntrack_hash[i], hnnode)
+ untimeout(h, timeout);
+ spin_unlock(lock);
}
local_bh_enable();
}
--
2.5.5
^ permalink raw reply related [flat|nested] 5+ messages in thread* [PATCH V2,nf 3/3] netfilter: nf_ct_helper: unlink helper again when hash resize happen
2016-07-03 5:18 [PATCH V2,nf 0/3] netfilter: conntrack: fix race condition associated with hash resize Liping Zhang
2016-07-03 5:18 ` [PATCH V2,nf 1/3] netfilter: conntrack: fix race between nf_conntrack proc read and " Liping Zhang
2016-07-03 5:18 ` [PATCH V2,nf 2/3] netfilter: cttimeout: unlink timeout obj again when hash resize happen Liping Zhang
@ 2016-07-03 5:18 ` Liping Zhang
2016-07-11 10:09 ` [PATCH V2,nf 0/3] netfilter: conntrack: fix race condition associated with hash resize Pablo Neira Ayuso
3 siblings, 0 replies; 5+ messages in thread
From: Liping Zhang @ 2016-07-03 5:18 UTC (permalink / raw)
To: pablo; +Cc: fw, netfilter-devel, Liping Zhang
From: Liping Zhang <liping.zhang@spreadtrum.com>
Similar to ctnl_untimeout, when hash resize happened, we should try
to do unhelp from the 0# bucket again.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
---
V2: no need to use nf_conntrack_generation to check hash resize happen.
net/netfilter/nf_conntrack_helper.c | 20 ++++++++++++++------
1 file changed, 14 insertions(+), 6 deletions(-)
diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
index 196cb39..db63a79 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -392,6 +392,8 @@ static void __nf_conntrack_helper_unregister(struct nf_conntrack_helper *me,
struct nf_conntrack_expect *exp;
const struct hlist_node *next;
const struct hlist_nulls_node *nn;
+ unsigned int last_hsize;
+ spinlock_t *lock;
unsigned int i;
int cpu;
@@ -422,14 +424,20 @@ static void __nf_conntrack_helper_unregister(struct nf_conntrack_helper *me,
unhelp(h, me);
spin_unlock_bh(&pcpu->lock);
}
+
local_bh_disable();
- for (i = 0; i < nf_conntrack_htable_size; i++) {
- nf_conntrack_lock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
- if (i < nf_conntrack_htable_size) {
- hlist_nulls_for_each_entry(h, nn, &nf_conntrack_hash[i], hnnode)
- unhelp(h, me);
+restart:
+ last_hsize = nf_conntrack_htable_size;
+ for (i = 0; i < last_hsize; i++) {
+ lock = &nf_conntrack_locks[i % CONNTRACK_LOCKS];
+ nf_conntrack_lock(lock);
+ if (last_hsize != nf_conntrack_htable_size) {
+ spin_unlock(lock);
+ goto restart;
}
- spin_unlock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
+ hlist_nulls_for_each_entry(h, nn, &nf_conntrack_hash[i], hnnode)
+ unhelp(h, me);
+ spin_unlock(lock);
}
local_bh_enable();
}
--
2.5.5
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [PATCH V2,nf 0/3] netfilter: conntrack: fix race condition associated with hash resize
2016-07-03 5:18 [PATCH V2,nf 0/3] netfilter: conntrack: fix race condition associated with hash resize Liping Zhang
` (2 preceding siblings ...)
2016-07-03 5:18 ` [PATCH V2,nf 3/3] netfilter: nf_ct_helper: unlink helper " Liping Zhang
@ 2016-07-11 10:09 ` Pablo Neira Ayuso
3 siblings, 0 replies; 5+ messages in thread
From: Pablo Neira Ayuso @ 2016-07-11 10:09 UTC (permalink / raw)
To: Liping Zhang; +Cc: fw, netfilter-devel, Liping Zhang
On Sun, Jul 03, 2016 at 01:18:42PM +0800, Liping Zhang wrote:
> From: Liping Zhang <liping.zhang@spreadtrum.com>
>
> When user adjust the hash size via /sys/module/nf_conntrack/parameters/hashsize,
> something will break because race condition happened.
>
> This patch set aim to fix these bugs.
>
> When we do "cat /proc/net/nf_conntrack", and at the same time do hash resize,
> nf_conntrack_htable_size and nf_conntrack_hash may become unrelated if we
> read them separately, so oops will happen. Fix this in patch #1.
>
> When we do unlink help or timeout objects, and at the same time do hash resize,
> we may miss unlinking some objects, later we will end up with invalid references.
> Fix this in patch #2 and #3.
Series applied to nf-next. We're already a bit late in the rc cycle
and this has been broken since the beginning, so I'm inclined to
follow the nf-next path.
^ permalink raw reply [flat|nested] 5+ messages in thread