* [PATCH nf 0/3] netfilter: conntrack: fix race condition associated with hash resize
@ 2016-07-02 10:59 Liping Zhang
2016-07-02 10:59 ` [PATCH nf 1/3] netfilter: conntrack: fix race between nf_conntrack proc read and " Liping Zhang
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Liping Zhang @ 2016-07-02 10:59 UTC (permalink / raw)
To: pablo; +Cc: netfilter-devel, Liping Zhang
From: Liping Zhang <liping.zhang@spreadtrum.com>
When user adjust the hash size via /sys/module/nf_conntrack/parameters/hashsize,
something will break because race condition happened.
This patch set aim to fix these bugs.
When we do "cat /proc/net/nf_conntrack", and at the same time do hash resize,
nf_conntrack_htable_size and nf_conntrack_hash may become unrelated if we
read them separately, so oops will happen. Fix this in patch #1.
When we do unlink help or timeout objects, and at the same time do hash resize,
we may miss unlinking some objects, later we will end up with invalid references.
Fix this in patch #2 and #3.
Liping Zhang (3):
netfilter: conntrack: fix race between nf_conntrack proc read and hash
resize
netfilter: cttimeout: unlink timeout obj again when hash resize happen
netfilter: nf_ct_helper: unlink helper again when hash resize happen
include/net/netfilter/nf_conntrack_core.h | 1 +
.../netfilter/nf_conntrack_l3proto_ipv4_compat.c | 20 ++++++++++++++++----
net/netfilter/nf_conntrack_core.c | 4 +++-
net/netfilter/nf_conntrack_helper.c | 14 ++++++++++++--
net/netfilter/nf_conntrack_standalone.c | 20 +++++++++++++++-----
net/netfilter/nfnetlink_cttimeout.c | 14 ++++++++++++--
6 files changed, 59 insertions(+), 14 deletions(-)
--
2.5.5
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH nf 1/3] netfilter: conntrack: fix race between nf_conntrack proc read and hash resize
2016-07-02 10:59 [PATCH nf 0/3] netfilter: conntrack: fix race condition associated with hash resize Liping Zhang
@ 2016-07-02 10:59 ` Liping Zhang
2016-07-02 17:46 ` Florian Westphal
2016-07-02 10:59 ` [PATCH nf 2/3] netfilter: cttimeout: unlink timeout obj again when hash resize happen Liping Zhang
2016-07-02 10:59 ` [PATCH nf 3/3] netfilter: nf_ct_helper: unlink helper " Liping Zhang
2 siblings, 1 reply; 6+ messages in thread
From: Liping Zhang @ 2016-07-02 10:59 UTC (permalink / raw)
To: pablo; +Cc: netfilter-devel, Liping Zhang
From: Liping Zhang <liping.zhang@spreadtrum.com>
When we do "cat /proc/net/nf_conntrack", and meanwhile resize the conntrack
hash table via /sys/module/nf_conntrack/parameters/hashsize, race will
happen, because reader can observe a newly allocated hash but the old size
(or vice versa). So oops will happen like follows:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000017
IP: [<ffffffffa0418e21>] seq_print_acct+0x11/0x50 [nf_conntrack]
Call Trace:
[<ffffffffa0412f4e>] ? ct_seq_show+0x14e/0x340 [nf_conntrack]
[<ffffffff81261a1c>] seq_read+0x2cc/0x390
[<ffffffff812a8d62>] proc_reg_read+0x42/0x70
[<ffffffff8123bee7>] __vfs_read+0x37/0x130
[<ffffffff81347980>] ? security_file_permission+0xa0/0xc0
[<ffffffff8123cf75>] vfs_read+0x95/0x140
[<ffffffff8123e475>] SyS_read+0x55/0xc0
[<ffffffff817c2572>] entry_SYSCALL_64_fastpath+0x1a/0xa4
It is very easy to reproduce this kernel crash.
1. open one shell and input the following cmds:
while : ; do
echo $RANDOM > hashsize
done
2. open more shells and input the following cmds:
while : ; do
cat /proc/net/nf_conntrack
done
3. just wait a monent, oops will happen soon.
The solution in this patch is based on Florian's Commit 5e3c61f98175
("netfilter: conntrack: fix lookup race during hash resize").
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
---
include/net/netfilter/nf_conntrack_core.h | 1 +
.../netfilter/nf_conntrack_l3proto_ipv4_compat.c | 20 ++++++++++++++++----
net/netfilter/nf_conntrack_core.c | 4 +++-
net/netfilter/nf_conntrack_standalone.c | 20 +++++++++++++++-----
4 files changed, 35 insertions(+), 10 deletions(-)
diff --git a/include/net/netfilter/nf_conntrack_core.h b/include/net/netfilter/nf_conntrack_core.h
index 3e2f332..4f6453a 100644
--- a/include/net/netfilter/nf_conntrack_core.h
+++ b/include/net/netfilter/nf_conntrack_core.h
@@ -82,6 +82,7 @@ print_tuple(struct seq_file *s, const struct nf_conntrack_tuple *tuple,
#define CONNTRACK_LOCKS 1024
extern struct hlist_nulls_head *nf_conntrack_hash;
+extern seqcount_t nf_conntrack_generation;
extern spinlock_t nf_conntrack_locks[CONNTRACK_LOCKS];
void nf_conntrack_lock(spinlock_t *lock);
diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
index c6f3c40..584899f 100644
--- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
+++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
@@ -26,6 +26,8 @@
struct ct_iter_state {
struct seq_net_private p;
+ struct hlist_nulls_head *hash;
+ unsigned int htable_size;
unsigned int bucket;
};
@@ -35,10 +37,10 @@ static struct hlist_nulls_node *ct_get_first(struct seq_file *seq)
struct hlist_nulls_node *n;
for (st->bucket = 0;
- st->bucket < nf_conntrack_htable_size;
+ st->bucket < st->htable_size;
st->bucket++) {
n = rcu_dereference(
- hlist_nulls_first_rcu(&nf_conntrack_hash[st->bucket]));
+ hlist_nulls_first_rcu(&st->hash[st->bucket]));
if (!is_a_nulls(n))
return n;
}
@@ -53,11 +55,11 @@ static struct hlist_nulls_node *ct_get_next(struct seq_file *seq,
head = rcu_dereference(hlist_nulls_next_rcu(head));
while (is_a_nulls(head)) {
if (likely(get_nulls_value(head) == st->bucket)) {
- if (++st->bucket >= nf_conntrack_htable_size)
+ if (++st->bucket >= st->htable_size)
return NULL;
}
head = rcu_dereference(
- hlist_nulls_first_rcu(&nf_conntrack_hash[st->bucket]));
+ hlist_nulls_first_rcu(&st->hash[st->bucket]));
}
return head;
}
@@ -75,7 +77,17 @@ static struct hlist_nulls_node *ct_get_idx(struct seq_file *seq, loff_t pos)
static void *ct_seq_start(struct seq_file *seq, loff_t *pos)
__acquires(RCU)
{
+ struct ct_iter_state *st = seq->private;
+ unsigned int sequence;
+
rcu_read_lock();
+
+ do {
+ sequence = read_seqcount_begin(&nf_conntrack_generation);
+ st->htable_size = nf_conntrack_htable_size;
+ st->hash = nf_conntrack_hash;
+ } while (read_seqcount_retry(&nf_conntrack_generation, sequence));
+
return ct_get_idx(seq, *pos);
}
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index f204274..1c39697 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -72,9 +72,11 @@ EXPORT_SYMBOL_GPL(nf_conntrack_expect_lock);
struct hlist_nulls_head *nf_conntrack_hash __read_mostly;
EXPORT_SYMBOL_GPL(nf_conntrack_hash);
+seqcount_t nf_conntrack_generation __read_mostly;
+EXPORT_SYMBOL_GPL(nf_conntrack_generation);
+
static __read_mostly struct kmem_cache *nf_conntrack_cachep;
static __read_mostly spinlock_t nf_conntrack_locks_all_lock;
-static __read_mostly seqcount_t nf_conntrack_generation;
static __read_mostly DEFINE_SPINLOCK(nf_conntrack_locks_all_lock);
static __read_mostly bool nf_conntrack_locks_all;
diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
index c026c47..2cf484b 100644
--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack_standalone.c
@@ -48,6 +48,8 @@ EXPORT_SYMBOL_GPL(print_tuple);
struct ct_iter_state {
struct seq_net_private p;
+ struct hlist_nulls_head *hash;
+ unsigned int htable_size;
unsigned int bucket;
u_int64_t time_now;
};
@@ -58,9 +60,10 @@ static struct hlist_nulls_node *ct_get_first(struct seq_file *seq)
struct hlist_nulls_node *n;
for (st->bucket = 0;
- st->bucket < nf_conntrack_htable_size;
+ st->bucket < st->htable_size;
st->bucket++) {
- n = rcu_dereference(hlist_nulls_first_rcu(&nf_conntrack_hash[st->bucket]));
+ n = rcu_dereference(
+ hlist_nulls_first_rcu(&st->hash[st->bucket]));
if (!is_a_nulls(n))
return n;
}
@@ -75,12 +78,11 @@ static struct hlist_nulls_node *ct_get_next(struct seq_file *seq,
head = rcu_dereference(hlist_nulls_next_rcu(head));
while (is_a_nulls(head)) {
if (likely(get_nulls_value(head) == st->bucket)) {
- if (++st->bucket >= nf_conntrack_htable_size)
+ if (++st->bucket >= st->htable_size)
return NULL;
}
head = rcu_dereference(
- hlist_nulls_first_rcu(
- &nf_conntrack_hash[st->bucket]));
+ hlist_nulls_first_rcu(&st->hash[st->bucket]));
}
return head;
}
@@ -99,9 +101,17 @@ static void *ct_seq_start(struct seq_file *seq, loff_t *pos)
__acquires(RCU)
{
struct ct_iter_state *st = seq->private;
+ unsigned int sequence;
st->time_now = ktime_get_real_ns();
rcu_read_lock();
+
+ do {
+ sequence = read_seqcount_begin(&nf_conntrack_generation);
+ st->htable_size = nf_conntrack_htable_size;
+ st->hash = nf_conntrack_hash;
+ } while (read_seqcount_retry(&nf_conntrack_generation, sequence));
+
return ct_get_idx(seq, *pos);
}
--
2.5.5
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH nf 2/3] netfilter: cttimeout: unlink timeout obj again when hash resize happen
2016-07-02 10:59 [PATCH nf 0/3] netfilter: conntrack: fix race condition associated with hash resize Liping Zhang
2016-07-02 10:59 ` [PATCH nf 1/3] netfilter: conntrack: fix race between nf_conntrack proc read and " Liping Zhang
@ 2016-07-02 10:59 ` Liping Zhang
2016-07-02 10:59 ` [PATCH nf 3/3] netfilter: nf_ct_helper: unlink helper " Liping Zhang
2 siblings, 0 replies; 6+ messages in thread
From: Liping Zhang @ 2016-07-02 10:59 UTC (permalink / raw)
To: pablo; +Cc: netfilter-devel, Liping Zhang
From: Liping Zhang <liping.zhang@spreadtrum.com>
Imagine such situation, nf_conntrack_htable_size now is 4096, we are doing
ctnl_untimeout, and iterate on 3000# bucket.
Meanwhile, another user try to reduce hash size to 2048, then all nf_conn
are removed to the new hashtable. When this hash resize operation finished,
we still try to itreate ct begin from 3000# bucket, find nothing to do and
just return.
Then we may miss unlinking some timeout objects. And later we will end up
with invalid references to timeout object that are already gone.
So when we find that hash resize happened, try to unlink timeout objects
from the 0# bucket again.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
---
net/netfilter/nfnetlink_cttimeout.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/net/netfilter/nfnetlink_cttimeout.c b/net/netfilter/nfnetlink_cttimeout.c
index 3c84f14..1d0bc86 100644
--- a/net/netfilter/nfnetlink_cttimeout.c
+++ b/net/netfilter/nfnetlink_cttimeout.c
@@ -303,16 +303,26 @@ static void ctnl_untimeout(struct net *net, struct ctnl_timeout *timeout)
{
struct nf_conntrack_tuple_hash *h;
const struct hlist_nulls_node *nn;
+ unsigned int sequence;
+ spinlock_t *lock;
int i;
local_bh_disable();
+restart:
+ sequence = read_seqcount_begin(&nf_conntrack_generation);
for (i = 0; i < nf_conntrack_htable_size; i++) {
- nf_conntrack_lock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
+ lock = &nf_conntrack_locks[i % CONNTRACK_LOCKS];
+ nf_conntrack_lock(lock);
+ if (read_seqcount_retry(&nf_conntrack_generation, sequence)) {
+ spin_unlock(lock);
+ goto restart;
+ }
+
if (i < nf_conntrack_htable_size) {
hlist_nulls_for_each_entry(h, nn, &nf_conntrack_hash[i], hnnode)
untimeout(h, timeout);
}
- spin_unlock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
+ spin_unlock(lock);
}
local_bh_enable();
}
--
2.5.5
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH nf 3/3] netfilter: nf_ct_helper: unlink helper again when hash resize happen
2016-07-02 10:59 [PATCH nf 0/3] netfilter: conntrack: fix race condition associated with hash resize Liping Zhang
2016-07-02 10:59 ` [PATCH nf 1/3] netfilter: conntrack: fix race between nf_conntrack proc read and " Liping Zhang
2016-07-02 10:59 ` [PATCH nf 2/3] netfilter: cttimeout: unlink timeout obj again when hash resize happen Liping Zhang
@ 2016-07-02 10:59 ` Liping Zhang
2 siblings, 0 replies; 6+ messages in thread
From: Liping Zhang @ 2016-07-02 10:59 UTC (permalink / raw)
To: pablo; +Cc: netfilter-devel, Liping Zhang
From: Liping Zhang <liping.zhang@spreadtrum.com>
Similar to ctnl_untimeout, when hash resize happened, we should try
to do unhelp from the 0# bucket again.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
---
net/netfilter/nf_conntrack_helper.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
index 196cb39..9d02b75 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -392,6 +392,8 @@ static void __nf_conntrack_helper_unregister(struct nf_conntrack_helper *me,
struct nf_conntrack_expect *exp;
const struct hlist_node *next;
const struct hlist_nulls_node *nn;
+ unsigned int sequence;
+ spinlock_t *lock;
unsigned int i;
int cpu;
@@ -422,14 +424,22 @@ static void __nf_conntrack_helper_unregister(struct nf_conntrack_helper *me,
unhelp(h, me);
spin_unlock_bh(&pcpu->lock);
}
+
local_bh_disable();
+restart:
+ sequence = read_seqcount_begin(&nf_conntrack_generation);
for (i = 0; i < nf_conntrack_htable_size; i++) {
- nf_conntrack_lock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
+ lock = &nf_conntrack_locks[i % CONNTRACK_LOCKS];
+ nf_conntrack_lock(lock);
+ if (read_seqcount_retry(&nf_conntrack_generation, sequence)) {
+ spin_unlock(lock);
+ goto restart;
+ }
if (i < nf_conntrack_htable_size) {
hlist_nulls_for_each_entry(h, nn, &nf_conntrack_hash[i], hnnode)
unhelp(h, me);
}
- spin_unlock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
+ spin_unlock(lock);
}
local_bh_enable();
}
--
2.5.5
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH nf 1/3] netfilter: conntrack: fix race between nf_conntrack proc read and hash resize
2016-07-02 10:59 ` [PATCH nf 1/3] netfilter: conntrack: fix race between nf_conntrack proc read and " Liping Zhang
@ 2016-07-02 17:46 ` Florian Westphal
2016-07-03 2:22 ` Liping Zhang
0 siblings, 1 reply; 6+ messages in thread
From: Florian Westphal @ 2016-07-02 17:46 UTC (permalink / raw)
To: Liping Zhang; +Cc: pablo, netfilter-devel, Liping Zhang
Liping Zhang <zlpnobody@163.com> wrote:
> From: Liping Zhang <liping.zhang@spreadtrum.com>
>
> When we do "cat /proc/net/nf_conntrack", and meanwhile resize the conntrack
> hash table via /sys/module/nf_conntrack/parameters/hashsize, race will
> happen, because reader can observe a newly allocated hash but the old size
> (or vice versa). So oops will happen like follows:
>
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000017
> IP: [<ffffffffa0418e21>] seq_print_acct+0x11/0x50 [nf_conntrack]
> Call Trace:
> [<ffffffffa0412f4e>] ? ct_seq_show+0x14e/0x340 [nf_conntrack]
> [<ffffffff81261a1c>] seq_read+0x2cc/0x390
> [<ffffffff812a8d62>] proc_reg_read+0x42/0x70
> [<ffffffff8123bee7>] __vfs_read+0x37/0x130
> [<ffffffff81347980>] ? security_file_permission+0xa0/0xc0
> [<ffffffff8123cf75>] vfs_read+0x95/0x140
> [<ffffffff8123e475>] SyS_read+0x55/0xc0
> [<ffffffff817c2572>] entry_SYSCALL_64_fastpath+0x1a/0xa4
>
> It is very easy to reproduce this kernel crash.
> 1. open one shell and input the following cmds:
> while : ; do
> echo $RANDOM > hashsize
> done
> 2. open more shells and input the following cmds:
> while : ; do
> cat /proc/net/nf_conntrack
> done
> 3. just wait a monent, oops will happen soon.
Good catch, but ...
> diff --git a/include/net/netfilter/nf_conntrack_core.h b/include/net/netfilter/nf_conntrack_core.h
> index 3e2f332..4f6453a 100644
> --- a/include/net/netfilter/nf_conntrack_core.h
> +++ b/include/net/netfilter/nf_conntrack_core.h
> @@ -82,6 +82,7 @@ print_tuple(struct seq_file *s, const struct nf_conntrack_tuple *tuple,
> #define CONNTRACK_LOCKS 1024
>
> extern struct hlist_nulls_head *nf_conntrack_hash;
> +extern seqcount_t nf_conntrack_generation;
instead of this and the proliferation of this:
> + do {
> + sequence = read_seqcount_begin(&nf_conntrack_generation);
> + st->htable_size = nf_conntrack_htable_size;
> + st->hash = nf_conntrack_hash;
> + } while (read_seqcount_retry(&nf_conntrack_generation, sequence));
> +
> return ct_get_idx(seq, *pos);
> }
I think it might be better to do something like
/* must be called with rcu read lock held */
unsigned int nf_conntrack_get_ht(struct hlist_nulls_head *h,
unsigned int *buckets)
{
do {
s = read_seq ...
size = nf_conntrack_htable_size;
ptr = nf_conntrack_hash;
} while ...
*h = ptr;
*buckets = size;
return s;
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH nf 1/3] netfilter: conntrack: fix race between nf_conntrack proc read and hash resize
2016-07-02 17:46 ` Florian Westphal
@ 2016-07-03 2:22 ` Liping Zhang
0 siblings, 0 replies; 6+ messages in thread
From: Liping Zhang @ 2016-07-03 2:22 UTC (permalink / raw)
To: Florian Westphal; +Cc: pablo, netfilter-devel, Liping Zhang
>Good catch, but ...
>
>> diff --git a/include/net/netfilter/nf_conntrack_core.h b/include/net/netfilter/nf_conntrack_core.h
>> index 3e2f332..4f6453a 100644
>> --- a/include/net/netfilter/nf_conntrack_core.h
>> +++ b/include/net/netfilter/nf_conntrack_core.h
>> @@ -82,6 +82,7 @@ print_tuple(struct seq_file *s, const struct nf_conntrack_tuple *tuple,
>> #define CONNTRACK_LOCKS 1024
>>
>> extern struct hlist_nulls_head *nf_conntrack_hash;
>> +extern seqcount_t nf_conntrack_generation;
>
>instead of this and the proliferation of this:
>
>> + do {
>> + sequence = read_seqcount_begin(&nf_conntrack_generation);
>> + st->htable_size = nf_conntrack_htable_size;
>> + st->hash = nf_conntrack_hash;
>> + } while (read_seqcount_retry(&nf_conntrack_generation, sequence));
>> +
>> return ct_get_idx(seq, *pos);
>> }
>
>I think it might be better to do something like
>
>/* must be called with rcu read lock held */
>unsigned int nf_conntrack_get_ht(struct hlist_nulls_head *h,
> unsigned int *buckets)
>{
> do {
> s = read_seq ...
> size = nf_conntrack_htable_size;
> ptr = nf_conntrack_hash;
> } while ...
>
> *h = ptr;
> *buckets = size;
>
> return s;
Agree.
And I also find there's no need to use nf_conntrack_generation in my patch #2 and #3.
Will send V2 later.
Thanks
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2016-07-03 2:37 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-07-02 10:59 [PATCH nf 0/3] netfilter: conntrack: fix race condition associated with hash resize Liping Zhang
2016-07-02 10:59 ` [PATCH nf 1/3] netfilter: conntrack: fix race between nf_conntrack proc read and " Liping Zhang
2016-07-02 17:46 ` Florian Westphal
2016-07-03 2:22 ` Liping Zhang
2016-07-02 10:59 ` [PATCH nf 2/3] netfilter: cttimeout: unlink timeout obj again when hash resize happen Liping Zhang
2016-07-02 10:59 ` [PATCH nf 3/3] netfilter: nf_ct_helper: unlink helper " Liping Zhang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).