From: Andrew Vagin <avagin@parallels.com>
To: Eric Dumazet <eric.dumazet@gmail.com>, Florian Westphal <fw@strlen.de>
Cc: Andrey Vagin <avagin@openvz.org>,
netfilter-devel@vger.kernel.org, netfilter@vger.kernel.org,
coreteam@netfilter.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, vvs@openvz.org,
Pablo Neira Ayuso <pablo@netfilter.org>,
Patrick McHardy <kaber@trash.net>,
Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>,
"David S. Miller" <davem@davemloft.net>,
Cyrill Gorcunov <gorcunov@openvz.org>
Subject: Re: [PATCH] netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get (v3)
Date: Tue, 14 Jan 2014 14:51:47 +0400 [thread overview]
Message-ID: <20140114105147.GA14538@paralelels.com> (raw)
In-Reply-To: <1389558074.31367.187.camel@edumazet-glaptop2.roam.corp.google.com>
On Sun, Jan 12, 2014 at 12:21:14PM -0800, Eric Dumazet wrote:
> On Sun, 2014-01-12 at 21:50 +0400, Andrey Vagin wrote:
> > Lets look at destroy_conntrack:
> >
> > hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
> > ...
> > nf_conntrack_free(ct)
> > kmem_cache_free(net->ct.nf_conntrack_cachep, ct);
> >
> > net->ct.nf_conntrack_cachep is created with SLAB_DESTROY_BY_RCU.
> >
> > The hash is protected by rcu, so readers look up conntracks without
> > locks.
> > A conntrack is removed from the hash, but in this moment a few readers
> > still can use the conntrack. Then this conntrack is released and another
> > thread creates conntrack with the same address and the equal tuple.
> > After this a reader starts to validate the conntrack:
> > * It's not dying, because a new conntrack was created
> > * nf_ct_tuple_equal() returns true.
> ...
>
>
> > v2: move nf_ct_is_confirmed into the unlikely() annotation
> > v3: Eric suggested to fix refcnt, so that it becomes zero before adding
> > in a hash, but we can't find a way how to do that. Another way is to
> > interpret the confirm bit as part of a search key and check it in
> > ____nf_conntrack_find() too.
> >
> > Cc: Eric Dumazet <eric.dumazet@gmail.com>
> > Cc: Florian Westphal <fw@strlen.de>
> > Cc: Pablo Neira Ayuso <pablo@netfilter.org>
> > Cc: Patrick McHardy <kaber@trash.net>
> > Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
> > Cc: "David S. Miller" <davem@davemloft.net>
> > Cc: Cyrill Gorcunov <gorcunov@openvz.org>
> > Signed-off-by: Andrey Vagin <avagin@openvz.org>
> > ---
>
> Acked-by: Eric Dumazet <edumazet@google.com>
>
> Thanks Andrey !
>
Eh, looks like this path is incomplete too:(
I think we can't set a reference counter for objects which is allocated
from a SLAB_DESTROY_BY_RCU cache. Look at the following backtrace.
cpu1 cpu2
ct = ____nf_conntrack_find()
destroy_conntrack
atomic_inc_not_zero(ct)
__nf_conntrack_alloc
atomic_set(&ct->ct_general.use, 1);
if (!nf_ct_key_equal(h, tuple, zone))
nf_ct_put(ct);
destroy_conntrack(ct) !!!!
/* continues to use the conntrack */
Did I miss something again?
I think __nf_conntrack_alloc must use atomic_inc instead of
atomic_set. And we must be sure, that the first object from a new page is
zeroized.
I am talking about this, because after this patch a bug was triggered from
another place:
<2>[67096.759353] kernel BUG at net/netfilter/nf_conntrack_core.c:211!
<4>[67096.759371] invalid opcode: 0000 [#1] SMP
<4>[67096.759385] last sysfs file: /sys/devices/virtual/block/md0/md/metadata_version
<4>[67096.759414] CPU 2
<4>[67096.759422] Modules linked in: xt_comment sch_sfq cls_fw sch_htb pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop simfs xt_string xt_hashlimit xt_recent xt_length xt_hl xt_tcpmss xt_TCPMSS xt_multiport xt_limit xt_dscp vzevent coretemp cpufreq_ondemand acpi_cpufreq freq_table mperf 8021q garp stp llc ipt_REJECT iptable_filter iptable_mangle xt_NOTRACK iptable_raw iptable_nat ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state ip6table_filter ip6table_raw xt_MARK ip6table_mangle ip6_tables ext4 jbd2 tun ip_gre ipip vzethdev vznetdev vzrst nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 ipv6 vzcpt nf_conntrack vzdquota vzmon vziolimit vzdev tunnel4 nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc tpm_tis tpm tpm_bios microcode serio_raw i2c_i801 sg iTCO_wdt iTCO_vendor_support e1000e
ext3 jbd mbcache raid1 sd_mod crc_t10dif ata_piix ahci pata_acpi ata_generic i915 drm_kms_helper drm i2c_algo_bit i2c_core video output dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
<4>[67096.759801]
<4>[67096.759837] Pid: 498649, comm: atdd veid: 666 Tainted: G C --------------- 2.6.32-042stab084.18 #1 042stab084_18 /DQ45CB
<4>[67096.759932] RIP: 0010:[<ffffffffa03d99ac>] [<ffffffffa03d99ac>] destroy_conntrack+0x15c/0x190 [nf_conntrack]
<4>[67096.760032] RSP: 0000:ffff88001ae378b8 EFLAGS: 00010246
<4>[67096.760075] RAX: 0000000000000000 RBX: ffff8801a57ac928 RCX: 0000000000065000
<4>[67096.760123] RDX: 000000000000f603 RSI: 0000000000000006 RDI: ffff8801a57ac928
<4>[67096.760174] RBP: ffff88001ae378d8 R08: 0000000000000002 R09: ffff8802373b06e0
<4>[67096.760221] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88023928c080
<4>[67096.760255] R13: ffff880237e8c000 R14: 0000000000000002 R15: 0000000000000002
<4>[67096.760255] FS: 0000000000000000(0000) GS:ffff880028300000(0063) knlGS:00000000b63afbb0
<4>[67096.760255] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
<4>[67096.760255] CR2: 00000000b74f44c0 CR3: 00000000b89c6000 CR4: 00000000000007e0
<4>[67096.760255] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[67096.760255] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[67096.760255] Process atdd (pid: 498649, veid: 666, threadinfo ffff88001ae36000, task ffff88001deaa980)
<4>[67096.760255] Stack:
<4>[67096.760255] ffff88001ae378e8 ffff88001ae37988 ffff88023928c080 0000000000000003
<4>[67096.760255] <d> ffff88001ae378e8 ffffffff814844a7 ffff88001ae37908 ffffffffa03d9bb5
<4>[67096.760255] <d> ffff88012dcae580 ffff88023928c080 ffff88001ae379e8 ffffffffa03d9fb2
<4>[67096.760255] Call Trace:
<4>[67096.760255] [<ffffffff814844a7>] nf_conntrack_destroy+0x17/0x30
<4>[67096.760255] [<ffffffffa03d9bb5>] nf_conntrack_find_get+0x85/0x130 [nf_conntrack]
<4>[67096.760255] [<ffffffffa03d9fb2>] nf_conntrack_in+0x352/0xb60 [nf_conntrack]
<4>[67096.760255] [<ffffffffa048c771>] ipv4_conntrack_local+0x51/0x60 [nf_conntrack_ipv4]
<4>[67096.760255] [<ffffffff81484419>] nf_iterate+0x69/0xb0
<4>[67096.760255] [<ffffffff814b5b00>] ? dst_output+0x0/0x20
<4>[67096.760255] [<ffffffff814845d4>] nf_hook_slow+0x74/0x110
<4>[67096.760255] [<ffffffff814b5b00>] ? dst_output+0x0/0x20
<4>[67096.760255] [<ffffffff814b66d5>] raw_sendmsg+0x775/0x910
<4>[67096.760255] [<ffffffff8104c5a8>] ? flush_tlb_others_ipi+0x128/0x130
<4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255] [<ffffffff814c136a>] inet_sendmsg+0x4a/0xb0
<4>[67096.760255] [<ffffffff81444e93>] ? sock_sendmsg+0x13/0x140
<4>[67096.760255] [<ffffffff81444f97>] sock_sendmsg+0x117/0x140
<4>[67096.760255] [<ffffffff8102e299>] ? native_smp_send_reschedule+0x49/0x60
<4>[67096.760255] [<ffffffff81519beb>] ? _spin_unlock_bh+0x1b/0x20
<4>[67096.760255] [<ffffffff8109d930>] ? autoremove_wake_function+0x0/0x40
<4>[67096.760255] [<ffffffff814960f0>] ? do_ip_setsockopt+0x90/0xd80
<4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255] [<ffffffff814457c9>] sys_sendto+0x139/0x190
<4>[67096.760255] [<ffffffff810efa77>] ? audit_syscall_entry+0x1d7/0x200
<4>[67096.760255] [<ffffffff810ef7c5>] ? __audit_syscall_exit+0x265/0x290
<4>[67096.760255] [<ffffffff81474daf>] compat_sys_socketcall+0x13f/0x210
<4>[67096.760255] [<ffffffff8104dea3>] ia32_sysret+0x0/0x5
<4>[67096.760255] Code: 0b ab 0a e1 eb b7 f6 05 34 f8 00 e2 20 74 b7 80 3d f0 b0 00 00 00 74 ae 48 89 de 48 c7 c7 20 16 3e a0 31 c0 e8 05 ca 13 e1 eb 9b <0f> 0b eb fe f6 05 0b f8 00 e2 20 0f 84 db fe ff ff 80 3d eb b0
<1>[67096.760255] RIP [<ffffffffa03d99ac>] destroy_conntrack+0x15c/0x190 [nf_conntrack]
<4>[67096.760255] RSP <ffff88001ae378b8>
WARNING: multiple messages have this Message-ID (diff)
From: Andrew Vagin <avagin@parallels.com>
To: Eric Dumazet <eric.dumazet@gmail.com>, Florian Westphal <fw@strlen.de>
Cc: Andrey Vagin <avagin@openvz.org>,
<netfilter-devel@vger.kernel.org>, <netfilter@vger.kernel.org>,
<coreteam@netfilter.org>, <netdev@vger.kernel.org>,
<linux-kernel@vger.kernel.org>, <vvs@openvz.org>,
Pablo Neira Ayuso <pablo@netfilter.org>,
Patrick McHardy <kaber@trash.net>,
Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>,
"David S. Miller" <davem@davemloft.net>,
Cyrill Gorcunov <gorcunov@openvz.org>
Subject: Re: [PATCH] netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get (v3)
Date: Tue, 14 Jan 2014 14:51:47 +0400 [thread overview]
Message-ID: <20140114105147.GA14538@paralelels.com> (raw)
In-Reply-To: <1389558074.31367.187.camel@edumazet-glaptop2.roam.corp.google.com>
On Sun, Jan 12, 2014 at 12:21:14PM -0800, Eric Dumazet wrote:
> On Sun, 2014-01-12 at 21:50 +0400, Andrey Vagin wrote:
> > Lets look at destroy_conntrack:
> >
> > hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
> > ...
> > nf_conntrack_free(ct)
> > kmem_cache_free(net->ct.nf_conntrack_cachep, ct);
> >
> > net->ct.nf_conntrack_cachep is created with SLAB_DESTROY_BY_RCU.
> >
> > The hash is protected by rcu, so readers look up conntracks without
> > locks.
> > A conntrack is removed from the hash, but in this moment a few readers
> > still can use the conntrack. Then this conntrack is released and another
> > thread creates conntrack with the same address and the equal tuple.
> > After this a reader starts to validate the conntrack:
> > * It's not dying, because a new conntrack was created
> > * nf_ct_tuple_equal() returns true.
> ...
>
>
> > v2: move nf_ct_is_confirmed into the unlikely() annotation
> > v3: Eric suggested to fix refcnt, so that it becomes zero before adding
> > in a hash, but we can't find a way how to do that. Another way is to
> > interpret the confirm bit as part of a search key and check it in
> > ____nf_conntrack_find() too.
> >
> > Cc: Eric Dumazet <eric.dumazet@gmail.com>
> > Cc: Florian Westphal <fw@strlen.de>
> > Cc: Pablo Neira Ayuso <pablo@netfilter.org>
> > Cc: Patrick McHardy <kaber@trash.net>
> > Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
> > Cc: "David S. Miller" <davem@davemloft.net>
> > Cc: Cyrill Gorcunov <gorcunov@openvz.org>
> > Signed-off-by: Andrey Vagin <avagin@openvz.org>
> > ---
>
> Acked-by: Eric Dumazet <edumazet@google.com>
>
> Thanks Andrey !
>
Eh, looks like this path is incomplete too:(
I think we can't set a reference counter for objects which is allocated
from a SLAB_DESTROY_BY_RCU cache. Look at the following backtrace.
cpu1 cpu2
ct = ____nf_conntrack_find()
destroy_conntrack
atomic_inc_not_zero(ct)
__nf_conntrack_alloc
atomic_set(&ct->ct_general.use, 1);
if (!nf_ct_key_equal(h, tuple, zone))
nf_ct_put(ct);
destroy_conntrack(ct) !!!!
/* continues to use the conntrack */
Did I miss something again?
I think __nf_conntrack_alloc must use atomic_inc instead of
atomic_set. And we must be sure, that the first object from a new page is
zeroized.
I am talking about this, because after this patch a bug was triggered from
another place:
<2>[67096.759353] kernel BUG at net/netfilter/nf_conntrack_core.c:211!
<4>[67096.759371] invalid opcode: 0000 [#1] SMP
<4>[67096.759385] last sysfs file: /sys/devices/virtual/block/md0/md/metadata_version
<4>[67096.759414] CPU 2
<4>[67096.759422] Modules linked in: xt_comment sch_sfq cls_fw sch_htb pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop simfs xt_string xt_hashlimit xt_recent xt_length xt_hl xt_tcpmss xt_TCPMSS xt_multiport xt_limit xt_dscp vzevent coretemp cpufreq_ondemand acpi_cpufreq freq_table mperf 8021q garp stp llc ipt_REJECT iptable_filter iptable_mangle xt_NOTRACK iptable_raw iptable_nat ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state ip6table_filter ip6table_raw xt_MARK ip6table_mangle ip6_tables ext4 jbd2 tun ip_gre ipip vzethdev vznetdev vzrst nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 ipv6 vzcpt nf_conntrack vzdquota vzmon vziolimit vzdev tunnel4 nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc tpm_tis tpm tpm_bios microcode serio_raw i2c_i801 sg iTCO_wdt iTCO_vendor_support e1000e
ext3 jbd mbcache raid1 sd_mod crc_t10dif ata_piix ahci pata_acpi ata_generic i915 drm_kms_helper drm i2c_algo_bit i2c_core video output dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
<4>[67096.759801]
<4>[67096.759837] Pid: 498649, comm: atdd veid: 666 Tainted: G C --------------- 2.6.32-042stab084.18 #1 042stab084_18 /DQ45CB
<4>[67096.759932] RIP: 0010:[<ffffffffa03d99ac>] [<ffffffffa03d99ac>] destroy_conntrack+0x15c/0x190 [nf_conntrack]
<4>[67096.760032] RSP: 0000:ffff88001ae378b8 EFLAGS: 00010246
<4>[67096.760075] RAX: 0000000000000000 RBX: ffff8801a57ac928 RCX: 0000000000065000
<4>[67096.760123] RDX: 000000000000f603 RSI: 0000000000000006 RDI: ffff8801a57ac928
<4>[67096.760174] RBP: ffff88001ae378d8 R08: 0000000000000002 R09: ffff8802373b06e0
<4>[67096.760221] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88023928c080
<4>[67096.760255] R13: ffff880237e8c000 R14: 0000000000000002 R15: 0000000000000002
<4>[67096.760255] FS: 0000000000000000(0000) GS:ffff880028300000(0063) knlGS:00000000b63afbb0
<4>[67096.760255] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
<4>[67096.760255] CR2: 00000000b74f44c0 CR3: 00000000b89c6000 CR4: 00000000000007e0
<4>[67096.760255] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[67096.760255] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[67096.760255] Process atdd (pid: 498649, veid: 666, threadinfo ffff88001ae36000, task ffff88001deaa980)
<4>[67096.760255] Stack:
<4>[67096.760255] ffff88001ae378e8 ffff88001ae37988 ffff88023928c080 0000000000000003
<4>[67096.760255] <d> ffff88001ae378e8 ffffffff814844a7 ffff88001ae37908 ffffffffa03d9bb5
<4>[67096.760255] <d> ffff88012dcae580 ffff88023928c080 ffff88001ae379e8 ffffffffa03d9fb2
<4>[67096.760255] Call Trace:
<4>[67096.760255] [<ffffffff814844a7>] nf_conntrack_destroy+0x17/0x30
<4>[67096.760255] [<ffffffffa03d9bb5>] nf_conntrack_find_get+0x85/0x130 [nf_conntrack]
<4>[67096.760255] [<ffffffffa03d9fb2>] nf_conntrack_in+0x352/0xb60 [nf_conntrack]
<4>[67096.760255] [<ffffffffa048c771>] ipv4_conntrack_local+0x51/0x60 [nf_conntrack_ipv4]
<4>[67096.760255] [<ffffffff81484419>] nf_iterate+0x69/0xb0
<4>[67096.760255] [<ffffffff814b5b00>] ? dst_output+0x0/0x20
<4>[67096.760255] [<ffffffff814845d4>] nf_hook_slow+0x74/0x110
<4>[67096.760255] [<ffffffff814b5b00>] ? dst_output+0x0/0x20
<4>[67096.760255] [<ffffffff814b66d5>] raw_sendmsg+0x775/0x910
<4>[67096.760255] [<ffffffff8104c5a8>] ? flush_tlb_others_ipi+0x128/0x130
<4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255] [<ffffffff814c136a>] inet_sendmsg+0x4a/0xb0
<4>[67096.760255] [<ffffffff81444e93>] ? sock_sendmsg+0x13/0x140
<4>[67096.760255] [<ffffffff81444f97>] sock_sendmsg+0x117/0x140
<4>[67096.760255] [<ffffffff8102e299>] ? native_smp_send_reschedule+0x49/0x60
<4>[67096.760255] [<ffffffff81519beb>] ? _spin_unlock_bh+0x1b/0x20
<4>[67096.760255] [<ffffffff8109d930>] ? autoremove_wake_function+0x0/0x40
<4>[67096.760255] [<ffffffff814960f0>] ? do_ip_setsockopt+0x90/0xd80
<4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255] [<ffffffff814457c9>] sys_sendto+0x139/0x190
<4>[67096.760255] [<ffffffff810efa77>] ? audit_syscall_entry+0x1d7/0x200
<4>[67096.760255] [<ffffffff810ef7c5>] ? __audit_syscall_exit+0x265/0x290
<4>[67096.760255] [<ffffffff81474daf>] compat_sys_socketcall+0x13f/0x210
<4>[67096.760255] [<ffffffff8104dea3>] ia32_sysret+0x0/0x5
<4>[67096.760255] Code: 0b ab 0a e1 eb b7 f6 05 34 f8 00 e2 20 74 b7 80 3d f0 b0 00 00 00 74 ae 48 89 de 48 c7 c7 20 16 3e a0 31 c0 e8 05 ca 13 e1 eb 9b <0f> 0b eb fe f6 05 0b f8 00 e2 20 0f 84 db fe ff ff 80 3d eb b0
<1>[67096.760255] RIP [<ffffffffa03d99ac>] destroy_conntrack+0x15c/0x190 [nf_conntrack]
<4>[67096.760255] RSP <ffff88001ae378b8>
WARNING: multiple messages have this Message-ID (diff)
From: Andrew Vagin <avagin@parallels.com>
To: Eric Dumazet <eric.dumazet@gmail.com>, Florian Westphal <fw@strlen.de>
Cc: Andrey Vagin <avagin@openvz.org>,
<netfilter-devel@vger.kernel.org>, <netfilter@vger.kernel.org>,
<coreteam@netfilter.org>, <netdev@vger.kernel.org>,
<linux-kernel@vger.kernel.org>, <vvs@openvz.org>,
Pablo Neira Ayuso <pablo@netfilter.org>,
Patrick McHardy <kaber@trash.net>,
Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>,
"David S. Miller" <davem@davemloft.net>,
Cyrill Gorcunov <gorcunov@openvz.org>
Subject: Re: [PATCH] netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get (v3)
Date: Tue, 14 Jan 2014 14:51:47 +0400 [thread overview]
Message-ID: <20140114105147.GA14538@paralelels.com> (raw)
In-Reply-To: <1389558074.31367.187.camel@edumazet-glaptop2.roam.corp.google.com>
On Sun, Jan 12, 2014 at 12:21:14PM -0800, Eric Dumazet wrote:
> On Sun, 2014-01-12 at 21:50 +0400, Andrey Vagin wrote:
> > Lets look at destroy_conntrack:
> >
> > hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
> > ...
> > nf_conntrack_free(ct)
> > kmem_cache_free(net->ct.nf_conntrack_cachep, ct);
> >
> > net->ct.nf_conntrack_cachep is created with SLAB_DESTROY_BY_RCU.
> >
> > The hash is protected by rcu, so readers look up conntracks without
> > locks.
> > A conntrack is removed from the hash, but in this moment a few readers
> > still can use the conntrack. Then this conntrack is released and another
> > thread creates conntrack with the same address and the equal tuple.
> > After this a reader starts to validate the conntrack:
> > * It's not dying, because a new conntrack was created
> > * nf_ct_tuple_equal() returns true.
> ...
>
>
> > v2: move nf_ct_is_confirmed into the unlikely() annotation
> > v3: Eric suggested to fix refcnt, so that it becomes zero before adding
> > in a hash, but we can't find a way how to do that. Another way is to
> > interpret the confirm bit as part of a search key and check it in
> > ____nf_conntrack_find() too.
> >
> > Cc: Eric Dumazet <eric.dumazet@gmail.com>
> > Cc: Florian Westphal <fw@strlen.de>
> > Cc: Pablo Neira Ayuso <pablo@netfilter.org>
> > Cc: Patrick McHardy <kaber@trash.net>
> > Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
> > Cc: "David S. Miller" <davem@davemloft.net>
> > Cc: Cyrill Gorcunov <gorcunov@openvz.org>
> > Signed-off-by: Andrey Vagin <avagin@openvz.org>
> > ---
>
> Acked-by: Eric Dumazet <edumazet@google.com>
>
> Thanks Andrey !
>
Eh, looks like this path is incomplete too:(
I think we can't set a reference counter for objects which is allocated
from a SLAB_DESTROY_BY_RCU cache. Look at the following backtrace.
cpu1 cpu2
ct = ____nf_conntrack_find()
destroy_conntrack
atomic_inc_not_zero(ct)
__nf_conntrack_alloc
atomic_set(&ct->ct_general.use, 1);
if (!nf_ct_key_equal(h, tuple, zone))
nf_ct_put(ct);
destroy_conntrack(ct) !!!!
/* continues to use the conntrack */
Did I miss something again?
I think __nf_conntrack_alloc must use atomic_inc instead of
atomic_set. And we must be sure, that the first object from a new page is
zeroized.
I am talking about this, because after this patch a bug was triggered from
another place:
<2>[67096.759353] kernel BUG at net/netfilter/nf_conntrack_core.c:211!
<4>[67096.759371] invalid opcode: 0000 [#1] SMP
<4>[67096.759385] last sysfs file: /sys/devices/virtual/block/md0/md/metadata_version
<4>[67096.759414] CPU 2
<4>[67096.759422] Modules linked in: xt_comment sch_sfq cls_fw sch_htb pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop simfs xt_string xt_hashlimit xt_recent xt_length xt_hl xt_tcpmss xt_TCPMSS xt_multiport xt_limit xt_dscp vzevent coretemp cpufreq_ondemand acpi_cpufreq freq_table mperf 8021q garp stp llc ipt_REJECT iptable_filter iptable_mangle xt_NOTRACK iptable_raw iptable_nat ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state ip6table_filter ip6table_raw xt_MARK ip6table_mangle ip6_tables ext4 jbd2 tun ip_gre ipip vzethdev vznetdev vzrst nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 ipv6 vzcpt nf_conntrack vzdquota vzmon vziolimit vzdev tunnel4 nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc tpm_tis tpm tpm_bios microcode serio_raw i2c_i801 sg iTCO_wdt iTCO_vendor_support e1000e ext3 jbd mbcache raid1 sd_mod crc_t10dif ata_piix ahci pata_acpi ata_generic i915 drm_kms_helper drm i2c_algo_bit i2c_core video output dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
<4>[67096.759801]
<4>[67096.759837] Pid: 498649, comm: atdd veid: 666 Tainted: G C --------------- 2.6.32-042stab084.18 #1 042stab084_18 /DQ45CB
<4>[67096.759932] RIP: 0010:[<ffffffffa03d99ac>] [<ffffffffa03d99ac>] destroy_conntrack+0x15c/0x190 [nf_conntrack]
<4>[67096.760032] RSP: 0000:ffff88001ae378b8 EFLAGS: 00010246
<4>[67096.760075] RAX: 0000000000000000 RBX: ffff8801a57ac928 RCX: 0000000000065000
<4>[67096.760123] RDX: 000000000000f603 RSI: 0000000000000006 RDI: ffff8801a57ac928
<4>[67096.760174] RBP: ffff88001ae378d8 R08: 0000000000000002 R09: ffff8802373b06e0
<4>[67096.760221] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88023928c080
<4>[67096.760255] R13: ffff880237e8c000 R14: 0000000000000002 R15: 0000000000000002
<4>[67096.760255] FS: 0000000000000000(0000) GS:ffff880028300000(0063) knlGS:00000000b63afbb0
<4>[67096.760255] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
<4>[67096.760255] CR2: 00000000b74f44c0 CR3: 00000000b89c6000 CR4: 00000000000007e0
<4>[67096.760255] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[67096.760255] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[67096.760255] Process atdd (pid: 498649, veid: 666, threadinfo ffff88001ae36000, task ffff88001deaa980)
<4>[67096.760255] Stack:
<4>[67096.760255] ffff88001ae378e8 ffff88001ae37988 ffff88023928c080 0000000000000003
<4>[67096.760255] <d> ffff88001ae378e8 ffffffff814844a7 ffff88001ae37908 ffffffffa03d9bb5
<4>[67096.760255] <d> ffff88012dcae580 ffff88023928c080 ffff88001ae379e8 ffffffffa03d9fb2
<4>[67096.760255] Call Trace:
<4>[67096.760255] [<ffffffff814844a7>] nf_conntrack_destroy+0x17/0x30
<4>[67096.760255] [<ffffffffa03d9bb5>] nf_conntrack_find_get+0x85/0x130 [nf_conntrack]
<4>[67096.760255] [<ffffffffa03d9fb2>] nf_conntrack_in+0x352/0xb60 [nf_conntrack]
<4>[67096.760255] [<ffffffffa048c771>] ipv4_conntrack_local+0x51/0x60 [nf_conntrack_ipv4]
<4>[67096.760255] [<ffffffff81484419>] nf_iterate+0x69/0xb0
<4>[67096.760255] [<ffffffff814b5b00>] ? dst_output+0x0/0x20
<4>[67096.760255] [<ffffffff814845d4>] nf_hook_slow+0x74/0x110
<4>[67096.760255] [<ffffffff814b5b00>] ? dst_output+0x0/0x20
<4>[67096.760255] [<ffffffff814b66d5>] raw_sendmsg+0x775/0x910
<4>[67096.760255] [<ffffffff8104c5a8>] ? flush_tlb_others_ipi+0x128/0x130
<4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255] [<ffffffff814c136a>] inet_sendmsg+0x4a/0xb0
<4>[67096.760255] [<ffffffff81444e93>] ? sock_sendmsg+0x13/0x140
<4>[67096.760255] [<ffffffff81444f97>] sock_sendmsg+0x117/0x140
<4>[67096.760255] [<ffffffff8102e299>] ? native_smp_send_reschedule+0x49/0x60
<4>[67096.760255] [<ffffffff81519beb>] ? _spin_unlock_bh+0x1b/0x20
<4>[67096.760255] [<ffffffff8109d930>] ? autoremove_wake_function+0x0/0x40
<4>[67096.760255] [<ffffffff814960f0>] ? do_ip_setsockopt+0x90/0xd80
<4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255] [<ffffffff814457c9>] sys_sendto+0x139/0x190
<4>[67096.760255] [<ffffffff810efa77>] ? audit_syscall_entry+0x1d7/0x200
<4>[67096.760255] [<ffffffff810ef7c5>] ? __audit_syscall_exit+0x265/0x290
<4>[67096.760255] [<ffffffff81474daf>] compat_sys_socketcall+0x13f/0x210
<4>[67096.760255] [<ffffffff8104dea3>] ia32_sysret+0x0/0x5
<4>[67096.760255] Code: 0b ab 0a e1 eb b7 f6 05 34 f8 00 e2 20 74 b7 80 3d f0 b0 00 00 00 74 ae 48 89 de 48 c7 c7 20 16 3e a0 31 c0 e8 05 ca 13 e1 eb 9b <0f> 0b eb fe f6 05 0b f8 00 e2 20 0f 84 db fe ff ff 80 3d eb b0
<1>[67096.760255] RIP [<ffffffffa03d99ac>] destroy_conntrack+0x15c/0x190 [nf_conntrack]
<4>[67096.760255] RSP <ffff88001ae378b8>
next prev parent reply other threads:[~2014-01-14 10:51 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-01-07 10:31 [PATCH] netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get Andrey Vagin
2014-01-07 11:42 ` Vasily Averin
2014-01-07 15:08 ` Eric Dumazet
2014-01-07 15:25 ` Florian Westphal
2014-01-08 13:42 ` Eric Dumazet
2014-01-08 14:04 ` Florian Westphal
2014-01-08 17:31 ` Eric Dumazet
2014-01-08 20:18 ` Florian Westphal
2014-01-08 20:23 ` Florian Westphal
2014-01-09 20:32 ` Andrew Vagin
2014-01-09 20:32 ` Andrew Vagin
2014-01-09 20:56 ` Florian Westphal
2014-01-09 21:07 ` Andrew Vagin
2014-01-09 21:07 ` Andrew Vagin
2014-01-09 21:26 ` Florian Westphal
2014-01-09 5:24 ` Andrew Vagin
2014-01-09 15:23 ` Eric Dumazet
2014-01-09 21:46 ` Andrey Wagin
2014-01-08 13:17 ` [PATCH] netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get (v2) Andrey Vagin
2014-01-08 13:47 ` Eric Dumazet
2014-01-12 17:50 ` [PATCH] netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get (v3) Andrey Vagin
2014-01-12 20:21 ` Eric Dumazet
2014-01-14 10:51 ` Andrew Vagin [this message]
2014-01-14 10:51 ` Andrew Vagin
2014-01-14 10:51 ` Andrew Vagin
2014-01-14 11:10 ` Andrey Wagin
2014-01-14 14:36 ` Eric Dumazet
2014-01-14 17:35 ` [PATCH] [RFC] netfilter: nf_conntrack: don't relase a conntrack with non-zero refcnt Andrey Vagin
2014-01-14 17:44 ` Cyrill Gorcunov
2014-01-14 18:53 ` Florian Westphal
2014-01-15 18:08 ` Andrew Vagin
2014-01-15 18:08 ` Andrew Vagin
2014-01-16 9:23 ` Florian Westphal
2014-02-02 23:30 ` Pablo Neira Ayuso
2014-02-03 13:59 ` Andrew Vagin
2014-02-03 13:59 ` Andrew Vagin
2014-02-03 16:22 ` Eric Dumazet
2014-01-27 13:44 ` Andrew Vagin
2014-01-27 13:44 ` Andrew Vagin
2014-01-29 19:21 ` [PATCH] netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get (v3) Pablo Neira Ayuso
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140114105147.GA14538@paralelels.com \
--to=avagin@parallels.com \
--cc=avagin@openvz.org \
--cc=coreteam@netfilter.org \
--cc=davem@davemloft.net \
--cc=eric.dumazet@gmail.com \
--cc=fw@strlen.de \
--cc=gorcunov@openvz.org \
--cc=kaber@trash.net \
--cc=kadlec@blackhole.kfki.hu \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=netfilter-devel@vger.kernel.org \
--cc=netfilter@vger.kernel.org \
--cc=pablo@netfilter.org \
--cc=vvs@openvz.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.