From: Vasily Averin <vvs@parallels.com>
To: "Michal Kubeček" <mkubecek@suse.cz>, "Phil Oester" <kernel@linuxace.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
netfilter-devel@vger.kernel.org, Andrey Vagin <avagin@openvz.org>,
Patrick McHardy <kaber@trash.net>
Subject: once again about crash in fins_appropriate_src()/same_src()
Date: Tue, 18 Feb 2014 14:16:30 +0400 [thread overview]
Message-ID: <530332FE.7040803@parallels.com> (raw)
Dear all,
I would return to problem fixed by Michal Kubeček
"[nf] netfilter: use RCU safe kfree for conntrack extensions"
https://bugzilla.netfilter.org/show_bug.cgi?id=714
I'm not sure that problem was caused by RCU race and I afraid it can be still not fixed.
Parallels got similar crash on RHEL6.u4-based kernel.
I've checked vmcore dump and found exactly the same situation:
we have zeroed struct nf_conn_nat in nat_bysource hlist, and crashed becasue nat->ct = NULL.
Our kernel did not included patch from Michal,
however I've found several strange facts:
- I've found old alive conntrack that have reference to extension with nf_conn_nat,
conntrack have strange status 0xe. I.e. have NOT set any NAT bits, but have CONFIRMED bit set
- our entry seems was not removed by RCU, all references in nat_bysource hlist looks correct
- last unloaded module was nf_nat -- but it is loaded back
I.e. it does not look like problem was caused by RCU race.
So I would like to ask, are you sure that problem in your cases was really caused by RCU race?
Did you tried to find conntrack used affected nf_conn_nat?
Did you observed probably conntrack destroy on neighbor CPUs?
I believe that nf_conn_nat could be zeroyed during nf_nat module unloading,
but I do not understand how this entry could return back into nat_byource list after module reloading.
Do you probably have some ideas?
You can find some more details about my case below.
<1>[6972396.347616] BUG: unable to handle kernel NULL pointer dereference at 000000000000003e
<1>[6972396.347733] IP: [<ffffffffa0243fa0>] nf_nat_setup_info+0x460/0x590 [nf_nat]
<4>[6972396.347802] PGD 0
<4>[6972396.347859] Oops: 0000 [#1] SMP
<4>[6972396.347920] last sysfs file: /sys/devices/pci0000:00/0000:00:1c.5/0000:06:00.0/net/eth1/statistics/tx_bytes
<4>[6972396.348041] CPU 0
<4>[6972396.348048] Modules linked in: ipt_REDIRECT nf_nat_ftp nf_conntrack_ftp ipt_MASQUERADE iptable_nat nf_nat iptable_mangle ipt_LOG xt_DSCP 8021q garp stp llc cls_fw sch_cbq ip_vzredir(U) vzredir(U) vzcompat(U) nf_conntrack_ipv4
nf_defrag_ipv4 vznetdev ip6_vzprivnet(U) ip6_vzredir(U) ip6_vznetstat(U) ip_vzprivnet(U) vziolimit(U) vzfs(U) vzdquota vzcpt nfs lockd fscache auth_rpcgss nfs_acl sunrpc vzlinkdev(U) vzethdev vzevent vzlist(U) vzstat(U) vzmon ip_vznet
stat(U) vznetstat(U) vzdev xt_length xt_hl xt_tcpmss xt_TCPMSS iptable_filter xt_multiport xt_limit xt_dscp ipt_REJECT ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 sg mic
rocode serio_raw i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support e1000e i5100_edac edac_core ioatdma dca shpchp ext3 jbd mbcache sr_mod cdrom pata_it8213 sd_mod crc_t10dif pata_acpi ata_generic ata_piix aacraid dm_mirror dm_region_hash
dm_log dm_mod [last unloaded: nf_nat]
<4>[6972396.348586]
<4>[6972396.348586] Pid: 0, comm: swapper veid: 0 Not tainted 2.6.32-042stab083.2 #1 042stab083_2 Supermicro X7DCL/X7DCL
<4>[6972396.348586] RIP: 0010:[<ffffffffa0243fa0>] [<ffffffffa0243fa0>] nf_nat_setup_info+0x460/0x590 [nf_nat]
<4>[6972396.348586] RSP: 0018:ffff880028203700 EFLAGS: 00010286
<4>[6972396.348586] RAX: 0000000000000000 RBX: ffff8802d89c9b58 RCX: 000000005ef7b2de
<4>[6972396.348586] RDX: 0000000000000001 RSI: ffff8800282037e0 RDI: ffff880028203740
<4>[6972396.348586] RBP: ffff8800282037d0 R08: 00000000c0deecc2 R09: 0000000000000002
<4>[6972396.348586] R10: ffff8801345e5508 R11: ffff880028203918 R12: 0000000000000000
<4>[6972396.348586] R13: ffff8800282037e0 R14: ffff880028203740 R15: ffff8800c350a5f0
<4>[6972396.348586] FS: 0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
<4>[6972396.348586] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>[6972396.348586] CR2: 000000000000003e CR3: 00000002b6ada000 CR4: 00000000000407f0
<4>[6972396.348586] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[6972396.348586] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[6972396.348586] Process swapper (pid: 0, veid: 0, threadinfo ffffffff81a00000, task ffffffff81a8d020)
<4>[6972396.348586] Stack:
<4>[6972396.348586] ffff880028203740 ffffffff81b1a520 000000b200b20000 ffff88031a6606e0
<4>[6972396.348586] <d> 0000000028203740 ffff880000000008 ffffffff81299af0 ffff8800851e61c0
<4>[6972396.348586] <d> 00000000f7e6a8c0 0000000000000000 16e6a8c00002873a 0000000000000000
<4>[6972396.348586] Call Trace:
<4>[6972396.348586] <IRQ>
<4>[6972396.348586] [<ffffffff81299af0>] ? swiotlb_map_page+0x0/0xf0
<4>[6972396.348586] [<ffffffffa025e152>] masquerade_tg+0xd2/0xf0 [ipt_MASQUERADE]
<4>[6972396.348586] [<ffffffffa025232b>] ipt_do_table+0x3fb/0x6b0 [ip_tables]
<4>[6972396.348586] [<ffffffff8145c419>] ? dev_queue_xmit+0x169/0x560
<4>[6972396.348586] [<ffffffff8144c8a6>] ? skb_checksum+0x56/0x2d0
<4>[6972396.348586] [<ffffffffa0255284>] nf_nat_rule_find+0x24/0x80 [iptable_nat]
<4>[6972396.348586] [<ffffffffa0255521>] nf_nat_fn+0x111/0x260 [iptable_nat]
<4>[6972396.348586] [<ffffffffa0255798>] nf_nat_out+0x48/0xd0 [iptable_nat]
<4>[6972396.348586] [<ffffffff814841b9>] nf_iterate+0x69/0xb0
<4>[6972396.348586] [<ffffffff81494e70>] ? ip_finish_output+0x0/0x2f0
<4>[6972396.348586] [<ffffffff81484374>] nf_hook_slow+0x74/0x110
<4>[6972396.348586] [<ffffffff81494e70>] ? ip_finish_output+0x0/0x2f0
<4>[6972396.348586] [<ffffffff81490590>] ? ip_forward_finish+0x0/0x60
<4>[6972396.348586] [<ffffffff81495204>] ip_output+0xa4/0xc0
<4>[6972396.348586] [<ffffffff814905df>] ip_forward_finish+0x4f/0x60
<4>[6972396.348586] [<ffffffff81490788>] ip_forward+0x198/0x430
<4>[6972396.348586] [<ffffffff8148e8ed>] ip_rcv_finish+0x12d/0x440
<4>[6972396.348586] [<ffffffff8148ee84>] ip_rcv+0x284/0x360
<4>[6972396.348586] [<ffffffff8145651b>] __netif_receive_skb+0x34b/0x7b0
<4>[6972396.348586] [<ffffffff81458af8>] netif_receive_skb+0x58/0x60
<4>[6972396.348586] [<ffffffff81458c00>] napi_skb_finish+0x50/0x70
<4>[6972396.348586] [<ffffffff8145ae69>] napi_gro_receive+0x39/0x50
<4>[6972396.348586] [<ffffffffa013679f>] e1000_receive_skb+0x4f/0x90 [e1000e]
<4>[6972396.348586] [<ffffffffa0138d30>] e1000_clean_rx_irq+0x380/0x580 [e1000e]
<4>[6972396.348586] [<ffffffffa01412ed>] e1000e_poll+0x8d/0x37c [e1000e]
<4>[6972396.348586] [<ffffffff8145af90>] net_rx_action+0x110/0x300
<4>[6972396.348586] [<ffffffff810a2265>] ? __hrtimer_start_range_ns+0x1a5/0x470
<4>[6972396.348586] [<ffffffff8107ac8d>] __do_softirq+0x10d/0x250
<4>[6972396.348586] [<ffffffff810a84f9>] ? do_timer+0x329/0x590
<4>[6972396.348586] [<ffffffff8100c2ac>] call_softirq+0x1c/0x30
<4>[6972396.348586] [<ffffffff8100df05>] do_softirq+0x65/0xa0
<4>[6972396.348586] [<ffffffff8107aaad>] irq_exit+0xcd/0xd0
<4>[6972396.348586] [<ffffffff81520465>] do_IRQ+0x75/0xf0
<4>[6972396.348586] [<ffffffff8100ba93>] ret_from_intr+0x0/0x11
<4>[6972396.348586] <EOI>
<4>[6972396.348586] [<ffffffff81014967>] ? mwait_idle+0x77/0xd0
<4>[6972396.348586] [<ffffffff8151c9fa>] ? atomic_notifier_call_chain+0x1a/0x20
<4>[6972396.348586] [<ffffffff8100a026>] cpu_idle+0xb6/0x110
<4>[6972396.348586] [<ffffffff814fc5e5>] rest_init+0x85/0x90
<4>[6972396.348586] [<ffffffff81c32f6e>] start_kernel+0x412/0x41e
<4>[6972396.348586] [<ffffffff81c3233a>] x86_64_start_reservations+0x125/0x129
<4>[6972396.348586] [<ffffffff81c32438>] x86_64_start_kernel+0xfa/0x109
<4>[6972396.348586] Code: a0 02 00 00 4c 8b 3e 4d 85 ff 75 1b e9 10 01 00 00 66 2e 0f 1f 84 00 00 00 00 00 4d 8b 3f 4d 85 ff 0f 84 84 00 00 00 49 8b 47 20 <38> 50 3e 75 eb 8b 48 18 3b 4d a0 75 e3 0f b7 48 28 66 3b 4d b0
<1>[6972396.348586] RIP [<ffffffffa0243fa0>] nf_nat_setup_info+0x460/0x590 [nf_nat]
<4>[6972396.348586] RSP <ffff880028203700>
<4>[6972396.348586] CR2: 000000000000003e
find_appropriate_src()
{
hlist_for_each_entry_rcu(nat, n, &net->ipv4.nat_bysource[h], bysource) {
ct = nat->ct; ct = NULL
if (same_src(ct, tuple)) <<<< crashed on access to ct
nat = %r15 = ffff8800c350a5f0
crash> struct nf_conn_nat ffff8800c350a5f0
struct nf_conn_nat {
bysource = {
next = 0x0,
pprev = 0xffff88001a35cb70
},
seq = {{
correction_pos = 0,
offset_before = 0,
offset_after = 0
}, {
correction_pos = 0,
offset_before = 0,
offset_after = 0
}},
ct = 0x0,
help = {
nat_pptp_info = {
pns_call_id = 0,
pac_call_id = 0
}
},
masq_index = 0
}
according nf_ct_ext is used by ct ffff880212e81d50
crash> kmem ffff8800c350a5f0
CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE
ffff88031fc00140 size-128 128 183495 506280 16876 4k
FREE / [ALLOCATED]
[ffff8800c350a5c0] <<<<< nf_conn_nat is part of nf_ct_ext sreucture
crash> search ffff8800c350a5c0
ffff880212e81e70: ffff8800c350a5c0 <<< someone have reference to our nf_ct_ext
crash> kmem ffff880212e81e70
CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE
ffff88031dc00380 nf_conntrack_ffffffff81b1a520 312 799 1944 162 4k
FREE / [ALLOCATED]
[ffff880212e81d50] <<< it is alive conntrack
VvS: name of conntrack slab is correct, it is OpenVz specific, each network namespace have own slab for conntracks.
crash> nf_conn ffff880212e81d50
struct nf_conn
counter = 0x1
src: x.x.x.x, port 59116
dst: y.y.y.y, port 24000
status = 0xe, no NAT and NAT_DONE bits (?!?), set only IPS_SEEN_REPLY, IPS_ASSURED, IPS_CONFIRMED.
master = 0
timer -- will be expired in future
proto tcp -- seems there was some activity in this connection in past
tcp = {
seen = {{
td_end = 0x221b933,
td_maxend = 0x221f23b,
td_maxwin = 0xf7d3,
td_maxack = 0xdc9812bb,
td_scale = 0x6,
flags = 0x23
}, {
td_end = 0xdc9812bb,
td_maxend = 0xdc986dfb,
td_maxwin = 0x3908,
td_maxack = 0x221b933,
td_scale = 0x7,
flags = 0x23
}},
state = 0x3, <<< TCP_CONNTRACK_ESTABLISHED
last_dir = 0x1,
retrans = 0x0,
last_index = 0x1,
last_seq = 0xdc9812ba,
last_ack = 0xdc9812bb,
last_end = 0xdc9812bb,
last_win = 0x3908
},
ext = 0xffff8800c350a5c0 -- reference to extension
crash> nf_ct_ext 0xffff8800c350a5c0
struct nf_ct_ext {
rcu = {
next = 0x0,
func = 0x0
},
offset = "\000\060\000\030", <<<<< \060 =0x30 --> nat extension was used
len = 0x60,
data = 0xffff8800c350a5d5 ""
}
hlist looks normal
crash> nf_conn_nat.bysource 0xffff8800c350a5f0
bysource = {
next = 0x0,
pprev = 0xffff88001a35cb70
}
crash> nf_conn_nat.bysource 0xffff88001a35cb70
bysource = {
next = 0xffff8800c350a5f0,
pprev = 0xffff88031b2c8a08
}
crash> hlist_head 0xffff88031b2c8a08 <<<<< hlist_head from init_net.ipv4.nat_bysource
struct hlist_head {
first = 0xffff88001a35cb70
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
reply other threads:[~2014-02-18 10:17 UTC|newest]
Thread overview: [no followups] expand[flat|nested] mbox.gz Atom feed
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=530332FE.7040803@parallels.com \
--to=vvs@parallels.com \
--cc=avagin@openvz.org \
--cc=eric.dumazet@gmail.com \
--cc=kaber@trash.net \
--cc=kernel@linuxace.com \
--cc=mkubecek@suse.cz \
--cc=netfilter-devel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.