From: "Jakub Kiciński" <moorray3@wp.pl>
To: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>, <netdev@vger.kernel.org>,
Fan Du <fan.du@windriver.com>
Subject: Re: net-next: NULL pointer dereference on adding a net namespace and a system freeze
Date: Tue, 11 Mar 2014 13:42:26 +0100
Message-ID: <20140311134226.7a200693@north>
In-Reply-To: <20140311120059.GB32371@secunet.com>

On Tue, 11 Mar 2014 13:00:59 +0100, Steffen Klassert wrote:
> On Tue, Mar 11, 2014 at 01:46:49AM +0100, Jakub Kiciński wrote:
> >
> > I bisected the other issue to be caused/uncovered by:
> >
> > commit 1a1ccc96abb2ed9b8fbb71018e64b97324caef53
> > Author: Steffen Klassert <steffen.klassert@secunet.com>
> > Date: Wed Feb 19 10:07:34 2014 +0100
> >
> > xfrm: Remove caching of xfrm_policy_sk_bundles
> >
> > We currently cache socket policy bundles at xfrm_policy_sk_bundles.
> > These cached bundles are never used. Instead we create and cache
> > a new one whenever xfrm_lookup() is called on a socket policy.
> >
> > Most protocols cache the used routes to the socket, so let's
> > remove the unused caching of socket policy bundles in xfrm.
> >
> > Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
> >
>
> This patch should affect only on the usage of IPsec socket policies.
> Do you use socket policies, or do you use IPsec at all?

I'm running a pretty standard Fedora 20 installation here (notably with
NetworkManager removed). The two daemons that trigger flow_cache warnings
are libvirt and rtkit.

I'm not sure how to check IPsec policies; ip xfrm state/policy don't
show anything.

> >
> > Machine freezes after FLOW_HASH_RND_PERIOD (default 10 minutes).
> > Now get this warning during boot:
> >
> > [ 31.664820] ------------[ cut here ]------------
> > [ 31.664824] WARNING: CPU: 2 PID: 3560 at /home/kuba/Development/Linux/net-next/lib/list_debug.c:33 __list_add+0xac/0xc0()
> > [ 31.664826] list_add corruption. prev->next should be next (ffff880224579598), but was (null). (prev=ffff8802106140e8).
> > [ 31.664827] Modules linked in: xt_CHECKSUM tun bridge stp llc ccm xt_conntrack iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ftdi_sio arc4 rt2800pci rt2800mmio rt2800lib crc_ccitt eeprom_93cx6 rt2x00pci kvm_amd rt2x00mmio rt2x00lib mac80211 kvm snd_ca0106 cfg80211 e1000e snd_ac97_codec ac97_bus microcode serio_raw ptp i2c_piix4 k10temp acpi_cpufreq pps_core wmi r8169 mii rfkill nfsd auth_rpcgss nfs_acl lockd binfmt_misc sunrpc usb_storage radeon drm_kms_helper ttm
> > [ 31.664855] CPU: 2 PID: 3560 Comm: (t-daemon) Not tainted 3.14.0-rc2-1a1ccc96abb2ed9b8fbb71018e64b97324caef53+ #11
> > [ 31.664856] Hardware name: Gigabyte Technology Co., Ltd. GA-MA790XT-UD4P/GA-MA790XT-UD4P, BIOS F9b 08/17/2012
> > [ 31.664857] 0000000000000009 ffff8802242e7c70 ffffffff81627878 ffff8802242e7cb8
> > [ 31.664859] ffff8802242e7ca8 ffffffff8104a28d ffff880210610ea8 ffff880224579598
> > [ 31.664861] ffff8802106140e8 ffff880224578000 0000000000000000 ffff8802242e7d08
> > [ 31.664863] Call Trace:
> > [ 31.664865] [<ffffffff81627878>] dump_stack+0x4d/0x66
> > [ 31.664867] [<ffffffff8104a28d>] warn_slowpath_common+0x7d/0xa0
> > [ 31.664869] [<ffffffff8104a2fc>] warn_slowpath_fmt+0x4c/0x50
> > [ 31.664871] [<ffffffff812fdd8c>] __list_add+0xac/0xc0
> > [ 31.664873] [<ffffffff81055d33>] __internal_add_timer+0x113/0x130
> > [ 31.664875] [<ffffffff81055f47>] internal_add_timer+0x17/0x40
> > [ 31.664876] [<ffffffff810587b2>] mod_timer+0x102/0x230
> > [ 31.664878] [<ffffffff810588f8>] add_timer+0x18/0x20
> > [ 31.664880] [<ffffffff81572204>] flow_cache_init+0x224/0x2b0
> > [ 31.664882] [<ffffffff815f7247>] xfrm_net_init+0x227/0x360
> > [ 31.664884] [<ffffffff815f7171>] ? xfrm_net_init+0x151/0x360
> > [ 31.664886] [<ffffffff81553131>] ops_init+0x41/0x150
> > [ 31.664888] [<ffffffff815532b3>] setup_net+0x73/0x110
> > [ 31.664890] [<ffffffff815537f2>] copy_net_ns+0x72/0x100
> > [ 31.664892] [<ffffffff81072619>] create_new_namespaces+0xf9/0x190
> > [ 31.664894] [<ffffffff81072891>] unshare_nsproxy_namespaces+0x61/0xa0
> > [ 31.664895] [<ffffffff81049949>] SyS_unshare+0x159/0x270
> > [ 31.664897] [<ffffffff81638092>] system_call_fastpath+0x16/0x1b
> >
>
> I was unable to reproduce this here, but it looks like the flowcache
> namespace changes are still not complete. We leak an active timer
> and all the allocated resources when we exit a namespace.

I also failed to reproduce it reliably on a VM. On a VM it happens 50%
of the time, while on a physical machine it is triggered reliably on
every boot.

While restarting libvirt and rtkit to see if they produce any xfrm
noise, I got this:

[ 292.624771] BUG: soft lockup - CPU#1 stuck for 22s! [(t-daemon):4655]
[ 292.624777] Modules linked in: bnep bluetooth 6lowpan_iphc fuse ipt_MASQUERADE xt_CHECKSUM tun bridge stp llc ccm xt_conntrack iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw arc4 rt2800pci rt2800mmio rt2800lib crc_ccitt eeprom_93cx6 rt2x00pci rt2x00mmio rt2x00lib ftdi_sio kvm_amd mac80211 cfg80211 kvm e1000e snd_ca0106 snd_ac97_codec i2c_piix4 rfkill microcode ac97_bus serio_raw k10temp r8169 mii acpi_cpufreq ptp wmi pps_core nfsd auth_rpcgss nfs_acl lockd binfmt_misc sunrpc usb_storage radeon drm_kms_helper ttm
[ 292.624884] CPU: 1 PID: 4655 Comm: (t-daemon) Not tainted 3.14.0-rc2d3623099d3509fa68fa28235366049dd3156c63a+ #10
[ 292.624889] Hardware name: Gigabyte Technology Co., Ltd. GA-MA790XT-UD4P/GA-MA790XT-UD4P, BIOS F9b 08/17/2012
[ 292.624894] task: ffff8802228753c0 ti: ffff8800b515a000 task.ti: ffff8800b515a000
[ 292.624899] RIP: 0010:[<ffffffff81072a63>] [<ffffffff81072a63>] raw_notifier_chain_register+0x23/0x40
[ 292.624910] RSP: 0018:ffff8800b515bd98 EFLAGS: 00000246
[ 292.624914] RAX: ffff8802014d0ec0 RBX: ffffffff81c23340 RCX: 0000000000000004
[ 292.624919] RDX: 0000000000000000 RSI: ffff8800b50f1fc0 RDI: ffff8802014d0ec8
[ 292.624923] RBP: ffff8800b515bd98 R08: 0000000000000000 R09: 0000000000000000
[ 292.624928] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff81c233a8
[ 292.624933] R13: 0000000180040004 R14: 0000000000000246 R15: 000060fd00000000
[ 292.624939] FS: 00007fa39d6118c0(0000) GS:ffff88022fc80000(0000) knlGS:00000000e26ffb40
[ 292.624944] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 292.624948] CR2: 00007fa4b45a7f40 CR3: 00000000bd2e6000 CR4: 00000000000007e0
[ 292.624951] Stack:
[ 292.624955] ffff8800b515bdb0 ffffffff8161ff8a ffff8800b50f1100 ffff8800b515bde0
[ 292.624965] ffffffff815721be ffff8800b50f1100 0000000000000000 ffff8800b50f1160
[ 292.624974] ffff8800b50f1290 ffff8800b515be28 ffffffff815f7321 ffffffff815f7231
[ 292.624982] Call Trace:
[ 292.624992] [<ffffffff8161ff8a>] register_cpu_notifier+0x2a/0x40
[ 292.625001] [<ffffffff815721be>] flow_cache_init+0x1de/0x2b0
[ 292.625009] [<ffffffff815f7321>] xfrm_net_init+0x241/0x380
[ 292.625016] [<ffffffff815f7231>] ? xfrm_net_init+0x151/0x380
[ 292.625025] [<ffffffff81553131>] ops_init+0x41/0x150
[ 292.625033] [<ffffffff815532b3>] setup_net+0x73/0x110
[ 292.625042] [<ffffffff815537f2>] copy_net_ns+0x72/0x100
[ 292.625050] [<ffffffff81072619>] create_new_namespaces+0xf9/0x190
[ 292.625058] [<ffffffff81072891>] unshare_nsproxy_namespaces+0x61/0xa0
[ 292.625065] [<ffffffff81049949>] SyS_unshare+0x159/0x270
[ 292.625073] [<ffffffff816381d2>] system_call_fastpath+0x16/0x1b
[ 292.625077] Code: e9 7b ff ff ff 0f 1f 00 66 66 66 66 90 55 48 8b 07 48 89 e5 48 85 c0 74 21 8b 56 10 3b 50 10 7e 0c eb 17 0f 1f 44 00 00 39 50 10 <7c> 0d 48 8d 78 08 48 8b 40 08 48 85 c0 75 ee 48 89 46 08 31 c0

This is net-next with head at d3623099d3509fa68fa28235366049dd3156c63a.
It takes a few restarts of libvirt/rtkit-daemon to trigger, but I've
definitely seen register_cpu_notifier appearing in backtraces before...
Maybe this is some kind of a lead?
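
Both backtraces enter through SyS_unshare -> copy_net_ns -> setup_net ->
xfrm_net_init -> flow_cache_init, so a reproducer may need nothing more
than creating and destroying net namespaces in a loop. Below is a minimal
sketch under that assumption. It has not been verified against this bug,
the helper names unshare_net_in_child() and hammer() are made up for
illustration, and unshare(CLONE_NEWNET) needs CAP_SYS_ADMIN, so without
it every child simply reports EPERM.

```python
import ctypes
import os

CLONE_NEWNET = 0x40000000  # value from <linux/sched.h>


def unshare_net_in_child():
    """Fork; the child calls unshare(CLONE_NEWNET) and exits immediately.

    Returns 0 if the child created a namespace, the child's errno if
    unshare() failed, or a negative signal number if the child was killed.
    """
    pid = os.fork()
    if pid == 0:
        code = 255  # reported only if loading libc itself fails
        try:
            libc = ctypes.CDLL(None, use_errno=True)
            if libc.unshare(CLONE_NEWNET) == 0:
                code = 0
            else:
                code = ctypes.get_errno() & 0xFF
        finally:
            # The child held the only reference to the new namespace,
            # so exiting here also exercises the namespace exit path.
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)


def hammer(iterations=20):
    """Create and drop `iterations` namespaces; return per-child results."""
    return [unshare_net_in_child() for _ in range(iterations)]


if __name__ == "__main__":
    results = hammer()
    failed = sum(1 for r in results if r != 0)
    print("children that could not unshare:", failed, "of", len(results))
```

Doing the unshare in a forked child keeps the calling process in its
original namespace and guarantees that each namespace is torn down before
the next one is set up, which is the create/exit sequence the leaked
timer would need in order to corrupt the timer list.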
> Could you please try the patch below?
Testing now... Expect results in 15 minutes...
> Also, please send your config if the patch does not fix your problem.
config: http://paste.fedoraproject.org/84281/54146313
-- kuba