From: Dave Chiluk <chiluk@canonical.com>
To: linux-fsdevel@vger.kernel.org, Al Viro <viro@zeniv.linux.org.uk>
Subject: Soft-lockup on vfsmount_lock with large numbers of mount namespaces in the cloud
Date: Thu, 20 Feb 2014 00:51:44 -0600
Message-ID: <5305A600.1030209@canonical.com>

An OpenStack Neutron gateway uses network namespaces to partition
machines within a cloud. To do so it creates many network namespaces
and, as a result, many mount namespaces. This is accomplished
through many calls to

$ ip netns add/delete/exec

After roughly 3k-4k namespaces these ip calls become very slow, taking
on the order of many seconds each.  After a few more, the machine
starts to report soft-lockup "BUG"s on the stuck ip processes (BUG
output below).
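
A minimal sketch of a timing harness for the calls described above. It
has not been verified to reproduce the lockup. The "repro" name prefix
and both helper names are hypothetical, and only the add and exec
subcommands of ip netns are exercised.

```python
import subprocess
import time


def netns_commands(count, prefix="repro"):
    """Build `ip netns add` / `ip netns exec` command lines for `count` namespaces.

    `prefix` is a hypothetical naming scheme chosen for this sketch.
    """
    commands = []
    for i in range(count):
        name = f"{prefix}-{i}"
        commands.append(["ip", "netns", "add", name])
        commands.append(["ip", "netns", "exec", name, "true"])
    return commands


def time_commands(commands, runner=subprocess.run):
    """Run each command in order and return (argv, elapsed_seconds) pairs."""
    timings = []
    for argv in commands:
        start = time.monotonic()
        runner(argv, check=True)  # raises CalledProcessError if ip fails
        timings.append((argv, time.monotonic() - start))
    return timings
```

Run as root on a disposable machine, time_commands(netns_commands(4000))
would show whether the per-call time grows with the namespace count.
The namespaces can be removed afterwards with "ip netns delete <name>".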

We think the problem is contention for the vfsmount_lock, which is held
by do_umount while it walks the mounts in the following stack:

do_umount
 -> umount_tree
    -> propagate_umount
       -> __propagate_umount
          -> __lookup_mnt

where __lookup_mnt spends significant time walking the
mount_hashtable.
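
To illustrate why such a walk gets expensive, here is a toy model of a
chained hash table with a fixed number of buckets. It is not the
kernel's code. The bucket count of 256, the one-mount-per-namespace
figure, and the (parent id, dentry id) key shape are all assumptions
made for the example.

```python
class FixedHashTable:
    """Chained hash table whose bucket count never grows."""

    def __init__(self, num_buckets):
        self.buckets = [[] for _ in range(num_buckets)]

    def insert(self, key):
        self.buckets[hash(key) % len(self.buckets)].append(key)

    def lookup_steps(self, key):
        """Return how many chain entries are visited while looking for `key`."""
        chain = self.buckets[hash(key) % len(self.buckets)]
        for steps, entry in enumerate(chain, start=1):
            if entry == key:
                return steps
        return len(chain)  # a miss walks the whole chain

    def mean_miss_cost(self):
        """Average chain length over all buckets, i.e. entries / buckets."""
        return sum(len(chain) for chain in self.buckets) / len(self.buckets)


table = FixedHashTable(num_buckets=256)  # assumed size, for illustration
for namespace_id in range(4000):         # one mount per namespace (assumed)
    table.insert((namespace_id, 0))      # hypothetical (parent, dentry) key
print(table.mean_miss_cost())
```

With the bucket count fixed, the average chain length grows linearly
with the number of entries, so every lookup made while the lock is held
becomes proportionally slower as namespaces are added.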

How can we mitigate or fix this expensive operation while holding the
lock?  If this has already been fixed, please feel free to point me at
the requisite git hashes.

Perhaps I'm looking in the wrong area of code, and I really just need
aa7a574d0c54cc5a0aceb7357b5097342c0844ee.  Are there any others that
immediately stand out or is this a new problem?

We've also reproduced this on 3.5, 3.8, and 3.11, with similar
results. 3.13 behaves similarly but has different issues related
to the RCU locking.  When I have a better idea as to what's going on
with 3.13 I will report back about that.

Thanks,
Dave Chiluk.


[15645.196718] BUG: soft lockup - CPU#23 stuck for 22s! [ip:5898]
[15645.203279] Modules linked in: xt_conntrack nfnetlink xt_CT
iptable_raw ipt_REDIRECT veth ipmi_devintf ipmi_si ipmi_msghandler
iptable_nat nf_nat xt_recent xt_multiport netlord(O) bridge bonding
ipt_REJECT xt_LOG xt_limit xt_tcpudp xt_addrtype nf_conntrack_ipv4
nf_defrag_ipv4 xt_state nf_conntrack ip6table_filter ip6_tables
iptable_filter ip_tables x_tables coretemp kvm_intel kvm
ghash_clmulni_intel 8021q aesni_intel cryptd hid_generic garp stp
gpio_ich igb aes_x86_64 usbhid i7core_edac llc hid serio_raw edac_core
mac_hid dca lpc_ich microcode ahci libahci lp parport shpchp hpsa
[15645.203323] CPU 23
[15645.203324] Modules linked in:
[15645.203326]  xt_conntrack nfnetlink xt_CT iptable_raw ipt_REDIRECT
veth ipmi_devintf ipmi_si ipmi_msghandler iptable_nat nf_nat xt_recent
xt_multiport netlord(O) bridge bonding ipt_REJECT xt_LOG xt_limit
xt_tcpudp xt_addrtype nf_conntrack_ipv4 nf_defrag_ipv4 xt_state
nf_conntrack ip6table_filter ip6_tables iptable_filter ip_tables
x_tables coretemp kvm_intel kvm ghash_clmulni_intel 8021q aesni_intel
cryptd hid_generic garp stp gpio_ich igb aes_x86_64 usbhid i7core_edac
llc hid serio_raw edac_core mac_hid dca lpc_ich microcode ahci libahci
lp parport shpchp hpsa
[15645.203357]
[15645.203359] Pid: 5898, comm: ip Tainted: G           O
3.5.0-44-generic #67~precise1hf
[15645.203363] RIP: 0010:[<ffffffff8169ef29>]  [<ffffffff8169ef29>]
_raw_spin_unlock_irqrestore+0x19/0x30
[15645.203373] RSP: 0018:ffff88183fd63dd8  EFLAGS: 00000282
[15645.203375] RAX: 0000000000000282 RBX: 0000000000000000 RCX:
0000000000000400
[15645.203377] RDX: 0000000000000002 RSI: 0000000000000282 RDI:
0000000000000282
[15645.203378] RBP: ffff88183fd63de0 R08: 0000000000000000 R09:
0000000000000000
[15645.203380] R10: 0000000000000000 R11: 0000000000000000 R12:
ffff88183fd63d48
[15645.203381] R13: ffffffff816a820a R14: ffff88183fd63de0 R15:
ffff88183fd739c0
[15645.203384] FS:  00007fdf0de76700(0000) GS:ffff88183fd60000(0000)
knlGS:0000000000000000
[15645.203385] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[15645.203387] CR2: 00000000040cfdb8 CR3: 0000001288f87000 CR4:
00000000000007e0
[15645.203389] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[15645.203391] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[15645.203393] Process ip (pid: 5898, threadinfo ffff880621126000, task
ffff88072f609700)
[15645.203394] Stack:
[15645.203396]  ffff8812b6d36500 ffff88183fd63e50 ffffffff8108f6d7
0000000000000017
[15645.203401]  0000000000000282 00000000000139c0 0000000000000017
0000000000000000
[15645.203405]  ffff8812b6d36570 ffff880979ef5380 00000000000139c0
0000000000000017
[15645.203410] Call Trace:
[15645.203411]  <IRQ>
[15645.203414]
[15645.203419]  [<ffffffff8108f6d7>] update_shares+0xc7/0x100
[15645.203423]  [<ffffffff81091daf>] rebalance_domains+0x4f/0x180
[15645.203426]  [<ffffffff81092078>] run_rebalance_domains+0x48/0x60
[15645.203433]  [<ffffffff8105bc78>] __do_softirq+0xa8/0x210
[15645.203439]  [<ffffffff810ab6c4>] ? tick_program_event+0x24/0x30
[15645.203443]  [<ffffffff816a8b5c>] call_softirq+0x1c/0x30
[15645.203449]  [<ffffffff81016235>] do_softirq+0x65/0xa0
[15645.203453]  [<ffffffff8105c05e>] irq_exit+0x8e/0xb0
[15645.203456]  [<ffffffff816a94be>] smp_apic_timer_interrupt+0x6e/0x99
[15645.203462]  [<ffffffff816a820a>] apic_timer_interrupt+0x6a/0x70
[15645.203463]  <EOI>
[15645.203465]
[15645.203468]  [<ffffffff8169ef29>] ? _raw_spin_unlock_irqrestore+0x19/0x30
[15645.203474]  [<ffffffff81148b66>] free_percpu+0xa6/0x140
[15645.203478]  [<ffffffff811a71fe>] free_vfsmnt+0x2e/0x50
[15645.203482]  [<ffffffff811a7b8b>] mntput_no_expire+0xfb/0x160
[15645.203484]  [<ffffffff811a7c14>] mntput+0x24/0x40
[15645.203488]  [<ffffffff811a885b>] release_mounts+0x8b/0xa0
[15645.203491]  [<ffffffff811a8e4f>] do_umount+0x15f/0x250
[15645.203494]  [<ffffffff811a900a>] sys_umount+0xca/0xe0
[15645.203498]  [<ffffffff816a7769>] system_call_fastpath+0x16/0x1b
[15645.203499] Code: 66 90 5d c3 66 66 66 66 66 2e 0f 1f 84 00 00 00 00
00 55 48 89 e5 53 66 66 66 66 90 48 89 f3 e8 6e 1b 9a ff 66 90 48 89 df
57 9d <66> 66 90 66 90 5b 5d c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00
[15645.203532] Kernel panic - not syncing: softlockup: hung tasks
[15645.210089] Pid: 5898, comm: ip Tainted: G           O
3.5.0-44-generic #67~precise1hf1267535v20140117b1-Ubuntu
[15645.221437] Call Trace:
[15645.224200]  <IRQ>  [<ffffffff816862b2>] panic+0xc1/0x1d7
[15645.230290]  [<ffffffff810e1b57>] watchdog_timer_fn+0x177/0x180
[15645.236949]  [<ffffffff8107c3d8>] __run_hrtimer+0x78/0x1f0
[15645.243120]  [<ffffffff810e19e0>] ? __touch_watchdog+0x30/0x30
[15645.249680]  [<ffffffff8107cc67>] hrtimer_interrupt+0xf7/0x240
[15645.256240]  [<ffffffff816a94b9>] smp_apic_timer_interrupt+0x69/0x99
[15645.263382]  [<ffffffff816a820a>] apic_timer_interrupt+0x6a/0x70
[15645.270137]  [<ffffffff8169ef29>] ? _raw_spin_unlock_irqrestore+0x19/0x30
[15645.277767]  [<ffffffff8108f6d7>] update_shares+0xc7/0x100
[15645.283937]  [<ffffffff81091daf>] rebalance_domains+0x4f/0x180
[15645.290496]  [<ffffffff81092078>] run_rebalance_domains+0x48/0x60
[15645.297347]  [<ffffffff8105bc78>] __do_softirq+0xa8/0x210
[15645.303420]  [<ffffffff810ab6c4>] ? tick_program_event+0x24/0x30
[15645.310172]  [<ffffffff816a8b5c>] call_softirq+0x1c/0x30
[15645.316148]  [<ffffffff81016235>] do_softirq+0x65/0xa0
[15645.321930]  [<ffffffff8105c05e>] irq_exit+0x8e/0xb0
[15645.327518]  [<ffffffff816a94be>] smp_apic_timer_interrupt+0x6e/0x99
[15645.334661]  [<ffffffff816a820a>] apic_timer_interrupt+0x6a/0x70
[15645.341411]  <EOI>  [<ffffffff8169ef29>] ?
_raw_spin_unlock_irqrestore+0x19/0x30
[15645.349763]  [<ffffffff81148b66>] free_percpu+0xa6/0x140
[15645.355738]  [<ffffffff811a71fe>] free_vfsmnt+0x2e/0x50
[15645.361617]  [<ffffffff811a7b8b>] mntput_no_expire+0xfb/0x160
[15645.368079]  [<ffffffff811a7c14>] mntput+0x24/0x40
[15645.373471]  [<ffffffff811a885b>] release_mounts+0x8b/0xa0
[15645.379642]  [<ffffffff811a8e4f>] do_umount+0x15f/0x250
[15645.385521]  [<ffffffff811a900a>] sys_umount+0xca/0xe0
[15645.391302]  [<ffffffff816a7769>] system_call_fastpath+0x16/0x1b


