From mboxrd@z Thu Jan  1 00:00:00 1970
From: Richard Weinberger
Subject: MNT_DETACH and mount namespace issue (was: Re: [PATCH] vfs: Fix RCU usage in __propagate_umount())
Date: Wed, 30 Jul 2014 22:46:31 +0200
Message-ID: <53D959A7.5070702@nod.at>
References: <1406728756-32443-1-git-send-email-richard@sigma-star.at>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: viro@zeniv.linux.org.uk, hch@infradead.org, paulmck@linux.vnet.ibm.com, jeffm@suse.com, sahne@0x90.at, "linux-kernel@vger.kernel.org"
To: linux-fsdevel@vger.kernel.org
Return-path:
In-Reply-To: <1406728756-32443-1-git-send-email-richard@sigma-star.at>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

On 30.07.2014 15:59, Richard Weinberger wrote:
> If we use the plain list_empty() we might not see the
> hlist_del_init_rcu() and therefore miss one member of the
> list.
>
> It fixes the following issue:
> $ unshare -m /usr/bin/sleep 10000 &
> $ mkdir -p foo/proc
> $ mount -t proc none foo/proc
> $ mount -t binfmt_misc none foo/proc/sys/fs/binfmt_misc
> $ umount -l foo/proc
> $ rmdir foo/proc
> rmdir: failed to remove ‘foo/proc’: Device or resource busy

Although my fix was wrong, the issue is real, and it seems to have
existed for a very long time. I was just able to reproduce it on 2.6.32.

Please note that you need a shared root subtree to trigger the issue,
i.e. mount --make-shared /
Maybe this is why nobody has noticed it so far, as only systemd distros
have the root subtree shared by default.

I hit the issue on openSUSE 13.1, where an application creates a chroot
environment and then lazily unmounts /proc.
It happened on very few machines. An analysis showed that only boxes
with an OpenVPN tunnel were affected.
This did not make any sense until I discovered that the OpenVPN systemd
service file sets "PrivateTmp=true".
This setting creates a mount namespace for that service...
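
Since the reproducer depends on the root subtree being shared, it may help to check the propagation type of / before trying it. The following is only a sketch: propagation_of is a hypothetical helper name, not an existing tool, and it assumes the mountinfo layout described in proc(5), where the optional fields (such as "shared:N") start at field 7 and end at a lone "-".

```shell
#!/bin/sh
# Hypothetical helper (not part of any existing tool): print the
# propagation type of a mount point as recorded in a mountinfo file.
# Usage: propagation_of MOUNTPOINT [MOUNTINFO_FILE]
propagation_of() {
    awk -v mp="$1" '
        $5 == mp {
            # Field 5 is the mount point. Optional fields start at
            # field 7 and run until the "-" separator.
            prop = "private"
            for (i = 7; i <= NF && $i != "-"; i++) {
                if ($i ~ /^shared:/)
                    prop = "shared"
                else if ($i ~ /^master:/ && prop != "shared")
                    prop = "slave"
                else if ($i == "unbindable")
                    prop = "unbindable"
            }
            # Keep the last matching entry: later mounts on the same
            # path are listed later and sit on top.
            result = prop
        }
        END {
            if (result == "")
                exit 1      # mount point not listed
            print result
        }
    ' "${2:-/proc/self/mountinfo}"
}
```

Calling propagation_of / on the affected machine should tell whether the precondition holds. A mount that is both shared and a slave is reported simply as "shared" here, which is enough for this purpose.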

In __propagate_umount() the following piece of code is interesting:

	/*
	 * umount the child only if the child has no
	 * other children
	 */
	if (child && list_empty(&child->mnt_mounts)) {
		hlist_del_init_rcu(&child->mnt_hash);
		hlist_add_before_rcu(&child->mnt_hash, &mnt->mnt_hash);
	}

child->mnt_mounts is non-empty for the "proc" mount although the
"binfmt_misc" subtree was removed.
I'm not sure whether this is just one more symptom or the main culprit.

Any ideas?

Thanks,
//richard