From mboxrd@z Thu Jan 1 00:00:00 1970 From: Richard Weinberger Subject: Re: MNT_DETACH and mount namespace issue Date: Fri, 01 Aug 2014 00:17:13 +0200 Message-ID: <53DAC069.7070606@nod.at> References: <1406728756-32443-1-git-send-email-richard@sigma-star.at> <53D959A7.5070702@nod.at> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: viro@zeniv.linux.org.uk, hch@infradead.org, paulmck@linux.vnet.ibm.com, jeffm@suse.com, sahne@0x90.at, "linux-kernel@vger.kernel.org" , linuxram@us.ibm.com To: linux-fsdevel@vger.kernel.org Return-path: In-Reply-To: <53D959A7.5070702@nod.at> Sender: linux-kernel-owner@vger.kernel.org List-Id: linux-fsdevel.vger.kernel.org Am 30.07.2014 22:46, schrieb Richard Weinberger: > Am 30.07.2014 15:59, schrieb Richard Weinberger: >> If we use the plain list_empty() we might not see the >> hlist_del_init_rcu() and therefore miss one member of the >> list. >> >> It fixes the following issue: >> $ unshare -m /usr/bin/sleep 10000 & >> $ mkdir -p foo/proc >> $ mount -t proc none foo/proc >> $ mount -t binfmt_misc none foo/proc/sys/fs/binfmt_misc >> $ umount -l foo/proc >> $ rmdir foo/proc >> rmdir: failed to remove =E2=80=98foo/proc=E2=80=99: Device or resour= ce busy >=20 > Although my fix was wrong, the issue is real, it seems to exist for a= very long > time. Just was able to reproduce it on 2.6.32. > Please note that you need a shared root subtree to trigger the issue. > i.e. mount --shared / > Maybe this is why nobody noticed it so far as only systemd distros > have the root subtree shared by default. >=20 > I hit the issue on openSUSE 13.1 where an application creates a chroo= t environment > and then lazy umounts /proc. > It happened on very few machines. An analysis showed that only boxes = with an OpenVPN tunnel > were affected. This did not make any sense until I discovered that th= e OpenVPN systemd > service file has set "PrivateTmp=3Dtrue". This setting creates > a mount namespace for the said service... >=20 > In __propagate_umount() the following piece of code is interesting: >=20 > /* > * umount the child only if the child has no > * other children > */ > if (child && list_empty(&child->mnt_mounts)) { > hlist_del_init_rcu(&child->mnt_hash); > hlist_add_before_rcu(&child->mnt_hash, &mnt->mnt_hash); > } >=20 > child->mnt_mounts is non-empty for the "proc" although the "binfmt_mi= sc" > subtree was removed. > I'm not sure whether this is only one more symptom or the main culpri= t. CC'ing Ram Pai. Ram, you are the author of the said code. Can you please explain why we= need that list_empty() check? To my (limited) understanding of VFS, the following change should be fi= ne to fix the issue: diff --git a/fs/pnode.c b/fs/pnode.c index 302bf22..627b35f 100644 --- a/fs/pnode.c +++ b/fs/pnode.c @@ -376,11 +376,7 @@ static void __propagate_umount(struct mount *mnt) struct mount *child =3D __lookup_mnt_last(&m->mnt, mnt->mnt_mountpoint); - /* - * umount the child only if the child has no - * other children - */ - if (child && list_empty(&child->mnt_mounts)) { + if (child) { hlist_del_init_rcu(&child->mnt_hash); hlist_add_before_rcu(&child->mnt_hash, &mnt->mnt_hash); } Thanks, //richard