From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 826BEC433F5 for ; Mon, 21 Feb 2022 05:04:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232084AbiBUFEs (ORCPT ); Mon, 21 Feb 2022 00:04:48 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:44116 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232040AbiBUFEr (ORCPT ); Mon, 21 Feb 2022 00:04:47 -0500 Received: from zeniv-ca.linux.org.uk (zeniv-ca.linux.org.uk [IPv6:2607:5300:60:148a::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 246632A720 for ; Sun, 20 Feb 2022 21:04:23 -0800 (PST) Received: from viro by zeniv-ca.linux.org.uk with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1nM0s1-003V5c-Rm; Mon, 21 Feb 2022 05:04:21 +0000 Date: Mon, 21 Feb 2022 05:04:21 +0000 From: Al Viro To: linux-fsdevel@vger.kernel.org Cc: Eric Biederman , Linus Torvalds Subject: Re: [RFC] umount/__detach_mounts() race Message-ID: References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: Al Viro Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org BTW, while looking through the vicinity - I think this if (!check_mnt(old) && old_path->dentry->d_op != &ns_dentry_operations) return mnt; in __do_loopback() is too permissive. I'd prefer to replace it with if (!check_mnt(old)) { // allow binding objects from internal instance of nsfs if (old->mnt_ns != MNT_NS_INTERAL || old_path->dentry->d_op != &ns_dentry_operations) return mnt; } Suppose we'd bound an nsfs object e.g. on /tmp/foo. Consider a race between mount --bind /tmp/foo /tmp/bar and umount -l /tmp/foo. do_loopback() resolves /tmp/foo. We have dentry from nsfs and mount that sits on /tmp/foo. We'd resolved /tmp/bar. In the meanwhile, umount has resolved /tmp/foo and unmounted it. struct mount is still alive due to the reference held by do_loopback(). do_loopback() finally grabs namespace_sem. It verifies that mountpoint to be (/tmp/bar) is still OK (it is) and calls __do_loopback(). The check in __do_loopback() passes - old is unmounted, but dentry is nsfs one, so we proceed to clone old. And that's where the things go wrong - we copy the flags, including MNT_UMOUNT, to the new instance. And proceed to attach it on /tmp/bar. It's really not a good thing. E.g. __detach_mnt() will assume that reference to the parent mount of /tmp/bar is *not* held. As the matter of fact, it is, so we'll get a leak if the mountpoint (/tmp/bar, that is) gets unlinked in another namespace. That's not the only way to get trouble - we are really not supposed to have MNT_UMOUNT mounts attached to !MNT_UMOUNT ones. Eric, do you see any problems with the change above?