From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D6234C433F5 for ; Mon, 21 Feb 2022 16:46:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1381190AbiBUQrD (ORCPT ); Mon, 21 Feb 2022 11:47:03 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:60836 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1381208AbiBUQrB (ORCPT ); Mon, 21 Feb 2022 11:47:01 -0500 Received: from out01.mta.xmission.com (out01.mta.xmission.com [166.70.13.231]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7499DE6E for ; Mon, 21 Feb 2022 08:46:35 -0800 (PST) Received: from in01.mta.xmission.com ([166.70.13.51]:36758) by out01.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1nMBpa-009hYd-6z; Mon, 21 Feb 2022 09:46:34 -0700 Received: from ip68-227-174-4.om.om.cox.net ([68.227.174.4]:51802 helo=email.froward.int.ebiederm.org.xmission.com) by in01.mta.xmission.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1nMBpZ-002bvF-65; Mon, 21 Feb 2022 09:46:33 -0700 From: "Eric W. Biederman" To: Al Viro Cc: linux-fsdevel@vger.kernel.org, Linus Torvalds References: Date: Mon, 21 Feb 2022 10:46:26 -0600 In-Reply-To: (Al Viro's message of "Mon, 21 Feb 2022 05:04:21 +0000") Message-ID: <87tucscgm5.fsf@email.froward.int.ebiederm.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-XM-SPF: eid=1nMBpZ-002bvF-65;;;mid=<87tucscgm5.fsf@email.froward.int.ebiederm.org>;;;hst=in01.mta.xmission.com;;;ip=68.227.174.4;;;frm=ebiederm@xmission.com;;;spf=neutral X-XM-AID: U2FsdGVkX1+QOqccXFM17n72cA1akYyrUF6VVN2W+es= X-SA-Exim-Connect-IP: 68.227.174.4 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: Re: [RFC] umount/__detach_mounts() race X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000) X-SA-Exim-Scanned: Yes (on in01.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Al Viro writes: > BTW, while looking through the vicinity - I think this > if (!check_mnt(old) && old_path->dentry->d_op != &ns_dentry_operations) > return mnt; > in __do_loopback() is too permissive. I'd prefer to replace it with > if (!check_mnt(old)) { > // allow binding objects from internal instance of nsfs > if (old->mnt_ns != MNT_NS_INTERAL || > old_path->dentry->d_op != &ns_dentry_operations) > return mnt; > } > > Suppose we'd bound an nsfs object e.g. on /tmp/foo. Consider a race > between mount --bind /tmp/foo /tmp/bar and umount -l /tmp/foo. > > do_loopback() resolves /tmp/foo. We have dentry from nsfs and mount > that sits on /tmp/foo. We'd resolved /tmp/bar. > > In the meanwhile, umount has resolved /tmp/foo and unmounted it. > struct mount is still alive due to the reference held by do_loopback(). > > do_loopback() finally grabs namespace_sem. It verifies that mountpoint > to be (/tmp/bar) is still OK (it is) and calls __do_loopback(). The > check in __do_loopback() passes - old is unmounted, but dentry is > nsfs one, so we proceed to clone old. > > And that's where the things go wrong - we copy the flags, including > MNT_UMOUNT, to the new instance. And proceed to attach it on /tmp/bar. > > It's really not a good thing. E.g. __detach_mnt() will assume that > reference to the parent mount of /tmp/bar is *not* held. As the > matter of fact, it is, so we'll get a leak if the mountpoint (/tmp/bar, > that is) gets unlinked in another namespace. That's not the only way > to get trouble - we are really not supposed to have MNT_UMOUNT mounts > attached to !MNT_UMOUNT ones. > > Eric, do you see any problems with the change above? Such as breaking userspace code? Maybe. Currently we exempt nsfs dentries from the same namespace restriction when cloning them. If I read your proposal correctly you are proposing only exempting nsfs dentries that are internally mounted from the same namespace restriction. We need to keep the ordinary case of bind mounts from one nsfs dentry to another dentry working even after it is mounted. If my foggy brain is reasoning correctly you are proposing not allowing bind mounts before the mount has been attached or after the mount has been detached by umount_tree. Not allowing already umounted mounts to be bind mounted seems semantically fine. Userspace should not care. I wonder if perhaps we should have an explicit test for MNT_UMOUNT in __do_loopback. Eric