From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D4A8ACD613F for ; Mon, 9 Oct 2023 21:58:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1378824AbjJIV6V (ORCPT ); Mon, 9 Oct 2023 17:58:21 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60822 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1378789AbjJIV6U (ORCPT ); Mon, 9 Oct 2023 17:58:20 -0400 Received: from zeniv.linux.org.uk (zeniv.linux.org.uk [IPv6:2a03:a000:7:0:5054:ff:fe1c:15ff]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A2B5499; Mon, 9 Oct 2023 14:58:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=linux.org.uk; s=zeniv-20220401; h=Sender:In-Reply-To:Content-Type: MIME-Version:References:Message-ID:Subject:Cc:To:From:Date:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=Uo1EiyojZdAoskTvMvtwCcjNoSWtdzYtyhDVJ0XRCLA=; b=tQPA9yIqHmqNSbhOVvxiucllWR l2tso1BLwh5CGWav21hSNoCV18PYO+IsxWdqSb/pXzM2cPknMXd9P7ke0zHb6UliQGYi6gLyRpPKv U7hiFaveQ7mceuNDQ8jXcyNC9Wn3AkdirOyK8p0NOTO/Oj6yURkgBAyy5W9y7bWeLSH/O6DDIkmyl WCp+Xw1NcUBfI4VrenMtUjHlxEHqjrSYPedcuoxv/CrT5Nzph7SfGqANRPJyxErt/tMNNVW/JgwXK fM0gVUEdTnwkyob+53dTu2ypbGyfYDxKYT5I9ypDKvQN6UcsFeM5PE4elD0hso579aRXCVJbRT5mn T7zz5aAw==; Received: from viro by zeniv.linux.org.uk with local (Exim 4.96 #2 (Red Hat Linux)) id 1qpyGA-00HK0m-1f; Mon, 09 Oct 2023 21:57:54 +0000 Date: Mon, 9 Oct 2023 22:57:54 +0100 From: Al Viro To: Christoph Hellwig Cc: Christian Brauner , Heiko Carstens , Vasily Gorbik , Alexander Gordeev , Fenghua Yu , Reinette Chatre , Miquel Raynal , Richard Weinberger , Vignesh Raghavendra , Dennis Dalessandro , Tejun Heo , Trond Myklebust , Anna Schumaker , Kees Cook , Damien Le Moal , Naohiro Aota , Greg Kroah-Hartman , linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org, linux-nfs@vger.kernel.org, linux-hardening@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH 03/19] fs: release anon dev_t in deactivate_locked_super Message-ID: <20231009215754.GL800259@ZenIV> References: <20230913111013.77623-1-hch@lst.de> <20230913111013.77623-4-hch@lst.de> <20230913232712.GC800259@ZenIV> <20230926093834.GB13806@lst.de> <20230926212515.GN800259@ZenIV> <20231002064646.GA1799@lst.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20231002064646.GA1799@lst.de> Sender: Al Viro Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org On Mon, Oct 02, 2023 at 08:46:46AM +0200, Christoph Hellwig wrote: > On Tue, Sep 26, 2023 at 10:25:15PM +0100, Al Viro wrote: > > Before your patch: foo_kill_super() calls kill_anon_super(), > > which calls kill_super_notify(), which removes the sucker from > > the list, then frees ->s_fs_info. After your patch: > > removal from the lists happens via the call of kill_super_notify() > > *after* both of your methods had been called, while freeing > > ->s_fs_info happens from the method call. IOW, you've restored > > the situation prior to "super: ensure valid info". The whole > > point of that commit had been to make sure that we have nothing > > in the lists with ->s_fs_info pointing to a freed object. > > > > It's not about free_anon_bdev(); that part is fine - it's the > > "we can drop the weird second call site of kill_super_notify()" > > thing that is broken. > > The point has been to only release the anon dev_t after > kill_super_notify, to prevent two of them beeing reused. > > Which we do as the free_anon_bdev is done directly in > deactivate_locked_super. The new ->free_sb for non-block file systems > frees resources, but none of them matter for sget. We keep talking past each other... Let me try again: at the tip of your branch you have static struct file_system_type ubifs_fs_type = { .name = "ubifs", .owner = THIS_MODULE, .mount = ubifs_mount, .free_sb = ubifs_free_sb, }; static void ubifs_free_sb(struct super_block *s) { kfree(s->s_fs_info); } static struct dentry *ubifs_mount(struct file_system_type *fs_type, int flags, const char *name, void *data) { ... sb = sget(fs_type, sb_test, sb_set, flags, c); ... } static int sb_test(struct super_block *sb, void *data) { struct ubifs_info *c1 = data; struct ubifs_info *c = sb->s_fs_info; return c->vi.cdev == c1->vi.cdev; } See the problem? Mainline has static void kill_ubifs_super(struct super_block *s) { struct ubifs_info *c = s->s_fs_info; kill_anon_super(s); kfree(c); } and void kill_anon_super(struct super_block *sb) { dev_t dev = sb->s_dev; generic_shutdown_super(sb); kill_super_notify(sb); free_anon_bdev(dev); } That removes the superblock from the list of instances before its ->s_fs_info is freed. In your branch removal happens here: if (fs->shutdown_sb) fs->shutdown_sb(s); generic_shutdown_super(s); if (fs->free_sb) fs->free_sb(s); kill_super_notify(s); That comes *after* ubifs_free_sb() has freed ->s_fs_info. And there's nothing to stop ubifs_mount() (on a completely unrelated device) to get called right at that moment. Doing the sget() call quoted above. Now, in sget() we have hlist_for_each_entry(old, &type->fs_supers, s_instances) { if (!test(old, data)) and that will hit sb_test(old, data), with old being a superblock still in ->fs_supers, but with ->s_fs_info already freed. So in sb_test() we have c equal to old->s_fs_info and return c->vi.cdev == c1->vi.cdev; is a bloody use after free. Here we are unlikely to get fucked over - it's a plain fetch from freed object. If you look at e.g. nfs, you'll see a lot more than that - pointer chasing from freed (and possibly reused) object. The only difference is that there you have sget_fc() instead of sget() - same loop anyway. The bottom line: in the form it is posted, your series reintroduces the class of UAF that had been added by taking removal from the instances list out of generic_shutdown_super() and then papered over by adding that kill_super_notify() into kill_anon_super(). And frankly, I believe that the root cause is the insistence that list removal should happen after generic_shutdown_super(). Sure, you want the superblock to serve as bdev holder, which leads to fun with -EBUSY if mount comes while umount still hadn't closed the device. I suspect that it would make a lot more sense to introduce an intermediate state - "held, but will be released in a short while". You already have something similar, but only for the entire disk ->bd_claiming stuff. Add a new primitive (will_release_bdev()), so that attempts to claim the sucker will wait until it gets released instead of failing with -EBUSY. And do *that* before generic_shutdown_super() when unmounting something that is block-based. Allows to bring the list removal back where it used to be, no UAF at all... IMO that direction is a lot more promising.