From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3EBC8C4332F for ; Tue, 29 Nov 2022 05:23:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235541AbiK2FX3 (ORCPT ); Tue, 29 Nov 2022 00:23:29 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51196 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235505AbiK2FX0 (ORCPT ); Tue, 29 Nov 2022 00:23:26 -0500 Received: from mail-pl1-x62a.google.com (mail-pl1-x62a.google.com [IPv6:2607:f8b0:4864:20::62a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C89586172 for ; Mon, 28 Nov 2022 21:23:25 -0800 (PST) Received: by mail-pl1-x62a.google.com with SMTP id 4so12370386pli.0 for ; Mon, 28 Nov 2022 21:23:25 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fromorbit-com.20210112.gappssmtp.com; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=IxIKWVEioCs+RHVXPU4kDjkDlAi81794JMXS5NNgfAw=; b=2T52pGRKgEtayT4yprZLraDcBMaAMxVkb2DwqDa9bAp88A83rFMh+jXIW93T1jWerz GlXnBY9ySl3V+A0gPRzGAWL8d6Rv7csQIpFTWu4kRJjsxRXouCR4XZy7j5/44q+vwrjB rBgvQXAii11F6g4rl+P9V2pkdXMToqqQgIGgnx6tzzzwBmL7DWFqW3Y2Ln6+/eLK868s G6g5mo4af3a0lieAGTx0oj5i/pkr3McU28nqUIPUrX0iVPNxOteDWkaSDT5V17+dqc2V +zgx4eka0DcMSDRjJd03DoRE8BbkzNvEkhpBY66zb+0ipOs4X1eYzIUEihMcjJqbZ9lQ SKng== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=IxIKWVEioCs+RHVXPU4kDjkDlAi81794JMXS5NNgfAw=; b=KrudKjT0p/TnbFpX4TryFeFtmrn39ZAQY6TMjRi79DpQYIxNhbYwyhgn7pdrgy+cPC k9GeN+6No7FG0t9Zl0nKK95sqssATYMlGliZ1ojLWIHPslAtqsznYnB5fb0lMqB9npqW ctWF5bAYhemoSw+vVb9N/lMWztLpOpsbwGLzWdSsprNi7UUHD2qqfafqNE2Vl6xXWLgi 8BTHnBuh/lO3Ms7ub+RneHPCjAJxvKY3QDeHs6rV5PXais+QenZtPmQMYKhNQGmbKJez 3XR1Siz/MgWMmc7fguEnUqoXGtowg9MgB669ZgzubZYH9C4LqCeRSNqsmqwFkyZXhDSx /Dig== X-Gm-Message-State: ANoB5pkYb2gzWa8QOaLBq8XSL78QZYaDbBXWsuyYZpmUsVfPesUEICLn 9Xp2enEZOPTDEnzXTuGJo5nVXV9FUGY2FQ== X-Google-Smtp-Source: AA0mqf4pCwcBL3EjaF5Vljk/lj/SfK385iKV7tt4w/zgYAUE7BYu5UjCN4p9tXOp9UPhu86HehvBGg== X-Received: by 2002:a17:90a:fc92:b0:219:1545:bc57 with SMTP id ci18-20020a17090afc9200b002191545bc57mr18476317pjb.133.1669699405314; Mon, 28 Nov 2022 21:23:25 -0800 (PST) Received: from dread.disaster.area (pa49-186-65-106.pa.vic.optusnet.com.au. [49.186.65.106]) by smtp.gmail.com with ESMTPSA id y24-20020aa79438000000b0057293b17c8bsm8908073pfo.22.2022.11.28.21.23.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 28 Nov 2022 21:23:24 -0800 (PST) Received: from dave by dread.disaster.area with local (Exim 4.92.3) (envelope-from ) id 1ozt5W-002JFj-7w; Tue, 29 Nov 2022 16:23:22 +1100 Date: Tue, 29 Nov 2022 16:23:22 +1100 From: Dave Chinner To: "Darrick J. Wong" Cc: linux-xfs@vger.kernel.org Subject: Re: [PATCH 1/3] xfs: invalidate block device page cache during unmount Message-ID: <20221129052322.GA3600936@dread.disaster.area> References: <166930915825.2061853.2470510849612284907.stgit@magnolia> <166930916399.2061853.16165124824627761814.stgit@magnolia> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <166930916399.2061853.16165124824627761814.stgit@magnolia> Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org On Thu, Nov 24, 2022 at 08:59:24AM -0800, Darrick J. Wong wrote: > From: Darrick J. Wong > > Every now and then I see fstests failures on aarch64 (64k pages) that > trigger on the following sequence: > > mkfs.xfs $dev > mount $dev $mnt > touch $mnt/a > umount $mnt > xfs_db -c 'path /a' -c 'print' $dev > > 99% of the time this succeeds, but every now and then xfs_db cannot find > /a and fails. This turns out to be a race involving udev/blkid, the > page cache for the block device, and the xfs_db process. > > udev is triggered whenever anyone closes a block device or unmounts it. > The default udev rules invoke blkid to read the fs super and create > symlinks to the bdev under /dev/disk. For this, it uses buffered reads > through the page cache. > > xfs_db also uses buffered reads to examine metadata. There is no > coordination between xfs_db and udev, which means that they can run > concurrently. Note there is no coordination between the kernel and > blkid either. > > On a system with 64k pages, the page cache can cache the superblock and > the root inode (and hence the root dir) with the same 64k page. If > udev spawns blkid after the mkfs and the system is busy enough that it > is still running when xfs_db starts up, they'll both read from the same > page in the pagecache. > > The unmount writes updated inode metadata to disk directly. The XFS > buffer cache does not use the bdev pagecache, nor does it invalidate the > pagecache on umount. If the above scenario occurs, the pagecache no > longer reflects what's on disk, xfs_db reads the stale metadata, and > fails to find /a. Most of the time this succeeds because closing a bdev > invalidates the page cache, but when processes race, everyone loses. > > Fix the problem by invalidating the bdev pagecache after flushing the > bdev, so that xfs_db will see up to date metadata. > > Signed-off-by: Darrick J. Wong > --- > fs/xfs/xfs_buf.c | 1 + > 1 file changed, 1 insertion(+) > > > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c > index dde346450952..54c774af6e1c 100644 > --- a/fs/xfs/xfs_buf.c > +++ b/fs/xfs/xfs_buf.c > @@ -1945,6 +1945,7 @@ xfs_free_buftarg( > list_lru_destroy(&btp->bt_lru); > > blkdev_issue_flush(btp->bt_bdev); > + invalidate_bdev(btp->bt_bdev); > fs_put_dax(btp->bt_daxdev, btp->bt_mount); > > kmem_free(btp); Looks OK and because XFS has multiple block devices we have to do this invalidation for each bdev. Reviewed-by: Dave Chinner However: this does not look to be an XFS specific problem. If we look at reconfigure_super(), when it completes a remount-ro operation it calls invalidate_bdev() because: /* * Some filesystems modify their metadata via some other path than the * bdev buffer cache (eg. use a private mapping, or directories in * pagecache, etc). Also file data modifications go via their own * mappings. So If we try to mount readonly then copy the filesystem * from bdev, we could get stale data, so invalidate it to give a best * effort at coherency. */ if (remount_ro && sb->s_bdev) invalidate_bdev(sb->s_bdev); This is pretty much the same problem as this patch avoids for XFS in the unmount path, yes? Shouldn't we be adding a call to invalidate_bdev(sb->s_bdev) after the fs->kill_sb() call in deactivate_locked_super() so that this problem goes away for all filesystems? Cheers, Dave. -- Dave Chinner david@fromorbit.com