From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8E656C6FA90 for ; Tue, 20 Sep 2022 02:45:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230038AbiITCp2 (ORCPT ); Mon, 19 Sep 2022 22:45:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50514 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230055AbiITCpZ (ORCPT ); Mon, 19 Sep 2022 22:45:25 -0400 Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au [211.29.132.249]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id C56A8491E0; Mon, 19 Sep 2022 19:45:23 -0700 (PDT) Received: from dread.disaster.area (pa49-180-183-60.pa.nsw.optusnet.com.au [49.180.183.60]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id A23661100975; Tue, 20 Sep 2022 12:45:21 +1000 (AEST) Received: from dave by dread.disaster.area with local (Exim 4.92.3) (envelope-from ) id 1oaTGB-009qB6-UQ; Tue, 20 Sep 2022 12:45:19 +1000 Date: Tue, 20 Sep 2022 12:45:19 +1000 From: Dave Chinner To: Shiyang Ruan Cc: linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org, nvdimm@lists.linux.dev, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, djwong@kernel.org, dan.j.williams@intel.com, hch@infradead.org, jane.chu@oracle.com Subject: Re: [PATCH 3/3] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind Message-ID: <20220920024519.GQ3600936@dread.disaster.area> References: <9e9521a4-6e07-e226-2814-b78a2451656b@fujitsu.com> <1662114961-66-1-git-send-email-ruansy.fnst@fujitsu.com> <1662114961-66-4-git-send-email-ruansy.fnst@fujitsu.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1662114961-66-4-git-send-email-ruansy.fnst@fujitsu.com> X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.4 cv=e9dl9Yl/ c=1 sm=1 tr=0 ts=63292942 a=mj5ET7k2jFntY++HerHxfg==:117 a=mj5ET7k2jFntY++HerHxfg==:17 a=kj9zAlcOel0A:10 a=xOM3xZuef0cA:10 a=VwQbUJbxAAAA:8 a=QyXUC8HyAAAA:8 a=omOdbC7AAAAA:8 a=7-415B0cAAAA:8 a=_WoWtd2e78iMft2-j4wA:9 a=CjuIK1q_8ugA:10 a=AjGcO6oz07-iQ99wixmX:22 a=biEYGPWJfzWAr4FL6Ov7:22 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org On Fri, Sep 02, 2022 at 10:36:01AM +0000, Shiyang Ruan wrote: > This patch is inspired by Dan's "mm, dax, pmem: Introduce > dev_pagemap_failure()"[1]. With the help of dax_holder and > ->notify_failure() mechanism, the pmem driver is able to ask filesystem > (or mapped device) on it to unmap all files in use and notify processes > who are using those files. > > Call trace: > trigger unbind > -> unbind_store() > -> ... (skip) > -> devres_release_all() # was pmem driver ->remove() in v1 > -> kill_dax() > -> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE) > -> xfs_dax_notify_failure() > > Introduce MF_MEM_PRE_REMOVE to let filesystem know this is a remove > event. So do not shutdown filesystem directly if something not > supported, or if failure range includes metadata area. Make sure all > files and processes are handled correctly. > > [1]: https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.stgit@dwillia2-desk3.amr.corp.intel.com/ > > Signed-off-by: Shiyang Ruan > --- > drivers/dax/super.c | 3 ++- > fs/xfs/xfs_notify_failure.c | 23 +++++++++++++++++++++++ > include/linux/mm.h | 1 + > 3 files changed, 26 insertions(+), 1 deletion(-) > > diff --git a/drivers/dax/super.c b/drivers/dax/super.c > index 9b5e2a5eb0ae..cf9a64563fbe 100644 > --- a/drivers/dax/super.c > +++ b/drivers/dax/super.c > @@ -323,7 +323,8 @@ void kill_dax(struct dax_device *dax_dev) > return; > > if (dax_dev->holder_data != NULL) > - dax_holder_notify_failure(dax_dev, 0, U64_MAX, 0); > + dax_holder_notify_failure(dax_dev, 0, U64_MAX, > + MF_MEM_PRE_REMOVE); > > clear_bit(DAXDEV_ALIVE, &dax_dev->flags); > synchronize_srcu(&dax_srcu); > diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c > index 3830f908e215..5e04ba7fa403 100644 > --- a/fs/xfs/xfs_notify_failure.c > +++ b/fs/xfs/xfs_notify_failure.c > @@ -22,6 +22,7 @@ > > #include > #include > +#include > > struct xfs_failure_info { > xfs_agblock_t startblock; > @@ -77,6 +78,9 @@ xfs_dax_failure_fn( > > if (XFS_RMAP_NON_INODE_OWNER(rec->rm_owner) || > (rec->rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK))) { > + /* The device is about to be removed. Not a really failure. */ > + if (notify->mf_flags & MF_MEM_PRE_REMOVE) > + return 0; > notify->want_shutdown = true; > return 0; > } > @@ -182,12 +186,23 @@ xfs_dax_notify_failure( > struct xfs_mount *mp = dax_holder(dax_dev); > u64 ddev_start; > u64 ddev_end; > + int error; > > if (!(mp->m_super->s_flags & SB_BORN)) { > xfs_warn(mp, "filesystem is not ready for notify_failure()!"); > return -EIO; > } > > + if (mf_flags & MF_MEM_PRE_REMOVE) { > + xfs_info(mp, "device is about to be removed!"); > + down_write(&mp->m_super->s_umount); > + error = sync_filesystem(mp->m_super); > + drop_pagecache_sb(mp->m_super, NULL); > + up_write(&mp->m_super->s_umount); > + if (error) > + return error; If the device is about to go away unexpectedly, shouldn't this shut down the filesystem after syncing it here? If the filesystem has been shut down, then everything will fail before removal finally triggers, and the act of unmounting the filesystem post device removal will clean up the page cache and all the other caches. IOWs, I don't understand why the page cache is considered special here (as opposed to, say, the inode or dentry caches), nor why we aren't shutting down the filesystem directly after syncing it to disk to ensure that we don't end up with applications losing data as a result of racing with the removal.... Cheers, Dave. -- Dave Chinner david@fromorbit.com