From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id F34151AF0CA;
	Thu, 30 Jan 2025 08:41:28 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201
ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1738226489; cv=none;
	b=OUs9jVTBONhLKNzD172niTvO0dZ4d/mUUyeBSg6hTqmql4Vas6yT9e3opSJ/QwENVD4TDdLgUOqm+bFy0J2XEtXwOlwf7WmJ/CzRWRkgXYtrzPCFyxcwr835G3yhIJzuAQ4n8cu8hrB8S1QBDE3ewS5eK0qYjTpfOq+WIBZFjtw=
ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1738226489; c=relaxed/simple;
	bh=8rz4ge6bNOd1BJew9MdLChnjGrGf/7ofvRlSY2Yvk6o=;
	h=Subject:To:Cc:From:Date:In-Reply-To:Message-ID:MIME-Version:
	 Content-Type;
	b=jYOCii2DyI2Pd5cDwUQURF+BmUyc3wWnJukF6PBmYpsBT2VR3AkLrZ5mNBdSbrKCSf4Vai+UCEZwhJJSVZDPFQ8isVB0b1aX4628ruJvj4SUCgFak46yE0B+ltpEmIqew33lhQLK1U/UMaIE66dd8UbpiTyZDz93eEb5iLwpXTk=
ARC-Authentication-Results:i=1; smtp.subspace.kernel.org;
	dkim=pass (1024-bit key) header.d=linuxfoundation.org header.i=@linuxfoundation.org header.b=Eh8UaAjC;
	arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (1024-bit key) header.d=linuxfoundation.org header.i=@linuxfoundation.org header.b="Eh8UaAjC"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 14EAFC4CED2;
	Thu, 30 Jan 2025 08:41:27 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org;
	s=korg; t=1738226488;
	bh=8rz4ge6bNOd1BJew9MdLChnjGrGf/7ofvRlSY2Yvk6o=;
	h=Subject:To:Cc:From:Date:In-Reply-To:From;
	b=Eh8UaAjCFvABMZucwpNuTSJJTyOPJtL3OakV34H4a5FKE54bLjcWaPESz3WwA9nqb
	 Nm6wCsGdxMpjHrAuX+/YdRn1Ndt1uGlSDHB1PK9jsfHa/LudNQA6VTD2YyF5CbudSI
	 dfjHoMYiuunblpD9PXcO6rtCQ8n5bLTFXx14GWTE=
Subject: Patch "xfs: allow read IO and FICLONE to run
 concurrently" has been added to the 6.1-stable tree
To: amir73il@gmail.com,catherine.hoang@oracle.com,chandan.babu@oracle.com,chandanbabu@kernel.org,dchinner@redhat.com,djwong@kernel.org,gregkh@linuxfoundation.org,hch@lst.de,leah.rumancik@gmail.com,xfs-stable@lists.linux.dev
Cc:
From:
Date: Thu, 30 Jan 2025 09:40:59 +0100
In-Reply-To: <20250129184717.80816-10-leah.rumancik@gmail.com>
Message-ID: <2025013059-ricotta-figure-1401@gregkh>
Precedence: bulk
X-Mailing-List: xfs-stable@lists.linux.dev
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Type: text/plain; charset=ANSI_X3.4-1968
Content-Transfer-Encoding: 8bit
X-stable: commit
X-Patchwork-Hint: ignore


This is a note to let you know that I've just added the patch titled

    xfs: allow read IO and FICLONE to run concurrently

to the 6.1-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     xfs-allow-read-io-and-ficlone-to-run-concurrently.patch
and it can be found in the queue-6.1 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let know about it.


>From stable+bounces-111220-greg=kroah.com@vger.kernel.org Wed Jan 29 19:47:54 2025
From: Leah Rumancik
Date: Wed, 29 Jan 2025 10:47:07 -0800
Subject: xfs: allow read IO and FICLONE to run concurrently
To: stable@vger.kernel.org
Cc: xfs-stable@lists.linux.dev, amir73il@gmail.com, chandan.babu@oracle.com, catherine.hoang@oracle.com, "Darrick J. Wong", Dave Chinner, Christoph Hellwig, Chandan Babu R, Leah Rumancik
Message-ID: <20250129184717.80816-10-leah.rumancik@gmail.com>

From: Catherine Hoang

[ Upstream commit 14a537983b228cb050ceca3a5b743d01315dc4aa ]

One of our VM cluster management products needs to snapshot KVM image
files so that they can be restored in case of failure. Snapshotting is
done by redirecting VM disk writes to a sidecar file and using reflink
on the disk image, specifically the FICLONE ioctl as used by
"cp --reflink". Reflink locks the source and destination files while it
operates, which means that reads from the main vm disk image are blocked,
causing the vm to stall. When an image file is heavily fragmented, the
copy process could take several minutes. Some of the vm image files have
50-100 million extent records, and duplicating that much metadata locks
the file for 30 minutes or more. Having activities suspended for such
a long time in a cluster node could result in node eviction.

Clone operations and read IO do not change any data in the source file,
so they should be able to run concurrently. Demote the exclusive locks
taken by FICLONE to shared locks to allow reads while cloning. While a
clone is in progress, writes will take the IOLOCK_EXCL, so they block
until the clone completes.

Link: https://lore.kernel.org/linux-xfs/8911B94D-DD29-4D6E-B5BC-32EAF1866245@oracle.com/
Signed-off-by: Catherine Hoang
Reviewed-by: "Darrick J. Wong"
Reviewed-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Signed-off-by: Chandan Babu R
Signed-off-by: Leah Rumancik
Signed-off-by: Greg Kroah-Hartman
---
 fs/xfs/xfs_file.c    |   63 ++++++++++++++++++++++++++++++++++++++++-----------
 fs/xfs/xfs_inode.c   |   17 +++++++++++++
 fs/xfs/xfs_inode.h   |    9 +++++++
 fs/xfs/xfs_reflink.c |    4 +++
 4 files changed, 80 insertions(+), 13 deletions(-)

--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -214,6 +214,43 @@ xfs_ilock_iocb(
 	return 0;
 }
 
+static int
+xfs_ilock_iocb_for_write(
+	struct kiocb		*iocb,
+	unsigned int		*lock_mode)
+{
+	ssize_t			ret;
+	struct xfs_inode	*ip = XFS_I(file_inode(iocb->ki_filp));
+
+	ret = xfs_ilock_iocb(iocb, *lock_mode);
+	if (ret)
+		return ret;
+
+	if (*lock_mode == XFS_IOLOCK_EXCL)
+		return 0;
+	if (!xfs_iflags_test(ip, XFS_IREMAPPING))
+		return 0;
+
+	xfs_iunlock(ip, *lock_mode);
+	*lock_mode = XFS_IOLOCK_EXCL;
+	return xfs_ilock_iocb(iocb, *lock_mode);
+}
+
+static unsigned int
+xfs_ilock_for_write_fault(
+	struct xfs_inode	*ip)
+{
+	/* get a shared lock if no remapping in progress */
+	xfs_ilock(ip, XFS_MMAPLOCK_SHARED);
+	if (!xfs_iflags_test(ip, XFS_IREMAPPING))
+		return XFS_MMAPLOCK_SHARED;
+
+	/* wait for remapping to complete */
+	xfs_iunlock(ip, XFS_MMAPLOCK_SHARED);
+	xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
+	return XFS_MMAPLOCK_EXCL;
+}
+
 STATIC ssize_t
 xfs_file_dio_read(
 	struct kiocb		*iocb,
@@ -523,7 +560,7 @@ xfs_file_dio_write_aligned(
 	unsigned int		iolock = XFS_IOLOCK_SHARED;
 	ssize_t			ret;
 
-	ret = xfs_ilock_iocb(iocb, iolock);
+	ret = xfs_ilock_iocb_for_write(iocb, &iolock);
 	if (ret)
 		return ret;
 	ret = xfs_file_write_checks(iocb, from, &iolock);
@@ -590,7 +627,7 @@ retry_exclusive:
 		flags = IOMAP_DIO_FORCE_WAIT;
 	}
 
-	ret = xfs_ilock_iocb(iocb, iolock);
+	ret = xfs_ilock_iocb_for_write(iocb, &iolock);
 	if (ret)
 		return ret;
 
@@ -1158,7 +1195,7 @@ xfs_file_remap_range(
 	if (xfs_file_sync_writes(file_in) || xfs_file_sync_writes(file_out))
 		xfs_log_force_inode(dest);
 out_unlock:
-	xfs_iunlock2_io_mmap(src, dest);
+	xfs_iunlock2_remapping(src, dest);
 	if (ret)
 		trace_xfs_reflink_remap_range_error(dest, ret, _RET_IP_);
 	/*
@@ -1313,6 +1350,7 @@ __xfs_filemap_fault(
 	struct inode		*inode = file_inode(vmf->vma->vm_file);
 	struct xfs_inode	*ip = XFS_I(inode);
 	vm_fault_t		ret;
+	unsigned int		lock_mode = 0;
 
 	trace_xfs_filemap_fault(ip, pe_size, write_fault);
 
@@ -1321,25 +1359,24 @@ __xfs_filemap_fault(
 		file_update_time(vmf->vma->vm_file);
 	}
 
+	if (IS_DAX(inode) || write_fault)
+		lock_mode = xfs_ilock_for_write_fault(XFS_I(inode));
+
 	if (IS_DAX(inode)) {
 		pfn_t pfn;
 
-		xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
 		ret = xfs_dax_fault(vmf, pe_size, write_fault, &pfn);
 		if (ret & VM_FAULT_NEEDDSYNC)
 			ret = dax_finish_sync_fault(vmf, pe_size, pfn);
-		xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
+	} else if (write_fault) {
+		ret = iomap_page_mkwrite(vmf, &xfs_page_mkwrite_iomap_ops);
 	} else {
-		if (write_fault) {
-			xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
-			ret = iomap_page_mkwrite(vmf,
-					&xfs_page_mkwrite_iomap_ops);
-			xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
-		} else {
-			ret = filemap_fault(vmf);
-		}
+		ret = filemap_fault(vmf);
 	}
 
+	if (lock_mode)
+		xfs_iunlock(XFS_I(inode), lock_mode);
+
 	if (write_fault)
 		sb_end_pagefault(inode->i_sb);
 	return ret;
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3644,6 +3644,23 @@ xfs_iunlock2_io_mmap(
 		inode_unlock(VFS_I(ip1));
 }
 
+/* Drop the MMAPLOCK and the IOLOCK after a remap completes. */
+void
+xfs_iunlock2_remapping(
+	struct xfs_inode	*ip1,
+	struct xfs_inode	*ip2)
+{
+	xfs_iflags_clear(ip1, XFS_IREMAPPING);
+
+	if (ip1 != ip2)
+		xfs_iunlock(ip1, XFS_MMAPLOCK_SHARED);
+	xfs_iunlock(ip2, XFS_MMAPLOCK_EXCL);
+
+	if (ip1 != ip2)
+		inode_unlock_shared(VFS_I(ip1));
+	inode_unlock(VFS_I(ip2));
+}
+
 /*
  * Reload the incore inode list for this inode. Caller should ensure that
  * the link count cannot change, either by taking ILOCK_SHARED or otherwise
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -347,6 +347,14 @@ static inline bool xfs_inode_has_large_e
 /* Quotacheck is running but inode has not been added to quota counts. */
 #define XFS_IQUOTAUNCHECKED	(1 << 14)
 
+/*
+ * Remap in progress. Callers that wish to update file data while
+ * holding a shared IOLOCK or MMAPLOCK must drop the lock and retake
+ * the lock in exclusive mode. Relocking the file will block until
+ * IREMAPPING is cleared.
+ */
+#define XFS_IREMAPPING		(1U << 15)
+
 /* All inode state flags related to inode reclaim. */
 #define XFS_ALL_IRECLAIM_FLAGS	(XFS_IRECLAIMABLE | \
 				 XFS_IRECLAIM | \
@@ -595,6 +603,7 @@ void xfs_end_io(struct work_struct *work
 
 int xfs_ilock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2);
 void xfs_iunlock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2);
+void xfs_iunlock2_remapping(struct xfs_inode *ip1, struct xfs_inode *ip2);
 
 static inline bool
 xfs_inode_unlinked_incomplete(
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1539,6 +1539,10 @@ xfs_reflink_remap_prep(
 	if (ret)
 		goto out_unlock;
 
+	xfs_iflags_set(src, XFS_IREMAPPING);
+	if (inode_in != inode_out)
+		xfs_ilock_demote(src, XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL);
+
 	return 0;
 out_unlock:
 	xfs_iunlock2_io_mmap(src, dest);


Patches currently in stable-queue which might be from leah.rumancik@gmail.com are

queue-6.1/xfs-allow-read-io-and-ficlone-to-run-concurrently.patch
queue-6.1/xfs-hoist-freeing-of-rt-data-fork-extent-mappings.patch
queue-6.1/xfs-make-sure-maxlen-is-still-congruent-with-prod-when-rounding-down.patch
queue-6.1/xfs-only-remap-the-written-blocks-in-xfs_reflink_end_cow_extent.patch
queue-6.1/xfs-dquot-recovery-does-not-validate-the-recovered-dquot.patch
queue-6.1/xfs-clean-up-dqblk-extraction.patch
queue-6.1/xfs-abort-intent-items-when-recovery-intents-fail.patch
queue-6.1/xfs-up-ic_sema-if-flushing-data-device-fails.patch
queue-6.1/xfs-fix-internal-error-from-agfl-exhaustion.patch
queue-6.1/xfs-factor-out-xfs_defer_pending_abort.patch
queue-6.1/xfs-fix-units-conversion-error-in-xfs_bmap_del_extent_delay.patch
queue-6.1/xfs-bump-max-fsgeom-struct-version.patch
queue-6.1/xfs-handle-nimaps-0-from-xfs_bmapi_write-in-xfs_alloc_file_space.patch
queue-6.1/xfs-rt-stubs-should-return-negative-errnos-when-rt-disabled.patch
queue-6.1/xfs-clean-up-fs_xflag_realtime-handling-in-xfs_ioctl_setattr_xflags.patch
queue-6.1/xfs-respect-the-stable-writes-flag-on-the-rt-device.patch
queue-6.1/xfs-introduce-protection-for-drop-nlink.patch
queue-6.1/xfs-prevent-rt-growfs-when-quota-is-enabled.patch
queue-6.1/xfs-inode-recovery-does-not-validate-the-recovered-inode.patch