Re: [PATCH 1/3] ntfs: wait for sync mft writes to complete

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Hyunchul Lee <hyc.lee@gmail.com>
To: DaeMyung Kang <charsyam@gmail.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>,
	Arnd Bergmann <arnd@arndb.de>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/3] ntfs: wait for sync mft writes to complete
Date: Mon, 4 May 2026 10:21:21 +0900	[thread overview]
Message-ID: <aff0kZFdQsa2_hme@hyunchul-PC02> (raw)
In-Reply-To: <afaac877edbd92ee82bb443ca35fd110f823a31d.1777568957.git.charsyam@gmail.com>

On Fri, May 01, 2026 at 02:20:53AM +0900, DaeMyung Kang wrote:
> ntfs_sync_mft_mirror() and write_mft_record_nolock() with @sync set
> are both documented as synchronous, but neither actually waits for
> the bio they submit nor inspects bi_status.  write_inode() can
> return success while dirty mft record bytes are still in flight, and
> bio errors are silently dropped: the volume is not marked with
> errors and the inode is not redirtied.  This breaks fsync()/sync
> metadata durability.
> 
> Switch ntfs_sync_mft_mirror() and the @sync path of
> write_mft_record_nolock() to submit_bio_wait() and propagate the
> returned error to the caller.  Capture ntfs_sync_mft_mirror()'s
> return value at its call sites in write_mft_record_nolock() so a
> mirror write failure surfaces too.
> 
> The @sync parameter only controls the main MFT bio.  The !@sync main
> submission is therefore unchanged and still uses ntfs_bio_end_io() to
> drop the folio reference taken before submission.  The mirror call
> has always been documented as performing synchronous I/O regardless
> of @sync, so making it actually block restores the originally
> intended contract for both @sync and !@sync callers.
> 
> Note this only fixes the synchronous mirror/main paths reachable
> from write_mft_record_nolock().  The main MFT write submitted from
> ntfs_write_mft_block() (the .writepages path) still does not wait
> for completion or check bi_status; that requires a larger
> restructuring and is left to a follow-up patch.
> 
> Fixes: 115380f9a2f9 ("ntfs: update mft operations")
> Signed-off-by: DaeMyung Kang <charsyam@gmail.com>

Looks good to me.

Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>

> ---
>  fs/ntfs/mft.c | 63 +++++++++++++++++++++++++++++++++------------------
>  1 file changed, 41 insertions(+), 22 deletions(-)
> 
> diff --git a/fs/ntfs/mft.c b/fs/ntfs/mft.c
> index 7d989267a82b..4051b4823162 100644
> --- a/fs/ntfs/mft.c
> +++ b/fs/ntfs/mft.c
> @@ -449,7 +449,7 @@ static void ntfs_bio_end_io(struct bio *bio)
>  int ntfs_sync_mft_mirror(struct ntfs_volume *vol, const u64 mft_no,
>  		struct mft_record *m)
>  {
> -	u8 *kmirr = NULL;
> +	u8 *kmirr;
>  	struct folio *folio;
>  	unsigned int folio_ofs, lcn_folio_off = 0;
>  	int err = 0;
> @@ -479,6 +479,7 @@ int ntfs_sync_mft_mirror(struct ntfs_volume *vol, const u64 mft_no,
>  	kmirr = kmap_local_folio(folio, 0) + folio_ofs;
>  	/* Copy the mst protected mft record to the mirror. */
>  	memcpy(kmirr, m, vol->mft_record_size);
> +	kunmap_local(kmirr);
>  
>  	if (vol->cluster_size_bits > PAGE_SHIFT) {
>  		lcn_folio_off = folio->index << PAGE_SHIFT;
> @@ -490,20 +491,22 @@ int ntfs_sync_mft_mirror(struct ntfs_volume *vol, const u64 mft_no,
>  		NTFS_B_TO_SECTOR(vol, NTFS_CLU_TO_B(vol, vol->mftmirr_lcn) +
>  				 lcn_folio_off + folio_ofs);
>  
> -	if (!bio_add_folio(bio, folio, vol->mft_record_size, folio_ofs)) {
> +	if (bio_add_folio(bio, folio, vol->mft_record_size, folio_ofs))
> +		err = submit_bio_wait(bio);
> +	else
>  		err = -EIO;
> -		bio_put(bio);
> -		goto unlock_folio;
> -	}
> +	bio_put(bio);
>  
> -	bio->bi_end_io = ntfs_bio_end_io;
> -	submit_bio(bio);
> -	/* Current state: all buffers are clean, unlocked, and uptodate. */
> +	/*
> +	 * The in-memory mirror is now valid because we just memcpy()'d the
> +	 * mst-protected mft record into it.  Mark the folio uptodate even on
> +	 * write error so a subsequent read_mapping_folio() does not refetch
> +	 * the stale on-disk mirror and overwrite this copy.  The error is
> +	 * propagated to the caller via @err.
> +	 */
>  	folio_mark_uptodate(folio);
>  
> -unlock_folio:
>  	folio_unlock(folio);
> -	kunmap_local(kmirr);
>  	folio_put(folio);
>  	if (likely(!err)) {
>  		ntfs_debug("Done.");
> @@ -588,20 +591,36 @@ int write_mft_record_nolock(struct ntfs_inode *ni, struct mft_record *m, int syn
>  		}
>  
>  		/* Synchronize the mft mirror now if not @sync. */
> -		if (!sync && ni->mft_no < vol->mftmirr_size)
> -			ntfs_sync_mft_mirror(vol, ni->mft_no, fixup_m);
> +		if (!sync && ni->mft_no < vol->mftmirr_size) {
> +			int sub_err = ntfs_sync_mft_mirror(vol, ni->mft_no,
> +							   fixup_m);
> +			if (unlikely(sub_err) && !err)
> +				err = sub_err;
> +		}
>  
> -		folio_get(folio);
> -		bio->bi_private = folio;
> -		bio->bi_end_io = ntfs_bio_end_io;
> -		submit_bio(bio);
> +		if (sync) {
> +			int sub_err = submit_bio_wait(bio);
> +
> +			bio_put(bio);
> +			if (unlikely(sub_err) && !err)
> +				err = sub_err;
> +		} else {
> +			folio_get(folio);
> +			bio->bi_private = folio;
> +			bio->bi_end_io = ntfs_bio_end_io;
> +			submit_bio(bio);
> +		}
>  		offset += vol->cluster_size;
>  		i++;
>  	}
>  
>  	/* If @sync, now synchronize the mft mirror. */
> -	if (sync && ni->mft_no < vol->mftmirr_size)
> -		ntfs_sync_mft_mirror(vol, ni->mft_no, fixup_m);
> +	if (sync && ni->mft_no < vol->mftmirr_size) {
> +		int sub_err = ntfs_sync_mft_mirror(vol, ni->mft_no, fixup_m);
> +
> +		if (unlikely(sub_err) && !err)
> +			err = sub_err;
> +	}
>  	kunmap_local(kaddr);
>  	if (unlikely(err)) {
>  		/* I/O error during writing.  This is really bad! */
> @@ -617,10 +636,10 @@ int write_mft_record_nolock(struct ntfs_inode *ni, struct mft_record *m, int syn
>  	bio_put(bio);
>  err_out:
>  	/*
> -	 * Current state: all buffers are clean, unlocked, and uptodate.
> -	 * The caller should mark the base inode as bad so that no more i/o
> -	 * happens.  ->drop_inode() will still be invoked so all extent inodes
> -	 * and other allocated memory will be freed.
> +	 * The caller should mark the base inode as bad so no more I/O
> +	 * happens. ->drop_inode() will still be invoked so all extent inodes
> +	 * and other allocated memory will be freed. ENOMEM is retried by
> +	 * redirtying the mft record below.
>  	 */
>  	if (err == -ENOMEM) {
>  		ntfs_error(vol->sb,
> -- 
> 2.43.0
> 

-- 
Thanks,
Hyunchul

next prev parent reply	other threads:[~2026-05-04  1:21 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-30 17:20 [PATCH 0/3] ntfs: error/durability fixes for mft writepage paths DaeMyung Kang
2026-04-30 17:20 ` [PATCH 1/3] ntfs: wait for sync mft writes to complete DaeMyung Kang
2026-05-04  1:21   ` Hyunchul Lee [this message]
2026-04-30 17:20 ` [PATCH 2/3] ntfs: redirty folio when ntfs_write_mft_block() runs out of memory DaeMyung Kang
2026-05-01 13:04   ` Hyunchul Lee
2026-04-30 17:20 ` [PATCH 3/3] ntfs: capture mft mirror sync errors in ntfs_write_mft_block() DaeMyung Kang
2026-05-01 13:04   ` Hyunchul Lee
2026-05-04 11:00 ` [PATCH 0/3] ntfs: error/durability fixes for mft writepage paths Namjae Jeon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aff0kZFdQsa2_hme@hyunchul-PC02 \
    --to=hyc.lee@gmail.com \
    --cc=arnd@arndb.de \
    --cc=charsyam@gmail.com \
    --cc=linkinjeon@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.