[PATCH 0/3] ntfs: error/durability fixes for mft writepage paths

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/3] ntfs: error/durability fixes for mft writepage paths
@ 2026-04-30 17:20 DaeMyung Kang
  2026-04-30 17:20 ` [PATCH 1/3] ntfs: wait for sync mft writes to complete DaeMyung Kang
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: DaeMyung Kang @ 2026-04-30 17:20 UTC (permalink / raw)
  To: Namjae Jeon, Hyunchul Lee
  Cc: Arnd Bergmann, linux-fsdevel, linux-kernel, DaeMyung Kang

This series fixes three independent issues in the ntfs mft writepage
paths in linux-next.

Patch 1 restores the documented synchronous semantics of the @sync
path of write_mft_record_nolock() and of ntfs_sync_mft_mirror().
Both functions claim to wait for I/O completion, but in the converted
bio code they only call submit_bio() and return; bi_status is never
inspected.  As a result write_inode() can report success while dirty
bytes are still in flight and bio errors are silently dropped.
Introduced by commit 115380f9a2f9 ("ntfs: update mft operations").

Patch 2 fixes ntfs_write_mft_block().  When the per-call allocations
for @locked_nis or @ref_inos fail, the function returns -ENOMEM with
the folio still locked, which stalls any later task that needs the
folio's lock and drops the dirty state from the writeback iterator's
point of view.  Use folio_redirty_for_writepage() and folio_unlock()
before returning.  Introduced when those buffers were moved off the
stack by commit f462fdf3d6a4 ("ntfs: reduce stack usage in
ntfs_write_mft_block()").

Patch 3 captures the return value of ntfs_sync_mft_mirror() inside
ntfs_write_mft_block() so a $MFTMirr write failure during writepages
is propagated and surfaces via NVolErrors.  Patch 3 is what makes
ntfs_sync_mft_mirror()'s newly-meaningful return value (from patch 1)
visible on this code path.  Also introduced by commit
115380f9a2f9 ("ntfs: update mft operations").

Note: this series does not yet wait for the main $MFT bios submitted
inside ntfs_write_mft_block() to complete, nor does it propagate
their bi_status, nor does it redirty folios when records are skipped
by ntfs_may_write_mft_record().  Those issues require a per-folio
writeback completion context and are the subject of a follow-up
patch I am preparing separately.

The patches apply on top of linkinjeon/ntfs-next.  Build-tested on
x86_64 with CONFIG_NTFS_FS=m and pass scripts/checkpatch.pl with no
warnings.

Runtime testing was done in QEMU with a small initramfs, ntfs.ko and
dm-flakey.  The test kernel used CONFIG_NTFS_FS=m, CONFIG_FAILSLAB=y,
CONFIG_FAULT_INJECTION=y, and
CONFIG_FAULT_INJECTION_STACKTRACE_FILTER=y.  KVM was not available in
the test environment, so these runs used TCG acceleration.

On the unpatched baseline (the commit immediately before this series,
9e9354075d5a), the QEMU tests reproduced the fixed failures:

  - With dm-flakey erroring writes during a metadata fsync,
    utime_fsync returned success:

      QEMU-NTFS-ORIG: TEST patch1-original: BUG reproduced
      fsync returned success under write I/O failure

  - With failslab injected from the ntfs_mft_writepages() stack,
    sync stayed blocked after the injected allocation failure and the
    guest later emitted hung-task diagnostics:

      QEMU-NTFS-ORIG: TEST patch2-original: BUG reproduced
      sync still blocked after injected allocation failure

With this series applied, the same QEMU test setup completed:

  - The metadata fsync path returned EIO under dm-flakey write errors:

      QEMU-NTFS-BATCH1: TEST patch1: PASS fsync failed as expected rc=1

  - failslab injection from ntfs_mft_writepages() was observed and sync
    completed without hanging:

      QEMU-NTFS-BATCH1: TEST patch2: observed injected
      ntfs_mft_writepages allocation failure
      QEMU-NTFS-BATCH1: TEST patch2: PASS no hang after
      injected allocation failure rc=0

  - The writepages path observed and propagated an $MFTMirr write
    failure:

      QEMU-NTFS-BATCH1: TEST patch3: observed MFT mirror sync failure
      QEMU-NTFS-BATCH1: TEST patch3: PASS syncfs failed as expected rc=1

  - Final result:

      QEMU-NTFS-BATCH1: PASS all qemu ntfs batch1 checks completed

The whole-device dm-flakey run is not a clean negative control for
patch 3 alone because syncfs can also see main $MFT writeback errors in
the unpatched tree.  It nevertheless covers the patched $MFTMirr error
path and verifies that the error is propagated.

DaeMyung Kang (3):
  ntfs: wait for sync mft writes to complete
  ntfs: redirty folio when ntfs_write_mft_block() runs out of memory
  ntfs: capture mft mirror sync errors in ntfs_write_mft_block()

 fs/ntfs/mft.c | 76 ++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 51 insertions(+), 25 deletions(-)

-- 
2.43.0

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 1/3] ntfs: wait for sync mft writes to complete
  2026-04-30 17:20 [PATCH 0/3] ntfs: error/durability fixes for mft writepage paths DaeMyung Kang
@ 2026-04-30 17:20 ` DaeMyung Kang
  2026-04-30 17:20 ` [PATCH 2/3] ntfs: redirty folio when ntfs_write_mft_block() runs out of memory DaeMyung Kang
  2026-04-30 17:20 ` [PATCH 3/3] ntfs: capture mft mirror sync errors in ntfs_write_mft_block() DaeMyung Kang
  2 siblings, 0 replies; 4+ messages in thread
From: DaeMyung Kang @ 2026-04-30 17:20 UTC (permalink / raw)
  To: Namjae Jeon, Hyunchul Lee
  Cc: Arnd Bergmann, linux-fsdevel, linux-kernel, DaeMyung Kang

ntfs_sync_mft_mirror() and write_mft_record_nolock() with @sync set
are both documented as synchronous, but neither actually waits for
the bio they submit nor inspects bi_status.  write_inode() can
return success while dirty mft record bytes are still in flight, and
bio errors are silently dropped: the volume is not marked with
errors and the inode is not redirtied.  This breaks fsync()/sync
metadata durability.

Switch ntfs_sync_mft_mirror() and the @sync path of
write_mft_record_nolock() to submit_bio_wait() and propagate the
returned error to the caller.  Capture ntfs_sync_mft_mirror()'s
return value at its call sites in write_mft_record_nolock() so a
mirror write failure surfaces too.

The @sync parameter only controls the main MFT bio.  The !@sync main
submission is therefore unchanged and still uses ntfs_bio_end_io() to
drop the folio reference taken before submission.  The mirror call
has always been documented as performing synchronous I/O regardless
of @sync, so making it actually block restores the originally
intended contract for both @sync and !@sync callers.

Note this only fixes the synchronous mirror/main paths reachable
from write_mft_record_nolock().  The main MFT write submitted from
ntfs_write_mft_block() (the .writepages path) still does not wait
for completion or check bi_status; that requires a larger
restructuring and is left to a follow-up patch.

Fixes: 115380f9a2f9 ("ntfs: update mft operations")
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
---
 fs/ntfs/mft.c | 63 +++++++++++++++++++++++++++++++++------------------
 1 file changed, 41 insertions(+), 22 deletions(-)

diff --git a/fs/ntfs/mft.c b/fs/ntfs/mft.c
index 7d989267a82b..4051b4823162 100644
--- a/fs/ntfs/mft.c
+++ b/fs/ntfs/mft.c
@@ -449,7 +449,7 @@ static void ntfs_bio_end_io(struct bio *bio)
 int ntfs_sync_mft_mirror(struct ntfs_volume *vol, const u64 mft_no,
 		struct mft_record *m)
 {
-	u8 *kmirr = NULL;
+	u8 *kmirr;
 	struct folio *folio;
 	unsigned int folio_ofs, lcn_folio_off = 0;
 	int err = 0;
@@ -479,6 +479,7 @@ int ntfs_sync_mft_mirror(struct ntfs_volume *vol, const u64 mft_no,
 	kmirr = kmap_local_folio(folio, 0) + folio_ofs;
 	/* Copy the mst protected mft record to the mirror. */
 	memcpy(kmirr, m, vol->mft_record_size);
+	kunmap_local(kmirr);
 
 	if (vol->cluster_size_bits > PAGE_SHIFT) {
 		lcn_folio_off = folio->index << PAGE_SHIFT;
@@ -490,20 +491,22 @@ int ntfs_sync_mft_mirror(struct ntfs_volume *vol, const u64 mft_no,
 		NTFS_B_TO_SECTOR(vol, NTFS_CLU_TO_B(vol, vol->mftmirr_lcn) +
 				 lcn_folio_off + folio_ofs);
 
-	if (!bio_add_folio(bio, folio, vol->mft_record_size, folio_ofs)) {
+	if (bio_add_folio(bio, folio, vol->mft_record_size, folio_ofs))
+		err = submit_bio_wait(bio);
+	else
 		err = -EIO;
-		bio_put(bio);
-		goto unlock_folio;
-	}
+	bio_put(bio);
 
-	bio->bi_end_io = ntfs_bio_end_io;
-	submit_bio(bio);
-	/* Current state: all buffers are clean, unlocked, and uptodate. */
+	/*
+	 * The in-memory mirror is now valid because we just memcpy()'d the
+	 * mst-protected mft record into it.  Mark the folio uptodate even on
+	 * write error so a subsequent read_mapping_folio() does not refetch
+	 * the stale on-disk mirror and overwrite this copy.  The error is
+	 * propagated to the caller via @err.
+	 */
 	folio_mark_uptodate(folio);
 
-unlock_folio:
 	folio_unlock(folio);
-	kunmap_local(kmirr);
 	folio_put(folio);
 	if (likely(!err)) {
 		ntfs_debug("Done.");
@@ -588,20 +591,36 @@ int write_mft_record_nolock(struct ntfs_inode *ni, struct mft_record *m, int syn
 		}
 
 		/* Synchronize the mft mirror now if not @sync. */
-		if (!sync && ni->mft_no < vol->mftmirr_size)
-			ntfs_sync_mft_mirror(vol, ni->mft_no, fixup_m);
+		if (!sync && ni->mft_no < vol->mftmirr_size) {
+			int sub_err = ntfs_sync_mft_mirror(vol, ni->mft_no,
+							   fixup_m);
+			if (unlikely(sub_err) && !err)
+				err = sub_err;
+		}
 
-		folio_get(folio);
-		bio->bi_private = folio;
-		bio->bi_end_io = ntfs_bio_end_io;
-		submit_bio(bio);
+		if (sync) {
+			int sub_err = submit_bio_wait(bio);
+
+			bio_put(bio);
+			if (unlikely(sub_err) && !err)
+				err = sub_err;
+		} else {
+			folio_get(folio);
+			bio->bi_private = folio;
+			bio->bi_end_io = ntfs_bio_end_io;
+			submit_bio(bio);
+		}
 		offset += vol->cluster_size;
 		i++;
 	}
 
 	/* If @sync, now synchronize the mft mirror. */
-	if (sync && ni->mft_no < vol->mftmirr_size)
-		ntfs_sync_mft_mirror(vol, ni->mft_no, fixup_m);
+	if (sync && ni->mft_no < vol->mftmirr_size) {
+		int sub_err = ntfs_sync_mft_mirror(vol, ni->mft_no, fixup_m);
+
+		if (unlikely(sub_err) && !err)
+			err = sub_err;
+	}
 	kunmap_local(kaddr);
 	if (unlikely(err)) {
 		/* I/O error during writing.  This is really bad! */
@@ -617,10 +636,10 @@ int write_mft_record_nolock(struct ntfs_inode *ni, struct mft_record *m, int syn
 	bio_put(bio);
 err_out:
 	/*
-	 * Current state: all buffers are clean, unlocked, and uptodate.
-	 * The caller should mark the base inode as bad so that no more i/o
-	 * happens.  ->drop_inode() will still be invoked so all extent inodes
-	 * and other allocated memory will be freed.
+	 * The caller should mark the base inode as bad so no more I/O
+	 * happens. ->drop_inode() will still be invoked so all extent inodes
+	 * and other allocated memory will be freed. ENOMEM is retried by
+	 * redirtying the mft record below.
 	 */
 	if (err == -ENOMEM) {
 		ntfs_error(vol->sb,
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH 2/3] ntfs: redirty folio when ntfs_write_mft_block() runs out of memory
  2026-04-30 17:20 [PATCH 0/3] ntfs: error/durability fixes for mft writepage paths DaeMyung Kang
  2026-04-30 17:20 ` [PATCH 1/3] ntfs: wait for sync mft writes to complete DaeMyung Kang
@ 2026-04-30 17:20 ` DaeMyung Kang
  2026-04-30 17:20 ` [PATCH 3/3] ntfs: capture mft mirror sync errors in ntfs_write_mft_block() DaeMyung Kang
  2 siblings, 0 replies; 4+ messages in thread
From: DaeMyung Kang @ 2026-04-30 17:20 UTC (permalink / raw)
  To: Namjae Jeon, Hyunchul Lee
  Cc: Arnd Bergmann, linux-fsdevel, linux-kernel, DaeMyung Kang

ntfs_write_mft_block() is called by writeback_iter() with the folio
locked.  When the per-call allocations for @locked_nis or @ref_inos
fail, the function returns -ENOMEM directly without unlocking the
folio.  Any later task that needs the folio's lock then stalls, and
the folio's dirty state is silently lost from the writeback
iterator's point of view.

Use folio_redirty_for_writepage() so the folio remains dirty for a
subsequent writeback pass, unlock it, and only then return -ENOMEM
so the caller can propagate the error to fsync()/sync_filesystem().

Fixes: f462fdf3d6a4 ("ntfs: reduce stack usage in ntfs_write_mft_block()")
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
---
 fs/ntfs/mft.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/ntfs/mft.c b/fs/ntfs/mft.c
index 4051b4823162..00f172fd1b21 100644
--- a/fs/ntfs/mft.c
+++ b/fs/ntfs/mft.c
@@ -2740,8 +2740,11 @@ static int ntfs_write_mft_block(struct folio *folio, struct writeback_control *w
 	ntfs_debug("Entering for inode 0x%llx, attribute type 0x%x, folio index 0x%lx.",
 			ni->mft_no, ni->type, folio->index);
 
-	if (!locked_nis || !ref_inos)
+	if (!locked_nis || !ref_inos) {
+		folio_redirty_for_writepage(wbc, folio);
+		folio_unlock(folio);
 		return -ENOMEM;
+	}
 
 	/* We have to zero every time due to mmap-at-end-of-file. */
 	if (folio->index >= (i_size >> folio_shift(folio)))
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH 3/3] ntfs: capture mft mirror sync errors in ntfs_write_mft_block()
  2026-04-30 17:20 [PATCH 0/3] ntfs: error/durability fixes for mft writepage paths DaeMyung Kang
  2026-04-30 17:20 ` [PATCH 1/3] ntfs: wait for sync mft writes to complete DaeMyung Kang
  2026-04-30 17:20 ` [PATCH 2/3] ntfs: redirty folio when ntfs_write_mft_block() runs out of memory DaeMyung Kang
@ 2026-04-30 17:20 ` DaeMyung Kang
  2 siblings, 0 replies; 4+ messages in thread
From: DaeMyung Kang @ 2026-04-30 17:20 UTC (permalink / raw)
  To: Namjae Jeon, Hyunchul Lee
  Cc: Arnd Bergmann, linux-fsdevel, linux-kernel, DaeMyung Kang

After ntfs_sync_mft_mirror() became able to return real I/O errors,
ntfs_write_mft_block() still discards its return value at the call
site inside the per-record loop.  A failed $MFTMirr write therefore
leaves the volume looking clean from the writeback path even though
the on-disk mirror is now stale.

Capture the return value and feed it into the function's existing
@err variable using the same "first error wins" pattern already used
on other failure paths.  The error is propagated to the caller and,
via the existing tail of the function, sets NVolErrors so umount and
chkdsk see the volume as inconsistent.

Fixes: 115380f9a2f9 ("ntfs: update mft operations")
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
---
 fs/ntfs/mft.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/ntfs/mft.c b/fs/ntfs/mft.c
index 00f172fd1b21..866816f710b5 100644
--- a/fs/ntfs/mft.c
+++ b/fs/ntfs/mft.c
@@ -2862,9 +2862,13 @@ static int ntfs_write_mft_block(struct folio *folio, struct writeback_control *w
 			}
 			prev_mft_ofs = mft_ofs;
 
-			if (mft_no < vol->mftmirr_size)
-				ntfs_sync_mft_mirror(vol, mft_no,
+			if (mft_no < vol->mftmirr_size) {
+				int sub_err = ntfs_sync_mft_mirror(vol, mft_no,
 						(struct mft_record *)(kaddr + mft_ofs));
+
+				if (unlikely(sub_err) && !err)
+					err = sub_err;
+			}
 		} else if (ref_inos[nr_ref_inos])
 			nr_ref_inos++;
 	}
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-04-30 17:21 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-30 17:20 [PATCH 0/3] ntfs: error/durability fixes for mft writepage paths DaeMyung Kang
2026-04-30 17:20 ` [PATCH 1/3] ntfs: wait for sync mft writes to complete DaeMyung Kang
2026-04-30 17:20 ` [PATCH 2/3] ntfs: redirty folio when ntfs_write_mft_block() runs out of memory DaeMyung Kang
2026-04-30 17:20 ` [PATCH 3/3] ntfs: capture mft mirror sync errors in ntfs_write_mft_block() DaeMyung Kang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox