* [PATCH 10/11] xfs_scrub: don't leak the autofsck fsproperty handle
From: Darrick J. Wong @ 2026-06-24 18:17 UTC
To: djwong, aalbersh; +Cc: linux-xfs, hch, linux-xfs
In-Reply-To: <178232484383.915780.8675173410074139317.stgit@frogsfrogsfrogs>
From: Darrick J. Wong <djwong@kernel.org>
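Codex noticed that we leak the fsproperty handle if the filesystem
doesn't actually have the property set, because that path skips the
fsprops_free_handle call. Fix that by moving the free call below the
summarize label so that every exit path releases the handle;
fsprops_free_handle can cope with a totally zeroed structure.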
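A minimal sketch of the single-exit cleanup pattern this fix relies on, under stated assumptions: prop_handle, prop_free_handle, handles_freed and mode_from_property are hypothetical stand-ins rather than the xfsprogs fsprops API, and malloc stands in for opening the handle. Every path, including the one where the property is not set, reaches the free at the end, and the free tolerates a zeroed handle.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the fsproperty handle (not the real type). */
struct prop_handle {
	char *buf;		/* NULL when nothing was allocated */
};

/* Hypothetical counter so callers can see how many handles were freed. */
static int handles_freed;

/* Safe on a zeroed handle: free(NULL) is a no-op. */
static void prop_free_handle(struct prop_handle *ph)
{
	free(ph->buf);
	ph->buf = NULL;
	handles_freed++;
}

static int mode_from_property(const char *value, bool open_ok)
{
	struct prop_handle ph;
	int mode = 0;		/* default mode when the property is unusable */

	memset(&ph, 0, sizeof(ph));
	if (!open_ok)
		goto summarize;		/* handle never opened: still zeroed */

	ph.buf = malloc(32);		/* stands in for opening the handle */
	if (!ph.buf || value == NULL)
		goto summarize;		/* property not set: the old leak path */

	mode = (value[0] == 'r') ? 2 : 1;

summarize:
	/* Single exit: every path above releases the handle here. */
	prop_free_handle(&ph);
	return mode;
}
```

Calling mode_from_property(NULL, true) exercises the path that used to leak; with the free placed after the summarize label, that handle is released like all the others.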
Cc: <linux-xfs@vger.kernel.org> # v6.10.0
Fixes: 9451b5ee0d0d2d ("xfs_scrub: allow sysadmin to control background scrubs")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
scrub/phase1.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/scrub/phase1.c b/scrub/phase1.c
index 6ac59264b50bb7..620a12393b3658 100644
--- a/scrub/phase1.c
+++ b/scrub/phase1.c
@@ -178,8 +178,6 @@ mode_from_autofsck(
break;
}
- fsprops_free_handle(&fph);
-
summarize:
switch (ctx->mode) {
case SCRUB_MODE_NONE:
@@ -200,6 +198,7 @@ mode_from_autofsck(
break;
}
+ fsprops_free_handle(&fph);
return;
no_property:
/*
* [PATCH 09/11] xfs_scrub: account for reflinked realtime file data
From: Darrick J. Wong @ 2026-06-24 18:16 UTC
To: djwong, aalbersh; +Cc: hch, linux-xfs
In-Reply-To: <178232484383.915780.8675173410074139317.stgit@frogsfrogsfrogs>
From: Darrick J. Wong <djwong@kernel.org>
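When we added reflink support to the rt device, we forgot to account for
multiply-owned space in the phase 7 accounting, so realtime extents
shared by several files were counted once per owner. Give the rt device
its own overlap cursor, like the one the data device already has.
Luckily, Codex noticed this for us.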
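A minimal sketch of per-device overlap accounting, assuming records for each device arrive sorted by physical address. The enum dev, struct counts and count_record names are hypothetical, not the phase 7 code itself.

```c
#include <stdint.h>

/* Hypothetical device indices for this sketch. */
enum dev { DEV_DATA = 0, DEV_RT = 1, NR_DEVS = 2 };

/* Per-device byte totals and "next byte not yet counted" cursors. */
struct counts {
	uint64_t bytes[NR_DEVS];
	uint64_t next_phys[NR_DEVS];
};

/*
 * Account one reverse-mapping record.  Reflinked space shows up as
 * several records covering the same physical range, so only the part
 * beyond that device's cursor is new.
 */
static void count_record(struct counts *c, enum dev d,
			 uint64_t physical, uint64_t length)
{
	uint64_t end = physical + length;

	if (c->next_phys[d] >= end)
		return;					/* already counted */
	if (c->next_phys[d] > physical)
		length -= c->next_phys[d] - physical;	/* count only the tail */
	c->bytes[d] += length;
	c->next_phys[d] = end;
}
```

Feeding it a data record [0, 8192) and rt records [0, 4096), [0, 4096) and [2048, 6144) yields 8192 data bytes and 6144 rt bytes. The reflinked copy adds nothing, and the last record adds only [4096, 6144). A separate cursor per device matters because physical addresses on different devices are unrelated. A single shared cursor left at 8192 by the data record would have discarded all of the rt records.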
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
scrub/phase7.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/scrub/phase7.c b/scrub/phase7.c
index 4a25c521fa0c76..375ba0632c353b 100644
--- a/scrub/phase7.c
+++ b/scrub/phase7.c
@@ -27,7 +27,8 @@ struct summary_counts {
unsigned long long dbytes; /* data dev bytes */
unsigned long long rbytes; /* rt dev bytes */
unsigned long long lbytes; /* log dev bytes */
- unsigned long long next_phys; /* next phys bytes we see? */
+ unsigned long long next_dphys; /* next phys bytes we see on data dev? */
+ unsigned long long next_rphys; /* next phys bytes we see on rt dev? */
unsigned long long agbytes; /* freespace bytes */
/* Free space histogram, in fsb */
@@ -120,16 +121,21 @@ count_block_summary(
break;
case XFS_DEV_RT:
/* Count realtime extents. */
+ if (counts->next_rphys >= fsmap->fmr_physical + len)
+ return 0;
+ else if (counts->next_rphys > fsmap->fmr_physical)
+ len -= counts->next_rphys - fsmap->fmr_physical;
counts->rbytes += len;
+ counts->next_rphys = fsmap->fmr_physical + fsmap->fmr_length;
break;
case XFS_DEV_DATA:
/* Count datadev extents. */
- if (counts->next_phys >= fsmap->fmr_physical + len)
+ if (counts->next_dphys >= fsmap->fmr_physical + len)
return 0;
- else if (counts->next_phys > fsmap->fmr_physical)
- len -= counts->next_phys - fsmap->fmr_physical;
+ else if (counts->next_dphys > fsmap->fmr_physical)
+ len -= counts->next_dphys - fsmap->fmr_physical;
counts->dbytes += len;
- counts->next_phys = fsmap->fmr_physical + fsmap->fmr_length;
+ counts->next_dphys = fsmap->fmr_physical + fsmap->fmr_length;
break;
}
* [PATCH 08/11] xfs_scrub: account only data extent tail after an overlap
From: Darrick J. Wong @ 2026-06-24 18:16 UTC
To: djwong, aalbersh; +Cc: linux-xfs, hch, linux-xfs
In-Reply-To: <178232484383.915780.8675173410074139317.stgit@frogsfrogsfrogs>
From: Darrick J. Wong <djwong@kernel.org>
Codex points out that the fsmap overlap handling in phase7 isn't quite
right -- if we already saw part of the current fsmapping, we should
*subtract* the overlap, not set the length to it! Fix that.
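A worked illustration of the arithmetic, using two hypothetical helpers that wrap the before and after versions of the length adjustment. With next_phys = 100 and a record covering [90, 190), 10 bytes overlap what was already counted, so 90 new bytes should be added. The old code added 10.

```c
#include <stdint.h>

/* Bytes of [physical, physical + len) not already covered below next_phys. */
static uint64_t fixed_len(uint64_t next_phys, uint64_t physical, uint64_t len)
{
	if (next_phys >= physical + len)
		return 0;
	if (next_phys > physical)
		len -= next_phys - physical;	/* subtract the overlap */
	return len;
}

/* The pre-fix arithmetic, which replaced len with the overlap itself. */
static uint64_t buggy_len(uint64_t next_phys, uint64_t physical, uint64_t len)
{
	if (next_phys >= physical + len)
		return 0;
	if (next_phys > physical)
		len = next_phys - physical;	/* counts the overlap, not the tail */
	return len;
}
```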
Cc: <linux-xfs@vger.kernel.org> # v4.15.0
Fixes: 698c6c7cb8ba75 ("xfs_scrub: check summary counters")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
scrub/phase7.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/scrub/phase7.c b/scrub/phase7.c
index 3b765931a304a9..4a25c521fa0c76 100644
--- a/scrub/phase7.c
+++ b/scrub/phase7.c
@@ -127,7 +127,7 @@ count_block_summary(
if (counts->next_phys >= fsmap->fmr_physical + len)
return 0;
else if (counts->next_phys > fsmap->fmr_physical)
- len = counts->next_phys - fsmap->fmr_physical;
+ len -= counts->next_phys - fsmap->fmr_physical;
counts->dbytes += len;
counts->next_phys = fsmap->fmr_physical + fsmap->fmr_length;
break;
* [PATCH 07/11] xfs_scrub: report external log space usage in phase 7
From: Darrick J. Wong @ 2026-06-24 18:16 UTC
To: djwong, aalbersh; +Cc: hch, linux-xfs
In-Reply-To: <178232484383.915780.8675173410074139317.stgit@frogsfrogsfrogs>
From: Darrick J. Wong <djwong@kernel.org>
Let's report the external log space attached to a mounted filesystem so
that the user knows we found it.
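The patch replaces the is_rt boolean with a three-way device classification so that log records can be counted instead of skipped. Below is a sketch of that classification. The dev_class enum and its numeric values are hypothetical. They are assumed to line up with the device codes fsmap reports when the filesystem has an internal rt section.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device classes; the values are illustrative only. */
enum dev_class { CLASS_DATA = 1, CLASS_LOG = 2, CLASS_RT = 3 };

/*
 * Classify an fsmap record's device.  With an internal rt section the
 * record is assumed to carry a device class directly; otherwise it
 * carries a device number compared against the log and rt devices.
 */
static enum dev_class classify(bool internal_rt, uint32_t fmr_device,
			       uint32_t logdev, uint32_t rtdev)
{
	if (internal_rt) {
		switch (fmr_device) {
		case CLASS_LOG:
			return CLASS_LOG;
		case CLASS_RT:
			return CLASS_RT;
		default:
			return CLASS_DATA;
		}
	}
	if (fmr_device == logdev)
		return CLASS_LOG;
	if (fmr_device == rtdev)
		return CLASS_RT;
	return CLASS_DATA;
}
```

Records classified as CLASS_LOG would then feed a separate log byte total, in the same way the patch adds lbytes next to dbytes and rbytes.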
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
scrub/phase7.c | 73 +++++++++++++++++++++++++++++++++++++-------------------
1 file changed, 48 insertions(+), 25 deletions(-)
diff --git a/scrub/phase7.c b/scrub/phase7.c
index e16ca28aa28371..3b765931a304a9 100644
--- a/scrub/phase7.c
+++ b/scrub/phase7.c
@@ -26,6 +26,7 @@
struct summary_counts {
unsigned long long dbytes; /* data dev bytes */
unsigned long long rbytes; /* rt dev bytes */
+ unsigned long long lbytes; /* log dev bytes */
unsigned long long next_phys; /* next phys bytes we see? */
unsigned long long agbytes; /* freespace bytes */
@@ -68,21 +69,18 @@ count_block_summary(
void *arg)
{
struct summary_counts *counts;
- bool is_rt = false;
+ enum xfs_device dev;
unsigned long long len;
int ret;
- if (ctx->mnt.fsgeom.rtstart) {
- if (fsmap->fmr_device == XFS_DEV_LOG)
- return 0;
- if (fsmap->fmr_device == XFS_DEV_RT)
- is_rt = true;
- } else {
- if (fsmap->fmr_device == ctx->fsinfo.fs_logdev)
- return 0;
- if (fsmap->fmr_device == ctx->fsinfo.fs_rtdev)
- is_rt = true;
- }
+ if (ctx->mnt.fsgeom.rtstart)
+ dev = fsmap->fmr_device;
+ else if (fsmap->fmr_device == ctx->fsinfo.fs_logdev)
+ dev = XFS_DEV_LOG;
+ else if (fsmap->fmr_device == ctx->fsinfo.fs_rtdev)
+ dev = XFS_DEV_RT;
+ else
+ dev = XFS_DEV_DATA;
counts = ptvar_get((struct ptvar *)arg, &ret);
if (ret) {
@@ -95,10 +93,16 @@ count_block_summary(
uint64_t blocks;
blocks = cvt_b_to_off_fsbt(&ctx->mnt, fsmap->fmr_length);
- if (is_rt)
+ switch (dev) {
+ case XFS_DEV_RT:
hist_add(&counts->rtdev_hist, blocks);
- else
+ break;
+ case XFS_DEV_DATA:
hist_add(&counts->datadev_hist, blocks);
+ break;
+ case XFS_DEV_LOG:
+ break;
+ }
return 0;
}
@@ -109,10 +113,16 @@ count_block_summary(
fsmap->fmr_owner == XFS_FMR_OWN_AG)
counts->agbytes += fsmap->fmr_length;
- if (is_rt) {
+ switch (dev) {
+ case XFS_DEV_LOG:
+ /* Count external log */
+ counts->lbytes += len;
+ break;
+ case XFS_DEV_RT:
/* Count realtime extents. */
counts->rbytes += len;
- } else {
+ break;
+ case XFS_DEV_DATA:
/* Count datadev extents. */
if (counts->next_phys >= fsmap->fmr_physical + len)
return 0;
@@ -120,6 +130,7 @@ count_block_summary(
len = counts->next_phys - fsmap->fmr_physical;
counts->dbytes += len;
counts->next_phys = fsmap->fmr_physical + fsmap->fmr_length;
+ break;
}
return 0;
@@ -137,6 +148,7 @@ add_summaries(
total->dbytes += item->dbytes;
total->rbytes += item->rbytes;
+ total->lbytes += item->lbytes;
total->agbytes += item->agbytes;
hist_import(&total->datadev_hist, &item->datadev_hist);
@@ -162,8 +174,10 @@ phase7_func(
unsigned long long used_data;
unsigned long long used_rt;
unsigned long long used_files;
+ unsigned long long used_log;
unsigned long long stat_data;
unsigned long long stat_rt;
+ unsigned long long stat_log;
uint64_t counted_inodes = 0;
unsigned long long absdiff;
unsigned long long d_blocks;
@@ -241,8 +255,13 @@ phase7_func(
/* Report on what we found. */
used_data = cvt_off_fsb_to_b(&ctx->mnt, d_blocks - d_bfree);
used_rt = cvt_off_fsb_to_b(&ctx->mnt, r_blocks - r_bfree);
+ if (ctx->mnt.fsgeom.logstart == 0)
+ used_log = cvt_off_fsb_to_b(&ctx->mnt, l_blocks);
+ else
+ used_log = 0;
stat_data = totalcount.dbytes;
stat_rt = totalcount.rbytes;
+ stat_log = totalcount.lbytes;
/*
* Complain if the counts are off by more than 10% unless
@@ -252,28 +271,32 @@ phase7_func(
complain = verbose;
complain |= !within_range(ctx, stat_data, used_data, absdiff, 1, 10,
_("data blocks"));
+ complain |= !within_range(ctx, stat_log, used_log, absdiff, 1, 10,
+ _("external log blocks"));
complain |= !within_range(ctx, stat_rt, used_rt, absdiff, 1, 10,
_("realtime blocks"));
complain |= !within_range(ctx, counted_inodes, used_files, 100, 1, 10,
_("inodes"));
if (complain) {
- double d, r, i;
- char *du, *ru, *iu;
+ double d, r, i, l;
+ char *du, *ru, *iu, *lu;
- if (used_rt || stat_rt) {
+ if (used_rt || stat_rt || used_log) {
d = auto_space_units(used_data, &du);
r = auto_space_units(used_rt, &ru);
+ l = auto_space_units(used_log, &lu);
i = auto_units(used_files, &iu, &ip);
fprintf(stdout,
-_("%.1f%s data used; %.1f%s realtime data used; %.*f%s inodes used.\n"),
- d, du, r, ru, ip, i, iu);
+_("%.1f%s data used; %.1f%s realtime data used; %.1f%s external log used; %.*f%s inodes used.\n"),
+ d, du, r, ru, l, lu, ip, i, iu);
d = auto_space_units(stat_data, &du);
r = auto_space_units(stat_rt, &ru);
+ l = auto_space_units(stat_log, &lu);
i = auto_units(counted_inodes, &iu, &ip);
fprintf(stdout,
-_("%.1f%s data found; %.1f%s realtime data found; %.*f%s inodes found.\n"),
- d, du, r, ru, ip, i, iu);
+_("%.1f%s data found; %.1f%s realtime data found; %.1f%s external log found; %.*f%s inodes found.\n"),
+ d, du, r, ru, l, lu, ip, i, iu);
} else {
d = auto_space_units(used_data, &du);
i = auto_units(used_files, &iu, &ip);
@@ -314,13 +337,13 @@ _("%.*f%s inodes counted; %.*f%s inodes checked.\n"),
*/
if (ctx->bytes_checked &&
(verbose ||
- !within_range(ctx, used_data + used_rt,
+ !within_range(ctx, used_data + used_rt + used_log,
ctx->bytes_checked, absdiff, 1, 10,
_("verified blocks")))) {
double b1, b2;
char *b1u, *b2u;
- b1 = auto_space_units(used_data + used_rt, &b1u);
+ b1 = auto_space_units(used_data + used_rt + used_log, &b1u);
b2 = auto_space_units(ctx->bytes_checked, &b2u);
fprintf(stdout,
_("%.1f%s data counted; %.1f%s data verified.\n"),
* [PATCH 06/11] xfs_scrub: warn about incomplete repairs if we never get to them
From: Darrick J. Wong @ 2026-06-24 18:16 UTC
To: djwong, aalbersh; +Cc: hch, linux-xfs
In-Reply-To: <178232484383.915780.8675173410074139317.stgit@frogsfrogsfrogs>
From: Darrick J. Wong <djwong@kernel.org>
If the final pass at repairs fails because the kernel says the
filesystem is busy, we should report an error instead of silently
dropping it.
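A minimal sketch of the reporting decision in the hunk below, under stated assumptions: the severity enum and busy_severity are hypothetical, and final_pass plays the role of the XRM_FINAL_WARNING flag. On the final pass there is no later retry, so the deferral must be reported as an error.

```c
#include <stdbool.h>

/* Hypothetical severities for this sketch. */
enum severity { SEV_NONE, SEV_INFO, SEV_ERROR };

/* How loudly to report an EBUSY/EDEADLOCK result from a repair attempt. */
static enum severity busy_severity(bool final_pass, bool verbose)
{
	if (final_pass)
		return SEV_ERROR;	/* no later pass will retry this repair */
	return verbose ? SEV_INFO : SEV_NONE;	/* deferred; retried later */
}
```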
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
scrub/repair.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/scrub/repair.c b/scrub/repair.c
index 5d821655877b80..fdefc2131aa453 100644
--- a/scrub/repair.c
+++ b/scrub/repair.c
@@ -110,7 +110,10 @@ repair_epilogue(
case EDEADLOCK:
case EBUSY:
/* Filesystem is busy, try again later. */
- if (debug || verbose)
+ if (repair_flags & XRM_FINAL_WARNING)
+ str_error(ctx, descr_render(dsc),
+_("Filesystem is busy, repair incomplete."));
+ else if (debug || verbose)
str_info(ctx, descr_render(dsc),
_("Filesystem is busy, deferring repair."));
return 0;
* [PATCH 05/11] xfs_scrub: don't skip bulkstat batch when scrub_scan_user_files helper returns ESTALE
From: Darrick J. Wong @ 2026-06-24 18:15 UTC
To: djwong, aalbersh; +Cc: linux-xfs, hch, linux-xfs
In-Reply-To: <178232484383.915780.8675173410074139317.stgit@frogsfrogsfrogs>
From: Darrick J. Wong <djwong@kernel.org>
Codex complains that the ESTALE in the switch statement results in the
rest of the bulkstat batch being skipped, and that ECANCELED doesn't
actually abort the walk immediately. scrub_scan_user_files is only
called during phases 5 and 6, which is after we've verified all the file
metadata in the filesystem. Therefore, an ESTALE here means that the
file was deleted, so we skip it and move on to the next file. Fix both
issues.
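A sketch of the intended dispatch semantics. It assumes a hypothetical per-file callback type (file_fn), a hypothetical batch walker (walk_batch) and a demo callback, none of which are the xfs_scrub functions. ESTALE skips to the next file, ECANCELED stops the walk at once, and any other error fails it.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-file callback: returns 0, ESTALE, ECANCELED or an errno. */
typedef int (*file_fn)(int ino, void *arg);

enum walk_result { WALK_DONE, WALK_CANCELLED, WALK_FAILED };

static enum walk_result walk_batch(const int *inos, size_t nr,
				   file_fn fn, void *arg)
{
	for (size_t i = 0; i < nr; i++) {
		switch (fn(inos[i], arg)) {
		case 0:
		case ESTALE:
			break;			/* deleted file: go to the next one */
		case ECANCELED:
			return WALK_CANCELLED;	/* stop immediately */
		default:
			return WALK_FAILED;
		}
	}
	return WALK_DONE;
}

/* Demo callback: pretends inode 2 was deleted and inode 4 asks to stop. */
static int demo_fn(int ino, void *arg)
{
	int *visited = arg;

	(*visited)++;
	if (ino == 2)
		return ESTALE;
	if (ino == 4)
		return ECANCELED;
	return 0;
}
```

With inodes {1, 2, 3, 4, 5}, the walk visits four files and reports cancellation. With {1, 2, 3}, the stale inode 2 no longer cuts the batch short.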
Cc: <linux-xfs@vger.kernel.org> # v6.14.0
Fixes: 279b0d0e8d73f1 ("xfs_scrub: call bulkstat directly if we're only scanning user files")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
scrub/inodes.c | 22 ++++++++++++++++++----
1 file changed, 18 insertions(+), 4 deletions(-)
diff --git a/scrub/inodes.c b/scrub/inodes.c
index bf1cbdd6c7698b..7ce5f7ffeb62a1 100644
--- a/scrub/inodes.c
+++ b/scrub/inodes.c
@@ -360,6 +360,7 @@ bulkstat_for_inumbers(
enum abort_state {
RUNNING = 0,
ABORTED,
+ CANCELLED,
};
static inline int abort_state_ret(enum abort_state s)
@@ -544,8 +545,8 @@ _("Changed too many times during scan; giving up."));
goto out;
}
case ECANCELED:
- error = 0;
- fallthrough;
+ si->aborted = CANCELLED;
+ goto out;
default:
goto err;
}
@@ -761,9 +762,22 @@ scan_user_files(
case 0:
break;
case ESTALE:
- case ECANCELED:
+ /*
+ * scrub_scan_user_files is only called during phases
+ * 5 and 6, which is after we've verified all the file
+ * metadata in the filesystem. Therefore, an ESTALE
+ * here means that the file was deleted, so we skip it
+ * and move on to the next file.
+ */
error = 0;
- fallthrough;
+ break;
+ case ECANCELED:
+ /*
+ * Helper function wants us to stop iterating, so stop
+ * the walk immediately.
+ */
+ si->aborted = CANCELLED;
+ goto out;
default:
goto err;
}
* [PATCH 04/11] xfs_scrub: track inode scan abort state with an enum
From: Darrick J. Wong @ 2026-06-24 18:15 UTC
To: djwong, aalbersh; +Cc: hch, linux-xfs
In-Reply-To: <178232484383.915780.8675173410074139317.stgit@frogsfrogsfrogs>
From: Darrick J. Wong <djwong@kernel.org>
Change the scan_inodes aborted field from a boolean to an enum so that
we can handle scan cancellations correctly in the next patch.
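A boolean can only say "stop" or "keep going". The sketch below mirrors the enum added here, plus the CANCELLED value that the next patch introduces. It shows why a third state is needed: both ABORTED and CANCELLED stop the scan, but only ABORTED is reported as a failure. scan_should_stop is a hypothetical helper.

```c
#include <stdbool.h>

enum abort_state {
	RUNNING = 0,
	ABORTED,	/* something went wrong; the scan failed */
	CANCELLED,	/* a callback asked to stop; not a failure */
};

/* Only a real abort turns into an error return. */
static inline int abort_state_ret(enum abort_state s)
{
	return s == ABORTED ? -1 : 0;
}

/* Hypothetical helper: any non-RUNNING state ends the scan. */
static inline bool scan_should_stop(enum abort_state s)
{
	return s != RUNNING;
}
```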
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
scrub/inodes.c | 44 +++++++++++++++++++++++++++-----------------
1 file changed, 27 insertions(+), 17 deletions(-)
diff --git a/scrub/inodes.c b/scrub/inodes.c
index ab5cf393327f1a..bf1cbdd6c7698b 100644
--- a/scrub/inodes.c
+++ b/scrub/inodes.c
@@ -357,13 +357,23 @@ bulkstat_for_inumbers(
bulkstat_single_step(ctx, inumbers, seen_mask, breq);
}
+enum abort_state {
+ RUNNING = 0,
+ ABORTED,
+};
+
+static inline int abort_state_ret(enum abort_state s)
+{
+ return s == ABORTED ? -1 : 0;
+}
+
/* BULKSTAT wrapper routines. */
struct scan_inodes {
struct workqueue wq_bulkstat;
scrub_inode_iter_fn fn;
void *arg;
unsigned int nr_threads;
- bool aborted;
+ enum abort_state aborted;
};
/*
@@ -530,7 +540,7 @@ scan_ag_bulkstat(
}
str_info(ctx, descr_render(&dsc_bulkstat),
_("Changed too many times during scan; giving up."));
- si->aborted = true;
+ si->aborted = ABORTED;
goto out;
}
case ECANCELED:
@@ -540,7 +550,7 @@ _("Changed too many times during scan; giving up."));
goto err;
}
if (scrub_excessive_errors(ctx)) {
- si->aborted = true;
+ si->aborted = ABORTED;
goto out;
}
last_ino = scan_ino;
@@ -549,7 +559,7 @@ _("Changed too many times during scan; giving up."));
err:
if (error) {
str_liberror(ctx, error, descr_render(&dsc_bulkstat));
- si->aborted = true;
+ si->aborted = ABORTED;
}
out:
free(ichunk);
@@ -594,7 +604,7 @@ scan_ag_inumbers(
cvt_ino_to_agino(&ctx->mnt, nextino),
cvt_ino_to_agino(&ctx->mnt,
ireq->inumbers[0].xi_startino));
- si->aborted = true;
+ si->aborted = ABORTED;
break;
}
nextino = ireq->hdr.ino;
@@ -611,7 +621,7 @@ scan_ag_inumbers(
error = -workqueue_add(&si->wq_bulkstat,
scan_ag_bulkstat, agno, ichunk);
if (error) {
- si->aborted = true;
+ si->aborted = ABORTED;
str_liberror(ctx, error,
_("queueing bulkstat work"));
goto out;
@@ -641,7 +651,7 @@ scan_ag_inumbers(
err:
if (error) {
str_liberror(ctx, error, descr_render(&dsc));
- si->aborted = true;
+ si->aborted = ABORTED;
}
out:
if (ichunk)
@@ -687,14 +697,14 @@ scrub_scan_all_inodes(
si.nr_threads);
if (ret) {
str_liberror(ctx, ret, _("creating inumbers workqueue"));
- si.aborted = true;
+ si.aborted = ABORTED;
goto kill_bulkstat;
}
for (agno = 0; agno < ctx->mnt.fsgeom.agcount; agno++) {
ret = -workqueue_add(&wq_inumbers, scan_ag_inumbers, agno, &si);
if (ret) {
- si.aborted = true;
+ si.aborted = ABORTED;
str_liberror(ctx, ret, _("queueing inumbers work"));
break;
}
@@ -702,7 +712,7 @@ scrub_scan_all_inodes(
ret = -workqueue_terminate(&wq_inumbers);
if (ret) {
- si.aborted = true;
+ si.aborted = ABORTED;
str_liberror(ctx, ret, _("finishing inumbers work"));
}
workqueue_destroy(&wq_inumbers);
@@ -710,12 +720,12 @@ scrub_scan_all_inodes(
kill_bulkstat:
ret = -workqueue_terminate(&si.wq_bulkstat);
if (ret) {
- si.aborted = true;
+ si.aborted = ABORTED;
str_liberror(ctx, ret, _("finishing bulkstat work"));
}
workqueue_destroy(&si.wq_bulkstat);
- return si.aborted ? -1 : 0;
+ return abort_state_ret(si.aborted);
}
struct user_bulkstat {
@@ -758,7 +768,7 @@ scan_user_files(
goto err;
}
if (scrub_excessive_errors(ctx)) {
- si->aborted = true;
+ si->aborted = ABORTED;
goto out;
}
}
@@ -766,7 +776,7 @@ scan_user_files(
err:
if (error) {
str_liberror(ctx, error, descr_render(&dsc_bulkstat));
- si->aborted = true;
+ si->aborted = ABORTED;
}
out:
free(ureq);
@@ -824,7 +834,7 @@ scan_user_bulkstat(
err_ureq:
free(ureq);
err:
- si->aborted = true;
+ si->aborted = ABORTED;
str_liberror(ctx, ret, what);
return 0;
}
@@ -861,12 +871,12 @@ scrub_scan_user_files(
ret = -workqueue_terminate(&si.wq_bulkstat);
if (ret) {
- si.aborted = true;
+ si.aborted = ABORTED;
str_liberror(ctx, ret, _("finishing bulkstat work"));
}
workqueue_destroy(&si.wq_bulkstat);
- return si.aborted ? -1 : 0;
+ return abort_state_ret(si.aborted);
}
/* Open a file by handle, returning either the fd or -1 on error. */
* [PATCH 03/11] xfs_scrub: handle media scans of internal rt devices correctly
From: Darrick J. Wong @ 2026-06-24 18:15 UTC
To: djwong, aalbersh; +Cc: linux-xfs, hch, linux-xfs
In-Reply-To: <178232484383.915780.8675173410074139317.stgit@frogsfrogsfrogs>
From: Darrick J. Wong <djwong@kernel.org>
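Codex noticed that media scans of internal rt devices don't work
correctly at all. First, we fail to set up ctx->verify_disks[XFS_DEV_RT]
for the internal rt section, and even if we did, phase 6 doesn't
allocate media_verify_state.rvp[XFS_DEV_RT] if there's a media error on
an internal rt volume, so we'll crash there too.
Fix both issues to make it work properly. Because the internal rt
section lives on the data device, point the rt verify disk at the data
device's disk and take care not to close it twice during cleanup. Also
scan the rtgroup reverse mappings and size the phase 6 thread pool when
there is an internal rt section.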
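A sketch of the aliasing rule in scrub_cleanup. The disk type, slot indices, disks_closed counter and cleanup_disks are hypothetical stand-ins. When the rt slot shares the data disk, it must be skipped, or the same disk is closed and freed twice. The sketch records the aliasing before freeing anything, so it never reads a slot after the disk it points to has been freed.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an opened disk. */
struct disk { int fd; };

enum { DEV_DATA, DEV_LOG, DEV_RT, NR_DEVS };

static int disks_closed;

static void disk_close_sketch(struct disk *d)
{
	free(d);
	disks_closed++;
}

static void cleanup_disks(struct disk *disks[NR_DEVS])
{
	/* Decide about aliasing before anything is freed. */
	int rt_shared = disks[DEV_RT] == disks[DEV_DATA];

	if (disks[DEV_DATA])
		disk_close_sketch(disks[DEV_DATA]);
	if (disks[DEV_LOG])
		disk_close_sketch(disks[DEV_LOG]);
	if (!rt_shared && disks[DEV_RT])
		disk_close_sketch(disks[DEV_RT]);	/* separate rt device only */
}
```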
Cc: <linux-xfs@vger.kernel.org> # v6.15.0
Fixes: 37591ef3f4f14c ("xfs_scrub: support internal RT device")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
scrub/phase1.c | 7 +++++--
scrub/phase6.c | 4 ++--
scrub/spacemap.c | 2 +-
3 files changed, 8 insertions(+), 5 deletions(-)
diff --git a/scrub/phase1.c b/scrub/phase1.c
index 34a2b3aec030eb..6ac59264b50bb7 100644
--- a/scrub/phase1.c
+++ b/scrub/phase1.c
@@ -101,7 +101,8 @@ scrub_cleanup(
disk_close(ctx->verify_disks[XFS_DEV_DATA]);
if (ctx->verify_disks[XFS_DEV_LOG])
disk_close(ctx->verify_disks[XFS_DEV_LOG]);
- if (ctx->verify_disks[XFS_DEV_RT])
+ if (ctx->verify_disks[XFS_DEV_RT] &&
+ ctx->verify_disks[XFS_DEV_RT] != ctx->verify_disks[XFS_DEV_DATA])
disk_close(ctx->verify_disks[XFS_DEV_RT]);
fshandle_destroy();
error = -xfd_close(&ctx->mnt);
@@ -232,7 +233,9 @@ configure_xfs_verify_fallback(
}
}
- if (ctx->fsinfo.fs_rt) {
+ if (ctx->mnt.fsgeom.rtstart) {
+ ctx->verify_disks[XFS_DEV_RT] = ctx->verify_disks[XFS_DEV_DATA];
+ } else if (ctx->fsinfo.fs_rt || ctx->mnt.fsgeom.rtstart) {
ctx->verify_disks[XFS_DEV_RT] = disk_open(ctx->fsinfo.fs_rt);
if (!ctx->verify_disks[XFS_DEV_RT]) {
str_error(ctx, ctx->mntpoint,
diff --git a/scrub/phase6.c b/scrub/phase6.c
index 2278ae5ad3dfd7..aef817add4157b 100644
--- a/scrub/phase6.c
+++ b/scrub/phase6.c
@@ -744,7 +744,7 @@ phase6_func(
goto out_datapool;
}
}
- if (ctx->fsinfo.fs_rt) {
+ if (ctx->fsinfo.fs_rt || ctx->mnt.fsgeom.rtstart) {
ret = alloc_pool(ctx, &vs, XFS_DEV_RT);
if (ret) {
str_liberror(ctx, ret,
@@ -843,7 +843,7 @@ phase6_estimate(
* nr_threads appropriately to handle that many threads.
*/
*nr_threads = read_verify_nproc(ctx);
- if (ctx->fsinfo.fs_rt)
+ if (ctx->fsinfo.fs_rt || ctx->mnt.fsgeom.rtstart)
*nr_threads += read_verify_nproc(ctx);
if (ctx->fsinfo.fs_log)
*nr_threads += read_verify_nproc(ctx);
diff --git a/scrub/spacemap.c b/scrub/spacemap.c
index 1ee4d1946d3db7..8f595ad94c5991 100644
--- a/scrub/spacemap.c
+++ b/scrub/spacemap.c
@@ -266,7 +266,7 @@ scrub_scan_all_spacemaps(
break;
}
}
- if (ctx->fsinfo.fs_rt) {
+ if (ctx->fsinfo.fs_rt || ctx->mnt.fsgeom.rtstart) {
for (agno = 0; agno < ctx->mnt.fsgeom.rgcount; agno++) {
ret = -workqueue_add(&wq, scan_rtg_rmaps, agno, &sbx);
if (ret) {
* [PATCH 02/11] xfs_scrub: report bad file ranges correctly
From: Darrick J. Wong @ 2026-06-24 18:15 UTC
To: djwong, aalbersh; +Cc: linux-xfs, hch, linux-xfs
In-Reply-To: <178232484383.915780.8675173410074139317.stgit@frogsfrogsfrogs>
From: Darrick J. Wong <djwong@kernel.org>
Codex complains that the "media error at data offset..." message prints
the wrong information -- err_off is the offset into @map, not the file
offset; and the length should be constrained by the end of @map. Fix
both of these issues.
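A sketch of the range arithmetic behind the fix. It assumes err_off is the byte distance from the start of the mapping to the start of the bad range; the hunk that computes err_off is not shown. struct bad_range, struct mapping and bad_file_range are hypothetical. The file offset is the mapping's file offset plus that distance, and the length is the overlap of the two physical ranges.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down views of the two ranges involved. */
struct bad_range { uint64_t physical, length; };	/* bad media, disk bytes */
struct mapping { uint64_t physical, length, offset; };	/* file extent */

static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }
static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }

/* Translate the overlapping part of a bad range into file terms. */
static void bad_file_range(const struct bad_range *fr,
			   const struct mapping *map,
			   uint64_t *file_off, uint64_t *len)
{
	uint64_t start = max_u64(fr->physical, map->physical);
	uint64_t end = min_u64(fr->physical + fr->length,
			       map->physical + map->length);

	*file_off = map->offset + (start - map->physical);
	*len = end - start;	/* assumes the ranges overlap */
}
```

For bad media at [1000, 1500) and a mapping of disk bytes [800, 1200) at file offset 4096, this reports file offset 4296 and length 200, not the full 500-byte bad range.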
Cc: <linux-xfs@vger.kernel.org> # v4.15.0
Fixes: b364a9c008fc04 ("xfs_scrub: scrub file data blocks")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
scrub/phase6.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/scrub/phase6.c b/scrub/phase6.c
index cd9bb26bf88628..2278ae5ad3dfd7 100644
--- a/scrub/phase6.c
+++ b/scrub/phase6.c
@@ -408,6 +408,7 @@ report_ioerr_fsmap(
char buf[DESCR_BUFSZ];
struct ioerr_filerange *fr = arg;
uint64_t err_off;
+ uint64_t err_len;
int ret;
/* Don't care about unwritten extents. */
@@ -476,9 +477,12 @@ report_ioerr_fsmap(
return 0;
}
+ err_len = min(fr->physical + fr->length,
+ map->fmr_physical + map->fmr_length) -
+ max(fr->physical, map->fmr_physical);
str_unfixable_error(ctx, buf,
_("media error at data offset %llu length %llu."),
- err_off, fr->length);
+ map->fmr_offset + err_off, err_len);
return 0;
}
* [PATCH 01/11] xfs_scrub: handle missing media verify ioctl failure return codes
From: Darrick J. Wong @ 2026-06-24 18:14 UTC
To: djwong, aalbersh; +Cc: linux-xfs, hch, linux-xfs
In-Reply-To: <178232484383.915780.8675173410074139317.stgit@frogsfrogsfrogs>
From: Darrick J. Wong <djwong@kernel.org>
Back when we reworked the read-verify code to use the kernel ioctl to
perform media scans, we forgot to teach read_verify_one callers to
handle the new error codes. Codex noticed this discrepancy, so let's
fix that.
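A sketch of the classification the hunk below introduces, as a hypothetical standalone helper (is_media_error). The four codes are the ones the patch treats as media errors rather than runtime errors. EREMOTEIO and ENODATA are Linux errno values.

```c
#include <errno.h>
#include <stdbool.h>

/* Sort a failed verification into "bad media" versus "something else". */
static bool is_media_error(int err)
{
	switch (err) {
	case EIO:
	case EILSEQ:
	case EREMOTEIO:
	case ENODATA:
		return true;		/* record a media error and keep going */
	default:
		return false;		/* runtime error: bail out */
	}
}
```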
Cc: <linux-xfs@vger.kernel.org> # v7.0.0
Fixes: 02760878dd86b9 ("xfs_scrub: use the verify media ioctl during phase 6 if possible")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
scrub/read_verify.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/scrub/read_verify.c b/scrub/read_verify.c
index 01e96f5bef40f6..efdf6d544858b8 100644
--- a/scrub/read_verify.c
+++ b/scrub/read_verify.c
@@ -383,7 +383,13 @@ read_verify(
read_error = errno;
/* Runtime error, bail out... */
- if (read_error != EIO && read_error != EILSEQ) {
+ switch (read_error) {
+ case EIO:
+ case EILSEQ:
+ case EREMOTEIO:
+ case ENODATA:
+ break;
+ default:
rvp->runtime_error = read_error;
return;
}
* [PATCHSET] xfs_scrub: codex-inspired bug fixes, part 2
From: Darrick J. Wong @ 2026-06-24 18:14 UTC
To: djwong, aalbersh; +Cc: linux-xfs, hch, linux-xfs
Hi all,
Here's a second batch of xfs_scrub fixes resulting from Codex reviews.
If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.
With a bit of luck, this should all go splendidly.
Comments and questions are, as always, welcome.
--D
xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-codex-fixes2
---
Commits in this patchset:
* xfs_scrub: handle missing media verify ioctl failure return codes
* xfs_scrub: report bad file ranges correctly
* xfs_scrub: handle media scans of internal rt devices correctly
* xfs_scrub: track inode scan abort state with an enum
* xfs_scrub: don't skip bulkstat batch when scrub_scan_user_files helper returns ESTALE
* xfs_scrub: warn about incomplete repairs if we never get to them
* xfs_scrub: report external log space usage in phase 7
* xfs_scrub: account only data extent tail after an overlap
* xfs_scrub: account for reflinked realtime file data
* xfs_scrub: don't leak the autofsck fsproperty handle
* xfs_scrub: warn about difficult rtgroup repairs
---
scrub/inodes.c | 66 ++++++++++++++++++++++++++------------
scrub/phase1.c | 10 +++---
scrub/phase2.c | 8 +++++
scrub/phase6.c | 10 ++++--
scrub/phase7.c | 89 ++++++++++++++++++++++++++++++++++-----------------
scrub/read_verify.c | 8 ++++-
scrub/repair.c | 5 ++-
scrub/spacemap.c | 2 +
8 files changed, 137 insertions(+), 61 deletions(-)
* Re: [PATCH v3 3/6] xfs: implement write-stream management support
From: Darrick J. Wong @ 2026-06-24 18:11 UTC
To: Kanchan Joshi
Cc: brauner, hch, dgc, jack, cem, axboe, kbusch, ritesh.list,
linux-xfs, linux-fsdevel, linux-block, gost.dev
In-Reply-To: <20260616180555.33338-4-joshi.k@samsung.com>
On Tue, Jun 16, 2026 at 11:35:52PM +0530, Kanchan Joshi wrote:
> Implement support for FS_IOC_WRITE_STREAM ioctl.
>
> For FS_WRITE_STREAM_OP_GET_MAX, available write streams are reported
> based on the capability of the underlying block device.
> For FS_WRITE_STREAM_OP_{SET/GET}, add a new i_write_stream field in xfs
> inode. This value is propagated to the iomap during block mapping.
>
> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
> ---
> fs/xfs/xfs_icache.c | 1 +
> fs/xfs/xfs_inode.c | 46 +++++++++++++++++++++++++++++++++++++++++++++
> fs/xfs/xfs_inode.h | 6 ++++++
> fs/xfs/xfs_ioctl.c | 38 +++++++++++++++++++++++++++++++++++++
> fs/xfs/xfs_iomap.c | 1 +
> 5 files changed, 92 insertions(+)
>
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 2040a9292ee6..d5f880f5b810 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -130,6 +130,7 @@ xfs_inode_alloc(
> spin_lock_init(&ip->i_ioend_lock);
> ip->i_next_unlinked = NULLAGINO;
> ip->i_prev_unlinked = 0;
> + ip->i_write_stream = 0;
>
> return ip;
> }
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index beaa26ec62da..2e7c61d71b48 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -47,6 +47,52 @@
>
> struct kmem_cache *xfs_inode_cache;
>
> +int
> +xfs_inode_max_write_streams(
> + struct xfs_inode *ip)
> +{
> + struct block_device *bdev;
> +
> + bdev = xfs_inode_buftarg(ip)->bt_bdev;
> + if (!bdev)
> + return 0;
> +
> + return bdev_max_write_streams(bdev);
> +}
> +
> +uint16_t
> +xfs_inode_get_write_stream(
> + struct xfs_inode *ip)
> +{
> + uint16_t stream_id;
> +
> + xfs_ilock(ip, XFS_ILOCK_SHARED);
> + stream_id = ip->i_write_stream;
> + xfs_iunlock(ip, XFS_ILOCK_SHARED);
> +
> + return stream_id;
> +}
> +
> +int
> +xfs_inode_set_write_stream(
> + struct xfs_inode *ip,
> + uint16_t stream_id)
> +{
> + int ret = 0;
> +
> + xfs_ilock(ip, XFS_ILOCK_EXCL);
> +
> + if (stream_id > xfs_inode_max_write_streams(ip)) {
> + ret = -EINVAL;
> + goto out_unlock;
> + }
> + ip->i_write_stream = stream_id;
> +
> +out_unlock:
> + xfs_iunlock(ip, XFS_ILOCK_EXCL);
> + return ret;
> +}
> +
> /*
> * These two are wrapper routines around the xfs_ilock() routine used to
> * centralize some grungy code. They are used in places that wish to lock the
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index bd6d33557194..768c4195306c 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -38,6 +38,9 @@ typedef struct xfs_inode {
> struct xfs_ifork i_df; /* data fork */
> struct xfs_ifork i_af; /* attribute fork */
>
> + /* Write stream information */
> + uint16_t i_write_stream;
> +
> /* Transaction and locking information. */
> struct xfs_inode_log_item *i_itemp; /* logging information */
> struct rw_semaphore i_lock; /* inode lock */
> @@ -676,4 +679,7 @@ int xfs_icreate_dqalloc(const struct xfs_icreate_args *args,
> struct xfs_dquot **udqpp, struct xfs_dquot **gdqpp,
> struct xfs_dquot **pdqpp);
>
> +int xfs_inode_max_write_streams(struct xfs_inode *ip);
> +uint16_t xfs_inode_get_write_stream(struct xfs_inode *ip);
> +int xfs_inode_set_write_stream(struct xfs_inode *ip, uint16_t stream_id);
> #endif /* __XFS_INODE_H__ */
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 46e234863644..3f82a4884b81 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -1179,6 +1179,42 @@ xfs_ioctl_fs_counts(
> return 0;
> }
>
> +static int
> +xfs_ioc_write_stream(
> + struct file *filp,
> + void __user *arg)
> +{
> + struct inode *inode = file_inode(filp);
> + struct xfs_inode *ip = XFS_I(inode);
> + struct fs_write_stream ws = { };
> +
> + if (copy_from_user(&ws, arg, sizeof(ws)))
> + return -EFAULT;
> + if (ws.rsvd != 0)
> + return -EINVAL;
> +
> + switch (ws.op_flags) {
> + case FS_WRITE_STREAM_OP_GET_MAX:
> + ws.max_streams = xfs_inode_max_write_streams(ip);
Shouldn't you hold ILOCK when you look at the REALTIME bit?
--D
> + goto copy_out;
> + case FS_WRITE_STREAM_OP_GET:
> + ws.stream_id = xfs_inode_get_write_stream(ip);
> + goto copy_out;
> + case FS_WRITE_STREAM_OP_SET:
> + if (!(filp->f_mode & FMODE_WRITE))
> + return -EBADF;
> + return xfs_inode_set_write_stream(ip, ws.stream_id);
> + default:
> + return -EINVAL;
> + }
> + return 0;
> +
> +copy_out:
> + if (copy_to_user(arg, &ws, sizeof(ws)))
> + return -EFAULT;
> + return 0;
> +}
> +
> /*
> * These long-unused ioctls were removed from the official ioctl API in 5.17,
> * but retain these definitions so that we can log warnings about them.
> @@ -1444,6 +1480,8 @@ xfs_file_ioctl(
> return xfs_ioc_health_monitor(filp, arg);
> case XFS_IOC_VERIFY_MEDIA:
> return xfs_ioc_verify_media(filp, arg);
> + case FS_IOC_WRITE_STREAM:
> + return xfs_ioc_write_stream(filp, arg);
>
> default:
> return -ENOTTY;
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index f20a02f49ed9..ccbf7dcf1ad5 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -144,6 +144,7 @@ xfs_bmbt_to_iomap(
> iomap->offset = XFS_FSB_TO_B(mp, imap->br_startoff);
> iomap->length = XFS_FSB_TO_B(mp, imap->br_blockcount);
> iomap->flags = iomap_flags;
> + iomap->write_stream = ip->i_write_stream;
> if (mapping_flags & IOMAP_DAX) {
> iomap->dax_dev = target->bt_daxdev;
> } else {
> --
> 2.25.1
>
>
* Re: [PATCH v3 2/6] iomap: introduce and propagate write_stream
From: Darrick J. Wong @ 2026-06-24 18:10 UTC
To: Kanchan Joshi
Cc: brauner, hch, dgc, jack, cem, axboe, kbusch, ritesh.list,
linux-xfs, linux-fsdevel, linux-block, gost.dev
In-Reply-To: <20260616180555.33338-3-joshi.k@samsung.com>
On Tue, Jun 16, 2026 at 11:35:51PM +0530, Kanchan Joshi wrote:
> Add a new write_stream field to struct iomap. Existing hole is used to
> place the new field.
> Propagate write_stream from iomap to bio in both direct I/O and buffered
> writeback paths.
>
> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
> ---
> fs/iomap/direct-io.c | 1 +
> fs/iomap/ioend.c | 3 +++
> include/linux/iomap.h | 2 ++
> 3 files changed, 6 insertions(+)
>
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index b36ee619cdcd..455fd5d97d25 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -348,6 +348,7 @@ static ssize_t iomap_dio_bio_iter_one(struct iomap_iter *iter,
> fscrypt_set_bio_crypt_ctx(bio, iter->inode, pos, GFP_KERNEL);
> bio->bi_iter.bi_sector = iomap_sector(&iter->iomap, pos);
> bio->bi_write_hint = iter->inode->i_write_hint;
> + bio->bi_write_stream = iter->iomap.write_stream;
> bio->bi_ioprio = dio->iocb->ki_ioprio;
> bio->bi_private = dio;
> bio->bi_end_io = iomap_dio_bio_end_io;
> diff --git a/fs/iomap/ioend.c b/fs/iomap/ioend.c
> index acf3cf98b23a..56ed5ba6a421 100644
> --- a/fs/iomap/ioend.c
> +++ b/fs/iomap/ioend.c
> @@ -164,6 +164,7 @@ static struct iomap_ioend *iomap_alloc_ioend(struct iomap_writepage_ctx *wpc,
> GFP_NOFS, &iomap_ioend_bioset);
> bio->bi_iter.bi_sector = iomap_sector(&wpc->iomap, pos);
> bio->bi_write_hint = wpc->inode->i_write_hint;
> + bio->bi_write_stream = wpc->iomap.write_stream;
> wbc_init_bio(wpc->wbc, bio);
> wpc->nr_folios = 0;
> return iomap_init_ioend(wpc->inode, bio, pos, ioend_flags);
> @@ -187,6 +188,8 @@ static bool iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t pos,
> if (!(wpc->iomap.flags & IOMAP_F_ANON_WRITE) &&
> iomap_sector(&wpc->iomap, pos) != bio_end_sector(&ioend->io_bio))
> return false;
> + if (wpc->iomap.write_stream != ioend->io_bio.bi_write_stream)
> + return false;
> /*
> * Limit ioend bio chain lengths to minimise IO completion latency. This
> * also prevents long tight loops ending page writeback on all the
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 2c5685adf3a9..44583429ffa4 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -120,6 +120,8 @@ struct iomap {
> u64 length; /* length of mapping, bytes */
> u16 type; /* type of mapping */
> u16 flags; /* flags for mapping */
> + u8 write_stream; /* write stream for I/O */
I'm mildly confused by the types here -- the ioctl exposes a u32, iomap
has a u8, and xfs seems to use u16. I gather you want maximum
flexibility in the uapi and that's the reason for the u32, but can the
internal interfaces be made consistent?
I also wonder what happens if the write stream ever becomes persistent,
but this patchset doesn't go there, and maybe the programming model is
simply that you have to set it every time you open the file?
--D
> + /* 3 bytes padding hole here */
> struct block_device *bdev; /* block device for I/O */
> struct dax_device *dax_dev; /* dax_dev for dax operations */
> void *inline_data;
> --
> 2.25.1
>
>
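On the type question above: the hunk notes that the new u8 leaves 3 bytes of padding, so the hole after flags was 4 bytes to begin with. The sketch below is a simplified userspace model, not the real struct iomap. The struct names are invented, stdint types stand in for the kernel's u8/u16/u32/u64 and loff_t, and addr and offset are assumed to be the two 64-bit fields that precede length. Under those assumptions it checks that a u8, u16 or u32 write_stream all leave bdev at the same offset, so widening the internal type to match the uapi would not by itself grow the structure.

```c
#include <stddef.h>
#include <stdint.h>

struct block_device;	/* opaque; only the pointer size matters here */

/* Model of the leading fields of struct iomap with a u8 write_stream. */
struct iomap_u8 {
	uint64_t addr;			/* disk offset, bytes (assumed field) */
	int64_t offset;			/* file offset, bytes (assumed field) */
	uint64_t length;		/* length of mapping, bytes */
	uint16_t type;			/* type of mapping */
	uint16_t flags;			/* flags for mapping */
	uint8_t write_stream;		/* write stream for I/O */
	struct block_device *bdev;	/* block device for I/O */
};

/* Same layout with a 16-bit write_stream. */
struct iomap_u16 {
	uint64_t addr;
	int64_t offset;
	uint64_t length;
	uint16_t type;
	uint16_t flags;
	uint16_t write_stream;
	struct block_device *bdev;
};

/* Same layout with a 32-bit write_stream, matching the uapi width. */
struct iomap_u32 {
	uint64_t addr;
	int64_t offset;
	uint64_t length;
	uint16_t type;
	uint16_t flags;
	uint32_t write_stream;
	struct block_device *bdev;
};

/* The 4-byte hole after flags absorbs any of the three widths. */
_Static_assert(offsetof(struct iomap_u16, bdev) ==
	       offsetof(struct iomap_u8, bdev), "u16 fits the hole");
_Static_assert(offsetof(struct iomap_u32, bdev) ==
	       offsetof(struct iomap_u8, bdev), "u32 fits the hole");
```

This only answers the layout question; any narrower field the value is later copied into (for example on the bio) is a separate constraint that this model does not check.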
* Re: [PATCH v3 1/6] fs: add generic write-stream management ioctl
From: Darrick J. Wong @ 2026-06-24 18:03 UTC (permalink / raw)
To: Kanchan Joshi
Cc: brauner, hch, dgc, jack, cem, axboe, kbusch, ritesh.list,
linux-xfs, linux-fsdevel, linux-block, gost.dev
In-Reply-To: <20260616180555.33338-2-joshi.k@samsung.com>
On Tue, Jun 16, 2026 at 11:35:50PM +0530, Kanchan Joshi wrote:
> Wire up the userspace interface for write stream management via a new
> vfs ioctl 'FS_IOC_WRITE_STREAM'.
> The application communicates the intended operation using the 'op_flags'
> field of the passed 'struct fs_write_stream'.
> Valid flags are:
> FS_WRITE_STREAM_OP_GET_MAX: Returns the number of available streams.
> FS_WRITE_STREAM_OP_SET: Assign a specific stream value to the file.
> FS_WRITE_STREAM_OP_GET: Query what stream value is set on the file.
>
> The application should first query the available streams using
> FS_WRITE_STREAM_OP_GET_MAX.
> If the returned value is N, valid stream values for the file are 0 to N.
> Stream value 0 implies that no stream is set on the file.
You might want to make that an explicit #define then.
> Setting a larger value than available streams is rejected.
>
> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
> ---
> include/uapi/linux/fs.h | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index 13f71202845e..9e87271e610b 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -338,6 +338,20 @@ struct file_attr {
> /* Get logical block metadata capability details */
> #define FS_IOC_GETLBMD_CAP _IOWR(0x15, 2, struct logical_block_metadata_cap)
>
> +struct fs_write_stream {
> + __u32 op_flags; /* IN: operation flags */
> + union {
> + __u32 stream_id; /* IN/OUT: stream value to assign/guery */
"query"?
--D
> + __u32 max_streams; /* OUT: max streams values supported */
> + };
> + __u64 rsvd;
> +};
> +
> +#define FS_WRITE_STREAM_OP_GET_MAX (1 << 0)
> +#define FS_WRITE_STREAM_OP_GET (1 << 1)
> +#define FS_WRITE_STREAM_OP_SET (1 << 2)
> +
> +#define FS_IOC_WRITE_STREAM _IOWR('f', 135, struct fs_write_stream)
> /*
> * Inode flags (FS_IOC_GETFLAGS / FS_IOC_SETFLAGS)
> *
> --
> 2.25.1
>
>
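The calling sequence described in the commit message (query the limit with FS_WRITE_STREAM_OP_GET_MAX, then assign a value in 0..N with FS_WRITE_STREAM_OP_SET) might look like this from userspace. This is a sketch against the uapi proposed in this patch, not a released header, so the struct and ioctl number are copied locally with stdint types standing in for __u32/__u64. FS_WRITE_STREAM_NONE and set_write_stream() are hypothetical names; the former follows the review suggestion to give stream value 0 an explicit #define. The client-side range check mirrors the rule that larger values are rejected; the kernel is still expected to enforce it.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Local copy of the proposed uapi; not in any released header. */
struct fs_write_stream {
	uint32_t op_flags;		/* IN: operation flags */
	union {
		uint32_t stream_id;	/* IN/OUT: stream value to assign/query */
		uint32_t max_streams;	/* OUT: max stream values supported */
	};
	uint64_t rsvd;
};

#define FS_WRITE_STREAM_OP_GET_MAX	(1 << 0)
#define FS_WRITE_STREAM_OP_GET		(1 << 1)
#define FS_WRITE_STREAM_OP_SET		(1 << 2)
#define FS_IOC_WRITE_STREAM	_IOWR('f', 135, struct fs_write_stream)

/* Hypothetical explicit name for "no stream set on this file". */
#define FS_WRITE_STREAM_NONE	0

/* Valid values are 0..max_streams inclusive; 0 means no stream. */
static int stream_id_valid(uint32_t max_streams, uint32_t stream_id)
{
	return stream_id <= max_streams;
}

/* Query the limit, then assign @stream_id to the open file @fd. */
int set_write_stream(int fd, uint32_t stream_id)
{
	struct fs_write_stream ws;

	memset(&ws, 0, sizeof(ws));
	ws.op_flags = FS_WRITE_STREAM_OP_GET_MAX;
	if (ioctl(fd, FS_IOC_WRITE_STREAM, &ws) < 0)
		return -errno;

	if (!stream_id_valid(ws.max_streams, stream_id))
		return -EINVAL;

	memset(&ws, 0, sizeof(ws));
	ws.op_flags = FS_WRITE_STREAM_OP_SET;
	ws.stream_id = stream_id;
	if (ioctl(fd, FS_IOC_WRITE_STREAM, &ws) < 0)
		return -errno;
	return 0;
}
```

On a kernel without this ioctl the first call would be expected to fail (the xfs ioctl handler quoted earlier in this thread returns -ENOTTY for unknown commands), and the helper passes that error back as a negative errno.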
* Re: [PATCH RFC v2 17/18] fs: look up the superblock via the device table in user_get_super()
From: Darrick J. Wong @ 2026-06-24 17:54 UTC (permalink / raw)
To: Christian Brauner
Cc: Jan Kara, Christoph Hellwig, Jens Axboe, Alexander Viro,
linux-block, linux-kernel, linux-fsdevel, Carlos Maiolino,
linux-xfs, Chris Mason, David Sterba, linux-btrfs,
Theodore Ts'o, linux-ext4, Gao Xiang, linux-erofs
In-Reply-To: <20260616-work-super-bdev_holder_global-v2-17-7df6b864028e@kernel.org>
On Tue, Jun 16, 2026 at 04:08:33PM +0200, Christian Brauner wrote:
> user_get_super() still finds the superblock for a device number by
> walking the global super_blocks list under sb_lock. Every superblock is
> registered in the device table under its s_dev since sget_fc() inserts
> it there, including superblocks on anonymous devices, so use the table
> instead.
>
> The refcount-pinning cursor helpers super_dev_{get,first,next}() only
> touch table state and do not depend on CONFIG_BLOCK, so drop the
> CONFIG_BLOCK guard around them: their new caller serves anonymous
> devices as well (ustat() on e.g. tmpfs) and is built without
> CONFIG_BLOCK. The guard falls in this patch rather than separately
> since without this caller the helpers would be unused without
> CONFIG_BLOCK.
>
> The pinned entry holds a passive reference on the superblock so
> super_lock() can be called directly; once the superblock is locked grab
> a passive reference for the caller before dropping the pin.
>
> The device table contains more than the old walk could find: a
> superblock is also registered for every additional device it claims
> (the xfs log and realtime devices, btrfs member devices, the ext4
> external journal, erofs blob devices). Don't filter those out:
> specifying any device a filesystem uses now resolves to that
> filesystem, so ustat() and quotactl() work on e.g. the xfs log device
> or a btrfs member device (the latter used to fail outright as btrfs
> superblocks carry an anonymous s_dev that never matches a member
> device). When several superblocks share a device (erofs blob devices)
> the first live superblock wins.
Does erofs have a means to find the other superblocks that share a
device given a notification coming in on one of them? As hch says, it
feels weird to have a lookup mechanism when there's also an upcall
mechanism.
<shrug> I've been on vacation for a while so maybe I missed that there's
another use for the bdev->sb lookup? There are 1600 more emails for me
to go through... :P
--D
>
> The cursor also keeps scanning past dying superblocks where the old
> walk gave up after the first s_dev match, so a mount racing with the
> unmount of the same device (or with the reuse of a recycled anonymous
> dev_t) finds the live superblock where the old walk could spuriously
> return NULL.
>
> This removes the last s_dev-keyed walk of the super_blocks list and
> takes ustat() and quotactl()'s block device lookup off sb_lock
> entirely.
>
> Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
> ---
> fs/super.c | 28 ++++++++--------------------
> 1 file changed, 8 insertions(+), 20 deletions(-)
>
> diff --git a/fs/super.c b/fs/super.c
> index 2d0a07861bfc..93f24aea75c4 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -501,7 +501,6 @@ static int super_dev_register(struct super_block *sb)
> return err;
> }
>
> -#ifdef CONFIG_BLOCK
> static struct super_dev *super_dev_get(struct rhlist_head *pos)
> {
> struct super_dev *sb_dev;
> @@ -535,7 +534,6 @@ static struct super_dev *super_dev_next(struct super_dev *prev)
> super_dev_put(prev);
> return sb_dev;
> }
> -#endif
>
> static void kill_super_notify(struct super_block *sb)
> {
> @@ -1044,29 +1042,19 @@ EXPORT_SYMBOL(iterate_supers_type);
>
> struct super_block *user_get_super(dev_t dev, bool excl)
> {
> - struct super_block *sb;
> -
> - spin_lock(&sb_lock);
> - list_for_each_entry(sb, &super_blocks, s_list) {
> - bool locked;
> + struct super_dev *sb_dev;
>
> - if (sb->s_dev != dev)
> - continue;
> + for (sb_dev = super_dev_first(dev); sb_dev; sb_dev = super_dev_next(sb_dev)) {
> + struct super_block *sb = sb_dev->sd_sb;
>
> - if (!refcount_inc_not_zero(&sb->s_passive))
> + if (!super_lock(sb, excl))
> continue;
>
> - spin_unlock(&sb_lock);
> -
> - locked = super_lock(sb, excl);
> - if (locked)
> - return sb;
> -
> - put_super(sb);
> - spin_lock(&sb_lock);
> - break;
> + /* The pinned entry holds a passive reference, take our own. */
> + refcount_inc(&sb->s_passive);
> + super_dev_put(sb_dev);
> + return sb;
> }
> - spin_unlock(&sb_lock);
> return NULL;
> }
>
>
> --
> 2.47.3
>
>
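The lookup-order change described in the commit message can be illustrated with a toy model: entries for the same device are scanned in order, a dying superblock is skipped, and the first one that can be locked wins, whereas the old walk gave up after the first s_dev match. Everything here is hypothetical (model_sb, model_dev_entry and both lookup functions are invented). The real code walks the rhlist-based device table through super_dev_first()/super_dev_next() and uses super_lock(); reference counting is omitted, so the old walk is modelled only as "give up after the first match", as the commit message puts it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical superblock stand-in; only the lock outcome matters. */
struct model_sb {
	const char *name;
	bool dying;		/* locking fails, as for a dying superblock */
};

/* Hypothetical device-table entry keyed on a device number. */
struct model_dev_entry {
	unsigned int dev;
	struct model_sb *sb;
};

/* Stand-in for super_lock(): succeeds only on a live superblock. */
static bool model_super_lock(const struct model_sb *sb)
{
	return !sb->dying;
}

/* New behaviour: skip dying superblocks, return the first live match. */
struct model_sb *lookup_first_live(struct model_dev_entry *tbl, size_t n,
				   unsigned int dev)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].dev != dev)
			continue;
		if (!model_super_lock(tbl[i].sb))
			continue;	/* keep scanning past dying ones */
		return tbl[i].sb;
	}
	return NULL;
}

/* Old behaviour: give up after the first matching entry. */
struct model_sb *lookup_first_match(struct model_dev_entry *tbl, size_t n,
				    unsigned int dev)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].dev != dev)
			continue;
		return model_super_lock(tbl[i].sb) ? tbl[i].sb : NULL;
	}
	return NULL;
}
```

With a dying superblock listed ahead of a live one on the same device, the old walk returns NULL while the new one finds the live superblock, which is the mount/unmount race the commit message describes.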
* Re: [PATCH v7 1/2] xfs: add an allocation mode to xfs_alloc_file_space()
From: Darrick J. Wong @ 2026-06-24 17:47 UTC (permalink / raw)
To: Pankaj Raghav
Cc: linux-xfs, bfoster, lukas, dgc, gost.dev, pankaj.raghav, andres,
kundan.kumar, hch, cem, hch
In-Reply-To: <20260622083106.2914092-2-p.raghav@samsung.com>
On Mon, Jun 22, 2026 at 10:31:05AM +0200, Pankaj Raghav wrote:
> xfs_alloc_file_space() hardcodes XFS_BMAPI_PREALLOC to preallocate
> unwritten extents across a range.
>
> In preparation for FALLOC_FL_WRITE_ZEROES, add an explicit allocation
> mode argument, enum xfs_alloc_file_space_mode, and derive the xfs_bmapi
> flags from it. The only mode for now is XFS_ALLOC_FILE_SPACE_PREALLOC,
> which preallocates unwritten extents and marks the inode as preallocated
> exactly as before, so there is no functional change.
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
> ---
> fs/xfs/xfs_bmap_util.c | 25 +++++++++++++++++++++----
> fs/xfs/xfs_bmap_util.h | 6 +++++-
> fs/xfs/xfs_file.c | 9 ++++++---
> 3 files changed, 32 insertions(+), 8 deletions(-)
>
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 3b9f262f8e91..8dfb3c1e3759 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -642,11 +642,19 @@ xfs_free_eofblocks(
> return error;
> }
>
> +/*
> + * Allocate space for a file according to @mode:
> + *
> + * XFS_ALLOC_FILE_SPACE_PREALLOC:
> + * Preallocate unwritten extents across the range and mark the inode as
> + * preallocated.
"Preallocate unwritten extents over holes across the range..."?
Other than that, this looks good to me.
--D
> + */
> int
> xfs_alloc_file_space(
> struct xfs_inode *ip,
> xfs_off_t offset,
> - xfs_off_t len)
> + xfs_off_t len,
> + enum xfs_alloc_file_space_mode mode)
> {
> xfs_mount_t *mp = ip->i_mount;
> xfs_off_t count;
> @@ -657,6 +665,7 @@ xfs_alloc_file_space(
> int rt;
> xfs_trans_t *tp;
> xfs_bmbt_irec_t imaps[1], *imapp;
> + uint32_t bmapi_flags, nr_exts;
> int error;
>
> if (xfs_is_always_cow_inode(ip))
> @@ -674,6 +683,15 @@ xfs_alloc_file_space(
> if (len <= 0)
> return -EINVAL;
>
> + switch (mode) {
> + case XFS_ALLOC_FILE_SPACE_PREALLOC:
> + bmapi_flags = XFS_BMAPI_PREALLOC;
> + nr_exts = XFS_IEXT_ADD_NOSPLIT_CNT;
> + break;
> + default:
> + return -EINVAL;
> + }
> +
> rt = XFS_IS_REALTIME_INODE(ip);
> extsz = xfs_get_extsz_hint(ip);
>
> @@ -733,8 +751,7 @@ xfs_alloc_file_space(
> if (error)
> break;
>
> - error = xfs_iext_count_extend(tp, ip, XFS_DATA_FORK,
> - XFS_IEXT_ADD_NOSPLIT_CNT);
> + error = xfs_iext_count_extend(tp, ip, XFS_DATA_FORK, nr_exts);
> if (error)
> goto error;
>
> @@ -748,7 +765,7 @@ xfs_alloc_file_space(
> * will eventually reach the requested range.
> */
> error = xfs_bmapi_write(tp, ip, startoffset_fsb,
> - allocatesize_fsb, XFS_BMAPI_PREALLOC, 0, imapp,
> + allocatesize_fsb, bmapi_flags, 0, imapp,
> &nimaps);
> if (error) {
> if (error != -ENOSR)
> diff --git a/fs/xfs/xfs_bmap_util.h b/fs/xfs/xfs_bmap_util.h
> index c477b3361630..232b4c48247e 100644
> --- a/fs/xfs/xfs_bmap_util.h
> +++ b/fs/xfs/xfs_bmap_util.h
> @@ -55,8 +55,12 @@ int xfs_bmap_last_extent(struct xfs_trans *tp, struct xfs_inode *ip,
> int *is_empty);
>
> /* preallocation and hole punch interface */
> +enum xfs_alloc_file_space_mode {
> + XFS_ALLOC_FILE_SPACE_PREALLOC,
> +};
> +
> int xfs_alloc_file_space(struct xfs_inode *ip, xfs_off_t offset,
> - xfs_off_t len);
> + xfs_off_t len, enum xfs_alloc_file_space_mode mode);
> int xfs_free_file_space(struct xfs_inode *ip, xfs_off_t offset,
> xfs_off_t len, struct xfs_zone_alloc_ctx *ac);
> int xfs_collapse_file_space(struct xfs_inode *, xfs_off_t offset,
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 845a97c9b063..e90ea6ebdc8e 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -1406,7 +1406,8 @@ xfs_falloc_zero_range(
> len = round_up(offset + len, blksize) -
> round_down(offset, blksize);
> offset = round_down(offset, blksize);
> - error = xfs_alloc_file_space(ip, offset, len);
> + error = xfs_alloc_file_space(ip, offset, len,
> + XFS_ALLOC_FILE_SPACE_PREALLOC);
> }
> if (error)
> return error;
> @@ -1432,7 +1433,8 @@ xfs_falloc_unshare_range(
> if (error)
> return error;
>
> - error = xfs_alloc_file_space(XFS_I(inode), offset, len);
> + error = xfs_alloc_file_space(XFS_I(inode), offset, len,
> + XFS_ALLOC_FILE_SPACE_PREALLOC);
> if (error)
> return error;
> return xfs_falloc_setsize(file, new_size);
> @@ -1460,7 +1462,8 @@ xfs_falloc_allocate_range(
> if (error)
> return error;
>
> - error = xfs_alloc_file_space(XFS_I(inode), offset, len);
> + error = xfs_alloc_file_space(XFS_I(inode), offset, len,
> + XFS_ALLOC_FILE_SPACE_PREALLOC);
> if (error)
> return error;
> return xfs_falloc_setsize(file, new_size);
> --
> 2.51.2
>
>
* Re: [PATCH 2/2] xfs/216: disable all concurrency scaling
From: Darrick J. Wong @ 2026-06-24 17:44 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Zorro Lang, Lukas Herbolt, Eric Sandeen, Shin'ichiro Kawasaki,
linux-xfs, fstests
In-Reply-To: <20260619050937.444488-3-hch@lst.de>
On Fri, Jun 19, 2026 at 07:09:29AM +0200, Christoph Hellwig wrote:
> This test currently disables log concurrency scaling, but even the
> data device concurrency scaling can create mismatching output on
> systems with a large CPU count.
>
> Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> ---
> tests/xfs/216 | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tests/xfs/216 b/tests/xfs/216
> index 1749647c11f7..ce8bb528410b 100755
> --- a/tests/xfs/216
> +++ b/tests/xfs/216
> @@ -23,7 +23,7 @@ _cleanup()
> _require_scratch
> _scratch_mkfs_xfs >/dev/null 2>&1
> if _scratch_mkfs_xfs_supports_concurrency -l >> $seqres.full 2>&1; then
> - loop_mkfs_opts="-l concurrency=0"
> + loop_mkfs_opts="-d concurrency=0 -l concurrency=0 -r concurrency=0"
/me notes that -lconcurrency is not compatible with -llogdev and the
loopdev is not formatted with SCRATCH_MKFS_OPTIONS, so this won't work
to disable the concurrency= mkfs options if fstests is being run with
SCRATCH_LOGDEV set.
--D
> else
> loop_mkfs_opts=""
> fi
> --
> 2.53.0
>
>
* [PATCH] iomap: Remove FGP_NOFS from iomap_get_folio()
From: Matthew Wilcox (Oracle) @ 2026-06-24 17:42 UTC (permalink / raw)
To: Christian Brauner
Cc: Matthew Wilcox (Oracle), Darrick J. Wong, Jens Axboe, Namjae Jeon,
Sungjong Seo, Yuezhang Mo, Miklos Szeredi, Andreas Gruenbacher,
Hyunchul Lee, Konstantin Komarov, Carlos Maiolino, Damien Le Moal,
Naohiro Aota, Johannes Thumshirn, linux-xfs, linux-fsdevel,
linux-block, fuse-devel, gfs2, ntfs3
FGP_NOFS is legacy; filesystems should be using memalloc_nofs_save/restore
instead. We have it here in iomap because it was buried in
grab_cache_page_write_begin() and we didn't want to change this behaviour
as part of the folio transition.
I have tested this with XFS and see no issues. Other filesystems (cc'd)
may need to make adjustments. Please test with lockdep enabled.
Cc: "Darrick J. Wong" <djwong@kernel.org> (iomap)
Cc: Jens Axboe <axboe@kernel.dk> (block)
Cc: Namjae Jeon <linkinjeon@kernel.org> (exfat, ntfs)
Cc: Sungjong Seo <sj1557.seo@samsung.com> (exfat)
Cc: Yuezhang Mo <yuezhang.mo@sony.com> (exfat)
Cc: Miklos Szeredi <miklos@szeredi.hu> (fuse)
Cc: Andreas Gruenbacher <agruenba@redhat.com> (gfs2)
Cc: Hyunchul Lee <hyc.lee@gmail.com> (ntfs)
Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> (ntfs3)
Cc: Carlos Maiolino <cem@kernel.org> (xfs)
Cc: Damien Le Moal <dlemoal@kernel.org> (zonefs)
Cc: Naohiro Aota <naohiro.aota@wdc.com> (zonefs)
Cc: Johannes Thumshirn <jth@kernel.org> (zonefs)
Cc: linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-block@vger.kernel.org
Cc: fuse-devel@lists.linux.dev
Cc: gfs2@lists.linux.dev
Cc: ntfs3@lists.linux.dev
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
fs/iomap/buffered-io.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 8d4806dc46d4..27bc2455a98d 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -768,7 +768,7 @@ EXPORT_SYMBOL_GPL(iomap_is_partially_uptodate);
*/
struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos, size_t len)
{
- fgf_t fgp = FGP_WRITEBEGIN | FGP_NOFS;
+ fgf_t fgp = FGP_WRITEBEGIN;
if (iter->flags & IOMAP_NOWAIT)
fgp |= FGP_NOWAIT;
--
2.47.3
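For filesystems that still need page cache allocations in this path kept out of filesystem reclaim once FGP_NOFS is gone, the scoped memalloc_nofs_save()/memalloc_nofs_restore() pair mentioned above marks a whole region instead of tagging individual calls. The model below is a userspace illustration only: model_nofs_save(), model_nofs_restore(), model_may_enter_fs() and the flag value are invented stand-ins. In the kernel, the save helper records the task's PF_MEMALLOC_NOFS state and sets it, and restore puts the saved state back. The point the model shows is that returning the previous state makes nested scopes compose correctly.

```c
#include <stdbool.h>

/* Hypothetical bit standing in for PF_MEMALLOC_NOFS. */
#define MODEL_PF_MEMALLOC_NOFS	0x1u

/* Per-thread stand-in for current->flags. */
static _Thread_local unsigned int model_task_flags;

/* Enter a NOFS scope; returns the previous NOFS state for restore. */
unsigned int model_nofs_save(void)
{
	unsigned int old = model_task_flags & MODEL_PF_MEMALLOC_NOFS;

	model_task_flags |= MODEL_PF_MEMALLOC_NOFS;
	return old;
}

/* Leave a NOFS scope, restoring exactly the state saved on entry. */
void model_nofs_restore(unsigned int old)
{
	model_task_flags = (model_task_flags & ~MODEL_PF_MEMALLOC_NOFS) | old;
}

/* What an allocator would consult: may reclaim call back into the fs? */
bool model_may_enter_fs(void)
{
	return !(model_task_flags & MODEL_PF_MEMALLOC_NOFS);
}
```

An inner save/restore pair inside an outer one leaves the task in NOFS mode until the outer scope ends, so helpers can use the pair without knowing whether their caller already did.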
* Re: [PATCH v2] xfs: clean up spelling and grammar errors in comments
From: Darrick J. Wong @ 2026-06-24 17:37 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: caina, linux-xfs
In-Reply-To: <ajve0YCxD_u62phh@infradead.org>
On Wed, Jun 24, 2026 at 06:42:41AM -0700, Christoph Hellwig wrote:
> On Wed, Jun 17, 2026 at 01:50:52PM +0800, caina wrote:
> > /*
> > * Set up an in-memory buffer cache so that we can use the xfbtree. Allocating
> > - * a shmem file might take loks, so we cannot be in transaction context. Park
> > + * a shmem file might take looks, so we cannot be in transaction context. Park
>
> I think this should be locks, not looks.
Yes.
The rest of the changes (aside from the "do so so that..." change) look
ok to me.
--D
* Re: [PATCH v5 2/2] xfs: prevent close() from hanging on frozen filesystems
From: Darrick J. Wong @ 2026-06-24 17:35 UTC (permalink / raw)
To: Aditya Srivastava
Cc: Carlos Maiolino, Christoph Hellwig, linux-xfs, linux-kernel
In-Reply-To: <20260616053850.2188-3-aditya.ansh182@gmail.com>
On Tue, Jun 16, 2026 at 05:38:50AM +0000, Aditya Srivastava wrote:
> From: Aditya Prakash Srivastava <aditya.ansh182@gmail.com>
>
> When a file with active speculative post-EOF preallocations is closed,
> xfs_file_release() synchronously triggers xfs_free_eofblocks() to clean
> them up. This requires allocating a write transaction (xfs_trans_alloc),
> which blocks indefinitely if the filesystem is currently frozen or in the
> process of freezing, as it waits to acquire the superblock's write lock.
>
> As a result, a close() system call on a read-write file descriptor can
> hang indefinitely in percpu_rwsem_wait() until the filesystem is thawed,
> even if the file is closed by a non-writer process or after all writing
> activity has already ceased.
>
> To fix this properly and avoid any potential race conditions where a freeze
> might come in immediately after a writable check, pass the new
> XFS_TRANS_WRITECOUNT_TRYLOCK flag to xfs_trans_alloc() when freeing
> speculative preallocations in xfs_file_release().
>
> If xfs_free_eofblocks() returns -EAGAIN on a trylock failure, we cleanly
> bypass setting XFS_EOFBLOCKS_RELEASED on the inode, ensuring subsequent
> releases or the background blockgc garbage collector can successfully retry
> the cleanup once the filesystem thaws.
>
> Also, add the new trans_flags parameter to xfs_free_eofblocks() to make
> its usage stand out, and update existing callers to pass 0 to preserve
> standard blocking paths.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=205833
> Link: https://bugzilla.redhat.com/show_bug.cgi?id=1474726
> Suggested-by: Christoph Hellwig <hch@infradead.org>
> Signed-off-by: Aditya Prakash Srivastava <aditya.ansh182@gmail.com>
> ---
> fs/xfs/xfs_bmap_util.c | 10 ++++++----
> fs/xfs/xfs_bmap_util.h | 2 +-
> fs/xfs/xfs_file.c | 8 +++++---
> fs/xfs/xfs_icache.c | 2 +-
> fs/xfs/xfs_inode.c | 2 +-
> 5 files changed, 14 insertions(+), 10 deletions(-)
>
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 0ab00615f1ad..a99aae4a1631 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -574,7 +574,8 @@ xfs_can_free_eofblocks(
> */
> int
> xfs_free_eofblocks(
> - struct xfs_inode *ip)
> + struct xfs_inode *ip,
> + uint trans_flags)
> {
> struct xfs_trans *tp;
> struct xfs_mount *mp = ip->i_mount;
> @@ -604,9 +605,10 @@ xfs_free_eofblocks(
> return 0;
> }
>
> - error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);
> + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0,
> + trans_flags, &tp);
> if (error) {
> - ASSERT(xfs_is_shutdown(mp));
> + ASSERT(error == -EAGAIN || xfs_is_shutdown(mp));
> return error;
> }
>
> @@ -928,7 +930,7 @@ xfs_prepare_shift(
> * into the accessible region of the file.
> */
> if (xfs_can_free_eofblocks(ip)) {
> - error = xfs_free_eofblocks(ip);
> + error = xfs_free_eofblocks(ip, 0);
> if (error)
> return error;
> }
> diff --git a/fs/xfs/xfs_bmap_util.h b/fs/xfs/xfs_bmap_util.h
> index c477b3361630..c13774aa0892 100644
> --- a/fs/xfs/xfs_bmap_util.h
> +++ b/fs/xfs/xfs_bmap_util.h
> @@ -66,7 +66,7 @@ int xfs_insert_file_space(struct xfs_inode *, xfs_off_t offset,
>
> /* EOF block manipulation functions */
> bool xfs_can_free_eofblocks(struct xfs_inode *ip);
> -int xfs_free_eofblocks(struct xfs_inode *ip);
> +int xfs_free_eofblocks(struct xfs_inode *ip, uint trans_flags);
>
> int xfs_swap_extents(struct xfs_inode *ip, struct xfs_inode *tip,
> struct xfs_swapext *sx);
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 845a97c9b063..76c9b2fe7c51 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -1806,9 +1806,11 @@ xfs_file_release(
> */
> if (!xfs_iflags_test(ip, XFS_EOFBLOCKS_RELEASED) &&
> xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) {
> - if (xfs_can_free_eofblocks(ip) &&
> - !xfs_iflags_test_and_set(ip, XFS_EOFBLOCKS_RELEASED))
> - xfs_free_eofblocks(ip);
> + if (!xfs_iflags_test(ip, XFS_EOFBLOCKS_RELEASED) &&
> + xfs_can_free_eofblocks(ip) &&
> + !xfs_free_eofblocks(ip, XFS_TRANS_WRITECOUNT_TRYLOCK))
> + xfs_iflags_set(ip, XFS_EOFBLOCKS_RELEASED);
Could you prevent the close() stalls by surrounding this with
sb_start_write_trylock instead of passing transaction allocation flags
all the way down?
OFC that results in a messy if test:
if (xfs_can_free_eofblocks(...) &&
!xfs_iflags_test(...RELEASED) &&
sb_start_write_trylock(...)) {
if (!xfs_iflags_test_and_set(...))
xfs_free_eofblocks(ip);
sb_end_write(...);
}
<shrug> Sorry if this is noise, I've been on vacation.
--D
> +
> xfs_iunlock(ip, XFS_IOLOCK_EXCL);
> }
>
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 2040a9292ee6..c575b4acb24c 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -1259,7 +1259,7 @@ xfs_inode_free_eofblocks(
> *lockflags |= XFS_IOLOCK_EXCL;
>
> if (xfs_can_free_eofblocks(ip))
> - return xfs_free_eofblocks(ip);
> + return xfs_free_eofblocks(ip, 0);
>
> /* inode could be preallocated */
> trace_xfs_inode_free_eofblocks_invalid(ip);
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index ddf2707c8894..14d3cd04a79f 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -1423,7 +1423,7 @@ xfs_inactive(
> * reference to the inode at this point anyways.
> */
> if (xfs_can_free_eofblocks(ip))
> - error = xfs_free_eofblocks(ip);
> + error = xfs_free_eofblocks(ip, 0);
>
> goto out;
> }
> --
> 2.47.3
>
>
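The retry behaviour described in the commit message (leave XFS_EOFBLOCKS_RELEASED clear when the work cannot start without blocking, so a later close or background blockgc tries again), combined with the freeze-protection trylock suggested in the reply, can be modelled in userspace as follows. This is a sketch with invented names (model_fs, model_inode, model_release()); it is not XFS code, it ignores the IOLOCK and transaction machinery, and it assumes, as with the kernel's sb_start_write_trylock(), that the trylock returns true only when freeze protection was acquired.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical filesystem state: frozen, plus held write references. */
struct model_fs {
	bool frozen;
	int writers;
};

/* Hypothetical inode state. */
struct model_inode {
	bool eofblocks_released;	/* stands in for XFS_EOFBLOCKS_RELEASED */
	int eofblocks;			/* post-EOF preallocation still present */
};

/* Like sb_start_write_trylock(): true only if protection was taken. */
static bool model_start_write_trylock(struct model_fs *fs)
{
	if (fs->frozen)
		return false;
	fs->writers++;
	return true;
}

static void model_end_write(struct model_fs *fs)
{
	fs->writers--;
}

/*
 * close()-time trim that never waits for a thaw. On a frozen fs it
 * returns -EAGAIN and leaves eofblocks_released clear so that a later
 * release (or background gc) can retry.
 */
int model_release(struct model_fs *fs, struct model_inode *ip)
{
	if (ip->eofblocks_released || ip->eofblocks == 0)
		return 0;
	if (!model_start_write_trylock(fs))
		return -EAGAIN;
	ip->eofblocks = 0;		/* trim the post-EOF blocks */
	ip->eofblocks_released = true;
	model_end_write(fs);
	return 0;
}
```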
* Re: [PATCH] fsr: preserve xfrog_bulkstat error codes
From: Darrick J. Wong @ 2026-06-24 17:20 UTC (permalink / raw)
To: liuh; +Cc: hch, linux-xfs
In-Reply-To: <20260624081135.12390-1-liuhuan01@kylinos.cn>
On Wed, Jun 24, 2026 at 04:11:35PM +0800, liuh wrote:
> Fix the bulkstat loop condition in fsrfs() to assign the return
> value of xfrog_bulkstat() to ret before comparing it against zero.
>
> Without the extra parentheses, operator precedence causes ret
> to receive only the result of the comparison (0 or 1), which
> discards the actual error code and breaks error reporting.
>
> Signed-off-by: liuh <liuhuan01@kylinos.cn>
> ---
> fsr/xfs_fsr.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fsr/xfs_fsr.c b/fsr/xfs_fsr.c
> index e74180c1..68cc0ea4 100644
> --- a/fsr/xfs_fsr.c
> +++ b/fsr/xfs_fsr.c
> @@ -678,7 +678,7 @@ fsrfs(char *mntdir, xfs_ino_t startino, int targetrange)
> return -1;
> }
>
> - while ((ret = -xfrog_bulkstat(&fsxfd, breq) == 0)) {
> + while ((ret = -xfrog_bulkstat(&fsxfd, breq)) == 0) {
Heh, oops.
Cc: <linux-xfs@vger.kernel.org> # v5.3.0
Fixes: e6542132dec3cd ("libfrog: convert bulkstat.c functions to negative error codes")
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> struct xfs_bulkstat *buf = breq->bulkstat;
> struct xfs_bulkstat *p;
> struct xfs_bulkstat *endp;
> --
> 2.43.0
>
>
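The precedence problem is easy to reproduce in isolation. In the sketch below, rc stands in for the xfrog_bulkstat() return value (0 or a negative errno, per the Fixes: commit), and the two hypothetical helpers evaluate the old and new loop conditions and report what ends up in ret. With the old form, ret receives the result of the comparison, so on failure it holds 0 and the error code is lost.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Old loop condition: parses as ret = ((-rc) == 0), so ret ends up
 * as 1 on success and 0 on failure. Returns whether the loop body
 * would run and stores ret in *retp.
 */
bool cond_before_fix(int rc, int *retp)
{
	int ret;
	bool more = (ret = -rc == 0);

	*retp = ret;
	return more;
}

/* Fixed condition: assign first, then compare; ret keeps the errno. */
bool cond_after_fix(int rc, int *retp)
{
	int ret;
	bool more = (ret = -rc) == 0;

	*retp = ret;
	return more;
}
```

For example, with rc = -EIO both forms leave the loop, but the old one does so with ret == 0, so the caller sees success, while the fixed one leaves ret == EIO.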
* Re: [PATCH] mkfs: simplify setup_proto file status checks
From: Darrick J. Wong @ 2026-06-24 17:13 UTC (permalink / raw)
To: liuh; +Cc: linux-xfs
In-Reply-To: <20260622071110.28854-1-liuhuan01@kylinos.cn>
On Mon, Jun 22, 2026 at 03:11:10PM +0800, liuh wrote:
> setup_proto() calls filesize() before validating the source type with
> fstat(), even though both operations ultimately require the same file
> status information.
>
> Perform a single fstat() immediately after opening the source path and
> reuse the resulting metadata for both source type validation and file
> size retrieval. This simplifies the setup logic and eliminates a
> redundant metadata lookup.
>
> Signed-off-by: liuh <liuhuan01@kylinos.cn>
> ---
> mkfs/proto.c | 14 +++++++++++---
> 1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/mkfs/proto.c b/mkfs/proto.c
> index a460aebd..51f61e18 100644
> --- a/mkfs/proto.c
> +++ b/mkfs/proto.c
> @@ -82,14 +82,17 @@ setup_proto(
> return result;
> }
>
> - if ((fd = open(fname, O_RDONLY)) < 0 || (size = filesize(fd)) < 0) {
> + if ((fd = open(fname, O_RDONLY)) < 0) {
> fprintf(stderr, _("%s: failed to open %s: %s\n"),
> progname, fname, strerror(errno));
> goto out_fail;
> }
>
> - if (fstat(fd, &statbuf) < 0)
> - fail(_("invalid or unreadable source path"), errno);
> + if (fstat(fd, &statbuf) < 0) {
> + fprintf(stderr, _("%s: failed to stat %s: %s\n"),
> + progname, fname, strerror(errno));
> + goto out_fail;
> + }
>
> /*
> * Handle directory inputs.
> @@ -101,6 +104,11 @@ setup_proto(
> return result;
> }
>
> + /*
> + * Get size from statbuf for regular file
There's no guarantee that this is a regular file; the only option that
we've eliminated here is S_IFDIR. There ought to be a check for
S_IFREG here.
--D
> + */
> + size = statbuf.st_size;
> +
> /*
> * Else this is a protofile, let's handle traditionally.
> */
> --
> 2.43.0
>
>
* Re: [PATCH] mkfs: remove duplicate include of convert.h
From: Darrick J. Wong @ 2026-06-24 17:09 UTC (permalink / raw)
To: liuh; +Cc: linux-xfs
In-Reply-To: <20260622024311.26681-1-liuhuan01@kylinos.cn>
On Mon, Jun 22, 2026 at 10:43:11AM +0800, liuh wrote:
> The header file "libfrog/convert.h" is included twice in xfs_mkfs.c.
> Clean it up by removing the redundant second inclusion.
>
> Signed-off-by: liuh <liuhuan01@kylinos.cn>
Looks ok,
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> mkfs/xfs_mkfs.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
> index dd8a48c3..64e8a6a6 100644
> --- a/mkfs/xfs_mkfs.c
> +++ b/mkfs/xfs_mkfs.c
> @@ -15,7 +15,6 @@
> #include "libfrog/dahashselftest.h"
> #include "libfrog/fsproperties.h"
> #include "libfrog/zones.h"
> -#include "libfrog/convert.h"
> #include "proto.h"
> #include <ini.h>
>
> --
> 2.43.0
>
>
* Re: [PATCH] xfs: fix AGFL extent count calculation in xrep_agfl_fill
From: Darrick J. Wong @ 2026-06-24 17:08 UTC (permalink / raw)
To: jiazhenyuan; +Cc: cem, kees, linux-xfs, hch, linux-kernel, kernel
In-Reply-To: <20260623024153.835431-1-jiazhenyuan@uniontech.com>
On Tue, Jun 23, 2026 at 10:41:53AM +0800, jiazhenyuan wrote:
> In xrep_agfl_fill(), the call to xagb_bitmap_set() passes
> 'agbno - 1' as the length argument. However, xagb_bitmap_set()
> expects a length (number of blocks), not an end block number.
> Passing 'agbno - 1' causes used_extents to record an incorrect
> range.
>
> Fix this by calculating the correct length as 'agbno - start',
> which represents the actual number of blocks filled into the AGFL.
I have a stack of bugfixes waiting for the 7.2-rc rebase, and this
is one of them. But since you posted first, I no longer have to seek
approval for it. :)
With this added,
Cc: <stable@vger.kernel.org> # v6.6
Fixes: 014ad53732d2ba ("xfs: use per-AG bitmaps to reap unused AG metadata blocks during repair")
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> Signed-off-by: jiazhenyuan <jiazhenyuan@uniontech.com>
> ---
> fs/xfs/scrub/agheader_repair.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/xfs/scrub/agheader_repair.c b/fs/xfs/scrub/agheader_repair.c
> index ae9ed5f280d0..b0ffd37afb45 100644
> --- a/fs/xfs/scrub/agheader_repair.c
> +++ b/fs/xfs/scrub/agheader_repair.c
> @@ -652,7 +652,7 @@ xrep_agfl_fill(
> while (agbno < start + len && af->fl_off < af->flcount)
> af->agfl_bno[af->fl_off++] = cpu_to_be32(agbno++);
>
> - error = xagb_bitmap_set(&af->used_extents, start, agbno - 1);
> + error = xagb_bitmap_set(&af->used_extents, start, agbno - start);
> if (error)
> return error;
>
> --
> 2.20.1
>
>
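To see why agbno - 1 is the wrong argument: after the fill loop, agbno is one past the last block placed on the AGFL, so agbno - start is the number of blocks consumed, while agbno - 1 is the last block number itself. The loop is reproduced below as a userspace model with the same termination conditions as the quoted hunk; model_agfl_fill() and model_used_len() are hypothetical names, and the big-endian conversion and the bitmap itself are omitted.

```c
#include <stdint.h>

/*
 * Model of the fill loop: take block numbers from the free extent
 * [start, start + len) until the extent or the free AGFL slots run out.
 * Returns the loop's final agbno, one past the last block consumed.
 */
uint32_t model_agfl_fill(uint32_t start, uint32_t len, uint32_t *agfl_bno,
			 unsigned int *fl_off, unsigned int flcount)
{
	uint32_t agbno = start;

	while (agbno < start + len && *fl_off < flcount)
		agfl_bno[(*fl_off)++] = agbno++;
	return agbno;
}

/* Length of the consumed range, which is what a (start, len) setter needs. */
uint32_t model_used_len(uint32_t start, uint32_t agbno_after)
{
	return agbno_after - start;
}
```

For an extent starting at block 100 with only four free AGFL slots, the loop stops at agbno 104: the correct length is 4, whereas the old argument (103) would mark 103 blocks starting at block 100 as used.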
* Re: update BDI {io,ra}_pages values based on the RT device limits
From: Christoph Hellwig @ 2026-06-24 15:42 UTC (permalink / raw)
To: Carlos Maiolino
Cc: Christoph Hellwig, Jan Kara, Filip Blagojevic, Matthew Wilcox,
Damien Le Moal, linux-fsdevel, linux-xfs
In-Reply-To: <ajuxIMzTbzxEuX5g@nidhogg.toxiclabs.cc>
On Wed, Jun 24, 2026 at 12:40:50PM +0200, Carlos Maiolino wrote:
> Given the current tendency of filesystems being multi-device, this
> doesn't sound bad IMHO. Wouldn't accessing io_pages of each BDI also be
> worth it even for a journal dev? I wonder whether what you ran into
> could also happen if somebody were using just an SSD for the journal and
> a non-rt XFS on an HDD or a different/slower device.
Nothing looks at the value for the log device.
>
> I don't know how stupid that sounds but perhaps it wouldn't be that
> complicated to support multiple BDIs without making it unpleasant for
> filesystems that don't care?
> sb->bdi could be turned into a dynamic array and a new sb->s_devcount
> field to keep track of it on multi-device filesystems. Filesystems that
> don't care about multiple BDIs would have it pointing to a single BDI
> struct. Again I feel I'm missing something, so it might sound really
> stupid :)
This will get complicated really soon...