* [PATCH 6.1 00/23] fixes from 6.11 for 6.1.y
@ 2025-06-11 21:01 Leah Rumancik
2025-06-11 21:01 ` [PATCH 6.1 01/23] xfs: fix interval filtering in multi-step fsmap queries Leah Rumancik
` (23 more replies)
0 siblings, 24 replies; 25+ messages in thread
From: Leah Rumancik @ 2025-06-11 21:01 UTC
To: stable; +Cc: xfs-stable, chandan.babu, catherine.hoang, djwong, Leah Rumancik
Hello again,
This is a series for 6.1.y for fixes from 6.11. It corresponds to the
6.6.y series here:
https://lore.kernel.org/linux-xfs/20241218191725.63098-1-catherine.hoang@oracle.com/
During porting, I noticed 6.1.y was missing a fix series from 6.5
that is a dependency of the fixes from 6.11, so I included those
first.
These were tested via the auto group on 9 configs with no regressions
seen. These were also already ack'd on the xfs-stable mailing list.
series from 6.5:
https://lore.kernel.org/linux-xfs/168506055189.3727958.722711918040129046.stgit@frogsfrogsfrogs/
63ef7a35912d xfs: fix interval filtering in multi-step fsmap queries
7975aba19cba xfs: fix integer overflows in the fsmap rtbitmap and logdev backends
d898137d789c xfs: fix getfsmap reporting past the last rt extent
f045dd00328d xfs: clean up the rtbitmap fsmap backend
a949a1c2a198 xfs: fix logdev fsmap query result filtering
3ee9351e7490 xfs: validate fsmap offsets specified in the query keys
75dc03453122 xfs: fix xfs_btree_query_range callers to initialize btree rec fully
fix of 63ef7a35912dd ("xfs: fix interval filtering in multi-step fsmap queries")
https://lore.kernel.org/linux-xfs/169335025661.3518128.12423331693506002020.stgit@frogsfrogsfrogs/
cfa2df68b7ce xfs: fix an agbno overflow in __xfs_getfsmap_datadev
6.6 series for 6.11:
https://lore.kernel.org/linux-xfs/20241218191725.63098-1-catherine.hoang@oracle.com/
85d0947db262 xfs: fix the contact address for the sysfs ABI documentation
c08d03996cea xfs: verify buffer, inode, and dquot items every tx commit
ff627196ddc1 xfs: use consistent uid/gid when grabbing dquots for inodes
7531c9ab2e55 xfs: declare xfs_file.c symbols in xfs_file.h
c070b8802159 xfs: create a new helper to return a file's allocation unit
2e63ed9b0175 xfs: Fix xfs_flush_unmap_range() range for RT
fe962ab3c4f1 xfs: Fix xfs_prepare_shift() range for RT
ca96d83c9307 xfs: don't walk off the end of a directory data block
27336a327b40 xfs: remove unused parameter in macro XFS_DQUOT_LOGRES
b2dcbd8a928c xfs: attr forks require attr, not attr2
4a82db7a4b73 xfs: conditionally allow FS_XFLAG_REALTIME changes if S_DAX is set
9fadc53d793c xfs: Fix the owner setting issue for rmap query in xfs fsmap
35bd108619c2 xfs: use XFS_BUF_DADDR_NULL for daddrs in getfsmap code
29fcb5fef608 xfs: take m_growlock when running growfsrt
e5d1ae2d4d0b xfs: reset rootdir extent size hint after growfsrt
[skipped for 6.1 as scrub is not supported in 6.1:]
cb95cb2450e3 xfs: convert comma to semicolon
1bee32f33c0a xfs: fix file_path handling in tracepoints
- Leah
Christoph Hellwig (1):
xfs: fix the contact address for the sysfs ABI documentation
Darrick J. Wong (17):
xfs: fix interval filtering in multi-step fsmap queries
xfs: fix integer overflows in the fsmap rtbitmap and logdev backends
xfs: fix getfsmap reporting past the last rt extent
xfs: clean up the rtbitmap fsmap backend
xfs: fix logdev fsmap query result filtering
xfs: validate fsmap offsets specified in the query keys
xfs: fix xfs_btree_query_range callers to initialize btree rec fully
xfs: fix an agbno overflow in __xfs_getfsmap_datadev
xfs: verify buffer, inode, and dquot items every tx commit
xfs: use consistent uid/gid when grabbing dquots for inodes
xfs: declare xfs_file.c symbols in xfs_file.h
xfs: create a new helper to return a file's allocation unit
xfs: attr forks require attr, not attr2
xfs: conditionally allow FS_XFLAG_REALTIME changes if S_DAX is set
xfs: use XFS_BUF_DADDR_NULL for daddrs in getfsmap code
xfs: take m_growlock when running growfsrt
xfs: reset rootdir extent size hint after growfsrt
John Garry (2):
xfs: Fix xfs_flush_unmap_range() range for RT
xfs: Fix xfs_prepare_shift() range for RT
Julian Sun (1):
xfs: remove unused parameter in macro XFS_DQUOT_LOGRES
Zizhi Wo (1):
xfs: Fix the owner setting issue for rmap query in xfs fsmap
lei lu (1):
xfs: don't walk off the end of a directory data block
Documentation/ABI/testing/sysfs-fs-xfs | 8 +-
fs/xfs/Kconfig | 12 ++
fs/xfs/libxfs/xfs_alloc.c | 10 +-
fs/xfs/libxfs/xfs_dir2_data.c | 31 ++-
fs/xfs/libxfs/xfs_dir2_priv.h | 7 +
fs/xfs/libxfs/xfs_quota_defs.h | 2 +-
fs/xfs/libxfs/xfs_refcount.c | 13 +-
fs/xfs/libxfs/xfs_rmap.c | 10 +-
fs/xfs/libxfs/xfs_trans_resv.c | 28 +--
fs/xfs/scrub/bmap.c | 8 +-
fs/xfs/xfs.h | 4 +
fs/xfs/xfs_bmap_util.c | 22 +-
fs/xfs/xfs_buf_item.c | 32 +++
fs/xfs/xfs_dquot_item.c | 31 +++
fs/xfs/xfs_file.c | 33 ++-
fs/xfs/xfs_file.h | 15 ++
fs/xfs/xfs_fsmap.c | 266 ++++++++++++++-----------
fs/xfs/xfs_inode.c | 29 ++-
fs/xfs/xfs_inode.h | 2 +
fs/xfs/xfs_inode_item.c | 32 +++
fs/xfs/xfs_ioctl.c | 12 ++
fs/xfs/xfs_iops.c | 1 +
fs/xfs/xfs_iops.h | 3 -
fs/xfs/xfs_rtalloc.c | 78 ++++++--
fs/xfs/xfs_symlink.c | 8 +-
fs/xfs/xfs_trace.h | 25 +++
26 files changed, 505 insertions(+), 217 deletions(-)
create mode 100644 fs/xfs/xfs_file.h
--
2.50.0.rc1.591.g9c95f17f64-goog
* [PATCH 6.1 01/23] xfs: fix interval filtering in multi-step fsmap queries
From: Leah Rumancik @ 2025-06-11 21:01 UTC
To: stable
Cc: xfs-stable, chandan.babu, catherine.hoang, djwong, Dave Chinner,
Leah Rumancik
From: "Darrick J. Wong" <djwong@kernel.org>
[ Upstream commit 63ef7a35912dd743cabd65d5bb95891625c0dd46 ]
I noticed a bug in ranged GETFSMAP queries:
# xfs_io -c 'fsmap -vvvv' /opt
EXT: DEV BLOCK-RANGE OWNER FILE-OFFSET AG AG-OFFSET TOTAL
0: 8:80 [0..7]: static fs metadata 0 (0..7) 8
<snip>
9: 8:80 [192..223]: 137 0..31 0 (192..223) 32
# xfs_io -c 'fsmap -vvvv -d 208 208' /opt
#
That's not right -- we asked which mapping covers block 208, and we
should've received a mapping for inode 137 offset 16. Instead, we get nothing.
The root cause of this problem is a mis-interaction between the fsmap
code and how btree ranged queries work. xfs_btree_query_range returns
any btree record that overlaps with the query interval, even if the
record starts before or ends after the interval. Similarly, GETFSMAP is
supposed to return a recordset containing all records that overlap the
range queried.
However, it's possible that the recordset is larger than the buffer that
the caller provided to convey mappings to userspace. In /that/ case,
userspace is supposed to copy the last record returned to fmh_keys[0]
and call GETFSMAP again. In this case, we do not want to return
mappings that we have already supplied to the caller. The call to
xfs_btree_query_range is the same, but now we ignore any records that
start before fmh_keys[0].
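As a rough illustration of the continuation protocol described above, here
is a small userspace model in C. It is only a sketch: struct demo_map,
demo_query() and demo_walk() are hypothetical names invented for this
example, the records are plain (start, length) pairs in one flat address
space, and the real ioctl additionally carries owners, offsets and flags.
The low key's length field doubles as the "this is a continuation" marker.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified mapping record: start and length in sectors. */
struct demo_map {
	uint64_t physical;
	uint64_t length;
};

/*
 * Copy up to @cap records from the sorted array @recs into @out.
 * Records that end at or before low->physical are outside the query.
 * If low->length is nonzero, the caller is continuing a previous query,
 * so records that start before the end of the low key are skipped too.
 */
static size_t demo_query(const struct demo_map *recs, size_t nrecs,
			 const struct demo_map *low,
			 struct demo_map *out, size_t cap)
{
	uint64_t resume = low->physical + low->length;
	size_t n = 0;

	for (size_t i = 0; i < nrecs && n < cap; i++) {
		uint64_t end = recs[i].physical + recs[i].length;

		if (end <= low->physical)
			continue;	/* entirely before the interval */
		if (low->length && recs[i].physical < resume)
			continue;	/* already handed to the caller */
		out[n++] = recs[i];
	}
	return n;
}

/*
 * Walk the whole record set through a buffer of @cap entries, copying
 * the last record returned into the low key before each further call.
 */
static size_t demo_walk(const struct demo_map *recs, size_t nrecs,
			size_t cap, struct demo_map *all, size_t all_cap)
{
	struct demo_map low = { 0, 0 };
	struct demo_map buf[8];
	size_t total = 0;

	assert(cap > 0 && cap <= 8);
	for (;;) {
		size_t n = demo_query(recs, nrecs, &low, buf, cap);

		for (size_t i = 0; i < n && total < all_cap; i++)
			all[total++] = buf[i];
		if (n < cap)
			break;		/* short read: nothing left */
		low = buf[n - 1];	/* continue after the last record */
	}
	return total;
}
```

In this model, a first call with a low key of (208, 0) still returns the
record covering 192..223, while a continuation call whose low key is a
copy of that record skips it.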
Unfortunately, we didn't implement the filtering predicate correctly.
The predicate should only be called when we're calling back for more
records. Accomplish this by setting info->low.rm_blockcount to a
nonzero value and ensuring that it is cleared as necessary. As a
result, we no longer want to adjust dkeys[0] in the main setup function
because that's confusing.
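To make the difference concrete, the following C sketch contrasts an
unconditional filter with one that is gated on a nonzero block count in
the low key. It is a simplification under stated assumptions: struct
demo_key and both demo_* functions are hypothetical, and the comparison
is reduced to the start block only, whereas the kernel's
xfs_rmap_compare also considers owner and offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cut-down key: only the fields this sketch needs. */
struct demo_key {
	uint32_t startblock;
	uint32_t blockcount;	/* nonzero only on a continuation call */
};

/* Simplified old behaviour: always drop records starting before the key. */
static bool demo_before_start_old(const struct demo_key *low,
				  const struct demo_key *rec)
{
	assert(low && rec);
	return rec->startblock < low->startblock;
}

/* Simplified new behaviour: filter only when continuing a prior query. */
static bool demo_before_start_new(const struct demo_key *low,
				  const struct demo_key *rec)
{
	assert(low && rec);
	if (low->blockcount)
		return rec->startblock < low->startblock;
	return false;
}
```

With a record at blocks 192..223 and a fresh query for block 208 (block
count zero), the old predicate discards the record even though it overlaps
the query, and the new one keeps it. On a continuation call, where the low
key has already been advanced to 224, both discard it.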
This patch doesn't touch the logdev/rtbitmap backends because they have
bigger problems that will be addressed by subsequent patches.
Found via xfs/556 with parent pointers enabled.
Fixes: e89c041338ed ("xfs: implement the GETFSMAP ioctl")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/xfs/xfs_fsmap.c | 67 +++++++++++++++++++++++++++++++++-------------
1 file changed, 48 insertions(+), 19 deletions(-)
diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c
index a5b9754c62d1..2011f1bf7ce0 100644
--- a/fs/xfs/xfs_fsmap.c
+++ b/fs/xfs/xfs_fsmap.c
@@ -160,11 +160,18 @@ struct xfs_getfsmap_info {
struct xfs_buf *agf_bp; /* AGF, for refcount queries */
struct xfs_perag *pag; /* AG info, if applicable */
xfs_daddr_t next_daddr; /* next daddr we expect */
u64 missing_owner; /* owner of holes */
u32 dev; /* device id */
- struct xfs_rmap_irec low; /* low rmap key */
+ /*
+ * Low rmap key for the query. If low.rm_blockcount is nonzero, this
+ * is the second (or later) call to retrieve the recordset in pieces.
+ * xfs_getfsmap_rec_before_start will compare all records retrieved
+ * by the rmapbt query to filter out any records that start before
+ * the last record.
+ */
+ struct xfs_rmap_irec low;
struct xfs_rmap_irec high; /* high rmap key */
bool last; /* last extent? */
};
/* Associate a device with a getfsmap handler. */
@@ -235,34 +242,45 @@ xfs_getfsmap_format(
rec = &info->fsmap_recs[info->head->fmh_entries++];
xfs_fsmap_from_internal(rec, xfm);
}
+static inline bool
+xfs_getfsmap_rec_before_start(
+ struct xfs_getfsmap_info *info,
+ const struct xfs_rmap_irec *rec,
+ xfs_daddr_t rec_daddr)
+{
+ if (info->low.rm_blockcount)
+ return xfs_rmap_compare(rec, &info->low) < 0;
+ return false;
+}
+
/*
* Format a reverse mapping for getfsmap, having translated rm_startblock
* into the appropriate daddr units.
*/
STATIC int
xfs_getfsmap_helper(
struct xfs_trans *tp,
struct xfs_getfsmap_info *info,
const struct xfs_rmap_irec *rec,
xfs_daddr_t rec_daddr)
{
struct xfs_fsmap fmr;
struct xfs_mount *mp = tp->t_mountp;
bool shared;
int error;
if (fatal_signal_pending(current))
return -EINTR;
/*
* Filter out records that start before our startpoint, if the
* caller requested that.
*/
- if (xfs_rmap_compare(rec, &info->low) < 0) {
+ if (xfs_getfsmap_rec_before_start(info, rec, rec_daddr)) {
rec_daddr += XFS_FSB_TO_BB(mp, rec->rm_blockcount);
if (info->next_daddr < rec_daddr)
info->next_daddr = rec_daddr;
return 0;
}
@@ -604,13 +622,31 @@ __xfs_getfsmap_datadev(
info->low.rm_startblock = XFS_FSB_TO_AGBNO(mp, start_fsb);
info->low.rm_offset = XFS_BB_TO_FSBT(mp, keys[0].fmr_offset);
error = xfs_fsmap_owner_to_rmap(&info->low, &keys[0]);
if (error)
return error;
- info->low.rm_blockcount = 0;
+ info->low.rm_blockcount = XFS_BB_TO_FSBT(mp, keys[0].fmr_length);
xfs_getfsmap_set_irec_flags(&info->low, &keys[0]);
+ /* Adjust the low key if we are continuing from where we left off. */
+ if (info->low.rm_blockcount == 0) {
+ /* empty */
+ } else if (XFS_RMAP_NON_INODE_OWNER(info->low.rm_owner) ||
+ (info->low.rm_flags & (XFS_RMAP_ATTR_FORK |
+ XFS_RMAP_BMBT_BLOCK |
+ XFS_RMAP_UNWRITTEN))) {
+ info->low.rm_startblock += info->low.rm_blockcount;
+ info->low.rm_owner = 0;
+ info->low.rm_offset = 0;
+
+ start_fsb += info->low.rm_blockcount;
+ if (XFS_FSB_TO_DADDR(mp, start_fsb) >= eofs)
+ return 0;
+ } else {
+ info->low.rm_offset += info->low.rm_blockcount;
+ }
+
info->high.rm_startblock = -1U;
info->high.rm_owner = ULLONG_MAX;
info->high.rm_offset = ULLONG_MAX;
info->high.rm_blockcount = 0;
info->high.rm_flags = XFS_RMAP_KEY_FLAGS | XFS_RMAP_REC_FLAGS;
@@ -657,16 +693,12 @@ __xfs_getfsmap_datadev(
/*
* Set the AG low key to the start of the AG prior to
* moving on to the next AG.
*/
- if (pag->pag_agno == start_ag) {
- info->low.rm_startblock = 0;
- info->low.rm_owner = 0;
- info->low.rm_offset = 0;
- info->low.rm_flags = 0;
- }
+ if (pag->pag_agno == start_ag)
+ memset(&info->low, 0, sizeof(info->low));
/*
* If this is the last AG, report any gap at the end of it
* before we drop the reference to the perag when the loop
* terminates.
@@ -899,25 +931,21 @@ xfs_getfsmap(
*
* If the low key mapping refers to file data, the same physical
* blocks could be mapped to several other files/offsets.
* According to rmapbt record ordering, the minimal next
* possible record for the block range is the next starting
- * offset in the same inode. Therefore, bump the file offset to
- * continue the search appropriately. For all other low key
- * mapping types (attr blocks, metadata), bump the physical
- * offset as there can be no other mapping for the same physical
- * block range.
+ * offset in the same inode. Therefore, each fsmap backend bumps
+ * the file offset to continue the search appropriately. For
+ * all other low key mapping types (attr blocks, metadata), each
+ * fsmap backend bumps the physical offset as there can be no
+ * other mapping for the same physical block range.
*/
dkeys[0] = head->fmh_keys[0];
if (dkeys[0].fmr_flags & (FMR_OF_SPECIAL_OWNER | FMR_OF_EXTENT_MAP)) {
- dkeys[0].fmr_physical += dkeys[0].fmr_length;
- dkeys[0].fmr_owner = 0;
if (dkeys[0].fmr_offset)
return -EINVAL;
- } else
- dkeys[0].fmr_offset += dkeys[0].fmr_length;
- dkeys[0].fmr_length = 0;
+ }
memset(&dkeys[1], 0xFF, sizeof(struct xfs_fsmap));
if (!xfs_getfsmap_check_keys(dkeys, &head->fmh_keys[1]))
return -EINVAL;
@@ -958,10 +986,11 @@ xfs_getfsmap(
break;
info.dev = handlers[i].dev;
info.last = false;
info.pag = NULL;
+ info.low.rm_blockcount = 0;
error = handlers[i].fn(tp, dkeys, &info);
if (error)
break;
xfs_trans_cancel(tp);
tp = NULL;
--
2.50.0.rc1.591.g9c95f17f64-goog
* [PATCH 6.1 02/23] xfs: fix integer overflows in the fsmap rtbitmap and logdev backends
From: Leah Rumancik @ 2025-06-11 21:01 UTC
To: stable
Cc: xfs-stable, chandan.babu, catherine.hoang, djwong, Dave Chinner,
Leah Rumancik
From: "Darrick J. Wong" <djwong@kernel.org>
[ Upstream commit 7975aba19cba4eba7ff60410f9294c90edc96dcf ]
It's not correct to use the rmap irec structure to hold query key
information to query the rtbitmap because the realtime volume can be
longer than 2^32 fsblocks in length. Because the rt volume doesn't have
allocation groups, introduce a daddr-based record filtering algorithm
and compute the rtextent values using 64-bit variables. The same
problem exists in the external log device fsmap implementation, so use
the same solution to fix it too.
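The truncation can be shown with a short standalone C sketch. The names
here are hypothetical, and DEMO_BLOCK_SHIFT assumes 4096-byte filesystem
blocks (eight 512-byte sectors per block) purely for illustration. The
"narrow" variant stores the block number in a 32-bit field first, which is
what happens when a realtime block number passes through a 32-bit start
block field.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: 4096-byte blocks = 8 sectors of 512 bytes. */
#define DEMO_BLOCK_SHIFT 3

/* Compute the sector address entirely in 64-bit arithmetic. */
static uint64_t demo_rt_daddr_wide(uint64_t startext, uint32_t rextsize)
{
	uint64_t rtbno;

	assert(rextsize > 0);
	rtbno = startext * rextsize;
	return rtbno << DEMO_BLOCK_SHIFT;
}

/* Same computation, but routed through a 32-bit block number. */
static uint64_t demo_rt_daddr_narrow(uint64_t startext, uint32_t rextsize)
{
	uint32_t startblock;

	assert(rextsize > 0);
	startblock = (uint32_t)(startext * rextsize);	/* high bits lost */
	return (uint64_t)startblock << DEMO_BLOCK_SHIFT;
}
```

The two variants agree for small volumes and diverge as soon as the block
number reaches 2^32, where the narrow variant wraps back to zero.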
After this patch, all the code that touches info->low and info->high
under xfs_getfsmap_logdev and __xfs_getfsmap_rtdev is unnecessary.
Cleaning this up will be done in subsequent patches.
Fixes: 4c934c7dd60c ("xfs: report realtime space information via the rtbitmap")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/xfs/xfs_fsmap.c | 90 ++++++++++++++++++++++++++++++++--------------
1 file changed, 64 insertions(+), 26 deletions(-)
diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c
index 2011f1bf7ce0..5039d330ef98 100644
--- a/fs/xfs/xfs_fsmap.c
+++ b/fs/xfs/xfs_fsmap.c
@@ -158,10 +158,12 @@ struct xfs_getfsmap_info {
struct xfs_fsmap_head *head;
struct fsmap *fsmap_recs; /* mapping records */
struct xfs_buf *agf_bp; /* AGF, for refcount queries */
struct xfs_perag *pag; /* AG info, if applicable */
xfs_daddr_t next_daddr; /* next daddr we expect */
+ /* daddr of low fsmap key when we're using the rtbitmap */
+ xfs_daddr_t low_daddr;
u64 missing_owner; /* owner of holes */
u32 dev; /* device id */
/*
* Low rmap key for the query. If low.rm_blockcount is nonzero, this
* is the second (or later) call to retrieve the recordset in pieces.
@@ -248,59 +250,66 @@ static inline bool
xfs_getfsmap_rec_before_start(
struct xfs_getfsmap_info *info,
const struct xfs_rmap_irec *rec,
xfs_daddr_t rec_daddr)
{
+ if (info->low_daddr != -1ULL)
+ return rec_daddr < info->low_daddr;
if (info->low.rm_blockcount)
return xfs_rmap_compare(rec, &info->low) < 0;
return false;
}
/*
* Format a reverse mapping for getfsmap, having translated rm_startblock
- * into the appropriate daddr units.
+ * into the appropriate daddr units. Pass in a nonzero @len_daddr if the
+ * length could be larger than rm_blockcount in struct xfs_rmap_irec.
*/
STATIC int
xfs_getfsmap_helper(
struct xfs_trans *tp,
struct xfs_getfsmap_info *info,
const struct xfs_rmap_irec *rec,
- xfs_daddr_t rec_daddr)
+ xfs_daddr_t rec_daddr,
+ xfs_daddr_t len_daddr)
{
struct xfs_fsmap fmr;
struct xfs_mount *mp = tp->t_mountp;
bool shared;
int error;
if (fatal_signal_pending(current))
return -EINTR;
+ if (len_daddr == 0)
+ len_daddr = XFS_FSB_TO_BB(mp, rec->rm_blockcount);
+
/*
* Filter out records that start before our startpoint, if the
* caller requested that.
*/
if (xfs_getfsmap_rec_before_start(info, rec, rec_daddr)) {
- rec_daddr += XFS_FSB_TO_BB(mp, rec->rm_blockcount);
+ rec_daddr += len_daddr;
if (info->next_daddr < rec_daddr)
info->next_daddr = rec_daddr;
return 0;
}
/* Are we just counting mappings? */
if (info->head->fmh_count == 0) {
if (info->head->fmh_entries == UINT_MAX)
return -ECANCELED;
if (rec_daddr > info->next_daddr)
info->head->fmh_entries++;
if (info->last)
return 0;
info->head->fmh_entries++;
- rec_daddr += XFS_FSB_TO_BB(mp, rec->rm_blockcount);
+ rec_daddr += len_daddr;
if (info->next_daddr < rec_daddr)
info->next_daddr = rec_daddr;
return 0;
}
@@ -336,73 +345,73 @@ xfs_getfsmap_helper(
fmr.fmr_physical = rec_daddr;
error = xfs_fsmap_owner_from_rmap(&fmr, rec);
if (error)
return error;
fmr.fmr_offset = XFS_FSB_TO_BB(mp, rec->rm_offset);
- fmr.fmr_length = XFS_FSB_TO_BB(mp, rec->rm_blockcount);
+ fmr.fmr_length = len_daddr;
if (rec->rm_flags & XFS_RMAP_UNWRITTEN)
fmr.fmr_flags |= FMR_OF_PREALLOC;
if (rec->rm_flags & XFS_RMAP_ATTR_FORK)
fmr.fmr_flags |= FMR_OF_ATTR_FORK;
if (rec->rm_flags & XFS_RMAP_BMBT_BLOCK)
fmr.fmr_flags |= FMR_OF_EXTENT_MAP;
if (fmr.fmr_flags == 0) {
error = xfs_getfsmap_is_shared(tp, info, rec, &shared);
if (error)
return error;
if (shared)
fmr.fmr_flags |= FMR_OF_SHARED;
}
xfs_getfsmap_format(mp, &fmr, info);
out:
- rec_daddr += XFS_FSB_TO_BB(mp, rec->rm_blockcount);
+ rec_daddr += len_daddr;
if (info->next_daddr < rec_daddr)
info->next_daddr = rec_daddr;
return 0;
}
/* Transform a rmapbt irec into a fsmap */
STATIC int
xfs_getfsmap_datadev_helper(
struct xfs_btree_cur *cur,
const struct xfs_rmap_irec *rec,
void *priv)
{
struct xfs_mount *mp = cur->bc_mp;
struct xfs_getfsmap_info *info = priv;
xfs_fsblock_t fsb;
xfs_daddr_t rec_daddr;
fsb = XFS_AGB_TO_FSB(mp, cur->bc_ag.pag->pag_agno, rec->rm_startblock);
rec_daddr = XFS_FSB_TO_DADDR(mp, fsb);
- return xfs_getfsmap_helper(cur->bc_tp, info, rec, rec_daddr);
+ return xfs_getfsmap_helper(cur->bc_tp, info, rec, rec_daddr, 0);
}
/* Transform a bnobt irec into a fsmap */
STATIC int
xfs_getfsmap_datadev_bnobt_helper(
struct xfs_btree_cur *cur,
const struct xfs_alloc_rec_incore *rec,
void *priv)
{
struct xfs_mount *mp = cur->bc_mp;
struct xfs_getfsmap_info *info = priv;
struct xfs_rmap_irec irec;
xfs_daddr_t rec_daddr;
rec_daddr = XFS_AGB_TO_DADDR(mp, cur->bc_ag.pag->pag_agno,
rec->ar_startblock);
irec.rm_startblock = rec->ar_startblock;
irec.rm_blockcount = rec->ar_blockcount;
irec.rm_owner = XFS_RMAP_OWN_NULL; /* "free" */
irec.rm_offset = 0;
irec.rm_flags = 0;
- return xfs_getfsmap_helper(cur->bc_tp, info, &irec, rec_daddr);
+ return xfs_getfsmap_helper(cur->bc_tp, info, &irec, rec_daddr, 0);
}
/* Set rmap flags based on the getfsmap flags */
static void
xfs_getfsmap_set_irec_flags(
@@ -425,133 +434,161 @@ xfs_getfsmap_logdev(
const struct xfs_fsmap *keys,
struct xfs_getfsmap_info *info)
{
struct xfs_mount *mp = tp->t_mountp;
struct xfs_rmap_irec rmap;
+ xfs_daddr_t rec_daddr, len_daddr;
+ xfs_fsblock_t start_fsb;
int error;
/* Set up search keys */
+ start_fsb = XFS_BB_TO_FSBT(mp,
+ keys[0].fmr_physical + keys[0].fmr_length);
info->low.rm_startblock = XFS_BB_TO_FSBT(mp, keys[0].fmr_physical);
info->low.rm_offset = XFS_BB_TO_FSBT(mp, keys[0].fmr_offset);
error = xfs_fsmap_owner_to_rmap(&info->low, keys);
if (error)
return error;
info->low.rm_blockcount = 0;
xfs_getfsmap_set_irec_flags(&info->low, &keys[0]);
+ /* Adjust the low key if we are continuing from where we left off. */
+ if (keys[0].fmr_length > 0)
+ info->low_daddr = XFS_FSB_TO_BB(mp, start_fsb);
+
error = xfs_fsmap_owner_to_rmap(&info->high, keys + 1);
if (error)
return error;
info->high.rm_startblock = -1U;
info->high.rm_owner = ULLONG_MAX;
info->high.rm_offset = ULLONG_MAX;
info->high.rm_blockcount = 0;
info->high.rm_flags = XFS_RMAP_KEY_FLAGS | XFS_RMAP_REC_FLAGS;
info->missing_owner = XFS_FMR_OWN_FREE;
trace_xfs_fsmap_low_key(mp, info->dev, NULLAGNUMBER, &info->low);
trace_xfs_fsmap_high_key(mp, info->dev, NULLAGNUMBER, &info->high);
- if (keys[0].fmr_physical > 0)
+ if (start_fsb > 0)
return 0;
/* Fabricate an rmap entry for the external log device. */
rmap.rm_startblock = 0;
rmap.rm_blockcount = mp->m_sb.sb_logblocks;
rmap.rm_owner = XFS_RMAP_OWN_LOG;
rmap.rm_offset = 0;
rmap.rm_flags = 0;
- return xfs_getfsmap_helper(tp, info, &rmap, 0);
+ rec_daddr = XFS_FSB_TO_BB(mp, rmap.rm_startblock);
+ len_daddr = XFS_FSB_TO_BB(mp, rmap.rm_blockcount);
+ return xfs_getfsmap_helper(tp, info, &rmap, rec_daddr, len_daddr);
}
#ifdef CONFIG_XFS_RT
/* Transform a rtbitmap "record" into a fsmap */
STATIC int
xfs_getfsmap_rtdev_rtbitmap_helper(
struct xfs_mount *mp,
struct xfs_trans *tp,
const struct xfs_rtalloc_rec *rec,
void *priv)
{
struct xfs_getfsmap_info *info = priv;
struct xfs_rmap_irec irec;
- xfs_daddr_t rec_daddr;
+ xfs_rtblock_t rtbno;
+ xfs_daddr_t rec_daddr, len_daddr;
+
+ rtbno = rec->ar_startext * mp->m_sb.sb_rextsize;
+ rec_daddr = XFS_FSB_TO_BB(mp, rtbno);
+ irec.rm_startblock = rtbno;
+
+ rtbno = rec->ar_extcount * mp->m_sb.sb_rextsize;
+ len_daddr = XFS_FSB_TO_BB(mp, rtbno);
+ irec.rm_blockcount = rtbno;
- irec.rm_startblock = rec->ar_startext * mp->m_sb.sb_rextsize;
- rec_daddr = XFS_FSB_TO_BB(mp, irec.rm_startblock);
- irec.rm_blockcount = rec->ar_extcount * mp->m_sb.sb_rextsize;
irec.rm_owner = XFS_RMAP_OWN_NULL; /* "free" */
irec.rm_offset = 0;
irec.rm_flags = 0;
- return xfs_getfsmap_helper(tp, info, &irec, rec_daddr);
+ return xfs_getfsmap_helper(tp, info, &irec, rec_daddr, len_daddr);
}
/* Execute a getfsmap query against the realtime device. */
STATIC int
__xfs_getfsmap_rtdev(
struct xfs_trans *tp,
const struct xfs_fsmap *keys,
int (*query_fn)(struct xfs_trans *,
- struct xfs_getfsmap_info *),
+ struct xfs_getfsmap_info *,
+ xfs_rtblock_t start_rtb,
+ xfs_rtblock_t end_rtb),
struct xfs_getfsmap_info *info)
{
struct xfs_mount *mp = tp->t_mountp;
- xfs_fsblock_t start_fsb;
- xfs_fsblock_t end_fsb;
+ xfs_rtblock_t start_rtb;
+ xfs_rtblock_t end_rtb;
uint64_t eofs;
int error = 0;
eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_rblocks);
if (keys[0].fmr_physical >= eofs)
return 0;
- start_fsb = XFS_BB_TO_FSBT(mp, keys[0].fmr_physical);
- end_fsb = XFS_BB_TO_FSB(mp, min(eofs - 1, keys[1].fmr_physical));
+ start_rtb = XFS_BB_TO_FSBT(mp,
+ keys[0].fmr_physical + keys[0].fmr_length);
+ end_rtb = XFS_BB_TO_FSB(mp, min(eofs - 1, keys[1].fmr_physical));
/* Set up search keys */
- info->low.rm_startblock = start_fsb;
+ info->low.rm_startblock = start_rtb;
error = xfs_fsmap_owner_to_rmap(&info->low, &keys[0]);
if (error)
return error;
info->low.rm_offset = XFS_BB_TO_FSBT(mp, keys[0].fmr_offset);
info->low.rm_blockcount = 0;
xfs_getfsmap_set_irec_flags(&info->low, &keys[0]);
- info->high.rm_startblock = end_fsb;
+ /* Adjust the low key if we are continuing from where we left off. */
+ if (keys[0].fmr_length > 0) {
+ info->low_daddr = XFS_FSB_TO_BB(mp, start_rtb);
+ if (info->low_daddr >= eofs)
+ return 0;
+ }
+
+ info->high.rm_startblock = end_rtb;
error = xfs_fsmap_owner_to_rmap(&info->high, &keys[1]);
if (error)
return error;
info->high.rm_offset = XFS_BB_TO_FSBT(mp, keys[1].fmr_offset);
info->high.rm_blockcount = 0;
xfs_getfsmap_set_irec_flags(&info->high, &keys[1]);
trace_xfs_fsmap_low_key(mp, info->dev, NULLAGNUMBER, &info->low);
trace_xfs_fsmap_high_key(mp, info->dev, NULLAGNUMBER, &info->high);
- return query_fn(tp, info);
+ return query_fn(tp, info, start_rtb, end_rtb);
}
/* Actually query the realtime bitmap. */
STATIC int
xfs_getfsmap_rtdev_rtbitmap_query(
struct xfs_trans *tp,
- struct xfs_getfsmap_info *info)
+ struct xfs_getfsmap_info *info,
+ xfs_rtblock_t start_rtb,
+ xfs_rtblock_t end_rtb)
{
struct xfs_rtalloc_rec alow = { 0 };
struct xfs_rtalloc_rec ahigh = { 0 };
struct xfs_mount *mp = tp->t_mountp;
int error;
xfs_ilock(mp->m_rbmip, XFS_ILOCK_SHARED);
/*
* Set up query parameters to return free rtextents covering the range
* we want.
*/
- alow.ar_startext = info->low.rm_startblock;
- ahigh.ar_startext = info->high.rm_startblock;
+ alow.ar_startext = start_rtb;
+ ahigh.ar_startext = end_rtb;
do_div(alow.ar_startext, mp->m_sb.sb_rextsize);
if (do_div(ahigh.ar_startext, mp->m_sb.sb_rextsize))
ahigh.ar_startext++;
error = xfs_rtalloc_query_range(mp, tp, &alow, &ahigh,
xfs_getfsmap_rtdev_rtbitmap_helper, info);
@@ -986,10 +1023,11 @@ xfs_getfsmap(
break;
info.dev = handlers[i].dev;
info.last = false;
info.pag = NULL;
+ info.low_daddr = -1ULL;
info.low.rm_blockcount = 0;
error = handlers[i].fn(tp, dkeys, &info);
if (error)
break;
xfs_trans_cancel(tp);
--
2.50.0.rc1.591.g9c95f17f64-goog
* [PATCH 6.1 03/23] xfs: fix getfsmap reporting past the last rt extent
From: Leah Rumancik @ 2025-06-11 21:01 UTC
To: stable
Cc: xfs-stable, chandan.babu, catherine.hoang, djwong, Dave Chinner,
Leah Rumancik
From: "Darrick J. Wong" <djwong@kernel.org>
[ Upstream commit d898137d789cac9ebe5eed9957e4cf25122ca524 ]
The realtime section ends at the last rt extent. If the user configures
the rt geometry with an extent size that is not an integer factor of the
number of rt blocks, it's possible for there to be rt blocks past the
end of the last rt extent. These tail blocks cannot ever be allocated
and will cause corruption reports if the last extent coincides with the
end of an rt bitmap block, so do not consider them for the
GETFSMAP output.
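A small worked example in C may help here. It is a sketch with
hypothetical names, assuming the extent count is the block count divided
by the extent size and rounded down, and assuming 4096-byte blocks (eight
512-byte sectors each) for the unit conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: 4096-byte blocks = 8 sectors of 512 bytes. */
#define DEMO_BLOCK_SHIFT 3

/* Number of whole rt extents that fit in @rblocks (rounded down). */
static uint64_t demo_rextents(uint64_t rblocks, uint32_t rextsize)
{
	assert(rextsize > 0);
	return rblocks / rextsize;
}

/* End of the rt section in sectors, counting every rt block. */
static uint64_t demo_eofs_by_blocks(uint64_t rblocks)
{
	return rblocks << DEMO_BLOCK_SHIFT;
}

/* End of the rt section in sectors, counting whole rt extents only. */
static uint64_t demo_eofs_by_extents(uint64_t rextents, uint32_t rextsize)
{
	return (rextents * rextsize) << DEMO_BLOCK_SHIFT;
}
```

For 1003 rt blocks and an extent size of 4, only 250 whole extents exist,
so 3 tail blocks (24 sectors) lie beyond the last extent. The extent-based
bound excludes them and the block-based bound does not.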
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/xfs/xfs_fsmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c
index 5039d330ef98..7b72992c14d9 100644
--- a/fs/xfs/xfs_fsmap.c
+++ b/fs/xfs/xfs_fsmap.c
@@ -527,11 +527,11 @@ __xfs_getfsmap_rtdev(
xfs_rtblock_t start_rtb;
xfs_rtblock_t end_rtb;
uint64_t eofs;
int error = 0;
- eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_rblocks);
+ eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_rextents * mp->m_sb.sb_rextsize);
if (keys[0].fmr_physical >= eofs)
return 0;
start_rtb = XFS_BB_TO_FSBT(mp,
keys[0].fmr_physical + keys[0].fmr_length);
end_rtb = XFS_BB_TO_FSB(mp, min(eofs - 1, keys[1].fmr_physical));
--
2.50.0.rc1.591.g9c95f17f64-goog
* [PATCH 6.1 04/23] xfs: clean up the rtbitmap fsmap backend
From: Leah Rumancik @ 2025-06-11 21:01 UTC
To: stable
Cc: xfs-stable, chandan.babu, catherine.hoang, djwong, Dave Chinner,
Leah Rumancik
From: "Darrick J. Wong" <djwong@kernel.org>
[ Upstream commit f045dd00328d78f25d64913285f4547f772d13e2 ]
The rtbitmap fsmap backend doesn't query the rmapbt, so it's wasteful to
spend time initializing the rmap_irec objects. Worse yet, the logic to
query the rtbitmap is spread across three separate functions, which is
unnecessarily difficult to follow.
Compute the start rtextent that we want from keys[0] directly and
combine the functions to avoid passing parameters around everywhere, and
consolidate all the logic into a single function. At one point many
years ago I intended to use __xfs_getfsmap_rtdev as the launching point
for realtime rmapbt queries, but this hasn't been the case for a long
time.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/xfs/xfs_fsmap.c | 62 +++++++---------------------------------------
fs/xfs/xfs_trace.h | 25 +++++++++++++++++++
2 files changed, 34 insertions(+), 53 deletions(-)
diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c
index 7b72992c14d9..202f162515bd 100644
--- a/fs/xfs/xfs_fsmap.c
+++ b/fs/xfs/xfs_fsmap.c
@@ -510,76 +510,44 @@ xfs_getfsmap_rtdev_rtbitmap_helper(
irec.rm_flags = 0;
return xfs_getfsmap_helper(tp, info, &irec, rec_daddr, len_daddr);
}
-/* Execute a getfsmap query against the realtime device. */
+/* Execute a getfsmap query against the realtime device rtbitmap. */
STATIC int
-__xfs_getfsmap_rtdev(
+xfs_getfsmap_rtdev_rtbitmap(
struct xfs_trans *tp,
const struct xfs_fsmap *keys,
- int (*query_fn)(struct xfs_trans *,
- struct xfs_getfsmap_info *,
- xfs_rtblock_t start_rtb,
- xfs_rtblock_t end_rtb),
struct xfs_getfsmap_info *info)
{
+
+ struct xfs_rtalloc_rec alow = { 0 };
+ struct xfs_rtalloc_rec ahigh = { 0 };
struct xfs_mount *mp = tp->t_mountp;
xfs_rtblock_t start_rtb;
xfs_rtblock_t end_rtb;
uint64_t eofs;
- int error = 0;
+ int error;
eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_rextents * mp->m_sb.sb_rextsize);
if (keys[0].fmr_physical >= eofs)
return 0;
start_rtb = XFS_BB_TO_FSBT(mp,
keys[0].fmr_physical + keys[0].fmr_length);
end_rtb = XFS_BB_TO_FSB(mp, min(eofs - 1, keys[1].fmr_physical));
- /* Set up search keys */
- info->low.rm_startblock = start_rtb;
- error = xfs_fsmap_owner_to_rmap(&info->low, &keys[0]);
- if (error)
- return error;
- info->low.rm_offset = XFS_BB_TO_FSBT(mp, keys[0].fmr_offset);
- info->low.rm_blockcount = 0;
- xfs_getfsmap_set_irec_flags(&info->low, &keys[0]);
+ info->missing_owner = XFS_FMR_OWN_UNKNOWN;
/* Adjust the low key if we are continuing from where we left off. */
if (keys[0].fmr_length > 0) {
info->low_daddr = XFS_FSB_TO_BB(mp, start_rtb);
if (info->low_daddr >= eofs)
return 0;
}
- info->high.rm_startblock = end_rtb;
- error = xfs_fsmap_owner_to_rmap(&info->high, &keys[1]);
- if (error)
- return error;
- info->high.rm_offset = XFS_BB_TO_FSBT(mp, keys[1].fmr_offset);
- info->high.rm_blockcount = 0;
- xfs_getfsmap_set_irec_flags(&info->high, &keys[1]);
-
- trace_xfs_fsmap_low_key(mp, info->dev, NULLAGNUMBER, &info->low);
- trace_xfs_fsmap_high_key(mp, info->dev, NULLAGNUMBER, &info->high);
-
- return query_fn(tp, info, start_rtb, end_rtb);
-}
-
-/* Actually query the realtime bitmap. */
-STATIC int
-xfs_getfsmap_rtdev_rtbitmap_query(
- struct xfs_trans *tp,
- struct xfs_getfsmap_info *info,
- xfs_rtblock_t start_rtb,
- xfs_rtblock_t end_rtb)
-{
- struct xfs_rtalloc_rec alow = { 0 };
- struct xfs_rtalloc_rec ahigh = { 0 };
- struct xfs_mount *mp = tp->t_mountp;
- int error;
+ trace_xfs_fsmap_low_key_linear(mp, info->dev, start_rtb);
+ trace_xfs_fsmap_high_key_linear(mp, info->dev, end_rtb);
xfs_ilock(mp->m_rbmip, XFS_ILOCK_SHARED);
/*
* Set up query parameters to return free rtextents covering the range
@@ -607,22 +575,10 @@ xfs_getfsmap_rtdev_rtbitmap_query(
goto err;
err:
xfs_iunlock(mp->m_rbmip, XFS_ILOCK_SHARED);
return error;
}
-
-/* Execute a getfsmap query against the realtime device rtbitmap. */
-STATIC int
-xfs_getfsmap_rtdev_rtbitmap(
- struct xfs_trans *tp,
- const struct xfs_fsmap *keys,
- struct xfs_getfsmap_info *info)
-{
- info->missing_owner = XFS_FMR_OWN_UNKNOWN;
- return __xfs_getfsmap_rtdev(tp, keys, xfs_getfsmap_rtdev_rtbitmap_query,
- info);
-}
#endif /* CONFIG_XFS_RT */
/* Execute a getfsmap query against the regular data device. */
STATIC int
__xfs_getfsmap_datadev(
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 20e2ec8b73aa..a9e3081b6625 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3489,10 +3489,35 @@ DEFINE_EVENT(xfs_fsmap_class, name, \
TP_ARGS(mp, keydev, agno, rmap))
DEFINE_FSMAP_EVENT(xfs_fsmap_low_key);
DEFINE_FSMAP_EVENT(xfs_fsmap_high_key);
DEFINE_FSMAP_EVENT(xfs_fsmap_mapping);
+DECLARE_EVENT_CLASS(xfs_fsmap_linear_class,
+ TP_PROTO(struct xfs_mount *mp, u32 keydev, uint64_t bno),
+ TP_ARGS(mp, keydev, bno),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(dev_t, keydev)
+ __field(xfs_fsblock_t, bno)
+ ),
+ TP_fast_assign(
+ __entry->dev = mp->m_super->s_dev;
+ __entry->keydev = new_decode_dev(keydev);
+ __entry->bno = bno;
+ ),
+ TP_printk("dev %d:%d keydev %d:%d bno 0x%llx",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ MAJOR(__entry->keydev), MINOR(__entry->keydev),
+ __entry->bno)
+)
+#define DEFINE_FSMAP_LINEAR_EVENT(name) \
+DEFINE_EVENT(xfs_fsmap_linear_class, name, \
+ TP_PROTO(struct xfs_mount *mp, u32 keydev, uint64_t bno), \
+ TP_ARGS(mp, keydev, bno))
+DEFINE_FSMAP_LINEAR_EVENT(xfs_fsmap_low_key_linear);
+DEFINE_FSMAP_LINEAR_EVENT(xfs_fsmap_high_key_linear);
+
DECLARE_EVENT_CLASS(xfs_getfsmap_class,
TP_PROTO(struct xfs_mount *mp, struct xfs_fsmap *fsmap),
TP_ARGS(mp, fsmap),
TP_STRUCT__entry(
__field(dev_t, dev)
--
2.50.0.rc1.591.g9c95f17f64-goog
* [PATCH 6.1 05/23] xfs: fix logdev fsmap query result filtering
2025-06-11 21:01 [PATCH 6.1 00/23] fixes from 6.11 for 6.1.y Leah Rumancik
` (3 preceding siblings ...)
2025-06-11 21:01 ` [PATCH 6.1 04/23] xfs: clean up the rtbitmap fsmap backend Leah Rumancik
@ 2025-06-11 21:01 ` Leah Rumancik
2025-06-11 21:01 ` [PATCH 6.1 06/23] xfs: validate fsmap offsets specified in the query keys Leah Rumancik
` (18 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: Leah Rumancik @ 2025-06-11 21:01 UTC (permalink / raw)
To: stable
Cc: xfs-stable, chandan.babu, catherine.hoang, djwong, Dave Chinner,
Leah Rumancik
From: "Darrick J. Wong" <djwong@kernel.org>
[ Upstream commit a949a1c2a198e048630a8b0741a99b85a5d88136 ]
The external log device fsmap backend doesn't have an rmapbt to query,
so it's wasteful to spend time initializing the rmap_irec objects.
Worse yet, the log could (someday) be longer than 2^32 fsblocks, so
using the rmap irec structure will result in integer overflows.
Fix this mess by computing the start address that we want from keys[0]
directly, and use the daddr-based record filtering algorithm that we
also use for rtbitmap queries.
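
To illustrate the overflow concern above, here is a minimal userspace sketch. It is not kernel code; every name in it (demo_store_narrow, demo_rec_before_start) is hypothetical. It assumes only that the rmap irec start block is a 32-bit field, so a block number at or beyond 2^32 loses its upper bits, while a 64-bit sector (daddr) comparison does not. The exact predicate the kernel uses for daddr filtering is not shown in this patch, so the comparison below is an assumption about its general shape.

```c
#include <stdbool.h>
#include <stdint.h>

/* Storing a 64-bit block number in a 32-bit field drops the high bits. */
static uint32_t demo_store_narrow(uint64_t fsb)
{
	return (uint32_t)fsb;
}

/*
 * Assumed shape of a daddr-based filter: a record that ends at or before
 * the low address was already reported and can be skipped. All arithmetic
 * stays in 64 bits, so large devices do not wrap.
 */
static bool demo_rec_before_start(uint64_t rec_daddr, uint64_t len_daddr,
				  uint64_t low_daddr)
{
	return rec_daddr + len_daddr <= low_daddr;
}
```

With a block number of 2^32 + 5, demo_store_narrow() yields 5, which is the kind of silent wraparound the message warns about.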
Fixes: e89c041338ed ("xfs: implement the GETFSMAP ioctl")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/xfs/xfs_fsmap.c | 30 ++++++++----------------------
1 file changed, 8 insertions(+), 22 deletions(-)
diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c
index 202f162515bd..cdd806d80b7c 100644
--- a/fs/xfs/xfs_fsmap.c
+++ b/fs/xfs/xfs_fsmap.c
@@ -435,40 +435,26 @@ xfs_getfsmap_logdev(
struct xfs_getfsmap_info *info)
{
struct xfs_mount *mp = tp->t_mountp;
struct xfs_rmap_irec rmap;
xfs_daddr_t rec_daddr, len_daddr;
- xfs_fsblock_t start_fsb;
- int error;
+ xfs_fsblock_t start_fsb, end_fsb;
+ uint64_t eofs;
- /* Set up search keys */
+ eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_logblocks);
+ if (keys[0].fmr_physical >= eofs)
+ return 0;
start_fsb = XFS_BB_TO_FSBT(mp,
keys[0].fmr_physical + keys[0].fmr_length);
- info->low.rm_startblock = XFS_BB_TO_FSBT(mp, keys[0].fmr_physical);
- info->low.rm_offset = XFS_BB_TO_FSBT(mp, keys[0].fmr_offset);
- error = xfs_fsmap_owner_to_rmap(&info->low, keys);
- if (error)
- return error;
- info->low.rm_blockcount = 0;
- xfs_getfsmap_set_irec_flags(&info->low, &keys[0]);
+ end_fsb = XFS_BB_TO_FSB(mp, min(eofs - 1, keys[1].fmr_physical));
/* Adjust the low key if we are continuing from where we left off. */
if (keys[0].fmr_length > 0)
info->low_daddr = XFS_FSB_TO_BB(mp, start_fsb);
- error = xfs_fsmap_owner_to_rmap(&info->high, keys + 1);
- if (error)
- return error;
- info->high.rm_startblock = -1U;
- info->high.rm_owner = ULLONG_MAX;
- info->high.rm_offset = ULLONG_MAX;
- info->high.rm_blockcount = 0;
- info->high.rm_flags = XFS_RMAP_KEY_FLAGS | XFS_RMAP_REC_FLAGS;
- info->missing_owner = XFS_FMR_OWN_FREE;
-
- trace_xfs_fsmap_low_key(mp, info->dev, NULLAGNUMBER, &info->low);
- trace_xfs_fsmap_high_key(mp, info->dev, NULLAGNUMBER, &info->high);
+ trace_xfs_fsmap_low_key_linear(mp, info->dev, start_fsb);
+ trace_xfs_fsmap_high_key_linear(mp, info->dev, end_fsb);
if (start_fsb > 0)
return 0;
/* Fabricate an rmap entry for the external log device. */
--
2.50.0.rc1.591.g9c95f17f64-goog
* [PATCH 6.1 06/23] xfs: validate fsmap offsets specified in the query keys
2025-06-11 21:01 [PATCH 6.1 00/23] fixes from 6.11 for 6.1.y Leah Rumancik
` (4 preceding siblings ...)
2025-06-11 21:01 ` [PATCH 6.1 05/23] xfs: fix logdev fsmap query result filtering Leah Rumancik
@ 2025-06-11 21:01 ` Leah Rumancik
2025-06-11 21:01 ` [PATCH 6.1 07/23] xfs: fix xfs_btree_query_range callers to initialize btree rec fully Leah Rumancik
` (17 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: Leah Rumancik @ 2025-06-11 21:01 UTC (permalink / raw)
To: stable
Cc: xfs-stable, chandan.babu, catherine.hoang, djwong, Dave Chinner,
Leah Rumancik
From: "Darrick J. Wong" <djwong@kernel.org>
[ Upstream commit 3ee9351e74907fe3acb0721c315af25b05dc87da ]
Improve the validation of the fsmap offset fields in the query keys and
move the validation to the top of the function now that we have pushed
the low key adjustment code downwards.
Also fix some indenting issues that aren't worth a separate patch.
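
The offset checks added to xfs_getfsmap_check_keys can be sketched in userspace as follows. This is an illustration only: struct demo_key, the DEMO_OF_* flag values, and demo_offsets_valid are hypothetical stand-ins, and the flag values are placeholders rather than the real FMR_OF_* constants. The logic mirrors the three new tests in the diff: a low key for a special owner or extent map must have a zero offset, a high key with such flags must have an offset of zero or all ones, and the high key length must be zero or all ones.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_OF_EXTENT_MAP	0x4u	/* placeholder value */
#define DEMO_OF_SPECIAL_OWNER	0x10u	/* placeholder value */

/* Hypothetical, trimmed-down stand-in for an fsmap query key. */
struct demo_key {
	uint32_t flags;
	uint64_t offset;
	uint64_t length;
};

static bool demo_offsets_valid(const struct demo_key *low,
			       const struct demo_key *high)
{
	const uint32_t no_offset = DEMO_OF_SPECIAL_OWNER | DEMO_OF_EXTENT_MAP;

	/* Special owners and extent maps have no meaningful file offset. */
	if ((low->flags & no_offset) && low->offset)
		return false;

	/* An all-ones high key means "no upper bound" and is accepted. */
	if (high->flags != UINT32_MAX && (high->flags & no_offset) &&
	    high->offset && high->offset != UINT64_MAX)
		return false;

	/* The high key length must be unset or all ones. */
	if (high->length && high->length != UINT64_MAX)
		return false;

	return true;
}
```

A caller would run this before any per-device setup, which matches the patch moving the check to the top of xfs_getfsmap.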
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/xfs/xfs_fsmap.c | 30 +++++++++++++++++++-----------
1 file changed, 19 insertions(+), 11 deletions(-)
diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c
index cdd806d80b7c..d10f2c719220 100644
--- a/fs/xfs/xfs_fsmap.c
+++ b/fs/xfs/xfs_fsmap.c
@@ -800,10 +800,23 @@ xfs_getfsmap_is_valid_device(
STATIC bool
xfs_getfsmap_check_keys(
struct xfs_fsmap *low_key,
struct xfs_fsmap *high_key)
{
+ if (low_key->fmr_flags & (FMR_OF_SPECIAL_OWNER | FMR_OF_EXTENT_MAP)) {
+ if (low_key->fmr_offset)
+ return false;
+ }
+ if (high_key->fmr_flags != -1U &&
+ (high_key->fmr_flags & (FMR_OF_SPECIAL_OWNER |
+ FMR_OF_EXTENT_MAP))) {
+ if (high_key->fmr_offset && high_key->fmr_offset != -1ULL)
+ return false;
+ }
+ if (high_key->fmr_length && high_key->fmr_length != -1ULL)
+ return false;
+
if (low_key->fmr_device > high_key->fmr_device)
return false;
if (low_key->fmr_device < high_key->fmr_device)
return true;
@@ -843,39 +856,41 @@ xfs_getfsmap_check_keys(
*
* Key to Confusion
* ----------------
* There are multiple levels of keys and counters at work here:
* xfs_fsmap_head.fmh_keys -- low and high fsmap keys passed in;
- * these reflect fs-wide sector addrs.
+ * these reflect fs-wide sector addrs.
* dkeys -- fmh_keys used to query each device;
- * these are fmh_keys but w/ the low key
- * bumped up by fmr_length.
+ * these are fmh_keys but w/ the low key
+ * bumped up by fmr_length.
* xfs_getfsmap_info.next_daddr -- next disk addr we expect to see; this
* is how we detect gaps in the fsmap
records and report them.
* xfs_getfsmap_info.low/high -- per-AG low/high keys computed from
- * dkeys; used to query the metadata.
+ * dkeys; used to query the metadata.
*/
int
xfs_getfsmap(
struct xfs_mount *mp,
struct xfs_fsmap_head *head,
struct fsmap *fsmap_recs)
{
struct xfs_trans *tp = NULL;
struct xfs_fsmap dkeys[2]; /* per-dev keys */
struct xfs_getfsmap_dev handlers[XFS_GETFSMAP_DEVS];
struct xfs_getfsmap_info info = { NULL };
bool use_rmap;
int i;
int error = 0;
if (head->fmh_iflags & ~FMH_IF_VALID)
return -EINVAL;
if (!xfs_getfsmap_is_valid_device(mp, &head->fmh_keys[0]) ||
!xfs_getfsmap_is_valid_device(mp, &head->fmh_keys[1]))
return -EINVAL;
+ if (!xfs_getfsmap_check_keys(&head->fmh_keys[0], &head->fmh_keys[1]))
+ return -EINVAL;
use_rmap = xfs_has_rmapbt(mp) &&
has_capability_noaudit(current, CAP_SYS_ADMIN);
head->fmh_entries = 0;
@@ -917,19 +932,12 @@ xfs_getfsmap(
* all other low key mapping types (attr blocks, metadata), each
* fsmap backend bumps the physical offset as there can be no
* other mapping for the same physical block range.
*/
dkeys[0] = head->fmh_keys[0];
- if (dkeys[0].fmr_flags & (FMR_OF_SPECIAL_OWNER | FMR_OF_EXTENT_MAP)) {
- if (dkeys[0].fmr_offset)
- return -EINVAL;
- }
memset(&dkeys[1], 0xFF, sizeof(struct xfs_fsmap));
- if (!xfs_getfsmap_check_keys(dkeys, &head->fmh_keys[1]))
- return -EINVAL;
-
info.next_daddr = head->fmh_keys[0].fmr_physical +
head->fmh_keys[0].fmr_length;
info.fsmap_recs = fsmap_recs;
info.head = head;
--
2.50.0.rc1.591.g9c95f17f64-goog
* [PATCH 6.1 07/23] xfs: fix xfs_btree_query_range callers to initialize btree rec fully
2025-06-11 21:01 [PATCH 6.1 00/23] fixes from 6.11 for 6.1.y Leah Rumancik
` (5 preceding siblings ...)
2025-06-11 21:01 ` [PATCH 6.1 06/23] xfs: validate fsmap offsets specified in the query keys Leah Rumancik
@ 2025-06-11 21:01 ` Leah Rumancik
2025-06-11 21:01 ` [PATCH 6.1 08/23] xfs: fix an agbno overflow in __xfs_getfsmap_datadev Leah Rumancik
` (16 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: Leah Rumancik @ 2025-06-11 21:01 UTC (permalink / raw)
To: stable
Cc: xfs-stable, chandan.babu, catherine.hoang, djwong, Dave Chinner,
Leah Rumancik
From: "Darrick J. Wong" <djwong@kernel.org>
[ Upstream commit 75dc0345312221971903b2e28279b7e24b7dbb1b ]
Use struct initializers to ensure that the xfs_btree_irecs passed into
the query_range function are completely initialized. No functional
changes, just closing some sloppy hygiene.
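
As a small illustration of why initializers help here, the sketch below uses hypothetical types (struct demo_rec, struct demo_query) rather than the real xfs_btree_irec union. It relies on the C rule that struct members not named in a brace-enclosed initializer are zero-initialized, so no field is left holding stack garbage even if a later change adds members.

```c
#include <stdint.h>

/* Hypothetical record type standing in for an incore btree record. */
struct demo_rec {
	uint32_t startblock;
	uint32_t blockcount;
	uint32_t domain;
};

/* Hypothetical query context; 'flags' is never set explicitly. */
struct demo_query {
	struct demo_rec low;
	struct demo_rec high;
	void *priv;
	uint32_t flags;
};

static struct demo_query demo_make_query(void *priv)
{
	/* Members not named in the initializer are zero-initialized. */
	struct demo_query q = {
		.low.domain = 1,
		.high.domain = 1,
		.high.startblock = UINT32_MAX,
		.priv = priv,
	};

	return q;
}
```

Compared with assigning fields one at a time after declaration, this form keeps the declaration and the complete initial state in one place.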
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/xfs/libxfs/xfs_alloc.c | 10 +++-------
fs/xfs/libxfs/xfs_refcount.c | 13 +++++++------
fs/xfs/libxfs/xfs_rmap.c | 10 +++-------
3 files changed, 13 insertions(+), 20 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index c08265f19136..cd5b197d7046 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -3543,19 +3543,15 @@ xfs_alloc_query_range(
const struct xfs_alloc_rec_incore *low_rec,
const struct xfs_alloc_rec_incore *high_rec,
xfs_alloc_query_range_fn fn,
void *priv)
{
- union xfs_btree_irec low_brec;
- union xfs_btree_irec high_brec;
- struct xfs_alloc_query_range_info query;
+ union xfs_btree_irec low_brec = { .a = *low_rec };
+ union xfs_btree_irec high_brec = { .a = *high_rec };
+ struct xfs_alloc_query_range_info query = { .priv = priv, .fn = fn };
ASSERT(cur->bc_btnum == XFS_BTNUM_BNO);
- low_brec.a = *low_rec;
- high_brec.a = *high_rec;
- query.priv = priv;
- query.fn = fn;
return xfs_btree_query_range(cur, &low_brec, &high_brec,
xfs_alloc_query_range_helper, &query);
}
/* Find all free space records. */
diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
index 4ec7a81dd3ef..7e16e76fd2e1 100644
--- a/fs/xfs/libxfs/xfs_refcount.c
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -1901,12 +1901,17 @@ xfs_refcount_recover_cow_leftovers(
struct xfs_trans *tp;
struct xfs_btree_cur *cur;
struct xfs_buf *agbp;
struct xfs_refcount_recovery *rr, *n;
struct list_head debris;
- union xfs_btree_irec low;
- union xfs_btree_irec high;
+ union xfs_btree_irec low = {
+ .rc.rc_domain = XFS_REFC_DOMAIN_COW,
+ };
+ union xfs_btree_irec high = {
+ .rc.rc_domain = XFS_REFC_DOMAIN_COW,
+ .rc.rc_startblock = -1U,
+ };
xfs_fsblock_t fsb;
int error;
/* reflink filesystems mustn't have AGs larger than 2^31-1 blocks */
BUILD_BUG_ON(XFS_MAX_CRC_AG_BLOCKS >= XFS_REFC_COWFLAG);
@@ -1933,14 +1938,10 @@ xfs_refcount_recover_cow_leftovers(
if (error)
goto out_trans;
cur = xfs_refcountbt_init_cursor(mp, tp, agbp, pag);
/* Find all the leftover CoW staging extents. */
- memset(&low, 0, sizeof(low));
- memset(&high, 0, sizeof(high));
- low.rc.rc_domain = high.rc.rc_domain = XFS_REFC_DOMAIN_COW;
- high.rc.rc_startblock = -1U;
error = xfs_btree_query_range(cur, &low, &high,
xfs_refcount_recover_extent, &debris);
xfs_btree_del_cursor(cur, error);
xfs_trans_brelse(tp, agbp);
xfs_trans_cancel(tp);
diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index b56aca1e7c66..95d3599561ce 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -2335,18 +2335,14 @@ xfs_rmap_query_range(
const struct xfs_rmap_irec *low_rec,
const struct xfs_rmap_irec *high_rec,
xfs_rmap_query_range_fn fn,
void *priv)
{
- union xfs_btree_irec low_brec;
- union xfs_btree_irec high_brec;
- struct xfs_rmap_query_range_info query;
+ union xfs_btree_irec low_brec = { .r = *low_rec };
+ union xfs_btree_irec high_brec = { .r = *high_rec };
+ struct xfs_rmap_query_range_info query = { .priv = priv, .fn = fn };
- low_brec.r = *low_rec;
- high_brec.r = *high_rec;
- query.priv = priv;
- query.fn = fn;
return xfs_btree_query_range(cur, &low_brec, &high_brec,
xfs_rmap_query_range_helper, &query);
}
/* Find all rmaps. */
--
2.50.0.rc1.591.g9c95f17f64-goog
* [PATCH 6.1 08/23] xfs: fix an agbno overflow in __xfs_getfsmap_datadev
2025-06-11 21:01 [PATCH 6.1 00/23] fixes from 6.11 for 6.1.y Leah Rumancik
` (6 preceding siblings ...)
2025-06-11 21:01 ` [PATCH 6.1 07/23] xfs: fix xfs_btree_query_range callers to initialize btree rec fully Leah Rumancik
@ 2025-06-11 21:01 ` Leah Rumancik
2025-06-11 21:01 ` [PATCH 6.1 09/23] xfs: fix the contact address for the sysfs ABI documentation Leah Rumancik
` (15 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: Leah Rumancik @ 2025-06-11 21:01 UTC (permalink / raw)
To: stable
Cc: xfs-stable, chandan.babu, catherine.hoang, djwong, Dave Chinner,
Dave Chinner, Leah Rumancik
From: "Darrick J. Wong" <djwong@kernel.org>
[ Upstream commit cfa2df68b7ceb49ac9eb2d295ab0c5974dbf17e7 ]
Dave Chinner reported that xfs/273 fails if the AG size happens to be an
exact power of two. I traced this to an agbno integer overflow when the
current GETFSMAP call is a continuation of a previous GETFSMAP call, and
the last record returned was non-shareable space at the end of an AG.
__xfs_getfsmap_datadev sets up a data device query by converting the
incoming fmr_physical into an xfs_fsblock_t and cracking it into an agno
and agbno pair. In the (failing) case where fmr_blockcount of the
low key is nonzero and the record was for a non-shareable extent, it
will add fmr_blockcount to start_fsb and info->low.rm_startblock.
If the low key was actually the last record for that AG, then this
addition causes info->low.rm_startblock to point beyond EOAG. When the
rmapbt range query starts, it'll return an empty set, and fsmap moves on
to the next AG.
Or so I thought. Remember how we added to start_fsb?
If agsize < 1<<agblklog, start_fsb points to the same AG as the original
fmr_physical from the low key. We run the rmapbt query, which returns
nothing, so getfsmap zeroes info->low and moves on to the next AG.
If agsize == 1<<agblklog, start_fsb now points to the next AG. We run
the rmapbt query on the next AG with the excessively large
rm_startblock. If this next AG is actually the last AG, we'll set
info->high to EOFS (which now has a lower rm_startblock than
info->low), and the ranged btree query code will return -EINVAL. If
it's not the last AG, we ignore all records for the intermediate AGs.
Oops.
Fix this by decoding start_fsb into agno and agbno only after making
adjustments to start_fsb. This means that info->low.rm_startblock will
always be set to a valid agbno, and we always start the rmapbt iteration
in the correct AG.
While we're at it, fix the predicate for determining if an fsmap record
represents non-shareable space to include file data on pre-reflink
filesystems.
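
The ordering problem can be reproduced with a short userspace sketch. It uses a made-up geometry and hypothetical helpers (struct demo_geom, demo_fsb_to_agno, demo_fsb_to_agbno, and the two demo_low_agbno_* functions); the shift-and-mask decoding is an assumption modeled on the agblklog description in this message, not a copy of the kernel macros.

```c
#include <stdint.h>

/* Hypothetical geometry, not taken from any real filesystem. */
struct demo_geom {
	uint32_t agblocks;	/* blocks per AG (agsize) */
	uint32_t agblklog;	/* log2 of agsize, rounded up */
};

/* Crack a filesystem block number into its AG number. */
static uint32_t demo_fsb_to_agno(const struct demo_geom *g, uint64_t fsb)
{
	return (uint32_t)(fsb >> g->agblklog);
}

/* Crack a filesystem block number into its AG-relative block number. */
static uint32_t demo_fsb_to_agbno(const struct demo_geom *g, uint64_t fsb)
{
	return (uint32_t)(fsb & ((1ULL << g->agblklog) - 1));
}

/* Old ordering: decode agbno first, then bump it by the record length. */
static uint32_t demo_low_agbno_old(const struct demo_geom *g, uint64_t fsb,
				   uint32_t len)
{
	return demo_fsb_to_agbno(g, fsb) + len;
}

/* New ordering: bump the fsblock first, then decode. */
static uint32_t demo_low_agbno_new(const struct demo_geom *g, uint64_t fsb,
				   uint32_t len)
{
	return demo_fsb_to_agbno(g, fsb + len);
}
```

With agblocks = 1024 and agblklog = 10, a last record at fsblock 1000 with length 24 advances start_fsb to 1024, which decodes to AG 1. The old ordering leaves the low key at agbno 1024, past the end of any AG, while the new ordering yields agbno 0 in AG 1.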
Reported-by: Dave Chinner <david@fromorbit.com>
Fixes: 63ef7a35912dd ("xfs: fix interval filtering in multi-step fsmap queries")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/xfs/xfs_fsmap.c | 25 ++++++++++++++++++-------
1 file changed, 18 insertions(+), 7 deletions(-)
diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c
index d10f2c719220..956a5670e56c 100644
--- a/fs/xfs/xfs_fsmap.c
+++ b/fs/xfs/xfs_fsmap.c
@@ -563,10 +563,23 @@ xfs_getfsmap_rtdev_rtbitmap(
xfs_iunlock(mp->m_rbmip, XFS_ILOCK_SHARED);
return error;
}
#endif /* CONFIG_XFS_RT */
+static inline bool
+rmap_not_shareable(struct xfs_mount *mp, const struct xfs_rmap_irec *r)
+{
+ if (!xfs_has_reflink(mp))
+ return true;
+ if (XFS_RMAP_NON_INODE_OWNER(r->rm_owner))
+ return true;
+ if (r->rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK |
+ XFS_RMAP_UNWRITTEN))
+ return true;
+ return false;
+}
+
/* Execute a getfsmap query against the regular data device. */
STATIC int
__xfs_getfsmap_datadev(
struct xfs_trans *tp,
const struct xfs_fsmap *keys,
@@ -596,35 +609,33 @@ __xfs_getfsmap_datadev(
/*
* Convert the fsmap low/high keys to AG based keys. Initialize
* low to the fsmap low key and max out the high key to the end
* of the AG.
*/
- info->low.rm_startblock = XFS_FSB_TO_AGBNO(mp, start_fsb);
info->low.rm_offset = XFS_BB_TO_FSBT(mp, keys[0].fmr_offset);
error = xfs_fsmap_owner_to_rmap(&info->low, &keys[0]);
if (error)
return error;
info->low.rm_blockcount = XFS_BB_TO_FSBT(mp, keys[0].fmr_length);
xfs_getfsmap_set_irec_flags(&info->low, &keys[0]);
/* Adjust the low key if we are continuing from where we left off. */
if (info->low.rm_blockcount == 0) {
- /* empty */
- } else if (XFS_RMAP_NON_INODE_OWNER(info->low.rm_owner) ||
- (info->low.rm_flags & (XFS_RMAP_ATTR_FORK |
- XFS_RMAP_BMBT_BLOCK |
- XFS_RMAP_UNWRITTEN))) {
- info->low.rm_startblock += info->low.rm_blockcount;
+ /* No previous record from which to continue */
+ } else if (rmap_not_shareable(mp, &info->low)) {
+ /* Last record seen was an unshareable extent */
info->low.rm_owner = 0;
info->low.rm_offset = 0;
start_fsb += info->low.rm_blockcount;
if (XFS_FSB_TO_DADDR(mp, start_fsb) >= eofs)
return 0;
} else {
+ /* Last record seen was a shareable file data extent */
info->low.rm_offset += info->low.rm_blockcount;
}
+ info->low.rm_startblock = XFS_FSB_TO_AGBNO(mp, start_fsb);
info->high.rm_startblock = -1U;
info->high.rm_owner = ULLONG_MAX;
info->high.rm_offset = ULLONG_MAX;
info->high.rm_blockcount = 0;
--
2.50.0.rc1.591.g9c95f17f64-goog
* [PATCH 6.1 09/23] xfs: fix the contact address for the sysfs ABI documentation
2025-06-11 21:01 [PATCH 6.1 00/23] fixes from 6.11 for 6.1.y Leah Rumancik
` (7 preceding siblings ...)
2025-06-11 21:01 ` [PATCH 6.1 08/23] xfs: fix an agbno overflow in __xfs_getfsmap_datadev Leah Rumancik
@ 2025-06-11 21:01 ` Leah Rumancik
2025-06-11 21:01 ` [PATCH 6.1 10/23] xfs: verify buffer, inode, and dquot items every tx commit Leah Rumancik
` (14 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: Leah Rumancik @ 2025-06-11 21:01 UTC (permalink / raw)
To: stable
Cc: xfs-stable, chandan.babu, catherine.hoang, djwong,
Christoph Hellwig, Chandan Babu R, Leah Rumancik
From: Christoph Hellwig <hch@lst.de>
[ Upstream commit 9ff4490e2ab364ec433f15668ef3f5edfb53feca ]
oss.sgi.com is long dead, refer to the current linux-xfs list instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
Documentation/ABI/testing/sysfs-fs-xfs | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/Documentation/ABI/testing/sysfs-fs-xfs b/Documentation/ABI/testing/sysfs-fs-xfs
index f704925f6fe9..82d8e2f79834 100644
--- a/Documentation/ABI/testing/sysfs-fs-xfs
+++ b/Documentation/ABI/testing/sysfs-fs-xfs
@@ -1,37 +1,37 @@
What: /sys/fs/xfs/<disk>/log/log_head_lsn
Date: July 2014
KernelVersion: 3.17
-Contact: xfs@oss.sgi.com
+Contact: linux-xfs@vger.kernel.org
Description:
The log sequence number (LSN) of the current head of the
log. The LSN is exported in "cycle:basic block" format.
Users: xfstests
What: /sys/fs/xfs/<disk>/log/log_tail_lsn
Date: July 2014
KernelVersion: 3.17
-Contact: xfs@oss.sgi.com
+Contact: linux-xfs@vger.kernel.org
Description:
The log sequence number (LSN) of the current tail of the
log. The LSN is exported in "cycle:basic block" format.
What: /sys/fs/xfs/<disk>/log/reserve_grant_head
Date: July 2014
KernelVersion: 3.17
-Contact: xfs@oss.sgi.com
+Contact: linux-xfs@vger.kernel.org
Description:
The current state of the log reserve grant head. It
represents the total log reservation of all currently
outstanding transactions. The grant head is exported in
"cycle:bytes" format.
Users: xfstests
What: /sys/fs/xfs/<disk>/log/write_grant_head
Date: July 2014
KernelVersion: 3.17
-Contact: xfs@oss.sgi.com
+Contact: linux-xfs@vger.kernel.org
Description:
The current state of the log write grant head. It
represents the total log reservation of all currently
outstanding transactions, including regrants due to
rolling transactions. The grant head is exported in
--
2.50.0.rc1.591.g9c95f17f64-goog
* [PATCH 6.1 10/23] xfs: verify buffer, inode, and dquot items every tx commit
2025-06-11 21:01 [PATCH 6.1 00/23] fixes from 6.11 for 6.1.y Leah Rumancik
` (8 preceding siblings ...)
2025-06-11 21:01 ` [PATCH 6.1 09/23] xfs: fix the contact address for the sysfs ABI documentation Leah Rumancik
@ 2025-06-11 21:01 ` Leah Rumancik
2025-06-11 21:01 ` [PATCH 6.1 11/23] xfs: use consistent uid/gid when grabbing dquots for inodes Leah Rumancik
` (13 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: Leah Rumancik @ 2025-06-11 21:01 UTC (permalink / raw)
To: stable
Cc: xfs-stable, chandan.babu, catherine.hoang, djwong,
Christoph Hellwig, Leah Rumancik
From: "Darrick J. Wong" <djwong@kernel.org>
[ Upstream commit 150bb10a28b9c8709ae227fc898d9cf6136faa1e ]
generic/388 has an annoying tendency to fail like this during log
recovery:
XFS (sda4): Unmounting Filesystem 435fe39b-82b6-46ef-be56-819499585130
XFS (sda4): Mounting V5 Filesystem 435fe39b-82b6-46ef-be56-819499585130
XFS (sda4): Starting recovery (logdev: internal)
00000000: 49 4e 81 b6 03 02 00 00 00 00 00 07 00 00 00 07 IN..............
00000010: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 10 ................
00000020: 35 9a 8b c1 3e 6e 81 00 35 9a 8b c1 3f dc b7 00 5...>n..5...?...
00000030: 35 9a 8b c1 3f dc b7 00 00 00 00 00 00 3c 86 4f 5...?........<.O
00000040: 00 00 00 00 00 00 02 f3 00 00 00 00 00 00 00 00 ................
00000050: 00 00 1f 01 00 00 00 00 00 00 00 02 b2 74 c9 0b .............t..
00000060: ff ff ff ff d7 45 73 10 00 00 00 00 00 00 00 2d .....Es........-
00000070: 00 00 07 92 00 01 fe 30 00 00 00 00 00 00 00 1a .......0........
00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000090: 35 9a 8b c1 3b 55 0c 00 00 00 00 00 04 27 b2 d1 5...;U.......'..
000000a0: 43 5f e3 9b 82 b6 46 ef be 56 81 94 99 58 51 30 C_....F..V...XQ0
XFS (sda4): Internal error Bad dinode after recovery at line 539 of file fs/xfs/xfs_inode_item_recover.c. Caller xlog_recover_items_pass2+0x4e/0xc0 [xfs]
CPU: 0 PID: 2189311 Comm: mount Not tainted 6.9.0-rc4-djwx #rc4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20171121_152543-x86-ol7-builder-01.us.oracle.com-4.el7.1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x4f/0x60
xfs_corruption_error+0x90/0xa0
xlog_recover_inode_commit_pass2+0x5f1/0xb00
xlog_recover_items_pass2+0x4e/0xc0
xlog_recover_commit_trans+0x2db/0x350
xlog_recovery_process_trans+0xab/0xe0
xlog_recover_process_data+0xa7/0x130
xlog_do_recovery_pass+0x398/0x840
xlog_do_log_recovery+0x62/0xc0
xlog_do_recover+0x34/0x1d0
xlog_recover+0xe9/0x1a0
xfs_log_mount+0xff/0x260
xfs_mountfs+0x5d9/0xb60
xfs_fs_fill_super+0x76b/0xa30
get_tree_bdev+0x124/0x1d0
vfs_get_tree+0x17/0xa0
path_mount+0x72b/0xa90
__x64_sys_mount+0x112/0x150
do_syscall_64+0x49/0x100
entry_SYSCALL_64_after_hwframe+0x4b/0x53
</TASK>
XFS (sda4): Corruption detected. Unmount and run xfs_repair
XFS (sda4): Metadata corruption detected at xfs_dinode_verify.part.0+0x739/0x920 [xfs], inode 0x427b2d1
XFS (sda4): Filesystem has been shut down due to log error (0x2).
XFS (sda4): Please unmount the filesystem and rectify the problem(s).
XFS (sda4): log mount/recovery failed: error -117
XFS (sda4): log mount failed
This is inode log item recovery failing the dinode verifier after
replaying the contents of the inode log item into the ondisk inode.
Looking back into what the kernel was doing at the time of the fs
shutdown, a thread was in the middle of running a series of
transactions, each of which committed changes to the inode.
At some point in the middle of that chain, an invalid (at least
according to the verifier) change was committed. Had the filesystem not
shut down in the middle of the chain, a subsequent transaction would
have corrected the invalid state and nobody would have noticed. But
that's not what happened here. Instead, the invalid inode state was
committed to the ondisk log, so log recovery tripped over it.
The actual defect here was an overzealous inode verifier, which was
fixed in a separate patch. This patch adds some transaction precommit
functions for CONFIG_XFS_DEBUG=y mode so that we can detect these kinds
of transient errors at transaction commit time, where it's much easier
to find the root cause.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/xfs/Kconfig | 12 ++++++++++++
fs/xfs/xfs.h | 4 ++++
fs/xfs/xfs_buf_item.c | 32 ++++++++++++++++++++++++++++++++
fs/xfs/xfs_dquot_item.c | 31 +++++++++++++++++++++++++++++++
fs/xfs/xfs_inode_item.c | 32 ++++++++++++++++++++++++++++++++
5 files changed, 111 insertions(+)
diff --git a/fs/xfs/Kconfig b/fs/xfs/Kconfig
index 9fac5ea8d0e4..dff90db507e3 100644
--- a/fs/xfs/Kconfig
+++ b/fs/xfs/Kconfig
@@ -152,10 +152,22 @@ config XFS_DEBUG
Note that the resulting code will be HUGE and SLOW, and probably
not useful unless you are debugging a particular problem.
Say N unless you are an XFS developer, or you play one on TV.
+config XFS_DEBUG_EXPENSIVE
+ bool "XFS expensive debugging checks"
+ depends on XFS_FS && XFS_DEBUG
+ help
+ Say Y here to get an XFS build with expensive debugging checks
+ enabled. These checks may affect performance significantly.
+
+ Note that the resulting code will be HUGER and SLOWER, and probably
+ not useful unless you are debugging a particular problem.
+
+ Say N unless you are an XFS developer, or you play one on TV.
+
config XFS_ASSERT_FATAL
bool "XFS fatal asserts"
default y
depends on XFS_FS && XFS_DEBUG
help
diff --git a/fs/xfs/xfs.h b/fs/xfs/xfs.h
index f6ffb4f248f7..9355ccad9503 100644
--- a/fs/xfs/xfs.h
+++ b/fs/xfs/xfs.h
@@ -8,10 +8,14 @@
#ifdef CONFIG_XFS_DEBUG
#define DEBUG 1
#endif
+#ifdef CONFIG_XFS_DEBUG_EXPENSIVE
+#define DEBUG_EXPENSIVE 1
+#endif
+
#ifdef CONFIG_XFS_ASSERT_FATAL
#define XFS_ASSERT_FATAL 1
#endif
#ifdef CONFIG_XFS_WARN
diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 023d4e0385dd..b02ce568de0c 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -20,10 +20,11 @@
#include "xfs_dquot_item.h"
#include "xfs_dquot.h"
#include "xfs_trace.h"
#include "xfs_log.h"
#include "xfs_log_priv.h"
+#include "xfs_error.h"
struct kmem_cache *xfs_buf_item_cache;
static inline struct xfs_buf_log_item *BUF_ITEM(struct xfs_log_item *lip)
@@ -779,12 +780,43 @@ xfs_buf_item_committed(
if ((bip->bli_flags & XFS_BLI_INODE_ALLOC_BUF) && lip->li_lsn != 0)
return lip->li_lsn;
return lsn;
}
+#ifdef DEBUG_EXPENSIVE
+static int
+xfs_buf_item_precommit(
+ struct xfs_trans *tp,
+ struct xfs_log_item *lip)
+{
+ struct xfs_buf_log_item *bip = BUF_ITEM(lip);
+ struct xfs_buf *bp = bip->bli_buf;
+ struct xfs_mount *mp = bp->b_mount;
+ xfs_failaddr_t fa;
+
+ if (!bp->b_ops || !bp->b_ops->verify_struct)
+ return 0;
+ if (bip->bli_flags & XFS_BLI_STALE)
+ return 0;
+
+ fa = bp->b_ops->verify_struct(bp);
+ if (fa) {
+ xfs_buf_verifier_error(bp, -EFSCORRUPTED, bp->b_ops->name,
+ bp->b_addr, BBTOB(bp->b_length), fa);
+ xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
+ ASSERT(fa == NULL);
+ }
+
+ return 0;
+}
+#else
+# define xfs_buf_item_precommit NULL
+#endif
+
static const struct xfs_item_ops xfs_buf_item_ops = {
.iop_size = xfs_buf_item_size,
+ .iop_precommit = xfs_buf_item_precommit,
.iop_format = xfs_buf_item_format,
.iop_pin = xfs_buf_item_pin,
.iop_unpin = xfs_buf_item_unpin,
.iop_release = xfs_buf_item_release,
.iop_committing = xfs_buf_item_committing,
diff --git a/fs/xfs/xfs_dquot_item.c b/fs/xfs/xfs_dquot_item.c
index 6a1aae799cf1..7d19091215b0 100644
--- a/fs/xfs/xfs_dquot_item.c
+++ b/fs/xfs/xfs_dquot_item.c
@@ -15,10 +15,11 @@
#include "xfs_trans.h"
#include "xfs_buf_item.h"
#include "xfs_trans_priv.h"
#include "xfs_qm.h"
#include "xfs_log.h"
+#include "xfs_error.h"
static inline struct xfs_dq_logitem *DQUOT_ITEM(struct xfs_log_item *lip)
{
return container_of(lip, struct xfs_dq_logitem, qli_item);
}
@@ -191,12 +192,42 @@ xfs_qm_dquot_logitem_committing(
xfs_csn_t seq)
{
return xfs_qm_dquot_logitem_release(lip);
}
+#ifdef DEBUG_EXPENSIVE
+static int
+xfs_qm_dquot_logitem_precommit(
+ struct xfs_trans *tp,
+ struct xfs_log_item *lip)
+{
+ struct xfs_dquot *dqp = DQUOT_ITEM(lip)->qli_dquot;
+ struct xfs_mount *mp = dqp->q_mount;
+ struct xfs_disk_dquot ddq = { };
+ xfs_failaddr_t fa;
+
+ xfs_dquot_to_disk(&ddq, dqp);
+ fa = xfs_dquot_verify(mp, &ddq, dqp->q_id);
+ if (fa) {
+ XFS_CORRUPTION_ERROR("Bad dquot during logging",
+ XFS_ERRLEVEL_LOW, mp, &ddq, sizeof(ddq));
+ xfs_alert(mp,
+ "Metadata corruption detected at %pS, dquot 0x%x",
+ fa, dqp->q_id);
+ xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
+ ASSERT(fa == NULL);
+ }
+
+ return 0;
+}
+#else
+# define xfs_qm_dquot_logitem_precommit NULL
+#endif
+
static const struct xfs_item_ops xfs_dquot_item_ops = {
.iop_size = xfs_qm_dquot_logitem_size,
+ .iop_precommit = xfs_qm_dquot_logitem_precommit,
.iop_format = xfs_qm_dquot_logitem_format,
.iop_pin = xfs_qm_dquot_logitem_pin,
.iop_unpin = xfs_qm_dquot_logitem_unpin,
.iop_release = xfs_qm_dquot_logitem_release,
.iop_committing = xfs_qm_dquot_logitem_committing,
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index 2ec23c9af760..a734ca8d8f03 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -34,10 +34,40 @@ xfs_inode_item_sort(
struct xfs_log_item *lip)
{
return INODE_ITEM(lip)->ili_inode->i_ino;
}
+#ifdef DEBUG_EXPENSIVE
+static void
+xfs_inode_item_precommit_check(
+ struct xfs_inode *ip)
+{
+ struct xfs_mount *mp = ip->i_mount;
+ struct xfs_dinode *dip;
+ xfs_failaddr_t fa;
+
+ dip = kzalloc(mp->m_sb.sb_inodesize, GFP_KERNEL | GFP_NOFS);
+ if (!dip) {
+ ASSERT(dip != NULL);
+ return;
+ }
+
+ xfs_inode_to_disk(ip, dip, 0);
+ xfs_dinode_calc_crc(mp, dip);
+ fa = xfs_dinode_verify(mp, ip->i_ino, dip);
+ if (fa) {
+ xfs_inode_verifier_error(ip, -EFSCORRUPTED, __func__, dip,
+ sizeof(*dip), fa);
+ xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
+ ASSERT(fa == NULL);
+ }
+ kfree(dip);
+}
+#else
+# define xfs_inode_item_precommit_check(ip) ((void)0)
+#endif
+
/*
* Prior to finally logging the inode, we have to ensure that all the
* per-modification inode state changes are applied. This includes VFS inode
* state updates, format conversions, verifier state synchronisation and
* ensuring the inode buffer remains in memory whilst the inode is dirty.
@@ -166,10 +196,12 @@ xfs_inode_item_precommit(
* in xfs_iflush() for an explanation of this coordination mechanism.
*/
iip->ili_fields |= (flags | iip->ili_last_fields);
spin_unlock(&iip->ili_lock);
+ xfs_inode_item_precommit_check(ip);
+
/*
* We are done with the log item transaction dirty state, so clear it so
* that it doesn't pollute future transactions.
*/
iip->ili_dirty_flags = 0;
--
2.50.0.rc1.591.g9c95f17f64-goog
^ permalink raw reply related [flat|nested] 25+ messages in thread
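The "#define ... NULL" fallback used for xfs_qm_dquot_logitem_precommit in patch 10 relies on the caller treating a NULL hook as "nothing to do". A minimal userspace sketch of that pattern follows. All demo_* names are hypothetical and are not XFS symbols, and it assumes (as the patch implies) that the commit path skips a NULL precommit hook.

```c
#include <assert.h>
#include <stddef.h>

struct demo_item;

struct demo_item_ops {
	/* Optional hook: NULL means "no precommit work for this item". */
	int (*precommit)(struct demo_item *it);
};

struct demo_item {
	const struct demo_item_ops *ops;
	int value;	/* stand-in for in-core state to verify */
	int shut_down;	/* set when the expensive check finds a problem */
};

#ifdef DEBUG_EXPENSIVE
/* Expensive verification, compiled in only for debug builds. */
static int demo_precommit(struct demo_item *it)
{
	if (it->value < 0)
		it->shut_down = 1;	/* analogous to forcing a shutdown */
	return 0;
}
#else
/* Non-debug builds: the ops table entry simply becomes NULL. */
# define demo_precommit NULL
#endif

static const struct demo_item_ops demo_item_ops = {
	.precommit = demo_precommit,
};

/* Caller side: run the hook only if the item provides one. */
static int demo_commit(struct demo_item *it)
{
	assert(it != NULL && it->ops != NULL);

	if (it->ops->precommit)
		return it->ops->precommit(it);
	return 0;
}
```

The benefit of this shape is that the ops table needs no #ifdef of its own; only the hook definition changes between builds.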
* [PATCH 6.1 11/23] xfs: use consistent uid/gid when grabbing dquots for inodes
2025-06-11 21:01 [PATCH 6.1 00/23] fixes from 6.11 for 6.1.y Leah Rumancik
` (9 preceding siblings ...)
2025-06-11 21:01 ` [PATCH 6.1 10/23] xfs: verify buffer, inode, and dquot items every tx commit Leah Rumancik
@ 2025-06-11 21:01 ` Leah Rumancik
2025-06-11 21:01 ` [PATCH 6.1 12/23] xfs: declare xfs_file.c symbols in xfs_file.h Leah Rumancik
` (12 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: Leah Rumancik @ 2025-06-11 21:01 UTC (permalink / raw)
To: stable
Cc: xfs-stable, chandan.babu, catherine.hoang, djwong,
Christoph Hellwig, Sasha Levin, Leah Rumancik
From: "Darrick J. Wong" <djwong@kernel.org>
[ Upstream commit 24a4e1cb322e2bf0f3a1afd1978b610a23aa8f36 ]
[ 6.1: resolved conflicts in xfs_inode.c and xfs_symlink.c due to 6.1
not having switched to idmap yet ]
I noticed that callers of xfs_qm_vop_dqalloc use the following code to
compute the anticipated uid of the new file:
	mapped_fsuid(idmap, &init_user_ns);
whereas the VFS uses a slightly different computation for actually
assigning i_uid:
	mapped_fsuid(idmap, i_user_ns(inode));
Technically, these are not the same things. According to Christian
Brauner, the only time that inode->i_sb->s_user_ns != &init_user_ns is
when the filesystem was mounted in a new mount namespace by an
unprivileged user. XFS does not allow this, which is why we've never
seen bug reports about quotas being incorrect or the uid checks in
xfs_qm_vop_create_dqattach tripping debug assertions.
However, this /is/ a logic bomb, so let's make the code consistent.
Link: https://lore.kernel.org/linux-fsdevel/20240617-weitblick-gefertigt-4a41f37119fa@brauner/
Fixes: c14329d39f2d ("fs: port fs{g,u}id helpers to mnt_idmap")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/xfs/xfs_inode.c | 16 ++++++++++------
fs/xfs/xfs_symlink.c | 8 +++++---
2 files changed, 15 insertions(+), 9 deletions(-)
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index b26d26d29273..88d0a088fa86 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -981,14 +981,16 @@ xfs_create(
return -EIO;
prid = xfs_get_initial_prid(dp);
/*
- * Make sure that we have allocated dquot(s) on disk.
+ * Make sure that we have allocated dquot(s) on disk. The uid/gid
+ * computation code must match what the VFS uses to assign i_[ug]id.
+ * INHERIT adjusts the gid computation for setgid/grpid systems.
*/
- error = xfs_qm_vop_dqalloc(dp, mapped_fsuid(mnt_userns, &init_user_ns),
- mapped_fsgid(mnt_userns, &init_user_ns), prid,
+ error = xfs_qm_vop_dqalloc(dp, mapped_fsuid(mnt_userns, i_user_ns(VFS_I(dp))),
+ mapped_fsgid(mnt_userns, i_user_ns(VFS_I(dp))), prid,
XFS_QMOPT_QUOTALL | XFS_QMOPT_INHERIT,
&udqp, &gdqp, &pdqp);
if (error)
return error;
@@ -1130,14 +1132,16 @@ xfs_create_tmpfile(
return -EIO;
prid = xfs_get_initial_prid(dp);
/*
- * Make sure that we have allocated dquot(s) on disk.
+ * Make sure that we have allocated dquot(s) on disk. The uid/gid
+ * computation code must match what the VFS uses to assign i_[ug]id.
+ * INHERIT adjusts the gid computation for setgid/grpid systems.
*/
- error = xfs_qm_vop_dqalloc(dp, mapped_fsuid(mnt_userns, &init_user_ns),
- mapped_fsgid(mnt_userns, &init_user_ns), prid,
+ error = xfs_qm_vop_dqalloc(dp, mapped_fsuid(mnt_userns, i_user_ns(VFS_I(dp))),
+ mapped_fsgid(mnt_userns, i_user_ns(VFS_I(dp))), prid,
XFS_QMOPT_QUOTALL | XFS_QMOPT_INHERIT,
&udqp, &gdqp, &pdqp);
if (error)
return error;
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index 8389f3ef88ef..78bd02a98aa5 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -189,14 +189,16 @@ xfs_symlink(
ASSERT(pathlen > 0);
prid = xfs_get_initial_prid(dp);
/*
- * Make sure that we have allocated dquot(s) on disk.
+ * Make sure that we have allocated dquot(s) on disk. The uid/gid
+ * computation code must match what the VFS uses to assign i_[ug]id.
+ * INHERIT adjusts the gid computation for setgid/grpid systems.
*/
- error = xfs_qm_vop_dqalloc(dp, mapped_fsuid(mnt_userns, &init_user_ns),
- mapped_fsgid(mnt_userns, &init_user_ns), prid,
+ error = xfs_qm_vop_dqalloc(dp, mapped_fsuid(mnt_userns, i_user_ns(VFS_I(dp))),
+ mapped_fsgid(mnt_userns, i_user_ns(VFS_I(dp))), prid,
XFS_QMOPT_QUOTALL | XFS_QMOPT_INHERIT,
&udqp, &gdqp, &pdqp);
if (error)
return error;
--
2.50.0.rc1.591.g9c95f17f64-goog
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 6.1 12/23] xfs: declare xfs_file.c symbols in xfs_file.h
2025-06-11 21:01 [PATCH 6.1 00/23] fixes from 6.11 for 6.1.y Leah Rumancik
` (10 preceding siblings ...)
2025-06-11 21:01 ` [PATCH 6.1 11/23] xfs: use consistent uid/gid when grabbing dquots for inodes Leah Rumancik
@ 2025-06-11 21:01 ` Leah Rumancik
2025-06-11 21:01 ` [PATCH 6.1 13/23] xfs: create a new helper to return a file's allocation unit Leah Rumancik
` (11 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: Leah Rumancik @ 2025-06-11 21:01 UTC (permalink / raw)
To: stable
Cc: xfs-stable, chandan.babu, catherine.hoang, djwong,
Christoph Hellwig, Leah Rumancik
From: "Darrick J. Wong" <djwong@kernel.org>
[ Upstream commit 00acb28d96746f78389f23a7b5309a917b45c12f ]
Move the two public symbols in xfs_file.c to xfs_file.h. We're about to
add more public symbols in that source file, so let's finally create the
header file.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/xfs/xfs_file.c | 1 +
fs/xfs/xfs_file.h | 12 ++++++++++++
fs/xfs/xfs_ioctl.c | 1 +
fs/xfs/xfs_iops.c | 1 +
fs/xfs/xfs_iops.h | 3 ---
5 files changed, 15 insertions(+), 3 deletions(-)
create mode 100644 fs/xfs/xfs_file.h
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 821cb86a83bd..6f7522977f7f 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -22,10 +22,11 @@
#include "xfs_log.h"
#include "xfs_icache.h"
#include "xfs_pnfs.h"
#include "xfs_iomap.h"
#include "xfs_reflink.h"
+#include "xfs_file.h"
#include <linux/dax.h>
#include <linux/falloc.h>
#include <linux/backing-dev.h>
#include <linux/mman.h>
diff --git a/fs/xfs/xfs_file.h b/fs/xfs/xfs_file.h
new file mode 100644
index 000000000000..7d39e3eca56d
--- /dev/null
+++ b/fs/xfs/xfs_file.h
@@ -0,0 +1,12 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2000-2005 Silicon Graphics, Inc.
+ * All Rights Reserved.
+ */
+#ifndef __XFS_FILE_H__
+#define __XFS_FILE_H__
+
+extern const struct file_operations xfs_file_operations;
+extern const struct file_operations xfs_dir_file_operations;
+
+#endif /* __XFS_FILE_H__ */
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index c7cb496dc345..1afb1b1b831e 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -36,10 +36,11 @@
#include "xfs_ag.h"
#include "xfs_health.h"
#include "xfs_reflink.h"
#include "xfs_ioctl.h"
#include "xfs_xattr.h"
+#include "xfs_file.h"
#include <linux/mount.h>
#include <linux/namei.h>
#include <linux/fileattr.h>
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 6fbdc0a19e54..9ca1b8bf1f05 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -23,10 +23,11 @@
#include "xfs_dir2.h"
#include "xfs_iomap.h"
#include "xfs_error.h"
#include "xfs_ioctl.h"
#include "xfs_xattr.h"
+#include "xfs_file.h"
#include <linux/posix_acl.h>
#include <linux/security.h>
#include <linux/iversion.h>
#include <linux/fiemap.h>
diff --git a/fs/xfs/xfs_iops.h b/fs/xfs/xfs_iops.h
index e570dcb5df8d..73ff92355eaa 100644
--- a/fs/xfs/xfs_iops.h
+++ b/fs/xfs/xfs_iops.h
@@ -6,13 +6,10 @@
#ifndef __XFS_IOPS_H__
#define __XFS_IOPS_H__
struct xfs_inode;
-extern const struct file_operations xfs_file_operations;
-extern const struct file_operations xfs_dir_file_operations;
-
extern ssize_t xfs_vn_listxattr(struct dentry *, char *data, size_t size);
int xfs_vn_setattr_size(struct user_namespace *mnt_userns,
struct dentry *dentry, struct iattr *vap);
--
2.50.0.rc1.591.g9c95f17f64-goog
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 6.1 13/23] xfs: create a new helper to return a file's allocation unit
2025-06-11 21:01 [PATCH 6.1 00/23] fixes from 6.11 for 6.1.y Leah Rumancik
` (11 preceding siblings ...)
2025-06-11 21:01 ` [PATCH 6.1 12/23] xfs: declare xfs_file.c symbols in xfs_file.h Leah Rumancik
@ 2025-06-11 21:01 ` Leah Rumancik
2025-06-11 21:01 ` [PATCH 6.1 14/23] xfs: Fix xfs_flush_unmap_range() range for RT Leah Rumancik
` (10 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: Leah Rumancik @ 2025-06-11 21:01 UTC (permalink / raw)
To: stable
Cc: xfs-stable, chandan.babu, catherine.hoang, djwong,
Christoph Hellwig, Leah Rumancik
From: "Darrick J. Wong" <djwong@kernel.org>
[ Upstream commit ee20808d848c87a51e176706d81b95a21747d6cf ]
Create a new helper function to calculate the fundamental allocation
unit (i.e. the smallest unit of space we can allocate) of a file.
Things are going to get hairy with range-exchange on the realtime
device, so prepare for this now.
Remove the static attribute from xfs_is_falloc_aligned since the next
patch will need it.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/xfs/xfs_file.c | 32 ++++++++++++--------------------
fs/xfs/xfs_file.h | 3 +++
fs/xfs/xfs_inode.c | 13 +++++++++++++
fs/xfs/xfs_inode.h | 2 ++
4 files changed, 30 insertions(+), 20 deletions(-)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 6f7522977f7f..3c910e36da69 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -37,37 +37,29 @@ static const struct vm_operations_struct xfs_file_vm_ops;
/*
* Decide if the given file range is aligned to the size of the fundamental
* allocation unit for the file.
*/
-static bool
+bool
xfs_is_falloc_aligned(
struct xfs_inode *ip,
loff_t pos,
long long int len)
{
- struct xfs_mount *mp = ip->i_mount;
- uint64_t mask;
-
- if (XFS_IS_REALTIME_INODE(ip)) {
- if (!is_power_of_2(mp->m_sb.sb_rextsize)) {
- u64 rextbytes;
- u32 mod;
-
- rextbytes = XFS_FSB_TO_B(mp, mp->m_sb.sb_rextsize);
- div_u64_rem(pos, rextbytes, &mod);
- if (mod)
- return false;
- div_u64_rem(len, rextbytes, &mod);
- return mod == 0;
- }
- mask = XFS_FSB_TO_B(mp, mp->m_sb.sb_rextsize) - 1;
- } else {
- mask = mp->m_sb.sb_blocksize - 1;
+ unsigned int alloc_unit = xfs_inode_alloc_unitsize(ip);
+
+ if (!is_power_of_2(alloc_unit)) {
+ u32 mod;
+
+ div_u64_rem(pos, alloc_unit, &mod);
+ if (mod)
+ return false;
+ div_u64_rem(len, alloc_unit, &mod);
+ return mod == 0;
}
- return !((pos | len) & mask);
+ return !((pos | len) & (alloc_unit - 1));
}
/*
* Fsync operations on directories are much simpler than on regular files,
* as there is no file data to flush, and thus also no need for explicit
diff --git a/fs/xfs/xfs_file.h b/fs/xfs/xfs_file.h
index 7d39e3eca56d..2ad91f755caf 100644
--- a/fs/xfs/xfs_file.h
+++ b/fs/xfs/xfs_file.h
@@ -7,6 +7,9 @@
#define __XFS_FILE_H__
extern const struct file_operations xfs_file_operations;
extern const struct file_operations xfs_dir_file_operations;
+bool xfs_is_falloc_aligned(struct xfs_inode *ip, loff_t pos,
+ long long int len);
+
#endif /* __XFS_FILE_H__ */
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 88d0a088fa86..3ccbc31767b3 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3773,5 +3773,18 @@ xfs_inode_reload_unlinked(
xfs_iunlock(ip, XFS_ILOCK_SHARED);
xfs_trans_cancel(tp);
return error;
}
+
+/* Returns the size of fundamental allocation unit for a file, in bytes. */
+unsigned int
+xfs_inode_alloc_unitsize(
+ struct xfs_inode *ip)
+{
+ unsigned int blocks = 1;
+
+ if (XFS_IS_REALTIME_INODE(ip))
+ blocks = ip->i_mount->m_sb.sb_rextsize;
+
+ return XFS_FSB_TO_B(ip->i_mount, blocks);
+}
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index c177c92f3aa5..c4f426eadf8e 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -620,6 +620,8 @@ xfs_inode_unlinked_incomplete(
return VFS_I(ip)->i_nlink == 0 && !xfs_inode_on_unlinked_list(ip);
}
int xfs_inode_reload_unlinked_bucket(struct xfs_trans *tp, struct xfs_inode *ip);
int xfs_inode_reload_unlinked(struct xfs_inode *ip);
+unsigned int xfs_inode_alloc_unitsize(struct xfs_inode *ip);
+
#endif /* __XFS_INODE_H__ */
--
2.50.0.rc1.591.g9c95f17f64-goog
^ permalink raw reply related [flat|nested] 25+ messages in thread
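For readers following along, the alignment test that patch 13 leaves in xfs_is_falloc_aligned() can be sketched in userspace C. This is only an illustration, not the kernel code: is_pow2() and range_is_aligned() are hypothetical names, and the allocation unit is passed in directly instead of being derived from an inode via xfs_inode_alloc_unitsize().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for is_power_of_2(): true when exactly one bit is set. */
static bool is_pow2(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Both pos and len must be multiples of alloc_unit (in bytes) for the
 * range to count as aligned.
 */
static bool range_is_aligned(uint64_t pos, uint64_t len, uint32_t alloc_unit)
{
	assert(alloc_unit != 0);

	if (!is_pow2(alloc_unit)) {
		/* General case: fall back to remainders. */
		if (pos % alloc_unit)
			return false;
		return len % alloc_unit == 0;
	}

	/* Power of two: one mask test covers pos and len together. */
	return !((pos | len) & ((uint64_t)alloc_unit - 1));
}
```

With a 12288-byte unit (for example, a 3-block rt extent on 4096-byte blocks) the mask shortcut would give wrong answers, which is why the remainder path exists.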
* [PATCH 6.1 14/23] xfs: Fix xfs_flush_unmap_range() range for RT
2025-06-11 21:01 [PATCH 6.1 00/23] fixes from 6.11 for 6.1.y Leah Rumancik
` (12 preceding siblings ...)
2025-06-11 21:01 ` [PATCH 6.1 13/23] xfs: create a new helper to return a file's allocation unit Leah Rumancik
@ 2025-06-11 21:01 ` Leah Rumancik
2025-06-11 21:01 ` [PATCH 6.1 15/23] xfs: Fix xfs_prepare_shift() " Leah Rumancik
` (9 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: Leah Rumancik @ 2025-06-11 21:01 UTC (permalink / raw)
To: stable
Cc: xfs-stable, chandan.babu, catherine.hoang, djwong, John Garry,
Christoph Hellwig, Chandan Babu R, Leah Rumancik
From: John Garry <john.g.garry@oracle.com>
[ Upstream commit d3b689d7c711a9f36d3e48db9eaa75784a892f4c ]
Currently xfs_flush_unmap_range() rounds the range only to the larger of
the block size and PAGE_SIZE, so it does not flush and unmap a full RT
extent range, which we also want to ensure is clean and idle.
This code change is originally from Dave Chinner.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/xfs/xfs_bmap_util.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 62b92e92a685..dabae6323c50 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -961,18 +961,22 @@ int
xfs_flush_unmap_range(
struct xfs_inode *ip,
xfs_off_t offset,
xfs_off_t len)
{
- struct xfs_mount *mp = ip->i_mount;
struct inode *inode = VFS_I(ip);
xfs_off_t rounding, start, end;
int error;
- rounding = max_t(xfs_off_t, mp->m_sb.sb_blocksize, PAGE_SIZE);
- start = round_down(offset, rounding);
- end = round_up(offset + len, rounding) - 1;
+ /*
+ * Make sure we extend the flush out to extent alignment
+ * boundaries so any extent range overlapping the start/end
+ * of the modification we are about to do is clean and idle.
+ */
+ rounding = max_t(xfs_off_t, xfs_inode_alloc_unitsize(ip), PAGE_SIZE);
+ start = rounddown_64(offset, rounding);
+ end = roundup_64(offset + len, rounding) - 1;
error = filemap_write_and_wait_range(inode->i_mapping, start, end);
if (error)
return error;
truncate_pagecache_range(inode, start, end);
--
2.50.0.rc1.591.g9c95f17f64-goog
^ permalink raw reply related [flat|nested] 25+ messages in thread
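The effect of the new rounding in patch 14 can be seen with a small userspace sketch. It assumes the allocation unit and page size are supplied by the caller; flush_range(), round_down_u64() and round_up_u64() are hypothetical helpers that use plain remainder arithmetic, so they also work when the unit is not a power of two.

```c
#include <assert.h>
#include <stdint.h>

struct byte_range {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/* Remainder-based rounding: valid for any non-zero unit. */
static uint64_t round_down_u64(uint64_t x, uint64_t unit)
{
	return x - (x % unit);
}

static uint64_t round_up_u64(uint64_t x, uint64_t unit)
{
	return round_down_u64(x + unit - 1, unit);
}

/*
 * Widen [offset, offset + len) to boundaries of the larger of the
 * allocation unit and the page size, as the patched function does.
 */
static struct byte_range flush_range(uint64_t offset, uint64_t len,
				     uint64_t alloc_unit, uint64_t page_size)
{
	uint64_t rounding = alloc_unit > page_size ? alloc_unit : page_size;
	struct byte_range r;

	assert(rounding != 0);
	r.start = round_down_u64(offset, rounding);
	r.end = round_up_u64(offset + len, rounding) - 1;
	return r;
}
```

For offset 20000 and length 10000, a 4096-byte unit yields the range 16384..32767, while a 12288-byte unit (a 3-block rt extent) yields 12288..36863, so the whole extent on each side of the modification is covered.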
* [PATCH 6.1 15/23] xfs: Fix xfs_prepare_shift() range for RT
2025-06-11 21:01 [PATCH 6.1 00/23] fixes from 6.11 for 6.1.y Leah Rumancik
` (13 preceding siblings ...)
2025-06-11 21:01 ` [PATCH 6.1 14/23] xfs: Fix xfs_flush_unmap_range() range for RT Leah Rumancik
@ 2025-06-11 21:01 ` Leah Rumancik
2025-06-11 21:01 ` [PATCH 6.1 16/23] xfs: don't walk off the end of a directory data block Leah Rumancik
` (8 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: Leah Rumancik @ 2025-06-11 21:01 UTC (permalink / raw)
To: stable
Cc: xfs-stable, chandan.babu, catherine.hoang, djwong, John Garry,
Christoph Hellwig, Chandan Babu R, Leah Rumancik
From: John Garry <john.g.garry@oracle.com>
[ Upstream commit f23660f059470ec7043748da7641e84183c23bc8 ]
The RT extent range must be considered in the xfs_flush_unmap_range() call
to stabilize the boundary.
This code change is originally from Dave Chinner.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/xfs/xfs_bmap_util.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index dabae6323c50..bab8ba224e10 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1057,33 +1057,35 @@ xfs_free_file_space(
static int
xfs_prepare_shift(
struct xfs_inode *ip,
loff_t offset)
{
- struct xfs_mount *mp = ip->i_mount;
+ unsigned int rounding;
int error;
/*
* Trim eofblocks to avoid shifting uninitialized post-eof preallocation
* into the accessible region of the file.
*/
if (xfs_can_free_eofblocks(ip)) {
error = xfs_free_eofblocks(ip);
if (error)
return error;
}
/*
* Shift operations must stabilize the start block offset boundary along
* with the full range of the operation. If we don't, a COW writeback
* completion could race with an insert, front merge with the start
* extent (after split) during the shift and corrupt the file. Start
- * with the block just prior to the start to stabilize the boundary.
+ * with the allocation unit just prior to the start to stabilize the
+ * boundary.
*/
- offset = round_down(offset, mp->m_sb.sb_blocksize);
+ rounding = xfs_inode_alloc_unitsize(ip);
+ offset = rounddown_64(offset, rounding);
if (offset)
- offset -= mp->m_sb.sb_blocksize;
+ offset -= rounding;
/*
* Writeback and invalidate cache for the remainder of the file as we're
* about to shift down every extent from offset to EOF.
*/
--
2.50.0.rc1.591.g9c95f17f64-goog
^ permalink raw reply related [flat|nested] 25+ messages in thread
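The start-offset computation that patch 15 changes in xfs_prepare_shift() amounts to "round down to the allocation unit, then step back one more unit unless already at zero". A minimal sketch, assuming the unit is given in bytes; shift_flush_start() is a hypothetical name used only here:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the byte offset from which writeback and invalidation should
 * begin: the start of the allocation unit just before the one that
 * contains offset, or 0 if there is no earlier unit.
 */
static uint64_t shift_flush_start(uint64_t offset, uint64_t alloc_unit)
{
	assert(alloc_unit != 0);

	offset -= offset % alloc_unit;	/* round down to a unit boundary */
	if (offset)
		offset -= alloc_unit;	/* include the preceding unit */
	return offset;
}
```

With a 12288-byte unit, an offset of 30000 rounds down to 24576 and the flush then starts at 12288; with a 4096-byte unit, an offset of 20000 starts the flush at 12288 as well (16384 minus one block).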
* [PATCH 6.1 16/23] xfs: don't walk off the end of a directory data block
2025-06-11 21:01 [PATCH 6.1 00/23] fixes from 6.11 for 6.1.y Leah Rumancik
` (14 preceding siblings ...)
2025-06-11 21:01 ` [PATCH 6.1 15/23] xfs: Fix xfs_prepare_shift() " Leah Rumancik
@ 2025-06-11 21:01 ` Leah Rumancik
2025-06-11 21:01 ` [PATCH 6.1 17/23] xfs: remove unused parameter in macro XFS_DQUOT_LOGRES Leah Rumancik
` (7 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: Leah Rumancik @ 2025-06-11 21:01 UTC (permalink / raw)
To: stable
Cc: xfs-stable, chandan.babu, catherine.hoang, djwong, lei lu,
Chandan Babu R, Leah Rumancik
From: lei lu <llfamsec@gmail.com>
[ Upstream commit 0c7fcdb6d06cdf8b19b57c17605215b06afa864a ]
This adds sanity checks for xfs_dir2_data_unused and xfs_dir2_data_entry
to make sure we don't stray beyond the valid memory region. Before patching,
the loop simply checks that the start offset of the dup and dep is within
the range. So in a crafted image, if the last entry is an
xfs_dir2_data_unused, we can change dup->length to dup->length-1 and leave
1 byte of space. In the next traversal, this space will be considered as a
dup or dep, and we may encounter an out-of-bounds read when accessing the
fixed members.
In the patch, we make sure that the remaining bytes are large enough to hold
an unused entry before accessing xfs_dir2_data_unused, and that the
xfs_dir2_data_unused length is XFS_DIR2_DATA_ALIGN byte aligned. We also make
sure that the remaining bytes are large enough to hold a dirent with a
single-byte name before accessing xfs_dir2_data_entry.
Signed-off-by: lei lu <llfamsec@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/xfs/libxfs/xfs_dir2_data.c | 31 ++++++++++++++++++++++++++-----
fs/xfs/libxfs/xfs_dir2_priv.h | 7 +++++++
2 files changed, 33 insertions(+), 5 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_dir2_data.c b/fs/xfs/libxfs/xfs_dir2_data.c
index dbcf58979a59..e1d5da6d8d4a 100644
--- a/fs/xfs/libxfs/xfs_dir2_data.c
+++ b/fs/xfs/libxfs/xfs_dir2_data.c
@@ -175,78 +175,99 @@ __xfs_dir3_data_check(
* Loop over the data/unused entries.
*/
while (offset < end) {
struct xfs_dir2_data_unused *dup = bp->b_addr + offset;
struct xfs_dir2_data_entry *dep = bp->b_addr + offset;
+ unsigned int reclen;
+
+ /*
+ * Are the remaining bytes large enough to hold an
+ * unused entry?
+ */
+ if (offset > end - xfs_dir2_data_unusedsize(1))
+ return __this_address;
/*
* If it's unused, look for the space in the bestfree table.
* If we find it, account for that, else make sure it
* doesn't need to be there.
*/
if (be16_to_cpu(dup->freetag) == XFS_DIR2_DATA_FREE_TAG) {
xfs_failaddr_t fa;
+ reclen = xfs_dir2_data_unusedsize(
+ be16_to_cpu(dup->length));
if (lastfree != 0)
return __this_address;
- if (offset + be16_to_cpu(dup->length) > end)
+ if (be16_to_cpu(dup->length) != reclen)
+ return __this_address;
+ if (offset + reclen > end)
return __this_address;
if (be16_to_cpu(*xfs_dir2_data_unused_tag_p(dup)) !=
offset)
return __this_address;
fa = xfs_dir2_data_freefind_verify(hdr, bf, dup, &dfp);
if (fa)
return fa;
if (dfp) {
i = (int)(dfp - bf);
if ((freeseen & (1 << i)) != 0)
return __this_address;
freeseen |= 1 << i;
} else {
if (be16_to_cpu(dup->length) >
be16_to_cpu(bf[2].length))
return __this_address;
}
- offset += be16_to_cpu(dup->length);
+ offset += reclen;
lastfree = 1;
continue;
}
+
+ /*
+ * This is not an unused entry. Are the remaining bytes
+ * large enough for a dirent with a single-byte name?
+ */
+ if (offset > end - xfs_dir2_data_entsize(mp, 1))
+ return __this_address;
+
/*
* It's a real entry. Validate the fields.
* If this is a block directory then make sure it's
* in the leaf section of the block.
* The linear search is crude but this is DEBUG code.
*/
if (dep->namelen == 0)
return __this_address;
- if (!xfs_verify_dir_ino(mp, be64_to_cpu(dep->inumber)))
+ reclen = xfs_dir2_data_entsize(mp, dep->namelen);
+ if (offset + reclen > end)
return __this_address;
- if (offset + xfs_dir2_data_entsize(mp, dep->namelen) > end)
+ if (!xfs_verify_dir_ino(mp, be64_to_cpu(dep->inumber)))
return __this_address;
if (be16_to_cpu(*xfs_dir2_data_entry_tag_p(mp, dep)) != offset)
return __this_address;
if (xfs_dir2_data_get_ftype(mp, dep) >= XFS_DIR3_FT_MAX)
return __this_address;
count++;
lastfree = 0;
if (hdr->magic == cpu_to_be32(XFS_DIR2_BLOCK_MAGIC) ||
hdr->magic == cpu_to_be32(XFS_DIR3_BLOCK_MAGIC)) {
addr = xfs_dir2_db_off_to_dataptr(geo, geo->datablk,
(xfs_dir2_data_aoff_t)
((char *)dep - (char *)hdr));
name.name = dep->name;
name.len = dep->namelen;
hash = xfs_dir2_hashname(mp, &name);
for (i = 0; i < be32_to_cpu(btp->count); i++) {
if (be32_to_cpu(lep[i].address) == addr &&
be32_to_cpu(lep[i].hashval) == hash)
break;
}
if (i >= be32_to_cpu(btp->count))
return __this_address;
}
- offset += xfs_dir2_data_entsize(mp, dep->namelen);
+ offset += reclen;
}
/*
* Need to have seen all the entries and all the bestfree slots.
*/
if (freeseen != 7)
diff --git a/fs/xfs/libxfs/xfs_dir2_priv.h b/fs/xfs/libxfs/xfs_dir2_priv.h
index 7404a9ff1a92..9046d08554e9 100644
--- a/fs/xfs/libxfs/xfs_dir2_priv.h
+++ b/fs/xfs/libxfs/xfs_dir2_priv.h
@@ -185,10 +185,17 @@ void xfs_dir2_sf_put_ftype(struct xfs_mount *mp,
/* xfs_dir2_readdir.c */
extern int xfs_readdir(struct xfs_trans *tp, struct xfs_inode *dp,
struct dir_context *ctx, size_t bufsize);
+static inline unsigned int
+xfs_dir2_data_unusedsize(
+ unsigned int len)
+{
+ return round_up(len, XFS_DIR2_DATA_ALIGN);
+}
+
static inline unsigned int
xfs_dir2_data_entsize(
struct xfs_mount *mp,
unsigned int namelen)
{
--
2.50.0.rc1.591.g9c95f17f64-goog
^ permalink raw reply related [flat|nested] 25+ messages in thread
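The bounds checks added in patch 16 follow a general pattern: before reading any field of a record, confirm that the smallest possible record of that kind still fits in the block. The sketch below applies that pattern to a deliberately simplified record layout, which is an assumption for illustration and not the XFS on-disk format; REC_ALIGN, FREE_TAG, entry_size() and walk_block() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REC_ALIGN	8u
#define FREE_TAG	0xffffu

static unsigned int rec_round_up(unsigned int len)
{
	return (len + REC_ALIGN - 1) & ~(REC_ALIGN - 1);
}

/* Used entry in this toy layout: 8-byte id, 1-byte name length, name, 2-byte tag. */
static unsigned int entry_size(unsigned int namelen)
{
	return rec_round_up(8 + 1 + namelen + 2);
}

static unsigned int get_be16(const uint8_t *p)
{
	return ((unsigned int)p[0] << 8) | p[1];
}

/* Returns 0 if every record fits inside [offset, end), -1 otherwise. */
static int walk_block(const uint8_t *buf, unsigned int offset, unsigned int end)
{
	assert(buf != NULL);

	while (offset < end) {
		const uint8_t *rec = buf + offset;
		unsigned int reclen;

		/* Is there room for the smallest free record? */
		if (end - offset < rec_round_up(1))
			return -1;

		if (get_be16(rec) == FREE_TAG) {
			unsigned int len = get_be16(rec + 2);

			reclen = rec_round_up(len);
			/* Reject unaligned lengths; zero would never advance. */
			if (len == 0 || len != reclen)
				return -1;
			if (reclen > end - offset)
				return -1;
			offset += reclen;
			continue;
		}

		/* Not free: is there room for an entry with a one-byte name? */
		if (end - offset < entry_size(1))
			return -1;
		if (rec[8] == 0)
			return -1;
		reclen = entry_size(rec[8]);
		if (reclen > end - offset)
			return -1;
		offset += reclen;
	}
	return 0;
}
```

In this sketch, shortening a trailing free record (to 15 or to 8 bytes in a 32-byte block) is rejected either by the alignment check or by the minimum-size check on the leftover bytes, which mirrors the crafted-image case described in the commit message.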
* [PATCH 6.1 17/23] xfs: remove unused parameter in macro XFS_DQUOT_LOGRES
2025-06-11 21:01 [PATCH 6.1 00/23] fixes from 6.11 for 6.1.y Leah Rumancik
` (15 preceding siblings ...)
2025-06-11 21:01 ` [PATCH 6.1 16/23] xfs: don't walk off the end of a directory data block Leah Rumancik
@ 2025-06-11 21:01 ` Leah Rumancik
2025-06-11 21:01 ` [PATCH 6.1 18/23] xfs: attr forks require attr, not attr2 Leah Rumancik
` (6 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: Leah Rumancik @ 2025-06-11 21:01 UTC (permalink / raw)
To: stable
Cc: xfs-stable, chandan.babu, catherine.hoang, djwong, Julian Sun,
Chandan Babu R, Leah Rumancik
From: Julian Sun <sunjunchao2870@gmail.com>
[ Upstream commit af5d92f2fad818663da2ce073b6fe15b9d56ffdc ]
In the macro definition of XFS_DQUOT_LOGRES, a parameter is accepted,
but it is not used. Hence, it should be removed.
This patch has only been compile-tested, but it should be fine.
Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/xfs/libxfs/xfs_quota_defs.h | 2 +-
fs/xfs/libxfs/xfs_trans_resv.c | 28 ++++++++++++++--------------
2 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_quota_defs.h b/fs/xfs/libxfs/xfs_quota_defs.h
index cb035da3f990..fb05f44f6c75 100644
--- a/fs/xfs/libxfs/xfs_quota_defs.h
+++ b/fs/xfs/libxfs/xfs_quota_defs.h
@@ -54,11 +54,11 @@ typedef uint8_t xfs_dqtype_t;
* dquots can be unique and so 6 dquots can be modified....
*
* And, of course, we also need to take into account the dquot log format item
* used to describe each dquot.
*/
-#define XFS_DQUOT_LOGRES(mp) \
+#define XFS_DQUOT_LOGRES \
((sizeof(struct xfs_dq_logformat) + sizeof(struct xfs_disk_dquot)) * 6)
#define XFS_IS_QUOTA_ON(mp) ((mp)->m_qflags & XFS_ALL_QUOTA_ACCT)
#define XFS_IS_UQUOTA_ON(mp) ((mp)->m_qflags & XFS_UQUOTA_ACCT)
#define XFS_IS_PQUOTA_ON(mp) ((mp)->m_qflags & XFS_PQUOTA_ACCT)
diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
index 5b2f27cbdb80..1bb2891b26ff 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.c
+++ b/fs/xfs/libxfs/xfs_trans_resv.c
@@ -332,15 +332,15 @@ xfs_calc_write_reservation(
adj = xfs_calc_buf_res(
xfs_refcountbt_block_count(mp, 2),
blksz);
t1 += adj;
t3 += adj;
- return XFS_DQUOT_LOGRES(mp) + max3(t1, t2, t3);
+ return XFS_DQUOT_LOGRES + max3(t1, t2, t3);
}
t4 = xfs_calc_refcountbt_reservation(mp, 1);
- return XFS_DQUOT_LOGRES(mp) + max(t4, max3(t1, t2, t3));
+ return XFS_DQUOT_LOGRES + max(t4, max3(t1, t2, t3));
}
unsigned int
xfs_calc_write_reservation_minlogsize(
struct xfs_mount *mp)
@@ -404,41 +404,41 @@ xfs_calc_itruncate_reservation(
if (xfs_has_reflink(mp))
t2 += xfs_calc_buf_res(
xfs_refcountbt_block_count(mp, 4),
blksz);
- return XFS_DQUOT_LOGRES(mp) + max3(t1, t2, t3);
+ return XFS_DQUOT_LOGRES + max3(t1, t2, t3);
}
t4 = xfs_calc_refcountbt_reservation(mp, 2);
- return XFS_DQUOT_LOGRES(mp) + max(t4, max3(t1, t2, t3));
+ return XFS_DQUOT_LOGRES + max(t4, max3(t1, t2, t3));
}
unsigned int
xfs_calc_itruncate_reservation_minlogsize(
struct xfs_mount *mp)
{
return xfs_calc_itruncate_reservation(mp, true);
}
/*
* In renaming a files we can modify:
* the five inodes involved: 5 * inode size
* the two directory btrees: 2 * (max depth + v2) * dir block size
* the two directory bmap btrees: 2 * max depth * block size
* And the bmap_finish transaction can free dir and bmap blocks (two sets
* of bmap blocks) giving:
* the agf for the ags in which the blocks live: 3 * sector size
* the agfl for the ags in which the blocks live: 3 * sector size
* the superblock for the free block count: sector size
* the allocation btrees: 3 exts * 2 trees * (2 * max depth - 1) * block size
*/
STATIC uint
xfs_calc_rename_reservation(
struct xfs_mount *mp)
{
- return XFS_DQUOT_LOGRES(mp) +
+ return XFS_DQUOT_LOGRES +
max((xfs_calc_inode_res(mp, 5) +
xfs_calc_buf_res(2 * XFS_DIROP_LOG_COUNT(mp),
XFS_FSB_TO_B(mp, 1))),
(xfs_calc_buf_res(7, mp->m_sb.sb_sectsize) +
xfs_calc_buf_res(xfs_allocfree_block_count(mp, 3),
@@ -473,11 +473,11 @@ xfs_calc_iunlink_remove_reservation(
*/
STATIC uint
xfs_calc_link_reservation(
struct xfs_mount *mp)
{
- return XFS_DQUOT_LOGRES(mp) +
+ return XFS_DQUOT_LOGRES +
xfs_calc_iunlink_remove_reservation(mp) +
max((xfs_calc_inode_res(mp, 2) +
xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp),
XFS_FSB_TO_B(mp, 1))),
(xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
@@ -511,11 +511,11 @@ xfs_calc_iunlink_add_reservation(xfs_mount_t *mp)
*/
STATIC uint
xfs_calc_remove_reservation(
struct xfs_mount *mp)
{
- return XFS_DQUOT_LOGRES(mp) +
+ return XFS_DQUOT_LOGRES +
xfs_calc_iunlink_add_reservation(mp) +
max((xfs_calc_inode_res(mp, 2) +
xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp),
XFS_FSB_TO_B(mp, 1))),
(xfs_calc_buf_res(4, mp->m_sb.sb_sectsize) +
@@ -570,20 +570,20 @@ xfs_calc_icreate_resv_alloc(
}
STATIC uint
xfs_calc_icreate_reservation(xfs_mount_t *mp)
{
- return XFS_DQUOT_LOGRES(mp) +
+ return XFS_DQUOT_LOGRES +
max(xfs_calc_icreate_resv_alloc(mp),
xfs_calc_create_resv_modify(mp));
}
STATIC uint
xfs_calc_create_tmpfile_reservation(
struct xfs_mount *mp)
{
- uint res = XFS_DQUOT_LOGRES(mp);
+ uint res = XFS_DQUOT_LOGRES;
res += xfs_calc_icreate_resv_alloc(mp);
return res + xfs_calc_iunlink_add_reservation(mp);
}
@@ -628,28 +628,28 @@ xfs_calc_symlink_reservation(
*/
STATIC uint
xfs_calc_ifree_reservation(
struct xfs_mount *mp)
{
- return XFS_DQUOT_LOGRES(mp) +
+ return XFS_DQUOT_LOGRES +
xfs_calc_inode_res(mp, 1) +
xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
xfs_calc_iunlink_remove_reservation(mp) +
xfs_calc_inode_chunk_res(mp, _FREE) +
xfs_calc_inobt_res(mp) +
xfs_calc_finobt_res(mp);
}
/*
* When only changing the inode we log the inode and possibly the superblock
* We also add a bit of slop for the transaction stuff.
*/
STATIC uint
xfs_calc_ichange_reservation(
struct xfs_mount *mp)
{
- return XFS_DQUOT_LOGRES(mp) +
+ return XFS_DQUOT_LOGRES +
xfs_calc_inode_res(mp, 1) +
xfs_calc_buf_res(1, mp->m_sb.sb_sectsize);
}
@@ -754,11 +754,11 @@ xfs_calc_writeid_reservation(
*/
STATIC uint
xfs_calc_addafork_reservation(
struct xfs_mount *mp)
{
- return XFS_DQUOT_LOGRES(mp) +
+ return XFS_DQUOT_LOGRES +
xfs_calc_inode_res(mp, 1) +
xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
xfs_calc_buf_res(1, mp->m_dir_geo->blksize) +
xfs_calc_buf_res(XFS_DAENTER_BMAP1B(mp, XFS_DATA_FORK) + 1,
XFS_FSB_TO_B(mp, 1)) +
@@ -802,11 +802,11 @@ xfs_calc_attrinval_reservation(
*/
STATIC uint
xfs_calc_attrsetm_reservation(
struct xfs_mount *mp)
{
- return XFS_DQUOT_LOGRES(mp) +
+ return XFS_DQUOT_LOGRES +
xfs_calc_inode_res(mp, 1) +
xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
xfs_calc_buf_res(XFS_DA_NODE_MAXDEPTH, XFS_FSB_TO_B(mp, 1));
}
@@ -842,11 +842,11 @@ xfs_calc_attrsetrt_reservation(
*/
STATIC uint
xfs_calc_attrrm_reservation(
struct xfs_mount *mp)
{
- return XFS_DQUOT_LOGRES(mp) +
+ return XFS_DQUOT_LOGRES +
max((xfs_calc_inode_res(mp, 1) +
xfs_calc_buf_res(XFS_DA_NODE_MAXDEPTH,
XFS_FSB_TO_B(mp, 1)) +
(uint)XFS_FSB_TO_B(mp,
XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK)) +
--
2.50.0.rc1.591.g9c95f17f64-goog
* [PATCH 6.1 18/23] xfs: attr forks require attr, not attr2
2025-06-11 21:01 [PATCH 6.1 00/23] fixes from 6.11 for 6.1.y Leah Rumancik
` (16 preceding siblings ...)
2025-06-11 21:01 ` [PATCH 6.1 17/23] xfs: remove unused parameter in macro XFS_DQUOT_LOGRES Leah Rumancik
@ 2025-06-11 21:01 ` Leah Rumancik
2025-06-11 21:01 ` [PATCH 6.1 19/23] xfs: conditionally allow FS_XFLAG_REALTIME changes if S_DAX is set Leah Rumancik
` (5 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: Leah Rumancik @ 2025-06-11 21:01 UTC (permalink / raw)
To: stable
Cc: xfs-stable, chandan.babu, catherine.hoang, djwong,
Christoph Hellwig, Chandan Babu R, Leah Rumancik
From: "Darrick J. Wong" <djwong@kernel.org>
[ Upstream commit 73c34b0b85d46bf9c2c0b367aeaffa1e2481b136 ]
It turns out that I misunderstood the difference between the attr and
attr2 feature bits. "attr" means that at some point an attr fork was
created somewhere in the filesystem. "attr2" means that inodes have
variable-sized forks, but says nothing about whether or not there
actually /are/ attr forks in the system.
If we have an attr fork, we only need to check that attr is set.
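To illustrate the difference, here is a minimal user-space sketch of the old and new checks. The DEMO_FEAT_* bits and the demo_* function names are hypothetical stand-ins for the real feature helpers, not kernel code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two superblock feature bits. */
enum {
    DEMO_FEAT_ATTR  = 1u << 0,  /* an attr fork was created at some point */
    DEMO_FEAT_ATTR2 = 1u << 1,  /* inodes have variable-sized fork areas */
};

/* Old check: an attr fork was flagged only if neither bit was set. */
bool demo_attr_fork_corrupt_old(unsigned int features)
{
    return !(features & DEMO_FEAT_ATTR) && !(features & DEMO_FEAT_ATTR2);
}

/* New check: an existing attr fork requires the attr bit, nothing else. */
bool demo_attr_fork_corrupt_new(unsigned int features)
{
    return !(features & DEMO_FEAT_ATTR);
}
```

The two versions differ only for a filesystem with attr2 set and attr clear: the old check let an attr fork pass there, the new one flags it.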
Fixes: 99d9d8d05da26 ("xfs: scrub inode block mappings")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/xfs/scrub/bmap.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c
index f0b9cb6506fd..45b135929144 100644
--- a/fs/xfs/scrub/bmap.c
+++ b/fs/xfs/scrub/bmap.c
@@ -645,11 +645,17 @@ xchk_bmap(
xchk_ino_set_corrupt(sc, sc->ip->i_ino);
goto out;
}
break;
case XFS_ATTR_FORK:
- if (!xfs_has_attr(mp) && !xfs_has_attr2(mp))
+ /*
+ * "attr" means that an attr fork was created at some point in
+ * the life of this filesystem. "attr2" means that inodes have
+ * variable-sized data/attr fork areas. Hence we only check
+ * attr here.
+ */
+ if (!xfs_has_attr(mp))
xchk_ino_set_corrupt(sc, sc->ip->i_ino);
break;
default:
ASSERT(whichfork == XFS_DATA_FORK);
break;
--
2.50.0.rc1.591.g9c95f17f64-goog
* [PATCH 6.1 19/23] xfs: conditionally allow FS_XFLAG_REALTIME changes if S_DAX is set
2025-06-11 21:01 [PATCH 6.1 00/23] fixes from 6.11 for 6.1.y Leah Rumancik
` (17 preceding siblings ...)
2025-06-11 21:01 ` [PATCH 6.1 18/23] xfs: attr forks require attr, not attr2 Leah Rumancik
@ 2025-06-11 21:01 ` Leah Rumancik
2025-06-11 21:01 ` [PATCH 6.1 20/23] xfs: Fix the owner setting issue for rmap query in xfs fsmap Leah Rumancik
` (4 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: Leah Rumancik @ 2025-06-11 21:01 UTC (permalink / raw)
To: stable
Cc: xfs-stable, chandan.babu, catherine.hoang, djwong,
Christoph Hellwig, Chandan Babu R, Leah Rumancik
From: "Darrick J. Wong" <djwong@kernel.org>
[ Upstream commit 8d16762047c627073955b7ed171a36addaf7b1ff ]
If a file has the S_DAX flag (aka fsdax access mode) set, we cannot
allow users to change the realtime flag unless the datadev and rtdev
both support fsdax access modes. Even if there are no extents allocated
to the file, the setattr thread could be racing with another thread
that has already started down the write code paths.
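The condition can be sketched in user space as follows. struct demo_target and demo_rtflag_change_blocked() are hypothetical names for illustration; a NULL daxdev pointer is assumed to mean "this device does not support fsdax":

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer target; NULL daxdev means no fsdax. */
struct demo_target {
    const void *daxdev;
};

/*
 * Returns true if a realtime flag change must be rejected: the file is in
 * DAX mode and either the data device, or an rt device that is present,
 * lacks fsdax support.
 */
bool demo_rtflag_change_blocked(bool is_dax,
                                const struct demo_target *ddev,
                                const struct demo_target *rtdev)
{
    if (!is_dax)
        return false;
    if (ddev->daxdev == NULL)
        return true;
    return rtdev != NULL && rtdev->daxdev == NULL;
}
```

Note that a missing rt device does not block the change in this predicate; the existing checks that follow it in xfs_ioctl_setattr_xflags() are what require a realtime device when the flag is being set.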
Fixes: ba23cba9b3bdc ("fs: allow per-device dax status checking for filesystems")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/xfs/xfs_ioctl.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 1afb1b1b831e..ef3dc0778566 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1126,10 +1126,21 @@ xfs_ioctl_setattr_xflags(
if (rtflag != XFS_IS_REALTIME_INODE(ip)) {
/* Can't change realtime flag if any extents are allocated. */
if (ip->i_df.if_nextents || ip->i_delayed_blks)
return -EINVAL;
+
+ /*
+ * If S_DAX is enabled on this file, we can only switch the
+ * device if both support fsdax. We can't update S_DAX because
+ * there might be other threads walking down the access paths.
+ */
+ if (IS_DAX(VFS_I(ip)) &&
+ (mp->m_ddev_targp->bt_daxdev == NULL ||
+ (mp->m_rtdev_targp &&
+ mp->m_rtdev_targp->bt_daxdev == NULL)))
+ return -EINVAL;
}
if (rtflag) {
/* If realtime flag is set then must have realtime device */
if (mp->m_sb.sb_rblocks == 0 || mp->m_sb.sb_rextsize == 0 ||
--
2.50.0.rc1.591.g9c95f17f64-goog
* [PATCH 6.1 20/23] xfs: Fix the owner setting issue for rmap query in xfs fsmap
2025-06-11 21:01 [PATCH 6.1 00/23] fixes from 6.11 for 6.1.y Leah Rumancik
` (18 preceding siblings ...)
2025-06-11 21:01 ` [PATCH 6.1 19/23] xfs: conditionally allow FS_XFLAG_REALTIME changes if S_DAX is set Leah Rumancik
@ 2025-06-11 21:01 ` Leah Rumancik
2025-06-11 21:01 ` [PATCH 6.1 21/23] xfs: use XFS_BUF_DADDR_NULL for daddrs in getfsmap code Leah Rumancik
` (3 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: Leah Rumancik @ 2025-06-11 21:01 UTC (permalink / raw)
To: stable
Cc: xfs-stable, chandan.babu, catherine.hoang, djwong, Zizhi Wo,
Chandan Babu R, Leah Rumancik
From: Zizhi Wo <wozizhi@huawei.com>
[ Upstream commit 68415b349f3f16904f006275757f4fcb34b8ee43 ]
I noticed an rmap query bug in xfs_io fsmap:
[root@fedora ~]# xfs_io -c 'fsmap -vvvv' /mnt
EXT: DEV BLOCK-RANGE OWNER FILE-OFFSET AG AG-OFFSET TOTAL
0: 253:16 [0..7]: static fs metadata 0 (0..7) 8
1: 253:16 [8..23]: per-AG metadata 0 (8..23) 16
2: 253:16 [24..39]: inode btree 0 (24..39) 16
3: 253:16 [40..47]: per-AG metadata 0 (40..47) 8
4: 253:16 [48..55]: refcount btree 0 (48..55) 8
5: 253:16 [56..103]: per-AG metadata 0 (56..103) 48
6: 253:16 [104..127]: free space 0 (104..127) 24
......
Bug:
[root@fedora ~]# xfs_io -c 'fsmap -vvvv -d 0 3' /mnt
[root@fedora ~]#
Normally, we should be able to get one record, but we got nothing.
The root cause of this problem lies in the incorrect setting of rm_owner in
the rmap query. In the case of the initial query where the owner is not
set, __xfs_getfsmap_datadev() first sets info->high.rm_owner to ULLONG_MAX.
This is done to prevent any omissions when comparing rmap items. However,
if the current ag is detected to be the last one, the function sets info's
high_irec based on the provided key. If high->rm_owner is not specified, it
should continue to be set to ULLONG_MAX; otherwise, there will be issues
with interval omissions. For example, consider "start" and "end" within the
same block. If high->rm_owner == 0, it will be smaller than the owner of the
record found in the rmapbt, so the query returns no records. The main call
stack is as follows:
xfs_ioc_getfsmap
xfs_getfsmap
xfs_getfsmap_datadev_rmapbt
__xfs_getfsmap_datadev
info->high.rm_owner = ULLONG_MAX
if (pag->pag_agno == end_ag)
xfs_fsmap_owner_to_rmap
// set info->high.rm_owner = 0 because fmr_owner == -1ULL
dest->rm_owner = 0
// get nothing
xfs_getfsmap_datadev_rmapbt_query
The problem can be resolved by changing the internal logic of
xfs_fsmap_owner_to_rmap() so that a key owner of -1ULL is preserved instead
of being reset to 0.
After applying this patch, the above problem is solved:
[root@fedora ~]# xfs_io -c 'fsmap -vvvv -d 0 3' /mnt
EXT: DEV BLOCK-RANGE OWNER FILE-OFFSET AG AG-OFFSET TOTAL
0: 253:16 [0..7]: static fs metadata 0 (0..7) 8
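The effect of the owner mapping can be modelled roughly as below. This is a simplified sketch: only the two sentinel owners are handled, and the high-key comparison is reduced to a (start block, owner) pair, whereas real rmap keys also carry an offset and flags. All demo_* names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pre-patch: "lowest" (0) and "highest" (-1ULL) owners both became 0. */
uint64_t demo_owner_to_rmap_old(uint64_t fmr_owner)
{
    if (fmr_owner == 0 || fmr_owner == UINT64_MAX)
        return 0;
    return fmr_owner;
}

/* Post-patch: the sentinel owners are passed through unchanged. */
uint64_t demo_owner_to_rmap_new(uint64_t fmr_owner)
{
    return fmr_owner;
}

/* A record is in range if its (start, owner) key is <= the high key. */
bool demo_rec_within_high(uint64_t rec_start, uint64_t rec_owner,
                          uint64_t high_start, uint64_t high_owner)
{
    if (rec_start != high_start)
        return rec_start < high_start;
    return rec_owner <= high_owner;
}
```

With a record and a high key that share the same start block, the old mapping yields a high owner of 0, so any record with a nonzero owner compares as past the end of the range. The new mapping keeps the high owner at the maximum value and the record is returned.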
Fixes: e89c041338ed ("xfs: implement the GETFSMAP ioctl")
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/xfs/xfs_fsmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c
index 956a5670e56c..1efd18437ca4 100644
--- a/fs/xfs/xfs_fsmap.c
+++ b/fs/xfs/xfs_fsmap.c
@@ -69,11 +69,11 @@ xfs_fsmap_owner_to_rmap(
}
switch (src->fmr_owner) {
case 0: /* "lowest owner id possible" */
case -1ULL: /* "highest owner id possible" */
- dest->rm_owner = 0;
+ dest->rm_owner = src->fmr_owner;
break;
case XFS_FMR_OWN_FREE:
dest->rm_owner = XFS_RMAP_OWN_NULL;
break;
case XFS_FMR_OWN_UNKNOWN:
--
2.50.0.rc1.591.g9c95f17f64-goog
* [PATCH 6.1 21/23] xfs: use XFS_BUF_DADDR_NULL for daddrs in getfsmap code
2025-06-11 21:01 [PATCH 6.1 00/23] fixes from 6.11 for 6.1.y Leah Rumancik
` (19 preceding siblings ...)
2025-06-11 21:01 ` [PATCH 6.1 20/23] xfs: Fix the owner setting issue for rmap query in xfs fsmap Leah Rumancik
@ 2025-06-11 21:01 ` Leah Rumancik
2025-06-11 21:01 ` [PATCH 6.1 22/23] xfs: take m_growlock when running growfsrt Leah Rumancik
` (2 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: Leah Rumancik @ 2025-06-11 21:01 UTC (permalink / raw)
To: stable
Cc: xfs-stable, chandan.babu, catherine.hoang, djwong, wozizhi,
Christoph Hellwig, Chandan Babu R, Leah Rumancik
From: "Darrick J. Wong" <djwong@kernel.org>
[ Upstream commit 6b35cc8d9239569700cc7cc737c8ed40b8b9cfdb ]
Use XFS_BUF_DADDR_NULL (instead of a magic sentinel value) to mean "this
field is null" like the rest of xfs.
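A small sketch of the named-sentinel pattern, assuming a signed 64-bit disk address type. demo_daddr_t and DEMO_DADDR_NULL are hypothetical stand-ins for xfs_daddr_t and XFS_BUF_DADDR_NULL, and the rmap-key fallback of the real helper is omitted:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t demo_daddr_t;                    /* assumed signed 64-bit */
#define DEMO_DADDR_NULL ((demo_daddr_t)-1)       /* "no address set" */

/*
 * Returns true if the record lies before the requested start address.
 * When no start address has been set, nothing is filtered here.
 */
bool demo_rec_before_start(demo_daddr_t low_daddr, demo_daddr_t rec_daddr)
{
    if (low_daddr != DEMO_DADDR_NULL)
        return rec_daddr < low_daddr;
    return false;
}
```

Naming the sentinel keeps its type and the type of the field it is compared against in one place, instead of relying on an unsigned literal converting to the same bit pattern.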
Cc: wozizhi@huawei.com
Fixes: e89c041338ed6 ("xfs: implement the GETFSMAP ioctl")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/xfs/xfs_fsmap.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c
index 1efd18437ca4..a0668a1ef100 100644
--- a/fs/xfs/xfs_fsmap.c
+++ b/fs/xfs/xfs_fsmap.c
@@ -250,11 +250,11 @@ static inline bool
xfs_getfsmap_rec_before_start(
struct xfs_getfsmap_info *info,
const struct xfs_rmap_irec *rec,
xfs_daddr_t rec_daddr)
{
- if (info->low_daddr != -1ULL)
+ if (info->low_daddr != XFS_BUF_DADDR_NULL)
return rec_daddr < info->low_daddr;
if (info->low.rm_blockcount)
return xfs_rmap_compare(rec, &info->low) < 0;
return false;
}
@@ -984,11 +984,11 @@ xfs_getfsmap(
break;
info.dev = handlers[i].dev;
info.last = false;
info.pag = NULL;
- info.low_daddr = -1ULL;
+ info.low_daddr = XFS_BUF_DADDR_NULL;
info.low.rm_blockcount = 0;
error = handlers[i].fn(tp, dkeys, &info);
if (error)
break;
xfs_trans_cancel(tp);
--
2.50.0.rc1.591.g9c95f17f64-goog
* [PATCH 6.1 22/23] xfs: take m_growlock when running growfsrt
2025-06-11 21:01 [PATCH 6.1 00/23] fixes from 6.11 for 6.1.y Leah Rumancik
` (20 preceding siblings ...)
2025-06-11 21:01 ` [PATCH 6.1 21/23] xfs: use XFS_BUF_DADDR_NULL for daddrs in getfsmap code Leah Rumancik
@ 2025-06-11 21:01 ` Leah Rumancik
2025-06-11 21:01 ` [PATCH 6.1 23/23] xfs: reset rootdir extent size hint after growfsrt Leah Rumancik
2025-06-14 13:54 ` [PATCH 6.1 00/23] fixes from 6.11 for 6.1.y Sasha Levin
23 siblings, 0 replies; 25+ messages in thread
From: Leah Rumancik @ 2025-06-11 21:01 UTC (permalink / raw)
To: stable
Cc: xfs-stable, chandan.babu, catherine.hoang, djwong,
Christoph Hellwig, Chandan Babu R, Leah Rumancik
From: "Darrick J. Wong" <djwong@kernel.org>
[ Upstream commit 16e1fbdce9c8d084863fd63cdaff8fb2a54e2f88 ]
Take the grow lock when we're expanding the realtime volume, like we do
for the other growfs calls.
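The shape of the change is the usual "trylock, then funnel every exit through one unlock label" pattern. A minimal user-space sketch follows, using a C11 atomic_flag in place of the kernel mutex; demo_growlock, demo_grow() and the helpers are hypothetical names:

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for mp->m_growlock. */
static atomic_flag demo_growlock = ATOMIC_FLAG_INIT;

/* Returns nonzero if the lock was acquired, like mutex_trylock(). */
int demo_trylock(void)
{
    return !atomic_flag_test_and_set(&demo_growlock);
}

void demo_unlock(void)
{
    atomic_flag_clear(&demo_growlock);
}

/* Every failure after the trylock leaves through out_unlock. */
int demo_grow(uint64_t cur_blocks, uint64_t new_blocks)
{
    int error;

    if (!demo_trylock())
        return -EWOULDBLOCK;

    /* Shrink not supported. */
    error = -EINVAL;
    if (new_blocks <= cur_blocks)
        goto out_unlock;

    error = 0;
out_unlock:
    demo_unlock();
    return error;
}
```

Setting the error code before a group of checks, as the patch does, keeps each check to a single goto, while a second caller that arrives during a grow gets -EWOULDBLOCK instead of waiting.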
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/xfs/xfs_rtalloc.c | 38 +++++++++++++++++++++++++-------------
1 file changed, 25 insertions(+), 13 deletions(-)
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 5cf1e91f4c20..149fcfc485d8 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -951,83 +951,93 @@ xfs_growfs_rt(
return -EPERM;
/* Needs to have been mounted with an rt device. */
if (!XFS_IS_REALTIME_MOUNT(mp))
return -EINVAL;
+
+ if (!mutex_trylock(&mp->m_growlock))
+ return -EWOULDBLOCK;
/*
* Mount should fail if the rt bitmap/summary files don't load, but
* we'll check anyway.
*/
+ error = -EINVAL;
if (!mp->m_rbmip || !mp->m_rsumip)
- return -EINVAL;
+ goto out_unlock;
/* Shrink not supported. */
if (in->newblocks <= sbp->sb_rblocks)
- return -EINVAL;
+ goto out_unlock;
/* Can only change rt extent size when adding rt volume. */
if (sbp->sb_rblocks > 0 && in->extsize != sbp->sb_rextsize)
- return -EINVAL;
+ goto out_unlock;
/* Range check the extent size. */
if (XFS_FSB_TO_B(mp, in->extsize) > XFS_MAX_RTEXTSIZE ||
XFS_FSB_TO_B(mp, in->extsize) < XFS_MIN_RTEXTSIZE)
- return -EINVAL;
+ goto out_unlock;
/* Unsupported realtime features. */
+ error = -EOPNOTSUPP;
if (xfs_has_rmapbt(mp) || xfs_has_reflink(mp) || xfs_has_quota(mp))
- return -EOPNOTSUPP;
+ goto out_unlock;
nrblocks = in->newblocks;
error = xfs_sb_validate_fsb_count(sbp, nrblocks);
if (error)
- return error;
+ goto out_unlock;
/*
* Read in the last block of the device, make sure it exists.
*/
error = xfs_buf_read_uncached(mp->m_rtdev_targp,
XFS_FSB_TO_BB(mp, nrblocks - 1),
XFS_FSB_TO_BB(mp, 1), 0, &bp, NULL);
if (error)
- return error;
+ goto out_unlock;
xfs_buf_relse(bp);
/*
* Calculate new parameters. These are the final values to be reached.
*/
nrextents = nrblocks;
do_div(nrextents, in->extsize);
- if (!xfs_validate_rtextents(nrextents))
- return -EINVAL;
+ if (!xfs_validate_rtextents(nrextents)) {
+ error = -EINVAL;
+ goto out_unlock;
+ }
nrbmblocks = howmany_64(nrextents, NBBY * sbp->sb_blocksize);
nrextslog = xfs_compute_rextslog(nrextents);
nrsumlevels = nrextslog + 1;
nrsumsize = (uint)sizeof(xfs_suminfo_t) * nrsumlevels * nrbmblocks;
nrsumblocks = XFS_B_TO_FSB(mp, nrsumsize);
nrsumsize = XFS_FSB_TO_B(mp, nrsumblocks);
/*
* New summary size can't be more than half the size of
* the log. This prevents us from getting a log overflow,
* since we'll log basically the whole summary file at once.
*/
- if (nrsumblocks > (mp->m_sb.sb_logblocks >> 1))
- return -EINVAL;
+ if (nrsumblocks > (mp->m_sb.sb_logblocks >> 1)) {
+ error = -EINVAL;
+ goto out_unlock;
+ }
+
/*
* Get the old block counts for bitmap and summary inodes.
* These can't change since other growfs callers are locked out.
*/
rbmblocks = XFS_B_TO_FSB(mp, mp->m_rbmip->i_disk_size);
rsumblocks = XFS_B_TO_FSB(mp, mp->m_rsumip->i_disk_size);
/*
* Allocate space to the bitmap and summary files, as necessary.
*/
error = xfs_growfs_rt_alloc(mp, rbmblocks, nrbmblocks, mp->m_rbmip);
if (error)
- return error;
+ goto out_unlock;
error = xfs_growfs_rt_alloc(mp, rsumblocks, nrsumblocks, mp->m_rsumip);
if (error)
- return error;
+ goto out_unlock;
rsum_cache = mp->m_rsum_cache;
if (nrbmblocks != sbp->sb_rbmblocks)
xfs_alloc_rsum_cache(mp, nrbmblocks);
@@ -1188,10 +1198,12 @@ xfs_growfs_rt(
} else {
kmem_free(rsum_cache);
}
}
+out_unlock:
+ mutex_unlock(&mp->m_growlock);
return error;
}
/*
* Allocate an extent in the realtime subvolume, with the usual allocation
--
2.50.0.rc1.591.g9c95f17f64-goog
* [PATCH 6.1 23/23] xfs: reset rootdir extent size hint after growfsrt
2025-06-11 21:01 [PATCH 6.1 00/23] fixes from 6.11 for 6.1.y Leah Rumancik
` (21 preceding siblings ...)
2025-06-11 21:01 ` [PATCH 6.1 22/23] xfs: take m_growlock when running growfsrt Leah Rumancik
@ 2025-06-11 21:01 ` Leah Rumancik
2025-06-14 13:54 ` [PATCH 6.1 00/23] fixes from 6.11 for 6.1.y Sasha Levin
23 siblings, 0 replies; 25+ messages in thread
From: Leah Rumancik @ 2025-06-11 21:01 UTC (permalink / raw)
To: stable
Cc: xfs-stable, chandan.babu, catherine.hoang, djwong,
Christoph Hellwig, Chandan Babu R, Leah Rumancik
From: "Darrick J. Wong" <djwong@kernel.org>
[ Upstream commit a24cae8fc1f13f6f6929351309f248fd2e9351ce ]
If growfsrt is run on a filesystem that doesn't have a rt volume, it's
possible to change the rt extent size. If the root directory was
previously set up with an inherited extent size hint and rtinherit, it's
possible that the hint is no longer a multiple of the rt extent size.
Although the verifiers don't complain about this, xfs_repair will, so if
we detect this situation, log the root directory to clean it up. This
is still racy, but it's better than nothing.
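The "no longer a multiple" condition is simple modular arithmetic. A sketch, assuming both values are expressed in filesystem blocks; demo_hint_misaligned() is a hypothetical helper and is not part of the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if an inherited extent size hint is not a whole multiple
 * of the realtime extent size. A zero rt extent size is treated as
 * "no rt volume", so nothing can be misaligned.
 */
bool demo_hint_misaligned(uint32_t extsize_hint, uint32_t rextsize)
{
    if (rextsize == 0)
        return false;
    return extsize_hint % rextsize != 0;
}
```

For example, a hint of 8 blocks is a multiple of an rt extent size of 4 but not of 3, so only the second case would be reported by xfs_repair.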
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/xfs/xfs_rtalloc.c | 40 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 40 insertions(+)
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 149fcfc485d8..fc21b4e81ade 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -913,10 +913,43 @@ xfs_alloc_rsum_cache(
mp->m_rsum_cache = kvzalloc(rbmblocks, GFP_KERNEL);
if (!mp->m_rsum_cache)
xfs_warn(mp, "could not allocate realtime summary cache");
}
+/*
+ * If we changed the rt extent size (meaning there was no rt volume previously)
+ * and the root directory had EXTSZINHERIT and RTINHERIT set, it's possible
+ * that the extent size hint on the root directory is no longer congruent with
+ * the new rt extent size. Log the rootdir inode to fix this.
+ */
+static int
+xfs_growfs_rt_fixup_extsize(
+ struct xfs_mount *mp)
+{
+ struct xfs_inode *ip = mp->m_rootip;
+ struct xfs_trans *tp;
+ int error = 0;
+
+ xfs_ilock(ip, XFS_IOLOCK_EXCL);
+ if (!(ip->i_diflags & XFS_DIFLAG_RTINHERIT) ||
+ !(ip->i_diflags & XFS_DIFLAG_EXTSZINHERIT))
+ goto out_iolock;
+
+ error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_ichange, 0, 0, false,
+ &tp);
+ if (error)
+ goto out_iolock;
+
+ xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+ error = xfs_trans_commit(tp);
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+
+out_iolock:
+ xfs_iunlock(ip, XFS_IOLOCK_EXCL);
+ return error;
+}
+
/*
* Visible (exported) functions.
*/
/*
@@ -942,10 +975,11 @@ xfs_growfs_rt(
xfs_extlen_t rbmblocks; /* current number of rt bitmap blocks */
xfs_extlen_t rsumblocks; /* current number of rt summary blks */
xfs_sb_t *sbp; /* old superblock */
xfs_fsblock_t sumbno; /* summary block number */
uint8_t *rsum_cache; /* old summary cache */
+ xfs_agblock_t old_rextsize = mp->m_sb.sb_rextsize;
sbp = &mp->m_sb;
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
@@ -1175,10 +1209,16 @@ xfs_growfs_rt(
mp->m_features |= XFS_FEAT_REALTIME;
}
if (error)
goto out_free;
+ if (old_rextsize != in->extsize) {
+ error = xfs_growfs_rt_fixup_extsize(mp);
+ if (error)
+ goto out_free;
+ }
+
/* Update secondary superblocks now the physical grow has completed */
error = xfs_update_secondary_sbs(mp);
out_free:
/*
--
2.50.0.rc1.591.g9c95f17f64-goog
* Re: [PATCH 6.1 00/23] fixes from 6.11 for 6.1.y
2025-06-11 21:01 [PATCH 6.1 00/23] fixes from 6.11 for 6.1.y Leah Rumancik
` (22 preceding siblings ...)
2025-06-11 21:01 ` [PATCH 6.1 23/23] xfs: reset rootdir extent size hint after growfsrt Leah Rumancik
@ 2025-06-14 13:54 ` Sasha Levin
23 siblings, 0 replies; 25+ messages in thread
From: Sasha Levin @ 2025-06-14 13:54 UTC (permalink / raw)
To: Leah Rumancik; +Cc: stable, xfs-stable, chandan.babu, catherine.hoang, djwong
On Wed, Jun 11, 2025 at 02:01:04PM -0700, Leah Rumancik wrote:
>Hello again,
>
>This is a series for 6.1.y for fixes from 6.11. It corresponds to the
>6.6.y series here:
>https://lore.kernel.org/linux-xfs/20241218191725.63098-1-catherine.hoang@oracle.com/
Queued up, thanks!
--
Thanks,
Sasha