* [PATCHSET 0/3] xfs: reload entire iunlink lists
@ 2023-09-03 16:15 Darrick J. Wong
2023-09-03 16:15 ` [PATCH 1/3] xfs: use i_prev_unlinked to distinguish inodes that are not on the unlinked list Darrick J. Wong
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Darrick J. Wong @ 2023-09-03 16:15 UTC (permalink / raw)
To: chandan.babu, djwong; +Cc: linux-xfs, david
Hi all,
This is the second part of correcting XFS to reload the incore unlinked
inode list from the ondisk contents. Whereas part one tackled failures
from regular filesystem calls, this part takes on the problem of needing
to reload the entire incore unlinked inode list on account of somebody
loading an inode that's in the /middle/ of an unlinked list. This
happens during quotacheck, bulkstat, or even opening a file by handle.
In this case we don't know the length of the list that we're reloading,
so we don't want to create a new unbounded memory load while holding
resources locked. Instead, we'll target UNTRUSTED iget calls to reload
the entire bucket.
Note that this changes the definition of the incore unlinked inode list
slightly -- i_prev_unlinked == 0 now means "not on the incore list".
If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.
This has been lightly tested with fstests. Enjoy!
Comments and questions are, as always, welcome.
--D
kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=fix-iunlink-list-6.6
---
fs/xfs/xfs_export.c | 6 +++
fs/xfs/xfs_icache.c | 2 -
fs/xfs/xfs_inode.c | 115 +++++++++++++++++++++++++++++++++++++++++++++++++--
fs/xfs/xfs_inode.h | 34 ++++++++++++++-
fs/xfs/xfs_itable.c | 9 ++++
fs/xfs/xfs_mount.h | 7 +++
fs/xfs/xfs_qm.c | 7 +++
fs/xfs/xfs_trace.h | 20 +++++++++
8 files changed, 193 insertions(+), 7 deletions(-)
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/3] xfs: use i_prev_unlinked to distinguish inodes that are not on the unlinked list
2023-09-03 16:15 [PATCHSET 0/3] xfs: reload entire iunlink lists Darrick J. Wong
@ 2023-09-03 16:15 ` Darrick J. Wong
2023-09-07 6:27 ` Dave Chinner
2023-09-03 16:15 ` [PATCH 2/3] xfs: reload entire unlinked bucket lists Darrick J. Wong
2023-09-03 16:16 ` [PATCH 3/3] xfs: make inode unlinked bucket recovery work with quotacheck Darrick J. Wong
2 siblings, 1 reply; 11+ messages in thread
From: Darrick J. Wong @ 2023-09-03 16:15 UTC (permalink / raw)
To: chandan.babu, djwong; +Cc: linux-xfs, david
From: Darrick J. Wong <djwong@kernel.org>
Alter the definition of i_prev_unlinked slightly to make it more obvious
when an inode with 0 link count is not part of the iunlink bucket lists
rooted in the AGI. This distinction is necessary because it is not
sufficient to check inode.i_nlink to decide if an inode is on the
unlinked list. Updates to i_nlink can happen while holding only
ILOCK_EXCL, but updates to an inode's position in the AGI unlinked list
(which happen after the nlink update) requires both ILOCK_EXCL and the
AGI buffer lock.
The next few patches will make it possible to reload an entire unlinked
bucket list when we're walking the inode table or performing handle
operations and need more than the ability to iget the last inode in the
chain.
The upcoming directory repair code also needs to be able to make this
distinction to decide if a zero link count directory should be moved to
the orphanage or allowed to inactivate. An upcoming enhancement to the
online AGI fsck code will need this distinction to check and rebuild the
AGI unlinked buckets.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
fs/xfs/xfs_icache.c | 2 +-
fs/xfs/xfs_inode.c | 3 ++-
fs/xfs/xfs_inode.h | 20 +++++++++++++++++++-
3 files changed, 22 insertions(+), 3 deletions(-)
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 30d7454a9b93..3c210ac83713 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -113,7 +113,7 @@ xfs_inode_alloc(
INIT_LIST_HEAD(&ip->i_ioend_list);
spin_lock_init(&ip->i_ioend_lock);
ip->i_next_unlinked = NULLAGINO;
- ip->i_prev_unlinked = NULLAGINO;
+ ip->i_prev_unlinked = 0;
return ip;
}
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 3e3df3d8bcdd..6cd2f29b540a 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2015,6 +2015,7 @@ xfs_iunlink_insert_inode(
}
/* Point the head of the list to point to this inode. */
+ ip->i_prev_unlinked = NULLAGINO;
return xfs_iunlink_update_bucket(tp, pag, agibp, bucket_index, agino);
}
@@ -2117,7 +2118,7 @@ xfs_iunlink_remove_inode(
}
ip->i_next_unlinked = NULLAGINO;
- ip->i_prev_unlinked = NULLAGINO;
+ ip->i_prev_unlinked = 0;
return error;
}
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 7547caf2f2ab..65aae8925509 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -68,8 +68,21 @@ typedef struct xfs_inode {
uint64_t i_diflags2; /* XFS_DIFLAG2_... */
struct timespec64 i_crtime; /* time created */
- /* unlinked list pointers */
+ /*
+ * Unlinked list pointers. These point to the next and previous inodes
+ * in the AGI unlinked bucket list, respectively. These fields can
+ * only be updated with the AGI locked.
+ *
+ * i_next_unlinked caches di_next_unlinked.
+ */
xfs_agino_t i_next_unlinked;
+
+ /*
+ * If the inode is not on an unlinked list, this field is zero. If the
+ * inode is the first element in an unlinked list, this field is
+ * NULLAGINO. Otherwise, i_prev_unlinked points to the previous inode
+ * in the unlinked list.
+ */
xfs_agino_t i_prev_unlinked;
/* VFS inode */
@@ -81,6 +94,11 @@ typedef struct xfs_inode {
struct list_head i_ioend_list;
} xfs_inode_t;
+static inline bool xfs_inode_on_unlinked_list(const struct xfs_inode *ip)
+{
+ return ip->i_prev_unlinked != 0;
+}
+
static inline bool xfs_inode_has_attr_fork(struct xfs_inode *ip)
{
return ip->i_forkoff > 0;
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 2/3] xfs: reload entire unlinked bucket lists
2023-09-03 16:15 [PATCHSET 0/3] xfs: reload entire iunlink lists Darrick J. Wong
2023-09-03 16:15 ` [PATCH 1/3] xfs: use i_prev_unlinked to distinguish inodes that are not on the unlinked list Darrick J. Wong
@ 2023-09-03 16:15 ` Darrick J. Wong
2023-09-07 7:02 ` Dave Chinner
2023-09-03 16:16 ` [PATCH 3/3] xfs: make inode unlinked bucket recovery work with quotacheck Darrick J. Wong
2 siblings, 1 reply; 11+ messages in thread
From: Darrick J. Wong @ 2023-09-03 16:15 UTC (permalink / raw)
To: chandan.babu, djwong; +Cc: linux-xfs, david
From: Darrick J. Wong <djwong@kernel.org>
The previous patch to reload unrecovered unlinked inodes when adding a
newly created inode to the unlinked list is missing a key piece of
functionality. It doesn't handle the case that someone calls xfs_iget
on an inode that is not the last item in the incore list. For example,
if at mount time the ondisk iunlink bucket looks like this:
AGI -> 7 -> 22 -> 3 -> NULL
None of these three inodes are cached in memory. Now let's say that
someone tries to open inode 3 by handle. We need to walk the list to
make sure that inodes 7 and 22 get loaded cold, and that the
i_prev_unlinked of inode 3 gets set to 22.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
fs/xfs/xfs_export.c | 6 +++
fs/xfs/xfs_inode.c | 100 +++++++++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_inode.h | 9 +++++
fs/xfs/xfs_itable.c | 9 +++++
fs/xfs/xfs_trace.h | 20 ++++++++++
5 files changed, 144 insertions(+)
diff --git a/fs/xfs/xfs_export.c b/fs/xfs/xfs_export.c
index 1064c2342876..f71ea786a6d2 100644
--- a/fs/xfs/xfs_export.c
+++ b/fs/xfs/xfs_export.c
@@ -146,6 +146,12 @@ xfs_nfs_get_inode(
return ERR_PTR(error);
}
+ error = xfs_inode_reload_unlinked(ip);
+ if (error) {
+ xfs_irele(ip);
+ return ERR_PTR(error);
+ }
+
if (VFS_I(ip)->i_generation != generation) {
xfs_irele(ip);
return ERR_PTR(-ESTALE);
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 6cd2f29b540a..56f6bde6001b 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3607,3 +3607,103 @@ xfs_iunlock2_io_mmap(
if (ip1 != ip2)
inode_unlock(VFS_I(ip1));
}
+
+/*
+ * Reload the incore inode list for this inode. Caller should ensure that
+ * the link count cannot change, either by taking ILOCK_SHARED or otherwise
+ * preventing other threads from executing.
+ */
+int
+xfs_inode_reload_unlinked_bucket(
+ struct xfs_trans *tp,
+ struct xfs_inode *ip)
+{
+ struct xfs_mount *mp = tp->t_mountp;
+ struct xfs_buf *agibp;
+ struct xfs_agi *agi;
+ struct xfs_perag *pag;
+ xfs_agnumber_t agno = XFS_INO_TO_AGNO(mp, ip->i_ino);
+ xfs_agino_t agino = XFS_INO_TO_AGINO(mp, ip->i_ino);
+ xfs_agino_t prev_agino, next_agino;
+ unsigned int bucket;
+ bool foundit = false;
+ int error;
+
+ /* Grab the first inode in the list */
+ pag = xfs_perag_get(mp, agno);
+ error = xfs_ialloc_read_agi(pag, tp, &agibp);
+ xfs_perag_put(pag);
+ if (error)
+ return error;
+
+ bucket = agino % XFS_AGI_UNLINKED_BUCKETS;
+ agi = agibp->b_addr;
+
+ trace_xfs_inode_reload_unlinked_bucket(ip);
+
+ xfs_info_ratelimited(mp,
+ "Found unrecovered unlinked inode 0x%x in AG 0x%x. Initiating list recovery.",
+ agino, agno);
+
+ prev_agino = NULLAGINO;
+ next_agino = be32_to_cpu(agi->agi_unlinked[bucket]);
+ while (next_agino != NULLAGINO) {
+ struct xfs_inode *next_ip = NULL;
+
+ if (next_agino == agino) {
+ /* Found this inode, set its backlink. */
+ next_ip = ip;
+ next_ip->i_prev_unlinked = prev_agino;
+ foundit = true;
+ }
+ if (!next_ip) {
+ /* Inode already in memory. */
+ next_ip = xfs_iunlink_lookup(pag, next_agino);
+ }
+ if (!next_ip) {
+ /* Inode not in memory, reload. */
+ error = xfs_iunlink_reload_next(tp, agibp, prev_agino,
+ next_agino);
+ if (error)
+ break;
+
+ next_ip = xfs_iunlink_lookup(pag, next_agino);
+ }
+ if (!next_ip) {
+ /* No incore inode at all? We reloaded it... */
+ ASSERT(next_ip != NULL);
+ error = -EFSCORRUPTED;
+ break;
+ }
+
+ prev_agino = next_agino;
+ next_agino = next_ip->i_next_unlinked;
+ }
+
+ xfs_trans_brelse(tp, agibp);
+ /* Should have found this inode somewhere in the iunlinked bucket. */
+ if (!error && !foundit)
+ error = -EFSCORRUPTED;
+ return error;
+}
+
+/* Decide if this inode is missing its unlinked list and reload it. */
+int
+xfs_inode_reload_unlinked(
+ struct xfs_inode *ip)
+{
+ struct xfs_trans *tp;
+ int error;
+
+ error = xfs_trans_alloc_empty(ip->i_mount, &tp);
+ if (error)
+ return error;
+
+ xfs_ilock(ip, XFS_ILOCK_SHARED);
+ if (xfs_inode_unlinked_incomplete(ip))
+ error = xfs_inode_reload_unlinked_bucket(tp, ip);
+ xfs_iunlock(ip, XFS_ILOCK_SHARED);
+ xfs_trans_cancel(tp);
+
+ return error;
+}
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 65aae8925509..a111b5551ecd 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -593,4 +593,13 @@ void xfs_end_io(struct work_struct *work);
int xfs_ilock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2);
void xfs_iunlock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2);
+static inline bool
+xfs_inode_unlinked_incomplete(
+ struct xfs_inode *ip)
+{
+ return VFS_I(ip)->i_nlink == 0 && !xfs_inode_on_unlinked_list(ip);
+}
+int xfs_inode_reload_unlinked_bucket(struct xfs_trans *tp, struct xfs_inode *ip);
+int xfs_inode_reload_unlinked(struct xfs_inode *ip);
+
#endif /* __XFS_INODE_H__ */
diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
index f225413a993c..ea38d69b9922 100644
--- a/fs/xfs/xfs_itable.c
+++ b/fs/xfs/xfs_itable.c
@@ -80,6 +80,15 @@ xfs_bulkstat_one_int(
if (error)
goto out;
+ if (xfs_inode_unlinked_incomplete(ip)) {
+ error = xfs_inode_reload_unlinked_bucket(tp, ip);
+ if (error) {
+ xfs_iunlock(ip, XFS_ILOCK_SHARED);
+ xfs_irele(ip);
+ return error;
+ }
+ }
+
ASSERT(ip != NULL);
ASSERT(ip->i_imap.im_blkno != 0);
inode = VFS_I(ip);
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index f4e46bac9b91..31418d56618d 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3857,6 +3857,26 @@ TRACE_EVENT(xfs_iunlink_reload_next,
__entry->next_agino)
);
+TRACE_EVENT(xfs_inode_reload_unlinked_bucket,
+ TP_PROTO(struct xfs_inode *ip),
+ TP_ARGS(ip),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(xfs_agnumber_t, agno)
+ __field(xfs_agino_t, agino)
+ ),
+ TP_fast_assign(
+ __entry->dev = ip->i_mount->m_super->s_dev;
+ __entry->agno = XFS_INO_TO_AGNO(ip->i_mount, ip->i_ino);
+ __entry->agino = XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino);
+ ),
+ TP_printk("dev %d:%d agno 0x%x agino 0x%x bucket %u",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->agno,
+ __entry->agino,
+ __entry->agino % XFS_AGI_UNLINKED_BUCKETS)
+);
+
DECLARE_EVENT_CLASS(xfs_ag_inode_class,
TP_PROTO(struct xfs_inode *ip),
TP_ARGS(ip),
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 3/3] xfs: make inode unlinked bucket recovery work with quotacheck
2023-09-03 16:15 [PATCHSET 0/3] xfs: reload entire iunlink lists Darrick J. Wong
2023-09-03 16:15 ` [PATCH 1/3] xfs: use i_prev_unlinked to distinguish inodes that are not on the unlinked list Darrick J. Wong
2023-09-03 16:15 ` [PATCH 2/3] xfs: reload entire unlinked bucket lists Darrick J. Wong
@ 2023-09-03 16:16 ` Darrick J. Wong
2023-09-05 16:33 ` [PATCH v1.1 " Darrick J. Wong
2 siblings, 1 reply; 11+ messages in thread
From: Darrick J. Wong @ 2023-09-03 16:16 UTC (permalink / raw)
To: chandan.babu, djwong; +Cc: linux-xfs, david
From: Darrick J. Wong <djwong@kernel.org>
Teach quotacheck to reload the unlinked inode lists when walking the
inode table. This requires extra state handling, since it's possible
that a reloaded inode will get inactivated before quotacheck tries to
scan it; in this case, we need to ensure that the reloaded inode does
not have dquots attached when it is freed.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
fs/xfs/xfs_inode.c | 12 +++++++++---
fs/xfs/xfs_inode.h | 5 ++++-
fs/xfs/xfs_mount.h | 7 +++++++
fs/xfs/xfs_qm.c | 7 +++++++
4 files changed, 27 insertions(+), 4 deletions(-)
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 56f6bde6001b..22af7268169b 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1743,9 +1743,13 @@ xfs_inactive(
ip->i_df.if_nextents > 0 || ip->i_delayed_blks > 0))
truncate = 1;
- error = xfs_qm_dqattach(ip);
- if (error)
- goto out;
+ if (xfs_iflags_test(ip, XFS_IQUOTAUNCHECKED)) {
+ xfs_qm_dqdetach(ip);
+ } else {
+ error = xfs_qm_dqattach(ip);
+ if (error)
+ goto out;
+ }
if (S_ISLNK(VFS_I(ip)->i_mode))
error = xfs_inactive_symlink(ip);
@@ -1963,6 +1967,8 @@ xfs_iunlink_reload_next(
trace_xfs_iunlink_reload_next(next_ip);
rele:
ASSERT(!(VFS_I(next_ip)->i_state & I_DONTCACHE));
+ if (xfs_is_quotacheck_running(mp) && next_ip)
+ xfs_iflags_set(next_ip, XFS_IQUOTAUNCHECKED);
xfs_irele(next_ip);
return error;
}
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index a111b5551ecd..0c5bdb91152e 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -344,6 +344,9 @@ static inline bool xfs_inode_has_large_extent_counts(struct xfs_inode *ip)
*/
#define XFS_INACTIVATING (1 << 13)
+/* Quotacheck is running but inode has not been added to quota counts. */
+#define XFS_IQUOTAUNCHECKED (1 << 14)
+
/* All inode state flags related to inode reclaim. */
#define XFS_ALL_IRECLAIM_FLAGS (XFS_IRECLAIMABLE | \
XFS_IRECLAIM | \
@@ -358,7 +361,7 @@ static inline bool xfs_inode_has_large_extent_counts(struct xfs_inode *ip)
#define XFS_IRECLAIM_RESET_FLAGS \
(XFS_IRECLAIMABLE | XFS_IRECLAIM | \
XFS_IDIRTY_RELEASE | XFS_ITRUNCATED | XFS_NEED_INACTIVE | \
- XFS_INACTIVATING)
+ XFS_INACTIVATING | XFS_IQUOTAUNCHECKED)
/*
* Flags for inode locking.
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 6e2806654e94..e6ae627ac771 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -405,6 +405,8 @@ __XFS_HAS_FEAT(nouuid, NOUUID)
#define XFS_OPSTATE_WARNED_SHRINK 8
/* Kernel has logged a warning about logged xattr updates being used. */
#define XFS_OPSTATE_WARNED_LARP 9
+/* Mount time quotacheck is running */
+#define XFS_OPSTATE_QUOTACHECK_RUNNING 10
#define __XFS_IS_OPSTATE(name, NAME) \
static inline bool xfs_is_ ## name (struct xfs_mount *mp) \
@@ -427,6 +429,11 @@ __XFS_IS_OPSTATE(inode32, INODE32)
__XFS_IS_OPSTATE(readonly, READONLY)
__XFS_IS_OPSTATE(inodegc_enabled, INODEGC_ENABLED)
__XFS_IS_OPSTATE(blockgc_enabled, BLOCKGC_ENABLED)
+#ifdef CONFIG_QUOTA
+__XFS_IS_OPSTATE(quotacheck_running, QUOTACHECK_RUNNING)
+#else
+# define xfs_is_quotacheck_running(mp) (false)
+#endif
static inline bool
xfs_should_warn(struct xfs_mount *mp, long nr)
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 6abcc34fafd8..7256090c3895 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -1160,6 +1160,10 @@ xfs_qm_dqusage_adjust(
if (error)
return error;
+ error = xfs_inode_reload_unlinked(ip);
+ if (error)
+ goto error0;
+
ASSERT(ip->i_delayed_blks == 0);
if (XFS_IS_REALTIME_INODE(ip)) {
@@ -1173,6 +1177,7 @@ xfs_qm_dqusage_adjust(
}
nblks = (xfs_qcnt_t)ip->i_nblocks - rtblks;
+ xfs_iflags_clear(ip, XFS_IQUOTAUNCHECKED);
/*
* Add the (disk blocks and inode) resources occupied by this
@@ -1319,8 +1324,10 @@ xfs_qm_quotacheck(
flags |= XFS_PQUOTA_CHKD;
}
+ xfs_set_quotacheck_running(mp);
error = xfs_iwalk_threaded(mp, 0, 0, xfs_qm_dqusage_adjust, 0, true,
NULL);
+ xfs_clear_quotacheck_running(mp);
/*
* On error, the inode walk may have partially populated the dquot
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v1.1 3/3] xfs: make inode unlinked bucket recovery work with quotacheck
2023-09-03 16:16 ` [PATCH 3/3] xfs: make inode unlinked bucket recovery work with quotacheck Darrick J. Wong
@ 2023-09-05 16:33 ` Darrick J. Wong
2023-09-07 7:11 ` Dave Chinner
0 siblings, 1 reply; 11+ messages in thread
From: Darrick J. Wong @ 2023-09-05 16:33 UTC (permalink / raw)
To: chandan.babu; +Cc: linux-xfs, david
From: Darrick J. Wong <djwong@kernel.org>
Teach quotacheck to reload the unlinked inode lists when walking the
inode table. This requires extra state handling, since it's possible
that a reloaded inode will get inactivated before quotacheck tries to
scan it; in this case, we need to ensure that the reloaded inode does
not have dquots attached when it is freed.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
v1.1: s/CONFIG_QUOTA/CONFIG_XFS_QUOTA/ and fix tracepoint flags decoding
---
fs/xfs/xfs_inode.c | 12 +++++++++---
fs/xfs/xfs_inode.h | 5 ++++-
fs/xfs/xfs_mount.h | 10 +++++++++-
fs/xfs/xfs_qm.c | 7 +++++++
4 files changed, 29 insertions(+), 5 deletions(-)
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 56f6bde6001b..22af7268169b 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1743,9 +1743,13 @@ xfs_inactive(
ip->i_df.if_nextents > 0 || ip->i_delayed_blks > 0))
truncate = 1;
- error = xfs_qm_dqattach(ip);
- if (error)
- goto out;
+ if (xfs_iflags_test(ip, XFS_IQUOTAUNCHECKED)) {
+ xfs_qm_dqdetach(ip);
+ } else {
+ error = xfs_qm_dqattach(ip);
+ if (error)
+ goto out;
+ }
if (S_ISLNK(VFS_I(ip)->i_mode))
error = xfs_inactive_symlink(ip);
@@ -1963,6 +1967,8 @@ xfs_iunlink_reload_next(
trace_xfs_iunlink_reload_next(next_ip);
rele:
ASSERT(!(VFS_I(next_ip)->i_state & I_DONTCACHE));
+ if (xfs_is_quotacheck_running(mp) && next_ip)
+ xfs_iflags_set(next_ip, XFS_IQUOTAUNCHECKED);
xfs_irele(next_ip);
return error;
}
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index a111b5551ecd..0c5bdb91152e 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -344,6 +344,9 @@ static inline bool xfs_inode_has_large_extent_counts(struct xfs_inode *ip)
*/
#define XFS_INACTIVATING (1 << 13)
+/* Quotacheck is running but inode has not been added to quota counts. */
+#define XFS_IQUOTAUNCHECKED (1 << 14)
+
/* All inode state flags related to inode reclaim. */
#define XFS_ALL_IRECLAIM_FLAGS (XFS_IRECLAIMABLE | \
XFS_IRECLAIM | \
@@ -358,7 +361,7 @@ static inline bool xfs_inode_has_large_extent_counts(struct xfs_inode *ip)
#define XFS_IRECLAIM_RESET_FLAGS \
(XFS_IRECLAIMABLE | XFS_IRECLAIM | \
XFS_IDIRTY_RELEASE | XFS_ITRUNCATED | XFS_NEED_INACTIVE | \
- XFS_INACTIVATING)
+ XFS_INACTIVATING | XFS_IQUOTAUNCHECKED)
/*
* Flags for inode locking.
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 6e2806654e94..d19cca099bc3 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -405,6 +405,8 @@ __XFS_HAS_FEAT(nouuid, NOUUID)
#define XFS_OPSTATE_WARNED_SHRINK 8
/* Kernel has logged a warning about logged xattr updates being used. */
#define XFS_OPSTATE_WARNED_LARP 9
+/* Mount time quotacheck is running */
+#define XFS_OPSTATE_QUOTACHECK_RUNNING 10
#define __XFS_IS_OPSTATE(name, NAME) \
static inline bool xfs_is_ ## name (struct xfs_mount *mp) \
@@ -427,6 +429,11 @@ __XFS_IS_OPSTATE(inode32, INODE32)
__XFS_IS_OPSTATE(readonly, READONLY)
__XFS_IS_OPSTATE(inodegc_enabled, INODEGC_ENABLED)
__XFS_IS_OPSTATE(blockgc_enabled, BLOCKGC_ENABLED)
+#ifdef CONFIG_XFS_QUOTA
+__XFS_IS_OPSTATE(quotacheck_running, QUOTACHECK_RUNNING)
+#else
+# define xfs_is_quotacheck_running(mp) (false)
+#endif
static inline bool
xfs_should_warn(struct xfs_mount *mp, long nr)
@@ -444,7 +451,8 @@ xfs_should_warn(struct xfs_mount *mp, long nr)
{ (1UL << XFS_OPSTATE_BLOCKGC_ENABLED), "blockgc" }, \
{ (1UL << XFS_OPSTATE_WARNED_SCRUB), "wscrub" }, \
{ (1UL << XFS_OPSTATE_WARNED_SHRINK), "wshrink" }, \
- { (1UL << XFS_OPSTATE_WARNED_LARP), "wlarp" }
+ { (1UL << XFS_OPSTATE_WARNED_LARP), "wlarp" }, \
+ { (1UL << XFS_OPSTATE_QUOTACHECK_RUNNING), "quotacheck" }
/*
* Max and min values for mount-option defined I/O
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 6abcc34fafd8..7256090c3895 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -1160,6 +1160,10 @@ xfs_qm_dqusage_adjust(
if (error)
return error;
+ error = xfs_inode_reload_unlinked(ip);
+ if (error)
+ goto error0;
+
ASSERT(ip->i_delayed_blks == 0);
if (XFS_IS_REALTIME_INODE(ip)) {
@@ -1173,6 +1177,7 @@ xfs_qm_dqusage_adjust(
}
nblks = (xfs_qcnt_t)ip->i_nblocks - rtblks;
+ xfs_iflags_clear(ip, XFS_IQUOTAUNCHECKED);
/*
* Add the (disk blocks and inode) resources occupied by this
@@ -1319,8 +1324,10 @@ xfs_qm_quotacheck(
flags |= XFS_PQUOTA_CHKD;
}
+ xfs_set_quotacheck_running(mp);
error = xfs_iwalk_threaded(mp, 0, 0, xfs_qm_dqusage_adjust, 0, true,
NULL);
+ xfs_clear_quotacheck_running(mp);
/*
* On error, the inode walk may have partially populated the dquot
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 1/3] xfs: use i_prev_unlinked to distinguish inodes that are not on the unlinked list
2023-09-03 16:15 ` [PATCH 1/3] xfs: use i_prev_unlinked to distinguish inodes that are not on the unlinked list Darrick J. Wong
@ 2023-09-07 6:27 ` Dave Chinner
0 siblings, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2023-09-07 6:27 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: chandan.babu, linux-xfs
On Sun, Sep 03, 2023 at 09:15:53AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> Alter the definition of i_prev_unlinked slightly to make it more obvious
> when an inode with 0 link count is not part of the iunlink bucket lists
> rooted in the AGI. This distinction is necessary because it is not
> sufficient to check inode.i_nlink to decide if an inode is on the
> unlinked list. Updates to i_nlink can happen while holding only
> ILOCK_EXCL, but updates to an inode's position in the AGI unlinked list
> (which happen after the nlink update) requires both ILOCK_EXCL and the
> AGI buffer lock.
>
> The next few patches will make it possible to reload an entire unlinked
> bucket list when we're walking the inode table or performing handle
> operations and need more than the ability to iget the last inode in the
> chain.
>
> The upcoming directory repair code also needs to be able to make this
> distinction to decide if a zero link count directory should be moved to
> the orphanage or allowed to inactivate. An upcoming enhancement to the
> online AGI fsck code will need this distinction to check and rebuild the
> AGI unlinked buckets.
>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
> fs/xfs/xfs_icache.c | 2 +-
> fs/xfs/xfs_inode.c | 3 ++-
> fs/xfs/xfs_inode.h | 20 +++++++++++++++++++-
> 3 files changed, 22 insertions(+), 3 deletions(-)
Looks ok.
Reviewed-by: Dave Chinner <dchinner@redhat.com>
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/3] xfs: reload entire unlinked bucket lists
2023-09-03 16:15 ` [PATCH 2/3] xfs: reload entire unlinked bucket lists Darrick J. Wong
@ 2023-09-07 7:02 ` Dave Chinner
2023-09-07 18:30 ` Darrick J. Wong
0 siblings, 1 reply; 11+ messages in thread
From: Dave Chinner @ 2023-09-07 7:02 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: chandan.babu, linux-xfs
On Sun, Sep 03, 2023 at 09:15:59AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> The previous patch to reload unrecovered unlinked inodes when adding a
> newly created inode to the unlinked list is missing a key piece of
> functionality. It doesn't handle the case that someone calls xfs_iget
> on an inode that is not the last item in the incore list. For example,
> if at mount time the ondisk iunlink bucket looks like this:
>
> AGI -> 7 -> 22 -> 3 -> NULL
>
> None of these three inodes are cached in memory. Now let's say that
> someone tries to open inode 3 by handle. We need to walk the list to
> make sure that inodes 7 and 22 get loaded cold, and that the
> i_prev_unlinked of inode 3 gets set to 22.
>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
> fs/xfs/xfs_export.c | 6 +++
> fs/xfs/xfs_inode.c | 100 +++++++++++++++++++++++++++++++++++++++++++++++++++
> fs/xfs/xfs_inode.h | 9 +++++
> fs/xfs/xfs_itable.c | 9 +++++
> fs/xfs/xfs_trace.h | 20 ++++++++++
> 5 files changed, 144 insertions(+)
>
>
> diff --git a/fs/xfs/xfs_export.c b/fs/xfs/xfs_export.c
> index 1064c2342876..f71ea786a6d2 100644
> --- a/fs/xfs/xfs_export.c
> +++ b/fs/xfs/xfs_export.c
> @@ -146,6 +146,12 @@ xfs_nfs_get_inode(
> return ERR_PTR(error);
> }
>
> + error = xfs_inode_reload_unlinked(ip);
> + if (error) {
> + xfs_irele(ip);
> + return ERR_PTR(error);
> + }
We don't want to be creating an empty transaction, locking the inode
and then cancelling the transaction having done nothing on every
NFS handle we have to resolve. We only want to do this if the link
count is zero and i_prev_unlinked == 0, at which point we can then
take the slow path (i.e call xfs_inode_reload_unlinked()) to reload
the unlinked list. Something like:
if (xfs_inode_unlinked_incomplete(ip)) {
error = xfs_inode_reload_unlinked(ip);
if (error) {
xfs_irele(ip);
return ERR_PTR(error);
}
}
Hmmmm. if i_nlink is zero on ip and we call xfs_irele() on it, the
it will be punted straight to inactivation, which will try to remove
it from the unlinked list and free it. But we just failed to reload
the unlinked list, so inactivation is going to go badly...
So on reload error, shouldn't we be shutting down the filesystem
because we know we cannot safely remove this inode from the unlinked
list and free it?
> +
> if (VFS_I(ip)->i_generation != generation) {
> xfs_irele(ip);
> return ERR_PTR(-ESTALE);
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 6cd2f29b540a..56f6bde6001b 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -3607,3 +3607,103 @@ xfs_iunlock2_io_mmap(
> if (ip1 != ip2)
> inode_unlock(VFS_I(ip1));
> }
> +
> +/*
> + * Reload the incore inode list for this inode. Caller should ensure that
> + * the link count cannot change, either by taking ILOCK_SHARED or otherwise
> + * preventing other threads from executing.
> + */
> +int
> +xfs_inode_reload_unlinked_bucket(
> + struct xfs_trans *tp,
> + struct xfs_inode *ip)
> +{
> + struct xfs_mount *mp = tp->t_mountp;
> + struct xfs_buf *agibp;
> + struct xfs_agi *agi;
> + struct xfs_perag *pag;
> + xfs_agnumber_t agno = XFS_INO_TO_AGNO(mp, ip->i_ino);
> + xfs_agino_t agino = XFS_INO_TO_AGINO(mp, ip->i_ino);
> + xfs_agino_t prev_agino, next_agino;
> + unsigned int bucket;
> + bool foundit = false;
> + int error;
> +
> + /* Grab the first inode in the list */
> + pag = xfs_perag_get(mp, agno);
> + error = xfs_ialloc_read_agi(pag, tp, &agibp);
> + xfs_perag_put(pag);
> + if (error)
> + return error;
> +
> + bucket = agino % XFS_AGI_UNLINKED_BUCKETS;
> + agi = agibp->b_addr;
> +
> + trace_xfs_inode_reload_unlinked_bucket(ip);
> +
> + xfs_info_ratelimited(mp,
> + "Found unrecovered unlinked inode 0x%x in AG 0x%x. Initiating list recovery.",
> + agino, agno);
> +
> + prev_agino = NULLAGINO;
> + next_agino = be32_to_cpu(agi->agi_unlinked[bucket]);
> + while (next_agino != NULLAGINO) {
> + struct xfs_inode *next_ip = NULL;
> +
> + if (next_agino == agino) {
> + /* Found this inode, set its backlink. */
> + next_ip = ip;
> + next_ip->i_prev_unlinked = prev_agino;
> + foundit = true;
> + }
> + if (!next_ip) {
> + /* Inode already in memory. */
> + next_ip = xfs_iunlink_lookup(pag, next_agino);
> + }
> + if (!next_ip) {
> + /* Inode not in memory, reload. */
> + error = xfs_iunlink_reload_next(tp, agibp, prev_agino,
> + next_agino);
> + if (error)
> + break;
> +
> + next_ip = xfs_iunlink_lookup(pag, next_agino);
> + }
> + if (!next_ip) {
> + /* No incore inode at all? We reloaded it... */
> + ASSERT(next_ip != NULL);
> + error = -EFSCORRUPTED;
> + break;
> + }
Not a great fan of this logic construct, nor am I great fan of
asking you to restructure code for vanity reasons. However, I do
think a goto improves it quite a lot:
while (next_agino != NULLAGINO) {
struct xfs_inode *next_ip = NULL;
if (next_agino == agino) {
/* Found this inode, set its backlink. */
next_ip = ip;
next_ip->i_prev_unlinked = prev_agino;
foundit = true;
goto next_inode;
}
/* Try in-memory lookup first. */
next_ip = xfs_iunlink_lookup(pag, next_agino);
if (next_ip)
goto next_inode;
/* Inode not in memory, try reloading it. */
error = xfs_iunlink_reload_next(tp, agibp, prev_agino,
next_agino);
if (error)
break;
/* Grab the reloaded inode */
next_ip = xfs_iunlink_lookup(pag, next_agino);
if (!next_ip) {
/* No incore inode at all, list must be corrupted. */
ASSERT(next_ip != NULL);
error = -EFSCORRUPTED;
break;
}
next_inode:
prev_agino = next_agino;
next_agino = next_ip->i_next_unlinked;
}
If you think it's better, feel free to use this. Otherwise I can
live with the code until I have to touch this code again...
......
> diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
> index f225413a993c..ea38d69b9922 100644
> --- a/fs/xfs/xfs_itable.c
> +++ b/fs/xfs/xfs_itable.c
> @@ -80,6 +80,15 @@ xfs_bulkstat_one_int(
> if (error)
> goto out;
>
> + if (xfs_inode_unlinked_incomplete(ip)) {
> + error = xfs_inode_reload_unlinked_bucket(tp, ip);
> + if (error) {
> + xfs_iunlock(ip, XFS_ILOCK_SHARED);
> + xfs_irele(ip);
> + return error;
> + }
> + }
Same question here about shutdown on error being necessary....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v1.1 3/3] xfs: make inode unlinked bucket recovery work with quotacheck
2023-09-05 16:33 ` [PATCH v1.1 " Darrick J. Wong
@ 2023-09-07 7:11 ` Dave Chinner
2023-09-07 18:34 ` Darrick J. Wong
0 siblings, 1 reply; 11+ messages in thread
From: Dave Chinner @ 2023-09-07 7:11 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: chandan.babu, linux-xfs
On Tue, Sep 05, 2023 at 09:33:03AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> Teach quotacheck to reload the unlinked inode lists when walking the
> inode table. This requires extra state handling, since it's possible
> that a reloaded inode will get inactivated before quotacheck tries to
> scan it; in this case, we need to ensure that the reloaded inode does
> not have dquots attached when it is freed.
>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
> v1.1: s/CONFIG_QUOTA/CONFIG_XFS_QUOTA/ and fix tracepoint flags decoding
> ---
> fs/xfs/xfs_inode.c | 12 +++++++++---
> fs/xfs/xfs_inode.h | 5 ++++-
> fs/xfs/xfs_mount.h | 10 +++++++++-
> fs/xfs/xfs_qm.c | 7 +++++++
> 4 files changed, 29 insertions(+), 5 deletions(-)
>
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 56f6bde6001b..22af7268169b 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -1743,9 +1743,13 @@ xfs_inactive(
> ip->i_df.if_nextents > 0 || ip->i_delayed_blks > 0))
> truncate = 1;
>
> - error = xfs_qm_dqattach(ip);
> - if (error)
> - goto out;
> + if (xfs_iflags_test(ip, XFS_IQUOTAUNCHECKED)) {
> + xfs_qm_dqdetach(ip);
> + } else {
> + error = xfs_qm_dqattach(ip);
> + if (error)
> + goto out;
> + }
That needs a comment - I'm not going to remember why sometimes we
detatch dquots instead of attach them here....
....
> diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
> index 6abcc34fafd8..7256090c3895 100644
> --- a/fs/xfs/xfs_qm.c
> +++ b/fs/xfs/xfs_qm.c
> @@ -1160,6 +1160,10 @@ xfs_qm_dqusage_adjust(
> if (error)
> return error;
>
> + error = xfs_inode_reload_unlinked(ip);
> + if (error)
> + goto error0;
Same comment here about doing millions of transaction create/cancel
for inodes that have non-zero link counts....
Also, same comment here about shutting down on reload error because
the irele() call will inactivate the inode and try to remove it from
the unlinked list....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/3] xfs: reload entire unlinked bucket lists
2023-09-07 7:02 ` Dave Chinner
@ 2023-09-07 18:30 ` Darrick J. Wong
0 siblings, 0 replies; 11+ messages in thread
From: Darrick J. Wong @ 2023-09-07 18:30 UTC (permalink / raw)
To: Dave Chinner; +Cc: chandan.babu, linux-xfs
On Thu, Sep 07, 2023 at 05:02:12PM +1000, Dave Chinner wrote:
> On Sun, Sep 03, 2023 at 09:15:59AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> >
> > The previous patch to reload unrecovered unlinked inodes when adding a
> > newly created inode to the unlinked list is missing a key piece of
> > functionality. It doesn't handle the case that someone calls xfs_iget
> > on an inode that is not the last item in the incore list. For example,
> > if at mount time the ondisk iunlink bucket looks like this:
> >
> > AGI -> 7 -> 22 -> 3 -> NULL
> >
> > None of these three inodes are cached in memory. Now let's say that
> > someone tries to open inode 3 by handle. We need to walk the list to
> > make sure that inodes 7 and 22 get loaded cold, and that the
> > i_prev_unlinked of inode 3 gets set to 22.
> >
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> > fs/xfs/xfs_export.c | 6 +++
> > fs/xfs/xfs_inode.c | 100 +++++++++++++++++++++++++++++++++++++++++++++++++++
> > fs/xfs/xfs_inode.h | 9 +++++
> > fs/xfs/xfs_itable.c | 9 +++++
> > fs/xfs/xfs_trace.h | 20 ++++++++++
> > 5 files changed, 144 insertions(+)
> >
> >
> > diff --git a/fs/xfs/xfs_export.c b/fs/xfs/xfs_export.c
> > index 1064c2342876..f71ea786a6d2 100644
> > --- a/fs/xfs/xfs_export.c
> > +++ b/fs/xfs/xfs_export.c
> > @@ -146,6 +146,12 @@ xfs_nfs_get_inode(
> > return ERR_PTR(error);
> > }
> >
> > + error = xfs_inode_reload_unlinked(ip);
> > + if (error) {
> > + xfs_irele(ip);
> > + return ERR_PTR(error);
> > + }
>
> We don't want to be creating an empty transaction, locking the inode
> and then cancelling the transaction having done nothing on every
> NFS handle we have to resolve. We only want to do this if the link
> count is zero and i_prev_unlinked == 0, at which point we can then
> take the slow path (i.e call xfs_inode_reload_unlinked()) to reload
> the unlinked list. Something like:
Hmm. Using an unlocked check here makes me nervous, since we could be
racing with another thread that is unlinking the inode. OTOH, this code
doesn't start *changing* any incore state until it's taken ILOCK_SHARED
and locked the AGI buffer to stabilize @ip's unlinked list.
So. I think it's ok to do the unlocked check here like you propose:
> if (xfs_inode_unlinked_incomplete(ip)) {
> error = xfs_inode_reload_unlinked(ip);
> if (error) {
> xfs_irele(ip);
> return ERR_PTR(error);
> }
> }
Though I want to add one more xfs_inode_unlinked_incomplete call inside
xfs_inode_reload_unlinked_bucket to skip the list walk if we take the
locks only to find that someone else fixed it up for us.
> Hmmmm. if i_nlink is zero on ip and we call xfs_irele() on it, the
> it will be punted straight to inactivation, which will try to remove
> it from the unlinked list and free it. But we just failed to reload
> the unlinked list, so inactivation is going to go badly...
>
> So on reload error, shouldn't we be shutting down the filesystem
> because we know we cannot safely remove this inode from the unlinked
> list and free it?
I think the iunlink lookup failure will take down the filesystem for
us when inactivation runs on @ip, but you're right, we should shut down
immediately so that the stack trace will show xfs_nfs_get_inode and not
just a background thread.
> > +
> > if (VFS_I(ip)->i_generation != generation) {
> > xfs_irele(ip);
> > return ERR_PTR(-ESTALE);
> > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > index 6cd2f29b540a..56f6bde6001b 100644
> > --- a/fs/xfs/xfs_inode.c
> > +++ b/fs/xfs/xfs_inode.c
> > @@ -3607,3 +3607,103 @@ xfs_iunlock2_io_mmap(
> > if (ip1 != ip2)
> > inode_unlock(VFS_I(ip1));
> > }
> > +
> > +/*
> > + * Reload the incore inode list for this inode. Caller should ensure that
> > + * the link count cannot change, either by taking ILOCK_SHARED or otherwise
> > + * preventing other threads from executing.
> > + */
> > +int
> > +xfs_inode_reload_unlinked_bucket(
> > + struct xfs_trans *tp,
> > + struct xfs_inode *ip)
> > +{
> > + struct xfs_mount *mp = tp->t_mountp;
> > + struct xfs_buf *agibp;
> > + struct xfs_agi *agi;
> > + struct xfs_perag *pag;
> > + xfs_agnumber_t agno = XFS_INO_TO_AGNO(mp, ip->i_ino);
> > + xfs_agino_t agino = XFS_INO_TO_AGINO(mp, ip->i_ino);
> > + xfs_agino_t prev_agino, next_agino;
> > + unsigned int bucket;
> > + bool foundit = false;
> > + int error;
> > +
> > + /* Grab the first inode in the list */
> > + pag = xfs_perag_get(mp, agno);
> > + error = xfs_ialloc_read_agi(pag, tp, &agibp);
> > + xfs_perag_put(pag);
> > + if (error)
> > + return error;
> > +
> > + bucket = agino % XFS_AGI_UNLINKED_BUCKETS;
> > + agi = agibp->b_addr;
> > +
> > + trace_xfs_inode_reload_unlinked_bucket(ip);
> > +
> > + xfs_info_ratelimited(mp,
> > + "Found unrecovered unlinked inode 0x%x in AG 0x%x. Initiating list recovery.",
> > + agino, agno);
> > +
> > + prev_agino = NULLAGINO;
> > + next_agino = be32_to_cpu(agi->agi_unlinked[bucket]);
> > + while (next_agino != NULLAGINO) {
> > + struct xfs_inode *next_ip = NULL;
> > +
> > + if (next_agino == agino) {
> > + /* Found this inode, set its backlink. */
> > + next_ip = ip;
> > + next_ip->i_prev_unlinked = prev_agino;
> > + foundit = true;
> > + }
> > + if (!next_ip) {
> > + /* Inode already in memory. */
> > + next_ip = xfs_iunlink_lookup(pag, next_agino);
> > + }
> > + if (!next_ip) {
> > + /* Inode not in memory, reload. */
> > + error = xfs_iunlink_reload_next(tp, agibp, prev_agino,
> > + next_agino);
> > + if (error)
> > + break;
> > +
> > + next_ip = xfs_iunlink_lookup(pag, next_agino);
> > + }
> > + if (!next_ip) {
> > + /* No incore inode at all? We reloaded it... */
> > + ASSERT(next_ip != NULL);
> > + error = -EFSCORRUPTED;
> > + break;
> > + }
>
> Not a great fan of this logic construct, nor am I great fan of
> asking you to restructure code for vanity reasons. However, I do
> think a goto improves it quite a lot:
>
> while (next_agino != NULLAGINO) {
> struct xfs_inode *next_ip = NULL;
>
> if (next_agino == agino) {
> /* Found this inode, set its backlink. */
> next_ip = ip;
> next_ip->i_prev_unlinked = prev_agino;
> foundit = true;
> goto next_inode;
> }
>
> /* Try in-memory lookup first. */
> next_ip = xfs_iunlink_lookup(pag, next_agino);
> if (next_ip)
> goto next_inode;
>
> /* Inode not in memory, try reloading it. */
> error = xfs_iunlink_reload_next(tp, agibp, prev_agino,
> next_agino);
> if (error)
> break;
>
> /* Grab the reloaded inode */
> next_ip = xfs_iunlink_lookup(pag, next_agino);
> if (!next_ip) {
> /* No incore inode at all, list must be corrupted. */
> ASSERT(next_ip != NULL);
> error = -EFSCORRUPTED;
> break;
> }
> next_inode:
> prev_agino = next_agino;
> next_agino = next_ip->i_next_unlinked;
> }
>
> If you think it's better, feel free to use this. Otherwise I can
> live with the code until I have to touch this code again...
>
> ......
I'll use it, thanks. It /does/ make the loop body easier to understand
since there are fewer "uhh when *is* next_ip null?" questions.
> > diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
> > index f225413a993c..ea38d69b9922 100644
> > --- a/fs/xfs/xfs_itable.c
> > +++ b/fs/xfs/xfs_itable.c
> > @@ -80,6 +80,15 @@ xfs_bulkstat_one_int(
> > if (error)
> > goto out;
> >
> > + if (xfs_inode_unlinked_incomplete(ip)) {
> > + error = xfs_inode_reload_unlinked_bucket(tp, ip);
> > + if (error) {
> > + xfs_iunlock(ip, XFS_ILOCK_SHARED);
> > + xfs_irele(ip);
> > + return error;
> > + }
> > + }
>
> Same question here about shutdown on error being necessary....
Will add a xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE) call here
too.
--D
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v1.1 3/3] xfs: make inode unlinked bucket recovery work with quotacheck
2023-09-07 7:11 ` Dave Chinner
@ 2023-09-07 18:34 ` Darrick J. Wong
2023-09-07 21:40 ` Dave Chinner
0 siblings, 1 reply; 11+ messages in thread
From: Darrick J. Wong @ 2023-09-07 18:34 UTC (permalink / raw)
To: Dave Chinner; +Cc: chandan.babu, linux-xfs
On Thu, Sep 07, 2023 at 05:11:53PM +1000, Dave Chinner wrote:
> On Tue, Sep 05, 2023 at 09:33:03AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> >
> > Teach quotacheck to reload the unlinked inode lists when walking the
> > inode table. This requires extra state handling, since it's possible
> > that a reloaded inode will get inactivated before quotacheck tries to
> > scan it; in this case, we need to ensure that the reloaded inode does
> > not have dquots attached when it is freed.
> >
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> > v1.1: s/CONFIG_QUOTA/CONFIG_XFS_QUOTA/ and fix tracepoint flags decoding
> > ---
> > fs/xfs/xfs_inode.c | 12 +++++++++---
> > fs/xfs/xfs_inode.h | 5 ++++-
> > fs/xfs/xfs_mount.h | 10 +++++++++-
> > fs/xfs/xfs_qm.c | 7 +++++++
> > 4 files changed, 29 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > index 56f6bde6001b..22af7268169b 100644
> > --- a/fs/xfs/xfs_inode.c
> > +++ b/fs/xfs/xfs_inode.c
> > @@ -1743,9 +1743,13 @@ xfs_inactive(
> > ip->i_df.if_nextents > 0 || ip->i_delayed_blks > 0))
> > truncate = 1;
> >
> > - error = xfs_qm_dqattach(ip);
> > - if (error)
> > - goto out;
> > + if (xfs_iflags_test(ip, XFS_IQUOTAUNCHECKED)) {
> > + xfs_qm_dqdetach(ip);
> > + } else {
> > + error = xfs_qm_dqattach(ip);
> > + if (error)
> > + goto out;
> > + }
>
> That needs a comment - I'm not going to remember why sometimes we
> detatch dquots instead of attach them here....
/*
* If this inode is being inactivated during a quotacheck and
* has not yet been scanned by quotacheck, we /must/ remove the
* dquots from the inode before inactivation changes the block
* and inode counts. Most probably this is a result of
* reloading the incore iunlinked list to purge unrecovered
* unlinked inodes.
*/
How does that sound?
> ....
> > diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
> > index 6abcc34fafd8..7256090c3895 100644
> > --- a/fs/xfs/xfs_qm.c
> > +++ b/fs/xfs/xfs_qm.c
> > @@ -1160,6 +1160,10 @@ xfs_qm_dqusage_adjust(
> > if (error)
> > return error;
> >
> > + error = xfs_inode_reload_unlinked(ip);
> > + if (error)
> > + goto error0;
>
> Same comment here about doing millions of transaction create/cancel
> for inodes that have non-zero link counts....
>
> Also, same comment here about shutting down on reload error because
> the irele() call will inactivate the inode and try to remove it from
> the unlinked list....
Ok, fixed this callsite too.
Thank you for looking at this series!
--D
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v1.1 3/3] xfs: make inode unlinked bucket recovery work with quotacheck
2023-09-07 18:34 ` Darrick J. Wong
@ 2023-09-07 21:40 ` Dave Chinner
0 siblings, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2023-09-07 21:40 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: chandan.babu, linux-xfs
On Thu, Sep 07, 2023 at 11:34:41AM -0700, Darrick J. Wong wrote:
> On Thu, Sep 07, 2023 at 05:11:53PM +1000, Dave Chinner wrote:
> > On Tue, Sep 05, 2023 at 09:33:03AM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > >
> > > Teach quotacheck to reload the unlinked inode lists when walking the
> > > inode table. This requires extra state handling, since it's possible
> > > that a reloaded inode will get inactivated before quotacheck tries to
> > > scan it; in this case, we need to ensure that the reloaded inode does
> > > not have dquots attached when it is freed.
> > >
> > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > ---
> > > v1.1: s/CONFIG_QUOTA/CONFIG_XFS_QUOTA/ and fix tracepoint flags decoding
> > > ---
> > > fs/xfs/xfs_inode.c | 12 +++++++++---
> > > fs/xfs/xfs_inode.h | 5 ++++-
> > > fs/xfs/xfs_mount.h | 10 +++++++++-
> > > fs/xfs/xfs_qm.c | 7 +++++++
> > > 4 files changed, 29 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > > index 56f6bde6001b..22af7268169b 100644
> > > --- a/fs/xfs/xfs_inode.c
> > > +++ b/fs/xfs/xfs_inode.c
> > > @@ -1743,9 +1743,13 @@ xfs_inactive(
> > > ip->i_df.if_nextents > 0 || ip->i_delayed_blks > 0))
> > > truncate = 1;
> > >
> > > - error = xfs_qm_dqattach(ip);
> > > - if (error)
> > > - goto out;
> > > + if (xfs_iflags_test(ip, XFS_IQUOTAUNCHECKED)) {
> > > + xfs_qm_dqdetach(ip);
> > > + } else {
> > > + error = xfs_qm_dqattach(ip);
> > > + if (error)
> > > + goto out;
> > > + }
> >
> > That needs a comment - I'm not going to remember why sometimes we
> > detatch dquots instead of attach them here....
>
> /*
> * If this inode is being inactivated during a quotacheck and
> * has not yet been scanned by quotacheck, we /must/ remove the
> * dquots from the inode before inactivation changes the block
> * and inode counts. Most probably this is a result of
> * reloading the incore iunlinked list to purge unrecovered
> * unlinked inodes.
> */
>
> How does that sound?
LGTM.
-Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2023-09-07 21:40 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-03 16:15 [PATCHSET 0/3] xfs: reload entire iunlink lists Darrick J. Wong
2023-09-03 16:15 ` [PATCH 1/3] xfs: use i_prev_unlinked to distinguish inodes that are not on the unlinked list Darrick J. Wong
2023-09-07 6:27 ` Dave Chinner
2023-09-03 16:15 ` [PATCH 2/3] xfs: reload entire unlinked bucket lists Darrick J. Wong
2023-09-07 7:02 ` Dave Chinner
2023-09-07 18:30 ` Darrick J. Wong
2023-09-03 16:16 ` [PATCH 3/3] xfs: make inode unlinked bucket recovery work with quotacheck Darrick J. Wong
2023-09-05 16:33 ` [PATCH v1.1 " Darrick J. Wong
2023-09-07 7:11 ` Dave Chinner
2023-09-07 18:34 ` Darrick J. Wong
2023-09-07 21:40 ` Dave Chinner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox