From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 04/13] xfs: arrange all unlinked inodes into one list
Date: Tue, 18 Aug 2020 16:59:59 -0700 [thread overview]
Message-ID: <20200818235959.GR6096@magnolia> (raw)
In-Reply-To: <20200812092556.2567285-5-david@fromorbit.com>
On Wed, Aug 12, 2020 at 07:25:47PM +1000, Dave Chinner wrote:
> From: Gao Xiang <hsiangkao@redhat.com>
>
> We currently keep unlinked lists short on disk by hashing the inodes
> across multiple buckets. We don't need to ikeep them short anymore
> as we no longer need to traverse the entire to remove an inode from
> it. The in-memory back reference index provides the previous inode
> in the list for us instead.
>
> Log recovery still has to handle existing filesystems that use all
> 64 on-disk buckets so we detect and handle this case specially so
> that so inode eviction can still work properly in recovery.
>
> [dchinner: imported into parent patch series early on and modified
> to fit cleanly. ]
>
> Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> fs/xfs/xfs_inode.c | 49 +++++++++++++++++++++++++++-------------------
> 1 file changed, 29 insertions(+), 20 deletions(-)
>
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index f2f502b65691..fa92bdf6e0da 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -33,6 +33,7 @@
> #include "xfs_symlink.h"
> #include "xfs_trans_priv.h"
> #include "xfs_log.h"
> +#include "xfs_log_priv.h"
> #include "xfs_bmap_btree.h"
> #include "xfs_reflink.h"
>
> @@ -2092,25 +2093,32 @@ xfs_iunlink_update_bucket(
> struct xfs_trans *tp,
> xfs_agnumber_t agno,
> struct xfs_buf *agibp,
> - unsigned int bucket_index,
> + xfs_agino_t old_agino,
> xfs_agino_t new_agino)
> {
> + struct xlog *log = tp->t_mountp->m_log;
> struct xfs_agi *agi = agibp->b_addr;
> xfs_agino_t old_value;
> - int offset;
> + unsigned int bucket_index;
> + int offset;
>
> ASSERT(xfs_verify_agino_or_null(tp->t_mountp, agno, new_agino));
>
> + bucket_index = 0;
> + /* During recovery, the old multiple bucket index can be applied */
> + if (!log || log->l_flags & XLOG_RECOVERY_NEEDED) {
Does the flag test need parentheses?
It feels a little funny that we pass in old_agino (having gotten it from
agi_unlinked) and then compare it with agi_unlinked, but as the commit
log points out, I guess this is a wart of having to support the old
unlinked list behavior. It makes sense to me that if we're going to
change the unlinked list behavior we could be a little more careful
about double-checking things.
Question: if a newer kernel crashes with a super-long unlinked list and
the fs gets recovered on an old kernel, will this lead to insanely high
recovery times? I think the answer is no, because recovery is single
threaded and the hash only existed to reduce AGI contention during
normal unlinking operations?
--D
> + ASSERT(old_agino != NULLAGINO);
> +
> + if (be32_to_cpu(agi->agi_unlinked[0]) != old_agino)
> + bucket_index = old_agino % XFS_AGI_UNLINKED_BUCKETS;
> + }
> +
> old_value = be32_to_cpu(agi->agi_unlinked[bucket_index]);
> trace_xfs_iunlink_update_bucket(tp->t_mountp, agno, bucket_index,
> old_value, new_agino);
>
> - /*
> - * We should never find the head of the list already set to the value
> - * passed in because either we're adding or removing ourselves from the
> - * head of the list.
> - */
> - if (old_value == new_agino) {
> + /* check if the old agi_unlinked head is as expected */
> + if (old_value != old_agino) {
> xfs_buf_mark_corrupt(agibp);
> return -EFSCORRUPTED;
> }
> @@ -2216,17 +2224,18 @@ xfs_iunlink_insert_inode(
> xfs_agino_t next_agino;
> xfs_agino_t agino = XFS_INO_TO_AGINO(mp, ip->i_ino);
> xfs_agnumber_t agno = XFS_INO_TO_AGNO(mp, ip->i_ino);
> - short bucket_index = agino % XFS_AGI_UNLINKED_BUCKETS;
> int error;
>
> agi = agibp->b_addr;
>
> /*
> - * Get the index into the agi hash table for the list this inode will
> - * go on. Make sure the pointer isn't garbage and that this inode
> - * isn't already on the list.
> + * We don't need to traverse the on disk unlinked list to find the
> + * previous inode in the list when removing inodes anymore, so we don't
> + * need multiple on-disk lists anymore. Hence we always use bucket 0.
> + * Make sure the pointer isn't garbage and that this inode isn't already
> + * on the list.
> */
> - next_agino = be32_to_cpu(agi->agi_unlinked[bucket_index]);
> + next_agino = be32_to_cpu(agi->agi_unlinked[0]);
> if (next_agino == agino ||
> !xfs_verify_agino_or_null(mp, agno, next_agino)) {
> xfs_buf_mark_corrupt(agibp);
> @@ -2256,7 +2265,7 @@ xfs_iunlink_insert_inode(
> }
>
> /* Point the head of the list to point to this inode. */
> - return xfs_iunlink_update_bucket(tp, agno, agibp, bucket_index, agino);
> + return xfs_iunlink_update_bucket(tp, agno, agibp, next_agino, agino);
> }
>
> /*
> @@ -2416,16 +2425,17 @@ xfs_iunlink_remove_inode(
> xfs_agnumber_t agno = XFS_INO_TO_AGNO(mp, ip->i_ino);
> xfs_agino_t next_agino;
> xfs_agino_t head_agino;
> - short bucket_index = agino % XFS_AGI_UNLINKED_BUCKETS;
> int error;
>
> agi = agibp->b_addr;
>
> /*
> - * Get the index into the agi hash table for the list this inode will
> - * go on. Make sure the head pointer isn't garbage.
> + * We don't need to traverse the on disk unlinked list to find the
> + * previous inode in the list when removing inodes anymore, so we don't
> + * need multiple on-disk lists anymore. Hence we always use bucket 0.
> + * Make sure the head pointer isn't garbage.
> */
> - head_agino = be32_to_cpu(agi->agi_unlinked[bucket_index]);
> + head_agino = be32_to_cpu(agi->agi_unlinked[0]);
> if (!xfs_verify_agino(mp, agno, head_agino)) {
> XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
> agi, sizeof(*agi));
> @@ -2483,8 +2493,7 @@ xfs_iunlink_remove_inode(
> }
>
> /* Point the head of the list to the next unlinked inode. */
> - return xfs_iunlink_update_bucket(tp, agno, agibp, bucket_index,
> - next_agino);
> + return xfs_iunlink_update_bucket(tp, agno, agibp, agino, next_agino);
> }
>
> /*
> --
> 2.26.2.761.g0e0b3e54be
>
next prev parent reply other threads:[~2020-08-19 0:00 UTC|newest]
Thread overview: 51+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-08-12 9:25 [PATCH 00/13] xfs: in memory inode unlink log items Dave Chinner
2020-08-12 9:25 ` [PATCH 01/13] xfs: xfs_iflock is no longer a completion Dave Chinner
2020-08-18 23:44 ` Darrick J. Wong
2020-08-22 7:41 ` Christoph Hellwig
2020-08-12 9:25 ` [PATCH 02/13] xfs: add log item precommit operation Dave Chinner
2020-08-22 9:06 ` Christoph Hellwig
2020-08-12 9:25 ` [PATCH 03/13] xfs: factor the xfs_iunlink functions Dave Chinner
2020-08-18 23:49 ` Darrick J. Wong
2020-08-22 7:45 ` Christoph Hellwig
2020-08-12 9:25 ` [PATCH 04/13] xfs: arrange all unlinked inodes into one list Dave Chinner
2020-08-18 23:59 ` Darrick J. Wong [this message]
2020-08-19 0:45 ` Dave Chinner
2020-08-19 0:58 ` Gao Xiang
2020-08-22 9:01 ` Christoph Hellwig
2020-08-23 17:24 ` Gao Xiang
2020-08-24 8:19 ` [RFC PATCH] xfs: use log_incompat feature instead of speculate matching Gao Xiang
2020-08-24 8:34 ` Gao Xiang
2020-08-24 15:08 ` Darrick J. Wong
2020-08-24 15:41 ` Gao Xiang
2020-08-25 10:06 ` [PATCH] " Gao Xiang
2020-08-25 14:54 ` Darrick J. Wong
2020-08-25 15:30 ` Gao Xiang
2020-08-27 7:19 ` Christoph Hellwig
2020-08-12 9:25 ` [PATCH 05/13] xfs: add unlink list pointers to xfs_inode Dave Chinner
2020-08-19 0:02 ` Darrick J. Wong
2020-08-19 0:47 ` Dave Chinner
2020-08-22 9:03 ` Christoph Hellwig
2020-08-25 5:17 ` Dave Chinner
2020-08-27 7:21 ` Christoph Hellwig
2020-08-12 9:25 ` [PATCH 06/13] xfs: replace iunlink backref lookups with list lookups Dave Chinner
2020-08-19 0:13 ` Darrick J. Wong
2020-08-19 0:52 ` Dave Chinner
2020-08-12 9:25 ` [PATCH 07/13] xfs: mapping unlinked inodes is now redundant Dave Chinner
2020-08-19 0:14 ` Darrick J. Wong
2020-08-12 9:25 ` [PATCH 08/13] xfs: updating i_next_unlinked doesn't need to return old value Dave Chinner
2020-08-19 0:19 ` Darrick J. Wong
2020-08-12 9:25 ` [PATCH 09/13] xfs: validate the unlinked list pointer on update Dave Chinner
2020-08-19 0:23 ` Darrick J. Wong
2020-08-12 9:25 ` [PATCH 10/13] xfs: re-order AGI updates in unlink list updates Dave Chinner
2020-08-19 0:29 ` Darrick J. Wong
2020-08-19 1:01 ` Dave Chinner
2020-08-12 9:25 ` [PATCH 11/13] xfs: combine iunlink inode update functions Dave Chinner
2020-08-19 0:30 ` Darrick J. Wong
2020-08-12 9:25 ` [PATCH 12/13] xfs: add in-memory iunlink log item Dave Chinner
2020-08-19 0:35 ` Darrick J. Wong
2020-08-12 9:25 ` [PATCH 13/13] xfs: reorder iunlink remove operation in xfs_ifree Dave Chinner
2020-08-12 11:12 ` Gao Xiang
2020-08-19 0:45 ` Darrick J. Wong
2020-08-18 18:17 ` [PATCH 00/13] xfs: in memory inode unlink log items Darrick J. Wong
2020-08-18 20:01 ` Gao Xiang
2020-08-18 21:42 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200818235959.GR6096@magnolia \
--to=darrick.wong@oracle.com \
--cc=david@fromorbit.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox