From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 20/24] xfs: xfs_iflush() is no longer necessary
Date: Fri, 22 May 2020 16:54:47 -0700 [thread overview]
Message-ID: <20200522235447.GX8230@magnolia> (raw)
In-Reply-To: <20200522035029.3022405-21-david@fromorbit.com>
On Fri, May 22, 2020 at 01:50:25PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Now we have a cached buffer on inode log items, we don't need
> to do buffer lookups when flushing inodes anymore - all we need
> to do is lock the buffer and we are ready to go.
>
> This largely gets rid of the need for xfs_iflush(), which is
> essentially just a mechanism to look up the buffer and flush the
> inode to it. Instead, we can just call xfs_iflush_cluster() with a
> few modifications to ensure it also flushes the inode we already
> hold locked.
>
> This allows the AIL inode item pushing to be almost entirely
> non-blocking in XFS - we won't block unless memory allocation
> for the cluster inode lookup blocks or the block device queues are
> full.
>
> Writeback during inode reclaim becomes a little more complex because
> we now have to lock the buffer ourselves, but otherwise this change
> is largely a functional no-op that removes a whole lot of code.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
/This/ looks pretty straightforward. I always wondered about the fact
that the inode item push function would extract a buffer pointer but
then imap_to_bp would replace our pointer...
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
--D
> ---
> fs/xfs/xfs_inode.c | 106 ++++++----------------------------------
> fs/xfs/xfs_inode.h | 2 +-
> fs/xfs/xfs_inode_item.c | 51 +++++++------------
> 3 files changed, 32 insertions(+), 127 deletions(-)
>
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 961eef2ce8c4a..a94528d26328b 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -3429,7 +3429,18 @@ xfs_rename(
> return error;
> }
>
> -STATIC int
> +/*
> + * Non-blocking flush of dirty inode metadata into the backing buffer.
> + *
> + * The caller must have a reference to the inode and hold the cluster buffer
> + * locked. The function will walk across all the inodes on the cluster buffer it
> + * can find and lock without blocking, and flush them to the cluster buffer.
> + *
> + * On success, the caller must write out the buffer returned in *bp and
> + * release it. On failure, the filesystem will be shut down, the buffer will
> + * have been unlocked and released, and EFSCORRUPTED will be returned.
> + */
> +int
> xfs_iflush_cluster(
> struct xfs_inode *ip,
> struct xfs_buf *bp)
> @@ -3464,8 +3475,6 @@ xfs_iflush_cluster(
>
> for (i = 0; i < nr_found; i++) {
> cip = cilist[i];
> - if (cip == ip)
> - continue;
>
> /*
> * because this is an RCU protected lookup, we could find a
> @@ -3556,99 +3565,11 @@ xfs_iflush_cluster(
> kmem_free(cilist);
> out_put:
> xfs_perag_put(pag);
> - return error;
> -}
> -
> -/*
> - * Flush dirty inode metadata into the backing buffer.
> - *
> - * The caller must have the inode lock and the inode flush lock held. The
> - * inode lock will still be held upon return to the caller, and the inode
> - * flush lock will be released after the inode has reached the disk.
> - *
> - * The caller must write out the buffer returned in *bpp and release it.
> - */
> -int
> -xfs_iflush(
> - struct xfs_inode *ip,
> - struct xfs_buf **bpp)
> -{
> - struct xfs_mount *mp = ip->i_mount;
> - struct xfs_buf *bp = NULL;
> - struct xfs_dinode *dip;
> - int error;
> -
> - XFS_STATS_INC(mp, xs_iflush_count);
> -
> - ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_ILOCK_SHARED));
> - ASSERT(xfs_isiflocked(ip));
> - ASSERT(ip->i_df.if_format != XFS_DINODE_FMT_BTREE ||
> - ip->i_df.if_nextents > XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK));
> -
> - *bpp = NULL;
> -
> - xfs_iunpin_wait(ip);
> -
> - /*
> - * For stale inodes we cannot rely on the backing buffer remaining
> - * stale in cache for the remaining life of the stale inode and so
> - * xfs_imap_to_bp() below may give us a buffer that no longer contains
> - * inodes below. We have to check this after ensuring the inode is
> - * unpinned so that it is safe to reclaim the stale inode after the
> - * flush call.
> - */
> - if (xfs_iflags_test(ip, XFS_ISTALE)) {
> - xfs_ifunlock(ip);
> - return 0;
> - }
> -
> - /*
> - * Get the buffer containing the on-disk inode. We are doing a try-lock
> - * operation here, so we may get an EAGAIN error. In that case, return
> - * leaving the inode dirty.
> - *
> - * If we get any other error, we effectively have a corruption situation
> - * and we cannot flush the inode. Abort the flush and shut down.
> - */
> - error = xfs_imap_to_bp(mp, NULL, &ip->i_imap, &dip, &bp, XBF_TRYLOCK);
> - if (error == -EAGAIN) {
> - xfs_ifunlock(ip);
> - return error;
> - }
> - if (error)
> - goto abort;
> -
> - /*
> - * If the buffer is pinned then push on the log now so we won't
> - * get stuck waiting in the write for too long.
> - */
> - if (xfs_buf_ispinned(bp))
> - xfs_log_force(mp, 0);
> -
> - /*
> - * Flush the provided inode then attempt to gather others from the
> - * cluster into the write.
> - *
> - * Note: Once we attempt to flush an inode, we must run buffer
> - * completion callbacks on any failure. If this fails, simulate an I/O
> - * failure on the buffer and shut down.
> - */
> - error = xfs_iflush_int(ip, bp);
> - if (!error)
> - error = xfs_iflush_cluster(ip, bp);
> if (error) {
> bp->b_flags |= XBF_ASYNC;
> xfs_buf_ioend_fail(bp);
> - goto shutdown;
> + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> }
> -
> - *bpp = bp;
> - return 0;
> -
> -abort:
> - xfs_iflush_abort(ip);
> -shutdown:
> - xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> return error;
> }
>
> @@ -3667,6 +3588,7 @@ xfs_iflush_int(
> ASSERT(ip->i_df.if_format != XFS_DINODE_FMT_BTREE ||
> ip->i_df.if_nextents > XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK));
> ASSERT(iip != NULL && iip->ili_fields != 0);
> + ASSERT(iip->ili_item.li_buf == bp);
>
> dip = xfs_buf_offset(bp, ip->i_imap.im_boffset);
>
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index dadcf19458960..d1109eb13ba2e 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -427,7 +427,7 @@ int xfs_log_force_inode(struct xfs_inode *ip);
> void xfs_iunpin_wait(xfs_inode_t *);
> #define xfs_ipincount(ip) ((unsigned int) atomic_read(&ip->i_pincount))
>
> -int xfs_iflush(struct xfs_inode *, struct xfs_buf **);
> +int xfs_iflush_cluster(struct xfs_inode *, struct xfs_buf *);
> void xfs_lock_two_inodes(struct xfs_inode *ip0, uint ip0_mode,
> struct xfs_inode *ip1, uint ip1_mode);
>
> diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
> index 7718cfc39eec8..42b4b5fe1e2a9 100644
> --- a/fs/xfs/xfs_inode_item.c
> +++ b/fs/xfs/xfs_inode_item.c
> @@ -504,53 +504,35 @@ xfs_inode_item_push(
> uint rval = XFS_ITEM_SUCCESS;
> int error;
>
> - if (xfs_ipincount(ip) > 0)
> - return XFS_ITEM_PINNED;
> + ASSERT(iip->ili_item.li_buf);
>
> - if (!xfs_ilock_nowait(ip, XFS_ILOCK_SHARED))
> - return XFS_ITEM_LOCKED;
> + if (xfs_ipincount(ip) > 0 || xfs_buf_ispinned(bp) ||
> + (ip->i_flags & XFS_ISTALE))
> + return XFS_ITEM_PINNED;
>
> - /*
> - * Re-check the pincount now that we stabilized the value by
> - * taking the ilock.
> - */
> - if (xfs_ipincount(ip) > 0) {
> - rval = XFS_ITEM_PINNED;
> - goto out_unlock;
> - }
> + /* If the inode is already flush locked, we're already flushing. */
> + if (xfs_isiflocked(ip))
> + return XFS_ITEM_FLUSHING;
>
> - /*
> - * Stale inode items should force out the iclog.
> - */
> - if (ip->i_flags & XFS_ISTALE) {
> - rval = XFS_ITEM_PINNED;
> - goto out_unlock;
> - }
> + if (!xfs_buf_trylock(bp))
> + return XFS_ITEM_LOCKED;
>
> - /*
> - * Someone else is already flushing the inode. Nothing we can do
> - * here but wait for the flush to finish and remove the item from
> - * the AIL.
> - */
> - if (!xfs_iflock_nowait(ip)) {
> - rval = XFS_ITEM_FLUSHING;
> - goto out_unlock;
> + if (bp->b_flags & _XBF_DELWRI_Q) {
> + xfs_buf_unlock(bp);
> + return XFS_ITEM_FLUSHING;
> }
> -
> - ASSERT(iip->ili_fields != 0 || XFS_FORCED_SHUTDOWN(ip->i_mount));
> spin_unlock(&lip->li_ailp->ail_lock);
>
> - error = xfs_iflush(ip, &bp);
> + error = xfs_iflush_cluster(ip, bp);
> if (!error) {
> if (!xfs_buf_delwri_queue(bp, buffer_list))
> rval = XFS_ITEM_FLUSHING;
> - xfs_buf_relse(bp);
> - } else if (error == -EAGAIN)
> + xfs_buf_unlock(bp);
> + } else {
> rval = XFS_ITEM_LOCKED;
> + }
>
> spin_lock(&lip->li_ailp->ail_lock);
> -out_unlock:
> - xfs_iunlock(ip, XFS_ILOCK_SHARED);
> return rval;
> }
>
> @@ -567,6 +549,7 @@ xfs_inode_item_release(
>
> ASSERT(ip->i_itemp != NULL);
> ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
> + ASSERT(lip->li_buf || !test_bit(XFS_LI_DIRTY, &lip->li_flags));
>
> lock_flags = iip->ili_lock_flags;
> iip->ili_lock_flags = 0;
> --
> 2.26.2.761.g0e0b3e54be
>
next prev parent reply other threads:[~2020-05-22 23:54 UTC|newest]
Thread overview: 91+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-05-22 3:50 [PATCH 00/24] xfs: rework inode flushing to make inode reclaim fully asynchronous Dave Chinner
2020-05-22 3:50 ` [PATCH 01/24] xfs: remove logged flag from inode log item Dave Chinner
2020-05-22 7:25 ` Christoph Hellwig
2020-05-22 21:13 ` Darrick J. Wong
2020-05-22 3:50 ` [PATCH 02/24] xfs: add an inode item lock Dave Chinner
2020-05-22 6:45 ` Amir Goldstein
2020-05-22 21:24 ` Darrick J. Wong
2020-05-23 8:45 ` Christoph Hellwig
2020-05-22 3:50 ` [PATCH 03/24] xfs: mark inode buffers in cache Dave Chinner
2020-05-22 7:45 ` Amir Goldstein
2020-05-22 21:35 ` Darrick J. Wong
2020-05-24 23:41 ` Dave Chinner
2020-05-23 8:48 ` Christoph Hellwig
2020-05-25 0:06 ` Dave Chinner
2020-05-22 3:50 ` [PATCH 04/24] xfs: mark dquot " Dave Chinner
2020-05-22 7:46 ` Amir Goldstein
2020-05-22 21:38 ` Darrick J. Wong
2020-05-22 3:50 ` [PATCH 05/24] xfs: mark log recovery buffers for completion Dave Chinner
2020-05-22 7:41 ` Amir Goldstein
2020-05-24 23:54 ` Dave Chinner
2020-05-22 21:41 ` Darrick J. Wong
2020-05-22 3:50 ` [PATCH 06/24] xfs: call xfs_buf_iodone directly Dave Chinner
2020-05-22 7:56 ` Amir Goldstein
2020-05-22 21:53 ` Darrick J. Wong
2020-05-22 3:50 ` [PATCH 07/24] xfs: clean up whacky buffer log item list reinit Dave Chinner
2020-05-22 22:01 ` Darrick J. Wong
2020-05-23 8:50 ` Christoph Hellwig
2020-05-22 3:50 ` [PATCH 08/24] xfs: fold xfs_istale_done into xfs_iflush_done Dave Chinner
2020-05-22 22:10 ` Darrick J. Wong
2020-05-23 9:12 ` Christoph Hellwig
2020-05-22 3:50 ` [PATCH 09/24] xfs: use direct calls for dquot IO completion Dave Chinner
2020-05-22 22:13 ` Darrick J. Wong
2020-05-23 9:16 ` Christoph Hellwig
2020-05-22 3:50 ` [PATCH 10/24] xfs: clean up the buffer iodone callback functions Dave Chinner
2020-05-22 22:26 ` Darrick J. Wong
2020-05-25 0:37 ` Dave Chinner
2020-05-23 9:19 ` Christoph Hellwig
2020-05-22 3:50 ` [PATCH 11/24] xfs: get rid of log item callbacks Dave Chinner
2020-05-22 22:27 ` Darrick J. Wong
2020-05-23 9:19 ` Christoph Hellwig
2020-05-22 3:50 ` [PATCH 12/24] xfs: pin inode backing buffer to the inode log item Dave Chinner
2020-05-22 22:39 ` Darrick J. Wong
2020-05-23 9:34 ` Christoph Hellwig
2020-05-23 21:43 ` Dave Chinner
2020-05-24 5:31 ` Christoph Hellwig
2020-05-24 23:13 ` Dave Chinner
2020-05-22 3:50 ` [PATCH 13/24] xfs: make inode reclaim almost non-blocking Dave Chinner
2020-05-22 12:19 ` Amir Goldstein
2020-05-22 22:48 ` Darrick J. Wong
2020-05-23 22:29 ` Dave Chinner
2020-05-22 3:50 ` [PATCH 14/24] xfs: remove IO submission from xfs_reclaim_inode() Dave Chinner
2020-05-22 23:06 ` Darrick J. Wong
2020-05-25 3:49 ` Dave Chinner
2020-05-23 9:40 ` Christoph Hellwig
2020-05-23 22:35 ` Dave Chinner
2020-05-22 3:50 ` [PATCH 15/24] xfs: allow multiple reclaimers per AG Dave Chinner
2020-05-22 23:10 ` Darrick J. Wong
2020-05-23 22:35 ` Dave Chinner
2020-05-22 3:50 ` [PATCH 16/24] xfs: don't block inode reclaim on the ILOCK Dave Chinner
2020-05-22 23:11 ` Darrick J. Wong
2020-05-22 3:50 ` [PATCH 17/24] xfs: remove SYNC_TRYLOCK from inode reclaim Dave Chinner
2020-05-22 23:14 ` Darrick J. Wong
2020-05-23 22:42 ` Dave Chinner
2020-05-22 3:50 ` [PATCH 18/24] xfs: clean up inode reclaim comments Dave Chinner
2020-05-22 23:17 ` Darrick J. Wong
2020-05-22 3:50 ` [PATCH 19/24] xfs: attach inodes to the cluster buffer when dirtied Dave Chinner
2020-05-22 23:48 ` Darrick J. Wong
2020-05-23 22:59 ` Dave Chinner
2020-05-22 3:50 ` [PATCH 20/24] xfs: xfs_iflush() is no longer necessary Dave Chinner
2020-05-22 23:54 ` Darrick J. Wong [this message]
2020-05-22 3:50 ` [PATCH 21/24] xfs: rename xfs_iflush_int() Dave Chinner
2020-05-22 12:33 ` Amir Goldstein
2020-05-22 23:57 ` Darrick J. Wong
2020-05-22 3:50 ` [PATCH 22/24] xfs: rework xfs_iflush_cluster() dirty inode iteration Dave Chinner
2020-05-23 0:13 ` Darrick J. Wong
2020-05-23 23:14 ` Dave Chinner
2020-05-23 11:31 ` Christoph Hellwig
2020-05-23 23:23 ` Dave Chinner
2020-05-24 5:32 ` Christoph Hellwig
2020-05-23 11:39 ` Christoph Hellwig
2020-05-22 3:50 ` [PATCH 23/24] xfs: factor xfs_iflush_done Dave Chinner
2020-05-23 0:20 ` Darrick J. Wong
2020-05-23 11:35 ` Christoph Hellwig
2020-05-22 3:50 ` [PATCH 24/24] xfs: remove xfs_inobp_check() Dave Chinner
2020-05-23 0:16 ` Darrick J. Wong
2020-05-23 11:36 ` Christoph Hellwig
2020-05-22 4:04 ` [PATCH 00/24] xfs: rework inode flushing to make inode reclaim fully asynchronous Dave Chinner
2020-05-23 16:18 ` Darrick J. Wong
2020-05-23 21:22 ` Dave Chinner
2020-05-22 6:18 ` Amir Goldstein
2020-05-22 12:01 ` Amir Goldstein
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200522235447.GX8230@magnolia \
--to=darrick.wong@oracle.com \
--cc=david@fromorbit.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.