From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 03/30] xfs: add an inode item lock
Date: Tue, 2 Jun 2020 12:34:44 -0400
Message-ID: <20200602163444.GC7967@bfoster>
In-Reply-To: <20200601214251.4167140-4-david@fromorbit.com>
On Tue, Jun 02, 2020 at 07:42:24AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> The inode log item is kind of special in that it can be aggregating
> new changes in memory at the same time that existing changes are
> being written back to disk. This means there are fields in the log
> item that are accessed concurrently from contexts that don't share
> any locking at all.
>
> e.g. updating ili_last_fields occurs under the ILOCK_EXCL and flush
> lock at flush time, under the flush lock at IO
> completion time, and is read under the ILOCK_EXCL when the inode is
> logged. Hence there is no actual serialisation between reading the
> field during logging of the inode in transactions vs clearing the
> field in IO completion.
>
> We currently get away with this by the fact that we are only
> clearing fields in IO completion, and nothing bad happens if we
> accidentally log more of the inode than we actually modify. Worst
> case is we consume a tiny bit more memory and log bandwidth.
>
> However, if we want to do more complex state manipulations on the
> log item that requires updates at all three of these potential
> locations, we need to have some mechanism of serialising those
> operations. To do this, introduce a spinlock into the log item to
> serialise internal state.
>
> This could be done via the xfs_inode i_flags_lock, but this then
> leads to potential lock inversion issues where inode flag updates
> need to occur inside locks that best nest inside the inode log item
> locks (e.g. marking inodes stale during inode cluster freeing).
> Using a separate spinlock avoids these sorts of problems and
> simplifies future code.
>
> This does not touch the use of ili_fields in the item formatting
> code - that is entirely protected by the ILOCK_EXCL at this point in
> time, so it remains untouched.
>
Thanks for pointing this out.
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> fs/xfs/libxfs/xfs_trans_inode.c | 54 +++++++++++++++++----------------
> fs/xfs/xfs_file.c | 9 ++++--
> fs/xfs/xfs_inode.c | 20 +++++++-----
> fs/xfs/xfs_inode_item.c | 7 +++++
> fs/xfs/xfs_inode_item.h | 18 +++++++++--
> 5 files changed, 68 insertions(+), 40 deletions(-)
>
> diff --git a/fs/xfs/libxfs/xfs_trans_inode.c b/fs/xfs/libxfs/xfs_trans_inode.c
> index 4504d215cd590..fe6c2e39be85d 100644
> --- a/fs/xfs/libxfs/xfs_trans_inode.c
> +++ b/fs/xfs/libxfs/xfs_trans_inode.c
...
> @@ -122,23 +117,30 @@ xfs_trans_log_inode(
> * set however, then go ahead and bump the i_version counter
> * unconditionally.
> */
> - if (!test_and_set_bit(XFS_LI_DIRTY, &ip->i_itemp->ili_item.li_flags) &&
> - IS_I_VERSION(VFS_I(ip))) {
> - if (inode_maybe_inc_iversion(VFS_I(ip), flags & XFS_ILOG_CORE))
> - flags |= XFS_ILOG_CORE;
> + if (!test_and_set_bit(XFS_LI_DIRTY, &iip->ili_item.li_flags)) {
> + if (IS_I_VERSION(inode) &&
> + inode_maybe_inc_iversion(inode, flags & XFS_ILOG_CORE))
> + iversion_flags = XFS_ILOG_CORE;
> }
>
> - tp->t_flags |= XFS_TRANS_DIRTY;
> + /*
> + * Record the specific change for fdatasync optimisation. This allows
> + * fdatasync to skip log forces for inodes that are only timestamp
> + * dirty. We do this before the change count so that the core being
> + * logged in this case does not impact on fdatasync behaviour.
> + */
We no longer do this before the change count logic, so that part of the
comment is bogus.
> + spin_lock(&iip->ili_lock);
> + iip->ili_fsync_fields |= flags;
>
> /*
> - * Always OR in the bits from the ili_last_fields field.
> - * This is to coordinate with the xfs_iflush() and xfs_iflush_done()
> - * routines in the eventual clearing of the ili_fields bits.
> - * See the big comment in xfs_iflush() for an explanation of
> - * this coordination mechanism.
> + * Always OR in the bits from the ili_last_fields field. This is to
> + * coordinate with the xfs_iflush() and xfs_iflush_done() routines in
> + * the eventual clearing of the ili_fields bits. See the big comment in
> + * xfs_iflush() for an explanation of this coordination mechanism.
> */
> - flags |= ip->i_itemp->ili_last_fields;
> - ip->i_itemp->ili_fields |= flags;
> + iip->ili_fields |= (flags | iip->ili_last_fields |
> + iversion_flags);
> + spin_unlock(&iip->ili_lock);
> }
>
> int
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 403c90309a8ff..0abf770b77498 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -94,6 +94,7 @@ xfs_file_fsync(
> {
> struct inode *inode = file->f_mapping->host;
> struct xfs_inode *ip = XFS_I(inode);
> + struct xfs_inode_log_item *iip = ip->i_itemp;
> struct xfs_mount *mp = ip->i_mount;
> int error = 0;
> int log_flushed = 0;
> @@ -137,13 +138,15 @@ xfs_file_fsync(
> xfs_ilock(ip, XFS_ILOCK_SHARED);
> if (xfs_ipincount(ip)) {
> if (!datasync ||
> - (ip->i_itemp->ili_fsync_fields & ~XFS_ILOG_TIMESTAMP))
> - lsn = ip->i_itemp->ili_last_lsn;
> + (iip->ili_fsync_fields & ~XFS_ILOG_TIMESTAMP))
> + lsn = iip->ili_last_lsn;
I am still a little confused why the lock is elided in other read cases,
such as this one, or the similar check in xfs_bmbt_to_iomap()?
Similarly, it looks like we set the ili_[flush|last]_lsn fields outside
of this lock (though last_lsn looks like it's also covered by ilock),
yet the update to the inode_log_item struct implies they should be
protected. What's the intent there?
> }
>
> if (lsn) {
> error = xfs_log_force_lsn(mp, lsn, XFS_LOG_SYNC, &log_flushed);
> - ip->i_itemp->ili_fsync_fields = 0;
> + spin_lock(&iip->ili_lock);
> + iip->ili_fsync_fields = 0;
> + spin_unlock(&iip->ili_lock);
> }
> xfs_iunlock(ip, XFS_ILOCK_SHARED);
>
...
> diff --git a/fs/xfs/xfs_inode_item.h b/fs/xfs/xfs_inode_item.h
> index 4de5070e07655..44c47c08b0b59 100644
> --- a/fs/xfs/xfs_inode_item.h
> +++ b/fs/xfs/xfs_inode_item.h
> @@ -16,12 +16,24 @@ struct xfs_mount;
> struct xfs_inode_log_item {
> struct xfs_log_item ili_item; /* common portion */
> struct xfs_inode *ili_inode; /* inode ptr */
> - xfs_lsn_t ili_flush_lsn; /* lsn at last flush */
> - xfs_lsn_t ili_last_lsn; /* lsn at last transaction */
> - unsigned short ili_lock_flags; /* lock flags */
> + unsigned short ili_lock_flags; /* inode lock flags */
> + /*
> + * The ili_lock protects the interactions between the dirty state and
> + * the flush state of the inode log item. This allows us to do atomic
> + * modifications of multiple state fields without having to hold a
> + * specific inode lock to serialise them.
> + *
> + * We need atomic changes between indoe dirtying, inode flushing and
s/indoe/inode/
Brian
> + * inode completion, but these all hold different combinations of
> + * ILOCK and iflock and hence we need some other method of serialising
> + * updates to the flush state.
> + */
> + spinlock_t ili_lock; /* flush state lock */
> unsigned int ili_last_fields; /* fields when flushed */
> unsigned int ili_fields; /* fields to be logged */
> unsigned int ili_fsync_fields; /* logged since last fsync */
> + xfs_lsn_t ili_flush_lsn; /* lsn at last flush */
> + xfs_lsn_t ili_last_lsn; /* lsn at last transaction */
> };
>
> static inline int xfs_inode_clean(xfs_inode_t *ip)
> --
> 2.26.2.761.g0e0b3e54be
>