From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/2] xfs: convert m_active_trans counter to per-cpu
Date: Tue, 12 May 2020 15:59:18 -0700 [thread overview]
Message-ID: <20200512225918.GA1984748@magnolia> (raw)
In-Reply-To: <20200512213919.GT2040@dread.disaster.area>
On Wed, May 13, 2020 at 07:39:19AM +1000, Dave Chinner wrote:
> On Tue, May 12, 2020 at 09:03:52AM -0700, Darrick J. Wong wrote:
> > On Tue, May 12, 2020 at 12:59:49PM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > >
> > > It's a global atomic counter, and we are hitting it at a rate of
> > > half a million transactions a second, so it's bouncing the counter
> > > cacheline all over the place on large machines. Convert it to a
> > > per-cpu counter.
> > >
> > > And .... oh wow, that was unexpected!
> > >
> > > Concurrent create, 50 million inodes, identical 16p/16GB virtual
> > > machines on different physical hosts. Machine A has twice the CPU
> > > cores per socket of machine B:
> > >
> > >                 unpatched    patched
> > > machine A:        3m45s       2m27s
> > > machine B:        4m13s       4m14s
> > >
> > > Create rates:
> > >                 unpatched    patched
> > > machine A:     246k+/-15k   384k+/-10k
> > > machine B:     225k+/-13k   223k+/-11k
> > >
> > > Concurrent rm of same 50 million inodes:
> > >
> > >                 unpatched    patched
> > > machine A:        8m30s       3m09s
> > > machine B:        4m02s       4m51s
> > >
> > > The transaction rate on the fast machine went from about 250k/sec to
> > > over 600k/sec, which indicates just how much of a bottleneck this
> > > atomic counter was.
> > >
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > ---
> > >  fs/xfs/xfs_mount.h |  2 +-
> > >  fs/xfs/xfs_super.c | 12 +++++++++---
> > >  fs/xfs/xfs_trans.c |  6 +++---
> > >  3 files changed, 13 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> > > index 712b3e2583316..af3d8b71e9591 100644
> > > --- a/fs/xfs/xfs_mount.h
> > > +++ b/fs/xfs/xfs_mount.h
> > > @@ -84,6 +84,7 @@ typedef struct xfs_mount {
> > >  	 * extents or anything related to the rt device.
> > >  	 */
> > >  	struct percpu_counter	m_delalloc_blks;
> > > +	struct percpu_counter	m_active_trans;	/* in progress xact counter */
> > >  
> > >  	struct xfs_buf		*m_sb_bp;	/* buffer for superblock */
> > >  	char			*m_rtname;	/* realtime device name */
> > > @@ -164,7 +165,6 @@ typedef struct xfs_mount {
> > >  	uint64_t		m_resblks;	/* total reserved blocks */
> > >  	uint64_t		m_resblks_avail;/* available reserved blocks */
> > >  	uint64_t		m_resblks_save;	/* reserved blks @ remount,ro */
> > > -	atomic_t		m_active_trans;	/* number trans frozen */
> > >  	struct xfs_mru_cache	*m_filestream;	/* per-mount filestream data */
> > >  	struct delayed_work	m_reclaim_work;	/* background inode reclaim */
> > >  	struct delayed_work	m_eofblocks_work; /* background eof blocks
> > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > > index e80bd2c4c279e..bc4853525ce18 100644
> > > --- a/fs/xfs/xfs_super.c
> > > +++ b/fs/xfs/xfs_super.c
> > > @@ -883,7 +883,7 @@ xfs_quiesce_attr(
> > >  	int	error = 0;
> > >  
> > >  	/* wait for all modifications to complete */
> > > -	while (atomic_read(&mp->m_active_trans) > 0)
> > > +	while (percpu_counter_sum(&mp->m_active_trans) > 0)
> > >  		delay(100);
> >
> > Hmm. AFAICT, this counter stops us from quiescing the log while
> > transactions are still running. We only quiesce the log for unmount,
> > remount-ro, and fs freeze. Given that we now start_sb_write for
> > xfs_getfsmap and the background freeing threads, I wonder, do we still
> > need this at all?
>
> Perhaps not - I didn't look that far. It's basically only needed to
> protect against XFS_TRANS_NO_WRITECOUNT transactions, which is
> really just xfs_sync_sb() these days. This can come from several
> places, but the only one outside of mount/freeze/unmount is the log
> worker. Perhaps the log worker can be cancelled before calling
> xfs_quiesce_attr() rather than after?
What if we skip bumping m_active_trans for NO_WRITECOUNT transactions?
There aren't that many of them, and it'd probably be better for memory
consumption on 1000-core systems. ;)
--D
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
Thread overview: 11+ messages
2020-05-12 2:59 [PATCH 0/2] xfs: fix a couple of performance issues Dave Chinner
2020-05-12 2:59 ` [PATCH 1/2] xfs: separate read-only variables in struct xfs_mount Dave Chinner
2020-05-12 8:14 ` Christoph Hellwig
2020-05-12 9:11 ` Dave Chinner
2020-05-12 2:59 ` [PATCH 2/2] xfs: convert m_active_trans counter to per-cpu Dave Chinner
2020-05-12 8:16 ` Christoph Hellwig
2020-05-12 9:06 ` Dave Chinner
2020-05-12 16:03 ` Darrick J. Wong
2020-05-12 21:39 ` Dave Chinner
2020-05-12 22:59 ` Darrick J. Wong [this message]
2020-05-13 17:17 ` Darrick J. Wong