From: Waiman Long <Waiman.Long@hpe.com>
To: Tejun Heo <tj@kernel.org>,
Christoph Lameter <cl@linux-foundation.org>,
Dave Chinner <dchinner@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
Scott J Norton <scott.norton@hp.com>,
linux-kernel@vger.kernel.org, Waiman Long <Waiman.Long@hpe.com>,
xfs@oss.sgi.com, Ingo Molnar <mingo@redhat.com>,
Douglas Hatch <doug.hatch@hp.com>
Subject: [RFC PATCH 2/2] xfs: Allow degeneration of m_fdblocks/m_ifree to global counters
Date: Fri, 4 Mar 2016 21:51:39 -0500
Message-ID: <1457146299-1601-3-git-send-email-Waiman.Long@hpe.com>
In-Reply-To: <1457146299-1601-1-git-send-email-Waiman.Long@hpe.com>
Small XFS filesystems on systems with a large number of CPUs can incur
significant overhead from excessive calls to percpu_counter_sum(),
which has to walk through a large number of different cachelines.
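To illustrate where the cost comes from, here is a minimal userspace
sketch of the compare-then-sum pattern. It is a model, not the kernel
code: the names pcpu_counter, counter_sum(), counter_compare() and
slots_walked are made up for illustration, and NR_CPUS is fixed at 80
only to match the test system mentioned in this message.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NR_CPUS 80	/* assumption: matches the 80-thread test system */

/* Hypothetical userspace model of a per-cpu counter. */
struct pcpu_counter {
	int64_t count;		/* approximate global value */
	int32_t pcpu[NR_CPUS];	/* per-CPU deltas, each on its own cacheline in a real kernel */
};

/* Counts how many per-CPU slots the slow path has touched. */
static unsigned long slots_walked;

/* Exact sum: has to visit every per-CPU slot. */
static int64_t counter_sum(const struct pcpu_counter *c)
{
	int64_t sum = c->count;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		sum += c->pcpu[cpu];
		slots_walked++;
	}
	return sum;
}

/*
 * Compare against rhs. The approximate value is good enough only if
 * it is further from rhs than the worst-case error (batch * NR_CPUS);
 * otherwise fall back to the exact, expensive sum.
 */
static int counter_compare(const struct pcpu_counter *c, int64_t rhs,
			   int32_t batch)
{
	int64_t count = c->count;

	assert(batch > 0);
	if (llabs(count - rhs) > (int64_t)batch * NR_CPUS)
		return count > rhs ? 1 : -1;

	count = counter_sum(c);
	if (count > rhs)
		return 1;
	if (count < rhs)
		return -1;
	return 0;
}
```

With a batch of 1024 and 80 CPUs, the cheap path in this model is taken
only when the counter is more than 81920 away from the value it is
compared against. A counter that never gets that far away walks all 80
slots on every compare.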
This patch uses the newly added percpu_counter_set_limit() API to
potentially switch the m_fdblocks and m_ifree per-cpu counters to
global counters with locks at filesystem mount time if the filesystem
is small relative to the number of CPUs available.
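For comparison, a global counter protected by a lock can be sketched
as below. This is only a userspace illustration of the idea: the names
global_counter, gc_add() and gc_compare() are hypothetical, a C11
atomic_flag stands in for a kernel spinlock, and the criterion that
percpu_counter_set_limit() uses to decide on the switch lives in patch
1/2 and is not modelled here.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model: one shared value guarded by one lock. */
struct global_counter {
	atomic_flag lock;	/* stand-in for a kernel spinlock */
	int64_t count;		/* always exact */
};

static void gc_lock(struct global_counter *c)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&c->lock,
						 memory_order_acquire))
		;
}

static void gc_unlock(struct global_counter *c)
{
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/* Every update takes the lock, so updates serialize across CPUs. */
static void gc_add(struct global_counter *c, int64_t delta)
{
	gc_lock(c);
	c->count += delta;
	gc_unlock(c);
}

/* Compare is exact and touches a single location: no per-CPU walk. */
static int gc_compare(struct global_counter *c, int64_t rhs)
{
	int64_t v;

	gc_lock(c);
	v = c->count;
	gc_unlock(c);
	return (v > rhs) - (v < rhs);
}
```

The trade-off in this model is that every update contends on one lock,
while a compare never has to visit per-CPU state. That favours the
global form only when the compare slow path would otherwise dominate.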
A possible use case is using an NVDIMM as an application scratch
storage area for log files and other small files. Current battery-backed
NVDIMMs are pretty small, e.g. 8G per DIMM, so we cannot create a
large filesystem on top of them.
On a 4-socket 80-thread system running a 4.5-rc6 kernel, this patch can
improve the throughput of the AIM7 XFS disk workload by 25%. Before
the patch, the perf profile was:
  18.68%     0.08%  reaim  [k] __percpu_counter_compare
  18.05%     9.11%  reaim  [k] __percpu_counter_sum
   0.37%     0.36%  reaim  [k] __percpu_counter_add
After the patch, the perf profile was:
   0.73%     0.36%  reaim  [k] __percpu_counter_add
   0.27%     0.27%  reaim  [k] __percpu_counter_compare
Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
---
fs/xfs/xfs_mount.c | 1 -
fs/xfs/xfs_mount.h | 5 +++++
fs/xfs/xfs_super.c | 6 ++++++
3 files changed, 11 insertions(+), 1 deletions(-)
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index bb753b3..fe74b91 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1163,7 +1163,6 @@ xfs_mod_ifree(
* a large batch count (1024) to minimise global counter updates except when
* we get near to ENOSPC and we have to be very accurate with our updates.
*/
-#define XFS_FDBLOCKS_BATCH 1024
int
xfs_mod_fdblocks(
struct xfs_mount *mp,
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index b570984..d9520f4 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -206,6 +206,11 @@ typedef struct xfs_mount {
#define XFS_WSYNC_WRITEIO_LOG 14 /* 16k */
/*
+ * FD blocks batch size for per-cpu compare
+ */
+#define XFS_FDBLOCKS_BATCH 1024
+
+/*
* Allow large block sizes to be reported to userspace programs if the
* "largeio" mount option is used.
*
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 59c9b7b..c0b4f79 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1412,6 +1412,12 @@ xfs_reinit_percpu_counters(
percpu_counter_set(&mp->m_icount, mp->m_sb.sb_icount);
percpu_counter_set(&mp->m_ifree, mp->m_sb.sb_ifree);
percpu_counter_set(&mp->m_fdblocks, mp->m_sb.sb_fdblocks);
+
+ /*
+ * Use default batch size for m_ifree
+ */
+ percpu_counter_set_limit(&mp->m_ifree, 0);
+ percpu_counter_set_limit(&mp->m_fdblocks, 4 * XFS_FDBLOCKS_BATCH);
}
static void
--
1.7.1