From: Dave Chinner <david@fromorbit.com>
To: xfs@oss.sgi.com
Subject: [PATCH 16/18] xfs: serialise inode reclaim within an AG
Date: Fri, 24 Sep 2010 22:31:14 +1000 [thread overview]
Message-ID: <1285331476-23015-17-git-send-email-david@fromorbit.com> (raw)
In-Reply-To: <1285331476-23015-1-git-send-email-david@fromorbit.com>
From: Dave Chinner <dchinner@redhat.com>
Memory reclaim via shrinkers has a terrible habit of having N+M
concurrent shrinker executions (N = num CPUs, M = num kswapds) all
trying to shrink the same cache. When the cache they are all working
on is protected by a single spinlock, massive contention an
slowdowns occur.
Wrap the per-ag inode caches with a reclaim mutex to serialise
reclaim access to the AG. This will block concurrent reclaim in each
AG but still allow reclaim to scan multiple AGs concurrently. Allow
shrinkers to move on to the next AG if it can't get the lock, and if
we can't get any AG, then start blocking on locks.
To prevent reclaimers from continually scanning the same inodes in
each AG, add a cursor that tracks where the last reclaim got up to
and start from that point on the next reclaim. This should avoid
only ever scanning a small number of inodes at the satart of each AG
and not making progress. If we have a non-shrinker based reclaim
pass, ignore the cursor and reset it to zero once we are done.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
---
fs/xfs/linux-2.6/xfs_sync.c | 24 ++++++++++++++++++++++++
fs/xfs/xfs_ag.h | 2 ++
fs/xfs/xfs_mount.c | 1 +
3 files changed, 27 insertions(+), 0 deletions(-)
diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index c0c09e9..7d6f312 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -828,7 +828,9 @@ xfs_reclaim_inodes_ag(
int error = 0;
int last_error = 0;
xfs_agnumber_t ag;
+ int trylock = !!(flags & SYNC_TRYLOCK);
+restart:
ag = 0;
while ((pag = xfs_perag_get_tag(mp, ag, XFS_ICI_RECLAIM_TAG))) {
unsigned long first_index = 0;
@@ -837,6 +839,17 @@ xfs_reclaim_inodes_ag(
ag = pag->pag_agno + 1;
+ if (!mutex_trylock(&pag->pag_ici_reclaim_lock)) {
+ if (trylock) {
+ trylock++;
+ continue;
+ }
+ mutex_lock(&pag->pag_ici_reclaim_lock);
+ }
+
+ if (trylock)
+ first_index = pag->pag_ici_reclaim_cursor;
+
do {
struct xfs_inode *batch[XFS_LOOKUP_BATCH];
int i;
@@ -889,8 +902,19 @@ xfs_reclaim_inodes_ag(
} while (nr_found && !done && *nr_to_scan > 0);
+ pag->pag_ici_reclaim_cursor = (done || !trylock) ? 0 : first_index;
+ mutex_unlock(&pag->pag_ici_reclaim_lock);
xfs_perag_put(pag);
}
+
+ /*
+ * if we skipped any AG, and we still have scan count remaining, do
+ * another pass this time waiting on the reclaim locks.
+ */
+ if (trylock > 1 && *nr_to_scan) {
+ trylock = 0;
+ goto restart;
+ }
return XFS_ERROR(last_error);
}
diff --git a/fs/xfs/xfs_ag.h b/fs/xfs/xfs_ag.h
index 51c42c2..baeec83 100644
--- a/fs/xfs/xfs_ag.h
+++ b/fs/xfs/xfs_ag.h
@@ -230,6 +230,8 @@ typedef struct xfs_perag {
rwlock_t pag_ici_lock; /* incore inode lock */
struct radix_tree_root pag_ici_root; /* incore inode cache root */
int pag_ici_reclaimable; /* reclaimable inodes */
+ struct mutex pag_ici_reclaim_lock; /* serialisation point */
+ unsigned long pag_ici_reclaim_cursor; /* reclaim restart point */
/* for rcu-safe freeing */
struct rcu_head rcu_head;
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index d66e87c..59859c3 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -477,6 +477,7 @@ xfs_initialize_perag(
pag->pag_agno = index;
pag->pag_mount = mp;
rwlock_init(&pag->pag_ici_lock);
+ mutex_init(&pag->pag_ici_reclaim_lock);
INIT_RADIX_TREE(&pag->pag_ici_root, GFP_ATOMIC);
if (radix_tree_preload(GFP_NOFS))
--
1.7.1
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2010-09-24 12:30 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-09-24 12:30 [PATCH 0/18] xfs: metadata scalability V3 Dave Chinner
2010-09-24 12:30 ` [PATCH 01/18] xfs: force background CIL push under sustained load Dave Chinner
2010-09-24 12:31 ` [PATCH 02/18] xfs: reduce the number of CIL lock round trips during commit Dave Chinner
2010-09-24 12:31 ` [PATCH 03/18] xfs: remove debug assert for per-ag reference counting Dave Chinner
2010-09-24 12:31 ` [PATCH 04/18] xfs: lockless per-ag lookups Dave Chinner
2010-09-24 12:31 ` [PATCH 05/18] xfs: don't use vfs writeback for pure metadata modifications Dave Chinner
2010-09-25 23:42 ` Christoph Hellwig
2010-09-27 1:09 ` Dave Chinner
2010-09-24 12:31 ` [PATCH 06/18] xfs: rename xfs_buf_get_nodaddr to be more appropriate Dave Chinner
2010-09-24 12:31 ` [PATCH 07/18] xfs: introduced uncached buffer read primitve Dave Chinner
2010-09-24 12:31 ` [PATCH 08/18] xfs: store xfs_mount in the buftarg instead of in the xfs_buf Dave Chinner
2010-09-24 12:31 ` [PATCH 09/18] xfs: kill XBF_FS_MANAGED buffers Dave Chinner
2010-09-24 12:31 ` [PATCH 10/18] xfs: use unhashed buffers for size checks Dave Chinner
2010-09-24 12:31 ` [PATCH 11/18] xfs: remove buftarg hash for external devices Dave Chinner
2010-09-24 12:31 ` [PATCH 12/18] xfs: split inode AG walking into separate code for reclaim Dave Chinner
2010-09-24 12:31 ` [PATCH 13/18] xfs: split out inode walk inode grabbing Dave Chinner
2010-09-25 16:31 ` Christoph Hellwig
2010-09-24 12:31 ` [PATCH 14/18] xfs: implement batched inode lookups for AG walking Dave Chinner
2010-09-25 16:32 ` Christoph Hellwig
2010-09-24 12:31 ` [PATCH 15/18] xfs: batch inode reclaim lookup Dave Chinner
2010-09-24 12:31 ` Dave Chinner [this message]
2010-09-25 23:49 ` [PATCH 16/18] xfs: serialise inode reclaim within an AG Christoph Hellwig
2010-09-27 0:56 ` Dave Chinner
2010-09-24 12:31 ` [PATCH 17/18] xfs: convert buffer cache hash to rbtree Dave Chinner
2010-09-24 12:31 ` [PATCH 18/18] xfs: pack xfs_buf structure more tightly Dave Chinner
-- strict thread matches above, loose matches on Subject: below --
2010-09-27 1:47 [PATCH 0/18] xfs: metadata scalability V4 Dave Chinner
2010-09-27 1:47 ` [PATCH 16/18] xfs: serialise inode reclaim within an AG Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1285331476-23015-17-git-send-email-david@fromorbit.com \
--to=david@fromorbit.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox