From: Dave Chinner <david@fromorbit.com>
To: xfs@oss.sgi.com
Subject: [PATCH 20/34] xfs: remove all the inodes on a buffer from the AIL in bulk
Date: Tue, 21 Dec 2010 18:29:16 +1100 [thread overview]
Message-ID: <1292916570-25015-21-git-send-email-david@fromorbit.com> (raw)
In-Reply-To: <1292916570-25015-1-git-send-email-david@fromorbit.com>
From: Dave Chinner <dchinner@redhat.com>
When inode buffer IO completes, usually all of the inodes are removed from the
AIL. This involves processing them one at a time and taking the AIL lock once
for every inode. When all CPUs are processing inode IO completions, this causes
excessive amount sof contention on the AIL lock.
Instead, change the way we process inode IO completion in the buffer
IO done callback. Allow the inode IO done callback to walk the list
of IO done callbacks and pull all the inodes off the buffer in one
go and then process them as a batch.
Once all the inodes for removal are collected, take the AIL lock
once and do a bulk removal operation to minimise traffic on the AIL
lock.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_inode_item.c | 92 ++++++++++++++++++++++++++++++++++++++---------
fs/xfs/xfs_trans_ail.c | 73 +++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_trans_priv.h | 4 ++
3 files changed, 152 insertions(+), 17 deletions(-)
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index 7c8d30c..fd4f398 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -842,15 +842,64 @@ xfs_inode_item_destroy(
* flushed to disk. It is responsible for removing the inode item
* from the AIL if it has not been re-logged, and unlocking the inode's
* flush lock.
+ *
+ * To reduce AIL lock traffic as much as possible, we scan the buffer log item
+ * list for other inodes that will run this function. We remove them from the
+ * buffer list so we can process all the inode IO completions in one AIL lock
+ * traversal.
*/
void
xfs_iflush_done(
struct xfs_buf *bp,
struct xfs_log_item *lip)
{
- struct xfs_inode_log_item *iip = INODE_ITEM(lip);
- xfs_inode_t *ip = iip->ili_inode;
+ struct xfs_inode_log_item *iip;
+ struct xfs_log_item *blip;
+ struct xfs_log_item *next;
+ struct xfs_log_item *prev;
struct xfs_ail *ailp = lip->li_ailp;
+ int need_ail = 0;
+
+ /*
+ * Scan the buffer IO completions for other inodes being completed and
+ * attach them to the current inode log item.
+ */
+ blip = XFS_BUF_FSPRIVATE(bp, xfs_log_item_t *);
+ prev = NULL;
+ while (blip != NULL) {
+ if (lip->li_cb != xfs_iflush_done) {
+ prev = blip;
+ blip = blip->li_bio_list;
+ continue;
+ }
+
+ /* remove from list */
+ next = blip->li_bio_list;
+ if (!prev) {
+ XFS_BUF_SET_FSPRIVATE(bp, next);
+ } else {
+ prev->li_bio_list = next;
+ }
+
+ /* add to current list */
+ blip->li_bio_list = lip->li_bio_list;
+ lip->li_bio_list = blip;
+
+ /*
+ * while we have the item, do the unlocked check for needing
+ * the AIL lock.
+ */
+ iip = INODE_ITEM(blip);
+ if (iip->ili_logged && blip->li_lsn == iip->ili_flush_lsn)
+ need_ail++;
+
+ blip = next;
+ }
+
+ /* make sure we capture the state of the initial inode. */
+ iip = INODE_ITEM(lip);
+ if (iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn)
+ need_ail++;
/*
* We only want to pull the item from the AIL if it is
@@ -861,28 +910,37 @@ xfs_iflush_done(
* the lock since it's cheaper, and then we recheck while
* holding the lock before removing the inode from the AIL.
*/
- if (iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn) {
+ if (need_ail) {
+ struct xfs_log_item *log_items[need_ail];
+ int i = 0;
spin_lock(&ailp->xa_lock);
- if (lip->li_lsn == iip->ili_flush_lsn) {
- /* xfs_trans_ail_delete() drops the AIL lock. */
- xfs_trans_ail_delete(ailp, lip);
- } else {
- spin_unlock(&ailp->xa_lock);
+ for (blip = lip; blip; blip = blip->li_bio_list) {
+ iip = INODE_ITEM(blip);
+ if (iip->ili_logged &&
+ blip->li_lsn == iip->ili_flush_lsn) {
+ log_items[i++] = blip;
+ }
+ ASSERT(i <= need_ail);
}
+ /* xfs_trans_ail_delete_bulk() drops the AIL lock. */
+ xfs_trans_ail_delete_bulk(ailp, log_items, i);
}
- iip->ili_logged = 0;
/*
- * Clear the ili_last_fields bits now that we know that the
- * data corresponding to them is safely on disk.
+ * clean up and unlock the flush lock now we are done. We can clear the
+ * ili_last_fields bits now that we know that the data corresponding to
+ * them is safely on disk.
*/
- iip->ili_last_fields = 0;
+ for (blip = lip; blip; blip = next) {
+ next = blip->li_bio_list;
+ blip->li_bio_list = NULL;
- /*
- * Release the inode's flush lock since we're done with it.
- */
- xfs_ifunlock(ip);
+ iip = INODE_ITEM(blip);
+ iip->ili_logged = 0;
+ iip->ili_last_fields = 0;
+ xfs_ifunlock(iip->ili_inode);
+ }
}
/*
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index fe991a7..218f968 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -639,6 +639,79 @@ xfs_trans_ail_delete(
}
}
+/*
+ * xfs_trans_ail_delete_bulk - remove multiple log items from the AIL
+ *
+ * @xfs_trans_ail_delete_bulk takes an array of log items that all need to
+ * removed from the AIL. The caller is already holding the AIL lock, and done
+ * all the checks necessary to ensure the items passed in via @log_items are
+ * ready for deletion. This includes checking that the items are in the AIL.
+ *
+ * For each log item to be removed, unlink it from the AIL, clear the IN_AIL
+ * flag from the item and reset the item's lsn to 0. If we remove the first
+ * item in the AIL, update the log tail to match the new minimum LSN in the
+ * AIL.
+ *
+ * This function will not drop the AIL lock until all items are removed from
+ * the AIL to minimise the amount of lock traffic on the AIL. This does not
+ * greatly increase the AIL hold time, but does significantly reduce the amount
+ * of traffic on the lock, especially during IO completion.
+ *
+ * This function must be called with the AIL lock held. The lock is dropped
+ * before returning.
+ */
+void
+xfs_trans_ail_delete_bulk(
+ struct xfs_ail *ailp,
+ struct xfs_log_item **log_items,
+ int nr_items) __releases(ailp->xa_lock)
+{
+ xfs_log_item_t *mlip;
+ xfs_lsn_t tail_lsn;
+ int mlip_changed = 0;
+ int i;
+
+ mlip = xfs_ail_min(ailp);
+
+ for (i = 0; i < nr_items; i++) {
+ struct xfs_log_item *lip = log_items[i];
+ if (!(lip->li_flags & XFS_LI_IN_AIL)) {
+ struct xfs_mount *mp = ailp->xa_mount;
+
+ spin_unlock(&ailp->xa_lock);
+ if (!XFS_FORCED_SHUTDOWN(mp)) {
+ xfs_cmn_err(XFS_PTAG_AILDELETE, CE_ALERT, mp,
+ "%s: attempting to delete a log item that is not in the AIL",
+ __func__);
+ xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
+ }
+ return;
+ }
+
+ xfs_ail_delete(ailp, lip);
+ lip->li_flags &= ~XFS_LI_IN_AIL;
+ lip->li_lsn = 0;
+ if (mlip == lip)
+ mlip_changed = 1;
+ }
+
+ if (!mlip_changed) {
+ spin_unlock(&ailp->xa_lock);
+ return;
+ }
+
+ /*
+ * It is not safe to access mlip after the AIL lock is dropped, so we
+ * must get a copy of li_lsn before we do so. This is especially
+ * important on 32-bit platforms where accessing and updating 64-bit
+ * values like li_lsn is not atomic. It is possible we've emptied the
+ * AIL here, so if that is the case, pass an LSN of 0 to the tail move.
+ */
+ mlip = xfs_ail_min(ailp);
+ tail_lsn = mlip ? mlip->li_lsn : 0;
+ spin_unlock(&ailp->xa_lock);
+ xfs_log_move_tail(ailp->xa_mount, tail_lsn);
+}
/*
diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
index e039729..246ca4d 100644
--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -85,6 +85,10 @@ void xfs_trans_ail_update_bulk(struct xfs_ail *ailp,
void xfs_trans_ail_delete(struct xfs_ail *ailp,
struct xfs_log_item *lip)
__releases(ailp->xa_lock);
+void xfs_trans_ail_delete_bulk(struct xfs_ail *ailp,
+ struct xfs_log_item **log_items,
+ int nr_items)
+ __releases(ailp->xa_lock);
void xfs_trans_ail_push(struct xfs_ail *, xfs_lsn_t);
void xfs_trans_unlocked_item(struct xfs_ail *,
xfs_log_item_t *);
--
1.7.2.3
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2010-12-21 7:28 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-12-21 7:28 [PATCH 00/34] xfs: scalability patchset for 2.6.38 Dave Chinner
2010-12-21 7:28 ` [PATCH 01/34] xfs: provide a inode iolock lockdep class Dave Chinner
2010-12-21 15:15 ` Christoph Hellwig
2010-12-21 7:28 ` [PATCH 02/34] xfs: use KM_NOFS for allocations during attribute list operations Dave Chinner
2010-12-21 15:16 ` Christoph Hellwig
2010-12-21 7:28 ` [PATCH 03/34] lib: percpu counter add unless less than functionality Dave Chinner
2010-12-22 2:20 ` Alex Elder
2010-12-22 3:46 ` Dave Chinner
2010-12-21 7:29 ` [PATCH 04/34] xfs: use generic per-cpu counter infrastructure Dave Chinner
2010-12-21 7:29 ` [PATCH 05/34] xfs: demultiplex xfs_icsb_modify_counters() Dave Chinner
2010-12-21 7:29 ` [PATCH 06/34] xfs: dynamic speculative EOF preallocation Dave Chinner
2010-12-21 15:15 ` Christoph Hellwig
2010-12-21 21:42 ` Dave Chinner
2010-12-21 23:44 ` Dave Chinner
2010-12-22 2:29 ` Alex Elder
2010-12-29 12:56 ` Christoph Hellwig
2010-12-27 14:57 ` Alex Elder
2010-12-27 15:00 ` Alex Elder
2011-01-06 18:16 ` Christoph Hellwig
2010-12-21 7:29 ` [PATCH 07/34] xfs: don't truncate prealloc from frequently accessed inodes Dave Chinner
2010-12-21 16:45 ` Christoph Hellwig
2010-12-21 7:29 ` [PATCH 08/34] xfs: rcu free inodes Dave Chinner
2010-12-21 7:29 ` [PATCH 09/34] xfs: convert inode cache lookups to use RCU locking Dave Chinner
2010-12-21 7:29 ` [PATCH 10/34] xfs: convert pag_ici_lock to a spin lock Dave Chinner
2010-12-21 7:29 ` [PATCH 11/34] xfs: convert xfsbud shrinker to a per-buftarg shrinker Dave Chinner
2010-12-21 7:29 ` [PATCH 12/34] xfs: add a lru to the XFS buffer cache Dave Chinner
2010-12-21 7:29 ` [PATCH 13/34] xfs: connect up buffer reclaim priority hooks Dave Chinner
2010-12-21 7:29 ` [PATCH 14/34] xfs: fix EFI transaction cancellation Dave Chinner
2010-12-21 7:29 ` [PATCH 15/34] xfs: Pull EFI/EFD handling out from under the AIL lock Dave Chinner
2010-12-21 7:29 ` [PATCH 16/34] xfs: clean up xfs_ail_delete() Dave Chinner
2010-12-21 7:29 ` [PATCH 17/34] xfs: bulk AIL insertion during transaction commit Dave Chinner
2010-12-21 7:29 ` [PATCH 18/34] xfs: reduce the number of AIL push wakeups Dave Chinner
2010-12-21 7:29 ` [PATCH 19/34] xfs: consume iodone callback items on buffers as they are processed Dave Chinner
2010-12-21 7:29 ` Dave Chinner [this message]
2010-12-22 2:20 ` [PATCH 20/34] xfs: remove all the inodes on a buffer from the AIL in bulk Alex Elder
2010-12-22 3:49 ` Dave Chinner
2010-12-21 7:29 ` [PATCH 22/34] xfs: use AIL bulk delete function to implement single delete Dave Chinner
2010-12-21 7:29 ` [PATCH 23/34] xfs: convert log grant ticket queues to list heads Dave Chinner
2010-12-21 7:29 ` [PATCH 24/34] xfs: fact out common grant head/log tail verification code Dave Chinner
2010-12-21 7:29 ` [PATCH 25/34] xfs: rework log grant space calculations Dave Chinner
2010-12-21 7:29 ` [PATCH 26/34] xfs: combine grant heads into a single 64 bit integer Dave Chinner
2010-12-21 7:29 ` [PATCH 27/34] xfs: use wait queues directly for the log wait queues Dave Chinner
2010-12-21 7:29 ` [PATCH 28/34] xfs: make AIL tail pushing independent of the grant lock Dave Chinner
2010-12-21 7:29 ` [PATCH 29/34] xfs: convert l_last_sync_lsn to an atomic variable Dave Chinner
2010-12-21 7:29 ` [PATCH 30/34] xfs: convert l_tail_lsn " Dave Chinner
2010-12-29 12:52 ` Christoph Hellwig
2010-12-29 15:49 ` Alex Elder
2010-12-21 7:29 ` [PATCH 31/34] xfs: convert log grant heads to atomic variables Dave Chinner
2010-12-21 7:29 ` [PATCH 32/34] xfs: introduce new locks for the log grant ticket wait queues Dave Chinner
2010-12-21 7:29 ` [PATCH 33/34] xfs: convert grant head manipulations to lockless algorithm Dave Chinner
2010-12-21 7:29 ` [PATCH 34/34] xfs: kill useless spinlock_destroy macro Dave Chinner
2010-12-23 1:15 ` [PATCH 00/34] xfs: scalability patchset for 2.6.38 Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1292916570-25015-21-git-send-email-david@fromorbit.com \
--to=david@fromorbit.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox