From: Nick Piggin <npiggin@kernel.dk>
To: xfs@oss.sgi.com
Cc: linux-fsdevel@vger.kernel.org
Subject: Re: VFS scalability git tree
Date: Tue, 27 Jul 2010 18:06:32 +1000
Message-ID: <20100727080632.GA4958@amd>
In-Reply-To: <20100727070538.GA2893@amd>

On Tue, Jul 27, 2010 at 05:05:39PM +1000, Nick Piggin wrote:
> On Fri, Jul 23, 2010 at 11:55:14PM +1000, Dave Chinner wrote:
> > On Fri, Jul 23, 2010 at 05:01:00AM +1000, Nick Piggin wrote:
> > > I'm pleased to announce I have a git tree up of my vfs scalability work.
> > > 
> > > git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin.git
> > > http://git.kernel.org/?p=linux/kernel/git/npiggin/linux-npiggin.git
> > > 
> > > Branch vfs-scale-working
> > 
> > With a production build (i.e. no lockdep, no xfs debug), I'll
> > run the same fs_mark parallel create/unlink workload to show
> > scalability as I ran here:
> > 
> > http://oss.sgi.com/archives/xfs/2010-05/msg00329.html
> 
> I've made a similar setup, 2s8c machine, but using 2GB ramdisk instead
> of a real disk (I don't have easy access to a good disk setup ATM, but
> I guess we're more interested in code above the block layer anyway).
> 
> Made an XFS on /dev/ram0 with 16 ags, 64MB log, otherwise same config as
> yours.
> 
> I found that performance is a little unstable, so I sync and echo 3 >
> drop_caches between each run. When it starts reclaiming memory, things
> get a bit more erratic (and XFS seemed to be almost livelocking for tens
> of seconds in inode reclaim).

So about this XFS livelock type thingy. It looks like this, and happens
periodically while running the above fs_mark benchmark requiring reclaim
of inodes:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
15  0   6900  31032    192 471852    0    0    28 183296 8520 46672  5 91  4  0
19  0   7044  22928    192 466712   96  144  1056 115586 8622 41695  3 96  1  0
19  0   7136  59884    192 471200  160   92  6768 34564  995  542  1 99 0  0
19  0   7244  17008    192 467860    0  104  2068 32953 1044  630  1 99 0  0
18  0   7244  43436    192 467324    0    0    12     0  817  405  0 100 0  0
18  0   7244  43684    192 467324    0    0     0     0  806  425  0 100 0  0
18  0   7244  43932    192 467324    0    0     0     0  808  403  0 100 0  0
18  0   7244  44924    192 467324    0    0     0     0  808  398  0 100 0  0
18  0   7244  45456    192 467324    0    0     0     0  809  409  0 100 0  0
18  0   7244  45472    192 467324    0    0     0     0  805  412  0 100 0  0
18  0   7244  46392    192 467324    0    0     0     0  807  401  0 100 0  0
18  0   7244  47012    192 467324    0    0     0     0  810  414  0 100 0  0
18  0   7244  47260    192 467324    0    0     0     0  806  396  0 100 0  0
18  0   7244  47752    192 467324    0    0     0     0  806  403  0 100 0  0
18  0   7244  48204    192 467324    0    0     0     0  810  409  0 100 0  0
18  0   7244  48608    192 467324    0    0     0     0  807  412  0 100 0  0
18  0   7244  48876    192 467324    0    0     0     0  805  406  0 100 0  0
18  0   7244  49000    192 467324    0    0     0     0  809  402  0 100 0  0
18  0   7244  49408    192 467324    0    0     0     0  807  396  0 100 0  0
18  0   7244  49908    192 467324    0    0     0     0  809  406  0 100 0  0
18  0   7244  50032    192 467324    0    0     0     0  805  404  0 100 0  0
18  0   7244  50032    192 467324    0    0     0     0  805  406  0 100 0  0
19  0   7244  73436    192 467324    0    0     0  6340  808  384  0 100 0  0
20  0   7244 490220    192 467324    0    0     0  8411  830  389  0 100 0  0
18  0   7244 620092    192 467324    0    0     0     4  809  435  0 100 0  0
18  0   7244 620344    192 467324    0    0     0     0  806  430  0 100 0  0
16  0   7244 682620    192 467324    0    0    44    80  890  326  0 100 0  0
12  0   7244 604464    192 479308   76    0 11716 73555 2242 14318  2 94 4  0
12  0   7244 556700    192 483488    0    0  4276 77680 6576 92285  1 97 2  0
17  0   7244 502508    192 485456    0    0  2092 98368 6308 91919  1 96 4  0
11  0   7244 416500    192 487116    0    0  1760 114844 7414 63025  2 96  2  0

Nothing much happening except 100% system time for seconds at a time
(length of time varies). This is on a ramdisk, so it isn't waiting
for IO.

During this time, lots of tasks are contending for the write side of
pag_ici_lock:

    60.37%         fs_mark  [kernel.kallsyms]   [k] __write_lock_failed
     4.30%         kswapd0  [kernel.kallsyms]   [k] __write_lock_failed
     3.70%         fs_mark  [kernel.kallsyms]   [k] try_wait_for_completion
     3.59%         fs_mark  [kernel.kallsyms]   [k] _raw_write_lock
     3.46%         kswapd1  [kernel.kallsyms]   [k] __write_lock_failed
                   |
                   --- __write_lock_failed
                      |
                      |--99.92%-- xfs_inode_ag_walk
                      |          xfs_inode_ag_iterator
                      |          xfs_reclaim_inode_shrink
                      |          shrink_slab
                      |          shrink_zone
                      |          balance_pgdat
                      |          kswapd
                      |          kthread
                      |          kernel_thread_helper
                       --0.08%-- [...]

     3.02%         fs_mark  [kernel.kallsyms]   [k] _raw_spin_lock
     1.82%         fs_mark  [kernel.kallsyms]   [k] _xfs_buf_find
     1.16%         fs_mark  [kernel.kallsyms]   [k] memcpy
     0.86%         fs_mark  [kernel.kallsyms]   [k] _raw_spin_lock_irqsave
     0.75%         fs_mark  [kernel.kallsyms]   [k] xfs_log_commit_cil
                   |
                   --- xfs_log_commit_cil
                       _xfs_trans_commit
                      |
                      |--60.00%-- xfs_remove
                      |          xfs_vn_unlink
                      |          vfs_unlink
                      |          do_unlinkat
                      |          sys_unlink

I'm not sure if there was a long-running read locker in there causing
all the write lockers to fail, or if they are just running into one
another. But anyway, I hacked the following patch which seemed to
improve that behaviour. I haven't run any throughput numbers on it yet,
but I could if you're interested (and it's not completely broken!)

Batch pag_ici_lock acquisition on the reclaim path, and also skip inodes
that appear to be busy to improve locking efficiency.
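
To illustrate the batching idea outside the kernel, here is a minimal
userspace sketch. It is not the XFS code: it assumes a fixed array standing
in for the per-AG radix tree and a pthread mutex standing in for the
exclusive (write) side of pag_ici_lock. The names struct table, fill_batch,
reclaim_all and the ITEM_* flags are hypothetical, and BATCH_SIZE is kept
small only so the behaviour is easy to follow.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NITEMS     10
#define BATCH_SIZE 4	/* illustrative; the patch uses 32 */

enum { ITEM_BUSY = 1, ITEM_CLAIMED = 2, ITEM_DONE = 4 };

struct item {
	int flags;
};

struct table {
	pthread_mutex_t	lock;			/* stand-in for pag_ici_lock */
	struct item	items[NITEMS];		/* stand-in for the radix tree */
	int		lock_acquisitions;	/* instrumentation only */
};

void table_init(struct table *t)
{
	size_t i;

	pthread_mutex_init(&t->lock, NULL);
	for (i = 0; i < NITEMS; i++)
		t->items[i].flags = 0;
	t->lock_acquisitions = 0;
}

/*
 * Take the lock once and claim up to BATCH_SIZE items starting at *cursor.
 * Items that look busy or are already claimed are skipped, not waited on.
 */
static size_t fill_batch(struct table *t, size_t *cursor, struct item **batch)
{
	size_t n = 0;

	pthread_mutex_lock(&t->lock);
	t->lock_acquisitions++;
	while (*cursor < NITEMS && n < BATCH_SIZE) {
		struct item *it = &t->items[(*cursor)++];

		if (it->flags & (ITEM_BUSY | ITEM_CLAIMED))
			continue;
		it->flags |= ITEM_CLAIMED;
		batch[n++] = it;
	}
	pthread_mutex_unlock(&t->lock);
	return n;
}

/* Process claimed items with the lock dropped; returns the number processed. */
int reclaim_all(struct table *t)
{
	struct item *batch[BATCH_SIZE];
	size_t cursor = 0, n, i;
	int reclaimed = 0;

	assert(BATCH_SIZE > 0);
	do {
		n = fill_batch(t, &cursor, batch);
		for (i = 0; i < n; i++) {
			/* only the claimer touches a claimed item */
			batch[i]->flags |= ITEM_DONE;
			reclaimed++;
		}
	} while (n == BATCH_SIZE);	/* a full batch means there may be more */
	return reclaimed;
}
```

With ten items of which two are marked busy, reclaim_all() in this sketch
takes the lock three times (two full batches plus one empty pass that ends
the walk) instead of once per item, and the per-item work runs with the lock
dropped.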

Index: source/fs/xfs/linux-2.6/xfs_sync.c
===================================================================
--- source.orig/fs/xfs/linux-2.6/xfs_sync.c	2010-07-26 21:12:11.000000000 +1000
+++ source/fs/xfs/linux-2.6/xfs_sync.c	2010-07-26 21:58:59.000000000 +1000
@@ -87,6 +87,91 @@ xfs_inode_ag_lookup(
 	return ip;
 }
 
+#define RECLAIM_BATCH_SIZE	32
+STATIC int
+xfs_inode_ag_walk_reclaim(
+	struct xfs_mount	*mp,
+	struct xfs_perag	*pag,
+	int			(*execute)(struct xfs_inode *ip,
+					   struct xfs_perag *pag, int flags),
+	int			flags,
+	int			tag,
+	int			exclusive,
+	int			*nr_to_scan)
+{
+	uint32_t		first_index;
+	int			last_error = 0;
+	int			skipped;
+	xfs_inode_t		*batch[RECLAIM_BATCH_SIZE];
+	int			batchnr;
+	int			i;
+
+	BUG_ON(!exclusive);
+
+restart:
+	skipped = 0;
+	first_index = 0;
+next_batch:
+	batchnr = 0;
+	/* fill the batch */
+	write_lock(&pag->pag_ici_lock);
+	do {
+		xfs_inode_t	*ip;
+
+		ip = xfs_inode_ag_lookup(mp, pag, &first_index, tag);
+		if (!ip)
+			break;
+		if (!(flags & SYNC_WAIT) &&
+				(!xfs_iflock_free(ip) ||
+				__xfs_iflags_test(ip, XFS_IRECLAIM)))
+			continue;
+
+		/*
+		 * The radix tree lock here protects a thread in xfs_iget from
+		 * racing with us starting reclaim on the inode.  Once we have
+		 * the XFS_IRECLAIM flag set it will not touch us.
+		 */
+		spin_lock(&ip->i_flags_lock);
+		ASSERT_ALWAYS(__xfs_iflags_test(ip, XFS_IRECLAIMABLE));
+		if (__xfs_iflags_test(ip, XFS_IRECLAIM)) {
+			/* ignore as it is already under reclaim */
+			spin_unlock(&ip->i_flags_lock);
+			continue;
+		}
+		__xfs_iflags_set(ip, XFS_IRECLAIM);
+		spin_unlock(&ip->i_flags_lock);
+
+		batch[batchnr++] = ip;
+	} while ((*nr_to_scan)-- && batchnr < RECLAIM_BATCH_SIZE);
+	write_unlock(&pag->pag_ici_lock);
+
+	for (i = 0; i < batchnr; i++) {
+		int		error = 0;
+		xfs_inode_t	*ip = batch[i];
+
+		/* execute doesn't require pag->pag_ici_lock */
+		error = execute(ip, pag, flags);
+		if (error == EAGAIN) {
+			skipped++;
+			continue;
+		}
+		if (error)
+			last_error = error;
+
+		/* bail out if the filesystem is corrupted.  */
+		if (error == EFSCORRUPTED)
+			break;
+	}
+	if (batchnr == RECLAIM_BATCH_SIZE)
+		goto next_batch;
+
+	if (0 && skipped) {
+		delay(1);
+		goto restart;
+	}
+	return last_error;
+}
+
 STATIC int
 xfs_inode_ag_walk(
 	struct xfs_mount	*mp,
@@ -113,6 +198,7 @@ restart:
 			write_lock(&pag->pag_ici_lock);
 		else
 			read_lock(&pag->pag_ici_lock);
+
 		ip = xfs_inode_ag_lookup(mp, pag, &first_index, tag);
 		if (!ip) {
 			if (exclusive)
@@ -198,8 +284,12 @@ xfs_inode_ag_iterator(
 	nr = nr_to_scan ? *nr_to_scan : INT_MAX;
 	ag = 0;
 	while ((pag = xfs_inode_ag_iter_next_pag(mp, &ag, tag))) {
-		error = xfs_inode_ag_walk(mp, pag, execute, flags, tag,
-						exclusive, &nr);
+		if (tag == XFS_ICI_RECLAIM_TAG)
+			error = xfs_inode_ag_walk_reclaim(mp, pag, execute,
+						flags, tag, exclusive, &nr);
+		else
+			error = xfs_inode_ag_walk(mp, pag, execute,
+						flags, tag, exclusive, &nr);
 		xfs_perag_put(pag);
 		if (error) {
 			last_error = error;
@@ -789,23 +879,6 @@ xfs_reclaim_inode(
 {
 	int	error = 0;
 
-	/*
-	 * The radix tree lock here protects a thread in xfs_iget from racing
-	 * with us starting reclaim on the inode.  Once we have the
-	 * XFS_IRECLAIM flag set it will not touch us.
-	 */
-	spin_lock(&ip->i_flags_lock);
-	ASSERT_ALWAYS(__xfs_iflags_test(ip, XFS_IRECLAIMABLE));
-	if (__xfs_iflags_test(ip, XFS_IRECLAIM)) {
-		/* ignore as it is already under reclaim */
-		spin_unlock(&ip->i_flags_lock);
-		write_unlock(&pag->pag_ici_lock);
-		return 0;
-	}
-	__xfs_iflags_set(ip, XFS_IRECLAIM);
-	spin_unlock(&ip->i_flags_lock);
-	write_unlock(&pag->pag_ici_lock);
-
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	if (!xfs_iflock_nowait(ip)) {
 		if (!(sync_mode & SYNC_WAIT))
Index: source/fs/xfs/xfs_inode.h
===================================================================
--- source.orig/fs/xfs/xfs_inode.h	2010-07-26 21:10:33.000000000 +1000
+++ source/fs/xfs/xfs_inode.h	2010-07-26 21:11:59.000000000 +1000
@@ -349,6 +349,11 @@ static inline int xfs_iflock_nowait(xfs_
 	return try_wait_for_completion(&ip->i_flush);
 }
 
+static inline int xfs_iflock_free(xfs_inode_t *ip)
+{
+	return completion_done(&ip->i_flush);
+}
+
 static inline void xfs_ifunlock(xfs_inode_t *ip)
 {
 	complete(&ip->i_flush);
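
For reference, the difference between the new xfs_iflock_free() helper and
the existing xfs_iflock_nowait() can be modelled in userspace roughly as
follows. This is a single-threaded sketch under stated assumptions, not the
kernel implementation: struct fake_completion and the fake_* functions are
hypothetical stand-ins for struct completion, completion_done(),
try_wait_for_completion() and complete(), and the spinlock the kernel takes
around the counter is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's struct completion. */
struct fake_completion {
	unsigned int done;
};

/*
 * Peek only: reports whether a wait would succeed right now, without
 * consuming anything. This models what xfs_iflock_free() relies on.
 */
static inline bool fake_completion_done(const struct fake_completion *c)
{
	return c->done != 0;
}

/*
 * Try-acquire: consumes the completion if it is available. This models
 * what xfs_iflock_nowait() relies on.
 */
static inline bool fake_try_wait(struct fake_completion *c)
{
	if (c->done == 0)
		return false;
	c->done--;
	return true;
}

/* Release: models xfs_ifunlock(). */
static inline void fake_complete(struct fake_completion *c)
{
	c->done++;
}
```

Because the peek takes nothing, its answer can be stale by the time the
caller acts on it. In the patch it only serves as a cheap hint for skipping
inodes that look busy; the flush lock itself is still taken later with
xfs_iflock_nowait() in xfs_reclaim_inode().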

