public inbox for linux-xfs@vger.kernel.org
* [PATCH] [RFC] xfs: lookaside cache for xfs_buf_find
@ 2013-09-09  1:33 Dave Chinner
  2013-09-09 15:17 ` Mark Tinguely
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Dave Chinner @ 2013-09-09  1:33 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

CPU overhead of buffer lookups dominates most metadata intensive
workloads. The thing is, most such workloads are hitting a
relatively small number of buffers repeatedly, and so caching
recently hit buffers is a good idea.

Add a hashed lookaside buffer that records the recent buffer
lookup successes and is searched first before doing a rb-tree
lookup. If we get a hit, we avoid the expensive rbtree lookup and
greatly reduce the overhead of the lookup. If we get a cache miss,
then we've added an extra CPU cacheline miss into the lookup.

In cold cache lookup cases, this extra cache line miss is irrelevant
as we need to read or allocate the buffer anyway, and the setup time
for that dwarfs the cost of the miss.

In the case that we miss the lookaside cache and find the buffer in
the rbtree, the cache line miss overhead will be noticeable only if
we don't see any lookaside cache hits at all in subsequent
lookups. We don't tend to do random cache walks in performance
critical paths, so the net result is that the extra CPU cacheline
miss will be lost in the reduction of misses due to cache hits. This
hit/miss case is what we'll see with file removal operations.

A simple prime number hash was chosen for the cache (i.e. modulo 37)
because it is fast, simple, and works really well with block numbers
that tend to be aligned to a multiple of 8. No attempt to optimise
this has been made - it's just a number I picked out of thin air
given that most repetitive workloads have a working set of buffers
that is significantly smaller than 37 per AG and should hold most of
the AG header buffers permanently in the lookaside cache.

The result is that on a typical concurrent create fsmark benchmark I
run, the profile of CPU usage went from having _xfs_buf_find() as
the number one CPU consumer:

   6.55%  [kernel]  [k] _xfs_buf_find
   4.94%  [kernel]  [k] xfs_dir3_free_hdr_from_disk
   4.77%  [kernel]  [k] native_read_tsc
   4.67%  [kernel]  [k] __ticket_spin_trylock

to this, at about #8 and #30 in the profile:

   2.56%  [kernel]  [k] _xfs_buf_find
....
   0.55%  [kernel]  [k] _xfs_buf_find_lookaside

So the lookaside cache has halved the CPU overhead of looking up
buffers for this workload.

On a buffer hit/miss workload like the followup concurrent removes,
_xfs_buf_find() went from #1 in the profile again at:

   9.13%  [kernel]  [k] _xfs_buf_find

to #6 and #23 respectively:

   2.82%  [kernel]  [k] _xfs_buf_find
....
   0.78%  [kernel]  [k] _xfs_buf_find_lookaside

Which is also a significant reduction in CPU overhead for buffer
lookups, and shows the benefit on mixed cold/hot cache lookup
workloads.

Performance differential, as measured with -m crc=1,finobt=1:

		   create		remove
		time    rate		time
xfsdev		4m16s	221k/s		6m18s
patched		3m59s	236k/s		5m56s

So less CPU time spent on lookups translates directly to better
metadata performance.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_buf.c   | 51 +++++++++++++++++++++++++++++++++++++++++++++++++--
 fs/xfs/xfs_buf.h   |  6 ++++++
 fs/xfs/xfs_mount.h |  1 +
 3 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index ee85f29..e382b80 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -470,6 +470,46 @@ _xfs_buf_map_pages(
  */
 
 /*
+ * Lookaside cache check
+ */
+STATIC struct xfs_buf *
+_xfs_buf_find_lookaside(
+	struct xfs_perag	*pag,
+	xfs_daddr_t		bn,
+	int			numblks)
+{
+	struct xfs_buf	*bp;
+
+	ASSERT(spin_is_locked(&pag->pag_buf_lock));
+	bp = pag->pag_buf_cache[XBF_HASH(bn)];
+	if (!bp)
+		return NULL;
+	if (bp->b_bn != bn || bp->b_length != numblks)
+		return NULL;
+	return bp;
+}
+
+static __inline__ void
+_xfs_buf_find_lookaside_insert(
+	struct xfs_perag	*pag,
+	struct xfs_buf		*bp,
+	xfs_daddr_t		bn)
+{
+	ASSERT(spin_is_locked(&pag->pag_buf_lock));
+	pag->pag_buf_cache[XBF_HASH(bn)] = bp;
+}
+
+static __inline__ void
+_xfs_buf_find_lookaside_remove(
+	struct xfs_perag	*pag,
+	struct xfs_buf		*bp)
+{
+	ASSERT(spin_is_locked(&pag->pag_buf_lock));
+	if (pag->pag_buf_cache[XBF_HASH(bp->b_bn)] == bp)
+		pag->pag_buf_cache[XBF_HASH(bp->b_bn)] = NULL;
+}
+
+/*
  *	Look up, and creates if absent, a lockable buffer for
  *	a given range of an inode.  The buffer is returned
  *	locked.	No I/O is implied by this call.
@@ -522,8 +562,12 @@ _xfs_buf_find(
 	pag = xfs_perag_get(btp->bt_mount,
 				xfs_daddr_to_agno(btp->bt_mount, blkno));
 
-	/* walk tree */
+	/* First check the lookaside cache for a hit, otherwise walk the tree */
 	spin_lock(&pag->pag_buf_lock);
+	bp = _xfs_buf_find_lookaside(pag, blkno, numblks);
+	if (bp)
+		goto found;
+
 	rbp = &pag->pag_buf_tree.rb_node;
 	parent = NULL;
 	bp = NULL;
@@ -549,7 +593,7 @@ _xfs_buf_find(
 				rbp = &(*rbp)->rb_right;
 				continue;
 			}
-			atomic_inc(&bp->b_hold);
+			_xfs_buf_find_lookaside_insert(pag, bp, blkno);
 			goto found;
 		}
 	}
@@ -560,6 +604,7 @@ _xfs_buf_find(
 		rb_insert_color(&new_bp->b_rbnode, &pag->pag_buf_tree);
 		/* the buffer keeps the perag reference until it is freed */
 		new_bp->b_pag = pag;
+		_xfs_buf_find_lookaside_insert(pag, bp, blkno);
 		spin_unlock(&pag->pag_buf_lock);
 	} else {
 		XFS_STATS_INC(xb_miss_locked);
@@ -569,6 +614,7 @@ _xfs_buf_find(
 	return new_bp;
 
 found:
+	atomic_inc(&bp->b_hold);
 	spin_unlock(&pag->pag_buf_lock);
 	xfs_perag_put(pag);
 
@@ -924,6 +970,7 @@ xfs_buf_rele(
 		} else {
 			xfs_buf_lru_del(bp);
 			ASSERT(!(bp->b_flags & _XBF_DELWRI_Q));
+			_xfs_buf_find_lookaside_remove(pag, bp);
 			rb_erase(&bp->b_rbnode, &pag->pag_buf_tree);
 			spin_unlock(&pag->pag_buf_lock);
 			xfs_perag_put(pag);
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 433a12e..658f746 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -166,6 +166,12 @@ typedef struct xfs_buf {
 #endif
 } xfs_buf_t;
 
+/*
+ * lookaside cache definitions
+ */
+#define XBF_HASH_SIZE		37
+#define XBF_HASH(bn)		(bn % XBF_HASH_SIZE)
+
 /* Finding and Reading Buffers */
 struct xfs_buf *_xfs_buf_find(struct xfs_buftarg *target,
 			      struct xfs_buf_map *map, int nmaps,
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 1fa0584..b8bfabc 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -366,6 +366,7 @@ typedef struct xfs_perag {
 	/* buffer cache index */
 	spinlock_t	pag_buf_lock;	/* lock for pag_buf_tree */
 	struct rb_root	pag_buf_tree;	/* ordered tree of active buffers */
+	struct xfs_buf	*pag_buf_cache[XBF_HASH_SIZE];
 
 	/* for rcu-safe freeing */
 	struct rcu_head	rcu_head;
-- 
1.8.3.2

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] [RFC] xfs: lookaside cache for xfs_buf_find
  2013-09-09  1:33 [PATCH] [RFC] xfs: lookaside cache for xfs_buf_find Dave Chinner
@ 2013-09-09 15:17 ` Mark Tinguely
  2013-09-09 15:39   ` Dave Chinner
  2013-09-18 21:48 ` Mark Tinguely
  2013-09-23 14:17 ` Mark Tinguely
  2 siblings, 1 reply; 10+ messages in thread
From: Mark Tinguely @ 2013-09-09 15:17 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 09/08/13 20:33, Dave Chinner wrote:
> From: Dave Chinner<dchinner@redhat.com>
>
> CPU overhead of buffer lookups dominate most metadata intensive
> workloads. The thing is, most such workloads are hitting a
> relatively small number of buffers repeatedly, and so caching
> recently hit buffers is a good idea.
>
> Add a hashed lookaside buffer that records the recent buffer
> lookup successes and is searched first before doing a rb-tree
> lookup. If we get a hit, we avoid the expensive rbtree lookup and
> greatly reduce the overhead of the lookup. If we get a cache miss,
> then we've added an extra CPU cacheline miss into the lookup.

Interesting. The last allocated xfs_buf is placed into the hash.

Might be interesting to know the hit-miss ratio on a real workload.

--Mark.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] [RFC] xfs: lookaside cache for xfs_buf_find
  2013-09-09 15:17 ` Mark Tinguely
@ 2013-09-09 15:39   ` Dave Chinner
  0 siblings, 0 replies; 10+ messages in thread
From: Dave Chinner @ 2013-09-09 15:39 UTC (permalink / raw)
  To: Mark Tinguely; +Cc: xfs

On Mon, Sep 09, 2013 at 10:17:24AM -0500, Mark Tinguely wrote:
> On 09/08/13 20:33, Dave Chinner wrote:
> >From: Dave Chinner<dchinner@redhat.com>
> >
> >CPU overhead of buffer lookups dominate most metadata intensive
> >workloads. The thing is, most such workloads are hitting a
> >relatively small number of buffers repeatedly, and so caching
> >recently hit buffers is a good idea.
> >
> >Add a hashed lookaside buffer that records the recent buffer
> >lookup successes and is searched first before doing a rb-tree
> >lookup. If we get a hit, we avoid the expensive rbtree lookup and
> >greatly reduce the overhead of the lookup. If we get a cache miss,
> >then we've added an extra CPU cacheline miss into the lookup.
> 
> Interesting. The last allocated xfs_buf is placed into the hash.
> 
> Might be interesting to know the hit-miss ratio on a real workload.

Of course. Didn't you notice that data in the next patch I sent? ;)

Anyway, here's one I prepared earlier.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


[RFC v2] xfs: lookaside cache for xfs_buf_find

From: Dave Chinner <dchinner@redhat.com>

CPU overhead of buffer lookups dominates most metadata intensive
workloads. The thing is, most such workloads are hitting a
relatively small number of buffers repeatedly, and so caching
recently hit buffers is a good idea.

Add a hashed lookaside buffer that records the recent buffer
lookup successes and is searched first before doing a rb-tree
lookup. If we get a hit, we avoid the expensive rbtree lookup and
greatly reduce the overhead of the lookup. If we get a cache miss,
then we've added an extra CPU cacheline miss into the lookup.

In cold cache lookup cases, this extra cache line miss is irrelevant
as we need to read or allocate the buffer anyway, and the setup time
for that dwarfs the cost of the miss.

In the case that we miss the lookaside cache and find the buffer in
the rbtree, the cache line miss overhead will be noticeable only if
we don't see any lookaside cache hits at all in subsequent
lookups. We don't tend to do random cache walks in performance
critical paths, so the net result is that the extra CPU cacheline
miss will be lost in the reduction of misses due to cache hits. This
hit/miss case is what we'll see with file removal operations.

A simple prime number hash was chosen for the cache (i.e. modulo 37)
because it is fast, simple, and works really well with block numbers
that tend to be aligned to a multiple of 8. No attempt to optimise
this has been made - it's just a number I picked out of thin air
given that most repetitive workloads have a working set of buffers
that is significantly smaller than 37 per AG and should hold most of
the AG header buffers permanently in the lookaside cache.

The result is that on a typical concurrent create fsmark benchmark I
run, the profile of CPU usage went from having _xfs_buf_find() as
the number one CPU consumer:

   6.55%  [kernel]  [k] _xfs_buf_find
   4.94%  [kernel]  [k] xfs_dir3_free_hdr_from_disk
   4.77%  [kernel]  [k] native_read_tsc
   4.67%  [kernel]  [k] __ticket_spin_trylock

to this, at about #8 and #30 in the profile:

   2.56%  [kernel]  [k] _xfs_buf_find
....
   0.55%  [kernel]  [k] _xfs_buf_find_lookaside

So the lookaside cache has halved the CPU overhead of looking up
buffers for this workload.

On a buffer hit/miss workload like the followup concurrent removes,
_xfs_buf_find() went from #1 in the profile again at:

   9.13%  [kernel]  [k] _xfs_buf_find

to #6 and #23 respectively:

   2.82%  [kernel]  [k] _xfs_buf_find
....
   0.78%  [kernel]  [k] _xfs_buf_find_lookaside

IOWs, on most workloads - even read only workloads - there is still
a significant benefit to this simple lookaside cache.

More analysis, using -m crc=1,finobt=1:

Lookaside cache hit rates vs size:

		    cache size
		7	37	73
create		0.4	0.76	0.88
bulkstat	0.64	0.88	0.93
find+stat	0.45	0.68	0.73
ls -R		0.06	0.10	0.14
unlink		0.43	0.75	0.89

As expected, the larger the cache, the higher the hit rate.

Wall time vs cache size:

			cache size
		none	7	37	73
create		256s	240s	235s	238s
bulkstat	153s	154s	163s	161s
find+stat	210s	204s	205s	206s
ls -R		 31s	 31s	 31s	32s
unlink		381s	362s	356s	360s

Higher hit rates don't necessarily translate into higher
performance, however.

System time vs cache size:

			cache size
		none	7	37	73
create		2851s	2681s	2637s	2607s
bulkstat	2357s	2342s	2497s	2473s
find+stat	1743s	1704s	1681s	1667s
ls -R		 22s	  21s	  20s	  21s
unlink		3409s	3313s	3216s	3168s

All looks as expected here given the cache hit rate numbers, except
for the bulkstat numbers. I'm not sure why the system time goes up
given that the cache hit rates are so high - perf shows the CPU time
in xfs_buf_find() going down....

So, performance is best at a cache size of 37 entries, though cache
hit rates and system time are better with 73 entries. It's not
immediately obvious why, but it tends to indicate that the initial
swag at a cache of 37 entries gives a pretty good tradeoff between
size, CPU usage reduction and overall performance improvement.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_buf.c   | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++----
 fs/xfs/xfs_buf.h   |  6 +++++
 fs/xfs/xfs_mount.h |  4 +++-
 fs/xfs/xfs_stats.h |  8 ++++++-
 4 files changed, 79 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index ee85f29..bcceb81 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -470,6 +470,49 @@ _xfs_buf_map_pages(
  */
 
 /*
+ * Lookaside cache check
+ */
+STATIC struct xfs_buf *
+_xfs_buf_find_lookaside(
+	struct xfs_perag	*pag,
+	xfs_daddr_t		bn,
+	int			numblks)
+{
+	struct xfs_buf	*bp;
+
+	ASSERT(spin_is_locked(&pag->pag_buf_lock));
+	bp = pag->pag_buf_cache[XBF_HASH(bn)];
+	if (!bp)
+		return NULL;
+	if (bp->b_bn != bn || bp->b_length != numblks)
+		return NULL;
+	return bp;
+}
+
+static __inline__ void
+_xfs_buf_find_lookaside_insert(
+	struct xfs_perag	*pag,
+	struct xfs_buf		*bp,
+	xfs_daddr_t		bn)
+{
+	ASSERT(spin_is_locked(&pag->pag_buf_lock));
+	XFS_STATS_INC(xb_lookaside_insert);
+	pag->pag_buf_cache[XBF_HASH(bn)] = bp;
+}
+
+static __inline__ void
+_xfs_buf_find_lookaside_remove(
+	struct xfs_perag	*pag,
+	struct xfs_buf		*bp)
+{
+	ASSERT(spin_is_locked(&pag->pag_buf_lock));
+	if (pag->pag_buf_cache[XBF_HASH(bp->b_bn)] == bp) {
+		XFS_STATS_INC(xb_lookaside_remove);
+		pag->pag_buf_cache[XBF_HASH(bp->b_bn)] = NULL;
+	}
+}
+
+/*
  *	Look up, and creates if absent, a lockable buffer for
  *	a given range of an inode.  The buffer is returned
  *	locked.	No I/O is implied by this call.
@@ -492,6 +535,13 @@ _xfs_buf_find(
 	int			numblks = 0;
 	int			i;
 
+	/* get tree root */
+	pag = xfs_perag_get(btp->bt_mount,
+				xfs_daddr_to_agno(btp->bt_mount, blkno));
+	prefetch(&pag->pag_buf_lock);
+	prefetch(&pag->pag_buf_cache[XBF_HASH(blkno)]);
+	prefetch(&pag->pag_buf_tree);
+
 	for (i = 0; i < nmaps; i++)
 		numblks += map[i].bm_len;
 	numbytes = BBTOB(numblks);
@@ -515,15 +565,20 @@ _xfs_buf_find(
 			  "%s: Block out of range: block 0x%llx, EOFS 0x%llx ",
 			  __func__, blkno, eofs);
 		WARN_ON(1);
+		xfs_perag_put(pag);
 		return NULL;
 	}
 
-	/* get tree root */
-	pag = xfs_perag_get(btp->bt_mount,
-				xfs_daddr_to_agno(btp->bt_mount, blkno));
 
-	/* walk tree */
+	/* First check the lookaside cache for a hit, otherwise walk the tree */
 	spin_lock(&pag->pag_buf_lock);
+	bp = _xfs_buf_find_lookaside(pag, blkno, numblks);
+	if (bp) {
+		XFS_STATS_INC(xb_lookaside_hit);
+		goto found;
+	}
+
+	XFS_STATS_INC(xb_lookaside_miss);
 	rbp = &pag->pag_buf_tree.rb_node;
 	parent = NULL;
 	bp = NULL;
@@ -549,7 +604,7 @@ _xfs_buf_find(
 				rbp = &(*rbp)->rb_right;
 				continue;
 			}
-			atomic_inc(&bp->b_hold);
+			_xfs_buf_find_lookaside_insert(pag, bp, blkno);
 			goto found;
 		}
 	}
@@ -560,6 +615,7 @@ _xfs_buf_find(
 		rb_insert_color(&new_bp->b_rbnode, &pag->pag_buf_tree);
 		/* the buffer keeps the perag reference until it is freed */
 		new_bp->b_pag = pag;
+		_xfs_buf_find_lookaside_insert(pag, bp, blkno);
 		spin_unlock(&pag->pag_buf_lock);
 	} else {
 		XFS_STATS_INC(xb_miss_locked);
@@ -569,6 +625,7 @@ _xfs_buf_find(
 	return new_bp;
 
 found:
+	atomic_inc(&bp->b_hold);
 	spin_unlock(&pag->pag_buf_lock);
 	xfs_perag_put(pag);
 
@@ -924,6 +981,7 @@ xfs_buf_rele(
 		} else {
 			xfs_buf_lru_del(bp);
 			ASSERT(!(bp->b_flags & _XBF_DELWRI_Q));
+			_xfs_buf_find_lookaside_remove(pag, bp);
 			rb_erase(&bp->b_rbnode, &pag->pag_buf_tree);
 			spin_unlock(&pag->pag_buf_lock);
 			xfs_perag_put(pag);
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 433a12e..658f746 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -166,6 +166,12 @@ typedef struct xfs_buf {
 #endif
 } xfs_buf_t;
 
+/*
+ * lookaside cache definitions
+ */
+#define XBF_HASH_SIZE		37
+#define XBF_HASH(bn)		(bn % XBF_HASH_SIZE)
+
 /* Finding and Reading Buffers */
 struct xfs_buf *_xfs_buf_find(struct xfs_buftarg *target,
 			      struct xfs_buf_map *map, int nmaps,
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 1fa0584..2a997dc 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -364,7 +364,9 @@ typedef struct xfs_perag {
 	unsigned long	pag_ici_reclaim_cursor;	/* reclaim restart point */
 
 	/* buffer cache index */
-	spinlock_t	pag_buf_lock;	/* lock for pag_buf_tree */
+	spinlock_t	pag_buf_lock ____cacheline_aligned_in_smp;
+					/* lock for pag_buf_tree */
+	struct xfs_buf	*pag_buf_cache[XBF_HASH_SIZE];
 	struct rb_root	pag_buf_tree;	/* ordered tree of active buffers */
 
 	/* for rcu-safe freeing */
diff --git a/fs/xfs/xfs_stats.h b/fs/xfs/xfs_stats.h
index c8f238b..c1f98cb 100644
--- a/fs/xfs/xfs_stats.h
+++ b/fs/xfs/xfs_stats.h
@@ -108,7 +108,7 @@ struct xfsstats {
 	__uint32_t		vn_reclaim;	/* # times vn_reclaim called */
 	__uint32_t		vn_remove;	/* # times vn_remove called */
 	__uint32_t		vn_free;	/* # times vn_free called */
-#define XFSSTAT_END_BUF			(XFSSTAT_END_VNODE_OPS+9)
+#define XFSSTAT_END_BUF			(XFSSTAT_END_VNODE_OPS+13)
 	__uint32_t		xb_get;
 	__uint32_t		xb_create;
 	__uint32_t		xb_get_locked;
@@ -118,6 +118,12 @@ struct xfsstats {
 	__uint32_t		xb_page_retries;
 	__uint32_t		xb_page_found;
 	__uint32_t		xb_get_read;
+	/* XXX: can we extend like this? */
+	__uint32_t		xb_lookaside_hit;
+	__uint32_t		xb_lookaside_miss;
+	__uint32_t		xb_lookaside_insert;
+	__uint32_t		xb_lookaside_remove;
+
 /* Version 2 btree counters */
 #define XFSSTAT_END_ABTB_V2		(XFSSTAT_END_BUF+15)
 	__uint32_t		xs_abtb_2_lookup;

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] [RFC] xfs: lookaside cache for xfs_buf_find
  2013-09-09  1:33 [PATCH] [RFC] xfs: lookaside cache for xfs_buf_find Dave Chinner
  2013-09-09 15:17 ` Mark Tinguely
@ 2013-09-18 21:48 ` Mark Tinguely
  2013-09-18 23:24   ` Dave Chinner
  2013-09-23 14:17 ` Mark Tinguely
  2 siblings, 1 reply; 10+ messages in thread
From: Mark Tinguely @ 2013-09-18 21:48 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 09/08/13 20:33, Dave Chinner wrote:
> From: Dave Chinner<dchinner@redhat.com>
>
> CPU overhead of buffer lookups dominate most metadata intensive
> workloads. The thing is, most such workloads are hitting a
> relatively small number of buffers repeatedly, and so caching
> recently hit buffers is a good idea.
>
> Add a hashed lookaside buffer that records the recent buffer
> lookup successes and is searched first before doing a rb-tree
> lookup. If we get a hit, we avoid the expensive rbtree lookup and
> greatly reduce the overhead of the lookup. If we get a cache miss,
> then we've added an extra CPU cacheline miss into the lookup.
>
> In cold cache lookup cases, this extra cache line miss is irrelevant
> as we need to read or allocate the buffer anyway, and the setup time
> for that dwarfs the cost of the miss.
>
> In the case that we miss the lookaside cache and find the buffer in
> the rbtree, the cache line miss overhead will be noticeable only if
> we don't see any lookaside cache hits at all in subsequent
> lookups. We don't tend to do random cache walks in performance
> critical paths, so the net result is that the extra CPU cacheline
> miss will be lost in the reduction of misses due to cache hits. This
> hit/miss case is what we'll see with file removal operations.
>
> A simple prime number hash was chosen for the cache (i.e. modulo 37)
> because it is fast, simple, and works really well with block numbers
> that tend to be aligned to a multiple of 8. No attempt to optimise
> this has been made - it's just a number I picked out of thin air
> given that most repetitive workloads have a working set of buffers
> that is significantly smaller than 37 per AG and should hold most of
> the AG header buffers permanently in the lookaside cache.
>
> The result is that on a typical concurrent create fsmark benchmark I
> run, the profile of CPU usage went from having _xfs_buf_find() as
> the number one CPU consumer:
>
>     6.55%  [kernel]  [k] _xfs_buf_find
>     4.94%  [kernel]  [k] xfs_dir3_free_hdr_from_disk
>     4.77%  [kernel]  [k] native_read_tsc
>     4.67%  [kernel]  [k] __ticket_spin_trylock
>
> to this, at about #8 and #30 in the profile:
>
>     2.56%  [kernel]  [k] _xfs_buf_find
> ....
>     0.55%  [kernel]  [k] _xfs_buf_find_lookaside
>
> So the lookaside cache has halved the CPU overhead of looking up
> buffers for this workload.
>
> On a buffer hit/miss workload like the followup concurrent removes,
> _xfs_buf_find() went from #1 in the profile again at:
>
>     9.13%  [kernel]  [k] _xfs_buf_find
>
> to #6 and #23 respectively:
>
>     2.82%  [kernel]  [k] _xfs_buf_find
> ....
>     0.78%  [kernel]  [k] _xfs_buf_find_lookaside
>
> Which is also a significant reduction in CPU overhead for buffer
> lookups, and shows the benefit on mixed cold/hot cache lookup
> workloads.
>
> Performance differential, as measured with -m crc=1,finobt=1:
>
> 		   create		remove
> 		time    rate		time
> xfsdev		4m16s	221k/s		6m18s
> patched		3m59s	236k/s		5m56s
>
> So less CPU time spent on lookups translates directly to better
> metadata performance.
>
> Signed-off-by: Dave Chinner<dchinner@redhat.com>
> ---

Low cost, possible higher return. Idea looks good to me.

What happens in xfs_buf_get_map() when we lose the xfs_buf_find() race?
I don't see a removal of the losing lookaside entry inserted in the
xfs_buf_find().

I will let it run for a while.

--Mark.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] [RFC] xfs: lookaside cache for xfs_buf_find
  2013-09-18 21:48 ` Mark Tinguely
@ 2013-09-18 23:24   ` Dave Chinner
  2013-09-19 13:23     ` Mark Tinguely
  0 siblings, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2013-09-18 23:24 UTC (permalink / raw)
  To: Mark Tinguely; +Cc: xfs

On Wed, Sep 18, 2013 at 04:48:45PM -0500, Mark Tinguely wrote:
> On 09/08/13 20:33, Dave Chinner wrote:
> >From: Dave Chinner<dchinner@redhat.com>
> >
> >CPU overhead of buffer lookups dominate most metadata intensive
> >workloads. The thing is, most such workloads are hitting a
> >relatively small number of buffers repeatedly, and so caching
> >recently hit buffers is a good idea.
> >
> >Add a hashed lookaside buffer that records the recent buffer
> >lookup successes and is searched first before doing a rb-tree
> >lookup. If we get a hit, we avoid the expensive rbtree lookup and
> >greatly reduce the overhead of the lookup. If we get a cache miss,
> >then we've added an extra CPU cacheline miss into the lookup.
....
> 
> Low cost, possible higher return. Idea looks good to me.
> 
> What happens in xfs_buf_get_map() when we lose the xfs_buf_find() race?

What race is that?

> I don't see a removal of the losing lookaside entry inserted in the
> xfs_buf_find().

Why would we want to do removal of an entry if some other lookup
aliases to the same slot and doesn't match? If the buffer we are
looking up isn't in cache at all, then we've just removed something
that has had previous cache hits and is still in cache without
inserting anything in it's place. If the buffer is in the cache,
then we do an insert once we've found it. i.e. there is no need to
do removal on lookup miss...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] [RFC] xfs: lookaside cache for xfs_buf_find
  2013-09-18 23:24   ` Dave Chinner
@ 2013-09-19 13:23     ` Mark Tinguely
  0 siblings, 0 replies; 10+ messages in thread
From: Mark Tinguely @ 2013-09-19 13:23 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 09/18/13 18:24, Dave Chinner wrote:
> On Wed, Sep 18, 2013 at 04:48:45PM -0500, Mark Tinguely wrote:
>> On 09/08/13 20:33, Dave Chinner wrote:
>>> From: Dave Chinner<dchinner@redhat.com>
>>>
>>> CPU overhead of buffer lookups dominate most metadata intensive
>>> workloads. The thing is, most such workloads are hitting a
>>> relatively small number of buffers repeatedly, and so caching
>>> recently hit buffers is a good idea.
>>>
>>> Add a hashed lookaside buffer that records the recent buffer
>>> lookup successes and is searched first before doing a rb-tree
>>> lookup. If we get a hit, we avoid the expensive rbtree lookup and
>>> greatly reduce the overhead of the lookup. If we get a cache miss,
>>> then we've added an extra CPU cacheline miss into the lookup.
> ....
>>
>> Low cost, possible higher return. Idea looks good to me.
>>
>> What happens in xfs_buf_get_map() when we lose the xfs_buf_find() race?
>
> What race is that?

I was thinking of two overlapping callers to xfs_buf_find(), protected 
by the two calls to xfs_buf_find() in xfs_buf_get_map(). But my mistake 
was where the lookaside entry gets added. It is added correctly on the 
second call to xfs_buf_find(), which makes sure that another find did 
not beat this find. Yes, no entry needs to be removed.


--Mark.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] [RFC] xfs: lookaside cache for xfs_buf_find
  2013-09-09  1:33 [PATCH] [RFC] xfs: lookaside cache for xfs_buf_find Dave Chinner
  2013-09-09 15:17 ` Mark Tinguely
  2013-09-18 21:48 ` Mark Tinguely
@ 2013-09-23 14:17 ` Mark Tinguely
  2013-09-23 21:25   ` Michael L. Semon
  2013-09-24  0:48   ` Dave Chinner
  2 siblings, 2 replies; 10+ messages in thread
From: Mark Tinguely @ 2013-09-23 14:17 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 09/08/13 20:33, Dave Chinner wrote:
> From: Dave Chinner<dchinner@redhat.com>
>
> CPU overhead of buffer lookups dominate most metadata intensive
> workloads. The thing is, most such workloads are hitting a
> relatively small number of buffers repeatedly, and so caching
> recently hit buffers is a good idea.
>
...

I think this needs more testing.

I get the following panic in a loop test after a few (3-8) iterations:

  while true
   do
    tar zxpf xfs.tar
    cd xfs
    make
    make modules
    cd ..
    rm -r xfs
  done


BUG: unable to handle kernel paging request at ffff880831c1d218
IP: [<ffffffffa01886c8>] _xfs_buf_find_lookaside+0x98/0xb0 [xfs]
PGD 1c5d067 PUD 85ffe0067 PMD 85fe51067 PTE 8000000831c1d060
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: xfs(O) e1000e exportfs libcrc32c ext3 jbd [last unloaded: xfs]
CPU: 0 PID: 23423 Comm: tar Tainted: G           O 3.11.0-rc1+ #3
task: ffff880837f087a0 ti: ffff880831c46000 task.ti: ffff880831c46000
RIP: 0010:[<ffffffffa01886c8>]  [<ffffffffa01886c8>] _xfs_buf_find_lookaside+0x98/0xb0 [xfs]
RSP: 0018:ffff880831c47918  EFLAGS: 00010286
RAX: ffff880831c1d200 RBX: ffff8808372e0000 RCX: 0000000000000003
  RDX: 0000000000000011 RSI: 00000000000009c0 RDI: ffff8808372e0000
  RBP: ffff880831c47938 R08: ffff8808372e0000 R09: ffff8808376e8d80
  R10: 0000000000000010 R11: 00000000000009c0 R12: 00000000000009c0
  R13: 0000000000000010 R14: 0000000000000001 R15: 00000000000009c0
  FS:  00007fa4bc51f700(0000) GS:ffff88085bc00000(0000) 
knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffff880831c1d218 CR3: 000000082ed00000 CR4: 00000000000007f0
  Stack:
   ffff880831c47938 ffff880831c47aa8 0000000000000010 ffff880834ab7900
   ffff880831c479b8 ffffffffa018a679 ffff8808372e00c0 ffff88082eed01a0
   0000000000000029 ffff8808372e01f0 0000000000000000 000200015bfe1c68
  Call Trace:
   [<ffffffffa018a679>] _xfs_buf_find+0x159/0x520 [xfs]
   [<ffffffffa018aea0>] xfs_buf_get_map+0x30/0x130 [xfs]
   [<ffffffffa018afc6>] xfs_buf_read_map+0x26/0xa0 [xfs]
   [<ffffffffa01fbf5d>] xfs_trans_read_buf_map+0x16d/0x4c0 [xfs]
   [<ffffffffa01e784c>] xfs_imap_to_bp+0x6c/0x120 [xfs]
   [<ffffffffa01e7975>] xfs_iread+0x75/0x2f0 [xfs]
   [<ffffffff8114eafb>] ? inode_init_always+0xfb/0x1c0
   [<ffffffffa019311a>] xfs_iget_cache_miss+0x5a/0x1e0 [xfs]
   [<ffffffffa01933db>] xfs_iget+0x13b/0x1c0 [xfs]
   [<ffffffffa01dfaad>] xfs_ialloc+0xbd/0x860 [xfs]
   [<ffffffffa01e02e7>] xfs_dir_ialloc+0x97/0x2e0 [xfs]
   [<ffffffffa01a2308>] ? xfs_trans_reserve+0x308/0x310 [xfs]

I got the same panic running xfstest 319 with the patch at:
    http://oss.sgi.com/archives/xfs/2013-09/msg00578.html
once it hung on a xfs_buf lock before the panic.

And these are the only tests that I threw at this patch.

--Mark.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] [RFC] xfs: lookaside cache for xfs_buf_find
  2013-09-23 14:17 ` Mark Tinguely
@ 2013-09-23 21:25   ` Michael L. Semon
  2013-09-24  0:48   ` Dave Chinner
  1 sibling, 0 replies; 10+ messages in thread
From: Michael L. Semon @ 2013-09-23 21:25 UTC (permalink / raw)
  To: Mark Tinguely; +Cc: xfs

On 09/23/2013 10:17 AM, Mark Tinguely wrote:
> On 09/08/13 20:33, Dave Chinner wrote:
>> From: Dave Chinner<dchinner@redhat.com>
>>
>> CPU overhead of buffer lookups dominate most metadata intensive
>> workloads. The thing is, most such workloads are hitting a
>> relatively small number of buffers repeatedly, and so caching
>> recently hit buffers is a good idea.
>>
> ...
> 
> I think this needs more testing.
> 
> I get the following panic in a loop test after a few (3-8) iterations:
> 
>  while true
>   do
>    tar zxpf xfs.tar
>    cd xfs
>    make
>    make modules
>    cd ..
>    rm -r xfs
>  done
> 
> 
> BUG: unable to handle kernel paging request at ffff880831c1d218
> IP: [<ffffffffa01886c8>] _xfs_buf_find_lookaside+0x98/0xb0 [xfs]
> PGD 1c5d067 PUD 85ffe0067 PMD 85fe51067 PTE 8000000831c1d060
> Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
> Modules linked in: xfs(O) e1000e exportfs libcrc32c ext3 jbd [last unloaded: xfs]
> CPU: 0 PID: 23423 Comm: tar Tainted: G           O 3.11.0-rc1+ #3
> task: ffff880837f087a0 ti: ffff880831c46000 task.ti: ffff880831c46000
> RIP: 0010:[<ffffffffa01886c8>]  [<ffffffffa01886c8>] _xfs_buf_find_lookaside+0x98/0xb0 [xfs]
> RSP: 0018:ffff880831c47918  EFLAGS: 00010286
> RAX: ffff880831c1d200 RBX: ffff8808372e0000 RCX: 0000000000000003
> RDX: 0000000000000011 RSI: 00000000000009c0 RDI: ffff8808372e0000
> RBP: ffff880831c47938 R08: ffff8808372e0000 R09: ffff8808376e8d80
> R10: 0000000000000010 R11: 00000000000009c0 R12: 00000000000009c0
> R13: 0000000000000010 R14: 0000000000000001 R15: 00000000000009c0
> FS:  00007fa4bc51f700(0000) GS:ffff88085bc00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffff880831c1d218 CR3: 000000082ed00000 CR4: 00000000000007f0
> Stack:
>  ffff880831c47938 ffff880831c47aa8 0000000000000010 ffff880834ab7900
>  ffff880831c479b8 ffffffffa018a679 ffff8808372e00c0 ffff88082eed01a0
>  0000000000000029 ffff8808372e01f0 0000000000000000 000200015bfe1c68
> Call Trace:
>  [<ffffffffa018a679>] _xfs_buf_find+0x159/0x520 [xfs]
>  [<ffffffffa018aea0>] xfs_buf_get_map+0x30/0x130 [xfs]
>  [<ffffffffa018afc6>] xfs_buf_read_map+0x26/0xa0 [xfs]
>  [<ffffffffa01fbf5d>] xfs_trans_read_buf_map+0x16d/0x4c0 [xfs]
>  [<ffffffffa01e784c>] xfs_imap_to_bp+0x6c/0x120 [xfs]
>  [<ffffffffa01e7975>] xfs_iread+0x75/0x2f0 [xfs]
>  [<ffffffff8114eafb>] ? inode_init_always+0xfb/0x1c0
>  [<ffffffffa019311a>] xfs_iget_cache_miss+0x5a/0x1e0 [xfs]
>  [<ffffffffa01933db>] xfs_iget+0x13b/0x1c0 [xfs]
>  [<ffffffffa01dfaad>] xfs_ialloc+0xbd/0x860 [xfs]
>  [<ffffffffa01e02e7>] xfs_dir_ialloc+0x97/0x2e0 [xfs]
>  [<ffffffffa01a2308>] ? xfs_trans_reserve+0x308/0x310 [xfs]
> 
> I got the same panic running xfstest 319 with the patch at:
>    http://oss.sgi.com/archives/xfs/2013-09/msg00578.html
> once it hung on a xfs_buf lock before the panic.
> 
> And these are the only tests that I threw at this patch.
> 
> --Mark.

I got similar issues in full runs of xfstests, but I'm having 
severe setup problems here and also had to adjust the patch for 
32-bit x86.  Thanks for reproducing the problem on 64-bit Linux.

Michael




* Re: [PATCH] [RFC] xfs: lookaside cache for xfs_buf_find
  2013-09-23 14:17 ` Mark Tinguely
  2013-09-23 21:25   ` Michael L. Semon
@ 2013-09-24  0:48   ` Dave Chinner
  2013-09-24 17:41     ` Mark Tinguely
  1 sibling, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2013-09-24  0:48 UTC (permalink / raw)
  To: Mark Tinguely; +Cc: xfs

On Mon, Sep 23, 2013 at 09:17:35AM -0500, Mark Tinguely wrote:
> On 09/08/13 20:33, Dave Chinner wrote:
> >From: Dave Chinner<dchinner@redhat.com>
> >
> >CPU overhead of buffer lookups dominate most metadata intensive
> >workloads. The thing is, most such workloads are hitting a
> >relatively small number of buffers repeatedly, and so caching
> >recently hit buffers is a good idea.
> >
> ...
> 
> I think this needs more testing.

Yes, that's what an "RFC" implies. It's an idea, it's not
fully baked and it's not ready for inclusion - it's a proof of
concept that needs further work, and it's being posted for discussion
to determine if it's worth pursuing further.

Indeed, I haven't proposed it for inclusion yet because I'm
still finding problems caused by the patch - it's still just a
prototype at this point.

> I got the same panic running xfstest 319 with the patch at:
>    http://oss.sgi.com/archives/xfs/2013-09/msg00578.html
> once it hung on a xfs_buf lock before the panic.
> 
> And these are the only tests that I threw at this patch.

Sure. The version I have in my stack at the moment has some more
fixes in it, like handling of length mismatches due to stale buffers
on lookaside lookups, and other such things.

i.e. early feedback on prototype code is exactly what [RFC] patches
are for...
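The scheme under discussion, including the stale-buffer/length check
mentioned above, can be sketched in userspace C roughly as follows. All
names and structures here are illustrative only - this is not the actual
kernel patch, just a simplified model of the fast path it prototypes:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOOKASIDE_SIZE 37	/* small prime, per the patch description */

struct la_buf {
	uint64_t blkno;		/* first block of the buffer */
	int	 length;	/* length in blocks */
	int	 stale;		/* set when the buffer is invalidated */
};

struct la_cache {
	struct la_buf *slots[LOOKASIDE_SIZE];
};

static unsigned la_hash(uint64_t blkno)
{
	return (unsigned)(blkno % LOOKASIDE_SIZE);
}

/* Record a successful lookup so repeat hits skip the rb-tree walk. */
static void la_insert(struct la_cache *c, struct la_buf *bp)
{
	c->slots[la_hash(bp->blkno)] = bp;
}

/*
 * Fast-path lookup. A hit must match on block number AND length, and
 * must not be stale. On any mismatch, evict the entry and return NULL
 * so the caller falls back to the authoritative rb-tree lookup.
 */
static struct la_buf *la_lookup(struct la_cache *c, uint64_t blkno,
				int length)
{
	unsigned idx = la_hash(blkno);
	struct la_buf *bp = c->slots[idx];

	if (!bp)
		return NULL;
	if (bp->blkno != blkno || bp->length != length || bp->stale) {
		c->slots[idx] = NULL;	/* drop the stale/mismatched entry */
		return NULL;
	}
	return bp;
}
```

A miss here costs one extra cacheline touch before the rb-tree walk,
which is the trade-off the patch description argues is a net win for
repetitive workloads.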

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com



* Re: [PATCH] [RFC] xfs: lookaside cache for xfs_buf_find
  2013-09-24  0:48   ` Dave Chinner
@ 2013-09-24 17:41     ` Mark Tinguely
  0 siblings, 0 replies; 10+ messages in thread
From: Mark Tinguely @ 2013-09-24 17:41 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 09/23/13 19:48, Dave Chinner wrote:
> On Mon, Sep 23, 2013 at 09:17:35AM -0500, Mark Tinguely wrote:
>> On 09/08/13 20:33, Dave Chinner wrote:
>>> From: Dave Chinner<dchinner@redhat.com>
>>>
>>> CPU overhead of buffer lookups dominate most metadata intensive
>>> workloads. The thing is, most such workloads are hitting a
>>> relatively small number of buffers repeatedly, and so caching
>>> recently hit buffers is a good idea.
>>>
>> ...
>>
>> I think this needs more testing.
>
> Yes, that's what an "RFC" implies. It's an idea, it's not
> fully baked and it's not ready for inclusion - it's a proof of
> concept that needs further work, and it's being posted for discussion
> to determine if it's worth pursuing further.
>
> Indeed, I haven't proposed it for inclusion yet because I'm
> still finding problems caused by the patch - it's still just a
> prototype at this point.
>
>> I got the same panic running xfstest 319 with the patch at:
>>     http://oss.sgi.com/archives/xfs/2013-09/msg00578.html
>> once it hung on a xfs_buf lock before the panic.
>>
>> And these are the only tests that I threw at this patch.
>
> Sure. The version I have in my stack at the moment has some more
> fixes in it, like handling of length mismatches due to stale buffers
> on lookaside lookups, and other such things.
>
> i.e. early feedback on prototype code is exactly what [RFC] patches
> are for...

And early feedback is that it has potential but needs more work.

--Mark.



end of thread, other threads:[~2013-09-24 17:41 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-09-09  1:33 [PATCH] [RFC] xfs: lookaside cache for xfs_buf_find Dave Chinner
2013-09-09 15:17 ` Mark Tinguely
2013-09-09 15:39   ` Dave Chinner
2013-09-18 21:48 ` Mark Tinguely
2013-09-18 23:24   ` Dave Chinner
2013-09-19 13:23     ` Mark Tinguely
2013-09-23 14:17 ` Mark Tinguely
2013-09-23 21:25   ` Michael L. Semon
2013-09-24  0:48   ` Dave Chinner
2013-09-24 17:41     ` Mark Tinguely

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox