public inbox for linux-kernel@vger.kernel.org
* [patch] speed up single bio_vec allocation
@ 2006-12-04 19:27 Chen, Kenneth W
  2006-12-04 20:06 ` Jens Axboe
  2006-12-08  2:27 ` Andi Kleen
  0 siblings, 2 replies; 18+ messages in thread
From: Chen, Kenneth W @ 2006-12-04 19:27 UTC (permalink / raw)
  To: 'Jens Axboe'; +Cc: linux-kernel

On a 64-bit arch like x86_64, struct bio is 104 bytes.  Since the bio slab is
created with the SLAB_HWCACHE_ALIGN flag, there is usually spare memory
available at the end of the bio.  I think we can utilize that memory for
bio_vec allocation.  The purpose is not so much to save memory consumption
for bio_vec; instead, I'm attempting to optimize away a call to bvec_alloc_bs().

So here is a patch to do just that for a 1-segment bio_vec (we currently only
have space for 1 on 2.6.19).  Whether there is spare space available is
determined at compile time.  If there is no space available, there is no run
time cost at all because gcc simply optimizes away all the code added in this
patch.  If there is space available, the only run time check is on the size of
the iovec, and we do the appropriate assignment to bio->bi_io_vec etc.  The
cost is minimal and we gain a whole lot back from not calling the
bvec_alloc_bs() function.

I tried to use cache_line_size() to find out the alignment of struct bio, but
stumbled on the fact that it is a runtime function on x86_64.  So instead I
made bio hint to the slab allocator to align on a 32-byte boundary (the slab
will use the larger of the hw cache line size and the caller's "align" hint).
I think that is a sane number for the majority of CPUs out in the world.


Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>


--- ./fs/bio.c.orig	2006-12-03 17:20:36.000000000 -0800
+++ ./fs/bio.c	2006-12-03 21:29:20.000000000 -0800
@@ -29,11 +29,14 @@
 #include <scsi/sg.h>		/* for struct sg_iovec */
 
 #define BIO_POOL_SIZE 256
-
+#define BIO_ALIGN 32		/* minimal bio structure alignment */
 static kmem_cache_t *bio_slab __read_mostly;
 
 #define BIOVEC_NR_POOLS 6
 
+#define BIOVEC_FIT_INSIDE_BIO_CACHE_LINE				\
+	(ALIGN(sizeof(struct bio), BIO_ALIGN) ==			\
+	 ALIGN(sizeof(struct bio) + sizeof(struct bio_vec), BIO_ALIGN))
 /*
  * a small number of entries is fine, not going to be performance critical.
  * basically we just need to survive
@@ -113,7 +116,8 @@ void bio_free(struct bio *bio, struct bi
 
 	BIO_BUG_ON(pool_idx >= BIOVEC_NR_POOLS);
 
-	mempool_free(bio->bi_io_vec, bio_set->bvec_pools[pool_idx]);
+	if (!BIOVEC_FIT_INSIDE_BIO_CACHE_LINE || pool_idx)
+		mempool_free(bio->bi_io_vec, bio_set->bvec_pools[pool_idx]);
 	mempool_free(bio, bio_set->bio_pool);
 }
 
@@ -166,7 +170,15 @@ struct bio *bio_alloc_bioset(gfp_t gfp_m
 		struct bio_vec *bvl = NULL;
 
 		bio_init(bio);
-		if (likely(nr_iovecs)) {
+
+		/*
+		 * if bio_vec can fit into remaining cache line of struct
+		 * bio, go ahead use it and skip mempool allocation.
+		 */
+		if (nr_iovecs == 1 && BIOVEC_FIT_INSIDE_BIO_CACHE_LINE) {
+			bvl = (struct bio_vec *)(bio + 1);
+			bio->bi_max_vecs = 1;
+		} else if (likely(nr_iovecs)) {
 			unsigned long idx = 0; /* shut up gcc */
 
 			bvl = bvec_alloc_bs(gfp_mask, nr_iovecs, &idx, bs);
@@ -1204,7 +1216,7 @@ static void __init biovec_init_slabs(voi
 		struct biovec_slab *bvs = bvec_slabs + i;
 
 		size = bvs->nr_vecs * sizeof(struct bio_vec);
-		bvs->slab = kmem_cache_create(bvs->name, size, 0,
+		bvs->slab = kmem_cache_create(bvs->name, size, BIO_ALIGN,
                                 SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL, NULL);
 	}
 }

* RE: [patch] speed up single bio_vec allocation
@ 2006-12-08 22:14 Chen, Kenneth W
  2006-12-14 20:23 ` Jens Axboe
  0 siblings, 1 reply; 18+ messages in thread
From: Chen, Kenneth W @ 2006-12-08 22:14 UTC (permalink / raw)
  To: 'Jens Axboe'; +Cc: 'linux-kernel'

> Chen, Kenneth wrote on Wednesday, December 06, 2006 10:20 AM
> > Jens Axboe wrote on Wednesday, December 06, 2006 2:09 AM
> > This is what I had in mind, in case it wasn't completely clear. Not
> > tested, other than it compiles. Basically it eliminates the small
> > bio_vec pool, and grows the bio by 16-bytes on 64-bit archs, or by
> > 12-bytes on 32-bit archs instead and uses the room at the end for the
> > bio_vec structure.
> 
> Yeah, I had a very similar patch queued internally for the large benchmark
> measurement.  I will post the result as soon as I get it.


Jens, this improves throughput by 0.25% on our db transaction processing
benchmark setup.
The patch tested is (on top of 2.6.19):
http://marc.theaimsgroup.com/?l=linux-kernel&m=116539972229021&w=2

- Ken



Thread overview: 18+ messages
2006-12-04 19:27 [patch] speed up single bio_vec allocation Chen, Kenneth W
2006-12-04 20:06 ` Jens Axboe
2006-12-04 20:36   ` Chen, Kenneth W
2006-12-04 20:43     ` Jens Axboe
2006-12-06 10:08       ` Jens Axboe
2006-12-06 10:56         ` Jens Axboe
2006-12-06 18:19         ` Chen, Kenneth W
2006-12-07 19:22           ` Nate Diller
2006-12-07 19:36             ` Chen, Kenneth W
2006-12-07 21:46               ` Nate Diller
2006-12-07 21:52                 ` Chen, Kenneth W
2006-12-07 22:33                   ` Nate Diller
2006-12-08  8:01                     ` Jens Axboe
2006-12-08  2:27 ` Andi Kleen
2006-12-08  4:23   ` Chen, Kenneth W
2006-12-08  4:37     ` Andi Kleen
2006-12-08 22:14 Chen, Kenneth W
2006-12-14 20:23 ` Jens Axboe
