public inbox for linux-kernel@vger.kernel.org
* [PATCH] concurrent block allocation for ext3
@ 2003-03-10 15:41 Alex Tomas
  2003-03-10 16:25 ` Andreas Dilger
  2003-03-14 21:22 ` Martin J. Bligh
  0 siblings, 2 replies; 7+ messages in thread
From: Alex Tomas @ 2003-03-10 15:41 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton, Alex Tomas


Hi!

Here is a small patch that implements concurrent block allocation
for ext3. It removes lock_super() from ext3_new_block() and ext3_free_blocks();
modifications of the counters in the superblock and group descriptors are
protected by a spinlock instead. Tested on SMP for several hours.


--- linux/fs/ext3/balloc.c	Thu Feb 20 16:19:06 2003
+++ balloc.c	Mon Mar 10 16:00:49 2003
@@ -118,7 +118,6 @@
 		printk ("ext3_free_blocks: nonexistent device");
 		return;
 	}
-	lock_super (sb);
 	es = EXT3_SB(sb)->s_es;
 	if (block < le32_to_cpu(es->s_first_data_block) ||
 	    block + count < block ||
@@ -214,11 +213,13 @@
 				      block + i);
 			BUFFER_TRACE(bitmap_bh, "bit already cleared");
 		} else {
+			spin_lock(&EXT3_SB(sb)->s_alloc_lock);
 			dquot_freed_blocks++;
 			gdp->bg_free_blocks_count =
 			  cpu_to_le16(le16_to_cpu(gdp->bg_free_blocks_count)+1);
 			es->s_free_blocks_count =
 			  cpu_to_le32(le32_to_cpu(es->s_free_blocks_count)+1);
+			spin_unlock(&EXT3_SB(sb)->s_alloc_lock);
 		}
 		/* @@@ This prevents newly-allocated data from being
 		 * freed and then reallocated within the same
@@ -267,7 +268,6 @@
 error_return:
 	brelse(bitmap_bh);
 	ext3_std_error(sb, err);
-	unlock_super(sb);
 	if (dquot_freed_blocks)
 		DQUOT_FREE_BLOCK(inode, dquot_freed_blocks);
 	return;
@@ -408,7 +408,6 @@
 		return 0;
 	}
 
-	lock_super(sb);
 	es = EXT3_SB(sb)->s_es;
 	if (le32_to_cpu(es->s_free_blocks_count) <=
 			le32_to_cpu(es->s_r_blocks_count) &&
@@ -461,6 +460,7 @@
 
 	ext3_debug("Bit not found in block group %d.\n", group_no);
 
+repeat:
 	/*
 	 * Now search the rest of the groups.  We assume that 
 	 * i and gdp correctly point to the last group visited.
@@ -538,9 +538,9 @@
 
 	/* The superblock lock should guard against anybody else beating
 	 * us to this point! */
-	J_ASSERT_BH(bitmap_bh, !ext3_test_bit(ret_block, bitmap_bh->b_data));
 	BUFFER_TRACE(bitmap_bh, "setting bitmap bit");
-	ext3_set_bit(ret_block, bitmap_bh->b_data);
+	if (ext3_set_bit(ret_block, bitmap_bh->b_data)) 
+		goto repeat;
 	performed_allocation = 1;
 
 #ifdef CONFIG_JBD_DEBUG
@@ -586,11 +586,13 @@
 	ext3_debug("allocating block %d. Goal hits %d of %d.\n",
 			ret_block, goal_hits, goal_attempts);
 
+	spin_lock(&EXT3_SB(sb)->s_alloc_lock);
 	gdp->bg_free_blocks_count =
 			cpu_to_le16(le16_to_cpu(gdp->bg_free_blocks_count) - 1);
 	es->s_free_blocks_count =
 			cpu_to_le32(le32_to_cpu(es->s_free_blocks_count) - 1);
-
+	spin_unlock(&EXT3_SB(sb)->s_alloc_lock);
+	
 	BUFFER_TRACE(gdp_bh, "journal_dirty_metadata for group descriptor");
 	err = ext3_journal_dirty_metadata(handle, gdp_bh);
 	if (!fatal)
@@ -606,7 +608,6 @@
 	if (fatal)
 		goto out;
 
-	unlock_super(sb);
 	*errp = 0;
 	brelse(bitmap_bh);
 	return ret_block;
@@ -618,7 +619,6 @@
 		*errp = fatal;
 		ext3_std_error(sb, fatal);
 	}
-	unlock_super(sb);
 	/*
 	 * Undo the block allocation
 	 */


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] concurrent block allocation for ext3
  2003-03-10 15:41 [PATCH] concurrent block allocation for ext3 Alex Tomas
@ 2003-03-10 16:25 ` Andreas Dilger
  2003-03-10 16:33   ` Alex Tomas
  2003-03-10 16:43   ` Daniel Phillips
  2003-03-14 21:22 ` Martin J. Bligh
  1 sibling, 2 replies; 7+ messages in thread
From: Andreas Dilger @ 2003-03-10 16:25 UTC (permalink / raw)
  To: Alex Tomas; +Cc: linux-kernel, Andrew Morton

On Mar 10, 2003  18:41 +0300, Alex Tomas wrote:
> Here is a small patch that implements concurrent block allocation
> for ext3. It removes lock_super() from ext3_new_block() and ext3_free_blocks();
> modifications of the counters in the superblock and group descriptors are
> protected by a spinlock instead. Tested on SMP for several hours.

Any ideas on how much this improves the performance?  What sort of tests
were you running?  We could improve things a bit further by having separate
per-group locks for the update of the group descriptor info, and only
lazily update the superblock at statfs and unmount time (with a suitable
feature flag so e2fsck can fix this up at recovery time), but you seem
to have gotten the majority of the parallelism from this fix.

> @@ -214,11 +213,13 @@
>  				      block + i);
>  			BUFFER_TRACE(bitmap_bh, "bit already cleared");
>  		} else {
> +			spin_lock(&EXT3_SB(sb)->s_alloc_lock);
>  			dquot_freed_blocks++;
>  			gdp->bg_free_blocks_count =
>  			  cpu_to_le16(le16_to_cpu(gdp->bg_free_blocks_count)+1);
>  			es->s_free_blocks_count =
>  			  cpu_to_le32(le32_to_cpu(es->s_free_blocks_count)+1);
> +			spin_unlock(&EXT3_SB(sb)->s_alloc_lock);

One minor nit is that you left an ext3_error() for the "bit already cleared"
case just above this patch hunk.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/



* Re: [PATCH] concurrent block allocation for ext3
  2003-03-10 16:25 ` Andreas Dilger
@ 2003-03-10 16:33   ` Alex Tomas
  2003-03-10 16:43   ` Daniel Phillips
  1 sibling, 0 replies; 7+ messages in thread
From: Alex Tomas @ 2003-03-10 16:33 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Alex Tomas, linux-kernel, Andrew Morton

>>>>> Andreas Dilger (AD) writes:

 AD> Any ideas on how much this improves the performance?  What sort
 AD> of tests were you running?  We could improve things a bit further
 AD> by having separate per-group locks for the update of the group
 AD> descriptor info, and only lazily update the superblock at statfs
 AD> and unmount time (with a suitable feature flag so e2fsck can fix
 AD> this up at recovery time), but you seem to have gotten the
 AD> majority of the parallelism from this fix.

I'm trying to measure improvement.

The tests were:

1) on big fs (1GB)
lots of processes (up to 50) creating and removing directories and files +
untarring the kernel and make -j4 bzImage +
dd if=/dev/zero of=/mnt/dump.file bs=1M count=8000; rm -f /mnt/dump.file

2) on small fs (64MB)
20 processes create and remove lots of files and directories


In fact, I caught dozens of debug messages about set_bit collisions. Then
I fsck'ed the fs to make sure everything was OK.

 >> @@ -214,11 +213,13 @@
 >>  				      block + i);
 >>  			BUFFER_TRACE(bitmap_bh, "bit already cleared");
 >>  		} else {
 >> +			spin_lock(&EXT3_SB(sb)->s_alloc_lock);
 >>  			dquot_freed_blocks++;
 >>  			gdp->bg_free_blocks_count =
 >>  			  cpu_to_le16(le16_to_cpu(gdp->bg_free_blocks_count)+1);
 >>  			es->s_free_blocks_count =
 >>  			  cpu_to_le32(le32_to_cpu(es->s_free_blocks_count)+1);
 >> +			spin_unlock(&EXT3_SB(sb)->s_alloc_lock);

 AD> One minor nit is that you left an ext3_error() for the "bit
 AD> already cleared" case just above this patch hunk.


Hmm, what's wrong with it?

with best regards, Alex



* Re: [PATCH] concurrent block allocation for ext3
  2003-03-10 16:25 ` Andreas Dilger
  2003-03-10 16:33   ` Alex Tomas
@ 2003-03-10 16:43   ` Daniel Phillips
  1 sibling, 0 replies; 7+ messages in thread
From: Daniel Phillips @ 2003-03-10 16:43 UTC (permalink / raw)
  To: Andreas Dilger, Alex Tomas; +Cc: linux-kernel, Andrew Morton

On Mon 10 Mar 03 17:25, Andreas Dilger wrote:
> One minor nit is that you left an ext3_error() for the "bit already
> cleared" case just above this patch hunk.

But that one belongs there, because no two threads should be trying to free 
the same block at the same time.

Regards,

Daniel


* Re: [PATCH] concurrent block allocation for ext3
  2003-03-10 15:41 [PATCH] concurrent block allocation for ext3 Alex Tomas
  2003-03-10 16:25 ` Andreas Dilger
@ 2003-03-14 21:22 ` Martin J. Bligh
  2003-03-15  2:56   ` Martin J. Bligh
  1 sibling, 1 reply; 7+ messages in thread
From: Martin J. Bligh @ 2003-03-14 21:22 UTC (permalink / raw)
  To: Alex Tomas, linux-kernel; +Cc: Andrew Morton, anton

SDET on my machine (16x NUMA-Q) has fallen in love with your patch, 
and has decided to elope with it to a small desert island. This is 
despite its one disk hung off node 0, and the IO throughput of a 
slightly damp piece of cotton thread. Apologies for the loss of your 
patch as it gets whisked away ;-)

M.

PS. Oh, I had this bit in as well, per akpm's instructions: for best results,
add ____cacheline_aligned_in_smp to struct ext2_bg_info.

PPS. I'll try to run some more focused tests with aim7 over the weekend.
As if we needed it ...

-------------------------

DISCLAIMER: SPEC(tm) and the benchmark name SDET(tm) are registered
trademarks of the Standard Performance Evaluation Corporation. This 
benchmarking was performed for research purposes only, and the run results
are non-compliant and not-comparable with any published results.

Results are shown as percentages of the first set displayed

SDET 1  (see disclaimer)
                           Throughput    Std. Dev
          2.5.64-bk3-mjb3       100.0%         1.8%
         2.5.64-mjb3-ext2       102.0%         1.1%

SDET 2  (see disclaimer)
                           Throughput    Std. Dev
          2.5.64-bk3-mjb3       100.0%         3.7%
         2.5.64-mjb3-ext2       106.1%         3.1%

SDET 4  (see disclaimer)
                           Throughput    Std. Dev
          2.5.64-bk3-mjb3       100.0%         1.5%
         2.5.64-mjb3-ext2       101.1%         2.1%

SDET 8  (see disclaimer)
                           Throughput    Std. Dev
          2.5.64-bk3-mjb3       100.0%         0.2%
         2.5.64-mjb3-ext2       113.3%         0.7%

SDET 16  (see disclaimer)
                           Throughput    Std. Dev
          2.5.64-bk3-mjb3       100.0%         1.1%
         2.5.64-mjb3-ext2       167.1%         0.8%

SDET 32  (see disclaimer)
                           Throughput    Std. Dev
          2.5.64-bk3-mjb3       100.0%         0.9%
         2.5.64-mjb3-ext2       170.7%         0.1%

SDET 64  (see disclaimer)
                           Throughput    Std. Dev
          2.5.64-bk3-mjb3       100.0%         0.7%
         2.5.64-mjb3-ext2       157.2%         0.5%

SDET 128  (see disclaimer)
                           Throughput    Std. Dev
          2.5.64-bk3-mjb3       100.0%         0.3%
         2.5.64-mjb3-ext2       151.3%         0.8%



* Re: [PATCH] concurrent block allocation for ext3
  2003-03-14 21:22 ` Martin J. Bligh
@ 2003-03-15  2:56   ` Martin J. Bligh
  2003-03-15  6:08     ` Martin J. Bligh
  0 siblings, 1 reply; 7+ messages in thread
From: Martin J. Bligh @ 2003-03-15  2:56 UTC (permalink / raw)
  To: Alex Tomas, linux-kernel; +Cc: Andrew Morton, anton

> SDET on my machine (16x NUMA-Q) has fallen in love with your patch, 
> and has decided to elope with it to a small desert island. This is 
> despite its one disk hung off node 0, and the IO throughput of a 
> slightly damp piece of cotton thread. Apologies for the loss of your 
> patch as it gets whisked away ;-)

Dbench (1 disk, x440 8 real cpus, 16 HT ones)

before: 
Throughput 265.032 MB/sec (NB=331.29 MB/sec  2650.32 MBit/sec)  256 procs
after:
Throughput 381.964 MB/sec (NB=477.454 MB/sec  3819.64 MBit/sec)  256 procs

(I took the second run; the first is slower, but it seems to be stable after that)

NUMA-Q 16-way (1 disk. 16 cpus)

before:
Throughput 48.5304 MB/sec (NB=60.663 MB/sec  485.304 MBit/sec)  256 procs
after:
Throughput 58.8483 MB/sec (NB=73.5603 MB/sec  588.483 MBit/sec)  256 procs

NUMA-Q has slower disks, old adaptors, and a slow cross-node interconnect.



* Re: [PATCH] concurrent block allocation for ext3
  2003-03-15  2:56   ` Martin J. Bligh
@ 2003-03-15  6:08     ` Martin J. Bligh
  0 siblings, 0 replies; 7+ messages in thread
From: Martin J. Bligh @ 2003-03-15  6:08 UTC (permalink / raw)
  To: Alex Tomas, linux-kernel; +Cc: Andrew Morton, anton

> before:
> Throughput 48.5304 MB/sec (NB=60.663 MB/sec  485.304 MBit/sec)  256 procs
> after:
> Throughput 58.8483 MB/sec (NB=73.5603 MB/sec  588.483 MBit/sec)  256 procs

OK, akpm wanted dbench 32 instead:

before:

Throughput 187.637 MB/sec (NB=234.546 MB/sec  1876.37 MBit/sec)  32 procs

after:

Throughput 378.664 MB/sec (NB=473.33 MB/sec  3786.64 MBit/sec)  32 procs

/me likes.

M.



end of thread, other threads:[~2003-03-15  5:57 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-03-10 15:41 [PATCH] concurrent block allocation for ext3 Alex Tomas
2003-03-10 16:25 ` Andreas Dilger
2003-03-10 16:33   ` Alex Tomas
2003-03-10 16:43   ` Daniel Phillips
2003-03-14 21:22 ` Martin J. Bligh
2003-03-15  2:56   ` Martin J. Bligh
2003-03-15  6:08     ` Martin J. Bligh
