* jbd/jbd2 performance improvements
From: Theodore Ts'o @ 2008-10-07 13:14 UTC
To: linux-ext4

As I mentioned on the ext4 call yesterday, there was an interesting
thread on LKML that wasn't cc'ed onto the linux-ext4 mailing list. So
in case folks missed it, it might be worth taking a look at this mail
thread:

    [PATCH] Give kjournald a IOPRIO_CLASS_RT io priority

    http://lkml.org/lkml/2008/10/1/405

The main issue that got discussed was the age-old "entanglement"
problem. The jbd/jbd2 layer is supposed to avoid this by not blocking
the "current" transaction while the blocks from the previous
"committing" transaction are still being written out to disk.
Apparently this was broken sometime in the 2.5 time-frame:

    http://lkml.org/lkml/2008/10/2/41
    http://lkml.org/lkml/2008/10/2/322

Later in the thread, a major contention point in do_get_write_access()
was identified as the problem:

    http://lkml.org/lkml/2008/10/3/7

... and then Andrew produced the following "hacky" fix:

    http://lkml.org/lkml/2008/10/3/22

If someone has time to run some benchmarks to see how this improves
things, especially on a workload that has plenty of "entanglements",
that would be great. (I bet Ric's fs_mark run should do a good job;
the fsyncs create lots of commits and the need to modify blocks that
had been modified in the previous transactions.)

If we can get some quick testing done, and it shows really good
results, this could be something we could try fast-tracking into the
2.6.28 merge window.

    - Ted
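For anyone who wants to experiment with the idea in the patch above
without rebuilding a kernel, a rough userspace approximation is to move
the journal kernel thread into the real-time I/O class with ionice. This
is only a sketch, not the patch itself: the thread name to look for is an
assumption (it varies by kernel and filesystem -- kjournald for ext3,
kjournald2 or a jbd2 thread for ext4), the priority level 4 is arbitrary,
and the commands need root.

    # Find the journal kernel thread's pid (the name pattern is an
    # assumption; check "ps -eo pid,comm" and adjust it for your kernel).
    JPID=$(ps -eo pid,comm | awk '/kjournald/ { print $1; exit }')

    # Move the thread into the real-time I/O class (-c 1), priority 4.
    ionice -c 1 -n 4 -p "$JPID"

    # Confirm the new scheduling class.
    ionice -p "$JPID"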
* Re: jbd/jbd2 performance improvements
From: Ric Wheeler @ 2008-10-15 17:29 UTC
To: Theodore Ts'o; +Cc: linux-ext4, Eric Sandeen

Theodore Ts'o wrote:
> As I mentioned on the ext4 call yesterday, there was an interesting
> thread on LKML that wasn't cc'ed onto the linux-ext4 mailing list. So
> in case folks missed it, it might be worth taking a look at this mail
> thread:
>
>     [PATCH] Give kjournald a IOPRIO_CLASS_RT io priority
>
>     http://lkml.org/lkml/2008/10/1/405
>
> The main issue that got discussed was the age-old "entanglement"
> problem. The jbd/jbd2 layer is supposed to avoid this by not blocking
> the "current" transaction while the blocks from the previous
> "committing" transaction are still being written out to disk.
> Apparently this was broken sometime in the 2.5 time-frame:
>
>     http://lkml.org/lkml/2008/10/2/41
>     http://lkml.org/lkml/2008/10/2/322
>
> Later in the thread, a major contention point in do_get_write_access()
> was identified as the problem:
>
>     http://lkml.org/lkml/2008/10/3/7
>
> ... and then Andrew produced the following "hacky" fix:
>
>     http://lkml.org/lkml/2008/10/3/22
>
> If someone has time to run some benchmarks to see how this improves
> things, especially on a workload that has plenty of "entanglements",
> that would be great. (I bet Ric's fs_mark run should do a good job;
> the fsyncs create lots of commits and the need to modify blocks that
> had been modified in the previous transactions.)
>
> If we can get some quick testing done, and it shows really good
> results, this could be something we could try fast-tracking into the
> 2.6.28 merge window.
>
>     - Ted

We are going to try and poke at this - do you suspect a single or
multi-threaded test would work best?

Ric
* Re: jbd/jbd2 performance improvements
From: Solofo.Ramangalahy @ 2008-10-16 6:04 UTC
To: Ric Wheeler; +Cc: Theodore Ts'o, linux-ext4, Eric Sandeen

Hi Ric,

>>>>> On Wed, 15 Oct 2008 13:29:55 -0400, Ric Wheeler <rwheeler@redhat.com> said:

Ric> We are going to try and poke at this - do you suspect a single or
Ric> multi-threaded test would work best?

I've performed some tests:
http://www.bullopensource.org/ext4/20081013-2.6.27-rc9-ext4-1-akpm-fix-run6/
http://www.bullopensource.org/ext4/20081013-2.6.27-rc9-ext4-1-akpm-fix-run6/results_sorted.txt.html

I now realize that the results may not be valid since I used kvm, but
they do show variation wrt. the number of threads.

So you may want to test both single and multi-threaded.

--
solofo
* Re: jbd/jbd2 performance improvements
From: Ric Wheeler @ 2008-10-16 12:06 UTC
To: Solofo.Ramangalahy; +Cc: Theodore Ts'o, linux-ext4, Eric Sandeen

Solofo.Ramangalahy@bull.net wrote:
> Hi Ric,
>
>>>>>> On Wed, 15 Oct 2008 13:29:55 -0400, Ric Wheeler <rwheeler@redhat.com> said:
>
> Ric> We are going to try and poke at this - do you suspect a single or
> Ric> multi-threaded test would work best?
>
> I've performed some tests:
> http://www.bullopensource.org/ext4/20081013-2.6.27-rc9-ext4-1-akpm-fix-run6/
> http://www.bullopensource.org/ext4/20081013-2.6.27-rc9-ext4-1-akpm-fix-run6/results_sorted.txt.html
>
> I now realize that the results may not be valid since I used kvm, but
> they do show variation wrt. the number of threads.
>
> So you may want to test both single and multi-threaded.

A very thorough test, but the results don't seem to point to a
consistent winner.

I agree that running without KVM in the picture might be very
interesting. Eric has some similar tests underway; I think his results
were also inconclusive so far...

Ric
* Re: jbd/jbd2 performance improvements
From: Eric Sandeen @ 2008-10-16 12:39 UTC
To: Ric Wheeler; +Cc: Solofo.Ramangalahy, Theodore Ts'o, linux-ext4

Ric Wheeler wrote:
> Solofo.Ramangalahy@bull.net wrote:
>> Hi Ric,
>>
>>>>>>> On Wed, 15 Oct 2008 13:29:55 -0400, Ric Wheeler <rwheeler@redhat.com> said:
>>
>> Ric> We are going to try and poke at this - do you suspect a single or
>> Ric> multi-threaded test would work best?
>>
>> I've performed some tests:
>> http://www.bullopensource.org/ext4/20081013-2.6.27-rc9-ext4-1-akpm-fix-run6/
>> http://www.bullopensource.org/ext4/20081013-2.6.27-rc9-ext4-1-akpm-fix-run6/results_sorted.txt.html
>>
>> I now realize that the results may not be valid since I used kvm, but
>> they do show variation wrt. the number of threads.
>>
>> So you may want to test both single and multi-threaded.
>
> A very thorough test, but the results don't seem to point to a
> consistent winner.
>
> I agree that running without KVM in the picture might be very
> interesting. Eric has some similar tests underway; I think his results
> were also inconclusive so far...

Yep, I've yet to find an fs_mark invocation, at least, which shows a
clear winner. I also ran with akpm's suggested io_schedule watcher
patch and never saw us waiting on this lock (I did set it to 1s though,
which is probably too long for my storage).

-Eric
* Re: jbd/jbd2 performance improvements
From: Solofo.Ramangalahy @ 2008-10-23 10:42 UTC
To: Eric Sandeen; +Cc: Ric Wheeler, Theodore Ts'o, linux-ext4

>>>>> On Thu, 16 Oct 2008 07:39:04 -0500, Eric Sandeen <sandeen@redhat.com> said:

 >> A very thorough test, but the results don't seem to point to a
 >> consistent winner.
 >>
 >> I agree that running without KVM in the picture might be very
 >> interesting. Eric has some similar tests underway; I think his
 >> results were also inconclusive so far...

Eric> Yep, I've yet to find an fs_mark invocation, at least, which
Eric> shows a clear winner. I also ran with akpm's suggested
Eric> io_schedule watcher patch and never saw us waiting on this
Eric> lock (I did set it to 1s though, which is probably too long
Eric> for my storage).

I've redone the tests without kvm. Still no clear winner.

To sum up:
. kernel ext4-stable
. mkfs (1.41.3) default options
. mount options: default, akpm, akpm_lock_hack
. scheduler default (cfq)
. 8 cpus, single 15K rpm disk
. without the high latency detection patch
. a broad range of fs_mark (all the sync strategies, from 1 to 32
  threads, up to 10000 files/thread, several directories)
. a "tangled synchrony" workload as mentioned in the "Analysis and
  evolution of journaling file systems" paper discussed on Monday

First things first, maybe I should have spent more time reproducing
Arjan's behavior before testing.

This was not a complete waste of time though, as the following errors
were spotted during the runs:

1. EXT4-fs error (device sdb): ext4_free_inode: bit already cleared for inode 32769

2. EXT4-fs error (device sdb): ext4_init_inode_bitmap: Checksum bad for group 8

3. BUG: spinlock wrong CPU on CPU#3, fs_mark/1975
   lock: ffff88015a44f480, .magic: dead4ead, .owner: fs_mark/1975, .owner_cpu: 1
   Pid: 1975, comm: fs_mark Not tainted 2.6.27.1-ext4-stable-gcov #1

   Call Trace:
    [<ffffffff811a47a2>] spin_bug+0xa2/0xaa
    [<ffffffff811a481f>] _raw_spin_unlock+0x75/0x8a
    [<ffffffff814552c1>] _spin_unlock+0x26/0x2a
    [<ffffffffa00d4fd3>] ext4_read_inode_bitmap+0xfa/0x14e [ext4]
    [<ffffffffa00d564b>] ext4_new_inode+0x5d4/0xec4 [ext4]
    [<ffffffff810562db>] ? __lock_acquire+0x481/0x7d8
    [<ffffffffa00c2430>] ? jbd2_journal_start+0xef/0x11a [jbd2]
    [<ffffffffa00c2430>] ? jbd2_journal_start+0xef/0x11a [jbd2]
    [<ffffffffa00deb99>] ext4_create+0xc7/0x144 [ext4]
    [<ffffffff810b6734>] vfs_create+0xdf/0x155
    [<ffffffff810b8905>] do_filp_open+0x220/0x7fc
    [<ffffffff814552c1>] ? _spin_unlock+0x26/0x2a
    [<ffffffff810abe5a>] do_sys_open+0x53/0xd3
    [<ffffffff810abf03>] sys_open+0x1b/0x1d
    [<ffffffff8100bf8b>] system_call_fastpath+0x16/0x1b

Anybody seen this in their logs?

The "bit already cleared for inode" is triggered by:
fs_mark -v -d /mnt/test-ext4 -n10000 -D10 -N1000 -t8 -s4096 -S0

--
solofo
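A sketch of the kind of fs_mark sweep described in the summary above,
for anyone who wants to repeat it: the mount point, thread counts, and
the range of -S sync strategies are illustrative assumptions, so adjust
them to the fs_mark build and storage being tested.

    #!/bin/sh
    # Sweep fs_mark over sync strategies and thread counts (illustrative
    # values; adjust -S to the sync methods your fs_mark build supports).
    MNT=/mnt/test-ext4
    for sync in 0 1 2 3 4 5 6; do
        for threads in 1 2 4 8 16 32; do
            fs_mark -d "$MNT" -n 10000 -D 10 -N 1000 \
                    -t "$threads" -s 4096 -S "$sync" \
                    >> "fs_mark-S${sync}-t${threads}.log"
        done
    done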
* Re: jbd/jbd2 performance improvements
From: Ric Wheeler @ 2008-10-23 12:00 UTC
To: Solofo.Ramangalahy, Arjan De Ven
Cc: Eric Sandeen, Ric Wheeler, Theodore Ts'o, linux-ext4, Andrew Morton

Solofo.Ramangalahy@bull.net wrote:
>>>>>> On Thu, 16 Oct 2008 07:39:04 -0500, Eric Sandeen <sandeen@redhat.com> said:
>
> >> A very thorough test, but the results don't seem to point to a
> >> consistent winner.
> >>
> >> I agree that running without KVM in the picture might be very
> >> interesting. Eric has some similar tests underway; I think his
> >> results were also inconclusive so far...
>
> Eric> Yep, I've yet to find an fs_mark invocation, at least, which
> Eric> shows a clear winner. I also ran with akpm's suggested
> Eric> io_schedule watcher patch and never saw us waiting on this
> Eric> lock (I did set it to 1s though, which is probably too long
> Eric> for my storage).
>
> I've redone the tests without kvm. Still no clear winner.
>
> To sum up:
> . kernel ext4-stable
> . mkfs (1.41.3) default options
> . mount options: default, akpm, akpm_lock_hack
> . scheduler default (cfq)
> . 8 cpus, single 15K rpm disk
> . without the high latency detection patch
> . a broad range of fs_mark (all the sync strategies, from 1 to 32
>   threads, up to 10000 files/thread, several directories)
> . a "tangled synchrony" workload as mentioned in the "Analysis and
>   evolution of journaling file systems" paper discussed on Monday
>
> First things first, maybe I should have spent more time reproducing
> Arjan's behavior before testing.
>
> This was not a complete waste of time though, as the following errors
> were spotted during the runs:
>
> 1. EXT4-fs error (device sdb): ext4_free_inode: bit already cleared for inode 32769
>
> 2. EXT4-fs error (device sdb): ext4_init_inode_bitmap: Checksum bad for group 8
>
> 3. BUG: spinlock wrong CPU on CPU#3, fs_mark/1975
>    lock: ffff88015a44f480, .magic: dead4ead, .owner: fs_mark/1975, .owner_cpu: 1
>    Pid: 1975, comm: fs_mark Not tainted 2.6.27.1-ext4-stable-gcov #1
>
>    Call Trace:
>     [<ffffffff811a47a2>] spin_bug+0xa2/0xaa
>     [<ffffffff811a481f>] _raw_spin_unlock+0x75/0x8a
>     [<ffffffff814552c1>] _spin_unlock+0x26/0x2a
>     [<ffffffffa00d4fd3>] ext4_read_inode_bitmap+0xfa/0x14e [ext4]
>     [<ffffffffa00d564b>] ext4_new_inode+0x5d4/0xec4 [ext4]
>     [<ffffffff810562db>] ? __lock_acquire+0x481/0x7d8
>     [<ffffffffa00c2430>] ? jbd2_journal_start+0xef/0x11a [jbd2]
>     [<ffffffffa00c2430>] ? jbd2_journal_start+0xef/0x11a [jbd2]
>     [<ffffffffa00deb99>] ext4_create+0xc7/0x144 [ext4]
>     [<ffffffff810b6734>] vfs_create+0xdf/0x155
>     [<ffffffff810b8905>] do_filp_open+0x220/0x7fc
>     [<ffffffff814552c1>] ? _spin_unlock+0x26/0x2a
>     [<ffffffff810abe5a>] do_sys_open+0x53/0xd3
>     [<ffffffff810abf03>] sys_open+0x1b/0x1d
>     [<ffffffff8100bf8b>] system_call_fastpath+0x16/0x1b
>
> Anybody seen this in their logs?
>
> The "bit already cleared for inode" is triggered by:
> fs_mark -v -d /mnt/test-ext4 -n10000 -D10 -N1000 -t8 -s4096 -S0

Arjan,

Do you have any details on the test case that you ran that showed a
clear improvement? What kind of storage & IO pattern did you use?

Regards,

Ric
* Re: jbd/jbd2 performance improvements
From: Solofo.Ramangalahy @ 2008-10-23 12:22 UTC
To: Arjan De Ven
Cc: Eric Sandeen, Ric Wheeler, Theodore Ts'o, linux-ext4, Andrew Morton

Ric Wheeler writes:
> Do you have any details on the test case that you ran that showed a
> clear improvement? What kind of storage & IO pattern did you use?

Is it possible to record latencytop output (like top batch mode) to
capture the highest latency during a test run?

Or how did you collect this:

> my reproducer is sadly very simple (claws-mail is my mail client that
> uses maildir)
>
> Process claws-mail (4896)        Total: 2829.7 msec
> EXT3: Waiting for journal access        2491.0 msec    88.4 %
> Writing back inodes                      160.9 msec     5.7 %
> synchronous write                         78.8 msec     3.0 %
>
> is an example of such a trace (this is with patch, without patch the
> numbers are about 3x bigger)
>
> Waiting for journal access is "journal_get_write_access"
> Writing back inodes is "writeback_inodes"
> synchronous write is "do_sync_write"

--
solofo