* jbd/jbd2 performance improvements
From: Theodore Ts'o @ 2008-10-07 13:14 UTC
To: linux-ext4

As I mentioned on the ext4 call yesterday, there was an interesting
thread on LKML that wasn't cc'ed onto the linux-ext4 mailing list. So
in case folks missed it, it might be worth taking a look at this mail
thread:

    [PATCH] Give kjournald a IOPRIO_CLASS_RT io priority

    http://lkml.org/lkml/2008/10/1/405

The main issue that got discussed was the age-old "entanglement"
problem. The jbd/jbd2 layer is supposed to avoid this by not blocking
the "current" transaction while the blocks from the previous
"committing" transaction are still being written out to disk.
Apparently this was broken sometime in the 2.5 time-frame:

    http://lkml.org/lkml/2008/10/2/41
    http://lkml.org/lkml/2008/10/2/322

Later in the thread, a major contention point in do_get_write_access()
was identified as the problem:

    http://lkml.org/lkml/2008/10/3/7

... and then Andrew produced the following "hacky" fix:

    http://lkml.org/lkml/2008/10/3/22

If someone has time to run some benchmarks to see how this improves
things, especially on a workload that has plenty of "entanglements",
that would be great. (I bet Ric's fs_mark run should do a good job;
the fsyncs create lots of commits and the need to modify blocks that
had been modified in the previous transactions.)

If we can get some quick testing done, and it shows really good
results, this could be something we could try fast-tracking into the
2.6.28 merge window.

    - Ted
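For anyone who wants to experiment with the idea in the patch above
without rebuilding a kernel, a rough userspace approximation is to move
the journal kernel thread into the real-time I/O class with ionice. This
is only a sketch, not the patch itself: the thread name to look for is an
assumption (it varies by kernel and filesystem -- kjournald for ext3,
kjournald2 or a jbd2 thread for ext4), the priority level 4 is arbitrary,
and the commands need root.

    # Find the journal kernel thread's pid (the name pattern is an
    # assumption; check "ps -eo pid,comm" and adjust it for your kernel).
    JPID=$(ps -eo pid,comm | awk '/kjournald/ { print $1; exit }')

    # Move the thread into the real-time I/O class (-c 1), priority 4.
    ionice -c 1 -n 4 -p "$JPID"

    # Confirm the new scheduling class.
    ionice -p "$JPID"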
* Re: jbd/jbd2 performance improvements
From: Ric Wheeler @ 2008-10-15 17:29 UTC
To: Theodore Ts'o; +Cc: linux-ext4, Eric Sandeen

Theodore Ts'o wrote:
> As I mentioned on the ext4 call yesterday, there was an interesting
> thread on LKML that wasn't cc'ed onto the linux-ext4 mailing list. So
> in case folks missed it, it might be worth taking a look at this mail
> thread:
>
>     [PATCH] Give kjournald a IOPRIO_CLASS_RT io priority
>
>     http://lkml.org/lkml/2008/10/1/405
>
> The main issue that got discussed was the age-old "entanglement"
> problem. The jbd/jbd2 layer is supposed to avoid this by not blocking
> the "current" transaction while the blocks from the previous
> "committing" transaction are still being written out to disk.
> Apparently this was broken sometime in the 2.5 time-frame:
>
>     http://lkml.org/lkml/2008/10/2/41
>     http://lkml.org/lkml/2008/10/2/322
>
> Later in the thread, a major contention point in do_get_write_access()
> was identified as the problem:
>
>     http://lkml.org/lkml/2008/10/3/7
>
> ... and then Andrew produced the following "hacky" fix:
>
>     http://lkml.org/lkml/2008/10/3/22
>
> If someone has time to run some benchmarks to see how this improves
> things, especially on a workload that has plenty of "entanglements",
> that would be great. (I bet Ric's fs_mark run should do a good job;
> the fsyncs create lots of commits and the need to modify blocks that
> had been modified in the previous transactions.)
>
> If we can get some quick testing done, and it shows really good
> results, this could be something we could try fast-tracking into the
> 2.6.28 merge window.
>
>     - Ted

We are going to try and poke at this - do you suspect a single or
multi-threaded test would work best?

Ric
* Re: jbd/jbd2 performance improvements
From: Solofo.Ramangalahy @ 2008-10-16 6:04 UTC
To: Ric Wheeler; +Cc: Theodore Ts'o, linux-ext4, Eric Sandeen

Hi Ric,

>>>>> On Wed, 15 Oct 2008 13:29:55 -0400, Ric Wheeler <rwheeler@redhat.com> said:

Ric> We are going to try and poke at this - do you suspect a single or
Ric> multi-threaded test would work best?

I've performed some tests:
http://www.bullopensource.org/ext4/20081013-2.6.27-rc9-ext4-1-akpm-fix-run6/
http://www.bullopensource.org/ext4/20081013-2.6.27-rc9-ext4-1-akpm-fix-run6/results_sorted.txt.html

I now realize that the results may not be valid since I used kvm, but
they do show variation wrt. the number of threads.

So you may want to test both single and multi-threaded.

--
solofo
* Re: jbd/jbd2 performance improvements
From: Ric Wheeler @ 2008-10-16 12:06 UTC
To: Solofo.Ramangalahy; +Cc: Theodore Ts'o, linux-ext4, Eric Sandeen

Solofo.Ramangalahy@bull.net wrote:
> Hi Ric,
>
>>>>>> On Wed, 15 Oct 2008 13:29:55 -0400, Ric Wheeler <rwheeler@redhat.com> said:
>
> Ric> We are going to try and poke at this - do you suspect a single or
> Ric> multi-threaded test would work best?
>
> I've performed some tests:
> http://www.bullopensource.org/ext4/20081013-2.6.27-rc9-ext4-1-akpm-fix-run6/
> http://www.bullopensource.org/ext4/20081013-2.6.27-rc9-ext4-1-akpm-fix-run6/results_sorted.txt.html
>
> I now realize that the results may not be valid since I used kvm, but
> they do show variation wrt. the number of threads.
>
> So you may want to test both single and multi-threaded.

A very thorough test, but the results don't seem to point to a
consistent winner.

I agree that running without KVM in the picture might be very
interesting. Eric has some similar tests underway; I think his results
were also inconclusive so far...

Ric
* Re: jbd/jbd2 performance improvements
From: Eric Sandeen @ 2008-10-16 12:39 UTC
To: Ric Wheeler; +Cc: Solofo.Ramangalahy, Theodore Ts'o, linux-ext4

Ric Wheeler wrote:
> Solofo.Ramangalahy@bull.net wrote:
>> Hi Ric,
>>
>>>>>>> On Wed, 15 Oct 2008 13:29:55 -0400, Ric Wheeler <rwheeler@redhat.com> said:
>>
>> Ric> We are going to try and poke at this - do you suspect a single or
>> Ric> multi-threaded test would work best?
>>
>> I've performed some tests:
>> http://www.bullopensource.org/ext4/20081013-2.6.27-rc9-ext4-1-akpm-fix-run6/
>> http://www.bullopensource.org/ext4/20081013-2.6.27-rc9-ext4-1-akpm-fix-run6/results_sorted.txt.html
>>
>> I now realize that the results may not be valid since I used kvm, but
>> they do show variation wrt. the number of threads.
>>
>> So you may want to test both single and multi-threaded.
>
> A very thorough test, but the results don't seem to point to a
> consistent winner.
>
> I agree that running without KVM in the picture might be very
> interesting. Eric has some similar tests underway; I think his results
> were also inconclusive so far...

Yep, I've yet to find an fs_mark invocation, at least, which shows a
clear winner. I also ran with akpm's suggested io_schedule watcher
patch and never saw us waiting on this lock (I did set it to 1s though,
which is probably too long for my storage).

-Eric
* Re: jbd/jbd2 performance improvements
From: Solofo.Ramangalahy @ 2008-10-23 10:42 UTC
To: Eric Sandeen; +Cc: Ric Wheeler, Theodore Ts'o, linux-ext4

>>>>> On Thu, 16 Oct 2008 07:39:04 -0500, Eric Sandeen <sandeen@redhat.com> said:

 >> A very thorough test, but the results don't seem to point to a
 >> consistent winner.
 >>
 >> I agree that running without KVM in the picture might be very
 >> interesting. Eric has some similar tests underway; I think his
 >> results were also inconclusive so far...

Eric> Yep, I've yet to find an fs_mark invocation, at least, which
Eric> shows a clear winner. I also ran with akpm's suggested
Eric> io_schedule watcher patch and never saw us waiting on this
Eric> lock (I did set it to 1s though, which is probably too long
Eric> for my storage).

I've redone the tests without kvm. Still no clear winner.

To sum up:
. kernel ext4-stable
. mkfs (1.41.3) default options
. mount options: default, akpm, akpm_lock_hack
. scheduler default (cfq)
. 8 cpus, single 15K rpm disk
. without the high latency detection patch
. a broad range of fs_mark (all the sync strategies, from 1 to 32
  threads, up to 10000 files/thread, several directories)
. a "tangled synchrony" workload as mentioned in the "Analysis and
  evolution of journaling file systems" paper discussed on Monday

First things first, maybe I should have spent more time reproducing
Arjan's behavior before testing.

This was not a complete waste of time though, as the following errors
were spotted during the runs:

1. EXT4-fs error (device sdb): ext4_free_inode: bit already cleared for inode 32769

2. EXT4-fs error (device sdb): ext4_init_inode_bitmap: Checksum bad for group 8

3. BUG: spinlock wrong CPU on CPU#3, fs_mark/1975
   lock: ffff88015a44f480, .magic: dead4ead, .owner: fs_mark/1975, .owner_cpu: 1
   Pid: 1975, comm: fs_mark Not tainted 2.6.27.1-ext4-stable-gcov #1

   Call Trace:
    [<ffffffff811a47a2>] spin_bug+0xa2/0xaa
    [<ffffffff811a481f>] _raw_spin_unlock+0x75/0x8a
    [<ffffffff814552c1>] _spin_unlock+0x26/0x2a
    [<ffffffffa00d4fd3>] ext4_read_inode_bitmap+0xfa/0x14e [ext4]
    [<ffffffffa00d564b>] ext4_new_inode+0x5d4/0xec4 [ext4]
    [<ffffffff810562db>] ? __lock_acquire+0x481/0x7d8
    [<ffffffffa00c2430>] ? jbd2_journal_start+0xef/0x11a [jbd2]
    [<ffffffffa00c2430>] ? jbd2_journal_start+0xef/0x11a [jbd2]
    [<ffffffffa00deb99>] ext4_create+0xc7/0x144 [ext4]
    [<ffffffff810b6734>] vfs_create+0xdf/0x155
    [<ffffffff810b8905>] do_filp_open+0x220/0x7fc
    [<ffffffff814552c1>] ? _spin_unlock+0x26/0x2a
    [<ffffffff810abe5a>] do_sys_open+0x53/0xd3
    [<ffffffff810abf03>] sys_open+0x1b/0x1d
    [<ffffffff8100bf8b>] system_call_fastpath+0x16/0x1b

Anybody seen this in their logs?

The "bit already cleared for inode" is triggered by:
fs_mark -v -d /mnt/test-ext4 -n10000 -D10 -N1000 -t8 -s4096 -S0

--
solofo
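A sketch of the kind of fs_mark sweep described in the summary above,
for anyone who wants to repeat it: the mount point, thread counts, and
the range of -S sync strategies are illustrative assumptions, so adjust
them to the fs_mark build and storage being tested.

    #!/bin/sh
    # Sweep fs_mark over sync strategies and thread counts (illustrative
    # values; adjust -S to the sync methods your fs_mark build supports).
    MNT=/mnt/test-ext4
    for sync in 0 1 2 3 4 5 6; do
        for threads in 1 2 4 8 16 32; do
            fs_mark -d "$MNT" -n 10000 -D 10 -N 1000 \
                    -t "$threads" -s 4096 -S "$sync" \
                    >> "fs_mark-S${sync}-t${threads}.log"
        done
    done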
* Re: jbd/jbd2 performance improvements
From: Ric Wheeler @ 2008-10-23 12:00 UTC
To: Solofo.Ramangalahy, Arjan De Ven
Cc: Eric Sandeen, Ric Wheeler, Theodore Ts'o, linux-ext4, Andrew Morton

Solofo.Ramangalahy@bull.net wrote:
>>>>>> On Thu, 16 Oct 2008 07:39:04 -0500, Eric Sandeen <sandeen@redhat.com> said:
>
> >> A very thorough test, but the results don't seem to point to a
> >> consistent winner.
> >>
> >> I agree that running without KVM in the picture might be very
> >> interesting. Eric has some similar tests underway; I think his
> >> results were also inconclusive so far...
>
> Eric> Yep, I've yet to find an fs_mark invocation, at least, which
> Eric> shows a clear winner. I also ran with akpm's suggested
> Eric> io_schedule watcher patch and never saw us waiting on this
> Eric> lock (I did set it to 1s though, which is probably too long
> Eric> for my storage).
>
> I've redone the tests without kvm. Still no clear winner.
>
> To sum up:
> . kernel ext4-stable
> . mkfs (1.41.3) default options
> . mount options: default, akpm, akpm_lock_hack
> . scheduler default (cfq)
> . 8 cpus, single 15K rpm disk
> . without the high latency detection patch
> . a broad range of fs_mark (all the sync strategies, from 1 to 32
>   threads, up to 10000 files/thread, several directories)
> . a "tangled synchrony" workload as mentioned in the "Analysis and
>   evolution of journaling file systems" paper discussed on Monday
>
> First things first, maybe I should have spent more time reproducing
> Arjan's behavior before testing.
>
> This was not a complete waste of time though, as the following errors
> were spotted during the runs:
>
> 1. EXT4-fs error (device sdb): ext4_free_inode: bit already cleared for inode 32769
>
> 2. EXT4-fs error (device sdb): ext4_init_inode_bitmap: Checksum bad for group 8
>
> 3. BUG: spinlock wrong CPU on CPU#3, fs_mark/1975
>    lock: ffff88015a44f480, .magic: dead4ead, .owner: fs_mark/1975, .owner_cpu: 1
>    Pid: 1975, comm: fs_mark Not tainted 2.6.27.1-ext4-stable-gcov #1
>
>    Call Trace:
>     [<ffffffff811a47a2>] spin_bug+0xa2/0xaa
>     [<ffffffff811a481f>] _raw_spin_unlock+0x75/0x8a
>     [<ffffffff814552c1>] _spin_unlock+0x26/0x2a
>     [<ffffffffa00d4fd3>] ext4_read_inode_bitmap+0xfa/0x14e [ext4]
>     [<ffffffffa00d564b>] ext4_new_inode+0x5d4/0xec4 [ext4]
>     [<ffffffff810562db>] ? __lock_acquire+0x481/0x7d8
>     [<ffffffffa00c2430>] ? jbd2_journal_start+0xef/0x11a [jbd2]
>     [<ffffffffa00c2430>] ? jbd2_journal_start+0xef/0x11a [jbd2]
>     [<ffffffffa00deb99>] ext4_create+0xc7/0x144 [ext4]
>     [<ffffffff810b6734>] vfs_create+0xdf/0x155
>     [<ffffffff810b8905>] do_filp_open+0x220/0x7fc
>     [<ffffffff814552c1>] ? _spin_unlock+0x26/0x2a
>     [<ffffffff810abe5a>] do_sys_open+0x53/0xd3
>     [<ffffffff810abf03>] sys_open+0x1b/0x1d
>     [<ffffffff8100bf8b>] system_call_fastpath+0x16/0x1b
>
> Anybody seen this in their logs?
>
> The "bit already cleared for inode" is triggered by:
> fs_mark -v -d /mnt/test-ext4 -n10000 -D10 -N1000 -t8 -s4096 -S0

Arjan,

Do you have any details on the test case that you ran that showed a
clear improvement? What kind of storage & IO pattern did you use?

Regards,

Ric
* Re: jbd/jbd2 performance improvements
From: Solofo.Ramangalahy @ 2008-10-23 12:22 UTC
To: Arjan De Ven
Cc: Eric Sandeen, Ric Wheeler, Theodore Ts'o, linux-ext4, Andrew Morton

Ric Wheeler writes:
> Do you have any details on the test case that you ran that showed a
> clear improvement? What kind of storage & IO pattern did you use?

Is it possible to record latencytop output (like top batch mode) to
capture the highest latency during a test run?

Or how did you collect this:

> my reproducer is sadly very simple (claws-mail is my mail client that
> uses maildir)
>
> Process claws-mail (4896)        Total: 2829.7 msec
> EXT3: Waiting for journal access        2491.0 msec    88.4 %
> Writing back inodes                      160.9 msec     5.7 %
> synchronous write                         78.8 msec     3.0 %
>
> is an example of such a trace (this is with patch, without patch the
> numbers are about 3x bigger)
>
> Waiting for journal access is "journal_get_write_access"
> Writing back inodes is "writeback_inodes"
> synchronous write is "do_sync_write"

--
solofo