public inbox for linux-kernel@vger.kernel.org
* [PATCH] 2.4.10 improved reiserfs a lot, but could still be better
@ 2001-09-24 14:09 Beau Kuiper
  2001-09-24 14:46 ` [reiserfs-list] " Chris Mason
  2001-09-24 15:32 ` Matthias Andree
  0 siblings, 2 replies; 32+ messages in thread
From: Beau Kuiper @ 2001-09-24 14:09 UTC (permalink / raw)
  To: linux-kernel; +Cc: reiserfs-list

Hi all again,

I have updated my last set of patches for reiserfs to run on the 2.4.10 
kernel.

The new set of patches creates a new method for doing kupdated syncs. On 
filesystems that do not support this new method, the regular write_super 
method is used. reiserfs, on a kupdated super sync, simply calls the 
flush_old_commits code with immediate mode off. 

The reiserfs improvements in 2.4.10 are great, but still not as good as 
2.2.19 was.

I have run two benchmarks on:

   the 2.4.9 kernel (plain, slow, starting point)
   the 2.4.9 kernel (with kupdated disabled, this is where we want to be)
   the 2.4.10 kernel (plain, quite fast though)
   the 2.4.10 kernel with my patches.

The benchmarks are:

   dbench 10 (run 4 times, with the first result discarded)
   kernel compilation times (run twice, with the first result discarded)

The first result in each benchmark is discarded because that run just brings 
the cache to a consistent state.

All benchmarks are run on the following machine:

Duron 700
VIA KT133 northbridge and 686A southbridge.
384meg RAM
40 gig IBM drive (7200rpm, GXP60)

The IBM drive has its internal write caching disabled, because the cache is 
damned good :-) and hides the problems that my old drive had (I upgraded a 
few days ago).

I was going to use an old Quantum 5400rpm drive for these benchmarks, but I 
blew it up ;-) (I got fire!! on one of the chips and everything; somehow I 
managed to plug the power cable into it backwards). Its smell is still 
lingering as I write this. Could someone with a slow 5400rpm drive do these 
tests and report back?

Anyway, enough yabbering, on to the results:

---- 2.4.9 (plain)

dbench: 25.6155, 24.4236, 26.05 MB/Sec

kernel compile: 5.41.744 wall time, 4.43.880 user time, 0.16.380 sys time

---- 2.4.9 (kupdated off)

dbench: 33.763, 36.452, 32.0602 MB/Sec

kernel compile: 5.7.967 wall time, 4.44.140 user time, 0.15.380 sys time

---- 2.4.10 (plain)

dbench: 35.3584, 31.1634, 32.3602 MB/Sec

kernel compile: 5.21.458 wall time, 4.43.840 user time, 0.14.590 sys time

---- 2.4.10 (patched with attached patch)

dbench : 35.028, 33.6774, 38.2342 MB/Sec

kernel compile: 5.4.640 wall time, 4.42.950 user time, 0.15.160 sys time

Conclusions:

The 2.4.10 kernel improved reiserfs performance a lot all by itself, 
especially in dbench. In kernel compiles, however, it still isn't as fast as 
my new patch (maybe because dbench doesn't stress kupdated much).

Also, the performance problems seem to be very dependent on the hardware being 
used. 5400rpm drives get hurt a lot, while 7200rpm drives seem to handle it 
better. Decent write caching on IDE devices (like the 2 MB buffer on the IBM) 
can completely hide this issue.

Thanks to everyone who has helped me so far, and I look forward to further 
comments and assistance,
Beau Kuiper
kuib-kl@ljbc.wa.edu.au

* Re: [PATCH] 2.4.10 improved reiserfs a lot, but could still be better
@ 2001-09-24 21:58 Dieter Nützel
  2001-09-25  0:19 ` Matthias Andree
  0 siblings, 1 reply; 32+ messages in thread
From: Dieter Nützel @ 2001-09-24 21:58 UTC (permalink / raw)
  To: Beau Kuiper
  Cc: Chris Mason, Andrea Arcangeli, Robert Love, Linux Kernel List,
	ReiserFS List, Roger Larsson, george anzinger

On Monday, September 24, 2001 14:46:09 PM -0400 Chris Mason wrote:
> On Monday, September 24, 2001 10:09:59 PM +0800 Beau Kuiper
> <kuib-kl@ljbc.wa.edu.au> wrote:
>
> > Hi all again,
> > 
> > I have updated my last set of patches for reiserfs to run on the 2.4.10 
> > kernel.
> > 
> > The new set of patches create a new method to do kupdated syncs. On 
> > filesystems that do no support this new method, the regular write_super 
> > method is used. Then reiserfs on kupdated super_sync, simply calls the 
> > flush_old_commits code with immediate mode off. 
> > 
>
> Ok, I think the patch is missing ;-)

That's what I've found first, too :-)))

> What we need to do now is look more closely at why the performance
> increases.

I can't second that.

First, let me tell you that _all_ of my previously posted benchmarks were 
recorded _WITHOUT_ write caching.
Even though my IBM DDSY-T18350N 18 GB U160 10k disk has a 4 MB cache (8 and 
16 MB versions are available, too), it is NOT enabled by default (neither by 
the firmware nor by the Linux SCSI driver).

Do you know of a Linux SCSI tool to enable it for testing purposes?
I know it _IS_ unsafe for server (journaling) systems.

Below are my numbers for 2.4.10-preempt plus some small additional patches 
and your latest patch.

Greetings,
	Dieter

*************************************************************

inode.c-schedule.patch (Andrea, me)
APPLIED
Could be the culprit for my "slow" block IO with bonnie++.
But better preempting. Really?

--- linux/fs/inode.c    Mon Sep 24 00:31:58 2001
+++ linux-2.4.10-preempt/fs/inode.c     Mon Sep 24 01:07:06 2001
@@ -17,6 +17,7 @@
 #include <linux/swapctl.h>
 #include <linux/prefetch.h>
 #include <linux/locks.h>
+#include <linux/compiler.h>

 /*
  * New inode.c implementation.
@@ -296,6 +297,12 @@
                         * so we have to start looking from the list head.
                         */
                        tmp = head;
+
+                        if (unlikely(current->need_resched)) {
+                                spin_unlock(&inode_lock);
+                                schedule();
+                                spin_lock(&inode_lock);
+                        }
                }
        }

journal.c-1-patch (Chris)
APPLIED
Do we really need this?
It shouldn't hurt -- as Chris told me.

--- linux/fs/reiserfs/journal.c Sat Sep  8 08:05:32 2001
+++ linux/fs/reiserfs/journal.c Thu Sep 20 13:15:04 2001
@@ -2872,17 +2872,12 @@
   /* write any buffers that must hit disk before this commit is done */
   fsync_inode_buffers(&(SB_JOURNAL(p_s_sb)->j_dummy_inode)) ;

-  /* honor the flush and async wishes from the caller */
+  /* honor the flush wishes from the caller, simple commits can
+  ** be done outside the journal lock, they are done below
+  */
   if (flush) {
-
     flush_commit_list(p_s_sb, SB_JOURNAL_LIST(p_s_sb) + orig_jindex, 1) ;
     flush_journal_list(p_s_sb,  SB_JOURNAL_LIST(p_s_sb) + orig_jindex , 1) ;
-  } else if (commit_now) {
-    if (wait_on_commit) {
-      flush_commit_list(p_s_sb, SB_JOURNAL_LIST(p_s_sb) + orig_jindex, 1) ;
-    } else {
-      commit_flush_async(p_s_sb, orig_jindex) ;
-    }
   }

   /* reset journal values for the next transaction */
@@ -2944,6 +2939,16 @@
   atomic_set(&(SB_JOURNAL(p_s_sb)->j_jlock), 0) ;
   /* wake up any body waiting to join. */
   wake_up(&(SB_JOURNAL(p_s_sb)->j_join_wait)) ;
+
+  if (!flush && commit_now) {
+    if (current->need_resched)
+      schedule() ;
+    if (wait_on_commit) {
+      flush_commit_list(p_s_sb, SB_JOURNAL_LIST(p_s_sb) + orig_jindex, 1) ;
+    } else {
+      commit_flush_async(p_s_sb, orig_jindex) ;
+    }
+  }
   return 0 ;
 }

vmalloc.c-patch (Andrea)
NOT APPLIED
Do we need it?

--- linux/mm/vmalloc.c.~1~     Thu Sep 20 01:44:20 2001
+++ linux/mm/vmalloc.c Fri Sep 21 00:40:48 2001
@@ -144,6 +144,7 @@
        int ret;

        dir = pgd_offset_k(address);
+       flush_cache_all();
        spin_lock(&init_mm.page_table_lock);
        do {
                pmd_t *pmd;

*************************************************************

2.4.10+
patch-rml-2.4.10-preempt-kernel-1+
patch-rml-2.4.10-preempt-ptrace-and-jobs-fix+
patch-rml-2.4.10-preempt-stats-1+
inode.c-schedule.patch+
journal.c-1-patch

32 clients started
.....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................++............................................+.......+++..+....+++..+...+.....+.++...+.++++.+++.++++++.+++********************************
Throughput 38.6878 MB/sec (NB=48.3597 MB/sec  386.878 MBit/sec)
14.200u 54.940s 1:50.21 62.7%   0+0k 0+0io 911pf+0w
max load: 1777

Version 1.92a       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
SunWave1      1248M    79  97 16034  21  5719   6   147  98 22904  16 269.0   4
Latency               138ms    2546ms     201ms   97838us   58940us    3207ms
Version 1.92a       ------Sequential Create------ --------Random Create--------
SunWave1            -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  6121  75 +++++ +++ 12753  95  8422  80 +++++ +++ 11152  95
Latency             26126us   11425us   11879us    5325us   12082us   13025us
1.92a,1.92a,SunWave1,1001286857,1248M,79,97,16034,21,5719,6,147,98,22904,16,269.0,4,16,,,,,6121,75,+++++,+++,12753,95,8422,80,+++++,+++,11152,95,138ms,2546ms,201ms,97838us,58940us,3207ms,26126us,11425us,11879us,5325us,12082us,13025us

After running VTK (VIS app) I get this:

Worst 20 latency times of 1648 measured in this period.
  usec      cause     mask   start line/file      address   end line/file
  7239  spin_lock        1   381/memory.c        c012808f   402/memory.c
   321        BKL        0  2754/buffer.c        c01415ba   697/sched.c
   312        BKL        0   359/buffer.c        c013d6dc  1381/sched.c
   280        BKL        0   359/buffer.c        c013d6dc  1380/sched.c
   252   reacqBKL        0  1375/sched.c         c0115334  1381/sched.c
   232  spin_lock        0   547/sched.c         c0113574   697/sched.c
   215       eth1        0   585/irq.c           c01089af   647/irq.c
   164        BKL        0   452/exit.c          c011b4d1   681/tty_io.c
   119        BKL        0  1437/namei.c         c014cabf   697/sched.c
   105        BKL        0   452/exit.c          c011b4d1   697/sched.c
   101        BKL        5   712/tty_io.c        c01a6edb   714/tty_io.c
   100        BKL        0   452/exit.c          c011b4d1  1380/sched.c
    99    unknown        1    76/softirq.c       c011cba4   119/softirq.c
    92  spin_lock        4   468/netfilter.c     c01fe263   119/softirq.c
    79        BKL        0    42/file.c          c01714b0    63/file.c
    72        BKL        0   752/namei.c         c014b73f   697/sched.c
    71        BKL        0   533/inode.c         c016e0ad  1381/sched.c
    71        BKL        0    30/inode.c         c016d531    52/inode.c
    68        BKL        0   452/exit.c          c011b4d1  1381/sched.c
    66        BKL        0   927/namei.c         c014b94f   929/namei.c

Adhoc c012808e <zap_page_range+5e/260>

Do we need Rik's patch?

****************************************************************************

2.4.10+
patch-rml-2.4.10-preempt-kernel-1+
patch-rml-2.4.10-preempt-ptrace-and-jobs-fix+
patch-rml-2.4.10-preempt-stats-1+
inode.c-schedule.patch+
journal.c-1-patch+
kupdated-patch

32 clients started
...............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................+..........................................+.................+...........++.....................++.....++......+......+++++...++++++++.+++++++.++********************************
Throughput 38.9015 MB/sec (NB=48.6269 MB/sec  389.015 MBit/sec)
15.140u 60.640s 1:49.66 69.1%   0+0k 0+0io 911pf+0w
max load: 1654

Version 1.92a       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
SunWave1      1248M    84  99 16348  21  5746   6   142  99 23411  17 265.9   4
Latency               130ms    1868ms     192ms   88459us   54625us    3367ms
Version 1.92a       ------Sequential Create------ --------Random Create--------
SunWave1            -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  4941  65 +++++ +++ 12916  96  6847  76 +++++ +++ 10785  94
Latency              8468us   11334us   11736us    8520us   12205us   12856us
1.92a,1.92a,SunWave1,1001358471,1248M,84,99,16348,21,5746,6,142,99,23411,17,265.9,4,16,,,,,4941,65,+++++,+++,12916,96,6847,76,+++++,+++,10785,94,130ms,1868ms,192ms,88459us,54625us,3367ms,8468us,11334us,11736us,8520us,12205us,12856us

Dbench run during MP3 playback:
32 clients started
.................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................+.........+.........................................+.+.....+...............+...........................................................................+...................++...........+.+.......+..++++.++.+.++.++++++.+++++********************************
Throughput 34.7025 MB/sec (NB=43.3782 MB/sec  347.025 MBit/sec)
15.290u 63.820s 2:02.74 64.4%   0+0k 0+0io 911pf+0w

Worst 20 latency times of 16578 measured in this period.
  usec      cause     mask   start line/file      address   end line/file
 26795  spin_lock        1   291/buffer.c        c0141a2c   280/buffer.c
 17330  spin_lock        1   341/vmscan.c        c0133f0a   402/vmscan.c
 12925  spin_lock        1   439/vmscan.c        c0133ea5   338/vmscan.c
 11923  spin_lock        1   291/buffer.c        c0141a2c   285/buffer.c
  7253        BKL        0  1302/inode.c         c016faa9  1381/sched.c
  7117        BKL        1  1302/inode.c         c016faa9   697/sched.c
  6097        BKL        0  1302/inode.c         c016faa9  1380/sched.c
  6000        BKL        1   533/inode.c         c016e11d   697/sched.c
  4870   reacqBKL        1  1375/sched.c         c0115334   929/namei.c
  4015  spin_lock        0   439/vmscan.c        c0133ea5   402/vmscan.c
  2075        BKL        1   452/exit.c          c011b4d1   697/sched.c
  2029  spin_lock        1   547/sched.c         c0113574   697/sched.c
  2010        BKL        0  1302/inode.c         c016faa9   842/inode.c
  1730        BKL        0  2754/buffer.c        c01415ba  2757/buffer.c
  1668        BKL        1  2754/buffer.c        c01415ba   697/sched.c
  1574  spin_lock        0   483/dcache.c        c01545da   520/dcache.c
  1416  spin_lock        0  1376/sched.c         c0115353  1380/sched.c
  1396  spin_lock        1  1376/sched.c         c0115353   697/sched.c
  1387    aic7xxx        1    76/softirq.c       c011cba4   119/softirq.c
  1341        BKL        1   533/inode.c         c016e11d   842/inode.c

Adhoc c0141a2c <kupdate+11c/210>
Adhoc c0133f0a <shrink_cache+37a/5b0>
Adhoc c0133ea4 <shrink_cache+314/5b0>
Adhoc c0141a2c <kupdate+11c/210>
Adhoc c016faa8 <reiserfs_dirty_inode+58/f0>
Adhoc c016faa8 <reiserfs_dirty_inode+58/f0>
Adhoc c016faa8 <reiserfs_dirty_inode+58/f0>
Adhoc c016e11c <reiserfs_get_block+9c/f30>
Adhoc c0115334 <preempt_schedule+34/b0>
Adhoc c0133ea4 <shrink_cache+314/5b0>
Adhoc c011b4d0 <do_exit+130/360>
Adhoc c0113574 <schedule+34/550>
Adhoc c016faa8 <reiserfs_dirty_inode+58/f0>
Adhoc c01415ba <sync_old_buffers+2a/130>
Adhoc c01415ba <sync_old_buffers+2a/130>
Adhoc c01545da <select_parent+3a/100>
Adhoc c0115352 <preempt_schedule+52/b0>
Adhoc c0115352 <preempt_schedule+52/b0>
Adhoc c011cba4 <do_softirq+34/150>
Adhoc c016e11c <reiserfs_get_block+9c/f30>

Redo after some seconds:

Worst 20 latency times of 1944 measured in this period.
  usec      cause     mask   start line/file      address   end line/file
  2028        BKL        1  1302/inode.c         c016faa9   697/sched.c
   584        BKL        0  1302/inode.c         c016faa9  1306/inode.c
   572  spin_lock        0  1376/sched.c         c0115353  1380/sched.c
   415        BKL        0  1302/inode.c         c016faa9  1381/sched.c
   356        BKL        0  2754/buffer.c        c01415ba  2757/buffer.c
   353        BKL        0  2754/buffer.c        c01415ba   697/sched.c
   328        BKL        0  2754/buffer.c        c01415ba  1381/sched.c
   278  spin_lock        0   381/memory.c        c012808f   402/memory.c
   274   reacqBKL        0  1375/sched.c         c0115334  1381/sched.c
   245  spin_lock        0   547/sched.c         c0113574   697/sched.c
   208       eth1        0   585/irq.c           c01089af   647/irq.c
   188        BKL        1   301/namei.c         c014a4b1   697/sched.c
   176        BKL        1   927/namei.c         c014b9bf   929/namei.c
   161        BKL        0   301/namei.c         c014a4b1  1380/sched.c
   154        BKL        0   533/inode.c         c016e11d   842/inode.c
   147        BKL        6   712/tty_io.c        c01a6f6b   714/tty_io.c
   141        BKL        0   301/namei.c         c014a4b1  1381/sched.c
   141        BKL        0    30/inode.c         c016d5a1    52/inode.c
   126   reacqBKL        0  1375/sched.c         c0115334  2757/buffer.c
   121   reacqBKL        0  1375/sched.c         c0115334   929/namei.c

Adhoc c016faa8 <reiserfs_dirty_inode+58/f0>
Adhoc c016faa8 <reiserfs_dirty_inode+58/f0>
Adhoc c0115352 <preempt_schedule+52/b0>
Adhoc c016faa8 <reiserfs_dirty_inode+58/f0>
Adhoc c01415ba <sync_old_buffers+2a/130>
Adhoc c01415ba <sync_old_buffers+2a/130>
Adhoc c01415ba <sync_old_buffers+2a/130>
Adhoc c012808e <zap_page_range+5e/260>
Adhoc c0115334 <preempt_schedule+34/b0>
Adhoc c0113574 <schedule+34/550>
Adhoc c01089ae <do_IRQ+3e/1d0>
Adhoc c014a4b0 <real_lookup+70/150>
Adhoc c014b9be <vfs_create+ae/150>
Adhoc c014a4b0 <real_lookup+70/150>
Adhoc c016e11c <reiserfs_get_block+9c/f30>
Adhoc c01a6f6a <tty_write+21a/2f0>
Adhoc c014a4b0 <real_lookup+70/150>
Adhoc c016d5a0 <reiserfs_delete_inode+30/110>
Adhoc c0115334 <preempt_schedule+34/b0>
Adhoc c0115334 <preempt_schedule+34/b0>


end of thread, other threads:[~2001-09-25 22:41 UTC | newest]

Thread overview: 32+ messages
2001-09-24 14:09 [PATCH] 2.4.10 improved reiserfs a lot, but could still be better Beau Kuiper
2001-09-24 14:46 ` [reiserfs-list] " Chris Mason
2001-09-24 15:32 ` Matthias Andree
2001-09-24 15:45   ` Alan Cox
2001-09-24 15:47     ` Matthias Andree
2001-09-24 16:08       ` Alan Cox
2001-09-24 16:08         ` [reiserfs-list] " Chris Dukes
2001-09-24 16:54         ` Matthias Andree
2001-09-24 16:15   ` Nicholas Knight
2001-09-24 16:40     ` [reiserfs-list] " Lehmann 
2001-09-24 16:53     ` Matthias Andree
2001-09-24 16:57       ` [reiserfs-list] " Lehmann 
2001-09-25 14:04         ` bill davidsen
2001-09-25 17:39           ` bill davidsen
2001-09-24 20:05       ` Nicholas Knight
2001-09-25  0:11         ` Matthias Andree
2001-09-25  4:49           ` Nicholas Knight
2001-09-25  6:00             ` Beau Kuiper
2001-09-25  6:17               ` Nicholas Knight
2001-09-25 10:44               ` Matthias Andree
2001-09-25 11:01                 ` ben-lists
2001-09-25 10:42             ` Matthias Andree
2001-09-25 11:07               ` Nicholas Knight
2001-09-25 14:47           ` Alex Bligh - linux-kernel
2001-09-25 15:13             ` Matthias Andree
2001-09-25 15:23             ` John Alvord
2001-09-25 22:41               ` bill davidsen
2001-09-25 12:54     ` Jorge Nerín
2001-09-25 13:06       ` [reiserfs-list] " Chris Mason
2001-09-25 13:17       ` Matthias Andree
  -- strict thread matches above, loose matches on Subject: below --
2001-09-24 21:58 Dieter Nützel
2001-09-25  0:19 ` Matthias Andree
