Re: 2x Oracle slowdown from 2.2.16 to 2.4.4

public inbox for linux-kernel@vger.kernel.org

* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4
       [not found] <Pine.LNX.4.21.0107111530170.2342-100000@llarsh-pc3.us.oracle.com.suse.lists.linux.kernel>
@ 2001-07-12 10:14 ` Andi Kleen
  2001-07-12 14:22   ` Chris Mason
  2001-07-12 16:09   ` Lance Larsh
  0 siblings, 2 replies; 25+ messages in thread
From: Andi Kleen @ 2001-07-12 10:14 UTC (permalink / raw)
  To: llarsh; +Cc: linux-kernel, mason

Lance Larsh <llarsh@oracle.com> writes:
> 
> I ran lots of iozone tests which illustrated a huge difference in write
> throughput between reiser and ext2.  Chris Mason sent me a patch which
> improved the reiser case (removing an unnecessary commit), but it was
> still noticeably slower than ext2.  Therefore I would recommend that
> at this time reiser should not be used for Oracle database files.

If I read the 2.4.6 reiserfs code correctly, reiserfs does not cause
any transactions for reads/writes to already-allocated blocks, i.e. as long as you're
not extending the file, not filling holes and not updating atimes.
My understanding is that this is normally true for Oracle, but probably
not for iozone, so it would be better to benchmark random writes
to an already-allocated file.
The 2.4 page cache is more or less direct write-through in this case.
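A minimal sketch of the benchmark suggested above, assuming a POSIX system with pwrite() and O_DSYNC. The file path, the 8KB block size and the helper names (preallocate, random_rewrites, file_size) are illustrative assumptions, not iozone's or Oracle's code. Every block is written once up front and then rewritten at random offsets, so the timed phase never extends the file or fills holes:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define BLOCK_SIZE 8192 /* assumed database block size */

/* Write every block once so that all blocks are allocated on disk. */
static int preallocate(const char *path, size_t nblocks)
{
    char buf[BLOCK_SIZE];
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    memset(buf, 0, sizeof buf);
    for (size_t i = 0; i < nblocks; i++) {
        if (write(fd, buf, sizeof buf) != (ssize_t)sizeof buf) {
            close(fd);
            return -1;
        }
    }
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Synchronously rewrite `count` randomly chosen, already-allocated blocks.
 * No O_CREAT/O_TRUNC/O_APPEND, so the file size never changes.
 * Returns the number of blocks rewritten, or -1 on error. */
static long random_rewrites(const char *path, size_t nblocks, long count,
                            unsigned seed)
{
    char buf[BLOCK_SIZE];
    assert(nblocks > 0);
    int fd = open(path, O_WRONLY | O_DSYNC);
    if (fd < 0)
        return -1;
    srand(seed);
    for (long i = 0; i < count; i++) {
        off_t off = (off_t)((size_t)rand() % nblocks) * BLOCK_SIZE;
        memset(buf, (int)(i & 0xff), sizeof buf);
        if (pwrite(fd, buf, sizeof buf, off) != (ssize_t)sizeof buf) {
            close(fd);
            return -1;
        }
    }
    close(fd);
    return count;
}

/* Current size of the file in bytes, or -1 on error. */
static off_t file_size(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? st.st_size : (off_t)-1;
}
```

Only the random_rewrites() call would be timed; file_size() is there to confirm that the rewrite phase left the allocation untouched.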

-Andi

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4
  2001-07-12 10:14 ` 2x Oracle slowdown from 2.2.16 to 2.4.4 Andi Kleen
@ 2001-07-12 14:22   ` Chris Mason
  2001-07-12 16:09   ` Lance Larsh
  1 sibling, 0 replies; 25+ messages in thread
From: Chris Mason @ 2001-07-12 14:22 UTC (permalink / raw)
  To: Andi Kleen, llarsh; +Cc: linux-kernel



On Thursday, July 12, 2001 12:14:16 PM +0200 Andi Kleen <freitag@alancoxonachip.com> wrote:

> Lance Larsh <llarsh@oracle.com> writes:
>> 
>> I ran lots of iozone tests which illustrated a huge difference in write
>> throughput between reiser and ext2.  Chris Mason sent me a patch which
>> improved the reiser case (removing an unnecessary commit), but it was
>> still noticeably slower than ext2.  Therefore I would recommend that
>> at this time reiser should not be used for Oracle database files.
> 
> If I read the 2.4.6 reiserfs code correctly, reiserfs does not cause
> any transactions for reads/writes to already-allocated blocks, i.e. as long as you're
> not extending the file, not filling holes and not updating atimes.
> My understanding is that this is normally true for Oracle, but probably
> not for iozone, so it would be better to benchmark random writes
> to an already-allocated file.
> The 2.4 page cache is more or less direct write-through in this case.
>

In general, yes.  But atime updates trigger transactions, and
O_SYNC/fsync writes (in 2.4.x reiserfs) always force a commit of
the current transaction.  The two patches I just posted should fix
that...

-chris







* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4
  2001-07-12 10:14 ` 2x Oracle slowdown from 2.2.16 to 2.4.4 Andi Kleen
  2001-07-12 14:22   ` Chris Mason
@ 2001-07-12 16:09   ` Lance Larsh
  1 sibling, 0 replies; 25+ messages in thread
From: Lance Larsh @ 2001-07-12 16:09 UTC (permalink / raw)
  To: Andi Kleen; +Cc: llarsh, linux-kernel, mason

[-- Attachment #1: Type: text/plain, Size: 928 bytes --]

Andi Kleen wrote:

> My understanding is that this is normally true for Oracle, but probably
> not for iozone so it would be better if you benchmarked random writes
> to an already allocated file.

You are correct that this is true for Oracle:  we preallocate the file at db create
time, and we use O_DSYNC to avoid atime updates.  The same is true for iozone:  it
performs writes to all the blocks (creating the file and allocating blocks), then
rewrites all of the blocks.  The write and rewrite times are measured and reported
separately.  Naturally, we only care about the rewrite times, and those are the
results I'm quoting when I casually use the term "writes".  Also, we pass the "-o"
option to iozone, which causes it to open the file with O_SYNC (which on Linux is
really O_DSYNC), just like Oracle does.  So, the mode I'm running iozone in really
does model Oracle i/o.  Sorry if that wasn't clear.
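The iozone run described above (an allocating write pass followed by a rewrite pass over the same blocks, both synchronous, timed separately) can be modeled roughly as below. This is a sketch, not iozone's source: the record size, the use of O_DSYNC for both passes and the helper names are assumptions, and close() is deliberately left out of the timing.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define REC_SIZE 4096 /* assumed record size */

/* Monotonic wall-clock time in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* One sequential pass writing nrecs records with the given open flags.
 * Returns the elapsed time of the write loop in seconds, or -1.0 on error. */
static double timed_pass(const char *path, size_t nrecs, int flags)
{
    char buf[REC_SIZE];
    memset(buf, 'x', sizeof buf);
    int fd = open(path, flags, 0600);
    if (fd < 0)
        return -1.0;
    double start = now_seconds();
    for (size_t i = 0; i < nrecs; i++) {
        if (write(fd, buf, sizeof buf) != (ssize_t)sizeof buf) {
            close(fd);
            return -1.0;
        }
    }
    double elapsed = now_seconds() - start;
    close(fd);
    return elapsed;
}

struct phase_times {
    double write_s;   /* first pass: creates the file, allocates blocks */
    double rewrite_s; /* second pass: overwrites already-allocated blocks */
};

/* Both passes use O_DSYNC, mirroring "iozone -o" as described above. */
static int write_then_rewrite(const char *path, size_t nrecs,
                              struct phase_times *out)
{
    assert(out != NULL);
    out->write_s = timed_pass(path, nrecs, O_WRONLY | O_CREAT | O_TRUNC | O_DSYNC);
    out->rewrite_s = timed_pass(path, nrecs, O_WRONLY | O_DSYNC);
    return (out->write_s < 0.0 || out->rewrite_s < 0.0) ? -1 : 0;
}
```

Only rewrite_s corresponds to the numbers being discussed in this thread; write_s includes block allocation.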

Thanks,
Lance

[-- Attachment #2: Card for Lance Larsh --]
[-- Type: text/x-vcard, Size: 367 bytes --]

begin:vcard 
n:Larsh;Lance
x-mozilla-html:FALSE
url:http://www.oracle.com
org:Oracle Corporation
version:2.1
email;internet:Lance.Larsh@oracle.com
title:Principal Software Engineer
adr;quoted-printable:;;500 Oracle Pkwy=0D=0AMS 401ip4;Redwood Shores;CA;94065;
x-mozilla-cpt:;6896
fn:Lance Larsh
end:vcard


* 2x Oracle slowdown from 2.2.16 to 2.4.4
@ 2001-07-11  0:45 Brian Strand
  2001-07-11  1:15 ` Andrea Arcangeli
                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Brian Strand @ 2001-07-11  0:45 UTC (permalink / raw)
  To: linux-kernel

We are running 3 Oracle servers, each dual-CPU, with 1GB (one server) or
2GB (two servers) of memory and between 36 and 180GB of RAID.  On June 26,
I upgraded all boxes from Suse 7.0 to Suse 7.2 (going from kernel version
2.2.16-40 to 2.4.4-14).  Reviewing Oracle job times (jobs range from a few
minutes to 10 hours) before and after, jobs take almost exactly twice as
long after the upgrade as before.  Nothing in the hardware or Oracle
configuration has changed on any server.  Does anyone have any ideas as
to what might cause this?

Thanks,
Brian Strand
CTO Switch Management


* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4
  2001-07-11  0:45 Brian Strand
@ 2001-07-11  1:15 ` Andrea Arcangeli
  2001-07-11 16:44   ` Brian Strand
  2001-07-11  2:58 ` Jeff V. Merkey
  2001-07-11  2:59 ` Jeff V. Merkey
  2 siblings, 1 reply; 25+ messages in thread
From: Andrea Arcangeli @ 2001-07-11  1:15 UTC (permalink / raw)
  To: Brian Strand; +Cc: linux-kernel

On Tue, Jul 10, 2001 at 05:45:16PM -0700, Brian Strand wrote:
> We are running 3 Oracle servers, each dual-CPU, with 1GB (one server) or
> 2GB (two servers) of memory and between 36 and 180GB of RAID.  On June 26,
> I upgraded all boxes from Suse 7.0 to Suse 7.2 (going from kernel version
> 2.2.16-40 to 2.4.4-14).  Reviewing Oracle job times (jobs range from a few
> minutes to 10 hours) before and after, jobs take almost exactly twice as
> long after the upgrade as before.  Nothing in the hardware or Oracle
> configuration has changed on any server.  Does anyone have any ideas as
> to what might cause this?

We need to narrow down the problem. How are you using Oracle?  Through a
filesystem? If so, which one? Or with rawio?  Is your workload cached
most of the time or not?

thanks,
Andrea


* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4
  2001-07-11  1:15 ` Andrea Arcangeli
@ 2001-07-11 16:44   ` Brian Strand
  2001-07-11 17:08     ` Andrea Arcangeli
  2001-07-11 23:03     ` Lance Larsh
  0 siblings, 2 replies; 25+ messages in thread
From: Brian Strand @ 2001-07-11 16:44 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel

Andrea Arcangeli wrote:

>We need to narrow down the problem. How are you using Oracle?  Through a
>filesystem? If so, which one? Or with rawio?  Is your workload cached
>most of the time or not?
>
Our Oracle configuration is on reiserfs on lvm on Mylex.  Our workload 
is not entirely cached, as we are working against an 8GB table, Oracle 
is configured to use slightly more than 1GB of memory, and there is 
always several MB/s of IO going on during our queries.  The "working 
set" of the main table and indexes occupies over 2GB.

Many Thanks,
Brian Strand
CTO Switch Management




* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4
  2001-07-11 16:44   ` Brian Strand
@ 2001-07-11 17:08     ` Andrea Arcangeli
  2001-07-11 17:23       ` Chris Mason
  2001-07-11 23:03     ` Lance Larsh
  1 sibling, 1 reply; 25+ messages in thread
From: Andrea Arcangeli @ 2001-07-11 17:08 UTC (permalink / raw)
  To: Brian Strand; +Cc: linux-kernel

On Wed, Jul 11, 2001 at 09:44:19AM -0700, Brian Strand wrote:
> Our Oracle configuration is on reiserfs on lvm on Mylex.  Our workload 
> is not entirely cached, as we are working against an 8GB table, Oracle 
> is configured to use slightly more than 1GB of memory, and there is 
> always several MB/s of IO going on during our queries.  The "working 
> set" of the main table and indexes occupies over 2GB.

As I suspected, the VM is in our way. Reiserfs could also be an
issue, but I am not aware of any regression on the reiserfs side. Chris?

I tend to believe it is a VM regression (and I admit this is what I
would have bet as soon as I read your report, before I was sure the VM
was in our way).

One way to verify this would be to run Oracle on top of rawio and then
on ext2. If it's the VM, you should still see the slowdown on ext2,
and with rawio you should run as fast as on 2.2. Most people use Oracle on
top of rawio on top of lvm, and incidentally this is the first
slowdown report I have received about 2.4 compared to 2.2.

Andrea


* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4
  2001-07-11 17:08     ` Andrea Arcangeli
@ 2001-07-11 17:23       ` Chris Mason
  0 siblings, 0 replies; 25+ messages in thread
From: Chris Mason @ 2001-07-11 17:23 UTC (permalink / raw)
  To: Andrea Arcangeli, Brian Strand; +Cc: linux-kernel



On Wednesday, July 11, 2001 07:08:21 PM +0200 Andrea Arcangeli
<andrea@suse.de> wrote:

> On Wed, Jul 11, 2001 at 09:44:19AM -0700, Brian Strand wrote:
>> Our Oracle configuration is on reiserfs on lvm on Mylex.  Our workload 
>> is not entirely cached, as we are working against an 8GB table, Oracle 
>> is configured to use slightly more than 1GB of memory, and there is 
>> always several MB/s of IO going on during our queries.  The "working 
>> set" of the main table and indexes occupies over 2GB.
> 
> As I suspected, the VM is in our way. Reiserfs could also be an
> issue, but I am not aware of any regression on the reiserfs side. Chris?

reiserfs has a big O_SYNC penalty right now, which can be fixed by a
transaction tracking patch I posted a month or so ago.  A few people have
tested it and reported a large improvement.  Brian, I'll update it for 2.4.6
and send it along.

-chris



* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4
  2001-07-11 16:44   ` Brian Strand
  2001-07-11 17:08     ` Andrea Arcangeli
@ 2001-07-11 23:03     ` Lance Larsh
  2001-07-11 23:46       ` Brian Strand
                         ` (3 more replies)
  1 sibling, 4 replies; 25+ messages in thread
From: Lance Larsh @ 2001-07-11 23:03 UTC (permalink / raw)
  To: Brian Strand; +Cc: Andrea Arcangeli, linux-kernel

On Wed, 11 Jul 2001, Brian Strand wrote:

> Our Oracle configuration is on reiserfs on lvm on Mylex.

I can pretty much tell you it's the reiser+lvm combination that is hurting
you here.  At the 2.5 kernel summit a few months back, I reported that
some of our servers experienced as much as 10-15x slowdown after we moved
to 2.4.  As it turned out, the problem was that the new servers (with
identical hardware to the old servers) were configured to use reiser+lvm,
whereas the older servers were using ext2 without lvm.  When we rebuilt
the new servers with ext2 alone, the problem disappeared.  (Note that we
also tried reiserfs without lvm, which was 5-6x slower than ext2 without
lvm.)

I ran lots of iozone tests which illustrated a huge difference in write
throughput between reiser and ext2.  Chris Mason sent me a patch which
improved the reiser case (removing an unnecessary commit), but it was
still noticeably slower than ext2.  Therefore I would recommend that
at this time reiser should not be used for Oracle database files.

Thanks,
Lance


* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4
  2001-07-11 23:03     ` Lance Larsh
@ 2001-07-11 23:46       ` Brian Strand
  2001-07-12 15:21         ` Lance Larsh
  2001-07-12  0:23       ` Chris Mason
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 25+ messages in thread
From: Brian Strand @ 2001-07-11 23:46 UTC (permalink / raw)
  To: Lance Larsh; +Cc: Andrea Arcangeli, linux-kernel

Lance Larsh wrote:

>On Wed, 11 Jul 2001, Brian Strand wrote:
>
>>Our Oracle configuration is on reiserfs on lvm on Mylex.
>>
>I can pretty much tell you it's the reiser+lvm combination that is hurting
>you here.  At the 2.5 kernel summit a few months back, I reported that
>
Why did it get so much worse going from 2.2.16 to 2.4.4, with an 
otherwise-identical configuration?  We had reiserfs+lvm under 2.2.16 too.

>
>some of our servers experienced as much as 10-15x slowdown after we moved
>to 2.4.  As it turned out, the problem was that the new servers (with
>identical hardware to the old servers) were configured to use reiser+lvm,
>whereas the older servers were using ext2 without lvm.  When we rebuilt
>the new servers with ext2 alone, the problem disappeared.  (Note that we
>also tried reiserfs without lvm, which was 5-6x slower than ext2 without
>lvm.)
>
>I ran lots of iozone tests which illustrated a huge difference in write
>throughput between reiser and ext2.  Chris Mason sent me a patch which
>improved the reiser case (removing an unnecessary commit), but it was
>still noticeably slower than ext2.  Therefore I would recommend that
>at this time reiser should not be used for Oracle database files.
>
How do ext2+lvm, rawio+lvm, ext2 w/o lvm, and rawio w/o lvm compare in 
terms of Oracle performance?  I am going to try a migration if 2.4.6 
doesn't make everything better; do you have any suggestions as to the 
relative performance of each strategy?

Thanks,
Brian




* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4
  2001-07-11 23:46       ` Brian Strand
@ 2001-07-12 15:21         ` Lance Larsh
  2001-07-12 21:31           ` Hans Reiser
  2001-07-13  3:00           ` Andrew Morton
  0 siblings, 2 replies; 25+ messages in thread
From: Lance Larsh @ 2001-07-12 15:21 UTC (permalink / raw)
  To: Brian Strand; +Cc: Andrea Arcangeli, linux-kernel

On Wed, 11 Jul 2001, Brian Strand wrote:

> Why did it get so much worse going from 2.2.16 to 2.4.4, with an
> otherwise-identical configuration?  We had reiserfs+lvm under 2.2.16 too.

Don't have an answer to that.  I never tried reiser on 2.2.

> How do ext2+lvm, rawio+lvm, ext2 w/o lvm, and rawio w/o lvm compare in
> terms of Oracle performance?  I am going to try a migration if 2.4.6
> doesn't make everything better; do you have any suggestions as to the
> relative performance of each strategy?

The best answer I can give at the moment is to use either ext2 or rawio,
and you might want to avoid lvm for now.

I never ran any of the lvm configurations myself.  What little I know
about lvm performance is conjecture based on comparing my reiser results
(5-6x slower than ext2) to the reiser+lvm results from one of our other
internal groups (10-15x slower than ext2).  So, although it looks like lvm
throws in a factor of 2-3x slowdown when using reiser, I don't think we
can assume lvm slows down ext2 by the same amount or else someone probably
would have noticed by now.  Perhaps there's something that sort of
resonates between reiser and lvm to cause the combination to be
particularly bad.  Just guessing...
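As a rough check of the "2-3x" figure, write S for a configuration's slowdown relative to ext2 without lvm, and assume the two groups' numbers (reiser alone 5-6x, reiser+lvm 10-15x) are directly comparable, which the paragraph itself treats as conjecture:

```latex
\[
\frac{S_{\text{reiser+lvm}}}{S_{\text{reiser}}}:\qquad
\frac{10}{5} = 2, \qquad \frac{15}{6} = 2.5, \qquad
\frac{10}{6} \approx 1.7 \;\le\; \frac{S_{\text{reiser+lvm}}}{S_{\text{reiser}}} \;\le\; \frac{15}{5} = 3 .
\]
```

Pairing low with low and high with high gives 2-2.5x; the widest envelope is about 1.7-3x, so "2-3x" is a fair summary of the data quoted.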

And while we're talking about comparing configurations, I'll mention that
I'm currently trying to compare raw and ext2 (no lvm in either case).
Although raw should be faster than fs, we're seeing some strange results:
it looks like ext2 can be as much as 2x faster than raw for reads, though
I'm not confident that these results are accurate.  The fs might still be
getting a boost from the fs cache, even though we've tried to eliminate
that possibility by sizing things appropriately.

Has anyone else seen results like this, or can anyone think of a
possible explanation?
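One common heuristic for "sizing things appropriately", offered here as an assumption rather than what Lance's group actually did, is to make the test file several times larger than physical memory so the page cache cannot hold most of it between passes. A sketch using sysconf(); note that _SC_PHYS_PAGES is a widely available extension (glibc, the BSDs) rather than strict POSIX, and the helper name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Size in bytes for a benchmark file `factor` times larger than physical
 * memory; returns 0 if the memory size cannot be determined. */
static uint64_t cache_defeating_size(unsigned factor)
{
    long pages = sysconf(_SC_PHYS_PAGES);
    long page_size = sysconf(_SC_PAGESIZE);
    assert(factor > 0);
    if (pages <= 0 || page_size <= 0)
        return 0;
    return (uint64_t)pages * (uint64_t)page_size * factor;
}
```

This only bounds what the kernel page cache can retain; it says nothing about caching inside the database or the RAID controller.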

Thanks,
Lance


* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4
  2001-07-12 15:21         ` Lance Larsh
@ 2001-07-12 21:31           ` Hans Reiser
  2001-07-12 21:51             ` Chris Mason
  2001-07-13  3:00           ` Andrew Morton
  1 sibling, 1 reply; 25+ messages in thread
From: Hans Reiser @ 2001-07-12 21:31 UTC (permalink / raw)
  To: Lance Larsh; +Cc: Brian Strand, Andrea Arcangeli, linux-kernel

Lance Larsh wrote:
> 
> On Wed, 11 Jul 2001, Brian Strand wrote:
> 
> > Why did it get so much worse going from 2.2.16 to 2.4.4, with an
> > otherwise-identical configuration?  We had reiserfs+lvm under 2.2.16 too.
> 
> Don't have an answer to that.  I never tried reiser on 2.2.
> 
> > How do ext2+lvm, rawio+lvm, ext2 w/o lvm, and rawio w/o lvm compare in
> > terms of Oracle performance?  I am going to try a migration if 2.4.6
> > doesn't make everything better; do you have any suggestions as to the
> > relative performance of each strategy?
> 
> The best answer I can give at the moment is to use either ext2 or rawio,
> and you might want to avoid lvm for now.
> 
> I never ran any of the lvm configurations myself.  What little I know
> about lvm performance is conjecture based on comparing my reiser results

Lance, I would appreciate it if you would be more careful to point out that you are using O_SYNC,
which is a special case we are not optimized for, and which I am frankly skeptical any application
should use instead of calling fsync judiciously.  It is rare for an application to be inherently
unable to let any two of its I/Os proceed unserialized, and using O_SYNC to force every I/O to be
serialized, rather than choosing when to call fsync, well, I have my doubts frankly.  If a user
really needs every operation to be synchronous, they should buy a system with an SSD for the
journal from applianceware.com (they sell them tuned to run ReiserFS), or else they are just going
to go real slow, no matter what the FS does.
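The two patterns contrasted above can be sketched side by side: one where every write() is synchronous because the file was opened with O_SYNC, and one where writes are buffered and the application calls fsync() only at points it chooses (here, every batch_size records). The helper names and record size are assumptions; the batched version is only appropriate when the application can tolerate writes inside a batch not being durable until the fsync() returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define REC_SIZE 4096 /* assumed record size */

/* Every write() returns only once the data is on stable storage.
 * Returns the number of synchronous points (one per write), or -1. */
static long write_all_osync(const char *path, size_t nrecs)
{
    char buf[REC_SIZE];
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0600);
    if (fd < 0)
        return -1;
    memset(buf, 'a', sizeof buf);
    for (size_t i = 0; i < nrecs; i++) {
        if (write(fd, buf, sizeof buf) != (ssize_t)sizeof buf) {
            close(fd);
            return -1;
        }
    }
    close(fd);
    return (long)nrecs;
}

/* Buffered writes; durability is requested once per batch (and once for
 * any final partial batch). Returns the number of fsync() calls, or -1. */
static long write_batched_fsync(const char *path, size_t nrecs,
                                size_t batch_size)
{
    char buf[REC_SIZE];
    long syncs = 0;
    assert(batch_size > 0);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    memset(buf, 'b', sizeof buf);
    for (size_t i = 0; i < nrecs; i++) {
        if (write(fd, buf, sizeof buf) != (ssize_t)sizeof buf) {
            close(fd);
            return -1;
        }
        if ((i + 1) % batch_size == 0 || i + 1 == nrecs) {
            if (fsync(fd) != 0) {
                close(fd);
                return -1;
            }
            syncs++;
        }
    }
    close(fd);
    return syncs;
}
```

For 10 records in batches of 4, the first version waits on the disk 10 times and the second 3 times.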


> (5-6x slower than ext2) to the reiser+lvm results from one of our other
> internal groups (10-15x slower than ext2).  So, although it looks like lvm
> throws in a factor of 2-3x slowdown when using reiser, I don't think we
> can assume lvm slows down ext2 by the same amount or else someone probably
> would have noticed by now.  Perhaps there's something that sort of
> resonates between reiser and lvm to cause the combination to be
> particularly bad.  Just guessing...
> 
> And while we're talking about comparing configurations, I'll mention that
> I'm currently trying to compare raw and ext2 (no lvm in either case).
> Although raw should be faster than fs, we're seeing some strange results:
> it looks like ext2 can be as much as 2x faster than raw for reads, though
> I'm not confident that these results are accurate.  The fs might still be
> getting a boost from the fs cache, even though we've tried to eliminate
> that possibility by sizing things appropriately.
> 
> Has anyone else seen results like this, or can anyone think of a
> possible explanation?
> 
> Thanks,
> Lance
> 


* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4
  2001-07-12 21:31           ` Hans Reiser
@ 2001-07-12 21:51             ` Chris Mason
  0 siblings, 0 replies; 25+ messages in thread
From: Chris Mason @ 2001-07-12 21:51 UTC (permalink / raw)
  To: Hans Reiser, Lance Larsh; +Cc: Brian Strand, Andrea Arcangeli, linux-kernel



On Friday, July 13, 2001 01:31:42 AM +0400 Hans Reiser <reiser@namesys.com> wrote:

> Lance, I would appreciate it if you would be more careful to point out that you are using O_SYNC,
> which is a special case we are not optimized for, and which I am frankly skeptical any application
> should use instead of calling fsync judiciously.  It is rare for an application to be inherently
> unable to let any two of its I/Os proceed unserialized, and using O_SYNC to force every I/O to be
> serialized, rather than choosing when to call fsync, well, I have my doubts frankly.  If a user
> really needs every operation to be synchronous, they should buy a system with an SSD for the
> journal from applianceware.com (they sell them tuned to run ReiserFS), or else they are just going
> to go real slow, no matter what the FS does.
> 

There is no reason for reiserfs to be 5 times slower than ext2 at anything ;-)
regardless of whether O_SYNC is a good idea or not.  I should have optimized the
original code for this case, as Oracle is reason enough to do it.

-chris



* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4
  2001-07-12 15:21         ` Lance Larsh
  2001-07-12 21:31           ` Hans Reiser
@ 2001-07-13  3:00           ` Andrew Morton
  2001-07-13  4:17             ` Andrew Morton
  1 sibling, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2001-07-13  3:00 UTC (permalink / raw)
  To: Lance Larsh; +Cc: Brian Strand, Andrea Arcangeli, linux-kernel

Lance Larsh wrote:
> 
> And while we're talking about comparing configurations, I'll mention that
> I'm currently trying to compare raw and ext2 (no lvm in either case).

It would be interesting to see some numbers for ext3 with full
data journalling.

Some preliminary testing by Neil Brown shows that ext3 is 1.5x faster
than ext2 when used with knfsd, mounted synchronously.  (This uses
O_SYNC internally).

The reason is that all the data and metadata are written to a
contiguous area of the disk: no seeks apart from the seek to the
journal are needed.  Once the metadata and data are committed to
the journal, the O_SYNC (or fsync()) caller is allowed to continue.
Checkpointing of the data and metadata into the main filesystem is
allowed to proceed via normal writeback.
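The mechanism described above can be illustrated in user space as a tiny write-ahead log: a synchronous update is appended to a sequential log and acknowledged once the log alone is flushed, while copying records to their home locations (checkpointing) happens later. This is a simplified model with a hypothetical record format and helper names, not how ext3's journal is actually laid out:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define REC_DATA 512 /* assumed payload size per log record */

struct log_rec {
    uint64_t offset;     /* home location in the main file */
    char data[REC_DATA]; /* payload to be written there later */
};

/* Synchronous part: append one record to the log and flush only the log.
 * log_fd must be opened O_RDWR | O_APPEND so appends always land at the
 * end, even after a checkpoint truncates the log. `data` must point at
 * REC_DATA bytes. The caller may continue as soon as this returns 0. */
static int log_append(int log_fd, uint64_t offset, const char *data)
{
    struct log_rec rec;
    memset(&rec, 0, sizeof rec);
    rec.offset = offset;
    memcpy(rec.data, data, REC_DATA);
    if (write(log_fd, &rec, sizeof rec) != (ssize_t)sizeof rec)
        return -1;
    return fdatasync(log_fd);
}

/* Deferred part: replay every logged record into the main file (scattered
 * writes), flush it, then empty the log. Returns records applied, or -1. */
static long checkpoint(int log_fd, int main_fd)
{
    struct log_rec rec;
    long applied = 0;
    if (lseek(log_fd, 0, SEEK_SET) < 0)
        return -1;
    while (read(log_fd, &rec, sizeof rec) == (ssize_t)sizeof rec) {
        if (pwrite(main_fd, rec.data, REC_DATA, (off_t)rec.offset) != REC_DATA)
            return -1;
        applied++;
    }
    if (fsync(main_fd) != 0 || ftruncate(log_fd, 0) != 0)
        return -1;
    return applied;
}
```

The caller of log_append() pays for one sequential flush; the seeks into main_fd are deferred to checkpoint(), which can run whenever convenient.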

Make sure that you're using a *big* journal though.   Use the
`-J size=400' option with tune2fs or mke2fs.

-


* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4
  2001-07-13  3:00           ` Andrew Morton
@ 2001-07-13  4:17             ` Andrew Morton
  2001-07-13 15:36               ` Jeffrey W. Baker
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2001-07-13  4:17 UTC (permalink / raw)
  To: Lance Larsh, Brian Strand, Andrea Arcangeli, linux-kernel

Andrew Morton wrote:
> 
> Lance Larsh wrote:
> >
> > And while we're talking about comparing configurations, I'll mention that
> > I'm currently trying to compare raw and ext2 (no lvm in either case).
> 
> It would be interesting to see some numbers for ext3 with full
> data journalling.
> 
> Some preliminary testing by Neil Brown shows that ext3 is 1.5x faster
> than ext2 when used with knfsd, mounted synchronously.  (This uses
> O_SYNC internally).

I just did some testing with local filesystems - running `dbench 4'
on ext2-on-IDE and ext3-on-IDE, where dbench was altered to open
files O_SYNC.  Journal size was 400 megs, mount options `data=journal'

ext2: Throughput 2.71849 MB/sec (NB=3.39812 MB/sec  27.1849 MBit/sec)
ext3: Throughput 12.3623 MB/sec (NB=15.4529 MB/sec  123.623 MBit/sec)

ext3 patches are at http://www.uow.edu.au/~andrewm/linux/ext3/

The difference will be less dramatic with large, individual writes.
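The dbench change mentioned above amounts to forcing O_SYNC onto every file the benchmark opens. Exactly how dbench was patched isn't stated; a hypothetical wrapper that its open calls could be routed through might look like this:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>

/* Hypothetical drop-in for open(): same arguments, but every file is
 * opened with O_SYNC, so each write() waits for stable storage. */
static int open_forced_sync(const char *path, int flags, mode_t mode)
{
    return open(path, flags | O_SYNC, mode);
}
```

For scale, the ext3 result above is about 4.5x the ext2 one (12.3623 / 2.71849 is roughly 4.55).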

Be aware though that ext3 breaks both RAID1 and RAID5.  This
RAID patch should help:


--- linux-2.4.6/drivers/md/raid1.c	Wed Jul  4 18:21:26 2001
+++ lk-ext3/drivers/md/raid1.c	Thu Jul 12 15:27:09 2001
@@ -46,6 +46,30 @@
 #define PRINTK(x...)  do { } while (0)
 #endif
 
+#define __raid1_wait_event(wq, condition) 				\
+do {									\
+	wait_queue_t __wait;						\
+	init_waitqueue_entry(&__wait, current);				\
+									\
+	add_wait_queue(&wq, &__wait);					\
+	for (;;) {							\
+		set_current_state(TASK_UNINTERRUPTIBLE);		\
+		if (condition)						\
+			break;						\
+		run_task_queue(&tq_disk);				\
+		schedule();						\
+	}								\
+	current->state = TASK_RUNNING;					\
+	remove_wait_queue(&wq, &__wait);				\
+} while (0)
+
+#define raid1_wait_event(wq, condition) 				\
+do {									\
+	if (condition)	 						\
+		break;							\
+	__raid1_wait_event(wq, condition);				\
+} while (0)
+
 
 static mdk_personality_t raid1_personality;
 static md_spinlock_t retry_list_lock = MD_SPIN_LOCK_UNLOCKED;
@@ -83,7 +107,7 @@ static struct buffer_head *raid1_alloc_b
 			cnt--;
 		} else {
 			PRINTK("raid1: waiting for %d bh\n", cnt);
-			wait_event(conf->wait_buffer, conf->freebh_cnt >= cnt);
+			raid1_wait_event(conf->wait_buffer, conf->freebh_cnt >= cnt);
 		}
 	}
 	return bh;
@@ -170,7 +194,7 @@ static struct raid1_bh *raid1_alloc_r1bh
 			memset(r1_bh, 0, sizeof(*r1_bh));
 			return r1_bh;
 		}
-		wait_event(conf->wait_buffer, conf->freer1);
+		raid1_wait_event(conf->wait_buffer, conf->freer1);
 	} while (1);
 }
 
--- linux-2.4.6/drivers/md/raid5.c	Wed Jul  4 18:21:26 2001
+++ lk-ext3/drivers/md/raid5.c	Thu Jul 12 21:31:55 2001
@@ -66,10 +66,11 @@ static inline void __release_stripe(raid
 			BUG();
 		if (atomic_read(&conf->active_stripes)==0)
 			BUG();
-		if (test_bit(STRIPE_DELAYED, &sh->state))
-			list_add_tail(&sh->lru, &conf->delayed_list);
-		else if (test_bit(STRIPE_HANDLE, &sh->state)) {
-			list_add_tail(&sh->lru, &conf->handle_list);
+		if (test_bit(STRIPE_HANDLE, &sh->state)) {
+			if (test_bit(STRIPE_DELAYED, &sh->state))
+				list_add_tail(&sh->lru, &conf->delayed_list);
+			else
+				list_add_tail(&sh->lru, &conf->handle_list);
 			md_wakeup_thread(conf->thread);
 		} else {
 			if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) {
@@ -1167,10 +1168,9 @@ static void raid5_unplug_device(void *da
 
 	raid5_activate_delayed(conf);
 	
-	if (conf->plugged) {
-		conf->plugged = 0;
-		md_wakeup_thread(conf->thread);
-	}	
+	conf->plugged = 0;
+	md_wakeup_thread(conf->thread);
+
 	spin_unlock_irqrestore(&conf->device_lock, flags);
 }


* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4
  2001-07-13  4:17             ` Andrew Morton
@ 2001-07-13 15:36               ` Jeffrey W. Baker
  2001-07-13 15:49                 ` Andrew Morton
  2001-07-16 22:03                 ` Stephen C. Tweedie
  0 siblings, 2 replies; 25+ messages in thread
From: Jeffrey W. Baker @ 2001-07-13 15:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Lance Larsh, Brian Strand, Andrea Arcangeli, linux-kernel

On Fri, 13 Jul 2001, Andrew Morton wrote:

> Andrew Morton wrote:
> >
> > Lance Larsh wrote:
> > >
> > > And while we're talking about comparing configurations, I'll mention that
> > > I'm currently trying to compare raw and ext2 (no lvm in either case).
> >
> > It would be interesting to see some numbers for ext3 with full
> > data journalling.
> >
> > Some preliminary testing by Neil Brown shows that ext3 is 1.5x faster
> > than ext2 when used with knfsd, mounted synchronously.  (This uses
> > O_SYNC internally).
>
> I just did some testing with local filesystems - running `dbench 4'
> on ext2-on-IDE and ext3-on-IDE, where dbench was altered to open
> files O_SYNC.  Journal size was 400 megs, mount options `data=journal'
>
> ext2: Throughput 2.71849 MB/sec (NB=3.39812 MB/sec  27.1849 MBit/sec)
> ext3: Throughput 12.3623 MB/sec (NB=15.4529 MB/sec  123.623 MBit/sec)
>
> ext3 patches are at http://www.uow.edu.au/~andrewm/linux/ext3/
>
> The difference will be less dramatic with large, individual writes.

This is a totally transient effect, right?  The journal acts as a faster
buffer, but if programs are writing a lot of data to the disk for a very
long time, the throughput will eventually be throttled by writing the
journal back into the filesystem.

For programs that write in bursts, it looks like a huge win!

-jwb



* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4
  2001-07-13 15:36               ` Jeffrey W. Baker
@ 2001-07-13 15:49                 ` Andrew Morton
  2001-07-16 22:03                 ` Stephen C. Tweedie
  1 sibling, 0 replies; 25+ messages in thread
From: Andrew Morton @ 2001-07-13 15:49 UTC (permalink / raw)
  To: Jeffrey W. Baker
  Cc: Lance Larsh, Brian Strand, Andrea Arcangeli, linux-kernel

"Jeffrey W. Baker" wrote:
> 
> > ...
> > ext2: Throughput 2.71849 MB/sec (NB=3.39812 MB/sec  27.1849 MBit/sec)
> > ext3: Throughput 12.3623 MB/sec (NB=15.4529 MB/sec  123.623 MBit/sec)
> >
> > ext3 patches are at http://www.uow.edu.au/~andrewm/linux/ext3/
> >
> > The difference will be less dramatic with large, individual writes.
> 
> This is a totally transient effect, right?  The journal acts as a faster
> buffer, but if programs are writing a lot of data to the disk for a very
> long time, the throughput will eventually be throttled by writing the
> journal back into the filesystem.

It varies a lot with workload.  With large writes such as 
'iozone -s 300m -a -i 0' it seems about the same throughput
as ext2.  It would take some time to characterise fully.

> For programs that write in bursts, it looks like a huge win!

Yes - lots of short writes (e.g. mail spools) will benefit considerably.
The benefits come from the additional merging and sorting which
can be performed on the writeback data.
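The merging and sorting mentioned above can be illustrated with a toy version of what happens to deferred writeback: sort the dirty block numbers, write a block that was dirtied several times only once, and merge runs of consecutive blocks into single extents. This is a simplified model with hypothetical names, not ext3 or block-layer code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct extent {
    unsigned long start; /* first block number */
    unsigned long len;   /* number of consecutive blocks */
};

static int cmp_ulong(const void *a, const void *b)
{
    unsigned long x = *(const unsigned long *)a;
    unsigned long y = *(const unsigned long *)b;
    return (x > y) - (x < y);
}

/* Sort dirty block numbers in place, skip duplicates, and merge runs of
 * consecutive blocks. `out` needs room for n entries. Returns the number
 * of extents, i.e. the number of I/Os needed for writeback. */
static size_t coalesce_dirty(unsigned long *blocks, size_t n,
                             struct extent *out)
{
    size_t next = 0;
    if (n == 0)
        return 0;
    assert(blocks != NULL && out != NULL);
    qsort(blocks, n, sizeof *blocks, cmp_ulong);
    out[0].start = blocks[0];
    out[0].len = 1;
    for (size_t i = 1; i < n; i++) {
        struct extent *cur = &out[next];
        if (blocks[i] == blocks[i - 1])
            continue; /* same block dirtied again: write it once */
        if (blocks[i] == cur->start + cur->len) {
            cur->len++; /* adjacent block: extend the current extent */
        } else {
            next++;
            out[next].start = blocks[i];
            out[next].len = 1;
        }
    }
    return next + 1;
}
```

For dirty blocks {7, 3, 4, 3, 10, 5, 11}, seven entries become three I/Os: blocks 3-5, block 7 and blocks 10-11.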

I suspect some of the dbench benefit comes from the fact that
the files are unlinked at the end of the test - if the data hasn't
been written back at that time the buffers are hunted down and
zapped - they *never* get written.

If anyone wants to test sync throughput, please be sure to use
0.9.3-pre - it fixes some rather sucky behaviour with large journals.

-


* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4
  2001-07-13 15:36               ` Jeffrey W. Baker
  2001-07-13 15:49                 ` Andrew Morton
@ 2001-07-16 22:03                 ` Stephen C. Tweedie
  1 sibling, 0 replies; 25+ messages in thread
From: Stephen C. Tweedie @ 2001-07-16 22:03 UTC (permalink / raw)
  To: Jeffrey W. Baker
  Cc: Andrew Morton, Lance Larsh, Brian Strand, Andrea Arcangeli,
	linux-kernel, Stephen Tweedie

Hi,

On Fri, Jul 13, 2001 at 08:36:01AM -0700, Jeffrey W. Baker wrote:

> > files O_SYNC.  Journal size was 400 megs, mount options `data=journal'
> >
> > ext2: Throughput 2.71849 MB/sec (NB=3.39812 MB/sec  27.1849 MBit/sec)
> > ext3: Throughput 12.3623 MB/sec (NB=15.4529 MB/sec  123.623 MBit/sec)
> >
> > The difference will be less dramatic with large, individual writes.
> 
> This is a totally transient effect, right?  The journal acts as a faster
> buffer, but if programs are writing a lot of data to the disk for a very
> long time, the throughput will eventually be throttled by writing the
> journal back into the filesystem.

Not for O_SYNC.  For ext2, *every* O_SYNC append to a file involves
seeking between inodes and indirect blocks and data blocks.  With ext3
with data journaling enabled, the synchronous part of the IO is a
single sequential write to the journal.  The async writeback will
affect throughput, yes, but since it is done in the background, it can
do tons of optimisations: if you extend a file a hundred times with
O_SYNC, then you are forced to journal the inode update a hundred
times but the writeback which occurs later need only be done once.

For async traffic, you're quite correct.  For synchronous traffic, the
writeback later on is still async, and the synchronous costs really do
often dominate, so the net effect over time is still a big win.
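Stephen's argument can be made concrete with a toy count of block writes. This is a minimal sketch under stated assumptions, not ext2 or ext3 code. It assumes that each append allocates one data block, that the data block, its indirect block and the inode sit in three different places on disk, and that journal commits land back to back. The names `ext2_osync`, `ext3_data_journal_osync` and `ADDRS_PER_INDIRECT` are hypothetical.

```python
"""Toy model of O_SYNC appends on ext2 vs ext3 data=journal.

Hypothetical assumptions: every append allocates one new data block; the
data block, its indirect block and the inode live in three different
places on disk; the journal is contiguous, so back-to-back commits need
no seek. This is an illustration, not filesystem source code.
"""

ADDRS_PER_INDIRECT = 1024  # assumed: 4 KiB indirect block of 4-byte pointers


def ext2_osync(appends):
    """ext2: each O_SYNC append waits for three scattered block writes."""
    return {
        "sync_seeks": 3 * appends,   # data block, indirect block, inode
        "sync_writes": 3 * appends,
        "writeback": 0,              # nothing is left for later
    }


def ext3_data_journal_osync(appends):
    """ext3 data=journal: each append waits for one sequential journal
    write; the later (async) writeback touches each dirty block once."""
    dirty = set()          # blocks still owed to their home location
    inode_journaled = 0
    for i in range(appends):
        dirty.add(("data", i))
        dirty.add(("indirect", i // ADDRS_PER_INDIRECT))
        dirty.add(("inode", 0))
        inode_journaled += 1     # the inode is logged in every transaction
    inode_writebacks = sum(1 for kind, _ in dirty if kind == "inode")
    return {
        "sync_seeks": 0,               # commits land back to back in the journal
        "sync_writes": appends,        # one contiguous journal write per call
        "inode_journaled": inode_journaled,
        "inode_writebacks": inode_writebacks,
        "writeback": len(dirty),       # each dirty block written back once
    }


if __name__ == "__main__":
    print(ext2_osync(100))
    print(ext3_data_journal_osync(100))
```

For 100 appends the model gives 300 synchronous seeks on ext2 against none on ext3, and on ext3 the inode is journaled 100 times but written back only once (102 writeback blocks in total: 100 data, 1 indirect, 1 inode).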

Cheers,
 Stephen

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4
  2001-07-11 23:03     ` Lance Larsh
  2001-07-11 23:46       ` Brian Strand
@ 2001-07-12  0:23       ` Chris Mason
  2001-07-12 14:48         ` Lance Larsh
  2001-07-12  2:30       ` Andrea Arcangeli
  2001-07-12  6:12       ` parviz dey
  3 siblings, 1 reply; 25+ messages in thread
From: Chris Mason @ 2001-07-12  0:23 UTC (permalink / raw)
  To: Lance Larsh, Brian Strand; +Cc: Andrea Arcangeli, linux-kernel



On Wednesday, July 11, 2001 04:03:09 PM -0700 Lance Larsh
<llarsh@oracle.com> wrote:

> I ran lots of iozone tests which illustrated a huge difference in write
> throughput between reiser and ext2.  Chris Mason sent me a patch which
> improved the reiser case (removing an unnecessary commit), but it was
> still noticeably slower than ext2.  Therefore I would recommend that
> at this time reiser should not be used for Oracle database files.
> 

Hi Lance,

Could I get a copy of the results from last benchmark you ran (with the
patch + noatime on reiserfs).  I'd like to close that gap...

-chris


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4
  2001-07-12  0:23       ` Chris Mason
@ 2001-07-12 14:48         ` Lance Larsh
  0 siblings, 0 replies; 25+ messages in thread
From: Lance Larsh @ 2001-07-12 14:48 UTC (permalink / raw)
  To: Chris Mason; +Cc: Brian Strand, Andrea Arcangeli, linux-kernel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 858 bytes --]

On Wed, 11 Jul 2001, Chris Mason wrote:

> Could I get a copy of the results from last benchmark you ran (with the
> patch + noatime on reiserfs).  I'd like to close that gap...

I have the results in an Excel spreadsheet, but I'm only attaching the
plot in postscript format to simplify things.  If you'd like me to send
you the .xls file, let me know.  Note that the results included here are
only for "rewrites", not "writes".

The most interesting things I see are:

1.  the reiser patch you sent me made a noticeable improvement, but it
didn't matter whether I used the noatime mount option or not.

2.  reiser has a reproducible spike in throughput at 4k i/o size, and it
even beats ext2 in that single case.

3.  (and sort of off topic...) ext2 throughput drifts slightly down for
i/o sizes >64k as we go from 2.4.0 -> 2.4.3 -> 2.4.4.
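The results above are for iozone's "rewrite" case, where the file already exists and its blocks are allocated. A rough stand-in for that measurement follows. It is a minimal sketch that assumes nothing about iozone's internals: it allocates the file once, then times in-place overwrites at a given record size. `rewrite_throughput` is a hypothetical helper, not part of iozone.

```python
import os
import time


def rewrite_throughput(path, file_size, record_size, sync=False):
    """Time an iozone-style "rewrite": overwrite an already-allocated file
    in place with `record_size` writes, so no blocks are allocated and the
    file never grows. Returns MB/s (10**6 bytes per second)."""
    if file_size % record_size:
        raise ValueError("file_size must be a multiple of record_size")
    # Setup pass (the "write" case): allocate every block and push it to disk.
    with open(path, "wb") as f:
        f.write(b"\1" * file_size)
        f.flush()
        os.fsync(f.fileno())
    buf = b"\0" * record_size
    flags = os.O_WRONLY | (os.O_SYNC if sync else 0)
    fd = os.open(path, flags)
    try:
        start = time.perf_counter()
        for offset in range(0, file_size, record_size):
            os.pwrite(fd, buf, offset)   # overwrite in place, no extension
        os.fsync(fd)                     # include the final flush in the timing
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return file_size / elapsed / 1e6
```

Calling it for record sizes from 4 KiB up to 1 MiB on reiserfs and on ext2 (both mounted `noatime`) gives a curve of the same kind as the attached plot. Unless the file is larger than RAM or `sync=True` is used, the page cache will dominate the numbers.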

Thanks,
Lance

[-- Attachment #2: Type: APPLICATION/postscript, Size: 51903 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4
  2001-07-11 23:03     ` Lance Larsh
  2001-07-11 23:46       ` Brian Strand
  2001-07-12  0:23       ` Chris Mason
@ 2001-07-12  2:30       ` Andrea Arcangeli
  2001-07-12  6:12       ` parviz dey
  3 siblings, 0 replies; 25+ messages in thread
From: Andrea Arcangeli @ 2001-07-12  2:30 UTC (permalink / raw)
  To: Lance Larsh; +Cc: Brian Strand, linux-kernel, lvm-devel

On Wed, Jul 11, 2001 at 04:03:09PM -0700, Lance Larsh wrote:
> some of our servers experienced as much as 10-15x slowdown after we moved
[..]
> also tried reiserfs without lvm, which was 5-6x slower than ext2 without

Hmm, so lvm introduced a significant slowdown too.

The only things that worry me in lvm are the down() calls in the
ll_rw_block and submit_bh fast paths, which should *obviously* be
converted to rwsem (the write lock is needed only while moving a PV
around or while taking a COW in a snapshotted device). This way the
common case in the fast paths will never wait for a lock. We inherited
those non-rw semaphores from the latest lvm release (the only thing
more recent than beta7 is CVS head).
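A minimal sketch of the reader-writer idea, written in Python rather than kernel C. The I/O mapping fast path takes the lock shared, and only the rare reconfiguration (moving a PV, setting up snapshot COW) takes it exclusively. The method names mirror the kernel's `down_read`/`up_read`/`down_write`/`up_write` rwsem calls, but this class is a hypothetical illustration, not the kernel implementation. It gives waiting writers preference so that a steady stream of readers cannot starve them.

```python
import threading


class RWSemaphore:
    """Hypothetical reader-writer lock illustrating rwsem semantics.

    Any number of readers (e.g. the I/O remap fast path) may hold it at
    once; a writer (e.g. a PV move) gets exclusive access. Waiting
    writers block new readers so they cannot be starved.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0           # readers currently holding the lock
        self._writer = False        # True while a writer holds the lock
        self._writers_waiting = 0   # writers blocked in down_write()

    def down_read(self):
        with self._cond:
            while self._writer or self._writers_waiting:
                self._cond.wait()
            self._readers += 1

    def up_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()   # a waiting writer may proceed

    def down_write(self):
        with self._cond:
            self._writers_waiting += 1
            while self._writer or self._readers:
                self._cond.wait()
            self._writers_waiting -= 1
            self._writer = True

    def up_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()       # wake readers and other writers
```

In the remap path one would bracket the lookup with `down_read()`/`up_read()` in a try/finally. Concurrent I/O then never serializes on the lock unless a writer is active or waiting, which is the behaviour the plain down() cannot offer.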

The down() calls in beta7 fix race conditions present in previous
releases, so they weren't pointless, but they were obviously a
suboptimal fix. When I saw them I was worried, but it was hard to tell
whether they could hurt in real life, and since until today nobody had
complained about lvm performance I assumed they weren't a problem. Your
feedback changes that.

I will soon make those changes to lvm (based on beta7) in my tree, and
it will be interesting to see whether they make a difference. I will
also look at whether lvm_map can be improved a little more, but apart
from those non-rw semaphores there should be no significant overhead to
remove in the lvm fast path.

Andrea

PS. Hint: if the down() were the problem, you should also see a higher
    context-switch rate with lvm+ext2 than with plain ext2.
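The hint can be checked by sampling the kernel's cumulative context-switch counter, the `ctxt` line of `/proc/stat`, while the same job runs on lvm+ext2 and on plain ext2. This is a minimal sketch with hypothetical helper names; the `cs` column of `vmstat` reports the same rate.

```python
import time


def read_ctxt(stat_text):
    """Return the cumulative context-switch counter from /proc/stat text."""
    for line in stat_text.splitlines():
        if line.startswith("ctxt "):
            return int(line.split()[1])
    raise ValueError("no 'ctxt' line found")


def ctxt_rate(interval=10.0, path="/proc/stat"):
    """Average context switches per second over `interval` seconds."""
    with open(path) as f:
        before = read_ctxt(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = read_ctxt(f.read())
    return (after - before) / interval
```

For example, call `ctxt_rate(10.0)` during the same job on each configuration. A clearly higher figure with lvm would point at the semaphores.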

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4
  2001-07-11 23:03     ` Lance Larsh
                         ` (2 preceding siblings ...)
  2001-07-12  2:30       ` Andrea Arcangeli
@ 2001-07-12  6:12       ` parviz dey
  3 siblings, 0 replies; 25+ messages in thread
From: parviz dey @ 2001-07-12  6:12 UTC (permalink / raw)
  To: Lance Larsh; +Cc: linux-kernel

Hey Lance,

Interesting stuff!!
Did you ever find out why this happens?
Any idea?

--- Lance Larsh <llarsh@oracle.com> wrote:
> On Wed, 11 Jul 2001, Brian Strand wrote:
> 
> > Our Oracle configuration is on reiserfs on lvm on Mylex.
> 
> I can pretty much tell you it's the reiser+lvm combination that is hurting
> you here.  At the 2.5 kernel summit a few months back, I reported that
> some of our servers experienced as much as 10-15x slowdown after we moved
> to 2.4.  As it turned out, the problem was that the new servers (with
> identical hardware to the old servers) were configured to use reiser+lvm,
> whereas the older servers were using ext2 without lvm.  When we rebuilt
> the new servers with ext2 alone, the problem disappeared.  (Note that we
> also tried reiserfs without lvm, which was 5-6x slower than ext2 without
> lvm.)
> 
> I ran lots of iozone tests which illustrated a huge difference in write
> throughput between reiser and ext2.  Chris Mason sent me a patch which
> improved the reiser case (removing an unnecessary commit), but it was
> still noticeably slower than ext2.  Therefore I would recommend that
> at this time reiser should not be used for Oracle database files.
> 
> Thanks,
> Lance
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4
  2001-07-11  0:45 Brian Strand
  2001-07-11  1:15 ` Andrea Arcangeli
@ 2001-07-11  2:58 ` Jeff V. Merkey
  2001-07-11 15:55   ` Brian Strand
  2001-07-11  2:59 ` Jeff V. Merkey
  2 siblings, 1 reply; 25+ messages in thread
From: Jeff V. Merkey @ 2001-07-11  2:58 UTC (permalink / raw)
  To: Brian Strand; +Cc: linux-kernel

On Tue, Jul 10, 2001 at 05:45:16PM -0700, Brian Strand wrote:
> We are running 3 Oracle servers, each dual CPU, 1 1GB and 2 2GB memory, 
>  between 36-180GB of RAID.  On June 26, I upgraded all boxes from Suse 
> 7.0 to Suse 7.2 (going from kernel version 2.2.16-40 to 2.4.4-14). 
>  Reviewing Oracle job times (jobs range from a few minutes to 10 hours) 
> before and after, performance is almost exactly twice as poor after the 
> upgrade versus before the upgrade.  Nothing in the hardware or Oracle 
> configuration has changed on any server.  Does anyone have any ideas as 
> to what might cause this?
> 
> Thanks,
> Brian Strand
> CTO Switch Management
> 
> 

Oracle performance depends critically on fast disk access.  Oracle is
virtually self-contained with regard to the subsystems it uses; it
provides most of its own.  Oracle slowdowns are caused either by
problems in the networking software (for remote SQL operations) or by
disk access (for jobs run locally).  If it's slower for local SQL
processing as well as remote, I would suspect a problem with the
low-level disk interface.

Jeff



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4
  2001-07-11  2:58 ` Jeff V. Merkey
@ 2001-07-11 15:55   ` Brian Strand
  0 siblings, 0 replies; 25+ messages in thread
From: Brian Strand @ 2001-07-11 15:55 UTC (permalink / raw)
  To: Jeff V. Merkey; +Cc: linux-kernel



Jeff V. Merkey wrote:

>Oracle performance depends critically on fast disk access.  Oracle is
>virtually self-contained with regard to the subsystems it uses; it
>provides most of its own.  Oracle slowdowns are caused either by
>problems in the networking software (for remote SQL operations) or by
>disk access (for jobs run locally).  If it's slower for local SQL
>processing as well as remote, I would suspect a problem with the
>low-level disk interface.
>
Our Oracle jobs are almost entirely local (we got rid of all network
access for performance reasons months ago).  Before the upgrade to
2.4.4 they were running well enough, but now (with the only change
being the Suse upgrade from 7.0 to 7.2) they are taking twice as long.
I am slightly suspicious of the kernel, as there is now a lot of
swapping that did not happen before on an identical workload.  I am
trying out 2.4.6-2 (from Hubert Mantel's builds) today to see if VM
behavior improves.
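The swapping observation is easy to quantify when comparing the old and new kernels under the same workload. On 2.4 kernels the `si`/`so` columns of `vmstat` are the simplest route. The sketch below instead assumes a newer (2.6 or later) kernel that exposes cumulative `pswpin`/`pswpout` page counts in `/proc/vmstat`. The helper names are hypothetical.

```python
import time


def swap_counters(vmstat_text):
    """Return (pswpin, pswpout) cumulative page counts from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        name, _, value = line.partition(" ")
        if name in ("pswpin", "pswpout"):
            counters[name] = int(value)
    return counters["pswpin"], counters["pswpout"]


def swap_rate(interval=10.0, path="/proc/vmstat"):
    """Pages swapped in and out per second over `interval` seconds."""
    def sample():
        with open(path) as f:
            return swap_counters(f.read())

    in0, out0 = sample()
    time.sleep(interval)
    in1, out1 = sample()
    return (in1 - in0) / interval, (out1 - out0) / interval
```

Sustained non-zero rates during a job that ran without swapping on the old kernel would support the VM suspicion rather than a disk-path problem.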

Many Thanks,
Brian Strand
CTO Switch Management



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4
  2001-07-11  0:45 Brian Strand
  2001-07-11  1:15 ` Andrea Arcangeli
  2001-07-11  2:58 ` Jeff V. Merkey
@ 2001-07-11  2:59 ` Jeff V. Merkey
  2 siblings, 0 replies; 25+ messages in thread
From: Jeff V. Merkey @ 2001-07-11  2:59 UTC (permalink / raw)
  To: vokamura, Brian Strand; +Cc: linux-kernel

On Tue, Jul 10, 2001 at 05:45:16PM -0700, Brian Strand wrote:

Van,

Can you help this person?

Jeff


> We are running 3 Oracle servers, each dual CPU, 1 1GB and 2 2GB memory, 
>  between 36-180GB of RAID.  On June 26, I upgraded all boxes from Suse 
> 7.0 to Suse 7.2 (going from kernel version 2.2.16-40 to 2.4.4-14). 
>  Reviewing Oracle job times (jobs range from a few minutes to 10 hours) 
> before and after, performance is almost exactly twice as poor after the 
> upgrade versus before the upgrade.  Nothing in the hardware or Oracle 
> configuration has changed on any server.  Does anyone have any ideas as 
> to what might cause this?
> 
> Thanks,
> Brian Strand
> CTO Switch Management
> 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread

Thread overview: 25+ messages
     [not found] <Pine.LNX.4.21.0107111530170.2342-100000@llarsh-pc3.us.oracle.com.suse.lists.linux.kernel>
2001-07-12 10:14 ` 2x Oracle slowdown from 2.2.16 to 2.4.4 Andi Kleen
2001-07-12 14:22   ` Chris Mason
2001-07-12 16:09   ` Lance Larsh
2001-07-11  0:45 Brian Strand
2001-07-11  1:15 ` Andrea Arcangeli
2001-07-11 16:44   ` Brian Strand
2001-07-11 17:08     ` Andrea Arcangeli
2001-07-11 17:23       ` Chris Mason
2001-07-11 23:03     ` Lance Larsh
2001-07-11 23:46       ` Brian Strand
2001-07-12 15:21         ` Lance Larsh
2001-07-12 21:31           ` Hans Reiser
2001-07-12 21:51             ` Chris Mason
2001-07-13  3:00           ` Andrew Morton
2001-07-13  4:17             ` Andrew Morton
2001-07-13 15:36               ` Jeffrey W. Baker
2001-07-13 15:49                 ` Andrew Morton
2001-07-16 22:03                 ` Stephen C. Tweedie
2001-07-12  0:23       ` Chris Mason
2001-07-12 14:48         ` Lance Larsh
2001-07-12  2:30       ` Andrea Arcangeli
2001-07-12  6:12       ` parviz dey
2001-07-11  2:58 ` Jeff V. Merkey
2001-07-11 15:55   ` Brian Strand
2001-07-11  2:59 ` Jeff V. Merkey
