* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4
[not found] <Pine.LNX.4.21.0107111530170.2342-100000@llarsh-pc3.us.oracle.com.suse.lists.linux.kernel>
@ 2001-07-12 10:14 ` Andi Kleen
2001-07-12 14:22 ` Chris Mason
2001-07-12 16:09 ` Lance Larsh
0 siblings, 2 replies; 25+ messages in thread
From: Andi Kleen @ 2001-07-12 10:14 UTC (permalink / raw)
To: llarsh; +Cc: linux-kernel, mason
Lance Larsh <llarsh@oracle.com> writes:
>
> I ran lots of iozone tests which illustrated a huge difference in write
> throughput between reiser and ext2. Chris Mason sent me a patch which
> improved the reiser case (removing an unnecessary commit), but it was
> still noticeably slower than ext2. Therefore I would recommend that
> at this time reiser should not be used for Oracle database files.

If I read the 2.4.6 reiserfs code correctly, reiserfs does not cause
any transactions for reads/writes to allocated blocks, i.e. as long as
you're not extending the file, not filling holes and not updating atimes.
My understanding is that this is normally true for Oracle, but probably
not for iozone, so it would be better if you benchmarked random writes
to an already allocated file.
The 2.4 page cache is more or less direct write through in this case.

-Andi
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4
2001-07-12 10:14 ` 2x Oracle slowdown from 2.2.16 to 2.4.4 Andi Kleen
@ 2001-07-12 14:22 ` Chris Mason
2001-07-12 16:09 ` Lance Larsh
1 sibling, 0 replies; 25+ messages in thread
From: Chris Mason @ 2001-07-12 14:22 UTC (permalink / raw)
To: Andi Kleen, llarsh; +Cc: linux-kernel
On Thursday, July 12, 2001 12:14:16 PM +0200 Andi Kleen
<freitag@alancoxonachip.com> wrote:
> Lance Larsh <llarsh@oracle.com> writes:
>>
>> I ran lots of iozone tests which illustrated a huge difference in write
>> throughput between reiser and ext2. Chris Mason sent me a patch which
>> improved the reiser case (removing an unnecessary commit), but it was
>> still noticeably slower than ext2. Therefore I would recommend that
>> at this time reiser should not be used for Oracle database files.
>
> When I read the 2.4.6 reiserfs code correctly reiserfs does not cause
> any transactions for reads/writes to allocated blocks; i.e. you're not extending
> the file, you're not filling holes and you're not updating atimes.
> My understanding is that this is normally true for Oracle, but probably
> not for iozone so it would be better if you benchmarked random writes
> to an already allocated file.
> The 2.4 page cache is more or less direct write through in this case.
>

In general, yes. But atime updates trigger transactions, and O_SYNC/fsync
writes (in 2.4.x reiserfs) always force a commit of the current
transaction.

The two patches I just posted should fix that...

-chris
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4
2001-07-12 10:14 ` 2x Oracle slowdown from 2.2.16 to 2.4.4 Andi Kleen
2001-07-12 14:22 ` Chris Mason
@ 2001-07-12 16:09 ` Lance Larsh
1 sibling, 0 replies; 25+ messages in thread
From: Lance Larsh @ 2001-07-12 16:09 UTC (permalink / raw)
To: Andi Kleen; +Cc: llarsh, linux-kernel, mason
Andi Kleen wrote:
> My understanding is that this is normally true for Oracle, but probably
> not for iozone so it would be better if you benchmarked random writes
> to an already allocated file.

You are correct that this is true for Oracle: we preallocate the file at
db create time, and we use O_DSYNC to avoid atime updates.

The same is true for iozone: it performs writes to all the blocks
(creating the file and allocating blocks), then rewrites all of the
blocks. The write and rewrite times are measured and reported
separately. Naturally, we only care about the rewrite times, and those
are the results I'm quoting when I casually use the term "writes".

Also, we pass the "-o" option to iozone, which causes it to open the file
with O_SYNC (which on Linux is really O_DSYNC), just like Oracle does.

So, the mode I'm running iozone in really does model Oracle i/o. Sorry
if that wasn't clear.

Thanks,
Lance
^ permalink raw reply [flat|nested] 25+ messages in thread
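The access pattern described in the message above (allocate every block
first, then measure only synchronous rewrites of blocks that already
exist) can be sketched in user space roughly as follows. This is a
minimal illustration under stated assumptions, not iozone's or Oracle's
actual code: the helper names preallocate_file() and
rewrite_blocks_sync() are hypothetical, and only standard POSIX calls
are used.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path` and write every block once, so that
 * all blocks are allocated before the measured phase begins.
 * Returns 0 on success, -1 on error. */
static int preallocate_file(const char *path, size_t nblocks, size_t blksz)
{
    assert(blksz > 0);
    char *buf = calloc(1, blksz);
    if (buf == NULL)
        return -1;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) {
        free(buf);
        return -1;
    }
    int rc = 0;
    for (size_t i = 0; i < nblocks && rc == 0; i++)
        if (write(fd, buf, blksz) != (ssize_t)blksz)
            rc = -1;
    if (rc == 0 && fsync(fd) != 0)   /* push the allocation out to disk */
        rc = -1;
    if (close(fd) != 0)
        rc = -1;
    free(buf);
    return rc;
}

/* Hypothetical helper: overwrite every block in place through an O_SYNC
 * descriptor. The file is not extended and no holes are filled, which is
 * the "rewrite" case discussed above. Returns bytes rewritten, or -1. */
static long rewrite_blocks_sync(const char *path, size_t nblocks, size_t blksz)
{
    assert(blksz > 0);
    char *buf = malloc(blksz);
    if (buf == NULL)
        return -1;
    memset(buf, 0xAB, blksz);
    int fd = open(path, O_WRONLY | O_SYNC);   /* no O_CREAT, no O_TRUNC */
    if (fd < 0) {
        free(buf);
        return -1;
    }
    long total = 0;
    for (size_t i = 0; i < nblocks; i++) {
        /* Each pwrite() is requested to complete synchronously before
         * the next one is issued. */
        if (pwrite(fd, buf, blksz, (off_t)(i * blksz)) != (ssize_t)blksz) {
            total = -1;
            break;
        }
        total += (long)blksz;
    }
    if (close(fd) != 0)
        total = -1;
    free(buf);
    return total;
}
```

A benchmark in this style would call preallocate_file() once and then
time only rewrite_blocks_sync(); the first phase corresponds to what
the thread calls "writes", the second to "rewrites".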
* 2x Oracle slowdown from 2.2.16 to 2.4.4
@ 2001-07-11 0:45 Brian Strand
2001-07-11 1:15 ` Andrea Arcangeli
` (2 more replies)
0 siblings, 3 replies; 25+ messages in thread
From: Brian Strand @ 2001-07-11 0:45 UTC (permalink / raw)
To: linux-kernel
We are running 3 Oracle servers, each dual CPU, 1 1GB and 2 2GB memory,
between 36-180GB of RAID. On June 26, I upgraded all boxes from Suse
7.0 to Suse 7.2 (going from kernel version 2.2.16-40 to 2.4.4-14).
Reviewing Oracle job times (jobs range from a few minutes to 10 hours)
before and after, performance is almost exactly twice as poor after the
upgrade versus before the upgrade. Nothing in the hardware or Oracle
configuration has changed on any server. Does anyone have any ideas as
to what might cause this?
Thanks,
Brian Strand
CTO Switch Management
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4
2001-07-11 0:45 Brian Strand
@ 2001-07-11 1:15 ` Andrea Arcangeli
2001-07-11 16:44 ` Brian Strand
2001-07-11 2:58 ` Jeff V. Merkey
2001-07-11 2:59 ` Jeff V. Merkey
2 siblings, 1 reply; 25+ messages in thread
From: Andrea Arcangeli @ 2001-07-11 1:15 UTC (permalink / raw)
To: Brian Strand; +Cc: linux-kernel
On Tue, Jul 10, 2001 at 05:45:16PM -0700, Brian Strand wrote:
> We are running 3 Oracle servers, each dual CPU, 1 1GB and 2 2GB memory,
> between 36-180GB of RAID. On June 26, I upgraded all boxes from Suse
> 7.0 to Suse 7.2 (going from kernel version 2.2.16-40 to 2.4.4-14).
> Reviewing Oracle job times (jobs range from a few minutes to 10 hours)
> before and after, performance is almost exactly twice as poor after the
> upgrade versus before the upgrade. Nothing in the hardware or Oracle
> configuration has changed on any server. Does anyone have any ideas as
> to what might cause this?

We need to restrict the problem. How are you using Oracle? Through any
filesystem? If yes which one? Or with rawio? Is your workload cached
most of the time or not?

thanks,
Andrea
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4 2001-07-11 1:15 ` Andrea Arcangeli @ 2001-07-11 16:44 ` Brian Strand 2001-07-11 17:08 ` Andrea Arcangeli 2001-07-11 23:03 ` Lance Larsh 0 siblings, 2 replies; 25+ messages in thread From: Brian Strand @ 2001-07-11 16:44 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: linux-kernel Andrea Arcangeli wrote: >We need to restrict the problem. How are you using Oracle? Through any >filesystem? If yes which one? Or with rawio? Is your workload cached >most of the time or not? > Our Oracle configuration is on reiserfs on lvm on Mylex. Our workload is not entirely cached, as we are working against an 8GB table, Oracle is configured to use slightly more than 1GB of memory, and there is always several MB/s of IO going on during our queries. The "working set" of the main table and indexes occupies over 2GB. Many Thanks, Brian Strand CTO Switch Management ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4
2001-07-11 16:44 ` Brian Strand
@ 2001-07-11 17:08 ` Andrea Arcangeli
2001-07-11 17:23 ` Chris Mason
2001-07-11 23:03 ` Lance Larsh
1 sibling, 1 reply; 25+ messages in thread
From: Andrea Arcangeli @ 2001-07-11 17:08 UTC (permalink / raw)
To: Brian Strand; +Cc: linux-kernel
On Wed, Jul 11, 2001 at 09:44:19AM -0700, Brian Strand wrote:
> Our Oracle configuration is on reiserfs on lvm on Mylex. Our workload
> is not entirely cached, as we are working against an 8GB table, Oracle
> is configured to use slightly more than 1GB of memory, and there is
> always several MB/s of IO going on during our queries. The "working
> set" of the main table and indexes occupies over 2GB.

As I suspected, the VM is in our way. Reiserfs could also be an issue,
but I am not aware of any regression on the reiserfs side. Chris?

I tend to believe it is a VM regression (and I admit, this is what I
would have bet as soon as I read your report, before being sure the VM
was in our way).

One way to verify this could be to run Oracle on top of rawio and then
on ext2. If it's the VM, you should still get the slowdown on ext2 too,
and you should run as fast as 2.2 with rawio.

Most people use Oracle on top of rawio on top of lvm, and incidentally
this was the first slowdown report I got about 2.4 when compared to
2.2.

Andrea
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4 2001-07-11 17:08 ` Andrea Arcangeli @ 2001-07-11 17:23 ` Chris Mason 0 siblings, 0 replies; 25+ messages in thread From: Chris Mason @ 2001-07-11 17:23 UTC (permalink / raw) To: Andrea Arcangeli, Brian Strand; +Cc: linux-kernel On Wednesday, July 11, 2001 07:08:21 PM +0200 Andrea Arcangeli <andrea@suse.de> wrote: > On Wed, Jul 11, 2001 at 09:44:19AM -0700, Brian Strand wrote: >> Our Oracle configuration is on reiserfs on lvm on Mylex. Our workload >> is not entirely cached, as we are working against an 8GB table, Oracle >> is configured to use slightly more than 1GB of memory, and there is >> always several MB/s of IO going on during our queries. The "working >> set" of the main table and indexes occupies over 2GB. > > As I suspected there is the VM in our way. Also reiserfs could be an > issue but I am not aware of any regression on the reiserfs side, Chris? reiserfs has a big O_SYNC penalty right now, which can be fixed by a transaction tracking patch I posted a month or so ago. It has been tested by a few people as a large improvement. Brian, I'll update this to 2.4.6 and send along. -chris ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4 2001-07-11 16:44 ` Brian Strand 2001-07-11 17:08 ` Andrea Arcangeli @ 2001-07-11 23:03 ` Lance Larsh 2001-07-11 23:46 ` Brian Strand ` (3 more replies) 1 sibling, 4 replies; 25+ messages in thread From: Lance Larsh @ 2001-07-11 23:03 UTC (permalink / raw) To: Brian Strand; +Cc: Andrea Arcangeli, linux-kernel On Wed, 11 Jul 2001, Brian Strand wrote: > Our Oracle configuration is on reiserfs on lvm on Mylex. I can pretty much tell you it's the reiser+lvm combination that is hurting you here. At the 2.5 kernel summit a few months back, I reported that some of our servers experienced as much as 10-15x slowdown after we moved to 2.4. As it turned out, the problem was that the new servers (with identical hardware to the old servers) were configured to use reiser+lvm, whereas the older servers were using ext2 without lvm. When we rebuilt the new servers with ext2 alone, the problem disappeared. (Note that we also tried reiserfs without lvm, which was 5-6x slower than ext2 without lvm.) I ran lots of iozone tests which illustrated a huge difference in write throughput between reiser and ext2. Chris Mason sent me a patch which improved the reiser case (removing an unnecessary commit), but it was still noticeably slower than ext2. Therefore I would recommend that at this time reiser should not be used for Oracle database files. Thanks, Lance ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4 2001-07-11 23:03 ` Lance Larsh @ 2001-07-11 23:46 ` Brian Strand 2001-07-12 15:21 ` Lance Larsh 2001-07-12 0:23 ` Chris Mason ` (2 subsequent siblings) 3 siblings, 1 reply; 25+ messages in thread From: Brian Strand @ 2001-07-11 23:46 UTC (permalink / raw) To: Lance Larsh; +Cc: Andrea Arcangeli, linux-kernel Lance Larsh wrote: >On Wed, 11 Jul 2001, Brian Strand wrote: > >>Our Oracle configuration is on reiserfs on lvm on Mylex. >> >I can pretty much tell you it's the reiser+lvm combination that is hurting >you here. At the 2.5 kernel summit a few months back, I reported that > Why did it get so much worse going from 2.2.16 to 2.4.4, with an otherwise-identical configuration? We had reiserfs+lvm under 2.2.16 too. > >some of our servers experienced as much as 10-15x slowdown after we moved >to 2.4. As it turned out, the problem was that the new servers (with >identical hardware to the old servers) were configured to use reiser+lvm, >whereas the older servers were using ext2 without lvm. When we rebuilt >the new servers with ext2 alone, the problem disappeared. (Note that we >also tried reiserfs without lvm, which was 5-6x slower than ext2 without >lvm.) > >I ran lots of iozone tests which illustrated a huge difference in write >throughput between reiser and ext2. Chris Mason sent me a patch which >improved the reiser case (removing an unnecessary commit), but it was >still noticeably slower than ext2. Therefore I would recommend that >at this time reiser should not be used for Oracle database files. > How do ext2+lvm, rawio+lvm, ext2 w/o lvm, and rawio w/o lvm compare in terms of Oracle performance? I am going to try a migration if 2.4.6 doesn't make everything better; do you have any suggestions as to the relative performance of each strategy? Thanks, Brian ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4 2001-07-11 23:46 ` Brian Strand @ 2001-07-12 15:21 ` Lance Larsh 2001-07-12 21:31 ` Hans Reiser 2001-07-13 3:00 ` Andrew Morton 0 siblings, 2 replies; 25+ messages in thread From: Lance Larsh @ 2001-07-12 15:21 UTC (permalink / raw) To: Brian Strand; +Cc: Andrea Arcangeli, linux-kernel On Wed, 11 Jul 2001, Brian Strand wrote: > Why did it get so much worse going from 2.2.16 to 2.4.4, with an > otherwise-identical configuration? We had reiserfs+lvm under 2.2.16 too. Don't have an answer to that. I never tried reiser on 2.2. > How do ext2+lvm, rawio+lvm, ext2 w/o lvm, and rawio w/o lvm compare in > terms of Oracle performance? I am going to try a migration if 2.4.6 > doesn't make everything better; do you have any suggestions as to the > relative performance of each strategy? The best answer I can give at the moment is to use either ext2 or rawio, and you might want to avoid lvm for now. I never ran any of the lvm configurations myself. What little I know about lvm performance is conjecture based on comparing my reiser results (5-6x slower than ext2) to the reiser+lvm results from one of our other internal groups (10-15x slower than ext2). So, although it looks like lvm throws in a factor of 2-3x slowdown when using reiser, I don't think we can assume lvm slows down ext2 by the same amount or else someone probably would have noticed by now. Perhaps there's something that sort of resonates between reiser and lvm to cause the combination to be particularly bad. Just guessing... And while we're talking about comparing configurations, I'll mention that I'm currently trying to compare raw and ext2 (no lvm in either case). Although raw should be faster than fs, we're seeing some strange results: it looks like ext2 can be as much as 2x faster than raw for reads, though I'm not confident that these results are accurate. 
The fs might still be getting a boost from the fs cache, even though we've tried to eliminate that possibility by sizing things appropriately. Has anyone else seen results like this, or can anyone think of a possible explanation? Thanks, Lance ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4 2001-07-12 15:21 ` Lance Larsh @ 2001-07-12 21:31 ` Hans Reiser 2001-07-12 21:51 ` Chris Mason 2001-07-13 3:00 ` Andrew Morton 1 sibling, 1 reply; 25+ messages in thread From: Hans Reiser @ 2001-07-12 21:31 UTC (permalink / raw) To: Lance Larsh; +Cc: Brian Strand, Andrea Arcangeli, linux-kernel Lance Larsh wrote: > > On Wed, 11 Jul 2001, Brian Strand wrote: > > > Why did it get so much worse going from 2.2.16 to 2.4.4, with an > > otherwise-identical configuration? We had reiserfs+lvm under 2.2.16 too. > > Don't have an answer to that. I never tried reiser on 2.2. > > > How do ext2+lvm, rawio+lvm, ext2 w/o lvm, and rawio w/o lvm compare in > > terms of Oracle performance? I am going to try a migration if 2.4.6 > > doesn't make everything better; do you have any suggestions as to the > > relative performance of each strategy? > > The best answer I can give at the moment is to use either ext2 or rawio, > and you might want to avoid lvm for now. > > I never ran any of the lvm configurations myself. What little I know > about lvm performance is conjecture based on comparing my reiser results Lance, I would appreciate it if you would be more careful to identify that you are using O_SYNC, which is a special case we are not optimized for, and which I am frankly skeptical should be used at all by an application instead of using fsync judiciously. It is rare that an application is inherently completely incapable of ever having two I/Os not be serialized, and using O_SYNC to force every IO to be serialized rather than picking and choosing when to use fsync, well, I have my doubts frankly. If a user really needs every operation to be synchronous, they should buy a system with an SSD for the journal from applianceware.com (they sell them tuned to run ReiserFS), or else they are just going to go real slow, no matter what the FS does. 
> (5-6x slower than ext2) to the reiser+lvm results from one of our other > internal groups (10-15x slower than ext2). So, although it looks like lvm > throws in a factor of 2-3x slowdown when using reiser, I don't think we > can assume lvm slows down ext2 by the same amount or else someone probably > would have noticed by now. Perhaps there's something that sort of > resonates between reiser and lvm to cause the combination to be > particularly bad. Just guessing... > > And while we're talking about comparing configurations, I'll mention that > I'm currently trying to compare raw and ext2 (no lvm in either case). > Although raw should be faster than fs, we're seeing some strange results: > it looks like ext2 can be as much as 2x faster than raw for reads, though > I'm not confident that these results are accurate. The fs might still be > getting a boost from the fs cache, even though we've tried to eliminate > that possibility by sizing things appropriately. > > Has anyone else seen results like this, or can anyone think of a > possible explanation? > > Thanks, > Lance > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4 2001-07-12 21:31 ` Hans Reiser @ 2001-07-12 21:51 ` Chris Mason 0 siblings, 0 replies; 25+ messages in thread From: Chris Mason @ 2001-07-12 21:51 UTC (permalink / raw) To: Hans Reiser, Lance Larsh; +Cc: Brian Strand, Andrea Arcangeli, linux-kernel On Friday, July 13, 2001 01:31:42 AM +0400 Hans Reiser <reiser@namesys.com> wrote: > Lance, I would appreciate it if you would be more careful to identify that you are using O_SYNC, > which is a special case we are not optimized for, and which I am frankly skeptical should be used at > all by an application instead of using fsync judiciously. It is rare that an application is > inherently completely incapable of ever having two I/Os not be serialized, and using O_SYNC to force > every IO to be serialized rather than picking and choosing when to use fsync, well, I have my doubts > frankly. If a user really needs every operation to be synchronous, they should buy a system with an > SSD for the journal from applianceware.com (they sell them tuned to run ReiserFS), or else they are > just going to go real slow, no matter what the FS does. > There is no reason for reiserfs to be 5 times slower than ext2 at anything ;-) Regardless of if O_SYNC is a good idea or not. I should have optimized the original code for this case, as oracle is reason enough to do it. -chris ^ permalink raw reply [flat|nested] 25+ messages in thread
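The two approaches contrasted in the exchange above (opening with O_SYNC
so that every write is synchronous, versus writing normally and calling
fsync() at chosen commit points) can be sketched as below. This is only
an illustration of the two calling patterns under stated assumptions:
the function names are hypothetical, the record format is made up, and
nothing here reflects how Oracle or reiserfs actually behave
internally.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write `count` copies of `rec` to fd. Returns 0 on success, -1 on error. */
static int write_records(int fd, const char *rec, size_t len, int count)
{
    assert(rec != NULL && count >= 0);
    for (int i = 0; i < count; i++)
        if (write(fd, rec, len) != (ssize_t)len)
            return -1;
    return 0;
}

/* Variant 1 (hypothetical name): the descriptor is opened with O_SYNC,
 * so each write() is requested to be a separate synchronous I/O. */
static int log_with_osync(const char *path, const char *rec, int count)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0600);
    if (fd < 0)
        return -1;
    int rc = write_records(fd, rec, strlen(rec), count);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/* Variant 2 (hypothetical name): plain buffered writes, followed by a
 * single fsync() at the point where durability is actually required. */
static int log_with_fsync(const char *path, const char *rec, int count)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    int rc = write_records(fd, rec, strlen(rec), count);
    if (rc == 0 && fsync(fd) != 0)   /* one flush for the whole batch */
        rc = -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Both variants leave the same bytes in the file. What differs is how
many times the caller waits for the disk: once per write() in the
first case, once per batch in the second. Whether the second pattern
is acceptable depends on whether the application can tolerate losing
the unflushed part of a batch, which is the point under dispute in
the thread.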
* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4
2001-07-12 15:21 ` Lance Larsh
2001-07-12 21:31 ` Hans Reiser
@ 2001-07-13 3:00 ` Andrew Morton
2001-07-13 4:17 ` Andrew Morton
1 sibling, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2001-07-13 3:00 UTC (permalink / raw)
To: Lance Larsh; +Cc: Brian Strand, Andrea Arcangeli, linux-kernel
Lance Larsh wrote:
>
> And while we're talking about comparing configurations, I'll mention that
> I'm currently trying to compare raw and ext2 (no lvm in either case).

It would be interesting to see some numbers for ext3 with full
data journalling.

Some preliminary testing by Neil Brown shows that ext3 is 1.5x faster
than ext2 when used with knfsd, mounted synchronously. (This uses
O_SYNC internally).

The reason is that all the data and metadata are written to a
contiguous area of the disk: no seeks apart from the seek to the
journal are needed. Once the metadata and data are committed to
the journal, the O_SYNC (or fsync()) caller is allowed to continue.
Checkpointing of the data and metadata into the main filesystem is
allowed to proceed via normal writeback.

Make sure that you're using a *big* journal though. Use the
`-J size=400' option with tune2fs or mke2fs.

-
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4
2001-07-13 3:00 ` Andrew Morton
@ 2001-07-13 4:17 ` Andrew Morton
2001-07-13 15:36 ` Jeffrey W. Baker
0 siblings, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2001-07-13 4:17 UTC (permalink / raw)
To: Lance Larsh, Brian Strand, Andrea Arcangeli, linux-kernel
Andrew Morton wrote:
>
> Lance Larsh wrote:
> >
> > And while we're talking about comparing configurations, I'll mention that
> > I'm currently trying to compare raw and ext2 (no lvm in either case).
>
> It would be interesting to see some numbers for ext3 with full
> data journalling.
>
> Some preliminary testing by Neil Brown shows that ext3 is 1.5x faster
> than ext2 when used with knfsd, mounted synchronously. (This uses
> O_SYNC internally).

I just did some testing with local filesystems - running `dbench 4'
on ext2-on-IDE and ext3-on-IDE, where dbench was altered to open
files O_SYNC. Journal size was 400 megs, mount options `data=journal'

ext2: Throughput 2.71849 MB/sec (NB=3.39812 MB/sec 27.1849 MBit/sec)
ext3: Throughput 12.3623 MB/sec (NB=15.4529 MB/sec 123.623 MBit/sec)

ext3 patches are at http://www.uow.edu.au/~andrewm/linux/ext3/

The difference will be less dramatic with large, individual writes.

Be aware though that ext3 breaks both RAID1 and RAID5. This
RAID patch should help:

--- linux-2.4.6/drivers/md/raid1.c Wed Jul 4 18:21:26 2001
+++ lk-ext3/drivers/md/raid1.c Thu Jul 12 15:27:09 2001
@@ -46,6 +46,30 @@
 #define PRINTK(x...) do { } while (0)
 #endif
 
+#define __raid1_wait_event(wq, condition)		\
+do {							\
+	wait_queue_t __wait;				\
+	init_waitqueue_entry(&__wait, current);		\
+							\
+	add_wait_queue(&wq, &__wait);			\
+	for (;;) {					\
+		set_current_state(TASK_UNINTERRUPTIBLE);	\
+		if (condition)				\
+			break;				\
+		run_task_queue(&tq_disk);		\
+		schedule();				\
+	}						\
+	current->state = TASK_RUNNING;			\
+	remove_wait_queue(&wq, &__wait);		\
+} while (0)
+
+#define raid1_wait_event(wq, condition)			\
+do {							\
+	if (condition)					\
+		break;					\
+	__raid1_wait_event(wq, condition);		\
+} while (0)
+
 static mdk_personality_t raid1_personality;
 static md_spinlock_t retry_list_lock = MD_SPIN_LOCK_UNLOCKED;
 
@@ -83,7 +107,7 @@ static struct buffer_head *raid1_alloc_b
 				cnt--;
 		} else {
 			PRINTK("raid1: waiting for %d bh\n", cnt);
-			wait_event(conf->wait_buffer, conf->freebh_cnt >= cnt);
+			raid1_wait_event(conf->wait_buffer, conf->freebh_cnt >= cnt);
 		}
 	}
 	return bh;
@@ -170,7 +194,7 @@ static struct raid1_bh *raid1_alloc_r1bh
 			memset(r1_bh, 0, sizeof(*r1_bh));
 			return r1_bh;
 		}
-		wait_event(conf->wait_buffer, conf->freer1);
+		raid1_wait_event(conf->wait_buffer, conf->freer1);
 	} while (1);
 }
 
--- linux-2.4.6/drivers/md/raid5.c Wed Jul 4 18:21:26 2001
+++ lk-ext3/drivers/md/raid5.c Thu Jul 12 21:31:55 2001
@@ -66,10 +66,11 @@ static inline void __release_stripe(raid
 			BUG();
 		if (atomic_read(&conf->active_stripes)==0)
 			BUG();
-		if (test_bit(STRIPE_DELAYED, &sh->state))
-			list_add_tail(&sh->lru, &conf->delayed_list);
-		else if (test_bit(STRIPE_HANDLE, &sh->state)) {
-			list_add_tail(&sh->lru, &conf->handle_list);
+		if (test_bit(STRIPE_HANDLE, &sh->state)) {
+			if (test_bit(STRIPE_DELAYED, &sh->state))
+				list_add_tail(&sh->lru, &conf->delayed_list);
+			else
+				list_add_tail(&sh->lru, &conf->handle_list);
 			md_wakeup_thread(conf->thread);
 		} else {
 			if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) {
@@ -1167,10 +1168,9 @@ static void raid5_unplug_device(void *da
 
 	raid5_activate_delayed(conf);
 
-	if (conf->plugged) {
-		conf->plugged = 0;
-		md_wakeup_thread(conf->thread);
-	}
+	conf->plugged = 0;
+	md_wakeup_thread(conf->thread);
+
 	spin_unlock_irqrestore(&conf->device_lock, flags);
 }
 
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4 2001-07-13 4:17 ` Andrew Morton @ 2001-07-13 15:36 ` Jeffrey W. Baker 2001-07-13 15:49 ` Andrew Morton 2001-07-16 22:03 ` Stephen C. Tweedie 0 siblings, 2 replies; 25+ messages in thread From: Jeffrey W. Baker @ 2001-07-13 15:36 UTC (permalink / raw) To: Andrew Morton; +Cc: Lance Larsh, Brian Strand, Andrea Arcangeli, linux-kernel On Fri, 13 Jul 2001, Andrew Morton wrote: > Andrew Morton wrote: > > > > Lance Larsh wrote: > > > > > > And while we're talking about comparing configurations, I'll mention that > > > I'm currently trying to compare raw and ext2 (no lvm in either case). > > > > It would be interesting to see some numbers for ext3 with full > > data journalling. > > > > Some preliminary testing by Neil Brown shows that ext3 is 1.5x faster > > than ext2 when used with knfsd, mounted synchronously. (This uses > > O_SYNC internally). > > I just did some testing with local filesystems - running `dbench 4' > on ext2-on-iDE and ext3-on-IDE, where dbench was altered to open > files O_SYNC. Journal size was 400 megs, mount options `data=journal' > > ext2: Throughput 2.71849 MB/sec (NB=3.39812 MB/sec 27.1849 MBit/sec) > ext3: Throughput 12.3623 MB/sec (NB=15.4529 MB/sec 123.623 MBit/sec) > > ext3 patches are at http://www.uow.edu.au/~andrewm/linux/ext3/ > > The difference will be less dramatic with large, individual writes. This is a totally transient effect, right? The journal acts as a faster buffer, but if programs are writing a lot of data to the disk for a very long time, the throughput will eventually be throttled by writing the journal back into the filesystem. For programs that write in bursts, it looks like a huge win! -jwb ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4 2001-07-13 15:36 ` Jeffrey W. Baker @ 2001-07-13 15:49 ` Andrew Morton 2001-07-16 22:03 ` Stephen C. Tweedie 1 sibling, 0 replies; 25+ messages in thread From: Andrew Morton @ 2001-07-13 15:49 UTC (permalink / raw) To: Jeffrey W. Baker Cc: Lance Larsh, Brian Strand, Andrea Arcangeli, linux-kernel "Jeffrey W. Baker" wrote: > > > ... > > ext2: Throughput 2.71849 MB/sec (NB=3.39812 MB/sec 27.1849 MBit/sec) > > ext3: Throughput 12.3623 MB/sec (NB=15.4529 MB/sec 123.623 MBit/sec) > > > > ext3 patches are at http://www.uow.edu.au/~andrewm/linux/ext3/ > > > > The difference will be less dramatic with large, individual writes. > > This is a totally transient effect, right? The journal acts as a faster > buffer, but if programs are writing a lot of data to the disk for a very > long time, the throughput will eventually be throttled by writing the > journal back into the filesystem. It varies a lot with workload. With large writes such as 'iozone -s 300m -a -i 0' it seems about the same throughput as ext2. It would take some time to characterise fully. > For programs that write in bursts, it looks like a huge win! yes - lots of short writes (eg: mailspools) will benefit considerably. The benefits come from the additional merging and sorting which can be performed on the writeback data. I suspect some of the dbench benefit comes from the fact that the files are unlinked at the end of the test - if the data hasn't been written back at that time the buffers are hunted down and zapped - they *never* get written. If anyone wants to test sync throughput, please be sure to use 0.9.3-pre - it fixes some rather sucky behaviour with large journals. - ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4 2001-07-13 15:36 ` Jeffrey W. Baker 2001-07-13 15:49 ` Andrew Morton @ 2001-07-16 22:03 ` Stephen C. Tweedie 1 sibling, 0 replies; 25+ messages in thread From: Stephen C. Tweedie @ 2001-07-16 22:03 UTC (permalink / raw) To: Jeffrey W. Baker Cc: Andrew Morton, Lance Larsh, Brian Strand, Andrea Arcangeli, linux-kernel, Stephen Tweedie Hi, On Fri, Jul 13, 2001 at 08:36:01AM -0700, Jeffrey W. Baker wrote: > > files O_SYNC. Journal size was 400 megs, mount options `data=journal' > > > > ext2: Throughput 2.71849 MB/sec (NB=3.39812 MB/sec 27.1849 MBit/sec) > > ext3: Throughput 12.3623 MB/sec (NB=15.4529 MB/sec 123.623 MBit/sec) > > > > The difference will be less dramatic with large, individual writes. > > This is a totally transient effect, right? The journal acts as a faster > buffer, but if programs are writing a lot of data to the disk for a very > long time, the throughput will eventually be throttled by writing the > journal back into the filesystem. Not for O_SYNC. For ext2, *every* O_SYNC append to a file involves seeking between inodes and indirect blocks and data blocks. With ext3 with data journaling enabled, the synchronous part of the IO is a single sequential write to the journal. The async writeback will affect throughput, yes, but since it is done in the background, it can do tons of optimisations: if you extend a file a hundred times with O_SYNC, then you are forced to journal the inode update a hundred times but the writeback which occurs later need only be done once. For async traffic, you're quite correct. For synchronous traffic, the writeback later on is still async, and the synchronous costs really do often dominate, so the net effect over time is still a big win. Cheers, Stephen ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4 2001-07-11 23:03 ` Lance Larsh 2001-07-11 23:46 ` Brian Strand @ 2001-07-12 0:23 ` Chris Mason 2001-07-12 14:48 ` Lance Larsh 2001-07-12 2:30 ` Andrea Arcangeli 2001-07-12 6:12 ` parviz dey 3 siblings, 1 reply; 25+ messages in thread From: Chris Mason @ 2001-07-12 0:23 UTC (permalink / raw) To: Lance Larsh, Brian Strand; +Cc: Andrea Arcangeli, linux-kernel On Wednesday, July 11, 2001 04:03:09 PM -0700 Lance Larsh <llarsh@oracle.com> wrote: > I ran lots of iozone tests which illustrated a huge difference in write > throughput between reiser and ext2. Chris Mason sent me a patch which > improved the reiser case (removing an unnecessary commit), but it was > still noticeably slower than ext2. Therefore I would recommend that > at this time reiser should not be used for Oracle database files. > Hi Lance, Could I get a copy of the results from last benchmark you ran (with the patch + noatime on reiserfs). I'd like to close that gap... -chris ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4 2001-07-12 0:23 ` Chris Mason @ 2001-07-12 14:48 ` Lance Larsh 0 siblings, 0 replies; 25+ messages in thread From: Lance Larsh @ 2001-07-12 14:48 UTC (permalink / raw) To: Chris Mason; +Cc: Brian Strand, Andrea Arcangeli, linux-kernel [-- Attachment #1: Type: TEXT/PLAIN, Size: 858 bytes --] On Wed, 11 Jul 2001, Chris Mason wrote: > Could I get a copy of the results from last benchmark you ran (with the > patch + noatime on reiserfs). I'd like to close that gap... I have the results in an Excel spreadsheet, but I'm only attaching the plot in postscript format to simplify things. If you'd like me to send you the .xls file, let me know. Note that the results included here are only for "rewrites", not "writes". The most interesting things I see are: 1. the reiser patch you sent me made a noticeable improvement, but it didn't matter whether I used the noatime mount option or not. 2. reiser has a reproducible spike in throughput at 4k i/o size, and it even beats ext2 in that single case. 3. (and sort of off topic...) ext2 throughput drifts slightly down for i/o sizes >64k as we go from 2.4.0 -> 2.4.3 -> 2.4.4 Thanks, Lance [-- Attachment #2: Type: APPLICATION/postscript, Size: 51903 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4 2001-07-11 23:03 ` Lance Larsh 2001-07-11 23:46 ` Brian Strand 2001-07-12 0:23 ` Chris Mason @ 2001-07-12 2:30 ` Andrea Arcangeli 2001-07-12 6:12 ` parviz dey 3 siblings, 0 replies; 25+ messages in thread From: Andrea Arcangeli @ 2001-07-12 2:30 UTC (permalink / raw) To: Lance Larsh; +Cc: Brian Strand, linux-kernel, lvm-devel On Wed, Jul 11, 2001 at 04:03:09PM -0700, Lance Larsh wrote: > some of our servers experienced as much as 10-15x slowdown after we moved [..] > also tried reiserfs without lvm, which was 5-6x slower than ext2 without Hmm, so lvm introduced a significant slowdown too. The only thing that worries me about lvm is the down() calls in the ll_rw_block and submit_bh fast paths, which should *obviously* be converted to rwsem (the write lock is needed only while moving PVs around or while taking COW in a snapshotted device). This way the common cases in the fast paths will never wait for a lock. We inherit those non-rw semaphores from the latest lvm release (the only thing more recent than beta7 is the head of CVS). The down() calls in beta7 fix race conditions present in previous releases, so they weren't pointless, but it was obviously a suboptimal fix. When I saw them I was worried, but it was hard to tell if they could hurt in real life, and since until today nobody had said anything bad about lvm performance I assumed it wasn't a problem; now that has changed thanks to your feedback. I will soon make those changes to the lvm (based on beta7) in my tree, and it will be interesting to see if this makes a difference. I will also have a look to see if I can improve lvm_map a little more, but other than those non-rw semaphores there should not be any significant overhead left to remove in the lvm fast path. Andrea PS. hint: if the down() calls were the problem, you should also see a higher context switching rate with lvm+ext2 than with plain ext2. ^ permalink raw reply [flat|nested] 25+ messages in thread
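[Editorial sketch, not part of the original message.] Andrea's hint suggests comparing context-switch rates between lvm+ext2 and plain ext2 under the same load. One way to sample that, assuming the kernel exposes a cumulative `ctxt` counter in /proc/stat (the same figure tools such as vmstat report as "cs"), is to read it twice and divide by the interval. The helper names `parse_ctxt` and `context_switch_rate` are made up for this illustration.

```python
import time


def parse_ctxt(stat_text):
    """Return the cumulative context-switch count from /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "ctxt":
            return int(fields[1])
    raise ValueError("no 'ctxt' line found")


def context_switch_rate(interval=1.0, stat_path="/proc/stat"):
    """Sample the system-wide counter twice and return switches per second."""
    with open(stat_path) as f:
        before = parse_ctxt(f.read())
    start = time.monotonic()
    time.sleep(interval)
    with open(stat_path) as f:
        after = parse_ctxt(f.read())
    elapsed = time.monotonic() - start
    return (after - before) / elapsed


if __name__ == "__main__":
    # Run once per configuration while the same benchmark is active.
    print(f"{context_switch_rate(1.0):.0f} context switches/sec")
```

The counter is system-wide, so the comparison is only meaningful if the machine is otherwise idle and the same workload runs in both configurations.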
* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4 2001-07-11 23:03 ` Lance Larsh ` (2 preceding siblings ...) 2001-07-12 2:30 ` Andrea Arcangeli @ 2001-07-12 6:12 ` parviz dey 3 siblings, 0 replies; 25+ messages in thread From: parviz dey @ 2001-07-12 6:12 UTC (permalink / raw) To: Lance Larsh; +Cc: linux-kernel Hey Lance, interesting stuff! Did you ever find out why this would happen? Any idea? --- Lance Larsh <llarsh@oracle.com> wrote: > On Wed, 11 Jul 2001, Brian Strand wrote: > > > Our Oracle configuration is on reiserfs on lvm on Mylex. > > I can pretty much tell you it's the reiser+lvm combination that is hurting you here. At the 2.5 kernel summit a few months back, I reported that some of our servers experienced as much as 10-15x slowdown after we moved to 2.4. As it turned out, the problem was that the new servers (with identical hardware to the old servers) were configured to use reiser+lvm, whereas the older servers were using ext2 without lvm. When we rebuilt the new servers with ext2 alone, the problem disappeared. (Note that we also tried reiserfs without lvm, which was 5-6x slower than ext2 without lvm.) > > I ran lots of iozone tests which illustrated a huge difference in write throughput between reiser and ext2. Chris Mason sent me a patch which improved the reiser case (removing an unnecessary commit), but it was still noticeably slower than ext2. Therefore I would recommend that at this time reiser should not be used for Oracle database files. > > Thanks, > Lance > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4 2001-07-11 0:45 Brian Strand 2001-07-11 1:15 ` Andrea Arcangeli @ 2001-07-11 2:58 ` Jeff V. Merkey 2001-07-11 15:55 ` Brian Strand 2001-07-11 2:59 ` Jeff V. Merkey 2 siblings, 1 reply; 25+ messages in thread From: Jeff V. Merkey @ 2001-07-11 2:58 UTC (permalink / raw) To: Brian Strand; +Cc: linux-kernel On Tue, Jul 10, 2001 at 05:45:16PM -0700, Brian Strand wrote: > We are running 3 Oracle servers, each dual CPU, 1 1GB and 2 2GB memory, > between 36-180GB of RAID. On June 26, I upgraded all boxes from Suse > 7.0 to Suse 7.2 (going from kernel version 2.2.16-40 to 2.4.4-14). > Reviewing Oracle job times (jobs range from a few minutes to 10 hours) > before and after, performance is almost exactly twice as poor after the > upgrade versus before the upgrade. Nothing in the hardware or Oracle > configuration has changed on any server. Does anyone have any ideas as > to what might cause this? > > Thanks, > Brian Strand > CTO Switch Management > > Oracle performance depends critically on fast disk access. Oracle is virtually self-contained with regard to the subsystems it uses -- it provides most of its own. Oracle slowdowns are related either to problems in the networking software for remote SQL operations, or to disk access with regard to jobs run locally. If it's slower for local SQL processing as well as remote, I would suspect a problem with the low level disk interface. Jeff ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4 2001-07-11 2:58 ` Jeff V. Merkey @ 2001-07-11 15:55 ` Brian Strand 0 siblings, 0 replies; 25+ messages in thread From: Brian Strand @ 2001-07-11 15:55 UTC (permalink / raw) To: Jeff V. Merkey; +Cc: linux-kernel Jeff V. Merkey wrote: >Oracle performance is critical in requiring fast disk access. Oracle is >virtually self-contained with regard to the subsystems it uses -- it >provides most of its own. Oracle slowdowns are related to either >problems in the networking software for remote SQL operations, and >disk access with regard to jobs run locally. If it's slower for local >SQL processing as well as remote I would suspect a problem with the >low level disk interface. > Our Oracle jobs are almost entirely local (we got rid of all network access for performance reasons months ago). Before the upgrade to 2.4.4, they were running well enough, but now (with the only change being the Suse upgrade from 7.0 to 7.2) they are taking twice as long. I am slightly suspicious of the kernel, as much swapping is happening now which was not happening before on an identical workload. I am trying out 2.4.6-2 (from Hubert Mantel's builds) today to see if VM behavior improves. Many Thanks, Brian Strand CTO Switch Management ^ permalink raw reply [flat|nested] 25+ messages in thread
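[Editorial sketch, not part of the original message.] Brian reports swapping that did not occur before the upgrade. To put numbers on that, one could sample the kernel's swap-in and swap-out page counters around a job. The sketch below assumes a kernel that exposes `pswpin` and `pswpout` in /proc/vmstat; that file is an assumption about later kernels, and the 2.4 systems discussed in this thread may expose these counters elsewhere or not at all. The names `parse_swap_counters` and `swap_rates` are hypothetical.

```python
import time


def parse_swap_counters(vmstat_text):
    """Return (pswpin, pswpout) page counts parsed from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] in ("pswpin", "pswpout"):
            counters[fields[0]] = int(fields[1])
    return counters["pswpin"], counters["pswpout"]


def swap_rates(interval=5.0, vmstat_path="/proc/vmstat"):
    """Return (pages swapped in per second, pages swapped out per second)."""
    with open(vmstat_path) as f:
        in0, out0 = parse_swap_counters(f.read())
    start = time.monotonic()
    time.sleep(interval)
    with open(vmstat_path) as f:
        in1, out1 = parse_swap_counters(f.read())
    elapsed = time.monotonic() - start
    return (in1 - in0) / elapsed, (out1 - out0) / elapsed


if __name__ == "__main__":
    swapped_in, swapped_out = swap_rates(5.0)
    print(f"swap-in {swapped_in:.1f} pages/s, swap-out {swapped_out:.1f} pages/s")
```

Sustained non-zero rates while the Oracle jobs run would support the suspicion that VM behavior, rather than the filesystem alone, contributes to the slowdown.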
* Re: 2x Oracle slowdown from 2.2.16 to 2.4.4 2001-07-11 0:45 Brian Strand 2001-07-11 1:15 ` Andrea Arcangeli 2001-07-11 2:58 ` Jeff V. Merkey @ 2001-07-11 2:59 ` Jeff V. Merkey 2 siblings, 0 replies; 25+ messages in thread From: Jeff V. Merkey @ 2001-07-11 2:59 UTC (permalink / raw) To: vokamura, Brian Strand; +Cc: linux-kernel On Tue, Jul 10, 2001 at 05:45:16PM -0700, Brian Strand wrote: Van, Can you help this person? Jeff > We are running 3 Oracle servers, each dual CPU, 1 1GB and 2 2GB memory, > between 36-180GB of RAID. On June 26, I upgraded all boxes from Suse > 7.0 to Suse 7.2 (going from kernel version 2.2.16-40 to 2.4.4-14). > Reviewing Oracle job times (jobs range from a few minutes to 10 hours) > before and after, performance is almost exactly twice as poor after the > upgrade versus before the upgrade. Nothing in the hardware or Oracle > configuration has changed on any server. Does anyone have any ideas as > to what might cause this? > > Thanks, > Brian Strand > CTO Switch Management ^ permalink raw reply [flat|nested] 25+ messages in thread
Thread overview: 25+ messages
[not found] <Pine.LNX.4.21.0107111530170.2342-100000@llarsh-pc3.us.oracle.com.suse.lists.linux.kernel>
2001-07-12 10:14 ` 2x Oracle slowdown from 2.2.16 to 2.4.4 Andi Kleen
2001-07-12 14:22 ` Chris Mason
2001-07-12 16:09 ` Lance Larsh
2001-07-11 0:45 Brian Strand
2001-07-11 1:15 ` Andrea Arcangeli
2001-07-11 16:44 ` Brian Strand
2001-07-11 17:08 ` Andrea Arcangeli
2001-07-11 17:23 ` Chris Mason
2001-07-11 23:03 ` Lance Larsh
2001-07-11 23:46 ` Brian Strand
2001-07-12 15:21 ` Lance Larsh
2001-07-12 21:31 ` Hans Reiser
2001-07-12 21:51 ` Chris Mason
2001-07-13 3:00 ` Andrew Morton
2001-07-13 4:17 ` Andrew Morton
2001-07-13 15:36 ` Jeffrey W. Baker
2001-07-13 15:49 ` Andrew Morton
2001-07-16 22:03 ` Stephen C. Tweedie
2001-07-12 0:23 ` Chris Mason
2001-07-12 14:48 ` Lance Larsh
2001-07-12 2:30 ` Andrea Arcangeli
2001-07-12 6:12 ` parviz dey
2001-07-11 2:58 ` Jeff V. Merkey
2001-07-11 15:55 ` Brian Strand
2001-07-11 2:59 ` Jeff V. Merkey