* Fw: Re: ICP, 3ware, Areca?
@ 2006-11-07 19:47 Andrew Morton
2006-11-07 19:55 ` Alex Tomas
2006-11-07 20:59 ` Dave Kleikamp
0 siblings, 2 replies; 7+ messages in thread
From: Andrew Morton @ 2006-11-07 19:47 UTC (permalink / raw)
To: linux-ext4@vger.kernel.org
Why is ext3 slow??
Begin forwarded message:
Date: Tue, 7 Nov 2006 09:47:17 -0500
From: "Bill Rugolsky Jr." <brugolsky@telemetry-investments.com>
To: Arne Schmitz <arne.schmitz@gmx.net>
Cc: linux-ide-arrays@lists.math.uh.edu
Subject: Re: ICP, 3ware, Areca?
On Tue, Nov 07, 2006 at 03:25:04PM +0100, Arne Schmitz wrote:
> Has anyone information about how current ICP and Areca hardware performs under
> Linux? We are currently running kernel 2.6.17 and have two offers, one with
> an Areca ARC-1220 8-port, and one with an ICP 9087MA 8-port. Does either of
> them make trouble running a (64 bit) Linux?
>
> At the moment we only have two 3ware controllers running on 32 bit Linux.
On Fri, 18 Aug 2006, I wrote to the list:
I've been doing sequential raw disk I/O testing with both Jens Axboe's
"fio" using libaio and iodepths up to 32, as well as a basic
"dd if=/dev/zero oflag=direct".
Reads look fine; a zone read test shows 360 MiB/s at the start of the disk,
190 MiB/s at the end. I see similarly high numbers doing direct reads via
ext3.
Unfortunately, no matter what I do on the write side, I don't see
more than 72 MiB/s for a sequential direct I/O write to the raw disk.
I've tried the deadline and noop schedulers, boosted nr_requests and
toyed with various i/o sizes and queue depths using fio. I was expecting
sequential writes in the range of 120-150 MiB/s, based on the (now
ancient) tweakers.net review and various other info. [Copying /dev/zero
to tmpfs on this box yields ~860 MiB/s.]
The machine is a Tyan 2882 dual Opteron with 8GB RAM and an Areca 1220
/ 128MB BBU and 8xWDC WD2500JS-00NCB1 250.1GB 7200 RPM configured as a
RAID6 with chunk size 64K. [System volume is on a separate MD RAID1 on
the Nvidia controller.] It's running FC4 x86_64 with a custom-built
2.6.17.7 kernel and the arcmsr driver from scsi-misc GIT, which is
basically 1.20.0X.13 + fixes. The firmware is V1.41 2006-5-24.
Chris Caputo suggested:
I'd run a test with write cache on and one with write cache off and
compare the results. The difference can be vast and depending on your
application it may be okay to run with write cache on.
And I reported back on Tue, 22 Aug 2006:
Forcing disk write caching on certainly changes the results
(and the risk profile, of course). For the archives, here are
some simple "dd" and "fio" odirect results. These benchmarks
were run with defaults (CFQ scheduler, nr_requests = 128).
...
Summary:
Raw partition: 228 MiB/s
XFS: 228 MiB/s
Ext3: 139-151 MiB/s
Regards,
Bill Rugolsky
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Fw: Re: ICP, 3ware, Areca?
2006-11-07 19:47 Fw: Re: ICP, 3ware, Areca? Andrew Morton
@ 2006-11-07 19:55 ` Alex Tomas
2006-11-07 20:59 ` Dave Kleikamp
1 sibling, 0 replies; 7+ messages in thread
From: Alex Tomas @ 2006-11-07 19:55 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-ext4@vger.kernel.org

can we get vmstat 1 output for the run?

thanks, Alex

>>>>> Andrew Morton (AM) writes:

AM> Why is ext3 slow??
AM> Begin forwarded message:
AM> Date: Tue, 7 Nov 2006 09:47:17 -0500
AM> From: "Bill Rugolsky Jr." <brugolsky@telemetry-investments.com>
AM> To: Arne Schmitz <arne.schmitz@gmx.net>
AM> Cc: linux-ide-arrays@lists.math.uh.edu
AM> Subject: Re: ICP, 3ware, Areca?
AM> On Tue, Nov 07, 2006 at 03:25:04PM +0100, Arne Schmitz wrote:
>> Has anyone information about how current ICP and Areca hardware performs under
>> Linux? We are currently running kernel 2.6.17 and have two offers, one with
>> an Areca ARC-1220 8-port, and one with an ICP 9087MA 8-port. Does either of
>> them make trouble running a (64 bit) Linux?
>>
>> At the moment we only have two 3ware controllers running on 32 bit Linux.
AM> On Fri, 18 Aug 2006, I wrote to the list:
AM> I've been doing sequential raw disk I/O testing with both Jens Axboe's
AM> "fio" using libaio and iodepths up to 32, as well as a basic
AM> "dd if=/dev/zero oflag=direct".
AM> Reads look fine; a zone read test shows 360 MiB/s at the start of the disk,
AM> 190 MiB/s at the end. I see similarly high numbers doing direct reads via
AM> ext3.
AM> Unfortunately, no matter what I do on the write side, I don't see
AM> more than 72 MiB/s for a sequential direct I/O write to the raw disk.
AM> I've tried the deadline and noop schedulers, boosted nr_requests and
AM> toyed with various i/o sizes and queue depths using fio. I was expecting
AM> sequential writes in the range of 120-150 MiB/s, based on the (now
AM> ancient) tweakers.net review and various other info. [Copying /dev/zero
AM> to tmpfs on this box yields ~860 MiB/s.]
AM> The machine is a Tyan 2882 dual Opteron with 8GB RAM and an Areca 1220
AM> / 128MB BBU and 8xWDC WD2500JS-00NCB1 250.1GB 7200 RPM configured as a
AM> RAID6 with chunk size 64K. [System volume is on a separate MD RAID1 on
AM> the Nvidia controller.] It's running FC4 x86_64 with a custom-built
AM> 2.6.17.7 kernel and the arcmsr driver from scsi-misc GIT, which is
AM> basically 1.20.0X.13 + fixes. The firmware is V1.41 2006-5-24.
AM> Chris Caputo suggested:
AM> I'd run a test with write cache on and one with write cache off and
AM> compare the results. The difference can be vast and depending on your
AM> application it may be okay to run with write cache on.
AM> And I reported back on Tue, 22 Aug 2006:
AM> Forcing disk write caching on certainly changes the results
AM> (and the risk profile, of course). For the archives, here are
AM> some simple "dd" and "fio" odirect results. These benchmarks
AM> were run with defaults (CFQ scheduler, nr_requests = 128).
AM> ...
AM> Summary:
AM> Raw partition: 228 MiB/s
AM> XFS: 228 MiB/s
AM> Ext3: 139-151 MiB/s
AM> Regards,
AM> Bill Rugolsky
AM> -
AM> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
AM> the body of a message to majordomo@vger.kernel.org
AM> More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Fw: Re: ICP, 3ware, Areca?
2006-11-07 19:47 Fw: Re: ICP, 3ware, Areca? Andrew Morton
2006-11-07 19:55 ` Alex Tomas
@ 2006-11-07 20:59 ` Dave Kleikamp
2006-11-07 21:06 ` bzzz
2006-11-07 21:45 ` Andrew Morton
1 sibling, 2 replies; 7+ messages in thread
From: Dave Kleikamp @ 2006-11-07 20:59 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-ext4@vger.kernel.org

On Tue, 2006-11-07 at 11:47 -0800, Andrew Morton wrote:
> Why is ext3 slow??

Allocation? I don't see anything indicating that Bill is overwriting an
existing file, so there is block allocation and journaling overhead. If
that's the case, it would be interesting to see how fast ext3 is when
overwriting a file. Extents and delayed allocation should improve on
this a lot.

> Begin forwarded message:
>
> Date: Tue, 7 Nov 2006 09:47:17 -0500
> From: "Bill Rugolsky Jr." <brugolsky@telemetry-investments.com>
> To: Arne Schmitz <arne.schmitz@gmx.net>
> Cc: linux-ide-arrays@lists.math.uh.edu
> Subject: Re: ICP, 3ware, Areca?
>
>
> On Tue, Nov 07, 2006 at 03:25:04PM +0100, Arne Schmitz wrote:
> > Has anyone information about how current ICP and Areca hardware performs under
> > Linux? We are currently running kernel 2.6.17 and have two offers, one with
> > an Areca ARC-1220 8-port, and one with an ICP 9087MA 8-port. Does either of
> > them make trouble running a (64 bit) Linux?
> >
> > At the moment we only have two 3ware controllers running on 32 bit Linux.
>
> On Fri, 18 Aug 2006, I wrote to the list:
>
> I've been doing sequential raw disk I/O testing with both Jens Axboe's
> "fio" using libaio and iodepths up to 32, as well as a basic
> "dd if=/dev/zero oflag=direct".
>
> Reads look fine; a zone read test shows 360 MiB/s at the start of the disk,
> 190 MiB/s at the end. I see similarly high numbers doing direct reads via
> ext3.

This would indicate that indirect block lookups themselves aren't a
problem.

> Summary:
>
> Raw partition: 228 MiB/s
> XFS: 228 MiB/s
> Ext3: 139-151 MiB/s

-- 
David Kleikamp
IBM Linux Technology Center

^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Fw: Re: ICP, 3ware, Areca?
2006-11-07 20:59 ` Dave Kleikamp
@ 2006-11-07 21:06 ` bzzz
2006-11-07 21:45 ` Andrew Morton
1 sibling, 0 replies; 7+ messages in thread
From: bzzz @ 2006-11-07 21:06 UTC (permalink / raw)
To: Dave Kleikamp; +Cc: Andrew Morton, linux-ext4@vger.kernel.org

>>>>> Dave Kleikamp (DK) writes:

DK> On Tue, 2006-11-07 at 11:47 -0800, Andrew Morton wrote:
>> Why is ext3 slow??
DK> Allocation? I don't see anything indicating that Bill is overwriting an
DK> existing file, so there is block allocation and journaling overhead. If
DK> that's the case, it would be interesting to see how fast ext3 is when
DK> overwriting a file. Extents and delayed allocation should improve on
DK> this a lot.

That was my first suspicion as well, though in my testing on an Opteron,
writes achieved ~300MB/s while consuming 100% CPU. So it would be
interesting to see the vmstat output and the actual CPU consumption.

thanks, Alex

^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Fw: Re: ICP, 3ware, Areca?
2006-11-07 20:59 ` Dave Kleikamp
2006-11-07 21:06 ` bzzz
@ 2006-11-07 21:45 ` Andrew Morton
2006-11-07 22:07 ` Bill Rugolsky Jr.
2006-11-07 22:20 ` Bill Rugolsky Jr.
1 sibling, 2 replies; 7+ messages in thread
From: Andrew Morton @ 2006-11-07 21:45 UTC (permalink / raw)
To: Dave Kleikamp; +Cc: linux-ext4@vger.kernel.org, Bill Rugolsky Jr.

On Tue, 07 Nov 2006 14:59:52 -0600
Dave Kleikamp <shaggy@linux.vnet.ibm.com> wrote:

> On Tue, 2006-11-07 at 11:47 -0800, Andrew Morton wrote:
> > Why is ext3 slow??
>
> Allocation? I don't see anything indicating that Bill is overwriting an
> existing file, so there is block allocation and journaling overhead. If
> that's the case, it would be interesting to see how fast ext3 is when
> overwriting a file. Extents and delayed allocation should improve on
> this a lot.

Maybe. Or perhaps some funniness with RAID alignment.

Bill, if you have time it'd be interesting to repeat the comparative
benchmarking with:

ext3, data=ordered:

	dd if=/dev/zero of=foo bs=1M count=1000 oflag=direct
	time dd if=/dev/zero of=foo bs=1M count=1000 oflag=direct conv=notrunc

ext4dev:

	dd if=/dev/zero of=foo bs=1M count=1000 oflag=direct
	time dd if=/dev/zero of=foo bs=1M count=1000 oflag=direct conv=notrunc

ext4dev, -oextents

	rm foo
	dd if=/dev/zero of=foo bs=1M count=1000 oflag=direct
	time dd if=/dev/zero of=foo bs=1M count=1000 oflag=direct conv=notrunc

> > Begin forwarded message:
> >
> > Date: Tue, 7 Nov 2006 09:47:17 -0500
> > From: "Bill Rugolsky Jr." <brugolsky@telemetry-investments.com>
> > To: Arne Schmitz <arne.schmitz@gmx.net>
> > Cc: linux-ide-arrays@lists.math.uh.edu
> > Subject: Re: ICP, 3ware, Areca?
> >
> >
> > On Tue, Nov 07, 2006 at 03:25:04PM +0100, Arne Schmitz wrote:
> > > Has anyone information about how current ICP and Areca hardware performs under
> > > Linux? We are currently running kernel 2.6.17 and have two offers, one with
> > > an Areca ARC-1220 8-port, and one with an ICP 9087MA 8-port. Does either of
> > > them make trouble running a (64 bit) Linux?
> > >
> > > At the moment we only have two 3ware controllers running on 32 bit Linux.
> >
> > On Fri, 18 Aug 2006, I wrote to the list:
> >
> > I've been doing sequential raw disk I/O testing with both Jens Axboe's
> > "fio" using libaio and iodepths up to 32, as well as a basic
> > "dd if=/dev/zero oflag=direct".
> >
> > Reads look fine; a zone read test shows 360 MiB/s at the start of the disk,
> > 190 MiB/s at the end. I see similarly high numbers doing direct reads via
> > ext3.
>
> This would indicate that indirect block lookups themselves aren't a
> problem.
>
> > Summary:
> >
> > Raw partition: 228 MiB/s
> > XFS: 228 MiB/s
> > Ext3: 139-151 MiB/s

It's hard to believe that the block allocator could do this to us. I'd be
suspecting that something is causing additional seeking.

Bill, when publishing figures like this it is useful (and somewhat
important) to also record the CPU consumption. So please publish the full
output of /usr/bin/time and not just the elapsed time, thanks.

^ permalink raw reply [flat|nested] 7+ messages in thread
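The comparison requested in the message above (write a fresh file, then time an in-place overwrite with conv=notrunc) can be sketched as a small script. This is only a sketch under stated assumptions: the TARGET, COUNT and DRY_RUN variables and the run and overwrite_test helpers are hypothetical names introduced here and do not appear in the thread. It defaults to a dry run that only prints the commands, and it calls /usr/bin/time so that CPU consumption is reported next to the elapsed time.

```shell
#!/bin/sh
# Hypothetical knobs introduced for this sketch (not from the thread).
TARGET=${TARGET:-foo}      # file on the filesystem under test
COUNT=${COUNT:-1000}       # number of 1 MiB blocks, as in the commands above
DRY_RUN=${DRY_RUN:-1}      # 1 = only print the commands, 0 = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

overwrite_test() {
    # Start from a missing file so the first pass has to allocate blocks.
    run rm -f "$TARGET"
    # Pass 1: fresh file, so block allocation is part of the cost.
    run dd if=/dev/zero of="$TARGET" bs=1M count="$COUNT" oflag=direct
    # Pass 2: conv=notrunc keeps the existing blocks, so this is an overwrite.
    # /usr/bin/time reports user and system CPU time as well as elapsed time.
    run /usr/bin/time dd if=/dev/zero of="$TARGET" bs=1M count="$COUNT" \
        oflag=direct conv=notrunc
}

overwrite_test
```

Running it once per filesystem (ext3 with data=ordered, ext4dev, ext4dev with -o extents) with DRY_RUN=0 and TARGET pointing at a file on the mounted test volume would reproduce the three cases listed above.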
* Re: Fw: Re: ICP, 3ware, Areca?
2006-11-07 21:45 ` Andrew Morton
@ 2006-11-07 22:07 ` Bill Rugolsky Jr.
2006-11-07 22:20 ` Bill Rugolsky Jr.
1 sibling, 0 replies; 7+ messages in thread
From: Bill Rugolsky Jr. @ 2006-11-07 22:07 UTC (permalink / raw)
To: Andrew Morton; +Cc: Dave Kleikamp, linux-ext4@vger.kernel.org

On Tue, Nov 07, 2006 at 01:45:13PM -0800, Andrew Morton wrote:
> Bill, if you have time it'd be interesting to repeat the comparative
> benchmarking with:
>
> ext3, data=ordered:
>
> 	dd if=/dev/zero of=foo bs=1M count=1000 oflag=direct
> 	time dd if=/dev/zero of=foo bs=1M count=1000 oflag=direct conv=notrunc
>
> ext4dev:
>
> 	dd if=/dev/zero of=foo bs=1M count=1000 oflag=direct
> 	time dd if=/dev/zero of=foo bs=1M count=1000 oflag=direct conv=notrunc
>
> ext4dev, -oextents
>
> 	rm foo
> 	dd if=/dev/zero of=foo bs=1M count=1000 oflag=direct
> 	time dd if=/dev/zero of=foo bs=1M count=1000 oflag=direct conv=notrunc

Andrew,

Will do. I currently have one of these servers running a production
PostgreSQL over Ext3. The warm-standby backup server is not yet fully
configured and in use, so I will do some testing before deploying it.

We are at the tail end of a horrible office move, so I've been a
bit removed from kernel-building. [Sadly, I have yet to have a chance
to test the excellent sata_nv ADMA work to see whether the latencies
are gone.] I ought to be able to get to testing in the next day or
two; sorry in advance for the delay.

In the e-mail you received, I had omitted the full information from
my original postings. I don't see the archives online, so I've appended
the full results. fio-1.5-0.20060728152503 was used; the parameters
appear in the fio output.

	-Bill

=========================================================================

Date: Tue, 22 Aug 2006 12:39:01 -0400
From: "Bill Rugolsky Jr." <brugolsky@telemetry-investments.com>
To: Chris Caputo <ccaputo@alt.net>
Cc: linux-ide-arrays@lists.math.uh.edu
Subject: Re: Areca 1220 Sequential I/O performance numbers
In-Reply-To: <Pine.LNX.4.64.0608182252550.4337@nacho.alt.net>
Message-ID: <20060822163901.GA1048@ti64.telemetry-investments.com>

On Fri, Aug 18, 2006 at 10:54:22PM +0000, Chris Caputo wrote:
> I'd run a test with write cache on and one with write cache off and
> compare the results. The difference can be vast and depending on your
> application it may be okay to run with write cache on.

Thanks Chris,

Forcing disk write caching on certainly changes the results
(and the risk profile, of course). For the archives, here are
some simple "dd" and "fio" odirect results. These benchmarks
were run with defaults (CFQ scheduler, nr_requests = 128).

Again, the machine is a Tyan 2882 dual Opteron with 8GB RAM and an Areca 1220
/ 128MB BBU and 8xWDC WD2500JS-00NCB1 250.1GB 7200 RPM configured as a
RAID6 with chunk size 64K. [System volume is on a separate MD RAID1 on
the Nvidia controller.] It's running FC4 x86_64 with a custom-built
2.6.17.7 kernel and the arcmsr driver from scsi-misc GIT, which is
basically 1.20.0X.13 + fixes. The firmware is V1.41 2006-5-24.

Summary:

Raw partition: 228 MiB/s
XFS: 228 MiB/s
Ext3: 139-151 MiB/s

[N.B.: The "dd" numbers are displayed in MB/s, the "fio" results are in MiB/s.]

=================
= Raw partition =
=================

% sudo time dd if=/dev/zero of=/dev/sdc2 bs=4M count=1024 oflag=direct
1024+0 records in
1024+0 records out
4294967296 bytes (4.3 GB) copied, 17.7893 seconds, 241 MB/s
0.00user 0.68system 0:17.86elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (3major+264minor)pagefaults 0swaps

% sudo fio sequential-write
client1: (g=0): rw=write, odir=1, bs=131072-131072, rate=0, ioengine=libaio, iodepth=32
Starting 1 thread
Threads running: 1: [W] [100.00% done] [eta 00m:00s]
client1: (groupid=0): err= 0:
  write: io= 4099MiB, bw=228004KiB/s, runt= 18855msec
    slat (msec): min= 0, max= 0, avg= 0.00, dev= 0.00
    clat (msec): min= 0, max= 83, avg=18.07, dev=26.64
    bw (KiB/s) : min= 0, max=358612, per=98.57%, avg=224741.21, dev=243343.17
  cpu : usr=0.30%, sys=5.15%, ctx=33015

Run status group 0 (all jobs):
  WRITE: io=4099MiB, aggrb=228004, minb=228004, maxb=228004, mint=18855msec, maxt=18855msec

Disk stats (read/write):
  sdc: ios=0/32799, merge=0/0, ticks=0/602466, in_queue=602461, util=99.73%

======================================================
= XFS (/sbin/mkfs.xfs -f -d su=65536,sw=6 /dev/sdc2) =
======================================================

% sudo time dd if=/dev/zero of=foo bs=4M count=1024 oflag=direct
1024+0 records in
1024+0 records out
4294967296 bytes (4.3 GB) copied, 17.9354 seconds, 239 MB/s
0.00user 0.80system 0:17.93elapsed 4%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+268minor)pagefaults 0swaps

% sudo fio sequential-write-foo
client1: (g=0): rw=write, odir=1, bs=131072-131072, rate=0, ioengine=libaio, iodepth=32
Starting 1 thread
client1: Laying out IO file (4096MiB)
Threads running: 1: [W] [100.00% done] [eta 00m:00s]
client1: (groupid=0): err= 0:
  write: io= 4096MiB, bw=228613KiB/s, runt= 18787msec
    slat (msec): min= 0, max= 0, avg= 0.00, dev= 0.00
    clat (msec): min= 0, max= 105, avg=18.02, dev=26.63
    bw (KiB/s) : min= 0, max=359137, per=97.62%, avg=223165.97, dev=240029.16
  cpu : usr=0.21%, sys=5.39%, ctx=32928

Run status group 0 (all jobs):
  WRITE: io=4096MiB, aggrb=228613, minb=228613, maxb=228613, mint=18787msec, maxt=18787msec

Disk stats (read/write):
  sdc: ios=28/49658, merge=0/1, ticks=520/2564125, in_queue=2564637, util=92.62%

==================================================================
= Ext3 (/sbin/mke2fs -j -J size=400 -E stride=96 /dev/sdc2)      =
= This is with data=ordered; data=writeback was slightly slower. =
==================================================================

% sudo time dd if=/dev/zero of=foo bs=4M count=1024 oflag=direct
1024+0 records in
1024+0 records out
4294967296 bytes (4.3 GB) copied, 29.4102 seconds, 146 MB/s
0.00user 1.40system 0:29.95elapsed 4%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+268minor)pagefaults 0swaps

% sudo fio sequential-write-foo
client1: (g=0): rw=write, odir=1, bs=131072-131072, rate=0, ioengine=libaio, iodepth=32
Starting 1 thread
Threads running: 1: [W] [100.00% done] [eta 00m:00s]
client1: (groupid=0): err= 0:
  write: io= 4096MiB, bw=151894KiB/s, runt= 28276msec
    slat (msec): min= 0, max= 0, avg= 0.00, dev= 0.00
    clat (msec): min= 0, max= 428, avg=27.23, dev=56.99
    bw (KiB/s) : min= 0, max=266338, per=100.11%, avg=152057.02, dev=173467.74
  cpu : usr=0.23%, sys=3.64%, ctx=32944

Run status group 0 (all jobs):
  WRITE: io=4096MiB, aggrb=151894, minb=151894, maxb=151894, mint=28276msec, maxt=28276msec

Disk stats (read/write):
  sdc: ios=0/33867, merge=0/5, ticks=0/934143, in_queue=934143, util=99.96%

^ permalink raw reply [flat|nested] 7+ messages in thread
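Because dd prints decimal MB/s while fio prints bandwidth in KiB/s, the figures in the message above are easier to compare once both are converted to MiB/s. The following is a minimal sketch using awk; the helper names dd_mib_s and fio_mib_s are made up for this example, and the inputs are copied from the raw-partition and Ext3 runs above.

```shell
#!/bin/sh
# Convert dd's "bytes copied" and "seconds" into MiB/s (1 MiB = 1048576 bytes).
dd_mib_s() {
    awk -v bytes="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", bytes / secs / 1048576 }'
}

# Convert fio's bw=...KiB/s figure into MiB/s.
fio_mib_s() {
    awk -v kib="$1" 'BEGIN { printf "%.1f\n", kib / 1024 }'
}

echo "raw  dd : $(dd_mib_s 4294967296 17.7893) MiB/s"
echo "raw  fio: $(fio_mib_s 228004) MiB/s"
echo "ext3 dd : $(dd_mib_s 4294967296 29.4102) MiB/s"
echo "ext3 fio: $(fio_mib_s 151894) MiB/s"
```

With these inputs the Ext3 dd run works out to roughly 139 MiB/s, which lines up with the low end of the 139-151 range quoted in the summary.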
* Re: Fw: Re: ICP, 3ware, Areca?
2006-11-07 21:45 ` Andrew Morton
2006-11-07 22:07 ` Bill Rugolsky Jr.
@ 2006-11-07 22:20 ` Bill Rugolsky Jr.
1 sibling, 0 replies; 7+ messages in thread
From: Bill Rugolsky Jr. @ 2006-11-07 22:20 UTC (permalink / raw)
To: Andrew Morton; +Cc: Dave Kleikamp, linux-ext4@vger.kernel.org

On Tue, Nov 07, 2006 at 01:45:13PM -0800, Andrew Morton wrote:
> On Tue, 07 Nov 2006 14:59:52 -0600
> Dave Kleikamp <shaggy@linux.vnet.ibm.com> wrote:
>
> > On Tue, 2006-11-07 at 11:47 -0800, Andrew Morton wrote:
> > > Why is ext3 slow??
> >
> > Allocation? I don't see anything indicating that Bill is overwriting an
> > existing file, so there is block allocation and journaling overhead. If
> > that's the case, it would be interesting to see how fast ext3 is when
> > overwriting a file. Extents and delayed allocation should improve on
> > this a lot.

Will do.

> Maybe. Or perhaps some funniness with RAID alignment.

I neglected to include the relevant RAID/mkfs info here.

	device=/dev/sdc2 # ought to have been on a raid stripe boundary
	                 # very close to the start of the array

	# XFS:
	mkfs.xfs -f -d su=65536,sw=6 -l su=65536 $device
	mount -o noatime,attr2,largeio,logbsize=64k $device /mnt

	# Ext3: XFS has problems up through 2.6.18-rc5; use slow, but safe, Ext3:
	mke2fs -j -J size=400 -E stride=96 $device
	mount -o noatime $device /mnt

Also, I ran blockdev --flushbufs and

	echo 1 | sudo tee /proc/sys/vm/drop_caches

before each test.

	-Bill

^ permalink raw reply [flat|nested] 7+ messages in thread
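The XFS and Ext3 geometry options in the message above can be cross-checked against the array layout given earlier in the thread (8 disks, RAID6, 64K chunk). The sketch below assumes a 4 KiB Ext3 block size, which the thread does not state, and all variable names are illustrative. mke2fs(8) describes stride as the number of filesystem blocks written to one disk before moving to the next, so the arithmetic shows how the value 96 relates to the per-chunk and full-stripe figures; whether this has any bearing on the measured results is not established in the thread.

```shell
#!/bin/sh
chunk_kib=64    # RAID chunk size, from the thread
disks=8         # drives in the array, from the thread
parity=2        # RAID6 stores two parity chunks per stripe
block_kib=4     # ASSUMPTION: Ext3 block size (not stated in the thread)

data_disks=$((disks - parity))            # corresponds to XFS sw=6
su_bytes=$((chunk_kib * 1024))            # corresponds to XFS su=65536
stride=$((chunk_kib / block_kib))         # filesystem blocks per chunk
stripe_blocks=$((stride * data_disks))    # filesystem blocks per full stripe

echo "data_disks=$data_disks su_bytes=$su_bytes"
echo "stride=$stride stripe_blocks=$stripe_blocks"
```

Under that block-size assumption, one chunk is 16 blocks and a full stripe is 96 blocks, so stride=96 corresponds to the full stripe width rather than to a single chunk.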