* [PATCH] 2.4.10 improved reiserfs a lot, but could still be better
@ 2001-09-24 14:09 Beau Kuiper
2001-09-24 14:46 ` [reiserfs-list] " Chris Mason
2001-09-24 15:32 ` Matthias Andree
0 siblings, 2 replies; 30+ messages in thread
From: Beau Kuiper @ 2001-09-24 14:09 UTC
To: linux-kernel; +Cc: reiserfs-list
Hi all again,
I have updated my last set of patches for reiserfs to run on the 2.4.10
kernel.
The new set of patches creates a new method for kupdated syncs.
Filesystems that do not support this new method fall back to the regular
write_super method. When kupdated syncs the superblock, reiserfs simply
calls the flush_old_commits code with immediate mode off.
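The dispatch described above can be sketched as a toy Python model (the real change is kernel C, and the patch itself was not attached to this message). Only `write_super` and `flush_old_commits` come from the text; the hook name `kupdate_sync`, the class layout, and the immediate-mode behaviour of reiserfs's ordinary `write_super` path are assumptions for illustration.

```python
from typing import Callable, List, Optional


class SuperOps:
    """Toy model of a filesystem's superblock operations table."""

    def __init__(self, write_super: Callable[[], str],
                 kupdate_sync: Optional[Callable[[], str]] = None) -> None:
        self.write_super = write_super
        # Hypothetical optional hook standing in for the patch's new method;
        # None means the filesystem does not implement it.
        self.kupdate_sync = kupdate_sync


def kupdated_sync(ops: SuperOps) -> str:
    """What the periodic kupdated sync does for one superblock in this model."""
    if ops.kupdate_sync is not None:
        return ops.kupdate_sync()   # new method, if the filesystem provides it
    return ops.write_super()        # otherwise the regular write_super path


def make_reiserfs_ops(calls: List[str]) -> SuperOps:
    def flush_old_commits(immediate: bool) -> str:
        calls.append(f"flush_old_commits(immediate={immediate})")
        return "reiserfs: old commits flushed"

    return SuperOps(
        # Assumption: the ordinary write_super path keeps flushing in immediate mode.
        write_super=lambda: flush_old_commits(immediate=True),
        # Per the description: kupdated syncs flush with immediate mode off.
        kupdate_sync=lambda: flush_old_commits(immediate=False),
    )


calls: List[str] = []
print(kupdated_sync(make_reiserfs_ops(calls)), calls)
print(kupdated_sync(SuperOps(write_super=lambda: "write_super called")))
```

The indirection means filesystems without the new hook keep their existing write_super behaviour unchanged.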
The reiserfs improvements in 2.4.10 are great, but still not as good as
2.2.19 was.
I have run two benchmarks on:
the 2.4.9 kernel (plain, slow, starting point)
the 2.4.9 kernel (with kupdated disabled, this is where we want to be)
the 2.4.10 kernel (plain, quite fast though)
the 2.4.10 kernel with my patches.
The benchmarks are:
dbench 10 (done 4 times, with the first result discarded)
kernel compilation times (done twice, with the first result discarded)
The first result in all benchmarks is discarded because it is used to bring
the cache to a consistent state.
All benchmarks are run on the following machine:
Duron 700
VIA KT133 northbridge and 686A southbridge.
384meg RAM
40 gig IBM drive (7200rpm, GXP60)
The IBM drive has its internal write caching disabled, because the cache is
damned good :-) and hides the problems that my old drive had (I upgraded a
few days ago).
I was going to use an old Quantum 5400rpm drive for these benchmarks, but I
blew it up ;-) (I got fire!! on one of the chips and everything; I somehow
managed to plug the power cable into it backwards). Its smell is still
lingering as I write this. Could someone with a slow 5400rpm drive run these
tests and report back?
Anyway, enough yabbering, on to the results.
---- 2.4.9 (plain)
dbench: 25.6155, 24.4236, 26.05 MB/sec
kernel compile: 5:41.744 wall time, 4:43.880 user time, 0:16.380 sys time
---- 2.4.9 (kupdated off)
dbench: 33.763, 36.452, 32.0602 MB/sec
kernel compile: 5:07.967 wall time, 4:44.140 user time, 0:15.380 sys time
---- 2.4.10 (plain)
dbench: 35.3584, 31.1634, 32.3602 MB/sec
kernel compile: 5:21.458 wall time, 4:43.840 user time, 0:14.590 sys time
---- 2.4.10 (patched with attached patch)
dbench: 35.028, 33.6774, 38.2342 MB/sec
kernel compile: 5:04.640 wall time, 4:42.950 user time, 0:15.160 sys time
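To compare the runs above on one scale, here is a minimal sketch that averages the three kept dbench runs per kernel and expresses both metrics relative to plain 2.4.10. It assumes the compile times are minutes, seconds and milliseconds (e.g. 5 min 41.744 s for the plain 2.4.9 wall time); with only three dbench samples per kernel, differences of a few percent may be within run-to-run noise.

```python
from statistics import mean

# dbench 10 throughput in MB/s: the three kept runs per kernel, copied from above
dbench = {
    "2.4.9 plain": [25.6155, 24.4236, 26.05],
    "2.4.9 kupdated off": [33.763, 36.452, 32.0602],
    "2.4.10 plain": [35.3584, 31.1634, 32.3602],
    "2.4.10 patched": [35.028, 33.6774, 38.2342],
}

# Kernel compile wall time as (minutes, seconds); assumed reading of the figures above
compile_wall = {
    "2.4.9 plain": (5, 41.744),
    "2.4.9 kupdated off": (5, 7.967),
    "2.4.10 plain": (5, 21.458),
    "2.4.10 patched": (5, 4.640),
}


def to_seconds(min_sec):
    minutes, seconds = min_sec
    return minutes * 60 + seconds


def pct_change(value, reference):
    """Percentage change of value relative to reference."""
    return 100.0 * (value / reference - 1.0)


BASELINE = "2.4.10 plain"
base_tput = mean(dbench[BASELINE])
base_wall = to_seconds(compile_wall[BASELINE])

for kernel, runs in dbench.items():
    tput = mean(runs)
    wall = to_seconds(compile_wall[kernel])
    # Positive dbench change = more throughput; negative compile change = faster build
    print(f"{kernel:<20} dbench {tput:6.2f} MB/s ({pct_change(tput, base_tput):+5.1f}%)  "
          f"compile {wall:7.3f} s ({pct_change(wall, base_wall):+5.1f}%)")
```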
Conclusions:
The 2.4.10 kernel improved reiserfs performance a lot all by itself,
especially in dbench. In kernel compiles, however (maybe because dbench
doesn't stress kupdated much), it still isn't as fast as with my new patch.
Also, the performance problems seem to be very dependent on the hardware being
used. 5400rpm drives get hurt a lot, while 7200 rpm drives seem to handle it
better. Decent write caching on IDE devices (like the 2meg buffer on the IBM)
can completely hide this issue.
Thanks to everyone who has helped me so far, and I look forward to further
comments and assistance,
Beau Kuiper
kuib-kl@ljbc.wa.edu.au
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [reiserfs-list] [PATCH] 2.4.10 improved reiserfs a lot, but could still be better
2001-09-24 14:09 [PATCH] 2.4.10 improved reiserfs a lot, but could still be better Beau Kuiper
@ 2001-09-24 14:46 ` Chris Mason
2001-09-24 15:32 ` Matthias Andree
1 sibling, 0 replies; 30+ messages in thread
From: Chris Mason @ 2001-09-24 14:46 UTC
To: Beau Kuiper, linux-kernel; +Cc: reiserfs-list
On Monday, September 24, 2001 10:09:59 PM +0800 Beau Kuiper
<kuib-kl@ljbc.wa.edu.au> wrote:
> Hi all again,
>
> I have updated my last set of patches for reiserfs to run on the 2.4.10
> kernel.
>
> The new set of patches creates a new method for kupdated syncs.
> Filesystems that do not support this new method fall back to the regular
> write_super method. When kupdated syncs the superblock, reiserfs simply
> calls the flush_old_commits code with immediate mode off.
>
Ok, I think the patch is missing ;-)
What we need to do now is look more closely at why the performance
increases. There are a few possibilities:
1) larger transactions due to less frequent commits.
2) More efficient metadata writes due to less frequent calls to
reiserfs_journal_kupdate
3) Less time spent flushing direct->indirect targets due to less frequent
commits.
The good news is we can easily separate these. Start by running
debugreiserfs -j /dev/xxx > /tmp/foo
This prints out the transactions still in the log. You are looking for
j_len, which is the length of each transaction. The closer this is to ~900
or so, the more efficient the log is.
Q1) Does your patch increase the average length of the transactions?
Q2) Run the tests again with -o notail (including on pure 2.4.10). Does
the performance gain go down relative to pure 2.4.10?
If Q1 is true, we might be able to tune /proc/sys/vm/bdflush to have
similar benefits.
If Q2 is true, we need to tune the way direct->indirect targets get flushed
(this probably needs to be tuned regardless).
If neither is true, it is probably the less frequent calls to
reiserfs_journal_kupdate, also tunable through the bdflush params.
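For Q1, the quantity to compare is the average `j_len` in the dump. A minimal sketch, assuming each transaction in the `debugreiserfs -j` output carries a `j_len` field followed by its length on the same line; the exact output format is not shown here, so the regular expression is a guess to adapt. The ~900 target is the figure given above.

```python
import re
from statistics import mean
from typing import List

# ASSUMPTION: the dump contains "j_len" followed by a number on the same line,
# e.g. "j_len 812" or "j_len=812"; adjust to what your debugreiserfs prints.
J_LEN = re.compile(r"\bj_len\D{0,3}(\d+)")


def transaction_lengths(dump_text: str) -> List[int]:
    """Every j_len value found in a debugreiserfs -j dump, in order."""
    return [int(m.group(1)) for m in J_LEN.finditer(dump_text)]


def summarize(dump_text: str, target: int = 900) -> str:
    """Average transaction length, also as a share of the ~900 target."""
    lengths = transaction_lengths(dump_text)
    if not lengths:
        return "no j_len fields found; check the pattern against your dump"
    avg = mean(lengths)
    return (f"{len(lengths)} transactions, average j_len {avg:.1f} "
            f"({100 * avg / target:.0f}% of ~{target})")
```

Run it on the dump saved by the command above, e.g. `print(summarize(open("/tmp/foo").read()))`, once on plain 2.4.10 and once with the patch.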
I'm not saying we don't need your patch, but I'd like to find out for sure
why it is helping.
Thanks,
Chris
* Re: [PATCH] 2.4.10 improved reiserfs a lot, but could still be better
2001-09-24 14:09 [PATCH] 2.4.10 improved reiserfs a lot, but could still be better Beau Kuiper
2001-09-24 14:46 ` [reiserfs-list] " Chris Mason
@ 2001-09-24 15:32 ` Matthias Andree
2001-09-24 15:45 ` Alan Cox
2001-09-24 16:15 ` Nicholas Knight
1 sibling, 2 replies; 30+ messages in thread
From: Matthias Andree @ 2001-09-24 15:32 UTC
To: linux-kernel, reiserfs-list
On Mon, 24 Sep 2001, Beau Kuiper wrote:
> Also, the performance problems seem to be very dependent on the hardware being
> used. 5400rpm drives get hurt a lot, while 7200 rpm drives seem to handle it
> better. Decent write caching on IDE devices (like the 2meg buffer on the IBM)
> can completely hide this issue.
Decent write caching on IDE devices can eat your whole file system.
Turn it off (I have no idea of internals, but I presume it'll still be a
write-through cache, so reading back will still be served from the
buffer). Do hdparm -W0 /dev/hd[a-h].
One might consider adding TCQ to the IDE driver. FreeBSD already has it,
and IBM drives talk it.
* Re: [PATCH] 2.4.10 improved reiserfs a lot, but could still be better
2001-09-24 15:32 ` Matthias Andree
@ 2001-09-24 15:45 ` Alan Cox
2001-09-24 15:47 ` Matthias Andree
2001-09-24 16:15 ` Nicholas Knight
1 sibling, 1 reply; 30+ messages in thread
From: Alan Cox @ 2001-09-24 15:45 UTC
To: Matthias Andree; +Cc: linux-kernel, reiserfs-list
> > better. Decent write caching on IDE devices (like the 2meg buffer on the IBM)
> > can completely hide this issue.
>
> Decent write caching on IDE devices can eat your whole file system.
YM bad write caching 8)
> Turn it off (I have no idea of internals, but I presume it'll still be a
> write-through cache, so reading back will still be served from the
> buffer). Do hdparm -W0 /dev/hd[a-h].
You can't turn it off, and on many drives you can't flush the cache either;
the operation is not implemented.
* Re: [PATCH] 2.4.10 improved reiserfs a lot, but could still be better
2001-09-24 15:45 ` Alan Cox
@ 2001-09-24 15:47 ` Matthias Andree
2001-09-24 16:08 ` Alan Cox
0 siblings, 1 reply; 30+ messages in thread
From: Matthias Andree @ 2001-09-24 15:47 UTC
To: linux-kernel, reiserfs-list
On Mon, 24 Sep 2001, Alan Cox wrote:
> > > better. Decent write caching on IDE devices (like the 2meg buffer on the IBM)
> > > can completely hide this issue.
> >
> > Decent write caching on IDE devices can eat your whole file system.
>
> YM bad write caching 8)
Well, drives do reorder their cache flushes; otherwise, they wouldn't need
the cache.
> > Turn it off (I have no idea of internals, but I presume it'll still be a
> > write-through cache, so reading back will still be served from the
> > buffer). Do hdparm -W0 /dev/hd[a-h].
>
> You can't turn it off, and on many drives you can't flush the cache either;
> the operation is not implemented.
Those drives should be blacklisted and rejected as soon as someone tries
to mount those pieces rw. Either the drive can make guarantees when a
write to permanent storage has COMPLETED (either by switching off the
cache or by a flush operation) or it belongs ripped out of the boxes and
stuffed down the throat of the idiot who built it.
--
Matthias Andree
"Those who give up essential liberties for temporary safety deserve
neither liberty nor safety." - Benjamin Franklin
* Re: [PATCH] 2.4.10 improved reiserfs a lot, but could still be better
2001-09-24 15:47 ` Matthias Andree
@ 2001-09-24 16:08 ` Alan Cox
2001-09-24 16:08 ` [reiserfs-list] " Chris Dukes
2001-09-24 16:54 ` Matthias Andree
0 siblings, 2 replies; 30+ messages in thread
From: Alan Cox @ 2001-09-24 16:08 UTC
To: Matthias Andree; +Cc: linux-kernel, reiserfs-list
> Those drives should be blacklisted and rejected as soon as someone tries
> to mount those pieces rw. Either the drive can make guarantees when a
> write to permanent storage has COMPLETED (either by switching off the
> cache or by a flush operation) or it belongs ripped out of the boxes and
> stuffed down the throat of the idiot who built it.
In which case you can choose between ancient ST-506 drives and SCSI
Alan
* Re: [reiserfs-list] Re: [PATCH] 2.4.10 improved reiserfs a lot, but could still be better
2001-09-24 16:08 ` Alan Cox
@ 2001-09-24 16:08 ` Chris Dukes
2001-09-24 16:54 ` Matthias Andree
1 sibling, 0 replies; 30+ messages in thread
From: Chris Dukes @ 2001-09-24 16:08 UTC
To: Alan Cox; +Cc: Matthias Andree, linux-kernel, reiserfs-list
On Mon, Sep 24, 2001 at 05:08:10PM +0100, Alan Cox wrote:
> > Those drives should be blacklisted and rejected as soon as someone tries
> > to mount those pieces rw. Either the drive can make guarantees when a
> > write to permanent storage has COMPLETED (either by switching off the
> > cache or by a flush operation) or it belongs ripped out of the boxes and
> > stuffed down the throat of the idiot who built it.
>
> In which case you can choose between ancient ST-506 drives and SCSI
I thought that Andre had some information as to which devices were
compliant and which weren't.
--
Chris Dukes
The law is a code that isolates justice from public participation.
-- Stephen Marhall
* Re: [PATCH] 2.4.10 improved reiserfs a lot, but could still be better
2001-09-24 15:32 ` Matthias Andree
2001-09-24 15:45 ` Alan Cox
@ 2001-09-24 16:15 ` Nicholas Knight
2001-09-24 16:40 ` [reiserfs-list] " Lehmann
` (2 more replies)
1 sibling, 3 replies; 30+ messages in thread
From: Nicholas Knight @ 2001-09-24 16:15 UTC
To: Matthias Andree, linux-kernel, reiserfs-list
On Monday 24 September 2001 08:32 am, Matthias Andree wrote:
> On Mon, 24 Sep 2001, Beau Kuiper wrote:
> > Also, the performance problems seem to be very dependent on the
> > hardware being used. 5400rpm drives get hurt a lot, while 7200 rpm
> > drives seem to handle it better. Decent write caching on IDE
> > devices (like the 2meg buffer on the IBM) can completely hide this
> > issue.
>
> Decent write caching on IDE devices can eat your whole file system.
>
> Turn it off (I have no idea of internals, but I presume it'll still
> be a write-through cache, so reading back will still be served from
> the buffer). Do hdparm -W0 /dev/hd[a-h].
I'm sorry, but that's not acceptable.
Please note the dd timings at the bottom of this message.
This is consistent with real workloads on my systems and other people's:
writing 200-300MB+ files, even 1GB+ files, is at times ROUTINE. This
especially applies when dealing with disk images, etc.
Disabling the write cache makes the times twice as long as WITH the write
cache, or more. Unless the system or drive has a serious, SPECIFIC fault
with its write cache, disabling it can cause an unacceptable performance
hit.
Yes, a typical desktop user isn't going to notice much, even a normal
webserver or fileserver not dealing with constant updates may not, but
certain workloads will. These workloads are real enough that telling
people to disable write caching out of hand is a bad idea.
There's no way in hell I'm going to accept having my performance cut in
half or more on my daily workload due to the remote possibility that
something may happen to my data/filesystem in mid-write. That's what
the cheap little UPS sitting beside my desk that gives me ample time to
power down is for.
Short of a catastrophic power supply, motherboard, or other component
failure (which often stands a good chance of destroying the drive
anyway), a power failure is the main problem here.
Keep in mind also that you may be putting your data and filesystems at
more risk by not using a write cache than by using it.
(The timings below are all on an IBM 75GXP DTLA-307045, 46.11BB("GB")
7200RPM ATA/100 drive on an off-board Promise Ultra/100 (ATA/100)
controller on an 800MHz Athlon, 100MHz FSB, PC100 RAM CAS3@100MHz.)
root@c779218-a:/home/nnkk# hdparm -W0 /dev/hde
/dev/hde:
setting drive write-caching to 0 (off)
root@c779218-a:/home/nnkk# time dd if=/dev/zero of=test.zero bs=1024k count=128
128+0 records in
128+0 records out
real 0m18.178s
user 0m0.000s
sys 0m1.740s
root@c779218-a:/home/nnkk# rm -f test.zero
root@c779218-a:/home/nnkk# hdparm -W1 /dev/hde
/dev/hde:
setting drive write-caching to 1 (on)
root@c779218-a:/home/nnkk# time dd if=/dev/zero of=test.zero bs=1024k count=128
128+0 records in
128+0 records out
real 0m7.809s
user 0m0.020s
sys 0m1.890s
root@c779218-a:/home/nnkk# time dd if=/dev/zero of=test.zero bs=1024k count=256
256+0 records in
256+0 records out
real 0m44.328s
user 0m0.010s
sys 0m3.510s
root@c779218-a:/home/nnkk# hdparm -W1 /dev/hde
/dev/hde:
setting drive write-caching to 1 (on)
root@c779218-a:/home/nnkk# rm -f test.zero
root@c779218-a:/home/nnkk# time dd if=/dev/zero of=test.zero bs=1024k count=256
256+0 records in
256+0 records out
real 0m18.790s
user 0m0.000s
sys 0m3.780s
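Converting the `real` times above into throughput makes the comparison easier to read; with bs=1024k, count is the size in megabytes. The cache setting for the 44.328 s run is taken from the later follow-up in this thread, which reports the cache-off 256MB write at about 5.8MB/sec; the pasted log does not show a separate `hdparm -W0` step before that run.

```python
# (size in MB, "real" seconds, write cache setting) for the four dd runs above
runs = [
    (128, 18.178, "off"),
    (128, 7.809, "on"),
    (256, 44.328, "off"),  # setting per the later follow-up, see lead-in
    (256, 18.790, "on"),
]


def throughput(mb: float, seconds: float) -> float:
    """Average write rate in MB/s over the whole dd run."""
    return mb / seconds


for mb, real, cache in runs:
    print(f"{mb:4d} MB, write cache {cache:>3}: {throughput(mb, real):5.2f} MB/s")
```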
* Re: [reiserfs-list] Re: [PATCH] 2.4.10 improved reiserfs a lot, but could still be better
2001-09-24 16:15 ` Nicholas Knight
@ 2001-09-24 16:40 ` Lehmann
2001-09-24 16:53 ` Matthias Andree
2001-09-25 12:54 ` Jorge Nerín
2 siblings, 0 replies; 30+ messages in thread
From: Lehmann @ 2001-09-24 16:40 UTC
To: Nicholas Knight; +Cc: Matthias Andree, linux-kernel, reiserfs-list
On Mon, Sep 24, 2001 at 09:15:19AM -0700, Nicholas Knight <tegeran@home.com> wrote:
[turning off write-cache]
> I'm sorry, but that's not acceptable.
(I had it turned off for a long time, until I reasoned: real power outages
are very rare, so I can leave it turned on anyway and risk a
filesystem check after outages.)
The reason this ALWAYS kills performance is that IDE does not support large
enough transfer sizes (8-32k on most drives) to fill one track.
Turning off write caching has a good chance of lowering your transaction
throughput to the drive's RPM (one transaction per revolution). Combined with
Linux's not-that-optimal elevator and write behaviour, this has a good chance
of costing a lot of performance.
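The RPM limit above gives a simple worst-case bound: if every write has to wait for the platter to come round again, the drive completes at most one request per revolution, so throughput is capped at revolutions per second times request size. A back-of-the-envelope sketch under that assumption, using the 8-32k transfer sizes quoted above:

```python
def one_request_per_rev_mb_s(rpm: int, request_kb: int) -> float:
    """Upper bound in MB/s if each request costs one full revolution (worst case)."""
    revs_per_second = rpm / 60
    return revs_per_second * request_kb / 1024


for rpm in (5400, 7200):
    for kb in (8, 32):
        print(f"{rpm} rpm, {kb:2d} KB per request: "
              f"<= {one_request_per_rev_mb_s(rpm, kb):.2f} MB/s")
```

This is deliberately pessimistic: the cache-off dd figures posted earlier in the thread are above the 7200 rpm, 32 KB bound, so in practice not every request costs a full revolution, or requests are larger than assumed here.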
TCQ will obviously help, but I somehow doubt it will work fine - even with
SCSI, TCQ is a nightmare (the aic7xxx driver regularly kills my system if
tagged queueing is enabled, for example). IDE currently is a mess (I do
_not_ expect my drive performance to simply halve just because two devices
share the bus, even if this is how conservative IDE is destined to
work).
I am convinced that there is a way of creating a hard write barrier (e.g.
a cache flush that waits) with most if not all IDE disks - putting them
into powersave should work, if nothing else ;)
So apart from driver issues (such as TCQ), the mid-layer needs to be improved
(and plans already exist) to support semi-ordered writes and give as much
control over the device cache as possible.
Not to mention that the VM needs improvements here as well.
I didn't say much more than Alan implied: we have to live with it, so we
better think about making it work.
--
-----==- |
----==-- _ |
---==---(_)__ __ ____ __ Marc Lehmann +--
--==---/ / _ \/ // /\ \/ / pcg@goof.com |e|
-=====/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+
The choice of a GNU generation |
|
* Re: [PATCH] 2.4.10 improved reiserfs a lot, but could still be better
2001-09-24 16:15 ` Nicholas Knight
2001-09-24 16:40 ` [reiserfs-list] " Lehmann
@ 2001-09-24 16:53 ` Matthias Andree
2001-09-24 16:57 ` [reiserfs-list] " Lehmann
2001-09-24 20:05 ` Nicholas Knight
2001-09-25 12:54 ` Jorge Nerín
2 siblings, 2 replies; 30+ messages in thread
From: Matthias Andree @ 2001-09-24 16:53 UTC
To: linux-kernel, reiserfs-list
On Mon, 24 Sep 2001, Nicholas Knight wrote:
> > Turn it off (I have no idea of internals, but I presume it'll still
> > be a write-through cache, so reading back will still be served from
> > the buffer). Do hdparm -W0 /dev/hd[a-h].
>
> I'm sorry, but that's not acceptable.
> Please note the dd timings at the bottom of this message.
Well, of course, turning off the cache will cause performance penalties,
but it at least gives you a chance to get away with a recoverable file
system should the power fail or the box crash.
> Yes, a typical desktop user isn't going to notice much, even a normal
> webserver or fileserver not dealing with constant updates may not, but
> certain workloads will. These workloads are real enough that telling
> people to disable write caching out of hand is a bad idea.
I switched a box to ext3 with write caches off in expectation of multiple
power outages during works, and NOTHING happened. I expect that box is
now writing 4 times slower than before (I have no real figures), and it's
still "smooth enough" in spite of 2.4.9.
> Keep in mind also that you may be putting your data and filesystems at
> more risk by not using a write cache than by using it.
Utter nonsense.
Linear writing as dd mostly does is BTW something which should never be
affected by write caches.
* Re: [PATCH] 2.4.10 improved reiserfs a lot, but could still be better
2001-09-24 16:08 ` Alan Cox
2001-09-24 16:08 ` [reiserfs-list] " Chris Dukes
@ 2001-09-24 16:54 ` Matthias Andree
1 sibling, 0 replies; 30+ messages in thread
From: Matthias Andree @ 2001-09-24 16:54 UTC
To: linux-kernel, reiserfs-list
On Mon, 24 Sep 2001, Alan Cox wrote:
> > Those drives should be blacklisted and rejected as soon as someone tries
> > to mount those pieces rw. Either the drive can make guarantees when a
> > write to permanent storage has COMPLETED (either by switching off the
> > cache or by a flush operation) or it belongs ripped out of the boxes and
> > stuffed down the throat of the idiot who built it.
>
> In which case you can choose between ancient ST-506 drives and SCSI
Sorry, a disk drive which makes no guarantees even after a flush, does
not belong in my boxen. I'd return it as broken the first day I figured
it did lazy write-back caching. No file system can be safe on such
disks.
* Re: [reiserfs-list] Re: [PATCH] 2.4.10 improved reiserfs a lot, but could still be better
2001-09-24 16:53 ` Matthias Andree
@ 2001-09-24 16:57 ` Lehmann
2001-09-25 14:04 ` bill davidsen
2001-09-24 20:05 ` Nicholas Knight
1 sibling, 1 reply; 30+ messages in thread
From: Lehmann @ 2001-09-24 16:57 UTC
To: linux-kernel, reiserfs-list
On Mon, Sep 24, 2001 at 06:53:03PM +0200, Matthias Andree <matthias.andree@stud.uni-dortmund.de> wrote:
> Linear writing as dd mostly does is BTW something which should never be
> affected by write caches.
A write cache can and will speed up linear writes on typical IDE setups.
* Re: [PATCH] 2.4.10 improved reiserfs a lot, but could still be better
2001-09-24 16:53 ` Matthias Andree
2001-09-24 16:57 ` [reiserfs-list] " Lehmann
@ 2001-09-24 20:05 ` Nicholas Knight
2001-09-25 0:11 ` Matthias Andree
1 sibling, 1 reply; 30+ messages in thread
From: Nicholas Knight @ 2001-09-24 20:05 UTC
To: Matthias Andree, linux-kernel, reiserfs-list
On Monday 24 September 2001 09:53 am, Matthias Andree wrote:
> On Mon, 24 Sep 2001, Nicholas Knight wrote:
> > > Turn it off (I have no idea of internals, but I presume it'll
> > > still be a write-through cache, so reading back will still be
> > > served from the buffer). Do hdparm -W0 /dev/hd[a-h].
> >
> > I'm sorry, but that's not acceptable.
> > Please note the dd timings at the bottom of this message.
>
> Well, of course, turning off the cache will cause performance
> penalties, but it at least gives you a chance to get away with a
> recoverable file system should the power fail or the box crash.
Would you like to read the rest of my message, please? Cheap UPSes can
provide protection against power failures. If your data is that
valuable, you can afford a cheap UPS to give you 5 minutes to shut down.
>
> > Yes, a typical desktop user isn't going to notice much, even a
> > normal webserver or fileserver not dealing with constant updates
> > may not, but certain workloads will. These workloads are real
> > enough that telling people to disable write caching out of hand is
> > a bad idea.
>
> I switched a box to ext3 with write caches off in expectation of
> multiple power outages during works, and NOTHING happened. I expect
> that box is now writing 4 times slower than before (I have no real
> figures), and it's still "smooth enough" in spite of 2.4.9.
Would you like to read it AGAIN? I specifically said that MOST people
would not notice a real difference.
>
> > Keep in mind also that you may be putting your data and
> > filesystems at more risk by not using a write cache than by using
> > it.
>
> Utter nonsense.
>
> Linear writing as dd mostly does is BTW something which should never
> be affected by write caches.
Explain the numbers then.
I followed *YOUR* instructions for disabling write caching.
* Re: [PATCH] 2.4.10 improved reiserfs a lot, but could still be better
2001-09-24 20:05 ` Nicholas Knight
@ 2001-09-25 0:11 ` Matthias Andree
2001-09-25 4:49 ` Nicholas Knight
2001-09-25 14:47 ` Alex Bligh - linux-kernel
0 siblings, 2 replies; 30+ messages in thread
From: Matthias Andree @ 2001-09-25 0:11 UTC
To: linux-kernel, reiserfs-list
On Mon, 24 Sep 2001, Nicholas Knight wrote:
> Would you like to read the rest of my message, please? Cheap UPSes can
> provide protection against power failures. If your data is that
> valuable, you can afford a cheap UPS to give you 5 minutes to shut down.
No UPS can protect you from system crashes. The problem is, with the
drive cache on, the drive will acknowledge having written the data early
and reorder its writes, but who makes guarantees it can write its whole
2 MB to disk should the power fail? No-one. ATA6 drafts have a NOTE that
says, the FLUSH CACHE command may take longer than 30 s to complete.
Journalling File systems don't get you anywhere if the drive reorders
its blocks before the write (I presume, most will do), they may instead
turn the whole partition to junk without notice, because any assumptions
as to the on-disk structure don't hold.
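The failure mode described above can be made concrete by enumerating crash points. A toy sketch, assuming a transaction is a few log blocks followed by a commit block, and that a power failure leaves on disk exactly the blocks the drive has already written, i.e. a prefix of the drive's own write order. The filesystem issues the commit block last; the reordered sequence is a hypothetical order a write-back cache might choose after acknowledging everything early.

```python
from typing import List, Sequence


def unsafe_crash_points(drive_order: Sequence[str], log_blocks: Sequence[str],
                        commit_block: str = "commit") -> List[List[str]]:
    """Every on-disk state (prefix of drive_order) in which the commit block is
    present but at least one of its log blocks is not. Replaying such a journal
    would apply a transaction whose contents never reached the disk."""
    bad = []
    for n in range(len(drive_order) + 1):
        on_disk = set(drive_order[:n])
        if commit_block in on_disk and not set(log_blocks) <= on_disk:
            bad.append(list(drive_order[:n]))
    return bad


log = ["log1", "log2", "log3"]
in_order = log + ["commit"]                      # the order the filesystem issued
reordered = ["log1", "commit", "log3", "log2"]   # hypothetical cache reordering

print(unsafe_crash_points(in_order, log))   # []
print(unsafe_crash_points(reordered, log))  # [['log1', 'commit'], ['log1', 'commit', 'log3']]
```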
> > Linear writing as dd mostly does is BTW something which should never
> > be affected by write caches.
>
> Explain the numbers then.
I can't, any explanation right now would be conjecture. I can reproduce
the numbers on my IBM DTLA-307045 (Promise) and on my Western Digital
CAC420400D (VIA KT133, the disk looks like an IBM DJNA-352030 OEM,
though).
However, would you care to elaborate how switching OFF the cache should
harm data, provided you don't need to cater for power outages (UPS
attached, e. g.)?
hdparm:
" -W Disable/enable the IDE drive's write-caching feature (usually OFF by default)."
> I followed *YOUR* instructions for disabling write caching.
No-one doubts you did. I said it's weird that the drive write cache has
an impact on dd figures. It may be worthwhile to investigate this, but
again, any attempt to explain this would be a guess.
It may be an implementation problem in our IBM drives, which ship with
their write caches enabled. Could someone please run this test on current
Fujitsu, Maxtor or Seagate IDE drives, or with different controllers?
It would suffice if the kernel could flush the drive's buffers on
fsync() and other synchronous operations, but a flush command has only
recently appeared in the ATA standards, as it seems. I only have drafts
here, ATA 3 draft rev. 6 did not offer any command to flush the cache,
ATA 6 draft makes it mandatory for all devices that do offer a PACKET
interface. Not sure about the actual ATA 3, 4, or 5 standards.
Why are disk drives slower with their caches disabled on LINEAR writes?
* Re: [PATCH] 2.4.10 improved reiserfs a lot, but could still be better
2001-09-25 0:11 ` Matthias Andree
@ 2001-09-25 4:49 ` Nicholas Knight
2001-09-25 6:00 ` Beau Kuiper
2001-09-25 10:42 ` Matthias Andree
2001-09-25 14:47 ` Alex Bligh - linux-kernel
1 sibling, 2 replies; 30+ messages in thread
From: Nicholas Knight @ 2001-09-25 4:49 UTC
To: Matthias Andree, linux-kernel, reiserfs-list
On Monday 24 September 2001 05:11 pm, Matthias Andree wrote:
> On Mon, 24 Sep 2001, Nicholas Knight wrote:
> > Would you like to read the rest of my message, please? Cheap UPSes
> > can provide protection against power failures. If your data is that
> > valuable, you can afford a cheap UPS to give you 5 minutes to shut
> > down.
>
> No UPS can protect you from system crashes. The problem is, with the
> drive cache on, the drive will acknowledge having written the data
> early and reorder its writes, but who makes guarantees it can write
> its whole 2 MB to disk should the power fail? No-one. ATA6 drafts
> have a NOTE that says, the FLUSH CACHE command may take longer than
> 30 s to complete.
>
> Journalling File systems don't get you anywhere if the drive reorders
> its blocks before the write (I presume, most will do), they may
> instead turn the whole partition to junk without notice, because any
> assumptions as to the on-disk structure don't hold.
>
> > > Linear writing as dd mostly does is BTW something which should
> > > never be affected by write caches.
> >
> > Explain the numbers then.
>
> I can't, any explanation right now would be conjecture. I can
> reproduce the numbers on my IBM DTLA-307045 (Promise) and on my
> Western Digital CAC420400D (VIA KT133, the disk looks like an IBM
> DJNA-352030 OEM, though).
>
> However, would you care to elaborate how switching OFF the cache
> should harm data, provided you don't need to cater for power outages
> (UPS attached, e. g.)?
It's a very remote possibility of failure, like most instances where
write caching would cause problems. Catastrophic failure of the IDE cable
in mid-write will cause problems. If the write cache is enabled, the write
stands a higher chance of having made it to the drive before the cable
died; with it off, it stands a higher chance of NOT having made it
entirely to the drive.
For most drives, I don't know for sure if they'd finish the write
that's now sitting in their cache, but I expect higher quality drives
(such as our IBM drives) definitely would. In fact, I may even be willing
to test this later (my swap partition looks like it wants to help :)
>
> hdparm:
>
> " -W Disable/enable the IDE drive's write-caching feature (usually OFF by default)."
>
> > I followed *YOUR* instructions for disabling write caching.
>
> No-one doubts you did. I said it's weird that the drive write cache
> has an impact on dd figures. It may be worthwhile to investigate
> this, but again, any attempt to explain this would be a guess.
>
> It may be an implementation problem in our IBM drives, which ship with
> their write caches enabled. Could someone please run this test on current
> Fujitsu, Maxtor or Seagate IDE drives, or with different controllers?
Either Maxtor or Western Digital shares a very close design with IBM
drives; I believe they had some sort of development partnership. I'm not
sure if it was Maxtor or WD.
>
> It would suffice if the kernel could flush the drive's buffers on
> fsync() and other synchronous operations, but a flush command has
> only recently appeared in the ATA standards, as it seems. I only have
> drafts here, ATA 3 draft rev. 6 did not offer any command to flush
> the cache, ATA 6 draft makes it mandatory for all devices that do
> offer a PACKET interface. Not sure about the actual ATA 3, 4, or 5
> standards.
>
> Why are disk drives slower with their caches disabled on LINEAR
> writes?
Maybe the cache isn't doing what we think it is?
You're right, now that I'm thinking about it, it doesn't make a whole
lot of sense. The cache on our IBM's is just 2MB.
Does anyone have contacts at IBM and/or Western Digital? Something's
up... The 256MB write with write-cache off was going at 5.8MB/sec, and
with it on it was going at 14.22MB/sec (averages). One interesting
thing, the timings are showing a pretty consistent but tiny increase in
sys time with write caching on.
* Re: [PATCH] 2.4.10 improved reiserfs a lot, but could still be better
2001-09-25 4:49 ` Nicholas Knight
@ 2001-09-25 6:00 ` Beau Kuiper
2001-09-25 6:17 ` Nicholas Knight
2001-09-25 10:44 ` Matthias Andree
2001-09-25 10:42 ` Matthias Andree
1 sibling, 2 replies; 30+ messages in thread
From: Beau Kuiper @ 2001-09-25 6:00 UTC
To: Nicholas Knight; +Cc: Matthias Andree, linux-kernel, reiserfs-list
On Mon, 24 Sep 2001, Nicholas Knight wrote:
> On Monday 24 September 2001 05:11 pm, Matthias Andree wrote:
> > On Mon, 24 Sep 2001, Nicholas Knight wrote:
> > > Would you like to read the rest of my message, please? Cheap UPSes
> > > can provide protection against power failures. If your data is that
> > > valuable, you can afford a cheap UPS to give you 5 minutes to shut
> > > down.
> >
> > No UPS can protect you from system crashes. The problem is, with the
> > drive cache on, the drive will acknowledge having written the data
> > early and reorder its writes, but who makes guarantees it can write
> > its whole 2 MB to disk should the power fail? No-one. ATA6 drafts
> > have a NOTE that says, the FLUSH CACHE command may take longer than
> > 30 s to complete.
> >
> > Journalling File systems don't get you anywhere if the drive reorders
> > its blocks before the write (I presume, most will do), they may
> > instead turn the whole partition to junk without notice, because any
> > assumptions as to the on-disk structure don't hold.
> >
> > > > Linear writing as dd mostly does is BTW something which should
> > > > never be affected by write caches.
> > >
> > > Explain the numbers then.
> >
> > I can't, any explanation right now would be conjecture. I can
> > reproduce the numbers on my IBM DTLA-307045 (Promise) and on my
> > Western Digital CAC420400D (VIA KT133, the disk looks like an IBM
> > DJNA-352030 OEM, though).
> >
> > However, would you care to elaborate on how switching OFF the cache
> > should harm data, provided you don't need to cater for power outages
> > (UPS attached, e. g.)?
>
> It's a very remote possibility of failure, like most instances where
> write-cache would cause problems. Catastrophic failure of the IDE cable
> in mid-write will cause problems. If write cache is enabled, the write
> stands a higher chance of having made it to the drive before the cable
> died; with it off, it stands a higher chance of NOT having made it
> entirely to the drive.
Catastrophic failure of the IDE cable???
What are you doing to the poor thing, jumping on it?
Anyway, with a UPS, the issue of IDE device write caching is fairly moot.
As long as power is applied, any write issued to the drive will be
completed regardless of whether write caching is on or off. I am fairly
certain the write caching is pretty conservative; that is, the drive writes
as soon as possible after elevator-sorting the data with other write
requests, giving read requests priority.
I can imagine how it improves sequential write performance too. With the
write cache off, the computer cannot send another write request to the IDE
device until the last one has finished. By the time the computer is told
the request has finished and has sent a new request to the drive, the
platter will have rotated past the spot where the next block was supposed
to go. The drive will then have to wait for the platter to spin all the way
around again before doing the write. With the write cache enabled, several
requests can be placed into the drive buffer and written in a single
revolution of the platter.
Beau Kuiper
kuib-kl@ljbc.wa.edu.au
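Beau's rotational argument can be put into rough numbers. The sketch below is a toy model under stated assumptions, not a description of how the IBM firmware actually behaves: the request size, data per track and host turnaround time are hypothetical values chosen for illustration, and the function name `seq_write_mib_s` is made up. With the cache off, the model charges the lost revolution(s) per request; with it on, the next request is assumed to be already queued in the drive buffer.

```python
# Toy model of sequential writes with the drive's write cache off vs. on.
# All drive parameters are assumptions for illustration, not measurements
# of the drives discussed in this thread.
import math


def seq_write_mib_s(request_kib, rpm, track_kib, turnaround_ms, write_cache):
    """Estimate sustained MiB/s for back-to-back sequential write requests.

    request_kib:   size of each request the host sends (hypothetical)
    track_kib:     data passing under the head per revolution (hypothetical)
    turnaround_ms: host delay between a completion and the next command
    """
    rev_ms = 60_000 / rpm                       # one revolution in ms
    xfer_ms = request_kib / track_kib * rev_ms  # time the head spends writing
    if write_cache:
        # The next request is already buffered, so the media only idles
        # if the host itself is slower than the platter.
        per_req_ms = max(xfer_ms, turnaround_ms)
    else:
        # The next sector slides past during the turnaround, so the drive
        # idles for whole revolutions until it comes back under the head.
        idle_ms = math.ceil(turnaround_ms / rev_ms) * rev_ms if turnaround_ms > 0 else 0.0
        per_req_ms = xfer_ms + idle_ms
    return (request_kib / 1024) / (per_req_ms / 1000)


print(seq_write_mib_s(64, 7200, 256, 0.5, write_cache=False))
print(seq_write_mib_s(64, 7200, 256, 0.5, write_cache=True))
```

With these made-up numbers the model gives about 6 MiB/s uncached and 30 MiB/s cached. That overstates the roughly 2.5x gap measured in this thread (5.8 vs. 14.22 MB/sec), since real drives also lose time to track switches and command overhead that the model ignores.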
* Re: [PATCH] 2.4.10 improved reiserfs a lot, but could still be better
2001-09-25 6:00 ` Beau Kuiper
@ 2001-09-25 6:17 ` Nicholas Knight
2001-09-25 10:44 ` Matthias Andree
1 sibling, 0 replies; 30+ messages in thread
From: Nicholas Knight @ 2001-09-25 6:17 UTC (permalink / raw)
To: Beau Kuiper; +Cc: Matthias Andree, linux-kernel, reiserfs-list
On Monday 24 September 2001 11:00 pm, Beau Kuiper wrote:
> On Mon, 24 Sep 2001, Nicholas Knight wrote:
> > It's a very remote possibility of failure, like most instances
> > where write-cache would cause problems. Catastrophic failure of the
> > IDE cable in mid-write will cause problems. If write cache is
> > enabled, the write stands a higher chance of having made it to the
> > drive before the cable died; with it off, it stands a higher chance
> > of NOT having made it entirely to the drive.
>
> Catastrophic failure of the IDE cable???
> What are you doing to the poor thing, jumping on it?
Actually, jumping on it while it's flat probably wouldn't cause too
many problems... :P
It's not necessarily that the cable failed; that was probably a poor
choice of words on my part. The case I was thinking of was the cable
being pulled out of the drive while it's in operation. I've seen cables
pulled and nearly pulled out of drives in operation; it can happen with a
slip of your hand while checking a fan. I poke around in my system while
it's on all the time. Luckily I've yet to pull out the cable on my own
system, but I've seen it done.
>
> Anyway, with a UPS, the issue of IDE device write caching is fairly
> moot. As long as power is applied, any write issued to the drive will
> be completed regardless of whether write caching is on or off. I am
This was my point in all this, really: aside from failure of the drive
itself, add a cheap UPS and the only risk is that your HDD controller
decides to write garbage :)
> fairly certain the write caching is pretty conservative; that is, the
> drive writes as soon as possible after elevator-sorting the data with
> other write requests, giving read requests priority.
>
> I can imagine how it improves sequential write performance too. With
> the write cache off, the computer cannot send another write request
> to the IDE device until the last one has finished. By the time the
> computer is told the request has finished and has sent a new request
> to the drive, the platter will have rotated past the spot where the
> next block was supposed to go. The drive will then have to wait for
> the platter to spin all the way around again before doing the write.
> With the write cache enabled, several requests can be placed into the
> drive buffer and written in a single revolution of the platter.
OK, now THIS makes sense! Thanks.
>
> Beau Kuiper
> kuib-kl@ljbc.wa.edu.au
* Re: [PATCH] 2.4.10 improved reiserfs a lot, but could still be better
2001-09-25 4:49 ` Nicholas Knight
2001-09-25 6:00 ` Beau Kuiper
@ 2001-09-25 10:42 ` Matthias Andree
2001-09-25 11:07 ` Nicholas Knight
1 sibling, 1 reply; 30+ messages in thread
From: Matthias Andree @ 2001-09-25 10:42 UTC (permalink / raw)
To: linux-kernel, reiserfs-list
On Mon, 24 Sep 2001, Nicholas Knight wrote:
> It's a very remote possibility of failure, like most instances where
> write-cache would cause problems. Catastrophic failure of the IDE cable
> in mid-write will cause problems. If write cache is enabled, the write
> stands a higher chance of having made it to the drive before the cable
> died; with it off, it stands a higher chance of NOT having made it
> entirely to the drive.
Cables don't suddenly die without the help of e. g. your CPU fan.
> For most drives, I don't know for sure if they'd finish the write
> that's now sitting in their cache, but I expect higher quality drives
> (such as our IBM drives) definitely would. In fact I may even be willing
> to test this later (my swap partition looks like it wants to help :)
Drives would not write incomplete blocks.
> > It may be an implementation problem in our IBM drives which ship with
> > their write caches enabled, someone please do this test on current
> > Fujitsu, Maxtor or Seagate IDE drives or with different controllers.
>
> Either Maxtor or Western Digital shares very close designs with IBM
> drives; I believe they had some sort of development partnership. I'm not
> sure if it was Maxtor or WD.
The Western Digital 420400D (20 GB, 5400/min) and its 7200/min brother
with 18 GBs were supposedly IBM disk drives, but the WD ...AA/BB drives
and whatever else there was looked somewhat different from IBM drives.
> > Why are disk drives slower with their caches disabled on LINEAR
> > writes?
>
> Maybe the cache isn't doing what we think it is?
Maybe. Monitoring software or a debug mode would be good to see when writes
are scheduled and which blocks are written (I need to ask a friend of
mine who hacked ll_rw_blk.c for a different purpose for his diploma
thesis; maybe his code is valuable to figure things out.)
> Does anyone have contacts at IBM and/or Western Digital? Something's
> up... The 256MB write with write-cache off was going at 5.8MB/sec, and
> with it on it was going at 14.22MB/sec (averages). One interesting
> thing, the timings are showing a pretty consistent but tiny increase in
> sys time with write caching on.
I also saw that here, but again, it's basically the same hardware.
--
Matthias Andree
"Those who give up essential liberties for temporary safety deserve
neither liberty nor safety." - Benjamin Franklin
* Re: [PATCH] 2.4.10 improved reiserfs a lot, but could still be better
2001-09-25 6:00 ` Beau Kuiper
2001-09-25 6:17 ` Nicholas Knight
@ 2001-09-25 10:44 ` Matthias Andree
2001-09-25 11:01 ` ben-lists
1 sibling, 1 reply; 30+ messages in thread
From: Matthias Andree @ 2001-09-25 10:44 UTC (permalink / raw)
To: linux-kernel, reiserfs-list
On Tue, 25 Sep 2001, Beau Kuiper wrote:
> I can imagine how it improves sequential write performance too. With the
> write cache off, the computer cannot send another write request to the IDE
> device until the last one has finished. By the time the computer is told
> the request has finished and has sent a new request to the drive, the
> platter will have rotated past the spot where the next block was supposed
> to go. The drive will then have to wait for the platter to spin all the
> way around again before doing the write. With the write cache enabled,
> several requests can be placed into the drive buffer and written in a
> single revolution of the platter.
Might be an explanation. How big are the chunks of data that the
kernel sends to the disk?
--
Matthias Andree
"Those who give up essential liberties for temporary safety deserve
neither liberty nor safety." - Benjamin Franklin
* Re: [PATCH] 2.4.10 improved reiserfs a lot, but could still be better
2001-09-25 10:44 ` Matthias Andree
@ 2001-09-25 11:01 ` ben-lists
0 siblings, 0 replies; 30+ messages in thread
From: ben-lists @ 2001-09-25 11:01 UTC (permalink / raw)
To: linux-kernel, reiserfs-list
On Tue, 25 Sep 2001, Matthias Andree wrote:
> > before doing the write. With the write cache enabled, several requests can
> > be placed into the drive buffer and written in a single revolution of
> > the platter.
>
> Might be an explanation. How big are the chunks of data that the
> kernel sends to the disk?
I would think that if you send enough data, the drive's cache would be
full and speed would drop to that of the data going to the disk itself. So
the drive must be able to write to the disk at the speed that the OS sends
the data, even with write cache on. So maybe the speed difference is
caused by the protocol: when the write cache is off, the system has to wait
longer for each write to be ack'd by the drive (-> throughput goes down).
/Benno
--
Sebastian Benoit <ben-lists@andastra.de>
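Benno's reasoning can be checked with a small model: while the host sends faster than the platter absorbs, the cache fills at the difference of the two rates, and once it is full the host is throttled to the media rate. The rates passed in are hypothetical; only the 2 MB cache size comes from this thread, and `host_visible_mb_s` is an illustrative name, not an existing tool.

```python
def host_visible_mb_s(total_mb, cache_mb, host_mb_s, media_mb_s):
    """Throughput the host sees for one large write through a drive cache.

    Simplified model: the cache fills at (host - media) MB/s while draining
    to the platter at media speed; once full, the host is throttled to the
    media rate. The final drain after the last ack is not counted.
    """
    if host_mb_s <= media_mb_s:
        return host_mb_s                           # cache never fills up
    fill_s = cache_mb / (host_mb_s - media_mb_s)   # time until the cache is full
    written_while_filling = host_mb_s * fill_s
    if total_mb <= written_while_filling:
        return host_mb_s                           # whole write fits in the burst
    rest_s = (total_mb - written_while_filling) / media_mb_s
    return total_mb / (fill_s + rest_s)


print(host_visible_mb_s(256, 2, 100, 14))
```

For a 256 MB write through a 2 MB cache with a hypothetical 100 MB/s host and 14 MB/s media rate, the host sees about 14.1 MB/s. In this model the cache alone only hides the first couple of megabytes, so a 2x difference on long writes has to come from how well the drive keeps the media busy, not from buffering capacity.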
* Re: [PATCH] 2.4.10 improved reiserfs a lot, but could still be better
2001-09-25 10:42 ` Matthias Andree
@ 2001-09-25 11:07 ` Nicholas Knight
0 siblings, 0 replies; 30+ messages in thread
From: Nicholas Knight @ 2001-09-25 11:07 UTC (permalink / raw)
To: Matthias Andree, linux-kernel, reiserfs-list
On Tuesday 25 September 2001 03:42 am, Matthias Andree wrote:
> On Mon, 24 Sep 2001, Nicholas Knight wrote:
> > It's a very remote possibility of failure, like most instances
> > where write-cache would cause problems. Catastrophic failure of the
> > IDE cable in mid-write will cause problems. If write cache is
> > enabled, the write stands a higher chance of having made it to the
> > drive before the cable died; with it off, it stands a higher chance
> > of NOT having made it entirely to the drive.
>
> Cables don't suddenly die without the help of e. g. your CPU fan.
I explained in another message the situation I was thinking of,
accidental pulling of the cable.
>
> > For most drives, I don't know for sure if they'd finish the write
> > that's now sitting in their cache, but I expect higher quality
> > drives (such as our IBM drives) definitely would. In fact I may even
> > be willing to test this later (my swap partition looks like it
> > wants to help :)
>
> Drives would not write incomplete blocks.
Not what I meant. I meant that if a write gets to the drive completely,
and part of it is still sitting in the cache, I'd think the drive would
continue to write it out as long as it has power. I wasn't referring to
the write being partially down the cable.
> >
> > Either Maxtor or Western Digital shares very close designs with IBM
> > drives; I believe they had some sort of development partnership. I'm
> > not sure if it was Maxtor or WD.
>
> The Western Digital 420400D (20 GB, 5400/min) and its 7200/min
> brother with 18 GBs were supposedly IBM disk drives, but the WD
> ...AA/BB drives and whatever else there was looked somewhat different
> from IBM drives.
>
> > > Why are disk drives slower with their caches disabled on LINEAR
> > > writes?
> >
> > Maybe the cache isn't doing what we think it is?
>
> Maybe. Monitoring software or a debug mode would be good to see when
> writes are scheduled and which blocks are written (I need to ask a
> friend of mine who hacked ll_rw_blk.c for a different purpose for his
> diploma thesis; maybe his code is valuable to figure things out.)
>
> > Does anyone have contacts at IBM and/or Western Digital?
> > Something's up... The 256MB write with write-cache off was going at
> > 5.8MB/sec, and with it on it was going at 14.22MB/sec (averages).
> > One interesting thing, the timings are showing a pretty consistent
> > but tiny increase in sys time with write caching on.
>
> I also saw that here, but again, it's basically the same hardware.
* Re: [PATCH] 2.4.10 improved reiserfs a lot, but could still be better
2001-09-24 16:15 ` Nicholas Knight
2001-09-24 16:40 ` [reiserfs-list] " Lehmann
2001-09-24 16:53 ` Matthias Andree
@ 2001-09-25 12:54 ` Jorge Nerín
2001-09-25 13:06 ` [reiserfs-list] " Chris Mason
2001-09-25 13:17 ` Matthias Andree
2 siblings, 2 replies; 30+ messages in thread
From: Jorge Nerín @ 2001-09-25 12:54 UTC (permalink / raw)
To: tegeran; +Cc: Matthias Andree, linux-kernel, reiserfs-list
Nicholas Knight wrote:
>On Monday 24 September 2001 08:32 am, Matthias Andree wrote:
>
>>On Mon, 24 Sep 2001, Beau Kuiper wrote:
>>
>>>Also, the performance problems seem to be very dependent on the
>>>hardware being used. 5400rpm drives get hurt a lot, while 7200 rpm
>>>drives seem to handle it better. Decent write caching on IDE
>>>devices (like the 2meg buffer on the IBM) can completely hide this
>>>issue.
>>>
>>Decent write caching on IDE devices can eat your whole file system.
>>
>>Turn it off (I have no idea of internals, but I presume it'll still
>>be a write-through cache, so reading back will still be served from
>>the buffer). Do hdparm -W0 /dev/hd[a-h].
>>
>
>I'm sorry, but that's not acceptable.
>Please note the dd timings at the bottom of this message.
>
>This is consistant with real workload on my and other peoples systems,
>200-300MB+ files, clear up to 1GB+ files are at times ROUTINE for
>writing. This is esspecialy applicable if dealing with disk images, etc.
>Disabling write cache creates times that are twice as large or more
>than WITH write cache. Unless the system or drive has a serious,
>SPECIFIC fault with its write cache, disabling it can cause an
>unacceptable performance hit.
>
>Yes, a typical desktop user isn't going to notice much, even a normal
>webserver or fileserver not dealing with constant updates may not, but
>certain workloads will. These workloads are real enough that telling
>people to disable write caching out of hand is a bad idea.
>There's no way in hell I'm going to accept having my performance cut in
>half or more on my daily workload due to the remote possibility that
>something may happen to my data/filesystem in mid-write. That's what
>the cheap little UPS sitting beside my desk that gives me ample time to
>power down is for.
>Short of a catastrophic power supply, motherboard, or other component
>failure (which often stands a good chance of destroying the drive
>anyway), a power failure is the main problem here.
>Keep in mind also that you may be putting your data and filesystems at
>as much risk by not using a write cache as by using it.
>
>
>
>(below timings are all on an IBM 75GXP DTLA-307045, 46.11BB("GB")
>7200RPM ATA/100 drive on an off-board Promise Ultra/100 (ATA/100)
>controller on an 800Mhz Athlon 100Mhz FSB PC100 RAM CAS3@100Mhz.)
>
>root@c779218-a:/home/nnkk# hdparm -W0 /dev/hde
>
>/dev/hde:
> setting drive write-caching to 0 (off)
>root@c779218-a:/home/nnkk# time dd if=/dev/zero of=test.zero bs=1024k count=128
>128+0 records in
>128+0 records out
>
>real 0m18.178s
>user 0m0.000s
>sys 0m1.740s
>root@c779218-a:/home/nnkk# rm -f test.zero
>root@c779218-a:/home/nnkk# hdparm -W1 /dev/hde
>
>/dev/hde:
> setting drive write-caching to 1 (on)
>root@c779218-a:/home/nnkk# time dd if=/dev/zero of=test.zero bs=1024k count=128
>128+0 records in
>128+0 records out
>
>real 0m7.809s
>user 0m0.020s
>sys 0m1.890s
>root@c779218-a:/home/nnkk# time dd if=/dev/zero of=test.zero bs=1024k count=256
>256+0 records in
>256+0 records out
>
>real 0m44.328s
>user 0m0.010s
>sys 0m3.510s
>root@c779218-a:/home/nnkk# hdparm -W1 /dev/hde
>
>/dev/hde:
> setting drive write-caching to 1 (on)
>root@c779218-a:/home/nnkk# rm -f test.zero
>root@c779218-a:/home/nnkk# time dd if=/dev/zero of=test.zero bs=1024k count=256
>256+0 records in
>256+0 records out
>
>real 0m18.790s
>user 0m0.000s
>sys 0m3.780s
>
Who says test.zero is a linear file, and not scattered around the
whole disk with the fs layer filling holes...? If that's the case, the
write cache is a BIG win: just think that the fs writes a chunk at the
beginning of the disk, then another chunk at the end, then another near
the beginning, then... you get the picture. In this case the disk
reorders the seeks to best fit.
If you want to try a REAL linear write, do a dd if=/dev/zero of=/dev/hde7
or whatever unused partition you have.
--
Jorge Nerin
<comandante@zaralinux.com>
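The dd timings quoted above are easier to compare as throughput. This short sketch just divides size by the `real` (wall-clock) time from each run; since `bs=1024k` means 1 MiB records, `count` is the file size in MiB. The labels are descriptive only; the transcript as quoted shows no `hdparm` call before the third run.

```python
# dd timings copied from the transcript quoted above.
runs = [
    ("128 MiB after hdparm -W0", 128, 18.178),
    ("128 MiB after hdparm -W1", 128, 7.809),
    ("256 MiB, third run", 256, 44.328),
    ("256 MiB after hdparm -W1", 256, 18.790),
]


def mib_per_s(size_mib, real_seconds):
    """Throughput as seen by the caller, using dd's wall-clock (real) time."""
    return size_mib / real_seconds


for label, size_mib, real_s in runs:
    print(f"{label:26s} {mib_per_s(size_mib, real_s):6.2f} MiB/s")
```

This prints about 7.04 and 16.39 MiB/s for the two 128 MiB runs and 5.78 and 13.62 MiB/s for the 256 MiB runs. The third run's figure matches the 5.8MB/sec that Nicholas gives for a 256MB write with the cache off.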
* Re: [reiserfs-list] Re: [PATCH] 2.4.10 improved reiserfs a lot, but could still be better
2001-09-25 12:54 ` Jorge Nerín
@ 2001-09-25 13:06 ` Chris Mason
2001-09-25 13:17 ` Matthias Andree
1 sibling, 0 replies; 30+ messages in thread
From: Chris Mason @ 2001-09-25 13:06 UTC (permalink / raw)
To: comandante, tegeran; +Cc: Matthias Andree, linux-kernel, reiserfs-list
On Tuesday, September 25, 2001 02:54:58 PM +0200 Jorge Nerín
<jnerin@juridicas.com> wrote:
>>
> Who says test.zero is a linear file and it's not scattered around the
> whole disk and the fs layer is filling holes...? If it's the case the
> write cache is a BIG win, just think that the fs writes a chunk at the
> beginning of the disk, then another chunk at the end, then another near
> the beginning, then... you get the picture, in this case the disk
> reorders the seeks to best fit.
>
> If you want to try a REAL linear write do a dd if=/dev/zero of=/dev/hde7
> or whatever unused partition you have.
>
Exactly, especially since during the dd you're going to seek back to the
log for a few commit writes.
From a filesystem point of view, I've spent hours and hours getting
reiserfs to order the writes correctly to keep data consistent after a
crash. Turning on writeback caching without a battery backup more or less
throws all that work out the window. Don't do it.
For some people, a UPS counts as a battery backup, but there are lots of
reasons that doesn't fly in any kind of production environment. If your
job somehow depends on the data being safe, just get a RAID controller with
battery-backed cache.
-chris
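Chris's point about write ordering can be illustrated with a deliberately simplified model: a transaction is two journal blocks plus a commit block, and recovery is only safe if the commit never reaches the platter before the blocks it covers. The block names and the three-block transaction are hypothetical and do not reflect reiserfs's real journal layout. The sketch checks every crash point for a given on-disk order: with in-order completion none are unsafe, while a write-back cache that is free to reorder can pick an order that is.

```python
from itertools import permutations

# Hypothetical transaction: two journal blocks followed by a commit block.
# The commit must not reach the platter before the blocks it describes.
JOURNAL = ("J1", "J2")
COMMIT = "C"


def unsafe_crash_points(disk_order):
    """Return each n (blocks on disk when the crash hits) at which the
    commit block has landed but a journal block it covers has not."""
    unsafe = []
    on_disk = set()
    for n, block in enumerate(disk_order, start=1):
        on_disk.add(block)
        if COMMIT in on_disk and not on_disk.issuperset(JOURNAL):
            unsafe.append(n)
    return unsafe


issued = JOURNAL + (COMMIT,)
# Write-through: blocks reach the disk in the order the filesystem issued them.
print("in order:", unsafe_crash_points(issued))
# Reordering write-back cache: try every order the drive could choose.
orders = list(permutations(issued))
bad = [p for p in orders if unsafe_crash_points(p)]
print(f"{len(bad)} of {len(orders)} orders can leave a commit without its data")
```

Of the six orders a reordering cache could choose here, four leave a window in which a crash exposes a commit block whose journal data never made it to disk.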
* Re: [PATCH] 2.4.10 improved reiserfs a lot, but could still be better
2001-09-25 12:54 ` Jorge Nerín
2001-09-25 13:06 ` [reiserfs-list] " Chris Mason
@ 2001-09-25 13:17 ` Matthias Andree
1 sibling, 0 replies; 30+ messages in thread
From: Matthias Andree @ 2001-09-25 13:17 UTC (permalink / raw)
To: linux-kernel, reiserfs-list
Could you please quote only the relevant parts, and in particular, would you
erase deeper nesting of quotes in your replies? That would save a lot of
people from transferring data they can easily get from their own
mailer at a keypress. Thanks a lot.
On Tue, 25 Sep 2001, Jorge Nerín wrote:
> Who says test.zero is a linear file and it's not scattered around the
> whole disk and the fs layer is filling holes...? If it's the case the
> write cache is a BIG win, just think that the fs writes a chunk at the
> beginning of the disk, then another chunk at the end, then another near
> the beginning, then... you get the picture, in this case the disk
> reorders the seeks to best fit.
>
> If you want to try a REAL linear write do a dd if=/dev/zero of=/dev/hde7
> or whatever unused partition you have.
I did, and it showed almost the same behaviour, twice as fast with write
cache turned on.
Scattering the writes would (usually) only happen on a nearly filled
disk.
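Jorge's raw-partition test, which Matthias reports running above, can also be scripted when you want more control than dd gives. The sketch below is a hypothetical helper, not something from the thread: it writes zero-filled blocks sequentially and times them, calling `os.fsync` so the kernel's dirty buffers are written out before the clock stops. Whether that also makes the drive empty its own write cache depends on the kernel and the drive, so treat it as an assumption. Pointed at a device such as `/dev/hde7`, it destroys whatever is on that partition.

```python
import os
import time


def time_sequential_write(path, block_size=1 << 20, count=128):
    """Write `count` zero-filled blocks sequentially to `path`; return MiB/s.

    WARNING: pointed at a partition (e.g. /dev/hde7) this overwrites it.
    """
    buf = bytes(block_size)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        start = time.perf_counter()
        for _ in range(count):
            view = memoryview(buf)
            while view:                  # os.write may write less than asked
                written = os.write(fd, view)
                view = view[written:]
        os.fsync(fd)                     # flush the kernel's dirty buffers
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return (block_size * count / (1 << 20)) / elapsed
```

Running it once with `hdparm -W0` and once with `-W1` against the same unused partition gives the same comparison as the dd runs, without a filesystem deciding where the blocks go.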
* Re: [reiserfs-list] Re: [PATCH] 2.4.10 improved reiserfs a lot, but could still be better
2001-09-24 16:57 ` [reiserfs-list] " Lehmann
@ 2001-09-25 14:04 ` bill davidsen
2001-09-25 17:39 ` bill davidsen
0 siblings, 1 reply; 30+ messages in thread
From: bill davidsen @ 2001-09-25 14:04 UTC (permalink / raw)
To: linux-kernel
In article <20010924185755.E4126@schmorp.de> Marc Lehmann wrote:
| On Mon, Sep 24, 2001 at 06:53:03PM +0200, Matthias Andree
| <matthias.andree@stud.uni-dortmund.de> wrote:
| > Linear writing as dd mostly does is BTW something which should never be
| > affected by write caches.
|
| A write cache can and will speed up linear writes on typical ide setups.
Pedantically I guess that's true, but I wouldn't expect any significant
change unless the drive were badly fragmented, since the write cache on
the drive should hold enough data to allow all the data for a track to be
written in a single revolution.
Write cache makes a big difference in normal use, where seeks and such
can be optimized, but for a single process writing a single file (i.e.
dd) I don't see where it would or could help much.
Since the single process is not a typical case on most systems, I don't
see that it's a burning question.
--
bill davidsen <davidsen@tmr.com>
"If I were a diplomat, in the best case I'd go hungry. In the worst
case, people would die."
-- Robert Lipe
* Re: [PATCH] 2.4.10 improved reiserfs a lot, but could still be better
2001-09-25 0:11 ` Matthias Andree
2001-09-25 4:49 ` Nicholas Knight
@ 2001-09-25 14:47 ` Alex Bligh - linux-kernel
2001-09-25 15:13 ` Matthias Andree
2001-09-25 15:23 ` John Alvord
1 sibling, 2 replies; 30+ messages in thread
From: Alex Bligh - linux-kernel @ 2001-09-25 14:47 UTC (permalink / raw)
To: Matthias Andree, linux-kernel, reiserfs-list; +Cc: Alex Bligh - linux-kernel
--On Tuesday, September 25, 2001 2:11 AM +0200 Matthias Andree
<matthias.andree@stud.uni-dortmund.de> wrote:
> Why are disk drives slower with their caches disabled on LINEAR writes?
Probably because sectors are so close together on the physical media.
If you disable write caching, and are writing sectors 1001, 1002, 1003
etc., you tell it to write sector 1001, and it doesn't complete until
it's written it, you IRQ the PC, and it sends the write out for 1002,
which completes a little later. However, by this time 1002 has
flown past the drive head, as it wasn't immediately queued on the drive.
If you had only one sector of writeahead, this effect would disappear
(but is just as theoretically dangerous if there is no way to force
a flush() of the write cache).
--
Alex Bligh
* Re: [PATCH] 2.4.10 improved reiserfs a lot, but could still be better
2001-09-25 14:47 ` Alex Bligh - linux-kernel
@ 2001-09-25 15:13 ` Matthias Andree
2001-09-25 15:23 ` John Alvord
1 sibling, 0 replies; 30+ messages in thread
From: Matthias Andree @ 2001-09-25 15:13 UTC (permalink / raw)
To: linux-kernel, reiserfs-list, Alex Bligh - linux-kernel
On Tue, 25 Sep 2001, Alex Bligh - linux-kernel wrote:
> Probably because sectors are so close together on the physical media.
> If you disable write caching, and are writing sectors 1001, 1002, 1003
> etc., you tell it to write sector 1001, and it doesn't complete until
> it's written it, you IRQ the PC, and it sends the write out for 1002,
> which completes a little later. However, by this time 1002 has
> flown past the drive head, as it wasn't immediately queued on the drive.
> If you had only one sector of writeahead, this effect would disappear
> (but is just as theoretically dangerous if there is no way to force
> a flush() of the write cache).
Which leads me to the question: which ATA standard introduced the
mandatory FLUSH CACHE command? I saw it in the ATA 6 draft. How about the
standards used in drives that are sold today, ATA 4 and ATA 5? Do they list
the FLUSH CACHE command, possibly as mandatory? That might be rather
useful after a "synchronous" write.
--
Matthias Andree
"Those who give up essential liberties for temporary safety deserve
neither liberty nor safety." - Benjamin Franklin
* Re: [PATCH] 2.4.10 improved reiserfs a lot, but could still be better
2001-09-25 14:47 ` Alex Bligh - linux-kernel
2001-09-25 15:13 ` Matthias Andree
@ 2001-09-25 15:23 ` John Alvord
2001-09-25 22:41 ` bill davidsen
1 sibling, 1 reply; 30+ messages in thread
From: John Alvord @ 2001-09-25 15:23 UTC (permalink / raw)
To: Alex Bligh - linux-kernel; +Cc: Matthias Andree, linux-kernel, reiserfs-list
On Tue, 25 Sep 2001, Alex Bligh - linux-kernel wrote:
>
>
> --On Tuesday, September 25, 2001 2:11 AM +0200 Matthias Andree
> <matthias.andree@stud.uni-dortmund.de> wrote:
>
> > Why are disk drives slower with their caches disabled on LINEAR writes?
>
> Probably because sectors are so close together on the physical media.
> If you disable write caching, and are writing sectors 1001, 1002, 1003
> etc., you tell it to write sector 1001, and it doesn't complete until
> it's written it, you IRQ the PC, and it sends the write out for 1002,
> which completes a little later. However, by this time 1002 has
> flown past the drive head, as it wasn't immediately queued on the drive.
There used to be stupid hard disk formatting tricks where the sector
numbers were interleaved
1001 1008 1002 1009 1003 1010 1004 1011 1005 1012 1006 1013 1007 1014
just to gain that enhancement. I also remember an ancient IBM Dasd trick
where the sectors were offset just slightly so that a track to track
switch could pick up the next sector in time.
Ancient history...
john
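The interleaved numbering John lists is what a simple N:1 layout rule produces. The sketch below shows one common way to compute it (place a sector, skip ahead `factor` slots, and step forward past slots already taken); the function name is illustrative, and real formatters may use other rules.

```python
def interleave_layout(first, sectors_per_track, factor):
    """Return the logical sector number stored in each physical slot."""
    slots = [None] * sectors_per_track
    pos = 0
    for sector in range(first, first + sectors_per_track):
        while slots[pos] is not None:    # slot taken: step to the next one
            pos = (pos + 1) % sectors_per_track
        slots[pos] = sector
        pos = (pos + factor) % sectors_per_track
    return slots


print(interleave_layout(1001, 14, 2))
```

`interleave_layout(1001, 14, 2)` reproduces the 14-sector sequence above, starting 1001 1008 1002 1009; a factor of 1 gives plain sequential numbering.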
* Re: [reiserfs-list] Re: [PATCH] 2.4.10 improved reiserfs a lot, but could still be better
2001-09-25 14:04 ` bill davidsen
@ 2001-09-25 17:39 ` bill davidsen
0 siblings, 0 replies; 30+ messages in thread
From: bill davidsen @ 2001-09-25 17:39 UTC (permalink / raw)
To: linux-kernel
In article <200109251404.f8PE4Oh06427@deathstar.prodigy.com> I wrote:
>Write cache makes a big difference in normal use, where seeks and such
>can be optimized, but for a single process writing a single file (ie.
>dd) I don't see where it would or could help much.
Sorry, ignore this. I got a phone call while replying to this, and when I
glanced back at the screen I mixed up drive write cache with O/S write
cache. Ignore it: I failed to restore context, and someone else has
made my intended point about ext2 being able to write stale blocks under
some failure modes.
-bill
--
bill davidsen <davidsen@tmr.com>
"If I were a diplomat, in the best case I'd go hungry. In the worst
case, people would die."
-- Robert Lipe
* Re: [PATCH] 2.4.10 improved reiserfs a lot, but could still be better
2001-09-25 15:23 ` John Alvord
@ 2001-09-25 22:41 ` bill davidsen
0 siblings, 0 replies; 30+ messages in thread
From: bill davidsen @ 2001-09-25 22:41 UTC (permalink / raw)
To: linux-kernel
In article <Pine.LNX.4.20.0109250820001.28393-100000@otter.mbay.net> jalvo@mbay.net wrote:
| There used to be stupid hard disk formatting tricks where the sector
| numbers were interleaved
|
| 1001 1008 1002 1009 1003 1010 1004 1011 1005 1012 1006 1013 1007 1014
|
| just to gain that enhancement. I also remember an ancient IBM Dasd trick
| where the sectors were offset just slightly so that a track to track
| switch could pick up the next sector in time.
|
| Ancient history...
The spacing between consecutively numbered sectors is called
interleave, and the offset between cylinders is called skew.
Take a look at superformat; it's still done. The object is to take any
media without a hardware cache and match the timing of the media under the
head with the need for data. This could easily double the transfer rate
on any media read by an unbuffered reader.
And I would be surprised if modern hard drives don't allow using an
offset between tracks, so that when a track has been read, the next track
"starts" under the read heads just after the delay to step one track. This
can save you the rotational latency of the drive, up to one full revolution
(~8.3ms on a 7200rpm drive), or as great as the step time. If the actual
track-to-track offset is matched to the step time, you save that one
rotation on big block reads, like swaps.
Guess it's not quite so ancient history after all; it's just typically
done by the vendor. If I were writing a low-level formatting routine (and
I haven't in over a decade), I would run some tests on how much skew is
needed to have the first sector under the head. The penalty for making
that number a little too large is a sector or two; the penalty for
making it too small is one rotation less a sector or two.
--
bill davidsen <davidsen@tmr.com>
"If I were a diplomat, in the best case I'd go hungry. In the worst
case, people would die."
-- Robert Lipe
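Bill's rule of thumb for choosing skew can be written down directly. The drive geometry in the example (400 sectors per track, a 0.9 ms track switch) is hypothetical, as are the function names; the point is the asymmetry he describes between a skew that is slightly too large and one that is too small.

```python
import math


def track_skew_sectors(rpm, sectors_per_track, switch_ms):
    """Smallest skew (in sectors) so the next track's first sector arrives
    under the head only after a track switch of `switch_ms` has finished."""
    sector_ms = 60_000 / rpm / sectors_per_track
    return math.ceil(switch_ms / sector_ms)


def switch_cost_ms(rpm, sectors_per_track, switch_ms, skew):
    """Time from the end of one track until the next track's first sector."""
    rev_ms = 60_000 / rpm
    skew_ms = skew * rev_ms / sectors_per_track
    # Skew too small: the sector passed during the switch, so wait a turn.
    while skew_ms < switch_ms:
        skew_ms += rev_ms
    return skew_ms


best = track_skew_sectors(7200, 400, 0.9)
for skew in (best - 1, best, best + 2):
    print(skew, round(switch_cost_ms(7200, 400, 0.9, skew), 3))
```

With those numbers the minimum skew is 44 sectors. A skew of 44 brings the next track's first sector under the head about 0.92 ms after the end of the previous track, 46 costs only two sectors more, while 43 misses and costs a full extra revolution, about 9.2 ms in total.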
end of thread, other threads:[~2001-09-25 22:41 UTC | newest]
Thread overview: 30+ messages
2001-09-24 14:09 [PATCH] 2.4.10 improved reiserfs a lot, but could still be better Beau Kuiper
2001-09-24 14:46 ` [reiserfs-list] " Chris Mason
2001-09-24 15:32 ` Matthias Andree
2001-09-24 15:45 ` Alan Cox
2001-09-24 15:47 ` Matthias Andree
2001-09-24 16:08 ` Alan Cox
2001-09-24 16:08 ` [reiserfs-list] " Chris Dukes
2001-09-24 16:54 ` Matthias Andree
2001-09-24 16:15 ` Nicholas Knight
2001-09-24 16:40 ` [reiserfs-list] " Lehmann
2001-09-24 16:53 ` Matthias Andree
2001-09-24 16:57 ` [reiserfs-list] " Lehmann
2001-09-25 14:04 ` bill davidsen
2001-09-25 17:39 ` bill davidsen
2001-09-24 20:05 ` Nicholas Knight
2001-09-25 0:11 ` Matthias Andree
2001-09-25 4:49 ` Nicholas Knight
2001-09-25 6:00 ` Beau Kuiper
2001-09-25 6:17 ` Nicholas Knight
2001-09-25 10:44 ` Matthias Andree
2001-09-25 11:01 ` ben-lists
2001-09-25 10:42 ` Matthias Andree
2001-09-25 11:07 ` Nicholas Knight
2001-09-25 14:47 ` Alex Bligh - linux-kernel
2001-09-25 15:13 ` Matthias Andree
2001-09-25 15:23 ` John Alvord
2001-09-25 22:41 ` bill davidsen
2001-09-25 12:54 ` Jorge Nerín
2001-09-25 13:06 ` [reiserfs-list] " Chris Mason
2001-09-25 13:17 ` Matthias Andree