* Out-of-order writing by disk drives
From: Anton Ertl @ 2009-04-07 20:04 UTC (permalink / raw)
To: linux-kernel
I have released a new version of hdtest, a program that tests whether
hard disks write out-of-order relative to the order that the writes
were passed to them by the OS. You can find the program at
http://www.complang.tuwien.ac.at/anton/hdtest/
Here I mainly present the results from my tests, and explain enough
about the program so you know what I am talking about.
HOW DOES IT WORK?
It writes the blocks in an order like this:
1000-0-1001-0-1002-0-...
This sequence seems to inspire PATA and SATA disks to write
out-of-order (in the order 1000-1001-1002-...-0). To detect this, you
turn off the drive's power while the program is running. The written
blocks contain identifying data that another program from the suite can
check after you power the drive up again.
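The write pattern and the check after power-up can be sketched roughly
as follows. This is only an illustration in Python, not hdtest's actual
code: the sequence tags, the dictionary standing in for the disk
contents, and both function names are assumptions made for the example.

```python
def write_order(first_block=1000):
    """Yield (block_number, sequence_tag) in the order 1000-0-1001-0-1002-0-..."""
    seq = 0
    while True:
        yield first_block + seq, seq  # a fresh block that was never written before
        yield 0, seq                  # block 0 again, tagged with the same sequence
        seq += 1


def blocks_beyond_block0(on_disk, first_block=1000):
    """Count consecutive fresh blocks found on disk that were issued after
    the version of block 0 that actually reached the platter.

    on_disk maps block number -> sequence tag read back after power-up
    (a hypothetical representation of what a checker would see).
    """
    last_seq = on_disk.get(0, -1)  # -1 means block 0 never reached the disk
    count = 0
    block = first_block + last_seq + 1
    while block in on_disk:
        count += 1
        block += 1
    return count
```

With strictly in-order writing, at most one fresh block can follow the
surviving version of block 0; larger counts indicate that the drive
delayed block 0 while writing later blocks.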
RESULTS
I performed two sets of tests, one in November 1999, and one in April
2009. The results have not changed much. In both tests disks wrote
data seriously out-of-order in their default configuration; they can
delay the writing of block 0 in this test for quite a long time.
In more detail:
In 2009 I tested three drives (and accessed the whole drive) under
Linux 2.6.18 on Debian Etch; the USB enclosure used was a Tsunami
Elegant 3.5" Enclosure that has PATA and SATA disk drive interfaces.
* Maxtor L300R0 PATA (300GB) connected through a USB enclosure: In
two tests it wrote 47 and 34 consecutive blocks, respectively, after
the last written block 0.
* Seagate ST340062 Model 0A PATA (7200.10, 400GB):
  connected through a USB enclosure:
    3 times the result was as if it had written the blocks in-order
    1 time it wrote 3064 blocks out-of-order
    2 times it wrote 18384 blocks out-of-order
  connected directly via PATA cable:
    1 time it wrote 1972 blocks out-of-order
* Seagate ST340062 Model 0AS SATA (7200.10, 400GB) connected through a
  USB enclosure:
    1 time the result was as if it had written the blocks in-order
    2 times it wrote 3064 blocks out-of-order
    1 time it wrote 6128 blocks out-of-order
    1 time it wrote 12256 blocks out-of-order
    1 time it did not write block 0 at all
It is interesting that the number of blocks found to be out-of-order
is often a multiple of 3064. Maybe this is a multiple of a track size;
no other explanations come to mind.
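The arithmetic behind that observation can be checked quickly. The
following Python snippet is just a worked calculation over the
out-of-order counts reported above; the variable names are made up for
the example.

```python
BASE = 3064

# Distinct non-zero out-of-order counts from the 2009 Seagate results above.
reported = [3064, 18384, 1972, 6128, 12256]

# Map each exact multiple of BASE to its factor; collect the rest separately.
multiples = {n: n // BASE for n in reported if n % BASE == 0}
non_multiples = [n for n in reported if n % BASE != 0]

print(multiples)      # factors 1, 6, 2 and 4
print(non_multiples)  # only 1972, the run with the direct PATA connection
```

So all counts seen through the USB enclosure are exact multiples of
3064, while the single directly connected run (1972) is not.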
In 1999 I tested two drives (and accessed one partition) under
Linux-2.2.1 on RedHat 5.1. The two drives were a Quantum Fireball
CR8.4A (8GB) and an IBM-DHEA-36480 (6GB), both connected directly via
PATA. I did one test with each of the disks, and neither wrote
block 0 to the platters even once before I turned off the power.
I also tested the Quantum with write caching disabled (hdparm -W 0).
Hdtest was now quite noisy and produced the in-order result.
CONCLUSION
Applications and file systems requiring in-order writes (i.e.,
basically all of them) should use barriers or turn off write caching
for the disk drive(s) they use. Unfortunately, the Linux ext3 file
system does not use barriers by default; use the mount option
barrier=1 to enable them, e.g. by putting a line like this in
/etc/fstab:
/dev/md2 /home ext3 defaults,barrier=1 1 2
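To check an existing fstab for that option, something along these lines
could be used. It is a minimal Python sketch that only parses the text
of an fstab line; the function name is invented for the example, and it
says nothing about whether the underlying device actually honours
barriers.

```python
def has_barrier_option(fstab_line):
    """Return True if the mount options field of an fstab line contains barrier=1."""
    fields = fstab_line.split()
    # Skip blank lines, comments, and lines without an options field.
    if len(fields) < 4 or fields[0].startswith("#"):
        return False
    return "barrier=1" in fields[3].split(",")


print(has_barrier_option("/dev/md2 /home ext3 defaults,barrier=1 1 2"))
```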
- anton
* Re: Out-of-order writing by disk drives
From: Andi Kleen @ 2009-04-14 14:10 UTC (permalink / raw)
To: anton; +Cc: linux-kernel
"Anton Ertl" <anton@mips.complang.tuwien.ac.at> writes:
>
> /dev/md2 /home ext3 defaults,barrier=1 1 2
Just make sure /dev/md2 is a RAID1, MD RAID0/5/10 don't support barriers.
See also my general treatise of multiple device barriers earlier today.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
* Re: Out-of-order writing by disk drives
From: Anton Ertl @ 2009-04-14 16:33 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-kernel
Andi Kleen wrote:
>
> "Anton Ertl" <anton@mips.complang.tuwien.ac.at> writes:
> >
> > /dev/md2 /home ext3 defaults,barrier=1 1 2
>
> Just make sure /dev/md2 is a RAID1, MD RAID0/5/10 don't support barriers.
Thank you. I added the following to the README:
|Note that, as of this writing (2009-04), not all Linux devices support
|barriers, in particular md devices only support them in RAID 1 mode;
|the kernel will reportedly warn about the lack of barriers if you try
|to use ext3 with barriers on a device that does not support barriers
|(look in, e.g., dmesg).
> See also my general treatise of multiple device barriers earlier today.
I guess you mean <878wm3903h.fsf@basil.nowhere.org> in the
"dm-multipath and write request ordering" thread. Thank you for the
pointer.
- anton
* Re: Out-of-order writing by disk drives
From: Andi Kleen @ 2009-04-14 17:24 UTC (permalink / raw)
To: Anton Ertl; +Cc: Andi Kleen, linux-kernel
On Tue, Apr 14, 2009 at 06:33:50PM +0200, Anton Ertl wrote:
> Andi Kleen wrote:
> >
> > "Anton Ertl" <anton@mips.complang.tuwien.ac.at> writes:
> > >
> > > /dev/md2 /home ext3 defaults,barrier=1 1 2
> >
> > Just make sure /dev/md2 is a RAID1, MD RAID0/5/10 don't support barriers.
>
> Thank you. I added the following to the README:
>
> |Note that, as of this writing (2009-04), not all Linux devices support
> |barriers, in particular md devices only support them in RAID 1 mode;
> |the kernel will reportedly warn about the lack of barriers if you try
> |to use ext3 with barriers on a device that does not support barriers
> |(look in, e.g., dmesg).
A full listing of which devices do and don't support barriers would
likely be very long. You would actually need to list them down to
individual hard disks.
A common problem is barriers over LVM. Since 2.6.29 they work
with a single device (and if the underlying device supports it) with
dm linear, but not in any other LVM setup.
So it might be better to just recommend checking dmesg in general.
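As a rough illustration of such a check, the Python sketch below
filters kernel log text for lines that mention barriers. The function
name is made up, and the sample log is only illustrative: the exact
wording of the warning depends on the kernel version and file system.

```python
import re


def barrier_messages(kernel_log_text):
    """Return the kernel log lines that mention barriers, in original order."""
    pattern = re.compile(r"barrier", re.IGNORECASE)
    return [line for line in kernel_log_text.splitlines() if pattern.search(line)]


# Illustrative sample only; real messages vary by kernel version.
sample = """\
EXT3-fs: mounted filesystem with ordered data mode.
JBD: barrier-based sync failed on md2 - disabling barriers
kjournald starting.  Commit interval 5 seconds
"""

for line in barrier_messages(sample):
    print(line)
```

On a real system the text would come from the output of dmesg rather
than from a sample string.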
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
* Re: Out-of-order writing by disk drives
From: Mark Lord @ 2009-04-14 17:40 UTC (permalink / raw)
To: Andi Kleen; +Cc: Anton Ertl, linux-kernel
Andi Kleen wrote:
> On Tue, Apr 14, 2009 at 06:33:50PM +0200, Anton Ertl wrote:
>> Andi Kleen wrote:
>>> "Anton Ertl" <anton@mips.complang.tuwien.ac.at> writes:
>>>> /dev/md2 /home ext3 defaults,barrier=1 1 2
>>> Just make sure /dev/md2 is a RAID1, MD RAID0/5/10 don't support barriers.
>> Thank you. I added the following to the README:
>>
>> |Note that, as of this writing (2009-04), not all Linux devices support
>> |barriers, in particular md devices only support them in RAID 1 mode;
>> |the kernel will reportedly warn about the lack of barriers if you try
>> |to use ext3 with barriers on a device that does not support barriers
>> |(look in, e.g., dmesg).
>
> A full listing of what devices do and don't support barriers would
> be likely very long. You would actually need to list down to hard disks.
>
> A common problem is barriers over LVM. Since 2.6.29 they work
> with a single device (and if the underlying device supports it) with
> dm linear, but not in any other LVM setup.
..
Does anyone else here find this rather peculiar?
The folks who actually care about barriers the most
(apart from kernel developers) are probably enterprise users.
And who is most likely to be using RAID and LVM,
where barriers generally don't work at all ?
* Re: Out-of-order writing by disk drives
From: Michael Tokarev @ 2009-04-14 17:48 UTC (permalink / raw)
To: Mark Lord; +Cc: Andi Kleen, Anton Ertl, linux-kernel
Mark Lord wrote:
> Andi Kleen wrote:
[]
>> A common problem is barriers over LVM. Since 2.6.29 they work
>> with a single device (and if the underlying device supports it) with
>> dm linear, but not in any other LVM setup.
>
> Does anyone else here find this rather peculiar?
>
> The folks who actually care about barriers the most
> (apart from kernel developers) are probably enterprise users.
>
> And who is most likely to be using RAID and LVM,
> where barriers generally don't work at all ?
And esp. RAID10.
(I'm not really an "enterprise" user, but I do still use
several hard drives and databases.)
For this very reason, I stopped using both RAID10 and LVM.
For now, I have several large RAID1 volumes with GPT partitions
inside them, and use them for various databases etc., mostly with
XFS. This setup also does not have the constant alignment problems
that LVM has (*) (I can align GPT properly, even though the tools
to do so (parted derivatives) are still of very bad quality, crashing
left and right). Yes, it's not as easy to use as LVM, but it
is MUCH faster for all the reasons stated.
(*) Another LVM issue, which is hidden behind the scenes for
most users and is especially serious on raid5 or raid6, is
misaligned blocks. The good thing is that raid[56] are not
usually used for write-intensive applications like databases.
/mjt
* Re: Out-of-order writing by disk drives
From: Andi Kleen @ 2009-04-14 17:54 UTC (permalink / raw)
To: Mark Lord; +Cc: Andi Kleen, Anton Ertl, linux-kernel
> Does anyone else here find this rather peculiar?
>
> The folks who actually care about barriers the most
> (apart from kernel developers) are probably enterprise users.
>
> And who is most likely to be using RAID and LVM,
> where barriers generally don't work at all ?
The big enterprise users often have a SAN, so the LVM/RAID part is
hidden somewhere behind a block device, together with a lot of
battery-backed cache RAM (so that even running uncached is not too bad).
But on the other hand they also have UPSes, so data loss on power failure
might not be that big a problem for them.
Where I see it as a problem is with virtualization; LVM seems to be the
most sane way to manage file systems for lots of VMs and you likely
want barriers there too.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
* Re: Out-of-order writing by disk drives
From: Michael Tokarev @ 2009-04-14 18:09 UTC (permalink / raw)
To: Andi Kleen; +Cc: Mark Lord, Anton Ertl, linux-kernel
Andi Kleen wrote:
[]
> Where I see it as a problem is with virtualization; LVM seems to be the
> most sane way to manage file systems for lots of VMs and you likely
> want barriers there too.
Virtualisation is the best fit for partitionable raid1 arrays.
It is, in fact, what we have here -- I tested LVM but rejected
it because of this very issue - it does not support barriers.
I used partitions inside RAID1 arrays instead. It is not
as easy as with lvm (it requires some work with numbers instead
of names, but again that's not that problematic if you think
about /dev/disk/by-name/...), and it does not support resizing,
but again, this is not really a necessary feature here
(it is easy to copy data to a new, larger place).
And by the way, extlinux comes in very handy here. Inside
guests I don't use partitions but "whole disks", including
the boot disk (/dev/vda for kvm), with ext3fs on it and extlinux
to boot it (this works flawlessly on a partition-less device).
/mjt
* Re: Out-of-order writing by disk drives
From: Andi Kleen @ 2009-04-14 18:50 UTC (permalink / raw)
To: Michael Tokarev; +Cc: Andi Kleen, Mark Lord, Anton Ertl, linux-kernel
On Tue, Apr 14, 2009 at 10:09:48PM +0400, Michael Tokarev wrote:
> Andi Kleen wrote:
> []
> >Where I see it as a problem is with virtualization; LVM seems to be the
> >most sane way to manage file systems for lots of VMs and you likely
> >want barriers there too.
>
> Virtualisation is the best fit for partitionable raid1 arrays.
> It is, in fact, what we have here -- I tested LVM but rejected
> it because of this very issue - it does not support barriers.
It does now as of 2.6.29, as long as you only have a single
underlying device and use dm linear.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
* Re: Out-of-order writing by disk drives
From: Chris Mason @ 2009-04-14 19:27 UTC (permalink / raw)
To: Andi Kleen; +Cc: Michael Tokarev, Mark Lord, Anton Ertl, linux-kernel
On Tue, 2009-04-14 at 20:50 +0200, Andi Kleen wrote:
> On Tue, Apr 14, 2009 at 10:09:48PM +0400, Michael Tokarev wrote:
> > Andi Kleen wrote:
> > []
> > >Where I see it as a problem is with virtualization; LVM seems to be the
> > >most sane way to manage file systems for lots of VMs and you likely
> > >want barriers there too.
> >
> > Virtualisation is the best fit for partitionable raid1 arrays.
> > It is, in fact, what we have here -- I tested LVM but rejected
> > it because of this very issue - it does not support barriers.
>
> It does now as of 2.6.29, as long as you only have a single
> underlying device and use dm linear.
Eric Sandeen noticed this is actually still broken:
http://lkml.org/lkml/2009/3/23/360
-chris
* Re: Out-of-order writing by disk drives
From: Jens Axboe @ 2009-04-15 12:46 UTC (permalink / raw)
To: Chris Mason
Cc: Andi Kleen, Michael Tokarev, Mark Lord, Anton Ertl, linux-kernel,
agk
On Tue, Apr 14 2009, Chris Mason wrote:
> On Tue, 2009-04-14 at 20:50 +0200, Andi Kleen wrote:
> > On Tue, Apr 14, 2009 at 10:09:48PM +0400, Michael Tokarev wrote:
> > > Andi Kleen wrote:
> > > []
> > > >Where I see it as a problem is with virtualization; LVM seems to be the
> > > >most sane way to manage file systems for lots of VMs and you likely
> > > >want barriers there too.
> > >
> > > Virtualisation is the best fit for partitionable raid1 arrays.
> > > It is, in fact, what we have here -- I tested LVM but rejected
> > > it because of this very issue - it does not support barriers.
> >
> > It does now as of 2.6.29, as long as you only have a single
> > underlying device and use dm linear.
>
> Eric Sandeen noticed this is actually still broken:
>
> http://lkml.org/lkml/2009/3/23/360
Alasdair promised to push the remaining barrier bits for 2.6.30, so
hopefully it should all be in working order Real Soon Now.
--
Jens Axboe
* Re: Out-of-order writing by disk drives
From: Folkert van Heusden @ 2009-04-17 19:46 UTC (permalink / raw)
To: Mark Lord; +Cc: Andi Kleen, Anton Ertl, linux-kernel
>> A full listing of what devices do and don't support barriers would
>> be likely very long. You would actually need to list down to hard disks.
>>
>> A common problem is barriers over LVM. Since 2.6.29 they work
>> with a single device (and if the underlying device supports it) with
>> dm linear, but not in any other LVM setup.
> ..
>
> Does anyone else here find this rather peculiar?
> The folks who actually care about barriers the most
> (apart from kernel developers) are probably enterprise users.
> And who is most likely to be using RAID and LVM,
> where barriers generally don't work at all ?
What about iSCSI? Does it support barriers?
Folkert van Heusden
--
MultiTail is a versatile program for reading logs and carrying out
given commands. Filtering, coloring, merging, 'diff-view', etc.
http://www.vanheusden.com/multitail/
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com
* Re: Out-of-order writing by disk drives
From: Folkert van Heusden @ 2009-04-17 21:07 UTC (permalink / raw)
To: Anton Ertl; +Cc: linux-kernel
> I have released a new version of hdtest, a program that tests whether
> hard disks write out-of-order relative to the order that the writes
> were passed to them from the OS. You find the program at
> http://www.complang.tuwien.ac.at/anton/hdtest/
Not sure if it matters, but it seems open-iscsi (both target and
initiator are Linux systems) works fine with respect to write
barriers: while running hdtest on an iSCSI device I suddenly stopped
the traffic using an iptables DROP rule. Then of course I stopped
the iSCSI initiator, removed the rules, restarted the initiator and ran
hdcheck: all above the line have the correct magic.
Folkert van Heusden
--
www.vanheusden.com/multitail - win a vlaai (pie) from multivlaai! Get
multitail included in Fedora Core, AIX, Solaris or HP/UX and win a
vlaai of your choice
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com
* Re: Out-of-order writing by disk drives
From: Anton Ertl @ 2009-04-18 9:06 UTC (permalink / raw)
To: Folkert van Heusden; +Cc: linux-kernel
Folkert van Heusden wrote:
>
> > I have released a new version of hdtest, a program that tests whether
> > hard disks write out-of-order relative to the order that the writes
> > were passed to them from the OS. You find the program at
> > http://www.complang.tuwien.ac.at/anton/hdtest/
>
> Not sure if it matters but it seems open-iscsi (both target and
> initiator are linux systems) works fine with respect to the write
> barriers: while running hdtest on an iscsi device I suddenly stopped the
> traffic flowing using an iptables DROP-rule. Then of course I stopped
> the iscsi initiator, removed the rules, restarted the initiator and ran
> hdcheck: all above the line have the correct magic.
hdtest does not use barriers (if it did, my results would hopefully be
different; BTW, how would I use device barriers from a user program?).
But it does write to the device opened with O_SYNC. So I expect the
kernel to pass the request synchronously to the device (due to
O_SYNC), but the device has no particular reason (like barriers) to
write the stuff in-order. So I would not expect your disconnection to
result in out-of-order writing, just as I would not expect
disconnecting the USB or SATA connection to have that effect in
a setup like mine (but I have not tried that).
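For illustration, O_SYNC writes in the 1000-0-1001-0-... pattern look
roughly like this. This is a Python sketch under stated assumptions,
not hdtest's code: the block size, the payload format, and the function
names are invented, and the target here is meant to be a scratch file
rather than a raw device. Note that O_SYNC only governs when the write
call returns; it does not tell the drive in which order to empty its
own write cache.

```python
import os

BLOCK_SIZE = 512  # assumed block size for this sketch


def write_block_sync(fd, block_number, payload):
    """Write one padded block at its offset on a descriptor opened with O_SYNC."""
    data = payload.ljust(BLOCK_SIZE, b"\0")[:BLOCK_SIZE]
    return os.pwrite(fd, data, block_number * BLOCK_SIZE)


def write_pattern(path, rounds=3, first_block=1000):
    """Issue the writes in the order 1000-0-1001-0-... using O_SYNC."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o600)
    try:
        for seq in range(rounds):
            write_block_sync(fd, first_block + seq, b"fresh %d" % seq)
            write_block_sync(fd, 0, b"block0 %d" % seq)
    finally:
        os.close(fd)
```

Calling write_pattern("scratch.img") on a throwaway file exercises the
pattern without touching a real disk.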
In short, your experiment tells us nothing about barriers over iSCSI,
because barriers are not used (AFAIK).
- anton