public inbox for linux-kernel@vger.kernel.org
* Journaling pointless with today's hard disks?
@ 2001-11-24 13:03 Florian Weimer
  2001-11-24 13:40 ` Rik van Riel
                   ` (2 more replies)
  0 siblings, 3 replies; 86+ messages in thread
From: Florian Weimer @ 2001-11-24 13:03 UTC (permalink / raw)
  To: linux-kernel

In the German computer community, a statement from IBM[1] is
circulating which describes a rather peculiar behavior of certain IBM
IDE hard drives (the DTLA series):

When the drive is powered down during a write operation, the sector
which was being written has got an incorrect checksum stored on disk.
So far, so good---but if the sector is read later, the drive returns a
*permanent*, *hard* error, which can only be removed by a low-level
format (IBM provides a tool for it).  The drive does not automatically
map out such sectors.

IBM claims this isn't a firmware error, but thinks that this explains
the failures frequently observed with DTLA drives (which might
reflect reality or not, I don't know, but that's not the point
anyway).

Now my question: Obviously, journaling file systems do not work
correctly on drives with such behavior.  On the contrary, a vital data
structure is frequently written to (the journal), so such file systems
*increase* the probability of complete failure (with a bad sector in
the journal, the file system is probably unusable; for non-journaling
file systems, only a part of the data becomes unavailable).  Is the
DTLA hard disk behavior regarding aborted writes more common among
contemporary hard drives?  Wouldn't this make journaling pretty
pointless?


1. http://www.cooling-solutions.de/dtla-faq (German)

-- 
Florian Weimer 	                  Florian.Weimer@RUS.Uni-Stuttgart.DE
University of Stuttgart           http://cert.uni-stuttgart.de/
RUS-CERT                          +49-711-685-5973/fax +49-711-685-5898

* Re: Journaling pointless with today's hard disks?
@ 2001-11-25  1:20 dnu478nt5w@mailexpire.com
  0 siblings, 0 replies; 86+ messages in thread
From: dnu478nt5w@mailexpire.com @ 2001-11-25  1:20 UTC (permalink / raw)
  To: linux-kernel

Stephen Satchell wrote:

> Most power supplies are not designed to hold up for more than 30-60 ms at 
> full load upon removal of mains power.  Power-fail detect typically 
> requires 12 ms (three-quarters cycle average at 60 Hz) or 15 ms 
> (three-quarters cycle average at 50 Hz) to detect that mains power has 
> failed, leaving your system a very short time to abort that long queue of 
> disk write commands.  It's very possible that by the time the system wakes 
> up to the fact that its electron feeding tube is empty it has already 
> started a write operation that cannot be completed before power goes out of 
> specification.  It's a race condition.
>
> If power goes out of specification before the drive completes a commanded
> write, what do you expect the poor drive to do?
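
The timing budget in the quoted paragraph can be sketched as a small
calculation.  This is only an illustration of the arithmetic: the
function names are made up here, and the model (detection takes
three-quarters of one mains cycle, hold-up is a fixed figure) is taken
from the quoted text rather than from any power supply specification.
Note that three-quarters of a 60 Hz cycle is 12.5 ms, which the quote
rounds to 12 ms.

```python
def detect_latency_ms(mains_hz, cycles=0.75):
    """Time to notice mains loss, modelled as a fraction of one AC cycle."""
    return cycles / mains_hz * 1000.0


def remaining_window_ms(holdup_ms, mains_hz):
    """Hold-up time left once the power failure has been detected."""
    return holdup_ms - detect_latency_ms(mains_hz)


# Hold-up figures (30 and 60 ms) are the range given in the quoted message.
for mains_hz in (50, 60):
    for holdup_ms in (30, 60):
        window = remaining_window_ms(holdup_ms, mains_hz)
        print(f"{mains_hz} Hz mains, {holdup_ms} ms hold-up: "
              f"{window:.1f} ms left after detection")
```

Under this model the window that remains after detection is tens of
milliseconds at best, which is the figure the reply below compares
against the duration of a single sector write.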

I expect it to have enough capacitor power and rotational inertia that
it can decide before it *starts* a given sector write whether it will
be able, barring a disaster rather less likely than instantaneous loss
of DC power, to complete it.

It doesn't need that long.  Take, as an example, a Really Old drive...
an original 20 MB MFM drive.  3600 RPM, 17 sectors/track.

That's 60*17 = 1020 sectors per second passing under the head.  So the
actual duration of a sector write is about 1 ms.

A more modern hard drive (IBM 40GV) will spin faster and have
from 370 (inner tracks) to 792 (outer tracks) sectors per track.
(http://www.storagereview.com/guide2000/ref/hdd/geom/tracksZBR.html)

Even at 5400 rpm, on the innermost track, that's 90*370 = 33300
sectors/second passing under the head, or 30 *microseconds* per sector.
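
The arithmetic above can be reproduced with a few lines of Python.
This is just a sketch of the same calculation: the function name is
invented for illustration, and the drive figures are the ones quoted
in this message (the 5400 rpm value is the "even at" speed used above,
not a checked specification for the 40GV).

```python
def sector_time_us(rpm, sectors_per_track):
    """Time for one sector to pass under the head, in microseconds."""
    revs_per_second = rpm / 60.0
    sectors_per_second = revs_per_second * sectors_per_track
    return 1e6 / sectors_per_second


# (description, rpm, sectors per track), all taken from the text above.
examples = [
    ("20 MB MFM drive", 3600, 17),
    ("IBM 40GV, inner tracks", 5400, 370),
    ("IBM 40GV, outer tracks", 5400, 792),
]

for name, rpm, spt in examples:
    print(f"{name}: {sector_time_us(rpm, spt):.1f} us per sector")
```

The outer tracks hold more sectors per revolution, so a sector there
passes under the head in roughly half the time of an inner-track
sector.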


I think it's reasonable to expect a drive to keep functioning for 30
microseconds between when it notices the power is dropping and when it
really can't continue.  Heck, even 1 ms isn't unreasonable.

We expect a drive to look at the power supply shortly before the write,
and decide if it's "go for launch" or not.

Given that modern drives already save enough power to unload the heads
before the platter slows to the point that they'd touch down, this doesn't
seem like a big problem.  (Of course, unloading the heads doesn't require
that drive RPM, head position, or anything else be within spec.)


What we'd *like* is for the drive to have enough power to be able to
seek to a reserved location and dump out the entire write-behind cache
before dying, but that possibly requires a full-bore seek (longer than
the typical 9 ms "average" seek) plus head settle time (it's okay to
start *reading* before you're sure the head is in place; the CRC will
tell you if you didn't make it), plus writing 4000 sectors (5+ rotations,
with head switching and extra settle time between, for a 2 MB buffer),
but that's adding up to a good fraction of 100 ms, which *is* a bit long
for power-loss operation.
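
A rough version of that estimate, as a sketch.  The 2 MB buffer, 792
sectors per track and 5400 rpm come from this message; the full-stroke
seek time, settle time and head-switch time are NOT from any data
sheet, they are placeholder assumptions chosen only to show how the
terms add up.  The function name and parameters are likewise invented
for illustration.

```python
import math


def flush_time_ms(cache_bytes, sectors_per_track, rpm,
                  seek_ms, settle_ms, head_switch_ms, sector_bytes=512):
    """Estimate the time to seek to a reserved area and dump the cache."""
    sectors = cache_bytes // sector_bytes
    rotation_ms = 60_000.0 / rpm
    rotations = sectors / sectors_per_track      # platter turns spent writing
    tracks = math.ceil(rotations)                # tracks touched
    switches = tracks - 1                        # head switches between tracks
    return (seek_ms + settle_ms
            + rotations * rotation_ms
            + switches * head_switch_ms)


estimate = flush_time_ms(
    cache_bytes=2 * 1024 * 1024,  # 2 MB write-behind cache (from the text)
    sectors_per_track=792,        # outer tracks (from the text)
    rpm=5400,                     # speed used earlier in this message
    seek_ms=18.0,                 # ASSUMED full-stroke seek, placeholder
    settle_ms=1.0,                # ASSUMED head settle time, placeholder
    head_switch_ms=1.0,           # ASSUMED head switch + settle, placeholder
)
print(f"estimated flush time: {estimate:.1f} ms")
```

With these placeholder values the rotations alone account for more
than half of the total, so even a much faster seek would not bring the
flush anywhere near the single-sector case.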

* RE: Journaling pointless with today's hard disks?
@ 2001-11-28 14:36 Galappatti, Kishantha
  0 siblings, 0 replies; 86+ messages in thread
From: Galappatti, Kishantha @ 2001-11-28 14:36 UTC (permalink / raw)
  To: 'Pedro M. Rodrigues', Wayne Whitney, Andre Hedrick; +Cc: LKML

I'm curious also... what do you mean you've "exposed" yourself already? Is
this a trade secret?

--kish

-----Original Message-----
From: Pedro M. Rodrigues [mailto:pmanuel@myrealbox.com]
Sent: Wednesday, November 28, 2001 6:53 AM
To: Wayne Whitney; Andre Hedrick
Cc: LKML
Subject: Re: Journaling pointless with today's hard disks?



   Just curious, but what can a selftest mode and consequent block 
test do to inspire such worry in you? Are we dealing with the mob or 
something of the sort when we buy an IBM 75GXP disk? 


/Pedro

On 27 Nov 2001 at 13:52, Andre Hedrick wrote:

> 
> 
> What you have done is trigger a process to have the device go into a
> selftest mode to perform a block test.  I would tell you more but I
> may have exposed myself already.
> 


* Re: Journaling pointless with today's hard disks?
@ 2001-11-28 17:22 David Balazic
  0 siblings, 0 replies; 86+ messages in thread
From: David Balazic @ 2001-11-28 17:22 UTC (permalink / raw)
  To: cw, linux-kernel@vger.kernel.org

Chris Wedgwood (cw@f00f.org) wrote :

> On Sat, Nov 24, 2001 at 02:03:11PM +0100, Florian Weimer wrote: 
> 
>     When the drive is powered down during a write operation, the 
>     sector which was being written has got an incorrect checksum 
>     stored on disk. So far, so good---but if the sector is read 
>     later, the drive returns a *permanent*, *hard* error, which can 
>     only be removed by a low-level format (IBM provides a tool for 
>     it). The drive does not automatically map out such sectors. 
> 
> AVOID SUCH DRIVES... I have both Seagate and IBM SCSI drives which 
> are hot-swappable in a test machine that I used for testing various 
> journalling filesystems a while back for reliability. 
> 
> Some (many) of those tests involved removing the disk during writes 
> (literally) and checking the results afterwards.

What do you mean by "removed the disk" ?

- rm /dev/hda ? :-)
- disconnect the disk from the SCSI or ATA bus ?
- from the power supply ?
- both ?
- something else ?

> 
> The drives were set not to write-cache (they don't by default, but all 
> my IDE drives do, so maybe this is a SCSI thing?) 
> 
> At no point did I ever see a partial write or corrupted sector; nor 
> have I seen any appear in the grown table, so as best as I can tell 
> even under removal with sustained writes there are SOME DRIVES WHERE 
> THIS ISN'T A PROBLEM. 
> 
> Now, since EMC, NetApp, Sun, HP, Compaq, etc. all have products which 
> presumably depend on this behavior, I don't think it's going to go 
> away, it perhaps will just become important to know which drives are 
> brain-damaged and list them so people can avoid them. 
> 
> As this will affect the Windows world too consumer pressure will 
> hopefully rectify this problem. 
> 
>   --cw 
> 
> P.S. Write-caching in hard-drives is insanely dangerous for 
>      journalling filesystems and can result in all sorts of nasties. 
>      I recommend people turn this off in their init scripts (perhaps I 
>      will send a patch for the kernel to do this on boot, I just 
>      wonder if it will eat some drives). 

-- 
David Balazic
--------------
"Be excellent to each other." - Bill S. Preston, Esq., & "Ted" Theodore Logan
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

* Re: Journaling pointless with today's hard disks?
@ 2001-11-28 23:25 Frank de Lange
  2001-11-29  1:52 ` Matthias Andree
  0 siblings, 1 reply; 86+ messages in thread
From: Frank de Lange @ 2001-11-28 23:25 UTC (permalink / raw)
  To: linux-kernel

On Mon, 26 Nov 2001, Richard B. Johnson wrote:

> It isn't that easy! Any kind of power storage within the drive would
> have to be isolated with diodes so that it doesn't try to run your
> motherboard as well as the drive. This means that +5 volt logic supply
> would now be 5.0 - 0.6 = 4.4 volts at the drive, well below the design
> voltage. Use of a Schottky diode (0.34 volts) would help somewhat, but you
> have now narrowed the normal power design-margin by 90 percent, not good.

Another interesting possibility would be to use the momentum of the spinning
platters and motor assembly to power the drive electronics, simply by using the
motor as a generator. When power fails during a write, use the current
generated by the motor to finish the write.

Just a wild idea...
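
As a back-of-the-envelope check on whether the idea is plausible, the
kinetic energy stored in the spinning platters can be compared with
the energy needed to keep the electronics alive for one sector write.
Everything numeric below is an assumption made up for illustration
(platter count, mass, radii, electronics power); only the 30
microsecond sector time and the 5400 rpm figure come from elsewhere in
this thread.  Conversion losses in the motor are ignored entirely.

```python
import math


def spindle_energy_joules(rpm, platters, platter_mass_kg,
                          r_outer_m, r_inner_m):
    """Rotational kinetic energy of a stack of annular platters."""
    # Moment of inertia of an annulus: 1/2 * m * (r_outer^2 + r_inner^2)
    inertia = platters * 0.5 * platter_mass_kg * (r_outer_m**2 + r_inner_m**2)
    omega = 2.0 * math.pi * rpm / 60.0           # angular speed in rad/s
    return 0.5 * inertia * omega**2


def write_energy_joules(power_w, duration_s):
    """Energy drawn by the drive electronics over a given duration."""
    return power_w * duration_s


stored = spindle_energy_joules(
    rpm=5400,                # speed used elsewhere in the thread
    platters=3,              # ASSUMED, placeholder
    platter_mass_kg=0.020,   # ASSUMED, placeholder
    r_outer_m=0.0475,        # ASSUMED, roughly a 3.5" platter
    r_inner_m=0.0125,        # ASSUMED, placeholder
)
needed = write_energy_joules(power_w=5.0,        # ASSUMED electronics power
                             duration_s=30e-6)   # one sector, per the thread

print(f"stored in platters: {stored:.2f} J")
print(f"one sector write:   {needed * 1e6:.0f} uJ")
```

Under these made-up numbers the stored energy exceeds the energy of a
single sector write by several orders of magnitude, so the limit would
be how efficiently and how quickly the motor can be turned around as a
generator, not the amount of energy available.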

Cheers//Frank
-- 
  WWWWW      _______________________
 ## o o\    /     Frank de Lange     \
 }#   \|   /                          \
  ##---# _/     <Hacker for Hire>      \
   ####   \      +31-320-252965        /
           \ lkml-frank@unternet.org  /
            -------------------------
 [ "Omnis enim res, quae dando non deficit, dum habetur
    et non datur, nondum habetur, quomodo habenda est."  ]


end of thread, other threads:[~2001-12-04  3:48 UTC | newest]

Thread overview: 86+ messages
2001-11-24 13:03 Journaling pointless with today's hard disks? Florian Weimer
2001-11-24 13:40 ` Rik van Riel
2001-11-24 16:36   ` Phil Howard
2001-11-24 17:19     ` Charles Marslett
2001-11-24 17:31     ` Florian Weimer
2001-11-24 17:41     ` Matthias Andree
2001-11-24 19:20       ` Florian Weimer
2001-11-24 19:29         ` Rik van Riel
2001-11-24 22:51           ` John Alvord
2001-11-24 23:41             ` Phil Howard
2001-11-25  0:24               ` Ian Stirling
2001-11-25  0:53                 ` Phil Howard
2001-11-25  1:25                   ` H. Peter Anvin
2001-11-25  1:44                   ` Sven.Riedel
2001-11-24 22:28         ` H. Peter Anvin
2001-11-25  4:49           ` Andre Hedrick
2001-11-24 23:04         ` Pedro M. Rodrigues
2001-11-24 23:23         ` Stephen Satchell
2001-11-24 23:29           ` H. Peter Anvin
2001-11-26 18:05             ` Steve Brueggeman
2001-11-26 23:49               ` Martin Eriksson
2001-11-27  0:06                 ` Andreas Dilger
2001-11-27  0:16                   ` Andre Hedrick
2001-11-27  7:38                     ` Andreas Dilger
2001-11-27 11:48                       ` Ville Herva
2001-11-27  0:18                 ` Jonathan Lundell
2001-11-27  1:01                   ` Ian Stirling
2001-11-27  1:33                     ` H. Peter Anvin
2001-11-27  1:57                   ` Steve Underwood
2001-11-27  5:04                   ` Stephen Satchell
     [not found]         ` <mailman.1006644421.6553.linux-kernel2news@redhat.com>
2001-11-25  4:20           ` Pete Zaitcev
2001-11-25 13:52           ` Pedro M. Rodrigues
2001-11-25 12:30         ` Matthias Andree
2001-11-25 15:04           ` Barry K. Nathan
2001-11-25 16:31             ` Matthias Andree
2001-11-27  2:39               ` Pavel Machek
2001-12-03 10:23                 ` Matthias Andree
2001-11-25  9:14 ` Chris Wedgwood
2001-11-25 22:55   ` Daniel Phillips
2001-11-26 16:59   ` Rob Landley
2001-11-26 20:30     ` Andre Hedrick
2001-11-26 20:35       ` Rob Landley
2001-11-26 23:59         ` Andreas Dilger
2001-11-27  0:24           ` H. Peter Anvin
2001-11-27  0:52             ` H. Peter Anvin
2001-11-27  1:11               ` Andrew Morton
2001-11-27  1:15                 ` H. Peter Anvin
2001-11-27 16:59                   ` Matthias Andree
2001-11-27 16:56               ` Matthias Andree
2001-11-27  1:23         ` Ian Stirling
2001-11-26 23:00           ` Rob Landley
2001-11-27  2:41             ` H. Peter Anvin
2001-11-27  0:19               ` Rob Landley
2001-11-27 23:35                 ` Andreas Bombe
2001-11-28 14:32                   ` Rob Landley
2001-11-27  3:39             ` Ian Stirling
2001-11-27  7:03         ` Ville Herva
2001-11-27 16:50         ` Matthias Andree
2001-11-27 20:31           ` Rob Landley
2001-11-28 18:43             ` Matthias Andree
2001-11-28 18:46               ` Rob Landley
2001-11-28 22:19                 ` Matthias Andree
2001-11-29 22:21                   ` Pavel Machek
2001-12-01 10:55                     ` Jeff V. Merkey
2001-12-02  0:08                     ` Matthias Andree
2001-12-03 20:04                       ` Pavel Machek
2001-11-26 20:53     ` Richard B. Johnson
2001-11-26 21:18       ` Journaling pointless with today's hard disks? [wandering OT] Rob Landley
2001-11-27  0:32       ` Journaling pointless with today's hard disks? H. Peter Anvin
2001-11-27 16:39     ` Matthias Andree
2001-11-27 17:42       ` Martin Eriksson
2001-11-28 16:35         ` Ian Stirling
2001-11-26 17:14 ` Steve Brueggeman
2001-11-26 20:36   ` Andre Hedrick
2001-11-26 21:14     ` Steve Brueggeman
2001-11-26 21:36       ` Andre Hedrick
2001-11-27 16:36         ` Steve Brueggeman
2001-11-27 20:04           ` Bill Davidsen
2001-11-27 21:28         ` Wayne Whitney
2001-11-27 21:52           ` Andre Hedrick
2001-11-28 11:53             ` Pedro M. Rodrigues
  -- strict thread matches above, loose matches on Subject: below --
2001-11-25  1:20 dnu478nt5w@mailexpire.com
2001-11-28 14:36 Galappatti, Kishantha
2001-11-28 17:22 David Balazic
2001-11-28 23:25 Frank de Lange
2001-11-29  1:52 ` Matthias Andree
