linux-kernel.vger.kernel.org archive mirror
* Re: Blockbusting news, results are in
@ 2003-10-19  2:16 Norman Diamond
  2003-10-19  4:15 ` Larry McVoy
  2003-10-21  8:43 ` Jan-Benedict Glaw
  0 siblings, 2 replies; 52+ messages in thread
From: Norman Diamond @ 2003-10-19  2:16 UTC
  To: Hans Reiser, Wes Janzen, Rogier Wolff, John Bradford,
	linux-kernel, nikita, Pavel Machek, Justin Cormack, Russell King,
	Vitaly Fertman, Krzysztof Halasa

In order of importance instead of chronology:

In the presence of friends who are disk drive engineers at Toshiba, I tried
to read the file containing the bad block, we listened to the disk drive do
auto-retries, we watched Linux record the I/O failure in the system log, we
saw the "cp" program report an I/O error, and we observed that the drive did
not reallocate the bad block during reads.

Next, I tried to write the bad block.  The following command is not the
first one that I tried but it was the first one to actually try writing the
bad block:
  dd if=/dev/zero of=/dev/hda bs=512 seek=19021881 count=1
We listened to the disk drive do auto-retries, we watched Linux record the
I/O failure in the system log, we saw the "dd" command report an I/O error,
and we observed that the drive did not reallocate the bad block during
writes.  (The bad block number is 19021882.)
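
A minimal sketch, in Python, of what that dd invocation asks for.  dd's
seek= counts whole output blocks to skip, so seek=19021881 with bs=512
starts writing at byte 19021881 * 512.  That is the 19021882nd block if
blocks are counted from one, which is assumed here to be how the "bad block
number" above is counted.  The names byte_offset and write_zero_sector are
made up for this illustration, and running it against a real device is as
destructive as the dd command itself.

```python
import os

SECTOR_SIZE = 512  # matches bs=512 in the dd command


def byte_offset(seek_blocks, block_size=SECTOR_SIZE):
    """Byte position where dd starts writing for a given seek= value."""
    return seek_blocks * block_size


def write_zero_sector(path, seek_blocks, block_size=SECTOR_SIZE):
    """Overwrite one block in place with zeros, like count=1 from /dev/zero.

    Raises OSError if the kernel reports an I/O error for the write.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        written = os.pwrite(fd, bytes(block_size),
                            byte_offset(seek_blocks, block_size))
        os.fsync(fd)  # flush now so a device error surfaces in this call
        return written
    finally:
        os.close(fd)


# byte_offset(19021881) == 9739203072
```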

After a few other experiments, I used smartctl to direct the drive to do a
long self-test.  When it completed, we observed that the drive had
self-diagnosed a read failure on the same bad sector number as always, and
we observed that the drive did not reallocate the bad block during long
self-tests.

Does anyone need more?

A partial solution could be to stop using Toshiba drives, but I don't think
this will be a complete answer.  Toshiba is not the only maker whose disk
drives get bad blocks.  We do not know if Toshiba is the only maker whose
firmware refuses to reallocate bad blocks when permanent errors are
detected, because the makers aren't saying.

File systems must maintain lists of bad blocks and prevent ordinary file
operations from ever using those sector numbers.
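
To make the idea concrete, here is a toy sketch in Python of an allocator
that consults a bad block list.  It is not how ReiserFS or any real file
system stores its allocation state; the class and method names are
hypothetical and only illustrate the rule "never hand out a block that is on
the list".

```python
class BlockAllocator:
    """Toy free-block allocator that never hands out known-bad blocks."""

    def __init__(self, total_blocks, bad_blocks=()):
        self.bad = set(bad_blocks)
        self.free = set(range(total_blocks)) - self.bad

    def mark_bad(self, block):
        """Record a newly discovered bad block and withdraw it from use."""
        self.bad.add(block)
        self.free.discard(block)

    def allocate(self):
        """Return the lowest-numbered usable free block."""
        if not self.free:
            raise RuntimeError("no usable blocks left")
        block = min(self.free)
        self.free.remove(block)
        return block

    def release(self, block):
        """Return a block to the free pool unless it is known to be bad."""
        if block not in self.bad:
            self.free.add(block)
```

The point of release() ignoring bad blocks is the situation described later
in this message: a bad block that ends up in free space must not be handed
to the next file that grows.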

Someone pointed out that this technique will not work for swap partitions.
I agree.  The "mkswap" command needs to test every sector in the swap
partition and warn the user if the partition will be unusable.
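
A rough sketch of such a test in Python, reading every sector and collecting
the ones the kernel reports as failed.  The function name is invented for
this example.  It is read-only, so it would not catch sectors that fail only
on write, and reads may be satisfied from the page cache rather than from
the platters, so treat it as an outline rather than a replacement for a
proper surface scan.

```python
import os


def scan_for_bad_sectors(path, sector_size=512):
    """Read every sector of `path`; return indices whose read raised OSError."""
    bad = []
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size for regular files and,
        # on Linux, for block devices as well.
        size = os.lseek(fd, 0, os.SEEK_END)
        for index in range(size // sector_size):
            try:
                os.pread(fd, sector_size, index * sector_size)
            except OSError:
                bad.append(index)
    finally:
        os.close(fd)
    return bad
```

A caller would warn the user, or refuse to set up the swap area, when the
returned list is not empty.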

Now for the less important stuff.

After many hours of "find"ing and "cp"ing files to /dev/null, the bad block
was detected to be in file
  /usr/share/locale/es/LC_MESSAGES/bfd.mo
So indeed, this file had been written once and was not intended to be
written again, and could easily be restored from a source of good data.  But
I was really startled by this, because I don't use Spanish locales.  The
only locales I use are Japanese and English.  So why did this file even get
read, even while I was doing kernel compiles and stuff like that?  After
all, the reason the bad block was getting logged in the system log was that
the file was getting read.
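
The "find" and "cp to /dev/null" search can be sketched in Python as
follows.  This is an illustration of the approach, not the commands that
were actually run; the function name and the chunk size are arbitrary
choices.

```python
import os


def find_unreadable_files(root, chunk_size=1 << 20):
    """Read every regular file under `root`; return (path, errno) for failures."""
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, sockets, device nodes and the like.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as handle:
                    while handle.read(chunk_size):
                        pass  # discard the data, as cp to /dev/null would
            except OSError as exc:
                failures.append((path, exc.errno))
    return failures
```

Permission errors are reported the same way as I/O errors, so the errno in
each entry has to be checked before blaming the disk.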

I "mv"ed the file to file /badblockhere and used rpm with --replacepkgs to
reinstall binutils from SuSE's 8.2 distribution.  Then copied the new
correct file /usr/share/locale/es/LC_MESSAGES/bfd.mo to file /goodfilehere.
This preparation made it easy to do experiments with my Toshiba friends when
they visited.

My first attempt to write the bad block (after the read experiments) was:
  dd if=/goodfilehere of=/badblockhere
But this did not even try to write to the bad block.  The drive did not try
to do any auto-retries, there were no errors in the system log, and the dd
command output a success message.  Next, a repeat of a read attempt that
used to fail:
  cp /badblockhere /dev/null
succeeded.  So I guess that when the dd command is told to output to an
ordinary file, it does not overwrite its output file, it creates a new file
and then renames it to replace the old file.  (Too bad it couldn't do the
same when I ran this command:
  dd if=/dev/zero of=/dev/hda bs=512 seek=19021881 count=1
and write a new disk drive to replace the old one  ^u^)
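
A note on that guess, with a small Python sketch.  As far as its
documentation describes, dd does not rename anything: unless conv=notrunc is
given, it truncates the existing output file before writing, which releases
the file's old blocks and leaves the file system free to pick other blocks
for the new data.  Whether it actually picks different blocks depends on the
file system, so that part is an assumption.  The two helper names below are
invented; they only show how the two mechanisms can be told apart, because
truncation keeps the inode while create-and-rename produces a new one.

```python
import os
import tempfile


def overwrite_by_truncate(path, data):
    """Truncate the existing file and write new contents; return its inode."""
    with open(path, "wb") as handle:  # "wb" truncates, like dd's default
        handle.write(data)
    return os.stat(path).st_ino


def overwrite_by_rename(path, data):
    """Write a new file next to `path`, rename it over `path`; return inode."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    os.replace(tmp_path, path)  # atomic rename over the old name
    return os.stat(path).st_ino
```

Comparing os.stat(path).st_ino before and after each call shows the
difference: the first leaves the inode number unchanged, the second does
not.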

And now that block is in free space somewhere, waiting for Linux and the
Reiser filesystem to allocate it when creating or expanding some future
file.

The bad block can still be detected.  This fails as always:
  dd if=/dev/hda of=/dev/null bs=512 skip=19021881 count=1
(The bad block number is 19021882.)

By the way, Toshiba's US subsidiary has indications on their web site that
they provide warranty service on their products, but that they have reduced
the warranty period from three years to one year.  This was a smart move by
Toshiba's US subsidiary.  If their disk drives start to develop bad blocks
after two years, then customers don't discover how bad Toshiba's firmware is
until two years have passed, and now they can't even make claims to get
firmware fixed.

Toshiba's head office is even smarter.  In Japanese they refuse entirely to
provide warranty service to end users.  Customers have to send defective
disk drives back up through the sales channel.  Well, lucky customers who
bought the disk drive as part of a notebook computer probably get one year's
warranty from the vendor of the notebook computer, so if they're lucky
enough to learn about Toshiba's firmware within a year then they can send
their entire computer back for some length of time to get warranty service.
But anyone who went to Akihabara and bought the drive by itself from a parts
store, the store probably offers one week or one month to replace a failing
drive if it was dead on arrival.  In these cases a customer who learns about
Toshiba's firmware after two weeks or five weeks gets screwed.

My disk drive was made at Toshiba's factory in Gifu prefecture on September
13, 2001.  Since that time the factory has closed and this model has been
discontinued.

But Toshiba isn't the only maker who isn't saying how bad their firmware is.
We need those bad block lists.  They are as necessary as they ever were.


* RE: Blockbusting news, results are in
@ 2003-10-19  7:37 Mudama, Eric
  2003-10-19  8:09 ` Norman Diamond
                   ` (4 more replies)
  0 siblings, 5 replies; 52+ messages in thread
From: Mudama, Eric @ 2003-10-19  7:37 UTC
  To: 'Norman Diamond ', 'Hans Reiser ',
	'Wes Janzen ', 'Rogier Wolff ',
	'John Bradford ', 'linux-kernel@vger.kernel.org ',
	'nikita@namesys.com ', 'Pavel Machek ',
	'Justin Cormack ', 'Russell King ',
	'Vitaly Fertman ', 'Krzysztof Halasa '


> Does anyone need more?

Why don't you ask your friends at Toshiba whether that model supports
automatic reallocation, and if it does, how to enable it?

Since it isn't in the T13 ATA spec, I am assuming the ability to toggle that
feature is very vendor-specific.  Pretty sure all Maxtors from at least the
last year ship with that sort of reallocation enabled, and probably the last
4-5 years.

> We do not know if Toshiba is the only maker whose firmware
> refuses to reallocate bad blocks when permanent errors are
> detected, because the makers aren't saying.

What would you like "us disk makers" to say?  The drives I play with at work
happily reallocate on the fly all the time. (when I whack them with a
screwdriver and cause scratches on the media, that is)

> By the way, Toshiba's US subsidiary has indications on their
> web site that they provide warranty service on their products,
> but that they have reduced the warranty period from three years
> to one year.  This was a smart move by Toshiba's US subsidiary.

Yes, it saves us a lot of money every year, and lets us sell you each drive
for a few dollars cheaper.  My understanding is that the #1 cost issue is
the fact that to warranty a product legally in the USA, you need to maintain
a certain amount of product to handle replacement drives, long after they
stop being shipped.  Reducing our warranty inventory to some fraction of 1
year's volume (~55M drives) from some fraction of 3 years' volume (~160M
drives) is a significant amount of product we don't have to "eat".
(Remember, 3 year old drives, that we no longer need to hold on to for
warranty purposes, are near-worthless in the consumer market)

If every other part of your computer is warrantied for 1 year, why should
disk drives alone in the cheapest OEM systems carry 3 year warranties?  BTW,
you're welcome to buy "premium" drives with 3-year or 5-year warranties.  (3
on most vendors' high-end ATA products, and 5 years on most SCSI products)
In most cases these premium warranties will only cost you $5-$10.  (This is
based simply on the rough price delta between our DiamondMax Plus9 200GB and
our MaxLine II 200GB, which are basically the same drive with different
warranties)

> If their disk drives start to develop bad blocks after two
> years, then customers don't discover how bad Toshiba's firmware
> is until two years have passed, and now they can't even make
> claims to get firmware fixed.

What do you want "fixed" in the firmware?  It is 1000x cheaper to just send
you a replacement drive from the current product line.  By the time 3 years
have passed (2 years beyond a 1 year warranty), our factory isn't even
capable of reprocessing the disk drive you hold in your hands, since we wind
up retooling chunks of it every few months to make way for
bigger/faster/quieter/cheaper disk drives.

About 2.5 years ago, Maxtor's largest drive was 60GB... 15GB/head.  Now
we're shipping 250GB drives with 6 heads also... ~42GB/head, almost triple
the capacity, and in a few months we'll be doing a chunk better.

The only two parts in common between those two drives is the molex power
connector.

> Toshiba's head office is even smarter.  In Japanese they refuse
> entirely to provide warranty service to end users.  Customers
> have to send defective disk drives back up through the sales channel.

I guess my suggestion is don't buy Toshiba.  Research support options before
you buy.

> Well, lucky customers who bought the disk drive as part of a notebook
> computer probably get one year's warranty from the vendor of the
> notebook computer, so if they're lucky enough to learn about Toshiba's
> firmware within a year then they can send their entire computer back
> for some length of time to get warranty service.

See above.

> But anyone who went to Akihabara and bought the drive by itself from a
> parts store, the store probably offers one week or one month to
> replace a failing drive if it was dead on arrival.  In these cases
> a customer who learns about Toshiba's firmware after two weeks or five
> weeks gets screwed.

Don't buy drives from bargain basement shops.  Buy from trusted retailers,
or direct from the manufacturer.  That you bought from a place that probably
didn't even stock retail packages in shock-resistant packaging is stupid.

> But Toshiba isn't the only maker who isn't saying how bad their
> firmware is.  We need those bad block lists.  They are as
> necessary as they ever were.

We're not saying our firmware is bad because frankly, I think it is rather
decent, and getting better every single product we release.  Given that the
disk drive is probably the most complex piece of machinery in your home, I
think they do pretty well all things considered.

I still don't understand why your Toshiba engineer friends couldn't help you
beyond listening to the drive bounce off the crash stop.

(BTW, if the drive is clunking because it can't acquire at a certain
location, odds are that more than just the user data at that sector is a
problem.  Your testing doesn't indicate that, but I'd be suspicious
personally.)

--eric, speaking for myself not Maxtor of course




* RE: Blockbusting news, results are in
@ 2003-10-19 17:36 Mudama, Eric
  0 siblings, 0 replies; 52+ messages in thread
From: Mudama, Eric @ 2003-10-19 17:36 UTC
  To: 'Hans Reiser '
  Cc: ''Norman Diamond ' ',
	''Wes Janzen ' ',
	''Rogier Wolff ' ',
	''John Bradford ' ',
	''linux-kernel@vger.kernel.org ' ',
	''nikita@namesys.com ' ',
	''Pavel Machek ' ',
	''Justin Cormack ' ',
	''Russell King ' ',
	''Vitaly Fertman ' ',
	''Krzysztof Halasa ' '

 
> Eric, is it true what we tell users, that if a drive can't remap
> a bad block it has probably used up all its spares, and that in
> turn means that it is wise to buy a new one because the chance of
> experiencing additional data corruption on a drive that has used
> up all its spares is much higher than the average drive?

Not sure about other vendors, but a fatal write on a maxtor means we
couldn't do your write after exhausting all attempts at reallocation,
recertification, etc.  If you ever get this on a drive, either:

1) the drive is unable to reallocate any more blocks because it has run out
of spares

or

2) the drive was attempting those writes under environmental conditions that
it was unable to handle. (extreme shock&vibe, <5C, >55C, etc)

> What are the common sources of data corruption, is one of them
> that the drive head starts bumping the media more and more often
> because a bearing (or something) has started to show signs of wear?

From my understanding, most returns are due to damaged heads (some small
percent burn up over time) or operational shock "head/media events" where
someone bumped a running drive and the head dug a crater in the media.  Any
particulate contamination can be struck by the heads causing high-fly write
events.  (head bounces up off the media in the middle of a write).  I
haven't heard of bearing wear being a common issue... all drives these days
use fluid bearings.  Early fluid bearings had outgassing issues at high
temperature, but I think those problems were solved by manufacturers long
before the first drives using them hit store shelves.

All in all, they're rather delicate.  I'm amazed they work at all too.

--eric

* RE: Blockbusting news, results are in
@ 2003-10-19 17:39 Mudama, Eric
  0 siblings, 0 replies; 52+ messages in thread
From: Mudama, Eric @ 2003-10-19 17:39 UTC
  To: 'Ingo Oeser '; +Cc: 'linux-kernel@vger.kernel.org '

 

-----Original Message-----
From: Ingo Oeser
> So reducing the default warranty is no good customer service in my
> opinion. And in Germany you have 2 years warranty per law, which
> resemble the life cycle of a PC quite nicely.

It wasn't my idea, I'm just an engineer.

Maxtor is a big enough company, I would guess that their warranty law
compliance matches every country in which the drives are warrantied.

--eric



* RE: Blockbusting news, results are in
@ 2003-10-19 17:51 Mudama, Eric
  2003-10-20  6:22 ` Rogier Wolff
  0 siblings, 1 reply; 52+ messages in thread
From: Mudama, Eric @ 2003-10-19 17:51 UTC
  To: 'Rogier Wolff '
  Cc: ''Norman Diamond ' ',
	''Hans Reiser ' ',
	''Wes Janzen ' ',
	''John Bradford ' ',
	''linux-kernel@vger.kernel.org ' ',
	''nikita@namesys.com ' ',
	''Pavel Machek ' ',
	''Justin Cormack ' ',
	''Russell King ' ',
	''Vitaly Fertman ' ',
	''Krzysztof Halasa ' '

 

-----Original Message-----
From: Rogier Wolff

> Know your maxtor drives: Maxtor has been shipping 4-platter,
> 8 head drives for quite a long time. Only recently am I
> starting to see the largest maxtor-drive from a family having
> the space to carry 4 platters, but none of the expected
> capacity are shipping (*).... Care to explain?

I do know our product line.  All of our current 4-platter products are 5400
RPM, and have been for 4 years.  People aren't interested in 5400RPM drives
anymore, and the design tolerances on a 4-platter 7200RPM drive are tight
enough that it becomes extremely difficult to manufacture.

The volume on our only current 4-platter drive is quite small, compared to
the rest of our products.  Most of the industry is getting away from the
HUGE drives and saying that 7200RPM "very big" is more important than the
extra 50GB of capacity.  (300 vs 250)

> Eric, do you know why maxtor stopped putting the number of heads
> in the model number? (It's the last number in the model number,
> just after the letter. Currently all drives set this to "0"). It
> was quite convenient for us to know what to expect from a 92720U8,
> 98196H8, 96147H8 and 4G160J8. (Hmmm apparently, we're mostly
> buying the "largest of the family" drives: they all have 8
> heads! I just looked at the models in some of our computers.)

You're buying 5400RPM products not 7200RPM.  Your 4G160J8 drive was
manufactured over 2 years ago, it isn't a current product.

I'd guess that the reason we don't put the head number on the drive is to
not confuse OEM databases.  Our drives basically tell us at the end of
manufacturing how big they were able to become, regardless of head count.
To make it easier, our model number is now a capacity instead of a head
count.

It prevents Dell from saying "this model number comes in 4 sizes, we want
different part numbers for each capacity too!" so now we only give them the
capacity.

If you're looking for the densest drives our factory produces (which have,
by definition, the best sequential I/O performance), you can buy only the
model number of a capacity that is at the peak (e.g. a 250GB drive can't be
made with a 30GB head, while a 200GB can) of a generation.
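
The arithmetic behind that example, as a small Python sketch.  The 8-head
ceiling is an assumption taken from the 4-platter, 8-head drives mentioned
in the quoted question, not a figure stated in this paragraph, and the
function names are made up.

```python
import math


def heads_needed(capacity_gb, gb_per_head):
    """Smallest whole number of heads that reaches the target capacity."""
    return math.ceil(capacity_gb / gb_per_head)


def can_build(capacity_gb, gb_per_head, max_heads):
    """True if the capacity is reachable without exceeding max_heads."""
    return heads_needed(capacity_gb, gb_per_head) <= max_heads


# With 30GB heads and at most 8 heads (assumed):
#   can_build(250, 30, 8) is False, since 250GB would need 9 heads
#   can_build(200, 30, 8) is True, since 200GB needs 7 heads
```

So a 250GB model number implies heads denser than 30GB, while a 200GB model
number could be built from either generation of head.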

I think there are other ways to figure out how many heads are physically in
a drive, but I don't want to spoil it and take all the fun away.

--eric

* RE: Blockbusting news, results are in
@ 2003-10-20 14:08 Mudama, Eric
  2003-10-20 14:42 ` John Bradford
  0 siblings, 1 reply; 52+ messages in thread
From: Mudama, Eric @ 2003-10-20 14:08 UTC
  To: 'Rogier Wolff'
  Cc: ''Norman Diamond ' ',
	''Hans Reiser ' ',
	''Wes Janzen ' ',
	''John Bradford ' ',
	''linux-kernel@vger.kernel.org ' ',
	''nikita@namesys.com ' ',
	''Pavel Machek ' ',
	''Justin Cormack ' ',
	''Russell King ' ',
	''Vitaly Fertman ' ',
	''Krzysztof Halasa ' '



> -----Original Message-----
>
> We're getting annoyed at WD because they are selling WD800 drives 
> (80G) with 2, 4, 6 and 8 heads(*). So when we order a replacement
> WD800 for spare parts for a broken one, we might end up with a
> different generation drive which is useless for the "part exchange"
> project....
> 
> (*) they probably don't sell the full complement... yet.

With "today's generation" of ATA drives, WD and Maxtor stop at 3 platters,
and Seagate stops at a 2-platter design.

Everyone wants to make a 4-platter drive, but for their rather small
volumes, most people find it isn't cost effective.

> You're assuming that a head-switch is faster than a 
> track-to-track seek. Apparently that is no longer true.
> We've seen drives that "scan" a whole platter before
> switching heads. We've seen drives that do this on a 
> per-region basis. 

Track-to-track seeks are faster than headswitches because you don't have to
worry about radial comb imbalance (head A is only guaranteed to be within
some tolerance of head B, this creates your 'skew').

Most vendors these days have a "modified horizontal format" which does some
small number of cylinders (16?) then a headswitch, so that they slowly walk
inward.
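
One way to picture such a layout, sketched in Python.  Real formats are
vendor-specific and not public, so this is only an assumed model: tracks are
grouped into bands of 16 cylinders, and within a band each head covers all
16 cylinders before the next head takes over.  Any reversal of direction
between heads is left out for simplicity, and the function name is
hypothetical.

```python
def locate_track(track_index, heads, band_cylinders=16):
    """Map a logical track number to (cylinder, head) in a banded layout."""
    tracks_per_band = heads * band_cylinders
    band, within_band = divmod(track_index, tracks_per_band)
    head, step = divmod(within_band, band_cylinders)
    cylinder = band * band_cylinders + step
    return cylinder, head


# With 2 heads: tracks 0..15 are cylinders 0..15 on head 0,
# tracks 16..31 are cylinders 0..15 on head 1,
# and track 32 moves on to cylinder 16 on head 0.
```

In this model there is one head switch per 16 tracks instead of one per
track, and sequential I/O never strays more than a band away from where it
started.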

Drives that walk the entire platter then headswitch haven't existed for
years I'm quite sure, at least not in modern ATA drives.  Headswitches after
each zone would possibly make sense, but it can make it noticeably more
complicated to estimate the time for a seek, which is the key to good
performance.  (e.g. your "local area" IO becomes IO across potentially
thousands of cylinders)

> How about: "Opening it up and having a peek?" :-) That certainly
> works. But most vendors don't let me do that before I buy.  :-)

Before you buy? um, no =P

--eric

* RE: Blockbusting news, results are in
@ 2003-10-20 15:55 Mudama, Eric
  2003-10-20 17:32 ` Hans Reiser
  0 siblings, 1 reply; 52+ messages in thread
From: Mudama, Eric @ 2003-10-20 15:55 UTC
  To: 'Hans Reiser', Norman Diamond
  Cc: 'Wes Janzen ', 'Rogier Wolff ',
	'John Bradford ', linux-kernel, nikita,
	'Pavel Machek ', 'Justin Cormack ',
	'Russell King ', 'Vitaly Fertman ',
	'Krzysztof Halasa '


> -----Original Message-----
> From: Hans Reiser [mailto:reiser@namesys.com]
> Sent: Sunday, October 19, 2003 2:25 AM
> To: Norman Diamond
> Cc: Mudama, Eric; 'Wes Janzen '; 'Rogier Wolff '; 'John Bradford ';
> linux-kernel@vger.kernel.org; nikita@namesys.com; 'Pavel Machek ';
> 'Justin Cormack '; 'Russell King '; 'Vitaly Fertman '; 
> 'Krzysztof Halasa
> '
> Subject: Re: Blockbusting news, results are in
> 
> 
> Norman Diamond wrote:
> 
> >
> >>What would you like "us disk makers" to say?
> >>    
> >>
> >
> >How to force reallocations even when data are lost, 
> >
> buy Maxtor and write to them, thereby triggering the remap.

It isn't necessarily that simple.  However, if the drive is still alive, it
has written your data to a place where it can get at it again.  

* Re: Blockbusting news, results are in
@ 2003-10-21 15:18 Norman Diamond
  2003-10-21 19:31 ` Chuck Campbell
  0 siblings, 1 reply; 52+ messages in thread
From: Norman Diamond @ 2003-10-21 15:18 UTC
  To: linux-kernel

Jan-Benedict Glaw replied to me:

> > After a few other experiments, I used smartctl to direct the drive to do a
> > long self-test.  When it completed, we observed that the drive had
> > self-diagnosed a read failure on the same bad sector number as always, and
> > we observed that the drive did not reallocate the bad block during long
> > self-tests.
>
> Maybe the drive can't remap the block because there's no free space in
> the remap area available any more...

As previously reported about twice in this thread, the first time I ran
"smartctl -a" it reported that the quantities of reallocated sector events
and reallocated sector count were both 1, and when I ran it again after the
first long self-test, both of these quantities had increased to 2.  If there
were no room in the remap area, then how did the remaps increase from 1 to 2
while the permanently bad sector remained non-remapped and permanently bad?

Here are more results.  The quantities of reallocated sector events and
reallocated sector count have both increased to 3.  Meanwhile the
permanently bad sector remains permanently bad.  I think that there is still
room remaining in the remap area.
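
For anyone who wants to track the same two counters over time, here is a
rough Python sketch that pulls them out of saved smartctl output.  It
assumes the attribute table layout in which the attribute name is the second
column and the raw value is the tenth, and it assumes the attribute names
Reallocated_Sector_Ct and Reallocated_Event_Count; output from other
smartmontools versions or other drives may differ.  The function name is
invented for this example.

```python
def parse_reallocation_counts(smartctl_output):
    """Extract raw reallocation counters from smartctl attribute text.

    Returns a dict keyed by attribute name; attributes that are not
    present in the text are simply absent from the result.
    """
    wanted = {"Reallocated_Sector_Ct", "Reallocated_Event_Count"}
    counts = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Assumed columns: ID, NAME, FLAG, VALUE, WORST, THRESH,
        # TYPE, UPDATED, WHEN_FAILED, RAW_VALUE
        if len(fields) >= 10 and fields[1] in wanted:
            counts[fields[1]] = int(fields[9])
    return counts
```

Comparing the result before and after a self-test shows whether the drive
remapped anything in between, which is the observation reported above.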

By this time I think my friends at Toshiba agree that Toshiba's firmware is
inadequate, though participants in this LKML thread are about evenly divided
on the issue.  It seems that Maxtor's firmware is better, though I notice
the silence regarding my questions of customer non-service and uncertain
warranties (when Maxtor distributed one drive with two incompatible sets of
jumpering instructions).  And other manufacturers still aren't saying
whether their firmware is adequate.  It still seems that either Linux must
be made to work around known bad blocks or else hard disks and Linux cannot
be used together on a computer.



end of thread, other threads:[~2003-10-23 17:33 UTC | newest]

Thread overview: 52+ messages
2003-10-19  2:16 Blockbusting news, results are in Norman Diamond
2003-10-19  4:15 ` Larry McVoy
2003-10-19  5:00   ` Paul
2003-10-19  8:19     ` Andre Hedrick
2003-10-19  8:08   ` Hans Reiser
2003-10-19  8:35     ` William Lee Irwin III
2003-10-19 20:01       ` Pavel Machek
2003-10-19 20:11         ` William Lee Irwin III
2003-10-20  7:24         ` John Bradford
2003-10-19 22:49       ` jw schultz
2003-10-20  7:22         ` John Bradford
2003-10-20  8:22           ` jw schultz
2003-10-20  7:27         ` Hans Reiser
2003-10-20  8:08           ` jw schultz
2003-10-19 19:49     ` Pavel Machek
2003-10-20  7:22       ` Hans Reiser
2003-10-21 10:31     ` Eric W. Biederman
2003-10-21  8:43 ` Jan-Benedict Glaw
  -- strict thread matches above, loose matches on Subject: below --
2003-10-19  7:37 Mudama, Eric
2003-10-19  8:09 ` Norman Diamond
2003-10-19  8:24   ` Hans Reiser
2003-10-19 11:43   ` Ralf Baechle
2003-10-19 15:55   ` Krzysztof Halasa
2003-10-19  8:13 ` Rogier Wolff
2003-10-19  8:17 ` Hans Reiser
2003-10-19  8:41   ` Rogier Wolff
2003-10-20 15:56     ` Thayne Harbaugh
2003-10-19  8:21 ` Andre Hedrick
2003-10-19  8:27   ` Hans Reiser
2003-10-19  9:01     ` Erik Andersen
2003-10-19 14:10       ` Andre Hedrick
2003-10-19 18:16         ` Hans Reiser
2003-10-19 19:44           ` Andre Hedrick
2003-10-20  7:21             ` Hans Reiser
2003-10-19 14:42       ` Valdis.Kletnieks
2003-10-19 10:47 ` Ingo Oeser
2003-10-19 17:36 Mudama, Eric
2003-10-19 17:39 Mudama, Eric
2003-10-19 17:51 Mudama, Eric
2003-10-20  6:22 ` Rogier Wolff
2003-10-20 14:08 Mudama, Eric
2003-10-20 14:42 ` John Bradford
2003-10-20 15:55 Mudama, Eric
2003-10-20 17:32 ` Hans Reiser
2003-10-21 15:18 Norman Diamond
2003-10-21 19:31 ` Chuck Campbell
2003-10-21 20:05   ` Richard B. Johnson
2003-10-21 20:21     ` Valdis.Kletnieks
2003-10-21 20:31       ` Richard B. Johnson
2003-10-21 21:53       ` Theodore Ts'o
2003-10-22  2:32         ` Valdis.Kletnieks
2003-10-23 17:28           ` Theodore Ts'o
