Re: Blockbusting news, results are in

* Re: Blockbusting news, results are in
@ 2003-10-19  2:16 Norman Diamond
  2003-10-19  4:15 ` Larry McVoy
  2003-10-21  8:43 ` Jan-Benedict Glaw
  0 siblings, 2 replies; 52+ messages in thread
From: Norman Diamond @ 2003-10-19  2:16 UTC (permalink / raw)
  To: Hans Reiser, Wes Janzen, Rogier Wolff, John Bradford,
	linux-kernel, nikita, Pavel Machek, Justin Cormack, Russell King,
	Vitaly Fertman, Krzysztof Halasa

In order of importance instead of chronology:

In the presence of friends who are disk drive engineers at Toshiba, I tried
to read the file containing the bad block, we listened to the disk drive do
auto-retries, we watched Linux record the I/O failure in the system log, we
saw the "cp" program report an I/O error, and we observed that the drive did
not reallocate the bad block during reads.

Next, I tried to write the bad block.  The following command is not the
first one that I tried but it was the first one to actually try writing the
bad block:
  dd if=/dev/zero of=/dev/hda bs=512 seek=19021881 count=1
We listened to the disk drive do auto-retries, we watched Linux record the
I/O failure in the system log, we saw the "dd" command report an I/O error,
and we observed that the drive did not reallocate the bad block during
writes.  (The bad block number is 19021882.)
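A note on the numbers above, as a minimal sketch: dd counts seek= and skip= from zero, so seek=19021881 with bs=512 lands on the 19021882nd sector when counting from one, which may be why the two numbers differ by one.  The helper names below are hypothetical and only illustrate the offset arithmetic and a single-sector read; they are not what dd does internally.

```python
import os

SECTOR_SIZE = 512  # matches bs=512 in the dd command


def sector_offset(index, sector_size=SECTOR_SIZE):
    """Byte offset that dd reaches with seek=index or skip=index (zero-based)."""
    return index * sector_size


def read_sector(path, index, sector_size=SECTOR_SIZE):
    """Read one sector from a device or image file.

    An unreadable sector is expected to surface as OSError (typically EIO).
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, sector_size, sector_offset(index, sector_size))
    finally:
        os.close(fd)


if __name__ == "__main__":
    print(sector_offset(19021881))  # 9739203072
```

Pointing read_sector at a raw device such as /dev/hda needs root privileges; on a healthy sector it simply returns 512 bytes.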

After a few other experiments, I used smartctl to direct the drive to do a
long self-test.  When it completed, we observed that the drive had
self-diagnosed a read failure on the same bad sector number as always, and
we observed that the drive did not reallocate the bad block during long
self-tests.

Does anyone need more?

A partial solution could be to stop using Toshiba drives, but I don't think
this will be a complete answer.  Toshiba is not the only maker whose disk
drives get bad blocks.  We do not know if Toshiba is the only maker whose
firmware refuses to reallocate bad blocks when permanent errors are
detected, because the makers aren't saying.

File systems must maintain lists of bad blocks and prevent ordinary file
operations from ever using those sector numbers.

Someone pointed out that this technique will not work for swap partitions.
I agree.  The "mkswap" command needs to test every sector in the swap
partition and warn the user if the partition will be unusable.
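The kind of test described above (read every sector, collect the ones that fail) can be sketched roughly as follows.  This is only an illustration under stated assumptions: scan_bad_sectors and make_flaky_reader are hypothetical names, the "device" is simulated so the example can run anywhere, and it does not reflect how mkswap or any file system actually builds its bad block list.

```python
import errno


def scan_bad_sectors(read_sector, total_sectors):
    """Return the indices of sectors whose read raises OSError."""
    bad = []
    for index in range(total_sectors):
        try:
            read_sector(index)
        except OSError:
            bad.append(index)
    return bad


def make_flaky_reader(bad_indices, sector_size=512):
    """Simulated device: reads of the listed sectors fail with EIO."""
    def read_sector(index):
        if index in bad_indices:
            raise OSError(errno.EIO, "simulated I/O error")
        return bytes(sector_size)
    return read_sector


if __name__ == "__main__":
    reader = make_flaky_reader({3, 7})
    print(scan_bad_sectors(reader, 10))  # [3, 7]
```

A file system could keep the returned list away from its allocator; a swap setup tool could refuse the partition, or warn, when the list is not empty.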

Now for the less important stuff.

After many hours of "find"ing and "cp"ing files to /dev/null, the bad block
was detected to be in file
  /usr/share/locale/es/LC_MESSAGES/bfd.mo
So indeed, this file had been written once and was not intended to be
written again, and could easily be restored from a source of good data.  But
I was really startled by this, because I don't use Spanish locales.  The
only locales I use are Japanese and English.  So why did this file even get
read, even while I was doing kernel compiles and stuff like that?  After
all, the reason the bad block was getting logged in the system log was that
the file was getting read.
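For anyone who wants to repeat the search without hours of "find" and "cp", here is a rough sketch of the same idea: read every regular file under a directory and note the ones that fail.  The function name is hypothetical, and the sketch assumes that a bad block shows up as an OSError during read; permission problems raise OSError too, so the errno is kept for inspection.

```python
import os


def find_unreadable_files(root, chunk_size=1 << 20):
    """Read every regular file under root; return (path, errno) for failures."""
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, pipes, sockets and devices; only plain files count.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as handle:
                    while handle.read(chunk_size):
                        pass  # data is discarded, like cp to /dev/null
            except OSError as exc:
                failures.append((path, exc.errno))
    return failures
```

Files already held in the page cache may be served from memory, so a clean result right after a file was read is weaker evidence than a clean result after a reboot.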

I "mv"ed the file to file /badblockhere and used rpm with --replacepkgs to
reinstall binutils from SuSE's 8.2 distribution.  Then I copied the new
correct file /usr/share/locale/es/LC_MESSAGES/bfd.mo to file /goodfilehere.
This preparation made it easy to do experiments with my Toshiba friends when
they visited.

My first attempt to write the bad block (after the read experiments) was:
  dd if=/goodfilehere of=/badblockhere
But this did not even try to write to the bad block.  The drive did not try
to do any auto-retries, there were no errors in the system log, and the dd
command output a success message.  Next, a repeat of a read attempt that
used to fail:
  cp /badblockhere /dev/null
succeeded.  So I guess that when the dd command is told to output to an
ordinary file, it does not overwrite its output file, it creates a new file
and then renames it to replace the old file.  (Too bad it couldn't do the
same when I ran this command:
  dd if=/dev/zero of=/dev/hda bs=512 seek=19021881 count=1
and write a new disk drive to replace the old one  ^u^)
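One possible explanation, offered as a sketch rather than a diagnosis: as far as I know, dd by default truncates an existing output file before writing (its conv=notrunc option turns that off), so the file system is free to release the old blocks and allocate different ones.  Writing to the same offset of the existing file without truncation looks roughly like the following; whether the data then lands on the same physical block still depends on the file system.  The function name is hypothetical.

```python
import os


def overwrite_in_place(path, offset, data):
    """Overwrite bytes at offset without truncating or recreating the file."""
    fd = os.open(path, os.O_WRONLY)  # deliberately no O_TRUNC and no O_CREAT
    try:
        written = os.pwrite(fd, data, offset)
        if written != len(data):
            raise OSError("short write")
        os.fsync(fd)  # ask the kernel to push the write out to the device
    finally:
        os.close(fd)
```

With dd itself, the rough equivalent would be adding conv=notrunc to the command that writes the ordinary file.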

And now that block is in free space somewhere, waiting for Linux and the
Reiser filesystem to allocate it when creating or expanding some future
file.

The bad block can still be detected.  This fails as always:
  dd if=/dev/hda of=/dev/null bs=512 skip=19021881 count=1
(The bad block number is 19021882.)

By the way, Toshiba's US subsidiary has indications on their web site that
they provide warranty service on their products, but that they have reduced
the warranty period from three years to one year.  This was a smart move by
Toshiba's US subsidiary.  If their disk drives start to develop bad blocks
after two years, then customers don't discover how bad Toshiba's firmware is
until two years have passed, and now they can't even make claims to get
firmware fixed.

Toshiba's head office is even smarter.  In Japan they refuse entirely to
provide warranty service to end users.  Customers have to send defective
disk drives back up through the sales channel.  Well, lucky customers who
bought the disk drive as part of a notebook computer probably get one year's
warranty from the vendor of the notebook computer, so if they're lucky
enough to learn about Toshiba's firmware within a year then they can send
their entire computer back for some length of time to get warranty service.
But for anyone who went to Akihabara and bought the drive by itself from a
parts store, the store probably offers one week or one month to replace a
failing drive if it was dead on arrival.  In these cases a customer who
learns about Toshiba's firmware after two weeks or five weeks gets screwed.

My disk drive was made at Toshiba's factory in Gifu prefecture on September
13, 2001.  Since that time the factory has closed and this model has been
discontinued.

But Toshiba isn't the only maker who isn't saying how bad their firmware is.
We need those bad block lists.  They are as necessary as they ever were.


* Re: Blockbusting news, results are in
  2003-10-19  2:16 Blockbusting news, results are in Norman Diamond
@ 2003-10-19  4:15 ` Larry McVoy
  2003-10-19  5:00   ` Paul
  2003-10-19  8:08   ` Hans Reiser
  2003-10-21  8:43 ` Jan-Benedict Glaw
  1 sibling, 2 replies; 52+ messages in thread
From: Larry McVoy @ 2003-10-19  4:15 UTC (permalink / raw)
  To: Norman Diamond
  Cc: Hans Reiser, Wes Janzen, Rogier Wolff, John Bradford,
	linux-kernel, nikita, Pavel Machek, Justin Cormack, Russell King,
	Vitaly Fertman, Krzysztof Halasa

On Sun, Oct 19, 2003 at 11:16:42AM +0900, Norman Diamond wrote:
> We need those bad block lists.  They are as necessary as they ever were.

I'm not sure why this is a news flash.  When I was at Sun a 2GB drive
cost us $4000.  I think we sold them for $6000.  You can't buy a 2GB
drive today nor a 20GB drive.  A 200GB drive costs $160.  That's 100
times bigger for 25 times less money, or a net improvement in capacity
per dollar of 2500x.  In the same period of time, CPUs have not kept up
though they are close.
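Spelled out, using only the figures quoted above (the variable names are just for illustration):

```python
# Figures from the message: a 2GB drive at $4000 then, a 200GB drive at $160 now.
old_gb, old_cost = 2, 4000
new_gb, new_cost = 200, 160

capacity_ratio = new_gb / old_gb    # how many times bigger
cost_ratio = old_cost / new_cost    # how many times cheaper
improvement = capacity_ratio * cost_ratio  # capacity per dollar, now vs. then

print(capacity_ratio, cost_ratio, improvement)  # 100.0 25.0 2500.0
```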

You're surprised that drives are unreliable?  Please.  You are getting
unbelievable value from those drives and you demanded it.  Price is the
only way people make purchasing decisions, that's why DEC got out of the
drive business, then HP did, and then IBM did.  They couldn't afford to
compete with the cutrate junk that we call drives today.

I'm not blaming you, I'm as bad as the next guy, I buy based on price
as well but I have no illusions that what I am buying is reliable.
The drives we put into servers here go through a couple weeks of all bit
patterns being changed and even then we don't depend on them, everything
is backed up.

I've told you guys over and over that you need to CRC the data in user
space, we do that in our backup scripts and it tells us when the drives
are going bad.  So we don't get burned and you wouldn't either if you
did the same thing.
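A minimal sketch of what user-space CRC checking around a backup could look like.  This is not the actual script referred to above, which is not shown in this thread; the function names are hypothetical and zlib's CRC-32 stands in for whatever checksum is really used.

```python
import zlib


def file_crc32(path, chunk_size=1 << 20):
    """CRC-32 of a file's contents, computed in chunks."""
    crc = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def build_manifest(paths):
    """Record a CRC for each file at backup time."""
    return {path: file_crc32(path) for path in paths}


def verify_manifest(manifest):
    """Return the paths that no longer match their recorded CRC or cannot be read."""
    damaged = []
    for path, expected in manifest.items():
        try:
            if file_crc32(path) != expected:
                damaged.append(path)
        except OSError:
            damaged.append(path)
    return damaged
```

Running verify_manifest on each backup pass flags files whose contents changed without anyone writing to them, which is the early warning described above.  Files that were legitimately modified need their manifest entry refreshed first.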

Drives are amazingly cheap, it's a miracle that they work at all, don't
be so surprised when they don't.
-- 
---
Larry McVoy              lm at bitmover.com          http://www.bitmover.com/lm


* Re: Blockbusting news, results are in
  2003-10-19  4:15 ` Larry McVoy
@ 2003-10-19  5:00   ` Paul
  2003-10-19  8:19     ` Andre Hedrick
  2003-10-19  8:08   ` Hans Reiser
  1 sibling, 1 reply; 52+ messages in thread
From: Paul @ 2003-10-19  5:00 UTC (permalink / raw)
  To: Larry McVoy, Norman Diamond; +Cc: linux-kernel

Larry McVoy <lm@bitmover.com>, on Sat Oct 18, 2003 [09:15:53 PM] said:
> On Sun, Oct 19, 2003 at 11:16:42AM +0900, Norman Diamond wrote:
> > We need those bad block lists.  They are as necessary as they ever were.
> 
> I'm not sure why this is a news flash.  When I was at Sun a 2GB drive
> cost us $4000.  I think we sold them for $6000.  You can't buy a 2GB
> drive today nor a 20GB drive.  A 200GB drive costs $160.  That's 100
> times bigger for 25 times less money, or a net increase of price/capacity
> of 2500.  In the same period of time, CPUs have not kept up though they
> are close.
> 
> You're surprised that drives are unreliable?  Please.  You are getting
> unbelievable value from those drives and you demanded it.  Price is the
> only way people make purchasing decisions, that's why DEC got out of the
> drive business, then HP did, and then IBM did.  They couldn't afford to
> compete with the cutrate junk that we call drives today.
> 
> I'm not blaming you, I'm as bad as the next guy, I buy based on price
> as well but I have no illusions that what I am buying is reliable.
> The drives we put into servers here go through a couple weeks of all bit
> patterns being changed and even then we don't depend on them, everything
> is backed up.
> 
> I've told you guys over and over that you need to CRC the data in user
> space, we do that in our backup scripts and it tells us when the drives
> are going bad.  So we don't get burned and you wouldn't either if you
> did the same thing.
> 
> Drives are amazingly cheap, it's a miracle that they work at all, don't
> be so surprised when they don't.
> -- 
> ---
> Larry McVoy              lm at bitmover.com          http://www.bitmover.com/lm
        Hi;

        I think you may be missing the point he is trying to make
in order to take your hobby horse for a spin ;) He is trying to
claim that he has a disk that is not dying, that has a bad
sector that he can't get remapped, and thus, there needs to be
support for bad blocks in the filesystem layer (in the face
of the argument that modern disks make filesystem support of
bad blocks irrelevant).
        As a side note, I also have a 6gig disk, which a few
years ago was, ahem, bumped during a write. It now has a handful
of screwy sectors that I can't get rid of, even after doing
the stuff Norman describes. I used the -c option to e2fsck,
and it's been doing great ever since: a few years of use without
more bad sectors.

Paul
set@pobox.com



* Re: Blockbusting news, results are in
  2003-10-19  5:00   ` Paul
@ 2003-10-19  8:19     ` Andre Hedrick
  0 siblings, 0 replies; 52+ messages in thread
From: Andre Hedrick @ 2003-10-19  8:19 UTC (permalink / raw)
  To: Paul; +Cc: Larry McVoy, Norman Diamond, linux-kernel

On Sun, 19 Oct 2003, Paul wrote:

> Larry McVoy <lm@bitmover.com>, on Sat Oct 18, 2003 [09:15:53 PM] said:
> > On Sun, Oct 19, 2003 at 11:16:42AM +0900, Norman Diamond wrote:
> > > We need those bad block lists.  They are as necessary as they ever were.
> > 
> > I'm not sure why this is a news flash.  When I was at Sun a 2GB drive
> > cost us $4000.  I think we sold them for $6000.  You can't buy a 2GB
> > drive today nor a 20GB drive.  A 200GB drive costs $160.  That's 100
> > times bigger for 25 times less money, or a net increase of price/capacity
> > of 2500.  In the same period of time, CPUs have not kept up though they
> > are close.
> > 
> > You're surprised that drives are unreliable?  Please.  You are getting
> > unbelievable value from those drives and you demanded it.  Price is the
> > only way people make purchasing decisions, that's why DEC got out of the
> > drive business, then HP did, and then IBM did.  They couldn't afford to
> > compete with the cutrate junk that we call drives today.
> > 
> > I'm not blaming you, I'm as bad as the next guy, I buy based on price
> > as well but I have no illusions that what I am buying is reliable.
> > The drives we put into servers here go through a couple weeks of all bit
> > patterns being changed and even then we don't depend on them, everything
> > is backed up.
> > 
> > I've told you guys over and over that you need to CRC the data in user
> > space, we do that in our backup scripts and it tells us when the drives
> > are going bad.  So we don't get burned and you wouldn't either if you
> > did the same thing.
> > 
> > Drives are amazingly cheap, it's a miracle that they work at all, don't
> > be so surprised when they don't.
> > -- 
> > ---
> > Larry McVoy              lm at bitmover.com          http://www.bitmover.com/lm
>         Hi;
> 
>         I think you may be missing the point he is trying to make
> in order to take your hobby horse for a spin ;) He is trying to
> claim that he has a disk that is not dying, that has a bad
> sector that he can't get remapped, and thus, there needs to be
> support for bad blocks in the filesystem layer (in the face
> of the argument that modern disks make filesystem support of
> bad blocks irrelevant).

First you have to make Linux have a direct path back to the application
layer which owns the request.  Then you can attempt a filesystem remapping
code war.

Well, basically there are ways to force the remap, but 99% of people
cannot and will not go through the hassle.  So I am not going to
spend time explaining each and every vendor mode.

That is what people in media forensics get paid to do.

>         As a side note, I also have a 6gig disk, which a few
> years ago was, ahem, bumped during a write. It now has a handful
> of screwy sectors that I can't get rid of, even after doing
> the stuff Norman describes. I used the -c option to e2fsck,
> and it's been doing great ever since: a few years of use without
> more bad sectors.
> 
> Paul
> set@pobox.com



* Re: Blockbusting news, results are in
  2003-10-19  4:15 ` Larry McVoy
  2003-10-19  5:00   ` Paul
@ 2003-10-19  8:08   ` Hans Reiser
  2003-10-19  8:35     ` William Lee Irwin III
                       ` (2 more replies)
  1 sibling, 3 replies; 52+ messages in thread
From: Hans Reiser @ 2003-10-19  8:08 UTC (permalink / raw)
  To: Larry McVoy
  Cc: Norman Diamond, Wes Janzen, Rogier Wolff, John Bradford,
	linux-kernel, nikita, Pavel Machek, Justin Cormack, Russell King,
	Vitaly Fertman, Krzysztof Halasa

Larry McVoy wrote:

>I've told you guys over and over that you need to CRC the data in user
>space, we do that in our backup scripts and it tells us when the drives
>are going bad.
>
Why do the CRC in user space?  That requires modifying every one of 7000+
applications (if I understand you correctly, which is far from a sure
thing ;-) ).

Write a reiser4 CRC file plugin.  It would take a weekend, and most of the
work would be cutting and pasting from the default file plugin.

I understand why you do it in BK, but for user space as a whole, user space
is the wrong place.




* Re: Blockbusting news, results are in
  2003-10-19  8:08   ` Hans Reiser
@ 2003-10-19  8:35     ` William Lee Irwin III
  2003-10-19 20:01       ` Pavel Machek
  2003-10-19 22:49       ` jw schultz
  2003-10-19 19:49     ` Pavel Machek
  2003-10-21 10:31     ` Eric W. Biederman
  2 siblings, 2 replies; 52+ messages in thread
From: William Lee Irwin III @ 2003-10-19  8:35 UTC (permalink / raw)
  To: Hans Reiser
  Cc: Larry McVoy, Norman Diamond, Wes Janzen, Rogier Wolff,
	John Bradford, linux-kernel, nikita, Pavel Machek, Justin Cormack,
	Russell King, Vitaly Fertman, Krzysztof Halasa, axboe

Larry McVoy wrote:
>> I've told you guys over and over that you need to CRC the data in user
>> space, we do that in our backup scripts and it tells us when the drives
>> are going bad.  S

On Sun, Oct 19, 2003 at 12:08:00PM +0400, Hans Reiser wrote:
> Why do the CRC in user space, that requires modifying every one of 7000+ 
> applications (if I understand you correctly, which is far from a sure 
> thing;-) )?
> Write a reiser4 CRC file plugin.  It would take a weekend, and most of the 
> work would be cut and pasting from the default file plugin..  
> I understand why you do it in BK, but for user space as a whole user space 
> is the wrong place.

I think the fs driver layer might be the wrong thing too; maybe it'd be
best to do the CRC and/or checksumming at the block layer?

At the very least, I see a lack of genericity with respect to making it a
plugin to a specific fs. I'm going to try not to delve too far into
specifics, as my knowledge in these areas is limited, but I'd welcome any
corrections of misunderstandings I might have about feasibility, value,
or importance of these things, and even technical misconceptions.
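To make the block-layer idea concrete, here is a toy sketch that works on an in-memory image rather than a real block device.  It keeps one checksum per fixed-size block, independent of any file system, and reports which blocks no longer match.  The names are hypothetical, CRC-32 from zlib is an assumption, and this is not based on any existing kernel patch.

```python
import zlib

BLOCK_SIZE = 512  # assumed block size for the illustration


def block_checksums(image, block_size=BLOCK_SIZE):
    """One CRC-32 per block of the image, in block order."""
    return [
        zlib.crc32(image[start:start + block_size])
        for start in range(0, len(image), block_size)
    ]


def corrupt_blocks(image, checksums, block_size=BLOCK_SIZE):
    """Indices of blocks whose current CRC differs from the stored one."""
    current = block_checksums(image, block_size)
    return [
        index
        for index, (now, stored) in enumerate(zip(current, checksums))
        if now != stored
    ]
```

Because the checksums are keyed by block number, the same check covers every file system (and swap) on the device, which is the genericity argument made above.  The cost is that the checksums themselves have to be stored somewhere.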

Jens, I apologize in advance if this is just another lame flamewar best
bitbucketed as opposed to answered.

Thanks.

-- wli


* Re: Blockbusting news, results are in
  2003-10-19  8:35     ` William Lee Irwin III
@ 2003-10-19 20:01       ` Pavel Machek
  2003-10-19 20:11         ` William Lee Irwin III
  2003-10-20  7:24         ` John Bradford
  2003-10-19 22:49       ` jw schultz
  1 sibling, 2 replies; 52+ messages in thread
From: Pavel Machek @ 2003-10-19 20:01 UTC (permalink / raw)
  To: William Lee Irwin III, Hans Reiser, Larry McVoy, Norman Diamond,
	Wes Janzen, Rogier Wolff, John Bradford, linux-kernel, nikita,
	Pavel Machek, Justin Cormack, Russell King, Vitaly Fertman,
	Krzysztof Halasa, axboe

Hi!

> >> I've told you guys over and over that you need to CRC the data in user
> >> space, we do that in our backup scripts and it tells us when the drives
> >> are going bad.  S
> 
> On Sun, Oct 19, 2003 at 12:08:00PM +0400, Hans Reiser wrote:
> > Why do the CRC in user space, that requires modifying every one of 7000+ 
> > applications (if I understand you correctly, which is far from a sure 
> > thing;-) )?
> > Write a reiser4 CRC file plugin.  It would take a weekend, and most of the 
> > work would be cut and pasting from the default file plugin..  
> > I understand why you do it in BK, but for user space as a whole user space 
> > is the wrong place.
> 
> I think the fs driver layer might be the wrong thing too; maybe it'd be
> best to do the CRC and/or checksumming at the block layer?

I think that's the best place.

Here's a first attempt at an implementation:
http://www.ussg.iu.edu/hypermail/linux/kernel/0004.3/0487.html

									Pavel
-- 
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]


* Re: Blockbusting news, results are in
  2003-10-19 20:01       ` Pavel Machek
@ 2003-10-19 20:11         ` William Lee Irwin III
  2003-10-20  7:24         ` John Bradford
  1 sibling, 0 replies; 52+ messages in thread
From: William Lee Irwin III @ 2003-10-19 20:11 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Hans Reiser, Larry McVoy, Norman Diamond, Wes Janzen,
	Rogier Wolff, John Bradford, linux-kernel, nikita, Pavel Machek,
	Justin Cormack, Russell King, Vitaly Fertman, Krzysztof Halasa,
	axboe

At some point in the past, I wrote:
>> I think the fs driver layer might be the wrong thing too; maybe it'd be
>> best to do the CRC and/or checksumming at the block layer?

On Sun, Oct 19, 2003 at 10:01:05PM +0200, Pavel Machek wrote:
> I think that's best place.
> Here's first attempt at implementation:
> http://www.ussg.iu.edu/hypermail/linux/kernel/0004.3/0487.html

Wow, that was a while ago. Well, I'm impressed.


-- wli


* Re: Blockbusting news, results are in
  2003-10-19 20:01       ` Pavel Machek
  2003-10-19 20:11         ` William Lee Irwin III
@ 2003-10-20  7:24         ` John Bradford
  1 sibling, 0 replies; 52+ messages in thread
From: John Bradford @ 2003-10-20  7:24 UTC (permalink / raw)
  To: Pavel Machek, William Lee Irwin III, Hans Reiser, Larry McVoy,
	Norman Diamond, Wes Janzen, Rogier Wolff, linux-kernel, nikita,
	Pavel Machek, Justin Cormack, Russell King, Vitaly Fertman,
	Krzysztof Halasa, axboe

> > I think the fs driver layer might be the wrong thing too; maybe it'd be
> > best to do the CRC and/or checksumming at the block layer?
> 
> I think that's best place.

Yes, because you can then use it on ST-506 disks if you really want
to, and never see a bad block at the filesystem level.  It also means
that people with drives that do defect management to their
satisfaction don't need the overhead of a software defect management
layer, however small it may be.

John.


* Re: Blockbusting news, results are in
  2003-10-19  8:35     ` William Lee Irwin III
  2003-10-19 20:01       ` Pavel Machek
@ 2003-10-19 22:49       ` jw schultz
  2003-10-20  7:22         ` John Bradford
  2003-10-20  7:27         ` Hans Reiser
  1 sibling, 2 replies; 52+ messages in thread
From: jw schultz @ 2003-10-19 22:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: William Lee Irwin III, Hans Reiser, Larry McVoy, Norman Diamond,
	Wes Janzen, Rogier Wolff, John Bradford, nikita, Pavel Machek,
	Justin Cormack, Russell King, Vitaly Fertman, Krzysztof Halasa,
	axboe

On Sun, Oct 19, 2003 at 01:35:51AM -0700, William Lee Irwin III wrote:
> Larry McVoy wrote:
> >> I've told you guys over and over that you need to CRC the data in user
> >> space, we do that in our backup scripts and it tells us when the drives
> >> are going bad.  S
> 
> On Sun, Oct 19, 2003 at 12:08:00PM +0400, Hans Reiser wrote:
> > Why do the CRC in user space, that requires modifying every one of 7000+ 
> > applications (if I understand you correctly, which is far from a sure 
> > thing;-) )?
> > Write a reiser4 CRC file plugin.  It would take a weekend, and most of the 
> > work would be cut and pasting from the default file plugin..  
> > I understand why you do it in BK, but for user space as a whole user space 
> > is the wrong place.
> 
> I think the fs driver layer might be the wrong thing too; maybe it'd be
> best to do the CRC and/or checksumming at the block layer?

Or even better, do it on the disk controller strapped to the
physical disk so you can hide the fact that CRCs add data
overhead for every block, making them longer than 2^n, and can
use a CRC optimised for the type of errors most common
on the media.  Wait, that is where it is already being done.

What is apparently missing is better handling of the
uncorrectable errors.  Specifically the ability to pass the
errors and warnings up to the OS for evaluation and for the
OS to be able to request a block remap or to undo a block
remap.

I'd guess that most of Hans' errors are not coming from
spinning media but from transmission errors on the HD cables
and system busses, and from undetected memory errors.


-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw@pegasys.ws

		Remember Cernan and Schmitt


* Re: Blockbusting news, results are in
  2003-10-19 22:49       ` jw schultz
@ 2003-10-20  7:22         ` John Bradford
  2003-10-20  8:22           ` jw schultz
  2003-10-20  7:27         ` Hans Reiser
  1 sibling, 1 reply; 52+ messages in thread
From: John Bradford @ 2003-10-20  7:22 UTC (permalink / raw)
  To: jw schultz, linux-kernel
  Cc: William Lee Irwin III, Hans Reiser, Larry McVoy, Norman Diamond,
	Wes Janzen, Rogier Wolff, John Bradford, nikita, Pavel Machek,
	Justin Cormack, Russell King, Vitaly Fertman, Krzysztof Halasa,
	axboe

> What is apparently missing is better handling of the
> uncorrectable errors.  Specifically the ability to pass the
> errors and warnings up to the OS for evaluation and for the
> OS to be able to request a block remap or to undo a block
> remap.

Why this suggestion keeps coming up, I have no idea.  If you take
the idea to its extreme, it's basically saying that we should
off-load all processing on to the host.  Although there has been a
move towards dumb peripherals in recent years, (E.G. software modems),
I have seen almost no even vaguely convincing arguments other than
cost as to why they are superior, (lower latency has been mentioned
with regard to software modems - I fail to see the benefit, although I
suppose it might exist for games players).  Apart from some data
recovery applications, I don't see how it is possible to do anything
really useful simply by adding the ability to pass some warnings and
errors up to the OS, without giving the OS access to all of the data
that the drive firmware has access to.

Obviously drives with completely open and free firmware would be
great, but that is not likely to happen in the near future, so for the
time being, if you don't like the way drives handle defect management,
complain to the manufacturers.  I am satisfied with the way Maxtor
disks handle defect management, going by both Eric's explanations and
my own observations.

John.


* Re: Blockbusting news, results are in
  2003-10-20  7:22         ` John Bradford
@ 2003-10-20  8:22           ` jw schultz
  0 siblings, 0 replies; 52+ messages in thread
From: jw schultz @ 2003-10-20  8:22 UTC (permalink / raw)
  To: linux-kernel

On Mon, Oct 20, 2003 at 08:22:59AM +0100, John Bradford wrote:
> > What is apparently missing is better handling of the
> > uncorrectable errors.  Specifically the ability to pass the
> > errors and warnings up to the OS for evaluation and for the
> > OS to be able to request a block remap or to undo a block
> > remap.
> 
> Why this suggestion keeps coming up, I have no idea.  If you take
> the idea to its extreme, it's basically saying that we should
> off-load all processing on to the host.  Although there has been a
> move towards dumb peripherals in recent years, (E.G. software modems),
> I have seen almost no even vaguely convincing arguments other than
> cost as to why they are superior, (lower latency has been mentioned
> with regard to software modems - I fail to see the benefit, although I
> suppose it might exist for games players).  Apart from some data
> recovery applications, I don't see how it is possible to do anything
> really useful simply by adding the ability to pass some warnings and
> errors up to the OS, without giving the OS access to all of the data
> that the drive firmware has access to.

I'm not suggesting the drive off-load it.  I'm only
suggesting that there be a mechanism for the host to be more
involved if the host is capable.

The problem that began this thread is a perfect example.  A
bad block that the drive firmware apparently will not remap
calls for the ability to explicitly instruct the drive to
remap it.  In some cases it might be good to be able to let
the host countermand a remap if the disk reports overtemp.

> Obviously drives with completely open and free firmware would be
> great, but that is not likely to happen in the near future, so for the
> time being, if you don't like the way drives handle defect management,
> complain to the manufacturers.  I am satisfied with the way Maxtor
> disks handle defect management, going by both Eric's explanations and
> my own observations.

No disagreement here.

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw@pegasys.ws

		Remember Cernan and Schmitt


* Re: Blockbusting news, results are in
  2003-10-19 22:49       ` jw schultz
  2003-10-20  7:22         ` John Bradford
@ 2003-10-20  7:27         ` Hans Reiser
  2003-10-20  8:08           ` jw schultz
  1 sibling, 1 reply; 52+ messages in thread
From: Hans Reiser @ 2003-10-20  7:27 UTC (permalink / raw)
  To: jw schultz
  Cc: linux-kernel, William Lee Irwin III, Larry McVoy, Norman Diamond,
	Wes Janzen, Rogier Wolff, John Bradford, nikita, Pavel Machek,
	Justin Cormack, Russell King, Vitaly Fertman, Krzysztof Halasa,
	axboe

jw schultz wrote:

>
>
>I'd guess that most of Hans' errors are not coming from
>spinning media but from transmission errors on the HD cables
>and system busses, and from undetected memory errors.
>
>
>  
>

I am not getting any errors;-), but what you say sounds likely to be 
true of the reiserfs  users who see errors, and I trust Larry's account 
of his users.

-- 
Hans




* Re: Blockbusting news, results are in
  2003-10-20  7:27         ` Hans Reiser
@ 2003-10-20  8:08           ` jw schultz
  0 siblings, 0 replies; 52+ messages in thread
From: jw schultz @ 2003-10-20  8:08 UTC (permalink / raw)
  To: linux-kernel

On Mon, Oct 20, 2003 at 11:27:23AM +0400, Hans Reiser wrote:
> jw schultz wrote:
> 
> >
> >
> >I'd guess that most of Hans' errors are not coming from
> >spinning media but from transmission errors on the HD cables
> >and system busses, and from undetected memory errors.
> >
> >
> > 
> >
> 
> I am not getting any errors;-), but what you say sounds likely to be 
> true of the reiserfs  users who see errors, and I trust Larry's account 
> of his users.

Sorry, I meant Larry.

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw@pegasys.ws

		Remember Cernan and Schmitt


* Re: Blockbusting news, results are in
  2003-10-19  8:08   ` Hans Reiser
  2003-10-19  8:35     ` William Lee Irwin III
@ 2003-10-19 19:49     ` Pavel Machek
  2003-10-20  7:22       ` Hans Reiser
  2003-10-21 10:31     ` Eric W. Biederman
  2 siblings, 1 reply; 52+ messages in thread
From: Pavel Machek @ 2003-10-19 19:49 UTC (permalink / raw)
  To: Hans Reiser
  Cc: Larry McVoy, Norman Diamond, Wes Janzen, Rogier Wolff,
	John Bradford, linux-kernel, nikita, Pavel Machek, Justin Cormack,
	Russell King, Vitaly Fertman, Krzysztof Halasa

Hi!

> >I've told you guys over and over that you need to CRC the data in user
> >space, we do that in our backup scripts and it tells us when the drives
> >are going bad.  S
> >
> Why do the CRC in user space, that requires modifying every one of 7000+ 
> applications (if I understand you correctly, which is far from a sure 
> thing;-) )?
> 
> Write a reiser4 CRC file plugin.  It would take a weekend, and most of the 
> work would be cut and pasting from the default file plugin..  

Even better, use the crc loop method, and do checks at block device
level. I had a patch to implement that...
								Pavel
-- 
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]


* Re: Blockbusting news, results are in
  2003-10-19 19:49     ` Pavel Machek
@ 2003-10-20  7:22       ` Hans Reiser
  0 siblings, 0 replies; 52+ messages in thread
From: Hans Reiser @ 2003-10-20  7:22 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Larry McVoy, Norman Diamond, Wes Janzen, Rogier Wolff,
	John Bradford, linux-kernel, nikita, Justin Cormack, Russell King,
	Vitaly Fertman, Krzysztof Halasa

Pavel Machek wrote:

>
>Even better, use the CRC loop method and do the checks at the block device
>level. I had a patch to implement that...
>								Pavel
>  
>
this would not get the memory errors that Larry was talking about as 
well..... which is not to discount your patch, sounds like a reasonable 
patch.....

-- 
Hans



* Re: Blockbusting news, results are in
  2003-10-19  8:08   ` Hans Reiser
  2003-10-19  8:35     ` William Lee Irwin III
  2003-10-19 19:49     ` Pavel Machek
@ 2003-10-21 10:31     ` Eric W. Biederman
  2 siblings, 0 replies; 52+ messages in thread
From: Eric W. Biederman @ 2003-10-21 10:31 UTC (permalink / raw)
  To: Hans Reiser
  Cc: Larry McVoy, Norman Diamond, Wes Janzen, Rogier Wolff,
	John Bradford, linux-kernel, nikita, Pavel Machek, Justin Cormack,
	Russell King, Vitaly Fertman, Krzysztof Halasa

Hans Reiser <reiser@namesys.com> writes:

> Larry McVoy wrote:
> 
> >
> >
> >I've told you guys over and over that you need to CRC the data in user
> >space, we do that in our backup scripts and it tells us when the drives
> >are going bad.  S
> >
> Why do the CRC in user space, that requires modifying every one of 7000+
> applications (if I understand you correctly, which is far from a sure thing;-)
> )?

End to end data integrity checking is a requirement.  Otherwise errors
happen silently and you rarely if ever see them.  And the error checking
must be end to end because you cannot trust the other layers to work
properly 100% of the time.

However, to actually track errors down to their root causes, the closer you
can have your error checking to the hardware the better.  So having
CRC data or similar in the filesystem for both the metadata and the
file information is a good thing.

> Write a reiser4 CRC file plugin.  It would take a weekend, and most of the work
> would be cut and pasting from the default file plugin..  I understand why you do
> it in BK, but for user space as a whole user space is the wrong
> place.

Error checking should not be necessary for casual files that you don't
really care about but for times when you care about the integrity of
your data the application should be checking it.
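
As a minimal sketch of what application-level checking could look like
(an illustration only, not how BK or any particular backup script does it):
record a CRC32 when the data is written and compare it when the data is read
back. The helper names `file_crc32` and `verify` are made up for the example.

```python
import os
import tempfile
import zlib


def file_crc32(path: str, chunk_size: int = 1 << 16) -> int:
    """Compute the CRC32 of a file by streaming it in chunks."""
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return crc


def verify(path: str, expected_crc: int) -> bool:
    """Return True if the file still matches the CRC recorded at write time."""
    return file_crc32(path) == expected_crc


# Demonstration on a throwaway temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"data worth caring about\n")
    demo_path = tmp.name

recorded = file_crc32(demo_path)   # the application stores this with the data
ok_before = verify(demo_path, recorded)

with open(demo_path, "r+b") as f:  # simulate corruption below the application
    f.write(b"D")

ok_after = verify(demo_path, recorded)
os.remove(demo_path)
print(ok_before, ok_after)
```

Because the check runs in the application, it covers every layer underneath
it, at the cost of not saying which layer was at fault.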

Eric

* Re: Blockbusting news, results are in
  2003-10-19  2:16 Blockbusting news, results are in Norman Diamond
  2003-10-19  4:15 ` Larry McVoy
@ 2003-10-21  8:43 ` Jan-Benedict Glaw
  1 sibling, 0 replies; 52+ messages in thread
From: Jan-Benedict Glaw @ 2003-10-21  8:43 UTC (permalink / raw)
  To: linux-kernel

On Sun, 2003-10-19 11:16:42 +0900, Norman Diamond <ndiamond@wta.att.ne.jp>
wrote in message <1c6401c395e7$16630d00$3eee4ca5@DIAMONDLX60>:
> After a few other experiments, I used smartctl to direct the drive to do a
> long self-test.  When it completed, we observed that the drive had
> self-diagnosed a read failure on the same bad sector number as always, and
> we observed that the drive did not reallocate the bad block during long
> self-tests.

Maybe the drive can't remap the block because there's no free space in
the remap area available any more...

In this case, the real problem is that the drive doesn't tell in advance
that it has already remapped some blocks.
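
For drives that do expose it, the remap count can be read from the SMART
attribute table. The sketch below parses text in the column layout that
smartctl -A prints in current smartmontools releases; the sample values are
invented for illustration, and not every drive reports these attributes.

```python
def parse_smart_attributes(text: str) -> dict[str, int]:
    """Map attribute name -> raw value for rows of a `smartctl -A` table."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least ten columns;
        # the tenth column is RAW_VALUE.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[fields[1]] = int(fields[9])
    return attrs


# Invented sample output. Real text could be captured with something like
#   subprocess.run(["smartctl", "-A", "/dev/hda"],
#                  capture_output=True, text=True).stdout
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       12
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       1
"""

attrs = parse_smart_attributes(SAMPLE)
print(attrs)
```

A nonzero Reallocated_Sector_Ct shows the drive has remapped before; it does
not by itself say how many spares remain.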

MfG, JBG

-- 
   Jan-Benedict Glaw       jbglaw@lug-owl.de    . +49-172-7608481
   "Eine Freie Meinung in  einem Freien Kopf    | Gegen Zensur | Gegen Krieg
    fuer einen Freien Staat voll Freier Bürger" | im Internet! |   im Irak!
   ret = do_actions((curr | FREE_SPEECH) & ~(NEW_COPYRIGHT_LAW | DRM | TCPA));

* RE: Blockbusting news, results are in
@ 2003-10-19  7:37 Mudama, Eric
  2003-10-19  8:09 ` Norman Diamond
                   ` (4 more replies)
  0 siblings, 5 replies; 52+ messages in thread
From: Mudama, Eric @ 2003-10-19  7:37 UTC (permalink / raw)
  To: 'Norman Diamond ', 'Hans Reiser ',
	'Wes Janzen ', 'Rogier Wolff ',
	'John Bradford ', 'linux-kernel@vger.kernel.org ',
	'nikita@namesys.com ', 'Pavel Machek ',
	'Justin Cormack ', 'Russell King ',
	'Vitaly Fertman ', 'Krzysztof Halasa '

> Does anyone need more?

Why don't you ask your friends at Toshiba whether that model supports
automatic reallocation, and if it does, how to enable it?

Since it isn't in the T13 ATA spec, I am assuming the ability to toggle that
feature is very vendor-specific.  Pretty sure all Maxtors from at least the
last year ship with that sort of reallocation enabled, and probably the last
4-5 years.

> We do not know if Toshiba is the only maker whose firmware
> refuses to reallocate bad blocks when permanent errors are
> detected, because the makers aren't saying.

What would you like "us disk makers" to say?  The drives I play with at work
happily reallocate on the fly all the time. (when I whack them with a
screwdriver and cause scratches on the media, that is)

> By the way, Toshiba's US subsidiary has indications on their
> web site that they provide warranty service on their products,
> but that they have reduced the warranty period from three years
> to one year.  This was a smart move by Toshiba's US subsidiary.

Yes, it saves us a lot of money every year, and lets us sell you each drive
for a few dollars cheaper.  My understanding is that the #1 cost issue is
the fact that to warranty a product legally in the USA, you need to maintain
a certain amount of product to handle replacement drives, long after they
stop being shipped.  Reducing our warranty inventory to some fraction of 1
year's volume (~55M drives) from some fraction of 3 year's volume (~160M
drives) is a significant amount of product we don't have to "eat".
(Remember, 3 year old drives, that we no longer need to hold on to for
warranty purposes, are near-worthless in the consumer market)

If every other part of your computer is warrantied for 1 year, why should
disk drives alone in the cheapest OEM systems carry 3 year warranties?  BTW,
you're welcome to buy "premium" drives with 3-year or 5-year warranties.  (3
on most vendor's high end ATA products, and 5 years on most SCSI products)
In most cases these premium warranties will only cost you $5-$10.  (This is
based simply on the rough price delta between our DiamondMax Plus9 200GB and
our MaxLine II 200GB, which are basically the same drive with different
warranties)

> If their disk drives start to develop bad blocks after two
> years, then customers don't discover how bad Toshiba's firmware
> is until two years have passed, and now they can't even make
> claims to get firmware fixed.

What do you want "fixed" in the firmware?  It is 1000x cheaper to just send
you a replacement drive from the current product line.  By the time 3 years
have passed (2 years beyond a 1 year warranty), our factory isn't even
capable of reprocessing the disk drive you hold in your hands, since we wind
up retooling chunks of it every few months to make way for
bigger/faster/quieter/cheaper disk drives.

About 2.5 years ago, Maxtor's largest drive was 60GB... 15GB/head.  Now
we're shipping 250GB drives with 6 heads also... ~42GB/head, almost triple
the capacity, and in a few months we'll be doing a chunk better.
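
The per-head figures quoted above can be checked with a few lines of
arithmetic, using only the numbers given in this message:

```python
old_per_head_gb = 15.0     # 60GB drive, stated as 15GB/head
new_capacity_gb = 250.0    # current largest drive
new_heads = 6

new_per_head_gb = new_capacity_gb / new_heads
ratio = new_per_head_gb / old_per_head_gb

print(f"{new_per_head_gb:.1f} GB/head, {ratio:.2f}x the older figure")
```

That is roughly 42GB/head and a little under a factor of three, consistent
with "almost triple".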

The only two parts in common between those two drives is the molex power
connector.

> Toshiba's head office is even smarter.  In Japanese they refuse
> entirely to provide warranty service to end users.  Customers
> have to send defective disk drives back up through the sales channel.

I guess my suggestion is don't buy Toshiba.  Research support options before
you buy.

> Well, lucky customers who bought the disk drive as part of a notebook
> computer probably get one year's warranty from the vendor of the
> notebook computer, so if they're lucky enough to learn about Toshiba's
> firmware within a year then they can send their entire computer back
> for some length of time to get warranty service.

See above.

> But anyone who went to Akihabara and bought the drive by itself from a
> parts store, the store probably offers one week or one month to
> replace a failing drive if it was dead on arrival.  In these cases
> a customer who learns about Toshiba's firmware after two weeks or five
> weeks gets screwed.

Don't buy drives from bargain basement shops.  Buy from trusted retailers,
or direct from the manufacturer.  That you bought from a place that probably
didn't even stock retail packages in shock-resistant packaging is stupid.

> But Toshiba isn't the only maker who isn't saying how bad their
> firmware is.  We need those bad block lists.  They are as
> necessary as they ever were.

We're not saying our firmware is bad because frankly, I think it is rather
decent, and getting better every single product we release.  Given that the
disk drive is probably the most complex piece of machinery in your home, I
think they do pretty well all things considered.

I still don't understand why your Toshiba engineer friends couldn't help you
beyond listening to the drive bounce off the crash stop.

(BTW, if the drive is clunking because it can't acquire at a certain
location, odds are that more than just the user data at that sector is a
problem.  Your testing doesn't indicate that, but I'd be suspicious
personally.)

--eric, speaking for myself not Maxtor of course

* Re: Blockbusting news, results are in
  2003-10-19  7:37 Mudama, Eric
@ 2003-10-19  8:09 ` Norman Diamond
  2003-10-19  8:24   ` Hans Reiser
                     ` (2 more replies)
  2003-10-19  8:13 ` Rogier Wolff
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 52+ messages in thread
From: Norman Diamond @ 2003-10-19  8:09 UTC (permalink / raw)
  To: Mudama, Eric, 'Hans Reiser ', 'Wes Janzen ',
	'Rogier Wolff ', 'John Bradford ', linux-kernel,
	nikita, 'Pavel Machek ', 'Justin Cormack ',
	'Russell King ', 'Vitaly Fertman ',
	'Krzysztof Halasa '

Eric Mudama replied to me:

> > Does anyone need more?
>
> Why don't you ask your friends at Toshiba whether that model supports
> automatic reallocation, and if it does, how to enable it?

1.  I didn't have to ask whether it does, because the S.M.A.R.T. logs
already showed that it had done so.  The problem is that it didn't do so to
the block that was involved in this issue.

2.  I did ask a different question, why that particular block wasn't getting
reallocated, and my friends answered, and this answer was already reported
in this thread a few days ago.

3.  If there were a way to enable reallocation in case of permanent errors,
I think my friends would have said.  But they sure didn't say there were any
user-settable options, they only said some approximations of how it was
designed.  It does reallocations after temporary read errors but not after
permanent read errors (where permanent means 255 failures in auto-retry).
They think it does reallocations after temporary write errors, they weren't
sure if it does reallocations after permanent write errors, now we know that
it doesn't do reallocations after permanent write errors, and this is how it
is designed, with no hint of options to toggle.

> > We do not know if Toshiba is the only maker whose firmware
> > refuses to reallocate bad blocks when permanent errors are
> > detected, because the makers aren't saying.
>
> What would you like "us disk makers" to say?

How to force reallocations even when data are lost, so that the block number
can still be accessed even though the data will be random or zeroes until it
gets written again.  How to force reallocations even when data are lost, to
prevent a different problem (i.e. if the block is not reallocated and then a
subsequent write appears to succeed, I don't really think that spot on the
platter has really reliably recovered even if you think so, I think the new
data might still get lost again in a few milliseconds or minutes).

> If every other part of your computer is warrantied for 1 year, why should
> disk drives alone in the cheapest OEM systems carry 3 year warranties?

Why does RAM carry 6 year warranties?  (Maybe some don't but this is
common.)

> BTW, you're welcome to buy "premium" drives with 3-year or 5-year
> warranties.  (3 on most vendor's high end ATA products, and 5 years on
> most SCSI products)

I haven't seen that, even on a SCSI product.

Meanwhile, regarding ATA and warranties, here's a question for you.  I
bought a Maxtor 80GB desktop hard drive at a time when it was a high end
product.  The drive came with two sets of instructions, one in Japanese and
one in English.  Which set of instructions do you think most customers read
here in Japan?  And then which set of instructions do you think was more
likely to have correct jumpering instructions?  I couldn't quite be sure
which set of jumpering instructions to believe, because even though Maxtor's
parent might be in the US (I'm not sure actually), I did buy this thing in
the Japanese market with Japanese packaging and one of the two sets of
instructions in Japanese.  So I sent e-mail to Maxtor to ask which
instructions were correct, but Maxtor didn't answer.  I phoned Maxtor, and
it turned out that the phone number was answered in Singapore, and the
person didn't answer my question but gave a different e-mail address for me
to send my question to.  So I sent e-mail to Maxtor's different e-mail
address to ask again which instructions were correct and explain everything
that had happened so far.  Maxtor still never answered.  Would you like to
know why my level of trust in Maxtor drives is as low as it has been in IBM
drives since a previous experience and has been in Toshiba drives for the
past week?  This doesn't exactly reflect drive reliability unless I guess
wrong which set of jumpering instructions to obey.  But still, suppose I
guessed wrong, then would Maxtor provide a warranty?

> In most cases these premium warranties will only cost you $5-$10.

I've still never seen them on parts that way, not for 550 yen or 1,100 yen
or any other amount.  I've occasionally seen it on entire computers, for
example Dell, or a store warranty at Bic Camera, for around 5% of the price
of the computer.

> > If their disk drives start to develop bad blocks after two
> > years, then customers don't discover how bad Toshiba's firmware
> > is until two years have passed, and now they can't even make
> > claims to get firmware fixed.
>
> What do you want "fixed" in the firmware?

Reallocate bad blocks when bad blocks are detected, even in situations when
the badness is detected as permanent.  This answer hasn't been clear yet
from this thread?????

> I still don't understand why your Toshiba engineer friends couldn't help
> you beyond listening to the drive bounce off the crash stop.

They're not sure yet if they can.  Officially of course they can't, because
of the warranty rules that have already been discussed.

> (BTW, if the drive is clunking because it can't acquire at a certain
> location, odds are that more than just the user data at that sector is a
> problem.

It didn't sound like clunking.  It sounded like repeated seeks.  It didn't
sound like 255 repeated seeks, so I'm guessing it probably does something
like try 15 retries without seeking, then seek again and try another 15
times, etc.

Meanwhile, the end result still holds.  In at least some cases, known
defective firmware is refusing to do reallocations when reallocations are
possible.  Other makers' firmware is less known.  We still need to keep
lists of bad blocks known by the OS and filesystems and drivers, and we
still need to keep those blocks away from ordinary file operations.  These
lists remain as necessary as they ever were.  Unless we get some guarantees
of good behavior by drives, if we don't make lists of bad blocks then we
will have to say that Linux and disk drives shouldn't be used together in
any computer.
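
To make the request concrete, here is a toy sketch of an allocator that keeps
a bad-block list and never hands those blocks out. It is an illustration of
the idea only, not how any Linux filesystem implements it; the class and
method names are invented.

```python
class BlockAllocator:
    """Hands out free block numbers while skipping a known bad-block list."""

    def __init__(self, total_blocks: int, bad_blocks=()):
        self.total_blocks = total_blocks
        self.bad_blocks = set(bad_blocks)
        self.used = set()

    def mark_bad(self, block: int) -> None:
        """Record a newly discovered bad block so it is never allocated again.

        A real filesystem would also have to relocate any data stored there.
        """
        self.bad_blocks.add(block)
        self.used.discard(block)

    def allocate(self) -> int:
        """Return the lowest free block that is not on the bad-block list."""
        for block in range(self.total_blocks):
            if block not in self.used and block not in self.bad_blocks:
                self.used.add(block)
                return block
        raise RuntimeError("no usable blocks left")


alloc = BlockAllocator(total_blocks=5, bad_blocks=[1])
got = [alloc.allocate() for _ in range(3)]   # block 1 is skipped

alloc.mark_bad(4)                            # the last free block goes bad
try:
    alloc.allocate()
    exhausted = False
except RuntimeError:
    exhausted = True
print(got, exhausted)
```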

* Re: Blockbusting news, results are in
  2003-10-19  8:09 ` Norman Diamond
@ 2003-10-19  8:24   ` Hans Reiser
  2003-10-19 11:43   ` Ralf Baechle
  2003-10-19 15:55   ` Krzysztof Halasa
  2 siblings, 0 replies; 52+ messages in thread
From: Hans Reiser @ 2003-10-19  8:24 UTC (permalink / raw)
  To: Norman Diamond
  Cc: Mudama, Eric, 'Wes Janzen ', 'Rogier Wolff ',
	'John Bradford ', linux-kernel, nikita,
	'Pavel Machek ', 'Justin Cormack ',
	'Russell King ', 'Vitaly Fertman ',
	'Krzysztof Halasa '

Norman Diamond wrote:

>
>>What would you like "us disk makers" to say?
>>    
>>
>
>How to force reallocations even when data are lost, 
>
buy Maxtor and write to them, thereby triggering the remap.

All of this said, let me just repeat that I concede that ReiserFS does 
need to support remapping, and Reiser4 does it.  However, I think that 
we should encourage users to ask the drive to do it for them.  Maybe 
this is wrong if it turns out that most drives are not responsible/wise 
about it, but I need more info before I can say about that.
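
For reference, "writing to the block" amounts to overwriting one sector at
offset LBA * 512, much like dd with bs=512 seek=N count=1. The sketch below
does that with os.pwrite and is demonstrated on a temporary file rather than
a real device. Whether a drive remaps in response is up to its firmware, and
on a real device this destroys the sector's contents. The function name is an
assumption for the example.

```python
import os
import tempfile

SECTOR_SIZE = 512


def overwrite_sector(path: str, lba: int, sector_size: int = SECTOR_SIZE) -> int:
    """Overwrite one sector with zeroes and flush it; return bytes written."""
    fd = os.open(path, os.O_WRONLY)
    try:
        written = os.pwrite(fd, bytes(sector_size), lba * sector_size)
        os.fsync(fd)  # push the write out of the page cache
        return written
    finally:
        os.close(fd)


# Stand-in for a device: a four-sector temporary file filled with 0xAA.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\xaa" * (4 * SECTOR_SIZE))
    image_path = tmp.name

written = overwrite_sector(image_path, lba=2)

with open(image_path, "rb") as f:
    content = f.read()
os.remove(image_path)
print(written, content[2 * SECTOR_SIZE:3 * SECTOR_SIZE] == bytes(SECTOR_SIZE))
```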

-- 
Hans

* Re: Blockbusting news, results are in
  2003-10-19  8:09 ` Norman Diamond
  2003-10-19  8:24   ` Hans Reiser
@ 2003-10-19 11:43   ` Ralf Baechle
  2003-10-19 15:55   ` Krzysztof Halasa
  2 siblings, 0 replies; 52+ messages in thread
From: Ralf Baechle @ 2003-10-19 11:43 UTC (permalink / raw)
  To: Norman Diamond
  Cc: Mudama, Eric, 'Hans Reiser ', 'Wes Janzen ',
	'Rogier Wolff ', 'John Bradford ', linux-kernel,
	nikita, 'Pavel Machek ', 'Justin Cormack ',
	'Russell King ', 'Vitaly Fertman ',
	'Krzysztof Halasa '

On Sun, Oct 19, 2003 at 05:09:36PM +0900, Norman Diamond wrote:

> How to force reallocations even when data are lost, so that the block number
> can still be accessed even though the data will be random or zeroes until it
> gets written again.  How to force reallocations even when data are lost, to
> prevent a different problem (i.e. if the block is not reallocated and then a
> subsequent write appears to succeed, I don't really think that spot on the
> platter has really reliably recovered even if you think so, I think the new
> data might still get lost again in a few milliseconds or minutes).
> 
> > If every other part of your computer is warrantied for 1 year, why should
> > disk drives alone in the cheapest OEM systems carry 3 year warranties?
> 
> Why does RAM carry 6 year warranties?  (Maybe some don't but this is
> common.)

The distribution of RAM failures over time is different.  Most RAM failures
tend to occur in the first few days or even hours.  After that the rate drops
to a very low value for the next few years.  In other words a long warranty
period won't cost the manufacturer a lot, nor benefit customers much.  But it
looks good in advertisements ...

  Ralf

* Re: Blockbusting news, results are in
  2003-10-19  8:09 ` Norman Diamond
  2003-10-19  8:24   ` Hans Reiser
  2003-10-19 11:43   ` Ralf Baechle
@ 2003-10-19 15:55   ` Krzysztof Halasa
  2 siblings, 0 replies; 52+ messages in thread
From: Krzysztof Halasa @ 2003-10-19 15:55 UTC (permalink / raw)
  To: Norman Diamond
  Cc: Mudama, Eric, 'Hans Reiser ', 'Wes Janzen ',
	'Rogier Wolff ', 'John Bradford ', linux-kernel,
	nikita, 'Pavel Machek ', 'Justin Cormack ',
	'Russell King ', 'Vitaly Fertman '

"Norman Diamond" <ndiamond@wta.att.ne.jp> writes:

> 3.  If there were a way to enable reallocation in case of permanent errors,
> I think my friends would have said.  But they sure didn't say there were any
> user-settable options, they only said some approximations of how it was
> designed.

Other drives remap on write by default.

>  It does reallocations after temporary read errors but not after
> permanent read errors (where permanent means 255 failures in auto-retry).

Good so far.

> They think it does reallocations after temporary write errors, they weren't
> sure if it does reallocations after permanent write errors, now we know that
> it doesn't do reallocations after permanent write errors, and this is how it
> is designed, with no hint of options to toggle.

There isn't (shouldn't be) such a thing as "temporary" or "permanent"
write error. A write error should cause a sector to be remapped (the
question about how many times the drive should try to write a single
sector is irrelevant here). The drive should not return write error
unless all spare sectors are already in use (which means the drive
is approaching death and should be replaced immediately).
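
The policy described above can be modelled in a few lines. This is a toy
simulation of the expected behaviour, not any vendor's firmware; the class and
its fields are invented for the illustration.

```python
class SimulatedDrive:
    """Toy model: a failed write is remapped until the spares run out."""

    def __init__(self, spare_sectors: int, defective=()):
        self.spares_left = spare_sectors
        self.defective = set(defective)  # sectors whose original location is bad
        self.remapped = {}               # logical sector -> spare index
        self.data = {}

    def write(self, sector: int, payload: bytes) -> None:
        if sector in self.defective and sector not in self.remapped:
            if self.spares_left == 0:
                # Only now does the host see a write error.
                raise OSError("write error: no spare sectors left")
            self.remapped[sector] = len(self.remapped)
            self.spares_left -= 1
        self.data[sector] = payload


drive = SimulatedDrive(spare_sectors=1, defective=[10, 20])
drive.write(10, b"x")          # bad sector, silently remapped to the one spare

try:
    drive.write(20, b"y")      # second bad sector, spares exhausted
    failed = False
except OSError:
    failed = True
print(drive.remapped, drive.spares_left, failed)
```

Under this model a write error reaching the host means exactly one thing:
the spare pool is empty and the drive should be replaced.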

All drives I currently use do just that (though I don't currently use
Toshiba drives).

I would rather ask Toshiba if it's a bug in their firmware and if they
have fixed it. Or buy another brand.

> Why does RAM carry 6 year warranties?  (Maybe some don't but this is
> common.)

It doesn't have moving parts.

> > BTW, you're welcome to buy "premium" drives with 3-year or 5-year
> > warranties.  (3 on most vendor's high end ATA products, and 5 years on
> > most SCSI products)

We have that here (ATA, and I think all SCSI drives have 5 years).
Not sure about exact price increase, though.
-- 
Krzysztof Halasa, B*FH

* Re: Blockbusting news, results are in
  2003-10-19  7:37 Mudama, Eric
  2003-10-19  8:09 ` Norman Diamond
@ 2003-10-19  8:13 ` Rogier Wolff
  2003-10-19  8:17 ` Hans Reiser
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 52+ messages in thread
From: Rogier Wolff @ 2003-10-19  8:13 UTC (permalink / raw)
  To: Mudama, Eric
  Cc: 'Norman Diamond ', 'Hans Reiser ',
	'Wes Janzen ', 'Rogier Wolff ',
	'John Bradford ', 'linux-kernel@vger.kernel.org ',
	'nikita@namesys.com ', 'Pavel Machek ',
	'Justin Cormack ', 'Russell King ',
	'Vitaly Fertman ', 'Krzysztof Halasa '

On Sun, Oct 19, 2003 at 01:37:03AM -0600, Mudama, Eric wrote:
> About 2.5 years ago, Maxtor's largest drive was 60GB... 15GB/head.  Now
> we're shipping 250GB drives with 6 heads also... ~42GB/head, almost triple

Know your maxtor drives: Maxtor has been shipping 4-platter, 8 head
drives for quite a long time. Only recently am I starting to see the
largest maxtor-drive from a family having the space to carry 4 
platters, but none of the expected capacity are shipping (*).... Care 
to explain?

Eric, do you know why maxtor stopped putting the number of heads 
in the model number? (It's the last number in the model number, just
after the letter. Currently all drives set this to "0"). It was quite 
convenient for us to know what to expect from a 92720U8, 98196H8, 
96147H8 and 4G160J8. (Hmmm apparently, we're mostly buying the 
"largest of the family" drives: they all have 8 heads! I just
looked at the models in some of our computers.)

			Roger. 

(e.g. the 250G model seems to be a 6-head disk, and not 8-head). 

-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
**** "Linux is like a wigwam -  no windows, no gates, apache inside!" ****

* Re: Blockbusting news, results are in
  2003-10-19  7:37 Mudama, Eric
  2003-10-19  8:09 ` Norman Diamond
  2003-10-19  8:13 ` Rogier Wolff
@ 2003-10-19  8:17 ` Hans Reiser
  2003-10-19  8:41   ` Rogier Wolff
  2003-10-19  8:21 ` Andre Hedrick
  2003-10-19 10:47 ` Ingo Oeser
  4 siblings, 1 reply; 52+ messages in thread
From: Hans Reiser @ 2003-10-19  8:17 UTC (permalink / raw)
  To: Mudama, Eric
  Cc: 'Norman Diamond ', 'Wes Janzen ',
	'Rogier Wolff ', 'John Bradford ',
	'linux-kernel@vger.kernel.org ',
	'nikita@namesys.com ', 'Pavel Machek ',
	'Justin Cormack ', 'Russell King ',
	'Vitaly Fertman ', 'Krzysztof Halasa '

Eric, is it true what we tell users, that if a drive can't remap a bad 
block it has probably used up all its spares, and that in turn means 
that it is wise to buy a new one because the chance of experiencing 
additional data corruption on a drive that has used up all its spares is 
much higher than the average drive?

What are the common sources of data corruption, is one of them that the 
drive head starts bumping the media more and more often because a 
bearing (or something) has started to show signs of wear?

-- 
Hans



* Re: Blockbusting news, results are in
  2003-10-19  8:17 ` Hans Reiser
@ 2003-10-19  8:41   ` Rogier Wolff
  2003-10-20 15:56     ` Thayne Harbaugh
  0 siblings, 1 reply; 52+ messages in thread
From: Rogier Wolff @ 2003-10-19  8:41 UTC (permalink / raw)
  To: Hans Reiser
  Cc: Mudama, Eric, 'Norman Diamond ', 'Wes Janzen ',
	'Rogier Wolff ', 'John Bradford ',
	'linux-kernel@vger.kernel.org ',
	'nikita@namesys.com ', 'Pavel Machek ',
	'Justin Cormack ', 'Russell King ',
	'Vitaly Fertman ', 'Krzysztof Halasa '

On Sun, Oct 19, 2003 at 12:17:16PM +0400, Hans Reiser wrote:
> What are the common sources of data corruption, is one of them that the 
> drive head starts bumping the media more and more often because a 
> bearing (or something) has started to show signs of wear?

I'm not sure if the manufacturer knows. Data recovery companies
know. 

Sources of data loss are: 

	- Software
	- crooked platters (especially on laptop drives)
	- heads bouncing on platter
	- broken electronics. 

They more or less happen in about the same number of cases. 

The fact that we see fewer "high end" disks doesn't mean they break
down less. It might mean that fewer of them get sold (true), or that the 
people that buy them make better backups (probably also true). 

	Roger. 

-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
**** "Linux is like a wigwam -  no windows, no gates, apache inside!" ****

* Re: Blockbusting news, results are in
  2003-10-19  8:41   ` Rogier Wolff
@ 2003-10-20 15:56     ` Thayne Harbaugh
  0 siblings, 0 replies; 52+ messages in thread
From: Thayne Harbaugh @ 2003-10-20 15:56 UTC (permalink / raw)
  To: Rogier Wolff
  Cc: Hans Reiser, Mudama, Eric, 'Norman Diamond ',
	'Wes Janzen ', 'John Bradford ',
	'linux-kernel@vger.kernel.org ',
	'nikita@namesys.com ', 'Pavel Machek ',
	'Justin Cormack ', 'Russell King ',
	'Vitaly Fertman ', 'Krzysztof Halasa '

On Sun, 2003-10-19 at 02:41, Rogier Wolff wrote:
> On Sun, Oct 19, 2003 at 12:17:16PM +0400, Hans Reiser wrote:
> > What are the common sources of data corruption, is one of them that the 
> > drive head starts bumping the media more and more often because a 
> > bearing (or something) has started to show signs of wear?
> 
> I'm not sure if the manufacturer knows. Data recovery companies
> know. 
> 
> Sources of data loss are: 
> 
> 	- Software
> 	- crooked platters (especially on laptop drives)
> 	- heads bouncing on platter
> 	- broken electronics. 

I experienced a fun problem where high-CFM fans in a 1u chassis caused
extreme vibration.  This vibration caused errors in the drive and
resulted in large numbers of failures using badblocks as a testing
tool.  When the vibration was removed the failures disappeared.  Another
symptom of the vibration was _very_ low disk performance (<10% of normal).

I couldn't get a straight answer from the drive manufacturer about how
the reallocation table worked.  The short answer was that they were the
only ones that could reset the reallocation table.

-- 
Thayne Harbaugh
Linux Networx

* RE: Blockbusting news, results are in
  2003-10-19  7:37 Mudama, Eric
                   ` (2 preceding siblings ...)
  2003-10-19  8:17 ` Hans Reiser
@ 2003-10-19  8:21 ` Andre Hedrick
  2003-10-19  8:27   ` Hans Reiser
  2003-10-19 10:47 ` Ingo Oeser
  4 siblings, 1 reply; 52+ messages in thread
From: Andre Hedrick @ 2003-10-19  8:21 UTC (permalink / raw)
  To: Mudama, Eric
  Cc: 'Norman Diamond ', 'Hans Reiser ',
	'Wes Janzen ', 'Rogier Wolff ',
	'John Bradford ', 'linux-kernel@vger.kernel.org ',
	'nikita@namesys.com ', 'Pavel Machek ',
	'Justin Cormack ', 'Russell King ',
	'Vitaly Fertman ', 'Krzysztof Halasa '


Eric,

That is what DCO is and we both know what that issue will bring to the
table.

Andre Hedrick
LAD Storage Consulting Group

On Sun, 19 Oct 2003, Mudama, Eric wrote:

> 
> > Does anyone need more?
> 
> Why don't you ask your friends at Toshiba whether that model supports
> automatic reallocation, and if it does, how to enable it?
> 
> Since it isn't in the T13 ATA spec, I am assuming the ability to toggle that
> feature is very vendor-specific.  Pretty sure all Maxtors from at least the
> last year ship with that sort of reallocation enabled, and probably the last
> 4-5 years.
> 
> > We do not know if Toshiba is the only maker whose firmware
> > refuses to reallocate bad blocks when permanent errors are
> > detected, because the makers aren't saying.
> 
> What would you like "us disk makers" to say?  The drives I play with at work
> happily reallocate on the fly all the time. (when I whack them with a
> screwdriver and cause scratches on the media, that is)
> 
> > By the way, Toshiba's US subsidiary has indications on their
> > web site that they provide warranty service on their products,
> > but that they have reduced the warranty period from three years
> > to one year.  This was a smart move by Toshiba's US subsidiary.
> 
> Yes, it saves us a lot of money every year, and lets us sell you each drive
> for a few dollars cheaper.  My understanding is that the #1 cost issue is
> the fact that to warranty a product legally in the USA, you need to maintain
> a certain amount of product to handle replacement drives, long after they
> stop being shipped.  Reducing our warranty inventory to some fraction of 1
> year's volume (~55M drives) from some fraction of 3 year's volume (~160M
> drives) is a significant amount of product we don't have to "eat".
> (Remember, 3 year old drives, that we no longer need to hold on to for
> warranty purposes, are near-worthless in the consumer market)
> 
> If every other part of your computer is warrantied for 1 year, why should
> disk drives alone in the cheapest OEM systems carry 3 year warranties?  BTW,
> you're welcome to buy "premium" drives with 3-year or 5-year warranties.  (3
> on most vendor's high end ATA products, and 5 years on most SCSI products)
> In most cases these premium warranties will only cost you $5-$10.  (This is
> based simply on the rough price delta between our DiamondMax Plus9 200GB and
> our MaxLine II 200GB, which are basically the same drive with different
> warranties)
> 
> > If their disk drives start to develop bad blocks after two
> > years, then customers don't discover how bad Toshiba's firmware
> > is until two years have passed, and now they can't even make
> > claims to get firmware fixed.
> 
> What do you want "fixed" in the firmware?  It is 1000x cheaper to just send
> you a replacement drive from the current product line.  By the time 3 years
> have passed (2 years beyond a 1 year warranty), our factory isn't even
> capable of reprocessing the disk drive you hold in your hands, since we wind
> up retooling chunks of it every few months to make way for
> bigger/faster/quieter/cheaper disk drives.
> 
> About 2.5 years ago, Maxtor's largest drive was 60GB... 15GB/head.  Now
> we're shipping 250GB drives with 6 heads also... ~42GB/head, almost triple
> the capacity, and in a few months we'll be doing a chunk better.
> 
> The only two parts in common between those two drives is the molex power
> connector.
> 
> > Toshiba's head office is even smarter.  In Japan they refuse
> > entirely to provide warranty service to end users.  Customers
> > have to send defective disk drives back up through the sales channel.
> 
> I guess my suggestion is don't buy Toshiba.  Research support options before
> you buy.
> 
> > Well, lucky customers who bought the disk drive as part of a notebook
> > computer probably get one year's warranty from the vendor of the
> > notebook computer, so if they're lucky enough to learn about Toshiba's
> > firmware within a year then they can send their entire computer back
> > for some length of time to get warranty service.
> 
> See above.
> 
> > But anyone who went to Akihabara and bought the drive by itself from a
> > parts store, the store probably offers one week or one month to
> > replace a failing drive if it was dead on arrival.  In these cases
> > a customer who learns about Toshiba's firmware after two weeks or five
> > weeks gets screwed.
> 
> Don't buy drives from bargain basement shops.  Buy from trusted retailers,
> or direct from the manufacturer.  That you bought from a place that probably
> didn't even stock retail packages in shock-resistant packaging is stupid.
> 
> > But Toshiba isn't the only maker who isn't saying how bad their
> > firmware is.  We need those bad block lists.  They are as
> > necessary as they ever were.
> 
> We're not saying our firmware is bad because frankly, I think it is rather
> decent, and getting better with every single product we release.  Given that the
> disk drive is probably the most complex piece of machinery in your home, I
> think they do pretty well all things considered.
> 
> I still don't understand why your Toshiba engineer friends couldn't help you
> beyond listening to the drive bounce off the crash stop.
> 
> (BTW, if the drive is clunking because it can't acquire at a certain
> location, odds are that more than just the user data at that sector is a
> problem.  Your testing doesn't indicate that, but I'd be suspicious
> personally.)
> 
> --eric, speaking for myself not Maxtor of course
> 
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: Blockbusting news, results are in
  2003-10-19  8:21 ` Andre Hedrick
@ 2003-10-19  8:27   ` Hans Reiser
  2003-10-19  9:01     ` Erik Andersen
  0 siblings, 1 reply; 52+ messages in thread
From: Hans Reiser @ 2003-10-19  8:27 UTC (permalink / raw)
  To: Andre Hedrick
  Cc: Mudama, Eric, 'Norman Diamond ', 'Wes Janzen ',
	'Rogier Wolff ', 'John Bradford ',
	'linux-kernel@vger.kernel.org ',
	'nikita@namesys.com ', 'Pavel Machek ',
	'Justin Cormack ', 'Russell King ',
	'Vitaly Fertman ', 'Krzysztof Halasa '

What is DCO oh cryptic industry insider.;-)

Hans

Andre Hedrick wrote:

>Eric,
>
>That is what DCO is and we both know what that issue will bring to the
>table.
>
>Andre Hedrick
>LAD Storage Consulting Group


-- 
Hans



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: Blockbusting news, results are in
  2003-10-19  8:27   ` Hans Reiser
@ 2003-10-19  9:01     ` Erik Andersen
  2003-10-19 14:10       ` Andre Hedrick
  2003-10-19 14:42       ` Valdis.Kletnieks
  0 siblings, 2 replies; 52+ messages in thread
From: Erik Andersen @ 2003-10-19  9:01 UTC (permalink / raw)
  To: Hans Reiser
  Cc: Andre Hedrick, Mudama, Eric, 'Norman Diamond ',
	'Wes Janzen ', 'Rogier Wolff ',
	'John Bradford ', 'linux-kernel@vger.kernel.org ',
	'nikita@namesys.com ', 'Pavel Machek ',
	'Justin Cormack ', 'Russell King ',
	'Vitaly Fertman ', 'Krzysztof Halasa '

On Sun Oct 19, 2003 at 12:27:30PM +0400, Hans Reiser wrote:
> What is DCO oh cryptic industry insider.;-)

See "6.21 Device Configuration Overlay feature set" in 
the ATA6 spec...

"The optional Device Configuration Overlay feature set allows a
utility program to modify some of the optional commands, modes,
and feature sets that a device reports as supported in the
IDENTIFY DEVICE or IDENTIFY PACKET DEVICE command response as
well as the capacity reported."

 -Erik

--
Erik B. Andersen             http://codepoet-consulting.com/
--This message was written using 73% post-consumer electrons--

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: Blockbusting news, results are in
  2003-10-19  9:01     ` Erik Andersen
@ 2003-10-19 14:10       ` Andre Hedrick
  2003-10-19 18:16         ` Hans Reiser
  2003-10-19 14:42       ` Valdis.Kletnieks
  1 sibling, 1 reply; 52+ messages in thread
From: Andre Hedrick @ 2003-10-19 14:10 UTC (permalink / raw)
  To: Erik Andersen
  Cc: Hans Reiser, Mudama, Eric, 'Norman Diamond ',
	'Wes Janzen ', 'Rogier Wolff ',
	'John Bradford ', 'linux-kernel@vger.kernel.org ',
	'nikita@namesys.com ', 'Pavel Machek ',
	'Justin Cormack ', 'Russell King ',
	'Vitaly Fertman ', 'Krzysztof Halasa '


Erik,

Nice quoting of the "public" version, now what did I really mean?
Obviously you, Eric, and the rest of the mailing list were not in the
smoke filled back room when Compaq and Dell were formalizing the proposal.

Cheers,

Andre Hedrick
LAD Storage Consulting Group



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: Blockbusting news, results are in
  2003-10-19 14:10       ` Andre Hedrick
@ 2003-10-19 18:16         ` Hans Reiser
  2003-10-19 19:44           ` Andre Hedrick
  0 siblings, 1 reply; 52+ messages in thread
From: Hans Reiser @ 2003-10-19 18:16 UTC (permalink / raw)
  To: Andre Hedrick
  Cc: Erik Andersen, Mudama, Eric, 'Norman Diamond ',
	'Wes Janzen ', 'Rogier Wolff ',
	'John Bradford ', 'linux-kernel@vger.kernel.org ',
	'nikita@namesys.com ', 'Pavel Machek ',
	'Justin Cormack ', 'Russell King ',
	'Vitaly Fertman ', 'Krzysztof Halasa '

Any time you want to just spell it out in plain English for the rest of 
the lkml, please do so.....

Hans



-- 
Hans



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: Blockbusting news, results are in
  2003-10-19 18:16         ` Hans Reiser
@ 2003-10-19 19:44           ` Andre Hedrick
  2003-10-20  7:21             ` Hans Reiser
  0 siblings, 1 reply; 52+ messages in thread
From: Andre Hedrick @ 2003-10-19 19:44 UTC (permalink / raw)
  To: Hans Reiser
  Cc: Erik Andersen, Mudama, Eric, 'Norman Diamond ',
	'Wes Janzen ', 'Rogier Wolff ',
	'John Bradford ', 'linux-kernel@vger.kernel.org ',
	'nikita@namesys.com ', 'Pavel Machek ',
	'Justin Cormack ', 'Russell King ',
	'Vitaly Fertman ', 'Krzysztof Halasa '

Okay Hans,

First, if people will bother to remember that everything about storage is a
"LIE", then the rest of the issue will be simple.

The private version of DCO was jokingly referred to as "dollars for
gigabytes".

It was designed for the idiot and moron system administrator who only knows
how to apply a snapshot image to a drive.  These are the special class of
SA's who, because the system was purchased w/ a 20MB drive, could only
install another 20MB drive.

The reality, as Eric (Maxtor) pointed out, is that drive companies can not
keep drive lifetimes that long.  However, the cheap path the drive industry
took was to cheat the warranty.  Now with DCO (vendor version), they could
provide any drive at whatever "de-stroked" capacity they want.  Thus recall
the "STROKE" option that I added, which may since have been removed.

Now the private version can alter the entire IDENTIFY page.

So a 200GB drive can be made to look and report as that 20GB drive.
This includes faking (the big lie) the model, capacity, features,
revision, and firmware.  It also includes listing such "LIES" on company
web sites as valid products.

Now where did the joke come from?  Suppose someone like me were to announce
that they have the means to detect such a DCO event (which I can not).  That
person could sell, at a price, a tool to make disks magically grow in
capacity.

So depending on how much you were willing to pay and how much the
individual wants to charge, they could gradually expand your drive to full
capacity (native) and charge you for each step.

So now that the story of the DCO is out, have fun finding it.  Also do
thank the lamers and brain dead SA's who could not live with a simple HPA
for forcing the creation of yet another storage "LIE".
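
One rough way to picture the layering described above, as a sketch only: the
capacity a host sees is the smallest of the drive's native size and whatever
DCO and HPA limits are in force.  The Drive class, its field names, and the
512-byte sector size are illustrative assumptions, not anything taken from
the ATA spec text or from a vendor tool.

```python
from dataclasses import dataclass
from typing import Optional

SECTOR_BYTES = 512  # assumption: classic 512-byte sectors


def gb_to_sectors(gigabytes: int) -> int:
    """Convert decimal gigabytes (10**9 bytes) to whole sectors."""
    return gigabytes * 10**9 // SECTOR_BYTES


@dataclass
class Drive:
    """Hypothetical model of a drive with optional capacity caps."""
    native_sectors: int                # what the media can actually hold
    dco_sectors: Optional[int] = None  # cap applied through DCO, if any
    hpa_sectors: Optional[int] = None  # cap applied through HPA, if any

    def reported_sectors(self) -> int:
        """Capacity the host is told about: the smallest active limit."""
        limits = [self.native_sectors]
        if self.dco_sectors is not None:
            limits.append(self.dco_sectors)
        if self.hpa_sectors is not None:
            limits.append(self.hpa_sectors)
        return min(limits)

    def hidden_sectors(self) -> int:
        """Sectors that exist on the media but are not reported."""
        return self.native_sectors - self.reported_sectors()


# The 200GB drive made to report itself as a 20GB drive.
drive = Drive(native_sectors=gb_to_sectors(200), dco_sectors=gb_to_sectors(20))
print("reported sectors:", drive.reported_sectors())
print("hidden sectors:  ", drive.hidden_sectors())
```

In this model the "dollars for gigabytes" joke amounts to raising dco_sectors
one step at a time until it reaches native_sectors.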

Cheers,

Andre Hedrick
LAD Storage Consulting Group


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: Blockbusting news, results are in
  2003-10-19 19:44           ` Andre Hedrick
@ 2003-10-20  7:21             ` Hans Reiser
  0 siblings, 0 replies; 52+ messages in thread
From: Hans Reiser @ 2003-10-20  7:21 UTC (permalink / raw)
  To: Andre Hedrick
  Cc: Erik Andersen, Mudama, Eric, 'Norman Diamond ',
	'Wes Janzen ', 'Rogier Wolff ',
	'John Bradford ', 'linux-kernel@vger.kernel.org ',
	'nikita@namesys.com ', 'Pavel Machek ',
	'Justin Cormack ', 'Russell King ',
	'Vitaly Fertman ', 'Krzysztof Halasa '

Thanks for the explanation.

-- 
Hans



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: Blockbusting news, results are in
  2003-10-19  9:01     ` Erik Andersen
  2003-10-19 14:10       ` Andre Hedrick
@ 2003-10-19 14:42       ` Valdis.Kletnieks
  1 sibling, 0 replies; 52+ messages in thread
From: Valdis.Kletnieks @ 2003-10-19 14:42 UTC (permalink / raw)
  To: andersen; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 601 bytes --]

On Sun, 19 Oct 2003 03:01:50 MDT, Erik Andersen said:

> See "6.21 Device Configuration Overlay feature set" in 
> the ATA6 spec...
> 
> "The optional Device Configuration Overlay feature set allows a
> utility program to modify some of the optional commands, modes,
> and feature sets that a device reports as supported in the
> IDENTIFY DEVICE or IDENTIFY PACKET DEVICE command response as
> well as the capacity reported."

I'm going to go out on a limb again and guess that knowing how to do the
modifications requires access to pieces of dead trees only available after an
NDA has been signed?



[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: Blockbusting news, results are in
  2003-10-19  7:37 Mudama, Eric
                   ` (3 preceding siblings ...)
  2003-10-19  8:21 ` Andre Hedrick
@ 2003-10-19 10:47 ` Ingo Oeser
  4 siblings, 0 replies; 52+ messages in thread
From: Ingo Oeser @ 2003-10-19 10:47 UTC (permalink / raw)
  To: Mudama, Eric; +Cc: linux-kernel

Hi Eric,
hi lkml,

On Sunday 19 October 2003 09:37, Mudama, Eric wrote:
> Yes, it saves us a lot of money every year, and lets us sell you each drive
> for a few dollars cheaper.  My understanding is that the #1 cost issue is
> the fact that to warranty a product legally in the USA, you need to
> maintain a certain amount of product to handle replacement drives, long
> after they stop being shipped.  Reducing our warranty inventory to some
> fraction of 1 year's volume (~55M drives) from some fraction of 3 year's
> volume (~160M drives) is a significant amount of product we don't have to
> "eat". (Remember, 3 year old drives, that we no longer need to hold on to
> for warranty purposes, are near-worthless in the consumer market)

This is solved very easily, and Seagate has done it before for long-warranty
devices: don't hold the devices, and promise the customer the same or a
newer/better device under warranty once the product is no longer produced.
That will make both your revenues and your customers happier, as long as
product quality isn't decreasing over drive generations.

So reducing the default warranty is no good customer service in my
opinion. And in Germany you have 2 years warranty by law, which
resembles the life cycle of a PC quite nicely.

Regards

Ingo Oeser

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: Blockbusting news, results are in
@ 2003-10-19 17:36 Mudama, Eric
  0 siblings, 0 replies; 52+ messages in thread
From: Mudama, Eric @ 2003-10-19 17:36 UTC (permalink / raw)
  To: 'Hans Reiser '
  Cc: ''Norman Diamond ' ',
	''Wes Janzen ' ',
	''Rogier Wolff ' ',
	''John Bradford ' ',
	''linux-kernel@vger.kernel.org ' ',
	''nikita@namesys.com ' ',
	''Pavel Machek ' ',
	''Justin Cormack ' ',
	''Russell King ' ',
	''Vitaly Fertman ' ',
	''Krzysztof Halasa ' '

 
> Eric, is it true what we tell users, that if a drive can't remap
> a bad block it has probably used up all its spares, and that in
> turn means that it is wise to buy a new one because the chance of
> experiencing additional data corruption on a drive that has used
> up all its spares is much higher than the average drive?

Not sure about other vendors, but a fatal write on a maxtor means we
couldn't do your write after exhausting all attempts at reallocation,
recertification, etc.  If you ever get this on a drive, either:

1) the drive is unable to reallocate any more blocks because it has run out
of spares

or

2) the drive was attempting those writes under environmental conditions that
it was unable to handle. (extreme shock&vibe, <5C, >55C, etc)

> What are the common sources of data corruption, is one of them
> that the drive head starts bumping the media more and more often
> because a bearing (or something) has started to show signs of wear?

From my understanding, most returns are due to damaged heads (some small
percent burn up over time) or operational shock "head/media events" where
someone bumped a running drive and the head dug a crater in the media.  Any
particulate contamination can be struck by the heads causing high-fly write
events.  (head bounces up off the media in the middle of a write).  I
haven't heard of bearing wear being a common issue... all drives these days
use fluid bearings.  Early fluid bearings had outgassing issues at high
temperature, but I think those problems were solved by manufacturers long
before the first drives using them hit store shelves.

All in all, they're rather delicate.  I'm amazed they work at all too.

--eric

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: Blockbusting news, results are in
@ 2003-10-19 17:39 Mudama, Eric
  0 siblings, 0 replies; 52+ messages in thread
From: Mudama, Eric @ 2003-10-19 17:39 UTC (permalink / raw)
  To: 'Ingo Oeser '; +Cc: 'linux-kernel@vger.kernel.org '

 

-----Original Message-----
From: Ingo Oeser
> So reducing the default warranty is no good customer service in my
> opinion. And in Germany you have 2 years warranty by law, which
> resembles the life cycle of a PC quite nicely.

It wasn't my idea, I'm just an engineer.

Maxtor is a big enough company, I would guess that their warranty law
compliance matches every country in which the drives are warrantied.

--eric



^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: Blockbusting news, results are in
@ 2003-10-19 17:51 Mudama, Eric
  2003-10-20  6:22 ` Rogier Wolff
  0 siblings, 1 reply; 52+ messages in thread
From: Mudama, Eric @ 2003-10-19 17:51 UTC (permalink / raw)
  To: 'Rogier Wolff '
  Cc: ''Norman Diamond ' ',
	''Hans Reiser ' ',
	''Wes Janzen ' ',
	''John Bradford ' ',
	''linux-kernel@vger.kernel.org ' ',
	''nikita@namesys.com ' ',
	''Pavel Machek ' ',
	''Justin Cormack ' ',
	''Russell King ' ',
	''Vitaly Fertman ' ',
	''Krzysztof Halasa ' '

-----Original Message-----
From: Rogier Wolff

> Know your maxtor drives: Maxtor has been shipping 4-platter,
> 8 head drives for quite a long time. Only recently am I
> starting to see the largest maxtor-drive from a family having
> the space to carry 4 platters, but none of the expected
> capacity are shipping (*).... Care to explain?

I do know our product line.  All of our current 4-platter products are 5400
RPM, and have been for 4 years.  People aren't interested in 5400RPM drives
anymore, and the design tolerances on a 4-platter 7200RPM drive are tight
enough that it becomes extremely difficult to manufacture.

The volume on our only current 4-platter drive is quite small, compared to
the rest of our products.  Most of the industry is getting away from the
HUGE drives and saying that 7200RPM "very big" is more important than the
extra 50GB of capacity.  (300 vs 250)

> Eric, do you know why maxtor stopped putting the number of heads
> in the model number? (It's the last number in the model number,
> just after the letter. Currently all drives set this to "0"). It
> was quite convenient for us to know what to expect from a 92720U8,
> 98196H8, 96147H8 and 4G160J8. (Hmmm apparently, we're mostly
> buying the "largest of the family" drives: they all have 8
> heads! I just looked at the models in some of our computers.)

You're buying 5400RPM products not 7200RPM.  Your 4G160J8 drive was
manufactured over 2 years ago, it isn't a current product.

I'd guess that the reason we don't put the head number on the drive is to
not confuse OEM databases.  Our drives basically tell us at the end of
manufacturing how big they were able to become, regardless of head count.
To make it easier, our model number is now a capacity instead of a head
count.

It prevents Dell from saying "this model number comes in 4 sizes, we want
different part numbers for each capacity too!" so now we only give them the
capacity.

If you're looking for the densest drives our factory produces (which have,
by definition, the best sequential I/O performance), buy only the model
number whose capacity is at the peak of a generation (e.g. a 250GB drive
can't be made with 30GB heads, while a 200GB drive can).
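
The arithmetic behind that example can be sketched as follows.  This is a
rough illustration only: the 8-head ceiling is an assumption (the 4-platter,
8-head case mentioned elsewhere in this thread), and heads_needed is a
made-up helper name.

```python
import math

MAX_HEADS = 8  # assumption: at most 4 platters with 2 heads each


def heads_needed(capacity_gb: float, gb_per_head: float) -> int:
    """Smallest whole number of heads that reaches the target capacity."""
    return math.ceil(capacity_gb / gb_per_head)


for capacity in (200, 250):
    n = heads_needed(capacity, gb_per_head=30)
    verdict = "fits" if n <= MAX_HEADS else "does not fit"
    print(f"{capacity}GB at 30GB/head needs {n} heads: {verdict}")
```

Under that assumption a 200GB model can still be built from 30GB heads, so
only the capacity at the top of the range guarantees the densest heads.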

I think there are other ways to figure out how many heads are physically in
a drive, but I don't want to spoil it and take all the fun away.

--eric

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: Blockbusting news, results are in
  2003-10-19 17:51 Mudama, Eric
@ 2003-10-20  6:22 ` Rogier Wolff
  0 siblings, 0 replies; 52+ messages in thread
From: Rogier Wolff @ 2003-10-20  6:22 UTC (permalink / raw)
  To: Mudama, Eric
  Cc: 'Rogier Wolff ', ''Norman Diamond ' ',
	''Hans Reiser ' ',
	''Wes Janzen ' ',
	''John Bradford ' ',
	''linux-kernel@vger.kernel.org ' ',
	''nikita@namesys.com ' ',
	''Pavel Machek ' ',
	''Justin Cormack ' ',
	''Russell King ' ',
	''Vitaly Fertman ' ',
	''Krzysztof Halasa ' '

On Sun, Oct 19, 2003 at 11:51:05AM -0600, Mudama, Eric wrote:
>  
> 
> -----Original Message-----
> From: Rogier Wolff
> 
> > Know your maxtor drives: Maxtor has been shipping 4-platter,
> > 8 head drives for quite a long time. Only recently am I
> > starting to see the largest maxtor-drive from a family having
> > the space to carry 4 platters, but none of the expected
> > capacity are shipping (*).... Care to explain?
> 
> I do know our product line.  All of our current 4-platter products are 5400
> RPM, and have been for 4 years.  People aren't interested in 5400RPM drives
> anymore, and the design tolerances on a 4-platter 7200RPM drive are tight
> enough that it becomes extremely difficult to manufacture.

OK. Thanks for the explanation. We'd (apparently) buy them, as
evidenced by the disks I found on a random search for maxtor
drives in my company. (about half are maxtor).

The 160G 5400 RPM drives will do 36 Mbyte per second; the 7200 RPM versions
might do 50 Mbyte per second. The difference is unimportant. 
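
A back-of-the-envelope check on those figures, assuming the same data per
track at both spindle speeds, so that sequential throughput scales linearly
with RPM.  That scaling is a simplifying assumption, and scaled_rate is a
made-up helper name.

```python
def scaled_rate(rate_mb_s: float, rpm_from: int, rpm_to: int) -> float:
    """Scale a sequential transfer rate linearly with spindle speed."""
    return rate_mb_s * rpm_to / rpm_from


estimate = scaled_rate(36, rpm_from=5400, rpm_to=7200)
print(f"estimated 7200 RPM rate: {estimate:.0f} Mbyte/s")
```

The estimate lands close to the 50 Mbyte per second guessed above.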

> It prevents Dell from saying "this model number comes in 4 sizes, we want
> different part numbers for each capacity too!" so now we only give them the
> capacity.

We're getting annoyed at WD because they are selling WD800 drives 
(80G) with 2, 4, 6 and 8 heads(*). So when we order a replacement
WD800 for spare parts for a broken one, we might end up with a
different generation drive which is useless for the "part exchange"
project....

(*) they probably don't sell the full complement... yet.

> If you're looking for the densest drives our factory produces (which have,
> by definition, the best sequential I/O performance), you can  buy only the

You're assuming that a head-switch is faster than a track-to-track seek.
Apparently that is no longer true. We've seen drives that "scan" a whole
platter before switching heads. We've seen drives that do this on a 
per-region basis. 

> model number of a capacity that is at the peak (e.g. a 250GB drive can't be
> made with a 30GB head, while a 200GB can) of a generation.
> 
> I think there are other ways to figure out how many heads are physically in
> a drive, but I don't want to spoil it and take all the fun away.

How about: "Opening it up and having a peek?" :-) That certainly
works. But most vendors don't let me do that before I buy.  :-)

		Roger. 

-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
**** "Linux is like a wigwam -  no windows, no gates, apache inside!" ****

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: Blockbusting news, results are in
@ 2003-10-20 14:08 Mudama, Eric
  2003-10-20 14:42 ` John Bradford
  0 siblings, 1 reply; 52+ messages in thread
From: Mudama, Eric @ 2003-10-20 14:08 UTC (permalink / raw)
  To: 'Rogier Wolff'
  Cc: ''Norman Diamond ' ',
	''Hans Reiser ' ',
	''Wes Janzen ' ',
	''John Bradford ' ',
	''linux-kernel@vger.kernel.org ' ',
	''nikita@namesys.com ' ',
	''Pavel Machek ' ',
	''Justin Cormack ' ',
	''Russell King ' ',
	''Vitaly Fertman ' ',
	''Krzysztof Halasa ' '

> -----Original Message-----
>
> We're getting annoyed at WD because they are selling WD800 drives 
> (80G) with 2, 4, 6 and 8 heads(*). So when we order a replacement
> WD800 for spare parts for a broken one, we might end up with a
> different generation drive which is useless for the "part exchange"
> project....
> 
> (*) they probably don't sell the full complement... yet.

With "today's generation" of ATA drives, WD and Maxtor stop at 3 platters,
and Seagate stops at a 2-platter design.

Everyone wants to make a 4-platter drive, but for their rather small
volumes, most people find it isn't cost effective.

> You're assuming that a head-switch is faster than a 
> track-to-track seek. Apparently that is no longer true.
> We've seen drives that "scan" a whole platter before
> switching heads. We've seen drives that do this on a 
> per-region basis. 

Track-to-track seeks are faster than headswitches because you don't have to
worry about radial comb imbalance (head A is only guaranteed to be within
some tolerance of head B, this creates your 'skew').

Most vendors these days have a "modified horizontal format" which does some
small number of cylinders (16?) then a headswitch, so that they slowly walk
inward.

Drives that walk the entire platter then headswitch haven't existed for
years I'm quite sure, at least not in modern ATA drives.  Headswitches after
each zone would possibly make sense, but it can make it noticeably more
complicated to estimate the time for a seek, which is the key to good
performance.  (e.g. your "local area" IO becomes IO across potentially
thousands of cylinders)
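
A minimal sketch of such a layout, to make the ordering concrete.  It maps a
sequential track number to a (cylinder, head) pair, using bands of 16
cylinders (the guess given above).  The function name, the 6-head default
and the serpentine direction reversal on odd heads are illustrative
assumptions, not a description of any vendor's actual format.

```python
def track_location(track: int, heads: int = 6, band: int = 16,
                   serpentine: bool = True) -> tuple:
    """Return (cylinder, head) for the Nth track in sequential order.

    One head sweeps a band of cylinders using track-to-track seeks, then
    a head switch moves to the next head over the same band.  After the
    last head, the layout advances to the next band of cylinders.
    """
    tracks_per_band = heads * band
    band_index, within_band = divmod(track, tracks_per_band)
    head, offset = divmod(within_band, band)
    if serpentine and head % 2 == 1:
        # Assumption: odd heads sweep the band in the opposite direction,
        # so the head switch happens without seeking back across the band.
        offset = band - 1 - offset
    return band_index * band + offset, head


for n in (0, 15, 16, 31, 32, 96):
    print(n, track_location(n))
```

With these numbers a head switch happens once every 16 tracks, and every
other step is a single track-to-track seek.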

> How about: "Opening it up and having a peek?" :-) That certainly
> works. But most vendors don't let me do that before I buy.  :-)

Before you buy? um, no =P

--eric

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: Blockbusting news, results are in
  2003-10-20 14:08 Mudama, Eric
@ 2003-10-20 14:42 ` John Bradford
  0 siblings, 0 replies; 52+ messages in thread
From: John Bradford @ 2003-10-20 14:42 UTC (permalink / raw)
  To: Mudama, Eric, 'Rogier Wolff'
  Cc: ''Norman Diamond ' ',
	''Hans Reiser ' ',
	''Wes Janzen ' ',
	''John Bradford ' ',
	''linux-kernel@vger.kernel.org ' ',
	''nikita@namesys.com ' ',
	''Pavel Machek ' ',
	''Justin Cormack ' ',
	''Russell King ' ',
	''Vitaly Fertman ' ',
	''Krzysztof Halasa ' '

> Everyone wants to make a 4-platter drive, but for their rather small
> volumes, most people find it isn't cost effective.

Why not go back to 5.25" form factor?  Would there be significant
engineering difficulties in using current densities, etc, on larger
platters?

> > How about: "Opening it up and having a peek?" :-) That certainly
> > works. But most vendors don't let me do that before I buy.  :-)
> 
> Before you buy? um, no =P

Transparent cases, maybe?  :-)

John.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: Blockbusting news, results are in
@ 2003-10-20 15:55 Mudama, Eric
  2003-10-20 17:32 ` Hans Reiser
  0 siblings, 1 reply; 52+ messages in thread
From: Mudama, Eric @ 2003-10-20 15:55 UTC (permalink / raw)
  To: 'Hans Reiser', Norman Diamond
  Cc: 'Wes Janzen ', 'Rogier Wolff ',
	'John Bradford ', linux-kernel, nikita,
	'Pavel Machek ', 'Justin Cormack ',
	'Russell King ', 'Vitaly Fertman ',
	'Krzysztof Halasa '


> -----Original Message-----
> From: Hans Reiser [mailto:reiser@namesys.com]
> Sent: Sunday, October 19, 2003 2:25 AM
> To: Norman Diamond
> Cc: Mudama, Eric; 'Wes Janzen '; 'Rogier Wolff '; 'John Bradford ';
> linux-kernel@vger.kernel.org; nikita@namesys.com; 'Pavel Machek ';
> 'Justin Cormack '; 'Russell King '; 'Vitaly Fertman '; 
> 'Krzysztof Halasa
> '
> Subject: Re: Blockbusting news, results are in
> 
> 
> Norman Diamond wrote:
> 
> >
> >>What would you like "us disk makers" to say?
> >>    
> >>
> >
> >How to force reallocations even when data are lost, 
> >
> buy Maxtor and write to them, thereby triggering the remap.

It isn't necessarily that simple.  However, if the drive is still alive, it
has written your data to a place where it can get at it again.  

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: Blockbusting news, results are in
  2003-10-20 15:55 Mudama, Eric
@ 2003-10-20 17:32 ` Hans Reiser
  0 siblings, 0 replies; 52+ messages in thread
From: Hans Reiser @ 2003-10-20 17:32 UTC (permalink / raw)
  To: Mudama, Eric
  Cc: Norman Diamond, 'Wes Janzen ', 'Rogier Wolff ',
	'John Bradford ', linux-kernel, nikita,
	'Pavel Machek ', 'Justin Cormack ',
	'Russell King ', 'Vitaly Fertman ',
	'Krzysztof Halasa '

Mudama, Eric wrote:

>>-----Original Message-----
>>From: Hans Reiser [mailto:reiser@namesys.com]
>>Sent: Sunday, October 19, 2003 2:25 AM
>>To: Norman Diamond
>>Cc: Mudama, Eric; 'Wes Janzen '; 'Rogier Wolff '; 'John Bradford ';
>>linux-kernel@vger.kernel.org; nikita@namesys.com; 'Pavel Machek ';
>>'Justin Cormack '; 'Russell King '; 'Vitaly Fertman '; 
>>'Krzysztof Halasa
>>'
>>Subject: Re: Blockbusting news, results are in
>>
>>
>>Norman Diamond wrote:
>>
>>    
>>
>>>>What would you like "us disk makers" to say?
>>>>   
>>>>
>>>>        
>>>>
>>>How to force reallocations even when data are lost, 
>>>
>>>      
>>>
>>buy Maxtor and write to them, thereby triggering the remap.
>>    
>>
>
>It isn't necessarily that simple.
>
Can you explain? 

>However, if the drive is still alive, it
>has written your data to a place where it can get at it again.  
>
>
>  
>


-- 
Hans



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: Blockbusting news, results are in
@ 2003-10-21 15:18 Norman Diamond
  2003-10-21 19:31 ` Chuck Campbell
  0 siblings, 1 reply; 52+ messages in thread
From: Norman Diamond @ 2003-10-21 15:18 UTC (permalink / raw)
  To: linux-kernel

Jan-Benedict Glaw replied to me:

> > After a few other experiments, I used smartctl to direct the drive to do a
> > long self-test.  When it completed, we observed that the drive had
> > self-diagnosed a read failure on the same bad sector number as always, and
> > we observed that the drive did not reallocate the bad block during long
> > self-tests.
>
> Maybe the drive can't remap the block because there's no free space in
> the remap area available any more...

As previously reported about twice in this thread, the first time I ran
"smartctl -a" it reported that the quantities of reallocated sector events
and reallocated sector count were both 1, and when I ran it again after the
first long self-test, both of these quantities had increased to 2.  If there
were no room in the remap area, then how did the remaps increase from 1 to 2
while the permanently bad sector remained non-remapped and permanently bad?

Here are more results.  The quantities of reallocated sector events and
reallocated sector count have both increased to 3.  Meanwhile the
permanently bad sector remains permanently bad.  I think that there is still
room remaining in the remap area.
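
For anyone who wants to track these two counters between runs, here is a
small sketch that pulls the raw values out of a smartctl attribute table.
The SAMPLE text is made up for illustration (it is not output from the drive
discussed here), and the column layout is assumed to be the usual
ten-column table with the raw value in the last column.

```python
# Hypothetical attribute table, for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       3
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       3
"""

WANTED = {"Reallocated_Sector_Ct", "Reallocated_Event_Count"}


def raw_values(text: str, names: set) -> dict:
    """Return {attribute name: raw value} for the requested attributes."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns; the name is the second
        # column and the raw value starts at the tenth.
        if len(fields) >= 10 and fields[1] in names:
            found[fields[1]] = int(fields[9])
    return found


print(raw_values(SAMPLE, WANTED))
```

Comparing the dictionaries from two runs shows whether the counters moved
while the bad sector itself stayed unreadable.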

By this time I think my friends at Toshiba agree that Toshiba's firmware is
inadequate, though participants in this LKML thread are about evenly divided
on the issue.  It seems that Maxtor's firmware is better, though I notice
the silence regarding my questions of customer non-service and uncertain
warranties (when Maxtor distributed one drive with two incompatible sets of
jumpering instructions).  And other manufacturers still aren't saying
whether their firmware is adequate.  It still seems that either Linux must
be made to work around known bad blocks or else hard disks and Linux cannot
be used together on a computer.


* Re: Blockbusting news, results are in
  2003-10-21 15:18 Norman Diamond
@ 2003-10-21 19:31 ` Chuck Campbell
  2003-10-21 20:05   ` Richard B. Johnson
  0 siblings, 1 reply; 52+ messages in thread
From: Chuck Campbell @ 2003-10-21 19:31 UTC (permalink / raw)
  To: Norman Diamond; +Cc: linux-kernel

On Wed, Oct 22, 2003 at 12:18:33AM +0900, Norman Diamond wrote:
> It still seems that either Linux must
> be made to work around known bad blocks or else hard disks and Linux cannot
> be used together on a computer.

That is a bit of a troll.  I suspect that 70 - 90 percent of linux 
installations on a computer are using hard disks.  Yet the number of people
having the same problem as you doesn't appear to be 100% of those.

Making intentionally inflammatory statements to try and win others to your 
point of view tends to do exactly the opposite.

If there are enough people experiencing the problem, it gets fixed, at least
as far as I can tell.  There have been a number of viable ideas proposed
in this thread to handle the situation you are talking about, and while
none of them are exactly what you proposed, feel free to write code.  Baiting
comments like the above don't really accomplish anything constructive.

-chuck

-- 


* Re: Blockbusting news, results are in
  2003-10-21 19:31 ` Chuck Campbell
@ 2003-10-21 20:05   ` Richard B. Johnson
  2003-10-21 20:21     ` Valdis.Kletnieks
  0 siblings, 1 reply; 52+ messages in thread
From: Richard B. Johnson @ 2003-10-21 20:05 UTC (permalink / raw)
  To: Chuck Campbell; +Cc: Norman Diamond, linux-kernel

On Tue, 21 Oct 2003, Chuck Campbell wrote:

> On Wed, Oct 22, 2003 at 12:18:33AM +0900, Norman Diamond wrote:
> > It still seems that either Linux must
> > be made to work around known bad blocks or else hard disks and Linux cannot
> > be used together on a computer.
>
> That is a bit of a troll.  I suspect that 70 - 90 percent of linux
> installations on a computer are using hard disks.  Yet the number of people
> having the same problem as you doesn't appear to be 100% of those.
>
> Making intentionally inflammatory statements to try and win others to your
> point of view tends to do exactly the opposite.
>
> If there are enough people experiencing the problem, it gets fixed, at least
> as far as I can tell.  There have been a number of viable ideas proposed
> in this thread to handle the situation you are talking about, and while
> none of them are exactly what you proposed, feel free to write code.  Baiting
> comments like the above don't really accomplish anything constructive.
>
> -chuck

I thought both ext2 and ext3 do handle bad-blocks. They just
don't make them owned by a "file" because the Unix file-systems
don't require dummy directory entries.

If the respondent wants them isolated into a "BADBLOCKS" file,
he can make a utility to do that. It's really quite easy because
you can raw-read disks under Linux, plus there is already
the `badblocks` program that will locate them.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.22 on an i686 machine (797.90 BogoMips).
            Note 96.31% of all statistics are fiction.




* Re: Blockbusting news, results are in
  2003-10-21 20:05   ` Richard B. Johnson
@ 2003-10-21 20:21     ` Valdis.Kletnieks
  2003-10-21 20:31       ` Richard B. Johnson
  2003-10-21 21:53       ` Theodore Ts'o
  0 siblings, 2 replies; 52+ messages in thread
From: Valdis.Kletnieks @ 2003-10-21 20:21 UTC (permalink / raw)
  To: root; +Cc: linux-kernel

On Tue, 21 Oct 2003 16:05:15 EDT, "Richard B. Johnson" said:

> If the respondent wants them isolated into a "BADBLOCKS" file,
> he can make a utility to do that. It's really quite easy because
> you can raw-read disks under Linux, plus there is already
> the `badblocks` program that will locate them.

Yes, it's trivially easy to figure out that block 193453 on /dev/hdb is bad.
It's even not too bad to map that to an offset on /dev/hdb4.  Even if you're
using LVM or DM to map stuff, it's still attackable.  But how do you guarantee
that block 193453 gets allocated to your badblocks file and not to some other
file that just tried to extend itself by 32K?
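
The mapping step is only arithmetic.  A sketch, assuming the bad block is
known as an absolute 512-byte sector number on the whole disk, and that you
have already looked up the partition's first sector and the filesystem block
size (for ext2/ext3, "dumpe2fs -h" reports the latter).  The helper name and
the numbers are invented for the example.

```shell
#!/bin/sh
# sector_to_fsblock: hypothetical helper for illustration.
#   $1 = bad sector, as an absolute 512-byte sector number on the disk
#   $2 = first sector of the partition holding the filesystem
#   $3 = filesystem block size in bytes
# Prints the block number relative to the start of the filesystem.
sector_to_fsblock() {
    sector=$1 start=$2 bsize=$3
    if [ "$sector" -lt "$start" ]; then
        echo "sector $sector lies before the partition start" >&2
        return 1
    fi
    echo $(( (sector - start) * 512 / bsize ))
}

# Made-up numbers: partition starts at sector 63, 4096-byte blocks.
sector_to_fsblock 193453 63 4096    # prints 24173
```

With LVM or DM in the way there is an extra offset to subtract per mapping,
but the shape of the calculation stays the same.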



* Re: Blockbusting news, results are in
  2003-10-21 20:21     ` Valdis.Kletnieks
@ 2003-10-21 20:31       ` Richard B. Johnson
  2003-10-21 21:53       ` Theodore Ts'o
  1 sibling, 0 replies; 52+ messages in thread
From: Richard B. Johnson @ 2003-10-21 20:31 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: linux-kernel

On Tue, 21 Oct 2003 Valdis.Kletnieks@vt.edu wrote:

> On Tue, 21 Oct 2003 16:05:15 EDT, "Richard B. Johnson" said:
>
> > If the respondent wants them isolated into a "BADBLOCKS" file,
> > he can make a utility to do that. It's really quite easy because
> > you can raw-read disks under Linux, plus there is already
> > the `badblocks` program that will locate them.
>
> Yes, it's trivially easy to figure out that block 193453 on /dev/hdb is bad.
> It's even not too bad to map that to an offset on /dev/hdb4.  Even if you're
> using LVM or DM to map stuff, it's still attackable.  But how do you guarantee
> that block 193453 gets allocated to your badblocks file and not to some other
> file that just tried to extend itself by 32K?
>
>

You repair file-systems when they are not mounted. Also, the
source of an old version of e2fsprogs has a "defrag" utility
that could be used as a sample of how to create a file that
owns the bad blocks you find.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.22 on an i686 machine (797.90 BogoMips).
            Note 96.31% of all statistics are fiction.




* Re: Blockbusting news, results are in
  2003-10-21 20:21     ` Valdis.Kletnieks
  2003-10-21 20:31       ` Richard B. Johnson
@ 2003-10-21 21:53       ` Theodore Ts'o
  2003-10-22  2:32         ` Valdis.Kletnieks
  1 sibling, 1 reply; 52+ messages in thread
From: Theodore Ts'o @ 2003-10-21 21:53 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: root, linux-kernel

On Tue, Oct 21, 2003 at 04:21:26PM -0400, Valdis.Kletnieks@vt.edu wrote:
> On Tue, 21 Oct 2003 16:05:15 EDT, "Richard B. Johnson" said:
> 
> > If the respondent wants them isolated into a "BADBLOCKS" file,
> > he can make a utility to do that. It's really quite easy because
> > you can raw-read disks under Linux, plus there is already
> > the `badblocks` program that will locate them.
> 
> Yes, it's trivially easy to figure out that block 193453 on /dev/hdb is bad.
> It's even not too bad to map that to an offset on /dev/hdb4.  Even if you're
> using LVM or DM to map stuff, it's still attackable.  But how do you guarantee
> that block 193453 gets allocated to your badblocks file and not to some other
> file that just tried to extend itself by 32K?

Read the e2fsck man page, and pay attention to the -c, -l, and -L
options....
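
A minimal sketch of how those options are typically used, on an unmounted
filesystem.  The device name, file path and block numbers are placeholders,
and write_badblock_list is a hypothetical helper, not something e2fsprogs
ships.  Per the man page, the numbers must be in units of the filesystem's
block size.

```shell
#!/bin/sh
# write_badblock_list: hypothetical helper.
#   $1 = output file; remaining arguments = filesystem block numbers.
# Writes one number per line, sorted and de-duplicated, which is the
# plain list format that badblocks(8) produces and e2fsck -l/-L read.
write_badblock_list() {
    out=$1
    shift
    printf '%s\n' "$@" | sort -n -u > "$out"
}

# Then, with the filesystem unmounted (all names are placeholders):
#   write_badblock_list /tmp/bad.txt 24173
#   e2fsck -f -l /tmp/bad.txt /dev/hdb4   # add these blocks to the bad block list
#   e2fsck -f -L /tmp/bad.txt /dev/hdb4   # replace the list with these blocks
#   e2fsck -f -c /dev/hdb4                # have badblocks(8) do a read-only scan
```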

						- Ted



* Re: Blockbusting news, results are in
  2003-10-21 21:53       ` Theodore Ts'o
@ 2003-10-22  2:32         ` Valdis.Kletnieks
  2003-10-23 17:28           ` Theodore Ts'o
  0 siblings, 1 reply; 52+ messages in thread
From: Valdis.Kletnieks @ 2003-10-22  2:32 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: root, linux-kernel

On Tue, 21 Oct 2003 17:53:46 EDT, "Theodore Ts'o" said:

> Read the e2fsck man page, and pay attention to the -c, -l, and -L
> options....

Yes, I knew this was doable if the filesystem was unmounted - the fun is of
course that if you get a bad block in /usr or someplace similar, it would
REALLY be nice to be able to do something about it without taking it offline..

The cynic in me says that if I have to take it down to flag a bad block, I'm
going to use the downtime to just replace the *bleep*ing thing with something
more acquainted with the concept of block relocation - if it didn't relocate
it, either the relocation sectors are used up or the drive has pessimal
microcode, both of which are bad news.

I admit I haven't cooked up a test filesystem and actually checked what happens
if you feed the -l flag a block that's already in a file (presumably it
deallocates it from the inode and leaves a sparse hole) or a block that
contains inodes or a superblock copy...


* Re: Blockbusting news, results are in
  2003-10-22  2:32         ` Valdis.Kletnieks
@ 2003-10-23 17:28           ` Theodore Ts'o
  0 siblings, 0 replies; 52+ messages in thread
From: Theodore Ts'o @ 2003-10-23 17:28 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: root, linux-kernel

On Tue, Oct 21, 2003 at 10:32:34PM -0400, Valdis.Kletnieks@vt.edu wrote:
> Yes, I knew this was doable if the filesystem was unmounted - the fun is of
> course that if you get a bad block in /usr or someplace similar, it would
> REALLY be nice to be able to do something about it without taking it offline..

Agreed; it wouldn't be that hard to add some kernel code to do this
on-line, at least if the block isn't already allocated.  If the block
is already allocated, there would need to be some userspace help to
find which file the block actually belongs to, so that the block could
be substituted out and the user appropriately warned.
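
The lookup half of that userspace help can be approximated today with
debugfs, at least on a quiescent filesystem.  A sketch only: the device and
block number are placeholders, inode_from_icheck is a made-up helper, and it
assumes the "icheck" request prints a header line followed by "block inode"
pairs.

```shell
#!/bin/sh
# inode_from_icheck: hypothetical helper.
# Reads the output of the debugfs "icheck" request on stdin and prints the
# owning inode number, or nothing if no inode owns the block.
# Assumes two-column output: a header line, then "block inode" lines.
inode_from_icheck() {
    awk 'NR > 1 && $2 ~ /^[0-9]+$/ { print $2 }'
}

# Typical use (device and block number are placeholders):
#   ino=$(debugfs -R "icheck 24173" /dev/hdb4 2>/dev/null | inode_from_icheck)
#   [ -n "$ino" ] && debugfs -R "ncheck $ino" /dev/hdb4
```

The "ncheck" request then turns the inode number into a pathname, which is
what the user would need to be warned about.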

> I admit I haven't cooked up a test filesystem and actually checked what happens
> if you feed the -l flag a block that's already in a file (presumably it
> deallocates it from the inode and leaves a sparse hole) or a block that
> contains inodes or a superblock copy...

It gets treated as a block that has been claimed by two inodes (which
it is: the original inode and the bad block inode).  E2fsck gives the
user the option of (a) allocating a new block, so that the file gets a
replacement block with whatever data could be copied from the bad
block, or (b) if the user declines the first option, deleting the file
containing the bad block.

						- Ted


end of thread, other threads:[~2003-10-23 17:33 UTC | newest]

Thread overview: 52+ messages
2003-10-19  2:16 Blockbusting news, results are in Norman Diamond
2003-10-19  4:15 ` Larry McVoy
2003-10-19  5:00   ` Paul
2003-10-19  8:19     ` Andre Hedrick
2003-10-19  8:08   ` Hans Reiser
2003-10-19  8:35     ` William Lee Irwin III
2003-10-19 20:01       ` Pavel Machek
2003-10-19 20:11         ` William Lee Irwin III
2003-10-20  7:24         ` John Bradford
2003-10-19 22:49       ` jw schultz
2003-10-20  7:22         ` John Bradford
2003-10-20  8:22           ` jw schultz
2003-10-20  7:27         ` Hans Reiser
2003-10-20  8:08           ` jw schultz
2003-10-19 19:49     ` Pavel Machek
2003-10-20  7:22       ` Hans Reiser
2003-10-21 10:31     ` Eric W. Biederman
2003-10-21  8:43 ` Jan-Benedict Glaw
  -- strict thread matches above, loose matches on Subject: below --
2003-10-19  7:37 Mudama, Eric
2003-10-19  8:09 ` Norman Diamond
2003-10-19  8:24   ` Hans Reiser
2003-10-19 11:43   ` Ralf Baechle
2003-10-19 15:55   ` Krzysztof Halasa
2003-10-19  8:13 ` Rogier Wolff
2003-10-19  8:17 ` Hans Reiser
2003-10-19  8:41   ` Rogier Wolff
2003-10-20 15:56     ` Thayne Harbaugh
2003-10-19  8:21 ` Andre Hedrick
2003-10-19  8:27   ` Hans Reiser
2003-10-19  9:01     ` Erik Andersen
2003-10-19 14:10       ` Andre Hedrick
2003-10-19 18:16         ` Hans Reiser
2003-10-19 19:44           ` Andre Hedrick
2003-10-20  7:21             ` Hans Reiser
2003-10-19 14:42       ` Valdis.Kletnieks
2003-10-19 10:47 ` Ingo Oeser
2003-10-19 17:36 Mudama, Eric
2003-10-19 17:39 Mudama, Eric
2003-10-19 17:51 Mudama, Eric
2003-10-20  6:22 ` Rogier Wolff
2003-10-20 14:08 Mudama, Eric
2003-10-20 14:42 ` John Bradford
2003-10-20 15:55 Mudama, Eric
2003-10-20 17:32 ` Hans Reiser
2003-10-21 15:18 Norman Diamond
2003-10-21 19:31 ` Chuck Campbell
2003-10-21 20:05   ` Richard B. Johnson
2003-10-21 20:21     ` Valdis.Kletnieks
2003-10-21 20:31       ` Richard B. Johnson
2003-10-21 21:53       ` Theodore Ts'o
2003-10-22  2:32         ` Valdis.Kletnieks
2003-10-23 17:28           ` Theodore Ts'o
