linux-raid.vger.kernel.org archive mirror
From: maarten <maarten@ultratux.net>
To: linux-raid@vger.kernel.org
Subject: Re: Spares and partitioning huge disks
Date: Sun, 9 Jan 2005 00:01:31 +0100
Message-ID: <200501090001.31738.maarten@ultratux.net>
In-Reply-To: <crpg32$2rr$1@sea.gmane.org>

On Saturday 08 January 2005 21:33, Mario Holbe wrote:
> maarten <maarten@ultratux.net> wrote:
> > On Saturday 08 January 2005 19:55, you wrote:
> >> My disks claim to be able to re-locate bad blocks on read error.  But I
> >> am not sure if this is correctable errors or not.  If not correctable
> >> errors are re-located, what data does the drive return?  Since I don't
> >> know, I
>
> ...
>
> > Afaik, if a drive senses it gets more 'difficult' than usual to read a
> > sector, it will automatically copy it to a spare sector and reassign it.
> > However, I
>
> No, this is usually not the case. At least I don't know IDE drives
> that do so. This is why I call it `sector read error'.

Do you mean that SCSI drives do?  If so, I thought the difference in firmware
intelligence between ATA and SCSI vanished long ago.

> Each newer disk has some amount of `spare sectors' which can be
> used to relocate bad sectors. Usually, you have two situations
> where you can detect a bad sector:
> 1. If you write to it and this attempt fails and
> 2. If you read from it and this attempt fails.

Hm.  I'm not extremely well versed in modern drive technology, but
nevertheless: I understood it somewhat differently, namely:

1. If you write to it and that fails, the drive will allocate a spare sector.
From that we [should be] able to conclude that if you get a write failure,
the drive has run out of spare sectors. (Is that a fact, or not??)

2. If you read from it, the drive's firmware will see an error and:
2a: Retry the read a couple more times, succeed, copy the data to a spare
sector and reallocate.   - OR
2b: Retry the read, fail miserably despite that and (only then) signal a read
error to the host.
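
To make the two cases above concrete, here is a toy sketch in Python. It only
models the behaviour as I understand it, not any real firmware; the names
read_sector, write_sector and max_retries are made up for illustration.

```python
from enum import Enum


class ReadResult(Enum):
    OK = "ok"
    OK_REALLOCATED = "ok after retry, sector reallocated"   # case 2a
    READ_ERROR = "read error reported to host"              # case 2b


def read_sector(attempt_ok, max_retries=3):
    """attempt_ok lists the outcome of each raw read attempt, in order.

    The first entry is the initial read; later entries are retries.
    max_retries is an assumed firmware limit, not a documented value.
    """
    if attempt_ok and attempt_ok[0]:
        return ReadResult.OK
    for ok in attempt_ok[1:1 + max_retries]:
        if ok:
            # Data was recovered, so it can be copied to a spare sector.
            return ReadResult.OK_REALLOCATED
    return ReadResult.READ_ERROR


def write_sector(medium_ok, spares_left):
    """Case 1: a failed write goes to a spare, unless none are left."""
    if medium_ok:
        return "written", spares_left
    if spares_left > 0:
        return "written to spare", spares_left - 1
    return "write error", 0
```

In this model the host only ever sees a write error when spares_left is 0,
which is exactly the conclusion I am unsure about in point 1.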

I've heard for a long time that drives are much more sophisticated than
before about retrying failed reads.  They can try to read 'off-track'
(off-axis) and do other things that were impossible when stepper motors were
still used.  But stepper motors were more than 10 years ago; now all drives
have coil-actuated heads.

In other words, drives don't wait until the sector is really unreadable;
they'll reallocate at the first sign of trouble (decaying signal strength,
spurious CRC errors, stuff like that).  This is also suggested by the
observable behaviour of drive and OS: if a reallocation only occurred
after the fact, i.e. when the data is beyond salvaging, then every sector
reallocation would by definition lead to corrupt data in that file. Generally
speaking, since there are so many spare sectors, an OS would die very soon as
all its files / libs / DLLs got corrupted due to reallocation (which is
supposed to be transparent to the host; only the drive knows).
But... I have no solid proof of this, other than reasoning like this.

> 1. would require some verify-operation, so I'm not sure if this
> is done at all in the wild.
> 2. has a simple problem: If you get a read-request for sector x
> and you cannot read it, what data should you return then? The
> answer is simple: you don't return data but an error (the read-
> error). Additionally you mark the sector as bad and relocate the
> next write-request for that sector to some spare sector and further
> read-requests then too. However, you still have to respond error
> messages for each subsequent read-request before the first
> relocated write-request appears.
> And afaik this is what current disks do. That's why you can just
> re-sync the failed disk to the array again without any problem -
> because the write-request happens then, the relocation takes place
> and everything's fine.

So basically what you're saying is that reallocation _only_ happens on 
_writes_ ?  Hm.  Maybe, I don't know...

The problem with my theory is that if it is true, then that automatically
means that whenever md gets a read error, that data is indeed gone.
Or maybe that isn't a problem, since the disk gets kicked and afterwards,
during resync, the reallocation pays off. Yeah.  That must be it. :-)
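
For comparison, a toy sketch in Python of the other model, where reallocation
happens only on a write that follows a failed read, and a resync is what
supplies that write. ToyDisk and resync are invented names, and this is a
simplification of what Mario describes, not how md or any firmware is
actually implemented.

```python
class ToyDisk:
    def __init__(self, bad=(), spares=8):
        self.data = {}            # sector number -> contents
        self.bad = set(bad)       # physically unreadable sectors
        self.pending = set()      # marked bad after a failed read
        self.remapped = set()     # sectors now living in the spare area
        self.spares = spares

    def read(self, lba):
        if lba in self.bad and lba not in self.remapped:
            # No data to return: mark the sector and report an error.
            self.pending.add(lba)
            raise OSError(f"read error at sector {lba}")
        return self.data.get(lba, b"\0")

    def write(self, lba, value):
        if lba in self.pending:
            if self.spares == 0:
                raise OSError(f"write error at sector {lba}: no spares")
            # The first write after the failed read triggers relocation.
            self.spares -= 1
            self.remapped.add(lba)
            self.pending.discard(lba)
        # A write to a bad sector that was never read is not verified,
        # so it "succeeds" here and only a later read exposes it.
        self.data[lba] = value


def resync(disk, good_copy):
    """Rewrite every sector from the redundant copy, as a resync would."""
    for lba, value in enumerate(good_copy):
        disk.write(lba, value)
```

With one bad sector, read() fails until resync() has rewritten it; after that
the sector reads back fine and one spare has been used.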

Maarten

