Bad blocks

* Bad blocks
@ 2002-04-16  1:13 Sam Vilain
  2002-04-16 11:43 ` Matthias Andree
                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Sam Vilain @ 2002-04-16  1:13 UTC (permalink / raw)
  To: reiserfs-list

I seem to have some bad blocks on my laptop's hard drive, how do I mark
them as bad to reiserfs?

root@hoffman:/usr/share/doc/RFC/unclassified# ls
hda: read_intr: status=0x5b { DriveReady SeekComplete DataRequest Index Error }
hda: read_intr: error=0x40 { UncorrectableError }, LBAsect=25866987, sector=4211304
end_request: I/O error, dev 03:06 (hda), sector 4211304
vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [66745 67408 0x0 SD]
ls: rfc470.txt.gz: Permission denied
hda: read_intr: status=0x5b { DriveReady SeekComplete DataRequest Index Error }
hda: read_intr: error=0x40 { UncorrectableError }, LBAsect=25866987, sector=4211304
end_request: I/O error, dev 03:06 (hda), sector 4211304
vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [66745 67410 0x0 SD]
ls: rfc486.txt.gz: Permission denied
hda: read_intr: status=0x5b { DriveReady SeekComplete DataRequest Index Error }
hda: read_intr: error=0x40 { UncorrectableError }, LBAsect=25866987, sector=4211304
end_request: I/O error, dev 03:06 (hda), sector 4211304
vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [66745 67409 0x0 SD]
ls: rfc478.txt.gz: Permission denied
rfc1.txt.gz    rfc239.txt.gz  rfc370.txt.gz  rfc501.txt.gz  rfc697.txt.gz  rfc84.txt.gz
   [...]

I know I have 8 bad sectors:

hoffman:/usr/share/doc/RFC/unclassified# dd if=/dev/hda6 of=test skip=4211303 count=2
dd: reading `/dev/hda6': Input/output error
1+0 records in
1+0 records out
hoffman:/usr/share/doc/RFC/unclassified# dd if=/dev/hda6 of=test skip=4211311 count=2
dd: reading `/dev/hda6': Input/output error
0+0 records in
0+0 records out
hoffman:/usr/share/doc/RFC/unclassified# dd if=/dev/hda6 of=test skip=4211312 count=2
2+0 records in
2+0 records out
hoffman:/usr/share/doc/RFC/unclassified# 

`mkreiserfs' is missing the -c and -l options.

`reiserfsck' aborts during the scan with the message:

pass_through_tree: unable to read 526413 block on device 0x3

Apparently, ext2 handles marking sectors as bad automatically.

How can I deal with this?  If anyone knows of a tool to re-format just
8 sectors (to let the disk re-map the blocks elsewhere), that also
would be helpful.

Cheers,
--
   Sam Vilain, sam@vilain.net     WWW: http://sam.vilain.net/
    7D74 2A09 B2D3 C30F F78E      GPG: http://sam.vilain.net/sam.asc
    278A A425 30A9 05B5 2F13
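
A note on the numbers in the message above: the dd runs use dd's default
512-byte blocks, while reiserfsck reports filesystem blocks. The sketch
below is only an illustration of that arithmetic. It assumes a 4096-byte
reiserfs block size (the usual default, not stated in the message), and the
function names are made up for this example.

```python
SECTOR_SIZE = 512      # bytes per sector; also dd's default block size
FS_BLOCK_SIZE = 4096   # ASSUMPTION: reiserfs block size on this partition


def sector_to_fs_block(sector, sector_size=SECTOR_SIZE, block_size=FS_BLOCK_SIZE):
    """Return the filesystem block containing a partition-relative sector."""
    return (sector * sector_size) // block_size


def sectors_in_fs_block(block, sector_size=SECTOR_SIZE, block_size=FS_BLOCK_SIZE):
    """Return the range of partition-relative sectors covered by one block."""
    per_block = block_size // sector_size
    first = block * per_block
    return range(first, first + per_block)


# The dd runs fail at sector 4211304 and at sector 4211311, and succeed
# again from sector 4211312 onwards.
bad_sectors = range(4211304, 4211312)
bad_blocks = sorted({sector_to_fs_block(s) for s in bad_sectors})
print(bad_blocks)
```

Under that assumption all eight sectors fall into a single filesystem
block, 526413, which is the block number reiserfsck complains about.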


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Bad blocks
  2002-04-16  1:13 Bad blocks Sam Vilain
@ 2002-04-16 11:43 ` Matthias Andree
  2002-04-16 11:47 ` Oleg Drokin
  2002-04-16 12:01 ` Ed Tomlinson
  2 siblings, 0 replies; 23+ messages in thread
From: Matthias Andree @ 2002-04-16 11:43 UTC (permalink / raw)
  To: reiserfs-list

Sam Vilain <sam@vilain.net> writes:

> How can I deal with this?  If anyone knows of a tool to re-format just
> 8 sectors (to let the disk re-map the blocks elsewhere), that also
> would be helpful.

Manufacturers may have these tools, but some do a full low-level
format. Usually, writing the bad blocks will make the drive remap them
if it has spare sectors left to remap to.

-- 
Matthias Andree
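
The "write to the bad blocks" suggestion can be sketched as follows. This
is a hedged illustration, not a recommended tool: zero_sectors is a
hypothetical helper, the demonstration runs against a scratch file rather
than a disk, and whether a real drive remaps anything depends on its
firmware and on spare sectors being available, as noted above. Writing
zeros destroys whatever the sectors held.

```python
import os
import tempfile


def zero_sectors(path, first_sector, count, sector_size=512):
    """Overwrite `count` sectors starting at `first_sector` with zeros.

    DESTRUCTIVE: the previous contents of those sectors are lost.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        os.lseek(fd, first_sector * sector_size, os.SEEK_SET)
        remaining = bytes(count * sector_size)
        while remaining:
            written = os.write(fd, remaining)
            remaining = remaining[written:]
        os.fsync(fd)  # flush cached data so the write reaches the device
    finally:
        os.close(fd)


# Demonstration on a four-sector scratch file instead of a real device.
with tempfile.NamedTemporaryFile(delete=False) as scratch:
    scratch.write(b"\xff" * 512 * 4)
zero_sectors(scratch.name, first_sector=1, count=2)
with open(scratch.name, "rb") as f:
    data = f.read()
os.unlink(scratch.name)
```

On the machine from the original report, the equivalent call would target
the unmounted partition with first_sector=4211304 and count=8; that usage
is an assumption drawn from the dd output, not something tested here.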


* Re: Bad blocks
  2002-04-16  1:13 Bad blocks Sam Vilain
  2002-04-16 11:43 ` Matthias Andree
@ 2002-04-16 11:47 ` Oleg Drokin
  2002-04-16 12:01 ` Ed Tomlinson
  2 siblings, 0 replies; 23+ messages in thread
From: Oleg Drokin @ 2002-04-16 11:47 UTC (permalink / raw)
  To: Sam Vilain; +Cc: reiserfs-list

Hello!

   There is a rudimentary bad-block support feature described in our FAQ.
   The next reiserfsprogs version will also have improved bad-block support.

Bye,
    Oleg


* Re: Bad blocks
  2002-04-16  1:13 Bad blocks Sam Vilain
  2002-04-16 11:43 ` Matthias Andree
  2002-04-16 11:47 ` Oleg Drokin
@ 2002-04-16 12:01 ` Ed Tomlinson
  2 siblings, 0 replies; 23+ messages in thread
From: Ed Tomlinson @ 2002-04-16 12:01 UTC (permalink / raw)
  To: Sam Vilain, reiserfs-list

On April 15, 2002 09:13 pm, Sam Vilain wrote:
> I seem to have some bad blocks on my laptop's hard drive, how do I mark
> them as bad to reiserfs?

EVMS has a feature to handle bad blocks.

Ed Tomlinson


* bad blocks
@ 2002-06-17 15:19 Alexander Saers
  2002-06-17 15:23 ` Oleg Drokin
  0 siblings, 1 reply; 23+ messages in thread
From: Alexander Saers @ 2002-06-17 15:19 UTC (permalink / raw)
  To: reiserfs-list

Hello

It seems that my hard drive has some bad sectors. I therefore did some
research on how to handle it. I found this piece of information:

http://www.reiserfs.org/bad-block-handling.html

But this only marks the sectors as busy, and the next reiserfsck will then
detect this and remove the busy mark. Isn't there a way to create files on
the bad sectors so that nobody uses them, not even in the future after a
reiserfsck? For example, I could have a folder on the drive that contains
/badblocks/sector8004
/badblocks/sector8010
etc.

Thanks for a good filesystem

/Alexander


* Re: bad blocks
  2002-06-17 15:19 bad blocks Alexander Saers
@ 2002-06-17 15:23 ` Oleg Drokin
  0 siblings, 0 replies; 23+ messages in thread
From: Oleg Drokin @ 2002-06-17 15:23 UTC (permalink / raw)
  To: Alexander Saers; +Cc: reiserfs-list

Hello!

On Mon, Jun 17, 2002 at 05:19:18PM +0200, Alexander Saers wrote:

> But this only marks the sectors as busy, and the next reiserfsck will then
> detect this and remove the busy mark. Isn't there a way to create files on
> the bad sectors so that nobody uses them, not even in the future after a
> reiserfsck? For example, I could have a folder on the drive that contains
> /badblocks/sector8004
> /badblocks/sector8010
> etc.
> Thanks for a good filesystem

The new reiserfsck has support for lists of bad blocks, but this has not
been thoroughly tested and is therefore disabled by default.
Some work is being done in this direction, though.

Bye,
    Oleg


* Bad Blocks
@ 2002-09-19 14:48 Abiy,Mike [Edm]
  2002-09-19 15:23 ` Jorge R . Csapo
  0 siblings, 1 reply; 23+ messages in thread
From: Abiy,Mike [Edm] @ 2002-09-19 14:48 UTC (permalink / raw)
  To: linux-admin

I would like to apologize first if this happens to be a simple question.

1. How does one test for bad blocks on a hard drive in Linux?

2. More importantly, how does one mark a bad block on a hard drive as
unusable, so that the bad block utility that was used (whatever it may be)
and/or any other program does not try to write to or read from this same bad
block, resulting in the same errors happening again and again?

3. This was necessitated by trips to a remote site after a remote power
bootup (sometimes it is absolutely necessary to do that) to do a manual fsck,
because the Linux box stops the normal bootup process awaiting manual
intervention to do a manual fsck.

Any information on this topic would be greatly appreciated.

thanks
Michael


* Re: Bad Blocks
       [not found] <F8500ECEBD66D211A3D20008C724A29C01CEFD9D@SR-EDM-EXCH4.edm.ab.ec.gc.ca>
@ 2002-09-19 15:03 ` Scott Taylor
  0 siblings, 0 replies; 23+ messages in thread
From: Scott Taylor @ 2002-09-19 15:03 UTC (permalink / raw)
  To: linux-admin

At 07:48 AM 19/09/2002, Abiy,Mike [Edm] wrote:
>I would like to apologize, first, if this happens to be a simple question.
>
>1. How does one test for bad blocks on a hard drive in linux.
>
>2. More important, how does one render a bad block on hard drive unreadable,
>so that the bad block utility that was used( whatever it may be)  and/or
>program does not try to write or read to this same bad block, resulting in
>the same errors happening again and again.
>
>3. This was necessitated by trips to a remote site from remote power bootup
>( sometime it is absolutely necessary to do that) to do a manual fsck,
>because the linux box stops the normal bootup process awaiting manual
>intervention to do manual fsck.
>
>any information on this topic would be greatly appreciated.

Hello Michael,

That could depend on your hardware and file system.  Probably a good idea 
to read 'man fsck'



* Re: Bad Blocks
  2002-09-19 14:48 Bad Blocks Abiy,Mike [Edm]
@ 2002-09-19 15:23 ` Jorge R . Csapo
  0 siblings, 0 replies; 23+ messages in thread
From: Jorge R . Csapo @ 2002-09-19 15:23 UTC (permalink / raw)
  To: Abiy,Mike [Edm]; +Cc: linux-admin

thus spoke Abiy,Mike [Edm] (on 19/09/2002):
> I would like to apologize, first, if this happens to be a simple question.
> 
> 1. How does one test for bad blocks on a hard drive in linux.
> 
> 2. More important, how does one render a bad block on hard drive unreadable,
> so that the bad block utility that was used( whatever it may be)  and/or
> program does not try to write or read to this same bad block, resulting in
> the same errors happening again and again.

mkfs -c does just that, addressing both 1. and 2. 

> 
> 3. This was necessitated by trips to a remote site from remote power bootup
> ( sometime it is absolutely necessary to do that) to do a manual fsck,
> because the linux box stops the normal bootup process awaiting manual
> intervention to do manual fsck.

This is totally configurable, meaning the necessity for a manual fsck can
simply be removed. You can either prevent Linux from fsck'ing at boot (not a
really good idea) or force fsck at every boot but with options that don't
require manual intervention. The way to do this depends on your distro, but
it may involve editing /etc/inittab, a number of /etc/rc.d files, re-creating
your filesystems or all of the above...

-- 
Jorge R. Csapo
--------------------------------------------------
 /"\
 \ / CAMPANHA DA FITA ASCII - CONTRA MAIL HTML
  X  ASCII RIBBON CAMPAIGN - AGAINST HTML MAIL
 / \
--------------------------------------------------
http://www.completo.com.br/~jorge
===========================================
With a PC, I always felt limited
by the software available.
On Unix, I am limited only by my knowledge.
--Peter J. Schoenster


* RE: Bad Blocks
@ 2002-09-23 14:17 Abiy,Mike [Edm]
  2002-09-23 20:33 ` Glynn Clements
  0 siblings, 1 reply; 23+ messages in thread
From: Abiy,Mike [Edm] @ 2002-09-23 14:17 UTC (permalink / raw)
  To: 'Jorge R . Csapo', Abiy,Mike [Edm]; +Cc: linux-admin

Continuing on the same subject: how does the Linux box decide during bootup
that it requires a manual fsck? Is there a specified number of bad blocks
that it has to come across before it decides to halt the bootup process and
wait for a manual fsck? If so, is there a way to change that? Again, the
whole reason is to avoid having to drive over six and a half hours just to
do a manual fsck, and to avoid the loss of the Linux box (system) during
that time.



* RE: Bad Blocks
  2002-09-23 14:17 Abiy,Mike [Edm]
@ 2002-09-23 20:33 ` Glynn Clements
  0 siblings, 0 replies; 23+ messages in thread
From: Glynn Clements @ 2002-09-23 20:33 UTC (permalink / raw)
  To: Abiy,Mike [Edm]; +Cc: 'Jorge R . Csapo', linux-admin

Abiy,Mike [Edm] wrote:

> Continuing on the same subject, How does the linux box decide that it
> requires a manual fsck, during bootup

If fsck returns an error code other than 0 (no errors) or 1 (some
errors, but they were all fixed), the boot sequence will normally be
interrupted before the root filesystem is mounted read-write.

-- 
Glynn Clements <glynn.clements@virgin.net>
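
To make the rule above concrete, here is a small sketch that decodes an
fsck exit status and applies the "only 0 and 1 let the boot continue" test.
The bit meanings are the ones documented in the fsck(8) manual page; the
function names are invented for this illustration, and real boot scripts
differ between distributions in how they treat the individual codes.

```python
# Exit status bits as documented in fsck(8); the status is their sum.
FSCK_FLAGS = {
    1: "filesystem errors corrected",
    2: "system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}


def decode_fsck_status(status):
    """Return the list of conditions encoded in an fsck exit status."""
    return [text for bit, text in FSCK_FLAGS.items() if status & bit]


def boot_is_interrupted(status):
    """Apply the rule described above: only 0 and 1 let the boot proceed."""
    return status not in (0, 1)


for status in (0, 1, 4, 12):
    print(status, decode_fsck_status(status), boot_is_interrupted(status))
```

A status of 4 or more means errors were left uncorrected or fsck itself
failed, which is the situation that leaves a remote box waiting at the
console.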


* RE: Bad Blocks
@ 2002-09-24  4:14 Aleksander Kujbida
  2002-09-24  4:54 ` Glynn Clements
  0 siblings, 1 reply; 23+ messages in thread
From: Aleksander Kujbida @ 2002-09-24  4:14 UTC (permalink / raw)
  To: linux-admin

Also, if a specified period of time has elapsed since the last fsck (or is 
it last boot?), it will fsck. Can't remember where the time period is set.

Aleksander


>From: Glynn Clements <glynn.clements@virgin.net>
>To: "Abiy,Mike [Edm]" <Mike.Abiy@EC.gc.ca>
>CC: "'Jorge R . Csapo'" <jorge@completo.com.br>,linux-admin@vger.kernel.org
>Subject: RE: Bad Blocks
>Date: Mon, 23 Sep 2002 21:33:10 +0100
>
>
>Abiy,Mike [Edm] wrote:
>
> > Continuing on the same subject, How does the linux box decide that it
> > requires a manual fsck, during bootup
>
>If fsck returns an error code other than 0 (no errors) or 1 (some
>errors, but they were all fixed), the boot sequence will normally be
>interrupted before the root filesystem is mounted read-write.
>



* RE: Bad Blocks
  2002-09-24  4:14 Aleksander Kujbida
@ 2002-09-24  4:54 ` Glynn Clements
  0 siblings, 0 replies; 23+ messages in thread
From: Glynn Clements @ 2002-09-24  4:54 UTC (permalink / raw)
  To: Aleksander Kujbida; +Cc: linux-admin


Aleksander Kujbida wrote:

> > > Continuing on the same subject, How does the linux box decide that it
> > > requires a manual fsck, during bootup
> >
> > If fsck returns an error code other than 0 (no errors) or 1 (some
> > errors, but they were all fixed), the boot sequence will normally be
> > interrupted before the root filesystem is mounted read-write.
> 
> Also, if a specified period of time has elapsed since the last fsck (or is 
> it last boot?), it will fsck. Can't remember where the time period is set.

The boot sequence *always* runs fsck; but fsck itself won't actually
perform the check if the filesystem was cleanly unmounted and neither
the maximum mount count nor the check interval have been reached.

-- 
Glynn Clements <glynn.clements@virgin.net>
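
The decision described above can be written down as a short sketch. It is
a simplified model and not the actual e2fsck code: the SuperblockState
class and its field names are hypothetical stand-ins for the values stored
in the superblock. On ext2/ext3 the two limits are the ones adjusted with
tune2fs -c (maximum mount count) and tune2fs -i (check interval).

```python
from dataclasses import dataclass

DAY = 24 * 60 * 60  # seconds


@dataclass
class SuperblockState:
    clean: bool           # filesystem was cleanly unmounted
    mount_count: int      # mounts since the last full check
    max_mount_count: int  # limit; 0 or negative disables this test
    last_check: int       # time of the last full check, seconds since epoch
    check_interval: int   # seconds between forced checks; 0 disables


def full_check_needed(sb, now):
    """Return True if fsck would actually check, rather than skip."""
    if not sb.clean:
        return True
    if sb.max_mount_count > 0 and sb.mount_count >= sb.max_mount_count:
        return True
    if sb.check_interval > 0 and now >= sb.last_check + sb.check_interval:
        return True
    return False


now = 200 * DAY
healthy = SuperblockState(True, 5, 20, now - 30 * DAY, 180 * DAY)
crashed = SuperblockState(False, 5, 20, now - 30 * DAY, 180 * DAY)
overdue = SuperblockState(True, 5, 20, now - 181 * DAY, 180 * DAY)
print(full_check_needed(healthy, now),
      full_check_needed(crashed, now),
      full_check_needed(overdue, now))
```

In this model a clean filesystem within both limits is skipped, while an
unclean shutdown or an expired interval triggers the full check.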


* RE: Bad Blocks
@ 2002-09-24  7:54 Aleksander Kujbida
  0 siblings, 0 replies; 23+ messages in thread
From: Aleksander Kujbida @ 2002-09-24  7:54 UTC (permalink / raw)
  To: linux-admin

Thanks for the clarification, Glynn.
Aleksander


>From: Glynn Clements <glynn.clements@virgin.net>
>To: "Aleksander Kujbida" <akujbida@hotmail.com>
>CC: linux-admin@vger.kernel.org
>Subject: RE: Bad Blocks
>Date: Tue, 24 Sep 2002 05:54:28 +0100
>
>
>Aleksander Kujbida wrote:
>
> > > > Continuing on the same subject, How does the linux box decide that it
> > > > requires a manual fsck, during bootup
> > >
> > > If fsck returns an error code other than 0 (no errors) or 1 (some
> > > errors, but they were all fixed), the boot sequence will normally be
> > > interrupted before the root filesystem is mounted read-write.
> >
> > Also, if a specified period of time has elapsed since the last fsck (or is
> > it last boot?), it will fsck. Can't remember where the time period is set.
>
>The boot sequence *always* runs fsck; but fsck itself won't actually
>perform the check if the filesystem was cleanly unmounted and neither
>the maximum mount count nor the check interval have been reached.
>
>--
>Glynn Clements <glynn.clements@virgin.net>





* Bad blocks
@ 2004-04-01 14:06 Kalev Lember
  2004-04-01 14:36 ` David Woodhouse
  0 siblings, 1 reply; 23+ messages in thread
From: Kalev Lember @ 2004-04-01 14:06 UTC (permalink / raw)
  To: linux-mtd

Hi,

I am going to use DOC Millennium Plus and I do not want to use M-Systems' 
proprietary kernel modules.
Having read the mailing list archives I have some questions.  Does current 
INFTL code support bad block handling? Without that I would say it is 
virtually useless. Am I correct?

-- 
Best regards,
Kalev Lember


* Re: Bad blocks
  2004-04-01 14:06 Bad blocks Kalev Lember
@ 2004-04-01 14:36 ` David Woodhouse
  2004-04-02  0:14   ` Greg Ungerer
  2004-04-04 11:16   ` Kalev Lember
  0 siblings, 2 replies; 23+ messages in thread
From: David Woodhouse @ 2004-04-01 14:36 UTC (permalink / raw)
  To: Kalev Lember; +Cc: gerg, linux-mtd

On Thu, 2004-04-01 at 17:06 +0300, Kalev Lember wrote:
> Hi,
> 
> I am going to use DOC Millennium Plus and I do not want to use M-Systems 
> propietary kernel modules.
> Having read the mailing list archives I have some questions.  Does current 
> INFTL code support bad block handling? Without that I would say it is 
> virtually useless. Am I correct?

It doesn't, and you're probably correct, yes.

This is fairly high up on my TODO list but keeps getting preempted by
RealWork(tm). Are you volunteering? It shouldn't be too hard.

-- 
dwmw2


* Re: Bad blocks
  2004-04-01 14:36 ` David Woodhouse
@ 2004-04-02  0:14   ` Greg Ungerer
  2004-04-04 11:16   ` Kalev Lember
  1 sibling, 0 replies; 23+ messages in thread
From: Greg Ungerer @ 2004-04-02  0:14 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Kalev Lember, linux-mtd


David Woodhouse wrote:
> On Thu, 2004-04-01 at 17:06 +0300, Kalev Lember wrote:
>>I am going to use DOC Millennium Plus and I do not want to use M-Systems 
>>propietary kernel modules.
>>Having read the mailing list archives I have some questions.  Does current 
>>INFTL code support bad block handling? Without that I would say it is 
>>virtually useless. Am I correct?
> 
> 
> It doesn't, and you're probably correct, yes.
> 
> This is fairly high up on my TODO list but keeps getting preempted by
> RealWork(tm). Are you volunteering? It shouldn't be too hard.

Yeah, high on my todo list as well. But I am not going to
get to it any time soon. It should be quite simple to do.

Regards
Greg


------------------------------------------------------------------------
Greg Ungerer  --  Chief Software Dude       EMAIL:     gerg@snapgear.com
SnapGear -- a CyberGuard Company            PHONE:       +61 7 3435 2888
825 Stanley St,                             FAX:         +61 7 3891 3630
Woolloongabba, QLD, 4102, Australia         WEB: http://www.SnapGear.com


* Re: Bad blocks
  2004-04-01 14:36 ` David Woodhouse
  2004-04-02  0:14   ` Greg Ungerer
@ 2004-04-04 11:16   ` Kalev Lember
  1 sibling, 0 replies; 23+ messages in thread
From: Kalev Lember @ 2004-04-04 11:16 UTC (permalink / raw)
  To: David Woodhouse; +Cc: gerg, linux-mtd

On Thu, 1 Apr 2004, David Woodhouse wrote:
> On Thu, 2004-04-01 at 17:06 +0300, Kalev Lember wrote:
> > INFTL code support bad block handling? Without that I would say it is 
> 
> This is fairly high up on my TODO list but keeps getting preempted by
> RealWork(tm). Are you volunteering? It shouldn't be too hard.

No, I could not do it because I do not have a development platform. I only 
have access to some devices that have a DoC soldered onto their mainboards. 
They can boot only from the DoC, currently using the M-Systems driver and 
Linux. If I were to make one unbootable, they would have to reflash it at 
the manufacturer's plant. Sorry.

-- 
Kalev Lember


* Bad blocks
  2005-02-17  7:14 Question regarding mdadm.conf Michael Tokarev
@ 2005-02-17  8:30 ` Guy
  0 siblings, 0 replies; 23+ messages in thread
From: Guy @ 2005-02-17  8:30 UTC (permalink / raw)
  To: linux-raid

About 1 month ago the topic was bad blocks.  I have been monitoring the bad
blocks on my disks and I find I have had 3 new bad blocks since Jan 18.
Each on a different disk.  I have 17 disks, SEAGATE ST118202LC.

These bad blocks did not cause any problems with md.  I believe they were
recoverable read errors or write errors, and were re-mapped since I have
AWRE and ARRE turned on.

I don't know what is a normal rate, but based on the last month, I would
expect about 2.29 defects per disk per year.  I have had the disks in use
for about 2.5 years.  I have 81 defects.  That comes to 1.905 per disk per
year.  Not too far off the mark!  And I have replaced disks 2-3 times in 2.5
years.  The failed disks failed due to a cable problem, the disks may have
been just fine, but I took them apart for the magnets before I realized the
power cable was at fault.  The cable is now repaired.

Anyway, if these were read errors, md would have caused me lots of problems.
So, we need md to deal with bad blocks without kicking out the disk.

I have been using this command to monitor the disks:
sginfo -G /dev/sda | grep "in grown table"

My current bad block status (Defect list) for 17 disks:
0 entries (0 bytes) in grown table.
0 entries (0 bytes) in grown table.
8 entries (64 bytes) in grown table.
5 entries (40 bytes) in grown table.
0 entries (0 bytes) in grown table.
12 entries (96 bytes) in grown table.
20 entries (160 bytes) in grown table.
0 entries (0 bytes) in grown table.
6 entries (48 bytes) in grown table.
0 entries (0 bytes) in grown table.
6 entries (48 bytes) in grown table.
28 entries (224 bytes) in grown table.
0 entries (0 bytes) in grown table.
2 entries (16 bytes) in grown table.
3 entries (24 bytes) in grown table.
4 entries (32 bytes) in grown table.
0 entries (0 bytes) in grown table.

Does anyone know how many defects is considered too many?

Guy
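
The arithmetic in the message above can be reproduced with a short sketch.
The helper names are invented for this example, and one assumption is
needed to arrive at the quoted 2.29: that "the last month" is treated as a
four-week period, i.e. 13 periods per year. The sginfo lines are copied
from the message.

```python
import re

# Output of `sginfo -G ... | grep "in grown table"` for the 17 disks.
SGINFO_LINES = """\
0 entries (0 bytes) in grown table.
0 entries (0 bytes) in grown table.
8 entries (64 bytes) in grown table.
5 entries (40 bytes) in grown table.
0 entries (0 bytes) in grown table.
12 entries (96 bytes) in grown table.
20 entries (160 bytes) in grown table.
0 entries (0 bytes) in grown table.
6 entries (48 bytes) in grown table.
0 entries (0 bytes) in grown table.
6 entries (48 bytes) in grown table.
28 entries (224 bytes) in grown table.
0 entries (0 bytes) in grown table.
2 entries (16 bytes) in grown table.
3 entries (24 bytes) in grown table.
4 entries (32 bytes) in grown table.
0 entries (0 bytes) in grown table.
"""

GROWN = re.compile(r"^(\d+) entries \((\d+) bytes\) in grown table\.$")


def grown_defects(text):
    """Extract the grown-defect count from each matching line."""
    counts = []
    for line in text.splitlines():
        match = GROWN.match(line.strip())
        if match:
            counts.append(int(match.group(1)))
    return counts


def per_disk_per_year(defects, disks, years):
    """Average number of new defects per disk per year."""
    return defects / disks / years


counts = grown_defects(SGINFO_LINES)
recent = per_disk_per_year(3, 17, 1 / 13)   # ASSUMPTION: 13 periods a year
lifetime = per_disk_per_year(81, 17, 2.5)   # 81 is the total quoted above
print(len(counts), sum(counts))
print(round(recent, 2), round(lifetime, 2))
```

Note that the 17 listed entries add up to 94 rather than the 81 quoted in
the text; the message does not explain the difference, so the sketch keeps
the quoted figure for the lifetime rate.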


* Bad Blocks...
@ 2005-04-26 20:33 Eddie Dawydiuk
  2005-04-26 22:43 ` Charles Manning
  0 siblings, 1 reply; 23+ messages in thread
From: Eddie Dawydiuk @ 2005-04-26 20:33 UTC (permalink / raw)
  To: linux-mtd

Hello Yaffers,

After running some stress tests on a 128MB NAND Flash I have found some
strange behavior while using the Yaffs filesystem... The stress test
creates a ring-buffer of 5 directories, each directory contains 10,000
files with a size of 1248 bytes (Please find the source code attached). 
When running this application on a 32MB NAND Flash I am able to fill the
disk and then delete the files as expected... Although when running the
test on a 128MB NAND Flash(with the same kernel) I find that after
creating slightly over 35,000 files I am unable to write any more files to
disk(my board hangs). After rebooting the board, when I attempt to delete
the files only some of the files are deleted successfully(on the first
attempt). After attempting several more times I am able to delete all of
the files but I find that I have hundreds of bad blocks(there are no error
messages when I attempt to delete a file and the deletion fails).
I have provided the output of /proc/yaffs below(after running the stress
test multiple times) and am using a 2.4.26 kernel... I have read the other
posts referring to bad block management
(http://www.aleph1.co.uk/pipermail/yaffs/2005q1/000955.html) and have
ensured the fixes suggested have been made. If anyone has any suggestions
I would appreciate them...

$ cat /proc/yaffs
YAFFS built:Apr 26 2005 10:44:45
$Id: yaffs_fs.c,v 1.3 2005/01/25 00:38:25 eddie Exp $
$Id: yaffs_guts.c,v 1.41 2005/04/24 08:54:36 charles Exp $

Device yaffs
startBlock......... 1
endBlock........... 7999
chunkGroupBits..... 2
chunkGroupSize..... 4
nErasedBlocks...... 1575
nTnodesCreated..... 35000
nFreeTnodes........ 21790
nObjectsCreated.... 34400
nFreeObjects....... 21795
nFreeChunks........ 142801
nPageWrites........ 71420
nPageReads......... 2573700
nBlockErasures..... 4021
nGCCopies.......... 317
garbageCollections. 3599
passiveGCs......... 3599
nRetriedWrites..... 0
nRetireBlocks...... 2397
eccFixed........... 0
eccUnfixed......... 0
tagsEccFixed....... 0
tagsEccUnfixed..... 654
cacheHits.......... 0
nDeletedFiles...... 22161
nUnlinkedFiles..... 22162
nBackgroudDeletions 0
useNANDECC......... 1

Thanks,
Eddie


* Re: Bad Blocks...
  2005-04-26 20:33 Bad Blocks Eddie Dawydiuk
@ 2005-04-26 22:43 ` Charles Manning
  0 siblings, 0 replies; 23+ messages in thread
From: Charles Manning @ 2005-04-26 22:43 UTC (permalink / raw)
  To: eddie, linux-mtd

On Wednesday 27 April 2005 08:33, Eddie Dawydiuk wrote:
> Hello Yaffers,
>
> After running some stress tests on a 128MB NAND Flash I have found some
> strange behavior while using the Yaffs filesystem... The stress test
> creates a ring-buffer of 5 directories, each directory contains 10,000
> files with a size of 1248 bytes (Please find the source code attached).
> When running this application on a 32MB NAND Flash I am able to fill the
> disk and then delete the files as expected... Although when running the
> test on a 128MB NAND Flash(with the same kernel) I find that after
> creating slightly over 35,000 files I am unable to write any more files to
> disk(my board hangs). After rebooting the board, when I attempt to delete
> the files only some of the files are deleted successfully(on the first
> attempt). After attempting several more times I am able to delete all of
> the files but I find that I have hundreds of bad blocks(there are no error
> messages when I attempt to delete a file and it is unsucessfully deleted).
> I have provided the output of /proc/yaffs below(after running the stress
> test multiple times) and am using a 2.4.26 kernel... I have read the other
> posts refering to bad block management
> (http://www.aleph1.co.uk/pipermail/yaffs/2005q1/000955.html) and have
> ensured the fixes suggested have been made. If anyone has any suggestions
> I would appreciate them...


Hi Eddie

I have not tried making 35000 files on a Linux box myself, but all of this 
should work fine.

The core problem, it seems to me, is the excessive number of bad blocks being 
generated. nRetireBlocks of 2397 shows that over a quarter of your blocks 
have been marked bad in just one run.  Anything over 1-2% in a product 
lifetime is unhealthy.

Severe block failures will cause flow-on failures like garbage collection 
problems etc.

Block failures are generally a result of ecc errors. I see you're using NAND 
ecc, so make sure that is doing the right thing.

I suggest adding some of the low level tracing (eg. YAFFS_TRACE_BAD_BLOCKS) 
and maybe turn on more mtd tracing. Feel free to add your own too :-)

-- Charles
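
As a rough check of the "over a quarter" figure, the counters can be pulled
out of the /proc/yaffs text and compared. This is only a sketch: the parser
and its names are made up for this example, the sample is an excerpt of the
output quoted in the original report, and it assumes the device spans
startBlock to endBlock inclusive.

```python
import re

# Excerpt of the /proc/yaffs output from the original report.
PROC_YAFFS = """\
Device yaffs
startBlock......... 1
endBlock........... 7999
nErasedBlocks...... 1575
nBlockErasures..... 4021
nRetireBlocks...... 2397
tagsEccUnfixed..... 654
useNANDECC......... 1
"""

COUNTER = re.compile(r"(\w+)\.*\s+(\d+)")


def parse_counters(text):
    """Return a dict of the 'name..... value' counters found in the text."""
    stats = {}
    for line in text.splitlines():
        match = COUNTER.fullmatch(line.strip())
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


stats = parse_counters(PROC_YAFFS)
# ASSUMPTION: the block range is inclusive at both ends.
total_blocks = stats["endBlock"] - stats["startBlock"] + 1
retired_fraction = stats["nRetireBlocks"] / total_blocks
print(f"{stats['nRetireBlocks']} of {total_blocks} blocks retired "
      f"({retired_fraction:.1%})")
```

With these numbers roughly 30% of the blocks have been retired, far above
the 1-2% lifetime figure mentioned above. The same output also shows a
non-zero tagsEccUnfixed counter, which fits the suggestion to look at the
ECC handling first.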


* Bad Blocks
@ 2013-03-20 18:55 Dyweni - Ceph-Devel
  2013-03-28 15:54 ` Gregory Farnum
  0 siblings, 1 reply; 23+ messages in thread
From: Dyweni - Ceph-Devel @ 2013-03-20 18:55 UTC (permalink / raw)
  To: ceph-devel

Hi All,

I would like to understand how Ceph handles and recovers from bad 
blocks.  Would someone mind explaining this to me?  It wasn't very 
apparent from the docs.

My ultimate goal is to be able to get some extra life out of my disks 
after I detect that they may be failing.  (I'm talking about those disks 
that may have a small number of bad blocks, but otherwise seem fine and 
still perform well.)

Here's what I've put together:

1. BBR Hardware
     - All hard disks come with a set number of blocks that are reserved 
for remapping of failed blocks.  This is handled transparently by the 
hard disk.  The hard disk may not begin reporting failed blocks until 
all the reserved blocks are used up.

2. BBR Device Mapper Target
     - Back in the EVMS days, IBM wrote a kernel module (dm-bbr) and an 
EVMS plugin to manage that kernel module.  I have updated that kernel 
module to work with the 3.6.11 kernel.  I have also rewritten some 
portions of the EVMS plugin as a standalone bash script to allow me to 
initialize the BBR layer and start the BBR device mapper target on that 
layer.  (So far it seems to run fine, but it requires more testing.)

3. BTRFS
     - I've read that BTRFS can perform data scrubbing and repair 
damaged files from redundant copies.

4. CEPH
     - I've read that CEPH can perform a deep scrub to find damaged 
copies.  I assume by the distributed nature of CEPH, it can repair the 
damaged copy from the other OSDs.
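
The scrub-and-repair idea in items 3 and 4 can be sketched generically in Python. This is only an illustration of checksum-based repair from redundant copies, not how BTRFS or Ceph actually implement scrubbing; the function names and the replica labels ("osd.0" and so on) are hypothetical. It also leaves the question below untouched: nothing here stops the repaired copy from landing on the same bad area.

```python
import hashlib
from typing import Dict, List


def checksum(data: bytes) -> str:
    """SHA-256 digest used to tell intact copies from damaged ones."""
    return hashlib.sha256(data).hexdigest()


def scrub_and_repair(replicas: Dict[str, bytes], expected: str) -> List[str]:
    """Overwrite every replica whose checksum differs from `expected`.

    Returns the names of the replicas that were repaired. Raises
    RuntimeError when no replica matches, because there is then no
    good copy to repair from.
    """
    good = [name for name, data in replicas.items() if checksum(data) == expected]
    if not good:
        raise RuntimeError("no intact replica available")
    source = replicas[good[0]]
    repaired = []
    for name, data in replicas.items():
        if checksum(data) != expected:
            replicas[name] = source  # copy from a known-good replica
            repaired.append(name)
    return repaired


payload = b"object contents"
stored = {
    "osd.0": payload,
    "osd.1": b"object c\x00ntents",  # simulated damage from a bad block
    "osd.2": payload,
}
print(scrub_and_repair(stored, checksum(payload)))
```

The sketch assumes a trusted checksum is stored separately from the data; without one, a scrub can see that copies disagree but not which copy is correct.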

One thing I am not clear on is when BTRFS / CEPH finds damaged data, 
what do they do to prevent data from being written to the same area?

Also, I'm wondering if any parts of my layered approach are redundant / 
unnecessary...  For instance if BTRFS marks the block bad internally, 
then perhaps the BBR DM Target isn't needed...

In my testing recently, I had the following setup:
   Disk -> DM-Crypt -> DM-BBR -> BTRFS -> OSD

When the OSD hit a bad block, the DM-BBR target successfully remapped 
it to one of its own reserved blocks, BTRFS then reported data 
corruption, and the OSD daemon crashed.

-- 
Thanks,
Dyweni


* Re: Bad Blocks
  2013-03-20 18:55 Dyweni - Ceph-Devel
@ 2013-03-28 15:54 ` Gregory Farnum
  0 siblings, 0 replies; 23+ messages in thread
From: Gregory Farnum @ 2013-03-28 15:54 UTC (permalink / raw)
  To: YS3fpFE2ykfB; +Cc: ceph-devel@vger.kernel.org

The OSDs expect the underlying filesystem to keep their data clean, and
they deliberately crash on failure in order to avoid accidentally
introducing corruption into the system. There's some ongoing work to make
that a little friendlier, but it's not done yet.
-Greg

On Wed, Mar 20, 2013 at 11:55 AM, Dyweni - Ceph-Devel
<YS3fpFE2ykfB@dyweni.com> wrote:
> Hi All,
>
> I would like to understand how Ceph handles and recovers from bad blocks.
> Would someone mind explaining this to me?  It wasn't very apparent from the
> docs.
>
> My ultimate goal is to be able to get some extra life out of my disks after
> I detect that they may be failing.  (I'm talking about those disks that may
> have a small number of bad blocks, but otherwise seem fine and still perform
> well.)
>
> Here's what I've put together:
>
> 1. BBR Hardware
>     - All hard disks come with a set number of blocks that are reserved for
> remapping of failed blocks.  This is handled transparently by the hard disk.
> The hard disk may not begin reporting failed blocks until all the reserved
> blocks are used up.
>
> 2. BBR Device Mapper Target
>     - Back in the EVMS days, IBM wrote a kernel module (dm-bbr) and an EVMS
> plugin to manage that kernel module.  I have updated that kernel module to
> work with the 3.6.11 kernel.  I have also rewritten some portions of the EVMS
> plugin as a standalone bash script to allow me to initialize the BBR layer
> and start the BBR device mapper target on that layer.  (So far it seems to
> run fine, but it requires more testing.)
>
> 3. BTRFS
>     - I've read that BTRFS can perform data scrubbing and repair damaged
> files from redundant copies.
>
> 4. CEPH
>     - I've read that CEPH can perform a deep scrub to find damaged copies.
> I assume by the distributed nature of CEPH, it can repair the damaged copy
> from the other OSDs.
>
> One thing I am not clear on is when BTRFS / CEPH finds damaged data, what do
> they do to prevent data from being written to the same area?
>
> Also, I'm wondering if any parts to my layered approach are redundant /
> unnecessary...  For instance if BTRFS marks the block bad internally, then
> perhaps the BBR DM Target isn't needed...
>
>
> In my testing recently, I had the following setup:
>   Disk -> DM-Crypt -> DM-BBR -> BTRFS -> OSD
>
> When the OSD hit a bad block, the DM-BBR target successfully remapped it to
> one of its own reserved blocks, BTRFS then reported data corruption, and the
> OSD daemon crashed.
>
>
> --
> Thanks,
> Dyweni


Thread overview: 23+ messages
2002-04-16  1:13 Bad blocks Sam Vilain
2002-04-16 11:43 ` Matthias Andree
2002-04-16 11:47 ` Oleg Drokin
2002-04-16 12:01 ` Ed Tomlinson
  -- strict thread matches above, loose matches on Subject: below --
2002-06-17 15:19 bad blocks Alexander Saers
2002-06-17 15:23 ` Oleg Drokin
2002-09-19 14:48 Bad Blocks Abiy,Mike [Edm]
2002-09-19 15:23 ` Jorge R . Csapo
     [not found] <F8500ECEBD66D211A3D20008C724A29C01CEFD9D@SR-EDM-EXCH4.edm.ab.ec.gc.ca>
2002-09-19 15:03 ` Scott Taylor
2002-09-23 14:17 Abiy,Mike [Edm]
2002-09-23 20:33 ` Glynn Clements
2002-09-24  4:14 Aleksander Kujbida
2002-09-24  4:54 ` Glynn Clements
2002-09-24  7:54 Aleksander Kujbida
2004-04-01 14:06 Bad blocks Kalev Lember
2004-04-01 14:36 ` David Woodhouse
2004-04-02  0:14   ` Greg Ungerer
2004-04-04 11:16   ` Kalev Lember
2005-02-17  7:14 Question regarding mdadm.conf Michael Tokarev
2005-02-17  8:30 ` Bad blocks Guy
2005-04-26 20:33 Bad Blocks Eddie Dawydiuk
2005-04-26 22:43 ` Charles Manning
2013-03-20 18:55 Dyweni - Ceph-Devel
2013-03-28 15:54 ` Gregory Farnum
