* md: badblocks(pid 1216) used obsolete MD ioctl
From: Cal Webster @ 2002-07-01 16:58 UTC
To: linux-raid
Hello list:
I'm getting errors when formatting and managing RAID devices. Below is the
sequence of events leading up to the errors. I had recently upgraded our
RAID5 array from 6 to 9 disks (see below for system profile and raid
config). To do this, I backed up the contents of the RAID, formatted the new
drives, and re-created it with a new raidtab. There were no errors on any
RAID devices prior to the upgrade.
Please note that the same sector is shown in each of the two I/O errors, so
there may be a real error. However, why can't I get "fsck" to work around
it? Why did the sector error not show up before? Also, why did I only get
data errors in the filesystem check? Isn't the "badblocks" test supposed to
catch these bad blocks? Could the "obsolete MD ioctl" errors be keeping
"fsck" from correcting the errors?
Thanks!
--Cal Webster
Network Manager
NAWCTSD ISEO CPNC
############################
# Begin Sequence of Events #
############################
1. Upgraded from 6 to 9 drives - no apparent errors during creation or
restoration of RAID5 device.
2. After restoring data, /dev/sdc1 got kicked from array.
---------------------------------------------------
SCSI disk error : host 0 channel 0 id 2 lun 0 return code = 2
I/O error: dev 08:21, sector 32772736
raid5: Disk failure on sdc1, disabling device.
Operation continuing on 7 devices
---------------------------------------------------
3. Ran "e2fsck -c /dev/md0" before adding "sdc1" back into array.
---------------------------------------------------
md: badblocks(pid 1857) used obsolete MD ioctl, upgrade your software to use
new ictls.
---------------------------------------------------
4. Ran "fdisk" on /dev/sdc without changing any parameters to erase the RAID
superblock (is there another way to remove "faulty" flag?). Same errors
occurred when formatting new drives (sdg,sdh,sdi) during hardware upgrade:
Multiple occurrences of this error:
---------------------------------------------------
sys32_ioctl(fdisk:9883): Unknown cmd fd(3) cmd(00000330) arg(effffb40)
---------------------------------------------------
No errors displayed at terminal
Partition tables still look okay
5. Used "mdadm /dev/md0 -a /dev/sdc1" to add (formerly "faulty") drive back
into array.
No errors
6. Used "mdadm /dev/md0 -f /dev/sdi1" to kick (good) original spare from
array.
---------------------------------------------------
md0: resyncing spare disk sdc1 to replace failed disk
---------------------------------------------------
Reconstruction of array proceeded without incident.
7. Ran "fdisk" on /dev/sdi without changing any parameters to erase the RAID
superblock (is there another way to remove "faulty" flag?). Got same errors
in log from "fdisk" as with /dev/sdc and, as with the other drive, no
terminal error.
8. Used "mdadm /dev/md0 -a /dev/sdi1" to add original spare drive back into
array, again as the spare. The following error appeared in the log after the
RAID reconfiguration.
---------------------------------------------------
md: badblocks(pid 1216) used obsolete MD ioctl, upgrade your software to use
new ictls.
---------------------------------------------------
9. Ran e2fsck
---------------------------------------------------
[root@winggear root]# e2fsck -c /dev/md0
e2fsck 1.23, 15-Aug-2001 for EXT2 FS 0.5b, 95/08/09
Checking for bad blocks (read-only test): done
Pass 1: Checking inodes, blocks, and sizes
Inode 15105025 is in use, but has dtime set. Fix<y>? yes
...
Inode 15105088 is in use, but has dtime set. Fix<y>? yes
yyPass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Inode 15105025 (...) has a bad mode (0157306).
Clear<y>? yes
...
Inode 15105088 (...) has a bad mode (0157306).
Clear<y>? yes
Pass 5: Checking group summary information
ARCHIVE: ***** FILE SYSTEM WAS MODIFIED *****
ARCHIVE: 70316/15482880 files (0.2% non-contiguous), 9814872/30942912 blocks
---------------------------------------------------
10. The following error showed up in the system log after "e2fsck" completed.
/dev/sdc1 showed no errors in the system log before or during the filesystem
check.
---------------------------------------------------
SCSI disk error : host 0 channel 0 id 2 lun 0 return code = 2
I/O error: dev 08:21, sector 32772736
raid5: Disk failure on sdc1, disabling device.
Operation continuing on 7 devices
---------------------------------------------------
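A side note on the inode numbers in step 9, as a rough sketch only: assuming the ext2 defaults of the time (4 KiB blocks, 32768 blocks per group, 128-byte inodes, none of which e2fsck prints here), inodes 15105025 through 15105088 would sit in two adjacent blocks of a single group's inode table. That would fit one small damaged region better than scattered corruption. The constants and the helper name inode_location are illustrative assumptions, not anything reported by the tools.

```python
# Assumptions (not shown in the e2fsck output): 4 KiB blocks,
# 32768 blocks per group, 128-byte on-disk inodes.
BLOCK_SIZE = 4096
BLOCKS_PER_GROUP = 32768
INODE_SIZE = 128

TOTAL_BLOCKS = 30942912   # from the e2fsck summary line
TOTAL_INODES = 15482880   # from the e2fsck summary line


def inode_location(ino, inodes_per_group):
    """Return (group, block index inside that group's inode table) for inode `ino`."""
    index = ino - 1                          # inode numbers start at 1
    group = index // inodes_per_group
    slot = index % inodes_per_group          # position inside the group's table
    table_block = slot * INODE_SIZE // BLOCK_SIZE
    return group, table_block


groups = -(-TOTAL_BLOCKS // BLOCKS_PER_GROUP)    # ceiling division
inodes_per_group = TOTAL_INODES // groups
first = inode_location(15105025, inodes_per_group)
last = inode_location(15105088, inodes_per_group)
print(groups, inodes_per_group, first, last)
```

If the assumptions hold, the 64 affected inodes cover exactly two consecutive 4 KiB blocks (8 KiB) of one inode table.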
##########################
# End Sequence of Events #
##########################
######################
# Begin RAID Profile #
######################
===============
Partition table
(all 9 drives)
===============
Disk /dev/sdh (Sun disk label): 19 heads, 248 sectors, 7506 cylinders
Units = cylinders of 4712 * 512 bytes

   Device Flag    Start       End    Blocks   Id  System
/dev/sdh1             1      7506  17681780   fd  Linux raid autodetect
/dev/sdh3             0      7506  17684136    5  Whole disk
===============
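The Blocks column can be cross-checked from the geometry fdisk reports. This is a minimal sketch, assuming fdisk counts 1 KiB blocks and treats the End cylinder as exclusive on a Sun disk label (both inferred from the numbers above, not documented here). The function name is hypothetical.

```python
HEADS = 19            # from the fdisk header
SECTORS = 248         # sectors per track, from the fdisk header
SECTOR_BYTES = 512


def partition_kib(start_cyl, end_cyl):
    """Size in 1 KiB blocks of a partition spanning [start_cyl, end_cyl)."""
    sectors = (end_cyl - start_cyl) * HEADS * SECTORS
    return sectors * SECTOR_BYTES // 1024


print(HEADS * SECTORS)          # sectors per cylinder
print(partition_kib(1, 7506))   # /dev/sdh1
print(partition_kib(0, 7506))   # /dev/sdh3 (whole disk)
```

Under those assumptions the results agree with the 4712-sector cylinder size and both Blocks values in the table.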
===========
Superblocks
(all 9 drives)
===========
--------[ mdadm --examine /dev/sda1 ]--------
/dev/sda1:
Magic : a92b4efc
Version : 00.90.00
UUID : 714b16c1:5fb9be28:e5a24a26:38f9f531
Creation Time : Wed Jun 26 13:03:23 2002
Raid Level : raid5
Device Size : 17681664 (16.86 GiB 18.15 GB)
Raid Devices : 8
Total Devices : 9
Preferred Minor : 0
Update Time : Sat Jun 29 01:45:09 2002
State : dirty, no-errors
Active Devices : 8
Working Devices : 8
Failed Devices : 1
Spare Devices : 0
Checksum : 14137179 - correct
Events : 0.24
Layout : left-asymmetric
Chunk Size : 128K
      Number   Major   Minor   RaidDevice State
this     0       8        1        0      active sync   /dev/sda1
   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1
   2     2       8      129        2      active sync   /dev/sdi1
   3     3       8       49        3      active sync   /dev/sdd1
   4     4       8       65        4      active sync   /dev/sde1
   5     5       8       81        5      active sync   /dev/sdf1
   6     6       8       97        6      active sync   /dev/sdg1
   7     7       8      113        7      active sync   /dev/sdh1
---------------------------------------------
===========
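As a sanity check on the sizes, here is a rough sketch of how the "Device Size" and the filesystem size relate to the partition size. It assumes my understanding of the 0.90 superblock layout (the usable area is the partition rounded down to a 64 KiB boundary, minus a 64 KiB reserved area holding the superblock) and a further round-down to the chunk size. The 4 KiB filesystem block size is also an assumption. The function names are hypothetical.

```python
MD_RESERVED_KIB = 64   # assumed 0.90 superblock reservation and alignment


def md090_device_kib(partition_kib, chunk_kib):
    """Usable KiB per member device under the assumed 0.90 layout."""
    aligned = partition_kib - partition_kib % MD_RESERVED_KIB
    usable = aligned - MD_RESERVED_KIB
    return usable - usable % chunk_kib       # whole chunks only


def raid5_capacity_kib(device_kib, raid_devices):
    """RAID5 stores one device's worth of parity across the array."""
    return device_kib * (raid_devices - 1)


device = md090_device_kib(17681780, 128)     # partition Blocks, Chunk Size
capacity = raid5_capacity_kib(device, 8)     # Raid Devices : 8
print(device, capacity, capacity // 4)       # last value: 4 KiB blocks
```

With these assumptions the per-device figure matches the 17681664 reported by mdadm, and the capacity in 4 KiB blocks matches the 30942912 blocks e2fsck reports for /dev/md0.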
==================
RAID Configuration
==================
-----------------------[ raidtab ]-----------------------
#
# 'persistent' RAID5 setup, with one spare disk:
#
raiddev /dev/md0
raid-level 5
nr-raid-disks 8
nr-spare-disks 1
persistent-superblock 1
chunk-size 128
device /dev/sda1
raid-disk 0
device /dev/sdb1
raid-disk 1
device /dev/sdc1
raid-disk 2
device /dev/sdd1
raid-disk 3
device /dev/sde1
raid-disk 4
device /dev/sdf1
raid-disk 5
device /dev/sdg1
raid-disk 6
device /dev/sdh1
raid-disk 7
device /dev/sdi1
spare-disk 0
---------------------------------------------------------
==================
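For anyone trying to relate a position on /dev/md0 to a member disk: the raidtab above sets no parity algorithm, and mdadm reports the layout as left-asymmetric. The sketch below follows my reading of how the md raid5 driver places chunks for that layout. Treat it as an assumption to verify against the kernel source rather than as a reference. locate_chunk is a hypothetical helper.

```python
RAID_DISKS = 8                 # nr-raid-disks from the raidtab
DATA_DISKS = RAID_DISKS - 1    # one chunk per stripe holds parity


def locate_chunk(chunk_number):
    """Map a data chunk of the array to (stripe, data disk, parity disk).

    Assumed left-asymmetric rule: parity starts on the last disk and moves
    one disk to the left per stripe; data chunks fill the remaining disks
    in ascending order, skipping the parity disk.
    """
    stripe, dd_idx = divmod(chunk_number, DATA_DISKS)
    pd_idx = DATA_DISKS - stripe % RAID_DISKS
    if dd_idx >= pd_idx:
        dd_idx += 1            # step over the parity disk
    return stripe, dd_idx, pd_idx


for n in (0, 6, 7, 13):
    print(n, locate_chunk(n))
```

The disk indices are raid-disk numbers from the raidtab, so index 2 would be /dev/sdc1 in the original configuration. With a 128 KiB chunk size, the chunk number for an array offset in KiB is the offset divided by 128, rounded down.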
####################
# End RAID Profile #
####################
########################
# Begin System Profile #
########################
CPU:
cpu : TI UltraSparc IIi
fpu : UltraSparc IIi integrated FPU
promlib : Version 3 Revision 14
prom : 3.14.0
type : sun4u
ncpus probed : 1
ncpus active : 1
Cpu0Bogo : 599.65
Cpu0ClkTck : 0000000011e1ab1e
MMU Type : Spitfire
Physical RAM: 256 MB
IDE Boot drive:
-
class: HD
bus: IDE
detached: 0
device: hdb
driver: ignore
desc: "ST380021A"
physical: 155061/16/63
logical: 155061/16/63
-
SCSI Software RAID Drives:
## 6 of these:
-
class: HD
bus: SCSI
detached: 0
device: sda
driver: ignore
desc: "Fujitsu MAA3182S SUN18G"
host: 0
id: 0
channel: 0
lun: 0
-
## 3 of these:
-
class: HD
bus: SCSI
detached: 0
device: sdg
driver: ignore
desc: "Seagate ST318438LW"
host: 0
id: 6
channel: 0
lun: 0
-
Swap: 256 MB partition
Operating System:
Linux version 2.4.18-0.92sparc (root@fry.rdu.redhat.com) (gcc driver version
egcs-2.91.66 19990314/Linux (egcs-1.1.2 release) executing gcc version
egcs-2.92.11) #1 Mon May 6 17:51:54 EDT 2002
RAID Software: raidtools-1.00.2-1.3
######################
# End System Profile #
######################
* RE: md: badblocks(pid 1216) used obsolete MD ioctl
From: Cal Webster @ 2002-07-01 19:43 UTC
To: Maurice Hilarius; +Cc: linux-raid
> -----Original Message-----
> From: Maurice Hilarius [mailto:maurice@harddata.com]
> Sent: Monday, July 01, 2002 10:41 AM
> To: cwebster@ec.rr.com
> Subject: Re: md: badblocks(pid 1216) used obsolete MD ioctl
>
Thank you for the feedback Maurice.
> You have SCSI bus problems. Likely related to termination.
> 1) check if you have active, powered termination, of the correct type. A
> U160 term will generally work, even with lower SCSI variants, as long as
> they are LVD of some type.
Forgive me, but I don't believe this is a SCSI bus problem. If the problem
moved around or I got different errors each time, I might suspect the bus.
As it is, the same sector on the same drive is called out each time. These
drives are mounted in an external SCSI drive enclosure which is terminated
with the appropriate active termination at the opposite end of the bus from
the connection to the host computer.
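For reference, the "dev 08:21" in the I/O error is the kernel's hexadecimal major:minor pair. A small sketch of the decoding follows, assuming the conventional SCSI disk numbering (major 8, 16 minors per disk). The function names are made up for illustration.

```python
def decode_kdev(dev):
    """Split a kernel 'MM:mm' device string (hex) into (major, minor)."""
    major_hex, minor_hex = dev.split(":")
    return int(major_hex, 16), int(minor_hex, 16)


def scsi_disk_name(major, minor):
    """Name for a SCSI disk on major 8: 16 minors per disk, 0 = whole disk."""
    if major != 8:
        raise ValueError("only SCSI disk major 8 is handled in this sketch")
    disk, part = divmod(minor, 16)
    name = "sd" + chr(ord("a") + disk)
    return name + (str(part) if part else "")


major, minor = decode_kdev("08:21")
print(major, minor, scsi_disk_name(major, minor))
```

This agrees with the "Disk failure on sdc1" line that follows the I/O error in the log, and the same rule gives minor 129 for sdi1 as in the mdadm listing.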
> 2) Check if the devices are jumpered to provide termination power to the
> SCSI bus. Make it so if possible, on ALL devices.
All the new drives are Seagate Barracuda ST318438LW. Unless I am mistaken,
they are ready to install in their factory default configuration.
Termination power is supplied by the bus for the external active terminator.
None of the nine drives should be terminated or jumpered to supply
termination power.
> 3) If this is an Adaptec controller, go into its BIOS (CTRL-A on startup
> message). If set to "auto" on termination, override it manually to match
> the actual configuration.
Please note that this is an UltraSparc IIi, not an Intel box. It does not
load a separate SCSI BIOS on startup the way Intel machines do.
> Maybe a bad drive?
That's my point. At most, I'm willing to concede that there is a bad sector,
(32772736 as reported in the system log). Even so, e2fsck should be able to
"mark" these "bad blocks", adding them to the list for the device. Once
marked, these blocks will not be written to again.
--Cal Webster
* RE: md: badblocks(pid 1216) used obsolete MD ioctl
From: Cal Webster @ 2002-07-02 19:29 UTC
To: Maurice Hilarius; +Cc: linux-raid
> -----Original Message-----
> From: Maurice Hilarius [mailto:maurice@harddata.com]
> Sent: Monday, July 01, 2002 2:26 PM
> To: cwebster@ec.rr.com
> Subject: RE: md: badblocks(pid 1216) used obsolete MD ioctl
>
> I missed that. Always the same drive and sectors?
That's right: single drive, one bad sector.
> It might be worthwhile to run a SCSI (low level) format on this drive, so
> that any bad sectors get marked bad and are not used.
The tools at my disposal relevant to this task are fdisk, mke2fs, e2fsck,
and badblocks. I don't believe that the "surface analysis" done by most SCSI
BIOS utilities on Intel machines does much more than these tools.
> It never hurts to have device close to the terminator ALSO providing term
> power. By the time the current gets from the host, all the way to
> the other
> end of the cabling, especially with an external cabinet, the voltage may
> drop quite a bit.
> It does no harm to enable it on the drives as well, and makes sure the
> attenuation is not a problem.
I appreciate what you're saying, especially since I did not indicate the
proximity of the RAID array to the host computer. However, the length of
wire between this on-board interface connector and the last device in the
external array is less than 8 feet total, so I would expect zero benefit. To
the contrary, in our situation all it would do is make the drives run hotter
in an already warm environment. I've never found this (current drain) to be
a factor with cable lengths under 8-feet, even with single-ended segments.
As I'm sure you are aware, differential SCSI has substantially extended the
range of SCSI signals on longer cables.
> >Please note that this is an UltraSparc IIi, not an Intel box. It does not
> >load a separate SCSI BIOS on startup the way Intel machines do.
>
> True, but lots of people put Adaptec PCI controllers in SPARCS, Macs, etc.
> If so, it may be necessary to temporarily plug it into a PC to
> check/adjust
> these settings.
> There are likely to be SPARC specific utilities which can do the
> same thing of course..
Okay, I've got to ask now. What benefit could I expect from pulling a drive
from my array and plugging it into a PC? What could I do on the PC that I
could not do with one of the utilities mentioned above?
> >That's my point. At most, I'm willing to concede that there is a bad
> >sector, (32772736 as reported in the system log). Even so, e2fsck should
> >be able to "mark" these "bad blocks", adding them to the list for the
> >device. Once marked, these blocks will not be written to again.
> You are right, it should, assuming that it sees the sector is bad
> on a read.
> However, if you want to make sure, format the filesystem with option to
> write to all sectors and verify.
> In mke2fs for example, using the "-c" flag.
You may have hit on it here. If you look at my original post, you'll see
that I did specify the "-c" flag to e2fsck. The "-c" flag to mke2fs does
exactly the same as the same flag on e2fsck. It starts a "badblocks"
read-only test. To accomplish a thorough, non-destructive, read-write test,
I'd have to run "badblocks" by itself, specifying this option (i.e.
badblocks -svn -o badblks.md0 /dev/md0). I could then use the output of this
test to mark the bad sectors with either e2fsck or mke2fs (using the "-l"
flag).
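One caveat with that plan, sketched under assumptions: badblocks reports block numbers in units of its own "-b" block size (1024 bytes by default, per its man page), while "e2fsck -l" expects numbers in filesystem blocks. Passing the filesystem block size to badblocks with "-b" avoids the mismatch. Otherwise the list has to be rescaled, roughly as below. The 4096-byte filesystem block size is my assumption for this array, and rescale_bad_blocks is a hypothetical helper, not part of e2fsprogs.

```python
def rescale_bad_blocks(blocks, from_size, to_size):
    """Convert block numbers counted in `from_size` bytes to `to_size` bytes.

    Several small blocks can fall inside one larger block, so the result
    is de-duplicated and sorted.
    """
    if to_size % from_size != 0:
        raise ValueError("target block size must be a multiple of the source")
    factor = to_size // from_size
    return sorted({block // factor for block in blocks})


# Example: five consecutive 1 KiB blocks reported by a default badblocks run.
print(rescale_bad_blocks([8, 9, 10, 11, 12], 1024, 4096))
```

Running badblocks with the matching "-b" value in the first place is the simpler route, since the output file can then be fed to "-l" unchanged.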
I still think there is a problem with e2fsck and fdisk, though, or possibly
with the libraries upon which they depend. I monitored the system log while
working on this problem. The following errors coincided with the
command/event shown. I'll be updating most of this stuff to the latest
versions with Aurora 0.3 (Equivalent to RHL 7.3). Hopefully, some of these
problems will be fixed then.
=======================
[root@winggear root]# e2fsck -c /dev/md0
e2fsck 1.23, 15-Aug-2001 for EXT2 FS 0.5b, 95/08/09
Checking for bad blocks (read-only test): 81504/ 30942912
-----------------------
From /var/log/messages:
-----------------------
Jul 2 10:17:27 winggear kernel: md: badblocks(pid 1175) used obsolete MD
ioctl, upgrade your software to use new ictls.
=======================
=======================
[root@winggear samba]# fdisk /dev/sdi
Command (m for help):
-----------------------
From /var/log/messages:
-----------------------
Jul 2 09:08:56 winggear kernel: sys32_ioctl(fdisk:1588): Unknown cmd fd(3)
cmd(00000330) arg(effffb10)
=======================
> Still, if you suspect bad sectors, a low level format is the
> first order of
> the day.
> If this marks MANY sectors as bad, it is likely the drive is
> either dying,
> or a head skip occurred in the past.
Whatever the term "low-level format" means to you, I certainly agree that
multiple bad blocks could be a signal of impending doom, especially if there
is a growing number of them. Even if the drive was formatted with "spare"
cylinders, the inevitable can only be delayed.
--Cal Webster
* Re: md: badblocks(pid 1216) used obsolete MD ioctl
From: Diamon @ 2002-07-02 21:13 UTC
To: linux-raid
<forgot to CC: the list, my bad>
You might want to look into a low-level format tool called sformat if
memory serves. I seem to recall it works for a lot of platforms.
Adaptec SCSI BIOS can properly spot and reallocate bad blocks rather
well. None of the tools you named can actually reallocate bad sectors,
unless the underlying drive itself initiates the reallocation when your tool
touches it. If it would do that, I'd think it would have been done already.
A proper low-level format refreshes sector marks and can reorganize a
disk for more optimal use for that specific controller (Move a drive
formatted on a Buslogic card to an Adaptec card and you can see, measure,
and sometimes even HEAR the difference), and even recover any heat-weakened
sectors.
I used to need to low-level format my Seagate 18GB LVD2 drive every 6-8
months or so until I got it below 45C (about 113F) from
the 55C it had run at before. Now that it's below 45C I have no problems
with it.
Properly twisted LVD cabling should have very minimal crosstalk and
power loss. I'd doubt that providing bus term power from the drives would
do any good, but that's just my opinion.
Anyway, I hope some of this helps.
* Re: md: badblocks(pid 1216) used obsolete MD ioctl
From: Diamon @ 2002-07-02 21:29 UTC
To: linux-raid; +Cc: kc130iseo
<-- Snip snip -->
> [snip]
> > These commands are SCSI command set mode commands, and after the drive
> > receives the instruction it performs the format itself.
> > mke2fs and so on only do high level filesystem formats.
Exactly.
> Of course, most storage people know that the "CHS" values vendors assign to
> a drive almost never represent the physical geometry. The drive and/or
> controller must be capable of translating to "physical" locations for
> read/write operations. I assume that the "low-level" format you refer to
> operates at this level.
SCSI doesn't really have a CHS, and never has. It pretends to, but a
SCSI device as I recall is actually (at least internally) addressed by
sectors, regardless of where on the disk the sectors are actually located.
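To illustrate the point, here is a minimal sketch of the conversion between a linear sector number and the notional geometry fdisk printed earlier in this thread (19 heads, 248 sectors per track). It uses the common convention of 0-based cylinders and heads with 1-based sectors, which is an assumption on my part. Whether the sector number in the kernel's I/O error is relative to the disk or to the partition isn't clear from the log, so the result is only an example.

```python
HEADS = 19              # from the fdisk output earlier in the thread
SECTORS_PER_TRACK = 248


def lba_to_chs(lba, heads=HEADS, spt=SECTORS_PER_TRACK):
    """Linear sector number -> (cylinder, head, sector), sector 1-based."""
    cylinder, rest = divmod(lba, heads * spt)
    head, sector = divmod(rest, spt)
    return cylinder, head, sector + 1


def chs_to_lba(cylinder, head, sector, heads=HEADS, spt=SECTORS_PER_TRACK):
    """Inverse of lba_to_chs under the same convention."""
    return (cylinder * heads + head) * spt + (sector - 1)


chs = lba_to_chs(32772736)     # the sector from the I/O error
print(chs, chs_to_lba(*chs))
```

The drive itself only ever sees the linear number. The CHS triple is a fiction kept for the benefit of partitioning tools.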
> A crude overview (my understanding): During a "low-level" format the
> controller, activating machine instructions stored on the drive, will scan
> for defects then re-write BOT (Beginning Of Track) markers and sector
> addresses on the media, skipping the bad spots it found and storing them
> for future reference in some non-volatile medium (often it will do more).
> "fdisk" and the tools in the e2fsprogs package merely overlay partition,
> sector, and block data atop this low-level framework to create and manage
> the filesystems.
Actually, the controller usually should do nothing. There is a command
to pass to the drive to tell it to format itself, and off it goes. Most if
not all drives do no error-checking during the format operation, trusting
their "Media Verify" to be done later. SCSI drives don't skip bad sectors.
If the drive really knows it's bad and has been allowed to, it will
reallocate the sector, but oddly most drives are set to NOT reallocate on
Read Errors. Not sure why.
> > > > It does no harm to enable it on the drives as well, and makes sure
> > > > the attenuation is not a problem.
> [snip]
> > >in an already warm environment. I've never found this (current
> > >drain) to be a factor with cable lengths under 8-feet, even
> > >with single-ended segments. As I'm sure you are aware,
> > >differential SCSI has substantially extended the range of SCSI
> > >signals on longer cables.
> >
> > True. Still, I have seen many cases where this helped.
>
> Thanks, I'll keep that in mind. I know from my own experience that there
> are times when a certain action fixes a problem even though it doesn't
> seem to make sense at the time.
Arrgh, how true that is...
> Often it is more expedient to just try something that might work, rather
> than to enter an in-depth analysis. I cannot let things like this go
> indefinitely, though. I've got to know "why". When encountering these
> situations, I usually make time later to conduct a forensic analysis. Over
> the years I've found that there is usually a contributing factor, often
> related to the "interim fix" I was forced to employ.
>
> > >Okay, I've got to ask now. What benefit could I expect from pulling a
> > >drive from my array and plugging it into a PC? What could I do on the
> > >PC that I could not do with one of the utilities mentioned above?
> >
> > Low level SCSI format command.
> > This marks bad all defective sectors ON THE DRIVE.
> > SCSI drives have factory and "grown" defect tables in their firmware,
> > and this is more effective and reliable than filesystem tools ability
> > to mark bad sectors.
Yes, the drive usually knows best if it has a problem... I think the
glist (grown defect list, as opposed to the plist, or primary list) is
stored on disk in a reserved sector, and woe betide THAT sector going bad.
:P I think sformat can scrub the glist as well. Again I can't swear to
that, it's been years since I used sformat.
> Okay, I've used low-level SCSI format for that purpose many times on
> "Windoze" PC's. In fact, I've searched in vain for utilities that do this
> on a Sparc Linux machine. Admittedly, having the bad sectors stored on the
> drive itself prevents losing track of them, making it more reliable.
> However, I still don't see how it is more effective than Linux e2fsprogs
> at detecting and marking them.
E2fsprogs only look at the high-level, not the disk itself. If your
disk has reallocate on error disabled, all the formatting in the world
(except low-level formats) will do nothing.
> I suppose that, if there were many of these bad spots to manage, there may
> be some performance gain by relieving the filesystem/kernel drivers of the
> burden. I don't see how physically re-mapping a few bad areas is
> significantly more efficient than simply allowing the filesystem to avoid
> them. To low-level format drives on my Sparc Linux machines I would have
> to disconnect and remove the chassis from the 19-inch rack, extract the
> drive, open a SCSI-capable PC and install the drive, power it up and run
> the format, then re-install everything. This is way too much trouble for
> the expected benefit, at least for my situation.
Again, check into sformat. I can't get a good URL currently, but I
think the current version is 3.5 from what I see, and it's from that fellow
Joerg Schilling who does cdrecord. A good program, but you need generic
SCSI devices to work it, as sformat bypasses the SCSI driver to operate.
> --Cal Webster
>
> P.S. Why don't you post to the list?
I did this time, forgot on the first message. :)
Diamon
* RE: md: badblocks(pid 1216) used obsolete MD ioctl
From: Cal Webster @ 2002-07-02 21:30 UTC
To: Maurice Hilarius; +Cc: linux-raid
> -----Original Message-----
> From: Maurice Hilarius [mailto:maurice@harddata.com]
> Sent: Tuesday, July 02, 2002 11:42 AM
> To: cwebster@ec.rr.com
> Subject: RE: md: badblocks(pid 1216) used obsolete MD ioctl
>
>
[snip]
> These commands are SCSI command set mode commands, and after the drive
> receives the instruction it performs the format itself.
> mke2fs and so on only do high level filesystem formats.
Of course, most storage people know that the "CHS" values vendors assign to a
drive almost never represent the physical geometry. The drive and/or
controller must be capable of translating to "physical" locations for
read/write operations. I assume that the "low-level" format you refer to
operates at this level.
A crude overview (my understanding): During a "low-level" format the
controller, activating machine instructions stored on the drive, will scan
for defects then re-write BOT (Beginning Of Track) markers and sector
addresses on the media, skipping the bad spots it found and storing them for
future reference in some non-volatile medium (often it will do more).
"fdisk" and the tools in the e2fsprogs package merely overlay partition,
sector, and block data atop this low-level framework to create and manage
the filesystems.
> > > It does no harm to enable it on the drives as well, and makes sure the
> > > attenuation is not a problem.
[snip]
> >in an already warm environment. I've never found this (current
> >drain) to be a factor with cable lengths under 8-feet, even
> >with single-ended segments. As I'm sure you are aware,
> >differential SCSI has substantially extended the range of SCSI
> >signals on longer cables.
>
> True. Still, I have seen many cases where this helped.
Thanks, I'll keep that in mind. I know from my own experience that there are
times when a certain action fixes a problem even though it doesn't seem to
make sense at the time.
Often it is more expedient to just try something that might work, rather
than to enter an in-depth analysis. I cannot let things like this go
indefinitely, though. I've got to know "why". When encountering these
situations, I usually make time later to conduct a forensic analysis. Over
the years I've found that there is usually a contributing factor, often
related to the "interim fix" I was forced to employ.
> >Okay, I've got to ask now. What benefit could I expect from pulling a
> >drive from my array and plugging it into a PC? What could I do on the PC
> >that I could not do with one of the utilities mentioned above?
>
> Low level SCSI format command.
> This marks bad all defective sectors ON THE DRIVE.
> SCSI drives have factory and "grown" defect tables in their firmware, and
> this is more effective and reliable than filesystem tools ability to mark
> bad sectors.
Okay, I've used low-level SCSI format for that purpose many times on
"Windoze" PC's. In fact, I've searched in vain for utilities that do this on
a Sparc Linux machine. Admittedly, having the bad sectors stored on the
drive itself prevents losing track of them, making it more reliable.
However, I still don't see how it is more effective than Linux e2fsprogs at
detecting and marking them.
I suppose that, if there were many of these bad spots to manage, there may
be some performance gain by relieving the filesystem/kernel drivers of the
burden. I don't see how physically re-mapping a few bad areas is
significantly more efficient than simply allowing the filesystem to avoid
them. To low-level format drives on my Sparc Linux machines I would have to
disconnect and remove the chassis from the 19-inch rack, extract the drive,
open a SCSI-capable PC and install the drive, power it up and run the format,
then re-install everything. This is way too much trouble for the expected
benefit, at least for my situation.
--Cal Webster
P.S. Why don't you post to the list?