* Determining filename from absolute sector failure in raid0
From: Bryce @ 2008-10-07 14:03 UTC (permalink / raw)
To: Linux RAID Mailing List
I have a RAID0 made up of four identical 300 GB drives:
Disk /dev/hdf: 300.0 GB, 300001443840 bytes
255 heads, 63 sectors/track, 36473 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x0009f25f

   Device Boot      Start         End      Blocks   Id  System
/dev/hdf1   *           1       36473   292969341   fd  Linux raid autodetect
md1 : active raid0 hdk1[0] hdg1[2] hdf1[1] hde1[3]
1171876864 blocks 256k chunks
hdf recently wobbled, and the kernel happily dumped out the following
message (20-odd times):
hdf: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdf: dma_intr: error=0x40 { UncorrectableError }, LBAsect=216521307, high=12, low=15194715, sector=216521303
ide: failed opcode was: unknown
end_request: I/O error, dev hdf, sector 216521303
terrific.
smartctl shows:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   252   252   063    Pre-fail  Always       -       16
196 Reallocated_Event_Count 0x0008   251   251   000    Old_age   Offline      -       2
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       1
198 Offline_Uncorrectable   0x0008   252   252   000    Old_age   Offline      -       1
So while a few sectors have been remapped, the remapping table is barely
used, and I'm reasonably confident the disk is 'OK' in terms of normal
operation.
Now my question is: how the hell do I determine which file it was trying
to access? (I just want to be sure that whatever file it was fiddling
with isn't a steaming pile of poo now.)
Normally, if it were a single disk, it would be reasonably straightforward
to work out which file was involved with debugfs: icheck <block>, then
ncheck <inode from the previous operation>.
The issue I have with RAID0 is: exactly which block should I point
debugfs at, given that the kernel reported an absolute address on a
physical disk and no logical information about where in the filesystem
it was at the time?
tune2fs -l /dev/md1
Block count: 292969216
Block size: 4096
mdadm --examine /dev/hdf1
/dev/hdf1:
Magic : a92b4efc
Version : 00.90.00
UUID : 2132d2e6:a2955ac2:6ad2f3dc:52b3dfd6
Creation Time : Wed May 26 18:34:06 2004
Raid Level : raid0
Used Dev Size : 292969216 (279.40 GiB 300.00 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 1
Update Time : Wed May 26 18:34:06 2004
State : active
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : bd5e5781 - correct
Events : 0.5
Chunk Size : 256K
Number Major Minor RaidDevice State
this 1 33 65 1 active sync /dev/hdf1
0 0 33 1 0 active sync /dev/hde1
1 1 33 65 1 active sync /dev/hdf1
2 2 34 1 2 active sync /dev/hdg1
3 3 34 65 3 active sync   <--- empty. Bug? (expected /dev/hdk1)
mdadm --version
mdadm - v2.6.2 - 21st May 2007
Would I be correct in assuming that I start my offset from hde1? Do I
need some other funky math to account for striping/blocking?
Or I could just say 'to hell with it' and listen to elevator music all day.
Phil
=--=
* RE: Determining filename from absolute sector failure in raid0
From: David Lethe @ 2008-10-07 14:29 UTC (permalink / raw)
To: Bryce, Linux RAID Mailing List
What you need is a utility program that translates a physical device and
offset into a logical block in md0, and then something that maps a
logical block in md0 to a file in a filesystem.
Unfortunately, I am not aware of any off-the-shelf utility that does
this, but put me down as very interested in such a solution myself.
David
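A minimal Python sketch of the RAID0 half of that translation, for this array only (disk_to_fs_block is a hypothetical helper, not an existing utility). It assumes equal-size members, so the array is a single RAID0 zone in which array chunk k lives on slot k mod n at member chunk k div n; a v0.90 superblock, which md keeps at the end of each member, so array data starts at sector 0 of the partition; and a filesystem placed directly on the md device. It also assumes hdf1 is slot 1, which is what both /proc/mdstat (hdf1[1]) and mdadm --examine (RaidDevice 1) report. The input is the drive's LBAsect, which counts from the start of the whole disk, not the partition.

```python
# Sketch: whole-disk sector on a RAID0 member -> md sector -> ext2/3 block.
# ASSUMPTIONS (not taken from any md tool): equal-size members, i.e. one
# RAID0 zone; v0.90 superblock at the end of each member, so array data
# starts at sector 0 of the partition; filesystem directly on the md device.

SECTOR_SIZE = 512


def disk_to_fs_block(abs_sector: int, part_start: int, slot: int,
                     raid_disks: int, chunk_bytes: int,
                     fs_block_bytes: int) -> tuple[int, int]:
    """Hypothetical helper: return (md_sector, fs_block)."""
    member_sector = abs_sector - part_start      # sector within the partition
    if member_sector < 0:
        raise ValueError("sector lies before the start of the partition")
    chunk_sectors = chunk_bytes // SECTOR_SIZE   # 256 KiB -> 512 sectors
    member_chunk, offset = divmod(member_sector, chunk_sectors)
    # Array chunks are laid out round-robin: chunk k is on slot k % raid_disks,
    # so member chunk c of `slot` is array chunk c * raid_disks + slot.
    md_chunk = member_chunk * raid_disks + slot
    md_sector = md_chunk * chunk_sectors + offset
    fs_block = md_sector * SECTOR_SIZE // fs_block_bytes
    return md_sector, fs_block


if __name__ == "__main__":
    # Thread values: LBAsect 216521307, hdf1 starts at sector 63,
    # hdf1 is slot 1 of 4, 256K chunks, 4096-byte filesystem blocks.
    print(disk_to_fs_block(216521307, 63, 1, 4, 256 * 1024, 4096))
```

The partition start of 63 sectors follows from the fdisk listing (cylinder 1; 36473 x 16065 - 63 sectors is exactly 292969341 KiB), and `fdisk -lu /dev/hdf` would show it in sectors directly. With these values the sketch yields md sector 866085404 and filesystem block 108260675; the request's start sector, 216521303, falls in the same 4 KiB block. That block number is what icheck expects.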
* Re: Determining filename from absolute sector failure in raid0
From: Bryce @ 2008-10-07 16:34 UTC (permalink / raw)
To: Linux RAID Mailing List
Well, another part of the 'what do I do now?' question is that
/proc/mdstat shows the drives assembled thusly:
hdk1[0] hdg1[2] hdf1[1] hde1[3]
while mdadm --examine shows the order as hde 0, hdf 1, hdg 2, hdk 3.
Err, what's the actual order?
Phil
=--=
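For this particular failure the discrepancy may not matter, since both listings give hdf1 index 1. To settle the order of the other members without trusting either listing, one option is to compare raw data: pick an md sector inside array chunk k, read it from the md device, and read each member at the corresponding member offset; the member whose data matches holds slot k. A Python sketch of that check follows (md_to_member and members_matching are hypothetical helpers), assuming a single-zone RAID0 with data starting at sector 0 of each member (v0.90 superblocks). It needs read access to the raw devices and should be pointed at readable sectors, not the failing one.

```python
# Sketch: identify which member holds a given RAID0 slot by comparing data.
# ASSUMPTIONS: equal-size members (one RAID0 zone) and array data starting at
# sector 0 of every member (v0.90 superblock at the end). Hypothetical helpers.

SECTOR_SIZE = 512


def md_to_member(md_sector: int, raid_disks: int, chunk_sectors: int):
    """Inverse RAID0 mapping: return (slot, member_sector)."""
    md_chunk, offset = divmod(md_sector, chunk_sectors)
    slot = md_chunk % raid_disks                 # chunks rotate over slots
    member_sector = (md_chunk // raid_disks) * chunk_sectors + offset
    return slot, member_sector


def read_sector(path: str, sector: int) -> bytes:
    with open(path, "rb") as f:
        f.seek(sector * SECTOR_SIZE)
        return f.read(SECTOR_SIZE)


def members_matching(md_path: str, members: list[str], md_sector: int,
                     raid_disks: int, chunk_sectors: int) -> list[str]:
    """Members whose data at the mapped offset equals md_sector's data.

    One match is expected; several usually mean common content (e.g. zeros).
    """
    _, member_sector = md_to_member(md_sector, raid_disks, chunk_sectors)
    wanted = read_sector(md_path, md_sector)
    return [m for m in members if read_sector(m, member_sector) == wanted]
```

For example, call members_matching("/dev/md1", ["/dev/hde1", "/dev/hdf1", "/dev/hdg1", "/dev/hdk1"], k * 512, 4, 512) for k = 0..3, since 256K chunks are 512 sectors. A chunk full of zeros will match every member, so if more than one device comes back, try a sector further into the array.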
* RE: Determining filename from absolute sector failure in raid0
From: David Lethe @ 2008-10-07 19:04 UTC (permalink / raw)
To: Bryce, Linux RAID Mailing List
It is not that easy: you have to look at the topology of the RAID and
factor in starting/ending block numbers, journaling, chunk/stripe size,
RAID level, etc., to obtain the logical block number that corresponds to
a given physical disk, block number, and RAID configuration.
Let's call this magic routine PhysToLog(int mdNumber, char *PhysDev,
size_t PhysBlockNumber);
In a perfect world, that routine would query the md configuration and
return a 64-bit logical block number (let's future-proof the code). With
parity RAID, it would also need to return the physical stripe number that
has to be rebuilt in the event the bad block maps to parity (XOR) data.
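As a sketch of what a RAID0-only PhysToLog could look like in Python (phys_to_log is hypothetical, like PhysToLog itself), the parameters can be read from sysfs on kernels that expose them: /sys/block/<md>/md/level, raid_disks and chunk_size (in bytes), the per-member /sys/block/<md>/md/dev-<member>/slot and offset (data offset in sectors), and the partition's /sys/block/<disk>/<part>/start. It still assumes equal-size members (a single RAID0 zone) and returns an md sector rather than a filesystem block; Python integers do not overflow, which covers the 64-bit concern.

```python
# Sketch of a RAID0-only PhysToLog that reads its parameters from sysfs.
# ASSUMPTIONS: a kernel exposing md's sysfs attributes (level, raid_disks,
# chunk_size, dev-*/slot, dev-*/offset) and partition "start"; equal-size
# members, i.e. one RAID0 zone. phys_to_log is a hypothetical name.
import os

SECTOR_SIZE = 512


def _read(path: str) -> str:
    with open(path) as f:
        return f.read().strip()


def phys_to_log(md: str, disk: str, part: str, disk_sector: int,
                sysfs: str = "/sys") -> int:
    """Map a whole-disk sector on `disk` (member partition `part`) to an
    md sector. Python ints are unbounded, so 64-bit results are fine."""
    mdir = os.path.join(sysfs, "block", md, "md")
    level = _read(os.path.join(mdir, "level"))
    if level != "raid0":
        raise NotImplementedError("only raid0 is handled, got " + level)
    raid_disks = int(_read(os.path.join(mdir, "raid_disks")))
    chunk_sectors = int(_read(os.path.join(mdir, "chunk_size"))) // SECTOR_SIZE
    dev = os.path.join(mdir, "dev-" + part)
    slot = int(_read(os.path.join(dev, "slot")))
    data_offset = int(_read(os.path.join(dev, "offset")))  # in sectors
    part_start = int(_read(os.path.join(sysfs, "block", disk, part, "start")))

    member_sector = disk_sector - part_start - data_offset
    if member_sector < 0:
        raise ValueError("sector lies before the member's data area")
    member_chunk, offset = divmod(member_sector, chunk_sectors)
    return (member_chunk * raid_disks + slot) * chunk_sectors + offset
```

On the affected machine this would be called as phys_to_log("md1", "hdf", "hdf1", 216521307); dividing the result by 8 gives the 4096-byte filesystem block. Parity levels are deliberately rejected, since they would need the stripe and parity handling described above.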
Since you have RAID0, a lot of those variables drop out, and you can work
this information out yourself. Part 2, which can be done independently,
is mapping the logical block to a file.
Let's assume you manually take it apart and now have a logical block
number in md0 that maps to the physical block ...
Look at this link; it will help you convert a known "bad block" to the
file's inode, which is what you need:
http://mail-index.netbsd.org/tech-userlevel/2003/04/17/0000.html
Too bad there isn't a mapbadblock2file command that ties this all up
into a nice pretty package for everybody. It would be a big hit in
dealing with data corruption and drive failures. I'm doing a 2TB rebuild
as we speak scanning for corrupted blocks.