* Fwd: xfs I/O error
2008-02-23 21:46 xfs I/O error Rekrutacja119
@ 2008-02-23 22:00 ` Rekrutacja119
2008-02-23 22:08 ` Iustin Pop
` (2 subsequent siblings)
3 siblings, 0 replies; 19+ messages in thread
From: Rekrutacja119 @ 2008-02-23 22:00 UTC (permalink / raw)
To: xfs
one more thing, i got this from the logs when XFS went down:
Feb 23 22:58:20 debian kernel: [<c021f67e>] xfs_free_ag_extent+0x49e/0x750
Feb 23 22:58:20 debian kernel: [<c0221270>] xfs_free_extent+0xe0/0x110
Feb 23 22:58:20 debian kernel: [<c0221270>] xfs_free_extent+0xe0/0x110
Feb 23 22:58:20 debian kernel: [<c025d24c>] xlog_grant_push_ail+0x3c/0x140
Feb 23 22:58:20 debian kernel: [<c022c1fa>] xfs_bmap_finish+0x13a/0x190
Feb 23 22:58:20 debian kernel: [<c0236730>] xfs_bunmapi+0x0/0xfb0
Feb 23 22:58:20 debian kernel: [<c025442a>] xfs_itruncate_finish+0x27a/0x400
Feb 23 22:58:20 debian kernel: [<c0276b3c>] xfs_inactive+0x49c/0x510
Feb 23 22:58:20 debian kernel: [<c0282697>] xfs_fs_clear_inode+0x77/0xc0
Feb 23 22:58:20 debian kernel: [<c017976f>] clear_inode+0x8f/0x140
Feb 23 22:58:20 debian kernel: [<c014f827>] truncate_inode_pages+0x17/0x20
Feb 23 22:58:20 debian kernel: [<c017990d>] generic_delete_inode+0xed/0x100
Feb 23 22:58:20 debian kernel: [<c0178f4c>] iput+0x5c/0x70
Feb 23 22:58:20 debian kernel: [<c017052f>] do_unlinkat+0xef/0x150
Feb 23 22:58:20 debian kernel: [<c0102972>] sysenter_past_esp+0x5f/0x85
Feb 23 22:58:20 debian kernel: [<c0390000>] nf_nat_ftp+0xd0/0x100
Feb 23 22:58:20 debian kernel: =======================
Feb 23 22:58:20 debian kernel: xfs_force_shutdown(md0,0x8) called from line 4258 of file fs/xfs/xfs_bmap.c. Return address = 0xc02830bc
please, can i change it somehow? i'm so desperate i think i will try to
change this line in kernel, no matter what consequences it might have (if i
don't allow access to this disk right away, i might as well delete the whole
thing, so it's very important)
also, xfs_repair stops on :
cannot read inode 44610144, disk block 22305072, cnt 16
---------- Forwarded message ----------
From: Rekrutacja119 <rekrutacja119@gmail.com>
Date: 23-02-2008 22:46
Subject: xfs I/O error
To: xfs@oss.sgi.com
hello, is there any way to force XFS to ignore I/O errors? it seems it is
shutting down the fs when it encounters any error.
The problem is that i can't mark badsectors, as XFS doesn't support bad
sector marking, but i also cannot access any correct data on partition,
because when i try to access damaged sector, the whole fs goes down.
any idea why?
i use xfsprogs 2.9.4, my xfs is array made from 3 HDs, RAID 0, and one of
them is getting some bad sectors. i cannot replace it currently.
after i ran xfs_repair on it, i was able to mount it and access the data,
but when somebody tries to access bad data, the whole XFS goes down. i don't
want that, and i also don't have space to xfs_metadump the whole array to other
disks.
i tried scanning the whole disk with badblocks (badblocks -c 1 -s -v /dev/sdb),
and then running
dd if=/dev/zero of=/dev/sdb count=1 bs=1 seek=NUMBER_FROM_BADBLOCKOUTPUT
but every block was written fine! (which is strange i guess), and it didn't
help.
please advise me on anything other than switching the drive (i will do it,
but can't right now) or dumping the whole thing, as that needs too much space.
the easiest solution would be to just ignore errors, and if not, then to
somehow force xfs to mark them as bad sectors (smartctl is showing errors
like for example
# 2  Extended offline    Completed: unknown failure    90%    9395    -
or
Error 8324 occurred at disk power-on lifetime: 9398 hours (391 days + 14 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 30 33 59 e6 Error: UNC at LBA = 0x06593330 = 106509104
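The LBA in that SMART error can be compared with the block numbers badblocks prints, but the units differ. The following is only a sketch under two assumptions that the thread does not confirm: the drive reports 512-byte logical sectors, and badblocks was run on the whole device with its default 1024-byte block size. The function name is made up for illustration.

```shell
#!/bin/sh
# Convert an LBA reported by SMART into a badblocks-style block number.
# Assumption: 512-byte logical sectors and 1024-byte badblocks blocks.
lba_to_badblocks_block() {
    lba=$(( $1 ))                 # accepts decimal or 0x-prefixed hex
    echo $(( lba * 512 / 1024 ))
}

# LBA taken from the SMART error above.
lba_to_badblocks_block 0x06593330   # prints 53254552
```

If badblocks was run with a different -b value, the divisor changes accordingly.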
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: xfs I/O error
2008-02-23 21:46 xfs I/O error Rekrutacja119
2008-02-23 22:00 ` Fwd: " Rekrutacja119
@ 2008-02-23 22:08 ` Iustin Pop
2008-02-23 22:14 ` Rekrutacja119
2008-02-24 3:53 ` Eric Sandeen
2008-02-24 12:17 ` Ragnar Kjørstad
3 siblings, 1 reply; 19+ messages in thread
From: Iustin Pop @ 2008-02-23 22:08 UTC (permalink / raw)
To: Rekrutacja119; +Cc: xfs
On Sat, Feb 23, 2008 at 10:46:52PM +0100, Rekrutacja119 wrote:
> hello, is there any way to force XFS to ignore I/O errors? it seems it is
> shutting down the fs when it encounters any error.
> The problem is that i can't mark badsectors, as XFS doesn't support bad
> sector marking, but i also cannot access any correct data on partition,
> because when i try to access damaged sector, the whole fs goes down.
>
> any idea why?
>
> i use xfsprogs 2.9.4, my xfs is array made from 3 HDs, RAID 0, and one of
> them is getting some bad sectors. i cannot replace it currently.
>
> after i run xfs_repair on it, i was able to mount it and access the data,
> but when somebody tries to access bad data, the whole XFS goes down. i don't
> want that, i also dont have place to xfsmetadump the whole array to another
> disks.
>
> i tried scaning whole disk with badblocks (badblocks -c 1 -s -v /dev/sdb),
> and then running dd if=/dev/zero of=/dev/sdb count=1 bs=1
> seek=NUMBER_FROM_BADBLOCKOUTPUT
>
> but every block was written fine! (which is strange i guess), and it didnt
> help.
I'm not really sure, but the above seems wrong. badblocks has a default
block size of 1024 (-c does something else, not set the block size), and
you use that block number as an offset in bytes for dd (because you set
bs=1).
I would recommend to try the dd again, but with bs=1024. And afterwards,
rerun badblocks and check you have no errors.
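To make the unit mismatch concrete, here is a small shell sketch. It assumes badblocks ran with its default 1024-byte block size; the function name and the sample block number are invented for illustration and do not come from the poster's output.

```shell
#!/bin/sh
# Convert a badblocks block number into a byte offset on the device.
# Assumption: badblocks used its default block size of 1024 bytes
# unless a different size is passed as the second argument.
block_to_byte_offset() {
    block=$1
    block_size=${2:-1024}
    echo $(( block * block_size ))
}

# Hypothetical block number, for illustration only.
bad_block=123456
echo "intended byte offset:   $(block_to_byte_offset "$bad_block")"
# With bs=1, dd treats seek=N as N bytes, so the write lands here instead:
echo "offset hit with bs=1:   $bad_block"
```

With bs=1024 the same seek= value addresses the block that badblocks actually reported.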
As for xfs, I don't think it can do what you want (ignore bad
blocks) if the error is in the metadata sections.
regards,
iustin
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: xfs I/O error
2008-02-23 22:08 ` Iustin Pop
@ 2008-02-23 22:14 ` Rekrutacja119
2008-02-24 9:01 ` Iustin Pop
0 siblings, 1 reply; 19+ messages in thread
From: Rekrutacja119 @ 2008-02-23 22:14 UTC (permalink / raw)
To: Rekrutacja119; +Cc: xfs
so i should use the list i got from badblocks with dd but with bs=1024? are
you sure? i'm not sure what is my block size, but xfs_info says this:
debian:/# xfs_info /dev/md0
meta-data=/dev/md0               isize=256    agcount=32, agsize=5723342 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=183146912, imaxpct=25
         =                       sunit=2      swidth=6 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=24576  blocks=0, rtextents=0
i think i made it with 4K stack size ... well anyways, i should use the list
i got earlier and just try dd but with bs=1024... ? don't want to erase more
than i have to. (i want to do it so i can smartctl -t offline /dev/sdb then,
so maybe it will somehow see that these blocks are broken and mark them)
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: xfs I/O error
2008-02-23 22:14 ` Rekrutacja119
@ 2008-02-24 9:01 ` Iustin Pop
2008-02-24 11:52 ` Rekrutacja119
0 siblings, 1 reply; 19+ messages in thread
From: Iustin Pop @ 2008-02-24 9:01 UTC (permalink / raw)
To: Rekrutacja119; +Cc: xfs
On Sat, Feb 23, 2008 at 11:14:32PM +0100, Rekrutacja119 wrote:
> so i should use the list i got from badblocks with dd but with bs=1024? are
> you sure? i'm not sure what is my block size, but xfs_info says this:
the block size of the filesystem has no relation to the block size
badblocks uses in checking the block device.
>
[...]
>
> i think i made it with 4K stack size ... well anyways, i should use the list
> i got earlier and just try dd but with bs=1024... ? don't want to erase more
> than i have to. (i want to do it so i can smartctl -t offline /dev/sdb then,
> so maybe it will somehow see that these blocks are broken and mark them)
I don't understand what you want to do with smartctl at all. How
would offlining the disk help?
Again, I would:
- take the block list given by badblocks
- verify that each block can't be read first via dd if=/dev/...
of=/dev/null bs=1024 count=1 seek=NUMBER_FROM_BADBLOCK
- if confirmed that you got the right 'bad' blocks, use that for dd
with bs=1024 to write zeroes over them
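One detail worth checking when following these steps: in dd, skip= offsets the input and seek= offsets the output. A read test written with seek= would therefore most likely read the first block of the device rather than the suspect one. The sketch below only prints the two commands so nothing is touched by accident; the device name /dev/sdX, the block number, and the function names are placeholders, not values from this thread.

```shell
#!/bin/sh
# Print (not run) the dd commands for checking and then zeroing one block.
# The device and block number passed in are placeholders for illustration.
read_check_cmd() {
    # skip= offsets the INPUT, which is what a read test needs.
    echo "dd if=$1 of=/dev/null bs=1024 skip=$2 count=1"
}
zero_block_cmd() {
    # seek= offsets the OUTPUT, which is what a write needs.
    echo "dd if=/dev/zero of=$1 bs=1024 seek=$2 count=1"
}

read_check_cmd /dev/sdX 123456
zero_block_cmd /dev/sdX 123456
```

Only a block whose read check fails would be a candidate for the zeroing command.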
As Eric confirmed, you probably already destroyed some good data on the
drive.
regards,
iustin
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: xfs I/O error
2008-02-24 9:01 ` Iustin Pop
@ 2008-02-24 11:52 ` Rekrutacja119
2008-02-24 12:31 ` Iustin Pop
0 siblings, 1 reply; 19+ messages in thread
From: Rekrutacja119 @ 2008-02-24 11:52 UTC (permalink / raw)
To: Rekrutacja119, xfs
smartctl -t offline is i think scheduling a SMART test
i will try what you say, but there is no way to force XFS to not unmount
filesystem if it finds out I/O error?
also, i didnt edit any info, it was just all i got from messages log
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: xfs I/O error
2008-02-24 11:52 ` Rekrutacja119
@ 2008-02-24 12:31 ` Iustin Pop
2008-02-25 16:40 ` Rekrutacja119
0 siblings, 1 reply; 19+ messages in thread
From: Iustin Pop @ 2008-02-24 12:31 UTC (permalink / raw)
To: Rekrutacja119; +Cc: xfs
On Sun, Feb 24, 2008 at 12:52:38PM +0100, Rekrutacja119 wrote:
> smartctl -t offline is i think scheduling a SMART test
ah yes, you're right, I'm sorry.
I would recommend using -t long rather than -t offline. AFAIK, the
'offline' test just updates the attributes but does not scan the whole
disk.
In any case, I don't think a smart test is better than badblocks...
regards,
iustin
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: xfs I/O error
2008-02-24 12:31 ` Iustin Pop
@ 2008-02-25 16:40 ` Rekrutacja119
2008-02-25 17:17 ` Eric Sandeen
0 siblings, 1 reply; 19+ messages in thread
From: Rekrutacja119 @ 2008-02-25 16:40 UTC (permalink / raw)
To: xfs
i tried the dd read with bs=1024, and the dd command went fine! i really
don't understand it :( i won't write zeros to it if it reads fine, since this
sector is probably correct.
any ideas what to do now?
also - still no solution to force XFS to not shutdown itself?
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: xfs I/O error
2008-02-25 16:40 ` Rekrutacja119
@ 2008-02-25 17:17 ` Eric Sandeen
0 siblings, 0 replies; 19+ messages in thread
From: Eric Sandeen @ 2008-02-25 17:17 UTC (permalink / raw)
To: Rekrutacja119; +Cc: xfs
Rekrutacja119 wrote:
> i tried doing dd if with the bs=1024, and the dd command went fine! i really
> don't understand it :( i wont write zeros to it, if it is fine, since this
> sector is probably correct.
>
> any ideas what to do now?
>
> also - still no solution to force XFS to not shutdown itself?
mount it readonly.
xfs isn't going to continue if doing so would further corrupt the fs.
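A read-only mount might look like the sketch below. It only prints the command; /dev/md0 is the array named in this thread, while the mount point /mnt/rescue and the function name are assumptions made for illustration.

```shell
#!/bin/sh
# Print (not run) a read-only mount command for an XFS filesystem.
# The mount point passed in is a placeholder.
ro_mount_cmd() {
    echo "mount -t xfs -o ro $1 $2"
}

ro_mount_cmd /dev/md0 /mnt/rescue
```

If log replay itself fails on the damaged disk, XFS also accepts ro,norecovery, which skips recovery at the cost of possibly showing stale metadata; whether that is needed here is not something the thread establishes.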
-Eric
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: xfs I/O error
2008-02-23 21:46 xfs I/O error Rekrutacja119
2008-02-23 22:00 ` Fwd: " Rekrutacja119
2008-02-23 22:08 ` Iustin Pop
@ 2008-02-24 3:53 ` Eric Sandeen
2008-02-24 12:17 ` Ragnar Kjørstad
3 siblings, 0 replies; 19+ messages in thread
From: Eric Sandeen @ 2008-02-24 3:53 UTC (permalink / raw)
To: Rekrutacja119; +Cc: xfs
Rekrutacja119 wrote:
> hello, is there any way to force XFS to ignore I/O errors? it seems it is
> shutting down the fs when it encounters any error.
It does not shut down on any error; it should only be shutting down on
errors after which it cannot guarantee filesystem consistency.
> The problem is that i can't mark badsectors, as XFS doesn't support bad
> sector marking, but i also cannot access any correct data on partition,
> because when i try to access damaged sector, the whole fs goes down.
>
> any idea why?
Depends on what the sector is and what xfs is doing with it.
(btw the trace you posted in your next messages looks like you edited
out some relevant information)
> i use xfsprogs 2.9.4, my xfs is array made from 3 HDs, RAID 0, and one of
> them is getting some bad sectors. i cannot replace it currently.
xfs can't really help you with your bad hardware ;)
> after i run xfs_repair on it, i was able to mount it and access the data,
> but when somebody tries to access bad data, the whole XFS goes down. i don't
> want that, i also dont have place to xfsmetadump the whole array to another
> disks.
I do not think metadump does what you think it does... it only copies
metadata.
> i tried scaning whole disk with badblocks (badblocks -c 1 -s -v /dev/sdb),
> and then running dd if=/dev/zero of=/dev/sdb count=1 bs=1
> seek=NUMBER_FROM_BADBLOCKOUTPUT
>
> but every block was written fine! (which is strange i guess), and it didnt
> help.
as iustin said, I think you just pretty well clobbered some important
metadata on your disk. badblocks gives you block numbers in 1024-byte units.
You gave dd a block size of 1... then rather than seeking out the
proper number of 1024-byte units, you sought that many bytes, probably
overwriting important stuff at the beginning of your disk (since you wrote
at 1/1024 the offset that you should have).
> please advise me anything other than switching the drive (i will do it,
> can't now though) or dumping the whole thing as i need to much space.
mount it readonly to get to the data you need?
> the easiest solution would be to just ignore errors, and if not, then to
> somehow force xfs to mark them as bad sectors (smartctl is showing errors
> like for example
IMHO marking sectors bad is pointless. If you have a failing drive, it
will only get worse. At best you could use badblocks to try some writes
to remap; assuming you don't get it wrong and just zero out more of your
disk...
-Eric
> # 2  Extended offline    Completed: unknown failure    90%    9395    -
>
> or
>
>
> Error 8324 occurred at disk power-on lifetime: 9398 hours (391 days + 14
> hours)
> When the command that caused the error occurred, the device was active or
> idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 40 51 00 30 33 59 e6 Error: UNC at LBA = 0x06593330 = 106509104
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: xfs I/O error
2008-02-23 21:46 xfs I/O error Rekrutacja119
` (2 preceding siblings ...)
2008-02-24 3:53 ` Eric Sandeen
@ 2008-02-24 12:17 ` Ragnar Kjørstad
[not found] ` <2db2c6b80802250847m2d161f5n276026dae396d3cc@mail.gmail.com>
3 siblings, 1 reply; 19+ messages in thread
From: Ragnar Kjørstad @ 2008-02-24 12:17 UTC (permalink / raw)
To: Rekrutacja119; +Cc: xfs
On Sat, Feb 23, 2008 at 10:46:52PM +0100, Rekrutacja119 wrote:
> the easiest solution would be to just ignore errors, and if not, then to
> somehow force xfs to mark them as bad sectors
If you really wanted to you could probably use dm to map your bad
sectors to another device.
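As a rough sketch of that idea, a device-mapper table with three "linear" targets can pass most of a disk through unchanged and redirect one sector range to a spare device. Everything concrete here is hypothetical: the device names, the sizes, and the function name. All numbers are in 512-byte sectors, and the bad range is assumed to lie strictly inside the disk.

```shell
#!/bin/sh
# Print a device-mapper table that redirects one sector range of a disk
# to a spare device. All values are in 512-byte sectors.
# Arguments: disk, disk size, first bad sector, bad range length, spare device.
remap_table() {
    disk=$1; size=$2; bad=$3; len=$4; spare=$5
    end=$(( bad + len ))
    echo "0 $bad linear $disk 0"                      # sectors before the bad range
    echo "$bad $len linear $spare 0"                  # bad range served by the spare
    echo "$end $(( size - end )) linear $disk $end"   # remainder of the disk
}

# Hypothetical values, for illustration only.
remap_table /dev/sdX 1000000 5000 8 /dev/sdY
```

The printed table could be piped to dmsetup create with a name of your choosing, and the RAID 0 array would then have to be assembled on top of the mapped device instead of the raw disk. The data in the remapped range is still lost; this only stops reads from hitting the failing sectors.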
--
Ragnar Kjørstad
Software Engineer
Platform Computing
^ permalink raw reply [flat|nested] 19+ messages in thread