public inbox for linux-xfs@vger.kernel.org
* can xfs_repair guarantee a complete clean filesystem?
@ 2009-12-01  2:05 hank peng
  2009-12-01  3:54 ` Eric Sandeen
  0 siblings, 1 reply; 13+ messages in thread
From: hank peng @ 2009-12-01  2:05 UTC (permalink / raw)
  To: linux-xfs

When using xfs_repair, I want my XFS filesystem to come out completely
clean and to keep working even if some files are lost. Because we use
XFS in a low-end NAS box, customers want a tool that repairs the
filesystem when it has problems; they accept that some files may be
lost, but they don't want the whole system to stop.
So, I wonder whether xfs_repair or some other tool can do this?

-- 
The simplest is not all best but the best is surely the simplest!

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: can xfs_repair guarantee a complete clean filesystem?
  2009-12-01  2:05 can xfs_repair guarantee a complete clean filesystem? hank peng
@ 2009-12-01  3:54 ` Eric Sandeen
  2009-12-01  4:37   ` hank peng
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Sandeen @ 2009-12-01  3:54 UTC (permalink / raw)
  To: hank peng; +Cc: linux-xfs

hank peng wrote:
> When using xfs_repair, I want my XFS filesystem to come out completely
> clean and to keep working even if some files are lost. Because we use
> XFS in a low-end NAS box, customers want a tool that repairs the
> filesystem when it has problems; they accept that some files may be
> lost, but they don't want the whole system to stop.
> So, I wonder whether xfs_repair or some other tool can do this?
> 

Yes, that is exactly its purpose (any potential bugs notwithstanding...)

-Eric


* Re: can xfs_repair guarantee a complete clean filesystem?
  2009-12-01  3:54 ` Eric Sandeen
@ 2009-12-01  4:37   ` hank peng
  2009-12-01  5:58     ` Eric Sandeen
  0 siblings, 1 reply; 13+ messages in thread
From: hank peng @ 2009-12-01  4:37 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs

2009/12/1 Eric Sandeen <sandeen@sandeen.net>:
> hank peng wrote:
>> When using xfs_repair, I want my XFS filesystem to come out completely
>> clean and to keep working even if some files are lost. Because we use
>> XFS in a low-end NAS box, customers want a tool that repairs the
>> filesystem when it has problems; they accept that some files may be
>> lost, but they don't want the whole system to stop.
>> So, I wonder whether xfs_repair or some other tool can do this?
>>
>
> Yes, that is exactly its purpose (any potential bugs notwithstanding...)
>
Thanks for your reply.
Are there any points I should be aware of when using xfs_repair? I
have encountered cases in which xfs_repair completed successfully, but
errors like "Corruption of in-memory data detected" occurred shortly
after the filesystem was brought back online.
Should I reboot the machine and run xfs_repair before the damaged
filesystem is used again, or are there other options I should use?
In addition, I found through some googling that people say xfs_check
should be run before xfs_repair; is that right?

> -Eric
>



-- 
The simplest is not all best but the best is surely the simplest!


* Re: can xfs_repair guarantee a complete clean filesystem?
  2009-12-01  4:37   ` hank peng
@ 2009-12-01  5:58     ` Eric Sandeen
  2009-12-01  6:34       ` hank peng
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Sandeen @ 2009-12-01  5:58 UTC (permalink / raw)
  To: hank peng; +Cc: linux-xfs

hank peng wrote:
> 2009/12/1 Eric Sandeen <sandeen@sandeen.net>:
>> hank peng wrote:
>>> When using xfs_repair, I want my XFS filesystem to come out completely
>>> clean and to keep working even if some files are lost. Because we use
>>> XFS in a low-end NAS box, customers want a tool that repairs the
>>> filesystem when it has problems; they accept that some files may be
>>> lost, but they don't want the whole system to stop.
>>> So, I wonder whether xfs_repair or some other tool can do this?
>>>
>> Yes, that is exactly its purpose (any potential bugs notwithstanding...)
>>
> Thanks for your reply.
> Are there any points I should be aware of when using xfs_repair? I
> have encountered cases in which xfs_repair completed successfully, but
> errors like "Corruption of in-memory data detected" occurred shortly
> after the filesystem was brought back online.

It's possible that you encountered a bug (in xfs or elsewhere), or
bad hardware...

> Should I reboot the machine and run xfs_repair before the damaged
> filesystem is used again, or are there other options I should use?

Just unmount the filesystem, run repair, and remount.

> In addition, I found through some googling that people say xfs_check
> should be run before xfs_repair; is that right?

There's no need; xfs_check doesn't scale very well, and xfs_repair -n will do
a check-only run if that's what you want.

xfs_check checks a little more than xfs_repair, but xfs_repair simply
rebuilds those things it doesn't check in any case.
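(A minimal sketch of the workflow described above; the device and mount point below are placeholders, not taken from this thread:)

```shell
# Check-then-repair cycle; /dev/sdX1 and /mnt/data are hypothetical
# placeholders for your own device and mount point.
umount /mnt/data            # xfs_repair must run on an unmounted filesystem
xfs_repair -n /dev/sdX1     # -n: check only, report problems, modify nothing
xfs_repair /dev/sdX1        # actual repair pass
mount /dev/sdX1 /mnt/data   # bring the filesystem back online
```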

-Eric

>> -Eric
>>
> 
> 
> 


* Re: can xfs_repair guarantee a complete clean filesystem?
  2009-12-01  5:58     ` Eric Sandeen
@ 2009-12-01  6:34       ` hank peng
  2009-12-01 14:43         ` Eric Sandeen
  0 siblings, 1 reply; 13+ messages in thread
From: hank peng @ 2009-12-01  6:34 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs

2009/12/1 Eric Sandeen <sandeen@sandeen.net>:
> hank peng wrote:
>> 2009/12/1 Eric Sandeen <sandeen@sandeen.net>:
>>> hank peng wrote:
>>>> When using xfs_repair, I want my XFS filesystem to come out completely
>>>> clean and to keep working even if some files are lost. Because we use
>>>> XFS in a low-end NAS box, customers want a tool that repairs the
>>>> filesystem when it has problems; they accept that some files may be
>>>> lost, but they don't want the whole system to stop.
>>>> So, I wonder whether xfs_repair or some other tool can do this?
>>>>
>>> Yes, that is exactly its purpose (any potential bugs notwithstanding...)
>>>
>> Thanks for your reply.
>> Are there any points I should be aware of when using xfs_repair? I
>> have encountered cases in which xfs_repair completed successfully, but
>> errors like "Corruption of in-memory data detected" occurred shortly
>> after the filesystem was brought back online.
>
> It's possible that you encountered a bug (in xfs or elsewhere), or
> bad hardware...
>
>> Should I reboot the machine and run xfs_repair before the damaged
>> filesystem is used again, or are there other options I should use?
>
> Just unmount the filesystem, run repair, and remount.
>
>> In addition, I found through some googling that people say xfs_check
>> should be run before xfs_repair; is that right?
>
> There's no need; xfs_check doesn't scale very well, and xfs_repair -n will do
> a check-only run if that's what you want.
>
> xfs_check checks a little more than xfs_repair, but xfs_repair simply
> rebuilds those things it doesn't check in any case.
>
> -Eric
>
Today, I encountered a problem:
I used "xfs_repair -L" on a damaged filesystem, and it produced a lot
of output, including "moving disconnected inodes to lost+found ...".
Then I remounted the filesystem successfully and decided to remove the
files in the lost+found directory, but rm printed the following
messages:
root@1234dahua:/mnt/Pool_md1/ss1/lost+found# rm -rf *
rm: cannot stat '710': Structure needs cleaning
rm: cannot stat '728': Structure needs cleaning
rm: cannot stat '729': Structure needs cleaning
rm: cannot stat '730': Structure needs cleaning
rm: cannot stat '731': Structure needs cleaning
rm: cannot stat '732': Structure needs cleaning
rm: cannot stat '733': Structure needs cleaning
rm: cannot stat '734': Structure needs cleaning
rm: cannot stat '735': Structure needs cleaning

Other directories and files seem normal to access. Is it not allowed
to delete files in the lost+found directory after a repair? If so,
what should I do?

>>> -Eric
>>>
>>
>>
>>
>
>



-- 
The simplest is not all best but the best is surely the simplest!


* Re: can xfs_repair guarantee a complete clean filesystem?
  2009-12-01  6:34       ` hank peng
@ 2009-12-01 14:43         ` Eric Sandeen
  2009-12-01 15:32           ` hank peng
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Sandeen @ 2009-12-01 14:43 UTC (permalink / raw)
  To: hank peng; +Cc: linux-xfs

hank peng wrote:

> Today, I encountered a problem:
> I used "xfs_repair -L" on a damaged filesystem and a lot of messages

Why did you use -L? Did it fail to mount & replay the log properly?

> output, which included "moving disconnected inodes to lost+found ...".
> Then I remounted the filesystem successfully and decided to remove
> those files in the lost+found directory, but it printed the following
> messages:
> root@1234dahua:/mnt/Pool_md1/ss1/lost+found# rm -rf *
> rm: cannot stat '710': Structure needs cleaning
> rm: cannot stat '728': Structure needs cleaning
> rm: cannot stat '729': Structure needs cleaning
> rm: cannot stat '730': Structure needs cleaning
> rm: cannot stat '731': Structure needs cleaning
> rm: cannot stat '732': Structure needs cleaning
> rm: cannot stat '733': Structure needs cleaning
> rm: cannot stat '734': Structure needs cleaning
> rm: cannot stat '735': Structure needs cleaning

Look at dmesg to see what's gone wrong....

> Other directories and files seem normal to access. Is it not allowed
> to delete files in the lost+found directory after repair? If so,
> what should I do?

This is indicative of a bug or IO error that caused xfs to shut down.
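(Background: "Structure needs cleaning" is simply the kernel's strerror text for EUCLEAN, errno 117, which Linux XFS uses as its filesystem-corruption error EFSCORRUPTED. A quick way to confirm the mapping on Linux, assuming python3 is available:)

```shell
# Print the errno number and message behind "Structure needs cleaning".
python3 -c "import errno, os; print(errno.EUCLEAN, os.strerror(errno.EUCLEAN))"
```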

You haven't mentioned which kernel version, architecture, or
version of xfsprogs you're using yet ... that may offer some clues.

I'm guessing an older kernel and userspace on arm? :)

-Eric


* Re: can xfs_repair guarantee a complete clean filesystem?
  2009-12-01 14:43         ` Eric Sandeen
@ 2009-12-01 15:32           ` hank peng
  2009-12-01 15:44             ` Eric Sandeen
  0 siblings, 1 reply; 13+ messages in thread
From: hank peng @ 2009-12-01 15:32 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs

2009/12/1 Eric Sandeen <sandeen@sandeen.net>:
> hank peng wrote:
>
>> Today, I encountered a problem:
>> I used "xfs_repair -L" on a damaged filesystem and a lot of messages
>
> why did you use -L, did it fail to mount & replay the log properly?
>
I unmounted, mounted, and unmounted again, and then xfs_repair still failed, so I had to use -L.

>> output which include "moving disconnected inodes to lost+found ...".
>> Then I can remount the filesystem successfully and decided to remove
>> those files in lost+found directory, but it printed the following
>> message:
>> root@1234dahua:/mnt/Pool_md1/ss1/lost+found# rm -rf *
>> rm: cannot stat '710': Structure needs cleaning
>> rm: cannot stat '728': Structure needs cleaning
>> rm: cannot stat '729': Structure needs cleaning
>> rm: cannot stat '730': Structure needs cleaning
>> rm: cannot stat '731': Structure needs cleaning
>> rm: cannot stat '732': Structure needs cleaning
>> rm: cannot stat '733': Structure needs cleaning
>> rm: cannot stat '734': Structure needs cleaning
>> rm: cannot stat '735': Structure needs cleaning
>
> Look at dmesg to see what's gone wrong....
>

>> Other directories and files seems normal to access, is it not allowed
>> to delete files in lost+found directory after repair, then what should
>> I do?
>
> This is indicative of a bug or IO error that caused xfs to shut down.
>
> You haven't mentioned which kernel version, architecture, or
> version of xfsprogs you're using yet ... that may offer some clues.
>

> I'm guessing an older kernel and userspace on arm? :)
>
The kernel version is 2.6.23, xfsprogs is 2.9.7, and the CPU is an MPC8548 (powerpc arch).
I am at home now; I can provide more detailed information tomorrow.

> -Eric
>



-- 
The simplest is not all best but the best is surely the simplest!


* Re: can xfs_repair guarantee a complete clean filesystem?
  2009-12-01 15:32           ` hank peng
@ 2009-12-01 15:44             ` Eric Sandeen
  2009-12-02  0:46               ` hank peng
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Sandeen @ 2009-12-01 15:44 UTC (permalink / raw)
  To: hank peng; +Cc: linux-xfs

hank peng wrote:
> 2009/12/1 Eric Sandeen <sandeen@sandeen.net>:
>> hank peng wrote:
>>
>>> Today, I encountered a problem:
>>> I used "xfs_repair -L" on a damaged filesystem and a lot of messages
>> why did you use -L, did it fail to mount & replay the log properly?
>>
> I unmounted, mounted, and unmounted again, and then xfs_repair still failed, so I had to use -L.

Failed how?  that's a bug.  (or, -L did nothing anyway because the log
wasn't dirty after the clean unmount)
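(For reference, the normal way to clear a dirty log is to let the kernel replay it via a mount/unmount cycle; -L zeroes the log and can discard recently-logged metadata updates, so it is a last resort for when mounting is impossible. A hypothetical sequence, with placeholder device and mount point:)

```shell
# Preferred order when xfs_repair refuses to run because of a dirty log;
# /dev/sdX1 and /mnt/data are placeholders.
mount /dev/sdX1 /mnt/data    # kernel replays the log during mount
umount /mnt/data             # clean unmount leaves the log empty
xfs_repair /dev/sdX1         # -L should not be needed now
# Only if the mount itself fails or hangs:
xfs_repair -L /dev/sdX1      # zeroes the log; recent metadata changes are lost
```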

>>> output which include "moving disconnected inodes to lost+found ...".
>>> Then I can remount the filesystem successfully and decided to remove
>>> those files in lost+found directory, but it printed the following
>>> message:
>>> root@1234dahua:/mnt/Pool_md1/ss1/lost+found# rm -rf *
>>> rm: cannot stat '710': Structure needs cleaning
>>> rm: cannot stat '728': Structure needs cleaning
>>> rm: cannot stat '729': Structure needs cleaning
>>> rm: cannot stat '730': Structure needs cleaning
>>> rm: cannot stat '731': Structure needs cleaning
>>> rm: cannot stat '732': Structure needs cleaning
>>> rm: cannot stat '733': Structure needs cleaning
>>> rm: cannot stat '734': Structure needs cleaning
>>> rm: cannot stat '735': Structure needs cleaning
>> Look at dmesg to see what's gone wrong....
>>
> 
>>> Other directories and files seems normal to access, is it not allowed
>>> to delete files in lost+found directory after repair, then what should
>>> I do?
>> This is indicative of a bug or IO error that caused xfs to shut down.
>>
>> You haven't mentioned which kernel version, architecture, or
>> version of xfsprogs you're using yet ... that may offer some clues.
>>
> 
>> I'm guessing an older kernel and userspace on arm? :)
>>
> The kernel version is 2.6.23, xfsprogs is 2.9.7, and the CPU is an MPC8548 (powerpc arch).
> I am at home now; I can provide more detailed information tomorrow.

If there's any possibility to test newer kernel & userspace, that'd 
be great.  Many bugs have been fixed since those versions.

-Eric

>> -Eric
>>
> 
> 
> 


* Re: can xfs_repair guarantee a complete clean filesystem?
  2009-12-01 15:44             ` Eric Sandeen
@ 2009-12-02  0:46               ` hank peng
  2009-12-02  1:08                 ` Eric Sandeen
  0 siblings, 1 reply; 13+ messages in thread
From: hank peng @ 2009-12-02  0:46 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs

2009/12/1 Eric Sandeen <sandeen@sandeen.net>:
> hank peng wrote:
>> 2009/12/1 Eric Sandeen <sandeen@sandeen.net>:
>>> hank peng wrote:
>>>
>>>> Today, I encountered a problem:
>>>> I used "xfs_repair -L" on a damaged filesystem and a lot of messages
>>> why did you use -L, did it fail to mount & replay the log properly?
>>>
>> I unmounted, mounted, and unmounted again, and then xfs_repair still failed, so I had to use -L.
>
> Failed how?  that's a bug.  (or, -L did nothing anyway because the log
> wasn't dirty after the clean unmount)
>
>>>> output which include "moving disconnected inodes to lost+found ...".
>>>> Then I can remount the filesystem successfully and decided to remove
>>>> those files in lost+found directory, but it printed the following
>>>> message:
>>>> root@1234dahua:/mnt/Pool_md1/ss1/lost+found# rm -rf *
>>>> rm: cannot stat '710': Structure needs cleaning
>>>> rm: cannot stat '728': Structure needs cleaning
>>>> rm: cannot stat '729': Structure needs cleaning
>>>> rm: cannot stat '730': Structure needs cleaning
>>>> rm: cannot stat '731': Structure needs cleaning
>>>> rm: cannot stat '732': Structure needs cleaning
>>>> rm: cannot stat '733': Structure needs cleaning
>>>> rm: cannot stat '734': Structure needs cleaning
>>>> rm: cannot stat '735': Structure needs cleaning
>>> Look at dmesg to see what's gone wrong....
>>>
>>
>>>> Other directories and files seems normal to access, is it not allowed
>>>> to delete files in lost+found directory after repair, then what should
>>>> I do?
>>> This is indicative of a bug or IO error that caused xfs to shut down.
>>>
>>> You haven't mentioned which kernel version, architecture, or
>>> version of xfsprogs you're using yet ... that may offer some clues.
>>>
>>
>>> I'm guessing an older kernel and userspace on arm? :)
>>>
>> kernel version is 2.6.23, xfsprogs is 2.9.7, CPU is MPC8548, powerpc arch.
>> I am at home now, Maybe I can provide some detailed information tomorrow.
>
> If there's any possibility to test newer kernel & userspace, that'd
> be great.  Many bugs have been fixed since those versions.
>
We do have a plan to upgrade the kernel to the latest 2.6.31.
BTW, is there some place where I can check the list of bugs fixed across versions?
> -Eric
>
>>> -Eric
>>>
>>
>>
>>
>
>



-- 
The simplest is not all best but the best is surely the simplest!


* Re: can xfs_repair guarantee a complete clean filesystem?
  2009-12-02  0:46               ` hank peng
@ 2009-12-02  1:08                 ` Eric Sandeen
  2009-12-02  1:36                   ` hank peng
  2009-12-02  2:39                   ` hank peng
  0 siblings, 2 replies; 13+ messages in thread
From: Eric Sandeen @ 2009-12-02  1:08 UTC (permalink / raw)
  To: hank peng; +Cc: linux-xfs

hank peng wrote:
> 2009/12/1 Eric Sandeen <sandeen@sandeen.net>:

...

>>> kernel version is 2.6.23, xfsprogs is 2.9.7, CPU is MPC8548, powerpc arch.
>>> I am at home now, Maybe I can provide some detailed information tomorrow.
>> If there's any possibility to test newer kernel & userspace, that'd
>> be great.  Many bugs have been fixed since those versions.
>>
>> We do have a plan to upgrade the kernel to the latest 2.6.31.

Well, I'm just suggesting testing it for now, not necessarily
upgrading your product.  Would just be good to know if the bug you
are seeing persists upstream on ppc.

> BTW, is there some place where I can check the list of bugs fixed across versions?

You can look at the changelogs on kernel.org, for instance:

http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.32

Or with git: git log --pretty=oneline fs/xfs

There isn't a great bug <-> commit <-> kernelversion mapping, I guess.
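(A concrete, hypothetical example of that git invocation, run inside a clone of the kernel tree; the version tags are just examples:)

```shell
# One-line-per-commit history of XFS changes between two releases.
git log --pretty=oneline v2.6.23..v2.6.31 -- fs/xfs
# Narrow to commits whose subject mentions a fix:
git log --oneline v2.6.23..v2.6.31 -- fs/xfs | grep -i fix
```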

-Eric

>> -Eric


* Re: can xfs_repair guarantee a complete clean filesystem?
  2009-12-02  1:08                 ` Eric Sandeen
@ 2009-12-02  1:36                   ` hank peng
  2009-12-02  2:39                   ` hank peng
  1 sibling, 0 replies; 13+ messages in thread
From: hank peng @ 2009-12-02  1:36 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs

2009/12/2 Eric Sandeen <sandeen@sandeen.net>:
> hank peng wrote:
>> 2009/12/1 Eric Sandeen <sandeen@sandeen.net>:
>
> ...
>
>>>> kernel version is 2.6.23, xfsprogs is 2.9.7, CPU is MPC8548, powerpc arch.
>>>> I am at home now, Maybe I can provide some detailed information tomorrow.
>>> If there's any possibility to test newer kernel & userspace, that'd
>>> be great.  Many bugs have been fixed since those versions.
>>>
>> We did have plan to upgrade kernel to latest 2.6.31.
>
> Well, I'm just suggesting testing it for now, not necessarily
> upgrading your product.  Would just be good to know if the bug you
> are seeing persists upstream on ppc.
>
Sadly, our test engineer had already deleted that damaged filesystem,
so I have to run the test again and wait for the problem to happen.
I will switch to a newer kernel for testing as soon as this problem
occurs again and let you know ASAP.

>> BTW, Is there some place where I can check those fixed bug list across versions?
>
> You can look at the changelogs on kernel.org, for instance:
>
> http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.32
>
> Or with git: git log --pretty=oneline fs/xfs
>
> There isn't a great bug <-> commit <-> kernelversion mapping, I guess.
>
> -Eric
>
>>> -Eric
>
>



-- 
The simplest is not all best but the best is surely the simplest!


* Re: can xfs_repair guarantee a complete clean filesystem?
  2009-12-02  1:08                 ` Eric Sandeen
  2009-12-02  1:36                   ` hank peng
@ 2009-12-02  2:39                   ` hank peng
  2009-12-02  3:52                     ` Eric Sandeen
  1 sibling, 1 reply; 13+ messages in thread
From: hank peng @ 2009-12-02  2:39 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs

Hi, Eric:
I think I have reproduced the problem.

# uname -a
Linux 1234dahua 2.6.23 #747 Mon Nov 16 10:52:58 CST 2009 ppc unknown
#mdadm -C /dev/md1 -l5 -n3 /dev/sd{h,c,b}
# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5]
[raid4] [multipath]
md1 : active raid5 sdb[3] sdc[1] sdh[0]
      976772992 blocks level 5, 64k chunk, algorithm 2 [3/2] [UU_]
      [==>..................]  recovery = 13.0% (63884032/488386496)
finish=103.8min speed=68124K/sec

unused devices: <none>
#pvcreate /dev/md1
#vgcreate Pool_md1 /dev/md1
#lvcreate -L 931G -n testlv Pool_md1
# lvdisplay
  --- Logical volume ---
  LV Name                /dev/Pool_md1/testlv
  VG Name                Pool_md1
  LV UUID                jWTgk5-Q6tf-jSEU-m9VZ-K2Kb-1oRW-R7oP94
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                931.00 GB
  Current LE             238336
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:0

#mkfs.xfs -f -ssize=4k /dev/Pool_md1/testlv
#mount /dev/Pool_md1/testlv /mnt/Pool_md1/testlv
Everything was OK; I mounted the filesystem and began to write files
into it through our application software. After a short while, the
problem occurred.
# cd /mnt/Pool_md1/testlv
cd: error retrieving current directory: getcwd: cannot access parent
directories: Input/output error
#dmesg | tail -n 30
--- rd:3 wd:2
 disk 0, o:1, dev:sdh
 disk 1, o:1, dev:sdc
RAID5 conf printout:
 --- rd:3 wd:2
 disk 0, o:1, dev:sdh
 disk 1, o:1, dev:sdc
 disk 2, o:1, dev:sdb
md: recovery of RAID array md1
md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than
200000 KB/sec) for recovery.
md: using 128k window, over a total of 488386496 blocks.
Filesystem "dm-0": Disabling barriers, not supported by the underlying device
XFS mounting filesystem dm-0
Ending clean XFS mount for filesystem: dm-0
Filesystem "dm-0": XFS internal error xfs_trans_cancel at line 1169 of
file fs/xfs/xfs_trans.c.  Caller 0xc019fbf0
Call Trace:
[e8e6dcb0] [c00091ec] show_stack+0x3c/0x1a0 (unreliable)
[e8e6dce0] [c017559c] xfs_error_report+0x50/0x60
[e8e6dcf0] [c0197058] xfs_trans_cancel+0x124/0x140
[e8e6dd10] [c019fbf0] xfs_create+0x1fc/0x63c
[e8e6dd90] [c01ad690] xfs_vn_mknod+0x1ac/0x20c
[e8e6de40] [c007ded4] vfs_create+0xa8/0xe4
[e8e6de60] [c0081370] open_namei+0x5f0/0x688
[e8e6deb0] [c00729b8] do_filp_open+0x2c/0x6c
[e8e6df20] [c0072a54] do_sys_open+0x5c/0xf8
[e8e6df40] [c0002320] ret_from_syscall+0x0/0x3c
xfs_force_shutdown(dm-0,0x8) called from line 1170 of file
fs/xfs/xfs_trans.c.  Return address = 0xc01b0b74
Filesystem "dm-0": Corruption of in-memory data detected.  Shutting
down filesystem: dm-0
Please umount the filesystem, and rectify the problem(s)

What should I do now: run xfs_repair, or move to a newer kernel? Please
let me know if you need other information.

2009/12/2 Eric Sandeen <sandeen@sandeen.net>:
> hank peng wrote:
>> 2009/12/1 Eric Sandeen <sandeen@sandeen.net>:
>
> ...
>
>>>> kernel version is 2.6.23, xfsprogs is 2.9.7, CPU is MPC8548, powerpc arch.
>>>> I am at home now, Maybe I can provide some detailed information tomorrow.
>>> If there's any possibility to test newer kernel & userspace, that'd
>>> be great.  Many bugs have been fixed since those versions.
>>>
>> We did have plan to upgrade kernel to latest 2.6.31.
>
> Well, I'm just suggesting testing it for now, not necessarily
> upgrading your product.  Would just be good to know if the bug you
> are seeing persists upstream on ppc.
>
>> BTW, Is there some place where I can check those fixed bug list across versions?
>
> You can look at the changelogs on kernel.org, for instance:
>
> http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.32
>
> Or with git: git log --pretty=oneline fs/xfs
>
> There isn't a great bug <-> commit <-> kernelversion mapping, I guess.
>
> -Eric
>
>>> -Eric
>
>



-- 
The simplest is not all best but the best is surely the simplest!


* Re: can xfs_repair guarantee a complete clean filesystem?
  2009-12-02  2:39                   ` hank peng
@ 2009-12-02  3:52                     ` Eric Sandeen
  0 siblings, 0 replies; 13+ messages in thread
From: Eric Sandeen @ 2009-12-02  3:52 UTC (permalink / raw)
  To: hank peng; +Cc: linux-xfs

hank peng wrote:
> Hi, Eric:
> I think I have reproduced the problem.
> 
> # uname -a
> Linux 1234dahua 2.6.23 #747 Mon Nov 16 10:52:58 CST 2009 ppc unknown
> #mdadm -C /dev/md1 -l5 -n3 /dev/sd{h,c,b}
> # cat /proc/mdstat
> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5]
> [raid4] [multipath]
> md1 : active raid5 sdb[3] sdc[1] sdh[0]
>       976772992 blocks level 5, 64k chunk, algorithm 2 [3/2] [UU_]
>       [==>..................]  recovery = 13.0% (63884032/488386496)
> finish=103.8min speed=68124K/sec
> 
> unused devices: <none>
> #pvcreate /dev/md1
> #vgcreate Pool_md1 /dev/md1
> #lvcreate -L 931G -n testlv Pool_md1
> # lvdisplay
>   --- Logical volume ---
>   LV Name                /dev/Pool_md1/testlv
>   VG Name                Pool_md1
>   LV UUID                jWTgk5-Q6tf-jSEU-m9VZ-K2Kb-1oRW-R7oP94
>   LV Write Access        read/write
>   LV Status              available
>   # open                 1
>   LV Size                931.00 GB
>   Current LE             238336
>   Segments               1
>   Allocation             inherit
>   Read ahead sectors     auto
>   - currently set to     256
>   Block device           253:0
> 
> #mkfs.xfs -f -ssize=4k /dev/Pool_md1/testlv
> #mount /dev/Pool_md1/testlv /mnt/Pool_md1/testlv
> Everything was OK; I mounted the filesystem and began to write files
> into it through our application software. After a short while, the
> problem occurred.
> # cd /mnt/Pool_md1/testlv
> cd: error retrieving current directory: getcwd: cannot access parent
> directories: Input/output error
> #dmesg | tail -n 30
> --- rd:3 wd:2
>  disk 0, o:1, dev:sdh
>  disk 1, o:1, dev:sdc
> RAID5 conf printout:
>  --- rd:3 wd:2
>  disk 0, o:1, dev:sdh
>  disk 1, o:1, dev:sdc
>  disk 2, o:1, dev:sdb
> md: recovery of RAID array md1
> md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> md: using maximum available idle IO bandwidth (but not more than
> 200000 KB/sec) for recovery.
> md: using 128k window, over a total of 488386496 blocks.
> Filesystem "dm-0": Disabling barriers, not supported by the underlying device
> XFS mounting filesystem dm-0
> Ending clean XFS mount for filesystem: dm-0
> Filesystem "dm-0": XFS internal error xfs_trans_cancel at line 1169 of
> file fs/xfs/xfs_trans.c.  Caller 0xc019fbf0
> Call Trace:
> [e8e6dcb0] [c00091ec] show_stack+0x3c/0x1a0 (unreliable)
> [e8e6dce0] [c017559c] xfs_error_report+0x50/0x60
> [e8e6dcf0] [c0197058] xfs_trans_cancel+0x124/0x140
> [e8e6dd10] [c019fbf0] xfs_create+0x1fc/0x63c
> [e8e6dd90] [c01ad690] xfs_vn_mknod+0x1ac/0x20c
> [e8e6de40] [c007ded4] vfs_create+0xa8/0xe4
> [e8e6de60] [c0081370] open_namei+0x5f0/0x688
> [e8e6deb0] [c00729b8] do_filp_open+0x2c/0x6c
> [e8e6df20] [c0072a54] do_sys_open+0x5c/0xf8
> [e8e6df40] [c0002320] ret_from_syscall+0x0/0x3c
> xfs_force_shutdown(dm-0,0x8) called from line 1170 of file
> fs/xfs/xfs_trans.c.  Return address = 0xc01b0b74
> Filesystem "dm-0": Corruption of in-memory data detected.  Shutting
> down filesystem: dm-0
> Please umount the filesystem, and rectify the problem(s)
> 
> What should I do now: run xfs_repair, or move to a newer kernel? Please
> let me know if you need other information.

Test upstream; if it passes, test kernels in between to see if you can find
out when it got fixed, and maybe you can backport it.

If it fails upstream, we have an unfixed bug and we'll try to help you find it.
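(If upstream does pass, one way to find the fixing commit is git bisect with inverted terms, since the search is for a fix rather than a regression; a sketch, with example tags, assuming a git version that supports --term-old/--term-new:)

```shell
# Bisect for the commit that FIXED the shutdown, using custom terms so
# "broken" marks kernels that still fail and "fixed" marks ones that pass.
git bisect start --term-old broken --term-new fixed
git bisect broken v2.6.23   # known to reproduce the shutdown
git bisect fixed v2.6.31    # assumed to survive the same workload
# At each step: build, boot, rerun the reproducer, then mark the result:
#   git bisect broken    (still shuts down)
#   git bisect fixed     (runs clean)
```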

-Eric

