public inbox for linux-xfs@vger.kernel.org
* Re: Re: kernel oops on debian, 2.6.18-5, large xfs volume
@ 2008-01-25  9:41 lxh
From: lxh @ 2008-01-25  9:41 UTC
  To: xfs

Hi,
======= 2008-01-25 16:01:54 =======

>On Fri, Jan 25, 2008 at 03:16:36PM +0800, lxh wrote:
>> Hello, 
>>    We have dozens of file servers, each with a large 1.5 TB or 2.5 TB XFS
>>    volume on a RAID6 SATA array. Each volume contains about 10,000,000
>>    files. The operating system is Debian GNU/Linux 2.6.18-5-amd64 #1 SMP.
>>    We hit a kernel oops frequently last year.
>> 
>> Here is the oops:
>>  Filesystem "cciss/c0d1": XFS internal error xfs_trans_cancel at line 1138
>>  of file fs/xfs/xfs_trans.c.  Caller 0xffffffff881df006
>>  Call Trace:
>>  [<ffffffff881fed18>] :xfs:xfs_trans_cancel+0x5b/0xfe
>>  [<ffffffff88207006>] :xfs:xfs_create+0x58b/0x5dd
>>  [<ffffffff8820f496>] :xfs:xfs_vn_mknod+0x1bd/0x3c8
>
>Are you running out of space in the filesystem?
We did not run out of space; there is enough free space for writing.
>
>The only vectors I've seen that can cause this are I/O errors
>or ENOSPC during file create after we've already checked that
>this cannot happen. Are there any I/O errors in the log?
>
When we run xfs_repair, it reports nothing unusual.
I suspect this problem is related to the large volume size combined with a huge number of small files. Some of our servers have the same hardware and software but are configured with 1 TB volumes that store large files, and this problem has never happened on them.

>This commit:
>
>http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=45c34141126a89da07197d5b89c04c6847f1171a
>
>which is in 2.6.23 fixed the last known cause of the ENOSPC
>issue, so upgrading the kernel or patching this fix back
>to the 2.6.18 kernel may fix the problem if it is related to
>ENOSPC.
Thank you very much for your help! I will try this patch on some machines.

>
>>  Every time the error occurs, the volume cannot be accessed, so we have to
>>  unmount it, run xfs_repair, and then remount it. This problem seriously
>>  impacts our service.
>
>Anyway, next time it happens, can you please run xfs_check on the
>filesystem first and post the output? If there is no output, then
>the filesystem is fine and you don't need to run repair.

The volume is unusable when this happens, so we run xfs_repair. xfs_repair outputs nothing unusual, but afterwards we can access the volume again. I don't know why.
>
>If it is not fine, can you also post the output of xfs_repair?
>
>Once the filesystem has been fixed up, can you then post the
>output of this command to tell us the space usage in the filesystems?
>
># xfs_db -r -c 'sb 0' -c p <dev>
I will follow your suggestions the next time it happens and then contact you again.
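To answer the space-usage question from that dump, the relevant superblock fields can be totalled with a short script. The following is a minimal sketch, assuming `xfs_db` prints one `field = value` pair per line and that the standard XFS superblock fields `blocksize`, `dblocks`, `fdblocks`, `icount` and `ifree` are present; the helper names `parse_sb` and `summarise` are hypothetical.

```python
"""Summarise space and inode usage from `xfs_db -r -c 'sb 0' -c p <dev>` output.

Sketch only: assumes one "field = value" pair per line in the dump.
parse_sb and summarise are hypothetical helper names.
"""


def parse_sb(text: str) -> dict[str, int]:
    """Collect every integer-valued "field = value" line of the dump."""
    fields: dict[str, int] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if not sep:
            continue  # not a "field = value" line
        try:
            # Base 0 accepts both decimal and 0x-prefixed hex values.
            fields[key.strip()] = int(value.strip(), 0)
        except ValueError:
            pass  # skip non-numeric fields such as uuid or fname
    return fields


def summarise(fields: dict[str, int]) -> dict[str, float]:
    """Convert block counts to sizes and report free space and inode usage."""
    bsize = fields["blocksize"]          # bytes per filesystem block
    total = fields["dblocks"] * bsize    # data blocks in the filesystem
    free = fields["fdblocks"] * bsize    # free data blocks
    return {
        "total_gib": total / 2**30,
        "free_gib": free / 2**30,
        "free_pct": 100.0 * free / total,
        "inodes_allocated": fields["icount"],
        "inodes_free": fields["ifree"],
    }
```

For example, save the dump with `xfs_db -r -c 'sb 0' -c p /dev/cciss/c0d1 > sb.txt` and call `summarise(parse_sb(open("sb.txt").read()))`. A very low `free_pct` would make the ENOSPC explanation more likely.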

>
>Cheers,
>
>Dave.
>-- 
>Dave Chinner
>Principal Engineer
>SGI Australian Software Group

= = = = = = = = = = = = = = = = = = = =
Cheers,			
Luoxiaohua
NetEase.com Inc				 
        
        lxhzju@163.com
          2008-01-25

* kernel oops on debian, 2.6.18-5, large xfs volume
@ 2008-01-25  7:16 lxh
  2008-01-25  8:01 ` David Chinner
From: lxh @ 2008-01-25  7:16 UTC
  To: xfs

Hello, 
We have dozens of file servers, each with a large 1.5 TB or 2.5 TB XFS volume on a RAID6 SATA array. Each volume contains about 10,000,000 files. The operating system is Debian GNU/Linux 2.6.18-5-amd64 #1 SMP. We hit a kernel oops frequently last year.

Here is the oops:
 Filesystem "cciss/c0d1": XFS internal error xfs_trans_cancel at line 1138
 of file fs/xfs/xfs_trans.c.  Caller 0xffffffff881df006
 Call Trace:
 [<ffffffff881fed18>] :xfs:xfs_trans_cancel+0x5b/0xfe
 [<ffffffff88207006>] :xfs:xfs_create+0x58b/0x5dd
 [<ffffffff8820f496>] :xfs:xfs_vn_mknod+0x1bd/0x3c8
 [<ffffffff8027d27d>] default_wake_function+0x0/0xe
 [<ffffffff802200e5>] __up_read+0x13/0x8a
 [<ffffffff881eb682>] :xfs:xfs_iunlock+0x57/0x79
 [<ffffffff88204180>] :xfs:xfs_lookup+0x6c/0x7d
 [<ffffffff802200e5>] __up_read+0x13/0x8a
 [<ffffffff881eb682>] :xfs:xfs_iunlock+0x57/0x79
 [<ffffffff882041ce>] :xfs:xfs_access+0x3d/0x46
 [<ffffffff8820fa4b>] :xfs:xfs_vn_permission+0x14/0x18
 [<ffffffff8020cc7d>] permission+0x87/0xce
 [<ffffffff80208f26>] __link_path_walk+0x16a/0xf3c
 [<ffffffff8022ae52>] mntput_no_expire+0x19/0x8b
 [<ffffffff8020dd5f>] link_path_walk+0xd3/0xe5
 [<ffffffff802381ed>] vfs_create+0xe7/0x12c
 [<ffffffff80218efb>] open_namei+0x18d/0x69c
 [<ffffffff802252f1>] do_filp_open+0x1c/0x3d
 [<ffffffff80217baa>] do_sys_open+0x44/0xc5
 [<ffffffff802584d6>] system_call+0x7e/0x83
  
 Every time the error occurs, the volume cannot be accessed, so we have to unmount it, run xfs_repair, and then remount it. This problem seriously impacts our service.
 Could you help me resolve this problem?

        Luo xiaohua
        lxhzju@163.com
          2008-01-25


Thread overview: 2+ messages
2008-01-25  9:41 Re: kernel oops on debian, 2.6.18-5, large xfs volume lxh
  -- strict thread matches above, loose matches on Subject: below --
2008-01-25  7:16 lxh
2008-01-25  8:01 ` David Chinner
2008-02-21  7:34   ` lxh
