* kernel oops on debian , 2.6.18-5
@ 2007-12-18  9:20 Yann Dupont
  2007-12-18 12:32 ` David Chinner
From: Yann Dupont @ 2007-12-18  9:20 UTC
To: xfs; +Cc: Jacky Carimalo

Hello,

we got a kernel oops, probably in XFS, on a Debian kernel.

This volume is on a SAN + device mapper. It is a 1 TB volume and has been
in service for more than 2 or 3 years. There is a high number of files on
it, as this volume serves an rsyncd to which 200+ servers sync their root
filesystem every day.

Here is the oops:

Dec 16 23:27:32 inchgower kernel: XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1561 of file fs/xfs/xfs_alloc.c. Caller 0xffffffff881857b7
Dec 16 23:27:32 inchgower kernel:
Dec 16 23:27:32 inchgower kernel: Call Trace:
Dec 16 23:27:32 inchgower kernel: [<ffffffff88183ec0>] :xfs:xfs_free_ag_extent+0x19f/0x67f
Dec 16 23:27:32 inchgower kernel: [<ffffffff881857b7>] :xfs:xfs_free_extent+0xa9/0xc9
Dec 16 23:27:32 inchgower kernel: [<ffffffff88192526>] :xfs:xfs_bmap_finish+0xf0/0x169
Dec 16 23:27:32 inchgower kernel: [<ffffffff881b0185>] :xfs:xfs_itruncate_finish+0x172/0x2b3
Dec 16 23:27:32 inchgower kernel: [<ffffffff881c9683>] :xfs:xfs_inactive+0x22e/0x823
Dec 16 23:27:32 inchgower kernel: [<ffffffff8023e8a9>] pagevec_lookup+0x17/0x1e
Dec 16 23:27:32 inchgower kernel: [<ffffffff8022a5b4>] truncate_inode_pages_range+0x1b1/0x277
Dec 16 23:27:32 inchgower kernel: [<ffffffff881d3146>] :xfs:xfs_fs_clear_inode+0xa5/0xec
Dec 16 23:27:32 inchgower kernel: [<ffffffff8022177b>] clear_inode+0xc5/0xf6
Dec 16 23:27:32 inchgower kernel: [<ffffffff8022dfa0>] generic_delete_inode+0xde/0x143
Dec 16 23:27:32 inchgower kernel: [<ffffffff8020cc6b>] dput+0x135/0x153
Dec 16 23:27:32 inchgower kernel: [<ffffffff802355c0>] sys_renameat+0x19b/0x20a
Dec 16 23:27:32 inchgower kernel: [<ffffffff8020bee8>] _atomic_dec_and_lock+0x39/0x57
Dec 16 23:27:32 inchgower kernel: [<ffffffff8022ba1e>] mntput_no_expire+0x19/0x8b
Dec 16 23:27:32 inchgower kernel: [<ffffffff8025d2a2>] ia32_sysret+0x0/0xa
Dec 16 23:27:32 inchgower kernel:
Dec 16 23:27:32 inchgower kernel: xfs_force_shutdown(dm-3,0x8) called from line 4267 of file fs/xfs/xfs_bmap.c. Return address = 0xffffffff88192563
Dec 16 23:27:32 inchgower kernel: Filesystem "dm-3": Corruption of in-memory data detected. Shutting down filesystem: dm-3
Dec 16 23:27:32 inchgower kernel: Please umount the filesystem, and rectify the problem(s)

and the kernel:

inchgower:/var/log# uname -a
Linux inchgower 2.6.18-5-vserver-amd64 #1 SMP Fri Jun 1 00:27:03 UTC 2007 x86_64 GNU/Linux

Please note that it is not the "generic" Debian kernel but the vserver
one, though it is still the stock etch version. We had not seen any
problems with this combination (XFS + Debian kernel-vserver), which is
very widely deployed here. This is a first.

Do you think the problem is due to XFS or to other factors?

Sincerely,

-- 
Yann Dupont, Cri de l'université de Nantes
Tel: 02.51.12.53.91 - Fax: 02.51.12.58.60 - Yann.Dupont@univ-nantes.fr
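
A minimal sketch, not part of the original message, of how the frames of a 2.6.18-style call trace like the one above could be pulled out of syslog text for triage. The names `FRAME_RE` and `parse_frames` are hypothetical, and labelling frames without a `:module:` prefix as "kernel" is only a convention chosen for this example.

```python
import re

# Matches frames such as
#   [<ffffffff88183ec0>] :xfs:xfs_free_ag_extent+0x19f/0x67f
# The ":module:" prefix is optional; core kernel symbols have none.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{16})>\]\s+"
    r"(?::(?P<module>\w+):)?"
    r"(?P<symbol>\w+)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
)


def parse_frames(log_text):
    """Return (module, symbol) tuples in the order they appear in the log."""
    return [
        (m.group("module") or "kernel", m.group("symbol"))
        for m in FRAME_RE.finditer(log_text)
    ]


# Three frames copied from the oops quoted in the message.
sample = """\
Dec 16 23:27:32 inchgower kernel: [<ffffffff88183ec0>] :xfs:xfs_free_ag_extent+0x19f/0x67f
Dec 16 23:27:32 inchgower kernel: [<ffffffff881857b7>] :xfs:xfs_free_extent+0xa9/0xc9
Dec 16 23:27:32 inchgower kernel: [<ffffffff8023e8a9>] pagevec_lookup+0x17/0x1e
"""

frames = parse_frames(sample)
for module, symbol in frames:
    print(module, symbol)
```

Listing frames by module makes it quick to see that the innermost frames of this trace are all in the xfs module, which is consistent with the shutdown being raised from XFS's own consistency checks.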
* Re: kernel oops on debian , 2.6.18-5
  2007-12-18  9:20 kernel oops on debian , 2.6.18-5 Yann Dupont
@ 2007-12-18 12:32 ` David Chinner
  2007-12-18 14:41   ` Yann Dupont
From: David Chinner @ 2007-12-18 12:32 UTC
To: Yann Dupont; +Cc: xfs, Jacky Carimalo

On Tue, Dec 18, 2007 at 10:20:21AM +0100, Yann Dupont wrote:
> Hello, we got a kernel oops, probably in xfs on a debian kernel.
>
> This volume is on SAN + device mapper.
> this is a 1 TB volume. It was in service for more than 2 or 3 years.
> There is a high number of files on it, as this volume serves for a
> rsyncd, where 200+ servers sync their root filesystem on it every day.
>
> here is the oops :
>
> Dec 16 23:27:32 inchgower kernel: XFS internal error
> XFS_WANT_CORRUPTED_GOTO at line 1561 of file fs/xfs/xfs_alloc.c. Caller
> 0xffffffff881857b7
> Dec 16 23:27:32 inchgower kernel:
> Dec 16 23:27:32 inchgower kernel: Call Trace:
> Dec 16 23:27:32 inchgower kernel: [<ffffffff88183ec0>]
> :xfs:xfs_free_ag_extent+0x19f/0x67f

Corrupted freespace btree. What does xfs_check tell you about the
filesystem on dm-3?

> Please note that it is not the "generic" debian kernel, but the vserver
> one - but stock etch version anyway. We had not seen any problems with
> this combination, (xfs + debian kernel-vserver) which is very largely
> deployed here. This is a first.
>
> Do you think the problems is due to xfs or other factors ?

Could be a hardware problem. Could be an XFS problem. Could be a dm problem.
I really can't say from a shutdown message like this - all it tells us is
that a btree block was corrupted by something since the last time it was
checked....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
* Re: kernel oops on debian , 2.6.18-5
  2007-12-18 12:32 ` David Chinner
@ 2007-12-18 14:41   ` Yann Dupont
  2007-12-19  0:38     ` Barry Naujok
From: Yann Dupont @ 2007-12-18 14:41 UTC
To: David Chinner; +Cc: xfs, Jacky Carimalo

David Chinner wrote:
> On Tue, Dec 18, 2007 at 10:20:21AM +0100, Yann Dupont wrote:
>
>> Hello, we got a kernel oops, probably in xfs on a debian kernel.
>>
>> This volume is on SAN + device mapper.
>> this is a 1 TB volume. It was in service for more than 2 or 3 years.
>> There is a high number of files on it, as this volume serves for a
>> rsyncd, where 200+ servers sync their root filesystem on it every day.
>>
>> here is the oops :
>>
>> Dec 16 23:27:32 inchgower kernel: XFS internal error
>> XFS_WANT_CORRUPTED_GOTO at line 1561 of file fs/xfs/xfs_alloc.c. Caller
>> 0xffffffff881857b7
>> Dec 16 23:27:32 inchgower kernel:
>> Dec 16 23:27:32 inchgower kernel: Call Trace:
>> Dec 16 23:27:32 inchgower kernel: [<ffffffff88183ec0>]
>> :xfs:xfs_free_ag_extent+0x19f/0x67f
>>
>
> corrupted freespace btree. what does xfs_check tell you about the
> filesystem on dm-3?
>
>
xfs_check tells me to run xfs_repair -L, as the attempts to mount the FS
to clear the log end in a kernel oops:

XFS internal error XFS_WANT_CORRUPTED_RETURN at line 281 of file fs/xfs/xfs_alloc.c. Caller 0xffffffff88182f74

Call Trace:
[<ffffffff881816ed>] :xfs:xfs_alloc_fixup_trees+0x2fa/0x30b
[<ffffffff88198822>] :xfs:xfs_btree_setbuf+0x1f/0x89
[<ffffffff88182f74>] :xfs:xfs_alloc_ag_vextent+0xbd4/0xf5e
[<ffffffff88183aa5>] :xfs:xfs_alloc_vextent+0x2ce/0x401
[<ffffffff88191a70>] :xfs:xfs_bmapi+0x1068/0x1c85
[<ffffffff881c85f2>] :xfs:kmem_zone_alloc+0x56/0xa3
[<ffffffff8819ca78>] :xfs:xfs_dir2_grow_inode+0xca/0x2d4
[<ffffffff8819d8df>] :xfs:xfs_dir2_sf_to_block+0xad/0x5ba
[<ffffffff881b001b>] :xfs:xfs_inode_item_init+0x1e/0x7a
[<ffffffff881a4348>] :xfs:xfs_dir2_sf_addname+0x19d/0x4cf
[<ffffffff8819d43e>] :xfs:xfs_dir_createname+0xc4/0x134
[<ffffffff881c865d>] :xfs:kmem_zone_zalloc+0x1e/0x2f
[<ffffffff881b001b>] :xfs:xfs_inode_item_init+0x1e/0x7a
[<ffffffff881c6065>] :xfs:xfs_create+0x39d/0x5dd
[<ffffffff881ce702>] :xfs:xfs_vn_mknod+0x1bd/0x3c8
[<ffffffff80220a18>] __up_read+0x13/0x8a
[<ffffffff881aa75e>] :xfs:xfs_iunlock+0x57/0x79
[<ffffffff881c3392>] :xfs:xfs_access+0x3d/0x46
[<ffffffff8819d112>] :xfs:xfs_dir_lookup+0xa2/0x122
[<ffffffff8020e0c5>] link_path_walk+0xd3/0xe5
[<ffffffff80239138>] vfs_create+0xe7/0x12c
[<ffffffff80219430>] open_namei+0x18c/0x6a0
[<ffffffff881cc5bb>] :xfs:xfs_file_open+0x27/0x2c
[<ffffffff80225d1d>] do_filp_open+0x1c/0x3d
[<ffffffff802180e0>] do_sys_open+0x44/0xc5
[<ffffffff8025d2a2>] ia32_sysret+0x0/0xa

Filesystem "dm-1": XFS internal error xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c. Caller 0xffffffff881c6253

Call Trace:
[<ffffffff881bdeac>] :xfs:xfs_trans_cancel+0x5b/0xfe
[<ffffffff881c6253>] :xfs:xfs_create+0x58b/0x5dd
[<ffffffff881ce702>] :xfs:xfs_vn_mknod+0x1bd/0x3c8
[<ffffffff80220a18>] __up_read+0x13/0x8a
[<ffffffff881aa75e>] :xfs:xfs_iunlock+0x57/0x79
[<ffffffff881c3392>] :xfs:xfs_access+0x3d/0x46
[<ffffffff8819d112>] :xfs:xfs_dir_lookup+0xa2/0x122
[<ffffffff8020e0c5>] link_path_walk+0xd3/0xe5
[<ffffffff80239138>] vfs_create+0xe7/0x12c
[<ffffffff80219430>] open_namei+0x18c/0x6a0
[<ffffffff881cc5bb>] :xfs:xfs_file_open+0x27/0x2c
[<ffffffff80225d1d>] do_filp_open+0x1c/0x3d
[<ffffffff802180e0>] do_sys_open+0x44/0xc5
[<ffffffff8025d2a2>] ia32_sysret+0x0/0xa

I have upgraded xfs_repair to the latest version available on Debian
(xfs_repair version 2.9.4).

Lots of errors are reported (I don't have the beginning on the console):

...
data fork in ino 3628932549 claims free block 226749351
data fork in ino 3628932549 claims free block 226749352
data fork in ino 3628932549 claims free block 226749353
data fork in ino 3628932549 claims free block 226749354
data fork in ino 3628932549 claims free block 226749355
data fork in ino 3628932549 claims free block 226749356
data fork in ino 3628932549 claims free block 226749357
data fork in ino 3628932549 claims free block 226749358
data fork in ino 3628932549 claims free block 226749359
data fork in ino 3628932549 claims free block 226749360
data fork in ino 3628932549 claims free block 226749361
data fork in ino 3628932549 claims free block 226749362
data fork in ino 3628932549 claims free block 226749363
imap claims a free inode 3629547632 is in use, correcting imap and clearing inode
        - agno = 28
        - agno = 29
data fork in ino 3894217924 claims free block 243388605
data fork in ino 3894217924 claims free block 243388606
data fork in ino 3899211601 claims free block 243702250
data fork in ino 3899211601 claims free block 243702251
data fork in ino 3899211601 claims free block 243702252
data fork in ino 3907562994 claims free block 244222632
data fork in ino 3907562994 claims free block 244222633
data fork in ino 3907562994 claims free block 244222634
data fork in ino 3907562994 claims free block 244222635
data fork in ino 3907562994 claims free block 244222636
data fork in ino 3910289697 claims free block 244393117
data fork in ino 3910289697 claims free block 244393118
data fork in ino 3910289699 claims free block 244393113
....

and in the end:

        - agno = 31
correcting imap
correcting imap
correcting imap
correcting imap
correcting imap
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0

And now the process seems stuck. There is no activity on the SAN disk.

ps shows this:

root 7885 6466 7885 0 6 1447133 5660020 6 09:55 pts/0 00:00:19 xfs_repair -L /dev/evms/DATAXFS2
root 7885 6466 17190 0 6 1447133 5660020 6 10:16 pts/0 00:00:00 xfs_repair -L /dev/evms/DATAXFS2
root 7885 6466 17191 0 6 1447133 5660020 6 10:16 pts/0 00:00:00 xfs_repair -L /dev/evms/DATAXFS2
root 7885 6466 17192 0 6 1447133 5660020 6 10:16 pts/0 00:00:00 xfs_repair -L /dev/evms/DATAXFS2
root 7885 6466 17193 0 6 1447133 5660020 6 10:16 pts/0 00:00:00 xfs_repair -L /dev/evms/DATAXFS2
root 7885 6466 17194 0 6 1447133 5660020 6 10:16 pts/0 00:00:00 xfs_repair -L /dev/evms/DATAXFS2

and strace shows this:

inchgower:~# strace -fp 7885
Process 17194 attached with 6 threads - interrupt to quit
[pid 17191] futex(0x2aab3c8fa884, FUTEX_WAIT, 44, NULL <unfinished ...>
[pid 17192] futex(0x2aab3c8fa884, FUTEX_WAIT, 44, NULL <unfinished ...>
[pid 17193] futex(0x2aab3c8fa884, FUTEX_WAIT, 44, NULL <unfinished ...>
[pid 17194] futex(0x2aab3c8fa884, FUTEX_WAIT, 44, NULL <unfinished ...>
[pid 17190] futex(0x67e4f8, FUTEX_WAIT, 2, NULL

Can I stop the process and start another version without risking problems?

> Could be a hardware problem. Could be an XFS problem. Could be a dm problem.
> I really can't say from a shutdown message like this - all it tells us is
> that a btree block was corrupted by something since the last time it was
> checked....
>
> Cheers,
>
> Dave.
>
OK, cheers,

-- 
Yann Dupont, Cri de l'université de Nantes
Tel: 02.51.12.53.91 - Fax: 02.51.12.58.60 - Yann.Dupont@univ-nantes.fr
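
A minimal sketch, not part of the original message, of how long runs of "data fork in ino N claims free block M" lines from xfs_repair output could be condensed into a per-inode count. The names `CLAIM_RE` and `claims_per_inode` are hypothetical, and the pattern assumes the exact line wording shown in the message above.

```python
import re
from collections import Counter

# One xfs_repair report line, as it appears in the message:
#   data fork in ino 3894217924 claims free block 243388605
CLAIM_RE = re.compile(r"data fork in ino (\d+) claims free block (\d+)")


def claims_per_inode(repair_output):
    """Count how many free blocks each inode's data fork claims."""
    counts = Counter()
    for ino, _block in CLAIM_RE.findall(repair_output):
        counts[int(ino)] += 1
    return counts


# Five lines copied from the xfs_repair output in the message.
sample = """\
data fork in ino 3894217924 claims free block 243388605
data fork in ino 3894217924 claims free block 243388606
data fork in ino 3899211601 claims free block 243702250
data fork in ino 3899211601 claims free block 243702251
data fork in ino 3899211601 claims free block 243702252
"""

counts = claims_per_inode(sample)
for ino, n in sorted(counts.items()):
    print(f"inode {ino}: {n} free blocks claimed")
```

A summary like this shows at a glance how many inodes are affected, which is hard to judge from a scrolling console.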
* Re: kernel oops on debian , 2.6.18-5
  2007-12-18 14:41   ` Yann Dupont
@ 2007-12-19  0:38     ` Barry Naujok
  2007-12-19 10:27       ` Yann Dupont
From: Barry Naujok @ 2007-12-19  0:38 UTC
To: Yann Dupont, David Chinner; +Cc: xfs, Jacky Carimalo

On Wed, 19 Dec 2007 01:41:36 +1100, Yann Dupont
<Yann.Dupont@univ-nantes.fr> wrote:

> David Chinner wrote:
>> On Tue, Dec 18, 2007 at 10:20:21AM +0100, Yann Dupont wrote:
>>
>>> Hello, we got a kernel oops, probably in xfs on a debian kernel.
>>>
>>> This volume is on SAN + device mapper.
>>> this is a 1 TB volume. It was in service for more than 2 or 3 years.
>>> There is a high number of files on it, as this volume serves for a
>>> rsyncd, where 200+ servers sync their root filesystem on it every day.
>>>
>>> here is the oops :
>>>
>>> Dec 16 23:27:32 inchgower kernel: XFS internal error
>>> XFS_WANT_CORRUPTED_GOTO at line 1561 of file fs/xfs/xfs_alloc.c.
>>> Caller
>>> 0xffffffff881857b7
>>> Dec 16 23:27:32 inchgower kernel:
>>> Dec 16 23:27:32 inchgower kernel: Call Trace:
>>> Dec 16 23:27:32 inchgower kernel: [<ffffffff88183ec0>]
>>> :xfs:xfs_free_ag_extent+0x19f/0x67f
>>>
>>
>> corrupted freespace btree. what does xfs_check tell you about the
>> filesystem on dm-3?
>>
>>
> xfs_check tells me to run xfs_repair -L, the attempts to mount the FS
> to clear the logs ending in kernel oops.

[snip]

> Phase 4 - check for duplicate blocks...
>         - setting up duplicate extent list...
>         - check for inodes claiming duplicate blocks...
>         - agno = 0
>
> And now the process seems stuck.
> There is no activity on the san disk ;
>
> a ps show this :
>
> root 7885 6466 7885 0 6 1447133 5660020 6 09:55 pts/0
> 00:00:19 xfs_repair -L /dev/evms/DATAXFS2
> root 7885 6466 17190 0 6 1447133 5660020 6 10:16 pts/0
> 00:00:00 xfs_repair -L /dev/evms/DATAXFS2
> root 7885 6466 17191 0 6 1447133 5660020 6 10:16 pts/0
> 00:00:00 xfs_repair -L /dev/evms/DATAXFS2
> root 7885 6466 17192 0 6 1447133 5660020 6 10:16 pts/0
> 00:00:00 xfs_repair -L /dev/evms/DATAXFS2
> root 7885 6466 17193 0 6 1447133 5660020 6 10:16 pts/0
> 00:00:00 xfs_repair -L /dev/evms/DATAXFS2
> root 7885 6466 17194 0 6 1447133 5660020 6 10:16 pts/0
> 00:00:00 xfs_repair -L /dev/evms/DATAXFS2
>
>
> and a strace this :
> inchgower:~# strace -fp 7885
> Process 17194 attached with 6 threads - interrupt to quit
> [pid 17191] futex(0x2aab3c8fa884, FUTEX_WAIT, 44, NULL <unfinished ...>
> [pid 17192] futex(0x2aab3c8fa884, FUTEX_WAIT, 44, NULL <unfinished ...>
> [pid 17193] futex(0x2aab3c8fa884, FUTEX_WAIT, 44, NULL <unfinished ...>
> [pid 17194] futex(0x2aab3c8fa884, FUTEX_WAIT, 44, NULL <unfinished ...>
> [pid 17190] futex(0x67e4f8, FUTEX_WAIT, 2, NULL
>
> Can I stop the process and start another version without risking
> problems ?

Yes, you can stop and restart. In your scenario, run xfs_repair -P to
disable prefetch, which is what is getting stuck.

Barry.
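
A minimal sketch, not part of the original message, of assembling the xfs_repair invocation discussed above. The helper `build_repair_cmd` is hypothetical; it only builds and prints the command line and does not run anything. The flags used are the ones named in the thread (-L to zero the log, -P to disable prefetch) plus -n, xfs_repair's no-modify mode, added here as an assumption about what a cautious first pass might look like.

```python
import shlex


def build_repair_cmd(device, no_modify=False, disable_prefetch=False,
                     zero_log=False):
    """Build an xfs_repair argument list without executing it."""
    cmd = ["xfs_repair"]
    if no_modify:
        cmd.append("-n")   # report problems only, change nothing
    if disable_prefetch:
        cmd.append("-P")   # disable prefetch, as suggested in the reply
    if zero_log:
        cmd.append("-L")   # zero the log; unreplayed log contents are lost
    cmd.append(device)
    return cmd


# Device path taken from the ps output quoted in the thread.
cmd = build_repair_cmd("/dev/evms/DATAXFS2", disable_prefetch=True)
print(" ".join(shlex.quote(part) for part in cmd))
```

Keeping the command as a list means it could later be handed to `subprocess.run` without shell quoting issues; whether -L is still needed on a restarted run depends on the state of the log, which this sketch does not inspect.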
* Re: kernel oops on debian , 2.6.18-5
  2007-12-19  0:38     ` Barry Naujok
@ 2007-12-19 10:27       ` Yann Dupont
From: Yann Dupont @ 2007-12-19 10:27 UTC
To: Barry Naujok; +Cc: David Chinner, xfs, Jacky Carimalo

Barry Naujok wrote:
>
>>
>> Can I stop the process and start another version without risking
>> problems ?
>
> Yes, you can stop and restart. In your scenario, run xfs_repair -P to
> disable prefetch which is getting stuck.
>
> Barry.
>
>
>
Thanks for the tip, xfs_repair -P did the job.

Only minimal problems were found (9856 K in lost+found for a 1 TB volume):

inchgower:/var/lib/vservers/VS_Archive/Archive/lost+found# ls -al
total 9860
drwxr-xr-x 2 root root 4096 2007-12-19 10:28 .
drwxr-xr-x 32 root root 4096 2007-12-04 16:26 ..
-rw-r----- 1 root root 0 2007-12-16 19:33 1311409765
-rwx------ 1 root www-data 4992 2007-11-13 11:47 1388664851
-rwx------ 1 root root 4643 2007-09-27 15:51 1388664852
-rw------- 1 ntp 104 369 2007-12-16 22:36 1624155084
-rw-r----- 1 root root 0 2007-12-16 22:03 17043457
-rw-r--r-- 1 root root 8358 2005-05-10 21:54 1810860709
-rw-r--r-- 1 root root 7644 2005-05-10 21:54 1818650492
-rw-r--r-- 1 root root 45620 2005-05-10 21:54 1818650493
-rw-r--r-- 1 root root 5304 2005-05-10 21:54 1818840419
-rw-r--r-- 1 root root 5034 2005-05-10 21:54 1818841748
-rw-r--r-- 1 root root 1300 2005-05-10 21:54 1818842113
-rw-r--r-- 1 root root 6040 2005-05-10 21:54 1818842115
-rw-r--r-- 1 root root 4008 2005-05-10 21:54 1818842134
-rw-r--r-- 1 root root 4019 2005-05-10 21:54 1818873357
-rwx------ 1 root www-data 732 2007-12-14 15:24 1900698499
-rw-r--r-- 1 root root 1060 2007-02-21 18:48 2041002887
-rw-r--r-- 1 root root 152 2005-11-10 17:09 2041002903
-rwx------ 1 root www-data 0 2007-12-15 23:33 2153644053
-rwx------ 1 root www-data 0 2007-12-15 23:33 2158002845
-rw-r--r-- 1 root root 20480 2007-12-16 22:03 2572244357
drwxr-xr-x 2 root root 75 2007-12-16 23:27 268556858
drwxr-xr-x 2 root root 6 2007-12-16 23:27 268556859
-rw-r--r-- 1 root root 20480 2007-12-16 19:33 273918729
-rw-r--r-- 1 root root 1193201 2007-12-16 16:06 3223675010
-rw-r----- 1 root root 0 2007-12-16 19:33 3226075506
-rw-r--r-- 1 root root 22451 2007-12-16 16:07 3226075507
-rwx------ 1 root www-data 19211 2007-12-14 11:22 3361035798
-rwx------ 1 root www-data 1265 2007-12-14 17:58 3628814835
-rwx------ 1 root www-data 0 2007-12-15 22:00 3910289706
-rwx------ 1 root www-data 19211 2007-12-14 13:57 4055379312
-rwx------ 1 root www-data 14360 2007-12-14 17:58 4055415347
-rwx------ 1 root www-data 14346 2007-12-10 18:01 4055415352
-rwx------ 1 root www-data 19225 2007-12-14 17:58 4055415353
drwxr-xr-x 2 root root 19 2007-12-16 23:27 4160831758
drwxr-xr-x 2 root root 20 2007-12-16 23:27 4160831759
drwxr-xr-x 2 root root 6 2005-09-15 11:44 4160831760
-rw-r--r-- 1 root root 1387921 2007-11-05 02:02 620696741
-rw-r--r-- 1 root root 4715935 2007-11-06 02:02 620696742
-rw-r--r-- 1 root root 1389488 2007-11-07 02:02 620696743
-rw-r--r-- 1 root root 1193251 2007-12-16 20:49 633648296

As you can see, some entries in lost+found are from 2005-05-10 (we had a
bad power fault at that time), so maybe the corruption had been there for
a very long time.

Cheers,

-- 
Yann Dupont, Cri de l'université de Nantes
Tel: 02.51.12.53.91 - Fax: 02.51.12.58.60 - Yann.Dupont@univ-nantes.fr
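
A minimal sketch, not part of the original message, of grouping a lost+found listing by modification date, which is the observation the message closes on. The helper `entries_by_date` is hypothetical and assumes the eight-column `ls -al` layout with ISO dates shown above (permissions, links, owner, group, size, date, time, name) and names without spaces.

```python
from collections import Counter


def entries_by_date(ls_output):
    """Count lost+found entries per modification date (YYYY-MM-DD)."""
    counts = Counter()
    for line in ls_output.splitlines():
        fields = line.split()
        # Skip the "total N" line and anything that is not an entry row.
        if len(fields) != 8 or fields[7] in (".", ".."):
            continue
        counts[fields[5]] += 1
    return counts


# A few rows copied from the listing in the message.
sample = """\
total 9860
drwxr-xr-x 2 root root 4096 2007-12-19 10:28 .
drwxr-xr-x 32 root root 4096 2007-12-04 16:26 ..
-rw-r--r-- 1 root root 8358 2005-05-10 21:54 1810860709
-rw-r--r-- 1 root root 7644 2005-05-10 21:54 1818650492
-rw------- 1 ntp 104 369 2007-12-16 22:36 1624155084
drwxr-xr-x 2 root root 75 2007-12-16 23:27 268556858
"""

by_date = entries_by_date(sample)
for date, n in sorted(by_date.items()):
    print(date, n)
```

Grouping by date separates entries that predate the oops of 2007-12-16 from those created around it, which is the basis for suspecting older damage.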
end of thread, other threads:[~2007-12-19 10:27 UTC]

Thread overview: 5+ messages
2007-12-18  9:20 kernel oops on debian , 2.6.18-5 Yann Dupont
2007-12-18 12:32 ` David Chinner
2007-12-18 14:41   ` Yann Dupont
2007-12-19  0:38     ` Barry Naujok
2007-12-19 10:27       ` Yann Dupont