* XFS Kernel Panics in CentOS
@ 2012-03-30 22:02 Mark Rechler
2012-03-30 22:44 ` Eric Sandeen
From: Mark Rechler @ 2012-03-30 22:02 UTC
To: xfs
Hi Everyone,
We've been getting a lot of errors (across several kernels) and eventually
a kernel panic. Any insight into these errors would be much appreciated.
Errors:
Filesystem "dm-3": XFS internal error xfs_da_do_buf(2) at line 2112 of file
fs/xfs/xfs_da_btree.c. Caller 0xffffffff883c1826
Call Trace:
[<ffffffff883c1725>] :xfs:xfs_da_do_buf+0x503/0x5b1
[<ffffffff883c1826>] :xfs:xfs_da_read_buf+0x16/0x1b
[<ffffffff883c1826>] :xfs:xfs_da_read_buf+0x16/0x1b
[<ffffffff883aeb71>] :xfs:xfs_attr_leaf_get+0x2e/0x99
[<ffffffff883aeb71>] :xfs:xfs_attr_leaf_get+0x2e/0x99
[<ffffffff883aec7f>] :xfs:xfs_attr_fetch+0xa3/0xd5
[<ffffffff883a7aa8>] :xfs:xfs_acl_iaccess+0x64/0xd4
[<ffffffff883f264a>] :xfs:xfs_check_acl+0x1b/0x2b
[<ffffffff8000f550>] generic_permission+0x40/0xca
[<ffffffff8000d902>] permission+0x81/0xc8
[<ffffffff8000999d>] __link_path_walk+0x173/0xf42
[<ffffffff8000e9cc>] link_path_walk+0x42/0xb2
[<ffffffff8000cc9c>] do_path_lookup+0x275/0x2f1
[<ffffffff8001278e>] getname+0x15b/0x1c2
[<ffffffff800236f6>] __user_walk_fd+0x37/0x4c
[<ffffffff8003f1f6>] vfs_lstat_fd+0x18/0x47
[<ffffffff8008c46e>] default_wake_function+0x0/0xe
[<ffffffff800efddf>] sys_lgetxattr+0x4e/0x5f
[<ffffffff8002a996>] sys_newlstat+0x19/0x31
[<ffffffff8005d229>] tracesys+0x71/0xe0
[<ffffffff8005d28d>] tracesys+0xd5/0xe0
Code: 0f b6 40 02 89 44 24 04 e9 95 00 00 00 44 0f b6 Z3 44 3b 65
RIP [<ffffffffff8841bfaf>] :xfs:xfs_attr_shortform_getvalue+0x24/0xe2
RSP <ffff81020752dbc8>
CR2: 00000000000002
<0>Kernel panic - not syncing: Fatal exception
Thanks,
Mark
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: XFS Kernel Panics in CentOS
2012-03-30 22:02 XFS Kernel Panics in CentOS Mark Rechler
@ 2012-03-30 22:44 ` Eric Sandeen
2012-04-02 15:09 ` Mark Rechler
From: Eric Sandeen @ 2012-03-30 22:44 UTC
To: Mark Rechler; +Cc: xfs
On 3/30/12 5:02 PM, Mark Rechler wrote:
> Hi Everyone,
>
> We've been getting a lot of errors (across several kernels) and eventually a kernel panic. Any insight into these errors would be much appreciated.
>
> Errors:
> Filesystem "dm-3": XFS internal error xfs_da_do_buf(2) at line 2112 of file fs/xfs/xfs_da_btree.c. Caller 0xffffffff883c1826
Saying which CentOS version it is would help ;) And the standard disclaimers apply: CentOS doesn't come with upstream _or_ distro support, etc. etc...
But xfs_da_do_buf(2) indicates on-disk corruption: xfs encountered a bad magic number when reading a block from the disk. Have you tried xfs_repair?
-Eric
* Re: XFS Kernel Panics in CentOS
2012-03-30 22:44 ` Eric Sandeen
@ 2012-04-02 15:09 ` Mark Rechler
2012-04-02 18:03 ` Eric Sandeen
From: Mark Rechler @ 2012-04-02 15:09 UTC
To: Eric Sandeen; +Cc: xfs
Hi Eric,
Thank you for the reply. We are running CentOS 5.8 with the
2.6.18-164.10.1.el5.centos.plus kernel, which we chose because it was
mentioned in a bug report with similar behavior but ultimately a different
kernel panic (http://bugs.centos.org/view.php?id=4089). We have tried
running xfs_repair in the past, and it has not helped. The odd part is
that these are fresh systems (just installed). If it helps, we are also
running glusterfs on these boxes, though load does not always correlate
with a kernel panic.
Thanks,
Mark
* Re: XFS Kernel Panics in CentOS
2012-04-02 15:09 ` Mark Rechler
@ 2012-04-02 18:03 ` Eric Sandeen
2012-06-29 4:46 ` Changliang Chen
From: Eric Sandeen @ 2012-04-02 18:03 UTC
To: Mark Rechler; +Cc: xfs
On 4/2/12 8:09 AM, Mark Rechler wrote:
> Hi Eric,
>
> Thank you for the reply. We are running CentOS 5.8, with the
> 2.6.18-164.10.1.el5.centos.plus kernel as it was mentioned in a bug
> report that has similar behavior, but ultimately a different kernel
> panic (http://bugs.centos.org/view.php?id=4089). We have tried
> running xfs_repair in the past and it has not proved useful. The odd
> part is that these are fresh systems (just installed). If it helps,
> we are also running glusterfs on these boxes though load does not
> always correlate to a kernel panic.
I can't say for sure what's in that respun "extra" CentOS kernel,
but I can say this: the error you hit indicates that xfs read a
metadata buffer whose magic number it did not recognize, i.e. the
block did not look like the metadata it expected. Seeing what
looked like corruption, xfs shut down.
This reminds me a little of
https://bugzilla.redhat.com/show_bug.cgi?id=512552
which I fixed for RHEL customers a while back: cancelled
readahead in MD left xfs believing a buffer was up to date
when it was in fact uninitialized, so xfs found garbage and
shut down in just this way.
If xfs_repair comes up clean, something similar may be
happening in your case: somehow xfs is getting hold of a buffer
that doesn't match what xfs_repair found to be a consistent
filesystem.
So I would suspect something in the storage stack.
Also, please be sure you don't have kmod-xfs or xfs-kmod installed
on your CentOS box; it is a truly ancient and completely unsupported
backport of xfs from long, long ago.
-Eric
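The shutdown Eric describes comes from a magic-number check: each directory/attribute btree block begins with a small header whose magic identifies the block type, and xfs refuses a block whose magic it doesn't recognize. Below is a minimal Python sketch of that idea. It assumes the v4 on-disk xfs_da_blkinfo layout (big-endian 32-bit forw and back sibling pointers, then a big-endian 16-bit magic and 16-bit pad) and the magic values from the XFS headers of that era. It is a simplified illustration, not the kernel's code: the real xfs_da_do_buf() check also accepts some directory data and free-block magics, and the helper names are hypothetical.

```python
import struct

# Magic numbers of XFS (v4 format) dir/attr btree blocks, as in the 2.6-era
# kernel headers. Simplified: the directory data and free-block magics that
# the kernel also accepts are left out.
XFS_DA_NODE_MAGIC = 0xFEBE     # interior node of a dir/attr btree
XFS_ATTR_LEAF_MAGIC = 0xFBEE   # attribute leaf block
XFS_DIR2_LEAF1_MAGIC = 0xD2F1  # directory leaf, single-leaf form
XFS_DIR2_LEAFN_MAGIC = 0xD2FF  # directory leaf, node form

KNOWN_DA_MAGICS = {
    XFS_DA_NODE_MAGIC,
    XFS_ATTR_LEAF_MAGIC,
    XFS_DIR2_LEAF1_MAGIC,
    XFS_DIR2_LEAFN_MAGIC,
}

# xfs_da_blkinfo header: forw (be32), back (be32), magic (be16), pad (be16).
BLKINFO = struct.Struct(">IIHH")


def da_block_magic(buf: bytes) -> int:
    """Return the 16-bit magic from the block header at the start of buf."""
    if len(buf) < BLKINFO.size:
        raise ValueError("buffer too short for a da block header")
    _forw, _back, magic, _pad = BLKINFO.unpack_from(buf, 0)
    return magic


def looks_like_da_metadata(buf: bytes) -> bool:
    """Hypothetical helper: does buf carry a recognized da btree magic?"""
    return da_block_magic(buf) in KNOWN_DA_MAGICS


if __name__ == "__main__":
    # An attribute leaf header with no siblings, padded to 4096 bytes.
    attr_leaf = BLKINFO.pack(0, 0, XFS_ATTR_LEAF_MAGIC, 0) + bytes(4084)
    # An uninitialized buffer, as in the MD readahead bug: all zeroes.
    garbage = bytes(4096)
    print(looks_like_da_metadata(attr_leaf))  # True
    print(looks_like_da_metadata(garbage))    # False
```

An all-zero buffer, like the uninitialized one in the MD readahead bug, fails the check, which is roughly the situation that leads xfs to report an internal error and shut down.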
* Re: XFS Kernel Panics in CentOS
2012-04-02 18:03 ` Eric Sandeen
@ 2012-06-29 4:46 ` Changliang Chen
2012-06-29 4:52 ` Eric Sandeen
From: Changliang Chen @ 2012-06-29 4:46 UTC
To: Eric Sandeen; +Cc: xfs, Mark Rechler
Hi Eric,
Has this issue been resolved? We are seeing the same problem, even though
we have upgraded the kernel to 2.6.18-308.8.2.el5.
--
Regards,
Cocl
ops manager
19lou Operation & Maintenance Dept
* Re: XFS Kernel Panics in CentOS
2012-06-29 4:46 ` Changliang Chen
@ 2012-06-29 4:52 ` Eric Sandeen
2012-06-29 4:58 ` Changliang Chen
From: Eric Sandeen @ 2012-06-29 4:52 UTC
To: Changliang Chen; +Cc: xfs@oss.sgi.com, Mark Rechler
On Jun 29, 2012, at 12:46 AM, Changliang Chen <hqucocl@gmail.com> wrote:
> Hi Eric,
>
> Is this issue resolved? We have been getting the same problem, though we had upgraded the kernel to 2.6.18-308.8.2.el5.
>
I don't know; if it were RHEL, I'd suggest opening a support ticket. I've not seen anything similar on RHEL.
Did you make sure there is no xfs kmod rpm installed? What does "modinfo xfs" say?
* Re: XFS Kernel Panics in CentOS
2012-06-29 4:52 ` Eric Sandeen
@ 2012-06-29 4:58 ` Changliang Chen
2012-06-29 15:04 ` Mark Rechler
From: Changliang Chen @ 2012-06-29 4:58 UTC
To: Eric Sandeen; +Cc: xfs@oss.sgi.com, Mark Rechler
Hi,
We are sure that we haven't installed the xfs kmod, and the modinfo output is:
# modinfo xfs
filename: /lib/modules/2.6.18-308.8.2.el5/kernel/fs/xfs/xfs.ko
license: GPL
description: SGI XFS with ACLs, security attributes, large block/inode numbers, no debug enabled
author: Silicon Graphics, Inc.
srcversion: D37A003AFEE1A42BDD4DD56
depends:
vermagic: 2.6.18-308.8.2.el5 SMP mod_unload gcc-4.1
module_sig:
883f3504fd752a1a91bf303215fc9511247a309f792a2c9d45673dbc457399198719262a50135f0a083e666c424dff9de84f1f5eff01e607decb4921e
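Eric's question amounts to asking whether the loaded xfs.ko came from the kernel package itself or from a separately packaged module. A minimal Python sketch that classifies modinfo output by the module's path is shown below. It assumes, beyond what the thread states, that in-tree modules live under /lib/modules/<release>/kernel/ while EL5 add-on kmod packages install under extra/ or are linked from weak-updates/; the helper names are hypothetical, and checking the installed packages directly remains the more authoritative test.

```python
SAMPLE = """filename: /lib/modules/2.6.18-308.8.2.el5/kernel/fs/xfs/xfs.ko
license: GPL
description: SGI XFS with ACLs, security attributes, large block/inode numbers, no debug enabled
author: Silicon Graphics, Inc.
srcversion: D37A003AFEE1A42BDD4DD56
depends:
vermagic: 2.6.18-308.8.2.el5 SMP mod_unload gcc-4.1
"""


def parse_modinfo(text):
    """Collect 'key: value' fields from modinfo output.

    Wrapped continuation lines (no colon) are ignored, and only the first
    occurrence of a repeated key (such as 'alias') is kept.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key and " " not in key:
            fields.setdefault(key, value.strip())
    return fields


def xfs_module_origin(fields):
    """Classify the module by its path (directory layout is an assumption)."""
    path = fields.get("filename", "")
    if "/kernel/fs/xfs/" in path:
        return "in-tree (shipped with the kernel package)"
    if "/extra/" in path or "/weak-updates/" in path:
        return "add-on module (possibly kmod-xfs / xfs-kmod)"
    return "unknown"


def vermagic_matches(fields, release):
    """True if the module's vermagic names the given kernel release."""
    parts = fields.get("vermagic", "").split()
    return bool(parts) and parts[0] == release


if __name__ == "__main__":
    info = parse_modinfo(SAMPLE)
    print(xfs_module_origin(info))                          # in-tree (shipped with the kernel package)
    print(vermagic_matches(info, "2.6.18-308.8.2.el5"))     # True
```

Run against the output above, the module path points at the in-tree location, which is consistent with Changliang's statement that no xfs kmod is installed.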
--
Regards,
Cocl
ops manager
19lou Operation & Maintenance Dept
* Re: XFS Kernel Panics in CentOS
2012-06-29 4:58 ` Changliang Chen
@ 2012-06-29 15:04 ` Mark Rechler
2012-06-29 21:38 ` Stefan Ring
From: Mark Rechler @ 2012-06-29 15:04 UTC
To: Changliang Chen; +Cc: Eric Sandeen, xfs@oss.sgi.com
Hi Everyone,
In my case it turned out to be related to:
http://oss.sgi.com/bugzilla/show_bug.cgi?id=840
Write barriers were not being passed through when LVM, XFS and MegaRAID
were combined. After upgrading the kernel to 2.6.39 (using packages from
http://elrepo.org/tiki/tiki-index.php), all the XFS issues were resolved.
The other solution was to not use LVM.
Hope this helps.
Mark
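Mark's fix hinges on whether write barriers actually reach the disk through LVM and the MegaRAID controller. One rough first check is to scan the kernel log for XFS's own notice that it has turned barriers off. The sketch below assumes, without verification here, that this notice uses the same 'Filesystem "<dev>":' prefix as the error at the top of this thread and contains the text "Disabling barriers"; check the exact wording against your own kernel. The absence of such a line does not prove barriers are honoured, since they may be dropped further down the stack without XFS noticing. The function name and the log lines in the example are hypothetical.

```python
import re
from typing import List

# Assumed message format; verify against the kernel actually in use.
BARRIER_OFF = re.compile(r'Filesystem "(?P<dev>[^"]+)": Disabling barriers')


def devices_without_barriers(log_text: str) -> List[str]:
    """Return each device XFS reported disabling barriers on, in first-seen order."""
    devices: List[str] = []
    for match in BARRIER_OFF.finditer(log_text):
        dev = match.group("dev")
        if dev not in devices:
            devices.append(dev)
    return devices


if __name__ == "__main__":
    # Hypothetical dmesg excerpt, for illustration only.
    sample = (
        'Filesystem "dm-3": Disabling barriers, trial barrier write failed\n'
        "kernel: some unrelated message\n"
        'Filesystem "dm-3": Disabling barriers, trial barrier write failed\n'
    )
    print(devices_without_barriers(sample))  # ['dm-3']
```

Deduplicating keeps the report short when the same logical volume is mounted repeatedly, as on a box that panics and reboots often.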
On Fri, Jun 29, 2012 at 12:58 AM, Changliang Chen <hqucocl@gmail.com> wrote:
> Hi,
>
> We sure that we haven't installed the xfs kmod,and the modinfo are:
>
> # modinfo xfs
> filename: /lib/modules/2.6.18-308.8.2.el5/kernel/fs/xfs/xfs.ko
> license: GPL
> description: SGI XFS with ACLs, security attributes, large block/inode
> numbers, no debug enabled
> author: Silicon Graphics, Inc.
> srcversion: D37A003AFEE1A42BDD4DD56
> depends:
> vermagic: 2.6.18-308.8.2.el5 SMP mod_unload gcc-4.1
> module_sig:
> 883f3504fd752a1a91bf303215fc9511247a309f792a2c9d45673dbc457399198719262a50135f0a083e666c424dff9de84f1f5eff01e607decb4921e
>
> On Fri, Jun 29, 2012 at 12:52 PM, Eric Sandeen <sandeen@sandeen.net>wrote:
>
>> On Jun 29, 2012, at 12:46 AM, Changliang Chen <hqucocl@gmail.com> wrote:
>>
>> Hi Eric,
>>
>> Is this issue resolved? We have been getting the same problem, though
>> we had upgrated the kernel to 2.6.18-308.8.2.el5.
>>
>> I do not know; if it were rhel I'd suggest logging a support ticket.
>> I've not seen anything similar on rhel.
>>
>> Did you make sure there is no xfs kmod rpm installed? What does modinfo
>> xfs say?
>>
>> On Tue, Apr 3, 2012 at 2:03 AM, Eric Sandeen <sandeen@sandeen.net> wrote:
>>
>>> On 4/2/12 8:09 AM, Mark Rechler wrote:
>>> > Hi Eric,
>>> >
>>> > Thank you for the reply. We are running CentOS 5.8, with the
>>> > 2.6.18-164.10.1.el5.centos.plus kernel as it was mentioned in a bug
>>> > report that has similar behavior, but ultimately a different kernel
>>> > panic (http://bugs.centos.org/view.php?id=4089). We have tried
>>> > running xfs_repair in the past and it has not proved useful. The odd
>>> > part is that these are fresh systems (just installed). If it helps,
>>> > we are also running glusterfs on these boxes though load does not
>>> > always correlate to a kernel panic.
>>>
>>> I can't say for sure what's in that respun "extra" centos kernel,
>>> but I can say this: the error you hit indicates that xfs read a
>>> buffer, and wound up with a metadata buffer which had unrecognized
>>> magic - i.e. it did not look like metadata as expected. Seeing what
>>> looks like corruption, it shut down.
>>>
>>> This reminds me a little of
>>> https://bugzilla.redhat.com/show_bug.cgi?id=512552
>>> which I fixed for RHEL customers a while back, where cancelled
>>> readahead in MD was resulting in xfs thinking a buffer was
>>> uptodate, but in fact it was uninitialized, hence it found
>>> garbage and shut down in this way.
>>>
>>> Something similar seems to be happening in your case, if xfs_repair
>>> comes up clean; somehow xfs is getting hold of a buffer which
>>> apparently doesn't match what xfs_repair found to be a consistent
>>> filesystem.
>>>
>>> So I might suspect something in the storage stack?
>>>
>>> Also please be sure you don't have kmod-xfs or xfs-kmod installed
>>> on your CentOS box, which is a truly ancient and completely unsupported
>>> backport of xfs from long, long ago.
>>>
>>> -Eric
>>>
>>> > Thanks,
>>> > Mark
>>> >
>>> > On Fri, Mar 30, 2012 at 6:44 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
>>> >
>>> > On 3/30/12 5:02 PM, Mark Rechler wrote:
>>> > > Hi Everyone,
>>> > >
>>> > > We've been getting a lot of errors (across several kernels) and
>>> > > eventually a kernel panic. Any insight into these errors would be
>>> > > much appreciated.
>>> > >
>>> > > Errors:
>>> > > Filesystem "dm-3": XFS internal error xfs_da_do_buf(2) at line 2112 of file
>>> > > fs/xfs/xfs_da_btree.c. Caller 0xffffffff883c1826
>>> >
>>> > Saying which CentOS it is would help ;) And, standard disclaimers
>>> > about how CentOS doesn't come with upstream _or_ distro support, etc etc...
>>> >
>>> > But xfs_da_do_buf(2) indicates on-disk corruption, having
>>> > encountered a bad magic number when reading from the disk. Have you
>>> > tried xfs_repair?
>>> >
>>> > -Eric
>>> >
>>> > > Call Trace:
>>> > > [<ffffffff883c1725>] :xfs:xfs_da_do_buf+0x503/0x5b1
>>> > > [<ffffffff883c1826>] :xfs:xfs_da_read_buf+0x16/0x1b
>>> > > [<ffffffff883c1826>] :xfs:xfs_da_read_buf+0x16/0x1b
>>> > > [<ffffffff883aeb71>] :xfs:xfs_attr_leaf_get+0x2e/0x99
>>> > > [<ffffffff883aeb71>] :xfs:xfs_attr_leaf_get+0x2e/0x99
>>> > > [<ffffffff883aec7f>] :xfs:xfs_attr_fetch+0xa3/0xd5
>>> > > [<ffffffff883a7aa8>] :xfs:xfs_acl_iaccess+0x64/0xd4
>>> > > [<ffffffff883f264a>] :xfs:xfs_check_acl+0x1b/0x2b
>>> > > [<ffffffff8000f550>] generic_permission+0x40/0xca
>>> > > [<ffffffff8000d902>] permission+0x81/0xc8
>>> > > [<ffffffff8000999d>] __link_path_walk+0x173/0xf42
>>> > > [<ffffffff8000e9cc>] link_path_walk+0x42/0xb2
>>> > > [<ffffffff8000cc9c>] do_path_lookup+0x275/0x2f1
>>> > > [<ffffffff8001278e>] getname+0x15b/0x1c2
>>> > > [<ffffffff800236f6>] __user_walk_fd+0x37/0x4c
>>> > > [<ffffffff8003f1f6>] vfs_lstat_fd+0x18/0x47
>>> > > [<ffffffff8008c46e>] default_wake_function+0x0/0xe
>>> > > [<ffffffff800efddf>] sys_lgetxattr+0x4e/0x5f
>>> > > [<ffffffff8002a996>] sys_newlstat+0x19/0x31
>>> > > [<ffffffff8005d229>] tracesys+0x71/0xe0
>>> > > [<ffffffff8005d28d>] tracesys+0xd5/0xe0
>>> > >
>>> > > Code: 0f b6 40 02 89 44 24 04 e9 95 00 00 00 44 0f b6 Z3 44 3b 65
>>> > > RIP [<ffffffffff8841bfaf>] :xfs:xfs_attr_shortform_getvalue+0x24/0xe2
>>> > > RSP <ffff81020752dbc8>
>>> > > CR2: 00000000000002
>>> > > <0>Kernel panic - not syncing: Fatal exception
>>> > >
>>> > > Thanks,
>>> > > Mark
>>> > >
>>> > >
>>> >
>>> >
>>>
>>>
>>
>>
>>
>> --
>>
>> Regards,
>>
>> Cocl
>> ops manager
>> 19lou Operation & Maintenance Dept
>>
>>
>
>
> --
>
> Regards,
>
> Cocl
> ops manager
> 19lou Operation & Maintenance Dept
>
* Re: XFS Kernel Panics in CentOS
2012-06-29 15:04 ` Mark Rechler
@ 2012-06-29 21:38 ` Stefan Ring
0 siblings, 0 replies; 9+ messages in thread
From: Stefan Ring @ 2012-06-29 21:38 UTC
To: Mark Rechler; +Cc: xfs@oss.sgi.com
> It turned out in my case to be related to:
> http://oss.sgi.com/bugzilla/show_bug.cgi?id=840
>
> Write barriers were not passed when using LVM/XFS/MegaRAID combined. After
> upgrading the kernel to 2.6.39 (used packages from
> http://elrepo.org/tiki/tiki-index.php) all XFS issues were resolved. The
> other solution was not using LVM.
I fail to see how missing write barriers can make a difference to what
the filesystem reads from the block layer. How is this possible? Does
the block layer not take the pending writes into account when handling
read requests?
Thread overview: 9+ messages
2012-03-30 22:02 XFS Kernel Panics in CentOS Mark Rechler
2012-03-30 22:44 ` Eric Sandeen
2012-04-02 15:09 ` Mark Rechler
2012-04-02 18:03 ` Eric Sandeen
2012-06-29 4:46 ` Changliang Chen
2012-06-29 4:52 ` Eric Sandeen
2012-06-29 4:58 ` Changliang Chen
2012-06-29 15:04 ` Mark Rechler
2012-06-29 21:38 ` Stefan Ring