From: Brian Foster <bfoster@redhat.com>
To: 符永涛 <yongtaofu@gmail.com>
Cc: Ben Myers <bpm@sgi.com>, Eric Sandeen <sandeen@sandeen.net>,
"xfs@oss.sgi.com" <xfs@oss.sgi.com>
Subject: Re: need help how to debug xfs crash issue xfs_iunlink_remove: xfs_inotobp() returned error 22
Date: Mon, 15 Apr 2013 10:13:58 -0400
Message-ID: <516C0B26.6020207@redhat.com>
In-Reply-To: <CADFMGuJMjKc1QoS-Ewt6wG2uSWjyWfQevQg7ZVMer0XSpx3Vjg@mail.gmail.com>
On 04/15/2013 08:54 AM, 符永涛 wrote:
> Dear Brian and xfs experts,
> Brian, your script works and I am able to reproduce the issue with a glusterfs
> rebalance on our test cluster. XFS shut down on 2 of our servers during the
> glusterfs rebalance, and the userspace stack traces at shutdown both point to
> pthread. See the logs below. What's your opinion? Thank you very much!
> logs:
Thanks for the data. Can you also create a metadump for the
filesystem(s) associated with this output?
Brian
> [root@10.23.72.93 ~]# cat xfs.log
>
> --- xfs_imap --
> module("xfs").function("xfs_imap@fs/xfs/xfs_ialloc.c:1257").return
> -- return=0x16
> vars: mp=0xffff882017a50800 tp=0xffff881c81797c70 ino=0xffffffff
> imap=0xffff88100e2f7c08 flags=0x0 agbno=? agino=? agno=? blks_per_cluster=?
> chunk_agbno=? cluster_agbno=? error=? offset=? offset_agbno=? __func__=[...]
> mp: m_agno_log = 0x5, m_agino_log = 0x20
> mp->m_sb: sb_agcount = 0x1c, sb_agblocks = 0xffffff0, sb_inopblog = 0x4,
> sb_agblklog = 0x1c, sb_dblocks = 0x1b4900000
> imap: im_blkno = 0x0, im_len = 0xa078, im_boffset = 0x86ea
> kernel backtrace:
> Returning from: 0xffffffffa02b3ab0 : xfs_imap+0x0/0x280 [xfs]
> Returning to : 0xffffffffa02b9599 : xfs_inotobp+0x49/0xc0 [xfs]
> 0xffffffffa02b96f1 : xfs_iunlink_remove+0xe1/0x320 [xfs]
> 0xffffffff81501a69
> 0x0 (inexact)
> user backtrace:
> 0x3bd1a0e5ad [/lib64/libpthread-2.12.so+0xe5ad/0x219000]
>
> --- xfs_iunlink_remove --
> module("xfs").function("xfs_iunlink_remove@fs/xfs/xfs_inode.c:1680").return
> -- return=0x16
> vars: tp=0xffff881c81797c70 ip=0xffff881003c13c00 next_ino=? mp=? agi=?
> dip=? agibp=0xffff880109b47e20 ibp=? agno=? agino=? next_agino=? last_ibp=?
> last_dip=0xffff882000000000 bucket_index=? offset=?
> last_offset=0xffffffffffff8810 error=? __func__=[...]
> ip: i_ino = 0x113, i_flags = 0x0
> ip->i_d: di_nlink = 0x0, di_gen = 0x0
> [root@10.23.72.93 ~]#
> [root@10.23.72.94 ~]# cat xfs.log
>
> --- xfs_imap --
> module("xfs").function("xfs_imap@fs/xfs/xfs_ialloc.c:1257").return
> -- return=0x16
> vars: mp=0xffff881017c6c800 tp=0xffff8801037acea0 ino=0xffffffff
> imap=0xffff882017101c08 flags=0x0 agbno=? agino=? agno=? blks_per_cluster=?
> chunk_agbno=? cluster_agbno=? error=? offset=? offset_agbno=? __func__=[...]
> mp: m_agno_log = 0x5, m_agino_log = 0x20
> mp->m_sb: sb_agcount = 0x1c, sb_agblocks = 0xffffff0, sb_inopblog = 0x4,
> sb_agblklog = 0x1c, sb_dblocks = 0x1b4900000
> imap: im_blkno = 0x0, im_len = 0xd98, im_boffset = 0x547
> kernel backtrace:
> Returning from: 0xffffffffa02b3ab0 : xfs_imap+0x0/0x280 [xfs]
> Returning to : 0xffffffffa02b9599 : xfs_inotobp+0x49/0xc0 [xfs]
> 0xffffffffa02b96f1 : xfs_iunlink_remove+0xe1/0x320 [xfs]
> 0xffffffff81501a69
> 0x0 (inexact)
> user backtrace:
> 0x30cd40e5ad [/lib64/libpthread-2.12.so+0xe5ad/0x219000]
>
> --- xfs_iunlink_remove --
> module("xfs").function("xfs_iunlink_remove@fs/xfs/xfs_inode.c:1680").return
> -- return=0x16
> vars: tp=0xffff8801037acea0 ip=0xffff880e697c8800 next_ino=? mp=? agi=?
> dip=? agibp=0xffff880d846c2d60 ibp=? agno=? agino=? next_agino=? last_ibp=?
> last_dip=0xffff881017c6c800 bucket_index=? offset=?
> last_offset=0xffffffffffff880e error=? __func__=[...]
> ip: i_ino = 0x142, i_flags = 0x0
> ip->i_d: di_nlink = 0x0, di_gen = 0x3565732e
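A note on reading the two xfs_imap dumps above: in both, ino=0xffffffff and the return value is 0x16 (22, EINVAL). The following is a minimal Python sketch of the inode-number split and range check that would produce that result with the geometry shown in the dump. The helper names (decode_ino, in_range) and the exact form of the check are assumptions made for illustration, not a copy of the kernel source.

```python
import errno

# Geometry copied from the "mp" / "mp->m_sb" lines of the dump.
SB_AGCOUNT = 0x1C
SB_AGBLOCKS = 0xFFFFFF0
SB_INOPBLOG = 0x4
M_AGINO_LOG = 0x20  # equals sb_agblklog (0x1c) + sb_inopblog (0x4)


def decode_ino(ino):
    """Split an inode number into (agno, agino, agbno).

    Hypothetical helper: assumes the usual XFS layout, where the low
    m_agino_log bits are the AG-relative inode number and the low
    sb_inopblog bits of that select the inode within a block.
    """
    agno = ino >> M_AGINO_LOG
    agino = ino & ((1 << M_AGINO_LOG) - 1)
    agbno = agino >> SB_INOPBLOG
    return agno, agino, agbno


def in_range(ino):
    """Return True if the decoded inode lies inside the filesystem."""
    agno, _agino, agbno = decode_ino(ino)
    return agno < SB_AGCOUNT and agbno < SB_AGBLOCKS


if __name__ == "__main__":
    # 0xffffffff is the ino passed to xfs_imap; 0x113 and 0x142 are the
    # i_ino values reported for the inodes being unlinked.
    for ino in (0xFFFFFFFF, 0x113, 0x142):
        agno, agino, agbno = decode_ino(ino)
        print(f"ino={ino:#x} agno={agno} agino={agino:#x} "
              f"agbno={agbno:#x} in_range={in_range(ino)}")
    print("0x16 ==", 0x16, "==", errno.errorcode[0x16])
```

With these values, 0xffffffff decodes to AG 0 with agbno 0xfffffff, which is past sb_agblocks (0xffffff0), so a range check of this kind would fail, consistent with the EINVAL above. An all-ones 32-bit value also looks like a "null" AG inode number, which may mean the unlinked-list walk followed an end-of-list marker; that is an inference from the dump, not something established in this thread.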
>
>
>
> 2013/4/15 符永涛 <yongtaofu@gmail.com>
>
>> Also, glusterfs uses a lot of hard links for self-heal:
>> ---------T 2 root root 0 Apr 15 11:58 /mnt/xfsd/testbug/998416323
>> ---------T 2 root root 0 Apr 15 11:58 /mnt/xfsd/testbug/999296624
>> ---------T 2 root root 0 Apr 15 12:24 /mnt/xfsd/testbug/999568484
>> ---------T 2 root root 0 Apr 15 11:58 /mnt/xfsd/testbug/999956875
>> ---------T 2 root root 0 Apr 15 11:58
>> /mnt/xfsd/testbug/.glusterfs/05/2f/052f4e3e-c379-4a3c-b995-a10fdaca33d0
>> ---------T 2 root root 0 Apr 15 11:58
>> /mnt/xfsd/testbug/.glusterfs/05/95/0595272e-ce2b-45d5-8693-d02c00b94d9d
>> ---------T 2 root root 0 Apr 15 11:58
>> /mnt/xfsd/testbug/.glusterfs/05/ca/05ca00a0-92a7-44cf-b6e3-380496aafaa4
>> ---------T 2 root root 0 Apr 15 12:24
>> /mnt/xfsd/testbug/.glusterfs/0a/23/0a238ca7-3cef-4540-9c98-6bf631551b21
>> ---------T 2 root root 0 Apr 15 11:58
>> /mnt/xfsd/testbug/.glusterfs/0a/4b/0a4b640b-f675-4708-bb59-e2369ffbbb9d
>> Is this related?
>>
>>
>> 2013/4/15 符永涛 <yongtaofu@gmail.com>
>>
>>> Dear xfs experts,
>>> I'm now deploying Brian's systemtap script in our cluster. But from last
>>> night until now, XFS has shut down on 5 of our 24 servers with the same
>>> error. I ran xfs_repair and found that all the lost inodes are glusterfs
>>> dht link files. This explains why the xfs shutdown tends to happen during
>>> glusterfs rebalance: during the rebalance procedure a lot of dht link
>>> files may be unlinked. For example, the following inodes were found in
>>> lost+found on one of the servers:
>>> [root@* lost+found]# pwd
>>> /mnt/xfsd/lost+found
>>> [root@* lost+found]# ls -l
>>> total 740
>>> ---------T 1 root root 0 Apr 8 21:06 100119
>>> ---------T 1 root root 0 Apr 8 21:11 101123
>>> ---------T 1 root root 0 Apr 8 21:19 102659
>>> ---------T 1 root root 0 Apr 12 14:46 1040919
>>> ---------T 1 root root 0 Apr 12 14:58 1041943
>>> ---------T 1 root root 0 Apr 8 21:32 105219
>>> ---------T 1 root root 0 Apr 8 21:37 105731
>>> ---------T 1 root root 0 Apr 12 17:48 1068055
>>> ---------T 1 root root 0 Apr 12 18:38 1073943
>>> ---------T 1 root root 0 Apr 8 21:54 108035
>>> ---------T 1 root root 0 Apr 12 21:49 1091095
>>> ---------T 1 root root 0 Apr 13 00:17 1111063
>>> ---------T 1 root root 0 Apr 13 03:51 1121815
>>> ---------T 1 root root 0 Apr 8 22:25 112387
>>> ---------T 1 root root 0 Apr 13 06:39 1136151
>>> ...
>>> [root@* lost+found]# getfattr -m . -d -e hex *
>>>
>>> # file: 96007
>>> trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000
>>> trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000
>>> trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000
>>> trusted.gfid=0xa0370d8a9f104dafbebbd0e6dd7ce1f7
>>>
>>> trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3600
>>>
>>> trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x0000000049dff000
>>>
>>> # file: 97027
>>> trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000
>>> trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000
>>> trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000
>>> trusted.gfid=0xc1c1fe2ec7034442a623385f43b04c25
>>>
>>> trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3600
>>>
>>> trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x000000006ac78000
>>>
>>> # file: 97559
>>> trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000
>>> trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000
>>> trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000
>>> trusted.gfid=0xcf7c17013c914511bda4d1c743fae118
>>>
>>> trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3500
>>>
>>> trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x00000000519fb000
>>>
>>> # file: 98055
>>> trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000
>>> trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000
>>> trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000
>>> trusted.gfid=0xe86abc6e2c4b44c28d415fbbe34f2102
>>>
>>> trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3600
>>>
>>> trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x000000004c098000
>>>
>>> # file: 98567
>>> trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000
>>> trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000
>>> trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000
>>> trusted.gfid=0x12543a2efbdf4b9fa61c6d89ca396f80
>>>
>>> trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3500
>>>
>>> trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x000000006bc98000
>>>
>>> # file: 98583
>>> trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000
>>> trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000
>>> trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000
>>> trusted.gfid=0x760d16d3b7974cfb9c0a665a0982c470
>>>
>>> trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3500
>>>
>>> trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x000000006cde9000
>>>
>>> # file: 99607
>>> trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000
>>> trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000
>>> trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000
>>> trusted.gfid=0x0849a732ea204bc3b8bae830b46881da
>>>
>>> trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3500
>>>
>>> trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x00000000513f1000
>>> ...
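The trusted.glusterfs.dht.linkto values in the getfattr output above are hex-encoded strings. A minimal Python sketch for decoding them follows; the helper name decode_xattr_hex is hypothetical, and it assumes the value is NUL-terminated ASCII, as these samples appear to be.

```python
def decode_xattr_hex(value):
    """Decode a getfattr '-e hex' value (0x...) into text.

    Trailing NUL bytes are stripped; the value is assumed to be ASCII.
    """
    if not value.startswith("0x"):
        raise ValueError("expected a 0x-prefixed hex string")
    raw = bytes.fromhex(value[2:])
    return raw.rstrip(b"\x00").decode("ascii")


# The two distinct linkto values that appear in the listing above.
LINKTO_VALUES = [
    "0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3600",
    "0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3500",
]

if __name__ == "__main__":
    for v in LINKTO_VALUES:
        print(decode_xattr_hex(v))
```

The two values decode to mams-cq-mt-video-replicate-6 and mams-cq-mt-video-replicate-5, that is, names of replicate subvolumes, which fits the description of these files as dht link files.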
>>>
>>> What do you think about it? Thank you very much.
>>>
>>>
>>> 2013/4/12 符永涛 <yongtaofu@gmail.com>
>>>
>>>> Hi Brian,
>>>>
>>>> Your script works for me now that I have installed all the RPMs built
>>>> from the kernel SRPM. I'll try it. Thank you.
>>>>
>>>>
>>>> 2013/4/12 Brian Foster <bfoster@redhat.com>
>>>>
>>>>> On 04/12/2013 04:32 AM, 符永涛 wrote:
>>>>>> Dear xfs experts,
>>>>>> Can I just call xfs_stack_trace(); in the second line of
>>>>>> xfs_do_force_shutdown() to print stack and rebuild kernel to check
>>>>>> what's the error?
>>>>>>
>>>>>
>>>>> I suppose that's a start. If you're willing/able to create and run a
>>>>> modified kernel for the purpose of collecting more debug info, perhaps
>>>>> we can get a bit more creative in collecting more data on the problem
>>>>> (but a stack trace there is a good start).
>>>>>
>>>>> BTW- you might want to place the call after the XFS_FORCED_SHUTDOWN(mp)
>>>>> check almost halfway into the function to avoid duplicate messages.
>>>>>
>>>>> Brian
>>>>>
>>>>>>
>>>>>> 2013/4/12 符永涛 <yongtaofu@gmail.com>
>>>>>>
>>>>>> Hi Brian,
>>>>>> What else am I missing? Thank you.
>>>>>> stap -e 'probe module("xfs").function("xfs_iunlink"){}'
>>>>>>
>>>>>> WARNING: cannot find module xfs debuginfo: No DWARF information found
>>>>>> semantic error: no match while resolving probe point
>>>>>> module("xfs").function("xfs_iunlink")
>>>>>> Pass 2: analysis failed. Try again with another '--vp 01' option.
>>>>>>
>>>>>>
>>>>>> 2013/4/12 符永涛 <yongtaofu@gmail.com>
>>>>>>
>>>>>> ls -l /usr/lib/debug/lib/modules/2.6.32-279.el6.x86_64/kernel/fs/xfs/xfs.ko.debug
>>>>>> -r--r--r-- 1 root root 21393024 Apr 12 12:08 /usr/lib/debug/lib/modules/2.6.32-279.el6.x86_64/kernel/fs/xfs/xfs.ko.debug
>>>>>>
>>>>>> rpm -qa|grep kernel
>>>>>> kernel-headers-2.6.32-279.el6.x86_64
>>>>>> kernel-devel-2.6.32-279.el6.x86_64
>>>>>> kernel-2.6.32-358.el6.x86_64
>>>>>> kernel-debuginfo-common-x86_64-2.6.32-279.el6.x86_64
>>>>>> abrt-addon-kerneloops-2.0.8-6.el6.x86_64
>>>>>> kernel-firmware-2.6.32-358.el6.noarch
>>>>>> kernel-debug-2.6.32-358.el6.x86_64
>>>>>> kernel-debuginfo-2.6.32-279.el6.x86_64
>>>>>> dracut-kernel-004-283.el6.noarch
>>>>>> libreport-plugin-kerneloops-2.0.9-5.el6.x86_64
>>>>>> kernel-devel-2.6.32-358.el6.x86_64
>>>>>> kernel-2.6.32-279.el6.x86_64
>>>>>>
>>>>>> rpm -q kernel-debuginfo
>>>>>> kernel-debuginfo-2.6.32-279.el6.x86_64
>>>>>>
>>>>>> rpm -q kernel
>>>>>> kernel-2.6.32-279.el6.x86_64
>>>>>> kernel-2.6.32-358.el6.x86_64
>>>>>>
>>>>>> do I need to re probe it?
>>>>>>
>>>>>>
>>>>>> 2013/4/12 Eric Sandeen <sandeen@sandeen.net>
>>>>>>
>>>>>> On 4/11/13 11:32 PM, 符永涛 wrote:
>>>>>> > Hi Brian,
>>>>>> > Sorry but when I execute the script it says:
>>>>>> > WARNING: cannot find module xfs debuginfo: No DWARF information found
>>>>>> > semantic error: no match while resolving probe point module("xfs").function("xfs_iunlink")
>>>>>> >
>>>>>> > uname -a
>>>>>> > 2.6.32-279.el6.x86_64
>>>>>> > kernel debuginfo has been installed.
>>>>>> >
>>>>>> > Where can I find the correct xfs debuginfo?
>>>>>>
>>>>>> it should be in the kernel-debuginfo rpm (of the same
>>>>>> version/release as the kernel rpm you're running)
>>>>>>
>>>>>> You should have:
>>>>>>
>>>>>> /usr/lib/debug/lib/modules/2.6.32-279.el6.x86_64/kernel/fs/xfs/xfs.ko.debug
>>>>>>
>>>>>> If not, can you show:
>>>>>>
>>>>>> # uname -a
>>>>>> # rpm -q kernel
>>>>>> # rpm -q kernel-debuginfo
>>>>>>
>>>>>> -Eric
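The check described above (debuginfo must match the version/release of the running kernel) can be sketched in Python as follows. The function names are hypothetical; the path layout is the one quoted above.

```python
import os


def xfs_debuginfo_path(release):
    """Build the expected xfs.ko.debug path for a kernel release string."""
    return os.path.join(
        "/usr/lib/debug/lib/modules", release, "kernel/fs/xfs/xfs.ko.debug"
    )


def debuginfo_matches(release, installed_debuginfo):
    """Return True if a kernel-debuginfo package matches the given release.

    installed_debuginfo: version-release strings as printed by
    'rpm -q kernel-debuginfo', with the 'kernel-debuginfo-' prefix removed.
    """
    return release in installed_debuginfo


if __name__ == "__main__":
    running = os.uname().release  # same string as 'uname -r'
    path = xfs_debuginfo_path(running)
    print("running kernel:", running)
    print("expected debuginfo:", path)
    print("present:", os.path.exists(path))
```

This only checks that a matching file or package exists; it does not verify that the file actually contains the DWARF data systemtap needs.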
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> 符永涛
>>>>>>
>>>>>> _______________________________________________
>>>>>> xfs mailing list
>>>>>> xfs@oss.sgi.com
>>>>>> http://oss.sgi.com/mailman/listinfo/xfs
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> 符永涛
>>>>
>>>
>>>
>>>
>>> --
>>> 符永涛
>>>
>>
>>
>>
>> --
>> 符永涛
>>
>
>
>