* xfsdump segfault
@ 2013-02-16 6:20 Daniel Browning
2013-02-16 6:51 ` Dave Chinner
0 siblings, 1 reply; 4+ messages in thread
From: Daniel Browning @ 2013-02-16 6:20 UTC (permalink / raw)
To: xfs
Hello folks,
I ran into a segfault while running xfsdump, what should I do
to find the cause?
Cheers,
--
DB
[root@betelgeuse ~]# mkfs.xfs -f /dev/mapper/W1F01HB6_encrypted
meta-data=/dev/mapper/W1F01HB6_encrypted isize=256 agcount=32, agsize=22888168 blks
= sectsz=4096 attr=2, projid32bit=0
data = bsize=4096 blocks=732421363, imaxpct=5
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal log bsize=4096 blocks=357627, version=2
= sectsz=4096 sunit=1 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
[root@betelgeuse ~]# mount -o inode64,logbsize=256k /dev/mapper/W1F01HB6_encrypted /mnt/backup3_W1F01HB6
[root@betelgeuse ~]# xfsdump -J -l 0 - /mnt/backup3 | xfsrestore - /mnt/backup3_W1F01HB6
xfsdump: using file dump (drive_simple) strategy
xfsrestore: using file dump (drive_simple) strategy
xfsdump: xfsrestore: version 3.1.2 (dump format 3.0)version 3.1.2 (dump format 3.0)
xfsdump: level 0 dump of betelgeuse.orion:/mnt/backup3
xfsdump: dump date: Fri Feb 15 21:19:08 2013
xfsdump: session id: 9bca5442-1b56-4dc6-91b8-bfa68d80c1f1
xfsdump: session label: ""
xfsrestore: searching media for dump
xfsdump: ino map phase 1: constructing initial dump list
xfsdump: ino map phase 2: skipping (no pruning necessary)
xfsdump: ino map phase 3: skipping (only one dump stream)
xfsdump: ino map construction complete
xfsdump: estimated dump size: 2443232512512 bytes
xfsdump: creating dump session media file 0 (media 0, file 0)
xfsdump: dumping ino map
xfsrestore: examining media file 0
xfsrestore: dump description:
xfsrestore: hostname: betelgeuse.orion
xfsrestore: mount point: /mnt/backup3
xfsrestore: volume: /dev/mapper/backup3_encrypted
xfsrestore: session time: Fri Feb 15 21:19:08 2013
xfsrestore: level: 0
xfsrestore: session label: ""
xfsrestore: media label: ""
xfsrestore: file system id: 8c338a01-730e-4bfd-b3fe-a7a60492f81e
xfsrestore: session id: 9bca5442-1b56-4dc6-91b8-bfa68d80c1f1
xfsrestore: media id: 440538bf-4f05-436c-8eb7-4f9091f62c4b
xfsrestore: searching media for directory dump
xfsrestore: reading directories
xfsdump: dumping directories
xfsdump: dumping non-directory files
xfsrestore: 483705 directories and 8830791 entries processed
xfsrestore: directory post-processing
xfsrestore: restoring non-directory files
xfsdump: ending media file
xfsdump: media file size 33603518464 bytes
xfsdump: dump size (non-dir files) : 32684068672 bytes
xfsdump: NOTE: dump interrupted: 833 seconds elapsed
xfsdump: Dump Status: INTERRUPT
Segmentation fault (core dumped)
[root@betelgeuse ~]# tail /var/log/messages -n 6
Feb 15 21:33:01 betelgeuse kernel: xfsrestore[12176]: segfault at 10 ip 00000033fd8478de sp 00007ffff3177dd0 error 4 in libc-2.12.so[33fd800000+189000]
Feb 15 21:33:01 betelgeuse abrt[12237]: Saved core dump of pid 12176 (/sbin/xfsrestore) to /var/spool/abrt/ccpp-2013-02-15-21:33:01-12176 (880640 bytes)
Feb 15 21:33:01 betelgeuse abrtd: Directory 'ccpp-2013-02-15-21:33:01-12176' creation detected
Feb 15 21:33:02 betelgeuse abrtd: Executable '/sbin/xfsrestore' doesn't belong to any package
Feb 15 21:33:02 betelgeuse abrtd: 'post-create' on '/var/spool/abrt/ccpp-2013-02-15-21:33:01-12176' exited with 1
Feb 15 21:33:02 betelgeuse abrtd: Corrupted or bad directory /var/spool/abrt/ccpp-2013-02-15-21:33:01-12176, deleting
[root@betelgeuse ~]# mount | grep backup3
/dev/mapper/backup3_encrypted on /mnt/backup3 type xfs (ro,inode64,logbsize=256k,inode64,logbsize=256k)
/dev/mapper/W1F01HB6_encrypted on /mnt/backup3_W1F01HB6 type xfs (rw,inode64,logbsize=256k)
[root@betelgeuse ~]# head -n 1 /etc/issue
CentOS release 6.3 (Final)
[root@betelgeuse ~]# uname -a
Linux betelgeuse.orion 3.7.6-1.el6xen.x86_64 #1 SMP Mon Feb 4 17:12:13 EST 2013 x86_64 x86_64 x86_64 GNU/Linux
[root@betelgeuse ~]# xfs_repair -V
xfs_repair version 3.1.10
[root@betelgeuse ~]# grep 'model name' /proc/cpuinfo
model name : Intel(R) Xeon(R) CPU E5-1620 0 @ 3.60GHz
model name : Intel(R) Xeon(R) CPU E5-1620 0 @ 3.60GHz
model name : Intel(R) Xeon(R) CPU E5-1620 0 @ 3.60GHz
model name : Intel(R) Xeon(R) CPU E5-1620 0 @ 3.60GHz
model name : Intel(R) Xeon(R) CPU E5-1620 0 @ 3.60GHz
model name : Intel(R) Xeon(R) CPU E5-1620 0 @ 3.60GHz
model name : Intel(R) Xeon(R) CPU E5-1620 0 @ 3.60GHz
model name : Intel(R) Xeon(R) CPU E5-1620 0 @ 3.60GHz
[root@betelgeuse ~]# cat /proc/meminfo
MemTotal: 65951000 kB
MemFree: 1801436 kB
Buffers: 17708 kB
Cached: 54001724 kB
SwapCached: 0 kB
Active: 3880776 kB
Inactive: 50162584 kB
Active(anon): 24196 kB
Inactive(anon): 144 kB
Active(file): 3856580 kB
Inactive(file): 50162440 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 7048 kB
Writeback: 0 kB
AnonPages: 23960 kB
Mapped: 14216 kB
Shmem: 408 kB
Slab: 6442920 kB
SReclaimable: 5776336 kB
SUnreclaim: 666584 kB
KernelStack: 1880 kB
PageTables: 5392 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 32975500 kB
Committed_AS: 103260 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 387912 kB
VmallocChunk: 34359323900 kB
HardwareCorrupted: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 10168 kB
DirectMap2M: 2054144 kB
DirectMap1G: 65011712 kB
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: xfsdump segfault
2013-02-16 6:20 xfsdump segfault Daniel Browning
@ 2013-02-16 6:51 ` Dave Chinner
2013-02-17 0:06 ` Daniel Browning
0 siblings, 1 reply; 4+ messages in thread
From: Dave Chinner @ 2013-02-16 6:51 UTC (permalink / raw)
To: Daniel Browning; +Cc: xfs
On Fri, Feb 15, 2013 at 10:20:46PM -0800, Daniel Browning wrote:
> Hello folks,
>
> I ran into a segfault while running xfsdump, what should I do
> to find the cause?

xfsrestore segfaulted, not xfsdump. Can you run xfsdump to dump to
a file, then xfsrestore to restore from the file, rather than piping
one into the other? That way we can determine what went wrong
first, i.e. dump failing and restore segfaulting when the pipe was
broken, or restore segfaulting and dump stopping because the pipe
was broken.

Once you've worked out which failed, adding "-v verbose" or "-v debug"
flags to the command line might give us more indication of what
went wrong.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
* Re: xfsdump segfault
2013-02-16 6:51 ` Dave Chinner
@ 2013-02-17 0:06 ` Daniel Browning
2013-02-17 3:47 ` [solved] xfsrestore segfault (was: Re: xfsdump segfault) Daniel Browning
0 siblings, 1 reply; 4+ messages in thread
From: Daniel Browning @ 2013-02-17 0:06 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
Thanks for the quick response, Dave.

On Friday 15 February 2013 10:51:11 pm Dave Chinner wrote:
> On Fri, Feb 15, 2013 at 10:20:46PM -0800, Daniel Browning wrote:
> > Hello folks,
> >
> > I ran into a segfault while running xfsdump, what should I do
> > to find the cause?
>
> xfsrestore segfaulted, not xfsdump.

Ah, yes, that's right.

> Can you run xfsdump to dump to
> a file, then xfsrestore to restore from the file, rather than piping
> one into the other?

Done. It took a little while since it's a 2.4TiB filesystem. Here is the
last line of the "-v debug" restore (path names scrubbed for anonymity):

xfsrestore: restoring UNIX domain socket ino 128004852 backup/secondary/backups/firefly/root/backups/[...]/etc/socket
Segmentation fault (core dumped)

[root@betelgeuse ~]# dmesg | tail -n 1
xfsrestore[18838]: segfault at 10 ip 00000033fd8478de sp 00007fcad10a0e00 error 4 in libc-2.12.so[33fd800000+189000]

So it appears that a socket is causing the problem. What should I
try next?

--
DB
* [solved] xfsrestore segfault (was: Re: xfsdump segfault)
2013-02-17 0:06 ` Daniel Browning
@ 2013-02-17 3:47 ` Daniel Browning
0 siblings, 0 replies; 4+ messages in thread
From: Daniel Browning @ 2013-02-17 3:47 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Saturday 16 February 2013 4:06:02 pm Daniel Browning wrote:
> So it appears that a socket is causing the problem. What should I
> try next?

Turns out it was already fixed in git HEAD, so take this email as my
personal vote for releasing 3.1.3 sooner rather than later, for whatever
that's worth. :)

commit 59ad060e8f8c39406838c37360efb5f6c09a9041
Author: Alex Elder <elder@inktank.com>
Date:   Thu Dec 27 08:10:14 2012 -0600

    xfsdump: fix format string in restore_spec()

    Nigel Tamplin reported getting a seg fault in xfsrestore when a path
    name was too long. He correctly diagnosed that the problem was due
    to an extra "%s" format specifier in the format value passed to a
    call to mlog().

    This patch corrects that.

    Signed-off-by: Alex Elder <elder@inktank.com>
    Reported-by: Nigel Tamplin <ntamplin@codefaber.co.uk>
    Tested-by: Nigel Tamplin <ntamplin@codefaber.co.uk>
    Signed-off-by: Ben Myers <bpm@sgi.com>

diff --git a/restore/content.c b/restore/content.c
index edd00ed..54d933c 100644
--- a/restore/content.c
+++ b/restore/content.c
@@ -7796,7 +7796,7 @@ restore_spec( filehdr_t *fhdrp, rv_t *rvp, char *path )
 		if ( strlen( path ) >= sizeof( addr.sun_path )) {
 			mlog( MLOG_VERBOSE | MLOG_WARNING, _(
 			      "pathname too long for bind of "
-			      "%s ino %llu %s: %s: discarding\n"),
+			      "%s ino %llu %s: discarding\n"),
 			      printstr,
 			      fhdrp->fh_stat.bs_ino,
 			      path );

Dave Chinner helped me further on IRC and suggested gdb. When I got the
stacktrace, Nathan Scott on the channel used it to find the problem area
and the commit that fixed it. I applied just that one commit to 3.1.2,
recompiled, and confirmed that it is fixed. For the sake of completeness
I'll post the stacktrace anyway (with a few ellipses for anonymization).
Thanks again everyone,

--
DB

/sbin/xfsrestore: truncating secondary/[...]/mod_interchange.html from 0 to 22744
/sbin/xfsrestore: drive_simple read( want 32 )
/sbin/xfsrestore: drive_simple return_read_buf( returning 32 )
/sbin/xfsrestore: xlate_extenthdr
/sbin/xfsrestore: read extent hdr size 23040 offset 0 type 4 flags 00000001
/sbin/xfsrestore: read extent hdr type DATA offset 0 sz 23040 flags 1
/sbin/xfsrestore: drive_simple read( want 23040 )
/sbin/xfsrestore: drive_simple return_read_buf( returning 23040 )
/sbin/xfsrestore: drive_simple read( want 32 )
/sbin/xfsrestore: drive_simple return_read_buf( returning 32 )
/sbin/xfsrestore: xlate_extenthdr
/sbin/xfsrestore: read extent hdr size 0 offset 0 type 0 flags 00000001
/sbin/xfsrestore: read extent hdr type LAST offset 0 sz 0 flags 1
/sbin/xfsrestore: drive_simple get_mark( )
/sbin/xfsrestore: drive_simple read( want 256 )
/sbin/xfsrestore: drive_simple return_read_buf( returning 256 )
/sbin/xfsrestore: xlate_bstat
/sbin/xfsrestore: xlate_bstat: pre-xlate bs_ino 1705965962454368256 bs_mode 20060200000
/sbin/xfsrestore: xlate_bstat: post-xlate bs_ino 98814635031 bs_mode 140600
/sbin/xfsrestore: xlate_filehdr: pre-xlate fh_offset 0 fh_flags 33554432 fh_checksum 3550492224
/sbin/xfsrestore: xlate_filehdr: post-xlate fh_offset 0 fh_flags 2 fh_checksum 1077321939
/sbin/xfsrestore: read file hdr off 0 flags 0x2 ino 98814635031 mode 0x0000c180
/sbin/xfsrestore: restoring secondary/[...]/etc/socket (98814635031 2812586625)
/sbin/xfsrestore: restoring UNIX domain socket ino 98814635031 secondary/[...]/etc/socket

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff211e700 (LWP 23393)]
0x00000033fd8478de in vfprintf () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.80.el6_3.6.x86_64 glibc-2.12-1.80.el6_3.7.x86_64 libattr-2.4.44-7.el6.x86_64 libuuid-2.17.2-12.7.el6_3.x86_64
(gdb) backtrace
#0  0x00000033fd8478de in vfprintf () from /lib64/libc.so.6
#1  0x000000000041b462 in mlog_va (levelarg=<value optimized out>, fmt=0x448068 "pathname too long for bind of %s ino %llu %s: %s: discarding\n", args=0x7ffff211b530) at mlog.c:453
#2  0x000000000041b696 in mlog (levelarg=<value optimized out>, fmt=<value optimized out>) at mlog.c:365
#3  0x000000000042a1dd in restore_spec (cp=<value optimized out>, linkpr=<value optimized out>, path1=0x7fffec0008c0 "secondary/[...]/etc/socket", path2=<value optimized out>) at content.c:7797
#4  restore_file_cb (cp=<value optimized out>, linkpr=<value optimized out>, path1=0x7fffec0008c0 "secondary/[...]/etc/socket", path2=<value optimized out>) at content.c:7299
#5  0x0000000000439d10 in tree_cb_links (ino=98814635031, gen=2812586625, ctime=1358666583, mtime=1322084463, funcp=0x429560 <restore_file_cb>, contextp=0x7ffff211cca0, path1=0x7fffec0008c0 "secondary/[...]/etc/socket", path2=0x7fffec0028d0 "../custom") at tree.c:1870
#6  0x0000000000430f50 in restore_file (thrdix=0) at content.c:7219
#7  applynondirdump (thrdix=0) at content.c:3459
#8  content_stream_restore (thrdix=0) at content.c:2543
#9  0x0000000000418ee6 in childmain (arg1=<value optimized out>) at main.c:1443
#10 0x0000000000407680 in cldmgr_entry (arg1=0x657420) at cldmgr.c:235
#11 0x00000033fdc07851 in start_thread () from /lib64/libpthread.so.0
#12 0x00000033fd8e811d in clone () from /lib64/libc.so.6