From: Leslie Rhorer <lrhorer@mygrande.net>
To: Brian Foster <bfoster@redhat.com>
Cc: Eric Sandeen <sandeen@sandeen.net>,
Kris Rusocki <kszysiu@braxis.org>,
"Rhorer, Leslie" <Leslie.Rhorer@level3.com>,
"xfs@oss.sgi.com" <xfs@oss.sgi.com>
Subject: Re: XFS File system in trouble
Date: Tue, 04 Aug 2015 02:52:33 -0500
Message-ID: <55C06F41.4030502@mygrande.net>
In-Reply-To: <55BE7C75.4060604@mygrande.net>
It's failing again. The rsync job failed, and when I attempt to untar
the file in the image mount, it fails there as well. See below. I
formatted a 1.5T drive as XFS and mounted it under /media. I then
dumped the failing FS to a file on /media using xfs_metadump and used
xfs_mdrestore to create an image of the FS. I then mounted the image,
copied the tarball over to its location, and ran tar to extract the files:
RAID-Server:/# mount -o nouuid /media/md0.img /TEST
RAID-Server:/# cd "/TEST/Server-Main/Equipment/Drive Controllers/HighPoint Adapters/Rocket 2722/Driver"/
RAID-Server:/TEST/Server-Main/Equipment/Drive Controllers/HighPoint Adapters/Rocket 2722/Driver# cp "/RAID/Server-Main/Equipment/Drive Controllers/HighPoint Adapters/Rocket 2722/Driver/RR_27xx.tar.gz" ./
RAID-Server:/TEST/Server-Main/Equipment/Drive Controllers/HighPoint Adapters/Rocket 2722/Driver# tar -xzvf RR_27xx.tar.gz
DC7280/
DC7280/Linux/
DC7280/Linux/Opensource/
DC7280/Linux/Opensource/DC7280-linux-src-v1.0-110621-1313.tar.gz
DC7280/Windows/
DC7280/Windows/Vista-Win2008-Win7/
DC7280/Windows/Vista-Win2008-Win7/x32/
DC7280/Windows/Vista-Win2008-Win7/x32/dc7280.cat
DC7280/Windows/Vista-Win2008-Win7/x32/dc7280.inf
DC7280/Windows/Vista-Win2008-Win7/x32/dc7280.sys
DC7280/Windows/Vista-Win2008-Win7/x64/
DC7280/Windows/Vista-Win2008-Win7/x64/dc7280.cat
DC7280/Windows/Vista-Win2008-Win7/x64/dc7280.inf
DC7280/Windows/Vista-Win2008-Win7/x64/dc7280.sys
DC7280/Windows/Vista-Win2008-Win7/Readme.txt
DC7280/.ddinfo
R272x/
R272x/Linux/
R272x/Linux/Opensource/
R272x/Linux/Opensource/partial/
R272x/Linux/Opensource/partial/include/
...
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-i386/pcitable
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-i386/readme.txt
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-i386/rhdd
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-i386/rhel-install-step1.sh
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-i386/rhel-install-step2.sh
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/
tar: RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64: Cannot mkdir: Structure needs cleaning
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/install.sh
tar: RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64: Cannot mkdir: Input/output error
tar: RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/install.sh: Cannot open: No such file or directory
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/installmethod.py
tar: RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64: Cannot mkdir: Input/output error
tar: RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/installmethod.py: Cannot open: No such file or directory
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/modinfo
tar: RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64: Cannot mkdir: Input/output error
tar: RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/modinfo: Cannot open: No such file or directory
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/modules.alias
tar: RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64: Cannot mkdir: Input/output error
tar: RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/modules.alias: Cannot open: No such file or directory
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/modules.cgz
gzip: tar:
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64: Cannot
mkdir: Input/output errorstdin: Input/output error
tar: Unexpected EOF in archive
tar: RR274x/Driver/Linux/RHEL_CentOS: Cannot utime: Input/output error
tar: RR274x/Driver/Linux/RHEL_CentOS: Cannot change ownership to uid 0, gid 1000: Input/output error
tar: RR274x/Driver/Linux/RHEL_CentOS: Cannot change mode to rwxr-xr-x: Input/output error
tar: RR274x/Driver/Linux: Cannot utime: Input/output error
tar: RR274x/Driver/Linux: Cannot change ownership to uid 0, gid 1000: Input/output error
tar: RR274x/Driver/Linux: Cannot change mode to rwxr-xr-x: Input/output error
tar: RR274x/Driver: Cannot utime: Input/output error
tar: RR274x/Driver: Cannot change ownership to uid 0, gid 1000: Input/output error
tar: RR274x/Driver: Cannot change mode to rwxr-xr-x: Input/output error
tar: RR274x: Cannot utime: Input/output error
tar: RR274x: Cannot change ownership to uid 0, gid 1000: Input/output error
tar: RR274x: Cannot change mode to rwxr-xr-x: Input/output error
tar: Error is not recoverable: exiting now
dmesg:
[131329.013475] XFS (md0): Mounting V4 Filesystem
[131329.918438] XFS (md0): Ending clean mount
[131499.357099] XFS (md0): Mounting V4 Filesystem
[131499.709248] XFS (md0): Ending clean mount
[131874.545344] loop: module loaded
[131874.549914] XFS (loop0): Mounting V4 Filesystem
[131874.555540] XFS (loop0): Ending clean mount
[132020.964431] XFS (loop0): xfs_iread: validation failed for inode 124656869424 failed
[132020.964435] ffff88028b078000: 49 4e 00 00 03 02 00 00 00 30 00 70 00 00 03 e8 IN.......0.p....
[132020.964437] ffff88028b078010: 00 00 00 00 06 20 b0 6f 01 2e 00 00 00 00 00 16 ..... .o........
[132020.964438] ffff88028b078020: 01 57 37 fd 2b 5d 22 9e 1e 0a 61 8c 00 00 00 20 .W7.+]"...a....
[132020.964440] ffff88028b078030: ff ff 00 d2 1b f6 27 90 00 00 00 00 00 00 00 00 ......'.........
[132020.964454] XFS (loop0): Internal error xfs_iread at line 392 of file /build/linux-QZaPpC/linux-3.16.7-ckt11/fs/xfs/xfs_inode_buf.c. Caller xfs_iget+0x24b/0x690 [xfs]
[132020.964457] CPU: 2 PID: 21474 Comm: tar Not tainted 3.16.0-4-amd64 #1 Debian 3.16.7-ckt11-1
[132020.964459] Hardware name: To be filled by O.E.M. To be filled by O.E.M./SABERTOOTH 990FX R2.0, BIOS 1503 01/11/2013
[132020.964460] 0000000000000001 ffffffff8150b405 ffff880424059800 ffffffffa09115cb
[132020.964463] 0000018800000010 ffffffffa0916f6b ffff88030f5c6c00 ffff880424059800
[132020.964465] 0000000000000075 ffff8800ad1afe98 ffffffffa095cb3a ffffffffa0916f6b
[132020.964467] Call Trace:
[132020.964471] [<ffffffff8150b405>] ? dump_stack+0x41/0x51
[132020.964478] [<ffffffffa09115cb>] ? xfs_corruption_error+0x5b/0x80 [xfs]
[132020.964483] [<ffffffffa0916f6b>] ? xfs_iget+0x24b/0x690 [xfs]
[132020.964492] [<ffffffffa095cb3a>] ? xfs_iread+0xea/0x400 [xfs]
[132020.964497] [<ffffffffa0916f6b>] ? xfs_iget+0x24b/0x690 [xfs]
[132020.964503] [<ffffffffa0916f6b>] ? xfs_iget+0x24b/0x690 [xfs]
[132020.964511] [<ffffffffa0956de6>] ? xfs_ialloc+0xa6/0x500 [xfs]
[132020.964517] [<ffffffffa092658e>] ? kmem_zone_alloc+0x6e/0xe0 [xfs]
[132020.964525] [<ffffffffa09572a2>] ? xfs_dir_ialloc+0x62/0x2a0 [xfs]
[132020.964531] [<ffffffffa09251e5>] ? xfs_trans_reserve+0x1f5/0x200 [xfs]
[132020.964538] [<ffffffffa09579a9>] ? xfs_create+0x489/0x700 [xfs]
[132020.964541] [<ffffffff811b40ea>] ? kern_path_create+0xaa/0x190
[132020.964548] [<ffffffffa091c5ea>] ? xfs_generic_create+0xca/0x250 [xfs]
[132020.964550] [<ffffffff811b7ad0>] ? vfs_mkdir+0xb0/0x160
[132020.964551] [<ffffffff811b868b>] ? SyS_mkdirat+0xab/0xe0
[132020.964554] [<ffffffff815115cd>] ? system_call_fast_compare_end+0x10/0x15
[132020.964555] XFS (loop0): Corruption detected. Unmount and run xfs_repair
[132020.964564] XFS (loop0): Internal error xfs_trans_cancel at line 959 of file /build/linux-QZaPpC/linux-3.16.7-ckt11/fs/xfs/xfs_trans.c. Caller xfs_create+0x2b2/0x700 [xfs]
[132020.964566] CPU: 2 PID: 21474 Comm: tar Not tainted 3.16.0-4-amd64 #1 Debian 3.16.7-ckt11-1
[132020.964567] Hardware name: To be filled by O.E.M. To be filled by O.E.M./SABERTOOTH 990FX R2.0, BIOS 1503 01/11/2013
[132020.964568] 000000000000000c ffffffff8150b405 ffff8800ad1afe98 ffffffffa0925e07
[132020.964570] ffff880002530800 ffff880079e03ec8 ffff880424059800 ffffffffa09577d2
[132020.964571] 0000000000000001 ffff880079e03e20 ffff880079e03e1c ffff880079e03eb0
[132020.964573] Call Trace:
[132020.964575] [<ffffffff8150b405>] ? dump_stack+0x41/0x51
[132020.964581] [<ffffffffa0925e07>] ? xfs_trans_cancel+0xc7/0xf0 [xfs]
[132020.964588] [<ffffffffa09577d2>] ? xfs_create+0x2b2/0x700 [xfs]
[132020.964590] [<ffffffff811b40ea>] ? kern_path_create+0xaa/0x190
[132020.964596] [<ffffffffa091c5ea>] ? xfs_generic_create+0xca/0x250 [xfs]
[132020.964598] [<ffffffff811b7ad0>] ? vfs_mkdir+0xb0/0x160
[132020.964600] [<ffffffff811b868b>] ? SyS_mkdirat+0xab/0xe0
[132020.964602] [<ffffffff815115cd>] ? system_call_fast_compare_end+0x10/0x15
[132020.964604] XFS (loop0): xfs_do_force_shutdown(0x8) called from line 960 of file /build/linux-QZaPpC/linux-3.16.7-ckt11/fs/xfs/xfs_trans.c. Return address = 0xffffffffa0925e20
[132021.196487] XFS (loop0): Corruption of in-memory data detected. Shutting down filesystem
[132021.196491] XFS (loop0): Please umount the filesystem and rectify the problem(s)
[132024.791456] XFS (loop0): xfs_log_force: error 5 returned.
[132054.854625] XFS (loop0): xfs_log_force: error 5 returned.
[132084.917775] XFS (loop0): xfs_log_force: error 5 returned.
[132114.980927] XFS (loop0): xfs_log_force: error 5 returned.
[132145.044086] XFS (loop0): xfs_log_force: error 5 returned.
[132175.107307] XFS (loop0): xfs_log_force: error 5 returned.
[132205.170404] XFS (loop0): xfs_log_force: error 5 returned.
[132235.233587] XFS (loop0): xfs_log_force: error 5 returned.
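
For anyone reading the hex dump in the dmesg output above, here is a minimal Python sketch that unpacks the first 16 bytes of the inode buffer. The field order (magic, mode, version, format, onlink, uid, gid, all big-endian) is an assumption about the XFS on-disk inode layout, not something stated in this thread, and decode_inode_head is a hypothetical helper name.

```python
import struct

# First 16 bytes of the inode buffer, copied from the dmesg hex dump.
DUMP_LINE = "49 4e 00 00 03 02 00 00 00 30 00 70 00 00 03 e8"


def decode_inode_head(hex_line):
    """Unpack the leading inode fields.

    ASSUMPTION: the buffer starts with a big-endian inode core laid out as
    magic(2) mode(2) version(1) format(1) onlink(2) uid(4) gid(4).
    """
    raw = bytes.fromhex(hex_line)
    magic, mode, version, fmt, onlink, uid, gid = struct.unpack(">2sHBBHII", raw)
    return {
        "magic": magic,
        "mode": mode,
        "version": version,
        "format": fmt,
        "onlink": onlink,
        "uid": uid,
        "gid": gid,
    }


fields = decode_inode_head(DUMP_LINE)
for name, value in fields.items():
    print(f"{name:8} {value!r}")
```

If the layout assumption holds, the gid field decodes to 1000, the same gid that appears in the tar messages. The remaining fields are probably best compared against what xfs_db shows for the same inode rather than trusted from this sketch.
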
On 8/2/2015 3:24 PM, Leslie Rhorer wrote:
>
> OK, this is goofy. It seems to be working, now. As usual, I've
> been doing some work on the server this weekend, but I can't think of
> anything I have done that would fix the issue. I did replace the
> remaining good 4G RAM module with a pair of 8G RAM modules, but memtest
> reported the remaining 4G module as good, and I verified the removed
> module really was bad. I also replaced the removable drive carrier and
> cables that were feeding the two SSDs, one of which was reporting
> failures as noted in the syslog. It's hard for me to believe either of
> those things could have been causing the issue, though.
>
> I attached a 1.5T external drive to the server and formatted it as
> XFS in preparation to continue troubleshooting. To make sure of things,
> I tried decompressing the tarball, again, and this time it worked all
> the way to the end. I then deleted the entire directory structure
> created by the tarball and decompressed the file again twice. I'll see
> if the rsync process works. That will take a couple of days.
>
> On 7/28/2015 5:11 PM, Brian Foster wrote:
>> On Tue, Jul 28, 2015 at 10:13:01AM -0500, Leslie Rhorer wrote:
>>> On 7/28/2015 7:33 AM, Brian Foster wrote:
>>>> On Tue, Jul 28, 2015 at 02:46:45AM -0500, Leslie Rhorer wrote:
>>>>> On 7/20/2015 6:17 AM, Brian Foster wrote:
>>>>>> On Sat, Jul 18, 2015 at 08:02:50PM -0500, Leslie Rhorer wrote:
>>>>>>>
>> ...
>>>>
>>>>> I then copied both the tarball and the image over to the root, and
>>>>> while the system would not let me create the image on the root, it
>>>>> did let me copy the image to the root. I then umounted the RAID
>>>>> array, mounted the image, and attempted to cd to the original
>>>>> directory in the image mount where the tarball was saved. That
>>>>> failed with an I/O error:
>>>>>
>>>>
>>>> It sounds a bit strange for the mdrestore to fail on root but a cp of
>>>> the resulting image to work. Do the resulting images have the same file
>>>> size or is the rootfs copy truncated? If the latter, you could be
>>>> missing part of the fs and thus any of the following tests are probably
>>>> moot.
>>>
>>> Well, it can't be as large as it is reported, let's put it that way,
>>> although the reported file size is the same. ls claims it to be 16T in
>>> size, which cannot be the case on a 100G partition. I forgot to
>>> mention cp does complain:
>>>
>>> RAID-Server:/# cp /RAID/TEST/RAIDfile.img ./
>>> cp: cannot lseek ‘./RAIDfile.img’: Invalid argument
>>>
>>> But it does the same thing on the backup server, and it works there. I
>>> tried a cmp, and it seems to be hung. It just may be taking a long
>>> time, however.
>>>
>>
>> Yeah, you can't really trust the resulting image. It doesn't take much
>> space to create a very large sparse file, but different filesystems have
>> different maximum file size limits. The problem here is that some
>> metadata near the beginning of the file might reference or depend on
>> something near the end, and I/Os beyond the end of the file will
>> probably result in errors.
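
A minimal Python sketch of the sparse-file point above, assuming a Unix-like system where st_blocks is counted in 512-byte units. The file name demo.img, the 64 MiB size, and the helper sparse_report are made up for illustration and are not from this thread. It creates a sparse file and compares the size ls would report with the space actually allocated.

```python
import os
import tempfile

DEMO_SIZE = 64 * 1024 * 1024  # 64 MiB apparent size, chosen arbitrarily


def sparse_report(path):
    """Return (apparent_bytes, allocated_bytes) for a file."""
    st = os.stat(path)
    # ASSUMPTION: st_blocks is in 512-byte units, as on Linux.
    return st.st_size, st.st_blocks * 512


with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.img")
    with open(demo, "wb") as f:
        # Extending a file with truncate() writes no data, so filesystems
        # that support sparse files allocate little or nothing for it.
        f.truncate(DEMO_SIZE)
    apparent, allocated = sparse_report(demo)
    print(f"apparent size:  {apparent} bytes")
    print(f"allocated size: {allocated} bytes")
```

Running the same comparison on the original image and on the copy on the rootfs would show whether the two differ in apparent size, which is a separate question from how much disk space each one occupies.
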
>>
>> I'd probably try the nouuid approach since the hardware is similar as
>> well as some of the other interesting suggestions that have been made to
>> try and get the image on the rootfs and see what happens there too.
>>
>> Brian
>>
>>>> Brian
>>>>
>>>>> RAID-Server:/# cd "/media/Server-Main/Equipment/Drive Controllers/HighPoint Adapters/Rocket 2722/Driver/"
>>>>> bash: cd: /media/Server-Main/Equipment/Drive Controllers/HighPoint Adapters/Rocket 2722/Driver/: Input/output error
>>>>>
>>>>> I changed directories to a point two directories above the previous
>>>>> attempt and did a long listing:
>>>>>
>>>>> RAID-Server:/# cd "/media/Server-Main/Equipment/Drive Controllers/HighPoint Adapters"
>>>>> RAID-Server:/media/Server-Main/Equipment/Drive Controllers/HighPoint Adapters# ll
>>>>> ls: cannot access RocketRAID 2722: Input/output error
>>>>> total 4
>>>>> drwxr-xr-x 6 root lrhorer 4096 Jul 18 19:26 Rocket 2722
>>>>> ?????????? ? ? ? ? ? RocketRAID 2722
>>>>>
>>>>> As you can see, Rocket 2722 is still there, but RocketRAID 2722 is
>>>>> very sick. Rocket 2722 is the parent of where the tarball was,
>>>>> however, so I did a cd and an ll again:
>>>>>
>>>>> RAID-Server:/media/Server-Main/Equipment/Drive Controllers/HighPoint Adapters# cd "Rocket 2722"/
>>>>> RAID-Server:/media/Server-Main/Equipment/Drive Controllers/HighPoint Adapters/Rocket 2722# ll
>>>>> ls: cannot access BIOS: Input/output error
>>>>> ls: cannot access Driver: Input/output error
>>>>> ls: cannot access HighPoint RAID Management Software: Input/output error
>>>>> ls: cannot access Manual: Input/output error
>>>>> total 248
>>>>> -rwxr--r-- 1 root lrhorer 245760 Nov 20 2008 autorun.exe
>>>>> -rwxr--r-- 1 root lrhorer 51 Mar 21 2001 autorun.inf
>>>>> ?????????? ? ? ? ? ? BIOS
>>>>> ?????????? ? ? ? ? ? Driver
>>>>> ?????????? ? ? ? ? ? HighPoint RAID Management Software
>>>>> ?????????? ? ? ? ? ? Manual
>>>>> -rwxr--r-- 1 root lrhorer 1134 Feb 5 2012 readme.txt
>>>>>
>>>>> So now, what?
>>>>>
>>>>> _______________________________________________
>>>>> xfs mailing list
>>>>> xfs@oss.sgi.com
>>>>> http://oss.sgi.com/mailman/listinfo/xfs
>>>>
>>>
>>
>