public inbox for linux-xfs@vger.kernel.org
From: Leslie Rhorer <lrhorer@mygrande.net>
To: Brian Foster <bfoster@redhat.com>
Cc: Eric Sandeen <sandeen@sandeen.net>,
	Kris Rusocki <kszysiu@braxis.org>,
	"Rhorer, Leslie" <Leslie.Rhorer@level3.com>,
	"xfs@oss.sgi.com" <xfs@oss.sgi.com>
Subject: Re: XFS File system in trouble
Date: Tue, 04 Aug 2015 02:52:33 -0500
Message-ID: <55C06F41.4030502@mygrande.net>
In-Reply-To: <55BE7C75.4060604@mygrande.net>

	It's failing again.  The rsync job failed, and when I untar the file
inside the image mount it fails there as well.  See below.  I formatted
a 1.5T drive as XFS and mounted it under /media.  I then dumped the
failing FS to a file on /media using xfs_metadump and used
xfs_mdrestore to create an image of the FS.  I then mounted the image,
copied the tarball to its original location inside the image, and ran
tar to extract the files:

RAID-Server:/# mount -o nouuid /media/md0.img /TEST

RAID-Server:/# cd "/TEST/Server-Main/Equipment/Drive 
Controllers/HighPoint Adapters/Rocket 2722/Driver"/

RAID-Server:/TEST/Server-Main/Equipment/Drive Controllers/HighPoint 
Adapters/Rocket 2722/Driver# cp "/RAID/Server-Main/Equipment/Drive 
Controllers/HighPoint Adapters/Rocket 2722/Driver/RR_27xx.tar.gz" ./

RAID-Server:/TEST/Server-Main/Equipment/Drive Controllers/HighPoint 
Adapters/Rocket 2722/Driver# tar -xzvf RR_27xx.tar.gz
DC7280/
DC7280/Linux/
DC7280/Linux/Opensource/
DC7280/Linux/Opensource/DC7280-linux-src-v1.0-110621-1313.tar.gz
DC7280/Windows/
DC7280/Windows/Vista-Win2008-Win7/
DC7280/Windows/Vista-Win2008-Win7/x32/
DC7280/Windows/Vista-Win2008-Win7/x32/dc7280.cat
DC7280/Windows/Vista-Win2008-Win7/x32/dc7280.inf
DC7280/Windows/Vista-Win2008-Win7/x32/dc7280.sys
DC7280/Windows/Vista-Win2008-Win7/x64/
DC7280/Windows/Vista-Win2008-Win7/x64/dc7280.cat
DC7280/Windows/Vista-Win2008-Win7/x64/dc7280.inf
DC7280/Windows/Vista-Win2008-Win7/x64/dc7280.sys
DC7280/Windows/Vista-Win2008-Win7/Readme.txt
DC7280/.ddinfo
R272x/
R272x/Linux/
R272x/Linux/Opensource/
R272x/Linux/Opensource/partial/
R272x/Linux/Opensource/partial/include/

...

RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-i386/pcitable
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-i386/readme.txt
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-i386/rhdd
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-i386/rhel-install-step1.sh
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-i386/rhel-install-step2.sh
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/
tar: RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64: 
Cannot mkdir: Structure needs cleaning
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/install.sh
tar: RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64: 
Cannot mkdir: Input/output error
tar: 
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/install.sh: 
Cannot open: No such file or directory
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/installmethod.py
tar: RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64: 
Cannot mkdir: Input/output error
tar: 
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/installmethod.py: 
Cannot open: No such file or directory
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/modinfo
tar: RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64: 
Cannot mkdir: Input/output error
tar: 
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/modinfo: Cannot 
open: No such file or directory
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/modules.alias
tar: RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64: 
Cannot mkdir: Input/output error
tar: 
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/modules.alias: 
Cannot open: No such file or directory
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/modules.cgz

gzip: tar: 
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64: Cannot 
mkdir: Input/output errorstdin: Input/output error

tar: Unexpected EOF in archive
tar: RR274x/Driver/Linux/RHEL_CentOS: Cannot utime: Input/output error
tar: RR274x/Driver/Linux/RHEL_CentOS: Cannot change ownership to uid 0, 
gid 1000: Input/output error
tar: RR274x/Driver/Linux/RHEL_CentOS: Cannot change mode to rwxr-xr-x: 
Input/output error
tar: RR274x/Driver/Linux: Cannot utime: Input/output error
tar: RR274x/Driver/Linux: Cannot change ownership to uid 0, gid 1000: 
Input/output error
tar: RR274x/Driver/Linux: Cannot change mode to rwxr-xr-x: Input/output 
error
tar: RR274x/Driver: Cannot utime: Input/output error
tar: RR274x/Driver: Cannot change ownership to uid 0, gid 1000: 
Input/output error
tar: RR274x/Driver: Cannot change mode to rwxr-xr-x: Input/output error
tar: RR274x: Cannot utime: Input/output error
tar: RR274x: Cannot change ownership to uid 0, gid 1000: Input/output error
tar: RR274x: Cannot change mode to rwxr-xr-x: Input/output error
tar: Error is not recoverable: exiting now
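
For reference, the steps that produced /media/md0.img look roughly like the sketch below.  The source device /dev/md0 is inferred from the dmesg output, and the dump file name md0.metadump, the -o switch, and the DRY_RUN wrapper are assumptions made for illustration; only /media/md0.img, /TEST, and the nouuid option come from the commands above.  xfs_metadump is meant to be run against an unmounted, read-only, or frozen filesystem.

```shell
#!/bin/sh
# Sketch: copy XFS metadata into an image file and mount it for testing.
# SRC_DEV and DUMP are assumptions; IMG and MNT are the names used above.
SRC_DEV=${SRC_DEV:-/dev/md0}
DUMP=${DUMP:-/media/md0.metadump}
IMG=${IMG:-/media/md0.img}
MNT=${MNT:-/TEST}

# With DRY_RUN=1 (the default here) commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

make_test_image() {
    # -o keeps real file names; by default xfs_metadump obfuscates them.
    run xfs_metadump -o "$SRC_DEV" "$DUMP" || return 1
    # Rebuild a filesystem image (metadata only) from the dump.
    run xfs_mdrestore "$DUMP" "$IMG" || return 1
    # nouuid: the image carries the same UUID as the original, which XFS
    # would otherwise refuse to mount alongside it.
    run mount -o nouuid "$IMG" "$MNT"
}

make_test_image
```

As written this only prints the three commands.  Setting DRY_RUN=0 and running it as root would execute them.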


dmesg:
[131329.013475] XFS (md0): Mounting V4 Filesystem
[131329.918438] XFS (md0): Ending clean mount
[131499.357099] XFS (md0): Mounting V4 Filesystem
[131499.709248] XFS (md0): Ending clean mount
[131874.545344] loop: module loaded
[131874.549914] XFS (loop0): Mounting V4 Filesystem
[131874.555540] XFS (loop0): Ending clean mount
[132020.964431] XFS (loop0): xfs_iread: validation failed for inode 
124656869424 failed
[132020.964435] ffff88028b078000: 49 4e 00 00 03 02 00 00 00 30 00 70 00 
00 03 e8  IN.......0.p....
[132020.964437] ffff88028b078010: 00 00 00 00 06 20 b0 6f 01 2e 00 00 00 
00 00 16  ..... .o........
[132020.964438] ffff88028b078020: 01 57 37 fd 2b 5d 22 9e 1e 0a 61 8c 00 
00 00 20  .W7.+]"...a....
[132020.964440] ffff88028b078030: ff ff 00 d2 1b f6 27 90 00 00 00 00 00 
00 00 00  ......'.........
[132020.964454] XFS (loop0): Internal error xfs_iread at line 392 of 
file /build/linux-QZaPpC/linux-3.16.7-ckt11/fs/xfs/xfs_inode_buf.c. 
Caller xfs_iget+0x24b/0x690 [xfs]
[132020.964457] CPU: 2 PID: 21474 Comm: tar Not tainted 3.16.0-4-amd64 
#1 Debian 3.16.7-ckt11-1
[132020.964459] Hardware name: To be filled by O.E.M. To be filled by 
O.E.M./SABERTOOTH 990FX R2.0, BIOS 1503 01/11/2013
[132020.964460]  0000000000000001 ffffffff8150b405 ffff880424059800 
ffffffffa09115cb
[132020.964463]  0000018800000010 ffffffffa0916f6b ffff88030f5c6c00 
ffff880424059800
[132020.964465]  0000000000000075 ffff8800ad1afe98 ffffffffa095cb3a 
ffffffffa0916f6b
[132020.964467] Call Trace:
[132020.964471]  [<ffffffff8150b405>] ? dump_stack+0x41/0x51
[132020.964478]  [<ffffffffa09115cb>] ? xfs_corruption_error+0x5b/0x80 [xfs]
[132020.964483]  [<ffffffffa0916f6b>] ? xfs_iget+0x24b/0x690 [xfs]
[132020.964492]  [<ffffffffa095cb3a>] ? xfs_iread+0xea/0x400 [xfs]
[132020.964497]  [<ffffffffa0916f6b>] ? xfs_iget+0x24b/0x690 [xfs]
[132020.964503]  [<ffffffffa0916f6b>] ? xfs_iget+0x24b/0x690 [xfs]
[132020.964511]  [<ffffffffa0956de6>] ? xfs_ialloc+0xa6/0x500 [xfs]
[132020.964517]  [<ffffffffa092658e>] ? kmem_zone_alloc+0x6e/0xe0 [xfs]
[132020.964525]  [<ffffffffa09572a2>] ? xfs_dir_ialloc+0x62/0x2a0 [xfs]
[132020.964531]  [<ffffffffa09251e5>] ? xfs_trans_reserve+0x1f5/0x200 [xfs]
[132020.964538]  [<ffffffffa09579a9>] ? xfs_create+0x489/0x700 [xfs]
[132020.964541]  [<ffffffff811b40ea>] ? kern_path_create+0xaa/0x190
[132020.964548]  [<ffffffffa091c5ea>] ? xfs_generic_create+0xca/0x250 [xfs]
[132020.964550]  [<ffffffff811b7ad0>] ? vfs_mkdir+0xb0/0x160
[132020.964551]  [<ffffffff811b868b>] ? SyS_mkdirat+0xab/0xe0
[132020.964554]  [<ffffffff815115cd>] ? 
system_call_fast_compare_end+0x10/0x15
[132020.964555] XFS (loop0): Corruption detected. Unmount and run xfs_repair
[132020.964564] XFS (loop0): Internal error xfs_trans_cancel at line 959 
of file /build/linux-QZaPpC/linux-3.16.7-ckt11/fs/xfs/xfs_trans.c. 
Caller xfs_create+0x2b2/0x700 [xfs]
[132020.964566] CPU: 2 PID: 21474 Comm: tar Not tainted 3.16.0-4-amd64 
#1 Debian 3.16.7-ckt11-1
[132020.964567] Hardware name: To be filled by O.E.M. To be filled by 
O.E.M./SABERTOOTH 990FX R2.0, BIOS 1503 01/11/2013
[132020.964568]  000000000000000c ffffffff8150b405 ffff8800ad1afe98 
ffffffffa0925e07
[132020.964570]  ffff880002530800 ffff880079e03ec8 ffff880424059800 
ffffffffa09577d2
[132020.964571]  0000000000000001 ffff880079e03e20 ffff880079e03e1c 
ffff880079e03eb0
[132020.964573] Call Trace:
[132020.964575]  [<ffffffff8150b405>] ? dump_stack+0x41/0x51
[132020.964581]  [<ffffffffa0925e07>] ? xfs_trans_cancel+0xc7/0xf0 [xfs]
[132020.964588]  [<ffffffffa09577d2>] ? xfs_create+0x2b2/0x700 [xfs]
[132020.964590]  [<ffffffff811b40ea>] ? kern_path_create+0xaa/0x190
[132020.964596]  [<ffffffffa091c5ea>] ? xfs_generic_create+0xca/0x250 [xfs]
[132020.964598]  [<ffffffff811b7ad0>] ? vfs_mkdir+0xb0/0x160
[132020.964600]  [<ffffffff811b868b>] ? SyS_mkdirat+0xab/0xe0
[132020.964602]  [<ffffffff815115cd>] ? 
system_call_fast_compare_end+0x10/0x15
[132020.964604] XFS (loop0): xfs_do_force_shutdown(0x8) called from line 
960 of file /build/linux-QZaPpC/linux-3.16.7-ckt11/fs/xfs/xfs_trans.c. 
Return address = 0xffffffffa0925e20
[132021.196487] XFS (loop0): Corruption of in-memory data detected. 
Shutting down filesystem
[132021.196491] XFS (loop0): Please umount the filesystem and rectify 
the problem(s)
[132024.791456] XFS (loop0): xfs_log_force: error 5 returned.
[132054.854625] XFS (loop0): xfs_log_force: error 5 returned.
[132084.917775] XFS (loop0): xfs_log_force: error 5 returned.
[132114.980927] XFS (loop0): xfs_log_force: error 5 returned.
[132145.044086] XFS (loop0): xfs_log_force: error 5 returned.
[132175.107307] XFS (loop0): xfs_log_force: error 5 returned.
[132205.170404] XFS (loop0): xfs_log_force: error 5 returned.
[132235.233587] XFS (loop0): xfs_log_force: error 5 returned.
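
To make the inode hex dump above easier to read, here is a small sketch that splits its first 16 bytes into the leading fields of the on-disk inode core.  The field order (magic, mode, version, format, onlink, uid, gid, all big-endian) is my reading of the XFS on-disk format and should be treated as an assumption, as should the function name decode_dinode_head.

```shell
#!/bin/sh
# Sketch: decode the first 16 bytes of the inode dump from dmesg.
# Assumed layout (big-endian): magic(2) mode(2) version(1) format(1)
# onlink(2) uid(4) gid(4).
decode_dinode_head() {
    # $1: the 16 bytes as space-separated hex pairs
    set -- $1
    [ $# -ge 16 ] || { echo "need 16 bytes" >&2; return 1; }
    # Print the two magic bytes as ASCII via octal escapes.
    magic=$(printf "\\$(printf '%03o' "0x$1")\\$(printf '%03o' "0x$2")")
    echo "magic=$magic"
    printf 'mode=%o\n' "0x$3$4"
    echo "version=$((0x$5))"
    echo "format=$((0x$6))"
    echo "onlink=$((0x$7$8))"
    echo "uid=$((0x$9${10}${11}${12}))"
    echo "gid=$((0x${13}${14}${15}${16}))"
}

decode_dinode_head "49 4e 00 00 03 02 00 00 00 30 00 70 00 00 03 e8"
```

On the bytes above this reports magic IN, mode 0, version 3, format 2, and gid 1000.  The gid matches the "gid 1000" in the tar messages.  A version of 3 looks odd next to "Mounting V4 Filesystem", but whether that is what trips the validation is only a guess on my part.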


On 8/2/2015 3:24 PM, Leslie Rhorer wrote:
>
>      OK, this is goofy.  It seems to be working, now.  As usual, I've
> been doing some work on the server this weekend, but I can't think of
> anything I have done that would fix the issue.  I did replace the
> remaining good 4G RAM module with a pair of 8G RAM modules, but memtest
> reported the remaining 4G module as good, and I verified the removed
> module really was bad.  I also replaced the removable drive carrier and
> cables that were feeding the two SSDs, one of which was reporting
> failures as noted in the syslog.  It's hard for me to believe either of
> those things could have been causing the issue, though.
>
>      I attached a 1.5T external drive to the server and formatted it as
> XFS in preparation to continue troubleshooting.  To make sure of things,
> I tried decompressing the tarball, again, and this time it worked all
> the way to the end.  I then deleted the entire directory structure
> created by the tarball and decompressed the file again twice.  I'll see
> if the rsync process works.  That will take a couple of days.
>
> On 7/28/2015 5:11 PM, Brian Foster wrote:
>> On Tue, Jul 28, 2015 at 10:13:01AM -0500, Leslie Rhorer wrote:
>>> On 7/28/2015 7:33 AM, Brian Foster wrote:
>>>> On Tue, Jul 28, 2015 at 02:46:45AM -0500, Leslie Rhorer wrote:
>>>>> On 7/20/2015 6:17 AM, Brian Foster wrote:
>>>>>> On Sat, Jul 18, 2015 at 08:02:50PM -0500, Leslie Rhorer wrote:
>>>>>>>
>> ...
>>>>
>>>>>     I then copied both the tarball and the image over to the root,
>>>>> and while
>>>>> the system would not let me create the image on the root, it did
>>>>> let me copy
>>>>> the image to the root.  I then umounted the RAID array, mounted the
>>>>> image,
>>>>> and attempted to cd to the original directory in the image mount
>>>>> where the
>>>>> tarball was saved.  That failed with an I/O error:
>>>>>
>>>>
>>>> It sounds a bit strange for the mdrestore to fail on root but a cp of
>>>> the resulting image to work. Do the resulting images have the same file
>>>> size or is the rootfs copy truncated? If the latter, you could be
>>>> missing part of the fs and thus any of the following tests are probably
>>>> moot.
>>>
>>>     Well, it can't be as large as it is reported, let's put it that way,
>>> although the reported file size is the same.  Ls claims it to be 16T in
>>> size, which cannot be the case on a 100G partition.  I forgot to
>>> mention cp
>>> does complain:
>>>
>>> RAID-Server:/# cp /RAID/TEST/RAIDfile.img ./
>>> cp: cannot lseek ‘./RAIDfile.img’: Invalid argument
>>>
>>>     But it does the same thing on the backup server, and it works
>>> there.  I
>>> tried a cmp, and it seems to be hung.  It just may be taking a long
>>> time,
>>> however.
>>>
>>
>> Yeah, you can't really trust the resulting image. It doesn't take much
>> space to create a very large sparse file, but different filesystems have
>> different maximum file size limits. The problem here is that some
>> metadata near the beginning of the file might reference or depend on
>> something near the end, and I/Os beyond the end of the file will
>> probably result in errors.
>>
>> I'd probably try the nouuid approach since the hardware is similar as
>> well as some of the other interesting suggestions that have been made to
>> try and get the image on the rootfs and see what happens there too.
>>
>> Brian
>>
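
On the truncation question quoted just above, one way to tell a sparse image from a truncated one is to compare the apparent size with the space actually allocated.  A minimal sketch, assuming GNU coreutils stat; the function name image_sizes is made up, and the script expects the image path as its first argument:

```shell
#!/bin/sh
# Sketch: report apparent vs. allocated size of an image file.
# Assumes GNU stat: %s = size in bytes, %b = allocated blocks,
# %B = size in bytes of each block counted by %b.
image_sizes() {
    info=$(stat -c '%s %b %B' "$1") || return 1
    set -- $info
    echo "apparent=$1 allocated=$(( $2 * $3 ))"
}

# Only run when a path was given, e.g.: sh sizes.sh /media/md0.img
if [ $# -gt 0 ]; then
    image_sizes "$1"
fi
```

Running it on both copies of the image and comparing the apparent sizes would show whether the rootfs copy was cut short.  A much smaller allocated figure on its own only means the file is sparse.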
>>>> Brian
>>>>
>>>>> RAID-Server:/# cd "/media/Server-Main/Equipment/Drive
>>>>> Controllers/HighPoint
>>>>> Adapters/Rocket 2722/Driver/"
>>>>> bash: cd: /media/Server-Main/Equipment/Drive Controllers/HighPoint
>>>>> Adapters/Rocket 2722/Driver/: Input/output error
>>>>>
>>>>>     I changed directories to a point two directories above the
>>>>> previous attempt
>>>>> and did a long listing:
>>>>>
>>>>> RAID-Server:/# cd "/media/Server-Main/Equipment/Drive
>>>>> Controllers/HighPoint
>>>>> Adapters"
>>>>> RAID-Server:/media/Server-Main/Equipment/Drive Controllers/HighPoint
>>>>> Adapters# ll
>>>>> ls: cannot access RocketRAID 2722: Input/output error
>>>>> total 4
>>>>> drwxr-xr-x 6 root lrhorer 4096 Jul 18 19:26 Rocket 2722
>>>>> ?????????? ? ?    ?          ?            ? RocketRAID 2722
>>>>>
>>>>>     As you can see, Rocket 2722 is still there, but RocketRAID 2722
>>>>> is very
>>>>> sick.  Rocket 2722 is the parent of where the tarball was, however,
>>>>> so I did
>>>>> a cd and an ll again:
>>>>>
>>>>> RAID-Server:/media/Server-Main/Equipment/Drive Controllers/HighPoint
>>>>> Adapters# cd "Rocket 2722"/
>>>>> RAID-Server:/media/Server-Main/Equipment/Drive Controllers/HighPoint
>>>>> Adapters/Rocket 2722# ll
>>>>> ls: cannot access BIOS: Input/output error
>>>>> ls: cannot access Driver: Input/output error
>>>>> ls: cannot access HighPoint RAID Management Software: Input/output
>>>>> error
>>>>> ls: cannot access Manual: Input/output error
>>>>> total 248
>>>>> -rwxr--r-- 1 root lrhorer 245760 Nov 20  2008 autorun.exe
>>>>> -rwxr--r-- 1 root lrhorer     51 Mar 21  2001 autorun.inf
>>>>> ?????????? ? ?    ?            ?            ? BIOS
>>>>> ?????????? ? ?    ?            ?            ? Driver
>>>>> ?????????? ? ?    ?            ?            ? HighPoint RAID
>>>>> Management
>>>>> Software
>>>>> ?????????? ? ?    ?            ?            ? Manual
>>>>> -rwxr--r-- 1 root lrhorer   1134 Feb  5  2012 readme.txt
>>>>>
>>>>>     So now, what?
>>>>>
>>>>> _______________________________________________
>>>>> xfs mailing list
>>>>> xfs@oss.sgi.com
>>>>> http://oss.sgi.com/mailman/listinfo/xfs
>>>>
>>>
>>
>


