Re: XFS Mount Recovery Failed on Root File System After Power Outage

From: "Chin Gim Leong" <CHIN_Gim_Leong@performance-computing.net>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: XFS Mount Recovery Failed on Root File System After Power Outage
Date: Mon, 3 Sep 2012 00:13:05 +0800
Message-ID: <0a1f3a3e331f28a9fb81973dc0893fb5.squirrel@emailmg.netfirms.com>

Hi Dave

I ran "xfs_repair -L" on the root file system (xfs_repair version
3.1.8).  The repair was successful.

> You misunderstood. I was asking for the messages when it
> successfully mounts and the contents of /proc/mounts when it is
> mounted, to see if barriers were disabled or not supported on your
> hardware.

The notebook is an Acer Aspire 6530G with an AMD Turion X2 RM-74; the
chipset is AMD M780G and the southbridge is AMD SB700.  The hard drive
is a Western Digital Scorpio Black SATA 320 GB, WDC WD3200BEKT-00F3T0,
at /dev/sdb.

chingl@rat:~> cat /etc/fstab
/dev/disk/by-path/pci-0000:00:11.0-scsi-1:0:0:0-part1 swap               swap     defaults                                               0 0
/dev/disk/by-path/pci-0000:00:11.0-scsi-1:0:0:0-part2 /                  xfs      defaults,logbufs=8,logbsize=256k                       1 1
/dev/disk/by-path/pci-0000:00:11.0-scsi-1:0:0:0-part3 /home              xfs      defaults,logbufs=8,logbsize=256k                       1 2
/dev/disk/by-path/pci-0000:00:11.0-scsi-0:0:0:0-part2 /windows/C         ntfs-3g  users,gid=users,fmask=133,dmask=022,locale=en_US.UTF-8 0 0
/dev/disk/by-path/pci-0000:00:11.0-scsi-0:0:0:0-part3 /windows/D         ntfs-3g  users,gid=users,fmask=133,dmask=022,locale=en_US.UTF-8 0 0
proc                                                  /proc              proc     defaults                                               0 0
sysfs                                                 /sys               sysfs    noauto                                                 0 0
debugfs                                               /sys/kernel/debug  debugfs  noauto                                                 0 0
usbfs                                                 /proc/bus/usb      usbfs    noauto                                                 0 0
devpts                                                /dev/pts           devpts   mode=0620,gid=5                                        0 0
chingl@rat:~>
chingl@rat:~> cat /proc/mounts
rootfs / rootfs rw 0 0
devtmpfs /dev devtmpfs rw,relatime,size=1761184k,nr_inodes=440296,mode=755 0 0
tmpfs /dev/shm tmpfs rw,relatime 0 0
devpts /dev/pts devpts rw,relatime,gid=5,mode=620,ptmxmode=000 0 0
/dev/sdb2 / xfs rw,relatime,attr2,logbufs=8,logbsize=256k,noquota 0 0
proc /proc proc rw,relatime 0 0
sysfs /sys sysfs rw,relatime 0 0
debugfs /sys/kernel/debug debugfs rw,relatime 0 0
/dev/sdb3 /home xfs rw,relatime,attr2,logbufs=8,logbsize=256k,noquota 0 0
fusectl /sys/fs/fuse/connections fusectl rw,relatime 0 0
securityfs /sys/kernel/security securityfs rw,relatime 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw,relatime 0 0
none /var/lib/ntp/proc proc ro,nosuid,nodev,relatime 0 0
gvfs-fuse-daemon /home/chingl/.gvfs fuse.gvfs-fuse-daemon rw,nosuid,nodev,relatime,user_id=1000,group_id=100 0 0
chingl@rat:~>

From /var/log/boot.msg:

<6>[    1.084085] ata3: SATA link down (SStatus 0 SControl 300)
<6>[    1.084172] ata4: SATA link down (SStatus 0 SControl 300)
<3>[    1.256058] ata2: softreset failed (device not ready)
<4>[    1.256067] ata2: applying SB600 PMP SRST workaround and retrying
<3>[    1.256084] ata1: softreset failed (device not ready)
<4>[    1.256095] ata1: applying SB600 PMP SRST workaround and retrying
<6>[    1.428069] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
<6>[    1.428097] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
<6>[    1.429454] ata1.00: ATA-8: Hitachi HTS543232L9A300, FB4OC40C, max UDMA/133
<6>[    1.429458] ata1.00: 625142448 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
<6>[    1.430923] ata1.00: configured for UDMA/133
<5>[    1.431164] scsi 0:0:0:0: Direct-Access     ATA      Hitachi HTS54323 FB4O PQ: 0 ANSI: 5
<5>[    1.431560] sd 0:0:0:0: [sda] 625142448 512-byte logical blocks: (320 GB/298 GiB)
<5>[    1.431672] sd 0:0:0:0: [sda] Write Protect is off
<7>[    1.431675] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
<5>[    1.431915] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
<6>[    1.441064] ata2.00: ATA-8: WDC WD3200BEKT-00F3T0, 11.01A11, max UDMA/133
<6>[    1.441067] ata2.00: 625142448 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
<6>[    1.442978] ata2.00: configured for UDMA/133
<5>[    1.443207] scsi 1:0:0:0: Direct-Access     ATA      WDC WD3200BEKT-0 11.0 PQ: 0 ANSI: 5
<5>[    1.443387] sd 1:0:0:0: [sdb] 625142448 512-byte logical blocks: (320 GB/298 GiB)
<5>[    1.443707] sd 1:0:0:0: [sdb] Write Protect is off
<7>[    1.443710] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
<5>[    1.443756] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
<6>[    1.459438]  sda: sda1 sda2 sda3 sda4
<5>[    1.460197] sd 0:0:0:0: [sda] Attached SCSI disk
<6>[    1.476394] Synaptics Touchpad, model: 1, fw: 6.3, id: 0x12a0b1, caps: 0xa04711/0xa04000/0x0
<6>[    1.481898]  sdb: sdb1 sdb2 sdb3
<5>[    1.482254] sd 1:0:0:0: [sdb] Attached SCSI disk

<6>[    3.201111] SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled
<6>[    3.202279] SGI XFS Quota Management subsystem
<5>[    3.230174] XFS mounting filesystem sdb2
<7>[    3.351323] Ending clean XFS mount for filesystem: sdb2

<5>[   11.865393] XFS mounting filesystem sdb3
<7>[   12.008707] Ending clean XFS mount for filesystem: sdb3

I do not see anything that says barriers are not supported?
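
For anyone repeating this check, here is a minimal Python sketch of the
/proc/mounts inspection.  It rests on an assumption that is not
confirmed in this thread: that a disabled barrier would show up as a
"nobarrier" option on the XFS line.  The function name and the
"not listed" label are made up for illustration, and the sample lines
are copied from the /proc/mounts listing in this message.

```python
from typing import Dict


def xfs_barrier_status(mounts_text: str) -> Dict[str, str]:
    """Map each XFS mount point to 'nobarrier', 'barrier' or 'not listed'."""
    status: Dict[str, str] = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] != "xfs":
            continue
        options = fields[3].split(",")
        if "nobarrier" in options:
            # Assumption: this is how a disabled barrier would be reported.
            status[fields[1]] = "nobarrier"
        elif "barrier" in options:
            status[fields[1]] = "barrier"
        else:
            status[fields[1]] = "not listed"
    return status


# Lines taken from the /proc/mounts listing in this message.
SAMPLE = """\
/dev/sdb2 / xfs rw,relatime,attr2,logbufs=8,logbsize=256k,noquota 0 0
proc /proc proc rw,relatime 0 0
/dev/sdb3 /home xfs rw,relatime,attr2,logbufs=8,logbsize=256k,noquota 0 0
"""

if __name__ == "__main__":
    for mount_point, state in xfs_barrier_status(SAMPLE).items():
        print(mount_point, state)
```

On a live system the same function could be fed the text of
/proc/mounts instead of SAMPLE.  A result of "not listed" only means
that no barrier option appears in the mount options; it does not by
itself prove what the hardware supports.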

> Only by looking at them can you know. Regardless of what filesystem
> you are using, recovery of files and directories from lost+found is
> the same process. e.g. do an rpm check to see if all the installed
> packages are intact. That will narrow down where all your binaries
> came from. Use of strings can also tell you what the binary is.
> e.g:

> Define "really there" when important metadata (i.e. the log) has been
> corrupted and is not available any more.  Indeed, if things like btree
> splits or merges occurred in the log, and they are partially written
> to disk, it's entirely possible that you could lose directory
> references to inodes that haven't been modified for some time....
>
> Remember, like all fsck programs, xfs_repair is a best effort
> attempt at correcting the problems found  - there are no guarantees
> given about what it can and can't recover when it runs...
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com

Judging from the xfs_repair messages and an inspection of /lost+found,
the files in it came from /tmp and
/etc/NetworkManager/system-connections/Auto eth0 (a session file). I
think one of the index.db files in /var/cache/man was also affected.

I ran "rpm -V -a"; thanks for the advice.  I manually inspected every
line of output, paying attention to the missing files and to file size
or checksum mismatches.  There were 5 missing files from two packages,
as well as a number of installed files of zero size.  I have
re-installed all of the affected packages.
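
A manual pass like the one above can be partly automated.  The Python
sketch below sorts "rpm -V -a" output into missing files, size
mismatches and digest mismatches.  It assumes the usual verify line
layout: either the word "missing" or a string of 8 or 9 flag
characters (with "S" in the first position for a size mismatch and "5"
in the third for a digest mismatch), then an optional one-letter file
type marker, then the path.  The function name and the sample paths
are hypothetical and are not taken from my system.

```python
import re
from typing import Dict, List

# "missing" or the verify flag string, an optional attribute marker
# (c, d, g, l, r), then the path, which may contain spaces.
VERIFY_LINE = re.compile(
    r"^(missing|[SM5DLUGTP.?]{8,9})\s+(?:([cdglr])\s+)?(/.*)$"
)


def classify_verify_output(text: str) -> Dict[str, List[str]]:
    """Group paths from rpm verify output by the kind of problem reported."""
    report: Dict[str, List[str]] = {"missing": [], "size": [], "digest": []}
    for line in text.splitlines():
        match = VERIFY_LINE.match(line.strip())
        if not match:
            continue  # e.g. dependency messages, which are not file lines
        flags, _attr, path = match.groups()
        if flags == "missing":
            report["missing"].append(path)
            continue
        if flags[0] == "S":
            report["size"].append(path)
        if flags[2] == "5":
            report["digest"].append(path)
    return report


# Hypothetical sample output; the paths are invented for illustration.
SAMPLE = """\
missing     /usr/share/doc/example/README
S.5....T.  c /etc/example.conf
.......T.    /usr/bin/example
S.5....T.    /usr/lib/example/libdemo.so
"""

if __name__ == "__main__":
    for kind, paths in classify_verify_output(SAMPLE).items():
        print(kind, paths)
```

A file truncated to zero length would be expected to appear under both
"size" and "digest", so those two lists are a reasonable starting point
when looking for the zero-size files mentioned above.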

May I know whether it is possible that, due to the loss of the journal
log, some installed package files either go missing or show zero file
size?

As an aside, the reason I use XFS is that when I was a student, I did
my work in a school centre with a cluster of 12 SGI Indigo2 R10000 and
3 SGI O2 R5000 machines.  Because of bugs in IRIX 6.2 and the earlier
releases of IRIX 6.5 (6.5.X), the machines had kernel panics; faulty
power supplies (the maintenance was discontinued) also caused
stoppages.  On restart, recovery was always instantaneous, and no XFS
file system repair was ever done.

When I built my first computer, it was only natural that I chose XFS;
that was on SUSE Linux 10.0.  On various versions of SUSE I have had
freezes and power outages, but I have never had to repair a file
system.  The only time I ever had to run xfs_repair was when an Areca
RAID spat out the WD desktop drives and I had to rescue the RAID, so I
was very unprepared for this latest incident.

Anyway, a big thanks to all those who contributed towards XFS over the
years in IRIX and Linux.  I could not have imagined back in the 90s
that in the future I would have a piece of SGI IRIX technology in my
own personal computers.

GL

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs