OK, I've been seeing a problem here ever since I had to move over from JFS to XFS due to file system size limits: XFS data corruption under "heavy I/O".

Basically, under heavy load (for example an xfs_fsr run on a volume, which nearly always triggers the freeze) the system hovers around 90% utilization on the dm device for a while (sometimes an hour or more, sometimes minutes). Then the subsystem goes to 100% utilization and freezes solid, forcing me to hard reboot the box. When it comes back up, the XFS volumes are generally really screwed up (see below).

The Areca cards all have BBUs, and the only write cache is the BBU-backed controller cache (drive cache disabled). The systems are all UPS protected as well. These freezes have happened too frequently, and unfortunately nothing is logged anywhere. It's not worth doing a repair, as the corruption is too extensive, so each event requires a complete restore from backup.

I only mention xfs_fsr here because it /seems/ to generate an I/O pattern that nearly always results in a freeze. I have triggered the freeze with other high-I/O jobs as well, though not as reliably.

I don't know what else can be done to remove this issue (and I'm not sure it's directly related to XFS, as LVM and the areca driver are also involved); however, the main result is that XFS gets really screwed up. I did NOT have these issues with JFS (same LVM + Areca setup, so it /seems/ to point to XFS, or at least XFS is tied in there somewhere). Unfortunately JFS has issues with file systems larger than 32TiB, so the only file system I can use is XFS.

Since I'm using hardware RAID with a BBU, when I reboot and the box comes back up the RAID controller writes any outstanding data in its cache out to the drives, and from the hardware point of view (as well as LVM's point of view) the array is OK.
The file system, however, generally can't be mounted (about 4 out of 5 times). Sometimes it does get auto-mounted, but when I then run xfs_repair -n -v in those cases there are pages of errors (badly aligned inode rec, bad starting inode #'s, dubious inode btree block headers, among others). When I let a repair actually run in one case, out of 4,500,000 files it linked about 2,000,000 or so, but there was no way to identify them or verify file integrity. The others were just lost.

This is not limited to large volumes; I have seen similar on small ~2TiB file systems as well. Also, in a couple of cases where one file system was taking the I/O (say xfs_fsr -v /home), another XFS file system on the same system which was NOT taking much if any I/O got badly corrupted (say /var/test). Both would be using the same Areca controllers and the same physical discs (same PVs and same VGs but different LVs).

Any suggestions on how to isolate or eliminate this would be greatly appreciated.

Steve

Technical data is below:

==============
$ iostat -m -x 15
(iostat capture right up to a freeze event)

(System sits here for a long while, hovering around 90% for the DM device and about 30% for the PVs)

Device:    rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await    svctm    %util
sda          0.00     7.80     0.07     2.00     0.00     0.04    38.19     0.00     2.26     0.97     0.20
sdb        120.07    34.47   253.00   706.67    24.98    28.96   115.11     1.06     1.10     0.28    26.87
sdc         48.80    28.93   324.73   730.87    24.98    28.94   104.62     1.19     1.13     0.29    30.60
sdd        121.73    33.13   251.60   700.40    24.99    28.94   116.01     1.11     1.17     0.29    27.40
sde         49.00    28.60   324.33   731.47    24.99    28.95   104.65     1.22     1.15     0.26    27.53
sdf        120.27    33.20   253.00   701.00    24.99    28.97   115.84     1.14     1.20     0.33    31.67
sdg         48.80    29.07   324.73   731.80    25.00    28.95   104.59     1.37     1.29     0.35    36.93
sdh        120.47    33.47   252.73   702.53    25.00    28.96   115.68     1.24     1.30     0.35    33.67
sdi         50.73    28.27   322.73   735.13    24.99    29.01   104.54     1.34     1.26     0.31    32.27
dm-0         0.00     0.00     0.13     0.13     0.00     0.00    12.00     0.01    25.00    25.00     0.67
dm-1         0.00     0.00  1602.67   992.73   199.93   231.69   340.59     4.12     1.59     0.34    88.40
dm-2         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00

(Then it jumps up to 99-100% for the majority of devices; here sdf, sdg, sdh and sdi are all on the same physical Areca card)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.60   24.71    0.00   74.69

Device:    rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await    svctm    %util
sda          0.00     4.07     0.00     1.13     0.00     0.02    36.71     0.00     1.18     1.18     0.13
sdb          2.07     1.93     8.00    17.00     0.63     0.84   120.33     0.04     1.49     0.35     0.87
sdc          2.87     1.20     7.40    22.13     0.63     0.83   101.86     0.04     1.49     0.25     0.73
sdd          2.13     1.80     8.07    17.20     0.63     0.84   119.64     0.04     1.45     0.32     0.80
sde          2.93     1.07     7.20    21.80     0.63     0.83   103.65     0.05     1.89     0.34     1.00
sdf          1.93     1.87     8.13    13.67     0.63     0.64   119.78    46.58     2.35    45.63    99.47
sdg          2.87     1.00     7.13    17.80     0.62     0.64   104.04    64.12     2.41    39.84    99.33
sdh          2.07     1.67     7.93    13.47     0.62     0.64   121.22    47.85     2.12    46.39    99.27
sdi          2.93     1.07     7.07    18.47     0.62     0.64   101.77    62.15     2.32    38.83    99.13
dm-0         0.00     0.00     0.20     0.07     0.00     0.00    10.00     0.00     2.50     2.50     0.07
dm-1         0.00     0.00    40.20    30.13     5.03     6.68   340.96    74.73     2.13    14.19    99.80
dm-2         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00

(Then here it hits 100% and the system locks)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.81   24.95    0.00   74.24

Device:    rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await    svctm    %util
sda          0.00     8.40     0.00     2.13     0.00     0.04    39.50     0.00     1.88     0.63     0.13
sdb          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdc          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdd          0.00     0.00     0.00     0.07     0.00     0.00    16.00     0.00     0.00     0.00     0.00
sde          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdf          0.00     0.00     0.00     0.00     0.00     0.00     0.00    50.00     0.00     0.00   100.00
sdg          0.00     0.00     0.00     0.00     0.00     0.00     0.00    69.00     0.00     0.00   100.00
sdh          0.00     0.00     0.00     0.00     0.00     0.00     0.00    50.00     0.00     0.00   100.00
sdi          0.00     0.00     0.00     0.00     0.00     0.00     0.00    65.00     0.00     0.00   100.00
dm-0         0.00     0.00     0.00     0.07     0.00     0.00    16.00     0.00     0.00     0.00     0.00
dm-1         0.00     0.00     0.00     0.00     0.00     0.00     0.00    85.00     0.00     0.00   100.00
dm-2         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00

============
(System) (Ubuntu 8.04.3 LTS):
Linux loki 2.6.24-26-server #1 SMP Tue Dec 1 18:26:43 UTC 2009 x86_64 GNU/Linux
--------------
xfs_repair version 2.9.4

=============
(modinfo's)
filename:       /lib/modules/2.6.24-26-server/kernel/fs/xfs/xfs.ko
license:        GPL
description:    SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled
author:         Silicon Graphics, Inc.
srcversion:     A2E6459B3A4C96355F95E61
depends:
vermagic:       2.6.24-26-server SMP mod_unload
============
filename:       /lib/modules/2.6.24-26-server/kernel/drivers/scsi/arcmsr/arcmsr.ko
version:        Driver Version 1.20.00.15 2007/08/30
license:        Dual BSD/GPL
description:    ARECA (ARC11xx/12xx/13xx/16xx) SATA/SAS RAID HOST Adapter
author:         Erich Chen
srcversion:     38E576EB40C1A58E8B9E007
alias:          pci:v000017D3d00001681sv*sd*bc*sc*i*
alias:          pci:v000017D3d00001680sv*sd*bc*sc*i*
alias:          pci:v000017D3d00001381sv*sd*bc*sc*i*
alias:          pci:v000017D3d00001380sv*sd*bc*sc*i*
alias:          pci:v000017D3d00001280sv*sd*bc*sc*i*
alias:          pci:v000017D3d00001270sv*sd*bc*sc*i*
alias:          pci:v000017D3d00001260sv*sd*bc*sc*i*
alias:          pci:v000017D3d00001230sv*sd*bc*sc*i*
alias:          pci:v000017D3d00001220sv*sd*bc*sc*i*
alias:          pci:v000017D3d00001210sv*sd*bc*sc*i*
alias:          pci:v000017D3d00001202sv*sd*bc*sc*i*
alias:          pci:v000017D3d00001201sv*sd*bc*sc*i*
alias:          pci:v000017D3d00001200sv*sd*bc*sc*i*
alias:          pci:v000017D3d00001170sv*sd*bc*sc*i*
alias:          pci:v000017D3d00001160sv*sd*bc*sc*i*
alias:          pci:v000017D3d00001130sv*sd*bc*sc*i*
alias:          pci:v000017D3d00001120sv*sd*bc*sc*i*
alias:          pci:v000017D3d00001110sv*sd*bc*sc*i*
depends:        scsi_mod
vermagic:       2.6.24-26-server SMP mod_unload
===========
filename:       /lib/modules/2.6.24-26-server/kernel/drivers/md/dm-mod.ko
license:        GPL
author:         Joe Thornber
description:    device-mapper driver
srcversion:     A7E89E997173E41CB6AAF04
depends:
vermagic:       2.6.24-26-server SMP mod_unload
parm:           major:The major number of the device mapper (uint)

============
mounted with:
/dev/vg_media/lv_ftpshare /var/ftp xfs defaults,relatime,nobarrier,logbufs=8,logbsize=256k,sunit=256,swidth=2048,inode64,noikeep,largeio,swalloc,allocsize=128k 0 2

============
XFS info:
meta-data=/dev/mapper/vg_media-lv_ftpshare isize=2048   agcount=41, agsize=268435424 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=10737418200, imaxpct=1
         =                       sunit=32     swidth=256 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=2
         =                       sectsz=512   sunit=32 blks, lazy-count=0
realtime =none                   extsz=1048576 blocks=0, rtextents=0

=============
XFS is running on top of LVM:
  --- Logical volume ---
  LV Name                /dev/vg_media/lv_ftpshare
  VG Name                vg_media
  LV UUID                MgEBWv-x9fn-KUoJ-3y5X-snlk-7F9E-A3CiHh
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                40.00 TB
  Current LE             40960
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           254:1

==============
LVM is using 8 hardware RAID volumes as its base physical volumes (MediaVol#00 through MediaVol#70 inclusive):
[ 175.320738] ARECA RAID ADAPTER4: FIRMWARE VERSION V1.47 2009-07-16
[ 175.336238] scsi4 : Areca SAS Host Adapter RAID Controller( RAID6 capable)
[ 175.336239] Driver Version 1.20.00.15 2007/08/30
[ 175.336387] ACPI: PCI Interrupt 0000:0a:00.0[A] -> GSI 17 (level, low) -> IRQ 17
[ 175.336395] PCI: Setting latency timer of device 0000:0a:00.0 to 64
[ 175.336990] scsi 4:0:0:0: Direct-Access Areca BootVol#00 R001 PQ: 0 ANSI: 5
[ 175.337096] scsi 4:0:0:1: Direct-Access Areca MediaVol#00 R001 PQ: 0 ANSI: 5
[ 175.337169] scsi 4:0:0:2: Direct-Access Areca MediaVol#10 R001 PQ: 0 ANSI: 5
[ 175.337240] scsi 4:0:0:3: Direct-Access Areca MediaVol#20 R001 PQ: 0 ANSI: 5
[ 175.337312] scsi 4:0:0:4: Direct-Access Areca MediaVol#30 R001 PQ: 0 ANSI: 5
[ 175.337907] scsi 4:0:16:0: Processor Areca RAID controller R001 PQ: 0 ANSI: 0
[ 175.356231] ARECA RAID ADAPTER5: FIRMWARE VERSION V1.47 2009-10-22
[ 175.376144] scsi5 : Areca SAS Host Adapter RAID Controller( RAID6 capable)
[ 175.376145] Driver Version 1.20.00.15 2007/08/30
[ 175.377354] scsi 5:0:0:5: Direct-Access Areca MediaVol#40 R001 PQ: 0 ANSI: 5
[ 175.377434] scsi 5:0:0:6: Direct-Access Areca MediaVol#50 R001 PQ: 0 ANSI: 5
[ 175.377495] scsi 5:0:0:7: Direct-Access Areca MediaVol#60 R001 PQ: 0 ANSI: 5
[ 175.377587] scsi 5:0:1:0: Direct-Access Areca MediaVol#70 R001 PQ: 0 ANSI: 5
[ 175.378156] scsi 5:0:16:0: Processor Areca RAID controller R001 PQ: 0 ANSI: 0
=================
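
One thing that can be cross-checked from the data above is whether the stripe geometry given at mount time agrees with what xfs_info reports. Below is a minimal Python sketch of that arithmetic; it was not run on the affected box. It assumes the sunit/swidth mount options are in 512-byte units and the xfs_info values are in 4096-byte file system blocks (the bsize shown above). The function name geometry_report and the dictionary keys are purely illustrative.

```python
# Cross-check XFS stripe geometry: mount options vs. xfs_info output.
# Assumption: mount-option sunit/swidth are in 512-byte units, while the
# xfs_info sunit/swidth are in file system blocks (bsize=4096 here).

SECTOR_BYTES = 512
BLOCK_BYTES = 4096  # bsize from xfs_info

# Values copied from the mount line and the xfs_info output.
MOUNT_SUNIT, MOUNT_SWIDTH = 256, 2048
INFO_SUNIT, INFO_SWIDTH = 32, 256
DATA_BLOCKS = 10737418200
AGCOUNT, AGSIZE = 41, 268435424


def geometry_report():
    """Return derived sizes and consistency flags for the geometry above."""
    sunit_bytes = MOUNT_SUNIT * SECTOR_BYTES
    swidth_bytes = MOUNT_SWIDTH * SECTOR_BYTES
    return {
        "sunit_bytes": sunit_bytes,
        "swidth_bytes": swidth_bytes,
        "sunit_matches": sunit_bytes == INFO_SUNIT * BLOCK_BYTES,
        "swidth_matches": swidth_bytes == INFO_SWIDTH * BLOCK_BYTES,
        # Number of stripe units per full stripe (data members).
        "stripe_members": MOUNT_SWIDTH // MOUNT_SUNIT,
        # Allocation groups needed to cover all data blocks (ceiling division).
        "ag_needed": -(-DATA_BLOCKS // AGSIZE),
        "fs_tib": DATA_BLOCKS * BLOCK_BYTES / 2**40,
    }


if __name__ == "__main__":
    for key, value in geometry_report().items():
        print(f"{key}: {value}")
```

With the numbers above, the two sets of values agree: a 128 KiB stripe unit and a 1 MiB stripe width, i.e. 8 stripe units per stripe, which lines up with the 8 PVs, and 41 allocation groups covering just under 40 TiB. So the geometry itself at least looks self-consistent.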