public inbox for linux-xfs@vger.kernel.org
* XFS filesystem on EC2 instance corrupts and shuts down
@ 2013-03-06  8:07 Shrinath M
  2013-03-06 12:59 ` Ric Wheeler
From: Shrinath M @ 2013-03-06  8:07 UTC
  To: xfs; +Cc: Sabyasachi Ruj, Vivek Goel



We are experiencing a strange XFS corruption issue. /var/log/messages
simply says:

    Mar  5 01:14:33 ip-100-0-100-1 kernel: [14139930.248619] XFS (md0): Corruption detected. Unmount and run xfs_repair

The filesystem shuts down after this. On reboot, xfs_repair runs
automatically and everything returns to normal.
We have had 2 such occurrences so far. I am attaching the relevant parts
of /var/log/messages here in the hope that someone can tell me what is going
wrong.
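
For anyone skimming the attached log, here is a rough sketch of how the XFS error lines could be tallied per function and errno. The errno names are the Linux values as far as I know (5 is EIO, 117 is EUCLEAN, which XFS uses to report on-disk corruption); the regex, the `summarize` helper and the `sample` list are my own illustration, not part of any XFS tooling.

```python
import re
from collections import Counter

# Linux errno values seen in the attached log (assumed Linux numbering).
# XFS reports on-disk corruption as EUCLEAN ("Structure needs cleaning").
LINUX_ERRNO = {5: "EIO", 117: "EUCLEAN"}

# Matches both "... returned error 117" and "... error 5 returned."
ERROR_RE = re.compile(
    r"XFS \((?P<dev>\w+)\): (?P<func>\w+):.*?error (?P<code>\d+)"
)

def summarize(lines):
    """Count XFS error reports per (function, errno name)."""
    counts = Counter()
    for line in lines:
        m = ERROR_RE.search(line)
        if m:
            code = int(m.group("code"))
            name = LINUX_ERRNO.get(code, str(code))
            counts[(m.group("func"), name)] += 1
    return counts

# A few lines copied from the attached log.
sample = [
    "kernel: [2541168.014241] XFS (md0): Corruption detected. Unmount and run xfs_repair",
    "kernel: [2541168.014259] XFS (md0): xfs_iunlink_remove: xfs_itobp() returned error 117.",
    "kernel: [2541168.014267] XFS (md0): xfs_inactive: xfs_ifree returned error 117",
    "kernel: [2541193.052041] XFS (md0): xfs_log_force: error 5 returned.",
    "kernel: [2541223.132038] XFS (md0): xfs_log_force: error 5 returned.",
]

summary = summarize(sample)
for (func, name), n in sorted(summary.items()):
    print(f"{func}: {name} x{n}")
```

Read this way, the log shows one corruption event (EUCLEAN while freeing an inode) followed by repeated EIO from xfs_log_force, which is what I would expect once the filesystem has already shut itself down.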

Machine details are as follows:

AMI version: Amazon Linux AMI release 2012.09
Storage: 8 EBS volumes of 512 GB each, in a RAID 0 array

$~: uname -a
Linux ip-100-0-100-1 3.2.34-55.46.amzn1.x86_64 #1 SMP Tue Nov 20 10:06:15 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
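
As a sanity check on the array size, the numbers in the attached boot log can be cross-checked with a few lines of arithmetic. Going by that log, each RAID member reports 549755813888 bytes, which is 512 GiB per volume. The figures below are copied from the log; the variable names are mine, and my reading of the small shortfall as md overhead is only a guess.

```python
SECTOR_BYTES = 512   # the kernel reports md_size in 512-byte sectors
GIB = 1024 ** 3
TIB = 1024 ** 4

members = 8                   # xvdf through xvdm
member_bytes = 549755813888   # "xvdf: detected capacity change from 0 to ..."
md_sectors = 8589930496       # "md/raid0:md0: md_size is ... sectors."
md_bytes = 4398044413952      # "md0: detected capacity change from 0 to ..."

member_gib = member_bytes / GIB
md_tib = md_bytes / TIB

# Space not exposed by the array, in total and per member
# (possibly md metadata or alignment; that part is an assumption).
shortfall = members * member_bytes - md_bytes
shortfall_per_member = shortfall // members

print(f"each member: {member_gib} GiB")
print(f"md0: {md_tib:.6f} TiB")
print(f"sector count matches byte count: {md_sectors * SECTOR_BYTES == md_bytes}")
print(f"shortfall: {shortfall} bytes total, {shortfall_per_member} per member")
```

So the array is roughly 4 TiB striped over eight 512 GiB members, with 256 KiB per member not exposed through md0.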


Let me know if anyone needs more details.


-- 
Regards
*Shrinath.M*


[-- Attachment #2: xfs_corruption_logs.txt --]
[-- Type: text/plain, Size: 32894 bytes --]

Feb 12 19:47:18 ip-100-0-100-1 kernel: [2541168.014241] XFS (md0): Corruption detected. Unmount and run xfs_repair
Feb 12 19:47:18 ip-100-0-100-1 kernel: [2541168.014259] XFS (md0): xfs_iunlink_remove: xfs_itobp() returned error 117.
Feb 12 19:47:18 ip-100-0-100-1 kernel: [2541168.014267] XFS (md0): xfs_inactive: xfs_ifree returned error 117
Feb 12 19:47:18 ip-100-0-100-1 kernel: [2541168.014274] XFS (md0): xfs_do_force_shutdown(0x1) called from line 745 of file fs/xfs/xfs_vnodeops.c.  Return address = 0xffffffffa021410f
Feb 12 19:47:18 ip-100-0-100-1 kernel: [2541168.023638] XFS (md0): I/O Error Detected. Shutting down filesystem
Feb 12 19:47:18 ip-100-0-100-1 kernel: [2541168.023654] XFS (md0): Please umount the filesystem and rectify the problem(s)
Feb 12 19:47:43 ip-100-0-100-1 kernel: [2541193.052041] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 19:48:13 ip-100-0-100-1 kernel: [2541223.132038] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 19:48:43 ip-100-0-100-1 kernel: [2541253.212030] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 19:49:13 ip-100-0-100-1 kernel: [2541283.292034] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 19:49:43 ip-100-0-100-1 kernel: [2541313.372036] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 19:50:14 ip-100-0-100-1 kernel: [2541343.452034] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 19:50:44 ip-100-0-100-1 kernel: [2541373.532046] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 19:51:14 ip-100-0-100-1 kernel: [2541403.612031] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 19:51:44 ip-100-0-100-1 kernel: [2541433.692037] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 19:52:14 ip-100-0-100-1 kernel: [2541463.772035] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 19:52:44 ip-100-0-100-1 kernel: [2541493.852034] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 19:53:14 ip-100-0-100-1 kernel: [2541523.932043] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 19:53:44 ip-100-0-100-1 kernel: [2541554.012032] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 19:54:14 ip-100-0-100-1 kernel: [2541584.092037] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 19:54:44 ip-100-0-100-1 kernel: [2541614.172037] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 19:55:14 ip-100-0-100-1 kernel: [2541644.252046] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 19:55:44 ip-100-0-100-1 kernel: [2541674.336034] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 19:56:14 ip-100-0-100-1 kernel: [2541704.412034] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 19:56:45 ip-100-0-100-1 kernel: [2541734.492039] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 19:57:15 ip-100-0-100-1 kernel: [2541764.572045] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 19:57:45 ip-100-0-100-1 kernel: [2541794.652045] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 19:58:15 ip-100-0-100-1 kernel: [2541824.732037] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 19:58:45 ip-100-0-100-1 kernel: [2541854.812034] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 19:59:15 ip-100-0-100-1 kernel: [2541884.892124] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 19:59:45 ip-100-0-100-1 kernel: [2541914.972031] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 20:00:15 ip-100-0-100-1 kernel: [2541945.056036] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 20:00:45 ip-100-0-100-1 kernel: [2541975.132037] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 20:01:15 ip-100-0-100-1 kernel: [2542005.212051] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 20:01:45 ip-100-0-100-1 kernel: [2542035.292035] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 20:02:15 ip-100-0-100-1 kernel: [2542065.372038] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 20:02:46 ip-100-0-100-1 kernel: [2542095.452035] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 20:03:16 ip-100-0-100-1 kernel: [2542125.532039] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 20:03:46 ip-100-0-100-1 kernel: [2542155.612050] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 20:04:16 ip-100-0-100-1 kernel: [2542185.692032] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 20:04:46 ip-100-0-100-1 kernel: [2542215.772046] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 20:05:16 ip-100-0-100-1 kernel: [2542245.852036] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 20:05:46 ip-100-0-100-1 kernel: [2542275.932047] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 20:06:16 ip-100-0-100-1 kernel: [2542306.012032] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 20:06:46 ip-100-0-100-1 kernel: [2542336.092028] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 20:07:16 ip-100-0-100-1 kernel: [2542366.172030] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 20:07:46 ip-100-0-100-1 kernel: [2542396.252036] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 20:08:16 ip-100-0-100-1 kernel: [2542426.332032] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 20:08:46 ip-100-0-100-1 kernel: [2542456.412033] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 20:09:17 ip-100-0-100-1 kernel: [2542486.492096] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 20:09:47 ip-100-0-100-1 kernel: [2542516.572037] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 20:10:17 ip-100-0-100-1 kernel: [2542546.652030] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 20:10:47 ip-100-0-100-1 kernel: [2542576.732032] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 20:11:17 ip-100-0-100-1 kernel: [2542606.812042] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 20:11:47 ip-100-0-100-1 kernel: [2542636.892137] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 20:12:17 ip-100-0-100-1 kernel: [2542666.972031] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 20:12:47 ip-100-0-100-1 kernel: [2542697.052037] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 20:13:17 ip-100-0-100-1 kernel: [2542727.132042] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 20:13:47 ip-100-0-100-1 kernel: [2542757.212056] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 20:14:17 ip-100-0-100-1 kernel: [2542787.292033] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 20:14:47 ip-100-0-100-1 kernel: [2542817.372053] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 20:15:18 ip-100-0-100-1 kernel: [2542847.452077] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 20:15:48 ip-100-0-100-1 kernel: [2542877.532044] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 20:16:18 ip-100-0-100-1 kernel: [2542907.612040] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 20:16:48 ip-100-0-100-1 kernel: [2542937.692036] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 20:17:17 ip-100-0-100-1 init: serial (hvc0) main process (1763) killed by TERM signal
Feb 12 20:17:17 ip-100-0-100-1 init: tty (/dev/tty1) main process (1764) killed by TERM signal
Feb 12 20:17:17 ip-100-0-100-1 init: tty (/dev/tty2) main process (1767) killed by TERM signal
Feb 12 20:17:17 ip-100-0-100-1 init: tty (/dev/tty3) main process (1770) killed by TERM signal
Feb 12 20:17:17 ip-100-0-100-1 init: tty (/dev/tty4) main process (1773) killed by TERM signal
Feb 12 20:17:17 ip-100-0-100-1 init: tty (/dev/tty5) main process (1775) killed by TERM signal
Feb 12 20:17:17 ip-100-0-100-1 init: tty (/dev/tty6) main process (1777) killed by TERM signal
Feb 12 20:17:17 ip-100-0-100-1 init: plymouth-shutdown main process (13534) terminated with status 1
Feb 12 20:17:17 ip-100-0-100-1 init: splash-manager main process (13530) terminated with status 1
Feb 12 20:17:18 ip-100-0-100-1 kernel: [2542967.772046] XFS (md0): xfs_log_force: error 5 returned.
Feb 12 20:17:19 ip-100-0-100-1 ntpd[1415]: ntpd exiting on signal 15
Feb 12 20:17:20 ip-100-0-100-1 init: Disconnected from system bus
Feb 12 20:17:20 ip-100-0-100-1 auditd[1301]: The audit daemon is exiting.
Feb 12 20:17:20 ip-100-0-100-1 kernel: [2542969.540860] type=1305 audit(1360700240.109:361670): audit_pid=0 old=1301 auid=4294967295 ses=4294967295 res=1
Feb 12 20:17:20 ip-100-0-100-1 kernel: [2542969.644508] type=1305 audit(1360700240.213:361671): audit_enabled=0 old=1 auid=4294967295 ses=4294967295 res=1
Feb 12 20:17:20 ip-100-0-100-1 kernel: Kernel logging (proc) stopped.
Feb 12 20:17:20 ip-100-0-100-1 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="1316" x-info="http://www.rsyslog.com"] exiting on signal 15.
Feb 12 20:19:57 ip-100-0-100-1 kernel: imklog 5.8.10, log source = /proc/kmsg started.
Feb 12 20:19:57 ip-100-0-100-1 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="1335" x-info="http://www.rsyslog.com"] start
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] Initializing cgroup subsys cpuset
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] Initializing cgroup subsys cpu
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] Linux version 3.2.34-55.46.amzn1.x86_64 (mockbuild@gobi-build-31003) (gcc version 4.6.2 20111027 (Red Hat 4.6.2-2) (GCC) ) #1 SMP Tue Nov 20 10:06:15 UTC 2012
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] Command line: root=LABEL=/ console=hvc0 LANG=en_US.UTF-8 KEYTABLE=us
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] Marking TSC unstable due to Xen domain
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] ACPI in unprivileged domain disabled
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] Released 0 pages of unused memory
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] Set 0 page(s) to 1-1 mapping
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] BIOS-provided physical RAM map:
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000]  Xen: 0000000000000000 - 00000000000a0000 (usable)
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000]  Xen: 00000000000a0000 - 0000000000100000 (reserved)
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000]  Xen: 0000000000100000 - 00000001e0800000 (usable)
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] NX (Execute Disable) protection: active
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] DMI not present or invalid.
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] No AGP bridge found
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] last_pfn = 0x1e0800 max_arch_pfn = 0x400000000
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] last_pfn = 0x100000 max_arch_pfn = 0x400000000
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] init_memory_mapping: 0000000000000000-0000000100000000
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] init_memory_mapping: 0000000100000000-00000001e0800000
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] RAMDISK: 019a2000 - 02d74000
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] NUMA turned off
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] Faking a node at 0000000000000000-00000001e0800000
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] Initmem setup node 0 0000000000000000-00000001e0800000
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000]   NODE_DATA [00000001dfffb000 - 00000001dfffffff]
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] Zone PFN ranges:
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000]   DMA      0x00000010 -> 0x00001000
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000]   DMA32    0x00001000 -> 0x00100000
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000]   Normal   0x00100000 -> 0x001e0800
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] Movable zone start PFN for each node
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] early_node_map[2] active PFN ranges
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000]     0: 0x00000010 -> 0x000000a0
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000]     0: 0x00000100 -> 0x001e0800
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] SMP: Allowing 2 CPUs, 0 hotplug CPUs
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] No local APIC present
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] APIC: disable apic facility
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] APIC: switched to apic NOOP
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] PM: Registered nosave memory: 00000000000a0000 - 0000000000100000
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] PCI: Warning: Cannot find a gap in the 32bit address range
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] PCI: Unassigned devices with 32bit resource registers may break!
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] Allocating PCI resources starting at 1e0900000 (gap: 1e0900000:400000)
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] Booting paravirtualized kernel on Xen
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] Xen version: 3.4.3-2.6.18 (preserve-AD)
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] setup_percpu: NR_CPUS:32 nr_cpumask_bits:32 nr_cpu_ids:2 nr_node_ids:1
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] PERCPU: Embedded 27 pages/cpu @ffff8801dfc00000 s80512 r8192 d21888 u1048576
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] Built 1 zonelists in Node order, mobility grouping on.  Total pages: 1935244
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] Policy zone: Normal
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] Kernel command line: root=LABEL=/ console=hvc0 LANG=en_US.UTF-8 KEYTABLE=us
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] Checking aperture...
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] No AGP bridge found
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] Memory: 7612316k/7872512k available (3844k kernel code, 448k absent, 259748k reserved, 2971k data, 556k init)
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] SLUB: Genslabs=15, HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] Hierarchical RCU implementation.
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] NR_IRQS:4352 nr_irqs:288 16
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] Console: colour dummy device 80x25
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] console [tty0] enabled
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] console [hvc0] enabled
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] installing Xen timer for CPU 0
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.000000] Detected 2266.746 MHz processor.
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.004000] Calibrating delay loop (skipped), value calculated using timer frequency.. 4533.49 BogoMIPS (lpj=9066984)
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.004000] pid_max: default: 32768 minimum: 301
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.004000] Security Framework initialized
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.004000] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.005068] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.006129] Mount-cache hash table entries: 256
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.006393] Initializing cgroup subsys cpuacct
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.006405] Initializing cgroup subsys devices
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.006411] Initializing cgroup subsys freezer
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.006416] Initializing cgroup subsys blkio
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.006429] Initializing cgroup subsys perf_event
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.006534] CPU: Physical Processor ID: 0
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.006540] CPU: Processor Core ID: 0
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.006620] SMP alternatives: switching to UP code
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.028174] cpu 0 spinlock event irq 17
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.028204] Performance Events: unsupported p6 CPU model 26 no PMU driver, software events only.
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.028409] installing Xen timer for CPU 1
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.028425] cpu 1 spinlock event irq 23
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.028456] SMP alternatives: switching to SMP code
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.049286] Brought up 2 CPUs
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.049349] devtmpfs: initialized
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.049349] Grant table initialized
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.049349] NET: Registered protocol family 16
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.052471] PCI: setting up Xen PCI frontend stub
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.056043] bio: create slab <bio-0> at 0
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.056058] ACPI: Interpreter disabled.
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.056132] xen/balloon: Initialising balloon driver.
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.060124] xen-balloon: Initialising balloon driver.
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.060158] vgaarb: loaded
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.060158] PCI: System does not support PCI
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.060158] PCI: System does not support PCI
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.060301] NetLabel: Initializing
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.060307] NetLabel:  domain hash size = 128
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.060311] NetLabel:  protocols = UNLABELED CIPSOv4
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.060326] NetLabel:  unlabeled traffic allowed by default
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.060334] Switching to clocksource xen
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.066914] pnp: PnP ACPI: disabled
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.074316] NET: Registered protocol family 2
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.074862] IP route cache hash table entries: 262144 (order: 9, 2097152 bytes)
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.077242] TCP established hash table entries: 524288 (order: 11, 8388608 bytes)
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.079475] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.079720] TCP: Hash tables configured (established 524288 bind 65536)
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.079726] TCP reno registered
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.079766] UDP hash table entries: 4096 (order: 5, 131072 bytes)
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.079835] UDP-Lite hash table entries: 4096 (order: 5, 131072 bytes)
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.079961] NET: Registered protocol family 1
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.080115] Unpacking initramfs...
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.104125] Freeing initrd memory: 20296k freed
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.110637] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.110656] Placing 64MB software IO TLB between ffff8800fb0f3000 - ffff8800ff0f3000
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.110662] software IO TLB at phys 0xfb0f3000 - 0xff0f3000
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.110870] platform rtc_cmos: registered platform RTC device (no PNP device found)
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.112573] audit: initializing netlink socket (disabled)
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.112592] type=2000 audit(1360700379.774:1): initialized
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.136245] VFS: Disk quotas dquot_6.5.2
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.136342] Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.136395] msgmni has been set to 14907
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.136800] alg: No test for stdrng (krng)
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.136954] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254)
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.136998] io scheduler noop registered (default)
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.138079] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.254043] loop: module loaded
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.258888] blkfront: xvda1: barrier or flush: disabled
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.262418] blkfront: xvdb: barrier or flush: disabled
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.272465]  xvdb: unknown partition table
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.273878] Setting capacity to 880732160
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.273893] xvdb: detected capacity change from 0 to 450934865920
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.275405] blkfront: xvdc: barrier or flush: disabled
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.279177]  xvdc: unknown partition table
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.284292] blkfront: xvdf: barrier or flush: disabled
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.286728]  xvdf: unknown partition table
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.287778] Setting capacity to 880732160
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.287790] xvdc: detected capacity change from 0 to 450934865920
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.289138] blkfront: xvdg: barrier or flush: disabled
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.291063]  xvdg: unknown partition table
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.291573] Setting capacity to 1073741824
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.291586] xvdf: detected capacity change from 0 to 549755813888
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.293286] blkfront: xvdh: barrier or flush: disabled
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.295207]  xvdh: unknown partition table
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.296072] Initialising Xen virtual ethernet driver.
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.296160] Setting capacity to 1073741824
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.296168] xvdg: detected capacity change from 0 to 549755813888
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.296583] Setting capacity to 1073741824
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.296598] xvdh: detected capacity change from 0 to 549755813888
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.297953] blkfront: xvdi: barrier or flush: disabled
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.298589] i8042: PNP: No PS/2 controller found. Probing ports directly.
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.299613] mousedev: PS/2 mouse device common for all mice
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.299922] TCP cubic registered
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.299930] NET: Registered protocol family 17
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.299942] Registering the dns_resolver key type
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.300055]  xvdi: unknown partition table
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.300408] Setting capacity to 1073741824
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.300417] xvdi: detected capacity change from 0 to 549755813888
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.300459] registered taskstats version 1
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.301257] blkfront: xvdj: barrier or flush: disabled
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.303014]  xvdj: unknown partition table
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.303415] Setting capacity to 1073741824
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.303427] xvdj: detected capacity change from 0 to 549755813888
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.304413] blkfront: xvdk: barrier or flush: disabled
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.306294]  xvdk: unknown partition table
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.306548] Setting capacity to 1073741824
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.306561] xvdk: detected capacity change from 0 to 549755813888
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.307302] blkfront: xvdl: barrier or flush: disabled
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.309150]  xvdl: unknown partition table
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.309407] Setting capacity to 1073741824
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.309420] xvdl: detected capacity change from 0 to 549755813888
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.310203] blkfront: xvdm: barrier or flush: disabled
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.311626]  xvdm: unknown partition table
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.311923] Setting capacity to 1073741824
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.311933] xvdm: detected capacity change from 0 to 549755813888
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.400043] XENBUS: Device with no driver: device/console/0
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.400459] Freeing unused kernel memory: 556k freed
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.400661] Write protecting the kernel read-only data: 6144k
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.403178] Freeing unused kernel memory: 232k freed
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.403482] Freeing unused kernel memory: 468k freed
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.442489] device-mapper: uevent: version 1.0.3
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.442738] device-mapper: ioctl: 4.22.0-ioctl (2011-10-19) initialised: dm-devel@redhat.com
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.613258] md: bind<xvdh>
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.617192] md: bind<xvdi>
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.622753] md: bind<xvdg>
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.629064] md: bind<xvdj>
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.636427] md: bind<xvdl>
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.644348] md: bind<xvdf>
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.652065] md: bind<xvdm>
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.661512] md: bind<xvdk>
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.670157] md: raid0 personality registered for level 0
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.670698] bio: create slab <bio-1> at 1
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.670716] md/raid0:md127: md_size is 8589930496 sectors.
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.670727] md: RAID0 configuration for md127 - 1 zone
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.670731] md: zone0=[xvdf/xvdg/xvdh/xvdi/xvdj/xvdk/xvdl/xvdm]
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.671132]       zone-offset=         0KB, device-offset=         0KB, size=4294965248KB
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.671139] 
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.671163] md127: detected capacity change from 0 to 4398044413952
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    0.680152]  md127: unknown partition table
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    1.012334] EXT4-fs (xvda1): INFO: recovery required on readonly filesystem
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    1.012349] EXT4-fs (xvda1): write access will be enabled during recovery
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    2.498599] EXT4-fs (xvda1): orphan cleanup on readonly fs
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    2.643158] EXT4-fs (xvda1): 18 orphan inodes deleted
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    2.643168] EXT4-fs (xvda1): recovery complete
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    2.658923] EXT4-fs (xvda1): mounted filesystem with ordered data mode. Opts: (null)
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    2.684777] dracut: Remounting /dev/disk/by-label/\x2f with -o noatime,ro
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    2.697020] EXT4-fs (xvda1): mounted filesystem with ordered data mode. Opts: (null)
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    2.719468] dracut: Mounted root filesystem /dev/xvda1
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    2.766589] dracut: Loading SELinux policy
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    2.866701] dracut: /sbin/load_policy: Can't load policy: No such device
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    2.975560] dracut: Switching root
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    7.901373] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
Feb 12 20:19:57 ip-100-0-100-1 kernel: [    7.983268] SCSI subsystem initialized
Feb 12 20:19:57 ip-100-0-100-1 kernel: [   11.659684] EXT4-fs (xvda1): re-mounted. Opts: (null)
Feb 12 20:19:57 ip-100-0-100-1 kernel: [   11.801328] kjournald starting.  Commit interval 5 seconds
Feb 12 20:19:57 ip-100-0-100-1 kernel: [   11.801654] EXT3-fs (xvdb): using internal journal
Feb 12 20:19:57 ip-100-0-100-1 kernel: [   11.801664] EXT3-fs (xvdb): mounted filesystem with ordered data mode
Feb 12 20:19:57 ip-100-0-100-1 kernel: [   12.369189] NET: Registered protocol family 10
Feb 12 20:20:04 ip-100-0-100-1 ntpdate[1409]: step time server 138.236.128.112 offset 0.012273 sec
Feb 12 20:20:04 ip-100-0-100-1 ntpd[1415]: ntpd 4.2.6p5@1.2349-o Mon Nov 26 17:29:38 UTC 2012 (1)
Feb 12 20:20:04 ip-100-0-100-1 ntpd[1416]: proto: precision = 0.910 usec
Feb 12 20:20:04 ip-100-0-100-1 ntpd[1416]: 0.0.0.0 c01d 0d kern kernel time sync enabled
Feb 12 20:20:04 ip-100-0-100-1 ntpd[1416]: Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123
Feb 12 20:20:04 ip-100-0-100-1 ntpd[1416]: Listen normally on 1 lo 127.0.0.1 UDP 123
Feb 12 20:20:04 ip-100-0-100-1 ntpd[1416]: Listen normally on 2 eth0 10.0.1.48 UDP 123
Feb 12 20:20:04 ip-100-0-100-1 ntpd[1416]: peers refreshed
Feb 12 20:20:04 ip-100-0-100-1 ntpd[1416]: Listening on routing socket on fd #19 for interface updates
Feb 12 20:22:09 ip-100-0-100-1 kernel: [  150.127720] md127: detected capacity change from 4398044413952 to 0
Feb 12 20:22:09 ip-100-0-100-1 kernel: [  150.127738] md: md127 stopped.
Feb 12 20:22:09 ip-100-0-100-1 kernel: [  150.127747] md: unbind<xvdk>
Feb 12 20:22:09 ip-100-0-100-1 kernel: [  150.136055] md: export_rdev(xvdk)
Feb 12 20:22:09 ip-100-0-100-1 kernel: [  150.136122] md: unbind<xvdm>
Feb 12 20:22:09 ip-100-0-100-1 kernel: [  150.148039] md: export_rdev(xvdm)
Feb 12 20:22:09 ip-100-0-100-1 kernel: [  150.148108] md: unbind<xvdf>
Feb 12 20:22:09 ip-100-0-100-1 kernel: [  150.168033] md: export_rdev(xvdf)
Feb 12 20:22:09 ip-100-0-100-1 kernel: [  150.168371] md: unbind<xvdl>
Feb 12 20:22:09 ip-100-0-100-1 kernel: [  150.204048] md: export_rdev(xvdl)
Feb 12 20:22:09 ip-100-0-100-1 kernel: [  150.204111] md: unbind<xvdj>
Feb 12 20:22:09 ip-100-0-100-1 kernel: [  150.240029] md: export_rdev(xvdj)
Feb 12 20:22:09 ip-100-0-100-1 kernel: [  150.240080] md: unbind<xvdg>
Feb 12 20:22:09 ip-100-0-100-1 kernel: [  150.304030] md: export_rdev(xvdg)
Feb 12 20:22:09 ip-100-0-100-1 kernel: [  150.304081] md: unbind<xvdi>
Feb 12 20:22:10 ip-100-0-100-1 kernel: [  150.340026] md: export_rdev(xvdi)
Feb 12 20:22:10 ip-100-0-100-1 kernel: [  150.340080] md: unbind<xvdh>
Feb 12 20:22:10 ip-100-0-100-1 kernel: [  150.376053] md: export_rdev(xvdh)
Feb 12 20:22:10 ip-100-0-100-1 kernel: [  150.467815] md: md0 stopped.
Feb 12 20:22:10 ip-100-0-100-1 kernel: [  150.476714] md: bind<xvdg>
Feb 12 20:22:10 ip-100-0-100-1 kernel: [  150.478003] md: bind<xvdh>
Feb 12 20:22:10 ip-100-0-100-1 kernel: [  150.479101] md: bind<xvdi>
Feb 12 20:22:10 ip-100-0-100-1 kernel: [  150.480163] md: bind<xvdj>
Feb 12 20:22:10 ip-100-0-100-1 kernel: [  150.481312] md: bind<xvdk>
Feb 12 20:22:10 ip-100-0-100-1 kernel: [  150.482498] md: bind<xvdl>
Feb 12 20:22:10 ip-100-0-100-1 kernel: [  150.483348] md: bind<xvdm>
Feb 12 20:22:10 ip-100-0-100-1 kernel: [  150.484094] md: bind<xvdf>
Feb 12 20:22:10 ip-100-0-100-1 kernel: [  150.502697] md/raid0:md0: md_size is 8589930496 sectors.
Feb 12 20:22:10 ip-100-0-100-1 kernel: [  150.502712] md: RAID0 configuration for md0 - 1 zone
Feb 12 20:22:10 ip-100-0-100-1 kernel: [  150.502716] md: zone0=[xvdf/xvdg/xvdh/xvdi/xvdj/xvdk/xvdl/xvdm]
Feb 12 20:22:10 ip-100-0-100-1 kernel: [  150.502728]       zone-offset=         0KB, device-offset=         0KB, size=4294965248KB
Feb 12 20:22:10 ip-100-0-100-1 kernel: [  150.502733] 
Feb 12 20:22:10 ip-100-0-100-1 kernel: [  150.502759] md0: detected capacity change from 0 to 4398044413952
Feb 12 20:22:10 ip-100-0-100-1 kernel: [  150.519995]  md0: unknown partition table
Feb 12 20:22:10 ip-100-0-100-1 kernel: [  150.851140] SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled
Feb 12 20:22:10 ip-100-0-100-1 kernel: [  150.853980] SGI XFS Quota Management subsystem
Feb 12 20:22:10 ip-100-0-100-1 kernel: [  150.863890] XFS (md0): Mounting Filesystem
Feb 12 20:22:10 ip-100-0-100-1 kernel: [  151.187827] XFS (md0): Starting recovery (logdev: internal)
Feb 12 20:22:12 ip-100-0-100-1 kernel: [  152.950963] XFS (md0): Ending recovery (logdev: internal)

[-- Attachment #3: xfs_corruption_logs2.txt --]
[-- Type: text/plain, Size: 30760 bytes --]

Mar  5 01:14:33 ip-100-0-100-1 kernel: [14139930.248619] XFS (md0): Corruption detected. Unmount and run xfs_repair
Mar  5 01:14:33 ip-100-0-100-1 kernel: [14139930.248641] XFS (md0): Internal error xfs_trans_cancel at line 1925 of file fs/xfs/xfs_trans.c.  Caller 0xffffffffa021738f
Mar  5 01:14:33 ip-100-0-100-1 kernel: [14139930.248642] 
Mar  5 01:14:33 ip-100-0-100-1 kernel: [14139930.248654] Pid: 8140, comm: cm-sql-server Not tainted 3.2.28-45.63.amzn1.x86_64 #1
Mar  5 01:14:33 ip-100-0-100-1 kernel: [14139930.248681] Call Trace:
Mar  5 01:14:33 ip-100-0-100-1 kernel: [14139930.248719]  [<ffffffffa020390a>] xfs_error_report+0x3a/0x40 [xfs]
Mar  5 01:14:33 ip-100-0-100-1 kernel: [14139930.248744]  [<ffffffffa021738f>] ? xfs_create+0x1df/0x640 [xfs]
Mar  5 01:14:33 ip-100-0-100-1 kernel: [14139930.248772]  [<ffffffffa0251a49>] xfs_trans_cancel+0xe9/0x110 [xfs]
Mar  5 01:14:33 ip-100-0-100-1 kernel: [14139930.248794]  [<ffffffffa021738f>] xfs_create+0x1df/0x640 [xfs]
Mar  5 01:14:33 ip-100-0-100-1 kernel: [14139930.248814]  [<ffffffffa0207be2>] ? xfs_iunlock+0x62/0xf0 [xfs]
Mar  5 01:14:33 ip-100-0-100-1 kernel: [14139930.248835]  [<ffffffffa020be71>] xfs_vn_mknod+0xa1/0x1a0 [xfs]
Mar  5 01:14:33 ip-100-0-100-1 kernel: [14139930.248856]  [<ffffffffa020bf8b>] xfs_vn_create+0xb/0x10 [xfs]
Mar  5 01:14:33 ip-100-0-100-1 kernel: [14139930.248871]  [<ffffffff8114f8d0>] vfs_create+0xa0/0xc0
Mar  5 01:14:33 ip-100-0-100-1 kernel: [14139930.248878]  [<ffffffff8115134a>] do_last+0x57a/0x790
Mar  5 01:14:33 ip-100-0-100-1 kernel: [14139930.248885]  [<ffffffff81151ea0>] path_openat+0xd0/0x3f0
Mar  5 01:14:33 ip-100-0-100-1 kernel: [14139930.248893]  [<ffffffff811537ee>] ? user_path_at_empty+0x5e/0xa0
Mar  5 01:14:33 ip-100-0-100-1 kernel: [14139930.248899]  [<ffffffff811522d4>] do_filp_open+0x44/0xa0
Mar  5 01:14:33 ip-100-0-100-1 kernel: [14139930.248907]  [<ffffffff8115f2d2>] ? alloc_fd+0x102/0x150
Mar  5 01:14:33 ip-100-0-100-1 kernel: [14139930.248916]  [<ffffffff81142f42>] do_sys_open+0x102/0x1e0
Mar  5 01:14:33 ip-100-0-100-1 kernel: [14139930.248927]  [<ffffffff810ba44e>] ? audit_syscall_entry+0x1be/0x1e0
Mar  5 01:14:33 ip-100-0-100-1 kernel: [14139930.248935]  [<ffffffff8114304b>] sys_open+0x1b/0x20
Mar  5 01:14:33 ip-100-0-100-1 kernel: [14139930.248945]  [<ffffffff813b9012>] system_call_fastpath+0x16/0x1b
Mar  5 01:14:33 ip-100-0-100-1 kernel: [14139930.248953] XFS (md0): xfs_do_force_shutdown(0x8) called from line 1926 of file fs/xfs/xfs_trans.c.  Return address = 0xffffffffa0251a62
Mar  5 01:14:33 ip-100-0-100-1 kernel: [14139930.255999] XFS (md0): Corruption of in-memory data detected.  Shutting down filesystem
Mar  5 01:14:33 ip-100-0-100-1 kernel: [14139930.256031] XFS (md0): Please umount the filesystem and rectify the problem(s)
Mar  5 01:14:46 ip-100-0-100-1 kernel: [14139943.904054] XFS (md0): xfs_log_force: error 5 returned.
Mar  5 01:15:16 ip-100-0-100-1 kernel: [14139973.984037] XFS (md0): xfs_log_force: error 5 returned.
Mar  5 01:15:46 ip-100-0-100-1 kernel: [14140004.064047] XFS (md0): xfs_log_force: error 5 returned.
Mar  5 01:16:16 ip-100-0-100-1 kernel: [14140034.144035] XFS (md0): xfs_log_force: error 5 returned.
Mar  5 01:16:46 ip-100-0-100-1 kernel: [14140064.224033] XFS (md0): xfs_log_force: error 5 returned.
Mar  5 01:17:17 ip-100-0-100-1 kernel: [14140094.304044] XFS (md0): xfs_log_force: error 5 returned.
Mar  5 01:17:47 ip-100-0-100-1 kernel: [14140124.384089] XFS (md0): xfs_log_force: error 5 returned.
Mar  5 01:18:17 ip-100-0-100-1 kernel: [14140154.464046] XFS (md0): xfs_log_force: error 5 returned.
Mar  5 01:18:47 ip-100-0-100-1 kernel: [14140184.544048] XFS (md0): xfs_log_force: error 5 returned.
Mar  5 01:19:17 ip-100-0-100-1 kernel: [14140214.624042] XFS (md0): xfs_log_force: error 5 returned.
Mar  5 01:19:47 ip-100-0-100-1 kernel: [14140244.704053] XFS (md0): xfs_log_force: error 5 returned.
Mar  5 01:20:17 ip-100-0-100-1 kernel: [14140274.784037] XFS (md0): xfs_log_force: error 5 returned.
Mar  5 01:20:47 ip-100-0-100-1 kernel: [14140304.864053] XFS (md0): xfs_log_force: error 5 returned.
Mar  5 01:21:17 ip-100-0-100-1 kernel: [14140334.944047] XFS (md0): xfs_log_force: error 5 returned.
Mar  5 01:21:47 ip-100-0-100-1 kernel: [14140365.024039] XFS (md0): xfs_log_force: error 5 returned.
Mar  5 01:22:17 ip-100-0-100-1 kernel: [14140395.104048] XFS (md0): xfs_log_force: error 5 returned.
Mar  5 01:22:47 ip-100-0-100-1 kernel: [14140425.184059] XFS (md0): xfs_log_force: error 5 returned.
Mar  5 01:23:18 ip-100-0-100-1 kernel: [14140455.268032] XFS (md0): xfs_log_force: error 5 returned.
Mar  5 01:23:48 ip-100-0-100-1 kernel: [14140485.344057] XFS (md0): xfs_log_force: error 5 returned.
Mar  5 01:24:18 ip-100-0-100-1 kernel: [14140515.424049] XFS (md0): xfs_log_force: error 5 returned.
Mar  5 01:24:48 ip-100-0-100-1 kernel: [14140545.504036] XFS (md0): xfs_log_force: error 5 returned.
Mar  5 01:25:18 ip-100-0-100-1 kernel: [14140575.584050] XFS (md0): xfs_log_force: error 5 returned.
Mar  5 01:25:48 ip-100-0-100-1 kernel: [14140605.664066] XFS (md0): xfs_log_force: error 5 returned.
Mar  5 01:26:18 ip-100-0-100-1 kernel: [14140635.744039] XFS (md0): xfs_log_force: error 5 returned.
Mar  5 01:26:48 ip-100-0-100-1 kernel: [14140665.824046] XFS (md0): xfs_log_force: error 5 returned.
Mar  5 01:27:18 ip-100-0-100-1 kernel: [14140695.904032] XFS (md0): xfs_log_force: error 5 returned.
Mar  5 01:27:48 ip-100-0-100-1 kernel: [14140725.984050] XFS (md0): xfs_log_force: error 5 returned.
Mar  5 01:28:18 ip-100-0-100-1 kernel: [14140756.064031] XFS (md0): xfs_log_force: error 5 returned.
Mar  5 01:28:48 ip-100-0-100-1 kernel: [14140786.144043] XFS (md0): xfs_log_force: error 5 returned.
Mar  5 01:29:18 ip-100-0-100-1 kernel: [14140816.224052] XFS (md0): xfs_log_force: error 5 returned.
Mar  5 01:29:49 ip-100-0-100-1 kernel: [14140846.304055] XFS (md0): xfs_log_force: error 5 returned.
Mar  5 01:30:13 ip-100-0-100-1 init: tty (/dev/tty1) main process (1745) killed by TERM signal
Mar  5 01:30:13 ip-100-0-100-1 init: serial (hvc0) main process (1748) killed by TERM signal
Mar  5 01:30:13 ip-100-0-100-1 init: tty (/dev/tty2) main process (1750) killed by TERM signal
Mar  5 01:30:13 ip-100-0-100-1 init: tty (/dev/tty3) main process (1753) killed by TERM signal
Mar  5 01:30:13 ip-100-0-100-1 init: tty (/dev/tty4) main process (1755) killed by TERM signal
Mar  5 01:30:13 ip-100-0-100-1 init: tty (/dev/tty5) main process (1757) killed by TERM signal
Mar  5 01:30:13 ip-100-0-100-1 init: tty (/dev/tty6) main process (1759) killed by TERM signal
Mar  5 01:30:13 ip-100-0-100-1 init: plymouth-shutdown main process (28749) terminated with status 1
Mar  5 01:30:13 ip-100-0-100-1 init: splash-manager main process (28745) terminated with status 1
Mar  5 01:30:14 ip-100-0-100-1 ntpd[1397]: ntpd exiting on signal 15
Mar  5 01:30:15 ip-100-0-100-1 init: Disconnected from system bus
Mar  5 01:30:15 ip-100-0-100-1 auditd[1299]: The audit daemon is exiting.
Mar  5 01:30:15 ip-100-0-100-1 kernel: [14140872.394676] type=1305 audit(1362447015.159:903786): audit_pid=0 old=1299 auid=4294967295 ses=4294967295 res=1
Mar  5 01:30:15 ip-100-0-100-1 kernel: [14140872.504066] type=1305 audit(1362447015.271:903787): audit_enabled=0 old=1 auid=4294967295 ses=4294967295 res=1
Mar  5 01:30:15 ip-100-0-100-1 kernel: Kernel logging (proc) stopped.
Mar  5 01:30:15 ip-100-0-100-1 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="1314" x-info="http://www.rsyslog.com"] exiting on signal 15.
Mar  5 01:31:20 ip-100-0-100-1 kernel: imklog 5.8.10, log source = /proc/kmsg started.
Mar  5 01:31:20 ip-100-0-100-1 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="1310" x-info="http://www.rsyslog.com"] start
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] Initializing cgroup subsys cpuset
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] Initializing cgroup subsys cpu
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] Linux version 3.2.28-45.63.amzn1.x86_64 (mockbuild@gobi-build-31004) (gcc version 4.4.6 20110731 (Red Hat 4.4.6-3) (GCC) ) #1 SMP Fri Aug 24 05:34:49 UTC 2012
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] Command line: root=LABEL=/ console=hvc0 LANG=en_US.UTF-8 KEYTABLE=us
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] Marking TSC unstable due to Xen domain
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] ACPI in unprivileged domain disabled
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] Released 0 pages of unused memory
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] Set 0 page(s) to 1-1 mapping
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] BIOS-provided physical RAM map:
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000]  Xen: 0000000000000000 - 00000000000a0000 (usable)
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000]  Xen: 00000000000a0000 - 0000000000100000 (reserved)
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000]  Xen: 0000000000100000 - 00000001e0800000 (usable)
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] NX (Execute Disable) protection: active
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] DMI not present or invalid.
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] No AGP bridge found
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] last_pfn = 0x1e0800 max_arch_pfn = 0x400000000
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] last_pfn = 0x100000 max_arch_pfn = 0x400000000
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] init_memory_mapping: 0000000000000000-0000000100000000
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] init_memory_mapping: 0000000100000000-00000001e0800000
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] RAMDISK: 019a5000 - 02d63000
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] No NUMA configuration found
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] Faking a node at 0000000000000000-00000001e0800000
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] Initmem setup node 0 0000000000000000-00000001e0800000
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000]   NODE_DATA [00000001dfffb000 - 00000001dfffffff]
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] Zone PFN ranges:
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000]   DMA      0x00000010 -> 0x00001000
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000]   DMA32    0x00001000 -> 0x00100000
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000]   Normal   0x00100000 -> 0x001e0800
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] Movable zone start PFN for each node
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] early_node_map[2] active PFN ranges
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000]     0: 0x00000010 -> 0x000000a0
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000]     0: 0x00000100 -> 0x001e0800
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] SMP: Allowing 2 CPUs, 0 hotplug CPUs
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] No local APIC present
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] APIC: disable apic facility
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] APIC: switched to apic NOOP
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] PM: Registered nosave memory: 00000000000a0000 - 0000000000100000
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] PCI: Warning: Cannot find a gap in the 32bit address range
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] PCI: Unassigned devices with 32bit resource registers may break!
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] Allocating PCI resources starting at 1e0900000 (gap: 1e0900000:400000)
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] Booting paravirtualized kernel on Xen
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] Xen version: 3.4.3-2.6.18 (preserve-AD)
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] setup_percpu: NR_CPUS:32 nr_cpumask_bits:32 nr_cpu_ids:2 nr_node_ids:1
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] PERCPU: Embedded 27 pages/cpu @ffff8801dfc00000 s80512 r8192 d21888 u1048576
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] Built 1 zonelists in Node order, mobility grouping on.  Total pages: 1935244
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] Policy zone: Normal
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] Kernel command line: root=LABEL=/ console=hvc0 LANG=en_US.UTF-8 KEYTABLE=us
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] Checking aperture...
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] No AGP bridge found
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] Memory: 7612384k/7872512k available (3831k kernel code, 448k absent, 259680k reserved, 2986k data, 564k init)
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] SLUB: Genslabs=15, HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] Hierarchical RCU implementation.
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] NR_IRQS:4352 nr_irqs:288 16
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] Console: colour dummy device 80x25
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] console [tty0] enabled
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] console [hvc0] enabled
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] installing Xen timer for CPU 0
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.000000] Detected 2266.746 MHz processor.
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.004000] Calibrating delay loop (skipped), value calculated using timer frequency.. 4533.49 BogoMIPS (lpj=9066984)
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.004000] pid_max: default: 32768 minimum: 301
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.004000] Security Framework initialized
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.004000] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.004879] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.005876] Mount-cache hash table entries: 256
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.006089] Initializing cgroup subsys cpuacct
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.006099] Initializing cgroup subsys devices
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.006105] Initializing cgroup subsys freezer
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.006110] Initializing cgroup subsys blkio
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.006123] Initializing cgroup subsys perf_event
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.006204] CPU: Physical Processor ID: 0
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.006209] CPU: Processor Core ID: 0
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.006293] SMP alternatives: switching to UP code
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.031828] cpu 0 spinlock event irq 17
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.031970] Performance Events: unsupported p6 CPU model 26 no PMU driver, software events only.
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.032327] installing Xen timer for CPU 1
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.032342] cpu 1 spinlock event irq 23
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.032373] SMP alternatives: switching to SMP code
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.052311] Brought up 2 CPUs
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.052610] devtmpfs: initialized
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.056410] Grant table initialized
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.056445] NET: Registered protocol family 16
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.058176] PCI: setting up Xen PCI frontend stub
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.060148] bio: create slab <bio-0> at 0
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.060148] ACPI: Interpreter disabled.
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.060411] xen/balloon: Initialising balloon driver.
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.064126] xen-balloon: Initialising balloon driver.
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.064196] vgaarb: loaded
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.064196] PCI: System does not support PCI
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.064196] PCI: System does not support PCI
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.064196] NetLabel: Initializing
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.064196] NetLabel:  domain hash size = 128
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.064196] NetLabel:  protocols = UNLABELED CIPSOv4
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.064196] NetLabel:  unlabeled traffic allowed by default
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.064196] Switching to clocksource xen
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.071062] pnp: PnP ACPI: disabled
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.077967] NET: Registered protocol family 2
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.078519] IP route cache hash table entries: 262144 (order: 9, 2097152 bytes)
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.080914] TCP established hash table entries: 524288 (order: 11, 8388608 bytes)
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.083061] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.083302] TCP: Hash tables configured (established 524288 bind 65536)
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.083308] TCP reno registered
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.083348] UDP hash table entries: 4096 (order: 5, 131072 bytes)
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.083415] UDP-Lite hash table entries: 4096 (order: 5, 131072 bytes)
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.083539] NET: Registered protocol family 1
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.083611] Unpacking initramfs...
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.107495] Freeing initrd memory: 20216k freed
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.114360] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.114382] Placing 64MB software IO TLB between ffff8800fb0f3000 - ffff8800ff0f3000
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.114389] software IO TLB at phys 0xfb0f3000 - 0xff0f3000
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.114593] platform rtc_cmos: registered platform RTC device (no PNP device found)
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.116254] audit: initializing netlink socket (disabled)
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.140897] type=2000 audit(1362447069.000:1): initialized
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.167965] VFS: Disk quotas dquot_6.5.2
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.168142] Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.168353] msgmni has been set to 14907
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.168847] alg: No test for stdrng (krng)
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.169109] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254)
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.169185] io scheduler noop registered (default)
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.170111] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.233889] loop: module loaded
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.239661] blkfront: xvda1: barrier or flush: disabled
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.246811] blkfront: xvdb: barrier or flush: disabled
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.256682]  xvdb: unknown partition table
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.262806] Setting capacity to 880732160
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.262818] xvdb: detected capacity change from 0 to 450934865920
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.268257] blkfront: xvdc: barrier or flush: disabled
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.269543] Initialising Xen virtual ethernet driver.
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.271761] i8042: PNP: No PS/2 controller found. Probing ports directly.
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.272903] mousedev: PS/2 mouse device common for all mice
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.273117] TCP cubic registered
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.273124] NET: Registered protocol family 17
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.273136] Registering the dns_resolver key type
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.273414] registered taskstats version 1
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.278222]  xvdc: unknown partition table
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.278521] Setting capacity to 880732160
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.278529] xvdc: detected capacity change from 0 to 450934865920
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.279090] blkfront: xvdf: barrier or flush: disabled
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.280694]  xvdf: unknown partition table
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.280946] Setting capacity to 1073741824
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.280955] xvdf: detected capacity change from 0 to 549755813888
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.281570] blkfront: xvdg: barrier or flush: disabled
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.283235]  xvdg: unknown partition table
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.283466] Setting capacity to 1073741824
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.283474] xvdg: detected capacity change from 0 to 549755813888
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.284065] blkfront: xvdh: barrier or flush: disabled
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.285854]  xvdh: unknown partition table
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.286079] Setting capacity to 1073741824
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.286088] xvdh: detected capacity change from 0 to 549755813888
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.286648] blkfront: xvdi: barrier or flush: disabled
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.288278]  xvdi: unknown partition table
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.288507] Setting capacity to 1073741824
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.288516] xvdi: detected capacity change from 0 to 549755813888
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.289093] blkfront: xvdj: barrier or flush: disabled
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.290744]  xvdj: unknown partition table
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.290976] Setting capacity to 1073741824
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.290985] xvdj: detected capacity change from 0 to 549755813888
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.291551] blkfront: xvdk: barrier or flush: disabled
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.293249]  xvdk: unknown partition table
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.293475] Setting capacity to 1073741824
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.293484] xvdk: detected capacity change from 0 to 549755813888
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.294080] blkfront: xvdl: barrier or flush: disabled
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.295760]  xvdl: unknown partition table
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.295992] Setting capacity to 1073741824
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.296001] xvdl: detected capacity change from 0 to 549755813888
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.341564] blkfront: xvdm: barrier or flush: disabled
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.343537]  xvdm: unknown partition table
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.343811] Setting capacity to 1073741824
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.343820] xvdm: detected capacity change from 0 to 549755813888
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.376036] XENBUS: Device with no driver: device/console/0
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.376497] Freeing unused kernel memory: 564k freed
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.376667] Write protecting the kernel read-only data: 6144k
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.379215] Freeing unused kernel memory: 248k freed
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.379498] Freeing unused kernel memory: 476k freed
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.424640] device-mapper: uevent: version 1.0.3
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.424938] device-mapper: ioctl: 4.22.0-ioctl (2011-10-19) initialised: dm-devel@redhat.com
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.546167] md: bind<xvdh>
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.550535] md: bind<xvdg>
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.555182] md: bind<xvdf>
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.564242] md: bind<xvdl>
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.634878] md: bind<xvdj>
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.643855] md: bind<xvdk>
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.652782] md: bind<xvdi>
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.662105] md: bind<xvdm>
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.670066] md: raid0 personality registered for level 0
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.670268] bio: create slab <bio-1> at 1
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.670283] md/raid0:md127: md_size is 8589918208 sectors.
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.670292] md: RAID0 configuration for md127 - 1 zone
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.670298] md: zone0=[xvdf/xvdg/xvdh/xvdi/xvdj/xvdk/xvdl/xvdm]
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.670312]       zone-offset=         0KB, device-offset=         0KB, size=4294959104KB
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.670318] 
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.670339] md127: detected capacity change from 0 to 4398038122496
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    0.672711]  md127: unknown partition table
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    1.032147] EXT4-fs (xvda1): mounted filesystem with ordered data mode. Opts: (null)
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    1.064561] dracut: Remounting /dev/disk/by-label/\x2f with -o noatime,ro
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    1.093007] EXT4-fs (xvda1): mounted filesystem with ordered data mode. Opts: (null)
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    1.097641] dracut: Mounted root filesystem /dev/xvda1
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    1.201583] dracut: Loading SELinux policy
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    1.367920] dracut: /sbin/load_policy: Can't load policy: No such device
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    1.461984] dracut: Switching root
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    6.757631] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    6.880038] SCSI subsystem initialized
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    7.328511] EXT4-fs (xvda1): re-mounted. Opts: (null)
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    7.554407] kjournald starting.  Commit interval 5 seconds
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    7.554754] EXT3-fs (xvdb): using internal journal
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    7.554763] EXT3-fs (xvdb): mounted filesystem with ordered data mode
Mar  5 01:31:20 ip-100-0-100-1 kernel: [    8.455121] NET: Registered protocol family 10
Mar  5 01:33:34 ip-100-0-100-1 kernel: [  145.244240] md127: detected capacity change from 4398038122496 to 0
Mar  5 01:33:34 ip-100-0-100-1 kernel: [  145.244257] md: md127 stopped.
Mar  5 01:33:34 ip-100-0-100-1 kernel: [  145.244265] md: unbind<xvdm>
Mar  5 01:33:34 ip-100-0-100-1 kernel: [  145.260062] md: export_rdev(xvdm)
Mar  5 01:33:34 ip-100-0-100-1 kernel: [  145.260199] md: unbind<xvdi>
Mar  5 01:33:34 ip-100-0-100-1 kernel: [  145.272074] md: export_rdev(xvdi)
Mar  5 01:33:34 ip-100-0-100-1 kernel: [  145.272132] md: unbind<xvdk>
Mar  5 01:33:34 ip-100-0-100-1 kernel: [  145.284061] md: export_rdev(xvdk)
Mar  5 01:33:34 ip-100-0-100-1 kernel: [  145.284121] md: unbind<xvdj>
Mar  5 01:33:34 ip-100-0-100-1 kernel: [  145.340085] md: export_rdev(xvdj)
Mar  5 01:33:34 ip-100-0-100-1 kernel: [  145.340374] md: unbind<xvdl>
Mar  5 01:33:34 ip-100-0-100-1 kernel: [  145.368075] md: export_rdev(xvdl)
Mar  5 01:33:34 ip-100-0-100-1 kernel: [  145.368140] md: unbind<xvdf>
Mar  5 01:33:34 ip-100-0-100-1 kernel: [  145.380060] md: export_rdev(xvdf)
Mar  5 01:33:34 ip-100-0-100-1 kernel: [  145.380128] md: unbind<xvdg>
Mar  5 01:33:34 ip-100-0-100-1 kernel: [  145.408059] md: export_rdev(xvdg)
Mar  5 01:33:34 ip-100-0-100-1 kernel: [  145.408122] md: unbind<xvdh>
Mar  5 01:33:34 ip-100-0-100-1 kernel: [  145.440058] md: export_rdev(xvdh)
Mar  5 01:33:34 ip-100-0-100-1 kernel: [  145.466272] md: md0 stopped.
Mar  5 01:33:34 ip-100-0-100-1 kernel: [  145.474154] md: bind<xvdg>
Mar  5 01:33:34 ip-100-0-100-1 kernel: [  145.475207] md: bind<xvdh>
Mar  5 01:33:34 ip-100-0-100-1 kernel: [  145.476115] md: bind<xvdi>
Mar  5 01:33:34 ip-100-0-100-1 kernel: [  145.476993] md: bind<xvdj>
Mar  5 01:33:34 ip-100-0-100-1 kernel: [  145.478148] md: bind<xvdk>
Mar  5 01:33:34 ip-100-0-100-1 kernel: [  145.478984] md: bind<xvdl>
Mar  5 01:33:34 ip-100-0-100-1 kernel: [  145.479975] md: bind<xvdm>
Mar  5 01:33:34 ip-100-0-100-1 kernel: [  145.480837] md: bind<xvdf>
Mar  5 01:33:34 ip-100-0-100-1 kernel: [  145.495680] md/raid0:md0: md_size is 8589918208 sectors.
Mar  5 01:33:34 ip-100-0-100-1 kernel: [  145.495693] md: RAID0 configuration for md0 - 1 zone
Mar  5 01:33:34 ip-100-0-100-1 kernel: [  145.495697] md: zone0=[xvdf/xvdg/xvdh/xvdi/xvdj/xvdk/xvdl/xvdm]
Mar  5 01:33:34 ip-100-0-100-1 kernel: [  145.495709]       zone-offset=         0KB, device-offset=         0KB, size=4294959104KB
Mar  5 01:33:34 ip-100-0-100-1 kernel: [  145.495715] 
Mar  5 01:33:34 ip-100-0-100-1 kernel: [  145.495740] md0: detected capacity change from 0 to 4398038122496
Mar  5 01:33:34 ip-100-0-100-1 kernel: [  145.516737]  md0: unknown partition table
Mar  5 01:33:34 ip-100-0-100-1 kernel: [  145.786885] SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled
Mar  5 01:33:34 ip-100-0-100-1 kernel: [  145.787293] SGI XFS Quota Management subsystem
Mar  5 01:33:34 ip-100-0-100-1 kernel: [  145.789442] XFS (md0): Mounting Filesystem
Mar  5 01:33:35 ip-100-0-100-1 kernel: [  146.274464] XFS (md0): Starting recovery (logdev: internal)
Mar  5 01:33:42 ip-100-0-100-1 kernel: [  153.274667] XFS (md0): Ending recovery (logdev: internal)

[-- Attachment #4: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: XFS filesystem on EC2 instance corrupts and shuts down
  2013-03-06  8:07 XFS filesystem on EC2 instance corrupts and shuts down Shrinath M
@ 2013-03-06 12:59 ` Ric Wheeler
  2013-03-06 13:03   ` Shrinath M
  0 siblings, 1 reply; 15+ messages in thread
From: Ric Wheeler @ 2013-03-06 12:59 UTC (permalink / raw)
  To: Shrinath M; +Cc: Sabyasachi Ruj, Vivek Goel, xfs

On 03/06/2013 03:07 AM, Shrinath M wrote:
> We are experiencing a strange XFS corruption issue. If we look in 
> /var/log/messages, it simply says -
>
>     Mar  5 01:14:33 ip-100-0-100-1 kernel: [14139930.248619] XFS (md0): 
> Corruption detected. Unmount and run xfs_repair
>
> It shuts down the filesystem after this. On rebooting, it calls xfs_repair 
> automatically and everything comes back to normal.
> We have had 2 such occurrences till now, I am attaching the relevant parts of 
> /var/log/messages here, assuming someone can enlighten me on whats going wrong.
>
> Machine details are as follows -
>
> We are using Amazon AMI version: Amazon Linux AMI release 2012.09 We are 
> running 8 EBS volumes of 512 MB each, in RAID 0 Array. $~: uname -a Linux 
> ip-100-0-100-1 3.2.34-55.46.amzn1.x86_64 #1 SMP Tue Nov 20 10:06:15 UTC 2012 
> x86_64 x86_64 x86_64 GNU/Linux
>
>
> Ask me if anyone wants any more details
>

I think that you would need to verify that the Amazon storage is not throwing 
errors - do your logs show IO errors or issues before XFS hits an issue?

Thanks!

Ric


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: XFS filesystem on EC2 instance corrupts and shuts down
  2013-03-06 12:59 ` Ric Wheeler
@ 2013-03-06 13:03   ` Shrinath M
  2013-03-06 13:08     ` Ric Wheeler
  0 siblings, 1 reply; 15+ messages in thread
From: Shrinath M @ 2013-03-06 13:03 UTC (permalink / raw)
  To: Ric Wheeler, Supratik Goswami; +Cc: Sabyasachi Ruj, Vivek Goel, xfs


[-- Attachment #1.1: Type: text/plain, Size: 328 bytes --]

On Wed, Mar 6, 2013 at 6:29 PM, Ric Wheeler <rwheeler@redhat.com> wrote:

> I think that you would need to verify that the Amazon storage is not
> throwing errors - do your logs show IO errors or issues before XFS hits an
> issue?


No IO errors in /var/log/messages.
Where else should I be looking?



-- 
Regards
*Shrinath.M*


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: XFS filesystem on EC2 instance corrupts and shuts down
  2013-03-06 13:03   ` Shrinath M
@ 2013-03-06 13:08     ` Ric Wheeler
  2013-03-06 13:12       ` Supratik Goswami
  0 siblings, 1 reply; 15+ messages in thread
From: Ric Wheeler @ 2013-03-06 13:08 UTC (permalink / raw)
  To: Shrinath M; +Cc: Sabyasachi Ruj, xfs, Supratik Goswami, Vivek Goel

On 03/06/2013 08:03 AM, Shrinath M wrote:
>
> On Wed, Mar 6, 2013 at 6:29 PM, Ric Wheeler <rwheeler@redhat.com 
> <mailto:rwheeler@redhat.com>> wrote:
>
>     I think that you would need to verify that the Amazon storage is not
>     throwing errors - do your logs show IO errors or issues before XFS hits an
>     issue?
>
>
> No IO errors in /var/log/messages.
> Where else should I be looking?
>
>

Feb 12 19:47:18 ip-100-0-100-1 kernel: [2541168.023638] XFS (md0): I/O Error Detected. Shutting down filesystem

The line above is an IO error from MD.

I would suggest trying to reproduce without MD in the picture first. It is always 
best to try to reproduce with the simplest setup first and work your way up the 
complexity ladder.

Ric


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: XFS filesystem on EC2 instance corrupts and shuts down
  2013-03-06 13:08     ` Ric Wheeler
@ 2013-03-06 13:12       ` Supratik Goswami
  2013-03-06 13:15         ` Supratik Goswami
  2013-03-06 14:25         ` Ric Wheeler
  0 siblings, 2 replies; 15+ messages in thread
From: Supratik Goswami @ 2013-03-06 13:12 UTC (permalink / raw)
  To: Shrinath M, Sabyasachi Ruj, Vivek Goel, xfs


[-- Attachment #1.1: Type: text/plain, Size: 970 bytes --]

Have we created a ticket with AWS ?

It could be an EBS issue, who knows; we need to confirm that first.

--
Warm Regards

Supratik




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: XFS filesystem on EC2 instance corrupts and shuts down
  2013-03-06 13:12       ` Supratik Goswami
@ 2013-03-06 13:15         ` Supratik Goswami
  2013-03-06 14:25         ` Ric Wheeler
  1 sibling, 0 replies; 15+ messages in thread
From: Supratik Goswami @ 2013-03-06 13:15 UTC (permalink / raw)
  To: Shrinath M, Sabyasachi Ruj, Vivek Goel, xfs


[-- Attachment #1.1: Type: text/plain, Size: 1167 bytes --]

Please ignore my previous mail.

--
Warm Regards

Supratik




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: XFS filesystem on EC2 instance corrupts and shuts down
  2013-03-06 13:12       ` Supratik Goswami
  2013-03-06 13:15         ` Supratik Goswami
@ 2013-03-06 14:25         ` Ric Wheeler
  2013-03-13 18:07           ` Shrinath M
  1 sibling, 1 reply; 15+ messages in thread
From: Ric Wheeler @ 2013-03-06 14:25 UTC (permalink / raw)
  To: Supratik Goswami; +Cc: Sabyasachi Ruj, xfs, Vivek Goel, Shrinath M

I would suggest contacting Amazon's customer support channel (or the vendor you 
paid for the Linux instance you are running).

XFS developer list is probably not the correct forum to help you debug this :)

Good luck!

Ric




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: XFS filesystem on EC2 instance corrupts and shuts down
  2013-03-06 14:25         ` Ric Wheeler
@ 2013-03-13 18:07           ` Shrinath M
  2013-03-13 18:24             ` Ben Myers
  2013-03-13 18:56             ` Eric Sandeen
  0 siblings, 2 replies; 15+ messages in thread
From: Shrinath M @ 2013-03-13 18:07 UTC (permalink / raw)
  To: Ric Wheeler; +Cc: Sabyasachi Ruj, xfs, Supratik Goswami, Vivek Goel


[-- Attachment #1.1: Type: text/plain, Size: 2103 bytes --]

Sorry to be asking on the dev list, but Amazon seems to be clueless in this
case :(
Can someone tell me where we can find the logs/output of xfs_repair after
it runs? We just reboot the machine when we see this, and neither
/var/log/messages nor dmesg seems to know anything about what it repaired.




-- 
Regards
*Shrinath.M*


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: XFS filesystem on EC2 instance corrupts and shuts down
  2013-03-13 18:07           ` Shrinath M
@ 2013-03-13 18:24             ` Ben Myers
  2013-03-13 18:56             ` Eric Sandeen
  1 sibling, 0 replies; 15+ messages in thread
From: Ben Myers @ 2013-03-13 18:24 UTC (permalink / raw)
  To: Shrinath M; +Cc: Sabyasachi Ruj, Vivek Goel, Supratik Goswami, Ric Wheeler, xfs

Hey Shrinath,

On Wed, Mar 13, 2013 at 11:37:52PM +0530, Shrinath M wrote:
> Sorry to be asking in dev thread, but Amazon seems to be clueless in this
> case :(
> Can someone tell me where can we find the logs/output of xfs repair after
> this runs?

xfs_repair doesn't keep a separate log file.  All the output is on the command
line.  You'll need to either redirect the output of stdout and stderr to a
file, or keep a screen or console log.
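
A minimal sketch of the redirection described above. The helper name
run_logged, the log path, and the device in the usage comment are
assumptions for illustration and do not come from this thread. Note that
xfs_repair has to be run on an unmounted filesystem.

```shell
# run_logged LOGFILE CMD [ARGS...]
# Runs CMD with stdout and stderr both redirected into LOGFILE, then
# prints the saved log and returns CMD's exit status.
run_logged() {
    logfile=$1
    shift
    # "2>&1" sends stderr to the same place as stdout, so nothing is lost.
    "$@" > "$logfile" 2>&1
    status=$?
    cat "$logfile"
    return "$status"
}

# Hypothetical usage (device and log path are placeholders):
#   run_logged /root/xfs_repair-md0.log xfs_repair /dev/md0
```

The log file then survives a reboot, which the terminal scrollback does not.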

>	We just reboot the machine when we see this and the
> /var/log/messages or dmesg seems to know nothing about what it repaired.

The contents of /var/log/messages could help you to understand why xfs might
have forced shutdown, but won't help you with xfs_repair.

Since you're getting IO errors, it sounds like you have a problem lower in the
stack than the filesystem.  At the filesystem level there isn't much we can do
with a block device that is giving IO errors so we just shut down.  Consider
copying the remote block device to a local one ('dd' might be a good choice for
this) and see if you can get a clean copy.  Then it's time to see about the
filesystem.
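
One way the 'dd' check could look, as a sketch only. The function name
copy_device and the paths in the usage comment are hypothetical. The idea
is simply that dd exits non-zero if it cannot read the source, so the exit
status tells you whether the copy completed.

```shell
# copy_device SRC DST
# Copies SRC to DST in 1 MiB blocks. dd stops and returns a non-zero
# status on a read or write failure, so a zero status means the whole
# source was read without the block layer reporting an error.
copy_device() {
    src=$1
    dst=$2
    if dd if="$src" of="$dst" bs=1048576; then
        echo "copy of $src completed without errors" >&2
        return 0
    else
        echo "copy of $src FAILED, check dmesg for block layer errors" >&2
        return 1
    fi
}

# Hypothetical usage, one member device at a time:
#   copy_device /dev/xvdf /mnt/local/xvdf.img
```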

Regards,
Ben



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: XFS filesystem on EC2 instance corrupts and shuts down
  2013-03-13 18:07           ` Shrinath M
  2013-03-13 18:24             ` Ben Myers
@ 2013-03-13 18:56             ` Eric Sandeen
  2013-03-13 19:10               ` Eric Sandeen
  2013-03-13 23:42               ` Dave Chinner
  1 sibling, 2 replies; 15+ messages in thread
From: Eric Sandeen @ 2013-03-13 18:56 UTC (permalink / raw)
  To: Shrinath M; +Cc: Sabyasachi Ruj, Vivek Goel, Supratik Goswami, Ric Wheeler, xfs

On 3/13/13 1:07 PM, Shrinath M wrote:
> Sorry to be asking in dev thread, but Amazon seems to be clueless in this case :(
> Can someone tell me where can we find the logs/output of xfs repair
> after this runs? We just reboot the machine when we see this and the
> /var/log/messages or dmesg seems to know nothing about what it
> repaired.

xfs_repair does not run automatically at boot on any OS I know of; xfs simply
replays the log.  But then I don't know what OS you are running, looks like
an amazon special?  It's a pity they can't support the OS they provide you,
because on an older kernel like this, upstream developers will be less
interested unless the problem persists in upstream kernels.  This sort
of support is usually best left to an OS vendor.

But all that aside, you list this as the first error:

    Mar  5 01:14:33 ip-100-0-100-1 kernel: [14139930.248619] XFS (md0): Corruption detected. Unmount and run xfs_repair

but I am wondering if there might be more information before this which is not in your trimmed logs.

The text above is from xfs_corruption_error() which calls xfs_error_report() before
the above message, and which should normally tell us a lot more about what went wrong, for 
example something like "Internal error %s at line %d of file %s.  Caller 0x%"
and possibly a hexdump or stack trace.

One of the things in
http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
is:

" dmesg output showing all error messages and stack traces "

If you really didn't get anything else before this, try:

echo 11 > /proc/sys/fs/xfs/error_level

to capture the one instance where a corruption does not trigger verbose logs. That actually might be what you hit.
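
A small sketch wrapping the command above so that the previous value is
shown before it is changed. The function name set_xfs_error_level and the
optional second argument are assumptions added for illustration. Writing
to /proc needs root, and the setting does not persist across a reboot.

```shell
# set_xfs_error_level LEVEL [FILE]
# Writes LEVEL to the XFS error_level sysctl file and reports the change.
# FILE defaults to the path used in the command above.
set_xfs_error_level() {
    level=$1
    file=${2:-/proc/sys/fs/xfs/error_level}
    # Reject anything that is not a plain non-negative integer.
    case $level in
        ''|*[!0-9]*) echo "level must be a number" >&2; return 2 ;;
    esac
    old=$(cat "$file") || return 1
    echo "$level" > "$file" || return 1
    echo "error_level: $old -> $level"
}

# Usage (as root):
#   set_xfs_error_level 11
```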

It's a little odd that you get:

Feb 12 19:47:18 ip-100-0-100-1 kernel: [2541168.014259] XFS (md0): xfs_iunlink_remove: xfs_itobp() returned error 117.

because AFAIK, 117 is not any known error number (not even xfs's old EFSCORRUPTED value, which was 990)
But I see other references in various places to this error number coming from XFS - so I'm not sure.

-Eric

> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: XFS filesystem on EC2 instance corrupts and shuts down
  2013-03-13 18:56             ` Eric Sandeen
@ 2013-03-13 19:10               ` Eric Sandeen
  2013-03-13 23:42               ` Dave Chinner
  1 sibling, 0 replies; 15+ messages in thread
From: Eric Sandeen @ 2013-03-13 19:10 UTC (permalink / raw)
  To: Shrinath M; +Cc: Sabyasachi Ruj, xfs, Supratik Goswami, Ric Wheeler, Vivek Goel

On 3/13/13 1:56 PM, Eric Sandeen wrote:

> It's a little odd that you get:
> 
> Feb 12 19:47:18 ip-100-0-100-1 kernel: [2541168.014259] XFS (md0): xfs_iunlink_remove: xfs_itobp() returned error 117.
> 
> because AFAIK, 117 is not any known error number (not even xfs's old EFSCORRUPTED value, which was 990)
> But I see other references in various places to this error number coming from XFS - so I'm not sure.

Ugh, no it's not odd, it's:

#define EUCLEAN 117
which maps to EFSCORRUPTED.  Sorry for that noise, how'd I miss that?  :(
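
To avoid the same confusion when reading logs, here is a rough sketch that
pulls the numbers out of "returned error N" messages and labels the ones
discussed here. The function names are made up for this example, and the
table only covers 117 (from this thread) and 5 (EIO on Linux); anything
else is reported as unknown rather than guessed.

```shell
# describe_xfs_error NUM
# Prints a short label for an error number seen in XFS kernel messages.
describe_xfs_error() {
    case $1 in
        5)   echo "EIO (I/O error)" ;;
        117) echo "EUCLEAN (used by XFS as EFSCORRUPTED)" ;;
        *)   echo "unknown ($1)" ;;
    esac
}

# annotate_errors
# Reads log lines on stdin and prints one "NUM: label" line for each
# distinct number found after the text "returned error".
annotate_errors() {
    sed -n 's/.*returned error \([0-9][0-9]*\).*/\1/p' | sort -un |
    while read -r num; do
        echo "$num: $(describe_xfs_error "$num")"
    done
}

# Usage:
#   grep 'XFS (md0)' /var/log/messages | annotate_errors
```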

Anyway, turn up the error level & see if you get more info.

-Eric


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: XFS filesystem on EC2 instance corrupts and shuts down
  2013-03-13 18:56             ` Eric Sandeen
  2013-03-13 19:10               ` Eric Sandeen
@ 2013-03-13 23:42               ` Dave Chinner
  2013-03-14  1:28                 ` Shrinath M
  1 sibling, 1 reply; 15+ messages in thread
From: Dave Chinner @ 2013-03-13 23:42 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Shrinath M, Sabyasachi Ruj, Vivek Goel, xfs, Supratik Goswami,
	Ric Wheeler

On Wed, Mar 13, 2013 at 01:56:35PM -0500, Eric Sandeen wrote:
> XFS (md0): xfs_iunlink_remove: xfs_itobp() returned error 117.

Corrupted unlinked inode list. You need to run xfs_repair to fix
this.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: XFS filesystem on EC2 instance corrupts and shuts down
  2013-03-13 23:42               ` Dave Chinner
@ 2013-03-14  1:28                 ` Shrinath M
  2013-03-14 13:31                   ` Stan Hoeppner
  2013-03-14 22:02                   ` Dave Chinner
  0 siblings, 2 replies; 15+ messages in thread
From: Shrinath M @ 2013-03-14  1:28 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Sabyasachi Ruj, Vivek Goel, Eric Sandeen, xfs, Supratik Goswami,
	Ric Wheeler


[-- Attachment #1.1: Type: text/plain, Size: 2173 bytes --]

Thanks Ben, Dave and Eric.

Eric,
>>but I am wondering if there might be more information before this which
is not in your trimmed logs.
No, this was the first entry every time we have it in /var/log/messages.
dmesg also holds the same. After reboot, it simply fixes without anyone
doing anything.

The Linux we are running is definitely amazon baked one, looks like this -
$~: uname -a Linux ip-100-0-100-1 3.2.34-55.46.amzn1.x86_64 #1 SMP Tue Nov
20 10:06:15 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

 - dmesg shows something like this after repairing/rebooting -

[    8.414176] SGI XFS with ACLs, security attributes, realtime, large
block/inode numbers, no debug enabled
[    8.415342] SGI XFS Quota Management subsystem
[    8.417664] XFS (md0): Mounting Filesystem
[    8.771553] XFS (md0): Starting recovery (logdev: internal)
[    9.977325] XFS (md0): Ending recovery (logdev: internal)

Check the first line there; it says no debug enabled. How good/bad is this
debug mode in production environments? We are not getting any corruption in
our local/test environments, but in production we are getting it once every
third day.

Dave,
You say unlinked inode list, but if that is the case, it should have an entry
in /var/log/messages, right?
Anyway, how can we create this situation? By forcing multiple processes to
write/delete files on a small disk? Since we are still unaware of what is
causing this issue, reproducing it in a local/production environment is just
shooting in the dark... :(

Does turning up the error level affect the data in any way? Or is it *just*
more detailed logging that is sensitive to all small errors?


Really appreciate the support that you devs are giving which really is the
job of AWS support... I so wish they had some helpful and knowledgeable
people in support.



On Thu, Mar 14, 2013 at 5:12 AM, Dave Chinner <david@fromorbit.com> wrote:

> On Wed, Mar 13, 2013 at 01:56:35PM -0500, Eric Sandeen wrote:
> > XFS (md0): xfs_iunlink_remove: xfs_itobp() returned error 117.
>
> Corrupted unlinked inode list. You need to run xfs_repair to fix
> this.
>
> Chers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>



-- 
Regards
*Shrinath.M*


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: XFS filesystem on EC2 instance corrupts and shuts down
  2013-03-14  1:28                 ` Shrinath M
@ 2013-03-14 13:31                   ` Stan Hoeppner
  2013-03-14 22:02                   ` Dave Chinner
  1 sibling, 0 replies; 15+ messages in thread
From: Stan Hoeppner @ 2013-03-14 13:31 UTC (permalink / raw)
  To: Shrinath M
  Cc: Sabyasachi Ruj, Eric Sandeen, xfs, Vivek Goel, Supratik Goswami,
	Ric Wheeler

On 3/13/2013 8:28 PM, Shrinath M wrote:
> Thanks Ben, Dave and Eric.
> 
> Eric,
>>> but I am wondering if there might be more information before this which
> is not in your trimmed logs.
> No, this was the first entry every time we have it in /var/log/messages.
> dmesg also holds the same. After reboot, it simply fixes without anyone
> doing anything.
...
>  - dmesg shows something like this after repairing/rebooting -
> 
> [    8.414176] SGI XFS with ACLs, security attributes, realtime, large
> block/inode numbers, no debug enabled
> [    8.415342] SGI XFS Quota Management subsystem
> [    8.417664] XFS (md0): Mounting Filesystem
> [    8.771553] XFS (md0): Starting recovery (logdev: internal)
> [    9.977325] XFS (md0): Ending recovery (logdev: internal)

The active log displayed by the dmesg command is cleared and started
fresh at each reboot, which may be why you don't see the IO errors.
You should find them in the previous dmesg log files.

$ ls -la /var/log/dmesg*
-rw-r--r-- 1 root adm  12K Feb 29  2012 /var/log/dmesg
-rw-r--r-- 1 root adm  12K Feb 20  2012 /var/log/dmesg.0
-rw-r--r-- 1 root adm 4.7K Aug 18  2011 /var/log/dmesg.1.gz
-rw-r--r-- 1 root adm 4.7K Aug 18  2011 /var/log/dmesg.2.gz
-rw-r--r-- 1 root adm 4.7K Jun 27  2011 /var/log/dmesg.3.gz
-rw-r--r-- 1 root adm 4.7K May 18  2011 /var/log/dmesg.4.gz

Don't let the file dates in this example throw you, here's why:

 08:14:42 up 378 days, 19:44,  1 user,  load average: 0.06, 0.31, 0.22
          ^^^^^^^^^^^
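
A sketch of how those rotated files could be searched in one go,
including the gzip-compressed ones. The function name scan_logs and the
search string in the usage comment are assumptions, and the file naming
follows the listing above; other distributions may rotate differently.

```shell
# scan_logs PATTERN FILE...
# Searches each readable FILE for PATTERN (case-insensitive), reading
# *.gz files through gzip, and prefixes every match with the file name.
scan_logs() {
    pattern=$1
    shift
    for f in "$@"; do
        # Skip files that do not exist or cannot be read.
        [ -r "$f" ] || continue
        case $f in
            *.gz) gzip -dc "$f" ;;
            *)    cat "$f" ;;
        esac | grep -i -e "$pattern" | awk -v f="$f" '{ print f ": " $0 }'
    done
    return 0
}

# Usage:
#   scan_logs 'I/O error' /var/log/dmesg*
```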

-- 
Stan


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: XFS filesystem on EC2 instance corrupts and shuts down
  2013-03-14  1:28                 ` Shrinath M
  2013-03-14 13:31                   ` Stan Hoeppner
@ 2013-03-14 22:02                   ` Dave Chinner
  1 sibling, 0 replies; 15+ messages in thread
From: Dave Chinner @ 2013-03-14 22:02 UTC (permalink / raw)
  To: Shrinath M
  Cc: Sabyasachi Ruj, Vivek Goel, Eric Sandeen, xfs, Supratik Goswami,
	Ric Wheeler

On Thu, Mar 14, 2013 at 06:58:19AM +0530, Shrinath M wrote:
> Thanks Ben, Dave and Eric.
> 
> Eric,
> >>but I am wondering if there might be more information before this which
> is not in your trimmed logs.
> No, this was the first entry every time we have it in /var/log/messages.
> dmesg also holds the same. After reboot, it simply fixes without anyone
> doing anything.
> 
> The Linux we are running is definitely amazon baked one, looks like this -
> $~: uname -a Linux ip-100-0-100-1 3.2.34-55.46.amzn1.x86_64 #1 SMP Tue Nov
> 20 10:06:15 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

So, this is an amazon special kernel by the looks of it. I think
that only amazon can really help you track down the problem...

>  - dmesg shows something like this after repairing/rebooting -
> 
> [    8.414176] SGI XFS with ACLs, security attributes, realtime, large
> block/inode numbers, no debug enabled
> [    8.415342] SGI XFS Quota Management subsystem
> [    8.417664] XFS (md0): Mounting Filesystem
> [    8.771553] XFS (md0): Starting recovery (logdev: internal)
> [    9.977325] XFS (md0): Ending recovery (logdev: internal)
> 
> Check the first line there, it says no debug enabled. How good/bad is this
> debug mode in production environments? We are not getting any corruption in
> our local/test environments, in production, we are getting it once on every
> third day.

Debug should not be used in production environments. It'll cause
panics in situations where production kernels continue just
fine, and it changes the allocation algorithms to give better code
coverage for testing rather than optimal layout.

> Dave,
> You say unlinked inode list, but if that, it should have an entry in
> /var/log/messages, right?

It did - the error that Eric pointed out.

> Anyway, how can we create this situation?

If I knew, I would have fixed the bug already. You need to work out
what in your production environment is triggering it...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2013-03-14 22:02 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-03-06  8:07 XFS filesystem on EC2 instance corrupts and shuts down Shrinath M
2013-03-06 12:59 ` Ric Wheeler
2013-03-06 13:03   ` Shrinath M
2013-03-06 13:08     ` Ric Wheeler
2013-03-06 13:12       ` Supratik Goswami
2013-03-06 13:15         ` Supratik Goswami
2013-03-06 14:25         ` Ric Wheeler
2013-03-13 18:07           ` Shrinath M
2013-03-13 18:24             ` Ben Myers
2013-03-13 18:56             ` Eric Sandeen
2013-03-13 19:10               ` Eric Sandeen
2013-03-13 23:42               ` Dave Chinner
2013-03-14  1:28                 ` Shrinath M
2013-03-14 13:31                   ` Stan Hoeppner
2013-03-14 22:02                   ` Dave Chinner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox