From: Alberto Alonso <alberto@ggsys.net>
To: David Chinner <dgc@sgi.com>
Cc: Pallai Roland <dap@mail.index.hu>,
Linux-Raid <linux-raid@vger.kernel.org>,
xfs@oss.sgi.com
Subject: Re: raid5: I lost a XFS file system due to a minor IDE cable problem
Date: Mon, 28 May 2007 17:45:27 -0500
Message-ID: <1180392327.21028.140.camel@w100>
In-Reply-To: <20070525083650.GO85884050@sgi.com>
On Fri, 2007-05-25 at 18:36 +1000, David Chinner wrote:
> On Fri, May 25, 2007 at 12:43:51AM -0500, Alberto Alonso wrote:
> > I think his point was that going into a read only mode causes a
> > less catastrophic situation (ie. a web server can still serve
> > pages).
>
> Sure - but once you've detected one corruption or had metadata
> I/O errors, can you trust the rest of the filesystem?
>
> > I think that is a valid point, rather than shutting down
> > the file system completely, an automatic switch to where the least
> > disruption of service can occur is always desired.
>
> I consider the possibility of serving out bad data (i.e after
> a remount to readonly) to be the worst possible disruption of
> service that can happen ;)

I guess it does depend on the nature of the failure. A write failure
on block 2000 does not imply corruption of the other 2TB of data.

I wish I knew more about the internals of file systems. Since I don't,
I was just commenting on a feature that would be nice to have, though
maybe there is no way to implement it. I figured that a dynamic table
of bad blocks could be kept: if an attempt is made to access one of
those blocks (read or write), an I/O error is returned; if the block is
not on the list, the access is processed. This would help a server
with large file systems continue operating for most users.
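
To make the bad-block table idea concrete, here is a rough sketch in
Python. It is only an illustration of the bookkeeping, not how XFS or
the md layer works. The names BadBlockTable and FlakyDevice, and the
read_block/write_block interface, are all made up for this example.

```python
import errno


class BadBlockTable:
    """Hypothetical wrapper that fences off known-bad blocks of a device."""

    def __init__(self, device):
        self.device = device     # any object with read_block/write_block
        self.bad_blocks = set()  # grows as failures are observed

    def _access(self, op, block, *args):
        # A block already on the list fails fast with an I/O error.
        if block in self.bad_blocks:
            raise OSError(errno.EIO, f"block {block} is marked bad")
        try:
            return op(block, *args)
        except OSError:
            # First failure: remember the block, then report the error.
            self.bad_blocks.add(block)
            raise

    def read_block(self, block):
        return self._access(self.device.read_block, block)

    def write_block(self, block, data):
        return self._access(self.device.write_block, block, data)


class FlakyDevice:
    """Toy in-memory device whose block 2000 always fails on write."""

    def __init__(self):
        self.blocks = {}

    def read_block(self, block):
        return self.blocks.get(block, b"\0" * 512)

    def write_block(self, block, data):
        if block == 2000:
            raise OSError(errno.EIO, "simulated write failure")
        self.blocks[block] = data
```

In this sketch a failed write to block 2000 puts it on the list, later
reads or writes of that block get EIO without touching the device, and
every other block is served normally. It does not address the case
where the failed block holds metadata that the rest of the file system
depends on.
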
> > I personally have found the XFS file system to be great for
> > my needs (except issues with NFS interaction, where the bug report
> > never got answered), but that doesn't mean it can not be improved.
>
> Got a pointer?

I can't seem to find it. I'm pretty sure I used Bugzilla to report
it. I did find the kernel dump file though, so here it is:

Oct 3 15:34:07 localhost kernel: xfs_iget_core: ambiguous vns:
vp/0xd1e69c80, invp/0xc989e380
Oct 3 15:34:07 localhost kernel: ------------[ cut here ]------------
Oct 3 15:34:07 localhost kernel: kernel BUG at
fs/xfs/support/debug.c:106!
Oct 3 15:34:07 localhost kernel: invalid operand: 0000 [#1]
Oct 3 15:34:07 localhost kernel: PREEMPT SMP
Oct 3 15:34:07 localhost kernel: Modules linked in: af_packet
iptable_filter ip_tables nfsd exportfs lockd sunrpc ipv6 xfs capability
commoncap ext3 jbd mbcache aic7xxx i2c_dev tsdev floppy mousedev
parport_pc parport psmouse evdev pcspkr hw_random shpchp pciehp
pci_hotplug intel_agp intel_mch_agp agpgart uhci_hcd usbcore piix
ide_core e1000 cfi_cmdset_0001 cfi_util mtdpart mtdcore jedec_probe
gen_probe chipreg dm_mod w83781d i2c_sensor i2c_i801 i2c_core raid5 xor
genrtc sd_mod aic79xx scsi_mod raid1 md unix font vesafb cfbcopyarea
cfbimgblt cfbfillrect
Oct 3 15:34:07 localhost kernel: CPU: 0
Oct 3 15:34:07 localhost kernel: EIP: 0060:[__crc_pm_idle
+3334982/5290900] Not tainted
Oct 3 15:34:07 localhost kernel: EFLAGS: 00010246 (2.6.8-2-686-smp)
Oct 3 15:34:07 localhost kernel: EIP is at cmn_err+0xc5/0xe0 [xfs]
Oct 3 15:34:07 localhost kernel: eax: 00000000 ebx: f602c000 ecx:
c02dcfbc edx: c02dcfbc
Oct 3 15:34:07 localhost kernel: esi: f8c40e28 edi: f8c56a3e ebp:
00000293 esp: f602da08
Oct 3 15:34:07 localhost kernel: ds: 007b es: 007b ss: 0068
Oct 3 15:34:07 localhost kernel: Process nfsd (pid: 2740,
threadinfo=f602c000 task=f71a7210)
Oct 3 15:34:07 localhost kernel: Stack: f8c40e28 f8c40def f8c56a00
00000000 f602c000 074aa1aa f8c41700 ea2f0a40
Oct 3 15:34:07 localhost kernel: f8c0a745 00000000 f8c41700
d1e69c80 c989e380 f7d4cc00 c2934754 074aa1aa
Oct 3 15:34:07 localhost kernel: 00000000 f6555624 074aa1aa
f7d4cc00 c017d6bd f6555620 00000000 00000000
Oct 3 15:34:07 localhost kernel: Call Trace:
Oct 3 15:34:07 localhost kernel: [__crc_pm_idle+3123398/5290900]
xfs_iget_core+0x565/0x6b0 [xfs]
Oct 3 15:34:07 localhost kernel: [iget_locked+189/256] iget_locked
+0xbd/0x100
Oct 3 15:34:07 localhost kernel: [__crc_pm_idle+3124083/5290900]
xfs_iget+0x162/0x1a0 [xfs]
Oct 3 15:34:07 localhost kernel: [__crc_pm_idle+3252484/5290900]
xfs_vget+0x63/0x100 [xfs]
Oct 3 15:34:07 localhost kernel: [__crc_pm_idle+3331204/5290900]
vfs_vget+0x43/0x50 [xfs]
Oct 3 15:34:07 localhost kernel: [__crc_pm_idle+3329570/5290900]
linvfs_get_dentry+0x51/0x90 [xfs]
Oct 3 15:34:07 localhost kernel: [__crc_pm_idle+1536451/5290900]
find_exported_dentry+0x42/0x830 [exportfs]
Oct 3 15:34:07 localhost kernel: [__crc_pm_idle+3234969/5290900]
xfs_trans_tail_ail+0x38/0x80 [xfs]
Oct 3 15:34:07 localhost kernel: [__crc_pm_idle+3174595/5290900]
xlog_write+0x102/0x580 [xfs]
Oct 3 15:34:07 localhost kernel: [__crc_pm_idle+3234969/5290900]
xfs_trans_tail_ail+0x38/0x80 [xfs]
Oct 3 15:34:07 localhost kernel: [__crc_pm_idle+3170617/5290900]
xlog_assign_tail_lsn+0x18/0x90 [xfs]
Oct 3 15:34:07 localhost kernel: [__crc_pm_idle+3234969/5290900]
xfs_trans_tail_ail+0x38/0x80 [xfs]
Oct 3 15:34:07 localhost kernel: [__crc_pm_idle+3174595/5290900]
xlog_write+0x102/0x580 [xfs]
Oct 3 15:34:07 localhost kernel: [alloc_skb+71/240] alloc_skb
+0x47/0xf0
Oct 3 15:34:07 localhost kernel: [sock_alloc_send_pskb+197/464]
sock_alloc_send_pskb+0xc5/0x1d0
Oct 3 15:34:07 localhost kernel: [sock_alloc_send_skb+45/64]
sock_alloc_send_skb+0x2d/0x40
Oct 3 15:34:07 localhost kernel: [ip_append_data+1810/2016]
ip_append_data+0x712/0x7e0
Oct 3 15:34:07 localhost kernel: [recalc_task_prio+168/416]
recalc_task_prio+0xa8/0x1a0
Oct 3 15:34:07 localhost kernel: [__ip_route_output_key+47/288]
__ip_route_output_key+0x2f/0x120
Oct 3 15:34:07 localhost kernel: [udp_sendmsg+831/1888] udp_sendmsg
+0x33f/0x760
Oct 3 15:34:07 localhost kernel: [ip_generic_getfrag+0/192]
ip_generic_getfrag+0x0/0xc0
Oct 3 15:34:07 localhost kernel: [qdisc_restart+23/560] qdisc_restart
+0x17/0x230
Oct 3 15:34:07 localhost kernel: [__crc_pm_idle+1539451/5290900]
export_decode_fh+0x5a/0x7a [exportfs]
Oct 3 15:34:07 localhost kernel: [__crc_pm_idle+4695505/5290900]
nfsd_acceptable+0x0/0x140 [nfsd]
Oct 3 15:34:07 localhost kernel: [__crc_pm_idle+4696349/5290900]
fh_verify+0x20c/0x5a0 [nfsd]
Oct 3 15:34:07 localhost kernel: [__crc_pm_idle+4695505/5290900]
nfsd_acceptable+0x0/0x140 [nfsd]
Oct 3 15:34:07 localhost kernel: [__crc_pm_idle+4702954/5290900]
nfsd_open+0x39/0x1a0 [nfsd]
Oct 3 15:34:07 localhost kernel: [__crc_pm_idle+4704974/5290900]
nfsd_write+0x5d/0x360 [nfsd]
Oct 3 15:34:07 localhost kernel: [skb_copy_and_csum_bits+102/784]
skb_copy_and_csum_bits+0x66/0x310
Oct 3 15:34:07 localhost kernel: [resched_task+83/144] resched_task
+0x53/0x90
Oct 3 15:34:07 localhost kernel: [skb_copy_and_csum_bits+556/784]
skb_copy_and_csum_bits+0x22c/0x310
Oct 3 15:34:07 localhost kernel: [__crc_pm_idle+2136279/5290900]
skb_read_and_csum_bits+0x46/0x90 [sunrpc]
Oct 3 15:34:07 localhost kernel: [kfree_skbmem+36/48] kfree_skbmem
+0x24/0x30
Oct 3 15:34:07 localhost kernel: [__kfree_skb+173/336] __kfree_skb
+0xad/0x150
Oct 3 15:34:07 localhost kernel: [__crc_pm_idle+2184090/5290900]
xdr_partial_copy_from_skb+0x169/0x180 [sunrpc]
Oct 3 15:34:07 localhost kernel: [__crc_pm_idle+2180355/5290900]
svcauth_unix_accept+0x272/0x2c0 [sunrpc]
Oct 3 15:34:07 localhost kernel: [__crc_pm_idle+4735417/5290900]
nfsd3_proc_write+0xb8/0x120 [nfsd]
Oct 3 15:34:07 localhost kernel: [__crc_pm_idle+4688328/5290900]
nfsd_dispatch+0xd7/0x1e0 [nfsd]
Oct 3 15:34:07 localhost kernel: [__crc_pm_idle+4688113/5290900]
nfsd_dispatch+0x0/0x1e0 [nfsd]
Oct 3 15:34:07 localhost kernel: [__crc_pm_idle+2162754/5290900]
svc_process+0x4b1/0x619 [sunrpc]
Oct 3 15:34:07 localhost kernel: [__crc_pm_idle+4687545/5290900] nfsd
+0x248/0x480 [nfsd]
Oct 3 15:34:07 localhost kernel: [__crc_pm_idle+4686961/5290900] nfsd
+0x0/0x480 [nfsd]
Oct 3 15:34:07 localhost kernel: [kernel_thread_helper+5/16]
kernel_thread_helper+0x5/0x10
Oct 3 15:34:07 localhost kernel: Code: 0f 0b 6a 00 0f 0e c4 f8 83 c4 10
5b 5e 5f 5d c3 e8 c6 03 66
Oct 3 15:34:07 localhost kernel: <6>note: nfsd[2740] exited with
preempt_count 1
Oct 3 15:51:23 localhost kernel: klogd 1.4.1#17, log source
= /proc/kmsg started.
Oct 3 15:51:23 localhost kernel:
Inspecting /boot/System.map-2.6.8-2-686-smp
Oct 3 15:51:24 localhost kernel: Loaded 27755 symbols
from /boot/System.map-2.6.8-2-686-smp.
Oct 3 15:51:24 localhost kernel: Symbols match kernel version 2.6.8.
Oct 3 15:51:24 localhost kernel: No module symbols loaded - kernel
modules not enabled.
Oct 3 15:51:24 localhost kernel: fef0000 (usable)
Oct 3 15:51:24 localhost kernel: BIOS-e820: 00000000bfef0000 -
00000000bfefc000 (ACPI data)
Oct 3 15:51:24 localhost kernel: BIOS-e820: 00000000bfefc000 -
00000000bff00000 (ACPI NVS)
Oct 3 15:51:24 localhost kernel: BIOS-e820: 00000000bff00000 -
00000000bff80000 (usable)
Oct 3 15:51:24 localhost kernel: BIOS-e820: 00000000bff80000 -
00000000c0000000 (reserved)
Oct 3 15:51:24 localhost kernel: BIOS-e820: 00000000fec00000 -
00000000fec10000 (reserved)
Oct 3 15:51:24 localhost kernel: BIOS-e820: 00000000fee00000 -
00000000fee01000 (reserved)
Oct 3 15:51:24 localhost kernel: BIOS-e820: 00000000ff800000 -
00000000ffc00000 (reserved)
Oct 3 15:51:24 localhost kernel: BIOS-e820: 00000000fff00000 -
0000000100000000 (reserved)
Oct 3 15:51:24 localhost kernel: 2175MB HIGHMEM available.
Oct 3 15:51:24 localhost kernel: 896MB LOWMEM available.
Oct 3 15:51:24 localhost kernel: found SMP MP-table at 000f6810
Oct 3 15:51:24 localhost kernel: On node 0 totalpages: 786304
Oct 3 15:51:24 localhost kernel: DMA zone: 4096 pages, LIFO batch:1
Oct 3 15:51:24 localhost kernel: Normal zone: 225280 pages, LIFO
batch:16
Oct 3 15:51:24 localhost kernel: HighMem zone: 556928 pages, LIFO
batch:16
Oct 3 15:51:24 localhost kernel: DMI present.
Thanks,
Alberto