From: Alberto Alonso <alberto@ggsys.net>
To: David Chinner <dgc@sgi.com>
Cc: Pallai Roland <dap@mail.index.hu>,
	Linux-Raid <linux-raid@vger.kernel.org>,
	xfs@oss.sgi.com
Subject: Re: raid5: I lost a XFS file system due to a minor IDE cable problem
Date: Mon, 28 May 2007 17:45:27 -0500
Message-ID: <1180392327.21028.140.camel@w100>
In-Reply-To: <20070525083650.GO85884050@sgi.com>

On Fri, 2007-05-25 at 18:36 +1000, David Chinner wrote:
> On Fri, May 25, 2007 at 12:43:51AM -0500, Alberto Alonso wrote:
> > I think his point was that going into a read only mode causes a
> > less catastrophic situation (ie. a web server can still serve
> > pages).
> 
> Sure - but once you've detected one corruption or had metadata
> I/O errors, can you trust the rest of the filesystem?
> 
> > I think that is a valid point, rather than shutting down
> > the file system completely, an automatic switch to where the least
> > disruption of service can occur is always desired.
> 
> I consider the possibility of serving out bad data (i.e after
> a remount to readonly) to be the worst possible disruption of
> service that can happen ;)

I guess it does depend on the nature of the failure. A write failure
on block 2000 does not imply corruption of the other 2TB of data.

I wish I knew more about file system internals. Since I don't, I was
just commenting on a feature that would be nice to have, though maybe
there is no way to implement it. I figured that a dynamic table of
bad blocks could be kept: if an attempt is made to access one of those
blocks (read or write), an I/O error is returned; if the block is
not on the list, the access is processed. This would help a server
with large file systems continue operating for most users.
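
To make the idea concrete, here is a rough user-space sketch of such a
table. It is only an illustration, not how XFS or md actually work: the
names BadBlockTable and GuardedDevice, the 512-byte block size, and the
in-memory storage are all assumptions made up for this example.

```python
import errno


class BadBlockTable:
    """Hypothetical dynamic table of block numbers known to be bad."""

    def __init__(self):
        self._bad = set()

    def mark_bad(self, block):
        self._bad.add(block)

    def is_bad(self, block):
        return block in self._bad


class GuardedDevice:
    """Toy block device: listed blocks fail with EIO, others are served."""

    def __init__(self, nblocks, block_size=512):
        self.nblocks = nblocks
        self.block_size = block_size
        self.table = BadBlockTable()
        self._data = {}  # sparse store: block number -> bytes

    def _check(self, block):
        if not 0 <= block < self.nblocks:
            raise ValueError("block %d out of range" % block)
        if self.table.is_bad(block):
            # Reads and writes to a listed block both get an I/O error.
            raise OSError(errno.EIO, "block %d is marked bad" % block)

    def read(self, block):
        self._check(block)
        # Blocks never written read back as zeros.
        return self._data.get(block, bytes(self.block_size))

    def write(self, block, payload):
        self._check(block)
        if len(payload) != self.block_size:
            raise ValueError("payload must be exactly one block")
        self._data[block] = bytes(payload)


if __name__ == "__main__":
    dev = GuardedDevice(nblocks=4096)
    dev.table.mark_bad(2000)           # e.g. after a failed write there
    dev.write(1999, b"\x01" * 512)     # not on the list: processed
    try:
        dev.read(2000)                 # on the list: I/O error
    except OSError as exc:
        print("read failed with errno", exc.errno)
```

This only covers the lookup itself. It does not answer the question of
whether the remaining metadata can still be trusted after an error.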

> > I personally have found the XFS file system to be great for
> > my needs (except issues with NFS interaction, where the bug report
> > never got answered), but that doesn't mean it can not be improved.
> 
> Got a pointer?

I can't seem to find it. I'm pretty sure I used bugzilla to report
it. I did find the kernel dump file though, so here it is:

Oct  3 15:34:07 localhost kernel: xfs_iget_core: ambiguous vns:
vp/0xd1e69c80, invp/0xc989e380
Oct  3 15:34:07 localhost kernel: ------------[ cut here ]------------
Oct  3 15:34:07 localhost kernel: kernel BUG at
fs/xfs/support/debug.c:106!
Oct  3 15:34:07 localhost kernel: invalid operand: 0000 [#1]
Oct  3 15:34:07 localhost kernel: PREEMPT SMP
Oct  3 15:34:07 localhost kernel: Modules linked in: af_packet
iptable_filter ip_tables nfsd exportfs lockd sunrpc ipv6 xfs capability
commoncap ext3 jbd mbcache aic7xxx i2c_dev tsdev floppy mousedev
parport_pc parport psmouse evdev pcspkr hw_random shpchp pciehp
pci_hotplug intel_agp intel_mch_agp agpgart uhci_hcd usbcore piix
ide_core e1000 cfi_cmdset_0001 cfi_util mtdpart mtdcore jedec_probe
gen_probe chipreg dm_mod w83781d i2c_sensor i2c_i801 i2c_core raid5 xor
genrtc sd_mod aic79xx scsi_mod raid1 md unix font vesafb cfbcopyarea
cfbimgblt cfbfillrect
Oct  3 15:34:07 localhost kernel: CPU:    0
Oct  3 15:34:07 localhost kernel: EIP:    0060:[__crc_pm_idle
+3334982/5290900]    Not tainted
Oct  3 15:34:07 localhost kernel: EFLAGS: 00010246   (2.6.8-2-686-smp)
Oct  3 15:34:07 localhost kernel: EIP is at cmn_err+0xc5/0xe0 [xfs]
Oct  3 15:34:07 localhost kernel: eax: 00000000   ebx: f602c000   ecx:
c02dcfbc   edx: c02dcfbc
Oct  3 15:34:07 localhost kernel: esi: f8c40e28   edi: f8c56a3e   ebp:
00000293   esp: f602da08
Oct  3 15:34:07 localhost kernel: ds: 007b   es: 007b   ss: 0068
Oct  3 15:34:07 localhost kernel: Process nfsd (pid: 2740,
threadinfo=f602c000 task=f71a7210)
Oct  3 15:34:07 localhost kernel: Stack: f8c40e28 f8c40def f8c56a00
00000000 f602c000 074aa1aa f8c41700 ea2f0a40
Oct  3 15:34:07 localhost kernel:        f8c0a745 00000000 f8c41700
d1e69c80 c989e380 f7d4cc00 c2934754 074aa1aa
Oct  3 15:34:07 localhost kernel:        00000000 f6555624 074aa1aa
f7d4cc00 c017d6bd f6555620 00000000 00000000
Oct  3 15:34:07 localhost kernel: Call Trace:
Oct  3 15:34:07 localhost kernel:  [__crc_pm_idle+3123398/5290900]
xfs_iget_core+0x565/0x6b0 [xfs]
Oct  3 15:34:07 localhost kernel:  [iget_locked+189/256] iget_locked
+0xbd/0x100
Oct  3 15:34:07 localhost kernel:  [__crc_pm_idle+3124083/5290900]
xfs_iget+0x162/0x1a0 [xfs]
Oct  3 15:34:07 localhost kernel:  [__crc_pm_idle+3252484/5290900]
xfs_vget+0x63/0x100 [xfs]
Oct  3 15:34:07 localhost kernel:  [__crc_pm_idle+3331204/5290900]
vfs_vget+0x43/0x50 [xfs]
Oct  3 15:34:07 localhost kernel:  [__crc_pm_idle+3329570/5290900]
linvfs_get_dentry+0x51/0x90 [xfs]
Oct  3 15:34:07 localhost kernel:  [__crc_pm_idle+1536451/5290900]
find_exported_dentry+0x42/0x830 [exportfs]
Oct  3 15:34:07 localhost kernel:  [__crc_pm_idle+3234969/5290900]
xfs_trans_tail_ail+0x38/0x80 [xfs]
Oct  3 15:34:07 localhost kernel:  [__crc_pm_idle+3174595/5290900]
xlog_write+0x102/0x580 [xfs]
Oct  3 15:34:07 localhost kernel:  [__crc_pm_idle+3234969/5290900]
xfs_trans_tail_ail+0x38/0x80 [xfs]
Oct  3 15:34:07 localhost kernel:  [__crc_pm_idle+3170617/5290900]
xlog_assign_tail_lsn+0x18/0x90 [xfs]
Oct  3 15:34:07 localhost kernel:  [__crc_pm_idle+3234969/5290900]
xfs_trans_tail_ail+0x38/0x80 [xfs]
Oct  3 15:34:07 localhost kernel:  [__crc_pm_idle+3174595/5290900]
xlog_write+0x102/0x580 [xfs]
Oct  3 15:34:07 localhost kernel:  [alloc_skb+71/240] alloc_skb
+0x47/0xf0
Oct  3 15:34:07 localhost kernel:  [sock_alloc_send_pskb+197/464]
sock_alloc_send_pskb+0xc5/0x1d0
Oct  3 15:34:07 localhost kernel:  [sock_alloc_send_skb+45/64]
sock_alloc_send_skb+0x2d/0x40
Oct  3 15:34:07 localhost kernel:  [ip_append_data+1810/2016]
ip_append_data+0x712/0x7e0
Oct  3 15:34:07 localhost kernel:  [recalc_task_prio+168/416]
recalc_task_prio+0xa8/0x1a0
Oct  3 15:34:07 localhost kernel:  [__ip_route_output_key+47/288]
__ip_route_output_key+0x2f/0x120
Oct  3 15:34:07 localhost kernel:  [udp_sendmsg+831/1888] udp_sendmsg
+0x33f/0x760
Oct  3 15:34:07 localhost kernel:  [ip_generic_getfrag+0/192]
ip_generic_getfrag+0x0/0xc0
Oct  3 15:34:07 localhost kernel:  [qdisc_restart+23/560] qdisc_restart
+0x17/0x230
Oct  3 15:34:07 localhost kernel:  [__crc_pm_idle+1539451/5290900]
export_decode_fh+0x5a/0x7a [exportfs]
Oct  3 15:34:07 localhost kernel:  [__crc_pm_idle+4695505/5290900]
nfsd_acceptable+0x0/0x140 [nfsd]
Oct  3 15:34:07 localhost kernel:  [__crc_pm_idle+4696349/5290900]
fh_verify+0x20c/0x5a0 [nfsd]
Oct  3 15:34:07 localhost kernel:  [__crc_pm_idle+4695505/5290900]
nfsd_acceptable+0x0/0x140 [nfsd]
Oct  3 15:34:07 localhost kernel:  [__crc_pm_idle+4702954/5290900]
nfsd_open+0x39/0x1a0 [nfsd]
Oct  3 15:34:07 localhost kernel:  [__crc_pm_idle+4704974/5290900]
nfsd_write+0x5d/0x360 [nfsd]
Oct  3 15:34:07 localhost kernel:  [skb_copy_and_csum_bits+102/784]
skb_copy_and_csum_bits+0x66/0x310
Oct  3 15:34:07 localhost kernel:  [resched_task+83/144] resched_task
+0x53/0x90
Oct  3 15:34:07 localhost kernel:  [skb_copy_and_csum_bits+556/784]
skb_copy_and_csum_bits+0x22c/0x310
Oct  3 15:34:07 localhost kernel:  [__crc_pm_idle+2136279/5290900]
skb_read_and_csum_bits+0x46/0x90 [sunrpc]
Oct  3 15:34:07 localhost kernel:  [kfree_skbmem+36/48] kfree_skbmem
+0x24/0x30
Oct  3 15:34:07 localhost kernel:  [__kfree_skb+173/336] __kfree_skb
+0xad/0x150
Oct  3 15:34:07 localhost kernel:  [__crc_pm_idle+2184090/5290900]
xdr_partial_copy_from_skb+0x169/0x180 [sunrpc]
Oct  3 15:34:07 localhost kernel:  [__crc_pm_idle+2180355/5290900]
svcauth_unix_accept+0x272/0x2c0 [sunrpc]
Oct  3 15:34:07 localhost kernel:  [__crc_pm_idle+4735417/5290900]
nfsd3_proc_write+0xb8/0x120 [nfsd]
Oct  3 15:34:07 localhost kernel:  [__crc_pm_idle+4688328/5290900]
nfsd_dispatch+0xd7/0x1e0 [nfsd]
Oct  3 15:34:07 localhost kernel:  [__crc_pm_idle+4688113/5290900]
nfsd_dispatch+0x0/0x1e0 [nfsd]
Oct  3 15:34:07 localhost kernel:  [__crc_pm_idle+2162754/5290900]
svc_process+0x4b1/0x619 [sunrpc]
Oct  3 15:34:07 localhost kernel:  [__crc_pm_idle+4687545/5290900] nfsd
+0x248/0x480 [nfsd]
Oct  3 15:34:07 localhost kernel:  [__crc_pm_idle+4686961/5290900] nfsd
+0x0/0x480 [nfsd]
Oct  3 15:34:07 localhost kernel:  [kernel_thread_helper+5/16]
kernel_thread_helper+0x5/0x10
Oct  3 15:34:07 localhost kernel: Code: 0f 0b 6a 00 0f 0e c4 f8 83 c4 10
5b 5e 5f 5d c3 e8 c6 03 66
Oct  3 15:34:07 localhost kernel:  <6>note: nfsd[2740] exited with
preempt_count 1
Oct  3 15:51:23 localhost kernel: klogd 1.4.1#17, log source
= /proc/kmsg started.
Oct  3 15:51:23 localhost kernel:
Inspecting /boot/System.map-2.6.8-2-686-smp
Oct  3 15:51:24 localhost kernel: Loaded 27755 symbols
from /boot/System.map-2.6.8-2-686-smp.
Oct  3 15:51:24 localhost kernel: Symbols match kernel version 2.6.8.
Oct  3 15:51:24 localhost kernel: No module symbols loaded - kernel
modules not enabled.
Oct  3 15:51:24 localhost kernel: fef0000 (usable)
Oct  3 15:51:24 localhost kernel:  BIOS-e820: 00000000bfef0000 -
00000000bfefc000 (ACPI data)
Oct  3 15:51:24 localhost kernel:  BIOS-e820: 00000000bfefc000 -
00000000bff00000 (ACPI NVS)
Oct  3 15:51:24 localhost kernel:  BIOS-e820: 00000000bff00000 -
00000000bff80000 (usable)
Oct  3 15:51:24 localhost kernel:  BIOS-e820: 00000000bff80000 -
00000000c0000000 (reserved)
Oct  3 15:51:24 localhost kernel:  BIOS-e820: 00000000fec00000 -
00000000fec10000 (reserved)
Oct  3 15:51:24 localhost kernel:  BIOS-e820: 00000000fee00000 -
00000000fee01000 (reserved)
Oct  3 15:51:24 localhost kernel:  BIOS-e820: 00000000ff800000 -
00000000ffc00000 (reserved)
Oct  3 15:51:24 localhost kernel:  BIOS-e820: 00000000fff00000 -
0000000100000000 (reserved)
Oct  3 15:51:24 localhost kernel: 2175MB HIGHMEM available.
Oct  3 15:51:24 localhost kernel: 896MB LOWMEM available.
Oct  3 15:51:24 localhost kernel: found SMP MP-table at 000f6810
Oct  3 15:51:24 localhost kernel: On node 0 totalpages: 786304
Oct  3 15:51:24 localhost kernel:   DMA zone: 4096 pages, LIFO batch:1
Oct  3 15:51:24 localhost kernel:   Normal zone: 225280 pages, LIFO
batch:16
Oct  3 15:51:24 localhost kernel:   HighMem zone: 556928 pages, LIFO
batch:16
Oct  3 15:51:24 localhost kernel: DMI present.


Thanks,

Alberto




