another hard disk broken or xfs problems?

public inbox for linux-kernel@vger.kernel.org

* another hard disk broken or xfs problems?
@ 2004-02-25 22:00 Nico Schottelius
  2004-02-25 22:34 ` Nathan Scott
  0 siblings, 1 reply; 12+ messages in thread
From: Nico Schottelius @ 2004-02-25 22:00 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: nico-kernel


Hello!

I am now using a brand new 40GB Hitachi hard disk for my notebook and
today I got the first problems:


Starting XFS recovery on filesystem: hda3 (dev: hda3)
Ending XFS recovery on filesystem: hda3 (dev: hda3)
VFS: Mounted root (xfs filesystem) readonly.
Mounted devfs on /dev
Freeing unused kernel memory: 156k freed
XFS mounting filesystem hda1
Starting XFS recovery on filesystem: hda1 (dev: hda1)
Ending XFS recovery on filesystem: hda1 (dev: hda1)
XFS mounting filesystem loop0
Starting XFS recovery on filesystem: loop0 (dev: loop0)
XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1589 of file fs/xfs/xfs_alloc.c.  Caller 0xc0198272
Call Trace: [<c0197151>]  [<c0198272>]  [<c0198272>]  [<c01c7fff>]  [<c01ee628>]  [<c01e5286>]  [<c01e5355>]  [<c01e6b04>]  [<c01d24fa>]  [<c01dd2cc>]  [<c01e83bd>]  [<c0203c60>]  [<c01d908f>]  [<c01f02c6>]  [<c02048e3>]  [<c02046c8>]  [<c0215517>]  [<c0185764>]  [<c015c835>]  [<c015c268>]  [<c0204890>]  [<c0204630>]  [<c015c4af>]  [<c0171ee8>]  [<c01721f4>]  [<c0172044>]  [<c01725ef>]  [<c010b34b>] 
Ending XFS recovery on filesystem: loop0 (dev: loop0)
Adding 192772k swap on /dev/discs/disc0/part2.  Priority:-1 extents:1


I got this after a clean shutdown. Just tell me it's an xfs error and my
harddisk is fine...

Nico, who does not want to believe the next hard disk is dying.

ps: attached full dmesg

-- 
Keep it simple & stupid, use what's available.
pgp: 8D0E E27A          | Nico Schottelius
http://nerd-hosting.net | http://linux.schottelius.org

[-- Attachment #1.2: dmesg.first.new.error --]
[-- Type: text/plain, Size: 9018 bytes --]

Linux version 2.6.3 (root@scice) (gcc version 3.3.3 (Debian)) #3 Fri Feb 20 20:05:17 CET 2004
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 0000000000095c00 (usable)
 BIOS-e820: 0000000000095c00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 0000000016ff0000 (usable)
 BIOS-e820: 0000000016ff0000 - 0000000016ff8000 (ACPI data)
 BIOS-e820: 0000000016ff8000 - 0000000017000000 (ACPI NVS)
 BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
367MB LOWMEM available.
On node 0 totalpages: 94192
  DMA zone: 4096 pages, LIFO batch:1
  Normal zone: 90096 pages, LIFO batch:16
  HighMem zone: 0 pages, LIFO batch:1
DMI 2.3 present.
ACPI: RSDP (v000 AMI                                       ) @ 0x000fad50
ACPI: RSDT (v001 AMIINT AMIINT09 0x00000011 MSFT 0x00000097) @ 0x16ff0000
ACPI: FADT (v001 AMIINT AMIINT09 0x00000011 MSFT 0x00000097) @ 0x16ff0030
ACPI: DSDT (v001 TM5600    TMx86 0x00001000 MSFT 0x0100000d) @ 0x00000000
Built 1 zonelists
Kernel command line: root=/dev/hda3 init=/sbin/minit video=sisfb:mode:1024x768x16,rate=60
sisfb: Options mode:1024x768x16,rate=60
sisfb: Invalid option rate=60
No local APIC present or hardware disabled
Initializing CPU#0
PID hash table entries: 2048 (order 11: 16384 bytes)
Detected 599.310 MHz processor.
Using tsc for high-res timesource
Console: colour VGA+ 80x25
Memory: 369300k/376768k available (2165k kernel code, 6692k reserved, 416k data, 156k init, 0k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay loop... 1101.82 BogoMIPS
Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
CPU:     After generic identify, caps: 0084893f 0081813f 00000000 00000000
CPU:     After vendor identify, caps: 0084893f 0081813f 0000000e 00000000
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (32 bytes/line)
CPU: L2 Cache: 512K (128 bytes/line)
CPU: Processor revision 1.3.2.1, 600 MHz
CPU: Code Morphing Software revision 4.2.7-8-278
CPU: 20011004 02:04 official release 4.2.7#7
CPU serial number disabled.
CPU:     After all inits, caps: 0080893f 0081813f 0000000e 00000000
CPU: Transmeta(tm) Crusoe(tm) Processor TM5600 stepping 03
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
NET: Registered protocol family 16
PCI: PCI BIOS revision 2.10 entry at 0xfda61, last bus=0
PCI: Using configuration type 1
mtrr: v2.0 (20020519)
ACPI: Subsystem revision 20040116
ACPI: IRQ4 SCI: Edge set to Level Trigger.
ACPI: Interpreter enabled
ACPI: Using PIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (00:00)
PCI: Probing PCI hardware (bus 00)
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: Power Resource [URP1] (off)
ACPI: Power Resource [URP2] (off)
ACPI: Power Resource [FDDP] (off)
ACPI: Power Resource [LPTP] (off)
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 9 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 *5 6 7 10 12 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 6 7 *9 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 10 11 14 15)
ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 9 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 7 *9 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 *5 6 7 14 15)
ACPI: PCI Interrupt Link [LNKP] (IRQs 3 4 5 6 7 *10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 5
ACPI: PCI Interrupt Link [LNKH] enabled at IRQ 5
ACPI: PCI Interrupt Link [LNKG] enabled at IRQ 9
ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 11
ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 9
ACPI: No IRQ known for interrupt pin A of device 0000:00:10.0
ACPI: PCI Interrupt Link [LNKP] enabled at IRQ 10
PCI: Using ACPI for IRQ routing
PCI: if you experience problems, try using option 'pci=noacpi' or even 'acpi=off'
sisfb: Video ROM found and mapped to c00c0000
sisfb: Framebuffer at 0xd0000000, mapped to 0xd780a000, size 16384k
sisfb: MMIO at 0xeffc0000, mapped to 0xd880b000, size 256k
sisfb: Memory heap starting at 15360K
sisfb: Using MMIO queue mode
sisfb: Detected SiS301LV video bridge
sisfb: Detected LCD PanelDelayCompensation 4
sisfb: Default mode is 1024x768x16 (60Hz)
sisfb: Initial vbflags 0x4000012
sisfb: Added MTRRs
sisfb: Installed SISFB_GET_INFO ioctl (80046ef8)
sisfb: Installed SISFB_GET_VBRSTATUS ioctl (80046ef9)
sisfb: 2D acceleration is enabled, scrolling mode ypan (auto-max)
fb0: SIS 315PRO frame buffer device, Version 1.6.25
sisfb: (C) 2001-2004 Thomas Winischhofer.
ikconfig 0.7 with /proc/config*
devfs: v1.22 (20021013) Richard Gooch (rgooch@atnf.csiro.au)
devfs: boot_options: 0x1
SGI XFS with no debug enabled
SGI XFS Quota Management subsystem
Initializing Cryptographic API
Activating ISA DMA hang workarounds.
ACPI: Power Button (FF) [PWRF]
ACPI: Lid Switch [LID0]
ACPI: Processor [CPU1] (supports C1, 8 throttling states)
Console: switching to colour frame buffer device 128x48
pty: 256 Unix98 ptys configured
Linux agpgart interface v0.100 (c) Dave Jones
[drm] Initialized sis 1.1.0 20030826 on minor 0
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
loop: loaded (max 8 devices)
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ALI15X3: IDE controller at PCI slot 0000:00:10.0
ACPI: No IRQ known for interrupt pin A of device 0000:00:10.0
ALI15X3: chipset revision 196
ALI15X3: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0xff00-0xff07, BIOS settings: hda:DMA, hdb:pio
    ide1: BM-DMA at 0xff08-0xff0f, BIOS settings: hdc:DMA, hdd:pio
hda: TOSHIBA MK4025GAS, ATA DISK drive
Using anticipatory io scheduler
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hdc: QSI CD-RW/DVD-ROM SBW-242, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
hda: max request size: 128KiB
hda: 78140160 sectors (40007 MB), CHS=65535/16/63, UDMA(100)
 /dev/ide/host0/bus0/target0/lun0: p1 p2 p3 p4
Console: switching to colour frame buffer device 128x48
mice: PS/2 mouse device common for all mice
i8042.c: Detected active multiplexing controller, rev 1.1.
serio: i8042 AUX0 port at 0x60,0x64 irq 12
serio: i8042 AUX1 port at 0x60,0x64 irq 12
serio: i8042 AUX2 port at 0x60,0x64 irq 12
serio: i8042 AUX3 port at 0x60,0x64 irq 12
Synaptics Touchpad, model: 1
 Firmware: 5.8
 Sensor: 29
 new absolute packet format
 Touchpad has extended capability bits
 -> 4 multi-buttons, i.e. besides standard buttons
 -> multifinger detection
 -> palm detection
input: SynPS/2 Synaptics TouchPad on isa0060/serio4
serio: i8042 KBD port at 0x60,0x64 irq 1
input: AT Translated Set 2 keyboard on isa0060/serio0
NET: Registered protocol family 2
IP: routing cache hash table of 4096 buckets, 32Kbytes
TCP: Hash tables configured (established 32768 bind 65536)
NET: Registered protocol family 1
NET: Registered protocol family 17
BIOS EDD facility v0.12 2004-Jan-26, 1 devices found
Please report your BIOS at http://linux.dell.com/edd/results.html
ACPI: (supports S0 S1 S3 S4 S5)
XFS mounting filesystem hda3
Starting XFS recovery on filesystem: hda3 (dev: hda3)
Ending XFS recovery on filesystem: hda3 (dev: hda3)
VFS: Mounted root (xfs filesystem) readonly.
Mounted devfs on /dev
Freeing unused kernel memory: 156k freed
XFS mounting filesystem hda1
Starting XFS recovery on filesystem: hda1 (dev: hda1)
Ending XFS recovery on filesystem: hda1 (dev: hda1)
XFS mounting filesystem loop0
Starting XFS recovery on filesystem: loop0 (dev: loop0)
XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1589 of file fs/xfs/xfs_alloc.c.  Caller 0xc0198272
Call Trace: [<c0197151>]  [<c0198272>]  [<c0198272>]  [<c01c7fff>]  [<c01ee628>]  [<c01e5286>]  [<c01e5355>]  [<c01e6b04>]  [<c01d24fa>]  [<c01dd2cc>]  [<c01e83bd>]  [<c0203c60>]  [<c01d908f>]  [<c01f02c6>]  [<c02048e3>]  [<c02046c8>]  [<c0215517>]  [<c0185764>]  [<c015c835>]  [<c015c268>]  [<c0204890>]  [<c0204630>]  [<c015c4af>]  [<c0171ee8>]  [<c01721f4>]  [<c0172044>]  [<c01725ef>]  [<c010b34b>] 
Ending XFS recovery on filesystem: loop0 (dev: loop0)
Adding 192772k swap on /dev/discs/disc0/part2.  Priority:-1 extents:1
8139too Fast Ethernet driver 0.9.27
eth0: RealTek RTL8139 at 0xd886ee00, 00:0a:e6:ba:f6:c2, IRQ 5
eth0:  Identified 8139 chip type 'RTL-8100B/8139D'
eth0: link down
hdc: ATAPI 24X DVD-ROM CD-R/RW drive, 2048kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
NET: Registered protocol family 10
Disabled Privacy Extensions on device c0379780(lo)
IPv6 over IPv4 tunneling driver
eth0: link up, 100Mbps, full-duplex, lpa 0x41E1
eth0: no IPv6 routers present
eth0: link up, 100Mbps, full-duplex, lpa 0x41E1
eth0: no IPv6 routers present

* Re: another hard disk broken or xfs problems?
  2004-02-25 22:00 another hard disk broken or xfs problems? Nico Schottelius
@ 2004-02-25 22:34 ` Nathan Scott
  2004-02-25 23:10   ` Michael Joy
  2004-02-25 23:49   ` Nico Schottelius
  0 siblings, 2 replies; 12+ messages in thread
From: Nathan Scott @ 2004-02-25 22:34 UTC (permalink / raw)
  To: Nico Schottelius, Linux Kernel Mailing List

On Wed, Feb 25, 2004 at 11:00:51PM +0100, Nico Schottelius wrote:
> Hello!
> 
> I am now using a brand new 40GB Hitachi hard disk for my notebook and
> today I got the first problems:
> 
> 
> Starting XFS recovery on filesystem: hda3 (dev: hda3)
> Ending XFS recovery on filesystem: hda3 (dev: hda3)
> VFS: Mounted root (xfs filesystem) readonly.
> Mounted devfs on /dev
> Freeing unused kernel memory: 156k freed
> XFS mounting filesystem hda1
> Starting XFS recovery on filesystem: hda1 (dev: hda1)
> Ending XFS recovery on filesystem: hda1 (dev: hda1)
> XFS mounting filesystem loop0
> Starting XFS recovery on filesystem: loop0 (dev: loop0)
> XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1589 of file fs/xfs/xfs_alloc.c.  Caller 0xc0198272
> Call Trace: [<c0197151>]  [<c0198272>]  [<c0198272>]  [<c01c7fff>]  [<c01ee628>]  [<c01e5286>]  [<c01e5355>]  [<c01e6b04>]  [<c01d24fa>]  [<c01dd2cc>]  [<c01e83bd>]  [<c0203c60>]  [<c01d908f>]  [<c01f02c6>]  [<c02048e3>]  [<c02046c8>]  [<c0215517>]  [<c0185764>]  [<c015c835>]  [<c015c268>]  [<c0204890>]  [<c0204630>]  [<c015c4af>]  [<c0171ee8>]  [<c01721f4>]  [<c0172044>]  [<c01725ef>]  [<c010b34b>] 
> Ending XFS recovery on filesystem: loop0 (dev: loop0)
> Adding 192772k swap on /dev/discs/disc0/part2.  Priority:-1 extents:1
> 
> 
> I got this after a clean shutdown. Just tell me it's an xfs error and my
> harddisk is fine...

Filesystem recovery doesn't run after a clean shutdown...
the "Starting/Ending XFS recovery" messages indicate that
all of your filesystems were not unmounted by the look of
it.
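
As a rough way to see which filesystems went through log recovery, the
kernel messages can be filtered. This is only a sketch: the function
name xfs_recovered is made up, and the pattern assumes the message
wording shown in the 2.6.3 log above (other kernel versions may word it
differently).

```shell
# Print the name of every filesystem for which XFS log recovery was
# started, i.e. every filesystem that was not cleanly unmounted.
# Reads kernel log text on stdin. The function name is hypothetical.
xfs_recovered() {
    sed -n 's/.*Starting XFS recovery on filesystem: \([^ ]*\) .*/\1/p'
}
```

Used as "dmesg | xfs_recovered", it prints one filesystem name per line
and prints nothing if no recovery was needed.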

Probably file data for the file backing your loopback device
has been lost/corrupted due to what looks like an "abrupt" end
before reboot (only metadata is journalled), and hence trying
to recover the loop device is going horribly wrong.

So, doesn't look like a hard disk error to me, and nor does it
look like an XFS problem.  You should be able to run xfs_repair
on your loopback file to fix the problem.
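
A minimal sketch of what that could look like, assuming the loop
filesystem lives in a regular file. The paths, the DRY_RUN variable and
the function name are placeholders invented for illustration; -n
(report only) and -f (filesystem is in a regular file) are xfs_repair
options.

```shell
# Check and repair an XFS filesystem stored in a regular file.
# By default the commands are only printed; set DRY_RUN=0 to run them.
# DRY_RUN and the function name are hypothetical.
repair_loop_image() {
    image=$1        # file backing the loop device
    mountpoint=$2   # where that filesystem is normally mounted
    run=            # empty: execute the commands for real
    if [ "${DRY_RUN:-1}" = 1 ]; then
        run=echo    # default: only print the commands
    fi

    # xfs_repair must not be run on a mounted filesystem.
    $run umount "$mountpoint" || return 1

    # -f: the filesystem is in a regular file; -n: report only, change nothing.
    # Not chained with &&, since -n exits non-zero when it finds damage.
    $run xfs_repair -n -f "$image"

    # The actual repair.
    $run xfs_repair -f "$image"
}
```

Calling "repair_loop_image /path/to/loopfile /mnt/loop" prints the three
commands so they can be reviewed before anything touches the file.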

cheers.

-- 
Nathan

* RE: another hard disk broken or xfs problems?
  2004-02-25 22:34 ` Nathan Scott
@ 2004-02-25 23:10   ` Michael Joy
  2004-02-28 21:43     ` Stephen Satchell
  2004-02-25 23:49   ` Nico Schottelius
  1 sibling, 1 reply; 12+ messages in thread
From: Michael Joy @ 2004-02-25 23:10 UTC (permalink / raw)
  To: 'Nathan Scott', 'Nico Schottelius',
	'Linux Kernel Mailing List'

One thing I've run into recently is that Hitachi drives come from the
factory with bad sectors out of the box. If you run a full format on the
drive checking for bad sectors the first time you use the drive, it will
occasionally find an error, the format will die, and the next time you
format nothing will be found.

It's a symptom of Hitachi not properly QC'ing their drives by fully
formatting them several times. The problem can reoccur over the first few
days until the drive finds all the questionable bad sectors and reallocates
backup sectors to cover for it.

This is OS independent as NTFS has serious issues with this :)

I hope this helps.

Michael

* Re: another hard disk broken or xfs problems?
  2004-02-25 22:34 ` Nathan Scott
  2004-02-25 23:10   ` Michael Joy
@ 2004-02-25 23:49   ` Nico Schottelius
  2004-02-26  3:27     ` Nathan Scott
  1 sibling, 1 reply; 12+ messages in thread
From: Nico Schottelius @ 2004-02-25 23:49 UTC (permalink / raw)
  To: Nathan Scott; +Cc: Linux Kernel Mailing List

Nathan Scott [Thu, Feb 26, 2004 at 09:34:28AM +1100]:
> > I am now using a brand new 40GB Hitachi hard disk for my notebook and
> > today I got the first problems:
> > 
> > 
> > Starting XFS recovery on filesystem: hda3 (dev: hda3)
> > Ending XFS recovery on filesystem: hda3 (dev: hda3)
> > VFS: Mounted root (xfs filesystem) readonly.
> > Mounted devfs on /dev
> > Freeing unused kernel memory: 156k freed
> > XFS mounting filesystem hda1
> > Starting XFS recovery on filesystem: hda1 (dev: hda1)
> > Ending XFS recovery on filesystem: hda1 (dev: hda1)
> > XFS mounting filesystem loop0
> > Starting XFS recovery on filesystem: loop0 (dev: loop0)
> > XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1589 of file fs/xfs/xfs_alloc.c.  Caller 0xc0198272
> > Call Trace: [<c0197151>]  [<c0198272>]  [<c0198272>]  [<c01c7fff>]  [<c01ee628>]  [<c01e5286>]  [<c01e5355>]  [<c01e6b04>]  [<c01d24fa>]  [<c01dd2cc>]  [<c01e83bd>]  [<c0203c60>]  [<c01d908f>]  [<c01f02c6>]  [<c02048e3>]  [<c02046c8>]  [<c0215517>]  [<c0185764>]  [<c015c835>]  [<c015c268>]  [<c0204890>]  [<c0204630>]  [<c015c4af>]  [<c0171ee8>]  [<c01721f4>]  [<c0172044>]  [<c01725ef>]  [<c010b34b>] 
> > Ending XFS recovery on filesystem: loop0 (dev: loop0)
> > Adding 192772k swap on /dev/discs/disc0/part2.  Priority:-1 extents:1
> > 
> > 
> > I got this after a clean shutdown. Just tell me it's an xfs error and my
> > harddisk is fine...
> 
> Filesystem recovery doesn't run after a clean shutdown...

That's what I assume(d), too.

> the "Starting/Ending XFS recovery" messages indicate that
> all of your filesystems were not unmounted by the look of
> it.

That's true and interesting, as I did a shutdown -o (poweroff) from minit, which
should have unmounted the partitions. I'll retry this with -h (halt only)
this evening.

> [...] 
> So, doesn't look like a hard disk error to me, and nor does it
> look like an XFS problem.  You should be able to run xfs_repair
> on your loopback file to fix the problem.

Will reboot in half an hour, but I think as the recovery was done, it
won't have any problems anymore.

There are still some questions open for me:

1. why is it an internal xfs error?
2. why does it print a call trace?
3. how can I find out what's wrong / what should I do when seeing call
   traces? And what should I have done beforehand (adding debugging somewhere)?

Have a nice night,

Nico

-- 
Keep it simple & stupid, use what's available.
pgp: 8D0E E27A          | Nico Schottelius
http://nerd-hosting.net | http://linux.schottelius.org

* Re: another hard disk broken or xfs problems?
  2004-02-25 23:49   ` Nico Schottelius
@ 2004-02-26  3:27     ` Nathan Scott
  2004-02-26  8:25       ` Nico Schottelius
  0 siblings, 1 reply; 12+ messages in thread
From: Nathan Scott @ 2004-02-26  3:27 UTC (permalink / raw)
  To: Nico Schottelius, Linux Kernel Mailing List

hi there,

On Thu, Feb 26, 2004 at 12:49:44AM +0100, Nico Schottelius wrote:
> Nathan Scott [Thu, Feb 26, 2004 at 09:34:28AM +1100]:
> > [...] 
> > So, doesn't look like a hard disk error to me, and nor does it
> > look like an XFS problem.  You should be able to run xfs_repair
> > on your loopback file to fix the problem.
> 
> Will reboot in half an hour, but I think as the recovery was done, it
> won't have any problems anymore.

Well, recovery is a garbage-in, garbage-out process - I
think you will need to repair that loopback file.

> There are still some questions open for me:
> 
> 1. why is it an internal xfs error?

Your loopback file seems to have got corrupted, XFS reports
this as an internal error (generic error message).

> 2. why does it print a call trace?

XFS detected corruption, and tried to dump out some state info
at the point where it noticed the problem.

> 3. how can I find out what's wrong / what should I do when seeing call
>    traces? And what should I have done beforehand (adding debugging somewhere)?

xfs_repair will tell you what's wrong, and should fix it.

cheers.

-- 
Nathan

* Re: another hard disk broken or xfs problems?
  2004-02-26  3:27     ` Nathan Scott
@ 2004-02-26  8:25       ` Nico Schottelius
  2004-02-26  9:46         ` Nathan Scott
  2004-02-26 10:02         ` Rogier Wolff
  0 siblings, 2 replies; 12+ messages in thread
From: Nico Schottelius @ 2004-02-26  8:25 UTC (permalink / raw)
  To: Nathan Scott; +Cc: Linux Kernel Mailing List

Nathan Scott [Thu, Feb 26, 2004 at 02:27:41PM +1100]:
> On Thu, Feb 26, 2004 at 12:49:44AM +0100, Nico Schottelius wrote:
> > Nathan Scott [Thu, Feb 26, 2004 at 09:34:28AM +1100]:
> > > [...] 
> > > So, doesn't look like a hard disk error to me, and nor does it
> > > look like an XFS problem.  You should be able to run xfs_repair
> > > on your loopback file to fix the problem.
> > 
> > Will reboot in half an hour, but I think as the recovery was done, it
> > won't have any problems anymore.
> 
> Well, recovery is a garbage-in, garbage-out process - I
> think you will need to repair that loopback file.

Well, after the recovery the system works fine again.

> > There are still some questions open for me:
> > 
> > 1. why is it an internal xfs error?
> 
> Your loopback file seems to have got corrupted, XFS reports
> this as an internal error (generic error message).

I am really wondering about the error message, as "internal errors"
indicate to me an error in the kernel.

> > 2. why does it print a call trace?
> 
> XFS detected corruption, and tried to dump out some state info
> at the point where it noticed the problem.

I am wondering what my dmesg will look like if I have to recover some
gigabytes of data.

And btw, do all filesystem drivers behave in this way, printing internal
errors and displaying call traces when they find errors in the
filesystem?

For me this is really confusing.

Sincerely,

Nico

-- 
Keep it simple & stupid, use what's available.
pgp: 8D0E E27A          | Nico Schottelius
http://nerd-hosting.net | http://linux.schottelius.org

* Re: another hard disk broken or xfs problems?
  2004-02-26  8:25       ` Nico Schottelius
@ 2004-02-26  9:46         ` Nathan Scott
  2004-02-26 19:26           ` Mike Fedyk
  2004-02-26 10:02         ` Rogier Wolff
  1 sibling, 1 reply; 12+ messages in thread
From: Nathan Scott @ 2004-02-26  9:46 UTC (permalink / raw)
  To: Nico Schottelius, Linux Kernel Mailing List

On Thu, Feb 26, 2004 at 09:25:51AM +0100, Nico Schottelius wrote:
> Nathan Scott [Thu, Feb 26, 2004 at 02:27:41PM +1100]:
> > On Thu, Feb 26, 2004 at 12:49:44AM +0100, Nico Schottelius wrote:
> > > There are still some questions open for me:
> > > 
> > > 1. why is it an internal xfs error?
> > 
> > Your loopback file seems to have got corrupted, XFS reports
> > this as an internal error (generic error message).
> 
> I am really wondering about the error message, as "internal errors"
> indicate to me an error in the kernel.

It is a misleading error message, I'll look into correcting that.

> > > 2. why does it print a call trace?
> > 
> > XFS detected corruption, and tried to dump out some state info
> > at the point where it noticed the problem.
> 
> I am wondering what my dmesg will look like if I have to recover some
> gigabytes of data.

The filesystem typically shuts down on detection of corruption,
so you should get just the one error report.

> And btw, do all filesystem drivers behave in this way, printing internal
> errors and displaying call traces when they find errors in the
> filesystem?

No, not all filesystems behave this way.  And it is configurable
in XFS; if you don't want this to happen, you can switch it off
via the sysctl/procfs interface - see the "error_level" section
in Documentation/filesystems/xfs.txt.
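
For illustration only, a small sketch of reading that setting. It
assumes the knob is exposed as the fs.xfs.error_level sysctl, i.e. the
file /proc/sys/fs/xfs/error_level, and that a value of 0 switches the
reports off; check xfs.txt for the kernel in use before relying on
either. The function name is made up.

```shell
# Print the current XFS error reporting level.
# The default path is an assumption (procfs file behind the
# fs.xfs.error_level sysctl); another file can be passed as $1, which is
# handy for trying the function on a machine without XFS loaded.
# The function name is hypothetical.
xfs_error_level() {
    cat "${1:-/proc/sys/fs/xfs/error_level}"
}

# To silence the reports (as root), one of the following should do,
# again assuming the sysctl name above and that 0 means "off":
#   sysctl -w fs.xfs.error_level=0
#   echo 0 > /proc/sys/fs/xfs/error_level
```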

> For me this is really confusing.

Hope this helps.

cheers.

-- 
Nathan

* Re: another hard disk broken or xfs problems?
  2004-02-26  8:25       ` Nico Schottelius
  2004-02-26  9:46         ` Nathan Scott
@ 2004-02-26 10:02         ` Rogier Wolff
  1 sibling, 0 replies; 12+ messages in thread
From: Rogier Wolff @ 2004-02-26 10:02 UTC (permalink / raw)
  To: Nico Schottelius, Nathan Scott, Linux Kernel Mailing List

On Thu, Feb 26, 2004 at 09:25:51AM +0100, Nico Schottelius wrote:
> I am really wondering about the error message, as "internal errors"
> indicate to me an error in the kernel.

[...]

> And btw, do all filesystem drivers behave in this way, printing internal
> errors and displaying call traces when they find errors in the
> filesystem?

For a filesystem driver, things are clear: it's the only one writing
to the data on the drive. So when things go wrong: in principle, it's
an internal error. 

It would be nice if you could always salvage your data by
just mounting the partition, but that's not the case. A specialized
program like ...-repair or fsck will do a better job.

Now for a logging filesystem, the assumption that it messed up itself
is even stronger than for a classical filesystem. It should be able
to handle whatever happens. 

It's very difficult to check all assumptions about what's on the disk
at every step and still get reasonable performance. That's why some
errors may only be noticed a bit late and lead to slightly misleading
error messages.

	Roger. 

-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
**** "Linux is like a wigwam -  no windows, no gates, apache inside!" ****

* Re: another hard disk broken or xfs problems?
  2004-02-26  9:46         ` Nathan Scott
@ 2004-02-26 19:26           ` Mike Fedyk
  2004-02-27  5:53             ` Nathan Scott
  0 siblings, 1 reply; 12+ messages in thread
From: Mike Fedyk @ 2004-02-26 19:26 UTC (permalink / raw)
  To: Nathan Scott; +Cc: Nico Schottelius, Linux Kernel Mailing List

Nathan Scott wrote:
> On Thu, Feb 26, 2004 at 09:25:51AM +0100, Nico Schottelius wrote:
>>And btw, do all filesystem drivers behave in this way, printing internal
>>errors and displaying call traces when they find errors in the
>>filesystem?
> 
> 
> No, not all filesystems behave this way.  And it is configurable
> in XFS; if you don't want this to happen, you can switch it off
> via the sysctl/procfs interface - see the "error_level" section
> in Documentation/filesystems/xfs.txt.

I like this idea.

Is it just calling dump_stack() based on error level?

* Re: another hard disk broken or xfs problems?
  2004-02-26 19:26           ` Mike Fedyk
@ 2004-02-27  5:53             ` Nathan Scott
  0 siblings, 0 replies; 12+ messages in thread
From: Nathan Scott @ 2004-02-27  5:53 UTC (permalink / raw)
  To: Mike Fedyk; +Cc: Nico Schottelius, Linux Kernel Mailing List

On Thu, Feb 26, 2004 at 11:26:51AM -0800, Mike Fedyk wrote:
> 
> I like this idea.
> 
> Is it just calling dump_stack() based on error level?

Yes, pretty much.

cheers.

-- 
Nathan

* RE: another hard disk broken or xfs problems?
  2004-02-25 23:10   ` Michael Joy
@ 2004-02-28 21:43     ` Stephen Satchell
  2004-02-29  8:07       ` John Bradford
  0 siblings, 1 reply; 12+ messages in thread
From: Stephen Satchell @ 2004-02-28 21:43 UTC (permalink / raw)
  To: Michael Joy
  Cc: 'Nathan Scott', 'Nico Schottelius',
	'Linux Kernel Mailing List'

On Wed, 2004-02-25 at 15:10, Michael Joy wrote:
> One thing I've run into recently is that Hitachi drives come from the
> factory with bad sectors out of the box. If you run a full format on the
> drive checking for bad sectors the first time you use the drive, it will
> occasionally find an error, the format will die, and the next time you
> format nothing will be found.
> 
> It's a symptom of Hitachi not properly QC'ing their drives by fully
> formatting them several times. The problem can reoccur over the first few
> days until the drive finds all the questionable bad sectors and reallocates
> backup sectors to cover for it.

It's not just Hitachi drives, either.  I just purchased a NEW (not
refurbished) Maxtor 80-GB drive to replace a Fujitsu that finally reached
end of life, and found that I had to zero the drive before it would mkfs
properly.  The command I used to make this happen is:

   dd if=/dev/zero of=/dev/hdb bs=1M

I'm used to doing this because I've discovered that the refurb drives my
employer has been buying need zeroing before use.  I know
that's not as good as using a worst-case pattern specific to the drive,
but it's good enough to get the known bad sectors remapped at the start,
then let the drive discover other bad sectors as it goes.  To that end,
the systems under my control run a weekly CRON job that looks something
like this:

  /bin/nice -n 15 /bin/dd if=/dev/hda of=/dev/null bs=128k
  /bin/nice -n 15 /bin/dd if=/dev/hdc of=/dev/null bs=128k
  /bin/nice -n 15 /bin/dd if=/dev/sda of=/dev/null bs=128k

This ensures that all sectors are readable, regardless of file system
state, and relocates and reassigns those sectors that can be read in any
way.
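
One way to make the result of such a scan explicit is to wrap dd and
look at its exit status, since dd exits non-zero when a read fails.
This is only a sketch; the function name scan_device is made up, and it
stops at the first unreadable block rather than listing all of them.

```shell
# Read a whole device (or file) and report whether every block could be
# read. Without conv=noerror, dd stops at the first read error and exits
# with a non-zero status. The function name is hypothetical.
scan_device() {
    if nice -n 15 dd if="$1" of=/dev/null bs=128k 2>/dev/null; then
        echo "$1: all blocks readable"
    else
        echo "$1: read error" >&2
        return 1
    fi
}
```

From cron this could be called as "scan_device /dev/hda", so that only
the failure case produces output on stderr.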

I also have started installing smartmontools into all my systems, and
configuring them to e-mail me when a critical parameter changes -- I'm
getting tired of coming in to work and discovering that a hard drive has
finally bit the big one.  (It doesn't help prevent a crisis, because a
thrown head is such a catastrophic event, but it does help with
end-of-life discovery, so a disk change can be scheduled.)

* RE: another hard disk broken or xfs problems?
  2004-02-28 21:43     ` Stephen Satchell
@ 2004-02-29  8:07       ` John Bradford
  0 siblings, 0 replies; 12+ messages in thread
From: John Bradford @ 2004-02-29  8:07 UTC (permalink / raw)
  To: Stephen Satchell, Michael Joy
  Cc: 'Nathan Scott', 'Nico Schottelius',
	'Linux Kernel Mailing List'

Quote from Stephen Satchell <list@satchell.net>:
>   /bin/nice -n 15 /bin/dd if=/dev/hda of=/dev/null bs=128k
>   /bin/nice -n 15 /bin/dd if=/dev/hdc of=/dev/null bs=128k
>   /bin/nice -n 15 /bin/dd if=/dev/sda of=/dev/null bs=128k
> 
> This ensures that all sectors are readable, regardless of file system
> state, and relocates and reassigns those sectors that can be read in any
> way.

The majority of drives do not re-allocate on read, only on write.
Therefore, the above cron jobs will simply find them each time they
run, unless something writes to the defective block in between runs.
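
Whether sectors are waiting for such a write can be watched through
SMART. The following is a sketch that filters the attribute table
printed by "smartctl -A" from smartmontools. It assumes the common
attribute IDs 5 (reallocated sectors) and 197 (pending sectors) and the
usual ten-column table layout; both are vendor dependent, and the
function name is made up.

```shell
# Print "ATTRIBUTE_NAME RAW_VALUE" for the reallocated (ID 5) and
# pending (ID 197) sector attributes.
# Reads `smartctl -A` output on stdin. IDs and column positions are
# assumptions about the drive; the function name is hypothetical.
smart_sector_counts() {
    awk '$1 == 5 || $1 == 197 { print $2, $10 }'
}
```

For example: "smartctl -A /dev/hda | smart_sector_counts". A pending
count that stays non-zero between scans would fit the behaviour
described above.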

John.
