* Huge problem with XFS/iCH7R
@ 2006-07-02 19:51 Carsten Otto
2006-07-02 19:56 ` Carsten Otto
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Carsten Otto @ 2006-07-02 19:51 UTC
To: linux-kernel
Hi there!
(System specs below)
Short summary:
System (with software raid 5, XFS, four disks connected to AHCI
controller) crashes very often and loses data.
My system crashes every few days, at the moment daily. The message shown
is (the drive changes about every time, I do not see a pattern here):
---
ata4: handling error/timeout
ata4: port reset, p_is 0 is 0 pis 0 cmd c017 tf 7f ss 0 se 0
ata4: status=0x50 { DriveReady SeekComplete }
sdd: Current: sense key=0x0
ASC=0x0 ASCQ=0x0
Info fid=0x0
---
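Editorial note on the log above: the `status=0x50` value is the ATA status register, and the names printed in braces are its set bits. The following is a minimal sketch of that decoding, assuming the standard ATA status bit layout. The function name `decode_ata_status` is hypothetical and is not part of any kernel or library API.

```python
# Standard ATA status register bits, highest bit first.
# The labels mirror the names that appear in the log message.
ATA_STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_ata_status(status):
    """Return the names of all bits set in an ATA status byte."""
    return [name for mask, name in ATA_STATUS_BITS if status & mask]


# 0x50 = 0x40 | 0x10
print(decode_ata_status(0x50))
```

Neither the Busy bit nor the Error bit is set in 0x50, so the drive is not reporting a failure of its own in this register; that fits a command timeout better than a drive-reported error, though it does not identify the cause.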
Although according to this message only one of four drives failed (in
software RAID5) the system does not do anything useful. Hitting enter at
the login prompt does cause the password prompt to appear and no service
responds.
If I do a soft reset (using Magic Key u, then b) the BIOS does not detect
exactly one drive (which is the one shown in the error message I guess).
After a hard reset all drives are found, but I have to do a raid resync and
xfs_repair (at least, sometimes the raid needs to be tricked into starting).
This problem occurred with all kernels (all vanilla), starting with
2.6.16.something up to 2.6.17.2.
I checked all four drives with a Maxtor tool, all drives are fine.
The temperature is not a problem, all drives are stable at about 35°C.
I replaced the SATA cables several times.
Some images showing the errors on screen are here:
http://c-otto.de/fehler/
I'd like to know what component causes this problem and how I can solve
it.
Please tell me if you need further information!
System specs:
- Intel iCH7R on Foxconn 945P7AA-EKRS2
- Pentium D 805 (2.66 GHz, 1MB Cache, Dual Core)
- 4x Maxtor 7V300F0 (MaXLine Plus III 300 GB; Sata 2; 16 MB Cache)
- 2 GB RAM
PS: Please include me in CC as I do not read the whole LKML.
Thanks a lot,
--
Carsten Otto
c-otto@gmx.de
www.c-otto.de
* Re: Huge problem with XFS/iCH7R
2006-07-02 19:51 Huge problem with XFS/iCH7R Carsten Otto
@ 2006-07-02 19:56 ` Carsten Otto
2006-07-02 21:22 ` Jan Engelhardt
2006-07-03 6:32 ` Nathan Scott
2006-07-16 8:45 ` Carsten Otto
2 siblings, 1 reply; 8+ messages in thread
From: Carsten Otto @ 2006-07-02 19:56 UTC
To: linux-kernel
On Sun, Jul 02, 2006 at 09:51:45PM +0200, Carsten Otto wrote:
> If I do a soft reset (using Magic Key u, then b) the BIOS does not detect
> exactly one drive (which is the one shown in the error message I guess).
> After a hard reset all drives are found, but I have to do a raid resync and
> xfs_repair (at least, sometimes the raid needs to be tricked into starting).
I forgot to add:
Even after an xfs_repair, an immediately following run of xfs_check or
xfs_repair sometimes finds errors.
Currently I have problems deleting files from lost+found:
rm: cannot remove directory `lost+found//2171932973': Directory not
empty
I ran xfs_check only a few hours ago and it did not show any output
(which is good).
--
Carsten Otto
c-otto@gmx.de
www.c-otto.de
* Re: Huge problem with XFS/iCH7R
2006-07-02 19:56 ` Carsten Otto
@ 2006-07-02 21:22 ` Jan Engelhardt
0 siblings, 0 replies; 8+ messages in thread
From: Jan Engelhardt @ 2006-07-02 21:22 UTC
To: Carsten Otto; +Cc: Linux Kernel Mailing List, xfs
Adding cc,
>> If I do a soft reset (using Magic Key u, then b) the BIOS does not detect
>> exactly one drive (which is the one shown in the error message I guess).
>> After a hard reset all drives are found, but I have to do a raid resync and
>> xfs_repair (at least, sometimes the raid needs to be tricked into starting).
But ideally, RAID5 should let you continue working with one disk fewer
without having to worry about the filesystem. Sounds like a double bug.
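Editorial note on the RAID5 point above: parity is the bytewise XOR of the data blocks in a stripe, so any single missing block can be rebuilt from the remaining ones. The following is a toy sketch, not the md driver's actual code. The helper name `xor_blocks` and the three-data-plus-one-parity stripe are assumptions chosen to mirror a four-disk array.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical four-disk array: three data blocks plus parity.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the disk that held data[1]: XOR the survivors with parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

assert rebuilt == data[1]
```

This only covers the loss of a single block per stripe. If a second disk drops out, or the array is interrupted mid-write, the parity no longer suffices, which is one reason a resync is needed after an unclean shutdown.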
>I forgot to add:
>Even after a xfs_repair sometimes another directly following run of
>xfs_check or xfs_repair finds errors.
>Currently I have problems deleting files from lost+found:
>rm: cannot remove directory `lost+found//2171932973': Directory not
>empty
XFS devs: possibly related to the ominous 16777216 thing?
>I ran xfs_check only a few hours ago and it did not show any output
>(which is good).
Jan Engelhardt
--
* Re: Huge problem with XFS/iCH7R
2006-07-02 19:51 Huge problem with XFS/iCH7R Carsten Otto
2006-07-02 19:56 ` Carsten Otto
@ 2006-07-03 6:32 ` Nathan Scott
2006-07-03 21:20 ` Bill Davidsen
2006-07-16 8:45 ` Carsten Otto
2 siblings, 1 reply; 8+ messages in thread
From: Nathan Scott @ 2006-07-03 6:32 UTC
To: c-otto, linux-kernel
On Sun, Jul 02, 2006 at 09:51:45PM +0200, Carsten Otto wrote:
> Hi there!
>
> (System specs below)
>
> Short summary:
> System (with software raid 5, XFS, four disks connected to AHCI
> controller) crashes very often and loses data.
>
> My system crashes every few days, at the moment daily. The message shown
> is (the drive changes about every time, I do not see a pattern here):
> ---
> ata4: handling error/timeout
> ata4: port reset, p_is 0 is 0 pis 0 cmd c017 tf 7f ss 0 se 0
> ata4: status=0x50 { DriveReady SeekComplete }
> sdd: Current: sense key=0x0
> ASC=0x0 ASCQ=0x0
> Info fid=0x0
FWIW, the above look like hardware/driver problems.
> http://c-otto.de/fehler/
Your first issue there is the XFS dir2 regression discussed recently
here and on xfs@oss.sgi.com - there's a patch available for that and
it's likely to be included in the next -stable release.
> I'd like to know what component causes this problem and how I can solve
> it.
The initial problem (above) and three of your four photos look
unrelated to XFS, but the first image indicates a (now fixed)
XFS problem, so it looks like you have multiple issues there.
cheers.
--
Nathan
* Re: Huge problem with XFS/iCH7R
2006-07-03 6:32 ` Nathan Scott
@ 2006-07-03 21:20 ` Bill Davidsen
2006-07-03 22:47 ` Nathan Scott
0 siblings, 1 reply; 8+ messages in thread
From: Bill Davidsen @ 2006-07-03 21:20 UTC
To: linux-kernel
Nathan Scott wrote:
> On Sun, Jul 02, 2006 at 09:51:45PM +0200, Carsten Otto wrote:
>> Hi there!
>>
>> (System specs below)
>>
>> Short summary:
>> System (with software raid 5, XFS, four disks connected to AHCI
>> controller) crashes very often and loses data.
>>
>> My system crashes every few days, at the moment daily. The message shown
>> is (the drive changes about every time, I do not see a pattern here):
>> ---
>> ata4: handling error/timeout
>> ata4: port reset, p_is 0 is 0 pis 0 cmd c017 tf 7f ss 0 se 0
>> ata4: status=0x50 { DriveReady SeekComplete }
>> sdd: Current: sense key=0x0
>> ASC=0x0 ASCQ=0x0
>> Info fid=0x0
>
> FWIW, the above look like hardware/driver problems.
If the problem doesn't occur before 2.6.16, that makes a hardware
problem less likely. It's not impossible that some buggy feature is now
used, but lower probability than the kernel change being the culprit.
The bug fix you mention may solve the whole thing, or make it easier to
debug.
General comment: if a kernel version made my system crash once a day I
sure wouldn't be using it. New features are neat, but I wouldn't put up
with that if it made my cat pee holy water.
I would test proposed fixes, of course, but only until I got more info
for developers.
--
Bill Davidsen <davidsen@tmr.com>
Obscure bug of 2004: BASH BUFFER OVERFLOW - if bash is being run by a
normal user and is setuid root, with the "vi" line edit mode selected,
and the character set is "big5," an off-by-one error occurs during
wildcard (glob) expansion.
* Re: Huge problem with XFS/iCH7R
2006-07-03 21:20 ` Bill Davidsen
@ 2006-07-03 22:47 ` Nathan Scott
2006-07-05 12:29 ` Bill Davidsen
0 siblings, 1 reply; 8+ messages in thread
From: Nathan Scott @ 2006-07-03 22:47 UTC
To: Bill Davidsen; +Cc: linux-kernel
On Mon, Jul 03, 2006 at 05:20:11PM -0400, Bill Davidsen wrote:
> >> My system crashes every few days, at the moment daily. The message shown
> >> is (the drive changes about every time, I do not see a pattern here):
> >> ---
> >> ata4: handling error/timeout
> >> ata4: port reset, p_is 0 is 0 pis 0 cmd c017 tf 7f ss 0 se 0
> >> ata4: status=0x50 { DriveReady SeekComplete }
> >> sdd: Current: sense key=0x0
> >> ASC=0x0 ASCQ=0x0
> >> Info fid=0x0
> >
> > FWIW, the above look like hardware/driver problems.
>
> If the problem doesn't occur before 2.6.16, that makes a hardware
> problem less likely. It's not impossible that some buggy feature is now
> used, but lower probability than the kernel change being the culprit.
*nod*.
> The bug fix you mention may solve the whole thing, or make it easier to
I'm certain it won't; that was an XFS problem, and I can see no way it
could cause device/driver issues also.
> General comment: if a kernel version made my system crash once a day I
> sure wouldn't be using it.
Heh, yes - by definition.
> New features are neat, but I wouldn't put up
> with that if it made my cat pee holy water.
Not sure what new features you're talking about here? Nor how your
cat fits into all this.. ;)
cheers.
--
Nathan
* Re: Huge problem with XFS/iCH7R
2006-07-03 22:47 ` Nathan Scott
@ 2006-07-05 12:29 ` Bill Davidsen
0 siblings, 0 replies; 8+ messages in thread
From: Bill Davidsen @ 2006-07-05 12:29 UTC
To: Nathan Scott; +Cc: linux-kernel
Nathan Scott wrote:
> On Mon, Jul 03, 2006 at 05:20:11PM -0400, Bill Davidsen wrote:
>> New features are neat, but I wouldn't put up
>> with that if it made my cat pee holy water.
>
> Not sure what new features you're talking about here? Nor how your
> cat fits into all this.. ;)
Presumably the new features which required or encouraged a kernel
upgrade. Why else would anyone ever mess with a working system? I count
bugfixes and performance enhancements as new features for this
discussion, some improvement to justify change.
--
Bill Davidsen <davidsen@tmr.com>
Obscure bug of 2004: BASH BUFFER OVERFLOW - if bash is being run by a
normal user and is setuid root, with the "vi" line edit mode selected,
and the character set is "big5," an off-by-one error occurs during
wildcard (glob) expansion.
* Re: Huge problem with XFS/iCH7R
2006-07-02 19:51 Huge problem with XFS/iCH7R Carsten Otto
2006-07-02 19:56 ` Carsten Otto
2006-07-03 6:32 ` Nathan Scott
@ 2006-07-16 8:45 ` Carsten Otto
2 siblings, 0 replies; 8+ messages in thread
From: Carsten Otto @ 2006-07-16 8:45 UTC
To: linux-kernel
On Sun, Jul 02, 2006 at 09:51:45PM +0200, Carsten Otto wrote:
> System (with software raid 5, XFS, four disks connected to AHCI
> controller) crashes very often and loses data.
An undersized power supply was the main problem (and an XFS bug
killed some files).
--
Carsten Otto
c-otto@gmx.de
www.c-otto.de
Thread overview: 8+ messages
2006-07-02 19:51 Huge problem with XFS/iCH7R Carsten Otto
2006-07-02 19:56 ` Carsten Otto
2006-07-02 21:22 ` Jan Engelhardt
2006-07-03 6:32 ` Nathan Scott
2006-07-03 21:20 ` Bill Davidsen
2006-07-03 22:47 ` Nathan Scott
2006-07-05 12:29 ` Bill Davidsen
2006-07-16 8:45 ` Carsten Otto