From: Patrick <ragamuffin@datacomm.ch>
To: linux-kernel@vger.kernel.org
Subject: SB600 AHCI: Hard Disk Corruption
Date: Sun, 25 May 2008 14:10:35 +0200
Message-ID: <1211717435.6038.53.camel@localhost.localdomain>

Hello (Tejun Heo *)

I've got an annoying problem with my Athlon 64-bit, 4 GB RAM, ASUS M2A-VM
(-> SB600 AHCI controller), SAMSUNG HD501LJ SATA disk. I'm using kernel
2.6.26-rc3. Everything works fine, except for standby/suspend/hibernate.
Standby freezes; hibernate I actually haven't tested lately because I
want suspend to RAM to work first.

"echo mem > /sys/power/state; vbetool post;" (on a text console)
successfully suspends the system, and it resumes as well, BUT: after
resuming, things quickly turn bad: "file not found", and the kernel
reports ext2 errors on the root (LVM) partition. After a (hard) reboot
the root filesystem won't even be recognized by mount any more, and
e2fsck can hardly recover it (thousands of inodes go to lost+found; I
have to restore backups to make the system work again). This happened
even when the partition was mounted _readonly_, and it happens to ALL
partitions mounted during suspend. ** I'm now testing by appending
break=init to the kernel command line, getting to a busybox shell on the
initramfs, and then unmounting "root" before suspending. From there I
can run dmesg to see what's happening (though the dmesg buffer is quite
small... can I increase that in proc somewhere?). I'd be willing to test
and send whatever logs you need to get this fixed.

Some additional info: Upgrading from 2.6.24, I hoped the
AHCI_HFLAG_NO_MSI flag in drivers/ata/ahci.c might solve the issue - no
luck. All the other SB600 workarounds: obviously no luck either.

irqpoll: slightly different behaviour when unloading the sd_mod and ahci
modules before suspending:

- Without irqpoll, the disk ([sda]) doesn't show up again after
  "modprobe ahci; modprobe sd_mod" and I get "ata5.00: failed to
  IDENTIFY [...] err_mask=0x80" and "failed to restore some devices
  [...]" errors.
- With irqpoll, the disk shows up again with no errors, but "there is
  different data" on each read (head -c10000) from /dev/sda. The disk
  itself is not changed, though: after rebooting it contains the
  original data. I just wonder how the data is "created" - it seems to
  be disk content from different locations (not the beginning) on the
  disk - if I run "dd if=/dev/sda of=/dev/null", I hear the disk
  reading data....
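
The repeated-read check described above (head -c10000 on /dev/sda) could
be scripted roughly as follows. This is only a sketch under assumptions:
the function name compare_reads and its three arguments are made up for
illustration, and comparing cksum values stands in for looking at the
raw head output by eye.

```shell
#!/bin/sh
# compare_reads TARGET BYTES COUNT  (hypothetical helper, illustration only)
# Reads the first BYTES bytes of TARGET COUNT times and prints
# "consistent" if every read produced the same checksum,
# otherwise "inconsistent".
compare_reads() {
    target=$1
    bytes=$2
    count=$3
    first=""
    result="consistent"
    i=0
    while [ "$i" -lt "$count" ]; do
        # Checksum of the first BYTES bytes of this read.
        sum=$(head -c "$bytes" "$target" | cksum)
        if [ -z "$first" ]; then
            first=$sum            # remember the first read as reference
        elif [ "$sum" != "$first" ]; then
            result="inconsistent" # a later read differed from the first
        fi
        i=$((i + 1))
    done
    echo "$result"
}

# Example (needs read access to the device):
#   compare_reads /dev/sda 10000 5
```

On a healthy disk this should print "consistent"; in the irqpoll case
described above one would expect "inconsistent".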

Well - I hope you might be able to make some sense of that and tell me
what logs and dumps exactly you need to fix it...

Greets - Patrick



* I read many threads in which Tejun provided patches for the SB600 AHCI
controller, which seems to be seriously broken - if only I had known
that in advance... Maybe he can fix this issue as well - last resort.
Otherwise I'll burn that mobo!

** After my first install and a day of configuring the system, trying
out suspend to RAM smashed it with no backups. I didn't learn my lesson
and have smashed it again 2-3 times since, this time with backups at
hand, though...





