From: David Greaves <david@dgreaves.com>
To: Tejun Heo <htejun@gmail.com>
Cc: David Chinner <dgc@sgi.com>, David Robinson <zxvdr.au@gmail.com>,
LVM general discussion and development <linux-lvm@redhat.com>,
"'linux-kernel@vger.kernel.org'" <linux-kernel@vger.kernel.org>,
xfs@oss.sgi.com, linux-pm <linux-pm@lists.osdl.org>,
LinuxRaid <linux-raid@vger.kernel.org>,
"Rafael J. Wysocki" <rjw@sisk.pl>
Subject: Re: [linux-lvm] 2.6.22-rc5 XFS fails after hibernate/resume
Date: Thu, 21 Jun 2007 19:06:29 +0100
Message-ID: <467ABE25.7060303@dgreaves.com>
In-Reply-To: <4678DF56.1020903@gmail.com>
Been away, back now...
Tejun Heo wrote:
> David Greaves wrote:
>> Tejun Heo wrote:
>>> How reproducible is the problem? Does the problem go away or occur more
>>> often if you change the drive you write the memory image to?
>> I don't think there should be activity on the sda drive during resume
>> itself.
>>
>> [I broke my / md mirror and am using some of that for swap/resume for now]
>>
>> I did change the swap/resume device to sdd2 (different controller,
>> onboard sata_via) and there was no EH during resume. The system seemed
>> OK; I wrote a few GB of video and did a kernel compile.
>> I repeated this test, no EH during resume, no problems.
>> I even ran xfs_fsr, the defragment utility, to stress the fs.
>>
>> I'll retain this configuration and try again tonight, but it looks like
>> there _may_ be a link between EH during resume and my problems...
Having retained this new configuration for a couple of days now I haven't had
any problems.
This is good but not really ideal since / isn't mirrored anymore :(
>> Of course, I don't understand why it *should* EH during resume, it
>> doesn't during boot or normal operation...
>
> EH occurs during boot, suspend and resume all the time. It just runs in
> quiet mode to avoid disturbing the users too much. In your case, EH is
> kicking in due to actual exception conditions, so it's being verbose to
> give a clue about what's going on.
I was trying to say that I don't actually see any errors being handled in normal
operation.
I'm not sure if you are saying that these PHY RDY events are normally handled
quietly (which would explain it).
> It's really weird tho. The PHY RDY status changed events are coming
> from the device which is NOT used while resuming
Yes, but the erroring device, which is not being used, is on the same controller
as the device with the in-use resume partition.
> and it's before any
> actual PM events are triggered. Your kernel just boots, swsusp realizes
> it's resuming and tries to read memory image from the swap device.
yes
> While reading, the disk controller raises consecutive PHY readiness
> changed interrupts. EH recovers them alright but the end result seems
> to indicate that the loaded image is corrupt.
Yes, that's consistent with what I'm seeing.
When I move the swap/resume partition to a different controller (i.e. when I broke
the / mirror and used the freed space), the problem seems to go away.
I am seeing messages in dmesg though:
ATA: abnormal status 0xD0 on port 0xf881e0c7
ATA: abnormal status 0xD0 on port 0xf881e0c7
ATA: abnormal status 0xD0 on port 0xf881e0c7
ATA: abnormal status 0xD0 on port 0xf881e0c7
ATA: abnormal status 0xD0 on port 0xf881e0c7
ata1.00: configured for UDMA/100
ata2.00: revalidation failed (errno=-2)
ata2: failed to recover some devices, retrying in 5 secs
sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB)
sd 0:0:0:0: resuming
sd 0:0:0:0: [sda] Starting disk
ATA: abnormal status 0x7F on port 0x00019807
ATA: abnormal status 0x7F on port 0x00019007
ATA: abnormal status 0x7F on port 0x00019007
ATA: abnormal status 0x7F on port 0x00019807
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ATA: abnormal status 0xD0 on port 0xf881e0c7
ATA: abnormal status 0xD0 on port 0xf881e0c7
ATA: abnormal status 0xD0 on port 0xf881e0c7
ATA: abnormal status 0xD0 on port 0xf881e0c7
ATA: abnormal status 0xD0 on port 0xf881e0c7
ata1.00: configured for UDMA/100
ata2.00: revalidation failed (errno=-2)
ata2: failed to recover some devices, retrying in 5 secs
> So, there's no device suspend/resume code involved at all. The kernel
> just booted and is trying to read data from the drive. Please try with
> only the first drive attached and see what happens.
That's kinda hard; swap and root are on different drives...
Does it help that, although the errors above appear, the system seems OK when I
just use the other controller?
I have to be cautious what I do with this machine as it's the wife's active
desktop box <grin>.
David