From: David Greaves <david@dgreaves.com>
To: David Chinner <dgc@sgi.com>, Tejun Heo <htejun@gmail.com>
Cc: David Robinson <zxvdr.au@gmail.com>,
LVM general discussion and development <linux-lvm@redhat.com>,
"'linux-kernel@vger.kernel.org'" <linux-kernel@vger.kernel.org>,
xfs@oss.sgi.com, linux-pm <linux-pm@lists.osdl.org>,
LinuxRaid <linux-raid@vger.kernel.org>,
"Rafael J. Wysocki" <rjw@sisk.pl>
Subject: Re: [linux-lvm] 2.6.22-rc5 XFS fails after hibernate/resume
Date: Tue, 19 Jun 2007 10:24:23 +0100
Message-ID: <4677A0C7.4000306@dgreaves.com>
In-Reply-To: <4676D97E.4000403@dgreaves.com>
David Greaves wrote:
> I'm going to have to do some more testing...
done
> David Chinner wrote:
>> On Mon, Jun 18, 2007 at 08:49:34AM +0100, David Greaves wrote:
>>> David Greaves wrote:
>>> So doing:
>>> xfs_freeze -f /scratch
>>> sync
>>> echo platform > /sys/power/disk
>>> echo disk > /sys/power/state
>>> # resume
>>> xfs_freeze -u /scratch
>>>
>>> Works (for now - more usage testing tonight)
>>
>> Verrry interesting.
> Good :)
Now, not so good :)
>> What you were seeing was an XFS shutdown occurring because the free space
>> btree was corrupted. IOWs, the process of suspend/resume has resulted
>> in either bad data being written to disk, the correct data not being
>> written to disk or the cached block being corrupted in memory.
> That's the kind of thing I was suspecting, yes.
>
>> If you run xfs_check on the filesystem after it has shut down after a resume,
>> can you tell us if it reports on-disk corruption? Note: do not run xfs_repair
>> to check this - it does not check the free space btrees; instead it simply
>> rebuilds them from scratch. If xfs_check reports an error, then run xfs_repair
>> to fix it up.
> OK, I can try this tonight...
This is on 2.6.22-rc5.
I hibernated last night and resumed this morning.
Before hibernating I froze and sync'ed. After resume I thawed it. (Sorry Dave)
Here are some photos of the screen during resume. This is not 100% reproducible
- it seems to occur only if the system is shut down for 30 mins or so.
Tejun, I wonder if error handling during resume is problematic? I got the same
errors in 2.6.21. I have never seen these (or any other libata) errors other
than during resume.
http://www.dgreaves.com/pub/2.6.22-rc5-resume-failure.jpg
That one is hard to read, so here's one from 2.6.21:
http://www.dgreaves.com/pub/2.6.21-resume-failure.jpg
I _think_ I've only seen the xfs problem when a resume shows these errors.
OK, to try to trigger the problem I ran make and got this back at once:
make: stat: Makefile: Input/output error
make: stat: clean: Input/output error
make: *** No rule to make target `clean'. Stop.
make: stat: GNUmakefile: Input/output error
make: stat: makefile: Input/output error
I caught the first dmesg output this time:
Filesystem "dm-0": XFS internal error xfs_btree_check_sblock at line 334 of file fs/xfs/xfs_btree.c. Caller 0xc01b58e1
[<c0104f6a>] show_trace_log_lvl+0x1a/0x30
[<c0105c52>] show_trace+0x12/0x20
[<c0105d15>] dump_stack+0x15/0x20
[<c01daddf>] xfs_error_report+0x4f/0x60
[<c01cd736>] xfs_btree_check_sblock+0x56/0xd0
[<c01b58e1>] xfs_alloc_lookup+0x181/0x390
[<c01b5b06>] xfs_alloc_lookup_le+0x16/0x20
[<c01b30c1>] xfs_free_ag_extent+0x51/0x690
[<c01b4ea4>] xfs_free_extent+0xa4/0xc0
[<c01bf739>] xfs_bmap_finish+0x119/0x170
[<c01e3f4a>] xfs_itruncate_finish+0x23a/0x3a0
[<c02046a2>] xfs_inactive+0x482/0x500
[<c0210ad4>] xfs_fs_clear_inode+0x34/0xa0
[<c017d777>] clear_inode+0x57/0xe0
[<c017d8e5>] generic_delete_inode+0xe5/0x110
[<c017da77>] generic_drop_inode+0x167/0x1b0
[<c017cedf>] iput+0x5f/0x70
[<c01735cf>] do_unlinkat+0xdf/0x140
[<c0173640>] sys_unlink+0x10/0x20
[<c01040a4>] syscall_call+0x7/0xb
=======================
xfs_force_shutdown(dm-0,0x8) called from line 4258 of file fs/xfs/xfs_bmap.c.
Return address = 0xc021101e
Filesystem "dm-0": Corruption of in-memory data detected. Shutting down
filesystem: dm-0
Please umount the filesystem, and rectify the problem(s)
So I cd'ed out of /scratch and unmounted it.
I then tried xfs_check:
haze:~# xfs_check /dev/video_vg/video_lv
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_check. If you are unable to mount the filesystem, then use
the xfs_repair -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
haze:~# mount /scratch/
haze:~# umount /scratch/
haze:~# xfs_check /dev/video_vg/video_lv
Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: Bad page state in process 'xfs_db'
Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: page:c1767bc0 flags:0x80010008 mapping:00000000 mapcount:-64 count:0
Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: Trying to fix it up, but a reboot is needed
Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: Backtrace:
Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: Bad page state in process 'syslogd'
Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: page:c1767cc0 flags:0x80010008 mapping:00000000 mapcount:-64 count:0
Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: Trying to fix it up, but a reboot is needed
Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: Backtrace:
Ugh. I tried again:
haze:~# xfs_check /dev/video_vg/video_lv
haze:~#
Whilst it was running, top reported this as roughly the peak memory usage:
8759 root 18 0 479m 474m 876 R 2.0 46.9 0:02.49 xfs_db
So it doesn't look like it ran out of memory (the machine has 1GB).
Dave, I ran xfs_check -v, but I gave up when it reached 122MB of bzip2-compressed
output with no sign of stopping. I still have it if it's any use.
Lots of:
setting block 0/0 to sb
setting block 0/1 to freelist
setting block 0/2 to freelist
setting block 0/3 to freelist
setting block 0/4 to freelist
setting block 0/75 to btbno
setting block 0/346901 to free1
setting block 0/346903 to free1
setting block 0/346904 to free1
setting block 0/346905 to free1
and stuff like this:
inode 128 mode 040777 fmt extents afmt extents nex 1 anex 0 nblk 1 sz 4096
inode 128 nlink 39 is dir
inode 128 extent [0,7,1,0]
I then rebooted and ran xfs_repair, which didn't report any damage.
David