From: David Greaves <david@dgreaves.com>
To: David Chinner <dgc@sgi.com>, Tejun Heo <htejun@gmail.com>
Cc: David Robinson <zxvdr.au@gmail.com>,
	LVM general discussion and development  <linux-lvm@redhat.com>,
	"'linux-kernel@vger.kernel.org'" <linux-kernel@vger.kernel.org>,
	xfs@oss.sgi.com, linux-pm <linux-pm@lists.osdl.org>,
	LinuxRaid <linux-raid@vger.kernel.org>,
	"Rafael J. Wysocki" <rjw@sisk.pl>
Subject: Re: [linux-lvm] 2.6.22-rc5 XFS fails after hibernate/resume
Date: Tue, 19 Jun 2007 10:24:23 +0100
Message-ID: <4677A0C7.4000306@dgreaves.com>
In-Reply-To: <4676D97E.4000403@dgreaves.com>

David Greaves wrote:
> I'm going to have to do some more testing...
done


> David Chinner wrote:
>> On Mon, Jun 18, 2007 at 08:49:34AM +0100, David Greaves wrote:
>>> David Greaves wrote:
>>> So doing:
>>> xfs_freeze -f /scratch
>>> sync
>>> echo platform > /sys/power/disk
>>> echo disk > /sys/power/state
>>> # resume
>>> xfs_freeze -u /scratch
>>>
>>> Works (for now - more usage testing tonight)
>>
>> Verrry interesting.
> Good :)
Now, not so good :)


>> What you were seeing was an XFS shutdown occurring because the free space
>> btree was corrupted. IOWs, the process of suspend/resume has resulted
>> in either bad data being written to disk, the correct data not being
>> written to disk or the cached block being corrupted in memory.
> That's the kind of thing I was suspecting, yes.
> 
>> If you run xfs_check on the filesystem after it has shut down after a resume,
>> can you tell us if it reports on-disk corruption? Note: do not run xfs_repair
>> to check this - it does not check the free space btrees; instead it simply
>> rebuilds them from scratch. If xfs_check reports an error, then run xfs_repair
>> to fix it up.
> OK, I can try this tonight...


This is on 2.6.22-rc5

So I hibernated last night and resumed this morning.
Before hibernating I froze and sync'ed. After resume I thawed it. (Sorry Dave)
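
For anyone wanting to repeat the sequence, here is a minimal sketch of the freeze / sync / hibernate / thaw steps quoted above, wrapped in a function. The helper names (run, write_sysfs, freeze_hibernate_thaw) and the DRY_RUN switch are illustrative assumptions, not part of any tool; the commands and sysfs paths are the ones from the quoted steps.

```shell
#!/bin/sh
# Sketch only: freeze an XFS mount, hibernate, then thaw after resume.
# DRY_RUN=1 prints each step instead of executing it (hypothetical switch).

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

write_sysfs() {
    # $1 = value to write, $2 = sysfs file
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "echo $1 > $2"
    else
        echo "$1" > "$2"
    fi
}

freeze_hibernate_thaw() {
    mnt="$1"
    # Stop here if the freeze fails; there is nothing to thaw in that case.
    run xfs_freeze -f "$mnt" || return 1
    run sync
    write_sysfs platform /sys/power/disk
    # When executed for real, this write returns only after the resume.
    write_sysfs disk /sys/power/state
    # Thaw unconditionally so the mount is not left frozen.
    run xfs_freeze -u "$mnt"
}
```

Running it as root with `freeze_hibernate_thaw /scratch` performs the steps; with `DRY_RUN=1` set it only prints them.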

Here are some photos of the screen during resume. This is not 100% reproducible 
- it seems to occur only if the system has been shut down for 30 minutes or so.

Tejun, I wonder if error handling during resume is problematic? I got the same 
errors in 2.6.21. I have never seen these (or any other libata) errors other 
than during resume.

http://www.dgreaves.com/pub/2.6.22-rc5-resume-failure.jpg
That one is hard to read; here's one from 2.6.21:
http://www.dgreaves.com/pub/2.6.21-resume-failure.jpg

I _think_ I've only seen the xfs problem when a resume shows these errors.


Ok, to try and cause a problem I ran a make and got this back at once:
make: stat: Makefile: Input/output error
make: stat: clean: Input/output error
make: *** No rule to make target `clean'.  Stop.
make: stat: GNUmakefile: Input/output error
make: stat: makefile: Input/output error


I caught the first dmesg this time:

Filesystem "dm-0": XFS internal error xfs_btree_check_sblock at line 334 of file fs/xfs/xfs_btree.c.  Caller 0xc01b58e1
  [<c0104f6a>] show_trace_log_lvl+0x1a/0x30
  [<c0105c52>] show_trace+0x12/0x20
  [<c0105d15>] dump_stack+0x15/0x20
  [<c01daddf>] xfs_error_report+0x4f/0x60
  [<c01cd736>] xfs_btree_check_sblock+0x56/0xd0
  [<c01b58e1>] xfs_alloc_lookup+0x181/0x390
  [<c01b5b06>] xfs_alloc_lookup_le+0x16/0x20
  [<c01b30c1>] xfs_free_ag_extent+0x51/0x690
  [<c01b4ea4>] xfs_free_extent+0xa4/0xc0
  [<c01bf739>] xfs_bmap_finish+0x119/0x170
  [<c01e3f4a>] xfs_itruncate_finish+0x23a/0x3a0
  [<c02046a2>] xfs_inactive+0x482/0x500
  [<c0210ad4>] xfs_fs_clear_inode+0x34/0xa0
  [<c017d777>] clear_inode+0x57/0xe0
  [<c017d8e5>] generic_delete_inode+0xe5/0x110
  [<c017da77>] generic_drop_inode+0x167/0x1b0
  [<c017cedf>] iput+0x5f/0x70
  [<c01735cf>] do_unlinkat+0xdf/0x140
  [<c0173640>] sys_unlink+0x10/0x20
  [<c01040a4>] syscall_call+0x7/0xb
  =======================
xfs_force_shutdown(dm-0,0x8) called from line 4258 of file fs/xfs/xfs_bmap.c.  Return address = 0xc021101e
Filesystem "dm-0": Corruption of in-memory data detected.  Shutting down filesystem: dm-0
Please umount the filesystem, and rectify the problem(s)
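
To catch this on later resumes without reading the whole log, a small filter along these lines might help. It is only a sketch: the function name xfs_shutdown_lines is made up for illustration, and the patterns are just the three message strings seen in the log above, so other XFS failures would not match.

```shell
#!/bin/sh
# Sketch only: print the lines of a kernel log (read from stdin) that match
# the XFS shutdown messages seen in this report.
xfs_shutdown_lines() {
    grep -E 'XFS internal error|xfs_force_shutdown|Corruption of in-memory data detected'
}
```

Typical use would be `dmesg | xfs_shutdown_lines` straight after a resume.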

so I cd'ed out of /scratch and umounted.

I then tried the xfs_check.

haze:~# xfs_check /dev/video_vg/video_lv
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_check.  If you are unable to mount the filesystem, then use
the xfs_repair -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
haze:~# mount /scratch/
haze:~# umount /scratch/
haze:~# xfs_check /dev/video_vg/video_lv

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: Bad page state in process 'xfs_db'

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: page:c1767bc0 flags:0x80010008 mapping:00000000 mapcount:-64 count:0

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: Trying to fix it up, but a reboot is needed

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: Backtrace:

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: Bad page state in process 'syslogd'

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: page:c1767cc0 flags:0x80010008 mapping:00000000 mapcount:-64 count:0

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: Trying to fix it up, but a reboot is needed

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: Backtrace:

ugh. Try again
haze:~# xfs_check /dev/video_vg/video_lv
haze:~#
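
The check-before-repair order Dave asked for could be scripted roughly as below. This is a sketch under two assumptions: that, as the xfs_check man page describes, any output in non-verbose mode means an inconsistency was found, and that the log has already been replayed by a mount/umount as above (otherwise the "needs to be replayed" message would also count as output). The function name and the XFS_CHECK override are hypothetical, the latter existing only so the logic can be exercised without a real device.

```shell
#!/bin/sh
# Sketch only: run xfs_check first and suggest xfs_repair only if it complains.
# XFS_CHECK is a hypothetical override for the checker command.
check_then_repair() {
    dev="$1"
    # Capture everything the checker prints; a clean run prints nothing.
    report=$("${XFS_CHECK:-xfs_check}" "$dev" 2>&1)
    if [ -z "$report" ]; then
        echo "clean: $dev"
    else
        printf '%s\n' "$report"
        echo "inconsistent: $dev (run xfs_repair $dev next)"
        return 1
    fi
}
```

Here that would be `check_then_repair /dev/video_vg/video_lv`, with the filesystem unmounted.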

While it was running, top reported this as roughly the peak memory usage:
  8759 root      18   0  479m 474m  876 R  2.0 46.9   0:02.49 xfs_db
so it looks like it didn't run out of memory (the machine has 1GB of RAM).

Dave, I ran xfs_check -v... but I got bored when it reached 122M of 
bz2-compressed output with no sign of stopping... I've still got it if it's any use...

lots of:
setting block 0/0 to sb
setting block 0/1 to freelist
setting block 0/2 to freelist
setting block 0/3 to freelist
setting block 0/4 to freelist
setting block 0/75 to btbno
setting block 0/346901 to free1
setting block 0/346903 to free1
setting block 0/346904 to free1
setting block 0/346905 to free1
   and stuff like this
inode 128 mode 040777 fmt extents afmt extents nex 1 anex 0 nblk 1 sz 4096
inode 128 nlink 39 is dir
inode 128 extent [0,7,1,0]

I then rebooted and ran a repair which didn't show any damage.

David


