From: David Greaves <david@dgreaves.com>
To: David Chinner <dgc@sgi.com>, Tejun Heo <htejun@gmail.com>
Cc: David Robinson <zxvdr.au@gmail.com>,
	LVM general discussion and development <linux-lvm@redhat.com>,
	"'linux-kernel@vger.kernel.org'" <linux-kernel@vger.kernel.org>,
	xfs@oss.sgi.com, linux-pm <linux-pm@lists.osdl.org>,
	LinuxRaid <linux-raid@vger.kernel.org>,
	"Rafael J. Wysocki" <rjw@sisk.pl>
Subject: Re: [linux-lvm] 2.6.22-rc5 XFS fails after hibernate/resume
Date: Tue, 19 Jun 2007 10:24:23 +0100
Message-ID: <4677A0C7.4000306@dgreaves.com>
In-Reply-To: <4676D97E.4000403@dgreaves.com>

David Greaves wrote:
> I'm going to have to do some more testing...
done


> David Chinner wrote:
>> On Mon, Jun 18, 2007 at 08:49:34AM +0100, David Greaves wrote:
>>> David Greaves wrote:
>>> So doing:
>>> xfs_freeze -f /scratch
>>> sync
>>> echo platform > /sys/power/disk
>>> echo disk > /sys/power/state
>>> # resume
>>> xfs_freeze -u /scratch
>>>
>>> Works (for now - more usage testing tonight)
>>
>> Verrry interesting.
> Good :)
Now, not so good :)


>> What you were seeing was an XFS shutdown occurring because the free space
>> btree was corrupted. IOWs, the process of suspend/resume has resulted
>> in either bad data being written to disk, the correct data not being
>> written to disk or the cached block being corrupted in memory.
> That's the kind of thing I was suspecting, yes.
> 
>> If you run xfs_check on the filesystem after it has shut down after a
>> resume, can you tell us if it reports on-disk corruption? Note: do not
>> run xfs_repair to check this - it does not check the free space btrees;
>> instead it simply rebuilds them from scratch. If xfs_check reports an
>> error, then run xfs_repair to fix it up.
> OK, I can try this tonight...


This is on 2.6.22-rc5

So I hibernated last night and resumed this morning.
Before hibernating I froze and sync'ed. After resume I thawed it. (Sorry Dave)
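
For reference, the sequence I used is the one quoted above. A minimal sketch of it as a script follows; the names MOUNTPOINT, DRY_RUN, run and hibernate_with_freeze are only illustrative, and by default it just prints the commands rather than running them:

```shell
#!/bin/sh
# Sketch of the freeze / sync / hibernate / thaw sequence quoted above.
# Assumptions: MOUNTPOINT, DRY_RUN, run() and hibernate_with_freeze() are
# illustrative names. With DRY_RUN=1 (the default) nothing is executed.
MOUNTPOINT="${MOUNTPOINT:-/scratch}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        sh -c "$*"
    fi
}

hibernate_with_freeze() {
    run "xfs_freeze -f $MOUNTPOINT"       # freeze the filesystem
    run "sync"
    run "echo platform > /sys/power/disk"
    run "echo disk > /sys/power/state"    # the machine hibernates here
    # Execution continues at this point after resume.
    run "xfs_freeze -u $MOUNTPOINT"       # thaw the filesystem
}
```

Calling hibernate_with_freeze with DRY_RUN=0 as root would actually hibernate the machine, so the dry run is worth reading first.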

Here are some photos of the screen during resume. This is not 100% reproducible 
- it seems to occur only if the system is shut down for 30 minutes or so.

Tejun, I wonder if error handling during resume is problematic? I got the same 
errors in 2.6.21. I have never seen these (or any other libata) errors other 
than during resume.

http://www.dgreaves.com/pub/2.6.22-rc5-resume-failure.jpg
(hard to read, here's one from 2.6.21:
http://www.dgreaves.com/pub/2.6.21-resume-failure.jpg )

I _think_ I've only seen the xfs problem when a resume shows these errors.


Ok, to try and cause a problem I ran a make and got this back at once:
make: stat: Makefile: Input/output error
make: stat: clean: Input/output error
make: *** No rule to make target `clean'.  Stop.
make: stat: GNUmakefile: Input/output error
make: stat: makefile: Input/output error


I caught the first dmesg this time:

Filesystem "dm-0": XFS internal error xfs_btree_check_sblock at line 334 of file fs/xfs/xfs_btree.c.  Caller 0xc01b58e1
  [<c0104f6a>] show_trace_log_lvl+0x1a/0x30
  [<c0105c52>] show_trace+0x12/0x20
  [<c0105d15>] dump_stack+0x15/0x20
  [<c01daddf>] xfs_error_report+0x4f/0x60
  [<c01cd736>] xfs_btree_check_sblock+0x56/0xd0
  [<c01b58e1>] xfs_alloc_lookup+0x181/0x390
  [<c01b5b06>] xfs_alloc_lookup_le+0x16/0x20
  [<c01b30c1>] xfs_free_ag_extent+0x51/0x690
  [<c01b4ea4>] xfs_free_extent+0xa4/0xc0
  [<c01bf739>] xfs_bmap_finish+0x119/0x170
  [<c01e3f4a>] xfs_itruncate_finish+0x23a/0x3a0
  [<c02046a2>] xfs_inactive+0x482/0x500
  [<c0210ad4>] xfs_fs_clear_inode+0x34/0xa0
  [<c017d777>] clear_inode+0x57/0xe0
  [<c017d8e5>] generic_delete_inode+0xe5/0x110
  [<c017da77>] generic_drop_inode+0x167/0x1b0
  [<c017cedf>] iput+0x5f/0x70
  [<c01735cf>] do_unlinkat+0xdf/0x140
  [<c0173640>] sys_unlink+0x10/0x20
  [<c01040a4>] syscall_call+0x7/0xb
  =======================
xfs_force_shutdown(dm-0,0x8) called from line 4258 of file fs/xfs/xfs_bmap.c.  Return address = 0xc021101e
Filesystem "dm-0": Corruption of in-memory data detected.  Shutting down filesystem: dm-0
Please umount the filesystem, and rectify the problem(s)
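
In case it helps anyone reading the trace, the call chain can be reduced to bare symbol names with a small filter. This is only a sketch: trace_symbols is a made-up helper name, and it assumes the trace lines have the `[<address>] symbol+0xoff/0xlen` shape shown above.

```shell
#!/bin/sh
# Sketch: print only the symbol names from a kernel stack trace on stdin.
# Assumption: trace_symbols is an illustrative name, and each trace line
# looks like "  [<c01cd736>] xfs_btree_check_sblock+0x56/0xd0".
# Lines that do not match that shape are dropped.
trace_symbols() {
    sed -n 's/^ *\[<[0-9a-f]*>] \([A-Za-z0-9_]*\)+0x.*/\1/p'
}
```

For example, `dmesg | trace_symbols` would list the functions from show_trace_log_lvl down to syscall_call, one per line, for the trace above.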

so I cd'ed out of /scratch and umounted.

I then tried the xfs_check.

haze:~# xfs_check /dev/video_vg/video_lv
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_check.  If you are unable to mount the filesystem, then use
the xfs_repair -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
haze:~# mount /scratch/
haze:~# umount /scratch/
haze:~# xfs_check /dev/video_vg/video_lv

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: Bad page state in process 'xfs_db'

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: page:c1767bc0 flags:0x80010008 mapping:00000000 mapcount:-64 count:0

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: Trying to fix it up, but a reboot is needed

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: Backtrace:

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: Bad page state in process 'syslogd'

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: page:c1767cc0 flags:0x80010008 mapping:00000000 mapcount:-64 count:0

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: Trying to fix it up, but a reboot is needed

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: Backtrace:

ugh. Try again
haze:~# xfs_check /dev/video_vg/video_lv
haze:~#
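
The order that worked, as the xfs_check error message asks for, was: mount to replay the log, unmount, then check. A minimal sketch of that, where DEVICE, MOUNTPOINT, DRY_RUN, step and check_after_replay are illustrative names and the default is to print the commands only:

```shell
#!/bin/sh
# Sketch of the log-replay-then-check order from the transcript above.
# Assumptions: DEVICE, MOUNTPOINT, DRY_RUN, step() and check_after_replay()
# are illustrative names. With DRY_RUN=1 (the default) nothing is executed.
DEVICE="${DEVICE:-/dev/video_vg/video_lv}"
MOUNTPOINT="${MOUNTPOINT:-/scratch}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
step() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

check_after_replay() {
    # Mounting replays the log; the check then runs on the unmounted device.
    # Each step only runs if the previous one succeeded.
    step mount "$MOUNTPOINT" &&
    step umount "$MOUNTPOINT" &&
    step xfs_check "$DEVICE"
}
```

The mount here relies on an fstab entry for the mount point, as in the transcript.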

While it was running, top reported this as roughly the peak memory usage:
  8759 root      18   0  479m 474m  876 R  2.0 46.9   0:02.49 xfs_db
so it looks like it didn't run out of memory (the machine has 1GB).

Dave, I ran xfs_check -v... but I got bored when it reached 122M of bz2 
compressed output with no sign of stopping... still got it if it's any use...

lots of:
setting block 0/0 to sb
setting block 0/1 to freelist
setting block 0/2 to freelist
setting block 0/3 to freelist
setting block 0/4 to freelist
setting block 0/75 to btbno
setting block 0/346901 to free1
setting block 0/346903 to free1
setting block 0/346904 to free1
setting block 0/346905 to free1
   and stuff like this
inode 128 mode 040777 fmt extents afmt extents nex 1 anex 0 nblk 1 sz 4096
inode 128 nlink 39 is dir
inode 128 extent [0,7,1,0]

I then rebooted and ran a repair which didn't show any damage.

David

