Oops while going into hibernate

All of lore.kernel.org
 help / color / mirror / Atom feed

* Oops while going into hibernate
@ 2011-01-12 13:48 Sebastian Ott
  2011-01-12 16:26 ` Ted Ts'o
  0 siblings, 1 reply; 34+ messages in thread
From: Sebastian Ott @ 2011-01-12 13:48 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-ext4, linux-kernel

Hi,

while goning into hibernation right after resuming with
2.6.37-05668-g4162cf6 on S390 I get this one:

[  270.317785] PM: Basic memory bitmaps created
[  270.317789] PM: Syncing filesystems ...
[  270.318854] Unable to handle kernel pointer dereference at virtual kernel address           (null)
[  270.318865] Oops: 0004 [#1] PREEMPT SMP
[  270.318878] Modules linked in: binfmt_misc dm_multipath scsi_dh vmur [last unloaded: scsi_wait_scan]
[  270.318900] CPU: 0 Not tainted 2.6.37-05668-g4162cf6 #19
[  270.318907] Process flush-94:0 (pid: 1226, task: 000000003f0a0040, ksp: 000000003d75fc70)
[  270.318916] Krnl PSW : 0704000180000000 0000000000206662 (jbd2_journal_file_inode+0xfe/0x148)
[  270.318938]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0 EA:3
[  270.318948] Krnl GPRS: ffe62f366dac01a2 0000000000000000 0000000000000000 00000000007e4260
[  270.318961]            00000000002065dc 6880000000000000 000000003f234d60 0000000000000005
[  270.318971]            000000003f6e15a8 000000003d75fa88 000000003ed59228 000000003ed58c60
[  270.319105]            0000000000000000 000000003f036488 00000000002065dc 000000003d75f9a0
[  270.319123] Krnl Code: 0000000000206656: b9140022            lgfr    %r2,%r2
[  270.319133]            000000000020665a: 07f4                bcr     15,%r4
[  270.319145]            000000000020665c: ec260023007c        cgij    %r2,0,6,2066a2
[  270.319158]           >0000000000206662: e3d0c0000024        stg     %r13,0(%r12)
[  270.319181]            0000000000206668: 4110c010            la      %r1,16(%r12)
[  270.319198]            000000000020666c: e320d0600004        lg      %r2,96(%r13)
[  270.319241]            0000000000206672: e31020080024        stg     %r1,8(%r2)
[  270.319260]            0000000000206678: e320c0100024        stg     %r2,16(%r12)
[  270.319281] Call Trace:
[  270.319289] ([<00000000002065dc>] jbd2_journal_file_inode+0x78/0x148)
[  270.319303]  [<00000000001d0f66>] mpage_da_map_and_submit+0x2ea/0x440
[  270.319319]  [<00000000001d197a>] ext4_da_writepages+0x32e/0x744
[  270.319334]  [<000000000016fe64>] writeback_single_inode+0xd4/0x25c
[  270.319391]  [<0000000000170514>] writeback_sb_inodes+0xd0/0x160
[  270.319396]  [<0000000000171278>] writeback_inodes_wb+0x90/0x16c
[  270.319408]  [<00000000001715fa>] wb_writeback+0x2a6/0x480
[  270.319414]  [<00000000001718b6>] wb_do_writeback+0xe2/0x294
[  270.319419]  [<0000000000171b14>] bdi_writeback_thread+0xac/0x2fc
[  270.319424]  [<000000000007d192>] kthread+0xbe/0xc8
[  270.319457]  [<000000000047910a>] kernel_thread_starter+0x6/0xc
[  270.319467]  [<0000000000479104>] kernel_thread_starter+0x0/0xc

bisecting points to:

commit 8aefcd557d26d0023a36f9ec5afbf55e59f8f26b
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Mon Jan 10 12:29:43 2011 -0500

    ext4: dynamically allocate the jbd2_inode in ext4_inode_info as necessary

    Replace the jbd2_inode structure (which is 48 bytes) with a pointer
    and only allocate the jbd2_inode when it is needed --- that is, when
    the file system has a journal present and the inode has been opened
    for writing.  This allows us to further slim down the ext4_inode_info
    structure.

Regards,
Sebastian

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Oops while going into hibernate
  2011-01-12 13:48 Oops while going into hibernate Sebastian Ott
@ 2011-01-12 16:26 ` Ted Ts'o
  2011-01-12 16:56   ` Sebastian Ott
  0 siblings, 1 reply; 34+ messages in thread
From: Ted Ts'o @ 2011-01-12 16:26 UTC (permalink / raw)
  To: Sebastian Ott; +Cc: linux-ext4, linux-kernel

On Wed, Jan 12, 2011 at 02:48:58PM +0100, Sebastian Ott wrote:
> Hi,
> 
> while goning into hibernation right after resuming with
> 2.6.37-05668-g4162cf6 on S390 I get this one:

Hmm... what are you hibernating to?  A swap device?  A file-backed
swap device?

Thanks,

						- Ted

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Oops while going into hibernate
  2011-01-12 16:26 ` Ted Ts'o
@ 2011-01-12 16:56   ` Sebastian Ott
  2011-01-12 17:26     ` Ted Ts'o
  0 siblings, 1 reply; 34+ messages in thread
From: Sebastian Ott @ 2011-01-12 16:56 UTC (permalink / raw)
  To: Ted Ts'o; +Cc: linux-ext4, linux-kernel



On Wed, 12 Jan 2011, Ted Ts'o wrote:
> On Wed, Jan 12, 2011 at 02:48:58PM +0100, Sebastian Ott wrote:
> > Hi,
> > 
> > while goning into hibernation right after resuming with
> > 2.6.37-05668-g4162cf6 on S390 I get this one:
> 
> Hmm... what are you hibernating to?  A swap device?  A file-backed
> swap device?
swap device.

Regards,
Sebastian

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Oops while going into hibernate
  2011-01-12 16:56   ` Sebastian Ott
@ 2011-01-12 17:26     ` Ted Ts'o
  2011-01-12 17:37       ` Sebastian Ott
  2011-01-12 18:49       ` Sebastian Ott
  0 siblings, 2 replies; 34+ messages in thread
From: Ted Ts'o @ 2011-01-12 17:26 UTC (permalink / raw)
  To: Sebastian Ott; +Cc: linux-ext4, linux-kernel

Since I don't have a machine set up to test hibernation easily at
hand, I'd really appreciate it if you could try this patch to
determine which inode had the NULL jinode --- and then once you get
the device and inode number, to use debugfs's "ncheck" command to map
the inode number to a pathname.

If you could do that, it would be a huge help.

Thanks, regards,

						- Ted

P.S.  Also, if you could try suspending once or twice, with different
programs running, to see if the inode number and pathname are constant
or vary, that would also be helpful.

diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index d8b992e..7d6d7d7 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -252,8 +252,15 @@ static inline int ext4_journal_force_commit(journal_t *journal)

 static inline int ext4_jbd2_file_inode(handle_t *handle, struct inode *inode)
 {
-	if (ext4_handle_valid(handle))
+	if (ext4_handle_valid(handle)) {
+		if (unlikely(EXT4_I(inode)->jinode == NULL)) {
+			/* Should never happen */
+			ext4_msg(inode->i_sb, KERN_CRIT, 
+				 "inode #%lu has NULL jinode", inode->i_ino);
+			BUG();
+		}
 		return jbd2_journal_file_inode(handle, EXT4_I(inode)->jinode);
+	}
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: Oops while going into hibernate
  2011-01-12 17:26     ` Ted Ts'o
@ 2011-01-12 17:37       ` Sebastian Ott
  2011-01-12 18:49       ` Sebastian Ott
  1 sibling, 0 replies; 34+ messages in thread
From: Sebastian Ott @ 2011-01-12 17:37 UTC (permalink / raw)
  To: Ted Ts'o; +Cc: linux-ext4, linux-kernel


On Wed, 12 Jan 2011, Ted Ts'o wrote:
> Since I don't have a machine set up to test hibernation easily at
> hand, I'd really appreciate it if you could try this patch to
> determine which inode had the NULL jinode --- and then once you get
> the device and inode number, to use debugfs's "ncheck" command to map
> the inode number to a pathname.
> 
> If you could do that, it would be a huge help.
ok, will do

> 
> Thanks, regards,
> 
> 						- Ted
> 
> P.S.  Also, if you could try suspending once or twice, with different
> programs running, to see if the inode number and pathname are constant
> or vary, that would also be helpful.
> 
> diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
> index d8b992e..7d6d7d7 100644
> --- a/fs/ext4/ext4_jbd2.h
> +++ b/fs/ext4/ext4_jbd2.h
> @@ -252,8 +252,15 @@ static inline int ext4_journal_force_commit(journal_t *journal)
> 
>  static inline int ext4_jbd2_file_inode(handle_t *handle, struct inode *inode)
>  {
> -	if (ext4_handle_valid(handle))
> +	if (ext4_handle_valid(handle)) {
> +		if (unlikely(EXT4_I(inode)->jinode == NULL)) {
> +			/* Should never happen */
> +			ext4_msg(inode->i_sb, KERN_CRIT, 
> +				 "inode #%lu has NULL jinode", inode->i_ino);
> +			BUG();
> +		}
>  		return jbd2_journal_file_inode(handle, EXT4_I(inode)->jinode);
> +	}
>  	return 0;
>  }
> 
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Oops while going into hibernate
  2011-01-12 17:26     ` Ted Ts'o
  2011-01-12 17:37       ` Sebastian Ott
@ 2011-01-12 18:49       ` Sebastian Ott
  2011-01-13  0:44         ` Theodore Tso
  2011-01-13  0:44         ` Theodore Tso
  1 sibling, 2 replies; 34+ messages in thread
From: Sebastian Ott @ 2011-01-12 18:49 UTC (permalink / raw)
  To: Ted Ts'o; +Cc: linux-ext4, linux-kernel

Hi,
On Wed, 12 Jan 2011, Ted Ts'o wrote:
> Since I don't have a machine set up to test hibernation easily at
> hand, I'd really appreciate it if you could try this patch to
> determine which inode had the NULL jinode --- and then once you get
> the device and inode number, to use debugfs's "ncheck" command to map
> the inode number to a pathname.
> 
> If you could do that, it would be a huge help.

I did about 5 re-tests with different loads, the result was always:

[  249.238697] EXT4-fs (dasda1): inode #1106 has NULL jinode
[  249.238732] ------------[ cut here ]------------
[  249.238738] kernel BUG at fs/ext4/ext4_jbd2.h:260!
[  249.238747] illegal operation: 0001 [#1] PREEMPT SMP 
[  249.238760] Modules linked in: binfmt_misc dm_multipath scsi_dh vmur [last unloaded: scsi_wait_scan]
[  249.238786] CPU: 2 Not tainted 2.6.37-05668-g4162cf6-dirty #20
[  249.238794] Process flush-94:0 (pid: 1244, task: 000000003f110040, ksp: 000000003dca3c70)
[  249.238803] Krnl PSW : 0704000180000000 00000000001d1130 (mpage_da_map_and_submit+0x464/0x468)
[  249.238824]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0 EA:3
[  249.238835] Krnl GPRS: 000000000000c65f 0000000000000000 0000000000000043 0000000000000000
[  249.238848]            00000000004735c0 0000000000000000 000000003ee45478 0000000000000001
[  249.238863]            0000000035544aa8 000000003dca3a88 0000000000000004 000000003d29d680
[  249.238868]            000000003dca3b80 000000000049ef98 00000000001d112c 000000003dca39d8
[  249.238886] Krnl Code: 00000000001d1120: c040001e01c2        larl    %r4,5914a4
[  249.238893]            00000000001d1126: c0e5000092ed        brasl   %r14,1e3700
[  249.238899]            00000000001d112c: a7f40001            brc     15,1d112e
[  249.238906]           >00000000001d1130: a7f40000            brc     15,1d1130
[  249.238913]            00000000001d1134: e3e0f0080024        stg     %r14,8(%r15)
[  249.238920]            00000000001d113a: c0f400000006        brcl    15,1d1146
[  249.238942]            00000000001d1140: b9040000            lgr     %r0,%r0
[  249.238963]            00000000001d1144: 0de1                basr    %r14,%r1
[  249.238989] Call Trace:
[  249.238998] ([<00000000001d112c>] mpage_da_map_and_submit+0x460/0x468)
[  249.239012]  [<00000000001d19f2>] ext4_da_writepages+0x32e/0x744
[  249.239028]  [<000000000016fe64>] writeback_single_inode+0xd4/0x25c
[  249.239044]  [<0000000000170514>] writeback_sb_inodes+0xd0/0x160
[  249.239050]  [<0000000000171278>] writeback_inodes_wb+0x90/0x16c
[  249.239055]  [<00000000001715fa>] wb_writeback+0x2a6/0x480
[  249.239060]  [<00000000001718b6>] wb_do_writeback+0xe2/0x294
[  249.239066]  [<0000000000171b14>] bdi_writeback_thread+0xac/0x2fc
[  249.239072]  [<000000000007d192>] kthread+0xbe/0xc8
[  249.239087]  [<0000000000479182>] kernel_thread_starter+0x6/0xc
[  249.239101]  [<000000000047917c>] kernel_thread_starter+0x0/0xc
[  249.239112] INFO: lockdep is turned off.
[  249.239119] Last Breaking-Event-Address:
[  249.239126]  [<00000000001d112c>] mpage_da_map_and_submit+0x460/0x468


# debugfs /dev/dasda1
debugfs 1.41.10 (10-Feb-2009)
debugfs:  ncheck 1106
Inode   Pathname
1106    /usr/bin/killall
debugfs:  q

there are also a few messages like:
EXT4-fs error (device dasda1): ext4_lookup:1043: inode #130849: comm find: deleted inode referenced: 129148
I guess they appeared after one of the crashes during bisecting (can't 
run fsck right now, since its my /)

Regards,
Sebastian

> 
> Thanks, regards,
> 
> 						- Ted
> 
> P.S.  Also, if you could try suspending once or twice, with different
> programs running, to see if the inode number and pathname are constant
> or vary, that would also be helpful.
> 
> diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
> index d8b992e..7d6d7d7 100644
> --- a/fs/ext4/ext4_jbd2.h
> +++ b/fs/ext4/ext4_jbd2.h
> @@ -252,8 +252,15 @@ static inline int ext4_journal_force_commit(journal_t *journal)
> 
>  static inline int ext4_jbd2_file_inode(handle_t *handle, struct inode *inode)
>  {
> -	if (ext4_handle_valid(handle))
> +	if (ext4_handle_valid(handle)) {
> +		if (unlikely(EXT4_I(inode)->jinode == NULL)) {
> +			/* Should never happen */
> +			ext4_msg(inode->i_sb, KERN_CRIT, 
> +				 "inode #%lu has NULL jinode", inode->i_ino);
> +			BUG();
> +		}
>  		return jbd2_journal_file_inode(handle, EXT4_I(inode)->jinode);
> +	}
>  	return 0;
>  }
> 
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Oops while going into hibernate
  2011-01-12 18:49       ` Sebastian Ott
@ 2011-01-13  0:44         ` Theodore Tso
  2011-01-13  5:56           ` Ted Ts'o
                             ` (5 more replies)
  2011-01-13  0:44         ` Theodore Tso
  1 sibling, 6 replies; 34+ messages in thread
From: Theodore Tso @ 2011-01-13  0:44 UTC (permalink / raw)
  To: Sebastian Ott
  Cc: linux-ext4@vger.kernel.org development, LKML Kernel, pm list

On Jan 12, 2011, at 1:49 PM, Sebastian Ott wrote:

> 
> I did about 5 re-tests with different loads, the result was always:
> 
> [  249.238697] EXT4-fs (dasda1): inode #1106 has NULL jinode
> 
> # debugfs /dev/dasda1
> debugfs 1.41.10 (10-Feb-2009)
> debugfs:  ncheck 1106
> Inode   Pathname
> 1106    /usr/bin/killall
> debugfs:  q
> 

That looks really bogus.  /usr/bin/killall is a system binary, and there's no good reason that hibernation should be trying to write pages to that binary.

You said originally that the oops was happening "while going into hibernation right after resuming with...".   So that means you did a successful suspend/resume, and then the second suspend caused the oops?   It looks like somehow the pages were left marked as dirty, so the writeback daemons attempted writing back a page to an inode which was never opened read/write (and in fact as a text page for /usr/bin/killall, was mapped read/only).   Given that ext4 initializes jinode only when the file is opened read/write, the fact that it is null, and the fact that it makes no sense that a program would be modifying /usr/bin/killall as part of a suspend/resume, it looks very much like we just unmasked a software suspend bug....

-- Ted

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Oops while going into hibernate
  2011-01-13  0:44         ` Theodore Tso
@ 2011-01-13  5:56           ` Ted Ts'o
  2011-01-13  5:59             ` [PATCH] PM / Hibernate: Don't mark pages dirty when reading pages while thawing Theodore Ts'o
  2011-01-13  5:59             ` Theodore Ts'o
  2011-01-13  5:56           ` Oops while going into hibernate Ted Ts'o
                             ` (4 subsequent siblings)
  5 siblings, 2 replies; 34+ messages in thread
From: Ted Ts'o @ 2011-01-13  5:56 UTC (permalink / raw)
  To: Sebastian Ott
  Cc: linux-ext4@vger.kernel.org development, LKML Kernel, pm list

On Wed, Jan 12, 2011 at 07:44:17PM -0500, Theodore Tso wrote:
> 
> You said originally that the oops was happening "while going into
> hibernation right after resuming with...".  So that means you did a
> successful suspend/resume, and then the second suspend caused the
> oops?  It looks like somehow the pages were left marked as dirty, so
> the writeback daemons attempted writing back a page to an inode
> which was never opened read/write (and in fact as a text page for
> /usr/bin/killall, was mapped read/only).  Given that ext4
> initializes jinode only when the file is opened read/write, the fact
> that it is null, and the fact that it makes no sense that a program
> would be modifying /usr/bin/killall as part of a suspend/resume, it
> looks very much like we just unmasked a software suspend bug....

... and I think I've found the problem.  In kernel/power/block_io.c,
in the function submit(), we see this:

	if (bio_chain == NULL) {
		submit_bio(bio_rw, bio);
		wait_on_page_locked(page);
		if (rw == READ)
			bio_set_pages_dirty(bio); <====
		bio_put(bio);

So when we read in pages from the software suspend device, we end up
marking the pages as dirty(!).  I'm guessing this was caused by a copy
and paste from the only other caller of bio_set_pages_dirty(), which
is the direct I/O code, which needs this when we are writing from a
file into a user-provided buffer.  But for restoring from a software
suspend case, this is as far as I can tell wholely inappropriate.

This causes needless writes, which is bad even before ext4 unmasked
the problem.  I will send a patch under separate cover; could you give
it a try and see if it fixes your crash?

I will look into bulletproofing ext4 by adding checks for this case
and printing warning messages, but neverthe less, I think the root
cause is actually in the hibernation's bio code.

					- Ted

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH] PM / Hibernate: Don't mark pages dirty when reading pages while thawing
  2011-01-13  5:56           ` Ted Ts'o
@ 2011-01-13  5:59             ` Theodore Ts'o
  2011-01-13 12:36               ` Sebastian Ott
  2011-01-13 12:36               ` Sebastian Ott
  2011-01-13  5:59             ` Theodore Ts'o
  1 sibling, 2 replies; 34+ messages in thread
From: Theodore Ts'o @ 2011-01-13  5:59 UTC (permalink / raw)
  To: Ext4 Developers List
  Cc: Linux Kernel Developers List, Theodore Ts'o, Sebastian Ott,
	linux-pm

Everything was sync'ed before the hibernation, so no pages could be
dirty.  So this causes a lot of wasted I/O activity right after
resuming from hibernation.

Worse, it also causes pages from files that were opened read/only to
be marked writeble which makes them subject to writeback.  This was
discovered when ext4 was changed to so that the jinode pointer was not
initialized unless the file was opened read/write, and this caused
things to blow up.  But that just unmasked a problem, since the pages
belonging to the file in question should have never been marked dirty
in the first place.  It increases the chances the text blocks for
executables like /usr/bin/killall will get corrupted when they are
needlessly written, and of course it means extra write cycles to the
SSD.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: linux-pm@lists.linux-foundation.org
---
 kernel/power/block_io.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/kernel/power/block_io.c b/kernel/power/block_io.c
index 83bbc7c..108a4f3 100644
--- a/kernel/power/block_io.c
+++ b/kernel/power/block_io.c
@@ -49,8 +49,6 @@ static int submit(int rw, struct block_device *bdev, sector_t sector,
 	if (bio_chain == NULL) {
 		submit_bio(bio_rw, bio);
 		wait_on_page_locked(page);
-		if (rw == READ)
-			bio_set_pages_dirty(bio);
 		bio_put(bio);
 	} else {
 		if (rw == READ)
-- 
1.7.3.1

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH] PM / Hibernate: Don't mark pages dirty when reading pages while thawing
  2011-01-13  5:59             ` [PATCH] PM / Hibernate: Don't mark pages dirty when reading pages while thawing Theodore Ts'o
@ 2011-01-13 12:36               ` Sebastian Ott
  2011-01-13 12:36               ` Sebastian Ott
  1 sibling, 0 replies; 34+ messages in thread
From: Sebastian Ott @ 2011-01-13 12:36 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Ext4 Developers List, Linux Kernel Developers List, linux-pm

Hi,

On Thu, 13 Jan 2011, Theodore Ts'o wrote:
> Everything was sync'ed before the hibernation, so no pages could be
> dirty.  So this causes a lot of wasted I/O activity right after
> resuming from hibernation.
> 
> Worse, it also causes pages from files that were opened read/only to
> be marked writeble which makes them subject to writeback.  This was
> discovered when ext4 was changed to so that the jinode pointer was not
> initialized unless the file was opened read/write, and this caused
> things to blow up.  But that just unmasked a problem, since the pages
> belonging to the file in question should have never been marked dirty
> in the first place.  It increases the chances the text blocks for
> executables like /usr/bin/killall will get corrupted when they are
> needlessly written, and of course it means extra write cycles to the
> SSD.
> 
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
> Cc: linux-pm@lists.linux-foundation.org
> ---
>  kernel/power/block_io.c |    2 --
>  1 files changed, 0 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/power/block_io.c b/kernel/power/block_io.c
> index 83bbc7c..108a4f3 100644
> --- a/kernel/power/block_io.c
> +++ b/kernel/power/block_io.c
> @@ -49,8 +49,6 @@ static int submit(int rw, struct block_device *bdev, sector_t sector,
>  	if (bio_chain == NULL) {
>  		submit_bio(bio_rw, bio);
>  		wait_on_page_locked(page);
> -		if (rw == READ)
> -			bio_set_pages_dirty(bio);
>  		bio_put(bio);
>  	} else {
>  		if (rw == READ)
> -- 
> 1.7.3.1
> 
> 

I did some test with this patch applied, but sadly it didn't help.
The testcase was reduced to one hibernation followed by a sync.

Regards,
Sebastian

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] PM / Hibernate: Don't mark pages dirty when reading pages while thawing
  2011-01-13  5:59             ` [PATCH] PM / Hibernate: Don't mark pages dirty when reading pages while thawing Theodore Ts'o
  2011-01-13 12:36               ` Sebastian Ott
@ 2011-01-13 12:36               ` Sebastian Ott
  1 sibling, 0 replies; 34+ messages in thread
From: Sebastian Ott @ 2011-01-13 12:36 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: linux-pm, Ext4 Developers List, Linux Kernel Developers List

Hi,

On Thu, 13 Jan 2011, Theodore Ts'o wrote:
> Everything was sync'ed before the hibernation, so no pages could be
> dirty.  So this causes a lot of wasted I/O activity right after
> resuming from hibernation.
> 
> Worse, it also causes pages from files that were opened read/only to
> be marked writeble which makes them subject to writeback.  This was
> discovered when ext4 was changed to so that the jinode pointer was not
> initialized unless the file was opened read/write, and this caused
> things to blow up.  But that just unmasked a problem, since the pages
> belonging to the file in question should have never been marked dirty
> in the first place.  It increases the chances the text blocks for
> executables like /usr/bin/killall will get corrupted when they are
> needlessly written, and of course it means extra write cycles to the
> SSD.
> 
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
> Cc: linux-pm@lists.linux-foundation.org
> ---
>  kernel/power/block_io.c |    2 --
>  1 files changed, 0 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/power/block_io.c b/kernel/power/block_io.c
> index 83bbc7c..108a4f3 100644
> --- a/kernel/power/block_io.c
> +++ b/kernel/power/block_io.c
> @@ -49,8 +49,6 @@ static int submit(int rw, struct block_device *bdev, sector_t sector,
>  	if (bio_chain == NULL) {
>  		submit_bio(bio_rw, bio);
>  		wait_on_page_locked(page);
> -		if (rw == READ)
> -			bio_set_pages_dirty(bio);
>  		bio_put(bio);
>  	} else {
>  		if (rw == READ)
> -- 
> 1.7.3.1
> 
> 

I did some test with this patch applied, but sadly it didn't help.
The testcase was reduced to one hibernation followed by a sync.

Regards,
Sebastian

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH] PM / Hibernate: Don't mark pages dirty when reading pages while thawing
  2011-01-13  5:56           ` Ted Ts'o
  2011-01-13  5:59             ` [PATCH] PM / Hibernate: Don't mark pages dirty when reading pages while thawing Theodore Ts'o
@ 2011-01-13  5:59             ` Theodore Ts'o
  1 sibling, 0 replies; 34+ messages in thread
From: Theodore Ts'o @ 2011-01-13  5:59 UTC (permalink / raw)
  To: Ext4 Developers List
  Cc: linux-pm, Theodore Ts'o, Linux Kernel Developers List

Everything was sync'ed before the hibernation, so no pages could be
dirty.  So this causes a lot of wasted I/O activity right after
resuming from hibernation.

Worse, it also causes pages from files that were opened read/only to
be marked writeble which makes them subject to writeback.  This was
discovered when ext4 was changed to so that the jinode pointer was not
initialized unless the file was opened read/write, and this caused
things to blow up.  But that just unmasked a problem, since the pages
belonging to the file in question should have never been marked dirty
in the first place.  It increases the chances the text blocks for
executables like /usr/bin/killall will get corrupted when they are
needlessly written, and of course it means extra write cycles to the
SSD.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: linux-pm@lists.linux-foundation.org
---
 kernel/power/block_io.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/kernel/power/block_io.c b/kernel/power/block_io.c
index 83bbc7c..108a4f3 100644
--- a/kernel/power/block_io.c
+++ b/kernel/power/block_io.c
@@ -49,8 +49,6 @@ static int submit(int rw, struct block_device *bdev, sector_t sector,
 	if (bio_chain == NULL) {
 		submit_bio(bio_rw, bio);
 		wait_on_page_locked(page);
-		if (rw == READ)
-			bio_set_pages_dirty(bio);
 		bio_put(bio);
 	} else {
 		if (rw == READ)
-- 
1.7.3.1

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: Oops while going into hibernate
  2011-01-13  0:44         ` Theodore Tso
  2011-01-13  5:56           ` Ted Ts'o
@ 2011-01-13  5:56           ` Ted Ts'o
  2011-01-13 11:12           ` [linux-pm] " Bojan Smojver
                             ` (3 subsequent siblings)
  5 siblings, 0 replies; 34+ messages in thread
From: Ted Ts'o @ 2011-01-13  5:56 UTC (permalink / raw)
  To: Sebastian Ott
  Cc: pm list, linux-ext4@vger.kernel.org development, LKML Kernel

On Wed, Jan 12, 2011 at 07:44:17PM -0500, Theodore Tso wrote:
> 
> You said originally that the oops was happening "while going into
> hibernation right after resuming with...".  So that means you did a
> successful suspend/resume, and then the second suspend caused the
> oops?  It looks like somehow the pages were left marked as dirty, so
> the writeback daemons attempted writing back a page to an inode
> which was never opened read/write (and in fact as a text page for
> /usr/bin/killall, was mapped read/only).  Given that ext4
> initializes jinode only when the file is opened read/write, the fact
> that it is null, and the fact that it makes no sense that a program
> would be modifying /usr/bin/killall as part of a suspend/resume, it
> looks very much like we just unmasked a software suspend bug....

... and I think I've found the problem.  In kernel/power/block_io.c,
in the function submit(), we see this:

	if (bio_chain == NULL) {
		submit_bio(bio_rw, bio);
		wait_on_page_locked(page);
		if (rw == READ)
			bio_set_pages_dirty(bio); <====
		bio_put(bio);

So when we read in pages from the software suspend device, we end up
marking the pages as dirty(!).  I'm guessing this was caused by a copy
and paste from the only other caller of bio_set_pages_dirty(), which
is the direct I/O code, which needs this when we are writing from a
file into a user-provided buffer.  But for restoring from a software
suspend case, this is as far as I can tell wholely inappropriate.

This causes needless writes, which is bad even before ext4 unmasked
the problem.  I will send a patch under separate cover; could you give
it a try and see if it fixes your crash?

I will look into bulletproofing ext4 by adding checks for this case
and printing warning messages, but neverthe less, I think the root
cause is actually in the hibernation's bio code.

					- Ted

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [linux-pm] Oops while going into hibernate
  2011-01-13  0:44         ` Theodore Tso
  2011-01-13  5:56           ` Ted Ts'o
  2011-01-13  5:56           ` Oops while going into hibernate Ted Ts'o
@ 2011-01-13 11:12           ` Bojan Smojver
  2011-01-13 11:49             ` Sebastian Ott
  2011-01-13 11:49             ` Sebastian Ott
  2011-01-13 11:12           ` Bojan Smojver
                             ` (2 subsequent siblings)
  5 siblings, 2 replies; 34+ messages in thread
From: Bojan Smojver @ 2011-01-13 11:12 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Sebastian Ott, pm list, linux-ext4@vger.kernel.org development,
	LKML Kernel

On Wed, 2011-01-12 at 19:44 -0500, Theodore Tso wrote:
> It looks like somehow the pages were left marked as dirty, so the
> writeback daemons attempted writing back a page to an inode which was
> never opened read/write (and in fact as a text page
> for /usr/bin/killall, was mapped read/only).

Just out of curiosity, when this happens, is LZO compression being used
with hibernation (default in 2.6.37) or has it been disabled?

-- 
Bojan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [linux-pm] Oops while going into hibernate
  2011-01-13 11:12           ` [linux-pm] " Bojan Smojver
@ 2011-01-13 11:49             ` Sebastian Ott
  2011-01-13 11:49             ` Sebastian Ott
  1 sibling, 0 replies; 34+ messages in thread
From: Sebastian Ott @ 2011-01-13 11:49 UTC (permalink / raw)
  To: Bojan Smojver
  Cc: Theodore Tso, pm list, linux-ext4@vger.kernel.org development,
	LKML Kernel

Hi,

On Thu, 13 Jan 2011, Bojan Smojver wrote:
> On Wed, 2011-01-12 at 19:44 -0500, Theodore Tso wrote:
> > It looks like somehow the pages were left marked as dirty, so the
> > writeback daemons attempted writing back a page to an inode which was
> > never opened read/write (and in fact as a text page
> > for /usr/bin/killall, was mapped read/only).
> 
> Just out of curiosity, when this happens, is LZO compression being used
> with hibernation (default in 2.6.37) or has it been disabled?
LZO is being used - should I try with hibernate=nocompress?

Regards,
Sebastian

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Oops while going into hibernate
  2011-01-13 11:12           ` [linux-pm] " Bojan Smojver
  2011-01-13 11:49             ` Sebastian Ott
@ 2011-01-13 11:49             ` Sebastian Ott
  1 sibling, 0 replies; 34+ messages in thread
From: Sebastian Ott @ 2011-01-13 11:49 UTC (permalink / raw)
  To: Bojan Smojver
  Cc: pm list, linux-ext4@vger.kernel.org development, Theodore Tso,
	LKML Kernel

Hi,

On Thu, 13 Jan 2011, Bojan Smojver wrote:
> On Wed, 2011-01-12 at 19:44 -0500, Theodore Tso wrote:
> > It looks like somehow the pages were left marked as dirty, so the
> > writeback daemons attempted writing back a page to an inode which was
> > never opened read/write (and in fact as a text page
> > for /usr/bin/killall, was mapped read/only).
> 
> Just out of curiosity, when this happens, is LZO compression being used
> with hibernation (default in 2.6.37) or has it been disabled?
LZO is being used - should I try with hibernate=nocompress?

Regards,
Sebastian

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Oops while going into hibernate
  2011-01-13  0:44         ` Theodore Tso
                             ` (2 preceding siblings ...)
  2011-01-13 11:12           ` [linux-pm] " Bojan Smojver
@ 2011-01-13 11:12           ` Bojan Smojver
  2011-01-13 11:48           ` Sebastian Ott
  2011-01-13 11:48           ` Sebastian Ott
  5 siblings, 0 replies; 34+ messages in thread
From: Bojan Smojver @ 2011-01-13 11:12 UTC (permalink / raw)
  To: Theodore Tso; +Cc: pm list, linux-ext4@vger.kernel.org development, LKML Kernel

On Wed, 2011-01-12 at 19:44 -0500, Theodore Tso wrote:
> It looks like somehow the pages were left marked as dirty, so the
> writeback daemons attempted writing back a page to an inode which was
> never opened read/write (and in fact as a text page
> for /usr/bin/killall, was mapped read/only).

Just out of curiosity, when this happens, is LZO compression being used
with hibernation (default in 2.6.37) or has it been disabled?

-- 
Bojan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Oops while going into hibernate
  2011-01-13  0:44         ` Theodore Tso
                             ` (3 preceding siblings ...)
  2011-01-13 11:12           ` Bojan Smojver
@ 2011-01-13 11:48           ` Sebastian Ott
  2011-01-13 12:11             ` Bojan Smojver
                               ` (3 more replies)
  2011-01-13 11:48           ` Sebastian Ott
  5 siblings, 4 replies; 34+ messages in thread
From: Sebastian Ott @ 2011-01-13 11:48 UTC (permalink / raw)
  To: Theodore Tso; +Cc: linux-ext4@vger.kernel.org development, LKML Kernel, pm list


On Wed, 12 Jan 2011, Theodore Tso wrote:
> That looks really bogus.  /usr/bin/killall is a system binary, and there's
> no good reason that hibernation should be trying to write pages to that
> binary.
> 
> You said originally that the oops was happening "while going into
> hibernation right after resuming with...".   So that means you did a
> successful suspend/resume, and then the second suspend caused the oops?
Yes. I basically did a
echo reboot > /sys/power/disk
for i in {1..5} ;do echo disk > /sys/power/state ;done
and it crashed very early in the second suspend.

> It looks like somehow the pages were left marked as dirty, so the
> writeback daemons attempted writing back a page to an inode which was
> never opened read/write (and in fact as a text page for
> /usr/bin/killall, was mapped read/only).
> Given that ext4 initializes jinode only when the file is opened
> read/write, the fact that it is null, and the fact that it makes no
> sense that a program would be modifying /usr/bin/killall as part of a
> suspend/resume, it looks very much like we just unmasked a software
> suspend bug....
Ah, ok. Thanks for the explanation!

Regards,
Sebastian

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Oops while going into hibernate
  2011-01-13 11:48           ` Sebastian Ott
@ 2011-01-13 12:11             ` Bojan Smojver
  2011-01-13 12:11             ` [linux-pm] " Bojan Smojver
                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 34+ messages in thread
From: Bojan Smojver @ 2011-01-13 12:11 UTC (permalink / raw)
  To: sebott; +Cc: linux-pm, linux-ext4, tytso, linux-kernel

> Yes. I basically did a
> echo reboot > /sys/power/disk
> for i in {1..5} ;do echo disk > /sys/power/state ;done
> and it crashed very early in the second suspend.

Was compression on or off?

--
Bojan 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [linux-pm] Oops while going into hibernate
  2011-01-13 11:48           ` Sebastian Ott
  2011-01-13 12:11             ` Bojan Smojver
@ 2011-01-13 12:11             ` Bojan Smojver
  2011-01-13 12:31               ` Sebastian Ott
  2011-01-13 12:31               ` Sebastian Ott
  2011-01-13 13:36             ` Heiko Carstens
  2011-01-13 13:36             ` Heiko Carstens
  3 siblings, 2 replies; 34+ messages in thread
From: Bojan Smojver @ 2011-01-13 12:11 UTC (permalink / raw)
  To: sebott; +Cc: tytso, linux-pm, linux-ext4, linux-kernel

> Yes. I basically did a
> echo reboot > /sys/power/disk
> for i in {1..5} ;do echo disk > /sys/power/state ;done
> and it crashed very early in the second suspend.

Was compression on or off?

--
Bojan 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [linux-pm] Oops while going into hibernate
  2011-01-13 12:11             ` [linux-pm] " Bojan Smojver
@ 2011-01-13 12:31               ` Sebastian Ott
  2011-01-13 12:31               ` Sebastian Ott
  1 sibling, 0 replies; 34+ messages in thread
From: Sebastian Ott @ 2011-01-13 12:31 UTC (permalink / raw)
  To: Bojan Smojver; +Cc: tytso, linux-pm, linux-ext4, linux-kernel


On Thu, 13 Jan 2011, Bojan Smojver wrote:
> > Yes. I basically did a
> > echo reboot > /sys/power/disk
> > for i in {1..5} ;do echo disk > /sys/power/state ;done
> > and it crashed very early in the second suspend.
> 
> Was compression on or off?
compression was on. I did a re-test with hibernate=nocompress
..but with the same result

Regards,
Sebastian

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Oops while going into hibernate
  2011-01-13 12:11             ` [linux-pm] " Bojan Smojver
  2011-01-13 12:31               ` Sebastian Ott
@ 2011-01-13 12:31               ` Sebastian Ott
  1 sibling, 0 replies; 34+ messages in thread
From: Sebastian Ott @ 2011-01-13 12:31 UTC (permalink / raw)
  To: Bojan Smojver; +Cc: linux-pm, linux-ext4, tytso, linux-kernel


On Thu, 13 Jan 2011, Bojan Smojver wrote:
> > Yes. I basically did a
> > echo reboot > /sys/power/disk
> > for i in {1..5} ;do echo disk > /sys/power/state ;done
> > and it crashed very early in the second suspend.
> 
> Was compression on or off?
compression was on. I did a re-test with hibernate=nocompress
..but with the same result

Regards,
Sebastian

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Oops while going into hibernate
  2011-01-13 11:48           ` Sebastian Ott
  2011-01-13 12:11             ` Bojan Smojver
  2011-01-13 12:11             ` [linux-pm] " Bojan Smojver
@ 2011-01-13 13:36             ` Heiko Carstens
  2011-01-13 18:46               ` Ted Ts'o
  2011-01-13 18:46               ` Ted Ts'o
  2011-01-13 13:36             ` Heiko Carstens
  3 siblings, 2 replies; 34+ messages in thread
From: Heiko Carstens @ 2011-01-13 13:36 UTC (permalink / raw)
  To: Sebastian Ott
  Cc: Theodore Tso, linux-ext4@vger.kernel.org development, LKML Kernel,
	pm list

On Thu, Jan 13, 2011 at 12:48:40PM +0100, Sebastian Ott wrote:
> On Wed, 12 Jan 2011, Theodore Tso wrote:
> > It looks like somehow the pages were left marked as dirty, so the
> > writeback daemons attempted writing back a page to an inode which was
> > never opened read/write (and in fact as a text page for
> > /usr/bin/killall, was mapped read/only).
> > Given that ext4 initializes jinode only when the file is opened
> > read/write, the fact that it is null, and the fact that it makes no
> > sense that a program would be modifying /usr/bin/killall as part of a
> > suspend/resume, it looks very much like we just unmasked a software
> > suspend bug....
> Ah, ok. Thanks for the explanation!

Eeeek... this seems to be an architecture specific bug that is only present
on s390.
The dirty bit for user space pages on all architectures but s390 are stored
into the PTE's. On s390 however they are stored into the storage key that
exists per _physical_ page.
So, what we should have done, when implementing suspend/resume on s390, is
to save the storage key for each page and write that to the suspend device
and upon resume restore the storage key contents for each physical page.
The code that would do that is missing... Hence _all_ pages of the resumed
image are dirty after they have been copied to their location.
*ouch*

Will fix.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Oops while going into hibernate
  2011-01-13 13:36             ` Heiko Carstens
@ 2011-01-13 18:46               ` Ted Ts'o
  2011-01-13 18:46               ` Ted Ts'o
  1 sibling, 0 replies; 34+ messages in thread
From: Ted Ts'o @ 2011-01-13 18:46 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: pm list, linux-ext4@vger.kernel.org development, LKML Kernel

On Thu, Jan 13, 2011 at 02:36:12PM +0100, Heiko Carstens wrote:
> 
> Eeeek... this seems to be an architecture specific bug that is only present
> on s390.
> The dirty bit for user space pages on all architectures but s390 are stored
> into the PTE's. On s390 however they are stored into the storage key that
> exists per _physical_ page.
> So, what we should have done, when implementing suspend/resume on s390, is
> to save the storage key for each page and write that to the suspend device
> and upon resume restore the storage key contents for each physical page.
> The code that would do that is missing... Hence _all_ pages of the resumed
> image are dirty after they have been copied to their location.
> *ouch*
> 
> Will fix.

Glad you found the root cause.  If you don't think you can get this
fixed quickly, before -rc2 or -rc3, I can fairly quickly add some
checks to ext4 to detect this condition, issue a warning, and then
return an error code from the ->writepages() hook.  (Which will then
promptly be ignored by the writeback code, since, hey, what are they
going to do with an error, but that's a discussion for another forum.)
Would that be helpful?

I'm still a bit concerned with the call to set the pages' PTE to be
dirty that I found in the hibernate code, but I accept the fact that
removing it doesn't solve the s390 crash.  It still seems wrong to me,
and hopefully someone from linux-pm can look at that more closely.

    	      	      	   	    	     - Ted

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Oops while going into hibernate
  2011-01-13 13:36             ` Heiko Carstens
  2011-01-13 18:46               ` Ted Ts'o
@ 2011-01-13 18:46               ` Ted Ts'o
  2011-01-13 21:30                 ` [linux-pm] " Bojan Smojver
  2011-01-13 21:30                 ` Bojan Smojver
  1 sibling, 2 replies; 34+ messages in thread
From: Ted Ts'o @ 2011-01-13 18:46 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Sebastian Ott, linux-ext4@vger.kernel.org development,
	LKML Kernel, pm list

On Thu, Jan 13, 2011 at 02:36:12PM +0100, Heiko Carstens wrote:
> 
> Eeeek... this seems to be an architecture specific bug that is only present
> on s390.
> The dirty bit for user space pages on all architectures but s390 are stored
> into the PTE's. On s390 however they are stored into the storage key that
> exists per _physical_ page.
> So, what we should have done, when implementing suspend/resume on s390, is
> to save the storage key for each page and write that to the suspend device
> and upon resume restore the storage key contents for each physical page.
> The code that would do that is missing... Hence _all_ pages of the resumed
> image are dirty after they have been copied to their location.
> *ouch*
> 
> Will fix.

Glad you found the root cause.  If you don't think you can get this
fixed quickly, before -rc2 or -rc3, I can fairly quickly add some
checks to ext4 to detect this condition, issue a warning, and then
return an error code from the ->writepages() hook.  (Which will then
promptly be ignored by the writeback code, since, hey, what are they
going to do with an error, but that's a discussion for another forum.)
Would that be helpful?

I'm still a bit concerned with the call to set the pages' PTE to be
dirty that I found in the hibernate code, but I accept the fact that
removing it doesn't solve the s390 crash.  It still seems wrong to me,
and hopefully someone from linux-pm can look at that more closely.

    	      	      	   	    	     - Ted

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [linux-pm] Oops while going into hibernate
  2011-01-13 18:46               ` Ted Ts'o
@ 2011-01-13 21:30                 ` Bojan Smojver
  2011-01-14  9:53                   ` Heiko Carstens
  2011-01-14  9:53                   ` Heiko Carstens
  2011-01-13 21:30                 ` Bojan Smojver
  1 sibling, 2 replies; 34+ messages in thread
From: Bojan Smojver @ 2011-01-13 21:30 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: Heiko Carstens, pm list, linux-ext4@vger.kernel.org development,
	LKML Kernel

On Thu, 2011-01-13 at 13:46 -0500, Ted Ts'o wrote:
> I'm still a bit concerned with the call to set the pages' PTE to be
> dirty that I found in the hibernate code, but I accept the fact that
> removing it doesn't solve the s390 crash.  It still seems wrong to me,
> and hopefully someone from linux-pm can look at that more closely. 

If I'm understanding things correctly, this should affect only the
situation when compression is not used. Otherwise, pages that are read
into by block I/O are decompressed first and copied into different
pages. No?

-- 
Bojan


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [linux-pm] Oops while going into hibernate
  2011-01-13 21:30                 ` [linux-pm] " Bojan Smojver
@ 2011-01-14  9:53                   ` Heiko Carstens
  2011-01-14 13:14                     ` Bojan Smojver
  2011-01-14 13:14                     ` Bojan Smojver
  2011-01-14  9:53                   ` Heiko Carstens
  1 sibling, 2 replies; 34+ messages in thread
From: Heiko Carstens @ 2011-01-14  9:53 UTC (permalink / raw)
  To: Bojan Smojver
  Cc: Ted Ts'o, pm list, linux-ext4@vger.kernel.org development,
	LKML Kernel

On Fri, Jan 14, 2011 at 08:30:12AM +1100, Bojan Smojver wrote:
> On Thu, 2011-01-13 at 13:46 -0500, Ted Ts'o wrote:
> > I'm still a bit concerned with the call to set the pages' PTE to be
> > dirty that I found in the hibernate code, but I accept the fact that
> > removing it doesn't solve the s390 crash.  It still seems wrong to me,
> > and hopefully someone from linux-pm can look at that more closely. 
> 
> If I'm understanding things correctly, this should affect only the
> situation when compression is not used. Otherwise, pages that are read
> into by block I/O are decompressed first and copied into different
> pages. No?

When the pages get copied to their final resting place the dirty bits
of the corresponding physical pages get set automatically by the
hardware.
If there would be some code that would clear the dirty bit after the
copy operation then we would have some underindication and as a result
the possibility of data corruption, but we've never seen this.
However since s390 is the only architecture which has dirty bits for
physical pages I doubt that there is any such code present. So the
bug should happen independently of image compression.

What is missing is code that restores the original storage keys after
the pages have been copied to their final place.
I need to read the suspend/resume code and figure out how to fix this.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [linux-pm] Oops while going into hibernate
  2011-01-14  9:53                   ` Heiko Carstens
@ 2011-01-14 13:14                     ` Bojan Smojver
  2011-01-14 13:14                     ` Bojan Smojver
  1 sibling, 0 replies; 34+ messages in thread
From: Bojan Smojver @ 2011-01-14 13:14 UTC (permalink / raw)
  To: heiko.carstens; +Cc: tytso, linux-pm, linux-ext4, linux-kernel

> However since s390 is the only architecture which has dirty bits for
> physical pages I doubt that there is any such code present. So the
> bug should happen independently of image compression.

Ah, sorry - I wasn't clear there. I meant for other archs, where setting 
the dirty bit on read may matter when compression is off.

Obviously, s390 will have to have its own problem fixed separately.

--
Bojan 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Oops while going into hibernate
  2011-01-14  9:53                   ` Heiko Carstens
  2011-01-14 13:14                     ` Bojan Smojver
@ 2011-01-14 13:14                     ` Bojan Smojver
  1 sibling, 0 replies; 34+ messages in thread
From: Bojan Smojver @ 2011-01-14 13:14 UTC (permalink / raw)
  To: heiko.carstens; +Cc: linux-pm, linux-ext4, tytso, linux-kernel

> However since s390 is the only architecture which has dirty bits for
> physical pages I doubt that there is any such code present. So the
> bug should happen independently of image compression.

Ah, sorry - I wasn't clear there. I meant for other archs, where setting 
the dirty bit on read may matter when compression is off.

Obviously, s390 will have to have its own problem fixed separately.

--
Bojan 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Oops while going into hibernate
  2011-01-13 21:30                 ` [linux-pm] " Bojan Smojver
  2011-01-14  9:53                   ` Heiko Carstens
@ 2011-01-14  9:53                   ` Heiko Carstens
  1 sibling, 0 replies; 34+ messages in thread
From: Heiko Carstens @ 2011-01-14  9:53 UTC (permalink / raw)
  To: Bojan Smojver
  Cc: pm list, linux-ext4@vger.kernel.org development, Ted Ts'o,
	LKML Kernel

On Fri, Jan 14, 2011 at 08:30:12AM +1100, Bojan Smojver wrote:
> On Thu, 2011-01-13 at 13:46 -0500, Ted Ts'o wrote:
> > I'm still a bit concerned with the call to set the pages' PTE to be
> > dirty that I found in the hibernate code, but I accept the fact that
> > removing it doesn't solve the s390 crash.  It still seems wrong to me,
> > and hopefully someone from linux-pm can look at that more closely. 
> 
> If I'm understanding things correctly, this should affect only the
> situation when compression is not used. Otherwise, pages that are read
> into by block I/O are decompressed first and copied into different
> pages. No?

When the pages get copied to their final resting place the dirty bits
of the corresponding physical pages get set automatically by the
hardware.
If there would be some code that would clear the dirty bit after the
copy operation then we would have some underindication and as a result
the possibility of data corruption, but we've never seen this.
However since s390 is the only architecture which has dirty bits for
physical pages I doubt that there is any such code present. So the
bug should happen independently of image compression.

What is missing is code that restores the original storage keys after
the pages have been copied to their final place.
I need to read the suspend/resume code and figure out how to fix this.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Oops while going into hibernate
  2011-01-13 18:46               ` Ted Ts'o
  2011-01-13 21:30                 ` [linux-pm] " Bojan Smojver
@ 2011-01-13 21:30                 ` Bojan Smojver
  1 sibling, 0 replies; 34+ messages in thread
From: Bojan Smojver @ 2011-01-13 21:30 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: pm list, linux-ext4@vger.kernel.org development, Heiko Carstens,
	LKML Kernel

On Thu, 2011-01-13 at 13:46 -0500, Ted Ts'o wrote:
> I'm still a bit concerned with the call to set the pages' PTE to be
> dirty that I found in the hibernate code, but I accept the fact that
> removing it doesn't solve the s390 crash.  It still seems wrong to me,
> and hopefully someone from linux-pm can look at that more closely. 

If I'm understanding things correctly, this should affect only the
situation when compression is not used. Otherwise, pages that are read
into by block I/O are decompressed first and copied into different
pages. No?

-- 
Bojan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Oops while going into hibernate
  2011-01-13 11:48           ` Sebastian Ott
                               ` (2 preceding siblings ...)
  2011-01-13 13:36             ` Heiko Carstens
@ 2011-01-13 13:36             ` Heiko Carstens
  3 siblings, 0 replies; 34+ messages in thread
From: Heiko Carstens @ 2011-01-13 13:36 UTC (permalink / raw)
  To: Sebastian Ott
  Cc: pm list, linux-ext4@vger.kernel.org development, Theodore Tso,
	LKML Kernel

On Thu, Jan 13, 2011 at 12:48:40PM +0100, Sebastian Ott wrote:
> On Wed, 12 Jan 2011, Theodore Tso wrote:
> > It looks like somehow the pages were left marked as dirty, so the
> > writeback daemons attempted writing back a page to an inode which was
> > never opened read/write (and in fact as a text page for
> > /usr/bin/killall, was mapped read/only).
> > Given that ext4 initializes jinode only when the file is opened
> > read/write, the fact that it is null, and the fact that it makes no
> > sense that a program would be modifying /usr/bin/killall as part of a
> > suspend/resume, it looks very much like we just unmasked a software
> > suspend bug....
> Ah, ok. Thanks for the explanation!

Eeeek... this seems to be an architecture specific bug that is only present
on s390.
The dirty bit for user space pages on all architectures but s390 are stored
into the PTE's. On s390 however they are stored into the storage key that
exists per _physical_ page.
So, what we should have done, when implementing suspend/resume on s390, is
to save the storage key for each page and write that to the suspend device
and upon resume restore the storage key contents for each physical page.
The code that would do that is missing... Hence _all_ pages of the resumed
image are dirty after they have been copied to their location.
*ouch*

Will fix.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Oops while going into hibernate
  2011-01-13  0:44         ` Theodore Tso
                             ` (4 preceding siblings ...)
  2011-01-13 11:48           ` Sebastian Ott
@ 2011-01-13 11:48           ` Sebastian Ott
  5 siblings, 0 replies; 34+ messages in thread
From: Sebastian Ott @ 2011-01-13 11:48 UTC (permalink / raw)
  To: Theodore Tso; +Cc: pm list, linux-ext4@vger.kernel.org development, LKML Kernel


On Wed, 12 Jan 2011, Theodore Tso wrote:
> That looks really bogus.  /usr/bin/killall is a system binary, and there's
> no good reason that hibernation should be trying to write pages to that
> binary.
> 
> You said originally that the oops was happening "while going into
> hibernation right after resuming with...".   So that means you did a
> successful suspend/resume, and then the second suspend caused the oops?
Yes. I basically did a
echo reboot > /sys/power/disk
for i in {1..5} ;do echo disk > /sys/power/state ;done
and it crashed very early in the second suspend.

> It looks like somehow the pages were left marked as dirty, so the
> writeback daemons attempted writing back a page to an inode which was
> never opened read/write (and in fact as a text page for
> /usr/bin/killall, was mapped read/only).
> Given that ext4 initializes jinode only when the file is opened
> read/write, the fact that it is null, and the fact that it makes no
> sense that a program would be modifying /usr/bin/killall as part of a
> suspend/resume, it looks very much like we just unmasked a software
> suspend bug....
Ah, ok. Thanks for the explanation!

Regards,
Sebastian

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Oops while going into hibernate
  2011-01-12 18:49       ` Sebastian Ott
  2011-01-13  0:44         ` Theodore Tso
@ 2011-01-13  0:44         ` Theodore Tso
  1 sibling, 0 replies; 34+ messages in thread
From: Theodore Tso @ 2011-01-13  0:44 UTC (permalink / raw)
  To: Sebastian Ott
  Cc: pm list, linux-ext4@vger.kernel.org development, LKML Kernel

On Jan 12, 2011, at 1:49 PM, Sebastian Ott wrote:

> 
> I did about 5 re-tests with different loads, the result was always:
> 
> [  249.238697] EXT4-fs (dasda1): inode #1106 has NULL jinode
> 
> # debugfs /dev/dasda1
> debugfs 1.41.10 (10-Feb-2009)
> debugfs:  ncheck 1106
> Inode   Pathname
> 1106    /usr/bin/killall
> debugfs:  q
> 

That looks really bogus.  /usr/bin/killall is a system binary, and there's no good reason that hibernation should be trying to write pages to that binary.

You said originally that the oops was happening "while going into hibernation right after resuming with...".   So that means you did a successful suspend/resume, and then the second suspend caused the oops?   It looks like somehow the pages were left marked as dirty, so the writeback daemons attempted writing back a page to an inode which was never opened read/write (and in fact as a text page for /usr/bin/killall, was mapped read/only).   Given that ext4 initializes jinode only when the file is opened read/write, the fact that it is null, and the fact that it makes no sense that a program would be modifying /usr/bin/killall as part of a suspend/resume, it looks very much like we just unmasked a software suspend bug....

-- Ted

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2011-01-14 13:14 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-01-12 13:48 Oops while going into hibernate Sebastian Ott
2011-01-12 16:26 ` Ted Ts'o
2011-01-12 16:56   ` Sebastian Ott
2011-01-12 17:26     ` Ted Ts'o
2011-01-12 17:37       ` Sebastian Ott
2011-01-12 18:49       ` Sebastian Ott
2011-01-13  0:44         ` Theodore Tso
2011-01-13  5:56           ` Ted Ts'o
2011-01-13  5:59             ` [PATCH] PM / Hibernate: Don't mark pages dirty when reading pages while thawing Theodore Ts'o
2011-01-13 12:36               ` Sebastian Ott
2011-01-13 12:36               ` Sebastian Ott
2011-01-13  5:59             ` Theodore Ts'o
2011-01-13  5:56           ` Oops while going into hibernate Ted Ts'o
2011-01-13 11:12           ` [linux-pm] " Bojan Smojver
2011-01-13 11:49             ` Sebastian Ott
2011-01-13 11:49             ` Sebastian Ott
2011-01-13 11:12           ` Bojan Smojver
2011-01-13 11:48           ` Sebastian Ott
2011-01-13 12:11             ` Bojan Smojver
2011-01-13 12:11             ` [linux-pm] " Bojan Smojver
2011-01-13 12:31               ` Sebastian Ott
2011-01-13 12:31               ` Sebastian Ott
2011-01-13 13:36             ` Heiko Carstens
2011-01-13 18:46               ` Ted Ts'o
2011-01-13 18:46               ` Ted Ts'o
2011-01-13 21:30                 ` [linux-pm] " Bojan Smojver
2011-01-14  9:53                   ` Heiko Carstens
2011-01-14 13:14                     ` Bojan Smojver
2011-01-14 13:14                     ` Bojan Smojver
2011-01-14  9:53                   ` Heiko Carstens
2011-01-13 21:30                 ` Bojan Smojver
2011-01-13 13:36             ` Heiko Carstens
2011-01-13 11:48           ` Sebastian Ott
2011-01-13  0:44         ` Theodore Tso

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.