* Re: Oops while going into hibernate
2011-01-13 0:44 ` Theodore Tso
@ 2011-01-13 5:56 ` Ted Ts'o
2011-01-13 5:59 ` [PATCH] PM / Hibernate: Don't mark pages dirty when reading pages while thawing Theodore Ts'o
2011-01-13 5:59 ` Theodore Ts'o
2011-01-13 5:56 ` Oops while going into hibernate Ted Ts'o
` (4 subsequent siblings)
5 siblings, 2 replies; 34+ messages in thread
From: Ted Ts'o @ 2011-01-13 5:56 UTC (permalink / raw)
To: Sebastian Ott
Cc: linux-ext4@vger.kernel.org development, LKML Kernel, pm list
On Wed, Jan 12, 2011 at 07:44:17PM -0500, Theodore Tso wrote:
>
> You said originally that the oops was happening "while going into
> hibernation right after resuming with...". So that means you did a
> successful suspend/resume, and then the second suspend caused the
> oops? It looks like somehow the pages were left marked as dirty, so
> the writeback daemons attempted writing back a page to an inode
> which was never opened read/write (and in fact as a text page for
> /usr/bin/killall, was mapped read/only). Given that ext4
> initializes jinode only when the file is opened read/write, the fact
> that it is null, and the fact that it makes no sense that a program
> would be modifying /usr/bin/killall as part of a suspend/resume, it
> looks very much like we just unmasked a software suspend bug....
... and I think I've found the problem. In kernel/power/block_io.c,
in the function submit(), we see this:
if (bio_chain == NULL) {
submit_bio(bio_rw, bio);
wait_on_page_locked(page);
if (rw == READ)
bio_set_pages_dirty(bio); <====
bio_put(bio);
So when we read in pages from the software suspend device, we end up
marking the pages as dirty(!). I'm guessing this was caused by a copy
and paste from the only other caller of bio_set_pages_dirty(), which
is the direct I/O code, which needs this when we are writing from a
file into a user-provided buffer. But for restoring from a software
suspend case, this is as far as I can tell wholely inappropriate.
This causes needless writes, which is bad even before ext4 unmasked
the problem. I will send a patch under separate cover; could you give
it a try and see if it fixes your crash?
I will look into bulletproofing ext4 by adding checks for this case
and printing warning messages, but neverthe less, I think the root
cause is actually in the hibernation's bio code.
- Ted
^ permalink raw reply [flat|nested] 34+ messages in thread* [PATCH] PM / Hibernate: Don't mark pages dirty when reading pages while thawing
2011-01-13 5:56 ` Ted Ts'o
@ 2011-01-13 5:59 ` Theodore Ts'o
2011-01-13 12:36 ` Sebastian Ott
2011-01-13 12:36 ` Sebastian Ott
2011-01-13 5:59 ` Theodore Ts'o
1 sibling, 2 replies; 34+ messages in thread
From: Theodore Ts'o @ 2011-01-13 5:59 UTC (permalink / raw)
To: Ext4 Developers List
Cc: Linux Kernel Developers List, Theodore Ts'o, Sebastian Ott,
linux-pm
Everything was sync'ed before the hibernation, so no pages could be
dirty. So this causes a lot of wasted I/O activity right after
resuming from hibernation.
Worse, it also causes pages from files that were opened read/only to
be marked writeble which makes them subject to writeback. This was
discovered when ext4 was changed to so that the jinode pointer was not
initialized unless the file was opened read/write, and this caused
things to blow up. But that just unmasked a problem, since the pages
belonging to the file in question should have never been marked dirty
in the first place. It increases the chances the text blocks for
executables like /usr/bin/killall will get corrupted when they are
needlessly written, and of course it means extra write cycles to the
SSD.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: linux-pm@lists.linux-foundation.org
---
kernel/power/block_io.c | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)
diff --git a/kernel/power/block_io.c b/kernel/power/block_io.c
index 83bbc7c..108a4f3 100644
--- a/kernel/power/block_io.c
+++ b/kernel/power/block_io.c
@@ -49,8 +49,6 @@ static int submit(int rw, struct block_device *bdev, sector_t sector,
if (bio_chain == NULL) {
submit_bio(bio_rw, bio);
wait_on_page_locked(page);
- if (rw == READ)
- bio_set_pages_dirty(bio);
bio_put(bio);
} else {
if (rw == READ)
--
1.7.3.1
^ permalink raw reply related [flat|nested] 34+ messages in thread* Re: [PATCH] PM / Hibernate: Don't mark pages dirty when reading pages while thawing
2011-01-13 5:59 ` [PATCH] PM / Hibernate: Don't mark pages dirty when reading pages while thawing Theodore Ts'o
@ 2011-01-13 12:36 ` Sebastian Ott
2011-01-13 12:36 ` Sebastian Ott
1 sibling, 0 replies; 34+ messages in thread
From: Sebastian Ott @ 2011-01-13 12:36 UTC (permalink / raw)
To: Theodore Ts'o
Cc: Ext4 Developers List, Linux Kernel Developers List, linux-pm
Hi,
On Thu, 13 Jan 2011, Theodore Ts'o wrote:
> Everything was sync'ed before the hibernation, so no pages could be
> dirty. So this causes a lot of wasted I/O activity right after
> resuming from hibernation.
>
> Worse, it also causes pages from files that were opened read/only to
> be marked writeble which makes them subject to writeback. This was
> discovered when ext4 was changed to so that the jinode pointer was not
> initialized unless the file was opened read/write, and this caused
> things to blow up. But that just unmasked a problem, since the pages
> belonging to the file in question should have never been marked dirty
> in the first place. It increases the chances the text blocks for
> executables like /usr/bin/killall will get corrupted when they are
> needlessly written, and of course it means extra write cycles to the
> SSD.
>
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
> Cc: linux-pm@lists.linux-foundation.org
> ---
> kernel/power/block_io.c | 2 --
> 1 files changed, 0 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/power/block_io.c b/kernel/power/block_io.c
> index 83bbc7c..108a4f3 100644
> --- a/kernel/power/block_io.c
> +++ b/kernel/power/block_io.c
> @@ -49,8 +49,6 @@ static int submit(int rw, struct block_device *bdev, sector_t sector,
> if (bio_chain == NULL) {
> submit_bio(bio_rw, bio);
> wait_on_page_locked(page);
> - if (rw == READ)
> - bio_set_pages_dirty(bio);
> bio_put(bio);
> } else {
> if (rw == READ)
> --
> 1.7.3.1
>
>
I did some test with this patch applied, but sadly it didn't help.
The testcase was reduced to one hibernation followed by a sync.
Regards,
Sebastian
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [PATCH] PM / Hibernate: Don't mark pages dirty when reading pages while thawing
2011-01-13 5:59 ` [PATCH] PM / Hibernate: Don't mark pages dirty when reading pages while thawing Theodore Ts'o
2011-01-13 12:36 ` Sebastian Ott
@ 2011-01-13 12:36 ` Sebastian Ott
1 sibling, 0 replies; 34+ messages in thread
From: Sebastian Ott @ 2011-01-13 12:36 UTC (permalink / raw)
To: Theodore Ts'o
Cc: linux-pm, Ext4 Developers List, Linux Kernel Developers List
Hi,
On Thu, 13 Jan 2011, Theodore Ts'o wrote:
> Everything was sync'ed before the hibernation, so no pages could be
> dirty. So this causes a lot of wasted I/O activity right after
> resuming from hibernation.
>
> Worse, it also causes pages from files that were opened read/only to
> be marked writeble which makes them subject to writeback. This was
> discovered when ext4 was changed to so that the jinode pointer was not
> initialized unless the file was opened read/write, and this caused
> things to blow up. But that just unmasked a problem, since the pages
> belonging to the file in question should have never been marked dirty
> in the first place. It increases the chances the text blocks for
> executables like /usr/bin/killall will get corrupted when they are
> needlessly written, and of course it means extra write cycles to the
> SSD.
>
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
> Cc: linux-pm@lists.linux-foundation.org
> ---
> kernel/power/block_io.c | 2 --
> 1 files changed, 0 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/power/block_io.c b/kernel/power/block_io.c
> index 83bbc7c..108a4f3 100644
> --- a/kernel/power/block_io.c
> +++ b/kernel/power/block_io.c
> @@ -49,8 +49,6 @@ static int submit(int rw, struct block_device *bdev, sector_t sector,
> if (bio_chain == NULL) {
> submit_bio(bio_rw, bio);
> wait_on_page_locked(page);
> - if (rw == READ)
> - bio_set_pages_dirty(bio);
> bio_put(bio);
> } else {
> if (rw == READ)
> --
> 1.7.3.1
>
>
I did some test with this patch applied, but sadly it didn't help.
The testcase was reduced to one hibernation followed by a sync.
Regards,
Sebastian
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH] PM / Hibernate: Don't mark pages dirty when reading pages while thawing
2011-01-13 5:56 ` Ted Ts'o
2011-01-13 5:59 ` [PATCH] PM / Hibernate: Don't mark pages dirty when reading pages while thawing Theodore Ts'o
@ 2011-01-13 5:59 ` Theodore Ts'o
1 sibling, 0 replies; 34+ messages in thread
From: Theodore Ts'o @ 2011-01-13 5:59 UTC (permalink / raw)
To: Ext4 Developers List
Cc: linux-pm, Theodore Ts'o, Linux Kernel Developers List
Everything was sync'ed before the hibernation, so no pages could be
dirty. So this causes a lot of wasted I/O activity right after
resuming from hibernation.
Worse, it also causes pages from files that were opened read/only to
be marked writeble which makes them subject to writeback. This was
discovered when ext4 was changed to so that the jinode pointer was not
initialized unless the file was opened read/write, and this caused
things to blow up. But that just unmasked a problem, since the pages
belonging to the file in question should have never been marked dirty
in the first place. It increases the chances the text blocks for
executables like /usr/bin/killall will get corrupted when they are
needlessly written, and of course it means extra write cycles to the
SSD.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: linux-pm@lists.linux-foundation.org
---
kernel/power/block_io.c | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)
diff --git a/kernel/power/block_io.c b/kernel/power/block_io.c
index 83bbc7c..108a4f3 100644
--- a/kernel/power/block_io.c
+++ b/kernel/power/block_io.c
@@ -49,8 +49,6 @@ static int submit(int rw, struct block_device *bdev, sector_t sector,
if (bio_chain == NULL) {
submit_bio(bio_rw, bio);
wait_on_page_locked(page);
- if (rw == READ)
- bio_set_pages_dirty(bio);
bio_put(bio);
} else {
if (rw == READ)
--
1.7.3.1
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: Oops while going into hibernate
2011-01-13 0:44 ` Theodore Tso
2011-01-13 5:56 ` Ted Ts'o
@ 2011-01-13 5:56 ` Ted Ts'o
2011-01-13 11:12 ` [linux-pm] " Bojan Smojver
` (3 subsequent siblings)
5 siblings, 0 replies; 34+ messages in thread
From: Ted Ts'o @ 2011-01-13 5:56 UTC (permalink / raw)
To: Sebastian Ott
Cc: pm list, linux-ext4@vger.kernel.org development, LKML Kernel
On Wed, Jan 12, 2011 at 07:44:17PM -0500, Theodore Tso wrote:
>
> You said originally that the oops was happening "while going into
> hibernation right after resuming with...". So that means you did a
> successful suspend/resume, and then the second suspend caused the
> oops? It looks like somehow the pages were left marked as dirty, so
> the writeback daemons attempted writing back a page to an inode
> which was never opened read/write (and in fact as a text page for
> /usr/bin/killall, was mapped read/only). Given that ext4
> initializes jinode only when the file is opened read/write, the fact
> that it is null, and the fact that it makes no sense that a program
> would be modifying /usr/bin/killall as part of a suspend/resume, it
> looks very much like we just unmasked a software suspend bug....
... and I think I've found the problem. In kernel/power/block_io.c,
in the function submit(), we see this:
if (bio_chain == NULL) {
submit_bio(bio_rw, bio);
wait_on_page_locked(page);
if (rw == READ)
bio_set_pages_dirty(bio); <====
bio_put(bio);
So when we read in pages from the software suspend device, we end up
marking the pages as dirty(!). I'm guessing this was caused by a copy
and paste from the only other caller of bio_set_pages_dirty(), which
is the direct I/O code, which needs this when we are writing from a
file into a user-provided buffer. But for restoring from a software
suspend case, this is as far as I can tell wholely inappropriate.
This causes needless writes, which is bad even before ext4 unmasked
the problem. I will send a patch under separate cover; could you give
it a try and see if it fixes your crash?
I will look into bulletproofing ext4 by adding checks for this case
and printing warning messages, but neverthe less, I think the root
cause is actually in the hibernation's bio code.
- Ted
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [linux-pm] Oops while going into hibernate
2011-01-13 0:44 ` Theodore Tso
2011-01-13 5:56 ` Ted Ts'o
2011-01-13 5:56 ` Oops while going into hibernate Ted Ts'o
@ 2011-01-13 11:12 ` Bojan Smojver
2011-01-13 11:49 ` Sebastian Ott
2011-01-13 11:49 ` Sebastian Ott
2011-01-13 11:12 ` Bojan Smojver
` (2 subsequent siblings)
5 siblings, 2 replies; 34+ messages in thread
From: Bojan Smojver @ 2011-01-13 11:12 UTC (permalink / raw)
To: Theodore Tso
Cc: Sebastian Ott, pm list, linux-ext4@vger.kernel.org development,
LKML Kernel
On Wed, 2011-01-12 at 19:44 -0500, Theodore Tso wrote:
> It looks like somehow the pages were left marked as dirty, so the
> writeback daemons attempted writing back a page to an inode which was
> never opened read/write (and in fact as a text page
> for /usr/bin/killall, was mapped read/only).
Just out of curiosity, when this happens, is LZO compression being used
with hibernation (default in 2.6.37) or has it been disabled?
--
Bojan
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [linux-pm] Oops while going into hibernate
2011-01-13 11:12 ` [linux-pm] " Bojan Smojver
@ 2011-01-13 11:49 ` Sebastian Ott
2011-01-13 11:49 ` Sebastian Ott
1 sibling, 0 replies; 34+ messages in thread
From: Sebastian Ott @ 2011-01-13 11:49 UTC (permalink / raw)
To: Bojan Smojver
Cc: Theodore Tso, pm list, linux-ext4@vger.kernel.org development,
LKML Kernel
Hi,
On Thu, 13 Jan 2011, Bojan Smojver wrote:
> On Wed, 2011-01-12 at 19:44 -0500, Theodore Tso wrote:
> > It looks like somehow the pages were left marked as dirty, so the
> > writeback daemons attempted writing back a page to an inode which was
> > never opened read/write (and in fact as a text page
> > for /usr/bin/killall, was mapped read/only).
>
> Just out of curiosity, when this happens, is LZO compression being used
> with hibernation (default in 2.6.37) or has it been disabled?
LZO is being used - should I try with hibernate=nocompress?
Regards,
Sebastian
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Oops while going into hibernate
2011-01-13 11:12 ` [linux-pm] " Bojan Smojver
2011-01-13 11:49 ` Sebastian Ott
@ 2011-01-13 11:49 ` Sebastian Ott
1 sibling, 0 replies; 34+ messages in thread
From: Sebastian Ott @ 2011-01-13 11:49 UTC (permalink / raw)
To: Bojan Smojver
Cc: pm list, linux-ext4@vger.kernel.org development, Theodore Tso,
LKML Kernel
Hi,
On Thu, 13 Jan 2011, Bojan Smojver wrote:
> On Wed, 2011-01-12 at 19:44 -0500, Theodore Tso wrote:
> > It looks like somehow the pages were left marked as dirty, so the
> > writeback daemons attempted writing back a page to an inode which was
> > never opened read/write (and in fact as a text page
> > for /usr/bin/killall, was mapped read/only).
>
> Just out of curiosity, when this happens, is LZO compression being used
> with hibernation (default in 2.6.37) or has it been disabled?
LZO is being used - should I try with hibernate=nocompress?
Regards,
Sebastian
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Oops while going into hibernate
2011-01-13 0:44 ` Theodore Tso
` (2 preceding siblings ...)
2011-01-13 11:12 ` [linux-pm] " Bojan Smojver
@ 2011-01-13 11:12 ` Bojan Smojver
2011-01-13 11:48 ` Sebastian Ott
2011-01-13 11:48 ` Sebastian Ott
5 siblings, 0 replies; 34+ messages in thread
From: Bojan Smojver @ 2011-01-13 11:12 UTC (permalink / raw)
To: Theodore Tso; +Cc: pm list, linux-ext4@vger.kernel.org development, LKML Kernel
On Wed, 2011-01-12 at 19:44 -0500, Theodore Tso wrote:
> It looks like somehow the pages were left marked as dirty, so the
> writeback daemons attempted writing back a page to an inode which was
> never opened read/write (and in fact as a text page
> for /usr/bin/killall, was mapped read/only).
Just out of curiosity, when this happens, is LZO compression being used
with hibernation (default in 2.6.37) or has it been disabled?
--
Bojan
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: Oops while going into hibernate
2011-01-13 0:44 ` Theodore Tso
` (3 preceding siblings ...)
2011-01-13 11:12 ` Bojan Smojver
@ 2011-01-13 11:48 ` Sebastian Ott
2011-01-13 12:11 ` Bojan Smojver
` (3 more replies)
2011-01-13 11:48 ` Sebastian Ott
5 siblings, 4 replies; 34+ messages in thread
From: Sebastian Ott @ 2011-01-13 11:48 UTC (permalink / raw)
To: Theodore Tso; +Cc: linux-ext4@vger.kernel.org development, LKML Kernel, pm list
On Wed, 12 Jan 2011, Theodore Tso wrote:
> That looks really bogus. /usr/bin/killall is a system binary, and there's
> no good reason that hibernation should be trying to write pages to that
> binary.
>
> You said originally that the oops was happening "while going into
> hibernation right after resuming with...". So that means you did a
> successful suspend/resume, and then the second suspend caused the oops?
Yes. I basically did a
echo reboot > /sys/power/disk
for i in {1..5} ;do echo disk > /sys/power/state ;done
and it crashed very early in the second suspend.
> It looks like somehow the pages were left marked as dirty, so the
> writeback daemons attempted writing back a page to an inode which was
> never opened read/write (and in fact as a text page for
> /usr/bin/killall, was mapped read/only).
> Given that ext4 initializes jinode only when the file is opened
> read/write, the fact that it is null, and the fact that it makes no
> sense that a program would be modifying /usr/bin/killall as part of a
> suspend/resume, it looks very much like we just unmasked a software
> suspend bug....
Ah, ok. Thanks for the explanation!
Regards,
Sebastian
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: Oops while going into hibernate
2011-01-13 11:48 ` Sebastian Ott
@ 2011-01-13 12:11 ` Bojan Smojver
2011-01-13 12:11 ` [linux-pm] " Bojan Smojver
` (2 subsequent siblings)
3 siblings, 0 replies; 34+ messages in thread
From: Bojan Smojver @ 2011-01-13 12:11 UTC (permalink / raw)
To: sebott; +Cc: linux-pm, linux-ext4, tytso, linux-kernel
> Yes. I basically did a
> echo reboot > /sys/power/disk
> for i in {1..5} ;do echo disk > /sys/power/state ;done
> and it crashed very early in the second suspend.
Was compression on or off?
--
Bojan
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [linux-pm] Oops while going into hibernate
2011-01-13 11:48 ` Sebastian Ott
2011-01-13 12:11 ` Bojan Smojver
@ 2011-01-13 12:11 ` Bojan Smojver
2011-01-13 12:31 ` Sebastian Ott
2011-01-13 12:31 ` Sebastian Ott
2011-01-13 13:36 ` Heiko Carstens
2011-01-13 13:36 ` Heiko Carstens
3 siblings, 2 replies; 34+ messages in thread
From: Bojan Smojver @ 2011-01-13 12:11 UTC (permalink / raw)
To: sebott; +Cc: tytso, linux-pm, linux-ext4, linux-kernel
> Yes. I basically did a
> echo reboot > /sys/power/disk
> for i in {1..5} ;do echo disk > /sys/power/state ;done
> and it crashed very early in the second suspend.
Was compression on or off?
--
Bojan
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [linux-pm] Oops while going into hibernate
2011-01-13 12:11 ` [linux-pm] " Bojan Smojver
@ 2011-01-13 12:31 ` Sebastian Ott
2011-01-13 12:31 ` Sebastian Ott
1 sibling, 0 replies; 34+ messages in thread
From: Sebastian Ott @ 2011-01-13 12:31 UTC (permalink / raw)
To: Bojan Smojver; +Cc: tytso, linux-pm, linux-ext4, linux-kernel
On Thu, 13 Jan 2011, Bojan Smojver wrote:
> > Yes. I basically did a
> > echo reboot > /sys/power/disk
> > for i in {1..5} ;do echo disk > /sys/power/state ;done
> > and it crashed very early in the second suspend.
>
> Was compression on or off?
compression was on. I did a re-test with hibernate=nocompress
..but with the same result
Regards,
Sebastian
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: Oops while going into hibernate
2011-01-13 12:11 ` [linux-pm] " Bojan Smojver
2011-01-13 12:31 ` Sebastian Ott
@ 2011-01-13 12:31 ` Sebastian Ott
1 sibling, 0 replies; 34+ messages in thread
From: Sebastian Ott @ 2011-01-13 12:31 UTC (permalink / raw)
To: Bojan Smojver; +Cc: linux-pm, linux-ext4, tytso, linux-kernel
On Thu, 13 Jan 2011, Bojan Smojver wrote:
> > Yes. I basically did a
> > echo reboot > /sys/power/disk
> > for i in {1..5} ;do echo disk > /sys/power/state ;done
> > and it crashed very early in the second suspend.
>
> Was compression on or off?
compression was on. I did a re-test with hibernate=nocompress
..but with the same result
Regards,
Sebastian
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Oops while going into hibernate
2011-01-13 11:48 ` Sebastian Ott
2011-01-13 12:11 ` Bojan Smojver
2011-01-13 12:11 ` [linux-pm] " Bojan Smojver
@ 2011-01-13 13:36 ` Heiko Carstens
2011-01-13 18:46 ` Ted Ts'o
2011-01-13 18:46 ` Ted Ts'o
2011-01-13 13:36 ` Heiko Carstens
3 siblings, 2 replies; 34+ messages in thread
From: Heiko Carstens @ 2011-01-13 13:36 UTC (permalink / raw)
To: Sebastian Ott
Cc: Theodore Tso, linux-ext4@vger.kernel.org development, LKML Kernel,
pm list
On Thu, Jan 13, 2011 at 12:48:40PM +0100, Sebastian Ott wrote:
> On Wed, 12 Jan 2011, Theodore Tso wrote:
> > It looks like somehow the pages were left marked as dirty, so the
> > writeback daemons attempted writing back a page to an inode which was
> > never opened read/write (and in fact as a text page for
> > /usr/bin/killall, was mapped read/only).
> > Given that ext4 initializes jinode only when the file is opened
> > read/write, the fact that it is null, and the fact that it makes no
> > sense that a program would be modifying /usr/bin/killall as part of a
> > suspend/resume, it looks very much like we just unmasked a software
> > suspend bug....
> Ah, ok. Thanks for the explanation!
Eeeek... this seems to be an architecture specific bug that is only present
on s390.
The dirty bit for user space pages on all architectures but s390 are stored
into the PTE's. On s390 however they are stored into the storage key that
exists per _physical_ page.
So, what we should have done, when implementing suspend/resume on s390, is
to save the storage key for each page and write that to the suspend device
and upon resume restore the storage key contents for each physical page.
The code that would do that is missing... Hence _all_ pages of the resumed
image are dirty after they have been copied to their location.
*ouch*
Will fix.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Oops while going into hibernate
2011-01-13 13:36 ` Heiko Carstens
@ 2011-01-13 18:46 ` Ted Ts'o
2011-01-13 18:46 ` Ted Ts'o
1 sibling, 0 replies; 34+ messages in thread
From: Ted Ts'o @ 2011-01-13 18:46 UTC (permalink / raw)
To: Heiko Carstens
Cc: pm list, linux-ext4@vger.kernel.org development, LKML Kernel
On Thu, Jan 13, 2011 at 02:36:12PM +0100, Heiko Carstens wrote:
>
> Eeeek... this seems to be an architecture specific bug that is only present
> on s390.
> The dirty bit for user space pages on all architectures but s390 are stored
> into the PTE's. On s390 however they are stored into the storage key that
> exists per _physical_ page.
> So, what we should have done, when implementing suspend/resume on s390, is
> to save the storage key for each page and write that to the suspend device
> and upon resume restore the storage key contents for each physical page.
> The code that would do that is missing... Hence _all_ pages of the resumed
> image are dirty after they have been copied to their location.
> *ouch*
>
> Will fix.
Glad you found the root cause. If you don't think you can get this
fixed quickly, before -rc2 or -rc3, I can fairly quickly add some
checks to ext4 to detect this condition, issue a warning, and then
return an error code from the ->writepages() hook. (Which will then
promptly be ignored by the writeback code, since, hey, what are they
going to do with an error, but that's a discussion for another forum.)
Would that be helpful?
I'm still a bit concerned with the call to set the pages' PTE to be
dirty that I found in the hibernate code, but I accept the fact that
removing it doesn't solve the s390 crash. It still seems wrong to me,
and hopefully someone from linux-pm can look at that more closely.
- Ted
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: Oops while going into hibernate
2011-01-13 13:36 ` Heiko Carstens
2011-01-13 18:46 ` Ted Ts'o
@ 2011-01-13 18:46 ` Ted Ts'o
2011-01-13 21:30 ` [linux-pm] " Bojan Smojver
2011-01-13 21:30 ` Bojan Smojver
1 sibling, 2 replies; 34+ messages in thread
From: Ted Ts'o @ 2011-01-13 18:46 UTC (permalink / raw)
To: Heiko Carstens
Cc: Sebastian Ott, linux-ext4@vger.kernel.org development,
LKML Kernel, pm list
On Thu, Jan 13, 2011 at 02:36:12PM +0100, Heiko Carstens wrote:
>
> Eeeek... this seems to be an architecture specific bug that is only present
> on s390.
> The dirty bit for user space pages on all architectures but s390 are stored
> into the PTE's. On s390 however they are stored into the storage key that
> exists per _physical_ page.
> So, what we should have done, when implementing suspend/resume on s390, is
> to save the storage key for each page and write that to the suspend device
> and upon resume restore the storage key contents for each physical page.
> The code that would do that is missing... Hence _all_ pages of the resumed
> image are dirty after they have been copied to their location.
> *ouch*
>
> Will fix.
Glad you found the root cause. If you don't think you can get this
fixed quickly, before -rc2 or -rc3, I can fairly quickly add some
checks to ext4 to detect this condition, issue a warning, and then
return an error code from the ->writepages() hook. (Which will then
promptly be ignored by the writeback code, since, hey, what are they
going to do with an error, but that's a discussion for another forum.)
Would that be helpful?
I'm still a bit concerned with the call to set the pages' PTE to be
dirty that I found in the hibernate code, but I accept the fact that
removing it doesn't solve the s390 crash. It still seems wrong to me,
and hopefully someone from linux-pm can look at that more closely.
- Ted
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [linux-pm] Oops while going into hibernate
2011-01-13 18:46 ` Ted Ts'o
@ 2011-01-13 21:30 ` Bojan Smojver
2011-01-14 9:53 ` Heiko Carstens
2011-01-14 9:53 ` Heiko Carstens
2011-01-13 21:30 ` Bojan Smojver
1 sibling, 2 replies; 34+ messages in thread
From: Bojan Smojver @ 2011-01-13 21:30 UTC (permalink / raw)
To: Ted Ts'o
Cc: Heiko Carstens, pm list, linux-ext4@vger.kernel.org development,
LKML Kernel
On Thu, 2011-01-13 at 13:46 -0500, Ted Ts'o wrote:
> I'm still a bit concerned with the call to set the pages' PTE to be
> dirty that I found in the hibernate code, but I accept the fact that
> removing it doesn't solve the s390 crash. It still seems wrong to me,
> and hopefully someone from linux-pm can look at that more closely.
If I'm understanding things correctly, this should affect only the
situation when compression is not used. Otherwise, pages that are read
into by block I/O are decompressed first and copied into different
pages. No?
--
Bojan
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [linux-pm] Oops while going into hibernate
2011-01-13 21:30 ` [linux-pm] " Bojan Smojver
@ 2011-01-14 9:53 ` Heiko Carstens
2011-01-14 13:14 ` Bojan Smojver
2011-01-14 13:14 ` Bojan Smojver
2011-01-14 9:53 ` Heiko Carstens
1 sibling, 2 replies; 34+ messages in thread
From: Heiko Carstens @ 2011-01-14 9:53 UTC (permalink / raw)
To: Bojan Smojver
Cc: Ted Ts'o, pm list, linux-ext4@vger.kernel.org development,
LKML Kernel
On Fri, Jan 14, 2011 at 08:30:12AM +1100, Bojan Smojver wrote:
> On Thu, 2011-01-13 at 13:46 -0500, Ted Ts'o wrote:
> > I'm still a bit concerned with the call to set the pages' PTE to be
> > dirty that I found in the hibernate code, but I accept the fact that
> > removing it doesn't solve the s390 crash. It still seems wrong to me,
> > and hopefully someone from linux-pm can look at that more closely.
>
> If I'm understanding things correctly, this should affect only the
> situation when compression is not used. Otherwise, pages that are read
> into by block I/O are decompressed first and copied into different
> pages. No?
When the pages get copied to their final resting place the dirty bits
of the corresponding physical pages get set automatically by the
hardware.
If there would be some code that would clear the dirty bit after the
copy operation then we would have some underindication and as a result
the possibility of data corruption, but we've never seen this.
However since s390 is the only architecture which has dirty bits for
physical pages I doubt that there is any such code present. So the
bug should happen independently of image compression.
What is missing is code that restores the original storage keys after
the pages have been copied to their final place.
I need to read the suspend/resume code and figure out how to fix this.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [linux-pm] Oops while going into hibernate
2011-01-14 9:53 ` Heiko Carstens
@ 2011-01-14 13:14 ` Bojan Smojver
2011-01-14 13:14 ` Bojan Smojver
1 sibling, 0 replies; 34+ messages in thread
From: Bojan Smojver @ 2011-01-14 13:14 UTC (permalink / raw)
To: heiko.carstens; +Cc: tytso, linux-pm, linux-ext4, linux-kernel
> However since s390 is the only architecture which has dirty bits for
> physical pages I doubt that there is any such code present. So the
> bug should happen independently of image compression.
Ah, sorry - I wasn't clear there. I meant for other archs, where setting
the dirty bit on read may matter when compression is off.
Obviously, s390 will have to have its own problem fixed separately.
--
Bojan
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Oops while going into hibernate
2011-01-14 9:53 ` Heiko Carstens
2011-01-14 13:14 ` Bojan Smojver
@ 2011-01-14 13:14 ` Bojan Smojver
1 sibling, 0 replies; 34+ messages in thread
From: Bojan Smojver @ 2011-01-14 13:14 UTC (permalink / raw)
To: heiko.carstens; +Cc: linux-pm, linux-ext4, tytso, linux-kernel
> However since s390 is the only architecture which has dirty bits for
> physical pages I doubt that there is any such code present. So the
> bug should happen independently of image compression.
Ah, sorry - I wasn't clear there. I meant for other archs, where setting
the dirty bit on read may matter when compression is off.
Obviously, s390 will have to have its own problem fixed separately.
--
Bojan
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Oops while going into hibernate
2011-01-13 21:30 ` [linux-pm] " Bojan Smojver
2011-01-14 9:53 ` Heiko Carstens
@ 2011-01-14 9:53 ` Heiko Carstens
1 sibling, 0 replies; 34+ messages in thread
From: Heiko Carstens @ 2011-01-14 9:53 UTC (permalink / raw)
To: Bojan Smojver
Cc: pm list, linux-ext4@vger.kernel.org development, Ted Ts'o,
LKML Kernel
On Fri, Jan 14, 2011 at 08:30:12AM +1100, Bojan Smojver wrote:
> On Thu, 2011-01-13 at 13:46 -0500, Ted Ts'o wrote:
> > I'm still a bit concerned with the call to set the pages' PTE to be
> > dirty that I found in the hibernate code, but I accept the fact that
> > removing it doesn't solve the s390 crash. It still seems wrong to me,
> > and hopefully someone from linux-pm can look at that more closely.
>
> If I'm understanding things correctly, this should affect only the
> situation when compression is not used. Otherwise, pages that are read
> into by block I/O are decompressed first and copied into different
> pages. No?
When the pages get copied to their final resting place the dirty bits
of the corresponding physical pages get set automatically by the
hardware.
If there would be some code that would clear the dirty bit after the
copy operation then we would have some underindication and as a result
the possibility of data corruption, but we've never seen this.
However since s390 is the only architecture which has dirty bits for
physical pages I doubt that there is any such code present. So the
bug should happen independently of image compression.
What is missing is code that restores the original storage keys after
the pages have been copied to their final place.
I need to read the suspend/resume code and figure out how to fix this.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Oops while going into hibernate
2011-01-13 18:46 ` Ted Ts'o
2011-01-13 21:30 ` [linux-pm] " Bojan Smojver
@ 2011-01-13 21:30 ` Bojan Smojver
1 sibling, 0 replies; 34+ messages in thread
From: Bojan Smojver @ 2011-01-13 21:30 UTC (permalink / raw)
To: Ted Ts'o
Cc: pm list, linux-ext4@vger.kernel.org development, Heiko Carstens,
LKML Kernel
On Thu, 2011-01-13 at 13:46 -0500, Ted Ts'o wrote:
> I'm still a bit concerned with the call to set the pages' PTE to be
> dirty that I found in the hibernate code, but I accept the fact that
> removing it doesn't solve the s390 crash. It still seems wrong to me,
> and hopefully someone from linux-pm can look at that more closely.
If I'm understanding things correctly, this should affect only the
situation when compression is not used. Otherwise, pages that are read
into by block I/O are decompressed first and copied into different
pages. No?
--
Bojan
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Oops while going into hibernate
2011-01-13 11:48 ` Sebastian Ott
` (2 preceding siblings ...)
2011-01-13 13:36 ` Heiko Carstens
@ 2011-01-13 13:36 ` Heiko Carstens
3 siblings, 0 replies; 34+ messages in thread
From: Heiko Carstens @ 2011-01-13 13:36 UTC (permalink / raw)
To: Sebastian Ott
Cc: pm list, linux-ext4@vger.kernel.org development, Theodore Tso,
LKML Kernel
On Thu, Jan 13, 2011 at 12:48:40PM +0100, Sebastian Ott wrote:
> On Wed, 12 Jan 2011, Theodore Tso wrote:
> > It looks like somehow the pages were left marked as dirty, so the
> > writeback daemons attempted writing back a page to an inode which was
> > never opened read/write (and in fact as a text page for
> > /usr/bin/killall, was mapped read/only).
> > Given that ext4 initializes jinode only when the file is opened
> > read/write, the fact that it is null, and the fact that it makes no
> > sense that a program would be modifying /usr/bin/killall as part of a
> > suspend/resume, it looks very much like we just unmasked a software
> > suspend bug....
> Ah, ok. Thanks for the explanation!
Eeeek... this seems to be an architecture specific bug that is only present
on s390.
The dirty bit for user space pages on all architectures but s390 are stored
into the PTE's. On s390 however they are stored into the storage key that
exists per _physical_ page.
So, what we should have done, when implementing suspend/resume on s390, is
to save the storage key for each page and write that to the suspend device
and upon resume restore the storage key contents for each physical page.
The code that would do that is missing... Hence _all_ pages of the resumed
image are dirty after they have been copied to their location.
*ouch*
Will fix.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Oops while going into hibernate
2011-01-13 0:44 ` Theodore Tso
` (4 preceding siblings ...)
2011-01-13 11:48 ` Sebastian Ott
@ 2011-01-13 11:48 ` Sebastian Ott
5 siblings, 0 replies; 34+ messages in thread
From: Sebastian Ott @ 2011-01-13 11:48 UTC (permalink / raw)
To: Theodore Tso; +Cc: pm list, linux-ext4@vger.kernel.org development, LKML Kernel
On Wed, 12 Jan 2011, Theodore Tso wrote:
> That looks really bogus. /usr/bin/killall is a system binary, and there's
> no good reason that hibernation should be trying to write pages to that
> binary.
>
> You said originally that the oops was happening "while going into
> hibernation right after resuming with...". So that means you did a
> successful suspend/resume, and then the second suspend caused the oops?
Yes. I basically did a
echo reboot > /sys/power/disk
for i in {1..5} ;do echo disk > /sys/power/state ;done
and it crashed very early in the second suspend.
> It looks like somehow the pages were left marked as dirty, so the
> writeback daemons attempted writing back a page to an inode which was
> never opened read/write (and in fact as a text page for
> /usr/bin/killall, was mapped read/only).
> Given that ext4 initializes jinode only when the file is opened
> read/write, the fact that it is null, and the fact that it makes no
> sense that a program would be modifying /usr/bin/killall as part of a
> suspend/resume, it looks very much like we just unmasked a software
> suspend bug....
Ah, ok. Thanks for the explanation!
Regards,
Sebastian
^ permalink raw reply [flat|nested] 34+ messages in thread