[PATCH] Page writeback broken after resume: wb_timer lost

* [PATCH] Page writeback broken after resume: wb_timer lost
@ 2006-05-20 13:03 Peter Lundkvist
  2006-05-20 17:37 ` Andrew Morton
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Lundkvist @ 2006-05-20 13:03 UTC (permalink / raw)
  To: linux-kernel

Hi,
I have noticed for some time that nr_dirty never drops but
increases except when VM pressure forces it down. This only
occurs after a resume, never on a freshly booted system.

It seems the wb_timer is lost when the timer function is
trying to start a frozen pdflush thread, and this occurs
during suspend or resume.

I have included a patch which works for me. I don't know whether
the test should also include a check for freezing to be safe, i.e.
  if ( !frozen(..) && !freezing(..) )
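
For illustration, the combined test could look like the minimal userspace
sketch below. Only the names frozen() and freezing() come from the mail; the
flag values, the demo_task struct and can_dispatch() are hypothetical
stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits; the real kernel values are not assumed here. */
#define DEMO_PF_FREEZE 0x1u	/* a freeze has been requested */
#define DEMO_PF_FROZEN 0x2u	/* the task is parked in the refrigerator */

struct demo_task {
	unsigned int flags;
};

static bool frozen(const struct demo_task *t)
{
	return (t->flags & DEMO_PF_FROZEN) != 0;
}

static bool freezing(const struct demo_task *t)
{
	return (t->flags & DEMO_PF_FREEZE) != 0;
}

/* True only when it looks safe to hand work to the task and wake it. */
static bool can_dispatch(const struct demo_task *t)
{
	assert(t != NULL);
	return !frozen(t) && !freezing(t);
}
```

With this model a task is skipped both while it sits in the refrigerator and
in the window after the freeze request but before it gets there.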



diff -ru linux-2.6.17.org/mm/pdflush.c linux-2.6.17/mm/pdflush.c
--- linux-2.6.17.org/mm/pdflush.c	2006-03-20 06:53:29.000000000 +0100
+++ linux-2.6.17/mm/pdflush.c	2006-05-20 14:22:35.000000000 +0200
@@ -213,12 +213,16 @@
 		struct pdflush_work *pdf;
 
 		pdf = list_entry(pdflush_list.next, struct pdflush_work, list);
-		list_del_init(&pdf->list);
-		if (list_empty(&pdflush_list))
-			last_empty_jifs = jiffies;
-		pdf->fn = fn;
-		pdf->arg0 = arg0;
-		wake_up_process(pdf->who);
+		if (!frozen(pdf->who)) {
+			list_del_init(&pdf->list);
+			if (list_empty(&pdflush_list))
+				last_empty_jifs = jiffies;
+			pdf->fn = fn;
+			pdf->arg0 = arg0;
+			wake_up_process(pdf->who);
+		}
+		else
+			ret = -1;
 		spin_unlock_irqrestore(&pdflush_lock, flags);
 	}
 	return ret;

-- 
Peter Lundkvist


* Re: [PATCH] Page writeback broken after resume: wb_timer lost
  2006-05-20 13:03 [PATCH] Page writeback broken after resume: wb_timer lost Peter Lundkvist
@ 2006-05-20 17:37 ` Andrew Morton
  2006-05-20 22:50   ` Pavel Machek
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2006-05-20 17:37 UTC (permalink / raw)
  To: Peter Lundkvist; +Cc: linux-kernel, Pavel Machek, Rafael J. Wysocki

Peter Lundkvist <p.lundkvist@telia.com> wrote:
>
> Hi,
> I have noticed for some time that nr_dirty never drops but
> increases except when VM pressure forces it down. This only
> occurs after a resume, never on a freshly booted system.
> 
> It seems the wb_timer is lost when the timer function is
> trying to start a frozen pdflush thread, and this occurs
> during suspend or resume.
> 
> I have included a patch which works for me. I don't know whether
> the test should also include a check for freezing to be safe, i.e.
>   if ( !frozen(..) && !freezing(..) )
> 
> 
> 
> diff -ru linux-2.6.17.org/mm/pdflush.c linux-2.6.17/mm/pdflush.c
> --- linux-2.6.17.org/mm/pdflush.c	2006-03-20 06:53:29.000000000 +0100
> +++ linux-2.6.17/mm/pdflush.c	2006-05-20 14:22:35.000000000 +0200
> @@ -213,12 +213,16 @@
>  		struct pdflush_work *pdf;
>  
>  		pdf = list_entry(pdflush_list.next, struct pdflush_work, list);
> -		list_del_init(&pdf->list);
> -		if (list_empty(&pdflush_list))
> -			last_empty_jifs = jiffies;
> -		pdf->fn = fn;
> -		pdf->arg0 = arg0;
> -		wake_up_process(pdf->who);
> +		if (!frozen(pdf->who)) {
> +			list_del_init(&pdf->list);
> +			if (list_empty(&pdflush_list))
> +				last_empty_jifs = jiffies;
> +			pdf->fn = fn;
> +			pdf->arg0 = arg0;
> +			wake_up_process(pdf->who);
> +		}
> +		else
> +			ret = -1;
>  		spin_unlock_irqrestore(&pdflush_lock, flags);
>  	}
>  	return ret;

Maybe the code over in page-writeback.c should just rearm the timer within
the timer handler rather than waiting for a pdflush thread to do it.  I'll
think about that.

But the main question is: what on earth is going on here?  We've taken a
kernel thread and we've done a wake_up_process() on it, but because it was
in a frozen state it just never gets to run, even after the resume. 
Presumably it goes back into interruptible sleep after the resume.  We took
it off the list (in the expectation that it'd run again) so we've lost
control of it.

Pavel, Rafael: this amounts to a lost wakeup.  What's the story?
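
The "rearm within the timer handler" idea can be sketched in userspace as
follows. This is only a model under stated assumptions: demo_timer,
demo_state, DEMO_INTERVAL, wb_timer_fn() and tick() are hypothetical names,
and the real page-writeback.c code is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_INTERVAL 5UL	/* hypothetical writeback period, in ticks */

struct demo_timer {
	unsigned long expires;	/* tick at which the timer fires */
	bool armed;
};

struct demo_state {
	struct demo_timer wb_timer;
	bool worker_frozen;	/* stands in for a frozen pdflush thread */
	unsigned int work_done;
};

/* Handler that re-arms itself first, then tries to dispatch work. */
static void wb_timer_fn(struct demo_state *s, unsigned long now)
{
	assert(!s->wb_timer.armed);	/* a firing timer is no longer pending */
	s->wb_timer.expires = now + DEMO_INTERVAL;
	s->wb_timer.armed = true;
	if (!s->worker_frozen)
		s->work_done++;		/* the worker would run the callout */
}

/* Advance the model clock by one step and fire the timer if it is due. */
static void tick(struct demo_state *s, unsigned long now)
{
	if (s->wb_timer.armed && now >= s->wb_timer.expires) {
		s->wb_timer.armed = false;
		wb_timer_fn(s, now);
	}
}
```

Because the handler re-arms before it looks at the worker, a frozen worker
costs one missed writeback pass rather than the timer itself.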


* Re: [PATCH] Page writeback broken after resume: wb_timer lost
  2006-05-20 17:37 ` Andrew Morton
@ 2006-05-20 22:50   ` Pavel Machek
  2006-05-21  0:12     ` Andrew Morton
  0 siblings, 1 reply; 14+ messages in thread
From: Pavel Machek @ 2006-05-20 22:50 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Peter Lundkvist, linux-kernel, Rafael J.  Wysocki

Hi!

> > I have noticed for some time that nr_dirty never drops but
> > increases except when VM pressure forces it down. This only
> > occurs after a resume, never on a freshly booted system.
> > 
> > It seems the wb_timer is lost when the timer function is
> > trying to start a frozen pdflush thread, and this occurs
> > during suspend or resume.
> > 
> > I have included a patch which works for me. I don't know whether
> > the test should also include a check for freezing to be safe, i.e.
> >   if ( !frozen(..) && !freezing(..) )

Yep, I have seen this too. Sync took *way* too long and I believe I
lost some data because of this problem.

> Maybe the code over in page-writeback.c should just rearm the timer within
> the timer handler rather than waiting for a pdflush thread to do it.  I'll
> think about that.
> 
> But the main question is: what on earth is going on here?  We've taken a
> kernel thread and we've done a wake_up_process() on it, but because it was
> in a frozen state it just never gets to run, even after the resume. 
> Presumably it goes back into interruptible sleep after the resume.  We took
> it off the list (in the expectation that it'd run again) so we've lost
> control of it.

I guess you should not try to wake up a process while it is frozen. Such
wakeups are likely to get lost. Should we add some BUG_ON() somewhere?

...we have to eat some wakeups, because we fake some.

Or perhaps we should do WARN_ON(frozen(current)) just after schedule()
below?

> Pavel, Rafael: this amounts to a lost wakeup.  What's the story?

								Pavel

Refrigerator looks like this:

/* Refrigerator is place where frozen processes are stored :-). */
void refrigerator(void)
{
        /* Hmm, should we be allowed to suspend when there are realtime
           processes around? */
        long save;
        save = current->state;
        pr_debug("%s entered refrigerator\n", current->comm);
        printk("=");

        frozen_process(current);
        spin_lock_irq(&current->sighand->siglock);
        recalc_sigpending(); /* We sent fake signal, clean it up */
        spin_unlock_irq(&current->sighand->siglock);

        while (frozen(current)) {
                current->state = TASK_UNINTERRUPTIBLE;
                schedule();
        }
        pr_debug("%s left refrigerator\n", current->comm);
        current->state = save;
}
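
To see why a wakeup sent to a frozen task is eaten, here is a small state
model of the loop above. It is a simplification, not kernel code: the
demo_* names are hypothetical, and demo_wake() collapses wake_up_process()
plus one pass of the while (frozen(current)) loop into a single step.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_state { DEMO_RUNNING, DEMO_INTERRUPTIBLE, DEMO_UNINTERRUPTIBLE };

struct demo_task {
	enum demo_state state;
	enum demo_state saved;	/* state recorded on entry to the refrigerator */
	bool frozen;
};

/* Entering the refrigerator: remember the old state, then park. */
static void demo_freeze(struct demo_task *t)
{
	t->saved = t->state;
	t->frozen = true;
	t->state = DEMO_UNINTERRUPTIBLE;
}

/* Models a wakeup followed by one pass of the refrigerator loop. */
static void demo_wake(struct demo_task *t)
{
	t->state = DEMO_RUNNING;
	if (t->frozen)
		t->state = DEMO_UNINTERRUPTIBLE;	/* re-parked: wakeup eaten */
}

/* Thaw: leave the loop and restore the state saved on entry. */
static void demo_thaw(struct demo_task *t)
{
	assert(t->frozen);
	t->frozen = false;
	t->state = t->saved;
}
```

In this model nothing records that a wakeup arrived while the task was
frozen, so after the thaw the task looks exactly as it did before the freeze.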

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: [PATCH] Page writeback broken after resume: wb_timer lost
  2006-05-20 22:50   ` Pavel Machek
@ 2006-05-21  0:12     ` Andrew Morton
  2006-05-21  6:52       ` Peter Lundkvist
                         ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Andrew Morton @ 2006-05-21  0:12 UTC (permalink / raw)
  To: Pavel Machek; +Cc: p.lundkvist, linux-kernel, rjw

Pavel Machek <pavel@ucw.cz> wrote:
>
> Refrigerator looks like this:
> 
>  /* Refrigerator is place where frozen processes are stored :-). */
>  void refrigerator(void)
>  {
>          /* Hmm, should we be allowed to suspend when there are realtime
>             processes around? */
>          long save;
>          save = current->state;
>          pr_debug("%s entered refrigerator\n", current->comm);
>          printk("=");
> 
>          frozen_process(current);
>          spin_lock_irq(&current->sighand->siglock);
>          recalc_sigpending(); /* We sent fake signal, clean it up */
>          spin_unlock_irq(&current->sighand->siglock);
> 
>          while (frozen(current)) {
>                  current->state = TASK_UNINTERRUPTIBLE;
>                  schedule();
>          }
>          pr_debug("%s left refrigerator\n", current->comm);
>          current->state = save;
>  }

Well that's a crock, isn't it?


Peter, does this fix it?


From: Andrew Morton <akpm@osdl.org>

pdflush is carefully designed to ensure that all wakeups have some
corresponding work to do - if a woken-up pdflush thread discovers that it
hasn't been given any work to do then this is considered an error.

That all broke when swsusp came along - because a timer-delivered wakeup to a
frozen pdflush thread will just get lost.  This causes the pdflush thread to
get lost as well: the writeback timer is supposed to be re-armed by pdflush in
process context, but pdflush doesn't execute the callout which does this.

Fix that up by ignoring the return value from try_to_freeze(): just proceed,
see if we have any work pending and only go back to sleep if that is not the
case.


Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 mm/pdflush.c |    9 ++-------
 1 files changed, 2 insertions(+), 7 deletions(-)

diff -puN mm/pdflush.c~pdflush-handle-resume-wakeups mm/pdflush.c
--- devel/mm/pdflush.c~pdflush-handle-resume-wakeups	2006-05-20 17:02:21.000000000 -0700
+++ devel-akpm/mm/pdflush.c	2006-05-20 17:11:25.000000000 -0700
@@ -104,13 +104,8 @@ static int __pdflush(struct pdflush_work
 		list_move(&my_work->list, &pdflush_list);
 		my_work->when_i_went_to_sleep = jiffies;
 		spin_unlock_irq(&pdflush_lock);
-
 		schedule();
-		if (try_to_freeze()) {
-			spin_lock_irq(&pdflush_lock);
-			continue;
-		}
-
+		try_to_freeze();
 		spin_lock_irq(&pdflush_lock);
 		if (!list_empty(&my_work->list)) {
 			printk("pdflush: bogus wakeup!\n");
@@ -118,7 +113,7 @@ static int __pdflush(struct pdflush_work
 			continue;
 		}
 		if (my_work->fn == NULL) {
-			printk("pdflush: NULL work function\n");
+			printk("pflush: resuming\n");
 			continue;
 		}
 		spin_unlock_irq(&pdflush_lock);
_
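
The difference between the old and new loop can be modelled in userspace
roughly as below. This is a sketch under assumptions, not the kernel code:
demo_work, demo_dispatch() and demo_worker_pass() are hypothetical names,
on_idle_list stands in for !list_empty(&my_work->list), and was_frozen
stands in for a nonzero return from try_to_freeze().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_work {
	void (*fn)(unsigned long);
	unsigned long arg0;
	bool on_idle_list;
};

static unsigned int demo_calls;

/* Stand-in for the writeback callout; it only counts invocations. */
static void demo_writeback(unsigned long arg0)
{
	(void)arg0;
	demo_calls++;
}

/* Dispatcher side: take an idle worker off the list and give it work. */
static void demo_dispatch(struct demo_work *w, void (*fn)(unsigned long),
			  unsigned long arg0)
{
	assert(w->on_idle_list);
	w->on_idle_list = false;
	w->fn = fn;
	w->arg0 = arg0;
}

/* One pass of the worker loop after schedule(); true if the work ran. */
static bool demo_worker_pass(struct demo_work *w, bool was_frozen,
			     bool skip_after_freeze)
{
	if (was_frozen && skip_after_freeze) {
		w->on_idle_list = true;	/* old loop: restart, work is dropped */
		return false;
	}
	if (w->on_idle_list)
		return false;		/* bogus wakeup */
	if (w->fn == NULL) {
		w->on_idle_list = true;	/* nothing pending: sleep again */
		return false;
	}
	w->fn(w->arg0);
	w->fn = NULL;
	w->on_idle_list = true;
	return true;
}
```

Calling demo_worker_pass() with skip_after_freeze set models the old
`if (try_to_freeze()) continue;` path; with it clear, the pending work
function is still found and run after a resume.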





* Re: [PATCH] Page writeback broken after resume: wb_timer lost
  2006-05-21  0:12     ` Andrew Morton
@ 2006-05-21  6:52       ` Peter Lundkvist
  2006-05-21 10:08       ` Pavel Machek
  2006-06-16 21:24       ` Johannes Stezenbach
  2 siblings, 0 replies; 14+ messages in thread
From: Peter Lundkvist @ 2006-05-21  6:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Pavel Machek, linux-kernel, rjw

On Sat, May 20, 2006 at 05:12:44PM -0700, Andrew Morton wrote:
> 
> Peter, does this fix it?
> 

Yes, it does fix the problem.

Hopefully we'll see this fix in the next stable update, because
of the risk of data loss.

> 
> From: Andrew Morton <akpm@osdl.org>
> 
> pdflush is carefully designed to ensure that all wakeups have some
> corresponding work to do - if a woken-up pdflush thread discovers that it
> hasn't been given any work to do then this is considered an error.
> 
> That all broke when swsusp came along - because a timer-delivered wakeup to a
> frozen pdflush thread will just get lost.  This causes the pdflush thread to
> get lost as well: the writeback timer is supposed to be re-armed by pdflush in
> process context, but pdflush doesn't execute the callout which does this.
> 
> Fix that up by ignoring the return value from try_to_freeze(): just proceed,
> see if we have any work pending and only go back to sleep if that is not the
> case.
> 
> 
> Signed-off-by: Andrew Morton <akpm@osdl.org>
> ---
> 
>  mm/pdflush.c |    9 ++-------
>  1 files changed, 2 insertions(+), 7 deletions(-)
> 
> diff -puN mm/pdflush.c~pdflush-handle-resume-wakeups mm/pdflush.c
> --- devel/mm/pdflush.c~pdflush-handle-resume-wakeups	2006-05-20 17:02:21.000000000 -0700
> +++ devel-akpm/mm/pdflush.c	2006-05-20 17:11:25.000000000 -0700
> @@ -104,13 +104,8 @@ static int __pdflush(struct pdflush_work
>  		list_move(&my_work->list, &pdflush_list);
>  		my_work->when_i_went_to_sleep = jiffies;
>  		spin_unlock_irq(&pdflush_lock);
> -
>  		schedule();
> -		if (try_to_freeze()) {
> -			spin_lock_irq(&pdflush_lock);
> -			continue;
> -		}
> -
> +		try_to_freeze();
>  		spin_lock_irq(&pdflush_lock);
>  		if (!list_empty(&my_work->list)) {
>  			printk("pdflush: bogus wakeup!\n");
> @@ -118,7 +113,7 @@ static int __pdflush(struct pdflush_work
>  			continue;
>  		}
>  		if (my_work->fn == NULL) {
> -			printk("pdflush: NULL work function\n");
> +			printk("pflush: resuming\n");
>  			continue;
>  		}
>  		spin_unlock_irq(&pdflush_lock);
> _
> 
> 
> 


* Re: [PATCH] Page writeback broken after resume: wb_timer lost
  2006-05-21  0:12     ` Andrew Morton
  2006-05-21  6:52       ` Peter Lundkvist
@ 2006-05-21 10:08       ` Pavel Machek
  2006-06-16 21:24       ` Johannes Stezenbach
  2 siblings, 0 replies; 14+ messages in thread
From: Pavel Machek @ 2006-05-21 10:08 UTC (permalink / raw)
  To: Andrew Morton; +Cc: p.lundkvist, linux-kernel, rjw

Hi!

> Well that's a crock, isn't it?
> 
> 
> Peter, does this fix it?
> 
> 
> From: Andrew Morton <akpm@osdl.org>
> 
> pdflush is carefully designed to ensure that all wakeups have some
> corresponding work to do - if a woken-up pdflush thread discovers that it
> hasn't been given any work to do then this is considered an error.
> 
> That all broke when swsusp came along - because a timer-delivered wakeup to a
> frozen pdflush thread will just get lost.  This causes the pdflush thread to
> get lost as well: the writeback timer is supposed to be re-armed by pdflush in
> process context, but pdflush doesn't execute the callout which does this.
> 
> Fix that up by ignoring the return value from try_to_freeze(): just proceed,
> see if we have any work pending and only go back to sleep if that is not the
> case.

Looks okay to me.

							Pavel
	(who wonders what are the other places where we have similar
	problems)
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: [PATCH] Page writeback broken after resume: wb_timer lost
  2006-05-21  0:12     ` Andrew Morton
  2006-05-21  6:52       ` Peter Lundkvist
  2006-05-21 10:08       ` Pavel Machek
@ 2006-06-16 21:24       ` Johannes Stezenbach
  2006-06-16 23:12         ` Nigel Cunningham
  2006-06-19 15:41         ` Mark Lord
  2 siblings, 2 replies; 14+ messages in thread
From: Johannes Stezenbach @ 2006-06-16 21:24 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Pavel Machek, p.lundkvist, linux-kernel, rjw, Mark Lord

On Sat, May 20, 2006, Andrew Morton wrote:
> From: Andrew Morton <akpm@osdl.org>
> 
> pdflush is carefully designed to ensure that all wakeups have some
> corresponding work to do - if a woken-up pdflush thread discovers that it
> hasn't been given any work to do then this is considered an error.
> 
> That all broke when swsusp came along - because a timer-delivered wakeup to a
> frozen pdflush thread will just get lost.  This causes the pdflush thread to
> get lost as well: the writeback timer is supposed to be re-armed by pdflush in
> process context, but pdflush doesn't execute the callout which does this.
> 
> Fix that up by ignoring the return value from try_to_freeze(): just proceed,
> see if we have any work pending and only go back to sleep if that is not the
> case.
> 
> 
> Signed-off-by: Andrew Morton <akpm@osdl.org>


I've tested this patch for about a week now, by applying it to
the 2.6.17-rc3 kernel on my laptop, which I've been using
for more than a month now. This patch seems to cure the
mysterious symptoms reported in February:

http://lkml.org/lkml/2006/2/6/167
http://lkml.org/lkml/2006/2/6/170
http://lkml.org/lkml/2006/2/13/424
etc.

Actually I didn't remember to check "Dirty:" in /proc/meminfo,
but when I "sync"ed at the end of my workday, just prior to
swsuspending it, sync returned immediately. With unpatched
2.6.17-rc3, sync would take half a minute. Maybe Mark can give
this patch a spin to check if it cures his problem, too.
(I still use vmware, so vmware was not the culprit.)


Thanks,
Johannes


> ---
> 
>  mm/pdflush.c |    9 ++-------
>  1 files changed, 2 insertions(+), 7 deletions(-)
> 
> diff -puN mm/pdflush.c~pdflush-handle-resume-wakeups mm/pdflush.c
> --- devel/mm/pdflush.c~pdflush-handle-resume-wakeups	2006-05-20 17:02:21.000000000 -0700
> +++ devel-akpm/mm/pdflush.c	2006-05-20 17:11:25.000000000 -0700
> @@ -104,13 +104,8 @@ static int __pdflush(struct pdflush_work
>  		list_move(&my_work->list, &pdflush_list);
>  		my_work->when_i_went_to_sleep = jiffies;
>  		spin_unlock_irq(&pdflush_lock);
> -
>  		schedule();
> -		if (try_to_freeze()) {
> -			spin_lock_irq(&pdflush_lock);
> -			continue;
> -		}
> -
> +		try_to_freeze();
>  		spin_lock_irq(&pdflush_lock);
>  		if (!list_empty(&my_work->list)) {
>  			printk("pdflush: bogus wakeup!\n");
> @@ -118,7 +113,7 @@ static int __pdflush(struct pdflush_work
>  			continue;
>  		}
>  		if (my_work->fn == NULL) {
> -			printk("pdflush: NULL work function\n");
> +			printk("pflush: resuming\n");
>  			continue;
>  		}
>  		spin_unlock_irq(&pdflush_lock);
> _
> 
> 
> 
> 


* Re: [PATCH] Page writeback broken after resume: wb_timer lost
  2006-06-16 21:24       ` Johannes Stezenbach
@ 2006-06-16 23:12         ` Nigel Cunningham
  2006-06-19 15:41         ` Mark Lord
  1 sibling, 0 replies; 14+ messages in thread
From: Nigel Cunningham @ 2006-06-16 23:12 UTC (permalink / raw)
  To: Johannes Stezenbach
  Cc: Andrew Morton, Pavel Machek, p.lundkvist, linux-kernel, rjw,
	Mark Lord

Hi.

Sorry for coming late to the party. I've only just seen this thread, and have 
had reports of it too.

My concern is, shouldn't we be dealing with the cause rather than just one 
symptom (and as Pavel rightly wondered, assuming there aren't more)? I'm not 
sure that I have a solution, but I think the point is worth raising again.

Do we want something like adding the process's task struct to timer data, and 
get the timer code to delay firing timers for frozen processes?
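
A rough userspace sketch of that idea, purely as an illustration: the owner
field, the demo_* types and demo_run_timer() are hypothetical and do not
correspond to the kernel's timer API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_task {
	bool frozen;
};

struct demo_timer {
	unsigned long expires;
	struct demo_task *owner;	/* hypothetical: task the timer serves */
	void (*fn)(struct demo_timer *);
	bool pending;
	unsigned int fired;
};

/* Example callback; it only counts how often the timer fired. */
static void demo_count(struct demo_timer *t)
{
	t->fired++;
}

/* Fire an expired timer unless its owner is frozen; then keep it pending. */
static bool demo_run_timer(struct demo_timer *t, unsigned long now)
{
	assert(t->fn != NULL);
	if (!t->pending || now < t->expires)
		return false;
	if (t->owner != NULL && t->owner->frozen)
		return false;		/* deferred: retried on a later tick */
	t->pending = false;
	t->fn(t);
	return true;
}
```

In this model an expired timer whose owner is frozen simply stays pending
and fires on the first check after the owner is thawed.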

Regards,

Nigel
-- 
Nigel, Michelle and Alisdair Cunningham
5 Mitchell Street
Cobden 3266
Victoria, Australia


* Re: [PATCH] Page writeback broken after resume: wb_timer lost
  2006-06-16 21:24       ` Johannes Stezenbach
  2006-06-16 23:12         ` Nigel Cunningham
@ 2006-06-19 15:41         ` Mark Lord
  2006-06-21  3:38           ` Mark Lord
  1 sibling, 1 reply; 14+ messages in thread
From: Mark Lord @ 2006-06-19 15:41 UTC (permalink / raw)
  To: Johannes Stezenbach
  Cc: Andrew Morton, Pavel Machek, p.lundkvist, linux-kernel, rjw,
	Mark Lord

Johannes Stezenbach wrote:
> On Sat, May 20, 2006, Andrew Morton wrote:
>> From: Andrew Morton <akpm@osdl.org>
>>
>> pdflush is carefully designed to ensure that all wakeups have some
>> corresponding work to do - if a woken-up pdflush thread discovers that it
>> hasn't been given any work to do then this is considered an error.
>>
>> That all broke when swsusp came along - because a timer-delivered wakeup to a
>> frozen pdflush thread will just get lost.  This causes the pdflush thread to
>> get lost as well: the writeback timer is supposed to be re-armed by pdflush in
>> process context, but pdflush doesn't execute the callout which does this.
>>
>> Fix that up by ignoring the return value from try_to_freeze(): just proceed,
>> see if we have any work pending and only go back to sleep if that is not the
>> case.
>>
>>
>> Signed-off-by: Andrew Morton <akpm@osdl.org>
> 
> 
> I've tested this patch for about a week now, by applying it to
> the 2.6.17-rc3 kernel on my laptop, which I've been using
> for more than a month now. This patch seems to cure the
> mysterious symptoms reported in February:
> 
> http://lkml.org/lkml/2006/2/6/167
> http://lkml.org/lkml/2006/2/6/170
> http://lkml.org/lkml/2006/2/13/424
> etc.
> 
> Actually I didn't remember to check "Dirty:" in /proc/meminfo,
> but when I "sync"ed at the end of my workday, just prior to
> swsuspending it, sync returned immediately. With unpatched
> 2.6.17-rc3, sync would take half a minute. Maybe Mark can give
> this patch a spin to check if it cures his problem, too.
> (I still use vmware, so vmware was not the culprit.)

I just gave it a try here.  With or without a suspend/resume cycle after boot,
the "sync" time is much quicker.  But the Dirty count in /proc/meminfo
still shows very huge (eg. 600MB) values that never really get smaller
until I type "sync".  But that subsequent "sync" only takes a couple
of seconds now, rather than 10-20 seconds like before.

Dunno what that all means -- I'm still keeping my little daemon around
to do periodic "sync" calls for safety.

Cheers


* Re: [PATCH] Page writeback broken after resume: wb_timer lost
  2006-06-19 15:41         ` Mark Lord
@ 2006-06-21  3:38           ` Mark Lord
  2006-06-21  3:54             ` Andrew Morton
  0 siblings, 1 reply; 14+ messages in thread
From: Mark Lord @ 2006-06-21  3:38 UTC (permalink / raw)
  To: Johannes Stezenbach
  Cc: Andrew Morton, Pavel Machek, p.lundkvist, linux-kernel, rjw

Mark Lord wrote:
> Johannes Stezenbach wrote:
>> On Sat, May 20, 2006, Andrew Morton wrote:
>>> From: Andrew Morton <akpm@osdl.org>
>>>
>>> pdflush is carefully designed to ensure that all wakeups have some
>>> corresponding work to do - if a woken-up pdflush thread discovers 
>>> that it
>>> hasn't been given any work to do then this is considered an error.
>>>
>>> That all broke when swsusp came along - because a timer-delivered 
>>> wakeup to a
>>> frozen pdflush thread will just get lost.  This causes the pdflush 
>>> thread to
>>> get lost as well: the writeback timer is supposed to be re-armed by 
>>> pdflush in
>>> process context, but pdflush doesn't execute the callout which does 
>>> this.
>>>
>>> Fix that up by ignoring the return value from try_to_freeze(): just 
>>> proceed,
>>> see if we have any work pending and only go back to sleep if that is 
>>> not the
>>> case.
>>>
>>>
>>> Signed-off-by: Andrew Morton <akpm@osdl.org>
>>
>>
>> I've tested this patch for about a week now, by applying it to
>> the 2.6.17-rc3 kernel on my laptop, which I've been using
>> for more than a month now. This patch seems to cure the
>> mysterious symptoms reported in February:
>>
>> http://lkml.org/lkml/2006/2/6/167
>> http://lkml.org/lkml/2006/2/6/170
>> http://lkml.org/lkml/2006/2/13/424
>> etc.
>>
>> Actually I didn't remember to check "Dirty:" in /proc/meminfo,
>> but when I "sync"ed at the end of my workday, just prior to
>> swsuspending it, sync returned immediately. With unpatched
>> 2.6.17-rc3, sync would take half a minute
...
> I just gave it a try here.  With or without a suspend/resume cycle after 
> boot,
> the "sync" time is much quicker.  But the Dirty count in /proc/meminfo
> still shows very huge (eg. 600MB) values that never really get smaller
> until I type "sync".  But that subsequent "sync" only takes a couple
> of seconds now, rather than 10-20 seconds like before.
..

Yup, behaviour is *definitely* much better now.  I'm not sure why
the /proc/meminfo "Dirty" count lags behind reality, but the disk
is being kept much more up-to-date than without this patch.

Thanks!


* Re: [PATCH] Page writeback broken after resume: wb_timer lost
  2006-06-21  3:38           ` Mark Lord
@ 2006-06-21  3:54             ` Andrew Morton
  2006-06-21  4:10               ` Mark Lord
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2006-06-21  3:54 UTC (permalink / raw)
  To: Mark Lord; +Cc: js, pavel, p.lundkvist, linux-kernel, rjw

On Tue, 20 Jun 2006 23:38:57 -0400
Mark Lord <lkml@rtr.ca> wrote:

> > I just gave it a try here.  With or without a suspend/resume cycle after 
> > boot,
> > the "sync" time is much quicker.  But the Dirty count in /proc/meminfo
> > still shows very huge (eg. 600MB) values that never really get smaller
> > until I type "sync".  But that subsequent "sync" only takes a couple
> > of seconds now, rather than 10-20 seconds like before.
> ..
> 
> Yup, behaviour is *definitely* much better now.  I'm not sure why
> the /proc/meminfo "Dirty" count lags behind reality, but the disk
> is being kept much more up-to-date than without this patch.

Are you able to come up with a foolproof set of steps which would allow the
laggy-dirtiness to be reproduced by yours truly?


* Re: [PATCH] Page writeback broken after resume: wb_timer lost
  2006-06-21  3:54             ` Andrew Morton
@ 2006-06-21  4:10               ` Mark Lord
  2006-06-21  4:19                 ` Andrew Morton
  0 siblings, 1 reply; 14+ messages in thread
From: Mark Lord @ 2006-06-21  4:10 UTC (permalink / raw)
  To: Andrew Morton; +Cc: js, pavel, p.lundkvist, linux-kernel, rjw

Andrew Morton wrote:
> On Tue, 20 Jun 2006 23:38:57 -0400
> Mark Lord <lkml@rtr.ca> wrote:
> 
>>> I just gave it a try here.  With or without a suspend/resume cycle after 
>>> boot,
>>> the "sync" time is much quicker.  But the Dirty count in /proc/meminfo
>>> still shows very huge (eg. 600MB) values that never really get smaller
>>> until I type "sync".  But that subsequent "sync" only takes a couple
>>> of seconds now, rather than 10-20 seconds like before.
>> ..
>>
>> Yup, behaviour is *definitely* much better now.  I'm not sure why
>> the /proc/meminfo "Dirty" count lags behind reality, but the disk
>> is being kept much more up-to-date than without this patch.
> 
> Are you able to come up with a foolproof set of steps which would allow the
> laggy-dirtiness to be reproduced by yours truly?

Heh.. don't I wish!

The best is still as described originally:

http://lkml.org/lkml/2006/2/6/170

Basically, "cat" a ton of huge files together into a single new one,
and then watch /proc/meminfo to see what happens.  For me, the count
there still just hangs at some big number like 500MB until I type "sync",
at which point it (nearly) instantly now goes to zero.

Previous to this patch, the "sync" actually resulted in a ton of disk writes,
but now those happen on the tail end of the "cat" command, as they should.
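
For anyone who wants to log the number while the "cat" runs, a minimal C
helper along these lines should do. It assumes the usual "Dirty:   <n> kB"
line in /proc/meminfo; parse_dirty_kb() and read_dirty_kb() are names made
up for this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Scan meminfo-style text for a line starting with "Dirty:" and return its
 * value in kB, or -1 if the line is missing or malformed.
 */
static long parse_dirty_kb(const char *text)
{
	const char *p = text;

	assert(text != NULL);
	while (p != NULL && *p != '\0') {
		long kb;

		if (strncmp(p, "Dirty:", 6) == 0 &&
		    sscanf(p + 6, "%ld", &kb) == 1)
			return kb;
		p = strchr(p, '\n');	/* move to the start of the next line */
		if (p != NULL)
			p++;
	}
	return -1;
}

/* Read /proc/meminfo (Linux only) and return the Dirty value in kB, or -1. */
static long read_dirty_kb(void)
{
	char buf[8192];
	size_t n;
	FILE *f = fopen("/proc/meminfo", "r");

	if (f == NULL)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return parse_dirty_kb(buf);
}
```

Calling read_dirty_kb() once a second during and after the copy would show
whether the count drains on its own or only after "sync".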

My kernel .config is available from http://rtr.ca/dell_i9300/kernel/latest/

Cheers


* Re: [PATCH] Page writeback broken after resume: wb_timer lost
  2006-06-21  4:10               ` Mark Lord
@ 2006-06-21  4:19                 ` Andrew Morton
  2006-06-22 20:25                   ` Mark Lord
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2006-06-21  4:19 UTC (permalink / raw)
  To: Mark Lord; +Cc: js, pavel, p.lundkvist, linux-kernel, rjw

On Wed, 21 Jun 2006 00:10:55 -0400
Mark Lord <lkml@rtr.ca> wrote:

> Andrew Morton wrote:
> > On Tue, 20 Jun 2006 23:38:57 -0400
> > Mark Lord <lkml@rtr.ca> wrote:
> > 
> >>> I just gave it a try here.  With or without a suspend/resume cycle after 
> >>> boot,
> >>> the "sync" time is much quicker.  But the Dirty count in /proc/meminfo
> >>> still shows very huge (eg. 600MB) values that never really get smaller
> >>> until I type "sync".  But that subsequent "sync" only takes a couple
> >>> of seconds now, rather than 10-20 seconds like before.
> >> ..
> >>
> >> Yup, behaviour is *definitely* much better now.  I'm not sure why
> >> the /proc/meminfo "Dirty" count lags behind reality, but the disk
> >> is being kept much more up-to-date than without this patch.
> > 
> > Are you able to come up with a foolproof set of steps which would allow the
> > laggy-dirtiness to be reproduced by yours truly?
> 
> Heh.. don't I wish!
> 
> The best is still as described originally:
> 
> http://lkml.org/lkml/2006/2/6/170
> 
> Basically, "cat" a ton of huge files together into a single new one,
> and then watch /proc/meminfo to see what happens.  For me, the count
> there still just hangs at some big number like 500MB until I type "sync",
> at which point it (nearly) instantly now goes to zero.
> 
> Previous to this patch, the "sync" actually resulted in a ton of disk writes,
> but now those happen on the tail end of the "cat" command, as they should.
> 
> My kernel .config is available from http://rtr.ca/dell_i9300/kernel/latest/

Is that after a suspend/resume, or does it happen after a reboot?

Are you sure all the dirty memory doesn't get autocleaned after 30-60
seconds?



* Re: [PATCH] Page writeback broken after resume: wb_timer lost
  2006-06-21  4:19                 ` Andrew Morton
@ 2006-06-22 20:25                   ` Mark Lord
  0 siblings, 0 replies; 14+ messages in thread
From: Mark Lord @ 2006-06-22 20:25 UTC (permalink / raw)
  To: Andrew Morton; +Cc: js, pavel, p.lundkvist, linux-kernel, rjw

Andrew Morton wrote:
>
> Is that after a suspend/resume, or does it happen after a reboot?

Definitely after a suspend/resume (RAM).
It's been too long for me to remember about after a reboot.

> Are you sure all the dirty memory doesn't get autocleaned after 30-60
> seconds?

No, it does not get autocleaned (without this patch) after 30-60 *minutes*,
let alone 30-60 seconds.  Nor with or without laptop-mode enabled/disabled.
Nor anything else suggested.  Except for this patch.

Cheers
