Re: [RFC PATCH 0/3] Do not wait the full timeout on congestion_wait when there is no congestion

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Mel Gorman <mel@csn.ul.ie>
To: Minchan Kim <minchan.kim@gmail.com>
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Wu Fengguang <fengguang.wu@intel.com>, Jan Kara <jack@suse.cz>,
	linux-kernel@vger.kernel.org, Rik van Riel <riel@redhat.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Subject: Re: [RFC PATCH 0/3] Do not wait the full timeout on congestion_wait when there is no congestion
Date: Thu, 26 Aug 2010 18:31:47 +0100	[thread overview]
Message-ID: <20100826173147.GH20944@csn.ul.ie> (raw)
In-Reply-To: <20100826172038.GA6873@barrios-desktop>

On Fri, Aug 27, 2010 at 02:20:38AM +0900, Minchan Kim wrote:
> On Thu, Aug 26, 2010 at 04:14:13PM +0100, Mel Gorman wrote:
> > congestion_wait() is a bit stupid in that it goes to sleep even when there
> > is no congestion. This causes stalls in a number of situations and may be
> > partially responsible for bug reports about desktop interactivity.
> > 
> > This patch series aims to account for these unnecessary congestion_waits()
> > and to avoid going to sleep when there is no congestion available. Patches
> > 1 and 2 add instrumentation related to congestion which should be reuable
> > by alternative solutions to congestion_wait. Patch 3 calls cond_resched()
> > instead of going to sleep if there is no congestion.
> > 
> > Once again, I shoved this through performance test. Unlike previous tests,
> > I ran this on a ported version of my usual test-suite that should be suitable
> > for release soon. It's not quite as good as my old set but it's sufficient
> > for this and related series. The tests I ran were kernbench vmr-stream
> > iozone hackbench-sockets hackbench-pipes netperf-udp netperf-tcp sysbench
> > stress-highalloc. Sysbench was a read/write tests and stress-highalloc is
> > the usual stress the number of high order allocations that can be made while
> > the system is under severe stress. The suite contains the necessary analysis
> > scripts as well and I'd release it now except the documentation blows.
> > 
> > x86:    Intel Pentium D 3GHz with 3G RAM (no-brand machine)
> > x86-64:	AMD Phenom 9950 1.3GHz with 3G RAM (no-brand machine)
> > ppc64:	PPC970MP 2.5GHz with 3GB RAM (it's a terrasoft powerstation)
> > 
> > The disks on all of them were single disks and not particularly fast.
> > 
> > Comparison was between a 2.6.36-rc1 with patches 1 and 2 applied for
> > instrumentation and a second test with patch 3 applied.
> > 
> > In all cases, kernbench, hackbench, STREAM and iozone did not show any
> > performance difference because none of them were pressuring the system
> > enough to be calling congestion_wait() so I won't post the results.
> > About all worth noting for them is that nothing horrible appeared to break.
> > 
> > In the analysis scripts, I record unnecessary sleeps to be a sleep that
> > had no congestion. The post-processing scripts for cond_resched() will only
> > count an uncongested call to congestion_wait() as unnecessary if the process
> > actually gets scheduled. Ordinarily, we'd expect it to continue uninterrupted.
> > 
> > One vague concern I have is when too many pages are isolated, we call
> > congestion_wait(). This could now actively spin in the loop for its quanta
> > before calling cond_resched(). If it's calling with no congestion, it's
> > hard to know what the proper thing to do there is.
> 
> Suddenly, many processes could enter into the direct reclaim path by another
> reason(ex, fork bomb) regradless of congestion. backing dev congestion is 
> just one of them. 
> 

This situation applys with or without this series, right?

> I think if congestion_wait returns without calling io_schedule_timeout 
> by your patch, too_many_isolated can schedule_timeout to wait for the system's 
> calm to preventing OOM killing.
> 

More likely, to stop a loop in too_many_isolated() consuming CPU time it
can do nothing with.

> How about this?
> 
> If you don't mind, I will send the patch based on this patch series 
> after your patch settle down or Could you add this to your patch series?
> But I admit this doesn't almost affect your experiment. 
> 

I think it's a related topic so could belong with the series.

> From 70d6584e125c3954d74a69bfcb72de17244635d2 Mon Sep 17 00:00:00 2001
> From: Minchan Kim <minchan.kim@gmail.com>
> Date: Fri, 27 Aug 2010 02:06:45 +0900
> Subject: [PATCH] Wait regardless of congestion if too many pages are isolated
> 
> Suddenly, many processes could enter into the direct reclaim path
> regradless of congestion. backing dev congestion is just one of them.
> But current implementation calls congestion_wait if too many pages are isolated.
> 
> if congestion_wait returns without calling io_schedule_timeout,
> too_many_isolated can schedule_timeout to wait for the system's calm
> to preventing OOM killing.
> 

I think the reasoning here might be a little off. How about;

If many processes enter direct reclaim or memory compaction, too many pages
can get isolated. In this situation, too_many_isolated() can call
congestion_wait() but if there is no congestion, it fails to go to sleep
and instead spins until it's quota expires.

This patch checks if congestion_wait() returned without sleeping. If it
did because there was no congestion, it unconditionally goes to sleep
instead of hogging the CPU.

> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
> ---
>  mm/backing-dev.c |    5 ++---
>  mm/compaction.c  |    6 +++++-
>  mm/vmscan.c      |    6 +++++-
>  3 files changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/backing-dev.c b/mm/backing-dev.c
> index 6abe860..9431bca 100644
> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> @@ -756,8 +756,7 @@ EXPORT_SYMBOL(set_bdi_congested);
>   * @timeout: timeout in jiffies
>   *
>   * Waits for up to @timeout jiffies for a backing_dev (any backing_dev) to exit
> - * write congestion.  If no backing_devs are congested then just wait for the
> - * next write to be completed.
> + * write congestion.  If no backing_devs are congested then just returns.
>   */  
>  long congestion_wait(int sync, long timeout)
>  {
> @@ -776,7 +775,7 @@ long congestion_wait(int sync, long timeout)
>         if (atomic_read(&nr_bdi_congested[sync]) == 0) {
>                 unnecessary = true;
>                 cond_resched();
> -               ret = 0;
> +               ret = timeout;
>         } else {
>                 prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE);
>                 ret = io_schedule_timeout(timeout);
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 94cce51..7370683 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -253,7 +253,11 @@ static unsigned long isolate_migratepages(struct zone *zone,
>          * delay for some time until fewer pages are isolated
>          */  
>         while (unlikely(too_many_isolated(zone))) {
> -               congestion_wait(BLK_RW_ASYNC, HZ/10);
> +               long timeout = HZ/10;
> +               if (timeout == congestion_wait(BLK_RW_ASYNC, timeout)) {
> +                       set_current_state(TASK_INTERRUPTIBLE);
> +                       schedule_timeout(timeout);
> +               }
> 

We don't really need the timeout variable here but I see what you are
at. It's unfortunate to just go to sleep for HZ/10 but if it's not
congestion, we do not have any other event to wake up on at the moment.
We'd have to introduce a too_many_isolated waitqueue that is kicked if
pages are put back on the LRU.

This is better than spinning though.

>                 if (fatal_signal_pending(current))
>                         return 0;
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 3109ff7..f5e3e28 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1337,7 +1337,11 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
>         unsigned long nr_dirty;
>         while (unlikely(too_many_isolated(zone, file, sc))) {
> -               congestion_wait(BLK_RW_ASYNC, HZ/10);
> +               long timeout = HZ/10;
> +               if (timeout == congestion_wait(BLK_RW_ASYNC, timeout)) {
> +                       set_current_state(TASK_INTERRUPTIBLE);
> +                       schedule_timeout(timeout);
> +               }
> 
>                 /* We are about to die and free our memory. Return now. */
>                 if (fatal_signal_pending(current))

This seems very reasonable. I'll review it more carefully tomorrow and if I
spot nothing horrible, I'll add it onto the series. I'm not sure I'm hitting
the too_many_isolated() case but I cannot think of a better alternative
without adding more waitqueues.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

WARNING: multiple messages have this Message-ID (diff)

From: Mel Gorman <mel@csn.ul.ie>
To: Minchan Kim <minchan.kim@gmail.com>
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Wu Fengguang <fengguang.wu@intel.com>, Jan Kara <jack@suse.cz>,
	linux-kernel@vger.kernel.org, Rik van Riel <riel@redhat.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Subject: Re: [RFC PATCH 0/3] Do not wait the full timeout on congestion_wait when there is no congestion
Date: Thu, 26 Aug 2010 18:31:47 +0100	[thread overview]
Message-ID: <20100826173147.GH20944@csn.ul.ie> (raw)
In-Reply-To: <20100826172038.GA6873@barrios-desktop>

On Fri, Aug 27, 2010 at 02:20:38AM +0900, Minchan Kim wrote:
> On Thu, Aug 26, 2010 at 04:14:13PM +0100, Mel Gorman wrote:
> > congestion_wait() is a bit stupid in that it goes to sleep even when there
> > is no congestion. This causes stalls in a number of situations and may be
> > partially responsible for bug reports about desktop interactivity.
> > 
> > This patch series aims to account for these unnecessary congestion_waits()
> > and to avoid going to sleep when there is no congestion available. Patches
> > 1 and 2 add instrumentation related to congestion which should be reuable
> > by alternative solutions to congestion_wait. Patch 3 calls cond_resched()
> > instead of going to sleep if there is no congestion.
> > 
> > Once again, I shoved this through performance test. Unlike previous tests,
> > I ran this on a ported version of my usual test-suite that should be suitable
> > for release soon. It's not quite as good as my old set but it's sufficient
> > for this and related series. The tests I ran were kernbench vmr-stream
> > iozone hackbench-sockets hackbench-pipes netperf-udp netperf-tcp sysbench
> > stress-highalloc. Sysbench was a read/write tests and stress-highalloc is
> > the usual stress the number of high order allocations that can be made while
> > the system is under severe stress. The suite contains the necessary analysis
> > scripts as well and I'd release it now except the documentation blows.
> > 
> > x86:    Intel Pentium D 3GHz with 3G RAM (no-brand machine)
> > x86-64:	AMD Phenom 9950 1.3GHz with 3G RAM (no-brand machine)
> > ppc64:	PPC970MP 2.5GHz with 3GB RAM (it's a terrasoft powerstation)
> > 
> > The disks on all of them were single disks and not particularly fast.
> > 
> > Comparison was between a 2.6.36-rc1 with patches 1 and 2 applied for
> > instrumentation and a second test with patch 3 applied.
> > 
> > In all cases, kernbench, hackbench, STREAM and iozone did not show any
> > performance difference because none of them were pressuring the system
> > enough to be calling congestion_wait() so I won't post the results.
> > About all worth noting for them is that nothing horrible appeared to break.
> > 
> > In the analysis scripts, I record unnecessary sleeps to be a sleep that
> > had no congestion. The post-processing scripts for cond_resched() will only
> > count an uncongested call to congestion_wait() as unnecessary if the process
> > actually gets scheduled. Ordinarily, we'd expect it to continue uninterrupted.
> > 
> > One vague concern I have is when too many pages are isolated, we call
> > congestion_wait(). This could now actively spin in the loop for its quanta
> > before calling cond_resched(). If it's calling with no congestion, it's
> > hard to know what the proper thing to do there is.
> 
> Suddenly, many processes could enter into the direct reclaim path by another
> reason(ex, fork bomb) regradless of congestion. backing dev congestion is 
> just one of them. 
> 

This situation applys with or without this series, right?

> I think if congestion_wait returns without calling io_schedule_timeout 
> by your patch, too_many_isolated can schedule_timeout to wait for the system's 
> calm to preventing OOM killing.
> 

More likely, to stop a loop in too_many_isolated() consuming CPU time it
can do nothing with.

> How about this?
> 
> If you don't mind, I will send the patch based on this patch series 
> after your patch settle down or Could you add this to your patch series?
> But I admit this doesn't almost affect your experiment. 
> 

I think it's a related topic so could belong with the series.

> From 70d6584e125c3954d74a69bfcb72de17244635d2 Mon Sep 17 00:00:00 2001
> From: Minchan Kim <minchan.kim@gmail.com>
> Date: Fri, 27 Aug 2010 02:06:45 +0900
> Subject: [PATCH] Wait regardless of congestion if too many pages are isolated
> 
> Suddenly, many processes could enter into the direct reclaim path
> regradless of congestion. backing dev congestion is just one of them.
> But current implementation calls congestion_wait if too many pages are isolated.
> 
> if congestion_wait returns without calling io_schedule_timeout,
> too_many_isolated can schedule_timeout to wait for the system's calm
> to preventing OOM killing.
> 

I think the reasoning here might be a little off. How about;

If many processes enter direct reclaim or memory compaction, too many pages
can get isolated. In this situation, too_many_isolated() can call
congestion_wait() but if there is no congestion, it fails to go to sleep
and instead spins until it's quota expires.

This patch checks if congestion_wait() returned without sleeping. If it
did because there was no congestion, it unconditionally goes to sleep
instead of hogging the CPU.

> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
> ---
>  mm/backing-dev.c |    5 ++---
>  mm/compaction.c  |    6 +++++-
>  mm/vmscan.c      |    6 +++++-
>  3 files changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/backing-dev.c b/mm/backing-dev.c
> index 6abe860..9431bca 100644
> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> @@ -756,8 +756,7 @@ EXPORT_SYMBOL(set_bdi_congested);
>   * @timeout: timeout in jiffies
>   *
>   * Waits for up to @timeout jiffies for a backing_dev (any backing_dev) to exit
> - * write congestion.  If no backing_devs are congested then just wait for the
> - * next write to be completed.
> + * write congestion.  If no backing_devs are congested then just returns.
>   */  
>  long congestion_wait(int sync, long timeout)
>  {
> @@ -776,7 +775,7 @@ long congestion_wait(int sync, long timeout)
>         if (atomic_read(&nr_bdi_congested[sync]) == 0) {
>                 unnecessary = true;
>                 cond_resched();
> -               ret = 0;
> +               ret = timeout;
>         } else {
>                 prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE);
>                 ret = io_schedule_timeout(timeout);
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 94cce51..7370683 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -253,7 +253,11 @@ static unsigned long isolate_migratepages(struct zone *zone,
>          * delay for some time until fewer pages are isolated
>          */  
>         while (unlikely(too_many_isolated(zone))) {
> -               congestion_wait(BLK_RW_ASYNC, HZ/10);
> +               long timeout = HZ/10;
> +               if (timeout == congestion_wait(BLK_RW_ASYNC, timeout)) {
> +                       set_current_state(TASK_INTERRUPTIBLE);
> +                       schedule_timeout(timeout);
> +               }
> 

We don't really need the timeout variable here but I see what you are
at. It's unfortunate to just go to sleep for HZ/10 but if it's not
congestion, we do not have any other event to wake up on at the moment.
We'd have to introduce a too_many_isolated waitqueue that is kicked if
pages are put back on the LRU.

This is better than spinning though.

>                 if (fatal_signal_pending(current))
>                         return 0;
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 3109ff7..f5e3e28 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1337,7 +1337,11 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
>         unsigned long nr_dirty;
>         while (unlikely(too_many_isolated(zone, file, sc))) {
> -               congestion_wait(BLK_RW_ASYNC, HZ/10);
> +               long timeout = HZ/10;
> +               if (timeout == congestion_wait(BLK_RW_ASYNC, timeout)) {
> +                       set_current_state(TASK_INTERRUPTIBLE);
> +                       schedule_timeout(timeout);
> +               }
> 
>                 /* We are about to die and free our memory. Return now. */
>                 if (fatal_signal_pending(current))

This seems very reasonable. I'll review it more carefully tomorrow and if I
spot nothing horrible, I'll add it onto the series. I'm not sure I'm hitting
the too_many_isolated() case but I cannot think of a better alternative
without adding more waitqueues.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2010-08-26 17:32 UTC|newest]

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-08-26 15:14 [RFC PATCH 0/3] Do not wait the full timeout on congestion_wait when there is no congestion Mel Gorman
2010-08-26 15:14 ` Mel Gorman
2010-08-26 15:14 ` [PATCH 1/3] writeback: Account for time spent congestion_waited Mel Gorman
2010-08-26 15:14   ` Mel Gorman
2010-08-26 17:23   ` Minchan Kim
2010-08-26 17:23     ` Minchan Kim
2010-08-26 18:10   ` Johannes Weiner
2010-08-26 18:10     ` Johannes Weiner
2010-08-26 15:14 ` [PATCH 2/3] writeback: Record if the congestion was unnecessary Mel Gorman
2010-08-26 15:14   ` Mel Gorman
2010-08-26 17:35   ` Minchan Kim
2010-08-26 17:35     ` Minchan Kim
2010-08-26 17:41     ` Mel Gorman
2010-08-26 17:41       ` Mel Gorman
2010-08-26 18:29   ` Johannes Weiner
2010-08-26 18:29     ` Johannes Weiner
2010-08-26 20:31     ` Mel Gorman
2010-08-26 20:31       ` Mel Gorman
2010-08-27  2:12       ` Shaohua Li
2010-08-27  2:12         ` Shaohua Li
2010-08-27  9:20         ` Mel Gorman
2010-08-27  9:20           ` Mel Gorman
2010-08-27  8:16       ` Johannes Weiner
2010-08-27  8:16         ` Johannes Weiner
2010-08-27  9:24         ` Mel Gorman
2010-08-27  9:24           ` Mel Gorman
2010-08-30 13:19           ` Johannes Weiner
2010-08-31 15:02             ` Mel Gorman
2010-08-31 15:02               ` Mel Gorman
2010-09-02 15:49               ` Johannes Weiner
2010-09-02 15:49                 ` Johannes Weiner
2010-09-02 18:28                 ` Mel Gorman
2010-09-02 18:28                   ` Mel Gorman
2010-08-29 16:03     ` Minchan Kim
2010-08-29 16:03       ` Minchan Kim
2010-08-26 15:14 ` [PATCH 3/3] writeback: Do not congestion sleep when there are no congested BDIs Mel Gorman
2010-08-26 15:14   ` Mel Gorman
2010-08-26 17:38   ` Minchan Kim
2010-08-26 17:38     ` Minchan Kim
2010-08-26 17:42     ` Mel Gorman
2010-08-26 17:42       ` Mel Gorman
2010-08-26 18:17       ` Johannes Weiner
2010-08-26 18:17         ` Johannes Weiner
2010-08-26 20:23         ` Mel Gorman
2010-08-26 20:23           ` Mel Gorman
2010-08-27  1:11           ` Wu Fengguang
2010-08-27  1:11             ` Wu Fengguang
2010-08-27  9:34             ` Mel Gorman
2010-08-27  9:34               ` Mel Gorman
2010-08-27  1:42         ` Wu Fengguang
2010-08-27  1:42           ` Wu Fengguang
2010-08-27  9:37           ` Mel Gorman
2010-08-27  9:37             ` Mel Gorman
2010-08-27  5:13   ` Dave Chinner
2010-08-27  5:13     ` Dave Chinner
2010-08-27  9:33     ` Mel Gorman
2010-08-27  9:33       ` Mel Gorman
2010-08-26 17:20 ` [RFC PATCH 0/3] Do not wait the full timeout on congestion_wait when there is no congestion Minchan Kim
2010-08-26 17:20   ` Minchan Kim
2010-08-26 17:31   ` Mel Gorman [this message]
2010-08-26 17:31     ` Mel Gorman
2010-08-26 17:50     ` Minchan Kim
2010-08-26 17:50       ` Minchan Kim
2010-08-27  1:21   ` Wu Fengguang
2010-08-27  1:21     ` Wu Fengguang
2010-08-27  1:41     ` Minchan Kim
2010-08-27  1:41       ` Minchan Kim
2010-08-27  1:50       ` Wu Fengguang
2010-08-27  1:50         ` Wu Fengguang
2010-08-27  2:02         ` Minchan Kim
2010-08-27  2:02           ` Minchan Kim
2010-08-27  4:34           ` Wu Fengguang
2010-08-27  4:34             ` Wu Fengguang
2010-08-27  9:38     ` Mel Gorman
2010-08-27  9:38       ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100826173147.GH20944@csn.ul.ie \
    --to=mel@csn.ul.ie \
    --cc=akpm@linux-foundation.org \
    --cc=ehrhardt@linux.vnet.ibm.com \
    --cc=fengguang.wu@intel.com \
    --cc=hannes@cmpxchg.org \
    --cc=jack@suse.cz \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=minchan.kim@gmail.com \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.