Re: [PATCH 03/10] writeback: Do not congestion sleep if there are no congested BDIs or significant writeback


From: Minchan Kim <minchan.kim@gmail.com>
To: Mel Gorman <mel@csn.ul.ie>
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	Linux Kernel List <linux-kernel@vger.kernel.org>,
	Rik van Riel <riel@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Wu Fengguang <fengguang.wu@intel.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Dave Chinner <david@fromorbit.com>,
	Chris Mason <chris.mason@oracle.com>,
	Christoph Hellwig <hch@lst.de>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 03/10] writeback: Do not congestion sleep if there are no congested BDIs or significant writeback
Date: Mon, 13 Sep 2010 00:37:44 +0900
Message-ID: <20100912153744.GA3563@barrios-desktop>
In-Reply-To: <20100909085436.GJ29263@csn.ul.ie>

On Thu, Sep 09, 2010 at 09:54:36AM +0100, Mel Gorman wrote:
> On Wed, Sep 08, 2010 at 11:52:45PM +0900, Minchan Kim wrote:
> > On Wed, Sep 08, 2010 at 12:04:03PM +0100, Mel Gorman wrote:
> > > On Wed, Sep 08, 2010 at 12:25:33AM +0900, Minchan Kim wrote:
> > > > > + * @zone: A zone to consider the number of being being written back from
> > > > > + * @sync: SYNC or ASYNC IO
> > > > > + * @timeout: timeout in jiffies
> > > > > + *
> > > > > + * Waits for up to @timeout jiffies for a backing_dev (any backing_dev) to exit
> > > > > + * write congestion.  If no backing_devs are congested then the number of
> > > > > + * writeback pages in the zone are checked and compared to the inactive
> > > > > + * list. If there is no sigificant writeback or congestion, there is no point
> > > >                                                 and 
> > > > 
> > > 
> > > Why and? "or" makes sense because we avoid sleeping on either condition.
> > 
> > if (nr_bdi_congested[sync] == 0) {
> >         if (writeback < inactive / 2) {
> >                 cond_resched();
> >                 ..
> >                 goto out;
> >         }
> > }
> > 
> > for avoiding sleeping, above two condition should meet. 
> 
> This is a terrible comment that is badly written. Is this any clearer?
> 
> /**
>  * wait_iff_congested - Conditionally wait for a backing_dev to become uncongested or a zone to complete writes
>  * @zone: A zone to consider the number of being being written back from
>  * @sync: SYNC or ASYNC IO
>  * @timeout: timeout in jiffies
>  *
>  * In the event of a congested backing_dev (any backing_dev) or a given @zone
>  * having a large number of pages in writeback, this waits for up to @timeout
>  * jiffies for either a BDI to exit congestion or a write to complete.
>  *
>  * If there is no congestion and few pending writes, then cond_resched()
>  * is called to yield the processor if necessary but otherwise does not
>  * sleep.
>  */

Looks good.
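
To make the and/or point concrete, here is a minimal userspace sketch of
the decision the new comment describes. It is only a model: skip_sleep()
and its plain integer parameters are hypothetical stand-ins for
atomic_read(&nr_bdi_congested[sync]) and the zone_page_state() counters,
not kernel code.

```c
#include <stdbool.h>

/*
 * Userspace model of the check in wait_iff_congested(), for illustration.
 * Returns true when the caller would only cond_resched() and not sleep.
 * All parameters are hypothetical stand-ins for the kernel counters.
 */
static bool skip_sleep(int nr_congested, unsigned long writeback,
		       unsigned long inactive)
{
	/* Any congested BDI: fall through to the timed wait. */
	if (nr_congested != 0)
		return false;

	/* Not congested: skip the sleep only if writeback is also small. */
	return writeback < inactive / 2;
}
```

With inactive = 1000, skip_sleep(0, 100, 1000) is true, while
skip_sleep(1, 100, 1000) and skip_sleep(0, 600, 1000) are both false,
so both conditions have to hold to avoid the sleep.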

> 
> > > 
> > > > > + * in sleeping but cond_resched() is called in case the current process has
> > > > > + * consumed its CPU quota.
> > > > > + */
> > > > > +long wait_iff_congested(struct zone *zone, int sync, long timeout)
> > > > > +{
> > > > > +	long ret;
> > > > > +	unsigned long start = jiffies;
> > > > > +	DEFINE_WAIT(wait);
> > > > > +	wait_queue_head_t *wqh = &congestion_wqh[sync];
> > > > > +
> > > > > +	/*
> > > > > +	 * If there is no congestion, check the amount of writeback. If there
> > > > > +	 * is no significant writeback and no congestion, just cond_resched
> > > > > +	 */
> > > > > +	if (atomic_read(&nr_bdi_congested[sync]) == 0) {
> > > > > +		unsigned long inactive, writeback;
> > > > > +
> > > > > +		inactive = zone_page_state(zone, NR_INACTIVE_FILE) +
> > > > > +				zone_page_state(zone, NR_INACTIVE_ANON);
> > > > > +		writeback = zone_page_state(zone, NR_WRITEBACK);
> > > > > +
> > > > > +		/*
> > > > > +		 * If less than half the inactive list is being written back,
> > > > > +		 * reclaim might as well continue
> > > > > +		 */
> > > > > +		if (writeback < inactive / 2) {
> > > > 
> > > > I am not sure this is best.
> > > > 
> > > 
> > > I'm not saying it is. The objective is to identify a situation where
> > > sleeping until the next write or congestion clears is pointless. We have
> > > already identified that we are not congested so the question is "are we
> > > writing a lot at the moment?". The assumption is that if there is a lot
> > > of writing going on, we might as well sleep until one completes rather
> > > than reclaiming more.
> > > 
> > > This is the first effort at identifying pointless sleeps. Better ones
> > > might be identified in the future but that shouldn't stop us making a
> > > semi-sensible decision now.
> > 
> > nr_bdi_congested is no problem since we have used it for a long time.
> > But you added new rule about writeback. 
> > 
> 
> Yes, I'm trying to add a new rule about throttling in the page allocator
> and from vmscan. As you can see from the results in the leader, we are
> currently sleeping more than we need to.

I can see the results about avoiding congestion_wait, but I can't find
any results for the (writeback < inactive / 2) heuristic.

> 
> > Why I pointed out is that you added new rule and I hope let others know
> > this change since they have a good idea or any opinions. 
> > I think it's a one of roles as reviewer.
> > 
> 
> Of course.
> 
> > > 
> > > > 1. Without considering various speed class storage, could we fix it as half of inactive?
> > > 
> > > We don't really have a good means of identifying speed classes of
> > > storage. Worse, we are considering on a zone-basis here, not a BDI
> > > basis. The pages being written back in the zone could be backed by
> > > anything so we cannot make decisions based on BDI speed.
> > 
> > True. So it's why I have below question.
> > As you said, we don't have enough information in vmscan.
> > So I am not sure how effective such semi-sensible decision is. 
> > 
> 
> What additional metrics would you apply than the ones I used in the
> leader mail?

The effectiveness of the (writeback < inactive / 2) heuristic.
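
As one example of what I mean, here is a rough userspace sketch of where
the cutoff falls when there is no congestion. The helper name is made up
for illustration and the values are not from any test run; it only
mirrors the integer comparison in the patch as far as I read it.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace helper: with no congested BDI, would the caller
 * fall through to the sleep? Mirrors the patch's integer comparison, so
 * inactive / 2 rounds down.
 */
static bool would_sleep_uncongested(unsigned long writeback,
				    unsigned long inactive)
{
	/* The patch skips the sleep when writeback < inactive / 2. */
	return !(writeback < inactive / 2);
}
```

For inactive = 1000 the sleep starts at writeback = 500. For inactive = 0
or 1 the right-hand side is 0, and an unsigned writeback can never be
below 0, so this model would sleep even with no pages in writeback.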

> 
> > I think best is to throttle in page-writeback well. 
> 
> I do not think there is a problem as such in page writeback throttling.
> The problem is that we are going to sleep without any congestion or without
> writes in progress. We sleep for a full timeout in this case for no reason
> and this is what I'm trying to avoid.

Yes, I agree.
My only concern is the accuracy of the heuristic I mentioned.
Your previous version did not have this heuristic,
but it suddenly appeared in this version,
so I assume you had some reason for adding it.
Please write down the reason and the data, if you have any.

-- 
Kind regards,
Minchan Kim
