linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Wu Fengguang <fengguang.wu@intel.com>
Cc: "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	Jan Kara <jack@suse.cz>, Dave Chinner <david@fromorbit.com>,
	Christoph Hellwig <hch@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	mgorman@suse.de
Subject: Re: [PATCH 3/7] writeback: introduce smoothed global dirty limit
Date: Mon, 20 Jun 2011 23:18:10 +0200	[thread overview]
Message-ID: <20110620211810.GH4810@quack.suse.cz> (raw)
In-Reply-To: <20110619150510.246140117@intel.com>

On Sun 19-06-11 23:01:11, Wu Fengguang wrote:
> The start of a heavy weight application (ie. KVM) may instantly knock
> down determine_dirtyable_memory() and hence the global/bdi dirty
> thresholds.
> 
> So introduce global_dirty_limit for tracking the global dirty threshold
> with policies
> 
> - follow downwards slowly
> - follow up in one shot
> 
> global_dirty_limit can effectively mask out the impact of sudden drop of
> dirtyable memory. It will be used in the next patch for two new type of
> dirty limits.
Looking into the code, this patch is dependent on previously submitted
patches for estimation of BDI write bandwidth, isn't it?

> 
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> ---
>  include/linux/writeback.h |    2 +
>  mm/page-writeback.c       |   41 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 43 insertions(+)
> 
> --- linux-next.orig/include/linux/writeback.h	2011-06-19 22:56:18.000000000 +0800
> +++ linux-next/include/linux/writeback.h	2011-06-19 22:59:29.000000000 +0800
> @@ -88,6 +88,8 @@ static inline void laptop_sync_completio
>  #endif
>  void throttle_vm_writeout(gfp_t gfp_mask);
>  
> +extern unsigned long global_dirty_limit;
> +
>  /* These are exported to sysctl. */
>  extern int dirty_background_ratio;
>  extern unsigned long dirty_background_bytes;
> --- linux-next.orig/mm/page-writeback.c	2011-06-19 22:56:18.000000000 +0800
> +++ linux-next/mm/page-writeback.c	2011-06-19 22:59:29.000000000 +0800
> @@ -116,6 +116,7 @@ EXPORT_SYMBOL(laptop_mode);
>  
>  /* End of sysctl-exported parameters */
>  
> +unsigned long global_dirty_limit;
>  
>  /*
>   * Scale the writeback cache size proportional to the relative writeout speeds.
> @@ -510,6 +511,43 @@ static void bdi_update_write_bandwidth(s
>  	bdi->avg_write_bandwidth = avg;
>  }
>  
> +static void update_dirty_limit(unsigned long thresh,
> +				 unsigned long dirty)
> +{
> +	unsigned long limit = global_dirty_limit;
> +
> +	if (limit < thresh) {
> +		limit = thresh;
> +		goto update;
> +	}
> +
> +	if (limit > thresh &&
> +	    limit > dirty) {
> +		limit -= (limit - max(thresh, dirty)) >> 5;
> +		goto update;
> +	}
Hmm, but strictly speaking this never really converges to the new limit,
right? And even in practice it takes 22 steps to converge within 50% and 73
steps to converge withing 10% of the desired threshold. But before we get
into the discussion about how to update dirty threshold, I'd like to
discuss what are the properties we require from dirty threshold updates?

I understand it's kind of unexpected that when some process allocates
anonymous memory, we suddently stall writers to flush out dirty data. OTOH
it is a nice behavior from memory management point of view because we
really try to keep amount of dirty pages in free memory at given
percentage which is nice for reclaim. When we choose to update dirty limit
in steps as you propose, we get some kind of compromise between behavior
for user and behavior for memory management.

But there are also other options - for example it would seem natural to me
treat allocation of anonymous page the same way as dirtiying the page thus
the process getting us over dirty threshold due to allocations will wait
until enough pages are written. Then we wouldn't need any smoothing of
memory available for caches because allocations would be naturally
throttled (OK, we still have memory taken by kernel allocations but these
are order of magnitude smaller problem I'd say). But I'm not sure how
acceptable this idea would be.

								Honza

> +	return;
> +update:
> +	global_dirty_limit = limit;
> +}
> +
> +static void global_update_bandwidth(unsigned long thresh,
> +				    unsigned long dirty,
> +				    unsigned long now)
> +{
> +	static DEFINE_SPINLOCK(dirty_lock);
> +
> +	if (now - default_backing_dev_info.bw_time_stamp < MAX_PAUSE)
> +		return;
> +
> +	spin_lock(&dirty_lock);
> +	if (now - default_backing_dev_info.bw_time_stamp >= MAX_PAUSE) {
> +		update_dirty_limit(thresh, dirty);
> +		default_backing_dev_info.bw_time_stamp = now;
> +	}
> +	spin_unlock(&dirty_lock);
> +}
> +
>  void __bdi_update_bandwidth(struct backing_dev_info *bdi,
>  			    unsigned long thresh,
>  			    unsigned long dirty,
> @@ -535,6 +573,9 @@ void __bdi_update_bandwidth(struct backi
>  	if (elapsed > HZ && time_before(bdi->bw_time_stamp, start_time))
>  		goto snapshot;
>  
> +	if (thresh)
> +		global_update_bandwidth(thresh, dirty, now);
> +
>  	bdi_update_write_bandwidth(bdi, elapsed, written);
>  
>  snapshot:
> 
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

  parent reply	other threads:[~2011-06-20 21:18 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-06-19 15:01 [PATCH 0/7] more writeback patches for 3.1 Wu Fengguang
2011-06-19 15:01 ` [PATCH 1/7] writeback: consolidate variable names in balance_dirty_pages() Wu Fengguang
2011-06-20  7:45   ` Christoph Hellwig
2011-06-19 15:01 ` [PATCH 2/7] writeback: add parameters to __bdi_update_bandwidth() Wu Fengguang
2011-06-19 15:31   ` Christoph Hellwig
2011-06-19 15:35     ` Wu Fengguang
2011-06-19 15:01 ` [PATCH 3/7] writeback: introduce smoothed global dirty limit Wu Fengguang
2011-06-19 15:36   ` Christoph Hellwig
2011-06-19 15:55     ` Wu Fengguang
2011-06-21 23:59       ` Andrew Morton
2011-06-22 14:11         ` Wu Fengguang
2011-06-20 21:18   ` Jan Kara [this message]
2011-06-21 14:24     ` Wu Fengguang
2011-06-22  0:04   ` Andrew Morton
2011-06-22 14:24     ` Wu Fengguang
2011-06-19 15:01 ` [PATCH 4/7] writeback: introduce max-pause and pass-good dirty limits Wu Fengguang
2011-06-22  0:20   ` Andrew Morton
2011-06-23 13:18     ` Wu Fengguang
2011-06-19 15:01 ` [PATCH 5/7] writeback: make writeback_control.nr_to_write straight Wu Fengguang
2011-06-19 15:35   ` Christoph Hellwig
2011-06-19 16:14     ` Wu Fengguang
2011-06-19 15:01 ` [PATCH 6/7] writeback: scale IO chunk size up to half device bandwidth Wu Fengguang
2011-06-19 15:01 ` [PATCH 7/7] writeback: timestamp based bdi dirty_exceeded state Wu Fengguang
2011-06-20 20:09   ` Christoph Hellwig
2011-06-21 10:00     ` Steven Whitehouse
2011-06-20 21:38   ` Jan Kara
2011-06-21 15:07     ` Wu Fengguang
2011-06-21 21:14       ` Jan Kara
2011-06-22 14:37         ` Wu Fengguang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110620211810.GH4810@quack.suse.cz \
    --to=jack@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=david@fromorbit.com \
    --cc=fengguang.wu@intel.com \
    --cc=hch@infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).