Re: [patch 2/4] mm: writeback: distribute write pages across allowable zones

linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Andrew Morton <akpm@google.com>
To: Johannes Weiner <jweiner@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>,
	Christoph Hellwig <hch@infradead.org>,
	Dave Chinner <david@fromorbit.com>,
	Wu Fengguang <fengguang.wu@intel.com>, Jan Kara <jack@suse.cz>,
	Rik van Riel <riel@redhat.com>,
	Minchan Kim <minchan.kim@gmail.com>,
	Chris Mason <chris.mason@oracle.com>,
	"Theodore Ts'o" <tytso@mit.edu>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	xfs@oss.sgi.com, linux-btrfs@vger.kernel.org,
	linux-ext4@vger.kernel.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [patch 2/4] mm: writeback: distribute write pages across allowable zones
Date: Wed, 21 Sep 2011 16:02:26 -0700	[thread overview]
Message-ID: <20110921160226.1bf74494.akpm@google.com> (raw)
In-Reply-To: <1316526315-16801-3-git-send-email-jweiner@redhat.com>

On Tue, 20 Sep 2011 15:45:13 +0200
Johannes Weiner <jweiner@redhat.com> wrote:

> This patch allows allocators to pass __GFP_WRITE when they know in
> advance that the allocated page will be written to and become dirty
> soon.  The page allocator will then attempt to distribute those
> allocations across zones, such that no single zone will end up full of
> dirty, and thus more or less, unreclaimable pages.

Across all zones, or across the zones within the node or what?  Some
more description of how all this plays with NUMA is needed, please.

> The global dirty limits are put in proportion to the respective zone's
> amount of dirtyable memory

I don't know what this means.  How can a global limit be controlled by
what is happening within each single zone?  Please describe this design
concept fully.

> and allocations diverted to other zones
> when the limit is reached.

hm.

> For now, the problem remains for NUMA configurations where the zones
> allowed for allocation are in sum not big enough to trigger the global
> dirty limits, but a future approach to solve this can reuse the
> per-zone dirty limit infrastructure laid out in this patch to have
> dirty throttling and the flusher threads consider individual zones.
> 
> ...
>
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -36,6 +36,7 @@ struct vm_area_struct;
>  #endif
>  #define ___GFP_NO_KSWAPD	0x400000u
>  #define ___GFP_OTHER_NODE	0x800000u
> +#define ___GFP_WRITE		0x1000000u
>  
>  /*
>   * GFP bitmasks..
> 
> ...
>
> +static unsigned long zone_dirtyable_memory(struct zone *zone)

Appears to return the number of pages in a particular zone which are
considered "dirtyable".  Some discussion of how this decision is made
would be illuminating.

> +{
> +	unsigned long x;
> +	/*
> +	 * To keep a reasonable ratio between dirty memory and lowmem,
> +	 * highmem is not considered dirtyable on a global level.

Whereabouts in the kernel is this policy implemented? 
determine_dirtyable_memory()?  It does (or can) consider highmem
pages?  Comment seems wrong?

Should we rename determine_dirtyable_memory() to
global_dirtyable_memory(), to get some sense of its relationship with
zone_dirtyable_memory()?

> +	 * But we allow individual highmem zones to hold a potentially
> +	 * bigger share of that global amount of dirty pages as long
> +	 * as they have enough free or reclaimable pages around.
> +	 */
> +	x = zone_page_state(zone, NR_FREE_PAGES) - zone->totalreserve_pages;
> +	x += zone_reclaimable_pages(zone);
> +	return x;
> +}
> +
> 
> ...
>
> -void global_dirty_limits(unsigned long *pbackground, unsigned long *pdirty)
> +static void dirty_limits(struct zone *zone,
> +			 unsigned long *pbackground,
> +			 unsigned long *pdirty)
>  {
> +	unsigned long uninitialized_var(zone_memory);
> +	unsigned long available_memory;
> +	unsigned long global_memory;
>  	unsigned long background;
> -	unsigned long dirty;
> -	unsigned long uninitialized_var(available_memory);
>  	struct task_struct *tsk;
> +	unsigned long dirty;
>  
> -	if (!vm_dirty_bytes || !dirty_background_bytes)
> -		available_memory = determine_dirtyable_memory();
> +	global_memory = determine_dirtyable_memory();
> +	if (zone)
> +		available_memory = zone_memory = zone_dirtyable_memory(zone);
> +	else
> +		available_memory = global_memory;
>  
> -	if (vm_dirty_bytes)
> +	if (vm_dirty_bytes) {
>  		dirty = DIV_ROUND_UP(vm_dirty_bytes, PAGE_SIZE);
> -	else
> +		if (zone)

So passing zone==NULL alters dirty_limits()'s behaviour.  Seems that it
flips the function between global_dirty_limits and zone_dirty_limits?

Would it be better if we actually had separate global_dirty_limits()
and zone_dirty_limits() rather than a magical mode?

> +			dirty = dirty * zone_memory / global_memory;
> +	} else
>  		dirty = (vm_dirty_ratio * available_memory) / 100;
>  
> -	if (dirty_background_bytes)
> +	if (dirty_background_bytes) {
>  		background = DIV_ROUND_UP(dirty_background_bytes, PAGE_SIZE);
> -	else
> +		if (zone)
> +			background = background * zone_memory / global_memory;
> +	} else
>  		background = (dirty_background_ratio * available_memory) / 100;
>  
>  	if (background >= dirty)
> 
> ...
>
> +bool zone_dirty_ok(struct zone *zone)

Full description of the return value, please.

> +{
> +	unsigned long background_thresh, dirty_thresh;
> +
> +	dirty_limits(zone, &background_thresh, &dirty_thresh);
> +
> +	return zone_page_state(zone, NR_FILE_DIRTY) +
> +		zone_page_state(zone, NR_UNSTABLE_NFS) +
> +		zone_page_state(zone, NR_WRITEBACK) <= dirty_thresh;
> +}

We never needed to calculate &background_thresh,.  I wonder if that
matters.

> 
> ...
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2011-09-21 23:02 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-09-20 13:45 [patch 0/4] 50% faster writing to your USB drive!* Johannes Weiner
2011-09-20 13:45 ` [patch 1/4] mm: exclude reserved pages from dirtyable memory Johannes Weiner
2011-09-20 15:21   ` Rik van Riel
2011-09-21 14:04   ` Mel Gorman
2011-09-21 15:03     ` Mel Gorman
2011-09-22  9:03       ` Johannes Weiner
2011-09-22 10:54         ` Mel Gorman
2011-09-23 14:38           ` [patch 1/4 v2] " Johannes Weiner
2011-09-28  4:55             ` Minchan Kim
2011-09-28  7:50               ` Johannes Weiner
2011-09-28 18:35                 ` Minchan Kim
2011-09-20 13:45 ` [patch 2/4] mm: writeback: distribute write pages across allowable zones Johannes Weiner
2011-09-20 18:36   ` Rik van Riel
2011-09-21 11:04   ` Shaohua Li
2011-09-21 13:35     ` Johannes Weiner
2011-09-21 14:30   ` Mel Gorman
2011-09-21 23:02   ` Andrew Morton [this message]
2011-09-22  8:52     ` Johannes Weiner
2011-09-23 14:41       ` [patch 1/2/4] mm: writeback: cleanups in preparation for per-zone dirty limits Johannes Weiner
2011-09-28  5:57         ` Minchan Kim
2011-09-28  9:27         ` Mel Gorman
2011-09-23 14:42       ` [patch 2/2/4] mm: try to distribute dirty pages fairly across zones Johannes Weiner
2011-09-28  5:56         ` Minchan Kim
2011-09-28  7:11           ` Johannes Weiner
2011-09-28 18:09             ` Minchan Kim
2011-09-28  9:36         ` Mel Gorman
2011-09-20 13:45 ` [patch 3/4] mm: filemap: pass __GFP_WRITE from grab_cache_page_write_begin() Johannes Weiner
2011-09-20 14:25   ` Christoph Hellwig
2011-09-20 18:38     ` Rik van Riel
2011-09-20 18:40       ` Christoph Hellwig
2011-09-21 14:09         ` Johannes Weiner
2011-09-20 18:40   ` Rik van Riel
2011-09-21 14:34   ` Mel Gorman
2011-09-28  6:02   ` Minchan Kim
2011-09-20 13:45 ` [patch 4/4] Btrfs: pass __GFP_WRITE for buffered write page allocations Johannes Weiner
2011-09-20 13:56   ` Johannes Weiner
2011-09-20 14:09     ` Josef Bacik
2011-09-20 14:14       ` Johannes Weiner
2011-09-20 18:41   ` Rik van Riel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110921160226.1bf74494.akpm@google.com \
    --to=akpm@google.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=akpm@linux-foundation.org \
    --cc=chris.mason@oracle.com \
    --cc=david@fromorbit.com \
    --cc=fengguang.wu@intel.com \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=jweiner@redhat.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=minchan.kim@gmail.com \
    --cc=riel@redhat.com \
    --cc=tytso@mit.edu \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).