From: Mel Gorman <mgorman@suse.de>
To: Johannes Weiner <jweiner@redhat.com>
Cc: Andrew Morton <akpm@google.com>,
	Christoph Hellwig <hch@infradead.org>,
	Dave Chinner <david@fromorbit.com>,
	Wu Fengguang <fengguang.wu@intel.com>, Jan Kara <jack@suse.cz>,
	Rik van Riel <riel@redhat.com>,
	Minchan Kim <minchan.kim@gmail.com>,
	Chris Mason <chris.mason@oracle.com>,
	Theodore Ts'o <tytso@mit.edu>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	xfs@oss.sgi.com, linux-btrfs@vger.kernel.org,
	linux-ext4@vger.kernel.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [patch 2/2/4] mm: try to distribute dirty pages fairly across zones
Date: Wed, 28 Sep 2011 10:36:29 +0100
Message-ID: <20110928093629.GE11313@suse.de>
In-Reply-To: <20110923144248.GC2606@redhat.com>

On Fri, Sep 23, 2011 at 04:42:48PM +0200, Johannes Weiner wrote:
> The maximum number of dirty pages that exist in the system at any time
> is determined by a number of pages considered dirtyable and a
> user-configured percentage of those, or an absolute number in bytes.
> 
> This number of dirtyable pages is the sum of memory provided by all
> the zones in the system minus their lowmem reserves and high
> watermarks, so that the system can retain a healthy number of free
> pages without having to reclaim dirty pages.
> 
> But there is a flaw in that we have a zoned page allocator which does
> not care about the global state but rather the state of individual
> memory zones.  And right now there is nothing that prevents one zone
> from filling up with dirty pages while other zones are spared, which
> frequently leads to situations where kswapd, in order to restore the
> watermark of free pages, does indeed have to write pages from that
> zone's LRU list.  This can interfere so badly with IO from the flusher
> threads that major filesystems (btrfs, xfs, ext4) mostly ignore write
> requests from reclaim already, taking away the VM's only possibility
> to keep such a zone balanced, aside from hoping the flushers will soon
> clean pages from that zone.
> 
> Enter per-zone dirty limits.  They are to a zone's dirtyable memory
> what the global limit is to the global amount of dirtyable memory, and
> try to make sure that no single zone receives more than its fair share
> of the globally allowed dirty pages in the first place.  As the number
> of pages considered dirtyable excludes the zones' lowmem reserves and
> high watermarks, the maximum number of dirty pages in a zone is such
> that the zone can always be balanced without requiring page cleaning.
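
For illustration only, here is a minimal user-space sketch of the
per-zone limit as the paragraph above describes it. It is not code from
the patch: struct zone_model, its fields, and the dirty_ratio parameter
are hypothetical stand-ins, and the model assumes dirtyable memory is
simply the zone's pages minus its lowmem reserve and high watermark.

```c
/* Hypothetical, simplified model of a zone; NOT the kernel's struct zone. */
struct zone_model {
	unsigned long present_pages;  /* pages the zone provides */
	unsigned long lowmem_reserve; /* pages held back for lowmem protection */
	unsigned long high_wmark;     /* high watermark, in pages */
	unsigned long nr_dirty;       /* pages currently dirty in this zone */
};

/* Dirtyable pages: what the zone provides minus reserve and high watermark. */
static unsigned long zone_dirtyable(const struct zone_model *z)
{
	unsigned long reserved = z->lowmem_reserve + z->high_wmark;

	return z->present_pages > reserved ? z->present_pages - reserved : 0;
}

/* Apply the user-configured percentage to the zone's dirtyable pages. */
static unsigned long zone_dirty_limit(const struct zone_model *z,
				      unsigned int dirty_ratio)
{
	return zone_dirtyable(z) * dirty_ratio / 100;
}

/* Nonzero while the zone has not exceeded its share of dirty pages. */
static int zone_within_dirty_limit(const struct zone_model *z,
				   unsigned int dirty_ratio)
{
	return z->nr_dirty <= zone_dirty_limit(z, dirty_ratio);
}
```

Because the reserve and high watermark are subtracted before the ratio
is applied, a zone at its limit in this model still has enough clean or
free pages to be balanced without writing anything back.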
> 
> As this is a placement decision in the page allocator and pages are
> dirtied only after the allocation, this patch allows allocators to
> pass __GFP_WRITE when they know in advance that the page will be
> written to and become dirty soon.  The page allocator will then
> attempt to allocate from the first zone of the zonelist - which on
> NUMA is determined by the task's NUMA memory policy - that has not
> exceeded its dirty limit.
> 
> At first glance, it would appear that the diversion to lower zones can
> increase pressure on them, but this is not the case.  With a full high
> zone, allocations will be diverted to lower zones eventually, so it is
> more of a shift in timing of the lower zone allocations.  Workloads
> that previously could fit their dirty pages completely in the higher
> zone may be forced to allocate from lower zones, but the number of
> pages that 'spill over' is itself limited by the lower zones' dirty
> constraints, and is thus unlikely to become a problem.
> 
> For now, the problem of unfair dirty page distribution remains for
> NUMA configurations where the zones allowed for allocation are in sum
> not big enough to trigger the global dirty limits, wake up the flusher
> threads and remedy the situation.  Because of this, an allocation that
> could not succeed on any of the considered zones is allowed to ignore
> the dirty limits before going into direct reclaim or even failing the
> allocation, until a future patch changes the global dirty throttling
> and flusher thread activation so that they take individual zone states
> into account.
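
As a rough sketch of the placement rule and its fallback, assuming a
toy zonelist rather than the real allocator: walk the zonelist in
order, skip zones over their dirty limit when the caller passed the
write hint, and if no zone qualifies, retry while ignoring the limits.
The names zone_state, WRITE_HINT and pick_zone are made up for this
example; WRITE_HINT merely stands in for __GFP_WRITE.

```c
#include <stddef.h>

#define WRITE_HINT 0x1u /* hypothetical stand-in for __GFP_WRITE */

/* Hypothetical per-zone state; NOT the kernel's struct zone. */
struct zone_state {
	unsigned long free_pages;  /* pages available for allocation */
	unsigned long nr_dirty;    /* pages currently dirty */
	unsigned long dirty_limit; /* per-zone dirty limit, in pages */
};

/*
 * Return the index of the first usable zone in zonelist order,
 * or -1 if no zone has free pages at all.
 */
static int pick_zone(const struct zone_state *zonelist, size_t nr,
		     unsigned int flags)
{
	size_t i;

	/* First pass: for write allocations, honor the dirty limits. */
	if (flags & WRITE_HINT) {
		for (i = 0; i < nr; i++) {
			if (zonelist[i].free_pages > 0 &&
			    zonelist[i].nr_dirty <= zonelist[i].dirty_limit)
				return (int)i;
		}
	}

	/* Fallback: ignore the dirty limits instead of reclaiming or failing. */
	for (i = 0; i < nr; i++) {
		if (zonelist[i].free_pages > 0)
			return (int)i;
	}

	return -1;
}
```

With a first zone over its limit and a second zone under it, a
WRITE_HINT allocation lands in the second zone, while an allocation
without the hint still takes the first.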
> 
> Signed-off-by: Johannes Weiner <jweiner@redhat.com>

Acked-by: Mel Gorman <mgorman@suse.de>

-- 
Mel Gorman
SUSE Labs
