Re: [PATCH] mm: compaction: make buffer cache __GFP_MOVABLE

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Rik van Riel <riel@redhat.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>, Linux-MM <linux-mm@kvack.org>,
	Minchan Kim <minchan.kim@gmail.com>, Jan Kara <jack@suse.cz>,
	Andy Isaacson <adi@hexapodia.org>,
	Johannes Weiner <jweiner@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] mm: compaction: make buffer cache __GFP_MOVABLE
Date: Mon, 21 Nov 2011 19:55:37 -0500	[thread overview]
Message-ID: <4ECAF309.60908@redhat.com> (raw)
In-Reply-To: <20111121224545.GC8397@redhat.com>

On 11/21/2011 05:45 PM, Andrea Arcangeli wrote:
> On Mon, Nov 21, 2011 at 11:17:26AM +0000, Mel Gorman wrote:
>> On Fri, Nov 18, 2011 at 10:35:30PM +0100, Andrea Arcangeli wrote:
>>> folks who wants low latency or no memory overhead should simply
>>> disable compaction.
>>
>> That strikes me as being somewhat heavy handed. Compaction should be as
>> low latency as possible.
>
> Yes I was meaning in the very short term. Optimizations are always
> possible :) we've just to sort out some issues (as previous part of
> the email discussed).
>
>> There might be some confusion on what commits were for. Commit
>> [e0887c19: vmscan: limit direct reclaim for higher order allocations]
>> was not about low latency but more about reclaim/compaction reclaiming
>> too much memory. IIRC, Rik's main problem was that there was too much
>> memory free on his machine when THP was enabled.
>>
>>> the __GFP_NO_KSWAPD check too should be dropped I think,
>>
>> Only if we can get rid of the major stalls. I haven't looked closely at
>> your series yet but I'll be searching for a replacment for patch 3 of
>> this series in it.
>
> I reduced the migrate loops, for both async and sync compactions. I
> doubt it'll be very effective but it may help a bit.
>
> Also this one I also suggest it in the short term.
>
> I mean until async migrate can deal with all type of pages (the issues
> you're trying to fix) the __GFP_NO_KSWAPD check would not be reliable
> enough as part of the movable zone wouldn't be movable. It'd defeat
> the reliability from the movable pageblock in compaction context. And
> I doubt a more advanced async compaction will be ok for 3.2, so I
> don't think 3.2 should have the __GFP_NO_KSWAPD and I tend to back
> Andrew's argument. My patch OTOH that only reduces the loops and
> doesn't alter the movable pageblock semantics in compaction context,
> sounds safer. It won't help equally well though.
>
>> Ok. It's not even close to what I was testing but I can move to this
>> test so we're looking at the same thing for allocation success rates.
>
> Note I guess we also need the below. This also should fix by practical
> means Rik's trouble (he was using KVM without O_DIRECT on raw
> blkdev). That explains why he experienced too much reclaim, the VM had
> no choice but to do reclaim because the blkdev cache was not staying
> in the movable pageblocks preventing compaction effectiveness (and
> likely they used lots of ram).
>
> We may still have to limit reclaim but not like the patch that went
> upstream implements. When compaction_suitable is happy about the
> wmarks, the compaction loop isn't still as reliable as it could be in
> the movable zone, and doing more reclaim helps a lot, the more free
> pages the more compaction goes ahead and has a change to compact more
> stuff before the two scans meets in the middle. With Rik we thought
> about a way to do just a no-swapping reclaim to shrink caches like
> slab.
>
> I mean the cp /dev/sda /dev/null scenario that happens without the
> below patch can still materialize in a fileserver with very large slab
> caches. We still have to invoke reclaim, maybe not too heavy, we
> shouldn't end up swapping, basically it should be a light reclaim (the
> opposite of lumpy reclaim). But with the current check upstream
> reclaim practically won't run at all if compaction fails, so it's
> normal the success rate goes down a lot. (btw I didn't verify yet at
> runtime if migration succeeds on the blkdev pagecache)
>
> ====
> mm: block_dev pagecache is movable
>
> Allow block_dev pagecache to go in movable pageblocks. If compaction
> fails reclaim is invoked. So with raw devices being used without
> O_DIRECT, reclaim would run too much and because the cache pages would
> get mixed with slab and other not movable entities, the compaction
> reliability would decrease.
>
> Signed-off-by: Andrea Arcangeli<aarcange@redhat.com>
> ---
>   fs/block_dev.c |    2 +-
>   1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index b07f1da..f7111c4 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -565,7 +565,7 @@ struct block_device *bdget(dev_t dev)
>   		inode->i_rdev = dev;
>   		inode->i_bdev = bdev;
>   		inode->i_data.a_ops =&def_blk_aops;
> -		mapping_set_gfp_mask(&inode->i_data, GFP_USER);
> +		mapping_set_gfp_mask(&inode->i_data, GFP_USER|__GFP_MOVABLE);
>   		inode->i_data.backing_dev_info =&default_backing_dev_info;
>   		spin_lock(&bdev_lock);
>   		list_add(&bdev->bd_list,&all_bdevs);

Reviewed-by: Rik van Riel <riel@redhat.com>

Maybe prettier done by defining a GFP_USER_MOVABLE in the
proper include file and using that?

Still, we need this fix.

I believe this is safe because all the users of buffer cache
pages already need to protect the page against getting
reclaimed (by taking a refcount), which also protects against
migration.

Only pages without an extra refcount (not in current use by
anyone) can be migrated.

I do not believe this change could break any filesystem, but
adding linux-fsdevel to the CC anyway just to check...

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)

From: Rik van Riel <riel@redhat.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>, Linux-MM <linux-mm@kvack.org>,
	Minchan Kim <minchan.kim@gmail.com>, Jan Kara <jack@suse.cz>,
	Andy Isaacson <adi@hexapodia.org>,
	Johannes Weiner <jweiner@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] mm: compaction: make buffer cache __GFP_MOVABLE
Date: Mon, 21 Nov 2011 19:55:37 -0500	[thread overview]
Message-ID: <4ECAF309.60908@redhat.com> (raw)
In-Reply-To: <20111121224545.GC8397@redhat.com>

On 11/21/2011 05:45 PM, Andrea Arcangeli wrote:
> On Mon, Nov 21, 2011 at 11:17:26AM +0000, Mel Gorman wrote:
>> On Fri, Nov 18, 2011 at 10:35:30PM +0100, Andrea Arcangeli wrote:
>>> folks who wants low latency or no memory overhead should simply
>>> disable compaction.
>>
>> That strikes me as being somewhat heavy handed. Compaction should be as
>> low latency as possible.
>
> Yes I was meaning in the very short term. Optimizations are always
> possible :) we've just to sort out some issues (as previous part of
> the email discussed).
>
>> There might be some confusion on what commits were for. Commit
>> [e0887c19: vmscan: limit direct reclaim for higher order allocations]
>> was not about low latency but more about reclaim/compaction reclaiming
>> too much memory. IIRC, Rik's main problem was that there was too much
>> memory free on his machine when THP was enabled.
>>
>>> the __GFP_NO_KSWAPD check too should be dropped I think,
>>
>> Only if we can get rid of the major stalls. I haven't looked closely at
>> your series yet but I'll be searching for a replacment for patch 3 of
>> this series in it.
>
> I reduced the migrate loops, for both async and sync compactions. I
> doubt it'll be very effective but it may help a bit.
>
> Also this one I also suggest it in the short term.
>
> I mean until async migrate can deal with all type of pages (the issues
> you're trying to fix) the __GFP_NO_KSWAPD check would not be reliable
> enough as part of the movable zone wouldn't be movable. It'd defeat
> the reliability from the movable pageblock in compaction context. And
> I doubt a more advanced async compaction will be ok for 3.2, so I
> don't think 3.2 should have the __GFP_NO_KSWAPD and I tend to back
> Andrew's argument. My patch OTOH that only reduces the loops and
> doesn't alter the movable pageblock semantics in compaction context,
> sounds safer. It won't help equally well though.
>
>> Ok. It's not even close to what I was testing but I can move to this
>> test so we're looking at the same thing for allocation success rates.
>
> Note I guess we also need the below. This also should fix by practical
> means Rik's trouble (he was using KVM without O_DIRECT on raw
> blkdev). That explains why he experienced too much reclaim, the VM had
> no choice but to do reclaim because the blkdev cache was not staying
> in the movable pageblocks preventing compaction effectiveness (and
> likely they used lots of ram).
>
> We may still have to limit reclaim but not like the patch that went
> upstream implements. When compaction_suitable is happy about the
> wmarks, the compaction loop isn't still as reliable as it could be in
> the movable zone, and doing more reclaim helps a lot, the more free
> pages the more compaction goes ahead and has a change to compact more
> stuff before the two scans meets in the middle. With Rik we thought
> about a way to do just a no-swapping reclaim to shrink caches like
> slab.
>
> I mean the cp /dev/sda /dev/null scenario that happens without the
> below patch can still materialize in a fileserver with very large slab
> caches. We still have to invoke reclaim, maybe not too heavy, we
> shouldn't end up swapping, basically it should be a light reclaim (the
> opposite of lumpy reclaim). But with the current check upstream
> reclaim practically won't run at all if compaction fails, so it's
> normal the success rate goes down a lot. (btw I didn't verify yet at
> runtime if migration succeeds on the blkdev pagecache)
>
> ====
> mm: block_dev pagecache is movable
>
> Allow block_dev pagecache to go in movable pageblocks. If compaction
> fails reclaim is invoked. So with raw devices being used without
> O_DIRECT, reclaim would run too much and because the cache pages would
> get mixed with slab and other not movable entities, the compaction
> reliability would decrease.
>
> Signed-off-by: Andrea Arcangeli<aarcange@redhat.com>
> ---
>   fs/block_dev.c |    2 +-
>   1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index b07f1da..f7111c4 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -565,7 +565,7 @@ struct block_device *bdget(dev_t dev)
>   		inode->i_rdev = dev;
>   		inode->i_bdev = bdev;
>   		inode->i_data.a_ops =&def_blk_aops;
> -		mapping_set_gfp_mask(&inode->i_data, GFP_USER);
> +		mapping_set_gfp_mask(&inode->i_data, GFP_USER|__GFP_MOVABLE);
>   		inode->i_data.backing_dev_info =&default_backing_dev_info;
>   		spin_lock(&bdev_lock);
>   		list_add(&bdev->bd_list,&all_bdevs);

Reviewed-by: Rik van Riel <riel@redhat.com>

Maybe prettier done by defining a GFP_USER_MOVABLE in the
proper include file and using that?

Still, we need this fix.

I believe this is safe because all the users of buffer cache
pages already need to protect the page against getting
reclaimed (by taking a refcount), which also protects against
migration.

Only pages without an extra refcount (not in current use by
anyone) can be migrated.

I do not believe this change could break any filesystem, but
adding linux-fsdevel to the CC anyway just to check...

next prev parent reply	other threads:[~2011-11-22  0:55 UTC|newest]

Thread overview: 80+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-11-18 16:58 [RFC PATCH 0/5] Reduce compaction-related stalls and improve asynchronous migration of dirty pages v3 Mel Gorman
2011-11-18 16:58 ` Mel Gorman
2011-11-18 16:58 ` [PATCH 1/5] mm: compaction: Allow compaction to isolate dirty pages Mel Gorman
2011-11-18 16:58   ` Mel Gorman
2011-11-18 17:28   ` Andrea Arcangeli
2011-11-18 17:28     ` Andrea Arcangeli
2011-11-21 17:16   ` Rik van Riel
2011-11-21 17:16     ` Rik van Riel
2011-11-18 16:58 ` [PATCH 2/5] mm: compaction: Use synchronous compaction for /proc/sys/vm/compact_memory Mel Gorman
2011-11-18 16:58   ` Mel Gorman
2011-11-18 17:27   ` Andrea Arcangeli
2011-11-18 17:27     ` Andrea Arcangeli
2011-11-21 21:46   ` Rik van Riel
2011-11-21 21:46     ` Rik van Riel
2011-11-18 16:58 ` [PATCH 3/5] mm: Do not stall in synchronous compaction for THP allocations Mel Gorman
2011-11-18 16:58   ` Mel Gorman
2011-11-18 17:34   ` Andrea Arcangeli
2011-11-18 17:34     ` Andrea Arcangeli
2011-11-18 16:58 ` [PATCH 4/5] mm: compaction: Determine if dirty pages can be migreated without blocking within ->migratepage Mel Gorman
2011-11-18 16:58   ` Mel Gorman
2011-11-18 21:35   ` Andrea Arcangeli
2011-11-18 21:35     ` Andrea Arcangeli
2011-11-21 11:17     ` Mel Gorman
2011-11-21 11:17       ` Mel Gorman
2011-11-21 22:45       ` Andrea Arcangeli
2011-11-21 22:45         ` Andrea Arcangeli
2011-11-22  0:55         ` Rik van Riel [this message]
2011-11-22  0:55           ` [PATCH] mm: compaction: make buffer cache __GFP_MOVABLE Rik van Riel
2011-11-22 12:59         ` [PATCH 4/5] mm: compaction: Determine if dirty pages can be migreated without blocking within ->migratepage Mel Gorman
2011-11-22 12:59           ` Mel Gorman
2011-11-24  1:19           ` Andrea Arcangeli
2011-11-24  1:19             ` Andrea Arcangeli
2011-11-24 12:21             ` Mel Gorman
2011-11-24 12:21               ` Mel Gorman
2011-11-26  6:51               ` Andy Isaacson
2011-11-26  6:51                 ` Andy Isaacson
2011-11-27 20:50               ` Rik van Riel
2011-11-27 20:50                 ` Rik van Riel
2011-11-19  8:59   ` Nai Xia
2011-11-19  8:59     ` Nai Xia
2011-11-19  9:48     ` Nai Xia
2011-11-19  9:48       ` Nai Xia
2011-11-21 11:19     ` Mel Gorman
2011-11-21 11:19       ` Mel Gorman
2011-11-18 16:58 ` [PATCH 5/5] mm: compaction: make isolate_lru_page() filter-aware again Mel Gorman
2011-11-18 16:58   ` Mel Gorman
2011-11-19 19:54 ` [RFC PATCH 0/5] Reduce compaction-related stalls Andrea Arcangeli
2011-11-19 19:54   ` Andrea Arcangeli
2011-11-19 19:54 ` [PATCH 1/8] mm: compaction: Allow compaction to isolate dirty pages Andrea Arcangeli
2011-11-19 19:54   ` Andrea Arcangeli
2011-11-19 19:54 ` [PATCH 2/8] mm: compaction: Use synchronous compaction for /proc/sys/vm/compact_memory Andrea Arcangeli
2011-11-19 19:54   ` Andrea Arcangeli
2011-11-19 19:54 ` [PATCH 3/8] mm: check if we isolated a compound page during lumpy scan Andrea Arcangeli
2011-11-19 19:54   ` Andrea Arcangeli
2011-11-21 11:51   ` Mel Gorman
2011-11-21 11:51     ` Mel Gorman
2011-11-19 19:54 ` [PATCH 4/8] mm: compaction: defer compaction only with sync_migration Andrea Arcangeli
2011-11-19 19:54   ` Andrea Arcangeli
2011-11-21 12:36   ` Mel Gorman
2011-11-21 12:36     ` Mel Gorman
2011-11-19 19:54 ` [PATCH 5/8] mm: compaction: avoid overwork in migrate sync mode Andrea Arcangeli
2011-11-19 19:54   ` Andrea Arcangeli
2011-11-21 21:59   ` Rik van Riel
2011-11-21 21:59     ` Rik van Riel
2011-11-22  9:51     ` Mel Gorman
2011-11-22  9:51       ` Mel Gorman
2011-11-19 19:54 ` [PATCH 6/8] Revert "mm: compaction: make isolate_lru_page() filter-aware" Andrea Arcangeli
2011-11-19 19:54   ` Andrea Arcangeli
2011-11-21 12:57   ` Mel Gorman
2011-11-21 12:57     ` Mel Gorman
2011-11-19 19:54 ` [PATCH 7/8] Revert "vmscan: abort reclaim/compaction if compaction can proceed" Andrea Arcangeli
2011-11-19 19:54   ` Andrea Arcangeli
2011-11-21 13:09   ` Mel Gorman
2011-11-21 13:09     ` Mel Gorman
2011-11-21 15:37     ` Rik van Riel
2011-11-21 15:37       ` Rik van Riel
2011-11-19 19:54 ` [PATCH 8/8] Revert "vmscan: limit direct reclaim for higher order allocations" Andrea Arcangeli
2011-11-19 19:54   ` Andrea Arcangeli
2011-11-21 21:57   ` Rik van Riel
2011-11-21 21:57     ` Rik van Riel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4ECAF309.60908@redhat.com \
    --to=riel@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=adi@hexapodia.org \
    --cc=jack@suse.cz \
    --cc=jweiner@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=minchan.kim@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.