Re: [PATCH 4/5] mm: compaction: Determine if dirty pages can be migrated without blocking within ->migratepage

From: Rik van Riel <riel@redhat.com>
To: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
	Linux-MM <linux-mm@kvack.org>,
	Minchan Kim <minchan.kim@gmail.com>, Jan Kara <jack@suse.cz>,
	Andy Isaacson <adi@hexapodia.org>,
	Johannes Weiner <jweiner@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 4/5] mm: compaction: Determine if dirty pages can be migrated without blocking within ->migratepage
Date: Sun, 27 Nov 2011 15:50:22 -0500
Message-ID: <4ED2A28E.2070206@redhat.com>
In-Reply-To: <20111124122144.GR19415@suse.de>

On 11/24/2011 07:21 AM, Mel Gorman wrote:
> On Thu, Nov 24, 2011 at 02:19:43AM +0100, Andrea Arcangeli wrote:

>> But the funny thing is that grow_dev_page already sets __GFP_MOVABLE. That's
>> pretty weird and it's probably the source of a few unmovable pages in
>> the movable block. But then many bh are movable... most of them are,
>> it's just the superblock that isn't.
>>
>> But considering grow_dev_page sets __GFP_MOVABLE, any worry about pins
>> from the fs on the block_dev.c pagecache shouldn't be a concern...
>>
>
> Except in quantity. We can cope with some pollution of MIGRATE_MOVABLE
> but if it gets excessive, it will cause a lot of trouble. Superblock
> bh's may not be movable but there are not many of them and they are
> long lived.

We're potentially doomed either way :)

If we allocate a lot of movable pages in non-movable
blocks, we can end up with a lot of slightly polluted
blocks even after reclaiming all the reclaimable page
cache.

If we allocate a few non-movable pages in movable
blocks, we can end up with the same situation.

Either way, we can potentially end up with a lot of
memory that cannot be defragmented.

Of course, it could take the mounting of a lot of
filesystems for this problem to be triggered, but we
know there are people doing that.
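
The pollution argument above can be put into rough numbers with a toy
model. This is only a sketch, not kernel code: the function names, the
512-page pageblock size (2 MB blocks of 4 KB pages) and the rule that a
single unmovable page is enough to keep a block from ever becoming fully
free are all simplifying assumptions made for illustration.

```python
# Toy model, not kernel code: all names and sizes here are illustrative assumptions.
PAGES_PER_BLOCK = 512  # assumed: 2 MB pageblocks made of 4 KB pages


def place_unmovable(num_blocks, unmovable_pages, spread):
    """Return the number of unmovable pages placed in each pageblock.

    spread=True  puts one unmovable page in each block in turn (pollution).
    spread=False packs them into as few blocks as possible (grouping).
    """
    counts = [0] * num_blocks
    per_block = 1 if spread else PAGES_PER_BLOCK
    remaining = unmovable_pages
    for i in range(num_blocks):
        if remaining == 0:
            break
        counts[i] = min(per_block, remaining)
        remaining -= counts[i]
    if remaining:
        raise ValueError("not enough pageblocks for the requested pages")
    return counts


def blocks_lost(counts):
    """Count blocks that hold at least one unmovable page.

    In this model such a block can never become fully free, no matter
    how much page cache is reclaimed or migrated out of it.
    """
    return sum(1 for c in counts if c > 0)


def lost_fraction(counts):
    """Fraction of all pageblocks that cannot be fully freed."""
    return blocks_lost(counts) / len(counts)
```

Under these assumptions, 100 unmovable pages (400 KB) spread one per
block keep 100 blocks (200 MB) from ever becoming fully free, while the
same 100 pages packed together cost a single block.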

>> __GFP_MOVABLE missing block_dev also was not
>> so common and it most certainly contributed to a reclaim more
>> aggressive than it would have happened with that fix. I think you can
>> push things one at a time without urgency here, and I'd prefer maybe if
>> block_dev patch is applied and the other reversed in vmscan.c or
>> improved to start limiting only if we're above 8*high or some
>> percentage check to allow a little more reclaim than rc2 allows
>
> The limiting is my current preferred option - at least until it is
> confirmed that it really is ok to mark block_dev pages movable and that
> Rik is ok with the revert.

I am fine with replacing the compaction checks with free limit
checks. Funny enough, the first iteration of the patch I submitted
to limit reclaim used a free limit check :)

I also suspect we will want to call shrink_slab regardless of
whether or not a memory zone is already over its free limit for
direct reclaim, since that has the potential to free an otherwise
unmovable page.

>> (i.e. no reclaim at all which likely results in a failure in hugepage
>> allocation). Not unlimited as 3.1 is ok with me but if kswapd can free
>> a percentage I don't see why reclaim can't (considering more free
>> pages in movable pageblocks are needed to succeed compaction). The
>> ideal is to improve the compaction rate and at the same time reduce
>> reclaim aggressiveness. Let's start with the parts that are more
>> obviously right fixes and that don't risk regressions, we don't want
>> compaction regressions :).
>>
>
> I don't think there are any "obviously right fixes" right now until the
> block_dev patch is proven to be ok and that reverting does not regress
> Rik's workload. Going to take time.

Ironically the test Andrea is measuring THP allocations with
(dd from /dev/sda to /dev/null) is functionally equivalent to
me running KVM guests with cache=writethrough directly from
a block device.

The difference is that Andrea is measuring THP allocation
success rate, while I am watching how well the programs (and
KVM guests) actually run.

Not surprisingly, swapping out the working set has a pretty
catastrophic effect on performance, even if it helps THP
allocation success :)

-- 
All rights reversed
