From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
    Mel Gorman <mgorman@suse.de>,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Rik van Riel <riel@redhat.com>,
    Johannes Weiner <hannes@cmpxchg.org>,
    Minchan Kim <minchan@kernel.org>,
    "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [PATCH 2/6] mm: add get_pageblock_migratetype_nolock() for cases where locking is undesirable
Date: Mon, 3 Mar 2014 17:22:27 +0900
Message-ID: <20140303082227.GA28899@lge.com>
In-Reply-To: <1393596904-16537-3-git-send-email-vbabka@suse.cz>

On Fri, Feb 28, 2014 at 03:15:00PM +0100, Vlastimil Babka wrote:
> In order to prevent race with set_pageblock_migratetype, most of calls to
> get_pageblock_migratetype have been moved under zone->lock. For the remaining
> call sites, the extra locking is undesirable, notably in free_hot_cold_page().
>
> This patch introduces a _nolock version to be used on these call sites, where
> a wrong value does not affect correctness. The function makes sure that the
> value does not exceed valid migratetype numbers. Such too-high values are
> assumed to be a result of race and caller-supplied fallback value is returned
> instead.
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> include/linux/mmzone.h | 24 ++++++++++++++++++++++++
> mm/compaction.c | 14 +++++++++++---
> mm/memory-failure.c | 3 ++-
> mm/page_alloc.c | 22 +++++++++++++++++-----
> mm/vmstat.c | 2 +-
> 5 files changed, 55 insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index fac5509..7c3f678 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -75,6 +75,30 @@ enum {
>
> extern int page_group_by_mobility_disabled;
>
> +/*
> + * When called without zone->lock held, a race with set_pageblock_migratetype
> + * may result in bogus values. Use this variant only when this does not affect
> + * correctness, and taking zone->lock would be costly. Values >= MIGRATE_TYPES
> + * are considered to be a result of this race and the value of race_fallback
> + * argument is returned instead.
> + */
> +static inline int get_pageblock_migratetype_nolock(struct page *page,
> + int race_fallback)
> +{
> + int ret = get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
> +
> + if (unlikely(ret >= MIGRATE_TYPES))
> + ret = race_fallback;
> +
> + return ret;
> +}

Hello, Vlastimil.

First of all, thanks for the nice work!

I have a different opinion about this implementation. I may be wrong, so
please let me know if I am.

Although this implementation closes the race that triggers the NULL dereference,
I don't think it is enough if you plan to add more callers of
{start,undo}_isolate_page_range().

Consider a system without CMA that makes lots of
{start,undo}_isolate_page_range() calls. The bit representation of the
migratetypes is as follows.

	MIGRATE_MOVABLE = 010
	MIGRATE_ISOLATE = 100

If the race occurs, we could read the following values as the migratetype of
a page in a movable pageblock.

	start_isolate_page_range() case: 010 -> 100
		010, 000, 100

	undo_isolate_page_range() case: 100 -> 010
		100, 110, 010

The above implementation prevents us from getting 110, but it can't prevent us
from getting 000, that is, MIGRATE_UNMOVABLE. If this race occurs in
free_hot_cold_page(), the page goes onto the unmovable pcp list and is later
allocated for that migratetype. The result is more fragmented memory.

Consider another case, where the system enables CONFIG_CMA.

	MIGRATE_MOVABLE = 010
	MIGRATE_ISOLATE = 101

	start_isolate_page_range() case: 010 -> 101
		010, 011, 001, 101

	undo_isolate_page_range() case: 101 -> 010
		101, 100, 110, 010

This can result in totally different values, and it causes the same problem
as mentioned above. And although this doesn't cause any problem for CMA at the
moment, if another migratetype is introduced or some migratetype is removed, it
could cause a CMA page to go onto another migratetype's list and make CMA
allocation fail permanently.
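
As a rough illustration of the two cases above, here is a user-space sketch
(not kernel code) that replays a migratetype rewrite one bit at a time, lowest
bit first, and counts the intermediate values that a ">= MIGRATE_TYPES" check
would not catch. The helper names transition_values() and undetected_values()
and the 3-bit field width are assumptions made for this example. The bit order
is simply the one that reproduces the sequences listed above.

```c
#include <assert.h>

#define MIGRATETYPE_BITS 3	/* assumed width of the migratetype field */

/*
 * Record every value the field holds while it is rewritten from 'from' to
 * 'to' one bit at a time, lowest bit first. seen[] must have room for
 * MIGRATETYPE_BITS + 1 entries. Returns the number of entries written.
 */
static int transition_values(unsigned int from, unsigned int to,
			     unsigned int *seen)
{
	unsigned int cur = from;
	int n = 0;
	int bit;

	assert(from < (1u << MIGRATETYPE_BITS));
	assert(to < (1u << MIGRATETYPE_BITS));

	seen[n++] = cur;
	for (bit = 0; bit < MIGRATETYPE_BITS; bit++) {
		unsigned int mask = 1u << bit;
		unsigned int next = (cur & ~mask) | (to & mask);

		/* Only record a new entry when this bit actually changed. */
		if (next != cur) {
			cur = next;
			seen[n++] = cur;
		}
	}
	return n;
}

/*
 * Count intermediate values that a ">= nr_types" check lets through,
 * i.e. bogus values that still look like a valid migratetype.
 */
static int undetected_values(unsigned int from, unsigned int to,
			     unsigned int nr_types)
{
	unsigned int seen[MIGRATETYPE_BITS + 1];
	int n = transition_values(from, to, seen);
	int count = 0;
	int i;

	/* Skip the first and last entries: those are the legitimate values. */
	for (i = 1; i < n - 1; i++)
		if (seen[i] < nr_types)
			count++;
	return count;
}
```

Assuming MIGRATE_ISOLATE is the last valid value, nr_types is 5 without CMA
and 6 with CONFIG_CMA. Under that assumption, undetected_values(2, 4, 5)
counts the single bogus value 000, and undetected_values(2, 5, 6) counts 011
and 001.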

To close this kind of race regardless of how many pageblock isolations occur,
I recommend that you use separate pageblock bits for MIGRATE_CMA and
MIGRATE_ISOLATE, and use an accessor function whenever we need to check the
migratetype. IMHO, it may not impose much overhead.
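
To make the idea more concrete, below is a minimal user-space model of what I
mean. It is only a sketch: the bit layout, struct pageblock_model and all of
the pb_*() helpers are hypothetical names invented for this example and do not
exist in the kernel, and C11 atomics stand in for the kernel's atomic bit
operations. The point is that isolation flips exactly one dedicated bit, so an
unlocked reader sees either the old or the new state, and the base migratetype
bits are never rewritten.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical layout, NOT the kernel's actual pageblock bit layout. */
#define PB_BASE_MASK	0x3u		/* bits 0-1: base migratetype */
#define PB_CMA_BIT	(1u << 2)	/* dedicated bit: CMA pageblock */
#define PB_ISOLATE_BIT	(1u << 3)	/* dedicated bit: isolated pageblock */

enum base_migratetype {
	BASE_UNMOVABLE,
	BASE_RECLAIMABLE,
	BASE_MOVABLE,
	BASE_RESERVE,
};

/* Stand-in for the flag bits of one pageblock. */
struct pageblock_model {
	atomic_uint flags;
};

static inline void pb_init(struct pageblock_model *pb,
			   enum base_migratetype base, bool cma)
{
	assert((unsigned int)base <= PB_BASE_MASK);
	atomic_init(&pb->flags, (unsigned int)base | (cma ? PB_CMA_BIT : 0u));
}

/* Isolation touches a single bit, so there is no intermediate value. */
static inline void pb_start_isolate(struct pageblock_model *pb)
{
	atomic_fetch_or(&pb->flags, PB_ISOLATE_BIT);
}

static inline void pb_undo_isolate(struct pageblock_model *pb)
{
	atomic_fetch_and(&pb->flags, ~PB_ISOLATE_BIT);
}

/* Accessors: callers never decode the raw bits themselves. */
static inline bool pb_is_isolated(struct pageblock_model *pb)
{
	return (atomic_load(&pb->flags) & PB_ISOLATE_BIT) != 0;
}

static inline bool pb_is_cma(struct pageblock_model *pb)
{
	return (atomic_load(&pb->flags) & PB_CMA_BIT) != 0;
}

static inline enum base_migratetype pb_base_type(struct pageblock_model *pb)
{
	return (enum base_migratetype)(atomic_load(&pb->flags) & PB_BASE_MASK);
}
```

In this model a movable pageblock stays BASE_MOVABLE across
pb_start_isolate() and pb_undo_isolate(), and a CMA pageblock keeps its CMA
bit, so a racing free could not file the page under an unrelated migratetype.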

How about it?

Thanks.