All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Rik van Riel <riel@redhat.com>, Jiang Liu <jiang.liu@huawei.com>,
	Cody P Schafer <cody@linux.vnet.ibm.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@suse.cz>, Minchan Kim <minchan@kernel.org>,
	Michal Nazarewicz <mina86@mina86.com>,
	Andi Kleen <ak@linux.intel.com>,
	Wei Yongjun <yongjun_wei@trendmicro.com.cn>,
	Tang Chen <tangchen@cn.fujitsu.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/7] improve robustness on handling migratetype
Date: Mon, 03 Feb 2014 10:16:50 +0100	[thread overview]
Message-ID: <52EF5E82.4060003@suse.cz> (raw)
In-Reply-To: <20140203074507.GB2360@lge.com>

On 02/03/2014 08:45 AM, Joonsoo Kim wrote:
> On Wed, Jan 29, 2014 at 05:52:41PM +0100, Vlastimil Babka wrote:
>> On 01/10/2014 09:48 AM, Joonsoo Kim wrote:
>>> On Thu, Jan 09, 2014 at 09:27:20AM +0000, Mel Gorman wrote:
>>>> On Thu, Jan 09, 2014 at 04:04:40PM +0900, Joonsoo Kim wrote:
>>>>> Hello,
>>>>>
>>>>> I found some weaknesses on handling migratetype during code review and
>>>>> testing CMA.
>>>>>
>>>>> First, we don't have any synchronization method on get/set pageblock
>>>>> migratetype. When we change migratetype, we hold the zone lock. So
>>>>> writer-writer race doesn't exist. But while someone changes migratetype,
>>>>> others can get migratetype. This may introduce totally unintended value
>>>>> as migratetype. Although I haven't heard of any problem report about
>>>>> that, it is better to protect properly.
>>>>>
>>>>
>>>> This is deliberate. The migratetypes for the majority of users are advisory
>>>> and aimed for fragmentation avoidance. It was important that the cost of
>>>> that be kept as low as possible and the general case is that migration types
>>>> change very rarely. In many cases, the zone lock is held. In other cases,
>>>> such as splitting free pages, the cost is simply not justified.
>>>>
>>>> I doubt there is any amount of data you could add in support that would
>>>> justify hammering the free fast paths (which call get_pageblock_type).
>>>
>>> Hello, Mel.
>>>
>>> There is a possibility that we can get unintended value such as 6 as migratetype
>>> if reader-writer (get/set pageblock_migratetype) race happends. It can be
>>> possible, because we read the value without any synchronization method. And
>>> this migratetype, 6, has no place in buddy freelist, so array index overrun can
>>> be possible and the system can break, although I haven't heard that it occurs.
>>
>> Hello,
>>
>> it seems this can indeed happen. I'm working on memory compaction
>> improvements and in a prototype patch, I'm basically adding calls of
>> start_isolate_page_range() undo_isolate_page_range() some functions
>> under compact_zone(). With this I've seen occurrences of NULL
>> pointers in move_freepages(), free_one_page() in places where
>> free_list[migratetype] is manipulated by e.g. list_move(). That lead
>> me to question the value of migratetype and I found this thread.
>> Adding some debugging in get_pageblock_migratetype() and voila, I
>> get a value of 6 being read.
>>
>> So is it just my patch adding a dangerous situation, or does it exist in
>> mainline as well? By looking at free_one_page(), it uses zone->lock, but
>> get_pageblock_migratetype() is called by its callers
>> (free_hot_cold_page() or __free_pages_ok()) outside of the lock.
>> This determined migratetype is then used under free_one_page() to
>> access a free_list.
>>
>> It seems that this could race with set_pageblock_migratetype()
>> called from try_to_steal_freepages() (despite the latter being
>> properly locked). There are also other callers but those seem to be
>> either limited to initialization and isolation, which should be rare
>> (?).
>> However, try_to_steal_freepages can occur repeatedly.
>> So I assume that the race happens but never manifests as a fatal
>> error as long as MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE and
>> MIGRATE_MOVABLE
>> values are used. Only MIGRATE_CMA and MIGRATE_ISOLATE have values
>> with bit 4 enabled and can thus result in invalid values due to
>> non-atomic access.
>>
>> Does that make sense to you and should we thus proceed with patching
>> this race?
>>
>
> Hello,
>
> This race is possible without your prototype patch, however, on very low
> probability. Some codes related to memory failure use set_migratetype_isolate()
> which could result in this race.
>
> Although it may be very rare case and not critical, it is better to fix
> this race. I prefer that we don't depend on luck. :)

I agree :) I also don't like the possibility that the non-fatal type of 
race (where higher-order bits are not involved) occurs and can hurt 
anti-fragmentation, or even suddenly become a problem in the future if 
e.g. more migratetypes are added. I'll try to quantify that with a debug 
patch.

> Mel's suggestion looks good to me. Do you have another idea?

No, it sounds good so I'm going to work on this as outlined.

> Thanks.
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Vlastimil Babka <vbabka@suse.cz>
To: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Rik van Riel <riel@redhat.com>, Jiang Liu <jiang.liu@huawei.com>,
	Cody P Schafer <cody@linux.vnet.ibm.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@suse.cz>, Minchan Kim <minchan@kernel.org>,
	Michal Nazarewicz <mina86@mina86.com>,
	Andi Kleen <ak@linux.intel.com>,
	Wei Yongjun <yongjun_wei@trendmicro.com.cn>,
	Tang Chen <tangchen@cn.fujitsu.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/7] improve robustness on handling migratetype
Date: Mon, 03 Feb 2014 10:16:50 +0100	[thread overview]
Message-ID: <52EF5E82.4060003@suse.cz> (raw)
In-Reply-To: <20140203074507.GB2360@lge.com>

On 02/03/2014 08:45 AM, Joonsoo Kim wrote:
> On Wed, Jan 29, 2014 at 05:52:41PM +0100, Vlastimil Babka wrote:
>> On 01/10/2014 09:48 AM, Joonsoo Kim wrote:
>>> On Thu, Jan 09, 2014 at 09:27:20AM +0000, Mel Gorman wrote:
>>>> On Thu, Jan 09, 2014 at 04:04:40PM +0900, Joonsoo Kim wrote:
>>>>> Hello,
>>>>>
>>>>> I found some weaknesses on handling migratetype during code review and
>>>>> testing CMA.
>>>>>
>>>>> First, we don't have any synchronization method on get/set pageblock
>>>>> migratetype. When we change migratetype, we hold the zone lock. So
>>>>> writer-writer race doesn't exist. But while someone changes migratetype,
>>>>> others can get migratetype. This may introduce totally unintended value
>>>>> as migratetype. Although I haven't heard of any problem report about
>>>>> that, it is better to protect properly.
>>>>>
>>>>
>>>> This is deliberate. The migratetypes for the majority of users are advisory
>>>> and aimed for fragmentation avoidance. It was important that the cost of
>>>> that be kept as low as possible and the general case is that migration types
>>>> change very rarely. In many cases, the zone lock is held. In other cases,
>>>> such as splitting free pages, the cost is simply not justified.
>>>>
>>>> I doubt there is any amount of data you could add in support that would
>>>> justify hammering the free fast paths (which call get_pageblock_type).
>>>
>>> Hello, Mel.
>>>
>>> There is a possibility that we can get unintended value such as 6 as migratetype
>>> if reader-writer (get/set pageblock_migratetype) race happends. It can be
>>> possible, because we read the value without any synchronization method. And
>>> this migratetype, 6, has no place in buddy freelist, so array index overrun can
>>> be possible and the system can break, although I haven't heard that it occurs.
>>
>> Hello,
>>
>> it seems this can indeed happen. I'm working on memory compaction
>> improvements and in a prototype patch, I'm basically adding calls of
>> start_isolate_page_range() undo_isolate_page_range() some functions
>> under compact_zone(). With this I've seen occurrences of NULL
>> pointers in move_freepages(), free_one_page() in places where
>> free_list[migratetype] is manipulated by e.g. list_move(). That lead
>> me to question the value of migratetype and I found this thread.
>> Adding some debugging in get_pageblock_migratetype() and voila, I
>> get a value of 6 being read.
>>
>> So is it just my patch adding a dangerous situation, or does it exist in
>> mainline as well? By looking at free_one_page(), it uses zone->lock, but
>> get_pageblock_migratetype() is called by its callers
>> (free_hot_cold_page() or __free_pages_ok()) outside of the lock.
>> This determined migratetype is then used under free_one_page() to
>> access a free_list.
>>
>> It seems that this could race with set_pageblock_migratetype()
>> called from try_to_steal_freepages() (despite the latter being
>> properly locked). There are also other callers but those seem to be
>> either limited to initialization and isolation, which should be rare
>> (?).
>> However, try_to_steal_freepages can occur repeatedly.
>> So I assume that the race happens but never manifests as a fatal
>> error as long as MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE and
>> MIGRATE_MOVABLE
>> values are used. Only MIGRATE_CMA and MIGRATE_ISOLATE have values
>> with bit 4 enabled and can thus result in invalid values due to
>> non-atomic access.
>>
>> Does that make sense to you and should we thus proceed with patching
>> this race?
>>
>
> Hello,
>
> This race is possible without your prototype patch, however, on very low
> probability. Some codes related to memory failure use set_migratetype_isolate()
> which could result in this race.
>
> Although it may be very rare case and not critical, it is better to fix
> this race. I prefer that we don't depend on luck. :)

I agree :) I also don't like the possibility that the non-fatal type of 
race (where higher-order bits are not involved) occurs and can hurt 
anti-fragmentation, or even suddenly become a problem in the future if 
e.g. more migratetypes are added. I'll try to quantify that with a debug 
patch.

> Mel's suggestion looks good to me. Do you have another idea?

No, it sounds good so I'm going to work on this as outlined.

> Thanks.
>


  reply	other threads:[~2014-02-03  9:16 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-01-09  7:04 [PATCH 0/7] improve robustness on handling migratetype Joonsoo Kim
2014-01-09  7:04 ` Joonsoo Kim
2014-01-09  7:04 ` [PATCH 1/7] mm/page_alloc: synchronize get/set pageblock Joonsoo Kim
2014-01-09  7:04   ` Joonsoo Kim
2014-01-09  9:08   ` Michal Nazarewicz
2014-01-09  9:08     ` Michal Nazarewicz
2014-01-09  7:04 ` [PATCH 2/7] mm/cma: fix cma free page accounting Joonsoo Kim
2014-01-09  7:04   ` Joonsoo Kim
2014-01-09 21:10   ` Laura Abbott
2014-01-09 21:10     ` Laura Abbott
2014-01-10  8:50     ` Joonsoo Kim
2014-01-10  8:50       ` Joonsoo Kim
2014-01-09  7:04 ` [PATCH 3/7] mm/page_alloc: move set_freepage_migratetype() to better place Joonsoo Kim
2014-01-09  7:04   ` Joonsoo Kim
2014-01-09  7:04 ` [PATCH 4/7] mm/isolation: remove invalid check condition Joonsoo Kim
2014-01-09  7:04   ` Joonsoo Kim
2014-01-09  7:04 ` [PATCH 5/7] mm/page_alloc: separate interface to set/get migratetype of freepage Joonsoo Kim
2014-01-09  7:04   ` Joonsoo Kim
2014-01-09  9:18   ` Michal Nazarewicz
2014-01-09  9:18     ` Michal Nazarewicz
2014-01-09  7:04 ` [PATCH 6/7] mm/page_alloc: store freelist migratetype to the page on buddy properly Joonsoo Kim
2014-01-09  7:04   ` Joonsoo Kim
2014-01-09  9:19   ` Michal Nazarewicz
2014-01-09  9:19     ` Michal Nazarewicz
2014-01-09  7:04 ` [PATCH 7/7] mm/page_alloc: don't merge MIGRATE_(CMA|ISOLATE) pages on buddy Joonsoo Kim
2014-01-09  7:04   ` Joonsoo Kim
2014-01-09  9:22   ` Michal Nazarewicz
2014-01-09  9:22     ` Michal Nazarewicz
2014-01-09  9:06 ` [PATCH 0/7] improve robustness on handling migratetype Michal Nazarewicz
2014-01-09  9:06   ` Michal Nazarewicz
2014-01-09 14:05   ` Joonsoo Kim
2014-01-09 14:05     ` Joonsoo Kim
2014-01-09  9:27 ` Mel Gorman
2014-01-09  9:27   ` Mel Gorman
2014-01-10  8:48   ` Joonsoo Kim
2014-01-10  8:48     ` Joonsoo Kim
2014-01-10  9:48     ` Mel Gorman
2014-01-10  9:48       ` Mel Gorman
2014-01-13  1:57       ` Joonsoo Kim
2014-01-13  1:57         ` Joonsoo Kim
2014-01-29 16:52     ` Vlastimil Babka
2014-01-29 16:52       ` Vlastimil Babka
2014-01-31 15:39       ` Mel Gorman
2014-01-31 15:39         ` Mel Gorman
2014-02-03  7:45       ` Joonsoo Kim
2014-02-03  7:45         ` Joonsoo Kim
2014-02-03  9:16         ` Vlastimil Babka [this message]
2014-02-03  9:16           ` Vlastimil Babka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=52EF5E82.4060003@suse.cz \
    --to=vbabka@suse.cz \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=cody@linux.vnet.ibm.com \
    --cc=hannes@cmpxchg.org \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=jiang.liu@huawei.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@suse.cz \
    --cc=mina86@mina86.com \
    --cc=minchan@kernel.org \
    --cc=riel@redhat.com \
    --cc=tangchen@cn.fujitsu.com \
    --cc=yongjun_wei@trendmicro.com.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.