Re: [PATCH v9] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks

All of lore.kernel.org
 help / color / mirror / Atom feed

From: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
To: Minchan Kim <minchan@kernel.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>,
	Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Hugh Dickins <hughd@google.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Kyungmin Park <kyungmin.park@samsung.com>,
	Marek Szyprowski <m.szyprowski@samsung.com>,
	Mel Gorman <mgorman@suse.de>, Rik van Riel <riel@redhat.com>,
	Dave Jones <davej@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Cong Wang <amwang@redhat.com>,
	Markus Trippelsdorf <markus@trippelsdorf.de>
Subject: Re: [PATCH v9] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
Date: Tue, 05 Jun 2012 10:40:17 -0400	[thread overview]
Message-ID: <4FCE1A51.3040407@gmail.com> (raw)
In-Reply-To: <4FCDA1B4.9050301@kernel.org>

(6/5/12 2:05 AM), Minchan Kim wrote:
> On 06/05/2012 01:35 PM, KOSAKI Motohiro wrote:
>
>>>>> Minchan, are you interest this patch? If yes, can you please rewrite
>>>>> it?
>>>>
>>>> Can do it but I want to give credit to Bartlomiej.
>>>> Bartlomiej, if you like my patch, could you resend it as formal patch
>>>> after you do broad testing?
>>>>
>>>> Frankly speaking, I don't want to merge it without any data which
>>>> prove it's really good for real practice.
>>>>
>>>> When the patch firstly was submitted, it wasn't complicated so I was
>>>> okay at that time but it has been complicated
>>>> than my expectation. So if Andrew might pass the decision to me, I'm
>>>> totally NACK if author doesn't provide
>>>> any real data or VOC of some client.
>>
>> I agree. And you don't need to bother this patch if you are not interest
>> this one. I'm sorry.
>
>
> Never mind.
>
>> Let's throw it away until the author send us data.
>>
>
> I guess it's hard to make such workload to prove it's useful normally.
> But we can't make sure there isn't such workload in the world.
> So I hope listen VOC. At least, Mel might require it.
>
> If anyone doesn't support it, I hope let's add some vmstat like stuff for proving
> this patch's effect. If we can't see the benefit through vmstat, we can deprecate
> it later.

Eek, bug we can not deprecate the vmstat. I hope to make good decision _before_
inclusion. ;-)


>>> +static bool can_rescue_unmovable_pageblock(struct page *page, bool
>>> need_lrulock)
>>> +{
>>> +    struct zone *zone;
>>> +    unsigned long pfn, start_pfn, end_pfn;
>>> +    struct page *start_page, *end_page, *cursor_page;
>>> +    bool lru_locked = false;
>>> +
>>> +    zone = page_zone(page);
>>> +    pfn = page_to_pfn(page);
>>> +    start_pfn = pfn&  ~(pageblock_nr_pages - 1);
>>> +    end_pfn = start_pfn + pageblock_nr_pages - 1;
>>> +
>>> +    start_page = pfn_to_page(start_pfn);
>>> +    end_page = pfn_to_page(end_pfn);
>>> +
>>> +    for (cursor_page = start_page, pfn = start_pfn; cursor_page<=
>>> end_page;
>>> +        pfn++, cursor_page++) {
>>>
>>> -/* Returns true if the page is within a block suitable for migration
>>> to */
>>> -static bool suitable_migration_target(struct page *page)
>>> +        if (!pfn_valid_within(pfn))
>>> +            continue;
>>> +
>>> +        /* Do not deal with pageblocks that overlap zones */
>>> +        if (page_zone(cursor_page) != zone)
>>> +            goto out;
>>> +
>>> +        if (PageBuddy(cursor_page)) {
>>> +            unsigned long order = page_order(cursor_page);
>>> +
>>> +            pfn += (1<<  order) - 1;
>>> +            cursor_page += (1<<  order) - 1;
>>> +            continue;
>>> +        } else if (page_count(cursor_page) == 0) {
>>> +            continue;
>>
>> Can we assume freed tail page always have page_count()==0? if yes, why
>> do we
>> need dangerous PageBuddy(cursor_page) check? ok, but this may be harmless.
>
> page_count check is for pcp pages.

Right. but my point was, I doubt we can do buddy walk w/o zone->lock.


> Am I missing your point?
>
>
>> But if no, this code is seriously dangerous. think following scenario,
>>
>> 1) cursor page points free page
>>
>>      +----------------+------------------+
>>      | free (order-1) |  used (order-1)  |
>>      +----------------+------------------+
>>      |
>>     cursor
>>
>> 2) moved cursor
>>
>>      +----------------+------------------+
>>      | free (order-1) |  used (order-1)  |
>>      +----------------+------------------+
>>                       |
>>                       cursor
>>
>> 3) neighbor block was freed
>>
>>
>>      +----------------+------------------+
>>      | free (order-2)                    |
>>      +----------------+------------------+
>>                       |
>>                       cursor
>>
>> now, cursor points to middle of free block.
>
>> Anyway, I recommend to avoid dangerous no zone->lock game and change
>> can_rescue_unmovable_pageblock() is only called w/ zone->lock. I have
>
>
>
> I can't understand your point.
> If the page is middle of free block, what's the problem in can_rescue_unmovable_pageblock
> at first trial of can_rescue_xxx?

I'm not sure. but other all pfn scanning code carefully avoid to touch a middle of free pages
block. (also they take zone->lock anytime)


> I think we can stabilize it in second trial of can_rescue_unmovable_pageblock with zone->lock.
>
>> no seen any worth to include this high complex for mere minor optimization.
>
>>
>
>>
>>> +        } else if (PageLRU(cursor_page)) {
>>> +            if (!need_lrulock)
>>> +                continue;
>>> +            else if (lru_locked)
>>> +                continue;
>>> +            else {
>>> +                spin_lock(&zone->lru_lock);
>>
>> Hmm...
>> I don't like to take lru_lock. 1) Until now, we carefully avoid to take
>> both zone->lock and zone->lru_lock. they are both performance critical
>> lock. And I think pageblock migratetype don't need strictly correct. It
>> is only optimization of anti fragmentation. Why do we need take it?
>
> movable_block has unmovable page can make regression of anti-fragmentation.
> So I did it. I agree it's a sort of optimization.
> If others don't want it at the cost of regression anti-fragmentation, we can remove the lock.

ok.


>
>>
>>
>>> +                lru_locked = true;
>>> +                if (PageLRU(page))
>>> +                    continue;
>>> +            }
>>> +        }
>>> +
>>> +        goto out;
>>> +    }
>>> +
>>
>> Why don't we need to release lru_lock when returning true.
>
>
> Because my brain has gone. :(

Never mind.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)

From: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
To: Minchan Kim <minchan@kernel.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>,
	Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Hugh Dickins <hughd@google.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Kyungmin Park <kyungmin.park@samsung.com>,
	Marek Szyprowski <m.szyprowski@samsung.com>,
	Mel Gorman <mgorman@suse.de>, Rik van Riel <riel@redhat.com>,
	Dave Jones <davej@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Cong Wang <amwang@redhat.com>,
	Markus Trippelsdorf <markus@trippelsdorf.de>
Subject: Re: [PATCH v9] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
Date: Tue, 05 Jun 2012 10:40:17 -0400	[thread overview]
Message-ID: <4FCE1A51.3040407@gmail.com> (raw)
In-Reply-To: <4FCDA1B4.9050301@kernel.org>

(6/5/12 2:05 AM), Minchan Kim wrote:
> On 06/05/2012 01:35 PM, KOSAKI Motohiro wrote:
>
>>>>> Minchan, are you interest this patch? If yes, can you please rewrite
>>>>> it?
>>>>
>>>> Can do it but I want to give credit to Bartlomiej.
>>>> Bartlomiej, if you like my patch, could you resend it as formal patch
>>>> after you do broad testing?
>>>>
>>>> Frankly speaking, I don't want to merge it without any data which
>>>> prove it's really good for real practice.
>>>>
>>>> When the patch firstly was submitted, it wasn't complicated so I was
>>>> okay at that time but it has been complicated
>>>> than my expectation. So if Andrew might pass the decision to me, I'm
>>>> totally NACK if author doesn't provide
>>>> any real data or VOC of some client.
>>
>> I agree. And you don't need to bother this patch if you are not interest
>> this one. I'm sorry.
>
>
> Never mind.
>
>> Let's throw it away until the author send us data.
>>
>
> I guess it's hard to make such workload to prove it's useful normally.
> But we can't make sure there isn't such workload in the world.
> So I hope listen VOC. At least, Mel might require it.
>
> If anyone doesn't support it, I hope let's add some vmstat like stuff for proving
> this patch's effect. If we can't see the benefit through vmstat, we can deprecate
> it later.

Eek, bug we can not deprecate the vmstat. I hope to make good decision _before_
inclusion. ;-)


>>> +static bool can_rescue_unmovable_pageblock(struct page *page, bool
>>> need_lrulock)
>>> +{
>>> +    struct zone *zone;
>>> +    unsigned long pfn, start_pfn, end_pfn;
>>> +    struct page *start_page, *end_page, *cursor_page;
>>> +    bool lru_locked = false;
>>> +
>>> +    zone = page_zone(page);
>>> +    pfn = page_to_pfn(page);
>>> +    start_pfn = pfn&  ~(pageblock_nr_pages - 1);
>>> +    end_pfn = start_pfn + pageblock_nr_pages - 1;
>>> +
>>> +    start_page = pfn_to_page(start_pfn);
>>> +    end_page = pfn_to_page(end_pfn);
>>> +
>>> +    for (cursor_page = start_page, pfn = start_pfn; cursor_page<=
>>> end_page;
>>> +        pfn++, cursor_page++) {
>>>
>>> -/* Returns true if the page is within a block suitable for migration
>>> to */
>>> -static bool suitable_migration_target(struct page *page)
>>> +        if (!pfn_valid_within(pfn))
>>> +            continue;
>>> +
>>> +        /* Do not deal with pageblocks that overlap zones */
>>> +        if (page_zone(cursor_page) != zone)
>>> +            goto out;
>>> +
>>> +        if (PageBuddy(cursor_page)) {
>>> +            unsigned long order = page_order(cursor_page);
>>> +
>>> +            pfn += (1<<  order) - 1;
>>> +            cursor_page += (1<<  order) - 1;
>>> +            continue;
>>> +        } else if (page_count(cursor_page) == 0) {
>>> +            continue;
>>
>> Can we assume freed tail page always have page_count()==0? if yes, why
>> do we
>> need dangerous PageBuddy(cursor_page) check? ok, but this may be harmless.
>
> page_count check is for pcp pages.

Right. but my point was, I doubt we can do buddy walk w/o zone->lock.


> Am I missing your point?
>
>
>> But if no, this code is seriously dangerous. think following scenario,
>>
>> 1) cursor page points free page
>>
>>      +----------------+------------------+
>>      | free (order-1) |  used (order-1)  |
>>      +----------------+------------------+
>>      |
>>     cursor
>>
>> 2) moved cursor
>>
>>      +----------------+------------------+
>>      | free (order-1) |  used (order-1)  |
>>      +----------------+------------------+
>>                       |
>>                       cursor
>>
>> 3) neighbor block was freed
>>
>>
>>      +----------------+------------------+
>>      | free (order-2)                    |
>>      +----------------+------------------+
>>                       |
>>                       cursor
>>
>> now, cursor points to middle of free block.
>
>> Anyway, I recommend to avoid dangerous no zone->lock game and change
>> can_rescue_unmovable_pageblock() is only called w/ zone->lock. I have
>
>
>
> I can't understand your point.
> If the page is middle of free block, what's the problem in can_rescue_unmovable_pageblock
> at first trial of can_rescue_xxx?

I'm not sure. but other all pfn scanning code carefully avoid to touch a middle of free pages
block. (also they take zone->lock anytime)


> I think we can stabilize it in second trial of can_rescue_unmovable_pageblock with zone->lock.
>
>> no seen any worth to include this high complex for mere minor optimization.
>
>>
>
>>
>>> +        } else if (PageLRU(cursor_page)) {
>>> +            if (!need_lrulock)
>>> +                continue;
>>> +            else if (lru_locked)
>>> +                continue;
>>> +            else {
>>> +                spin_lock(&zone->lru_lock);
>>
>> Hmm...
>> I don't like to take lru_lock. 1) Until now, we carefully avoid to take
>> both zone->lock and zone->lru_lock. they are both performance critical
>> lock. And I think pageblock migratetype don't need strictly correct. It
>> is only optimization of anti fragmentation. Why do we need take it?
>
> movable_block has unmovable page can make regression of anti-fragmentation.
> So I did it. I agree it's a sort of optimization.
> If others don't want it at the cost of regression anti-fragmentation, we can remove the lock.

ok.


>
>>
>>
>>> +                lru_locked = true;
>>> +                if (PageLRU(page))
>>> +                    continue;
>>> +            }
>>> +        }
>>> +
>>> +        goto out;
>>> +    }
>>> +
>>
>> Why don't we need to release lru_lock when returning true.
>
>
> Because my brain has gone. :(

Never mind.

next prev parent reply	other threads:[~2012-06-05 14:40 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-06-04 13:43 [PATCH v9] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks Bartlomiej Zolnierkiewicz
2012-06-04 13:43 ` Bartlomiej Zolnierkiewicz
2012-06-04 14:22 ` Michal Nazarewicz
2012-06-04 14:22   ` Michal Nazarewicz
2012-06-06 12:55   ` Bartlomiej Zolnierkiewicz
2012-06-06 12:55     ` Bartlomiej Zolnierkiewicz
2012-06-06 15:52     ` Michal Nazarewicz
2012-06-06 15:52       ` Michal Nazarewicz
2012-06-07  4:23       ` Minchan Kim
2012-06-07  4:23         ` Minchan Kim
2012-06-04 17:13 ` Dave Jones
2012-06-04 17:13   ` Dave Jones
2012-06-04 20:22 ` KOSAKI Motohiro
2012-06-04 20:22   ` KOSAKI Motohiro
2012-06-05  1:59   ` Minchan Kim
2012-06-05  2:38     ` Minchan Kim
2012-06-05  2:38       ` Minchan Kim
2012-06-05  4:35       ` KOSAKI Motohiro
2012-06-05  4:35         ` KOSAKI Motohiro
2012-06-05  6:05         ` Minchan Kim
2012-06-05  6:05           ` Minchan Kim
2012-06-05 14:40           ` KOSAKI Motohiro [this message]
2012-06-05 14:40             ` KOSAKI Motohiro
2012-06-11 13:06           ` Mel Gorman
2012-06-11 13:06             ` Mel Gorman
2012-06-11 13:35             ` Rik van Riel
2012-06-11 13:35               ` Rik van Riel
2012-06-06 10:06       ` Bartlomiej Zolnierkiewicz
2012-06-06 10:06         ` Bartlomiej Zolnierkiewicz
2012-06-07  4:13         ` Minchan Kim
2012-06-07  4:13           ` Minchan Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4FCE1A51.3040407@gmail.com \
    --to=kosaki.motohiro@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=amwang@redhat.com \
    --cc=b.zolnierkie@samsung.com \
    --cc=davej@redhat.com \
    --cc=hughd@google.com \
    --cc=kyungmin.park@samsung.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=m.szyprowski@samsung.com \
    --cc=markus@trippelsdorf.de \
    --cc=mgorman@suse.de \
    --cc=minchan@kernel.org \
    --cc=riel@redhat.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.