From: Minchan Kim <minchan@kernel.org>
To: Xishi Qiu <qiuxishi@huawei.com>
Cc: Mel Gorman <mgorman@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	riel@redhat.com, aquini@redhat.com, linux-mm@kvack.org,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm: skip the page buddy block instead of one page
Date: Thu, 15 Aug 2013 13:17:55 +0900
Message-ID: <20130815041736.GA2592@gmail.com>
In-Reply-To: <520C4EFF.8040305@huawei.com>

Hello,

On Thu, Aug 15, 2013 at 11:46:07AM +0800, Xishi Qiu wrote:
> On 2013/8/15 10:44, Minchan Kim wrote:
> 
> > Hi Xishi,
> > 
> > On Thu, Aug 15, 2013 at 10:32:50AM +0800, Xishi Qiu wrote:
> >> On 2013/8/15 2:00, Mel Gorman wrote:
> >>
> >>>>> Even if the page is still page buddy, there is no guarantee that it's
> >>>>> the same page order as the first read. It could be currently
> >>>>> merging with adjacent buddies for example. There is also a really
> >>>>> small race that a page was freed, allocated with some number stuffed
> >>>>> into page->private and freed again before the second PageBuddy check.
> >>>>> It's a bit of a hand grenade. How much of a performance benefit is there
> >>>>
> >>>> 1. Just worst case is skipping pageblock_nr_pages
> >>>
> >>> No, the worst case is that page_order returns a number that is
> >>> completely garbage and low_pfn goes off the end of the zone
> >>>
> >>>> 2. Race is really small
> >>>> 3. Higher order page allocation customer always have graceful fallback.
> >>>>
> >>
> >> Hi Minchan, 
> >> I think in this case, we may get the wrong value from page_order(page).
> >>
> >> 1. page is in page buddy
> >>
> >>> if (PageBuddy(page)) {
> >>
> >> 2. someone allocated the page, and set page->private to another value
> >>
> >>> 	int nr_pages = (1 << page_order(page)) - 1;
> >>
> >> 3. someone freed the page
> >>
> >>> 	if (PageBuddy(page)) {
> >>
> >> 4. we will skip wrong pages
> > 
> > So, what's the result by that?
> > As I said, it's just skipping (pageblock_nr_pages -1) at worst case
> 
> Hi Minchan,
> I mean that if page->private is set to a large number, it will skip
> 2^private pages, not (pageblock_nr_pages - 1). I found that other code,
> such as fs, also uses page->private. Here is the comment about private:
> /* Mapping-private opaque data:
>  * usually used for buffer_heads
>  * if PagePrivate set; used for
>  * swp_entry_t if PageSwapCache;
>  * indicates order in the buddy
>  * system if PG_buddy is set.
>  */

Please read the full thread in detail.

Mel suggested the following:

if (PageBuddy(page)) {
        int nr_pages = (1 << page_order(page)) - 1;
        if (PageBuddy(page)) {
                nr_pages = min(nr_pages, MAX_ORDER_NR_PAGES - 1);
                low_pfn += nr_pages;
                continue;
        }
}

The min(nr_pages, xxx) clamp removes your concern, but I think Mel's
version isn't right. The skip should be bounded by the pageblock boundary,
so I suggested the following:

if (PageBuddy(page)) {
#ifdef CONFIG_MEMORY_ISOLATION
	unsigned long order = page_order(page);
	if (PageBuddy(page)) {
		low_pfn += (1 << order) - 1;
		low_pfn = min(low_pfn, end_pfn);
	}
#endif
	continue;
}

So the worst case is skipping (pageblock_nr_pages - 1) pages.
But we don't need the CONFIG_MEMORY_ISOLATION guard, so my suggestion
is the following:

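if (PageBuddy(page)) {
	/* Unlocked read: order may be stale or garbage if the page races. */
	unsigned long order = page_order(page);
	/* Re-check to narrow (not close) the race window. */
	if (PageBuddy(page)) {
		/* 1UL keeps the shift in unsigned long rather than int. */
		low_pfn += (1UL << order) - 1;
		/* Never skip past the end of the range being scanned. */
		low_pfn = min(low_pfn, end_pfn);
	}
	continue;
}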
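
To make the effect of the clamp concrete, here is a minimal userspace
sketch, not kernel code. The function name skip_buddy_block and the way it
saturates on an out-of-range order are assumptions made for illustration
only. The low_pfn++ of the enclosing scan loop is not modelled. It shows
that a garbage order read from page->private cannot move the scanner
beyond end_pfn once the result is clamped.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical userspace model of "skip the buddy block, clamped to
 * end_pfn". Returns the pfn the scanner would sit on after the skip.
 */
unsigned long skip_buddy_block(unsigned long low_pfn,
			       unsigned long end_pfn,
			       unsigned int order)
{
	unsigned long room, skip;

	/* The scanner is assumed to start inside the range. */
	assert(low_pfn <= end_pfn);
	room = end_pfn - low_pfn;

	/* Shifting by the type width or more is undefined; saturate. */
	if (order >= sizeof(unsigned long) * CHAR_BIT)
		return end_pfn;

	skip = (1UL << order) - 1;

	/*
	 * Same result as min(low_pfn + skip, end_pfn), but comparing
	 * against the remaining room means the addition cannot wrap.
	 */
	if (skip > room)
		return end_pfn;

	return low_pfn + skip;
}
```

With low_pfn = 1000 and end_pfn = 1512, an order-3 block moves the scanner
to pfn 1007, while a bogus order such as 40 stops at 1512 instead of
running off the end of the range.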


> Thanks,
> Xishi Qiu
> 
> > and the case you mentioned is right academically and I and Mel
> > already pointed out that. But how often could that happen in real
> > practice? I believe such is REALLY REALLY rare.
> > So, as Mel said, if you have some workloads to see the benefit
> > from this patch, I think we could accept the patch.
> > Could you try and respin with the number?
> > I guess big contiguous memory range or memory-hotplug which are
> > full of free pages in embedded CPU which is rather slower than server
> > or desktop side could have benefit.
> > 
> > Thanks.
> > 
> >>
> >>> 		nr_pages = min(nr_pages, MAX_ORDER_NR_PAGES - 1);
> >>> 		low_pfn += nr_pages;
> >>> 		continue;
> >>> 	}
> >>> }
> >>>
> >>> It's still race-prone meaning that it really should be backed by some
> >>> performance data justifying it.
> >>>
> >>
> >>
> >>
> > 
> 
> 
> 

-- 
Kind regards,
Minchan Kim

