From: Mel Gorman <mel@csn.ul.ie>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: linux-mm@kvack.org, Marcelo Tosatti <mtosatti@redhat.com>,
	Adam Litke <agl@us.ibm.com>, Avi Kivity <avi@redhat.com>,
	Izik Eidus <ieidus@redhat.com>,
	Hugh Dickins <hugh.dickins@tiscali.co.uk>,
	Nick Piggin <npiggin@suse.de>, Rik van Riel <riel@redhat.com>,
	Andi Kleen <andi@firstfloor.org>,
	Dave Hansen <dave@linux.vnet.ibm.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Ingo Molnar <mingo@elte.hu>, Mike Travis <travis@sgi.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Christoph Lameter <cl@linux-foundation.org>,
	Chris Wright <chrisw@sous-sol.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Paul Mundt <lethal@linux-sh.org>
Subject: Re: [PATCH 25 of 28] transparent hugepage core
Date: Sun, 3 Jan 2010 18:38:03 +0000	[thread overview]
Message-ID: <20100103183802.GA11420@csn.ul.ie> (raw)
In-Reply-To: <20091223000640.GI6429@random.random>

On Wed, Dec 23, 2009 at 01:06:40AM +0100, Andrea Arcangeli wrote:
> On Mon, Dec 21, 2009 at 08:31:50PM +0000, Mel Gorman wrote:
> > My vague worry is that multiple huge page sizes are currently supported in
> > hugetlbfs but transparent support is obviously tied to the page-table level
> > it's implemented for. In the future, the term "huge" could be ambiguous. How
> > about instead of things like HUGE_MASK, it would be HUGE_PMD_MASK? It's not
> > something I feel very strongly about as eventually I'll remember what sort of
> > "huge" is meant in each context.
> 
> Ok this naming seems to be a little troublesome. HUGE_PMD_MASK would
> then require HUGE_PMD_SIZE. That is confusing a little to me, that is
> the size of the page not of the pmd... Maybe HPAGE_PMD_SIZE is better?

HPAGE_PMD_SIZE is better
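
To make the naming concrete, here is a rough user-space sketch of how
PMD-level names could hang together. It assumes x86-64 values
(PAGE_SHIFT of 12, PMD_SHIFT of 21). HPAGE_PMD_SHIFT, HPAGE_PMD_MASK,
HPAGE_PMD_NR and the two helpers are illustrative names only, not
definitions taken from the patchset.

```c
#include <assert.h>

/* Assumed x86-64 values; other architectures and configs differ. */
#define PAGE_SHIFT	12
#define PMD_SHIFT	21

/* Hypothetical PMD-level names, following the HPAGE_PMD_SIZE suggestion. */
#define HPAGE_PMD_SHIFT	PMD_SHIFT
#define HPAGE_PMD_SIZE	(1UL << HPAGE_PMD_SHIFT)
#define HPAGE_PMD_MASK	(~(HPAGE_PMD_SIZE - 1))
/* Number of base pages covered by one pmd-mapped huge page. */
#define HPAGE_PMD_NR	(1UL << (HPAGE_PMD_SHIFT - PAGE_SHIFT))

/* Round an address down to the start of its pmd-sized huge page. */
static inline unsigned long hpage_pmd_base(unsigned long addr)
{
	return addr & HPAGE_PMD_MASK;
}

/* Non-zero if the address sits on a pmd-sized huge page boundary. */
static inline int hpage_pmd_aligned(unsigned long addr)
{
	return (addr & ~HPAGE_PMD_MASK) == 0;
}

/* Sanity check: size and mask must describe the same power of two. */
static inline void hpage_pmd_selfcheck(void)
{
	assert((HPAGE_PMD_SIZE & ~HPAGE_PMD_MASK) == 0);
	assert(HPAGE_PMD_NR * (1UL << PAGE_SHIFT) == HPAGE_PMD_SIZE);
}
```

With the level in the name, a later pud-sized variant could sit next to
these without either set being ambiguous.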

> Overall this is just one #define and search and replace, I can do that
> if people like it more than HPAGE_SIZE.
> 
> > /*
> >  * Currently uses  __GFP_REPEAT during allocation. Should be implemented
> >  * using page migration in the future
> >  */
> 
> Done! thanks.
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -75,6 +75,11 @@ static ssize_t enabled_store(struct kobj
>  static struct kobj_attribute enabled_attr =
>  	__ATTR(enabled, 0644, enabled_show, enabled_store);
>  
> +/*
> + * Currently uses __GFP_REPEAT during allocation. Should be
> + * implemented using page migration and real defrag algorithms in
> + * future VM.
> + */
>  static ssize_t defrag_show(struct kobject *kobj,
>  			   struct kobj_attribute *attr, char *buf)
>  {
> 
> > do_huge_pmd_anonymous_page makes sense.
> 
> Agreed, I already changed all methods called from memory.c to
> huge_memory.c with a "huge_pmd" prefix instead of just "huge".
> 
> > IA-64 can't in its current implementation. Due to the page table format
> > they use, huge pages can only be mapped at specific ranges in the virtual
> > address space. If the long-format version of the page table was used, they
> 
> Hmm ok, so it sounds like hugetlbfs limitations are a software feature
> for ia64 too.
> 

It's not hugetlbfs that is the problem, it's the page table format
itself. There is a more flexible long-form pagetable format
available on the hardware but Linux doesn't use it.

In theory, you could implement transparent support on IA-64 without
disabling the short-form pagetable format by disabling the hardware
pagetable walker altogether and handling TLB misses in software but it
would likely be an overall loss.

> > would be able to but I bet it's not happening any time soon. The best bet
> > for other architectures supporting this would be sparc and maybe sh.
> > It might be worth poking Paul Mundt in particular because he expressed
> > an interest in transparent support of some sort in the past for sh.
> 
> I added him to CC.
> 
> > Because huge pages cannot move. If the MOVABLE zone has been set up to
> > guarantee memory hot-plug removal, they don't want huge pages to be
> > getting in the way. To allow unconditional use of GFP_HIGHUSER_MOVABLE,
> > memory hotplug would have to know it can demote all the transparent huge
> > pages and migrate them that way.
> 
> It should already do. migrate.c calls try_to_unmap that will split
> them and migrate them just fine. If they can't be migrated I will
> remove GFP_HIGHUSER_MOVABLE but I think they can already. migrate.c
> can't notice the difference.
> 

Ok, if it is the case that the huge pages get demoted and migrated, then
the use of GFP_HIGHUSER_MOVABLE is not a problem.

> > My preference would be to move the alloc_mask into common code or at
> > least make it available via mm/internal.h because otherwise this will
> > collide with memory hot-remove in the future.
> 
> We can do that. But what I don't understand is why do_anonymous_page
> uses an unconditional GFP_HIGHUSER_MOVABLE.

Because it can be migrated.

> If there's no benefit to
> do_anonymous_page to turn off the gfp movable flag, I don't see why it
> could be beneficial to turn it off on hugepages.

There is no benefit in turning off the gfp movable flag. The presence of
the flag allows the use of ZONE_MOVABLE i.e. there is more physical
memory that can be potentially used.

> If there's good
> reason for that we surely can make it conditional into common code. I
> didn't look too hard for it, but what is the reason there is this flag
> in hugetlbfs?
> 

hugetlbfs does not use the flag by default because its pages cannot be migrated
(it could be implemented of course, but it hasn't been to date). The flag is
conditionally used because ZONE_MOVABLE can be used to almost guarantee that
X number of hugepages can always be allocated regardless of the fragmentation
state of the system. It's an "almost" guarantee because we do not have memory
defragmentation to move mlocked pages.
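
As a rough sketch of what "conditionally used" means here: the
allocation mask only gains the movable bit when the pages can really be
migrated, or when the administrator has opted in to using ZONE_MOVABLE
for hugepages. The flag values below are made-up stand-ins rather than
the kernel's real encoding, and hugepage_alloc_mask() and
may_use_zone_movable() are hypothetical helpers, not existing
functions. The zone check is also simplified to the movable bit alone.

```c
#include <assert.h>

typedef unsigned int gfp_t;

/* Illustrative bit values only, NOT the kernel's real gfp encoding. */
#define GFP_HIGHUSER		0x01u
#define __GFP_MOVABLE		0x08u
#define GFP_HIGHUSER_MOVABLE	(GFP_HIGHUSER | __GFP_MOVABLE)

/*
 * Hypothetical helper: pick the mask for a huge page allocation.
 * 'movable' is non-zero when the pages can be migrated (or split and
 * then migrated), or when ZONE_MOVABLE use was explicitly requested.
 */
static inline gfp_t hugepage_alloc_mask(int movable)
{
	return movable ? GFP_HIGHUSER_MOVABLE : GFP_HIGHUSER;
}

/* Simplified check: only movable allocations may use ZONE_MOVABLE. */
static inline int may_use_zone_movable(gfp_t mask)
{
	return (mask & __GFP_MOVABLE) != 0;
}
```

If transparent huge pages can always be split and migrated, they would
take the movable branch unconditionally, the same as do_anonymous_page.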

> > I would prefer pmd to be added to the huge names. However, this was
> > mostly to aid comprehension of the patchset when I was taking a quick
> 
> That is neutral to me... it's just that HPAGE_SIZE already existed so
> I tried to avoid adding unnecessary things but I'm not against
> HPAGE_PMD_SIZE, that will make it clearer this is the size of a
> hugepage mapped by a pmd (and not a gigapage mapped by pud).
> 

Agreed.

> Thanks for the help! (we'll need more of your help in the defrag area
> too according to comment added above ;)
> 

I prototyped memory defragmentation ages ago. It worked for the most part
but has bit-rotted significantly. I really should dig it out from
whatever hole I left it in.

-- 
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
