From: Nishanth Aravamudan <nacc@us.ibm.com>
To: Nick Piggin <npiggin@suse.de>
Cc: Andi Kleen <andi@firstfloor.org>,
	akpm@linux-foundation.org, linux-mm@kvack.org,
	kniht@linux.vnet.ibm.com, abh@cray.com, wli@holomorphy.com
Subject: Re: [patch 00/18] multi size, and giant hugetlb page support, 1GB hugetlb for x86
Date: Thu, 24 Apr 2008 10:08:04 -0700
Message-ID: <20080424170804.GB8451@us.ibm.com>
In-Reply-To: <20080424070624.GA14543@wotan.suse.de>

On 24.04.2008 [09:06:24 +0200], Nick Piggin wrote:
> On Wed, Apr 23, 2008 at 11:43:50PM -0700, Nishanth Aravamudan wrote:
> > On 24.04.2008 [04:08:28 +0200], Nick Piggin wrote:
> > > On Wed, Apr 23, 2008 at 11:52:23AM -0700, Nishanth Aravamudan wrote:
> > > > On 23.04.2008 [17:53:38 +0200], Nick Piggin wrote:
> > > > > > It's not fully compatible. And that is bad.
> > > > > 
> > > > > It is fully compatible because if you don't actually ask for
> > > > > any new option then you don't get it. What you see will be
> > > > > exactly unchanged.  If you ask for _only_ 1G pages, then this
> > > > > new scheme is very likely to work with well written
> > > > > > applications whereas if you also print out the 2MB legacy
> > > > > values first, then they have little to no chance of working.
> > > > > 
> > > > > Then if you want legacy apps to use 2MB pages, and new ones to
> > > > > use 1G, then you ask for both and get the 2MB column printed
> > > > > in /proc/meminfo (actually it can probably get printed 2nd if
> > > > > you ask for 2MB pages after asking for 1G pages -- that is
> > > > > something I'll fix).
> > > > 
> > > > Yep, the "default hugepagesz" was something I was going to ask
> > > > about. I believe hugepagesz= should function kind of like
> > > > console= where the order matters if specified multiple times for
> > > > where /dev/console points.  I agree with you that hugepagesz=XX
> > > > hugepagesz=YY implies XX is the default, and YY is the "other",
> > > > regardless of their values, and that is how they should be
> > > > presented in meminfo.
> > > 
> > > OK, that would be fine. I was going to do it the other way and
> > > make 2M always come first. However so long as we document as such
> > > the command line parameters, I don't see why we couldn't have this
> > > extra flexibility (and that means I shouldn't have to write any
> > > more code ;))
> > 
> > Keep in mind, I did retract this to some extent in my other
> > reply...After thinking about Andi's points a bit more, I believe the
> > most flexible (not too-x86_64-centric, either) option is to have all
> > potential hugepage sizes be "available" at run-time. What hugepages
> > are allocated at boot-time is all that is specified on the kernel
> > command-line, in that case (and is only truly necessary for the
> > ginormous hugepages, and needs to be heavily documented as such).
> > 
> > Realistically, yes, we could have it either way (hugepagesz=
> > determines the order), but it shouldn't matter to well-written
> > applications, so keeping things reflecting current reality as much
> > as possible does make sense -- that is, 2M would always come first
> > in meminfo on x86_64.
> > 
> > If you want, I can send you a patch to do that, as I start the sysfs
> > patches.
> 
> Honestly, I don't really care about the exact behaviour and user APIs.
> 
> I agree with the point Andi stresses that backwards compatibility is
> #1 priority; and with unchanged kernel command line / config options,
> I think we need to have /proc/meminfo give *unchanged* (ie. single
> column) output.

Ok -- so meminfo will have one format (single column) if the command
line is unchanged, and a different one if, say, "hugepagesz=1G" is
specified?

Should we just leave the default hugepage size info in /proc/meminfo
(always single column) and use sysfs for everything else, including
per-page-size hugepage meminfo? I guess that would violate sysfs
rules, but might be fine for a proof-of-concept?
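
To make the compatibility concern concrete, here is a rough sketch of how
a legacy application might read the hugepage fields from /proc/meminfo.
The function name meminfo_value() is made up for illustration and is not
taken from any real application or from libhugetlbfs. It works on a text
buffer so it can be tried without a hugepage-enabled kernel.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: find "key:" at the start of a line in
 * /proc/meminfo-style text and store the first number after it.
 * Returns 0 on success, -1 if the key is absent or has no number.
 */
int meminfo_value(const char *text, const char *key, unsigned long *out)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line && *line) {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            /* %lu skips leading blanks and stops after one number. */
            if (sscanf(line + klen + 1, "%lu", out) == 1)
                return 0;
            return -1;
        }
        /* Advance to the character after the next newline, if any. */
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}
```

A parser written this way only ever sees the first value on a line, so
if a line carried several columns, whichever size is printed first is
the one such an application would pick up.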

> Second, future apps obviously should use some more appropriate sysfs
> tunables and be aware of multiple hstates.

Indeed.

> Finally, I would have thought people would be interested in *trying*
> to get legacy apps to work with 1G hugepages (eg. oracle/db2 or HPC
> stuff could probably make use of them quite nicely). However this 3rd
> consideration is obviously the least important of the 3. I wouldn't
> lose any sleep if my option doesn't get in.

Well, there are two interfaces, right?

1) SHM_HUGETLB
  I'm not sure how to extend this best. iirc, SHM_HUGETLB uses an
  internal (invisible) hugetlbfs mount. And I don't think it specifies a
  size or anything to said mount...so unless *only* 1G hugepages are
  available (which we've decided will not be the case?), I believe
  SHM_HUGETLB as currently used will never use them.

2) hugetlbfs
  By mounting hugetlbfs with pagesize= (I believe), we can specify which
  pool should be accessed by files in the mount. This is what
  libhugetlbfs would leverage to use different hugepage sizes. There has
  been some discussion on that list and among some of us working on
  libhugetlbfs on how best to allow applications to specify the size
  they'd prefer. Eric Munson has been working on a binary (hugectl) to
  demonstrate hugepage-backed stacks in-kernel, which might be
  extended to include a --preferred-size flag (it's essentially an
  exec() wrapper, in the same vein as numactl). In any case,
  libhugetlbfs could be used (by only mounting the 1G sized hugetlbfs)
  for legacy apps without modification (well, segment remapping may not
  work due to alignments, but should be easy to fix, and will probably
  be fixed in 2.0, which will change our remapping algorithm).
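
For reference, the two interfaces above look roughly like this from
userspace. This is only a sketch: attach_huge_shm() and
map_from_hugetlbfs() are names invented here, and the mount directory
passed in is assumed to be a hugetlbfs mount set up by the administrator
for the wanted page size. Note that neither call takes a page size: the
SHM_HUGETLB path gets whatever the kernel's internal mount uses, and the
hugetlbfs path gets whatever the chosen mount was set up with.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/ipc.h>
#include <sys/mman.h>
#include <sys/shm.h>

#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000 /* Linux value, in case libc headers omit it */
#endif

/*
 * 1) SHM_HUGETLB: the caller only passes a flag, there is no way to
 * name a page size. len should be a multiple of the huge page size.
 * Returns the attached address, or NULL on failure.
 */
void *attach_huge_shm(size_t len, int *shmid_out)
{
    int id = shmget(IPC_PRIVATE, len, IPC_CREAT | SHM_HUGETLB | 0600);
    if (id < 0)
        return NULL;

    void *p = shmat(id, NULL, 0);
    /* Mark for removal now; the segment goes away after the last detach. */
    shmctl(id, IPC_RMID, NULL);
    if (p == (void *)-1)
        return NULL;

    *shmid_out = id;
    return p;
}

/*
 * 2) hugetlbfs: create an unlinked file under the given mount and map
 * it. The page size comes from the mount, not from this code.
 * Returns the mapping, or NULL on failure.
 */
void *map_from_hugetlbfs(const char *mount_dir, size_t len)
{
    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/example-XXXXXX", mount_dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return NULL;

    int fd = mkstemp(path);
    if (fd < 0)
        return NULL;
    unlink(path); /* the mapping keeps the file alive */

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    return p == MAP_FAILED ? NULL : p;
}
```

With something like this, picking 1G pages for an unmodified app comes
down to which mount directory is handed to the second function, which is
the knob libhugetlbfs could turn on the app's behalf.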

Thanks,
Nish

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center

