From: Ray Bryant <raybry@sgi.com>
To: Andrew Morton <akpm@osdl.org>
Cc: Andi Kleen <ak@suse.de>,
	lse-tech@lists.sourceforge.net, linux-ia64@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [Lse-tech] Re: Hugetlbpages in very large memory machines.......
Date: Sun, 14 Mar 2004 02:38:33 -0600
Message-ID: <40541A09.3050600@sgi.com>
In-Reply-To: <20040313184547.6e127b51.akpm@osdl.org>



Andrew Morton wrote:

>>
>> One drawback is that the out of memory handling is lot less nicer
>> than it was before - when you run out of hugepages you get SIGBUS
>> now instead of a ENOMEM from mmap. Maybe some prereservation would
>> make sense, but that would be somewhat harder. Alternatively
>> fall back to smaller pages if possible (I was told it isn't easily
>> possible on IA64)
> 
> 
> Demand-paging the hugepages is a decent feature to have, and ISTR resisting
> it before for this reason.
> 
> Even though it's early in the 2.6 series I'd be a bit worried about
> breaking existing hugetlb users in this way.  Yes, the pages are
> preallocated so it is unlikely that a working setup is suddenly going to
> break.  Unless someone is using the return value from mmap to find out how
> many pages they can get.
> 
> So ho-hum.  I think it needs to be back-compatible.  Could we add
> MAP_NO_PREFAULT?
> 
> 
> 

I agree with the compatibility concern, but the other part of the problem
is that while hugetlb_prefault() is running, it holds both the mm->mmap_sem in
write mode and the mm->page_table_lock.  So not only does the mmap() take 500 s
to return on our test system, but ps, top, etc. all freeze for the duration.
Very irritating, especially on a 64P or 128P system.

My preference would be to do away with hugetlb_prefault() altogether.
(If there was a MAP_NO_PREFAULT, we would have to make this the default on
Altix to avoid the freeze problem mentioned above.  Can't have an arbitrary
user locking up the system.)  As Andi pointed out, perhaps we can do some
prereservation of huge pages so that mmap() returns ENOMEM if there are not
enough huge pages left to satisfy the request, and then still allocate the
pages (lazily) at fault time.  A simple count would suffice.
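
To make the "simple count" idea concrete, here is a rough userspace sketch of
the accounting only.  It is not kernel code: struct hpage_pool and the
hpage_*() helpers are hypothetical names made up for illustration, and a real
implementation would need locking around the counters and a hook into the
actual fault path.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical accounting for a preallocated huge page pool.
 * Invariant: reserved <= free.  Locking is omitted in this sketch. */
struct hpage_pool {
    size_t free;      /* huge pages not yet handed out */
    size_t reserved;  /* free pages already promised to mappings */
};

/* mmap() time: promise npages without allocating or touching them.
 * Fails up front, so the caller sees ENOMEM rather than a later SIGBUS. */
int hpage_reserve(struct hpage_pool *p, size_t npages)
{
    if (npages > p->free - p->reserved)
        return -ENOMEM;
    p->reserved += npages;
    return 0;
}

/* Fault time: turn one earlier promise into an actual allocation. */
void hpage_fault(struct hpage_pool *p)
{
    assert(p->reserved > 0 && p->free > 0);
    p->reserved--;
    p->free--;
}

/* munmap() time: give back promises that were never faulted in. */
void hpage_unreserve(struct hpage_pool *p, size_t npages)
{
    assert(npages <= p->reserved);
    p->reserved -= npages;
}
```

The point of the split is that the reservation step is a counter update, so
mmap() would return quickly and without holding the locks for the time it
takes to touch every page, while the pages themselves are only taken from the
pool one at a time as they are faulted in.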

-- 
Best Regards,
Ray
-----------------------------------------------
                   Ray Bryant
512-453-9679 (work)         512-507-7807 (cell)
raybry@sgi.com             raybry@austin.rr.com
The box said: "Requires Windows 98 or better",
            so I installed Linux.
-----------------------------------------------

