From: 'David Gibson' <david@gibson.dropbear.id.au>
To: Andrew Morton <akpm@osdl.org>
Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>,
'Christoph Lameter' <christoph@schroedinger.engr.sgi.com>,
Hugh Dickins <hugh@veritas.com>,
bill.irwin@oracle.com, Adam Litke <agl@us.ibm.com>,
linux-mm@kvack.org
Subject: Re: [RFC] reduce hugetlb_instantiation_mutex usage
Date: Fri, 27 Oct 2006 13:11:56 +1000
Message-ID: <20061027031156.GH11733@localhost.localdomain>
In-Reply-To: <20061026170415.ec0bb0b9.akpm@osdl.org>
On Thu, Oct 26, 2006 at 05:04:15PM -0700, Andrew Morton wrote:
> On Fri, 27 Oct 2006 09:31:37 +1000
> "'David Gibson'" <david@gibson.dropbear.id.au> wrote:
>
> > On Thu, Oct 26, 2006 at 03:44:51PM -0700, Andrew Morton wrote:
> > > On Thu, 26 Oct 2006 15:17:20 -0700
> > > "Chen, Kenneth W" <kenneth.w.chen@intel.com> wrote:
> > >
> > > > First rev of patch to allow hugetlb page fault to scale.
> > > >
> > > > hugetlb_instantiation_mutex was introduced to prevent spurious allocation
> > > > failure in a corner case: two threads race to instantiate same page with
> > > > only one free page left in the global pool. However, this global
> > > > serialization hurts fault performance badly as noted by Christoph Lameter.
> > > > This patch attempts to cut back the use of the mutex to only when free page
> > > > resource is limited, thus allowing faults to scale in most common cases.
> > > >
> > >
> > > ug.
> > >
> > > How about we kill that instantiation_mutex thing altogether and fix
> > > the original bug in a better fashion? Like...
> > >
> > > In hugetlb_no_page():
> > >
> > > retry:
> > >     page = find_lock_page(...)
> > >     if (!page) {
> > >         write_lock_irq(&mapping->tree_lock);
> > >         if (radix_tree_lookup(...)) {
> > >             write_unlock_irq(tree_lock);
> > >             goto retry;
> > >         }
> > >         page = alloc_huge_page(...);
> > >         if (!page)
> > >             bail;
> > >         radix_tree_insert(...);
> > >         SetPageLocked(page);
> > >         write_unlock_irq(tree_lock);
> > >         clear_huge_page(...);
> > >     }
> > > >
> > >     <stick it in page tables>
> > > >
> > >     unlock_page(page);
> > >
> > > The key points:
> > >
> > > - Use tree_lock to prevent the race
> > >
> > > - allocate the hugepage inside tree_lock so we never get into this
> > > two-threads-tried-to-allocate-the-final-page problem.
> > >
> > > - The hugepage is zeroed without locks held, under lock_page()
> > >
> > > - lock_page() is used to make the other thread(s) sleep while the winner
> > > thread is zeroing out the page.
> > >
> > > It means that rather a lot of add_to_page_cache() will need to be copied
> > > into hugetlb_no_page().
> >
> > This handles the case of processes racing on a shared mapping, but not
> > the case of threads racing on a private mapping. In the latter case
> > the race ends at the set_pte() rather than the add_to_page_cache()
> > (well, strictly with the whole page_table_lock atomic lump). And we
> > can't move the clear after the set_pte() :(.
> >
>
> I expect we can do a similar thing, using page_table_lock to prevent the
> race.
>
> The key is to be able to make racing threads still block on the page lock.
> Perhaps install a temp pte which is !pte_present() and also !pte_none().
> So the racing thread can use that pte to locate and wait upon the
> presently-locked page while it is being COWed by another CPU.
Um.. yes, that might work. Though I'd need to think hard about a more
specific scheme. I've been through a lot of approaches lately that
looked ok at first glance, but weren't :-/
And obviously we'd need to make sure such "tentative" PTEs are
constructible and won't confuse other code on each relevant architecture.
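
To make the constraint concrete, here is a rough user-space model of
what a "tentative" PTE would have to satisfy: it is neither pte_none()
nor pte_present(), and a racing thread can still recover the page from
it. This is only a sketch under stated assumptions. The bit layout and
every model_* name are invented for illustration and do not correspond
to any real architecture or to existing kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PTE layout for this model only; real layouts are per-arch. */
typedef uint64_t model_pte_t;

#define MODEL_PTE_PRESENT   ((model_pte_t)1 << 0)
/* Dedicated marker bit, so a tentative entry for pfn 0 is still non-zero. */
#define MODEL_PTE_TENTATIVE ((model_pte_t)1 << 1)
#define MODEL_PFN_SHIFT     12

static bool model_pte_none(model_pte_t pte)
{
	return pte == 0;
}

static bool model_pte_present(model_pte_t pte)
{
	return (pte & MODEL_PTE_PRESENT) != 0;
}

/* Entry installed by the winning thread while it is still filling the page. */
static model_pte_t model_mk_tentative(uint64_t pfn)
{
	return ((model_pte_t)pfn << MODEL_PFN_SHIFT) | MODEL_PTE_TENTATIVE;
}

/* Final entry, installed once the page contents are ready. */
static model_pte_t model_mk_present(uint64_t pfn)
{
	return ((model_pte_t)pfn << MODEL_PFN_SHIFT) | MODEL_PTE_PRESENT;
}

/* Lets a racing thread locate the page it should wait on. */
static uint64_t model_pte_pfn(model_pte_t pte)
{
	assert(!model_pte_none(pte));
	return pte >> MODEL_PFN_SHIFT;
}

enum model_fault_action {
	MODEL_FAULT_INSTANTIATE,  /* nothing there yet: this thread does the work */
	MODEL_FAULT_WAIT_ON_PAGE, /* tentative: sleep on the page lock, then retry */
	MODEL_FAULT_DONE          /* already mapped: nothing to do */
};

/* What a faulting thread would decide after reading the PTE under the lock. */
static enum model_fault_action model_classify(model_pte_t pte)
{
	if (model_pte_none(pte))
		return MODEL_FAULT_INSTANTIATE;
	if (model_pte_present(pte))
		return MODEL_FAULT_DONE;
	return MODEL_FAULT_WAIT_ON_PAGE;
}
```

The model says nothing about the hard part, which is whether every
architecture has a spare non-present encoding that swap entries and
other pte walkers won't misinterpret.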
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson