From: David Gibson <david@gibson.dropbear.id.au>
To: "Zhang, Yanmin" <yanmin.zhang@intel.com>
Cc: Andrew Morton <akpm@osdl.org>,
William Lee Irwin <wli@holomorphy.com>,
linux-kernel@vger.kernel.org
Subject: Re: hugepage: Strict page reservation for hugepage inodes
Date: Tue, 28 Feb 2006 20:14:40 +1100 [thread overview]
Message-ID: <20060228091440.GA28841@localhost.localdomain> (raw)
In-Reply-To: <117E3EB5059E4E48ADFF2822933287A43B3904@pdsmsx404>
On Tue, Feb 28, 2006 at 04:53:49PM +0800, Zhang, Yanmin wrote:
[snip]
> >>+ if (vma->vm_flags & VM_MAYSHARE) {
> >>+
> >>+ /* idx = radix tree index, i.e. offset into file in
> >>+ * HPAGE_SIZE units */
> >>+ idx = ((addr - vma->vm_start) >> HPAGE_SHIFT)
> >>+ + (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT));
> >>+
> >>+ /* The hugetlbfs specific inode info stores the number
> >>+ * of "guaranteed available" (huge) pages. That is,
> >>+ * the first 'prereserved_hpages' pages of the inode
> >>+ * are either already instantiated, or have been
> >>+ * pre-reserved (by hugetlb_reserve_for_inode()). Here
> >>+ * we're in the process of instantiating the page, so
> >>+ * we use this to determine whether to draw from the
> >>+ * pre-reserved pool or the truly free pool. */
> >>+ if (idx < HUGETLBFS_I(inode)->prereserved_hpages)
> >>+ use_reserve = 1;
> >>+ }
> >>+
> >>+ if (!use_reserve) {
> >>+ if (free_huge_pages <= reserved_huge_pages)
> >>+ goto fail;
> >>+ } else {
> >>+ BUG_ON(reserved_huge_pages == 0);
> >>+ reserved_huge_pages--;
> [YM] Consider this scenario of multi-thread:
> One process has 2 threads. The process mmaps a hugetlb area with 1
> huge page and there is a free huge page. Later on, the 2 threads
> fault on the huge page at the same time. The second thread would
> fail, and WARN_ON check is triggered, then the second thread is
> killed by function hugetlb_no_page.
That's why this patch *must* go after my other patch which serializes
the allocation->instantiation path.
> >> }
> >>+
> >>+ page = dequeue_huge_page(vma, addr);
> >>+ if (!page)
> >>+ goto fail;
> >>+
> >> spin_unlock(&hugetlb_lock);
> >> set_page_count(page, 1);
> >> return page;
> >>+
> >>+ fail:
> >>+ WARN_ON(use_reserve); /* reserved allocations shouldn't fail */
> >>+ spin_unlock(&hugetlb_lock);
> >>+ return NULL;
> >>+}
> >>+
> >>+/* hugetlb_extend_reservation()
> >>+ *
> >>+ * Ensure that at least 'atleast' hugepages are, and will remain,
> >>+ * available to instantiate the first 'atleast' pages of the given
> >>+ * inode. If the inode doesn't already have this many pages reserved
> >>+ * or instantiated, set aside some hugepages in the reserved pool to
> >>+ * satisfy later faults (or fail now if there aren't enough, rather
> >>+ * than getting the SIGBUS later).
> >>+ */
> >>+int hugetlb_extend_reservation(struct hugetlbfs_inode_info *info,
> >>+ unsigned long atleast)
> >>+{
> >>+ struct inode *inode = &info->vfs_inode;
> >>+ struct address_space *mapping = inode->i_mapping;
> >>+ unsigned long idx;
> >>+ unsigned long change_in_reserve = 0;
> >>+ struct page *page;
> >>+ int ret = 0;
> >>+
> >>+ spin_lock(&hugetlb_lock);
> >>+ read_lock_irq(&inode->i_mapping->tree_lock);
> >>+
> >>+ if (info->prereserved_hpages >= atleast)
> >>+ goto out;
> >>+
> >>+ /* prereserved_hpages stores the number of pages already
> >>+ * guaranteed (reserved or instantiated) for this inode.
> >>+ * Count how many extra pages we need to reserve. */
> >>+ for (idx = info->prereserved_hpages; idx < atleast; idx++) {
> >>+ page = radix_tree_lookup(&mapping->page_tree, idx);
> >>+ if (!page)
> >>+ /* Pages which are already instantiated don't
> >>+ * need to be reserved */
> >>+ change_in_reserve++;
> >>+ }
> [YM] Why always to go through the page cache? prereserved_hpages and
> reserved_huge_pages are protected by hugetlb_lock.
Erm.. sorry, I don't see how that helps. We need to go through the
page cache to see which pages have already been instantiated, and so
don't need to be reserved.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
next prev parent reply other threads:[~2006-02-28 9:15 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-02-28 8:53 hugepage: Strict page reservation for hugepage inodes Zhang, Yanmin
2006-02-28 9:14 ` David Gibson [this message]
-- strict thread matches above, loose matches on Subject: below --
2006-03-08 9:25 Zhang, Yanmin
2006-03-08 10:23 ` David Gibson
2006-03-08 18:38 ` Chen, Kenneth W
2006-03-08 23:52 ` 'David Gibson'
2006-03-09 0:19 ` Chen, Kenneth W
2006-03-09 0:30 ` 'David Gibson'
2006-03-09 0:29 ` Andrew Morton
2006-02-28 9:21 Zhang, Yanmin
2006-02-28 23:35 ` David Gibson
2006-02-28 7:11 David Gibson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20060228091440.GA28841@localhost.localdomain \
--to=david@gibson.dropbear.id.au \
--cc=akpm@osdl.org \
--cc=linux-kernel@vger.kernel.org \
--cc=wli@holomorphy.com \
--cc=yanmin.zhang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.