public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: "Zhang, Yanmin" <yanmin.zhang@intel.com>
Cc: Andrew Morton <akpm@osdl.org>,
	William Lee Irwin <wli@holomorphy.com>,
	linux-kernel@vger.kernel.org
Subject: Re: hugepage: Strict page reservation for hugepage inodes
Date: Tue, 28 Feb 2006 20:14:40 +1100	[thread overview]
Message-ID: <20060228091440.GA28841@localhost.localdomain> (raw)
In-Reply-To: <117E3EB5059E4E48ADFF2822933287A43B3904@pdsmsx404>

On Tue, Feb 28, 2006 at 04:53:49PM +0800, Zhang, Yanmin wrote:
[snip]
> >>+	if (vma->vm_flags & VM_MAYSHARE) {
> >>+
> >>+		/* idx = radix tree index, i.e. offset into file in
> >>+		 * HPAGE_SIZE units */
> >>+		idx = ((addr - vma->vm_start) >> HPAGE_SHIFT)
> >>+			+ (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT));
> >>+
> >>+		/* The hugetlbfs specific inode info stores the number
> >>+		 * of "guaranteed available" (huge) pages.  That is,
> >>+		 * the first 'prereserved_hpages' pages of the inode
> >>+		 * are either already instantiated, or have been
> >>+		 * pre-reserved (by hugetlb_reserve_for_inode()). Here
> >>+		 * we're in the process of instantiating the page, so
> >>+		 * we use this to determine whether to draw from the
> >>+		 * pre-reserved pool or the truly free pool. */
> >>+		if (idx < HUGETLBFS_I(inode)->prereserved_hpages)
> >>+			use_reserve = 1;
> >>+	}
> >>+
> >>+	if (!use_reserve) {
> >>+		if (free_huge_pages <= reserved_huge_pages)
> >>+			goto fail;
> >>+	} else {
> >>+		BUG_ON(reserved_huge_pages == 0);
> >>+		reserved_huge_pages--;
> [YM] Consider this scenario of multi-thread:
> One process has 2 threads. The process mmaps a hugetlb area with 1
> huge page and there is a free huge page. Later on, the 2 threads
> fault on the huge page at the same time.  The second thread would
> fail, and WARN_ON check is triggered, then the second thread is
> killed by function hugetlb_no_page.

That's why this patch *must* go after my other patch which serializes
the allocation->instantiation path.

> >> 	}
> >>+
> >>+	page = dequeue_huge_page(vma, addr);
> >>+	if (!page)
> >>+		goto fail;
> >>+
> >> 	spin_unlock(&hugetlb_lock);
> >> 	set_page_count(page, 1);
> >> 	return page;
> >>+
> >>+ fail:
> >>+	WARN_ON(use_reserve); /* reserved allocations shouldn't fail */
> >>+	spin_unlock(&hugetlb_lock);
> >>+	return NULL;
> >>+}
> >>+
> >>+/* hugetlb_extend_reservation()
> >>+ *
> >>+ * Ensure that at least 'atleast' hugepages are, and will remain,
> >>+ * available to instantiate the first 'atleast' pages of the given
> >>+ * inode.  If the inode doesn't already have this many pages reserved
> >>+ * or instantiated, set aside some hugepages in the reserved pool to
> >>+ * satisfy later faults (or fail now if there aren't enough, rather
> >>+ * than getting the SIGBUS later).
> >>+ */
> >>+int hugetlb_extend_reservation(struct hugetlbfs_inode_info *info,
> >>+			       unsigned long atleast)
> >>+{
> >>+	struct inode *inode = &info->vfs_inode;
> >>+	struct address_space *mapping = inode->i_mapping;
> >>+	unsigned long idx;
> >>+	unsigned long change_in_reserve = 0;
> >>+	struct page *page;
> >>+	int ret = 0;
> >>+
> >>+	spin_lock(&hugetlb_lock);
> >>+	read_lock_irq(&inode->i_mapping->tree_lock);
> >>+
> >>+	if (info->prereserved_hpages >= atleast)
> >>+		goto out;
> >>+
> >>+	/* prereserved_hpages stores the number of pages already
> >>+	 * guaranteed (reserved or instantiated) for this inode.
> >>+	 * Count how many extra pages we need to reserve. */
> >>+	for (idx = info->prereserved_hpages; idx < atleast; idx++) {
> >>+		page = radix_tree_lookup(&mapping->page_tree, idx);
> >>+		if (!page)
> >>+			/* Pages which are already instantiated don't
> >>+			 * need to be reserved */
> >>+			change_in_reserve++;
> >>+	}
> [YM] Why always to go through the page cache? prereserved_hpages and
> reserved_huge_pages are protected by hugetlb_lock.

Erm.. sorry, I don't see how that helps.  We need to go through the
page cache to see which pages have already been instantiated, and so
don't need to be reserved.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

  reply	other threads:[~2006-02-28  9:15 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-02-28  8:53 hugepage: Strict page reservation for hugepage inodes Zhang, Yanmin
2006-02-28  9:14 ` David Gibson [this message]
  -- strict thread matches above, loose matches on Subject: below --
2006-03-08  9:25 Zhang, Yanmin
2006-03-08 10:23 ` David Gibson
2006-03-08 18:38   ` Chen, Kenneth W
2006-03-08 23:52     ` 'David Gibson'
2006-03-09  0:19       ` Chen, Kenneth W
2006-03-09  0:30         ` 'David Gibson'
2006-03-09  0:29       ` Andrew Morton
2006-02-28  9:21 Zhang, Yanmin
2006-02-28 23:35 ` David Gibson
2006-02-28  7:11 David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060228091440.GA28841@localhost.localdomain \
    --to=david@gibson.dropbear.id.au \
    --cc=akpm@osdl.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=wli@holomorphy.com \
    --cc=yanmin.zhang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox