public inbox for linux-ia64@vger.kernel.org
 help / color / mirror / Atom feed
From: "'David Gibson'" <david@gibson.dropbear.id.au>
To: Hugh Dickins <hugh@veritas.com>
Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>,
	linux-ia64@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: IA64 non-contiguous memory space bugs
Date: Wed, 22 Feb 2006 23:49:49 +0000	[thread overview]
Message-ID: <20060222234949.GB25108@localhost.localdomain> (raw)
In-Reply-To: <Pine.LNX.4.61.0602221546040.9885@goblin.wat.veritas.com>

On Wed, Feb 22, 2006 at 04:35:34PM +0000, Hugh Dickins wrote:
> On Wed, 22 Feb 2006, 'David Gibson' wrote:
> > On Tue, Feb 21, 2006 at 05:51:52PM -0800, Chen, Kenneth W wrote:
> > > 
> > > free_pgtables() has partial crap that the check of is_hugepage_only_range()
> > > should be done on the entire vma range, not just the first hugetlb page.
> 
> We're testing whether this vma falls in a hugepage_only_range.  It's
> important to check the size (end) of the vma when setting it up; but
> when tearing down it is fair to assume that it was set up correctly,
> so unnecessary to check its size (end).
> 
> I used HPAGE_SIZE rather than 0 because one can imagine an implementation
> of is_hugepage_only_range which would go wrong with 0, or with a fraction
> of HPAGE_SIZE.  Perhaps you'd be happier to add an is_hugepage_only_addr
> macro which hides that size arg to is_hugepage_only_range.

(Aside: is_hugepage_only_range() isn't about telling where huge pages
can go, it's about telling where normal pages can't go.  As such it
must for it's primary callsite on the MAP_FIXED path in
get_unmapped_area() work with parameters that aren't HPAGE_SIZE
aligned).

> > > Though, it's not possible to have a hugetlb vma while having normal page
> > > instantiated inside that vma.
> 
> That is, if the setup does its checks correctly, you're not allowed
> to have a normal vma in (or spanning) a hugepage_only_range.
> 
> > > So the bug is mostly phantom.  For correctness, it should be fixed.
> 
> I believe the bug is non-existent, and therefore needs no fix.

The bug is real alright, I've watched it call hugetlb_free_pgd_range()
for a normal page VMA on powerpc.

> > Actually, from ppc64's point of view, the problem with the test is
> > that the whole vma could be *less* than HPAGE_SIZE - we don't test
> > that the address is aligned before checking is_hugepage_only_range().
> > We thus can call hugetlb_free_pgd_range() on normal page VMAs - which
> > we only get away with because the ppc64 hugetlb_free_pgd_range() is
> > (so far) an alias for the normal free_pgd_range().
> 
> Are you saying that on ppc64, you can put non-hugepage vmas into a
> hugepage_only_range?  If so, why is it called a hugepage_only_range?
> Or, are you saying that your hugepage_only_range is smaller than
> HPAGE_SIZE, so no hugepages can fit in it?  Or that you have
> hugepage vma start addresses not aligned to HPAGE_SIZE?
> None of that makes sense to me.

None of the above.  However, is_hugepage_only_range() does not need to
be called on a hugepage aligned range (and is not here), and returns
true (and must do so) if the given range intersects a hugepage only
area, not only if it lies entirely within a hugepage only area.

Consider a HPAGE_SIZE hugepage VMA starting at 4GB, and a normal page
VMA starting at (4GB-PAGE_SIZE).  This situation is possible on
powerpc, and is_hugepage_only_range(4GB-PAGE_SIZE, HPAGE_SIZE) will
(and must) return true.  Therefore the free_pgtables() logic will call
hugetlb_free_pgd_range() across the normal page VMA.

> > Your patch below is insufficient, because there's a second test of
> > is_hugepage_only_range() further down.  However, instead of tweaking
> > the tested ranges, I think what we really want to do is check for
> > is_vm_hugetlb_page() instead.
> 
> No, that looks cleaner, but it's wrong.  hugetlb_free_pgd_range does
> something useful on those architectures which have a not-always-false
> is_hugepage_only_range (ia64 and powerpc alone): it's paired with it.
> (Though as you've noticed, powerpc only does the usual free_pgd_range.)
> 
> hugetlb_free_pgd_range does nothing on most architectures, even those
> (i386 etc) which have a not-always-false is_vm_hugetlb_page: we do
> want to free_pgd_range on those.  So using is_vm_hugetlb_page instead
> of is_hugepage_only_range is wrong for them.  Though I guess you could
> change their hugetlb_free_pgd_range definitions to free_pgd_range, and
> then use is_vm_hugetlb_page as you did, that would work too (though
> with less combining of vmas in that loop, so not an improvement).

Yes, I realised that was wrong shortly after posting.  In fact it's
wrong in just the same way that is_hugepage_only_range() is wrong for
powerpc right now - which we work around becuse
hugetlb_free_pgd_range() is identical to free_pgd_range().

I can see two ways of fixing this.  The quick, hacky fix is to use
is_vm_hugetlb_page(), and work around the problems by having
hugetlb_free_pgd_range() be identical to free_pgd_range() in most
cases.  Fixing it more cleanly will need a new callback that actually
encodes this "need special pagetable freeing" concept.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

  reply	other threads:[~2006-02-22 23:49 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-02-22  0:13 IA64 non-contiguous memory space bugs David Gibson
2006-02-22  0:39 ` David S. Miller
2006-02-22  2:17   ` David Gibson
2006-02-22  1:31 ` Chen, Kenneth W
2006-02-22  2:15   ` 'David Gibson'
2006-02-22  1:51 ` Chen, Kenneth W
2006-02-22  2:25   ` 'David Gibson'
2006-02-22 16:35     ` Hugh Dickins
2006-02-22 23:49       ` 'David Gibson' [this message]
2006-02-23 20:13         ` Hugh Dickins
2006-02-24  0:11           ` 'David Gibson'
2006-02-22  2:45 ` Chen, Kenneth W
2006-02-22  2:53 ` Chen, Kenneth W
2006-02-22  3:55   ` David Mosberger-Tang
2006-02-22  4:02     ` David Gibson
2006-02-22  3:01 ` Zhang, Yanmin
2006-02-22 16:19 ` Chen, Kenneth W
2006-02-22 23:26   ` 'David Gibson'
2006-02-24  1:14 ` Chen, Kenneth W
2006-02-24  2:46   ` 'David Gibson'

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060222234949.GB25108@localhost.localdomain \
    --to=david@gibson.dropbear.id.au \
    --cc=hugh@veritas.com \
    --cc=kenneth.w.chen@intel.com \
    --cc=linux-ia64@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox