From: David Gibson <david@gibson.dropbear.id.au>
To: Becky Bruce <beckyb@kernel.crashing.org>
Cc: linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH 04/13] powerpc: Update hugetlb huge_pte_alloc and tablewalk code for FSL BOOKE
Date: Wed, 30 Nov 2011 12:10:18 +1100 [thread overview]
Message-ID: <20111130011018.GB5435@truffala.fritz.box> (raw)
In-Reply-To: <9FB8A0F3-ED6C-4C4F-BB9F-87CDEB19C3C2@kernel.crashing.org>
On Tue, Nov 29, 2011 at 10:36:42AM -0600, Becky Bruce wrote:
>
> On Nov 28, 2011, at 11:25 PM, Benjamin Herrenschmidt wrote:
>
> > On Mon, 2011-10-10 at 15:50 -0500, Becky Bruce wrote:
> >> From: Becky Bruce <beckyb@kernel.crashing.org>
> >>
> >> This updates the hugetlb page table code to handle 64-bit FSL_BOOKE.
> >> The previous 32-bit work counted on the inner levels of the page table
> >> collapsing.
> >
> > Seriously, my brain hurts !!!
>
> Now you know how I felt when I got the original code from David :)
Heh, I can't blame you. Between the constraints of the hardware and
fitting in with x86, that thing's accreted into a horrible pile.
> > So I've tried to understand that code and so far, what I came up with is
> > this, please reply and let me know if I'm full of crack or what ...
> >
> > - David's code has entire levels "branching off" into hugepd's which
> > are hugetlb specific page dirs. That requires address space constrainst.
>
> Yes.
>
> > - The hack you do with HUGEPD_PGD_SHIFT defined to PGDIR_SHIFT and
> > HUGEPD_PUD_SHIFT to PUD_SHIFT consists of collasping that back into the
> > normal levels.
>
> That exists so Jimi's code for A2 and mine can coexist peacefully
> without ifdeffing huge blocks because we peel off the page table at
> different points for a hugepd. In my case, if I have a 4M page,
> that is handled by having 2 entries at the 2M layer, each of which
> is a pointer to the same pte stored in a kmem_cache created for that
> purpose. In the server/A2 case, they peel off at the layer above 4M
> and have a sub-table kmem_cache that has a bunch of same-size huge
> page ptes (this is all because of the slice constraint).
>
> > - I really don't understand what you are doing in __hugepte_alloc(). It
> > seems to me that you are trying to make things still point to some kind
> > of separate hugepd dir with the hugeptes in it and have the page tables
> > point to that but I don't quite get it.
>
> In your example below, the alloc code is:
> 1) allocating a small kmem_cache for the pte
>
> 2) filling in 8 entries at the 2M level with the pointers to that pte, with the upper bit munged to indicate huge and bits in the lower region that store the huge page size because it can't be encoded in the book3e pte format
>
> > - Couldn't we just instead ditch the whole hugepd concept alltogether
> > and simply have the huge ptes in the page table at the right level,
> > using possibly multiple consecutive of them for a single page when
> > needed ?
> >
> > Example: 4K base page size. PMD maps 2M. a 16M page could be
> > representing by having 8 consecutive hugeptes pointing to the same huge
> > page in the PMD directory.
>
> We currently have 8 consecutive PMD entries that are pointers to the
> same kmem_cache that holds the actual PTE. I did this for a few
> reasons:
>
> 1) I was trying to stay as close to what David had done as possible
>
> 2) symmetry - in every other case entries at higher levels of the
> normal page table are pointers to something, and it's easy to
> identify that something is a pointer to hugepte using David's
> upper-bit-flipping trick. If we have an actual entry mixed in with
> the pointers it might be hard to tell that's it's an actual PTE and
> not a pointer without getting information from somewhere else
> (i.e. the vma)
>
> 3) I was trying to avoid having multiple copies of the actual pte -
> this way it's easy to do stuff like change the perms on the PTE,
> since I only have to modify one copy
>
> 4) I needed the information laid out for quick TLB miss
> fault-handling of hugepte tlb misses (see below).
>
> > I believe we always "know" when accessing a PTE whether it's going to be
> > huge or not and if it's huge, the page size. IE. All the functions we
> > care about either get the vma (which gets you everything you need) or
> > the size (huge_pte_alloc).
>
> An exception is the 32-bit fault hander asm code. It does a walk of
> the page table to reload the tlb. We need to be able to easily
> identify that we're walking into a hugepage area so we know to load
> the tlbcam. Having the pointer there with the munged upper bit that
> David devised is very convenient for that. Also, the Book3e page
> table format does not allow us to represent page sizes > 32m. So
> that's encoded in the hugepd instead (and not in the pte).
>
> I'm not sure how to get around this without slowing things down. I
> originally had a slower handler and it turned out to impact
> performance of several important workloads and my perf guys griped
> at me. I was actually eventually planning to rewrite the 64b fsl
> book3e handler to deal with this in asm as well. Large workloads on
> our systems do a lot of tlbcam entry replacement due to 1) the small
> size of the tlbcam and 2) the lack of any hardware replacement
> policy on that array.
>
> There are other places where we'd have to modify the code to have
> the vma available (not that it's hard to do, but it's not floating
> around everywhere). And there may be other places where this is an
> issue - I'd have to go dig around a bit to answer that.
>
> For the record, I hate the idea of not being able to walk the page
> table without going elsewhere for information. IMHO I should be
> able to tell everything I need to load a TLB entry from there
> without digging up a vma.
I agree, all things being equal, but there might be tradeoffs that
make it the least bad option.
> > This should be much simpler than what we have today.
> >
> > That way, we have a completely generic accross-the-board way to store
> > huge pte's in our page tables, which is totally disconnected from the
> > slices. The later remains a purely server-only thing which only affects
> > the SLB code and get_unmapped_area().
>
> David and Jimi will have to comment about whether they can flatten
> out their stuff to just store PTEs. A lot of my code exists because
> I was attempting to be as close to the IBM implementation as
> possible.
I was talking with Ben about this yesterday, and I think it can
probably be done.
> > That means that we'll have to find another option for PowerEN giant
> > indirects but that's a non issue at the moment. I think we can keep the
> > complexity local to the PowerEN code by doing shadows there if we need
> > to.
> >
> > What do you think ?
>
> I'm happy if we can come up with another solution that still allows
> me to do my miss fault handling efficiently in asm.... What we do on
> FSL Booke with storing multiple pointers to a single pte is actually
> a fairly clean solution given the constraints. It's only in the
> context of having a different implementation coexisting with it that
> makes things start getting complicated. If you have some
> suggestions on another way to deal with this, I'm all ears.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
next prev parent reply other threads:[~2011-11-30 1:10 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-10-10 20:50 [PATCH 0/13] Hugetlb for 64-bit Freescale Book3E Becky Bruce
2011-10-10 20:50 ` [PATCH 01/13] powerpc: Only define HAVE_ARCH_HUGETLB_UNMAPPED_AREA if PPC_MM_SLICES Becky Bruce
2011-10-10 20:50 ` [PATCH 02/13] powerpc: hugetlb: fix huge_ptep_set_access_flags return value Becky Bruce
2011-10-10 20:50 ` [PATCH 03/13] powerpc: Fix booke hugetlb preload code for PPC_MM_SLICES and 64-bit Becky Bruce
2011-10-10 20:50 ` [PATCH 04/13] powerpc: Update hugetlb huge_pte_alloc and tablewalk code for FSL BOOKE Becky Bruce
2011-10-10 20:50 ` [PATCH 05/13] powerpc: hugetlb: modify include usage for FSL BookE code Becky Bruce
2011-10-10 20:50 ` [PATCH 06/13] powerpc: Whitespace/comment changes to tlb_low_64e.S Becky Bruce
2011-10-10 20:50 ` [PATCH 07/13] powerpc: Add hugepage support to 64-bit tablewalk code for FSL_BOOK3E Becky Bruce
2011-10-10 20:50 ` [PATCH 08/13] powerpc: Add gpages reservation code for 64-bit FSL BOOKE Becky Bruce
2011-10-10 20:50 ` [PATCH 09/13] powerpc: Kconfig updates for FSL BookE HUGETLB 64-bit Becky Bruce
2011-10-10 20:50 ` [PATCH 10/13] powerpc: Update mpc85xx/corenet 32-bit defconfigs Becky Bruce
2011-10-10 20:50 ` [PATCH 11/13] powerpc: Enable Hugetlb by default for 32-bit 85xx/corenet Becky Bruce
2011-10-10 20:50 ` [PATCH 12/13] powerpc: Update corenet64_smp_defconfig Becky Bruce
2011-10-10 20:50 ` [PATCH 13/13] powerpc: Enable hugetlb by default for corenet64 platforms Becky Bruce
2011-10-12 4:29 ` [PATCH 12/13] powerpc: Update corenet64_smp_defconfig Kumar Gala
2011-10-12 4:29 ` [PATCH 10/13] powerpc: Update mpc85xx/corenet 32-bit defconfigs Kumar Gala
2011-12-09 22:05 ` Tabi Timur-B04825
2011-11-29 5:25 ` [PATCH 04/13] powerpc: Update hugetlb huge_pte_alloc and tablewalk code for FSL BOOKE Benjamin Herrenschmidt
2011-11-29 16:36 ` Becky Bruce
2011-11-30 1:10 ` David Gibson [this message]
2011-11-25 0:43 ` [PATCH 03/13] powerpc: Fix booke hugetlb preload code for PPC_MM_SLICES and 64-bit Benjamin Herrenschmidt
2011-11-28 16:01 ` Kumar Gala
2011-11-28 16:54 ` Becky Bruce
2011-11-28 22:50 ` Benjamin Herrenschmidt
2011-11-29 3:58 ` [PATCH 02/13] powerpc: hugetlb: fix huge_ptep_set_access_flags return value Benjamin Herrenschmidt
2011-11-29 16:52 ` Becky Bruce
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20111130011018.GB5435@truffala.fritz.box \
--to=david@gibson.dropbear.id.au \
--cc=beckyb@kernel.crashing.org \
--cc=linuxppc-dev@lists.ozlabs.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).