From: catalin.marinas@arm.com (Catalin Marinas)
To: linux-arm-kernel@lists.infradead.org
Subject: [RFC] ARM64: 4 level page table translation for 4KB pages
Date: Mon, 31 Mar 2014 16:27:19 +0100
Message-ID: <20140331152719.GH29871@arm.com>
In-Reply-To: <9531814.OxBzcO1V3J@wuerfel>
On Mon, Mar 31, 2014 at 01:53:20PM +0100, Arnd Bergmann wrote:
> On Monday 31 March 2014 12:31:14 Catalin Marinas wrote:
> > On Mon, Mar 31, 2014 at 07:56:53AM +0100, Arnd Bergmann wrote:
> > > On Monday 31 March 2014 12:51:07 Jungseok Lee wrote:
> > > > The current ARM64 kernel cannot support 4KB pages for the 40-bit physical
> > > > address space described in [1] due to one major issue and one minor issue.
> > > >
> > > > Firstly, the kernel logical memory map (0xffffffc000000000-0xffffffffffffffff)
> > > > cannot cover the DRAM region from 544GB to 1024GB in [1]. Specifically, the
> > > > ARM64 kernel fails to create a mapping for this region in the map_mem function
> > > > (arch/arm64/mm/mmu.c) since __phys_to_virt for this region overflows the
> > > > address space. I've used 3.14-rc8 + Fast Models to validate this statement.
> > >
> > > It took me a while to understand what is going on, but it essentially comes
> > > down to the logical memory map (0xffffffc000000000-0xffffffffffffffff)
> > > being able to represent only RAM in the first 256GB of address space.
> > >
> > > More importantly, this means that any system following [1] will only be
> > > able to use 32GB of RAM, which is a much more severe restriction than
> > > what it sounds like at first.
> >
> > On a 64-bit platform, do we still need the alias at the bottom and the
> > 512-544GB hole (even for 32-bit DMA, top address bits can be wired to
> > 512GB)? Only the idmap would need 4 levels, but that's static, we don't
> > need to switch Linux to 4-levels. Otherwise the memory is too sparse.
>
> I think we should keep a static virtual-to-physical mapping,
Just so that I understand: with a PHYS_OFFSET of 0?
> and to keep
> relocating the kernel at compile time without a hack like ARM_PATCH_PHYS_VIRT
> if at all possible.
and the kernel running at a virtual alias at a higher position than the
end of the mapped RAM? IIUC x86_64 does something similar.
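To make the overflow Jungseok mentions concrete, here is a rough
user-space sketch of the current 3-level linear map arithmetic,
assuming (for illustration only) a PHYS_OFFSET of 0; the 544GB value is
the start of the second DRAM region in [1]:

#include <stdint.h>
#include <stdio.h>

#define PAGE_OFFSET	0xffffffc000000000ULL	/* 39-bit VA, linear map start */
#define PHYS_OFFSET	0x0ULL			/* assumed start of RAM */

/* same formula as the kernel's __phys_to_virt() */
static uint64_t phys_to_virt(uint64_t paddr)
{
	return paddr - PHYS_OFFSET + PAGE_OFFSET;
}

int main(void)
{
	/* 2GB: fits within the 256GB linear map */
	printf("%#llx -> %#llx\n", 0x80000000ULL,
	       (unsigned long long)phys_to_virt(0x80000000ULL));
	/* 544GB: wraps past the top of the 64-bit address space */
	printf("%#llx -> %#llx\n", 0x8800000000ULL,
	       (unsigned long long)phys_to_virt(0x8800000000ULL));
	return 0;
}

Anything at or above PHYS_OFFSET + 256GB wraps past 0xffffffffffffffff,
which is why map_mem() cannot cover the 544GB-1024GB region with the
current PAGE_OFFSET.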
> > > > Secondly, the vmemmap space is not enough to cover more than about 585GB of
> > > > physical address space. Fortunately, this issue can be resolved by utilizing the
> > > > extra vmemmap space (0xffffffbe00000000-0xffffffbffbbfffff) in [2]. However,
> > > > it would not cover systems with a couple of terabytes of DRAM.
> > >
> > > This one can be trivially changed by taking more space out of the vmalloc
> > > area, to go much higher if necessary. vmemmap space is always just a fraction
> > > of the linear mapping size, so we can accommodate it by definition if we
> > > find space to fit the physical memory.
> >
> > vmemmap is the total range / page size * sizeof(struct page). So for 1TB
> > range and 4K pages we would need 8GB (the current value, unless I
> > miscalculated the above). Anyway, you can't cover 1TB range with
> > 3-levels.
>
> The size of 'struct page' depends on a couple of configuration variables.
> If they are all enabled, you might need a bit more, even for configurations
> that don't have that much address space.
Yes. We could make vmemmap configurable at run-time or just go for a
maximum value.
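For reference, a quick back-of-the-envelope version of the calculation
above; the 32 and 64 byte struct page sizes are only illustrative
since, as Arnd says, the real size depends on the configuration:

#include <stdio.h>

int main(void)
{
	unsigned long long range = 1ULL << 40;		/* 1TB physical range */
	unsigned long long pages = range / 4096;	/* 4KB pages */
	unsigned long long sizes[] = { 32, 64 };	/* assumed sizeof(struct page) */
	int i;

	for (i = 0; i < 2; i++)
		printf("sizeof(struct page) = %llu -> vmemmap = %lluGB\n",
		       sizes[i], pages * sizes[i] >> 30);
	return 0;
}

That gives 8GB and 16GB respectively, so sizing vmemmap for the largest
reasonable struct page would not cost much extra VA space.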
> > > > Therefore, it would be necessary to implement 4 level page table translation
> > > > for 4KB pages on platforms with a 40-bit physical address space. Someone might
> > > > suggest using 64KB pages in this case, but I'm not sure how to
> > > > deal with the internal memory fragmentation.
> > > >
> > > > I would like to contribute 4 level page table translation upstream,
> > > > targeting the 3.16 kernel, if there is no movement on it. I saw
> > > > some related RFC patches a couple of months ago, but they didn't seem to
> > > > be merged into the maintainer's tree.
> > >
> > > I think you are answering the wrong question here. Four level page tables
> > > should not be required to support >32GB of RAM, that would be very silly.
> >
> > I agree, we should only enable 4 levels of page table if we have close
> > to 512GB of RAM or the range is too sparse, but I would actually push
> > back on the hardware guys to keep it tighter.
>
> But remember this part:
>
> > > There are good reasons to use a 50 bit virtual address space in user
> > > land, e.g. for supporting data base applications that mmap huge files.
>
> You may actually need 4-level tables even if you have much less installed
> memory, depending on how the application is written. Note that x86, powerpc
> and s390 all chose to use 4-level tables for 64-bit kernels all the
> time, even though they can also use 3-level or 5-level in some cases.
I don't mind 4-level tables by default, but I would still keep a
configuration option (or at least do some benchmarks to assess the
impact before switching permanently to 4 levels). There are mobile
platforms that don't really need as much VA space (and people are even
talking about ILP32).
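As a purely hypothetical sketch of what such an option could look like
(CONFIG_ARM64_4_LEVELS and the exact values below are made up for
illustration, not an existing interface), the VA split would simply be
selected at build time:

#ifdef CONFIG_ARM64_4_LEVELS
#define VA_BITS		48	/* 4 levels with 4KB pages */
#else
#define VA_BITS		39	/* current 3 levels with 4KB pages */
#endif
/* linear mapping starts half-way through the kernel VA space */
#define PAGE_OFFSET	(0xffffffffffffffffUL << (VA_BITS - 1))

Mobile platforms could then stay with the cheaper 3-level configuration.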
> > > If this is not the goal however, we should not pay for the overhead
> > > of the extra page table in user space. I can see two other possible
> > > solutions for the problem:
> > >
> > > a) always use a four-level page table in kernel space, regardless of
> > > whether we do it in user space. We can move the kernel mappings down
> > > in address space either by one 512GB entry to 0xffffff0000000000, or
> > > to match the 64k-page location at 0xfffffc0000000000, or all the way
> > > to 0xfffc000000000000. In any case, we can have all the dynamic
> > > mappings within one 512GB area and pretend we have a three-level
> > > page table for them, while the rest of DRAM is mapped statically at
> > > early boot time using 512GB large pages.
> >
> > That's a workaround but we end up with two (or more) kernel pgds - one
> > for vmalloc, ioremap etc. and another (static) one for the kernel linear
> > mapping. So far there isn't any memory mapping carved out but we have to
> > be careful in the future.
> >
> > However, kernel page table walking would be a bit slower with 4-levels.
>
> Do we actually walk the kernel page tables that often? With what I suggested,
> we can still pretend that it's 3-level for all practical purposes, since
> you wouldn't walk the page tables for the linear mapping.
I was referring to hardware page table walk (TLB miss). Again, we need
some benchmarks (it gets worse in a guest as it needs to walk the stage
2 for each stage 1 level miss; if you are really unlucky you can have up
to 24 memory accesses for a TLB miss with two translation stages and 4
levels each).
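For the record, the 24 above is just the worst-case arithmetic, assuming
the walk caches don't help at all: each of the 4 stage 1 levels is an
IPA that needs a full 4-level stage 2 walk plus the access to the stage
1 descriptor itself, and the final output address needs one more stage 2
walk:

	4 * (4 + 1) + 4 = 24 memory accesses per TLB miss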
--
Catalin