linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: cdall@cs.columbia.edu (Christoffer Dall)
To: linux-arm-kernel@lists.infradead.org
Subject: [RFC PATCH 06/18] ARM: LPAE: Introduce the 3-level page table format definitions
Date: Mon, 6 Dec 2010 10:27:50 +0100	[thread overview]
Message-ID: <AANLkTimEdNj71v-oiKKho27DWcZbxOWtO6M4eLg2NDRE@mail.gmail.com> (raw)
In-Reply-To: <201010252025.24128.arnd@arndb.de>

Sorry for jumping in here at such a late hour...

>> You can look at the IPA as the virtual address translation set up by the
>> hypervisor (stage 2 translation). The guest OS only sets up stage 1
>> translations but can use 40-bit physical addresses (via stage 1) with or
>> without the hypervisor. The input to the stage 1 translations is always
>> 32-bit.
>
>
> Right, that's what I thought.
>
>> > Are there any significant differences to Linux between setting up page
>> > tables for a 32 bit VA space or a 40 bit IPA space, other than the
>> > size of the PGD?
>>
>> I think I get what you were asking :).
>>
>> >From KVM you could indeed set up stage 2 translations that a guest OS
>> can use (you need some code running in hypervisor mode to turn this on).
>> The format is pretty close to the stage 1 tables, so the Linux macros
>> could be reused. The PGD size would be different (depending on whether
>> you want to emulate 40-bit physical address space or a 32-bit one).
>> There are also a few bits (memory attributes) that may differ but you
>> could handle them in KVM.
>>
>> If KVM would reuse the existing pgd/pmd/pte Linux macros, it would
>> indeed be restricted to 32-bit IPA (sizeof(long)). You may need to
>> define different macros to use either a pfn or long long as address
>> input.

I'm not even sure it would be a big advantage to re-use the macros for
KVM. Sure, creating separate macros may duplicate some bit-shifting
logic, but my guess is that code will be easier to read if using
separate macros for the 2-nd stage translation in KVM. One might also
imagine specific virtualization-oriented bits which could be
explicitly names or directly targeted in macros that don't have to
handle both standard non-virt tables and 2-nd stage translation
tables.

At least from my experience writing KVM code, it's difficult enough to
make it clear to anyone reading the code which address space exactly
is being referenced at which time.

>> But if KVM uses qemu for platform emulation, this may only support
>> 32-bit physical address space so the guest OS could only generate 32-bit
>> IPA.
>
> Good point. At the very least, qemu would need a way to get at the highmem
> portion of the guest that is not normally part of the qemu virtual address
> space. In fact this would already be required without LPAE in order to run
> a VM with 4GB guest physical addressing.
>
> There are probable (slow) ways of doing that, e.g. remap_file_pages or
> a new syscall for accessing high guest memory. It's not entirely clear
> to me how useful that is, the most sensible way to start here is certainly
> to start out with a 32-bit IPA as you suggested and see how badly that
> limits guests in real-world setups.

So this depends on what the use would be. True, if you wanted a guest
that used more than 4GB of memory AND you wanted QEMU to be able to
readily access all of that, then yes, it would be difficult on a
32-bit architecture.

But QEMU doesn't really use the mmap'ed areas backing physical memory
for anything - it's merely a way of telling KVM how much physical
memory should be given to the guest, and the kernel side conveniently
uses get_user_pages() to access that memory. Instead, QEMU could
simply call an IOCTL to KVM telling it something like
register_user_memory(long long base_phys_addr, long long size); and
KVM could just allocate physical pages to back that without them being
mapped on the host side. An individual page could be mapped in as
needed for emulation and mapped out again. I don't see a huge
performance hit for such a solution.

But as you both suggest, 32-bit physical address space is probably
going to be more than needed for initial uses of ARM virtual machines.

-Christoffer

  reply	other threads:[~2010-12-06  9:27 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-10-25  8:59 [RFC PATCH 00/18] ARM: Add support for the Large Physical Address Extensions Catalin Marinas
2010-10-25  8:59 ` [RFC PATCH 01/18] ARM: LPAE: Use PMD_(SHIFT|SIZE|MASK) instead of PGDIR_* Catalin Marinas
2010-10-25  8:59 ` [RFC PATCH 02/18] ARM: LPAE: Factor out 2-level page table definitions into separate files Catalin Marinas
2010-10-25  8:59 ` [RFC PATCH 03/18] ARM: LPAE: use u32 instead of unsigned long for 32-bit ptes Catalin Marinas
2010-10-25  8:59 ` [RFC PATCH 04/18] ARM: LPAE: Do not assume Linux PTEs are always at PTRS_PER_PTE offset Catalin Marinas
2010-10-25  9:00 ` [RFC PATCH 05/18] ARM: LPAE: Introduce L_PTE_NOEXEC and L_PTE_NOWRITE Catalin Marinas
2010-10-25  9:00 ` [RFC PATCH 06/18] ARM: LPAE: Introduce the 3-level page table format definitions Catalin Marinas
2010-10-25 11:15   ` Arnd Bergmann
2010-10-25 11:59     ` Catalin Marinas
2010-10-25 13:25       ` Arnd Bergmann
2010-10-25 16:18         ` Catalin Marinas
2010-10-25 18:25           ` Arnd Bergmann
2010-12-06  9:27             ` Christoffer Dall [this message]
2010-12-06 14:21               ` Arnd Bergmann
2010-10-25  9:00 ` [RFC PATCH 07/18] ARM: LPAE: Page table maintenance for the 3-level format Catalin Marinas
2010-10-25  9:00 ` [RFC PATCH 08/18] ARM: LPAE: MMU setup for the 3-level page table format Catalin Marinas
2010-10-25  9:00 ` [RFC PATCH 09/18] ARM: LPAE: Add fault handling support Catalin Marinas
2010-10-25 10:17   ` Kirill A. Shutemov
2010-10-25 10:35     ` Catalin Marinas
2010-10-25  9:00 ` [RFC PATCH 10/18] ARM: LPAE: Add context switching support Catalin Marinas
2010-10-25  9:00 ` [RFC PATCH 11/18] ARM: LPAE: Add SMP support for the 3-level page table format Catalin Marinas
2010-10-25  9:00 ` [RFC PATCH 12/18] ARM: LPAE: use phys_addr_t instead of unsigned long for physical addresses Catalin Marinas
2010-10-25  9:00 ` [RFC PATCH 13/18] ARM: LPAE: ensure dma_addr_t is the same size as phys_addr_t Catalin Marinas
2010-10-25 11:08   ` Arnd Bergmann
2010-10-25 11:32     ` Catalin Marinas
2010-10-25 12:01       ` FUJITA Tomonori
2010-10-25 12:31         ` Catalin Marinas
2010-10-25  9:00 ` [RFC PATCH 14/18] ARM: LPAE: mark memory banks with start > ULONG_MAX as highmem Catalin Marinas
2010-10-25  9:00 ` [RFC PATCH 15/18] ARM: LPAE: use phys_addr_t for physical start address in early_mem Catalin Marinas
2010-10-25  9:01 ` [RFC PATCH 16/18] ARM: LPAE: add support for ATAG_MEM64 Catalin Marinas
2010-10-25  9:01 ` [RFC PATCH 17/18] ARM: LPAE: define printk format for physical addresses and page table entries Catalin Marinas
2010-10-25 12:01   ` Kirill A. Shutemov
2010-10-25  9:01 ` [RFC PATCH 18/18] ARM: LPAE: Add the Kconfig entries Catalin Marinas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=AANLkTimEdNj71v-oiKKho27DWcZbxOWtO6M4eLg2NDRE@mail.gmail.com \
    --to=cdall@cs.columbia.edu \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).