From: Julien Grall <julien.grall@arm.com>
To: Sergej Proskurin <proskurin@sec.in.tum.de>,
xen-devel@lists.xenproject.org
Cc: Stefano Stabellini <sstabellini@kernel.org>
Subject: Re: [PATCH v4 7/9] arm/mem_access: Add long-descriptor based gpt
Date: Thu, 22 Jun 2017 13:12:31 +0100
Message-ID: <be0a5086-a72d-afb8-4ae2-5cde3db01477@arm.com>
In-Reply-To: <20170620203332.17833-8-proskurin@sec.in.tum.de>
Hi Sergej,
On 20/06/17 21:33, Sergej Proskurin wrote:
> This commit adds functionality to walk the guest's page tables using the
> long-descriptor translation table format for both ARMv7 and ARMv8.
> Similar to the hardware architecture, the implementation supports
> different page granularities (4K, 16K, and 64K). The implementation is
> based on ARM DDI 0487B.a J1-5922, J1-5999, and ARM DDI 0406C.b B3-1510.
>
> Note that the current implementation lacks support for Large VA/PA on
> ARMv8.2 architectures (LVA/LPA, 52-bit virtual and physical address
> sizes). The associated location in the code is marked appropriately.
>
> Signed-off-by: Sergej Proskurin <proskurin@sec.in.tum.de>
> ---
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Julien Grall <julien.grall@arm.com>
> ---
> v2: Use TCR_SZ_MASK instead of TTBCR_SZ_MASK for ARM 32-bit guests using
> the long-descriptor translation table format.
>
> Cosmetic fixes.
>
> v3: Move the implementation to ./xen/arch/arm/guest_copy.c.
>
> Remove the array strides and declare the array grainsizes as static
> const instead of just const to reduce the function stack overhead.
>
> Move parts of the function guest_walk_ld into the static functions
> get_ttbr_and_gran_64bit and get_top_bit to reduce complexity.
>
> Use the macro BIT(x) instead of (1UL << x).
>
> Add more comments && Cosmetic fixes.
>
> v4: Move functionality responsible for determining the configured IPA
> output-size into a separate function get_ipa_output_size. In this
> function, we remove the previously used switch statement, which was
> responsible for distinguishing between different IPA output-sizes.
> Instead, we retrieve the information from the introduced ipa_sizes
> array.
>
> Remove the defines GRANULE_SIZE_INDEX_* and TTBR0_VALID from
> guest_walk.h. Instead, introduce the enums granule_size_index
> active_ttbr directly inside of guest_walk.c so that the associated
> fields don't get exported.
>
> Adapt the function to the new parameter of type "struct vcpu *".
>
> Remove support for 52bit IPA output-sizes entirely from this commit.
>
> Use lpae_* helpers instead of p2m_* helpers.
>
> Cosmetic fixes & Additional comments.
> ---
> xen/arch/arm/guest_walk.c | 385 +++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 383 insertions(+), 2 deletions(-)
>
> diff --git a/xen/arch/arm/guest_walk.c b/xen/arch/arm/guest_walk.c
> index b8bb553a6e..c37c595157 100644
> --- a/xen/arch/arm/guest_walk.c
> +++ b/xen/arch/arm/guest_walk.c
> @@ -16,6 +16,8 @@
> */
>
> #include <xen/sched.h>
> +#include <xen/domain_page.h>
> +#include <asm/guest_walk.h>
Please order the includes in alphabetical order.
>
> /*
> * The function guest_walk_sd translates a given GVA into an IPA using the
> @@ -33,6 +35,142 @@ static int guest_walk_sd(const struct vcpu *v,
> }
>
> /*
> + * Get the IPA output_size (configured in TCR_EL1) that shall be used for the
> + * long-descriptor based translation table walk.
> + */
> +static unsigned int get_ipa_output_size(struct domain *d, register_t tcr)
> +{
> + unsigned int output_size;
> + uint64_t ips;
> +
> + static const uint64_t ipa_sizes[7] = {
> + TCR_EL1_IPS_32_BIT_VAL,
> + TCR_EL1_IPS_36_BIT_VAL,
> + TCR_EL1_IPS_40_BIT_VAL,
> + TCR_EL1_IPS_42_BIT_VAL,
> + TCR_EL1_IPS_44_BIT_VAL,
> + TCR_EL1_IPS_48_BIT_VAL,
> + TCR_EL1_IPS_52_BIT_VAL
> + };
> +
> + if ( is_64bit_domain(d) )
> + {
> + /* Get the intermediate physical address size. */
> + ips = (tcr & TCR_EL1_IPS_MASK) >> TCR_EL1_IPS_SHIFT;
> +
> + /*
> + * Return an error on reserved IPA output-sizes and if the IPA
> + * output-size is 52bit.
> + *
> + * XXX: 52 bit output_size is not supported yet.
> + */
> + if ( ips > TCR_EL1_IPS_48_BIT )
> + return -EFAULT;
> +
> + output_size = ipa_sizes[ips];
> + }
> + else
> + output_size = TCR_EL1_IPS_40_BIT_VAL;
> +
> + return output_size;
> +}
> +
> +/* Normalized page granule size indices. */
> +enum granule_size_index {
> + GRANULE_SIZE_INDEX_4K,
> + GRANULE_SIZE_INDEX_16K,
> + GRANULE_SIZE_INDEX_64K
> +};
> +
> +/* Represent whether TTBR0 or TTBR1 is active. */
> +enum active_ttbr {
> + TTBR0_ACTIVE,
> + TTBR1_ACTIVE
> +};
> +
> +/*
> + * Select the TTBR(0|1)_EL1 that will be used for address translation using the
> + * long-descriptor translation table format and return the page granularity
> + * that is used by the selected TTBR.
> + */
> +static bool get_ttbr_and_gran_64bit(uint64_t *ttbr, unsigned int *gran,
> + register_t tcr, enum active_ttbr ttbrx)
> +{
> + bool disabled;
> +
> + if ( ttbrx == TTBR0_ACTIVE )
> + {
> + /* Normalize granule size. */
> + switch ( tcr & TCR_TG0_MASK )
> + {
> + case TCR_TG0_16K:
> + *gran = GRANULE_SIZE_INDEX_16K;
> + break;
> + case TCR_TG0_64K:
> + *gran = GRANULE_SIZE_INDEX_64K;
> + break;
> + default:
Please explain why you use 4K by default.
> + *gran = GRANULE_SIZE_INDEX_4K;
> + }
> +
> + /* Use TTBR0 for GVA to IPA translation. */
> + *ttbr = READ_SYSREG64(TTBR0_EL1);
> +
> + /* If TCR.EPD0 is set, translations using TTBR0 are disabled. */
> + disabled = tcr & TCR_EPD0;
> + }
> + else
> + {
> + /* Normalize granule size. */
> + switch ( tcr & TCR_EL1_TG1_MASK )
> + {
> + case TCR_EL1_TG1_16K:
> + *gran = GRANULE_SIZE_INDEX_16K;
> + break;
> + case TCR_EL1_TG1_64K:
> + *gran = GRANULE_SIZE_INDEX_64K;
> + break;
> + default:
> + *gran = GRANULE_SIZE_INDEX_4K;
Please explain why you use 4K by default.
> + }
> +
> + /* Use TTBR1 for GVA to IPA translation. */
> + *ttbr = READ_SYSREG64(TTBR1_EL1);
> +
> + /* If TCR.EPD1 is set, translations using TTBR1 are disabled. */
> + disabled = tcr & TCR_EPD1;
> + }
> +
> + return disabled;
> +}
> +
> +/*
> + * Get the MSB number of the GVA, according to "AddrTop" pseudocode
> + * implementation in ARM DDI 0487B.a J1-6066.
> + */
> +static unsigned int get_top_bit(struct domain *d, vaddr_t gva, register_t tcr)
> +{
> + unsigned int topbit;
> +
> + /*
> + * If EL1 is using AArch64, then addresses from EL0 using AArch32 are
> + * zero-extended to 64 bits (ARM DDI 0487B.a J1-6066).
> + */
> + if ( is_32bit_domain(d) )
> + topbit = 31;
> + else if ( is_64bit_domain(d) )
> + {
> + if ( ((gva & BIT(55)) && (tcr & TCR_EL1_TBI1)) ||
> + (!(gva & BIT(55)) && (tcr & TCR_EL1_TBI0)) )
> + topbit = 55;
> + else
> + topbit = 63;
> + }
> +
> + return topbit;
> +}
> +
> +/*
> * The function guest_walk_ld translates a given GVA into an IPA using the
> * long-descriptor translation table format in software. This function assumes
> * that the domain is running on the currently active vCPU. To walk the guest's
> @@ -43,8 +181,251 @@ static int guest_walk_ld(const struct vcpu *v,
> vaddr_t gva, paddr_t *ipa,
> unsigned int *perms)
> {
> - /* Not implemented yet. */
> - return -EFAULT;
> + bool disabled = true;
> + bool ro_table = false, xn_table = false;
> + unsigned int t0_sz, t1_sz;
> + unsigned int level, gran;
> + unsigned int topbit = 0, input_size = 0, output_size;
> + uint64_t ttbr = 0;
> + paddr_t mask;
> + lpae_t pte, *table;
> + struct page_info *page;
> + register_t tcr = READ_SYSREG(TCR_EL1);
> + struct domain *d = v->domain;
> +
> + const vaddr_t offsets[4][3] = {
It would have been more readable if you had inverted the indices (i.e.
granularity first, then level). It would also allow you to use a macro:
#define OFFSET(gva, gran)                       \
{                                               \
    zeroeth_guest_table_offset_##gran(gva),     \
    first_guest_table_offset_##gran(gva),       \
    second_guest_table_offset_##gran(gva),      \
    third_guest_table_offset_##gran(gva),       \
}
const vaddr_t offsets[3][4] = {
OFFSET(gva, 4K),
OFFSET(gva, 16K),
OFFSET(gva, 64K),
};
#undef OFFSET
> + {
> +#ifdef CONFIG_ARM_64
Again, I find those #ifdefs pointless, all the more so as you don't use
them for the masks below... and you will always compute them for 32-bit
domains on AArch64 Xen anyway.
> + zeroeth_guest_table_offset_4K(gva),
> + zeroeth_guest_table_offset_16K(gva),
> + 0, /* There is no zeroeth lookup level with a 64K granule size. */
Please use zeroeth_guest_table_offset_64K(gva), as it already does the
job for you.
> +#endif
> + },
> + {
> + first_guest_table_offset_4K(gva),
> +#ifdef CONFIG_ARM_64
Ditto
> + first_guest_table_offset_16K(gva),
> + first_guest_table_offset_64K(gva),
> +#endif
> + },
> + {
> + second_guest_table_offset_4K(gva),
> +#ifdef CONFIG_ARM_64
Ditto
> + second_guest_table_offset_16K(gva),
> + second_guest_table_offset_64K(gva),
> +#endif
> + },
> + {
> + third_guest_table_offset_4K(gva),
> +#ifdef CONFIG_ARM_64
Ditto
> + third_guest_table_offset_16K(gva),
> + third_guest_table_offset_64K(gva),
> +#endif
> + }
> + };
> +
> + static const paddr_t masks[4][3] = {
A similar remark applies to this array.
> + {
> + zeroeth_size(4K) - 1,
> + zeroeth_size(16K) - 1,
> + 0 /* There is no zeroeth lookup level with a 64K granule size. */
> + },
> + {
> + first_size(4K) - 1,
> + first_size(16K) - 1,
> + first_size(64K) - 1
> + },
> + {
> + second_size(4K) - 1,
> + second_size(16K) - 1,
> + second_size(64K) - 1
> + },
> + {
> + third_size(4K) - 1,
> + third_size(16K) - 1,
> + third_size(64K) - 1
> + }
> + };
> +
> + static const unsigned int grainsizes[3] = {
> + PAGE_SHIFT_4K,
> + PAGE_SHIFT_16K,
> + PAGE_SHIFT_64K
> + };
> +
> + t0_sz = (tcr >> TCR_T0SZ_SHIFT) & TCR_SZ_MASK;
> + t1_sz = (tcr >> TCR_T1SZ_SHIFT) & TCR_SZ_MASK;
> +
> + /* Get the MSB number of the GVA. */
> + topbit = get_top_bit(d, gva, tcr);
> +
> + if ( is_64bit_domain(d) )
> + {
> + /* Select the TTBR(0|1)_EL1 that will be used for address translation. */
> +
> + if ( (gva & BIT(topbit)) == 0 )
> + {
> + input_size = BITS_PER_DOUBLE_WORD - t0_sz;
I don't think using BITS_PER_DOUBLE_WORD is correct here. The
pseudo-code explicitly says 64 and does not mention anything about a
double word.
I agree that bit shifts should use defines when they relate to a
register. But I totally disagree with replacing every single number by a
macro when it is not possible to find a suitable name.
> +
> + /* Get TTBR0 and configured page granularity. */
> + disabled = get_ttbr_and_gran_64bit(&ttbr, &gran, tcr, TTBR0_ACTIVE);
> + }
> + else
> + {
> + input_size = BITS_PER_DOUBLE_WORD - t1_sz;
Ditto.
> +
> + /* Get TTBR1 and configured page granularity. */
> + disabled = get_ttbr_and_gran_64bit(&ttbr, &gran, tcr, TTBR1_ACTIVE);
> + }
> +
> + /*
> + * The current implementation supports intermediate physical address
> + * sizes (IPS) up to 48 bit.
> + *
> + * XXX: Determine whether the IPS_MAX_VAL is 48 or 52 in software.
> + */
> + if ( (input_size > TCR_EL1_IPS_48_BIT_VAL) ||
> + (input_size < TCR_EL1_IPS_MIN_VAL) )
> + return -EFAULT;
> + }
> + else
> + {
> + /* Granule size of AArch32 architectures is always 4K. */
> + gran = GRANULE_SIZE_INDEX_4K;
> +
> + /* Select the TTBR(0|1)_EL1 that will be used for address translation. */
> +
> + /*
> + * Check if the bits <31:32-t0_sz> of the GVA are set to 0 (DDI 0487B.a
> + * J1-5999). If so, TTBR0 shall be used for address translation.
> + */
> + mask = ((1ULL << BITS_PER_WORD) - 1) &
Same remark as for BITS_PER_DOUBLE_WORD, and it stands for all usages of
BITS_PER_WORD and BITS_PER_DOUBLE_WORD within this patch.
> + ~((1ULL << (BITS_PER_WORD - t0_sz)) - 1);
> +
> + if ( t0_sz == 0 || !(gva & mask) )
> + {
> + input_size = BITS_PER_WORD - t0_sz;
> +
> + /* Use TTBR0 for GVA to IPA translation. */
> + ttbr = READ_SYSREG64(TTBR0_EL1);
> +
> + /* If TCR.EPD0 is set, translations using TTBR0 are disabled. */
> + disabled = tcr & TCR_EPD0;
> + }
> +
> + /*
> + * Check if the bits <31:32-t1_sz> of the GVA are set to 1 (DDI 0487B.a
> + * J1-6000). If so, TTBR1 shall be used for address translation.
> + */
> + mask = ((1ULL << BITS_PER_WORD) - 1) &
> + ~((1ULL << (BITS_PER_WORD - t1_sz)) - 1);
Please use GENMASK here.
> +
> + if ( ((t1_sz == 0) && !ttbr) || (t1_sz && (gva & mask) == mask) )
> + {
> + input_size = BITS_PER_WORD - t1_sz;
> +
> + /* Use TTBR1 for GVA to IPA translation. */
> + ttbr = READ_SYSREG64(TTBR1_EL1);
> +
> + /* If TCR.EPD1 is set, translations using TTBR1 are disabled. */
> + disabled = tcr & TCR_EPD1;
> + }
> + }
> +
> + if ( disabled )
> + return -EFAULT;
> +
> + /*
> + * The starting level is the number of strides (grainsizes[gran] - 3)
> + * needed to consume the input address (DDI 0487B.a J1-5924).
> + */
> + level = 4 - DIV_ROUND_UP((input_size - grainsizes[gran]), (grainsizes[gran] - 3));
> +
> + /* Get the IPA output_size. */
> + output_size = get_ipa_output_size(d, tcr);
> +
> + /* Make sure the base address does not exceed its configured size. */
> + mask = ((1ULL << TCR_EL1_IPS_48_BIT_VAL) - 1) & ~((1ULL << output_size) - 1);
> + if ( output_size < TCR_EL1_IPS_48_BIT_VAL && (ttbr & mask) )
> + return -EFAULT;
> +
> + mask = ((1ULL << output_size) - 1);
> + page = get_page_from_gfn(d, paddr_to_pfn(ttbr & mask), NULL, P2M_ALLOC);
> + if ( !page )
> + return -EFAULT;
> +
> + table = __map_domain_page(page);
> +
> + for ( ; ; level++ )
> + {
> + pte = table[offsets[level][gran]];
> +
> + unmap_domain_page(table);
> + put_page(page);
> +
> + /* Make sure the base address does not exceed its configured size. */
> + mask = ((1ULL << TCR_EL1_IPS_48_BIT_VAL) - 1) &
> + ~((1ULL << output_size) - 1);
You use this construction in two places, with a check after it each
time. It would make sense to have a helper for it.
> +
> + if ( (output_size < TCR_EL1_IPS_48_BIT_VAL) &&
> + (pfn_to_paddr(pte.walk.base) & mask) )
> + return -EFAULT;
> +
> + /*
> + * If page granularity is 64K, make sure the address is aligned
> + * appropriately.
> + */
> + if ( (output_size < TCR_EL1_IPS_52_BIT_VAL) &&
> + (gran == GRANULE_SIZE_INDEX_64K) &&
> + (pte.walk.base & 0xf) )
> + return -EFAULT;
> +
> + /*
> + * Break if one of the following conditions are true:
> + *
> + * - We have found the PTE holding the IPA (level == 3).
> + * - The PTE is not valid.
> + * - If (level < 3) and the PTE is valid, we found a block descriptor.
> + */
> + if ( level == 3 || !lpae_valid(pte) || lpae_is_superpage(pte, level) )
> + break;
> +
> + /*
> + * Temporarily store permissions of the table descriptor as they are
> + * inherited by page table attributes (ARM DDI 0487B.a J1-5928).
> + */
> + xn_table |= pte.pt.xnt; /* Execute-Never */
> + ro_table |= pte.pt.apt & BIT(1); /* Read-Only */
> +
> + page = get_page_from_gfn(d, pte.walk.base, NULL, P2M_ALLOC);
> +
> + if ( !page )
> + return -EFAULT;
> +
> + table = __map_domain_page(page);
> + }
> +
> + /*
> + * According to ARM DDI 0487B.a J1-5927, we return an error if the found
> + * PTE is invalid or holds a reserved entry (PTE<1:0> == x0) or if the PTE
> + * maps a memory block at level 3 (PTE<1:0> == 01).
> + */
> + if ( !lpae_valid(pte) || ((level == 3) && lpae_mapping(pte)) )
If you look at the comment on top of lpae_mapping, it should only be
used for L0..L2 ptes. So why are you using it on L3 ptes?
> + return -EFAULT;
> +
> + *ipa = pfn_to_paddr(pte.walk.base) | (gva & masks[level][gran]);
> +
> + /*
> + * Set permissions so that the caller can check the flags by herself. Note
> + * that stage 1 translations also inherit attributes from the tables
> + * (ARM DDI 0487B.a J1-5928).
> + */
> + if ( !pte.pt.ro && !ro_table )
> + *perms = GV2M_WRITE;
> + if ( !pte.pt.xn && !xn_table )
> + *perms |= GV2M_EXEC;
> +
> + return 0;
> }
>
> int guest_walk_tables(const struct vcpu *v, vaddr_t gva,
>
Cheers,
--
Julien Grall
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel