From: Julien Grall <julien.grall@arm.com>
To: Sergej Proskurin <proskurin@sec.in.tum.de>,
	xen-devel@lists.xenproject.org
Cc: Stefano Stabellini <sstabellini@kernel.org>
Subject: Re: [PATCH v4 7/9] arm/mem_access: Add long-descriptor based gpt
Date: Thu, 22 Jun 2017 13:12:31 +0100	[thread overview]
Message-ID: <be0a5086-a72d-afb8-4ae2-5cde3db01477@arm.com> (raw)
In-Reply-To: <20170620203332.17833-8-proskurin@sec.in.tum.de>

Hi Sergej,

On 20/06/17 21:33, Sergej Proskurin wrote:
> This commit adds functionality to walk the guest's page tables using the
> long-descriptor translation table format for both ARMv7 and ARMv8.
> Similar to the hardware architecture, the implementation supports
> different page granularities (4K, 16K, and 64K). The implementation is
> based on ARM DDI 0487B.a J1-5922, J1-5999, and ARM DDI 0406C.b B3-1510.
>
> Note that the current implementation lacks support for Large VA/PA on
> ARMv8.2 architectures (LVA/LPA, 52-bit virtual and physical address
> sizes). The associated location in the code is marked appropriately.
>
> Signed-off-by: Sergej Proskurin <proskurin@sec.in.tum.de>
> ---
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Julien Grall <julien.grall@arm.com>
> ---
> v2: Use TCR_SZ_MASK instead of TTBCR_SZ_MASK for ARM 32-bit guests using
>     the long-descriptor translation table format.
>
>     Cosmetic fixes.
>
> v3: Move the implementation to ./xen/arch/arm/guest_copy.c.
>
>     Remove the array strides and declare the array grainsizes as static
>     const instead of just const to reduce the function stack overhead.
>
>     Move parts of the function guest_walk_ld into the static functions
>     get_ttbr_and_gran_64bit and get_top_bit to reduce complexity.
>
>     Use the macro BIT(x) instead of (1UL << x).
>
>     Add more comments && Cosmetic fixes.
>
> v4: Move functionality responsible for determining the configured IPA
>     output-size into a separate function get_ipa_output_size. In this
>     function, we remove the previously used switch statement, which was
>     responsible for distinguishing between different IPA output-sizes.
>     Instead, we retrieve the information from the introduced ipa_sizes
>     array.
>
>     Remove the defines GRANULE_SIZE_INDEX_* and TTBR0_VALID from
>     guest_walk.h. Instead, introduce the enums granule_size_index
>     active_ttbr directly inside of guest_walk.c so that the associated
>     fields don't get exported.
>
>     Adapt the function to the new parameter of type "struct vcpu *".
>
>     Remove support for 52bit IPA output-sizes entirely from this commit.
>
>     Use lpae_* helpers instead of p2m_* helpers.
>
>     Cosmetic fixes & Additional comments.
> ---
>  xen/arch/arm/guest_walk.c | 385 +++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 383 insertions(+), 2 deletions(-)
>
> diff --git a/xen/arch/arm/guest_walk.c b/xen/arch/arm/guest_walk.c
> index b8bb553a6e..c37c595157 100644
> --- a/xen/arch/arm/guest_walk.c
> +++ b/xen/arch/arm/guest_walk.c
> @@ -16,6 +16,8 @@
>   */
>
>  #include <xen/sched.h>
> +#include <xen/domain_page.h>
> +#include <asm/guest_walk.h>

Please order the includes in alphabetical order.
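Something like this (just a sketch, with the xen/ headers sorted alphabetically and the asm/ header kept last):

#include <xen/domain_page.h>
#include <xen/sched.h>
#include <asm/guest_walk.h>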

>
>  /*
>   * The function guest_walk_sd translates a given GVA into an IPA using the
> @@ -33,6 +35,142 @@ static int guest_walk_sd(const struct vcpu *v,
>  }
>
>  /*
> + * Get the IPA output_size (configured in TCR_EL1) that shall be used for the
> + * long-descriptor based translation table walk.
> + */
> +static unsigned int get_ipa_output_size(struct domain *d, register_t tcr)
> +{
> +    unsigned int output_size;
> +    uint64_t ips;
> +
> +    static const uint64_t ipa_sizes[7] = {
> +        TCR_EL1_IPS_32_BIT_VAL,
> +        TCR_EL1_IPS_36_BIT_VAL,
> +        TCR_EL1_IPS_40_BIT_VAL,
> +        TCR_EL1_IPS_42_BIT_VAL,
> +        TCR_EL1_IPS_44_BIT_VAL,
> +        TCR_EL1_IPS_48_BIT_VAL,
> +        TCR_EL1_IPS_52_BIT_VAL
> +    };
> +
> +    if ( is_64bit_domain(d) )
> +    {
> +        /* Get the intermediate physical address size. */
> +        ips = (tcr & TCR_EL1_IPS_MASK) >> TCR_EL1_IPS_SHIFT;
> +
> +        /*
> +         * Return an error on reserved IPA output-sizes and if the IPA
> +         * output-size is 52bit.
> +         *
> +         * XXX: 52 bit output_size is not supported yet.
> +         */
> +        if ( ips > TCR_EL1_IPS_48_BIT )
> +            return -EFAULT;
> +
> +        output_size = ipa_sizes[ips];
> +    }
> +    else
> +        output_size = TCR_EL1_IPS_40_BIT_VAL;
> +
> +    return output_size;
> +}
> +
> +/* Normalized page granule size indices. */
> +enum granule_size_index {
> +    GRANULE_SIZE_INDEX_4K,
> +    GRANULE_SIZE_INDEX_16K,
> +    GRANULE_SIZE_INDEX_64K
> +};
> +
> +/* Represent whether TTBR0 or TTBR1 is active. */
> +enum active_ttbr {
> +    TTBR0_ACTIVE,
> +    TTBR1_ACTIVE
> +};
> +
> +/*
> + * Select the TTBR(0|1)_EL1 that will be used for address translation using the
> + * long-descriptor translation table format and return the page granularity
> + * that is used by the selected TTBR.
> + */
> +static bool get_ttbr_and_gran_64bit(uint64_t *ttbr, unsigned int *gran,
> +                                    register_t tcr, enum active_ttbr ttbrx)
> +{
> +    bool disabled;
> +
> +    if ( ttbrx == TTBR0_ACTIVE )
> +    {
> +        /* Normalize granule size. */
> +        switch ( tcr & TCR_TG0_MASK )
> +        {
> +        case TCR_TG0_16K:
> +            *gran = GRANULE_SIZE_INDEX_16K;
> +            break;
> +        case TCR_TG0_64K:
> +            *gran = GRANULE_SIZE_INDEX_64K;
> +            break;
> +        default:

Please explain why you use 4K by default.
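Something along these lines would do (the wording is only a suggestion):

        default:
            /*
             * Any other encoding is either the 4K granule or a reserved
             * value, so fall back to a 4K granule size in both cases.
             */
            *gran = GRANULE_SIZE_INDEX_4K;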

> +            *gran = GRANULE_SIZE_INDEX_4K;
> +        }
> +
> +        /* Use TTBR0 for GVA to IPA translation. */
> +        *ttbr = READ_SYSREG64(TTBR0_EL1);
> +
> +        /* If TCR.EPD0 is set, translations using TTBR0 are disabled. */
> +        disabled = tcr & TCR_EPD0;
> +    }
> +    else
> +    {
> +        /* Normalize granule size. */
> +        switch ( tcr & TCR_EL1_TG1_MASK )
> +        {
> +        case TCR_EL1_TG1_16K:
> +            *gran = GRANULE_SIZE_INDEX_16K;
> +            break;
> +        case TCR_EL1_TG1_64K:
> +            *gran = GRANULE_SIZE_INDEX_64K;
> +            break;
> +        default:
> +            *gran = GRANULE_SIZE_INDEX_4K;

Please explain why you use 4K by default.

> +        }
> +
> +        /* Use TTBR1 for GVA to IPA translation. */
> +        *ttbr = READ_SYSREG64(TTBR1_EL1);
> +
> +        /* If TCR.EPD1 is set, translations using TTBR1 are disabled. */
> +        disabled = tcr & TCR_EPD1;
> +    }
> +
> +    return disabled;
> +}
> +
> +/*
> + * Get the MSB number of the GVA, according to "AddrTop" pseudocode
> + * implementation in ARM DDI 0487B.a J1-6066.
> + */
> +static unsigned int get_top_bit(struct domain *d, vaddr_t gva, register_t tcr)
> +{
> +    unsigned int topbit;
> +
> +    /*
> +     * IF EL1 is using AArch64 then addresses from EL0 using AArch32 are
> +     * zero-extended to 64 bits (ARM DDI 0487B.a J1-6066).
> +     */
> +    if ( is_32bit_domain(d) )
> +        topbit = 31;
> +    else if ( is_64bit_domain(d) )
> +    {
> +        if ( ((gva & BIT(55)) && (tcr & TCR_EL1_TBI1)) ||
> +             (!(gva & BIT(55)) && (tcr & TCR_EL1_TBI0)) )
> +            topbit = 55;
> +        else
> +            topbit = 63;
> +    }
> +
> +    return topbit;
> +}
> +
> +/*
>   * The function guest_walk_ld translates a given GVA into an IPA using the
>   * long-descriptor translation table format in software. This function assumes
>   * that the domain is running on the currently active vCPU. To walk the guest's
> @@ -43,8 +181,251 @@ static int guest_walk_ld(const struct vcpu *v,
>                           vaddr_t gva, paddr_t *ipa,
>                           unsigned int *perms)
>  {
> -    /* Not implemented yet. */
> -    return -EFAULT;
> +    bool disabled = true;
> +    bool ro_table = false, xn_table = false;
> +    unsigned int t0_sz, t1_sz;
> +    unsigned int level, gran;
> +    unsigned int topbit = 0, input_size = 0, output_size;
> +    uint64_t ttbr = 0;
> +    paddr_t mask;
> +    lpae_t pte, *table;
> +    struct page_info *page;
> +    register_t tcr = READ_SYSREG(TCR_EL1);
> +    struct domain *d = v->domain;
> +
> +    const vaddr_t offsets[4][3] = {

It would have been more readable if you had inverted the indices (i.e. 
granularity then level). It would also allow you to use a macro:

#define OFFSET(gva, gran)                       \
{                                               \
    zeroeth_guest_table_offset_##gran(gva),     \
    first_guest_table_offset_##gran(gva),       \
    second_guest_table_offset_##gran(gva),      \
    third_guest_table_offset_##gran(gva),       \
}

const vaddr_t offsets[3][4] = {
	OFFSET(gva, 4K),
	OFFSET(gva, 16K),
	OFFSET(gva, 64K),
};

#undef OFFSET

> +        {
> +#ifdef CONFIG_ARM_64

Again, I find those #ifdefs pointless, all the more so as you don't use 
them for the masks below... and on AArch64 Xen you will always compute 
them for 32-bit domains anyway.

> +            zeroeth_guest_table_offset_4K(gva),
> +            zeroeth_guest_table_offset_16K(gva),
> +            0, /* There is no zeroeth lookup level with a 64K granule size. */

Please use zeroeth_guest_table_offset_64K(gva) as it already does the 
job for you.

> +#endif
> +        },
> +        {
> +            first_guest_table_offset_4K(gva),
> +#ifdef CONFIG_ARM_64

Ditto

> +            first_guest_table_offset_16K(gva),
> +            first_guest_table_offset_64K(gva),
> +#endif
> +        },
> +        {
> +            second_guest_table_offset_4K(gva),
> +#ifdef CONFIG_ARM_64

Ditto

> +            second_guest_table_offset_16K(gva),
> +            second_guest_table_offset_64K(gva),
> +#endif
> +        },
> +        {
> +            third_guest_table_offset_4K(gva),
> +#ifdef CONFIG_ARM_64

Ditto

> +            third_guest_table_offset_16K(gva),
> +            third_guest_table_offset_64K(gva),
> +#endif
> +        }
> +    };
> +
> +    static const paddr_t masks[4][3] = {

A similar remark applies to this array.
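For instance (a sketch, assuming zeroeth_size(64K) expands to something 
sensible; otherwise keep the explicit 0 for that slot):

#define MASK(gran)              \
{                               \
    zeroeth_size(gran) - 1,     \
    first_size(gran) - 1,       \
    second_size(gran) - 1,      \
    third_size(gran) - 1,       \
}

static const paddr_t masks[3][4] = {
    MASK(4K),
    MASK(16K),
    MASK(64K),
};

#undef MASK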

> +        {
> +            zeroeth_size(4K) - 1,
> +            zeroeth_size(16K) - 1,
> +            0 /* There is no zeroeth lookup level with a 64K granule size. */
> +        },
> +        {
> +            first_size(4K) - 1,
> +            first_size(16K) - 1,
> +            first_size(64K) - 1
> +        },
> +        {
> +            second_size(4K) - 1,
> +            second_size(16K) - 1,
> +            second_size(64K) - 1
> +        },
> +        {
> +            third_size(4K) - 1,
> +            third_size(16K) - 1,
> +            third_size(64K) - 1
> +        }
> +    };
> +
> +    static const unsigned int grainsizes[3] = {
> +        PAGE_SHIFT_4K,
> +        PAGE_SHIFT_16K,
> +        PAGE_SHIFT_64K
> +    };
> +
> +    t0_sz = (tcr >> TCR_T0SZ_SHIFT) & TCR_SZ_MASK;
> +    t1_sz = (tcr >> TCR_T1SZ_SHIFT) & TCR_SZ_MASK;
> +
> +    /* Get the MSB number of the GVA. */
> +    topbit = get_top_bit(d, gva, tcr);
> +
> +    if ( is_64bit_domain(d) )
> +    {
> +        /* Select the TTBR(0|1)_EL1 that will be used for address translation. */
> +
> +        if ( (gva & BIT(topbit)) == 0 )
> +        {
> +            input_size = BITS_PER_DOUBLE_WORD - t0_sz;

I don't think using BITS_PER_DOUBLE_WORD is correct here. The 
pseudo-code explicitly says 64 and does not mention anything about a 
double word.

I agree that bit shifts should use a define when they relate to a 
register, but I totally disagree with replacing every single number by a 
macro when it is not possible to find a suitable name.
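I.e. I would simply write it as in the pseudo-code, e.g.:

            /* On AArch64 the input address size is 64 - T0SZ. */
            input_size = 64 - t0_sz;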

> +
> +            /* Get TTBR0 and configured page granularity. */
> +            disabled = get_ttbr_and_gran_64bit(&ttbr, &gran, tcr, TTBR0_ACTIVE);
> +        }
> +        else
> +        {
> +            input_size = BITS_PER_DOUBLE_WORD - t1_sz;

Ditto.

> +
> +            /* Get TTBR1 and configured page granularity. */
> +            disabled = get_ttbr_and_gran_64bit(&ttbr, &gran, tcr, TTBR1_ACTIVE);
> +        }
> +
> +        /*
> +         * The current implementation supports intermediate physical address
> +         * sizes (IPS) up to 48 bit.
> +         *
> +         * XXX: Determine whether the IPS_MAX_VAL is 48 or 52 in software.
> +         */
> +        if ( (input_size > TCR_EL1_IPS_48_BIT_VAL) ||
> +             (input_size < TCR_EL1_IPS_MIN_VAL) )
> +            return -EFAULT;
> +    }
> +    else
> +    {
> +        /* Granule size of AArch32 architectures is always 4K. */
> +        gran = GRANULE_SIZE_INDEX_4K;
> +
> +        /* Select the TTBR(0|1)_EL1 that will be used for address translation. */
> +
> +        /*
> +         * Check if the bits <31:32-t0_sz> of the GVA are set to 0 (DDI 0487B.a
> +         * J1-5999). If so, TTBR0 shall be used for address translation.
> +         */
> +        mask = ((1ULL << BITS_PER_WORD) - 1) &

Same remark as for BITS_PER_DOUBLE_WORD, and it stands for all uses of 
BITS_PER_WORD and BITS_PER_DOUBLE_WORD within this patch.

> +               ~((1ULL << (BITS_PER_WORD - t0_sz)) - 1);
> +
> +        if ( t0_sz == 0 || !(gva & mask) )
> +        {
> +            input_size = BITS_PER_WORD - t0_sz;
> +
> +            /* Use TTBR0 for GVA to IPA translation. */
> +            ttbr = READ_SYSREG64(TTBR0_EL1);
> +
> +            /* If TCR.EPD0 is set, translations using TTBR0 are disabled. */
> +            disabled = tcr & TCR_EPD0;
> +        }
> +
> +        /*
> +         * Check if the bits <31:32-t1_sz> of the GVA are set to 1 (DDI 0487B.a
> +         * J1-6000). If so, TTBR1 shall be used for address translation.
> +         */
> +        mask = ((1ULL << BITS_PER_WORD) - 1) &
> +               ~((1ULL << (BITS_PER_WORD - t1_sz)) - 1);

Please use GENMASK here.
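Something like (just a sketch; the t1_sz == 0 case may want separate 
handling so the shift stays in range on a 32-bit build):

        mask = GENMASK(31, 32 - t1_sz);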

> +
> +        if ( ((t1_sz == 0) && !ttbr) || (t1_sz && (gva & mask) == mask) )
> +        {
> +            input_size = BITS_PER_WORD - t1_sz;
> +
> +            /* Use TTBR1 for GVA to IPA translation. */
> +            ttbr = READ_SYSREG64(TTBR1_EL1);
> +
> +            /* If TCR.EPD1 is set, translations using TTBR1 are disabled. */
> +            disabled = tcr & TCR_EPD1;
> +        }
> +    }
> +
> +    if ( disabled )
> +        return -EFAULT;
> +
> +    /*
> +     * The starting level is the number of strides (grainsizes[gran] - 3)
> +     * needed to consume the input address (DDI 0487B.a J1-5924).
> +     */
> +    level = 4 - DIV_ROUND_UP((input_size - grainsizes[gran]), (grainsizes[gran] - 3));
> +
> +    /* Get the IPA output_size. */
> +    output_size = get_ipa_output_size(d, tcr);
> +
> +    /* Make sure the base address does not exceed its configured size. */
> +    mask = ((1ULL << TCR_EL1_IPS_48_BIT_VAL) - 1) & ~((1ULL << output_size) - 1);
> +    if ( output_size < TCR_EL1_IPS_48_BIT_VAL && (ttbr & mask) )
> +        return -EFAULT;
> +
> +    mask = ((1ULL << output_size) - 1);
> +    page = get_page_from_gfn(d, paddr_to_pfn(ttbr & mask), NULL, P2M_ALLOC);
> +    if ( !page )
> +        return -EFAULT;
> +
> +    table = __map_domain_page(page);
> +
> +    for ( ; ; level++ )
> +    {
> +        pte = table[offsets[level][gran]];
> +
> +        unmap_domain_page(table);
> +        put_page(page);
> +
> +        /* Make sure the base address does not exceed its configured size. */
> +        mask = ((1ULL << TCR_EL1_IPS_48_BIT_VAL) - 1) &
> +               ~((1ULL << output_size) - 1);

You use this construction in two places with a check right after. It 
would make sense to have a helper for it.
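A possible shape for such a helper (the name is just a placeholder):

/*
 * Check whether a base address (TTBR or descriptor output address) fits
 * within the configured IPA output-size.
 */
static bool check_base_size(unsigned int output_size, uint64_t base)
{
    paddr_t mask = ((1ULL << TCR_EL1_IPS_48_BIT_VAL) - 1) &
                   ~((1ULL << output_size) - 1);

    if ( (output_size < TCR_EL1_IPS_48_BIT_VAL) && (base & mask) )
        return false;

    return true;
}

The two call sites would then boil down to e.g. 
"if ( !check_base_size(output_size, ttbr) ) return -EFAULT;".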

> +
> +        if ( (output_size < TCR_EL1_IPS_48_BIT_VAL) &&
> +             (pfn_to_paddr(pte.walk.base) & mask) )
> +            return -EFAULT;
> +
> +        /*
> +         * If page granularity is 64K, make sure the address is aligned
> +         * appropriately.
> +         */
> +        if ( (output_size < TCR_EL1_IPS_52_BIT_VAL) &&
> +             (gran == GRANULE_SIZE_INDEX_64K) &&
> +             (pte.walk.base & 0xf) )
> +            return -EFAULT;
> +
> +        /*
> +         * Break if one of the following conditions are true:
> +         *
> +         * - We have found the PTE holding the IPA (level == 3).
> +         * - The PTE is not valid.
> +         * - If (level < 3) and the PTE is valid, we found a block descriptor.
> +         */
> +        if ( level == 3 || !lpae_valid(pte) || lpae_is_superpage(pte, level) )
> +            break;
> +
> +        /*
> +         * Temporarily store permissions of the table descriptor as they are
> +         * inherited by page table attributes (ARM DDI 0487B.a J1-5928).
> +         */
> +        xn_table |= pte.pt.xnt;             /* Execute-Never */
> +        ro_table |= pte.pt.apt & BIT(1);    /* Read-Only */
> +
> +        page = get_page_from_gfn(d, pte.walk.base, NULL, P2M_ALLOC);
> +
> +        if ( !page )
> +            return -EFAULT;
> +
> +        table = __map_domain_page(page);
> +    }
> +
> +    /*
> +     * According to ARM DDI 0487B.a J1-5927, we return an error if the found
> +     * PTE is invalid or holds a reserved entry (PTE<1:0> == x0) or if the PTE
> +     * maps a memory block at level 3 (PTE<1:0> == 01).
> +     */
> +    if ( !lpae_valid(pte) || ((level == 3) && lpae_mapping(pte)) )

If you look at the comment on top of lpae_mapping, it should only be 
used for L0..L2 ptes. So why are you using it on L3 ptes?

> +        return -EFAULT;
> +
> +    *ipa = pfn_to_paddr(pte.walk.base) | (gva & masks[level][gran]);
> +
> +    /*
> +     * Set permissions so that the caller can check the flags by herself. Note
> +     * that stage 1 translations also inherit attributes from the tables
> +     * (ARM DDI 0487B.a J1-5928).
> +     */
> +    if ( !pte.pt.ro && !ro_table )
> +        *perms = GV2M_WRITE;
> +    if ( !pte.pt.xn && !xn_table )
> +        *perms |= GV2M_EXEC;
> +
> +    return 0;
>  }
>
>  int guest_walk_tables(const struct vcpu *v, vaddr_t gva,
>

Cheers,

-- 
Julien Grall

