From: cyril@ti.com (Cyril Chemparathy)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v2 05/22] ARM: LPAE: support 64-bit virt_to_phys patching
Date: Sun, 12 Aug 2012 19:27:49 -0400 [thread overview]
Message-ID: <50283BF5.1050908@ti.com> (raw)
In-Reply-To: <alpine.LFD.2.02.1208112316560.5231@xanadu.home>
On 08/11/12 23:39, Nicolas Pitre wrote:
> On Fri, 10 Aug 2012, Cyril Chemparathy wrote:
>
>> This patch adds support for 64-bit physical addresses in virt_to_phys()
>> patching. This does not do real 64-bit add/sub, but instead patches in the
>> upper 32-bits of the phys_offset directly into the output of virt_to_phys.
>>
>> There is no corresponding change on the phys_to_virt() side, because
>> computations on the upper 32-bits would be discarded anyway.
>>
>> Signed-off-by: Cyril Chemparathy <cyril@ti.com>
>> ---
>> arch/arm/include/asm/memory.h | 22 ++++++++++++++++++----
>> arch/arm/kernel/head.S | 4 ++++
>> arch/arm/kernel/setup.c | 2 +-
>> 3 files changed, 23 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
>> index 81e1714..dc5fbf3 100644
>> --- a/arch/arm/include/asm/memory.h
>> +++ b/arch/arm/include/asm/memory.h
>> @@ -154,14 +154,28 @@
>> #ifdef CONFIG_ARM_PATCH_PHYS_VIRT
>>
>> extern unsigned long __pv_offset;
>> -extern unsigned long __pv_phys_offset;
>> +extern phys_addr_t __pv_phys_offset;
>> #define PHYS_OFFSET __virt_to_phys(PAGE_OFFSET)
>>
>> static inline phys_addr_t __virt_to_phys(unsigned long x)
>> {
>> - unsigned long t;
>> - early_patch_imm8("add", t, x, __pv_offset, 0);
>> - return t;
>> + unsigned long tlo, thi;
>> +
>> + early_patch_imm8("add", tlo, x, __pv_offset, 0);
>> +
>> +#ifdef CONFIG_ARM_LPAE
>> + /*
>> + * On LPAE, we do not _need_ to do 64-bit arithmetic because the high
>> + * order 32 bits are never changed by the phys-virt offset. We simply
>> + * patch in the high order physical address bits instead.
>> + */
>> +#ifdef __ARMEB__
>> + early_patch_imm8_mov("mov", thi, __pv_phys_offset, 0);
>> +#else
>> + early_patch_imm8_mov("mov", thi, __pv_phys_offset, 4);
>> +#endif
>> +#endif
>> + return (u64)tlo | (u64)thi << 32;
>> }
>
> Hmmm... I'm afraid this is going to be suboptimal when LPAE is not
> selected.
>
I understand your concern, but I don't see the sub-optimality. I tested
the following function with GCC versions 4.3.3 and 4.7. This is after
the other changes that I mentioned in my previous email, but with the
__virt_to_phys() code itself unchanged:
phys_addr_t ____test_virt_to_phys(unsigned long addr)
{
return __virt_to_phys(addr);
}
The resulting code in both cases looks like:
<____test_virt_to_phys>:
b c04f0528
bx lr
[the branch of course gets patched to an add]
> First of all, you do not need to cast tlo to a u64 in the return value.
>
True enough.
> Then, I'm not sure if the compiler is smart enough to see that the
> returned value is a phys_addr_t which can be a u32, and in this case the
> (u64)thi << 32 is going to be truncated right away, and therefore there
> is no point in emiting the corresponding instructions.
>
In this case, it appears to be smart enough. However, I agree that
relying on compiler smarts is probably not the best thing for us to do.
> Furthermore, if LPAE is not defined, then thi never gets initialized and
> should produce a warning. Did you test compilation of the code with LPAE
> turned off?
>
Sure. One of our test platforms is non-LPAE. The compiler does not
produce warnings on this, and this is consistent across both compiler
versions.
> I'd prefer something like this where more stuff is validated by the
> compiler:
>
> static inline phys_addr_t __virt_to_phys(unsigned long x)
> {
> unsigned long tlo, thi;
> phys_addr_t ret;
>
> early_patch_imm8("add", tlo, x, __pv_offset, 0);
> ret = tlo;
>
> if (sizeof(phys_addr_t) > 4) {
> #ifdef __ARMEB__
> early_patch_imm8_mov("mov", thi, __pv_phys_offset, 0);
> #else
> early_patch_imm8_mov("mov", thi, __pv_phys_offset, 4);
> #endif
> ret |= ((u64)thi) << 32;
> }
>
> return ret);
> }
>
> This should let the compiler optimize things whether LPAE is enabledor
> not while validating both cases.
>
Agreed on the principal, but more below...
I've meanwhile been chasing down another problem - the code generated
for the LPAE case. The original code resulted in the following:
<____test_virt_to_phys>:
mov r2, #0
b c01bc800 # patch: add r1, r0, __pv_offset
b c01bc810 # patch: mov r0, __phys_offset_high
orr r2, r2, r1
mov r3, r0
mov r1, r3
mov r0, r2
bx lr
Yikes! This code does a bunch of futile register shuffling and a
pointless or, all in the name of generating the result in a 64-bit
register-pair from the 32-bit halves.
In order to get past this, I tried adding operand qualifiers (R = upper
32-bits, Q = lower 32-bits) in the patch macros, in the hope that
treating these as native 64-bit register pairs would eliminate the need
to shuffle them around after the inline assembly blocks. This then
allows us to implement __virt_to_phys() as follows:
static inline phys_addr_t __virt_to_phys(unsigned long x)
{
phys_addr_t t;
if (sizeof(t) == 4) {
t = x;
early_patch_imm8("add", t, "", t, __pv_offset, 0);
return t;
}
/*
* On LPAE, we do not _need_ to do 64-bit arithmetic because
* the high order 32 bits are never changed by the phys-virt
* offset. We simply patch in the high order physical address
* bits instead.
*
* Note: the mov _must_ be first here. From the compiler's
* perspective, this is the initializer for the variable. The
* mov itself initializes only the upper half. The subsequent
* add treats t as a read/write operand and initializes the
* lower half.
*/
#ifdef __ARMEB__
early_patch_imm8_mov("mov", t, "R", __pv_phys_offset, 0);
#else
early_patch_imm8_mov("mov", t, "R", __pv_phys_offset, 4);
#endif
early_patch_imm8("add", t, "Q", x, __pv_offset, 0);
return t;
}
With this, we get significantly better looking generated code:
<____test_virt_to_phys>:
b c01d519c # patch: mov r3, __phys_offset_high
b c01d51ac # patch: add r2, r0, __phys_offset_high
mov r0, r2
mov r1, r3
bx lr
This is about as far along as I've been able to proceed. I still
haven't figured out a way to get it to patch in place without an extra
register pair.
Overall, this is still a bit too kludgy for my liking. In particular,
the read/write operand forces add/sub/... users to initialize the result
variable. I am currently leaning towards adding native support for
64-bit operations in the runtime patch code, instead of having to hack
around it with 32-bit primitives. Better ideas, any one?
Thanks
-- Cyril.
WARNING: multiple messages have this Message-ID (diff)
From: Cyril Chemparathy <cyril@ti.com>
To: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: <linux-kernel@vger.kernel.org>,
<linux-arm-kernel@lists.infradead.org>, <arnd@arndb.de>,
<catalin.marinas@arm.com>, <grant.likely@secretlab.ca>,
<linux@arm.linux.org.uk>, <will.deacon@arm.com>
Subject: Re: [PATCH v2 05/22] ARM: LPAE: support 64-bit virt_to_phys patching
Date: Sun, 12 Aug 2012 19:27:49 -0400 [thread overview]
Message-ID: <50283BF5.1050908@ti.com> (raw)
In-Reply-To: <alpine.LFD.2.02.1208112316560.5231@xanadu.home>
On 08/11/12 23:39, Nicolas Pitre wrote:
> On Fri, 10 Aug 2012, Cyril Chemparathy wrote:
>
>> This patch adds support for 64-bit physical addresses in virt_to_phys()
>> patching. This does not do real 64-bit add/sub, but instead patches in the
>> upper 32-bits of the phys_offset directly into the output of virt_to_phys.
>>
>> There is no corresponding change on the phys_to_virt() side, because
>> computations on the upper 32-bits would be discarded anyway.
>>
>> Signed-off-by: Cyril Chemparathy <cyril@ti.com>
>> ---
>> arch/arm/include/asm/memory.h | 22 ++++++++++++++++++----
>> arch/arm/kernel/head.S | 4 ++++
>> arch/arm/kernel/setup.c | 2 +-
>> 3 files changed, 23 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
>> index 81e1714..dc5fbf3 100644
>> --- a/arch/arm/include/asm/memory.h
>> +++ b/arch/arm/include/asm/memory.h
>> @@ -154,14 +154,28 @@
>> #ifdef CONFIG_ARM_PATCH_PHYS_VIRT
>>
>> extern unsigned long __pv_offset;
>> -extern unsigned long __pv_phys_offset;
>> +extern phys_addr_t __pv_phys_offset;
>> #define PHYS_OFFSET __virt_to_phys(PAGE_OFFSET)
>>
>> static inline phys_addr_t __virt_to_phys(unsigned long x)
>> {
>> - unsigned long t;
>> - early_patch_imm8("add", t, x, __pv_offset, 0);
>> - return t;
>> + unsigned long tlo, thi;
>> +
>> + early_patch_imm8("add", tlo, x, __pv_offset, 0);
>> +
>> +#ifdef CONFIG_ARM_LPAE
>> + /*
>> + * On LPAE, we do not _need_ to do 64-bit arithmetic because the high
>> + * order 32 bits are never changed by the phys-virt offset. We simply
>> + * patch in the high order physical address bits instead.
>> + */
>> +#ifdef __ARMEB__
>> + early_patch_imm8_mov("mov", thi, __pv_phys_offset, 0);
>> +#else
>> + early_patch_imm8_mov("mov", thi, __pv_phys_offset, 4);
>> +#endif
>> +#endif
>> + return (u64)tlo | (u64)thi << 32;
>> }
>
> Hmmm... I'm afraid this is going to be suboptimal when LPAE is not
> selected.
>
I understand your concern, but I don't see the sub-optimality. I tested
the following function with GCC versions 4.3.3 and 4.7. This is after
the other changes that I mentioned in my previous email, but with the
__virt_to_phys() code itself unchanged:
phys_addr_t ____test_virt_to_phys(unsigned long addr)
{
return __virt_to_phys(addr);
}
The resulting code in both cases looks like:
<____test_virt_to_phys>:
b c04f0528
bx lr
[the branch of course gets patched to an add]
> First of all, you do not need to cast tlo to a u64 in the return value.
>
True enough.
> Then, I'm not sure if the compiler is smart enough to see that the
> returned value is a phys_addr_t which can be a u32, and in this case the
> (u64)thi << 32 is going to be truncated right away, and therefore there
> is no point in emiting the corresponding instructions.
>
In this case, it appears to be smart enough. However, I agree that
relying on compiler smarts is probably not the best thing for us to do.
> Furthermore, if LPAE is not defined, then thi never gets initialized and
> should produce a warning. Did you test compilation of the code with LPAE
> turned off?
>
Sure. One of our test platforms is non-LPAE. The compiler does not
produce warnings on this, and this is consistent across both compiler
versions.
> I'd prefer something like this where more stuff is validated by the
> compiler:
>
> static inline phys_addr_t __virt_to_phys(unsigned long x)
> {
> unsigned long tlo, thi;
> phys_addr_t ret;
>
> early_patch_imm8("add", tlo, x, __pv_offset, 0);
> ret = tlo;
>
> if (sizeof(phys_addr_t) > 4) {
> #ifdef __ARMEB__
> early_patch_imm8_mov("mov", thi, __pv_phys_offset, 0);
> #else
> early_patch_imm8_mov("mov", thi, __pv_phys_offset, 4);
> #endif
> ret |= ((u64)thi) << 32;
> }
>
> return ret);
> }
>
> This should let the compiler optimize things whether LPAE is enabledor
> not while validating both cases.
>
Agreed on the principal, but more below...
I've meanwhile been chasing down another problem - the code generated
for the LPAE case. The original code resulted in the following:
<____test_virt_to_phys>:
mov r2, #0
b c01bc800 # patch: add r1, r0, __pv_offset
b c01bc810 # patch: mov r0, __phys_offset_high
orr r2, r2, r1
mov r3, r0
mov r1, r3
mov r0, r2
bx lr
Yikes! This code does a bunch of futile register shuffling and a
pointless or, all in the name of generating the result in a 64-bit
register-pair from the 32-bit halves.
In order to get past this, I tried adding operand qualifiers (R = upper
32-bits, Q = lower 32-bits) in the patch macros, in the hope that
treating these as native 64-bit register pairs would eliminate the need
to shuffle them around after the inline assembly blocks. This then
allows us to implement __virt_to_phys() as follows:
static inline phys_addr_t __virt_to_phys(unsigned long x)
{
phys_addr_t t;
if (sizeof(t) == 4) {
t = x;
early_patch_imm8("add", t, "", t, __pv_offset, 0);
return t;
}
/*
* On LPAE, we do not _need_ to do 64-bit arithmetic because
* the high order 32 bits are never changed by the phys-virt
* offset. We simply patch in the high order physical address
* bits instead.
*
* Note: the mov _must_ be first here. From the compiler's
* perspective, this is the initializer for the variable. The
* mov itself initializes only the upper half. The subsequent
* add treats t as a read/write operand and initializes the
* lower half.
*/
#ifdef __ARMEB__
early_patch_imm8_mov("mov", t, "R", __pv_phys_offset, 0);
#else
early_patch_imm8_mov("mov", t, "R", __pv_phys_offset, 4);
#endif
early_patch_imm8("add", t, "Q", x, __pv_offset, 0);
return t;
}
With this, we get significantly better looking generated code:
<____test_virt_to_phys>:
b c01d519c # patch: mov r3, __phys_offset_high
b c01d51ac # patch: add r2, r0, __phys_offset_high
mov r0, r2
mov r1, r3
bx lr
This is about as far along as I've been able to proceed. I still
haven't figured out a way to get it to patch in place without an extra
register pair.
Overall, this is still a bit too kludgy for my liking. In particular,
the read/write operand forces add/sub/... users to initialize the result
variable. I am currently leaning towards adding native support for
64-bit operations in the runtime patch code, instead of having to hack
around it with 32-bit primitives. Better ideas, any one?
Thanks
-- Cyril.
next prev parent reply other threads:[~2012-08-12 23:27 UTC|newest]
Thread overview: 88+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-08-11 1:24 [PATCH v2 00/22] Introducing the TI Keystone platform Cyril Chemparathy
2012-08-11 1:24 ` Cyril Chemparathy
2012-08-11 1:24 ` [PATCH v2 01/22] ARM: add mechanism for late code patching Cyril Chemparathy
2012-08-11 1:24 ` Cyril Chemparathy
2012-08-12 2:22 ` Nicolas Pitre
2012-08-12 2:22 ` Nicolas Pitre
2012-08-12 18:13 ` Cyril Chemparathy
2012-08-12 18:13 ` Cyril Chemparathy
2012-08-11 1:24 ` [PATCH v2 02/22] ARM: add self test for runtime patch mechanism Cyril Chemparathy
2012-08-11 1:24 ` Cyril Chemparathy
2012-08-12 2:35 ` Nicolas Pitre
2012-08-12 2:35 ` Nicolas Pitre
2012-08-12 16:32 ` Cyril Chemparathy
2012-08-12 16:32 ` Cyril Chemparathy
2012-08-13 3:19 ` Nicolas Pitre
2012-08-13 3:19 ` Nicolas Pitre
2012-08-11 1:24 ` [PATCH v2 03/22] ARM: use late patch framework for phys-virt patching Cyril Chemparathy
2012-08-11 1:24 ` Cyril Chemparathy
2012-08-12 3:03 ` Nicolas Pitre
2012-08-12 3:03 ` Nicolas Pitre
2012-08-12 17:34 ` Cyril Chemparathy
2012-08-12 17:34 ` Cyril Chemparathy
2012-08-13 3:32 ` Nicolas Pitre
2012-08-13 3:32 ` Nicolas Pitre
2012-08-11 1:24 ` [PATCH v2 04/22] ARM: LPAE: use phys_addr_t on virt <--> phys conversion Cyril Chemparathy
2012-08-11 1:24 ` Cyril Chemparathy
2012-08-12 3:04 ` Nicolas Pitre
2012-08-12 3:04 ` Nicolas Pitre
2012-08-11 1:24 ` [PATCH v2 05/22] ARM: LPAE: support 64-bit virt_to_phys patching Cyril Chemparathy
2012-08-11 1:24 ` Cyril Chemparathy
2012-08-12 3:39 ` Nicolas Pitre
2012-08-12 3:39 ` Nicolas Pitre
2012-08-12 23:27 ` Cyril Chemparathy [this message]
2012-08-12 23:27 ` Cyril Chemparathy
2012-08-13 4:03 ` Nicolas Pitre
2012-08-13 4:03 ` Nicolas Pitre
2012-08-11 1:24 ` [PATCH v2 06/22] ARM: LPAE: use signed arithmetic for mask definitions Cyril Chemparathy
2012-08-11 1:24 ` Cyril Chemparathy
2012-08-12 3:57 ` Nicolas Pitre
2012-08-12 3:57 ` Nicolas Pitre
2012-08-11 1:24 ` [PATCH v2 07/22] ARM: LPAE: use phys_addr_t in alloc_init_pud() Cyril Chemparathy
2012-08-11 1:24 ` Cyril Chemparathy
2012-08-11 1:24 ` [PATCH v2 08/22] ARM: LPAE: use phys_addr_t in free_memmap() Cyril Chemparathy
2012-08-11 1:24 ` Cyril Chemparathy
2012-08-11 1:24 ` [PATCH v2 09/22] ARM: LPAE: use phys_addr_t for initrd location and size Cyril Chemparathy
2012-08-11 1:24 ` Cyril Chemparathy
2012-08-12 3:58 ` Nicolas Pitre
2012-08-12 3:58 ` Nicolas Pitre
2012-08-11 1:24 ` [PATCH v2 10/22] ARM: LPAE: use phys_addr_t in switch_mm() Cyril Chemparathy
2012-08-11 1:24 ` Cyril Chemparathy
2012-08-12 4:04 ` Nicolas Pitre
2012-08-12 4:04 ` Nicolas Pitre
2012-08-11 1:24 ` [PATCH v2 11/22] ARM: LPAE: use 64-bit accessors for TTBR registers Cyril Chemparathy
2012-08-11 1:24 ` Cyril Chemparathy
2012-08-12 4:11 ` Nicolas Pitre
2012-08-12 4:11 ` Nicolas Pitre
2012-08-11 1:24 ` [PATCH v2 12/22] ARM: LPAE: define ARCH_LOW_ADDRESS_LIMIT for bootmem Cyril Chemparathy
2012-08-11 1:24 ` Cyril Chemparathy
2012-08-11 1:24 ` [PATCH v2 13/22] ARM: LPAE: factor out T1SZ and TTBR1 computations Cyril Chemparathy
2012-08-11 1:24 ` Cyril Chemparathy
2012-08-12 4:19 ` Nicolas Pitre
2012-08-12 4:19 ` Nicolas Pitre
2012-08-11 1:24 ` [PATCH v2 14/22] ARM: LPAE: accomodate >32-bit addresses for page table base Cyril Chemparathy
2012-08-11 1:24 ` Cyril Chemparathy
2012-08-11 1:24 ` [PATCH v2 15/22] ARM: mm: use physical addresses in highmem sanity checks Cyril Chemparathy
2012-08-11 1:24 ` Cyril Chemparathy
2012-08-12 4:29 ` Nicolas Pitre
2012-08-12 4:29 ` Nicolas Pitre
2012-08-11 1:24 ` [PATCH v2 16/22] ARM: mm: cleanup checks for membank overlap with vmalloc area Cyril Chemparathy
2012-08-11 1:24 ` Cyril Chemparathy
2012-08-12 4:36 ` Nicolas Pitre
2012-08-12 4:36 ` Nicolas Pitre
2012-09-10 17:43 ` Cyril Chemparathy
2012-09-10 17:43 ` Cyril Chemparathy
2012-09-10 18:07 ` Nicolas Pitre
2012-09-10 18:07 ` Nicolas Pitre
2012-08-11 1:25 ` [PATCH v2 17/22] ARM: mm: clean up membank size limit checks Cyril Chemparathy
2012-08-11 1:25 ` Cyril Chemparathy
2012-08-11 1:25 ` [PATCH v2 18/22] ARM: add virt_to_idmap for interconnect aliasing Cyril Chemparathy
2012-08-11 1:25 ` Cyril Chemparathy
2012-08-11 1:25 ` [PATCH v2 19/22] ARM: recreate kernel mappings in early_paging_init() Cyril Chemparathy
2012-08-11 1:25 ` Cyril Chemparathy
2012-08-11 1:25 ` [RFC v2 20/22] ARM: keystone: introducing TI Keystone platform Cyril Chemparathy
2012-08-11 1:26 ` Cyril Chemparathy
2012-08-11 1:25 ` [RFC v2 21/22] ARM: keystone: enable SMP on Keystone machines Cyril Chemparathy
2012-08-11 1:25 ` Cyril Chemparathy
2012-08-11 1:25 ` [RFC v2 22/22] ARM: keystone: add switch over to high physical address range Cyril Chemparathy
2012-08-11 1:25 ` Cyril Chemparathy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=50283BF5.1050908@ti.com \
--to=cyril@ti.com \
--cc=linux-arm-kernel@lists.infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.