* [PATCH] x86: page: get_order() optimization
@ 2011-03-19 16:25 Maksym Planeta
2011-03-23 10:07 ` Ingo Molnar
0 siblings, 1 reply; 4+ messages in thread
From: Maksym Planeta @ 2011-03-19 16:25 UTC (permalink / raw)
To: tglx; +Cc: kernel-janitors, mingo, linux-kernel, Maksym Planeta
For the x86 architecture, the get_order() function can be optimized using the
bsr (bit scan reverse) assembler instruction.
Sorry, I forgot the Signed-off-by line in my previous mail; this is the same patch, now signed.
Signed-off-by: Maksym Planeta <mcsim.planeta@gmail.com>
---
arch/x86/include/asm/page.h | 20 +++++++++++++++++++-
1 files changed, 19 insertions(+), 1 deletions(-)
diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index 8ca8283..339ae26 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -60,10 +60,28 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
extern bool __virt_addr_valid(unsigned long kaddr);
#define virt_addr_valid(kaddr) __virt_addr_valid((unsigned long) (kaddr))
+/* Pure 2^n version of get_order */
+static inline __attribute_const__ int get_order(unsigned long size)
+{
+ int order;
+
+ size = (size - 1) >> (PAGE_SHIFT - 1);
+#ifdef CONFIG_X86_CMOV
+ asm("bsr %1,%0\n\t"
+ "cmovzl %2,%0"
+ : "=&r" (order) : "rm" (size), "rm" (0));
+#else
+ asm("bsr %1,%0\n\t"
+ "jnz 1f\n\t"
+ "movl $0,%0\n"
+ "1:" : "=r" (order) : "rm" (size));
+#endif
+ return order;
+}
+
#endif /* __ASSEMBLY__ */
#include <asm-generic/memory_model.h>
-#include <asm-generic/getorder.h>
#define __HAVE_ARCH_GATE_AREA 1
--
1.7.2.3
* Re: [PATCH] x86: page: get_order() optimization
2011-03-19 16:25 [PATCH] x86: page: get_order() optimization Maksym Planeta
@ 2011-03-23 10:07 ` Ingo Molnar
2011-03-27 8:57 ` Maksym Planeta
0 siblings, 1 reply; 4+ messages in thread
From: Ingo Molnar @ 2011-03-23 10:07 UTC (permalink / raw)
To: Maksym Planeta; +Cc: tglx, kernel-janitors, mingo, linux-kernel
* Maksym Planeta <mcsim.planeta@gmail.com> wrote:
> For the x86 architecture, the get_order() function can be optimized using the
> bsr (bit scan reverse) assembler instruction.
>
> Sorry, I forgot the Signed-off-by line in my previous mail; this is the same patch, now signed.
>
> Signed-off-by: Maksym Planeta <mcsim.planeta@gmail.com>
> ---
> arch/x86/include/asm/page.h | 20 +++++++++++++++++++-
> 1 files changed, 19 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
> index 8ca8283..339ae26 100644
> --- a/arch/x86/include/asm/page.h
> +++ b/arch/x86/include/asm/page.h
> @@ -60,10 +60,28 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
> extern bool __virt_addr_valid(unsigned long kaddr);
> #define virt_addr_valid(kaddr) __virt_addr_valid((unsigned long) (kaddr))
>
> +/* Pure 2^n version of get_order */
> +static inline __attribute_const__ int get_order(unsigned long size)
> +{
> + int order;
> +
> + size = (size - 1) >> (PAGE_SHIFT - 1);
> +#ifdef CONFIG_X86_CMOV
> + asm("bsr %1,%0\n\t"
> + "cmovzl %2,%0"
> + : "=&r" (order) : "rm" (size), "rm" (0));
> +#else
> + asm("bsr %1,%0\n\t"
> + "jnz 1f\n\t"
> + "movl $0,%0\n"
> + "1:" : "=r" (order) : "rm" (size));
> +#endif
> + return order;
> +}
Ok, that's certainly a nice optimization.
One detail: in many cases 'size' is a constant. Have you checked recent GCC:
does it turn the generic version of get_order() into a loop even for constants,
or does it recognize the pattern and precompute the result?
If it recognizes the pattern, then this optimization needs to be made dependent
on whether the expression is constant or not - see bitops.h for how to do that.
Furthermore, a cleanliness observation: it would be nicer to encapsulate the
CMOVZL/jump pattern in a macro, something like ASM_CMOVZL(2,0) to express
'cmovzl %2,%0'. In the !CONFIG_X86_CMOV case it would be turned into the jnz/movl
instructions. The assembly code here would be much cleaner that way:
asm("bsr %1,%0\n"
ASM_CMOVZL(2,0)
: "=&r" (order) : "rm" (size), "rm" (0));
With no #ifdefs in get_order().
Thanks,
Ingo
* Re: [PATCH] x86: page: get_order() optimization
2011-03-23 10:07 ` Ingo Molnar
@ 2011-03-27 8:57 ` Maksym Planeta
0 siblings, 0 replies; 4+ messages in thread
From: Maksym Planeta @ 2011-03-27 8:57 UTC (permalink / raw)
To: Ingo Molnar; +Cc: tglx, kernel-janitors, mingo, linux-kernel
On Wed, 2011-03-23 at 11:07 +0100, Ingo Molnar wrote:
> Ok, that's certainly a nice optimization.
Thanks, I rewrote the patch according to your observations.
> One detail: in many cases 'size' is a constant. Have you checked recent GCC,
> does it turn the generic version of get_order() into a loop even for constants,
> or is it able does it perhaps recognize the pattern and precompute the result?
Yes, gcc precomputes the result, so I added a special case for constants.
> With no #ifdefs in get_order().
And removed #ifdefs from get_order().
--
Thanks,
Maksym Planeta