From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Ingo Molnar <mingo@elte.hu>
Cc: Tejun Heo <tj@kernel.org>,
    Linus Torvalds <torvalds@linux-foundation.org>,
    rusty@rustcorp.com.au, tglx@linutronix.de, x86@kernel.org,
    linux-kernel@vger.kernel.org, hpa@zytor.com, jeremy@goop.org,
    cpw@sgi.com
Subject: Re: [patch] x86: optimize __pa() to be linear again on 64-bit x86
Date: Tue, 24 Feb 2009 01:08:53 +1100
Message-ID: <200902240108.54892.nickpiggin@yahoo.com.au>
In-Reply-To: <20090223133804.GA20468@elte.hu>

On Tuesday 24 February 2009 00:38:04 Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
> > > Are __pa()/__va() that hot paths? Or am I over-estimating
> > > the cost of 2MB dTLB?
> >
> > yes, __pa()/__va() is a very hot path - in a defconfig they
> > are used in about a thousand different places.
> >
> > In fact it would be nice to get rid of the __phys_addr()
> > redirection on the 64-bit side (which is non-linear and a
> > function there, and all __pa()s go through it) and make it a
> > constant offset again.
> >
> > This isnt trivial/possible to do though as .data/.bss is in
> > the high alias. (high .text aliases alone wouldnt be a big
> > issue to fix, but the data aliases are an issue.)
> >
> > Moving .data/.bss into the linear space isnt feasible as we'd
> > lose RIP-relative addressing shortcuts.
> >
> > Maybe we could figure out the places that do __pa() on a high
> > alias and gradually eliminate them. __pa() on .data/.bss is a
> > rare and unusual thing to do, and CONFIG_DEBUG_VIRTUAL could
> > warn about them without crashing the kernel.
> >
> > Later on we could make this check unconditional, and then
> > switch over __pa() to addr-PAGE_OFFSET in the
> > !CONFIG_DEBUG_VIRTUAL case (which is the default).
>
> Ok, i couldnt resist and, using ftrace_printk() (regular printk
> in __pa() would hang during bootup), came up with the patch
> below - which allows the second patch below that does:
>
> -#define __pa(x) __phys_addr((unsigned long)(x))
> +#define __pa(x) ((unsigned long)(x)-PAGE_OFFSET)
>
> It cuts a nice (and hotly executed) ~650 bytes chunk out of the
> x86 64-bit defconfig kernel text:
>
>    text    data     bss     dec     hex filename
> 7999071 1137780  843672 9980523  984a6b vmlinux.before
> 7998414 1137780  843672 9979866  9847da vmlinux.after
>
> And it even boots.
>
> (the load_cr3() hack needs to be changed, by setting the init
> pgdir from init_level4_pgt to __va(__pa_symbol(init_level4_pgt)).)
>
> (32-bit is untested and likely wont even build.)
>
> It's not even that bad and looks quite maintainable as a
> concept.
>
> This also means that __va() and __pa() will be the same kind of
> simple arithmetic again on both 32-bit and 64-bit
> kernels.
>
> Ingo
>
> ---
> arch/x86/include/asm/page.h | 4 +++-
> arch/x86/include/asm/page_64_types.h | 1 +
> arch/x86/include/asm/pgalloc.h | 4 ++--
> arch/x86/include/asm/pgtable.h | 2 +-
> arch/x86/include/asm/processor.h | 7 ++++++-
> arch/x86/kernel/setup.c | 12 ++++++------
> arch/x86/mm/init_64.c | 6 +++---
> arch/x86/mm/ioremap.c | 12 +++++++++++-
> arch/x86/mm/pageattr.c | 28 ++++++++++++++--------------
> arch/x86/mm/pgtable.c | 2 +-
> 10 files changed, 48 insertions(+), 30 deletions(-)
>
> Index: linux/arch/x86/include/asm/page.h
> ===================================================================
> --- linux.orig/arch/x86/include/asm/page.h
> +++ linux/arch/x86/include/asm/page.h
> @@ -34,10 +34,11 @@ static inline void copy_user_page(void *
> #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
>
> #define __pa(x) __phys_addr((unsigned long)(x))
> +#define __pa_slow(x) __phys_addr_slow((unsigned long)(x))
> #define __pa_nodebug(x) __phys_addr_nodebug((unsigned long)(x))
> /* __pa_symbol should be used for C visible symbols.
> This seems to be the official gcc blessed way to do such arithmetic. */
> -#define __pa_symbol(x) __pa(__phys_reloc_hide((unsigned long)(x)))
> +#define __pa_symbol(x) __pa_slow(__phys_reloc_hide((unsigned long)(x)))
>
> #define __va(x) ((void *)((unsigned long)(x)+PAGE_OFFSET))
>
> @@ -49,6 +50,7 @@ static inline void copy_user_page(void *
> * virt_addr_valid(kaddr) returns true.
> */
> #define virt_to_page(kaddr) pfn_to_page(__pa(kaddr) >> PAGE_SHIFT)
> +#define virt_to_page_slow(kaddr) pfn_to_page(__pa_slow(kaddr) >>

Heh. I have almost the exact opposite patch, which adds a virt_to_page_fast
and uses it in critical places (in the slab allocator).

But if you can do this more complete conversion, cool. Yes, __pa is very
performance critical (not just code size). Time to alloc+free an object
in the slab allocator is on the order of 100 cycles, so saving a few
cycles here == saving a few %. (Having said that, you hardly ever see
a workload where the slab allocator is that prominent.)
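
For anyone following along, here is a rough userspace sketch of the difference between the two translations being discussed. It is not the kernel code. The sketch_* names are hypothetical, and the two address constants and the zero phys_base are assumed values picked only for illustration. The two-range version models a __phys_addr()-style function that has to handle both the direct map and the high kernel-image alias, while the linear version is the plain addr-PAGE_OFFSET form.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed, illustrative layout constants (not taken from this thread). */
#define SKETCH_PAGE_OFFSET UINT64_C(0xffff880000000000) /* start of direct map */
#define SKETCH_KERNEL_MAP  UINT64_C(0xffffffff80000000) /* high kernel-image alias */

/* Assumed physical load offset of the kernel image; zero for simplicity. */
static const uint64_t sketch_phys_base = 0;

/* Inverse of the linear translation: physical -> direct-map virtual. */
static uint64_t sketch_va(uint64_t paddr)
{
	return paddr + SKETCH_PAGE_OFFSET;
}

/*
 * Two-range translation: a compare and branch on every call, because the
 * address may be in the high alias or in the direct map.
 */
static uint64_t sketch_pa_slow(uint64_t vaddr)
{
	if (vaddr >= SKETCH_KERNEL_MAP)
		return vaddr - SKETCH_KERNEL_MAP + sketch_phys_base;
	return vaddr - SKETCH_PAGE_OFFSET;
}

/*
 * Linear translation: a single subtraction. Only correct for direct-map
 * addresses; a high-alias address passes the range check below but
 * yields a wrong physical address.
 */
static uint64_t sketch_pa_linear(uint64_t vaddr)
{
	assert(vaddr >= SKETCH_PAGE_OFFSET);
	return vaddr - SKETCH_PAGE_OFFSET;
}
```

With these assumed constants, both functions agree on any address produced by sketch_va(), and only sketch_pa_slow() gives the right answer for an address in the high alias. That is why symbol addresses would need to go through the slow variant once the fast one becomes a constant offset.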