From: Tejun Heo <tj@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>, "H. Peter Anvin" <hpa@zytor.com>,
Thomas Gleixner <tglx@linutronix.de>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [GIT PULL] Additional x86 fixes for 2.6.31-rc5
Date: Mon, 03 Aug 2009 10:49:18 +0900 [thread overview]
Message-ID: <4A76421E.3040005@kernel.org> (raw)
In-Reply-To: <alpine.LFD.2.01.0908011214330.3304@localhost.localdomain>
Hello, Linus.
Linus Torvalds wrote:
> I just noticed another issue on x86 code generation, since I was looking
> at assembly language generation due to the do_sigaltstack() kernel stack
> info leak thing.
>
> Our "get_current()" seriously sucks now that it's a per-cpu variable.
>
> Look at the code generated for something like
>
> current->sas_ss_sp = (unsigned long) ss_sp;
> current->sas_ss_size = ss_size;
>
> and notice how the code really really sucks:
>
> movq %gs:per_cpu__current_task,%rcx
> movq %rdx, 1152(%rcx)
> movq %gs:per_cpu__current_task,%rdx
> movq %rax, 1160(%rdx)
Right. This isn't new tho. In practive, the current percpu_read()
thingies behave identically to the original x86_read_percpu(). 2.6.29
generates about the same code.
(This is from the object file so the offsets haven't been set yet)
(gdb) disassemble block_all_signals
Dump of assembler code for function block_all_signals:
...
0x000000000000168a <block_all_signals+74>: mov %gs:0x0,%rax
0x0000000000001693 <block_all_signals+83>: mov %r12,0x4e0(%rax)
0x000000000000169a <block_all_signals+90>: mov %gs:0x0,%rax
0x00000000000016a3 <block_all_signals+99>: mov %r13,0x4d8(%rax)
0x00000000000016aa <block_all_signals+106>: mov %gs:0x0,%rax
0x00000000000016b3 <block_all_signals+115>: mov %r14,0x4d0(%rax)
0x00000000000016ba <block_all_signals+122>: mov %gs:0x0,%rax
0x00000000000016c3 <block_all_signals+131>: mov 0x488(%rax),%rdi
...
...
> End result: the above horror becomes a more reasonable
>
> movq %gs:per_cpu__current_task,%rax
> movq %rcx, 1152(%rax)
> movq %rdx, 1160(%rax)
>
> instead (it still doesn't cache it over the whole function, but it's
> certainly better).
>
> NOTE! I did not test that it all worked. I only looked at the asm, and
> checked out the improvements. All the ones I looked at looked reasonable.
Cool. I'm currently testing the kernel. It has booted fine on 8-core
2-way NUMA machine and repetetively compiling kernel w/ -j16.
Everything seems to work fine till now.
> Worthwhile? You be the judge.
I think it's definitely worthwhile. The gain is significant compared
to the added miniscule complexity.
"current" is a per-thread variable implemented as a per-cpu variable.
Given that we don't have many of them yet and they're accessed via
arch-specific accessors, percpu_read_stable() seems like a pretty good
solution to me.
If we ever need more thread variables, we might venture into using %fs
and maybe __thread if other archs can be converted similarly but I
think we're better off with packing such things in task_struct -
per-thread is far scarier than per-cpu scalability-wise.
> There's another detail that may be worth looking at: we often get
> 'current' and 'thread_info' together, and they are _not_ in the same
> cache-line. It might be worth defining them together in the per-cpu data,
> and making sure they are in the same cacheline too. In general, we should
> probably look at which per-pcu variables are hot and read-only, and try to
> gathe them all together.
Yeap, putting hot ones together would be great. I'm a bit curious why
it should be hot _and_ read-only tho. For variables which aren't
accessed too often by other cpus, read-onlyness shouldn't matter too
much, right?
> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Sat, 1 Aug 2009 11:50:54 -0700
> Subject: [PATCH] x86-64: Add 'percpu_read_stable()' interface for cacheable accesses
>
> This is very useful for some common things like 'get_current()' and
> 'get_thread_info()', which can be used multiple times in a function, and
> where the result is cacheable.
>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Do you want to queue it for 2.6.31? Given that the generated code
changes compared to the previous kernels (both pre and post percpu
stuff), I think it would be safer to queue it for 2.6.32 window. I
would be happy to carry it in the percpu tree.
Thanks.
--
tejun
next prev parent reply other threads:[~2009-08-03 1:49 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-07-31 18:13 [GIT PULL] Additional x86 fixes for 2.6.31-rc5 H. Peter Anvin
2009-07-31 19:45 ` Linus Torvalds
2009-07-31 19:57 ` Ingo Molnar
2009-08-01 19:28 ` Linus Torvalds
2009-08-01 19:38 ` H. Peter Anvin
2009-08-01 22:04 ` Linus Torvalds
2009-08-01 22:35 ` H. Peter Anvin
2009-08-02 1:20 ` Paul Mackerras
2009-08-02 3:52 ` H. Peter Anvin
2009-08-03 1:01 ` Tejun Heo
2009-08-03 1:14 ` Linus Torvalds
2009-08-03 1:49 ` Tejun Heo [this message]
2009-08-03 2:14 ` Linus Torvalds
2009-08-03 5:08 ` [PATCH 1/3] x86: Add 'percpu_read_stable()' interface for cacheable accesses Tejun Heo
2009-08-03 5:13 ` H. Peter Anvin
2009-08-03 5:18 ` Tejun Heo
2009-08-03 6:04 ` Ingo Molnar
2009-08-03 6:08 ` H. Peter Anvin
2009-08-03 6:16 ` Ingo Molnar
2009-08-03 7:00 ` Ingo Molnar
2009-08-03 15:13 ` [PATCH 1/3 UPDATED] x86, percpu: " Tejun Heo
2009-08-03 5:10 ` [PATCH 2/3] x86,percpu: fix DECLARE/DEFINE_PER_CPU_PAGE_ALIGNED() Tejun Heo
2009-08-03 5:12 ` [PATCH 3/3] x86: collect hot percpu variables into one cacheline Tejun Heo
2009-08-05 7:34 ` [GIT PULL] Additional x86 fixes for 2.6.31-rc5 Tan, Wei Chong
2009-08-05 8:06 ` Ingo Molnar
2009-08-10 0:42 ` Tan, Wei Chong
2009-08-10 9:05 ` Ingo Molnar
2009-08-10 15:32 ` Linus Torvalds
2009-08-10 9:06 ` [tip:x86/urgent] x86: Fix serialization in pit_expect_msb() tip-bot for Linus Torvalds
2009-08-10 18:01 ` tip-bot for Linus Torvalds
2009-08-05 23:10 ` [GIT PULL] Additional x86 fixes for 2.6.31-rc5 Tan, Wei Chong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4A76421E.3040005@kernel.org \
--to=tj@kernel.org \
--cc=hpa@zytor.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.