* Re: RFC: Kernel lock elision for TSX
@ 2013-06-30 21:45 max
0 siblings, 0 replies; 7+ messages in thread
From: max @ 2013-06-30 21:45 UTC (permalink / raw)
To: linux-kernel
On Saturday, March 23, 2013 6:11:52 PM UTC+1, Linus Torvalds wrote:
> On Fri, Mar 22, 2013 at 6:24 PM, Andi Kleen <andi@firstfloor.org> wrote:
>
> >
> > Some questions and answers:
> >
> > - How much does it improve performance?
>
> > I cannot share any performance numbers at this point unfortunately.
> > Also please keep in mind that the tuning is very preliminary and
> > will be revised.
>
> If we don't know how much it helps, we can't judge whether it's worth
> even discussing this patch. It adds enough complexity that it had
> better be worth it, and without knowing the performance side, all we
> can see are the negatives.
>
> Talk to your managers about this. Tell them that without performance
> numbers, any patch-series like this is totally pointless.
Hello,
I don't know if the thread is still active, but I have a Core i7 4770
as my home PC, which supports TSX. I bought it *exactly* to experiment
with hardware transactions.
I am willing to test and benchmark kernel patches, and since I do not
work for Intel I can share all the quantitative performance differences
I find.
Obviously, they will be *my* results, not official Intel ones -
it's up to Andi Kleen or someone else at Intel to say whether they are OK
with this, but since CPUs with TSX are now available in shops,
non-disclosure about their performance seems a bit difficult to
enforce...
--
I can tell from my preliminary performance results that, at least in
user space, RTM seems really fast. On my PC, the overhead of an empty
transaction is approximately 11 nanoseconds, and a minimal transaction
reading and writing 2 or 3 memory addresses runs in approximately
15-20 nanoseconds.
I just hope I did not violate some non-disclosure condition attached
to the CPU warranty certificate ;-)
I tested it both with GCC, using inline assembler and .byte directives,
and in Lisp (don't tell anybody), by writing a compiler module that
defines the XBEGIN, XTEST, XABORT and XEND primitives.
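For illustration, user-space wrappers of the kind described above (GCC inline assembler with .byte directives) could look roughly like the following. This is a minimal sketch, not the code used for the measurements quoted in this message: the helper names cpu_has_rtm(), xbegin(), xend() and rtm_increment() are hypothetical, and the CPUID check is there so that the code falls back to a plain increment on machines without RTM instead of faulting.

```c
/* Sketch only: user-space RTM wrappers emitted as raw opcode bytes, so no
 * -mrtm compiler flag is needed. All helper names are hypothetical. */
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

#define XBEGIN_STARTED (~0u)  /* EAX is left untouched when a transaction starts */

/* CPUID.(EAX=7,ECX=0):EBX bit 11 reports RTM support. */
static int cpu_has_rtm(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (__get_cpuid_max(0, NULL) < 7)
        return 0;
    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    return (ebx >> 11) & 1;
}

/* XBEGIN rel32 = C7 F8 <rel32>. With a zero offset, an abort resumes at the
 * next instruction with the abort status in EAX. */
static inline unsigned int xbegin(void)
{
    unsigned int status = XBEGIN_STARTED;

    __asm__ volatile(".byte 0xc7,0xf8 ; .long 0" : "+a"(status) : : "memory");
    return status;
}

/* XEND = 0F 01 D5: commit the transaction. */
static inline void xend(void)
{
    __asm__ volatile(".byte 0x0f,0x01,0xd5" : : : "memory");
}
#else
static int cpu_has_rtm(void) { return 0; }
#endif

/* Increment *counter inside a transaction when possible; otherwise, or after
 * an abort (which rolls the transactional write back), fall back to a plain
 * increment. Returns 1 if the transaction committed, 0 if the fallback ran. */
static int rtm_increment(long *counter)
{
    assert(counter != NULL);
#if defined(__x86_64__) || defined(__i386__)
    if (cpu_has_rtm() && xbegin() == XBEGIN_STARTED) {
        (*counter)++;
        xend();
        return 1;
    }
#endif
    (*counter)++;
    return 0;
}
```

A timing loop around rtm_increment() would want to cache the result of cpu_has_rtm() first, since CPUID is a slow, serializing instruction and would otherwise dominate a measurement in the tens of nanoseconds.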
--
How can I help?
I would start with the patches already posted by Andi, but the ones
I found in the LKML archives seem to belong to at least two different sets
of patches: xy/31 (September 2012) and xy/29 (March 2013), and I could
not find out whether the first set is a prerequisite for the second.
Regards,
Massimiliano
^ permalink raw reply [flat|nested] 7+ messages in thread
* RFC: Kernel lock elision for TSX
@ 2013-03-23 1:24 Andi Kleen
2013-03-23 17:11 ` Linus Torvalds
0 siblings, 1 reply; 7+ messages in thread
From: Andi Kleen @ 2013-03-23 1:24 UTC (permalink / raw)
To: linux-kernel; +Cc: torvalds, akpm, x86
This patchkit implements TSX lock elision for the kernel locks.
Lock elision uses hardware transactional memory to execute
locks in parallel.
This is just an RFC at this point, so that people can comment
on the code. Please send your feedback.
Code is against v3.9-rc3
Also available from:
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git
Branch: hle39/spinlock
The branch names may change as the tree is rebased.
For more details on the general elision concept please see:
http://halobates.de/adding-lock-elision-to-linux.pdf
http://lwn.net/Articles/533894/
Full TSX specification:
http://software.intel.com/file/41417 (chapter 8)
The patches provide the elision infrastructure and the changes
to the standard locks (rwsems, mutexes, spinlocks, rwspinlocks,
bit spinlocks) to elide.
The general strategy is to elide as many locks as possible,
and use a combination of manual disabling and automatic
adaptation to handle lock regions that do not elide well.
Some additional kernel changes are also useful to fix common
transaction aborts. I have not included those in this patchkit,
but they will be submitted separately. Many of these changes
improve general scalability by reducing cache line sharing
overhead.
The adaptation algorithms in particular have a lot of tunables.
The tuning is currently preliminary and will be revised later.
Some questions and answers:
- How much does it improve performance?
I cannot share any performance numbers at this point unfortunately.
Also please keep in mind that the tuning is very preliminary and
will be revised.
- How to test it:
You need either a system with Intel TSX or an emulator. A qemu version with
TSX support is available from https://github.com/crjohns/qemu-tsx
and may also support the kernel (untested).
- The CONFIG_RTM_LOCKS option does not appear
Make sure CONFIG_PARAVIRT_GUEST and CONFIG_PARAVIRT_SPINLOCKS
are enabled. The spinlock code uses the paravirt locking infrastructure
to add elision.
- How does it interact with virtualization?
It cannot interoperate with Xen paravirtualized locks, but without
them lock elision should work in virtualization. If the Xen
pvlocks are active spinlock elision will be disabled.
This may be fixed at some point.
There are some limitations in perf TSX PMU profiling with virtualization.
- How to tune it:
Use perf with the TSX extensions and the statistics exposed in
/sys/module/rtm_locks
You may need the latest hsw/pmu* branch from
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git
- Why does this use RTM and not HLE
RTM is more flexible and we don't need HLE in this code.
Andi Kleen
ak@linux.intel.com
Speaking for myself only
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: RFC: Kernel lock elision for TSX
2013-03-23 1:24 Andi Kleen
@ 2013-03-23 17:11 ` Linus Torvalds
2013-03-23 18:00 ` Andi Kleen
0 siblings, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2013-03-23 17:11 UTC (permalink / raw)
To: Andi Kleen
Cc: Linux Kernel Mailing List, Andrew Morton,
the arch/x86 maintainers, Benjamin Herrenschmidt
On Fri, Mar 22, 2013 at 6:24 PM, Andi Kleen <andi@firstfloor.org> wrote:
>
> Some questions and answers:
>
> - How much does it improve performance?
> I cannot share any performance numbers at this point unfortunately.
> Also please keep in mind that the tuning is very preliminary and
> will be revised.
Quite frankly, since the *only* reason for RTM is performance, this
fundamentally makes the patch-set pointless.
If we don't know how much it helps, we can't judge whether it's worth
even discussing this patch. It adds enough complexity that it had
better be worth it, and without knowing the performance side, all we
can see are the negatives.
Talk to your managers about this. Tell them that without performance
numbers, any patch-series like this is totally pointless.
Does it make non-contended code slower? We don't know. Does it improve
anything but micro-benchmarks? We don't know. Is there any point to
this? WE DON'T KNOW.
Inside of intel, it might be useful for testing and validating the
hardware. Outside of intel, it is totally useless without performance
numbers.
The other comment I have is that since it does touch non-x86 header
files etc (although not a lot), you really need to talk to the POWER8
people about naming of the thing. Calling it <linux/rtm.h> and having
"generic" helpers called _xtest() used by the generic spinlock code
sounds a bit suspect.
Linus
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: RFC: Kernel lock elision for TSX
2013-03-23 17:11 ` Linus Torvalds
@ 2013-03-23 18:00 ` Andi Kleen
2013-03-23 18:02 ` Andi Kleen
2013-03-24 14:17 ` Benjamin Herrenschmidt
0 siblings, 2 replies; 7+ messages in thread
From: Andi Kleen @ 2013-03-23 18:00 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andi Kleen, Linux Kernel Mailing List, Andrew Morton,
the arch/x86 maintainers, Benjamin Herrenschmidt
Hi Linux,
Thanks. Other code/design review would be still appreciated, even
under the current constraints.
> The other comment I have is that since it does touch non-x86 header
> files etc (although not a lot), you really need to talk to the POWER8
> people about naming of the thing. Calling it <linux/rtm.h> and having
> "generic" helpers called _xtest() used by the generic spinlock code
> sounds a bit suspect.
I can make up another name for _xtest()/_xabort() and linux/rtm.h,
(any suggestions?)
The basic concepts implemented there should be pretty universal.
If others have an equivalent of "is this a transaction" and "abort
this transaction" they can just plug it in. Otherwise they will nop it,
as it's only hints anyways.
The only things used outside x86 code are _xtest()/_xabort(); I can
remove the rest from linux/*. Without transactions this is all nops.
The primary interface for the lock code is the much higher level
elide()/elide_lock_adapt() interface anyways.
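One way to picture the "plug it in, otherwise nop it" arrangement described above is the sketch below. The names tx_active(), tx_abort_hint(), arch_tx_active(), arch_tx_abort(), ARCH_HAS_HW_TX and slow_path() are hypothetical placeholders chosen for illustration. They are not names from the patchkit, nor a naming proposal made in this thread.

```c
/* Sketch of architecture-neutral transaction hints. tx_active(),
 * tx_abort_hint() and ARCH_HAS_HW_TX are hypothetical names. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#ifdef ARCH_HAS_HW_TX
/* An architecture with transactional memory supplies its own versions,
 * for example wrappers around XTEST/XABORT on x86. */
bool arch_tx_active(void);
void arch_tx_abort(void);
#define tx_active()     arch_tx_active()
#define tx_abort_hint() arch_tx_abort()
#else
/* Without hardware transactions both hints compile away to nothing. */
static inline bool tx_active(void)     { return false; }
static inline void tx_abort_hint(void) { }
#endif

/* Example caller in generic code: work that cannot succeed inside a
 * transaction asks for an early abort, then proceeds non-transactionally.
 * *calls counts how often the non-transactional path actually ran. */
static void slow_path(int *calls)
{
    assert(calls != NULL);
    if (tx_active())
        tx_abort_hint();  /* on real hardware, execution resumes at the
                             transaction's fallback path, not here */
    (*calls)++;
}
```

Because the fallback definitions are empty inline functions, generic lock code can call the hints unconditionally and pays nothing on architectures that do not define ARCH_HAS_HW_TX.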
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: RFC: Kernel lock elision for TSX
2013-03-23 18:00 ` Andi Kleen
@ 2013-03-23 18:02 ` Andi Kleen
2013-03-24 14:17 ` Benjamin Herrenschmidt
1 sibling, 0 replies; 7+ messages in thread
From: Andi Kleen @ 2013-03-23 18:02 UTC (permalink / raw)
To: Andi Kleen
Cc: Linus Torvalds, Linux Kernel Mailing List, Andrew Morton,
the arch/x86 maintainers, Benjamin Herrenschmidt
On Sat, Mar 23, 2013 at 07:00:10PM +0100, Andi Kleen wrote:
>
> Hi Linux,
Also, I have finally made my debut with that famous typo too. Sorry.
-Andi
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: RFC: Kernel lock elision for TSX
2013-03-23 18:00 ` Andi Kleen
2013-03-23 18:02 ` Andi Kleen
@ 2013-03-24 14:17 ` Benjamin Herrenschmidt
2013-03-25 0:59 ` Michael Neuling
1 sibling, 1 reply; 7+ messages in thread
From: Benjamin Herrenschmidt @ 2013-03-24 14:17 UTC (permalink / raw)
To: Andi Kleen
Cc: Linus Torvalds, Linux Kernel Mailing List, Andrew Morton,
the arch/x86 maintainers, Michael Neuling
On Sat, 2013-03-23 at 19:00 +0100, Andi Kleen wrote:
> Hi Linux,
>
> Thanks. Other code/design review would be still appreciated, even
> under the current constraints.
>
> > The other comment I have is that since it does touch non-x86 header
> > files etc (although not a lot), you really need to talk to the POWER8
> > people about naming of the thing. Calling it <linux/rtm.h> and having
> > "generic" helpers called _xtest() used by the generic spinlock code
> > sounds a bit suspect.
>
> I can make up another name for _xtest()/_xabort() and linux/rtm.h,
> (any suggestions?)
>
> The basic concepts implemented there should be pretty universal.
> If others have an equivalent of "is this a transaction" and "abort
> this transaction" they can just plug it in. Otherwise they will nop it,
> as it's only hints anyways.
>
> The only things used outside x86 code are _xtest()/_xabort(); I can
> remove the rest from linux/*. Without transactions this is all nops.
> The primary interface for the lock code is the much higher level
> elide()/elide_lock_adapt() interface anyways.
Adding Michael Neuling to the CC list, he's probably the LTC person who
is the most familiar with POWER8 TM at the moment.
Cheers,
Ben.
> -Andi
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: RFC: Kernel lock elision for TSX
2013-03-24 14:17 ` Benjamin Herrenschmidt
@ 2013-03-25 0:59 ` Michael Neuling
0 siblings, 0 replies; 7+ messages in thread
From: Michael Neuling @ 2013-03-25 0:59 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Andi Kleen, Linus Torvalds, Linux Kernel Mailing List,
Andrew Morton, the arch/x86 maintainers
> On Sat, 2013-03-23 at 19:00 +0100, Andi Kleen wrote:
> > Hi Linux,
> >
> > Thanks. Other code/design review would be still appreciated, even
> > under the current constraints.
> >
> > > The other comment I have is that since it does touch non-x86 header
> > > files etc (although not a lot), you really need to talk to the POWER8
> > > people about naming of the thing. Calling it <linux/rtm.h> and having
> > > "generic" helpers called _xtest() used by the generic spinlock code
> > > sounds a bit suspect.
> >
> > I can make up another name for _xtest()/_xabort() and linux/rtm.h,
> > (any suggestions?)
> >
> > The basic concepts implemented there should be pretty universal.
> > If others have an equivalent of "is this a transaction" and "abort
> > this transaction" they can just plug it in. Otherwise they will nop it,
> > as it's only hints anyways.
> >
> > The only things used outside x86 code are _xtest()/_xabort(); I can
> > remove the rest from linux/*. Without transactions this is all nops.
> > The primary interface for the lock code is the much higher level
> > elide()/elide_lock_adapt() interface anyways.
>
> Adding Michael Neuling to the CC list, he's probably the LTC person who
> is the most familiar with POWER8 TM at the moment.
Thanks. I'll respond inline, but I agree, the naming convention is very
x86 centric and will not suit powerpc at all.
Also, like Andi, we don't have permission to post any performance
numbers, so we have held off on bothering with anything like this for now.
Mikey
>
> Cheers,
> Ben.
>
> > -Andi
> >
>
>
^ permalink raw reply [flat|nested] 7+ messages in thread