From mboxrd@z Thu Jan 1 00:00:00 1970
From: arnd@arndb.de (Arnd Bergmann)
Date: Wed, 20 May 2015 08:51:32 +0200
Subject: [RFC] arm: Add for atomic half word exchange
In-Reply-To: <1348896100.440561432098574765.JavaMail.weblogic@ep2mlwas07a>
References: <1348896100.440561432098574765.JavaMail.weblogic@ep2mlwas07a>
Message-ID: <2528978.P5FT0BVksd@wuerfel>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On Wednesday 20 May 2015 05:09:35 Sarbojit Ganguly wrote:
> > ------- Original Message -------
> > Sender : Peter Zijlstra
> > Date : May 19, 2015 21:43 (GMT+09:00)
> > Title : Re: [RFC] arm: Add for atomic half word exchange
> >
> > On Tue, May 19, 2015 at 11:20:13AM +0000, Sarbojit Ganguly wrote:
> > > On Tuesday 19 May 2015 09:39:33 Sarbojit Ganguly wrote:
> > > > Since 16 bit half word exchange was not there and MCS based
> > > > qspinlock by Waiman's xchg_tail() requires an atomic exchange on a
> > > > half word, here is a small modification to __xchg() code.
> >
> > Can you actually see a performance improvement with the qspinlock code
> > on ARM ?
> >
> > The real improvements on x86 were on NUMA systems; although there were
> > real improvements on light loads as well.
> >
> >
> > Note that ARM (or any load-store arch) could get rid of all the cmpxchg
> > loops in that code. Although I suppose we replaced the most common ones
> > with these unconditional atomics already -- like that xchg16 -- so
> > implementing those with ll/sc, as you did, should be near optimal.
>
> Yes, the main advantage of Qspinlock code can be observed in NUMA but
> when I tested in an embedded system, a slight advantage was observed.

Is this a multi-cluster SMP system? Those can behave like NUMA machines
in some ways.

We could easily limit the use of 16-bit xchg() to ARMv7 machines by using

	select ARCH_USE_QUEUED_SPINLOCKS if !SMP_ON_UP

or

	select ARCH_USE_QUEUED_SPINLOCKS if !CPU_V6

when enabling the qspinlock implementation.

	Arnd
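
For context on why a native half-word exchange matters here: without one,
a 16-bit exchange has to be emulated with a compare-and-swap loop on the
containing 32-bit word. The following is only a rough, portable C11 sketch
of that fallback. It is not the kernel's __xchg() and not the ll/sc code
from the patch; the helper name xchg16_via_cmpxchg32() and its "half"
index parameter are made up for illustration.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only): atomically replace one 16-bit
 * half of a 32-bit word and return the previous value of that half.
 * half == 0 selects bits 0..15, half == 1 selects bits 16..31.
 */
static uint16_t xchg16_via_cmpxchg32(_Atomic uint32_t *word,
				     unsigned int half, uint16_t newval)
{
	unsigned int shift = half * 16u;
	uint32_t mask = (uint32_t)0xffffu << shift;
	uint32_t old = atomic_load_explicit(word, memory_order_relaxed);
	uint32_t desired;

	do {
		/* Keep the other half as observed, swap in the new half. */
		desired = (old & ~mask) | ((uint32_t)newval << shift);
		/* On failure, 'old' is refreshed with the current value. */
	} while (!atomic_compare_exchange_weak_explicit(word, &old, desired,
							memory_order_acq_rel,
							memory_order_relaxed));

	return (uint16_t)((old & mask) >> shift);
}
```

The loop has to retry whenever either half of the word changes underneath
it, which is the kind of cmpxchg loop an unconditional 16-bit exchange
avoids.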