From: "Marcos E. Matsunaga"
To: Malcolm Crossley, xen-devel@lists.xen.org
Subject: Re: [PATCH 1/2] rwlock: add per-cpu reader-writer locks
Date: Thu, 5 Nov 2015 10:46:37 -0500
Message-ID: <563B79DD.3060805@oracle.com>
In-Reply-To: <563B73C1.4090302@citrix.com>

Hi Malcolm,

If you can post the updated patches, that would be great. I think it
would be better for me to test with your update.

Thanks again.
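
In the meantime, for anyone else trying the series, my reading of the
intended calling convention, from the patch quoted below, is roughly the
following. This is only a sketch -- the grant_rwlock per-cpu variable and
the gt->writer_activating/gt->lock fields are names I made up for
illustration; they are not defined by this patch:

    /* One per-cpu slot can be shared by several rwlocks, provided a CPU
     * never holds two per-cpu read locks at the same time. */
    DEFINE_PER_CPU(rwlock_t *, grant_rwlock);

    /* Reader fast path: a per-cpu pointer write plus a barrier, no
     * atomic read-modify-write on shared cache lines. */
    percpu_read_lock(&get_per_cpu_var(grant_rwlock),
                     &gt->writer_activating, &gt->lock);
    /* ... read-side critical section ... */
    percpu_read_unlock(&get_per_cpu_var(grant_rwlock));

    /* Writer: takes the underlying write lock, flags writer_activating,
     * then waits until no CPU's slot still points at this rwlock. */
    percpu_write_lock(&get_per_cpu_var(grant_rwlock),
                      &gt->writer_activating, &gt->lock);
    /* ... write-side critical section ... */
    percpu_write_unlock(&gt->writer_activating, &gt->lock);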
On 11/05/2015 10:20 AM, Malcolm Crossley wrote:
> On 05/11/15 13:48, Marcos E. Matsunaga wrote:
>> Hi Malcolm,
>>
>> I tried your patches against staging yesterday and as soon as I
>> started a guest, it panicked. I have lock_profile enabled and applied
>> your patches against:
> I tested with a non-debug version of Xen (because I was analysing the
> performance of Xen) and thus those ASSERTs were never run.
>
> The ASSERTs can be safely removed. The rwlock behaviour is slightly
> different in that it is possible for a writer to hold the write lock
> whilst a reader is progressing through the read critical section. This
> is safe because the writer waits for the percpu variables to clear
> before actually progressing through its own critical section.
>
> I have an updated version of the patch series which fixes this. Do you
> want me to post it, or are you happy to remove the ASSERTs yourself (or
> to switch to a non-debug build of Xen)?
>
> Sorry for not catching this before it hit the list.
>
> Malcolm
>
>> 6f04de658574833688c3f9eab310e7834d56a9c0 x86: cleanup of early cpuid
>> handling
>>
>>
>>
>> (XEN) HVM1 save: CPU
>> (XEN) HVM1 save: PIC
>> (XEN) HVM1 save: IOAPIC
>> (XEN) HVM1 save: LAPIC
>> (XEN) HVM1 save: LAPIC_REGS
>> (XEN) HVM1 save: PCI_IRQ
>> (XEN) HVM1 save: ISA_IRQ
>> (XEN) HVM1 save: PCI_LINK
>> (XEN) HVM1 save: PIT
>> (XEN) HVM1 save: RTC
>> (XEN) HVM1 save: HPET
>> (XEN) HVM1 save: PMTIMER
>> (XEN) HVM1 save: MTRR
>> (XEN) HVM1 save: VIRIDIAN_DOMAIN
>> (XEN) HVM1 save: CPU_XSAVE
>> (XEN) HVM1 save: VIRIDIAN_VCPU
>> (XEN) HVM1 save: VMCE_VCPU
>> (XEN) HVM1 save: TSC_ADJUST
>> (XEN) HVM1 restore: CPU 0
>> [ 394.163143] loop: module loaded
>> (XEN) Assertion 'rw_is_locked(&t->lock)' failed at grant_table.c:215
>> (XEN) ----[ Xen-4.7-unstable x86_64 debug=y Tainted: C ]----
>> (XEN) CPU:    0
>> (XEN) RIP:    e008:[] do_grant_table_op+0x63f/0x2e04
>> (XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor (d0v0)
>> (XEN) rax: 0000000000000000   rbx: ffff83400f9dc9e0   rcx: 0000000000000000
>> (XEN) rdx: 0000000000000001   rsi: ffff82d080342b10   rdi: ffff83400819b784
>> (XEN) rbp: ffff8300774ffef8   rsp: ffff8300774ffdf8   r8:  0000000000000002
>> (XEN) r9:  0000000000000002   r10: 0000000000000002   r11: 00000000ffffffff
>> (XEN) r12: 0000000000000000   r13: 0000000000000000   r14: ffff83400819b780
>> (XEN) r15: ffff83400f9d0000   cr0: 0000000080050033   cr4: 00000000001526e0
>> (XEN) cr3: 000001007f613000   cr2: ffff8800746182b8
>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
>> (XEN) Xen stack trace from rsp=ffff8300774ffdf8:
>> (XEN)    ffff8300774ffe08 ffff82d000000000 ffff8300774ffef8 ffff82d08017fc9b
>> (XEN)    ffff82d080342b28 ffff83400f9d8600 ffff82d080342b10 0000000000000000
>> (XEN)    ffff83400f9dca20 ffff832100000000 ffff834008188000 0000000000000001
>> (XEN)    00000001772ee000 ffff8801e98d03e0 ffff8300774ffe88 0000000000000000
>> (XEN)    0000000000000000 ffff8300774fff18 00000021d0269c10 000000000001001a
>> (XEN)    ffffffff00000001 0000000000000000 0000000000000246 00007ff7de45a407
>> (XEN)    0000000000000100 00007ff7de45a407 0000000000000033 ffff8300772ee000
>> (XEN)    ffff8801eb0e3c00 ffff880004bf57e8 ffff8801e98d03e0 ffff8801eb0a5938
>> (XEN)    00007cff88b000c7 ffff82d08023d952 ffffffff8100128a 0000000000000014
>> (XEN)    0000000000000000 0000000000000001 ffff8801f6e18388 ffffffff81d3d740
>> (XEN)    ffff8801efb7bd40 ffff88000542e780 0000000000000282 0000000000000000
>> (XEN)    ffff8801e98d03a0 ffff8801efe07000 0000000000000014 ffffffff8100128a
>> (XEN)    0000000000000001 ffff8801e98d03e0 0000000000000000 0001010000000000
>> (XEN)    ffffffff8100128a 000000000000e033 0000000000000282 ffff8801efb7bce0
>> (XEN)    000000000000e02b 0000000000000000 0000000000000000 0000000000000000
>> (XEN)    0000000000000000 0000000000000000 ffff8300772ee000 0000000000000000
>> (XEN)    0000000000000000
>> (XEN) Xen call trace:
>> (XEN)    [] do_grant_table_op+0x63f/0x2e04
>> (XEN)    [] lstar_enter+0xe2/0x13c
>> (XEN)
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 0:
>> (XEN) Assertion 'rw_is_locked(&t->lock)' failed at grant_table.c:215
>> (XEN) ****************************************
>> (XEN)
>> (XEN) Manual reset required ('noreboot' specified)
>>
>>
>> Thanks for your help.
>>
>> On 11/03/2015 12:58 PM, Malcolm Crossley wrote:
>>> Per-cpu read-write locks allow the fast path read case to have low
>>> overhead by only setting/clearing a per-cpu variable for using the
>>> read lock.
>>> The per-cpu read fast path also avoids locked compare-and-swap
>>> operations, which can be particularly slow on coherent multi-socket
>>> systems, particularly if there is heavy usage of the read lock
>>> itself.
>>>
>>> The per-cpu reader-writer lock uses a global variable to control the
>>> read lock fast path. This allows a writer to disable the fast path
>>> and ensure the readers use the underlying read-write lock
>>> implementation.
>>>
>>> Once the writer has taken the write lock and disabled the fast path,
>>> it must poll the per-cpu variable for all CPUs which have entered the
>>> critical section for the specific read-write lock the writer is
>>> attempting to take. This design allows a single per-cpu variable to
>>> be used for read/write locks belonging to separate data structures,
>>> as long as multiple per-cpu read locks are not simultaneously held by
>>> one particular CPU. This also means per-cpu reader-writer locks are
>>> not recursion safe.
>>>
>>> Slow path readers which are unblocked set the per-cpu variable and
>>> drop the read lock. This simplifies the implementation and allows the
>>> fairness of the underlying read-write lock to be taken advantage of.
>>>
>>> There may be slightly more overhead on the per-cpu write lock path
>>> due to checking each CPU's fast path read variable, but this overhead
>>> is likely to be hidden by the required delay of waiting for readers
>>> to exit the critical section. The loop is optimised to only iterate
>>> over the per-cpu data of active readers of the rwlock.
>>>
>>> Signed-off-by: Malcolm Crossley
>>> ---
>>>   xen/common/spinlock.c        | 32 ++++++++++++++++++++++++++++++++
>>>   xen/include/asm-arm/percpu.h |  5 +++++
>>>   xen/include/asm-x86/percpu.h |  6 ++++++
>>>   xen/include/xen/percpu.h     |  4 ++++
>>>   xen/include/xen/spinlock.h   | 37 +++++++++++++++++++++++++++++++++++++
>>>   5 files changed, 84 insertions(+)
>>>
>>> diff --git a/xen/common/spinlock.c b/xen/common/spinlock.c
>>> index 7f89694..a526216 100644
>>> --- a/xen/common/spinlock.c
>>> +++ b/xen/common/spinlock.c
>>> @@ -492,6 +492,38 @@ int _rw_is_write_locked(rwlock_t *lock)
>>>       return (lock->lock == RW_WRITE_FLAG); /* writer in critical section? */
>>>   }
>>> +void percpu_write_lock(rwlock_t **per_cpudata, bool_t *writer_activating,
>>> +                       rwlock_t *rwlock)
>>> +{
>>> +    int cpu;
>>> +    cpumask_t active_readers;
>>> +
>>> +    cpumask_copy(&active_readers, &cpu_online_map);
>>> +    /* First take the write lock to protect against other writers. */
>>> +    write_lock(rwlock);
>>> +
>>> +    /* Now set the global variable so that readers start using read_lock. */
>>> +    *writer_activating = true;
>>> +    smp_mb();
>>> +
>>> +    /* Check if there are any percpu readers in progress on our rwlock. */
>>> +    do
>>> +    {
>>> +        for_each_cpu(cpu, &active_readers)
>>> +        {
>>> +            /* Remove any percpu readers not contending
>>> +             * from our check mask. */
>>> +            if ( per_cpu_ptr(per_cpudata, cpu) != rwlock )
>>> +                cpumask_clear_cpu(cpu, &active_readers);
>>> +        }
>>> +        /* Check if we've cleared all percpu readers. */
>>> +        if ( cpumask_empty(&active_readers) )
>>> +            break;
>>> +        /* Give the coherency fabric a break. */
>>> +        cpu_relax();
>>> +    } while ( 1 );
>>> +}
>>> +
>>>   #ifdef LOCK_PROFILE
>>>   struct lock_profile_anc {
>>> diff --git a/xen/include/asm-arm/percpu.h b/xen/include/asm-arm/percpu.h
>>> index 71e7649..c308a56 100644
>>> --- a/xen/include/asm-arm/percpu.h
>>> +++ b/xen/include/asm-arm/percpu.h
>>> @@ -27,6 +27,11 @@ void percpu_init_areas(void);
>>>   #define __get_cpu_var(var) \
>>>       (*RELOC_HIDE(&per_cpu__##var, READ_SYSREG(TPIDR_EL2)))
>>> +#define per_cpu_ptr(var, cpu) \
>>> +    (*RELOC_HIDE(&var, __per_cpu_offset[cpu]))
>>> +#define __get_cpu_ptr(var) \
>>> +    (*RELOC_HIDE(&var, READ_SYSREG(TPIDR_EL2)))
>>> +
>>>   #define DECLARE_PER_CPU(type, name) extern __typeof__(type) per_cpu__##name
>>>   DECLARE_PER_CPU(unsigned int, cpu_id);
>>> diff --git a/xen/include/asm-x86/percpu.h b/xen/include/asm-x86/percpu.h
>>> index 604ff0d..51562b9 100644
>>> --- a/xen/include/asm-x86/percpu.h
>>> +++ b/xen/include/asm-x86/percpu.h
>>> @@ -20,4 +20,10 @@ void percpu_init_areas(void);
>>>   #define DECLARE_PER_CPU(type, name) extern __typeof__(type) per_cpu__##name
>>> +#define __get_cpu_ptr(var) \
>>> +    (*RELOC_HIDE(var, get_cpu_info()->per_cpu_offset))
>>> +
>>> +#define per_cpu_ptr(var, cpu) \
>>> +    (*RELOC_HIDE(var, __per_cpu_offset[cpu]))
>>> +
>>>   #endif /* __X86_PERCPU_H__ */
>>> diff --git a/xen/include/xen/percpu.h b/xen/include/xen/percpu.h
>>> index abe0b11..c896863 100644
>>> --- a/xen/include/xen/percpu.h
>>> +++ b/xen/include/xen/percpu.h
>>> @@ -16,6 +16,10 @@
>>>   /* Preferred on Xen. Also see arch-defined per_cpu(). */
>>>   #define this_cpu(var)    __get_cpu_var(var)
>>> +#define this_cpu_ptr(ptr)    __get_cpu_ptr(ptr)
>>> +
>>> +#define get_per_cpu_var(var)  (per_cpu__##var)
>>> +
>>>   /* Linux compatibility. */
>>>   #define get_cpu_var(var)    this_cpu(var)
>>>   #define put_cpu_var(var)
>>> diff --git a/xen/include/xen/spinlock.h b/xen/include/xen/spinlock.h
>>> index fb0438e..f929f1b 100644
>>> --- a/xen/include/xen/spinlock.h
>>> +++ b/xen/include/xen/spinlock.h
>>> @@ -3,6 +3,7 @@
>>>   #include <asm/system.h>
>>>   #include <asm/spinlock.h>
>>> +#include <xen/percpu.h>
>>>   #ifndef NDEBUG
>>>   struct lock_debug {
>>> @@ -261,4 +262,40 @@ int _rw_is_write_locked(rwlock_t *lock);
>>>   #define rw_is_locked(l)               _rw_is_locked(l)
>>>   #define rw_is_write_locked(l)         _rw_is_write_locked(l)
>>> +static inline void percpu_read_lock(rwlock_t **per_cpudata, bool_t *writer_activating,
>>> +                                    rwlock_t *rwlock)
>>> +{
>>> +    /* Indicate this cpu is reading. */
>>> +    this_cpu_ptr(per_cpudata) = rwlock;
>>> +    smp_mb();
>>> +    /* Check if a writer is waiting. */
>>> +    if ( unlikely(*writer_activating) )
>>> +    {
>>> +        /* Let the waiting writer know we aren't holding the lock. */
>>> +        this_cpu_ptr(per_cpudata) = NULL;
>>> +        /* Wait using the read lock to keep the lock fair. */
>>> +        read_lock(rwlock);
>>> +        /* Set the per CPU data again and continue. */
>>> +        this_cpu_ptr(per_cpudata) = rwlock;
>>> +        /* Drop the read lock because we don't need it anymore. */
>>> +        read_unlock(rwlock);
>>> +    }
>>> +}
>>> +
>>> +static inline void percpu_read_unlock(rwlock_t **per_cpudata)
>>> +{
>>> +    this_cpu_ptr(per_cpudata) = NULL;
>>> +    smp_wmb();
>>> +}
>>> +
>>> +/* Don't inline percpu write lock as it's a complex function. */
>>> +void percpu_write_lock(rwlock_t **per_cpudata, bool_t *writer_activating,
>>> +                       rwlock_t *rwlock);
>>> +
>>> +static inline void percpu_write_unlock(bool_t *writer_activating, rwlock_t *rwlock)
>>> +{
>>> +    *writer_activating = false;
>>> +    write_unlock(rwlock);
>>> +}
>>> +
>>>   #endif /* __SPINLOCK_H__ */

-- 
Regards,

Marcos Eduardo Matsunaga

Oracle USA
Linux Engineering

"The statements and opinions expressed here are my own and do not
necessarily represent those of Oracle Corporation."
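
P.S. For anyone hitting the same panic before the updated series lands:
the stop-gap Malcolm describes above is simply to drop the assertion that
fired. A minimal sketch, assuming grant_table.c:215 in your tree is the
rw_is_locked() check shown in the panic (the surrounding context may
differ):

    --- a/xen/common/grant_table.c
    +++ b/xen/common/grant_table.c
    -    ASSERT(rw_is_locked(&t->lock));
    +    /*
    +     * With per-cpu rwlocks a writer may hold the underlying write
    +     * lock while the last readers drain, so rw_is_locked() is no
    +     * longer a valid assertion on the read side.
    +     */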