From: Malcolm Crossley
Subject: Re: [PATCHv5 1/3] rwlock: Add per-cpu reader-writer lock infrastructure
Date: Tue, 19 Jan 2016 10:29:22 +0000
Message-ID: <569E1002.6060007@citrix.com>
To: George Dunlap, JBeulich@suse.com, ian.campbell@citrix.com, andrew.cooper3@citrix.com, Marcos.Matsunaga@oracle.com, keir@xen.org, konrad.wilk@oracle.com, george.dunlap@eu.citrix.com
Cc: xen-devel@lists.xenproject.org, dario.faggioli@citrix.com, stefano.stabellini@citrix.com

On 11/01/16 15:06, Malcolm Crossley wrote:
> On 22/12/15 11:56, George Dunlap wrote:
>> On 18/12/15 16:08, Malcolm Crossley wrote:
>>>
>>> +
>>> +#ifndef NDEBUG
>>> +#define PERCPU_RW_LOCK_UNLOCKED(owner) { RW_LOCK_UNLOCKED, 0, owner }
>>> +static inline void _percpu_rwlock_owner_check(percpu_rwlock_t **per_cpudata,
>>> +                                         percpu_rwlock_t *percpu_rwlock)
>>> +{
>>> +    ASSERT(per_cpudata == percpu_rwlock->percpu_owner);
>>> +}
>>> +#else
>>> +#define PERCPU_RW_LOCK_UNLOCKED(owner) { RW_LOCK_UNLOCKED, 0 }
>>> +#define _percpu_rwlock_owner_check(data, lock) ((void)0)
>>> +#endif
>>> +
>>> +#define DEFINE_PERCPU_RWLOCK_RESOURCE(l, owner) \
>>> +    percpu_rwlock_t l = PERCPU_RW_LOCK_UNLOCKED(&get_per_cpu_var(owner))
>>> +#define percpu_rwlock_resource_init(l, owner) \
>>> +    (*(l) = (percpu_rwlock_t)PERCPU_RW_LOCK_UNLOCKED(&get_per_cpu_var(owner)))
>>> +
>>> +static inline void _percpu_read_lock(percpu_rwlock_t **per_cpudata,
>>> +                                         percpu_rwlock_t *percpu_rwlock)
>>
>> Is there a particular reason you chose to only use the "owner" value in
>> the struct to verify that the "per_cpudata" argument passed matched the
>> one you expected, rather than just getting rid of the "per_cpudata"
>> argument altogether and always using the pointer in the struct?
>
> Initially I was aiming to add percpu aspects to the rwlock without increasing
> the size of the rwlock structure itself, to keep data cache usage and memory
> allocations the same.
> It became clear that having a global writer_activating barrier would cause
> the read_lock to enter the slow path far too often, so I put the
> writer_activating variable in the percpu_rwlock_t. As writer_activating is
> just a bool, the additional data overhead should be small. Always having an
> 8-byte pointer may add a lot of overhead to data structures containing
> multiple rwlocks, and thus cause additional allocation overhead.
>>
>> (i.e., _percpu_read_lock(percpu_rwlock_t *percpu_rwlock) { ...
>> per_cpudata = percpu_rwlock->percpu_owner; ... })
>>
>> I'm not an expert in this sort of micro-optimization, but it seems like
>> you're trading off storing a pointer in your rwlock struct for storing a
>> pointer at every call site. Since you have to read writer_activating
>> for every lock or unlock anyway,
>
> writer_activating is not read on the read_unlock path. As these are rwlocks,
> I'm assuming the read lock/unlock paths are more critical for performance.
> So I'd prefer not to read the percpu_rwlock structure when it isn't
> required (i.e.
> on the read unlock path).
> Furthermore, the single byte for the writer_activating variable is likely
> to have been read into cache already by accesses to other parts of the data
> structure near the percpu_rwlock_t. If we add an additional 8 bytes to the
> percpu_rwlock_t, this may not happen, and it may also change the cache line
> alignment.
>
>> it doesn't seem like you'd actually be
>> saving that many memory fetches; but having only one copy in the cache,
>> rather than one copy per call site, would on the whole reduce both the
>> cache footprint and the total memory used (if only by a few bytes).
>
> If you put the owner pointer in the percpu_rwlock_t, wouldn't you have
> a copy per instance of percpu_rwlock_t? Surely this would use more cache
> than the handful of call-site references to a global variable.
>
>>
>> It also makes the code cleaner to have only one argument, rather than
>> two which must match; but since in all the places you use it you end up
>> using a wrapper to give you a single argument anyway, I don't think that
>> matters in this case. (i.e., if there's a good reason for having it at
>> the call site instead of in the struct, I'm fine with this approach).
>
> If you agree with my reasoning that passing the percpu_data as an argument
> is better for cache overhead and for read unlock performance, then I
> propose we keep the patches as they are.
>

Ping? I believe this is the last point of discussion before the patches
can go in.

Malcolm

>>
>> Everything else looks good, thanks.
>>
>> -George
>>
>
>