From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mathieu Desnoyers
Subject: Re: [RFC PATCH v7 7/7] Restartable sequences: self-tests
Date: Mon, 25 Jul 2016 16:43:29 +0000 (UTC)
Message-ID: <1031089010.81455.1469465009989.JavaMail.zimbra@efficios.com>
References: <1469135662-31512-1-git-send-email-mathieu.desnoyers@efficios.com>
 <1469135662-31512-8-git-send-email-mathieu.desnoyers@efficios.com>
 <1590181502.79032.1469329777708.JavaMail.zimbra@efficios.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To:
Sender: linux-api-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
To: Dave Watson
Cc: Andrew Morton, Russell King, Thomas Gleixner, Ingo Molnar,
 "H. Peter Anvin", linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
 linux-api, Paul Turner, Andrew Hunter, Peter Zijlstra, Andy Lutomirski,
 Andi Kleen, Chris Lameter, Ben Maurer, rostedt, "Paul E. McKenney",
 Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
 Michael Kerrisk, Boqun Feng
List-Id: linux-api@vger.kernel.org

----- On Jul 24, 2016, at 2:01 PM, Dave Watson davejwatson-b10kYP2dOMg@public.gmane.org wrote:

>>> +static inline __attribute__((always_inline))
>>> +bool rseq_finish(struct rseq_lock *rlock,
>>> +                 intptr_t *p, intptr_t to_write,
>>> +                 struct rseq_state start_value)
>
>>> This ABI looks like it will work fine for our use case. I don't think it
>>> has been mentioned yet, but we may still need multiple asm blocks
>>> for differing numbers of writes. For example, an array-based freelist push:
>
>>> void push(void *obj) {
>>>     if (index < maxlen) {
>>>         freelist[index++] = obj;
>>>     }
>>> }
>
>>> would be more efficiently implemented with a two-write rseq_finish:
>
>>> rseq_finish2(&freelist[index], obj, // first write
>>>              &index, index + 1, // second write
>>>              ...);
>
>> Would pairing one rseq_start with two rseq_finish do the trick
>> there ?
>
> Yes, two rseq_finish works, as long as the extra rseq management overhead
> is not substantial.

The difference is actually not negligible. On x86-64
Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
(counter increment benchmark, single-thread):

* Single store per increment:                                   3.6 ns
* Two rseq_finish() per increment:                              5.2 ns
* rseq_finish2() with two mov instructions per rseq_finish2():  4.0 ns

And I expect the difference to be even larger on non-x86
architectures.

I'll try to figure out a way to do rseq_finish() and rseq_finish2()
without duplicating the code. Perhaps macros will be helpful there.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
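
For illustration only, here is a minimal sketch of how a single macro
could back both a one-store and a two-store finish, applied to the
array-based freelist push quoted above. This is not the code from the
patch: every name (sketch_state, sketch_events, SKETCH_COMMIT,
sketch_finish, sketch_finish2, freelist_push) is hypothetical, the
rlock parameter is left out, and a plain C counter comparison stands in
for the asm block that a real implementation would need in order to be
restartable. It also assumes that the last store is the one that
publishes the update.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot taken at the start of a sequence (not the rseq ABI). */
struct sketch_state {
	uint32_t event_counter;
};

/* Stand-in for "a preemption or signal happened"; bumped by hand here. */
static uint32_t sketch_events;

static inline struct sketch_state sketch_start(void)
{
	struct sketch_state s = { sketch_events };
	return s;
}

/*
 * Shared commit logic: if no event occurred since 'start', perform the
 * optional extra store(s), then the final store, and evaluate to true.
 * Otherwise store nothing and evaluate to false.
 */
#define SKETCH_COMMIT(start, extra_stores, p_final, v_final) \
	((start).event_counter == sketch_events \
	 ? ((extra_stores), *(p_final) = (v_final), true) \
	 : false)

/* One-store variant: no extra store before the final one. */
static inline bool sketch_finish(intptr_t *p, intptr_t to_write,
				 struct sketch_state start)
{
	return SKETCH_COMMIT(start, (void)0, p, to_write);
}

/* Two-store variant: one extra store, then the final store. */
static inline bool sketch_finish2(intptr_t *p_first, intptr_t v_first,
				  intptr_t *p_final, intptr_t v_final,
				  struct sketch_state start)
{
	return SKETCH_COMMIT(start, *p_first = v_first, p_final, v_final);
}

#define FREELIST_MAXLEN 4
static intptr_t freelist[FREELIST_MAXLEN];
static intptr_t freelist_index;

/* Freelist push: write the slot first, then publish the new index. */
static inline bool freelist_push(void *obj)
{
	for (;;) {
		struct sketch_state start = sketch_start();
		intptr_t i = freelist_index;

		if (i >= FREELIST_MAXLEN)
			return false;	/* list is full */
		if (sketch_finish2(&freelist[i], (intptr_t)obj,
				   &freelist_index, i + 1, start))
			return true;
		/* An event occurred since sketch_start(): retry. */
	}
}
```

In this sketch the slot is written before the index, so a reader that
trusts freelist_index never sees an unwritten slot. Whether a macro
like this carries over to the per-architecture asm is an open question
the sketch does not answer.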