Date: Wed, 22 Nov 2017 12:36:59 +0000 (UTC)
From: Mathieu Desnoyers
To: Thomas Gleixner
Cc: Andi Kleen, Peter Zijlstra, "Paul E. McKenney", Boqun Feng, Andy Lutomirski, Dave Watson, linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King, Ingo Molnar, "H. Peter Anvin", Andrew Hunter, Chris Lameter, Ben Maurer, rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon, Michael Kerrisk
Message-ID: <809252084.19901.1511354219731.JavaMail.zimbra@efficios.com>
Subject: Re: [RFC PATCH for 4.15 v12 00/22] Restartable sequences and CPU op vector
X-Mailing-List: linux-kernel@vger.kernel.org

----- On Nov 21, 2017, at 5:59 PM, Thomas Gleixner tglx@linutronix.de wrote:

> On Tue, 21 Nov 2017, Mathieu Desnoyers wrote:
>> ----- On Nov 21, 2017, at 12:21 PM, Andi Kleen andi@firstfloor.org wrote:
>>
>> > On Tue, Nov 21, 2017 at 09:18:38AM -0500, Mathieu Desnoyers wrote:
>> >> Hi,
>> >>
>> >> Following changes based on a thorough coding style and patch changelog
>> >> review from Thomas Gleixner and Peter Zijlstra, I'm respinning this
>> >> series for another RFC.
>> >>
>> > My suggestion would be that you also split out the opv system call.
>> > That seems to be the main contention point currently, and the restartable
>> > sequences should be useful without it.
>>
>> I consider rseq to be incomplete and a pain to use in various scenarios
>> without cpu_opv.
>>
>> About the contention point you refer to:
>>
>> Using vDSO as an example of how things should be done is just wrong: the
>> vDSO interaction with debugger instruction single-stepping is broken,
>> as I detailed in my previous email.
>
> Let me turn that around. You're lamenting about a conditional branch in
> your rseq thing for performance reasons and at the same time you want to
> force extra code into the VDSO? clock_gettime() is one of the hottest
> vsyscalls in certain scenarios. So why would we want to have extra code
> there? Just to make debuggers happy. You really can't be serious about
> that.

There is *already* an existing branch in the clock_gettime vsyscall: it's a
loop. Using that branch to do something else instead won't hurt the fast
path. It could even help the vDSO fast path on some non-x86 architectures
where branch prediction assumes that backward branches are always taken
(adding an unlikely() does not help in those cases).

>
>> Thomas' proposal of handling single-stepping with a user-space locking
>> fallback, which is pretty much what I had in 2016, pushes a lot of
>> complexity to user-space, requires an extra branch in the fast-path,
>> as well as additional store-release/load-acquire semantics for consistency.
>> I don't plan on going down that route.
>>
>> Other than that, I have not received any concrete alternative proposal to
>> properly handle single-stepping.
>
> You provided the details today. Up to that point all we had was handwaving
> and inconsistent information.

I mistakenly presumed you had taken an interest in the past two years of
discussions.
It appears I was wrong, and that information needed to be summarized in my
changelog. That was my mistake, and I have fixed it.

>
>> The only opposition against cpu_opv is that there *should* be a hypothetical
>> simpler solution. The rseq idea is not new: it's been presented by Paul Turner
>> in 2012 at LPC. And so far, cpu_opv is the overall simplest and most
>> efficient way I encountered to handle single-stepping, and it gives extra
>> benefits, as described in my changelog.
>
> That's how you define it and that does not make cpu_opv less complex and
> more debuggable. There is no way to debug that and still you claim that it
> removes complexity from user space.

So I should ask: what kind of observability within cpu_opv() do you want?
I can add a tracepoint for each operation, which would technically take care
of your concern. Your main counter-argument seems to be a tooling issue.

> That ops stuff comes from user space and
> is not magically constructed by the kernel. In some of your use cases it
> even has different semantics than the rseq section code. So how is that
> removing any complexity from user space? All it buys you is an extra branch
> less in your rseq hotpath and that's your justification to shove that
> thing into the kernel.

Actually, the cpu-op user-space library can hide this difference from the
user. I implemented the equivalent of the rseq algorithm using a
compare-and-store:

int cpu_op_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
		off_t voffp, intptr_t *load, int cpu)
{
	intptr_t oldv = READ_ONCE(*v);
	/* Address of the word at offset voffp within the object oldv points to. */
	intptr_t *newp = (intptr_t *)(oldv + voffp);
	int ret;

	/* Nothing to do when *v holds the "expect not" value. */
	if (oldv == expectnot)
		return 1;
	/* On @cpu: if *v still equals oldv, copy *newp into *v. */
	ret = cpu_op_cmpeqv_storep_expect_fault(v, oldv, newp, cpu);
	if (!ret) {
		*load = oldv;	/* Success: hand the old value back to the caller. */
		return 0;
	}
	if (ret > 0) {
		/* *v no longer matched oldv: let the caller retry. */
		errno = EAGAIN;
		return -1;
	}
	return -1;
}

So from a library user's perspective, the fast path and the slow path are
exactly the same.

>
> The version I reviewed was just undigestable.

Thanks for the thorough coding style review, by the way.
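The code below is a minimal single-threaded sketch of what cpu_op_cmpnev_storeoffp_load() computes, written so it can be compiled and run in user space. It uses the function as a linked-list pop: v is the list head, expectnot is NULL, and voffp is the offset of the next field.

It rests on some assumptions. cmpeqv_storep() is a hypothetical stand-in that replaces the cpu_opv system call with a C11 compare-and-swap, and struct node is made up for the example. Neither comes from the cpu-op library.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical list node: next is an intptr_t so it can be read through voffp. */
struct node {
	int value;
	intptr_t next;
};

/*
 * Hypothetical stand-in for cpu_op_cmpeqv_storep_expect_fault(): if *v still
 * equals expect, copy *newp into *v. Returns 0 on success, 1 if *v changed.
 * A C11 compare-and-swap replaces the cpu_opv system call (no per-CPU targeting).
 */
static int cmpeqv_storep(_Atomic intptr_t *v, intptr_t expect,
			 const intptr_t *newp)
{
	return atomic_compare_exchange_strong(v, &expect, *newp) ? 0 : 1;
}

/* Same control flow and return convention as cpu_op_cmpnev_storeoffp_load(). */
static int cmpnev_storeoffp_load(_Atomic intptr_t *v, intptr_t expectnot,
				 off_t voffp, intptr_t *load)
{
	intptr_t oldv = atomic_load(v);
	const intptr_t *newp;
	int ret;

	assert(load != NULL);
	if (oldv == expectnot)
		return 1;		/* e.g. the list is empty */
	/* Address of the word at offset voffp inside the object oldv points to. */
	newp = (const intptr_t *)(oldv + voffp);
	ret = cmpeqv_storep(v, oldv, newp);
	if (!ret) {
		*load = oldv;		/* report the popped element */
		return 0;
	}
	errno = EAGAIN;			/* *v changed between the load and the CAS */
	return -1;
}
```

With a two-element list a -> b, the first call pops a and leaves the head at b. The second call pops b and leaves the head at NULL. A third call returns 1.

The stand-in is not per-CPU. A plain compare-and-swap pop is also exposed to the ABA problem if it is called concurrently. The sketch is only meant to show that the fast path and the slow path can present the same return convention to a library user.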
> I did not have time to look
> at the hastily cobbled together version of today. Aside of that the
> scheduler portion of it has not seen any review from scheduler folks
> either.

True. It appears that it really takes a merge window to get some people's
attention. That's OK, you guys are really busy with other stuff. It's just
unfortunate that the feedback about the cpu_opv concept did not come sooner,
e.g. during the first rounds of patches where the cpu_opv design was
presented, or even at KS.

>
> AFAICT there is not a single reviewed-by tag on the sys_rseq and the
> sys_opv patches either.

Very good point! Can anyone in CC who cares about getting this in find time
to do an official review?

>
> Are you seriously expecting that new syscalls of that kind are going to be
> merged without a deep and thorough review just based on your decision to
> declare them ready?

In my reply to Andi, I merely stated that I'm not willing to push a
half-baked user-space ABI into the kernel, and that rseq without cpu_opv is
only part of the solution. Let's see if others find time to do an official
review.

Thanks,

Mathieu

>
> Thanks,
>
> tglx

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com