Date: Sun, 15 Aug 2010 10:10:17 -0400
From: Mathieu Desnoyers
To: Steven Rostedt, "Paul E. McKenney"
Cc: Peter Zijlstra, Linus Torvalds, Frederic Weisbecker, Ingo Molnar, LKML,
	Andrew Morton, Thomas Gleixner, Christoph Hellwig, Li Zefan,
	Lai Jiangshan, Johannes Berg, Masami Hiramatsu,
	Arnaldo Carvalho de Melo, Tom Zanussi, KOSAKI Motohiro, Andi Kleen,
	"H. Peter Anvin", Jeremy Fitzhardinge, "Frank Ch. Eigler", Tejun Heo
Subject: [RFD] Mechanical synchronization algorithms proofs (was: Re: [patch 1/2] x86_64 page fault NMI-safe)
Message-ID: <20100815141017.GB18175@Krystal>
List-ID: linux-kernel@vger.kernel.org

(I re-threaded this email because this is yet another topic that does not
have much to do with NMIs.)

* Steven Rostedt (rostedt@goodmis.org) wrote:
> On Fri, 2010-08-06 at 10:13 -0400, Mathieu Desnoyers wrote:
> > * Peter Zijlstra (peterz@infradead.org) wrote:
> >
> > Less code = less instruction cache overhead. I've also shown that the
> > LTTng code is at least twice as fast.
> > In terms of complexity, it is not much more complex; I also took the
> > extra care of doing the formal proofs to make sure the corner cases
> > were dealt with, which I reckon neither Steven nor you have done.
>
> Yes Mathieu, you did a formal proof. Good for you. But honestly, it is
> starting to get very annoying to hear you constantly stating that,
> because, to most kernel developers, it is meaningless.

So what you are stating is that having mechanically tested all possible
reorderings the architecture can perform, on enough unrolled loops of the
algorithm, covering all boundary cases (buffer full, NMI, etc.), is
"meaningless" to kernel developers?

This tells me I have an important explanation job to do, because this
approach can be tremendously useful for ensuring the rock-solidness of
the current kernel synchronization primitives. I vastly prefer it to the
"let's turn this knob and see what breaks and which reviewers will
object" approach that some kernel developers currently take.

> Any slight modification of your algorithm renders the proof invalid.

We can add the model and test cases somewhere in Documentation, and
require that the model be updated and the test suite re-run whenever
important synchronization changes are made.

> You are not the only one that has done a proof of an algorithm in the
> kernel. I know Paul McKenney has done formal proofs about RCU too.

I actually started from his work when I began looking into proving the
LTTng synchronization algorithms. I even worked with him on a formal
model of Userspace RCU. It ended up being a more generic proof framework
that I think can be useful beyond modeling algorithm corner cases: it
lets us model whole synchronization algorithms, while the framework per
se takes care of modeling memory and instruction reordering.

> but you are definitely the only one that constantly reminds people
> that you have done so. Congrats on your PhD, and in academia, proofs
> are important.
Even though you don't seem to see it or like it, this proof holds an
important place in the robustness we can expect from this ring buffer,
and I want people to go look at the model and ask questions about it if
they care.

Yes, modeling and doing formal proofs takes time. I understand that
kernel developers don't have much of it on their hands. But don't look
down on people who spent the time and effort to do it. With a more
collaborative mindset, we could try to automate the translation from C
to the out-of-order memory/instruction framework with a simple compiler,
so these proofs could be created with much less effort than is currently
required.

> But this is a ring buffer, not a critical part of the workings of the
> kernel. There are much more critical and fragile parts of the kernel
> that work fine without a formal proof.

I've come to find this out all by myself: in the LTTng tree, I receive
more bug reports about problems coming from the kernel than about
tracer-related problems. We seem to have slightly different definitions
of "working fine".

> Paul McKenney did a proof for RCU not for us, but just to help give
> him a warm fuzzy about it. RCU is much more complex than the ftrace
> ring buffer,

Not that much, I'm afraid. Well, the now-deprecated "preempt RCU" was a
complexity mess, but I think Paul did a good job of simplifying the
TREE_RCU preempt implementation, bringing the complexity down to
something we can model and understand in its entirety.

> and it also is much more critical. If Paul gets it wrong, a machine
> will crash.

He's right to worry. What if we start using the ring buffer as a core
kernel synchronization primitive in audio drivers, video drivers, etc.?
These are all implementing their own ring buffer flavor at the moment,
which duplicates code with similar functionality and multiplies the
reviewers' workload. Then the ring buffer suddenly becomes more critical
than you initially thought.
> And even Paul told me that no formal proof makes up for large-scale
> testing. Which, BTW, the ftrace ring buffer has gone through.

LTTng has gone through large-scale testing in terms of user base and
machine configurations too. I've also run it on the largest-scale
machines (in terms of CPUs) I could find: only 8 so far, but I'd love to
see results on larger machines. I can say for sure that Ftrace
performance has not been tested on large-scale SMP machines, simply
because your Ftrace benchmark uses only one writer thread and one reader
thread.

> Someday I may go ahead and do that proof,

Please do.

> but I did do a very intensive state diagram, and I'm quite confident
> that it works.

Convincing others is the hard part.

> It's been deployed for quite a bit, and the design has yet to be a
> factor in any bug report of the ring buffer.

Yeah, synchronization bugs are really hard to reproduce; I'm not that
surprised.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com