From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.3 required=3.0 tests=DKIM_INVALID,DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_PASS,USER_AGENT_MUTT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E17A9C282C8 for ; Mon, 28 Jan 2019 09:24:29 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id B1F152087F for ; Mon, 28 Jan 2019 09:24:29 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="uiQiFlBX" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726800AbfA1JY2 (ORCPT ); Mon, 28 Jan 2019 04:24:28 -0500 Received: from merlin.infradead.org ([205.233.59.134]:33702 "EHLO merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726647AbfA1JY2 (ORCPT ); Mon, 28 Jan 2019 04:24:28 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=merlin.20170209; h=In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id: List-Help:List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive; bh=Dhs0G08dFrRahelnBMrHg3TXWJ2h5RFGMYpqGVvsWes=; b=uiQiFlBX7OWpImwwzogYpbT7H omRd+Vyr/VucxEh4eQ/c+7vlib5cdVkX54SXH6wyHofzawVCRhjzKU5ujQDmNspi1mSTPF0hUc37b pcVK5oUHNWLMbr7yWlouR5gyy5P7BvAzv92W5QdeN7mR5lXAI0PFwbHuumear0t2SM5jxwOafmMWP R/PwSU/uz9m0/f1yWJ9dAtq/tYmJBoqYJiGdDES2LvZtF0Zk9op8XWCqW2ON39E5c9g1ogLcg/5/1 guCGaA0Hqv2aZXs6MGwbEUeGKjxikzjkCkcRxbL/FdD3P429sqd2m/rRySOGSThx+h/Kg8TJQlpkk s+rLmacUg==; Received: from j217100.upc-j.chello.nl ([24.132.217.100] helo=hirez.programming.kicks-ass.net) by merlin.infradead.org with esmtpsa (Exim 4.90_1 #2 (Red Hat Linux)) id 1go39F-0005zC-Uk; Mon, 28 Jan 2019 09:24:10 +0000 Received: by hirez.programming.kicks-ass.net (Postfix, from userid 1000) id 2648720101B90; Mon, 28 Jan 2019 10:24:08 +0100 (CET) Date: Mon, 28 Jan 2019 10:24:08 +0100 From: Peter Zijlstra To: Alexei Starovoitov Cc: Alexei Starovoitov , davem@davemloft.net, daniel@iogearbox.net, jakub.kicinski@netronome.com, netdev@vger.kernel.org, kernel-team@fb.com, mingo@redhat.com, will.deacon@arm.com, Paul McKenney , jannh@google.com Subject: Re: bpf memory model. Was: [PATCH v4 bpf-next 1/9] bpf: introduce bpf_spin_lock Message-ID: <20190128092408.GD28467@hirez.programming.kicks-ass.net> References: <20190124041403.2100609-1-ast@kernel.org> <20190124041403.2100609-2-ast@kernel.org> <20190124180109.GA27771@hirez.programming.kicks-ass.net> <20190124235857.xyb5xx2ufr6x5mbt@ast-mbp.dhcp.thefacebook.com> <20190125102312.GC4500@hirez.programming.kicks-ass.net> <20190126001725.roqqfrpysyljqiqx@ast-mbp.dhcp.thefacebook.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190126001725.roqqfrpysyljqiqx@ast-mbp.dhcp.thefacebook.com> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org On Fri, Jan 25, 2019 at 04:17:26PM -0800, Alexei Starovoitov wrote: > On Fri, Jan 25, 2019 at 11:23:12AM +0100, Peter Zijlstra wrote: > > On Thu, Jan 24, 2019 at 03:58:59PM -0800, Alexei Starovoitov wrote: > > > On Thu, Jan 24, 2019 at 07:01:09PM +0100, Peter Zijlstra wrote: > > > > > > And this would again be the moment where I go pester you about the BPF > > > > memory model :-) > > > > > > hehe :) > > > How do you propose to define it in a way that it applies to all archs > > > and yet doesn't penalize x86 ? > > > "Assume minimum execution ordering model" the way kernel does > > > unfortunately is not usable, since bpf doesn't have a luxury > > > of using nice #defines that convert into nops on x86. > > > > Why not? Surely the JIT can fix it up? That is, suppose you were to have > > smp_rmb() as a eBPF instruction then the JIT (which knows what > > architecture it is targeting) can simply avoid emitting instructions for > > it. > > I'm all for adding new instructions that solve real use cases. > imo bpf_spin_lock() is the first step in helping with concurrency. > At plumbers conference we agreed to add new sync_fetch_and_add() > and xchg() instructions. That's a second step. > smp_rmb/wmb/mb should be added as well. > JITs will patch them depending on architecture. > > What I want to avoid is to define the whole execution ordering model upfront. > We cannot say that BPF ISA is weakly ordered like alpha. > Most of the bpf progs are written and running on x86. We shouldn't > twist bpf developer's arm by artificially relaxing memory model. > BPF memory model is equal to memory model of underlying architecture. > What we can do is to make it bpf progs a bit more portable with > smp_rmb instructions, but we must not force weak execution on the developer. Well, I agree with only introducing bits you actually need, and my smp_rmb() example might have been poorly chosen, smp_load_acquire() / smp_store_release() might have been a far more useful example. But I disagree with the last part; we have to pick a model now; otherwise you'll pain yourself into a corner. Also; Alpha isn't very relevant these days; however ARM64 does seem to be gaining a lot of attention and that is very much a weak architecture. Adding strongly ordered assumptions to BPF now, will penalize them in the long run. > > Similarly; could something like this not also help with the spinlock > > thing? Use that generic test-and-set thing for the interpreter, but > > provide a better implementation in the JIT? > > May be eventually. We can add cmpxchg insn, but the verifier still > doesn't support loops. We made a lot of progress in bounded loop research > over the last 2 years, but loops in bpf are still a year away. > We considered adding 'bpf_spin_lock' as a new instruction instead of helper call, > but that approach has a lot of negatives, so we went with the helper. Ah, but the loop won't be in the BPF program itself. The BPF program would only have had the BPF_SPIN_LOCK instruction, the JIT them emits code similar to queued_spin_lock()/queued_spin_unlock() (or calls to out-of-line versions of them). There isn't anything that mandates the JIT uses the exact same locking routines the interpreter does, is there?