From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=BDae=QE=vger.kernel.org=netdev-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-2.3 required=3.0 tests=DKIM_INVALID,DKIM_SIGNED,
	HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_PASS,USER_AGENT_MUTT
	autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id E17A9C282C8
	for <netdev@archiver.kernel.org>; Mon, 28 Jan 2019 09:24:29 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.kernel.org (Postfix) with ESMTP id B1F152087F
	for <netdev@archiver.kernel.org>; Mon, 28 Jan 2019 09:24:29 +0000 (UTC)
Authentication-Results: mail.kernel.org;
	dkim=fail reason="signature verification failed" (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="uiQiFlBX"
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1726800AbfA1JY2 (ORCPT <rfc822;netdev@archiver.kernel.org>);
        Mon, 28 Jan 2019 04:24:28 -0500
Received: from merlin.infradead.org ([205.233.59.134]:33702 "EHLO
        merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1726647AbfA1JY2 (ORCPT
        <rfc822;netdev@vger.kernel.org>); Mon, 28 Jan 2019 04:24:28 -0500
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
        d=infradead.org; s=merlin.20170209; h=In-Reply-To:Content-Type:MIME-Version:
        References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To:
        Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date:
        Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id:
        List-Help:List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive;
         bh=Dhs0G08dFrRahelnBMrHg3TXWJ2h5RFGMYpqGVvsWes=; b=uiQiFlBX7OWpImwwzogYpbT7H
        omRd+Vyr/VucxEh4eQ/c+7vlib5cdVkX54SXH6wyHofzawVCRhjzKU5ujQDmNspi1mSTPF0hUc37b
        pcVK5oUHNWLMbr7yWlouR5gyy5P7BvAzv92W5QdeN7mR5lXAI0PFwbHuumear0t2SM5jxwOafmMWP
        R/PwSU/uz9m0/f1yWJ9dAtq/tYmJBoqYJiGdDES2LvZtF0Zk9op8XWCqW2ON39E5c9g1ogLcg/5/1
        guCGaA0Hqv2aZXs6MGwbEUeGKjxikzjkCkcRxbL/FdD3P429sqd2m/rRySOGSThx+h/Kg8TJQlpkk
        s+rLmacUg==;
Received: from j217100.upc-j.chello.nl ([24.132.217.100] helo=hirez.programming.kicks-ass.net)
        by merlin.infradead.org with esmtpsa (Exim 4.90_1 #2 (Red Hat Linux))
        id 1go39F-0005zC-Uk; Mon, 28 Jan 2019 09:24:10 +0000
Received: by hirez.programming.kicks-ass.net (Postfix, from userid 1000)
        id 2648720101B90; Mon, 28 Jan 2019 10:24:08 +0100 (CET)
Date:   Mon, 28 Jan 2019 10:24:08 +0100
From:   Peter Zijlstra <peterz@infradead.org>
To:     Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc:     Alexei Starovoitov <ast@kernel.org>, davem@davemloft.net,
        daniel@iogearbox.net, jakub.kicinski@netronome.com,
        netdev@vger.kernel.org, kernel-team@fb.com, mingo@redhat.com,
        will.deacon@arm.com, Paul McKenney <paulmck@linux.vnet.ibm.com>,
        jannh@google.com
Subject: Re: bpf memory model. Was: [PATCH v4 bpf-next 1/9] bpf: introduce
 bpf_spin_lock
Message-ID: <20190128092408.GD28467@hirez.programming.kicks-ass.net>
References: <20190124041403.2100609-1-ast@kernel.org>
 <20190124041403.2100609-2-ast@kernel.org>
 <20190124180109.GA27771@hirez.programming.kicks-ass.net>
 <20190124235857.xyb5xx2ufr6x5mbt@ast-mbp.dhcp.thefacebook.com>
 <20190125102312.GC4500@hirez.programming.kicks-ass.net>
 <20190126001725.roqqfrpysyljqiqx@ast-mbp.dhcp.thefacebook.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20190126001725.roqqfrpysyljqiqx@ast-mbp.dhcp.thefacebook.com>
User-Agent: Mutt/1.10.1 (2018-07-13)
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID: <netdev.vger.kernel.org>
X-Mailing-List: netdev@vger.kernel.org

On Fri, Jan 25, 2019 at 04:17:26PM -0800, Alexei Starovoitov wrote:
> On Fri, Jan 25, 2019 at 11:23:12AM +0100, Peter Zijlstra wrote:
> > On Thu, Jan 24, 2019 at 03:58:59PM -0800, Alexei Starovoitov wrote:
> > > On Thu, Jan 24, 2019 at 07:01:09PM +0100, Peter Zijlstra wrote:
> > 
> > > > And this would again be the moment where I go pester you about the BPF
> > > > memory model :-)
> > > 
> > > hehe :)
> > > How do you propose to define it in a way that it applies to all archs
> > > and yet doesn't penalize x86 ?
> > > "Assume minimum execution ordering model" the way kernel does
> > > unfortunately is not usable, since bpf doesn't have a luxury
> > > of using nice #defines that convert into nops on x86.
> > 
> > Why not? Surely the JIT can fix it up? That is, suppose you were to have
> > smp_rmb() as a eBPF instruction then the JIT (which knows what
> > architecture it is targeting) can simply avoid emitting instructions for
> > it.
> 
> I'm all for adding new instructions that solve real use cases.
> imo bpf_spin_lock() is the first step in helping with concurrency.
> At plumbers conference we agreed to add new sync_fetch_and_add()
> and xchg() instructions. That's a second step.
> smp_rmb/wmb/mb should be added as well.
> JITs will patch them depending on architecture.
> 
> What I want to avoid is to define the whole execution ordering model upfront.
> We cannot say that BPF ISA is weakly ordered like alpha.
> Most of the bpf progs are written and running on x86. We shouldn't
> twist bpf developer's arm by artificially relaxing memory model.
> BPF memory model is equal to memory model of underlying architecture.
> What we can do is to make it bpf progs a bit more portable with
> smp_rmb instructions, but we must not force weak execution on the developer.

Well, I agree with only introducing bits you actually need, and my
smp_rmb() example might have been poorly chosen, smp_load_acquire() /
smp_store_release() might have been a far more useful example.

But I disagree with the last part; we have to pick a model now;
otherwise you'll pain yourself into a corner.

Also; Alpha isn't very relevant these days; however ARM64 does seem to
be gaining a lot of attention and that is very much a weak architecture.
Adding strongly ordered assumptions to BPF now, will penalize them in
the long run.

> > Similarly; could something like this not also help with the spinlock
> > thing? Use that generic test-and-set thing for the interpreter, but
> > provide a better implementation in the JIT?
> 
> May be eventually. We can add cmpxchg insn, but the verifier still
> doesn't support loops. We made a lot of progress in bounded loop research
> over the last 2 years, but loops in bpf are still a year away.
> We considered adding 'bpf_spin_lock' as a new instruction instead of helper call,
> but that approach has a lot of negatives, so we went with the helper.

Ah, but the loop won't be in the BPF program itself. The BPF program
would only have had the BPF_SPIN_LOCK instruction, the JIT them emits
code similar to queued_spin_lock()/queued_spin_unlock() (or calls to
out-of-line versions of them).

There isn't anything that mandates the JIT uses the exact same locking
routines the interpreter does, is there?