From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=auZG=QG=vger.kernel.org=netdev-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-2.3 required=3.0 tests=DKIM_INVALID,DKIM_SIGNED,
	HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_PASS,USER_AGENT_MUTT
	autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id AE570C282D6
	for <netdev@archiver.kernel.org>; Wed, 30 Jan 2019 08:59:15 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.kernel.org (Postfix) with ESMTP id 7451920881
	for <netdev@archiver.kernel.org>; Wed, 30 Jan 2019 08:59:15 +0000 (UTC)
Authentication-Results: mail.kernel.org;
	dkim=fail reason="signature verification failed" (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="sfZP9mxs"
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1730499AbfA3I7O (ORCPT <rfc822;netdev@archiver.kernel.org>);
        Wed, 30 Jan 2019 03:59:14 -0500
Received: from bombadil.infradead.org ([198.137.202.133]:36386 "EHLO
        bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1730494AbfA3I7N (ORCPT
        <rfc822;netdev@vger.kernel.org>); Wed, 30 Jan 2019 03:59:13 -0500
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
        d=infradead.org; s=bombadil.20170209; h=In-Reply-To:Content-Type:MIME-Version
        :References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To:
        Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date:
        Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id:
        List-Help:List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive;
         bh=yLQCRMgiMTUH4OREiJrF1elAv0lGn56fRYhJ/AuQfIQ=; b=sfZP9mxsmybhguPiczsrcdSHr
        zihTrYZWH4+5FYpMMpWO6aAlzoNdQBVYJyDmWjA5kDrF7zFWU/zNvT0+GooQMW+YyOOa9KIEgsz0N
        85e4cnzrcEenQrHdCo/4gJ2JO6BTky2LwDVTcr0+8yK+NnhMVxQJcipRC9Fc7MsFU5BTPZThP6p5w
        26IxdWNzVjADjKUX+0wjo3adTB9q+BFCVuQ0deGirDoxMP9GSUGnTIgwnPmyG9BR5WTRIZ59CrIDj
        uV6LGfZ9FUXM3b4kREq3MdUOShJbcCDu/NQx8NjVG3ZV21r6laOwoCmo6oATx1QYRH7SqhkjuHC51
        l76/evVcQ==;
Received: from j217100.upc-j.chello.nl ([24.132.217.100] helo=hirez.programming.kicks-ass.net)
        by bombadil.infradead.org with esmtpsa (Exim 4.90_1 #2 (Red Hat Linux))
        id 1golhr-0002jM-SF; Wed, 30 Jan 2019 08:58:55 +0000
Received: by hirez.programming.kicks-ass.net (Postfix, from userid 1000)
        id 19C2B20289CC5; Wed, 30 Jan 2019 09:58:50 +0100 (CET)
Date:   Wed, 30 Jan 2019 09:58:50 +0100
From:   Peter Zijlstra <peterz@infradead.org>
To:     Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc:     Alexei Starovoitov <ast@kernel.org>, davem@davemloft.net,
        daniel@iogearbox.net, jakub.kicinski@netronome.com,
        netdev@vger.kernel.org, kernel-team@fb.com, mingo@redhat.com,
        will.deacon@arm.com, Paul McKenney <paulmck@linux.vnet.ibm.com>,
        jannh@google.com
Subject: Re: bpf memory model. Was: [PATCH v4 bpf-next 1/9] bpf: introduce
 bpf_spin_lock
Message-ID: <20190130085850.GA2278@hirez.programming.kicks-ass.net>
References: <20190124041403.2100609-1-ast@kernel.org>
 <20190124041403.2100609-2-ast@kernel.org>
 <20190124180109.GA27771@hirez.programming.kicks-ass.net>
 <20190124235857.xyb5xx2ufr6x5mbt@ast-mbp.dhcp.thefacebook.com>
 <20190125102312.GC4500@hirez.programming.kicks-ass.net>
 <20190126001725.roqqfrpysyljqiqx@ast-mbp.dhcp.thefacebook.com>
 <20190128092408.GD28467@hirez.programming.kicks-ass.net>
 <20190128215623.6eqskzhklydhympa@ast-mbp>
 <20190129091654.GD28485@hirez.programming.kicks-ass.net>
 <20190130023212.zs4d6hws5tsfl5uc@ast-mbp.dhcp.thefacebook.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20190130023212.zs4d6hws5tsfl5uc@ast-mbp.dhcp.thefacebook.com>
User-Agent: Mutt/1.10.1 (2018-07-13)
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID: <netdev.vger.kernel.org>
X-Mailing-List: netdev@vger.kernel.org

On Tue, Jan 29, 2019 at 06:32:13PM -0800, Alexei Starovoitov wrote:
> On Tue, Jan 29, 2019 at 10:16:54AM +0100, Peter Zijlstra wrote:
> > On Mon, Jan 28, 2019 at 01:56:24PM -0800, Alexei Starovoitov wrote:
> > > On Mon, Jan 28, 2019 at 10:24:08AM +0100, Peter Zijlstra wrote:
> > 
> > > > Ah, but the loop won't be in the BPF program itself. The BPF program
> > > > would only have had the BPF_SPIN_LOCK instruction, the JIT them emits
> > > > code similar to queued_spin_lock()/queued_spin_unlock() (or calls to
> > > > out-of-line versions of them).
> > > 
> > > As I said we considered exactly that and such approach has a lot of downsides
> > > comparing with the helper approach.
> > > Pretty much every time new feature is added we're evaluating whether it
> > > should be new instruction or new helper. 99% of the time we go with new helper.
> > 
> > Ah; it seems I'm confused on helper vs instruction. As in, I've no idea
> > what a helper is.
> 
> bpf helper is a normal kernel function that can be called from bpf program.
> In assembler it's a direct function call.

Ah, it is what is otherwise known as a standard library,

> > > > There isn't anything that mandates the JIT uses the exact same locking
> > > > routines the interpreter does, is there?
> > > 
> > > sure. This bpf_spin_lock() helper can be optimized whichever way the kernel wants.
> > > Like bpf_map_lookup_elem() call is _inlined_ by the verifier for certain map types.
> > > JITs don't even need to do anything. It looks like function call from bpf prog
> > > point of view, but in JITed code it is a sequence of native instructions.
> > > 
> > > Say tomorrow we find out that bpf_prog->bpf_spin_lock()->queued_spin_lock()
> > > takes too much time then we can inline fast path of queued_spin_lock
> > > directly into bpf prog and save function call cost.
> > 
> > OK, so then the JIT can optimize helpers. Would it not make sense to
> > have the simple test-and-set spinlock in the generic code and have the
> > JITs use arch_spinlock_t where appropriate?
> 
> I think that pretty much the same as what I have with qspinlock.
> Instead of taking a risk how JIT writers implement bpf_spin_lock optimization
> I'm using qspinlock on architectures that are known to support it.

I see the argument for it...

> So instead of starting with dumb test-and-set there will be faster
> qspinlock from the start on x86, arm64 and few others archs.
> Those are the archs we care about the most anyway. Other archs can take
> time to optimize it (if optimizations are necessary at all).
> In general hacking JITs is much harder and more error prone than
> changing core and adding helpers. Hence we avoid touching JITs
> as much as possible.

So archs/JITs are not trivially able to override those helper functions?
Because for example ARM (32bit) doesn't do qspinlock but it's
arch_spinlock_t is _much_ better than a TAS lock.

> Like map_lookup inlining optimization we do only when JIT is on.
> And we do it purely in the generic core. See array_map_gen_lookup().
> We generate bpf instructions only to feed them into JITs so they
> can replace them with native asm. That is much easier to implement
> correctly than if we were doing inlining in every JIT independently.

That seems prudent and fairly painful at the same time. I see why you
would want to share as much of it as possible, but being restricted to
BPF instructions seems very limiting.

Anyway, I'll not press the issue; but I do think that arch specific
helpers would make sense here.