From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.3 required=3.0 tests=DKIM_INVALID,DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_PASS,USER_AGENT_MUTT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AE570C282D6 for ; Wed, 30 Jan 2019 08:59:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 7451920881 for ; Wed, 30 Jan 2019 08:59:15 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="sfZP9mxs" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730499AbfA3I7O (ORCPT ); Wed, 30 Jan 2019 03:59:14 -0500 Received: from bombadil.infradead.org ([198.137.202.133]:36386 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730494AbfA3I7N (ORCPT ); Wed, 30 Jan 2019 03:59:13 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20170209; h=In-Reply-To:Content-Type:MIME-Version :References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id: List-Help:List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive; bh=yLQCRMgiMTUH4OREiJrF1elAv0lGn56fRYhJ/AuQfIQ=; b=sfZP9mxsmybhguPiczsrcdSHr zihTrYZWH4+5FYpMMpWO6aAlzoNdQBVYJyDmWjA5kDrF7zFWU/zNvT0+GooQMW+YyOOa9KIEgsz0N 85e4cnzrcEenQrHdCo/4gJ2JO6BTky2LwDVTcr0+8yK+NnhMVxQJcipRC9Fc7MsFU5BTPZThP6p5w 26IxdWNzVjADjKUX+0wjo3adTB9q+BFCVuQ0deGirDoxMP9GSUGnTIgwnPmyG9BR5WTRIZ59CrIDj uV6LGfZ9FUXM3b4kREq3MdUOShJbcCDu/NQx8NjVG3ZV21r6laOwoCmo6oATx1QYRH7SqhkjuHC51 l76/evVcQ==; Received: from j217100.upc-j.chello.nl ([24.132.217.100] helo=hirez.programming.kicks-ass.net) by bombadil.infradead.org with esmtpsa (Exim 4.90_1 #2 (Red Hat Linux)) id 1golhr-0002jM-SF; Wed, 30 Jan 2019 08:58:55 +0000 Received: by hirez.programming.kicks-ass.net (Postfix, from userid 1000) id 19C2B20289CC5; Wed, 30 Jan 2019 09:58:50 +0100 (CET) Date: Wed, 30 Jan 2019 09:58:50 +0100 From: Peter Zijlstra To: Alexei Starovoitov Cc: Alexei Starovoitov , davem@davemloft.net, daniel@iogearbox.net, jakub.kicinski@netronome.com, netdev@vger.kernel.org, kernel-team@fb.com, mingo@redhat.com, will.deacon@arm.com, Paul McKenney , jannh@google.com Subject: Re: bpf memory model. Was: [PATCH v4 bpf-next 1/9] bpf: introduce bpf_spin_lock Message-ID: <20190130085850.GA2278@hirez.programming.kicks-ass.net> References: <20190124041403.2100609-1-ast@kernel.org> <20190124041403.2100609-2-ast@kernel.org> <20190124180109.GA27771@hirez.programming.kicks-ass.net> <20190124235857.xyb5xx2ufr6x5mbt@ast-mbp.dhcp.thefacebook.com> <20190125102312.GC4500@hirez.programming.kicks-ass.net> <20190126001725.roqqfrpysyljqiqx@ast-mbp.dhcp.thefacebook.com> <20190128092408.GD28467@hirez.programming.kicks-ass.net> <20190128215623.6eqskzhklydhympa@ast-mbp> <20190129091654.GD28485@hirez.programming.kicks-ass.net> <20190130023212.zs4d6hws5tsfl5uc@ast-mbp.dhcp.thefacebook.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190130023212.zs4d6hws5tsfl5uc@ast-mbp.dhcp.thefacebook.com> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org On Tue, Jan 29, 2019 at 06:32:13PM -0800, Alexei Starovoitov wrote: > On Tue, Jan 29, 2019 at 10:16:54AM +0100, Peter Zijlstra wrote: > > On Mon, Jan 28, 2019 at 01:56:24PM -0800, Alexei Starovoitov wrote: > > > On Mon, Jan 28, 2019 at 10:24:08AM +0100, Peter Zijlstra wrote: > > > > > > Ah, but the loop won't be in the BPF program itself. The BPF program > > > > would only have had the BPF_SPIN_LOCK instruction, the JIT them emits > > > > code similar to queued_spin_lock()/queued_spin_unlock() (or calls to > > > > out-of-line versions of them). > > > > > > As I said we considered exactly that and such approach has a lot of downsides > > > comparing with the helper approach. > > > Pretty much every time new feature is added we're evaluating whether it > > > should be new instruction or new helper. 99% of the time we go with new helper. > > > > Ah; it seems I'm confused on helper vs instruction. As in, I've no idea > > what a helper is. > > bpf helper is a normal kernel function that can be called from bpf program. > In assembler it's a direct function call. Ah, it is what is otherwise known as a standard library, > > > > There isn't anything that mandates the JIT uses the exact same locking > > > > routines the interpreter does, is there? > > > > > > sure. This bpf_spin_lock() helper can be optimized whichever way the kernel wants. > > > Like bpf_map_lookup_elem() call is _inlined_ by the verifier for certain map types. > > > JITs don't even need to do anything. It looks like function call from bpf prog > > > point of view, but in JITed code it is a sequence of native instructions. > > > > > > Say tomorrow we find out that bpf_prog->bpf_spin_lock()->queued_spin_lock() > > > takes too much time then we can inline fast path of queued_spin_lock > > > directly into bpf prog and save function call cost. > > > > OK, so then the JIT can optimize helpers. Would it not make sense to > > have the simple test-and-set spinlock in the generic code and have the > > JITs use arch_spinlock_t where appropriate? > > I think that pretty much the same as what I have with qspinlock. > Instead of taking a risk how JIT writers implement bpf_spin_lock optimization > I'm using qspinlock on architectures that are known to support it. I see the argument for it... > So instead of starting with dumb test-and-set there will be faster > qspinlock from the start on x86, arm64 and few others archs. > Those are the archs we care about the most anyway. Other archs can take > time to optimize it (if optimizations are necessary at all). > In general hacking JITs is much harder and more error prone than > changing core and adding helpers. Hence we avoid touching JITs > as much as possible. So archs/JITs are not trivially able to override those helper functions? Because for example ARM (32bit) doesn't do qspinlock but it's arch_spinlock_t is _much_ better than a TAS lock. > Like map_lookup inlining optimization we do only when JIT is on. > And we do it purely in the generic core. See array_map_gen_lookup(). > We generate bpf instructions only to feed them into JITs so they > can replace them with native asm. That is much easier to implement > correctly than if we were doing inlining in every JIT independently. That seems prudent and fairly painful at the same time. I see why you would want to share as much of it as possible, but being restricted to BPF instructions seems very limiting. Anyway, I'll not press the issue; but I do think that arch specific helpers would make sense here.