From mboxrd@z Thu Jan 1 00:00:00 1970
From: Toke Høiland-Jørgensen
To: "Jason A. Donenfeld", Pascal Van Leeuwen
Cc: Ard Biesheuvel, Linux Crypto Mailing List, linux-arm-kernel,
 Herbert Xu, David Miller, Greg KH, Linus Torvalds, Samuel Neves,
 Dan Carpenter, Arnd Bergmann, Eric Biggers, Andy Lutomirski,
 Will Deacon, Marc Zyngier, Catalin Marinas, Willy Tarreau, Netdev,
 Dave Taht
Subject: Re: chapoly acceleration hardware [Was: Re: [RFC PATCH 00/18]
 crypto: wireguard using the existing crypto API]
References: <20190925161255.1871-1-ard.biesheuvel@linaro.org>
Date: Thu, 26 Sep 2019 13:38:36 +0200
X-Clacks-Overhead: GNU Terry Pratchett
Message-ID: <8736gj2soz.fsf@toke.dk>
MIME-Version: 1.0
Content-Type: text/plain
X-Mailing-List: linux-crypto@vger.kernel.org

"Jason A. Donenfeld" writes:

> [CC +willy, toke, dave, netdev]
>
> Hi Pascal
>
> On Thu, Sep 26, 2019 at 12:19 PM Pascal Van Leeuwen
> wrote:
>> Actually, that assumption is factually wrong. I don't know if anything
>> is *publicly* available, but I can assure you the silicon is running in
>> labs already. And something will be publicly available early next year
>> at the latest. Which could nicely coincide with having Wireguard support
>> in the kernel (which I would also like to see happen BTW) ...
>>
>> Not "at some point". It will. Very soon. Maybe not in consumer or server
>> CPUs, but definitely in the embedded (networking) space.
>> And it *will* be much faster than the embedded CPU next to it, so it will
>> be worth using it for something like bulk packet encryption.
>
> Super! I was wondering if you could speak a bit more about the
> interface. My biggest questions surround latency. Will it be
> synchronous or asynchronous? If the latter, why? What will its
> latencies be? How deep will its buffers be? The reason I ask is that a
> lot of crypto acceleration hardware of the past has been fast and
> having very deep buffers, but at great expense of latency. In the
> networking context, keeping latency low is pretty important. Already
> WireGuard is multi-threaded which isn't super great all the time for
> latency (improvements are a work in progress). If you're involved with
> the design of the hardware, perhaps this is something you can help
> ensure winds up working well? For example, AES-NI is straightforward
> and good, but Intel can do that because they are the CPU. It sounds
> like your silicon will be adjacent. How do you envision this working
> in a low latency environment?

Being asynchronous doesn't *necessarily* have to hurt latency; you just
need the right queue back-pressure.

We already have multiple queues in the stack. With an async crypto
engine we would go from something like:

stack -> [qdisc] -> wg if -> [wireguard buffer] -> netdev driver ->
device -> [device buffer] -> wire

to

stack -> [qdisc] -> wg if -> [wireguard buffer] -> crypto stack ->
crypto device -> [crypto device buffer] -> wg post-crypto -> netdev
driver -> device -> [device buffer] -> wire

(where everything in [] is a packet queue).

The wireguard buffer is the source of the latency you're alluding to
above (the comment about multi-threaded behaviour), so we probably need
to fix that anyway. For the device buffer we have BQL to keep it at a
minimum. So that leaves the buffering in the crypto offload device. If
we add something like BQL to the crypto offload drivers, we could
conceivably avoid having that add a significant amount of latency. In
fact, doing so may benefit other users of crypto offloads as well, no?
Presumably ipsec has this same issue?

Caveat: I am fairly ignorant about the inner workings of the crypto
subsystem, so please excuse any inaccuracies in the above; the diagrams
are solely for illustrative purposes...
:)

-Toke
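
The BQL-like back-pressure suggested above for crypto offload drivers
can be sketched roughly as follows. This is only a toy model under
stated assumptions: the class ByteLimitedQueue and its methods are
hypothetical names, not a kernel or crypto API, and the byte limit is
fixed, whereas real BQL adjusts its limit at runtime. The point is just
that once the byte budget of the crypto device queue is used up, further
packets stay upstream (where a qdisc can manage them) instead of piling
up in the device buffer.

```python
from collections import deque


class ByteLimitedQueue:
    """Toy stand-in for a crypto device queue with a fixed byte budget.

    Hypothetical illustration only; not a kernel API. Real BQL adapts
    its limit dynamically, which this sketch does not attempt.
    """

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.inflight_bytes = 0
        self.pending = deque()  # sizes of packets handed to the "device"

    def try_submit(self, pkt_len):
        """Queue a packet unless the byte budget is already used up."""
        # The budget is checked before adding, so one packet may overshoot it.
        if self.inflight_bytes >= self.limit_bytes:
            return False  # back-pressure: caller keeps the packet upstream
        self.pending.append(pkt_len)
        self.inflight_bytes += pkt_len
        return True

    def complete_one(self):
        """The "device" finished its oldest packet; release its bytes."""
        pkt_len = self.pending.popleft()
        self.inflight_bytes -= pkt_len
        return pkt_len


if __name__ == "__main__":
    q = ByteLimitedQueue(limit_bytes=3000)
    # Offer four 1500-byte packets; only the first two fit the budget.
    print([q.try_submit(1500) for _ in range(4)])
    # A completion frees budget, so the next submission is accepted again.
    q.complete_one()
    print(q.try_submit(1500))
```

With a small budget the device queue never holds more than roughly
limit_bytes of work, so the queueing delay it adds is bounded by how
long the device takes to process that many bytes.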