Subject: Re: [PATCH] arm64: do_csum: implement accelerated scalar version
From: Robin Murphy
To: Ard Biesheuvel, Ilias Apalodimas, Catalin Marinas
Cc: Steve Capper, Will Deacon, linux-arm-kernel, "huanglingyan (A)"
Date: Thu, 28 Feb 2019 15:13:59 +0000
Message-ID: <93697477-4dcc-4ab2-c838-2f487d334c56@arm.com>
References: <20190218230842.11448-1-ard.biesheuvel@linaro.org>
 <20190219150848.GA26652@apalos>

Hi Ard,

On 28/02/2019 14:16, Ard Biesheuvel wrote:
> (+ Catalin)
>
> On Tue, 19 Feb 2019 at 16:08, Ilias Apalodimas wrote:
>>
>> On Tue, Feb 19, 2019 at 12:08:42AM +0100, Ard Biesheuvel wrote:
>>> It turns out that the IP checksumming code is still exercised often,
>>> even though one might expect that modern NICs with checksum offload
>>> have no use for it. However, as Lingyan points out, there are
>>> combinations of features where the network stack may still fall back
>>> to software checksumming, and so it makes sense to provide an
>>> optimized implementation in software as well.
>>>
>>> So provide an implementation of do_csum() in scalar assembler, which,
>>> unlike C, gives direct access to the carry flag, making the code run
>>> substantially faster. The routine uses overlapping 64 byte loads for
>>> all input size > 64 bytes, in order to reduce the number of branches
>>> and improve performance on cores with deep pipelines.
>>>
>>> On Cortex-A57, this implementation is on par with Lingyan's NEON
>>> implementation, and roughly 7x as fast as the generic C code.
>>>
>>> Cc: "huanglingyan (A)"
>>> Signed-off-by: Ard Biesheuvel
> ...
>>
>> Acked-by: Ilias Apalodimas
>
> Full patch here
>
> https://lore.kernel.org/linux-arm-kernel/20190218230842.11448-1-ard.biesheuvel@linaro.org/
>
> This was a follow-up to some discussions about Lingyan's NEON code,
> CC'ed to netdev@ so people could chime in as to whether we need
> accelerated checksumming code in the first place.

FWIW ever since we did ip_fast_csum() I've been meaning to see how well
I can do with a similar tweaked C implementation for this (mostly for
fun).
Since I've recently dug out my RK3328 box for other reasons I'll give
this a test - that's a weedy little quad-A53 whose GbE hardware
checksumming is slightly busted and has to be turned off, so the
do_csum() overhead under heavy network load is comparatively massive.
(plus it's non-EFI so I should be able to try big-endian easily too)

The asm looks pretty reasonable to me - instinct says there's
*possibly* some value for out-of-order cores in doing the 8-way
accumulations in a more pairwise fashion, but I guess either way the
carry flag dependency is going to dominate, so it may well be moot.
What may be more worthwhile is taking the effort to align the source
pointer, at least for larger inputs, so as to be kinder to little
cores - according to its optimisation guide, A55 is fairly sensitive to
unaligned loads, so I'd assume that's true of its older/smaller friends
too.

I'll see what I can measure in practice - until proven otherwise I'd
have no great objection to merging this patch as-is if the need is
real. Improvements can always come later :)

Robin.
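
For reference, the point in the quoted commit message about C lacking
direct access to the carry flag can be illustrated with a small
portable C sketch. This is neither the patch under discussion nor the
kernel's generic do_csum(); the names add64_carry(), fold64(),
csum_ref(), csum_fast() and csum_selfcheck() are hypothetical and exist
only for illustration. It sums the buffer 64 bits at a time, recovering
each carry with an unsigned comparison (roughly the step that asm gets
for free from the flags), then folds the result to 16 bits. It makes no
attempt at overlapping loads, pairwise accumulation or source
alignment, and it defines the sum relative to the start of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Ones' complement add: fold the carry out of bit 63 back into bit 0. */
static inline uint64_t add64_carry(uint64_t sum, uint64_t b)
{
    sum += b;
    /* With unsigned wraparound, a carry out occurred iff sum < b. */
    return sum + (sum < b);
}

/* Fold a 64-bit ones' complement sum down to 16 bits. */
static inline uint16_t fold64(uint64_t sum)
{
    sum = (sum >> 32) + (sum & 0xffffffffu); /* at most 33 bits */
    sum = (sum >> 32) + (sum & 0xffffffffu); /* fits in 32 bits */
    sum = (sum >> 16) + (sum & 0xffffu);     /* at most 17 bits */
    sum = (sum >> 16) + (sum & 0xffffu);     /* fits in 16 bits */
    return (uint16_t)sum;
}

/* Reference: one native-endian 16-bit word at a time. */
static uint16_t csum_ref(const unsigned char *buf, size_t len)
{
    uint32_t sum = 0;

    while (len >= 2) {
        uint16_t w;

        memcpy(&w, buf, 2);
        sum += w;
        sum = (sum >> 16) + (sum & 0xffffu);
        buf += 2;
        len -= 2;
    }
    if (len) {
        uint16_t w = 0; /* zero-pad the trailing odd byte */

        memcpy(&w, buf, 1);
        sum += w;
        sum = (sum >> 16) + (sum & 0xffffu);
    }
    return (uint16_t)sum;
}

/* Sketch: 64-bit accumulation; memcpy keeps unaligned loads well defined. */
static uint16_t csum_fast(const unsigned char *buf, size_t len)
{
    uint64_t sum = 0;

    while (len >= 8) {
        uint64_t w;

        memcpy(&w, buf, 8);
        sum = add64_carry(sum, w);
        buf += 8;
        len -= 8;
    }
    if (len) {
        uint64_t w = 0; /* zero-pad the 1..7 byte tail */

        memcpy(&w, buf, len);
        sum = add64_carry(sum, w);
    }
    return fold64(sum);
}

/* Cross-check the two routines for every length from 0 to 257 bytes. */
static void csum_selfcheck(void)
{
    unsigned char buf[257];
    size_t i, len;

    for (i = 0; i < sizeof(buf); i++)
        buf[i] = (unsigned char)(i * 37 + 11);
    for (len = 0; len <= sizeof(buf); len++)
        assert(csum_fast(buf, len) == csum_ref(buf, len));
}
```

Calling csum_selfcheck() once is enough to compare the two routines on
a fixed byte pattern. Both return the 16-bit sum in the byte order of
the host's memory loads, so the numeric value differs between little-
and big-endian machines.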