Date: Mon, 30 Mar 2026 10:31:42 +0100
From: David Laight
To: Eric Biggers
Cc: Demian Shulhan, linux-crypto@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, ardb@kernel.org
Subject: Re: [PATCH v3] lib/crc: arm64: add NEON accelerated CRC64-NVMe implementation
Message-ID: <20260330103142.193e2a98@pumpkin>
In-Reply-To: <20260329221821.GC2106@quark>
References: <20260317065425.2684093-1-demyansh@gmail.com> <20260329074338.1053550-1-demyansh@gmail.com> <20260329203829.GA2746@quark> <20260329225704.0eb82966@pumpkin> <20260329221821.GC2106@quark>

On Sun, 29 Mar 2026 15:18:21 -0700
Eric Biggers wrote:

> On Sun, Mar 29, 2026 at 10:57:04PM +0100, David Laight wrote:
> > Final thought:
> > Is that allowing for the cost of kernel_fpu_begin()? - which I think only
> > affects the first call.
> > And the cost of the data-cache misses for the lookup table reads? - again
> > worse for the first call.
>
> I assume you mean kernel_neon_begin(). This is an arm64 patch.

Well, much the same.

> (I encourage you to actually read the code. You seem to send a lot of
> speculation-heavy comments without actually reading the code.)

I have looked at the code, and since I (mostly) understand the maths I can
almost work out what it is doing - but all the conversions between three
different ways of holding two 64-bit values in one 128-bit register really
don't help.

> Currently, the benchmark in crc_kunit just measures the throughput in a
> loop (as has been discussed before). So no, it doesn't currently
> capture the overhead of pulling code and data into cache. For NEON
> register use it captures only the amortized overhead.
>
> Note that using PMULL saves having to pull the table into memory, while
> using the table is a bit less code and saves having to use kernel-mode
> NEON. So both have their advantages and disadvantages.

Indeed - so the 128 is really a 'finger in the air' value :-)

> This patch does fall back to the table for the last 'len & ~15' bytes,
> which means the table may be needed anyway.

Nibble lookups on two separate tables (256 bytes instead of 2k) might be
almost as fast, even with the tables in the cache.
The critical part of the table lookup loop should be the:
	crc = crc ^ table[crc & 0xff]
part (the rotate should get hidden in the memory read latency).
With nibble tables it becomes:
	crc = crc ^ table_lo[crc & 0xf] ^ table_hi[(crc & 0xf0) >> 4]
On any modern CPU the two table lookups will happen in parallel, so it
should just add one 'xor' to the loop.
(And yes, I probably could measure it, at least in userspace on x86-64.)

> That is not the optimal way
> to do it, and it's something to address later when this is replaced with
> something similar to x86's crc-pclmul-template.S.

That is one bit I do need to grok...

	David

>
> - Eric
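
For anyone following the thread, the nibble-table idea above can be sketched
in userspace C roughly as follows. This is only an illustration, not code
from the patch or from lib/crc. The function and table names are made up,
the reflected polynomial constant is assumed to be the CRC64-NVMe one, and
the initial and final inversion of the CRC is left to the caller. Because
the CRC is linear over XOR, the byte-table entry for an index equals the
entry for its low nibble XORed with the entry for its high nibble, which is
what lets the 2k table be replaced by two 16-entry tables.

```c
#include <assert.h>	/* for callers that assert on crc64_selfcheck() */
#include <stddef.h>
#include <stdint.h>

/* Assumption: reflected (LSB-first) CRC64-NVMe polynomial. */
#define CRC64_POLY_REFLECTED 0x9A6C9329AC4BC9B5ULL

static uint64_t table_full[256];	/* 256 * 8 bytes = 2k */
static uint64_t table_lo[16];		/* 16 * 8 bytes = 128 bytes */
static uint64_t table_hi[16];		/* 16 * 8 bytes = 128 bytes */

/* Advance the CRC state 'v' by eight bit-steps (one zero byte). */
static uint64_t crc64_step8(uint64_t v)
{
	for (int bit = 0; bit < 8; bit++)
		v = (v >> 1) ^ ((v & 1) ? CRC64_POLY_REFLECTED : 0);
	return v;
}

/* Build the byte table and the two nibble tables. */
static void crc64_init_tables(void)
{
	for (uint64_t i = 0; i < 256; i++)
		table_full[i] = crc64_step8(i);
	for (uint64_t i = 0; i < 16; i++) {
		table_lo[i] = crc64_step8(i);		/* index bits 0..3 */
		table_hi[i] = crc64_step8(i << 4);	/* index bits 4..7 */
	}
}

/* Conventional loop: one lookup in the 2k table per byte. */
static uint64_t crc64_bytewise(uint64_t crc, const uint8_t *p, size_t len)
{
	while (len--)
		crc = (crc >> 8) ^ table_full[(crc ^ *p++) & 0xff];
	return crc;
}

/* Nibble variant: two independent small lookups and one extra xor per byte. */
static uint64_t crc64_nibbles(uint64_t crc, const uint8_t *p, size_t len)
{
	while (len--) {
		uint64_t idx = (crc ^ *p++) & 0xff;

		crc = (crc >> 8) ^ table_lo[idx & 0xf] ^ table_hi[idx >> 4];
	}
	return crc;
}

/* Return 1 if both loops agree for every length from 0 to 64, else 0. */
static int crc64_selfcheck(void)
{
	uint8_t buf[64];
	uint64_t seed = 0x0123456789ABCDEFULL;

	crc64_init_tables();
	for (size_t i = 0; i < sizeof(buf); i++) {
		/* Arbitrary LCG, only used to fill the buffer with test data. */
		seed = seed * 6364136223846793005ULL + 1442695040888963407ULL;
		buf[i] = (uint8_t)(seed >> 56);
	}
	for (size_t len = 0; len <= sizeof(buf); len++)
		if (crc64_nibbles(~0ULL, buf, len) !=
		    crc64_bytewise(~0ULL, buf, len))
			return 0;
	return 1;
}
```

Calling crc64_selfcheck() from a small main() checks that the two loops
produce identical results. Whether the nibble loop is actually as fast as
the byte-table loop is a separate question that the sketch does not answer;
it would need timing, with both warm and cold caches.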