From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.11]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 296D3165FAA; Wed, 1 May 2024 22:07:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.11 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714601252; cv=none; b=c17cf7rjr8LP13cCMixygBoOPMAlUV6psWl9DNGqvIlGhmR6xPoHjW22OI2dhcQRfpx2Ewsnm63li170TLZ+trBYEEXmi7MH0+Gg5KioYq/8/FkesxgOp286kdH4OyYHqvnkTH2T0MBdlFjAZerVq7ZV4fUls+s9EevxDPbFHVg= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714601252; c=relaxed/simple; bh=dzMMcZXpNSmLkhfRYOw0UtbzZ2n+xiKCO8/PWDvxKsQ=; h=Message-ID:Subject:From:To:Cc:Date:In-Reply-To:References: Content-Type:MIME-Version; b=p2eKEs8B+yz7tqfYv0ODQieH47lP1CXUNszgOpVwFXjSGaepMoEXZKZ+fqE5YU9d8IMHRP3qe5OCjOKYUrBdJ79tJWpLq7zWi08SuJPh1mvuuaV7nGo9Cl7vqL62NIuCIE/idF9Yh6e+uiWYCWuH37rIfd3gIM9ac1wCxugCVdg= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=FKmbj/hU; arc=none smtp.client-ip=192.198.163.11 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="FKmbj/hU" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1714601251; x=1746137251; h=message-id:subject:from:to:cc:date:in-reply-to: references:content-transfer-encoding:mime-version; bh=dzMMcZXpNSmLkhfRYOw0UtbzZ2n+xiKCO8/PWDvxKsQ=; b=FKmbj/hUTQM3vK45sk0vpLJxMD0YcoNB+KFXcyQ9zQmbsUtWWhPrF9HO 7FLFcthalMDLhW9Z4XtpzRyANHgf0No4EjX5z2EZtlBr5wBsKw105HMYP WjZlJrHuWmHMb3MKmIjeutXXqEZRmWkn8U9K2M+jBwD8er4a5ofJkTkPP /GzyxBXDmVK1Rv/yM38g+wR08jxAKRpsc8qoHkXaDox+wMlex/J0cDls5 OT/qM82ORbtdSyBNCeTGhONxL4X7L1IUlizb33kUpN50vi7b+JfQ7rBAT s3wVH6wiy23fZ1Tvb3MEkyHr4LZretj37rGESRwoh4biDh0iBRXY96j6L g==; X-CSE-ConnectionGUID: wiNcyBVMTOadjq5vhHS98A== X-CSE-MsgGUID: vo+B/DwYTDSS34q56zZKzg== X-IronPort-AV: E=McAfee;i="6600,9927,11061"; a="20971039" X-IronPort-AV: E=Sophos;i="6.07,246,1708416000"; d="scan'208";a="20971039" Received: from fmviesa003.fm.intel.com ([10.60.135.143]) by fmvoesa105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 01 May 2024 15:07:30 -0700 X-CSE-ConnectionGUID: vfdw8mxDS9GzrQTp/MWkMA== X-CSE-MsgGUID: JNafHgGuTAad5P9cUyU2bg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.07,246,1708416000"; d="scan'208";a="31393980" Received: from sdp.jf.intel.com ([10.165.56.45]) by fmviesa003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 01 May 2024 15:07:30 -0700 Message-ID: <8c687681e566ce6c11d3e71e728d12ac2ef08b07.camel@linux.intel.com> Subject: Re: [PATCH 0/4] crypto: Add new compression modes for zlib and IAA From: Andre Glover To: Herbert Xu Cc: tom.zanussi@linux.intel.com, davem@davemloft.net, dave.jiang@intel.com, fenghua.yu@intel.com, wajdi.k.feghali@intel.com, james.guilford@intel.com, vinodh.gopal@intel.com, tony.luck@intel.com, linux-crypto@vger.kernel.org, dmaengine@vger.kernel.org Date: Wed, 01 May 2024 15:07:15 -0700 In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable User-Agent: Evolution 3.44.4-0ubuntu2 Precedence: bulk X-Mailing-List: dmaengine@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Hi Herbert, On Fri, 2024-04-05 at 15:07 +0800, Herbert Xu wrote: > On Thu, Mar 28, 2024 at 10:44:41AM -0700, Andre Glover wrote: > >=20 > > Below is a table showing the latency improvements with zlib, > > between > > zlib dynamic and zlib canned modes, and the compression ratio for=20 > > each mode while using a set of 4300 4KB pages sampled from SPEC=20 > > CPU17 workloads: > > _________________________________________________________ > > > Zlib Level |=C2=A0 Canned Latency Gain=C2=A0 |=C2=A0=C2=A0=C2=A0 Comp= Ratio=C2=A0=C2=A0=C2=A0 | > > > ------------|-----------------------|------------------| > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 | compre= ss | decompress | dynamic | canned | > > > ____________|__________|____________|_________|________| > > > =C2=A0=C2=A0=C2=A0 1=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 |=C2=A0=C2=A0=C2= =A0 49%=C2=A0=C2=A0 |=C2=A0=C2=A0=C2=A0 29%=C2=A0=C2=A0=C2=A0=C2=A0 |=C2=A0= 3.16=C2=A0=C2=A0 |=C2=A0 2.92=C2=A0 | > > > ------------|----------|------------|---------|--------| > > > =C2=A0=C2=A0=C2=A0 6=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 |=C2= =A0=C2=A0=C2=A0 27%=C2=A0=C2=A0 |=C2=A0=C2=A0=C2=A0 28%=C2=A0=C2=A0=C2=A0= =C2=A0 |=C2=A0 3.35=C2=A0=C2=A0 |=C2=A0 3.09=C2=A0 | > > > ------------|----------|------------|---------|--------| > > > =C2=A0=C2=A0=C2=A0 9=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 |=C2=A0=C2=A0=C2= =A0 12%=C2=A0=C2=A0 |=C2=A0=C2=A0=C2=A0 29%=C2=A0=C2=A0=C2=A0=C2=A0 |=C2=A0= 3.36=C2=A0=C2=A0 |=C2=A0 3.11=C2=A0 | > > > ____________|__________|____________|_________|________| >=20 > So which kernel user (zswap I presume) is clamouring for this > feature? We don't add new algorithms that have no in-kernel > users.=C2=A0 So we need to be sure that the kernel user actually > want this. >=20 > Thanks, Hi Herbert, We have recently submitted an RFC to zswap and zram maintainers and users for by_n compression with Intel IAA [1] feedback. This work is in support of efforts to swap in/out large and multi-sized folios. With by_n compression, we have created a scheme that allows parallel IAA compression and decompression operations on a single folio resulting in performance gains. Currently the by_n scheme uses the canned mode compression algorithm to perform the compression and decompression operations. Using canned mode compression results in reduced compression latency because the deflate header doesnt need to be created dynamically, while also producing better ratio than Deflate Fixed mode. We would appreciate your feedback on this scheme. Here is data from the RFC showing a performance comparison for 64KB folio swap in/out=20 with zram on Sapphire Rapids, whose core frequency is fixed at 2500MHz: +------------+-------------+---------+-------------+----------+-------+ | | Compression | Decomp | Compression | zram | zram | | Algorithm | latency | latency | ratio | write | read | +------------+-------------+---------+-------------+----------+-------+ | | Median (ns) | | Median (ns) | +------------+-------------+---------+-------------+----------+-------+ | | | | | | | | IAA by_1 | 34,493 | 20,038 | 2.93 | 40,130 | 24,478| | IAA by_2 | 18,830 | 11,888 | 2.93 | 24,149 | 15,536| | IAA by_4 | 11,364 | 8,146 | 2.90 | 16,735 | 11,469| | IAA by_8 | 8,344 | 6,342 | 2.77 | 13,527 | 9,177| | IAA by_16 | 8,837 | 6,549 | 2.33 | 15,309 | 9,547| | IAA by_32 | 11,153 | 9,641 | 2.19 | 16,457 | 14,086| | IAA by_64 | 18,272 | 16,696 | 1.96 | 24,294 | 20,048| | | | | | | | | lz4 | 139,190 | 33,687 | 2.40 | 144,940 | 37,312| | | | | | | | | lzo-rle | 138,235 | 61,055 | 2.52 | 143,666 | 64,321| | | | | | | | | zstd | 251,820 | 90,878 | 3.40 | 256,384 | 94,328| +------------+-------------+---------+-------------+----------+-------+ [1]https://lore.kernel.org/all/cover.1714581792.git.andre.glover@linux. intel.com/