Building the Linux kernel with Clang and LLVM
* Re: [PATCH v8 14/14] mm: zswap: Compress batching with request chaining in zswap_store() of large folios.
       [not found] <20250303084724.6490-15-kanchana.p.sridhar@intel.com>
@ 2025-03-03 11:07 ` kernel test robot
  2025-03-03 18:21   ` Nhat Pham
  0 siblings, 1 reply; 5+ messages in thread
From: kernel test robot @ 2025-03-03 11:07 UTC (permalink / raw)
  To: Kanchana P Sridhar, linux-kernel, linux-mm, hannes, yosry.ahmed,
	nphamcs, chengming.zhou, usamaarif642, ryan.roberts, 21cnbao,
	ying.huang, akpm, linux-crypto, herbert, davem, clabbe, ardb,
	ebiggers, surenb, kristen.c.accardi
  Cc: llvm, oe-kbuild-all, wajdi.k.feghali, vinodh.gopal,
	kanchana.p.sridhar

Hi Kanchana,

kernel test robot noticed the following build errors:

[auto build test ERROR on 5f089a9aa987ccf72df0c6955e168e865f280603]

url:    https://github.com/intel-lab-lkp/linux/commits/Kanchana-P-Sridhar/crypto-acomp-Add-synchronous-asynchronous-acomp-request-chaining/20250303-164927
base:   5f089a9aa987ccf72df0c6955e168e865f280603
patch link:    https://lore.kernel.org/r/20250303084724.6490-15-kanchana.p.sridhar%40intel.com
patch subject: [PATCH v8 14/14] mm: zswap: Compress batching with request chaining in zswap_store() of large folios.
config: s390-randconfig-001-20250303 (https://download.01.org/0day-ci/archive/20250303/202503031847.j1iReOtf-lkp@intel.com/config)
compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250303/202503031847.j1iReOtf-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202503031847.j1iReOtf-lkp@intel.com/

All errors (new ones prefixed by >>):

>> mm/zswap.c:1166:4: error: call to undeclared function 'prefetchw'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
    1166 |                         prefetchw(entries[j]);
         |                         ^
   1 error generated.


vim +/prefetchw +1166 mm/zswap.c

  1053	
  1054	/*
  1055	 * Unified code paths for compressors that do and do not support
  1056	 * batching. This procedure compresses @nr_pages pages of @folio,
  1057	 * starting at @index.
  1058	 * If @batching is true, it creates a request chain for
  1059	 * compression batching. The caller must have verified that
  1060	 * acomp_ctx->nr_reqs is at least @nr_pages.
  1061	 * If @batching is false, each page is compressed sequentially.
  1062	 * In both cases, if all compressions were successful, it will proceed
  1063	 * to store the compressed buffers in zpool.
  1064	 */
  1065	static bool zswap_batch_compress(struct folio *folio,
  1066					 long index,
  1067					 unsigned int nr_pages,
  1068					 struct zswap_entry *entries[],
  1069					 struct zswap_pool *pool,
  1070					 struct crypto_acomp_ctx *acomp_ctx,
  1071					 bool batching)
  1072	{
  1073		struct scatterlist inputs[ZSWAP_MAX_BATCH_SIZE];
  1074		struct scatterlist outputs[ZSWAP_MAX_BATCH_SIZE];
  1075		struct zpool *zpool = pool->zpool;
  1076		int acomp_idx = 0, nr_to_store = 1;
  1077		unsigned int i, j;
  1078		int err = 0;
  1079		gfp_t gfp;
  1080	
  1081		lockdep_assert_held(&acomp_ctx->mutex);
  1082	
  1083		gfp = __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM;
  1084		if (zpool_malloc_support_movable(zpool))
  1085			gfp |= __GFP_HIGHMEM | __GFP_MOVABLE;
  1086	
  1087		for (i = 0; i < nr_pages; ++i) {
  1088			struct page *page = folio_page(folio, index + i);
  1089	
  1090			sg_init_table(&inputs[acomp_idx], 1);
  1091			sg_set_page(&inputs[acomp_idx], page, PAGE_SIZE, 0);
  1092	
  1093			/*
  1094			 * Each dst buffer should be of size (PAGE_SIZE * 2).
  1095			 * Reflect same in sg_list.
  1096			 */
  1097			sg_init_one(&outputs[acomp_idx], acomp_ctx->buffers[acomp_idx], PAGE_SIZE * 2);
  1098			acomp_request_set_params(acomp_ctx->reqs[acomp_idx], &inputs[acomp_idx],
  1099						 &outputs[acomp_idx], PAGE_SIZE, PAGE_SIZE);
  1100	
  1101			if (batching) {
  1102				/* Add the acomp request to the chain. */
  1103				if (likely(i))
  1104					acomp_request_chain(acomp_ctx->reqs[acomp_idx], acomp_ctx->reqs[0]);
  1105				else
  1106					acomp_reqchain_init(acomp_ctx->reqs[0], 0, crypto_req_done,
  1107							    &acomp_ctx->wait);
  1108	
  1109				if (i == (nr_pages - 1)) {
  1110					/* Process the request chain. */
  1111					err = crypto_wait_req(crypto_acomp_compress(acomp_ctx->reqs[0]), &acomp_ctx->wait);
  1112	
  1113					/*
  1114					 * Get the individual compress errors from request chaining.
  1115					 */
  1116					for (j = 0; j < nr_pages; ++j) {
  1117						if (unlikely(acomp_request_err(acomp_ctx->reqs[j]))) {
  1118							err = -EINVAL;
  1119							if (acomp_request_err(acomp_ctx->reqs[j]) == -ENOSPC)
  1120								zswap_reject_compress_poor++;
  1121							else
  1122								zswap_reject_compress_fail++;
  1123						}
  1124					}
  1125					/*
  1126					 * Request chaining cleanup:
  1127					 *
  1128					 * - Clear the CRYPTO_TFM_REQ_CHAIN bit on acomp_ctx->reqs[0].
  1129					 * - Reset the acomp_ctx->wait to notify acomp_ctx->reqs[0].
  1130					 */
  1131					acomp_reqchain_clear(acomp_ctx->reqs[0], &acomp_ctx->wait);
  1132					if (unlikely(err))
  1133						return false;
  1134					j = 0;
  1135					nr_to_store = nr_pages;
  1136					goto store_zpool;
  1137				}
  1138	
  1139				++acomp_idx;
  1140				continue;
  1141			} else {
  1142				err = crypto_wait_req(crypto_acomp_compress(acomp_ctx->reqs[0]), &acomp_ctx->wait);
  1143	
  1144				if (unlikely(err)) {
  1145					if (err == -ENOSPC)
  1146						zswap_reject_compress_poor++;
  1147					else
  1148						zswap_reject_compress_fail++;
  1149					return false;
  1150				}
  1151				j = i;
  1152				nr_to_store = 1;
  1153			}
  1154	
  1155	store_zpool:
  1156			/*
  1157			 * All batch pages were successfully compressed.
  1158			 * Store the pages in zpool.
  1159			 */
  1160			acomp_idx = -1;
  1161			while (nr_to_store--) {
  1162				unsigned long handle;
  1163				char *buf;
  1164	
  1165				++acomp_idx;
> 1166				prefetchw(entries[j]);
  1167				err = zpool_malloc(zpool, acomp_ctx->reqs[acomp_idx]->dlen, gfp, &handle);
  1168	
  1169				if (unlikely(err)) {
  1170					if (err == -ENOSPC)
  1171						zswap_reject_compress_poor++;
  1172					else
  1173						zswap_reject_alloc_fail++;
  1174	
  1175					return false;
  1176				}
  1177	
  1178				buf = zpool_map_handle(zpool, handle, ZPOOL_MM_WO);
  1179				memcpy(buf, acomp_ctx->buffers[acomp_idx], acomp_ctx->reqs[acomp_idx]->dlen);
  1180				zpool_unmap_handle(zpool, handle);
  1181	
  1182				entries[j]->handle = handle;
  1183				entries[j]->length = acomp_ctx->reqs[acomp_idx]->dlen;
  1184				++j;
  1185			}
  1186		}
  1187	
  1188		return true;
  1189	}
  1190	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH v8 14/14] mm: zswap: Compress batching with request chaining in zswap_store() of large folios.
  2025-03-03 11:07 ` [PATCH v8 14/14] mm: zswap: Compress batching with request chaining in zswap_store() of large folios kernel test robot
@ 2025-03-03 18:21   ` Nhat Pham
  2025-03-03 21:34     ` Sridhar, Kanchana P
  0 siblings, 1 reply; 5+ messages in thread
From: Nhat Pham @ 2025-03-03 18:21 UTC (permalink / raw)
  To: kernel test robot
  Cc: Kanchana P Sridhar, linux-kernel, linux-mm, hannes, yosry.ahmed,
	chengming.zhou, usamaarif642, ryan.roberts, 21cnbao, ying.huang,
	akpm, linux-crypto, herbert, davem, clabbe, ardb, ebiggers,
	surenb, kristen.c.accardi, llvm, oe-kbuild-all, wajdi.k.feghali,
	vinodh.gopal

On Mon, Mar 3, 2025 at 3:07 AM kernel test robot <lkp@intel.com> wrote:
>
> Hi Kanchana,
>
> kernel test robot noticed the following build errors:
>
> > 1166                          prefetchw(entries[j]);
> --

Why are we doing this anyway? Does it have a notable performance
difference? At the very least, leave a comment explaining why we're
prefetching this (although the build error suggests that we have to
remove it anyway).


* RE: [PATCH v8 14/14] mm: zswap: Compress batching with request chaining in zswap_store() of large folios.
  2025-03-03 18:21   ` Nhat Pham
@ 2025-03-03 21:34     ` Sridhar, Kanchana P
  2025-03-06 21:20       ` Yosry Ahmed
  0 siblings, 1 reply; 5+ messages in thread
From: Sridhar, Kanchana P @ 2025-03-03 21:34 UTC (permalink / raw)
  To: Nhat Pham, lkp
  Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	hannes@cmpxchg.org, yosry.ahmed@linux.dev,
	chengming.zhou@linux.dev, usamaarif642@gmail.com,
	ryan.roberts@arm.com, 21cnbao@gmail.com,
	ying.huang@linux.alibaba.com, akpm@linux-foundation.org,
	linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au,
	davem@davemloft.net, clabbe@baylibre.com, ardb@kernel.org,
	ebiggers@google.com, surenb@google.com, Accardi, Kristen C,
	llvm@lists.linux.dev, oe-kbuild-all@lists.linux.dev,
	Feghali, Wajdi K, Gopal, Vinodh, Sridhar, Kanchana P


> -----Original Message-----
> From: Nhat Pham <nphamcs@gmail.com>
> Sent: Monday, March 3, 2025 10:22 AM
> To: lkp <lkp@intel.com>
> Cc: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; linux-
> kernel@vger.kernel.org; linux-mm@kvack.org; hannes@cmpxchg.org;
> yosry.ahmed@linux.dev; chengming.zhou@linux.dev;
> usamaarif642@gmail.com; ryan.roberts@arm.com; 21cnbao@gmail.com;
> ying.huang@linux.alibaba.com; akpm@linux-foundation.org; linux-
> crypto@vger.kernel.org; herbert@gondor.apana.org.au;
> davem@davemloft.net; clabbe@baylibre.com; ardb@kernel.org;
> ebiggers@google.com; surenb@google.com; Accardi, Kristen C
> <kristen.c.accardi@intel.com>; llvm@lists.linux.dev; oe-kbuild-
> all@lists.linux.dev; Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal,
> Vinodh <vinodh.gopal@intel.com>
> Subject: Re: [PATCH v8 14/14] mm: zswap: Compress batching with request
> chaining in zswap_store() of large folios.
> 
> On Mon, Mar 3, 2025 at 3:07 AM kernel test robot <lkp@intel.com> wrote:
> >
> > Hi Kanchana,
> >
> > kernel test robot noticed the following build errors:
> >
> > > 1166                          prefetchw(entries[j]);
> > --
> 
> Why are we doing this anyway? Does it have a notable performance
> difference? At the very least, leave a comment explaining why we're
> prefetching this (although the build error suggests that we have to
> remove it anyway).

Hi Nhat,

Yes, it does. The use of prefetchw reduces sys time by ~1.5% because
it minimizes cache-miss latency by moving the zswap entry to the cache
before it is written to. 

This is data with kernel compilation test, v8 without prefetchw and v8 as-is:

--------------------------------------------------------------------------------
 Kernel compile       v8 without               v8      v8 without              v8
 allmodconfig          prefetchw                        prefetchw
 2M folios
 --------------------------------------------------------------------------------
 zswap compressor    deflate-iaa      deflate-iaa            zstd            zstd   
 --------------------------------------------------------------------------------
 real_sec                 732.89           735.63          768.53          758.21
 user_sec              15,708.37        15,699.84       15,702.64       15,678.73
 sys_sec                4,632.58         4,563.70        5,735.06        5,635.69
 --------------------------------------------------------------------------------
 Max_Res_Set_Size_KB   1,874,672        1,867,516       1,874,684       1,872,888
 --------------------------------------------------------------------------------
 memcg_high                    0                0               0               0
 memcg_swap_fail               0                0               0               0
 zswpout             114,742,930      112,836,725      92,904,961      89,596,085
 zswpin               41,184,897       39,983,793      31,018,149      29,163,932
 pswpout                     625            1,069             558           1,059
 pswpin                      599            1,056             540           1,051
 thp_swpout                    1                2               1               2
 thp_swpout_fallback      10,967           10,195           6,918           6,141
 pgmajfault           42,588,331       41,349,069      31,931,882      30,006,422
 ZSWPOUT-2048kB            7,661            8,710           6,799           7,480
 SWPOUT-2048kB                 1                2               1               2
 --------------------------------------------------------------------------------


Sure, I will add a comment, and also "#include <linux/prefetch.h>" in zswap.c,
which resolves the build error. This is similar to how other users of prefetchw
handle it: mm/vmscan.c, kernel/locking/qspinlock.c, include/asm-generic/xor.h, etc.

Thanks,
Kanchana



* Re: [PATCH v8 14/14] mm: zswap: Compress batching with request chaining in zswap_store() of large folios.
  2025-03-03 21:34     ` Sridhar, Kanchana P
@ 2025-03-06 21:20       ` Yosry Ahmed
  2025-04-30 21:07         ` Sridhar, Kanchana P
  0 siblings, 1 reply; 5+ messages in thread
From: Yosry Ahmed @ 2025-03-06 21:20 UTC (permalink / raw)
  To: Sridhar, Kanchana P
  Cc: Nhat Pham, lkp, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	hannes@cmpxchg.org, chengming.zhou@linux.dev,
	usamaarif642@gmail.com, ryan.roberts@arm.com, 21cnbao@gmail.com,
	ying.huang@linux.alibaba.com, akpm@linux-foundation.org,
	linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au,
	davem@davemloft.net, clabbe@baylibre.com, ardb@kernel.org,
	ebiggers@google.com, surenb@google.com, Accardi, Kristen C,
	llvm@lists.linux.dev, oe-kbuild-all@lists.linux.dev,
	Feghali, Wajdi K, Gopal, Vinodh

On Mon, Mar 03, 2025 at 09:34:04PM +0000, Sridhar, Kanchana P wrote:
> 
> > -----Original Message-----
> > From: Nhat Pham <nphamcs@gmail.com>
> > Sent: Monday, March 3, 2025 10:22 AM
> > To: lkp <lkp@intel.com>
> > Cc: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; linux-
> > kernel@vger.kernel.org; linux-mm@kvack.org; hannes@cmpxchg.org;
> > yosry.ahmed@linux.dev; chengming.zhou@linux.dev;
> > usamaarif642@gmail.com; ryan.roberts@arm.com; 21cnbao@gmail.com;
> > ying.huang@linux.alibaba.com; akpm@linux-foundation.org; linux-
> > crypto@vger.kernel.org; herbert@gondor.apana.org.au;
> > davem@davemloft.net; clabbe@baylibre.com; ardb@kernel.org;
> > ebiggers@google.com; surenb@google.com; Accardi, Kristen C
> > <kristen.c.accardi@intel.com>; llvm@lists.linux.dev; oe-kbuild-
> > all@lists.linux.dev; Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal,
> > Vinodh <vinodh.gopal@intel.com>
> > Subject: Re: [PATCH v8 14/14] mm: zswap: Compress batching with request
> > chaining in zswap_store() of large folios.
> > 
> > On Mon, Mar 3, 2025 at 3:07 AM kernel test robot <lkp@intel.com> wrote:
> > >
> > > Hi Kanchana,
> > >
> > > kernel test robot noticed the following build errors:
> > >
> > > > 1166                          prefetchw(entries[j]);
> > > --
> > 
> > Why are we doing this anyway? Does it have a notable performance
> > difference? At the very least, leave a comment explaining why we're
> > prefetching this (although the build error suggests that we have to
> > remove it anyway).
> 
> Hi Nhat,
> 
> Yes, it does. The use of prefetchw reduces sys time by ~1.5% because
> it minimizes cache-miss latency by moving the zswap entry to the cache
> before it is written to. 
> 
> This is data with kernel compilation test, v8 without prefetchw and v8 as-is:
> 
> --------------------------------------------------------------------------------
>  Kernel compile       v8 without               v8      v8 without              v8
>  allmodconfig          prefetchw                        prefetchw
>  2M folios
>  --------------------------------------------------------------------------------
>  zswap compressor    deflate-iaa      deflate-iaa            zstd            zstd   
>  --------------------------------------------------------------------------------
>  real_sec                 732.89           735.63          768.53          758.21
>  user_sec              15,708.37        15,699.84       15,702.64       15,678.73
>  sys_sec                4,632.58         4,563.70        5,735.06        5,635.69
>  --------------------------------------------------------------------------------
>  Max_Res_Set_Size_KB   1,874,672        1,867,516       1,874,684       1,872,888
>  --------------------------------------------------------------------------------
>  memcg_high                    0                0               0               0
>  memcg_swap_fail               0                0               0               0
>  zswpout             114,742,930      112,836,725      92,904,961      89,596,085
>  zswpin               41,184,897       39,983,793      31,018,149      29,163,932
>  pswpout                     625            1,069             558           1,059
>  pswpin                      599            1,056             540           1,051
>  thp_swpout                    1                2               1               2
>  thp_swpout_fallback      10,967           10,195           6,918           6,141
>  pgmajfault           42,588,331       41,349,069      31,931,882      30,006,422
>  ZSWPOUT-2048kB            7,661            8,710           6,799           7,480
>  SWPOUT-2048kB                 1                2               1               2
>  --------------------------------------------------------------------------------
> 
> 
> Sure, I will add a comment, and also "#include <linux/prefetch.h>" in zswap.c
> that will resolve the build error. This is similar to how these files handle prefetchw:
> mm/vmscan.c, kernel/locking/qspinlock.c, include/asm-generic/xor.h, etc.

Please also explicitly mention that the prefetch and likely/unlikely
annotations prevent regressions with software compression like zstd, and
generally improve the performance with the batching code by ~1.5%.

> 
> Thanks,
> Kanchana
> 


* RE: [PATCH v8 14/14] mm: zswap: Compress batching with request chaining in zswap_store() of large folios.
  2025-03-06 21:20       ` Yosry Ahmed
@ 2025-04-30 21:07         ` Sridhar, Kanchana P
  0 siblings, 0 replies; 5+ messages in thread
From: Sridhar, Kanchana P @ 2025-04-30 21:07 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Nhat Pham, lkp, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	hannes@cmpxchg.org, chengming.zhou@linux.dev,
	usamaarif642@gmail.com, ryan.roberts@arm.com, 21cnbao@gmail.com,
	ying.huang@linux.alibaba.com, akpm@linux-foundation.org,
	linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au,
	davem@davemloft.net, clabbe@baylibre.com, ardb@kernel.org,
	ebiggers@google.com, surenb@google.com, Accardi, Kristen C,
	llvm@lists.linux.dev, oe-kbuild-all@lists.linux.dev,
	Feghali, Wajdi K, Gopal, Vinodh, Sridhar, Kanchana P


> -----Original Message-----
> From: Yosry Ahmed <yosry.ahmed@linux.dev>
> Sent: Thursday, March 6, 2025 1:21 PM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> Cc: Nhat Pham <nphamcs@gmail.com>; lkp <lkp@intel.com>; linux-
> kernel@vger.kernel.org; linux-mm@kvack.org; hannes@cmpxchg.org;
> chengming.zhou@linux.dev; usamaarif642@gmail.com;
> ryan.roberts@arm.com; 21cnbao@gmail.com;
> ying.huang@linux.alibaba.com; akpm@linux-foundation.org; linux-
> crypto@vger.kernel.org; herbert@gondor.apana.org.au;
> davem@davemloft.net; clabbe@baylibre.com; ardb@kernel.org;
> ebiggers@google.com; surenb@google.com; Accardi, Kristen C
> <kristen.c.accardi@intel.com>; llvm@lists.linux.dev; oe-kbuild-
> all@lists.linux.dev; Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal,
> Vinodh <vinodh.gopal@intel.com>
> Subject: Re: [PATCH v8 14/14] mm: zswap: Compress batching with request
> chaining in zswap_store() of large folios.
> 
> On Mon, Mar 03, 2025 at 09:34:04PM +0000, Sridhar, Kanchana P wrote:
> >
> > > -----Original Message-----
> > > From: Nhat Pham <nphamcs@gmail.com>
> > > Sent: Monday, March 3, 2025 10:22 AM
> > > To: lkp <lkp@intel.com>
> > > Cc: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; linux-
> > > kernel@vger.kernel.org; linux-mm@kvack.org; hannes@cmpxchg.org;
> > > yosry.ahmed@linux.dev; chengming.zhou@linux.dev;
> > > usamaarif642@gmail.com; ryan.roberts@arm.com; 21cnbao@gmail.com;
> > > ying.huang@linux.alibaba.com; akpm@linux-foundation.org; linux-
> > > crypto@vger.kernel.org; herbert@gondor.apana.org.au;
> > > davem@davemloft.net; clabbe@baylibre.com; ardb@kernel.org;
> > > ebiggers@google.com; surenb@google.com; Accardi, Kristen C
> > > <kristen.c.accardi@intel.com>; llvm@lists.linux.dev; oe-kbuild-
> > > all@lists.linux.dev; Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal,
> > > Vinodh <vinodh.gopal@intel.com>
> > > Subject: Re: [PATCH v8 14/14] mm: zswap: Compress batching with request
> > > chaining in zswap_store() of large folios.
> > >
> > > On Mon, Mar 3, 2025 at 3:07 AM kernel test robot <lkp@intel.com> wrote:
> > > >
> > > > Hi Kanchana,
> > > >
> > > > kernel test robot noticed the following build errors:
> > > >
> > > > > 1166                          prefetchw(entries[j]);
> > > > --
> > >
> > > Why are we doing this anyway? Does it have a notable performance
> > > difference? At the very least, leave a comment explaining why we're
> > > prefetching this (although the build error suggests that we have to
> > > remove it anyway).
> >
> > Hi Nhat,
> >
> > Yes, it does. The use of prefetchw reduces sys time by ~1.5% because
> > it minimizes cache-miss latency by moving the zswap entry to the cache
> > before it is written to.
> >
> > This is data with kernel compilation test, v8 without prefetchw and v8 as-is:
> >
> > --------------------------------------------------------------------------------
> >  Kernel compile       v8 without               v8      v8 without              v8
> >  allmodconfig          prefetchw                        prefetchw
> >  2M folios
> >  --------------------------------------------------------------------------------
> >  zswap compressor    deflate-iaa      deflate-iaa            zstd            zstd
> >  --------------------------------------------------------------------------------
> >  real_sec                 732.89           735.63          768.53          758.21
> >  user_sec              15,708.37        15,699.84       15,702.64       15,678.73
> >  sys_sec                4,632.58         4,563.70        5,735.06        5,635.69
> >  --------------------------------------------------------------------------------
> >  Max_Res_Set_Size_KB   1,874,672        1,867,516       1,874,684       1,872,888
> >  --------------------------------------------------------------------------------
> >  memcg_high                    0                0               0               0
> >  memcg_swap_fail               0                0               0               0
> >  zswpout             114,742,930      112,836,725      92,904,961      89,596,085
> >  zswpin               41,184,897       39,983,793      31,018,149      29,163,932
> >  pswpout                     625            1,069             558           1,059
> >  pswpin                      599            1,056             540           1,051
> >  thp_swpout                    1                2               1               2
> >  thp_swpout_fallback      10,967           10,195           6,918           6,141
> >  pgmajfault           42,588,331       41,349,069      31,931,882      30,006,422
> >  ZSWPOUT-2048kB            7,661            8,710           6,799           7,480
> >  SWPOUT-2048kB                 1                2               1               2
> >  --------------------------------------------------------------------------------
> >
> >
> > Sure, I will add a comment, and also "#include <linux/prefetch.h>" in
> zswap.c
> > that will resolve the build error. This is similar to how these files handle
> prefetchw:
> > mm/vmscan.c, kernel/locking/qspinlock.c, include/asm-generic/xor.h, etc.
> 
> Please also explicitly mention that the prefetch and likely/unlikely
> annotations prevent regressions with software compression like zstd, and
> generally improve the performance with the batching code by ~1.5%.

Yes, I have mentioned this in the comments and commit log, along with the
mutex locking.

> 
> >
> > Thanks,
> > Kanchana
> >

