From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 7 Nov 2024 13:53:40 -0500
From: Johannes Weiner <hannes@cmpxchg.org>
To: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, yosryahmed@google.com,
	nphamcs@gmail.com, chengming.zhou@linux.dev, usamaarif642@gmail.com,
	ryan.roberts@arm.com, ying.huang@intel.com, 21cnbao@gmail.com,
	akpm@linux-foundation.org, linux-crypto@vger.kernel.org,
	herbert@gondor.apana.org.au, davem@davemloft.net, clabbe@baylibre.com,
	ardb@kernel.org, ebiggers@google.com, surenb@google.com,
	kristen.c.accardi@intel.com, zanussi@kernel.org,
	wajdi.k.feghali@intel.com, vinodh.gopal@intel.com
Subject: Re: [PATCH v3 13/13] mm: zswap: Compress batching with Intel IAA in zswap_store() of large folios.
Message-ID: <20241107185340.GG1172372@cmpxchg.org>
References: <20241106192105.6731-1-kanchana.p.sridhar@intel.com>
	<20241106192105.6731-14-kanchana.p.sridhar@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <20241106192105.6731-14-kanchana.p.sridhar@intel.com>
On Wed, Nov 06, 2024 at 11:21:05AM -0800, Kanchana P Sridhar wrote:
> +static void zswap_zpool_store_sub_batch(
> +	struct zswap_store_pipeline_state *zst)

There is a zswap_store_sub_batch() below, which does something
completely different. Naming is hard, but please invest a bit more
time into this to make this readable.

> +{
> +	u8 i;
> +
> +	for (i = 0; i < zst->nr_comp_pages; ++i) {
> +		struct zswap_store_sub_batch_page *sbp = &zst->sub_batch[i];
> +		struct zpool *zpool;
> +		unsigned long handle;
> +		char *buf;
> +		gfp_t gfp;
> +		int err;
> +
> +		/* Skip pages that had compress errors. */
> +		if (sbp->error)
> +			continue;
> +
> +		zpool = zst->pool->zpool;
> +		gfp = __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM;
> +		if (zpool_malloc_support_movable(zpool))
> +			gfp |= __GFP_HIGHMEM | __GFP_MOVABLE;
> +		err = zpool_malloc(zpool, zst->comp_dlens[i], gfp, &handle);
> +
> +		if (err) {
> +			if (err == -ENOSPC)
> +				zswap_reject_compress_poor++;
> +			else
> +				zswap_reject_alloc_fail++;
> +
> +			/*
> +			 * An error should be propagated to other pages of the
> +			 * same folio in the sub-batch, and zpool resources for
> +			 * those pages (in sub-batch order prior to this zpool
> +			 * error) should be de-allocated.
> +			 */
> +			zswap_store_propagate_errors(zst, sbp->batch_idx);
> +			continue;
> +		}
> +
> +		buf = zpool_map_handle(zpool, handle, ZPOOL_MM_WO);
> +		memcpy(buf, zst->comp_dsts[i], zst->comp_dlens[i]);
> +		zpool_unmap_handle(zpool, handle);
> +
> +		sbp->entry->handle = handle;
> +		sbp->entry->length = zst->comp_dlens[i];
> +	}
> +}
> +
> +/*
> + * Returns true if the entry was successfully
> + * stored in the xarray, and false otherwise.
> + */
> +static bool zswap_store_entry(swp_entry_t page_swpentry,
> +			      struct zswap_entry *entry)
> +{
> +	struct zswap_entry *old = xa_store(swap_zswap_tree(page_swpentry),
> +					   swp_offset(page_swpentry),
> +					   entry, GFP_KERNEL);
> +	if (xa_is_err(old)) {
> +		int err = xa_err(old);
> +
> +		WARN_ONCE(err != -ENOMEM, "unexpected xarray error: %d\n", err);
> +		zswap_reject_alloc_fail++;
> +		return false;
> +	}
> +
> +	/*
> +	 * We may have had an existing entry that became stale when
> +	 * the folio was redirtied and now the new version is being
> +	 * swapped out. Get rid of the old.
> +	 */
> +	if (old)
> +		zswap_entry_free(old);
> +
> +	return true;
> +}
> +
> +static void zswap_batch_compress_post_proc(
> +	struct zswap_store_pipeline_state *zst)
> +{
> +	int nr_objcg_pages = 0, nr_pages = 0;
> +	struct obj_cgroup *objcg = NULL;
> +	size_t compressed_bytes = 0;
> +	u8 i;
> +
> +	zswap_zpool_store_sub_batch(zst);
> +
> +	for (i = 0; i < zst->nr_comp_pages; ++i) {
> +		struct zswap_store_sub_batch_page *sbp = &zst->sub_batch[i];
> +
> +		if (sbp->error)
> +			continue;
> +
> +		if (!zswap_store_entry(sbp->swpentry, sbp->entry)) {
> +			zswap_store_propagate_errors(zst, sbp->batch_idx);
> +			continue;
> +		}
> +
> +		/*
> +		 * The entry is successfully compressed and stored in the tree,
> +		 * there is no further possibility of failure. Grab refs to the
> +		 * pool and objcg. These refs will be dropped by
> +		 * zswap_entry_free() when the entry is removed from the tree.
> +		 */
> +		zswap_pool_get(zst->pool);
> +		if (sbp->objcg)
> +			obj_cgroup_get(sbp->objcg);
> +
> +		/*
> +		 * We finish initializing the entry while it's already in xarray.
> +		 * This is safe because:
> +		 *
> +		 * 1. Concurrent stores and invalidations are excluded by folio
> +		 *    lock.
> +		 *
> +		 * 2. Writeback is excluded by the entry not being on the LRU yet.
> +		 *    The publishing order matters to prevent writeback from seeing
> +		 *    an incoherent entry.
> +		 */
> +		sbp->entry->pool = zst->pool;
> +		sbp->entry->swpentry = sbp->swpentry;
> +		sbp->entry->objcg = sbp->objcg;
> +		sbp->entry->referenced = true;
> +		if (sbp->entry->length) {
> +			INIT_LIST_HEAD(&sbp->entry->lru);
> +			zswap_lru_add(&zswap_list_lru, sbp->entry);
> +		}
> +
> +		if (!objcg && sbp->objcg) {
> +			objcg = sbp->objcg;
> +		} else if (objcg && sbp->objcg && (objcg != sbp->objcg)) {
> +			obj_cgroup_charge_zswap(objcg, compressed_bytes);
> +			count_objcg_events(objcg, ZSWPOUT, nr_objcg_pages);
> +			compressed_bytes = 0;
> +			nr_objcg_pages = 0;
> +			objcg = sbp->objcg;
> +		}
> +
> +		if (sbp->objcg) {
> +			compressed_bytes += sbp->entry->length;
> +			++nr_objcg_pages;
> +		}
> +
> +		++nr_pages;
> +	} /* for sub-batch pages. */
> +
> +	if (objcg) {
> +		obj_cgroup_charge_zswap(objcg, compressed_bytes);
> +		count_objcg_events(objcg, ZSWPOUT, nr_objcg_pages);
> +	}
> +
> +	atomic_long_add(nr_pages, &zswap_stored_pages);
> +	count_vm_events(ZSWPOUT, nr_pages);
> +}
> +
> +static void zswap_store_sub_batch(struct zswap_store_pipeline_state *zst)
> +{
> +	u8 i;
> +
> +	for (i = 0; i < zst->nr_comp_pages; ++i) {
> +		zst->comp_dsts[i] = zst->acomp_ctx->buffers[i];
> +		zst->comp_dlens[i] = PAGE_SIZE;
> +	} /* for sub-batch pages. */
> +
> +	/*
> +	 * Batch compress sub-batch "N". If IAA is the compressor, the
> +	 * hardware will compress multiple pages in parallel.
> +	 */
> +	zswap_compress_batch(zst);
> +
> +	zswap_batch_compress_post_proc(zst);

The control flow here is a mess. Keep loops over the same batch at the
same function level. IOW, pull the nr_comp_pages loop out of
zswap_batch_compress_post_proc() and call the function from the loop.
Also give it a more descriptive name. If that's hard to do, then
you're probably doing too many different things in it. Create
functions for a specific purpose, don't carve up sequences at
arbitrary points.
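To make the "one loop per function level" point concrete, here is a
minimal userspace sketch; all names and types are made up for
illustration (they are not the zswap ones). The caller owns the single
loop over the batch, and the helper does exactly one entry's worth of
work:

```c
#include <assert.h>
#include <stdbool.h>

#define BATCH_MAX 8

/* Toy stand-in for the per-batch pipeline state. */
struct batch {
	int nr;			/* entries in this batch */
	bool error[BATCH_MAX];	/* per-entry compress error */
	int stored;		/* entries successfully stored */
};

/* Helper with one specific purpose: finish a single entry. */
static void store_one_entry(struct batch *b, int i)
{
	if (b->error[i])	/* skip entries that failed to compress */
		return;
	b->stored++;		/* stand-in for tree insert + stats */
}

/* The caller keeps the loop over the batch at this level, rather than
 * hiding a second full-batch loop inside a "post_proc" helper. */
static void store_batch(struct batch *b)
{
	for (int i = 0; i < b->nr; i++)
		store_one_entry(b, i);
}
```

The point is only the shape: a reader can see the whole per-batch
sequence in one function, instead of chasing nested full-batch loops.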
My impression after trying to read this is that the existing
zswap_store() sequence could be a subset of the batched store, where
you can reuse most code to get the pool, charge the cgroup, allocate
entries, store entries, bump the stats etc. for both cases. Alas, your
naming choices make it a bit difficult to be sure.

Please explore this direction. Don't worry about the CONFIG symbol for
now, we can still look at this later. Right now, it's basically

	if (special case)
		lots of duplicative code in slightly different order

	regular store sequence

and that isn't going to be maintainable. Look for a high-level
sequence that makes sense for both cases. E.g.:

	if (!zswap_enabled)
		goto check_old;

	get objcg
	check limits
	allocate memcg list lru

	for each batch {
		for each entry {
			allocate entry
			acquire objcg ref
			acquire pool ref
		}
		compress
		for each entry {
			store in tree
			add to lru
			bump stats and counters
		}
	}

	put objcg

	return true;

check_old:
	...

and then set up the two loops such that they also make sense when the
folio is just a single page.
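For what it's worth, a userspace toy of that loop structure (all names
and the sub-batch size are hypothetical) shows why no special case is
needed: a single-page folio is simply one sub-batch of one entry.

```c
#include <assert.h>

#define SUB_BATCH 8	/* illustrative sub-batch size, not the real one */

/* Hypothetical unified store: walk the folio in sub-batches, with the
 * per-entry work split around a single batched compress step. */
static int store_folio(int nr_pages)
{
	int stored = 0;

	for (int base = 0; base < nr_pages; base += SUB_BATCH) {
		int n = nr_pages - base;

		if (n > SUB_BATCH)
			n = SUB_BATCH;

		/* for each entry: allocate entry, acquire refs */
		/* compress the whole sub-batch in one call */
		for (int i = 0; i < n; i++)
			stored++;	/* store in tree, add to lru, stats */
	}
	return stored;
}
```

A folio with nr_pages == 1 runs the outer loop exactly once with
n == 1; nothing in the sequence branches on the folio size.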