From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7C627CD11BF for ; Wed, 20 Mar 2024 19:11:44 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id F3D636B0087; Wed, 20 Mar 2024 15:11:43 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id EEDEB6B008C; Wed, 20 Mar 2024 15:11:43 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DB58E6B0092; Wed, 20 Mar 2024 15:11:43 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id C68696B0087 for ; Wed, 20 Mar 2024 15:11:43 -0400 (EDT) Received: from smtpin27.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 8323712137B for ; Wed, 20 Mar 2024 19:11:43 +0000 (UTC) X-FDA: 81918361686.27.CCC5633 Received: from mail-yw1-f202.google.com (mail-yw1-f202.google.com [209.85.128.202]) by imf12.hostedemail.com (Postfix) with ESMTP id B1CB640007 for ; Wed, 20 Mar 2024 19:11:41 +0000 (UTC) Authentication-Results: imf12.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=wPfQxuUS; spf=pass (imf12.hostedemail.com: domain of 37DT7ZQoKCHkvlpovXejbadlldib.Zljifkru-jjhsXZh.lod@flex--yosryahmed.bounces.google.com designates 209.85.128.202 as permitted sender) smtp.mailfrom=37DT7ZQoKCHkvlpovXejbadlldib.Zljifkru-jjhsXZh.lod@flex--yosryahmed.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1710961901; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=j7LMRio+PS8aI9k3DhAx1SV/ZtsvK3v4/pXa7Z8PPvo=; b=1JpzopZSo6N+2dneD5zIlvsu4CEyqu+lZh+Q0WbVkup4JKoScEiljx2MQAv8zoUuIaQxjw vqK6ud5N5dFX8MNhJAikb6fFnNeOcvzZpcukgVPf4T8cdpddpU7Q84+KUu8tgc00gybn72 ocVteNrqbj5zwOQiXdpZSR31bsQBEYc= ARC-Authentication-Results: i=1; imf12.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=wPfQxuUS; spf=pass (imf12.hostedemail.com: domain of 37DT7ZQoKCHkvlpovXejbadlldib.Zljifkru-jjhsXZh.lod@flex--yosryahmed.bounces.google.com designates 209.85.128.202 as permitted sender) smtp.mailfrom=37DT7ZQoKCHkvlpovXejbadlldib.Zljifkru-jjhsXZh.lod@flex--yosryahmed.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1710961901; a=rsa-sha256; cv=none; b=5jb/fS+Imp9rpgVuemdkXjxzRVrGCAyTMeshXhGnj/I67puXV/aOTfMINn3ndpg3UwH4q/ 6BZ6Hnvh6DCr5V7gZU8JjrGdK9MA34N1q6hFspo0b4DG2S4uX7xUFQV1XdWrIh6LDP5eRZ eSc3AqplYlV+YyKOcFtzeA1FxQKuxyo= Received: by mail-yw1-f202.google.com with SMTP id 00721157ae682-608ad239f8fso2414807b3.0 for ; Wed, 20 Mar 2024 12:11:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1710961901; x=1711566701; darn=kvack.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=j7LMRio+PS8aI9k3DhAx1SV/ZtsvK3v4/pXa7Z8PPvo=; b=wPfQxuUSQ5/HMIz6Zhe9TpD01KVIDzT9Iz93iOleACUG3f3K9bJoTVE44RSXr8E7/o kPOuwiUcnTwucMwzxE6gtpq1Y0Dh+6aLRkC0sv7G2xakFGpY+G2nC8fneeoiAelZljPH osaQSewy82weTk/wyl/3Ale9dEKxtKvpaLL3h9NKJCbwZJR9i8DEVmhS9KLEd9W2dnYh aj1D7d/6DSHEb2fbPNTlTYX5+loAHveOo3AUUN/TfQgzGZT32YLt+/nZaXiosUKKPTHk Hss8BmhmlhxywcDT+5lO/PJXQnW3GOUzgGsj1ttHdnVfecl1OU1lIAyOUZhnuqFaqySp +B3w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1710961901; x=1711566701; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=j7LMRio+PS8aI9k3DhAx1SV/ZtsvK3v4/pXa7Z8PPvo=; b=wVJZ74k9ecbM9k6tiy4NcT+D/SgZ+wvA4DKFmAuVUyK41zykfT/XWd2hbvuFrGB6iw 4PdHvwkTEmr11CV8ElFUIlbVi3whYj7xH2eEv0Sv6uuQ6mFAEALH4TS1SB8Um1x2E/Mf 7AfEWRU7V14uFNZ2O/NMCzzxt02QOMEZxJIlCULRG3953YrFq7p5XVKQj72qb3bTXVxP sUHoIcwvDzlXmuRSgG+kmnwZ66Noy/oWmamVzE4H4t7c7JmXm3mW9P/RrGxCAfOwwcFH W0MWBEs7lbm5B/71X7aB88zAdG+xSInMMINNVIWXWSS21xhUu2YdUXHg5Ax5awus2bJp iyOQ== X-Forwarded-Encrypted: i=1; AJvYcCVzSNj+4JkCPub7hyAqfNPxaZsIyPzzP/YH4ggPDFKF3uSETntCuiPw9oVfldKni2sCpGUtcbKiWB8R2COXeOf1TjI= X-Gm-Message-State: AOJu0YzwFsnknMOrdhVPpF1TIy2tzODDuCR8bRcL5LWzqQjVDkE8x8dO iUcYDmbAjunAZXMXxUhxfDyACe+mR2WR13ojG47hAsK6+ViN3vTptaE9IqcfMs1BjK+MX9RPJXp L0Q1NuoX8hleOJyxwhw== X-Google-Smtp-Source: AGHT+IF2exxs4A0z7JTbw+RIKRHHmq4+3usbqdmbipaxHHv7PEsceQthi2sGcPPpOdLj4dwpa7bwWyOHNzP+4VZO X-Received: from yosry.c.googlers.com ([fda3:e722:ac3:cc00:20:ed76:c0a8:29b4]) (user=yosryahmed job=sendgmr) by 2002:a05:6902:e09:b0:dc2:3441:897f with SMTP id df9-20020a0569020e0900b00dc23441897fmr4715579ybb.6.1710961900635; Wed, 20 Mar 2024 12:11:40 -0700 (PDT) Date: Wed, 20 Mar 2024 19:11:38 +0000 In-Reply-To: <20240320100803.GB294822@cmpxchg.org> Mime-Version: 1.0 References: <20240319-zswap-xarray-v7-1-e9a03a049e86@kernel.org> <20240320100803.GB294822@cmpxchg.org> Message-ID: Subject: Re: [PATCH v7] zswap: replace RB tree with xarray From: Yosry Ahmed To: Johannes Weiner Cc: Chris Li , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Nhat Pham , "Matthew Wilcox (Oracle)" , Chengming Zhou , Barry Song Content-Type: text/plain; charset="us-ascii" X-Rspamd-Queue-Id: B1CB640007 X-Rspam-User: X-Stat-Signature: oj1ejr7xdd9u9gg66hg33dwhhpmipjn4 X-Rspamd-Server: rspam01 X-HE-Tag: 1710961901-403225 X-HE-Meta: U2FsdGVkX19y7NP+Xe4Iy3WQfH/INdQs0rlTcnI5fMEEUImcZgcl4Toq1g73pq7U4EAu+ADOGERy0EYL4ouiDMikEv7dNXhsjukpZd60Aa+DoBIIiPv83i5JSI4Fi14OTRHbITwjuZNYlwtHsM+SlWmGisW9XmrdpbAwkRMm//QmkukBnQYwmgB/WQcEs1y6EZgK9WrD2oc5k+PnGFOkunJt1I1tzGxVkzhdtJk/anLPtq7ebnPOOWHzjPJlAq8JQMjm+lG1PKljK3FeDJfT4aY40MNTMGLHnzLR/vLeqdHhOc+VDnBQ22bYFjwnJmWNvk2keZdB40/fFbpWlu43t+qcPGW5urSzjR0TlBU1anupANzmWZj0B2DkqL1Ce9/T6eT8vcoDD/gLj56+H6yNXnYIYigyCfhIME37jw+/N32TwLzx+HgaoiQUC6Ke3CaniIKpxeRkOy124GiOkoMVCSjhw6KaURt0JcjRh6qgSPLKHmVieFkj9XlLpb5NIaka+zNEr7dMFxaBKuDWhynk75jDFZ0Rhn85htI5vPvNkpqyXcqR1VLqs9qsdLW+SzKn1M8vm/wkjW0ww3HfyQGgyE9UWmsFngxzo4UhYsC/aXI1F0bAk+4oQrSAh/2BQ2KS2dKGbgRpVs485AbRwuqGZy92WcI4eNJYOdjeuw4ttmBYVAyVkvYDpRzfdmG/qUII4VAPgb2Q2P/R1MiLcUvHlO0K/bZT8TfVmpZZgXf+2hUx70LktBVrJaOdEIG10DhYfLPWGlPIXs54cqymt1q3OgDkhlU4leEeByZCK7qzVfry8/xLrO5WLSyrg87mdbeiDozOiDx05TewSRQIQO+p4Bv1oG4e57dpTH30dvA/xFrlxv7Do1L49y8Chs1xyj2CLJw8aM/N2j8gk7Ex/sGFtYyVHrDoNgaQpuZ23NgNjeAZjExpmzJlkU9KPUYgduXP8ZNDgslY1kZ3At8MjoV Oo0O08iY LAUojhZ2sS2xK3KKFRrBTPLnsxSY/OTrAvNcf74ZyuwGFBKrsk5o13lh6EkBBbpXMlzMMNowBDkWTKYvNA3s/GKYxDSP8tA/5udJtIBYZSWCocOK6qPL2NDlx2OuHdA/TRb7YLxIUx5LJMnDa5sncyzF6M64xdyaPrBptYMr4c8+VbT9C4PtYMzWS4XFX25x17saeC9AUpGIDjRSTYOBkgaNkNstT7/CyXzPQwhBnW066b41sM59lkoe0tzVUQ0TuBh0lSqrwej479/ZnSfN5EOVcfvVwXO1kZbVZYrQ+HM/YAptxUuAJ2OGhzxNw4luD945zW8oj+FfAT+g+3lFnv+Vt6M6L/X0BFz9286AdLFHYSWT7qNVhjFL101fx/l+a+K0CypkdyJYkcHV1hudPyH4fn7kUMVOuZRNWS1q0iwyknqVq+VZapkCjwr9Mr4CWKLKGganUDCfVmFPB9cUvls0Rvm/184HwO5W2PZ4NbfIbK4N4RdVDm01hzPFX7EX+fR90FNlwczHFg/I280uv137liea+Ej7cBKdYETgJWwElRgKlbZllPJrJgvQ3g0pd5r217D+w8DhtbpRmMwWA2myEFeaiExWJLK8FLhk4yfMLqQKGbLEudJO30w== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, Mar 20, 2024 at 06:08:03AM -0400, Johannes Weiner wrote: > On Wed, Mar 20, 2024 at 07:24:27AM +0000, Yosry Ahmed wrote: > > [..] > > > > > - /* map */ > > > > > - spin_lock(&tree->lock); > > > > > /* > > > > > - * The folio may have been dirtied again, invalidate the > > > > > - * possibly stale entry before inserting the new entry. > > > > > + * We finish initializing the entry while it's already in xarray. > > > > > + * This is safe because: > > > > > + * > > > > > + * 1. Concurrent stores and invalidations are excluded by folio lock. > > > > > + * > > > > > + * 2. Writeback is excluded by the entry not being on the LRU yet. > > > > > + * The publishing order matters to prevent writeback from seeing > > > > > + * an incoherent entry. > > > > > > > > As I mentioned before, writeback is also protected by the folio lock. > > > > Concurrent writeback will find the folio in the swapcache and abort. The > > > > fact that the entry is not on the LRU yet is just additional protection, > > > > so I don't think the publishing order actually matters here. Right? > > > > > > Right. This comment is explaining why this publishing order does not > > > matter. I think we are talking about the same thing here? > > > > The comment literally says "the publishing order matters.." :) > > > > I believe Johannes meant that we should only publish the entry to the > > LRU once it is fully initialized, to prevent writeback from using a > > partially initialized entry. > > > > What I am saying is that, even if we add a partially initialized entry > > to the zswap LRU, writeback will skip it anyway because the folio is > > locked in the swapcache. > > > > So basically I think the comment should say: > > > > /* > > * We finish initializing the entry while it's already in the > > * xarray. This is safe because the folio is locked in the swap > > * cache, which should protect against concurrent stores, > > * invalidations, and writeback. > > */ > > > > Johannes, what do you think? > > I don't think that's quite right. > > Writeback will bail on swapcache insert, yes, but it will access the > entry before attempting it. If LRU publishing happened before setting > entry->swpentry e.g., we'd have a problem, while your comment suggets > it would be safe to rearrange the code like this. > > So LRU publishing order does matter. Ah yes, you are right. entry->swpentry should be set to make sure we lookup the correct entry in the swapcache and the tree. Perhaps we should spell this out in the comment and make the initialization ordering more explicit? Maybe something like: diff --git a/mm/zswap.c b/mm/zswap.c index d8a14b27adcd7..70924b437743a 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -1472,9 +1472,6 @@ bool zswap_store(struct folio *folio) goto put_pool; insert_entry: - entry->swpentry = swp; - entry->objcg = objcg; - old = xa_store(tree, offset, entry, GFP_KERNEL); if (xa_is_err(old)) { int err = xa_err(old); @@ -1491,6 +1488,7 @@ bool zswap_store(struct folio *folio) if (old) zswap_entry_free(old); + entry->objcg = objcg; if (objcg) { obj_cgroup_charge_zswap(objcg, entry->length); count_objcg_event(objcg, ZSWPOUT); @@ -1498,15 +1496,16 @@ bool zswap_store(struct folio *folio) /* * We finish initializing the entry while it's already in xarray. - * This is safe because: - * - * 1. Concurrent stores and invalidations are excluded by folio lock. + * This is safe because the folio is locked in the swapcache, which + * protects against concurrent stores and invalidations. * - * 2. Writeback is excluded by the entry not being on the LRU yet. - * The publishing order matters to prevent writeback from seeing - * an incoherent entry. + * Concurrent writeback is not possible until we add the entry to the + * LRU. We need to at least initialize entry->swpentry *before* adding + * the entry to the LRU to make sure writeback looks up the correct + * entry in the swapcache. */ if (entry->length) { + entry->swpentry = swp; INIT_LIST_HEAD(&entry->lru); zswap_lru_add(&zswap_list_lru, entry); atomic_inc(&zswap_nr_stored); This also got me wondering, do we need a write barrier between initializing entry->swpentry and zswap_lru_add()? I guess if we read the wrong swpentry in zswap_writeback_entry() we will eventually fail the xa_cmpxchg() and drop it anyway, but it seems bug-prone.