Date: Wed, 20 Mar 2024 20:12:27 +0000
In-Reply-To: <20240320200322.GG294822@cmpxchg.org>
References: <20240319-zswap-xarray-v7-1-e9a03a049e86@kernel.org>
 <20240320100803.GB294822@cmpxchg.org>
 <20240320192558.GF294822@cmpxchg.org>
 <20240320200322.GG294822@cmpxchg.org>
Subject: Re: [PATCH v7] zswap: replace RB tree with xarray
From: Yosry Ahmed
To: Johannes Weiner
Cc: Chris Li, Andrew Morton, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, Nhat Pham, "Matthew Wilcox (Oracle)",
 Chengming Zhou, Barry Song

On Wed, Mar 20, 2024 at 04:03:22PM -0400, Johannes Weiner wrote:
> On Wed, Mar 20, 2024 at 07:34:36PM +0000, Yosry Ahmed wrote:
> > On Wed, Mar 20, 2024 at 03:25:58PM -0400, Johannes Weiner wrote:
> > > On Wed, Mar 20, 2024 at 07:11:38PM +0000, Yosry Ahmed wrote:
> > > > On Wed, Mar 20, 2024 at 06:08:03AM -0400, Johannes Weiner wrote:
> > > > > On Wed, Mar 20, 2024 at 07:24:27AM +0000, Yosry Ahmed wrote:
> > > > > > [..]
> > > > > > > > > -	/* map */
> > > > > > > > > -	spin_lock(&tree->lock);
> > > > > > > > >  	/*
> > > > > > > > > -	 * The folio may have been dirtied again, invalidate the
> > > > > > > > > -	 * possibly stale entry before inserting the new entry.
> > > > > > > > > +	 * We finish initializing the entry while it's already in xarray.
> > > > > > > > > +	 * This is safe because:
> > > > > > > > > +	 *
> > > > > > > > > +	 * 1. Concurrent stores and invalidations are excluded by folio lock.
> > > > > > > > > +	 *
> > > > > > > > > +	 * 2. Writeback is excluded by the entry not being on the LRU yet.
> > > > > > > > > +	 *    The publishing order matters to prevent writeback from seeing
> > > > > > > > > +	 *    an incoherent entry.
> > > > > > > >
> > > > > > > > As I mentioned before, writeback is also protected by the folio lock.
> > > > > > > > Concurrent writeback will find the folio in the swapcache and abort. The
> > > > > > > > fact that the entry is not on the LRU yet is just additional protection,
> > > > > > > > so I don't think the publishing order actually matters here. Right?
> > > > > > >
> > > > > > > Right. This comment is explaining why this publishing order does not
> > > > > > > matter. I think we are talking about the same thing here?
> > > > > >
> > > > > > The comment literally says "the publishing order matters.." :)
> > > > > >
> > > > > > I believe Johannes meant that we should only publish the entry to the
> > > > > > LRU once it is fully initialized, to prevent writeback from using a
> > > > > > partially initialized entry.
> > > > > >
> > > > > > What I am saying is that, even if we add a partially initialized entry
> > > > > > to the zswap LRU, writeback will skip it anyway because the folio is
> > > > > > locked in the swapcache.
> > > > > >
> > > > > > So basically I think the comment should say:
> > > > > >
> > > > > > 	/*
> > > > > > 	 * We finish initializing the entry while it's already in the
> > > > > > 	 * xarray. This is safe because the folio is locked in the swap
> > > > > > 	 * cache, which should protect against concurrent stores,
> > > > > > 	 * invalidations, and writeback.
> > > > > > 	 */
> > > > > >
> > > > > > Johannes, what do you think?
> > > > >
> > > > > I don't think that's quite right.
> > > > >
> > > > > Writeback will bail on swapcache insert, yes, but it will access the
> > > > > entry before attempting it. If LRU publishing happened before setting
> > > > > entry->swpentry e.g., we'd have a problem, while your comment suggests
> > > > > it would be safe to rearrange the code like this.
> > > > >
> > > > > So LRU publishing order does matter.
> > > >
> > > > Ah yes, you are right. entry->swpentry should be set to make sure we
> > > > look up the correct entry in the swapcache and the tree.
> > > >
> > > > Perhaps we should spell this out in the comment and make the
> > > > initialization ordering more explicit? Maybe something like:
> > > >
> > > > diff --git a/mm/zswap.c b/mm/zswap.c
> > > > index d8a14b27adcd7..70924b437743a 100644
> > > > --- a/mm/zswap.c
> > > > +++ b/mm/zswap.c
> > > > @@ -1472,9 +1472,6 @@ bool zswap_store(struct folio *folio)
> > > >  		goto put_pool;
> > > >
> > > >  insert_entry:
> > > > -	entry->swpentry = swp;
> > > > -	entry->objcg = objcg;
> > > > -
> > > >  	old = xa_store(tree, offset, entry, GFP_KERNEL);
> > > >  	if (xa_is_err(old)) {
> > > >  		int err = xa_err(old);
> > > > @@ -1491,6 +1488,7 @@ bool zswap_store(struct folio *folio)
> > > >  	if (old)
> > > >  		zswap_entry_free(old);
> > > >
> > > > +	entry->objcg = objcg;
> > > >  	if (objcg) {
> > > >  		obj_cgroup_charge_zswap(objcg, entry->length);
> > > >  		count_objcg_event(objcg, ZSWPOUT);
> > > > @@ -1498,15 +1496,16 @@ bool zswap_store(struct folio *folio)
> > > >
> > > >  	/*
> > > >  	 * We finish initializing the entry while it's already in xarray.
> > > > -	 * This is safe because:
> > > > -	 *
> > > > -	 * 1. Concurrent stores and invalidations are excluded by folio lock.
> > > > +	 * This is safe because the folio is locked in the swapcache, which
> > > > +	 * protects against concurrent stores and invalidations.
> > > >  	 *
> > > > -	 * 2. Writeback is excluded by the entry not being on the LRU yet.
> > > > -	 *    The publishing order matters to prevent writeback from seeing
> > > > -	 *    an incoherent entry.
> > > > +	 * Concurrent writeback is not possible until we add the entry to the
> > > > +	 * LRU. We need to at least initialize entry->swpentry *before* adding
> > > > +	 * the entry to the LRU to make sure writeback looks up the correct
> > > > +	 * entry in the swapcache.
> > > >  	 */
> > > >  	if (entry->length) {
> > > > +		entry->swpentry = swp;
> > > >  		INIT_LIST_HEAD(&entry->lru);
> > > >  		zswap_lru_add(&zswap_list_lru, entry);
> > > >  		atomic_inc(&zswap_nr_stored);
> > > >
> > > >
> > > > This also got me wondering, do we need a write barrier between
> > > > initializing entry->swpentry and zswap_lru_add()?
> > > >
> > > > I guess if we read the wrong swpentry in zswap_writeback_entry() we will
> > > > eventually fail the xa_cmpxchg() and drop it anyway, but it seems
> > > > bug-prone.
> > >
> > > I think it's more robust the way Chris has it now. Writeback only
> > > derefs ->swpentry today, but who knows if somebody wants to make a
> > > change that relies on a different member. Having submembers follow
> > > different validity rules and timelines is error prone and makes the
> > > code less hackable without buying all that much. The concept of
> > > "publishing" an object like this is more common: if you can see it,
> > > you can expect it to be coherent.
> >
> > Fair enough, but don't we still need a barrier there? Couldn't some
> > initializations still be reordered after zswap_lru_add()?
>
> Only if it were lockless. The LRU unlocking in zswap_store() implies
> RELEASE, the LRU locking in writeback implies ACQUIRE. Those force the
> desired ordering - nothing can bleed after RELEASE, nothing can bleed
> before ACQUIRE.

Ah yes, I found smp_store_release() deep in the spin_unlock() code.
Thanks for pointing this out, all is good in the world.
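
For anyone following the ordering argument, here is a minimal userspace
sketch of the same "initialize, then publish" pattern. It is not kernel
code: it uses C11 atomics as a stand-in for the RELEASE implied by
unlocking the LRU lock and the ACQUIRE implied by taking it, and every
name in it (struct entry, lru_slot, publish(), consume()) is hypothetical
and made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a zswap entry; fields are illustrative only. */
struct entry {
	unsigned long swpentry;
	unsigned int length;
};

/* Hypothetical one-slot "LRU": a single published pointer, NULL when empty. */
static _Atomic(struct entry *) lru_slot;

/*
 * Writer side: finish initializing the entry, then publish it.
 * The release store keeps the field writes above from being
 * reordered after the pointer becomes visible.
 */
static void publish(struct entry *e, unsigned long swp, unsigned int len)
{
	e->swpentry = swp;
	e->length = len;
	atomic_store_explicit(&lru_slot, e, memory_order_release);
}

/*
 * Reader side: the acquire load keeps later field reads from being
 * reordered before the pointer load, so a non-NULL result refers to
 * a fully initialized entry.
 */
static struct entry *consume(void)
{
	return atomic_load_explicit(&lru_slot, memory_order_acquire);
}
```

A reader that gets a non-NULL pointer back from consume() can dereference
any field, not just swpentry, which is the "if you can see it, you can
expect it to be coherent" rule discussed above. In zswap itself no explicit
barrier is written out because the lock/unlock pair already provides this
ordering.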