From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id D69E6C54E58 for ; Wed, 20 Mar 2024 19:34:41 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 5DE9F6B0099; Wed, 20 Mar 2024 15:34:41 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 58F096B009A; Wed, 20 Mar 2024 15:34:41 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 457FC6B009B; Wed, 20 Mar 2024 15:34:41 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 35B166B0099 for ; Wed, 20 Mar 2024 15:34:41 -0400 (EDT) Received: from smtpin22.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id F1C891A08BE for ; Wed, 20 Mar 2024 19:34:40 +0000 (UTC) X-FDA: 81918419520.22.0304B99 Received: from mail-yb1-f202.google.com (mail-yb1-f202.google.com [209.85.219.202]) by imf16.hostedemail.com (Postfix) with ESMTP id 249AC18001D for ; Wed, 20 Mar 2024 19:34:38 +0000 (UTC) Authentication-Results: imf16.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b="bq1cE/2Y"; spf=pass (imf16.hostedemail.com: domain of 3Tjr7ZQoKCOUfVZYfHOTLKNVVNSL.JVTSPUbe-TTRcHJR.VYN@flex--yosryahmed.bounces.google.com designates 209.85.219.202 as permitted sender) smtp.mailfrom=3Tjr7ZQoKCOUfVZYfHOTLKNVVNSL.JVTSPUbe-TTRcHJR.VYN@flex--yosryahmed.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1710963279; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=nCtNOSJdwwctkwSnXfVUE4a/gFZD0gzHtS0vQGF5kXs=; b=puNkXnh//zLRPbexB/6SRD2rJr/LJLUSGM6CVrVE9Zyp0Ppaw5Q7MnYzXoFXULrgQjt4vA 3S1eKN48d7p80z/ZRmwMYNVMzpBn7TZUdlrTWdl4gXLgdYi+YB2a0PJmT8+eo6VNzQqYW9 etL+M05SDDCptoaPUSQ5l+NQZXzmsLQ= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1710963279; a=rsa-sha256; cv=none; b=2bDHLyfMm7y/DEQDdsA9Z8sex3QCIRoJ9xIcxB/fjgf81fAVEklqA8ES60/DivfLPPJmDv BRodOSW85CtdekW9iRgVRRV9f2kC/Hh7QUC6KTxg1o2CDkL5brcwCRZtt6TfzvN+F40BPJ nv59XX85V7jHQ5h1tDX5wj3gvym3RDI= ARC-Authentication-Results: i=1; imf16.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b="bq1cE/2Y"; spf=pass (imf16.hostedemail.com: domain of 3Tjr7ZQoKCOUfVZYfHOTLKNVVNSL.JVTSPUbe-TTRcHJR.VYN@flex--yosryahmed.bounces.google.com designates 209.85.219.202 as permitted sender) smtp.mailfrom=3Tjr7ZQoKCOUfVZYfHOTLKNVVNSL.JVTSPUbe-TTRcHJR.VYN@flex--yosryahmed.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com Received: by mail-yb1-f202.google.com with SMTP id 3f1490d57ef6-dc6dbdcfd39so290503276.2 for ; Wed, 20 Mar 2024 12:34:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1710963278; x=1711568078; darn=kvack.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=nCtNOSJdwwctkwSnXfVUE4a/gFZD0gzHtS0vQGF5kXs=; b=bq1cE/2Y1WMSuAOmZUlb0Va2U0sbFrbxQRZc8nZKlcy7gy1WxAYinX5ZIyvyEAfMvC NNoAVEbYmqXHwK6uC5Z0Nyl25w86JI3I59v9611KyCyDmmlyo1kdXK4SlFHeSS3RwSp7 0z7M+qX8kRD91zeQf0tzllF9xu5HraKiu0B+aLkK3LZNzDLR3ap6beBNB9gUdSl+mRGg Cn93H1QgG18wrx+GCnenKZqSD0DPVOx2fNYSFhBdJcHXBLXuXN6HNyWKmv4NJPYL/5Ol COBpU8Agj8f1i4pyCjjrFzpg/WYwQ0WWqwTdx1W8lHJ6gcvs7URrkQp8U1DuGFIGsH5o 6VbA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1710963278; x=1711568078; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=nCtNOSJdwwctkwSnXfVUE4a/gFZD0gzHtS0vQGF5kXs=; b=LnC2P11+cy9IjhJj9K0351K69LDwis2ZTkM1bpzXYVyVuHXMLqKdgsu23tmPWlkfEe 0WEoc7RAGlQVpEWBJ0vMldowN58N01GMR6VFe6Fm4HiYMS+6HMOTrdd9LvsohDypIMec U+IQ1Y48TpG8FKW7nCYy9ZKWB+jNOHYcR0w2uVf07KZYMVAeZbIu9GEQG2rXyIohaV+Q lp4zB0jWPbq5TqYM5Pja8zvpWgZQUtzGnnZmlqV5OkYfhUAWWYelNIhXu5Rh2re7+Oe/ xnvXzQ7FupwsR4s38EhgwkNnTI8HfZxSL4kBt3xISx44aDNL7bVs2k7eYZpusf41ULhx LjEw== X-Forwarded-Encrypted: i=1; AJvYcCUV6IBdiL04JhIx25NAyL9lyvpLiRixOdtCnDPXSdtbORhQHFBqvoVgmXncM8QVYVoAQqmlU4P1xnwAI94vLOUw5Qw= X-Gm-Message-State: AOJu0YzI83PtklmvDUTjuTzf7wCpK6jsdu0Qrnq6eJSkgVGeqBoFDw5K FYQR2/VphH5Nf+ihosrUalJYrTW2aFBwUA/4VOKIm74MluLhBAIi+iDSmyYgjaNNhIVUrJXLDXe u7NWokJTnBajNLK3kNQ== X-Google-Smtp-Source: AGHT+IG+D7riHHTgenWZerCEUVXqrKvvCbK0pOaP+HMOgVPdxYRUXyMEprWnBYPidPoo3JHK251YueLjBLppeNA7 X-Received: from yosry.c.googlers.com ([fda3:e722:ac3:cc00:20:ed76:c0a8:29b4]) (user=yosryahmed job=sendgmr) by 2002:a05:6902:2204:b0:dcc:f01f:65e1 with SMTP id dm4-20020a056902220400b00dccf01f65e1mr5026622ybb.8.1710963278170; Wed, 20 Mar 2024 12:34:38 -0700 (PDT) Date: Wed, 20 Mar 2024 19:34:36 +0000 In-Reply-To: <20240320192558.GF294822@cmpxchg.org> Mime-Version: 1.0 References: <20240319-zswap-xarray-v7-1-e9a03a049e86@kernel.org> <20240320100803.GB294822@cmpxchg.org> <20240320192558.GF294822@cmpxchg.org> Message-ID: Subject: Re: [PATCH v7] zswap: replace RB tree with xarray From: Yosry Ahmed To: Johannes Weiner Cc: Chris Li , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Nhat Pham , "Matthew Wilcox (Oracle)" , Chengming Zhou , Barry Song Content-Type: text/plain; charset="us-ascii" X-Rspamd-Queue-Id: 249AC18001D X-Rspam-User: X-Stat-Signature: jsmd6ukc99j3z9p86ar4eawe6w13wqsq X-Rspamd-Server: rspam03 X-HE-Tag: 1710963278-217242 X-HE-Meta: U2FsdGVkX1/aZwBKRKCBAPFKAYpLK7CL36NjtgJz2F8R9ecdULXVR297T/EhOi0MFVMLC9SfDmaWGdU/60o9ux0cM1aa/B2cn2fkHHu93dpb9gCTDw53nToD/+UstIgt3VJwE4vz/+F9m6y3WRCI3WArk5BNPvMIArRWHebK4DPLnDsxoEkfI66rrOuXUflBaOD02Nr2mUvHV226HIb92kof72U9ZWSXoimoH6fE9usWegNBHyF2kMx57T3uIJezJUzvGgQzARUvDvIWXxoQBgC12QoNOXAF63/ID7wzdbH+pV7cukuWfe9AEiLKCvgtZCFpm9pKgFquqnYzZbUmS4nhCpUxNErjOR9eWP+Gc/gADll3D6A30jWpS85FymCPZCPtQa1CAGq9UCu0I0kH1UrA9w8kuqgaNAEHlmrLN43wm+CH5OZROmVqIi7QNBE0qulLaQnTQmTMsQhsB4f9WddbhzItih5MWGagYNuMNLOdlpFXiQZAyYFv+DeTR25NZkKqtzabfFCDgVnfOq2hO9MGaUFF89XIq94vJefIJqDE6kUW06vq3TkZ9ZJmWIyToM9OZ/iT14bXkodC0TsFHvtbblnP2xIKq2tuv3VRO0rXaPNFNx+u/ezsCADg8h5lPu72hdjQbGlQRpH5YAmgGIJWE5oPPraIF+09NRVLUY/nRE8GUvF96uyHNP8xm8e0ql+E2gW+QtU8bW+7Kdw0nkhE9sCPKBlsG35L2oczO52G8icebHFbvgwqAqgTs7cNYHC2ELNwO0zYaB/OuhFfqV8zWH3Eeqw1W0UW5wQcKdDXRdDJ4a3nWn6CLk38ZOKijIuQ8S9mYw59DTJvKOzDMOOqb8MFMRrmQJM7t1aB8Vqi05gT86KufxkZ7rxJOO6kYuRzYYNfm1wDNl5jU/GdVKqyrZH3UGrqhfw4Hf1ztVlnL3NHj3X/Qjc6GJCx723f+rjhzcLBpYxqc12/jOO x1a8+rk5 azdhx44YJS+aBitRA+hcTrk6z9atu7/hS21TBcKJECLg/KJEmeaGCdBi6TRUOqK8O23C+Omfi8cysPrpkY56hVrb835ai+Zbf2vSt4lh5/1Cx7u61JSyJq3Ti+uH/Gn/ipfiMHRk8ygpwitNWUAEeevG0oCof7Mt7A7iy/AbMgoH+WE1lpmkHHnqt+d3YiKtb0XEERx+zrCemIakL0Rrf0AL1JdsPh9h0qUp0vPw3AXIQqGv1QWie/I4DxTnPfQfsTSjTrzWlDKysF7wsvqJSUMPk3MGDMRrg5NOGhyVVmsr6XLDgSkMeMW6Sia8tgjFNspp3ddv8FuXOsMmenjY8wpQ5CEQU+ovAaZkxVfWIHuF3ePhTcgAQ9+B5rhmpN7eJE7Stvnrtomwd0EbMzGyifd8OZs5LML05G2lFIrCuVfnyv5DGdx4Ty4UJIkwIjWMMUEzJa2LbyGgoiWvHRxuNChjQ44z3xgO7D8qtiySPf9nqD/mCGyxFZgyYEBVW4w0WFvh1oe57zQXkUmLp8WPcdDnoFhI0VQjHwWvHZLbKtniCVEmEAB9iRT3L2mle1fy9FBssQ+VL3zsctlUD+FFoAqCUqLgwTW2TA+GDBQIQmvcrZziI/kA6oJcn3w== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, Mar 20, 2024 at 03:25:58PM -0400, Johannes Weiner wrote: > On Wed, Mar 20, 2024 at 07:11:38PM +0000, Yosry Ahmed wrote: > > On Wed, Mar 20, 2024 at 06:08:03AM -0400, Johannes Weiner wrote: > > > On Wed, Mar 20, 2024 at 07:24:27AM +0000, Yosry Ahmed wrote: > > > > [..] > > > > > > > - /* map */ > > > > > > > - spin_lock(&tree->lock); > > > > > > > /* > > > > > > > - * The folio may have been dirtied again, invalidate the > > > > > > > - * possibly stale entry before inserting the new entry. > > > > > > > + * We finish initializing the entry while it's already in xarray. > > > > > > > + * This is safe because: > > > > > > > + * > > > > > > > + * 1. Concurrent stores and invalidations are excluded by folio lock. > > > > > > > + * > > > > > > > + * 2. Writeback is excluded by the entry not being on the LRU yet. > > > > > > > + * The publishing order matters to prevent writeback from seeing > > > > > > > + * an incoherent entry. > > > > > > > > > > > > As I mentioned before, writeback is also protected by the folio lock. > > > > > > Concurrent writeback will find the folio in the swapcache and abort. The > > > > > > fact that the entry is not on the LRU yet is just additional protection, > > > > > > so I don't think the publishing order actually matters here. Right? > > > > > > > > > > Right. This comment is explaining why this publishing order does not > > > > > matter. I think we are talking about the same thing here? > > > > > > > > The comment literally says "the publishing order matters.." :) > > > > > > > > I believe Johannes meant that we should only publish the entry to the > > > > LRU once it is fully initialized, to prevent writeback from using a > > > > partially initialized entry. > > > > > > > > What I am saying is that, even if we add a partially initialized entry > > > > to the zswap LRU, writeback will skip it anyway because the folio is > > > > locked in the swapcache. > > > > > > > > So basically I think the comment should say: > > > > > > > > /* > > > > * We finish initializing the entry while it's already in the > > > > * xarray. This is safe because the folio is locked in the swap > > > > * cache, which should protect against concurrent stores, > > > > * invalidations, and writeback. > > > > */ > > > > > > > > Johannes, what do you think? > > > > > > I don't think that's quite right. > > > > > > Writeback will bail on swapcache insert, yes, but it will access the > > > entry before attempting it. If LRU publishing happened before setting > > > entry->swpentry e.g., we'd have a problem, while your comment suggets > > > it would be safe to rearrange the code like this. > > > > > > So LRU publishing order does matter. > > > > Ah yes, you are right. entry->swpentry should be set to make sure we > > lookup the correct entry in the swapcache and the tree. > > > > Perhaps we should spell this out in the comment and make the > > initialization ordering more explicit? Maybe something like: > > > > diff --git a/mm/zswap.c b/mm/zswap.c > > index d8a14b27adcd7..70924b437743a 100644 > > --- a/mm/zswap.c > > +++ b/mm/zswap.c > > @@ -1472,9 +1472,6 @@ bool zswap_store(struct folio *folio) > > goto put_pool; > > > > insert_entry: > > - entry->swpentry = swp; > > - entry->objcg = objcg; > > - > > old = xa_store(tree, offset, entry, GFP_KERNEL); > > if (xa_is_err(old)) { > > int err = xa_err(old); > > @@ -1491,6 +1488,7 @@ bool zswap_store(struct folio *folio) > > if (old) > > zswap_entry_free(old); > > > > + entry->objcg = objcg; > > if (objcg) { > > obj_cgroup_charge_zswap(objcg, entry->length); > > count_objcg_event(objcg, ZSWPOUT); > > @@ -1498,15 +1496,16 @@ bool zswap_store(struct folio *folio) > > > > /* > > * We finish initializing the entry while it's already in xarray. > > - * This is safe because: > > - * > > - * 1. Concurrent stores and invalidations are excluded by folio lock. > > + * This is safe because the folio is locked in the swapcache, which > > + * protects against concurrent stores and invalidations. > > * > > - * 2. Writeback is excluded by the entry not being on the LRU yet. > > - * The publishing order matters to prevent writeback from seeing > > - * an incoherent entry. > > + * Concurrent writeback is not possible until we add the entry to the > > + * LRU. We need to at least initialize entry->swpentry *before* adding > > + * the entry to the LRU to make sure writeback looks up the correct > > + * entry in the swapcache. > > */ > > if (entry->length) { > > + entry->swpentry = swp; > > INIT_LIST_HEAD(&entry->lru); > > zswap_lru_add(&zswap_list_lru, entry); > > atomic_inc(&zswap_nr_stored); > > > > > > This also got me wondering, do we need a write barrier between > > initializing entry->swpentry and zswap_lru_add()? > > > > I guess if we read the wrong swpentry in zswap_writeback_entry() we will > > eventually fail the xa_cmpxchg() and drop it anyway, but it seems > > bug-prone. > > I think it's more robust the way Chris has it now. Writeback only > derefs ->swpentry today, but who knows if somebody wants to make a > changes that relies on a different member. Having submembers follow > different validity rules and timelines is error prone and makes the > code less hackable without buying all that much. The concept of > "publishing" an object like this is more common: if you can see it, > you can expect it to be coherent. Fair enough, but don't we still need a barrier there? Couldn't some initializations still be reorder after zswap_lru_add()?