From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <owner-linux-mm@kvack.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 0A998C77B73
	for <linux-mm@archiver.kernel.org>; Wed, 31 May 2023 02:30:24 +0000 (UTC)
Received: by kanga.kvack.org (Postfix)
	id 965606B0072; Tue, 30 May 2023 22:30:23 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id 9157E280002; Tue, 30 May 2023 22:30:23 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id 7DD1A280001; Tue, 30 May 2023 22:30:23 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11])
	by kanga.kvack.org (Postfix) with ESMTP id 6DEC46B0072
	for <linux-mm@kvack.org>; Tue, 30 May 2023 22:30:23 -0400 (EDT)
Received: from smtpin25.hostedemail.com (a10.router.float.18 [10.200.18.1])
	by unirelay04.hostedemail.com (Postfix) with ESMTP id 2FDA31A00E9
	for <linux-mm@kvack.org>; Wed, 31 May 2023 02:30:23 +0000 (UTC)
X-FDA: 80848971126.25.8D0D72D
Received: from mail-qv1-f49.google.com (mail-qv1-f49.google.com [209.85.219.49])
	by imf10.hostedemail.com (Postfix) with ESMTP id 26349C001F
	for <linux-mm@kvack.org>; Wed, 31 May 2023 02:30:20 +0000 (UTC)
Authentication-Results: imf10.hostedemail.com;
	dkim=pass header.d=cmpxchg-org.20221208.gappssmtp.com header.s=20221208 header.b=W+CvHuZu;
	spf=pass (imf10.hostedemail.com: domain of hannes@cmpxchg.org designates 209.85.219.49 as permitted sender) smtp.mailfrom=hannes@cmpxchg.org;
	dmarc=pass (policy=none) header.from=cmpxchg.org
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com;
	s=arc-20220608; t=1685500221;
	h=from:from:sender:reply-to:subject:subject:date:date:
	 message-id:message-id:to:to:cc:cc:mime-version:mime-version:
	 content-type:content-type:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references:dkim-signature;
	bh=iwgqufLL+BZf+iFjUCvuUpn5v95zPOIiAfCvQNA1OTY=;
	b=oUoHaQKxHXMfLD7x8N7a/hsxXy5xeESgCQcxY3w4ULejIlsfY/iWO0MWr0S2CelkIKQTP0
	goYLinP1DFCIj4/0+T6MxdgD1M8XUeqsxB9QbBVQYUCE7s09GxXnbaMeH5KLAwJo2XfOof
	Mb4FV6bBRsAVTTQGiu22MJ8/hpvy8pM=
ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1685500221; a=rsa-sha256;
	cv=none;
	b=sr6M7PNPeWJ2pTMM27sNKgcPV+ekUPSGB6tttFgcH8DOjZfR0G0KZKy+REOHDUtpMVVzW1
	2m+6fLkFmHj6pBxi3hg2HjVbWXd39nmDteTnCvynDwGWLh6mfwQJo6fuaiPcH4XKC1tIug
	q6z0pBpBJYEomkItb0gkL9TOavCTxKs=
ARC-Authentication-Results: i=1;
	imf10.hostedemail.com;
	dkim=pass header.d=cmpxchg-org.20221208.gappssmtp.com header.s=20221208 header.b=W+CvHuZu;
	spf=pass (imf10.hostedemail.com: domain of hannes@cmpxchg.org designates 209.85.219.49 as permitted sender) smtp.mailfrom=hannes@cmpxchg.org;
	dmarc=pass (policy=none) header.from=cmpxchg.org
Received: by mail-qv1-f49.google.com with SMTP id 6a1803df08f44-61cd6191a62so27676146d6.3
        for <linux-mm@kvack.org>; Tue, 30 May 2023 19:30:20 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=cmpxchg-org.20221208.gappssmtp.com; s=20221208; t=1685500220; x=1688092220;
        h=in-reply-to:content-disposition:mime-version:references:message-id
         :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to;
        bh=iwgqufLL+BZf+iFjUCvuUpn5v95zPOIiAfCvQNA1OTY=;
        b=W+CvHuZuGxk1g1OV26ZeNlIdaJCON/0DbBboKDdO8uREAiJ58P6JokBX8FMBbCBXlO
         lu7QSX406XjqlPluKy0+0lB5BH7lpIe/nT341347lEBE0dwnEYXftfNTinQMFoLYdZf+
         pI1djVBRX5Fxf/PhkVULt/IrU8ShNx2hQU3HCv7JctlcEV9GfEStkxQe8ZB0iWZwB4EN
         jPa8OsujecSAcgyLXwBys3SVnwE5e7E2I2BaBqwvk5YqvtGMq96Jn8MhYl5IhOQlcnE+
         HMfvYt2glTogyHPldgDfq3k92r0mngVKEQhSDrBDZ51l5NyRPAK1ATDwd8zES4hwSuKk
         7x8A==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20221208; t=1685500220; x=1688092220;
        h=in-reply-to:content-disposition:mime-version:references:message-id
         :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date
         :message-id:reply-to;
        bh=iwgqufLL+BZf+iFjUCvuUpn5v95zPOIiAfCvQNA1OTY=;
        b=R4lRLQkgL/akhzokOPYuahJclI9HwdjBVvc2zch0YMBU2xzAufFWe7NnWSjc71ElNe
         /i9sOVPYRX0bNQtsjTnzeKuURTMX85EAjzD+f+qFwxKDxw1KFgDBAqeGVhBmQInwIXir
         5pl7PEr+dZ5aiGsaKP7ZuuCkFoS5iCZrimbhvCnogn7JKJzd68iRT7uttmiFdIaelU+l
         0/hakWsO4xIhYqk6CN266Y0R4TIiS6Cp32Me2idqBbvapqvxOmZ0USQszSzET+XtieG0
         nqMcLGPNOjmq7HRriaG4XES8QoXrZnA9eZJ1q9AxgneMl0WJadsGPG/+lAeMQSi0K8uQ
         pygQ==
X-Gm-Message-State: AC+VfDw3fliXPMOFgBvAFVamtr1uMbJwIKvsvZYK3z8biq0I7TCUyu5D
	UGAVET2ByPgxRvkycW3u9a8muQ==
X-Google-Smtp-Source: ACHHUZ5dnAdpJQEsgVQGfGbrrL9fOzvHwkNp3TtGdEGJEbU/6FyWqYAEns/B6P92wSR9d57Gs6o/1A==
X-Received: by 2002:a05:6214:e43:b0:616:4b40:5ea9 with SMTP id o3-20020a0562140e4300b006164b405ea9mr4908421qvc.40.1685500220181;
        Tue, 30 May 2023 19:30:20 -0700 (PDT)
Received: from localhost ([2620:10d:c091:400::5:8bb6])
        by smtp.gmail.com with ESMTPSA id k8-20020ad44208000000b006261a1cd7f7sm3147860qvp.10.2023.05.30.19.30.19
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Tue, 30 May 2023 19:30:19 -0700 (PDT)
Date: Tue, 30 May 2023 22:30:18 -0400
From: Johannes Weiner <hannes@cmpxchg.org>
To: Chris Li <chrisl@kernel.org>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, sjenning@redhat.com,
	ddstreet@ieee.org, vitaly.wool@konsulko.com, yosryahmed@google.com,
	kernel-team@fb.com
Subject: Re: [PATCH] mm: zswap: shrink until can accept
Message-ID: <20230531023018.GC102494@cmpxchg.org>
References: <20230524065051.6328-1-cerasuolodomenico@gmail.com>
 <ZHE7KWId1fvXLpUy@google.com>
 <CA+CLi1iC35exSaKaCmVmTWtna=9xBKHSyGNGmHFo=OqiSECGoA@mail.gmail.com>
 <ZHUSfg+z3wcaIhAT@google.com>
 <20230530041341.GB84971@cmpxchg.org>
 <ZHYNaxgcKs7PSp9I@google.com>
 <20230530155519.GB97194@cmpxchg.org>
 <ZHY+C0ICTah8/+V3@google.com>
 <20230530185451.GA101722@cmpxchg.org>
 <ZHadno3POk05o82x@google.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <ZHadno3POk05o82x@google.com>
X-Rspamd-Queue-Id: 26349C001F
X-Rspam-User: 
X-Stat-Signature: fpdh7haqm95f9eun9wornc5egjqeti4m
X-Rspamd-Server: rspam03
X-HE-Tag: 1685500220-951857
X-HE-Meta: U2FsdGVkX1/9p7kfncNbLcDkfpO8vu+WcQf9PopV61YNfYFlm/a4rl2VGj8v1WLzfB1bwlLrXNT1IpeNuLMFQ0kQfDw0c8wSsw3YYrMEttUwNDqIIwgEL/uoff4VBfuTSZiA6BWfhxWAPe1nAwhejqXtw0+PQkj77HEH/vzStH25hrTdjK9gbHNvGdFI8T8LYxKNU8IZOwBNDX+xmt3hRvb8PQjrbgjzK5yienX1qoSxNQrbvIjEq9bEZDB7952fiJa56ox/tj1poAxCRjrezYQ19S5zXNDxhIYFE8rcDB3YKtbBa0ds74rG2PXkI4gNs9g/5j/VzuzQIZYZKQxTN4AXqhZOcSaTayf1e1Ar767RGDvc0uymO1N6lBzBfASunpJ7WGzw74rLHg/eTJS9aXD86uUJVbsBuc15ZV0dFaEyVcdfn1ff+3XFAICo95CvQzoqSwbOaJ6GpnPK9m1sFjbceJ0vTghQP8ZuR/Or3SY63uC9qwApvswE+4fgf2jzkyxVLXJdMOsUGF2jaOQeJ4h6S5f9eNpr5Qqh3d5P6/uf/yMEFgDG5YqzMpWreso7s6P3gYDC+G8BSC8m11hzsTsylHYcfsW2cfQJTfPttZQxvWfWZraIqg44eHcC5N6zT+QzSUwfA/AHl/IMKR4U/gkoDSB26OZ1iEETLPAW81yoOr2EYM0gINww3pgDWFF5r1WeFAD1yB1Waj2Yrj6a7A6UIFddFt4fZmxaa/9QtASo75YS3gwYNkwyz7P3GjjQxjaIlHancI+lFun99nr/gjbmXzwOjFNVHiu0VHAnTsIlz4U4cU4C2skbRtlkMZiIdi7dQpay9u6IOEyAAY/BdpkWYwiwCYGDKDf54Gv+3kyuCjlT1pQJ5SztX64xXFMqBLplRUBvxVMatkRxoSfQXMcMUCYhi8DE5g3Wlr4Hro8sqpoHwsMBBfqWjefE/vCNea4FoNYavYI1pmsjKqV
 WA58dFDj
 OJLVGZeVOjksSVxhxrNUYskSMjo9OjhaIYgeLojJVe/gKy4Lc+9NIyxKKmYYyZ/gh99w7NrJ8XnI0RLBIacUfvFAzuTuDDJ2pxQ+rf5XdM7E8vJIiGUKuOtageT9EK48QerA50WbdlkZ116FcQtfWrlvF1JEGg2nlD2q8BbhiXIsKMaRhfU0tOWPTcctN8+ZsHMMLdg7O0royuEIxRfSNRauCjw5R7pk9QWJlJD0ZSzzmGYbblej7MRVkYpiZWPZoCdVeREQTOt+34RR365MEIc14GJQwQlkh/x8bS7xhlhCaNU6cFLe5BaOdY96W7VFD10ghV2O4Q3AZh6hemkjrbeb6uv8PXKP5nRVQj5N6SaP7pghiOKZzUMQyd3JWR1I7YLLlT1Bbvp2d5KWazMlTSvWJqvXb2BIzIJMoZMEr2RinGWs=
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>

On Tue, May 30, 2023 at 06:06:38PM -0700, Chris Li wrote:
> On Tue, May 30, 2023 at 02:54:51PM -0400, Johannes Weiner wrote:
> > > Maybe ENOMEM is a bad example. How about if the swap device
> > > just went bad and can't complete new IO writes?
> > 
> > This is actually outside the scope of zswap, and handled by the
> > swapcache (end_swap_bio_write).
> > 
> > Once the IO is submitted, zswap will ax its copy and leave the rest to
> > the swapcache. It behaves the same way as if zswap had never been
> > involved to begin with when the swap out fails on IO errors.
> > 
> > From a zswap perspective, there are no persistent errors in moving a
> > zswap entry back into the swapcache. Not just currently, but generally.
> Again, you are right that this zswap writeback is async.
> So the writeback error is NOT going to propagate to the
> shrink function.
> 
> With the current three pool backends that I looked at{zbud,
> z3fold,zsmalloc} they all have internal retry 8 times.
> Adding more retry did not fundamentally change the existing
> behavior.

Ah, but they're looping over different things.

The internal loop in the zs_reclaim_page() function is about walking
the LRU list until at least one backing page is freed.

Then there is zs_zpool_shrink() which calls zs_reclaim_page() until
the requested number of pages are freed.

Finally, there is shrink_worker(), which calls zs_zpool_shrink(). It
currently calls it for a single page when woken up during a store that
hits the zswap pool limit. This is the problematic one, because zswap
is very unlikely to go back to accepting stores after one page freed.

Domenico's patch isn't adding more retries for error conditions. It
ensures the pool is shrunk back down to where it accepts stores again.

The reason that it now looks at errors as well isn't to retry over
them (that's zs_reclaim_page()'s job). It's to avoid an infinite loop
in case there is an unexpectedly high rate of errors across a whole
series of pages (suggesting there is a bug of some kind).

> I look at all the possible error codes generated inside
> the reclaim code, the only noticeable errors are ENOMEM
> and concurrent swap invalidation or a racing swapin fault.

Right.

> BTW, zswap reclaim consumes memory. Keep on looping ENOMEM
> might cause more OOM. But that can exist in current code
> as well.

Right.

And this is temporary. Zswap will allocate a page to decompress in,
add it to the swapcache and kick off the IO. Once the page is written
out, it'll be reclaimed again. So while the consumption increases
temporarily, the end result is a net reduction by the amount of
compressed data that was written back from zswap.

This is typical for other types of reclaim as well, e.g. allocating
entries in the swapcache tree, allocating bios and IO requests...

> > > > Aside from -ENOMEM, writeback_entry will fail on concurrent swap
> > > > invalidation or a racing swapin fault. In both cases we should
> > > > absolutely keep trying other entries until the goal is met.
> > > 
> > > How about a narrower fix recognizing those error cases and making
> > > the inner loop continue in those errors?
> > 
> > Right, I just I don't really see the value proposition of this
> > complication, and I see some downsides (see below). No single entry
> > error should ever cause us to stop the wider reclaim loop.
> 
> That is until the current LRU list has been through once.
> I expect repeating the same list yields less reclaimed pages.

If we see failure due to invalidation races, the entries will free and
shrink the pool, thus getting us closer to the can_accept() condition.
We'll stop looping in the shrinker once enough entries have been freed
- whether we reclaimed them ourselves, or somebody else invalidated them.