From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0A998C77B73 for ; Wed, 31 May 2023 02:30:24 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 965606B0072; Tue, 30 May 2023 22:30:23 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 9157E280002; Tue, 30 May 2023 22:30:23 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7DD1A280001; Tue, 30 May 2023 22:30:23 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 6DEC46B0072 for ; Tue, 30 May 2023 22:30:23 -0400 (EDT) Received: from smtpin25.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id 2FDA31A00E9 for ; Wed, 31 May 2023 02:30:23 +0000 (UTC) X-FDA: 80848971126.25.8D0D72D Received: from mail-qv1-f49.google.com (mail-qv1-f49.google.com [209.85.219.49]) by imf10.hostedemail.com (Postfix) with ESMTP id 26349C001F for ; Wed, 31 May 2023 02:30:20 +0000 (UTC) Authentication-Results: imf10.hostedemail.com; dkim=pass header.d=cmpxchg-org.20221208.gappssmtp.com header.s=20221208 header.b=W+CvHuZu; spf=pass (imf10.hostedemail.com: domain of hannes@cmpxchg.org designates 209.85.219.49 as permitted sender) smtp.mailfrom=hannes@cmpxchg.org; dmarc=pass (policy=none) header.from=cmpxchg.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1685500221; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=iwgqufLL+BZf+iFjUCvuUpn5v95zPOIiAfCvQNA1OTY=; b=oUoHaQKxHXMfLD7x8N7a/hsxXy5xeESgCQcxY3w4ULejIlsfY/iWO0MWr0S2CelkIKQTP0 goYLinP1DFCIj4/0+T6MxdgD1M8XUeqsxB9QbBVQYUCE7s09GxXnbaMeH5KLAwJo2XfOof Mb4FV6bBRsAVTTQGiu22MJ8/hpvy8pM= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1685500221; a=rsa-sha256; cv=none; b=sr6M7PNPeWJ2pTMM27sNKgcPV+ekUPSGB6tttFgcH8DOjZfR0G0KZKy+REOHDUtpMVVzW1 2m+6fLkFmHj6pBxi3hg2HjVbWXd39nmDteTnCvynDwGWLh6mfwQJo6fuaiPcH4XKC1tIug q6z0pBpBJYEomkItb0gkL9TOavCTxKs= ARC-Authentication-Results: i=1; imf10.hostedemail.com; dkim=pass header.d=cmpxchg-org.20221208.gappssmtp.com header.s=20221208 header.b=W+CvHuZu; spf=pass (imf10.hostedemail.com: domain of hannes@cmpxchg.org designates 209.85.219.49 as permitted sender) smtp.mailfrom=hannes@cmpxchg.org; dmarc=pass (policy=none) header.from=cmpxchg.org Received: by mail-qv1-f49.google.com with SMTP id 6a1803df08f44-61cd6191a62so27676146d6.3 for ; Tue, 30 May 2023 19:30:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20221208.gappssmtp.com; s=20221208; t=1685500220; x=1688092220; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=iwgqufLL+BZf+iFjUCvuUpn5v95zPOIiAfCvQNA1OTY=; b=W+CvHuZuGxk1g1OV26ZeNlIdaJCON/0DbBboKDdO8uREAiJ58P6JokBX8FMBbCBXlO lu7QSX406XjqlPluKy0+0lB5BH7lpIe/nT341347lEBE0dwnEYXftfNTinQMFoLYdZf+ pI1djVBRX5Fxf/PhkVULt/IrU8ShNx2hQU3HCv7JctlcEV9GfEStkxQe8ZB0iWZwB4EN jPa8OsujecSAcgyLXwBys3SVnwE5e7E2I2BaBqwvk5YqvtGMq96Jn8MhYl5IhOQlcnE+ HMfvYt2glTogyHPldgDfq3k92r0mngVKEQhSDrBDZ51l5NyRPAK1ATDwd8zES4hwSuKk 7x8A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1685500220; x=1688092220; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=iwgqufLL+BZf+iFjUCvuUpn5v95zPOIiAfCvQNA1OTY=; b=R4lRLQkgL/akhzokOPYuahJclI9HwdjBVvc2zch0YMBU2xzAufFWe7NnWSjc71ElNe /i9sOVPYRX0bNQtsjTnzeKuURTMX85EAjzD+f+qFwxKDxw1KFgDBAqeGVhBmQInwIXir 5pl7PEr+dZ5aiGsaKP7ZuuCkFoS5iCZrimbhvCnogn7JKJzd68iRT7uttmiFdIaelU+l 0/hakWsO4xIhYqk6CN266Y0R4TIiS6Cp32Me2idqBbvapqvxOmZ0USQszSzET+XtieG0 nqMcLGPNOjmq7HRriaG4XES8QoXrZnA9eZJ1q9AxgneMl0WJadsGPG/+lAeMQSi0K8uQ pygQ== X-Gm-Message-State: AC+VfDw3fliXPMOFgBvAFVamtr1uMbJwIKvsvZYK3z8biq0I7TCUyu5D UGAVET2ByPgxRvkycW3u9a8muQ== X-Google-Smtp-Source: ACHHUZ5dnAdpJQEsgVQGfGbrrL9fOzvHwkNp3TtGdEGJEbU/6FyWqYAEns/B6P92wSR9d57Gs6o/1A== X-Received: by 2002:a05:6214:e43:b0:616:4b40:5ea9 with SMTP id o3-20020a0562140e4300b006164b405ea9mr4908421qvc.40.1685500220181; Tue, 30 May 2023 19:30:20 -0700 (PDT) Received: from localhost ([2620:10d:c091:400::5:8bb6]) by smtp.gmail.com with ESMTPSA id k8-20020ad44208000000b006261a1cd7f7sm3147860qvp.10.2023.05.30.19.30.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 30 May 2023 19:30:19 -0700 (PDT) Date: Tue, 30 May 2023 22:30:18 -0400 From: Johannes Weiner To: Chris Li Cc: Domenico Cerasuolo , linux-mm@kvack.org, linux-kernel@vger.kernel.org, sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com, yosryahmed@google.com, kernel-team@fb.com Subject: Re: [PATCH] mm: zswap: shrink until can accept Message-ID: <20230531023018.GC102494@cmpxchg.org> References: <20230524065051.6328-1-cerasuolodomenico@gmail.com> <20230530041341.GB84971@cmpxchg.org> <20230530155519.GB97194@cmpxchg.org> <20230530185451.GA101722@cmpxchg.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Rspamd-Queue-Id: 26349C001F X-Rspam-User: X-Stat-Signature: fpdh7haqm95f9eun9wornc5egjqeti4m X-Rspamd-Server: rspam03 X-HE-Tag: 1685500220-951857 X-HE-Meta: U2FsdGVkX1/9p7kfncNbLcDkfpO8vu+WcQf9PopV61YNfYFlm/a4rl2VGj8v1WLzfB1bwlLrXNT1IpeNuLMFQ0kQfDw0c8wSsw3YYrMEttUwNDqIIwgEL/uoff4VBfuTSZiA6BWfhxWAPe1nAwhejqXtw0+PQkj77HEH/vzStH25hrTdjK9gbHNvGdFI8T8LYxKNU8IZOwBNDX+xmt3hRvb8PQjrbgjzK5yienX1qoSxNQrbvIjEq9bEZDB7952fiJa56ox/tj1poAxCRjrezYQ19S5zXNDxhIYFE8rcDB3YKtbBa0ds74rG2PXkI4gNs9g/5j/VzuzQIZYZKQxTN4AXqhZOcSaTayf1e1Ar767RGDvc0uymO1N6lBzBfASunpJ7WGzw74rLHg/eTJS9aXD86uUJVbsBuc15ZV0dFaEyVcdfn1ff+3XFAICo95CvQzoqSwbOaJ6GpnPK9m1sFjbceJ0vTghQP8ZuR/Or3SY63uC9qwApvswE+4fgf2jzkyxVLXJdMOsUGF2jaOQeJ4h6S5f9eNpr5Qqh3d5P6/uf/yMEFgDG5YqzMpWreso7s6P3gYDC+G8BSC8m11hzsTsylHYcfsW2cfQJTfPttZQxvWfWZraIqg44eHcC5N6zT+QzSUwfA/AHl/IMKR4U/gkoDSB26OZ1iEETLPAW81yoOr2EYM0gINww3pgDWFF5r1WeFAD1yB1Waj2Yrj6a7A6UIFddFt4fZmxaa/9QtASo75YS3gwYNkwyz7P3GjjQxjaIlHancI+lFun99nr/gjbmXzwOjFNVHiu0VHAnTsIlz4U4cU4C2skbRtlkMZiIdi7dQpay9u6IOEyAAY/BdpkWYwiwCYGDKDf54Gv+3kyuCjlT1pQJ5SztX64xXFMqBLplRUBvxVMatkRxoSfQXMcMUCYhi8DE5g3Wlr4Hro8sqpoHwsMBBfqWjefE/vCNea4FoNYavYI1pmsjKqV WA58dFDj OJLVGZeVOjksSVxhxrNUYskSMjo9OjhaIYgeLojJVe/gKy4Lc+9NIyxKKmYYyZ/gh99w7NrJ8XnI0RLBIacUfvFAzuTuDDJ2pxQ+rf5XdM7E8vJIiGUKuOtageT9EK48QerA50WbdlkZ116FcQtfWrlvF1JEGg2nlD2q8BbhiXIsKMaRhfU0tOWPTcctN8+ZsHMMLdg7O0royuEIxRfSNRauCjw5R7pk9QWJlJD0ZSzzmGYbblej7MRVkYpiZWPZoCdVeREQTOt+34RR365MEIc14GJQwQlkh/x8bS7xhlhCaNU6cFLe5BaOdY96W7VFD10ghV2O4Q3AZh6hemkjrbeb6uv8PXKP5nRVQj5N6SaP7pghiOKZzUMQyd3JWR1I7YLLlT1Bbvp2d5KWazMlTSvWJqvXb2BIzIJMoZMEr2RinGWs= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, May 30, 2023 at 06:06:38PM -0700, Chris Li wrote: > On Tue, May 30, 2023 at 02:54:51PM -0400, Johannes Weiner wrote: > > > Maybe ENOMEM is a bad example. How about if the swap device > > > just went bad and can't complete new IO writes? > > > > This is actually outside the scope of zswap, and handled by the > > swapcache (end_swap_bio_write). > > > > Once the IO is submitted, zswap will ax its copy and leave the rest to > > the swapcache. It behaves the same way as if zswap had never been > > involved to begin with when the swap out fails on IO errors. > > > > From a zswap perspective, there are no persistent errors in moving a > > zswap entry back into the swapcache. Not just currently, but generally. > Again, you are right that this zswap writeback is async. > So the writeback error is NOT going to propagate to the > shrink function. > > With the current three pool backends that I looked at{zbud, > z3fold,zsmalloc} they all have internal retry 8 times. > Adding more retry did not fundamentally change the existing > behavior. Ah, but they're looping over different things. The internal loop in the zs_reclaim_page() function is about walking the LRU list until at least one backing page is freed. Then there is zs_zpool_shrink() which calls zs_reclaim_page() until the requested number of pages are freed. Finally, there is shrink_worker(), which calls zs_zpool_shrink(). It currently calls it for a single page when woken up during a store that hits the zswap pool limit. This is the problematic one, because zswap is very unlikely to go back to accepting stores after one page freed. Domenico's patch isn't adding more retries for error conditions. It ensures the pool is shrunk back down to where it accepts stores again. The reason that it now looks at errors as well isn't to retry over them (that's zs_reclaim_page()'s job). It's to avoid an infinite loop in case there is an unexpectedly high rate of errors across a whole series of pages (suggesting there is a bug of some kind). > I look at all the possible error codes generated inside > the reclaim code, the only noticeable errors are ENOMEM > and concurrent swap invalidation or a racing swapin fault. Right. > BTW, zswap reclaim consumes memory. Keep on looping ENOMEM > might cause more OOM. But that can exist in current code > as well. Right. And this is temporary. Zswap will allocate a page to decompress in, add it to the swapcache and kick off the IO. Once the page is written out, it'll be reclaimed again. So while the consumption increases temporarily, the end result is a net reduction by the amount of compressed data that was written back from zswap. This is typical for other types of reclaim as well, e.g. allocating entries in the swapcache tree, allocating bios and IO requests... > > > > Aside from -ENOMEM, writeback_entry will fail on concurrent swap > > > > invalidation or a racing swapin fault. In both cases we should > > > > absolutely keep trying other entries until the goal is met. > > > > > > How about a narrower fix recognizing those error cases and making > > > the inner loop continue in those errors? > > > > Right, I just I don't really see the value proposition of this > > complication, and I see some downsides (see below). No single entry > > error should ever cause us to stop the wider reclaim loop. > > That is until the current LRU list has been through once. > I expect repeating the same list yields less reclaimed pages. If we see failure due to invalidation races, the entries will free and shrink the pool, thus getting us closer to the can_accept() condition. We'll stop looping in the shrinker once enough entries have been freed - whether we reclaimed them ourselves, or somebody else invalidated them.