Re: [PATCH] mm: zswap: shrink until can accept

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Chris Li <chrisl@kernel.org>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com,
	yosryahmed@google.com, kernel-team@fb.com
Subject: Re: [PATCH] mm: zswap: shrink until can accept
Date: Tue, 30 May 2023 11:18:51 -0700	[thread overview]
Message-ID: <ZHY+C0ICTah8/+V3@google.com> (raw)
In-Reply-To: <20230530155519.GB97194@cmpxchg.org>

On Tue, May 30, 2023 at 11:55:19AM -0400, Johannes Weiner wrote:
> On Tue, May 30, 2023 at 07:51:23AM -0700, Chris Li wrote:
> > Thanks for pointing out -ENOMEM shouldn't be persistent.
> > Points taken.
> > 
> > The original point of not retrying the persistent error
> > still holds.
> 
> Okay, but what persistent errors are you referring to?

Maybe ENOMEM is a bad example. How about if the swap device
just went bad and can't complete new IO writes?

> Aside from -ENOMEM, writeback_entry will fail on concurrent swap
> invalidation or a racing swapin fault. In both cases we should
> absolutely keep trying other entries until the goal is met.

How about a narrower fix recognizing those error cases and making
the inner loop continue in those errors?

> > > Should it be fixed before merging this patch? I don't think the
> > > ordering matters. Right now the -ENOMEM case invokes OOM, so it isn't
> > > really persistent either. Retrying a few times in that case certainly
> > > doesn't seem to make things worse.
> > 
> > If you already know the error is persistent, retrying is wasting
> > CPU. It can pertancial hold locks during the retry, which can
> > slow someone else down.
> 
> That's a bit of a truism. How does this pertain to the zswap reclaim
> situation?

See the above narrower fix alternative.
> 
> > > > > As I was writing to Yosry, the differentiation would be a great improvement
> > > > > here, I just have a patch set in the queue that moves the inner reclaim loop
> > > > > from the zpool driver up to zswap. With that, updating the error handling
> > > > > would be more convenient as it would be done in one place instead of three.i
> > > > 
> > > > This has tricky complications as well. The current shrink interface
> > > > doesn't support continuing from the previous error position. If you want
> > > > to avoid a repeat attempt if the page has a writeback error, you kinda
> > > > of need a way to skip that page.
> > > 
> > > A page that fails to reclaim is put back to the tail of the LRU, so
> > > for all intents and purposes it will be skipped. In the rare and
> > 
> > Do you mean the page is treated as hot again?
> > 
> > Wouldn't that be undesirable from the app's point of view?
> 
> That's current backend LRU behavior. Is it optimal? That's certainly
> debatable. But it's tangential to this patch. The point is that
> capping retries to a fixed number of failures works correctly as a
> safety precaution and introduces no (new) undesirable behavior.
> 
> It's entirely moot once we refactor the backend page LRU to the zswap
> entry LRU. The only time we'll fail to reclaim an entry is if we race
> with something already freeing it, so it doesn't really matter where
> we put it.

Agree with you there. A bit side tracked.

> > > extreme case where it's the only page left on the list, I again don't
> > > see how retrying a few times will make the situation worse.
> > > 
> > > In practice, IMO there is little upside in trying to be more
> > > discerning about the error codes. Simple seems better here.
> > 
> > Just trying to think about what should be the precise loop termination
> > condition here.
> > 
> > I still feel blindly trying a few times is a very imprecise condition.
> 
> The precise termination condition is when can_accept() returns true
> again. The safety cap is only added as precaution to avoid infinite
> loops if something goes wrong or unexpected, now or in the future.

In my mind, that statement already suggests can_accept() is not
*precise*, considering the avoid infinite loop.
e.g. Do we know what is the optimal cap value and why that value
is optical?

Putting the definition of precise aside, I do see the unconditional
retry can have unwanted effects.

Chris

next prev parent reply	other threads:[~2023-05-30 18:18 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-24  6:50 [PATCH] mm: zswap: shrink until can accept Domenico Cerasuolo
2023-05-25  0:58 ` Yosry Ahmed
2023-05-25 16:52   ` Domenico Cerasuolo
2023-05-25 19:09     ` Yosry Ahmed
2023-05-26  8:52       ` Domenico Cerasuolo
2023-05-26 16:53         ` Yosry Ahmed
2023-05-26 10:18 ` Vitaly Wool
2023-05-26 13:56   ` Johannes Weiner
2023-05-26 23:05 ` Chris Li
2023-05-28  8:53   ` Domenico Cerasuolo
2023-05-29 21:00     ` Chris Li
2023-05-30  4:13       ` Johannes Weiner
2023-05-30 14:51         ` Chris Li
2023-05-30 15:55           ` Johannes Weiner
2023-05-30 18:18             ` Chris Li [this message]
2023-05-30 18:54               ` Johannes Weiner
2023-05-31  1:06                 ` Chris Li
2023-05-31  2:30                   ` Johannes Weiner
2023-06-01 15:27                   ` Domenico Cerasuolo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZHY+C0ICTah8/+V3@google.com \
    --to=chrisl@kernel.org \
    --cc=cerasuolodomenico@gmail.com \
    --cc=ddstreet@ieee.org \
    --cc=hannes@cmpxchg.org \
    --cc=kernel-team@fb.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=sjenning@redhat.com \
    --cc=vitaly.wool@konsulko.com \
    --cc=yosryahmed@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.