From: Yosry Ahmed <yosry.ahmed@linux.dev>
To: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Minchan Kim <minchan@kernel.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCHv4 14/17] zsmalloc: make zspage lock preemptible
Date: Wed, 5 Feb 2025 19:06:08 +0000
Message-ID: <Z6O2oPP7lyRGXer_@google.com>
In-Reply-To: <6vtpamir4bvn3snlj36tfmnmpcbd6ks6m3sdn7ewmoles7jhau@nbezqbnoukzv>

On Wed, Feb 05, 2025 at 11:43:16AM +0900, Sergey Senozhatsky wrote:
> On (25/02/04 17:19), Yosry Ahmed wrote:
> > > sizeof(struct zs_page) change is one thing.  Another thing is that
> > > zspage->lock is taken from atomic sections, pretty much everywhere.
> > > compaction/migration write-lock it under pool rwlock and class spinlock,
> > > but both compaction and migration now EAGAIN if the lock is locked
> > > already, so that is sorted out.
> > > 
> > > The remaining problem is map(), which takes zspage read-lock under pool
> > > rwlock.  RFC series (which you hated with passion :P) converted all zsmalloc
> > > into preemptible ones because of this - zspage->lock is a nested leaf-lock,
> > > so it cannot schedule unless locks it's nested under permit it (needless to
> > > say neither rwlock nor spinlock permit it).
> > 
> > Hmm, so we want the lock to be preemptible, but we don't want to use an
> > existing preemptible lock because it may be held from atomic context.
> > 
> > I think one problem here is that the lock you are introducing is a
> > spinning lock but the lock holder can be preempted. This is why spinning
> > locks do not allow preemption. Others waiting for the lock can spin
> > waiting for a process that is scheduled out.
> > 
> > For example, the compaction/migration code could be sleeping holding the
> > write lock, and a map() call would spin waiting for that sleeping task.
> 
> write-lock holders cannot sleep, that's the key part.
> 
> So the rules are:
> 
> 1) writer cannot sleep
>    - migration/compaction runs in atomic context and grabs
>      write-lock only from atomic context
>    - write-locking function disables preemption before lock(), just to be
>      safe, and enables it after unlock()
> 
> 2) writer does not spin waiting
>    - that's why there is only write_try_lock function
>    - compaction and migration bail out when they cannot lock the
>      zspage
> 
> 3) readers can sleep and can spin waiting for a lock
>    - other (even preempted) readers don't block new readers
>    - writers don't sleep, they always unlock

That's useful, thanks. If we go with custom locking we need to document
this clearly and add debug checks where possible.
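
To make the three rules quoted above concrete, here is a minimal
userspace sketch using C11 atomics. It is not the code from the patch:
the zl_* names, the state encoding (0 = unlocked, -1 = write-locked,
N > 0 = N readers) and the use of sched_yield() as a stand-in for a
kernel spin-wait hint are all assumptions made for illustration. The
rule that a writer must not sleep cannot be enforced in userspace, so it
appears only as a comment; the assert() calls show the kind of debug
check that could catch unbalanced unlocks.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state encoding, chosen for this sketch only. */
#define ZL_UNLOCKED 0
#define ZL_WRLOCKED (-1)

struct zl_lock {
	atomic_int state;	/* 0, -1, or the number of active readers */
};

static void zl_init(struct zl_lock *l)
{
	atomic_init(&l->state, ZL_UNLOCKED);
}

/* Rule 3: readers may spin, and other readers never block a new reader. */
static void zl_read_lock(struct zl_lock *l)
{
	int old = atomic_load_explicit(&l->state, memory_order_relaxed);

	for (;;) {
		if (old == ZL_WRLOCKED) {
			/* A writer never sleeps, so this wait is bounded. */
			sched_yield();
			old = atomic_load_explicit(&l->state,
						   memory_order_relaxed);
			continue;
		}
		/* On failure the CAS reloads 'old' and the loop retries. */
		if (atomic_compare_exchange_weak_explicit(&l->state, &old,
							  old + 1,
							  memory_order_acquire,
							  memory_order_relaxed))
			return;
	}
}

static void zl_read_unlock(struct zl_lock *l)
{
	int prev = atomic_fetch_sub_explicit(&l->state, 1,
					     memory_order_release);

	/* Debug check: there must have been at least one reader. */
	assert(prev > 0);
	(void)prev;
}

/*
 * Rule 2: the writer never spins, so only a trylock exists.
 * Rule 1: the caller must not sleep between a successful trylock and
 * zl_write_unlock() (not enforceable in this userspace sketch).
 */
static bool zl_write_trylock(struct zl_lock *l)
{
	int expected = ZL_UNLOCKED;

	return atomic_compare_exchange_strong_explicit(&l->state, &expected,
						       ZL_WRLOCKED,
						       memory_order_acquire,
						       memory_order_relaxed);
}

static void zl_write_unlock(struct zl_lock *l)
{
	/* Debug check: only a write-locked lock may be write-unlocked. */
	assert(atomic_load_explicit(&l->state, memory_order_relaxed) ==
	       ZL_WRLOCKED);
	atomic_store_explicit(&l->state, ZL_UNLOCKED, memory_order_release);
}
```

In this sketch a caller playing the compaction/migration role would
treat a false return from zl_write_trylock() as the "bail out" case and
move on, while a map()-like caller would take zl_read_lock() and could
safely be preempted while holding it.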

> 
> > I wonder if there's a way to rework the locking instead to avoid the
> > nesting. It seems like sometimes we lock the zspage with the pool lock
> > held, sometimes with the class lock held, and sometimes with no lock
> > held.
> > 
> > What are the rules here for acquiring the zspage lock?
> 
> Most of that code is not written by me, but I think the rule is to disable
> "migration" be it via pool lock or class lock.

It seems like we're not holding either of these locks in
async_free_zspage() when we call lock_zspage(). Is it safe for a
different reason?

> 
> > Do we need to hold another lock just to make sure the zspage does not go
> > away from under us?
> 
> Yes, the page cannot go away via "normal" path:
>    zs_free(last object) -> zspage becomes empty -> free zspage
> 
> so when we have active mapping() it's only migration and compaction
> that can free zspage (its content is migrated and so it becomes empty).
> 
> > Can we use RCU or something similar to do that instead?
> 
> Hmm, I don't know... zsmalloc is not "read-mostly", it's whatever data
> patterns the clients have.   I suspect we'd need to synchronize RCU every
> time a zspage is freed: zs_free() [this one is complicated], or migration,
> or compaction?  Sounds like anti-pattern for RCU?

Can't we use kfree_rcu() instead of synchronizing? Not sure if this
would still be an antipattern tbh. It just seems like the current
locking scheme is really complicated :/

