Re: [PATCHv4 14/17] zsmalloc: make zspage lock preemptible

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Yosry Ahmed <yosry.ahmed@linux.dev>
To: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Minchan Kim <minchan@kernel.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCHv4 14/17] zsmalloc: make zspage lock preemptible
Date: Thu, 6 Feb 2025 16:19:36 +0000	[thread overview]
Message-ID: <Z6ThGFt6wyNpx9xi@google.com> (raw)
In-Reply-To: <6uhsj4bckhursiblkxe54azfgyqal6tq2de3lpkxw6omkised6@uylodcjruuei>

On Thu, Feb 06, 2025 at 12:05:55PM +0900, Sergey Senozhatsky wrote:
> On (25/02/05 19:06), Yosry Ahmed wrote:
> > > > For example, the compaction/migration code could be sleeping holding the
> > > > write lock, and a map() call would spin waiting for that sleeping task.
> > > 
> > > write-lock holders cannot sleep, that's the key part.
> > > 
> > > So the rules are:
> > > 
> > > 1) writer cannot sleep
> > >    - migration/compaction runs in atomic context and grabs
> > > 	 write-lock only from atomic context
> > >    - write-locking function disables preemption before lock(), just to be
> > > 	 safe, and enables it after unlock()
> > > 
> > > 2) writer does not spin waiting
> > >    - that's why there is only write_try_lock function
> > > 	  - compaction and migration bail out when they cannot lock the
> > > 		zspage
> > > 
> > > 3) readers can sleep and can spin waiting for a lock
> > >    - other (even preempted) readers don't block new readers
> > >    - writers don't sleep, they always unlock
> > 
> > That's useful, thanks. If we go with custom locking we need to document
> > this clearly and add debug checks where possible.
> 
> Sure.  That's what it currently looks like (can always improve)
> 
> ---
> /*
>  * zspage lock permits preemption on the reader-side (there can be multiple
>  * readers).  Writers (exclusive zspage ownership), on the other hand, are
>  * always run in atomic context and cannot spin waiting for a (potentially
>  * preempted) reader to unlock zspage.  This, basically, means that writers
>  * can only call write-try-lock and must bail out if it didn't succeed.
>  *
>  * At the same time, writers cannot reschedule under zspage write-lock,
>  * so readers can spin waiting for the writer to unlock zspage.
>  */
> static void zspage_read_lock(struct zspage *zspage)
> {
>         atomic_t *lock = &zspage->lock;
>         int old = atomic_read_acquire(lock);
> 
>         do {
>                 if (old == ZS_PAGE_WRLOCKED) {
>                         cpu_relax();
>                         old = atomic_read_acquire(lock);
>                         continue;
>                 }
>         } while (!atomic_try_cmpxchg_acquire(lock, &old, old + 1));
> 
> #ifdef CONFIG_DEBUG_LOCK_ALLOC
>         rwsem_acquire_read(&zspage->lockdep_map, 0, 0, _RET_IP_);
> #endif
> }
> 
> static void zspage_read_unlock(struct zspage *zspage)
> {
>         atomic_dec_return_release(&zspage->lock);
> 
> #ifdef CONFIG_DEBUG_LOCK_ALLOC
>         rwsem_release(&zspage->lockdep_map, _RET_IP_);
> #endif
> }
> 
> static bool zspage_try_write_lock(struct zspage *zspage)
> {
>         atomic_t *lock = &zspage->lock;
>         int old = ZS_PAGE_UNLOCKED;
> 
>         preempt_disable();
>         if (atomic_try_cmpxchg_acquire(lock, &old, ZS_PAGE_WRLOCKED)) {
> #ifdef CONFIG_DEBUG_LOCK_ALLOC
>                 rwsem_acquire(&zspage->lockdep_map, 0, 0, _RET_IP_);
> #endif
>                 return true;
>         }
> 
>         preempt_enable();
>         return false;
> }
> 
> static void zspage_write_unlock(struct zspage *zspage)
> {
>         atomic_set_release(&zspage->lock, ZS_PAGE_UNLOCKED);
> #ifdef CONFIG_DEBUG_LOCK_ALLOC
>         rwsem_release(&zspage->lockdep_map, _RET_IP_);
> #endif
>         preempt_enable();
> }
> ---
> 
> Maybe I'll just copy-paste the locking rules list, a list is always cleaner.

Thanks. I think it would be nice if we could also get someone with
locking expertise to take a look at this.

> 
> > > > I wonder if there's a way to rework the locking instead to avoid the
> > > > nesting. It seems like sometimes we lock the zspage with the pool lock
> > > > held, sometimes with the class lock held, and sometimes with no lock
> > > > held.
> > > > 
> > > > What are the rules here for acquiring the zspage lock?
> > > 
> > > Most of that code is not written by me, but I think the rule is to disable
> > > "migration" be it via pool lock or class lock.
> > 
> > It seems like we're not holding either of these locks in
> > async_free_zspage() when we call lock_zspage(). Is it safe for a
> > different reason?
> 
> I think we hold size class lock there. async-free is only for pages that
> reached 0 usage ratio (empty fullness group), so they don't hold any
> objects any more and from her such zspages either get freed or
> find_get_zspage() recovers them from fullness 0 and allocates an object.
> Both are synchronized by size class lock.
> 
> > > Hmm, I don't know... zsmalloc is not "read-mostly", it's whatever data
> > > patterns the clients have.   I suspect we'd need to synchronize RCU every
> > > time a zspage is freed: zs_free() [this one is complicated], or migration,
> > > or compaction?  Sounds like anti-pattern for RCU?
> > 
> > Can't we use kfree_rcu() instead of synchronizing? Not sure if this
> > would still be an antipattern tbh.
> 
> Yeah, I don't know.  The last time I wrongly used kfree_rcu() it caused a
> 27% performance drop (some internal code).  This zspage thingy maybe will
> be better, but still has a potential to generate high numbers of RCU calls,
> depends on the clients.  Probably the chances are too high.  Apart from
> that, kvfree_rcu() can sleep, as far as I understand, so zram might have
> some extra things to deal with, namely slot-free notifications which can
> be called from softirq, and always called under spinlock:
> 
>  mm slot-free -> zram slot-free -> zs_free -> empty zspage -> kfree_rcu
> 
> > It just seems like the current locking scheme is really complicated :/
> 
> That's very true.

Seems like we have to compromise either way, custom locking or we enter
into a new complexity realm with RCU freeing.

next prev parent reply	other threads:[~2025-02-06 16:19 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-31  9:05 [PATCHv4 00/17] zsmalloc/zram: there be preemption Sergey Senozhatsky
2025-01-31  9:06 ` [PATCHv4 01/17] zram: switch to non-atomic entry locking Sergey Senozhatsky
2025-01-31 11:41   ` Hillf Danton
2025-02-03  3:21     ` Sergey Senozhatsky
2025-02-03  3:52       ` Sergey Senozhatsky
2025-02-03 12:39       ` Sergey Senozhatsky
2025-01-31 22:55   ` Andrew Morton
2025-02-03  3:26     ` Sergey Senozhatsky
2025-02-03  7:11       ` Sergey Senozhatsky
2025-02-03  7:33         ` Sergey Senozhatsky
2025-02-04  0:19       ` Andrew Morton
2025-02-04  4:22         ` Sergey Senozhatsky
2025-02-06  7:01     ` Sergey Senozhatsky
2025-02-06  7:38       ` Sebastian Andrzej Siewior
2025-02-06  7:47         ` Sergey Senozhatsky
2025-02-06  8:13           ` Sebastian Andrzej Siewior
2025-02-06  8:17             ` Sergey Senozhatsky
2025-02-06  8:26               ` Sebastian Andrzej Siewior
2025-02-06  8:29                 ` Sergey Senozhatsky
2025-01-31  9:06 ` [PATCHv4 02/17] zram: do not use per-CPU compression streams Sergey Senozhatsky
2025-02-01  9:21   ` Kairui Song
2025-02-03  3:49     ` Sergey Senozhatsky
2025-02-03 21:00       ` Yosry Ahmed
2025-02-06 12:26         ` Sergey Senozhatsky
2025-02-06  6:55       ` Kairui Song
2025-02-06  7:22         ` Sergey Senozhatsky
2025-02-06  8:22           ` Sergey Senozhatsky
2025-02-06 16:16           ` Yosry Ahmed
2025-02-07  2:56             ` Sergey Senozhatsky
2025-02-07  6:12               ` Sergey Senozhatsky
2025-02-07 21:07                 ` Yosry Ahmed
2025-02-08 16:20                   ` Sergey Senozhatsky
2025-02-08 16:41                     ` Sergey Senozhatsky
2025-02-09  6:22                     ` Sergey Senozhatsky
2025-02-09  7:42                       ` Sergey Senozhatsky
2025-01-31  9:06 ` [PATCHv4 03/17] zram: remove crypto include Sergey Senozhatsky
2025-01-31  9:06 ` [PATCHv4 04/17] zram: remove max_comp_streams device attr Sergey Senozhatsky
2025-01-31  9:06 ` [PATCHv4 05/17] zram: remove two-staged handle allocation Sergey Senozhatsky
2025-01-31  9:06 ` [PATCHv4 06/17] zram: permit reclaim in zstd custom allocator Sergey Senozhatsky
2025-01-31  9:06 ` [PATCHv4 07/17] zram: permit reclaim in recompression handle allocation Sergey Senozhatsky
2025-01-31  9:06 ` [PATCHv4 08/17] zram: remove writestall zram_stats member Sergey Senozhatsky
2025-01-31  9:06 ` [PATCHv4 09/17] zram: limit max recompress prio to num_active_comps Sergey Senozhatsky
2025-01-31  9:06 ` [PATCHv4 10/17] zram: filter out recomp targets based on priority Sergey Senozhatsky
2025-01-31  9:06 ` [PATCHv4 11/17] zram: unlock slot during recompression Sergey Senozhatsky
2025-01-31  9:06 ` [PATCHv4 12/17] zsmalloc: factor out pool locking helpers Sergey Senozhatsky
2025-01-31 15:46   ` Yosry Ahmed
2025-02-03  4:57     ` Sergey Senozhatsky
2025-01-31  9:06 ` [PATCHv4 13/17] zsmalloc: factor out size-class " Sergey Senozhatsky
2025-01-31  9:06 ` [PATCHv4 14/17] zsmalloc: make zspage lock preemptible Sergey Senozhatsky
2025-01-31 15:51   ` Yosry Ahmed
2025-02-03  3:13     ` Sergey Senozhatsky
2025-02-03  4:56       ` Sergey Senozhatsky
2025-02-03 21:11       ` Yosry Ahmed
2025-02-04  6:59         ` Sergey Senozhatsky
2025-02-04 17:19           ` Yosry Ahmed
2025-02-05  2:43             ` Sergey Senozhatsky
2025-02-05 19:06               ` Yosry Ahmed
2025-02-06  3:05                 ` Sergey Senozhatsky
2025-02-06  3:28                   ` Sergey Senozhatsky
2025-02-06 16:19                   ` Yosry Ahmed [this message]
2025-02-07  2:48                     ` Sergey Senozhatsky
2025-02-07 21:09                       ` Yosry Ahmed
2025-02-12  5:00                         ` Sergey Senozhatsky
2025-02-12 15:35                           ` Yosry Ahmed
2025-02-13  2:18                             ` Sergey Senozhatsky
2025-02-13  2:57                               ` Yosry Ahmed
2025-02-13  7:21                                 ` Sergey Senozhatsky
2025-02-13  8:22                                   ` Sergey Senozhatsky
2025-02-13 15:25                                     ` Yosry Ahmed
2025-02-14  3:33                                       ` Sergey Senozhatsky
2025-01-31  9:06 ` [PATCHv4 15/17] zsmalloc: introduce new object mapping API Sergey Senozhatsky
2025-01-31  9:06 ` [PATCHv4 16/17] zram: switch to new zsmalloc " Sergey Senozhatsky
2025-01-31  9:06 ` [PATCHv4 17/17] zram: add might_sleep to zcomp API Sergey Senozhatsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z6ThGFt6wyNpx9xi@google.com \
    --to=yosry.ahmed@linux.dev \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=minchan@kernel.org \
    --cc=senozhatsky@chromium.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.