From: Minchan Kim <minchan@kernel.org>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Sergey Senozhatsky <senozhatsky@chromium.org>,
linux-mm <linux-mm@kvack.org>,
Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH 7/8] zsmalloc: replace per zpage lock with pool->migrate_lock
Date: Thu, 11 Nov 2021 15:11:34 -0800 [thread overview]
Message-ID: <YY2jJgMxJHPNs9ES@google.com> (raw)
In-Reply-To: <20211111090727.eq67hxfpux23dagd@linutronix.de>
On Thu, Nov 11, 2021 at 10:07:27AM +0100, Sebastian Andrzej Siewior wrote:
> On 2021-11-10 10:54:32 [-0800], Minchan Kim wrote:
> > The zsmalloc has used a bit for spin_lock in zpage handle to keep
> > zpage object alive during several operations. However, it causes
> > the problem for PREEMPT_RT as well as introducing too complicated.
> >
> > This patch replaces the bit spin_lock with pool->migrate_lock
> > rwlock. It could make the code simple as well as zsmalloc work
> > under PREEMPT_RT.
> >
> > The drawback is the pool->migrate_lock is bigger granuarity than
> > per zpage lock so the contention would be higher than old when
> > both IO-related operations(i.e., zsmalloc, zsfree, zs_[map|unmap])
> > and compaction(page/zpage migration) are going in parallel(*,
> > the migrate_lock is rwlock and IO related functions are all read
> > side lock so there is no contention). However, the write-side
> > is fast enough(dominant overhead is just page copy) so it wouldn't
> > affect much. If the lock granurity becomes more problem later,
> > we could introduce table locks based on handle as a hash value.
> >
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> …
> > index b8b098be92fa..5d4c4d254679 100644
> > --- a/mm/zsmalloc.c
> > +++ b/mm/zsmalloc.c
> > @@ -1789,6 +1767,11 @@ static void migrate_write_lock(struct zspage *zspage)
> > write_lock(&zspage->lock);
> > }
> >
> > +static void migrate_write_lock_nested(struct zspage *zspage)
> > +{
> > + write_lock_nested(&zspage->lock, SINGLE_DEPTH_NESTING);
>
> I don't have this in my tree.
I forgot it. I append it at the tail of the thread.
I will also include it at nest revision.
>
> > +}
> > +
> > static void migrate_write_unlock(struct zspage *zspage)
> > {
> > write_unlock(&zspage->lock);
> …
> > @@ -2077,8 +2043,13 @@ static unsigned long __zs_compact(struct zs_pool *pool,
> > struct zspage *dst_zspage = NULL;
> > unsigned long pages_freed = 0;
> >
> > + /* protect the race between zpage migration and zs_free */
> > + write_lock(&pool->migrate_lock);
> > + /* protect zpage allocation/free */
> > spin_lock(&class->lock);
> > while ((src_zspage = isolate_zspage(class, true))) {
> > + /* protect someone accessing the zspage(i.e., zs_map_object) */
> > + migrate_write_lock(src_zspage);
> >
> > if (!zs_can_compact(class))
> > break;
> > @@ -2087,6 +2058,8 @@ static unsigned long __zs_compact(struct zs_pool *pool,
> > cc.s_page = get_first_page(src_zspage);
> >
> > while ((dst_zspage = isolate_zspage(class, false))) {
> > + migrate_write_lock_nested(dst_zspage);
> > +
> > cc.d_page = get_first_page(dst_zspage);
> > /*
> > * If there is no more space in dst_page, resched
>
> Looking at the these two chunks, the page here comes from a list, you
> remove that page from that list and this ensures that you can't lock the
> very same pages in reverse order as in:
>
> migrate_write_lock(dst_zspage);
> …
> migrate_write_lock(src_zspage);
>
> right?
Sure.
From e0bfc5185bbd15c651a7a367b6d053b8c88b1e01 Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Tue, 19 Oct 2021 15:34:09 -0700
Subject: [PATCH] locking/rwlocks: introduce write_lock_nested
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
include/linux/rwlock.h | 6 ++++++
include/linux/rwlock_api_smp.h | 9 +++++++++
include/linux/spinlock_api_up.h | 1 +
kernel/locking/spinlock.c | 6 ++++++
4 files changed, 22 insertions(+)
diff --git a/include/linux/rwlock.h b/include/linux/rwlock.h
index 7ce9a51ae5c0..93086de7bf9e 100644
--- a/include/linux/rwlock.h
+++ b/include/linux/rwlock.h
@@ -70,6 +70,12 @@ do { \
#define write_lock(lock) _raw_write_lock(lock)
#define read_lock(lock) _raw_read_lock(lock)
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+#define write_lock_nested(lock, subclass) _raw_write_lock_nested(lock, subclass)
+#else
+#define write_lock_nested(lock, subclass) _raw_write_lock(lock)
+#endif
+
#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
#define read_lock_irqsave(lock, flags) \
diff --git a/include/linux/rwlock_api_smp.h b/include/linux/rwlock_api_smp.h
index abfb53ab11be..e0c866177c03 100644
--- a/include/linux/rwlock_api_smp.h
+++ b/include/linux/rwlock_api_smp.h
@@ -17,6 +17,7 @@
void __lockfunc _raw_read_lock(rwlock_t *lock) __acquires(lock);
void __lockfunc _raw_write_lock(rwlock_t *lock) __acquires(lock);
+void __lockfunc _raw_write_lock_nested(rwlock_t *lock, int subclass) __acquires(lock);
void __lockfunc _raw_read_lock_bh(rwlock_t *lock) __acquires(lock);
void __lockfunc _raw_write_lock_bh(rwlock_t *lock) __acquires(lock);
void __lockfunc _raw_read_lock_irq(rwlock_t *lock) __acquires(lock);
@@ -46,6 +47,7 @@ _raw_write_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
#ifdef CONFIG_INLINE_WRITE_LOCK
#define _raw_write_lock(lock) __raw_write_lock(lock)
+#define _raw_write_lock_nested(lock, subclass) __raw_write_lock_nested(lock, subclass)
#endif
#ifdef CONFIG_INLINE_READ_LOCK_BH
@@ -211,6 +213,13 @@ static inline void __raw_write_lock(rwlock_t *lock)
LOCK_CONTENDED(lock, do_raw_write_trylock, do_raw_write_lock);
}
+static inline void __raw_write_lock_nested(rwlock_t *lock, int subclass)
+{
+ preempt_disable();
+ rwlock_acquire(&lock->dep_map, subclass, 0, _RET_IP_);
+ LOCK_CONTENDED(lock, do_raw_write_trylock, do_raw_write_lock);
+}
+
#endif /* !CONFIG_GENERIC_LOCKBREAK || CONFIG_DEBUG_LOCK_ALLOC */
static inline void __raw_write_unlock(rwlock_t *lock)
diff --git a/include/linux/spinlock_api_up.h b/include/linux/spinlock_api_up.h
index d0d188861ad6..b8ba00ccccde 100644
--- a/include/linux/spinlock_api_up.h
+++ b/include/linux/spinlock_api_up.h
@@ -59,6 +59,7 @@
#define _raw_spin_lock_nested(lock, subclass) __LOCK(lock)
#define _raw_read_lock(lock) __LOCK(lock)
#define _raw_write_lock(lock) __LOCK(lock)
+#define _raw_write_lock_nested(lock, subclass) __LOCK(lock)
#define _raw_spin_lock_bh(lock) __LOCK_BH(lock)
#define _raw_read_lock_bh(lock) __LOCK_BH(lock)
#define _raw_write_lock_bh(lock) __LOCK_BH(lock)
diff --git a/kernel/locking/spinlock.c b/kernel/locking/spinlock.c
index c5830cfa379a..22969ec69288 100644
--- a/kernel/locking/spinlock.c
+++ b/kernel/locking/spinlock.c
@@ -300,6 +300,12 @@ void __lockfunc _raw_write_lock(rwlock_t *lock)
__raw_write_lock(lock);
}
EXPORT_SYMBOL(_raw_write_lock);
+
+void __lockfunc _raw_write_lock_nested(rwlock_t *lock, int subclass)
+{
+ __raw_write_lock_nested(lock, subclass);
+}
+EXPORT_SYMBOL(_raw_write_lock_nested);
#endif
#ifndef CONFIG_INLINE_WRITE_LOCK_IRQSAVE
--
2.34.0.rc1.387.gb447b232ab-goog
next prev parent reply other threads:[~2021-11-11 23:11 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-11-10 18:54 [PATCH 0/8] zsmalloc: remove bit_spin_lock Minchan Kim
2021-11-10 18:54 ` [PATCH 1/8] zsmalloc: introduce some helper functions Minchan Kim
2021-11-10 18:54 ` [PATCH 2/8] zsmalloc: rename zs_stat_type to class_stat_type Minchan Kim
2021-11-10 18:54 ` [PATCH 3/8] zsmalloc: decouple class actions from zspage works Minchan Kim
2021-11-10 18:54 ` [PATCH 4/8] zsmalloc: introduce obj_allocated Minchan Kim
2021-11-10 18:54 ` [PATCH 5/8] zsmalloc: move huge compressed obj from page to zspage Minchan Kim
2021-11-10 18:54 ` [PATCH 6/8] zsmalloc: remove zspage isolation for migration Minchan Kim
2021-11-10 18:54 ` [PATCH 7/8] zsmalloc: replace per zpage lock with pool->migrate_lock Minchan Kim
2021-11-11 9:07 ` Sebastian Andrzej Siewior
2021-11-11 23:11 ` Minchan Kim [this message]
2021-11-12 7:28 ` Sebastian Andrzej Siewior
2021-11-12 7:31 ` Sebastian Andrzej Siewior
2021-11-12 22:10 ` Minchan Kim
2021-11-11 10:13 ` kernel test robot
2021-11-11 10:13 ` kernel test robot
2021-11-10 18:54 ` [PATCH 8/8] zsmalloc: replace get_cpu_var with local_lock Minchan Kim
2021-11-11 8:56 ` Sebastian Andrzej Siewior
2021-11-11 23:08 ` Minchan Kim
2021-11-15 3:56 ` Davidlohr Bueso
2021-11-15 7:27 ` Sebastian Andrzej Siewior
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YY2jJgMxJHPNs9ES@google.com \
--to=minchan@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=bigeasy@linutronix.de \
--cc=linux-mm@kvack.org \
--cc=senozhatsky@chromium.org \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.