Message-ID: <20260302155105.275396307@redhat.com>
User-Agent: quilt/0.69
Date: Mon, 02 Mar 2026 12:49:49 -0300
From: Marcelo Tosatti
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
 Andrew Morton, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
 Vlastimil Babka, Hyeonggon Yoo <42.hyeyoo@gmail.com>, Leonardo Bras,
 Thomas Gleixner, Waiman Long, Boqun Feng, Frederic Weisbecker,
 Marcelo Tosatti
Subject: [PATCH v2 4/5] swap: apply new queue_percpu_work_on() interface
References: <20260302154945.143996316@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8

Make use of the new qpw_{un,}lock*() and queue_percpu_work_on() interface
to improve performance and latency.

For functions whose work may be scheduled on a different cpu, replace
local_{un,}lock*() with qpw_{un,}lock*(), and replace queue_work_on()
with queue_percpu_work_on(). Likewise, flush_work() is replaced with
flush_percpu_work().

The change requires allocating qpw_structs instead of work_structs, and
adding a cpu parameter to a few functions.

This should have no relevant performance impact on non-QPW kernels: for
functions whose work may be scheduled on a different cpu, the local_*lock's
this_cpu_ptr() simply becomes per_cpu_ptr(smp_processor_id()).

Signed-off-by: Leonardo Bras
Signed-off-by: Marcelo Tosatti

---
 mm/internal.h   |    4 ++-
 mm/mlock.c      |   51 ++++++++++++++++++++++++++++++-----------
 mm/page_alloc.c |    2 -
 mm/swap.c       |   69 ++++++++++++++++++++++++++++++--------------------
 4 files changed, 79 insertions(+), 47 deletions(-)
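
For reference, the qpw_*() helpers used below come from the interface
added earlier in this series and are not quoted in this patch. The
following is only a rough sketch of the semantics the conversion relies
on, not the actual qpw header: on a non-QPW build the new calls are
expected to collapse straight back onto the existing local_lock and
workqueue primitives, so per-cpu work keeps running on its target CPU.

/* Editorial sketch of the assumed non-QPW mapping -- not part of this patch. */
#include <linux/local_lock.h>
#include <linux/workqueue.h>

typedef local_lock_t qpw_lock_t;

struct qpw_struct {
        struct work_struct work;
        int cpu;
};

/* @cpu is the CPU owning the data; on non-QPW kernels the work item
 * always runs there, so the plain local lock is sufficient. */
#define qpw_lock(lock, cpu)             local_lock(lock)
#define qpw_unlock(lock, cpu)           local_unlock(lock)
#define qpw_lock_irqsave(lock, flags, cpu)      local_lock_irqsave(lock, flags)
#define qpw_unlock_irqrestore(lock, flags, cpu) local_unlock_irqrestore(lock, flags)
#define qpw_lock_init(lock)             local_lock_init(lock)

#define local_qpw_lock(lock)            local_lock(lock)
#define local_qpw_unlock(lock)          local_unlock(lock)
#define local_qpw_lock_irqsave(lock, flags)     local_lock_irqsave(lock, flags)
#define local_qpw_unlock_irqrestore(lock, flags) local_unlock_irqrestore(lock, flags)

/* Remember which CPU the work was queued for, so the handler can find it. */
#define INIT_QPW(qpw, fn, c)                            \
        do {                                            \
                INIT_WORK(&(qpw)->work, (fn));          \
                (qpw)->cpu = (c);                       \
        } while (0)

#define queue_percpu_work_on(cpu, wq, qpw)      queue_work_on(cpu, wq, &(qpw)->work)
#define flush_percpu_work(qpw)                  flush_work(&(qpw)->work)
#define qpw_get_cpu(w)  (container_of((w), struct qpw_struct, work)->cpu)

On a QPW-enabled build, qpw_lock(lock, cpu) is instead expected to take a
lock belonging to the target CPU, so a housekeeping CPU can drain another
CPU's batches directly rather than queueing work on it; the call sites in
the diff below are the same either way.
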
Index: linux/mm/mlock.c
===================================================================
--- linux.orig/mm/mlock.c
+++ linux/mm/mlock.c
@@ -25,17 +25,16 @@
 #include
 #include
 #include
+#include

 #include "internal.h"

 struct mlock_fbatch {
-        local_lock_t lock;
+        qpw_lock_t lock;
         struct folio_batch fbatch;
 };

-static DEFINE_PER_CPU(struct mlock_fbatch, mlock_fbatch) = {
-        .lock = INIT_LOCAL_LOCK(lock),
-};
+static DEFINE_PER_CPU(struct mlock_fbatch, mlock_fbatch);

 bool can_do_mlock(void)
 {
@@ -209,18 +208,29 @@ static void mlock_folio_batch(struct fol
         folios_put(fbatch);
 }

+void mlock_drain_cpu(int cpu)
+{
+        struct folio_batch *fbatch;
+
+        qpw_lock(&mlock_fbatch.lock, cpu);
+        fbatch = per_cpu_ptr(&mlock_fbatch.fbatch, cpu);
+        if (folio_batch_count(fbatch))
+                mlock_folio_batch(fbatch);
+        qpw_unlock(&mlock_fbatch.lock, cpu);
+}
+
 void mlock_drain_local(void)
 {
         struct folio_batch *fbatch;

-        local_lock(&mlock_fbatch.lock);
+        local_qpw_lock(&mlock_fbatch.lock);
         fbatch = this_cpu_ptr(&mlock_fbatch.fbatch);
         if (folio_batch_count(fbatch))
                 mlock_folio_batch(fbatch);
-        local_unlock(&mlock_fbatch.lock);
+        local_qpw_unlock(&mlock_fbatch.lock);
 }

-void mlock_drain_remote(int cpu)
+void mlock_drain_offline(int cpu)
 {
         struct folio_batch *fbatch;

@@ -243,7 +253,7 @@ void mlock_folio(struct folio *folio)
 {
         struct folio_batch *fbatch;

-        local_lock(&mlock_fbatch.lock);
+        local_qpw_lock(&mlock_fbatch.lock);
         fbatch = this_cpu_ptr(&mlock_fbatch.fbatch);
         if (!folio_test_set_mlocked(folio)) {
@@ -257,7 +267,7 @@ void mlock_folio(struct folio *folio)
         if (!folio_batch_add(fbatch, mlock_lru(folio)) ||
             !folio_may_be_lru_cached(folio) || lru_cache_disabled())
                 mlock_folio_batch(fbatch);
-        local_unlock(&mlock_fbatch.lock);
+        local_qpw_unlock(&mlock_fbatch.lock);
 }

 /**
@@ -269,7 +279,7 @@ void mlock_new_folio(struct folio
         struct folio_batch *fbatch;
         int nr_pages = folio_nr_pages(folio);

-        local_lock(&mlock_fbatch.lock);
+        local_qpw_lock(&mlock_fbatch.lock);
         fbatch = this_cpu_ptr(&mlock_fbatch.fbatch);
         folio_set_mlocked(folio);
@@ -280,7 +290,7 @@ void mlock_new_folio(struct folio
         if (!folio_batch_add(fbatch, mlock_new(folio)) ||
             !folio_may_be_lru_cached(folio) || lru_cache_disabled())
                 mlock_folio_batch(fbatch);
-        local_unlock(&mlock_fbatch.lock);
+        local_qpw_unlock(&mlock_fbatch.lock);
 }

 /**
@@ -291,7 +301,7 @@ void munlock_folio(struct folio *folio)
 {
         struct folio_batch *fbatch;

-        local_lock(&mlock_fbatch.lock);
+        local_qpw_lock(&mlock_fbatch.lock);
         fbatch = this_cpu_ptr(&mlock_fbatch.fbatch);
         /*
          * folio_test_clear_mlocked(folio) must be left to __munlock_folio(),
@@ -301,7 +311,7 @@ void munlock_folio(struct folio *folio)
         if (!folio_batch_add(fbatch, folio) ||
             !folio_may_be_lru_cached(folio) || lru_cache_disabled())
                 mlock_folio_batch(fbatch);
-        local_unlock(&mlock_fbatch.lock);
+        local_qpw_unlock(&mlock_fbatch.lock);
 }

 static inline unsigned int folio_mlock_step(struct folio *folio,
@@ -823,3 +833,18 @@ void user_shm_unlock(size_t size, struct
         spin_unlock(&shmlock_user_lock);
         put_ucounts(ucounts);
 }
+
+int __init mlock_init(void)
+{
+        unsigned int cpu;
+
+        for_each_possible_cpu(cpu) {
+                struct mlock_fbatch *fbatch = &per_cpu(mlock_fbatch, cpu);
+
+                qpw_lock_init(&fbatch->lock);
+        }
+
+        return 0;
+}
+
+module_init(mlock_init);
Index: linux/mm/swap.c
===================================================================
--- linux.orig/mm/swap.c
+++ linux/mm/swap.c
@@ -35,7 +35,7 @@
 #include
 #include
 #include
-#include
+#include
 #include

 #include "internal.h"
@@ -52,7 +52,7 @@ struct cpu_fbatches {
          * The following folio batches are grouped together because they are protected
          * by disabling preemption (and interrupts remain enabled).
          */
-        local_lock_t lock;
+        qpw_lock_t lock;
         struct folio_batch lru_add;
         struct folio_batch lru_deactivate_file;
         struct folio_batch lru_deactivate;
@@ -61,14 +61,11 @@ struct cpu_fbatches {
         struct folio_batch lru_activate;
 #endif
         /* Protecting the following batches which require disabling interrupts */
-        local_lock_t lock_irq;
+        qpw_lock_t lock_irq;
         struct folio_batch lru_move_tail;
 };

-static DEFINE_PER_CPU(struct cpu_fbatches, cpu_fbatches) = {
-        .lock = INIT_LOCAL_LOCK(lock),
-        .lock_irq = INIT_LOCAL_LOCK(lock_irq),
-};
+static DEFINE_PER_CPU(struct cpu_fbatches, cpu_fbatches);

 static void __page_cache_release(struct folio *folio, struct lruvec **lruvecp,
                 unsigned long *flagsp)
@@ -187,18 +184,18 @@ static void __folio_batch_add_and_move(s
         folio_get(folio);

         if (disable_irq)
-                local_lock_irqsave(&cpu_fbatches.lock_irq, flags);
+                local_qpw_lock_irqsave(&cpu_fbatches.lock_irq, flags);
         else
-                local_lock(&cpu_fbatches.lock);
+                local_qpw_lock(&cpu_fbatches.lock);

         if (!folio_batch_add(this_cpu_ptr(fbatch), folio) ||
             !folio_may_be_lru_cached(folio) || lru_cache_disabled())
                 folio_batch_move_lru(this_cpu_ptr(fbatch), move_fn);

         if (disable_irq)
-                local_unlock_irqrestore(&cpu_fbatches.lock_irq, flags);
+                local_qpw_unlock_irqrestore(&cpu_fbatches.lock_irq, flags);
         else
-                local_unlock(&cpu_fbatches.lock);
+                local_qpw_unlock(&cpu_fbatches.lock);
 }

 #define folio_batch_add_and_move(folio, op)                              \
@@ -359,7 +356,7 @@ static void __lru_cache_activate_folio(s
         struct folio_batch *fbatch;
         int i;

-        local_lock(&cpu_fbatches.lock);
+        local_qpw_lock(&cpu_fbatches.lock);
         fbatch = this_cpu_ptr(&cpu_fbatches.lru_add);

         /*
@@ -381,7 +378,7 @@ static void __lru_cache_activate_folio(s
                 }
         }

-        local_unlock(&cpu_fbatches.lock);
+        local_qpw_unlock(&cpu_fbatches.lock);
 }

 #ifdef CONFIG_LRU_GEN
@@ -653,9 +650,9 @@ void lru_add_drain_cpu(int cpu)
                 unsigned long flags;

                 /* No harm done if a racing interrupt already did this */
-                local_lock_irqsave(&cpu_fbatches.lock_irq, flags);
+                qpw_lock_irqsave(&cpu_fbatches.lock_irq, flags, cpu);
                 folio_batch_move_lru(fbatch, lru_move_tail);
-                local_unlock_irqrestore(&cpu_fbatches.lock_irq, flags);
+                qpw_unlock_irqrestore(&cpu_fbatches.lock_irq, flags, cpu);
         }

         fbatch = &fbatches->lru_deactivate_file;
@@ -733,9 +730,9 @@ void folio_mark_lazyfree(struct folio *f

 void lru_add_drain(void)
 {
-        local_lock(&cpu_fbatches.lock);
+        local_qpw_lock(&cpu_fbatches.lock);
         lru_add_drain_cpu(smp_processor_id());
-        local_unlock(&cpu_fbatches.lock);
+        local_qpw_unlock(&cpu_fbatches.lock);
         mlock_drain_local();
 }

@@ -745,30 +742,30 @@ void lru_add_drain(void)
  * the same cpu. It shouldn't be a problem in !SMP case since
  * the core is only one and the locks will disable preemption.
  */
-static void lru_add_mm_drain(void)
+static void lru_add_mm_drain(int cpu)
 {
-        local_lock(&cpu_fbatches.lock);
-        lru_add_drain_cpu(smp_processor_id());
-        local_unlock(&cpu_fbatches.lock);
-        mlock_drain_local();
+        qpw_lock(&cpu_fbatches.lock, cpu);
+        lru_add_drain_cpu(cpu);
+        qpw_unlock(&cpu_fbatches.lock, cpu);
+        mlock_drain_cpu(cpu);
 }

 void lru_add_drain_cpu_zone(struct zone *zone)
 {
-        local_lock(&cpu_fbatches.lock);
+        local_qpw_lock(&cpu_fbatches.lock);
         lru_add_drain_cpu(smp_processor_id());
         drain_local_pages(zone);
-        local_unlock(&cpu_fbatches.lock);
+        local_qpw_unlock(&cpu_fbatches.lock);
         mlock_drain_local();
 }

 #ifdef CONFIG_SMP
-static DEFINE_PER_CPU(struct work_struct, lru_add_drain_work);
+static DEFINE_PER_CPU(struct qpw_struct, lru_add_drain_qpw);

-static void lru_add_drain_per_cpu(struct work_struct *dummy)
+static void lru_add_drain_per_cpu(struct work_struct *w)
 {
-        lru_add_mm_drain();
+        lru_add_mm_drain(qpw_get_cpu(w));
 }

 static DEFINE_PER_CPU(struct work_struct, bh_add_drain_work);
@@ -883,12 +880,12 @@ static inline void __lru_add_drain_all(b
         cpumask_clear(&has_mm_work);
         cpumask_clear(&has_bh_work);
         for_each_online_cpu(cpu) {
-                struct work_struct *mm_work = &per_cpu(lru_add_drain_work, cpu);
+                struct qpw_struct *mm_qpw = &per_cpu(lru_add_drain_qpw, cpu);
                 struct work_struct *bh_work = &per_cpu(bh_add_drain_work, cpu);

                 if (cpu_needs_mm_drain(cpu)) {
-                        INIT_WORK(mm_work, lru_add_drain_per_cpu);
-                        queue_work_on(cpu, mm_percpu_wq, mm_work);
+                        INIT_QPW(mm_qpw, lru_add_drain_per_cpu, cpu);
+                        queue_percpu_work_on(cpu, mm_percpu_wq, mm_qpw);
                         __cpumask_set_cpu(cpu, &has_mm_work);
                 }
@@ -900,7 +897,7 @@ static inline void __lru_add_drain_all(b
         }

         for_each_cpu(cpu, &has_mm_work)
-                flush_work(&per_cpu(lru_add_drain_work, cpu));
+                flush_percpu_work(&per_cpu(lru_add_drain_qpw, cpu));

         for_each_cpu(cpu, &has_bh_work)
                 flush_work(&per_cpu(bh_add_drain_work, cpu));
@@ -950,7 +947,7 @@ void lru_cache_disable(void)
 #ifdef CONFIG_SMP
         __lru_add_drain_all(true);
 #else
-        lru_add_mm_drain();
+        lru_add_mm_drain(smp_processor_id());
         invalidate_bh_lrus_cpu();
 #endif
 }
@@ -1124,6 +1121,7 @@ static const struct ctl_table swap_sysct
 void __init swap_setup(void)
 {
         unsigned long megs = PAGES_TO_MB(totalram_pages());
+        unsigned int cpu;

         /* Use a smaller cluster for small-memory machines */
         if (megs < 16)
@@ -1136,4 +1134,11 @@ void __init swap_setup(void)
          */
         register_sysctl_init("vm", swap_sysctl_table);
+
+        for_each_possible_cpu(cpu) {
+                struct cpu_fbatches *fbatches = &per_cpu(cpu_fbatches, cpu);
+
+                qpw_lock_init(&fbatches->lock);
+                qpw_lock_init(&fbatches->lock_irq);
+        }
 }
Index: linux/mm/internal.h
===================================================================
--- linux.orig/mm/internal.h
+++ linux/mm/internal.h
@@ -1140,10 +1140,12 @@ static inline void munlock_vma_folio(str
         munlock_folio(folio);
 }

+int __init mlock_init(void);
 void mlock_new_folio(struct folio *folio);
 bool need_mlock_drain(int cpu);
 void mlock_drain_local(void);
-void mlock_drain_remote(int cpu);
+void mlock_drain_cpu(int cpu);
+void mlock_drain_offline(int cpu);

 extern pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma);

Index: linux/mm/page_alloc.c
===================================================================
--- linux.orig/mm/page_alloc.c
+++ linux/mm/page_alloc.c
@@ -6285,7 +6285,7 @@ static int page_alloc_cpu_dead(unsigned
         struct zone *zone;

         lru_add_drain_cpu(cpu);
-        mlock_drain_remote(cpu);
+        mlock_drain_offline(cpu);
         drain_pages(cpu);

         /*
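
To make the conversion pattern concrete, here is a condensed, editorial
illustration of the same recipe this patch applies to mm/swap.c: per-cpu
state guarded by a qpw_lock_t, drained either locally or through
queue_percpu_work_on(). Every demo_* identifier is invented for the
example, and <linux/qpw.h> stands in for the header introduced earlier in
the series; none of this is part of the patch itself.

#include <linux/cpumask.h>
#include <linux/percpu.h>
#include <linux/workqueue.h>
#include <linux/qpw.h>          /* interface from earlier in this series */

struct demo_batch {
        qpw_lock_t lock;
        int count;              /* stand-in for a folio_batch */
};

static DEFINE_PER_CPU(struct demo_batch, demo_batch);
static DEFINE_PER_CPU(struct qpw_struct, demo_drain_qpw);

/* Drain @cpu's batch; runs on @cpu as a work item on non-QPW kernels. */
static void demo_drain_cpu(int cpu)
{
        struct demo_batch *batch = per_cpu_ptr(&demo_batch, cpu);

        qpw_lock(&demo_batch.lock, cpu);
        batch->count = 0;       /* stand-in for mlock_folio_batch() etc. */
        qpw_unlock(&demo_batch.lock, cpu);
}

static void demo_drain_per_cpu(struct work_struct *w)
{
        demo_drain_cpu(qpw_get_cpu(w));
}

/* Ask every online CPU to drain, then wait -- the same queue/flush
 * pattern __lru_add_drain_all() uses after this patch. */
static void demo_drain_all(struct workqueue_struct *demo_wq)
{
        int cpu;

        for_each_online_cpu(cpu) {
                struct qpw_struct *qpw = &per_cpu(demo_drain_qpw, cpu);

                INIT_QPW(qpw, demo_drain_per_cpu, cpu);
                queue_percpu_work_on(cpu, demo_wq, qpw);
        }

        for_each_online_cpu(cpu)
                flush_percpu_work(&per_cpu(demo_drain_qpw, cpu));
}

On non-QPW kernels this behaves like today's INIT_WORK()/queue_work_on()/
flush_work() sequence; on QPW kernels the drain of an isolated CPU is
expected to run from the CPU calling demo_drain_all() instead, which is
where the latency improvement described in the changelog comes from.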