The Linux Kernel Mailing List
From: Yang Shi <yang@os.amperecomputing.com>
To: Heiko Carstens <hca@linux.ibm.com>
Cc: "David Hildenbrand (Arm)" <david@kernel.org>,
	cl@gentwo.org, dennis@kernel.org, tj@kernel.org,
	urezki@gmail.com, catalin.marinas@arm.com, will@kernel.org,
	ryan.roberts@arm.com, akpm@linux-foundation.org,
	gor@linux.ibm.com, agordeev@linux.ibm.com, linux-mm@kvack.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC v1 PATCH 0/11] Optimize this_cpu_*() ops for non-x86 (ARM64 for this series)
Date: Fri, 15 May 2026 11:35:35 -0700
Message-ID: <124aabec-b4bf-4d76-8bf8-de30c31d44c9@os.amperecomputing.com>
In-Reply-To: <20260515162804.10935A18-hca@linux.ibm.com>



On 5/15/26 9:28 AM, Heiko Carstens wrote:
> On Wed, May 13, 2026 at 05:00:19PM -0700, Yang Shi wrote:
>> On 5/12/26 2:02 AM, David Hildenbrand (Arm) wrote:
>>> There was quite some feedback during the LSF/MM session, what's the current plan?
> ...
>> I'm not sure whether S390 folks will implement this on S390 or not, anyway
>> they are cc'ed.
> I'm not sure yet, however after I had a look at the architecture documentation
> a couple of weeks ago, I think it shouldn't be too hard to get this working on
> s390 as well. I was a bit concerned about TLB flushing, if changes to the
> kernel mapping happen with per-cpu page tables, but as of now I believe this
> shouldn't cause any harm (famous last words...).

Yeah, it shouldn't. The kernel needs to flush the TLB on all CPUs
whenever a kernel mapping is changed, with or without percpu page
tables, so there should not be any extra overhead in most cases.

Some extra TLB flushing is needed for the "percpu local mapping area",
but all CPUs use the same virtual address, so we should just need one
more TLB flush call with the same virtual address for all CPUs. In
addition, percpu chunk destruction happens asynchronously in a
workqueue. Unmapping page tables, flushing the TLB and freeing pages
all happen in the workqueue when the whole chunk is freed. The fast
path basically just updates an allocation bitmap.
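
To make the fast path / deferred path split concrete, here is a rough
userspace toy model. It is only an illustration, not the actual percpu
allocator code: struct toy_chunk, toy_free_slot() and toy_reclaim_work()
are all made-up names, and the "TLB flush" is just a counter.

```c
#include <assert.h>
#include <stdbool.h>

#define SLOTS_PER_CHUNK 8

/* Hypothetical toy model; none of these names exist in the kernel. */
struct toy_chunk {
	unsigned int bitmap;	/* one bit per allocated slot */
	bool mapped;		/* are the page tables still populated? */
	bool reclaim_pending;	/* queued for deferred teardown? */
	int tlb_flushes;	/* number of simulated flush calls */
};

/*
 * Fast path: only clear the bit in the allocation bitmap. If the chunk
 * became empty, mark it for the deferred work; no unmap or flush here.
 */
static void toy_free_slot(struct toy_chunk *c, int slot)
{
	assert(slot >= 0 && slot < SLOTS_PER_CHUNK);
	c->bitmap &= ~(1u << slot);
	if (c->bitmap == 0)
		c->reclaim_pending = true;
}

/*
 * Deferred path: stands in for the work item. Unmap, flush and free
 * happen here, and only when the whole chunk is unused.
 */
static void toy_reclaim_work(struct toy_chunk *c)
{
	if (!c->reclaim_pending || c->bitmap != 0)
		return;
	c->mapped = false;	/* simulated page table unmap */
	c->tlb_flushes++;	/* one flush call for the shared address */
	c->reclaim_pending = false;
}
```

In this model, freeing slots never touches the mapping; the single
simulated flush happens only once toy_reclaim_work() runs on an empty
chunk.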

Thanks,
Yang