Subject: Re: [PATCH v5 13/14] mm: memory: support clearing page-extents
From: David Hildenbrand
Organization: Red Hat
To: Ankur Arora
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org,
    akpm@linux-foundation.org, bp@alien8.de, dave.hansen@linux.intel.com,
    hpa@zytor.com, mingo@redhat.com, mjguzik@gmail.com, luto@kernel.org,
    peterz@infradead.org, acme@kernel.org, namhyung@kernel.org,
    tglx@linutronix.de, willy@infradead.org, raghavendra.kt@amd.com,
    boris.ostrovsky@oracle.com, konrad.wilk@oracle.com
Date: Wed, 16 Jul 2025 10:03:54 +0200
Message-ID: <213a4333-24de-4216-8d9a-b70ac52c4263@redhat.com>
In-Reply-To: <878qkohleu.fsf@oracle.com>
References: <20250710005926.1159009-1-ankur.a.arora@oracle.com>
 <20250710005926.1159009-14-ankur.a.arora@oracle.com>
 <878qkohleu.fsf@oracle.com>

On 16.07.25 05:19, Ankur Arora wrote:
> 
> David Hildenbrand writes:
> 
>> On 10.07.25 02:59, Ankur Arora wrote:
>>> folio_zero_user() is constrained to clear in a page-at-a-time
>>> fashion because it supports CONFIG_HIGHMEM which means that kernel
>>> mappings for pages in a folio are not guaranteed to be contiguous.
>>>
>>> We don't have this problem when running under configurations with
>>> CONFIG_CLEAR_PAGE_EXTENT (implies !CONFIG_HIGHMEM), so zero in
>>> longer page-extents.
>>>
>>> This is expected to be faster because the processor can now optimize
>>> the clearing based on the knowledge of the extent.
>>>
>>> However, clearing in larger chunks can have two other problems:
>>>
>>>  - cache locality when clearing small folios (< MAX_ORDER_NR_PAGES)
>>>    (larger folios don't have any expectation of cache locality).
>>>
>>>  - preemption latency when clearing large folios.
>>>
>>> Handle the first by splitting the clearing in three parts: the
>>> faulting page and its immediate locality, its left and right
>>> regions; the local neighbourhood is cleared last.
>>>
>>> The second problem is relevant only when running under cooperative
>>> preemption models. Limit the worst case preemption latency by clearing
>>> in architecture specified ARCH_CLEAR_PAGE_EXTENT units.
>>>
>>> Signed-off-by: Ankur Arora
>>> ---
>>>  mm/memory.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>>  1 file changed, 85 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index b0cda5aab398..c52806270375 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -7034,6 +7034,7 @@ static inline int process_huge_page(
>>>   	return 0;
>>>   }
>>> +#ifndef CONFIG_CLEAR_PAGE_EXTENT
>>>   static void clear_gigantic_page(struct folio *folio, unsigned long addr_hint,
>>>   			       unsigned int nr_pages)
>>>   {
>>> @@ -7058,7 +7059,10 @@ static int clear_subpage(unsigned long addr, int idx, void *arg)
>>>   /**
>>>    * folio_zero_user - Zero a folio which will be mapped to userspace.
>>>    * @folio: The folio to zero.
>>> - * @addr_hint: The address will be accessed or the base address if uncelar.
>>> + * @addr_hint: The address accessed by the user or the base address.
>>> + *
>>> + * folio_zero_user() uses clear_gigantic_page() or process_huge_page() to
>>> + * do page-at-a-time zeroing because it needs to handle CONFIG_HIGHMEM.
>>>    */
>>>   void folio_zero_user(struct folio *folio, unsigned long addr_hint)
>>>   {
>>> @@ -7070,6 +7074,86 @@ void folio_zero_user(struct folio *folio, unsigned long addr_hint)
>>>   	process_huge_page(addr_hint, nr_pages, clear_subpage, folio);
>>>   }
>>> +#else /* CONFIG_CLEAR_PAGE_EXTENT */
>>> +
>>> +static void clear_pages_resched(void *addr, int npages)
>>> +{
>>> +	int i, remaining;
>>> +
>>> +	if (preempt_model_preemptible()) {
>>> +		clear_pages(addr, npages);
>>> +		goto out;
>>> +	}
>>> +
>>> +	for (i = 0; i < npages/ARCH_CLEAR_PAGE_EXTENT; i++) {
>>> +		clear_pages(addr + i * ARCH_CLEAR_PAGE_EXTENT * PAGE_SIZE,
>>> +			    ARCH_CLEAR_PAGE_EXTENT);
>>> +		cond_resched();
>>> +	}
>>> +
>>> +	remaining = npages % ARCH_CLEAR_PAGE_EXTENT;
>>> +
>>> +	if (remaining)
>>> +		clear_pages(addr + i * ARCH_CLEAR_PAGE_EXTENT * PAGE_SHIFT,
>>> +			    remaining);
>>> +out:
>>> +	cond_resched();
>>> +}
>>> +
>>> +/*
>>> + * folio_zero_user - Zero a folio which will be mapped to userspace.
>>> + * @folio: The folio to zero.
>>> + * @addr_hint: The address accessed by the user or the base address.
>>> + *
>>> + * Uses architectural support for clear_pages() to zero page extents
>>> + * instead of clearing page-at-a-time.
>>> + *
>>> + * Clearing of small folios (< MAX_ORDER_NR_PAGES) is split in three parts:
>>> + * pages in the immediate locality of the faulting page, and its left, right
>>> + * regions; the local neighbourhood cleared last in order to keep cache
>>> + * lines of the target region hot.
>>> + *
>>> + * For larger folios we assume that there is no expectation of cache locality
>>> + * and just do a straight zero.
>>> + */
>>> +void folio_zero_user(struct folio *folio, unsigned long addr_hint)
>>> +{
>>> +	unsigned long base_addr = ALIGN_DOWN(addr_hint, folio_size(folio));
>>> +	const long fault_idx = (addr_hint - base_addr) / PAGE_SIZE;
>>> +	const struct range pg = DEFINE_RANGE(0, folio_nr_pages(folio) - 1);
>>> +	const int width = 2; /* number of pages cleared last on either side */
>>> +	struct range r[3];
>>> +	int i;
>>> +
>>> +	if (folio_nr_pages(folio) > MAX_ORDER_NR_PAGES) {
>>> +		clear_pages_resched(page_address(folio_page(folio, 0)), folio_nr_pages(folio));
>>> +		return;
>>> +	}
>>> +
>>> +	/*
>>> +	 * Faulting page and its immediate neighbourhood. Cleared at the end to
>>> +	 * ensure it sticks around in the cache.
>>> +	 */
>>> +	r[2] = DEFINE_RANGE(clamp_t(s64, fault_idx - width, pg.start, pg.end),
>>> +			    clamp_t(s64, fault_idx + width, pg.start, pg.end));
>>> +
>>> +	/* Region to the left of the fault */
>>> +	r[1] = DEFINE_RANGE(pg.start,
>>> +			    clamp_t(s64, r[2].start-1, pg.start-1, r[2].start));
>>> +
>>> +	/* Region to the right of the fault: always valid for the common fault_idx=0 case. */
>>> +	r[0] = DEFINE_RANGE(clamp_t(s64, r[2].end+1, r[2].end, pg.end+1),
>>> +			    pg.end);
>>> +
>>> +	for (i = 0; i <= 2; i++) {
>>> +		int npages = range_len(&r[i]);
>>> +
>>> +		if (npages > 0)
>>> +			clear_pages_resched(page_address(folio_page(folio, r[i].start)), npages);
>>> +	}
>>> +}
>>> +#endif /* CONFIG_CLEAR_PAGE_EXTENT */
>>> +
>>>   static int copy_user_gigantic_page(struct folio *dst, struct folio *src,
>>>   				   unsigned long addr_hint,
>>>   				   struct vm_area_struct *vma,
>>
>> So, folio_zero_user() is only compiled with THP | HUGETLB already.
>>
>> What we should probably do is scrap the whole new kconfig option and
>> do something like this in here:
> 
> So, in principle I don't disagree and unifying both of these is cleaner
> than introducing a whole new option.

Yes, after playing with the code, a new config option just for that is
not what we want.

> 
> However that still leaves this code having to contort around CONFIG_HIGHMEM
> which is probably even less frequently used than THP | HUGETLB.

Not sure I understand your question correctly, but thp+hugetlb are
compatible with 32bit and highmem.

There are plans of removing highmem support, but that's a different
story :)

I think as long as these configs exist, we should just support them,
although performance is a secondary concern.

> 
> Maybe we should get rid of ARCH_HAS_CLEAR_PAGES completely and everyone
> with !HIGHMEM either use a generic version of clear_pages() which loops
> and calls clear_page() or some architectural override.
> 
> And, then we can do a similar transformation with copy_pages() (and
> copy_user_large_folio()).
> 
> At that point, process_huge_page() is used only for !HIGHMEM configs

I assume you meant HIGHMEM

> configs which likely have relatively small caches and so that leaves
> it probably over-engineered.

I don't think we need to jump through hoops to optimize performance on
highmem, yes.

> 
> The thing that gives me pause is that non-x86 might perform worse
> when they switch away from the left-right zeroing approach in
> process_huge_page() to a generic clear_pages().

Right. Or they perform better. Hard to know.

> 
> So, maybe allowing architectures to opt in by having to define
> ARCH_HAS_CLEAR_PAGES would allow doing this in a more measured fashion.
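(Just to sketch the generic !HIGHMEM fallback you describe -- this is not
code from the series; the #ifndef override hook and the exact signature
are only placeholders:)

/*
 * Generic fallback: with !HIGHMEM the pages of a folio are contiguous in
 * the kernel mapping, so clearing an extent is just a loop over the
 * per-page clear_page() primitive. An architecture with a faster
 * multi-page primitive (like the x86 clear_pages() in this series)
 * would override it.
 */
#ifndef clear_pages
static inline void clear_pages(void *addr, unsigned int npages)
{
	unsigned int i;

	for (i = 0; i < npages; i++)
		clear_page(addr + i * PAGE_SIZE);
}
#endif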
One tricky thing is dealing with architectures where clear_user_highpage()
does cache management.

So the more I think about it, I wonder if we really should just design it
all around clear_user_highpages and clear_user_pages, and have only a
single clearing algorithm.

Essentially, something like the following, just that we need a generic
clear_user_pages that iterates over clear_user_page.

Then, x86_64 could simply implement clear_user_pages by routing it to your
clear_pages, and define CLEAR_PAGES_RESCHED_NR (although I wonder if we
can do better here).

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 6234f316468c9..031e19c56765b 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -264,6 +264,14 @@ static inline void tag_clear_highpage(struct page *page)
 #ifdef CONFIG_HIGHMEM
 void zero_user_segments(struct page *page, unsigned start1, unsigned end1,
 		unsigned start2, unsigned end2);
+static inline void clear_user_highpages(struct page *page, unsigned long vaddr,
+		unsigned int nr_pages)
+{
+	unsigned int i;
+
+	for (i = 0; i < nr_pages; i++)
+		clear_user_highpage(nth_page(page, i), vaddr + i * PAGE_SIZE);
+}
 #else
 static inline void zero_user_segments(struct page *page,
 	unsigned start1, unsigned end1,
@@ -284,6 +292,7 @@ static inline void zero_user_segments(struct page *page,
 		for (i = 0; i < compound_nr(page); i++)
 			flush_dcache_page(page + i);
 }
+#define clear_user_highpages clear_user_pages
 #endif
 
 static inline void zero_user_segment(struct page *page,
diff --git a/mm/memory.c b/mm/memory.c
index 3dd6c57e6511e..8aebf6e0765d8 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -7009,40 +7009,92 @@ static inline int process_huge_page(
 	return 0;
 }
 
-static void clear_gigantic_page(struct folio *folio, unsigned long addr_hint,
-			       unsigned int nr_pages)
+#ifndef CLEAR_PAGES_RESCHED_NR
+#define CLEAR_PAGES_RESCHED_NR 1
+#endif /* CLEAR_PAGES_RESCHED_NR */
+
+static void clear_user_highpages_resched(struct page *page, unsigned long addr,
+		unsigned int nr_pages)
 {
-	unsigned long addr = ALIGN_DOWN(addr_hint, folio_size(folio));
-	int i;
+	unsigned int i, remaining;
 
-	might_sleep();
-	for (i = 0; i < nr_pages; i++) {
+	if (preempt_model_preemptible()) {
+		clear_user_highpages(page, addr, nr_pages);
+		goto out;
+	}
+
+	for (i = 0; i < nr_pages / CLEAR_PAGES_RESCHED_NR; i++) {
+		clear_user_highpages(nth_page(page, i * CLEAR_PAGES_RESCHED_NR),
+				     addr + i * CLEAR_PAGES_RESCHED_NR * PAGE_SIZE,
+				     CLEAR_PAGES_RESCHED_NR);
 
-		clear_user_highpage(folio_page(folio, i), addr + i * PAGE_SIZE);
 		cond_resched();
 	}
-}
 
-static int clear_subpage(unsigned long addr, int idx, void *arg)
-{
-	struct folio *folio = arg;
+	remaining = nr_pages % CLEAR_PAGES_RESCHED_NR;
 
-	clear_user_highpage(folio_page(folio, idx), addr);
-	return 0;
+	if (remaining)
+		clear_user_highpages(nth_page(page, i * CLEAR_PAGES_RESCHED_NR),
+				     addr + i * CLEAR_PAGES_RESCHED_NR * PAGE_SIZE,
+				     remaining);
+out:
+	cond_resched();
 }
 
-/**
+/*
  * folio_zero_user - Zero a folio which will be mapped to userspace.
  * @folio: The folio to zero.
- * @addr_hint: The address will be accessed or the base address if uncelar.
+ * @addr_hint: The address accessed by the user or the base address.
+ *
+ * Uses architectural support for clear_pages() to zero page extents
+ * instead of clearing page-at-a-time.
+ *
+ * Clearing of small folios (< MAX_ORDER_NR_PAGES) is split in three parts:
+ * pages in the immediate locality of the faulting page, and its left, right
+ * regions; the local neighbourhood cleared last in order to keep cache
+ * lines of the target region hot.
+ *
+ * For larger folios we assume that there is no expectation of cache locality
+ * and just do a straight zero.
  */
 void folio_zero_user(struct folio *folio, unsigned long addr_hint)
 {
-	unsigned int nr_pages = folio_nr_pages(folio);
+	const unsigned int nr_pages = folio_nr_pages(folio);
+	const unsigned long addr = ALIGN_DOWN(addr_hint, nr_pages * PAGE_SIZE);
+	const long fault_idx = (addr_hint - addr) / PAGE_SIZE;
+	const struct range pg = DEFINE_RANGE(0, nr_pages - 1);
+	const int width = 2; /* number of pages cleared last on either side */
+	struct range r[3];
+	int i;
+
+	if (unlikely(nr_pages >= MAX_ORDER_NR_PAGES)) {
+		clear_user_highpages_resched(folio_page(folio, 0), addr, nr_pages);
+		return;
+	}
+
+	/*
+	 * Faulting page and its immediate neighbourhood. Cleared at the end to
+	 * ensure it sticks around in the cache.
+	 */
+	r[2] = DEFINE_RANGE(clamp_t(s64, fault_idx - width, pg.start, pg.end),
+			    clamp_t(s64, fault_idx + width, pg.start, pg.end));
+
+	/* Region to the left of the fault */
+	r[1] = DEFINE_RANGE(pg.start,
+			    clamp_t(s64, r[2].start-1, pg.start-1, r[2].start));
+
+	/* Region to the right of the fault: always valid for the common fault_idx=0 case. */
+	r[0] = DEFINE_RANGE(clamp_t(s64, r[2].end+1, r[2].end, pg.end+1),
+			    pg.end);
+
+	for (i = 0; i <= 2; i++) {
+		unsigned int cur_nr_pages = range_len(&r[i]);
+		struct page *cur_page = folio_page(folio, r[i].start);
+		unsigned long cur_addr = addr + folio_page_idx(folio, cur_page) * PAGE_SIZE;
+
+		if (cur_nr_pages > 0)
+			clear_user_highpages_resched(cur_page, cur_addr, cur_nr_pages);
+	}
-
-	if (unlikely(nr_pages > MAX_ORDER_NR_PAGES))
-		clear_gigantic_page(folio, addr_hint, nr_pages);
-	else
-		process_huge_page(addr_hint, nr_pages, clear_subpage, folio);
 }
 
 static int copy_user_gigantic_page(struct folio *dst, struct folio *src,
-- 
2.50.1


On highmem we'd simply process individual pages, who cares. On !highmem,
we'd use the optimized clear_user_pages -> clear_pages implementation if
available. Otherwise, we clear individual pages.

Yes, we'd lose the left-right pattern. If really important we could
somehow let the architecture opt in and do the call to the existing
process function.

-- 
Cheers,

David / dhildenb