Subject: Re: [PATCH v5 13/14] mm: memory: support clearing page-extents
From: David Hildenbrand
Organization: Red Hat
To: Ankur Arora
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org,
    akpm@linux-foundation.org, bp@alien8.de, dave.hansen@linux.intel.com,
    hpa@zytor.com, mingo@redhat.com, mjguzik@gmail.com, luto@kernel.org,
    peterz@infradead.org, acme@kernel.org, namhyung@kernel.org,
    tglx@linutronix.de, willy@infradead.org, raghavendra.kt@amd.com,
    boris.ostrovsky@oracle.com, konrad.wilk@oracle.com
Date: Wed, 16 Jul 2025 10:03:54 +0200
Message-ID: <213a4333-24de-4216-8d9a-b70ac52c4263@redhat.com>
In-Reply-To: <878qkohleu.fsf@oracle.com>
References: <20250710005926.1159009-1-ankur.a.arora@oracle.com>
 <20250710005926.1159009-14-ankur.a.arora@oracle.com>
 <878qkohleu.fsf@oracle.com>

On 16.07.25 05:19, Ankur Arora wrote:
> 
> David Hildenbrand writes:
> 
>> On 10.07.25 02:59, Ankur Arora wrote:
>>> folio_zero_user() is constrained to clear in a page-at-a-time
>>> fashion because it supports CONFIG_HIGHMEM which means that kernel
>>> mappings for pages in a folio are not guaranteed to be contiguous.
>>>
>>> We don't have this problem when running under configurations with
>>> CONFIG_CLEAR_PAGE_EXTENT (implies !CONFIG_HIGHMEM), so zero in
>>> longer page-extents.
>>>
>>> This is expected to be faster because the processor can now optimize
>>> the clearing based on the knowledge of the extent.
>>>
>>> However, clearing in larger chunks can have two other problems:
>>>
>>>  - cache locality when clearing small folios (< MAX_ORDER_NR_PAGES)
>>>    (larger folios don't have any expectation of cache locality).
>>>
>>>  - preemption latency when clearing large folios.
>>>
>>> Handle the first by splitting the clearing in three parts: the
>>> faulting page and its immediate locality, its left and right
>>> regions; the local neighbourhood is cleared last.
>>>
>>> The second problem is relevant only when running under cooperative
>>> preemption models. Limit the worst case preemption latency by clearing
>>> in architecture specified ARCH_CLEAR_PAGE_EXTENT units.
>>>
>>> Signed-off-by: Ankur Arora
>>> ---
>>>  mm/memory.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>>  1 file changed, 85 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index b0cda5aab398..c52806270375 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -7034,6 +7034,7 @@ static inline int process_huge_page(
>>>   	return 0;
>>>   }
>>> +#ifndef CONFIG_CLEAR_PAGE_EXTENT
>>>   static void clear_gigantic_page(struct folio *folio, unsigned long addr_hint,
>>>   			       unsigned int nr_pages)
>>>   {
>>> @@ -7058,7 +7059,10 @@ static int clear_subpage(unsigned long addr, int idx, void *arg)
>>>   /**
>>>    * folio_zero_user - Zero a folio which will be mapped to userspace.
>>>    * @folio: The folio to zero.
>>> - * @addr_hint: The address will be accessed or the base address if uncelar.
>>> + * @addr_hint: The address accessed by the user or the base address.
>>> + *
>>> + * folio_zero_user() uses clear_gigantic_page() or process_huge_page() to
>>> + * do page-at-a-time zeroing because it needs to handle CONFIG_HIGHMEM.
>>>    */
>>>   void folio_zero_user(struct folio *folio, unsigned long addr_hint)
>>>   {
>>> @@ -7070,6 +7074,86 @@ void folio_zero_user(struct folio *folio, unsigned long addr_hint)
>>>   	process_huge_page(addr_hint, nr_pages, clear_subpage, folio);
>>>   }
>>> +#else /* CONFIG_CLEAR_PAGE_EXTENT */
>>> +
>>> +static void clear_pages_resched(void *addr, int npages)
>>> +{
>>> +	int i, remaining;
>>> +
>>> +	if (preempt_model_preemptible()) {
>>> +		clear_pages(addr, npages);
>>> +		goto out;
>>> +	}
>>> +
>>> +	for (i = 0; i < npages/ARCH_CLEAR_PAGE_EXTENT; i++) {
>>> +		clear_pages(addr + i * ARCH_CLEAR_PAGE_EXTENT * PAGE_SIZE,
>>> +			    ARCH_CLEAR_PAGE_EXTENT);
>>> +		cond_resched();
>>> +	}
>>> +
>>> +	remaining = npages % ARCH_CLEAR_PAGE_EXTENT;
>>> +
>>> +	if (remaining)
>>> +		clear_pages(addr + i * ARCH_CLEAR_PAGE_EXTENT * PAGE_SHIFT,
>>> +			    remaining);
>>> +out:
>>> +	cond_resched();
>>> +}
>>> +
>>> +/*
>>> + * folio_zero_user - Zero a folio which will be mapped to userspace.
>>> + * @folio: The folio to zero.
>>> + * @addr_hint: The address accessed by the user or the base address.
>>> + *
>>> + * Uses architectural support for clear_pages() to zero page extents
>>> + * instead of clearing page-at-a-time.
>>> + *
>>> + * Clearing of small folios (< MAX_ORDER_NR_PAGES) is split in three parts:
>>> + * pages in the immediate locality of the faulting page, and its left, right
>>> + * regions; the local neighbourhood cleared last in order to keep cache
>>> + * lines of the target region hot.
>>> + *
>>> + * For larger folios we assume that there is no expectation of cache locality
>>> + * and just do a straight zero.
>>> + */
>>> +void folio_zero_user(struct folio *folio, unsigned long addr_hint)
>>> +{
>>> +	unsigned long base_addr = ALIGN_DOWN(addr_hint, folio_size(folio));
>>> +	const long fault_idx = (addr_hint - base_addr) / PAGE_SIZE;
>>> +	const struct range pg = DEFINE_RANGE(0, folio_nr_pages(folio) - 1);
>>> +	const int width = 2; /* number of pages cleared last on either side */
>>> +	struct range r[3];
>>> +	int i;
>>> +
>>> +	if (folio_nr_pages(folio) > MAX_ORDER_NR_PAGES) {
>>> +		clear_pages_resched(page_address(folio_page(folio, 0)), folio_nr_pages(folio));
>>> +		return;
>>> +	}
>>> +
>>> +	/*
>>> +	 * Faulting page and its immediate neighbourhood. Cleared at the end to
>>> +	 * ensure it sticks around in the cache.
>>> +	 */
>>> +	r[2] = DEFINE_RANGE(clamp_t(s64, fault_idx - width, pg.start, pg.end),
>>> +			    clamp_t(s64, fault_idx + width, pg.start, pg.end));
>>> +
>>> +	/* Region to the left of the fault */
>>> +	r[1] = DEFINE_RANGE(pg.start,
>>> +			    clamp_t(s64, r[2].start-1, pg.start-1, r[2].start));
>>> +
>>> +	/* Region to the right of the fault: always valid for the common fault_idx=0 case. */
>>> +	r[0] = DEFINE_RANGE(clamp_t(s64, r[2].end+1, r[2].end, pg.end+1),
>>> +			    pg.end);
>>> +
>>> +	for (i = 0; i <= 2; i++) {
>>> +		int npages = range_len(&r[i]);
>>> +
>>> +		if (npages > 0)
>>> +			clear_pages_resched(page_address(folio_page(folio, r[i].start)), npages);
>>> +	}
>>> +}
>>> +#endif /* CONFIG_CLEAR_PAGE_EXTENT */
>>> +
>>>   static int copy_user_gigantic_page(struct folio *dst, struct folio *src,
>>>   				   unsigned long addr_hint,
>>>   				   struct vm_area_struct *vma,
>>
>> So, folio_zero_user() is only compiled with THP | HUGETLB already.
>>
>> What we should probably do is scrap the whole new kconfig option and
>> do something like this in here:
> 
> So, in principle I don't disagree and unifying both of these is cleaner
> than introducing a whole new option.

Yes, after playing with the code, a new config option just for that is
not what we want.

> 
> However that still leaves this code having to contort around CONFIG_HIGHMEM
> which is probably even less frequently used than THP | HUGETLB.

Not sure I understand your question correctly, but thp+hugetlb are
compatible with 32bit and highmem.

There are plans of removing highmem support, but that's a different
story :)

I think as long as these configs exist, we should just support them,
although performance is a secondary concern.

> 
> Maybe we should get rid of ARCH_HAS_CLEAR_PAGES completely and everyone
> with !HIGHMEM either use a generic version of clear_pages() which loops
> and calls clear_page() or some architectural override.
> 
> And, then we can do a similar transformation with copy_pages() (and
> copy_user_large_folio()).
> 
> At that point, process_huge_page() is used only for !HIGHMEM configs

I assume you meant HIGHMEM

> configs which likely have relatively small caches and so that leaves
> it probably over-engineered.

I don't think we need to jump through hoops to optimize performance on
highmem, yes.

> 
> The thing that gives me pause is that non-x86 might perform worse
> when they switch away from the left-right zeroing approach in
> process_huge_page() to a generic clear_pages().

Right. Or they perform better. Hard to know.

> 
> So, maybe allowing architectures to opt in by having to define
> ARCH_HAS_CLEAR_PAGES would allow doing this in a more measured fashion.
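(Just to sketch the generic !HIGHMEM fallback you describe -- this is not
code from the series; the #ifndef override hook and the exact signature
are only placeholders:)

/*
 * Generic fallback: with !HIGHMEM the pages of a folio are contiguous in
 * the kernel mapping, so clearing an extent is just a loop over the
 * per-page clear_page() primitive. An architecture with a faster
 * multi-page primitive (like the x86 clear_pages() in this series)
 * would override it.
 */
#ifndef clear_pages
static inline void clear_pages(void *addr, unsigned int npages)
{
	unsigned int i;

	for (i = 0; i < npages; i++)
		clear_page(addr + i * PAGE_SIZE);
}
#endif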
One tricky thing is dealing with architectures where clear_user_highpage()
does cache management.

So the more I think about it, I wonder if we really should just design it
all around clear_user_highpages and clear_user_pages, and have only a
single clearing algorithm.

Essentially, something like the following, just that we need a generic
clear_user_pages that iterates over clear_user_page.

Then, x86_64 could simply implement clear_user_pages by routing it to your
clear_pages, and define CLEAR_PAGES_RESCHED_NR (although I wonder if we
can do better here).

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 6234f316468c9..031e19c56765b 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -264,6 +264,14 @@ static inline void tag_clear_highpage(struct page *page)
 #ifdef CONFIG_HIGHMEM
 void zero_user_segments(struct page *page, unsigned start1, unsigned end1,
 		unsigned start2, unsigned end2);
+static inline void clear_user_highpages(struct page *page, unsigned long vaddr,
+		unsigned int nr_pages)
+{
+	unsigned int i;
+
+	for (i = 0; i < nr_pages; i++)
+		clear_user_highpage(nth_page(page, i), vaddr + i * PAGE_SIZE);
+}
 #else
 static inline void zero_user_segments(struct page *page,
 	unsigned start1, unsigned end1,
@@ -284,6 +292,7 @@ static inline void zero_user_segments(struct page *page,
 		for (i = 0; i < compound_nr(page); i++)
 			flush_dcache_page(page + i);
 }
+#define clear_user_highpages clear_user_pages
 #endif
 
 static inline void zero_user_segment(struct page *page,
diff --git a/mm/memory.c b/mm/memory.c
index 3dd6c57e6511e..8aebf6e0765d8 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -7009,40 +7009,92 @@ static inline int process_huge_page(
 	return 0;
 }
 
-static void clear_gigantic_page(struct folio *folio, unsigned long addr_hint,
-			       unsigned int nr_pages)
+#ifndef CLEAR_PAGES_RESCHED_NR
+#define CLEAR_PAGES_RESCHED_NR 1
+#endif /* CLEAR_PAGES_RESCHED_NR */
+
+static void clear_user_highpages_resched(struct page *page, unsigned long addr,
+		unsigned int nr_pages)
 {
-	unsigned long addr = ALIGN_DOWN(addr_hint, folio_size(folio));
-	int i;
+	unsigned int i, remaining;
 
-	might_sleep();
-	for (i = 0; i < nr_pages; i++) {
+	if (preempt_model_preemptible()) {
+		clear_user_highpages(page, addr, nr_pages);
+		goto out;
+	}
+
+	for (i = 0; i < nr_pages / CLEAR_PAGES_RESCHED_NR; i++) {
+		clear_user_highpages(nth_page(page, i * CLEAR_PAGES_RESCHED_NR),
+				     addr + i * CLEAR_PAGES_RESCHED_NR * PAGE_SIZE,
+				     CLEAR_PAGES_RESCHED_NR);
 
-		clear_user_highpage(folio_page(folio, i), addr + i * PAGE_SIZE);
 		cond_resched();
 	}
-}
 
-static int clear_subpage(unsigned long addr, int idx, void *arg)
-{
-	struct folio *folio = arg;
+	remaining = nr_pages % CLEAR_PAGES_RESCHED_NR;
 
-	clear_user_highpage(folio_page(folio, idx), addr);
-	return 0;
+	if (remaining)
+		clear_user_highpages(nth_page(page, i * CLEAR_PAGES_RESCHED_NR),
+				     addr + i * CLEAR_PAGES_RESCHED_NR * PAGE_SIZE,
+				     remaining);
+out:
+	cond_resched();
 }
 
-/**
+/*
  * folio_zero_user - Zero a folio which will be mapped to userspace.
  * @folio: The folio to zero.
- * @addr_hint: The address will be accessed or the base address if uncelar.
+ * @addr_hint: The address accessed by the user or the base address.
+ *
+ * Uses architectural support for clear_pages() to zero page extents
+ * instead of clearing page-at-a-time.
+ *
+ * Clearing of small folios (< MAX_ORDER_NR_PAGES) is split in three parts:
+ * pages in the immediate locality of the faulting page, and its left, right
+ * regions; the local neighbourhood cleared last in order to keep cache
+ * lines of the target region hot.
+ *
+ * For larger folios we assume that there is no expectation of cache locality
+ * and just do a straight zero.
  */
 void folio_zero_user(struct folio *folio, unsigned long addr_hint)
 {
-	unsigned int nr_pages = folio_nr_pages(folio);
+	const unsigned int nr_pages = folio_nr_pages(folio);
+	const unsigned long addr = ALIGN_DOWN(addr_hint, nr_pages * PAGE_SIZE);
+	const long fault_idx = (addr_hint - addr) / PAGE_SIZE;
+	const struct range pg = DEFINE_RANGE(0, nr_pages - 1);
+	const int width = 2; /* number of pages cleared last on either side */
+	struct range r[3];
+	int i;
+
+	if (unlikely(nr_pages >= MAX_ORDER_NR_PAGES)) {
+		clear_user_highpages_resched(folio_page(folio, 0), addr, nr_pages);
+		return;
+	}
+
+	/*
+	 * Faulting page and its immediate neighbourhood. Cleared at the end to
+	 * ensure it sticks around in the cache.
+	 */
+	r[2] = DEFINE_RANGE(clamp_t(s64, fault_idx - width, pg.start, pg.end),
+			    clamp_t(s64, fault_idx + width, pg.start, pg.end));
+
+	/* Region to the left of the fault */
+	r[1] = DEFINE_RANGE(pg.start,
+			    clamp_t(s64, r[2].start-1, pg.start-1, r[2].start));
+
+	/* Region to the right of the fault: always valid for the common fault_idx=0 case. */
+	r[0] = DEFINE_RANGE(clamp_t(s64, r[2].end+1, r[2].end, pg.end+1),
+			    pg.end);
+
+	for (i = 0; i <= 2; i++) {
+		unsigned int cur_nr_pages = range_len(&r[i]);
+		struct page *cur_page = folio_page(folio, r[i].start);
+		unsigned long cur_addr = addr + folio_page_idx(folio, cur_page) * PAGE_SIZE;
+
+		if (cur_nr_pages > 0)
+			clear_user_highpages_resched(cur_page, cur_addr, cur_nr_pages);
+	}
-
-	if (unlikely(nr_pages > MAX_ORDER_NR_PAGES))
-		clear_gigantic_page(folio, addr_hint, nr_pages);
-	else
-		process_huge_page(addr_hint, nr_pages, clear_subpage, folio);
 }
 
 static int copy_user_gigantic_page(struct folio *dst, struct folio *src,
-- 
2.50.1


On highmem we'd simply process individual pages, who cares. On !highmem,
we'd use the optimized clear_user_pages -> clear_pages implementation if
available. Otherwise, we clear individual pages.

Yes, we'd lose the left-right pattern. If really important we could
somehow let the architecture opt in and do the call to the existing
process function.

-- 
Cheers,

David / dhildenb