public inbox for linux-kernel@vger.kernel.org
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Dev Jain <dev.jain@arm.com>
Cc: akpm@linux-foundation.org, ryan.roberts@arm.com,
	david@redhat.com, willy@infradead.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, catalin.marinas@arm.com,
	will@kernel.org, Liam.Howlett@oracle.com, vbabka@suse.cz,
	jannh@google.com, anshuman.khandual@arm.com, peterx@redhat.com,
	joey.gouly@arm.com, ioworker0@gmail.com, baohua@kernel.org,
	kevin.brodsky@arm.com, quic_zhenhuah@quicinc.com,
	christophe.leroy@csgroup.eu, yangyicong@hisilicon.com,
	linux-arm-kernel@lists.infradead.org, namit@vmware.com,
	hughd@google.com, yang@os.amperecomputing.com, ziy@nvidia.com
Subject: Re: [PATCH v2 1/7] mm: Refactor code in mprotect
Date: Tue, 29 Apr 2025 12:00:27 +0100
Message-ID: <ffecd4cd-32ce-4c3f-a5ff-0ced14b13f8d@lucifer.local>
In-Reply-To: <20250429052336.18912-2-dev.jain@arm.com>

For changes like this, difftastic comes in very handy :)

On Tue, Apr 29, 2025 at 10:53:30AM +0530, Dev Jain wrote:
> Reduce indentation in change_pte_range() by refactoring some of the code
> into a new function. No functional change.
>
> Signed-off-by: Dev Jain <dev.jain@arm.com>

Overall a big fan of the intent of this patch! This is a nice cleanup, just
need to nail down details.

> ---
>  mm/mprotect.c | 116 +++++++++++++++++++++++++++++---------------------
>  1 file changed, 68 insertions(+), 48 deletions(-)
>
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index 88608d0dc2c2..70f59aa8c2a8 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -83,6 +83,71 @@ bool can_change_pte_writable(struct vm_area_struct *vma, unsigned long addr,
>  	return pte_dirty(pte);
>  }
>
> +
> +

Nit: stray extra newline.

> +static bool prot_numa_skip(struct vm_area_struct *vma, struct folio *folio,
> +		int target_node)

This is a bit weird: you now have two functions that each decide whether to
skip a PTE, but they're named inconsistently?

I think you say in response to a comment elsewhere that you intend to further
split things up in subsequent patches, but this kinda bugs me as subjective as
it is :)

I'd say rename prot_numa_avoid_fault() -> can_skip_prot_numa_pte()

And this to can_skip_prot_numa_folio()?

Then again, the below function does some folio stuff too, so I'm not sure
exactly what the separation is? Can you explain?

Also it'd be good to add some brief comment, something like 'the prot_numa
change-prot (cp) flag indicates that this protection change is due to NUMA
hinting, we determine if we actually have work to do or can skip this folio
entirely'.

Or equivalent in the below function.

> +{
> +	bool toptier;
> +	int nid;
> +
> +	/* Also skip shared copy-on-write pages */
> +	if (is_cow_mapping(vma->vm_flags) &&
> +	    (folio_maybe_dma_pinned(folio) ||
> +	     folio_maybe_mapped_shared(folio)))
> +		return true;
> +
> +	/*
> +	 * While migration can move some dirty pages,
> +	 * it cannot move them all from MIGRATE_ASYNC
> +	 * context.
> +	 */
> +	if (folio_is_file_lru(folio) &&
> +	    folio_test_dirty(folio))
> +		return true;
> +
> +	/*
> +	 * Don't mess with PTEs if page is already on the node
> +	 * a single-threaded process is running on.
> +	 */
> +	nid = folio_nid(folio);
> +	if (target_node == nid)
> +		return true;
> +	toptier = node_is_toptier(nid);
> +
> +	/*
> +	 * Skip scanning top tier node if normal numa
> +	 * balancing is disabled
> +	 */
> +	if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) &&
> +	    toptier)
> +		return true;
> +	return false;
> +}
> +
> +static bool prot_numa_avoid_fault(struct vm_area_struct *vma,
> +		unsigned long addr, pte_t oldpte, int target_node)
> +{
> +	struct folio *folio;
> +	int ret;
> +
> +	/* Avoid TLB flush if possible */
> +	if (pte_protnone(oldpte))
> +		return true;
> +
> +	folio = vm_normal_folio(vma, addr, oldpte);
> +	if (!folio || folio_is_zone_device(folio) ||
> +	    folio_test_ksm(folio))
> +		return true;
> +	ret = prot_numa_skip(vma, folio, target_node);
> +	if (ret)
> +		return ret;

This is a bit silly, as prot_numa_skip() returns a boolean value. Surely:

	if (prot_numa_skip(vma, folio, target_node))
		return true;

Is better?
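
To illustrate, here is a minimal userspace sketch of that control flow. The
struct folio below and the body of prot_numa_skip() are hypothetical stand-ins,
not the kernel definitions, and the vma/addr/oldpte parameters are dropped for
brevity. It only shows the boolean being tested directly rather than being
stored in an int first.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct folio; not the real layout. */
struct folio {
	int nid;                  /* node the folio currently lives on */
	bool access_time_updated; /* models the folio_xchg_access_time() side effect */
};

/* Stub: models only the "folio is already on the target node" check. */
static bool prot_numa_skip(const struct folio *folio, int target_node)
{
	return folio->nid == target_node;
}

/* Models the tail of prot_numa_avoid_fault() with the bool tested directly. */
static bool prot_numa_avoid_fault(struct folio *folio, int target_node)
{
	if (!folio)
		return true;

	if (prot_numa_skip(folio, target_node))
		return true;

	/* Only reached when the PTE is not skipped. */
	folio->access_time_updated = true;
	return false;
}
```

The intermediate 'int ret' goes away, and the return type of the helper matches
what the caller tests.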

> +	if (folio_use_access_time(folio))
> +		folio_xchg_access_time(folio,
> +			jiffies_to_msecs(jiffies));

Why is this here and not in prot_numa_skip() or whatever we rename it to?

> +	return false;
> +}
> +
>  static long change_pte_range(struct mmu_gather *tlb,
>  		struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr,
>  		unsigned long end, pgprot_t newprot, unsigned long cp_flags)
> @@ -116,56 +181,11 @@ static long change_pte_range(struct mmu_gather *tlb,
>  			 * Avoid trapping faults against the zero or KSM
>  			 * pages. See similar comment in change_huge_pmd.
>  			 */
> -			if (prot_numa) {
> -				struct folio *folio;
> -				int nid;
> -				bool toptier;
> -
> -				/* Avoid TLB flush if possible */
> -				if (pte_protnone(oldpte))
> -					continue;
> -
> -				folio = vm_normal_folio(vma, addr, oldpte);
> -				if (!folio || folio_is_zone_device(folio) ||
> -				    folio_test_ksm(folio))
> -					continue;
> -
> -				/* Also skip shared copy-on-write pages */
> -				if (is_cow_mapping(vma->vm_flags) &&
> -				    (folio_maybe_dma_pinned(folio) ||
> -				     folio_maybe_mapped_shared(folio)))
> -					continue;
> -
> -				/*
> -				 * While migration can move some dirty pages,
> -				 * it cannot move them all from MIGRATE_ASYNC
> -				 * context.
> -				 */
> -				if (folio_is_file_lru(folio) &&
> -				    folio_test_dirty(folio))
> +			if (prot_numa &&
> +			    prot_numa_avoid_fault(vma, addr,
> +						  oldpte, target_node))
>  					continue;
>
> -				/*
> -				 * Don't mess with PTEs if page is already on the node
> -				 * a single-threaded process is running on.
> -				 */
> -				nid = folio_nid(folio);
> -				if (target_node == nid)
> -					continue;
> -				toptier = node_is_toptier(nid);
> -
> -				/*
> -				 * Skip scanning top tier node if normal numa
> -				 * balancing is disabled
> -				 */
> -				if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) &&
> -				    toptier)
> -					continue;
> -				if (folio_use_access_time(folio))
> -					folio_xchg_access_time(folio,
> -						jiffies_to_msecs(jiffies));
> -			}
> -
>  			oldpte = ptep_modify_prot_start(vma, addr, pte);
>  			ptent = pte_modify(oldpte, newprot);
>
> --
> 2.30.2
>
