All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Dev Jain <dev.jain@arm.com>, akpm@linux-foundation.org
Cc: ryan.roberts@arm.com, willy@infradead.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, hughd@google.com,
	vishal.moola@gmail.com, yang@os.amperecomputing.com,
	ziy@nvidia.com
Subject: Re: [PATCH] mempolicy: Optimize queue_folios_pte_range by PTE batching
Date: Tue, 15 Apr 2025 13:59:00 +0200	[thread overview]
Message-ID: <1d6d7842-1700-40d2-9d5b-e044fbc242de@redhat.com> (raw)
In-Reply-To: <9ed4c113-37eb-4e3d-98a1-f46f786aaea9@arm.com>

On 15.04.25 13:47, Dev Jain wrote:
> 
> 
> On 15/04/25 3:47 pm, David Hildenbrand wrote:
>> On 11.04.25 10:13, Dev Jain wrote:
>>> After the check for queue_folio_required(), the code only cares about the
>>> folio in the for loop, i.e the PTEs are redundant. Therefore, optimize
>>> this
>>> loop by skipping over a PTE batch mapping the same folio.
>>>
>>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>>> ---
>>> Unfortunately I have only build tested this since my test environment is
>>> broken.
>>>
>>>    mm/mempolicy.c | 12 +++++++++++-
>>>    1 file changed, 11 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
>>> index b28a1e6ae096..b019524da8a2 100644
>>> --- a/mm/mempolicy.c
>>> +++ b/mm/mempolicy.c
>>> @@ -573,6 +573,9 @@ static int queue_folios_pte_range(pmd_t *pmd,
>>> unsigned long addr,
>>>        pte_t *pte, *mapped_pte;
>>>        pte_t ptent;
>>>        spinlock_t *ptl;
>>> +    int max_nr;
>>> +    const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
>>> +    int nr = 1;
>>
>> Try sticking to reverse xmas tree, please. (not completely the case
>> here, but fpb_flags can easily be moved all he way to the top)
> 
> I thought that the initializations were to be kept at the bottom.

Not that I am aware of.

> Asking for future patches, should I put all declarations in reverse-xmas
> fashion (even those which I don't intend to touch w.r.t the patch
> logic), or do I do that for only my additions?

We try to stay as close to reverse-xmas tree as possible. It's not 
always possible (e.g., dependent assignments), but fpb_flags in this 
case here can easily go all the way to the top.

...

> 
>>
>>   >       ptl = pmd_trans_huge_lock(pmd, vma);>       if (ptl) {
>>> @@ -586,7 +589,8 @@ static int queue_folios_pte_range(pmd_t *pmd,
>>> unsigned long addr,
>>>            walk->action = ACTION_AGAIN;
>>>            return 0;
>>>        }
>>   > -    for (; addr != end; pte++, addr += PAGE_SIZE) {> +    for (;
>> addr != end; pte += nr, addr += nr * PAGE_SIZE) {
>>> +        nr = 1;
>>>            ptent = ptep_get(pte);
>>>            if (pte_none(ptent))
>>>                continue;
>>> @@ -607,6 +611,11 @@ static int queue_folios_pte_range(pmd_t *pmd,
>>> unsigned long addr,
>>>            if (!queue_folio_required(folio, qp))
>>>                continue;
>>>            if (folio_test_large(folio)) {
>>> +            max_nr = (end - addr) >> PAGE_SHIFT;
>>> +            if (max_nr != 1)
>>> +                nr = folio_pte_batch(folio, addr, pte, ptent,
>>> +                             max_nr, fpb_flags,
>>> +                             NULL, NULL, NULL);
>>
>> We should probably do that immediately after we verified that
>> vm_normal_folio() have us something reasonable.
> 
> But shouldn't we keep the small folio case separate to avoid the
> overhead of folio_pte_batch()?

Yes, just do something like

if (folio_test_large(folio) && end - addr > 1)
	nr = folio_pte_batch(folio, addr, pte, ptent, end - addr,
			     max_nr, fpb_flags, ...);

before the folio_test_reserved().

Then you'd also skip the all ptes if !queue_folio_required.

> 
>>
>>>                /*
>>>                 * A large folio can only be isolated from LRU once,
>>>                 * but may be mapped by many PTEs (and Copy-On-Write may
>>> @@ -633,6 +642,7 @@ static int queue_folios_pte_range(pmd_t *pmd,
>>> unsigned long addr,
>>>                qp->nr_failed++;
>>>                if (strictly_unmovable(flags))
>>>                    break;
>>> +            qp->nr_failed += nr - 1;
>>
>> Can't we do qp->nr_failed += nr; above?
> 
> I did not dive deep into the significance of nr_failed, but I did that
> to keep the code, before and after the change, equivalent:

And I question exactly that.

If we hit strictly_unmovable(flags), we end up returning "-EIO" from
queue_folios_pte_range().

And staring at queue_pages_range(), we ignore nr_failed if 
walk_page_range() returned an error.

So looks like we can just add everything in one shot, independent of 
strictly_unmovable()?

-- 
Cheers,

David / dhildenb



  reply	other threads:[~2025-04-15 11:59 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-11  8:13 [PATCH] mempolicy: Optimize queue_folios_pte_range by PTE batching Dev Jain
2025-04-15 10:17 ` David Hildenbrand
2025-04-15 11:47   ` Dev Jain
2025-04-15 11:59     ` David Hildenbrand [this message]
2025-04-15 12:06       ` Dev Jain
2025-04-16  7:33     ` Baolin Wang
2025-04-16  8:55       ` Dev Jain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1d6d7842-1700-40d2-9d5b-e044fbc242de@redhat.com \
    --to=david@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=dev.jain@arm.com \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ryan.roberts@arm.com \
    --cc=vishal.moola@gmail.com \
    --cc=willy@infradead.org \
    --cc=yang@os.amperecomputing.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.