From: David Hildenbrand <david@redhat.com>
To: Zi Yan <ziy@nvidia.com>, "Huang, Ying" <ying.huang@intel.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v1 2/2] mm/migrate: move NUMA hinting fault folio isolation + checks under PTL
Date: Mon, 1 Jul 2024 16:03:00 +0200
Message-ID: <e2b4933f-7102-47fc-bb33-ecb46eddedcf@redhat.com>
In-Reply-To: <D994321B-BF99-45F8-A4BB-F8C8E4DA77A9@nvidia.com>

On 01.07.24 15:50, Zi Yan wrote:
> On 1 Jul 2024, at 4:32, Huang, Ying wrote:
> 
>> "Zi Yan" <ziy@nvidia.com> writes:
>>
>>> On Wed Jun 26, 2024 at 12:49 PM EDT, David Hildenbrand wrote:
>>>> On 21.06.24 22:48, Zi Yan wrote:
>>>>> On 21 Jun 2024, at 16:18, David Hildenbrand wrote:
>>>>>
>>>>>> On 21.06.24 15:44, Zi Yan wrote:
>>>>>>> On 20 Jun 2024, at 17:29, David Hildenbrand wrote:
>>>>>>>
>>>>>>>> Currently we always take a folio reference even if migration will not
>>>>>>>> even be tried or isolation failed, requiring us to grab+drop an additional
>>>>>>>> reference.
>>>>>>>>
>>>>>>>> Further, we end up calling folio_likely_mapped_shared() while the folio
>>>>>>>> might have already been unmapped, because after we dropped the PTL, that
>>>>>>>> can easily happen. We want to stop touching mapcounts and friends from
>>>>>>>> such context, and only call folio_likely_mapped_shared() while the folio
>>>>>>>> is still mapped: mapcount information is pretty much stale and unreliable
>>>>>>>> otherwise.
>>>>>>>>
>>>>>>>> So let's move checks into numamigrate_isolate_folio(), rename that
>>>>>>>> function to migrate_misplaced_folio_prepare(), and call that function
>>>>>>>> from callsites where we call migrate_misplaced_folio(), but still with
>>>>>>>> the PTL held.
>>>>>>>>
>>>>>>>> We can now stop taking temporary folio references, and really only take
>>>>>>>> a reference if folio isolation succeeded. Doing the
>>>>>>>> folio_likely_mapped_shared() + folio isolation under PT lock is now similar
>>>>>>>> to how we handle MADV_PAGEOUT.
>>>>>>>>
>>>>>>>> While at it, combine the folio_is_file_lru() checks.
>>>>>>>>
>>>>>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>>>>>> ---
>>>>>>>>     include/linux/migrate.h |  7 ++++
>>>>>>>>     mm/huge_memory.c        |  8 ++--
>>>>>>>>     mm/memory.c             |  9 +++--
>>>>>>>>     mm/migrate.c            | 81 +++++++++++++++++++----------------------
>>>>>>>>     4 files changed, 55 insertions(+), 50 deletions(-)
>>>>>>>
>>>>>>> LGTM. Reviewed-by: Zi Yan <ziy@nvidia.com>
>>>>>>>
>>>>>>> One nit below:
>>>>>>>
>>>>>>> <snip>
>>>>>>>
>>>>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>>>>>> index fc27dabcd8e3..4b2817bb2c7d 100644
>>>>>>>> --- a/mm/huge_memory.c
>>>>>>>> +++ b/mm/huge_memory.c
>>>>>>>> @@ -1688,11 +1688,13 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
>>>>>>>>     	if (node_is_toptier(nid))
>>>>>>>>     		last_cpupid = folio_last_cpupid(folio);
>>>>>>>>     	target_nid = numa_migrate_prep(folio, vmf, haddr, nid, &flags);
>>>>>>>> -	if (target_nid == NUMA_NO_NODE) {
>>>>>>>> -		folio_put(folio);
>>>>>>>> +	if (target_nid == NUMA_NO_NODE)
>>>>>>>> +		goto out_map;
>>>>>>>> +	if (migrate_misplaced_folio_prepare(folio, vma, target_nid)) {
>>>>>>>> +		flags |= TNF_MIGRATE_FAIL;
>>>>>>>>     		goto out_map;
>>>>>>>>     	}
>>>>>>>> -
>>>>>>>> +	/* The folio is isolated and isolation code holds a folio reference. */
>>>>>>>>     	spin_unlock(vmf->ptl);
>>>>>>>>     	writable = false;
>>>>>>>>
>>>>>>>> diff --git a/mm/memory.c b/mm/memory.c
>>>>>>>> index 118660de5bcc..4fd1ecfced4d 100644
>>>>>>>> --- a/mm/memory.c
>>>>>>>> +++ b/mm/memory.c
>>>>>>>
>>>>>>> <snip>
>>>>>>>
>>>>>>>> @@ -5345,10 +5343,13 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
>>>>>>>>     	else
>>>>>>>>     		last_cpupid = folio_last_cpupid(folio);
>>>>>>>>     	target_nid = numa_migrate_prep(folio, vmf, vmf->address, nid, &flags);
>>>>>>>> -	if (target_nid == NUMA_NO_NODE) {
>>>>>>>> -		folio_put(folio);
>>>>>>>> +	if (target_nid == NUMA_NO_NODE)
>>>>>>>> +		goto out_map;
>>>>>>>> +	if (migrate_misplaced_folio_prepare(folio, vma, target_nid)) {
>>>>>>>> +		flags |= TNF_MIGRATE_FAIL;
>>>>>>>>     		goto out_map;
>>>>>>>>     	}
>>>>>>>
>>>>>>> These two locations are repeated code, maybe just merge the ifs into
>>>>>>> numa_migrate_prep(). Feel free to ignore if you are not going to send
>>>>>>> another version. :)
>>>>>>
>>>>>> I went back and forth a couple of times and
>>>>>>
>>>>>> a) Didn't want to move numa_migrate_prep() into
>>>>>>      migrate_misplaced_folio_prepare(), because having that code in
>>>>>>      mm/migrate.c felt a bit odd.
>>>>>
>>>>> I agree after checking the actual code, since the code is just
>>>>> updating NUMA fault stats and checking where the folio should be.
>>>>>
>>>>>>
>>>>>> b) Didn't want to move migrate_misplaced_folio_prepare() because I enjoy
>>>>>>      seeing the migrate_misplaced_folio_prepare() and
>>>>>>      migrate_misplaced_folio() calls in the same callercontext.
>>>>>>
>>>>>> I also considered renaming numa_migrate_prep(), but wasn't really able to come up with a good name.
>>>>>
>>>>> How about numa_migrate_check()? Since it tells whether a folio should be
>>>>> migrated or not.
>>>>>
>>>>>>
>>>>>> But maybe a) is not too bad?
>>>>>>
>>>>>> We'd have migrate_misplaced_folio_prepare() consume &flags and &target_nid, and perform the "flags |= TNF_MIGRATE_FAIL;" internally.
>>>>>>
>>>>>> What would be your take?
>>>>>
>>>>> I would either rename numa_migrate_prep() or just do nothing. I have to admit
>>>>> that the "prep" and "prepare" in both function names motivated me to propose
>>>>> the merge, but now the actual code tells me they should be separate.
>>>>
>>>> Let's leave it like that for now. Renaming to numa_migrate_check() makes
>>>> sense, and likely moving more numa handling stuff in there.
>>>>
>>>> But I have yet to figure out why some of the memory.c vs. huge_memory.c
>>>> code differences exist, so we can unify them.
>>>>
>>>> For example, why did 33024536bafd9 introduce slightly different
>>>> last_cpupid handling in do_huge_pmd_numa_page(), whereby it seems like
>>>> some subtle difference in handling NUMA_BALANCING_MEMORY_TIERING? Maybe
>>>> I am missing something obvious. :)
>>>
>>> It seems to me that a sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING
>>> check is missing in do_huge_pmd_numa_page(). So the
>>>
>>> if (node_is_toptier(nid))
>>>
>>> should be
>>>
>>> if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) ||
>>> node_is_toptier(nid))
>>>
>>> to be consistent with other checks. Add Ying to confirm.
>>
>> Yes.  It should be so.  Sorry for my mistake and the confusion.
> 
> Thank you for the confirmation.
> 
>>
>>> I also think a function like
>>>
>>> bool folio_has_cpupid(struct folio *folio)
>>> {
>>>      return !(sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) ||
>>>             node_is_toptier(folio_nid(folio));
>>> }
>>>
>>> would be better than the existing checks.
>>
>> Yes.  This looks better.  Even better, we can add some comments to the
>> function too.
> 
> I will prepare a patch about it.

Do you have capacity to further consolidate the logic, maybe moving more
stuff into numa_migrate_prep() (and renaming it? :))?
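
For reference, a minimal user-space sketch of the folio_has_cpupid() helper
proposed above, so the intended check can be exercised in isolation. Only the
boolean expression comes from this thread. Everything else is a stand-in and
an assumption: the one-field struct folio, the single top-tier node, the stub
node_is_toptier()/folio_nid() that merely mimic the kernel names, and the
numeric value given to NUMA_BALANCING_MEMORY_TIERING. Treat it as an
illustration, not as the patch itself.

```c
#include <stdbool.h>

/*
 * User-space stand-ins for kernel symbols (assumptions for this sketch).
 * The real definitions live in the kernel tree and look different.
 */
#define NUMA_BALANCING_MEMORY_TIERING 0x2	/* value assumed for illustration */

struct folio {
	int nid;	/* hypothetical: only the node id is modelled */
};

static int sysctl_numa_balancing_mode;	/* stand-in for the sysctl variable */
static int toptier_node;		/* hypothetical: node 0 is the only top tier */

static bool node_is_toptier(int nid)
{
	return nid == toptier_node;
}

static int folio_nid(const struct folio *folio)
{
	return folio->nid;
}

/*
 * Returns true if callers may read the folio's last cpupid: either memory
 * tiering mode is off, or the folio sits on a top-tier node. This is the
 * check from the thread, wrapped so that both fault paths could share it.
 */
static bool folio_has_cpupid(const struct folio *folio)
{
	return !(sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) ||
	       node_is_toptier(folio_nid(folio));
}
```

With such a helper, do_numa_page() and do_huge_pmd_numa_page() could gate
folio_last_cpupid() on the same condition, which would also cover the
sysctl_numa_balancing_mode check that is missing from the PMD path today.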

-- 
Cheers,

David / dhildenb




