Re: [RFC PATCH v2 0/5] Reduce NUMA balance caused TLB-shootdowns in a VM

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: David Hildenbrand <david@redhat.com>
To: Yan Zhao <yan.y.zhao@intel.com>
Cc: John Hubbard <jhubbard@nvidia.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, pbonzini@redhat.com, seanjc@google.com,
	mike.kravetz@oracle.com, apopple@nvidia.com, jgg@nvidia.com,
	rppt@kernel.org, akpm@linux-foundation.org, kevin.tian@intel.com,
	Mel Gorman <mgorman@techsingularity.net>
Subject: Re: [RFC PATCH v2 0/5] Reduce NUMA balance caused TLB-shootdowns in a VM
Date: Wed, 16 Aug 2023 11:49:03 +0200	[thread overview]
Message-ID: <ded3c4dc-2df9-2ef2-add0-c17f0cdfaf32@redhat.com> (raw)
In-Reply-To: <ZNyRnU+KynjCzwRm@yzhao56-desk.sh.intel.com>

On 16.08.23 11:06, Yan Zhao wrote:
> On Wed, Aug 16, 2023 at 09:43:40AM +0200, David Hildenbrand wrote:
>> On 15.08.23 04:34, John Hubbard wrote:
>>> On 8/14/23 02:09, Yan Zhao wrote:
>>> ...
>>>>> hmm_range_fault()-based memory management in particular might benefit
>>>>> from having NUMA balancing disabled entirely for the memremap_pages()
>>>>> region, come to think of it. That seems relatively easy and clean at
>>>>> first glance anyway.
>>>>>
>>>>> For other regions (allocated by the device driver), a per-VMA flag
>>>>> seems about right: VM_NO_NUMA_BALANCING ?
>>>>>
>>>> Thanks a lot for those good suggestions!
>>>> For VMs, when could a per-VMA flag be set?
>>>> Might be hard in mmap() in QEMU because a VMA may not be used for DMA until
>>>> after it's mapped into VFIO.
>>>> Then, should VFIO set this flag on after it maps a range?
>>>> Could this flag be unset after device hot-unplug?
>>>>
>>>
>>> I'm hoping someone who thinks about VMs and VFIO often can chime in.
>>
>> At least QEMU could just set it on the applicable VMAs (as said by Yuan Yao,
>> using madvise).
>>
>> BUT, I do wonder what value there would be for autonuma to still be active
> Currently MADV_* is up to 25
> 	#define MADV_COLLAPSE   25,
> while madvise behavior is of type "int". So it's ok.
> 
> But vma->vm_flags is of "unsigned long", so it's full at least on 32bit platform.

I remember there were discussions to increase it also for 32bit. If 
that's required, we might want to go down that path.

But do 32bit architectures even care about NUMA hinting? If not, just 
ignore them ...

> 
>> for the remainder of the hypervisor. If there is none, a prctl() would be
>> better.
> Add a new field in "struct vma_numab_state" in vma, and use prctl() to
> update this field?

Rather a global toggle per MM, no need to update individual VMAs -- if 
we go down that prctl() path.

No need to consume more memory for VMAs.

[...]

>> We already do have a mechanism in QEMU to get notified when longterm-pinning
>> in the kernel might happen (and, therefore, MADV_DONTNEED must not be used):
>> * ram_block_discard_disable()
>> * ram_block_uncoordinated_discard_disable()
> Looks this ram_block_discard allow/disallow state is global rather than per-VMA
> in QEMU.

Yes. Once you transition into "discard of any kind disabled", you can go 
over all guest memory VMAs (RAMBlock) and issue an madvise() for them. 
(or alternatively, do the prctl() once )

We'll also have to handle new guest memory being created afterwards, but 
that is easy.

Once we transition to "no discarding disabled", you can go over all 
guest memory VMAs (RAMBlock) and issue an madvise() for them again (or 
alternatively, do the prctl() once).

> So, do you mean that let kernel provide a per-VMA allow/disallow mechanism, and
> it's up to the user space to choose between per-VMA and complex way or
> global and simpler way?

QEMU could do either way. The question would be if a per-vma settings 
makes sense for NUMA hinting.

-- 
Cheers,

David / dhildenb

next prev parent reply	other threads:[~2023-08-16  9:49 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-10  8:56 [RFC PATCH v2 0/5] Reduce NUMA balance caused TLB-shootdowns in a VM Yan Zhao
2023-08-10  8:57 ` [RFC PATCH v2 1/5] mm/mmu_notifier: introduce a new mmu notifier flag MMU_NOTIFIER_RANGE_NUMA Yan Zhao
2023-08-10  8:58 ` [RFC PATCH v2 2/5] mm: don't set PROT_NONE to maybe-dma-pinned pages for NUMA-migrate purpose Yan Zhao
2023-08-10  9:00 ` [RFC PATCH v2 3/5] mm/mmu_notifier: introduce a new callback .numa_protect Yan Zhao
2023-08-10  9:00 ` [RFC PATCH v2 4/5] mm/autonuma: call .numa_protect() when page is protected for NUMA migrate Yan Zhao
2023-08-11 18:52   ` Nadav Amit
2023-08-14  7:52     ` Yan Zhao
2023-08-10  9:02 ` [RFC PATCH v2 5/5] KVM: Unmap pages only when it's indeed protected for NUMA migration Yan Zhao
2023-08-10 13:16   ` bibo mao
2023-08-11  3:45     ` Yan Zhao
2023-08-11  7:40       ` bibo mao
2023-08-11  8:01         ` Yan Zhao
2023-08-11 17:14           ` Sean Christopherson
2023-08-11 17:18             ` Jason Gunthorpe
2023-08-14  6:52             ` Yan Zhao
2023-08-14  7:44               ` Yan Zhao
2023-08-14 16:40               ` Sean Christopherson
2023-08-15  1:54                 ` Yan Zhao
2023-08-15 14:50                   ` Sean Christopherson
2023-08-16  2:43                     ` bibo mao
2023-08-16  3:44                       ` bibo mao
2023-08-16  5:14                         ` Yan Zhao
2023-08-16  7:29                           ` bibo mao
2023-08-16  7:18                             ` Yan Zhao
2023-08-16  7:53                               ` bibo mao
2023-08-16 13:39                                 ` Sean Christopherson
2023-08-10  9:34 ` [RFC PATCH v2 0/5] Reduce NUMA balance caused TLB-shootdowns in a VM David Hildenbrand
2023-08-10  9:50   ` Yan Zhao
2023-08-11 17:25     ` David Hildenbrand
2023-08-11 18:20       ` John Hubbard
2023-08-11 18:39         ` David Hildenbrand
2023-08-11 19:35           ` John Hubbard
2023-08-14  9:09             ` Yan Zhao
2023-08-15  2:34               ` John Hubbard
2023-08-16  7:43                 ` David Hildenbrand
2023-08-16  9:06                   ` Yan Zhao
2023-08-16  9:49                     ` David Hildenbrand [this message]
2023-08-16 18:00                       ` John Hubbard
2023-08-17  5:05                         ` Yan Zhao
2023-08-17  7:38                           ` David Hildenbrand
2023-08-18  0:13                             ` Yan Zhao
2023-08-18  2:29                               ` John Hubbard
2023-09-04  9:18                                 ` Yan Zhao
2023-08-15  2:36               ` Yuan Yao
2023-08-15  2:37                 ` Yan Zhao
2023-08-10 13:58 ` Chao Gao
2023-08-11  5:22   ` Yan Zhao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ded3c4dc-2df9-2ef2-add0-c17f0cdfaf32@redhat.com \
    --to=david@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=apopple@nvidia.com \
    --cc=jgg@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mike.kravetz@oracle.com \
    --cc=pbonzini@redhat.com \
    --cc=rppt@kernel.org \
    --cc=seanjc@google.com \
    --cc=yan.y.zhao@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).