From: Jerome Glisse <jglisse@redhat.com>
To: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>,
akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, John Hubbard <jhubbard@nvidia.com>,
Dan Williams <dan.j.williams@intel.com>,
Ross Zwisler <ross.zwisler@linux.intel.com>
Subject: Re: [HMM v14 05/16] mm/ZONE_DEVICE/unaddressable: add support for un-addressable device memory
Date: Mon, 26 Dec 2016 14:02:46 -0500 (EST)
Message-ID: <897363324.7325313.1482778965996.JavaMail.zimbra@redhat.com>
In-Reply-To: <5860DEE7.5040505@linux.vnet.ibm.com>
> On 12/09/2016 02:07 AM, Jerome Glisse wrote:
> >> On 12/08/2016 08:39 AM, Jerome Glisse wrote:
> >>>> > >> On 12/08/2016 08:39 AM, Jérôme Glisse wrote:
> >>>>>>> > >>> > > Architecture that wish to support un-addressable device
> >>>>>>> > >>> > > memory should
> >>>>>>> > >>> > > make
> >>>>>>> > >>> > > sure to never populate the kernel linar mapping for the
> >>>>>>> > >>> > > physical
> >>>>>>> > >>> > > range.
> >>>>> > >> >
> >>>>> > >> > Does the platform somehow provide a range of physical addresses
> >>>>> > >> > for this
> >>>>> > >> > unaddressable area? How do we know no memory will be hot-added
> >>>>> > >> > in a
> >>>>> > >> > range we're using for unaddressable device memory, for instance?
> >>> > > That's what one of the big issue. No platform does not reserve any
> >>> > > range so
> >>> > > there is a possibility that some memory get hotpluged and assign this
> >>> > > range.
> >>> > >
> >>> > > I pushed the range decision to higher level (ie it is the device
> >>> > > driver
> >>> > > that
> >>> > > pick one) so right now for device driver using HMM (NVidia close
> >>> > > driver as
> >>> > > we don't have nouveau ready for that yet) it goes from the highest
> >>> > > physical
> >>> > > address and scan down until finding an empty range big enough.
> >> >
> >> > I don't think you should be stealing physical address space for things
> >> > that don't and can't have physical addresses. Delegating this to
> >> > individual device drivers and hoping that they all get it right seems
> >> > like a recipe for disaster.
> > Well i expected device driver to use hmm_devmem_add() which does not take
> > physical address but use the above logic to pick one.
> >
> >> >
> >> > Maybe worth adding to the changelog:
> >> >
> >> > This feature potentially breaks memory hotplug unless every
> >> > driver using it magically predicts the future addresses of
> >> > where memory will be hotplugged.
> > I will add debug printk to memory hotplug in case it fails because of some
> > un-addressable resource. If you really dislike memory hotplug being broken
> > then i can go down the way of allowing to hotplug memory above the max
> > physical memory limit. This require more changes but i believe this is
> > doable for some of the memory model (sparsemem and sparsemem extreme).
>
> Did not get that. Hotplug memory request will come within the max physical
> memory limit as they are real RAM. The address range also would have been
> specified. How it can be added beyond the physical limit irrespective of
> which we memory model we use.
>
What you may not know is that on x86 the platform does not reserve any resource
for device memory (the PCIe BAR never covers the whole device memory, so that
range cannot be used).

Right now I pick an arbitrary unused physical address range for device memory, so
real memory might later be hotplugged inside the range I took. That hotplug will
then fail because I have already registered a resource for my device memory. This
is an x86 platform limitation.

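As a rough illustration of the "start from the highest physical address and scan down" selection described above, here is a userspace sketch. It is not the hmm_devmem_add() code; struct ex_range, ex_overlaps() and ex_find_free_from_top() are hypothetical names invented for this example, and the busy list stands in for whatever resources are already registered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a claimed physical range, [start, end] inclusive. */
struct ex_range {
    uint64_t start;
    uint64_t end;
};

/* Nonzero when the two inclusive ranges share at least one address. */
int ex_overlaps(struct ex_range a, struct ex_range b)
{
    return a.start <= b.end && b.start <= a.end;
}

/*
 * Walk down from 'top' (exclusive) in steps of 'size' and return, through
 * 'out', the first window that overlaps none of the 'n' busy ranges.
 * Returns 0 on success, -1 if no window fits.
 */
int ex_find_free_from_top(const struct ex_range *busy, size_t n,
                          uint64_t top, uint64_t size, struct ex_range *out)
{
    uint64_t start;

    if (size == 0 || size > top)
        return -1;

    for (start = top - size; ; start -= size) {
        struct ex_range win = { start, start + size - 1 };
        int clash = 0;
        size_t i;

        for (i = 0; i < n; i++) {
            if (ex_overlaps(busy[i], win)) {
                clash = 1;
                break;
            }
        }
        if (!clash) {
            *out = win;
            return 0;
        }
        if (start < size)
            return -1;  /* the next step would go below address 0 */
    }
}
```

Any real RAM hotplugged later at an address inside the returned window would overlap it, which is the hotplug failure mentioned above.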
Now if I bump the maximum physical memory by one bit, then I can hotplug device
memory inside that extra bit range and be sure that I will never conflict with
real memory (as I am above the architectural limit).

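To make the "extra bit" idea concrete, the sketch below computes the window that one additional address bit would open up. The value 46 is only an assumed example of a physical address limit, and the EX_ and ex_ names are hypothetical, not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed example value for the architectural physical address width. */
#define EX_MAX_PHYSMEM_BITS 46

/* First address above anything real memory can occupy. */
uint64_t ex_device_window_start(void)
{
    return 1ULL << EX_MAX_PHYSMEM_BITS;
}

/* Last address reachable with one extra bit, inclusive. */
uint64_t ex_device_window_end(void)
{
    return (1ULL << (EX_MAX_PHYSMEM_BITS + 1)) - 1;
}

/* Nonzero when 'addr' could belong to real (possibly hotplugged) memory. */
int ex_is_possible_ram(uint64_t addr)
{
    return addr < (1ULL << EX_MAX_PHYSMEM_BITS);
}
```

Every address in the window fails ex_is_possible_ram(), so a device range placed there cannot collide with a later hotplug of real memory.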
Bumping the maximum physical memory has implications, and I cannot just bump
MAX_PHYSMEM_BITS as that would have repercussions I don't want. In some memory
models, though, I can allow hotplug to happen above MAX_PHYSMEM_BITS without
changing MAX_PHYSMEM_BITS, and let page_to_pfn() and pfn_to_page() work above
MAX_PHYSMEM_BITS, again without changing it.

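The arithmetic behind this can be sketched as follows. With sparsemem, a PFN is mapped to a memory section number by a shift, and the number of sections is derived from the physical address limit. The constants below are assumed example values and the EX_ and ex_ names are hypothetical; this only shows why a PFN above the limit falls outside the normally sized section table, not how HMM would handle it.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed example values, chosen for illustration only. */
#define EX_PAGE_SHIFT        12
#define EX_SECTION_SIZE_BITS 27
#define EX_MAX_PHYSMEM_BITS  46

/* Number of sections needed to cover the addressable physical range. */
#define EX_NR_MEM_SECTIONS \
    (1ULL << (EX_MAX_PHYSMEM_BITS - EX_SECTION_SIZE_BITS))

/* Section index that a page frame number belongs to. */
uint64_t ex_pfn_to_section_nr(uint64_t pfn)
{
    return pfn >> (EX_SECTION_SIZE_BITS - EX_PAGE_SHIFT);
}

/* Nonzero when the PFN's section index fits the normally sized table. */
int ex_section_in_table(uint64_t pfn)
{
    return ex_pfn_to_section_nr(pfn) < EX_NR_MEM_SECTIONS;
}
```

The first PFN above the limit lands exactly one section past the end of the table, so the lookup structures would have to accept such indices for pfn_to_page() to work there.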
Memory models like SPARSEMEM_VMEMMAP are problematic, as I would need to change
the kernel virtual memory map for the architecture, and that is not something I
want to do.

In the meantime, people using HMM are "~happy~" enough with memory hotplug
failing.

Cheers,
Jérôme