Date: Mon, 26 Dec 2016 14:02:46 -0500 (EST)
From: Jerome Glisse
Message-ID: <897363324.7325313.1482778965996.JavaMail.zimbra@redhat.com>
In-Reply-To: <5860DEE7.5040505@linux.vnet.ibm.com>
References: <1481215184-18551-1-git-send-email-jglisse@redhat.com> <1481215184-18551-6-git-send-email-jglisse@redhat.com> <152004793.3187283.1481215199204.JavaMail.zimbra@redhat.com> <7df66ace-ef29-c76b-d61c-88263a61c6d0@intel.com> <2093258630.3273244.1481229443563.JavaMail.zimbra@redhat.com> <5860DEE7.5040505@linux.vnet.ibm.com>
Subject: Re: [HMM v14 05/16] mm/ZONE_DEVICE/unaddressable: add support for un-addressable device memory
To: Anshuman Khandual
Cc: Dave Hansen, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, John Hubbard, Dan Williams, Ross Zwisler

> On 12/09/2016 02:07 AM, Jerome Glisse wrote:
> >> On 12/08/2016 08:39 AM, Jerome Glisse wrote:
> >>>> On 12/08/2016 08:39 AM, Jérôme Glisse wrote:
> >>>>> Architectures that wish to support un-addressable device memory should
> >>>>> make sure to never populate the kernel linear mapping for the physical
> >>>>> range.
> >>>>
> >>>> Does the platform somehow provide a range of physical addresses for this
> >>>> unaddressable area? How do we know no memory will be hot-added in a
> >>>> range we're using for unaddressable device memory, for instance?
> >>>
> >>> That is one of the big issues. No platform reserves any range, so there
> >>> is a possibility that some memory gets hotplugged and is assigned this
> >>> range.
> >>>
> >>> I pushed the range decision to a higher level (i.e. it is the device
> >>> driver that picks one), so right now, for device drivers using HMM (the
> >>> NVidia closed driver, as we don't have nouveau ready for that yet), it
> >>> starts from the highest physical address and scans down until it finds
> >>> an empty range big enough.
> >>
> >> I don't think you should be stealing physical address space for things
> >> that don't and can't have physical addresses. Delegating this to
> >> individual device drivers and hoping that they all get it right seems
> >> like a recipe for disaster.
> >
> > Well, I expected device drivers to use hmm_devmem_add(), which does not take
> > a physical address but uses the above logic to pick one.
> >
> >> Maybe worth adding to the changelog:
> >>
> >> 	This feature potentially breaks memory hotplug unless every
> >> 	driver using it magically predicts the future addresses of
> >> 	where memory will be hotplugged.
> >
> > I will add a debug printk to memory hotplug in case it fails because of some
> > un-addressable resource. If you really dislike memory hotplug being broken,
> > then I can go down the path of allowing memory to be hotplugged above the
> > max physical memory limit. This requires more changes, but I believe it is
> > doable for some of the memory models (sparsemem and sparsemem extreme).
>
> Did not get that.
> Hotplug memory requests will come within the max physical
> memory limit as they are real RAM. The address range would also have been
> specified. How can it be added beyond the physical limit, irrespective of
> which memory model we use?

Maybe what you do not know is that on x86 we do not have a resource reserved by
the platform for the device memory (the PCIe BAR never covers the whole memory,
so this range cannot be used).

Right now I pick a random unused physical address range for device memory, and
thus real memory might later be hotplugged just inside the range I took. That
hotplug will fail because I already registered a resource for my device memory.
This is an x86 platform limitation.

Now if I bump the maximum physical memory by one bit, then I can hotplug device
memory inside that extra bit range and be sure that I will never have any real
memory conflict (as I am above the architectural limit).

Allowing the maximum physical memory to be bumped has implications, and I cannot
just bump MAX_PHYSMEM_BITS as it would have repercussions that I don't want. Now,
in some memory models I can allow hotplug to happen above MAX_PHYSMEM_BITS
without having to change MAX_PHYSMEM_BITS, and allow page_to_pfn() and
pfn_to_page() to work above MAX_PHYSMEM_BITS, again without changing it.

Memory models like SPARSEMEM_VMEMMAP are problematic, as I would need to change
the kernel virtual memory map for the architecture, and that is not something I
want to do.

In the meantime people using HMM are "~happy~" enough with memory hotplug
failing.

Cheers,
Jérôme
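
For illustration only, the top-down search mentioned in this thread (start at
the highest physical address and scan down until an empty range big enough is
found) can be modelled in user-space C roughly as below. This is a sketch under
stated assumptions, not the code behind hmm_devmem_add(): struct phys_range,
NO_FREE_RANGE and find_free_range_top_down() are hypothetical names, and the
list of occupied ranges is assumed to be sorted by start address and
non-overlapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an occupied physical range, [start, end). */
struct phys_range {
    uint64_t start;
    uint64_t end;
};

/* Hypothetical sentinel returned when no gap of the requested size exists. */
#define NO_FREE_RANGE UINT64_MAX

/*
 * Scan down from `limit` (exclusive upper bound of the physical address
 * space) and return the start of the highest free gap of at least `size`
 * bytes, or NO_FREE_RANGE if there is none.
 *
 * Assumption: `busy` is sorted by ascending start and does not overlap.
 */
static uint64_t find_free_range_top_down(const struct phys_range *busy,
                                         size_t n, uint64_t limit,
                                         uint64_t size)
{
    uint64_t top = limit; /* everything in [top, limit) is already ruled out */

    assert(size > 0);

    for (size_t i = n; i-- > 0;) {
        if (busy[i].start >= top)
            continue; /* range lies entirely above the current window */

        /* Is the gap between this range's end and `top` big enough? */
        if (busy[i].end < top && top - busy[i].end >= size)
            return top - size;

        /* No: continue the search below this occupied range. */
        top = busy[i].start;
    }

    /* Finally, try the gap between address 0 and the lowest range. */
    return top >= size ? top - size : NO_FREE_RANGE;
}
```

A range chosen this way is only known to be free at the time of the search. If
real RAM is hot-added into it later, the hot-add conflicts with the resource
already registered for device memory, which is the failure discussed above.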