From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from g1t5425.austin.hp.com (g1t5425.austin.hp.com [15.216.225.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 28E4C1A1FB5 for ; Wed, 6 Apr 2016 10:52:57 -0700 (PDT) Message-ID: <1459964672.20338.41.camel@hpe.com> Subject: Re: [PATCH] x86 get_unmapped_area: Add PMD alignment for DAX PMD mmap From: Toshi Kani Date: Wed, 06 Apr 2016 11:44:32 -0600 In-Reply-To: <20160406165027.GA2781@linux.intel.com> References: <1459951089-14911-1-git-send-email-toshi.kani@hpe.com> <20160406165027.GA2781@linux.intel.com> Mime-Version: 1.0 List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: linux-nvdimm-bounces@lists.01.org Sender: "Linux-nvdimm" To: Matthew Wilcox Cc: linux-nvdimm@lists.01.org, x86@kernel.org, linux-kernel@vger.kernel.org, mingo@kernel.org, linux-mm@kvack.org, hpa@zytor.com, tglx@linutronix.de, bp@suse.de, kirill.shutemov@linux.intel.com List-ID: T24gV2VkLCAyMDE2LTA0LTA2IGF0IDEyOjUwIC0wNDAwLCBNYXR0aGV3IFdpbGNveCB3cm90ZToK PiBPbiBXZWQsIEFwciAwNiwgMjAxNiBhdCAwNzo1ODowOUFNIC0wNjAwLCBUb3NoaSBLYW5pIHdy b3RlOgo+ID4gCj4gPiBXaGVuIENPTkZJR19GU19EQVhfUE1EIGlzIHNldCwgREFYIHN1cHBvcnRz IG1tYXAoKSB1c2luZyBQTUQgcGFnZQo+ID4gc2l6ZS7CoMKgVGhpcyBmZWF0dXJlIHJlbGllcyBv biBib3RoIG1tYXAgdmlydHVhbCBhZGRyZXNzIGFuZCBGUwo+ID4gYmxvY2sgZGF0YSAoaS5lLiBw aHlzaWNhbCBhZGRyZXNzKSB0byBiZSBhbGlnbmVkIGJ5IHRoZSBQTUQgcGFnZQo+ID4gc2l6ZS7C oMKgVXNlcnMgY2FuIHVzZSBta2ZzIG9wdGlvbnMgdG8gc3BlY2lmeSBGUyB0byBhbGlnbiBibG9j awo+ID4gYWxsb2NhdGlvbnMuwqDCoEhvd2V2ZXIsIGFsaWduaW5nIG1tYXAoKSBhZGRyZXNzIHJl cXVpcmVzIGFwcGxpY2F0aW9uCj4gPiBjaGFuZ2VzIHRvIG1tYXAoKSBjYWxscywgc3VjaCBhczoK PiA+IAo+ID4gwqAtwqDCoC8qIGxldCB0aGUga2VybmVsIHRvIGFzc2lnbiBhIG1tYXAgYWRkciAq Lwo+ID4gwqAtwqDCoG1wdHIgPSBtbWFwKE5VTEwsIGZzaXplLCBQUk9UX1JFQUR8UFJPVF9XUklU RSwgRkxBR1MsIGZkLCAwKTsKPiA+IAo+ID4gwqArwqDCoC8qIDEuIG9idGFpbiBhIFBNRC1hbGln bmVkIHZpcnR1YWwgYWRkcmVzcyAqLwo+ID4gwqArwqDCoHJldCA9IHBvc2l4X21lbWFsaWduKCZt cHRyLCBQTURfU0laRSwgZnNpemUpOwo+ID4gwqArwqDCoGlmICghcmV0KQo+ID4gwqArwqDCoMKg wqBmcmVlKG1wdHIpO8KgwqAvKiAyLiByZWxlYXNlIHRoZSB2aXJ0IGFkZHIgKi8KPiA+IMKgKwo+ ID4gwqArwqDCoC8qIDMuIHRoZW4gcGFzcyB0aGUgUE1ELWFsaWduZWQgdmlydCBhZGRyIHRvIG1t YXAoKSAqLwo+ID4gwqArwqDCoG1wdHIgPSBtbWFwKG1wdHIsIGZzaXplLCBQUk9UX1JFQUR8UFJP VF9XUklURSwgRkxBR1MsIGZkLCAwKTsKPiA+IAo+ID4gVGhlc2UgY2hhbmdlcyBhZGQgdW5uZWNl c3NhcnkgZGVwZW5kZW5jeSB0byBEQVggYW5kIFBNRCBwYWdlIHNpemUKPiA+IGludG8gYXBwbGlj YXRpb24gY29kZS7CoMKgVGhlIGtlcm5lbCBzaG91bGQgYXNzaWduIGEgbW1hcCBhZGRyZXNzCj4g PiBhcHByb3ByaWF0ZSBmb3IgdGhlIG9wZXJhdGlvbi4KPgo+IEkgcXVlc3Rpb24gdGhlIG5lZWQg Zm9yIHRoaXMgcGF0Y2guwqDCoENob29zaW5nIGFuIGFwcHJvcHJpYXRlIGJhc2UgYWRkcmVzcwo+ IGlzIHRoZSBsZWFzdCBvZiB0aGUgY2hhbmdlcyBuZWVkZWQgZm9yIGFuIGFwcGxpY2F0aW9uIHRv IHRha2UgYWR2YW50YWdlCj4gb2YgREFYLsKgwqAKCkFuIGFwcGxpY2F0aW9uIGFsc28gbmVlZHMg dG8gbWFrZSBzdXJlIHRoYXQgYSBnaXZlbiByYW5nZSBbYmFzZSAtCmJhc2Urc2l6ZV0gaXMgZnJl ZSBpbiBWTUEuIMKgVGhlIGFib3ZlIGV4YW1wbGUgdXNlcyBwb3NpeF9tZW1hbGlnbigpIHRvIGZp bmQKc3VjaCBhIHJhbmdlLCB3aGljaCBpbiB0dXJuIGNhbGxzIG1tYXAoKSB3aXRoIHNpemUgYXMg KGZzaXplICsgUE1EX1NJWkUpIGluCnRoaXMgY2FzZS4KCj4gVGhlIE5WTUwgY2hvb3NlcyBhcHBy b3ByaWF0ZSBhZGRyZXNzZXMgYW5kIGdldHMgYSBwcm9wZXJseSBhbGlnbmVkCj4gYWRkcmVzcyB3 aXRob3V0IGFueSBrZXJuZWwgY29kZS4KCkFuIGFwcGxpY2F0aW9uIGxpa2UgTlZNTCBjYW4gY29u dGludWUgdG8gc3BlY2lmeSBhIHNwZWNpZmljIGFkZHJlc3MgdG8KbW1hcCgpLiDCoE1vc3QgZXhp c3RpbmcgYXBwbGljYXRpb25zLCBob3dldmVyLCBkbyBub3Qgc3BlY2lmeSBhbiBhZGRyZXNzIHRv Cm1tYXAoKS4gwqBXaXRoIHRoaXMgcGF0Y2gsIHNwZWNpZnlpbmcgYW4gYWRkcmVzcyB3aWxsIHJl bWFpbiBvcHRpb25hbC4KCj4gPiBDaGFuZ2UgYXJjaF9nZXRfdW5tYXBwZWRfYXJlYSgpIGFuZCBh cmNoX2dldF91bm1hcHBlZF9hcmVhX3RvcGRvd24oKQo+ID4gdG8gcmVxdWVzdCBQTURfU0laRSBh bGlnbm1lbnQgd2hlbiB0aGUgcmVxdWVzdCBpcyBmb3IgYSBEQVggZmlsZSBhbmQKPiA+IGl0cyBt YXBwaW5nIHJhbmdlIGlzIGxhcmdlIGVub3VnaCBmb3IgdXNpbmcgYSBQTUQgcGFnZS4KPgo+IEkg dGhpbmsgdGhpcyBpcyB0aGUgd3JvbmcgcGxhY2UgZm9yIGl0LCBpZiB3ZSBkZWNpZGUgdGhhdCB0 aGlzIGlzIHRoZQo+IHJpZ2h0IHRoaW5nIHRvIGRvLsKgwqBUaGUgZmlsZXN5c3RlbSBoYXMgYSBn ZXRfdW5tYXBwZWRfYXJlYSgpIHdoaWNoCj4gc2hvdWxkIGJlIHVzZWQgaW5zdGVhZC4KClllcywg SSBjb25zaWRlcmVkIGFkZGluZyBhIGZpbGVzeXN0ZW0gZW50cnkgcG9pbnQsIGJ1dCBkZWNpZGVk IGdvaW5nIHRoaXMKd2F5IGJlY2F1c2U6CsKgLcKgYXJjaF9nZXRfdW5tYXBwZWRfYXJlYSgpIGFu ZMKgYXJjaF9nZXRfdW5tYXBwZWRfYXJlYV90b3Bkb3duKCkgYXJlIGFyY2gtCnNwZWNpZmljIGNv ZGUuIMKgVGhlcmVmb3JlLCB0aGlzIGZpbGVzeXN0ZW0gZW50cnkgcG9pbnQgd2lsbCBuZWVkIGFy Y2gtCnNwZWNpZmljIGltcGxlbWVudGF0aW9uLsKgCsKgLSBUaGVyZSBpcyBub3RoaW5nIGZpbGVz eXN0ZW0gc3BlY2lmaWMgYWJvdXQgcmVxdWVzdGluZyBQTUQgYWxpZ25tZW50LgoKPiA+IAo+ID4g QEAgLTE1Nyw2ICsxNTcsMTMgQEAgYXJjaF9nZXRfdW5tYXBwZWRfYXJlYShzdHJ1Y3QgZmlsZSAq ZmlscCwgdW5zaWduZWQKPiA+IGxvbmcgYWRkciwKPiA+IMKgCQlpbmZvLmFsaWduX21hc2sgPSBn ZXRfYWxpZ25fbWFzaygpOwo+ID4gwqAJCWluZm8uYWxpZ25fb2Zmc2V0ICs9IGdldF9hbGlnbl9i aXRzKCk7Cj4gPiDCoAl9Cj4gPiArCWlmIChmaWxwICYmIElTX0VOQUJMRUQoQ09ORklHX0ZTX0RB WF9QTUQpICYmCj4gPiBJU19EQVgoZmlsZV9pbm9kZShmaWxwKSkpIHsKPgo+IEFuZCB0aGVyZSdz IG5ldmVyIGEgbmVlZCBmb3IgdGhlIElTX0VOQUJMRUQuwqDCoElTX0RBWCgpIGNvbXBpbGVzIHRv ICcwJyBpZgo+IENPTkZJR19GU19EQVggaXMgZGlzYWJsZWQuCgpDT05GSUdfRlNfREFYX1BNRCBj YW4gYmUgZGlzYWJsZWQgd2hpbGUgQ09ORklHX0ZTX0RBWCBpcyBlbmFibGVkLgoKPiBBbmQgd2hl cmUgd291bGQgdGhpcyBlbmQ/wqDCoFdvdWxkIHlvdSBhbHNvIGNoYW5nZSB0aGlzIGNvZGUgdG8g bG9vayBmb3IKPiAxR0IgZW50cmllcyBpZiBDT05GSUdfRlNfREFYX1BVRCBpcyBlbmFibGVkP8Kg wqBGYXIgYmV0dGVyIHRvIGhhdmUgdGhpcwo+IGluIHRoZSBpbmRpdmlkdWFsIGZpbGVzeXN0ZW0g KHByb2JhYmx5IGNhbGxpbmcgYSBjb21tb24gaGVscGVyIGluIHRoZSBEQVgKPiBjb2RlKS4KClll cywgaXQgY2FuIGJlIGVhc2lseSBleHRlbmRlZCB0byBzdXBwb3J0IFBVRC4gwqBUaGlzIGF2b2lk cyBhbm90aGVyIHJvdW5kCm9mIGFwcGxpY2F0aW9uIGNoYW5nZXMgdG8gYWxpZ24gd2l0aCB0aGUg UFVEIHNpemUuCgpJZiB0aGUgUFVEIHN1cHBvcnQgdHVybnMgb3V0IHRvIGJlIGZpbGVzeXN0ZW0g c3BlY2lmaWMsIHdlIG1heSBuZWVkIGEKY2FwYWJpbGl0eSBiaXQgaW4gYWRkaXRpb24gdG8gQ09O RklHX0ZTX0RBWF9QVUQuCgpUaGFua3MsCi1Ub3NoaQpfX19fX19fX19fX19fX19fX19fX19fX19f X19fX19fX19fX19fX19fX19fX19fXwpMaW51eC1udmRpbW0gbWFpbGluZyBsaXN0CkxpbnV4LW52 ZGltbUBsaXN0cy4wMS5vcmcKaHR0cHM6Ly9saXN0cy4wMS5vcmcvbWFpbG1hbi9saXN0aW5mby9s aW51eC1udmRpbW0K From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ig0-f169.google.com (mail-ig0-f169.google.com [209.85.213.169]) by kanga.kvack.org (Postfix) with ESMTP id 832356B0005 for ; Wed, 6 Apr 2016 13:52:57 -0400 (EDT) Received: by mail-ig0-f169.google.com with SMTP id gy3so84039432igb.0 for ; Wed, 06 Apr 2016 10:52:57 -0700 (PDT) Received: from g1t5425.austin.hp.com (g1t5425.austin.hp.com. [15.216.225.55]) by mx.google.com with ESMTPS id g79si4005604ioe.130.2016.04.06.10.52.56 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 06 Apr 2016 10:52:57 -0700 (PDT) Message-ID: <1459964672.20338.41.camel@hpe.com> Subject: Re: [PATCH] x86 get_unmapped_area: Add PMD alignment for DAX PMD mmap From: Toshi Kani Date: Wed, 06 Apr 2016 11:44:32 -0600 In-Reply-To: <20160406165027.GA2781@linux.intel.com> References: <1459951089-14911-1-git-send-email-toshi.kani@hpe.com> <20160406165027.GA2781@linux.intel.com> Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org List-ID: To: Matthew Wilcox Cc: mingo@kernel.org, bp@suse.de, hpa@zytor.com, tglx@linutronix.de, dan.j.williams@intel.com, kirill.shutemov@linux.intel.com, linux-mm@kvack.org, x86@kernel.org, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org On Wed, 2016-04-06 at 12:50 -0400, Matthew Wilcox wrote: > On Wed, Apr 06, 2016 at 07:58:09AM -0600, Toshi Kani wrote: > > > > When CONFIG_FS_DAX_PMD is set, DAX supports mmap() using PMD page > > size.A A This feature relies on both mmap virtual address and FS > > block data (i.e. physical address) to be aligned by the PMD page > > size.A A Users can use mkfs options to specify FS to align block > > allocations.A A However, aligning mmap() address requires application > > changes to mmap() calls, such as: > > > > A -A A /* let the kernel to assign a mmap addr */ > > A -A A mptr = mmap(NULL, fsize, PROT_READ|PROT_WRITE, FLAGS, fd, 0); > > > > A +A A /* 1. obtain a PMD-aligned virtual address */ > > A +A A ret = posix_memalign(&mptr, PMD_SIZE, fsize); > > A +A A if (!ret) > > A +A A A A free(mptr);A A /* 2. release the virt addr */ > > A + > > A +A A /* 3. then pass the PMD-aligned virt addr to mmap() */ > > A +A A mptr = mmap(mptr, fsize, PROT_READ|PROT_WRITE, FLAGS, fd, 0); > > > > These changes add unnecessary dependency to DAX and PMD page size > > into application code.A A The kernel should assign a mmap address > > appropriate for the operation. > > I question the need for this patch.A A Choosing an appropriate base address > is the least of the changes needed for an application to take advantage > of DAX.A A An application also needs to make sure that a given range [base - base+size] is free in VMA. A The above example uses posix_memalign() to find such a range, which in turn calls mmap() with size as (fsize + PMD_SIZE) in this case. > The NVML chooses appropriate addresses and gets a properly aligned > address without any kernel code. An application like NVML can continue to specify a specific address to mmap(). A Most existing applications, however, do not specify an address to mmap(). A With this patch, specifying an address will remain optional. > > Change arch_get_unmapped_area() and arch_get_unmapped_area_topdown() > > to request PMD_SIZE alignment when the request is for a DAX file and > > its mapping range is large enough for using a PMD page. > > I think this is the wrong place for it, if we decide that this is the > right thing to do.A A The filesystem has a get_unmapped_area() which > should be used instead. Yes, I considered adding a filesystem entry point, but decided going this way because: A -A arch_get_unmapped_area() andA arch_get_unmapped_area_topdown() are arch- specific code. A Therefore, this filesystem entry point will need arch- specific implementation.A A - There is nothing filesystem specific about requesting PMD alignment. > > > > @@ -157,6 +157,13 @@ arch_get_unmapped_area(struct file *filp, unsigned > > long addr, > > A info.align_mask = get_align_mask(); > > A info.align_offset += get_align_bits(); > > A } > > + if (filp && IS_ENABLED(CONFIG_FS_DAX_PMD) && > > IS_DAX(file_inode(filp))) { > > And there's never a need for the IS_ENABLED.A A IS_DAX() compiles to '0' if > CONFIG_FS_DAX is disabled. CONFIG_FS_DAX_PMD can be disabled while CONFIG_FS_DAX is enabled. > And where would this end?A A Would you also change this code to look for > 1GB entries if CONFIG_FS_DAX_PUD is enabled?A A Far better to have this > in the individual filesystem (probably calling a common helper in the DAX > code). Yes, it can be easily extended to support PUD. A This avoids another round of application changes to align with the PUD size. If the PUD support turns out to be filesystem specific, we may need a capability bit in addition to CONFIG_FS_DAX_PUD. Thanks, -Toshi -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752424AbcDFRw7 (ORCPT ); Wed, 6 Apr 2016 13:52:59 -0400 Received: from g1t5425.austin.hp.com ([15.216.225.55]:38810 "EHLO g1t5425.austin.hp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751507AbcDFRw5 (ORCPT ); Wed, 6 Apr 2016 13:52:57 -0400 Message-ID: <1459964672.20338.41.camel@hpe.com> Subject: Re: [PATCH] x86 get_unmapped_area: Add PMD alignment for DAX PMD mmap From: Toshi Kani To: Matthew Wilcox Cc: mingo@kernel.org, bp@suse.de, hpa@zytor.com, tglx@linutronix.de, dan.j.williams@intel.com, kirill.shutemov@linux.intel.com, linux-mm@kvack.org, x86@kernel.org, linux-nvdimm@ml01.01.org, linux-kernel@vger.kernel.org Date: Wed, 06 Apr 2016 11:44:32 -0600 In-Reply-To: <20160406165027.GA2781@linux.intel.com> References: <1459951089-14911-1-git-send-email-toshi.kani@hpe.com> <20160406165027.GA2781@linux.intel.com> Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.18.5.2 (3.18.5.2-1.fc23) Mime-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 2016-04-06 at 12:50 -0400, Matthew Wilcox wrote: > On Wed, Apr 06, 2016 at 07:58:09AM -0600, Toshi Kani wrote: > > > > When CONFIG_FS_DAX_PMD is set, DAX supports mmap() using PMD page > > size.  This feature relies on both mmap virtual address and FS > > block data (i.e. physical address) to be aligned by the PMD page > > size.  Users can use mkfs options to specify FS to align block > > allocations.  However, aligning mmap() address requires application > > changes to mmap() calls, such as: > > > >  -  /* let the kernel to assign a mmap addr */ > >  -  mptr = mmap(NULL, fsize, PROT_READ|PROT_WRITE, FLAGS, fd, 0); > > > >  +  /* 1. obtain a PMD-aligned virtual address */ > >  +  ret = posix_memalign(&mptr, PMD_SIZE, fsize); > >  +  if (!ret) > >  +    free(mptr);  /* 2. release the virt addr */ > >  + > >  +  /* 3. then pass the PMD-aligned virt addr to mmap() */ > >  +  mptr = mmap(mptr, fsize, PROT_READ|PROT_WRITE, FLAGS, fd, 0); > > > > These changes add unnecessary dependency to DAX and PMD page size > > into application code.  The kernel should assign a mmap address > > appropriate for the operation. > > I question the need for this patch.  Choosing an appropriate base address > is the least of the changes needed for an application to take advantage > of DAX.   An application also needs to make sure that a given range [base - base+size] is free in VMA.  The above example uses posix_memalign() to find such a range, which in turn calls mmap() with size as (fsize + PMD_SIZE) in this case. > The NVML chooses appropriate addresses and gets a properly aligned > address without any kernel code. An application like NVML can continue to specify a specific address to mmap().  Most existing applications, however, do not specify an address to mmap().  With this patch, specifying an address will remain optional. > > Change arch_get_unmapped_area() and arch_get_unmapped_area_topdown() > > to request PMD_SIZE alignment when the request is for a DAX file and > > its mapping range is large enough for using a PMD page. > > I think this is the wrong place for it, if we decide that this is the > right thing to do.  The filesystem has a get_unmapped_area() which > should be used instead. Yes, I considered adding a filesystem entry point, but decided going this way because:  - arch_get_unmapped_area() and arch_get_unmapped_area_topdown() are arch- specific code.  Therefore, this filesystem entry point will need arch- specific implementation.   - There is nothing filesystem specific about requesting PMD alignment. > > > > @@ -157,6 +157,13 @@ arch_get_unmapped_area(struct file *filp, unsigned > > long addr, > >   info.align_mask = get_align_mask(); > >   info.align_offset += get_align_bits(); > >   } > > + if (filp && IS_ENABLED(CONFIG_FS_DAX_PMD) && > > IS_DAX(file_inode(filp))) { > > And there's never a need for the IS_ENABLED.  IS_DAX() compiles to '0' if > CONFIG_FS_DAX is disabled. CONFIG_FS_DAX_PMD can be disabled while CONFIG_FS_DAX is enabled. > And where would this end?  Would you also change this code to look for > 1GB entries if CONFIG_FS_DAX_PUD is enabled?  Far better to have this > in the individual filesystem (probably calling a common helper in the DAX > code). Yes, it can be easily extended to support PUD.  This avoids another round of application changes to align with the PUD size. If the PUD support turns out to be filesystem specific, we may need a capability bit in addition to CONFIG_FS_DAX_PUD. Thanks, -Toshi