From mboxrd@z Thu Jan 1 00:00:00 1970
From: Christoph Hellwig
Subject: Re: [PATCH v2 0/5] fs, xfs: block map immutable files for dax, dma-to-storage, and swap
Date: Sat, 12 Aug 2017 09:33:49 +0200
Message-ID: <20170812073349.GA12679@lst.de>
References: <150181368442.32119.13336247800141074356.stgit@dwillia2-desk3.amr.corp.intel.com> <20170805095013.GC14930@lst.de> <20170811104429.GA13736@lst.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Return-path:
Content-Disposition: inline
In-Reply-To:
Sender: linux-api-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
To: Dan Williams
Cc: Christoph Hellwig, "Darrick J. Wong", Jan Kara, "linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org", Dave Chinner, "linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org", linux-xfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Jeff Moyer, Alexander Viro, Andy Lutomirski, linux-fsdevel, Ross Zwisler, Linux API
List-Id: linux-api@vger.kernel.org

On Fri, Aug 11, 2017 at 03:26:05PM -0700, Dan Williams wrote:
> Right, but they let userspace make inferences about the state of
> metadata relative to I/O to a given storage address. In this regard
> S_IOMAP_IMMUTABLE is no different than MAP_SYNC, but 'immutable' goes
> a step further to let an application infer that the storage address is
> stable. This enables applications that MAP_SYNC does not, see below.

But the application must not know (and cannot know) the storage address,
so it doesn't matter.

> > What is the observable behavior of an extent map change? How can you
> > describe your immutable extent map behavior so that when I violate
> > them by e.g. moving one extent to a different place on disk you can
> > observe that in userspace?
>
> The violation is blocked, it's immutable. Using this feature means the
> application is taking away some of the kernel's freedom. That is a
> valid / safe tradeoff for the set of applications that would otherwise
> resort to raw device access.

What can the application do with it safely that it can't otherwise do?
Short answer: nothing.

> >
> > Please explain how this interface allows for any sort of safe userspace
> > DMA.
>
> So this is where I continue to see S_IOMAP_IMMUTABLE being able to
> support applications that MAP_SYNC does not. Dave mentioned userspace
> pNFS4 servers, but there's also Samba and other protocols that want to
> negotiate a direct path to pmem outside the kernel.

Userspace pNFS servers must use a userspace file system. Everything
else is just braindead stupid due to the amount of communication they
need to do. Also note that the only pNFS layouts that would even cause
direct block access are pNFS block/scsi, and for those the
S_IOMAP_IMMUTABLE semantics are not very useful (background: I wrote
the Linux implementation for those, and authored the scsi layout spec).

> Applications that just want flush from userspace can use MAP_SYNC,
> those that need to temporarily pin the block for RDMA can use the
> in-kernel pNFS server, and those that need to coordinate both from
> userspace can use S_IOMAP_IMMUTABLE. It's a continuum, not a
> competition.

Again - how does your application even know that I moved your block
around with your S_IOMAP_IMMUTABLE? We should never add interfaces
that mandate implementations - we should base interfaces on
user-observable behavior - and debug tools like fiemap don't count.

Before going any further, please write a man page that describes your
intended semantics in a way that an application programmer understands.
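
For readers unfamiliar with the fiemap interface mentioned above, here is a minimal sketch (not part of the original message) of how a program can ask the kernel where a file's extents currently live. It assumes Linux with the FS_IOC_FIEMAP ioctl and a file system that implements it; dump_extents is a hypothetical helper name chosen for illustration. Ordinary read(), write() and mmap() never expose these physical offsets, which is why the message treats fiemap as a debug tool rather than as observable behavior.

```c
#include <fcntl.h>
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): print the logical-to-physical
 * mapping of up to max_extents extents of the file at path.
 * Returns the number of extents reported, or -1 on any error
 * (file cannot be opened, allocation failure, FIEMAP unsupported).
 */
static int dump_extents(const char *path, unsigned int max_extents)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* struct fiemap ends in a flexible array of struct fiemap_extent. */
    size_t size = sizeof(struct fiemap) +
                  (size_t)max_extents * sizeof(struct fiemap_extent);
    struct fiemap *fm = calloc(1, size);
    if (!fm) {
        close(fd);
        return -1;
    }

    fm->fm_start = 0;                    /* map from the start of the file */
    fm->fm_length = FIEMAP_MAX_OFFSET;   /* ... through to its end */
    fm->fm_flags = FIEMAP_FLAG_SYNC;     /* flush dirty data before mapping */
    fm->fm_extent_count = max_extents;   /* capacity of fm_extents[] */

    int ret = -1;
    if (ioctl(fd, FS_IOC_FIEMAP, fm) == 0) {
        for (unsigned int i = 0; i < fm->fm_mapped_extents; i++) {
            const struct fiemap_extent *fe = &fm->fm_extents[i];
            printf("logical %llu -> physical %llu, length %llu, flags 0x%x\n",
                   (unsigned long long)fe->fe_logical,
                   (unsigned long long)fe->fe_physical,
                   (unsigned long long)fe->fe_length,
                   (unsigned int)fe->fe_flags);
        }
        ret = (int)fm->fm_mapped_extents;
    }

    free(fm);
    close(fd);
    return ret;
}
```

Calling dump_extents(path, 32) before and after the file system relocates an extent would show a different physical offset, while the data returned by read() or seen through mmap() stays the same.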