From: Vishal Verma
To: Boaz Harrosh, Vishal Verma, linux-nvdimm@lists.01.org
Cc: linux-block@vger.kernel.org, Jan Kara, Matthew Wilcox, Dave Chinner,
    linux-kernel@vger.kernel.org, xfs@oss.sgi.com, Jens Axboe,
    linux-mm@kvack.org, Al Viro, Christoph Hellwig,
    linux-fsdevel@vger.kernel.org, Andrew Morton, linux-ext4@vger.kernel.org
Subject: Re: [PATCH v4 5/7] fs: prioritize and separate direct_io from dax_io
Date: Mon, 02 May 2016 09:51:31 -0600
Message-ID: <1462204291.11211.20.camel@kernel.org>
In-Reply-To: <5727753F.6090104@plexistor.com>
References: <1461878218-3844-1-git-send-email-vishal.l.verma@intel.com>
    <1461878218-3844-6-git-send-email-vishal.l.verma@intel.com>
    <5727753F.6090104@plexistor.com>

On Mon, 2016-05-02 at 18:41 +0300, Boaz Harrosh wrote:
> On 04/29/2016 12:16 AM, Vishal Verma wrote:
> >
> > All IO in a dax filesystem used to go through dax_do_io, which cannot
> > handle media errors, and thus cannot provide a recovery path that can
> > send a write through the driver to clear errors.
> >
> > Add a new iocb flag for DAX, and set it only for DAX mounts. In the IO
> > path for DAX filesystems, use the same direct_IO path for both DAX and
> > direct_io iocbs, but use the flags to identify when we are in O_DIRECT
> > mode vs non O_DIRECT with DAX, and for O_DIRECT, use the conventional
> > direct_IO path instead of DAX.
> >
> Really? What is your thinking here?
>
> What about all the current users of O_DIRECT? You have just made them
> 4 times slower and "less concurrent*" than "buffered io" users, since
> the direct_IO path will queue an IO request and all.
> (And if it is not so slow then why do we need dax_do_io at all?
> [Rhetorical])
>
> I hate it that you overload the semantics of a known and expected
> O_DIRECT flag, for special pmem quirks. This is an incompatible
> and unrelated overload of the semantics of O_DIRECT.

We overloaded O_DIRECT a long time ago when we made DAX piggyback on
the same path:

static inline bool io_is_direct(struct file *filp)
{
	return (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping->host);
}

Yes, O_DIRECT on a DAX-mounted file system will now be slower, but -

> >
> > This allows us a recovery path in the form of opening the file with
> > O_DIRECT and writing to it with the usual O_DIRECT semantics (sector
> > alignment restrictions).
> >
> I understand that you want sector-aligned IO, right? For the
> clearing of errors. But I hate it that you forced all O_DIRECT IO
> to be slow for this.
> Can you not make dax_do_io handle media errors? At least for the
> parts of the IO that are aligned.
> (And your recovery path application above can use only aligned
> IO to make sure)
>
> Please look for another solution. Even a special
> IOCTL_DAX_CLEAR_ERROR

 - see all the versions of this series prior to this one, where we try
to do a fallback...

>
> [*"less concurrent" because of the queuing done in bdev. Note how
>   pmem is not even multi-queue, and even if it was it would be much
>   slower than DAX because of the code depth and all the locks and task
>   switches done in the block layer. In DAX the final memcpy is done
>   directly on the user-mode thread]
>
> Thanks
> Boaz
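
For readers following the thread, the routing change under discussion can be sketched as a small userspace model in C. This is only an illustration under stated assumptions, not kernel code: the names io_route, model_file, route_before, route_after and sector_aligned are hypothetical and do not appear in the patch. The "before" rule follows the description that all IO on a DAX filesystem goes through dax_do_io; the "after" rule follows the patch description that O_DIRECT takes the conventional direct_IO path even on a DAX mount. The alignment helper models only the offset and length part of the usual O_DIRECT restrictions, with the sector size passed in by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the IO routing; not kernel code. */
enum io_route {
	ROUTE_BUFFERED,   /* ordinary page-cache IO */
	ROUTE_DAX,        /* dax_do_io: direct copy, cannot clear media errors */
	ROUTE_DIRECT_IO,  /* conventional direct_IO through the block driver */
};

struct model_file {
	bool o_direct;    /* file was opened with O_DIRECT */
	bool dax_mount;   /* file lives on a DAX-mounted filesystem */
};

/* Before the patch: on a DAX mount, all IO ends up in dax_do_io. */
static enum io_route route_before(const struct model_file *f)
{
	if (f->dax_mount)
		return ROUTE_DAX;
	return f->o_direct ? ROUTE_DIRECT_IO : ROUTE_BUFFERED;
}

/* With the patch: O_DIRECT is checked first, even on a DAX mount. */
static enum io_route route_after(const struct model_file *f)
{
	if (f->o_direct)
		return ROUTE_DIRECT_IO;
	return f->dax_mount ? ROUTE_DAX : ROUTE_BUFFERED;
}

/* Simplified alignment check: offset and length must both be
 * multiples of the sector size (buffer alignment is not modeled). */
static bool sector_aligned(size_t offset, size_t len, size_t sector_size)
{
	assert(sector_size != 0);
	return offset % sector_size == 0 && len % sector_size == 0;
}
```

In this model, a file opened with O_DIRECT on a DAX mount moves from ROUTE_DAX to ROUTE_DIRECT_IO, which is both the recovery path the patch wants (a write that reaches the driver and can clear errors) and the slowdown the reply objects to. A recovery tool would additionally have to keep its writes sector-aligned, which is what sector_aligned checks for a caller-supplied sector size such as 512.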