diff for duplicates of <1462215110.1421.43.camel@intel.com> diff --git a/a/1.txt b/N1/1.txt index c608940..105a3b6 100644 --- a/a/1.txt +++ b/N1/1.txt @@ -1,82 +1,136 @@ -T24gTW9uLCAyMDE2LTA1LTAyIGF0IDE5OjAzICswMzAwLCBCb2F6IEhhcnJvc2ggd3JvdGU6DQo+ -IE9uIDA1LzAyLzIwMTYgMDY6NTEgUE0sIFZpc2hhbCBWZXJtYSB3cm90ZToNCj4gPiANCj4gPiBP -biBNb24sIDIwMTYtMDUtMDIgYXQgMTg6NDEgKzAzMDAsIEJvYXogSGFycm9zaCB3cm90ZToNCj4g -PiA+IA0KPiA+ID4gT24gMDQvMjkvMjAxNiAxMjoxNiBBTSwgVmlzaGFsIFZlcm1hIHdyb3RlOg0K -PiA+ID4gPiANCj4gPiA+ID4gDQo+ID4gPiA+IEFsbCBJTyBpbiBhIGRheCBmaWxlc3lzdGVtIHVz -ZWQgdG8gZ28gdGhyb3VnaCBkYXhfZG9faW8sIHdoaWNoDQo+ID4gPiA+IGNhbm5vdA0KPiA+ID4g -PiBoYW5kbGUgbWVkaWEgZXJyb3JzLCBhbmQgdGh1cyBjYW5ub3QgcHJvdmlkZSBhIHJlY292ZXJ5 -IHBhdGgNCj4gPiA+ID4gdGhhdA0KPiA+ID4gPiBjYW4NCj4gPiA+ID4gc2VuZCBhIHdyaXRlIHRo -cm91Z2ggdGhlIGRyaXZlciB0byBjbGVhciBlcnJvcnMuDQo+ID4gPiA+IA0KPiA+ID4gPiBBZGQg -YSBuZXcgaW9jYiBmbGFnIGZvciBEQVgsIGFuZCBzZXQgaXQgb25seSBmb3IgREFYIG1vdW50cy4g -SW4NCj4gPiA+ID4gdGhlDQo+ID4gPiA+IElPDQo+ID4gPiA+IHBhdGggZm9yIERBWCBmaWxlc3lz -dGVtcywgdXNlIHRoZSBzYW1lIGRpcmVjdF9JTyBwYXRoIGZvciBib3RoDQo+ID4gPiA+IERBWA0K -PiA+ID4gPiBhbmQNCj4gPiA+ID4gZGlyZWN0X2lvIGlvY2JzLCBidXQgdXNlIHRoZSBmbGFncyB0 -byBpZGVudGlmeSB3aGVuIHdlIGFyZSBpbg0KPiA+ID4gPiBPX0RJUkVDVA0KPiA+ID4gPiBtb2Rl -IHZzIG5vbiBPX0RJUkVDVCB3aXRoIERBWCwgYW5kIGZvciBPX0RJUkVDVCwgdXNlIHRoZQ0KPiA+ -ID4gPiBjb252ZW50aW9uYWwNCj4gPiA+ID4gZGlyZWN0X0lPIHBhdGggaW5zdGVhZCBvZiBEQVgu -DQo+ID4gPiA+IA0KPiA+ID4gUmVhbGx5PyBXaGF0IGFyZSB5b3VyIHRoaW5raW5nIGhlcmU/DQo+ -ID4gPiANCj4gPiA+IFdoYXQgYWJvdXQgYWxsIHRoZSBjdXJyZW50IHVzZXJzIG9mIE9fRElSRUNU -LCB5b3UgaGF2ZSBqdXN0IG1hZGUNCj4gPiA+IHRoZW0NCj4gPiA+IDQgdGltZXMgc2xvd2VyIGFu -ZCAibGVzcyBjb25jdXJyZW50KiIgdGhlbiAiYnVmZnJlZCBpbyIgdXNlcnMuDQo+ID4gPiBTaW5j -ZQ0KPiA+ID4gZGlyZWN0X0lPIHBhdGggd2lsbCBxdWV1ZSBhbiBJTyByZXF1ZXN0IGFuZCBhbGwu -DQo+ID4gPiAoQW5kIGlmIGl0IGlzIG5vdCBzbyBzbG93IHRoZW4gd2h5IGRvIHdlIG5lZWQgZGF4 -X2RvX2lvIGF0IGFsbD8NCj4gPiA+IFtSaGV0b3JpY2FsXSkNCj4gPiA+IA0KPiA+ID4gSSBoYXRl 
-IGl0IHRoYXQgeW91IG92ZXJsb2FkIHRoZSBzZW1hbnRpY3Mgb2YgYSBrbm93biBhbmQgZXhwZWN0 -ZWQNCj4gPiA+IE9fRElSRUNUIGZsYWcsIGZvciBzcGVjaWFsIHBtZW0gcXVpcmtzLiBUaGlzIGlz -IGFuIGluY29tcGF0aWJsZQ0KPiA+ID4gYW5kIHVucmVsYXRlZCBvdmVybG9hZCBvZiB0aGUgc2Vt -YW50aWNzIG9mIE9fRElSRUNULg0KPiA+IFdlIG92ZXJsb2FkZWQgT19ESVJFQ1QgYSBsb25nIHRp -bWUgYWdvIHdoZW4gd2UgbWFkZSBEQVggcGlnZ3liYWNrIG9uDQo+ID4gdGhlIHNhbWUgcGF0aDoN -Cj4gPiANCj4gPiBzdGF0aWMgaW5saW5lIGJvb2wgaW9faXNfZGlyZWN0KHN0cnVjdCBmaWxlICpm -aWxwKQ0KPiA+IHsNCj4gPiAJcmV0dXJuIChmaWxwLT5mX2ZsYWdzICYgT19ESVJFQ1QpIHx8IElT -X0RBWChmaWxwLT5mX21hcHBpbmctDQo+ID4gPmhvc3QpOw0KPiA+IH0NCj4gPiANCj4gTm8gYXMg -ZmFyIGFzIHRoZSB1c2VyIGlzIGNvbmNlcm5lZCB3ZSBoYXZlIG5vdC4gVGhlIE9fRElSRUNUIHVz -ZXINCj4gaXMgc3RpbGwgZ2V0dGluZyBhbGwgdGhlIHNlbWFudGljcyBoZSB3YW50cywgLmkuZSBu -byBzeW5jcyBubw0KPiBtZW1vcnkgY2FjaGUgdXNhZ2UsIG5vIGNvcGllcyAuLi4NCj4gDQo+IE9u -bHkgd2l0aCBEQVggdGhlIGJ1ZmZlcmVkIElPIGlzIHRoZSBzYW1lIHNpbmNlIHdpdGggcG1lbSBp -dCBpcw0KPiBmYXN0ZXIuDQo+IFRoZW4gd2h5IG5vdD8gVGhlIGJhc2ljIGNvbnRyYWN0IHdpdGgg -dGhlIHVzZXIgZGlkIG5vdCBicmVhay4NCj4gDQo+IFRoZSBhYm92ZSB3YXMganVzdCBhbiBpbXBs -ZW1lbnRhdGlvbiBkZXRhaWwgdG8gZWFzaWx5IG5hdmlnYXRlDQo+IHRocm91Z2ggdGhlIExpbnV4 -IHZmcyBJTyBzdGFjayBhbmQgbWFrZSB0aGUgbGVhc3QgYW1vdW50IG9mIGNoYW5nZXMNCj4gaW4g -ZXZlcnkgRlMgdGhhdCB3YW50ZWQgdG8gc3VwcG9ydCBEQVguKEFuZCBzaW5jZSBkYXhfZG9faW8g -aXMgbXVjaA0KPiBtb3JlIGxpa2UgZGlyZWN0X0lPIHRoZW4gbGlrZSBwYWdlLWNhY2hlIElPKQ0K -PiANCj4gPiANCj4gPiBZZXMgT19ESVJFQ1Qgb24gYSBEQVggbW91bnRlZCBmaWxlIHN5c3RlbSB3 -aWxsIG5vdyBiZSBzbG93ZXIsIGJ1dCAtDQo+ID4gDQo+ID4gPiANCj4gPiA+IA0KPiA+ID4gPiAN -Cj4gPiA+ID4gDQo+ID4gPiA+IFRoaXMgYWxsb3dzIHVzIGEgcmVjb3ZlcnkgcGF0aCBpbiB0aGUg -Zm9ybSBvZiBvcGVuaW5nIHRoZSBmaWxlDQo+ID4gPiA+IHdpdGgNCj4gPiA+ID4gT19ESVJFQ1Qg -YW5kIHdyaXRpbmcgdG8gaXQgd2l0aCB0aGUgdXN1YWwgT19ESVJFQ1Qgc2VtYW50aWNzDQo+ID4g -PiA+IChzZWN0b3INCj4gPiA+ID4gYWxpZ25tZW50IHJlc3RyaWN0aW9ucykuDQo+ID4gPiA+IA0K -PiA+ID4gSSB1bmRlcnN0YW5kIHRoYXQgeW91IHdhbnQgYSBzZWN0b3IgYWxpZ25lZCBJTywgcmln 
-aHQ/IGZvciB0aGUNCj4gPiA+IGNsZWFyIG9mIGVycm9ycy4gQnV0IEkgaGF0ZSBpdCB0aGF0IHlv -dSBmb3JjZWQgYWxsIE9fRElSRUNUIElPDQo+ID4gPiB0byBiZSBzbG93IGZvciB0aGlzLg0KPiA+ -ID4gQ2FuIHlvdSBub3QgbWFrZSBkYXhfZG9faW8gaGFuZGxlIG1lZGlhIGVycm9ycz8gQXQgbGVh -c3QgZm9yIHRoZQ0KPiA+ID4gcGFydHMgb2YgdGhlIElPIHRoYXQgYXJlIGFsaWduZWQuDQo+ID4g -PiAoQW5kIHlvdXIgcmVjb3ZlcnkgcGF0aCBhcHBsaWNhdGlvbiBhYm92ZSBjYW4gdXNlIG9ubHkg -YWxpZ25lZA0KPiA+ID4gwqBJTyB0byBtYWtlIHN1cmUpDQo+ID4gPiANCj4gPiA+IFBsZWFzZSBs -b29rIGZvciBhbm90aGVyIHNvbHV0aW9uLiBFdmVuIGEgc3BlY2lhbA0KPiA+ID4gSU9DVExfREFY -X0NMRUFSX0VSUk9SDQo+ID4gwqAtIHNlZSBhbGwgdGhlIHZlcnNpb25zIG9mIHRoaXMgc2VyaWVz -IHByaW9yIHRvIHRoaXMgb25lLCB3aGVyZSB3ZQ0KPiA+IHRyeQ0KPiA+IHRvIGRvIGEgZmFsbGJh -Y2suLi4NCj4gPiANCj4gQW5kPw0KPiANCj4gU28gbm93IGFsbCBPX0RJUkVDVCBBUFBzIGdvIDQg -dGltZXMgc2xvd2VyLiBJIHdpbGwgaGF2ZSBhIGxvb2sgYnV0IGlmDQo+IGl0IGlzIHJlYWxseSBz -byBiYWQgdGhhbiBwbGVhc2UgY29uc2lkZXIgYW4gSU9DVEwgb3Igc3lzY2FsbC4gT3IgYQ0KPiBz -cGVjaWFsDQo+IE9fREFYX0VSUk9SUyBmbGFnIC4uLg0KDQpJJ20gY3VyaW91cyB3aGVyZSB0aGUg -NHggc2xvd2VyIGNvbWVzIGZyb20uLiBUaGUgT19ESVJFQ1QgcGF0aCBpcyBzdGlsbA0Kd2l0aG91 -dCBwYWdlLWNhY2hlIGNvcGllcywgYW5kIG5vciBkb2VzIGl0IGdvIHRocm91Z2ggcmVxdWVzdCBx -dWV1ZXMNCihzaW5jZSBwbWVtIGlzIGEgYmlvLWJhc2VkIGRyaXZlcikuIFRoZSBvbmx5IG92ZXJo -ZWFkIGlzIHRoYXQgb2YNCnN1Ym1pdHRpbmcgYSBiaW8gLSBhbmQgd2hpbGUgSSBhZ3JlZSBpdCBp -cyBtb3JlIG92ZXJoZWFkIHRoYW4gZGF4X2RvX2lvLA0KNHggc2VlbXMgYSBiaXQgaGlnaC4NCg0K -PiANCj4gUGxlYXNlIGRvIG5vdCB0cmFzaCBhbGwgdGhlIE9fRElSRUNUIHVzZXJzLCB0aGV5IGFy -ZSB0aGUgbW9yZQ0KPiBpbXBvcnRhbnQNCj4gY2xpZW50cywgbGlrZSBEQnMgYW5kIFZNcy4NCg0K -U2hvdWxkbid0IHRoZXkgYmUgdXNpbmcgbW1hcHMgYW5kIGRheCBmYXVsdHM/IEkgd2FzIHVuZGVy -IHRoZSBpbXByZXNzaW9uDQp0aGF0IHRoZSBkYXhfZG9faW8gcGF0aCBpcyBhIG5pY2UtdG8taGF2 -ZSwgYnV0IGZvciBhbnlvbmUgdGhhdCB3aWxsIHdhbnQNCnRvIHVzZSBEQVgsIHRoZXkgd2lsbCB3 -YW50IHRoZSBtbWFwL2ZhdWx0IHBhdGgsIG5vdCB0aGUgSU8gcGF0aC4gVGhpcyBpcw0KanVzdCBt -YWtpbmcgdGhlIElPIHBhdGggJ21vcmUgY29ycmVjdCcgYnkgYWxsb3dpbmcgaXQgYSB3YXkgdG8g 
-ZGVhbCB3aXRoDQplcnJvcnMuDQoNCj4gDQo+IFRoYW5rcw0KPiBCb2F6DQo+IA0KPiA+IA0KPiA+ -ID4gDQo+ID4gPiANCj4gPiA+IFsqImxlc3MgY29uY3VycmVudCIgYmVjYXVzZSBvZiB0aGUgcXVl -dWluZyBkb25lIGluIGJkZXYuIE5vdGUgaG93DQo+ID4gPiDCoCBwbWVtIGlzIG5vdCBldmVuIG11 -bHRpLXF1ZXVlLCBhbmQgZXZlbiBpZiBpdCB3YXMgaXQgd2lsbCBiZSBtdWNoDQo+ID4gPiDCoCBz -bG93ZXIgdGhlbiBEQVggYmVjYXVzZSBvZiB0aGUgY29kZSBkZXB0aCBhbmQgYWxsIHRoZSBsb2Nr -cyBhbmQNCj4gPiA+IHRhc2sNCj4gPiA+IMKgIHN3aXRjaGVzIGRvbmUgaW4gdGhlIGJsb2NrIGxh -eWVyLiBJbiBEQVggdGhlIGZpbmFsIG1lbWNweSBpcw0KPiA+ID4gZG9uZQ0KPiA+ID4gZGlyZWN0 -bHkNCj4gPiA+IMKgIG9uIHRoZSB1c2VyLW1vZGUgdGhyZWFkXQ0KPiA+ID4gDQo+ID4gPiBUaGFu -a3MNCj4gPiA+IEJvYXoNCj4gPiA+IA== +On Mon, 2016-05-02 at 19:03 +0300, Boaz Harrosh wrote: +> On 05/02/2016 06:51 PM, Vishal Verma wrote: +> > +> > On Mon, 2016-05-02 at 18:41 +0300, Boaz Harrosh wrote: +> > > +> > > On 04/29/2016 12:16 AM, Vishal Verma wrote: +> > > > +> > > > +> > > > All IO in a dax filesystem used to go through dax_do_io, which +> > > > cannot +> > > > handle media errors, and thus cannot provide a recovery path +> > > > that +> > > > can +> > > > send a write through the driver to clear errors. +> > > > +> > > > Add a new iocb flag for DAX, and set it only for DAX mounts. In +> > > > the +> > > > IO +> > > > path for DAX filesystems, use the same direct_IO path for both +> > > > DAX +> > > > and +> > > > direct_io iocbs, but use the flags to identify when we are in +> > > > O_DIRECT +> > > > mode vs non O_DIRECT with DAX, and for O_DIRECT, use the +> > > > conventional +> > > > direct_IO path instead of DAX. +> > > > +> > > Really? What are your thinking here? +> > > +> > > What about all the current users of O_DIRECT, you have just made +> > > them +> > > 4 times slower and "less concurrent*" then "buffred io" users. +> > > Since +> > > direct_IO path will queue an IO request and all. +> > > (And if it is not so slow then why do we need dax_do_io at all? 
+> > > [Rhetorical]) +> > > +> > > I hate it that you overload the semantics of a known and expected +> > > O_DIRECT flag, for special pmem quirks. This is an incompatible +> > > and unrelated overload of the semantics of O_DIRECT. +> > We overloaded O_DIRECT a long time ago when we made DAX piggyback on +> > the same path: +> > +> > static inline bool io_is_direct(struct file *filp) +> > { +> > return (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping- +> > >host); +> > } +> > +> No as far as the user is concerned we have not. The O_DIRECT user +> is still getting all the semantics he wants, .i.e no syncs no +> memory cache usage, no copies ... +> +> Only with DAX the buffered IO is the same since with pmem it is +> faster. +> Then why not? The basic contract with the user did not break. +> +> The above was just an implementation detail to easily navigate +> through the Linux vfs IO stack and make the least amount of changes +> in every FS that wanted to support DAX.(And since dax_do_io is much +> more like direct_IO then like page-cache IO) +> +> > +> > Yes O_DIRECT on a DAX mounted file system will now be slower, but - +> > +> > > +> > > +> > > > +> > > > +> > > > This allows us a recovery path in the form of opening the file +> > > > with +> > > > O_DIRECT and writing to it with the usual O_DIRECT semantics +> > > > (sector +> > > > alignment restrictions). +> > > > +> > > I understand that you want a sector aligned IO, right? for the +> > > clear of errors. But I hate it that you forced all O_DIRECT IO +> > > to be slow for this. +> > > Can you not make dax_do_io handle media errors? At least for the +> > > parts of the IO that are aligned. +> > > (And your recovery path application above can use only aligned +> > > IO to make sure) +> > > +> > > Please look for another solution. Even a special +> > > IOCTL_DAX_CLEAR_ERROR +> > - see all the versions of this series prior to this one, where we +> > try +> > to do a fallback... +> > +> And? 
+> +> So now all O_DIRECT APPs go 4 times slower. I will have a look but if +> it is really so bad than please consider an IOCTL or syscall. Or a +> special +> O_DAX_ERRORS flag ... + +I'm curious where the 4x slower comes from.. The O_DIRECT path is still +without page-cache copies, and nor does it go through request queues +(since pmem is a bio-based driver). The only overhead is that of +submitting a bio - and while I agree it is more overhead than dax_do_io, +4x seems a bit high. + +> +> Please do not trash all the O_DIRECT users, they are the more +> important +> clients, like DBs and VMs. + +Shouldn't they be using mmaps and dax faults? I was under the impression +that the dax_do_io path is a nice-to-have, but for anyone that will want +to use DAX, they will want the mmap/fault path, not the IO path. This is +just making the IO path 'more correct' by allowing it a way to deal with +errors. + +> +> Thanks +> Boaz +> +> > +> > > +> > > +> > > [*"less concurrent" because of the queuing done in bdev. Note how +> > > pmem is not even multi-queue, and even if it was it will be much +> > > slower then DAX because of the code depth and all the locks and +> > > task +> > > switches done in the block layer. 
In DAX the final memcpy is +> > > done +> > > directly +> > > on the user-mode thread] +> > > +> > > Thanks +> > > Boaz +> > > +_______________________________________________ +xfs mailing list +xfs@oss.sgi.com +http://oss.sgi.com/mailman/listinfo/xfs diff --git a/a/content_digest b/N1/content_digest index 6fb5f8a..4436ead 100644 --- a/a/content_digest +++ b/N1/content_digest @@ -8,102 +8,155 @@ "Date\0Mon, 2 May 2016 18:52:02 +0000\0" "To\0linux-nvdimm@lists.01.org <linux-nvdimm@lists.01.org>" " boaz@plexistor.com <boaz@plexistor.com>\0" - "Cc\0linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>" - linux-block@vger.kernel.org <linux-block@vger.kernel.org> - hch@infradead.org <hch@infradead.org> + "Cc\0hch@infradead.org <hch@infradead.org>" + jack@suse.cz <jack@suse.cz> + matthew@wil.cx <matthew@wil.cx> + axboe@fb.com <axboe@fb.com> + linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org> xfs@oss.sgi.com <xfs@oss.sgi.com> + linux-block@vger.kernel.org <linux-block@vger.kernel.org> linux-mm@kvack.org <linux-mm@kvack.org> viro@zeniv.linux.org.uk <viro@zeniv.linux.org.uk> - axboe@fb.com <axboe@fb.com> - akpm@linux-foundation.org <akpm@linux-foundation.org> linux-fsdevel@vger.kernel.org <linux-fsdevel@vger.kernel.org> - linux-ext4@vger.kernel.org <linux-ext4@vger.kernel.org> - david@fromorbit.com <david@fromorbit.com> - jack@suse.cz <jack@suse.cz> - " matthew@wil.cx <matthew@wil.cx>\0" + akpm@linux-foundation.org <akpm@linux-foundation.org> + " linux-ext4@vger.kernel.org <linux-ext4@vger.kernel.org>\0" "\00:1\0" "b\0" - "T24gTW9uLCAyMDE2LTA1LTAyIGF0IDE5OjAzICswMzAwLCBCb2F6IEhhcnJvc2ggd3JvdGU6DQo+\n" - "IE9uIDA1LzAyLzIwMTYgMDY6NTEgUE0sIFZpc2hhbCBWZXJtYSB3cm90ZToNCj4gPiANCj4gPiBP\n" - "biBNb24sIDIwMTYtMDUtMDIgYXQgMTg6NDEgKzAzMDAsIEJvYXogSGFycm9zaCB3cm90ZToNCj4g\n" - "PiA+IA0KPiA+ID4gT24gMDQvMjkvMjAxNiAxMjoxNiBBTSwgVmlzaGFsIFZlcm1hIHdyb3RlOg0K\n" - "PiA+ID4gPiANCj4gPiA+ID4gDQo+ID4gPiA+IEFsbCBJTyBpbiBhIGRheCBmaWxlc3lzdGVtIHVz\n" - 
"ZWQgdG8gZ28gdGhyb3VnaCBkYXhfZG9faW8sIHdoaWNoDQo+ID4gPiA+IGNhbm5vdA0KPiA+ID4g\n" - "PiBoYW5kbGUgbWVkaWEgZXJyb3JzLCBhbmQgdGh1cyBjYW5ub3QgcHJvdmlkZSBhIHJlY292ZXJ5\n" - "IHBhdGgNCj4gPiA+ID4gdGhhdA0KPiA+ID4gPiBjYW4NCj4gPiA+ID4gc2VuZCBhIHdyaXRlIHRo\n" - "cm91Z2ggdGhlIGRyaXZlciB0byBjbGVhciBlcnJvcnMuDQo+ID4gPiA+IA0KPiA+ID4gPiBBZGQg\n" - "YSBuZXcgaW9jYiBmbGFnIGZvciBEQVgsIGFuZCBzZXQgaXQgb25seSBmb3IgREFYIG1vdW50cy4g\n" - "SW4NCj4gPiA+ID4gdGhlDQo+ID4gPiA+IElPDQo+ID4gPiA+IHBhdGggZm9yIERBWCBmaWxlc3lz\n" - "dGVtcywgdXNlIHRoZSBzYW1lIGRpcmVjdF9JTyBwYXRoIGZvciBib3RoDQo+ID4gPiA+IERBWA0K\n" - "PiA+ID4gPiBhbmQNCj4gPiA+ID4gZGlyZWN0X2lvIGlvY2JzLCBidXQgdXNlIHRoZSBmbGFncyB0\n" - "byBpZGVudGlmeSB3aGVuIHdlIGFyZSBpbg0KPiA+ID4gPiBPX0RJUkVDVA0KPiA+ID4gPiBtb2Rl\n" - "IHZzIG5vbiBPX0RJUkVDVCB3aXRoIERBWCwgYW5kIGZvciBPX0RJUkVDVCwgdXNlIHRoZQ0KPiA+\n" - "ID4gPiBjb252ZW50aW9uYWwNCj4gPiA+ID4gZGlyZWN0X0lPIHBhdGggaW5zdGVhZCBvZiBEQVgu\n" - "DQo+ID4gPiA+IA0KPiA+ID4gUmVhbGx5PyBXaGF0IGFyZSB5b3VyIHRoaW5raW5nIGhlcmU/DQo+\n" - "ID4gPiANCj4gPiA+IFdoYXQgYWJvdXQgYWxsIHRoZSBjdXJyZW50IHVzZXJzIG9mIE9fRElSRUNU\n" - "LCB5b3UgaGF2ZSBqdXN0IG1hZGUNCj4gPiA+IHRoZW0NCj4gPiA+IDQgdGltZXMgc2xvd2VyIGFu\n" - "ZCAibGVzcyBjb25jdXJyZW50KiIgdGhlbiAiYnVmZnJlZCBpbyIgdXNlcnMuDQo+ID4gPiBTaW5j\n" - "ZQ0KPiA+ID4gZGlyZWN0X0lPIHBhdGggd2lsbCBxdWV1ZSBhbiBJTyByZXF1ZXN0IGFuZCBhbGwu\n" - "DQo+ID4gPiAoQW5kIGlmIGl0IGlzIG5vdCBzbyBzbG93IHRoZW4gd2h5IGRvIHdlIG5lZWQgZGF4\n" - "X2RvX2lvIGF0IGFsbD8NCj4gPiA+IFtSaGV0b3JpY2FsXSkNCj4gPiA+IA0KPiA+ID4gSSBoYXRl\n" - "IGl0IHRoYXQgeW91IG92ZXJsb2FkIHRoZSBzZW1hbnRpY3Mgb2YgYSBrbm93biBhbmQgZXhwZWN0\n" - "ZWQNCj4gPiA+IE9fRElSRUNUIGZsYWcsIGZvciBzcGVjaWFsIHBtZW0gcXVpcmtzLiBUaGlzIGlz\n" - "IGFuIGluY29tcGF0aWJsZQ0KPiA+ID4gYW5kIHVucmVsYXRlZCBvdmVybG9hZCBvZiB0aGUgc2Vt\n" - "YW50aWNzIG9mIE9fRElSRUNULg0KPiA+IFdlIG92ZXJsb2FkZWQgT19ESVJFQ1QgYSBsb25nIHRp\n" - "bWUgYWdvIHdoZW4gd2UgbWFkZSBEQVggcGlnZ3liYWNrIG9uDQo+ID4gdGhlIHNhbWUgcGF0aDoN\n" - "Cj4gPiANCj4gPiBzdGF0aWMgaW5saW5lIGJvb2wgaW9faXNfZGlyZWN0KHN0cnVjdCBmaWxlICpm\n" - 
"aWxwKQ0KPiA+IHsNCj4gPiAJcmV0dXJuIChmaWxwLT5mX2ZsYWdzICYgT19ESVJFQ1QpIHx8IElT\n" - "X0RBWChmaWxwLT5mX21hcHBpbmctDQo+ID4gPmhvc3QpOw0KPiA+IH0NCj4gPiANCj4gTm8gYXMg\n" - "ZmFyIGFzIHRoZSB1c2VyIGlzIGNvbmNlcm5lZCB3ZSBoYXZlIG5vdC4gVGhlIE9fRElSRUNUIHVz\n" - "ZXINCj4gaXMgc3RpbGwgZ2V0dGluZyBhbGwgdGhlIHNlbWFudGljcyBoZSB3YW50cywgLmkuZSBu\n" - "byBzeW5jcyBubw0KPiBtZW1vcnkgY2FjaGUgdXNhZ2UsIG5vIGNvcGllcyAuLi4NCj4gDQo+IE9u\n" - "bHkgd2l0aCBEQVggdGhlIGJ1ZmZlcmVkIElPIGlzIHRoZSBzYW1lIHNpbmNlIHdpdGggcG1lbSBp\n" - "dCBpcw0KPiBmYXN0ZXIuDQo+IFRoZW4gd2h5IG5vdD8gVGhlIGJhc2ljIGNvbnRyYWN0IHdpdGgg\n" - "dGhlIHVzZXIgZGlkIG5vdCBicmVhay4NCj4gDQo+IFRoZSBhYm92ZSB3YXMganVzdCBhbiBpbXBs\n" - "ZW1lbnRhdGlvbiBkZXRhaWwgdG8gZWFzaWx5IG5hdmlnYXRlDQo+IHRocm91Z2ggdGhlIExpbnV4\n" - "IHZmcyBJTyBzdGFjayBhbmQgbWFrZSB0aGUgbGVhc3QgYW1vdW50IG9mIGNoYW5nZXMNCj4gaW4g\n" - "ZXZlcnkgRlMgdGhhdCB3YW50ZWQgdG8gc3VwcG9ydCBEQVguKEFuZCBzaW5jZSBkYXhfZG9faW8g\n" - "aXMgbXVjaA0KPiBtb3JlIGxpa2UgZGlyZWN0X0lPIHRoZW4gbGlrZSBwYWdlLWNhY2hlIElPKQ0K\n" - "PiANCj4gPiANCj4gPiBZZXMgT19ESVJFQ1Qgb24gYSBEQVggbW91bnRlZCBmaWxlIHN5c3RlbSB3\n" - "aWxsIG5vdyBiZSBzbG93ZXIsIGJ1dCAtDQo+ID4gDQo+ID4gPiANCj4gPiA+IA0KPiA+ID4gPiAN\n" - "Cj4gPiA+ID4gDQo+ID4gPiA+IFRoaXMgYWxsb3dzIHVzIGEgcmVjb3ZlcnkgcGF0aCBpbiB0aGUg\n" - "Zm9ybSBvZiBvcGVuaW5nIHRoZSBmaWxlDQo+ID4gPiA+IHdpdGgNCj4gPiA+ID4gT19ESVJFQ1Qg\n" - "YW5kIHdyaXRpbmcgdG8gaXQgd2l0aCB0aGUgdXN1YWwgT19ESVJFQ1Qgc2VtYW50aWNzDQo+ID4g\n" - "PiA+IChzZWN0b3INCj4gPiA+ID4gYWxpZ25tZW50IHJlc3RyaWN0aW9ucykuDQo+ID4gPiA+IA0K\n" - "PiA+ID4gSSB1bmRlcnN0YW5kIHRoYXQgeW91IHdhbnQgYSBzZWN0b3IgYWxpZ25lZCBJTywgcmln\n" - "aHQ/IGZvciB0aGUNCj4gPiA+IGNsZWFyIG9mIGVycm9ycy4gQnV0IEkgaGF0ZSBpdCB0aGF0IHlv\n" - "dSBmb3JjZWQgYWxsIE9fRElSRUNUIElPDQo+ID4gPiB0byBiZSBzbG93IGZvciB0aGlzLg0KPiA+\n" - "ID4gQ2FuIHlvdSBub3QgbWFrZSBkYXhfZG9faW8gaGFuZGxlIG1lZGlhIGVycm9ycz8gQXQgbGVh\n" - "c3QgZm9yIHRoZQ0KPiA+ID4gcGFydHMgb2YgdGhlIElPIHRoYXQgYXJlIGFsaWduZWQuDQo+ID4g\n" - "PiAoQW5kIHlvdXIgcmVjb3ZlcnkgcGF0aCBhcHBsaWNhdGlvbiBhYm92ZSBjYW4gdXNlIG9ubHkg\n" - 
"YWxpZ25lZA0KPiA+ID4gwqBJTyB0byBtYWtlIHN1cmUpDQo+ID4gPiANCj4gPiA+IFBsZWFzZSBs\n" - "b29rIGZvciBhbm90aGVyIHNvbHV0aW9uLiBFdmVuIGEgc3BlY2lhbA0KPiA+ID4gSU9DVExfREFY\n" - "X0NMRUFSX0VSUk9SDQo+ID4gwqAtIHNlZSBhbGwgdGhlIHZlcnNpb25zIG9mIHRoaXMgc2VyaWVz\n" - "IHByaW9yIHRvIHRoaXMgb25lLCB3aGVyZSB3ZQ0KPiA+IHRyeQ0KPiA+IHRvIGRvIGEgZmFsbGJh\n" - "Y2suLi4NCj4gPiANCj4gQW5kPw0KPiANCj4gU28gbm93IGFsbCBPX0RJUkVDVCBBUFBzIGdvIDQg\n" - "dGltZXMgc2xvd2VyLiBJIHdpbGwgaGF2ZSBhIGxvb2sgYnV0IGlmDQo+IGl0IGlzIHJlYWxseSBz\n" - "byBiYWQgdGhhbiBwbGVhc2UgY29uc2lkZXIgYW4gSU9DVEwgb3Igc3lzY2FsbC4gT3IgYQ0KPiBz\n" - "cGVjaWFsDQo+IE9fREFYX0VSUk9SUyBmbGFnIC4uLg0KDQpJJ20gY3VyaW91cyB3aGVyZSB0aGUg\n" - "NHggc2xvd2VyIGNvbWVzIGZyb20uLiBUaGUgT19ESVJFQ1QgcGF0aCBpcyBzdGlsbA0Kd2l0aG91\n" - "dCBwYWdlLWNhY2hlIGNvcGllcywgYW5kIG5vciBkb2VzIGl0IGdvIHRocm91Z2ggcmVxdWVzdCBx\n" - "dWV1ZXMNCihzaW5jZSBwbWVtIGlzIGEgYmlvLWJhc2VkIGRyaXZlcikuIFRoZSBvbmx5IG92ZXJo\n" - "ZWFkIGlzIHRoYXQgb2YNCnN1Ym1pdHRpbmcgYSBiaW8gLSBhbmQgd2hpbGUgSSBhZ3JlZSBpdCBp\n" - "cyBtb3JlIG92ZXJoZWFkIHRoYW4gZGF4X2RvX2lvLA0KNHggc2VlbXMgYSBiaXQgaGlnaC4NCg0K\n" - "PiANCj4gUGxlYXNlIGRvIG5vdCB0cmFzaCBhbGwgdGhlIE9fRElSRUNUIHVzZXJzLCB0aGV5IGFy\n" - "ZSB0aGUgbW9yZQ0KPiBpbXBvcnRhbnQNCj4gY2xpZW50cywgbGlrZSBEQnMgYW5kIFZNcy4NCg0K\n" - "U2hvdWxkbid0IHRoZXkgYmUgdXNpbmcgbW1hcHMgYW5kIGRheCBmYXVsdHM/IEkgd2FzIHVuZGVy\n" - "IHRoZSBpbXByZXNzaW9uDQp0aGF0IHRoZSBkYXhfZG9faW8gcGF0aCBpcyBhIG5pY2UtdG8taGF2\n" - "ZSwgYnV0IGZvciBhbnlvbmUgdGhhdCB3aWxsIHdhbnQNCnRvIHVzZSBEQVgsIHRoZXkgd2lsbCB3\n" - "YW50IHRoZSBtbWFwL2ZhdWx0IHBhdGgsIG5vdCB0aGUgSU8gcGF0aC4gVGhpcyBpcw0KanVzdCBt\n" - "YWtpbmcgdGhlIElPIHBhdGggJ21vcmUgY29ycmVjdCcgYnkgYWxsb3dpbmcgaXQgYSB3YXkgdG8g\n" - "ZGVhbCB3aXRoDQplcnJvcnMuDQoNCj4gDQo+IFRoYW5rcw0KPiBCb2F6DQo+IA0KPiA+IA0KPiA+\n" - "ID4gDQo+ID4gPiANCj4gPiA+IFsqImxlc3MgY29uY3VycmVudCIgYmVjYXVzZSBvZiB0aGUgcXVl\n" - "dWluZyBkb25lIGluIGJkZXYuIE5vdGUgaG93DQo+ID4gPiDCoCBwbWVtIGlzIG5vdCBldmVuIG11\n" - "bHRpLXF1ZXVlLCBhbmQgZXZlbiBpZiBpdCB3YXMgaXQgd2lsbCBiZSBtdWNoDQo+ID4gPiDCoCBz\n" - 
"bG93ZXIgdGhlbiBEQVggYmVjYXVzZSBvZiB0aGUgY29kZSBkZXB0aCBhbmQgYWxsIHRoZSBsb2Nr\n" - "cyBhbmQNCj4gPiA+IHRhc2sNCj4gPiA+IMKgIHN3aXRjaGVzIGRvbmUgaW4gdGhlIGJsb2NrIGxh\n" - "eWVyLiBJbiBEQVggdGhlIGZpbmFsIG1lbWNweSBpcw0KPiA+ID4gZG9uZQ0KPiA+ID4gZGlyZWN0\n" - "bHkNCj4gPiA+IMKgIG9uIHRoZSB1c2VyLW1vZGUgdGhyZWFkXQ0KPiA+ID4gDQo+ID4gPiBUaGFu\n" - a3MNCj4gPiA+IEJvYXoNCj4gPiA+IA== + "On Mon, 2016-05-02 at 19:03 +0300, Boaz Harrosh wrote:\n" + "> On 05/02/2016 06:51 PM, Vishal Verma wrote:\n" + "> > \n" + "> > On Mon, 2016-05-02 at 18:41 +0300, Boaz Harrosh wrote:\n" + "> > > \n" + "> > > On 04/29/2016 12:16 AM, Vishal Verma wrote:\n" + "> > > > \n" + "> > > > \n" + "> > > > All IO in a dax filesystem used to go through dax_do_io, which\n" + "> > > > cannot\n" + "> > > > handle media errors, and thus cannot provide a recovery path\n" + "> > > > that\n" + "> > > > can\n" + "> > > > send a write through the driver to clear errors.\n" + "> > > > \n" + "> > > > Add a new iocb flag for DAX, and set it only for DAX mounts. In\n" + "> > > > the\n" + "> > > > IO\n" + "> > > > path for DAX filesystems, use the same direct_IO path for both\n" + "> > > > DAX\n" + "> > > > and\n" + "> > > > direct_io iocbs, but use the flags to identify when we are in\n" + "> > > > O_DIRECT\n" + "> > > > mode vs non O_DIRECT with DAX, and for O_DIRECT, use the\n" + "> > > > conventional\n" + "> > > > direct_IO path instead of DAX.\n" + "> > > > \n" + "> > > Really? What are your thinking here?\n" + "> > > \n" + "> > > What about all the current users of O_DIRECT, you have just made\n" + "> > > them\n" + "> > > 4 times slower and \"less concurrent*\" then \"buffred io\" users.\n" + "> > > Since\n" + "> > > direct_IO path will queue an IO request and all.\n" + "> > > (And if it is not so slow then why do we need dax_do_io at all?\n" + "> > > [Rhetorical])\n" + "> > > \n" + "> > > I hate it that you overload the semantics of a known and expected\n" + "> > > O_DIRECT flag, for special pmem quirks. 
This is an incompatible\n" + "> > > and unrelated overload of the semantics of O_DIRECT.\n" + "> > We overloaded O_DIRECT a long time ago when we made DAX piggyback on\n" + "> > the same path:\n" + "> > \n" + "> > static inline bool io_is_direct(struct file *filp)\n" + "> > {\n" + "> > \treturn (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping-\n" + "> > >host);\n" + "> > }\n" + "> > \n" + "> No as far as the user is concerned we have not. The O_DIRECT user\n" + "> is still getting all the semantics he wants, .i.e no syncs no\n" + "> memory cache usage, no copies ...\n" + "> \n" + "> Only with DAX the buffered IO is the same since with pmem it is\n" + "> faster.\n" + "> Then why not? The basic contract with the user did not break.\n" + "> \n" + "> The above was just an implementation detail to easily navigate\n" + "> through the Linux vfs IO stack and make the least amount of changes\n" + "> in every FS that wanted to support DAX.(And since dax_do_io is much\n" + "> more like direct_IO then like page-cache IO)\n" + "> \n" + "> > \n" + "> > Yes O_DIRECT on a DAX mounted file system will now be slower, but -\n" + "> > \n" + "> > > \n" + "> > > \n" + "> > > > \n" + "> > > > \n" + "> > > > This allows us a recovery path in the form of opening the file\n" + "> > > > with\n" + "> > > > O_DIRECT and writing to it with the usual O_DIRECT semantics\n" + "> > > > (sector\n" + "> > > > alignment restrictions).\n" + "> > > > \n" + "> > > I understand that you want a sector aligned IO, right? for the\n" + "> > > clear of errors. But I hate it that you forced all O_DIRECT IO\n" + "> > > to be slow for this.\n" + "> > > Can you not make dax_do_io handle media errors? At least for the\n" + "> > > parts of the IO that are aligned.\n" + "> > > (And your recovery path application above can use only aligned\n" + "> > > \302\240IO to make sure)\n" + "> > > \n" + "> > > Please look for another solution. 
Even a special\n" + "> > > IOCTL_DAX_CLEAR_ERROR\n" + "> > \302\240- see all the versions of this series prior to this one, where we\n" + "> > try\n" + "> > to do a fallback...\n" + "> > \n" + "> And?\n" + "> \n" + "> So now all O_DIRECT APPs go 4 times slower. I will have a look but if\n" + "> it is really so bad than please consider an IOCTL or syscall. Or a\n" + "> special\n" + "> O_DAX_ERRORS flag ...\n" + "\n" + "I'm curious where the 4x slower comes from.. The O_DIRECT path is still\n" + "without page-cache copies, and nor does it go through request queues\n" + "(since pmem is a bio-based driver). The only overhead is that of\n" + "submitting a bio - and while I agree it is more overhead than dax_do_io,\n" + "4x seems a bit high.\n" + "\n" + "> \n" + "> Please do not trash all the O_DIRECT users, they are the more\n" + "> important\n" + "> clients, like DBs and VMs.\n" + "\n" + "Shouldn't they be using mmaps and dax faults? I was under the impression\n" + "that the dax_do_io path is a nice-to-have, but for anyone that will want\n" + "to use DAX, they will want the mmap/fault path, not the IO path. This is\n" + "just making the IO path 'more correct' by allowing it a way to deal with\n" + "errors.\n" + "\n" + "> \n" + "> Thanks\n" + "> Boaz\n" + "> \n" + "> > \n" + "> > > \n" + "> > > \n" + "> > > [*\"less concurrent\" because of the queuing done in bdev. Note how\n" + "> > > \302\240 pmem is not even multi-queue, and even if it was it will be much\n" + "> > > \302\240 slower then DAX because of the code depth and all the locks and\n" + "> > > task\n" + "> > > \302\240 switches done in the block layer. 
In DAX the final memcpy is\n" + "> > > done\n" + "> > > directly\n" + "> > > \302\240 on the user-mode thread]\n" + "> > > \n" + "> > > Thanks\n" + "> > > Boaz\n" + "> > > \n" + "_______________________________________________\n" + "xfs mailing list\n" + "xfs@oss.sgi.com\n" + http://oss.sgi.com/mailman/listinfo/xfs -24869abd3ea9a39bba870c7d85f8910222fd85059cf703e4e503c13cf44d32f9 +c6b62129be977b04a3d1a751e69ec97daafabb4ca3d8261728afa886d62cedb9
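Note on reading the 1.txt hunks: the removed lines appear to be the base64 transfer encoding of the same message body that the added lines show as plain text. The sketch below decodes the first removed line to check this. The helper name `decode_body_line` is hypothetical and not part of any archive tooling; UTF-8 is assumed as the charset.

```python
import base64

# First removed line of the a/1.txt hunk, copied verbatim from the diff above.
encoded_line = (
    "T24gTW9uLCAyMDE2LTA1LTAyIGF0IDE5OjAzICswMzAwLCBCb2F6IEhhcnJvc2ggd3JvdGU6DQo+"
)


def decode_body_line(line: str) -> str:
    """Decode one base64 body line to text (hypothetical helper, UTF-8 assumed)."""
    return base64.b64decode(line).decode("utf-8")


decoded = decode_body_line(encoded_line)
# repr() makes the CRLF line ending of the encoded copy visible.
print(repr(decoded))
```

The decoded text ends with a CRLF and the `>` that opens the next quoted line, so the two copies differ in transfer encoding and line endings rather than in wording. To decode the whole body, it is safer to join all base64 lines before decoding, since a multi-byte character may straddle a line boundary.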
diff --git a/a/1.txt b/N2/1.txt index c608940..fd75ce0 100644 --- a/a/1.txt +++ b/N2/1.txt @@ -1,82 +1,136 @@ -T24gTW9uLCAyMDE2LTA1LTAyIGF0IDE5OjAzICswMzAwLCBCb2F6IEhhcnJvc2ggd3JvdGU6DQo+ -IE9uIDA1LzAyLzIwMTYgMDY6NTEgUE0sIFZpc2hhbCBWZXJtYSB3cm90ZToNCj4gPiANCj4gPiBP -biBNb24sIDIwMTYtMDUtMDIgYXQgMTg6NDEgKzAzMDAsIEJvYXogSGFycm9zaCB3cm90ZToNCj4g -PiA+IA0KPiA+ID4gT24gMDQvMjkvMjAxNiAxMjoxNiBBTSwgVmlzaGFsIFZlcm1hIHdyb3RlOg0K -PiA+ID4gPiANCj4gPiA+ID4gDQo+ID4gPiA+IEFsbCBJTyBpbiBhIGRheCBmaWxlc3lzdGVtIHVz -ZWQgdG8gZ28gdGhyb3VnaCBkYXhfZG9faW8sIHdoaWNoDQo+ID4gPiA+IGNhbm5vdA0KPiA+ID4g -PiBoYW5kbGUgbWVkaWEgZXJyb3JzLCBhbmQgdGh1cyBjYW5ub3QgcHJvdmlkZSBhIHJlY292ZXJ5 -IHBhdGgNCj4gPiA+ID4gdGhhdA0KPiA+ID4gPiBjYW4NCj4gPiA+ID4gc2VuZCBhIHdyaXRlIHRo -cm91Z2ggdGhlIGRyaXZlciB0byBjbGVhciBlcnJvcnMuDQo+ID4gPiA+IA0KPiA+ID4gPiBBZGQg -YSBuZXcgaW9jYiBmbGFnIGZvciBEQVgsIGFuZCBzZXQgaXQgb25seSBmb3IgREFYIG1vdW50cy4g -SW4NCj4gPiA+ID4gdGhlDQo+ID4gPiA+IElPDQo+ID4gPiA+IHBhdGggZm9yIERBWCBmaWxlc3lz -dGVtcywgdXNlIHRoZSBzYW1lIGRpcmVjdF9JTyBwYXRoIGZvciBib3RoDQo+ID4gPiA+IERBWA0K -PiA+ID4gPiBhbmQNCj4gPiA+ID4gZGlyZWN0X2lvIGlvY2JzLCBidXQgdXNlIHRoZSBmbGFncyB0 -byBpZGVudGlmeSB3aGVuIHdlIGFyZSBpbg0KPiA+ID4gPiBPX0RJUkVDVA0KPiA+ID4gPiBtb2Rl -IHZzIG5vbiBPX0RJUkVDVCB3aXRoIERBWCwgYW5kIGZvciBPX0RJUkVDVCwgdXNlIHRoZQ0KPiA+ -ID4gPiBjb252ZW50aW9uYWwNCj4gPiA+ID4gZGlyZWN0X0lPIHBhdGggaW5zdGVhZCBvZiBEQVgu -DQo+ID4gPiA+IA0KPiA+ID4gUmVhbGx5PyBXaGF0IGFyZSB5b3VyIHRoaW5raW5nIGhlcmU/DQo+ -ID4gPiANCj4gPiA+IFdoYXQgYWJvdXQgYWxsIHRoZSBjdXJyZW50IHVzZXJzIG9mIE9fRElSRUNU -LCB5b3UgaGF2ZSBqdXN0IG1hZGUNCj4gPiA+IHRoZW0NCj4gPiA+IDQgdGltZXMgc2xvd2VyIGFu -ZCAibGVzcyBjb25jdXJyZW50KiIgdGhlbiAiYnVmZnJlZCBpbyIgdXNlcnMuDQo+ID4gPiBTaW5j -ZQ0KPiA+ID4gZGlyZWN0X0lPIHBhdGggd2lsbCBxdWV1ZSBhbiBJTyByZXF1ZXN0IGFuZCBhbGwu -DQo+ID4gPiAoQW5kIGlmIGl0IGlzIG5vdCBzbyBzbG93IHRoZW4gd2h5IGRvIHdlIG5lZWQgZGF4 -X2RvX2lvIGF0IGFsbD8NCj4gPiA+IFtSaGV0b3JpY2FsXSkNCj4gPiA+IA0KPiA+ID4gSSBoYXRl -IGl0IHRoYXQgeW91IG92ZXJsb2FkIHRoZSBzZW1hbnRpY3Mgb2YgYSBrbm93biBhbmQgZXhwZWN0 
-ZWQNCj4gPiA+IE9fRElSRUNUIGZsYWcsIGZvciBzcGVjaWFsIHBtZW0gcXVpcmtzLiBUaGlzIGlz -IGFuIGluY29tcGF0aWJsZQ0KPiA+ID4gYW5kIHVucmVsYXRlZCBvdmVybG9hZCBvZiB0aGUgc2Vt -YW50aWNzIG9mIE9fRElSRUNULg0KPiA+IFdlIG92ZXJsb2FkZWQgT19ESVJFQ1QgYSBsb25nIHRp -bWUgYWdvIHdoZW4gd2UgbWFkZSBEQVggcGlnZ3liYWNrIG9uDQo+ID4gdGhlIHNhbWUgcGF0aDoN -Cj4gPiANCj4gPiBzdGF0aWMgaW5saW5lIGJvb2wgaW9faXNfZGlyZWN0KHN0cnVjdCBmaWxlICpm -aWxwKQ0KPiA+IHsNCj4gPiAJcmV0dXJuIChmaWxwLT5mX2ZsYWdzICYgT19ESVJFQ1QpIHx8IElT -X0RBWChmaWxwLT5mX21hcHBpbmctDQo+ID4gPmhvc3QpOw0KPiA+IH0NCj4gPiANCj4gTm8gYXMg -ZmFyIGFzIHRoZSB1c2VyIGlzIGNvbmNlcm5lZCB3ZSBoYXZlIG5vdC4gVGhlIE9fRElSRUNUIHVz -ZXINCj4gaXMgc3RpbGwgZ2V0dGluZyBhbGwgdGhlIHNlbWFudGljcyBoZSB3YW50cywgLmkuZSBu -byBzeW5jcyBubw0KPiBtZW1vcnkgY2FjaGUgdXNhZ2UsIG5vIGNvcGllcyAuLi4NCj4gDQo+IE9u -bHkgd2l0aCBEQVggdGhlIGJ1ZmZlcmVkIElPIGlzIHRoZSBzYW1lIHNpbmNlIHdpdGggcG1lbSBp -dCBpcw0KPiBmYXN0ZXIuDQo+IFRoZW4gd2h5IG5vdD8gVGhlIGJhc2ljIGNvbnRyYWN0IHdpdGgg -dGhlIHVzZXIgZGlkIG5vdCBicmVhay4NCj4gDQo+IFRoZSBhYm92ZSB3YXMganVzdCBhbiBpbXBs -ZW1lbnRhdGlvbiBkZXRhaWwgdG8gZWFzaWx5IG5hdmlnYXRlDQo+IHRocm91Z2ggdGhlIExpbnV4 -IHZmcyBJTyBzdGFjayBhbmQgbWFrZSB0aGUgbGVhc3QgYW1vdW50IG9mIGNoYW5nZXMNCj4gaW4g -ZXZlcnkgRlMgdGhhdCB3YW50ZWQgdG8gc3VwcG9ydCBEQVguKEFuZCBzaW5jZSBkYXhfZG9faW8g -aXMgbXVjaA0KPiBtb3JlIGxpa2UgZGlyZWN0X0lPIHRoZW4gbGlrZSBwYWdlLWNhY2hlIElPKQ0K -PiANCj4gPiANCj4gPiBZZXMgT19ESVJFQ1Qgb24gYSBEQVggbW91bnRlZCBmaWxlIHN5c3RlbSB3 -aWxsIG5vdyBiZSBzbG93ZXIsIGJ1dCAtDQo+ID4gDQo+ID4gPiANCj4gPiA+IA0KPiA+ID4gPiAN -Cj4gPiA+ID4gDQo+ID4gPiA+IFRoaXMgYWxsb3dzIHVzIGEgcmVjb3ZlcnkgcGF0aCBpbiB0aGUg -Zm9ybSBvZiBvcGVuaW5nIHRoZSBmaWxlDQo+ID4gPiA+IHdpdGgNCj4gPiA+ID4gT19ESVJFQ1Qg -YW5kIHdyaXRpbmcgdG8gaXQgd2l0aCB0aGUgdXN1YWwgT19ESVJFQ1Qgc2VtYW50aWNzDQo+ID4g -PiA+IChzZWN0b3INCj4gPiA+ID4gYWxpZ25tZW50IHJlc3RyaWN0aW9ucykuDQo+ID4gPiA+IA0K -PiA+ID4gSSB1bmRlcnN0YW5kIHRoYXQgeW91IHdhbnQgYSBzZWN0b3IgYWxpZ25lZCBJTywgcmln -aHQ/IGZvciB0aGUNCj4gPiA+IGNsZWFyIG9mIGVycm9ycy4gQnV0IEkgaGF0ZSBpdCB0aGF0IHlv 
-dSBmb3JjZWQgYWxsIE9fRElSRUNUIElPDQo+ID4gPiB0byBiZSBzbG93IGZvciB0aGlzLg0KPiA+ -ID4gQ2FuIHlvdSBub3QgbWFrZSBkYXhfZG9faW8gaGFuZGxlIG1lZGlhIGVycm9ycz8gQXQgbGVh -c3QgZm9yIHRoZQ0KPiA+ID4gcGFydHMgb2YgdGhlIElPIHRoYXQgYXJlIGFsaWduZWQuDQo+ID4g -PiAoQW5kIHlvdXIgcmVjb3ZlcnkgcGF0aCBhcHBsaWNhdGlvbiBhYm92ZSBjYW4gdXNlIG9ubHkg -YWxpZ25lZA0KPiA+ID4gwqBJTyB0byBtYWtlIHN1cmUpDQo+ID4gPiANCj4gPiA+IFBsZWFzZSBs -b29rIGZvciBhbm90aGVyIHNvbHV0aW9uLiBFdmVuIGEgc3BlY2lhbA0KPiA+ID4gSU9DVExfREFY -X0NMRUFSX0VSUk9SDQo+ID4gwqAtIHNlZSBhbGwgdGhlIHZlcnNpb25zIG9mIHRoaXMgc2VyaWVz -IHByaW9yIHRvIHRoaXMgb25lLCB3aGVyZSB3ZQ0KPiA+IHRyeQ0KPiA+IHRvIGRvIGEgZmFsbGJh -Y2suLi4NCj4gPiANCj4gQW5kPw0KPiANCj4gU28gbm93IGFsbCBPX0RJUkVDVCBBUFBzIGdvIDQg -dGltZXMgc2xvd2VyLiBJIHdpbGwgaGF2ZSBhIGxvb2sgYnV0IGlmDQo+IGl0IGlzIHJlYWxseSBz -byBiYWQgdGhhbiBwbGVhc2UgY29uc2lkZXIgYW4gSU9DVEwgb3Igc3lzY2FsbC4gT3IgYQ0KPiBz -cGVjaWFsDQo+IE9fREFYX0VSUk9SUyBmbGFnIC4uLg0KDQpJJ20gY3VyaW91cyB3aGVyZSB0aGUg -NHggc2xvd2VyIGNvbWVzIGZyb20uLiBUaGUgT19ESVJFQ1QgcGF0aCBpcyBzdGlsbA0Kd2l0aG91 -dCBwYWdlLWNhY2hlIGNvcGllcywgYW5kIG5vciBkb2VzIGl0IGdvIHRocm91Z2ggcmVxdWVzdCBx -dWV1ZXMNCihzaW5jZSBwbWVtIGlzIGEgYmlvLWJhc2VkIGRyaXZlcikuIFRoZSBvbmx5IG92ZXJo -ZWFkIGlzIHRoYXQgb2YNCnN1Ym1pdHRpbmcgYSBiaW8gLSBhbmQgd2hpbGUgSSBhZ3JlZSBpdCBp -cyBtb3JlIG92ZXJoZWFkIHRoYW4gZGF4X2RvX2lvLA0KNHggc2VlbXMgYSBiaXQgaGlnaC4NCg0K -PiANCj4gUGxlYXNlIGRvIG5vdCB0cmFzaCBhbGwgdGhlIE9fRElSRUNUIHVzZXJzLCB0aGV5IGFy -ZSB0aGUgbW9yZQ0KPiBpbXBvcnRhbnQNCj4gY2xpZW50cywgbGlrZSBEQnMgYW5kIFZNcy4NCg0K -U2hvdWxkbid0IHRoZXkgYmUgdXNpbmcgbW1hcHMgYW5kIGRheCBmYXVsdHM/IEkgd2FzIHVuZGVy -IHRoZSBpbXByZXNzaW9uDQp0aGF0IHRoZSBkYXhfZG9faW8gcGF0aCBpcyBhIG5pY2UtdG8taGF2 -ZSwgYnV0IGZvciBhbnlvbmUgdGhhdCB3aWxsIHdhbnQNCnRvIHVzZSBEQVgsIHRoZXkgd2lsbCB3 -YW50IHRoZSBtbWFwL2ZhdWx0IHBhdGgsIG5vdCB0aGUgSU8gcGF0aC4gVGhpcyBpcw0KanVzdCBt -YWtpbmcgdGhlIElPIHBhdGggJ21vcmUgY29ycmVjdCcgYnkgYWxsb3dpbmcgaXQgYSB3YXkgdG8g -ZGVhbCB3aXRoDQplcnJvcnMuDQoNCj4gDQo+IFRoYW5rcw0KPiBCb2F6DQo+IA0KPiA+IA0KPiA+ 
-ID4gDQo+ID4gPiANCj4gPiA+IFsqImxlc3MgY29uY3VycmVudCIgYmVjYXVzZSBvZiB0aGUgcXVl -dWluZyBkb25lIGluIGJkZXYuIE5vdGUgaG93DQo+ID4gPiDCoCBwbWVtIGlzIG5vdCBldmVuIG11 -bHRpLXF1ZXVlLCBhbmQgZXZlbiBpZiBpdCB3YXMgaXQgd2lsbCBiZSBtdWNoDQo+ID4gPiDCoCBz -bG93ZXIgdGhlbiBEQVggYmVjYXVzZSBvZiB0aGUgY29kZSBkZXB0aCBhbmQgYWxsIHRoZSBsb2Nr -cyBhbmQNCj4gPiA+IHRhc2sNCj4gPiA+IMKgIHN3aXRjaGVzIGRvbmUgaW4gdGhlIGJsb2NrIGxh -eWVyLiBJbiBEQVggdGhlIGZpbmFsIG1lbWNweSBpcw0KPiA+ID4gZG9uZQ0KPiA+ID4gZGlyZWN0 -bHkNCj4gPiA+IMKgIG9uIHRoZSB1c2VyLW1vZGUgdGhyZWFkXQ0KPiA+ID4gDQo+ID4gPiBUaGFu -a3MNCj4gPiA+IEJvYXoNCj4gPiA+IA== +On Mon, 2016-05-02 at 19:03 +0300, Boaz Harrosh wrote: +> On 05/02/2016 06:51 PM, Vishal Verma wrote: +> > +> > On Mon, 2016-05-02 at 18:41 +0300, Boaz Harrosh wrote: +> > > +> > > On 04/29/2016 12:16 AM, Vishal Verma wrote: +> > > > +> > > > +> > > > All IO in a dax filesystem used to go through dax_do_io, which +> > > > cannot +> > > > handle media errors, and thus cannot provide a recovery path +> > > > that +> > > > can +> > > > send a write through the driver to clear errors. +> > > > +> > > > Add a new iocb flag for DAX, and set it only for DAX mounts. In +> > > > the +> > > > IO +> > > > path for DAX filesystems, use the same direct_IO path for both +> > > > DAX +> > > > and +> > > > direct_io iocbs, but use the flags to identify when we are in +> > > > O_DIRECT +> > > > mode vs non O_DIRECT with DAX, and for O_DIRECT, use the +> > > > conventional +> > > > direct_IO path instead of DAX. +> > > > +> > > Really? What are your thinking here? +> > > +> > > What about all the current users of O_DIRECT, you have just made +> > > them +> > > 4 times slower and "less concurrent*" then "buffred io" users. +> > > Since +> > > direct_IO path will queue an IO request and all. +> > > (And if it is not so slow then why do we need dax_do_io at all? +> > > [Rhetorical]) +> > > +> > > I hate it that you overload the semantics of a known and expected +> > > O_DIRECT flag, for special pmem quirks. 
This is an incompatible +> > > and unrelated overload of the semantics of O_DIRECT. +> > We overloaded O_DIRECT a long time ago when we made DAX piggyback on +> > the same path: +> > +> > static inline bool io_is_direct(struct file *filp) +> > { +> > return (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping- +> > >host); +> > } +> > +> No as far as the user is concerned we have not. The O_DIRECT user +> is still getting all the semantics he wants, .i.e no syncs no +> memory cache usage, no copies ... +> +> Only with DAX the buffered IO is the same since with pmem it is +> faster. +> Then why not? The basic contract with the user did not break. +> +> The above was just an implementation detail to easily navigate +> through the Linux vfs IO stack and make the least amount of changes +> in every FS that wanted to support DAX.(And since dax_do_io is much +> more like direct_IO then like page-cache IO) +> +> > +> > Yes O_DIRECT on a DAX mounted file system will now be slower, but - +> > +> > > +> > > +> > > > +> > > > +> > > > This allows us a recovery path in the form of opening the file +> > > > with +> > > > O_DIRECT and writing to it with the usual O_DIRECT semantics +> > > > (sector +> > > > alignment restrictions). +> > > > +> > > I understand that you want a sector aligned IO, right? for the +> > > clear of errors. But I hate it that you forced all O_DIRECT IO +> > > to be slow for this. +> > > Can you not make dax_do_io handle media errors? At least for the +> > > parts of the IO that are aligned. +> > > (And your recovery path application above can use only aligned +> > > IO to make sure) +> > > +> > > Please look for another solution. Even a special +> > > IOCTL_DAX_CLEAR_ERROR +> > - see all the versions of this series prior to this one, where we +> > try +> > to do a fallback... +> > +> And? +> +> So now all O_DIRECT APPs go 4 times slower. I will have a look but if +> it is really so bad than please consider an IOCTL or syscall. 
Or a +> special +> O_DAX_ERRORS flag ... + +I'm curious where the 4x slower comes from.. The O_DIRECT path is still +without page-cache copies, and nor does it go through request queues +(since pmem is a bio-based driver). The only overhead is that of +submitting a bio - and while I agree it is more overhead than dax_do_io, +4x seems a bit high. + +> +> Please do not trash all the O_DIRECT users, they are the more +> important +> clients, like DBs and VMs. + +Shouldn't they be using mmaps and dax faults? I was under the impression +that the dax_do_io path is a nice-to-have, but for anyone that will want +to use DAX, they will want the mmap/fault path, not the IO path. This is +just making the IO path 'more correct' by allowing it a way to deal with +errors. + +> +> Thanks +> Boaz +> +> > +> > > +> > > +> > > [*"less concurrent" because of the queuing done in bdev. Note how +> > > pmem is not even multi-queue, and even if it was it will be much +> > > slower then DAX because of the code depth and all the locks and +> > > task +> > > switches done in the block layer. 
In DAX the final memcpy is +> > > done +> > > directly +> > > on the user-mode thread] +> > > +> > > Thanks +> > > Boaz +> > > +_______________________________________________ +Linux-nvdimm mailing list +Linux-nvdimm@lists.01.org +https://lists.01.org/mailman/listinfo/linux-nvdimm diff --git a/a/content_digest b/N2/content_digest index 6fb5f8a..d54e34f 100644 --- a/a/content_digest +++ b/N2/content_digest @@ -8,102 +8,156 @@ "Date\0Mon, 2 May 2016 18:52:02 +0000\0" "To\0linux-nvdimm@lists.01.org <linux-nvdimm@lists.01.org>" " boaz@plexistor.com <boaz@plexistor.com>\0" - "Cc\0linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>" - linux-block@vger.kernel.org <linux-block@vger.kernel.org> - hch@infradead.org <hch@infradead.org> + "Cc\0hch@infradead.org <hch@infradead.org>" + jack@suse.cz <jack@suse.cz> + matthew@wil.cx <matthew@wil.cx> + axboe@fb.com <axboe@fb.com> + david@fromorbit.com <david@fromorbit.com> + linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org> xfs@oss.sgi.com <xfs@oss.sgi.com> + linux-block@vger.kernel.org <linux-block@vger.kernel.org> linux-mm@kvack.org <linux-mm@kvack.org> viro@zeniv.linux.org.uk <viro@zeniv.linux.org.uk> - axboe@fb.com <axboe@fb.com> - akpm@linux-foundation.org <akpm@linux-foundation.org> linux-fsdevel@vger.kernel.org <linux-fsdevel@vger.kernel.org> - linux-ext4@vger.kernel.org <linux-ext4@vger.kernel.org> - david@fromorbit.com <david@fromorbit.com> - jack@suse.cz <jack@suse.cz> - " matthew@wil.cx <matthew@wil.cx>\0" + akpm@linux-foundation.org <akpm@linux-foundation.org> + " linux-ext4@vger.kernel.org <linux-ext4@vger.kernel.org>\0" "\00:1\0" "b\0" - "T24gTW9uLCAyMDE2LTA1LTAyIGF0IDE5OjAzICswMzAwLCBCb2F6IEhhcnJvc2ggd3JvdGU6DQo+\n" - "IE9uIDA1LzAyLzIwMTYgMDY6NTEgUE0sIFZpc2hhbCBWZXJtYSB3cm90ZToNCj4gPiANCj4gPiBP\n" - "biBNb24sIDIwMTYtMDUtMDIgYXQgMTg6NDEgKzAzMDAsIEJvYXogSGFycm9zaCB3cm90ZToNCj4g\n" - "PiA+IA0KPiA+ID4gT24gMDQvMjkvMjAxNiAxMjoxNiBBTSwgVmlzaGFsIFZlcm1hIHdyb3RlOg0K\n" - 
"PiA+ID4gPiANCj4gPiA+ID4gDQo+ID4gPiA+IEFsbCBJTyBpbiBhIGRheCBmaWxlc3lzdGVtIHVz\n" - "ZWQgdG8gZ28gdGhyb3VnaCBkYXhfZG9faW8sIHdoaWNoDQo+ID4gPiA+IGNhbm5vdA0KPiA+ID4g\n" - "PiBoYW5kbGUgbWVkaWEgZXJyb3JzLCBhbmQgdGh1cyBjYW5ub3QgcHJvdmlkZSBhIHJlY292ZXJ5\n" - "IHBhdGgNCj4gPiA+ID4gdGhhdA0KPiA+ID4gPiBjYW4NCj4gPiA+ID4gc2VuZCBhIHdyaXRlIHRo\n" - "cm91Z2ggdGhlIGRyaXZlciB0byBjbGVhciBlcnJvcnMuDQo+ID4gPiA+IA0KPiA+ID4gPiBBZGQg\n" - "YSBuZXcgaW9jYiBmbGFnIGZvciBEQVgsIGFuZCBzZXQgaXQgb25seSBmb3IgREFYIG1vdW50cy4g\n" - "SW4NCj4gPiA+ID4gdGhlDQo+ID4gPiA+IElPDQo+ID4gPiA+IHBhdGggZm9yIERBWCBmaWxlc3lz\n" - "dGVtcywgdXNlIHRoZSBzYW1lIGRpcmVjdF9JTyBwYXRoIGZvciBib3RoDQo+ID4gPiA+IERBWA0K\n" - "PiA+ID4gPiBhbmQNCj4gPiA+ID4gZGlyZWN0X2lvIGlvY2JzLCBidXQgdXNlIHRoZSBmbGFncyB0\n" - "byBpZGVudGlmeSB3aGVuIHdlIGFyZSBpbg0KPiA+ID4gPiBPX0RJUkVDVA0KPiA+ID4gPiBtb2Rl\n" - "IHZzIG5vbiBPX0RJUkVDVCB3aXRoIERBWCwgYW5kIGZvciBPX0RJUkVDVCwgdXNlIHRoZQ0KPiA+\n" - "ID4gPiBjb252ZW50aW9uYWwNCj4gPiA+ID4gZGlyZWN0X0lPIHBhdGggaW5zdGVhZCBvZiBEQVgu\n" - "DQo+ID4gPiA+IA0KPiA+ID4gUmVhbGx5PyBXaGF0IGFyZSB5b3VyIHRoaW5raW5nIGhlcmU/DQo+\n" - "ID4gPiANCj4gPiA+IFdoYXQgYWJvdXQgYWxsIHRoZSBjdXJyZW50IHVzZXJzIG9mIE9fRElSRUNU\n" - "LCB5b3UgaGF2ZSBqdXN0IG1hZGUNCj4gPiA+IHRoZW0NCj4gPiA+IDQgdGltZXMgc2xvd2VyIGFu\n" - "ZCAibGVzcyBjb25jdXJyZW50KiIgdGhlbiAiYnVmZnJlZCBpbyIgdXNlcnMuDQo+ID4gPiBTaW5j\n" - "ZQ0KPiA+ID4gZGlyZWN0X0lPIHBhdGggd2lsbCBxdWV1ZSBhbiBJTyByZXF1ZXN0IGFuZCBhbGwu\n" - "DQo+ID4gPiAoQW5kIGlmIGl0IGlzIG5vdCBzbyBzbG93IHRoZW4gd2h5IGRvIHdlIG5lZWQgZGF4\n" - "X2RvX2lvIGF0IGFsbD8NCj4gPiA+IFtSaGV0b3JpY2FsXSkNCj4gPiA+IA0KPiA+ID4gSSBoYXRl\n" - "IGl0IHRoYXQgeW91IG92ZXJsb2FkIHRoZSBzZW1hbnRpY3Mgb2YgYSBrbm93biBhbmQgZXhwZWN0\n" - "ZWQNCj4gPiA+IE9fRElSRUNUIGZsYWcsIGZvciBzcGVjaWFsIHBtZW0gcXVpcmtzLiBUaGlzIGlz\n" - "IGFuIGluY29tcGF0aWJsZQ0KPiA+ID4gYW5kIHVucmVsYXRlZCBvdmVybG9hZCBvZiB0aGUgc2Vt\n" - "YW50aWNzIG9mIE9fRElSRUNULg0KPiA+IFdlIG92ZXJsb2FkZWQgT19ESVJFQ1QgYSBsb25nIHRp\n" - "bWUgYWdvIHdoZW4gd2UgbWFkZSBEQVggcGlnZ3liYWNrIG9uDQo+ID4gdGhlIHNhbWUgcGF0aDoN\n" - 
"Cj4gPiANCj4gPiBzdGF0aWMgaW5saW5lIGJvb2wgaW9faXNfZGlyZWN0KHN0cnVjdCBmaWxlICpm\n" - "aWxwKQ0KPiA+IHsNCj4gPiAJcmV0dXJuIChmaWxwLT5mX2ZsYWdzICYgT19ESVJFQ1QpIHx8IElT\n" - "X0RBWChmaWxwLT5mX21hcHBpbmctDQo+ID4gPmhvc3QpOw0KPiA+IH0NCj4gPiANCj4gTm8gYXMg\n" - "ZmFyIGFzIHRoZSB1c2VyIGlzIGNvbmNlcm5lZCB3ZSBoYXZlIG5vdC4gVGhlIE9fRElSRUNUIHVz\n" - "ZXINCj4gaXMgc3RpbGwgZ2V0dGluZyBhbGwgdGhlIHNlbWFudGljcyBoZSB3YW50cywgLmkuZSBu\n" - "byBzeW5jcyBubw0KPiBtZW1vcnkgY2FjaGUgdXNhZ2UsIG5vIGNvcGllcyAuLi4NCj4gDQo+IE9u\n" - "bHkgd2l0aCBEQVggdGhlIGJ1ZmZlcmVkIElPIGlzIHRoZSBzYW1lIHNpbmNlIHdpdGggcG1lbSBp\n" - "dCBpcw0KPiBmYXN0ZXIuDQo+IFRoZW4gd2h5IG5vdD8gVGhlIGJhc2ljIGNvbnRyYWN0IHdpdGgg\n" - "dGhlIHVzZXIgZGlkIG5vdCBicmVhay4NCj4gDQo+IFRoZSBhYm92ZSB3YXMganVzdCBhbiBpbXBs\n" - "ZW1lbnRhdGlvbiBkZXRhaWwgdG8gZWFzaWx5IG5hdmlnYXRlDQo+IHRocm91Z2ggdGhlIExpbnV4\n" - "IHZmcyBJTyBzdGFjayBhbmQgbWFrZSB0aGUgbGVhc3QgYW1vdW50IG9mIGNoYW5nZXMNCj4gaW4g\n" - "ZXZlcnkgRlMgdGhhdCB3YW50ZWQgdG8gc3VwcG9ydCBEQVguKEFuZCBzaW5jZSBkYXhfZG9faW8g\n" - "aXMgbXVjaA0KPiBtb3JlIGxpa2UgZGlyZWN0X0lPIHRoZW4gbGlrZSBwYWdlLWNhY2hlIElPKQ0K\n" - "PiANCj4gPiANCj4gPiBZZXMgT19ESVJFQ1Qgb24gYSBEQVggbW91bnRlZCBmaWxlIHN5c3RlbSB3\n" - "aWxsIG5vdyBiZSBzbG93ZXIsIGJ1dCAtDQo+ID4gDQo+ID4gPiANCj4gPiA+IA0KPiA+ID4gPiAN\n" - "Cj4gPiA+ID4gDQo+ID4gPiA+IFRoaXMgYWxsb3dzIHVzIGEgcmVjb3ZlcnkgcGF0aCBpbiB0aGUg\n" - "Zm9ybSBvZiBvcGVuaW5nIHRoZSBmaWxlDQo+ID4gPiA+IHdpdGgNCj4gPiA+ID4gT19ESVJFQ1Qg\n" - "YW5kIHdyaXRpbmcgdG8gaXQgd2l0aCB0aGUgdXN1YWwgT19ESVJFQ1Qgc2VtYW50aWNzDQo+ID4g\n" - "PiA+IChzZWN0b3INCj4gPiA+ID4gYWxpZ25tZW50IHJlc3RyaWN0aW9ucykuDQo+ID4gPiA+IA0K\n" - "PiA+ID4gSSB1bmRlcnN0YW5kIHRoYXQgeW91IHdhbnQgYSBzZWN0b3IgYWxpZ25lZCBJTywgcmln\n" - "aHQ/IGZvciB0aGUNCj4gPiA+IGNsZWFyIG9mIGVycm9ycy4gQnV0IEkgaGF0ZSBpdCB0aGF0IHlv\n" - "dSBmb3JjZWQgYWxsIE9fRElSRUNUIElPDQo+ID4gPiB0byBiZSBzbG93IGZvciB0aGlzLg0KPiA+\n" - "ID4gQ2FuIHlvdSBub3QgbWFrZSBkYXhfZG9faW8gaGFuZGxlIG1lZGlhIGVycm9ycz8gQXQgbGVh\n" - "c3QgZm9yIHRoZQ0KPiA+ID4gcGFydHMgb2YgdGhlIElPIHRoYXQgYXJlIGFsaWduZWQuDQo+ID4g\n" - 
"PiAoQW5kIHlvdXIgcmVjb3ZlcnkgcGF0aCBhcHBsaWNhdGlvbiBhYm92ZSBjYW4gdXNlIG9ubHkg\n" - "YWxpZ25lZA0KPiA+ID4gwqBJTyB0byBtYWtlIHN1cmUpDQo+ID4gPiANCj4gPiA+IFBsZWFzZSBs\n" - "b29rIGZvciBhbm90aGVyIHNvbHV0aW9uLiBFdmVuIGEgc3BlY2lhbA0KPiA+ID4gSU9DVExfREFY\n" - "X0NMRUFSX0VSUk9SDQo+ID4gwqAtIHNlZSBhbGwgdGhlIHZlcnNpb25zIG9mIHRoaXMgc2VyaWVz\n" - "IHByaW9yIHRvIHRoaXMgb25lLCB3aGVyZSB3ZQ0KPiA+IHRyeQ0KPiA+IHRvIGRvIGEgZmFsbGJh\n" - "Y2suLi4NCj4gPiANCj4gQW5kPw0KPiANCj4gU28gbm93IGFsbCBPX0RJUkVDVCBBUFBzIGdvIDQg\n" - "dGltZXMgc2xvd2VyLiBJIHdpbGwgaGF2ZSBhIGxvb2sgYnV0IGlmDQo+IGl0IGlzIHJlYWxseSBz\n" - "byBiYWQgdGhhbiBwbGVhc2UgY29uc2lkZXIgYW4gSU9DVEwgb3Igc3lzY2FsbC4gT3IgYQ0KPiBz\n" - "cGVjaWFsDQo+IE9fREFYX0VSUk9SUyBmbGFnIC4uLg0KDQpJJ20gY3VyaW91cyB3aGVyZSB0aGUg\n" - "NHggc2xvd2VyIGNvbWVzIGZyb20uLiBUaGUgT19ESVJFQ1QgcGF0aCBpcyBzdGlsbA0Kd2l0aG91\n" - "dCBwYWdlLWNhY2hlIGNvcGllcywgYW5kIG5vciBkb2VzIGl0IGdvIHRocm91Z2ggcmVxdWVzdCBx\n" - "dWV1ZXMNCihzaW5jZSBwbWVtIGlzIGEgYmlvLWJhc2VkIGRyaXZlcikuIFRoZSBvbmx5IG92ZXJo\n" - "ZWFkIGlzIHRoYXQgb2YNCnN1Ym1pdHRpbmcgYSBiaW8gLSBhbmQgd2hpbGUgSSBhZ3JlZSBpdCBp\n" - "cyBtb3JlIG92ZXJoZWFkIHRoYW4gZGF4X2RvX2lvLA0KNHggc2VlbXMgYSBiaXQgaGlnaC4NCg0K\n" - "PiANCj4gUGxlYXNlIGRvIG5vdCB0cmFzaCBhbGwgdGhlIE9fRElSRUNUIHVzZXJzLCB0aGV5IGFy\n" - "ZSB0aGUgbW9yZQ0KPiBpbXBvcnRhbnQNCj4gY2xpZW50cywgbGlrZSBEQnMgYW5kIFZNcy4NCg0K\n" - "U2hvdWxkbid0IHRoZXkgYmUgdXNpbmcgbW1hcHMgYW5kIGRheCBmYXVsdHM/IEkgd2FzIHVuZGVy\n" - "IHRoZSBpbXByZXNzaW9uDQp0aGF0IHRoZSBkYXhfZG9faW8gcGF0aCBpcyBhIG5pY2UtdG8taGF2\n" - "ZSwgYnV0IGZvciBhbnlvbmUgdGhhdCB3aWxsIHdhbnQNCnRvIHVzZSBEQVgsIHRoZXkgd2lsbCB3\n" - "YW50IHRoZSBtbWFwL2ZhdWx0IHBhdGgsIG5vdCB0aGUgSU8gcGF0aC4gVGhpcyBpcw0KanVzdCBt\n" - "YWtpbmcgdGhlIElPIHBhdGggJ21vcmUgY29ycmVjdCcgYnkgYWxsb3dpbmcgaXQgYSB3YXkgdG8g\n" - "ZGVhbCB3aXRoDQplcnJvcnMuDQoNCj4gDQo+IFRoYW5rcw0KPiBCb2F6DQo+IA0KPiA+IA0KPiA+\n" - "ID4gDQo+ID4gPiANCj4gPiA+IFsqImxlc3MgY29uY3VycmVudCIgYmVjYXVzZSBvZiB0aGUgcXVl\n" - "dWluZyBkb25lIGluIGJkZXYuIE5vdGUgaG93DQo+ID4gPiDCoCBwbWVtIGlzIG5vdCBldmVuIG11\n" - 
"bHRpLXF1ZXVlLCBhbmQgZXZlbiBpZiBpdCB3YXMgaXQgd2lsbCBiZSBtdWNoDQo+ID4gPiDCoCBz\n" - "bG93ZXIgdGhlbiBEQVggYmVjYXVzZSBvZiB0aGUgY29kZSBkZXB0aCBhbmQgYWxsIHRoZSBsb2Nr\n" - "cyBhbmQNCj4gPiA+IHRhc2sNCj4gPiA+IMKgIHN3aXRjaGVzIGRvbmUgaW4gdGhlIGJsb2NrIGxh\n" - "eWVyLiBJbiBEQVggdGhlIGZpbmFsIG1lbWNweSBpcw0KPiA+ID4gZG9uZQ0KPiA+ID4gZGlyZWN0\n" - "bHkNCj4gPiA+IMKgIG9uIHRoZSB1c2VyLW1vZGUgdGhyZWFkXQ0KPiA+ID4gDQo+ID4gPiBUaGFu\n" - a3MNCj4gPiA+IEJvYXoNCj4gPiA+IA== + "On Mon, 2016-05-02 at 19:03 +0300, Boaz Harrosh wrote:\n" + "> On 05/02/2016 06:51 PM, Vishal Verma wrote:\n" + "> > \n" + "> > On Mon, 2016-05-02 at 18:41 +0300, Boaz Harrosh wrote:\n" + "> > > \n" + "> > > On 04/29/2016 12:16 AM, Vishal Verma wrote:\n" + "> > > > \n" + "> > > > \n" + "> > > > All IO in a dax filesystem used to go through dax_do_io, which\n" + "> > > > cannot\n" + "> > > > handle media errors, and thus cannot provide a recovery path\n" + "> > > > that\n" + "> > > > can\n" + "> > > > send a write through the driver to clear errors.\n" + "> > > > \n" + "> > > > Add a new iocb flag for DAX, and set it only for DAX mounts. In\n" + "> > > > the\n" + "> > > > IO\n" + "> > > > path for DAX filesystems, use the same direct_IO path for both\n" + "> > > > DAX\n" + "> > > > and\n" + "> > > > direct_io iocbs, but use the flags to identify when we are in\n" + "> > > > O_DIRECT\n" + "> > > > mode vs non O_DIRECT with DAX, and for O_DIRECT, use the\n" + "> > > > conventional\n" + "> > > > direct_IO path instead of DAX.\n" + "> > > > \n" + "> > > Really? 
What are your thinking here?\n" + "> > > \n" + "> > > What about all the current users of O_DIRECT, you have just made\n" + "> > > them\n" + "> > > 4 times slower and \"less concurrent*\" then \"buffred io\" users.\n" + "> > > Since\n" + "> > > direct_IO path will queue an IO request and all.\n" + "> > > (And if it is not so slow then why do we need dax_do_io at all?\n" + "> > > [Rhetorical])\n" + "> > > \n" + "> > > I hate it that you overload the semantics of a known and expected\n" + "> > > O_DIRECT flag, for special pmem quirks. This is an incompatible\n" + "> > > and unrelated overload of the semantics of O_DIRECT.\n" + "> > We overloaded O_DIRECT a long time ago when we made DAX piggyback on\n" + "> > the same path:\n" + "> > \n" + "> > static inline bool io_is_direct(struct file *filp)\n" + "> > {\n" + "> > \treturn (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping-\n" + "> > >host);\n" + "> > }\n" + "> > \n" + "> No as far as the user is concerned we have not. The O_DIRECT user\n" + "> is still getting all the semantics he wants, .i.e no syncs no\n" + "> memory cache usage, no copies ...\n" + "> \n" + "> Only with DAX the buffered IO is the same since with pmem it is\n" + "> faster.\n" + "> Then why not? 
The basic contract with the user did not break.\n" + "> \n" + "> The above was just an implementation detail to easily navigate\n" + "> through the Linux vfs IO stack and make the least amount of changes\n" + "> in every FS that wanted to support DAX.(And since dax_do_io is much\n" + "> more like direct_IO then like page-cache IO)\n" + "> \n" + "> > \n" + "> > Yes O_DIRECT on a DAX mounted file system will now be slower, but -\n" + "> > \n" + "> > > \n" + "> > > \n" + "> > > > \n" + "> > > > \n" + "> > > > This allows us a recovery path in the form of opening the file\n" + "> > > > with\n" + "> > > > O_DIRECT and writing to it with the usual O_DIRECT semantics\n" + "> > > > (sector\n" + "> > > > alignment restrictions).\n" + "> > > > \n" + "> > > I understand that you want a sector aligned IO, right? for the\n" + "> > > clear of errors. But I hate it that you forced all O_DIRECT IO\n" + "> > > to be slow for this.\n" + "> > > Can you not make dax_do_io handle media errors? At least for the\n" + "> > > parts of the IO that are aligned.\n" + "> > > (And your recovery path application above can use only aligned\n" + "> > > \302\240IO to make sure)\n" + "> > > \n" + "> > > Please look for another solution. Even a special\n" + "> > > IOCTL_DAX_CLEAR_ERROR\n" + "> > \302\240- see all the versions of this series prior to this one, where we\n" + "> > try\n" + "> > to do a fallback...\n" + "> > \n" + "> And?\n" + "> \n" + "> So now all O_DIRECT APPs go 4 times slower. I will have a look but if\n" + "> it is really so bad than please consider an IOCTL or syscall. Or a\n" + "> special\n" + "> O_DAX_ERRORS flag ...\n" + "\n" + "I'm curious where the 4x slower comes from.. The O_DIRECT path is still\n" + "without page-cache copies, and nor does it go through request queues\n" + "(since pmem is a bio-based driver). 
The only overhead is that of\n" + "submitting a bio - and while I agree it is more overhead than dax_do_io,\n" + "4x seems a bit high.\n" + "\n" + "> \n" + "> Please do not trash all the O_DIRECT users, they are the more\n" + "> important\n" + "> clients, like DBs and VMs.\n" + "\n" + "Shouldn't they be using mmaps and dax faults? I was under the impression\n" + "that the dax_do_io path is a nice-to-have, but for anyone that will want\n" + "to use DAX, they will want the mmap/fault path, not the IO path. This is\n" + "just making the IO path 'more correct' by allowing it a way to deal with\n" + "errors.\n" + "\n" + "> \n" + "> Thanks\n" + "> Boaz\n" + "> \n" + "> > \n" + "> > > \n" + "> > > \n" + "> > > [*\"less concurrent\" because of the queuing done in bdev. Note how\n" + "> > > \302\240 pmem is not even multi-queue, and even if it was it will be much\n" + "> > > \302\240 slower then DAX because of the code depth and all the locks and\n" + "> > > task\n" + "> > > \302\240 switches done in the block layer. In DAX the final memcpy is\n" + "> > > done\n" + "> > > directly\n" + "> > > \302\240 on the user-mode thread]\n" + "> > > \n" + "> > > Thanks\n" + "> > > Boaz\n" + "> > > \n" + "_______________________________________________\n" + "Linux-nvdimm mailing list\n" + "Linux-nvdimm@lists.01.org\n" + https://lists.01.org/mailman/listinfo/linux-nvdimm -24869abd3ea9a39bba870c7d85f8910222fd85059cf703e4e503c13cf44d32f9 +10fd1ae6cddab73b2221b4ed39bdcbfabc6145d838b99905bfe21ccf5c24acce
diff --git a/a/1.txt b/N3/1.txt index c608940..529b226 100644 --- a/a/1.txt +++ b/N3/1.txt @@ -1,82 +1,132 @@ -T24gTW9uLCAyMDE2LTA1LTAyIGF0IDE5OjAzICswMzAwLCBCb2F6IEhhcnJvc2ggd3JvdGU6DQo+ -IE9uIDA1LzAyLzIwMTYgMDY6NTEgUE0sIFZpc2hhbCBWZXJtYSB3cm90ZToNCj4gPiANCj4gPiBP -biBNb24sIDIwMTYtMDUtMDIgYXQgMTg6NDEgKzAzMDAsIEJvYXogSGFycm9zaCB3cm90ZToNCj4g -PiA+IA0KPiA+ID4gT24gMDQvMjkvMjAxNiAxMjoxNiBBTSwgVmlzaGFsIFZlcm1hIHdyb3RlOg0K -PiA+ID4gPiANCj4gPiA+ID4gDQo+ID4gPiA+IEFsbCBJTyBpbiBhIGRheCBmaWxlc3lzdGVtIHVz -ZWQgdG8gZ28gdGhyb3VnaCBkYXhfZG9faW8sIHdoaWNoDQo+ID4gPiA+IGNhbm5vdA0KPiA+ID4g -PiBoYW5kbGUgbWVkaWEgZXJyb3JzLCBhbmQgdGh1cyBjYW5ub3QgcHJvdmlkZSBhIHJlY292ZXJ5 -IHBhdGgNCj4gPiA+ID4gdGhhdA0KPiA+ID4gPiBjYW4NCj4gPiA+ID4gc2VuZCBhIHdyaXRlIHRo -cm91Z2ggdGhlIGRyaXZlciB0byBjbGVhciBlcnJvcnMuDQo+ID4gPiA+IA0KPiA+ID4gPiBBZGQg -YSBuZXcgaW9jYiBmbGFnIGZvciBEQVgsIGFuZCBzZXQgaXQgb25seSBmb3IgREFYIG1vdW50cy4g -SW4NCj4gPiA+ID4gdGhlDQo+ID4gPiA+IElPDQo+ID4gPiA+IHBhdGggZm9yIERBWCBmaWxlc3lz -dGVtcywgdXNlIHRoZSBzYW1lIGRpcmVjdF9JTyBwYXRoIGZvciBib3RoDQo+ID4gPiA+IERBWA0K -PiA+ID4gPiBhbmQNCj4gPiA+ID4gZGlyZWN0X2lvIGlvY2JzLCBidXQgdXNlIHRoZSBmbGFncyB0 -byBpZGVudGlmeSB3aGVuIHdlIGFyZSBpbg0KPiA+ID4gPiBPX0RJUkVDVA0KPiA+ID4gPiBtb2Rl -IHZzIG5vbiBPX0RJUkVDVCB3aXRoIERBWCwgYW5kIGZvciBPX0RJUkVDVCwgdXNlIHRoZQ0KPiA+ -ID4gPiBjb252ZW50aW9uYWwNCj4gPiA+ID4gZGlyZWN0X0lPIHBhdGggaW5zdGVhZCBvZiBEQVgu -DQo+ID4gPiA+IA0KPiA+ID4gUmVhbGx5PyBXaGF0IGFyZSB5b3VyIHRoaW5raW5nIGhlcmU/DQo+ -ID4gPiANCj4gPiA+IFdoYXQgYWJvdXQgYWxsIHRoZSBjdXJyZW50IHVzZXJzIG9mIE9fRElSRUNU -LCB5b3UgaGF2ZSBqdXN0IG1hZGUNCj4gPiA+IHRoZW0NCj4gPiA+IDQgdGltZXMgc2xvd2VyIGFu -ZCAibGVzcyBjb25jdXJyZW50KiIgdGhlbiAiYnVmZnJlZCBpbyIgdXNlcnMuDQo+ID4gPiBTaW5j -ZQ0KPiA+ID4gZGlyZWN0X0lPIHBhdGggd2lsbCBxdWV1ZSBhbiBJTyByZXF1ZXN0IGFuZCBhbGwu -DQo+ID4gPiAoQW5kIGlmIGl0IGlzIG5vdCBzbyBzbG93IHRoZW4gd2h5IGRvIHdlIG5lZWQgZGF4 -X2RvX2lvIGF0IGFsbD8NCj4gPiA+IFtSaGV0b3JpY2FsXSkNCj4gPiA+IA0KPiA+ID4gSSBoYXRl -IGl0IHRoYXQgeW91IG92ZXJsb2FkIHRoZSBzZW1hbnRpY3Mgb2YgYSBrbm93biBhbmQgZXhwZWN0 
-ZWQNCj4gPiA+IE9fRElSRUNUIGZsYWcsIGZvciBzcGVjaWFsIHBtZW0gcXVpcmtzLiBUaGlzIGlz -IGFuIGluY29tcGF0aWJsZQ0KPiA+ID4gYW5kIHVucmVsYXRlZCBvdmVybG9hZCBvZiB0aGUgc2Vt -YW50aWNzIG9mIE9fRElSRUNULg0KPiA+IFdlIG92ZXJsb2FkZWQgT19ESVJFQ1QgYSBsb25nIHRp -bWUgYWdvIHdoZW4gd2UgbWFkZSBEQVggcGlnZ3liYWNrIG9uDQo+ID4gdGhlIHNhbWUgcGF0aDoN -Cj4gPiANCj4gPiBzdGF0aWMgaW5saW5lIGJvb2wgaW9faXNfZGlyZWN0KHN0cnVjdCBmaWxlICpm -aWxwKQ0KPiA+IHsNCj4gPiAJcmV0dXJuIChmaWxwLT5mX2ZsYWdzICYgT19ESVJFQ1QpIHx8IElT -X0RBWChmaWxwLT5mX21hcHBpbmctDQo+ID4gPmhvc3QpOw0KPiA+IH0NCj4gPiANCj4gTm8gYXMg -ZmFyIGFzIHRoZSB1c2VyIGlzIGNvbmNlcm5lZCB3ZSBoYXZlIG5vdC4gVGhlIE9fRElSRUNUIHVz -ZXINCj4gaXMgc3RpbGwgZ2V0dGluZyBhbGwgdGhlIHNlbWFudGljcyBoZSB3YW50cywgLmkuZSBu -byBzeW5jcyBubw0KPiBtZW1vcnkgY2FjaGUgdXNhZ2UsIG5vIGNvcGllcyAuLi4NCj4gDQo+IE9u -bHkgd2l0aCBEQVggdGhlIGJ1ZmZlcmVkIElPIGlzIHRoZSBzYW1lIHNpbmNlIHdpdGggcG1lbSBp -dCBpcw0KPiBmYXN0ZXIuDQo+IFRoZW4gd2h5IG5vdD8gVGhlIGJhc2ljIGNvbnRyYWN0IHdpdGgg -dGhlIHVzZXIgZGlkIG5vdCBicmVhay4NCj4gDQo+IFRoZSBhYm92ZSB3YXMganVzdCBhbiBpbXBs -ZW1lbnRhdGlvbiBkZXRhaWwgdG8gZWFzaWx5IG5hdmlnYXRlDQo+IHRocm91Z2ggdGhlIExpbnV4 -IHZmcyBJTyBzdGFjayBhbmQgbWFrZSB0aGUgbGVhc3QgYW1vdW50IG9mIGNoYW5nZXMNCj4gaW4g -ZXZlcnkgRlMgdGhhdCB3YW50ZWQgdG8gc3VwcG9ydCBEQVguKEFuZCBzaW5jZSBkYXhfZG9faW8g -aXMgbXVjaA0KPiBtb3JlIGxpa2UgZGlyZWN0X0lPIHRoZW4gbGlrZSBwYWdlLWNhY2hlIElPKQ0K -PiANCj4gPiANCj4gPiBZZXMgT19ESVJFQ1Qgb24gYSBEQVggbW91bnRlZCBmaWxlIHN5c3RlbSB3 -aWxsIG5vdyBiZSBzbG93ZXIsIGJ1dCAtDQo+ID4gDQo+ID4gPiANCj4gPiA+IA0KPiA+ID4gPiAN -Cj4gPiA+ID4gDQo+ID4gPiA+IFRoaXMgYWxsb3dzIHVzIGEgcmVjb3ZlcnkgcGF0aCBpbiB0aGUg -Zm9ybSBvZiBvcGVuaW5nIHRoZSBmaWxlDQo+ID4gPiA+IHdpdGgNCj4gPiA+ID4gT19ESVJFQ1Qg -YW5kIHdyaXRpbmcgdG8gaXQgd2l0aCB0aGUgdXN1YWwgT19ESVJFQ1Qgc2VtYW50aWNzDQo+ID4g -PiA+IChzZWN0b3INCj4gPiA+ID4gYWxpZ25tZW50IHJlc3RyaWN0aW9ucykuDQo+ID4gPiA+IA0K -PiA+ID4gSSB1bmRlcnN0YW5kIHRoYXQgeW91IHdhbnQgYSBzZWN0b3IgYWxpZ25lZCBJTywgcmln -aHQ/IGZvciB0aGUNCj4gPiA+IGNsZWFyIG9mIGVycm9ycy4gQnV0IEkgaGF0ZSBpdCB0aGF0IHlv 
-dSBmb3JjZWQgYWxsIE9fRElSRUNUIElPDQo+ID4gPiB0byBiZSBzbG93IGZvciB0aGlzLg0KPiA+ -ID4gQ2FuIHlvdSBub3QgbWFrZSBkYXhfZG9faW8gaGFuZGxlIG1lZGlhIGVycm9ycz8gQXQgbGVh -c3QgZm9yIHRoZQ0KPiA+ID4gcGFydHMgb2YgdGhlIElPIHRoYXQgYXJlIGFsaWduZWQuDQo+ID4g -PiAoQW5kIHlvdXIgcmVjb3ZlcnkgcGF0aCBhcHBsaWNhdGlvbiBhYm92ZSBjYW4gdXNlIG9ubHkg -YWxpZ25lZA0KPiA+ID4gwqBJTyB0byBtYWtlIHN1cmUpDQo+ID4gPiANCj4gPiA+IFBsZWFzZSBs -b29rIGZvciBhbm90aGVyIHNvbHV0aW9uLiBFdmVuIGEgc3BlY2lhbA0KPiA+ID4gSU9DVExfREFY -X0NMRUFSX0VSUk9SDQo+ID4gwqAtIHNlZSBhbGwgdGhlIHZlcnNpb25zIG9mIHRoaXMgc2VyaWVz -IHByaW9yIHRvIHRoaXMgb25lLCB3aGVyZSB3ZQ0KPiA+IHRyeQ0KPiA+IHRvIGRvIGEgZmFsbGJh -Y2suLi4NCj4gPiANCj4gQW5kPw0KPiANCj4gU28gbm93IGFsbCBPX0RJUkVDVCBBUFBzIGdvIDQg -dGltZXMgc2xvd2VyLiBJIHdpbGwgaGF2ZSBhIGxvb2sgYnV0IGlmDQo+IGl0IGlzIHJlYWxseSBz -byBiYWQgdGhhbiBwbGVhc2UgY29uc2lkZXIgYW4gSU9DVEwgb3Igc3lzY2FsbC4gT3IgYQ0KPiBz -cGVjaWFsDQo+IE9fREFYX0VSUk9SUyBmbGFnIC4uLg0KDQpJJ20gY3VyaW91cyB3aGVyZSB0aGUg -NHggc2xvd2VyIGNvbWVzIGZyb20uLiBUaGUgT19ESVJFQ1QgcGF0aCBpcyBzdGlsbA0Kd2l0aG91 -dCBwYWdlLWNhY2hlIGNvcGllcywgYW5kIG5vciBkb2VzIGl0IGdvIHRocm91Z2ggcmVxdWVzdCBx -dWV1ZXMNCihzaW5jZSBwbWVtIGlzIGEgYmlvLWJhc2VkIGRyaXZlcikuIFRoZSBvbmx5IG92ZXJo -ZWFkIGlzIHRoYXQgb2YNCnN1Ym1pdHRpbmcgYSBiaW8gLSBhbmQgd2hpbGUgSSBhZ3JlZSBpdCBp -cyBtb3JlIG92ZXJoZWFkIHRoYW4gZGF4X2RvX2lvLA0KNHggc2VlbXMgYSBiaXQgaGlnaC4NCg0K -PiANCj4gUGxlYXNlIGRvIG5vdCB0cmFzaCBhbGwgdGhlIE9fRElSRUNUIHVzZXJzLCB0aGV5IGFy -ZSB0aGUgbW9yZQ0KPiBpbXBvcnRhbnQNCj4gY2xpZW50cywgbGlrZSBEQnMgYW5kIFZNcy4NCg0K -U2hvdWxkbid0IHRoZXkgYmUgdXNpbmcgbW1hcHMgYW5kIGRheCBmYXVsdHM/IEkgd2FzIHVuZGVy -IHRoZSBpbXByZXNzaW9uDQp0aGF0IHRoZSBkYXhfZG9faW8gcGF0aCBpcyBhIG5pY2UtdG8taGF2 -ZSwgYnV0IGZvciBhbnlvbmUgdGhhdCB3aWxsIHdhbnQNCnRvIHVzZSBEQVgsIHRoZXkgd2lsbCB3 -YW50IHRoZSBtbWFwL2ZhdWx0IHBhdGgsIG5vdCB0aGUgSU8gcGF0aC4gVGhpcyBpcw0KanVzdCBt -YWtpbmcgdGhlIElPIHBhdGggJ21vcmUgY29ycmVjdCcgYnkgYWxsb3dpbmcgaXQgYSB3YXkgdG8g -ZGVhbCB3aXRoDQplcnJvcnMuDQoNCj4gDQo+IFRoYW5rcw0KPiBCb2F6DQo+IA0KPiA+IA0KPiA+ 
-ID4gDQo+ID4gPiANCj4gPiA+IFsqImxlc3MgY29uY3VycmVudCIgYmVjYXVzZSBvZiB0aGUgcXVl -dWluZyBkb25lIGluIGJkZXYuIE5vdGUgaG93DQo+ID4gPiDCoCBwbWVtIGlzIG5vdCBldmVuIG11 -bHRpLXF1ZXVlLCBhbmQgZXZlbiBpZiBpdCB3YXMgaXQgd2lsbCBiZSBtdWNoDQo+ID4gPiDCoCBz -bG93ZXIgdGhlbiBEQVggYmVjYXVzZSBvZiB0aGUgY29kZSBkZXB0aCBhbmQgYWxsIHRoZSBsb2Nr -cyBhbmQNCj4gPiA+IHRhc2sNCj4gPiA+IMKgIHN3aXRjaGVzIGRvbmUgaW4gdGhlIGJsb2NrIGxh -eWVyLiBJbiBEQVggdGhlIGZpbmFsIG1lbWNweSBpcw0KPiA+ID4gZG9uZQ0KPiA+ID4gZGlyZWN0 -bHkNCj4gPiA+IMKgIG9uIHRoZSB1c2VyLW1vZGUgdGhyZWFkXQ0KPiA+ID4gDQo+ID4gPiBUaGFu -a3MNCj4gPiA+IEJvYXoNCj4gPiA+IA== +On Mon, 2016-05-02 at 19:03 +0300, Boaz Harrosh wrote: +> On 05/02/2016 06:51 PM, Vishal Verma wrote: +> > +> > On Mon, 2016-05-02 at 18:41 +0300, Boaz Harrosh wrote: +> > > +> > > On 04/29/2016 12:16 AM, Vishal Verma wrote: +> > > > +> > > > +> > > > All IO in a dax filesystem used to go through dax_do_io, which +> > > > cannot +> > > > handle media errors, and thus cannot provide a recovery path +> > > > that +> > > > can +> > > > send a write through the driver to clear errors. +> > > > +> > > > Add a new iocb flag for DAX, and set it only for DAX mounts. In +> > > > the +> > > > IO +> > > > path for DAX filesystems, use the same direct_IO path for both +> > > > DAX +> > > > and +> > > > direct_io iocbs, but use the flags to identify when we are in +> > > > O_DIRECT +> > > > mode vs non O_DIRECT with DAX, and for O_DIRECT, use the +> > > > conventional +> > > > direct_IO path instead of DAX. +> > > > +> > > Really? What are your thinking here? +> > > +> > > What about all the current users of O_DIRECT, you have just made +> > > them +> > > 4 times slower and "less concurrent*" then "buffred io" users. +> > > Since +> > > direct_IO path will queue an IO request and all. +> > > (And if it is not so slow then why do we need dax_do_io at all? +> > > [Rhetorical]) +> > > +> > > I hate it that you overload the semantics of a known and expected +> > > O_DIRECT flag, for special pmem quirks. 
This is an incompatible +> > > and unrelated overload of the semantics of O_DIRECT. +> > We overloaded O_DIRECT a long time ago when we made DAX piggyback on +> > the same path: +> > +> > static inline bool io_is_direct(struct file *filp) +> > { +> > return (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping- +> > >host); +> > } +> > +> No as far as the user is concerned we have not. The O_DIRECT user +> is still getting all the semantics he wants, .i.e no syncs no +> memory cache usage, no copies ... +> +> Only with DAX the buffered IO is the same since with pmem it is +> faster. +> Then why not? The basic contract with the user did not break. +> +> The above was just an implementation detail to easily navigate +> through the Linux vfs IO stack and make the least amount of changes +> in every FS that wanted to support DAX.(And since dax_do_io is much +> more like direct_IO then like page-cache IO) +> +> > +> > Yes O_DIRECT on a DAX mounted file system will now be slower, but - +> > +> > > +> > > +> > > > +> > > > +> > > > This allows us a recovery path in the form of opening the file +> > > > with +> > > > O_DIRECT and writing to it with the usual O_DIRECT semantics +> > > > (sector +> > > > alignment restrictions). +> > > > +> > > I understand that you want a sector aligned IO, right? for the +> > > clear of errors. But I hate it that you forced all O_DIRECT IO +> > > to be slow for this. +> > > Can you not make dax_do_io handle media errors? At least for the +> > > parts of the IO that are aligned. +> > > (And your recovery path application above can use only aligned +> > > IO to make sure) +> > > +> > > Please look for another solution. Even a special +> > > IOCTL_DAX_CLEAR_ERROR +> > - see all the versions of this series prior to this one, where we +> > try +> > to do a fallback... +> > +> And? +> +> So now all O_DIRECT APPs go 4 times slower. I will have a look but if +> it is really so bad than please consider an IOCTL or syscall. 
Or a +> special +> O_DAX_ERRORS flag ... + +I'm curious where the 4x slower comes from.. The O_DIRECT path is still +without page-cache copies, and nor does it go through request queues +(since pmem is a bio-based driver). The only overhead is that of +submitting a bio - and while I agree it is more overhead than dax_do_io, +4x seems a bit high. + +> +> Please do not trash all the O_DIRECT users, they are the more +> important +> clients, like DBs and VMs. + +Shouldn't they be using mmaps and dax faults? I was under the impression +that the dax_do_io path is a nice-to-have, but for anyone that will want +to use DAX, they will want the mmap/fault path, not the IO path. This is +just making the IO path 'more correct' by allowing it a way to deal with +errors. + +> +> Thanks +> Boaz +> +> > +> > > +> > > +> > > [*"less concurrent" because of the queuing done in bdev. Note how +> > > pmem is not even multi-queue, and even if it was it will be much +> > > slower then DAX because of the code depth and all the locks and +> > > task +> > > switches done in the block layer. 
In DAX the final memcpy is +> > > done +> > > directly +> > > on the user-mode thread] +> > > +> > > Thanks +> > > Boaz +> > > diff --git a/a/content_digest b/N3/content_digest index 6fb5f8a..69d8743 100644 --- a/a/content_digest +++ b/N3/content_digest @@ -23,87 +23,137 @@ " matthew@wil.cx <matthew@wil.cx>\0" "\00:1\0" "b\0" - "T24gTW9uLCAyMDE2LTA1LTAyIGF0IDE5OjAzICswMzAwLCBCb2F6IEhhcnJvc2ggd3JvdGU6DQo+\n" - "IE9uIDA1LzAyLzIwMTYgMDY6NTEgUE0sIFZpc2hhbCBWZXJtYSB3cm90ZToNCj4gPiANCj4gPiBP\n" - "biBNb24sIDIwMTYtMDUtMDIgYXQgMTg6NDEgKzAzMDAsIEJvYXogSGFycm9zaCB3cm90ZToNCj4g\n" - "PiA+IA0KPiA+ID4gT24gMDQvMjkvMjAxNiAxMjoxNiBBTSwgVmlzaGFsIFZlcm1hIHdyb3RlOg0K\n" - "PiA+ID4gPiANCj4gPiA+ID4gDQo+ID4gPiA+IEFsbCBJTyBpbiBhIGRheCBmaWxlc3lzdGVtIHVz\n" - "ZWQgdG8gZ28gdGhyb3VnaCBkYXhfZG9faW8sIHdoaWNoDQo+ID4gPiA+IGNhbm5vdA0KPiA+ID4g\n" - "PiBoYW5kbGUgbWVkaWEgZXJyb3JzLCBhbmQgdGh1cyBjYW5ub3QgcHJvdmlkZSBhIHJlY292ZXJ5\n" - "IHBhdGgNCj4gPiA+ID4gdGhhdA0KPiA+ID4gPiBjYW4NCj4gPiA+ID4gc2VuZCBhIHdyaXRlIHRo\n" - "cm91Z2ggdGhlIGRyaXZlciB0byBjbGVhciBlcnJvcnMuDQo+ID4gPiA+IA0KPiA+ID4gPiBBZGQg\n" - "YSBuZXcgaW9jYiBmbGFnIGZvciBEQVgsIGFuZCBzZXQgaXQgb25seSBmb3IgREFYIG1vdW50cy4g\n" - "SW4NCj4gPiA+ID4gdGhlDQo+ID4gPiA+IElPDQo+ID4gPiA+IHBhdGggZm9yIERBWCBmaWxlc3lz\n" - "dGVtcywgdXNlIHRoZSBzYW1lIGRpcmVjdF9JTyBwYXRoIGZvciBib3RoDQo+ID4gPiA+IERBWA0K\n" - "PiA+ID4gPiBhbmQNCj4gPiA+ID4gZGlyZWN0X2lvIGlvY2JzLCBidXQgdXNlIHRoZSBmbGFncyB0\n" - "byBpZGVudGlmeSB3aGVuIHdlIGFyZSBpbg0KPiA+ID4gPiBPX0RJUkVDVA0KPiA+ID4gPiBtb2Rl\n" - "IHZzIG5vbiBPX0RJUkVDVCB3aXRoIERBWCwgYW5kIGZvciBPX0RJUkVDVCwgdXNlIHRoZQ0KPiA+\n" - "ID4gPiBjb252ZW50aW9uYWwNCj4gPiA+ID4gZGlyZWN0X0lPIHBhdGggaW5zdGVhZCBvZiBEQVgu\n" - "DQo+ID4gPiA+IA0KPiA+ID4gUmVhbGx5PyBXaGF0IGFyZSB5b3VyIHRoaW5raW5nIGhlcmU/DQo+\n" - "ID4gPiANCj4gPiA+IFdoYXQgYWJvdXQgYWxsIHRoZSBjdXJyZW50IHVzZXJzIG9mIE9fRElSRUNU\n" - "LCB5b3UgaGF2ZSBqdXN0IG1hZGUNCj4gPiA+IHRoZW0NCj4gPiA+IDQgdGltZXMgc2xvd2VyIGFu\n" - "ZCAibGVzcyBjb25jdXJyZW50KiIgdGhlbiAiYnVmZnJlZCBpbyIgdXNlcnMuDQo+ID4gPiBTaW5j\n" - 
"ZQ0KPiA+ID4gZGlyZWN0X0lPIHBhdGggd2lsbCBxdWV1ZSBhbiBJTyByZXF1ZXN0IGFuZCBhbGwu\n" - "DQo+ID4gPiAoQW5kIGlmIGl0IGlzIG5vdCBzbyBzbG93IHRoZW4gd2h5IGRvIHdlIG5lZWQgZGF4\n" - "X2RvX2lvIGF0IGFsbD8NCj4gPiA+IFtSaGV0b3JpY2FsXSkNCj4gPiA+IA0KPiA+ID4gSSBoYXRl\n" - "IGl0IHRoYXQgeW91IG92ZXJsb2FkIHRoZSBzZW1hbnRpY3Mgb2YgYSBrbm93biBhbmQgZXhwZWN0\n" - "ZWQNCj4gPiA+IE9fRElSRUNUIGZsYWcsIGZvciBzcGVjaWFsIHBtZW0gcXVpcmtzLiBUaGlzIGlz\n" - "IGFuIGluY29tcGF0aWJsZQ0KPiA+ID4gYW5kIHVucmVsYXRlZCBvdmVybG9hZCBvZiB0aGUgc2Vt\n" - "YW50aWNzIG9mIE9fRElSRUNULg0KPiA+IFdlIG92ZXJsb2FkZWQgT19ESVJFQ1QgYSBsb25nIHRp\n" - "bWUgYWdvIHdoZW4gd2UgbWFkZSBEQVggcGlnZ3liYWNrIG9uDQo+ID4gdGhlIHNhbWUgcGF0aDoN\n" - "Cj4gPiANCj4gPiBzdGF0aWMgaW5saW5lIGJvb2wgaW9faXNfZGlyZWN0KHN0cnVjdCBmaWxlICpm\n" - "aWxwKQ0KPiA+IHsNCj4gPiAJcmV0dXJuIChmaWxwLT5mX2ZsYWdzICYgT19ESVJFQ1QpIHx8IElT\n" - "X0RBWChmaWxwLT5mX21hcHBpbmctDQo+ID4gPmhvc3QpOw0KPiA+IH0NCj4gPiANCj4gTm8gYXMg\n" - "ZmFyIGFzIHRoZSB1c2VyIGlzIGNvbmNlcm5lZCB3ZSBoYXZlIG5vdC4gVGhlIE9fRElSRUNUIHVz\n" - "ZXINCj4gaXMgc3RpbGwgZ2V0dGluZyBhbGwgdGhlIHNlbWFudGljcyBoZSB3YW50cywgLmkuZSBu\n" - "byBzeW5jcyBubw0KPiBtZW1vcnkgY2FjaGUgdXNhZ2UsIG5vIGNvcGllcyAuLi4NCj4gDQo+IE9u\n" - "bHkgd2l0aCBEQVggdGhlIGJ1ZmZlcmVkIElPIGlzIHRoZSBzYW1lIHNpbmNlIHdpdGggcG1lbSBp\n" - "dCBpcw0KPiBmYXN0ZXIuDQo+IFRoZW4gd2h5IG5vdD8gVGhlIGJhc2ljIGNvbnRyYWN0IHdpdGgg\n" - "dGhlIHVzZXIgZGlkIG5vdCBicmVhay4NCj4gDQo+IFRoZSBhYm92ZSB3YXMganVzdCBhbiBpbXBs\n" - "ZW1lbnRhdGlvbiBkZXRhaWwgdG8gZWFzaWx5IG5hdmlnYXRlDQo+IHRocm91Z2ggdGhlIExpbnV4\n" - "IHZmcyBJTyBzdGFjayBhbmQgbWFrZSB0aGUgbGVhc3QgYW1vdW50IG9mIGNoYW5nZXMNCj4gaW4g\n" - "ZXZlcnkgRlMgdGhhdCB3YW50ZWQgdG8gc3VwcG9ydCBEQVguKEFuZCBzaW5jZSBkYXhfZG9faW8g\n" - "aXMgbXVjaA0KPiBtb3JlIGxpa2UgZGlyZWN0X0lPIHRoZW4gbGlrZSBwYWdlLWNhY2hlIElPKQ0K\n" - "PiANCj4gPiANCj4gPiBZZXMgT19ESVJFQ1Qgb24gYSBEQVggbW91bnRlZCBmaWxlIHN5c3RlbSB3\n" - "aWxsIG5vdyBiZSBzbG93ZXIsIGJ1dCAtDQo+ID4gDQo+ID4gPiANCj4gPiA+IA0KPiA+ID4gPiAN\n" - "Cj4gPiA+ID4gDQo+ID4gPiA+IFRoaXMgYWxsb3dzIHVzIGEgcmVjb3ZlcnkgcGF0aCBpbiB0aGUg\n" - 
"Zm9ybSBvZiBvcGVuaW5nIHRoZSBmaWxlDQo+ID4gPiA+IHdpdGgNCj4gPiA+ID4gT19ESVJFQ1Qg\n" - "YW5kIHdyaXRpbmcgdG8gaXQgd2l0aCB0aGUgdXN1YWwgT19ESVJFQ1Qgc2VtYW50aWNzDQo+ID4g\n" - "PiA+IChzZWN0b3INCj4gPiA+ID4gYWxpZ25tZW50IHJlc3RyaWN0aW9ucykuDQo+ID4gPiA+IA0K\n" - "PiA+ID4gSSB1bmRlcnN0YW5kIHRoYXQgeW91IHdhbnQgYSBzZWN0b3IgYWxpZ25lZCBJTywgcmln\n" - "aHQ/IGZvciB0aGUNCj4gPiA+IGNsZWFyIG9mIGVycm9ycy4gQnV0IEkgaGF0ZSBpdCB0aGF0IHlv\n" - "dSBmb3JjZWQgYWxsIE9fRElSRUNUIElPDQo+ID4gPiB0byBiZSBzbG93IGZvciB0aGlzLg0KPiA+\n" - "ID4gQ2FuIHlvdSBub3QgbWFrZSBkYXhfZG9faW8gaGFuZGxlIG1lZGlhIGVycm9ycz8gQXQgbGVh\n" - "c3QgZm9yIHRoZQ0KPiA+ID4gcGFydHMgb2YgdGhlIElPIHRoYXQgYXJlIGFsaWduZWQuDQo+ID4g\n" - "PiAoQW5kIHlvdXIgcmVjb3ZlcnkgcGF0aCBhcHBsaWNhdGlvbiBhYm92ZSBjYW4gdXNlIG9ubHkg\n" - "YWxpZ25lZA0KPiA+ID4gwqBJTyB0byBtYWtlIHN1cmUpDQo+ID4gPiANCj4gPiA+IFBsZWFzZSBs\n" - "b29rIGZvciBhbm90aGVyIHNvbHV0aW9uLiBFdmVuIGEgc3BlY2lhbA0KPiA+ID4gSU9DVExfREFY\n" - "X0NMRUFSX0VSUk9SDQo+ID4gwqAtIHNlZSBhbGwgdGhlIHZlcnNpb25zIG9mIHRoaXMgc2VyaWVz\n" - "IHByaW9yIHRvIHRoaXMgb25lLCB3aGVyZSB3ZQ0KPiA+IHRyeQ0KPiA+IHRvIGRvIGEgZmFsbGJh\n" - "Y2suLi4NCj4gPiANCj4gQW5kPw0KPiANCj4gU28gbm93IGFsbCBPX0RJUkVDVCBBUFBzIGdvIDQg\n" - "dGltZXMgc2xvd2VyLiBJIHdpbGwgaGF2ZSBhIGxvb2sgYnV0IGlmDQo+IGl0IGlzIHJlYWxseSBz\n" - "byBiYWQgdGhhbiBwbGVhc2UgY29uc2lkZXIgYW4gSU9DVEwgb3Igc3lzY2FsbC4gT3IgYQ0KPiBz\n" - "cGVjaWFsDQo+IE9fREFYX0VSUk9SUyBmbGFnIC4uLg0KDQpJJ20gY3VyaW91cyB3aGVyZSB0aGUg\n" - "NHggc2xvd2VyIGNvbWVzIGZyb20uLiBUaGUgT19ESVJFQ1QgcGF0aCBpcyBzdGlsbA0Kd2l0aG91\n" - "dCBwYWdlLWNhY2hlIGNvcGllcywgYW5kIG5vciBkb2VzIGl0IGdvIHRocm91Z2ggcmVxdWVzdCBx\n" - "dWV1ZXMNCihzaW5jZSBwbWVtIGlzIGEgYmlvLWJhc2VkIGRyaXZlcikuIFRoZSBvbmx5IG92ZXJo\n" - "ZWFkIGlzIHRoYXQgb2YNCnN1Ym1pdHRpbmcgYSBiaW8gLSBhbmQgd2hpbGUgSSBhZ3JlZSBpdCBp\n" - "cyBtb3JlIG92ZXJoZWFkIHRoYW4gZGF4X2RvX2lvLA0KNHggc2VlbXMgYSBiaXQgaGlnaC4NCg0K\n" - "PiANCj4gUGxlYXNlIGRvIG5vdCB0cmFzaCBhbGwgdGhlIE9fRElSRUNUIHVzZXJzLCB0aGV5IGFy\n" - "ZSB0aGUgbW9yZQ0KPiBpbXBvcnRhbnQNCj4gY2xpZW50cywgbGlrZSBEQnMgYW5kIFZNcy4NCg0K\n" - 
"U2hvdWxkbid0IHRoZXkgYmUgdXNpbmcgbW1hcHMgYW5kIGRheCBmYXVsdHM/IEkgd2FzIHVuZGVy\n" - "IHRoZSBpbXByZXNzaW9uDQp0aGF0IHRoZSBkYXhfZG9faW8gcGF0aCBpcyBhIG5pY2UtdG8taGF2\n" - "ZSwgYnV0IGZvciBhbnlvbmUgdGhhdCB3aWxsIHdhbnQNCnRvIHVzZSBEQVgsIHRoZXkgd2lsbCB3\n" - "YW50IHRoZSBtbWFwL2ZhdWx0IHBhdGgsIG5vdCB0aGUgSU8gcGF0aC4gVGhpcyBpcw0KanVzdCBt\n" - "YWtpbmcgdGhlIElPIHBhdGggJ21vcmUgY29ycmVjdCcgYnkgYWxsb3dpbmcgaXQgYSB3YXkgdG8g\n" - "ZGVhbCB3aXRoDQplcnJvcnMuDQoNCj4gDQo+IFRoYW5rcw0KPiBCb2F6DQo+IA0KPiA+IA0KPiA+\n" - "ID4gDQo+ID4gPiANCj4gPiA+IFsqImxlc3MgY29uY3VycmVudCIgYmVjYXVzZSBvZiB0aGUgcXVl\n" - "dWluZyBkb25lIGluIGJkZXYuIE5vdGUgaG93DQo+ID4gPiDCoCBwbWVtIGlzIG5vdCBldmVuIG11\n" - "bHRpLXF1ZXVlLCBhbmQgZXZlbiBpZiBpdCB3YXMgaXQgd2lsbCBiZSBtdWNoDQo+ID4gPiDCoCBz\n" - "bG93ZXIgdGhlbiBEQVggYmVjYXVzZSBvZiB0aGUgY29kZSBkZXB0aCBhbmQgYWxsIHRoZSBsb2Nr\n" - "cyBhbmQNCj4gPiA+IHRhc2sNCj4gPiA+IMKgIHN3aXRjaGVzIGRvbmUgaW4gdGhlIGJsb2NrIGxh\n" - "eWVyLiBJbiBEQVggdGhlIGZpbmFsIG1lbWNweSBpcw0KPiA+ID4gZG9uZQ0KPiA+ID4gZGlyZWN0\n" - "bHkNCj4gPiA+IMKgIG9uIHRoZSB1c2VyLW1vZGUgdGhyZWFkXQ0KPiA+ID4gDQo+ID4gPiBUaGFu\n" - a3MNCj4gPiA+IEJvYXoNCj4gPiA+IA== + "On Mon, 2016-05-02 at 19:03 +0300, Boaz Harrosh wrote:\n" + "> On 05/02/2016 06:51 PM, Vishal Verma wrote:\n" + "> > \n" + "> > On Mon, 2016-05-02 at 18:41 +0300, Boaz Harrosh wrote:\n" + "> > > \n" + "> > > On 04/29/2016 12:16 AM, Vishal Verma wrote:\n" + "> > > > \n" + "> > > > \n" + "> > > > All IO in a dax filesystem used to go through dax_do_io, which\n" + "> > > > cannot\n" + "> > > > handle media errors, and thus cannot provide a recovery path\n" + "> > > > that\n" + "> > > > can\n" + "> > > > send a write through the driver to clear errors.\n" + "> > > > \n" + "> > > > Add a new iocb flag for DAX, and set it only for DAX mounts. 
In\n" + "> > > > the\n" + "> > > > IO\n" + "> > > > path for DAX filesystems, use the same direct_IO path for both\n" + "> > > > DAX\n" + "> > > > and\n" + "> > > > direct_io iocbs, but use the flags to identify when we are in\n" + "> > > > O_DIRECT\n" + "> > > > mode vs non O_DIRECT with DAX, and for O_DIRECT, use the\n" + "> > > > conventional\n" + "> > > > direct_IO path instead of DAX.\n" + "> > > > \n" + "> > > Really? What are your thinking here?\n" + "> > > \n" + "> > > What about all the current users of O_DIRECT, you have just made\n" + "> > > them\n" + "> > > 4 times slower and \"less concurrent*\" then \"buffred io\" users.\n" + "> > > Since\n" + "> > > direct_IO path will queue an IO request and all.\n" + "> > > (And if it is not so slow then why do we need dax_do_io at all?\n" + "> > > [Rhetorical])\n" + "> > > \n" + "> > > I hate it that you overload the semantics of a known and expected\n" + "> > > O_DIRECT flag, for special pmem quirks. This is an incompatible\n" + "> > > and unrelated overload of the semantics of O_DIRECT.\n" + "> > We overloaded O_DIRECT a long time ago when we made DAX piggyback on\n" + "> > the same path:\n" + "> > \n" + "> > static inline bool io_is_direct(struct file *filp)\n" + "> > {\n" + "> > \treturn (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping-\n" + "> > >host);\n" + "> > }\n" + "> > \n" + "> No as far as the user is concerned we have not. The O_DIRECT user\n" + "> is still getting all the semantics he wants, .i.e no syncs no\n" + "> memory cache usage, no copies ...\n" + "> \n" + "> Only with DAX the buffered IO is the same since with pmem it is\n" + "> faster.\n" + "> Then why not? 
The basic contract with the user did not break.\n" + "> \n" + "> The above was just an implementation detail to easily navigate\n" + "> through the Linux vfs IO stack and make the least amount of changes\n" + "> in every FS that wanted to support DAX.(And since dax_do_io is much\n" + "> more like direct_IO then like page-cache IO)\n" + "> \n" + "> > \n" + "> > Yes O_DIRECT on a DAX mounted file system will now be slower, but -\n" + "> > \n" + "> > > \n" + "> > > \n" + "> > > > \n" + "> > > > \n" + "> > > > This allows us a recovery path in the form of opening the file\n" + "> > > > with\n" + "> > > > O_DIRECT and writing to it with the usual O_DIRECT semantics\n" + "> > > > (sector\n" + "> > > > alignment restrictions).\n" + "> > > > \n" + "> > > I understand that you want a sector aligned IO, right? for the\n" + "> > > clear of errors. But I hate it that you forced all O_DIRECT IO\n" + "> > > to be slow for this.\n" + "> > > Can you not make dax_do_io handle media errors? At least for the\n" + "> > > parts of the IO that are aligned.\n" + "> > > (And your recovery path application above can use only aligned\n" + "> > > \302\240IO to make sure)\n" + "> > > \n" + "> > > Please look for another solution. Even a special\n" + "> > > IOCTL_DAX_CLEAR_ERROR\n" + "> > \302\240- see all the versions of this series prior to this one, where we\n" + "> > try\n" + "> > to do a fallback...\n" + "> > \n" + "> And?\n" + "> \n" + "> So now all O_DIRECT APPs go 4 times slower. I will have a look but if\n" + "> it is really so bad than please consider an IOCTL or syscall. Or a\n" + "> special\n" + "> O_DAX_ERRORS flag ...\n" + "\n" + "I'm curious where the 4x slower comes from.. The O_DIRECT path is still\n" + "without page-cache copies, and nor does it go through request queues\n" + "(since pmem is a bio-based driver). 
The only overhead is that of\n" + "submitting a bio - and while I agree it is more overhead than dax_do_io,\n" + "4x seems a bit high.\n" + "\n" + "> \n" + "> Please do not trash all the O_DIRECT users, they are the more\n" + "> important\n" + "> clients, like DBs and VMs.\n" + "\n" + "Shouldn't they be using mmaps and dax faults? I was under the impression\n" + "that the dax_do_io path is a nice-to-have, but for anyone that will want\n" + "to use DAX, they will want the mmap/fault path, not the IO path. This is\n" + "just making the IO path 'more correct' by allowing it a way to deal with\n" + "errors.\n" + "\n" + "> \n" + "> Thanks\n" + "> Boaz\n" + "> \n" + "> > \n" + "> > > \n" + "> > > \n" + "> > > [*\"less concurrent\" because of the queuing done in bdev. Note how\n" + "> > > \302\240 pmem is not even multi-queue, and even if it was it will be much\n" + "> > > \302\240 slower then DAX because of the code depth and all the locks and\n" + "> > > task\n" + "> > > \302\240 switches done in the block layer. In DAX the final memcpy is\n" + "> > > done\n" + "> > > directly\n" + "> > > \302\240 on the user-mode thread]\n" + "> > > \n" + "> > > Thanks\n" + "> > > Boaz\n" + > > > -24869abd3ea9a39bba870c7d85f8910222fd85059cf703e4e503c13cf44d32f9 +997f9483b00dd6b2c9a381c0bfe4c8edf4ae91f4824f8898c28cfc81c8db5a1d
diff --git a/a/1.txt b/N4/1.txt index c608940..529b226 100644 --- a/a/1.txt +++ b/N4/1.txt @@ -1,82 +1,132 @@ -T24gTW9uLCAyMDE2LTA1LTAyIGF0IDE5OjAzICswMzAwLCBCb2F6IEhhcnJvc2ggd3JvdGU6DQo+ -IE9uIDA1LzAyLzIwMTYgMDY6NTEgUE0sIFZpc2hhbCBWZXJtYSB3cm90ZToNCj4gPiANCj4gPiBP -biBNb24sIDIwMTYtMDUtMDIgYXQgMTg6NDEgKzAzMDAsIEJvYXogSGFycm9zaCB3cm90ZToNCj4g -PiA+IA0KPiA+ID4gT24gMDQvMjkvMjAxNiAxMjoxNiBBTSwgVmlzaGFsIFZlcm1hIHdyb3RlOg0K -PiA+ID4gPiANCj4gPiA+ID4gDQo+ID4gPiA+IEFsbCBJTyBpbiBhIGRheCBmaWxlc3lzdGVtIHVz -ZWQgdG8gZ28gdGhyb3VnaCBkYXhfZG9faW8sIHdoaWNoDQo+ID4gPiA+IGNhbm5vdA0KPiA+ID4g -PiBoYW5kbGUgbWVkaWEgZXJyb3JzLCBhbmQgdGh1cyBjYW5ub3QgcHJvdmlkZSBhIHJlY292ZXJ5 -IHBhdGgNCj4gPiA+ID4gdGhhdA0KPiA+ID4gPiBjYW4NCj4gPiA+ID4gc2VuZCBhIHdyaXRlIHRo -cm91Z2ggdGhlIGRyaXZlciB0byBjbGVhciBlcnJvcnMuDQo+ID4gPiA+IA0KPiA+ID4gPiBBZGQg -YSBuZXcgaW9jYiBmbGFnIGZvciBEQVgsIGFuZCBzZXQgaXQgb25seSBmb3IgREFYIG1vdW50cy4g -SW4NCj4gPiA+ID4gdGhlDQo+ID4gPiA+IElPDQo+ID4gPiA+IHBhdGggZm9yIERBWCBmaWxlc3lz -dGVtcywgdXNlIHRoZSBzYW1lIGRpcmVjdF9JTyBwYXRoIGZvciBib3RoDQo+ID4gPiA+IERBWA0K -PiA+ID4gPiBhbmQNCj4gPiA+ID4gZGlyZWN0X2lvIGlvY2JzLCBidXQgdXNlIHRoZSBmbGFncyB0 -byBpZGVudGlmeSB3aGVuIHdlIGFyZSBpbg0KPiA+ID4gPiBPX0RJUkVDVA0KPiA+ID4gPiBtb2Rl -IHZzIG5vbiBPX0RJUkVDVCB3aXRoIERBWCwgYW5kIGZvciBPX0RJUkVDVCwgdXNlIHRoZQ0KPiA+ -ID4gPiBjb252ZW50aW9uYWwNCj4gPiA+ID4gZGlyZWN0X0lPIHBhdGggaW5zdGVhZCBvZiBEQVgu -DQo+ID4gPiA+IA0KPiA+ID4gUmVhbGx5PyBXaGF0IGFyZSB5b3VyIHRoaW5raW5nIGhlcmU/DQo+ -ID4gPiANCj4gPiA+IFdoYXQgYWJvdXQgYWxsIHRoZSBjdXJyZW50IHVzZXJzIG9mIE9fRElSRUNU -LCB5b3UgaGF2ZSBqdXN0IG1hZGUNCj4gPiA+IHRoZW0NCj4gPiA+IDQgdGltZXMgc2xvd2VyIGFu -ZCAibGVzcyBjb25jdXJyZW50KiIgdGhlbiAiYnVmZnJlZCBpbyIgdXNlcnMuDQo+ID4gPiBTaW5j -ZQ0KPiA+ID4gZGlyZWN0X0lPIHBhdGggd2lsbCBxdWV1ZSBhbiBJTyByZXF1ZXN0IGFuZCBhbGwu -DQo+ID4gPiAoQW5kIGlmIGl0IGlzIG5vdCBzbyBzbG93IHRoZW4gd2h5IGRvIHdlIG5lZWQgZGF4 -X2RvX2lvIGF0IGFsbD8NCj4gPiA+IFtSaGV0b3JpY2FsXSkNCj4gPiA+IA0KPiA+ID4gSSBoYXRl -IGl0IHRoYXQgeW91IG92ZXJsb2FkIHRoZSBzZW1hbnRpY3Mgb2YgYSBrbm93biBhbmQgZXhwZWN0 
-ZWQNCj4gPiA+IE9fRElSRUNUIGZsYWcsIGZvciBzcGVjaWFsIHBtZW0gcXVpcmtzLiBUaGlzIGlz -IGFuIGluY29tcGF0aWJsZQ0KPiA+ID4gYW5kIHVucmVsYXRlZCBvdmVybG9hZCBvZiB0aGUgc2Vt -YW50aWNzIG9mIE9fRElSRUNULg0KPiA+IFdlIG92ZXJsb2FkZWQgT19ESVJFQ1QgYSBsb25nIHRp -bWUgYWdvIHdoZW4gd2UgbWFkZSBEQVggcGlnZ3liYWNrIG9uDQo+ID4gdGhlIHNhbWUgcGF0aDoN -Cj4gPiANCj4gPiBzdGF0aWMgaW5saW5lIGJvb2wgaW9faXNfZGlyZWN0KHN0cnVjdCBmaWxlICpm -aWxwKQ0KPiA+IHsNCj4gPiAJcmV0dXJuIChmaWxwLT5mX2ZsYWdzICYgT19ESVJFQ1QpIHx8IElT -X0RBWChmaWxwLT5mX21hcHBpbmctDQo+ID4gPmhvc3QpOw0KPiA+IH0NCj4gPiANCj4gTm8gYXMg -ZmFyIGFzIHRoZSB1c2VyIGlzIGNvbmNlcm5lZCB3ZSBoYXZlIG5vdC4gVGhlIE9fRElSRUNUIHVz -ZXINCj4gaXMgc3RpbGwgZ2V0dGluZyBhbGwgdGhlIHNlbWFudGljcyBoZSB3YW50cywgLmkuZSBu -byBzeW5jcyBubw0KPiBtZW1vcnkgY2FjaGUgdXNhZ2UsIG5vIGNvcGllcyAuLi4NCj4gDQo+IE9u -bHkgd2l0aCBEQVggdGhlIGJ1ZmZlcmVkIElPIGlzIHRoZSBzYW1lIHNpbmNlIHdpdGggcG1lbSBp -dCBpcw0KPiBmYXN0ZXIuDQo+IFRoZW4gd2h5IG5vdD8gVGhlIGJhc2ljIGNvbnRyYWN0IHdpdGgg -dGhlIHVzZXIgZGlkIG5vdCBicmVhay4NCj4gDQo+IFRoZSBhYm92ZSB3YXMganVzdCBhbiBpbXBs -ZW1lbnRhdGlvbiBkZXRhaWwgdG8gZWFzaWx5IG5hdmlnYXRlDQo+IHRocm91Z2ggdGhlIExpbnV4 -IHZmcyBJTyBzdGFjayBhbmQgbWFrZSB0aGUgbGVhc3QgYW1vdW50IG9mIGNoYW5nZXMNCj4gaW4g -ZXZlcnkgRlMgdGhhdCB3YW50ZWQgdG8gc3VwcG9ydCBEQVguKEFuZCBzaW5jZSBkYXhfZG9faW8g -aXMgbXVjaA0KPiBtb3JlIGxpa2UgZGlyZWN0X0lPIHRoZW4gbGlrZSBwYWdlLWNhY2hlIElPKQ0K -PiANCj4gPiANCj4gPiBZZXMgT19ESVJFQ1Qgb24gYSBEQVggbW91bnRlZCBmaWxlIHN5c3RlbSB3 -aWxsIG5vdyBiZSBzbG93ZXIsIGJ1dCAtDQo+ID4gDQo+ID4gPiANCj4gPiA+IA0KPiA+ID4gPiAN -Cj4gPiA+ID4gDQo+ID4gPiA+IFRoaXMgYWxsb3dzIHVzIGEgcmVjb3ZlcnkgcGF0aCBpbiB0aGUg -Zm9ybSBvZiBvcGVuaW5nIHRoZSBmaWxlDQo+ID4gPiA+IHdpdGgNCj4gPiA+ID4gT19ESVJFQ1Qg -YW5kIHdyaXRpbmcgdG8gaXQgd2l0aCB0aGUgdXN1YWwgT19ESVJFQ1Qgc2VtYW50aWNzDQo+ID4g -PiA+IChzZWN0b3INCj4gPiA+ID4gYWxpZ25tZW50IHJlc3RyaWN0aW9ucykuDQo+ID4gPiA+IA0K -PiA+ID4gSSB1bmRlcnN0YW5kIHRoYXQgeW91IHdhbnQgYSBzZWN0b3IgYWxpZ25lZCBJTywgcmln -aHQ/IGZvciB0aGUNCj4gPiA+IGNsZWFyIG9mIGVycm9ycy4gQnV0IEkgaGF0ZSBpdCB0aGF0IHlv 
-dSBmb3JjZWQgYWxsIE9fRElSRUNUIElPDQo+ID4gPiB0byBiZSBzbG93IGZvciB0aGlzLg0KPiA+ -ID4gQ2FuIHlvdSBub3QgbWFrZSBkYXhfZG9faW8gaGFuZGxlIG1lZGlhIGVycm9ycz8gQXQgbGVh -c3QgZm9yIHRoZQ0KPiA+ID4gcGFydHMgb2YgdGhlIElPIHRoYXQgYXJlIGFsaWduZWQuDQo+ID4g -PiAoQW5kIHlvdXIgcmVjb3ZlcnkgcGF0aCBhcHBsaWNhdGlvbiBhYm92ZSBjYW4gdXNlIG9ubHkg -YWxpZ25lZA0KPiA+ID4gwqBJTyB0byBtYWtlIHN1cmUpDQo+ID4gPiANCj4gPiA+IFBsZWFzZSBs -b29rIGZvciBhbm90aGVyIHNvbHV0aW9uLiBFdmVuIGEgc3BlY2lhbA0KPiA+ID4gSU9DVExfREFY -X0NMRUFSX0VSUk9SDQo+ID4gwqAtIHNlZSBhbGwgdGhlIHZlcnNpb25zIG9mIHRoaXMgc2VyaWVz -IHByaW9yIHRvIHRoaXMgb25lLCB3aGVyZSB3ZQ0KPiA+IHRyeQ0KPiA+IHRvIGRvIGEgZmFsbGJh -Y2suLi4NCj4gPiANCj4gQW5kPw0KPiANCj4gU28gbm93IGFsbCBPX0RJUkVDVCBBUFBzIGdvIDQg -dGltZXMgc2xvd2VyLiBJIHdpbGwgaGF2ZSBhIGxvb2sgYnV0IGlmDQo+IGl0IGlzIHJlYWxseSBz -byBiYWQgdGhhbiBwbGVhc2UgY29uc2lkZXIgYW4gSU9DVEwgb3Igc3lzY2FsbC4gT3IgYQ0KPiBz -cGVjaWFsDQo+IE9fREFYX0VSUk9SUyBmbGFnIC4uLg0KDQpJJ20gY3VyaW91cyB3aGVyZSB0aGUg -NHggc2xvd2VyIGNvbWVzIGZyb20uLiBUaGUgT19ESVJFQ1QgcGF0aCBpcyBzdGlsbA0Kd2l0aG91 -dCBwYWdlLWNhY2hlIGNvcGllcywgYW5kIG5vciBkb2VzIGl0IGdvIHRocm91Z2ggcmVxdWVzdCBx -dWV1ZXMNCihzaW5jZSBwbWVtIGlzIGEgYmlvLWJhc2VkIGRyaXZlcikuIFRoZSBvbmx5IG92ZXJo -ZWFkIGlzIHRoYXQgb2YNCnN1Ym1pdHRpbmcgYSBiaW8gLSBhbmQgd2hpbGUgSSBhZ3JlZSBpdCBp -cyBtb3JlIG92ZXJoZWFkIHRoYW4gZGF4X2RvX2lvLA0KNHggc2VlbXMgYSBiaXQgaGlnaC4NCg0K -PiANCj4gUGxlYXNlIGRvIG5vdCB0cmFzaCBhbGwgdGhlIE9fRElSRUNUIHVzZXJzLCB0aGV5IGFy -ZSB0aGUgbW9yZQ0KPiBpbXBvcnRhbnQNCj4gY2xpZW50cywgbGlrZSBEQnMgYW5kIFZNcy4NCg0K -U2hvdWxkbid0IHRoZXkgYmUgdXNpbmcgbW1hcHMgYW5kIGRheCBmYXVsdHM/IEkgd2FzIHVuZGVy -IHRoZSBpbXByZXNzaW9uDQp0aGF0IHRoZSBkYXhfZG9faW8gcGF0aCBpcyBhIG5pY2UtdG8taGF2 -ZSwgYnV0IGZvciBhbnlvbmUgdGhhdCB3aWxsIHdhbnQNCnRvIHVzZSBEQVgsIHRoZXkgd2lsbCB3 -YW50IHRoZSBtbWFwL2ZhdWx0IHBhdGgsIG5vdCB0aGUgSU8gcGF0aC4gVGhpcyBpcw0KanVzdCBt -YWtpbmcgdGhlIElPIHBhdGggJ21vcmUgY29ycmVjdCcgYnkgYWxsb3dpbmcgaXQgYSB3YXkgdG8g -ZGVhbCB3aXRoDQplcnJvcnMuDQoNCj4gDQo+IFRoYW5rcw0KPiBCb2F6DQo+IA0KPiA+IA0KPiA+ 
-ID4gDQo+ID4gPiANCj4gPiA+IFsqImxlc3MgY29uY3VycmVudCIgYmVjYXVzZSBvZiB0aGUgcXVl -dWluZyBkb25lIGluIGJkZXYuIE5vdGUgaG93DQo+ID4gPiDCoCBwbWVtIGlzIG5vdCBldmVuIG11 -bHRpLXF1ZXVlLCBhbmQgZXZlbiBpZiBpdCB3YXMgaXQgd2lsbCBiZSBtdWNoDQo+ID4gPiDCoCBz -bG93ZXIgdGhlbiBEQVggYmVjYXVzZSBvZiB0aGUgY29kZSBkZXB0aCBhbmQgYWxsIHRoZSBsb2Nr -cyBhbmQNCj4gPiA+IHRhc2sNCj4gPiA+IMKgIHN3aXRjaGVzIGRvbmUgaW4gdGhlIGJsb2NrIGxh -eWVyLiBJbiBEQVggdGhlIGZpbmFsIG1lbWNweSBpcw0KPiA+ID4gZG9uZQ0KPiA+ID4gZGlyZWN0 -bHkNCj4gPiA+IMKgIG9uIHRoZSB1c2VyLW1vZGUgdGhyZWFkXQ0KPiA+ID4gDQo+ID4gPiBUaGFu -a3MNCj4gPiA+IEJvYXoNCj4gPiA+IA== +On Mon, 2016-05-02 at 19:03 +0300, Boaz Harrosh wrote: +> On 05/02/2016 06:51 PM, Vishal Verma wrote: +> > +> > On Mon, 2016-05-02 at 18:41 +0300, Boaz Harrosh wrote: +> > > +> > > On 04/29/2016 12:16 AM, Vishal Verma wrote: +> > > > +> > > > +> > > > All IO in a dax filesystem used to go through dax_do_io, which +> > > > cannot +> > > > handle media errors, and thus cannot provide a recovery path +> > > > that +> > > > can +> > > > send a write through the driver to clear errors. +> > > > +> > > > Add a new iocb flag for DAX, and set it only for DAX mounts. In +> > > > the +> > > > IO +> > > > path for DAX filesystems, use the same direct_IO path for both +> > > > DAX +> > > > and +> > > > direct_io iocbs, but use the flags to identify when we are in +> > > > O_DIRECT +> > > > mode vs non O_DIRECT with DAX, and for O_DIRECT, use the +> > > > conventional +> > > > direct_IO path instead of DAX. +> > > > +> > > Really? What are your thinking here? +> > > +> > > What about all the current users of O_DIRECT, you have just made +> > > them +> > > 4 times slower and "less concurrent*" then "buffred io" users. +> > > Since +> > > direct_IO path will queue an IO request and all. +> > > (And if it is not so slow then why do we need dax_do_io at all? +> > > [Rhetorical]) +> > > +> > > I hate it that you overload the semantics of a known and expected +> > > O_DIRECT flag, for special pmem quirks. 
This is an incompatible +> > > and unrelated overload of the semantics of O_DIRECT. +> > We overloaded O_DIRECT a long time ago when we made DAX piggyback on +> > the same path: +> > +> > static inline bool io_is_direct(struct file *filp) +> > { +> > return (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping- +> > >host); +> > } +> > +> No as far as the user is concerned we have not. The O_DIRECT user +> is still getting all the semantics he wants, .i.e no syncs no +> memory cache usage, no copies ... +> +> Only with DAX the buffered IO is the same since with pmem it is +> faster. +> Then why not? The basic contract with the user did not break. +> +> The above was just an implementation detail to easily navigate +> through the Linux vfs IO stack and make the least amount of changes +> in every FS that wanted to support DAX.(And since dax_do_io is much +> more like direct_IO then like page-cache IO) +> +> > +> > Yes O_DIRECT on a DAX mounted file system will now be slower, but - +> > +> > > +> > > +> > > > +> > > > +> > > > This allows us a recovery path in the form of opening the file +> > > > with +> > > > O_DIRECT and writing to it with the usual O_DIRECT semantics +> > > > (sector +> > > > alignment restrictions). +> > > > +> > > I understand that you want a sector aligned IO, right? for the +> > > clear of errors. But I hate it that you forced all O_DIRECT IO +> > > to be slow for this. +> > > Can you not make dax_do_io handle media errors? At least for the +> > > parts of the IO that are aligned. +> > > (And your recovery path application above can use only aligned +> > > IO to make sure) +> > > +> > > Please look for another solution. Even a special +> > > IOCTL_DAX_CLEAR_ERROR +> > - see all the versions of this series prior to this one, where we +> > try +> > to do a fallback... +> > +> And? +> +> So now all O_DIRECT APPs go 4 times slower. I will have a look but if +> it is really so bad than please consider an IOCTL or syscall. 
Or a +> special +> O_DAX_ERRORS flag ... + +I'm curious where the 4x slower comes from.. The O_DIRECT path is still +without page-cache copies, and nor does it go through request queues +(since pmem is a bio-based driver). The only overhead is that of +submitting a bio - and while I agree it is more overhead than dax_do_io, +4x seems a bit high. + +> +> Please do not trash all the O_DIRECT users, they are the more +> important +> clients, like DBs and VMs. + +Shouldn't they be using mmaps and dax faults? I was under the impression +that the dax_do_io path is a nice-to-have, but for anyone that will want +to use DAX, they will want the mmap/fault path, not the IO path. This is +just making the IO path 'more correct' by allowing it a way to deal with +errors. + +> +> Thanks +> Boaz +> +> > +> > > +> > > +> > > [*"less concurrent" because of the queuing done in bdev. Note how +> > > pmem is not even multi-queue, and even if it was it will be much +> > > slower then DAX because of the code depth and all the locks and +> > > task +> > > switches done in the block layer. 
In DAX the final memcpy is +> > > done +> > > directly +> > > on the user-mode thread] +> > > +> > > Thanks +> > > Boaz +> > > diff --git a/a/content_digest b/N4/content_digest index 6fb5f8a..c23c46e 100644 --- a/a/content_digest +++ b/N4/content_digest @@ -6,7 +6,7 @@ "From\0Verma, Vishal L <vishal.l.verma@intel.com>\0" "Subject\0Re: [PATCH v4 5/7] fs: prioritize and separate direct_io from dax_io\0" "Date\0Mon, 2 May 2016 18:52:02 +0000\0" - "To\0linux-nvdimm@lists.01.org <linux-nvdimm@lists.01.org>" + "To\0linux-nvdimm@lists.01.org <linux-nvdimm@ml01.01.org>" " boaz@plexistor.com <boaz@plexistor.com>\0" "Cc\0linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>" linux-block@vger.kernel.org <linux-block@vger.kernel.org> @@ -20,90 +20,140 @@ linux-ext4@vger.kernel.org <linux-ext4@vger.kernel.org> david@fromorbit.com <david@fromorbit.com> jack@suse.cz <jack@suse.cz> - " matthew@wil.cx <matthew@wil.cx>\0" + " matthew@wil.cx <matthew@freeurl.abc188.com>\0" "\00:1\0" "b\0" - "T24gTW9uLCAyMDE2LTA1LTAyIGF0IDE5OjAzICswMzAwLCBCb2F6IEhhcnJvc2ggd3JvdGU6DQo+\n" - "IE9uIDA1LzAyLzIwMTYgMDY6NTEgUE0sIFZpc2hhbCBWZXJtYSB3cm90ZToNCj4gPiANCj4gPiBP\n" - "biBNb24sIDIwMTYtMDUtMDIgYXQgMTg6NDEgKzAzMDAsIEJvYXogSGFycm9zaCB3cm90ZToNCj4g\n" - "PiA+IA0KPiA+ID4gT24gMDQvMjkvMjAxNiAxMjoxNiBBTSwgVmlzaGFsIFZlcm1hIHdyb3RlOg0K\n" - "PiA+ID4gPiANCj4gPiA+ID4gDQo+ID4gPiA+IEFsbCBJTyBpbiBhIGRheCBmaWxlc3lzdGVtIHVz\n" - "ZWQgdG8gZ28gdGhyb3VnaCBkYXhfZG9faW8sIHdoaWNoDQo+ID4gPiA+IGNhbm5vdA0KPiA+ID4g\n" - "PiBoYW5kbGUgbWVkaWEgZXJyb3JzLCBhbmQgdGh1cyBjYW5ub3QgcHJvdmlkZSBhIHJlY292ZXJ5\n" - "IHBhdGgNCj4gPiA+ID4gdGhhdA0KPiA+ID4gPiBjYW4NCj4gPiA+ID4gc2VuZCBhIHdyaXRlIHRo\n" - "cm91Z2ggdGhlIGRyaXZlciB0byBjbGVhciBlcnJvcnMuDQo+ID4gPiA+IA0KPiA+ID4gPiBBZGQg\n" - "YSBuZXcgaW9jYiBmbGFnIGZvciBEQVgsIGFuZCBzZXQgaXQgb25seSBmb3IgREFYIG1vdW50cy4g\n" - "SW4NCj4gPiA+ID4gdGhlDQo+ID4gPiA+IElPDQo+ID4gPiA+IHBhdGggZm9yIERBWCBmaWxlc3lz\n" - "dGVtcywgdXNlIHRoZSBzYW1lIGRpcmVjdF9JTyBwYXRoIGZvciBib3RoDQo+ID4gPiA+IERBWA0K\n" - 
"PiA+ID4gPiBhbmQNCj4gPiA+ID4gZGlyZWN0X2lvIGlvY2JzLCBidXQgdXNlIHRoZSBmbGFncyB0\n" - "byBpZGVudGlmeSB3aGVuIHdlIGFyZSBpbg0KPiA+ID4gPiBPX0RJUkVDVA0KPiA+ID4gPiBtb2Rl\n" - "IHZzIG5vbiBPX0RJUkVDVCB3aXRoIERBWCwgYW5kIGZvciBPX0RJUkVDVCwgdXNlIHRoZQ0KPiA+\n" - "ID4gPiBjb252ZW50aW9uYWwNCj4gPiA+ID4gZGlyZWN0X0lPIHBhdGggaW5zdGVhZCBvZiBEQVgu\n" - "DQo+ID4gPiA+IA0KPiA+ID4gUmVhbGx5PyBXaGF0IGFyZSB5b3VyIHRoaW5raW5nIGhlcmU/DQo+\n" - "ID4gPiANCj4gPiA+IFdoYXQgYWJvdXQgYWxsIHRoZSBjdXJyZW50IHVzZXJzIG9mIE9fRElSRUNU\n" - "LCB5b3UgaGF2ZSBqdXN0IG1hZGUNCj4gPiA+IHRoZW0NCj4gPiA+IDQgdGltZXMgc2xvd2VyIGFu\n" - "ZCAibGVzcyBjb25jdXJyZW50KiIgdGhlbiAiYnVmZnJlZCBpbyIgdXNlcnMuDQo+ID4gPiBTaW5j\n" - "ZQ0KPiA+ID4gZGlyZWN0X0lPIHBhdGggd2lsbCBxdWV1ZSBhbiBJTyByZXF1ZXN0IGFuZCBhbGwu\n" - "DQo+ID4gPiAoQW5kIGlmIGl0IGlzIG5vdCBzbyBzbG93IHRoZW4gd2h5IGRvIHdlIG5lZWQgZGF4\n" - "X2RvX2lvIGF0IGFsbD8NCj4gPiA+IFtSaGV0b3JpY2FsXSkNCj4gPiA+IA0KPiA+ID4gSSBoYXRl\n" - "IGl0IHRoYXQgeW91IG92ZXJsb2FkIHRoZSBzZW1hbnRpY3Mgb2YgYSBrbm93biBhbmQgZXhwZWN0\n" - "ZWQNCj4gPiA+IE9fRElSRUNUIGZsYWcsIGZvciBzcGVjaWFsIHBtZW0gcXVpcmtzLiBUaGlzIGlz\n" - "IGFuIGluY29tcGF0aWJsZQ0KPiA+ID4gYW5kIHVucmVsYXRlZCBvdmVybG9hZCBvZiB0aGUgc2Vt\n" - "YW50aWNzIG9mIE9fRElSRUNULg0KPiA+IFdlIG92ZXJsb2FkZWQgT19ESVJFQ1QgYSBsb25nIHRp\n" - "bWUgYWdvIHdoZW4gd2UgbWFkZSBEQVggcGlnZ3liYWNrIG9uDQo+ID4gdGhlIHNhbWUgcGF0aDoN\n" - "Cj4gPiANCj4gPiBzdGF0aWMgaW5saW5lIGJvb2wgaW9faXNfZGlyZWN0KHN0cnVjdCBmaWxlICpm\n" - "aWxwKQ0KPiA+IHsNCj4gPiAJcmV0dXJuIChmaWxwLT5mX2ZsYWdzICYgT19ESVJFQ1QpIHx8IElT\n" - "X0RBWChmaWxwLT5mX21hcHBpbmctDQo+ID4gPmhvc3QpOw0KPiA+IH0NCj4gPiANCj4gTm8gYXMg\n" - "ZmFyIGFzIHRoZSB1c2VyIGlzIGNvbmNlcm5lZCB3ZSBoYXZlIG5vdC4gVGhlIE9fRElSRUNUIHVz\n" - "ZXINCj4gaXMgc3RpbGwgZ2V0dGluZyBhbGwgdGhlIHNlbWFudGljcyBoZSB3YW50cywgLmkuZSBu\n" - "byBzeW5jcyBubw0KPiBtZW1vcnkgY2FjaGUgdXNhZ2UsIG5vIGNvcGllcyAuLi4NCj4gDQo+IE9u\n" - "bHkgd2l0aCBEQVggdGhlIGJ1ZmZlcmVkIElPIGlzIHRoZSBzYW1lIHNpbmNlIHdpdGggcG1lbSBp\n" - "dCBpcw0KPiBmYXN0ZXIuDQo+IFRoZW4gd2h5IG5vdD8gVGhlIGJhc2ljIGNvbnRyYWN0IHdpdGgg\n" - 
"dGhlIHVzZXIgZGlkIG5vdCBicmVhay4NCj4gDQo+IFRoZSBhYm92ZSB3YXMganVzdCBhbiBpbXBs\n" - "ZW1lbnRhdGlvbiBkZXRhaWwgdG8gZWFzaWx5IG5hdmlnYXRlDQo+IHRocm91Z2ggdGhlIExpbnV4\n" - "IHZmcyBJTyBzdGFjayBhbmQgbWFrZSB0aGUgbGVhc3QgYW1vdW50IG9mIGNoYW5nZXMNCj4gaW4g\n" - "ZXZlcnkgRlMgdGhhdCB3YW50ZWQgdG8gc3VwcG9ydCBEQVguKEFuZCBzaW5jZSBkYXhfZG9faW8g\n" - "aXMgbXVjaA0KPiBtb3JlIGxpa2UgZGlyZWN0X0lPIHRoZW4gbGlrZSBwYWdlLWNhY2hlIElPKQ0K\n" - "PiANCj4gPiANCj4gPiBZZXMgT19ESVJFQ1Qgb24gYSBEQVggbW91bnRlZCBmaWxlIHN5c3RlbSB3\n" - "aWxsIG5vdyBiZSBzbG93ZXIsIGJ1dCAtDQo+ID4gDQo+ID4gPiANCj4gPiA+IA0KPiA+ID4gPiAN\n" - "Cj4gPiA+ID4gDQo+ID4gPiA+IFRoaXMgYWxsb3dzIHVzIGEgcmVjb3ZlcnkgcGF0aCBpbiB0aGUg\n" - "Zm9ybSBvZiBvcGVuaW5nIHRoZSBmaWxlDQo+ID4gPiA+IHdpdGgNCj4gPiA+ID4gT19ESVJFQ1Qg\n" - "YW5kIHdyaXRpbmcgdG8gaXQgd2l0aCB0aGUgdXN1YWwgT19ESVJFQ1Qgc2VtYW50aWNzDQo+ID4g\n" - "PiA+IChzZWN0b3INCj4gPiA+ID4gYWxpZ25tZW50IHJlc3RyaWN0aW9ucykuDQo+ID4gPiA+IA0K\n" - "PiA+ID4gSSB1bmRlcnN0YW5kIHRoYXQgeW91IHdhbnQgYSBzZWN0b3IgYWxpZ25lZCBJTywgcmln\n" - "aHQ/IGZvciB0aGUNCj4gPiA+IGNsZWFyIG9mIGVycm9ycy4gQnV0IEkgaGF0ZSBpdCB0aGF0IHlv\n" - "dSBmb3JjZWQgYWxsIE9fRElSRUNUIElPDQo+ID4gPiB0byBiZSBzbG93IGZvciB0aGlzLg0KPiA+\n" - "ID4gQ2FuIHlvdSBub3QgbWFrZSBkYXhfZG9faW8gaGFuZGxlIG1lZGlhIGVycm9ycz8gQXQgbGVh\n" - "c3QgZm9yIHRoZQ0KPiA+ID4gcGFydHMgb2YgdGhlIElPIHRoYXQgYXJlIGFsaWduZWQuDQo+ID4g\n" - "PiAoQW5kIHlvdXIgcmVjb3ZlcnkgcGF0aCBhcHBsaWNhdGlvbiBhYm92ZSBjYW4gdXNlIG9ubHkg\n" - "YWxpZ25lZA0KPiA+ID4gwqBJTyB0byBtYWtlIHN1cmUpDQo+ID4gPiANCj4gPiA+IFBsZWFzZSBs\n" - "b29rIGZvciBhbm90aGVyIHNvbHV0aW9uLiBFdmVuIGEgc3BlY2lhbA0KPiA+ID4gSU9DVExfREFY\n" - "X0NMRUFSX0VSUk9SDQo+ID4gwqAtIHNlZSBhbGwgdGhlIHZlcnNpb25zIG9mIHRoaXMgc2VyaWVz\n" - "IHByaW9yIHRvIHRoaXMgb25lLCB3aGVyZSB3ZQ0KPiA+IHRyeQ0KPiA+IHRvIGRvIGEgZmFsbGJh\n" - "Y2suLi4NCj4gPiANCj4gQW5kPw0KPiANCj4gU28gbm93IGFsbCBPX0RJUkVDVCBBUFBzIGdvIDQg\n" - "dGltZXMgc2xvd2VyLiBJIHdpbGwgaGF2ZSBhIGxvb2sgYnV0IGlmDQo+IGl0IGlzIHJlYWxseSBz\n" - "byBiYWQgdGhhbiBwbGVhc2UgY29uc2lkZXIgYW4gSU9DVEwgb3Igc3lzY2FsbC4gT3IgYQ0KPiBz\n" - 
"cGVjaWFsDQo+IE9fREFYX0VSUk9SUyBmbGFnIC4uLg0KDQpJJ20gY3VyaW91cyB3aGVyZSB0aGUg\n" - "NHggc2xvd2VyIGNvbWVzIGZyb20uLiBUaGUgT19ESVJFQ1QgcGF0aCBpcyBzdGlsbA0Kd2l0aG91\n" - "dCBwYWdlLWNhY2hlIGNvcGllcywgYW5kIG5vciBkb2VzIGl0IGdvIHRocm91Z2ggcmVxdWVzdCBx\n" - "dWV1ZXMNCihzaW5jZSBwbWVtIGlzIGEgYmlvLWJhc2VkIGRyaXZlcikuIFRoZSBvbmx5IG92ZXJo\n" - "ZWFkIGlzIHRoYXQgb2YNCnN1Ym1pdHRpbmcgYSBiaW8gLSBhbmQgd2hpbGUgSSBhZ3JlZSBpdCBp\n" - "cyBtb3JlIG92ZXJoZWFkIHRoYW4gZGF4X2RvX2lvLA0KNHggc2VlbXMgYSBiaXQgaGlnaC4NCg0K\n" - "PiANCj4gUGxlYXNlIGRvIG5vdCB0cmFzaCBhbGwgdGhlIE9fRElSRUNUIHVzZXJzLCB0aGV5IGFy\n" - "ZSB0aGUgbW9yZQ0KPiBpbXBvcnRhbnQNCj4gY2xpZW50cywgbGlrZSBEQnMgYW5kIFZNcy4NCg0K\n" - "U2hvdWxkbid0IHRoZXkgYmUgdXNpbmcgbW1hcHMgYW5kIGRheCBmYXVsdHM/IEkgd2FzIHVuZGVy\n" - "IHRoZSBpbXByZXNzaW9uDQp0aGF0IHRoZSBkYXhfZG9faW8gcGF0aCBpcyBhIG5pY2UtdG8taGF2\n" - "ZSwgYnV0IGZvciBhbnlvbmUgdGhhdCB3aWxsIHdhbnQNCnRvIHVzZSBEQVgsIHRoZXkgd2lsbCB3\n" - "YW50IHRoZSBtbWFwL2ZhdWx0IHBhdGgsIG5vdCB0aGUgSU8gcGF0aC4gVGhpcyBpcw0KanVzdCBt\n" - "YWtpbmcgdGhlIElPIHBhdGggJ21vcmUgY29ycmVjdCcgYnkgYWxsb3dpbmcgaXQgYSB3YXkgdG8g\n" - "ZGVhbCB3aXRoDQplcnJvcnMuDQoNCj4gDQo+IFRoYW5rcw0KPiBCb2F6DQo+IA0KPiA+IA0KPiA+\n" - "ID4gDQo+ID4gPiANCj4gPiA+IFsqImxlc3MgY29uY3VycmVudCIgYmVjYXVzZSBvZiB0aGUgcXVl\n" - "dWluZyBkb25lIGluIGJkZXYuIE5vdGUgaG93DQo+ID4gPiDCoCBwbWVtIGlzIG5vdCBldmVuIG11\n" - "bHRpLXF1ZXVlLCBhbmQgZXZlbiBpZiBpdCB3YXMgaXQgd2lsbCBiZSBtdWNoDQo+ID4gPiDCoCBz\n" - "bG93ZXIgdGhlbiBEQVggYmVjYXVzZSBvZiB0aGUgY29kZSBkZXB0aCBhbmQgYWxsIHRoZSBsb2Nr\n" - "cyBhbmQNCj4gPiA+IHRhc2sNCj4gPiA+IMKgIHN3aXRjaGVzIGRvbmUgaW4gdGhlIGJsb2NrIGxh\n" - "eWVyLiBJbiBEQVggdGhlIGZpbmFsIG1lbWNweSBpcw0KPiA+ID4gZG9uZQ0KPiA+ID4gZGlyZWN0\n" - "bHkNCj4gPiA+IMKgIG9uIHRoZSB1c2VyLW1vZGUgdGhyZWFkXQ0KPiA+ID4gDQo+ID4gPiBUaGFu\n" - a3MNCj4gPiA+IEJvYXoNCj4gPiA+IA== + "On Mon, 2016-05-02 at 19:03 +0300, Boaz Harrosh wrote:\n" + "> On 05/02/2016 06:51 PM, Vishal Verma wrote:\n" + "> > \n" + "> > On Mon, 2016-05-02 at 18:41 +0300, Boaz Harrosh wrote:\n" + "> > > \n" + "> > > On 
04/29/2016 12:16 AM, Vishal Verma wrote:\n" + "> > > > \n" + "> > > > \n" + "> > > > All IO in a dax filesystem used to go through dax_do_io, which\n" + "> > > > cannot\n" + "> > > > handle media errors, and thus cannot provide a recovery path\n" + "> > > > that\n" + "> > > > can\n" + "> > > > send a write through the driver to clear errors.\n" + "> > > > \n" + "> > > > Add a new iocb flag for DAX, and set it only for DAX mounts. In\n" + "> > > > the\n" + "> > > > IO\n" + "> > > > path for DAX filesystems, use the same direct_IO path for both\n" + "> > > > DAX\n" + "> > > > and\n" + "> > > > direct_io iocbs, but use the flags to identify when we are in\n" + "> > > > O_DIRECT\n" + "> > > > mode vs non O_DIRECT with DAX, and for O_DIRECT, use the\n" + "> > > > conventional\n" + "> > > > direct_IO path instead of DAX.\n" + "> > > > \n" + "> > > Really? What are your thinking here?\n" + "> > > \n" + "> > > What about all the current users of O_DIRECT, you have just made\n" + "> > > them\n" + "> > > 4 times slower and \"less concurrent*\" then \"buffred io\" users.\n" + "> > > Since\n" + "> > > direct_IO path will queue an IO request and all.\n" + "> > > (And if it is not so slow then why do we need dax_do_io at all?\n" + "> > > [Rhetorical])\n" + "> > > \n" + "> > > I hate it that you overload the semantics of a known and expected\n" + "> > > O_DIRECT flag, for special pmem quirks. This is an incompatible\n" + "> > > and unrelated overload of the semantics of O_DIRECT.\n" + "> > We overloaded O_DIRECT a long time ago when we made DAX piggyback on\n" + "> > the same path:\n" + "> > \n" + "> > static inline bool io_is_direct(struct file *filp)\n" + "> > {\n" + "> > \treturn (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping-\n" + "> > >host);\n" + "> > }\n" + "> > \n" + "> No as far as the user is concerned we have not. 
The O_DIRECT user\n" + "> is still getting all the semantics he wants, .i.e no syncs no\n" + "> memory cache usage, no copies ...\n" + "> \n" + "> Only with DAX the buffered IO is the same since with pmem it is\n" + "> faster.\n" + "> Then why not? The basic contract with the user did not break.\n" + "> \n" + "> The above was just an implementation detail to easily navigate\n" + "> through the Linux vfs IO stack and make the least amount of changes\n" + "> in every FS that wanted to support DAX.(And since dax_do_io is much\n" + "> more like direct_IO then like page-cache IO)\n" + "> \n" + "> > \n" + "> > Yes O_DIRECT on a DAX mounted file system will now be slower, but -\n" + "> > \n" + "> > > \n" + "> > > \n" + "> > > > \n" + "> > > > \n" + "> > > > This allows us a recovery path in the form of opening the file\n" + "> > > > with\n" + "> > > > O_DIRECT and writing to it with the usual O_DIRECT semantics\n" + "> > > > (sector\n" + "> > > > alignment restrictions).\n" + "> > > > \n" + "> > > I understand that you want a sector aligned IO, right? for the\n" + "> > > clear of errors. But I hate it that you forced all O_DIRECT IO\n" + "> > > to be slow for this.\n" + "> > > Can you not make dax_do_io handle media errors? At least for the\n" + "> > > parts of the IO that are aligned.\n" + "> > > (And your recovery path application above can use only aligned\n" + "> > > \302\240IO to make sure)\n" + "> > > \n" + "> > > Please look for another solution. Even a special\n" + "> > > IOCTL_DAX_CLEAR_ERROR\n" + "> > \302\240- see all the versions of this series prior to this one, where we\n" + "> > try\n" + "> > to do a fallback...\n" + "> > \n" + "> And?\n" + "> \n" + "> So now all O_DIRECT APPs go 4 times slower. I will have a look but if\n" + "> it is really so bad than please consider an IOCTL or syscall. Or a\n" + "> special\n" + "> O_DAX_ERRORS flag ...\n" + "\n" + "I'm curious where the 4x slower comes from.. 
The O_DIRECT path is still\n" + "without page-cache copies, and nor does it go through request queues\n" + "(since pmem is a bio-based driver). The only overhead is that of\n" + "submitting a bio - and while I agree it is more overhead than dax_do_io,\n" + "4x seems a bit high.\n" + "\n" + "> \n" + "> Please do not trash all the O_DIRECT users, they are the more\n" + "> important\n" + "> clients, like DBs and VMs.\n" + "\n" + "Shouldn't they be using mmaps and dax faults? I was under the impression\n" + "that the dax_do_io path is a nice-to-have, but for anyone that will want\n" + "to use DAX, they will want the mmap/fault path, not the IO path. This is\n" + "just making the IO path 'more correct' by allowing it a way to deal with\n" + "errors.\n" + "\n" + "> \n" + "> Thanks\n" + "> Boaz\n" + "> \n" + "> > \n" + "> > > \n" + "> > > \n" + "> > > [*\"less concurrent\" because of the queuing done in bdev. Note how\n" + "> > > \302\240 pmem is not even multi-queue, and even if it was it will be much\n" + "> > > \302\240 slower then DAX because of the code depth and all the locks and\n" + "> > > task\n" + "> > > \302\240 switches done in the block layer. In DAX the final memcpy is\n" + "> > > done\n" + "> > > directly\n" + "> > > \302\240 on the user-mode thread]\n" + "> > > \n" + "> > > Thanks\n" + "> > > Boaz\n" + > > > -24869abd3ea9a39bba870c7d85f8910222fd85059cf703e4e503c13cf44d32f9 +1fe66ca5bc13c160471810c6a5d3ff5be76e226e2e363817077c46ed09919a2f
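The recovery path described in the quoted patch text (open the file with O_DIRECT and write with sector alignment) can be sketched from user space roughly as follows. This is a minimal sketch, not code from the series: `sector_floor` and `rewrite_sector` are hypothetical helper names, the caller is assumed to already know the bad offset and the device's sector size, and whether the write actually clears a media error depends on the kernel behaving as the patch proposes.

```c
#define _GNU_SOURCE          /* exposes O_DIRECT in <fcntl.h> on Linux */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Round a byte offset down to the start of the sector that contains it. */
off_t sector_floor(off_t offset, size_t sector_size)
{
	return offset - (offset % (off_t)sector_size);
}

/*
 * Hypothetical helper: overwrite the whole sector containing bad_offset
 * with sector_size bytes taken from data, using an O_DIRECT write.
 * Returns 0 on success, -1 on any failure.
 */
int rewrite_sector(const char *path, off_t bad_offset,
		   size_t sector_size, const void *data)
{
	void *buf = NULL;
	int fd;
	int ret = -1;

	/* O_DIRECT requires the user buffer itself to be aligned. */
	if (posix_memalign(&buf, sector_size, sector_size) != 0)
		return -1;
	memcpy(buf, data, sector_size);

	fd = open(path, O_WRONLY | O_DIRECT);
	if (fd >= 0) {
		/* Offset and length are both multiples of the sector size. */
		ssize_t n = pwrite(fd, buf, sector_size,
				   sector_floor(bad_offset, sector_size));

		if (n == (ssize_t)sector_size && fsync(fd) == 0)
			ret = 0;
		close(fd);
	}

	free(buf);
	return ret;
}
```

A caller would pass a replacement sector's worth of data, for example `rewrite_sector(path, bad_offset, 512, replacement)`; the 512-byte sector size here is an assumption and should be queried from the device in real code.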