diff for duplicates of <1462215110.1421.43.camel@intel.com>

diff --git a/a/1.txt b/N1/1.txt
index c608940..105a3b6 100644
--- a/a/1.txt
+++ b/N1/1.txt
@@ -1,82 +1,136 @@
-T24gTW9uLCAyMDE2LTA1LTAyIGF0IDE5OjAzICswMzAwLCBCb2F6IEhhcnJvc2ggd3JvdGU6DQo+
-IE9uIDA1LzAyLzIwMTYgMDY6NTEgUE0sIFZpc2hhbCBWZXJtYSB3cm90ZToNCj4gPiANCj4gPiBP
-biBNb24sIDIwMTYtMDUtMDIgYXQgMTg6NDEgKzAzMDAsIEJvYXogSGFycm9zaCB3cm90ZToNCj4g
-PiA+IA0KPiA+ID4gT24gMDQvMjkvMjAxNiAxMjoxNiBBTSwgVmlzaGFsIFZlcm1hIHdyb3RlOg0K
-PiA+ID4gPiANCj4gPiA+ID4gDQo+ID4gPiA+IEFsbCBJTyBpbiBhIGRheCBmaWxlc3lzdGVtIHVz
-ZWQgdG8gZ28gdGhyb3VnaCBkYXhfZG9faW8sIHdoaWNoDQo+ID4gPiA+IGNhbm5vdA0KPiA+ID4g
-PiBoYW5kbGUgbWVkaWEgZXJyb3JzLCBhbmQgdGh1cyBjYW5ub3QgcHJvdmlkZSBhIHJlY292ZXJ5
-IHBhdGgNCj4gPiA+ID4gdGhhdA0KPiA+ID4gPiBjYW4NCj4gPiA+ID4gc2VuZCBhIHdyaXRlIHRo
-cm91Z2ggdGhlIGRyaXZlciB0byBjbGVhciBlcnJvcnMuDQo+ID4gPiA+IA0KPiA+ID4gPiBBZGQg
-YSBuZXcgaW9jYiBmbGFnIGZvciBEQVgsIGFuZCBzZXQgaXQgb25seSBmb3IgREFYIG1vdW50cy4g
-SW4NCj4gPiA+ID4gdGhlDQo+ID4gPiA+IElPDQo+ID4gPiA+IHBhdGggZm9yIERBWCBmaWxlc3lz
-dGVtcywgdXNlIHRoZSBzYW1lIGRpcmVjdF9JTyBwYXRoIGZvciBib3RoDQo+ID4gPiA+IERBWA0K
-PiA+ID4gPiBhbmQNCj4gPiA+ID4gZGlyZWN0X2lvIGlvY2JzLCBidXQgdXNlIHRoZSBmbGFncyB0
-byBpZGVudGlmeSB3aGVuIHdlIGFyZSBpbg0KPiA+ID4gPiBPX0RJUkVDVA0KPiA+ID4gPiBtb2Rl
-IHZzIG5vbiBPX0RJUkVDVCB3aXRoIERBWCwgYW5kIGZvciBPX0RJUkVDVCwgdXNlIHRoZQ0KPiA+
-ID4gPiBjb252ZW50aW9uYWwNCj4gPiA+ID4gZGlyZWN0X0lPIHBhdGggaW5zdGVhZCBvZiBEQVgu
-DQo+ID4gPiA+IA0KPiA+ID4gUmVhbGx5PyBXaGF0IGFyZSB5b3VyIHRoaW5raW5nIGhlcmU/DQo+
-ID4gPiANCj4gPiA+IFdoYXQgYWJvdXQgYWxsIHRoZSBjdXJyZW50IHVzZXJzIG9mIE9fRElSRUNU
-LCB5b3UgaGF2ZSBqdXN0IG1hZGUNCj4gPiA+IHRoZW0NCj4gPiA+IDQgdGltZXMgc2xvd2VyIGFu
-ZCAibGVzcyBjb25jdXJyZW50KiIgdGhlbiAiYnVmZnJlZCBpbyIgdXNlcnMuDQo+ID4gPiBTaW5j
-ZQ0KPiA+ID4gZGlyZWN0X0lPIHBhdGggd2lsbCBxdWV1ZSBhbiBJTyByZXF1ZXN0IGFuZCBhbGwu
-DQo+ID4gPiAoQW5kIGlmIGl0IGlzIG5vdCBzbyBzbG93IHRoZW4gd2h5IGRvIHdlIG5lZWQgZGF4
-X2RvX2lvIGF0IGFsbD8NCj4gPiA+IFtSaGV0b3JpY2FsXSkNCj4gPiA+IA0KPiA+ID4gSSBoYXRl
-IGl0IHRoYXQgeW91IG92ZXJsb2FkIHRoZSBzZW1hbnRpY3Mgb2YgYSBrbm93biBhbmQgZXhwZWN0
-ZWQNCj4gPiA+IE9fRElSRUNUIGZsYWcsIGZvciBzcGVjaWFsIHBtZW0gcXVpcmtzLiBUaGlzIGlz
-IGFuIGluY29tcGF0aWJsZQ0KPiA+ID4gYW5kIHVucmVsYXRlZCBvdmVybG9hZCBvZiB0aGUgc2Vt
-YW50aWNzIG9mIE9fRElSRUNULg0KPiA+IFdlIG92ZXJsb2FkZWQgT19ESVJFQ1QgYSBsb25nIHRp
-bWUgYWdvIHdoZW4gd2UgbWFkZSBEQVggcGlnZ3liYWNrIG9uDQo+ID4gdGhlIHNhbWUgcGF0aDoN
-Cj4gPiANCj4gPiBzdGF0aWMgaW5saW5lIGJvb2wgaW9faXNfZGlyZWN0KHN0cnVjdCBmaWxlICpm
-aWxwKQ0KPiA+IHsNCj4gPiAJcmV0dXJuIChmaWxwLT5mX2ZsYWdzICYgT19ESVJFQ1QpIHx8IElT
-X0RBWChmaWxwLT5mX21hcHBpbmctDQo+ID4gPmhvc3QpOw0KPiA+IH0NCj4gPiANCj4gTm8gYXMg
-ZmFyIGFzIHRoZSB1c2VyIGlzIGNvbmNlcm5lZCB3ZSBoYXZlIG5vdC4gVGhlIE9fRElSRUNUIHVz
-ZXINCj4gaXMgc3RpbGwgZ2V0dGluZyBhbGwgdGhlIHNlbWFudGljcyBoZSB3YW50cywgLmkuZSBu
-byBzeW5jcyBubw0KPiBtZW1vcnkgY2FjaGUgdXNhZ2UsIG5vIGNvcGllcyAuLi4NCj4gDQo+IE9u
-bHkgd2l0aCBEQVggdGhlIGJ1ZmZlcmVkIElPIGlzIHRoZSBzYW1lIHNpbmNlIHdpdGggcG1lbSBp
-dCBpcw0KPiBmYXN0ZXIuDQo+IFRoZW4gd2h5IG5vdD8gVGhlIGJhc2ljIGNvbnRyYWN0IHdpdGgg
-dGhlIHVzZXIgZGlkIG5vdCBicmVhay4NCj4gDQo+IFRoZSBhYm92ZSB3YXMganVzdCBhbiBpbXBs
-ZW1lbnRhdGlvbiBkZXRhaWwgdG8gZWFzaWx5IG5hdmlnYXRlDQo+IHRocm91Z2ggdGhlIExpbnV4
-IHZmcyBJTyBzdGFjayBhbmQgbWFrZSB0aGUgbGVhc3QgYW1vdW50IG9mIGNoYW5nZXMNCj4gaW4g
-ZXZlcnkgRlMgdGhhdCB3YW50ZWQgdG8gc3VwcG9ydCBEQVguKEFuZCBzaW5jZSBkYXhfZG9faW8g
-aXMgbXVjaA0KPiBtb3JlIGxpa2UgZGlyZWN0X0lPIHRoZW4gbGlrZSBwYWdlLWNhY2hlIElPKQ0K
-PiANCj4gPiANCj4gPiBZZXMgT19ESVJFQ1Qgb24gYSBEQVggbW91bnRlZCBmaWxlIHN5c3RlbSB3
-aWxsIG5vdyBiZSBzbG93ZXIsIGJ1dCAtDQo+ID4gDQo+ID4gPiANCj4gPiA+IA0KPiA+ID4gPiAN
-Cj4gPiA+ID4gDQo+ID4gPiA+IFRoaXMgYWxsb3dzIHVzIGEgcmVjb3ZlcnkgcGF0aCBpbiB0aGUg
-Zm9ybSBvZiBvcGVuaW5nIHRoZSBmaWxlDQo+ID4gPiA+IHdpdGgNCj4gPiA+ID4gT19ESVJFQ1Qg
-YW5kIHdyaXRpbmcgdG8gaXQgd2l0aCB0aGUgdXN1YWwgT19ESVJFQ1Qgc2VtYW50aWNzDQo+ID4g
-PiA+IChzZWN0b3INCj4gPiA+ID4gYWxpZ25tZW50IHJlc3RyaWN0aW9ucykuDQo+ID4gPiA+IA0K
-PiA+ID4gSSB1bmRlcnN0YW5kIHRoYXQgeW91IHdhbnQgYSBzZWN0b3IgYWxpZ25lZCBJTywgcmln
-aHQ/IGZvciB0aGUNCj4gPiA+IGNsZWFyIG9mIGVycm9ycy4gQnV0IEkgaGF0ZSBpdCB0aGF0IHlv
-dSBmb3JjZWQgYWxsIE9fRElSRUNUIElPDQo+ID4gPiB0byBiZSBzbG93IGZvciB0aGlzLg0KPiA+
-ID4gQ2FuIHlvdSBub3QgbWFrZSBkYXhfZG9faW8gaGFuZGxlIG1lZGlhIGVycm9ycz8gQXQgbGVh
-c3QgZm9yIHRoZQ0KPiA+ID4gcGFydHMgb2YgdGhlIElPIHRoYXQgYXJlIGFsaWduZWQuDQo+ID4g
-PiAoQW5kIHlvdXIgcmVjb3ZlcnkgcGF0aCBhcHBsaWNhdGlvbiBhYm92ZSBjYW4gdXNlIG9ubHkg
-YWxpZ25lZA0KPiA+ID4gwqBJTyB0byBtYWtlIHN1cmUpDQo+ID4gPiANCj4gPiA+IFBsZWFzZSBs
-b29rIGZvciBhbm90aGVyIHNvbHV0aW9uLiBFdmVuIGEgc3BlY2lhbA0KPiA+ID4gSU9DVExfREFY
-X0NMRUFSX0VSUk9SDQo+ID4gwqAtIHNlZSBhbGwgdGhlIHZlcnNpb25zIG9mIHRoaXMgc2VyaWVz
-IHByaW9yIHRvIHRoaXMgb25lLCB3aGVyZSB3ZQ0KPiA+IHRyeQ0KPiA+IHRvIGRvIGEgZmFsbGJh
-Y2suLi4NCj4gPiANCj4gQW5kPw0KPiANCj4gU28gbm93IGFsbCBPX0RJUkVDVCBBUFBzIGdvIDQg
-dGltZXMgc2xvd2VyLiBJIHdpbGwgaGF2ZSBhIGxvb2sgYnV0IGlmDQo+IGl0IGlzIHJlYWxseSBz
-byBiYWQgdGhhbiBwbGVhc2UgY29uc2lkZXIgYW4gSU9DVEwgb3Igc3lzY2FsbC4gT3IgYQ0KPiBz
-cGVjaWFsDQo+IE9fREFYX0VSUk9SUyBmbGFnIC4uLg0KDQpJJ20gY3VyaW91cyB3aGVyZSB0aGUg
-NHggc2xvd2VyIGNvbWVzIGZyb20uLiBUaGUgT19ESVJFQ1QgcGF0aCBpcyBzdGlsbA0Kd2l0aG91
-dCBwYWdlLWNhY2hlIGNvcGllcywgYW5kIG5vciBkb2VzIGl0IGdvIHRocm91Z2ggcmVxdWVzdCBx
-dWV1ZXMNCihzaW5jZSBwbWVtIGlzIGEgYmlvLWJhc2VkIGRyaXZlcikuIFRoZSBvbmx5IG92ZXJo
-ZWFkIGlzIHRoYXQgb2YNCnN1Ym1pdHRpbmcgYSBiaW8gLSBhbmQgd2hpbGUgSSBhZ3JlZSBpdCBp
-cyBtb3JlIG92ZXJoZWFkIHRoYW4gZGF4X2RvX2lvLA0KNHggc2VlbXMgYSBiaXQgaGlnaC4NCg0K
-PiANCj4gUGxlYXNlIGRvIG5vdCB0cmFzaCBhbGwgdGhlIE9fRElSRUNUIHVzZXJzLCB0aGV5IGFy
-ZSB0aGUgbW9yZQ0KPiBpbXBvcnRhbnQNCj4gY2xpZW50cywgbGlrZSBEQnMgYW5kIFZNcy4NCg0K
-U2hvdWxkbid0IHRoZXkgYmUgdXNpbmcgbW1hcHMgYW5kIGRheCBmYXVsdHM/IEkgd2FzIHVuZGVy
-IHRoZSBpbXByZXNzaW9uDQp0aGF0IHRoZSBkYXhfZG9faW8gcGF0aCBpcyBhIG5pY2UtdG8taGF2
-ZSwgYnV0IGZvciBhbnlvbmUgdGhhdCB3aWxsIHdhbnQNCnRvIHVzZSBEQVgsIHRoZXkgd2lsbCB3
-YW50IHRoZSBtbWFwL2ZhdWx0IHBhdGgsIG5vdCB0aGUgSU8gcGF0aC4gVGhpcyBpcw0KanVzdCBt
-YWtpbmcgdGhlIElPIHBhdGggJ21vcmUgY29ycmVjdCcgYnkgYWxsb3dpbmcgaXQgYSB3YXkgdG8g
-ZGVhbCB3aXRoDQplcnJvcnMuDQoNCj4gDQo+IFRoYW5rcw0KPiBCb2F6DQo+IA0KPiA+IA0KPiA+
-ID4gDQo+ID4gPiANCj4gPiA+IFsqImxlc3MgY29uY3VycmVudCIgYmVjYXVzZSBvZiB0aGUgcXVl
-dWluZyBkb25lIGluIGJkZXYuIE5vdGUgaG93DQo+ID4gPiDCoCBwbWVtIGlzIG5vdCBldmVuIG11
-bHRpLXF1ZXVlLCBhbmQgZXZlbiBpZiBpdCB3YXMgaXQgd2lsbCBiZSBtdWNoDQo+ID4gPiDCoCBz
-bG93ZXIgdGhlbiBEQVggYmVjYXVzZSBvZiB0aGUgY29kZSBkZXB0aCBhbmQgYWxsIHRoZSBsb2Nr
-cyBhbmQNCj4gPiA+IHRhc2sNCj4gPiA+IMKgIHN3aXRjaGVzIGRvbmUgaW4gdGhlIGJsb2NrIGxh
-eWVyLiBJbiBEQVggdGhlIGZpbmFsIG1lbWNweSBpcw0KPiA+ID4gZG9uZQ0KPiA+ID4gZGlyZWN0
-bHkNCj4gPiA+IMKgIG9uIHRoZSB1c2VyLW1vZGUgdGhyZWFkXQ0KPiA+ID4gDQo+ID4gPiBUaGFu
-a3MNCj4gPiA+IEJvYXoNCj4gPiA+IA==
+On Mon, 2016-05-02 at 19:03 +0300, Boaz Harrosh wrote:
+> On 05/02/2016 06:51 PM, Vishal Verma wrote:
+> > 
+> > On Mon, 2016-05-02 at 18:41 +0300, Boaz Harrosh wrote:
+> > > 
+> > > On 04/29/2016 12:16 AM, Vishal Verma wrote:
+> > > > 
+> > > > 
+> > > > All IO in a dax filesystem used to go through dax_do_io, which
+> > > > cannot
+> > > > handle media errors, and thus cannot provide a recovery path
+> > > > that
+> > > > can
+> > > > send a write through the driver to clear errors.
+> > > > 
+> > > > Add a new iocb flag for DAX, and set it only for DAX mounts. In
+> > > > the
+> > > > IO
+> > > > path for DAX filesystems, use the same direct_IO path for both
+> > > > DAX
+> > > > and
+> > > > direct_io iocbs, but use the flags to identify when we are in
+> > > > O_DIRECT
+> > > > mode vs non O_DIRECT with DAX, and for O_DIRECT, use the
+> > > > conventional
+> > > > direct_IO path instead of DAX.
+> > > > 
+> > > Really? What are your thinking here?
+> > > 
+> > > What about all the current users of O_DIRECT, you have just made
+> > > them
+> > > 4 times slower and "less concurrent*" then "buffred io" users.
+> > > Since
+> > > direct_IO path will queue an IO request and all.
+> > > (And if it is not so slow then why do we need dax_do_io at all?
+> > > [Rhetorical])
+> > > 
+> > > I hate it that you overload the semantics of a known and expected
+> > > O_DIRECT flag, for special pmem quirks. This is an incompatible
+> > > and unrelated overload of the semantics of O_DIRECT.
+> > We overloaded O_DIRECT a long time ago when we made DAX piggyback on
+> > the same path:
+> > 
+> > static inline bool io_is_direct(struct file *filp)
+> > {
+> > 	return (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping-
+> > >host);
+> > }
+> > 
+> No as far as the user is concerned we have not. The O_DIRECT user
+> is still getting all the semantics he wants, .i.e no syncs no
+> memory cache usage, no copies ...
+> 
+> Only with DAX the buffered IO is the same since with pmem it is
+> faster.
+> Then why not? The basic contract with the user did not break.
+> 
+> The above was just an implementation detail to easily navigate
+> through the Linux vfs IO stack and make the least amount of changes
+> in every FS that wanted to support DAX.(And since dax_do_io is much
+> more like direct_IO then like page-cache IO)
+> 
+> > 
+> > Yes O_DIRECT on a DAX mounted file system will now be slower, but -
+> > 
+> > > 
+> > > 
+> > > > 
+> > > > 
+> > > > This allows us a recovery path in the form of opening the file
+> > > > with
+> > > > O_DIRECT and writing to it with the usual O_DIRECT semantics
+> > > > (sector
+> > > > alignment restrictions).
+> > > > 
+> > > I understand that you want a sector aligned IO, right? for the
+> > > clear of errors. But I hate it that you forced all O_DIRECT IO
+> > > to be slow for this.
+> > > Can you not make dax_do_io handle media errors? At least for the
+> > > parts of the IO that are aligned.
+> > > (And your recovery path application above can use only aligned
+> > >  IO to make sure)
+> > > 
+> > > Please look for another solution. Even a special
+> > > IOCTL_DAX_CLEAR_ERROR
+> >  - see all the versions of this series prior to this one, where we
+> > try
+> > to do a fallback...
+> > 
+> And?
+> 
+> So now all O_DIRECT APPs go 4 times slower. I will have a look but if
+> it is really so bad than please consider an IOCTL or syscall. Or a
+> special
+> O_DAX_ERRORS flag ...
+
+I'm curious where the 4x slower comes from.. The O_DIRECT path is still
+without page-cache copies, and nor does it go through request queues
+(since pmem is a bio-based driver). The only overhead is that of
+submitting a bio - and while I agree it is more overhead than dax_do_io,
+4x seems a bit high.
+
+> 
+> Please do not trash all the O_DIRECT users, they are the more
+> important
+> clients, like DBs and VMs.
+
+Shouldn't they be using mmaps and dax faults? I was under the impression
+that the dax_do_io path is a nice-to-have, but for anyone that will want
+to use DAX, they will want the mmap/fault path, not the IO path. This is
+just making the IO path 'more correct' by allowing it a way to deal with
+errors.
+
+> 
+> Thanks
+> Boaz
+> 
+> > 
+> > > 
+> > > 
+> > > [*"less concurrent" because of the queuing done in bdev. Note how
+> > >   pmem is not even multi-queue, and even if it was it will be much
+> > >   slower then DAX because of the code depth and all the locks and
+> > > task
+> > >   switches done in the block layer. In DAX the final memcpy is
+> > > done
+> > > directly
+> > >   on the user-mode thread]
+> > > 
+> > > Thanks
+> > > Boaz
+> > > 
+_______________________________________________
+xfs mailing list
+xfs@oss.sgi.com
+http://oss.sgi.com/mailman/listinfo/xfs
diff --git a/a/content_digest b/N1/content_digest
index 6fb5f8a..4436ead 100644
--- a/a/content_digest
+++ b/N1/content_digest
@@ -8,102 +8,155 @@
  "Date\0Mon, 2 May 2016 18:52:02 +0000\0"
  "To\0linux-nvdimm@lists.01.org <linux-nvdimm@lists.01.org>"
  " boaz@plexistor.com <boaz@plexistor.com>\0"
- "Cc\0linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>"
-  linux-block@vger.kernel.org <linux-block@vger.kernel.org>
-  hch@infradead.org <hch@infradead.org>
+ "Cc\0hch@infradead.org <hch@infradead.org>"
+  jack@suse.cz <jack@suse.cz>
+  matthew@wil.cx <matthew@wil.cx>
+  axboe@fb.com <axboe@fb.com>
+  linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>
   xfs@oss.sgi.com <xfs@oss.sgi.com>
+  linux-block@vger.kernel.org <linux-block@vger.kernel.org>
   linux-mm@kvack.org <linux-mm@kvack.org>
   viro@zeniv.linux.org.uk <viro@zeniv.linux.org.uk>
-  axboe@fb.com <axboe@fb.com>
-  akpm@linux-foundation.org <akpm@linux-foundation.org>
   linux-fsdevel@vger.kernel.org <linux-fsdevel@vger.kernel.org>
-  linux-ext4@vger.kernel.org <linux-ext4@vger.kernel.org>
-  david@fromorbit.com <david@fromorbit.com>
-  jack@suse.cz <jack@suse.cz>
- " matthew@wil.cx <matthew@wil.cx>\0"
+  akpm@linux-foundation.org <akpm@linux-foundation.org>
+ " linux-ext4@vger.kernel.org <linux-ext4@vger.kernel.org>\0"
  "\00:1\0"
  "b\0"
- "T24gTW9uLCAyMDE2LTA1LTAyIGF0IDE5OjAzICswMzAwLCBCb2F6IEhhcnJvc2ggd3JvdGU6DQo+\n"
- "IE9uIDA1LzAyLzIwMTYgMDY6NTEgUE0sIFZpc2hhbCBWZXJtYSB3cm90ZToNCj4gPiANCj4gPiBP\n"
- "biBNb24sIDIwMTYtMDUtMDIgYXQgMTg6NDEgKzAzMDAsIEJvYXogSGFycm9zaCB3cm90ZToNCj4g\n"
- "PiA+IA0KPiA+ID4gT24gMDQvMjkvMjAxNiAxMjoxNiBBTSwgVmlzaGFsIFZlcm1hIHdyb3RlOg0K\n"
- "PiA+ID4gPiANCj4gPiA+ID4gDQo+ID4gPiA+IEFsbCBJTyBpbiBhIGRheCBmaWxlc3lzdGVtIHVz\n"
- "ZWQgdG8gZ28gdGhyb3VnaCBkYXhfZG9faW8sIHdoaWNoDQo+ID4gPiA+IGNhbm5vdA0KPiA+ID4g\n"
- "PiBoYW5kbGUgbWVkaWEgZXJyb3JzLCBhbmQgdGh1cyBjYW5ub3QgcHJvdmlkZSBhIHJlY292ZXJ5\n"
- "IHBhdGgNCj4gPiA+ID4gdGhhdA0KPiA+ID4gPiBjYW4NCj4gPiA+ID4gc2VuZCBhIHdyaXRlIHRo\n"
- "cm91Z2ggdGhlIGRyaXZlciB0byBjbGVhciBlcnJvcnMuDQo+ID4gPiA+IA0KPiA+ID4gPiBBZGQg\n"
- "YSBuZXcgaW9jYiBmbGFnIGZvciBEQVgsIGFuZCBzZXQgaXQgb25seSBmb3IgREFYIG1vdW50cy4g\n"
- "SW4NCj4gPiA+ID4gdGhlDQo+ID4gPiA+IElPDQo+ID4gPiA+IHBhdGggZm9yIERBWCBmaWxlc3lz\n"
- "dGVtcywgdXNlIHRoZSBzYW1lIGRpcmVjdF9JTyBwYXRoIGZvciBib3RoDQo+ID4gPiA+IERBWA0K\n"
- "PiA+ID4gPiBhbmQNCj4gPiA+ID4gZGlyZWN0X2lvIGlvY2JzLCBidXQgdXNlIHRoZSBmbGFncyB0\n"
- "byBpZGVudGlmeSB3aGVuIHdlIGFyZSBpbg0KPiA+ID4gPiBPX0RJUkVDVA0KPiA+ID4gPiBtb2Rl\n"
- "IHZzIG5vbiBPX0RJUkVDVCB3aXRoIERBWCwgYW5kIGZvciBPX0RJUkVDVCwgdXNlIHRoZQ0KPiA+\n"
- "ID4gPiBjb252ZW50aW9uYWwNCj4gPiA+ID4gZGlyZWN0X0lPIHBhdGggaW5zdGVhZCBvZiBEQVgu\n"
- "DQo+ID4gPiA+IA0KPiA+ID4gUmVhbGx5PyBXaGF0IGFyZSB5b3VyIHRoaW5raW5nIGhlcmU/DQo+\n"
- "ID4gPiANCj4gPiA+IFdoYXQgYWJvdXQgYWxsIHRoZSBjdXJyZW50IHVzZXJzIG9mIE9fRElSRUNU\n"
- "LCB5b3UgaGF2ZSBqdXN0IG1hZGUNCj4gPiA+IHRoZW0NCj4gPiA+IDQgdGltZXMgc2xvd2VyIGFu\n"
- "ZCAibGVzcyBjb25jdXJyZW50KiIgdGhlbiAiYnVmZnJlZCBpbyIgdXNlcnMuDQo+ID4gPiBTaW5j\n"
- "ZQ0KPiA+ID4gZGlyZWN0X0lPIHBhdGggd2lsbCBxdWV1ZSBhbiBJTyByZXF1ZXN0IGFuZCBhbGwu\n"
- "DQo+ID4gPiAoQW5kIGlmIGl0IGlzIG5vdCBzbyBzbG93IHRoZW4gd2h5IGRvIHdlIG5lZWQgZGF4\n"
- "X2RvX2lvIGF0IGFsbD8NCj4gPiA+IFtSaGV0b3JpY2FsXSkNCj4gPiA+IA0KPiA+ID4gSSBoYXRl\n"
- "IGl0IHRoYXQgeW91IG92ZXJsb2FkIHRoZSBzZW1hbnRpY3Mgb2YgYSBrbm93biBhbmQgZXhwZWN0\n"
- "ZWQNCj4gPiA+IE9fRElSRUNUIGZsYWcsIGZvciBzcGVjaWFsIHBtZW0gcXVpcmtzLiBUaGlzIGlz\n"
- "IGFuIGluY29tcGF0aWJsZQ0KPiA+ID4gYW5kIHVucmVsYXRlZCBvdmVybG9hZCBvZiB0aGUgc2Vt\n"
- "YW50aWNzIG9mIE9fRElSRUNULg0KPiA+IFdlIG92ZXJsb2FkZWQgT19ESVJFQ1QgYSBsb25nIHRp\n"
- "bWUgYWdvIHdoZW4gd2UgbWFkZSBEQVggcGlnZ3liYWNrIG9uDQo+ID4gdGhlIHNhbWUgcGF0aDoN\n"
- "Cj4gPiANCj4gPiBzdGF0aWMgaW5saW5lIGJvb2wgaW9faXNfZGlyZWN0KHN0cnVjdCBmaWxlICpm\n"
- "aWxwKQ0KPiA+IHsNCj4gPiAJcmV0dXJuIChmaWxwLT5mX2ZsYWdzICYgT19ESVJFQ1QpIHx8IElT\n"
- "X0RBWChmaWxwLT5mX21hcHBpbmctDQo+ID4gPmhvc3QpOw0KPiA+IH0NCj4gPiANCj4gTm8gYXMg\n"
- "ZmFyIGFzIHRoZSB1c2VyIGlzIGNvbmNlcm5lZCB3ZSBoYXZlIG5vdC4gVGhlIE9fRElSRUNUIHVz\n"
- "ZXINCj4gaXMgc3RpbGwgZ2V0dGluZyBhbGwgdGhlIHNlbWFudGljcyBoZSB3YW50cywgLmkuZSBu\n"
- "byBzeW5jcyBubw0KPiBtZW1vcnkgY2FjaGUgdXNhZ2UsIG5vIGNvcGllcyAuLi4NCj4gDQo+IE9u\n"
- "bHkgd2l0aCBEQVggdGhlIGJ1ZmZlcmVkIElPIGlzIHRoZSBzYW1lIHNpbmNlIHdpdGggcG1lbSBp\n"
- "dCBpcw0KPiBmYXN0ZXIuDQo+IFRoZW4gd2h5IG5vdD8gVGhlIGJhc2ljIGNvbnRyYWN0IHdpdGgg\n"
- "dGhlIHVzZXIgZGlkIG5vdCBicmVhay4NCj4gDQo+IFRoZSBhYm92ZSB3YXMganVzdCBhbiBpbXBs\n"
- "ZW1lbnRhdGlvbiBkZXRhaWwgdG8gZWFzaWx5IG5hdmlnYXRlDQo+IHRocm91Z2ggdGhlIExpbnV4\n"
- "IHZmcyBJTyBzdGFjayBhbmQgbWFrZSB0aGUgbGVhc3QgYW1vdW50IG9mIGNoYW5nZXMNCj4gaW4g\n"
- "ZXZlcnkgRlMgdGhhdCB3YW50ZWQgdG8gc3VwcG9ydCBEQVguKEFuZCBzaW5jZSBkYXhfZG9faW8g\n"
- "aXMgbXVjaA0KPiBtb3JlIGxpa2UgZGlyZWN0X0lPIHRoZW4gbGlrZSBwYWdlLWNhY2hlIElPKQ0K\n"
- "PiANCj4gPiANCj4gPiBZZXMgT19ESVJFQ1Qgb24gYSBEQVggbW91bnRlZCBmaWxlIHN5c3RlbSB3\n"
- "aWxsIG5vdyBiZSBzbG93ZXIsIGJ1dCAtDQo+ID4gDQo+ID4gPiANCj4gPiA+IA0KPiA+ID4gPiAN\n"
- "Cj4gPiA+ID4gDQo+ID4gPiA+IFRoaXMgYWxsb3dzIHVzIGEgcmVjb3ZlcnkgcGF0aCBpbiB0aGUg\n"
- "Zm9ybSBvZiBvcGVuaW5nIHRoZSBmaWxlDQo+ID4gPiA+IHdpdGgNCj4gPiA+ID4gT19ESVJFQ1Qg\n"
- "YW5kIHdyaXRpbmcgdG8gaXQgd2l0aCB0aGUgdXN1YWwgT19ESVJFQ1Qgc2VtYW50aWNzDQo+ID4g\n"
- "PiA+IChzZWN0b3INCj4gPiA+ID4gYWxpZ25tZW50IHJlc3RyaWN0aW9ucykuDQo+ID4gPiA+IA0K\n"
- "PiA+ID4gSSB1bmRlcnN0YW5kIHRoYXQgeW91IHdhbnQgYSBzZWN0b3IgYWxpZ25lZCBJTywgcmln\n"
- "aHQ/IGZvciB0aGUNCj4gPiA+IGNsZWFyIG9mIGVycm9ycy4gQnV0IEkgaGF0ZSBpdCB0aGF0IHlv\n"
- "dSBmb3JjZWQgYWxsIE9fRElSRUNUIElPDQo+ID4gPiB0byBiZSBzbG93IGZvciB0aGlzLg0KPiA+\n"
- "ID4gQ2FuIHlvdSBub3QgbWFrZSBkYXhfZG9faW8gaGFuZGxlIG1lZGlhIGVycm9ycz8gQXQgbGVh\n"
- "c3QgZm9yIHRoZQ0KPiA+ID4gcGFydHMgb2YgdGhlIElPIHRoYXQgYXJlIGFsaWduZWQuDQo+ID4g\n"
- "PiAoQW5kIHlvdXIgcmVjb3ZlcnkgcGF0aCBhcHBsaWNhdGlvbiBhYm92ZSBjYW4gdXNlIG9ubHkg\n"
- "YWxpZ25lZA0KPiA+ID4gwqBJTyB0byBtYWtlIHN1cmUpDQo+ID4gPiANCj4gPiA+IFBsZWFzZSBs\n"
- "b29rIGZvciBhbm90aGVyIHNvbHV0aW9uLiBFdmVuIGEgc3BlY2lhbA0KPiA+ID4gSU9DVExfREFY\n"
- "X0NMRUFSX0VSUk9SDQo+ID4gwqAtIHNlZSBhbGwgdGhlIHZlcnNpb25zIG9mIHRoaXMgc2VyaWVz\n"
- "IHByaW9yIHRvIHRoaXMgb25lLCB3aGVyZSB3ZQ0KPiA+IHRyeQ0KPiA+IHRvIGRvIGEgZmFsbGJh\n"
- "Y2suLi4NCj4gPiANCj4gQW5kPw0KPiANCj4gU28gbm93IGFsbCBPX0RJUkVDVCBBUFBzIGdvIDQg\n"
- "dGltZXMgc2xvd2VyLiBJIHdpbGwgaGF2ZSBhIGxvb2sgYnV0IGlmDQo+IGl0IGlzIHJlYWxseSBz\n"
- "byBiYWQgdGhhbiBwbGVhc2UgY29uc2lkZXIgYW4gSU9DVEwgb3Igc3lzY2FsbC4gT3IgYQ0KPiBz\n"
- "cGVjaWFsDQo+IE9fREFYX0VSUk9SUyBmbGFnIC4uLg0KDQpJJ20gY3VyaW91cyB3aGVyZSB0aGUg\n"
- "NHggc2xvd2VyIGNvbWVzIGZyb20uLiBUaGUgT19ESVJFQ1QgcGF0aCBpcyBzdGlsbA0Kd2l0aG91\n"
- "dCBwYWdlLWNhY2hlIGNvcGllcywgYW5kIG5vciBkb2VzIGl0IGdvIHRocm91Z2ggcmVxdWVzdCBx\n"
- "dWV1ZXMNCihzaW5jZSBwbWVtIGlzIGEgYmlvLWJhc2VkIGRyaXZlcikuIFRoZSBvbmx5IG92ZXJo\n"
- "ZWFkIGlzIHRoYXQgb2YNCnN1Ym1pdHRpbmcgYSBiaW8gLSBhbmQgd2hpbGUgSSBhZ3JlZSBpdCBp\n"
- "cyBtb3JlIG92ZXJoZWFkIHRoYW4gZGF4X2RvX2lvLA0KNHggc2VlbXMgYSBiaXQgaGlnaC4NCg0K\n"
- "PiANCj4gUGxlYXNlIGRvIG5vdCB0cmFzaCBhbGwgdGhlIE9fRElSRUNUIHVzZXJzLCB0aGV5IGFy\n"
- "ZSB0aGUgbW9yZQ0KPiBpbXBvcnRhbnQNCj4gY2xpZW50cywgbGlrZSBEQnMgYW5kIFZNcy4NCg0K\n"
- "U2hvdWxkbid0IHRoZXkgYmUgdXNpbmcgbW1hcHMgYW5kIGRheCBmYXVsdHM/IEkgd2FzIHVuZGVy\n"
- "IHRoZSBpbXByZXNzaW9uDQp0aGF0IHRoZSBkYXhfZG9faW8gcGF0aCBpcyBhIG5pY2UtdG8taGF2\n"
- "ZSwgYnV0IGZvciBhbnlvbmUgdGhhdCB3aWxsIHdhbnQNCnRvIHVzZSBEQVgsIHRoZXkgd2lsbCB3\n"
- "YW50IHRoZSBtbWFwL2ZhdWx0IHBhdGgsIG5vdCB0aGUgSU8gcGF0aC4gVGhpcyBpcw0KanVzdCBt\n"
- "YWtpbmcgdGhlIElPIHBhdGggJ21vcmUgY29ycmVjdCcgYnkgYWxsb3dpbmcgaXQgYSB3YXkgdG8g\n"
- "ZGVhbCB3aXRoDQplcnJvcnMuDQoNCj4gDQo+IFRoYW5rcw0KPiBCb2F6DQo+IA0KPiA+IA0KPiA+\n"
- "ID4gDQo+ID4gPiANCj4gPiA+IFsqImxlc3MgY29uY3VycmVudCIgYmVjYXVzZSBvZiB0aGUgcXVl\n"
- "dWluZyBkb25lIGluIGJkZXYuIE5vdGUgaG93DQo+ID4gPiDCoCBwbWVtIGlzIG5vdCBldmVuIG11\n"
- "bHRpLXF1ZXVlLCBhbmQgZXZlbiBpZiBpdCB3YXMgaXQgd2lsbCBiZSBtdWNoDQo+ID4gPiDCoCBz\n"
- "bG93ZXIgdGhlbiBEQVggYmVjYXVzZSBvZiB0aGUgY29kZSBkZXB0aCBhbmQgYWxsIHRoZSBsb2Nr\n"
- "cyBhbmQNCj4gPiA+IHRhc2sNCj4gPiA+IMKgIHN3aXRjaGVzIGRvbmUgaW4gdGhlIGJsb2NrIGxh\n"
- "eWVyLiBJbiBEQVggdGhlIGZpbmFsIG1lbWNweSBpcw0KPiA+ID4gZG9uZQ0KPiA+ID4gZGlyZWN0\n"
- "bHkNCj4gPiA+IMKgIG9uIHRoZSB1c2VyLW1vZGUgdGhyZWFkXQ0KPiA+ID4gDQo+ID4gPiBUaGFu\n"
- a3MNCj4gPiA+IEJvYXoNCj4gPiA+IA==
+ "On Mon, 2016-05-02 at 19:03 +0300, Boaz Harrosh wrote:\n"
+ "> On 05/02/2016 06:51 PM, Vishal Verma wrote:\n"
+ "> > \n"
+ "> > On Mon, 2016-05-02 at 18:41 +0300, Boaz Harrosh wrote:\n"
+ "> > > \n"
+ "> > > On 04/29/2016 12:16 AM, Vishal Verma wrote:\n"
+ "> > > > \n"
+ "> > > > \n"
+ "> > > > All IO in a dax filesystem used to go through dax_do_io, which\n"
+ "> > > > cannot\n"
+ "> > > > handle media errors, and thus cannot provide a recovery path\n"
+ "> > > > that\n"
+ "> > > > can\n"
+ "> > > > send a write through the driver to clear errors.\n"
+ "> > > > \n"
+ "> > > > Add a new iocb flag for DAX, and set it only for DAX mounts. In\n"
+ "> > > > the\n"
+ "> > > > IO\n"
+ "> > > > path for DAX filesystems, use the same direct_IO path for both\n"
+ "> > > > DAX\n"
+ "> > > > and\n"
+ "> > > > direct_io iocbs, but use the flags to identify when we are in\n"
+ "> > > > O_DIRECT\n"
+ "> > > > mode vs non O_DIRECT with DAX, and for O_DIRECT, use the\n"
+ "> > > > conventional\n"
+ "> > > > direct_IO path instead of DAX.\n"
+ "> > > > \n"
+ "> > > Really? What are your thinking here?\n"
+ "> > > \n"
+ "> > > What about all the current users of O_DIRECT, you have just made\n"
+ "> > > them\n"
+ "> > > 4 times slower and \"less concurrent*\" then \"buffred io\" users.\n"
+ "> > > Since\n"
+ "> > > direct_IO path will queue an IO request and all.\n"
+ "> > > (And if it is not so slow then why do we need dax_do_io at all?\n"
+ "> > > [Rhetorical])\n"
+ "> > > \n"
+ "> > > I hate it that you overload the semantics of a known and expected\n"
+ "> > > O_DIRECT flag, for special pmem quirks. This is an incompatible\n"
+ "> > > and unrelated overload of the semantics of O_DIRECT.\n"
+ "> > We overloaded O_DIRECT a long time ago when we made DAX piggyback on\n"
+ "> > the same path:\n"
+ "> > \n"
+ "> > static inline bool io_is_direct(struct file *filp)\n"
+ "> > {\n"
+ "> > \treturn (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping-\n"
+ "> > >host);\n"
+ "> > }\n"
+ "> > \n"
+ "> No as far as the user is concerned we have not. The O_DIRECT user\n"
+ "> is still getting all the semantics he wants, .i.e no syncs no\n"
+ "> memory cache usage, no copies ...\n"
+ "> \n"
+ "> Only with DAX the buffered IO is the same since with pmem it is\n"
+ "> faster.\n"
+ "> Then why not? The basic contract with the user did not break.\n"
+ "> \n"
+ "> The above was just an implementation detail to easily navigate\n"
+ "> through the Linux vfs IO stack and make the least amount of changes\n"
+ "> in every FS that wanted to support DAX.(And since dax_do_io is much\n"
+ "> more like direct_IO then like page-cache IO)\n"
+ "> \n"
+ "> > \n"
+ "> > Yes O_DIRECT on a DAX mounted file system will now be slower, but -\n"
+ "> > \n"
+ "> > > \n"
+ "> > > \n"
+ "> > > > \n"
+ "> > > > \n"
+ "> > > > This allows us a recovery path in the form of opening the file\n"
+ "> > > > with\n"
+ "> > > > O_DIRECT and writing to it with the usual O_DIRECT semantics\n"
+ "> > > > (sector\n"
+ "> > > > alignment restrictions).\n"
+ "> > > > \n"
+ "> > > I understand that you want a sector aligned IO, right? for the\n"
+ "> > > clear of errors. But I hate it that you forced all O_DIRECT IO\n"
+ "> > > to be slow for this.\n"
+ "> > > Can you not make dax_do_io handle media errors? At least for the\n"
+ "> > > parts of the IO that are aligned.\n"
+ "> > > (And your recovery path application above can use only aligned\n"
+ "> > > \302\240IO to make sure)\n"
+ "> > > \n"
+ "> > > Please look for another solution. Even a special\n"
+ "> > > IOCTL_DAX_CLEAR_ERROR\n"
+ "> > \302\240- see all the versions of this series prior to this one, where we\n"
+ "> > try\n"
+ "> > to do a fallback...\n"
+ "> > \n"
+ "> And?\n"
+ "> \n"
+ "> So now all O_DIRECT APPs go 4 times slower. I will have a look but if\n"
+ "> it is really so bad than please consider an IOCTL or syscall. Or a\n"
+ "> special\n"
+ "> O_DAX_ERRORS flag ...\n"
+ "\n"
+ "I'm curious where the 4x slower comes from.. The O_DIRECT path is still\n"
+ "without page-cache copies, and nor does it go through request queues\n"
+ "(since pmem is a bio-based driver). The only overhead is that of\n"
+ "submitting a bio - and while I agree it is more overhead than dax_do_io,\n"
+ "4x seems a bit high.\n"
+ "\n"
+ "> \n"
+ "> Please do not trash all the O_DIRECT users, they are the more\n"
+ "> important\n"
+ "> clients, like DBs and VMs.\n"
+ "\n"
+ "Shouldn't they be using mmaps and dax faults? I was under the impression\n"
+ "that the dax_do_io path is a nice-to-have, but for anyone that will want\n"
+ "to use DAX, they will want the mmap/fault path, not the IO path. This is\n"
+ "just making the IO path 'more correct' by allowing it a way to deal with\n"
+ "errors.\n"
+ "\n"
+ "> \n"
+ "> Thanks\n"
+ "> Boaz\n"
+ "> \n"
+ "> > \n"
+ "> > > \n"
+ "> > > \n"
+ "> > > [*\"less concurrent\" because of the queuing done in bdev. Note how\n"
+ "> > > \302\240 pmem is not even multi-queue, and even if it was it will be much\n"
+ "> > > \302\240 slower then DAX because of the code depth and all the locks and\n"
+ "> > > task\n"
+ "> > > \302\240 switches done in the block layer. In DAX the final memcpy is\n"
+ "> > > done\n"
+ "> > > directly\n"
+ "> > > \302\240 on the user-mode thread]\n"
+ "> > > \n"
+ "> > > Thanks\n"
+ "> > > Boaz\n"
+ "> > > \n"
+ "_______________________________________________\n"
+ "xfs mailing list\n"
+ "xfs@oss.sgi.com\n"
+ http://oss.sgi.com/mailman/listinfo/xfs
 
-24869abd3ea9a39bba870c7d85f8910222fd85059cf703e4e503c13cf44d32f9
+c6b62129be977b04a3d1a751e69ec97daafabb4ca3d8261728afa886d62cedb9

diff --git a/a/1.txt b/N2/1.txt
index c608940..fd75ce0 100644
--- a/a/1.txt
+++ b/N2/1.txt
@@ -1,82 +1,136 @@
-T24gTW9uLCAyMDE2LTA1LTAyIGF0IDE5OjAzICswMzAwLCBCb2F6IEhhcnJvc2ggd3JvdGU6DQo+
-IE9uIDA1LzAyLzIwMTYgMDY6NTEgUE0sIFZpc2hhbCBWZXJtYSB3cm90ZToNCj4gPiANCj4gPiBP
-biBNb24sIDIwMTYtMDUtMDIgYXQgMTg6NDEgKzAzMDAsIEJvYXogSGFycm9zaCB3cm90ZToNCj4g
-PiA+IA0KPiA+ID4gT24gMDQvMjkvMjAxNiAxMjoxNiBBTSwgVmlzaGFsIFZlcm1hIHdyb3RlOg0K
-PiA+ID4gPiANCj4gPiA+ID4gDQo+ID4gPiA+IEFsbCBJTyBpbiBhIGRheCBmaWxlc3lzdGVtIHVz
-ZWQgdG8gZ28gdGhyb3VnaCBkYXhfZG9faW8sIHdoaWNoDQo+ID4gPiA+IGNhbm5vdA0KPiA+ID4g
-PiBoYW5kbGUgbWVkaWEgZXJyb3JzLCBhbmQgdGh1cyBjYW5ub3QgcHJvdmlkZSBhIHJlY292ZXJ5
-IHBhdGgNCj4gPiA+ID4gdGhhdA0KPiA+ID4gPiBjYW4NCj4gPiA+ID4gc2VuZCBhIHdyaXRlIHRo
-cm91Z2ggdGhlIGRyaXZlciB0byBjbGVhciBlcnJvcnMuDQo+ID4gPiA+IA0KPiA+ID4gPiBBZGQg
-YSBuZXcgaW9jYiBmbGFnIGZvciBEQVgsIGFuZCBzZXQgaXQgb25seSBmb3IgREFYIG1vdW50cy4g
-SW4NCj4gPiA+ID4gdGhlDQo+ID4gPiA+IElPDQo+ID4gPiA+IHBhdGggZm9yIERBWCBmaWxlc3lz
-dGVtcywgdXNlIHRoZSBzYW1lIGRpcmVjdF9JTyBwYXRoIGZvciBib3RoDQo+ID4gPiA+IERBWA0K
-PiA+ID4gPiBhbmQNCj4gPiA+ID4gZGlyZWN0X2lvIGlvY2JzLCBidXQgdXNlIHRoZSBmbGFncyB0
-byBpZGVudGlmeSB3aGVuIHdlIGFyZSBpbg0KPiA+ID4gPiBPX0RJUkVDVA0KPiA+ID4gPiBtb2Rl
-IHZzIG5vbiBPX0RJUkVDVCB3aXRoIERBWCwgYW5kIGZvciBPX0RJUkVDVCwgdXNlIHRoZQ0KPiA+
-ID4gPiBjb252ZW50aW9uYWwNCj4gPiA+ID4gZGlyZWN0X0lPIHBhdGggaW5zdGVhZCBvZiBEQVgu
-DQo+ID4gPiA+IA0KPiA+ID4gUmVhbGx5PyBXaGF0IGFyZSB5b3VyIHRoaW5raW5nIGhlcmU/DQo+
-ID4gPiANCj4gPiA+IFdoYXQgYWJvdXQgYWxsIHRoZSBjdXJyZW50IHVzZXJzIG9mIE9fRElSRUNU
-LCB5b3UgaGF2ZSBqdXN0IG1hZGUNCj4gPiA+IHRoZW0NCj4gPiA+IDQgdGltZXMgc2xvd2VyIGFu
-ZCAibGVzcyBjb25jdXJyZW50KiIgdGhlbiAiYnVmZnJlZCBpbyIgdXNlcnMuDQo+ID4gPiBTaW5j
-ZQ0KPiA+ID4gZGlyZWN0X0lPIHBhdGggd2lsbCBxdWV1ZSBhbiBJTyByZXF1ZXN0IGFuZCBhbGwu
-DQo+ID4gPiAoQW5kIGlmIGl0IGlzIG5vdCBzbyBzbG93IHRoZW4gd2h5IGRvIHdlIG5lZWQgZGF4
-X2RvX2lvIGF0IGFsbD8NCj4gPiA+IFtSaGV0b3JpY2FsXSkNCj4gPiA+IA0KPiA+ID4gSSBoYXRl
-IGl0IHRoYXQgeW91IG92ZXJsb2FkIHRoZSBzZW1hbnRpY3Mgb2YgYSBrbm93biBhbmQgZXhwZWN0
-ZWQNCj4gPiA+IE9fRElSRUNUIGZsYWcsIGZvciBzcGVjaWFsIHBtZW0gcXVpcmtzLiBUaGlzIGlz
-IGFuIGluY29tcGF0aWJsZQ0KPiA+ID4gYW5kIHVucmVsYXRlZCBvdmVybG9hZCBvZiB0aGUgc2Vt
-YW50aWNzIG9mIE9fRElSRUNULg0KPiA+IFdlIG92ZXJsb2FkZWQgT19ESVJFQ1QgYSBsb25nIHRp
-bWUgYWdvIHdoZW4gd2UgbWFkZSBEQVggcGlnZ3liYWNrIG9uDQo+ID4gdGhlIHNhbWUgcGF0aDoN
-Cj4gPiANCj4gPiBzdGF0aWMgaW5saW5lIGJvb2wgaW9faXNfZGlyZWN0KHN0cnVjdCBmaWxlICpm
-aWxwKQ0KPiA+IHsNCj4gPiAJcmV0dXJuIChmaWxwLT5mX2ZsYWdzICYgT19ESVJFQ1QpIHx8IElT
-X0RBWChmaWxwLT5mX21hcHBpbmctDQo+ID4gPmhvc3QpOw0KPiA+IH0NCj4gPiANCj4gTm8gYXMg
-ZmFyIGFzIHRoZSB1c2VyIGlzIGNvbmNlcm5lZCB3ZSBoYXZlIG5vdC4gVGhlIE9fRElSRUNUIHVz
-ZXINCj4gaXMgc3RpbGwgZ2V0dGluZyBhbGwgdGhlIHNlbWFudGljcyBoZSB3YW50cywgLmkuZSBu
-byBzeW5jcyBubw0KPiBtZW1vcnkgY2FjaGUgdXNhZ2UsIG5vIGNvcGllcyAuLi4NCj4gDQo+IE9u
-bHkgd2l0aCBEQVggdGhlIGJ1ZmZlcmVkIElPIGlzIHRoZSBzYW1lIHNpbmNlIHdpdGggcG1lbSBp
-dCBpcw0KPiBmYXN0ZXIuDQo+IFRoZW4gd2h5IG5vdD8gVGhlIGJhc2ljIGNvbnRyYWN0IHdpdGgg
-dGhlIHVzZXIgZGlkIG5vdCBicmVhay4NCj4gDQo+IFRoZSBhYm92ZSB3YXMganVzdCBhbiBpbXBs
-ZW1lbnRhdGlvbiBkZXRhaWwgdG8gZWFzaWx5IG5hdmlnYXRlDQo+IHRocm91Z2ggdGhlIExpbnV4
-IHZmcyBJTyBzdGFjayBhbmQgbWFrZSB0aGUgbGVhc3QgYW1vdW50IG9mIGNoYW5nZXMNCj4gaW4g
-ZXZlcnkgRlMgdGhhdCB3YW50ZWQgdG8gc3VwcG9ydCBEQVguKEFuZCBzaW5jZSBkYXhfZG9faW8g
-aXMgbXVjaA0KPiBtb3JlIGxpa2UgZGlyZWN0X0lPIHRoZW4gbGlrZSBwYWdlLWNhY2hlIElPKQ0K
-PiANCj4gPiANCj4gPiBZZXMgT19ESVJFQ1Qgb24gYSBEQVggbW91bnRlZCBmaWxlIHN5c3RlbSB3
-aWxsIG5vdyBiZSBzbG93ZXIsIGJ1dCAtDQo+ID4gDQo+ID4gPiANCj4gPiA+IA0KPiA+ID4gPiAN
-Cj4gPiA+ID4gDQo+ID4gPiA+IFRoaXMgYWxsb3dzIHVzIGEgcmVjb3ZlcnkgcGF0aCBpbiB0aGUg
-Zm9ybSBvZiBvcGVuaW5nIHRoZSBmaWxlDQo+ID4gPiA+IHdpdGgNCj4gPiA+ID4gT19ESVJFQ1Qg
-YW5kIHdyaXRpbmcgdG8gaXQgd2l0aCB0aGUgdXN1YWwgT19ESVJFQ1Qgc2VtYW50aWNzDQo+ID4g
-PiA+IChzZWN0b3INCj4gPiA+ID4gYWxpZ25tZW50IHJlc3RyaWN0aW9ucykuDQo+ID4gPiA+IA0K
-PiA+ID4gSSB1bmRlcnN0YW5kIHRoYXQgeW91IHdhbnQgYSBzZWN0b3IgYWxpZ25lZCBJTywgcmln
-aHQ/IGZvciB0aGUNCj4gPiA+IGNsZWFyIG9mIGVycm9ycy4gQnV0IEkgaGF0ZSBpdCB0aGF0IHlv
-dSBmb3JjZWQgYWxsIE9fRElSRUNUIElPDQo+ID4gPiB0byBiZSBzbG93IGZvciB0aGlzLg0KPiA+
-ID4gQ2FuIHlvdSBub3QgbWFrZSBkYXhfZG9faW8gaGFuZGxlIG1lZGlhIGVycm9ycz8gQXQgbGVh
-c3QgZm9yIHRoZQ0KPiA+ID4gcGFydHMgb2YgdGhlIElPIHRoYXQgYXJlIGFsaWduZWQuDQo+ID4g
-PiAoQW5kIHlvdXIgcmVjb3ZlcnkgcGF0aCBhcHBsaWNhdGlvbiBhYm92ZSBjYW4gdXNlIG9ubHkg
-YWxpZ25lZA0KPiA+ID4gwqBJTyB0byBtYWtlIHN1cmUpDQo+ID4gPiANCj4gPiA+IFBsZWFzZSBs
-b29rIGZvciBhbm90aGVyIHNvbHV0aW9uLiBFdmVuIGEgc3BlY2lhbA0KPiA+ID4gSU9DVExfREFY
-X0NMRUFSX0VSUk9SDQo+ID4gwqAtIHNlZSBhbGwgdGhlIHZlcnNpb25zIG9mIHRoaXMgc2VyaWVz
-IHByaW9yIHRvIHRoaXMgb25lLCB3aGVyZSB3ZQ0KPiA+IHRyeQ0KPiA+IHRvIGRvIGEgZmFsbGJh
-Y2suLi4NCj4gPiANCj4gQW5kPw0KPiANCj4gU28gbm93IGFsbCBPX0RJUkVDVCBBUFBzIGdvIDQg
-dGltZXMgc2xvd2VyLiBJIHdpbGwgaGF2ZSBhIGxvb2sgYnV0IGlmDQo+IGl0IGlzIHJlYWxseSBz
-byBiYWQgdGhhbiBwbGVhc2UgY29uc2lkZXIgYW4gSU9DVEwgb3Igc3lzY2FsbC4gT3IgYQ0KPiBz
-cGVjaWFsDQo+IE9fREFYX0VSUk9SUyBmbGFnIC4uLg0KDQpJJ20gY3VyaW91cyB3aGVyZSB0aGUg
-NHggc2xvd2VyIGNvbWVzIGZyb20uLiBUaGUgT19ESVJFQ1QgcGF0aCBpcyBzdGlsbA0Kd2l0aG91
-dCBwYWdlLWNhY2hlIGNvcGllcywgYW5kIG5vciBkb2VzIGl0IGdvIHRocm91Z2ggcmVxdWVzdCBx
-dWV1ZXMNCihzaW5jZSBwbWVtIGlzIGEgYmlvLWJhc2VkIGRyaXZlcikuIFRoZSBvbmx5IG92ZXJo
-ZWFkIGlzIHRoYXQgb2YNCnN1Ym1pdHRpbmcgYSBiaW8gLSBhbmQgd2hpbGUgSSBhZ3JlZSBpdCBp
-cyBtb3JlIG92ZXJoZWFkIHRoYW4gZGF4X2RvX2lvLA0KNHggc2VlbXMgYSBiaXQgaGlnaC4NCg0K
-PiANCj4gUGxlYXNlIGRvIG5vdCB0cmFzaCBhbGwgdGhlIE9fRElSRUNUIHVzZXJzLCB0aGV5IGFy
-ZSB0aGUgbW9yZQ0KPiBpbXBvcnRhbnQNCj4gY2xpZW50cywgbGlrZSBEQnMgYW5kIFZNcy4NCg0K
-U2hvdWxkbid0IHRoZXkgYmUgdXNpbmcgbW1hcHMgYW5kIGRheCBmYXVsdHM/IEkgd2FzIHVuZGVy
-IHRoZSBpbXByZXNzaW9uDQp0aGF0IHRoZSBkYXhfZG9faW8gcGF0aCBpcyBhIG5pY2UtdG8taGF2
-ZSwgYnV0IGZvciBhbnlvbmUgdGhhdCB3aWxsIHdhbnQNCnRvIHVzZSBEQVgsIHRoZXkgd2lsbCB3
-YW50IHRoZSBtbWFwL2ZhdWx0IHBhdGgsIG5vdCB0aGUgSU8gcGF0aC4gVGhpcyBpcw0KanVzdCBt
-YWtpbmcgdGhlIElPIHBhdGggJ21vcmUgY29ycmVjdCcgYnkgYWxsb3dpbmcgaXQgYSB3YXkgdG8g
-ZGVhbCB3aXRoDQplcnJvcnMuDQoNCj4gDQo+IFRoYW5rcw0KPiBCb2F6DQo+IA0KPiA+IA0KPiA+
-ID4gDQo+ID4gPiANCj4gPiA+IFsqImxlc3MgY29uY3VycmVudCIgYmVjYXVzZSBvZiB0aGUgcXVl
-dWluZyBkb25lIGluIGJkZXYuIE5vdGUgaG93DQo+ID4gPiDCoCBwbWVtIGlzIG5vdCBldmVuIG11
-bHRpLXF1ZXVlLCBhbmQgZXZlbiBpZiBpdCB3YXMgaXQgd2lsbCBiZSBtdWNoDQo+ID4gPiDCoCBz
-bG93ZXIgdGhlbiBEQVggYmVjYXVzZSBvZiB0aGUgY29kZSBkZXB0aCBhbmQgYWxsIHRoZSBsb2Nr
-cyBhbmQNCj4gPiA+IHRhc2sNCj4gPiA+IMKgIHN3aXRjaGVzIGRvbmUgaW4gdGhlIGJsb2NrIGxh
-eWVyLiBJbiBEQVggdGhlIGZpbmFsIG1lbWNweSBpcw0KPiA+ID4gZG9uZQ0KPiA+ID4gZGlyZWN0
-bHkNCj4gPiA+IMKgIG9uIHRoZSB1c2VyLW1vZGUgdGhyZWFkXQ0KPiA+ID4gDQo+ID4gPiBUaGFu
-a3MNCj4gPiA+IEJvYXoNCj4gPiA+IA==
+On Mon, 2016-05-02 at 19:03 +0300, Boaz Harrosh wrote:
+> On 05/02/2016 06:51 PM, Vishal Verma wrote:
+> > 
+> > On Mon, 2016-05-02 at 18:41 +0300, Boaz Harrosh wrote:
+> > > 
+> > > On 04/29/2016 12:16 AM, Vishal Verma wrote:
+> > > > 
+> > > > 
+> > > > All IO in a dax filesystem used to go through dax_do_io, which
+> > > > cannot
+> > > > handle media errors, and thus cannot provide a recovery path
+> > > > that
+> > > > can
+> > > > send a write through the driver to clear errors.
+> > > > 
+> > > > Add a new iocb flag for DAX, and set it only for DAX mounts. In
+> > > > the
+> > > > IO
+> > > > path for DAX filesystems, use the same direct_IO path for both
+> > > > DAX
+> > > > and
+> > > > direct_io iocbs, but use the flags to identify when we are in
+> > > > O_DIRECT
+> > > > mode vs non O_DIRECT with DAX, and for O_DIRECT, use the
+> > > > conventional
+> > > > direct_IO path instead of DAX.
+> > > > 
+> > > Really? What are you thinking here?
+> > > 
+> > > What about all the current users of O_DIRECT, you have just made
+> > > them
+> > > 4 times slower and "less concurrent*" than "buffered io" users.
+> > > Since
+> > > direct_IO path will queue an IO request and all.
+> > > (And if it is not so slow then why do we need dax_do_io at all?
+> > > [Rhetorical])
+> > > 
+> > > I hate it that you overload the semantics of a known and expected
+> > > O_DIRECT flag, for special pmem quirks. This is an incompatible
+> > > and unrelated overload of the semantics of O_DIRECT.
+> > We overloaded O_DIRECT a long time ago when we made DAX piggyback on
+> > the same path:
+> > 
+> > static inline bool io_is_direct(struct file *filp)
+> > {
+> > 	return (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping-
+> > >host);
+> > }
+> > 
+> No as far as the user is concerned we have not. The O_DIRECT user
+> is still getting all the semantics he wants, i.e. no syncs, no
+> memory cache usage, no copies ...
+> 
+> Only with DAX the buffered IO is the same since with pmem it is
+> faster.
+> Then why not? The basic contract with the user did not break.
+> 
+> The above was just an implementation detail to easily navigate
+> through the Linux vfs IO stack and make the least amount of changes
+> in every FS that wanted to support DAX.(And since dax_do_io is much
+> more like direct_IO than like page-cache IO)
+> 
+> > 
+> > Yes O_DIRECT on a DAX mounted file system will now be slower, but -
+> > 
+> > > 
+> > > 
+> > > > 
+> > > > 
+> > > > This allows us a recovery path in the form of opening the file
+> > > > with
+> > > > O_DIRECT and writing to it with the usual O_DIRECT semantics
+> > > > (sector
+> > > > alignment restrictions).
+> > > > 
+> > > I understand that you want a sector aligned IO, right? for the
+> > > clear of errors. But I hate it that you forced all O_DIRECT IO
+> > > to be slow for this.
+> > > Can you not make dax_do_io handle media errors? At least for the
+> > > parts of the IO that are aligned.
+> > > (And your recovery path application above can use only aligned
+> > >  IO to make sure)
+> > > 
+> > > Please look for another solution. Even a special
+> > > IOCTL_DAX_CLEAR_ERROR
+> >  - see all the versions of this series prior to this one, where we
+> > try
+> > to do a fallback...
+> > 
+> And?
+> 
+> So now all O_DIRECT APPs go 4 times slower. I will have a look but if
+> it is really so bad then please consider an IOCTL or syscall. Or a
+> special
+> O_DAX_ERRORS flag ...
+
+I'm curious where the 4x slower comes from. The O_DIRECT path is still
+without page-cache copies, nor does it go through request queues
+(since pmem is a bio-based driver). The only overhead is that of
+submitting a bio - and while I agree it is more overhead than dax_do_io,
+4x seems a bit high.
+
+> 
+> Please do not trash all the O_DIRECT users, they are the more
+> important
+> clients, like DBs and VMs.
+
+Shouldn't they be using mmaps and dax faults? I was under the impression
+that the dax_do_io path is a nice-to-have, but for anyone that will want
+to use DAX, they will want the mmap/fault path, not the IO path. This is
+just making the IO path 'more correct' by allowing it a way to deal with
+errors.
+
+> 
+> Thanks
+> Boaz
+> 
+> > 
+> > > 
+> > > 
+> > > [*"less concurrent" because of the queuing done in bdev. Note how
+> > >   pmem is not even multi-queue, and even if it was it will be much
+> > >   slower than DAX because of the code depth and all the locks and
+> > > task
+> > >   switches done in the block layer. In DAX the final memcpy is
+> > > done
+> > > directly
+> > >   on the user-mode thread]
+> > > 
+> > > Thanks
+> > > Boaz
+> > > 
+_______________________________________________
+Linux-nvdimm mailing list
+Linux-nvdimm@lists.01.org
+https://lists.01.org/mailman/listinfo/linux-nvdimm
diff --git a/a/content_digest b/N2/content_digest
index 6fb5f8a..d54e34f 100644
--- a/a/content_digest
+++ b/N2/content_digest
@@ -8,102 +8,156 @@
  "Date\0Mon, 2 May 2016 18:52:02 +0000\0"
  "To\0linux-nvdimm@lists.01.org <linux-nvdimm@lists.01.org>"
  " boaz@plexistor.com <boaz@plexistor.com>\0"
- "Cc\0linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>"
-  linux-block@vger.kernel.org <linux-block@vger.kernel.org>
-  hch@infradead.org <hch@infradead.org>
+ "Cc\0hch@infradead.org <hch@infradead.org>"
+  jack@suse.cz <jack@suse.cz>
+  matthew@wil.cx <matthew@wil.cx>
+  axboe@fb.com <axboe@fb.com>
+  david@fromorbit.com <david@fromorbit.com>
+  linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>
   xfs@oss.sgi.com <xfs@oss.sgi.com>
+  linux-block@vger.kernel.org <linux-block@vger.kernel.org>
   linux-mm@kvack.org <linux-mm@kvack.org>
   viro@zeniv.linux.org.uk <viro@zeniv.linux.org.uk>
-  axboe@fb.com <axboe@fb.com>
-  akpm@linux-foundation.org <akpm@linux-foundation.org>
   linux-fsdevel@vger.kernel.org <linux-fsdevel@vger.kernel.org>
-  linux-ext4@vger.kernel.org <linux-ext4@vger.kernel.org>
-  david@fromorbit.com <david@fromorbit.com>
-  jack@suse.cz <jack@suse.cz>
- " matthew@wil.cx <matthew@wil.cx>\0"
+  akpm@linux-foundation.org <akpm@linux-foundation.org>
+ " linux-ext4@vger.kernel.org <linux-ext4@vger.kernel.org>\0"
  "\00:1\0"
  "b\0"
- "T24gTW9uLCAyMDE2LTA1LTAyIGF0IDE5OjAzICswMzAwLCBCb2F6IEhhcnJvc2ggd3JvdGU6DQo+\n"
- "IE9uIDA1LzAyLzIwMTYgMDY6NTEgUE0sIFZpc2hhbCBWZXJtYSB3cm90ZToNCj4gPiANCj4gPiBP\n"
- "biBNb24sIDIwMTYtMDUtMDIgYXQgMTg6NDEgKzAzMDAsIEJvYXogSGFycm9zaCB3cm90ZToNCj4g\n"
- "PiA+IA0KPiA+ID4gT24gMDQvMjkvMjAxNiAxMjoxNiBBTSwgVmlzaGFsIFZlcm1hIHdyb3RlOg0K\n"
- "PiA+ID4gPiANCj4gPiA+ID4gDQo+ID4gPiA+IEFsbCBJTyBpbiBhIGRheCBmaWxlc3lzdGVtIHVz\n"
- "ZWQgdG8gZ28gdGhyb3VnaCBkYXhfZG9faW8sIHdoaWNoDQo+ID4gPiA+IGNhbm5vdA0KPiA+ID4g\n"
- "PiBoYW5kbGUgbWVkaWEgZXJyb3JzLCBhbmQgdGh1cyBjYW5ub3QgcHJvdmlkZSBhIHJlY292ZXJ5\n"
- "IHBhdGgNCj4gPiA+ID4gdGhhdA0KPiA+ID4gPiBjYW4NCj4gPiA+ID4gc2VuZCBhIHdyaXRlIHRo\n"
- "cm91Z2ggdGhlIGRyaXZlciB0byBjbGVhciBlcnJvcnMuDQo+ID4gPiA+IA0KPiA+ID4gPiBBZGQg\n"
- "YSBuZXcgaW9jYiBmbGFnIGZvciBEQVgsIGFuZCBzZXQgaXQgb25seSBmb3IgREFYIG1vdW50cy4g\n"
- "SW4NCj4gPiA+ID4gdGhlDQo+ID4gPiA+IElPDQo+ID4gPiA+IHBhdGggZm9yIERBWCBmaWxlc3lz\n"
- "dGVtcywgdXNlIHRoZSBzYW1lIGRpcmVjdF9JTyBwYXRoIGZvciBib3RoDQo+ID4gPiA+IERBWA0K\n"
- "PiA+ID4gPiBhbmQNCj4gPiA+ID4gZGlyZWN0X2lvIGlvY2JzLCBidXQgdXNlIHRoZSBmbGFncyB0\n"
- "byBpZGVudGlmeSB3aGVuIHdlIGFyZSBpbg0KPiA+ID4gPiBPX0RJUkVDVA0KPiA+ID4gPiBtb2Rl\n"
- "IHZzIG5vbiBPX0RJUkVDVCB3aXRoIERBWCwgYW5kIGZvciBPX0RJUkVDVCwgdXNlIHRoZQ0KPiA+\n"
- "ID4gPiBjb252ZW50aW9uYWwNCj4gPiA+ID4gZGlyZWN0X0lPIHBhdGggaW5zdGVhZCBvZiBEQVgu\n"
- "DQo+ID4gPiA+IA0KPiA+ID4gUmVhbGx5PyBXaGF0IGFyZSB5b3VyIHRoaW5raW5nIGhlcmU/DQo+\n"
- "ID4gPiANCj4gPiA+IFdoYXQgYWJvdXQgYWxsIHRoZSBjdXJyZW50IHVzZXJzIG9mIE9fRElSRUNU\n"
- "LCB5b3UgaGF2ZSBqdXN0IG1hZGUNCj4gPiA+IHRoZW0NCj4gPiA+IDQgdGltZXMgc2xvd2VyIGFu\n"
- "ZCAibGVzcyBjb25jdXJyZW50KiIgdGhlbiAiYnVmZnJlZCBpbyIgdXNlcnMuDQo+ID4gPiBTaW5j\n"
- "ZQ0KPiA+ID4gZGlyZWN0X0lPIHBhdGggd2lsbCBxdWV1ZSBhbiBJTyByZXF1ZXN0IGFuZCBhbGwu\n"
- "DQo+ID4gPiAoQW5kIGlmIGl0IGlzIG5vdCBzbyBzbG93IHRoZW4gd2h5IGRvIHdlIG5lZWQgZGF4\n"
- "X2RvX2lvIGF0IGFsbD8NCj4gPiA+IFtSaGV0b3JpY2FsXSkNCj4gPiA+IA0KPiA+ID4gSSBoYXRl\n"
- "IGl0IHRoYXQgeW91IG92ZXJsb2FkIHRoZSBzZW1hbnRpY3Mgb2YgYSBrbm93biBhbmQgZXhwZWN0\n"
- "ZWQNCj4gPiA+IE9fRElSRUNUIGZsYWcsIGZvciBzcGVjaWFsIHBtZW0gcXVpcmtzLiBUaGlzIGlz\n"
- "IGFuIGluY29tcGF0aWJsZQ0KPiA+ID4gYW5kIHVucmVsYXRlZCBvdmVybG9hZCBvZiB0aGUgc2Vt\n"
- "YW50aWNzIG9mIE9fRElSRUNULg0KPiA+IFdlIG92ZXJsb2FkZWQgT19ESVJFQ1QgYSBsb25nIHRp\n"
- "bWUgYWdvIHdoZW4gd2UgbWFkZSBEQVggcGlnZ3liYWNrIG9uDQo+ID4gdGhlIHNhbWUgcGF0aDoN\n"
- "Cj4gPiANCj4gPiBzdGF0aWMgaW5saW5lIGJvb2wgaW9faXNfZGlyZWN0KHN0cnVjdCBmaWxlICpm\n"
- "aWxwKQ0KPiA+IHsNCj4gPiAJcmV0dXJuIChmaWxwLT5mX2ZsYWdzICYgT19ESVJFQ1QpIHx8IElT\n"
- "X0RBWChmaWxwLT5mX21hcHBpbmctDQo+ID4gPmhvc3QpOw0KPiA+IH0NCj4gPiANCj4gTm8gYXMg\n"
- "ZmFyIGFzIHRoZSB1c2VyIGlzIGNvbmNlcm5lZCB3ZSBoYXZlIG5vdC4gVGhlIE9fRElSRUNUIHVz\n"
- "ZXINCj4gaXMgc3RpbGwgZ2V0dGluZyBhbGwgdGhlIHNlbWFudGljcyBoZSB3YW50cywgLmkuZSBu\n"
- "byBzeW5jcyBubw0KPiBtZW1vcnkgY2FjaGUgdXNhZ2UsIG5vIGNvcGllcyAuLi4NCj4gDQo+IE9u\n"
- "bHkgd2l0aCBEQVggdGhlIGJ1ZmZlcmVkIElPIGlzIHRoZSBzYW1lIHNpbmNlIHdpdGggcG1lbSBp\n"
- "dCBpcw0KPiBmYXN0ZXIuDQo+IFRoZW4gd2h5IG5vdD8gVGhlIGJhc2ljIGNvbnRyYWN0IHdpdGgg\n"
- "dGhlIHVzZXIgZGlkIG5vdCBicmVhay4NCj4gDQo+IFRoZSBhYm92ZSB3YXMganVzdCBhbiBpbXBs\n"
- "ZW1lbnRhdGlvbiBkZXRhaWwgdG8gZWFzaWx5IG5hdmlnYXRlDQo+IHRocm91Z2ggdGhlIExpbnV4\n"
- "IHZmcyBJTyBzdGFjayBhbmQgbWFrZSB0aGUgbGVhc3QgYW1vdW50IG9mIGNoYW5nZXMNCj4gaW4g\n"
- "ZXZlcnkgRlMgdGhhdCB3YW50ZWQgdG8gc3VwcG9ydCBEQVguKEFuZCBzaW5jZSBkYXhfZG9faW8g\n"
- "aXMgbXVjaA0KPiBtb3JlIGxpa2UgZGlyZWN0X0lPIHRoZW4gbGlrZSBwYWdlLWNhY2hlIElPKQ0K\n"
- "PiANCj4gPiANCj4gPiBZZXMgT19ESVJFQ1Qgb24gYSBEQVggbW91bnRlZCBmaWxlIHN5c3RlbSB3\n"
- "aWxsIG5vdyBiZSBzbG93ZXIsIGJ1dCAtDQo+ID4gDQo+ID4gPiANCj4gPiA+IA0KPiA+ID4gPiAN\n"
- "Cj4gPiA+ID4gDQo+ID4gPiA+IFRoaXMgYWxsb3dzIHVzIGEgcmVjb3ZlcnkgcGF0aCBpbiB0aGUg\n"
- "Zm9ybSBvZiBvcGVuaW5nIHRoZSBmaWxlDQo+ID4gPiA+IHdpdGgNCj4gPiA+ID4gT19ESVJFQ1Qg\n"
- "YW5kIHdyaXRpbmcgdG8gaXQgd2l0aCB0aGUgdXN1YWwgT19ESVJFQ1Qgc2VtYW50aWNzDQo+ID4g\n"
- "PiA+IChzZWN0b3INCj4gPiA+ID4gYWxpZ25tZW50IHJlc3RyaWN0aW9ucykuDQo+ID4gPiA+IA0K\n"
- "PiA+ID4gSSB1bmRlcnN0YW5kIHRoYXQgeW91IHdhbnQgYSBzZWN0b3IgYWxpZ25lZCBJTywgcmln\n"
- "aHQ/IGZvciB0aGUNCj4gPiA+IGNsZWFyIG9mIGVycm9ycy4gQnV0IEkgaGF0ZSBpdCB0aGF0IHlv\n"
- "dSBmb3JjZWQgYWxsIE9fRElSRUNUIElPDQo+ID4gPiB0byBiZSBzbG93IGZvciB0aGlzLg0KPiA+\n"
- "ID4gQ2FuIHlvdSBub3QgbWFrZSBkYXhfZG9faW8gaGFuZGxlIG1lZGlhIGVycm9ycz8gQXQgbGVh\n"
- "c3QgZm9yIHRoZQ0KPiA+ID4gcGFydHMgb2YgdGhlIElPIHRoYXQgYXJlIGFsaWduZWQuDQo+ID4g\n"
- "PiAoQW5kIHlvdXIgcmVjb3ZlcnkgcGF0aCBhcHBsaWNhdGlvbiBhYm92ZSBjYW4gdXNlIG9ubHkg\n"
- "YWxpZ25lZA0KPiA+ID4gwqBJTyB0byBtYWtlIHN1cmUpDQo+ID4gPiANCj4gPiA+IFBsZWFzZSBs\n"
- "b29rIGZvciBhbm90aGVyIHNvbHV0aW9uLiBFdmVuIGEgc3BlY2lhbA0KPiA+ID4gSU9DVExfREFY\n"
- "X0NMRUFSX0VSUk9SDQo+ID4gwqAtIHNlZSBhbGwgdGhlIHZlcnNpb25zIG9mIHRoaXMgc2VyaWVz\n"
- "IHByaW9yIHRvIHRoaXMgb25lLCB3aGVyZSB3ZQ0KPiA+IHRyeQ0KPiA+IHRvIGRvIGEgZmFsbGJh\n"
- "Y2suLi4NCj4gPiANCj4gQW5kPw0KPiANCj4gU28gbm93IGFsbCBPX0RJUkVDVCBBUFBzIGdvIDQg\n"
- "dGltZXMgc2xvd2VyLiBJIHdpbGwgaGF2ZSBhIGxvb2sgYnV0IGlmDQo+IGl0IGlzIHJlYWxseSBz\n"
- "byBiYWQgdGhhbiBwbGVhc2UgY29uc2lkZXIgYW4gSU9DVEwgb3Igc3lzY2FsbC4gT3IgYQ0KPiBz\n"
- "cGVjaWFsDQo+IE9fREFYX0VSUk9SUyBmbGFnIC4uLg0KDQpJJ20gY3VyaW91cyB3aGVyZSB0aGUg\n"
- "NHggc2xvd2VyIGNvbWVzIGZyb20uLiBUaGUgT19ESVJFQ1QgcGF0aCBpcyBzdGlsbA0Kd2l0aG91\n"
- "dCBwYWdlLWNhY2hlIGNvcGllcywgYW5kIG5vciBkb2VzIGl0IGdvIHRocm91Z2ggcmVxdWVzdCBx\n"
- "dWV1ZXMNCihzaW5jZSBwbWVtIGlzIGEgYmlvLWJhc2VkIGRyaXZlcikuIFRoZSBvbmx5IG92ZXJo\n"
- "ZWFkIGlzIHRoYXQgb2YNCnN1Ym1pdHRpbmcgYSBiaW8gLSBhbmQgd2hpbGUgSSBhZ3JlZSBpdCBp\n"
- "cyBtb3JlIG92ZXJoZWFkIHRoYW4gZGF4X2RvX2lvLA0KNHggc2VlbXMgYSBiaXQgaGlnaC4NCg0K\n"
- "PiANCj4gUGxlYXNlIGRvIG5vdCB0cmFzaCBhbGwgdGhlIE9fRElSRUNUIHVzZXJzLCB0aGV5IGFy\n"
- "ZSB0aGUgbW9yZQ0KPiBpbXBvcnRhbnQNCj4gY2xpZW50cywgbGlrZSBEQnMgYW5kIFZNcy4NCg0K\n"
- "U2hvdWxkbid0IHRoZXkgYmUgdXNpbmcgbW1hcHMgYW5kIGRheCBmYXVsdHM/IEkgd2FzIHVuZGVy\n"
- "IHRoZSBpbXByZXNzaW9uDQp0aGF0IHRoZSBkYXhfZG9faW8gcGF0aCBpcyBhIG5pY2UtdG8taGF2\n"
- "ZSwgYnV0IGZvciBhbnlvbmUgdGhhdCB3aWxsIHdhbnQNCnRvIHVzZSBEQVgsIHRoZXkgd2lsbCB3\n"
- "YW50IHRoZSBtbWFwL2ZhdWx0IHBhdGgsIG5vdCB0aGUgSU8gcGF0aC4gVGhpcyBpcw0KanVzdCBt\n"
- "YWtpbmcgdGhlIElPIHBhdGggJ21vcmUgY29ycmVjdCcgYnkgYWxsb3dpbmcgaXQgYSB3YXkgdG8g\n"
- "ZGVhbCB3aXRoDQplcnJvcnMuDQoNCj4gDQo+IFRoYW5rcw0KPiBCb2F6DQo+IA0KPiA+IA0KPiA+\n"
- "ID4gDQo+ID4gPiANCj4gPiA+IFsqImxlc3MgY29uY3VycmVudCIgYmVjYXVzZSBvZiB0aGUgcXVl\n"
- "dWluZyBkb25lIGluIGJkZXYuIE5vdGUgaG93DQo+ID4gPiDCoCBwbWVtIGlzIG5vdCBldmVuIG11\n"
- "bHRpLXF1ZXVlLCBhbmQgZXZlbiBpZiBpdCB3YXMgaXQgd2lsbCBiZSBtdWNoDQo+ID4gPiDCoCBz\n"
- "bG93ZXIgdGhlbiBEQVggYmVjYXVzZSBvZiB0aGUgY29kZSBkZXB0aCBhbmQgYWxsIHRoZSBsb2Nr\n"
- "cyBhbmQNCj4gPiA+IHRhc2sNCj4gPiA+IMKgIHN3aXRjaGVzIGRvbmUgaW4gdGhlIGJsb2NrIGxh\n"
- "eWVyLiBJbiBEQVggdGhlIGZpbmFsIG1lbWNweSBpcw0KPiA+ID4gZG9uZQ0KPiA+ID4gZGlyZWN0\n"
- "bHkNCj4gPiA+IMKgIG9uIHRoZSB1c2VyLW1vZGUgdGhyZWFkXQ0KPiA+ID4gDQo+ID4gPiBUaGFu\n"
- a3MNCj4gPiA+IEJvYXoNCj4gPiA+IA==
+ "On Mon, 2016-05-02 at 19:03 +0300, Boaz Harrosh wrote:\n"
+ "> On 05/02/2016 06:51 PM, Vishal Verma wrote:\n"
+ "> > \n"
+ "> > On Mon, 2016-05-02 at 18:41 +0300, Boaz Harrosh wrote:\n"
+ "> > > \n"
+ "> > > On 04/29/2016 12:16 AM, Vishal Verma wrote:\n"
+ "> > > > \n"
+ "> > > > \n"
+ "> > > > All IO in a dax filesystem used to go through dax_do_io, which\n"
+ "> > > > cannot\n"
+ "> > > > handle media errors, and thus cannot provide a recovery path\n"
+ "> > > > that\n"
+ "> > > > can\n"
+ "> > > > send a write through the driver to clear errors.\n"
+ "> > > > \n"
+ "> > > > Add a new iocb flag for DAX, and set it only for DAX mounts. In\n"
+ "> > > > the\n"
+ "> > > > IO\n"
+ "> > > > path for DAX filesystems, use the same direct_IO path for both\n"
+ "> > > > DAX\n"
+ "> > > > and\n"
+ "> > > > direct_io iocbs, but use the flags to identify when we are in\n"
+ "> > > > O_DIRECT\n"
+ "> > > > mode vs non O_DIRECT with DAX, and for O_DIRECT, use the\n"
+ "> > > > conventional\n"
+ "> > > > direct_IO path instead of DAX.\n"
+ "> > > > \n"
+ "> > > Really? What are you thinking here?\n"
+ "> > > \n"
+ "> > > What about all the current users of O_DIRECT, you have just made\n"
+ "> > > them\n"
+ "> > > 4 times slower and \"less concurrent*\" than \"buffered io\" users.\n"
+ "> > > Since\n"
+ "> > > direct_IO path will queue an IO request and all.\n"
+ "> > > (And if it is not so slow then why do we need dax_do_io at all?\n"
+ "> > > [Rhetorical])\n"
+ "> > > \n"
+ "> > > I hate it that you overload the semantics of a known and expected\n"
+ "> > > O_DIRECT flag, for special pmem quirks. This is an incompatible\n"
+ "> > > and unrelated overload of the semantics of O_DIRECT.\n"
+ "> > We overloaded O_DIRECT a long time ago when we made DAX piggyback on\n"
+ "> > the same path:\n"
+ "> > \n"
+ "> > static inline bool io_is_direct(struct file *filp)\n"
+ "> > {\n"
+ "> > \treturn (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping-\n"
+ "> > >host);\n"
+ "> > }\n"
+ "> > \n"
+ "> No as far as the user is concerned we have not. The O_DIRECT user\n"
+ "> is still getting all the semantics he wants, i.e. no syncs, no\n"
+ "> memory cache usage, no copies ...\n"
+ "> \n"
+ "> Only with DAX the buffered IO is the same since with pmem it is\n"
+ "> faster.\n"
+ "> Then why not? The basic contract with the user did not break.\n"
+ "> \n"
+ "> The above was just an implementation detail to easily navigate\n"
+ "> through the Linux vfs IO stack and make the least amount of changes\n"
+ "> in every FS that wanted to support DAX.(And since dax_do_io is much\n"
+ "> more like direct_IO than like page-cache IO)\n"
+ "> \n"
+ "> > \n"
+ "> > Yes O_DIRECT on a DAX mounted file system will now be slower, but -\n"
+ "> > \n"
+ "> > > \n"
+ "> > > \n"
+ "> > > > \n"
+ "> > > > \n"
+ "> > > > This allows us a recovery path in the form of opening the file\n"
+ "> > > > with\n"
+ "> > > > O_DIRECT and writing to it with the usual O_DIRECT semantics\n"
+ "> > > > (sector\n"
+ "> > > > alignment restrictions).\n"
+ "> > > > \n"
+ "> > > I understand that you want a sector aligned IO, right? for the\n"
+ "> > > clear of errors. But I hate it that you forced all O_DIRECT IO\n"
+ "> > > to be slow for this.\n"
+ "> > > Can you not make dax_do_io handle media errors? At least for the\n"
+ "> > > parts of the IO that are aligned.\n"
+ "> > > (And your recovery path application above can use only aligned\n"
+ "> > > \302\240IO to make sure)\n"
+ "> > > \n"
+ "> > > Please look for another solution. Even a special\n"
+ "> > > IOCTL_DAX_CLEAR_ERROR\n"
+ "> > \302\240- see all the versions of this series prior to this one, where we\n"
+ "> > try\n"
+ "> > to do a fallback...\n"
+ "> > \n"
+ "> And?\n"
+ "> \n"
+ "> So now all O_DIRECT APPs go 4 times slower. I will have a look but if\n"
+ "> it is really so bad then please consider an IOCTL or syscall. Or a\n"
+ "> special\n"
+ "> O_DAX_ERRORS flag ...\n"
+ "\n"
+ "I'm curious where the 4x slower comes from. The O_DIRECT path is still\n"
+ "without page-cache copies, nor does it go through request queues\n"
+ "(since pmem is a bio-based driver). The only overhead is that of\n"
+ "submitting a bio - and while I agree it is more overhead than dax_do_io,\n"
+ "4x seems a bit high.\n"
+ "\n"
+ "> \n"
+ "> Please do not trash all the O_DIRECT users, they are the more\n"
+ "> important\n"
+ "> clients, like DBs and VMs.\n"
+ "\n"
+ "Shouldn't they be using mmaps and dax faults? I was under the impression\n"
+ "that the dax_do_io path is a nice-to-have, but for anyone that will want\n"
+ "to use DAX, they will want the mmap/fault path, not the IO path. This is\n"
+ "just making the IO path 'more correct' by allowing it a way to deal with\n"
+ "errors.\n"
+ "\n"
+ "> \n"
+ "> Thanks\n"
+ "> Boaz\n"
+ "> \n"
+ "> > \n"
+ "> > > \n"
+ "> > > \n"
+ "> > > [*\"less concurrent\" because of the queuing done in bdev. Note how\n"
+ "> > > \302\240 pmem is not even multi-queue, and even if it was it will be much\n"
+ "> > > \302\240 slower than DAX because of the code depth and all the locks and\n"
+ "> > > task\n"
+ "> > > \302\240 switches done in the block layer. In DAX the final memcpy is\n"
+ "> > > done\n"
+ "> > > directly\n"
+ "> > > \302\240 on the user-mode thread]\n"
+ "> > > \n"
+ "> > > Thanks\n"
+ "> > > Boaz\n"
+ "> > > \n"
+ "_______________________________________________\n"
+ "Linux-nvdimm mailing list\n"
+ "Linux-nvdimm@lists.01.org\n"
+ https://lists.01.org/mailman/listinfo/linux-nvdimm
 
-24869abd3ea9a39bba870c7d85f8910222fd85059cf703e4e503c13cf44d32f9
+10fd1ae6cddab73b2221b4ed39bdcbfabc6145d838b99905bfe21ccf5c24acce

diff --git a/a/1.txt b/N3/1.txt
index c608940..529b226 100644
--- a/a/1.txt
+++ b/N3/1.txt
@@ -1,82 +1,132 @@
-T24gTW9uLCAyMDE2LTA1LTAyIGF0IDE5OjAzICswMzAwLCBCb2F6IEhhcnJvc2ggd3JvdGU6DQo+
-IE9uIDA1LzAyLzIwMTYgMDY6NTEgUE0sIFZpc2hhbCBWZXJtYSB3cm90ZToNCj4gPiANCj4gPiBP
-biBNb24sIDIwMTYtMDUtMDIgYXQgMTg6NDEgKzAzMDAsIEJvYXogSGFycm9zaCB3cm90ZToNCj4g
-PiA+IA0KPiA+ID4gT24gMDQvMjkvMjAxNiAxMjoxNiBBTSwgVmlzaGFsIFZlcm1hIHdyb3RlOg0K
-PiA+ID4gPiANCj4gPiA+ID4gDQo+ID4gPiA+IEFsbCBJTyBpbiBhIGRheCBmaWxlc3lzdGVtIHVz
-ZWQgdG8gZ28gdGhyb3VnaCBkYXhfZG9faW8sIHdoaWNoDQo+ID4gPiA+IGNhbm5vdA0KPiA+ID4g
-PiBoYW5kbGUgbWVkaWEgZXJyb3JzLCBhbmQgdGh1cyBjYW5ub3QgcHJvdmlkZSBhIHJlY292ZXJ5
-IHBhdGgNCj4gPiA+ID4gdGhhdA0KPiA+ID4gPiBjYW4NCj4gPiA+ID4gc2VuZCBhIHdyaXRlIHRo
-cm91Z2ggdGhlIGRyaXZlciB0byBjbGVhciBlcnJvcnMuDQo+ID4gPiA+IA0KPiA+ID4gPiBBZGQg
-YSBuZXcgaW9jYiBmbGFnIGZvciBEQVgsIGFuZCBzZXQgaXQgb25seSBmb3IgREFYIG1vdW50cy4g
-SW4NCj4gPiA+ID4gdGhlDQo+ID4gPiA+IElPDQo+ID4gPiA+IHBhdGggZm9yIERBWCBmaWxlc3lz
-dGVtcywgdXNlIHRoZSBzYW1lIGRpcmVjdF9JTyBwYXRoIGZvciBib3RoDQo+ID4gPiA+IERBWA0K
-PiA+ID4gPiBhbmQNCj4gPiA+ID4gZGlyZWN0X2lvIGlvY2JzLCBidXQgdXNlIHRoZSBmbGFncyB0
-byBpZGVudGlmeSB3aGVuIHdlIGFyZSBpbg0KPiA+ID4gPiBPX0RJUkVDVA0KPiA+ID4gPiBtb2Rl
-IHZzIG5vbiBPX0RJUkVDVCB3aXRoIERBWCwgYW5kIGZvciBPX0RJUkVDVCwgdXNlIHRoZQ0KPiA+
-ID4gPiBjb252ZW50aW9uYWwNCj4gPiA+ID4gZGlyZWN0X0lPIHBhdGggaW5zdGVhZCBvZiBEQVgu
-DQo+ID4gPiA+IA0KPiA+ID4gUmVhbGx5PyBXaGF0IGFyZSB5b3VyIHRoaW5raW5nIGhlcmU/DQo+
-ID4gPiANCj4gPiA+IFdoYXQgYWJvdXQgYWxsIHRoZSBjdXJyZW50IHVzZXJzIG9mIE9fRElSRUNU
-LCB5b3UgaGF2ZSBqdXN0IG1hZGUNCj4gPiA+IHRoZW0NCj4gPiA+IDQgdGltZXMgc2xvd2VyIGFu
-ZCAibGVzcyBjb25jdXJyZW50KiIgdGhlbiAiYnVmZnJlZCBpbyIgdXNlcnMuDQo+ID4gPiBTaW5j
-ZQ0KPiA+ID4gZGlyZWN0X0lPIHBhdGggd2lsbCBxdWV1ZSBhbiBJTyByZXF1ZXN0IGFuZCBhbGwu
-DQo+ID4gPiAoQW5kIGlmIGl0IGlzIG5vdCBzbyBzbG93IHRoZW4gd2h5IGRvIHdlIG5lZWQgZGF4
-X2RvX2lvIGF0IGFsbD8NCj4gPiA+IFtSaGV0b3JpY2FsXSkNCj4gPiA+IA0KPiA+ID4gSSBoYXRl
-IGl0IHRoYXQgeW91IG92ZXJsb2FkIHRoZSBzZW1hbnRpY3Mgb2YgYSBrbm93biBhbmQgZXhwZWN0
-ZWQNCj4gPiA+IE9fRElSRUNUIGZsYWcsIGZvciBzcGVjaWFsIHBtZW0gcXVpcmtzLiBUaGlzIGlz
-IGFuIGluY29tcGF0aWJsZQ0KPiA+ID4gYW5kIHVucmVsYXRlZCBvdmVybG9hZCBvZiB0aGUgc2Vt
-YW50aWNzIG9mIE9fRElSRUNULg0KPiA+IFdlIG92ZXJsb2FkZWQgT19ESVJFQ1QgYSBsb25nIHRp
-bWUgYWdvIHdoZW4gd2UgbWFkZSBEQVggcGlnZ3liYWNrIG9uDQo+ID4gdGhlIHNhbWUgcGF0aDoN
-Cj4gPiANCj4gPiBzdGF0aWMgaW5saW5lIGJvb2wgaW9faXNfZGlyZWN0KHN0cnVjdCBmaWxlICpm
-aWxwKQ0KPiA+IHsNCj4gPiAJcmV0dXJuIChmaWxwLT5mX2ZsYWdzICYgT19ESVJFQ1QpIHx8IElT
-X0RBWChmaWxwLT5mX21hcHBpbmctDQo+ID4gPmhvc3QpOw0KPiA+IH0NCj4gPiANCj4gTm8gYXMg
-ZmFyIGFzIHRoZSB1c2VyIGlzIGNvbmNlcm5lZCB3ZSBoYXZlIG5vdC4gVGhlIE9fRElSRUNUIHVz
-ZXINCj4gaXMgc3RpbGwgZ2V0dGluZyBhbGwgdGhlIHNlbWFudGljcyBoZSB3YW50cywgLmkuZSBu
-byBzeW5jcyBubw0KPiBtZW1vcnkgY2FjaGUgdXNhZ2UsIG5vIGNvcGllcyAuLi4NCj4gDQo+IE9u
-bHkgd2l0aCBEQVggdGhlIGJ1ZmZlcmVkIElPIGlzIHRoZSBzYW1lIHNpbmNlIHdpdGggcG1lbSBp
-dCBpcw0KPiBmYXN0ZXIuDQo+IFRoZW4gd2h5IG5vdD8gVGhlIGJhc2ljIGNvbnRyYWN0IHdpdGgg
-dGhlIHVzZXIgZGlkIG5vdCBicmVhay4NCj4gDQo+IFRoZSBhYm92ZSB3YXMganVzdCBhbiBpbXBs
-ZW1lbnRhdGlvbiBkZXRhaWwgdG8gZWFzaWx5IG5hdmlnYXRlDQo+IHRocm91Z2ggdGhlIExpbnV4
-IHZmcyBJTyBzdGFjayBhbmQgbWFrZSB0aGUgbGVhc3QgYW1vdW50IG9mIGNoYW5nZXMNCj4gaW4g
-ZXZlcnkgRlMgdGhhdCB3YW50ZWQgdG8gc3VwcG9ydCBEQVguKEFuZCBzaW5jZSBkYXhfZG9faW8g
-aXMgbXVjaA0KPiBtb3JlIGxpa2UgZGlyZWN0X0lPIHRoZW4gbGlrZSBwYWdlLWNhY2hlIElPKQ0K
-PiANCj4gPiANCj4gPiBZZXMgT19ESVJFQ1Qgb24gYSBEQVggbW91bnRlZCBmaWxlIHN5c3RlbSB3
-aWxsIG5vdyBiZSBzbG93ZXIsIGJ1dCAtDQo+ID4gDQo+ID4gPiANCj4gPiA+IA0KPiA+ID4gPiAN
-Cj4gPiA+ID4gDQo+ID4gPiA+IFRoaXMgYWxsb3dzIHVzIGEgcmVjb3ZlcnkgcGF0aCBpbiB0aGUg
-Zm9ybSBvZiBvcGVuaW5nIHRoZSBmaWxlDQo+ID4gPiA+IHdpdGgNCj4gPiA+ID4gT19ESVJFQ1Qg
-YW5kIHdyaXRpbmcgdG8gaXQgd2l0aCB0aGUgdXN1YWwgT19ESVJFQ1Qgc2VtYW50aWNzDQo+ID4g
-PiA+IChzZWN0b3INCj4gPiA+ID4gYWxpZ25tZW50IHJlc3RyaWN0aW9ucykuDQo+ID4gPiA+IA0K
-PiA+ID4gSSB1bmRlcnN0YW5kIHRoYXQgeW91IHdhbnQgYSBzZWN0b3IgYWxpZ25lZCBJTywgcmln
-aHQ/IGZvciB0aGUNCj4gPiA+IGNsZWFyIG9mIGVycm9ycy4gQnV0IEkgaGF0ZSBpdCB0aGF0IHlv
-dSBmb3JjZWQgYWxsIE9fRElSRUNUIElPDQo+ID4gPiB0byBiZSBzbG93IGZvciB0aGlzLg0KPiA+
-ID4gQ2FuIHlvdSBub3QgbWFrZSBkYXhfZG9faW8gaGFuZGxlIG1lZGlhIGVycm9ycz8gQXQgbGVh
-c3QgZm9yIHRoZQ0KPiA+ID4gcGFydHMgb2YgdGhlIElPIHRoYXQgYXJlIGFsaWduZWQuDQo+ID4g
-PiAoQW5kIHlvdXIgcmVjb3ZlcnkgcGF0aCBhcHBsaWNhdGlvbiBhYm92ZSBjYW4gdXNlIG9ubHkg
-YWxpZ25lZA0KPiA+ID4gwqBJTyB0byBtYWtlIHN1cmUpDQo+ID4gPiANCj4gPiA+IFBsZWFzZSBs
-b29rIGZvciBhbm90aGVyIHNvbHV0aW9uLiBFdmVuIGEgc3BlY2lhbA0KPiA+ID4gSU9DVExfREFY
-X0NMRUFSX0VSUk9SDQo+ID4gwqAtIHNlZSBhbGwgdGhlIHZlcnNpb25zIG9mIHRoaXMgc2VyaWVz
-IHByaW9yIHRvIHRoaXMgb25lLCB3aGVyZSB3ZQ0KPiA+IHRyeQ0KPiA+IHRvIGRvIGEgZmFsbGJh
-Y2suLi4NCj4gPiANCj4gQW5kPw0KPiANCj4gU28gbm93IGFsbCBPX0RJUkVDVCBBUFBzIGdvIDQg
-dGltZXMgc2xvd2VyLiBJIHdpbGwgaGF2ZSBhIGxvb2sgYnV0IGlmDQo+IGl0IGlzIHJlYWxseSBz
-byBiYWQgdGhhbiBwbGVhc2UgY29uc2lkZXIgYW4gSU9DVEwgb3Igc3lzY2FsbC4gT3IgYQ0KPiBz
-cGVjaWFsDQo+IE9fREFYX0VSUk9SUyBmbGFnIC4uLg0KDQpJJ20gY3VyaW91cyB3aGVyZSB0aGUg
-NHggc2xvd2VyIGNvbWVzIGZyb20uLiBUaGUgT19ESVJFQ1QgcGF0aCBpcyBzdGlsbA0Kd2l0aG91
-dCBwYWdlLWNhY2hlIGNvcGllcywgYW5kIG5vciBkb2VzIGl0IGdvIHRocm91Z2ggcmVxdWVzdCBx
-dWV1ZXMNCihzaW5jZSBwbWVtIGlzIGEgYmlvLWJhc2VkIGRyaXZlcikuIFRoZSBvbmx5IG92ZXJo
-ZWFkIGlzIHRoYXQgb2YNCnN1Ym1pdHRpbmcgYSBiaW8gLSBhbmQgd2hpbGUgSSBhZ3JlZSBpdCBp
-cyBtb3JlIG92ZXJoZWFkIHRoYW4gZGF4X2RvX2lvLA0KNHggc2VlbXMgYSBiaXQgaGlnaC4NCg0K
-PiANCj4gUGxlYXNlIGRvIG5vdCB0cmFzaCBhbGwgdGhlIE9fRElSRUNUIHVzZXJzLCB0aGV5IGFy
-ZSB0aGUgbW9yZQ0KPiBpbXBvcnRhbnQNCj4gY2xpZW50cywgbGlrZSBEQnMgYW5kIFZNcy4NCg0K
-U2hvdWxkbid0IHRoZXkgYmUgdXNpbmcgbW1hcHMgYW5kIGRheCBmYXVsdHM/IEkgd2FzIHVuZGVy
-IHRoZSBpbXByZXNzaW9uDQp0aGF0IHRoZSBkYXhfZG9faW8gcGF0aCBpcyBhIG5pY2UtdG8taGF2
-ZSwgYnV0IGZvciBhbnlvbmUgdGhhdCB3aWxsIHdhbnQNCnRvIHVzZSBEQVgsIHRoZXkgd2lsbCB3
-YW50IHRoZSBtbWFwL2ZhdWx0IHBhdGgsIG5vdCB0aGUgSU8gcGF0aC4gVGhpcyBpcw0KanVzdCBt
-YWtpbmcgdGhlIElPIHBhdGggJ21vcmUgY29ycmVjdCcgYnkgYWxsb3dpbmcgaXQgYSB3YXkgdG8g
-ZGVhbCB3aXRoDQplcnJvcnMuDQoNCj4gDQo+IFRoYW5rcw0KPiBCb2F6DQo+IA0KPiA+IA0KPiA+
-ID4gDQo+ID4gPiANCj4gPiA+IFsqImxlc3MgY29uY3VycmVudCIgYmVjYXVzZSBvZiB0aGUgcXVl
-dWluZyBkb25lIGluIGJkZXYuIE5vdGUgaG93DQo+ID4gPiDCoCBwbWVtIGlzIG5vdCBldmVuIG11
-bHRpLXF1ZXVlLCBhbmQgZXZlbiBpZiBpdCB3YXMgaXQgd2lsbCBiZSBtdWNoDQo+ID4gPiDCoCBz
-bG93ZXIgdGhlbiBEQVggYmVjYXVzZSBvZiB0aGUgY29kZSBkZXB0aCBhbmQgYWxsIHRoZSBsb2Nr
-cyBhbmQNCj4gPiA+IHRhc2sNCj4gPiA+IMKgIHN3aXRjaGVzIGRvbmUgaW4gdGhlIGJsb2NrIGxh
-eWVyLiBJbiBEQVggdGhlIGZpbmFsIG1lbWNweSBpcw0KPiA+ID4gZG9uZQ0KPiA+ID4gZGlyZWN0
-bHkNCj4gPiA+IMKgIG9uIHRoZSB1c2VyLW1vZGUgdGhyZWFkXQ0KPiA+ID4gDQo+ID4gPiBUaGFu
-a3MNCj4gPiA+IEJvYXoNCj4gPiA+IA==
+On Mon, 2016-05-02 at 19:03 +0300, Boaz Harrosh wrote:
+> On 05/02/2016 06:51 PM, Vishal Verma wrote:
+> > 
+> > On Mon, 2016-05-02 at 18:41 +0300, Boaz Harrosh wrote:
+> > > 
+> > > On 04/29/2016 12:16 AM, Vishal Verma wrote:
+> > > > 
+> > > > 
+> > > > All IO in a dax filesystem used to go through dax_do_io, which
+> > > > cannot
+> > > > handle media errors, and thus cannot provide a recovery path
+> > > > that
+> > > > can
+> > > > send a write through the driver to clear errors.
+> > > > 
+> > > > Add a new iocb flag for DAX, and set it only for DAX mounts. In
+> > > > the
+> > > > IO
+> > > > path for DAX filesystems, use the same direct_IO path for both
+> > > > DAX
+> > > > and
+> > > > direct_io iocbs, but use the flags to identify when we are in
+> > > > O_DIRECT
+> > > > mode vs non O_DIRECT with DAX, and for O_DIRECT, use the
+> > > > conventional
+> > > > direct_IO path instead of DAX.
+> > > > 
+> > > Really? What are you thinking here?
+> > > 
+> > > What about all the current users of O_DIRECT, you have just made
+> > > them
+> > > 4 times slower and "less concurrent*" than "buffered io" users.
+> > > Since
+> > > direct_IO path will queue an IO request and all.
+> > > (And if it is not so slow then why do we need dax_do_io at all?
+> > > [Rhetorical])
+> > > 
+> > > I hate it that you overload the semantics of a known and expected
+> > > O_DIRECT flag, for special pmem quirks. This is an incompatible
+> > > and unrelated overload of the semantics of O_DIRECT.
+> > We overloaded O_DIRECT a long time ago when we made DAX piggyback on
+> > the same path:
+> > 
+> > static inline bool io_is_direct(struct file *filp)
+> > {
+> > 	return (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping-
+> > >host);
+> > }
+> > 
+> No as far as the user is concerned we have not. The O_DIRECT user
+> is still getting all the semantics he wants, i.e. no syncs, no
+> memory cache usage, no copies ...
+> 
+> Only with DAX the buffered IO is the same since with pmem it is
+> faster.
+> Then why not? The basic contract with the user did not break.
+> 
+> The above was just an implementation detail to easily navigate
+> through the Linux vfs IO stack and make the least amount of changes
+> in every FS that wanted to support DAX.(And since dax_do_io is much
+> more like direct_IO than like page-cache IO)
+> 
+> > 
+> > Yes O_DIRECT on a DAX mounted file system will now be slower, but -
+> > 
+> > > 
+> > > 
+> > > > 
+> > > > 
+> > > > This allows us a recovery path in the form of opening the file
+> > > > with
+> > > > O_DIRECT and writing to it with the usual O_DIRECT semantics
+> > > > (sector
+> > > > alignment restrictions).
+> > > > 
+> > > I understand that you want a sector aligned IO, right? for the
+> > > clear of errors. But I hate it that you forced all O_DIRECT IO
+> > > to be slow for this.
+> > > Can you not make dax_do_io handle media errors? At least for the
+> > > parts of the IO that are aligned.
+> > > (And your recovery path application above can use only aligned
+> > >  IO to make sure)
+> > > 
+> > > Please look for another solution. Even a special
+> > > IOCTL_DAX_CLEAR_ERROR
+> >  - see all the versions of this series prior to this one, where we
+> > try
+> > to do a fallback...
+> > 
+> And?
+> 
+> So now all O_DIRECT APPs go 4 times slower. I will have a look but if
+> it is really so bad then please consider an IOCTL or syscall. Or a
+> special
+> O_DAX_ERRORS flag ...
+
+I'm curious where the 4x slower comes from. The O_DIRECT path is still
+without page-cache copies, nor does it go through request queues
+(since pmem is a bio-based driver). The only overhead is that of
+submitting a bio - and while I agree it is more overhead than dax_do_io,
+4x seems a bit high.
+
+> 
+> Please do not trash all the O_DIRECT users, they are the more
+> important
+> clients, like DBs and VMs.
+
+Shouldn't they be using mmaps and dax faults? I was under the impression
+that the dax_do_io path is a nice-to-have, but for anyone that will want
+to use DAX, they will want the mmap/fault path, not the IO path. This is
+just making the IO path 'more correct' by allowing it a way to deal with
+errors.
+
+> 
+> Thanks
+> Boaz
+> 
+> > 
+> > > 
+> > > 
+> > > [*"less concurrent" because of the queuing done in bdev. Note how
+> > >   pmem is not even multi-queue, and even if it was it will be much
+> > >   slower than DAX because of the code depth and all the locks and
+> > > task
+> > >   switches done in the block layer. In DAX the final memcpy is
+> > > done
+> > > directly
+> > >   on the user-mode thread]
+> > > 
+> > > Thanks
+> > > Boaz
+> > >
diff --git a/a/content_digest b/N3/content_digest
index 6fb5f8a..69d8743 100644
--- a/a/content_digest
+++ b/N3/content_digest
@@ -23,87 +23,137 @@
  " matthew@wil.cx <matthew@wil.cx>\0"
  "\00:1\0"
  "b\0"
- "T24gTW9uLCAyMDE2LTA1LTAyIGF0IDE5OjAzICswMzAwLCBCb2F6IEhhcnJvc2ggd3JvdGU6DQo+\n"
- "IE9uIDA1LzAyLzIwMTYgMDY6NTEgUE0sIFZpc2hhbCBWZXJtYSB3cm90ZToNCj4gPiANCj4gPiBP\n"
- "biBNb24sIDIwMTYtMDUtMDIgYXQgMTg6NDEgKzAzMDAsIEJvYXogSGFycm9zaCB3cm90ZToNCj4g\n"
- "PiA+IA0KPiA+ID4gT24gMDQvMjkvMjAxNiAxMjoxNiBBTSwgVmlzaGFsIFZlcm1hIHdyb3RlOg0K\n"
- "PiA+ID4gPiANCj4gPiA+ID4gDQo+ID4gPiA+IEFsbCBJTyBpbiBhIGRheCBmaWxlc3lzdGVtIHVz\n"
- "ZWQgdG8gZ28gdGhyb3VnaCBkYXhfZG9faW8sIHdoaWNoDQo+ID4gPiA+IGNhbm5vdA0KPiA+ID4g\n"
- "PiBoYW5kbGUgbWVkaWEgZXJyb3JzLCBhbmQgdGh1cyBjYW5ub3QgcHJvdmlkZSBhIHJlY292ZXJ5\n"
- "IHBhdGgNCj4gPiA+ID4gdGhhdA0KPiA+ID4gPiBjYW4NCj4gPiA+ID4gc2VuZCBhIHdyaXRlIHRo\n"
- "cm91Z2ggdGhlIGRyaXZlciB0byBjbGVhciBlcnJvcnMuDQo+ID4gPiA+IA0KPiA+ID4gPiBBZGQg\n"
- "YSBuZXcgaW9jYiBmbGFnIGZvciBEQVgsIGFuZCBzZXQgaXQgb25seSBmb3IgREFYIG1vdW50cy4g\n"
- "SW4NCj4gPiA+ID4gdGhlDQo+ID4gPiA+IElPDQo+ID4gPiA+IHBhdGggZm9yIERBWCBmaWxlc3lz\n"
- "dGVtcywgdXNlIHRoZSBzYW1lIGRpcmVjdF9JTyBwYXRoIGZvciBib3RoDQo+ID4gPiA+IERBWA0K\n"
- "PiA+ID4gPiBhbmQNCj4gPiA+ID4gZGlyZWN0X2lvIGlvY2JzLCBidXQgdXNlIHRoZSBmbGFncyB0\n"
- "byBpZGVudGlmeSB3aGVuIHdlIGFyZSBpbg0KPiA+ID4gPiBPX0RJUkVDVA0KPiA+ID4gPiBtb2Rl\n"
- "IHZzIG5vbiBPX0RJUkVDVCB3aXRoIERBWCwgYW5kIGZvciBPX0RJUkVDVCwgdXNlIHRoZQ0KPiA+\n"
- "ID4gPiBjb252ZW50aW9uYWwNCj4gPiA+ID4gZGlyZWN0X0lPIHBhdGggaW5zdGVhZCBvZiBEQVgu\n"
- "DQo+ID4gPiA+IA0KPiA+ID4gUmVhbGx5PyBXaGF0IGFyZSB5b3VyIHRoaW5raW5nIGhlcmU/DQo+\n"
- "ID4gPiANCj4gPiA+IFdoYXQgYWJvdXQgYWxsIHRoZSBjdXJyZW50IHVzZXJzIG9mIE9fRElSRUNU\n"
- "LCB5b3UgaGF2ZSBqdXN0IG1hZGUNCj4gPiA+IHRoZW0NCj4gPiA+IDQgdGltZXMgc2xvd2VyIGFu\n"
- "ZCAibGVzcyBjb25jdXJyZW50KiIgdGhlbiAiYnVmZnJlZCBpbyIgdXNlcnMuDQo+ID4gPiBTaW5j\n"
- "ZQ0KPiA+ID4gZGlyZWN0X0lPIHBhdGggd2lsbCBxdWV1ZSBhbiBJTyByZXF1ZXN0IGFuZCBhbGwu\n"
- "DQo+ID4gPiAoQW5kIGlmIGl0IGlzIG5vdCBzbyBzbG93IHRoZW4gd2h5IGRvIHdlIG5lZWQgZGF4\n"
- "X2RvX2lvIGF0IGFsbD8NCj4gPiA+IFtSaGV0b3JpY2FsXSkNCj4gPiA+IA0KPiA+ID4gSSBoYXRl\n"
- "IGl0IHRoYXQgeW91IG92ZXJsb2FkIHRoZSBzZW1hbnRpY3Mgb2YgYSBrbm93biBhbmQgZXhwZWN0\n"
- "ZWQNCj4gPiA+IE9fRElSRUNUIGZsYWcsIGZvciBzcGVjaWFsIHBtZW0gcXVpcmtzLiBUaGlzIGlz\n"
- "IGFuIGluY29tcGF0aWJsZQ0KPiA+ID4gYW5kIHVucmVsYXRlZCBvdmVybG9hZCBvZiB0aGUgc2Vt\n"
- "YW50aWNzIG9mIE9fRElSRUNULg0KPiA+IFdlIG92ZXJsb2FkZWQgT19ESVJFQ1QgYSBsb25nIHRp\n"
- "bWUgYWdvIHdoZW4gd2UgbWFkZSBEQVggcGlnZ3liYWNrIG9uDQo+ID4gdGhlIHNhbWUgcGF0aDoN\n"
- "Cj4gPiANCj4gPiBzdGF0aWMgaW5saW5lIGJvb2wgaW9faXNfZGlyZWN0KHN0cnVjdCBmaWxlICpm\n"
- "aWxwKQ0KPiA+IHsNCj4gPiAJcmV0dXJuIChmaWxwLT5mX2ZsYWdzICYgT19ESVJFQ1QpIHx8IElT\n"
- "X0RBWChmaWxwLT5mX21hcHBpbmctDQo+ID4gPmhvc3QpOw0KPiA+IH0NCj4gPiANCj4gTm8gYXMg\n"
- "ZmFyIGFzIHRoZSB1c2VyIGlzIGNvbmNlcm5lZCB3ZSBoYXZlIG5vdC4gVGhlIE9fRElSRUNUIHVz\n"
- "ZXINCj4gaXMgc3RpbGwgZ2V0dGluZyBhbGwgdGhlIHNlbWFudGljcyBoZSB3YW50cywgLmkuZSBu\n"
- "byBzeW5jcyBubw0KPiBtZW1vcnkgY2FjaGUgdXNhZ2UsIG5vIGNvcGllcyAuLi4NCj4gDQo+IE9u\n"
- "bHkgd2l0aCBEQVggdGhlIGJ1ZmZlcmVkIElPIGlzIHRoZSBzYW1lIHNpbmNlIHdpdGggcG1lbSBp\n"
- "dCBpcw0KPiBmYXN0ZXIuDQo+IFRoZW4gd2h5IG5vdD8gVGhlIGJhc2ljIGNvbnRyYWN0IHdpdGgg\n"
- "dGhlIHVzZXIgZGlkIG5vdCBicmVhay4NCj4gDQo+IFRoZSBhYm92ZSB3YXMganVzdCBhbiBpbXBs\n"
- "ZW1lbnRhdGlvbiBkZXRhaWwgdG8gZWFzaWx5IG5hdmlnYXRlDQo+IHRocm91Z2ggdGhlIExpbnV4\n"
- "IHZmcyBJTyBzdGFjayBhbmQgbWFrZSB0aGUgbGVhc3QgYW1vdW50IG9mIGNoYW5nZXMNCj4gaW4g\n"
- "ZXZlcnkgRlMgdGhhdCB3YW50ZWQgdG8gc3VwcG9ydCBEQVguKEFuZCBzaW5jZSBkYXhfZG9faW8g\n"
- "aXMgbXVjaA0KPiBtb3JlIGxpa2UgZGlyZWN0X0lPIHRoZW4gbGlrZSBwYWdlLWNhY2hlIElPKQ0K\n"
- "PiANCj4gPiANCj4gPiBZZXMgT19ESVJFQ1Qgb24gYSBEQVggbW91bnRlZCBmaWxlIHN5c3RlbSB3\n"
- "aWxsIG5vdyBiZSBzbG93ZXIsIGJ1dCAtDQo+ID4gDQo+ID4gPiANCj4gPiA+IA0KPiA+ID4gPiAN\n"
- "Cj4gPiA+ID4gDQo+ID4gPiA+IFRoaXMgYWxsb3dzIHVzIGEgcmVjb3ZlcnkgcGF0aCBpbiB0aGUg\n"
- "Zm9ybSBvZiBvcGVuaW5nIHRoZSBmaWxlDQo+ID4gPiA+IHdpdGgNCj4gPiA+ID4gT19ESVJFQ1Qg\n"
- "YW5kIHdyaXRpbmcgdG8gaXQgd2l0aCB0aGUgdXN1YWwgT19ESVJFQ1Qgc2VtYW50aWNzDQo+ID4g\n"
- "PiA+IChzZWN0b3INCj4gPiA+ID4gYWxpZ25tZW50IHJlc3RyaWN0aW9ucykuDQo+ID4gPiA+IA0K\n"
- "PiA+ID4gSSB1bmRlcnN0YW5kIHRoYXQgeW91IHdhbnQgYSBzZWN0b3IgYWxpZ25lZCBJTywgcmln\n"
- "aHQ/IGZvciB0aGUNCj4gPiA+IGNsZWFyIG9mIGVycm9ycy4gQnV0IEkgaGF0ZSBpdCB0aGF0IHlv\n"
- "dSBmb3JjZWQgYWxsIE9fRElSRUNUIElPDQo+ID4gPiB0byBiZSBzbG93IGZvciB0aGlzLg0KPiA+\n"
- "ID4gQ2FuIHlvdSBub3QgbWFrZSBkYXhfZG9faW8gaGFuZGxlIG1lZGlhIGVycm9ycz8gQXQgbGVh\n"
- "c3QgZm9yIHRoZQ0KPiA+ID4gcGFydHMgb2YgdGhlIElPIHRoYXQgYXJlIGFsaWduZWQuDQo+ID4g\n"
- "PiAoQW5kIHlvdXIgcmVjb3ZlcnkgcGF0aCBhcHBsaWNhdGlvbiBhYm92ZSBjYW4gdXNlIG9ubHkg\n"
- "YWxpZ25lZA0KPiA+ID4gwqBJTyB0byBtYWtlIHN1cmUpDQo+ID4gPiANCj4gPiA+IFBsZWFzZSBs\n"
- "b29rIGZvciBhbm90aGVyIHNvbHV0aW9uLiBFdmVuIGEgc3BlY2lhbA0KPiA+ID4gSU9DVExfREFY\n"
- "X0NMRUFSX0VSUk9SDQo+ID4gwqAtIHNlZSBhbGwgdGhlIHZlcnNpb25zIG9mIHRoaXMgc2VyaWVz\n"
- "IHByaW9yIHRvIHRoaXMgb25lLCB3aGVyZSB3ZQ0KPiA+IHRyeQ0KPiA+IHRvIGRvIGEgZmFsbGJh\n"
- "Y2suLi4NCj4gPiANCj4gQW5kPw0KPiANCj4gU28gbm93IGFsbCBPX0RJUkVDVCBBUFBzIGdvIDQg\n"
- "dGltZXMgc2xvd2VyLiBJIHdpbGwgaGF2ZSBhIGxvb2sgYnV0IGlmDQo+IGl0IGlzIHJlYWxseSBz\n"
- "byBiYWQgdGhhbiBwbGVhc2UgY29uc2lkZXIgYW4gSU9DVEwgb3Igc3lzY2FsbC4gT3IgYQ0KPiBz\n"
- "cGVjaWFsDQo+IE9fREFYX0VSUk9SUyBmbGFnIC4uLg0KDQpJJ20gY3VyaW91cyB3aGVyZSB0aGUg\n"
- "NHggc2xvd2VyIGNvbWVzIGZyb20uLiBUaGUgT19ESVJFQ1QgcGF0aCBpcyBzdGlsbA0Kd2l0aG91\n"
- "dCBwYWdlLWNhY2hlIGNvcGllcywgYW5kIG5vciBkb2VzIGl0IGdvIHRocm91Z2ggcmVxdWVzdCBx\n"
- "dWV1ZXMNCihzaW5jZSBwbWVtIGlzIGEgYmlvLWJhc2VkIGRyaXZlcikuIFRoZSBvbmx5IG92ZXJo\n"
- "ZWFkIGlzIHRoYXQgb2YNCnN1Ym1pdHRpbmcgYSBiaW8gLSBhbmQgd2hpbGUgSSBhZ3JlZSBpdCBp\n"
- "cyBtb3JlIG92ZXJoZWFkIHRoYW4gZGF4X2RvX2lvLA0KNHggc2VlbXMgYSBiaXQgaGlnaC4NCg0K\n"
- "PiANCj4gUGxlYXNlIGRvIG5vdCB0cmFzaCBhbGwgdGhlIE9fRElSRUNUIHVzZXJzLCB0aGV5IGFy\n"
- "ZSB0aGUgbW9yZQ0KPiBpbXBvcnRhbnQNCj4gY2xpZW50cywgbGlrZSBEQnMgYW5kIFZNcy4NCg0K\n"
- "U2hvdWxkbid0IHRoZXkgYmUgdXNpbmcgbW1hcHMgYW5kIGRheCBmYXVsdHM/IEkgd2FzIHVuZGVy\n"
- "IHRoZSBpbXByZXNzaW9uDQp0aGF0IHRoZSBkYXhfZG9faW8gcGF0aCBpcyBhIG5pY2UtdG8taGF2\n"
- "ZSwgYnV0IGZvciBhbnlvbmUgdGhhdCB3aWxsIHdhbnQNCnRvIHVzZSBEQVgsIHRoZXkgd2lsbCB3\n"
- "YW50IHRoZSBtbWFwL2ZhdWx0IHBhdGgsIG5vdCB0aGUgSU8gcGF0aC4gVGhpcyBpcw0KanVzdCBt\n"
- "YWtpbmcgdGhlIElPIHBhdGggJ21vcmUgY29ycmVjdCcgYnkgYWxsb3dpbmcgaXQgYSB3YXkgdG8g\n"
- "ZGVhbCB3aXRoDQplcnJvcnMuDQoNCj4gDQo+IFRoYW5rcw0KPiBCb2F6DQo+IA0KPiA+IA0KPiA+\n"
- "ID4gDQo+ID4gPiANCj4gPiA+IFsqImxlc3MgY29uY3VycmVudCIgYmVjYXVzZSBvZiB0aGUgcXVl\n"
- "dWluZyBkb25lIGluIGJkZXYuIE5vdGUgaG93DQo+ID4gPiDCoCBwbWVtIGlzIG5vdCBldmVuIG11\n"
- "bHRpLXF1ZXVlLCBhbmQgZXZlbiBpZiBpdCB3YXMgaXQgd2lsbCBiZSBtdWNoDQo+ID4gPiDCoCBz\n"
- "bG93ZXIgdGhlbiBEQVggYmVjYXVzZSBvZiB0aGUgY29kZSBkZXB0aCBhbmQgYWxsIHRoZSBsb2Nr\n"
- "cyBhbmQNCj4gPiA+IHRhc2sNCj4gPiA+IMKgIHN3aXRjaGVzIGRvbmUgaW4gdGhlIGJsb2NrIGxh\n"
- "eWVyLiBJbiBEQVggdGhlIGZpbmFsIG1lbWNweSBpcw0KPiA+ID4gZG9uZQ0KPiA+ID4gZGlyZWN0\n"
- "bHkNCj4gPiA+IMKgIG9uIHRoZSB1c2VyLW1vZGUgdGhyZWFkXQ0KPiA+ID4gDQo+ID4gPiBUaGFu\n"
- a3MNCj4gPiA+IEJvYXoNCj4gPiA+IA==
+ "On Mon, 2016-05-02 at 19:03 +0300, Boaz Harrosh wrote:\n"
+ "> On 05/02/2016 06:51 PM, Vishal Verma wrote:\n"
+ "> > \n"
+ "> > On Mon, 2016-05-02 at 18:41 +0300, Boaz Harrosh wrote:\n"
+ "> > > \n"
+ "> > > On 04/29/2016 12:16 AM, Vishal Verma wrote:\n"
+ "> > > > \n"
+ "> > > > \n"
+ "> > > > All IO in a dax filesystem used to go through dax_do_io, which\n"
+ "> > > > cannot\n"
+ "> > > > handle media errors, and thus cannot provide a recovery path\n"
+ "> > > > that\n"
+ "> > > > can\n"
+ "> > > > send a write through the driver to clear errors.\n"
+ "> > > > \n"
+ "> > > > Add a new iocb flag for DAX, and set it only for DAX mounts. In\n"
+ "> > > > the\n"
+ "> > > > IO\n"
+ "> > > > path for DAX filesystems, use the same direct_IO path for both\n"
+ "> > > > DAX\n"
+ "> > > > and\n"
+ "> > > > direct_io iocbs, but use the flags to identify when we are in\n"
+ "> > > > O_DIRECT\n"
+ "> > > > mode vs non O_DIRECT with DAX, and for O_DIRECT, use the\n"
+ "> > > > conventional\n"
+ "> > > > direct_IO path instead of DAX.\n"
+ "> > > > \n"
+ "> > > Really? What are your thinking here?\n"
+ "> > > \n"
+ "> > > What about all the current users of O_DIRECT, you have just made\n"
+ "> > > them\n"
+ "> > > 4 times slower and \"less concurrent*\" then \"buffred io\" users.\n"
+ "> > > Since\n"
+ "> > > direct_IO path will queue an IO request and all.\n"
+ "> > > (And if it is not so slow then why do we need dax_do_io at all?\n"
+ "> > > [Rhetorical])\n"
+ "> > > \n"
+ "> > > I hate it that you overload the semantics of a known and expected\n"
+ "> > > O_DIRECT flag, for special pmem quirks. This is an incompatible\n"
+ "> > > and unrelated overload of the semantics of O_DIRECT.\n"
+ "> > We overloaded O_DIRECT a long time ago when we made DAX piggyback on\n"
+ "> > the same path:\n"
+ "> > \n"
+ "> > static inline bool io_is_direct(struct file *filp)\n"
+ "> > {\n"
+ "> > \treturn (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping-\n"
+ "> > >host);\n"
+ "> > }\n"
+ "> > \n"
+ "> No as far as the user is concerned we have not. The O_DIRECT user\n"
+ "> is still getting all the semantics he wants, .i.e no syncs no\n"
+ "> memory cache usage, no copies ...\n"
+ "> \n"
+ "> Only with DAX the buffered IO is the same since with pmem it is\n"
+ "> faster.\n"
+ "> Then why not? The basic contract with the user did not break.\n"
+ "> \n"
+ "> The above was just an implementation detail to easily navigate\n"
+ "> through the Linux vfs IO stack and make the least amount of changes\n"
+ "> in every FS that wanted to support DAX.(And since dax_do_io is much\n"
+ "> more like direct_IO then like page-cache IO)\n"
+ "> \n"
+ "> > \n"
+ "> > Yes O_DIRECT on a DAX mounted file system will now be slower, but -\n"
+ "> > \n"
+ "> > > \n"
+ "> > > \n"
+ "> > > > \n"
+ "> > > > \n"
+ "> > > > This allows us a recovery path in the form of opening the file\n"
+ "> > > > with\n"
+ "> > > > O_DIRECT and writing to it with the usual O_DIRECT semantics\n"
+ "> > > > (sector\n"
+ "> > > > alignment restrictions).\n"
+ "> > > > \n"
+ "> > > I understand that you want a sector aligned IO, right? for the\n"
+ "> > > clear of errors. But I hate it that you forced all O_DIRECT IO\n"
+ "> > > to be slow for this.\n"
+ "> > > Can you not make dax_do_io handle media errors? At least for the\n"
+ "> > > parts of the IO that are aligned.\n"
+ "> > > (And your recovery path application above can use only aligned\n"
+ "> > > \302\240IO to make sure)\n"
+ "> > > \n"
+ "> > > Please look for another solution. Even a special\n"
+ "> > > IOCTL_DAX_CLEAR_ERROR\n"
+ "> > \302\240- see all the versions of this series prior to this one, where we\n"
+ "> > try\n"
+ "> > to do a fallback...\n"
+ "> > \n"
+ "> And?\n"
+ "> \n"
+ "> So now all O_DIRECT APPs go 4 times slower. I will have a look but if\n"
+ "> it is really so bad than please consider an IOCTL or syscall. Or a\n"
+ "> special\n"
+ "> O_DAX_ERRORS flag ...\n"
+ "\n"
+ "I'm curious where the 4x slower comes from.. The O_DIRECT path is still\n"
+ "without page-cache copies, and nor does it go through request queues\n"
+ "(since pmem is a bio-based driver). The only overhead is that of\n"
+ "submitting a bio - and while I agree it is more overhead than dax_do_io,\n"
+ "4x seems a bit high.\n"
+ "\n"
+ "> \n"
+ "> Please do not trash all the O_DIRECT users, they are the more\n"
+ "> important\n"
+ "> clients, like DBs and VMs.\n"
+ "\n"
+ "Shouldn't they be using mmaps and dax faults? I was under the impression\n"
+ "that the dax_do_io path is a nice-to-have, but for anyone that will want\n"
+ "to use DAX, they will want the mmap/fault path, not the IO path. This is\n"
+ "just making the IO path 'more correct' by allowing it a way to deal with\n"
+ "errors.\n"
+ "\n"
+ "> \n"
+ "> Thanks\n"
+ "> Boaz\n"
+ "> \n"
+ "> > \n"
+ "> > > \n"
+ "> > > \n"
+ "> > > [*\"less concurrent\" because of the queuing done in bdev. Note how\n"
+ "> > > \302\240 pmem is not even multi-queue, and even if it was it will be much\n"
+ "> > > \302\240 slower then DAX because of the code depth and all the locks and\n"
+ "> > > task\n"
+ "> > > \302\240 switches done in the block layer. In DAX the final memcpy is\n"
+ "> > > done\n"
+ "> > > directly\n"
+ "> > > \302\240 on the user-mode thread]\n"
+ "> > > \n"
+ "> > > Thanks\n"
+ "> > > Boaz\n"
+ > > >
 
-24869abd3ea9a39bba870c7d85f8910222fd85059cf703e4e503c13cf44d32f9
+997f9483b00dd6b2c9a381c0bfe4c8edf4ae91f4824f8898c28cfc81c8db5a1d

diff --git a/a/1.txt b/N4/1.txt
index c608940..529b226 100644
--- a/a/1.txt
+++ b/N4/1.txt
@@ -1,82 +1,132 @@
-T24gTW9uLCAyMDE2LTA1LTAyIGF0IDE5OjAzICswMzAwLCBCb2F6IEhhcnJvc2ggd3JvdGU6DQo+
-IE9uIDA1LzAyLzIwMTYgMDY6NTEgUE0sIFZpc2hhbCBWZXJtYSB3cm90ZToNCj4gPiANCj4gPiBP
-biBNb24sIDIwMTYtMDUtMDIgYXQgMTg6NDEgKzAzMDAsIEJvYXogSGFycm9zaCB3cm90ZToNCj4g
-PiA+IA0KPiA+ID4gT24gMDQvMjkvMjAxNiAxMjoxNiBBTSwgVmlzaGFsIFZlcm1hIHdyb3RlOg0K
-PiA+ID4gPiANCj4gPiA+ID4gDQo+ID4gPiA+IEFsbCBJTyBpbiBhIGRheCBmaWxlc3lzdGVtIHVz
-ZWQgdG8gZ28gdGhyb3VnaCBkYXhfZG9faW8sIHdoaWNoDQo+ID4gPiA+IGNhbm5vdA0KPiA+ID4g
-PiBoYW5kbGUgbWVkaWEgZXJyb3JzLCBhbmQgdGh1cyBjYW5ub3QgcHJvdmlkZSBhIHJlY292ZXJ5
-IHBhdGgNCj4gPiA+ID4gdGhhdA0KPiA+ID4gPiBjYW4NCj4gPiA+ID4gc2VuZCBhIHdyaXRlIHRo
-cm91Z2ggdGhlIGRyaXZlciB0byBjbGVhciBlcnJvcnMuDQo+ID4gPiA+IA0KPiA+ID4gPiBBZGQg
-YSBuZXcgaW9jYiBmbGFnIGZvciBEQVgsIGFuZCBzZXQgaXQgb25seSBmb3IgREFYIG1vdW50cy4g
-SW4NCj4gPiA+ID4gdGhlDQo+ID4gPiA+IElPDQo+ID4gPiA+IHBhdGggZm9yIERBWCBmaWxlc3lz
-dGVtcywgdXNlIHRoZSBzYW1lIGRpcmVjdF9JTyBwYXRoIGZvciBib3RoDQo+ID4gPiA+IERBWA0K
-PiA+ID4gPiBhbmQNCj4gPiA+ID4gZGlyZWN0X2lvIGlvY2JzLCBidXQgdXNlIHRoZSBmbGFncyB0
-byBpZGVudGlmeSB3aGVuIHdlIGFyZSBpbg0KPiA+ID4gPiBPX0RJUkVDVA0KPiA+ID4gPiBtb2Rl
-IHZzIG5vbiBPX0RJUkVDVCB3aXRoIERBWCwgYW5kIGZvciBPX0RJUkVDVCwgdXNlIHRoZQ0KPiA+
-ID4gPiBjb252ZW50aW9uYWwNCj4gPiA+ID4gZGlyZWN0X0lPIHBhdGggaW5zdGVhZCBvZiBEQVgu
-DQo+ID4gPiA+IA0KPiA+ID4gUmVhbGx5PyBXaGF0IGFyZSB5b3VyIHRoaW5raW5nIGhlcmU/DQo+
-ID4gPiANCj4gPiA+IFdoYXQgYWJvdXQgYWxsIHRoZSBjdXJyZW50IHVzZXJzIG9mIE9fRElSRUNU
-LCB5b3UgaGF2ZSBqdXN0IG1hZGUNCj4gPiA+IHRoZW0NCj4gPiA+IDQgdGltZXMgc2xvd2VyIGFu
-ZCAibGVzcyBjb25jdXJyZW50KiIgdGhlbiAiYnVmZnJlZCBpbyIgdXNlcnMuDQo+ID4gPiBTaW5j
-ZQ0KPiA+ID4gZGlyZWN0X0lPIHBhdGggd2lsbCBxdWV1ZSBhbiBJTyByZXF1ZXN0IGFuZCBhbGwu
-DQo+ID4gPiAoQW5kIGlmIGl0IGlzIG5vdCBzbyBzbG93IHRoZW4gd2h5IGRvIHdlIG5lZWQgZGF4
-X2RvX2lvIGF0IGFsbD8NCj4gPiA+IFtSaGV0b3JpY2FsXSkNCj4gPiA+IA0KPiA+ID4gSSBoYXRl
-IGl0IHRoYXQgeW91IG92ZXJsb2FkIHRoZSBzZW1hbnRpY3Mgb2YgYSBrbm93biBhbmQgZXhwZWN0
-ZWQNCj4gPiA+IE9fRElSRUNUIGZsYWcsIGZvciBzcGVjaWFsIHBtZW0gcXVpcmtzLiBUaGlzIGlz
-IGFuIGluY29tcGF0aWJsZQ0KPiA+ID4gYW5kIHVucmVsYXRlZCBvdmVybG9hZCBvZiB0aGUgc2Vt
-YW50aWNzIG9mIE9fRElSRUNULg0KPiA+IFdlIG92ZXJsb2FkZWQgT19ESVJFQ1QgYSBsb25nIHRp
-bWUgYWdvIHdoZW4gd2UgbWFkZSBEQVggcGlnZ3liYWNrIG9uDQo+ID4gdGhlIHNhbWUgcGF0aDoN
-Cj4gPiANCj4gPiBzdGF0aWMgaW5saW5lIGJvb2wgaW9faXNfZGlyZWN0KHN0cnVjdCBmaWxlICpm
-aWxwKQ0KPiA+IHsNCj4gPiAJcmV0dXJuIChmaWxwLT5mX2ZsYWdzICYgT19ESVJFQ1QpIHx8IElT
-X0RBWChmaWxwLT5mX21hcHBpbmctDQo+ID4gPmhvc3QpOw0KPiA+IH0NCj4gPiANCj4gTm8gYXMg
-ZmFyIGFzIHRoZSB1c2VyIGlzIGNvbmNlcm5lZCB3ZSBoYXZlIG5vdC4gVGhlIE9fRElSRUNUIHVz
-ZXINCj4gaXMgc3RpbGwgZ2V0dGluZyBhbGwgdGhlIHNlbWFudGljcyBoZSB3YW50cywgLmkuZSBu
-byBzeW5jcyBubw0KPiBtZW1vcnkgY2FjaGUgdXNhZ2UsIG5vIGNvcGllcyAuLi4NCj4gDQo+IE9u
-bHkgd2l0aCBEQVggdGhlIGJ1ZmZlcmVkIElPIGlzIHRoZSBzYW1lIHNpbmNlIHdpdGggcG1lbSBp
-dCBpcw0KPiBmYXN0ZXIuDQo+IFRoZW4gd2h5IG5vdD8gVGhlIGJhc2ljIGNvbnRyYWN0IHdpdGgg
-dGhlIHVzZXIgZGlkIG5vdCBicmVhay4NCj4gDQo+IFRoZSBhYm92ZSB3YXMganVzdCBhbiBpbXBs
-ZW1lbnRhdGlvbiBkZXRhaWwgdG8gZWFzaWx5IG5hdmlnYXRlDQo+IHRocm91Z2ggdGhlIExpbnV4
-IHZmcyBJTyBzdGFjayBhbmQgbWFrZSB0aGUgbGVhc3QgYW1vdW50IG9mIGNoYW5nZXMNCj4gaW4g
-ZXZlcnkgRlMgdGhhdCB3YW50ZWQgdG8gc3VwcG9ydCBEQVguKEFuZCBzaW5jZSBkYXhfZG9faW8g
-aXMgbXVjaA0KPiBtb3JlIGxpa2UgZGlyZWN0X0lPIHRoZW4gbGlrZSBwYWdlLWNhY2hlIElPKQ0K
-PiANCj4gPiANCj4gPiBZZXMgT19ESVJFQ1Qgb24gYSBEQVggbW91bnRlZCBmaWxlIHN5c3RlbSB3
-aWxsIG5vdyBiZSBzbG93ZXIsIGJ1dCAtDQo+ID4gDQo+ID4gPiANCj4gPiA+IA0KPiA+ID4gPiAN
-Cj4gPiA+ID4gDQo+ID4gPiA+IFRoaXMgYWxsb3dzIHVzIGEgcmVjb3ZlcnkgcGF0aCBpbiB0aGUg
-Zm9ybSBvZiBvcGVuaW5nIHRoZSBmaWxlDQo+ID4gPiA+IHdpdGgNCj4gPiA+ID4gT19ESVJFQ1Qg
-YW5kIHdyaXRpbmcgdG8gaXQgd2l0aCB0aGUgdXN1YWwgT19ESVJFQ1Qgc2VtYW50aWNzDQo+ID4g
-PiA+IChzZWN0b3INCj4gPiA+ID4gYWxpZ25tZW50IHJlc3RyaWN0aW9ucykuDQo+ID4gPiA+IA0K
-PiA+ID4gSSB1bmRlcnN0YW5kIHRoYXQgeW91IHdhbnQgYSBzZWN0b3IgYWxpZ25lZCBJTywgcmln
-aHQ/IGZvciB0aGUNCj4gPiA+IGNsZWFyIG9mIGVycm9ycy4gQnV0IEkgaGF0ZSBpdCB0aGF0IHlv
-dSBmb3JjZWQgYWxsIE9fRElSRUNUIElPDQo+ID4gPiB0byBiZSBzbG93IGZvciB0aGlzLg0KPiA+
-ID4gQ2FuIHlvdSBub3QgbWFrZSBkYXhfZG9faW8gaGFuZGxlIG1lZGlhIGVycm9ycz8gQXQgbGVh
-c3QgZm9yIHRoZQ0KPiA+ID4gcGFydHMgb2YgdGhlIElPIHRoYXQgYXJlIGFsaWduZWQuDQo+ID4g
-PiAoQW5kIHlvdXIgcmVjb3ZlcnkgcGF0aCBhcHBsaWNhdGlvbiBhYm92ZSBjYW4gdXNlIG9ubHkg
-YWxpZ25lZA0KPiA+ID4gwqBJTyB0byBtYWtlIHN1cmUpDQo+ID4gPiANCj4gPiA+IFBsZWFzZSBs
-b29rIGZvciBhbm90aGVyIHNvbHV0aW9uLiBFdmVuIGEgc3BlY2lhbA0KPiA+ID4gSU9DVExfREFY
-X0NMRUFSX0VSUk9SDQo+ID4gwqAtIHNlZSBhbGwgdGhlIHZlcnNpb25zIG9mIHRoaXMgc2VyaWVz
-IHByaW9yIHRvIHRoaXMgb25lLCB3aGVyZSB3ZQ0KPiA+IHRyeQ0KPiA+IHRvIGRvIGEgZmFsbGJh
-Y2suLi4NCj4gPiANCj4gQW5kPw0KPiANCj4gU28gbm93IGFsbCBPX0RJUkVDVCBBUFBzIGdvIDQg
-dGltZXMgc2xvd2VyLiBJIHdpbGwgaGF2ZSBhIGxvb2sgYnV0IGlmDQo+IGl0IGlzIHJlYWxseSBz
-byBiYWQgdGhhbiBwbGVhc2UgY29uc2lkZXIgYW4gSU9DVEwgb3Igc3lzY2FsbC4gT3IgYQ0KPiBz
-cGVjaWFsDQo+IE9fREFYX0VSUk9SUyBmbGFnIC4uLg0KDQpJJ20gY3VyaW91cyB3aGVyZSB0aGUg
-NHggc2xvd2VyIGNvbWVzIGZyb20uLiBUaGUgT19ESVJFQ1QgcGF0aCBpcyBzdGlsbA0Kd2l0aG91
-dCBwYWdlLWNhY2hlIGNvcGllcywgYW5kIG5vciBkb2VzIGl0IGdvIHRocm91Z2ggcmVxdWVzdCBx
-dWV1ZXMNCihzaW5jZSBwbWVtIGlzIGEgYmlvLWJhc2VkIGRyaXZlcikuIFRoZSBvbmx5IG92ZXJo
-ZWFkIGlzIHRoYXQgb2YNCnN1Ym1pdHRpbmcgYSBiaW8gLSBhbmQgd2hpbGUgSSBhZ3JlZSBpdCBp
-cyBtb3JlIG92ZXJoZWFkIHRoYW4gZGF4X2RvX2lvLA0KNHggc2VlbXMgYSBiaXQgaGlnaC4NCg0K
-PiANCj4gUGxlYXNlIGRvIG5vdCB0cmFzaCBhbGwgdGhlIE9fRElSRUNUIHVzZXJzLCB0aGV5IGFy
-ZSB0aGUgbW9yZQ0KPiBpbXBvcnRhbnQNCj4gY2xpZW50cywgbGlrZSBEQnMgYW5kIFZNcy4NCg0K
-U2hvdWxkbid0IHRoZXkgYmUgdXNpbmcgbW1hcHMgYW5kIGRheCBmYXVsdHM/IEkgd2FzIHVuZGVy
-IHRoZSBpbXByZXNzaW9uDQp0aGF0IHRoZSBkYXhfZG9faW8gcGF0aCBpcyBhIG5pY2UtdG8taGF2
-ZSwgYnV0IGZvciBhbnlvbmUgdGhhdCB3aWxsIHdhbnQNCnRvIHVzZSBEQVgsIHRoZXkgd2lsbCB3
-YW50IHRoZSBtbWFwL2ZhdWx0IHBhdGgsIG5vdCB0aGUgSU8gcGF0aC4gVGhpcyBpcw0KanVzdCBt
-YWtpbmcgdGhlIElPIHBhdGggJ21vcmUgY29ycmVjdCcgYnkgYWxsb3dpbmcgaXQgYSB3YXkgdG8g
-ZGVhbCB3aXRoDQplcnJvcnMuDQoNCj4gDQo+IFRoYW5rcw0KPiBCb2F6DQo+IA0KPiA+IA0KPiA+
-ID4gDQo+ID4gPiANCj4gPiA+IFsqImxlc3MgY29uY3VycmVudCIgYmVjYXVzZSBvZiB0aGUgcXVl
-dWluZyBkb25lIGluIGJkZXYuIE5vdGUgaG93DQo+ID4gPiDCoCBwbWVtIGlzIG5vdCBldmVuIG11
-bHRpLXF1ZXVlLCBhbmQgZXZlbiBpZiBpdCB3YXMgaXQgd2lsbCBiZSBtdWNoDQo+ID4gPiDCoCBz
-bG93ZXIgdGhlbiBEQVggYmVjYXVzZSBvZiB0aGUgY29kZSBkZXB0aCBhbmQgYWxsIHRoZSBsb2Nr
-cyBhbmQNCj4gPiA+IHRhc2sNCj4gPiA+IMKgIHN3aXRjaGVzIGRvbmUgaW4gdGhlIGJsb2NrIGxh
-eWVyLiBJbiBEQVggdGhlIGZpbmFsIG1lbWNweSBpcw0KPiA+ID4gZG9uZQ0KPiA+ID4gZGlyZWN0
-bHkNCj4gPiA+IMKgIG9uIHRoZSB1c2VyLW1vZGUgdGhyZWFkXQ0KPiA+ID4gDQo+ID4gPiBUaGFu
-a3MNCj4gPiA+IEJvYXoNCj4gPiA+IA==
+On Mon, 2016-05-02 at 19:03 +0300, Boaz Harrosh wrote:
+> On 05/02/2016 06:51 PM, Vishal Verma wrote:
+> > 
+> > On Mon, 2016-05-02 at 18:41 +0300, Boaz Harrosh wrote:
+> > > 
+> > > On 04/29/2016 12:16 AM, Vishal Verma wrote:
+> > > > 
+> > > > 
+> > > > All IO in a dax filesystem used to go through dax_do_io, which
+> > > > cannot
+> > > > handle media errors, and thus cannot provide a recovery path
+> > > > that
+> > > > can
+> > > > send a write through the driver to clear errors.
+> > > > 
+> > > > Add a new iocb flag for DAX, and set it only for DAX mounts. In
+> > > > the
+> > > > IO
+> > > > path for DAX filesystems, use the same direct_IO path for both
+> > > > DAX
+> > > > and
+> > > > direct_io iocbs, but use the flags to identify when we are in
+> > > > O_DIRECT
+> > > > mode vs non O_DIRECT with DAX, and for O_DIRECT, use the
+> > > > conventional
+> > > > direct_IO path instead of DAX.
+> > > > 
+> > > Really? What are your thinking here?
+> > > 
+> > > What about all the current users of O_DIRECT, you have just made
+> > > them
+> > > 4 times slower and "less concurrent*" then "buffred io" users.
+> > > Since
+> > > direct_IO path will queue an IO request and all.
+> > > (And if it is not so slow then why do we need dax_do_io at all?
+> > > [Rhetorical])
+> > > 
+> > > I hate it that you overload the semantics of a known and expected
+> > > O_DIRECT flag, for special pmem quirks. This is an incompatible
+> > > and unrelated overload of the semantics of O_DIRECT.
+> > We overloaded O_DIRECT a long time ago when we made DAX piggyback on
+> > the same path:
+> > 
+> > static inline bool io_is_direct(struct file *filp)
+> > {
+> > 	return (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping-
+> > >host);
+> > }
+> > 
+> No as far as the user is concerned we have not. The O_DIRECT user
+> is still getting all the semantics he wants, .i.e no syncs no
+> memory cache usage, no copies ...
+> 
+> Only with DAX the buffered IO is the same since with pmem it is
+> faster.
+> Then why not? The basic contract with the user did not break.
+> 
+> The above was just an implementation detail to easily navigate
+> through the Linux vfs IO stack and make the least amount of changes
+> in every FS that wanted to support DAX.(And since dax_do_io is much
+> more like direct_IO then like page-cache IO)
+> 
+> > 
+> > Yes O_DIRECT on a DAX mounted file system will now be slower, but -
+> > 
+> > > 
+> > > 
+> > > > 
+> > > > 
+> > > > This allows us a recovery path in the form of opening the file
+> > > > with
+> > > > O_DIRECT and writing to it with the usual O_DIRECT semantics
+> > > > (sector
+> > > > alignment restrictions).
+> > > > 
+> > > I understand that you want a sector aligned IO, right? for the
+> > > clear of errors. But I hate it that you forced all O_DIRECT IO
+> > > to be slow for this.
+> > > Can you not make dax_do_io handle media errors? At least for the
+> > > parts of the IO that are aligned.
+> > > (And your recovery path application above can use only aligned
+> > >  IO to make sure)
+> > > 
+> > > Please look for another solution. Even a special
+> > > IOCTL_DAX_CLEAR_ERROR
+> >  - see all the versions of this series prior to this one, where we
+> > try
+> > to do a fallback...
+> > 
+> And?
+> 
+> So now all O_DIRECT APPs go 4 times slower. I will have a look but if
+> it is really so bad than please consider an IOCTL or syscall. Or a
+> special
+> O_DAX_ERRORS flag ...
+
+I'm curious where the 4x slower comes from.. The O_DIRECT path is still
+without page-cache copies, and nor does it go through request queues
+(since pmem is a bio-based driver). The only overhead is that of
+submitting a bio - and while I agree it is more overhead than dax_do_io,
+4x seems a bit high.
+
+> 
+> Please do not trash all the O_DIRECT users, they are the more
+> important
+> clients, like DBs and VMs.
+
+Shouldn't they be using mmaps and dax faults? I was under the impression
+that the dax_do_io path is a nice-to-have, but for anyone that will want
+to use DAX, they will want the mmap/fault path, not the IO path. This is
+just making the IO path 'more correct' by allowing it a way to deal with
+errors.
+
+> 
+> Thanks
+> Boaz
+> 
+> > 
+> > > 
+> > > 
+> > > [*"less concurrent" because of the queuing done in bdev. Note how
+> > >   pmem is not even multi-queue, and even if it was it will be much
+> > >   slower then DAX because of the code depth and all the locks and
+> > > task
+> > >   switches done in the block layer. In DAX the final memcpy is
+> > > done
+> > > directly
+> > >   on the user-mode thread]
+> > > 
+> > > Thanks
+> > > Boaz
+> > >
diff --git a/a/content_digest b/N4/content_digest
index 6fb5f8a..c23c46e 100644
--- a/a/content_digest
+++ b/N4/content_digest
@@ -6,7 +6,7 @@
  "From\0Verma, Vishal L <vishal.l.verma@intel.com>\0"
  "Subject\0Re: [PATCH v4 5/7] fs: prioritize and separate direct_io from dax_io\0"
  "Date\0Mon, 2 May 2016 18:52:02 +0000\0"
- "To\0linux-nvdimm@lists.01.org <linux-nvdimm@lists.01.org>"
+ "To\0linux-nvdimm@lists.01.org <linux-nvdimm@ml01.01.org>"
  " boaz@plexistor.com <boaz@plexistor.com>\0"
  "Cc\0linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>"
   linux-block@vger.kernel.org <linux-block@vger.kernel.org>
@@ -20,90 +20,140 @@
   linux-ext4@vger.kernel.org <linux-ext4@vger.kernel.org>
   david@fromorbit.com <david@fromorbit.com>
   jack@suse.cz <jack@suse.cz>
- " matthew@wil.cx <matthew@wil.cx>\0"
+ " matthew@wil.cx <matthew@freeurl.abc188.com>\0"
  "\00:1\0"
  "b\0"
- "T24gTW9uLCAyMDE2LTA1LTAyIGF0IDE5OjAzICswMzAwLCBCb2F6IEhhcnJvc2ggd3JvdGU6DQo+\n"
- "IE9uIDA1LzAyLzIwMTYgMDY6NTEgUE0sIFZpc2hhbCBWZXJtYSB3cm90ZToNCj4gPiANCj4gPiBP\n"
- "biBNb24sIDIwMTYtMDUtMDIgYXQgMTg6NDEgKzAzMDAsIEJvYXogSGFycm9zaCB3cm90ZToNCj4g\n"
- "PiA+IA0KPiA+ID4gT24gMDQvMjkvMjAxNiAxMjoxNiBBTSwgVmlzaGFsIFZlcm1hIHdyb3RlOg0K\n"
- "PiA+ID4gPiANCj4gPiA+ID4gDQo+ID4gPiA+IEFsbCBJTyBpbiBhIGRheCBmaWxlc3lzdGVtIHVz\n"
- "ZWQgdG8gZ28gdGhyb3VnaCBkYXhfZG9faW8sIHdoaWNoDQo+ID4gPiA+IGNhbm5vdA0KPiA+ID4g\n"
- "PiBoYW5kbGUgbWVkaWEgZXJyb3JzLCBhbmQgdGh1cyBjYW5ub3QgcHJvdmlkZSBhIHJlY292ZXJ5\n"
- "IHBhdGgNCj4gPiA+ID4gdGhhdA0KPiA+ID4gPiBjYW4NCj4gPiA+ID4gc2VuZCBhIHdyaXRlIHRo\n"
- "cm91Z2ggdGhlIGRyaXZlciB0byBjbGVhciBlcnJvcnMuDQo+ID4gPiA+IA0KPiA+ID4gPiBBZGQg\n"
- "YSBuZXcgaW9jYiBmbGFnIGZvciBEQVgsIGFuZCBzZXQgaXQgb25seSBmb3IgREFYIG1vdW50cy4g\n"
- "SW4NCj4gPiA+ID4gdGhlDQo+ID4gPiA+IElPDQo+ID4gPiA+IHBhdGggZm9yIERBWCBmaWxlc3lz\n"
- "dGVtcywgdXNlIHRoZSBzYW1lIGRpcmVjdF9JTyBwYXRoIGZvciBib3RoDQo+ID4gPiA+IERBWA0K\n"
- "PiA+ID4gPiBhbmQNCj4gPiA+ID4gZGlyZWN0X2lvIGlvY2JzLCBidXQgdXNlIHRoZSBmbGFncyB0\n"
- "byBpZGVudGlmeSB3aGVuIHdlIGFyZSBpbg0KPiA+ID4gPiBPX0RJUkVDVA0KPiA+ID4gPiBtb2Rl\n"
- "IHZzIG5vbiBPX0RJUkVDVCB3aXRoIERBWCwgYW5kIGZvciBPX0RJUkVDVCwgdXNlIHRoZQ0KPiA+\n"
- "ID4gPiBjb252ZW50aW9uYWwNCj4gPiA+ID4gZGlyZWN0X0lPIHBhdGggaW5zdGVhZCBvZiBEQVgu\n"
- "DQo+ID4gPiA+IA0KPiA+ID4gUmVhbGx5PyBXaGF0IGFyZSB5b3VyIHRoaW5raW5nIGhlcmU/DQo+\n"
- "ID4gPiANCj4gPiA+IFdoYXQgYWJvdXQgYWxsIHRoZSBjdXJyZW50IHVzZXJzIG9mIE9fRElSRUNU\n"
- "LCB5b3UgaGF2ZSBqdXN0IG1hZGUNCj4gPiA+IHRoZW0NCj4gPiA+IDQgdGltZXMgc2xvd2VyIGFu\n"
- "ZCAibGVzcyBjb25jdXJyZW50KiIgdGhlbiAiYnVmZnJlZCBpbyIgdXNlcnMuDQo+ID4gPiBTaW5j\n"
- "ZQ0KPiA+ID4gZGlyZWN0X0lPIHBhdGggd2lsbCBxdWV1ZSBhbiBJTyByZXF1ZXN0IGFuZCBhbGwu\n"
- "DQo+ID4gPiAoQW5kIGlmIGl0IGlzIG5vdCBzbyBzbG93IHRoZW4gd2h5IGRvIHdlIG5lZWQgZGF4\n"
- "X2RvX2lvIGF0IGFsbD8NCj4gPiA+IFtSaGV0b3JpY2FsXSkNCj4gPiA+IA0KPiA+ID4gSSBoYXRl\n"
- "IGl0IHRoYXQgeW91IG92ZXJsb2FkIHRoZSBzZW1hbnRpY3Mgb2YgYSBrbm93biBhbmQgZXhwZWN0\n"
- "ZWQNCj4gPiA+IE9fRElSRUNUIGZsYWcsIGZvciBzcGVjaWFsIHBtZW0gcXVpcmtzLiBUaGlzIGlz\n"
- "IGFuIGluY29tcGF0aWJsZQ0KPiA+ID4gYW5kIHVucmVsYXRlZCBvdmVybG9hZCBvZiB0aGUgc2Vt\n"
- "YW50aWNzIG9mIE9fRElSRUNULg0KPiA+IFdlIG92ZXJsb2FkZWQgT19ESVJFQ1QgYSBsb25nIHRp\n"
- "bWUgYWdvIHdoZW4gd2UgbWFkZSBEQVggcGlnZ3liYWNrIG9uDQo+ID4gdGhlIHNhbWUgcGF0aDoN\n"
- "Cj4gPiANCj4gPiBzdGF0aWMgaW5saW5lIGJvb2wgaW9faXNfZGlyZWN0KHN0cnVjdCBmaWxlICpm\n"
- "aWxwKQ0KPiA+IHsNCj4gPiAJcmV0dXJuIChmaWxwLT5mX2ZsYWdzICYgT19ESVJFQ1QpIHx8IElT\n"
- "X0RBWChmaWxwLT5mX21hcHBpbmctDQo+ID4gPmhvc3QpOw0KPiA+IH0NCj4gPiANCj4gTm8gYXMg\n"
- "ZmFyIGFzIHRoZSB1c2VyIGlzIGNvbmNlcm5lZCB3ZSBoYXZlIG5vdC4gVGhlIE9fRElSRUNUIHVz\n"
- "ZXINCj4gaXMgc3RpbGwgZ2V0dGluZyBhbGwgdGhlIHNlbWFudGljcyBoZSB3YW50cywgLmkuZSBu\n"
- "byBzeW5jcyBubw0KPiBtZW1vcnkgY2FjaGUgdXNhZ2UsIG5vIGNvcGllcyAuLi4NCj4gDQo+IE9u\n"
- "bHkgd2l0aCBEQVggdGhlIGJ1ZmZlcmVkIElPIGlzIHRoZSBzYW1lIHNpbmNlIHdpdGggcG1lbSBp\n"
- "dCBpcw0KPiBmYXN0ZXIuDQo+IFRoZW4gd2h5IG5vdD8gVGhlIGJhc2ljIGNvbnRyYWN0IHdpdGgg\n"
- "dGhlIHVzZXIgZGlkIG5vdCBicmVhay4NCj4gDQo+IFRoZSBhYm92ZSB3YXMganVzdCBhbiBpbXBs\n"
- "ZW1lbnRhdGlvbiBkZXRhaWwgdG8gZWFzaWx5IG5hdmlnYXRlDQo+IHRocm91Z2ggdGhlIExpbnV4\n"
- "IHZmcyBJTyBzdGFjayBhbmQgbWFrZSB0aGUgbGVhc3QgYW1vdW50IG9mIGNoYW5nZXMNCj4gaW4g\n"
- "ZXZlcnkgRlMgdGhhdCB3YW50ZWQgdG8gc3VwcG9ydCBEQVguKEFuZCBzaW5jZSBkYXhfZG9faW8g\n"
- "aXMgbXVjaA0KPiBtb3JlIGxpa2UgZGlyZWN0X0lPIHRoZW4gbGlrZSBwYWdlLWNhY2hlIElPKQ0K\n"
- "PiANCj4gPiANCj4gPiBZZXMgT19ESVJFQ1Qgb24gYSBEQVggbW91bnRlZCBmaWxlIHN5c3RlbSB3\n"
- "aWxsIG5vdyBiZSBzbG93ZXIsIGJ1dCAtDQo+ID4gDQo+ID4gPiANCj4gPiA+IA0KPiA+ID4gPiAN\n"
- "Cj4gPiA+ID4gDQo+ID4gPiA+IFRoaXMgYWxsb3dzIHVzIGEgcmVjb3ZlcnkgcGF0aCBpbiB0aGUg\n"
- "Zm9ybSBvZiBvcGVuaW5nIHRoZSBmaWxlDQo+ID4gPiA+IHdpdGgNCj4gPiA+ID4gT19ESVJFQ1Qg\n"
- "YW5kIHdyaXRpbmcgdG8gaXQgd2l0aCB0aGUgdXN1YWwgT19ESVJFQ1Qgc2VtYW50aWNzDQo+ID4g\n"
- "PiA+IChzZWN0b3INCj4gPiA+ID4gYWxpZ25tZW50IHJlc3RyaWN0aW9ucykuDQo+ID4gPiA+IA0K\n"
- "PiA+ID4gSSB1bmRlcnN0YW5kIHRoYXQgeW91IHdhbnQgYSBzZWN0b3IgYWxpZ25lZCBJTywgcmln\n"
- "aHQ/IGZvciB0aGUNCj4gPiA+IGNsZWFyIG9mIGVycm9ycy4gQnV0IEkgaGF0ZSBpdCB0aGF0IHlv\n"
- "dSBmb3JjZWQgYWxsIE9fRElSRUNUIElPDQo+ID4gPiB0byBiZSBzbG93IGZvciB0aGlzLg0KPiA+\n"
- "ID4gQ2FuIHlvdSBub3QgbWFrZSBkYXhfZG9faW8gaGFuZGxlIG1lZGlhIGVycm9ycz8gQXQgbGVh\n"
- "c3QgZm9yIHRoZQ0KPiA+ID4gcGFydHMgb2YgdGhlIElPIHRoYXQgYXJlIGFsaWduZWQuDQo+ID4g\n"
- "PiAoQW5kIHlvdXIgcmVjb3ZlcnkgcGF0aCBhcHBsaWNhdGlvbiBhYm92ZSBjYW4gdXNlIG9ubHkg\n"
- "YWxpZ25lZA0KPiA+ID4gwqBJTyB0byBtYWtlIHN1cmUpDQo+ID4gPiANCj4gPiA+IFBsZWFzZSBs\n"
- "b29rIGZvciBhbm90aGVyIHNvbHV0aW9uLiBFdmVuIGEgc3BlY2lhbA0KPiA+ID4gSU9DVExfREFY\n"
- "X0NMRUFSX0VSUk9SDQo+ID4gwqAtIHNlZSBhbGwgdGhlIHZlcnNpb25zIG9mIHRoaXMgc2VyaWVz\n"
- "IHByaW9yIHRvIHRoaXMgb25lLCB3aGVyZSB3ZQ0KPiA+IHRyeQ0KPiA+IHRvIGRvIGEgZmFsbGJh\n"
- "Y2suLi4NCj4gPiANCj4gQW5kPw0KPiANCj4gU28gbm93IGFsbCBPX0RJUkVDVCBBUFBzIGdvIDQg\n"
- "dGltZXMgc2xvd2VyLiBJIHdpbGwgaGF2ZSBhIGxvb2sgYnV0IGlmDQo+IGl0IGlzIHJlYWxseSBz\n"
- "byBiYWQgdGhhbiBwbGVhc2UgY29uc2lkZXIgYW4gSU9DVEwgb3Igc3lzY2FsbC4gT3IgYQ0KPiBz\n"
- "cGVjaWFsDQo+IE9fREFYX0VSUk9SUyBmbGFnIC4uLg0KDQpJJ20gY3VyaW91cyB3aGVyZSB0aGUg\n"
- "NHggc2xvd2VyIGNvbWVzIGZyb20uLiBUaGUgT19ESVJFQ1QgcGF0aCBpcyBzdGlsbA0Kd2l0aG91\n"
- "dCBwYWdlLWNhY2hlIGNvcGllcywgYW5kIG5vciBkb2VzIGl0IGdvIHRocm91Z2ggcmVxdWVzdCBx\n"
- "dWV1ZXMNCihzaW5jZSBwbWVtIGlzIGEgYmlvLWJhc2VkIGRyaXZlcikuIFRoZSBvbmx5IG92ZXJo\n"
- "ZWFkIGlzIHRoYXQgb2YNCnN1Ym1pdHRpbmcgYSBiaW8gLSBhbmQgd2hpbGUgSSBhZ3JlZSBpdCBp\n"
- "cyBtb3JlIG92ZXJoZWFkIHRoYW4gZGF4X2RvX2lvLA0KNHggc2VlbXMgYSBiaXQgaGlnaC4NCg0K\n"
- "PiANCj4gUGxlYXNlIGRvIG5vdCB0cmFzaCBhbGwgdGhlIE9fRElSRUNUIHVzZXJzLCB0aGV5IGFy\n"
- "ZSB0aGUgbW9yZQ0KPiBpbXBvcnRhbnQNCj4gY2xpZW50cywgbGlrZSBEQnMgYW5kIFZNcy4NCg0K\n"
- "U2hvdWxkbid0IHRoZXkgYmUgdXNpbmcgbW1hcHMgYW5kIGRheCBmYXVsdHM/IEkgd2FzIHVuZGVy\n"
- "IHRoZSBpbXByZXNzaW9uDQp0aGF0IHRoZSBkYXhfZG9faW8gcGF0aCBpcyBhIG5pY2UtdG8taGF2\n"
- "ZSwgYnV0IGZvciBhbnlvbmUgdGhhdCB3aWxsIHdhbnQNCnRvIHVzZSBEQVgsIHRoZXkgd2lsbCB3\n"
- "YW50IHRoZSBtbWFwL2ZhdWx0IHBhdGgsIG5vdCB0aGUgSU8gcGF0aC4gVGhpcyBpcw0KanVzdCBt\n"
- "YWtpbmcgdGhlIElPIHBhdGggJ21vcmUgY29ycmVjdCcgYnkgYWxsb3dpbmcgaXQgYSB3YXkgdG8g\n"
- "ZGVhbCB3aXRoDQplcnJvcnMuDQoNCj4gDQo+IFRoYW5rcw0KPiBCb2F6DQo+IA0KPiA+IA0KPiA+\n"
- "ID4gDQo+ID4gPiANCj4gPiA+IFsqImxlc3MgY29uY3VycmVudCIgYmVjYXVzZSBvZiB0aGUgcXVl\n"
- "dWluZyBkb25lIGluIGJkZXYuIE5vdGUgaG93DQo+ID4gPiDCoCBwbWVtIGlzIG5vdCBldmVuIG11\n"
- "bHRpLXF1ZXVlLCBhbmQgZXZlbiBpZiBpdCB3YXMgaXQgd2lsbCBiZSBtdWNoDQo+ID4gPiDCoCBz\n"
- "bG93ZXIgdGhlbiBEQVggYmVjYXVzZSBvZiB0aGUgY29kZSBkZXB0aCBhbmQgYWxsIHRoZSBsb2Nr\n"
- "cyBhbmQNCj4gPiA+IHRhc2sNCj4gPiA+IMKgIHN3aXRjaGVzIGRvbmUgaW4gdGhlIGJsb2NrIGxh\n"
- "eWVyLiBJbiBEQVggdGhlIGZpbmFsIG1lbWNweSBpcw0KPiA+ID4gZG9uZQ0KPiA+ID4gZGlyZWN0\n"
- "bHkNCj4gPiA+IMKgIG9uIHRoZSB1c2VyLW1vZGUgdGhyZWFkXQ0KPiA+ID4gDQo+ID4gPiBUaGFu\n"
- a3MNCj4gPiA+IEJvYXoNCj4gPiA+IA==
+ "On Mon, 2016-05-02 at 19:03 +0300, Boaz Harrosh wrote:\n"
+ "> On 05/02/2016 06:51 PM, Vishal Verma wrote:\n"
+ "> > \n"
+ "> > On Mon, 2016-05-02 at 18:41 +0300, Boaz Harrosh wrote:\n"
+ "> > > \n"
+ "> > > On 04/29/2016 12:16 AM, Vishal Verma wrote:\n"
+ "> > > > \n"
+ "> > > > \n"
+ "> > > > All IO in a dax filesystem used to go through dax_do_io, which\n"
+ "> > > > cannot\n"
+ "> > > > handle media errors, and thus cannot provide a recovery path\n"
+ "> > > > that\n"
+ "> > > > can\n"
+ "> > > > send a write through the driver to clear errors.\n"
+ "> > > > \n"
+ "> > > > Add a new iocb flag for DAX, and set it only for DAX mounts. In\n"
+ "> > > > the\n"
+ "> > > > IO\n"
+ "> > > > path for DAX filesystems, use the same direct_IO path for both\n"
+ "> > > > DAX\n"
+ "> > > > and\n"
+ "> > > > direct_io iocbs, but use the flags to identify when we are in\n"
+ "> > > > O_DIRECT\n"
+ "> > > > mode vs non O_DIRECT with DAX, and for O_DIRECT, use the\n"
+ "> > > > conventional\n"
+ "> > > > direct_IO path instead of DAX.\n"
+ "> > > > \n"
+ "> > > Really? What are your thinking here?\n"
+ "> > > \n"
+ "> > > What about all the current users of O_DIRECT, you have just made\n"
+ "> > > them\n"
+ "> > > 4 times slower and \"less concurrent*\" then \"buffred io\" users.\n"
+ "> > > Since\n"
+ "> > > direct_IO path will queue an IO request and all.\n"
+ "> > > (And if it is not so slow then why do we need dax_do_io at all?\n"
+ "> > > [Rhetorical])\n"
+ "> > > \n"
+ "> > > I hate it that you overload the semantics of a known and expected\n"
+ "> > > O_DIRECT flag, for special pmem quirks. This is an incompatible\n"
+ "> > > and unrelated overload of the semantics of O_DIRECT.\n"
+ "> > We overloaded O_DIRECT a long time ago when we made DAX piggyback on\n"
+ "> > the same path:\n"
+ "> > \n"
+ "> > static inline bool io_is_direct(struct file *filp)\n"
+ "> > {\n"
+ "> > \treturn (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping-\n"
+ "> > >host);\n"
+ "> > }\n"
+ "> > \n"
+ "> No as far as the user is concerned we have not. The O_DIRECT user\n"
+ "> is still getting all the semantics he wants, .i.e no syncs no\n"
+ "> memory cache usage, no copies ...\n"
+ "> \n"
+ "> Only with DAX the buffered IO is the same since with pmem it is\n"
+ "> faster.\n"
+ "> Then why not? The basic contract with the user did not break.\n"
+ "> \n"
+ "> The above was just an implementation detail to easily navigate\n"
+ "> through the Linux vfs IO stack and make the least amount of changes\n"
+ "> in every FS that wanted to support DAX.(And since dax_do_io is much\n"
+ "> more like direct_IO then like page-cache IO)\n"
+ "> \n"
+ "> > \n"
+ "> > Yes O_DIRECT on a DAX mounted file system will now be slower, but -\n"
+ "> > \n"
+ "> > > \n"
+ "> > > \n"
+ "> > > > \n"
+ "> > > > \n"
+ "> > > > This allows us a recovery path in the form of opening the file\n"
+ "> > > > with\n"
+ "> > > > O_DIRECT and writing to it with the usual O_DIRECT semantics\n"
+ "> > > > (sector\n"
+ "> > > > alignment restrictions).\n"
+ "> > > > \n"
+ "> > > I understand that you want a sector aligned IO, right? for the\n"
+ "> > > clear of errors. But I hate it that you forced all O_DIRECT IO\n"
+ "> > > to be slow for this.\n"
+ "> > > Can you not make dax_do_io handle media errors? At least for the\n"
+ "> > > parts of the IO that are aligned.\n"
+ "> > > (And your recovery path application above can use only aligned\n"
+ "> > > \302\240IO to make sure)\n"
+ "> > > \n"
+ "> > > Please look for another solution. Even a special\n"
+ "> > > IOCTL_DAX_CLEAR_ERROR\n"
+ "> > \302\240- see all the versions of this series prior to this one, where we\n"
+ "> > try\n"
+ "> > to do a fallback...\n"
+ "> > \n"
+ "> And?\n"
+ "> \n"
+ "> So now all O_DIRECT APPs go 4 times slower. I will have a look but if\n"
+ "> it is really so bad than please consider an IOCTL or syscall. Or a\n"
+ "> special\n"
+ "> O_DAX_ERRORS flag ...\n"
+ "\n"
+ "I'm curious where the 4x slower comes from.. The O_DIRECT path is still\n"
+ "without page-cache copies, and nor does it go through request queues\n"
+ "(since pmem is a bio-based driver). The only overhead is that of\n"
+ "submitting a bio - and while I agree it is more overhead than dax_do_io,\n"
+ "4x seems a bit high.\n"
+ "\n"
+ "> \n"
+ "> Please do not trash all the O_DIRECT users, they are the more\n"
+ "> important\n"
+ "> clients, like DBs and VMs.\n"
+ "\n"
+ "Shouldn't they be using mmaps and dax faults? I was under the impression\n"
+ "that the dax_do_io path is a nice-to-have, but for anyone that will want\n"
+ "to use DAX, they will want the mmap/fault path, not the IO path. This is\n"
+ "just making the IO path 'more correct' by allowing it a way to deal with\n"
+ "errors.\n"
+ "\n"
+ "> \n"
+ "> Thanks\n"
+ "> Boaz\n"
+ "> \n"
+ "> > \n"
+ "> > > \n"
+ "> > > \n"
+ "> > > [*\"less concurrent\" because of the queuing done in bdev. Note how\n"
+ "> > > \302\240 pmem is not even multi-queue, and even if it was it will be much\n"
+ "> > > \302\240 slower then DAX because of the code depth and all the locks and\n"
+ "> > > task\n"
+ "> > > \302\240 switches done in the block layer. In DAX the final memcpy is\n"
+ "> > > done\n"
+ "> > > directly\n"
+ "> > > \302\240 on the user-mode thread]\n"
+ "> > > \n"
+ "> > > Thanks\n"
+ "> > > Boaz\n"
+ > > >
 
-24869abd3ea9a39bba870c7d85f8910222fd85059cf703e4e503c13cf44d32f9
+1fe66ca5bc13c160471810c6a5d3ff5be76e226e2e363817077c46ed09919a2f
