diff for duplicates of <1484872265.4857.1.camel@intel.com>

diff --git a/a/1.txt b/N1/1.txt
index f21b819..5bfa9c5 100644
--- a/a/1.txt
+++ b/N1/1.txt
@@ -1,81 +1,128 @@
-T24gVHVlLCAyMDE3LTAxLTE3IGF0IDE3OjU4IC0wODAwLCBBbmRpcnkgWHUgd3JvdGU6DQo+IE9u
-IFR1ZSwgSmFuIDE3LCAyMDE3IGF0IDM6NTEgUE0sIFZpc2hhbCBWZXJtYSA8dmlzaGFsLmwudmVy
-bWFAaW50ZWwuY28NCj4gbT4gd3JvdGU6DQo+ID4gT24gMDEvMTcsIEFuZGlyeSBYdSB3cm90ZToN
-Cj4gPiANCj4gPiA8c25pcD4NCj4gPiANCj4gPiA+ID4gPiANCj4gPiA+ID4gPiBUaGUgcG1lbV9k
-b19idmVjKCkgcmVhZCBsb2dpYyBpcyBsaWtlIHRoaXM6DQo+ID4gPiA+ID4gDQo+ID4gPiA+ID4g
-cG1lbV9kb19idmVjKCkNCj4gPiA+ID4gPiDCoMKgwqDCoGlmIChpc19iYWRfcG1lbSgpKQ0KPiA+
-ID4gPiA+IMKgwqDCoMKgwqDCoMKgwqByZXR1cm4gLUVJTzsNCj4gPiA+ID4gPiDCoMKgwqDCoGVs
-c2UNCj4gPiA+ID4gPiDCoMKgwqDCoMKgwqDCoMKgbWVtY3B5X2Zyb21fcG1lbSgpOw0KPiA+ID4g
-PiA+IA0KPiA+ID4gPiA+IE5vdGUgbWVtY3B5X2Zyb21fcG1lbSgpIGlzIGNhbGxpbmcgbWVtY3B5
-X21jc2FmZSgpLiBEb2VzIHRoaXMNCj4gPiA+ID4gPiBpbXBseQ0KPiA+ID4gPiA+IHRoYXQgZXZl
-biBpZiBhIGJsb2NrIGlzIG5vdCBpbiB0aGUgYmFkYmxvY2sgbGlzdCwgaXQgc3RpbGwgY2FuDQo+
-ID4gPiA+ID4gYmUgYmFkDQo+ID4gPiA+ID4gYW5kIGNhdXNlcyBNQ0U/IERvZXMgdGhlIGJhZGJs
-b2NrIGxpc3QgZ2V0IGNoYW5nZWQgZHVyaW5nIGZpbGUNCj4gPiA+ID4gPiBzeXN0ZW0NCj4gPiA+
-ID4gPiBydW5uaW5nPyBJZiB0aGF0IGlzIHRoZSBjYXNlLCBzaG91bGQgdGhlIGZpbGUgc3lzdGVt
-IGdldCBhDQo+ID4gPiA+ID4gbm90aWZpY2F0aW9uIHdoZW4gaXQgZ2V0cyBjaGFuZ2VkPyBJZiBh
-IGJsb2NrIGlzIGdvb2Qgd2hlbiBJDQo+ID4gPiA+ID4gZmlyc3QNCj4gPiA+ID4gPiByZWFkIGl0
-LCBjYW4gSSBzdGlsbCB0cnVzdCBpdCB0byBiZSBnb29kIGZvciB0aGUgc2Vjb25kDQo+ID4gPiA+
-ID4gYWNjZXNzPw0KPiA+ID4gPiANCj4gPiA+ID4gWWVzLCBpZiBhIGJsb2NrIGlzIG5vdCBpbiB0
-aGUgYmFkYmxvY2tzIGxpc3QsIGl0IGNhbiBzdGlsbCBjYXVzZQ0KPiA+ID4gPiBhbg0KPiA+ID4g
-PiBNQ0UuIFRoaXMgaXMgdGhlIGxhdGVudCBlcnJvciBjYXNlIEkgZGVzY3JpYmVkIGFib3ZlLiBG
-b3IgYQ0KPiA+ID4gPiBzaW1wbGUgcmVhZCgpDQo+ID4gPiA+IHZpYSB0aGUgcG1lbSBkcml2ZXIs
-IHRoaXMgd2lsbCBnZXQgaGFuZGxlZCBieSBtZW1jcHlfbWNzYWZlLiBGb3INCj4gPiA+ID4gbW1h
-cCwNCj4gPiA+ID4gYW4gTUNFIGlzIGluZXZpdGFibGUuDQo+ID4gPiA+IA0KPiA+ID4gPiBZZXMg
-dGhlIGJhZGJsb2NrcyBsaXN0IG1heSBjaGFuZ2Ugd2hpbGUgYSBmaWxlc3lzdGVtIGlzIHJ1bm5p
-bmcuDQo+ID4gPiA+IFRoZSBSRkMNCj4gPiA+ID4gcGF0Y2hlc1sxXSBJIGxpbmtlZCB0byBhZGQg
-YSBub3RpZmljYXRpb24gZm9yIHRoZSBmaWxlc3lzdGVtDQo+ID4gPiA+IHdoZW4gdGhpcw0KPiA+
-ID4gPiBoYXBwZW5zLg0KPiA+ID4gPiANCj4gPiA+IA0KPiA+ID4gVGhpcyBpcyByZWFsbHkgYmFk
-IGFuZCBpdCBtYWtlcyBmaWxlIHN5c3RlbSBpbXBsZW1lbnRhdGlvbiBtdWNoDQo+ID4gPiBtb3Jl
-DQo+ID4gPiBjb21wbGljYXRlZC4gQW5kIGJhZGJsb2NrIG5vdGlmaWNhdGlvbiBkb2VzIG5vdCBo
-ZWxwIHZlcnkgbXVjaCwNCj4gPiA+IGJlY2F1c2UgYW55IGJsb2NrIGNhbiBiZSBiYWQgcG90ZW50
-aWFsbHksIG5vIG1hdHRlciBpdCBpcyBpbg0KPiA+ID4gYmFkYmxvY2sNCj4gPiA+IGxpc3Qgb3Ig
-bm90LiBBbmQgZmlsZSBzeXN0ZW0gaGFzIHRvIHBlcmZvcm0gY2hlY2tpbmcgZm9yIGV2ZXJ5DQo+
-ID4gPiByZWFkLA0KPiA+ID4gdXNpbmcgbWVtY3B5X21jc2FmZS4gVGhpcyBpcyBkaXNhc3RlciBm
-b3IgZmlsZSBzeXN0ZW0gbGlrZSBOT1ZBLA0KPiA+ID4gd2hpY2gNCj4gPiA+IHVzZXMgcG9pbnRl
-ciBkZS1yZWZlcmVuY2UgdG8gYWNjZXNzIGRhdGEgc3RydWN0dXJlcyBvbiBwbWVtLiBOb3cNCj4g
-PiA+IGlmIEkNCj4gPiA+IHdhbnQgdG8gcmVhZCBhIGZpZWxkIGluIGFuIGlub2RlIG9uIHBtZW0s
-IEkgaGF2ZSB0byBjb3B5IGl0IHRvDQo+ID4gPiBEUkFNDQo+ID4gPiBmaXJzdCBhbmQgbWFrZSBz
-dXJlIG1lbWNweV9tY3NhZmUoKSBkb2VzIG5vdCByZXBvcnQgYW55dGhpbmcNCj4gPiA+IHdyb25n
-Lg0KPiA+IA0KPiA+IFlvdSBoYXZlIGEgZ29vZCBwb2ludCwgYW5kIEkgZG9uJ3Qga25vdyBpZiBJ
-IGhhdmUgYW4gYW5zd2VyIGZvcg0KPiA+IHRoaXMuLg0KPiA+IEFzc3VtaW5nIGEgc3lzdGVtIHdp
-dGggTUNFIHJlY292ZXJ5LCBtYXliZSBOT1ZBIGNhbiBhZGQgYSBtY2UNCj4gPiBoYW5kbGVyDQo+
-ID4gc2ltaWxhciB0byBuZml0X2hhbmRsZV9tY2UoKSwgYW5kIGhhbmRsZSBlcnJvcnMgYXMgdGhl
-eSBoYXBwZW4sIGJ1dA0KPiA+IEknbQ0KPiA+IGJlaW5nIHZlcnkgaGFuZC13YXZleSBoZXJlIGFu
-ZCBkb24ndCBrbm93IGhvdyBtdWNoL2hvdyB3ZWxsIHRoYXQNCj4gPiBtaWdodA0KPiA+IHdvcmsu
-Lg0KPiA+IA0KPiA+ID4gDQo+ID4gPiA+IE5vLCBpZiB0aGUgbWVkaWEsIGZvciBzb21lIHJlYXNv
-biwgJ2R2ZWxvcHMnIGEgYmFkIGNlbGwsIGENCj4gPiA+ID4gc2Vjb25kDQo+ID4gPiA+IGNvbnNl
-Y3V0aXZlIHJlYWQgZG9lcyBoYXZlIGEgY2hhbmNlIG9mIGJlaW5nIGJhZC4gT25jZSBhDQo+ID4g
-PiA+IGxvY2F0aW9uIGhhcw0KPiA+ID4gPiBiZWVuIG1hcmtlZCBhcyBiYWQsIGl0IHdpbGwgc3Rh
-eSBiYWQgdGlsbCB0aGUgQUNQSSBjbGVhciBlcnJvcg0KPiA+ID4gPiAnRFNNJyBoYXMNCj4gPiA+
-ID4gYmVlbiBjYWxsZWQgdG8gbWFyayBpdCBhcyBjbGVhbi4NCj4gPiA+ID4gDQo+ID4gPiANCj4g
-PiA+IEkgd29uZGVyIHdoYXQgaGFwcGVucyB0byB3cml0ZSBpbiB0aGlzIGNhc2U/IElmIGEgYmxv
-Y2sgaXMgYmFkIGJ1dA0KPiA+ID4gbm90DQo+ID4gPiByZXBvcnRlZCBpbiBiYWRibG9jayBsaXN0
-LiBOb3cgSSB3cml0ZSB0byBpdCB3aXRob3V0IHJlYWRpbmcNCj4gPiA+IGZpcnN0LiBEbw0KPiA+
-ID4gSSBjbGVhciB0aGUgcG9pc29uIHdpdGggdGhlIHdyaXRlPyBPciBzdGlsbCByZXF1aXJlIGEg
-QUNQSSBEU00/DQo+ID4gDQo+ID4gV2l0aCB3cml0ZXMsIG15IHVuZGVyc3RhbmRpbmcgaXMgdGhl
-cmUgaXMgc3RpbGwgYSBwb3NzaWJpbGl0eSB0aGF0DQo+ID4gYW4NCj4gPiBpbnRlcm5hbCByZWFk
-LW1vZGlmeS13cml0ZSBjYW4gaGFwcGVuLCBhbmQgY2F1c2UgYSBNQ0UgKHRoaXMgaXMgdGhlDQo+
-ID4gc2FtZQ0KPiA+IGFzIHdyaXRpbmcgdG8gYSBiYWQgRFJBTSBjZWxsLCB3aGljaCBjYW4gYWxz
-byBjYXVzZSBhbiBNQ0UpLiBZb3UNCj4gPiBjYW4ndA0KPiA+IHJlYWxseSB1c2UgdGhlIEFDUEkg
-RFNNIHByZWVtcHRpdmVseSBiZWNhdXNlIHlvdSBkb24ndCBrbm93IHdoZXRoZXINCj4gPiB0aGUN
-Cj4gPiBsb2NhdGlvbiB3YXMgYmFkLiBUaGUgZXJyb3IgZmxvdyB3aWxsIGJlIHNvbWV0aGluZyBs
-aWtlIHdyaXRlIGNhdXNlcw0KPiA+IHRoZQ0KPiA+IE1DRSwgYSBiYWRibG9jayBnZXRzIGFkZGVk
-IChlaXRoZXIgdGhyb3VnaCB0aGUgbWNlIGhhbmRsZXIgb3IgYWZ0ZXINCj4gPiB0aGUNCj4gPiBu
-ZXh0IHJlYm9vdCksIGFuZCB0aGUgcmVjb3ZlcnkgcGF0aCBpcyBub3cgdGhlIHNhbWUgYXMgYSBy
-ZWd1bGFyDQo+ID4gYmFkYmxvY2suDQo+ID4gDQo+IA0KPiBUaGlzIGlzIGRpZmZlcmVudCBmcm9t
-IG15IHVuZGVyc3RhbmRpbmcuIFJpZ2h0IG5vdyB3cml0ZV9wbWVtKCkgaW4NCj4gcG1lbV9kb19i
-dmVjKCkgZG9lcyBub3QgdXNlIG1lbWNweV9tY3NhZmUoKS4gSWYgdGhlIGJsb2NrIGlzIGJhZCBp
-dA0KPiBjbGVhcnMgcG9pc29uIGFuZCB3cml0ZXMgdG8gcG1lbSBhZ2Fpbi4gU2VlbXMgdG8gbWUg
-d3JpdGluZyB0byBiYWQNCj4gYmxvY2tzIGRvZXMgbm90IGNhdXNlIE1DRS4gRG8gd2UgbmVlZCBt
-ZW1jcHlfbWNzYWZlIGZvciBwbWVtIHN0b3Jlcz8NCg0KWW91IGFyZSByaWdodCwgd3JpdGVzIGRv
-bid0IHVzZSBtZW1jcHlfbWNzYWZlLCBhbmQgd2lsbCBub3QgZGlyZWN0bHkNCmNhdXNlIGFuIE1D
-RS4gSG93ZXZlciBhIHdyaXRlIGNhbiBjYXVzZSBhbiBhc3luY2hyb25vdXMgJ0NNQ0knIC0NCmNv
-cnJlY3RlZCBtYWNoaW5lIGNoZWNrIGludGVycnVwdCwgYnV0IHRoaXMgaXMgbm90IGNyaXRpY2Fs
-LCBhbmQgd29udCBiZQ0KYSBtZW1vcnkgZXJyb3IgYXMgdGhlIGNvcmUgZGlkbid0IGNvbnN1bWUg
-cG9pc29uLiBtZW1jcHlfbWNzYWZlIGNhbm5vdA0KcHJvdGVjdCBhZ2FpbnN0IHRoaXMgYmVjYXVz
-ZSB0aGUgd3JpdGUgaXMgJ3Bvc3RlZCcgYW5kIHRoZSBDTUNJIGlzIG5vdA0Kc3luY2hyb25vdXMu
-IE5vdGUgdGhhdCB0aGlzIGlzIG9ubHkgaW4gdGhlIGxhdGVudCBlcnJvciBvciBtZW1tYXAtc3Rv
-cmUNCmNhc2UuDQoNCj4gDQo+IFRoYW5rcywNCj4gQW5kaXJ5DQo+IA0KPiA+ID4gDQo+ID4gPiA+
-IFsxXTogaHR0cDovL3d3dy5saW51eC5zZ2kuY29tL2FyY2hpdmVzL3hmcy8yMDE2LTA2L21zZzAw
-Mjk5Lmh0bWwNCj4gPiA+ID4gDQo+ID4gPiANCj4gPiA+IFRoYW5rIHlvdSBmb3IgdGhlIHBhdGNo
-c2V0LiBJIHdpbGwgbG9vayBpbnRvIGl0Lg0KPiA+ID4g
+On Tue, 2017-01-17 at 17:58 -0800, Andiry Xu wrote:
+> On Tue, Jan 17, 2017 at 3:51 PM, Vishal Verma <vishal.l.verma@intel.co
+> m> wrote:
+> > On 01/17, Andiry Xu wrote:
+> > 
+> > <snip>
+> > 
+> > > > > 
+> > > > > The pmem_do_bvec() read logic is like this:
+> > > > > 
+> > > > > pmem_do_bvec()
+> > > > >     if (is_bad_pmem())
+> > > > >         return -EIO;
+> > > > >     else
+> > > > >         memcpy_from_pmem();
+> > > > > 
+> > > > > Note memcpy_from_pmem() is calling memcpy_mcsafe(). Does this
+> > > > > imply
+> > > > > that even if a block is not in the badblock list, it still can
+> > > > > be bad
+> > > > > and causes MCE? Does the badblock list get changed during file
+> > > > > system
+> > > > > running? If that is the case, should the file system get a
+> > > > > notification when it gets changed? If a block is good when I
+> > > > > first
+> > > > > read it, can I still trust it to be good for the second
+> > > > > access?
+> > > > 
+> > > > Yes, if a block is not in the badblocks list, it can still cause
+> > > > an
+> > > > MCE. This is the latent error case I described above. For a
+> > > > simple read()
+> > > > via the pmem driver, this will get handled by memcpy_mcsafe. For
+> > > > mmap,
+> > > > an MCE is inevitable.
+> > > > 
+> > > > Yes the badblocks list may change while a filesystem is running.
+> > > > The RFC
+> > > > patches[1] I linked to add a notification for the filesystem
+> > > > when this
+> > > > happens.
+> > > > 
+> > > 
+> > > This is really bad and it makes file system implementation much
+> > > more
+> > > complicated. And badblock notification does not help very much,
+> > > because any block can be bad potentially, no matter it is in
+> > > badblock
+> > > list or not. And file system has to perform checking for every
+> > > read,
+> > > using memcpy_mcsafe. This is disaster for file system like NOVA,
+> > > which
+> > > uses pointer de-reference to access data structures on pmem. Now
+> > > if I
+> > > want to read a field in an inode on pmem, I have to copy it to
+> > > DRAM
+> > > first and make sure memcpy_mcsafe() does not report anything
+> > > wrong.
+> > 
+> > You have a good point, and I don't know if I have an answer for
+> > this..
+> > Assuming a system with MCE recovery, maybe NOVA can add a mce
+> > handler
+> > similar to nfit_handle_mce(), and handle errors as they happen, but
+> > I'm
+> > being very hand-wavey here and don't know how much/how well that
+> > might
+> > work..
+> > 
+> > > 
+> > > > No, if the media, for some reason, 'dvelops' a bad cell, a
+> > > > second
+> > > > consecutive read does have a chance of being bad. Once a
+> > > > location has
+> > > > been marked as bad, it will stay bad till the ACPI clear error
+> > > > 'DSM' has
+> > > > been called to mark it as clean.
+> > > > 
+> > > 
+> > > I wonder what happens to write in this case? If a block is bad but
+> > > not
+> > > reported in badblock list. Now I write to it without reading
+> > > first. Do
+> > > I clear the poison with the write? Or still require a ACPI DSM?
+> > 
+> > With writes, my understanding is there is still a possibility that
+> > an
+> > internal read-modify-write can happen, and cause a MCE (this is the
+> > same
+> > as writing to a bad DRAM cell, which can also cause an MCE). You
+> > can't
+> > really use the ACPI DSM preemptively because you don't know whether
+> > the
+> > location was bad. The error flow will be something like write causes
+> > the
+> > MCE, a badblock gets added (either through the mce handler or after
+> > the
+> > next reboot), and the recovery path is now the same as a regular
+> > badblock.
+> > 
+> 
+> This is different from my understanding. Right now write_pmem() in
+> pmem_do_bvec() does not use memcpy_mcsafe(). If the block is bad it
+> clears poison and writes to pmem again. Seems to me writing to bad
+> blocks does not cause MCE. Do we need memcpy_mcsafe for pmem stores?
+
+You are right, writes don't use memcpy_mcsafe, and will not directly
+cause an MCE. However a write can cause an asynchronous 'CMCI' -
+corrected machine check interrupt, but this is not critical, and wont be
+a memory error as the core didn't consume poison. memcpy_mcsafe cannot
+protect against this because the write is 'posted' and the CMCI is not
+synchronous. Note that this is only in the latent error or memmap-store
+case.
+
+> 
+> Thanks,
+> Andiry
+> 
+> > > 
+> > > > [1]: http://www.linux.sgi.com/archives/xfs/2016-06/msg00299.html
+> > > > 
+> > > 
+> > > Thank you for the patchset. I will look into it.
+> > > 
+_______________________________________________
+Linux-nvdimm mailing list
+Linux-nvdimm@lists.01.org
+https://lists.01.org/mailman/listinfo/linux-nvdimm
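The removed lines in the hunk above appear to be the base64 transfer-encoded form of the added lines: copy `a` of the message stored its body encoded, while copy N1 stored it as plain text. A minimal Python sketch, using only the standard library, decodes the first removed line to check this. The variable names are illustrative and not part of the archive.

```python
import base64

# First removed line of the hunk above (base64-encoded body of copy "a").
encoded_line = "T24gVHVlLCAyMDE3LTAxLTE3IGF0IDE3OjU4IC0wODAwLCBBbmRpcnkgWHUgd3JvdGU6DQo+IE9u"

# Decode to bytes, then to text (the body is UTF-8).
decoded = base64.b64decode(encoded_line).decode("utf-8")

# The encoded copy uses CRLF line endings, so normalise them before splitting.
lines = decoded.replace("\r\n", "\n").split("\n")
print(lines)
```

The first decoded line matches the first added line of the hunk, and the second is the start of the next quoted line, which continues in the following base64 line.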
diff --git a/a/content_digest b/N1/content_digest
index ab6185c..ab3a720 100644
--- a/a/content_digest
+++ b/N1/content_digest
@@ -9,99 +9,147 @@
  "ref\0CAOvWMLYcP9PN6LT51gwJvmyCTfRRrVeDTrjN-8_zTKhD+UmDiw@mail.gmail.com\0"
  "ref\020170117235150.GE4880@omniknight.lm.intel.com\0"
  "ref\0CAOvWMLZCt39EDg-1uppVVUeRG40JvOo9sKLY2XMuynZdnc0W9w@mail.gmail.com\0"
- "From\0Verma, Vishal L <vishal.l.verma@intel.com>\0"
+ "ref\0CAOvWMLZCt39EDg-1uppVVUeRG40JvOo9sKLY2XMuynZdnc0W9w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org\0"
+ "From\0Verma, Vishal L <vishal.l.verma-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>\0"
  "Subject\0Re: [LSF/MM TOPIC] Badblocks checking/representation in filesystems\0"
  "Date\0Fri, 20 Jan 2017 00:32:14 +0000\0"
- "To\0andiry@gmail.com <andiry@gmail.com>\0"
- "Cc\0darrick.wong@oracle.com <darrick.wong@oracle.com>"
-  Vyacheslav.Dubeyko@wdc.com <Vyacheslav.Dubeyko@wdc.com>
-  linux-block@vger.kernel.org <linux-block@vger.kernel.org>
-  slava@dubeyko.com <slava@dubeyko.com>
-  lsf-pc@lists.linux-foundation.org <lsf-pc@lists.linux-foundation.org>
-  linux-nvdimm@ml01.01.org <linux-nvdimm@ml01.01.org>
- " linux-fsdevel@vger.kernel.org <linux-fsdevel@vger.kernel.org>\0"
+ "To\0andiry-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org <andiry-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>\0"
+ "Cc\0Vyacheslav.Dubeyko-Sjgp3cTcYWE@public.gmane.org <Vyacheslav.Dubeyko-Sjgp3cTcYWE@public.gmane.org>"
+  darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org <darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
+  linux-nvdimm-y27Ovi1pjclAfugRpC6u6w@public.gmane.org <linux-nvdimm-y27Ovi1pjclAfugRpC6u6w@public.gmane.org>
+  linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org <linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
+  slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
+  linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org <linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
+ " lsf-pc-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org <lsf-pc-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>\0"
  "\00:1\0"
  "b\0"
- "T24gVHVlLCAyMDE3LTAxLTE3IGF0IDE3OjU4IC0wODAwLCBBbmRpcnkgWHUgd3JvdGU6DQo+IE9u\n"
- "IFR1ZSwgSmFuIDE3LCAyMDE3IGF0IDM6NTEgUE0sIFZpc2hhbCBWZXJtYSA8dmlzaGFsLmwudmVy\n"
- "bWFAaW50ZWwuY28NCj4gbT4gd3JvdGU6DQo+ID4gT24gMDEvMTcsIEFuZGlyeSBYdSB3cm90ZToN\n"
- "Cj4gPiANCj4gPiA8c25pcD4NCj4gPiANCj4gPiA+ID4gPiANCj4gPiA+ID4gPiBUaGUgcG1lbV9k\n"
- "b19idmVjKCkgcmVhZCBsb2dpYyBpcyBsaWtlIHRoaXM6DQo+ID4gPiA+ID4gDQo+ID4gPiA+ID4g\n"
- "cG1lbV9kb19idmVjKCkNCj4gPiA+ID4gPiDCoMKgwqDCoGlmIChpc19iYWRfcG1lbSgpKQ0KPiA+\n"
- "ID4gPiA+IMKgwqDCoMKgwqDCoMKgwqByZXR1cm4gLUVJTzsNCj4gPiA+ID4gPiDCoMKgwqDCoGVs\n"
- "c2UNCj4gPiA+ID4gPiDCoMKgwqDCoMKgwqDCoMKgbWVtY3B5X2Zyb21fcG1lbSgpOw0KPiA+ID4g\n"
- "PiA+IA0KPiA+ID4gPiA+IE5vdGUgbWVtY3B5X2Zyb21fcG1lbSgpIGlzIGNhbGxpbmcgbWVtY3B5\n"
- "X21jc2FmZSgpLiBEb2VzIHRoaXMNCj4gPiA+ID4gPiBpbXBseQ0KPiA+ID4gPiA+IHRoYXQgZXZl\n"
- "biBpZiBhIGJsb2NrIGlzIG5vdCBpbiB0aGUgYmFkYmxvY2sgbGlzdCwgaXQgc3RpbGwgY2FuDQo+\n"
- "ID4gPiA+ID4gYmUgYmFkDQo+ID4gPiA+ID4gYW5kIGNhdXNlcyBNQ0U/IERvZXMgdGhlIGJhZGJs\n"
- "b2NrIGxpc3QgZ2V0IGNoYW5nZWQgZHVyaW5nIGZpbGUNCj4gPiA+ID4gPiBzeXN0ZW0NCj4gPiA+\n"
- "ID4gPiBydW5uaW5nPyBJZiB0aGF0IGlzIHRoZSBjYXNlLCBzaG91bGQgdGhlIGZpbGUgc3lzdGVt\n"
- "IGdldCBhDQo+ID4gPiA+ID4gbm90aWZpY2F0aW9uIHdoZW4gaXQgZ2V0cyBjaGFuZ2VkPyBJZiBh\n"
- "IGJsb2NrIGlzIGdvb2Qgd2hlbiBJDQo+ID4gPiA+ID4gZmlyc3QNCj4gPiA+ID4gPiByZWFkIGl0\n"
- "LCBjYW4gSSBzdGlsbCB0cnVzdCBpdCB0byBiZSBnb29kIGZvciB0aGUgc2Vjb25kDQo+ID4gPiA+\n"
- "ID4gYWNjZXNzPw0KPiA+ID4gPiANCj4gPiA+ID4gWWVzLCBpZiBhIGJsb2NrIGlzIG5vdCBpbiB0\n"
- "aGUgYmFkYmxvY2tzIGxpc3QsIGl0IGNhbiBzdGlsbCBjYXVzZQ0KPiA+ID4gPiBhbg0KPiA+ID4g\n"
- "PiBNQ0UuIFRoaXMgaXMgdGhlIGxhdGVudCBlcnJvciBjYXNlIEkgZGVzY3JpYmVkIGFib3ZlLiBG\n"
- "b3IgYQ0KPiA+ID4gPiBzaW1wbGUgcmVhZCgpDQo+ID4gPiA+IHZpYSB0aGUgcG1lbSBkcml2ZXIs\n"
- "IHRoaXMgd2lsbCBnZXQgaGFuZGxlZCBieSBtZW1jcHlfbWNzYWZlLiBGb3INCj4gPiA+ID4gbW1h\n"
- "cCwNCj4gPiA+ID4gYW4gTUNFIGlzIGluZXZpdGFibGUuDQo+ID4gPiA+IA0KPiA+ID4gPiBZZXMg\n"
- "dGhlIGJhZGJsb2NrcyBsaXN0IG1heSBjaGFuZ2Ugd2hpbGUgYSBmaWxlc3lzdGVtIGlzIHJ1bm5p\n"
- "bmcuDQo+ID4gPiA+IFRoZSBSRkMNCj4gPiA+ID4gcGF0Y2hlc1sxXSBJIGxpbmtlZCB0byBhZGQg\n"
- "YSBub3RpZmljYXRpb24gZm9yIHRoZSBmaWxlc3lzdGVtDQo+ID4gPiA+IHdoZW4gdGhpcw0KPiA+\n"
- "ID4gPiBoYXBwZW5zLg0KPiA+ID4gPiANCj4gPiA+IA0KPiA+ID4gVGhpcyBpcyByZWFsbHkgYmFk\n"
- "IGFuZCBpdCBtYWtlcyBmaWxlIHN5c3RlbSBpbXBsZW1lbnRhdGlvbiBtdWNoDQo+ID4gPiBtb3Jl\n"
- "DQo+ID4gPiBjb21wbGljYXRlZC4gQW5kIGJhZGJsb2NrIG5vdGlmaWNhdGlvbiBkb2VzIG5vdCBo\n"
- "ZWxwIHZlcnkgbXVjaCwNCj4gPiA+IGJlY2F1c2UgYW55IGJsb2NrIGNhbiBiZSBiYWQgcG90ZW50\n"
- "aWFsbHksIG5vIG1hdHRlciBpdCBpcyBpbg0KPiA+ID4gYmFkYmxvY2sNCj4gPiA+IGxpc3Qgb3Ig\n"
- "bm90LiBBbmQgZmlsZSBzeXN0ZW0gaGFzIHRvIHBlcmZvcm0gY2hlY2tpbmcgZm9yIGV2ZXJ5DQo+\n"
- "ID4gPiByZWFkLA0KPiA+ID4gdXNpbmcgbWVtY3B5X21jc2FmZS4gVGhpcyBpcyBkaXNhc3RlciBm\n"
- "b3IgZmlsZSBzeXN0ZW0gbGlrZSBOT1ZBLA0KPiA+ID4gd2hpY2gNCj4gPiA+IHVzZXMgcG9pbnRl\n"
- "ciBkZS1yZWZlcmVuY2UgdG8gYWNjZXNzIGRhdGEgc3RydWN0dXJlcyBvbiBwbWVtLiBOb3cNCj4g\n"
- "PiA+IGlmIEkNCj4gPiA+IHdhbnQgdG8gcmVhZCBhIGZpZWxkIGluIGFuIGlub2RlIG9uIHBtZW0s\n"
- "IEkgaGF2ZSB0byBjb3B5IGl0IHRvDQo+ID4gPiBEUkFNDQo+ID4gPiBmaXJzdCBhbmQgbWFrZSBz\n"
- "dXJlIG1lbWNweV9tY3NhZmUoKSBkb2VzIG5vdCByZXBvcnQgYW55dGhpbmcNCj4gPiA+IHdyb25n\n"
- "Lg0KPiA+IA0KPiA+IFlvdSBoYXZlIGEgZ29vZCBwb2ludCwgYW5kIEkgZG9uJ3Qga25vdyBpZiBJ\n"
- "IGhhdmUgYW4gYW5zd2VyIGZvcg0KPiA+IHRoaXMuLg0KPiA+IEFzc3VtaW5nIGEgc3lzdGVtIHdp\n"
- "dGggTUNFIHJlY292ZXJ5LCBtYXliZSBOT1ZBIGNhbiBhZGQgYSBtY2UNCj4gPiBoYW5kbGVyDQo+\n"
- "ID4gc2ltaWxhciB0byBuZml0X2hhbmRsZV9tY2UoKSwgYW5kIGhhbmRsZSBlcnJvcnMgYXMgdGhl\n"
- "eSBoYXBwZW4sIGJ1dA0KPiA+IEknbQ0KPiA+IGJlaW5nIHZlcnkgaGFuZC13YXZleSBoZXJlIGFu\n"
- "ZCBkb24ndCBrbm93IGhvdyBtdWNoL2hvdyB3ZWxsIHRoYXQNCj4gPiBtaWdodA0KPiA+IHdvcmsu\n"
- "Lg0KPiA+IA0KPiA+ID4gDQo+ID4gPiA+IE5vLCBpZiB0aGUgbWVkaWEsIGZvciBzb21lIHJlYXNv\n"
- "biwgJ2R2ZWxvcHMnIGEgYmFkIGNlbGwsIGENCj4gPiA+ID4gc2Vjb25kDQo+ID4gPiA+IGNvbnNl\n"
- "Y3V0aXZlIHJlYWQgZG9lcyBoYXZlIGEgY2hhbmNlIG9mIGJlaW5nIGJhZC4gT25jZSBhDQo+ID4g\n"
- "PiA+IGxvY2F0aW9uIGhhcw0KPiA+ID4gPiBiZWVuIG1hcmtlZCBhcyBiYWQsIGl0IHdpbGwgc3Rh\n"
- "eSBiYWQgdGlsbCB0aGUgQUNQSSBjbGVhciBlcnJvcg0KPiA+ID4gPiAnRFNNJyBoYXMNCj4gPiA+\n"
- "ID4gYmVlbiBjYWxsZWQgdG8gbWFyayBpdCBhcyBjbGVhbi4NCj4gPiA+ID4gDQo+ID4gPiANCj4g\n"
- "PiA+IEkgd29uZGVyIHdoYXQgaGFwcGVucyB0byB3cml0ZSBpbiB0aGlzIGNhc2U/IElmIGEgYmxv\n"
- "Y2sgaXMgYmFkIGJ1dA0KPiA+ID4gbm90DQo+ID4gPiByZXBvcnRlZCBpbiBiYWRibG9jayBsaXN0\n"
- "LiBOb3cgSSB3cml0ZSB0byBpdCB3aXRob3V0IHJlYWRpbmcNCj4gPiA+IGZpcnN0LiBEbw0KPiA+\n"
- "ID4gSSBjbGVhciB0aGUgcG9pc29uIHdpdGggdGhlIHdyaXRlPyBPciBzdGlsbCByZXF1aXJlIGEg\n"
- "QUNQSSBEU00/DQo+ID4gDQo+ID4gV2l0aCB3cml0ZXMsIG15IHVuZGVyc3RhbmRpbmcgaXMgdGhl\n"
- "cmUgaXMgc3RpbGwgYSBwb3NzaWJpbGl0eSB0aGF0DQo+ID4gYW4NCj4gPiBpbnRlcm5hbCByZWFk\n"
- "LW1vZGlmeS13cml0ZSBjYW4gaGFwcGVuLCBhbmQgY2F1c2UgYSBNQ0UgKHRoaXMgaXMgdGhlDQo+\n"
- "ID4gc2FtZQ0KPiA+IGFzIHdyaXRpbmcgdG8gYSBiYWQgRFJBTSBjZWxsLCB3aGljaCBjYW4gYWxz\n"
- "byBjYXVzZSBhbiBNQ0UpLiBZb3UNCj4gPiBjYW4ndA0KPiA+IHJlYWxseSB1c2UgdGhlIEFDUEkg\n"
- "RFNNIHByZWVtcHRpdmVseSBiZWNhdXNlIHlvdSBkb24ndCBrbm93IHdoZXRoZXINCj4gPiB0aGUN\n"
- "Cj4gPiBsb2NhdGlvbiB3YXMgYmFkLiBUaGUgZXJyb3IgZmxvdyB3aWxsIGJlIHNvbWV0aGluZyBs\n"
- "aWtlIHdyaXRlIGNhdXNlcw0KPiA+IHRoZQ0KPiA+IE1DRSwgYSBiYWRibG9jayBnZXRzIGFkZGVk\n"
- "IChlaXRoZXIgdGhyb3VnaCB0aGUgbWNlIGhhbmRsZXIgb3IgYWZ0ZXINCj4gPiB0aGUNCj4gPiBu\n"
- "ZXh0IHJlYm9vdCksIGFuZCB0aGUgcmVjb3ZlcnkgcGF0aCBpcyBub3cgdGhlIHNhbWUgYXMgYSBy\n"
- "ZWd1bGFyDQo+ID4gYmFkYmxvY2suDQo+ID4gDQo+IA0KPiBUaGlzIGlzIGRpZmZlcmVudCBmcm9t\n"
- "IG15IHVuZGVyc3RhbmRpbmcuIFJpZ2h0IG5vdyB3cml0ZV9wbWVtKCkgaW4NCj4gcG1lbV9kb19i\n"
- "dmVjKCkgZG9lcyBub3QgdXNlIG1lbWNweV9tY3NhZmUoKS4gSWYgdGhlIGJsb2NrIGlzIGJhZCBp\n"
- "dA0KPiBjbGVhcnMgcG9pc29uIGFuZCB3cml0ZXMgdG8gcG1lbSBhZ2Fpbi4gU2VlbXMgdG8gbWUg\n"
- "d3JpdGluZyB0byBiYWQNCj4gYmxvY2tzIGRvZXMgbm90IGNhdXNlIE1DRS4gRG8gd2UgbmVlZCBt\n"
- "ZW1jcHlfbWNzYWZlIGZvciBwbWVtIHN0b3Jlcz8NCg0KWW91IGFyZSByaWdodCwgd3JpdGVzIGRv\n"
- "bid0IHVzZSBtZW1jcHlfbWNzYWZlLCBhbmQgd2lsbCBub3QgZGlyZWN0bHkNCmNhdXNlIGFuIE1D\n"
- "RS4gSG93ZXZlciBhIHdyaXRlIGNhbiBjYXVzZSBhbiBhc3luY2hyb25vdXMgJ0NNQ0knIC0NCmNv\n"
- "cnJlY3RlZCBtYWNoaW5lIGNoZWNrIGludGVycnVwdCwgYnV0IHRoaXMgaXMgbm90IGNyaXRpY2Fs\n"
- "LCBhbmQgd29udCBiZQ0KYSBtZW1vcnkgZXJyb3IgYXMgdGhlIGNvcmUgZGlkbid0IGNvbnN1bWUg\n"
- "cG9pc29uLiBtZW1jcHlfbWNzYWZlIGNhbm5vdA0KcHJvdGVjdCBhZ2FpbnN0IHRoaXMgYmVjYXVz\n"
- "ZSB0aGUgd3JpdGUgaXMgJ3Bvc3RlZCcgYW5kIHRoZSBDTUNJIGlzIG5vdA0Kc3luY2hyb25vdXMu\n"
- "IE5vdGUgdGhhdCB0aGlzIGlzIG9ubHkgaW4gdGhlIGxhdGVudCBlcnJvciBvciBtZW1tYXAtc3Rv\n"
- "cmUNCmNhc2UuDQoNCj4gDQo+IFRoYW5rcywNCj4gQW5kaXJ5DQo+IA0KPiA+ID4gDQo+ID4gPiA+\n"
- "IFsxXTogaHR0cDovL3d3dy5saW51eC5zZ2kuY29tL2FyY2hpdmVzL3hmcy8yMDE2LTA2L21zZzAw\n"
- "Mjk5Lmh0bWwNCj4gPiA+ID4gDQo+ID4gPiANCj4gPiA+IFRoYW5rIHlvdSBmb3IgdGhlIHBhdGNo\n"
- c2V0LiBJIHdpbGwgbG9vayBpbnRvIGl0Lg0KPiA+ID4g
+ "On Tue, 2017-01-17 at 17:58 -0800, Andiry Xu wrote:\n"
+ "> On Tue, Jan 17, 2017 at 3:51 PM, Vishal Verma <vishal.l.verma@intel.co\n"
+ "> m> wrote:\n"
+ "> > On 01/17, Andiry Xu wrote:\n"
+ "> > \n"
+ "> > <snip>\n"
+ "> > \n"
+ "> > > > > \n"
+ "> > > > > The pmem_do_bvec() read logic is like this:\n"
+ "> > > > > \n"
+ "> > > > > pmem_do_bvec()\n"
+ "> > > > > \302\240\302\240\302\240\302\240if (is_bad_pmem())\n"
+ "> > > > > \302\240\302\240\302\240\302\240\302\240\302\240\302\240\302\240return -EIO;\n"
+ "> > > > > \302\240\302\240\302\240\302\240else\n"
+ "> > > > > \302\240\302\240\302\240\302\240\302\240\302\240\302\240\302\240memcpy_from_pmem();\n"
+ "> > > > > \n"
+ "> > > > > Note memcpy_from_pmem() is calling memcpy_mcsafe(). Does this\n"
+ "> > > > > imply\n"
+ "> > > > > that even if a block is not in the badblock list, it still can\n"
+ "> > > > > be bad\n"
+ "> > > > > and causes MCE? Does the badblock list get changed during file\n"
+ "> > > > > system\n"
+ "> > > > > running? If that is the case, should the file system get a\n"
+ "> > > > > notification when it gets changed? If a block is good when I\n"
+ "> > > > > first\n"
+ "> > > > > read it, can I still trust it to be good for the second\n"
+ "> > > > > access?\n"
+ "> > > > \n"
+ "> > > > Yes, if a block is not in the badblocks list, it can still cause\n"
+ "> > > > an\n"
+ "> > > > MCE. This is the latent error case I described above. For a\n"
+ "> > > > simple read()\n"
+ "> > > > via the pmem driver, this will get handled by memcpy_mcsafe. For\n"
+ "> > > > mmap,\n"
+ "> > > > an MCE is inevitable.\n"
+ "> > > > \n"
+ "> > > > Yes the badblocks list may change while a filesystem is running.\n"
+ "> > > > The RFC\n"
+ "> > > > patches[1] I linked to add a notification for the filesystem\n"
+ "> > > > when this\n"
+ "> > > > happens.\n"
+ "> > > > \n"
+ "> > > \n"
+ "> > > This is really bad and it makes file system implementation much\n"
+ "> > > more\n"
+ "> > > complicated. And badblock notification does not help very much,\n"
+ "> > > because any block can be bad potentially, no matter it is in\n"
+ "> > > badblock\n"
+ "> > > list or not. And file system has to perform checking for every\n"
+ "> > > read,\n"
+ "> > > using memcpy_mcsafe. This is disaster for file system like NOVA,\n"
+ "> > > which\n"
+ "> > > uses pointer de-reference to access data structures on pmem. Now\n"
+ "> > > if I\n"
+ "> > > want to read a field in an inode on pmem, I have to copy it to\n"
+ "> > > DRAM\n"
+ "> > > first and make sure memcpy_mcsafe() does not report anything\n"
+ "> > > wrong.\n"
+ "> > \n"
+ "> > You have a good point, and I don't know if I have an answer for\n"
+ "> > this..\n"
+ "> > Assuming a system with MCE recovery, maybe NOVA can add a mce\n"
+ "> > handler\n"
+ "> > similar to nfit_handle_mce(), and handle errors as they happen, but\n"
+ "> > I'm\n"
+ "> > being very hand-wavey here and don't know how much/how well that\n"
+ "> > might\n"
+ "> > work..\n"
+ "> > \n"
+ "> > > \n"
+ "> > > > No, if the media, for some reason, 'dvelops' a bad cell, a\n"
+ "> > > > second\n"
+ "> > > > consecutive read does have a chance of being bad. Once a\n"
+ "> > > > location has\n"
+ "> > > > been marked as bad, it will stay bad till the ACPI clear error\n"
+ "> > > > 'DSM' has\n"
+ "> > > > been called to mark it as clean.\n"
+ "> > > > \n"
+ "> > > \n"
+ "> > > I wonder what happens to write in this case? If a block is bad but\n"
+ "> > > not\n"
+ "> > > reported in badblock list. Now I write to it without reading\n"
+ "> > > first. Do\n"
+ "> > > I clear the poison with the write? Or still require a ACPI DSM?\n"
+ "> > \n"
+ "> > With writes, my understanding is there is still a possibility that\n"
+ "> > an\n"
+ "> > internal read-modify-write can happen, and cause a MCE (this is the\n"
+ "> > same\n"
+ "> > as writing to a bad DRAM cell, which can also cause an MCE). You\n"
+ "> > can't\n"
+ "> > really use the ACPI DSM preemptively because you don't know whether\n"
+ "> > the\n"
+ "> > location was bad. The error flow will be something like write causes\n"
+ "> > the\n"
+ "> > MCE, a badblock gets added (either through the mce handler or after\n"
+ "> > the\n"
+ "> > next reboot), and the recovery path is now the same as a regular\n"
+ "> > badblock.\n"
+ "> > \n"
+ "> \n"
+ "> This is different from my understanding. Right now write_pmem() in\n"
+ "> pmem_do_bvec() does not use memcpy_mcsafe(). If the block is bad it\n"
+ "> clears poison and writes to pmem again. Seems to me writing to bad\n"
+ "> blocks does not cause MCE. Do we need memcpy_mcsafe for pmem stores?\n"
+ "\n"
+ "You are right, writes don't use memcpy_mcsafe, and will not directly\n"
+ "cause an MCE. However a write can cause an asynchronous 'CMCI' -\n"
+ "corrected machine check interrupt, but this is not critical, and wont be\n"
+ "a memory error as the core didn't consume poison. memcpy_mcsafe cannot\n"
+ "protect against this because the write is 'posted' and the CMCI is not\n"
+ "synchronous. Note that this is only in the latent error or memmap-store\n"
+ "case.\n"
+ "\n"
+ "> \n"
+ "> Thanks,\n"
+ "> Andiry\n"
+ "> \n"
+ "> > > \n"
+ "> > > > [1]: http://www.linux.sgi.com/archives/xfs/2016-06/msg00299.html\n"
+ "> > > > \n"
+ "> > > \n"
+ "> > > Thank you for the patchset. I will look into it.\n"
+ "> > > \n"
+ "_______________________________________________\n"
+ "Linux-nvdimm mailing list\n"
+ "Linux-nvdimm@lists.01.org\n"
+ https://lists.01.org/mailman/listinfo/linux-nvdimm
 
-f2d10fa7320b827160b9f1c61dcc0c2599df8fbf66603ef83422cab07af32ccc
+524345bf3bb68c65aea5f176e62ba4b2ec7bd9dbb66a9accc036a9cea4f224a4
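The read path quoted in the message body above can be modelled in a few lines. This is a toy Python model, not kernel code: `ToyPmem`, `latent_poison`, and `read_sector` are hypothetical names invented for illustration, and the machine-check-safe copy is simulated by raising an exception. It illustrates only the point made in the thread, that a sector absent from the badblocks list can still fail on read.

```python
import errno


class ToyPmem:
    """Toy model of a pmem device; every name here is hypothetical."""

    def __init__(self, sectors):
        self.sectors = list(sectors)   # sector payloads
        self.badblocks = set()         # known-bad sectors (the badblocks list)
        self.latent_poison = set()     # bad sectors not yet discovered

    def read_sector(self, n):
        # Known bad block: fail early, mirroring the is_bad_pmem() check.
        if n in self.badblocks:
            raise OSError(errno.EIO, "sector is in the badblocks list")
        # Latent error: the copy itself fails. In the thread, this is the case
        # that memcpy_mcsafe() handles for a read() through the pmem driver.
        if n in self.latent_poison:
            raise OSError(errno.EIO, "latent media error during copy")
        return self.sectors[n]


dev = ToyPmem([b"inode", b"data", b"log"])
dev.badblocks.add(1)       # sector 1 is already listed as bad
dev.latent_poison.add(2)   # sector 2 is bad but not yet listed

for n in range(3):
    try:
        print(n, dev.read_sector(n))
    except OSError as err:
        print(n, "EIO:", err.strerror)
```

In this model a caller cannot tell sector 2 apart from a good sector by consulting the badblocks list alone, which is the concern raised for filesystems that dereference pmem pointers directly.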

diff --git a/a/1.txt b/N2/1.txt
index f21b819..31fcd0f 100644
--- a/a/1.txt
+++ b/N2/1.txt
@@ -1,81 +1,124 @@
-T24gVHVlLCAyMDE3LTAxLTE3IGF0IDE3OjU4IC0wODAwLCBBbmRpcnkgWHUgd3JvdGU6DQo+IE9u
-IFR1ZSwgSmFuIDE3LCAyMDE3IGF0IDM6NTEgUE0sIFZpc2hhbCBWZXJtYSA8dmlzaGFsLmwudmVy
-bWFAaW50ZWwuY28NCj4gbT4gd3JvdGU6DQo+ID4gT24gMDEvMTcsIEFuZGlyeSBYdSB3cm90ZToN
-Cj4gPiANCj4gPiA8c25pcD4NCj4gPiANCj4gPiA+ID4gPiANCj4gPiA+ID4gPiBUaGUgcG1lbV9k
-b19idmVjKCkgcmVhZCBsb2dpYyBpcyBsaWtlIHRoaXM6DQo+ID4gPiA+ID4gDQo+ID4gPiA+ID4g
-cG1lbV9kb19idmVjKCkNCj4gPiA+ID4gPiDCoMKgwqDCoGlmIChpc19iYWRfcG1lbSgpKQ0KPiA+
-ID4gPiA+IMKgwqDCoMKgwqDCoMKgwqByZXR1cm4gLUVJTzsNCj4gPiA+ID4gPiDCoMKgwqDCoGVs
-c2UNCj4gPiA+ID4gPiDCoMKgwqDCoMKgwqDCoMKgbWVtY3B5X2Zyb21fcG1lbSgpOw0KPiA+ID4g
-PiA+IA0KPiA+ID4gPiA+IE5vdGUgbWVtY3B5X2Zyb21fcG1lbSgpIGlzIGNhbGxpbmcgbWVtY3B5
-X21jc2FmZSgpLiBEb2VzIHRoaXMNCj4gPiA+ID4gPiBpbXBseQ0KPiA+ID4gPiA+IHRoYXQgZXZl
-biBpZiBhIGJsb2NrIGlzIG5vdCBpbiB0aGUgYmFkYmxvY2sgbGlzdCwgaXQgc3RpbGwgY2FuDQo+
-ID4gPiA+ID4gYmUgYmFkDQo+ID4gPiA+ID4gYW5kIGNhdXNlcyBNQ0U/IERvZXMgdGhlIGJhZGJs
-b2NrIGxpc3QgZ2V0IGNoYW5nZWQgZHVyaW5nIGZpbGUNCj4gPiA+ID4gPiBzeXN0ZW0NCj4gPiA+
-ID4gPiBydW5uaW5nPyBJZiB0aGF0IGlzIHRoZSBjYXNlLCBzaG91bGQgdGhlIGZpbGUgc3lzdGVt
-IGdldCBhDQo+ID4gPiA+ID4gbm90aWZpY2F0aW9uIHdoZW4gaXQgZ2V0cyBjaGFuZ2VkPyBJZiBh
-IGJsb2NrIGlzIGdvb2Qgd2hlbiBJDQo+ID4gPiA+ID4gZmlyc3QNCj4gPiA+ID4gPiByZWFkIGl0
-LCBjYW4gSSBzdGlsbCB0cnVzdCBpdCB0byBiZSBnb29kIGZvciB0aGUgc2Vjb25kDQo+ID4gPiA+
-ID4gYWNjZXNzPw0KPiA+ID4gPiANCj4gPiA+ID4gWWVzLCBpZiBhIGJsb2NrIGlzIG5vdCBpbiB0
-aGUgYmFkYmxvY2tzIGxpc3QsIGl0IGNhbiBzdGlsbCBjYXVzZQ0KPiA+ID4gPiBhbg0KPiA+ID4g
-PiBNQ0UuIFRoaXMgaXMgdGhlIGxhdGVudCBlcnJvciBjYXNlIEkgZGVzY3JpYmVkIGFib3ZlLiBG
-b3IgYQ0KPiA+ID4gPiBzaW1wbGUgcmVhZCgpDQo+ID4gPiA+IHZpYSB0aGUgcG1lbSBkcml2ZXIs
-IHRoaXMgd2lsbCBnZXQgaGFuZGxlZCBieSBtZW1jcHlfbWNzYWZlLiBGb3INCj4gPiA+ID4gbW1h
-cCwNCj4gPiA+ID4gYW4gTUNFIGlzIGluZXZpdGFibGUuDQo+ID4gPiA+IA0KPiA+ID4gPiBZZXMg
-dGhlIGJhZGJsb2NrcyBsaXN0IG1heSBjaGFuZ2Ugd2hpbGUgYSBmaWxlc3lzdGVtIGlzIHJ1bm5p
-bmcuDQo+ID4gPiA+IFRoZSBSRkMNCj4gPiA+ID4gcGF0Y2hlc1sxXSBJIGxpbmtlZCB0byBhZGQg
-YSBub3RpZmljYXRpb24gZm9yIHRoZSBmaWxlc3lzdGVtDQo+ID4gPiA+IHdoZW4gdGhpcw0KPiA+
-ID4gPiBoYXBwZW5zLg0KPiA+ID4gPiANCj4gPiA+IA0KPiA+ID4gVGhpcyBpcyByZWFsbHkgYmFk
-IGFuZCBpdCBtYWtlcyBmaWxlIHN5c3RlbSBpbXBsZW1lbnRhdGlvbiBtdWNoDQo+ID4gPiBtb3Jl
-DQo+ID4gPiBjb21wbGljYXRlZC4gQW5kIGJhZGJsb2NrIG5vdGlmaWNhdGlvbiBkb2VzIG5vdCBo
-ZWxwIHZlcnkgbXVjaCwNCj4gPiA+IGJlY2F1c2UgYW55IGJsb2NrIGNhbiBiZSBiYWQgcG90ZW50
-aWFsbHksIG5vIG1hdHRlciBpdCBpcyBpbg0KPiA+ID4gYmFkYmxvY2sNCj4gPiA+IGxpc3Qgb3Ig
-bm90LiBBbmQgZmlsZSBzeXN0ZW0gaGFzIHRvIHBlcmZvcm0gY2hlY2tpbmcgZm9yIGV2ZXJ5DQo+
-ID4gPiByZWFkLA0KPiA+ID4gdXNpbmcgbWVtY3B5X21jc2FmZS4gVGhpcyBpcyBkaXNhc3RlciBm
-b3IgZmlsZSBzeXN0ZW0gbGlrZSBOT1ZBLA0KPiA+ID4gd2hpY2gNCj4gPiA+IHVzZXMgcG9pbnRl
-ciBkZS1yZWZlcmVuY2UgdG8gYWNjZXNzIGRhdGEgc3RydWN0dXJlcyBvbiBwbWVtLiBOb3cNCj4g
-PiA+IGlmIEkNCj4gPiA+IHdhbnQgdG8gcmVhZCBhIGZpZWxkIGluIGFuIGlub2RlIG9uIHBtZW0s
-IEkgaGF2ZSB0byBjb3B5IGl0IHRvDQo+ID4gPiBEUkFNDQo+ID4gPiBmaXJzdCBhbmQgbWFrZSBz
-dXJlIG1lbWNweV9tY3NhZmUoKSBkb2VzIG5vdCByZXBvcnQgYW55dGhpbmcNCj4gPiA+IHdyb25n
-Lg0KPiA+IA0KPiA+IFlvdSBoYXZlIGEgZ29vZCBwb2ludCwgYW5kIEkgZG9uJ3Qga25vdyBpZiBJ
-IGhhdmUgYW4gYW5zd2VyIGZvcg0KPiA+IHRoaXMuLg0KPiA+IEFzc3VtaW5nIGEgc3lzdGVtIHdp
-dGggTUNFIHJlY292ZXJ5LCBtYXliZSBOT1ZBIGNhbiBhZGQgYSBtY2UNCj4gPiBoYW5kbGVyDQo+
-ID4gc2ltaWxhciB0byBuZml0X2hhbmRsZV9tY2UoKSwgYW5kIGhhbmRsZSBlcnJvcnMgYXMgdGhl
-eSBoYXBwZW4sIGJ1dA0KPiA+IEknbQ0KPiA+IGJlaW5nIHZlcnkgaGFuZC13YXZleSBoZXJlIGFu
-ZCBkb24ndCBrbm93IGhvdyBtdWNoL2hvdyB3ZWxsIHRoYXQNCj4gPiBtaWdodA0KPiA+IHdvcmsu
-Lg0KPiA+IA0KPiA+ID4gDQo+ID4gPiA+IE5vLCBpZiB0aGUgbWVkaWEsIGZvciBzb21lIHJlYXNv
-biwgJ2R2ZWxvcHMnIGEgYmFkIGNlbGwsIGENCj4gPiA+ID4gc2Vjb25kDQo+ID4gPiA+IGNvbnNl
-Y3V0aXZlIHJlYWQgZG9lcyBoYXZlIGEgY2hhbmNlIG9mIGJlaW5nIGJhZC4gT25jZSBhDQo+ID4g
-PiA+IGxvY2F0aW9uIGhhcw0KPiA+ID4gPiBiZWVuIG1hcmtlZCBhcyBiYWQsIGl0IHdpbGwgc3Rh
-eSBiYWQgdGlsbCB0aGUgQUNQSSBjbGVhciBlcnJvcg0KPiA+ID4gPiAnRFNNJyBoYXMNCj4gPiA+
-ID4gYmVlbiBjYWxsZWQgdG8gbWFyayBpdCBhcyBjbGVhbi4NCj4gPiA+ID4gDQo+ID4gPiANCj4g
-PiA+IEkgd29uZGVyIHdoYXQgaGFwcGVucyB0byB3cml0ZSBpbiB0aGlzIGNhc2U/IElmIGEgYmxv
-Y2sgaXMgYmFkIGJ1dA0KPiA+ID4gbm90DQo+ID4gPiByZXBvcnRlZCBpbiBiYWRibG9jayBsaXN0
-LiBOb3cgSSB3cml0ZSB0byBpdCB3aXRob3V0IHJlYWRpbmcNCj4gPiA+IGZpcnN0LiBEbw0KPiA+
-ID4gSSBjbGVhciB0aGUgcG9pc29uIHdpdGggdGhlIHdyaXRlPyBPciBzdGlsbCByZXF1aXJlIGEg
-QUNQSSBEU00/DQo+ID4gDQo+ID4gV2l0aCB3cml0ZXMsIG15IHVuZGVyc3RhbmRpbmcgaXMgdGhl
-cmUgaXMgc3RpbGwgYSBwb3NzaWJpbGl0eSB0aGF0DQo+ID4gYW4NCj4gPiBpbnRlcm5hbCByZWFk
-LW1vZGlmeS13cml0ZSBjYW4gaGFwcGVuLCBhbmQgY2F1c2UgYSBNQ0UgKHRoaXMgaXMgdGhlDQo+
-ID4gc2FtZQ0KPiA+IGFzIHdyaXRpbmcgdG8gYSBiYWQgRFJBTSBjZWxsLCB3aGljaCBjYW4gYWxz
-byBjYXVzZSBhbiBNQ0UpLiBZb3UNCj4gPiBjYW4ndA0KPiA+IHJlYWxseSB1c2UgdGhlIEFDUEkg
-RFNNIHByZWVtcHRpdmVseSBiZWNhdXNlIHlvdSBkb24ndCBrbm93IHdoZXRoZXINCj4gPiB0aGUN
-Cj4gPiBsb2NhdGlvbiB3YXMgYmFkLiBUaGUgZXJyb3IgZmxvdyB3aWxsIGJlIHNvbWV0aGluZyBs
-aWtlIHdyaXRlIGNhdXNlcw0KPiA+IHRoZQ0KPiA+IE1DRSwgYSBiYWRibG9jayBnZXRzIGFkZGVk
-IChlaXRoZXIgdGhyb3VnaCB0aGUgbWNlIGhhbmRsZXIgb3IgYWZ0ZXINCj4gPiB0aGUNCj4gPiBu
-ZXh0IHJlYm9vdCksIGFuZCB0aGUgcmVjb3ZlcnkgcGF0aCBpcyBub3cgdGhlIHNhbWUgYXMgYSBy
-ZWd1bGFyDQo+ID4gYmFkYmxvY2suDQo+ID4gDQo+IA0KPiBUaGlzIGlzIGRpZmZlcmVudCBmcm9t
-IG15IHVuZGVyc3RhbmRpbmcuIFJpZ2h0IG5vdyB3cml0ZV9wbWVtKCkgaW4NCj4gcG1lbV9kb19i
-dmVjKCkgZG9lcyBub3QgdXNlIG1lbWNweV9tY3NhZmUoKS4gSWYgdGhlIGJsb2NrIGlzIGJhZCBp
-dA0KPiBjbGVhcnMgcG9pc29uIGFuZCB3cml0ZXMgdG8gcG1lbSBhZ2Fpbi4gU2VlbXMgdG8gbWUg
-d3JpdGluZyB0byBiYWQNCj4gYmxvY2tzIGRvZXMgbm90IGNhdXNlIE1DRS4gRG8gd2UgbmVlZCBt
-ZW1jcHlfbWNzYWZlIGZvciBwbWVtIHN0b3Jlcz8NCg0KWW91IGFyZSByaWdodCwgd3JpdGVzIGRv
-bid0IHVzZSBtZW1jcHlfbWNzYWZlLCBhbmQgd2lsbCBub3QgZGlyZWN0bHkNCmNhdXNlIGFuIE1D
-RS4gSG93ZXZlciBhIHdyaXRlIGNhbiBjYXVzZSBhbiBhc3luY2hyb25vdXMgJ0NNQ0knIC0NCmNv
-cnJlY3RlZCBtYWNoaW5lIGNoZWNrIGludGVycnVwdCwgYnV0IHRoaXMgaXMgbm90IGNyaXRpY2Fs
-LCBhbmQgd29udCBiZQ0KYSBtZW1vcnkgZXJyb3IgYXMgdGhlIGNvcmUgZGlkbid0IGNvbnN1bWUg
-cG9pc29uLiBtZW1jcHlfbWNzYWZlIGNhbm5vdA0KcHJvdGVjdCBhZ2FpbnN0IHRoaXMgYmVjYXVz
-ZSB0aGUgd3JpdGUgaXMgJ3Bvc3RlZCcgYW5kIHRoZSBDTUNJIGlzIG5vdA0Kc3luY2hyb25vdXMu
-IE5vdGUgdGhhdCB0aGlzIGlzIG9ubHkgaW4gdGhlIGxhdGVudCBlcnJvciBvciBtZW1tYXAtc3Rv
-cmUNCmNhc2UuDQoNCj4gDQo+IFRoYW5rcywNCj4gQW5kaXJ5DQo+IA0KPiA+ID4gDQo+ID4gPiA+
-IFsxXTogaHR0cDovL3d3dy5saW51eC5zZ2kuY29tL2FyY2hpdmVzL3hmcy8yMDE2LTA2L21zZzAw
-Mjk5Lmh0bWwNCj4gPiA+ID4gDQo+ID4gPiANCj4gPiA+IFRoYW5rIHlvdSBmb3IgdGhlIHBhdGNo
-c2V0LiBJIHdpbGwgbG9vayBpbnRvIGl0Lg0KPiA+ID4g
+On Tue, 2017-01-17 at 17:58 -0800, Andiry Xu wrote:
+> On Tue, Jan 17, 2017 at 3:51 PM, Vishal Verma <vishal.l.verma@intel.co
+> m> wrote:
+> > On 01/17, Andiry Xu wrote:
+> > 
+> > <snip>
+> > 
+> > > > > 
+> > > > > The pmem_do_bvec() read logic is like this:
+> > > > > 
+> > > > > pmem_do_bvec()
+> > > > >     if (is_bad_pmem())
+> > > > >         return -EIO;
+> > > > >     else
+> > > > >         memcpy_from_pmem();
+> > > > > 
+> > > > > Note memcpy_from_pmem() is calling memcpy_mcsafe(). Does this
+> > > > > imply
+> > > > > that even if a block is not in the badblock list, it still can
+> > > > > be bad
+> > > > > and causes MCE? Does the badblock list get changed during file
+> > > > > system
+> > > > > running? If that is the case, should the file system get a
+> > > > > notification when it gets changed? If a block is good when I
+> > > > > first
+> > > > > read it, can I still trust it to be good for the second
+> > > > > access?
+> > > > 
+> > > > Yes, if a block is not in the badblocks list, it can still cause
+> > > > an
+> > > > MCE. This is the latent error case I described above. For a
+> > > > simple read()
+> > > > via the pmem driver, this will get handled by memcpy_mcsafe. For
+> > > > mmap,
+> > > > an MCE is inevitable.
+> > > > 
+> > > > Yes the badblocks list may change while a filesystem is running.
+> > > > The RFC
+> > > > patches[1] I linked to add a notification for the filesystem
+> > > > when this
+> > > > happens.
+> > > > 
+> > > 
+> > > This is really bad and it makes file system implementation much
+> > > more
+> > > complicated. And badblock notification does not help very much,
+> > > because any block can be bad potentially, no matter it is in
+> > > badblock
+> > > list or not. And file system has to perform checking for every
+> > > read,
+> > > using memcpy_mcsafe. This is disaster for file system like NOVA,
+> > > which
+> > > uses pointer de-reference to access data structures on pmem. Now
+> > > if I
+> > > want to read a field in an inode on pmem, I have to copy it to
+> > > DRAM
+> > > first and make sure memcpy_mcsafe() does not report anything
+> > > wrong.
+> > 
+> > You have a good point, and I don't know if I have an answer for
+> > this..
+> > Assuming a system with MCE recovery, maybe NOVA can add a mce
+> > handler
+> > similar to nfit_handle_mce(), and handle errors as they happen, but
+> > I'm
+> > being very hand-wavey here and don't know how much/how well that
+> > might
+> > work..
+> > 
+> > > 
+> > > > No, if the media, for some reason, 'develops' a bad cell, a
+> > > > second
+> > > > consecutive read does have a chance of being bad. Once a
+> > > > location has
+> > > > been marked as bad, it will stay bad till the ACPI clear error
+> > > > 'DSM' has
+> > > > been called to mark it as clean.
+> > > > 
+> > > 
+> > > I wonder what happens to write in this case? If a block is bad but
+> > > not
+> > > reported in badblock list. Now I write to it without reading
+> > > first. Do
+> > > I clear the poison with the write? Or still require a ACPI DSM?
+> > 
+> > With writes, my understanding is there is still a possibility that
+> > an
+> > internal read-modify-write can happen, and cause a MCE (this is the
+> > same
+> > as writing to a bad DRAM cell, which can also cause an MCE). You
+> > can't
+> > really use the ACPI DSM preemptively because you don't know whether
+> > the
+> > location was bad. The error flow will be something like write causes
+> > the
+> > MCE, a badblock gets added (either through the mce handler or after
+> > the
+> > next reboot), and the recovery path is now the same as a regular
+> > badblock.
+> > 
+> 
+> This is different from my understanding. Right now write_pmem() in
+> pmem_do_bvec() does not use memcpy_mcsafe(). If the block is bad it
+> clears poison and writes to pmem again. Seems to me writing to bad
+> blocks does not cause MCE. Do we need memcpy_mcsafe for pmem stores?
+
+You are right, writes don't use memcpy_mcsafe, and will not directly
+cause an MCE. However a write can cause an asynchronous 'CMCI' -
+corrected machine check interrupt, but this is not critical, and won't be
+a memory error as the core didn't consume poison. memcpy_mcsafe cannot
+protect against this because the write is 'posted' and the CMCI is not
+synchronous. Note that this is only in the latent error or memmap-store
+case.
+
+> 
+> Thanks,
+> Andiry
+> 
+> > > 
+> > > > [1]: http://www.linux.sgi.com/archives/xfs/2016-06/msg00299.html
+> > > > 
+> > > 
+> > > Thank you for the patchset. I will look into it.
+> > >
diff --git a/a/content_digest b/N2/content_digest
index ab6185c..3c67493 100644
--- a/a/content_digest
+++ b/N2/content_digest
@@ -22,86 +22,129 @@
  " linux-fsdevel@vger.kernel.org <linux-fsdevel@vger.kernel.org>\0"
  "\00:1\0"
  "b\0"
- "T24gVHVlLCAyMDE3LTAxLTE3IGF0IDE3OjU4IC0wODAwLCBBbmRpcnkgWHUgd3JvdGU6DQo+IE9u\n"
- "IFR1ZSwgSmFuIDE3LCAyMDE3IGF0IDM6NTEgUE0sIFZpc2hhbCBWZXJtYSA8dmlzaGFsLmwudmVy\n"
- "bWFAaW50ZWwuY28NCj4gbT4gd3JvdGU6DQo+ID4gT24gMDEvMTcsIEFuZGlyeSBYdSB3cm90ZToN\n"
- "Cj4gPiANCj4gPiA8c25pcD4NCj4gPiANCj4gPiA+ID4gPiANCj4gPiA+ID4gPiBUaGUgcG1lbV9k\n"
- "b19idmVjKCkgcmVhZCBsb2dpYyBpcyBsaWtlIHRoaXM6DQo+ID4gPiA+ID4gDQo+ID4gPiA+ID4g\n"
- "cG1lbV9kb19idmVjKCkNCj4gPiA+ID4gPiDCoMKgwqDCoGlmIChpc19iYWRfcG1lbSgpKQ0KPiA+\n"
- "ID4gPiA+IMKgwqDCoMKgwqDCoMKgwqByZXR1cm4gLUVJTzsNCj4gPiA+ID4gPiDCoMKgwqDCoGVs\n"
- "c2UNCj4gPiA+ID4gPiDCoMKgwqDCoMKgwqDCoMKgbWVtY3B5X2Zyb21fcG1lbSgpOw0KPiA+ID4g\n"
- "PiA+IA0KPiA+ID4gPiA+IE5vdGUgbWVtY3B5X2Zyb21fcG1lbSgpIGlzIGNhbGxpbmcgbWVtY3B5\n"
- "X21jc2FmZSgpLiBEb2VzIHRoaXMNCj4gPiA+ID4gPiBpbXBseQ0KPiA+ID4gPiA+IHRoYXQgZXZl\n"
- "biBpZiBhIGJsb2NrIGlzIG5vdCBpbiB0aGUgYmFkYmxvY2sgbGlzdCwgaXQgc3RpbGwgY2FuDQo+\n"
- "ID4gPiA+ID4gYmUgYmFkDQo+ID4gPiA+ID4gYW5kIGNhdXNlcyBNQ0U/IERvZXMgdGhlIGJhZGJs\n"
- "b2NrIGxpc3QgZ2V0IGNoYW5nZWQgZHVyaW5nIGZpbGUNCj4gPiA+ID4gPiBzeXN0ZW0NCj4gPiA+\n"
- "ID4gPiBydW5uaW5nPyBJZiB0aGF0IGlzIHRoZSBjYXNlLCBzaG91bGQgdGhlIGZpbGUgc3lzdGVt\n"
- "IGdldCBhDQo+ID4gPiA+ID4gbm90aWZpY2F0aW9uIHdoZW4gaXQgZ2V0cyBjaGFuZ2VkPyBJZiBh\n"
- "IGJsb2NrIGlzIGdvb2Qgd2hlbiBJDQo+ID4gPiA+ID4gZmlyc3QNCj4gPiA+ID4gPiByZWFkIGl0\n"
- "LCBjYW4gSSBzdGlsbCB0cnVzdCBpdCB0byBiZSBnb29kIGZvciB0aGUgc2Vjb25kDQo+ID4gPiA+\n"
- "ID4gYWNjZXNzPw0KPiA+ID4gPiANCj4gPiA+ID4gWWVzLCBpZiBhIGJsb2NrIGlzIG5vdCBpbiB0\n"
- "aGUgYmFkYmxvY2tzIGxpc3QsIGl0IGNhbiBzdGlsbCBjYXVzZQ0KPiA+ID4gPiBhbg0KPiA+ID4g\n"
- "PiBNQ0UuIFRoaXMgaXMgdGhlIGxhdGVudCBlcnJvciBjYXNlIEkgZGVzY3JpYmVkIGFib3ZlLiBG\n"
- "b3IgYQ0KPiA+ID4gPiBzaW1wbGUgcmVhZCgpDQo+ID4gPiA+IHZpYSB0aGUgcG1lbSBkcml2ZXIs\n"
- "IHRoaXMgd2lsbCBnZXQgaGFuZGxlZCBieSBtZW1jcHlfbWNzYWZlLiBGb3INCj4gPiA+ID4gbW1h\n"
- "cCwNCj4gPiA+ID4gYW4gTUNFIGlzIGluZXZpdGFibGUuDQo+ID4gPiA+IA0KPiA+ID4gPiBZZXMg\n"
- "dGhlIGJhZGJsb2NrcyBsaXN0IG1heSBjaGFuZ2Ugd2hpbGUgYSBmaWxlc3lzdGVtIGlzIHJ1bm5p\n"
- "bmcuDQo+ID4gPiA+IFRoZSBSRkMNCj4gPiA+ID4gcGF0Y2hlc1sxXSBJIGxpbmtlZCB0byBhZGQg\n"
- "YSBub3RpZmljYXRpb24gZm9yIHRoZSBmaWxlc3lzdGVtDQo+ID4gPiA+IHdoZW4gdGhpcw0KPiA+\n"
- "ID4gPiBoYXBwZW5zLg0KPiA+ID4gPiANCj4gPiA+IA0KPiA+ID4gVGhpcyBpcyByZWFsbHkgYmFk\n"
- "IGFuZCBpdCBtYWtlcyBmaWxlIHN5c3RlbSBpbXBsZW1lbnRhdGlvbiBtdWNoDQo+ID4gPiBtb3Jl\n"
- "DQo+ID4gPiBjb21wbGljYXRlZC4gQW5kIGJhZGJsb2NrIG5vdGlmaWNhdGlvbiBkb2VzIG5vdCBo\n"
- "ZWxwIHZlcnkgbXVjaCwNCj4gPiA+IGJlY2F1c2UgYW55IGJsb2NrIGNhbiBiZSBiYWQgcG90ZW50\n"
- "aWFsbHksIG5vIG1hdHRlciBpdCBpcyBpbg0KPiA+ID4gYmFkYmxvY2sNCj4gPiA+IGxpc3Qgb3Ig\n"
- "bm90LiBBbmQgZmlsZSBzeXN0ZW0gaGFzIHRvIHBlcmZvcm0gY2hlY2tpbmcgZm9yIGV2ZXJ5DQo+\n"
- "ID4gPiByZWFkLA0KPiA+ID4gdXNpbmcgbWVtY3B5X21jc2FmZS4gVGhpcyBpcyBkaXNhc3RlciBm\n"
- "b3IgZmlsZSBzeXN0ZW0gbGlrZSBOT1ZBLA0KPiA+ID4gd2hpY2gNCj4gPiA+IHVzZXMgcG9pbnRl\n"
- "ciBkZS1yZWZlcmVuY2UgdG8gYWNjZXNzIGRhdGEgc3RydWN0dXJlcyBvbiBwbWVtLiBOb3cNCj4g\n"
- "PiA+IGlmIEkNCj4gPiA+IHdhbnQgdG8gcmVhZCBhIGZpZWxkIGluIGFuIGlub2RlIG9uIHBtZW0s\n"
- "IEkgaGF2ZSB0byBjb3B5IGl0IHRvDQo+ID4gPiBEUkFNDQo+ID4gPiBmaXJzdCBhbmQgbWFrZSBz\n"
- "dXJlIG1lbWNweV9tY3NhZmUoKSBkb2VzIG5vdCByZXBvcnQgYW55dGhpbmcNCj4gPiA+IHdyb25n\n"
- "Lg0KPiA+IA0KPiA+IFlvdSBoYXZlIGEgZ29vZCBwb2ludCwgYW5kIEkgZG9uJ3Qga25vdyBpZiBJ\n"
- "IGhhdmUgYW4gYW5zd2VyIGZvcg0KPiA+IHRoaXMuLg0KPiA+IEFzc3VtaW5nIGEgc3lzdGVtIHdp\n"
- "dGggTUNFIHJlY292ZXJ5LCBtYXliZSBOT1ZBIGNhbiBhZGQgYSBtY2UNCj4gPiBoYW5kbGVyDQo+\n"
- "ID4gc2ltaWxhciB0byBuZml0X2hhbmRsZV9tY2UoKSwgYW5kIGhhbmRsZSBlcnJvcnMgYXMgdGhl\n"
- "eSBoYXBwZW4sIGJ1dA0KPiA+IEknbQ0KPiA+IGJlaW5nIHZlcnkgaGFuZC13YXZleSBoZXJlIGFu\n"
- "ZCBkb24ndCBrbm93IGhvdyBtdWNoL2hvdyB3ZWxsIHRoYXQNCj4gPiBtaWdodA0KPiA+IHdvcmsu\n"
- "Lg0KPiA+IA0KPiA+ID4gDQo+ID4gPiA+IE5vLCBpZiB0aGUgbWVkaWEsIGZvciBzb21lIHJlYXNv\n"
- "biwgJ2R2ZWxvcHMnIGEgYmFkIGNlbGwsIGENCj4gPiA+ID4gc2Vjb25kDQo+ID4gPiA+IGNvbnNl\n"
- "Y3V0aXZlIHJlYWQgZG9lcyBoYXZlIGEgY2hhbmNlIG9mIGJlaW5nIGJhZC4gT25jZSBhDQo+ID4g\n"
- "PiA+IGxvY2F0aW9uIGhhcw0KPiA+ID4gPiBiZWVuIG1hcmtlZCBhcyBiYWQsIGl0IHdpbGwgc3Rh\n"
- "eSBiYWQgdGlsbCB0aGUgQUNQSSBjbGVhciBlcnJvcg0KPiA+ID4gPiAnRFNNJyBoYXMNCj4gPiA+\n"
- "ID4gYmVlbiBjYWxsZWQgdG8gbWFyayBpdCBhcyBjbGVhbi4NCj4gPiA+ID4gDQo+ID4gPiANCj4g\n"
- "PiA+IEkgd29uZGVyIHdoYXQgaGFwcGVucyB0byB3cml0ZSBpbiB0aGlzIGNhc2U/IElmIGEgYmxv\n"
- "Y2sgaXMgYmFkIGJ1dA0KPiA+ID4gbm90DQo+ID4gPiByZXBvcnRlZCBpbiBiYWRibG9jayBsaXN0\n"
- "LiBOb3cgSSB3cml0ZSB0byBpdCB3aXRob3V0IHJlYWRpbmcNCj4gPiA+IGZpcnN0LiBEbw0KPiA+\n"
- "ID4gSSBjbGVhciB0aGUgcG9pc29uIHdpdGggdGhlIHdyaXRlPyBPciBzdGlsbCByZXF1aXJlIGEg\n"
- "QUNQSSBEU00/DQo+ID4gDQo+ID4gV2l0aCB3cml0ZXMsIG15IHVuZGVyc3RhbmRpbmcgaXMgdGhl\n"
- "cmUgaXMgc3RpbGwgYSBwb3NzaWJpbGl0eSB0aGF0DQo+ID4gYW4NCj4gPiBpbnRlcm5hbCByZWFk\n"
- "LW1vZGlmeS13cml0ZSBjYW4gaGFwcGVuLCBhbmQgY2F1c2UgYSBNQ0UgKHRoaXMgaXMgdGhlDQo+\n"
- "ID4gc2FtZQ0KPiA+IGFzIHdyaXRpbmcgdG8gYSBiYWQgRFJBTSBjZWxsLCB3aGljaCBjYW4gYWxz\n"
- "byBjYXVzZSBhbiBNQ0UpLiBZb3UNCj4gPiBjYW4ndA0KPiA+IHJlYWxseSB1c2UgdGhlIEFDUEkg\n"
- "RFNNIHByZWVtcHRpdmVseSBiZWNhdXNlIHlvdSBkb24ndCBrbm93IHdoZXRoZXINCj4gPiB0aGUN\n"
- "Cj4gPiBsb2NhdGlvbiB3YXMgYmFkLiBUaGUgZXJyb3IgZmxvdyB3aWxsIGJlIHNvbWV0aGluZyBs\n"
- "aWtlIHdyaXRlIGNhdXNlcw0KPiA+IHRoZQ0KPiA+IE1DRSwgYSBiYWRibG9jayBnZXRzIGFkZGVk\n"
- "IChlaXRoZXIgdGhyb3VnaCB0aGUgbWNlIGhhbmRsZXIgb3IgYWZ0ZXINCj4gPiB0aGUNCj4gPiBu\n"
- "ZXh0IHJlYm9vdCksIGFuZCB0aGUgcmVjb3ZlcnkgcGF0aCBpcyBub3cgdGhlIHNhbWUgYXMgYSBy\n"
- "ZWd1bGFyDQo+ID4gYmFkYmxvY2suDQo+ID4gDQo+IA0KPiBUaGlzIGlzIGRpZmZlcmVudCBmcm9t\n"
- "IG15IHVuZGVyc3RhbmRpbmcuIFJpZ2h0IG5vdyB3cml0ZV9wbWVtKCkgaW4NCj4gcG1lbV9kb19i\n"
- "dmVjKCkgZG9lcyBub3QgdXNlIG1lbWNweV9tY3NhZmUoKS4gSWYgdGhlIGJsb2NrIGlzIGJhZCBp\n"
- "dA0KPiBjbGVhcnMgcG9pc29uIGFuZCB3cml0ZXMgdG8gcG1lbSBhZ2Fpbi4gU2VlbXMgdG8gbWUg\n"
- "d3JpdGluZyB0byBiYWQNCj4gYmxvY2tzIGRvZXMgbm90IGNhdXNlIE1DRS4gRG8gd2UgbmVlZCBt\n"
- "ZW1jcHlfbWNzYWZlIGZvciBwbWVtIHN0b3Jlcz8NCg0KWW91IGFyZSByaWdodCwgd3JpdGVzIGRv\n"
- "bid0IHVzZSBtZW1jcHlfbWNzYWZlLCBhbmQgd2lsbCBub3QgZGlyZWN0bHkNCmNhdXNlIGFuIE1D\n"
- "RS4gSG93ZXZlciBhIHdyaXRlIGNhbiBjYXVzZSBhbiBhc3luY2hyb25vdXMgJ0NNQ0knIC0NCmNv\n"
- "cnJlY3RlZCBtYWNoaW5lIGNoZWNrIGludGVycnVwdCwgYnV0IHRoaXMgaXMgbm90IGNyaXRpY2Fs\n"
- "LCBhbmQgd29udCBiZQ0KYSBtZW1vcnkgZXJyb3IgYXMgdGhlIGNvcmUgZGlkbid0IGNvbnN1bWUg\n"
- "cG9pc29uLiBtZW1jcHlfbWNzYWZlIGNhbm5vdA0KcHJvdGVjdCBhZ2FpbnN0IHRoaXMgYmVjYXVz\n"
- "ZSB0aGUgd3JpdGUgaXMgJ3Bvc3RlZCcgYW5kIHRoZSBDTUNJIGlzIG5vdA0Kc3luY2hyb25vdXMu\n"
- "IE5vdGUgdGhhdCB0aGlzIGlzIG9ubHkgaW4gdGhlIGxhdGVudCBlcnJvciBvciBtZW1tYXAtc3Rv\n"
- "cmUNCmNhc2UuDQoNCj4gDQo+IFRoYW5rcywNCj4gQW5kaXJ5DQo+IA0KPiA+ID4gDQo+ID4gPiA+\n"
- "IFsxXTogaHR0cDovL3d3dy5saW51eC5zZ2kuY29tL2FyY2hpdmVzL3hmcy8yMDE2LTA2L21zZzAw\n"
- "Mjk5Lmh0bWwNCj4gPiA+ID4gDQo+ID4gPiANCj4gPiA+IFRoYW5rIHlvdSBmb3IgdGhlIHBhdGNo\n"
- c2V0LiBJIHdpbGwgbG9vayBpbnRvIGl0Lg0KPiA+ID4g
+ "On Tue, 2017-01-17 at 17:58 -0800, Andiry Xu wrote:\n"
+ "> On Tue, Jan 17, 2017 at 3:51 PM, Vishal Verma <vishal.l.verma@intel.co\n"
+ "> m> wrote:\n"
+ "> > On 01/17, Andiry Xu wrote:\n"
+ "> > \n"
+ "> > <snip>\n"
+ "> > \n"
+ "> > > > > \n"
+ "> > > > > The pmem_do_bvec() read logic is like this:\n"
+ "> > > > > \n"
+ "> > > > > pmem_do_bvec()\n"
+ "> > > > > \302\240\302\240\302\240\302\240if (is_bad_pmem())\n"
+ "> > > > > \302\240\302\240\302\240\302\240\302\240\302\240\302\240\302\240return -EIO;\n"
+ "> > > > > \302\240\302\240\302\240\302\240else\n"
+ "> > > > > \302\240\302\240\302\240\302\240\302\240\302\240\302\240\302\240memcpy_from_pmem();\n"
+ "> > > > > \n"
+ "> > > > > Note memcpy_from_pmem() is calling memcpy_mcsafe(). Does this\n"
+ "> > > > > imply\n"
+ "> > > > > that even if a block is not in the badblock list, it still can\n"
+ "> > > > > be bad\n"
+ "> > > > > and causes MCE? Does the badblock list get changed during file\n"
+ "> > > > > system\n"
+ "> > > > > running? If that is the case, should the file system get a\n"
+ "> > > > > notification when it gets changed? If a block is good when I\n"
+ "> > > > > first\n"
+ "> > > > > read it, can I still trust it to be good for the second\n"
+ "> > > > > access?\n"
+ "> > > > \n"
+ "> > > > Yes, if a block is not in the badblocks list, it can still cause\n"
+ "> > > > an\n"
+ "> > > > MCE. This is the latent error case I described above. For a\n"
+ "> > > > simple read()\n"
+ "> > > > via the pmem driver, this will get handled by memcpy_mcsafe. For\n"
+ "> > > > mmap,\n"
+ "> > > > an MCE is inevitable.\n"
+ "> > > > \n"
+ "> > > > Yes the badblocks list may change while a filesystem is running.\n"
+ "> > > > The RFC\n"
+ "> > > > patches[1] I linked to add a notification for the filesystem\n"
+ "> > > > when this\n"
+ "> > > > happens.\n"
+ "> > > > \n"
+ "> > > \n"
+ "> > > This is really bad and it makes file system implementation much\n"
+ "> > > more\n"
+ "> > > complicated. And badblock notification does not help very much,\n"
+ "> > > because any block can be bad potentially, no matter it is in\n"
+ "> > > badblock\n"
+ "> > > list or not. And file system has to perform checking for every\n"
+ "> > > read,\n"
+ "> > > using memcpy_mcsafe. This is disaster for file system like NOVA,\n"
+ "> > > which\n"
+ "> > > uses pointer de-reference to access data structures on pmem. Now\n"
+ "> > > if I\n"
+ "> > > want to read a field in an inode on pmem, I have to copy it to\n"
+ "> > > DRAM\n"
+ "> > > first and make sure memcpy_mcsafe() does not report anything\n"
+ "> > > wrong.\n"
+ "> > \n"
+ "> > You have a good point, and I don't know if I have an answer for\n"
+ "> > this..\n"
+ "> > Assuming a system with MCE recovery, maybe NOVA can add a mce\n"
+ "> > handler\n"
+ "> > similar to nfit_handle_mce(), and handle errors as they happen, but\n"
+ "> > I'm\n"
+ "> > being very hand-wavey here and don't know how much/how well that\n"
+ "> > might\n"
+ "> > work..\n"
+ "> > \n"
+ "> > > \n"
+ "> > > > No, if the media, for some reason, 'develops' a bad cell, a\n"
+ "> > > > second\n"
+ "> > > > consecutive read does have a chance of being bad. Once a\n"
+ "> > > > location has\n"
+ "> > > > been marked as bad, it will stay bad till the ACPI clear error\n"
+ "> > > > 'DSM' has\n"
+ "> > > > been called to mark it as clean.\n"
+ "> > > > \n"
+ "> > > \n"
+ "> > > I wonder what happens to write in this case? If a block is bad but\n"
+ "> > > not\n"
+ "> > > reported in badblock list. Now I write to it without reading\n"
+ "> > > first. Do\n"
+ "> > > I clear the poison with the write? Or still require a ACPI DSM?\n"
+ "> > \n"
+ "> > With writes, my understanding is there is still a possibility that\n"
+ "> > an\n"
+ "> > internal read-modify-write can happen, and cause a MCE (this is the\n"
+ "> > same\n"
+ "> > as writing to a bad DRAM cell, which can also cause an MCE). You\n"
+ "> > can't\n"
+ "> > really use the ACPI DSM preemptively because you don't know whether\n"
+ "> > the\n"
+ "> > location was bad. The error flow will be something like write causes\n"
+ "> > the\n"
+ "> > MCE, a badblock gets added (either through the mce handler or after\n"
+ "> > the\n"
+ "> > next reboot), and the recovery path is now the same as a regular\n"
+ "> > badblock.\n"
+ "> > \n"
+ "> \n"
+ "> This is different from my understanding. Right now write_pmem() in\n"
+ "> pmem_do_bvec() does not use memcpy_mcsafe(). If the block is bad it\n"
+ "> clears poison and writes to pmem again. Seems to me writing to bad\n"
+ "> blocks does not cause MCE. Do we need memcpy_mcsafe for pmem stores?\n"
+ "\n"
+ "You are right, writes don't use memcpy_mcsafe, and will not directly\n"
+ "cause an MCE. However a write can cause an asynchronous 'CMCI' -\n"
+ "corrected machine check interrupt, but this is not critical, and won't be\n"
+ "a memory error as the core didn't consume poison. memcpy_mcsafe cannot\n"
+ "protect against this because the write is 'posted' and the CMCI is not\n"
+ "synchronous. Note that this is only in the latent error or memmap-store\n"
+ "case.\n"
+ "\n"
+ "> \n"
+ "> Thanks,\n"
+ "> Andiry\n"
+ "> \n"
+ "> > > \n"
+ "> > > > [1]: http://www.linux.sgi.com/archives/xfs/2016-06/msg00299.html\n"
+ "> > > > \n"
+ "> > > \n"
+ "> > > Thank you for the patchset. I will look into it.\n"
+ > > >
 
-f2d10fa7320b827160b9f1c61dcc0c2599df8fbf66603ef83422cab07af32ccc
+7e8770f3f253ce8b0cf565b3b94fb8ecd7881357581d7d7004e0c1ea7aab2d8f
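
The read path discussed in the message above (check the badblocks list first, then copy with a machine-check-safe memcpy) can be sketched in userspace roughly as follows. This is only an illustration under stated assumptions, not the pmem driver's code: struct sim_pmem, sim_is_bad(), sim_memcpy_mcsafe() and sim_read_sector() are hypothetical stand-ins for the driver's device state, is_bad_pmem(), memcpy_mcsafe() and the read branch of pmem_do_bvec(). Recording a newly found error directly in the read path is also a simplification; the thread says the badblock is added through the mce handler or after the next reboot.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512u
#define NR_SECTORS  8u

/* Hypothetical simulated device; not the kernel's pmem device structure. */
struct sim_pmem {
    unsigned char data[NR_SECTORS * SECTOR_SIZE];
    bool known_bad[NR_SECTORS]; /* stands in for the badblocks list */
    bool poisoned[NR_SECTORS];  /* real media state, may hold latent errors */
};

/* Stand-in for is_bad_pmem(): it only knows about already-recorded errors. */
static bool sim_is_bad(const struct sim_pmem *p, size_t sector)
{
    return p->known_bad[sector];
}

/*
 * Stand-in for memcpy_mcsafe(): reports -EIO when the source is poisoned
 * instead of letting the caller take a fatal machine check.
 */
static int sim_memcpy_mcsafe(void *dst, const struct sim_pmem *p, size_t sector)
{
    if (p->poisoned[sector])
        return -EIO;
    memcpy(dst, &p->data[sector * SECTOR_SIZE], SECTOR_SIZE);
    return 0;
}

/* Read one sector: known bad blocks fail early, latent errors fail in the copy. */
static int sim_read_sector(struct sim_pmem *p, size_t sector, void *dst)
{
    int rc;

    assert(p && dst && sector < NR_SECTORS);

    if (sim_is_bad(p, sector))
        return -EIO;

    rc = sim_memcpy_mcsafe(dst, p, sector);
    if (rc)
        p->known_bad[sector] = true; /* assumption: record the error right here */
    return rc;
}
```

In this sketch a sector that is poisoned but not yet listed still returns -EIO on its first read, and is only listed afterwards. That mirrors the point made in the thread: a block missing from the badblocks list is not guaranteed to be good, and the list can change while the filesystem is running.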
