diff for duplicates of <1484872265.4857.1.camel@intel.com> diff --git a/a/1.txt b/N1/1.txt index f21b819..5bfa9c5 100644 --- a/a/1.txt +++ b/N1/1.txt @@ -1,81 +1,128 @@ -T24gVHVlLCAyMDE3LTAxLTE3IGF0IDE3OjU4IC0wODAwLCBBbmRpcnkgWHUgd3JvdGU6DQo+IE9u -IFR1ZSwgSmFuIDE3LCAyMDE3IGF0IDM6NTEgUE0sIFZpc2hhbCBWZXJtYSA8dmlzaGFsLmwudmVy -bWFAaW50ZWwuY28NCj4gbT4gd3JvdGU6DQo+ID4gT24gMDEvMTcsIEFuZGlyeSBYdSB3cm90ZToN -Cj4gPiANCj4gPiA8c25pcD4NCj4gPiANCj4gPiA+ID4gPiANCj4gPiA+ID4gPiBUaGUgcG1lbV9k -b19idmVjKCkgcmVhZCBsb2dpYyBpcyBsaWtlIHRoaXM6DQo+ID4gPiA+ID4gDQo+ID4gPiA+ID4g -cG1lbV9kb19idmVjKCkNCj4gPiA+ID4gPiDCoMKgwqDCoGlmIChpc19iYWRfcG1lbSgpKQ0KPiA+ -ID4gPiA+IMKgwqDCoMKgwqDCoMKgwqByZXR1cm4gLUVJTzsNCj4gPiA+ID4gPiDCoMKgwqDCoGVs -c2UNCj4gPiA+ID4gPiDCoMKgwqDCoMKgwqDCoMKgbWVtY3B5X2Zyb21fcG1lbSgpOw0KPiA+ID4g -PiA+IA0KPiA+ID4gPiA+IE5vdGUgbWVtY3B5X2Zyb21fcG1lbSgpIGlzIGNhbGxpbmcgbWVtY3B5 -X21jc2FmZSgpLiBEb2VzIHRoaXMNCj4gPiA+ID4gPiBpbXBseQ0KPiA+ID4gPiA+IHRoYXQgZXZl -biBpZiBhIGJsb2NrIGlzIG5vdCBpbiB0aGUgYmFkYmxvY2sgbGlzdCwgaXQgc3RpbGwgY2FuDQo+ -ID4gPiA+ID4gYmUgYmFkDQo+ID4gPiA+ID4gYW5kIGNhdXNlcyBNQ0U/IERvZXMgdGhlIGJhZGJs -b2NrIGxpc3QgZ2V0IGNoYW5nZWQgZHVyaW5nIGZpbGUNCj4gPiA+ID4gPiBzeXN0ZW0NCj4gPiA+ -ID4gPiBydW5uaW5nPyBJZiB0aGF0IGlzIHRoZSBjYXNlLCBzaG91bGQgdGhlIGZpbGUgc3lzdGVt -IGdldCBhDQo+ID4gPiA+ID4gbm90aWZpY2F0aW9uIHdoZW4gaXQgZ2V0cyBjaGFuZ2VkPyBJZiBh -IGJsb2NrIGlzIGdvb2Qgd2hlbiBJDQo+ID4gPiA+ID4gZmlyc3QNCj4gPiA+ID4gPiByZWFkIGl0 -LCBjYW4gSSBzdGlsbCB0cnVzdCBpdCB0byBiZSBnb29kIGZvciB0aGUgc2Vjb25kDQo+ID4gPiA+ -ID4gYWNjZXNzPw0KPiA+ID4gPiANCj4gPiA+ID4gWWVzLCBpZiBhIGJsb2NrIGlzIG5vdCBpbiB0 -aGUgYmFkYmxvY2tzIGxpc3QsIGl0IGNhbiBzdGlsbCBjYXVzZQ0KPiA+ID4gPiBhbg0KPiA+ID4g -PiBNQ0UuIFRoaXMgaXMgdGhlIGxhdGVudCBlcnJvciBjYXNlIEkgZGVzY3JpYmVkIGFib3ZlLiBG -b3IgYQ0KPiA+ID4gPiBzaW1wbGUgcmVhZCgpDQo+ID4gPiA+IHZpYSB0aGUgcG1lbSBkcml2ZXIs -IHRoaXMgd2lsbCBnZXQgaGFuZGxlZCBieSBtZW1jcHlfbWNzYWZlLiBGb3INCj4gPiA+ID4gbW1h -cCwNCj4gPiA+ID4gYW4gTUNFIGlzIGluZXZpdGFibGUuDQo+ID4gPiA+IA0KPiA+ID4gPiBZZXMg 
-dGhlIGJhZGJsb2NrcyBsaXN0IG1heSBjaGFuZ2Ugd2hpbGUgYSBmaWxlc3lzdGVtIGlzIHJ1bm5p -bmcuDQo+ID4gPiA+IFRoZSBSRkMNCj4gPiA+ID4gcGF0Y2hlc1sxXSBJIGxpbmtlZCB0byBhZGQg -YSBub3RpZmljYXRpb24gZm9yIHRoZSBmaWxlc3lzdGVtDQo+ID4gPiA+IHdoZW4gdGhpcw0KPiA+ -ID4gPiBoYXBwZW5zLg0KPiA+ID4gPiANCj4gPiA+IA0KPiA+ID4gVGhpcyBpcyByZWFsbHkgYmFk -IGFuZCBpdCBtYWtlcyBmaWxlIHN5c3RlbSBpbXBsZW1lbnRhdGlvbiBtdWNoDQo+ID4gPiBtb3Jl -DQo+ID4gPiBjb21wbGljYXRlZC4gQW5kIGJhZGJsb2NrIG5vdGlmaWNhdGlvbiBkb2VzIG5vdCBo -ZWxwIHZlcnkgbXVjaCwNCj4gPiA+IGJlY2F1c2UgYW55IGJsb2NrIGNhbiBiZSBiYWQgcG90ZW50 -aWFsbHksIG5vIG1hdHRlciBpdCBpcyBpbg0KPiA+ID4gYmFkYmxvY2sNCj4gPiA+IGxpc3Qgb3Ig -bm90LiBBbmQgZmlsZSBzeXN0ZW0gaGFzIHRvIHBlcmZvcm0gY2hlY2tpbmcgZm9yIGV2ZXJ5DQo+ -ID4gPiByZWFkLA0KPiA+ID4gdXNpbmcgbWVtY3B5X21jc2FmZS4gVGhpcyBpcyBkaXNhc3RlciBm -b3IgZmlsZSBzeXN0ZW0gbGlrZSBOT1ZBLA0KPiA+ID4gd2hpY2gNCj4gPiA+IHVzZXMgcG9pbnRl -ciBkZS1yZWZlcmVuY2UgdG8gYWNjZXNzIGRhdGEgc3RydWN0dXJlcyBvbiBwbWVtLiBOb3cNCj4g -PiA+IGlmIEkNCj4gPiA+IHdhbnQgdG8gcmVhZCBhIGZpZWxkIGluIGFuIGlub2RlIG9uIHBtZW0s -IEkgaGF2ZSB0byBjb3B5IGl0IHRvDQo+ID4gPiBEUkFNDQo+ID4gPiBmaXJzdCBhbmQgbWFrZSBz -dXJlIG1lbWNweV9tY3NhZmUoKSBkb2VzIG5vdCByZXBvcnQgYW55dGhpbmcNCj4gPiA+IHdyb25n -Lg0KPiA+IA0KPiA+IFlvdSBoYXZlIGEgZ29vZCBwb2ludCwgYW5kIEkgZG9uJ3Qga25vdyBpZiBJ -IGhhdmUgYW4gYW5zd2VyIGZvcg0KPiA+IHRoaXMuLg0KPiA+IEFzc3VtaW5nIGEgc3lzdGVtIHdp -dGggTUNFIHJlY292ZXJ5LCBtYXliZSBOT1ZBIGNhbiBhZGQgYSBtY2UNCj4gPiBoYW5kbGVyDQo+ -ID4gc2ltaWxhciB0byBuZml0X2hhbmRsZV9tY2UoKSwgYW5kIGhhbmRsZSBlcnJvcnMgYXMgdGhl -eSBoYXBwZW4sIGJ1dA0KPiA+IEknbQ0KPiA+IGJlaW5nIHZlcnkgaGFuZC13YXZleSBoZXJlIGFu -ZCBkb24ndCBrbm93IGhvdyBtdWNoL2hvdyB3ZWxsIHRoYXQNCj4gPiBtaWdodA0KPiA+IHdvcmsu -Lg0KPiA+IA0KPiA+ID4gDQo+ID4gPiA+IE5vLCBpZiB0aGUgbWVkaWEsIGZvciBzb21lIHJlYXNv -biwgJ2R2ZWxvcHMnIGEgYmFkIGNlbGwsIGENCj4gPiA+ID4gc2Vjb25kDQo+ID4gPiA+IGNvbnNl -Y3V0aXZlIHJlYWQgZG9lcyBoYXZlIGEgY2hhbmNlIG9mIGJlaW5nIGJhZC4gT25jZSBhDQo+ID4g -PiA+IGxvY2F0aW9uIGhhcw0KPiA+ID4gPiBiZWVuIG1hcmtlZCBhcyBiYWQsIGl0IHdpbGwgc3Rh 
-eSBiYWQgdGlsbCB0aGUgQUNQSSBjbGVhciBlcnJvcg0KPiA+ID4gPiAnRFNNJyBoYXMNCj4gPiA+ -ID4gYmVlbiBjYWxsZWQgdG8gbWFyayBpdCBhcyBjbGVhbi4NCj4gPiA+ID4gDQo+ID4gPiANCj4g -PiA+IEkgd29uZGVyIHdoYXQgaGFwcGVucyB0byB3cml0ZSBpbiB0aGlzIGNhc2U/IElmIGEgYmxv -Y2sgaXMgYmFkIGJ1dA0KPiA+ID4gbm90DQo+ID4gPiByZXBvcnRlZCBpbiBiYWRibG9jayBsaXN0 -LiBOb3cgSSB3cml0ZSB0byBpdCB3aXRob3V0IHJlYWRpbmcNCj4gPiA+IGZpcnN0LiBEbw0KPiA+ -ID4gSSBjbGVhciB0aGUgcG9pc29uIHdpdGggdGhlIHdyaXRlPyBPciBzdGlsbCByZXF1aXJlIGEg -QUNQSSBEU00/DQo+ID4gDQo+ID4gV2l0aCB3cml0ZXMsIG15IHVuZGVyc3RhbmRpbmcgaXMgdGhl -cmUgaXMgc3RpbGwgYSBwb3NzaWJpbGl0eSB0aGF0DQo+ID4gYW4NCj4gPiBpbnRlcm5hbCByZWFk -LW1vZGlmeS13cml0ZSBjYW4gaGFwcGVuLCBhbmQgY2F1c2UgYSBNQ0UgKHRoaXMgaXMgdGhlDQo+ -ID4gc2FtZQ0KPiA+IGFzIHdyaXRpbmcgdG8gYSBiYWQgRFJBTSBjZWxsLCB3aGljaCBjYW4gYWxz -byBjYXVzZSBhbiBNQ0UpLiBZb3UNCj4gPiBjYW4ndA0KPiA+IHJlYWxseSB1c2UgdGhlIEFDUEkg -RFNNIHByZWVtcHRpdmVseSBiZWNhdXNlIHlvdSBkb24ndCBrbm93IHdoZXRoZXINCj4gPiB0aGUN -Cj4gPiBsb2NhdGlvbiB3YXMgYmFkLiBUaGUgZXJyb3IgZmxvdyB3aWxsIGJlIHNvbWV0aGluZyBs -aWtlIHdyaXRlIGNhdXNlcw0KPiA+IHRoZQ0KPiA+IE1DRSwgYSBiYWRibG9jayBnZXRzIGFkZGVk -IChlaXRoZXIgdGhyb3VnaCB0aGUgbWNlIGhhbmRsZXIgb3IgYWZ0ZXINCj4gPiB0aGUNCj4gPiBu -ZXh0IHJlYm9vdCksIGFuZCB0aGUgcmVjb3ZlcnkgcGF0aCBpcyBub3cgdGhlIHNhbWUgYXMgYSBy -ZWd1bGFyDQo+ID4gYmFkYmxvY2suDQo+ID4gDQo+IA0KPiBUaGlzIGlzIGRpZmZlcmVudCBmcm9t -IG15IHVuZGVyc3RhbmRpbmcuIFJpZ2h0IG5vdyB3cml0ZV9wbWVtKCkgaW4NCj4gcG1lbV9kb19i -dmVjKCkgZG9lcyBub3QgdXNlIG1lbWNweV9tY3NhZmUoKS4gSWYgdGhlIGJsb2NrIGlzIGJhZCBp -dA0KPiBjbGVhcnMgcG9pc29uIGFuZCB3cml0ZXMgdG8gcG1lbSBhZ2Fpbi4gU2VlbXMgdG8gbWUg -d3JpdGluZyB0byBiYWQNCj4gYmxvY2tzIGRvZXMgbm90IGNhdXNlIE1DRS4gRG8gd2UgbmVlZCBt -ZW1jcHlfbWNzYWZlIGZvciBwbWVtIHN0b3Jlcz8NCg0KWW91IGFyZSByaWdodCwgd3JpdGVzIGRv -bid0IHVzZSBtZW1jcHlfbWNzYWZlLCBhbmQgd2lsbCBub3QgZGlyZWN0bHkNCmNhdXNlIGFuIE1D -RS4gSG93ZXZlciBhIHdyaXRlIGNhbiBjYXVzZSBhbiBhc3luY2hyb25vdXMgJ0NNQ0knIC0NCmNv -cnJlY3RlZCBtYWNoaW5lIGNoZWNrIGludGVycnVwdCwgYnV0IHRoaXMgaXMgbm90IGNyaXRpY2Fs 
-LCBhbmQgd29udCBiZQ0KYSBtZW1vcnkgZXJyb3IgYXMgdGhlIGNvcmUgZGlkbid0IGNvbnN1bWUg -cG9pc29uLiBtZW1jcHlfbWNzYWZlIGNhbm5vdA0KcHJvdGVjdCBhZ2FpbnN0IHRoaXMgYmVjYXVz -ZSB0aGUgd3JpdGUgaXMgJ3Bvc3RlZCcgYW5kIHRoZSBDTUNJIGlzIG5vdA0Kc3luY2hyb25vdXMu -IE5vdGUgdGhhdCB0aGlzIGlzIG9ubHkgaW4gdGhlIGxhdGVudCBlcnJvciBvciBtZW1tYXAtc3Rv -cmUNCmNhc2UuDQoNCj4gDQo+IFRoYW5rcywNCj4gQW5kaXJ5DQo+IA0KPiA+ID4gDQo+ID4gPiA+ -IFsxXTogaHR0cDovL3d3dy5saW51eC5zZ2kuY29tL2FyY2hpdmVzL3hmcy8yMDE2LTA2L21zZzAw -Mjk5Lmh0bWwNCj4gPiA+ID4gDQo+ID4gPiANCj4gPiA+IFRoYW5rIHlvdSBmb3IgdGhlIHBhdGNo -c2V0LiBJIHdpbGwgbG9vayBpbnRvIGl0Lg0KPiA+ID4g +On Tue, 2017-01-17 at 17:58 -0800, Andiry Xu wrote: +> On Tue, Jan 17, 2017 at 3:51 PM, Vishal Verma <vishal.l.verma@intel.co +> m> wrote: +> > On 01/17, Andiry Xu wrote: +> > +> > <snip> +> > +> > > > > +> > > > > The pmem_do_bvec() read logic is like this: +> > > > > +> > > > > pmem_do_bvec() +> > > > > if (is_bad_pmem()) +> > > > > return -EIO; +> > > > > else +> > > > > memcpy_from_pmem(); +> > > > > +> > > > > Note memcpy_from_pmem() is calling memcpy_mcsafe(). Does this +> > > > > imply +> > > > > that even if a block is not in the badblock list, it still can +> > > > > be bad +> > > > > and causes MCE? Does the badblock list get changed during file +> > > > > system +> > > > > running? If that is the case, should the file system get a +> > > > > notification when it gets changed? If a block is good when I +> > > > > first +> > > > > read it, can I still trust it to be good for the second +> > > > > access? +> > > > +> > > > Yes, if a block is not in the badblocks list, it can still cause +> > > > an +> > > > MCE. This is the latent error case I described above. For a +> > > > simple read() +> > > > via the pmem driver, this will get handled by memcpy_mcsafe. For +> > > > mmap, +> > > > an MCE is inevitable. +> > > > +> > > > Yes the badblocks list may change while a filesystem is running. 
+> > > > The RFC +> > > > patches[1] I linked to add a notification for the filesystem +> > > > when this +> > > > happens. +> > > > +> > > +> > > This is really bad and it makes file system implementation much +> > > more +> > > complicated. And badblock notification does not help very much, +> > > because any block can be bad potentially, no matter it is in +> > > badblock +> > > list or not. And file system has to perform checking for every +> > > read, +> > > using memcpy_mcsafe. This is disaster for file system like NOVA, +> > > which +> > > uses pointer de-reference to access data structures on pmem. Now +> > > if I +> > > want to read a field in an inode on pmem, I have to copy it to +> > > DRAM +> > > first and make sure memcpy_mcsafe() does not report anything +> > > wrong. +> > +> > You have a good point, and I don't know if I have an answer for +> > this.. +> > Assuming a system with MCE recovery, maybe NOVA can add a mce +> > handler +> > similar to nfit_handle_mce(), and handle errors as they happen, but +> > I'm +> > being very hand-wavey here and don't know how much/how well that +> > might +> > work.. +> > +> > > +> > > > No, if the media, for some reason, 'dvelops' a bad cell, a +> > > > second +> > > > consecutive read does have a chance of being bad. Once a +> > > > location has +> > > > been marked as bad, it will stay bad till the ACPI clear error +> > > > 'DSM' has +> > > > been called to mark it as clean. +> > > > +> > > +> > > I wonder what happens to write in this case? If a block is bad but +> > > not +> > > reported in badblock list. Now I write to it without reading +> > > first. Do +> > > I clear the poison with the write? Or still require a ACPI DSM? +> > +> > With writes, my understanding is there is still a possibility that +> > an +> > internal read-modify-write can happen, and cause a MCE (this is the +> > same +> > as writing to a bad DRAM cell, which can also cause an MCE). 
You +> > can't +> > really use the ACPI DSM preemptively because you don't know whether +> > the +> > location was bad. The error flow will be something like write causes +> > the +> > MCE, a badblock gets added (either through the mce handler or after +> > the +> > next reboot), and the recovery path is now the same as a regular +> > badblock. +> > +> +> This is different from my understanding. Right now write_pmem() in +> pmem_do_bvec() does not use memcpy_mcsafe(). If the block is bad it +> clears poison and writes to pmem again. Seems to me writing to bad +> blocks does not cause MCE. Do we need memcpy_mcsafe for pmem stores? + +You are right, writes don't use memcpy_mcsafe, and will not directly +cause an MCE. However a write can cause an asynchronous 'CMCI' - +corrected machine check interrupt, but this is not critical, and wont be +a memory error as the core didn't consume poison. memcpy_mcsafe cannot +protect against this because the write is 'posted' and the CMCI is not +synchronous. Note that this is only in the latent error or memmap-store +case. + +> +> Thanks, +> Andiry +> +> > > +> > > > [1]: http://www.linux.sgi.com/archives/xfs/2016-06/msg00299.html +> > > > +> > > +> > > Thank you for the patchset. I will look into it. 
+> > > +_______________________________________________ +Linux-nvdimm mailing list +Linux-nvdimm@lists.01.org +https://lists.01.org/mailman/listinfo/linux-nvdimm diff --git a/a/content_digest b/N1/content_digest index ab6185c..ab3a720 100644 --- a/a/content_digest +++ b/N1/content_digest @@ -9,99 +9,147 @@ "ref\0CAOvWMLYcP9PN6LT51gwJvmyCTfRRrVeDTrjN-8_zTKhD+UmDiw@mail.gmail.com\0" "ref\020170117235150.GE4880@omniknight.lm.intel.com\0" "ref\0CAOvWMLZCt39EDg-1uppVVUeRG40JvOo9sKLY2XMuynZdnc0W9w@mail.gmail.com\0" - "From\0Verma, Vishal L <vishal.l.verma@intel.com>\0" + "ref\0CAOvWMLZCt39EDg-1uppVVUeRG40JvOo9sKLY2XMuynZdnc0W9w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org\0" + "From\0Verma, Vishal L <vishal.l.verma-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>\0" "Subject\0Re: [LSF/MM TOPIC] Badblocks checking/representation in filesystems\0" "Date\0Fri, 20 Jan 2017 00:32:14 +0000\0" - "To\0andiry@gmail.com <andiry@gmail.com>\0" - "Cc\0darrick.wong@oracle.com <darrick.wong@oracle.com>" - Vyacheslav.Dubeyko@wdc.com <Vyacheslav.Dubeyko@wdc.com> - linux-block@vger.kernel.org <linux-block@vger.kernel.org> - slava@dubeyko.com <slava@dubeyko.com> - lsf-pc@lists.linux-foundation.org <lsf-pc@lists.linux-foundation.org> - linux-nvdimm@ml01.01.org <linux-nvdimm@ml01.01.org> - " linux-fsdevel@vger.kernel.org <linux-fsdevel@vger.kernel.org>\0" + "To\0andiry-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org <andiry-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>\0" + "Cc\0Vyacheslav.Dubeyko-Sjgp3cTcYWE@public.gmane.org <Vyacheslav.Dubeyko-Sjgp3cTcYWE@public.gmane.org>" + darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org <darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> + linux-nvdimm-y27Ovi1pjclAfugRpC6u6w@public.gmane.org <linux-nvdimm-y27Ovi1pjclAfugRpC6u6w@public.gmane.org> + linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org <linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org> + slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org> + 
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org <linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org> + " lsf-pc-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org <lsf-pc-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>\0" "\00:1\0" "b\0" - "T24gVHVlLCAyMDE3LTAxLTE3IGF0IDE3OjU4IC0wODAwLCBBbmRpcnkgWHUgd3JvdGU6DQo+IE9u\n" - "IFR1ZSwgSmFuIDE3LCAyMDE3IGF0IDM6NTEgUE0sIFZpc2hhbCBWZXJtYSA8dmlzaGFsLmwudmVy\n" - "bWFAaW50ZWwuY28NCj4gbT4gd3JvdGU6DQo+ID4gT24gMDEvMTcsIEFuZGlyeSBYdSB3cm90ZToN\n" - "Cj4gPiANCj4gPiA8c25pcD4NCj4gPiANCj4gPiA+ID4gPiANCj4gPiA+ID4gPiBUaGUgcG1lbV9k\n" - "b19idmVjKCkgcmVhZCBsb2dpYyBpcyBsaWtlIHRoaXM6DQo+ID4gPiA+ID4gDQo+ID4gPiA+ID4g\n" - "cG1lbV9kb19idmVjKCkNCj4gPiA+ID4gPiDCoMKgwqDCoGlmIChpc19iYWRfcG1lbSgpKQ0KPiA+\n" - "ID4gPiA+IMKgwqDCoMKgwqDCoMKgwqByZXR1cm4gLUVJTzsNCj4gPiA+ID4gPiDCoMKgwqDCoGVs\n" - "c2UNCj4gPiA+ID4gPiDCoMKgwqDCoMKgwqDCoMKgbWVtY3B5X2Zyb21fcG1lbSgpOw0KPiA+ID4g\n" - "PiA+IA0KPiA+ID4gPiA+IE5vdGUgbWVtY3B5X2Zyb21fcG1lbSgpIGlzIGNhbGxpbmcgbWVtY3B5\n" - "X21jc2FmZSgpLiBEb2VzIHRoaXMNCj4gPiA+ID4gPiBpbXBseQ0KPiA+ID4gPiA+IHRoYXQgZXZl\n" - "biBpZiBhIGJsb2NrIGlzIG5vdCBpbiB0aGUgYmFkYmxvY2sgbGlzdCwgaXQgc3RpbGwgY2FuDQo+\n" - "ID4gPiA+ID4gYmUgYmFkDQo+ID4gPiA+ID4gYW5kIGNhdXNlcyBNQ0U/IERvZXMgdGhlIGJhZGJs\n" - "b2NrIGxpc3QgZ2V0IGNoYW5nZWQgZHVyaW5nIGZpbGUNCj4gPiA+ID4gPiBzeXN0ZW0NCj4gPiA+\n" - "ID4gPiBydW5uaW5nPyBJZiB0aGF0IGlzIHRoZSBjYXNlLCBzaG91bGQgdGhlIGZpbGUgc3lzdGVt\n" - "IGdldCBhDQo+ID4gPiA+ID4gbm90aWZpY2F0aW9uIHdoZW4gaXQgZ2V0cyBjaGFuZ2VkPyBJZiBh\n" - "IGJsb2NrIGlzIGdvb2Qgd2hlbiBJDQo+ID4gPiA+ID4gZmlyc3QNCj4gPiA+ID4gPiByZWFkIGl0\n" - "LCBjYW4gSSBzdGlsbCB0cnVzdCBpdCB0byBiZSBnb29kIGZvciB0aGUgc2Vjb25kDQo+ID4gPiA+\n" - "ID4gYWNjZXNzPw0KPiA+ID4gPiANCj4gPiA+ID4gWWVzLCBpZiBhIGJsb2NrIGlzIG5vdCBpbiB0\n" - "aGUgYmFkYmxvY2tzIGxpc3QsIGl0IGNhbiBzdGlsbCBjYXVzZQ0KPiA+ID4gPiBhbg0KPiA+ID4g\n" - "PiBNQ0UuIFRoaXMgaXMgdGhlIGxhdGVudCBlcnJvciBjYXNlIEkgZGVzY3JpYmVkIGFib3ZlLiBG\n" - 
"b3IgYQ0KPiA+ID4gPiBzaW1wbGUgcmVhZCgpDQo+ID4gPiA+IHZpYSB0aGUgcG1lbSBkcml2ZXIs\n" - "IHRoaXMgd2lsbCBnZXQgaGFuZGxlZCBieSBtZW1jcHlfbWNzYWZlLiBGb3INCj4gPiA+ID4gbW1h\n" - "cCwNCj4gPiA+ID4gYW4gTUNFIGlzIGluZXZpdGFibGUuDQo+ID4gPiA+IA0KPiA+ID4gPiBZZXMg\n" - "dGhlIGJhZGJsb2NrcyBsaXN0IG1heSBjaGFuZ2Ugd2hpbGUgYSBmaWxlc3lzdGVtIGlzIHJ1bm5p\n" - "bmcuDQo+ID4gPiA+IFRoZSBSRkMNCj4gPiA+ID4gcGF0Y2hlc1sxXSBJIGxpbmtlZCB0byBhZGQg\n" - "YSBub3RpZmljYXRpb24gZm9yIHRoZSBmaWxlc3lzdGVtDQo+ID4gPiA+IHdoZW4gdGhpcw0KPiA+\n" - "ID4gPiBoYXBwZW5zLg0KPiA+ID4gPiANCj4gPiA+IA0KPiA+ID4gVGhpcyBpcyByZWFsbHkgYmFk\n" - "IGFuZCBpdCBtYWtlcyBmaWxlIHN5c3RlbSBpbXBsZW1lbnRhdGlvbiBtdWNoDQo+ID4gPiBtb3Jl\n" - "DQo+ID4gPiBjb21wbGljYXRlZC4gQW5kIGJhZGJsb2NrIG5vdGlmaWNhdGlvbiBkb2VzIG5vdCBo\n" - "ZWxwIHZlcnkgbXVjaCwNCj4gPiA+IGJlY2F1c2UgYW55IGJsb2NrIGNhbiBiZSBiYWQgcG90ZW50\n" - "aWFsbHksIG5vIG1hdHRlciBpdCBpcyBpbg0KPiA+ID4gYmFkYmxvY2sNCj4gPiA+IGxpc3Qgb3Ig\n" - "bm90LiBBbmQgZmlsZSBzeXN0ZW0gaGFzIHRvIHBlcmZvcm0gY2hlY2tpbmcgZm9yIGV2ZXJ5DQo+\n" - "ID4gPiByZWFkLA0KPiA+ID4gdXNpbmcgbWVtY3B5X21jc2FmZS4gVGhpcyBpcyBkaXNhc3RlciBm\n" - "b3IgZmlsZSBzeXN0ZW0gbGlrZSBOT1ZBLA0KPiA+ID4gd2hpY2gNCj4gPiA+IHVzZXMgcG9pbnRl\n" - "ciBkZS1yZWZlcmVuY2UgdG8gYWNjZXNzIGRhdGEgc3RydWN0dXJlcyBvbiBwbWVtLiBOb3cNCj4g\n" - "PiA+IGlmIEkNCj4gPiA+IHdhbnQgdG8gcmVhZCBhIGZpZWxkIGluIGFuIGlub2RlIG9uIHBtZW0s\n" - "IEkgaGF2ZSB0byBjb3B5IGl0IHRvDQo+ID4gPiBEUkFNDQo+ID4gPiBmaXJzdCBhbmQgbWFrZSBz\n" - "dXJlIG1lbWNweV9tY3NhZmUoKSBkb2VzIG5vdCByZXBvcnQgYW55dGhpbmcNCj4gPiA+IHdyb25n\n" - "Lg0KPiA+IA0KPiA+IFlvdSBoYXZlIGEgZ29vZCBwb2ludCwgYW5kIEkgZG9uJ3Qga25vdyBpZiBJ\n" - "IGhhdmUgYW4gYW5zd2VyIGZvcg0KPiA+IHRoaXMuLg0KPiA+IEFzc3VtaW5nIGEgc3lzdGVtIHdp\n" - "dGggTUNFIHJlY292ZXJ5LCBtYXliZSBOT1ZBIGNhbiBhZGQgYSBtY2UNCj4gPiBoYW5kbGVyDQo+\n" - "ID4gc2ltaWxhciB0byBuZml0X2hhbmRsZV9tY2UoKSwgYW5kIGhhbmRsZSBlcnJvcnMgYXMgdGhl\n" - "eSBoYXBwZW4sIGJ1dA0KPiA+IEknbQ0KPiA+IGJlaW5nIHZlcnkgaGFuZC13YXZleSBoZXJlIGFu\n" - "ZCBkb24ndCBrbm93IGhvdyBtdWNoL2hvdyB3ZWxsIHRoYXQNCj4gPiBtaWdodA0KPiA+IHdvcmsu\n" - 
"Lg0KPiA+IA0KPiA+ID4gDQo+ID4gPiA+IE5vLCBpZiB0aGUgbWVkaWEsIGZvciBzb21lIHJlYXNv\n" - "biwgJ2R2ZWxvcHMnIGEgYmFkIGNlbGwsIGENCj4gPiA+ID4gc2Vjb25kDQo+ID4gPiA+IGNvbnNl\n" - "Y3V0aXZlIHJlYWQgZG9lcyBoYXZlIGEgY2hhbmNlIG9mIGJlaW5nIGJhZC4gT25jZSBhDQo+ID4g\n" - "PiA+IGxvY2F0aW9uIGhhcw0KPiA+ID4gPiBiZWVuIG1hcmtlZCBhcyBiYWQsIGl0IHdpbGwgc3Rh\n" - "eSBiYWQgdGlsbCB0aGUgQUNQSSBjbGVhciBlcnJvcg0KPiA+ID4gPiAnRFNNJyBoYXMNCj4gPiA+\n" - "ID4gYmVlbiBjYWxsZWQgdG8gbWFyayBpdCBhcyBjbGVhbi4NCj4gPiA+ID4gDQo+ID4gPiANCj4g\n" - "PiA+IEkgd29uZGVyIHdoYXQgaGFwcGVucyB0byB3cml0ZSBpbiB0aGlzIGNhc2U/IElmIGEgYmxv\n" - "Y2sgaXMgYmFkIGJ1dA0KPiA+ID4gbm90DQo+ID4gPiByZXBvcnRlZCBpbiBiYWRibG9jayBsaXN0\n" - "LiBOb3cgSSB3cml0ZSB0byBpdCB3aXRob3V0IHJlYWRpbmcNCj4gPiA+IGZpcnN0LiBEbw0KPiA+\n" - "ID4gSSBjbGVhciB0aGUgcG9pc29uIHdpdGggdGhlIHdyaXRlPyBPciBzdGlsbCByZXF1aXJlIGEg\n" - "QUNQSSBEU00/DQo+ID4gDQo+ID4gV2l0aCB3cml0ZXMsIG15IHVuZGVyc3RhbmRpbmcgaXMgdGhl\n" - "cmUgaXMgc3RpbGwgYSBwb3NzaWJpbGl0eSB0aGF0DQo+ID4gYW4NCj4gPiBpbnRlcm5hbCByZWFk\n" - "LW1vZGlmeS13cml0ZSBjYW4gaGFwcGVuLCBhbmQgY2F1c2UgYSBNQ0UgKHRoaXMgaXMgdGhlDQo+\n" - "ID4gc2FtZQ0KPiA+IGFzIHdyaXRpbmcgdG8gYSBiYWQgRFJBTSBjZWxsLCB3aGljaCBjYW4gYWxz\n" - "byBjYXVzZSBhbiBNQ0UpLiBZb3UNCj4gPiBjYW4ndA0KPiA+IHJlYWxseSB1c2UgdGhlIEFDUEkg\n" - "RFNNIHByZWVtcHRpdmVseSBiZWNhdXNlIHlvdSBkb24ndCBrbm93IHdoZXRoZXINCj4gPiB0aGUN\n" - "Cj4gPiBsb2NhdGlvbiB3YXMgYmFkLiBUaGUgZXJyb3IgZmxvdyB3aWxsIGJlIHNvbWV0aGluZyBs\n" - "aWtlIHdyaXRlIGNhdXNlcw0KPiA+IHRoZQ0KPiA+IE1DRSwgYSBiYWRibG9jayBnZXRzIGFkZGVk\n" - "IChlaXRoZXIgdGhyb3VnaCB0aGUgbWNlIGhhbmRsZXIgb3IgYWZ0ZXINCj4gPiB0aGUNCj4gPiBu\n" - "ZXh0IHJlYm9vdCksIGFuZCB0aGUgcmVjb3ZlcnkgcGF0aCBpcyBub3cgdGhlIHNhbWUgYXMgYSBy\n" - "ZWd1bGFyDQo+ID4gYmFkYmxvY2suDQo+ID4gDQo+IA0KPiBUaGlzIGlzIGRpZmZlcmVudCBmcm9t\n" - "IG15IHVuZGVyc3RhbmRpbmcuIFJpZ2h0IG5vdyB3cml0ZV9wbWVtKCkgaW4NCj4gcG1lbV9kb19i\n" - "dmVjKCkgZG9lcyBub3QgdXNlIG1lbWNweV9tY3NhZmUoKS4gSWYgdGhlIGJsb2NrIGlzIGJhZCBp\n" - "dA0KPiBjbGVhcnMgcG9pc29uIGFuZCB3cml0ZXMgdG8gcG1lbSBhZ2Fpbi4gU2VlbXMgdG8gbWUg\n" - 
"d3JpdGluZyB0byBiYWQNCj4gYmxvY2tzIGRvZXMgbm90IGNhdXNlIE1DRS4gRG8gd2UgbmVlZCBt\n" - "ZW1jcHlfbWNzYWZlIGZvciBwbWVtIHN0b3Jlcz8NCg0KWW91IGFyZSByaWdodCwgd3JpdGVzIGRv\n" - "bid0IHVzZSBtZW1jcHlfbWNzYWZlLCBhbmQgd2lsbCBub3QgZGlyZWN0bHkNCmNhdXNlIGFuIE1D\n" - "RS4gSG93ZXZlciBhIHdyaXRlIGNhbiBjYXVzZSBhbiBhc3luY2hyb25vdXMgJ0NNQ0knIC0NCmNv\n" - "cnJlY3RlZCBtYWNoaW5lIGNoZWNrIGludGVycnVwdCwgYnV0IHRoaXMgaXMgbm90IGNyaXRpY2Fs\n" - "LCBhbmQgd29udCBiZQ0KYSBtZW1vcnkgZXJyb3IgYXMgdGhlIGNvcmUgZGlkbid0IGNvbnN1bWUg\n" - "cG9pc29uLiBtZW1jcHlfbWNzYWZlIGNhbm5vdA0KcHJvdGVjdCBhZ2FpbnN0IHRoaXMgYmVjYXVz\n" - "ZSB0aGUgd3JpdGUgaXMgJ3Bvc3RlZCcgYW5kIHRoZSBDTUNJIGlzIG5vdA0Kc3luY2hyb25vdXMu\n" - "IE5vdGUgdGhhdCB0aGlzIGlzIG9ubHkgaW4gdGhlIGxhdGVudCBlcnJvciBvciBtZW1tYXAtc3Rv\n" - "cmUNCmNhc2UuDQoNCj4gDQo+IFRoYW5rcywNCj4gQW5kaXJ5DQo+IA0KPiA+ID4gDQo+ID4gPiA+\n" - "IFsxXTogaHR0cDovL3d3dy5saW51eC5zZ2kuY29tL2FyY2hpdmVzL3hmcy8yMDE2LTA2L21zZzAw\n" - "Mjk5Lmh0bWwNCj4gPiA+ID4gDQo+ID4gPiANCj4gPiA+IFRoYW5rIHlvdSBmb3IgdGhlIHBhdGNo\n" - c2V0LiBJIHdpbGwgbG9vayBpbnRvIGl0Lg0KPiA+ID4g + "On Tue, 2017-01-17 at 17:58 -0800, Andiry Xu wrote:\n" + "> On Tue, Jan 17, 2017 at 3:51 PM, Vishal Verma <vishal.l.verma@intel.co\n" + "> m> wrote:\n" + "> > On 01/17, Andiry Xu wrote:\n" + "> > \n" + "> > <snip>\n" + "> > \n" + "> > > > > \n" + "> > > > > The pmem_do_bvec() read logic is like this:\n" + "> > > > > \n" + "> > > > > pmem_do_bvec()\n" + "> > > > > \302\240\302\240\302\240\302\240if (is_bad_pmem())\n" + "> > > > > \302\240\302\240\302\240\302\240\302\240\302\240\302\240\302\240return -EIO;\n" + "> > > > > \302\240\302\240\302\240\302\240else\n" + "> > > > > \302\240\302\240\302\240\302\240\302\240\302\240\302\240\302\240memcpy_from_pmem();\n" + "> > > > > \n" + "> > > > > Note memcpy_from_pmem() is calling memcpy_mcsafe(). Does this\n" + "> > > > > imply\n" + "> > > > > that even if a block is not in the badblock list, it still can\n" + "> > > > > be bad\n" + "> > > > > and causes MCE? 
Does the badblock list get changed during file\n" + "> > > > > system\n" + "> > > > > running? If that is the case, should the file system get a\n" + "> > > > > notification when it gets changed? If a block is good when I\n" + "> > > > > first\n" + "> > > > > read it, can I still trust it to be good for the second\n" + "> > > > > access?\n" + "> > > > \n" + "> > > > Yes, if a block is not in the badblocks list, it can still cause\n" + "> > > > an\n" + "> > > > MCE. This is the latent error case I described above. For a\n" + "> > > > simple read()\n" + "> > > > via the pmem driver, this will get handled by memcpy_mcsafe. For\n" + "> > > > mmap,\n" + "> > > > an MCE is inevitable.\n" + "> > > > \n" + "> > > > Yes the badblocks list may change while a filesystem is running.\n" + "> > > > The RFC\n" + "> > > > patches[1] I linked to add a notification for the filesystem\n" + "> > > > when this\n" + "> > > > happens.\n" + "> > > > \n" + "> > > \n" + "> > > This is really bad and it makes file system implementation much\n" + "> > > more\n" + "> > > complicated. And badblock notification does not help very much,\n" + "> > > because any block can be bad potentially, no matter it is in\n" + "> > > badblock\n" + "> > > list or not. And file system has to perform checking for every\n" + "> > > read,\n" + "> > > using memcpy_mcsafe. This is disaster for file system like NOVA,\n" + "> > > which\n" + "> > > uses pointer de-reference to access data structures on pmem. 
Now\n" + "> > > if I\n" + "> > > want to read a field in an inode on pmem, I have to copy it to\n" + "> > > DRAM\n" + "> > > first and make sure memcpy_mcsafe() does not report anything\n" + "> > > wrong.\n" + "> > \n" + "> > You have a good point, and I don't know if I have an answer for\n" + "> > this..\n" + "> > Assuming a system with MCE recovery, maybe NOVA can add a mce\n" + "> > handler\n" + "> > similar to nfit_handle_mce(), and handle errors as they happen, but\n" + "> > I'm\n" + "> > being very hand-wavey here and don't know how much/how well that\n" + "> > might\n" + "> > work..\n" + "> > \n" + "> > > \n" + "> > > > No, if the media, for some reason, 'dvelops' a bad cell, a\n" + "> > > > second\n" + "> > > > consecutive read does have a chance of being bad. Once a\n" + "> > > > location has\n" + "> > > > been marked as bad, it will stay bad till the ACPI clear error\n" + "> > > > 'DSM' has\n" + "> > > > been called to mark it as clean.\n" + "> > > > \n" + "> > > \n" + "> > > I wonder what happens to write in this case? If a block is bad but\n" + "> > > not\n" + "> > > reported in badblock list. Now I write to it without reading\n" + "> > > first. Do\n" + "> > > I clear the poison with the write? Or still require a ACPI DSM?\n" + "> > \n" + "> > With writes, my understanding is there is still a possibility that\n" + "> > an\n" + "> > internal read-modify-write can happen, and cause a MCE (this is the\n" + "> > same\n" + "> > as writing to a bad DRAM cell, which can also cause an MCE). You\n" + "> > can't\n" + "> > really use the ACPI DSM preemptively because you don't know whether\n" + "> > the\n" + "> > location was bad. The error flow will be something like write causes\n" + "> > the\n" + "> > MCE, a badblock gets added (either through the mce handler or after\n" + "> > the\n" + "> > next reboot), and the recovery path is now the same as a regular\n" + "> > badblock.\n" + "> > \n" + "> \n" + "> This is different from my understanding. 
Right now write_pmem() in\n" + "> pmem_do_bvec() does not use memcpy_mcsafe(). If the block is bad it\n" + "> clears poison and writes to pmem again. Seems to me writing to bad\n" + "> blocks does not cause MCE. Do we need memcpy_mcsafe for pmem stores?\n" + "\n" + "You are right, writes don't use memcpy_mcsafe, and will not directly\n" + "cause an MCE. However a write can cause an asynchronous 'CMCI' -\n" + "corrected machine check interrupt, but this is not critical, and wont be\n" + "a memory error as the core didn't consume poison. memcpy_mcsafe cannot\n" + "protect against this because the write is 'posted' and the CMCI is not\n" + "synchronous. Note that this is only in the latent error or memmap-store\n" + "case.\n" + "\n" + "> \n" + "> Thanks,\n" + "> Andiry\n" + "> \n" + "> > > \n" + "> > > > [1]: http://www.linux.sgi.com/archives/xfs/2016-06/msg00299.html\n" + "> > > > \n" + "> > > \n" + "> > > Thank you for the patchset. I will look into it.\n" + "> > > \n" + "_______________________________________________\n" + "Linux-nvdimm mailing list\n" + "Linux-nvdimm@lists.01.org\n" + https://lists.01.org/mailman/listinfo/linux-nvdimm -f2d10fa7320b827160b9f1c61dcc0c2599df8fbf66603ef83422cab07af32ccc +524345bf3bb68c65aea5f176e62ba4b2ec7bd9dbb66a9accc036a9cea4f224a4
diff --git a/a/1.txt b/N2/1.txt index f21b819..31fcd0f 100644 --- a/a/1.txt +++ b/N2/1.txt @@ -1,81 +1,124 @@ -T24gVHVlLCAyMDE3LTAxLTE3IGF0IDE3OjU4IC0wODAwLCBBbmRpcnkgWHUgd3JvdGU6DQo+IE9u -IFR1ZSwgSmFuIDE3LCAyMDE3IGF0IDM6NTEgUE0sIFZpc2hhbCBWZXJtYSA8dmlzaGFsLmwudmVy -bWFAaW50ZWwuY28NCj4gbT4gd3JvdGU6DQo+ID4gT24gMDEvMTcsIEFuZGlyeSBYdSB3cm90ZToN -Cj4gPiANCj4gPiA8c25pcD4NCj4gPiANCj4gPiA+ID4gPiANCj4gPiA+ID4gPiBUaGUgcG1lbV9k -b19idmVjKCkgcmVhZCBsb2dpYyBpcyBsaWtlIHRoaXM6DQo+ID4gPiA+ID4gDQo+ID4gPiA+ID4g -cG1lbV9kb19idmVjKCkNCj4gPiA+ID4gPiDCoMKgwqDCoGlmIChpc19iYWRfcG1lbSgpKQ0KPiA+ -ID4gPiA+IMKgwqDCoMKgwqDCoMKgwqByZXR1cm4gLUVJTzsNCj4gPiA+ID4gPiDCoMKgwqDCoGVs -c2UNCj4gPiA+ID4gPiDCoMKgwqDCoMKgwqDCoMKgbWVtY3B5X2Zyb21fcG1lbSgpOw0KPiA+ID4g -PiA+IA0KPiA+ID4gPiA+IE5vdGUgbWVtY3B5X2Zyb21fcG1lbSgpIGlzIGNhbGxpbmcgbWVtY3B5 -X21jc2FmZSgpLiBEb2VzIHRoaXMNCj4gPiA+ID4gPiBpbXBseQ0KPiA+ID4gPiA+IHRoYXQgZXZl -biBpZiBhIGJsb2NrIGlzIG5vdCBpbiB0aGUgYmFkYmxvY2sgbGlzdCwgaXQgc3RpbGwgY2FuDQo+ -ID4gPiA+ID4gYmUgYmFkDQo+ID4gPiA+ID4gYW5kIGNhdXNlcyBNQ0U/IERvZXMgdGhlIGJhZGJs -b2NrIGxpc3QgZ2V0IGNoYW5nZWQgZHVyaW5nIGZpbGUNCj4gPiA+ID4gPiBzeXN0ZW0NCj4gPiA+ -ID4gPiBydW5uaW5nPyBJZiB0aGF0IGlzIHRoZSBjYXNlLCBzaG91bGQgdGhlIGZpbGUgc3lzdGVt -IGdldCBhDQo+ID4gPiA+ID4gbm90aWZpY2F0aW9uIHdoZW4gaXQgZ2V0cyBjaGFuZ2VkPyBJZiBh -IGJsb2NrIGlzIGdvb2Qgd2hlbiBJDQo+ID4gPiA+ID4gZmlyc3QNCj4gPiA+ID4gPiByZWFkIGl0 -LCBjYW4gSSBzdGlsbCB0cnVzdCBpdCB0byBiZSBnb29kIGZvciB0aGUgc2Vjb25kDQo+ID4gPiA+ -ID4gYWNjZXNzPw0KPiA+ID4gPiANCj4gPiA+ID4gWWVzLCBpZiBhIGJsb2NrIGlzIG5vdCBpbiB0 -aGUgYmFkYmxvY2tzIGxpc3QsIGl0IGNhbiBzdGlsbCBjYXVzZQ0KPiA+ID4gPiBhbg0KPiA+ID4g -PiBNQ0UuIFRoaXMgaXMgdGhlIGxhdGVudCBlcnJvciBjYXNlIEkgZGVzY3JpYmVkIGFib3ZlLiBG -b3IgYQ0KPiA+ID4gPiBzaW1wbGUgcmVhZCgpDQo+ID4gPiA+IHZpYSB0aGUgcG1lbSBkcml2ZXIs -IHRoaXMgd2lsbCBnZXQgaGFuZGxlZCBieSBtZW1jcHlfbWNzYWZlLiBGb3INCj4gPiA+ID4gbW1h -cCwNCj4gPiA+ID4gYW4gTUNFIGlzIGluZXZpdGFibGUuDQo+ID4gPiA+IA0KPiA+ID4gPiBZZXMg -dGhlIGJhZGJsb2NrcyBsaXN0IG1heSBjaGFuZ2Ugd2hpbGUgYSBmaWxlc3lzdGVtIGlzIHJ1bm5p 
-bmcuDQo+ID4gPiA+IFRoZSBSRkMNCj4gPiA+ID4gcGF0Y2hlc1sxXSBJIGxpbmtlZCB0byBhZGQg -YSBub3RpZmljYXRpb24gZm9yIHRoZSBmaWxlc3lzdGVtDQo+ID4gPiA+IHdoZW4gdGhpcw0KPiA+ -ID4gPiBoYXBwZW5zLg0KPiA+ID4gPiANCj4gPiA+IA0KPiA+ID4gVGhpcyBpcyByZWFsbHkgYmFk -IGFuZCBpdCBtYWtlcyBmaWxlIHN5c3RlbSBpbXBsZW1lbnRhdGlvbiBtdWNoDQo+ID4gPiBtb3Jl -DQo+ID4gPiBjb21wbGljYXRlZC4gQW5kIGJhZGJsb2NrIG5vdGlmaWNhdGlvbiBkb2VzIG5vdCBo -ZWxwIHZlcnkgbXVjaCwNCj4gPiA+IGJlY2F1c2UgYW55IGJsb2NrIGNhbiBiZSBiYWQgcG90ZW50 -aWFsbHksIG5vIG1hdHRlciBpdCBpcyBpbg0KPiA+ID4gYmFkYmxvY2sNCj4gPiA+IGxpc3Qgb3Ig -bm90LiBBbmQgZmlsZSBzeXN0ZW0gaGFzIHRvIHBlcmZvcm0gY2hlY2tpbmcgZm9yIGV2ZXJ5DQo+ -ID4gPiByZWFkLA0KPiA+ID4gdXNpbmcgbWVtY3B5X21jc2FmZS4gVGhpcyBpcyBkaXNhc3RlciBm -b3IgZmlsZSBzeXN0ZW0gbGlrZSBOT1ZBLA0KPiA+ID4gd2hpY2gNCj4gPiA+IHVzZXMgcG9pbnRl -ciBkZS1yZWZlcmVuY2UgdG8gYWNjZXNzIGRhdGEgc3RydWN0dXJlcyBvbiBwbWVtLiBOb3cNCj4g -PiA+IGlmIEkNCj4gPiA+IHdhbnQgdG8gcmVhZCBhIGZpZWxkIGluIGFuIGlub2RlIG9uIHBtZW0s -IEkgaGF2ZSB0byBjb3B5IGl0IHRvDQo+ID4gPiBEUkFNDQo+ID4gPiBmaXJzdCBhbmQgbWFrZSBz -dXJlIG1lbWNweV9tY3NhZmUoKSBkb2VzIG5vdCByZXBvcnQgYW55dGhpbmcNCj4gPiA+IHdyb25n -Lg0KPiA+IA0KPiA+IFlvdSBoYXZlIGEgZ29vZCBwb2ludCwgYW5kIEkgZG9uJ3Qga25vdyBpZiBJ -IGhhdmUgYW4gYW5zd2VyIGZvcg0KPiA+IHRoaXMuLg0KPiA+IEFzc3VtaW5nIGEgc3lzdGVtIHdp -dGggTUNFIHJlY292ZXJ5LCBtYXliZSBOT1ZBIGNhbiBhZGQgYSBtY2UNCj4gPiBoYW5kbGVyDQo+ -ID4gc2ltaWxhciB0byBuZml0X2hhbmRsZV9tY2UoKSwgYW5kIGhhbmRsZSBlcnJvcnMgYXMgdGhl -eSBoYXBwZW4sIGJ1dA0KPiA+IEknbQ0KPiA+IGJlaW5nIHZlcnkgaGFuZC13YXZleSBoZXJlIGFu -ZCBkb24ndCBrbm93IGhvdyBtdWNoL2hvdyB3ZWxsIHRoYXQNCj4gPiBtaWdodA0KPiA+IHdvcmsu -Lg0KPiA+IA0KPiA+ID4gDQo+ID4gPiA+IE5vLCBpZiB0aGUgbWVkaWEsIGZvciBzb21lIHJlYXNv -biwgJ2R2ZWxvcHMnIGEgYmFkIGNlbGwsIGENCj4gPiA+ID4gc2Vjb25kDQo+ID4gPiA+IGNvbnNl -Y3V0aXZlIHJlYWQgZG9lcyBoYXZlIGEgY2hhbmNlIG9mIGJlaW5nIGJhZC4gT25jZSBhDQo+ID4g -PiA+IGxvY2F0aW9uIGhhcw0KPiA+ID4gPiBiZWVuIG1hcmtlZCBhcyBiYWQsIGl0IHdpbGwgc3Rh -eSBiYWQgdGlsbCB0aGUgQUNQSSBjbGVhciBlcnJvcg0KPiA+ID4gPiAnRFNNJyBoYXMNCj4gPiA+ 
-ID4gYmVlbiBjYWxsZWQgdG8gbWFyayBpdCBhcyBjbGVhbi4NCj4gPiA+ID4gDQo+ID4gPiANCj4g -PiA+IEkgd29uZGVyIHdoYXQgaGFwcGVucyB0byB3cml0ZSBpbiB0aGlzIGNhc2U/IElmIGEgYmxv -Y2sgaXMgYmFkIGJ1dA0KPiA+ID4gbm90DQo+ID4gPiByZXBvcnRlZCBpbiBiYWRibG9jayBsaXN0 -LiBOb3cgSSB3cml0ZSB0byBpdCB3aXRob3V0IHJlYWRpbmcNCj4gPiA+IGZpcnN0LiBEbw0KPiA+ -ID4gSSBjbGVhciB0aGUgcG9pc29uIHdpdGggdGhlIHdyaXRlPyBPciBzdGlsbCByZXF1aXJlIGEg -QUNQSSBEU00/DQo+ID4gDQo+ID4gV2l0aCB3cml0ZXMsIG15IHVuZGVyc3RhbmRpbmcgaXMgdGhl -cmUgaXMgc3RpbGwgYSBwb3NzaWJpbGl0eSB0aGF0DQo+ID4gYW4NCj4gPiBpbnRlcm5hbCByZWFk -LW1vZGlmeS13cml0ZSBjYW4gaGFwcGVuLCBhbmQgY2F1c2UgYSBNQ0UgKHRoaXMgaXMgdGhlDQo+ -ID4gc2FtZQ0KPiA+IGFzIHdyaXRpbmcgdG8gYSBiYWQgRFJBTSBjZWxsLCB3aGljaCBjYW4gYWxz -byBjYXVzZSBhbiBNQ0UpLiBZb3UNCj4gPiBjYW4ndA0KPiA+IHJlYWxseSB1c2UgdGhlIEFDUEkg -RFNNIHByZWVtcHRpdmVseSBiZWNhdXNlIHlvdSBkb24ndCBrbm93IHdoZXRoZXINCj4gPiB0aGUN -Cj4gPiBsb2NhdGlvbiB3YXMgYmFkLiBUaGUgZXJyb3IgZmxvdyB3aWxsIGJlIHNvbWV0aGluZyBs -aWtlIHdyaXRlIGNhdXNlcw0KPiA+IHRoZQ0KPiA+IE1DRSwgYSBiYWRibG9jayBnZXRzIGFkZGVk -IChlaXRoZXIgdGhyb3VnaCB0aGUgbWNlIGhhbmRsZXIgb3IgYWZ0ZXINCj4gPiB0aGUNCj4gPiBu -ZXh0IHJlYm9vdCksIGFuZCB0aGUgcmVjb3ZlcnkgcGF0aCBpcyBub3cgdGhlIHNhbWUgYXMgYSBy -ZWd1bGFyDQo+ID4gYmFkYmxvY2suDQo+ID4gDQo+IA0KPiBUaGlzIGlzIGRpZmZlcmVudCBmcm9t -IG15IHVuZGVyc3RhbmRpbmcuIFJpZ2h0IG5vdyB3cml0ZV9wbWVtKCkgaW4NCj4gcG1lbV9kb19i -dmVjKCkgZG9lcyBub3QgdXNlIG1lbWNweV9tY3NhZmUoKS4gSWYgdGhlIGJsb2NrIGlzIGJhZCBp -dA0KPiBjbGVhcnMgcG9pc29uIGFuZCB3cml0ZXMgdG8gcG1lbSBhZ2Fpbi4gU2VlbXMgdG8gbWUg -d3JpdGluZyB0byBiYWQNCj4gYmxvY2tzIGRvZXMgbm90IGNhdXNlIE1DRS4gRG8gd2UgbmVlZCBt -ZW1jcHlfbWNzYWZlIGZvciBwbWVtIHN0b3Jlcz8NCg0KWW91IGFyZSByaWdodCwgd3JpdGVzIGRv -bid0IHVzZSBtZW1jcHlfbWNzYWZlLCBhbmQgd2lsbCBub3QgZGlyZWN0bHkNCmNhdXNlIGFuIE1D -RS4gSG93ZXZlciBhIHdyaXRlIGNhbiBjYXVzZSBhbiBhc3luY2hyb25vdXMgJ0NNQ0knIC0NCmNv -cnJlY3RlZCBtYWNoaW5lIGNoZWNrIGludGVycnVwdCwgYnV0IHRoaXMgaXMgbm90IGNyaXRpY2Fs -LCBhbmQgd29udCBiZQ0KYSBtZW1vcnkgZXJyb3IgYXMgdGhlIGNvcmUgZGlkbid0IGNvbnN1bWUg 
-cG9pc29uLiBtZW1jcHlfbWNzYWZlIGNhbm5vdA0KcHJvdGVjdCBhZ2FpbnN0IHRoaXMgYmVjYXVz -ZSB0aGUgd3JpdGUgaXMgJ3Bvc3RlZCcgYW5kIHRoZSBDTUNJIGlzIG5vdA0Kc3luY2hyb25vdXMu -IE5vdGUgdGhhdCB0aGlzIGlzIG9ubHkgaW4gdGhlIGxhdGVudCBlcnJvciBvciBtZW1tYXAtc3Rv -cmUNCmNhc2UuDQoNCj4gDQo+IFRoYW5rcywNCj4gQW5kaXJ5DQo+IA0KPiA+ID4gDQo+ID4gPiA+ -IFsxXTogaHR0cDovL3d3dy5saW51eC5zZ2kuY29tL2FyY2hpdmVzL3hmcy8yMDE2LTA2L21zZzAw -Mjk5Lmh0bWwNCj4gPiA+ID4gDQo+ID4gPiANCj4gPiA+IFRoYW5rIHlvdSBmb3IgdGhlIHBhdGNo -c2V0LiBJIHdpbGwgbG9vayBpbnRvIGl0Lg0KPiA+ID4g +On Tue, 2017-01-17 at 17:58 -0800, Andiry Xu wrote: +> On Tue, Jan 17, 2017 at 3:51 PM, Vishal Verma <vishal.l.verma@intel.co +> m> wrote: +> > On 01/17, Andiry Xu wrote: +> > +> > <snip> +> > +> > > > > +> > > > > The pmem_do_bvec() read logic is like this: +> > > > > +> > > > > pmem_do_bvec() +> > > > > if (is_bad_pmem()) +> > > > > return -EIO; +> > > > > else +> > > > > memcpy_from_pmem(); +> > > > > +> > > > > Note memcpy_from_pmem() is calling memcpy_mcsafe(). Does this +> > > > > imply +> > > > > that even if a block is not in the badblock list, it still can +> > > > > be bad +> > > > > and causes MCE? Does the badblock list get changed during file +> > > > > system +> > > > > running? If that is the case, should the file system get a +> > > > > notification when it gets changed? If a block is good when I +> > > > > first +> > > > > read it, can I still trust it to be good for the second +> > > > > access? +> > > > +> > > > Yes, if a block is not in the badblocks list, it can still cause +> > > > an +> > > > MCE. This is the latent error case I described above. For a +> > > > simple read() +> > > > via the pmem driver, this will get handled by memcpy_mcsafe. For +> > > > mmap, +> > > > an MCE is inevitable. +> > > > +> > > > Yes the badblocks list may change while a filesystem is running. +> > > > The RFC +> > > > patches[1] I linked to add a notification for the filesystem +> > > > when this +> > > > happens. 
+> > > > +> > > +> > > This is really bad and it makes file system implementation much +> > > more +> > > complicated. And badblock notification does not help very much, +> > > because any block can be bad potentially, no matter it is in +> > > badblock +> > > list or not. And file system has to perform checking for every +> > > read, +> > > using memcpy_mcsafe. This is disaster for file system like NOVA, +> > > which +> > > uses pointer de-reference to access data structures on pmem. Now +> > > if I +> > > want to read a field in an inode on pmem, I have to copy it to +> > > DRAM +> > > first and make sure memcpy_mcsafe() does not report anything +> > > wrong. +> > +> > You have a good point, and I don't know if I have an answer for +> > this.. +> > Assuming a system with MCE recovery, maybe NOVA can add a mce +> > handler +> > similar to nfit_handle_mce(), and handle errors as they happen, but +> > I'm +> > being very hand-wavey here and don't know how much/how well that +> > might +> > work.. +> > +> > > +> > > > No, if the media, for some reason, 'dvelops' a bad cell, a +> > > > second +> > > > consecutive read does have a chance of being bad. Once a +> > > > location has +> > > > been marked as bad, it will stay bad till the ACPI clear error +> > > > 'DSM' has +> > > > been called to mark it as clean. +> > > > +> > > +> > > I wonder what happens to write in this case? If a block is bad but +> > > not +> > > reported in badblock list. Now I write to it without reading +> > > first. Do +> > > I clear the poison with the write? Or still require a ACPI DSM? +> > +> > With writes, my understanding is there is still a possibility that +> > an +> > internal read-modify-write can happen, and cause a MCE (this is the +> > same +> > as writing to a bad DRAM cell, which can also cause an MCE). You +> > can't +> > really use the ACPI DSM preemptively because you don't know whether +> > the +> > location was bad. 
The error flow will be something like write causes +> > the +> > MCE, a badblock gets added (either through the mce handler or after +> > the +> > next reboot), and the recovery path is now the same as a regular +> > badblock. +> > +> +> This is different from my understanding. Right now write_pmem() in +> pmem_do_bvec() does not use memcpy_mcsafe(). If the block is bad it +> clears poison and writes to pmem again. Seems to me writing to bad +> blocks does not cause MCE. Do we need memcpy_mcsafe for pmem stores? + +You are right, writes don't use memcpy_mcsafe, and will not directly +cause an MCE. However a write can cause an asynchronous 'CMCI' - +corrected machine check interrupt, but this is not critical, and wont be +a memory error as the core didn't consume poison. memcpy_mcsafe cannot +protect against this because the write is 'posted' and the CMCI is not +synchronous. Note that this is only in the latent error or memmap-store +case. + +> +> Thanks, +> Andiry +> +> > > +> > > > [1]: http://www.linux.sgi.com/archives/xfs/2016-06/msg00299.html +> > > > +> > > +> > > Thank you for the patchset. I will look into it. 
+> > > diff --git a/a/content_digest b/N2/content_digest index ab6185c..3c67493 100644 --- a/a/content_digest +++ b/N2/content_digest @@ -22,86 +22,129 @@ " linux-fsdevel@vger.kernel.org <linux-fsdevel@vger.kernel.org>\0" "\00:1\0" "b\0" - "T24gVHVlLCAyMDE3LTAxLTE3IGF0IDE3OjU4IC0wODAwLCBBbmRpcnkgWHUgd3JvdGU6DQo+IE9u\n" - "IFR1ZSwgSmFuIDE3LCAyMDE3IGF0IDM6NTEgUE0sIFZpc2hhbCBWZXJtYSA8dmlzaGFsLmwudmVy\n" - "bWFAaW50ZWwuY28NCj4gbT4gd3JvdGU6DQo+ID4gT24gMDEvMTcsIEFuZGlyeSBYdSB3cm90ZToN\n" - "Cj4gPiANCj4gPiA8c25pcD4NCj4gPiANCj4gPiA+ID4gPiANCj4gPiA+ID4gPiBUaGUgcG1lbV9k\n" - "b19idmVjKCkgcmVhZCBsb2dpYyBpcyBsaWtlIHRoaXM6DQo+ID4gPiA+ID4gDQo+ID4gPiA+ID4g\n" - "cG1lbV9kb19idmVjKCkNCj4gPiA+ID4gPiDCoMKgwqDCoGlmIChpc19iYWRfcG1lbSgpKQ0KPiA+\n" - "ID4gPiA+IMKgwqDCoMKgwqDCoMKgwqByZXR1cm4gLUVJTzsNCj4gPiA+ID4gPiDCoMKgwqDCoGVs\n" - "c2UNCj4gPiA+ID4gPiDCoMKgwqDCoMKgwqDCoMKgbWVtY3B5X2Zyb21fcG1lbSgpOw0KPiA+ID4g\n" - "PiA+IA0KPiA+ID4gPiA+IE5vdGUgbWVtY3B5X2Zyb21fcG1lbSgpIGlzIGNhbGxpbmcgbWVtY3B5\n" - "X21jc2FmZSgpLiBEb2VzIHRoaXMNCj4gPiA+ID4gPiBpbXBseQ0KPiA+ID4gPiA+IHRoYXQgZXZl\n" - "biBpZiBhIGJsb2NrIGlzIG5vdCBpbiB0aGUgYmFkYmxvY2sgbGlzdCwgaXQgc3RpbGwgY2FuDQo+\n" - "ID4gPiA+ID4gYmUgYmFkDQo+ID4gPiA+ID4gYW5kIGNhdXNlcyBNQ0U/IERvZXMgdGhlIGJhZGJs\n" - "b2NrIGxpc3QgZ2V0IGNoYW5nZWQgZHVyaW5nIGZpbGUNCj4gPiA+ID4gPiBzeXN0ZW0NCj4gPiA+\n" - "ID4gPiBydW5uaW5nPyBJZiB0aGF0IGlzIHRoZSBjYXNlLCBzaG91bGQgdGhlIGZpbGUgc3lzdGVt\n" - "IGdldCBhDQo+ID4gPiA+ID4gbm90aWZpY2F0aW9uIHdoZW4gaXQgZ2V0cyBjaGFuZ2VkPyBJZiBh\n" - "IGJsb2NrIGlzIGdvb2Qgd2hlbiBJDQo+ID4gPiA+ID4gZmlyc3QNCj4gPiA+ID4gPiByZWFkIGl0\n" - "LCBjYW4gSSBzdGlsbCB0cnVzdCBpdCB0byBiZSBnb29kIGZvciB0aGUgc2Vjb25kDQo+ID4gPiA+\n" - "ID4gYWNjZXNzPw0KPiA+ID4gPiANCj4gPiA+ID4gWWVzLCBpZiBhIGJsb2NrIGlzIG5vdCBpbiB0\n" - "aGUgYmFkYmxvY2tzIGxpc3QsIGl0IGNhbiBzdGlsbCBjYXVzZQ0KPiA+ID4gPiBhbg0KPiA+ID4g\n" - "PiBNQ0UuIFRoaXMgaXMgdGhlIGxhdGVudCBlcnJvciBjYXNlIEkgZGVzY3JpYmVkIGFib3ZlLiBG\n" - "b3IgYQ0KPiA+ID4gPiBzaW1wbGUgcmVhZCgpDQo+ID4gPiA+IHZpYSB0aGUgcG1lbSBkcml2ZXIs\n" - 
"IHRoaXMgd2lsbCBnZXQgaGFuZGxlZCBieSBtZW1jcHlfbWNzYWZlLiBGb3INCj4gPiA+ID4gbW1h\n" - "cCwNCj4gPiA+ID4gYW4gTUNFIGlzIGluZXZpdGFibGUuDQo+ID4gPiA+IA0KPiA+ID4gPiBZZXMg\n" - "dGhlIGJhZGJsb2NrcyBsaXN0IG1heSBjaGFuZ2Ugd2hpbGUgYSBmaWxlc3lzdGVtIGlzIHJ1bm5p\n" - "bmcuDQo+ID4gPiA+IFRoZSBSRkMNCj4gPiA+ID4gcGF0Y2hlc1sxXSBJIGxpbmtlZCB0byBhZGQg\n" - "YSBub3RpZmljYXRpb24gZm9yIHRoZSBmaWxlc3lzdGVtDQo+ID4gPiA+IHdoZW4gdGhpcw0KPiA+\n" - "ID4gPiBoYXBwZW5zLg0KPiA+ID4gPiANCj4gPiA+IA0KPiA+ID4gVGhpcyBpcyByZWFsbHkgYmFk\n" - "IGFuZCBpdCBtYWtlcyBmaWxlIHN5c3RlbSBpbXBsZW1lbnRhdGlvbiBtdWNoDQo+ID4gPiBtb3Jl\n" - "DQo+ID4gPiBjb21wbGljYXRlZC4gQW5kIGJhZGJsb2NrIG5vdGlmaWNhdGlvbiBkb2VzIG5vdCBo\n" - "ZWxwIHZlcnkgbXVjaCwNCj4gPiA+IGJlY2F1c2UgYW55IGJsb2NrIGNhbiBiZSBiYWQgcG90ZW50\n" - "aWFsbHksIG5vIG1hdHRlciBpdCBpcyBpbg0KPiA+ID4gYmFkYmxvY2sNCj4gPiA+IGxpc3Qgb3Ig\n" - "bm90LiBBbmQgZmlsZSBzeXN0ZW0gaGFzIHRvIHBlcmZvcm0gY2hlY2tpbmcgZm9yIGV2ZXJ5DQo+\n" - "ID4gPiByZWFkLA0KPiA+ID4gdXNpbmcgbWVtY3B5X21jc2FmZS4gVGhpcyBpcyBkaXNhc3RlciBm\n" - "b3IgZmlsZSBzeXN0ZW0gbGlrZSBOT1ZBLA0KPiA+ID4gd2hpY2gNCj4gPiA+IHVzZXMgcG9pbnRl\n" - "ciBkZS1yZWZlcmVuY2UgdG8gYWNjZXNzIGRhdGEgc3RydWN0dXJlcyBvbiBwbWVtLiBOb3cNCj4g\n" - "PiA+IGlmIEkNCj4gPiA+IHdhbnQgdG8gcmVhZCBhIGZpZWxkIGluIGFuIGlub2RlIG9uIHBtZW0s\n" - "IEkgaGF2ZSB0byBjb3B5IGl0IHRvDQo+ID4gPiBEUkFNDQo+ID4gPiBmaXJzdCBhbmQgbWFrZSBz\n" - "dXJlIG1lbWNweV9tY3NhZmUoKSBkb2VzIG5vdCByZXBvcnQgYW55dGhpbmcNCj4gPiA+IHdyb25n\n" - "Lg0KPiA+IA0KPiA+IFlvdSBoYXZlIGEgZ29vZCBwb2ludCwgYW5kIEkgZG9uJ3Qga25vdyBpZiBJ\n" - "IGhhdmUgYW4gYW5zd2VyIGZvcg0KPiA+IHRoaXMuLg0KPiA+IEFzc3VtaW5nIGEgc3lzdGVtIHdp\n" - "dGggTUNFIHJlY292ZXJ5LCBtYXliZSBOT1ZBIGNhbiBhZGQgYSBtY2UNCj4gPiBoYW5kbGVyDQo+\n" - "ID4gc2ltaWxhciB0byBuZml0X2hhbmRsZV9tY2UoKSwgYW5kIGhhbmRsZSBlcnJvcnMgYXMgdGhl\n" - "eSBoYXBwZW4sIGJ1dA0KPiA+IEknbQ0KPiA+IGJlaW5nIHZlcnkgaGFuZC13YXZleSBoZXJlIGFu\n" - "ZCBkb24ndCBrbm93IGhvdyBtdWNoL2hvdyB3ZWxsIHRoYXQNCj4gPiBtaWdodA0KPiA+IHdvcmsu\n" - "Lg0KPiA+IA0KPiA+ID4gDQo+ID4gPiA+IE5vLCBpZiB0aGUgbWVkaWEsIGZvciBzb21lIHJlYXNv\n" - 
"biwgJ2R2ZWxvcHMnIGEgYmFkIGNlbGwsIGENCj4gPiA+ID4gc2Vjb25kDQo+ID4gPiA+IGNvbnNl\n" - "Y3V0aXZlIHJlYWQgZG9lcyBoYXZlIGEgY2hhbmNlIG9mIGJlaW5nIGJhZC4gT25jZSBhDQo+ID4g\n" - "PiA+IGxvY2F0aW9uIGhhcw0KPiA+ID4gPiBiZWVuIG1hcmtlZCBhcyBiYWQsIGl0IHdpbGwgc3Rh\n" - "eSBiYWQgdGlsbCB0aGUgQUNQSSBjbGVhciBlcnJvcg0KPiA+ID4gPiAnRFNNJyBoYXMNCj4gPiA+\n" - "ID4gYmVlbiBjYWxsZWQgdG8gbWFyayBpdCBhcyBjbGVhbi4NCj4gPiA+ID4gDQo+ID4gPiANCj4g\n" - "PiA+IEkgd29uZGVyIHdoYXQgaGFwcGVucyB0byB3cml0ZSBpbiB0aGlzIGNhc2U/IElmIGEgYmxv\n" - "Y2sgaXMgYmFkIGJ1dA0KPiA+ID4gbm90DQo+ID4gPiByZXBvcnRlZCBpbiBiYWRibG9jayBsaXN0\n" - "LiBOb3cgSSB3cml0ZSB0byBpdCB3aXRob3V0IHJlYWRpbmcNCj4gPiA+IGZpcnN0LiBEbw0KPiA+\n" - "ID4gSSBjbGVhciB0aGUgcG9pc29uIHdpdGggdGhlIHdyaXRlPyBPciBzdGlsbCByZXF1aXJlIGEg\n" - "QUNQSSBEU00/DQo+ID4gDQo+ID4gV2l0aCB3cml0ZXMsIG15IHVuZGVyc3RhbmRpbmcgaXMgdGhl\n" - "cmUgaXMgc3RpbGwgYSBwb3NzaWJpbGl0eSB0aGF0DQo+ID4gYW4NCj4gPiBpbnRlcm5hbCByZWFk\n" - "LW1vZGlmeS13cml0ZSBjYW4gaGFwcGVuLCBhbmQgY2F1c2UgYSBNQ0UgKHRoaXMgaXMgdGhlDQo+\n" - "ID4gc2FtZQ0KPiA+IGFzIHdyaXRpbmcgdG8gYSBiYWQgRFJBTSBjZWxsLCB3aGljaCBjYW4gYWxz\n" - "byBjYXVzZSBhbiBNQ0UpLiBZb3UNCj4gPiBjYW4ndA0KPiA+IHJlYWxseSB1c2UgdGhlIEFDUEkg\n" - "RFNNIHByZWVtcHRpdmVseSBiZWNhdXNlIHlvdSBkb24ndCBrbm93IHdoZXRoZXINCj4gPiB0aGUN\n" - "Cj4gPiBsb2NhdGlvbiB3YXMgYmFkLiBUaGUgZXJyb3IgZmxvdyB3aWxsIGJlIHNvbWV0aGluZyBs\n" - "aWtlIHdyaXRlIGNhdXNlcw0KPiA+IHRoZQ0KPiA+IE1DRSwgYSBiYWRibG9jayBnZXRzIGFkZGVk\n" - "IChlaXRoZXIgdGhyb3VnaCB0aGUgbWNlIGhhbmRsZXIgb3IgYWZ0ZXINCj4gPiB0aGUNCj4gPiBu\n" - "ZXh0IHJlYm9vdCksIGFuZCB0aGUgcmVjb3ZlcnkgcGF0aCBpcyBub3cgdGhlIHNhbWUgYXMgYSBy\n" - "ZWd1bGFyDQo+ID4gYmFkYmxvY2suDQo+ID4gDQo+IA0KPiBUaGlzIGlzIGRpZmZlcmVudCBmcm9t\n" - "IG15IHVuZGVyc3RhbmRpbmcuIFJpZ2h0IG5vdyB3cml0ZV9wbWVtKCkgaW4NCj4gcG1lbV9kb19i\n" - "dmVjKCkgZG9lcyBub3QgdXNlIG1lbWNweV9tY3NhZmUoKS4gSWYgdGhlIGJsb2NrIGlzIGJhZCBp\n" - "dA0KPiBjbGVhcnMgcG9pc29uIGFuZCB3cml0ZXMgdG8gcG1lbSBhZ2Fpbi4gU2VlbXMgdG8gbWUg\n" - "d3JpdGluZyB0byBiYWQNCj4gYmxvY2tzIGRvZXMgbm90IGNhdXNlIE1DRS4gRG8gd2UgbmVlZCBt\n" - 
"ZW1jcHlfbWNzYWZlIGZvciBwbWVtIHN0b3Jlcz8NCg0KWW91IGFyZSByaWdodCwgd3JpdGVzIGRv\n" - "bid0IHVzZSBtZW1jcHlfbWNzYWZlLCBhbmQgd2lsbCBub3QgZGlyZWN0bHkNCmNhdXNlIGFuIE1D\n" - "RS4gSG93ZXZlciBhIHdyaXRlIGNhbiBjYXVzZSBhbiBhc3luY2hyb25vdXMgJ0NNQ0knIC0NCmNv\n" - "cnJlY3RlZCBtYWNoaW5lIGNoZWNrIGludGVycnVwdCwgYnV0IHRoaXMgaXMgbm90IGNyaXRpY2Fs\n" - "LCBhbmQgd29udCBiZQ0KYSBtZW1vcnkgZXJyb3IgYXMgdGhlIGNvcmUgZGlkbid0IGNvbnN1bWUg\n" - "cG9pc29uLiBtZW1jcHlfbWNzYWZlIGNhbm5vdA0KcHJvdGVjdCBhZ2FpbnN0IHRoaXMgYmVjYXVz\n" - "ZSB0aGUgd3JpdGUgaXMgJ3Bvc3RlZCcgYW5kIHRoZSBDTUNJIGlzIG5vdA0Kc3luY2hyb25vdXMu\n" - "IE5vdGUgdGhhdCB0aGlzIGlzIG9ubHkgaW4gdGhlIGxhdGVudCBlcnJvciBvciBtZW1tYXAtc3Rv\n" - "cmUNCmNhc2UuDQoNCj4gDQo+IFRoYW5rcywNCj4gQW5kaXJ5DQo+IA0KPiA+ID4gDQo+ID4gPiA+\n" - "IFsxXTogaHR0cDovL3d3dy5saW51eC5zZ2kuY29tL2FyY2hpdmVzL3hmcy8yMDE2LTA2L21zZzAw\n" - "Mjk5Lmh0bWwNCj4gPiA+ID4gDQo+ID4gPiANCj4gPiA+IFRoYW5rIHlvdSBmb3IgdGhlIHBhdGNo\n" - c2V0LiBJIHdpbGwgbG9vayBpbnRvIGl0Lg0KPiA+ID4g + "On Tue, 2017-01-17 at 17:58 -0800, Andiry Xu wrote:\n" + "> On Tue, Jan 17, 2017 at 3:51 PM, Vishal Verma <vishal.l.verma@intel.co\n" + "> m> wrote:\n" + "> > On 01/17, Andiry Xu wrote:\n" + "> > \n" + "> > <snip>\n" + "> > \n" + "> > > > > \n" + "> > > > > The pmem_do_bvec() read logic is like this:\n" + "> > > > > \n" + "> > > > > pmem_do_bvec()\n" + "> > > > > \302\240\302\240\302\240\302\240if (is_bad_pmem())\n" + "> > > > > \302\240\302\240\302\240\302\240\302\240\302\240\302\240\302\240return -EIO;\n" + "> > > > > \302\240\302\240\302\240\302\240else\n" + "> > > > > \302\240\302\240\302\240\302\240\302\240\302\240\302\240\302\240memcpy_from_pmem();\n" + "> > > > > \n" + "> > > > > Note memcpy_from_pmem() is calling memcpy_mcsafe(). Does this\n" + "> > > > > imply\n" + "> > > > > that even if a block is not in the badblock list, it still can\n" + "> > > > > be bad\n" + "> > > > > and causes MCE? Does the badblock list get changed during file\n" + "> > > > > system\n" + "> > > > > running? 
If that is the case, should the file system get a\n" + "> > > > > notification when it gets changed? If a block is good when I\n" + "> > > > > first\n" + "> > > > > read it, can I still trust it to be good for the second\n" + "> > > > > access?\n" + "> > > > \n" + "> > > > Yes, if a block is not in the badblocks list, it can still cause\n" + "> > > > an\n" + "> > > > MCE. This is the latent error case I described above. For a\n" + "> > > > simple read()\n" + "> > > > via the pmem driver, this will get handled by memcpy_mcsafe. For\n" + "> > > > mmap,\n" + "> > > > an MCE is inevitable.\n" + "> > > > \n" + "> > > > Yes the badblocks list may change while a filesystem is running.\n" + "> > > > The RFC\n" + "> > > > patches[1] I linked to add a notification for the filesystem\n" + "> > > > when this\n" + "> > > > happens.\n" + "> > > > \n" + "> > > \n" + "> > > This is really bad and it makes file system implementation much\n" + "> > > more\n" + "> > > complicated. And badblock notification does not help very much,\n" + "> > > because any block can be bad potentially, no matter it is in\n" + "> > > badblock\n" + "> > > list or not. And file system has to perform checking for every\n" + "> > > read,\n" + "> > > using memcpy_mcsafe. This is disaster for file system like NOVA,\n" + "> > > which\n" + "> > > uses pointer de-reference to access data structures on pmem. 
Now\n" + "> > > if I\n" + "> > > want to read a field in an inode on pmem, I have to copy it to\n" + "> > > DRAM\n" + "> > > first and make sure memcpy_mcsafe() does not report anything\n" + "> > > wrong.\n" + "> > \n" + "> > You have a good point, and I don't know if I have an answer for\n" + "> > this..\n" + "> > Assuming a system with MCE recovery, maybe NOVA can add a mce\n" + "> > handler\n" + "> > similar to nfit_handle_mce(), and handle errors as they happen, but\n" + "> > I'm\n" + "> > being very hand-wavey here and don't know how much/how well that\n" + "> > might\n" + "> > work..\n" + "> > \n" + "> > > \n" + "> > > > No, if the media, for some reason, 'dvelops' a bad cell, a\n" + "> > > > second\n" + "> > > > consecutive read does have a chance of being bad. Once a\n" + "> > > > location has\n" + "> > > > been marked as bad, it will stay bad till the ACPI clear error\n" + "> > > > 'DSM' has\n" + "> > > > been called to mark it as clean.\n" + "> > > > \n" + "> > > \n" + "> > > I wonder what happens to write in this case? If a block is bad but\n" + "> > > not\n" + "> > > reported in badblock list. Now I write to it without reading\n" + "> > > first. Do\n" + "> > > I clear the poison with the write? Or still require a ACPI DSM?\n" + "> > \n" + "> > With writes, my understanding is there is still a possibility that\n" + "> > an\n" + "> > internal read-modify-write can happen, and cause a MCE (this is the\n" + "> > same\n" + "> > as writing to a bad DRAM cell, which can also cause an MCE). You\n" + "> > can't\n" + "> > really use the ACPI DSM preemptively because you don't know whether\n" + "> > the\n" + "> > location was bad. The error flow will be something like write causes\n" + "> > the\n" + "> > MCE, a badblock gets added (either through the mce handler or after\n" + "> > the\n" + "> > next reboot), and the recovery path is now the same as a regular\n" + "> > badblock.\n" + "> > \n" + "> \n" + "> This is different from my understanding. 
Right now write_pmem() in\n" + "> pmem_do_bvec() does not use memcpy_mcsafe(). If the block is bad it\n" + "> clears poison and writes to pmem again. Seems to me writing to bad\n" + "> blocks does not cause MCE. Do we need memcpy_mcsafe for pmem stores?\n" + "\n" + "You are right, writes don't use memcpy_mcsafe, and will not directly\n" + "cause an MCE. However a write can cause an asynchronous 'CMCI' -\n" + "corrected machine check interrupt, but this is not critical, and wont be\n" + "a memory error as the core didn't consume poison. memcpy_mcsafe cannot\n" + "protect against this because the write is 'posted' and the CMCI is not\n" + "synchronous. Note that this is only in the latent error or memmap-store\n" + "case.\n" + "\n" + "> \n" + "> Thanks,\n" + "> Andiry\n" + "> \n" + "> > > \n" + "> > > > [1]: http://www.linux.sgi.com/archives/xfs/2016-06/msg00299.html\n" + "> > > > \n" + "> > > \n" + "> > > Thank you for the patchset. I will look into it.\n" + > > > -f2d10fa7320b827160b9f1c61dcc0c2599df8fbf66603ef83422cab07af32ccc +7e8770f3f253ce8b0cf565b3b94fb8ecd7881357581d7d7004e0c1ea7aab2d8f