From mboxrd@z Thu Jan 1 00:00:00 1970 From: Ross Zwisler Subject: Re: [PATCH 5/7] dax: New fault locking Date: Thu, 12 May 2016 13:36:24 -0600 Message-ID: <20160512193624.GA20719@linux.intel.com> References: <1463070560-1979-1-git-send-email-jack@suse.cz> <1463070560-1979-6-git-send-email-jack@suse.cz> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: linux-nvdimm@lists.01.org, Ted Tso , linux-ext4@vger.kernel.org, Vishal Verma , Ross Zwisler , Dan Williams , linux-fsdevel@vger.kernel.org To: Jan Kara Return-path: Content-Disposition: inline In-Reply-To: <1463070560-1979-6-git-send-email-jack@suse.cz> Sender: linux-fsdevel-owner@vger.kernel.org List-Id: linux-ext4.vger.kernel.org On Thu, May 12, 2016 at 06:29:18PM +0200, Jan Kara wrote: > Currently DAX page fault locking is racy. >=20 > CPU0 (write fault) CPU1 (read fault) >=20 > __dax_fault() __dax_fault() > get_block(inode, block, &bh, 0) -> not mapped > get_block(inode, block, &bh, 0) > -> not mapped > if (!buffer_mapped(&bh)) > if (vmf->flags & FAULT_FLAG_WRITE) > get_block(inode, block, &bh, 1) -> allocates blocks > if (page) -> no > if (!buffer_mapped(&bh)) > if (vmf->flags & FAULT_FLAG_WRITE) { > } else { > dax_load_hole(); > } > dax_insert_mapping() >=20 > And we are in a situation where we fail in dax_radix_entry() with -EI= O. >=20 > Another problem with the current DAX page fault locking is that there= is > no race-free way to clear dirty tag in the radix tree. We can always > end up with clean radix tree and dirty data in CPU cache. >=20 > We fix the first problem by introducing locking of exceptional radix > tree entries in DAX mappings acting very similarly to page lock and t= hus > synchronizing properly faults against the same mapping index. The sam= e > lock can later be used to avoid races when clearing radix tree dirty > tag. >=20 > Reviewed-by: NeilBrown > Reviewed-by: Ross Zwisler > Signed-off-by: Jan Kara > --- <> > @@ -897,13 +1166,10 @@ int __dax_pmd_fault(struct vm_area_struct *vma= , unsigned long address, > * the write to insert a dirty entry. > */ > if (write) { > - error =3D dax_radix_entry(mapping, pgoff, dax.sector, > - true, true); > - if (error) { > - dax_pmd_dbg(&bh, address, > - "PMD radix insertion failed"); > - goto fallback; > - } > + /* > + * We should insert radix-tree entry and dirty it here. > + * For now this is broken... > + */ With this change the 'error' variable in __dax_pmd_fault() is now unuse= d, resulting in a compiler warning. fs/dax.c: In function =E2=80=98__dax_pmd_fault=E2=80=99: fs/dax.c:1019:6: warning: unused variable =E2=80=98error=E2=80=99 [-Wun= used-variable] int error, result =3D 0; ^ -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel= " in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by ml01.01.org (Postfix) with ESMTP id 5814B1A1F8D for ; Thu, 12 May 2016 12:36:28 -0700 (PDT) Date: Thu, 12 May 2016 13:36:24 -0600 From: Ross Zwisler Subject: Re: [PATCH 5/7] dax: New fault locking Message-ID: <20160512193624.GA20719@linux.intel.com> References: <1463070560-1979-1-git-send-email-jack@suse.cz> <1463070560-1979-6-git-send-email-jack@suse.cz> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <1463070560-1979-6-git-send-email-jack@suse.cz> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: linux-nvdimm-bounces@lists.01.org Sender: "Linux-nvdimm" To: Jan Kara Cc: Ted Tso , linux-nvdimm@lists.01.org, linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org List-ID: T24gVGh1LCBNYXkgMTIsIDIwMTYgYXQgMDY6Mjk6MThQTSArMDIwMCwgSmFuIEthcmEgd3JvdGU6 Cj4gQ3VycmVudGx5IERBWCBwYWdlIGZhdWx0IGxvY2tpbmcgaXMgcmFjeS4KPiAKPiBDUFUwICh3 cml0ZSBmYXVsdCkJCUNQVTEgKHJlYWQgZmF1bHQpCj4gCj4gX19kYXhfZmF1bHQoKQkJCV9fZGF4 X2ZhdWx0KCkKPiAgIGdldF9ibG9jayhpbm9kZSwgYmxvY2ssICZiaCwgMCkgLT4gbm90IG1hcHBl ZAo+IAkJCQkgIGdldF9ibG9jayhpbm9kZSwgYmxvY2ssICZiaCwgMCkKPiAJCQkJICAgIC0+IG5v dCBtYXBwZWQKPiAgIGlmICghYnVmZmVyX21hcHBlZCgmYmgpKQo+ICAgICBpZiAodm1mLT5mbGFn cyAmIEZBVUxUX0ZMQUdfV1JJVEUpCj4gICAgICAgZ2V0X2Jsb2NrKGlub2RlLCBibG9jaywgJmJo LCAxKSAtPiBhbGxvY2F0ZXMgYmxvY2tzCj4gICBpZiAocGFnZSkgLT4gbm8KPiAJCQkJICBpZiAo IWJ1ZmZlcl9tYXBwZWQoJmJoKSkKPiAJCQkJICAgIGlmICh2bWYtPmZsYWdzICYgRkFVTFRfRkxB R19XUklURSkgewo+IAkJCQkgICAgfSBlbHNlIHsKPiAJCQkJICAgICAgZGF4X2xvYWRfaG9sZSgp Owo+IAkJCQkgICAgfQo+ICAgZGF4X2luc2VydF9tYXBwaW5nKCkKPiAKPiBBbmQgd2UgYXJlIGlu IGEgc2l0dWF0aW9uIHdoZXJlIHdlIGZhaWwgaW4gZGF4X3JhZGl4X2VudHJ5KCkgd2l0aCAtRUlP Lgo+IAo+IEFub3RoZXIgcHJvYmxlbSB3aXRoIHRoZSBjdXJyZW50IERBWCBwYWdlIGZhdWx0IGxv Y2tpbmcgaXMgdGhhdCB0aGVyZSBpcwo+IG5vIHJhY2UtZnJlZSB3YXkgdG8gY2xlYXIgZGlydHkg dGFnIGluIHRoZSByYWRpeCB0cmVlLiBXZSBjYW4gYWx3YXlzCj4gZW5kIHVwIHdpdGggY2xlYW4g cmFkaXggdHJlZSBhbmQgZGlydHkgZGF0YSBpbiBDUFUgY2FjaGUuCj4gCj4gV2UgZml4IHRoZSBm aXJzdCBwcm9ibGVtIGJ5IGludHJvZHVjaW5nIGxvY2tpbmcgb2YgZXhjZXB0aW9uYWwgcmFkaXgK PiB0cmVlIGVudHJpZXMgaW4gREFYIG1hcHBpbmdzIGFjdGluZyB2ZXJ5IHNpbWlsYXJseSB0byBw YWdlIGxvY2sgYW5kIHRodXMKPiBzeW5jaHJvbml6aW5nIHByb3Blcmx5IGZhdWx0cyBhZ2FpbnN0 IHRoZSBzYW1lIG1hcHBpbmcgaW5kZXguIFRoZSBzYW1lCj4gbG9jayBjYW4gbGF0ZXIgYmUgdXNl ZCB0byBhdm9pZCByYWNlcyB3aGVuIGNsZWFyaW5nIHJhZGl4IHRyZWUgZGlydHkKPiB0YWcuCj4g Cj4gUmV2aWV3ZWQtYnk6IE5laWxCcm93biA8bmVpbGJAc3VzZS5jb20+Cj4gUmV2aWV3ZWQtYnk6 IFJvc3MgWndpc2xlciA8cm9zcy56d2lzbGVyQGxpbnV4LmludGVsLmNvbT4KPiBTaWduZWQtb2Zm LWJ5OiBKYW4gS2FyYSA8amFja0BzdXNlLmN6Pgo+IC0tLQo8Pgo+IEBAIC04OTcsMTMgKzExNjYs MTAgQEAgaW50IF9fZGF4X3BtZF9mYXVsdChzdHJ1Y3Qgdm1fYXJlYV9zdHJ1Y3QgKnZtYSwgdW5z aWduZWQgbG9uZyBhZGRyZXNzLAo+ICAJCSAqIHRoZSB3cml0ZSB0byBpbnNlcnQgYSBkaXJ0eSBl bnRyeS4KPiAgCQkgKi8KPiAgCQlpZiAod3JpdGUpIHsKPiAtCQkJZXJyb3IgPSBkYXhfcmFkaXhf ZW50cnkobWFwcGluZywgcGdvZmYsIGRheC5zZWN0b3IsCj4gLQkJCQkJdHJ1ZSwgdHJ1ZSk7Cj4g LQkJCWlmIChlcnJvcikgewo+IC0JCQkJZGF4X3BtZF9kYmcoJmJoLCBhZGRyZXNzLAo+IC0JCQkJ CQkiUE1EIHJhZGl4IGluc2VydGlvbiBmYWlsZWQiKTsKPiAtCQkJCWdvdG8gZmFsbGJhY2s7Cj4g LQkJCX0KPiArCQkJLyoKPiArCQkJICogV2Ugc2hvdWxkIGluc2VydCByYWRpeC10cmVlIGVudHJ5 IGFuZCBkaXJ0eSBpdCBoZXJlLgo+ICsJCQkgKiBGb3Igbm93IHRoaXMgaXMgYnJva2VuLi4uCj4g KwkJCSAqLwoKV2l0aCB0aGlzIGNoYW5nZSB0aGUgJ2Vycm9yJyB2YXJpYWJsZSBpbiBfX2RheF9w bWRfZmF1bHQoKSBpcyBub3cgdW51c2VkLApyZXN1bHRpbmcgaW4gYSBjb21waWxlciB3YXJuaW5n LgoKZnMvZGF4LmM6IEluIGZ1bmN0aW9uIOKAmF9fZGF4X3BtZF9mYXVsdOKAmToKZnMvZGF4LmM6 MTAxOTo2OiB3YXJuaW5nOiB1bnVzZWQgdmFyaWFibGUg4oCYZXJyb3LigJkgWy1XdW51c2VkLXZh cmlhYmxlXQogIGludCBlcnJvciwgcmVzdWx0ID0gMDsKICAgICAgICBeCgpfX19fX19fX19fX19f X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fXwpMaW51eC1udmRpbW0gbWFpbGluZyBs aXN0CkxpbnV4LW52ZGltbUBsaXN0cy4wMS5vcmcKaHR0cHM6Ly9saXN0cy4wMS5vcmcvbWFpbG1h bi9saXN0aW5mby9saW51eC1udmRpbW0K From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga02.intel.com ([134.134.136.20]:51645 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752693AbcELTg2 (ORCPT ); Thu, 12 May 2016 15:36:28 -0400 Date: Thu, 12 May 2016 13:36:24 -0600 From: Ross Zwisler To: Jan Kara Cc: linux-nvdimm@lists.01.org, Ted Tso , linux-ext4@vger.kernel.org, Vishal Verma , Ross Zwisler , Dan Williams , linux-fsdevel@vger.kernel.org Subject: Re: [PATCH 5/7] dax: New fault locking Message-ID: <20160512193624.GA20719@linux.intel.com> References: <1463070560-1979-1-git-send-email-jack@suse.cz> <1463070560-1979-6-git-send-email-jack@suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <1463070560-1979-6-git-send-email-jack@suse.cz> Sender: linux-fsdevel-owner@vger.kernel.org List-ID: On Thu, May 12, 2016 at 06:29:18PM +0200, Jan Kara wrote: > Currently DAX page fault locking is racy. > > CPU0 (write fault) CPU1 (read fault) > > __dax_fault() __dax_fault() > get_block(inode, block, &bh, 0) -> not mapped > get_block(inode, block, &bh, 0) > -> not mapped > if (!buffer_mapped(&bh)) > if (vmf->flags & FAULT_FLAG_WRITE) > get_block(inode, block, &bh, 1) -> allocates blocks > if (page) -> no > if (!buffer_mapped(&bh)) > if (vmf->flags & FAULT_FLAG_WRITE) { > } else { > dax_load_hole(); > } > dax_insert_mapping() > > And we are in a situation where we fail in dax_radix_entry() with -EIO. > > Another problem with the current DAX page fault locking is that there is > no race-free way to clear dirty tag in the radix tree. We can always > end up with clean radix tree and dirty data in CPU cache. > > We fix the first problem by introducing locking of exceptional radix > tree entries in DAX mappings acting very similarly to page lock and thus > synchronizing properly faults against the same mapping index. The same > lock can later be used to avoid races when clearing radix tree dirty > tag. > > Reviewed-by: NeilBrown > Reviewed-by: Ross Zwisler > Signed-off-by: Jan Kara > --- <> > @@ -897,13 +1166,10 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address, > * the write to insert a dirty entry. > */ > if (write) { > - error = dax_radix_entry(mapping, pgoff, dax.sector, > - true, true); > - if (error) { > - dax_pmd_dbg(&bh, address, > - "PMD radix insertion failed"); > - goto fallback; > - } > + /* > + * We should insert radix-tree entry and dirty it here. > + * For now this is broken... > + */ With this change the 'error' variable in __dax_pmd_fault() is now unused, resulting in a compiler warning. fs/dax.c: In function ‘__dax_pmd_fault’: fs/dax.c:1019:6: warning: unused variable ‘error’ [-Wunused-variable] int error, result = 0; ^