From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lf0-f46.google.com (mail-lf0-f46.google.com [209.85.215.46]) by kanga.kvack.org (Postfix) with ESMTP id 28E0A828F3 for ; Mon, 8 Feb 2016 06:23:58 -0500 (EST) Received: by mail-lf0-f46.google.com with SMTP id l143so93230327lfe.2 for ; Mon, 08 Feb 2016 03:23:58 -0800 (PST) Received: from mail-lb0-x233.google.com (mail-lb0-x233.google.com. [2a00:1450:4010:c04::233]) by mx.google.com with ESMTPS id th8si15852616lbb.174.2016.02.08.03.23.56 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 08 Feb 2016 03:23:56 -0800 (PST) Received: by mail-lb0-x233.google.com with SMTP id bc4so80879466lbc.2 for ; Mon, 08 Feb 2016 03:23:56 -0800 (PST) From: Dmitry Monakhov Subject: DAX: __dax_fault race question Date: Mon, 08 Feb 2016 14:23:49 +0300 Message-ID: <87bn7rwim2.fsf@openvz.org> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha512; protocol="application/pgp-signature" Sender: owner-linux-mm@kvack.org List-ID: To: Linux Memory Management Cc: willy@linux.intel.com, ross.zwisler@linux.intel.com --=-=-= Content-Type: text/plain Content-Transfer-Encoding: quoted-printable Hi, I try to understand locking rules for dax and realized that there is some suspicious case in dax_fault On __dax_fault we try to replace normal page with dax-entry Basically dax_fault steps looks like follows 1) page =3D find_get_page(..) 2) lock_page_or_retry(page) 3) get_block 4) delete_from_page_cache(page) 5) unlock_page(page) 6) dax_insert_mapping(inode, &bh, vma, vmf) ... But what protects us from other taks does new page_fault after (4) but before (6). AFAIU this case is not prohibited Let's see what happens for two read/write tasks does fault inside file-hole task_1(writer) task_2(reader) __dax_fault(write) ->lock_page_or_retry ->delete_from_page_cache() __dax_fault(read) ->dax_load_hole ->find_or_create_page() ->new page in mapping->radix_tree=20=20= =20=20=20=20=20=20=20=20=20=20=20=20=20 ->dax_insert_mapping ->dax_radix_entry->collision: return -EIO Before dax/fsync patch-set this race result in silent dax/page duality(which likely result data incoherence or data corruption), Luckily now this race result in collision on insertion to radix_tree and return -EIO. From=20first glance testcase looks very simple, but I can not reproduce this in my environment.=20 Imho it is reasonable pass locked page to dax_insert_mapping and let dax_radix_entry use atomic page/dax-entry replacement similar to replace_page_cache_page. Am I right? --=-=-= Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAEBCgAGBQJWuHrGAAoJELhyPTmIL6kBij8IAKsU5Qd5BLD4DHX8Sos8teix EGeJv2fp+gaFjbQJmtmG25B7gF1KByFaynL17Rt8H+P6WNcNXx//DPqMyU3/kw1M pWL5iaXnTyz1UR22UZ34HyKhasIjR9MUNrnI0ZZ0e9YfG3iNRU3mQzgIJV3bHOUd KLTe9wfd8tC0GgzI+T5mrEPoL4pEgWOm+Cgl5drDdiWO8XVcgKZysw9MqSPTVAQr Uk25F2ksghl0zSHJHZQ1Ps9n/gpkjcj0G0UaPWRtZchxEPKHYx2cfgAagH95OfhG j5b/cHqD4hBK2B5d03joB4GoUoK/zu4ByhhxQW9Ank0/wZPrnFep3p0s+Aj9MK0= =PMeb -----END PGP SIGNATURE----- --=-=-=-- -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lf0-f48.google.com (mail-lf0-f48.google.com [209.85.215.48]) by kanga.kvack.org (Postfix) with ESMTP id B6521830A0 for ; Mon, 8 Feb 2016 08:53:34 -0500 (EST) Received: by mail-lf0-f48.google.com with SMTP id m1so96952588lfg.0 for ; Mon, 08 Feb 2016 05:53:34 -0800 (PST) Received: from relay.sw.ru (mailhub.sw.ru. [195.214.232.25]) by mx.google.com with ESMTPS id l71si16133481lfi.45.2016.02.08.05.53.31 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 08 Feb 2016 05:53:32 -0800 (PST) From: Dmitry Monakhov Subject: [PATCH 1/2] dax: rename dax_radix_entry to dax_radix_entry_insert Date: Mon, 8 Feb 2016 17:53:17 +0400 Message-Id: <1454939598-16238-1-git-send-email-dmonakhov@openvz.org> In-Reply-To: <87bn7rwim2.fsf@openvz.org> References: <87bn7rwim2.fsf@openvz.org> Sender: owner-linux-mm@kvack.org List-ID: To: linux-mm@kvack.org Cc: willy@linux.intel.com, ross.zwisler@linux.intel.com, Dmitry Monakhov - dax_radix_entry_insert is more appropriate name for that function - Add lockless helper __dax_radix_entry_insert, it will be used by second patch Signed-off-by: Dmitry Monakhov --- fs/dax.c | 39 +++++++++++++++++++++++---------------- 1 file changed, 23 insertions(+), 16 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index fc2e314..89bb1f8 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -349,7 +349,7 @@ static int copy_user_bh(struct page *to, struct inode *inode, #define NO_SECTOR -1 #define DAX_PMD_INDEX(page_index) (page_index & (PMD_MASK >> PAGE_CACHE_SHIFT)) -static int dax_radix_entry(struct address_space *mapping, pgoff_t index, +static int __dax_radix_entry_insert(struct address_space *mapping, pgoff_t index, sector_t sector, bool pmd_entry, bool dirty) { struct radix_tree_root *page_tree = &mapping->page_tree; @@ -358,10 +358,6 @@ static int dax_radix_entry(struct address_space *mapping, pgoff_t index, void *entry; WARN_ON_ONCE(pmd_entry && !dirty); - if (dirty) - __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); - - spin_lock_irq(&mapping->tree_lock); entry = radix_tree_lookup(page_tree, pmd_index); if (entry && RADIX_DAX_TYPE(entry) == RADIX_DAX_PMD) { @@ -374,8 +370,7 @@ static int dax_radix_entry(struct address_space *mapping, pgoff_t index, type = RADIX_DAX_TYPE(entry); if (WARN_ON_ONCE(type != RADIX_DAX_PTE && type != RADIX_DAX_PMD)) { - error = -EIO; - goto unlock; + return -EIO; } if (!pmd_entry || type == RADIX_DAX_PMD) @@ -402,19 +397,31 @@ static int dax_radix_entry(struct address_space *mapping, pgoff_t index, * pte_same() check will fail, eventually causing page fault * to be retried by the CPU. */ - goto unlock; + return 0; } error = radix_tree_insert(page_tree, index, RADIX_DAX_ENTRY(sector, pmd_entry)); if (error) - goto unlock; + return error; mapping->nrexceptional++; - dirty: +dirty: if (dirty) radix_tree_tag_set(page_tree, index, PAGECACHE_TAG_DIRTY); - unlock: + return error; +} + +static int dax_radix_entry_insert(struct address_space *mapping, pgoff_t index, + sector_t sector, bool pmd_entry, bool dirty) +{ + int error; + + if (dirty) + __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); + + spin_lock_irq(&mapping->tree_lock); + error =__dax_radix_entry_insert(mapping, index, sector, pmd_entry, dirty); spin_unlock_irq(&mapping->tree_lock); return error; } @@ -579,8 +586,8 @@ static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh, } dax_unmap_atomic(bdev, &dax); - error = dax_radix_entry(mapping, vmf->pgoff, dax.sector, false, - vmf->flags & FAULT_FLAG_WRITE); + error = dax_radix_entry_insert(mapping, vmf->pgoff, dax.sector, false, + vmf->flags & FAULT_FLAG_WRITE, vmf->page); if (error) goto out; @@ -984,7 +991,7 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address, * the write to insert a dirty entry. */ if (write) { - error = dax_radix_entry(mapping, pgoff, dax.sector, + error = dax_radix_entry_insert(mapping, pgoff, dax.sector, true, true); if (error) { dax_pmd_dbg(&bh, address, @@ -1057,14 +1064,14 @@ int dax_pfn_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf) struct file *file = vma->vm_file; /* - * We pass NO_SECTOR to dax_radix_entry() because we expect that a + * We pass NO_SECTOR to dax_radix_entry_insert() because we expect that a * RADIX_DAX_PTE entry already exists in the radix tree from a * previous call to __dax_fault(). We just want to look up that PTE * entry using vmf->pgoff and make sure the dirty tag is set. This * saves us from having to make a call to get_block() here to look * up the sector. */ - dax_radix_entry(file->f_mapping, vmf->pgoff, NO_SECTOR, false, true); + dax_radix_entry_insert(file->f_mapping, vmf->pgoff, NO_SECTOR, false, true); return VM_FAULT_NOPAGE; } EXPORT_SYMBOL_GPL(dax_pfn_mkwrite); -- 1.8.3.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lb0-f176.google.com (mail-lb0-f176.google.com [209.85.217.176]) by kanga.kvack.org (Postfix) with ESMTP id 78A04830A0 for ; Mon, 8 Feb 2016 08:53:35 -0500 (EST) Received: by mail-lb0-f176.google.com with SMTP id x4so84301174lbm.0 for ; Mon, 08 Feb 2016 05:53:35 -0800 (PST) Received: from relay.sw.ru (mailhub.sw.ru. [195.214.232.25]) by mx.google.com with ESMTPS id u7si16164737lbw.3.2016.02.08.05.53.32 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 08 Feb 2016 05:53:33 -0800 (PST) From: Dmitry Monakhov Subject: [PATCH 2/2] dax: fix race dax_fault write vs read Date: Mon, 8 Feb 2016 17:53:18 +0400 Message-Id: <1454939598-16238-2-git-send-email-dmonakhov@openvz.org> In-Reply-To: <1454939598-16238-1-git-send-email-dmonakhov@openvz.org> References: <87bn7rwim2.fsf@openvz.org> <1454939598-16238-1-git-send-email-dmonakhov@openvz.org> Sender: owner-linux-mm@kvack.org List-ID: To: linux-mm@kvack.org Cc: willy@linux.intel.com, ross.zwisler@linux.intel.com, Dmitry Monakhov Two read/write tasks does fault inside file-hole task_1(writer) task_2(reader) __dax_fault(write) ->lock_page_or_retry ->delete_from_page_cache() __dax_fault(read) ->dax_load_hole ->find_or_create_page() ->new page in mapping->radix_tree ->dax_insert_mapping ->dax_radix_entry => collision Let's move radix_tree update to dax_radix_entry_replace() where page deletion and dax entry insertion will be protected by ->tree_lock Signed-off-by: Dmitry Monakhov --- fs/dax.c | 37 +++++++++++++++++++++++++++++++------ 1 file changed, 31 insertions(+), 6 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 89bb1f8..0294fc9 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -424,6 +424,31 @@ static int dax_radix_entry_insert(struct address_space *mapping, pgoff_t index, error =__dax_radix_entry_insert(mapping, index, sector, pmd_entry, dirty); spin_unlock_irq(&mapping->tree_lock); return error; + +} + +static int dax_radix_entry_replace(struct address_space *mapping, pgoff_t index, + sector_t sector, bool pmd_entry, bool dirty, + struct page* old_page) +{ + int error; + + BUG_ON(old_page && !PageLocked(old_page)); + if (dirty) + __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); + + spin_lock_irq(&mapping->tree_lock); + if (old_page) + __delete_from_page_cache(old_page, NULL); + error =__dax_radix_entry_insert(mapping, index, sector, pmd_entry, dirty); + spin_unlock_irq(&mapping->tree_lock); + if (old_page) { + if (mapping->a_ops->freepage) + mapping->a_ops->freepage(old_page); + page_cache_release(old_page); + } + return error; + } static int dax_writeback_one(struct block_device *bdev, @@ -586,7 +611,7 @@ static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh, } dax_unmap_atomic(bdev, &dax); - error = dax_radix_entry_insert(mapping, vmf->pgoff, dax.sector, false, + error = dax_radix_entry_replace(mapping, vmf->pgoff, dax.sector, false, vmf->flags & FAULT_FLAG_WRITE, vmf->page); if (error) goto out; @@ -711,14 +736,16 @@ int __dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf, page = find_lock_page(mapping, vmf->pgoff); if (page) { + vmf->page = page; unmap_mapping_range(mapping, vmf->pgoff << PAGE_SHIFT, PAGE_CACHE_SIZE, 0); - delete_from_page_cache(page); + } + error = dax_insert_mapping(inode, &bh, vma, vmf); + if (page) { unlock_page(page); page_cache_release(page); - page = NULL; + vmf->page = page = NULL; } - /* * If we successfully insert the new mapping over an unwritten extent, * we need to ensure we convert the unwritten extent. If there is an @@ -729,14 +756,12 @@ int __dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf, * indicate what the callback should do via the uptodate variable, same * as for normal BH based IO completions. */ - error = dax_insert_mapping(inode, &bh, vma, vmf); if (buffer_unwritten(&bh)) { if (complete_unwritten) complete_unwritten(&bh, !error); else WARN_ON_ONCE(!(vmf->flags & FAULT_FLAG_WRITE)); } - out: if (error == -ENOMEM) return VM_FAULT_OOM | major; -- 1.8.3.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f45.google.com (mail-pa0-f45.google.com [209.85.220.45]) by kanga.kvack.org (Postfix) with ESMTP id 0FC6A6B0009 for ; Thu, 11 Feb 2016 12:43:36 -0500 (EST) Received: by mail-pa0-f45.google.com with SMTP id ho8so32231466pac.2 for ; Thu, 11 Feb 2016 09:43:36 -0800 (PST) Received: from mga14.intel.com (mga14.intel.com. [192.55.52.115]) by mx.google.com with ESMTP id sm4si13748298pac.245.2016.02.11.09.43.34 for ; Thu, 11 Feb 2016 09:43:35 -0800 (PST) Date: Thu, 11 Feb 2016 10:42:55 -0700 From: Ross Zwisler Subject: Re: [PATCH 1/2] dax: rename dax_radix_entry to dax_radix_entry_insert Message-ID: <20160211174255.GA11014@linux.intel.com> References: <87bn7rwim2.fsf@openvz.org> <1454939598-16238-1-git-send-email-dmonakhov@openvz.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <1454939598-16238-1-git-send-email-dmonakhov@openvz.org> Sender: owner-linux-mm@kvack.org List-ID: To: Dmitry Monakhov Cc: linux-mm@kvack.org, willy@linux.intel.com, ross.zwisler@linux.intel.com On Mon, Feb 08, 2016 at 05:53:17PM +0400, Dmitry Monakhov wrote: > - dax_radix_entry_insert is more appropriate name for that function I think I may have actually had it named that at some point. :) I changed it because it doesn't always insert an entry - in the read case for example we insert a clean entry, and then on the following dax_pfn_mkwrite() we call back in and mark it as dirty. > - Add lockless helper __dax_radix_entry_insert, it will be used by second patch > > Signed-off-by: Dmitry Monakhov > --- > fs/dax.c | 39 +++++++++++++++++++++++---------------- > 1 file changed, 23 insertions(+), 16 deletions(-) > > diff --git a/fs/dax.c b/fs/dax.c > index fc2e314..89bb1f8 100644 > --- a/fs/dax.c > +++ b/fs/dax.c <> > @@ -579,8 +586,8 @@ static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh, > } > dax_unmap_atomic(bdev, &dax); > > - error = dax_radix_entry(mapping, vmf->pgoff, dax.sector, false, > - vmf->flags & FAULT_FLAG_WRITE); > + error = dax_radix_entry_insert(mapping, vmf->pgoff, dax.sector, false, > + vmf->flags & FAULT_FLAG_WRITE, vmf->page); fs/dax.c: In function a??dax_insert_mappinga??: fs/dax.c:589:10: error: too many arguments to function a??dax_radix_entry_inserta?? error = dax_radix_entry_insert(mapping, vmf->pgoff, dax.sector, false, ^ fs/dax.c:415:12: note: declared here static int dax_radix_entry_insert(struct address_space *mapping, pgoff_t index, ^ scripts/Makefile.build:258: recipe for target 'fs/dax.o' failed -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f180.google.com (mail-pf0-f180.google.com [209.85.192.180]) by kanga.kvack.org (Postfix) with ESMTP id 089B56B0009 for ; Thu, 11 Feb 2016 13:44:03 -0500 (EST) Received: by mail-pf0-f180.google.com with SMTP id x65so33560493pfb.1 for ; Thu, 11 Feb 2016 10:44:03 -0800 (PST) Received: from mga11.intel.com (mga11.intel.com. [192.55.52.93]) by mx.google.com with ESMTP id sp7si14022221pac.230.2016.02.11.10.44.02 for ; Thu, 11 Feb 2016 10:44:02 -0800 (PST) Date: Thu, 11 Feb 2016 11:43:50 -0700 From: Ross Zwisler Subject: Re: DAX: __dax_fault race question Message-ID: <20160211184350.GA27848@linux.intel.com> References: <87bn7rwim2.fsf@openvz.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <87bn7rwim2.fsf@openvz.org> Sender: owner-linux-mm@kvack.org List-ID: To: Dmitry Monakhov Cc: Linux Memory Management , willy@linux.intel.com, ross.zwisler@linux.intel.com On Mon, Feb 08, 2016 at 02:23:49PM +0300, Dmitry Monakhov wrote: > > Hi, > > I try to understand locking rules for dax and realized that there is > some suspicious case in dax_fault > > On __dax_fault we try to replace normal page with dax-entry > Basically dax_fault steps looks like follows > > 1) page = find_get_page(..) > 2) lock_page_or_retry(page) > 3) get_block > 4) delete_from_page_cache(page) > 5) unlock_page(page) > 6) dax_insert_mapping(inode, &bh, vma, vmf) > ... > > But what protects us from other taks does new page_fault after (4) but > before (6). > AFAIU this case is not prohibited > Let's see what happens for two read/write tasks does fault inside file-hole > task_1(writer) task_2(reader) > __dax_fault(write) > ->lock_page_or_retry > ->delete_from_page_cache() __dax_fault(read) > ->dax_load_hole > ->find_or_create_page() > ->new page in mapping->radix_tree > ->dax_insert_mapping > ->dax_radix_entry->collision: return -EIO > > Before dax/fsync patch-set this race result in silent dax/page duality(which > likely result data incoherence or data corruption), Luckily now this > race result in collision on insertion to radix_tree and return -EIO. > From first glance testcase looks very simple, but I can not reproduce > this in my environment. > > Imho it is reasonable pass locked page to dax_insert_mapping and let > dax_radix_entry use atomic page/dax-entry replacement similar to > replace_page_cache_page. Am I right? We are trying to come up with a general locking scheme that will solve this race as well as others. https://lkml.org/lkml/2016/2/9/607 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org