* DAX: __dax_fault race question
@ 2016-02-08 11:23 Dmitry Monakhov
  2016-02-08 13:53 ` [PATCH 1/2] dax: rename dax_radix_entry to dax_radix_entry_insert Dmitry Monakhov
  2016-02-11 18:43 ` DAX: __dax_fault race question Ross Zwisler
  0 siblings, 2 replies; 5+ messages in thread

From: Dmitry Monakhov @ 2016-02-08 11:23 UTC (permalink / raw)
To: Linux Memory Management; +Cc: willy, ross.zwisler

Hi,

I was trying to understand the locking rules for DAX and noticed a
suspicious case in dax_fault.

In __dax_fault() we try to replace a normal page with a DAX entry.
The __dax_fault() steps look roughly as follows:

1) page = find_get_page(..)
2) lock_page_or_retry(page)
3) get_block
4) delete_from_page_cache(page)
5) unlock_page(page)
6) dax_insert_mapping(inode, &bh, vma, vmf)
...

But what protects us from another task taking a new page fault after
(4) but before (6)? AFAIU this case is not prohibited. Let's see what
happens when a writer task and a reader task fault inside a file hole:

task_1 (writer)                      task_2 (reader)
__dax_fault(write)
 ->lock_page_or_retry
 ->delete_from_page_cache()
                                     __dax_fault(read)
                                      ->dax_load_hole
                                       ->find_or_create_page()
                                        ->new page in mapping->radix_tree
 ->dax_insert_mapping
  ->dax_radix_entry -> collision: return -EIO

Before the dax/fsync patch set this race resulted in a silent DAX/page
duality (which likely leads to data incoherence or data corruption).
Luckily, the race now results in a collision on insertion into the
radix tree and returns -EIO. At first glance a test case looks very
simple to write, but I cannot reproduce the race in my environment.

IMHO it is reasonable to pass the locked page to dax_insert_mapping()
and let dax_radix_entry() do an atomic page/DAX-entry replacement,
similar to replace_page_cache_page(). Am I right?
* [PATCH 1/2] dax: rename dax_radix_entry to dax_radix_entry_insert
  2016-02-08 11:23 DAX: __dax_fault race question Dmitry Monakhov
@ 2016-02-08 13:53 ` Dmitry Monakhov
  2016-02-08 13:53 ` [PATCH 2/2] dax: fix race dax_fault write vs read Dmitry Monakhov
  2016-02-11 17:42 ` [PATCH 1/2] dax: rename dax_radix_entry to dax_radix_entry_insert Ross Zwisler
  1 sibling, 2 replies; 5+ messages in thread

From: Dmitry Monakhov @ 2016-02-08 13:53 UTC (permalink / raw)
To: linux-mm; +Cc: willy, ross.zwisler, Dmitry Monakhov

- dax_radix_entry_insert is a more appropriate name for that function
- Add a lockless helper, __dax_radix_entry_insert; it will be used by
  the second patch

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/dax.c | 39 +++++++++++++++++++++++----------------
 1 file changed, 23 insertions(+), 16 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index fc2e314..89bb1f8 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -349,7 +349,7 @@ static int copy_user_bh(struct page *to, struct inode *inode,
 #define NO_SECTOR -1
 #define DAX_PMD_INDEX(page_index) (page_index & (PMD_MASK >> PAGE_CACHE_SHIFT))
 
-static int dax_radix_entry(struct address_space *mapping, pgoff_t index,
+static int __dax_radix_entry_insert(struct address_space *mapping, pgoff_t index,
 		sector_t sector, bool pmd_entry, bool dirty)
 {
 	struct radix_tree_root *page_tree = &mapping->page_tree;
@@ -358,10 +358,6 @@ static int dax_radix_entry(struct address_space *mapping, pgoff_t index,
 	void *entry;
 
 	WARN_ON_ONCE(pmd_entry && !dirty);
-	if (dirty)
-		__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
-
-	spin_lock_irq(&mapping->tree_lock);
 
 	entry = radix_tree_lookup(page_tree, pmd_index);
 	if (entry && RADIX_DAX_TYPE(entry) == RADIX_DAX_PMD) {
@@ -374,8 +370,7 @@ static int dax_radix_entry(struct address_space *mapping, pgoff_t index,
 		type = RADIX_DAX_TYPE(entry);
 		if (WARN_ON_ONCE(type != RADIX_DAX_PTE &&
 				 type != RADIX_DAX_PMD)) {
-			error = -EIO;
-			goto unlock;
+			return -EIO;
 		}
 
 		if (!pmd_entry || type == RADIX_DAX_PMD)
@@ -402,19 +397,31 @@ static int dax_radix_entry(struct address_space *mapping, pgoff_t index,
 		 * pte_same() check will fail, eventually causing page fault
 		 * to be retried by the CPU.
 		 */
-		goto unlock;
+		return 0;
 	}
 	error = radix_tree_insert(page_tree, index,
 			RADIX_DAX_ENTRY(sector, pmd_entry));
 	if (error)
-		goto unlock;
+		return error;
 
 	mapping->nrexceptional++;
- dirty:
+dirty:
 	if (dirty)
 		radix_tree_tag_set(page_tree, index, PAGECACHE_TAG_DIRTY);
- unlock:
+	return error;
+}
+
+static int dax_radix_entry_insert(struct address_space *mapping, pgoff_t index,
+		sector_t sector, bool pmd_entry, bool dirty)
+{
+	int error;
+
+	if (dirty)
+		__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
+
+	spin_lock_irq(&mapping->tree_lock);
+	error =__dax_radix_entry_insert(mapping, index, sector, pmd_entry, dirty);
 	spin_unlock_irq(&mapping->tree_lock);
 	return error;
 }
@@ -579,8 +586,8 @@ static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh,
 	}
 	dax_unmap_atomic(bdev, &dax);
 
-	error = dax_radix_entry(mapping, vmf->pgoff, dax.sector, false,
-			vmf->flags & FAULT_FLAG_WRITE);
+	error = dax_radix_entry_insert(mapping, vmf->pgoff, dax.sector, false,
+			vmf->flags & FAULT_FLAG_WRITE, vmf->page);
 	if (error)
 		goto out;
 
@@ -984,7 +991,7 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
 	 * the write to insert a dirty entry.
 	 */
 	if (write) {
-		error = dax_radix_entry(mapping, pgoff, dax.sector,
+		error = dax_radix_entry_insert(mapping, pgoff, dax.sector,
 				true, true);
 		if (error) {
 			dax_pmd_dbg(&bh, address,
@@ -1057,14 +1064,14 @@ int dax_pfn_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	struct file *file = vma->vm_file;
 
 	/*
-	 * We pass NO_SECTOR to dax_radix_entry() because we expect that a
+	 * We pass NO_SECTOR to dax_radix_entry_insert() because we expect that a
 	 * RADIX_DAX_PTE entry already exists in the radix tree from a
 	 * previous call to __dax_fault().  We just want to look up that PTE
 	 * entry using vmf->pgoff and make sure the dirty tag is set.  This
 	 * saves us from having to make a call to get_block() here to look
 	 * up the sector.
 	 */
-	dax_radix_entry(file->f_mapping, vmf->pgoff, NO_SECTOR, false, true);
+	dax_radix_entry_insert(file->f_mapping, vmf->pgoff, NO_SECTOR, false, true);
 	return VM_FAULT_NOPAGE;
 }
 EXPORT_SYMBOL_GPL(dax_pfn_mkwrite);
-- 
1.8.3.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
* [PATCH 2/2] dax: fix race dax_fault write vs read
  2016-02-08 13:53 ` [PATCH 1/2] dax: rename dax_radix_entry to dax_radix_entry_insert Dmitry Monakhov
@ 2016-02-08 13:53 ` Dmitry Monakhov
  2016-02-11 17:42 ` [PATCH 1/2] dax: rename dax_radix_entry to dax_radix_entry_insert Ross Zwisler
  1 sibling, 0 replies; 5+ messages in thread

From: Dmitry Monakhov @ 2016-02-08 13:53 UTC (permalink / raw)
To: linux-mm; +Cc: willy, ross.zwisler, Dmitry Monakhov

Two tasks, a writer and a reader, fault inside a file hole:

task_1 (writer)                      task_2 (reader)
__dax_fault(write)
 ->lock_page_or_retry
 ->delete_from_page_cache()
                                     __dax_fault(read)
                                      ->dax_load_hole
                                       ->find_or_create_page()
                                        ->new page in mapping->radix_tree
 ->dax_insert_mapping
  ->dax_radix_entry => collision

Move the radix_tree update into dax_radix_entry_replace(), where page
deletion and DAX entry insertion are both protected by ->tree_lock.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/dax.c | 37 +++++++++++++++++++++++++++++++------
 1 file changed, 31 insertions(+), 6 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 89bb1f8..0294fc9 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -424,6 +424,31 @@ static int dax_radix_entry_insert(struct address_space *mapping, pgoff_t index,
 	error =__dax_radix_entry_insert(mapping, index, sector, pmd_entry, dirty);
 	spin_unlock_irq(&mapping->tree_lock);
 	return error;
+
+}
+
+static int dax_radix_entry_replace(struct address_space *mapping, pgoff_t index,
+		sector_t sector, bool pmd_entry, bool dirty,
+		struct page *old_page)
+{
+	int error;
+
+	BUG_ON(old_page && !PageLocked(old_page));
+	if (dirty)
+		__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
+
+	spin_lock_irq(&mapping->tree_lock);
+	if (old_page)
+		__delete_from_page_cache(old_page, NULL);
+	error =__dax_radix_entry_insert(mapping, index, sector, pmd_entry, dirty);
+	spin_unlock_irq(&mapping->tree_lock);
+	if (old_page) {
+		if (mapping->a_ops->freepage)
+			mapping->a_ops->freepage(old_page);
+		page_cache_release(old_page);
+	}
+	return error;
 }
 
 static int dax_writeback_one(struct block_device *bdev,
@@ -586,7 +611,7 @@ static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh,
 	}
 	dax_unmap_atomic(bdev, &dax);
 
-	error = dax_radix_entry_insert(mapping, vmf->pgoff, dax.sector, false,
+	error = dax_radix_entry_replace(mapping, vmf->pgoff, dax.sector, false,
 			vmf->flags & FAULT_FLAG_WRITE, vmf->page);
 	if (error)
 		goto out;
@@ -711,14 +736,16 @@ int __dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 
 	page = find_lock_page(mapping, vmf->pgoff);
 	if (page) {
+		vmf->page = page;
 		unmap_mapping_range(mapping, vmf->pgoff << PAGE_SHIFT,
 				    PAGE_CACHE_SIZE, 0);
-		delete_from_page_cache(page);
+	}
+	error = dax_insert_mapping(inode, &bh, vma, vmf);
+	if (page) {
 		unlock_page(page);
 		page_cache_release(page);
-		page = NULL;
+		vmf->page = page = NULL;
 	}
-
 	/*
 	 * If we successfully insert the new mapping over an unwritten extent,
 	 * we need to ensure we convert the unwritten extent.  If there is an
@@ -729,14 +756,12 @@ int __dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 	 * indicate what the callback should do via the uptodate variable, same
 	 * as for normal BH based IO completions.
 	 */
-	error = dax_insert_mapping(inode, &bh, vma, vmf);
 	if (buffer_unwritten(&bh)) {
 		if (complete_unwritten)
 			complete_unwritten(&bh, !error);
 		else
 			WARN_ON_ONCE(!(vmf->flags & FAULT_FLAG_WRITE));
 	}
-
 out:
 	if (error == -ENOMEM)
 		return VM_FAULT_OOM | major;
-- 
1.8.3.1
* Re: [PATCH 1/2] dax: rename dax_radix_entry to dax_radix_entry_insert
  2016-02-08 13:53 ` [PATCH 1/2] dax: rename dax_radix_entry to dax_radix_entry_insert Dmitry Monakhov
  2016-02-08 13:53 ` [PATCH 2/2] dax: fix race dax_fault write vs read Dmitry Monakhov
@ 2016-02-11 17:42 ` Ross Zwisler
  1 sibling, 0 replies; 5+ messages in thread

From: Ross Zwisler @ 2016-02-11 17:42 UTC (permalink / raw)
To: Dmitry Monakhov; +Cc: linux-mm, willy, ross.zwisler

On Mon, Feb 08, 2016 at 05:53:17PM +0400, Dmitry Monakhov wrote:
> - dax_radix_entry_insert is more appropriate name for that function

I think I may have actually had it named that at some point. :)  I
changed it because it doesn't always insert an entry - in the read
case, for example, we insert a clean entry, and then on the following
dax_pfn_mkwrite() we call back in and mark it as dirty.

> - Add lockless helper __dax_radix_entry_insert, it will be used by second patch
>
> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> ---
>  fs/dax.c | 39 +++++++++++++++++++++++----------------
>  1 file changed, 23 insertions(+), 16 deletions(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index fc2e314..89bb1f8 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c

<>

> @@ -579,8 +586,8 @@ static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh,
>  	}
>  	dax_unmap_atomic(bdev, &dax);
>
> -	error = dax_radix_entry(mapping, vmf->pgoff, dax.sector, false,
> -			vmf->flags & FAULT_FLAG_WRITE);
> +	error = dax_radix_entry_insert(mapping, vmf->pgoff, dax.sector, false,
> +			vmf->flags & FAULT_FLAG_WRITE, vmf->page);

This doesn't compile:

fs/dax.c: In function 'dax_insert_mapping':
fs/dax.c:589:10: error: too many arguments to function 'dax_radix_entry_insert'
   error = dax_radix_entry_insert(mapping, vmf->pgoff, dax.sector, false,
           ^
fs/dax.c:415:12: note: declared here
 static int dax_radix_entry_insert(struct address_space *mapping, pgoff_t index,
            ^
scripts/Makefile.build:258: recipe for target 'fs/dax.o' failed
* Re: DAX: __dax_fault race question
  2016-02-08 11:23 DAX: __dax_fault race question Dmitry Monakhov
  2016-02-08 13:53 ` [PATCH 1/2] dax: rename dax_radix_entry to dax_radix_entry_insert Dmitry Monakhov
@ 2016-02-11 18:43 ` Ross Zwisler
  1 sibling, 0 replies; 5+ messages in thread

From: Ross Zwisler @ 2016-02-11 18:43 UTC (permalink / raw)
To: Dmitry Monakhov; +Cc: Linux Memory Management, willy, ross.zwisler

On Mon, Feb 08, 2016 at 02:23:49PM +0300, Dmitry Monakhov wrote:
> Hi,
>
> I was trying to understand the locking rules for DAX and noticed a
> suspicious case in dax_fault.
>
> In __dax_fault() we try to replace a normal page with a DAX entry.
> The __dax_fault() steps look roughly as follows:
>
> 1) page = find_get_page(..)
> 2) lock_page_or_retry(page)
> 3) get_block
> 4) delete_from_page_cache(page)
> 5) unlock_page(page)
> 6) dax_insert_mapping(inode, &bh, vma, vmf)
> ...
>
> But what protects us from another task taking a new page fault after
> (4) but before (6)? AFAIU this case is not prohibited. Let's see what
> happens when a writer task and a reader task fault inside a file hole:
>
> task_1 (writer)                      task_2 (reader)
> __dax_fault(write)
>  ->lock_page_or_retry
>  ->delete_from_page_cache()
>                                      __dax_fault(read)
>                                       ->dax_load_hole
>                                        ->find_or_create_page()
>                                         ->new page in mapping->radix_tree
>  ->dax_insert_mapping
>   ->dax_radix_entry -> collision: return -EIO
>
> Before the dax/fsync patch set this race resulted in a silent DAX/page
> duality (which likely leads to data incoherence or data corruption).
> Luckily, the race now results in a collision on insertion into the
> radix tree and returns -EIO. At first glance a test case looks very
> simple to write, but I cannot reproduce the race in my environment.
>
> IMHO it is reasonable to pass the locked page to dax_insert_mapping()
> and let dax_radix_entry() do an atomic page/DAX-entry replacement,
> similar to replace_page_cache_page(). Am I right?

We are trying to come up with a general locking scheme that will solve
this race as well as others:

https://lkml.org/lkml/2016/2/9/607
end of thread, other threads: [~2016-02-11 18:44 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --

2016-02-08 11:23 DAX: __dax_fault race question Dmitry Monakhov
2016-02-08 13:53 ` [PATCH 1/2] dax: rename dax_radix_entry to dax_radix_entry_insert Dmitry Monakhov
2016-02-08 13:53   ` [PATCH 2/2] dax: fix race dax_fault write vs read Dmitry Monakhov
2016-02-11 17:42   ` [PATCH 1/2] dax: rename dax_radix_entry to dax_radix_entry_insert Ross Zwisler
2016-02-11 18:43 ` DAX: __dax_fault race question Ross Zwisler