* [PATCH] radix-tree: fix radix_tree_iter_retry() for tagged iterators. [not found] <CACT4Y+a99OW7TYeLsuEic19uY2j45DGXL=LowUMq3TywWS3f2Q@mail.gmail.com> @ 2016-07-14 11:19 ` Andrey Ryabinin 2016-07-14 12:21 ` Konstantin Khlebnikov 2016-07-14 22:25 ` Ross Zwisler 0 siblings, 2 replies; 11+ messages in thread From: Andrey Ryabinin @ 2016-07-14 11:19 UTC (permalink / raw) To: Andrew Morton Cc: Jan Kara, ross.zwisler, Kirill A. Shutemov, linux-mm, Greg Thelen, Suleiman Souhlal, syzkaller, Kostya Serebryany, Alexander Potapenko, Sasha Levin, linux-kernel, Andrey Ryabinin, Konstantin Khlebnikov, Matthew Wilcox, Hugh Dickins, stable radix_tree_iter_retry() resets slot to NULL, but it doesn't reset tags. Then NULL slot and non-zero iter.tags passed to radix_tree_next_slot() leading to crash: RIP: [< inline >] radix_tree_next_slot include/linux/radix-tree.h:473 [<ffffffff816951a4>] find_get_pages_tag+0x334/0x930 mm/filemap.c:1452 .... Call Trace: [<ffffffff816cd91a>] pagevec_lookup_tag+0x3a/0x80 mm/swap.c:960 [<ffffffff81ab4231>] mpage_prepare_extent_to_map+0x321/0xa90 fs/ext4/inode.c:2516 [<ffffffff81ac883e>] ext4_writepages+0x10be/0x2b20 fs/ext4/inode.c:2736 [<ffffffff816c99c7>] do_writepages+0x97/0x100 mm/page-writeback.c:2364 [<ffffffff8169bee8>] __filemap_fdatawrite_range+0x248/0x2e0 mm/filemap.c:300 [<ffffffff8169c371>] filemap_write_and_wait_range+0x121/0x1b0 mm/filemap.c:490 [<ffffffff81aa584d>] ext4_sync_file+0x34d/0xdb0 fs/ext4/fsync.c:115 [<ffffffff818b667a>] vfs_fsync_range+0x10a/0x250 fs/sync.c:195 [< inline >] vfs_fsync fs/sync.c:209 [<ffffffff818b6832>] do_fsync+0x42/0x70 fs/sync.c:219 [< inline >] SYSC_fdatasync fs/sync.c:232 [<ffffffff818b6f89>] SyS_fdatasync+0x19/0x20 fs/sync.c:230 [<ffffffff86a94e00>] entry_SYSCALL_64_fastpath+0x23/0xc1 arch/x86/entry/entry_64.S:207 We must reset iterator's tags to bail out from radix_tree_next_slot() and go to the slow-path in radix_tree_next_chunk(). Fixes: 46437f9a554f ("radix-tree: fix race in gang lookup") Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> --- include/linux/radix-tree.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h index cb4b7e8..eca6f62 100644 --- a/include/linux/radix-tree.h +++ b/include/linux/radix-tree.h @@ -407,6 +407,7 @@ static inline __must_check void **radix_tree_iter_retry(struct radix_tree_iter *iter) { iter->next_index = iter->index; + iter->tags = 0; return NULL; } -- 2.7.3 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] radix-tree: fix radix_tree_iter_retry() for tagged iterators. 2016-07-14 11:19 ` [PATCH] radix-tree: fix radix_tree_iter_retry() for tagged iterators Andrey Ryabinin @ 2016-07-14 12:21 ` Konstantin Khlebnikov 2016-07-14 22:25 ` Ross Zwisler 1 sibling, 0 replies; 11+ messages in thread From: Konstantin Khlebnikov @ 2016-07-14 12:21 UTC (permalink / raw) To: Andrey Ryabinin Cc: Andrew Morton, Jan Kara, ross.zwisler, Kirill A. Shutemov, linux-mm@kvack.org, Greg Thelen, Suleiman Souhlal, syzkaller, Kostya Serebryany, Alexander Potapenko, Sasha Levin, Linux Kernel Mailing List, Matthew Wilcox, Hugh Dickins, Stable ACK Originally retry could happen only at index 0 when first indirect node installed: in this case tags holds only 1 bit. Seems like now this happends at any index. On Thu, Jul 14, 2016 at 2:19 PM, Andrey Ryabinin <aryabinin@virtuozzo.com> wrote: > radix_tree_iter_retry() resets slot to NULL, but it doesn't reset tags. > Then NULL slot and non-zero iter.tags passed to radix_tree_next_slot() > leading to crash: > > RIP: [< inline >] radix_tree_next_slot include/linux/radix-tree.h:473 > [<ffffffff816951a4>] find_get_pages_tag+0x334/0x930 mm/filemap.c:1452 > .... > Call Trace: > [<ffffffff816cd91a>] pagevec_lookup_tag+0x3a/0x80 mm/swap.c:960 > [<ffffffff81ab4231>] mpage_prepare_extent_to_map+0x321/0xa90 fs/ext4/inode.c:2516 > [<ffffffff81ac883e>] ext4_writepages+0x10be/0x2b20 fs/ext4/inode.c:2736 > [<ffffffff816c99c7>] do_writepages+0x97/0x100 mm/page-writeback.c:2364 > [<ffffffff8169bee8>] __filemap_fdatawrite_range+0x248/0x2e0 mm/filemap.c:300 > [<ffffffff8169c371>] filemap_write_and_wait_range+0x121/0x1b0 mm/filemap.c:490 > [<ffffffff81aa584d>] ext4_sync_file+0x34d/0xdb0 fs/ext4/fsync.c:115 > [<ffffffff818b667a>] vfs_fsync_range+0x10a/0x250 fs/sync.c:195 > [< inline >] vfs_fsync fs/sync.c:209 > [<ffffffff818b6832>] do_fsync+0x42/0x70 fs/sync.c:219 > [< inline >] SYSC_fdatasync fs/sync.c:232 > [<ffffffff818b6f89>] SyS_fdatasync+0x19/0x20 fs/sync.c:230 > [<ffffffff86a94e00>] entry_SYSCALL_64_fastpath+0x23/0xc1 arch/x86/entry/entry_64.S:207 > > We must reset iterator's tags to bail out from radix_tree_next_slot() and > go to the slow-path in radix_tree_next_chunk(). > > Fixes: 46437f9a554f ("radix-tree: fix race in gang lookup") > Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> > Reported-by: Dmitry Vyukov <dvyukov@google.com> > Cc: Konstantin Khlebnikov <koct9i@gmail.com> > Cc: Matthew Wilcox <willy@linux.intel.com> > Cc: Hugh Dickins <hughd@google.com> > Cc: <stable@vger.kernel.org> > --- > include/linux/radix-tree.h | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h > index cb4b7e8..eca6f62 100644 > --- a/include/linux/radix-tree.h > +++ b/include/linux/radix-tree.h > @@ -407,6 +407,7 @@ static inline __must_check > void **radix_tree_iter_retry(struct radix_tree_iter *iter) > { > iter->next_index = iter->index; > + iter->tags = 0; > return NULL; > } > > -- > 2.7.3 > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] radix-tree: fix radix_tree_iter_retry() for tagged iterators. 2016-07-14 11:19 ` [PATCH] radix-tree: fix radix_tree_iter_retry() for tagged iterators Andrey Ryabinin 2016-07-14 12:21 ` Konstantin Khlebnikov @ 2016-07-14 22:25 ` Ross Zwisler 2016-07-15 8:52 ` Andrey Ryabinin 1 sibling, 1 reply; 11+ messages in thread From: Ross Zwisler @ 2016-07-14 22:25 UTC (permalink / raw) To: Andrey Ryabinin Cc: Andrew Morton, Jan Kara, ross.zwisler, Kirill A. Shutemov, linux-mm, Greg Thelen, Suleiman Souhlal, syzkaller, Kostya Serebryany, Alexander Potapenko, Sasha Levin, linux-kernel, Konstantin Khlebnikov, Matthew Wilcox, Hugh Dickins, stable On Thu, Jul 14, 2016 at 02:19:56PM +0300, Andrey Ryabinin wrote: > radix_tree_iter_retry() resets slot to NULL, but it doesn't reset tags. > Then NULL slot and non-zero iter.tags passed to radix_tree_next_slot() > leading to crash: > > RIP: [< inline >] radix_tree_next_slot include/linux/radix-tree.h:473 > [<ffffffff816951a4>] find_get_pages_tag+0x334/0x930 mm/filemap.c:1452 > .... > Call Trace: > [<ffffffff816cd91a>] pagevec_lookup_tag+0x3a/0x80 mm/swap.c:960 > [<ffffffff81ab4231>] mpage_prepare_extent_to_map+0x321/0xa90 fs/ext4/inode.c:2516 > [<ffffffff81ac883e>] ext4_writepages+0x10be/0x2b20 fs/ext4/inode.c:2736 > [<ffffffff816c99c7>] do_writepages+0x97/0x100 mm/page-writeback.c:2364 > [<ffffffff8169bee8>] __filemap_fdatawrite_range+0x248/0x2e0 mm/filemap.c:300 > [<ffffffff8169c371>] filemap_write_and_wait_range+0x121/0x1b0 mm/filemap.c:490 > [<ffffffff81aa584d>] ext4_sync_file+0x34d/0xdb0 fs/ext4/fsync.c:115 > [<ffffffff818b667a>] vfs_fsync_range+0x10a/0x250 fs/sync.c:195 > [< inline >] vfs_fsync fs/sync.c:209 > [<ffffffff818b6832>] do_fsync+0x42/0x70 fs/sync.c:219 > [< inline >] SYSC_fdatasync fs/sync.c:232 > [<ffffffff818b6f89>] SyS_fdatasync+0x19/0x20 fs/sync.c:230 > [<ffffffff86a94e00>] entry_SYSCALL_64_fastpath+0x23/0xc1 arch/x86/entry/entry_64.S:207 > > We must reset iterator's tags to bail out from radix_tree_next_slot() and > go to the slow-path in radix_tree_next_chunk(). This analysis doesn't make sense to me. In find_get_pages_tag(), when we call radix_tree_iter_retry(), this sets the local 'slot' variable to NULL, then does a 'continue'. This will hop to the next iteration of the radix_tree_for_each_tagged() loop, which will very check the exit condition of the for() loop: #define radix_tree_for_each_tagged(slot, root, iter, start, tag) \ for (slot = radix_tree_iter_init(iter, start) ; \ slot || (slot = radix_tree_next_chunk(root, iter, \ RADIX_TREE_ITER_TAGGED | tag)) ; \ slot = radix_tree_next_slot(slot, iter, \ RADIX_TREE_ITER_TAGGED)) So, we'll run the slot || (slot = radix_tree_next_chunk(root, iter, \ RADIX_TREE_ITER_TAGGED | tag)) ; \ bit first. 'slot' is NULL, so we'll set it via radix_tree_next_chunk(). At this point radix_tree_next_slot() hasn't been called. radix_tree_next_chunk() will set up the iter->index, iter->next_index and iter->tags before it returns. The next iteration of the loop in find_get_pages_tag() will use the non-NULL slot provided by radix_tree_next_chunk(), and only after that iteration will we call radix_tree_next_slot() again. By then iter->tags should be up to date. Do you have a test setup that reliably fails without this code but passes when you zero out iter->tags? I've been looking at this as well, but haven't been able to get a reliable reproducer in my test setup. > Fixes: 46437f9a554f ("radix-tree: fix race in gang lookup") > Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> > Reported-by: Dmitry Vyukov <dvyukov@google.com> > Cc: Konstantin Khlebnikov <koct9i@gmail.com> > Cc: Matthew Wilcox <willy@linux.intel.com> > Cc: Hugh Dickins <hughd@google.com> > Cc: <stable@vger.kernel.org> > --- > include/linux/radix-tree.h | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h > index cb4b7e8..eca6f62 100644 > --- a/include/linux/radix-tree.h > +++ b/include/linux/radix-tree.h > @@ -407,6 +407,7 @@ static inline __must_check > void **radix_tree_iter_retry(struct radix_tree_iter *iter) > { > iter->next_index = iter->index; > + iter->tags = 0; > return NULL; > } > > -- > 2.7.3 > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] radix-tree: fix radix_tree_iter_retry() for tagged iterators. 2016-07-14 22:25 ` Ross Zwisler @ 2016-07-15 8:52 ` Andrey Ryabinin 2016-07-15 19:00 ` Ross Zwisler 0 siblings, 1 reply; 11+ messages in thread From: Andrey Ryabinin @ 2016-07-15 8:52 UTC (permalink / raw) To: Ross Zwisler Cc: Andrew Morton, Jan Kara, Kirill A. Shutemov, linux-mm, Greg Thelen, Suleiman Souhlal, syzkaller, Kostya Serebryany, Alexander Potapenko, Sasha Levin, linux-kernel, Konstantin Khlebnikov, Matthew Wilcox, Hugh Dickins, stable On 07/15/2016 01:25 AM, Ross Zwisler wrote: > On Thu, Jul 14, 2016 at 02:19:56PM +0300, Andrey Ryabinin wrote: >> radix_tree_iter_retry() resets slot to NULL, but it doesn't reset tags. >> Then NULL slot and non-zero iter.tags passed to radix_tree_next_slot() >> leading to crash: >> >> RIP: [< inline >] radix_tree_next_slot include/linux/radix-tree.h:473 >> [<ffffffff816951a4>] find_get_pages_tag+0x334/0x930 mm/filemap.c:1452 >> .... >> Call Trace: >> [<ffffffff816cd91a>] pagevec_lookup_tag+0x3a/0x80 mm/swap.c:960 >> [<ffffffff81ab4231>] mpage_prepare_extent_to_map+0x321/0xa90 fs/ext4/inode.c:2516 >> [<ffffffff81ac883e>] ext4_writepages+0x10be/0x2b20 fs/ext4/inode.c:2736 >> [<ffffffff816c99c7>] do_writepages+0x97/0x100 mm/page-writeback.c:2364 >> [<ffffffff8169bee8>] __filemap_fdatawrite_range+0x248/0x2e0 mm/filemap.c:300 >> [<ffffffff8169c371>] filemap_write_and_wait_range+0x121/0x1b0 mm/filemap.c:490 >> [<ffffffff81aa584d>] ext4_sync_file+0x34d/0xdb0 fs/ext4/fsync.c:115 >> [<ffffffff818b667a>] vfs_fsync_range+0x10a/0x250 fs/sync.c:195 >> [< inline >] vfs_fsync fs/sync.c:209 >> [<ffffffff818b6832>] do_fsync+0x42/0x70 fs/sync.c:219 >> [< inline >] SYSC_fdatasync fs/sync.c:232 >> [<ffffffff818b6f89>] SyS_fdatasync+0x19/0x20 fs/sync.c:230 >> [<ffffffff86a94e00>] entry_SYSCALL_64_fastpath+0x23/0xc1 arch/x86/entry/entry_64.S:207 >> >> We must reset iterator's tags to bail out from radix_tree_next_slot() and >> go to the slow-path in radix_tree_next_chunk(). > > This analysis doesn't make sense to me. In find_get_pages_tag(), when we call > radix_tree_iter_retry(), this sets the local 'slot' variable to NULL, then > does a 'continue'. This will hop to the next iteration of the > radix_tree_for_each_tagged() loop, which will very check the exit condition of > the for() loop: > > #define radix_tree_for_each_tagged(slot, root, iter, start, tag) \ > for (slot = radix_tree_iter_init(iter, start) ; \ > slot || (slot = radix_tree_next_chunk(root, iter, \ > RADIX_TREE_ITER_TAGGED | tag)) ; \ > slot = radix_tree_next_slot(slot, iter, \ > RADIX_TREE_ITER_TAGGED)) > > So, we'll run the > slot || (slot = radix_tree_next_chunk(root, iter, \ > RADIX_TREE_ITER_TAGGED | tag)) ; \ > > bit first. This is not the way how the for() loop works. slot = radix_tree_next_slot() executed first and only after that goes the condition statement. > 'slot' is NULL, so we'll set it via radix_tree_next_chunk(). At > this point radix_tree_next_slot() hasn't been called. > > radix_tree_next_chunk() will set up the iter->index, iter->next_index and > iter->tags before it returns. The next iteration of the loop in > find_get_pages_tag() will use the non-NULL slot provided by > radix_tree_next_chunk(), and only after that iteration will we call > radix_tree_next_slot() again. By then iter->tags should be up to date. > > Do you have a test setup that reliably fails without this code but passes when > you zero out iter->tags? > Yup, I run Dmitry's reproducer in a parallel loop: $ while true; do ./a.out & done Usually it takes just couple minutes maximum. > I've been looking at this as well, but haven't been able to get a reliable > reproducer in my test setup. > >> Fixes: 46437f9a554f ("radix-tree: fix race in gang lookup") >> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> >> Reported-by: Dmitry Vyukov <dvyukov@google.com> >> Cc: Konstantin Khlebnikov <koct9i@gmail.com> >> Cc: Matthew Wilcox <willy@linux.intel.com> >> Cc: Hugh Dickins <hughd@google.com> >> Cc: <stable@vger.kernel.org> >> --- >> include/linux/radix-tree.h | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h >> index cb4b7e8..eca6f62 100644 >> --- a/include/linux/radix-tree.h >> +++ b/include/linux/radix-tree.h >> @@ -407,6 +407,7 @@ static inline __must_check >> void **radix_tree_iter_retry(struct radix_tree_iter *iter) >> { >> iter->next_index = iter->index; >> + iter->tags = 0; >> return NULL; >> } >> >> -- >> 2.7.3 >> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] radix-tree: fix radix_tree_iter_retry() for tagged iterators. 2016-07-15 8:52 ` Andrey Ryabinin @ 2016-07-15 19:00 ` Ross Zwisler 2016-07-15 20:57 ` Andrew Morton 2016-07-16 13:45 ` Konstantin Khlebnikov 0 siblings, 2 replies; 11+ messages in thread From: Ross Zwisler @ 2016-07-15 19:00 UTC (permalink / raw) To: Andrey Ryabinin Cc: Ross Zwisler, Andrew Morton, Jan Kara, Kirill A. Shutemov, linux-mm, Greg Thelen, Suleiman Souhlal, syzkaller, Kostya Serebryany, Alexander Potapenko, Sasha Levin, linux-kernel, Konstantin Khlebnikov, Matthew Wilcox, Hugh Dickins, stable On Fri, Jul 15, 2016 at 11:52:58AM +0300, Andrey Ryabinin wrote: > On 07/15/2016 01:25 AM, Ross Zwisler wrote: > > On Thu, Jul 14, 2016 at 02:19:56PM +0300, Andrey Ryabinin wrote: > >> radix_tree_iter_retry() resets slot to NULL, but it doesn't reset tags. > >> Then NULL slot and non-zero iter.tags passed to radix_tree_next_slot() > >> leading to crash: > >> > >> RIP: [< inline >] radix_tree_next_slot include/linux/radix-tree.h:473 > >> [<ffffffff816951a4>] find_get_pages_tag+0x334/0x930 mm/filemap.c:1452 > >> .... > >> Call Trace: > >> [<ffffffff816cd91a>] pagevec_lookup_tag+0x3a/0x80 mm/swap.c:960 > >> [<ffffffff81ab4231>] mpage_prepare_extent_to_map+0x321/0xa90 fs/ext4/inode.c:2516 > >> [<ffffffff81ac883e>] ext4_writepages+0x10be/0x2b20 fs/ext4/inode.c:2736 > >> [<ffffffff816c99c7>] do_writepages+0x97/0x100 mm/page-writeback.c:2364 > >> [<ffffffff8169bee8>] __filemap_fdatawrite_range+0x248/0x2e0 mm/filemap.c:300 > >> [<ffffffff8169c371>] filemap_write_and_wait_range+0x121/0x1b0 mm/filemap.c:490 > >> [<ffffffff81aa584d>] ext4_sync_file+0x34d/0xdb0 fs/ext4/fsync.c:115 > >> [<ffffffff818b667a>] vfs_fsync_range+0x10a/0x250 fs/sync.c:195 > >> [< inline >] vfs_fsync fs/sync.c:209 > >> [<ffffffff818b6832>] do_fsync+0x42/0x70 fs/sync.c:219 > >> [< inline >] SYSC_fdatasync fs/sync.c:232 > >> [<ffffffff818b6f89>] SyS_fdatasync+0x19/0x20 fs/sync.c:230 > >> [<ffffffff86a94e00>] entry_SYSCALL_64_fastpath+0x23/0xc1 arch/x86/entry/entry_64.S:207 > >> > >> We must reset iterator's tags to bail out from radix_tree_next_slot() and > >> go to the slow-path in radix_tree_next_chunk(). > > > > This analysis doesn't make sense to me. In find_get_pages_tag(), when we call > > radix_tree_iter_retry(), this sets the local 'slot' variable to NULL, then > > does a 'continue'. This will hop to the next iteration of the > > radix_tree_for_each_tagged() loop, which will very check the exit condition of > > the for() loop: > > > > #define radix_tree_for_each_tagged(slot, root, iter, start, tag) \ > > for (slot = radix_tree_iter_init(iter, start) ; \ > > slot || (slot = radix_tree_next_chunk(root, iter, \ > > RADIX_TREE_ITER_TAGGED | tag)) ; \ > > slot = radix_tree_next_slot(slot, iter, \ > > RADIX_TREE_ITER_TAGGED)) > > > > So, we'll run the > > slot || (slot = radix_tree_next_chunk(root, iter, \ > > RADIX_TREE_ITER_TAGGED | tag)) ; \ > > > > bit first. > > This is not the way how the for() loop works. slot = radix_tree_next_slot() executed first > and only after that goes the condition statement. Right...*sigh*... Thanks for the sanity check. :) > > 'slot' is NULL, so we'll set it via radix_tree_next_chunk(). At > > this point radix_tree_next_slot() hasn't been called. > > > > radix_tree_next_chunk() will set up the iter->index, iter->next_index and > > iter->tags before it returns. The next iteration of the loop in > > find_get_pages_tag() will use the non-NULL slot provided by > > radix_tree_next_chunk(), and only after that iteration will we call > > radix_tree_next_slot() again. By then iter->tags should be up to date. > > > > Do you have a test setup that reliably fails without this code but passes when > > you zero out iter->tags? > > > > > Yup, I run Dmitry's reproducer in a parallel loop: > $ while true; do ./a.out & done > > Usually it takes just couple minutes maximum. Cool - I was able to get this to work on my system as well by upping the thread count. In looking at this more, I agree that your patch fixes this particular bug, but I think that ultimately we might want something more general. IIUC, the real issue is that we shouldn't be running through radix_tree_next_slot() with a NULL 'slot' parameter. In the end I think it's fine to zero out iter->tags in radix_tree_iter_retry(), but really we want to guarantee that we just bail out of radix_tree_next_slot() if we have a NULL 'slot'. I've run this patch in my test setup, and it fixes the reproducer provided by Dmitry. I've also run xfstests against it with out any failures. --- 8< --- >From 533beefac12f61f467aeb72e2d2c46685247b9bc Mon Sep 17 00:00:00 2001 From: Ross Zwisler <ross.zwisler@linux.intel.com> Date: Fri, 15 Jul 2016 12:46:38 -0600 Subject: [PATCH] radix-tree: 'slot' can be NULL in radix_tree_next_slot() There are four cases I can see where we could end up with a NULL 'slot' in radix_tree_next_slot() (there might be more): 1) radix_tree_iter_retry() via a non-tagged iteration like radix_tree_for_each_slot(). In this case we currently aren't seeing a bug because radix_tree_iter_retry() sets iter->next_index = iter->index; which means that in in the else case in radix_tree_next_slot(), 'count' is zero, so we skip over the while() loop and effectively just return NULL without ever dereferencing 'slot'. 2) radix_tree_iter_retry() via tagged iteration like radix_tree_for_each_tagged(). With the current code this case is unhandled and we have seen it result in a kernel crash when we dereference 'slot'. 3) radix_tree_iter_next() via via a non-tagged iteration like radix_tree_for_each_slot(). This currently happens in shmem_tag_pins() and shmem_partial_swap_usage(). I think that this case is currently unhandled. Unlike with radix_tree_iter_retry() case (#1 above) we can't rely on 'count' in the else case of radix_tree_next_slot() to be zero, so I think it's possible we'll end up executing code in the while() loop in radix_tree_next_slot() that assumes 'slot' is valid. I haven't actually seen this crash on a test setup, but I don't think the current code is safe. 4) radix_tree_iter_next() via tagged iteration like radix_tree_for_each_tagged(). This happens in shmem_wait_for_pins(). radix_tree_iter_next() zeros out iter->tags, so we end up exiting radix_tree_next_slot() here: if (flags & RADIX_TREE_ITER_TAGGED) { void *canon = slot; iter->tags >>= 1; if (unlikely(!iter->tags)) return NULL; Really we want to guarantee that we just bail out of radix_tree_next_slot() if we have a NULL 'slot'. This is a more explicit way of handling all the 4 above cases. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> --- include/linux/radix-tree.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h index cb4b7e8..840308d 100644 --- a/include/linux/radix-tree.h +++ b/include/linux/radix-tree.h @@ -463,6 +463,9 @@ static inline struct radix_tree_node *entry_to_node(void *ptr) static __always_inline void ** radix_tree_next_slot(void **slot, struct radix_tree_iter *iter, unsigned flags) { + if (unlikely(!slot)) + return NULL; + if (flags & RADIX_TREE_ITER_TAGGED) { void *canon = slot; -- 2.9.0 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] radix-tree: fix radix_tree_iter_retry() for tagged iterators. 2016-07-15 19:00 ` Ross Zwisler @ 2016-07-15 20:57 ` Andrew Morton 2016-07-18 23:09 ` Ross Zwisler 2016-07-16 13:45 ` Konstantin Khlebnikov 1 sibling, 1 reply; 11+ messages in thread From: Andrew Morton @ 2016-07-15 20:57 UTC (permalink / raw) To: Ross Zwisler Cc: Andrey Ryabinin, Jan Kara, Kirill A. Shutemov, linux-mm, Greg Thelen, Suleiman Souhlal, syzkaller, Kostya Serebryany, Alexander Potapenko, Sasha Levin, linux-kernel, Konstantin Khlebnikov, Matthew Wilcox, Hugh Dickins, stable On Fri, 15 Jul 2016 13:00:40 -0600 Ross Zwisler <ross.zwisler@linux.intel.com> wrote: > > ... > > In looking at this more, I agree that your patch fixes this particular bug, > but I think that ultimately we might want something more general. > > ... > > --- a/include/linux/radix-tree.h > +++ b/include/linux/radix-tree.h > @@ -463,6 +463,9 @@ static inline struct radix_tree_node *entry_to_node(void *ptr) > static __always_inline void ** > radix_tree_next_slot(void **slot, struct radix_tree_iter *iter, unsigned flags) > { > + if (unlikely(!slot)) > + return NULL; > + > if (flags & RADIX_TREE_ITER_TAGGED) { > void *canon = slot; I'll hang onto Andrey's radix-tree-fix-radix_tree_iter_retry-for-tagged-iterators.patch for now, plan to send it in to Linus Wednesdayish. If we can get the above settled down prior to that then I shall swap over. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] radix-tree: fix radix_tree_iter_retry() for tagged iterators. 2016-07-15 20:57 ` Andrew Morton @ 2016-07-18 23:09 ` Ross Zwisler 0 siblings, 0 replies; 11+ messages in thread From: Ross Zwisler @ 2016-07-18 23:09 UTC (permalink / raw) To: Andrew Morton Cc: Ross Zwisler, Andrey Ryabinin, Jan Kara, Kirill A. Shutemov, linux-mm, Greg Thelen, Suleiman Souhlal, syzkaller, Kostya Serebryany, Alexander Potapenko, Sasha Levin, linux-kernel, Konstantin Khlebnikov, Matthew Wilcox, Hugh Dickins, stable On Fri, Jul 15, 2016 at 01:57:33PM -0700, Andrew Morton wrote: > On Fri, 15 Jul 2016 13:00:40 -0600 Ross Zwisler <ross.zwisler@linux.intel.com> wrote: > > > > > ... > > > > In looking at this more, I agree that your patch fixes this particular bug, > > but I think that ultimately we might want something more general. > > > > ... > > > > --- a/include/linux/radix-tree.h > > +++ b/include/linux/radix-tree.h > > @@ -463,6 +463,9 @@ static inline struct radix_tree_node *entry_to_node(void *ptr) > > static __always_inline void ** > > radix_tree_next_slot(void **slot, struct radix_tree_iter *iter, unsigned flags) > > { > > + if (unlikely(!slot)) > > + return NULL; > > + > > if (flags & RADIX_TREE_ITER_TAGGED) { > > void *canon = slot; > > I'll hang onto Andrey's > radix-tree-fix-radix_tree_iter_retry-for-tagged-iterators.patch for > now, plan to send it in to Linus Wednesdayish. If we can get the above > settled down prior to that then I shall swap over. Sure, that works for me. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] radix-tree: fix radix_tree_iter_retry() for tagged iterators. 2016-07-15 19:00 ` Ross Zwisler 2016-07-15 20:57 ` Andrew Morton @ 2016-07-16 13:45 ` Konstantin Khlebnikov 2016-07-17 17:57 ` Konstantin Khlebnikov 2016-07-19 4:11 ` Ross Zwisler 1 sibling, 2 replies; 11+ messages in thread From: Konstantin Khlebnikov @ 2016-07-16 13:45 UTC (permalink / raw) To: Ross Zwisler Cc: Andrey Ryabinin, Andrew Morton, Jan Kara, Kirill A. Shutemov, linux-mm@kvack.org, Greg Thelen, Suleiman Souhlal, syzkaller, Kostya Serebryany, Alexander Potapenko, Sasha Levin, Linux Kernel Mailing List, Matthew Wilcox, Hugh Dickins, Stable On Fri, Jul 15, 2016 at 10:00 PM, Ross Zwisler <ross.zwisler@linux.intel.com> wrote: > On Fri, Jul 15, 2016 at 11:52:58AM +0300, Andrey Ryabinin wrote: >> On 07/15/2016 01:25 AM, Ross Zwisler wrote: >> > On Thu, Jul 14, 2016 at 02:19:56PM +0300, Andrey Ryabinin wrote: >> >> radix_tree_iter_retry() resets slot to NULL, but it doesn't reset tags. >> >> Then NULL slot and non-zero iter.tags passed to radix_tree_next_slot() >> >> leading to crash: >> >> >> >> RIP: [< inline >] radix_tree_next_slot include/linux/radix-tree.h:473 >> >> [<ffffffff816951a4>] find_get_pages_tag+0x334/0x930 mm/filemap.c:1452 >> >> .... >> >> Call Trace: >> >> [<ffffffff816cd91a>] pagevec_lookup_tag+0x3a/0x80 mm/swap.c:960 >> >> [<ffffffff81ab4231>] mpage_prepare_extent_to_map+0x321/0xa90 fs/ext4/inode.c:2516 >> >> [<ffffffff81ac883e>] ext4_writepages+0x10be/0x2b20 fs/ext4/inode.c:2736 >> >> [<ffffffff816c99c7>] do_writepages+0x97/0x100 mm/page-writeback.c:2364 >> >> [<ffffffff8169bee8>] __filemap_fdatawrite_range+0x248/0x2e0 mm/filemap.c:300 >> >> [<ffffffff8169c371>] filemap_write_and_wait_range+0x121/0x1b0 mm/filemap.c:490 >> >> [<ffffffff81aa584d>] ext4_sync_file+0x34d/0xdb0 fs/ext4/fsync.c:115 >> >> [<ffffffff818b667a>] vfs_fsync_range+0x10a/0x250 fs/sync.c:195 >> >> [< inline >] vfs_fsync fs/sync.c:209 >> >> [<ffffffff818b6832>] do_fsync+0x42/0x70 fs/sync.c:219 >> >> [< inline >] SYSC_fdatasync fs/sync.c:232 >> >> [<ffffffff818b6f89>] SyS_fdatasync+0x19/0x20 fs/sync.c:230 >> >> [<ffffffff86a94e00>] entry_SYSCALL_64_fastpath+0x23/0xc1 arch/x86/entry/entry_64.S:207 >> >> >> >> We must reset iterator's tags to bail out from radix_tree_next_slot() and >> >> go to the slow-path in radix_tree_next_chunk(). >> > >> > This analysis doesn't make sense to me. In find_get_pages_tag(), when we call >> > radix_tree_iter_retry(), this sets the local 'slot' variable to NULL, then >> > does a 'continue'. This will hop to the next iteration of the >> > radix_tree_for_each_tagged() loop, which will very check the exit condition of >> > the for() loop: >> > >> > #define radix_tree_for_each_tagged(slot, root, iter, start, tag) \ >> > for (slot = radix_tree_iter_init(iter, start) ; \ >> > slot || (slot = radix_tree_next_chunk(root, iter, \ >> > RADIX_TREE_ITER_TAGGED | tag)) ; \ >> > slot = radix_tree_next_slot(slot, iter, \ >> > RADIX_TREE_ITER_TAGGED)) >> > >> > So, we'll run the >> > slot || (slot = radix_tree_next_chunk(root, iter, \ >> > RADIX_TREE_ITER_TAGGED | tag)) ; \ >> > >> > bit first. >> >> This is not the way how the for() loop works. slot = radix_tree_next_slot() executed first >> and only after that goes the condition statement. > > Right...*sigh*... Thanks for the sanity check. :) > >> > 'slot' is NULL, so we'll set it via radix_tree_next_chunk(). At >> > this point radix_tree_next_slot() hasn't been called. >> > >> > radix_tree_next_chunk() will set up the iter->index, iter->next_index and >> > iter->tags before it returns. The next iteration of the loop in >> > find_get_pages_tag() will use the non-NULL slot provided by >> > radix_tree_next_chunk(), and only after that iteration will we call >> > radix_tree_next_slot() again. By then iter->tags should be up to date. >> > >> > Do you have a test setup that reliably fails without this code but passes when >> > you zero out iter->tags? >> > >> >> >> Yup, I run Dmitry's reproducer in a parallel loop: >> $ while true; do ./a.out & done >> >> Usually it takes just couple minutes maximum. > > Cool - I was able to get this to work on my system as well by upping the > thread count. > > In looking at this more, I agree that your patch fixes this particular bug, > but I think that ultimately we might want something more general. > > IIUC, the real issue is that we shouldn't be running through > radix_tree_next_slot() with a NULL 'slot' parameter. In the end I think it's > fine to zero out iter->tags in radix_tree_iter_retry(), but really we want to > guarantee that we just bail out of radix_tree_next_slot() if we have a NULL > 'slot'. > > I've run this patch in my test setup, and it fixes the reproducer provided by > Dmitry. I've also run xfstests against it with out any failures. > > --- 8< --- > From 533beefac12f61f467aeb72e2d2c46685247b9bc Mon Sep 17 00:00:00 2001 > From: Ross Zwisler <ross.zwisler@linux.intel.com> > Date: Fri, 15 Jul 2016 12:46:38 -0600 > Subject: [PATCH] radix-tree: 'slot' can be NULL in radix_tree_next_slot() > > There are four cases I can see where we could end up with a NULL 'slot' in > radix_tree_next_slot() (there might be more): > > 1) radix_tree_iter_retry() via a non-tagged iteration like > radix_tree_for_each_slot(). In this case we currently aren't seeing a bug > because radix_tree_iter_retry() sets > > iter->next_index = iter->index; > > which means that in in the else case in radix_tree_next_slot(), 'count' is > zero, so we skip over the while() loop and effectively just return NULL > without ever dereferencing 'slot'. > > 2) radix_tree_iter_retry() via tagged iteration like > radix_tree_for_each_tagged(). With the current code this case is > unhandled and we have seen it result in a kernel crash when we dereference > 'slot'. > > 3) radix_tree_iter_next() via via a non-tagged iteration like > radix_tree_for_each_slot(). This currently happens in shmem_tag_pins() > and shmem_partial_swap_usage(). > > I think that this case is currently unhandled. Unlike with > radix_tree_iter_retry() case (#1 above) we can't rely on 'count' in the else > case of radix_tree_next_slot() to be zero, so I think it's possible we'll end > up executing code in the while() loop in radix_tree_next_slot() that assumes > 'slot' is valid. > > I haven't actually seen this crash on a test setup, but I don't think the > current code is safe. This is becase distance between ->index and ->next_index now could be more that one? We could fix that by adding "iter->index = iter->next_index - 1;" into radix_tree_iter_next() right after updating next_index and tweak multi-order itreration logic if it depends on that. I'd like to keep radix_tree_next_slot() as small as possible because this is supposed to be a fast-path. > > 4) radix_tree_iter_next() via tagged iteration like > radix_tree_for_each_tagged(). This happens in shmem_wait_for_pins(). > > radix_tree_iter_next() zeros out iter->tags, so we end up exiting > radix_tree_next_slot() here: > > if (flags & RADIX_TREE_ITER_TAGGED) { > void *canon = slot; > > iter->tags >>= 1; > if (unlikely(!iter->tags)) > return NULL; > > Really we want to guarantee that we just bail out of > radix_tree_next_slot() if we have a NULL 'slot'. This is a more explicit > way of handling all the 4 above cases. > > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> > --- > include/linux/radix-tree.h | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h > index cb4b7e8..840308d 100644 > --- a/include/linux/radix-tree.h > +++ b/include/linux/radix-tree.h > @@ -463,6 +463,9 @@ static inline struct radix_tree_node *entry_to_node(void *ptr) > static __always_inline void ** > radix_tree_next_slot(void **slot, struct radix_tree_iter *iter, unsigned flags) > { > + if (unlikely(!slot)) > + return NULL; > + > if (flags & RADIX_TREE_ITER_TAGGED) { > void *canon = slot; > > -- > 2.9.0 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] radix-tree: fix radix_tree_iter_retry() for tagged iterators. 2016-07-16 13:45 ` Konstantin Khlebnikov @ 2016-07-17 17:57 ` Konstantin Khlebnikov 2016-07-19 4:11 ` Ross Zwisler 1 sibling, 0 replies; 11+ messages in thread From: Konstantin Khlebnikov @ 2016-07-17 17:57 UTC (permalink / raw) To: Ross Zwisler Cc: Andrey Ryabinin, Andrew Morton, Jan Kara, Kirill A. Shutemov, linux-mm@kvack.org, Greg Thelen, Suleiman Souhlal, syzkaller, Kostya Serebryany, Alexander Potapenko, Sasha Levin, Linux Kernel Mailing List, Matthew Wilcox, Hugh Dickins, Stable On Sat, Jul 16, 2016 at 4:45 PM, Konstantin Khlebnikov <koct9i@gmail.com> wrote: > On Fri, Jul 15, 2016 at 10:00 PM, Ross Zwisler > <ross.zwisler@linux.intel.com> wrote: >> On Fri, Jul 15, 2016 at 11:52:58AM +0300, Andrey Ryabinin wrote: >>> On 07/15/2016 01:25 AM, Ross Zwisler wrote: >>> > On Thu, Jul 14, 2016 at 02:19:56PM +0300, Andrey Ryabinin wrote: >>> >> radix_tree_iter_retry() resets slot to NULL, but it doesn't reset tags. >>> >> Then NULL slot and non-zero iter.tags passed to radix_tree_next_slot() >>> >> leading to crash: >>> >> >>> >> RIP: [< inline >] radix_tree_next_slot include/linux/radix-tree.h:473 >>> >> [<ffffffff816951a4>] find_get_pages_tag+0x334/0x930 mm/filemap.c:1452 >>> >> .... >>> >> Call Trace: >>> >> [<ffffffff816cd91a>] pagevec_lookup_tag+0x3a/0x80 mm/swap.c:960 >>> >> [<ffffffff81ab4231>] mpage_prepare_extent_to_map+0x321/0xa90 fs/ext4/inode.c:2516 >>> >> [<ffffffff81ac883e>] ext4_writepages+0x10be/0x2b20 fs/ext4/inode.c:2736 >>> >> [<ffffffff816c99c7>] do_writepages+0x97/0x100 mm/page-writeback.c:2364 >>> >> [<ffffffff8169bee8>] __filemap_fdatawrite_range+0x248/0x2e0 mm/filemap.c:300 >>> >> [<ffffffff8169c371>] filemap_write_and_wait_range+0x121/0x1b0 mm/filemap.c:490 >>> >> [<ffffffff81aa584d>] ext4_sync_file+0x34d/0xdb0 fs/ext4/fsync.c:115 >>> >> [<ffffffff818b667a>] vfs_fsync_range+0x10a/0x250 fs/sync.c:195 >>> >> [< inline >] vfs_fsync fs/sync.c:209 >>> >> [<ffffffff818b6832>] do_fsync+0x42/0x70 fs/sync.c:219 >>> >> [< inline >] SYSC_fdatasync fs/sync.c:232 >>> >> [<ffffffff818b6f89>] SyS_fdatasync+0x19/0x20 fs/sync.c:230 >>> >> [<ffffffff86a94e00>] entry_SYSCALL_64_fastpath+0x23/0xc1 arch/x86/entry/entry_64.S:207 >>> >> >>> >> We must reset iterator's tags to bail out from radix_tree_next_slot() and >>> >> go to the slow-path in radix_tree_next_chunk(). >>> > >>> > This analysis doesn't make sense to me. In find_get_pages_tag(), when we call >>> > radix_tree_iter_retry(), this sets the local 'slot' variable to NULL, then >>> > does a 'continue'. This will hop to the next iteration of the >>> > radix_tree_for_each_tagged() loop, which will very check the exit condition of >>> > the for() loop: >>> > >>> > #define radix_tree_for_each_tagged(slot, root, iter, start, tag) \ >>> > for (slot = radix_tree_iter_init(iter, start) ; \ >>> > slot || (slot = radix_tree_next_chunk(root, iter, \ >>> > RADIX_TREE_ITER_TAGGED | tag)) ; \ >>> > slot = radix_tree_next_slot(slot, iter, \ >>> > RADIX_TREE_ITER_TAGGED)) >>> > >>> > So, we'll run the >>> > slot || (slot = radix_tree_next_chunk(root, iter, \ >>> > RADIX_TREE_ITER_TAGGED | tag)) ; \ >>> > >>> > bit first. >>> >>> This is not the way how the for() loop works. slot = radix_tree_next_slot() executed first >>> and only after that goes the condition statement. >> >> Right...*sigh*... Thanks for the sanity check. :) >> >>> > 'slot' is NULL, so we'll set it via radix_tree_next_chunk(). At >>> > this point radix_tree_next_slot() hasn't been called. >>> > >>> > radix_tree_next_chunk() will set up the iter->index, iter->next_index and >>> > iter->tags before it returns. The next iteration of the loop in >>> > find_get_pages_tag() will use the non-NULL slot provided by >>> > radix_tree_next_chunk(), and only after that iteration will we call >>> > radix_tree_next_slot() again. By then iter->tags should be up to date. >>> > >>> > Do you have a test setup that reliably fails without this code but passes when >>> > you zero out iter->tags? >>> > >>> >>> >>> Yup, I run Dmitry's reproducer in a parallel loop: >>> $ while true; do ./a.out & done >>> >>> Usually it takes just couple minutes maximum. >> >> Cool - I was able to get this to work on my system as well by upping the >> thread count. >> >> In looking at this more, I agree that your patch fixes this particular bug, >> but I think that ultimately we might want something more general. >> >> IIUC, the real issue is that we shouldn't be running through >> radix_tree_next_slot() with a NULL 'slot' parameter. In the end I think it's >> fine to zero out iter->tags in radix_tree_iter_retry(), but really we want to >> guarantee that we just bail out of radix_tree_next_slot() if we have a NULL >> 'slot'. >> >> I've run this patch in my test setup, and it fixes the reproducer provided by >> Dmitry. I've also run xfstests against it with out any failures. >> >> --- 8< --- >> From 533beefac12f61f467aeb72e2d2c46685247b9bc Mon Sep 17 00:00:00 2001 >> From: Ross Zwisler <ross.zwisler@linux.intel.com> >> Date: Fri, 15 Jul 2016 12:46:38 -0600 >> Subject: [PATCH] radix-tree: 'slot' can be NULL in radix_tree_next_slot() >> >> There are four cases I can see where we could end up with a NULL 'slot' in >> radix_tree_next_slot() (there might be more): >> >> 1) radix_tree_iter_retry() via a non-tagged iteration like >> radix_tree_for_each_slot(). In this case we currently aren't seeing a bug >> because radix_tree_iter_retry() sets >> >> iter->next_index = iter->index; >> >> which means that in in the else case in radix_tree_next_slot(), 'count' is >> zero, so we skip over the while() loop and effectively just return NULL >> without ever dereferencing 'slot'. >> >> 2) radix_tree_iter_retry() via tagged iteration like >> radix_tree_for_each_tagged(). With the current code this case is >> unhandled and we have seen it result in a kernel crash when we dereference >> 'slot'. >> >> 3) radix_tree_iter_next() via via a non-tagged iteration like >> radix_tree_for_each_slot(). This currently happens in shmem_tag_pins() >> and shmem_partial_swap_usage(). >> >> I think that this case is currently unhandled. Unlike with >> radix_tree_iter_retry() case (#1 above) we can't rely on 'count' in the else >> case of radix_tree_next_slot() to be zero, so I think it's possible we'll end >> up executing code in the while() loop in radix_tree_next_slot() that assumes >> 'slot' is valid. >> >> I haven't actually seen this crash on a test setup, but I don't think the >> current code is safe. > > This is becase distance between ->index and ->next_index now could be > more that one? > > We could fix that by adding "iter->index = iter->next_index - 1;" into > radix_tree_iter_next() > right after updating next_index and tweak multi-order itreration logic > if it depends on that. > > I'd like to keep radix_tree_next_slot() as small as possible because > this is supposed to be a fast-path. Support of multi-order entries in iterator is ridiculously over-engineered. If radix_tree_next_chunk() finds multi-order entry it must return chunk with size 1, radix_tree_next_slot() should know nothing about that. I'll try to fix that. > >> >> 4) radix_tree_iter_next() via tagged iteration like >> radix_tree_for_each_tagged(). This happens in shmem_wait_for_pins(). >> >> radix_tree_iter_next() zeros out iter->tags, so we end up exiting >> radix_tree_next_slot() here: >> >> if (flags & RADIX_TREE_ITER_TAGGED) { >> void *canon = slot; >> >> iter->tags >>= 1; >> if (unlikely(!iter->tags)) >> return NULL; >> >> Really we want to guarantee that we just bail out of >> radix_tree_next_slot() if we have a NULL 'slot'. This is a more explicit >> way of handling all the 4 above cases. >> >> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> >> --- >> include/linux/radix-tree.h | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h >> index cb4b7e8..840308d 100644 >> --- a/include/linux/radix-tree.h >> +++ b/include/linux/radix-tree.h >> @@ -463,6 +463,9 @@ static inline struct radix_tree_node *entry_to_node(void *ptr) >> static __always_inline void ** >> radix_tree_next_slot(void **slot, struct radix_tree_iter *iter, unsigned flags) >> { >> + if (unlikely(!slot)) >> + return NULL; >> + >> if (flags & RADIX_TREE_ITER_TAGGED) { >> void *canon = slot; >> >> -- >> 2.9.0 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] radix-tree: fix radix_tree_iter_retry() for tagged iterators. 2016-07-16 13:45 ` Konstantin Khlebnikov 2016-07-17 17:57 ` Konstantin Khlebnikov @ 2016-07-19 4:11 ` Ross Zwisler 2016-07-19 9:07 ` Konstantin Khlebnikov 1 sibling, 1 reply; 11+ messages in thread From: Ross Zwisler @ 2016-07-19 4:11 UTC (permalink / raw) To: Konstantin Khlebnikov Cc: Ross Zwisler, Andrey Ryabinin, Andrew Morton, Jan Kara, Kirill A. Shutemov, linux-mm@kvack.org, Greg Thelen, Suleiman Souhlal, syzkaller, Kostya Serebryany, Alexander Potapenko, Sasha Levin, Linux Kernel Mailing List, Matthew Wilcox, Hugh Dickins, Stable On Sat, Jul 16, 2016 at 04:45:31PM +0300, Konstantin Khlebnikov wrote: > On Fri, Jul 15, 2016 at 10:00 PM, Ross Zwisler > <ross.zwisler@linux.intel.com> wrote: <> > > 3) radix_tree_iter_next() via via a non-tagged iteration like > > radix_tree_for_each_slot(). This currently happens in shmem_tag_pins() > > and shmem_partial_swap_usage(). > > > > I think that this case is currently unhandled. Unlike with > > radix_tree_iter_retry() case (#1 above) we can't rely on 'count' in the else > > case of radix_tree_next_slot() to be zero, so I think it's possible we'll end > > up executing code in the while() loop in radix_tree_next_slot() that assumes > > 'slot' is valid. > > > > I haven't actually seen this crash on a test setup, but I don't think the > > current code is safe. > > This is becase distance between ->index and ->next_index now could be > more that one? > > We could fix that by adding "iter->index = iter->next_index - 1;" into > radix_tree_iter_next() > right after updating next_index and tweak multi-order itreration logic > if it depends on that. > > I'd like to keep radix_tree_next_slot() as small as possible because > this is supposed to be a fast-path. I think it'll be exactly one? iter->next_index = __radix_tree_iter_add(iter, 1); So iter->index will be X, iter->next_index will be X+1, accounting for the iterator's shift. So, basically, whatever your height is, you'll be set up to process one more entry, I think. This means that radix_tree_chunk_size() will return 1. I guess with the current logic this is safe: static __always_inline void ** radix_tree_next_slot(void **slot, struct radix_tree_iter *iter, unsigned flags) { ... } else { long count = radix_tree_chunk_size(iter); void *canon = slot; while (--count > 0) { /* code that assumes 'slot' is non-NULL */ So 'count' will be 1, the prefix decrement will make it 0, so we won't execute the code in the while() loop. So maybe all the cases are covered after all. It seems like we need some unit tests in tools/testing/radix-tree around this - I'll try and find time to add them this week. I just feel like this isn't very organized. We have a lot of code in radix_tree_next_slot() that assumes that 'slot' is non-NULL, but we don't check it anywhere. We know it *can* be NULL, but we just happen to have things set up so that none of the code that uses 'slot' is executed. I personally feel like a quick check for slot==NULL at the beginning of the function is the simplest way to keep ourselves safe, and it doesn't seem like we'd be adding that much overhead. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] radix-tree: fix radix_tree_iter_retry() for tagged iterators. 2016-07-19 4:11 ` Ross Zwisler @ 2016-07-19 9:07 ` Konstantin Khlebnikov 0 siblings, 0 replies; 11+ messages in thread From: Konstantin Khlebnikov @ 2016-07-19 9:07 UTC (permalink / raw) To: Ross Zwisler Cc: Andrey Ryabinin, Andrew Morton, Jan Kara, Kirill A. Shutemov, linux-mm@kvack.org, Greg Thelen, Suleiman Souhlal, syzkaller, Kostya Serebryany, Alexander Potapenko, Sasha Levin, Linux Kernel Mailing List, Matthew Wilcox, Hugh Dickins, Stable On Tue, Jul 19, 2016 at 7:11 AM, Ross Zwisler <ross.zwisler@linux.intel.com> wrote: > On Sat, Jul 16, 2016 at 04:45:31PM +0300, Konstantin Khlebnikov wrote: >> On Fri, Jul 15, 2016 at 10:00 PM, Ross Zwisler >> <ross.zwisler@linux.intel.com> wrote: > <> >> > 3) radix_tree_iter_next() via via a non-tagged iteration like >> > radix_tree_for_each_slot(). This currently happens in shmem_tag_pins() >> > and shmem_partial_swap_usage(). >> > >> > I think that this case is currently unhandled. Unlike with >> > radix_tree_iter_retry() case (#1 above) we can't rely on 'count' in the else >> > case of radix_tree_next_slot() to be zero, so I think it's possible we'll end >> > up executing code in the while() loop in radix_tree_next_slot() that assumes >> > 'slot' is valid. >> > >> > I haven't actually seen this crash on a test setup, but I don't think the >> > current code is safe. >> >> This is becase distance between ->index and ->next_index now could be >> more that one? >> >> We could fix that by adding "iter->index = iter->next_index - 1;" into >> radix_tree_iter_next() >> right after updating next_index and tweak multi-order itreration logic >> if it depends on that. >> >> I'd like to keep radix_tree_next_slot() as small as possible because >> this is supposed to be a fast-path. > > I think it'll be exactly one? > > iter->next_index = __radix_tree_iter_add(iter, 1); > > So iter->index will be X, iter->next_index will be X+1, accounting for the > iterator's shift. So, basically, whatever your height is, you'll be set up to > process one more entry, I think. > > This means that radix_tree_chunk_size() will return 1. I guess with the > current logic this is safe: > > static __always_inline void ** > radix_tree_next_slot(void **slot, struct radix_tree_iter *iter, unsigned flags) > { > ... > } else { > long count = radix_tree_chunk_size(iter); > void *canon = slot; > > while (--count > 0) { > /* code that assumes 'slot' is non-NULL */ > > So 'count' will be 1, the prefix decrement will make it 0, so we won't execute > the code in the while() loop. So maybe all the cases are covered after all. > > It seems like we need some unit tests in tools/testing/radix-tree around this > - I'll try and find time to add them this week. > > I just feel like this isn't very organized. We have a lot of code in > radix_tree_next_slot() that assumes that 'slot' is non-NULL, but we don't > check it anywhere. We know it *can* be NULL, but we just happen to have > things set up so that none of the code that uses 'slot' is executed. > > I personally feel like a quick check for slot==NULL at the beginning of the > function is the simplest way to keep ourselves safe, and it doesn't seem like > we'd be adding that much overhead. Either fix is fine now. I working on better design for multiorder iterator which moves all that logic from radix_tree_next_slot() into radix_tree_next_chunk(). Most likely I'll change tree structure a little. For example I think sibling entries chould hold offset to head entry and order rather than a pointer to it. Or maybe size: support of non-power-of-2 entries is interesting feature too. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2016-07-19 9:07 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <CACT4Y+a99OW7TYeLsuEic19uY2j45DGXL=LowUMq3TywWS3f2Q@mail.gmail.com>
2016-07-14 11:19 ` [PATCH] radix-tree: fix radix_tree_iter_retry() for tagged iterators Andrey Ryabinin
2016-07-14 12:21 ` Konstantin Khlebnikov
2016-07-14 22:25 ` Ross Zwisler
2016-07-15 8:52 ` Andrey Ryabinin
2016-07-15 19:00 ` Ross Zwisler
2016-07-15 20:57 ` Andrew Morton
2016-07-18 23:09 ` Ross Zwisler
2016-07-16 13:45 ` Konstantin Khlebnikov
2016-07-17 17:57 ` Konstantin Khlebnikov
2016-07-19 4:11 ` Ross Zwisler
2016-07-19 9:07 ` Konstantin Khlebnikov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).