From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-it0-f70.google.com (mail-it0-f70.google.com [209.85.214.70])
    by kanga.kvack.org (Postfix) with ESMTP id EC4DC6B0005
    for ; Fri, 15 Jul 2016 04:52:02 -0400 (EDT)
Received: by mail-it0-f70.google.com with SMTP id d65so28274820ith.1
    for ; Fri, 15 Jul 2016 01:52:02 -0700 (PDT)
Received: from EUR01-DB5-obe.outbound.protection.outlook.com (mail-db5eur01on0114.outbound.protection.outlook.com. [104.47.2.114])
    by mx.google.com with ESMTPS id 190si5913932oib.247.2016.07.15.01.52.01
    for (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128);
    Fri, 15 Jul 2016 01:52:02 -0700 (PDT)
Subject: Re: [PATCH] radix-tree: fix radix_tree_iter_retry() for tagged iterators.
References: <1468495196-10604-1-git-send-email-aryabinin@virtuozzo.com>
    <20160714222527.GA26136@linux.intel.com>
From: Andrey Ryabinin
Message-ID: <5788A46A.70106@virtuozzo.com>
Date: Fri, 15 Jul 2016 11:52:58 +0300
MIME-Version: 1.0
In-Reply-To: <20160714222527.GA26136@linux.intel.com>
Content-Type: text/plain; charset="windows-1252"
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Ross Zwisler
Cc: Andrew Morton, Jan Kara, "Kirill A. Shutemov", linux-mm@kvack.org,
    Greg Thelen, Suleiman Souhlal, syzkaller@googlegroups.com,
    Kostya Serebryany, Alexander Potapenko, Sasha Levin,
    linux-kernel@vger.kernel.org, Konstantin Khlebnikov, Matthew Wilcox,
    Hugh Dickins, stable@vger.kernel.org

On 07/15/2016 01:25 AM, Ross Zwisler wrote:
> On Thu, Jul 14, 2016 at 02:19:56PM +0300, Andrey Ryabinin wrote:
>> radix_tree_iter_retry() resets slot to NULL, but it doesn't reset tags.
>> Then NULL slot and non-zero iter.tags passed to radix_tree_next_slot()
>> leading to crash:
>>
>> RIP: [< inline >] radix_tree_next_slot include/linux/radix-tree.h:473
>>  [] find_get_pages_tag+0x334/0x930 mm/filemap.c:1452
>> ....
>> Call Trace:
>>  [] pagevec_lookup_tag+0x3a/0x80 mm/swap.c:960
>>  [] mpage_prepare_extent_to_map+0x321/0xa90 fs/ext4/inode.c:2516
>>  [] ext4_writepages+0x10be/0x2b20 fs/ext4/inode.c:2736
>>  [] do_writepages+0x97/0x100 mm/page-writeback.c:2364
>>  [] __filemap_fdatawrite_range+0x248/0x2e0 mm/filemap.c:300
>>  [] filemap_write_and_wait_range+0x121/0x1b0 mm/filemap.c:490
>>  [] ext4_sync_file+0x34d/0xdb0 fs/ext4/fsync.c:115
>>  [] vfs_fsync_range+0x10a/0x250 fs/sync.c:195
>>  [< inline >] vfs_fsync fs/sync.c:209
>>  [] do_fsync+0x42/0x70 fs/sync.c:219
>>  [< inline >] SYSC_fdatasync fs/sync.c:232
>>  [] SyS_fdatasync+0x19/0x20 fs/sync.c:230
>>  [] entry_SYSCALL_64_fastpath+0x23/0xc1 arch/x86/entry/entry_64.S:207
>>
>> We must reset iterator's tags to bail out from radix_tree_next_slot() and
>> go to the slow-path in radix_tree_next_chunk().
>
> This analysis doesn't make sense to me. In find_get_pages_tag(), when we call
> radix_tree_iter_retry(), this sets the local 'slot' variable to NULL, then
> does a 'continue'. This will hop to the next iteration of the
> radix_tree_for_each_tagged() loop, which will check the exit condition of
> the for() loop:
>
> #define radix_tree_for_each_tagged(slot, root, iter, start, tag)    \
>     for (slot = radix_tree_iter_init(iter, start) ;                 \
>          slot || (slot = radix_tree_next_chunk(root, iter,          \
>                           RADIX_TREE_ITER_TAGGED | tag)) ;          \
>          slot = radix_tree_next_slot(slot, iter,                    \
>                           RADIX_TREE_ITER_TAGGED))
>
> So, we'll run the
>          slot || (slot = radix_tree_next_chunk(root, iter,          \
>                           RADIX_TREE_ITER_TAGGED | tag)) ;          \
>
> bit first.

That is not how a for() loop works. After 'continue',
slot = radix_tree_next_slot() is executed first, and only then is the
condition evaluated.

> 'slot' is NULL, so we'll set it via radix_tree_next_chunk(). At
> this point radix_tree_next_slot() hasn't been called.
>
> radix_tree_next_chunk() will set up the iter->index, iter->next_index and
> iter->tags before it returns.
> The next iteration of the loop in
> find_get_pages_tag() will use the non-NULL slot provided by
> radix_tree_next_chunk(), and only after that iteration will we call
> radix_tree_next_slot() again. By then iter->tags should be up to date.
>
> Do you have a test setup that reliably fails without this code but passes when
> you zero out iter->tags?
>

Yup, I run Dmitry's reproducer in a parallel loop:

$ while true; do ./a.out & done

It usually takes a couple of minutes at most.

> I've been looking at this as well, but haven't been able to get a reliable
> reproducer in my test setup.
>
>> Fixes: 46437f9a554f ("radix-tree: fix race in gang lookup")
>> Signed-off-by: Andrey Ryabinin
>> Reported-by: Dmitry Vyukov
>> Cc: Konstantin Khlebnikov
>> Cc: Matthew Wilcox
>> Cc: Hugh Dickins
>> Cc:
>> ---
>>  include/linux/radix-tree.h | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
>> index cb4b7e8..eca6f62 100644
>> --- a/include/linux/radix-tree.h
>> +++ b/include/linux/radix-tree.h
>> @@ -407,6 +407,7 @@ static inline __must_check
>>  void **radix_tree_iter_retry(struct radix_tree_iter *iter)
>>  {
>>      iter->next_index = iter->index;
>> +    iter->tags = 0;
>>      return NULL;
>>  }
>>
>> --
>> 2.7.3
>>
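
To make the for() ordering point above concrete, here is a minimal
userspace sketch. It is not kernel code: struct trace, record(),
check_cond(), advance() and run_loop() are hypothetical names invented for
this illustration, with advance() standing in for the third clause of the
macro (the radix_tree_next_slot() call) and check_cond() for the second
clause. It only logs the order in which the clauses run when the body does
a 'continue'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical event log: one character per step, in execution order. */
struct trace {
	char buf[32];
	size_t len;
};

static void record(struct trace *t, char c)
{
	assert(t->len + 1 < sizeof(t->buf));
	t->buf[t->len++] = c;
	t->buf[t->len] = '\0';
}

/* Stand-in for the loop condition (second clause): logs 'C'. */
static int check_cond(struct trace *t, int i, int limit)
{
	record(t, 'C');
	return i < limit;
}

/* Stand-in for the third clause, like radix_tree_next_slot(): logs 'N'. */
static int advance(struct trace *t, int i)
{
	record(t, 'N');
	return i + 1;
}

/* Two-iteration loop whose body ('B') always ends with 'continue'. */
static void run_loop(struct trace *t)
{
	int i;

	t->len = 0;
	t->buf[0] = '\0';
	for (i = 0; check_cond(t, i, 2); i = advance(t, i)) {
		record(t, 'B');
		continue;	/* control goes to advance(), then check_cond() */
	}
}
```

After run_loop(), the log reads "CBNCBNC": every 'B' is followed by 'N'
before the next 'C'. In the macro this corresponds to
radix_tree_next_slot() being called with the NULL slot returned by
radix_tree_iter_retry() and whatever iter->tags still holds, before
radix_tree_next_chunk() gets a chance to run.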