* An incorrect assumption over radix_tree_tag_get()
From: David Howells @ 2010-04-06 16:19 UTC
To: paulmck, npiggin, corbet; +Cc: dhowells, linux-kernel, linux-cachefs

Hi,

I think I've made a bad assumption over my usage of radix_tree_tag_get() in
fs/fscache/page.c.

I've assumed that radix_tree_tag_get() is protected from radix_tree_tag_set()
and radix_tree_tag_clear() by the RCU read lock.  However, now I'm not so
sure.  I think it's only protected against removal of part of the tree.

Can you confirm?

David
* Re: An incorrect assumption over radix_tree_tag_get()
From: Nick Piggin @ 2010-04-06 17:09 UTC
To: David Howells; +Cc: paulmck, corbet, linux-kernel, linux-cachefs

On Tue, Apr 06, 2010 at 05:19:49PM +0100, David Howells wrote:
>
> Hi,
>
> I think I've made a bad assumption over my usage of radix_tree_tag_get() in
> fs/fscache/page.c.
>
> I've assumed that radix_tree_tag_get() is protected from radix_tree_tag_set()
> and radix_tree_tag_clear() by the RCU read lock.  However, now I'm not so
> sure.  I think it's only protected against removal of part of the tree.
>
> Can you confirm?

It is safe. Synchronization requirements for using the radix tree API
are documented.
* Re: An incorrect assumption over radix_tree_tag_get()
From: David Howells @ 2010-04-06 18:52 UTC
To: Nick Piggin; +Cc: dhowells, paulmck, corbet, linux-kernel, linux-cachefs

Nick Piggin <npiggin@suse.de> wrote:

> It is safe. Synchronization requirements for using the radix tree API
> are documented.

I presume you mean the big comment on it in radix-tree.h.

According to that, it is not safe:

 * - any function _modifying_ the tree or tags (inserting or deleting
 *   items, setting or clearing tags) must exclude other modifications, and
 *   exclude any functions reading the tree.

David
* Re: An incorrect assumption over radix_tree_tag_get()
From: David Howells @ 2010-04-06 19:16 UTC
To: Nick Piggin; +Cc: dhowells, paulmck, corbet, linux-kernel, linux-cachefs

David Howells <dhowells@redhat.com> wrote:

> Nick Piggin <npiggin@suse.de> wrote:
>
> > It is safe. Synchronization requirements for using the radix tree API
> > are documented.
>
> I presume you mean the big comment on it in radix-tree.h.
>
> According to that, it is not safe:
>
>  * - any function _modifying_ the tree or tags (inserting or deleting
>  *   items, setting or clearing tags) must exclude other modifications, and
>  *   exclude any functions reading the tree.

Having said that, the next few lines say that it is:

 * The notable exceptions to this rule are the following functions:
 * radix_tree_lookup
 * radix_tree_lookup_slot
 * radix_tree_tag_get
 * radix_tree_gang_lookup
 * radix_tree_gang_lookup_slot
 * radix_tree_gang_lookup_tag
 * radix_tree_gang_lookup_tag_slot
 * radix_tree_tagged

However, I'm not sure I agree that radix_tree_tag_get() belongs in this list.

The bug symptoms are this:

Someone is seeing a bug in which an apparently corrupt radix tree tag chain
is observed in radix_tree_tag_get().  Leastways, the BUG() on line 602 in
radix_tree_tag_get() trips once in a while:

	kernel BUG at /usr/src/linux-2.6-2.6.33/debian/build/source_i386_none/lib/radix-tree.c:602!
	RIP: 0010:[<ffffffff81182040>] radix_tree_tag_get+0xbc/0xe3
	 [<ffffffffa0247b67>] ? __fscache_maybe_release_page+0x42/0x115
	 [<ffffffffa0372e7d>] ? nfs_fscache_release_page+0x66/0x99 [nfs]
	 [<ffffffff810b6dee>] ? invalidate_inode_pages2_range+0x15a/0x262
	 [<ffffffffa035312f>] ? nfs_invalidate_mapping_nolock+0x18/0xb4
	 [<ffffffffa0354097>] ? nfs_revalidate_mapping+0x85/0x99 [nfs]
	 [<ffffffffa0351158>] ? nfs_file_splice_read+0x5b/0x8e [nfs]
	 [<ffffffff811043d3>] ? splice_direct_to_actor+0xbe/0x188
	 [<ffffffff81104a1c>] ? direct_splice_actor+0x0/0x1e
	 [<ffffffff81113274>] ? ep_scan_ready_list+0x132/0x151
	 [<ffffffff811044e7>] ? do_splice_direct+0x4a/0x64
	 [<ffffffff810e8fa8>] ? do_sendfile+0x12d/0x1a8
	 [<ffffffff8106685b>] ? getnstimeofday+0x55/0xaf
	 [<ffffffff810e906c>] ? sys_sendfile64+0x49/0x88
	 [<ffffffff8103145f>] ? sysenter_dispatch+0x7/0x2e

which is this:

		if (!tag_get(node, tag, offset))
			saw_unset_tag = 1;
		if (height == 1) {
			int ret = tag_get(node, tag, offset);

	-->		BUG_ON(ret && saw_unset_tag);
			return !!ret;
		}

In fs/fscache/page.c, __fscache_maybe_release_page() does a
radix_tree_lookup() with just the RCU read lock held, and then calls
radix_tree_tag_get() a couple of times.

In this case, it's the first instance, before we grab the stores_lock
spinlock (which is used to serialise alteration of the radix tree), that is
the problem:

	/* see if the page is actually undergoing storage - if so we can't get
	 * rid of it till the cache has finished with it */
	if (radix_tree_tag_get(&cookie->stores, page->index,
			       FSCACHE_COOKIE_STORING_TAG)) {
		rcu_read_unlock();
		goto page_busy;
	}

Looking at radix_tree_tag_get(), I can see that it carefully uses
rcu_dereference_raw() to protect itself against pointer modification - but
looking at radix_tree_tag_set/clear(), no pointers are modified, no nodes are
replaced.  radix_tree_tag_get()'s attempts to protect itself count for
nothing as set/clear() modify the node directly.

So, what I'm seeing is that the two calls to tag_get() on the same bit
occasionally show a different value, and, looking at the code, I can't see
any reason for the confidence displayed in the documentation that this cannot
happen.

David
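The double read described above can be reproduced deterministically in a small userspace model. The following is only a sketch under stated assumptions, not the kernel code: `struct model_node`, `writer_hook`, `model_writer_set()`, `tag_get_checked()` and `tag_get_once()` are all hypothetical names, the "tree" is a single tag bit, and the concurrent radix_tree_tag_set() is simulated by a callback that runs between the two reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one tag bit for one slot in a height-1 tree. */
struct model_node {
	bool tag;
};

/* Stands in for a writer that runs while the reader is mid-walk. */
typedef void (*writer_hook)(struct model_node *node);

/* Simulated radix_tree_tag_set(): modifies the node in place. */
static void model_writer_set(struct model_node *node)
{
	node->tag = true;
}

/*
 * Mirrors the quoted debug check: the same bit is read twice.
 * Returns -1 where the kernel code would hit BUG_ON().
 */
static int tag_get_checked(struct model_node *node, writer_hook hook)
{
	bool saw_unset_tag = false;
	bool ret;

	assert(node != NULL);
	if (!node->tag)			/* first read of the bit */
		saw_unset_tag = true;
	if (hook)
		hook(node);		/* writer slips in between the reads */
	ret = node->tag;		/* second read of the same bit */
	if (ret && saw_unset_tag)
		return -1;
	return ret;
}

/*
 * Reads the bit exactly once.  The answer may be stale by the time the
 * caller uses it, but it cannot contradict itself.
 */
static int tag_get_once(struct model_node *node, writer_hook hook)
{
	bool ret;

	assert(node != NULL);
	ret = node->tag;		/* single read */
	if (hook)
		hook(node);		/* writer may still run afterwards */
	return ret;
}
```

With the tag initially clear and `model_writer_set` passed as the hook, `tag_get_checked()` reports the inconsistency while `tag_get_once()` simply returns the older value. Whether a stale answer is acceptable depends on the caller; in the fscache case quoted above, the check is repeated later under stores_lock.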
* Re: An incorrect assumption over radix_tree_tag_get()
From: Dave Chinner @ 2010-04-06 23:34 UTC
To: Nick Piggin; +Cc: David Howells, paulmck, corbet, linux-kernel, linux-cachefs

On Wed, Apr 07, 2010 at 03:09:03AM +1000, Nick Piggin wrote:
> On Tue, Apr 06, 2010 at 05:19:49PM +0100, David Howells wrote:
> >
> > Hi,
> >
> > I think I've made a bad assumption over my usage of radix_tree_tag_get() in
> > fs/fscache/page.c.
> >
> > I've assumed that radix_tree_tag_get() is protected from radix_tree_tag_set()
> > and radix_tree_tag_clear() by the RCU read lock.  However, now I'm not so
> > sure.  I think it's only protected against removal of part of the tree.
> >
> > Can you confirm?
>
> It is safe. Synchronization requirements for using the radix tree API
> are documented.

I don't think it is safe - I made modifications to XFS that modified
radix tree tags under a read lock (not RCU), but this resulted in
corrupted tag state as concurrent tag set/clear operations for
different slots propagated through the tree and got mixed up.
Christoph fixed the problem (f1f724e4b523d444c5a598d74505aefa3d6844d2)
by putting all tag modifications under the write lock. I can't see
how doing tag modifications under RCU read locks is any safer than
doing it under a spinning read lock....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
* Re: An incorrect assumption over radix_tree_tag_get()
From: Nick Piggin @ 2010-04-07 7:57 UTC
To: Dave Chinner; +Cc: David Howells, paulmck, corbet, linux-kernel, linux-cachefs

On Wed, Apr 07, 2010 at 09:34:38AM +1000, Dave Chinner wrote:
> On Wed, Apr 07, 2010 at 03:09:03AM +1000, Nick Piggin wrote:
> > On Tue, Apr 06, 2010 at 05:19:49PM +0100, David Howells wrote:
> > >
> > > Hi,
> > >
> > > I think I've made a bad assumption over my usage of radix_tree_tag_get() in
> > > fs/fscache/page.c.
> > >
> > > I've assumed that radix_tree_tag_get() is protected from radix_tree_tag_set()
> > > and radix_tree_tag_clear() by the RCU read lock.  However, now I'm not so
> > > sure.  I think it's only protected against removal of part of the tree.
> > >
> > > Can you confirm?
> >
> > It is safe. Synchronization requirements for using the radix tree API
> > are documented.
>
> I don't think it is safe - I made modifications to XFS that modified
> radix tree tags under a read lock (not RCU), but this resulted in
> corrupted tag state as concurrent tag set/clear operations for
> different slots propagated through the tree and got mixed up.
> Christoph fixed the problem (f1f724e4b523d444c5a598d74505aefa3d6844d2)
> by putting all tag modifications under the write lock. I can't see
> how doing tag modifications under RCU read locks is any safer than
> doing it under a spinning read lock....

No the modifications must all be serialized, but they can run in
parallel with a radix_tree_tag_get().
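The rule stated in this reply (all tag writers serialized against each other, readers allowed to run alongside them) can be illustrated with a userspace analogy. This is a hedged sketch, not the radix tree implementation: `struct slot_tags`, `SLOT_TAGS_INIT`, `slot_tag_set()`, `slot_tag_clear()` and `slot_tag_peek()` are hypothetical names, a C11 `atomic_flag` stands in for the writer-side spinlock, and a single atomic word stands in for the per-node tag bitmap.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SLOT_TAGS_MAX 32u	/* unsigned long has at least 32 bits */

/* Hypothetical model of a tag bitmap with a writer-only lock. */
struct slot_tags {
	atomic_flag writer_lock;	/* serializes set/clear */
	atomic_ulong bits;		/* one tag bit per slot */
};

#define SLOT_TAGS_INIT { ATOMIC_FLAG_INIT, 0 }

static void slot_tags_lock(struct slot_tags *t)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&t->writer_lock,
						 memory_order_acquire))
		;
}

static void slot_tags_unlock(struct slot_tags *t)
{
	atomic_flag_clear_explicit(&t->writer_lock, memory_order_release);
}

/* Writers: always taken under the lock, so they never interleave. */
static void slot_tag_set(struct slot_tags *t, unsigned int slot)
{
	assert(slot < SLOT_TAGS_MAX);
	slot_tags_lock(t);
	atomic_fetch_or_explicit(&t->bits, 1UL << slot, memory_order_release);
	slot_tags_unlock(t);
}

static void slot_tag_clear(struct slot_tags *t, unsigned int slot)
{
	assert(slot < SLOT_TAGS_MAX);
	slot_tags_lock(t);
	atomic_fetch_and_explicit(&t->bits, ~(1UL << slot),
				  memory_order_release);
	slot_tags_unlock(t);
}

/*
 * Reader: takes no lock and loads the word once.  The result is a
 * snapshot that a concurrent writer may already have changed.
 */
static bool slot_tag_peek(struct slot_tags *t, unsigned int slot)
{
	assert(slot < SLOT_TAGS_MAX);
	return (atomic_load_explicit(&t->bits, memory_order_acquire) >> slot)
		& 1UL;
}
```

In this model a caller that needs an authoritative answer would repeat the check while holding the writer lock, which is the shape of the second radix_tree_tag_get() call under stores_lock described earlier in the thread.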