From mboxrd@z Thu Jan  1 00:00:00 1970
From: "David Dabbs"
Subject: RE: Performance improvements to key comparison functions
Date: Tue, 22 Jun 2004 05:53:05 -0500
Message-ID: <20040714105110.9A52415CE7@mail03.powweb.com>
References: <40F4D9D4.8050306@namesys.com>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Errors-To: flx@namesys.com
In-Reply-To: <40F4D9D4.8050306@namesys.com>
Content-Type: text/plain; charset="us-ascii"
To: 'Hans Reiser'
Cc: reiserfs-list@namesys.com

> David Dabbs wrote:
>
> > Level-Segregated znode Cache
> >
> > The cache scanning walks the cache and calls znode_contains_key_strict
> > if cached_node->level == search_key->level. If the scan tests some
> > nodes not at the same level as the in-cache node it will ultimately
> > find, does it make sense to segregate the cache by level?
> >
> > ...and later...
> >
> > Scratch the per-level caching idea. What was I thinking? A cached
> > node's level is not invariant.
>
> Hans wrote:
>
> Yes it is, for balanced trees at least.
>
> Please forgive me for not having read this code in a year; can you say
> this in more detail?
>
> I don't quite understand you. Any given key will have at least one node
> (more than one only if there are multiple items with the same key, which
> happens for directory items if hashing is not perfect) into which it
> falls at each level of the tree.

Let me see if I can be more accurate in my description. reiser4_keys do
not have levels. When I wrote "search_key->level" I meant the h->level
passed into cbk_cache_scan_slots() along with the search key. In search.c,
cbk_cache_search() calls cbk_cache_scan_slots() once for each level in the
inclusive range h->stop_level to h->lock_level, read from the cbk_handle
it receives (see the excerpts below).

Now, after re-reading the code and taking into account your responses, I
wonder why cbk_cache_scan_slots() cannot simply check whether a cached
node's level falls between stop_level and lock_level inclusive, instead of
rescanning the slots with one level value, then another, and so on. A
sketch of what I mean follows the code excerpts below. Is the per-level
looping done so that the cache is not held locked for the entire scan? It
is locked and unlocked once per level pass in any case.

> Hans wrote:
>
> Is your point that a list structure is inferior to an array, when it is
> used for a fixed size cache?

No, not at all. I was proposing a linked-list node cache _per_tree_level_.
The array came in with my sloppy suggestion that the pointers to these
per-level cache structures be stored in a static array of size
MAX_TREE_LEVEL (256), so that starting at the list head for a particular
level is as immediate as it is with the single cache today. A conditional
(the test of each cached node's level) would be removed from the slot scan
loop, and separate threads could lock different caches concurrently.
Unfortunately, if cache activity ends up centering on one level, there
would be no benefit. In addition, the number of nodes to search could at
worst double, yes? And if there is no reason to scan this level range in
order, would it hurt to pick one of the levels at random?
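Concretely, the layout I have in mind would be something like the
following. This is speculation rather than a patch: every name here is
hypothetical (the list-head and lock types especially) except
MAX_TREE_LEVEL and the slot/LRU notions, which come from the excerpts
below.

/* per-level cache layout: an untested sketch, not reiser4 code */
#define MAX_TREE_LEVEL 256

typedef struct cbk_level_cache {
	/* a small LRU list of slots, as today, but holding only znodes
	 * of a single level */
	cbk_cache_slot_list_head lru;
	/* per-level guard, so that scans of different levels need not
	 * serialize on a single cache lock */
	spinlock_t guard;
} cbk_level_cache;

/* static array indexed by znode level; starting at the list head for a
 * given level stays as immediate as with the single cache today */
static cbk_level_cache level_caches[MAX_TREE_LEVEL];

A lookup over [stop_level, lock_level] would then scan
level_caches[stop_level] through level_caches[lock_level], each under its
own guard, and the level test would disappear from the inner loop.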
Anyway, the sun is coming up, and I should do something about sleep. I
think I should not chew up bandwidth with any more speculations until I
finish my work in progress and have more experience with the system.

David


/* look for item with given key in the coord cache

   This function, called by coord_by_key(), scans "coord cache"
   (&cbk_cache) which is a small LRU list of znodes accessed lately. For
   each znode in this list, it checks whether key we are looking for fits
   into key range covered by this node. If so, and in addition, node lies
   at allowed level (this is to handle extents on a twig level), node is
   locked, and lookup inside it is performed.

   we need a measurement of the cost of this cache search compared to the
   cost of coord_by_key. */

cbk_cache_search()
{
	int result = 0;
	tree_level level;

	/* add CBK_IN_CACHE to the handle flags. This means that
	 * cbk_node_lookup() assumes that cbk_cache is scanned and would
	 * add found node to the cache. */
	h->flags |= CBK_IN_CACHE;
	for (level = h->stop_level; level <= h->lock_level; ++level) {
		h->level = level;
		result = cbk_cache_scan_slots(h);
		if (result != 0) {
			done_lh(h->active_lh);
			done_lh(h->parent_lh);
			reiser4_stat_inc(tree.cbk_cache_miss);
		} else {
			assert("nikita-1319", !IS_CBKERR(h->result));
			reiser4_stat_inc(tree.cbk_cache_hit);
			write_tree_log(h->tree, tree_cached);
			break;
		}
	}
	h->flags &= ~CBK_IN_CACHE;
	return result;
}

cbk_cache_scan_slots()
{
	---8<--- snip

	/*
	 * this is time-critical function and dragons had, hence, been
	 * settled here.
	 *
	 * Loop below scans cbk cache slots trying to find matching node
	 * with suitable range of delimiting keys and located at the
	 * h->level.
	 *
	 * Scan is done under cbk cache spin lock that protects slot->node
	 * pointers. If suitable node is found we want to pin it in
	 * memory. But slot->node can point to the node with x_count 0
	 * (unreferenced). Such node can be recycled at any moment, or can
	 * already be in the process of being recycled (within jput()).
	 *
	 * As we found node in the cbk cache, it means that jput() hasn't
	 * yet called cbk_cache_invalidate().
	 *
	 * We acquire reference to the node without holding tree lock, and
	 * later, check node's RIP bit. This avoids races with jput().
	 */
	rcu_read_lock();
	read_lock_cbk_cache(cache);

	slot = cbk_cache_list_prev(cbk_cache_list_front(&cache->lru));
	while (1) {
		slot = cbk_cache_list_next(slot);

		if (!cbk_cache_list_end(&cache->lru, slot))
			node = slot->node;
		else
			node = NULL;

		if (unlikely(node == NULL))
			break;

		/*
		 * this is (hopefully) the only place in the code where we
		 * are working with delimiting keys without holding dk
		 * lock. This is fine here, because this is only "guess"
		 * anyway---keys are rechecked under dk lock below.
		 */
		if (znode_get_level(node) == level &&
		    /* min_key < key < max_key */
		    znode_contains_key_strict(node, key, isunique)) {
			zref(node);
			result = 0;
			spin_lock_prefetch(&tree->tree_lock.lock);
			break;
		}
	}
	read_unlock_cbk_cache(cache);

	assert("nikita-2475", cbk_cache_invariant(cache));

	if (unlikely(result == 0 && ZF_ISSET(node, JNODE_RIP)))
		result = -ENOENT;

	rcu_read_unlock();

	if (result != 0) {
		h->result = CBK_COORD_NOTFOUND;
		return RETERR(-ENOENT);
	}

	result = longterm_lock_znode(h->active_lh, node,
				     cbk_lock_mode(level, h),
				     ZNODE_LOCK_LOPRI);
	zput(node);
	if (result != 0)
		return result;

	result = zload(node);
	if (result != 0)
		return result;

	/* recheck keys */
	result = UNDER_RW(dk, tree, read,
			  znode_contains_key_strict(node, key, isunique)) &&
		 !ZF_ISSET(node, JNODE_HEARD_BANSHEE);

	if (result) {
		/* do lookup inside node */

	---8<--- snip
}
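To illustrate the single-pass question above: the equality test in the
scan loop could, I imagine, become a range check, so that
cbk_cache_search() would call the scan once instead of once per level. The
fragment below reuses identifiers from the excerpt and is an untested
sketch, not a patch.

	/* hypothetical single-pass variant of the slot scan above:
	 * accept any node whose level lies in the inclusive range
	 * [h->stop_level, h->lock_level] */
	slot = cbk_cache_list_prev(cbk_cache_list_front(&cache->lru));
	while (1) {
		slot = cbk_cache_list_next(slot);

		if (!cbk_cache_list_end(&cache->lru, slot))
			node = slot->node;
		else
			node = NULL;

		if (unlikely(node == NULL))
			break;

		/* range check replaces the equality test on h->level */
		if (znode_get_level(node) >= h->stop_level &&
		    znode_get_level(node) <= h->lock_level &&
		    znode_contains_key_strict(node, key, isunique)) {
			zref(node);
			result = 0;
			break;
		}
	}

The LRU list would then be walked at most once per lookup rather than once
per level, at the cost of one extra comparison per node. On a hit the
caller would presumably need h->level set from znode_get_level(node),
since cbk_lock_mode(level, h) depends on the level of the node found.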
> David wrote:
>
> > Lock-Free Data Structures
> >
> > I've recently been reading research regarding lock-, wait-, and
> > obstruction-free concurrent data structures. Here are some relevant
> > links:
> >
> >   http://www.cs.yorku.ca/~ruppert/papers/lfll.pdf
> >   http://www.cl.cam.ac.uk/Research/SRG/netos/lock-free/
> >   http://research.sun.com/scalable/pubs/index.html
> >
> > To the degree that these approaches do not require
> > architecture-specific atomic primitives, their use could increase
> > performance and possibly simplify things in some cases. Assuming they
> > are portable, the place to apply them might be the cache
> > linked-list(s).
>
> Hans wrote:
>
> Please let me know your thoughts on this once you have had a chance to
> pick a suggestion you like the most. I will print these and put them in
> my queue, sigh.

Sorry for the somewhat mindless link propagation. I wanted to see whether
these approaches had been considered and rejected, were on the drawing
board, etc. I will make a recommendation when I finish the key stuff.
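For what it's worth, the common building block in all three links is
compare-and-swap. Below is a minimal userspace illustration of the easy
half, a lock-free push onto a singly linked list, written against GCC's
__atomic builtins (an assumption on my part; in-kernel code would use
cmpxchg()). Deletion and safe memory reclamation are the hard half, and
are exactly what the papers above address, much as the RIP-bit/RCU dance
in cbk_cache_scan_slots() does for the existing cache.

struct lf_node {
	struct lf_node *next;
	void *payload;
};

static struct lf_node *lf_head;	/* shared list head */

void lf_push(struct lf_node *n)
{
	struct lf_node *old;

	do {
		/* snapshot the current head and point the new node at it */
		old = __atomic_load_n(&lf_head, __ATOMIC_ACQUIRE);
		n->next = old;
		/* publish n as the new head; retry if another thread
		 * changed lf_head between the load and the swap */
	} while (!__atomic_compare_exchange_n(&lf_head, &old, n,
					      1 /* weak */,
					      __ATOMIC_RELEASE,
					      __ATOMIC_RELAXED));
}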