* Q. hlist_bl_add_head_rcu() in d_alloc_parallel()

From: J. R. Okajima @ 2016-06-17 20:50 UTC (permalink / raw)
To: viro, linux-fsdevel

I am afraid there may exist another violation of the "no lookups on the
same name in parallel" rule, but I am not sure.
Roughly, d_alloc_parallel() behaves like this.

struct dentry *d_alloc_parallel()
{
	new = d_alloc(parent, name);
	rcu_read_lock();
	hlist_bl_lock(b);
	rcu_read_unlock();
	hlist_bl_for_each_entry(dentry, node, b, d_u.d_in_lookup_hash) {
		if (!matched_dentry_found)
			continue;
		dget(dentry);
		hlist_bl_unlock(b);
		return dentry;
	}
	hlist_bl_add_head_rcu(&new->d_u.d_in_lookup_hash, b);
	hlist_bl_unlock(b);
	return new;
}

When two processes try opening a single existing file and enter
d_alloc_parallel() at the same time, only one process wins and succeeds
in hlist_bl_add_head_rcu(). The other process should find the dentry in
d_u.d_in_lookup_hash and return 'dentry' (instead of 'new'). Am I right?

My question is: when will 'new' be added into d_u.d_in_lookup_hash? It
should be between these two lines, I guess.

	rcu_read_unlock();
	hlist_bl_for_each_entry(dentry, node, b, d_u.d_in_lookup_hash) {

But can it surely happen? If 'new' is not added here because someone
else is in an rcu_read_lock region, or for some other reason, will both
processes add same-named but different dentries?

Is it better to change the lock/unlock order like this?

	rcu_read_unlock();
	rcu_barrier();
	hlist_bl_lock(b);
	hlist_bl_for_each_entry(dentry, node, b, d_u.d_in_lookup_hash) {

J. R. Okajima

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: Q. hlist_bl_add_head_rcu() in d_alloc_parallel()

From: Al Viro @ 2016-06-17 22:16 UTC (permalink / raw)
To: J. R. Okajima; +Cc: linux-fsdevel

On Sat, Jun 18, 2016 at 05:50:30AM +0900, J. R. Okajima wrote:
> 	hlist_bl_lock(b);
> 	rcu_read_unlock();
> 	hlist_bl_for_each_entry(dentry, node, b, d_u.d_in_lookup_hash) {
> 		if (!matched_dentry_found)
> 			continue;
> 		dget(dentry);
> 		hlist_bl_unlock(b);
> 		return dentry;
> 	}
> 	hlist_bl_add_head_rcu(&new->d_u.d_in_lookup_hash, b);
> 	hlist_bl_unlock(b);
> 	return new;
> }
> When two processes try opening a single existing file and enters
> d_alloc_parallel() at the same time, only one process wins and should
> succeeds hlist_bl_add_head_rcu(). The other process should find the
> dentry in d_u.d_in_lookup_hash and return 'dentry' (instead of
> 'new'). Am I right?
>
> My question is when will 'new' be added into d_u.d_in_lookup_hash?

You've quoted it yourself:

> 	hlist_bl_add_head_rcu(&new->d_u.d_in_lookup_hash, b);

> It should be between these two lines, I guess.
> 	rcu_read_unlock();
> 	hlist_bl_for_each_entry(dentry, node, b, d_u.d_in_lookup_hash) {
> But can it surely happen?
> If 'new' is not added here because someone else is in rcu_read_lock
> region or other reason, then both processes will add the same named but
> different dentry?

Huh?  If you have two processes reaching that insertion into the
in-lookup hash, whoever gets the hlist_bl_lock() first wins; the loser
will find the instance added by the winner and bugger off with it.
RCU is completely unrelated to that.  It's about the search in the
*primary* hash.

> Is it better to change the lock/unlock-order like this?
>
> 	rcu_read_unlock();
> 	rcu_barrier();
> 	hlist_bl_lock(b);
> 	hlist_bl_for_each_entry(dentry, node, b, d_u.d_in_lookup_hash) {

No.
We need to verify that nothing had been transferred from the in-lookup
hash to the primary hash after we'd found no match in the primary.
That's what the ->i_dir_seq check is about, and it has to be done after
we'd locked the in-lookup hash chain.  We could drop rcu_read_lock
before locking the chain, but this way we make sure we won't get
preempted between fetching ->i_dir_seq and checking whether it has
changed.

The last thing we want is rcu_barrier() - WTF for?  To get an extra
chance of something being moved from in-lookup to primary, forcing us
to repeat the primary hash lookup?

We are *NOT* modifying the primary hash in d_alloc_parallel().  At all.
With respect to the primary hash it's a pure reader.  And the in-lookup
hash is only accessed under hlist_bl_lock() on its chains - no RCU
accesses to that one.
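[Editor's illustration] The revalidation described above - sample ->i_dir_seq, do the unlocked primary-hash search, take the chain lock, recheck the counter and retry from scratch if it moved - is a generic generation-counter pattern. A minimal userspace sketch of just that control flow follows; every name in it is made up for illustration, and it is not the kernel code:

```c
/* Stand-ins for the kernel state: dir_seq models inode->i_dir_seq. */
unsigned int dir_seq;
int retries;

unsigned int sample_seq(void)
{
	return dir_seq;	/* fetch ->i_dir_seq before the unlocked search */
}

int seq_changed(unsigned int seq)
{
	return seq != dir_seq;	/* recheck after locking the chain */
}

/*
 * One pass of the lookup: returns 1 if it is safe to search/insert in
 * the in-lookup hash, 0 if the generation moved and the caller must
 * retry.  'racing' simulates a concurrent d_add() bumping ->i_dir_seq
 * between our sample and our recheck.
 */
int lookup_once(int racing)
{
	unsigned int seq = sample_seq();

	/* ... search the primary hash here, find nothing ... */
	if (racing)
		dir_seq++;	/* someone moved a dentry to the primary hash */

	/* lock the in-lookup chain, then revalidate the counter */
	if (seq_changed(seq)) {
		retries++;
		return 0;	/* unlock the chain and goto retry */
	}
	return 1;
}
```

Without the recheck, the pass that raced would go on to search the in-lookup hash against a stale view of the primary hash.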
* Re: Q. hlist_bl_add_head_rcu() in d_alloc_parallel()

From: Al Viro @ 2016-06-17 22:56 UTC (permalink / raw)
To: J. R. Okajima; +Cc: linux-fsdevel

On Fri, Jun 17, 2016 at 11:16:14PM +0100, Al Viro wrote:
> We are *NOT* modifying the primary hash in d_alloc_parallel().  At all.
> With respect to the primary hash it's a pure reader.  And in-lookup hash
> is only accessed under hlist_bl_lock() on its chains - no RCU accesses to
> that one.

Note that if d_alloc_parallel() finds a matching dentry in the
in-lookup hash, it does *NOT* return it until that dentry has left the
in-lookup hash and has been moved to the primary one.  That's what the
d_wait_lookup() call is about; we wait for whoever had added it into
the in-lookup hash to be done with it.  Then we check whether it has
been transferred into the primary hash (with the same name and parent).
If it has been, we treat it as if we'd found it in the primary hash in
the first place - it's not an in-lookup one anymore.  If it hasn't, we
repeat the whole thing from scratch, starting with the primary hash
lookup.

The *only* case when we return an in-lookup dentry is when we had
allocated it ourselves, found no matches in either the primary or the
in-lookup hash and inserted it into the in-lookup hash.
What happens is an optimized variant of this:

	new = new dentry instance
retry:
	fetch the value of ->i_dir_seq
	look for match in primary
	if found
		drop what we'd allocated and return what we'd found
	lock the chain in in-lookup hash
	check if ->i_dir_seq has changed
	if it has changed
		unlock the chain
		goto retry
	look for match in in-lookup hash
	if found
		unlock the chain
		wait for the match to cease being in-lookup
		drop the match
		goto retry		[see below]
	insert new into in-lookup hash
	unlock the chain
	return new

The difference between that and the actual d_alloc_parallel() is a
simple optimisation in the last goto retry - the one after waiting for
the match to leave the in-lookup hash.  The pseudocode above would
rescan the primary hash; very often it would end up finding the same
dentry we'd just been waiting for.  So if our match is hashed and still
has the same name and parent, we can return it without bothering with
the primary hash lookup.  It is common enough to be worth doing, but
it's just an optimisation - the behaviour is the same as if we'd
followed d_wait_lookup() with an unconditional

	spin_unlock(&dentry->d_lock);
	dput(dentry);
	goto retry;

The only difference is that we don't bother with
drop/search-in-primary/find-and-grab-the-same-match in a very common
case.
* Re: Q. hlist_bl_add_head_rcu() in d_alloc_parallel()

From: J. R. Okajima @ 2016-06-19 5:24 UTC (permalink / raw)
To: Al Viro; +Cc: linux-fsdevel

Al Viro:
> Huh?  If you have two processes reaching that insertion into the in-lookup
> hash, whoever gets the hlist_bl_lock() first wins; the loser will find
> the instance added by the winner and bugger off with it.  RCU is completely
> unrelated to that.  It's about the search in *primary* hash.

Ok, forget about rcu_barrier.

----------------------------------------

> 	look for match in in-lookup hash
> 	if found
> 		unlock the chain
> 		wait for the match to cease being in-lookup
> 		drop the match
> 		goto retry		[see below]
> 	insert new into in-lookup hash

The actual matching test which corresponds to the pseudo-code above
("if found") is this (from v4.7-rc3):

	dentry->d_name.hash != hash
	dentry->d_parent != parent
	d_unhashed(dentry)
	name (length and string)

I am afraid this d_unhashed() test is racy. Here is what I am guessing.

- two processes try opening the same file
- both enter the hlist_bl_lock-protected loop in d_alloc_parallel()

- the winner puts the new dentry into the in-lookup hash
  + here d_unhashed(dentry) would still return true.
- then the winner process will call ->atomic_open or ->lookup; finally
  d_add() and rehash will be called and the dentry will be moved to the
  primary hash.
  + here d_unhashed(dentry) would return false.
  As soon as the winner calls hlist_bl_unlock(), the loser starts the
  d_in_lookup_hash loop and finds the dentry which the winner added.

- the loser (let's call it processB) does the tests
	dentry->d_name.hash != hash
	dentry->d_parent != parent
	d_unhashed(dentry)
- if processA has already called d_add() and rehash, then this
  d_unhashed() test would return false, and processB will throw away
  its own 'new' dentry and return the found one.
- if processA has NOT called d_add() and rehash yet (due to scheduling
  timing), then this d_unhashed() test would return true, and processB
  will simply skip the found dentry.
  In this case, processB will add its own 'new' dentry into the
  in-lookup hash and return it.

Finally, this race between the two
- d_add() and rehash via ->atomic_open or ->lookup
- the d_unhashed() test in d_alloc_parallel()
leads to duplicated dentries (dentries with the same name under the
same parent).

Do you think it can happen?

J. R. Okajima
* Re: Q. hlist_bl_add_head_rcu() in d_alloc_parallel()

From: Al Viro @ 2016-06-19 16:55 UTC (permalink / raw)
To: J. R. Okajima; +Cc: linux-fsdevel

On Sun, Jun 19, 2016 at 02:24:44PM +0900, J. R. Okajima wrote:
> - two processes try opening the same file
> - the both enter the hlist_bl_lock protected loop in d_alloc_parallel()
>
> - the winner puts the new dentry into in-lookup hash
>   + here d_unhashed(dentry) would still return true.
> - then the winner process will call ->atomic_open or ->lookup. finally
>   d_add() and rehash will be called and the dentry will be moved to the
>   primary hash.
>   + here d_unhashed(dentry) would return false.
>   As soon as the winner calls hlist_bl_unlock(), the looser starts
>   d_in_lookup_hash loop and find the dentry which the winner added.
>
> - the looser (or we should call processB) do the tests
> 	dentry->d_name.hash != hash
> 	dentry->d_parent != parent
> 	d_unhashed(dentry)
> - if processA has already called d_add and rehash, then this
>   d_unhashed() test would return false, and processB will throw away his
>   own 'new' dentry and return the found one.
> - if processA has NOT called d_add and rehash yet (due to the schedule
>   timing), then this d_unhashed() test would return true, and processB
>   will simply skip the found dentry.
>   in this case, processB will add his own 'new' dentry into in-lookup
>   hash and return it.

How would processB get past d_wait_lookup()?  It would have to have
observed !d_in_lookup() while holding ->d_lock on that sucker; it does
*not* drop ->d_lock through the tests you've mentioned.  And both the
removal of the in-lookup flag and the insertion into the primary hash
are done under ->d_lock without dropping it in between.  That was the
point of taking security_d_instantiate() prior to attaching to inode,
etc.
- that way we can make those actions (removal from the in-lookup hash,
possibly attaching to the inode, inserting into the primary hash)
atomic wrt d_wait_lookup().
* Re: Q. hlist_bl_add_head_rcu() in d_alloc_parallel()

From: J. R. Okajima @ 2016-06-20 4:34 UTC (permalink / raw)
To: Al Viro; +Cc: linux-fsdevel

Al Viro:
> How would processB get past d_wait_lookup()?  It would have to have

By the first d_unhashed() test in the loop, processB doesn't reach
d_wait_lookup(). Here is a more detailed scenario I am considering.

- two processes try opening the same file
- both enter lookup_open() and the hlist_bl_lock-protected loop in
  d_alloc_parallel()

processA (the winner of hlist_bl_lock)
	d_alloc_parallel()
	+ new = d_alloc()
	+ rcu_read_lock()
	+ dentry = __d_lookup_rcu()
	  NULL
	+ hlist_bl_lock()			--- (BL0)
	+ test parent's i_dir_seq		--- (DS1)
	+ rcu_read_unlock()
	+ hlist_bl_for_each_entry(d_u.d_in_lookup_hash)
	  not found
	+ new->d_flags |= DCACHE_PAR_LOOKUP
	+ new->d_wait = wq
	+ hlist_bl_add_head_rcu(new->d_u.d_in_lookup_hash)
	+ hlist_bl_unlock()			--- (BL1)
	+ return new
	--> a new dentry is added into the in-lookup hash.

And then processA
	->atomic_open or ->lookup
	+ __d_add()
	  + spin_lock(d_lock)
	  + update parent's i_dir_seq		--- (DS2)
	  + __d_lookup_done()
	    + hlist_bl_lock()			--- (BL2)
	    + dentry->d_flags &= ~DCACHE_PAR_LOOKUP
	    + __hlist_bl_del(d_u.d_in_lookup_hash)
	    + hlist_bl_unlock()			--- (BL3)
	  + _d_rehash()
	    + hlist_bl_lock()			--- (BL4)
	    + hlist_bl_add_head_rcu(d_hash)
	    + hlist_bl_unlock()
	  + spin_unlock(d_lock)
	--> the new dentry is moved from the in-lookup hash to the
	    primary hash.

Between (BL1) and (BL2) in processA's flow, processB may acquire the
hlist_bl_lock() it was blocked on at (BL0), and this may happen even
before processA's (DS2). In this case, processB will return an
unexpectedly duplicated dentry, because the DS1 test will pass and the
loop body will be skipped by the d_unhashed() test before
d_wait_lookup(). It is only after (BL4) that d_unhashed() turns FALSE.
processB
	d_alloc_parallel()
	+ hlist_bl_lock()
	+ test parent's i_dir_seq
	  passed, if processA hasn't reached (DS2) yet.
	+ rcu_read_unlock()
	+ hlist_bl_for_each_entry(d_u.d_in_lookup_hash)
	  dentry found, but skipped because of
		if (d_unhashed(dentry))
			continue;
	  before d_wait_lookup().
	+ new->d_flags |= DCACHE_PAR_LOOKUP
	+ new->d_wait = wq
	+ hlist_bl_add_head_rcu(new->d_u.d_in_lookup_hash)
	+ hlist_bl_unlock()
	+ return new
	--> another dentry with the same name is added into the
	    in-lookup hash.

On the other hand, if processB acquires hlist_bl_lock() between (BL3)
and (BL4) in processA's flow, processB will detect that the parent's
i_dir_seq has been modified and 'goto retry'. That is good.

Finally, the race condition I am afraid of is:
- processB acquires BL0 between processA's BL1 and BL2, and
- processB tests DS1 before processA's DS2.

J. R. Okajima
* Re: Q. hlist_bl_add_head_rcu() in d_alloc_parallel()

From: Al Viro @ 2016-06-20 5:35 UTC (permalink / raw)
To: J. R. Okajima; +Cc: linux-fsdevel

On Mon, Jun 20, 2016 at 01:34:14PM +0900, J. R. Okajima wrote:
>
> Al Viro:
> > How would processB get past d_wait_lookup()?  It would have to have
>
> By the first d_unhashed() test in the loop, processB doesn't reach
> d_wait_lookup().

Huh?  What first d_unhashed()... <stares>

That check is definitely bogus and I'm completely at a loss as to WTF
it is doing there.  Thanks for catching that; this kind of idiotic
braino can escape notice when rereading the code again and again,
unfortunately ;-/

Fixed, will push to Linus tonight or tomorrow.
* Re: Q. hlist_bl_add_head_rcu() in d_alloc_parallel()

From: Al Viro @ 2016-06-20 14:51 UTC (permalink / raw)
To: J. R. Okajima; +Cc: linux-fsdevel, Linus Torvalds

On Mon, Jun 20, 2016 at 06:35:30AM +0100, Al Viro wrote:
> On Mon, Jun 20, 2016 at 01:34:14PM +0900, J. R. Okajima wrote:
> >
> > Al Viro:
> > > How would processB get past d_wait_lookup()?  It would have to have
> >
> > By the first d_unhashed() test in the loop, processB doesn't reach
> > d_wait_lookup().
>
> Huh?  What first d_unhashed()... <stares>
>
> That check is definitely bogus and I'm completely at loss as to WTF is it
> doing there.  Thanks for catching that; this kind of idiotic braino can
> escape notice when rereading the code again and again, unfortunately ;-/
>
> Fixed, will push to Linus tonight or tomorrow.

FWIW, I understand how it got there; it was garbage from a cut'n'paste
from the lockless primary hash lookup (the cut'n'paste was for the sake
of the "compare the name" logics).  It was absolutely wrong - a dentry
is never added to the primary hash until it has been removed from the
in-lookup one, and we are walking the in-lookup hash chain with its
bitlock held, so there's no chance of racing with that.  In effect that
junk prevented d_alloc_parallel() from *ever* spotting in-lookup
matches.

What's more, removing it has instantly uncovered another bug in the
match-handling code - a dget() done under the chain bitlock, which
nests inside ->d_lock.  Trivially fixed, of course (we just hold
rcu_read_lock() through the in-lookup hash search and, instead of
dget() while holding the chain bitlock, do lockref_get_not_dead()
after dropping the bitlock), but... *ouch*

It's going through the local tests right now; seems to be OK so far.
I'll send a pull request once it's through those.
But this demonstrates why RTFS (and by somebody other than the author
of TFS doing the reading) is really, _really_ important.  I have read
through that loop many times and kept missing that turdlet ;-/

Al, wearing a brown paperbag ;-/
* [git pull] vfs fixes

From: Al Viro @ 2016-06-20 17:14 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-kernel, linux-fsdevel

A couple more of the d_walk()/d_subdirs reordering fixes (stable
fodder; ought to solve that crap for good) and a fix for a brown
paperbag bug in d_alloc_parallel() (this cycle).

The following changes since commit 1607f09c226d1378439c411baaaa020042750338:

  coredump: fix dumping through pipes (2016-06-07 22:07:09 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-linus

for you to fetch changes up to e7d6ef9790bc281f5c29d0132b68031248523fe8:

  fix idiotic braino in d_alloc_parallel() (2016-06-20 10:07:42 -0400)

----------------------------------------------------------------
Al Viro (3):
      much milder d_walk() race
      autofs races
      fix idiotic braino in d_alloc_parallel()

 fs/autofs4/autofs_i.h  |  8 ++++--
 fs/autofs4/expire.c    | 27 ++++++------------
 fs/autofs4/root.c      |  2 +-
 fs/dcache.c            | 75 ++++++++++++++++++++++++++++++++++++++++++--------
 fs/internal.h          |  1 +
 fs/libfs.c             |  4 +--
 include/linux/dcache.h |  1 +

 7 files changed, 82 insertions(+), 36 deletions(-)
* Re: Q. hlist_bl_add_head_rcu() in d_alloc_parallel()

From: J. R. Okajima @ 2016-06-23 1:19 UTC (permalink / raw)
To: Al Viro; +Cc: linux-fsdevel

Al Viro:
> That check is definitely bogus and I'm completely at loss as to WTF is it
> doing there.  Thanks for catching that; this kind of idiotic braino can
> escape notice when rereading the code again and again, unfortunately ;-/
>
> Fixed, will push to Linus tonight or tomorrow.

Thank you too for fixing.
I've confirmed the patch was merged, and it passed my local tests
too. Happy.

I have another, relatively less important suggestion. Since the two
groups of tests in the loop are very similar, how about extracting them
into a single new static function?
Do you think it would cost performance?

J. R. Okajima

diff --git a/fs/dcache.c b/fs/dcache.c
index 76afffd..3a9ebbc 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2462,14 +2462,42 @@ static void d_wait_lookup(struct dentry *dentry)
 	}
 }
 
+/* tests for d_alloc_parallel() */
+static bool dap_test(struct dentry *dentry, const struct qstr *name,
+		     struct dentry *parent, bool do_unhashed_test)
+{
+	bool ret;
+
+	ret = false;
+	if (dentry->d_name.hash != name->hash)
+		goto out;
+	if (dentry->d_parent != parent)
+		goto out;
+	if (do_unhashed_test && d_unhashed(dentry))
+		goto out;
+	if (parent->d_flags & DCACHE_OP_COMPARE) {
+		int tlen = dentry->d_name.len;
+		const char *tname = dentry->d_name.name;
+		if (parent->d_op->d_compare(parent, dentry, tlen, tname, name))
+			goto out;
+	} else {
+		unsigned int len = name->len;
+		if (dentry->d_name.len != len)
+			goto out;
+		if (dentry_cmp(dentry, name->name, len))
+			goto out;
+	}
+	ret = true;
+
+out:
+	return ret;
+}
+
 struct dentry *d_alloc_parallel(struct dentry *parent,
 				const struct qstr *name,
 				wait_queue_head_t *wq)
 {
-	unsigned int len = name->len;
-	unsigned int hash = name->hash;
-	const unsigned char *str = name->name;
-	struct hlist_bl_head *b = in_lookup_hash(parent, hash);
+	struct hlist_bl_head *b = in_lookup_hash(parent, name->hash);
 	struct hlist_bl_node *node;
 	struct dentry *new = d_alloc(parent, name);
 	struct dentry *dentry;
@@ -2515,21 +2543,8 @@ retry:
 	 * we encounter.
 	 */
 	hlist_bl_for_each_entry(dentry, node, b, d_u.d_in_lookup_hash) {
-		if (dentry->d_name.hash != hash)
-			continue;
-		if (dentry->d_parent != parent)
+		if (!dap_test(dentry, name, parent, /*do_unhashed_test*/false))
 			continue;
-		if (parent->d_flags & DCACHE_OP_COMPARE) {
-			int tlen = dentry->d_name.len;
-			const char *tname = dentry->d_name.name;
-			if (parent->d_op->d_compare(parent, dentry, tlen, tname, name))
-				continue;
-		} else {
-			if (dentry->d_name.len != len)
-				continue;
-			if (dentry_cmp(dentry, str, len))
-				continue;
-		}
 		hlist_bl_unlock(b);
 		/* now we can try to grab a reference */
 		if (!lockref_get_not_dead(&dentry->d_lockref)) {
@@ -2550,23 +2565,9 @@ retry:
 	 * d_lookup() would've found anyway.  If it is, just return it;
 	 * otherwise we really have to repeat the whole thing.
 	 */
-	if (unlikely(dentry->d_name.hash != hash))
-		goto mismatch;
-	if (unlikely(dentry->d_parent != parent))
+	if (unlikely(!dap_test(dentry, name, parent,
+			       /*do_unhashed_test*/true)))
 		goto mismatch;
-	if (unlikely(d_unhashed(dentry)))
-		goto mismatch;
-	if (parent->d_flags & DCACHE_OP_COMPARE) {
-		int tlen = dentry->d_name.len;
-		const char *tname = dentry->d_name.name;
-		if (parent->d_op->d_compare(parent, dentry, tlen, tname, name))
-			goto mismatch;
-	} else {
-		if (unlikely(dentry->d_name.len != len))
-			goto mismatch;
-		if (unlikely(dentry_cmp(dentry, str, len)))
-			goto mismatch;
-	}
 	/* OK, it *is* a hashed match; return it */
 	spin_unlock(&dentry->d_lock);
 	dput(new);
* Re: Q. hlist_bl_add_head_rcu() in d_alloc_parallel()

From: Al Viro @ 2016-06-23 2:58 UTC (permalink / raw)
To: J. R. Okajima; +Cc: linux-fsdevel, Linus Torvalds, George Spelvin

[Linus and George Cc'd since it's close to the area affected by the
hash rework and friends]

On Thu, Jun 23, 2016 at 10:19:16AM +0900, J. R. Okajima wrote:
>
> Al Viro:
> > That check is definitely bogus and I'm completely at loss as to WTF is it
> > doing there.  Thanks for catching that; this kind of idiotic braino can
> > escape notice when rereading the code again and again, unfortunately ;-/
> >
> > Fixed, will push to Linus tonight or tomorrow.
>
> Thank you too for fixing.
> I've confirmed the patch was merged and it passed my local tests
> too. Happy.
> I have another and relatively less important suggestion. Since the two
> groups tests in the loop are very similar, how about extract them as a
> new single static function?
> Do you think it will invite a performance down?

We do have too many duplicates, especially if you count the related but
not identical ones as well.

1) __d_lookup():
	check hash
	grab d_lock to stabilize the rest
	check parent
	check if it's hashed
	check name/length

2) d_alloc_parallel(), search in in-lookup hash:
	hash/parent/name stable for everything in in-lookup hash,
	so we need no locks
	check hash
	check parent
	check name/length

3) d_alloc_parallel(), check after waiting:
	d_lock already held
	check hash
	check parent
	check if it's hashed
	check name/length

4) d_exact_alias():
	check hash
	check parent
	check name/length
	(simplified, since at the moment it's only used for filesystems
	that do not have ->d_compare())

FWIW, now that I look at d_exact_alias()... the comment in there needs
an update; the code is correct, but only because we don't call that
thing for directories.
Which means that d_splice_alias()-induced moves do not affect us,
leaving only d_move(), which is always called with the parent locked
exclusive.

Hell knows... The order of checks can be delicate; out of those cases,
(1) and (2) are on the hottest paths.  We certainly can do this:

static always_inline bool d_same_name(const struct dentry *dentry,
				      const struct dentry *parent,
				      const struct qstr *name)
{
	if (parent->d_flags & DCACHE_OP_COMPARE) {
		int tlen = dentry->d_name.len;
		const char *tname = dentry->d_name.name;
		if (parent->d_op->d_compare(parent, dentry, tlen, tname, name))
			return false;
	} else {
		if (dentry->d_name.len != name->len)
			return false;
		if (dentry_cmp(dentry, name->name, name->len))
			return false;
	}
	return true;
}

and switch all four to it.  That would reduce the amount of
boilerplate; I would hesitate to replace always_inline with plain
inline, since we don't want (1) and (2) to become uninlined, but maybe
even that doesn't matter all that much - most of the rejects will
happen without looking at the names.  It really needs profiling...
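[Editor's illustration] The factoring proposed above is easy to model outside the kernel. The sketch below replaces the kernel types with plain structs and reduces both dentry_cmp() and ->d_compare() to memcmp(); it is a toy under those assumptions, not the kernel helper, but it shows the shape of the comparison - cheap rejects (parent, hash, length) before the string compare:

```c
#include <string.h>

struct qstr {
	unsigned int hash;
	unsigned int len;
	const char *name;
};

struct dentry {
	struct qstr d_name;
	const struct dentry *d_parent;
};

/* toy model of d_same_name(): cheap rejects first, name compare last */
int d_same_name(const struct dentry *dentry,
		const struct dentry *parent,
		const struct qstr *name)
{
	if (dentry->d_parent != parent)
		return 0;
	if (dentry->d_name.hash != name->hash)
		return 0;
	if (dentry->d_name.len != name->len)
		return 0;
	return memcmp(dentry->d_name.name, name->name, name->len) == 0;
}

/* exercise both a real match and a same-hash mismatch */
int demo(void)
{
	struct dentry parent = { { 0, 0, "" }, NULL };
	struct dentry d = { { 42, 3, "foo" }, &parent };
	struct qstr same = { 42, 3, "foo" };
	struct qstr collide = { 42, 3, "bar" };	/* pretend hash collision */

	return 2 * d_same_name(&d, &parent, &same)
	     + d_same_name(&d, &parent, &collide);
}
```

The hash-collision case is why the memcmp() cannot be dropped even though the hash already matched.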
* Re: Q. hlist_bl_add_head_rcu() in d_alloc_parallel()

From: Linus Torvalds @ 2016-06-24 5:57 UTC (permalink / raw)
To: Al Viro; +Cc: J. R. Okajima, linux-fsdevel, George Spelvin

On Wed, Jun 22, 2016 at 7:58 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Hell knows...  Order of checks can be delicate; out of those cases, (1) and (2)
> are on the hottest paths.

Actually, as long as we leave the actual RCU paths alone, none of the
lookup paths are all that hot in any profile I've seen recently.

So I'm certainly ok with using that d_same_name() helper for the four
cases you mention (__d_lookup, d_alloc_parallel*2, d_exact_alias).  In
fact, if you add an "unlikely()" to the DCACHE_OP_COMPARE test, you
might even improve on the code generation.

The non-RCU case basically never shows up in profiles (it used to,
with symlinks, but you fixed that case), and if it ever does I suspect
that the fix will be to make sure we don't fall out of RCU.

So don't worry too much about __d_lookup() being a hot case; I don't
think it is.

(Of course, if you have a load that shows me wrong, let's look at it
by all means.  Maybe the loads I have used have been bad.)

                Linus
* Re: Q. hlist_bl_add_head_rcu() in d_alloc_parallel()

From: Al Viro @ 2016-06-25 22:54 UTC (permalink / raw)
To: Linus Torvalds; +Cc: J. R. Okajima, linux-fsdevel, George Spelvin

On Thu, Jun 23, 2016 at 10:57:09PM -0700, Linus Torvalds wrote:
> The non-RCU case basically never shows up in profiles (it used to,
> with symlinks, but you fixed that case), and if it ever does I suspect
> that the fix will be to make sure we don't fall out of rcu.
>
> So don't worry too much about __d_lookup() being a hot case, I don't
> think it is.
>
> (Of course, if you have a load that shows me wrong, let's look at it
> by all means. Maybe the loads I have used have been bad)

BTW, speaking of that area - is there any reason why dentry_cmp() isn't
simply

	return dentry_string_cmp(lockless_dereference(dentry->d_name.name),
				 ct, tcount);

other than "it predates lockless_dereference()"?  The only difference
is s/ACCESS_ONCE/READ_ONCE/, AFAICS - lockless_dereference() uses the
latter these days.  And that shouldn't be a problem; as a matter of
fact, *all* remaining ACCESS_ONCE in fs/dcache.c look like they should
become READ_ONCE.  Objections?

Al, digging through the barrier-related issues with dentries and
peeling the paint off the walls in the process...
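[Editor's illustration] The construct under discussion can be approximated in userspace like this. The macro names below mimic the kernel ones but are redefined here (the real definitions live in the kernel's compiler headers, and smp_read_barrier_depends() is a real barrier only on Alpha); the sketch needs GCC/Clang statement expressions:

```c
/* force a single, non-torn load of x; mimics the kernel's READ_ONCE() */
#define READ_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* a real barrier only on Alpha; elsewhere it compiles to nothing */
#define smp_read_barrier_depends() do { } while (0)

/* READ_ONCE + dependency barrier = lockless_dereference() */
#define lockless_dereference(p)			\
({						\
	__typeof__(p) _p = READ_ONCE(p);	\
	smp_read_barrier_depends();		\
	_p;					\
})

const char *shared_name = "dentry-name";

/* what dentry_cmp() would do before walking the string: snapshot the
 * pointer once, with the dependency ordering the barrier provides */
const char *grab_name(void)
{
	return lockless_dereference(shared_name);
}
```

The point of the rewrite Al asks about is exactly this: dentry_cmp() open-coded the ACCESS_ONCE-plus-barrier pair that lockless_dereference() later packaged up.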
* Re: Q. hlist_bl_add_head_rcu() in d_alloc_parallel()

From: Linus Torvalds @ 2016-06-26 1:25 UTC (permalink / raw)
To: Al Viro; +Cc: J. R. Okajima, linux-fsdevel, George Spelvin

On Sat, Jun 25, 2016 at 3:54 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> BTW, speaking of that area - is there any reason why dentry_cmp()
> isn't simply
> 	return dentry_string_cmp(lockless_dereference(dentry->d_name.name),
> 				 ct, tcount);
> other than "it predates lockless_dereference()"?

No, the only important thing is that it's a read-once and has the
smp_read_barrier_depends().

So lockless_dereference() looks like the right thing to do; it is just
that - as you suspected - the code predates that concept.

                Linus
* Re: Q. hlist_bl_add_head_rcu() in d_alloc_parallel()

From: Al Viro @ 2016-06-29 8:17 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-fsdevel, He Kuang

On Sat, Jun 25, 2016 at 06:25:58PM -0700, Linus Torvalds wrote:
> On Sat, Jun 25, 2016 at 3:54 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > BTW, speaking of that area - is there any reason why dentry_cmp()
> > isn't simply
> > 	return dentry_string_cmp(lockless_dereference(dentry->d_name.name),
> > 				 ct, tcount);
> > other than "it predates lockless_dereference()"?
>
> No, the only important thing is that it's a read-once, and has the
> smp_read_barrier_depends().
>
> So lockless_derefence() looks like the right thing to do, it is just
> that - as you suspected - the code predates that concept.

... and as a matter of fact, a patch to that effect (only cosmetically
different from what I'd cooked up) had been posted by He Kuang back in
March.  At least I'd spotted that before pushing mine out...

The patch posted back then has been applied, with a slightly edited
commit message.  Apologies for missing it...
* Re: Q. hlist_bl_add_head_rcu() in d_alloc_parallel()

From: Hekuang @ 2016-06-29 9:22 UTC (permalink / raw)
To: Al Viro, Linus Torvalds; +Cc: linux-fsdevel

On 2016/6/29 16:17, Al Viro wrote:
> On Sat, Jun 25, 2016 at 06:25:58PM -0700, Linus Torvalds wrote:
>> On Sat, Jun 25, 2016 at 3:54 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>>> BTW, speaking of that area - is there any reason why dentry_cmp()
>>> isn't simply
>>> 	return dentry_string_cmp(lockless_dereference(dentry->d_name.name),
>>> 				 ct, tcount);
>>> other than "it predates lockless_dereference()"?
>> No, the only important thing is that it's a read-once, and has the
>> smp_read_barrier_depends().
>>
>> So lockless_derefence() looks like the right thing to do, it is just
>> that - as you suspected - the code predates that concept.
> ... and as the matter of fact, patch to that effect (only cosmetically different
> from what I'd cooked) had been posted by He Kuang back in March.  At least
> I'd spotted that before pushing mine out...
>
> The patch posted back then applied, with slightly edited commit message.  Apologies
> for missing it...

It's okay, and I really appreciate you remembering; I would have
forgotten that patch myself if it hadn't been mentioned. :-)

Thank you.