From: "Paul E. McKenney" <paulmck@linux.ibm.com>
To: Akira Yokosawa <akiyks@gmail.com>
Cc: Junchang Wang <junchangwang@gmail.com>, perfbook@vger.kernel.org
Subject: Re: Question regarding hash_resize
Date: Tue, 8 Jan 2019 07:32:26 -0800 [thread overview]
Message-ID: <20190108153225.GO1215@linux.ibm.com> (raw)
In-Reply-To: <24b85a58-a2fe-3ce4-e58c-bf67488ae6b8@gmail.com>
On Wed, Jan 09, 2019 at 12:18:14AM +0900, Akira Yokosawa wrote:
> On 2019/01/07 15:48:50 -0800, Paul E. McKenney wrote:
> > On Tue, Jan 08, 2019 at 08:06:51AM +0900, Akira Yokosawa wrote:
> >> On 2019/01/08 07:54:16 +0900, Akira Yokosawa wrote:
> >>> Hi Paul,
> >>>
> >>> On 2019/01/07 10:33:17 -0800, Paul E. McKenney wrote:
> >>>> On Mon, Jan 07, 2019 at 09:49:19PM +0800, Junchang Wang wrote:
> >>>>> Hi all,
> >>>>>
> >>>>> I'm reading hash_resize recently, and have a few questions regarding
> >>>>> this algorithm. Please take a look if you have time. Any suggestions
> >>>>> are warmly welcomed.
> >>>>>
> >>>>> === Question 1 ===
> >>>>> In hash_resize.c : hashtab_lock_mod
> >>>>> 186 if (b > READ_ONCE(htp->ht_resize_cur)) {
> >>>>> 187 lsp->hbp[1] = NULL;
> >>>>> 188 return;
> >>>>> 189 }
> >>>>> 190 htp = rcu_dereference(htp->ht_new);
> >>>>>
> >>>>> It seems we are missing a barrier (e.g., smp_mb) in between lines 189
> >>>>> and 190, because neither READ_ONCE() nor rcu_dereference() can prevent
> >>>>> compilers and hardware from reordering the two unrelated variables,
> >>>>> ht_resize_cur and ht_new. Is my understanding correct?
> >>>>
> >>>> Ah, but hashtab_lock_mod() is invoked within an RCU read-side critical
> >>>> section
> >>>
> >>> You mean "rcu_read_lock() at the beginning of hashtab_lock_mod() starts
> >>> an RCU read-side critical section", don't you?
> >>>
> >>>> and there is a synchronize_rcu() between the update to ->ht_new
> >>>> and the updates to ->ht_resize_cur. For more details on how this works,
> >>>> please see https://lwn.net/Articles/573497/.
> >>>>
> >>>> Of course, if you find a code path in which a call to hashtab_lock_mod()
> >>>> is invoked outside of an RCU read-side critical section, that would be
> >>>> a bug. (Can you tell me an exception to this rule, that is, a case
> >>>> where hashtab_lock_mod() could safely be invoked outside of an RCU
> >>>> read-side critical section?)
> >>>>
> >>>>> === Question 2 ===
> >>>>> In hash_resize.c, each time an updater wants to access a bucket, the
> >>>>> updater must first acquire the bucket's lock (htb_lock), preventing
> >>>>> other updaters accessing the same bucket concurrently. This approach
> >>>>> is OK if the linked list of a bucket is relatively short, but for a
> >>>>> larger system where linked lists are long enough and the
> >>>>> perftest_resize thread is running simultaneously, it could become a
> >>>>> potential performance bottleneck. One naive solution is to allow
> >>>>> multiple updaters to access the same bucket, only if they don't
> >>>>> operate on the same item of the list of this bucket. I wonder if there
> >>>>> are any existing works or discussions on this topic?
> >>>>
> >>>> One approach is to use a hashed array of locks, and to hash a given
> >>>> element's address to locate the lock to be used. Please see
> >>>> Section 7.1.1.5 ("Conditional Locking") and Section 7.1.1.6 ("Acquire
> >>>> Needed Locks First"), including Quick Quiz 7.9, for additional details.
> >>>>
> >>>> Another approach is to use RCU to protect traversals, and locks within the
> >>>> linked-list elements themselves. These locks are conditionally acquired
> >>>> (again, please see Section 7.1.1.5), and deadlock is avoided by acquiring
> >>>> them in list order, and the tricks in Quick Quiz 7.9.
> >>>>
> >>>> Non-blocking synchronization can also be used, but it is often quite a
> >>>> bit more complicated. See for example the split-order list of Shalev
> >>>> and Shavit, along with Desnoyers's RCU-protected extension in the
> >>>> userspace RCU library.
> >>>>
> >>>> But it is usually -way- better to just choose a good hash function and
> >>>> to increase the number of buckets. Which is of course one reason for
> >>>> having resizable hash tables. ;-)
> >>>>
> >>>> But the other techniques can be useful in more complex linked data
> >>>> structures, such as graphs, where there is no reasonable way to
> >>>> partition the data. Nevertheless, many people choose to do the
> >>>> partitioning anyway, especially on distributed systems.
> >>>>
> >>>>> === Question 3 ===
> >>>>> Chapter Data Structures also discusses other resizable hash tables,
> >>>>> namely "Resizable, scalable, concurrent hash tables via relativistic
> >>>>> programming" from Josh Triplett, which can save memory footprint by
> >>>>> using a single pair of pointers. But my understanding is that
> >>>>> perftest_resize.c is unique in that it allows you to rebuild the hash
> >>>>> table by utilizing a different hash function, which could be very
> >>>>> useful in practice (e.g., to prevent DDoS attack). Other solutions do
> >>>>> not share this property. Is my understanding correct? Did I miss any
> >>>>> discussions on this topic in perfbook?
> >>>>
> >>>> Indeed, to the best of my knowledge, Herbert Xu's pointer-pair approach
> >>>> (which I use in hash_resize.c) is the only one allowing arbitrary changes
> >>>> to hash functions. I expect that this advantage will become increasingly
> >>>> important as security issues become more challenging. Furthermore, I
> >>>> suspect that the pointer-pair approach is faster and more scalable.
> >>>> It is certainly simpler.
> >>>>
> >>>> On the other hand, one advantage of the other two approaches is decreased
> >>>> memory consumption.
> >>>>
> >>>> Another advantage of Josh Triplett's pointer-unzip approach is that
> >>>> concurrent updates are (in theory, anyway) not blocked for as long
> >>>> by resize operations. The other edge of this sword is that resizing
> >>>> is much slower, given the need to wait for many RCU grace periods.
> >>>>
> >>>> Another advantage of Mathieu Desnoyers's RCUified variant of Shalev
> >>>> and Shavit's split-order list is that all operations are non-blocking,
> >>>> which can be important on massively overloaded systems, such as one
> >>>> might find in cloud computing.
> >>>>
> >>>>> === Question 4 ===
> >>>>> In the current implementation of hash_resize.c, the perftest_resize
> >>>>> could block an updater, and vice versa. It seems this is not what we
> >>>>> expected. Ideally, they should be allowed to run concurrently, or at
> >>>>> least the perftest_resize thread should have lower priority and
> >>>>> updaters should never be blocked by the perftest_resize thread. Is
> >>>>> that right? I'm very interested in helping improve. Please let me know
> >>>>> if you have any suggestions.
> >>>>
> >>>> In hash_resize.c, an updater is blocked only for the time required to
> >>>> redisposition a bucket. This is a great improvement over blocking
> >>>> updaters for the full resize over all buckets.
> >>>>
> >>>> But yes, it is not hard to do better, for example, periodically dropping
> >>>> the old-table lock in hashtab_resize(). This requires a few careful
> >>>> adjustments, of course. Can you tell me what these adjustments are?
> >>>>
> >>>> Hmmm... I could simplify hashtab_lookup(), couldn't I? After all,
> >>>> optimizing for the race with hashtab_resize() doesn't make a whole lot
> >>>> of sense. Please see the patch below. Thoughts?
> >>>>
> >>>> Thanx, Paul
> >>>>
> >>>> ------------------------------------------------------------------------
> >>>>
> >>>> commit 737646a9c868d841b32199b52f5569668975953e
> >>>> Author: Paul E. McKenney <paulmck@linux.ibm.com>
> >>>> Date: Mon Jan 7 10:29:14 2019 -0800
> >>>>
> >>>> datastruct/hash: Simplify hashtab_lookup()
> >>>>
> >>>> Because resizing leaves the old hash table intact, and because lookups
> >>>> are carried out within RCU read-side critical sections (which prevent
> >>>> a second resizing operation from starting), there is no need for a
> >>>> lookup to search anywhere but in the old hash table. And in the common
> >>>> case, there is no resize, so there is no new hash table. Therefore,
> >>>> eliminating the check for resizing speeds things up in the common
> >>>> case. In addition, this simplifies the code.
> >>>>
> >>>> This commit therefore eliminates the ht_get_bucket() function,
> >>>> renames the ht_get_bucket_single() function to ht_get_bucket(),
> >>>> and modifies callers appropriately.
> >>>>
> >>>> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> >>>>
> >>>> diff --git a/CodeSamples/datastruct/hash/hash_resize.c b/CodeSamples/datastruct/hash/hash_resize.c
> >>>> index 29e05f907200..be4157959b83 100644
> >>>> --- a/CodeSamples/datastruct/hash/hash_resize.c
> >>>> +++ b/CodeSamples/datastruct/hash/hash_resize.c
> >>>> @@ -124,8 +124,7 @@ void hashtab_free(struct hashtab *htp_master)
> >>>> //\begin{snippet}[labelbase=ln:datastruct:hash_resize:get_bucket,commandchars=\\\@\$]
> >>>> /* Get hash bucket corresponding to key, ignoring the possibility of resize. */
> >>>> static struct ht_bucket * //\lnlbl{single:b}
> >>>> -ht_get_bucket_single(struct ht *htp, void *key, long *b,
> >>>> - unsigned long *h)
> >>>> +ht_get_bucket(struct ht *htp, void *key, long *b, unsigned long *h)
> >>>> {
> >>>> unsigned long hash = htp->ht_gethash(key);
> >>>>
> >>>> @@ -134,24 +133,6 @@ ht_get_bucket_single(struct ht *htp, void *key, long *b,
> >>>> *h = hash; //\lnlbl{single:h}
> >>>> return &htp->ht_bkt[*b]; //\lnlbl{single:return}
> >>>> } //\lnlbl{single:e}
> >>>> -
> >>>> -/* Get hash bucket correesponding to key, accounting for resize. */
> >>>> -static struct ht_bucket * //\lnlbl{b}
> >>>> -ht_get_bucket(struct ht **htp, void *key, long *b, int *i)
> >>>> -{
> >>>> - struct ht_bucket *htbp;
> >>>> -
> >>>> - htbp = ht_get_bucket_single(*htp, key, b, NULL); //\lnlbl{call_single}
> >>>> - //\fcvexclude
> >>>> - if (*b <= READ_ONCE((*htp)->ht_resize_cur)) { //\lnlbl{resized}
> >>>> - smp_mb(); /* order ->ht_resize_cur before ->ht_new. */
> >>>
> >>> If we can remove this memory barrier, the counterpart smp_mb() in
> >>> hashtab_resize() becomes unnecessary, doesn't it?
> >>
> >> And the WRITE_ONCE() in the following line.
> >
> > Actually, that must stay. It is true that the bucket lock is held by
> > hashtab_unlock_mod(), and that this prevents concurrent resizing of that
> > bucket, but other buckets might well be resized, which results in the
> > possibiliity of concurrent reads and writes for ->ht_resize_cur.
> >
> > Anyway, here is the resulting commit. Thoughts?
>
> Looks good to me. (Give or take a few typo in commit log.)
> I missed the remaining READ_ONCE(), which didn't appear in the diff.
>
> And if you choose this way of simplifying hashtab_lookup(),
> the Quick Quiz on smp_mb()s would become out of context.
Good catch, queued its removal with your Reported-by.
Thanx, Paul
> Thanks, Akira
>
> >
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > commit 1aac0c703482c90c2ce4092b2cc604d474f5a44b
> > Author: Paul E. McKenney <paulmck@linux.ibm.com>
> > Date: Mon Jan 7 15:39:40 2019 -0800
> >
> > datastruct/hash: Remove extraneous barrier from hashtab_resize()
> >
> > Now that hashtab_lookup() is iresizing-agnostic, all non-initialization
> > accesses to ->ht_resize-cur are protected by locking in the restricted
> > sense that any change to ->ht_resize_cur that would change the value
> > of the "if" condition cannot happen while the lock is held on the old
> > bucket. This means that the memory barrier may be removed. However,
> > the READ_ONCE() and WRITE_ONCE() markings on non-initialization accesses
> > to ->ht_resize_cur must remain because reads from ->ht_resize_cur really
> > can race with writes, just not is a way to change the "if" conditions.
> >
> > Reported-by: Akira Yokosawa <akiyks@gmail.com>
> > Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> >
> > diff --git a/CodeSamples/datastruct/hash/hash_resize.c b/CodeSamples/datastruct/hash/hash_resize.c
> > index be4157959b83..9f68a00dabe3 100644
> > --- a/CodeSamples/datastruct/hash/hash_resize.c
> > +++ b/CodeSamples/datastruct/hash/hash_resize.c
> > @@ -288,7 +288,6 @@ int hashtab_resize(struct hashtab *htp_master,
> > cds_list_add_rcu(&htep->hte_next[!idx], &htbp_new->htb_head);
> > spin_unlock(&htbp_new->htb_lock);
> > } //\lnlbl{loop_list:e}
> > - smp_mb(); /* Fill new buckets before claiming them. */
> > WRITE_ONCE(htp->ht_resize_cur, i); //\lnlbl{update_resize}
> > spin_unlock(&htbp->htb_lock); //\lnlbl{rel_oldcur}
> > } //\lnlbl{loop:e}
> > diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
> > index 0152437c274e..e2159330790f 100644
> > --- a/datastruct/datastruct.tex
> > +++ b/datastruct/datastruct.tex
> > @@ -1245,10 +1245,7 @@ lines~\lnref{loop_list:b}-\lnref{loop_list:e} adds one data element
> > from the current old-table bucket to the corresponding new-table bucket,
> > holding the new-table bucket's lock during the add operation.
> > Line~\lnref{update_resize} updates
> > -\co{->ht_resize_cur} to indicate that this bucket has been distributed,
> > -and is preceded by a full memory barrier that pairs with the one in
> > -\co{ht_get_bucket()} shown in
> > -Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection}.
> > +\co{->ht_resize_cur} to indicate that this bucket has been distributed.
> > Finally, line~\lnref{rel_oldcur} releases the old-table bucket lock.
> >
> > \QuickQuiz{}
> >
>
next prev parent reply other threads:[~2019-01-08 15:32 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-01-07 13:49 Question regarding hash_resize Junchang Wang
2019-01-07 18:33 ` Paul E. McKenney
2019-01-07 22:54 ` Akira Yokosawa
2019-01-07 23:06 ` Akira Yokosawa
2019-01-07 23:48 ` Paul E. McKenney
2019-01-08 15:18 ` Akira Yokosawa
2019-01-08 15:32 ` Paul E. McKenney [this message]
2019-01-08 1:56 ` Junchang Wang
2019-01-08 15:28 ` Paul E. McKenney
2019-01-08 15:35 ` Akira Yokosawa
2019-01-08 18:39 ` Paul E. McKenney
2019-01-08 22:16 ` Akira Yokosawa
2019-01-09 0:19 ` Paul E. McKenney
2019-01-09 2:59 ` Paul E. McKenney
2019-01-11 4:08 ` Paul E. McKenney
2019-01-11 14:25 ` Akira Yokosawa
2019-01-11 15:43 ` Paul E. McKenney
2019-01-11 22:56 ` Akira Yokosawa
2019-01-11 23:28 ` Paul E. McKenney
2019-01-07 23:13 ` Paul E. McKenney
2019-01-07 23:33 ` Paul E. McKenney
2019-01-18 14:32 ` Junchang Wang
2019-01-18 18:34 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190108153225.GO1215@linux.ibm.com \
--to=paulmck@linux.ibm.com \
--cc=akiyks@gmail.com \
--cc=junchangwang@gmail.com \
--cc=perfbook@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox