From: linux@horizon.com
To: akepner@sgi.com, linux@horizon.com
Cc: davem@davemloft.net, netdev@vger.kernel.org
Subject: Re: [RFC/TOY]Extensible hashing and RCU
Date: 6 Feb 2007 14:09:11 -0500 [thread overview]
Message-ID: <20070206190911.23405.qmail@science.horizon.com> (raw)
In-Reply-To: <Pine.LNX.4.61.0702051030550.26852@localhost.localdomain>
> For purposes of discussion, I've attached a "toy" implementation
> for doing dynamic resizing of a hashtable. It is useless, except
> as a proof of concept.
>
> I think this is very similar to what you are describing, no?
Er... not quite; that has a lot more overhead than what I was
thinking about.
I have used the trick of distinguishable sentinel values in a
doubly-linked list to maintain read cursors while it's being updated,
but I don't think that's necessary here.
(You can also encode the nh_type flag in the lsbit of the pointer
if you're being sneaky. That will attract curses from the
memleak detector, though.)
In particular, I was imagining a singly linked list. To delete
an item, use the hash to find the head pointer and walk it
to find the pointer to be fixed up. Since the chains are short
anyway, this is entirely reasonable.
Less fundamental comments include:
1) Is the seqlock in get_nh() and nh_replace() really required?
Is there any architecture that doesn't have atomic pointer stores?
If you wanted to store the table size in a fixed location as
well, I could see the need...
2) I think the whole __nh_sort_chain business will royally confuse
anyone walking the chain while it happens. This is exactly
what I was working to avoid. The partial sorting in __nh_insert
isn't good enough.
Instead, try:
/* Return true if bitrev(x) > bitrev(y) */
static bool bitrev_gt(unsigned long x, unsinged long y)
{
/* Identify the bits that differ between x and y */
unsigned long mask = x ^ y; /* Find the bits that differ */
mask ^= mask-1; /* Find lsbit of difference (and all lower bits) */
return (x & mask) > (y & mask);
}
static void __nh_insert(struct nh_entry *entry, struct nh_head *head)
{
struct list_head *p, *n;
unsigned long const hashval = nh_hashval(entry->data);
/*
* Insert the new entry just before the first element of the list
* that its hash value is not greater than (bit-reversed).
*/
p = &head->list;
list_for_each_rcu(n, &head->list) {
struct nh_entry *t = container_of(n, struct nh_entry, nh_list);
if (t->nh_type == NH_ENTRY &&
!bitrev_gt(hashval, nh_hashval(t->data)))
break;
p = n;
}
__list_add_rcu(&entry->nh_list, p, n);
}
static int nh_add(unsigned long data)
{
struct nh_entry *entry = kmalloc(sizeof *entry, GFP_KERNEL);
struct nh *nh;
if (!entry) return -ENOMEM;
entry->nh_type = NH_ENTRY;
entry->data = data;
rcu_read_lock();
nh = get_nh(); /* or nh = __nh */
if (nh) {
struct nh_head *h = \
&nh->hash[ nh_bucket(nh_hashval(data), nh->nentries) ];
spin_lock(&h->lock);
__nh_insert(entry, h);
spin_unlock(&h->lock);
}
rcu_read_unlock();
}
Then there's no need for __nh_sort_chain at all. Alternatively, if the
upper bits of nh_hashval are as good as the lower bits, just index the
hash table on them.
3) Code inside a mutex like nh_resize() can use plain list_for_each();
the _rcu variant is only required if there can be simultaneous
mutation.
That's a nice module framework. I'll see if I can write some code
of the sort I was thinking about.
FWIW, I figured out a way around the need to delay deletion for two
RCU intervals.
Suppose that an expansion is pending, and we have just stretched the
table from 16 entries to 32, and the following hash values are
stored. (Note the bit-reversed order.)
old[3] --\
new[3] ---+-> 0x03 -> 0x43 -> 0x23 -> 0x63 -> 0x13 -> 0x53 -> 0x33 -> 0x73
/
new[3+16]-----------/
After an RCU period, you can throw away the old[] array and NUL-terminate
the new[i] list after 0x63. But until then, you need to leave the list alone
to accomodate people who are looking for 0x53 via the old head.
The tricky case comes when you delete 0x13. If you only delete it from the
new[3+16] list, you can't discard it until the RCU quiescent period
after the one which dicsonnects the 0x63->0x13 link.
The solution is actually very simple: notice when you're
- Deleting the first entry in a list
- While an expension is pending
- And the list is in the second half of the expanded table
then unlink the entry from BOTH the new head and the old list.
It's a bit more work, and requires some lock-ordering care, but
it lets you queue the node for RCU cleanup the normal way.
prev parent reply other threads:[~2007-02-06 19:09 UTC|newest]
Thread overview: 102+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-02-04 7:41 Extensible hashing and RCU linux
2007-02-05 18:02 ` akepner
2007-02-17 13:13 ` Evgeniy Polyakov
2007-02-18 18:46 ` Eric Dumazet
2007-02-18 19:10 ` Evgeniy Polyakov
2007-02-18 20:21 ` Eric Dumazet
2007-02-18 21:23 ` Michael K. Edwards
2007-02-18 22:04 ` Michael K. Edwards
2007-02-19 12:04 ` Andi Kleen
2007-02-19 19:18 ` Michael K. Edwards
2007-02-19 11:41 ` Evgeniy Polyakov
2007-02-19 13:38 ` Eric Dumazet
2007-02-19 13:56 ` Evgeniy Polyakov
2007-02-19 14:14 ` Eric Dumazet
2007-02-19 14:25 ` Evgeniy Polyakov
2007-02-19 15:14 ` Eric Dumazet
2007-02-19 18:13 ` Eric Dumazet
2007-02-19 18:26 ` Benjamin LaHaise
2007-02-19 18:38 ` Benjamin LaHaise
2007-02-20 9:25 ` Evgeniy Polyakov
2007-02-20 9:57 ` David Miller
2007-02-20 10:22 ` Evgeniy Polyakov
2007-02-20 10:04 ` Eric Dumazet
2007-02-20 10:12 ` David Miller
2007-02-20 10:30 ` Evgeniy Polyakov
2007-02-20 11:10 ` Eric Dumazet
2007-02-20 11:23 ` Evgeniy Polyakov
2007-02-20 11:30 ` Eric Dumazet
2007-02-20 11:41 ` Evgeniy Polyakov
2007-02-20 10:49 ` Eric Dumazet
2007-02-20 15:07 ` Michael K. Edwards
2007-02-20 15:11 ` Evgeniy Polyakov
2007-02-20 15:49 ` Michael K. Edwards
2007-02-20 15:59 ` Evgeniy Polyakov
2007-02-20 16:08 ` Eric Dumazet
2007-02-20 16:20 ` Evgeniy Polyakov
2007-02-20 16:38 ` Eric Dumazet
2007-02-20 16:59 ` Evgeniy Polyakov
2007-02-20 17:05 ` Evgeniy Polyakov
2007-02-20 17:53 ` Eric Dumazet
2007-02-20 18:00 ` Evgeniy Polyakov
2007-02-20 18:55 ` Eric Dumazet
2007-02-20 19:06 ` Evgeniy Polyakov
2007-02-20 19:17 ` Eric Dumazet
2007-02-20 19:36 ` Evgeniy Polyakov
2007-02-20 19:44 ` Michael K. Edwards
2007-02-20 17:20 ` Eric Dumazet
2007-02-20 17:55 ` Evgeniy Polyakov
2007-02-20 18:12 ` Evgeniy Polyakov
2007-02-20 19:13 ` Michael K. Edwards
2007-02-20 19:44 ` Evgeniy Polyakov
2007-02-20 20:03 ` Michael K. Edwards
2007-02-20 20:09 ` Michael K. Edwards
2007-02-21 8:56 ` Evgeniy Polyakov
2007-02-21 9:34 ` David Miller
2007-02-21 9:51 ` Evgeniy Polyakov
2007-02-21 10:03 ` David Miller
2007-02-21 8:54 ` Evgeniy Polyakov
2007-02-21 9:15 ` Eric Dumazet
2007-02-21 9:27 ` Evgeniy Polyakov
2007-02-21 9:38 ` Eric Dumazet
2007-02-21 9:57 ` Evgeniy Polyakov
2007-02-21 21:15 ` Michael K. Edwards
2007-02-22 9:06 ` David Miller
2007-02-22 11:00 ` Michael K. Edwards
2007-02-22 11:07 ` David Miller
2007-02-22 19:24 ` Stephen Hemminger
2007-02-20 16:04 ` Eric Dumazet
2007-02-22 23:49 ` linux
2007-02-23 2:31 ` Michael K. Edwards
2007-02-20 10:44 ` Evgeniy Polyakov
2007-02-20 11:09 ` Eric Dumazet
2007-02-20 11:29 ` Evgeniy Polyakov
2007-02-20 11:34 ` Eric Dumazet
2007-02-20 11:45 ` Evgeniy Polyakov
2007-02-21 12:41 ` Andi Kleen
2007-02-21 13:19 ` Eric Dumazet
2007-02-21 13:37 ` David Miller
2007-02-21 23:13 ` Robert Olsson
2007-02-22 6:06 ` Eric Dumazet
2007-02-22 11:41 ` Andi Kleen
2007-02-22 11:44 ` David Miller
2007-02-20 12:11 ` Evgeniy Polyakov
2007-02-19 22:10 ` Andi Kleen
2007-02-19 12:02 ` Andi Kleen
2007-02-19 12:35 ` Robert Olsson
2007-02-19 14:04 ` Evgeniy Polyakov
2007-03-02 8:52 ` Evgeniy Polyakov
2007-03-02 9:56 ` Eric Dumazet
2007-03-02 10:28 ` Evgeniy Polyakov
2007-03-02 20:45 ` Michael K. Edwards
2007-03-03 10:46 ` Evgeniy Polyakov
2007-03-04 10:02 ` Michael K. Edwards
2007-03-04 20:36 ` David Miller
2007-03-05 7:12 ` Michael K. Edwards
2007-03-05 10:02 ` Robert Olsson
2007-03-05 10:00 ` Evgeniy Polyakov
2007-03-13 9:32 ` Evgeniy Polyakov
2007-03-13 10:08 ` Eric Dumazet
2007-03-13 10:24 ` Evgeniy Polyakov
2007-02-05 18:41 ` [RFC/TOY]Extensible " akepner
2007-02-06 19:09 ` linux [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20070206190911.23405.qmail@science.horizon.com \
--to=linux@horizon.com \
--cc=akepner@sgi.com \
--cc=davem@davemloft.net \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).