From: Liu Bo <liubo2009@cn.fujitsu.com>
To: dave@jikos.cz
Cc: linux-btrfs@vger.kernel.org, chris.mason@oracle.com, josef@redhat.com
Subject: Re: [RFC PATCH v2 1/3] Btrfs: add the Probabilistic Skiplist
Date: Wed, 11 Jan 2012 10:06:47 +0800 [thread overview]
Message-ID: <4F0CEEB7.2080809@cn.fujitsu.com> (raw)
In-Reply-To: <20120111003700.GT7322@twin.jikos.cz>
On 01/11/2012 08:37 AM, David Sterba wrote:
> Hi, a few thoughts and comments below:
>
> On Tue, Jan 10, 2012 at 03:31:34PM +0800, Liu Bo wrote:
>> c) The level limit may need to be adjusted.
>> I know it is a magic number, but now for simplicity we just keep it at 16,
>> and then each skiplist is able to contain (2^32-1)/3 nodes at most.
>
> (2^32-1)/3 = 1,431,655,765 that's a lot, I wonder what an average member
> count of a skiplist would be and whether eg. maxlevel = 12 is not enough
> (5,592,405 members).
>
hmm, sorry, I found I've made a mistake here,
let me correct it here (changelog will also be updated later):
As I set the probability to 1/4, the members linked on N+1 level list will be 1/4 of
those linked on N level list.
And what's more, in skiplist a node can be linked on multi levels, eg. a node with N+1 level will
also be linked on N level list.
So before the node count reaches to 4^(maxlevel - 1), the skiplist can maintain O(lgn),
and after that, it will be no more O(lgn) although we can still insert nodes into the skiplist.
That's the difference.
> or you can set the maxlevel during skiplist creation, or predefine a
> "small" skiplist with compile-time-set level to whatever < 16.
>
> this can be tuned later of course.
>
Yes, I do set the maxlevel to 16 at the creation of a skiplist.
Here 4^(16 - 1) is 2^30, I don't think this is enough for some severe workloads which build large
amount of fragments.
Maybe we should make the maxlevel self-update.
>> --- /dev/null
>> +++ b/fs/btrfs/skiplist.c
>> @@ -0,0 +1,98 @@
>> +inline int sl_fill_node(struct sl_node *node, int level, gfp_t mask)
>
> I suggest to pick the full prefix "skiplist_" instead of just "sl_",
> it'll be IMHO more readable and googlable. (Out of curiosity I grepped
> for the sl_ prefix and it's used by drivers/net/slip/slip.c).
>
I did hesitate for a while between "skiplist_" and "sl_"...
and I just wanna make it be similar to "rb_".
Anyway, I'm ok with "skiplist_".
>> +{
>> + struct sl_node **p;
>> + struct sl_node **q;
>> + int num;
>> +
>> + BUG_ON(level > MAXLEVEL);
>> +
>> + num = level + 1;
>> + p = kmalloc(sizeof(*p) * num, mask);
>> + BUG_ON(!p);
>
> you can drop the BUG_ON
>
>> + if (!p)
> ^^^^^^^
>
>> + return -ENOMEM;
>> + q = kmalloc(sizeof(*q) * num, mask);
>> + BUG_ON(!q);
> ^^^^^^^^^^
>
ok, just in case.
>> + if (!q) {
>> + kfree(p);
>> + return -ENOMEM;
>> + }
>> +
>> + node->next = p;
>> + node->prev = q;
>> + node->level = level;
>> + return 0;
>> +}
>> +
>> diff --git a/fs/btrfs/skiplist.h b/fs/btrfs/skiplist.h
>> new file mode 100644
>> index 0000000..3e414b5
>> --- /dev/null
>> +++ b/fs/btrfs/skiplist.h
>> +
>> +#define MAXLEVEL 16
>> +/* double p = 0.25; */
>> +
>> +struct sl_node {
>> + struct sl_node **next;
>> + struct sl_node **prev;
>> + unsigned int level;
>> + unsigned int head:1;
>
> the bitfield will use another sizeof(int) bytes, but the level is at
> most 16, you can reduce it's size eg to unsigned short.
> on the other hand, the structure has to start at address aligned to
> sizeof(void*) and the bytes after 'head' up to next sizeof(void*)
> boundary will be left unusable anyway. then, 'head' could be a full int
> or bool so the compiler is not restricted and forced to keep state of
> the single bit. if access to these items is exptected to be frequent,
> the diffenence could be mesurable.
>
I see. Thanks a lot for your advice!
thanks,
liubo
>> +};
>
>
> david
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
next prev parent reply other threads:[~2012-01-11 2:06 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-01-10 7:31 [RFC PATCH v2 0/3] Btrfs: apply the Probabilistic Skiplist on btrfs Liu Bo
2012-01-10 7:31 ` [RFC PATCH v2 1/3] Btrfs: add the Probabilistic Skiplist Liu Bo
2012-01-11 0:37 ` David Sterba
2012-01-11 2:06 ` Liu Bo [this message]
2012-01-11 14:41 ` David Sterba
2012-01-11 1:36 ` David Sterba
2012-01-10 7:31 ` [RFC PATCH v2 2/3] Btrfs: rebuild extent_map based on skiplist Liu Bo
2012-01-10 7:31 ` [RFC PATCH v2 3/3] Btrfs: convert rwlock to RCU for extent_map Liu Bo
2012-01-12 21:28 ` [RFC PATCH v2 0/3] Btrfs: apply the Probabilistic Skiplist on btrfs Andi Kleen
2012-01-13 2:18 ` Liu Bo
2012-01-13 4:10 ` Dave Chinner
2012-01-13 9:01 ` Andi Kleen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4F0CEEB7.2080809@cn.fujitsu.com \
--to=liubo2009@cn.fujitsu.com \
--cc=chris.mason@oracle.com \
--cc=dave@jikos.cz \
--cc=josef@redhat.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).