linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Liu Bo <liubo2009@cn.fujitsu.com>
To: dave@jikos.cz
Cc: linux-btrfs@vger.kernel.org, chris.mason@oracle.com, josef@redhat.com
Subject: Re: [RFC PATCH v2 1/3] Btrfs: add the Probabilistic Skiplist
Date: Wed, 11 Jan 2012 10:06:47 +0800	[thread overview]
Message-ID: <4F0CEEB7.2080809@cn.fujitsu.com> (raw)
In-Reply-To: <20120111003700.GT7322@twin.jikos.cz>

On 01/11/2012 08:37 AM, David Sterba wrote:
> Hi, a few thoughts and comments below:
> 
> On Tue, Jan 10, 2012 at 03:31:34PM +0800, Liu Bo wrote:
>> c) The level limit may need to be adjusted.
>>    I know it is a magic number, but now for simplicity we just keep it at 16,
>>    and then each skiplist is able to contain (2^32-1)/3 nodes at most.
> 
> (2^32-1)/3 = 1,431,655,765 that's a lot, I wonder what an average member
> count of a skiplist would be and whether eg. maxlevel = 12 is not enough
> (5,592,405 members).
> 


hmm, sorry, I found I've made a mistake here,
let me correct it here (changelog will also be updated later):

As I set the probability to 1/4, the members linked on N+1 level list will be 1/4 of
those linked on N level list.

And what's more, in skiplist a node can be linked on multi levels, eg. a node with N+1 level will
also be linked on N level list.

So before the node count reaches to 4^(maxlevel - 1), the skiplist can maintain O(lgn),
and after that, it will be no more O(lgn) although we can still insert nodes into the skiplist.

That's the difference.


> or you can set the maxlevel during skiplist creation, or predefine a
> "small" skiplist with compile-time-set level to whatever < 16.
>
> this can be tuned later of course.
> 

Yes, I do set the maxlevel to 16 at the creation of a skiplist.

Here 4^(16 - 1) is 2^30, I don't think this is enough for some severe workloads which build large
amount of fragments.

Maybe we should make the maxlevel self-update.


>> --- /dev/null
>> +++ b/fs/btrfs/skiplist.c
>> @@ -0,0 +1,98 @@
>> +inline int sl_fill_node(struct sl_node *node, int level, gfp_t mask)
> 
> I suggest to pick the full prefix "skiplist_" instead of just "sl_",
> it'll be IMHO more readable and googlable. (Out of curiosity I grepped
> for the sl_ prefix and it's used by drivers/net/slip/slip.c).
>

I did hesitate for a while between "skiplist_" and "sl_"...
and I just wanna make it be similar to "rb_".

Anyway, I'm ok with "skiplist_".

>> +{
>> +	struct sl_node **p;
>> +	struct sl_node **q;
>> +	int num;
>> +
>> +	BUG_ON(level > MAXLEVEL);
>> +
>> +	num = level + 1;
>> +	p = kmalloc(sizeof(*p) * num, mask);
>> +	BUG_ON(!p);
> 
> you can drop the BUG_ON
> 
>> +	if (!p)
>         ^^^^^^^
> 
>> +		return -ENOMEM;
>> +	q = kmalloc(sizeof(*q) * num, mask);
>> +	BUG_ON(!q);
>         ^^^^^^^^^^
> 

ok, just in case.

>> +	if (!q) {
>> +		kfree(p);
>> +		return -ENOMEM;
>> +	}
>> +
>> +	node->next = p;
>> +	node->prev = q;
>> +	node->level = level;
>> +	return 0;
>> +}
>> +
>> diff --git a/fs/btrfs/skiplist.h b/fs/btrfs/skiplist.h
>> new file mode 100644
>> index 0000000..3e414b5
>> --- /dev/null
>> +++ b/fs/btrfs/skiplist.h
>> +
>> +#define MAXLEVEL 16
>> +/* double p = 0.25; */
>> +
>> +struct sl_node {
>> +	struct sl_node **next;
>> +	struct sl_node **prev;
>> +	unsigned int level;
>> +	unsigned int head:1;
> 
> the bitfield will use another sizeof(int) bytes, but the level is at
> most 16, you can reduce it's size eg to unsigned short.
> on the other hand, the structure has to start at address aligned to
> sizeof(void*) and the bytes after 'head' up to next sizeof(void*)
> boundary will be left unusable anyway. then, 'head' could be a full int
> or bool so the compiler is not restricted and forced to keep state of
> the single bit. if access to these items is exptected to be frequent,
> the diffenence could be mesurable.
> 

I see.  Thanks a lot for your advice!

thanks,
liubo

>> +};
> 
> 
> david
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


  reply	other threads:[~2012-01-11  2:06 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-01-10  7:31 [RFC PATCH v2 0/3] Btrfs: apply the Probabilistic Skiplist on btrfs Liu Bo
2012-01-10  7:31 ` [RFC PATCH v2 1/3] Btrfs: add the Probabilistic Skiplist Liu Bo
2012-01-11  0:37   ` David Sterba
2012-01-11  2:06     ` Liu Bo [this message]
2012-01-11 14:41       ` David Sterba
2012-01-11  1:36   ` David Sterba
2012-01-10  7:31 ` [RFC PATCH v2 2/3] Btrfs: rebuild extent_map based on skiplist Liu Bo
2012-01-10  7:31 ` [RFC PATCH v2 3/3] Btrfs: convert rwlock to RCU for extent_map Liu Bo
2012-01-12 21:28 ` [RFC PATCH v2 0/3] Btrfs: apply the Probabilistic Skiplist on btrfs Andi Kleen
2012-01-13  2:18   ` Liu Bo
2012-01-13  4:10     ` Dave Chinner
2012-01-13  9:01       ` Andi Kleen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4F0CEEB7.2080809@cn.fujitsu.com \
    --to=liubo2009@cn.fujitsu.com \
    --cc=chris.mason@oracle.com \
    --cc=dave@jikos.cz \
    --cc=josef@redhat.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).