From: David Teigland <teigland@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [Upstream patch] DLM: Convert rsb data from linked list to rb_tree
Date: Wed, 5 Oct 2011 16:05:43 -0400
Message-ID: <20111005200543.GD11895@redhat.com>
In-Reply-To: <eca5485c-6413-4cf4-a000-acd3b0f17c7a@zmail06.collab.prod.int.phx2.redhat.com>

On Wed, Oct 05, 2011 at 03:25:39PM -0400, Bob Peterson wrote:
> Hi,
>
> This upstream patch changes the way DLM keeps track of RSBs.
> Before, they were in a linked list off a hash table. Now,
> they're an rb_tree off the same hash table. This speeds up
> DLM lookups greatly.
>
> Today's DLM is faster than older DLMs for many file systems,
> (e.g. in RHEL5) due to the larger hash table size. However,
> this rb_tree implementation scales much better. For my
> 1000-directories-with-1000-files test, the patch doesn't
> show much of an improvement. But when I scale the file system
> to 4000 directories with 4000 files (16 million files), it
> helps greatly. The time to do rm -fR /mnt/gfs2/* drops from
> 42.01 hours to 23.68 hours.

How many hash table buckets were you using in that test?
If it was the default (1024), I'd be interested to know how
16k compares.

> With this patch I believe we could also reduce the size of
> the hash table again or eliminate it completely, but we can
> evaluate and do that later.
>
> NOTE: Today's upstream DLM code has special code to
> pre-allocate RSB structures for faster lookup. This patch
> eliminates that step, since it doesn't have a resource name
> at that time for inserting new entries in the rb_tree.

We need to keep that; why do you say there's no resource name?
pre_rsb_struct() and get_rsb_struct() are specially designed to work
as they do because:

> @@ -367,28 +336,16 @@ static int get_rsb_struct(struct dlm_ls *ls, char *name, int len,
>  			  struct dlm_rsb **r_ret)
> +	r = dlm_allocate_rsb(ls);
> +	if (!r)
> +		return -ENOMEM;

That allocation is not allowed here because it can sleep and a
spinlock is held:

> 	spin_lock(&ls->ls_rsbtbl[bucket].lock);
>
> 	error = _search_rsb(ls, name, namelen, bucket, flags, &r);
> @@ -508,10 +492,6 @@ static int find_rsb(struct dlm_ls *ls, char *name, int namelen,
> 		goto out_unlock;
>
> 	error = get_rsb_struct(ls, name, namelen, &r);
> -	if (error == -EAGAIN) {
> -		spin_unlock(&ls->ls_rsbtbl[bucket].lock);
> -		goto retry;
> -	}
> 	if (error)
> 		goto out_unlock;

If you try to fix the problem above by releasing the spinlock between the
search and the allocation, then you have to repeat the search. Eliminating
the repeated search is the main reason for pre_rsb_struct()/get_rsb_struct().
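
To make the pattern concrete, here is a minimal userspace sketch of
"preallocate before locking, take a spare under the lock, retry on -EAGAIN".
It is not the DLM code: struct table, table_init(), pre_alloc(), get_spare()
and find_or_create() are hypothetical names, pthread mutexes stand in for the
spinlocks, calloc() stands in for dlm_allocate_rsb(), and a plain list stands
in for the bucket's list or rb_tree.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define NAME_MAX_LEN 64

struct rsb {
	struct rsb *next;
	char name[NAME_MAX_LEN];
	int len;
};

struct table {
	pthread_mutex_t lock;		/* stand-in for the bucket spinlock */
	struct rsb *entries;		/* stand-in for the bucket list or rb_tree */
	pthread_mutex_t spare_lock;	/* protects the spare pool */
	struct rsb *spares;		/* structs allocated ahead of time */
};

static void table_init(struct table *t)
{
	pthread_mutex_init(&t->lock, NULL);
	pthread_mutex_init(&t->spare_lock, NULL);
	t->entries = NULL;
	t->spares = NULL;
}

/* May allocate (and, in a kernel, sleep): call with t->lock NOT held. */
static int pre_alloc(struct table *t)
{
	struct rsb *r;
	int have;

	pthread_mutex_lock(&t->spare_lock);
	have = (t->spares != NULL);
	pthread_mutex_unlock(&t->spare_lock);
	if (have)
		return 0;

	r = calloc(1, sizeof(*r));
	if (!r)
		return -ENOMEM;

	pthread_mutex_lock(&t->spare_lock);
	r->next = t->spares;
	t->spares = r;
	pthread_mutex_unlock(&t->spare_lock);
	return 0;
}

/* Never allocates, so it is safe to call with t->lock held. */
static int get_spare(struct table *t, struct rsb **r_ret)
{
	struct rsb *r;

	pthread_mutex_lock(&t->spare_lock);
	r = t->spares;
	if (r)
		t->spares = r->next;
	pthread_mutex_unlock(&t->spare_lock);
	if (!r)
		return -EAGAIN;	/* another thread took the spare */
	r->next = NULL;
	*r_ret = r;
	return 0;
}

static struct rsb *search(struct table *t, const char *name, int len)
{
	struct rsb *r;

	for (r = t->entries; r; r = r->next)
		if (r->len == len && !memcmp(r->name, name, len))
			return r;
	return NULL;
}

static int find_or_create(struct table *t, const char *name, int len,
			  struct rsb **r_ret)
{
	struct rsb *r;
	int error;

	assert(t && name && r_ret);
	if (len <= 0 || len > NAME_MAX_LEN)
		return -EINVAL;
retry:
	error = pre_alloc(t);		/* allocation happens before locking */
	if (error)
		return error;

	pthread_mutex_lock(&t->lock);
	r = search(t, name, len);	/* one search per pass under the lock */
	if (!r) {
		error = get_spare(t, &r);
		if (error == -EAGAIN) {	/* rare: refill the pool and retry */
			pthread_mutex_unlock(&t->lock);
			goto retry;
		}
		memcpy(r->name, name, len);
		r->len = len;
		r->next = t->entries;
		t->entries = r;
	}
	pthread_mutex_unlock(&t->lock);
	*r_ret = r;
	return 0;
}
```

In the common case the lock is taken once and the search runs once; the
search is repeated only when the spare pool turns out to be empty after
the lock is taken.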