public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Steven Whitehouse <swhiteho@redhat.com>
To: Sasha Levin <sasha.levin@oracle.com>
Cc: paulmck@linux.vnet.ibm.com, linux-kernel@vger.kernel.org,
	cluster-devel@redhat.com, Abhijith Das <adas@redhat.com>
Subject: Re: [PATCH 17/24] GFS2: Use RCU/hlist_bl based hash for quotas
Date: Wed, 22 Jan 2014 09:43:09 +0000	[thread overview]
Message-ID: <1390383789.2742.13.camel@menhir> (raw)
In-Reply-To: <52DF5FE9.9020508@oracle.com>

Hi,

On Wed, 2014-01-22 at 01:06 -0500, Sasha Levin wrote:
> On 01/22/2014 12:32 AM, Paul E. McKenney wrote:
> > On Mon, Jan 20, 2014 at 12:23:40PM +0000, Steven Whitehouse wrote:
> >> >Prior to this patch, GFS2 kept all the quotas for each
> >> >super block in a single linked list. This is rather slow
> >> >when there are large numbers of quotas.
> >> >
> >> >This patch introduces a hlist_bl based hash table, similar
> >> >to the one used for glocks. The initial look up of the quota
> >> >is now lockless in the case where it is already cached,
> >> >although we still have to take the per quota spinlock in
> >> >order to bump the ref count. Either way though, this is a
> >> >big improvement on what was there before.
> >> >
> >> >The qd_lock and the per super block list is preserved, for
> >> >the time being. However it is intended that since this is no
> >> >longer used for its original role, it should be possible to
> >> >shrink the number of items on that list in due course and
> >> >remove the requirement to take qd_lock in qd_get.
> >> >
> >> >Signed-off-by: Steven Whitehouse<swhiteho@redhat.com>
> >> >Cc: Abhijith Das<adas@redhat.com>
> >> >Cc: Paul E. McKenney<paulmck@linux.vnet.ibm.com>
> > Interesting!  I thought that Sasha Levin had a hash table in the works,
> > but I don't see it, so CCing him.
> 
> Indeed, there is a hlist based hashtable at include/linux/hashtable.h for couple kernel
> versions now. However, there's no hlist_bl one.
> 
> If there is a plan on adding a hlist_bl hashtable for whatever reason, it should probably
> be done by expanding hashtable.h so that more places that use hlist_bl would benefit from it (yes,
> there are couple more places that do hlist_bl hashtable).
> 
> Also, do we really want to use hlist_bl here? It doesn't seem like it's being done to conserve on
> memory, and that's the only reason it should be used for. Doing a single spinlock per bucket is
> much more efficient than using the bit locking scheme that hlist_bl does.
> 
> 
> Thanks,
> Sasha

So this will probably make a bit more sense with a bit of history to
explain how we got here... The recent addition of the quota hash table
is modeled upon the one we are using for glocks at the moment. The glock
hash table at one stage, had one lock per bucket, and was then expanded
so that we had more buckets, but the same number of locks - to conserve
memory. It was then expanded again when we moved to hlist_bl a little
while ago. At each stage we gained performance so the changes seemed to
be a good thing.

Really the max number of glocks depends on the size of the memory, so
that we should really try to scale the hash table with increasing memory
size, however a simpler static table has worked reasonably well for now.
If we could use a tree (or forest) instead of a hash table that would be
even better, but still a bit of a pain to do with RCU I think, which is
why I've not gone that extra step yet.

Now the quota hash table has been modeled as a smaller version of the
glock hash table for the time being. The question is how large it should
be?... any more than a single hash list head is an improvement on the
previous code, and maybe the table should be larger than I've made it
currently. We'll see whether anybody runs into issues if they have large
number of quotas in due course.

So perhaps the memory saving is not the most important thing with the
quota hash table, but at least it matches in form the glock hash table
which has been proven over many releases as being effective. However, if
we can make use of some generic code that solves many of the problems
for us, then I can certainly look into that.

The changes in the quota code are not complete yet, and with a bit of
luck we'll have a further set of changes ready for the next merge window
too,

Steve.



  reply	other threads:[~2014-01-22  9:44 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-01-20 12:23 GFS2: Pre-pull patch posting (merge window) Steven Whitehouse
2014-01-20 12:23 ` [PATCH 01/24] GFS2: If requested is too large, use the largest extent in the rgrp Steven Whitehouse
2014-01-20 12:23 ` [PATCH 02/24] GFS2: Drop inadequate rgrps from the reservation tree Steven Whitehouse
2014-01-20 12:23 ` [PATCH 03/24] GFS2: Implement a "rgrp has no extents longer than X" scheme Steven Whitehouse
2014-01-20 12:23 ` [PATCH 04/24] GFS2: Clean up releasepage Steven Whitehouse
2014-01-20 12:23 ` [PATCH 05/24] GFS2: Remove gfs2_quota_change_host structure Steven Whitehouse
2014-01-20 12:23 ` [PATCH 06/24] GFS2: Remove test which is always true Steven Whitehouse
2014-01-20 12:23 ` [PATCH 07/24] GFS2: Use range based functions for rgrp sync/invalidation Steven Whitehouse
2014-01-20 12:23 ` [PATCH 08/24] GFS2: Use only a single address space for rgrps Steven Whitehouse
2014-01-20 12:23 ` [PATCH 09/24] GFS2: Add directory addition info structure Steven Whitehouse
2014-01-20 12:23 ` [PATCH 10/24] GFS2: Consolidate transaction blocks calculation for dir add Steven Whitehouse
2014-01-20 12:23 ` [PATCH 11/24] GFS2: Remember directory insert point Steven Whitehouse
2014-01-20 12:23 ` [PATCH 12/24] GFS2: Increase i_writecount during gfs2_setattr_chown Steven Whitehouse
2014-01-20 12:23 ` [PATCH 13/24] GFS2: For exhash conversion, only one block is needed Steven Whitehouse
2014-01-20 12:23 ` [PATCH 14/24] GFS2: Add hints to directory leaf blocks Steven Whitehouse
2014-01-20 12:23 ` [PATCH 15/24] GFS2: Add initialization for address space in super block Steven Whitehouse
2014-01-20 12:23 ` [PATCH 16/24] GFS2: No need to invalidate pages for a dio read Steven Whitehouse
2014-01-20 12:23 ` [PATCH 17/24] GFS2: Use RCU/hlist_bl based hash for quotas Steven Whitehouse
2014-01-22  5:32   ` Paul E. McKenney
2014-01-22  6:06     ` Sasha Levin
2014-01-22  9:43       ` Steven Whitehouse [this message]
2014-01-22  9:58     ` Steven Whitehouse
2014-01-20 12:23 ` [PATCH 18/24] GFS2: Only run logd and quota when mounted read/write Steven Whitehouse
2014-01-20 12:23 ` [PATCH 19/24] GFS2: Clean up quota slot allocation Steven Whitehouse
2014-01-20 12:23 ` [PATCH 20/24] GFS2: Move quota bitmap operations under their own lock Steven Whitehouse
2014-01-20 12:23 ` [PATCH 21/24] GFS2: Fix kbuild test robot reported warning Steven Whitehouse
2014-01-20 12:23 ` [PATCH 22/24] GFS2: Don't use ENOBUFS when ENOMEM is the correct error code Steven Whitehouse
2014-01-20 12:23 ` [PATCH 23/24] GFS2: Small cleanup Steven Whitehouse
2014-01-20 12:23 ` [PATCH 24/24] GFS2: revert "GFS2: d_splice_alias() can't return error" Steven Whitehouse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1390383789.2742.13.camel@menhir \
    --to=swhiteho@redhat.com \
    --cc=adas@redhat.com \
    --cc=cluster-devel@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=sasha.levin@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox