From: Bob Peterson <rpeterso@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [GFS2 PATCH] GFS2: Don't brelse rgrp buffer_heads every allocation
Date: Fri, 12 Jun 2015 15:50:34 -0400 (EDT)
Message-ID: <2055885404.16127476.1434138634146.JavaMail.zimbra@redhat.com>
In-Reply-To: <557811B0.2050406@redhat.com>
----- Original Message -----
> Hi,
>
>
> On 09/06/15 15:45, Bob Peterson wrote:
> > ----- Original Message -----
> >> Hi,
> >>
> >>
> >> On 05/06/15 15:49, Bob Peterson wrote:
> >>> Hi,
> >>>
> >>> This patch allows the block allocation code to retain the buffers
> >>> for the resource groups so they don't need to be re-read from buffer
> >>> cache with every request. This is a performance improvement that's
> >>> especially noticeable when resource groups are very large. For
> >>> example, with 2GB resource groups and 4K blocks, there can be 33
> >>> blocks for every resource group. This patch allows those 33 buffers
> >>> to be kept around and not read in and thrown away with every
> >>> operation. The buffers are released when the resource group is
> >>> either synced or invalidated.
> >> The blocks should be cached between operations, so this should only
> >> result in skipping the lookup of the cached block, with no change to
> >> the actual I/O. Does that mean that grab_cache_page() is slow, I
> >> wonder? Or is this an issue of going around the retry loop due to a
> >> lack of memory at some stage?
> >>
> >> How does this interact with the rgrplvb support? I'd guess that with
> >> that turned on, this is no longer an issue, because we'd only read in
> >> the blocks for the rgrps that we are actually going to use?
> >>
> >>
> >>
> >> Steve.
> > Hi,
> >
> > If you compare the two vmstat outputs in bugzilla #1154782, you'll
> > see no significant difference in memory usage or CPU usage. So I assume
> > the page lookup is the "slow" part; not because it's such a slow thing
> > but because it's done 33 times per read-reference-invalidate (33 pages
> > to look up per rgrp).
> >
> > Regards,
> >
> > Bob Peterson
> > Red Hat File Systems
>
> That's true. However, as I understand the problem here, the issue is not
> reading in the blocks for the rgrp that is eventually selected for use,
> but reading in the blocks for the rgrps that we reject, for whatever
> reason (full, or congested, or whatever). So with rgrplvb enabled, we
> don't read those rgrps in off disk at all in most cases - so I was
> wondering whether that solves the problem without needing this change?
>
> Ideally I'd like to make the rgrplvb setting the default, since it is
> much more efficient. The question is how we can do that and still remain
> backward compatible. Not an easy one to answer :(
>
> Also, if the page lookup is the slow thing, then we should look at using
> pagevec_lookup() to get the pages in chunks rather than doing it
> individually (and indeed, multiple times per page when the block size is
> less than the page size). We know that the blocks will always be
> contiguous on disk, so we should be able to send down large I/Os rather
> than relying on the block stack to merge them as we do at the moment,
> which should be a further improvement too.
>
> Steve.
Hi,
The rgrplvb mount option only helps if the file system is using lock_dlm.
For lock_nolock, it's still just as slow, because lock_nolock has no
knowledge of LVBs. Granted, that's an unusual case, because GFS2 is
normally used with lock_dlm.
I like the idea of making rgrplvb the default mount option, and I don't
see a problem doing that.
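
(For illustration - the device and mount point here are hypothetical -
that's the behaviour you currently have to ask for explicitly with
something like:

    mount -t gfs2 -o rgrplvb /dev/vg_cluster/lv_gfs2 /mnt/gfs2

so making it the default would just mean nobody needs the option anymore.)
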
I think the rgrplvb option should be compatible with this patch, but
I'll set up a test environment to verify that they work together
harmoniously.
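
For reference, the heart of the patch is simply holding on to the bitmap
buffer_heads between allocations and only dropping them when the rgrp is
synced or invalidated. A simplified sketch (the function name is mine,
not the patch's, but the rd_bits/bi_bh layout matches fs/gfs2/rgrp.c):

    /* Simplified sketch: drop the bitmap buffers we have been holding
     * on to. With the patch, this runs when the rgrp glock is synced
     * or invalidated, rather than after every single allocation. */
    static void rgrp_release_buffers(struct gfs2_rgrpd *rgd)
    {
            int x;

            for (x = 0; x < rgd->rd_length; x++) {
                    struct gfs2_bitmap *bi = rgd->rd_bits + x;

                    if (bi->bi_bh) {
                            brelse(bi->bi_bh);
                            bi->bi_bh = NULL;
                    }
            }
    }
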
I also like the idea of using a pagevec for reading in multiple pages for
the rgrps (a rough sketch follows below), but that's another improvement
for another day. If there's no bugzilla record open for that, perhaps we
should open one.
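
To make the pagevec idea concrete, a batched lookup might look roughly
like this (illustrative only - the function name and index math are made
up, but pagevec_init(), pagevec_lookup() and pagevec_release() are the
existing APIs from linux/pagevec.h):

    #include <linux/pagevec.h>

    /* Fetch the cached pages covering one rgrp's bitmap blocks in a
     * single batch instead of one grab_cache_page() per block. */
    static void rgrp_lookup_pages(struct address_space *mapping,
                                  pgoff_t first, unsigned nr_pages)
    {
            struct pagevec pvec;
            unsigned i, found;

            pagevec_init(&pvec, 0);
            found = pagevec_lookup(&pvec, mapping, first, nr_pages);
            for (i = 0; i < found; i++) {
                    /* attach buffer_heads / submit contiguous reads
                     * for pvec.pages[i] here */
            }
            pagevec_release(&pvec);
    }
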
Regards,
Bob Peterson
Red Hat File Systems