cluster-devel.redhat.com archive mirror
From: Bob Peterson <rpeterso@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [GFS2 PATCH] GFS2: Eliminate bitmap clones
Date: Tue, 3 Jul 2018 09:28:46 -0400 (EDT)
Message-ID: <1706893656.47709899.1530624526353.JavaMail.zimbra@redhat.com>
In-Reply-To: <b827940f-d39a-8681-6f4f-fb87e18a560d@redhat.com>

Hi Steve,

----- Original Message -----
> > Do we really still need "clone bitmaps" in gfs2? If so, why?
> > I think maybe we can get rid of them. Can someone (Steve Whitehouse
> > perhaps?) think of a scenario in which they're still needed? If so,
> > please elaborate and give an example.
(snip)
> You need to ensure that the blocks cannot be reused in the same
> transaction (thats true of all metadata blocks, not just inodes) in
> order that recovery will work correctly. You cannot just eliminate the
> bitmaps without adding a mechanism to prevent this reuse,
> 
> Steve.

I don't see how it's possible for a transaction to reuse the same blocks,
even when transactions are combined.

As you know, GFS2 (unlike GFS1) marks only one type of metadata in its
bitmaps: dinode blocks. Any other metadata associated with a dinode is
marked as data in the bitmap and stays marked that way until it is freed.
So if a process truncates a file, for example, transitions its blocks
from data to free, and then searches for, finds, and reallocates those
same blocks as data again, there would still be only one copy of the
bitmap buffer data in the ail lists, right? And it should always reflect
the most recent status of those bits, which is data, right? So a journal
replay will still replay the latest known version of those bitmaps.
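
The truncate-and-reallocate case above can be sketched with a toy model.
This is only an illustration, not kernel code: the four state values match
GFS2's on-disk block states (two bits per block), but the ToyBitmap class
and the truncate_then_realloc helper are hypothetical names, and the model
has no journal, no ail lists, and no clone bitmap. It shows just one
thing: with a single buffer, the bits end up holding the most recent
state.

```python
# Toy model only. The state values mirror GFS2's on-disk block states;
# ToyBitmap and truncate_then_realloc are hypothetical illustration names.
GFS2_BLKST_FREE = 0
GFS2_BLKST_USED = 1      # data blocks and all non-dinode metadata
GFS2_BLKST_UNLINKED = 2
GFS2_BLKST_DINODE = 3

BLOCKS_PER_BYTE = 4      # two bits per block
BIT_SIZE = 2
BIT_MASK = 0x3


class ToyBitmap:
    """A single bitmap buffer with no clone behind it."""

    def __init__(self, nblocks):
        self.nblocks = nblocks
        nbytes = (nblocks + BLOCKS_PER_BYTE - 1) // BLOCKS_PER_BYTE
        self.buf = bytearray(nbytes)

    def get(self, block):
        shift = (block % BLOCKS_PER_BYTE) * BIT_SIZE
        return (self.buf[block // BLOCKS_PER_BYTE] >> shift) & BIT_MASK

    def set(self, block, state):
        idx = block // BLOCKS_PER_BYTE
        shift = (block % BLOCKS_PER_BYTE) * BIT_SIZE
        cleared = self.buf[idx] & ~(BIT_MASK << shift) & 0xFF
        self.buf[idx] = cleared | (state << shift)

    def find_free(self):
        """Return the lowest-numbered free block, or None if full."""
        for block in range(self.nblocks):
            if self.get(block) == GFS2_BLKST_FREE:
                return block
        return None


def truncate_then_realloc(bitmap, blocks):
    """Free the given blocks, then allocate the same number again as data."""
    for block in blocks:
        bitmap.set(block, GFS2_BLKST_FREE)
    reallocated = []
    for _ in blocks:
        block = bitmap.find_free()
        bitmap.set(block, GFS2_BLKST_USED)
        reallocated.append(block)
    return reallocated
```

Note that in this model the freed blocks become visible to the search
immediately, which is exactly the behavior the clone bitmaps exist to
prevent; whether that is safe for recovery is the question under
discussion, and the sketch does not answer it.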

If a dinode references indirect blocks (marked as data) and the file is
then truncated to 0, the indirect blocks still remain, because the
metadata for indirect blocks is never shrunk.

If the dinode is unlinked rather than deleted, its indirect blocks and
data blocks will all remain "data" until the inode is actually evicted.
When the inode is evicted and those blocks are actually freed, that's all
done in separate transactions as per Andreas's "shrinker" patches, and
we know those don't search for free blocks to assign.

If a dinode is unlinked, and someone goes after free blocks, they won't
find those blocks anyway because they're still not "free" until the inode
is evicted. And, of course, the only process that searches the bitmaps
for unlinked blocks is the eviction process itself (which actually does
something with them) and inplace_reserve, which just tries to kick
off a potential eviction (but never actually does an eviction itself).
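
The unlinked-dinode argument can be sketched the same way. Again this is
a toy model under stated assumptions, not the kernel's implementation: the
state values match GFS2's on-disk block states, while unlink, find_blocks,
and evict are hypothetical helpers, and the caller passes in the list of
blocks the dinode owns instead of walking any metadata tree.

```python
# Toy model only. State values mirror GFS2's on-disk block states; the
# helper names are hypothetical and there is no journal or locking here.
FREE, USED, UNLINKED, DINODE = 0, 1, 2, 3


def unlink(states, dinode):
    """Mark the dinode unlinked; the blocks it owns stay marked as data."""
    assert states[dinode] == DINODE
    states[dinode] = UNLINKED


def find_blocks(states, wanted):
    """Return every block number currently in the wanted state."""
    return [blk for blk, st in enumerate(states) if st == wanted]


def evict(states, dinode, owned):
    """Free an unlinked dinode together with the blocks it owned."""
    assert states[dinode] == UNLINKED
    for blk in owned:
        states[blk] = FREE
    states[dinode] = FREE
```

With a map of [DINODE, USED, USED, FREE], a search for free blocks after
unlink(states, 0) finds only block 3; blocks 1 and 2 become allocatable
only once evict(states, 0, [1, 2]) has run.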

It's a little bit different with directories, because the hash table
is kind of data and kind of metadata, but even so, we never shrink the
directory hash tables, nor do we free leaf blocks or leaf continuation
blocks, as per bz#223783 (which suggests we might want to in the future).
The clones today cost us file fragmentation, file system fragmentation,
the overhead of the kmalloc/kfree calls, and twice as much work setting
and clearing bits, so I question whether the savings from shrinking hash
tables or freeing unused continuation leaves would outweigh the potential
savings we might get by eliminating the bitmap clones.

Again, I don't see a scenario that can get us into trouble, even with
journal replay.

Perhaps I should be worried about extended attributes that are freed
and reused? I'll look into that.

Bob


