linux-ext4.vger.kernel.org archive mirror
From: mingming cao <mingming.cao@oracle.com>
To: linux-ext4@vger.kernel.org
Cc: mingming cao <mingming.cao@oracle.com>
Subject: [RFC 0/2] ext4 btree
Date: Mon, 22 Jun 2015 20:24:36 -0700
Message-ID: <1435029878-4517-1-git-send-email-mingming.cao@oracle.com>

Hello list,

Last week, during the ext4 weekly call, we discussed some of the design issues with the ext4 btree.  Some background: when we started to look at the ext4 reflink feature, one of the key design issues was how to store and index the refcount (the number of times a range of disk blocks is shared) efficiently on disk. A btree seems to be a good data structure for that purpose, so I started to look at an ext4 btree to store refcounts for shared data blocks.  I began by playing with an in-memory btree (with ideas from the Linux btree library) and have implemented the basic btree functionality: insert, delete, split, merge, etc.
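
To make the refcount idea concrete, here is a minimal in-memory sketch of what one record and a lookup could look like. It is not taken from the attached patches: the names refcount_rec and refcount_lookup, the field layout, and the use of a flat sorted array in place of a btree leaf are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one run of physical blocks with a common refcount. */
struct refcount_rec {
	uint64_t start; /* first physical block of the run */
	uint32_t len;   /* number of blocks in the run */
	uint32_t refs;  /* how many owners share these blocks */
};

/*
 * Binary search over records sorted by start block, with no overlaps.
 * Returns the refcount of the run covering blk, or 0 if no record
 * covers it (what "no record" means is left to the caller).
 */
static uint32_t refcount_lookup(const struct refcount_rec *recs, size_t n,
				uint64_t blk)
{
	size_t lo = 0, hi = n;

	assert(recs != NULL || n == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (blk < recs[mid].start)
			hi = mid;
		else if (blk >= recs[mid].start + recs[mid].len)
			lo = mid + 1;
		else
			return recs[mid].refs;
	}
	return 0;
}
```

Keying the records by start block keeps them sorted by physical location, which is the same ordering a btree leaf would maintain.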

While we work on a btree for ext4, there is rising interest in designing a more flexible and generic ext4 btree, so that we might be able to use it for other purposes, such as data checksumming, directories, and other metadata.  We plan to use an ext4_btree_geo structure to define a btree's layout, and a set of access functions to get at the index keys or leaf records. The key size and record size are defined by the geometry when a btree is initialized. If other btree users would like variable-length records within a leaf node, that could be considered in the design too.
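
As a rough illustration of the geometry idea, the sketch below shows how fixed key and record sizes could drive the accessors. Only the name ext4_btree_geo comes from this email; every field name and helper function here is hypothetical and may not match the attached patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field names are assumptions; they describe one btree's fixed layout. */
struct ext4_btree_geo {
	uint32_t node_size;   /* bytes per btree node (one block) */
	uint16_t header_size; /* bytes reserved for the node header */
	uint16_t key_size;    /* bytes per index key */
	uint16_t rec_size;    /* bytes per leaf record */
	uint16_t ptr_size;    /* bytes per child pointer in index nodes */
};

/* Maximum number of fixed-size records that fit in a leaf node. */
static uint32_t geo_max_recs(const struct ext4_btree_geo *geo)
{
	return (geo->node_size - geo->header_size) / geo->rec_size;
}

/* Maximum number of (key, child pointer) pairs in an index node. */
static uint32_t geo_max_keys(const struct ext4_btree_geo *geo)
{
	return (geo->node_size - geo->header_size) /
	       (geo->key_size + geo->ptr_size);
}

/* Address of the i-th record in a leaf, computed purely from geometry. */
static void *geo_leaf_rec(const struct ext4_btree_geo *geo, void *node,
			  uint32_t i)
{
	assert(i < geo_max_recs(geo));
	return (uint8_t *)node + geo->header_size + (size_t)i * geo->rec_size;
}
```

With accessors like these, the generic insert/split/merge code never needs to know what a key or record contains; each btree user only supplies its own geometry and comparison logic. Variable-length records would need a different leaf layout (for example an offset table), which this sketch does not cover.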

As for where to store the root of the btree on disk for reflink, Darrick initially suggested having a per-flexible-block-group (flexbg) refcount btree. The plan is to create a new on-disk per-flexbg metadata structure, which will store the root block of the reflink btree (and maybe the roots of other per-flexbg btrees in the future); the block holding this new per-flexbg metadata structure will be recorded in the last unused 32 bits of the block group descriptor. This way we will have ... The other options considered are: 1) store the root in the reflinked inode's extended attributes, so the btrees exist only for reflink-related files; or 2) have a global per-filesystem reflink btree that stores the refcounts of physical blocks for the entire filesystem, which may create lock contention whenever COW happens.
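
For the per-flexbg option, finding the right btree for a physical block comes down to the usual block-to-group and group-to-flex-group arithmetic. The sketch below shows that mapping only; struct fs_layout and both function names are hypothetical stand-ins for the corresponding superblock fields, and the on-disk per-flexbg metadata structure itself is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the superblock fields the mapping needs. */
struct fs_layout {
	uint64_t first_data_block;    /* first block covered by group 0 */
	uint32_t blocks_per_group;    /* blocks in each block group */
	uint32_t log_groups_per_flex; /* log2 of groups per flex group */
};

/* Block group that contains physical block blk. */
static uint64_t block_to_group(const struct fs_layout *fs, uint64_t blk)
{
	assert(blk >= fs->first_data_block);
	return (blk - fs->first_data_block) / fs->blocks_per_group;
}

/* Flex group (and hence per-flexbg refcount btree) that owns blk. */
static uint64_t block_to_flexbg(const struct fs_layout *fs, uint64_t blk)
{
	return block_to_group(fs, blk) >> fs->log_groups_per_flex;
}
```

Because each flex group would have its own tree, COW operations on blocks in different flex groups would touch different roots, which is the contention argument against a single global btree.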

Attached is a very early draft of the btree prototype. It still looks very basic; it is only meant to show the ideas behind the btree, in the hope of finding out who is interested in using btrees in ext4 and what is missing. I am very much looking forward to ideas, suggestions, and comments. Criticism is welcome too!


Mingming

Thread overview: 8+ messages
2015-06-23  3:24 mingming cao [this message]
2015-06-23  3:24 ` [RFC 1/2] btree header mingming cao
2015-06-23 19:33   ` Darrick J. Wong
2015-06-24  4:14     ` Mingming Cao
2015-06-24  5:21       ` Darrick J. Wong
2015-06-23  3:24 ` [RFC 2/2] ext4 btree basic implementation mingming cao
2015-06-23 23:02   ` Darrick J. Wong
2015-06-29 22:08     ` mingming cao
