From: Ming Lin <mlin@kernel.org>
To: Kent Overstreet <kent.overstreet@gmail.com>
Cc: "linux-bcache@vger.kernel.org" <linux-bcache@vger.kernel.org>
Subject: Re: [ANNOUNCE] bcachefs!
Date: Thu, 06 Aug 2015 22:21:25 -0700
Message-ID: <1438924885.31517.11.camel@hasee>
In-Reply-To: <20150806231112.GB2459@kmo-pixel>
On Thu, 2015-08-06 at 16:11 -0700, Kent Overstreet wrote:
> On Wed, Aug 05, 2015 at 11:40:06PM -0700, Ming Lin wrote:
> > On Tue, 2015-07-28 at 11:45 -0700, Ming Lin wrote:
> > > On Tue, Jul 28, 2015 at 11:41 AM, Ming Lin <mlin@kernel.org> wrote:
> > > > On Fri, Jul 24, 2015 at 1:47 PM, Ming Lin <mlin@kernel.org> wrote:
> > > >>
> > > >> And I want to learn how the btree node insert/delete/update happens on
> > > >> disk. This may be too much detail. I'm going to write a small tool to dump
> > > >> the file system. Then I could understand better the on disk btree
> > > >> format.
> > > >
> > > > Here is my simple tool to dump parts of the on-disk format.
> > > > http://www.minggr.net/cgit/cgit.cgi/bcache-tools/commit/?id=deb258e2
> > >
> > > Actually: http://www.minggr.net/cgit/cgit.cgi/bcache-tools/commit/?id=3121eec
> > >
> > > >
> > > > It's not in good shape, but simple enough to learn the on-disk format.
> >
> > Hi Kent,
> >
> > I'm trying to understand how the root inode is stored in the inode
> > btree.
> >
> > dd if=/dev/zero of=fs.img bs=10M count=1
> > bcacheadm format -C fs.img
> > mount -t bcache -o loop fs.img /mnt
> > umount /mnt
> > hexdump -C fs.img > fs.hex
> >
> > From my simple tool, I know that the inode btree starts from offset
> > 0xec000
>
> The root node of the inode btree? Are you handling trees with multiple nodes
> yet?
Yes and no.
>
> >
> > 000ec000 43 ef f3 df ff ff ff ff 86 c1 47 1e 99 25 51 35 |C.........G..%Q5|
> > 000ec010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > 000ec020 00 00 00 00 00 00 00 00 ff ff ff ff ff ff ff ff |................|
> > 000ec030 ff ff ff ff ff ff ff ff 01 05 00 00 00 00 00 00 |................|
> > 000ec040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > *
> > 000ec070 88 b5 38 e2 45 36 eb f6 00 00 00 00 00 00 00 00 |..8.E6..........|
> > 000ec080 01 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 |................|
> > 000ec090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > *
> > 000ed000 31 66 fd 31 ff ff ff ff 88 b5 38 e2 45 36 eb f6 |1f.1......8.E6..|
> > 000ed010 02 00 00 00 00 00 00 00 01 00 00 00 03 00 0b 00 |................|
> > 000ed020 0b 01 80 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > 000ed030 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 |................|
> > 000ed040 ed 41 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |.A..............|
> > 000ed050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > *
> > 000ed070 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > 000ed080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > *
> >
> > btree_node (0xec000)
> > bset (0xed008) ---> bset->u64s = 0x0b = 11
> > bkey_packed (0xed020)
> > bkey (0xed020)
> > bch_inode (0xed040 to 0xed077) ---> root inode
> >
> > Is the decode above correct?
>
> I think so. The code that deals with reading in a btree node from disk and
> interpreting the contents is mainly in bch_btree_node_read_done(), btree_io.c -
> it looks like you found that?
I haven't dug into the code yet.
First I want to understand the on-disk structure from the hexdump.
>
> > I found the root inode manually. But how is it actually found by code?
>
> The root inode is the inode with inode number BCACHE_ROOT_INO (4096) -
> http://evilpiepirate.org/git/linux-bcache.git/tree/drivers/md/bcache/fs.c?h=bcache-dev&id=5cf7fb11d124839eea2191fd7e8eddecb296d67d#n2285
>
> So to do it correctly, you'll need the bkey packing code in order to unpack the
> key (if it was packed) so that you can get the actual inode number of the key.
>
> You'll also need to do something like the mergesort algorithm (or something
> equivalent; you don't need to do the actual mergesort if you're just doing a
> linear search for one key). That is - if there's multiple bsets, they will
> likely contain duplicates and keys in newer bsets overwrite keys in older bsets.
I don't understand this part yet. I'll study it.
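
As a rough illustration of the overwrite rule described above: a minimal
Python sketch, assuming the keys have already been unpacked into
(inode number, value) pairs and that the bsets are listed oldest first.
The lookup() helper and the sample data are hypothetical, and deleted keys
are not handled. It is only a linear search for one key, not the real
mergesort.

```python
BCACHE_ROOT_INO = 4096  # value quoted earlier in this thread


def lookup(bsets, inum):
    """Linear search for one key; the newest bset that contains it wins.

    bsets: list of bsets, oldest first (an assumption of this sketch).
    Each bset is a list of (inode_number, value) pairs, already unpacked.
    """
    for bset in reversed(bsets):  # walk from newest to oldest
        for key, value in bset:
            if key == inum:
                return value
    return None  # key not present in any bset


# Hypothetical contents: the second bset rewrites the root inode.
bsets = [
    [(4096, "root inode v1"), (4097, "file a")],
    [(4096, "root inode v2")],
]

print(lookup(bsets, BCACHE_ROOT_INO))
```

Searching newest first means a duplicate in an older bset is never
returned, which is the effect the merge has for a single-key lookup.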
>
> > Could you help to explain what it is from 0xec070 to 0xed007?
> > Are they also bsets?
>
> Without knowing your block size and spending a fair amount of time staring at
> the hexdump, I don't know what starts there - but quite possibly yes; bsets that
> aren't at the start of the btree node are embedded in a struct
> btree_node_entry, not a struct btree_node.
>
> To tell if it's a valid bset, you compare bset->seq against the seq in the first
> bset - it's a random number generated for each new btree node; if they match
> then the bset there goes with that btree node.
The block size is 4K.
OK, now I can interpret the hexdump.
000ec000 43 ef f3 df ff ff ff ff 86 c1 47 1e 99 25 51 35 |C.........G..%Q5|
000ec010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000ec020 00 00 00 00 00 00 00 00 ff ff ff ff ff ff ff ff |................|
000ec030 ff ff ff ff ff ff ff ff 01 05 00 00 00 00 00 00 |................|
000ec040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000ec070 88 b5 38 e2 45 36 eb f6 00 00 00 00 00 00 00 00 |..8.E6..........|
000ec080 01 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 |................|
000ec090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000ed000 31 66 fd 31 ff ff ff ff 88 b5 38 e2 45 36 eb f6 |1f.1......8.E6..|
000ed010 02 00 00 00 00 00 00 00 01 00 00 00 03 00 0b 00 |................|
000ed020 0b 01 80 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000ed030 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 |................|
000ed040 ed 41 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |.A..............|
000ed050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000ed070 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000ed080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000ee000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
There are 2 bsets, both with bset->seq "88 b5 38 e2 45 36 eb f6":

btree_node (0xec000)
    bset_1 (0xec070) ---> bset->u64s = 0 (an empty bset?)
btree_node_entry (0xed000)
    bset_2 (0xed008) ---> bset->u64s = 0x0b = 11
        bkey_packed (0xed020)
        bkey (0xed020)
        bch_inode (0xed040 to 0xed077) ---> root inode

Why is there an empty bset at the start of the btree node?
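
The reading above can be cross-checked mechanically. This is a minimal
Python sketch, not the kernel's parser: the field offsets (seq at byte 0,
u64s at byte 22, keys starting at byte 24 of the bset) are inferred from
this hexdump only, and treating the first two bytes of the bch_inode as
the file mode is likewise an assumption. parse_bset_header() is a
hypothetical helper.

```python
import stat
import struct

# First 24 bytes of each bset, copied from the hexdump above.
BSET_1 = bytes.fromhex("88b538e24536ebf6" "0000000000000000" "0100000003000000")  # 0xec070
BSET_2 = bytes.fromhex("88b538e24536ebf6" "0200000000000000" "0100000003000b00")  # 0xed008


def parse_bset_header(buf):
    """Return (seq, u64s), using offsets inferred from this dump."""
    (seq,) = struct.unpack_from("<Q", buf, 0)    # little-endian 64-bit seq
    (u64s,) = struct.unpack_from("<H", buf, 22)  # little-endian 16-bit u64s
    return seq, u64s


seq1, u64s1 = parse_bset_header(BSET_1)
seq2, u64s2 = parse_bset_header(BSET_2)

# Matching seq values mean both bsets belong to the same btree node.
same_node = seq1 == seq2

# The key area of bset_2 spans u64s * 8 bytes after the 24-byte header.
keys_start = 0xED008 + 24
keys_end = keys_start + u64s2 * 8  # exclusive end

# Bytes "ed 41" at 0xed040, read as a little-endian mode (assumption).
(mode,) = struct.unpack("<H", bytes.fromhex("ed41"))

print(same_node, u64s1, u64s2)
print(hex(keys_start), hex(keys_end - 1))
print(stat.filemode(mode))
```

With 11 u64s the key area ends at 0xed077, which agrees with the
bch_inode range given above, and the mode bytes decode to a directory,
as expected for a root inode.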