From: Austin S Hemmelgarn <ahferroin7@gmail.com>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>,
linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Crazy idea of cleanup the inode_record btrfsck things with SQL?
Date: Mon, 01 Dec 2014 07:53:08 -0500
Message-ID: <547C64B4.2050802@gmail.com>
In-Reply-To: <547BCB43.5020505@cn.fujitsu.com>
On 2014-11-30 20:58, Qu Wenruo wrote:
> [BACKGROUND]
> I'm trying to implement a function to repair a missing inode item.
> In that case, the inode type must be salvaged (although it can fall back
> to FILE).
>
> One such case: if any dir_item/index or inode_ref refers to the inode
> as its parent, the type of that inode must be DIR.
>
> However, with the current btrfsck implementation (inode_record only
> records backrefs), we are unable to search for the inode_backrefs whose
> parent is a given inode number.
>
> [FIRST IMPLEMENT DESIGN]
> My first thought was to implement a generic inode-relation structure,
> recording parent ino, child ino, name and namelen, and to store the
> structure in an rbtree, not in the child's/parent's list.
>
> But I soon realized that this is a perfect use case for a relational
> database, with 'ino' as the primary key for the INODE table and
> ('parent_ino', 'child_ino', 'name') as the primary key for the
> INODE_REF table.
>
> [CRAZY IDEA]
> So why not use SQL to implement the btrfsck inode-record handling?
>
> With such a crazy idea, it would be much easier to iterate from a given
> ino, and with an already mature RDBMS implementation like sqlite3, we
> can save the hundreds of lines of code that implement the rb-tree or list.
>
> [PROS]
> 1. Easy to maintain
> We no longer need to maintain the rbtree searching or list iteration,
> only simple SQL statements and their wrappers.
>
> 2. Easy to extend
> If we need to record something more, like extents and their relation to
> inodes, we only need to create 2 tables plus several SQL statements and
> wrappers.
>
> 3. Reduced memory usage for HUGE fs.
> When metadata grows to several TB or even more, the current rb-tree based
> implementation may run short of memory, since everything is stored in
> memory. But if we use SQL, an RDBMS like sqlite3 can store things either
> in memory or on disk, which may hugely reduce the memory usage for a
> huge btrfs.
>
> If we do not use an existing RDBMS, we need to implement a complicated
> memory control system to manage memory in userland.
>
> [CONS]
> 1. Heavy implementation
> SQL hides the rb-tree or B+ tree implementation but costs more memory (if
> not compressed) and CPU cycles, so it will be slower than the simple
> rb-tree implementation even with a lightweight RDBMS like sqlite3.
>
> 2. Heavy dependency
> If we use it, btrfs-progs will have an RDBMS as a build and runtime
> dependency.
> Having such low-level progs depend on a high-level program like sqlite3
> may be very strange.
>
> 3. A lot of rework on existing code.
> Even though SQL is easier to maintain and extend, if we use it, we still
> need to reimplement several hundred or even thousands of lines of code,
> not to mention the regression tests.
>
> 4. Copyright
> Will it cause any copyright problem if we use a non-GPL RDBMS like
> sqlite3 in GPLv2 btrfs-progs?
>
> [NEED FEEDBACK]
> Any feedback or discussion on this crazy idea is welcome. Since this may
> need a lot of work, the idea definitely needs a lot of review before it
> comes to code.
>
So, I think this does a good job of highlighting one of the bigger
issues with btrfsck when it is compared to ext* and/or xfs. Despite
this being a problem, I really don't think using an RDBMS is the way to
fix it, both for reasons outlined in other responses, and because fsck
should be as fast as possible when nothing is wrong with the fs.
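
For reference, here is a minimal sketch of the two-table layout from the
quoted proposal, using Python's built-in sqlite3 module. The table and
column names, the sample inode numbers, and the salvage_type() helper are
all assumptions made up for illustration; none of this is btrfs-progs code.
It only shows the lookup the proposal is after: checking whether any ref
names a given inode as its parent.

```python
import sqlite3

# Hypothetical schema built from the keys named in the quoted proposal:
# 'ino' for INODE, ('parent_ino', 'child_ino', 'name') for INODE_REF.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inode (
    ino  INTEGER PRIMARY KEY,
    type TEXT                      -- NULL while the inode item is missing
);
CREATE TABLE inode_ref (
    parent_ino INTEGER NOT NULL,
    child_ino  INTEGER NOT NULL,
    name       TEXT    NOT NULL,
    PRIMARY KEY (parent_ino, child_ino, name)
);
""")

# Made-up sample data: inodes 257 and 258 have lost their inode items.
conn.executemany(
    "INSERT INTO inode (ino, type) VALUES (?, ?)",
    [(256, "DIR"), (257, None), (258, None), (259, "FILE")],
)
conn.executemany(
    "INSERT INTO inode_ref (parent_ino, child_ino, name) VALUES (?, ?, ?)",
    [(256, 257, "sub"), (256, 258, "orphan.txt"), (257, 259, "a.txt")],
)


def salvage_type(conn, ino):
    """Return "DIR" if any ref names `ino` as parent, else fall back to "FILE"."""
    row = conn.execute(
        "SELECT 1 FROM inode_ref WHERE parent_ino = ? LIMIT 1", (ino,)
    ).fetchone()
    return "DIR" if row is not None else "FILE"


for ino in (257, 258):
    print(ino, salvage_type(conn, ino))
```

In this sample, inode 257 is the parent of "a.txt" and is salvaged as DIR,
while 258 is nobody's parent and falls back to FILE. The same lookup could
equally be served by an in-memory index keyed on parent ino, which is the
trade-off being discussed in this thread.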