Re: Crazy idea of cleanup the inode_record btrfsck things with SQL?

From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: Robert White <rwhite@pobox.com>,
	linux-btrfs <linux-btrfs@vger.kernel.org>
Cc: David Sterba <dsterba@suse.cz>
Subject: Re: Crazy idea of cleanup the inode_record btrfsck things with SQL?
Date: Tue, 2 Dec 2014 09:17:45 +0800
Message-ID: <547D1339.10404@cn.fujitsu.com>
In-Reply-To: <547CAF2E.7070109@pobox.com>


-------- Original Message --------
Subject: Re: Crazy idea of cleanup the inode_record btrfsck things with SQL?
From: Robert White <rwhite@pobox.com>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>, linux-btrfs 
<linux-btrfs@vger.kernel.org>
Date: 2014-12-02 02:10
> On 11/30/2014 10:18 PM, Qu Wenruo wrote:
>> (advocacy for using SQL internally for btrfsck)
>
> All of these ideas you want to toss an entire SQL front end on are more 
> simply handled with simple data structures.
>
> In C++ terms "map<inode,parent>" and/or "map<parent,vector<children>>" 
> beats the heck out of including all of SQL and its related indexes and 
> type conversions (sqlite, for example, stores integers as doubles, or 
> decimal numbers depending on version).
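
For illustration, the two maps named above could be kept side by side
roughly as follows. This is only a sketch: InodeNum, ParentIndex,
add_link and find_parent are hypothetical names, not anything from
btrfs-progs, and it assumes one parent per inode (so hard links in
several directories are ignored).

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical types for illustration only; not taken from btrfs-progs.
using InodeNum = std::uint64_t;

struct ParentIndex {
    std::map<InodeNum, InodeNum> parent_of;                 // child -> parent
    std::map<InodeNum, std::vector<InodeNum>> children_of;  // parent -> children

    // Record one parent/child relation in both maps.
    // Assumption: a child has exactly one parent (no hard links).
    void add_link(InodeNum parent, InodeNum child) {
        parent_of[child] = parent;
        children_of[parent].push_back(child);
    }

    // Return a pointer to the recorded parent, or nullptr if the child
    // has not been recorded.
    const InodeNum* find_parent(InodeNum child) const {
        auto it = parent_of.find(child);
        return it == parent_of.end() ? nullptr : &it->second;
    }
};
```

A caller would do something like idx.add_link(256, 257) while scanning
and idx.find_parent(257) afterwards; both operations are O(log n) in
the number of recorded inodes.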
>
> RDBMS _are_ good at representing things, so noticing that a thing 
> _can_ be represented with an RDBMS is very common.
>
> But by the time you put two or three indexes on relation->(parent, 
> child, name) you've given yourself three or four copies of the core 
> data in three or four different places. And those copies are largely 
> immutable and randomly distributed and will include the overhead in 
> memory for fairly sparse trees.
>
> It's not that it's an unworkable idea.
>
> But it is unnecessarily generic and adds an order of magnitude of 
> complexity to your problems.
>
> For instance, if I boot from a CD to run a btrfsck where will the 
> database files be written to?
That is easy: in memory.
A file would be used only when we judge that the fs' metadata is too large.

One problem with the current inode_record is that btrfsck can only keep
all of the records in memory.
When the metadata of the file system is too big, the sysadmin can only
add swap space or memory to handle it.

It is not an urgent problem, though, since for a 1T btrfs fs with about
5G of metadata, btrfsck only takes about 500M for checking chunks and
extents, and even less for checking fs roots.
>
> If it is an in-memory table why do I want the overhead of SQL to look 
> up something indexed by integer?
>
> If the sparse vectors of integers don't fit in memory why would the 
> SQL tables of integers fit "better"?
>
> SQL would be the second slowest possible for representing this data -- 
> The slowest would be an XML schema stored as flat text.
>
> So your crazy idea is also a pretty bad one compared to most if not 
> all sparse data representations and techniques that come to bear on 
> this problem set. All you are really doing is pushing the same work 
> (walking a tree to find an integer) into a difficult "spell it out in 
> SQL" space.
>
> Is prepare_sql(cursor,"SELECT parent FROM parentage_tree WHERE child = 
> %d"); execute_sql(cursor,child); and its possible error returns 
> actually clearer or better than "parent=inheritance.find(child); if 
> (parent!=inheritance.end()) {...}" (as it might be written in C++)?
>
> Do you want to know if (keep track of whether) an inode is allocated 
> and referenced? There's a sparse bit-vector for that...
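
One possible shape for such a sparse bit-vector is sketched below,
assuming inode numbers are 64-bit and that only the 64-bit words which
contain at least one set bit are stored. SparseBitset and its members
are hypothetical names used for illustration, not an existing btrfsck
structure.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical sparse bit set: a word is stored only once a bit in it is set.
class SparseBitset {
public:
    // Mark one bit (e.g. "this inode number is allocated").
    void set(std::uint64_t bit) {
        words_[bit / 64] |= (std::uint64_t{1} << (bit % 64));
    }

    // Query one bit; words that were never touched read as all zero.
    bool test(std::uint64_t bit) const {
        auto it = words_.find(bit / 64);
        return it != words_.end() && ((it->second >> (bit % 64)) & 1u) != 0;
    }

    // Number of 64-bit words actually held in memory.
    std::size_t stored_words() const { return words_.size(); }

private:
    std::unordered_map<std::uint64_t, std::uint64_t> words_;  // word index -> bits
};
```

Two such sets, one for "allocated" and one for "referenced", would
cover the case described above; memory use grows with the number of
populated words rather than with the largest inode number.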
>
> Want to be able to get back to an inode's location on disk, a sparse 
> array of disk offsets exists (among other options).
>
> Before you can even access the RDBMS you'd have to fill it completely; 
> otherwise you wouldn't know if a select returning zero rows was an 
> authoritative indication that the datum didn't exist or if it was 
> instead an indication that the datum hadn't been populated yet.
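
The ambiguity described above (a miss meaning "does not exist" versus
"not loaded yet") applies to any lookup table that is filled
incrementally. A minimal sketch of making it explicit, with
hypothetical names (Lookup, ParentTable, fully_populated) that are not
part of btrfsck:

```cpp
#include <cstdint>
#include <map>

// Three possible answers instead of a plain hit/miss.
enum class Lookup { Found, Absent, NotYetKnown };

struct ParentTable {
    std::map<std::uint64_t, std::uint64_t> parent_of;  // child -> parent
    // Assumption: the scanner sets this once every record has been inserted.
    bool fully_populated = false;

    // On Found, parent_out receives the parent. A miss is authoritative
    // (Absent) only after the table has been completely filled.
    Lookup find(std::uint64_t child, std::uint64_t& parent_out) const {
        auto it = parent_of.find(child);
        if (it != parent_of.end()) {
            parent_out = it->second;
            return Lookup::Found;
        }
        return fully_populated ? Lookup::Absent : Lookup::NotYetKnown;
    }
};
```

An SQL-backed table would need the same kind of "scan finished" marker
before a SELECT returning zero rows could be trusted.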
>
> THIS IS NOT SARCASM: If you strongly disagree, I suggest you start 
> coding. Seriously, don't ask, do... And in a month really check to see 
> if your solution is any smaller, faster, easier, or in _any_ _way_ 
> more optimal than using native data structures. The attempt will 
> answer the question definitively and then we'll all know...
I know this is a crazy idea, and I do not disagree with your opinion.
But I am also somewhat tired of adding new structures and new search
functions, or even making larger changes to the btrfsck record
infrastructure, whenever I find that it cannot provide what a new
recovery function needs.

In fact, after I finish the whole corrupted-leaf recovery patchset, I
may try implementing it as a trial-and-error experiment to clean up and
enhance the inode_record infrastructure, and see whether there is a huge
performance drop or a reduction in lines of code. (Anyway, it is just a
personal trial-and-error experiment; I will not send the patches if
there is no interesting result, and it may well turn out to be a
disaster, as you mentioned.)

Thanks,
Qu