Re: Crazy idea of cleanup the inode_record btrfsck things with SQL?

Linux Btrfs filesystem development

From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>
Cc: Robert White <rwhite@pobox.com>,
	linux-btrfs <linux-btrfs@vger.kernel.org>,
	David Sterba <dsterba@suse.cz>
Subject: Re: Crazy idea of cleanup the inode_record btrfsck things with SQL?
Date: Wed, 10 Dec 2014 21:27:43 -0500
Message-ID: <20141211022742.GA25614@hungrycats.org>
In-Reply-To: <5488FBE0.7060309@cn.fujitsu.com>

On Thu, Dec 11, 2014 at 10:05:20AM +0800, Qu Wenruo wrote:
> 
> -------- Original Message --------
> Subject: Re: Crazy idea of cleanup the inode_record btrfsck things with SQL?
> From: Zygo Blaxell <zblaxell@furryterror.org>
> To: Qu Wenruo <quwenruo@cn.fujitsu.com>
> Date: 2014-12-11 05:57
> >On Thu, Dec 04, 2014 at 02:56:55PM +0800, Qu Wenruo wrote:
> >>The main memory usage in btrfsck is the extent records, which
> >>we can't free until we have read them all in and checked them, so even
> >>if we mmap/unmap, it can only help with
> >>the extent_buffer (which is already freed if not used according to refs).
> >I'm thinking aloud here, but is it *really* necessary to read everything
> >into memory?
> Totally agreed that we should only read what we need.
> But some backrefs and ref counts can only be determined after a
> full scan, especially in the leaf/node corruption
> case.

It might be faster (and smaller) to pipe the records out to sort (with
gzip/lzma compression on temporary files) than to try to insert them in
a tree.
I have used that technique in some of my deduplicating programs.  It can
cut the working set size by several orders of magnitude (trading it for
an O(n log n) sort, which will mostly read and write sequentially).
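
A minimal sketch of that spill-and-merge approach, in Python rather than
a shell pipeline through sort.  Everything here is an assumption made for
illustration: the function names, the one-record-per-line text format,
and the use of gzip are hypothetical, and none of it is btrfsck code.
Records are assumed to be strings without newlines whose plain string
order is the order you want (e.g. fixed-width hex keys).

```python
import gzip
import heapq
import os
import tempfile


def write_sorted_run(records, tmpdir):
    """Sort one in-memory batch and spill it to a gzip-compressed temp file."""
    fd, path = tempfile.mkstemp(suffix=".gz", dir=tmpdir)
    os.close(fd)
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for rec in sorted(records):
            f.write(rec + "\n")
    return path


def read_run(path):
    """Stream one sorted run back, one record at a time."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")


def external_sort(records, batch_size, tmpdir):
    """Yield records in sorted order, buffering at most batch_size at once."""
    paths, batch = [], []
    for rec in records:
        batch.append(rec)
        if len(batch) >= batch_size:
            paths.append(write_sorted_run(batch, tmpdir))
            batch = []
    if batch:
        paths.append(write_sorted_run(batch, tmpdir))
    try:
        # heapq.merge keeps only one pending record per run in memory.
        yield from heapq.merge(*(read_run(p) for p in paths))
    finally:
        for p in paths:
            os.remove(p)
```

The working set is one batch while writing runs and one record per run
while merging; both the spill and the merge touch the temporary files
sequentially.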

e.g. duplicate refs will all sort together, so when you are sequentially
reading the sorted data and the current key value changes, you know you've
seen everything that could be a duplicate, and can discard everything
in RAM.
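
A rough illustration of that streaming check, assuming (hypothetically)
that each sorted line is "key<TAB>payload" and that the input is already
sorted by key.  The record layout and the function name are made up for
the example; they are not btrfsck's.

```python
from itertools import groupby


def find_duplicate_refs(sorted_lines):
    """Yield (key, count) for every key seen more than once.

    Only the group for the current key is live at any moment; once the
    key changes, that group is dropped and nothing about it is retained.
    """
    def key_of(line):
        return line.split("\t", 1)[0]

    for key, group in groupby(sorted_lines, key=key_of):
        count = sum(1 for _ in group)
        if count > 1:
            yield key, count
```

For example, feeding it the lines "0001\ta", "0001\tb", "0002\tc" yields
only ("0001", 2).  It works on any iterator, so it can consume the output
of a sort without loading it into memory.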



