linux-btrfs.vger.kernel.org archive mirror
From: Martin <m_btrfs@ml1.co.uk>
To: linux-btrfs@vger.kernel.org
Subject: Re: Will big metadata blocks fix # of hardlinks?
Date: Tue, 29 May 2012 14:09:03 +0100
Message-ID: <jq2hpf$c6r$1@dough.gmane.org>
In-Reply-To: <20120526182211.GA16059@sli.dy.fi>

Thanks for noting this one. That is a very surprising and unexpected
limit! And a killer for some applications that are not all that rare...

On 26/05/12 19:22, Sami Liedes wrote:
> Hi!
> 
> I see that Linux 3.4 supports bigger metadata blocks for btrfs.
> 
> Will using them allow a bigger number of hardlinks on a single file
> (i.e. the bug that has bitten at least git users on Debian[1,2], and
> BackupPC[3])? If I understand correctly, the problem has been
> that the hard links are stored in the same metadata block with some
> other metadata, so the size of the block is an inherent limitation?
> 
> If so, I think it would be worth it for me to try Btrfs again :)
> 
> 	Sami
> 
> 
> [1] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/13603
> [2] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=642603
> [3] https://bugzilla.kernel.org/show_bug.cgi?id=15762

One example failure case is just 13 hard links. Even 4x that (16k
metadata blocks instead of 4k) only gives about 52 links for that case.
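For anyone wanting to play with the numbers, here is a back-of-the-envelope
sketch in Python. It assumes that all links to one inode from the same
directory are stored as back-references in a single INODE_REF item, that
each reference costs a 10-byte header plus the name, and that one item must
fit in a leaf after a 101-byte leaf header and a 25-byte item header. Those
sizes are taken as assumptions about the btrfs on-disk format, and
max_links_per_dir is just an illustrative name. Treat the result as a rough
bound only; real failure counts may differ somewhat.

```python
# Rough model of the per-directory hard link limit (assumed struct sizes).
LEAF_HEADER = 101  # assumed sizeof(struct btrfs_header)
ITEM_HEADER = 25   # assumed sizeof(struct btrfs_item)
REF_HEADER = 10    # assumed sizeof(struct btrfs_inode_ref): u64 index + u16 name_len


def max_links_per_dir(leaf_size: int, name_len: int) -> int:
    """Estimate how many same-directory links with names of name_len bytes
    fit in one INODE_REF item, given the metadata leaf size in bytes."""
    max_item = leaf_size - LEAF_HEADER - ITEM_HEADER  # largest item in one leaf
    return max_item // (REF_HEADER + name_len)


if __name__ == "__main__":
    for leaf in (4096, 16384, 65536):
        for name_len in (20, 255):
            print(f"leaf={leaf} name_len={name_len}: "
                  f"~{max_links_per_dir(leaf, name_len)} links")
```

In this model the fixed headers don't grow with the block, so the bound grows
slightly faster than the block size; but for long names it still stays in the
tens of links even with 16k blocks.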


A brief summary of those threads:

* It's considered a rare corner case that would need an on-disk format change to fix, so it's "won't-fix";

* Those threads note real-world problem examples such as: BackupPC
(backups); the nnmaildir mail backend in Gnus (an Emacs package for
reading news and email); and a web archiver.

* Bacula (backups) and Mutt (email client) are also cited as problem
examples in:

Btrfs File-System Plans For Ubuntu 12.10
http://www.phoronix.com/scan.php?page=news_item&px=MTEwMDE


I have a real-world example of my own: deduplicating identical files from
a proprietary data capture system. The filenames change (timestamp and
index data are stored in the filename), yet there are periods where the
file contents change only occasionally... The 'natural' thing to do is to
hardlink together all the identical files, so that every unique filename
is kept but each distinct content is stored only once... And you might
have many such files in a particular directory...
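The hardlink deduplication described above could be sketched roughly like
this in Python. It is a minimal illustration, not a tested tool: it assumes
everything under root is on one filesystem, treats equal SHA-256 digests as
identical content, and, when os.link fails with EMLINK ("Too many links",
assumed here to be how the filesystem reports hitting the limit), starts a
new link group instead of aborting. The function names and the ".dedup-tmp"
suffix are made up for illustration.

```python
import errno
import hashlib
import os
import stat
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def hardlink_duplicates(root):
    """Replace identical regular files under root with hard links.

    Returns the number of files that were replaced by a link.
    """
    # Group regular files by size first; only same-size files can match.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if stat.S_ISREG(st.st_mode):
                by_size[st.st_size].append(path)

    relinked = 0
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        masters = {}  # digest -> path that duplicates get linked to
        for path in paths:
            digest = file_digest(path)
            master = masters.get(digest)
            if master is None:
                masters[digest] = path
                continue
            if os.path.samefile(master, path):
                continue  # already the same inode
            tmp = path + ".dedup-tmp"  # hypothetical temporary name
            try:
                os.link(master, tmp)
            except OSError as e:
                if e.errno == errno.EMLINK:
                    # Link limit reached: this file becomes the new master.
                    masters[digest] = path
                    continue
                raise
            os.replace(tmp, path)  # swap the duplicate for the new link
            relinked += 1
    return relinked
```

Note that the replaced files take on the ownership, permissions and
timestamps of the file they are linked to, since they become the same inode.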

Note that with long filenames (surprisingly common!), one failure case
noted above is just 13 hard links.


Looks like I'm still stuck on ext4 for the time being, using an
impoverished "cp -l" for a fast 'snapshot'... (Or alternatively, an LVM
snapshot and copy.)
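For reference, a "cp -l" style snapshot (a new directory tree in which every
regular file is hard-linked to the original) can be sketched in Python with
shutil.copytree, passing os.link as the copy function. hardlink_snapshot is
just an illustrative name.

```python
import os
import shutil
import sys


def hardlink_snapshot(src, dst):
    """Roughly what `cp -al src dst` does: recreate the directory tree
    under dst (which must not exist yet) and hard-link every file."""
    # copytree creates the directories itself and calls
    # copy_function(src_file, dst_file) per file; os.link has that shape.
    # symlinks=True recreates symlinks as symlinks instead of following them.
    shutil.copytree(src, dst, symlinks=True, copy_function=os.link)


if __name__ == "__main__":
    hardlink_snapshot(sys.argv[1], sys.argv[2])
```

That is what makes it 'impoverished': both trees share the same inodes, so
anything that modifies a file in place changes the 'snapshot' too; only
tools that write a new file and rename it over the old one leave the
snapshot intact.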


For btrfs, rather than a "break everything" format change, could a neat
and robust 'workaround' be made so that, in the problem case, hard links
to a file within the same directory spawn their own transparent
subdirectory to hold the extra links? The worst case then is that, after
a downgrade to an older kernel, the 'transparent' subdirectory of hard
links becomes visible as a distinct subdirectory. (That is a 'break', but
at least no data is lost.)

Or am I chasing the wrong bits? ;-)


More seriously: the killer for me is that running rsync or a
deduplication script might hit the 'too many hard links' limit on a set
of files that was perfectly fine on ext4.
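Before moving such a set of files from ext4 to btrfs, one could scan it for
files that have many names within a single directory. This is a minimal
sketch that assumes, as above, that the trouble case is hard links to one
file within the same directory; the function names and the default
threshold of 10 are arbitrary choices for illustration.

```python
import os
import stat
import sys
from collections import Counter


def same_dir_link_counts(root):
    """Yield (directory, inode, names) for every regular file that has
    more than one name inside the same directory under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        counts = Counter()
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                counts[(st.st_dev, st.st_ino)] += 1
        for (_dev, ino), names in counts.items():
            if names > 1:
                yield dirpath, ino, names


def risky_dirs(root, threshold=10):
    """List entries whose same-directory name count reaches threshold."""
    return [entry for entry in same_dir_link_counts(root)
            if entry[2] >= threshold]


if __name__ == "__main__":
    for dirpath, ino, names in risky_dirs(sys.argv[1]):
        print(f"{dirpath}: inode {ino} has {names} names in this directory")
```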

Regards,
Martin



Thread overview: 3+ messages
2012-05-26 18:22 Will big metadata blocks fix # of hardlinks? Sami Liedes
2012-05-29 13:09 ` Martin [this message]
2012-05-29 13:23   ` Hugo Mills
