linux-btrfs.vger.kernel.org archive mirror
* Many questions from a potential btrfs user
@ 2013-10-14  2:54 Rogério Brito
From: Rogério Brito @ 2013-10-14  2:54 UTC
  To: linux-btrfs

Hi.

I am seriously considering employing btrfs on my systems, particularly due
to some space-saving features that it has (namely, deduplication and
compression).

In fact, I was (a few moments ago) trying to back up some of my systems to a
2TB HD that has an ext4 filesystem and, in the middle of the last one, I got
the error message that the backup HD was full.

The systems that I back up there contain some data that is present multiple
times (e.g., my mailbox, which is synced via offlineimap, or videos that I
download from online learning sites). Such data consists of either many
small, highly compressible files (the e-mails) or large files (the videos),
so I would like to employ btrfs.

So, after reading the documentation on https://btrfs.wiki.kernel.org/, I am
still unsure of some points and I would like to have some clarifications
and/or expectations set straight.


* I understand that I can convert an ext4 filesystem to btrfs. Will such
  conversion work with an almost full ext4 filesystem? How much overhead
  will be needed to perform the conversion? I can (temporarily) remove some
  files that already are on this backup.

* Is it possible to deduplicate the files that are already on it? As
  mentioned before, there are likely to be many duplicates, and some of them
  are on the order of 1 to 2GB each.

* Doing a defragmentation with the filesystem mounted with compression will
  recompress the files (if they are deemed compressible by the
  filesystem). Is that understanding correct?  Will identical compressed
  blocks across many files also be deduplicated?

* How exactly do the recently merged offline deduplication features in the
  kernel interact with what was (in my limited understanding) already
  possible with userspace tools like <https://github.com/g2p/bedup>?  Are
  such third-party tools likely to be integrated into btrfs-progs, or are
  they supposed to be kept separate?

* Does the new deduplication support change the on-disk format? Putting it
  another way, will it be safe to go back to a previous kernel if there is
  some problem with the current kernels? (Not that I necessarily want to go
  back to a previous kernel, but sometimes one needs to, say, git bisect
  the kernel.)

* I most likely *don't* want to use online deduplication (given my bad
  experiences with ZFS).  With that in mind, is the current userspace
  deduplication intended to be run as a cron job? Is the offline
  deduplication too memory intensive?  How much RAM would be needed for a
  2TB filesystem? Is 2GB enough? How about 4GB?

* Will further runs of the offline deduplication be "incremental" in some
  imprecise sense of the word? That is, if I run the deduplication once and
  immediately run it again (supposing nothing changes), will the 2nd time be
  faster than the first?  (If the disk caches are dropped?)

* Will I be able to add further HDs to my btrfs filesystem, once I get some
  more money to run something like a RAID0 configuration? If I get more HDs
  later, will I be able to change the configuration to, say, RAID5 or RAID6?
  I don't intend to use lvm, unless I have to.
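
To get a rough idea of how much whole-file duplication there is on the
backup disk before converting anything, something like the following Python
sketch could be used. It is only an estimate under stated assumptions: it
looks for files that are byte-for-byte identical as a whole (by size, then
SHA-256), which is not the same as the extent-level deduplication that btrfs
or bedup would perform, and the function names (file_digest,
find_duplicates, reclaimable_bytes) are made up for illustration and are not
part of any btrfs tool.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return a list of (size, [paths]) groups of identical files under root.

    Symlinks, empty files and additional hard links to an already seen
    inode are skipped, since they do not consume extra data space.
    """
    by_size = defaultdict(list)
    seen_inodes = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            st = os.stat(path)
            key = (st.st_dev, st.st_ino)
            if key in seen_inodes:
                continue
            seen_inodes.add(key)
            by_size[st.st_size].append(path)

    groups = []
    for size, paths in by_size.items():
        # Only files of equal, non-zero size can be duplicates worth hashing.
        if size == 0 or len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        for same in by_hash.values():
            if len(same) > 1:
                groups.append((size, sorted(same)))
    return groups


def reclaimable_bytes(groups):
    """Bytes that would be freed if each group kept only one copy."""
    return sum(size * (len(paths) - 1) for size, paths in groups)
```

Calling reclaimable_bytes(find_duplicates("/mnt/backup")) (the mount point
is just an example) would give an upper bound on what whole-file
deduplication alone could recover. Hashing only files that share a size
keeps the number of full reads down, which matters on a 2TB disk.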


I think that I had other questions, but since it is now past bedtime, I
can't remember them. :)

Any further comments and/or guidance will be gladly accepted.


Thanks in advance,

Rogério Brito.


-- 
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA
http://cynic.cc/blog/ : github.com/rbrito : profiles.google.com/rbrito
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br
