* [Announce] bup 0.09: git-based backup system for really huge datasets
From: Avery Pennarun @ 2010-02-09 22:48 UTC
To: Git Mailing List
Hi all,
bup is a file backup tool based on the git packfile format. If you're
interested in git, you might find bup interesting because:
- It can handle really massive datasets (hundreds of gigabytes)
without melting down.
- It can handle huge individual files (hundreds of gigabytes), such as
virtual machine images or giant textual database dumps, while neither
wasting disk space nor bogging down in xdelta.
- It can back up files directly to a remote server, without creating
git objects on the local system first.
- It uses a different format for its index file (.bup/bupindex) that
allows you to search and iterate non-linearly. Thus if you have a
filesystem with a million files and only one of them is marked dirty,
bup can back it up near-instantly.
- Like git, it separates the concept of indexing the filesystem from
the concept of actually making new commits. Thus it would be easy to
plug in an inotify-like system eventually, avoiding the slow filesystem
iteration every time you want to make a backup.
- It introduces a "multi-index" file (midx) that has a sorted list of
the objects from multiple .pack files, so that checking for a
nonexistent object only needs to swap in two pages at most. (This is
unimportant in git, but critical when most of your work is ingesting
huge files whose sha1sums haven't been seen before.)
- It provides a FUSE-based filesystem so that you can easily browse
your backup history, including exporting it via samba if you want.
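To illustrate the midx point in the list above, here is a minimal sketch in Python. It assumes object ids are raw 20-byte sha1 digests held in memory; it is not bup's actual on-disk midx format, and build_midx and midx_contains are hypothetical names used only for illustration. The idea it shows is that one merged, sorted list answers "have we seen this object?" with a single binary search, instead of one search per .pack index.

```python
import bisect
import hashlib


def sha1(data):
    """Return the raw 20-byte sha1 digest of data."""
    return hashlib.sha1(data).digest()


def build_midx(pack_indexes):
    """Merge the object ids of several pack indexes into one sorted list."""
    merged = set()
    for ids in pack_indexes:
        merged.update(ids)
    return sorted(merged)


def midx_contains(midx, object_id):
    """Binary-search the merged list; one lookup covers every pack."""
    pos = bisect.bisect_left(midx, object_id)
    return pos < len(midx) and midx[pos] == object_id


# Two pretend packs, each with 1000 object ids (illustrative data only).
pack_a = [sha1(b"a%d" % i) for i in range(1000)]
pack_b = [sha1(b"b%d" % i) for i in range(1000)]
midx = build_midx([pack_a, pack_b])

print(midx_contains(midx, sha1(b"a7")))          # id present in pack_a
print(midx_contains(midx, sha1(b"never seen")))  # id absent from both packs
```

With a sorted list on disk, a miss only has to touch the few pages that the binary search narrows down to, which is presumably why the lookup cost stays small for objects that have never been seen before.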
bup doesn't yet back up extra file metadata (beyond what git already
tracks). Obviously this will be needed relatively soon.
bup is still pretty experimental, but it's already a useful tool for
backing up your files, even if those files include millions of files
and hundreds of gigs of VM images.
You can find the source code (and README) at github:
http://github.com/apenwarr/bup
To subscribe to the bup mailing list, send an email to:
bup-list+subscribe@googlegroups.com
Looking forward to everyone's feedback.
Have fun,
Avery
* Re: [Announce] bup 0.09: git-based backup system for really huge datasets
From: Jakub Narebski @ 2010-02-10 9:54 UTC
To: Avery Pennarun; +Cc: Git Mailing List
Avery Pennarun <apenwarr@gmail.com> writes:
> bup is a file backup tool based on the git packfile format.
[...]
> bup is still pretty experimental, but it's already a useful tool for
> backing up your files, even if those files include millions of files
> and hundreds of gigs of VM images.
>
> You can find the source code (and README) at github:
>
> http://github.com/apenwarr/bup
>
> To subscribe to the bup mailing list, send an email to:
>
> bup-list+subscribe@googlegroups.com
>
> Looking forward to everyone's feedback.
Would you consider adding a short description of your project to the
following wiki page?
http://git.wiki.kernel.org/index.php/InterfacesFrontendsAndTools
--
Jakub Narebski
Poland
ShadeHawk on #git
* Re: [Announce] bup 0.09: git-based backup system for really huge datasets
From: Avery Pennarun @ 2010-02-10 20:01 UTC
To: Jakub Narebski; +Cc: Git Mailing List
On Wed, Feb 10, 2010 at 4:54 AM, Jakub Narebski <jnareb@gmail.com> wrote:
> Avery Pennarun <apenwarr@gmail.com> writes:
>> bup is a file backup tool based on the git packfile format.
> [...]
>> bup is still pretty experimental, but it's already a useful tool for
>> backing up your files, even if those files include millions of files
>> and hundreds of gigs of VM images.
>>
>> You can find the source code (and README) at github:
>>
>> http://github.com/apenwarr/bup
>>
>> To subscribe to the bup mailing list, send an email to:
>>
>> bup-list+subscribe@googlegroups.com
>>
>> Looking forward to everyone's feedback.
>
> Would you be adding short info about your project to
> http://git.wiki.kernel.org/index.php/InterfacesFrontendsAndTools
Done. Thanks for the reminder!
Have fun,
Avery
* Re: [Announce] bup 0.09: git-based backup system for really huge datasets
From: Stephen R. van den Berg @ 2010-02-11 13:51 UTC
To: Avery Pennarun; +Cc: Git Mailing List
Avery Pennarun wrote:
>bup is a file backup tool based on the git packfile format. If you're
>interested in git, you might find bup interesting because:
Interesting concept. It has some killer features which make it a good
competitor to any of the existing solutions.
The only obvious thing missing for unattended backup-operation is a way
to purge specific or old backups.
--
Sincerely,
Stephen R. van den Berg.
* Re: [Announce] bup 0.09: git-based backup system for really huge datasets
From: Avery Pennarun @ 2010-02-12 17:51 UTC
To: Stephen R. van den Berg; +Cc: Git Mailing List
On Thu, Feb 11, 2010 at 8:51 AM, Stephen R. van den Berg <srb@cuci.nl> wrote:
> Avery Pennarun wrote:
>>bup is a file backup tool based on the git packfile format. If you're
>>interested in git, you might find bup interesting because:
>
> Interesting concept. It has some killer features which make it a good
> competitor to any of the existing solutions.
> The only obvious thing missing for unattended backup-operation is a way
> to purge specific or old backups.
Thanks. Oddly enough, pruning of old backups hasn't been a really
high priority for me (or apparently any of the other users) because
chunking-based deduplication is so efficient that my backup disk
hasn't filled up yet :) But it's clear that this will need to be
added eventually.
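For readers unfamiliar with chunking-based deduplication, here is a rough sketch in Python of the general idea. It is an illustration under stated assumptions, not bup's actual splitter: the plain rolling byte sum, the WINDOW and MASK values, and the names split_chunks and backup are all hypothetical. Chunk boundaries are chosen from the content of the last few bytes rather than from fixed offsets, so inserting a byte near the start of a file shifts the later boundaries along with the data and the later chunks hash the same as before.

```python
import hashlib
import random

WINDOW = 64   # bytes covered by the rolling sum (assumed value)
MASK = 0x3F   # split when the low 6 bits of the sum are all ones (assumed)


def split_chunks(data):
    """Split data at content-defined boundaries using a rolling byte sum."""
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]  # drop the byte leaving the window
        # The boundary test depends only on the last WINDOW bytes.
        if i >= WINDOW - 1 and (rolling & MASK) == MASK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last boundary
    return chunks


def backup(store, data):
    """Add unseen chunks to the store; return the ordered list of chunk ids."""
    ids = []
    for chunk in split_chunks(data):
        chunk_id = hashlib.sha1(chunk).digest()
        store.setdefault(chunk_id, chunk)  # already-known chunks cost nothing
        ids.append(chunk_id)
    return ids


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(100_000))
edited = b"X" + original  # one byte inserted at the very front

store = {}
first = backup(store, original)
size_after_first = len(store)
second = backup(store, edited)
new_chunks = len(store) - size_after_first

print(len(first), "chunks in the first backup")
print(new_chunks, "new chunks stored by the second backup")
```

Only the chunks that overlap the inserted byte need to be stored again; everything after them is found in the store by its sha1 and reused, which is why a second backup of a slightly changed file adds very little data.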
Unfortunately git's normal pruning and gc stuff is inapplicable since
it dies horribly when faced with hundreds of gigabytes of data.
That's to be expected, but it means I can't just cheat by running 'git
gc' and hoping for magic.
Have fun,
Avery