git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Roberto Tyley <roberto.tyley@gmail.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, jrnieder@gmail.com,
	Roberto Tyley <roberto.tyley@gmail.com>
Subject: [PATCH v2] docs: add filter-branch notes on The BFG
Date: Wed, 18 Dec 2013 14:25:16 +0000	[thread overview]
Message-ID: <1387376716-72615-2-git-send-email-roberto.tyley@gmail.com> (raw)
In-Reply-To: <1387376716-72615-1-git-send-email-roberto.tyley@gmail.com>

The BFG is a tool specifically designed for the task of removing
unwanted data from Git repository history - a common use-case for which
git-filter-branch has been the traditional workhorse.

It's beneficial to let users know that filter-branch has an alternative
here:

* speed : The BFG is 10-50x faster
  http://rtyley.github.io/bfg-repo-cleaner/#speed
* complexity of configuration : filter-branch is a very flexible tool,
  but demands very careful usage in order to get the desired results
  http://rtyley.github.io/bfg-repo-cleaner/#examples

Obviously, filter-branch has it's advantages too - it permits very
complex rewrites, and doesn't require a JVM - but for the common
use-case of deleting unwanted data, it's helpful to users to be aware
that an alternative exists.

The BFG was released under the GPL in February 2013, and has since seen
widespread production use (The Guardian, RedHat, Google, UK Government
Digital Service), been tested against large repos (~300K commits, ~5GB
packfiles) and received significant positive feedback from users:

http://rtyley.github.io/bfg-repo-cleaner/#feedback

Signed-off-by: Roberto Tyley <roberto.tyley@gmail.com>
---
 Documentation/git-filter-branch.txt | 33 ++++++++++++++++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-filter-branch.txt b/Documentation/git-filter-branch.txt
index e4c8e82..2eba627 100644
--- a/Documentation/git-filter-branch.txt
+++ b/Documentation/git-filter-branch.txt
@@ -393,7 +393,7 @@ git filter-branch --index-filter \
 Checklist for Shrinking a Repository
 ------------------------------------
 
-git-filter-branch is often used to get rid of a subset of files,
+git-filter-branch can be used to get rid of a subset of files,
 usually with some combination of `--index-filter` and
 `--subdirectory-filter`.  People expect the resulting repository to
 be smaller than the original, but you need a few more steps to
@@ -429,6 +429,37 @@ warned.
   (or if your git-gc is not new enough to support arguments to
   `--prune`, use `git repack -ad; git prune` instead).
 
+Notes
+-----
+
+git-filter-branch allows you to make complex shell-scripted rewrites
+of your Git history, but you probably don't need this flexibility if
+you're simply _removing unwanted data_ like large files or passwords.
+For those operations you may want to consider
+link:http://rtyley.github.io/bfg-repo-cleaner/[The BFG Repo-Cleaner],
+a JVM-based alternative to git-filter-branch, typically at least
+10-50x faster for those use-cases, and with quite different
+characteristics:
+
+* Any particular version of a file is cleaned exactly _once_. The BFG,
+  unlike git-filter-branch, does not give you the opportunity to
+  handle a file differently based on where or when it was committed
+  within your history. This constraint gives the core performance
+  benefit of The BFG, and is well-suited to the task of cleansing bad
+  data - you don't care _where_ the bad data is, you just want it
+  _gone_.
+
+* By default The BFG takes full advantage of multi-core machines,
+  cleansing commit file-trees in parallel. git-filter-branch cleans
+  commits sequentially (ie in a single-threaded manner), though it
+  _is_ possible to write filters that include their own parallellism,
+  in the scripts executed against each commit.
+
+* The link:http://rtyley.github.io/bfg-repo-cleaner/#examples[command options]
+  are much more restrictive than git-filter branch, and dedicated just
+  to the tasks of removing unwanted data- e.g:
+  `--strip-blobs-bigger-than 1M`.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
-- 
1.8.3.4 (Apple Git-47)

  reply	other threads:[~2013-12-18 14:25 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-17 10:53 [PATCH] docs: add filter-branch note about The BFG Roberto Tyley
2013-12-17 18:13 ` Junio C Hamano
2013-12-18  1:04   ` Roberto Tyley
2013-12-18  5:57     ` Junio C Hamano
2013-12-18 14:25       ` [PATCH v2] Tweaked notes on gfb<->bfg differences Roberto Tyley
2013-12-18 14:25         ` Roberto Tyley [this message]
2013-12-17 18:40 ` [PATCH] docs: add filter-branch note about The BFG Jonathan Nieder

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1387376716-72615-2-git-send-email-roberto.tyley@gmail.com \
    --to=roberto.tyley@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jrnieder@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).