From: Junio C Hamano <gitster@pobox.com>
To: Roberto Tyley <roberto.tyley@gmail.com>
Cc: git@vger.kernel.org, peff@peff.net, tr@thomasrast.ch
Subject: Re: [PATCH] docs: add filter-branch note about The BFG
Date: Tue, 17 Dec 2013 10:13:42 -0800
Message-ID: <xmqqk3f3mjl5.fsf@gitster.dls.corp.google.com>
In-Reply-To: <1387277599-69719-1-git-send-email-roberto.tyley@gmail.com> (Roberto Tyley's message of "Tue, 17 Dec 2013 10:53:19 +0000")

Roberto Tyley <roberto.tyley@gmail.com> writes:

> The BFG is a tool specifically designed for the task of removing
> unwanted data from Git repository history - a common use-case for which
> git-filter-branch has been the traditional workhorse.
>
> It's beneficial to let users know that filter-branch has an alternative
> here:
>
> * speed : The BFG is 10-50x faster
>   http://rtyley.github.io/bfg-repo-cleaner/#speed
> * complexity of configuration : filter-branch is a very flexible tool,
>   but demands very careful usage in order to get the desired results
>   http://rtyley.github.io/bfg-repo-cleaner/#examples
>
> Obviously, filter-branch has its advantages too - it permits very
> complex rewrites, and doesn't require a JVM - but for the common
> use-case of deleting unwanted data, it's helpful to users to be aware
> that an alternative exists.
>
> The BFG was released under the GPL in February 2013, and has since seen
> widespread production use (The Guardian, RedHat, Google, UK Government
> Digital Service), been tested against large repos (~300K commits, ~5GB
> packfiles) and received significant positive feedback from users:
>
> http://rtyley.github.io/bfg-repo-cleaner/#feedback
>
> Signed-off-by: Roberto Tyley <roberto.tyley@gmail.com>
> ---
>  Documentation/git-filter-branch.txt | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/git-filter-branch.txt b/Documentation/git-filter-branch.txt
> index e4c8e82..918e965 100644
> --- a/Documentation/git-filter-branch.txt
> +++ b/Documentation/git-filter-branch.txt
> @@ -18,6 +18,12 @@ SYNOPSIS
>  
>  DESCRIPTION
>  -----------
> +
> +NOTE: For simply removing unwanted data from repository history, you may
> +want to use link:http://rtyley.github.io/bfg-repo-cleaner/[The BFG Repo-Cleaner]
> +instead - it's generally faster and simpler for eliminating large files
> +or private data.
> +

My understanding is that the primary speed-up of BFG comes from its
design decision to filter each blob only once, unlike filter-branch,
which allows you to (and forces you to) decide how the same blob is
filtered depending on where it appears in space (i.e. the path in the
project's directory hierarchy) and time (i.e. the commit it appears
in).  For "removing unwanted data", I think nobody needs the
flexibility to filter differently depending on the context, and it is
a good idea to refer those who only need that to BFG.
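To make the difference concrete, here is a toy sketch in Python.  It is
not taken from either tool; the history model (each commit as a dict of
path -> blob id) and the names rewrite_per_occurrence, rewrite_per_blob
and scrub are hypothetical.  A filter-branch style callback receives the
commit and path along with the content, so it has to run for every
occurrence; a per-blob callback receives only the content, so its result
can be cached by blob id and reused wherever that blob appears.

```python
# Toy model (hypothetical, not how git-filter-branch or The BFG are written):
# each commit is a dict of path -> blob id, and blob contents live in `blobs`.
from typing import Callable, Dict, List, Tuple

Tree = Dict[str, str]      # path -> blob id
Blobs = Dict[str, bytes]   # blob id -> content
Rewritten = List[Dict[str, bytes]]


def rewrite_per_occurrence(
    history: List[Tree], blobs: Blobs,
    filt: Callable[[int, str, bytes], bytes],
) -> Tuple[Rewritten, int]:
    """filter-branch style: the filter sees (commit index, path, content),
    so it must run once per occurrence of a blob in every commit."""
    calls = 0
    out: Rewritten = []
    for i, tree in enumerate(history):
        new_tree = {}
        for path, blob_id in tree.items():
            new_tree[path] = filt(i, path, blobs[blob_id])
            calls += 1
        out.append(new_tree)
    return out, calls


def rewrite_per_blob(
    history: List[Tree], blobs: Blobs,
    filt: Callable[[bytes], bytes],
) -> Tuple[Rewritten, int]:
    """Per-blob style: the filter sees only the content, so its result
    can be cached by blob id and reused wherever that blob appears."""
    cache: Dict[str, bytes] = {}
    out: Rewritten = []
    for tree in history:
        new_tree = {}
        for path, blob_id in tree.items():
            if blob_id not in cache:
                cache[blob_id] = filt(blobs[blob_id])
            new_tree[path] = cache[blob_id]
        out.append(new_tree)
    return out, len(cache)  # each cache entry is exactly one filter call


def scrub(data: bytes) -> bytes:
    # Hypothetical "unwanted data" removal: blank out a leaked secret.
    return data.replace(b"hunter2", b"***REMOVED***")


blobs: Blobs = {"a1": b"hello", "s1": b"password=hunter2"}
history: List[Tree] = [
    {"README": "a1", "config": "s1"},
    {"README": "a1", "config": "s1"},
    {"README": "a1", "config": "s1", "docs/README": "a1"},
]

h1, n1 = rewrite_per_occurrence(history, blobs, lambda i, path, data: scrub(data))
h2, n2 = rewrite_per_blob(history, blobs, scrub)
print(n1, n2)  # 7 2
```

Both produce the same rewritten history here, but the per-blob version
runs the filter twice instead of seven times.  The cache is only valid
because the per-blob filter cannot see the path or commit, which is
exactly the flexibility given up in exchange for speed.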

Having said that, "You may want to use ..." without giving the
reason why we recommend the other tool leaves the reader wondering
what the pros and cons are, and why git-filter-branch exists at all
if BFG is the first thing its documentation recommends, even before
it describes what git-filter-branch is and does.  "You may want to
check ..." might be slightly better, but probably not by much.

The "it's generally faster ..." part may need to be rewritten to give
a bit more information, so that readers can weigh the pros and cons
themselves.

>  Lets you rewrite Git revision history by rewriting the branches mentioned
>  in the <rev-list options>, applying custom filters on each revision.
>  Those filters can modify each tree (e.g. removing a file or running
> @@ -393,7 +399,7 @@ git filter-branch --index-filter \
>  Checklist for Shrinking a Repository
>  ------------------------------------
>  
> -git-filter-branch is often used to get rid of a subset of files,
> +git-filter-branch can be used to get rid of a subset of files,
>  usually with some combination of `--index-filter` and
>  `--subdirectory-filter`.  People expect the resulting repository to
>  be smaller than the original, but you need a few more steps to
> @@ -429,6 +435,12 @@ warned.
>    (or if your git-gc is not new enough to support arguments to
>    `--prune`, use `git repack -ad; git prune` instead).
>  
> +SEE ALSO
> +--------
> +link:http://rtyley.github.io/bfg-repo-cleaner/[The BFG Repo-Cleaner]
> +- a tool specifically designed for removing unwanted data from Git
> +repository history.
> +
>  GIT
>  ---
>  Part of the linkgit:git[1] suite

Thread overview: 7+ messages
2013-12-17 10:53 [PATCH] docs: add filter-branch note about The BFG Roberto Tyley
2013-12-17 18:13 ` Junio C Hamano [this message]
2013-12-18  1:04   ` Roberto Tyley
2013-12-18  5:57     ` Junio C Hamano
2013-12-18 14:25       ` [PATCH v2] Tweaked notes on gfb<->bfg differences Roberto Tyley
2013-12-18 14:25         ` [PATCH v2] docs: add filter-branch notes on The BFG Roberto Tyley
2013-12-17 18:40 ` [PATCH] docs: add filter-branch note about " Jonathan Nieder
