git.vger.kernel.org archive mirror
From: Jeff King <peff@peff.net>
To: Martin Scherer <m.scherer@fu-berlin.de>
Cc: git@vger.kernel.org
Subject: Re: Blobs not referenced by file (anymore) are not removed by GC
Date: Tue, 9 Dec 2014 09:14:57 -0500
Message-ID: <20141209141457.GA18544@peff.net>
In-Reply-To: <5485D03F.3060008@fu-berlin.de>

On Mon, Dec 08, 2014 at 05:22:23PM +0100, Martin Scherer wrote:

> # invoke bfg --delete-folders something multiple times with different
> # patterns.
> 
> # try to cleanup
> 
> git gc --aggressive --prune=now # big blobs still in history
> git fsck # no results
> git fsck --full  --unreachable --dangling # no results

Might you still have reflogs pointing to the objects? Try:

  git reflog expire --expire-unreachable=now --all
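To see the reflog effect in isolation, here is a small sketch in a throwaway repository. The temporary directory, the demo identity and the file and branch names are all made up for the example. A blob that no branch references any more still survives "git gc --prune=now" as long as HEAD's reflog mentions the commit that added it, and is dropped once the unreachable reflog entries are expired.

```shell
#!/bin/sh
# Sketch: a reflog entry alone keeps a dropped blob alive through
# "git gc --prune=now"; expiring unreachable reflog entries lets gc delete it.
set -e

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q .
# Throwaway identity so the demo commits work anywhere (placeholder values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo small >keep.txt
git add keep.txt
git commit -q -m base

# Commit a "big" file on a side branch, then throw the branch away,
# standing in for a history rewrite that dropped the file.
git checkout -q -b side
seq 1 100000 >big.bin
git add big.bin
git commit -q -m "add big file"
big=$(git rev-parse HEAD:big.bin)
git checkout -q -
git branch -q -D side

# No branch points at the side commit now, but HEAD's reflog still does.
git gc -q --prune=now
if git cat-file -e "$big" 2>/dev/null; then before=present; else before=gone; fi

# Expire reflog entries whose commits are unreachable, then prune again.
git reflog expire --expire-unreachable=now --all
git gc -q --prune=now
if git cat-file -e "$big" 2>/dev/null; then after=present; else after=gone; fi

echo "big blob before reflog expire: $before; after: $after"
```

In a real repository only the last two git commands matter; the rest of the script just builds something for them to clean up.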

I also don't know whether BFG keeps backup refs around (filter-branch, for
example, writes a copy of the original refs into refs/original; you
would want to delete those if you're trying to slim down the repo).
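If refs/original backups do exist, one way to drop them all at once is the for-each-ref / update-ref --stdin pipeline that the git-filter-branch documentation suggests. Since it is not established here whether BFG writes such refs at all, the sketch below fakes a filter-branch style backup ref with update-ref in a scratch repository; the repository, identity and file name are placeholders.

```shell
#!/bin/sh
# Sketch: delete filter-branch style backup refs under refs/original/.
# The backup ref is simulated with update-ref so the demo is self-contained.
set -e

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
echo hello >file.txt
git add file.txt
git commit -q -m initial

# Simulate the copy of the pre-rewrite branch that filter-branch leaves behind.
branch=$(git symbolic-ref --short HEAD)
git update-ref "refs/original/refs/heads/$branch" HEAD

# Delete every ref under refs/original/ in one update-ref transaction.
git for-each-ref --format='delete %(refname)' refs/original/ |
  git update-ref --stdin

left=$(git for-each-ref refs/original/)
echo "refs left under refs/original/: ${left:-none}"
```

Deleting the refs only removes the pointers; the objects they kept alive still disappear only after the reflogs are expired and gc is run with --prune=now.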

In general, you can see the on-disk size of the objects required for a
particular ref with something like:

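  # Sum the on-disk size, in bytes, of every object reachable from the
  # given revision arguments. rev-list prints "<sha> [<path>]" per
  # object; cut keeps the sha, cat-file looks up each object's size on
  # disk, and perl adds them up (printing 0 if there was nothing).
  size() {
    git rev-list --objects "$@" |
    cut -d' ' -f1 |
    git cat-file --batch-check='%(objectsize:disk)' |
    perl -lne '$t += $_; END { print $t + 0 }'
  }

  # size of master branch
  size master

  # size of each ref on top of what is in the master branch
  git for-each-ref --format='%(refname)' |
  while read -r ref; do
    echo "$(size "master..$ref") $ref"
  done | sort -rn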


Note that these sizes are somewhat approximate. We may store object X
needed by one ref as a delta against object Y used by another ref, in
which case the accounting shows X as tiny compared to Y, even though a
later repack may find the delta in the opposite direction. But if you're
rewriting history to drop a bunch of gigantic objects, the output of the
final loop is a good way to see which refs still refer to the old
history.
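Once the final loop points at a ref that is still large, a natural next step is to ask which objects account for it. The helper below (the name "biggest" and its argument order are made up for this sketch) feeds the same rev-list output into cat-file, but keeps each object's path via %(rest) and sorts by on-disk size; the same delta caveat applies to the numbers it prints. The demo repository and the "old-history" branch are placeholders.

```shell
#!/bin/sh
# Sketch: list the largest objects (by on-disk size) that a ref carries on
# top of a base ref, together with the path rev-list reports for each blob.
set -e

# Hypothetical helper. Usage: biggest <base> <ref> [<count>]
biggest() {
  git rev-list --objects "$1..$2" |
  git cat-file --batch-check='%(objectsize:disk) %(objectname) %(rest)' |
  sort -rn |
  head -n "${3:-10}"
}

# Self-contained demo repository.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
echo small >keep.txt
git add keep.txt
git commit -q -m base
base=$(git symbolic-ref --short HEAD)

# A branch that still carries a large file the rewrite was meant to drop.
git checkout -q -b old-history
seq 1 100000 >big.bin
git add big.bin
git commit -q -m "add big file"

top=$(biggest "$base" old-history 1)
echo "$top"
```

Against a real repository you would call it directly, e.g. "biggest master refs/heads/some-branch 20" to see the twenty largest objects that branch adds on top of master.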

-Peff

Thread overview: 9+ messages
2014-12-08 16:22 Blobs not referenced by file (anymore) are not removed by GC Martin Scherer
     [not found] ` <CAFY1edaEq1zYV0vgSfiPAXU6bqVBzaA-apVnSn8DBMbzcAa2tQ@mail.gmail.com>
2014-12-08 16:47   ` Roberto Tyley
2014-12-09 14:14 ` Jeff King [this message]
2014-12-09 16:01   ` Roberto Tyley
2014-12-09 16:11     ` Jeff King
2014-12-09 22:15       ` Roberto Tyley
2014-12-10  7:11         ` Jeff King
2014-12-10 16:07           ` Junio C Hamano
2014-12-10 23:41             ` Roberto Tyley
