From: Jeff King <peff@peff.net>
To: Martin Scherer <m.scherer@fu-berlin.de>
Cc: git@vger.kernel.org
Subject: Re: Blobs not referenced by file (anymore) are not removed by GC
Date: Tue, 9 Dec 2014 09:14:57 -0500
Message-ID: <20141209141457.GA18544@peff.net>
In-Reply-To: <5485D03F.3060008@fu-berlin.de>

On Mon, Dec 08, 2014 at 05:22:23PM +0100, Martin Scherer wrote:

> # invoke bfg --delete-folders something multiple times with different
> # pattern.
> 
> # try to cleanup
> 
> git gc --aggressive --prune=now # big blobs still in history
> git fsck # no results
> git fsck --full  --unreachable --dangling # no results

Might you still have reflogs pointing to the objects? Try:

  git reflog expire --expire-unreachable=now --all
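
Expiring the reflog only drops the references; the objects themselves
go away when gc prunes them, so re-run gc afterwards. A minimal sketch
of the two steps together (purge_unreachable is a made-up name, and it
assumes you really do want every unreachable object gone right now):

```shell
# Hypothetical helper; run it inside the repository you rewrote.
purge_unreachable() {
  # Drop reflog entries that no current ref tip can reach
  # ("now" expires every such entry, however recent).
  git reflog expire --expire-unreachable=now --all &&
  # Repack, then delete unreachable objects immediately instead of
  # waiting out the default two-week grace period.
  git gc --prune=now --quiet
}
```

Afterwards, "git count-objects -v" should report a smaller size-pack.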

I also don't know if BFG keeps backup refs around (filter-branch, for
example, writes a copy of the original refs into refs/original; you
would want to delete that if you're trying to slim down the repo).
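
If the rewrite did leave backup refs behind, something like this would
remove them. It is only a sketch: delete_backup_refs is a made-up name,
and it defaults to filter-branch's refs/original/ because I don't know
where (or whether) BFG keeps backups of its own.

```shell
# Hypothetical helper: delete every ref under a prefix, by default the
# refs/original/ hierarchy that git filter-branch writes.
delete_backup_refs() {
  prefix=${1:-refs/original/}
  git for-each-ref --format='%(refname)' "$prefix" |
  while read -r ref; do
    git update-ref -d "$ref"
  done
}
```

Run the reflog expire and gc afterwards so the objects those refs
pinned can actually be pruned.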

In general, you can see the on-disk size of the objects required for a
particular ref with something like:

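  size() {
    # List every object reachable from the given revisions ("sha path"),
    # keep only the object names, and ask for each object's on-disk
    # (compressed or deltified) size.
    git rev-list --objects "$@" |
    cut -d' ' -f1 |
    git cat-file --batch-check='%(objectsize:disk)' |
    # Sum the sizes; "+ 0" prints 0 instead of an empty line when the
    # range contains no objects.
    perl -lne '$t += $_; END { print $t + 0 }'
  }

  # size of master branch
  size master

  # size of each ref on top of what is in the master branch
  git for-each-ref --format='%(refname)' |
  while read -r ref; do
    echo "$(size "master..$ref") $ref"
  done | sort -rn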


Note that these sizes are somewhat approximate. We may store object X
needed by one ref as a delta against object Y used by another ref, in
which case the accounting shows X as tiny compared to Y; a later repack
may then choose the delta in the opposite direction. But if you're
talking about rewriting history to drop a bunch of gigantic objects, the
output of the final loop is a good way to see which refs still refer to
the old history.
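
Once you know which refs are still heavy, you can list the largest
blobs they reach, along with the path each was found under, by asking
cat-file for the type and the rest of each rev-list line as well.
largest_blobs is again a hypothetical helper name, and the sizes carry
the same delta caveat as above:

```shell
# Hypothetical helper: print the N largest blobs (by on-disk size)
# reachable from the given revisions, as "size sha path".
largest_blobs() {
  n=$1; shift
  git rev-list --objects "$@" |
  git cat-file --batch-check='%(objecttype) %(objectsize:disk) %(objectname) %(rest)' |
  # Keep only blobs and strip the type column.
  sed -n 's/^blob //p' |
  sort -rn |
  head -n "$n"
}
```

For example, "largest_blobs 10 --all" shows the ten biggest blobs that
any ref can still reach.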

-Peff
