git.vger.kernel.org archive mirror
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Mateusz Loskot <mateusz@loskot.net>, git@vger.kernel.org
Subject: Re: Migration to Git LFS inflates repository multiple times
Date: Mon, 12 Nov 2018 13:42:21 +0100
Message-ID: <87k1li31si.fsf@evledraar.gmail.com>
In-Reply-To: <20181112123058.GE3956@sigill.intra.peff.net>


On Mon, Nov 12 2018, Jeff King wrote:

> On Mon, Nov 12, 2018 at 12:47:42AM +0100, Mateusz Loskot wrote:
>
>> Hi,
>>
>> I'm posting here for the first time and I hope it's the right place to ask
>> questions about Git LFS.
>>
>> TL;DR: Is it normal that a repository migrated to Git LFS inflates multiple
>> times, and how do I deal with it?
>
> That does sound odd to me. People with more LFS experience can probably
> give you better answers, but one thought occurred to me: does LFS
> store backup copies of the original refs that it rewrites (similar to
> the way filter-branch stores refs/original)?
>
> If so, then the resulting repo has the new history _and_ the old
> history. Which might mean storing those large blobs both as Git objects
> (for the old history) and in an LFS cache directory (for the new
> history).
>
> And the right next step is probably to delete those backup refs, and
> then "git gc --prune=now". Hmm, actually thinking about it, reflogs
> could be making the old history reachable, too.
>
> Try looking at the output of "git for-each-ref" and seeing if there are
> any backup refs. After deleting them (or confirming that there aren't),
> prune the reflogs with:
>
>   git reflog expire --expire-unreachable=now --all
>
> and then "git gc --prune=now".

Even if only the most recent version of each file is involved, this
could also be explained by LFS storing each file inflated, as-is, on
disk, whereas Git stores them delta-compressed.

According to the initial email, "*.exe,*.dll,*.lib,*.pdb,*.zip" were
added to LFS. Depending on their content, those files might
delta-compress somewhat better than random data.
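
To make the size difference concrete, here is a rough Python sketch.
It is not how Git or LFS actually store data: the helper names are
made up for illustration, the "files" are random bytes with a small
patch per version, and zlib with the previous version as a preset
dictionary is used only as a stand-in for Git's delta compression.

```python
import random
import zlib


def make_versions(count=5, size=16_000, seed=0):
    """Build `count` versions of an incompressible blob; each version
    differs from the previous one by a single 16-byte patch."""
    rng = random.Random(seed)
    current = bytearray(rng.getrandbits(8) for _ in range(size))
    versions = [bytes(current)]
    for _ in range(count - 1):
        offset = rng.randrange(size - 16)
        current[offset:offset + 16] = bytes(rng.getrandbits(8) for _ in range(16))
        versions.append(bytes(current))
    return versions


def size_stored_whole(versions):
    """Every version kept as its own full, uncompressed file
    (the layout described above for LFS)."""
    return sum(len(v) for v in versions)


def size_stored_as_deltas(versions):
    """First version compressed on its own; each later version compressed
    with the previous version as a preset dictionary.
    Assumption: this only approximates a delta chain in a packfile."""
    total = len(zlib.compress(versions[0], 9))
    for prev, cur in zip(versions, versions[1:]):
        comp = zlib.compressobj(level=9, zdict=prev)
        total += len(comp.compress(cur) + comp.flush())
    return total


if __name__ == "__main__":
    versions = make_versions()
    print("stored whole:    ", size_stored_whole(versions), "bytes")
    print("stored as deltas:", size_stored_as_deltas(versions), "bytes")
```

Even though each blob here is as incompressible as random data, the
versions are nearly identical to each other, so the delta-style total
stays close to the size of a single version while the whole-file total
grows with every version kept.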
