git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Elijah Newren <newren@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Jonathan Tan <jonathantanmy@google.com>,
	Jonathan Nieder <jrnieder@gmail.com>
Subject: Re: [PATCH] gc: do not warn about too many loose objects
Date: Mon, 16 Jul 2018 17:21:59 -0400	[thread overview]
Message-ID: <20180716212159.GH25189@sigill.intra.peff.net> (raw)
In-Reply-To: <CABPp-BEpCF9FE7eJwZWjY+bMsjDQnnDaSrHO+e3DtDDsR-=7Hg@mail.gmail.com>

On Mon, Jul 16, 2018 at 02:09:43PM -0700, Elijah Newren wrote:

> >> Um, except it doesn't actually do that.  The testcase I provided shows
> >> that it leaves around 10000 objects that are totally deletable and
> >> were only previously referenced by reflog entries -- entries that gc
> >> removed without removing the corresponding objects.
> >
> > What's your definition of "totally deletable"?
> 
> The point of gc is to: expire reflogs, repack referenced objects, and
> delete loose objects that (1) are no longer referenced and (2) are
> "old enough".
> 
> The "old enough" bit is the special part here.  Before the gc, those
> objects were referenced (only by old reflog entries) and were found in
> a pack.  git currently writes those objects out to disk and gives them
> the age of the packfile they are contained in, which I think is the
> source of the bug.  We have a reference for those objects from the
> reflogs, know they aren't referenced anywhere else, so those objects
> to me are the age of those reflog entries: 90 days.  As such, they are
> "old enough" and should be deleted.

OK, I see what you're saying, but...

> I never got around to fixing it properly, though, because 'git prune'
> is such a handy workaround that for now.  Having people nuke all their
> loose objects is a bit risky, but the only other workaround people had
> was to re-clone (waiting the requisite 20+ minutes for the repo to
> download) and throw away their old clone.  (Though some people even
> went that route, IIRC.)

If we were to delete those objects, wouldn't it be exactly the same as
running "git prune"? Or setting your gc.pruneExpire to "now" or even "5
minutes"?  Or are you concerned with taking other objects along for the
ride that weren't part of old reflogs? I think that's a valid concern,
but it's also an issue for objects which were previously referenced in
a reflog, but are part of another current operation.

Also, what do you do if there weren't reflogs in the repo? Or the
reflogs were deleted (e.g., because branch deletion drops the associated
reflog entirely)?

> With big repos, it's easy to get into situations where there are well
> more than 10000 objects satisfying these conditions.  In fact, it's
> not all that rare that the repo has far more loose objects after a git
> gc than before.

Yes, this is definitely a wart and I think is worth addressing.

> I totally agree with your general plan to put unreferenced loose
> objects into a pack.  However, I don't think these objects should be
> part of that pack; they should just be deleted instead.

I assume by "these objects" you mean ones which used to be reachable
from a reflog, but that reflog entry just expired.  I think you'd be
sacrificing some race-safety in that case.

If the objects went into a pack under a race-proof scheme, would that
actually bother you? Is it the 10,000 objects that's a problem, or is it
the horrible I/O from exploding them coupled with the annoying auto-gc
behavior?

-Peff

  reply	other threads:[~2018-07-16 21:22 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-16 17:27 [PATCH] gc: do not warn about too many loose objects Jonathan Tan
2018-07-16 17:51 ` Jeff King
2018-07-16 18:22   ` Jonathan Nieder
2018-07-16 18:52     ` Jeff King
2018-07-16 19:09       ` Jonathan Nieder
2018-07-16 19:41         ` Jeff King
2018-07-16 19:54           ` Jonathan Nieder
2018-07-16 20:29             ` Jeff King
2018-07-16 20:37               ` Jonathan Nieder
2018-07-16 21:09                 ` Jeff King
2018-07-16 21:40                   ` Jonathan Nieder
2018-07-16 21:45                     ` Jeff King
2018-07-16 22:03                       ` Jonathan Nieder
2018-07-16 22:43                         ` Jeff King
2018-07-16 22:56                           ` Jonathan Nieder
2018-07-16 23:26                             ` Jeff King
2018-07-17  1:53                               ` Jonathan Nieder
2018-07-17  8:59                                 ` Ævar Arnfjörð Bjarmason
2018-07-17 14:03                                   ` Jonathan Nieder
2018-07-17 15:24                                     ` Ævar Arnfjörð Bjarmason
2018-07-17 20:27                                   ` Jeff King
2018-07-18 13:11                                     ` Ævar Arnfjörð Bjarmason
2018-07-18 17:29                                       ` Jeff King
2018-07-17 15:59                                 ` Duy Nguyen
2018-07-17 18:09                                 ` Junio C Hamano
2018-07-16 19:15 ` Elijah Newren
2018-07-16 19:19   ` Jonathan Nieder
2018-07-16 20:21     ` Elijah Newren
2018-07-16 20:35       ` Jeff King
2018-07-16 20:56         ` Jonathan Nieder
2018-07-16 21:12           ` Jeff King
2018-07-16 19:52   ` Jeff King
2018-07-16 20:16     ` Elijah Newren
2018-07-16 20:38       ` Jeff King
2018-07-16 21:09         ` Elijah Newren
2018-07-16 21:21           ` Jeff King [this message]
2018-07-16 22:07             ` Elijah Newren
2018-07-16 22:55               ` Jeff King
2018-07-16 23:06                 ` Elijah Newren
2018-07-16 21:31           ` Jonathan Nieder
2018-07-17  6:51 ` [PATCH v2 0/3] gc --auto: do not return error for prior errors in daemonized mode Jonathan Nieder
2018-07-17  6:53   ` [PATCH 1/3] gc: improve handling of errors reading gc.log Jonathan Nieder
2018-07-17 18:19     ` Junio C Hamano
2018-07-17 19:58     ` Jeff King
2018-07-17  6:54   ` [PATCH 2/3] gc: exit with status 128 on failure Jonathan Nieder
2018-07-17 18:22     ` Junio C Hamano
2018-07-17 19:59     ` Jeff King
2018-09-17 18:33       ` Jeff King
2018-09-17 18:40         ` Jonathan Nieder
2018-09-18 17:30           ` Jeff King
2018-07-17  6:57   ` [PATCH 3/3] gc: do not return error for prior errors in daemonized mode Jonathan Nieder
2018-07-17 20:13     ` Jeff King
2018-07-18 16:21       ` Junio C Hamano
2018-07-18 17:22         ` Jeff King
2018-07-18 18:19           ` Junio C Hamano
2018-07-18 19:06             ` Jeff King
2018-07-18 19:55               ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180716212159.GH25189@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=jrnieder@gmail.com \
    --cc=newren@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).