git.vger.kernel.org archive mirror
From: Martin Fick <mfick@codeaurora.org>
To: git@vger.kernel.org
Subject: Re: Keeping unreachable objects in a separate pack instead of loose?
Date: Wed, 13 Jun 2012 18:17:46 +0000 (UTC)
Message-ID: <loom.20120613T185623-81@post.gmane.org>
In-Reply-To: 20120612191528.GB16911@sigill.intra.peff.net

Jeff King <peff <at> peff.net> writes:
> > Then, the creation of unreferenced objects from successive 'git add' 
> > shouldn't create that many objects in the first place.  They currently 
> > never get the chance to be packed to start with.
> 
> I don't think these objects are necessarily from successive "git add"s.
> That is one source, but they may also come from reflogs expiring. I
> guess in that case that they would typically be in an older pack,
> though.
...
> That is satisfyingly simple, but the storage requirement is quite bad.
> The unreachable objects are very much in the minority, and an 
> occasional duplication there is not a big deal; duplicating all of the 
> reachable objects would double the object directory's size.
...
(I don't think this is a valid generalization for servers.)

I am sorry to be coming a bit late into this discussion, but I think there
is an even worse use case, one which can cause much larger loose object
explosions and which does not seem to have been mentioned yet: "the
server upload rejected case". For example, think of a client pushing a
change from the wrong repository to a server. Since there will be no
history in common, the client will push the entire repository. If for
some reason the push gets rejected by the server (perhaps by a pre-receive
hook, or by a Gerrit server which says: "way too many new changes..."),
then the pack file may stay abandoned on the server. When gc runs:
boom, the entire history of that other project will explode into loose
objects but not get pruned, since the pack file may be fairly new!
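To make the failure mode concrete, here is a rough Python model of how gc
treats unreachable objects that live in a pack. The policy is an assumption
based on my understanding, not checked against the source. Unreachable
packed objects are dropped outright once their pack is older than the prune
expiry (2 weeks by default). Otherwise they are written out as loose objects
that inherit the pack's mtime, so a fresh rejected push survives as loose
files until that window passes. `Pack` and `loosened_by_gc` are hypothetical
names, and the push size in the usage lines is made up.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical model of gc's treatment of unreachable packed objects;
# this is NOT git's actual code, just the assumed policy.
@dataclass
class Pack:
    name: str
    mtime: datetime   # when the pack was written (e.g. by the rejected push)
    unreachable: int  # objects in this pack that no ref or reflog reaches

def loosened_by_gc(packs, now, expire=timedelta(weeks=2)):
    """Count unreachable objects a gc run at `now` would write out loose.

    Assumed policy: objects from packs older than `expire` are dropped
    immediately; objects from younger packs become loose files that inherit
    the pack's mtime, so a following prune will not remove them yet."""
    cutoff = now - expire
    return sum(p.unreachable for p in packs if p.mtime > cutoff)

now = datetime(2012, 6, 13, 18, 0)
push = Pack("rejected-push", now - timedelta(hours=1), 2_000_000)  # made-up size
print(loosened_by_gc([push], now))  # 2000000: every object in the push goes loose
```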

I believe that this has happened to us several times fairly recently. We
have a tiny project which some people keep confusing with the kernel,
and they push a change destined for the kernel to it. Gerrit rejects it and
their massive packfile (larger than the entire project) stays around. If gc
runs, it almost becomes a DOS for us: the sheer number of loose object
files makes the system crawl when accessing that repo, even on an SSD.
We have been talking about moving to NFS soon (with packfiles, git
should still perform fairly well on NFS), but this explosion really scares
me.

It seems like the current design is a DOS just waiting to happen for
servers. While I would love to eliminate the races discussed in this
thread, I think I agree with Ted that the first fix should just focus on
never expanding objects into loose form for pruning. (If certain objects
simply don't do well in pack files and the local gc policy says they should
be loose, go ahead and expand them, but that should be unrelated to
pruning.) People can DOS a server with unused packfiles too, but that will
rarely have the same impact that loose objects would have.
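The difference between the current behavior and the proposal above can be
seen by counting the files each approach leaves behind. This is a
hypothetical model: `gc_leftover_files` and its policy names are made up
for illustration. Exploding writes one loose file per still-young
unreachable object. Keeping those objects together in a separate pack costs
one `.pack` plus its `.idx`, no matter how many objects the rejected push
contained.

```python
from datetime import timedelta

def gc_leftover_files(packs, expire=timedelta(weeks=2), policy="explode"):
    """Count files gc would leave behind for unreachable objects.

    `packs` is a list of (age, unreachable_count) pairs. Hypothetical model:
    under "explode", each still-young unreachable object becomes its own
    loose file; under "pack", all young unreachable objects are written to
    one separate pack instead of being loosened."""
    young = sum(count for age, count in packs if age < expire)
    if young == 0:
        return 0
    if policy == "explode":
        return young  # one loose file per object (fan-out dirs ignored)
    if policy == "pack":
        return 2      # one .pack plus its .idx
    raise ValueError(f"unknown policy: {policy!r}")

packs = [(timedelta(hours=1), 2_000_000)]  # a fresh rejected push (made-up size)
print(gc_leftover_files(packs, policy="explode"))  # 2000000
print(gc_leftover_files(packs, policy="pack"))     # 2
```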

-Martin


-- 
Employee of Qualcomm Innovation Center, Inc. which is a member 
of Code Aurora Forum
