From: Jeff King <peff@peff.net>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Michael Haggerty <mhagger@alum.mit.edu>,
Shawn Pearce <spearce@spearce.org>, git <git@vger.kernel.org>
Subject: Re: No progress from push when using bitmaps
Date: Fri, 14 Mar 2014 11:29:40 -0400
Message-ID: <20140314152939.GA6186@sigill.intra.peff.net>
In-Reply-To: <CACsJy8BZd0uL2M8NHO+sFcWfkmoyuVnU1Mh2eKZ2OXm=Y-Ge5g@mail.gmail.com>

On Fri, Mar 14, 2014 at 05:21:59PM +0700, Duy Nguyen wrote:
> On Fri, Mar 14, 2014 at 4:43 PM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> > Would it be practical to change it to a percentage of bytes written?
> > Then we'd have progress info that is both convenient *and* truthful.
>
> I agreed for a second, then remembered that we don't know the final
> pack size until we finish writing it. Not sure if we could estimate
> it (cheaply) with good accuracy, though.

Right. I'm not sure what Michael meant by "it". We can send a percentage
of bytes written for the reused pack (my option 2), but we do not know
the total bytes for the rest of the objects. So we'd end up with two
progress meters (one for the reused pack, and one for everything else),
both counting up to different endpoints. And it would require quite a
few changes to the progress code.

> If an object is reused, we already know its compressed size. If it's
> not reused and is a loose object, we could use the on-disk size. It's a
> lot harder to estimate a non-reused, deltified object. All we have is
> the uncompressed size and the size of each delta in the delta chain.
> Neither gives a good hint of what the compressed size would be.

Hmm. I think we do have the compressed delta size after having run the
compression phase (because that is ultimately what we compare to find
the best delta). Loose objects are probably the hardest here, as we
actually recompress them (IIRC, because packfiles encode the type/size
info outside of the compressed bit, whereas it is inside for loose
objects; the "experimental loose" format harmonized this, but it never
caught on).
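
To make the loose-object point concrete: a loose object is stored as
zlib("<type> <size>" NUL <content>), while a pack entry has a small binary
header written outside the zlib stream, followed by the deflated content
alone. The sketch below builds the loose prefix; the helper name
loose_header is hypothetical and not a git function.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: write the header that precedes a loose object's
 * content ("<type> <decimal size>" followed by a NUL byte) into buf and
 * return its length including the NUL, or -1 if buf is too small.
 * In a loose object these bytes are deflated together with the content;
 * a pack entry replaces them with a binary header outside the zlib stream.
 */
static int loose_header(char *buf, size_t buflen, const char *type,
                        unsigned long long size)
{
    assert(buf != NULL && type != NULL);
    int n = snprintf(buf, buflen, "%s %llu", type, size);
    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return n + 1; /* snprintf already wrote the terminating NUL */
}
```

Because the two layouts feed different bytes to zlib, the on-disk size of
a loose object can only approximate its packed size unless the object is
actually recompressed.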

Without doing that recompression, any value you came up with would be an
estimate, though it would be pretty close (not off by more than a few
bytes per object). However, you can't just run through the packing list
and add up the object sizes; you'd need to do a real "dry-run" through
the writing phase. There are probably more things I'm missing, but at a
minimum you need to figure out:

  1. The actual compressed size of a full loose object, as described
     above.

  2. The variable-length headers for each object based on its type and
     size.

  3. The final form that the object will take based on what has come
     before. For example, if there is a max pack size, we may split an
     object from its delta base, in which case we have to throw away the
     delta. We don't know where those breaks will be until we walk
     through the whole list.

  4. If an object we attempt to reuse turns out to be corrupted, we
     fall back to the non-reuse code path, which will have a different
     size. So you'd need to actually check the reused object CRCs during
     the dry-run (and for local repacks, not transfers, we actually
     inflate and check the zlib, too, for safety).
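
Item 2 and the pack-splitting part of item 3 can be illustrated with a
simplified dry-run. A pack entry header puts a continuation bit, the
3-bit type and the low 4 bits of the uncompressed size in its first byte,
then 7 more size bits per continuation byte, so its length depends only on
the size. Everything else below is a hypothetical model, not git's
pack-objects code: the struct entry fields, compressed sizes being already
known (which sidesteps items 1 and 4), 20-byte REF_DELTA-style base names
(assuming SHA-1), bases always preceding their deltas, and a size limit
that ignores the per-pack header and trailing checksum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Length of a pack entry header for an object (or delta) whose
 * uncompressed size is `size`. The first byte holds the low 4 size bits
 * next to the type; each further byte holds 7 more bits.
 */
static size_t pack_header_len(uint64_t size)
{
    size_t n = 1;
    for (size >>= 4; size; size >>= 7)
        n++;
    return n;
}

/* Hypothetical, simplified packing-list entry (not git's struct object_entry). */
struct entry {
    uint64_t size;        /* uncompressed object size */
    uint64_t zsize;       /* compressed size of the full object (assumed known) */
    int base;             /* index of the delta base in the list, or -1 */
    uint64_t delta_size;  /* uncompressed delta size */
    uint64_t delta_zsize; /* compressed delta size */
};

#define BASE_NAME_LEN 20 /* REF_DELTA base name, assuming SHA-1 */

/*
 * Walk the list in write order and return the estimated total bytes.
 * When limit is non-zero, a new pack starts whenever the next entry would
 * push the current pack past the limit; a delta whose base is not in the
 * current pack is written as a full object instead. pack_of[i] receives
 * the pack chosen for entry i, and *npacks the number of packs.
 */
static uint64_t dry_run(const struct entry *list, size_t n, uint64_t limit,
                        int *pack_of, int *npacks)
{
    uint64_t total = 0, cur = 0;
    int pack = 0;

    for (size_t i = 0; i < n; i++) {
        const struct entry *e = &list[i];
        uint64_t full = pack_header_len(e->size) + e->zsize;
        uint64_t len = full;

        assert(e->base < (int)i); /* bases must be written before deltas */
        if (e->base >= 0 && pack_of[e->base] == pack)
            len = pack_header_len(e->delta_size) + BASE_NAME_LEN + e->delta_zsize;

        if (limit && cur > 0 && cur + len > limit) {
            pack++;     /* start a new pack */
            cur = 0;
            len = full; /* any base is now in an earlier pack */
        }
        pack_of[i] = pack;
        cur += len;
        total += len;
    }
    *npacks = pack + 1;
    return total;
}
```

Even this toy version has to walk the list in order, because whether an
entry can stay a delta depends on where earlier entries landed; a plain
sum over the packing list cannot capture that.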

So I think it's _possible_. But it's definitely not trivial. For now, I
think it makes sense to go with something like the patch I posted
earlier (which I'll re-roll in a few minutes). That fixes what is IMHO a
regression in the bitmaps case. And it does not make it any harder for
somebody to later convert us to a true byte-counter (i.e., it is the
easy half already).

-Peff