Re: No progress from push when using bitmaps

From: Jeff King <peff@peff.net>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Michael Haggerty <mhagger@alum.mit.edu>,
	Shawn Pearce <spearce@spearce.org>, git <git@vger.kernel.org>
Subject: Re: No progress from push when using bitmaps
Date: Fri, 14 Mar 2014 11:29:40 -0400
Message-ID: <20140314152939.GA6186@sigill.intra.peff.net>
In-Reply-To: <CACsJy8BZd0uL2M8NHO+sFcWfkmoyuVnU1Mh2eKZ2OXm=Y-Ge5g@mail.gmail.com>

On Fri, Mar 14, 2014 at 05:21:59PM +0700, Duy Nguyen wrote:

> On Fri, Mar 14, 2014 at 4:43 PM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> > Would it be practical to change it to a percentage of bytes written?
> > Then we'd have progress info that is both convenient *and* truthful.
> 
> I agreed for a second, then remembered that we don't know the final
> pack size until we finish writing it. Not sure if we could estimate
> it (cheaply) with good accuracy, though.

Right. I'm not sure what Michael meant by "it". We can send a percentage
of bytes written for the reused pack (my option 2), but we do not know
the total bytes for the rest of the objects. So we'd end up with two
progress meters (one for the reused pack, and one for everything else),
both counting up to different endpoints. And it would require quite a
few changes to the progress code.

> If an object is reused, we already know its compressed size. If it's
> not reused and is a loose object, we could use on-disk size. It's a
> lot harder to estimate a not-reused, deltified object. All we have is
> the uncompressed size, and the size of each delta in the delta chain.
> Neither gives a good hint of what the compressed size would be.

Hmm. I think we do have the compressed delta size after having run the
compression phase (because that is ultimately what we compare to find
the best delta). Loose objects are probably the hardest here, as we
actually recompress them (IIRC, because packfiles encode the type/size
info outside of the compressed bit, whereas it is inside for loose
objects; the "experimental loose" format harmonized this, but it never
caught on).
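
To illustrate why the loose bytes cannot simply be copied, here is a rough
sketch in Python (not git's actual code). It assumes the standard loose
layout, where the "<type> <size>\0" header is deflated together with the
content, and it only models the deflated payload of a pack entry. The names
loose_stream and pack_payload are made up for the example.

```python
import zlib

def loose_stream(obj_type: bytes, content: bytes) -> bytes:
    """Deflate header and content together, as a loose object does."""
    header = obj_type + b" " + str(len(content)).encode() + b"\0"
    return zlib.compress(header + content)

def pack_payload(content: bytes) -> bytes:
    """Deflate only the content; a pack entry keeps type/size outside."""
    return zlib.compress(content)

data = b"hello, pack\n"
loose = loose_stream(b"blob", data)
packed = pack_payload(data)

# The two zlib streams inflate to different bytes, so the loose stream
# cannot be reused verbatim inside a pack entry.
print(len(loose), len(packed))
```

The two lengths usually differ by only a few bytes, which is why an estimate
based on the on-disk loose size would be close but not exact.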

Without doing that recompression, any value you came up with would be an
estimate, though it would be pretty close (not off by more than a few
bytes per object). However, you can't just run through the packing list
and add up the object sizes; you'd need to do a real "dry-run" through
the writing phase. There are probably more things I'm missing, but you
need at least to figure out:

  1. The actual compressed size of a full loose object, as described
     above.

  2. The variable-length headers for each object based on its type and
     size.

  3. The final form that the object will take based on what has come
     before. For example, if there is a max pack size, we may split an
     object from its delta base, in which case we have to throw away the
     delta. We don't know where those breaks will be until we walk
     through the whole list.

  4. If an object we attempt to reuse turns out to be corrupted, we
     fall back to the non-reuse code path, which will have a different
     size. So you'd need to actually check the reused object CRCs during
     the dry-run (and for local repacks, not transfers, we actually
     inflate and check the zlib, too, for safety).
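
As a rough illustration of item 2 only, here is a Python sketch of the
variable-length entry header and a naive size total built on it. The
(type, uncompressed size, compressed size) tuples and the function names
are assumptions made for the example, not structures from pack-objects.
It deliberately ignores delta base references, the pack header and
trailer, pack splitting, and the corruption fallback, which is exactly
why a simple sum like this is not enough.

```python
# Object type codes used in pack entry headers.
OBJ_COMMIT, OBJ_TREE, OBJ_BLOB, OBJ_TAG = 1, 2, 3, 4

def pack_entry_header(obj_type: int, size: int) -> bytes:
    """Encode an object's type and uncompressed size as a pack entry header.

    The first byte carries a continuation bit, 3 bits of type, and the low
    4 bits of the size; each following byte carries 7 more size bits.
    """
    byte = (obj_type << 4) | (size & 0x0F)
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)  # set the continuation bit
        byte = size & 0x7F
        size >>= 7
    out.append(byte)
    return bytes(out)

def estimate_pack_bytes(entries) -> int:
    """Sum header length plus compressed size for each entry.

    entries: iterable of (obj_type, uncompressed_size, compressed_size),
    a hypothetical shape chosen for this sketch.
    """
    return sum(len(pack_entry_header(t, n)) + c for t, n, c in entries)

entries = [(OBJ_BLOB, 15, 10), (OBJ_COMMIT, 16, 20)]
print(estimate_pack_bytes(entries))
```

The header grows by one byte for every additional 7 bits of size, so even
this part of the total depends on each object's final form.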

So I think it's _possible_. But it's definitely not trivial. For now, I
think it makes sense to go with something like the patch I posted
earlier (which I'll re-roll in a few minutes). That fixes what is IMHO a
regression in the bitmaps case. And it does not make it any harder for
somebody to later convert us to a true byte-counter (i.e., it is the
easy half already).

-Peff
